JSoup not showing correct text - java

I wanted to create a Java app that crawls the song name from a website called chillstep.info and saves it to a .txt file. However, Jsoup prints this:
<div id="titel">
♫
</div>
Here's the code:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class Crawltitle {
public static void getTitle() throws IOException{
Document doc = Jsoup.connect("http://chillstep.info/").get();
String title = doc.getElementById("titel").outerHtml();
System.out.println(title);
}
public static void main(String[] args) throws IOException{
getTitle();
}
}
Is this problem caused by the website (if so, why, and how can I solve it) or by Jsoup?

The title is loaded dynamically via
http://chillstep.info/jsonInfo.php
You can still use Jsoup to get this if you tell it to ignore the content type, which it normally checks:
Connection con = Jsoup
.connect("http://chillstep.info/jsonInfo.php")
.ignoreContentType(true);
Response res = con.execute();
String rawJSON = res.body();
Note that I did not use the Jsoup parser here, so you might as well use any other library to fetch HTTP content, such as Apache HttpClient.
At this point you can parse the answer with a JSON library of your choice, or do it "by hand" since the response is so simple:
String title = rawJSON.replaceAll(".*:\"([^\"]*).*","$1");
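The one-line replaceAll above relies on the response containing a single field. As a slightly more defensive sketch of the same "by hand" idea, the snippet below looks up one named string field with a regular expression. The field name "title" and the sample JSON are assumptions for illustration only; the real field names in jsonInfo.php are not shown here, and a proper JSON library is the safer choice if values may contain escaped quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonFieldSketch {

    // Returns the string value of the given key, or null if the key is absent.
    // Limitation: does not handle escaped quotes inside the value.
    public static String extractString(String json, String key) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response body; the real one may look different.
        String rawJSON = "{\"title\":\"Some Artist - Some Song\"}";
        System.out.println(extractString(rawJSON, "title"));
    }
}
```

Replace the sample string with res.body() from the Jsoup call once you know which key actually holds the song name.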

Related

Spring Boot Web RestTemplate Send Object as Query Param

I want to make a POST request with URL Query Params set to the values of an object.
For example
http://test/data?a=1&b=2&c=3
I want to make a post request to this URL with a class like this:
public class Data {
private Integer a;
private Integer b;
private Integer c;
}
I do NOT want to do each field manually, like this:
public void sendRequest(Data data) {
String url = UriComponentsBuilder.fromHttpUrl("http://test/")
.queryParam("a", data.getA())
.queryParam("b", data.getB())
.queryParam("c", data.getC())
.toUriString();
restTemplate.postForObject(url, body, Void.class);
}
Instead, I want to use the entire object:
public void sendRequest(Data data) {
String url = UriComponentsBuilder.fromHttpUrl("http://test/")
.queryParamsAll(data) //pseudo
.toUriString();
restTemplate.postForObject(url, body, Void.class);
}
What you want is similar to what the qs library does in JavaScript. Thanks to qianshui423/qs, there is an implementation of qs in Java. First, git clone it and build it with the command below. You will get a jar called "qs-1.0.0.jar" in build/libs (JDK 8 is required).
# cd qs directory
./gradlew build -x test
Import it; a simple demo is below. For your requirement, you can write a class that converts your object into a QSObject. Besides toQString, QS can also parse a string into a QSObject, which I find quite powerful.
import com.qs.core.QS;
import com.qs.core.model.QSObject;
public class Demo {
public static void main(String[] args) throws Exception{
QSObject qsobj = new QSObject();
qsobj.put("a",1);
qsobj.put("b",2);
qsobj.put("c",3);
String str = QS.toQString(qsobj);
System.out.println(str); // output is a=1&b=2&c=3
}
}
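If adding a library is not an option, the object-to-query-string conversion can be roughly approximated with plain reflection. This is only a sketch under assumptions: toQueryString is a hypothetical helper (not part of Spring or qs), the Data class here is a stand-in with preset values, only flat non-null fields are handled, and fields are sorted by name so the output is deterministic.

```java
import java.io.UnsupportedEncodingException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class QueryStringSketch {

    // Stand-in for the Data class from the question, with sample values.
    public static class Data {
        private Integer a = 1;
        private Integer b = 2;
        private Integer c = 3;
    }

    // Hypothetical helper: turns the non-null instance fields of an object
    // into "name=value" pairs joined by '&', sorted by field name.
    public static String toQueryString(Object source)
            throws IllegalAccessException, UnsupportedEncodingException {
        Map<String, Object> values = new TreeMap<>();
        for (Field field : source.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            field.setAccessible(true);
            Object value = field.get(source);
            if (value != null) {
                values.put(field.getName(), value);
            }
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, Object> entry : values.entrySet()) {
            query.add(URLEncoder.encode(entry.getKey(), "UTF-8") + "="
                    + URLEncoder.encode(String.valueOf(entry.getValue()), "UTF-8"));
        }
        return query.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toQueryString(new Data())); // prints a=1&b=2&c=3
    }
}
```

The resulting string can be appended after "?" to the base URL before calling restTemplate.postForObject. Nested objects and collections would need extra handling, which is where a dedicated library earns its keep.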

How to create an empty folder in google cloud bucket using Java API

I am using the JSON API (Google API Client Library for Java) to access objects in Google Cloud Storage. I need to create (not upload) an empty folder in the bucket. The Google Developer Web Console has an option to create a directory, but neither the Java API nor the gsutil command has a create-folder command. If anybody knows how to do this, please let me know. Thanks in advance...
You can emulate a folder by uploading a zero-sized object with a trailing slash.
As noted in the question comments, Google Cloud Storage is not a filesystem and emulating folders has serious limitations.
I think it is better to put the folder in the object name itself. For example, if you need one folder called images and another called docs, name the objects you upload images/name_of_file or docs/name_of_file.
If the name of the file is images/dogImage and you upload that file, you will find a folder called images in your bucket.
I hope this helps you and others.
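To make the "folders are just name prefixes" idea concrete, here is a small in-memory sketch. It does not call any Google Cloud Storage API; it only shows how a console could derive top-level "folders" from a flat list of object names. The class, the method, and every object name except images/dogImage are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FolderPrefixSketch {

    // Collects the part of each object name up to and including the first '/'.
    // Names without a slash live at the bucket root and yield no folder.
    public static Set<String> topLevelFolders(List<String> objectNames) {
        Set<String> folders = new TreeSet<>();
        for (String name : objectNames) {
            int slash = name.indexOf('/');
            if (slash > 0) {
                folders.add(name.substring(0, slash + 1));
            }
        }
        return folders;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "images/dogImage",  // object "inside" images/
                "docs/",            // zero-sized placeholder object
                "readme");          // object at the bucket root
        System.out.println(topLevelFolders(names)); // prints [docs/, images/]
    }
}
```

Both the uploaded file and the zero-sized placeholder produce a folder entry, which is why either approach makes a folder appear in the console.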
This is my Java method to create an empty (emulated) folder:
public static void createFolder(String name) throws IOException {
Channels.newOutputStream(
createFile(name + "/")
).close();
}
public static GcsOutputChannel createFile(String name) throws IOException {
return getService().createOrReplace(
new GcsFilename(getName(), name),
GcsFileOptions.getDefaultInstance()
);
}
private static String name;
public static String getName() {
if (name == null) {
name = AppIdentityServiceFactory.getAppIdentityService().getDefaultGcsBucketName();
}
return name;
}
public static GcsService service;
public static GcsService getService() {
if (service == null) {
service = GcsServiceFactory.createGcsService(
new RetryParams.Builder()
.initialRetryDelayMillis(10)
.retryMaxAttempts(10)
.totalRetryPeriodMillis(15000)
.build());
}
return service;
}

Getting text off a website and set it as a string in Java

I'm trying to get some text from a website and set it as a String in Java.
I have little to no experience with web connections in Java and would appreciate some help.
Here's what I've got so far:
static String wgetURL = "http://www.realmofthemadgod.com/version.txt";
static Document Version;
static String displayLink = "http://www.realmofthemadgod.com/AGCLoader" + Version + ".swf";
public static void main(String[] args) throws IOException{
Version = Jsoup.connect(wgetURL).get();
System.out.println(Version);
JOptionPane.showMessageDialog(null, Version, "RotMG SWF Finder", JOptionPane.DEFAULT_OPTION);
}
I'm trying to use Jsoup, but I keep getting errors at startup.
Your problem is not Jsoup related.
You build displayLink from Version in a static initializer, before Version has been assigned, so Version is still null at that point.
Change your code to:
public static void main(String[] args) throws IOException{
String url = "http://www.realmofthemadgod.com/version.txt";
Document doc = Jsoup.connect(url).get();
System.out.println(doc);
// query doc using jsoup ...
}
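The initialization-order issue can be reproduced without any network access. In the sketch below, the version value "1.2.3" and the buildLink helper are made up for the example; in the real program the version text would come from the fetched document.

```java
public class InitOrderSketch {

    // Not assigned yet while the static initializers run, so it is null there.
    static String version;

    // Evaluated once at class initialization: concatenates the text "null".
    static String eagerLink =
            "http://www.realmofthemadgod.com/AGCLoader" + version + ".swf";

    // Hypothetical helper: build the link only after the version is known.
    public static String buildLink(String knownVersion) {
        return "http://www.realmofthemadgod.com/AGCLoader" + knownVersion + ".swf";
    }

    public static void main(String[] args) {
        version = "1.2.3"; // placeholder for the fetched version text
        System.out.println(eagerLink);          // ...AGCLoadernull.swf
        System.out.println(buildLink(version)); // ...AGCLoader1.2.3.swf
    }
}
```

Assigning version later does not change eagerLink, because the string was already built when the class was initialized.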

How can I crawl a web page that is dynamically generated by JavaScript?

I have crawled the page source using Selenium, with code like this:
public static String getCurrentPageSource(String url) {
webDriver = new ChromeDriver();
webDriver.get(url);
String src = webDriver.getPageSource();
//writeInfile(src);
webDriver.close();
return src;
}
But I can't see the remaining time in the page source I crawled, because it is generated dynamically by JavaScript.

NekoHTML SAX fragment parsing

I'm trying to parse a simple fragment of HTML with NekoHTML :
<h1>This is a basic test</h1>
To do so, I've set a specific Neko feature so that no HTML, HEAD or BODY tag triggers the startElement(..) callback.
Unfortunately, it doesn't work for me. I have certainly missed something, but I can't figure out what.
Here is some very simple code to reproduce my problem:
public static class MyContentHandler extends DefaultHandler { // DefaultHandler supplies no-op versions of the remaining ContentHandler callbacks
public void characters(char[] ch, int start, int length) throws SAXException {
String text = String.valueOf(ch, start, length);
System.out.println(text);
}
public void startElement(String nameSpaceURI, String localName, String rawName, Attributes attributes) throws SAXException {
System.out.println(rawName);
}
public void endElement(String nameSpaceURI, String localName, String rawName) throws SAXException {
System.out.println("end " + localName);
}
}
And the main() to launch a test :
public static void main(String[] args) throws SAXException, IOException {
SAXParser saxReader = new SAXParser();
// set the feature like explained in documentation : http://nekohtml.sourceforge.net/faq.html#fragments
saxReader.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
saxReader.setContentHandler(new MyContentHandler());
saxReader.parse(new InputSource(new StringInputStream("<h1>This is a basic test</h1>")));
}
The corresponding output :
HTML
HEAD
end HEAD
BODY
H1
This is a basic test
end H1
end BODY
end HTML
whereas I was expecting
H1
This is a basic test
end H1
Any ideas?
I finally got it!
Actually, I was parsing my HTML string in a GWT application to which I had added the gwt-dev.jar dependency. This jar packages a lot of external libraries, such as xercesImpl, but the version of the embedded Xerces classes does not match the one required by NekoHTML.
As a (strange) result, it appears that the NekoHTML SAX parser didn't apply any custom feature when running against the Xerces version embedded in gwt-dev.
So I had to rework some code to remove the gwt-dev dependency, which, by the way, is not recommended for any standard GWT project.
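When a classpath conflict like this is suspected, it can help to print where a given class was actually loaded from. The following is a generic diagnostic sketch, not something from NekoHTML or GWT; describeOrigin is a hypothetical helper, and which class is worth checking in a given project is an assumption you have to make yourself.

```java
import java.security.CodeSource;

public class ClassOriginSketch {

    // Returns the jar or directory a class was loaded from, or a placeholder
    // when the JVM does not expose a location (e.g. for core JDK classes).
    public static String describeOrigin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // defaults to java.lang.String so the sketch runs on its own.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> type = Class.forName(className);
        System.out.println(className + " <- " + describeOrigin(type));
    }
}
```

Run inside the affected application's classpath with a Xerces class name such as org.apache.xerces.parsers.AbstractSAXParser, this would show whether the class comes from gwt-dev.jar or from a standalone xercesImpl jar.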
