I am attempting to parse the HTML of Google's search results to grab the title of each result. This is done on Android in a private nested class, shown below:
private class WebScraper extends AsyncTask<String, Void, String> {
public WebScraper() {}
@Override
protected String doInBackground(String... urls) {
Document doc;
try {
doc = Jsoup.connect(urls[0]).get();
} catch (IOException e) {
System.out.println("Failed to open document");
return "";
}
Elements results = doc.getElementsByClass("rc");
int count = 0;
for (Element lmnt : results) {
System.out.println(count++);
System.out.println(lmnt.text());
}
System.out.println("Count is : " + count);
String key = "test";
//noinspection Since15
SearchActivity.this.songs.put(key, SearchActivity.this.songs.getOrDefault(key, 0) + 1);
// return requested
return "";
}
}
An example URL I am trying to parse: http://www.google.com/#q=i+might+site:genius.com
For some reason, when I run the above code, my count is printed as 0, so no elements are being stored in results. Any help is much appreciated! P.S. doc is definitely initialized and the HTML page is loading properly.
This code searches Google for a word such as "Apple", fetches all links from the results, and displays their titles and URLs. It can search up to 500 words a day; after that Google detects it and stops returning results.
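String google = "http://www.google.com/search?q="; // assumed value: not defined in the original snippet
String charset = "UTF-8"; // assumed value: not defined in the original snippet
String search = "Apple"; // the word to search for on Google
String userAgent = "ExampleBot 1.0 (+http://example.com/bot)";
Elements links = new Elements(); // stays empty if the request fails, so the loop below is skipped
try {
links = Jsoup.connect(google + URLEncoder.encode(search, charset))
.userAgent(userAgent).get().select(".g>.r>a");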
} catch (UnsupportedEncodingException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
for (Element link : links) {
String title = link.text();
// Google returns URLs in the format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
String url = link.absUrl("href");
try {
url = URLDecoder.decode(url.substring(url.indexOf('=') +
1, url.indexOf('&')), "UTF-8");
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if (!url.startsWith("http")) {
continue; // Ads/news/etc.
}
System.out.println("Title: " + title);
System.out.println("URL: " + url);
}
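The substring call in the loop above throws a StringIndexOutOfBoundsException when the link has no '&' after the target URL. A slightly more defensive version of the same idea could look like the sketch below. It uses only the standard library; the class name RedirectUrl and the method extractTarget are made up for illustration, and the redirect format is the one described in the comment above, which Google may change at any time.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of Jsoup or any Google API.
class RedirectUrl {

    // Returns the decoded value of the "q" parameter, or null if there is none.
    static String extractTarget(String href) {
        int q = href.indexOf("?q=");
        if (q < 0) {
            q = href.indexOf("&q=");
        }
        if (q < 0) {
            return null; // not a redirect link in the assumed format
        }
        int start = q + 3; // skip "?q=" or "&q="
        int end = href.indexOf('&', start);
        String raw = (end < 0) ? href.substring(start) : href.substring(start, end);
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return null; // malformed percent-escape or unsupported charset
        }
    }

    public static void main(String[] args) {
        System.out.println(extractTarget(
                "http://www.google.com/url?q=http%3A%2F%2Fexample.com%2Fa&sa=U&ei=abc"));
    }
}
```

In the loop you would call extractTarget(link.absUrl("href")) and skip the link when it returns null.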
If you check the source of the Google page, you will notice that it does not contain any of the text that the browser normally shows; there is only a bunch of JavaScript code. That means Google renders all the search results dynamically.
Jsoup fetches that JavaScript but does not execute it, so it finds no HTML elements with the "rc" class. That is why you get a count of zero in your code sample.
Consider using Google's public search API instead of parsing its HTML pages directly: https://developers.google.com/custom-search/.
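One more detail that may matter here: in the example URL from the question, the search terms come after a '#', which makes them a URL fragment. Fragments are handled by the browser and are not sent to the server in an HTTP request, so the server only sees a request for the home page. The sketch below uses java.net.URI to show the difference; the class name FragmentCheck and the /search?q= form of the URL are assumptions for illustration.

```java
import java.net.URI;

// Hypothetical demo class: shows which part of a URL is query and which is fragment.
class FragmentCheck {

    static String describe(String url) {
        URI uri = URI.create(url);
        return "query=" + uri.getQuery() + ", fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Search terms in the fragment: never transmitted to the server.
        System.out.println(describe("http://www.google.com/#q=i+might+site:genius.com"));
        // Search terms in the query string: part of the request.
        System.out.println(describe("http://www.google.com/search?q=i+might+site:genius.com"));
    }
}
```

Even with the query-string form, the result markup may not contain the "rc" class, so this does not replace the advice above.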
I completely agree with Matvey Sidorenko, but to use the Google public search API you need a Google API key. The problem is that Google limits each API key to 100 searches; once you exceed that it stops working, and the quota resets after 24 hours.
I was recently working on a project where we needed the Google search result links for different user-supplied queries. To get around the API limit, I made my own API that searches directly on google/ncr and returns the result links.
Free Google Search API:
http://freegoogleapi.azurewebsites.net/ or http://google.bittque.com
I used the HtmlUnit library to build this API.
You can use my API, or you can use HtmlUnit yourself to achieve what you need.
I'm trying to print all the locations in the table on this Wikipedia page: https://en.wikipedia.org/wiki/COVID-19_pandemic, but the output is always blank. Is this a problem with my code, or am I searching for the wrong HTML classes?
try {
Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/COVID-19_pandemic").get();
for (Element row : doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")){
if (row.select("th:nth-of-type(2)").text().equals("")){
continue;
}else {
final String location = row.select("th:nth-of-type(2)").text();
System.out.println(location);
}
}
} catch (IOException e) {
e.printStackTrace();
}
When I changed
doc.select("table.wikitable.plainrowheaders.sortable.jquery-tablesorter tr")
to
doc.select("table.wikitable tr")
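I was able to get the country names. The jquery-tablesorter class is probably added by JavaScript in the browser, so it would not be present in the HTML that Jsoup downloads.
Please try it.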
I am a newbie, so apologies in advance if this is a basic question ;-)
I am currently working on a project and today I reached a milestone: my API request works.
I am using this URL:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=turkey
My problem is that my app only needs the "extract" part of this JSON (at the moment I receive everything). How do I get just that part?
Try this one:
try {
JSONObject jsonResult = new JSONObject(yourline);
String loudScreaming = jsonResult.getJSONObject("query").getJSONObject("pages").getJSONObject("11125639").getString("extract");
System.out.println(loudScreaming);
} catch (JSONException e) {
e.printStackTrace();
}
Try this one:
public String getExtract(String json) {
try {
JSONObject object = new JSONObject(json).getJSONObject("query").getJSONObject("pages");
object = object.getJSONObject(object.names().get(0).toString());
return object.getString("extract");
} catch (Exception e) { return ""; }
}
In different wikis, the pages object might contain different page IDs, but every JSON result has query, pages, and extract.
So we take the first item in the pages object and return the extract text from there.
I have an app that sets the hardware parameters of the Camera programmatically.
However, as I've been told and have come to observe, not all chipsets support all parameters.
For example, the Nexus 4 (Qualcomm) has sharpness and sharpness-max parameters, while the Galaxy Note II 3G has neither.
Hence, when I set the sharpness parameter, the Nexus responds well, but the Galaxy force closes:
java.lang.RuntimeException: setParameters failed
at android.hardware.Camera.native_setParameters(Native Method)
at android.hardware.Camera.setParameters(Camera.java:1452)
My question is: how can I get the raw info programmatically? I need to get the parameters, their values, and whether they exist or not.
I wish to get the raw metadata parameters, like this: database
Alright, thought this would be a fun bit of practice. So, Android does not give a public API into this information. Why? I have no idea. It looks like you can call Camera.Parameters#get(String) to check for any particular parameter that you're interested in, but let's say you're greedy and want the whole list to yourself. In that case, we can dive in using reflection, but be aware that there is a strong possibility that this will not work on all versions of Android and may break in future versions. With that said, here's how you do it:
private static Map<String, String> getFullCameraParameters (Camera cam) {
Map<String, String> result = new HashMap<String, String>(64);
final String TAG = "CameraParametersRetrieval";
try {
Class camClass = cam.getClass();
//Internally, Android goes into native code to retrieve this String of values
Method getNativeParams = camClass.getDeclaredMethod("native_getParameters");
getNativeParams.setAccessible(true);
//Boom. Here's the raw String from the hardware
String rawParamsStr = (String) getNativeParams.invoke(cam);
//But let's do better. Here's what Android uses to parse the
//String into a usable Map -- a simple ';' StringSplitter, followed
//by splitting on '='
//
//Taken from Camera.Parameters unflatten() method
TextUtils.StringSplitter splitter = new TextUtils.SimpleStringSplitter(';');
splitter.setString(rawParamsStr);
for (String kv : splitter) {
int pos = kv.indexOf('=');
if (pos == -1) {
continue;
}
String k = kv.substring(0, pos);
String v = kv.substring(pos + 1);
result.put(k, v);
}
//And voila, you have a map of ALL supported parameters
return result;
} catch (NoSuchMethodException ex) {
Log.e(TAG, ex.toString());
} catch (IllegalAccessException ex) {
Log.e(TAG, ex.toString());
} catch (InvocationTargetException ex) {
Log.e(TAG, ex.toString());
}
//If there was any error, just return an empty Map
Log.e(TAG, "Unable to retrieve parameters from Camera.");
return result;
}
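If the goal is only to avoid the setParameters crash, the parsed map can be used to check for a key before setting it. The sketch below repeats the splitting step in plain Java so it can be tried without a device. The class name FlattenedParams and its methods are made up for illustration. As far as I know, the public Camera.Parameters#flatten() method returns a string in the same ';'-separated format, so it may be worth trying as the input before resorting to reflection; treat that as an assumption to verify on your devices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses a "key=value;key=value" camera parameter string.
class FlattenedParams {

    static Map<String, String> parse(String flattened) {
        Map<String, String> result = new LinkedHashMap<>();
        if (flattened == null) {
            return result;
        }
        for (String kv : flattened.split(";")) {
            int pos = kv.indexOf('=');
            if (pos == -1) {
                continue; // skip entries without a value
            }
            result.put(kv.substring(0, pos), kv.substring(pos + 1));
        }
        return result;
    }

    // True if the device reported the given key at all.
    static boolean supports(Map<String, String> params, String key) {
        return params.containsKey(key);
    }

    public static void main(String[] args) {
        // Sample string invented for this example; real devices report many more keys.
        Map<String, String> params = parse("sharpness=10;sharpness-max=30;zoom=0");
        System.out.println(supports(params, "sharpness"));
        System.out.println(supports(params, "saturation"));
    }
}
```

On the device, you would only call parameters.set("sharpness", ...) when supports(...) returns true for that key.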
I am new to Jsoup, so sorry if my question is too trivial.
I am trying to extract article text from http://www.nytimes.com/, but when I print the parsed document
I cannot see any articles in the output.
public class App
{
public static void main( String[] args )
{
String url = "http://www.nytimes.com/";
Document document;
try {
document = Jsoup.connect(url).get();
System.out.println(document.html()); // Articles not getting printed
//System.out.println(document.toString()); // Same here
String title = document.title();
System.out.println("title : " + title); // Title is fine
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
OK, I have also tried to parse "http://en.wikipedia.org/wiki/Big_data" to retrieve the wiki data. Same issue there: I am not getting the wiki data in the output.
Any help or hint will be much appreciated.
Thanks.
Here's how to get the text of all <p class="summary"> elements:
final String url = "http://www.nytimes.com/";
Document doc = Jsoup.connect(url).get();
for( Element element : doc.select("p.summary") )
{
if( element.hasText() ) // Skip those tags without text
{
System.out.println(element.text());
}
}
If you need all <p> tags, without any filtering, you can use doc.select("p") instead. But in most cases it's better to select only those you need (see here for Jsoup Selector documentation).
I am using version 2 of the YouTube API and developing my project in Java.
I want to fetch a comment and the replies associated with it, but I do not know how to do it. I have tried everything described in the YouTube developer guide without success.
Can anyone give me a code example showing how to achieve this?
For now I am using this code to fetch the replies to a comment (hardcoded in the code):
int startIndex = 1, maxResult = 50, j = 0;
url = "http://gdata.youtube.com/feeds/api/videos/6afPT1hbMLk?client=comment+research?start-index%3D1";
YouTubeService service = new YouTubeService(appName);
YouTubeQuery query = new YouTubeQuery(new URL(url));
try {
CommentFeed commentFeed = service.query(query, CommentFeed.class);
} catch (IOException | ServiceException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
but it gives me the following exception:
com.google.gdata.util.ParseException: [Line 1, Column 268] Invalid root element, expected (namespace uri:local name) of (http://www.w3.org/2005/Atom:feed), found (http://www.w3.org/2005/Atom:entry
at com.google.gdata.util.XmlParser.throwParseException(XmlParser.java:730)
at com.google.gdata.util.XmlParser.parse(XmlParser.java:693)
at com.google.gdata.util.XmlParser.parse(XmlParser.java:576)
at com.google.gdata.data.BaseFeed.parseAtom(BaseFeed.java:867)
at com.google.gdata.wireformats.input.AtomDataParser.parse(AtomDataParser.java:68)
at com.google.gdata.wireformats.input.AtomDataParser.parse(AtomDataParser.java:39)