Retrieve Google results programmatically

How do I create a Java program that enters the words "Hello World" into Google and then retrieves the html from the results page? I'm not trying to use the Robot class.

URL url = new URL("http://www.google.com/search?q=hello+world");
url.openStream(); // returns an InputStream which you can read with e.g. a BufferedReader
If you make repeated programmatic requests to Google this way, it will quickly start redirecting you to "we're sorry, but you look like a robot" pages.
You may be better off using Google's Custom Search API.
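As a rough sketch of the reading step mentioned in the comment above (this is not part of the original answer): the helper below drains any InputStream into a String with a BufferedReader. The class and method names are made up for illustration, and UTF-8 is an assumption; check which charset the server actually declares.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original answer.
final class StreamReaderSketch {
    /** Reads the whole stream as text, ending every line with '\n'. UTF-8 is assumed. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

With the URL from the answer, this would be called as String html = StreamReaderSketch.readAll(url.openStream());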

To perform a Google search from a program, you need a developer API key and a custom search engine ID. You can get both from the URLs below.
Google Developers Console: https://cloud.google.com/console/project
Google Custom Search: https://www.google.com/cse/all
Once you have both the key and the ID, use them in the program below. Replace apiKey and customSearchEngineKey with your own values.
For step-by-step information, see http://www.basicsbehind.com/google-search-programmatically/
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
public class CustomGoogleSearch {
final static String apiKey = "AIzaSyAFmFdHiFK783aSsdbq3lWQDL7uOSbnD-QnCnGbY";
final static String customSearchEngineKey = "00070362344324199532843:wkrTYvnft8ma";
final static String searchURL = "https://www.googleapis.com/customsearch/v1?";
public static String search(String pUrl) {
try {
URL url = new URL(pUrl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
StringBuffer buffer = new StringBuffer();
while ((line = br.readLine()) != null) {
buffer.append(line);
}
return buffer.toString();
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
private static String buildSearchString(String searchString, int start, int numOfResults) {
String toSearch = searchURL + "key=" + apiKey + "&cx=" + customSearchEngineKey + "&q=";
// percent-encode spaces in the search query as %20
String newSearchString = searchString.replace(" ", "%20");
toSearch += newSearchString;
// specify response format as json
toSearch += "&alt=json";
// specify starting result number
toSearch += "&start=" + start;
// specify the number of results you need from the starting position
toSearch += "&num=" + numOfResults;
System.out.println("Search URL: " + toSearch);
return toSearch;
}
public static void main(String[] args) throws Exception {
String url = buildSearchString("BasicsBehind", 1, 10);
String result = search(url);
System.out.println(result);
}
}

Related

G suite account get report java sample question

I am trying to use this API to get a report with Java. Here is the link:
https://developers.google.com/admin-sdk/reports/v1/appendix/activity/meet
and here is what I am using now:
public static String getGraph() {
String PROTECTED_RESOURCE_URL = "https://www.googleapis.com/admin/reports/v1/activity/users/all/applications/meet?eventName=call_ended&maxResults=10&access_token=";
String graph = "";
try {
URL urUserInfo = new URL(PROTECTED_RESOURCE_URL + "access_token");
HttpURLConnection connObtainUserInfo = (HttpURLConnection) urUserInfo.openConnection();
if (connObtainUserInfo.getResponseCode() == HttpURLConnection.HTTP_OK) {
StringBuilder sbLines = new StringBuilder("");
BufferedReader reader = new BufferedReader(
new InputStreamReader(connObtainUserInfo.getInputStream(), "utf-8"));
String strLine = "";
while ((strLine = reader.readLine()) != null) {
sbLines.append(strLine);
}
graph = sbLines.toString();
}
} catch (IOException ex) {
ex.printStackTrace();
}
return graph;
}
I am pretty sure this is not a smart way to do it, and the string I get back is quite complex. Is there a Java sample that retrieves the data directly instead of using Java's built-in HTTP request classes?
Or is there a class I can import to convert the JSON string into an object?
Can anyone help?
I have been trying this for many days already.
Thanks!

Google's Custom Search as if manually searched

I want to use Google's Custom Search API to search the web for song lyrics from Java.
To get the name and artist of the song currently playing I use Tesseract OCR. Even when the OCR works perfectly, I often don't get any results.
But when I try it manually (open Google in the web browser and search for the same string), it works fine.
So I don't really know what the difference is between the manual search and the API call.
Do I have to add some parameters to the API request?
//The String searchString is what I am searching for, so the song name and artist
String searchUrl = "https://www.googleapis.com/customsearch/v1?key=(myKEY)=de&cx=(myID)&q=" + searchString + "lyrics";
String data = getData(searchUrl);
JSONObject json = new JSONObject(data);
String link = "";
try
{
link = json.getJSONArray("items").getJSONObject(0).getString("link");
URI url = new URI(link);
System.out.println(link);
Desktop.getDesktop().browse(url);
}
catch(Exception e)
{
System.out.println("No Results");
}
private static String getData(String _urlLink) throws IOException
{
StringBuilder result = new StringBuilder();
URL url = new URL(_urlLink);
URLConnection conn = url.openConnection();
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while((line = rd.readLine()) != null)
{
result.append(line);
}
rd.close();
return result.toString();
}

Try removing =de before &cx, and use + to represent the spaces between words, like this: https://www.googleapis.com/customsearch/v1?key=(yourKEY)&cx=(yourID)&q=paradise+coldplay+lyrics
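One way to get the + separators without replacing characters by hand is java.net.URLEncoder, which also escapes other reserved characters that OCR output might contain. The sketch below only illustrates that idea; the class name, method name and parameter names are made up, and only the endpoint and the key, cx and q parameters come from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; only the endpoint and parameter names come from the question.
final class SearchUrlSketch {
    static final String BASE = "https://www.googleapis.com/customsearch/v1";

    static String build(String key, String cx, String query) throws UnsupportedEncodingException {
        // URLEncoder applies form encoding: spaces become '+', reserved characters become %XX
        return BASE + "?key=" + URLEncoder.encode(key, "UTF-8")
                + "&cx=" + URLEncoder.encode(cx, "UTF-8")
                + "&q=" + URLEncoder.encode(query, "UTF-8");
    }
}
```

Note that the code in the question concatenates searchString + "lyrics" with no space in between, so the last word of the song title and "lyrics" run together; passing searchString + " lyrics" as the query keeps them separate.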

How to take multiple numbers from a string into double variables

I would like to pass some values I have in a string into double variables. The string output looks like this:
{
"high":"1635.07",
"last":"1635.07",
"timestamp":"1489299397",
"volume":"321.34139374",
"vwap":"1602.72987907",
"low":"1595.03",
"ask":"1635.89",
"bid":"1605.10"
}
I just want this data to be like:
double high = (value of high in string);
double last = (value of last in string);
etc.
I'm having trouble because Java throws an error, I believe because of the mix of words and numbers.
Thanks in advance for the help.
Code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.swing.JOptionPane;
public class btc {
private final String USER_AGENT = "Mozilla/5.0";
public static void main(String[] args) throws Exception {
btc http = new btc();
http.sendGet();
}
// HTTP GET request
private void sendGet() throws Exception {
String url = "https://api.quadrigacx.com/v2/ticker?book=btc_cad";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
// optional default is GET
con.setRequestMethod("GET");
//add request header
con.setRequestProperty("User-Agent", USER_AGENT);
System.out.println("\nSending 'GET' request to URL : " + url);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
//write to variables
String test = response.toString();
//double high = test("high");
//Double high = Double.parseDouble(test);
System.out.println(test);
//print result
//JOptionPane.showMessageDialog(null, response.toString());
}
}

As already mentioned in the comments, what you are receiving from the server is a JSON object, as documented in QuadrigaCX's API description. It should be parsed as such, because the order of the members may vary, as may the whitespace.
What's interesting about this JSON string is that all values are actually strings, since they are enclosed in double quotation marks. But these strings contain values that can be parsed as double.
You can use minimal-json, a minimalistic Java library that lets you parse JSON and access the contained values directly. The following code uses it to read high and last as double values:
JsonObject jsonObject = Json.parse(responseBody).asObject();
double high = Double.parseDouble(jsonObject.get("high").asString());
double last = Double.parseDouble(jsonObject.get("last").asString());
Here responseBody corresponds to what you have named test in your sendGet method and is the response from the web server as one string.
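If adding a library is not an option, the same values can be pulled out with the standard library alone. The sketch below swaps the JSON parser for a regular expression, which is a different and more fragile technique than the one in the answer: it assumes a flat object whose values are all quoted numbers, as in the question, and a real JSON parser remains the more robust choice. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stdlib-only alternative; not a general JSON parser.
final class TickerSketch {
    // Matches "key":"number" pairs in a flat object like the one in the question
    private static final Pattern PAIR =
            Pattern.compile("\"(\\w+)\"\\s*:\\s*\"(-?\\d+(?:\\.\\d+)?)\"");

    static Map<String, Double> parse(String json) {
        Map<String, Double> values = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            // group 1 is the member name, group 2 the quoted numeric text
            values.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return values;
    }
}
```

With the response string from sendGet, double high = TickerSketch.parse(test).get("high"); would then give the value as a double.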

How to get name of website from any string url [closed]

I am given a String that contains any valid URL.
I have to find only the name of the website from the given URL.
I also have to ignore subdomains.
For example:
http://www.yahoo.com => yahoo
www.google.co.in => google
http://in.com => in
http://india.gov.in/ => india
https://in.yahoo.com/ => yahoo
http://philotheoristic.tumblr.com/ => tumblr
https://in.movies.yahoo.com/ => yahoo
How can I do this?

You can make use of the URL class.
From the documentation: http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html
import java.net.*;
import java.io.*;
public class ParseURL {
public static void main(String[] args) throws MalformedURLException {
URL aURL = new URL("http://example.com:80/docs/books/tutorial"
+ "/index.html?name=networking#DOWNLOADING");
System.out.println("protocol = " + aURL.getProtocol());
System.out.println("authority = " + aURL.getAuthority());
System.out.println("host = " + aURL.getHost());
System.out.println("port = " + aURL.getPort());
System.out.println("path = " + aURL.getPath());
System.out.println("query = " + aURL.getQuery());
System.out.println("filename = " + aURL.getFile());
System.out.println("ref = " + aURL.getRef());
}
}
Here is the output displayed by the program:
protocol = http
authority = example.com:80
host = example.com // name of website
port = 80
path = /docs/books/tutorial/index.html
query = name=networking
filename = /docs/books/tutorial/index.html?name=networking
ref = DOWNLOADING
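So by using aURL.getHost() you can get the website's host name. To ignore subdomains you can split it on ".". Because split takes a regular expression, the dot must be escaped: aURL.getHost().split("\\.") returns the labels of the host. For example.com the name is element [0]; for a host with a subdomain, such as in.yahoo.com, element [0] is the subdomain ("in"), so the index has to be chosen accordingly.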
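The split idea above can be sketched as follows, so that it handles the examples from the question. This is only a heuristic: the set of suffix labels is a tiny hypothetical list chosen to fit those examples, and the class and method names are made up. A dependable version would consult the Public Suffix List instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Heuristic sketch; SUFFIX_LABELS is a hypothetical, deliberately tiny list.
final class SiteNameSketch {
    private static final Set<String> SUFFIX_LABELS =
            new HashSet<>(Arrays.asList("com", "org", "net", "gov", "co", "in"));

    static String siteName(String url) {
        // URI only reports a host when a scheme is present, so add one if it is missing
        String withScheme = url.contains("://") ? url : "http://" + url;
        String host = URI.create(withScheme).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in: " + url);
        }
        // split takes a regular expression, so the dot has to be escaped
        List<String> labels = new ArrayList<>(Arrays.asList(host.split("\\.")));
        // drop trailing suffix labels, but always keep at least one label
        while (labels.size() > 1 && SUFFIX_LABELS.contains(labels.get(labels.size() - 1))) {
            labels.remove(labels.size() - 1);
        }
        // the last remaining label is taken as the site name; subdomains to its left are ignored
        return labels.get(labels.size() - 1);
    }
}
```

The "keep at least one label" guard is what makes http://in.com come out as "in" even though "in" is also in the suffix list.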

Regular expressions may help you:
String str = "www.google.co.in";
String [] res = str.split("(\\.|//)+(?=\\w)");
System.out.println(res[1]);
A regular expression represents a set of strings: every string that matches the expression. In the code above, the split argument is a regular expression that matches one or more "." or "//" separators followed by a word character.
These "." and "//" substrings are the separators used to split the string into parts.
"www.google.co.in" is split into: www, google, co, in. Since the solution takes the second element of the split array (index 1), the result is google.

I found something similar, although slightly different: it returns the page title for each URL.
http://www.yahoo.com => Yahoo
http://www.google.co.in => Google
http://in.com => In.com Offers Videos, News, Photos, Celebs, Live TV Channels.....
http://india.gov.in/ => National Portal of India
https://in.yahoo.com/ => Yahoo India
http://philotheoristic.tumblr.com/ => Philotheoristic
https://in.movies.yahoo.com/ => Yahoo India Movies - Bollywood News, Movie Reviews & Hindi Movie Videos
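Here is the code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TitleExtractor {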
/* the CASE_INSENSITIVE flag accounts for
* sites that use uppercase title tags.
* the DOTALL flag accounts for sites that have
* line feeds in the title text */
private static final Pattern TITLE_TAG =
Pattern.compile("\\<title>(.*)\\</title>", Pattern.CASE_INSENSITIVE|Pattern.DOTALL);
/**
* @param url the HTML page
* @return title text (null if document isn't HTML or lacks a title tag)
* @throws IOException
*/
public static String getPageTitle(String url) throws IOException {
URL u = new URL(url);
URLConnection conn = u.openConnection();
// ContentType is an inner class defined below
ContentType contentType = getContentTypeHeader(conn);
if (!contentType.contentType.equals("text/html"))
return null; // don't continue if not HTML
else {
// determine the charset, or use the default
Charset charset = getCharset(contentType);
if (charset == null)
charset = Charset.defaultCharset();
// read the response body, using BufferedReader for performance
InputStream in = conn.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
int n = 0, totalRead = 0;
char[] buf = new char[1024];
StringBuilder content = new StringBuilder();
// read until EOF or first 8192 characters
while (totalRead < 8192 && (n = reader.read(buf, 0, buf.length)) != -1) {
content.append(buf, 0, n);
totalRead += n;
}
reader.close();
// extract the title
Matcher matcher = TITLE_TAG.matcher(content);
if (matcher.find()) {
/* replace any occurrences of whitespace (which may
* include line feeds and other uglies) as well
* as HTML brackets with a space */
return matcher.group(1).replaceAll("[\\s\\<>]+", " ").trim();
}
else
return null;
}
}
/**
* Loops through response headers until Content-Type is found.
* @param conn
* @return ContentType object representing the value of
* the Content-Type header
*/
private static ContentType getContentTypeHeader(URLConnection conn) {
int i = 0;
boolean moreHeaders = true;
do {
String headerName = conn.getHeaderFieldKey(i);
String headerValue = conn.getHeaderField(i);
if (headerName != null && headerName.equals("Content-Type"))
return new ContentType(headerValue);
i++;
moreHeaders = headerName != null || headerValue != null;
}
while (moreHeaders);
return null;
}
private static Charset getCharset(ContentType contentType) {
if (contentType != null && contentType.charsetName != null && Charset.isSupported(contentType.charsetName))
return Charset.forName(contentType.charsetName);
else
return null;
}
/**
* Class holds the content type and charset (if present)
*/
private static final class ContentType {
private static final Pattern CHARSET_HEADER = Pattern.compile("charset=([-_a-zA-Z0-9]+)", Pattern.CASE_INSENSITIVE|Pattern.DOTALL);
private String contentType;
private String charsetName;
private ContentType(String headerValue) {
if (headerValue == null)
throw new IllegalArgumentException("ContentType must be constructed with a not-null headerValue");
int n = headerValue.indexOf(";");
if (n != -1) {
contentType = headerValue.substring(0, n);
Matcher matcher = CHARSET_HEADER.matcher(headerValue);
if (matcher.find())
charsetName = matcher.group(1);
}
else
contentType = headerValue;
}
}
}
Making use of this class is simple:
String title = TitleExtractor.getPageTitle("http://en.wikipedia.org/");
System.out.println(title);
Here is the link:
http://www.gotoquiz.com/web-coding/programming/java-programming/how-to-extract-titles-from-web-pages-in-java/
I hope this helps.

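There is no reliable way to derive a valid website name from a URL in general. But if you just want to cut a particular part out of the URL string, you can do it with string operations, for example:
if (url.endsWith(".co.in")) {
    int end = url.length() - ".co.in".length();    // index where the ".co.in" suffix starts
    int start = url.lastIndexOf('.', end - 1) + 1; // first character after the preceding dot (0 if none)
    website = url.substring(start, end);
}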

Read text from a webpage in Java

I'm new to Java and I'm trying to create a library for fetching Brazilian addresses from a web service, but I can't read the response.
In the constructor of the class I have a result string to which I want to append the response. Once this variable is populated with the response, I will know what to do.
The problem is that, for some reason, the BufferedReader object seems not to be working, so there is no response to read.
Here is the code:
package cepfacil;
import java.net.*;
import java.io.*;
import java.io.IOException;
public class CepFacil {
final String baseUrl = "http://www.cepfacil.com.br/service/?filiacao=%s&cep=%s&formato=%s";
private String zipCode, apiKey, state, addressType, city, neighborhood, street, status = "";
public CepFacil(String zipCode, String apiKey) throws IOException {
String line = "";
try {
URL apiUrl = new URL("http://www.cepfacil.com.br/service/?filiacao=" + apiKey + "&cep=" +
CepFacil.parseZipCode(zipCode) + "&formato=texto");
String result = "";
BufferedReader in = new BufferedReader(new InputStreamReader(apiUrl.openStream()));
while ((line = in.readLine()) != null) {
result += line;
}
in.close();
System.out.println(line);
} catch (MalformedURLException e) {
e.printStackTrace();
}
this.zipCode = zipCode;
this.apiKey = apiKey;
this.state = state;
this.addressType = addressType;
this.city = city;
this.neighborhood = neighborhood;
this.street = street;
}
}
So here is how the code is supposed to work, you build an object like this:
String zipCode = "53416-540";
String token = "0E2ACA03-FC7F-4E87-9046-A8C46637BA9D";
CepFacil address = new CepFacil(zipCode, token);
// so the apiUrl object string inside my class definition will look like this:
// http://www.cepfacil.com.br/service/?filiacao=0E2ACA03-FC7F-4E87-9046-A8C46637BA9D&cep=53416540&formato=texto
// which you can check, is a valid url with content in there
I've omitted some parts of this code for brevity, but all the methods called in the constructor are defined in my code, and there are no compilation or runtime errors.
I'd appreciate any help you could give me, and I'd love to hear the simplest possible solutions :)
Thanks in advance!
UPDATE: now that I have fixed the problem (huge props to @Uldz for pointing it out), it is open sourced: http://www.rodrigoalvesvieira.com/cepfacil/

In
System.out.println(line + "rodrigo");
you output line, not result. After the loop ends, line is null, so print result instead.

There could be multiple reasons.
Open the URL as an HttpURLConnection; this lets you see the response code and more information about the response you get from the server.
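A minimal sketch of that suggestion, assuming the service answers over plain HTTP and sends UTF-8 text (both are assumptions, and the class and method names are made up): the helper reports the status code together with the body, so a failed request shows up instead of looking like an empty response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original answer.
final class ResponseInspectorSketch {
    /** Returns the status code on the first line, followed by the response body. */
    static String describe(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode(); // sends the request if it has not been sent yet
        // For 4xx/5xx responses getInputStream() throws; the server's message is on the error stream
        InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        StringBuilder body = new StringBuilder();
        if (in != null) { // getErrorStream() may return null
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
        }
        return code + "\n" + body;
    }
}
```

In the constructor from the question this could be called as System.out.println(ResponseInspectorSketch.describe((HttpURLConnection) apiUrl.openConnection()));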

You could, and probably should, pass an encoding to InputStreamReader.
Also, result is built without newlines; the version below appends one after each line.
BufferedReader in = new BufferedReader(new InputStreamReader(apiUrl.openStream()));
while ((line = in.readLine()) != null) {
System.out.println("Line: " + line);
String[] keyValue = line.split("\\s*=\\s*", 2);
if (keyValue.length != 2) {
System.err.println("*** Line: " + line);
continue;
}
switch (keyValue[0]) {
case "status":
status = keyValue[1];
break;
...
default:
System.err.println("*** Key wrong: " + line);
}
result += line + "\n";
}
in.close();
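The loop above can also be written with the encoding passed explicitly and the fields collected into a map rather than a switch. This is only a sketch under assumptions: the charset has to match what the service really sends, and the class and method names are made up. Lines without an "=" are skipped rather than reported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not from the original answer.
final class KeyValueSketch {
    /** Parses "key=value" lines into a map, keeping the order in which they appear. */
    static Map<String, String> parse(InputStream in, Charset charset) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // limit 2 keeps any further '=' characters inside the value
                String[] keyValue = line.split("\\s*=\\s*", 2);
                if (keyValue.length == 2) {
                    fields.put(keyValue[0].trim(), keyValue[1].trim());
                }
            }
        }
        return fields;
    }
}
```

Assuming the response is UTF-8, the constructor could then use Map<String, String> fields = KeyValueSketch.parse(apiUrl.openStream(), StandardCharsets.UTF_8); followed by status = fields.get("status");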
