How can I fully parse HTML without a third-party library?

I am puzzled by this problem.
I can fetch the HTML of a page as shown below.
package org.owls.parser.html;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HTMLParser {
    // Downloads the raw response body of the given URL and returns it as one string.
    public static String getHTTPStringsFromWeb(String urlStr) throws Exception {
        StringBuilder sb = new StringBuilder();
        URL url = new URL(urlStr);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (con.getResponseCode() == HttpURLConnection.HTTP_OK) {
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
                String line;
                while ((line = br.readLine()) != null) {
                    // readLine() strips the line terminator, so put one back
                    sb.append(line).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
This code works well, but there is a problem: it cannot get dynamic data that is produced by Ajax calls.
So I want to get the full page. Is that possible?
People talk about jsoup, but I want to know whether there is any way to do this natively.
Thanks :D

There is an inherent problem in what you are trying to do: you need a web browser (or an equivalent environment) to execute the Ajax requests. Reading the page into a string and looking for URLs is not enough, because the scripts may do something special with the data that you won't be able to reproduce.
You will have to use something like PhantomJS, which can load and parse pages in a headless environment.

Related

Google Custom Search Engine Java output

I am running this program just fine:
package cse;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class cse {
    public static void main(String[] args) throws Exception {
        String key = "My API KEY";
        String qry = "Test";
        if (qry.contains(" ")) {
            qry = qry.replace(' ', '+');
            System.out.println("here");
            System.out.println(qry);
        }
        URL url = new URL("https://www.googleapis.com/customsearch/v1?key=" + key + "&cx=MY_CUSTOM SEARCH&q=" + qry + "&alt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        BufferedReader br = new BufferedReader(new InputStreamReader(
                (conn.getInputStream())));
        String output;
        System.out.println("Output from Server .... \n");
        while ((output = br.readLine()) != null) {
            if (output.contains("\"link\": \"")) {
                String link = output.substring(output.indexOf("\"link\": \"") + ("\"link\": \"").length(), output.indexOf("\","));
                System.out.println(link); // Will print the Google search links
            }
        }
        conn.disconnect();
    }
}
Now what I want to do is have it only return the one result, and add this result to a string array. I could even just grab the first result and put it in there if that would be easier but I'm just not sure how to go about this.
So basically, if I searched the same query three times, the array would consist of [http://www.speedtest.net/,http://www.speedtest.net/,http://www.speedtest.net/]
Thanks in advance for any help, which would be much appreciated!

If you want to sort the data (ascending or descending) use a Query. The Table doesn't have any order. Even the Fields don't have any relevant order.
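For the question above, one way to keep only the first result is to stop at the first line that contains a "link" field and append that value to a list. The following is a minimal sketch under that assumption. The class and method names (FirstLinkCollector, firstLink, collect) are hypothetical, and it reuses the question's line-by-line string matching rather than a real JSON parser, so it only works while the response is pretty-printed with one field per line.

```java
import java.util.List;

class FirstLinkCollector {
    private static final String MARKER = "\"link\": \"";

    // Returns the value of the first "link" field found in the given lines,
    // or null if no line contains one.
    static String firstLink(Iterable<String> lines) {
        for (String line : lines) {
            int start = line.indexOf(MARKER);
            if (start >= 0) {
                start += MARKER.length();
                int end = line.indexOf('"', start); // closing quote of the value
                if (end > start) {
                    return line.substring(start, end);
                }
            }
        }
        return null;
    }

    // Appends the first link of one response to the running list of results.
    static void collect(List<String> results, Iterable<String> responseLines) {
        String link = firstLink(responseLines);
        if (link != null) {
            results.add(link);
        }
    }
}
```

Calling collect once per search, with the lines read from br, would give a list holding one link per query; results.toArray(new String[0]) turns it into a string array if one is needed.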

How to access http server directory from a java application?

I want to access a file stored under an HTTP directory, for example http://www.zen134237.zen.co.uk/,
from a Java application. Assume the server allows directory listing or the file is available over FTP. Could someone point me in the right direction?

You mentioned you want to access a file under the http directory. Do you know the location of this file? You can use Java's net package and open a url connection. Then extract the contents from this url connection.
Taken from the java tutorials:
http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        URLConnection yc = oracle.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

There are numerous examples on the web, so I'll just leave two pointers here:
Writing an FTP Client
Apache HttpClient - Tutorial

How to get data from a URL?

I have data provided in a website and it is presented as in the following image:
This website offers values of some parameters, which are updated every so often.
Now I want to retrieve this data by column name and row number in an Android app. I know how to open an HTTP connection, but I have no idea where to start or how to read the data shown in the image.

Unless you have a dedicated data source to work with, you have to read the website's contents and then process them yourself. Here is an example from the Java tutorials on how to read from a URL connection.
import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        URLConnection yc = oracle.openConnection();
        BufferedReader in = new BufferedReader(new InputStreamReader(
                yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
EDIT:
If you are behind a proxy you should also set these system properties (to the appropriate values):
System.setProperty("http.proxyHost", "3.182.12.1");
System.setProperty("http.proxyPort", "1111");

If the data is plain text and the format of the table doesn't change, you can parse the entire table. For example, after reading the "------- ..." line you can parse the values of each following line using a Scanner:
while ((inputLine = in.readLine()) != null) {
    // Split the current line on single spaces
    Scanner s = new Scanner(inputLine).useDelimiter(" ");
    while (s.hasNext()) {
        String value = s.next(); // add each value to a list or array
    }
    s.close();
}

You have to parse the whole content. Couldn't you call a web service to get this data, or query the database behind this view directly?

How to fetch content from other websites in jsp and java ???

I am a beginner in Java programming, and I want to establish myself as a Java programmer for web-based applications. To begin with, I want to develop a CMS that also fetches data from other websites, so I am looking for effective code to do that.
thank you

I guess you want something like java.net.URL.
Example:
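import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://stackoverflow.com");
        URLConnection con = url.openConnection();
        // URLConnection exposes the response body through getInputStream()
        try (BufferedReader br = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}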
This should print out the source code of Stackoverflow.com.

Use JSTL <c:import /> to import the content.

How to read a text file directly from Internet using Java?

I am trying to read some words from an online text file.
I tried doing something like this
File file = new File("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner scan = new Scanner(file);
but it didn't work, I am getting
http://www.puzzlers.org/pub/wordlists/pocket.txt
as the output and I just want to get all the words.
I know they taught me this back in the day but I don't remember exactly how to do it now, any help is greatly appreciated.

Use a URL instead of a File for anything that is not on your local computer.
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
Scanner s = new Scanner(url.openStream());
Actually, URL is more generally useful: it also works for local access (use a file: URL), for jar files, and for just about anything else that can be retrieved somehow.
The approach above interprets the file in your platform's default encoding. If you want to use the encoding indicated by the server instead, you have to use a URLConnection and parse its content type, as indicated in the answers to this question.
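As a rough illustration of that last point, the sketch below pulls the charset parameter out of a Content-Type header value and uses it to decode the response. The class and method names (ServerCharset, fromContentType, open) are hypothetical, and falling back to UTF-8 when the server names no charset is an assumption of this sketch, not something the answer prescribes.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ServerCharset {
    // Extracts the charset parameter from a Content-Type header value such as
    // "text/plain; charset=ISO-8859-1". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    // Opens the URL and returns a reader that decodes with the server-declared charset.
    static Reader open(URL url) throws IOException {
        URLConnection con = url.openConnection();
        Charset cs = fromContentType(con.getContentType(), StandardCharsets.UTF_8);
        return new InputStreamReader(con.getInputStream(), cs);
    }
}
```

The reader returned by open can be passed straight to new Scanner(...) in place of url.openStream().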
About your error: make sure your file compiles without any errors, which means you need to handle the exceptions. Click the red messages shown by your IDE; it should recommend a way to fix them. Do not start a program that does not compile (even if the IDE allows this).
Here it is with some sample exception handling:
try {
    URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
    Scanner s = new Scanner(url.openStream());
    // read from your scanner
}
catch(IOException ex) {
    // there was some connection problem, or the file did not exist on the server,
    // or your URL was not in the right format.
    // think about what to do now, and put it here.
    ex.printStackTrace(); // for now, simply output it.
}

Try something like this:
URL u = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
InputStream in = u.openStream();
Then use it as any plain old input stream
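To make "use it as any plain old input stream" concrete, here is a minimal sketch that drains an InputStream into a String. The class and method names (StreamText, readAll) are hypothetical, and the caller has to supply the charset because a raw stream carries no encoding information.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

class StreamText {
    // Reads the stream to its end and decodes the collected bytes with the given charset.
    // The caller remains responsible for closing the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }
}
```

With the stream from the snippet above, StreamText.readAll(in, StandardCharsets.UTF_8) would return the whole word list as one string, assuming the file is UTF-8 encoded.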

What really worked for me (source: Oracle documentation, "Reading Directly from a URL"):
import java.net.*;
import java.io.*;

public class UrlTextfile {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://yoursite.com/yourfile.txt");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

Using Apache Commons IO:
import org.apache.commons.io.IOUtils;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public static String readURLToString(String url) throws IOException
{
    try (InputStream inputStream = new URL(url).openStream())
    {
        return IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    }
}

Use this code to read an Internet resource into a String:
public static String readToString(String targetURL) throws IOException
{
    URL url = new URL(targetURL);
    BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(url.openStream()));
    StringBuilder stringBuilder = new StringBuilder();
    String inputLine;
    while ((inputLine = bufferedReader.readLine()) != null)
    {
        stringBuilder.append(inputLine);
        stringBuilder.append(System.lineSeparator());
    }
    bufferedReader.close();
    return stringBuilder.toString().trim();
}
This is based on here.

For an old school input stream, use this code:
InputStream in = new URL("http://google.com/").openConnection().getInputStream();

I did it the following way for an image; you should be able to do the same for text using similar steps.
// folder & name of image on PC
File fileObj = new File("C:\\Displayable\\imgcopy.jpg");
Boolean testB = fileObj.createNewFile();
System.out.println("Test this file eeeeeeeeeeeeeeeeeeee " + testB);
// image on server
URL url = new URL("http://localhost:8181/POPTEST2/imgone.jpg");
InputStream webIS = url.openStream();
FileOutputStream fo = new FileOutputStream(fileObj);
int c = 0;
do {
    c = webIS.read();
    System.out.println("==============> " + c);
    if (c != -1) {
        fo.write((byte) c);
    }
} while (c != -1);
webIS.close();
fo.close();
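The byte-by-byte loop above can also be expressed with the standard library's Files.copy, which reads the stream to its end and writes it to a file in one call. This is only a sketch of that alternative, not the answerer's method; the class and method names (StreamToFile, save) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamToFile {
    // Copies everything from the stream into the target file, replacing the file
    // if it already exists. Returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For the image case, the stream would come from url.openStream() inside a try-with-resources block and the target from Paths.get("C:\\Displayable\\imgcopy.jpg"), so the stream is closed even if the copy fails.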

Alternatively, you can use Guava's Resources object:
URL url = new URL("http://www.puzzlers.org/pub/wordlists/pocket.txt");
List<String> lines = Resources.readLines(url, Charsets.UTF_8);
lines.forEach(System.out::println);

corrected method is deprecated now. It is giving the option
private WeakReference<MyActivity> activityReference;
here solution will useful.
