I am following a tutorial on web scraping from the book "Web Scraping with Java". The following code gives me a NullPointerException. Part of the problem is that (line = in.readLine()) is always null, so the while loop in getUrl never runs. I do not know why it is always null, however. Can anyone offer me insight into this? This code should print the first paragraph of the Wikipedia article on CPython.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.*;
import java.io.*;

public class WikiScraper {

    public static void main(String[] args) {
        scrapeTopic("/wiki/CPython");
    }

    public static void scrapeTopic(String url) {
        String html = getUrl("http://www.wikipedia.org/" + url);
        Document doc = Jsoup.parse(html);
        String contentText = doc.select("#mw-content-text > p").first().text();
        System.out.println(contentText);
    }

    public static String getUrl(String url) {
        URL urlObj = null;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }
        URLConnection urlCon = null;
        BufferedReader in = null;
        String outputText = "";
        try {
            urlCon = urlObj.openConnection();
            in = new BufferedReader(new InputStreamReader(urlCon.getInputStream()));
            String line = "";
            while ((line = in.readLine()) != null) {
                outputText += line;
            }
            in.close();
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }
        return outputText;
    }
}
If you enter http://www.wikipedia.org//wiki/CPython in a web browser, it gets redirected to https://en.wikipedia.org/wiki/CPython, so use

String html = getUrl("https://en.wikipedia.org/" + url);

instead of

String html = getUrl("http://www.wikipedia.org/" + url);

Then line = in.readLine() can really read something.
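If you want to confirm the redirect from code before changing the URL, here is a minimal sketch (the class name RedirectCheck is just for illustration) that prints the status code and the Location header instead of trying to read the body:

import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.wikipedia.org//wiki/CPython");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // HttpURLConnection never follows http-to-https redirects on its own;
        // disable redirect following entirely so we can inspect the raw response
        con.setInstanceFollowRedirects(false);
        System.out.println("Status: " + con.getResponseCode());
        System.out.println("Location: " + con.getHeaderField("Location"));
        con.disconnect();
    }
}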
I'm getting a little familiar with how to read data from websites with Java and have tried to do this by reading data using a URLConnectionReader.
Unfortunately I get an UnknownHostException when I test the whole thing in a Java online compiler (https://www.jdoodle.com/online-java-compiler/).
Have I forgotten any imports? I followed a tutorial. Thanks in advance for every input!
Code (written for the jdoodle online Java compiler):
import java.net.*;
import java.io.*;

public class URLConnectionReader {

    public static void main(String[] args) {
        String output = getUrlContents("https://www.tradegate.de/orderbuch_umsaetze.php?isin=NO0010892359");
        System.out.println(output);
    }

    private static String getUrlContents(String theUrl) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = url.openConnection();
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line + "\n");
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return content.toString();
    }
}
Error message:
java.net.UnknownHostException: www.tradegate.de
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:591)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:285)
at java.base/sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:372)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1515)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:250)
at URLConnectionReader.getUrlContents(URLConnectionReader.java:21)
at URLConnectionReader.main(URLConnectionReader.java:8)
I separated the classes as follows and your code works without any exceptions.

class Main:

public class Main {
    public static void main(String[] args) throws ClassNotFoundException {
        URLConnectionReader urlcr = new URLConnectionReader();
        String output = urlcr.getUrlContents("https://www.tradegate.de/orderbuch_umsaetze.php?isin=NO0010892359");
        System.out.println(output);
    }
}
and the URLConnectionReader class:

import java.net.*;
import java.io.*;

public class URLConnectionReader {

    public String getUrlContents(String theUrl) {
        StringBuilder content = new StringBuilder();
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = url.openConnection();
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                content.append(line + "\n");
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return content.toString();
    }
}
I am trying to create a simple Java command line program that accepts the URL of a playlist and returns the playlist content.
I am getting the following response back:
Enter playlist url here (0 to quit):
http://gv8748.lu.edu:8084/sweng987/simple-01/playlist.m3u8
java.net.MalformedURLException: no protocol: http://lu8748.lu.edu:8084/sweng987/simple-01/playlist.m3u8
at java.base/java.net.URL.<init>(URL.java:627)
at java.base/java.net.URL.<init>(URL.java:523)
at java.base/java.net.URL.<init>(URL.java:470)
at edu.lu.sweng987.SimplePlaylist.getPlaylistUrl(SimplePlaylist.java:36)
at edu.lu.sweng987.SimplePlaylist.main(SimplePlaylist.java:21)
My code is the following:
package edu.psgv.sweng861;

import java.util.Scanner;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.*;

public class SimplePlaylist {

    private SimplePlaylist() {
        // don't allow instances
    }

    // The main function returns the URL entered
    public static void main(String[] args) throws IOException {
        String output = getPlaylistUrl("");
        System.out.println(output);
    }

    private static String getPlaylistUrl(String theUrl) {
        String content = "";
        Scanner scanner = new Scanner(System.in);
        boolean validInput = false;
        System.out.println("Enter playlist url here (0 to quit):");
        content = scanner.nextLine();
        try {
            URL url = new URL(theUrl);
            URLConnection urlConnection = (HttpURLConnection) url.openConnection();
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                content += line + "\n";
            }
            bufferedReader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return content;
    }
}
You have incorrectly used the method parameter when creating the URL instance, instead of the local variable that actually contains the URL.
Change

content = scanner.nextLine();
try {
    URL url = new URL(theUrl);

to

content = scanner.nextLine();
try {
    URL url = new URL(content);
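Put together, a sketch of the corrected method might look like this (it keeps the original structure; the local variable name input is just for clarity):

private static String getPlaylistUrl(String theUrl) {
    StringBuilder content = new StringBuilder();
    Scanner scanner = new Scanner(System.in);
    System.out.println("Enter playlist url here (0 to quit):");
    String input = scanner.nextLine(); // the URL the user typed
    try {
        URL url = new URL(input); // build the URL from the typed line, not the parameter
        URLConnection urlConnection = url.openConnection();
        BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream()));
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            content.append(line).append("\n");
        }
        bufferedReader.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return content.toString();
}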
I am trying to write a Java program which establishes a connection to Yahoo Finance and pulls some data off the website for a specific stock.
The program terminates with the exception "no line found", which is thrown at the if (input.hasNextLine()) statement. I get what the exception means, but I can't figure out what the error is.
I know that the problem is not in the URL construction, because the URL downloads the requested data from the web when copied into a web browser.
I hope someone can point me in the right direction; I have been puzzled for several hours, trying to search the forum, but no luck so far.
My code looks as follows:
import java.net.URL;
import java.net.URLConnection;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Scanner;

public class Download {

    public Download(String symbol, GregorianCalendar end, GregorianCalendar start) {
        // Creates the URL
        String url = "http://chart.finance.yahoo.com/table.csv?s=" + symbol +
                "&a=" + start.get(Calendar.MONTH) +
                "&b=" + start.get(Calendar.DAY_OF_MONTH) +
                "&c=" + start.get(Calendar.YEAR) +
                "&d=" + end.get(Calendar.MONTH) +
                "&e=" + end.get(Calendar.DAY_OF_MONTH) +
                "&f=" + end.get(Calendar.YEAR) +
                "&g=d&ignore=.csv";
        try {
            // Creates the URL object, and establishes connection
            URL yhoofin = new URL(url);
            URLConnection data = yhoofin.openConnection();
            // Opens an input stream to read from
            Scanner input = new Scanner(data.getInputStream(), "UTF-8");
            System.out.println(input.nextLine());
            // skips the first line
            if (input.hasNextLine()) {
                input.nextLine();
                // tries to print the data
                while (input.hasNextLine()) {
                    String line = input.nextLine();
                    System.out.println(line);
                }
            }
            // closes connection
            input.close();
        } catch (Exception e) {
            System.err.println(e);
        }
    }
}
with the following main method:

import java.util.GregorianCalendar;

public class test {
    public static void main(String[] args) {
        GregorianCalendar start = new GregorianCalendar(2015, 7, 10);
        GregorianCalendar end = new GregorianCalendar(2016, 7, 10);
        String symbol = "NVO";
        Download test = new Download(symbol, end, start);
        System.out.println("Done");
    }
}
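The answer below apparently addresses a redirect from the Yahoo endpoint: it prints the response headers and, if a Location header is present, reopens the connection against it before reading the CSV. The ".SA" suffix appended to the symbol matches the example URL in the first comment line.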
// http://chart.finance.yahoo.com/table.csv?s=ABCB4.SA&a=1&b=19&c=2017&d=2&e=19&f=2017&g=d&ignore=.csv
String url = "http://chart.finance.yahoo.com/table.csv?s=" + symbol + ".SA" +
        "&a=" + start.get(Calendar.MONTH) +
        "&b=" + start.get(Calendar.DAY_OF_MONTH) +
        "&c=" + start.get(Calendar.YEAR) +
        "&d=" + end.get(Calendar.MONTH) +
        "&e=" + end.get(Calendar.DAY_OF_MONTH) +
        "&f=" + end.get(Calendar.YEAR) +
        "&g=d&ignore=.csv";
System.out.println(url);
try {
    URL yhoofin = new URL(url);
    URLConnection data = yhoofin.openConnection();
    data.connect(); // not strictly necessary
    System.out.println("Connection Open! = " + data.getHeaderFields().toString());
    // if the server redirected, reopen the connection against the new location
    String redirect = data.getHeaderField("Location");
    if (redirect != null) {
        data = new URL(redirect).openConnection();
    }
    BufferedReader in = new BufferedReader(new InputStreamReader(data.getInputStream()));
    String inputLine;
    List<String> lines = new ArrayList<>(); // was undeclared in the original snippet
    System.out.println();
    in.readLine(); // skip the CSV header line
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
        lines.add(inputLine);
    }
} catch (IOException e) {
    e.printStackTrace();
}
I am trying to get recharge plan information for a service provider into my Java program. The website contains dynamic data, and when I fetch the URL using URLConnection I only get the static content. I want to automate reading the recharge plans of different websites into my program.
package com.fs.store.test;

import java.net.*;
import java.io.*;

public class MyURLConnection {

    private static final String baseTataUrl = "https://www.tatadocomo/pre-paypacks";

    public MyURLConnection() {
    }

    public void getMeData() {
        URLConnection urlConnection = null;
        BufferedReader in = null;
        try {
            URL url = new URL(baseTataUrl);
            urlConnection = url.openConnection();
            HttpURLConnection connection = null;
            connection = (HttpURLConnection) urlConnection;
            in = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()/*, "UTF-8"*/));
            String currentLine = null;
            StringBuilder line = new StringBuilder();
            while ((currentLine = in.readLine()) != null) {
                System.out.println(currentLine);
                line = line.append(currentLine.trim());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                in.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        MyURLConnection test = new MyURLConnection();
        System.out.println("About to call getMeData()");
        test.getMeData();
    }
}
A plain URLConnection only gives you the static HTML; the dynamic parts are filled in by JavaScript. You must use something like one of the HtmlEditorKits, with JavaScript enabled, and then get the content. See the examples at oreilly.
Alternatively, inspect the traffic. Firefox has a TamperData plugin, for instance. Then you may communicate with the underlying services more directly.
Use Apache's HttpClient to facilitate the communication, instead of a plain URL.
Maybe use some JSON library if JSON data comes back.
That is more work up front, but you can then skip some of the page loading.
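As a rough sketch of the HttpClient route (assuming Apache HttpClient 4.x on the classpath; PlanFetcher is a made-up class name, and the URL is taken verbatim from the question, where it may be incomplete):

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class PlanFetcher {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet get = new HttpGet("https://www.tatadocomo/pre-paypacks");
            try (CloseableHttpResponse response = client.execute(get)) {
                // this still returns only what the server sends for this URL;
                // the dynamic plan data comes from the endpoints you find by
                // inspecting the browser traffic
                String body = EntityUtils.toString(response.getEntity());
                System.out.println(body);
            }
        }
    }
}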
I am trying to run this code and I am facing a NullPointerException in my program. I used try and catch, but I do not know how to eliminate the problem.
Here is the code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.*;
import java.io.*;
import java.lang.NullPointerException;

public class WikiScraper {

    public static void main(String[] args) throws IOException {
        scrapeTopic("/wiki/Python");
    }

    public static void scrapeTopic(String url) {
        String html = getUrl("http://www.wikipedia.org/" + url);
        Document doc = Jsoup.parse(html);
        String contentText = doc.select("#mw-content-text>p").first().text();
        System.out.println(contentText);
        System.out.println("The url was malformed!");
    }

    public static String getUrl(String url) {
        URL urlObj = null;
        try {
            urlObj = new URL(url);
        } catch (MalformedURLException e) {
            System.out.println("The url was malformed!");
            return "";
        }
        URLConnection urlCon = null;
        BufferedReader in = null;
        String outputText = "";
        try {
            urlCon = urlObj.openConnection();
            in = new BufferedReader(new InputStreamReader(urlCon.getInputStream()));
            String line = "";
            while ((line = in.readLine()) != null) {
                outputText += line;
            }
            in.close();
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
            return "";
        }
        return outputText;
    }
}
The error shown is:
There was an error connecting to the URL
Exception in thread "main" java.lang.NullPointerException
at hello.WikiScraper.scrapeTopic(WikiScraper.java:17)
at hello.WikiScraper.main(WikiScraper.java:11)
You have

public static String getUrl(String url) {
    // ...
    return "";
}

which ends in an empty String whenever the connection fails. Jsoup.parse("") then produces a document with no paragraphs, so .first() returns null and calling .text() on it throws the NullPointerException.
Try

Document doc = Jsoup.connect("http://example.com/").get();

for example.
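Applied to the code in the question, a minimal sketch might look like this (it switches to the https en.wikipedia.org host so the request is not redirected away, uses a descendant selector because the paragraphs may be nested deeper than a direct child, and null-checks the result):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.IOException;

public class WikiScraper {
    public static void main(String[] args) {
        scrapeTopic("/wiki/Python");
    }

    public static void scrapeTopic(String url) {
        try {
            // let jsoup open the connection and follow redirects itself
            Document doc = Jsoup.connect("https://en.wikipedia.org" + url).get();
            Element firstParagraph = doc.select("#mw-content-text p").first();
            if (firstParagraph != null) {
                System.out.println(firstParagraph.text());
            } else {
                System.out.println("No paragraph found at " + url);
            }
        } catch (IOException e) {
            System.out.println("There was an error connecting to the URL");
        }
    }
}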