How to download protected web page using Java


We have a task to design a class that can download the source of any web page. But when I test my code and fetch a page like http://anidb.net/perl-bin/animedb.pl?show=main, nothing works.
Standard code like this fails:
import java.net.*;
import java.io.*;
public class URLReader {
    public static void main(String[] args) throws Exception {
        URL link = new URL("http://www.anidb.net/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(link.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Here is the result I got:
Šwq>²"¦§5´_ï__ÇUº=ôÙö?kŠ}~“bd`?l“Ïçz¢Çêõ>_"?j×‰R“y}K¸\Ìc_DLÙªÏ_
–óMm_¼_0”•ö°ËC_aí½sî¤ìÁS ‚>dC0ìs_–y¹ñ±ÏÝÜAø%È_äÖá__æ©A#,4x„Š¶_ëÉƒ?
I have tried everything, including cookies and request headers, but nothing seems to work. Any hints would be appreciated.

If you write your own HTTP client, you have to handle gzip encoding as well as chunked transfer encoding. It's better to use a library to download a web page.
Try something like this:
http://code.google.com/p/google-http-java-client/
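To show the gzip point without a third-party library, here is a minimal sketch using plain HttpURLConnection. It assumes the server honors Accept-Encoding and labels a compressed body with "Content-Encoding: gzip". The class and method names (GzipAwareFetch, decode, readAll, fetch) are hypothetical, and the text is assumed to be UTF-8. The JDK's HttpURLConnection already decodes chunked transfer encoding internally, but it does not decompress gzip, so that step is done by hand here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipAwareFetch {

    // Wrap the raw body in a GZIPInputStream only when the server says it is gzip-encoded
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read a whole stream into a String (assumes the page is UTF-8 encoded)
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        // Tell the server we can handle gzip; chunked transfer is decoded by the JDK itself
        con.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream body = decode(con.getInputStream(), con.getContentEncoding())) {
            return readAll(body);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://www.anidb.net/"));
    }
}
```

Checking the header keeps plain-text responses working unchanged. The next answer deals with servers that do not set the header correctly.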

The site you referred to in your question doesn't seem to honor the 'Accept-Encoding' request header, nor does it set the 'Content-Encoding' response header correctly.
Anyway, you can wrap the response stream in java.util.zip.GZIPInputStream to read the response as plain text:
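// Imports: java.io.ByteArrayOutputStream, java.net.HttpURLConnection, java.net.URL, java.util.zip.GZIPInputStream
public static void main(String[] args) throws Exception
{
    URL link = new URL("http://www.anidb.net/");
    HttpURLConnection con = (HttpURLConnection) link.openConnection();
    // The body arrives gzip-compressed, so decompress it before decoding the text
    try (GZIPInputStream in = new GZIPInputStream(con.getInputStream());
         ByteArrayOutputStream content = new ByteArrayOutputStream())
    {
        byte[] b = new byte[1024];
        int n;
        // Copy only the n bytes actually read on each pass, not the whole buffer
        while ((n = in.read(b)) != -1)
        {
            content.write(b, 0, n);
        }
        // Decode once at the end (assumes UTF-8) so multi-byte characters are not split
        System.out.println(content.toString("UTF-8"));
    }
}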
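The site reportedly sends gzip data without a correct Content-Encoding header. A more defensive variant (a sketch, not the answerer's code) therefore looks at the first two bytes of the body instead, because every gzip stream starts with the magic bytes 0x1f 0x8b. The hypothetical helper below peeks at those bytes with a PushbackInputStream and decompresses only when they match, so it also works when the server sends plain text. It uses readNBytes, which needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

public class GzipSniffer {

    // Return a stream that yields decompressed data if the input is gzip, or the raw data otherwise
    static InputStream maybeGunzip(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 2);
        byte[] magic = new byte[2];
        int n = pb.readNBytes(magic, 0, 2); // fewer than 2 only for very short bodies
        if (n > 0) {
            pb.unread(magic, 0, n); // put the peeked bytes back in front of the stream
        }
        boolean isGzip = n == 2 && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
        return isGzip ? new GZIPInputStream(pb) : pb;
    }
}
```

In the code above, new GZIPInputStream(con.getInputStream()) would then become GzipSniffer.maybeGunzip(con.getInputStream()).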

Related

Reproducing the Jackson vulnerability CVE-2017-7525 using Spring Boot and a custom Java class

I read about the ongoing Jackson vulnerability (CVE-2017-7525), which allows remote code execution, as explained here.
I made some modifications to the example class given on that page and wrote something like this:
import java.io.*;
import java.net.*;
public class Exploit extends com.sun.org.apache.xalan.internal.xsltc.runtime.AbstractTranslet {
    private static String urlString = "https://sv443.net/jokeapi/category/any?blacklistFlags=nsfwreligiouspolitical";
    public Exploit() throws Exception {
        StringBuilder result = new StringBuilder();
        URL url = new URL(urlString);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = rd.readLine()) != null) {
            result.append(line);
        }
        rd.close();
        // Let's see the joke in the logs
        System.out.println(result);
    }
    @Override
    public void transform(com.sun.org.apache.xalan.internal.xsltc.DOM document, com.sun.org.apache.xml.internal.dtm.DTMAxisIterator iterator, com.sun.org.apache.xml.internal.serializer.SerializationHandler handler) {
    }
    @Override
    public void transform(com.sun.org.apache.xalan.internal.xsltc.DOM document, com.sun.org.apache.xml.internal.serializer.SerializationHandler[] handler) {
    }
}
I compiled the .java file, opened the generated .class file, and passed its contents as part of the sample API request body provided. However, the custom code does not seem to have been executed: I expected the application logs to show the output of the request, but nothing is printed.
Does anyone have a simple example that showcases this vulnerability with Spring Boot and Jackson, through an API call with a bogus JSON payload?
I understand this is an unusual question, but I am looking into this interesting topic and hope someone out there has needed to demo this issue before.
In short, I want to demo this Java deserialization vulnerability with Spring Boot and Jackson by making an API call that passes a JSON document containing the compiled Java code to be executed.

How to get the content of a website into a String in Android Studio?

I want to display parts of a website's content in my app. I've seen some solutions here, but they are all very old and don't work with newer versions of Android Studio. Can anyone help?

jsoup (https://jsoup.org/) should help: it fetches the full page and lets you parse it by class, id, and so on. For instance, the code below fetches a page and prints its title:
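import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
// On Android, run this off the main thread: network calls on the UI thread throw NetworkOnMainThreadException
Document doc = Jsoup.connect("http://www.moodmusic.today/").get();
String title = doc.select("title").text();
System.out.println(title);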

If you want to get raw data from a target website, you will need to do the following:
Create a URL object with the link of the website specified in the parameter
Cast it to HttpURLConnection
Retrieve its InputStream
Convert it to a String
This works with Java in general, no matter which IDE you're using.
To retrieve a connection's InputStream:
// Create a URL object
URL url = new URL("https://yourwebsitehere.domain");
// Retrieve its input stream
HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
InputStream instream = connection.getInputStream();
Make sure to handle java.net.MalformedURLException and java.io.IOException.
To convert an InputStream to a String:
public static String toString(InputStream in) throws IOException {
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = reader.readLine()) != null) {
        builder.append(line).append("\n");
    }
    reader.close();
    return builder.toString();
}
You can copy and modify the code above and use it in your source code!
Make sure to have the following imports:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
Example:
public static String getDataRaw() throws IOException, MalformedURLException {
    URL url = new URL("https://yourwebsitehere.domain");
    HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
    InputStream instream = connection.getInputStream();
    return toString(instream);
}
To call getDataRaw(), handle IOException and MalformedURLException and you're good to go!
Hope this helps!
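The toString helper in the answer above decodes bytes with the platform's default charset. On Android that is UTF-8, but on desktop JVMs it is not necessarily so. A slightly more careful variant is sketched below. It assumes the server declares its encoding as a charset=... parameter in the Content-Type header, and it falls back to UTF-8 when none is given. The class and method names (CharsetAwareReader, charsetOf, readPage) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareReader {

    // Pull the charset out of a header such as "text/html; charset=ISO-8859-1"
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed name: fall through to the default
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // assumed default when no charset is declared
    }

    // Like getDataRaw(), but decodes with the charset the server declares
    static String readPage(String address) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        Charset cs = charsetOf(connection.getContentType());
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), cs))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append("\n");
            }
        }
        return builder.toString();
    }
}
```

Pages that declare their charset only in a <meta> tag are not covered by this sketch. An HTML parser such as jsoup handles that case.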

Downloaded File Has 0 Bytes

I'm trying to download an .mp3 music file from this URL to the project root directory, but the downloaded file is always 0 bytes (empty). The download also stops immediately.
I'm using the following code:
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
public class MusicDownloader
{
    public static void main(String[] arguments) throws MalformedURLException, IOException
    {
        download("https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297", "Ponies and Balloons");
    }
    public static void download(String url, String fileName) throws MalformedURLException, IOException
    {
        FileUtils.copyURLToFile(new URL(url), new File(fileName + ".mp3"));
    }
}
In a browser, downloading the file manually works flawlessly. A download link from another website (e.g. this one) was processed by the code without problems. What could be the problem here?
Sending a valid user-agent String doesn't work either.

The problem is with your URL, https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297: it responds with a "moved" redirect (HTTP 301/302), so you need to pick up its new URL. Using HttpURLConnection, I found that the redirected URL is https://youtube-audio-library.storage.googleapis.com/d0a68933f592c297. You can use the code below:
// Imports: java.io.BufferedInputStream, java.io.FileOutputStream, java.io.InputStream,
// java.net.HttpURLConnection, java.net.URL
String urlString = "https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297";
URL url = new URL(urlString);
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int statusCode = huc.getResponseCode(); // get response code
if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
        || statusCode == HttpURLConnection.HTTP_MOVED_PERM) { // if the file has moved, pick up the new URL
    urlString = huc.getHeaderField("Location");
    url = new URL(urlString);
    huc = (HttpURLConnection) url.openConnection();
}
System.out.println(urlString);
// Copy the body in 8 KB chunks; try-with-resources closes both streams
try (InputStream is = new BufferedInputStream(huc.getInputStream());
     FileOutputStream fos = new FileOutputStream("test.mp3")) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = is.read(buffer)) != -1) {
        fos.write(buffer, 0, n);
    }
}
You can check whether FileUtils offers the same behavior; I'd expect it to. Cheers :)
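The code in this answer follows a single 301/302 hop. A slightly more general sketch is below. Its class, RedirectFollower, is hypothetical and not part of the answer. It follows several hops, also treats 303, 307 and 308 as redirects, and resolves relative Location headers against the current URL. It turns off HttpURLConnection's built-in redirect handling, which by default does not follow redirects that switch protocol (for example from http to https).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class RedirectFollower {

    // Status codes whose Location header points to the resource's new address
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM      // 301
                || status == HttpURLConnection.HTTP_MOVED_TEMP  // 302
                || status == HttpURLConnection.HTTP_SEE_OTHER   // 303
                || status == 307 || status == 308;              // no named constants for these
    }

    // Resolve a Location header, which may be relative, against the URL that returned it
    static URL resolve(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    // Open a connection, following at most maxHops redirects by hand
    static HttpURLConnection open(URL url, int maxHops) throws IOException {
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // handle every redirect ourselves
            int status = con.getResponseCode();
            if (!isRedirect(status)) {
                return con; // final response; the caller reads the body
            }
            String location = con.getHeaderField("Location");
            con.disconnect();
            if (location == null) {
                throw new IOException("HTTP " + status + " without a Location header");
            }
            url = resolve(url, location);
        }
        throw new IOException("More than " + maxHops + " redirects");
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = open(new URL(args[0]), 5);
        System.out.println("Final URL: " + con.getURL());
    }
}
```

The returned connection's getInputStream() can then be copied to a file exactly as in the answer's loop.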

Because it is illegal and against YouTube's Terms of Service.
YouTube specifically blocks most generic ways of downloading MP3s from its site. A simple ten-line program won't work; otherwise piracy would be even bigger than it already is.
If they catch you, you WILL be blocked.

How do I work with the Expedia XML API in Java

I have an Expedia account for getting a hotel list, and they provide the data in XML format.
I need to process the XML and display HTML-formatted data on my website using Java. In PHP I used file_get_contents, but I don't know how to call the API from Java. Could someone explain it in detail?

The code below reads the page contents, like file_get_contents in PHP:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class URLExp {
    public static void main(String[] args) {
        try {
            URL google = new URL("http://www.google.com/");
            URLConnection yc = google.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Once you pass your URL to this code, you should get the XML you described. Then parse the XML and use it.
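For the "parse the XML and use it" step, here is a minimal sketch with the JDK's built-in DOM parser that turns a hotel list into an HTML fragment. The question doesn't show Expedia's actual XML, so the element names hotel and name (and the class HotelListToHtml) are hypothetical placeholders. In practice the sample string would be replaced by the XML read with the code above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HotelListToHtml {

    // Escape the characters that are special inside HTML text
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Turn each <hotel><name>...</name></hotel> element into an <li> item.
    // "hotel" and "name" are hypothetical element names, not Expedia's real schema.
    static String toHtml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so untrusted XML cannot pull in external entities
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList hotels = doc.getElementsByTagName("hotel");
        StringBuilder html = new StringBuilder("<ul>");
        for (int i = 0; i < hotels.getLength(); i++) {
            Element hotel = (Element) hotels.item(i);
            String name = hotel.getElementsByTagName("name").item(0).getTextContent();
            html.append("<li>").append(escapeHtml(name)).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<hotels><hotel><name>Sea View</name></hotel>"
                + "<hotel><name>Park &amp; Lake</name></hotel></hotels>";
        System.out.println(toHtml(xml)); // <ul><li>Sea View</li><li>Park &amp; Lake</li></ul>
    }
}
```

The sketch assumes every hotel element has a name child. Real data would need a null check there.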

Use Apache HttpClient to invoke the URL, and request XML by setting the Accept header to text/xml. The easiest way to parse the XML is with the Castor library. Post your code and we can help more.
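This answer recommends Apache HttpClient. As a swapped-in alternative that needs no extra dependency, here is a sketch with the JDK 11+ java.net.http client that asks for XML through the Accept header. The class name XmlRequest and the endpoint https://api.example.com/hotels are placeholders, not Expedia's real API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class XmlRequest {

    // Build a GET request that asks the server for an XML representation
    static HttpRequest xmlRequest(String address) {
        return HttpRequest.newBuilder()
                .uri(URI.create(address))
                .header("Accept", "text/xml, application/xml")
                .GET()
                .build();
    }

    // Send the request and return the body, failing on any non-2xx status
    static String fetchXml(String address) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(xmlRequest(address), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: replace with the real API URL and query parameters
        System.out.println(fetchXml("https://api.example.com/hotels"));
    }
}
```

Whether the server actually switches format based on Accept depends on the API; some APIs use a query parameter instead.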

Using Java to pull data from a webpage?

I'm attempting to make my first program in Java. The goal is to write a program that browses to a website and downloads a file for me. However, I don't know how to use Java to interact with the internet. Can anyone tell me what topics to look up/read about or recommend some good resources?

The simplest solution (without depending on any third-party library or platform) is to create a URL instance pointing to the web page / link you want to download, and read the content using streams.
For example:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
public class DownloadPage {
    public static void main(String[] args) throws IOException {
        // Make a URL to the web page
        URL url = new URL("http://stackoverflow.com/questions/6159118/using-java-to-pull-data-from-a-webpage");
        // Get the input stream through URL Connection
        URLConnection con = url.openConnection();
        InputStream is = con.getInputStream();
        // Once you have the Input Stream, it's just plain old Java IO stuff.
        // For this case, since you are interested in getting plain-text web page
        // I'll use a reader and output the text content to System.out.
        // For binary content, it's better to directly read the bytes from stream and write
        // to the target file.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
            String line = null;
            // read each line and write to System.out
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
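The comment in the code above notes that binary content should be copied as raw bytes rather than read line by line. That matters for the asker's goal of downloading a file. A minimal sketch using java.nio.file.Files.copy (Java 7+) follows. The class name DownloadFile and the command-line arguments are placeholders.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DownloadFile {

    // Copy every byte from the stream to the target file, replacing it if it exists
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            return Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL(args[0]);       // e.g. a link to a .zip or .pdf
        Path target = Paths.get(args[1]); // where to store it locally
        long bytes = save(url.openStream(), target);
        System.out.println("Saved " + bytes + " bytes to " + target);
    }
}
```

No Reader is involved, so the bytes are written exactly as received, with no charset conversion.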
Hope this helps.

The Basics
Look at these to build a solution more or less from scratch:
Start from the basics: The Java Tutorial's chapter on Networking, including Working With URLs
Make things easier for yourself: Apache HttpComponents (including HttpClient)
The Easily Glued-Up and Stitched-Up Stuff
You always have the option of calling external tools from Java using the exec() and similar methods. For instance, you could use wget, or cURL.
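As an illustration of the external-tool route, here is a sketch using ProcessBuilder, the more flexible sibling of Runtime.exec(). It assumes curl is installed and on the PATH; -s, -S, -L and -o are standard curl options (silent, still show errors, follow redirects, output file). The class name CurlDownload is hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class CurlDownload {

    // Build the curl command line: -s silent, -S still show errors, -L follow redirects, -o output file
    static List<String> curlCommand(String url, String outputFile) {
        return Arrays.asList("curl", "-sSL", "-o", outputFile, url);
    }

    // Run curl and wait for it; a non-zero exit code means the download failed
    static int download(String url, String outputFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(curlCommand(url, outputFile))
                .inheritIO() // let curl's error messages appear on our console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exit = download(args[0], args[1]);
        System.out.println(exit == 0 ? "Done" : "curl failed with exit code " + exit);
    }
}
```

Each argument is passed as a separate list element, so no shell is involved and URLs containing & or spaces need no quoting.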
The Hardcore Stuff
Then, if you want to move on to more full-fledged tools, the need for automated web testing has thankfully given us very practical ones. Look at:
HtmlUnit (powerful and simple)
Selenium, Selenium-RC
WebDriver/Selenium2 (still in the works)
JBehave with JBehave Web
Some other libs are purposefully written with web-scraping in mind:
JSoup
Jaunt
Some Workarounds
Java is a language, but also a platform, with many other languages running on it. Some of them offer great syntactic sugar or libraries that make it easy to build scrapers.
Check out:
Groovy (and its XmlSlurper)
or Scala (with great XML support)
If you know of a great library for Ruby (JRuby, with an article on scraping with JRuby and HtmlUnit) or Python (Jython) or you prefer these languages, then give their JVM ports a chance.
Some Supplements
Some other similar questions:
Scrape data from HTML using Java
Options for HTML Scraping

Here's my solution using URL and a try-with-resources statement, with the exceptions caught and rethrown:
/**
* Created by mona on 5/27/16.
*/
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
public class ReadFromWeb {
    public static void readFromWeb(String webURL) throws IOException {
        try {
            // Creating the URL inside the try makes the MalformedURLException handler reachable
            URL url = new URL(webURL);
            InputStream is = url.openStream();
            // try-with-resources closes the reader, and with it the underlying stream
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
            throw new MalformedURLException("URL is malformed!!");
        } catch (IOException e) {
            e.printStackTrace();
            // Keep the original exception as the cause instead of discarding it
            throw new IOException("Failed to read " + webURL, e);
        }
    }
    public static void main(String[] args) throws IOException {
        String url = "https://madison.craigslist.org/search/sub";
        readFromWeb(url);
    }
}
You could additionally save it to a file, depending on your needs, or parse it with XML or HTML libraries.

Since Java 11, the most convenient way is to use java.net.http.HttpClient from the standard library.
Example:
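// Imports: java.net.URI, java.net.http.HttpClient, java.net.http.HttpClient.Redirect,
// java.net.http.HttpClient.Version, java.net.http.HttpRequest, java.net.http.HttpResponse,
// java.net.http.HttpResponse.BodyHandlers, java.time.Duration
HttpClient client = HttpClient.newBuilder()
        .version(Version.HTTP_1_1)
        .followRedirects(Redirect.NORMAL)
        .connectTimeout(Duration.ofSeconds(20))
        // Optional, only behind a proxy (proxy.example.com is a placeholder host);
        // also needs java.net.ProxySelector and java.net.InetSocketAddress
        // .proxy(ProxySelector.of(new InetSocketAddress("proxy.example.com", 80)))
        // Optional: Authenticator.getDefault() returns null unless one was installed
        // with Authenticator.setDefault(...)
        // .authenticator(Authenticator.getDefault())
        .build();
HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://foo.com/"))
        .timeout(Duration.ofMinutes(2))
        .GET()
        .build();
// send() blocks and can throw IOException or InterruptedException
HttpResponse<String> response = client.send(request, BodyHandlers.ofString());
System.out.println(response.statusCode());
System.out.println(response.body());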

I use the following code for my API:
try {
    URL url = new URL("https://stackoverflow.com/questions/6159118/using-java-to-pull-data-from-a-webpage");
    InputStream content = url.openStream();
    int c;
    while ((c = content.read()) != -1) System.out.print((char) c);
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException ie) {
    ie.printStackTrace();
}
You can collect the characters and convert them to a String.
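Casting each byte to char, as in the loop above, gives the right text only for single-byte encodings such as ASCII; multi-byte UTF-8 characters come out garbled. The sketch below instead collects the bytes first and decodes them once, assuming the page is UTF-8. The class name StreamToString is hypothetical. On Java 9+, content.readAllBytes() could replace the manual loop.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Buffer all bytes, then decode once so multi-byte characters stay intact
    static String read(InputStream content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = content.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumes UTF-8
    }

    public static void main(String[] args) throws IOException {
        String page = "https://stackoverflow.com/questions/6159118/using-java-to-pull-data-from-a-webpage";
        try (InputStream content = new URL(page).openStream()) {
            System.out.print(read(content));
        }
    }
}
```

Reading in 4 KB blocks is also much faster than one read() call per byte.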
