Downloaded File Has 0 Bytes - java

I'm trying to download a .mp3 music file from this URL to the project root directory, but the downloaded file is always 0 bytes in size (it is blank). The download also stops immediately.
I'm using the following code:
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import org.apache.commons.io.FileUtils;

public class MusicDownloader
{
    public static void main(String[] arguments) throws MalformedURLException, IOException
    {
        download("https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297", "Ponies and Balloons");
    }

    public static void download(String url, String fileName) throws MalformedURLException, IOException
    {
        FileUtils.copyURLToFile(new URL(url), new File(fileName + ".mp3"));
    }
}
In a browser, downloading the file manually works flawlessly. A download link from another website (e.g. this one) was processed by the code without problems. What could be the problem here?
Sending a valid user-agent String doesn't work either.

The problem is actually with your URL https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297. It issues a redirect (a 301 "Moved Permanently" status code), so you need to pick up the new URL. I tried it with HttpURLConnection and saw that the redirected URL is https://youtube-audio-library.storage.googleapis.com/d0a68933f592c297. You can use the code below:
// Required imports:
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

String urlString = "https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297";
URL url = new URL(urlString);
HttpURLConnection huc = (HttpURLConnection) url.openConnection();
int statusCode = huc.getResponseCode(); // get response code
if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
        || statusCode == HttpURLConnection.HTTP_MOVED_PERM) { // if the file has moved, pick up the new URL
    urlString = huc.getHeaderField("Location");
    url = new URL(urlString);
    huc = (HttpURLConnection) url.openConnection();
}
System.out.println(urlString);
InputStream is = huc.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is);
FileOutputStream fos = new FileOutputStream("test.mp3");
int i;
while ((i = bis.read()) != -1) {
    fos.write(i);
}
fos.close();
bis.close();
You can check whether the same redirect handling is available in FileUtils or not; I am sure it should be. Cheers :)
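If you'd rather keep using Commons IO for the copy itself, a minimal sketch (the class name here is just illustrative, assuming Apache Commons IO is on the classpath) that resolves the redirect first and then hands the final URL to FileUtils.copyURLToFile might look like this:

import java.io.File;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

import org.apache.commons.io.FileUtils;

public class RedirectAwareDownload {
    public static void main(String[] args) throws IOException {
        String urlString = "https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297";

        // Ask the server where the resource actually lives before downloading.
        HttpURLConnection huc = (HttpURLConnection) new URL(urlString).openConnection();
        huc.setInstanceFollowRedirects(false); // inspect the redirect ourselves
        int statusCode = huc.getResponseCode();
        if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
                || statusCode == HttpURLConnection.HTTP_MOVED_PERM) {
            urlString = huc.getHeaderField("Location");
        }
        huc.disconnect();

        // Let Commons IO handle the actual copy to disk.
        FileUtils.copyURLToFile(new URL(urlString), new File("Ponies and Balloons.mp3"));
    }
}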

Because it is illegal and against YouTube's Terms of Service.
YouTube specifically blocks most generic ways of downloading MP3s off its site. A simple ten-ish lines of code won't work, or piracy would be even bigger than it already is.
If they catch you, you WILL be blocked

Related

How to get the content of a website into a String in Android Studio?

I want to display parts of the content of a website in my app. I've seen some solutions here, but they are all very old and do not work with the newer versions of Android Studio. So maybe someone can help out.
https://jsoup.org/ should help for getting full site data and parsing it based on class, id, etc. For instance, the code below gets and prints the site's title:
Document doc = Jsoup.connect("http://www.moodmusic.today/").get();
String title = doc.select("title").text();
System.out.println(title);
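To parse based on class or id as mentioned above, here is a small sketch (the selectors and ids are made up and need to match the target page's markup):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("http://www.moodmusic.today/").get();

// Select elements by CSS-style selectors; adjust these to the real page structure.
Elements byClass = doc.select("div.article-body");  // all <div> elements with class "article-body"
Element byId = doc.getElementById("main-content");  // the element with id="main-content"

for (Element e : byClass) {
    System.out.println(e.text());
}
if (byId != null) {
    System.out.println(byId.text());
}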
If you want to get raw data from a target website, you will need to do the following:
Create a URL object with the link of the website specified in the parameter
Cast it to HttpURLConnection
Retrieve its InputStream
Convert it to a String
This can work generally with java, no matter which IDE you're using.
To retrieve a connection's InputStream:
// Create a URL object
URL url = new URL("https://yourwebsitehere.domain");
// Retrieve its input stream
HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
InputStream instream = connection.getInputStream();
Make sure to handle java.net.MalformedURLException and java.io.IOException
To convert an InputStream to a String
public static String toString(InputStream in) throws IOException {
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = reader.readLine()) != null) {
        builder.append(line).append("\n");
    }
    reader.close();
    return builder.toString();
}
You can copy and modify the code above and use it in your source code!
Make sure to have the following imports
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
Example:
public static String getDataRaw() throws IOException, MalformedURLException {
    URL url = new URL("https://yourwebsitehere.domain");
    HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
    InputStream instream = connection.getInputStream();
    return toString(instream);
}
To call getDataRaw(), handle IOException and MalformedURLException and you're good to go!
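For example, a minimal calling sketch, assuming both methods above sit in the same class:

try {
    String html = getDataRaw();
    System.out.println(html);
} catch (MalformedURLException e) {
    System.err.println("Bad URL: " + e.getMessage());
} catch (IOException e) {
    System.err.println("Could not read the page: " + e.getMessage());
}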
Hope this helps!

How to refresh a website until the server responds, in Java?

I'm trying to access a busy website but always get the busy-server result (error 500).
I would like to use Java code to keep requesting the website in a loop until I can get in, but I only know how to get the response code; I don't know how to open the page in a browser.
How can I open the website from the HttpURLConnection in a browser, or is there any other way to open or loop on the website until there is no error? Many thanks!
import java.net.URL;
import java.io.IOException;
import java.net.HttpURLConnection;

public class test {
    public static String URL = "http://www.example.com/";

    public static void main(String args[]) throws Exception {
        int i = 0;
        while (i == 0) {
            URL url = new URL("http://www.example.com/");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            int code = connection.getResponseCode();
            if (code != 500) {
                //how to open the no error website.
            }
            System.out.println(code);
        }
    }
}
If you would like to open the website upon receiving the 200 HTTP status, you can do something like the below inside your if ... else statement:
if (code == 200) {
    //how to open the no error website.
    if (Desktop.isDesktopSupported()) {
        Desktop.getDesktop().browse(url.toURI()); //referring to the url connection already created
        System.exit(0);
    }
}
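A fuller sketch of the retry loop (the class name and the 5-second delay are just illustrative; Desktop needs a desktop environment, and url.toURI() can throw URISyntaxException, so both need handling):

import java.awt.Desktop;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;

public class RetryUntilUp {
    public static void main(String[] args) throws IOException, URISyntaxException, InterruptedException {
        URL url = new URL("http://www.example.com/");
        while (true) {
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            int code = connection.getResponseCode();
            System.out.println(code);
            connection.disconnect();
            if (code == 200) {
                // Server answered normally: open the page in the default browser and stop.
                if (Desktop.isDesktopSupported()) {
                    Desktop.getDesktop().browse(url.toURI());
                }
                break;
            }
            Thread.sleep(5000); // wait a bit before retrying so we don't hammer the server
        }
    }
}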

Save all images from web page in Java

I'm working on a little project, and one part of it has to download all the images from different web pages.
I tried code that I found in an existing solution, but it's still not working for me.
The code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTMLDocument;

public class ExtractAllImages {

    public static void main(String args[]) throws Exception {
        String webUrl = "https://www.pexels.com/search/HD%20wallpaper/";
        URL url = new URL(webUrl);
        URLConnection connection = url.openConnection();
        InputStream is = connection.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);

        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        htmlKit.read(br, htmlDoc, 0);

        for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A); iterator.isValid(); iterator.next()) {
            AttributeSet attributes = iterator.getAttributes();
            String imgSrc = (String) attributes.getAttribute(HTML.Attribute.HREF);
            System.out.println(imgSrc);
            if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg") || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
                try {
                    downloadImage(webUrl, imgSrc);
                } catch (IOException ex) {
                    System.out.println(ex.getMessage());
                }
            }
        }
    }

    private static void downloadImage(String url, String imgSrc) throws IOException {
        BufferedImage image = null;
        try {
            if (!(imgSrc.startsWith("http"))) {
                url = url + imgSrc;
            } else {
                url = imgSrc;
            }
            imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
            String imageFormat = null;
            imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
            String imgPath = null;
            imgPath = "C:/Check/" + imgSrc + "";
            URL imageUrl = new URL(url);
            image = ImageIO.read(imageUrl);
            if (image != null) {
                File file = new File(imgPath);
                ImageIO.write(image, imageFormat, file);
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
The error I get:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.pexels.com/search/HD%20wallpaper/
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at ExtractAllImages.main(ExtractAllImages.java:23)
Any help would be highly appreciated. Thanks.
Edit:
I've tried other web pages; sometimes there is no error at all, but still no image is saved to my path.
On some web pages I got this error:
Exception in thread "main" javax.swing.text.ChangedCharSetException
    at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(Unknown Source)
    at javax.swing.text.html.parser.Parser.startTag(Unknown Source)
    at javax.swing.text.html.parser.Parser.parseTag(Unknown Source)
    at javax.swing.text.html.parser.Parser.parseContent(Unknown Source)
    at javax.swing.text.html.parser.Parser.parse(Unknown Source)
    at javax.swing.text.html.parser.DocumentParser.parse(Unknown Source)
    at javax.swing.text.html.parser.ParserDelegator.parse(Unknown Source)
    at javax.swing.text.html.HTMLEditorKit.read(Unknown Source)
    at ExtractAllImages.main(ExtractAllImages.java:29)
Any other way to write this code?
HTTP error 403 means that the server refused the request from the client. Often you can bypass such a check by introducing your client as someone else, i.e. by changing the user-agent value.
To change the user-agent you can call this (e.g. in the first line of your code):
System.setProperty("http.agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36");
After this change your application will be able to connect to the mentioned page (or similar pages) without problems, because it will introduce itself as a Chrome browser.
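If you prefer to set the user-agent for a single connection rather than process-wide, setRequestProperty should work as well (a small sketch, reusing webUrl from your code):

URLConnection conn = new URL(webUrl).openConnection();
// Per-connection alternative to the http.agent system property.
conn.setRequestProperty("User-Agent",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36");
InputStream is = conn.getInputStream();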
Anyway, there are a few other issues in your application. While implementing the app, keep these points in mind:
You are using imgSrc.toLowerCase().endsWith(".jpg"). In the real world many image links don't end with .jpg but with parameters, e.g. https://images.pexels.com/photos/33109/fall-autumn-red-season.jpg?h=350&auto=compress&cs=tinysrgb. You should consider at least using the imgSrc.toLowerCase().contains(".jpg") check instead.
Images are added to a web page using the img tag, so you should search for img tags and read their src attribute, which holds the path to the image (see the sketch after this list).
In the case of www.pexels.com, when you click on a wallpaper you are redirected to a second page where you can download it. Your application is trying to download images from the primary page; you should first open the second page and download the desired image from there.
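To illustrate the img/src point, a minimal sketch that keeps your HTMLEditorKit approach and only changes the tag and attribute being read (htmlDoc is the HTMLDocument already built in your main method):

// Iterate over <img> elements instead of <a> elements and read their src attribute.
for (HTMLDocument.Iterator it = htmlDoc.getIterator(HTML.Tag.IMG); it.isValid(); it.next()) {
    AttributeSet attributes = it.getAttributes();
    String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);
    if (imgSrc != null && imgSrc.toLowerCase().contains(".jpg")) {
        System.out.println(imgSrc); // hand this to downloadImage(webUrl, imgSrc) as before
    }
}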

How to download protected web page using Java

We have a task to design a class which can download the source of any web page. But when I try to test my code and fetch a page like http://anidb.net/perl-bin/animedb.pl?show=main, nothing works.
Standard code like this fails:
import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL link = new URL("http://www.anidb.net/");

        BufferedReader in = new BufferedReader(
                new InputStreamReader(link.openStream()));

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Here is the result I got:
Šwq>²"¦§5´_ï__ÇUº=ôÙö?kŠ}~“bd`?l“Ïçz¢Çêõ>_"?j׉R“y}K¸\Ìc_DLÙªÏ_
–óMm_¼_0”•ö°ËC_aí½sî¤ìÁS ‚>dC0ìs_–y¹ñ±ÏÝÜAø%È_äÖá__æ©A#,4x„Š¶_ëɃ?
I have tried everything: cookies, request headers, but nothing seems to work. If you have a hint for me, I will appreciate it.
When writing an HTTP client, you have to take gzip encoding into account, as well as chunked transfer. It's better to use a library to download a web page.
Try something like this:
http://code.google.com/p/google-http-java-client/
The site you referred to in your question doesn't seem to honor the 'Accept' request header, nor does it set the 'Content-Encoding' response header correctly, which I think is wrong of it.
Anyway, you can also use java.util.zip.GZIPInputStream to read the response as plain text:
// Requires java.net.URL, java.net.HttpURLConnection and java.util.zip.GZIPInputStream imports.
public static void main(String[] args) throws Exception
{
    URL link = new URL("http://www.anidb.net/");
    HttpURLConnection con = (HttpURLConnection) link.openConnection();
    GZIPInputStream in = new GZIPInputStream(con.getInputStream());

    byte[] b = new byte[1024];
    int len;
    StringBuilder content = new StringBuilder();
    while ((len = in.read(b)) > 0)
    {
        // Only append the bytes actually read, not the whole buffer.
        content.append(new String(b, 0, len));
    }
    System.out.println(content);
}
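Note that if the server ever responds without gzip compression, wrapping the stream unconditionally will fail; a hedged sketch that checks the Content-Encoding header first:

// Needs java.io.InputStream, java.net.URL, java.net.HttpURLConnection, java.util.zip.GZIPInputStream.
HttpURLConnection con = (HttpURLConnection) new URL("http://www.anidb.net/").openConnection();
con.setRequestProperty("Accept-Encoding", "gzip"); // tell the server we can handle gzip
InputStream raw = con.getInputStream();

// Wrap only if the response is really gzip-compressed.
InputStream in = "gzip".equalsIgnoreCase(con.getContentEncoding())
        ? new GZIPInputStream(raw)
        : raw;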

Why does this HTTP servlet behave inconsistently?

An intranet site has a search form which uses AJAX to call a servlet on a different domain for search suggestions.
This works in Internet Explorer with the intranet domain being a "trusted site" and with cross-domain requests enabled for trusted sites, but doesn't work in Firefox.
I have tried to work around the problem by creating a servlet on the intranet server, so there's a JS call to my servlet on the same domain, then my servlet calls the suggestions servlet on the other domain. The cross-domain call is server-side, so it should work regardless of browser settings.
The AJAX call and my servlet's call to the other servlet both use an HTTP POST request with arguments in the URL and empty request content.
The reason I'm sticking with POST requests is that the JS code is all in files on the search server, which I can't modify, and that code uses POST requests.
I've tried calling the customer's existing suggestions servlet with a GET request, and it produces a 404 error.
The problem is that the result is inconsistent.
I've used System.out.println calls to show the full URL and size of the result on the server log.
The output first seemed to change depending on the calling browser and/or website, but now seems to change even between sessions of the same browser.
E.g. entering "g" in the search box, I got this output from the first few tries on the Development environment using Firefox:
Search suggestion URL: http://searchdev.companyname.com.au/suggest?q=g&max=10&site=All&client=ie&access=p&format=rich
Search suggestion result length: 64
Initial tries with Firefox on the Test environment (different intranet server but same search server) produced a result length of 0 for the same search URL.
Initial tries with Internet Explorer produced a result length of 0 in both environments.
Then I tried searching for different letters, and found that "t" produced a result in IE when "g" hadn't.
After closing the browsers and leaving it for a while, I tried again and got different results.
E.g. Using Firefox and trying "g" in the Development environment now produces no result when it was previously producing one.
The inconsistency makes me think something is wrong with my servlet code, which is shown below. What could be causing the problem?
I think the search suggestions are being provided by a Google Search Appliance, and the JS files on the search server all seem to have come from Google.
The actual AJAX call is this line in one file:
XH_XmlHttpPOST(xmlhttp, url, '', handler);
The XH_XmlHttpPOST function is as follows in another file:
function XH_XmlHttpPOST(xmlHttp, url, data, handler) {
  xmlHttp.open("POST", url, true);
  xmlHttp.onreadystatechange = handler;
  xmlHttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xmlHttp.setRequestHeader("Content-Length",
      /** @type {string} */ (data.length));
  XH_XmlHttpSend(xmlHttp, data);
}
Here is my servlet code:
package com.companyname.theme;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class suggest extends HttpServlet {
    Properties props = null;

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String result = "";
        String args = req.getQueryString();
        String baseURL = props.getProperty("searchFormBaseURL");
        String urlStr = baseURL + "/suggest?" + args;
        System.out.println("Search suggestion URL: " + urlStr);
        try {
            int avail, rCount;
            int totalCount = 0;
            byte[] ba = null;
            byte[] bCopy;
            URL url = new URL(urlStr);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            OutputStream os = conn.getOutputStream();
            os.write("".getBytes());
            os.close();
            InputStream is = conn.getInputStream();
            while ((avail = is.available()) > 0) {
                if (ba == null) ba = new byte[avail];
                else if (totalCount + avail > ba.length) {
                    // Resize ba if there's more data available.
                    bCopy = new byte[totalCount + avail];
                    System.arraycopy(ba, 0, bCopy, 0, totalCount);
                    ba = bCopy;
                    bCopy = null;
                }
                rCount = is.read(ba, totalCount, avail);
                if (rCount < 0) break;
                totalCount += rCount;
            }
            is.close();
            conn.disconnect();
            result = (ba == null ? "" : new String(ba));
            System.out.println("Search suggestion result length: " + Integer.toString(result.length()));
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        PrintWriter pw = resp.getWriter();
        pw.print(result);
    }

    @Override
    public void init() throws ServletException {
        super.init();
        InputStream stream = this.getClass().getResourceAsStream("/WEB-INF/lib/endeavour.properties");
        props = new Properties();
        try {
            props.load(stream);
            stream.close();
        } catch (Exception e) {
            // TODO: handle exception
        }
    }
}
Solution: don't rely on InputStream.available().
The JavaDoc for that method says it always returns 0.
HttpURLConnection.getInputStream() actually returns a HttpInputStream, in which available() seems to work but apparently sometimes returns 0 when there is more data.
I changed my read loop to not use available() at all, and now it consistently returns the expected results.
The working servlet is below.
package com.integral.ie.theme;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class suggest extends HttpServlet implements javax.servlet.Servlet {
    Properties props=null;

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        //super.doPost(req, resp);
        final int maxRead=200;
        String result="";
        String args=req.getQueryString();
        String baseURL=props.getProperty("searchFormBaseURL");
        String urlStr=baseURL+"/suggest?"+args;
        //System.out.println("Search suggestion URL: "+urlStr);
        try {
            int rCount=0;
            int totalCount=0;
            int baLen=maxRead;
            byte[] ba=null;
            byte[] bCopy;
            URL url=new URL(urlStr);
            HttpURLConnection conn=(HttpURLConnection)url.openConnection();
            conn.setRequestMethod("POST");
            // Setting these properties may be unnecessary - just did it
            // because the GSA javascript does it.
            conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
            conn.setRequestProperty("Content-Length","0");
            InputStream is=conn.getInputStream();
            ba=new byte[baLen];
            while (rCount>=0) {
                try {
                    rCount=is.read(ba,totalCount,baLen-totalCount);
                    if (rCount>0) {
                        totalCount+=rCount;
                        if (totalCount>=baLen) {
                            baLen+=maxRead;
                            bCopy=new byte[baLen];
                            System.arraycopy(ba,0,bCopy,0,totalCount);
                            ba=bCopy;
                            bCopy=null;
                        }
                    }
                } catch(IOException e) {
                    // IOException while reading - allow the method to return
                    // anything we've read so far.
                }
            }
            is.close();
            conn.disconnect();
            result=(totalCount==0?"":new String(ba,0,totalCount));
            //System.out.println("Search suggestion result length: "
            //+Integer.toString(result.length()));
        } catch(MalformedURLException e) {
            e.printStackTrace();
        } catch(IOException e) {
            e.printStackTrace();
        }
        PrintWriter pw=resp.getWriter();
        pw.print(result);
    }

    @Override
    public void init() throws ServletException {
        super.init();
        InputStream stream=this.getClass().getResourceAsStream("/WEB-INF/lib/endeavour.properties");
        props=new Properties();
        try {
            props.load(stream);
            stream.close();
        } catch (Exception e) {
            // TODO: handle exception
        }
    }
}
Start with a unit test. Servlets are pretty straightforward to unit test and HttpUnit has worked for us.
Debugging Servlet code in a browser and with println calls will cost more time in the long run and it's difficult for someone on SO to digest all of that information to help you.
Also, consider using a JavaScript framework such as jQuery for your AJAX calls. In my opinion there's little reason to touch an xmlHttp object directly now that frameworks will hide that for you.
