I'm working on a small project, and one part of it needs to download all images from different web pages.
I tried code that I found in a solution here, but it's still not working for me.
The code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTMLDocument;
public class ExtractAllImages {
public static void main(String args[]) throws Exception {
String webUrl = "https://www.pexels.com/search/HD%20wallpaper/";
URL url = new URL(webUrl);
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
htmlKit.read(br, htmlDoc, 0);
for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A); iterator.isValid(); iterator.next()) {
AttributeSet attributes = iterator.getAttributes();
String imgSrc = (String) attributes.getAttribute(HTML.Attribute.HREF);
System.out.println(imgSrc);
if (imgSrc != null && (imgSrc.toLowerCase().endsWith(".jpg") || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
try {
downloadImage(webUrl, imgSrc);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
}
}
private static void downloadImage(String url, String imgSrc) throws IOException {
BufferedImage image = null;
try {
if (!(imgSrc.startsWith("http"))) {
url = url + imgSrc;
} else {
url = imgSrc;
}
imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
String imageFormat = null;
imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
String imgPath = null;
imgPath = "C:/Check/" + imgSrc + "";
URL imageUrl = new URL(url);
image = ImageIO.read(imageUrl);
if (image != null) {
File file = new File(imgPath);
ImageIO.write(image, imageFormat, file);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
The error I get:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.pexels.com/search/HD%20wallpaper/
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at ExtractAllImages.main(ExtractAllImages.java:23)
Any help would be highly appreciated. Thanks.
Edit:
I've tried other web pages; sometimes there is no error at all, but still no image is saved to my path.
On some web pages I get this error:
Exception in thread "main" javax.swing.text.ChangedCharSetException
at javax.swing.text.html.parser.DocumentParser.handleEmptyTag(Unknown Source)
at javax.swing.text.html.parser.Parser.startTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseTag(Unknown Source)
at javax.swing.text.html.parser.Parser.parseContent(Unknown Source)
at javax.swing.text.html.parser.Parser.parse(Unknown Source)
at javax.swing.text.html.parser.DocumentParser.parse(Unknown Source)
at javax.swing.text.html.parser.ParserDelegator.parse(Unknown Source)
at javax.swing.text.html.HTMLEditorKit.read(Unknown Source)
at ExtractAllImages.main(ExtractAllImages.java:29)
Is there another way to write this code?
HTTP error 403 means that the server refused the request from the client. Often you can bypass such a check by presenting your client as something else, i.e. by changing the user-agent value.
To change the user-agent you can call this (e.g. in the first line of your code):
System.setProperty("http.agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36");
After this change your application will be able to connect to the mentioned page (and similar ones) without a problem, because it will identify itself as a Chrome browser.
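If you only want to affect a single request rather than the whole JVM, you can instead set the header on the connection itself. This is a minimal sketch reusing the URLConnection that the question already opens; the user-agent string is just an example:
URL url = new URL(webUrl);
URLConnection connection = url.openConnection();
// Identify this single request as coming from a regular browser.
connection.setRequestProperty("User-Agent",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36");
InputStream is = connection.getInputStream();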
Anyway, there are a few other issues in your application. Keep the following in mind while implementing it:
You are using imgSrc.toLowerCase().endsWith(".jpg"). In the real world many image links don't end with .jpg but with query parameters, e.g.: https://images.pexels.com/photos/33109/fall-autumn-red-season.jpg?h=350&auto=compress&cs=tinysrgb. You should consider at least using imgSrc.toLowerCase().contains(".jpg") instead.
Images are added to a web page with the img tag, so you should search for img tags and read their src attribute, which holds the path to the image (see the sketch after this list).
In the case of www.pexels.com, when you click on a wallpaper you are redirected to a second page where you can download it. Your application is trying to download images from the primary page; you should first open that second page and download the desired image from there.
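As a minimal sketch, assuming htmlDoc has been parsed exactly as in the question, the existing loop could target img tags instead of a tags like this:
// Iterate over <img> elements and read their src attribute.
for (HTMLDocument.Iterator it = htmlDoc.getIterator(HTML.Tag.IMG); it.isValid(); it.next()) {
    AttributeSet attrs = it.getAttributes();
    String src = (String) attrs.getAttribute(HTML.Attribute.SRC);
    if (src != null && src.toLowerCase().contains(".jpg")) {
        try {
            downloadImage(webUrl, src);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}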
I have a Java program that connects to a website to retrieve some XML from it. This works fine on my computer, as well as on others outside our company. One of our customers is now unable to connect to the website. I figured out that they are behind a proxy. I have now found which settings I need to use, and in my test program it works (partially).
In the code below, the downloadFile() call works as expected, and the file can be downloaded without problems. The contactHost() call fails on our customer's machines with an UnknownHostException:
java.net.UnknownHostException: No such host is known (api.myserver.de)
at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:925)
at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1505)
at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:844)
at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1495)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1354)
at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1288)
at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:111)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
Background: Windows 10 machines; our program is shipped with an internal OpenJDK, version "10.0.2" 2018-07-17. The program is started with the JVM options -Djdk.http.auth.tunneling.disabledSchemes="" -Djava.net.preferIPv4Stack=true in order to use IPv4 only and to enable basic authentication for the proxy. With these settings the file can be downloaded; however, the UnknownHostException is still there.
We have also tried opening the URL in a browser, and this works as expected, i.e. the website opens in the browser.
Here is my code for testing:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
public class LFTProxyTest {
private static String uname = null;
private static String pass = null;
public static void main(String[] args) {
System.setProperty("java.net.useSystemProxies", "true");
// uname = "test"; // whatever the user provides
// pass = "secret"; // whatever the user provides
Authenticator.setDefault(new ProxyAuth(uname, pass));
contactHost();
downloadFile();
}
private static boolean downloadFile() {
System.out.println("CHECK connection");
int cp = contactHost();
if (cp == 200)
return true;
if (cp == 407)
return false;
else {
try {
System.out.println("Try loading file: ");
URL url = new URL("https://www.google.de");
URLConnection urlConnection = url.openConnection();
InputStream in = new BufferedInputStream(urlConnection.getInputStream());
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
dBuilder.parse(in);
System.out.println(" FILE DOWNLOAD successfull!");
} catch (Exception e) {
System.out.println(" FILE DOWNLOAD failed:");
System.out.println("***EXCEPTION: " + e.getMessage());
return false;
}
}
System.out.println("CHECK done");
return true;
}
private static int contactHost() {
HttpClient client = HttpClientBuilder.create().build();// new DefaultHttpClient();
String catalogURI = "https://api.myserver.de/query";
HttpGet request = new HttpGet(catalogURI);
try {
int ret = 0;
HttpResponse response = client.execute(request);
ret = response.getStatusLine().getStatusCode();
System.out.println("PROXY test: " + ret);
((CloseableHttpClient) client).close();
return ret;
} catch (IOException e) {
e.printStackTrace();
return -1;
}
}
}
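The ProxyAuth class used above is not shown in the question; a minimal sketch of such an Authenticator (the real implementation may differ) could look like this:
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical Authenticator that hands the user-supplied proxy credentials to the JVM.
class ProxyAuth extends Authenticator {
    private final String user;
    private final String password;

    ProxyAuth(String user, String password) {
        this.user = user;
        this.password = password;
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        if (user == null || password == null) {
            return null; // no credentials available
        }
        return new PasswordAuthentication(user, password.toCharArray());
    }
}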
I don't know what to do now; I'm not even sure where the error could be. Any ideas are highly appreciated!
OK, so after some further digging I found out that org.apache.http.client.HttpClient does not respect java.net.useSystemProxies at all, whether it is set via System.setProperty or via -D. It also ignores http.proxyHost etc. The solution is to use a ProxySelector like this:
ProxySelector.setDefault(new ProxySelector() {
@Override
public List<Proxy> select(URI uri) {
ArrayList<Proxy> list = new ArrayList<Proxy>();
list.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress("proxy1.de", 8000)));
list.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress("proxy2.de", 8080)));
return list;
}
@Override
public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
logger.error("Error in ProxySelector, connection Failed: ", ioe);
}
});
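If you prefer not to replace the JVM-wide default selector, HttpClient can also be pointed at a ProxySelector directly when the client is built. A minimal sketch, assuming HttpClient 4.3+ (SystemDefaultRoutePlanner lives in org.apache.http.impl.conn):
// Build a client whose routing consults the given ProxySelector
// (here the JVM default, e.g. the one installed above).
CloseableHttpClient client = HttpClientBuilder.create()
        .setRoutePlanner(new SystemDefaultRoutePlanner(ProxySelector.getDefault()))
        .build();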
I'm getting another exception now, but I might open another thread for this.
UnknownHostException designates a pretty straightforward problem: the IP address of the remote host you are trying to reach cannot be resolved. So the solution is simple. You should check the input to the Socket (or any other method that throws an UnknownHostException) and validate that it is the intended one. If you are not sure whether you have the correct host name, you can launch a UNIX terminal and use the nslookup command (among others) to see if your DNS server can resolve the host name to an IP address successfully.
On Windows, nslookup is available as well. If that doesn't work as expected, you should check whether the host name you have is correct and then try to refresh your DNS cache. If that doesn't work either, try a different DNS server; e.g. Google Public DNS is a very good alternative.
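The same check can also be done programmatically from Java; a minimal sketch (the host name is just an example):
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    public static void main(String[] args) {
        try {
            // Ask the configured DNS to resolve the host name the application will contact.
            InetAddress address = InetAddress.getByName("api.myserver.de");
            System.out.println("Resolved to: " + address.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("DNS could not resolve the host: " + e.getMessage());
        }
    }
}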
I want to display parts of the content of a website in my app. I've seen some solutions here, but they are all very old and do not work with the newer versions of Android Studio. So maybe someone can help out.
https://jsoup.org/ should help: it can fetch the full site data and parse it based on class, id, etc. For instance, the code below gets and prints the site's title:
Document doc = Jsoup.connect("http://www.moodmusic.today/").get();
String title = doc.select("title").text();
System.out.println(title);
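In the same way you can target specific parts of the page by CSS selector. A small sketch, where the id and class names are placeholders for whatever the target page actually uses:
// Select elements by id or class and pull out their text.
Element header = doc.select("#header").first();        // element with id="header"
Elements teasers = doc.select("div.article-teaser");   // elements with class "article-teaser"
for (Element teaser : teasers) {
    System.out.println(teaser.text());
}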
If you want to get raw data from a target website, you will need to do the following:
Create a URL object with the website's link as the parameter
Cast it to HttpURLConnection
Retrieve its InputStream
Convert it to a String
This works with plain Java, no matter which IDE you're using.
To retrieve a connection's InputStream:
// Create a URL object
URL url = new URL("https://yourwebsitehere.domain");
// Retrieve its input stream
HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
InputStream instream = connection.getInputStream();
Make sure to handle java.net.MalformedURLException and java.io.IOException
To convert an InputStream to a String
public static String toString(InputStream in) throws IOException {
StringBuilder builder = new StringBuilder();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line;
while ((line = reader.readLine()) != null) {
builder.append(line).append("\n");
}
reader.close();
return builder.toString();
}
You can copy and modify the code above and use it in your source code!
Make sure to have the following imports
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
Example:
public static String getDataRaw() throws IOException, MalformedURLException {
URL url = new URL("https://yourwebsitehere.domain");
HttpURLConnection connection = ((HttpURLConnection) url.openConnection());
InputStream instream = connection.getInputStream();
return toString(instream);
}
To call getDataRaw(), handle IOException and MalformedURLException and you're good to go!
Hope this helps!
I'm working on a team software project that involves designing a client for a server-based AI called SeeFood. You can send it a picture, and it will tell you whether or not the picture has food in it. We currently have a python script deployed to the server that accepts Http POST requests and calls the AI with an image that it is given. You can access that at 34.236.92.140.
The challenge I'm facing right now is getting my Java client to be able to send an image to the server, have it analyzed, and get a response back. I've been trying different things, including the Apache HttpComponents library, but I'm constantly getting this response code from the server when I run the code:
400 BAD REQUEST
Server: Apache/2.4.27 (Amazon) PHP/5.6.30 mod_wsgi/3.5 Python/2.7.12
Connection: close
Content-Length: 192
Date: Fri, 17 Nov 2017 16:11:28 GMT
Content-Type: text/html; charset=UTF-8
Judging by research done on HTTP code 400, the server doesn't like how I've formatted the POST request. Does anyone have experience with HTTP servers and sending images via POST? Again, you can try out the server side application at 34.236.92.140. I'll also include the Java client and Python server code.
Java Client (relevant code under the exportImages and readResultsToString methods):
package javaapplication12;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.util.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javafx.application.*;
import static javafx.application.Application.launch;
import javafx.event.*;
import javafx.geometry.*;
import javafx.scene.*;
import javafx.scene.control.*;
import javafx.scene.layout.*;
import javafx.stage.*;
public class UserInterface extends Application {
private List<File> _images;
/**
* @param args the command line arguments
*/
public static void main (String[] args) {
System.setProperty("java.net.preferIPv4Stack" , "true");
launch (args);
}
@Override
public void start (Stage primaryStage) {
final FileChooser fc=new FileChooser ();
primaryStage.setTitle ("SeeFood AI User Interface");
Button imageButton=new Button ("Import Images");
Button exportButton=new Button ("Send Images to SeeFood");
//When image button is pressed, a FileChooser should load up and add all selected images to a list
imageButton.setOnAction ((ActionEvent event) -> {
_images=fc.showOpenMultipleDialog (primaryStage);
if (_images!=null) {
int i=0;
//loop to verify that all selected images are added
for (File file:_images) {
System.out.println ("image "+i);
i++;
}
}
});
exportButton.setOnAction ((ActionEvent event) -> {
try {
exportImages();
} catch (IOException ex) {
Logger.getLogger(UserInterface.class.getName()).log(Level.SEVERE, null, ex);
}
});
final GridPane inputGridPane=new GridPane ();
GridPane.setConstraints (imageButton,0,0);
GridPane.setConstraints (exportButton,0,1);
inputGridPane.setHgap (6);
inputGridPane.setVgap (6);
inputGridPane.getChildren ().addAll (imageButton, exportButton);
final Pane rootGroup=new VBox (12);
rootGroup.getChildren ().addAll (inputGridPane);
rootGroup.setPadding (new Insets (12,12,12,12));
primaryStage.setScene (new Scene (rootGroup));
primaryStage.show ();
}
/**
* Sends one or more images to SeeFood via HTTP POST.
* @throws MalformedURLException
* @throws IOException
*/
private void exportImages() throws MalformedURLException, IOException{
//InetAddress host=InetAddress.getByName(_ip);
// System.out.println(InetAddress.getByName(_ip));
URL url=new URL("http://34.236.92.140");
HttpURLConnection con=(HttpURLConnection) url.openConnection();
String output;
con.setRequestMethod("POST");
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
con.setRequestProperty("Content-Type", "multipart/form-data");
FileChannel in;
WritableByteChannel out;
con.setDoOutput(true); //this must be set to true in order to work
con.setDoInput(true);
for(File file:_images){
in=new FileInputStream(file).getChannel();
out=Channels.newChannel(con.getOutputStream());
in.transferTo(0, file.length(), out);
StringBuilder builder = new StringBuilder();
builder.append(con.getResponseCode())
.append(" ")
.append(con.getResponseMessage())
.append("\n");
Map<String, List<String>> map = con.getHeaderFields();
for (Map.Entry<String, List<String>> entry : map.entrySet()){
if (entry.getKey() == null)
continue;
builder.append( entry.getKey())
.append(": ");
List<String> headerValues = entry.getValue();
Iterator<String> it = headerValues.iterator();
if (it.hasNext()) {
builder.append(it.next());
while (it.hasNext()) {
builder.append(", ")
.append(it.next());
}
}
builder.append("\n");
}
System.out.println(builder);
//Output the result from SeeFood
//Later on, this result should be stored for each image
output=readResultsToString(con);
if(output!=null){
System.out.println(output);
} else {
System.out.println("There was an error in the connection.");
}
in.close();
out.close();
}
con.disconnect();
}
/**
* Helper method to exportImages(). Should get response from server
* and append contents to string.
* @param con - the active http connection
* @return response from the server
*/
private String readResultsToString(HttpURLConnection con){
String result = null;
StringBuffer sb = new StringBuffer();
InputStream is = null;
try {
is=new BufferedInputStream(con.getInputStream());
BufferedReader br=new BufferedReader(new InputStreamReader(is));
String inputLine="";
while((inputLine=br.readLine())!=null){
sb.append(inputLine);
}
result=sb.toString();
} catch (IOException ex) {
Logger.getLogger(UserInterface.class.getName()).log(Level.SEVERE, null, ex);
} finally {
if(is!=null){
try {
is.close();
} catch (IOException ex) {
Logger.getLogger(UserInterface.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
return result;
}
}
Python server:
from flask import Flask, send_from_directory, request
from werkzeug.utils import secure_filename
import argparse
import numpy as np
import tensorflow as tf
from PIL import Image
import sys
app = Flask(__name__)
'''
method for uploading files to the server
via http POST request
'''
@app.route('/upload', methods=['GET', 'POST'])
def upload_file():
if request.method == 'POST':
f = request.files['file']
f.save(secure_filename(f.filename))
print f.filename
score = ai_call(f.filename)
#save file in location based on score
return score
return '''
<!doctype html>
<title>Upload new File</title>
<h1>Upload new File</h1>
<form method=post enctype=multipart/form-data>
<p><input type=file name=file>
<input type=submit value=Upload>
</form>
'''
'''
method for returning files from the server based on filename
'''
@app.route('/download/<file_name>')
def get_file(file_name):
return app.send_static_file(file_name)
'''
index page
needs to be motifed to return default images
'''
@app.route('/')
def index():
find_food
return 'Hello World'
"""
A script to ask SeeFood if it sees food in the image at
path specified by the command line argument.
"""
def ai_call(system_arg):
#parser = argparse.ArgumentParser(description="Ask SeeFood if there is food in the image provided.")
#parser.add_argument('image_path', help="The full path to an image file stored on disk.")
#args = parser.parse_args()
# The script assumes the args are perfect, this will crash and burn otherwise.
###### Initialization code - we only need to run this once and keep in memory.
sess = tf.Session()
saver = tf.train.import_meta_graph('saved_model/model_epoch5.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('saved_model/'))
graph = tf.get_default_graph()
x_input = graph.get_tensor_by_name('Input_xn/Placeholder:0')
keep_prob = graph.get_tensor_by_name('Placeholder:0')
class_scores = graph.get_tensor_by_name("fc8/fc8:0")
######
# Work in RGBA space (A=alpha) since png's come in as RGBA, jpeg come in as RGB
# so convert everything to RGBA and then to RGB.
#image_path = args.image_path
image_path = system_arg
image = Image.open(image_path).convert('RGB')
image = image.resize((227, 227), Image.BILINEAR)
img_tensor = [np.asarray(image, dtype=np.float32)]
print 'looking for food in '+ image_path
#Run the image in the model.
scores = sess.run(class_scores, {x_input: img_tensor, keep_prob: 1.})
print scores
# if np.argmax = 0; then the first class_score was higher, e.g., the model sees food.
# if np.argmax = 1; then the second class_score was higher, e.g., the model does not see food.
if np.argmax(scores) == 1:
print "No food here... :disappointed: "
else:
print "Oh yes... I see food! :D"
return str(scores)
if __name__ == '__main__':
app.debug = True
app.run()
Any help you can offer is appreciated. Thank you in advance.
I had a similar problem. I fixed it like this:
String url = "http://127.0.0.1:8080/";
// 2. create obj for the URL class
URL obj = new URL(url);
// 3. open connection on the url
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Content-Type","image/jpeg");
con.setDoInput(true);
con.setDoOutput(true);
OutputStream out = con.getOutputStream();
DataOutputStream image = new DataOutputStream(out);
Path path = Paths.get("jpeg.jpg");
byte[] fileContents = Files.readAllBytes(path);
image.write(fileContents, 0, fileContents.length);
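After writing the bytes you would normally flush and close the stream and then read the server's response; a minimal sketch continuing from the code above:
image.flush();
image.close();

// Read the status code and the response body to see what the server made of the upload.
int status = con.getResponseCode();
System.out.println("Response code: " + status);
BufferedReader reader = new BufferedReader(new InputStreamReader(con.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();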
I'm trying to download a .mp3 music file from this URL to the project root directory, but the downloaded file is always 0 bytes in size (it is blank). The download also stops immediately.
I'm using the following code:
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
public class MusicDownloader
{
public static void main(String[] arguments) throws MalformedURLException, IOException
{
download("https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297", "Ponies and Balloons");
}
public static void download(String url, String fileName) throws MalformedURLException, IOException
{
FileUtils.copyURLToFile(new URL(url), new File(fileName + ".mp3"));
}
}
In a browser, downloading the file manually works flawlessly. A download link from another website, e.g. this one, was processed by the code without problems. What could be the problem here?
Sending a valid user-agent String doesn't work either.
The problem is actually with your URL https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297. It issues a redirect (a "moved" 3xx status code), so you need to pick up its new URL. I tried it with HttpURLConnection and saw that the redirected URL is https://youtube-audio-library.storage.googleapis.com/d0a68933f592c297. You can use the code below:
String urlString = "https://www.youtube.com/audiolibrary_download?vid=d0a68933f592c297";
URL url = new URL(urlString);
HttpURLConnection huc = (HttpURLConnection)url.openConnection();
int statusCode = huc.getResponseCode(); //get response code
if (statusCode == HttpURLConnection.HTTP_MOVED_TEMP
|| statusCode == HttpURLConnection.HTTP_MOVED_PERM){ // if file is moved, then pick new URL
urlString = huc.getHeaderField("Location");
url = new URL(urlString);
huc = (HttpURLConnection)url.openConnection();
}
System.out.println(urlString);
InputStream is = huc.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is);
FileOutputStream fos = new FileOutputStream("test.mp3");
int i = 0;
while ((i = bis.read()) != -1)
fos.write(i);
fos.close();
bis.close();
You can check whether the same behaviour is available in FileUtils or not; I'm sure it should be. Cheers :)
Because it is illegal and against YouTube's Terms of Service.
YouTube specifically blocks most generic ways of downloading MP3s off their site. A simple ten-or-so lines of code won't work, or piracy would be even bigger than it already is.
If they catch you, you WILL be blocked.
An intranet site has a search form which uses AJAX to call a servlet on a different domain for search suggestions.
This works in Internet Explorer with the intranet domain being a "trusted site" and with cross-domain requests enabled for trusted sites, but doesn't work in Firefox.
I have tried to work around the problem by creating a servlet on the intranet server, so there's a JS call to my servlet on the same domain, then my servlet calls the suggestions servlet on the other domain. The cross-domain call is server-side, so it should work regardless of browser settings.
The AJAX call and my servlet's call to the other servlet both use a HTTP POST request with arguments in the URL and empty request-content.
The reason I'm sticking with POST requests is that the JS code is all in files on the search server, which I can't modify, and that code uses POST requests.
I've tried calling the customer's existing suggestions servlet with a GET request, and it produces a 404 error.
The problem is that the result is inconsistent.
I've used System.out.println calls to show the full URL and size of the result on the server log.
The output first seemed to change depending on the calling browser and/or website, but now seems to change even between sessions of the same browser.
E.g. entering "g" in the search box, I got this output from the first few tries on the Development environment using Firefox:
Search suggestion URL: http://searchdev.companyname.com.au/suggest?q=g&max=10&site=All&client=ie&access=p&format=rich
Search suggestion result length: 64
Initial tries with Firefox on the Test environment (different intranet server but same search server) produced a result length of 0 for the same search URL.
Initial tries with Internet Explorer produced a result length of 0 in both environments.
Then I tried searching for different letters, and found that "t" produced a result in IE when "g" hadn't.
After closing the browsers and leaving it for a while, I tried again and got different results.
E.g. Using Firefox and trying "g" in the Development environment now produces no result when it was previously producing one.
The inconsistency makes me think something is wrong with my servlet code, which is shown below. What could be causing the problem?
I think the search suggestions are being provided by a Google Search Appliance, and the JS files on the search server all seem to have come from Google.
The actual AJAX call is this line in one file:
XH_XmlHttpPOST(xmlhttp, url, '', handler);
The XH_XmlHttpPOST function is as follows in another file:
function XH_XmlHttpPOST(xmlHttp, url, data, handler) {
xmlHttp.open("POST", url, true);
xmlHttp.onreadystatechange = handler;
xmlHttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
xmlHttp.setRequestHeader("Content-Length",
/** @type {string} */ (data.length));
XH_XmlHttpSend(xmlHttp, data);
}
Here is my servlet code:
package com.companyname.theme;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class suggest extends HttpServlet {
Properties props=null;
@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp)
throws ServletException, IOException {
String result = "";
String args = req.getQueryString();
String baseURL = props.getProperty("searchFormBaseURL");
String urlStr = baseURL + "/suggest?" + args;
System.out.println("Search suggestion URL: " + urlStr);
try {
int avail, rCount;
int totalCount = 0;
byte[] ba = null;
byte[] bCopy;
URL url = new URL(urlStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setDoOutput(true);
OutputStream os = conn.getOutputStream();
os.write("".getBytes());
os.close();
InputStream is = conn.getInputStream();
while ((avail = is.available()) > 0) {
if (ba == null) ba = new byte[avail];
else if (totalCount + avail > ba.length) {
// Resize ba if there's more data available.
bCopy = new byte[totalCount + avail];
System.arraycopy(ba, 0, bCopy, 0, totalCount);
ba = bCopy;
bCopy = null;
}
rCount = is.read(ba, totalCount, avail);
if (rCount < 0) break;
totalCount += rCount;
}
is.close();
conn.disconnect();
result = (ba == null ? "" : new String(ba));
System.out.println("Search suggestion result length: " + Integer.toString(result.length()));
} catch(MalformedURLException e) {
e.printStackTrace();
} catch(IOException e) {
e.printStackTrace();
}
PrintWriter pw = resp.getWriter();
pw.print(result);
}
@Override
public void init() throws ServletException {
super.init();
InputStream stream = this.getClass().getResourceAsStream("/WEB-INF/lib/endeavour.properties");
props = new Properties();
try {
props.load(stream);
stream.close();
} catch (Exception e) {
// TODO: handle exception
}
}
}
Solution: don't rely on InputStream.available().
The JavaDoc says the base InputStream implementation of that method always returns 0.
HttpURLConnection.getInputStream() actually returns an HttpInputStream, in which available() seems to work but apparently sometimes returns 0 when there is more data.
I changed my read loop to not use available() at all, and now it consistently returns the expected results.
The working servlet is below.
package com.integral.ie.theme;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class suggest extends HttpServlet implements
javax.servlet.Servlet {
Properties props=null;
@Override
protected void doPost(HttpServletRequest req, HttpServletResponse resp)
throws ServletException, IOException {
//super.doPost(req, resp);
final int maxRead=200;
String result="";
String args=req.getQueryString();
String baseURL=props.getProperty("searchFormBaseURL");
String urlStr=baseURL+"/suggest?"+args;
//System.out.println("Search suggestion URL: "+urlStr);
try {
int rCount=0;
int totalCount=0;
int baLen=maxRead;
byte[] ba=null;
byte[] bCopy;
URL url=new URL(urlStr);
HttpURLConnection conn=(HttpURLConnection)url.openConnection();
conn.setRequestMethod("POST");
// Setting these properties may be unnecessary - just did it
// because the GSA javascript does it.
conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
conn.setRequestProperty("Content-Length","0");
InputStream is=conn.getInputStream();
ba=new byte[baLen];
while (rCount>=0) {
try {
rCount=is.read(ba,totalCount,baLen-totalCount);
if (rCount>0) {
totalCount+=rCount;
if (totalCount>=baLen) {
baLen+=maxRead;
bCopy=new byte[baLen];
System.arraycopy(ba,0,bCopy,0,totalCount);
ba=bCopy;
bCopy=null;
}
}
} catch(IOException e) {
// IOException while reading - allow the method to return
// anything we've read so far.
}
}
is.close();
conn.disconnect();
result=(totalCount==0?"":new String(ba,0,totalCount));
//System.out.println("Search suggestion result length: "
//+Integer.toString(result.length()));
} catch(MalformedURLException e) {
e.printStackTrace();
} catch(IOException e) {
e.printStackTrace();
}
PrintWriter pw=resp.getWriter();
pw.print(result);
}
@Override
public void init() throws ServletException {
super.init();
InputStream stream=this.getClass().getResourceAsStream("/WEB-INF/lib/endeavour.properties");
props=new Properties();
try {
props.load(stream);
stream.close();
} catch (Exception e) {
// TODO: handle exception
}
}
}
Start with a unit test. Servlets are pretty straightforward to unit test and HttpUnit has worked for us.
Debugging Servlet code in a browser and with println calls will cost more time in the long run and it's difficult for someone on SO to digest all of that information to help you.
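For example, HttpUnit's ServletUnit container can exercise the servlet in-process without deploying it. This is only a rough sketch; the servlet class and query parameters follow the code above, everything else is assumed:
import com.meterware.httpunit.PostMethodWebRequest;
import com.meterware.httpunit.WebRequest;
import com.meterware.httpunit.WebResponse;
import com.meterware.servletunit.ServletRunner;
import com.meterware.servletunit.ServletUnitClient;

// Assumes this test class lives in the same package as the suggest servlet.
public class SuggestServletTest {
    public static void main(String[] args) throws Exception {
        // Register the servlet under a test mapping and run it in-process.
        ServletRunner runner = new ServletRunner();
        runner.registerServlet("suggest", suggest.class.getName());

        ServletUnitClient client = runner.newClient();
        WebRequest request = new PostMethodWebRequest("http://localhost/suggest?q=g&max=10");
        WebResponse response = client.getResponse(request);

        System.out.println("Status: " + response.getResponseCode());
        System.out.println("Body: " + response.getText());
    }
}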
Also, consider using a JavaScript framework such as JQuery for your AJAX calls. In my opinion there's little reason to touch an xmlHttp object directly now that frameworks will hide that for you.