Socket versus URL website access - java

I have a Java application which opens an existing company's website using the Socket class:
Socket sockSite;
InputStream inFile = null;
BufferedWriter out = null;
try
{
sockSite = new Socket( presetSite, 80 );
inFile = sockSite.getInputStream();
out = new BufferedWriter( new OutputStreamWriter(sockSite.getOutputStream()) );
}
catch ( IOException e )
{
...
}
out.write( "GET " + presetPath + " HTTP/1.1\r\n\r\n" );
out.flush();
I would read the website with the stream inFile and life is good.
Recently this started to fail. I was getting an HTTP 301 "site has moved" error but no moved-to link. The site still exists and responds using the same original HTTP reference and any web browser. But the above code comes back with the HTTP 301.
I changed the code to this:
URL url;
InputStream inFile = null;
try
{
url = new URL( presetSite + presetPath );
inFile = url.openStream();
}
catch ( IOException e )
{
...
}
I then read the site from the inFile stream with the original code, and it now works again.
This difference doesn't just occur in Java but it also occurs if I use Perl (using IO::Socket::INET approach opening the website port 80, then issuing a GET fails, but using LWP::Simple method get just works). In other words, I get a failure if I open the web page first with port 80, then do a GET, but it works fine if I use a class which does it "all at once" (that just says, "get me web page with such-and-such an HTTP address").
I thought I'd try the different approaches on http://www.microsoft.com and got an interesting result. When I opened port 80 and then issued GET /, I received an HTTP 200 response with a page that said, "Your current user agent appears to be from an automated process...". But if I use the second method (URL class in Java, or LWP in Perl) I simply get their web page.
So my question is: how does the URL class (in Java) or the LWP module (in Perl) do its thing under the hood that makes it different from opening the website on port 80 and issuing a GET?

Most servers require the Host: header to allow virtual hosting (multiple domains on one IP address).
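To illustrate the point about the Host: header, here is a minimal sketch of the raw-socket request from the question with that header added. The class name HostHeaderRequest, the helper buildRequest, and the host example.com are placeholders chosen for this example, not part of the original code. Whether this alone satisfies a given server is not guaranteed, since some sites also inspect other headers such as User-Agent.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HostHeaderRequest {

    // Builds a minimal HTTP/1.1 GET request that includes the Host header.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n" // ask the server to close, so the read loop ends
                + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "example.com"; // placeholder host
        String path = args.length > 1 ? args[1] : "/";
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // Prints the status line, the response headers, and the body as text.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```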

If you use packet-capturing software to see what is sent when URL is used, you'll see that there is a lot more than just "GET /" going over the wire. All sorts of additional header information is included. If a server receives just a bare "GET /", it's easy to deduce that there can't be a very sophisticated client on the other end.
Also, HTTP 1.0 is outdated; it was superseded by HTTP 1.1, which makes the Host header mandatory.

Java's URL implementation delegates to HttpURLConnection if the URL starts with "http:".
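A small sketch of that delegation follows. The class UrlDelegation, the helper isHttpConnection, and the URL http://example.com/ are made up for this example. openConnection() only creates the connection object and sends nothing yet, so the check runs without network access. When the request is eventually sent, HttpURLConnection adds the Host header and a default "Java/<version>" User-Agent itself, and it follows redirects by default, which is probably why the 301 from the question never surfaces.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class UrlDelegation {

    // Returns true if openConnection() hands back an HttpURLConnection.
    // Nothing is sent until connect() or getInputStream() is called.
    static boolean isHttpConnection(String spec) {
        try {
            URLConnection conn = URI.create(spec).toURL().openConnection();
            return conn instanceof HttpURLConnection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String spec = "http://example.com/"; // placeholder URL
        System.out.println(isHttpConnection(spec));

        HttpURLConnection conn =
                (HttpURLConnection) URI.create(spec).toURL().openConnection();
        // Optional: replace the default "Java/<version>" User-Agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (compatible; ExampleClient)");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```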

Related

How can I read a text file from the internet with Java?

I want to read the second line of the text at this URL: "http://vuln2014.picoctf.com:51818/" (this is a capture-the-flag competition but only asking for flags or direction to flags breaks the competition rules). I am attempting to open an input stream from the URL but I get an Invalid HTTP Response exception. Any help is appreciated, and I recognize that my error is likely quite foolish.
Code:
URL url = new URL("http://vuln2014.picoctf.com:51818");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
The error occurs at the third line.
java.io.IOException: Invalid Http response
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1342)
    at name.main(name.java:41)
curl happily gets the text from the page, and it is perfectly accessible from a web browser.
When you do this:
URL url = new URL("http://vuln2014.picoctf.com:51818");
URLConnection con = url.openConnection();
You are entering into a contract that says that this URL uses the http protocol. When you call openConnection it expects to get http responses because you used http:// in the URL as the protocol. The Java Documentation says:
If for the URL's protocol (such as HTTP or JAR), there exists a public, specialized URLConnection subclass belonging to one of the following packages or one of their subpackages: java.lang, java.io, java.util, java.net, the connection returned will be of that subclass. For example, for HTTP an HttpURLConnection will be returned, and for JAR a JarURLConnection will be returned.
The server you are connecting to just returns a couple lines of data. I retrieved them with the command nc vuln2014.picoctf.com 51818. There is no http response code like HTTP/1.1 200 OK:
Welcome to the Daedalus Corp Spies RSA Key Generation Service. The public modulus you should use to send your updates is below. Remember to use exponent 65537.
b4ab920c4772c5247e7d89ec7570af7295f92e3b584fc1a1a5624d19ca07cd72ab4ab9c8ec58a63c09f382aa319fa5a714a46ffafcb6529026bbc058fc49fb1c29ae9f414db4aa609a5cab6ff5c7b4c4cfc7c18844f048e3899934999510b2fe25fcf8c572514dd2e14c6e19c4668d9ad82fe647cf9e700dcf6dc23496be30bb
In this case I would use java.net.Socket to establish a connection and then read the lines. This is a simplistic approach that assumes there are 2 lines of data:
try (Socket theSocket = new Socket("vuln2014.picoctf.com", 51818);
     BufferedReader inFile = new BufferedReader(new InputStreamReader(theSocket.getInputStream()))) {
    String strGreet = inFile.readLine(); // first line: the greeting
    String strData = inFile.readLine();  // second line: the modulus
} catch (IOException e) {
    e.printStackTrace();
}
As for why curl and browsers may render it properly: they are likely more lenient about the data they read and will just dump whatever is read from the port, even if it doesn't conform to the specified protocol (such as HTTP).

open website code in java as illusion of web browser

I want to open a website in a web browser. I know that is easy, but I want to do it in a different way...
It is like a proxy server. I have written Java code that gets the content (source code) of a web page, and when the browser requests localhost on a particular port number, this code writes that source code to the browser. But instead of the rendered web page, I am getting the page's source code in the browser. I also want the request from my Java code to look like it comes from a browser, meaning the server should think the request was made by a browser and not by a Java console program.
import java.net.*;
import java.io.*;
public class URLConnectionReader {
public static void main(String args[]) throws Exception{
URL ul = null;
HttpURLConnection ulc = null;
ServerSocket server = null;
Socket client = null;
DataInputStream in = null;
DataOutputStream out = null;
String c = null;
server = new ServerSocket(9898);
System.out.println("Server is waiting for clients on port no 9898....");
while(client == null){
client = server.accept();
}
System.out.println("Connected.....");
out = new DataOutputStream(client.getOutputStream());
ul = new URL("http://www.google.com");
ulc = (HttpURLConnection)ul.openConnection();
in = new DataInputStream(ulc.getInputStream());
while((c = in.readLine())!=null){
out.writeBytes(c);
}
in.close();
out.close();
client.close();
}
}
Loading web pages is not quite as simple as you probably think. Both the browser and the server use a protocol called HTTP. In simple terms, the browser sends a request consisting of a request line, headers and sometimes data, and the server responds with a response line, headers and data. Most web pages also have related resources that need to be loaded for displaying the page (such as images, stylesheets and scripts), and each resource is loaded through a separate request.
Your program only accepts one request, completely ignores the details of the request, and then loads a fixed web page and sends it as the response. The way you are loading the web page (with a URL), you are only getting the data part of the response (the page source); the response line and the headers are missing. The headers are very important as one of them (named "Content-Type") specifies what kind of resource it is - web page, image or something else. Without it, browsers usually assume the data is plain text and display it accordingly.
So if you want your experiment to work better, you need to make sure you send a complete and valid HTTP response to the browser. You can probably reconstruct the response line and headers from the HttpURLConnection object. Or you can use sockets directly to load the web page.
A better solution would be to use a Java web server (such as Jetty) in which you'd run a servlet that loads the remote page using an HTTP client library (such as Apache HttpComponents) and does the necessary processing of addresses and headers. But... small steps :)
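As a rough sketch of the "complete and valid HTTP response" idea, the code below rebuilds a response line and a few headers from an HttpURLConnection and sends them ahead of the body. The class ResponseRelay and its methods buildResponse and relay are hypothetical names for this example. It assumes Java 9 or later (for readAllBytes) and buffers the whole body in memory. It also ignores upstream error statuses, for which getInputStream() throws.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

class ResponseRelay {

    // Builds a complete HTTP response: status line, headers, blank line, body.
    static byte[] buildResponse(String statusLine, String contentType, byte[] body) {
        String head = statusLine + "\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "Content-Length: " + body.length + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.ISO_8859_1);
        byte[] out = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(body, 0, out, headBytes.length, body.length);
        return out;
    }

    // Fetches the upstream response and relays it, headers included, to the client.
    static void relay(HttpURLConnection upstream, OutputStream client) throws IOException {
        String message = upstream.getResponseMessage();
        String statusLine = "HTTP/1.1 " + upstream.getResponseCode()
                + " " + (message == null ? "" : message);
        String contentType = upstream.getContentType();
        if (contentType == null) {
            contentType = "application/octet-stream"; // fallback when upstream sent none
        }
        byte[] body;
        try (InputStream in = upstream.getInputStream()) {
            body = in.readAllBytes(); // Java 9+; holds the whole body in memory
        }
        client.write(buildResponse(statusLine, contentType, body));
        client.flush();
    }

    public static void main(String[] args) {
        byte[] body = "<h1>hello</h1>".getBytes(StandardCharsets.UTF_8);
        byte[] response = buildResponse("HTTP/1.1 200 OK", "text/html; charset=utf-8", body);
        System.out.print(new String(response, StandardCharsets.UTF_8));
    }
}
```

In the program from the question, relay would be called with ulc and client.getOutputStream() in place of the readLine/writeBytes loop. Content-Length is recomputed from the buffered body rather than copied from upstream, because HttpURLConnection has already decoded any chunked transfer encoding by then.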

reading bytes from web site

I am trying to create a proxy server.
I want to read the websites byte by byte so that I can display images and everything else. I tried readLine but I can't display images. Do you have any suggestions for how I can change my code and send all the data to the browser with a DataOutputStream object?
try{
Socket s = new Socket(InetAddress.getByName(req.hostname), 80);
String file = parcala(req.url);
DataOutputStream out = new DataOutputStream(clientSocket.getOutputStream());
BufferedReader dis = new BufferedReader(new InputStreamReader(s.getInputStream()));
PrintWriter socketOut = new PrintWriter(s.getOutputStream());
socketOut.print("GET "+ req.url + "\n\n");
//socketOut.print("Host: "+req.hostname);
socketOut.flush();
String line;
while ((line = dis.readLine()) != null){
System.out.println(line);
}
}
catch (Exception e){}
}
Edited Part
This is what I am supposed to do. I can block banned web sites, but I can't let other web sites through in my program.
In the filter program, you will open a TCP socket at the specified port and wait for connections. If a request comes (i.e. the client types a URL to access a web site), the application will process it to decide whether access is allowed or not and then, using the same socket, it will send the reply back to the client. After the client opened her connection to WebPolice (and her request has been checked and is allowed), the real web page needs to be shown to the client. Therefore, since the user already gave her request, now it is WebPolice’s turn to forward the request so that the user can get the web page. Thus, WebPolice acts as a client and requests the web page. This means you need to open a connection to the web server (without closing the connection to the user), forward the request over this connection, get the reply and forward it back to the client. You will use threads to handle multiple connections (at the same time and/or at different times).
I don't know exactly what you're trying to do, but crafting an HTTP request and reading its response involves somewhat more than you have done here. readLine won't work on binary data anyway.
You can take a look at the URLConnection class (stolen here):
URL oracle = new URL("http://www.oracle.com/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
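Then you can read textual data from the in object. For binary data such as images, read bytes from yc.getInputStream() directly instead of wrapping it in a Reader.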
readLine treats each line as a String, so unless you want to mess around with converting back to bytes, I wouldn't recommend it.
I would just read bytes until there is nothing left to read and then write them out to a file. This should let you grab the images while keeping the file headers intact, which can be important when dealing with files other than text.
Hope this helps.
Instead of using BufferedReader you can try to use InputStream.
It has several methods for reading bytes.
http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html
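The byte-oriented reading suggested in these answers could look roughly like the sketch below. ByteCopy and copy are names invented for the example. In the proxy, the input would be the web server's socket stream (s.getInputStream()) and the output the browser's socket stream, while the demo in main uses in-memory streams so it runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopy {

    // Copies every byte from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns how many bytes it actually placed in the buffer,
        // or -1 once the stream is exhausted.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The first eight bytes of a PNG file: binary data that readLine would mangle.
        byte[] data = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println(copied + " bytes copied");
    }
}
```

Because nothing is decoded into characters, the bytes reach the browser unchanged, including the status line and headers sent by the web server.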

File not found exception while reading connection.getInputStream()

I am sending a request to a server URL and getting a FileNotFoundException, but when I open the same URL in a web browser it works fine.
URL url = new URL(serverUrl);
connection = getSecureConnection(url);
// Connect to server
connection.connect();
// Send parameters to server
writer = new BufferedWriter(new OutputStreamWriter(connection.getOutputStream(), "UTF-8"));
writer.write(parseParameters(CoreConstants.ACTION_PREFIX + actionName, parameters));
writer.flush();
// Read server's response
reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
When I call getInputStream, it throws the FileNotFoundException.
It is an .aspx Controller page.
If the request works fine in a browser but not in code, and you've verified that the URL is the same, then the problem probably has something to do with how you are sending your parameters to the server. Specifically, this part:
writer.write(parseParameters(CoreConstants.ACTION_PREFIX + actionName, parameters));
Perhaps there is a bug in the parseParameters() function?
But more generally, I would recommend using something a bit higher-level than a raw URLConnection. HtmlUnit and HttpClient are both fine choices, particularly since it seems like your request is a fairly simple one. I've used both to perform similar client/server interaction in a number of apps. I suggest revising your code to use one of these libraries, and then see if it still produces the error.
OK, I finally found that the problem was on the IIS side. It has been resolved in .NET 4.0; for previous versions, go to your web.config and set validateRequest="false".

No images displayed when website called from self written webserver

I have Java webserver (no standard software ... self written). Everything seems to work fine, but when I try to call a page that contains pictures, those pictures are not displayed. Do I have to send images with the output stream to the client? Am I missing an extra step?
As there is too much code to post it here, here is a little outline what happens or is supposed to happen:
1. client logs in
2. client gets a session id and so on
3. the client is connected with an output stream
4. we build the response with the HTML code for a certain 'GET' request
5. look what the GET-request is all about
6. send html response || file || image (not working yet)
So much for the basic outline ...
It sends css-files and stuff, but I still have a problem with images!
Does anybody have an idea? How can I send images from a server to a browser?
Thanks.
I check requests from the client and responses from the server with charles. It sends the files (like css or js) fine, but doesn't with images: though the status is "200 OK" the transfer-encoding is chunked ... I have no idea what that means!? Does anybody know?
EDIT:
Here is the file-reading code:
try{
File requestedFile = new File( file );
PrintStream out = new PrintStream( this.getHttpExchange().getResponseBody() );
// The file is sent:
InputStream in = new FileInputStream( requestedFile );
byte content[] = new byte[(int)requestedFile.length()];
in.read( content );
try{
// some header stuff
out.write( content );
}
catch( Exception e ){
e.printStackTrace();
}
in.close();
if(out!=null){
out.close();
System.out.println( "FILE " + uri + " SEND!" );
}
}
catch ( /*all exceptions*/ ) {
// catch it ...
}
Your browser will send separate GET /image.png HTTP/1.1 requests to your server, and you should handle these file requests too. There is no good browser-independent way to embed an image in HTML; only the data: URI scheme (<img src="data:image/png;base64,...">) is available in some browsers.
As you create your HTML response, you can include the contents of the external js/css files directly between <script></script> and <style></style> tags.
Edit: I advise to use Firebug for further diagnostics.
Are you certain that you send out the correct MIME type for the files?
If you need a tiny OpenSource webserver to be inspired by, then have a look at http://www.acme.com/java/software/Acme.Serve.Serve.html which serves us well for ad-hoc server needs.
Do I have to send those external files or images with the output stream to the client?
The client will make separate requests for those files, which your server will have to serve. However, those requests can arrive over the same persistent connection (a.k.a. keep-alive). The two most likely reasons for your problem:
The client tries to send multiple requests over a persistent connection (which is the default with HTTP 1.1) and your server is not handling this correctly. The easiest way to avoid this is to send a Connection: close header with the response.
The client tries to open a separate connection and your server isn't handling it correctly.
Edit:
There's a problem with this line:
in.read( content );
This method is not guaranteed to fill the array; it will read an arbitrary number of bytes and return that number. You have to use it in a loop to make sure everything is read. Since you have to do a loop anyway, it's a good idea to use a smaller array as a buffer to avoid keeping the whole file in memory and running into an OutOfMemoryError with large files.
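A possible shape for the file-sending step, combining the read loop described above with explicit response headers, is sketched below. It assumes the getHttpExchange() call in the question returns a com.sun.net.httpserver.HttpExchange, which the question does not confirm. FileResponder, contentTypeFor, and sendFile are hypothetical names, and the extension table is deliberately tiny.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

class FileResponder {

    // Hypothetical helper: picks a MIME type from the file extension.
    static String contentTypeFor(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".png")) return "image/png";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        if (lower.endsWith(".css")) return "text/css";
        if (lower.endsWith(".js")) return "text/javascript";
        if (lower.endsWith(".html") || lower.endsWith(".htm")) return "text/html";
        return "application/octet-stream";
    }

    // Sends the file with Content-Type and Content-Length, streaming it in small chunks.
    static void sendFile(HttpExchange exchange, Path file) throws IOException {
        long length = Files.size(file);
        exchange.getResponseHeaders().set("Content-Type",
                contentTypeFor(file.getFileName().toString()));
        // A positive length sends Content-Length; 0 selects chunked encoding
        // and -1 means "no body". Sketch limitation: an empty file goes out chunked.
        exchange.sendResponseHeaders(200, length);
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = exchange.getResponseBody()) {
            byte[] buffer = new byte[8192];
            int n;
            // read() may return fewer bytes than the buffer holds, so loop until -1.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("logo.PNG"));
    }
}
```

With this API, passing 0 as the length to sendResponseHeaders is what produces a chunked transfer encoding. Chunked responses are valid HTTP/1.1, so that alone would not normally stop a browser from showing an image.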
Probably step #4 is where you are going wrong:
// 4. we built the response with the HTML-Code for a certain 'GET'-request
Some of the requests will be a 'GET /css/styles.css' or 'GET /js/main.js' or 'GET /images/header.jpg'. Make sure you stream those files in those circumstances - try loading those URLs directly.
Images (and css/js files) are requested by the browser as completely separate GET requests to the page, so there's definitely no need to "send those ... with the output stream". So if you're getting pages served up ok, but images aren't being loaded, my first guess would be that you're not setting your response headers appropriately (for example, setting the Content-Type of the response to text/html), so the browser isn't interpreting it as a proper page & therefore not loading the images.
Some other things to try if that doesn't work:
Check if you can access an image directly
Use something like firebug or fiddler to check whether the browser is actually requesting the image/css/js files & that all your request/response headers look ok
Use an existing web server!
