How can I read a text file from the internet with Java? - java

I want to read the second line of the text at this URL: "http://vuln2014.picoctf.com:51818/" (this is a capture-the-flag competition but only asking for flags or direction to flags breaks the competition rules). I am attempting to open an input stream from the URL but I get an Invalid HTTP Response exception. Any help is appreciated, and I recognize that my error is likely quite foolish.
Code:
URL url = new URL("http://vuln2014.picoctf.com:51818");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
The error occurs at the third line.
java.io.IOException: Invalid Http response
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1342)
    at name.main(name.java:41)
curl happily gets the text from the page, and it is perfectly accessible from a web browser.

When you do this:
URL url = new URL("http://vuln2014.picoctf.com:51818");
URLConnection con = url.openConnection();
You are entering into a contract that says that this URL uses the http protocol. When you call openConnection it expects to get http responses because you used http:// in the URL as the protocol. The Java Documentation says:
If for the URL's protocol (such as HTTP or JAR), there exists a public, specialized URLConnection subclass belonging to one of the following packages or one of their subpackages: java.lang, java.io, java.util, java.net, the connection returned will be of that subclass. For example, for HTTP an HttpURLConnection will be returned, and for JAR a JarURLConnection will be returned.
The server you are connecting to just returns a couple of lines of data. I retrieved them with the command nc vuln2014.picoctf.com 51818. There is no HTTP status line such as HTTP/1.1 200 OK:
Welcome to the Daedalus Corp Spies RSA Key Generation Service. The public modulus you should use to send your updates is below. Remember to use exponent 65537.
b4ab920c4772c5247e7d89ec7570af7295f92e3b584fc1a1a5624d19ca07cd72ab4ab9c8ec58a63c09f382aa319fa5a714a46ffafcb6529026bbc058fc49fb1c29ae9f414db4aa609a5cab6ff5c7b4c4cfc7c18844f048e3899934999510b2fe25fcf8c572514dd2e14c6e19c4668d9ad82fe647cf9e700dcf6dc23496be30bb
In this case I would use java.net.Socket to establish a connection and then read the lines. This is a simplistic approach that assumes there are 2 lines of data:
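// needs: java.io.BufferedReader, java.io.IOException, java.io.InputStreamReader, java.net.Socket
// try-with-resources closes the socket and reader even if reading fails
try (Socket theSocket = new Socket("vuln2014.picoctf.com", 51818);
     BufferedReader inFile = new BufferedReader(new InputStreamReader(theSocket.getInputStream()))) {
    String strGreet = inFile.readLine(); // first line: the greeting
    String strData = inFile.readLine();  // second line: the data
} catch (IOException e) {
    e.printStackTrace();
}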
As for why curl and browsers render it properly: they are likely more lenient about the data they read and will simply dump whatever arrives on the port, even if it doesn't conform to the specified protocol (such as HTTP).
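To try the socket approach without depending on the remote host, here is a minimal, self-contained sketch. It assumes a throwaway local server that writes two plain lines with no HTTP status line; the class name RawTextDemo and the sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and sample text are illustrative only.
class RawTextDemo {

    // Connects with a plain TCP socket and returns the second line the server sends
    // (null if the server sends fewer than two lines).
    static String readSecondLine(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            in.readLine();        // first line: greeting, discarded
            return in.readLine(); // second line: the data we want
        }
    }

    // Starts a one-shot local server that writes two raw lines and closes,
    // then reads the second line back through readSecondLine.
    static String demo() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket client = server.accept();
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    out.println("Welcome to the demo service.");
                    out.println("deadbeef");
                } catch (IOException ignored) {
                    // the server socket was closed before a client connected
                }
            });
            serverThread.start();
            String second = readSecondLine(
                    server.getInetAddress().getHostAddress(), server.getLocalPort());
            serverThread.join();
            return second;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Pointing readSecondLine at the real host and port from the question would follow the same pattern, assuming the service still sends exactly two lines.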

Related

get protocol from URL if not typed in

I am trying to find a way to get the protocol from a URL that the user types in. I have an EditText set as uri in an Android layout file. The user types in their web address as www.theirsite.com or theirsite.com.
Now how can I get the correct protocol from what they have typed in? It seems everywhere I look that you need to have either https:// or http:// as the protocol in the http request. I get a malformed exception when I don't have a protocol for their web address.
Is there a way to check the URL without having the need to have the protocol when they typed their address? So in essence, do I need to ask the User to type in the protocol as part of the URL? I would prefer to do it programmatically.
/**
 * Returns true if a connection to protocol + url succeeds.
 * The protocol value could be "http://" or "https://".
 */
boolean usesProtocol(String url, String protocol) {
    try {
        URL u = new URL(protocol.concat(url));
        URLConnection con = u.openConnection();
        con.connect();
        // reached only if the connection was established
        return true;
    } catch (MalformedURLException e) {
        // new URL() failed: the combined string is not a valid URL
        return false;
    } catch (IOException e) {
        // connect() failed: the host is unreachable or does not answer on this protocol
        return false;
    }
}
I believe the code is self-explanatory. The above code uses no external dependencies. If you do not mind using JSoup, there is another answer on SO that deals with the same problem: Java how to find out if a URL is http or https?
My Source: http://docs.oracle.com/javase/tutorial/networking/urls/connecting.html
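If the goal is only to tolerate input typed without a scheme, one alternative is to normalize the string before building the URL. A minimal sketch, assuming a hypothetical helper named withDefaultScheme and https as the default to try first (neither comes from the answer above):

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative.
class SchemeNormalizer {

    // Matches a leading scheme such as "http://" or "https://".
    private static final Pattern HAS_SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*://");

    // Returns the input unchanged if it already starts with a scheme,
    // otherwise prepends defaultScheme + "://".
    static String withDefaultScheme(String input, String defaultScheme) {
        String trimmed = input.trim();
        return HAS_SCHEME.matcher(trimmed).find() ? trimmed : defaultScheme + "://" + trimmed;
    }

    // Parses the normalized string; throws URISyntaxException on malformed input.
    static URI parse(String input, String defaultScheme) throws URISyntaxException {
        return new URI(withDefaultScheme(input, defaultScheme));
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(parse("www.theirsite.com", "https"));
        System.out.println(parse("http://theirsite.com", "https"));
    }
}
```

This only fixes the string. Finding out whether the site actually answers on https or http still takes a connection attempt.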

Socket versus URL website access

I have a Java application which opens an existing company's website using the Socket class:
Socket sockSite;
InputStream inFile = null;
BufferedWriter out = null;
try
{
sockSite = new Socket( presetSite, 80 );
inFile = sockSite.getInputStream();
out = new BufferedWriter( new OutputStreamWriter(sockSite.getOutputStream()) );
}
catch ( IOException e )
{
...
}
out.write( "GET " + presetPath + " HTTP/1.1\r\n\r\n" );
out.flush();
I would read the website with the stream inFile and life is good.
Recently this started to fail. I was getting an HTTP 301 "site has moved" error but no moved-to link. The site still exists and responds using the same original HTTP reference and any web browser. But the above code comes back with the HTTP 301.
I changed the code to this:
URL url;
InputStream inFile = null;
try
{
url = new URL( presetSite + presetPath );
inFile = url.openStream();
}
catch ( IOException e )
{
...
}
I then read the site from the inFile stream using the original code, and it works again.
This difference doesn't just occur in Java but it also occurs if I use Perl (using IO::Socket::INET approach opening the website port 80, then issuing a GET fails, but using LWP::Simple method get just works). In other words, I get a failure if I open the web page first with port 80, then do a GET, but it works fine if I use a class which does it "all at once" (that just says, "get me web page with such-and-such an HTTP address").
I thought I'd try the different approaches on http://www.microsoft.com and got an interesting result. When I opened port 80 and then issued GET /, I received an HTTP 200 response page that said, "Your current user agent appears to be from an automated process...". But if I use the second method (URL class in Java, or LWP in Perl) I simply get their web page.
So my question is: how does the URL class (in Java) or the LWP module (in Perl) do its thing under the hood that makes it different from opening the website on port 80 and issuing a GET?
Most servers require the Host: header to allow virtual hosting (multiple domains on one IP). Without it, the server cannot tell which site you are asking for and may answer with a redirect or an error.
If you use packet capturing software to see what's being sent when URL is used, you'll realize that there's a lot more than just "GET /" being sent. All sorts of additional header information is included. If a server gets just a simple "GET /", it's easy to deduce that it can't be a very sophisticated client on the other end.
Also, HTTP 1.0 is "outdated"; the current version is 1.1.
Java's URL implementation delegates to HttpURLConnection if the URL starts with "http:".
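To illustrate the Host header point, here is a minimal sketch of the request text a raw socket client could send. The helper name buildGetRequest and the User-Agent value are assumptions for illustration; real clients such as HttpURLConnection send more headers than this.

```java
// Hypothetical helper; shows a small HTTP/1.1 GET suitable for a raw socket.
class RawHttpRequest {

    // Builds a GET request that includes the Host header HTTP/1.1 requires.
    // "Connection: close" asks the server to close the socket after the response,
    // so a simple read-until-EOF loop terminates.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: example-client/1.0\r\n" // illustrative value, not a real product
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        System.out.print(buildGetRequest("www.example.com", "/"));
    }
}
```

In the question's code this would replace the hand-written string, for example out.write(RawHttpRequest.buildGetRequest(presetSite, presetPath)). A raw socket still does not follow a 301 by itself; the client has to read the Location header and issue a new request.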

Getting error 502/504 when trying to get InputStream

URL queryUrl = new URL(url);
InputStream inputStream = null;
HttpsURLConnection connection = (HttpsURLConnection) queryUrl.openConnection();
connection.setRequestProperty("User-Agent", "My Client");
connection.setRequestMethod("GET");
connection.setDoInput(true);
inputStream = connection.getInputStream();
Hi,
I'm using the above code to perform an HttpGet query.
Once every few tries, I get an exception saying that the server returned error code 502 or 504 (both occur).
The exception is thrown in the line :
inputStream = connection.getInputStream()
Any ideas?
Your help would be appreciated.
Thanks in advance
A 5xx error code indicates a problem with the server or a proxy. The RFC states:
Response status codes beginning with the digit "5" indicate cases in which the server is
aware that it has erred or is incapable of performing the request. Except when responding
to a HEAD request, the server SHOULD include an entity containing an explanation of the
error situation, and whether it is a temporary or permanent condition. User agents SHOULD
display any included entity to the user. These response codes are applicable to any request method.
Check what the actual error is by reading the error stream of the URL connection, as below:
If the HTTP response code is 4nn (Client Error) or 5nn (Server Error), then you may want to read the HttpURLConnection#getErrorStream() to see if the server has sent any useful error information.
InputStream error = ((HttpURLConnection) connection).getErrorStream();
Also, for working with HTTP requests, you can use Apache HttpClient instead of working with HttpURLConnection directly. Using HttpClient is a lot easier.
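As a minimal sketch of the error-stream advice, the helper below picks getErrorStream() for 4xx/5xx responses and getInputStream() otherwise. The class and method names are hypothetical, and it assumes the body is UTF-8 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class and method names are illustrative.
class ResponseReader {

    // Returns the response body as text. For 4xx/5xx responses it reads
    // getErrorStream(), because getInputStream() would throw an IOException.
    static String readBody(HttpURLConnection connection) throws IOException {
        int status = connection.getResponseCode();
        InputStream stream = status >= 400
                ? connection.getErrorStream()
                : connection.getInputStream();
        if (stream == null) {
            return ""; // the server sent no body with the error
        }
        try (InputStream in = stream) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumption: the body is UTF-8 text.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Since 502 and 504 usually come from a gateway or proxy in front of the real server, logging this body together with the status code may show which hop failed.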

How do I reset a URL connection?

I use a URL connection to download a stream from the Internet. But after I reset the modem, I can't continue downloading the stream because it fails with the error: Connection reset. How do I solve this?
Here is my code:
URL url = new URL(_URL);
HttpURLConnection hUC = (HttpURLConnection) url.openConnection();
hUC.connect();
InputStream is = hUC.getInputStream(); // presumably the stream the loop reads from
while (true) {
    if ((_data.num = is.read(_data.b)) == -1) {
        break;
    }
    // write to file
    fos.write(_data.b, 0, _data.num);
}
You can't - at least, not how you may be expecting.
Instead, you need to handle your exception, and determine how much data you've already read. Once your Internet connection is re-established - assuming that the HTTP server you're downloading from supports requestable byte ranges - you can then set custom HTTP Headers on the request and re-download the remaining portions. (This will require a new HttpURLConnection.)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35 shows the related HTTP specifications involved to make this work.
This is a bit more complicated if you're looking for a "resume" type feature.
You would need to reissue the request once the internet comes back after a disconnect, and add a header to the request in order to resume the download at the byte number where you left off.
You need to set the Range property in the request header in order to specify how far in you're resuming. Then you would just continue to write to the "fos" object from there.
Check out this url: Java: resume Download in URLConnection
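The resume idea described in these answers can be sketched roughly as follows. This is a minimal example under stated assumptions: the class and method names are hypothetical, the partial file on disk is used as the record of how many bytes were already read, and the server is assumed to support byte ranges.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; class and method names are illustrative.
class ResumableDownload {

    // Range header value asking for everything from byte offset onward.
    static String rangeFrom(long offset) {
        return "bytes=" + offset + "-";
    }

    // Continues a download into target, starting at the number of bytes already on disk.
    // Each call opens a new connection.
    static void resume(URL url, File target) throws IOException {
        long alreadyRead = target.exists() ? target.length() : 0L;
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (alreadyRead > 0) {
            conn.setRequestProperty("Range", rangeFrom(alreadyRead));
        }
        // 206 Partial Content means the server honored the range, so we append.
        // A plain 200 means it is sending the whole file again, so we overwrite.
        boolean append = conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL;
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(target, append)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A caller would wrap resume in a retry loop that catches IOException (including "Connection reset") and calls it again once the network is back. If the file is already complete, the server may reply with 416 and getInputStream() will throw, so that case needs separate handling.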

Java http call returning response code: 501

I am having an issue with this error:
Server returned HTTP response code: 501 for URL: http://dev1:8080/data/xml/01423_01.xml
See this code:
private static Map sendRequest(String hostName, String serviceName) throws Exception {
Map assets = null;
HttpURLConnection connection = null;
Authenticator.setDefault(new Authenticator());
URL serviceURL = new URL(hostName + "/" + serviceName);
connection = (HttpURLConnection)serviceURL.openConnection();
connection.setRequestMethod("GET");
ClientHttpRequest postRequest = new ClientHttpRequest(connection);
InputStream input = null;
/*
At line input = postRequest.post(); I get the following error
Server returned HTTP response code: 501 for URL: http://dev1:8080/data/xml/01423_01.xml
Yet if I enter that url in my browser it opens up fine.
Is this a common problem? Is there some type of content type I need to set?
*/
input = postRequest.post();
connection.disconnect();
return assets;
}
A 501 response means "not implemented", and is usually taken to mean that the server didn't understand the HTTP method that you used (e.g. get, post, etc).
I don't recognise ClientHttpRequest, but you have a line that says
connection.setRequestMethod("GET");
and then a line that says
input = postRequest.post();
I'm not sure what post() actually does, but does that mean send a POST request? If so, then that contradicts the GET specified in the first line.
Either way, the server is saying that it doesn't understand the GET or the POST method, whichever one your code is actually sending. You need to find out which method the server does support for that URL, and use that.
Perhaps you should check your port settings:
new URL(hostName + "/" + serviceName);
Looks like the port number ":8080" is missing.
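One way to check which port a URL string will actually use is to parse it and inspect the result. A minimal sketch using only standard java.net calls (the class and method names are hypothetical); note that the error message in the question already shows :8080 in the URL, so the port may well have been part of hostName:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class; only standard java.net calls are used.
class PortCheck {

    // Returns the port the URL would connect to: the explicit port if present,
    // otherwise the protocol default (80 for http). URL.getPort() returns -1
    // when the string contains no port.
    static int effectivePort(String spec) throws MalformedURLException {
        URL url = URI.create(spec).toURL();
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(effectivePort("http://dev1:8080/data/xml/01423_01.xml"));
        System.out.println(effectivePort("http://dev1/data/xml/01423_01.xml"));
    }
}
```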
Some servers expect additional information from the client in the request, such as a user agent or some form data. The application running on the server could even expect cookies. You should also check the complete response and not only the response code.
I would recommend using a library like httpclient, which is more convenient:
https://hc.apache.org/httpcomponents-client-ga/index.html
Here is simple usage example:
https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/test/java/org/apache/hc/client5/http/examples/ClientWithResponseHandler.java
