private String indexPage(URL currentPage) throws IOException {
    InputStream is = currentPage.openStream();
    String content = new Scanner( is ).useDelimiter( "\\Z" ).next();
    return content;
}
This is the function I'm currently using to crawl webpages. The line that causes the problem is:
content = new Scanner( is ).useDelimiter( "\\Z" ).next();
If the webpage doesn't answer or takes a long time to answer, my thread just hangs at the above line. What's the easiest way to abort this function if it takes longer than 5 seconds to fully load that stream?
Thanks in advance!
Instead of struggling with a separate watcher thread, it might be enough for you (although not exactly an answer to your requirement) if you enable connect and read timeouts on the network connection, e.g.:
URL url = new URL("...");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setConnectTimeout(5000);
conn.setReadTimeout(10000);
InputStream is = conn.getInputStream();
This example will fail if it takes more than 5 seconds (5000ms) to connect to the server or if you have to wait more than 10 seconds (10000ms) between any content chunks which are actually read. It does not however limit the total time you need to retrieve the page.
You can close the stream from another thread.
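A minimal sketch of that idea (the URL and the 5-second limit are just placeholders): a watchdog thread closes the stream after the deadline, so the blocked Scanner call stops waiting instead of hanging forever:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// ...
final InputStream is = new URL("http://example.com/").openStream();
ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
// After 5 seconds, close the stream from the watchdog thread; the
// Scanner blocked in next() then stops waiting (it returns the partial
// content read so far or throws NoSuchElementException).
watchdog.schedule(new Runnable() {
    public void run() {
        try { is.close(); } catch (IOException ignored) { }
    }
}, 5, TimeUnit.SECONDS);
try {
    String content = new Scanner( is ).useDelimiter( "\\Z" ).next();
    System.out.println("Read " + content.length() + " chars");
} finally {
    watchdog.shutdownNow();
}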
Google's recently released guava-libraries have some classes that offer similar functionality:
TimeLimiter:
Produces proxies that impose a time limit on method calls to the proxied object. For example, to return the value of target.someMethod(), but substitute DEFAULT_VALUE if this method call takes over 50 ms, you can use this code ...
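For example, a rough sketch using SimpleTimeLimiter (this is the API of recent Guava versions; older releases used new SimpleTimeLimiter() with an extra amInterruptible flag), wrapping the indexPage() method from the question with a placeholder URL:

import com.google.common.util.concurrent.SimpleTimeLimiter;
import com.google.common.util.concurrent.TimeLimiter;
import java.net.URL;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// ...
ExecutorService executor = Executors.newCachedThreadPool();
TimeLimiter limiter = SimpleTimeLimiter.create(executor);
try {
    // run indexPage() with a hard 5-second limit
    String content = limiter.callWithTimeout(new Callable<String>() {
        public String call() throws Exception {
            return indexPage(new URL("http://example.com/"));
        }
    }, 5, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    // the call took longer than 5 seconds
} catch (Exception e) {
    // the call itself failed (ExecutionException etc.)
}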
Have a look at FutureTask...
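For instance, a sketch along these lines, again assuming the indexPage() method from the question and a placeholder URL:

import java.net.URL;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// ...
ExecutorService executor = Executors.newSingleThreadExecutor();
FutureTask<String> task = new FutureTask<String>(new Callable<String>() {
    public String call() throws Exception {
        return indexPage(new URL("http://example.com/"));
    }
});
executor.execute(task);
try {
    // wait at most 5 seconds for the result
    String content = task.get(5, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    // give up; note that cancel(true) merely interrupts the worker
    // thread, and a blocking socket read may ignore the interrupt
    task.cancel(true);
} catch (Exception e) {
    // InterruptedException / ExecutionException
}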
Try to interrupt the thread; many blocking calls in Java stop blocking and continue when they receive an interrupt.
In this case, content should be empty and Thread.isInterrupted() should be true.
It is very common to get an InputStream from URLConnection like so:
InputStream stream = connection.getInputStream();
where connection is a URLConnection.
https://developer.android.com/reference/java/net/URLConnection.html#getInputStream()
I have a situation where I would like to reuse a stream. I have access to the connection object, and my question now is: does a single instance of a URLConnection return the 'same' InputStream every time? That is, if I call connection.getInputStream() again (but on the same connection object), will I get a new stream?
In general, the answer to this kind of question ("Does X do Y?") is: Does the documentation say X does Y? If so, yes (or it's broken); if not, you have no guarantee so and must assume not.
So let's look at URLConnection#getInputStream's documentation:
Returns an input stream that reads from this open connection. A SocketTimeoutException can be thrown when reading from the returned input stream if the read timeout expires before data is available for read.
Returns:
an input stream that reads from this open connection.
So you can't rely on it doing so, because it doesn't promise to. (I also looked elsewhere in the JavaDoc for URLConnection.)
(My tests outside Android suggest that HttpURLConnection does at least sometimes, though.)
URLConnection is an abstract class, and it depends entirely on the concrete implementation whether getInputStream returns a new stream or not.
The best way to verify this is...
URLConnection con = new URL(MY_URL_STRING).openConnection();
InputStream in1 = con.getInputStream();
InputStream in2 = con.getInputStream();
// InputStream doesn't override equals(), so this is an identity (==) check
boolean streamEquals = in1.equals(in2);
It doesn't matter. It wouldn't make any difference whether it did or it didn't. Consider a method that always returns new DataInputStream(socket.getInputStream()). (I don't claim it is implemented like that: we are just considering.) There is nothing practical you can do with the stream short of comparing it with == that would tell you whether it was a new stream each time like that or always the same stream. What you read out of it is not affected.
Would the following be an appropriate way of dealing with a 503 response code in Java networking? What does this code do, specifically the call to disconnect() and the assignment to null?
URL url = new URL("...");
HttpURLConnection h = (HttpURLConnection) url.openConnection();
int x = h.getResponseCode();
while (x == 503) {
    h.disconnect();
    h = null;
    h = (HttpURLConnection) url.openConnection();
    x = h.getResponseCode();
}
The disconnect() closes the underlying TCP socket.
Setting the local variable to null immediately before reassigning it accomplishes nothing whatsoever.
There should be a sleep in that loop, with an interval that increases on every failure, and a limited number of retries.
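A sketch of that pattern (connectWithRetry is a hypothetical helper; the retry count and delays are arbitrary):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

static HttpURLConnection connectWithRetry(URL url)
        throws IOException, InterruptedException {
    long delayMs = 1000;
    for (int attempt = 0; attempt < 5; attempt++) {
        HttpURLConnection h = (HttpURLConnection) url.openConnection();
        if (h.getResponseCode() != 503) {
            return h; // anything but 503: let the caller handle it
        }
        h.disconnect();
        Thread.sleep(delayMs); // wait before retrying
        delayMs *= 2;          // back off: 1s, 2s, 4s, 8s, 16s
    }
    throw new IOException("Service still unavailable after retries");
}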
Whatever you want it to do is an appropriate way. To make something fail-safe, it would be better to repeat until success is achieved rather than only handling the 503 scenario.
Simplest example: loop until a 200 (success) code comes back.
(Better still would be to abstract that out into methods and classes, and to use OOP and unit tests where possible.)
I am going to read from a socket in Java. Here is what I am going to do:
System.out.println("Start Reading");
/* bab is socket connector */
/* and readLine is the method below.
/* public String readLine()throws IOException
{
String a = inStream.readLine();
return a;
}
*/
for( int j=0;j<9;j++)
{
response = bab.readLine();
System.out.println(response);
}
I see a long delay (2-3 seconds) between "Start Reading" being printed and the first line of the response. But when I requested the same page with Firefox, it responded quickly (20 ms). What is the problem, and how can I solve it?
I suspect the reason is that the server doesn't send the line delimiter for some time, so the readLine() method waits for it. I bet if you just do readByte() it will be quick.
Since Firefox, like any other browser, doesn't read line by line, this doesn't affect it.
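A quick way to test that theory (assuming inStream from the question exposes a read() method, as both InputStream and Reader do):

// print each byte as soon as it arrives instead of waiting for a newline
int b;
while ((b = inStream.read()) != -1) {
    System.out.print((char) b);
}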
Firefox is probably caching the response and is therefore able to display it very quickly to you. I suggest you clear the cache on Firefox and time it again.
If you are using a domain name for the call then Firefox will also cache the DNS lookup which could save time in Firefox whereas making the call in Java could require a DNS lookup.
If you are using Windows then download Fiddler which will allow you to monitor the HTTP connection and give you a better idea of what is happening.
I have used Spring's TaskExecutor and futures to get the Future from the task, but now when a task doesn't respond within some time I have to stop it. I have tried using the Future.cancel() method, but it's of no use; the thread still seems to be running. Can you suggest some method to stop it?
My thread is waiting on reading the HTML content from a link, so how do I stop the task?
Edit: the relevant code:
URL link = new URL(Content);
URLConnection yc = link.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
StringBuffer Sbuff = new StringBuffer();
while (((inputLine = in.readLine()) != null) &&
((Calendar.getInstance().getTimeInMillis() - TimeatStart) / 1000) < 4) {
Sbuff.append(inputLine);
}
Generally speaking, you don't do that. If you want the HTTP request to stop after hanging for a while, then set a timeout on it. It's always best to let the thread finish whatever work it's been given to do.
Edit: Based on your source code, first make sure the connection is actually established; getInputStream() connects implicitly, but calling connect() on the URLConnection makes that step explicit. Take a look at Working with URLs in the Java Tutorial. After that, if you still want to set timeouts, use setConnectTimeout() to limit how long it may try to connect before timing out and setReadTimeout() to limit how long it should wait for data after a request is sent.
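Applied to the code above, that could look like this (the timeout values are just examples):

URLConnection yc = link.openConnection();
yc.setConnectTimeout(5000); // give up if connecting takes more than 5s
yc.setReadTimeout(4000);    // give up if a read blocks for more than 4s
yc.connect();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));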
I wrote a simple downloader as a Java applet. During some tests I discovered that my way of downloading files is not nearly as robust as, e.g., Firefox's way of doing it.
My code:
URL u = new URL(...);
URLConnection uc = u.openConnection();
InputStream is = uc.getInputStream();
FileOutputStream os = new FileOutputStream(...);
final byte[] buf = new byte[1024];
for (int count = is.read(buf); count != -1; count = is.read(buf)) {
    os.write(buf, 0, count);
}
Sometimes my applet works fine, sometimes unexpected things happen: e.g. from time to time, in the middle of a download, the applet throws an IOException or simply loses the connection for a while, with no way to return to the current download and finish it.
I know the really advanced approaches are too complicated for a single inexperienced Java programmer, but maybe you know some techniques to minimize the risk of these problems appearing.
So you want to resume your download.
If you get an IOException on reading from the URL, there was a problem with the connection.
This happens. Now you must note how much you have already downloaded and open a new connection that starts from there.
To do this, use setRequestProperty() on the second connection and send the right header fields for "I want only the range of the resource starting with ...". See section 14.35.2, Range Retrieval Requests, in the HTTP 1.1 specification. You should check the header fields of the response to see whether you really got back a range, though.
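A minimal sketch of such a resume (resumeDownload is a hypothetical helper; bytesSoFar is how much has already been written to disk, and the buffer size is arbitrary):

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

static void resumeDownload(URL url, File target, long bytesSoFar)
        throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // ask only for the part of the resource we don't have yet
    conn.setRequestProperty("Range", "bytes=" + bytesSoFar + "-");
    // 206 Partial Content means the server honored the range; anything
    // else (usually 200) means it is resending the whole file
    if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
        bytesSoFar = 0;
    }
    InputStream in = conn.getInputStream();
    RandomAccessFile out = new RandomAccessFile(target, "rw");
    try {
        out.seek(bytesSoFar);
        byte[] buf = new byte[8192];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
    } finally {
        in.close();
        out.close();
    }
}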