I use this code snippet to read text from a webpage and save it to a string.
I would like readLine() to start from the beginning again, so that it reads the content of the webpage a second time. How can I do that?
if (response == httpURLConnection.HTTP_OK) {
in = httpURLConnection.getInputStream();
isr = new InputStreamReader(in);
br = new BufferedReader(isr);
while ((line = br.readLine()) != null) {
fullText += line;
}
// I want to go through the webpage source again, but
// I can't because br.readLine() now returns null. How can I
// put a marker at the beginning of the page?
while ((line1 = br.readLine()) != null) {
fullText1 += line1;
// It will not go into this loop
}
}
You can only mark a position for a Reader (and return to it with reset()) if markSupported() returns true, and I very much doubt that the stream returned by httpURLConnection.getInputStream() supports marks.
The best option, I think, is to read the response into a buffer and then you can create as many readers as you like over that buffer. You will need to include the line termination characters (which you are currently discarding) to preserve the line structure. (Alternatively, you can read the response into a List<String> rather than into a single String.)
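The buffering approach described above could be sketched roughly as follows. This is only an illustration: the class RereadableResponse and its helpers readFully, newReader and readAllLines are hypothetical names, UTF-8 is an assumed charset, and a ByteArrayInputStream stands in for httpURLConnection.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RereadableResponse {
    // Drain the stream once into memory (hypothetical helper, not a JDK method).
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Every call returns a fresh reader positioned at the start of the buffered bytes.
    static BufferedReader newReader(byte[] body, Charset charset) {
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(body), charset));
    }

    // Concatenate all lines, re-adding a terminator because readLine() strips it.
    static String readAllLines(BufferedReader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for httpURLConnection.getInputStream(); the charset is an assumption.
        InputStream response = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>\n".getBytes(StandardCharsets.UTF_8));
        byte[] body = readFully(response);
        String fullText = readAllLines(newReader(body, StandardCharsets.UTF_8));
        String fullText1 = readAllLines(newReader(body, StandardCharsets.UTF_8));
        System.out.println(fullText.equals(fullText1));
    }
}
```

The trade-off is that the whole response is held in memory, which is usually fine for a single web page.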
From "InputStream will not reset to beginning":
Check with the markSupported() method whether your InputStream actually supports mark. According to the API, the InputStream class doesn't, but the java.io.BufferedInputStream class does. Maybe you should embed your stream inside a BufferedInputStream object like:
InputStream data = new BufferedInputStream(realResponse.getEntity().getContent());
// data.markSupported() should return "true" now
data.mark(some_size);
// work with "data" now
...
data.reset();
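For reference, here is a self-contained sketch of the same mark()/reset() idea. The class MarkResetDemo and its methods are hypothetical, and a ByteArrayInputStream stands in for the real response stream. Note that the read limit passed to mark() has to cover everything read before reset() is called, otherwise reset() may fail.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MarkResetDemo {
    // Read the whole stream twice by marking the start and resetting after the first pass.
    static String[] readTwice(InputStream raw, int readLimit) throws IOException {
        InputStream data = new BufferedInputStream(raw);
        if (!data.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        data.mark(readLimit);          // remember the current position (the start)
        String first = drain(data);    // first pass
        data.reset();                  // jump back to the marked position
        String second = drain(data);   // second pass sees the same bytes
        return new String[] { first, second };
    }

    // Read everything that is left in the stream and decode it as UTF-8 (assumed charset).
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[64];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        String[] passes = readTwice(new ByteArrayInputStream(page), 1024);
        System.out.println(passes[0].equals(passes[1]));
    }
}
```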
Related
I am trying to parse HTML from a website to get very specific data. The following method reads the source and outputs it as a string to be processed by other methods.
StringBuilder source = new StringBuilder();
URL url = new URL(urlIn);
URLConnection spoof;
spoof = url.openConnection();
spoof.setRequestProperty( "User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)" );
BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()));
String strLine = "";
while ((strLine = in.readLine()) != null){
source.append(strLine);
}
return source.toString();
The problem that I'm having is that since I call this method multiple times with a different urlIn argument each time, sometimes the method gets stuck at the readLine command. I read that this is because readLine looks for a line break and if the BufferedReader object does not contain one for whatever reason, it will be stuck indefinitely.
Is there a way to check whether my BufferedReader object contains a line break before I run the readLine command? I tried using if (in.toString().contains("\n")), but that always returns false. Alternatively, could I add a "\n" at the end of my BufferedReader "in" object every time, just so that the while loop would break and not hang indefinitely?
Any help would be appreciated.
Okay, this here should be what you are looking for.
fis = new FileInputStream("C:/sample.txt");
reader = new BufferedReader(new InputStreamReader(fis));
System.out.println("Reading File line by line using BufferedReader");
String line = reader.readLine();
while(line != null){
System.out.println(line);
line = reader.readLine();
}
Read more: http://javarevisited.blogspot.com/2012/07/read-file-line-by-line-java-example-scanner.html#ixzz3g4RHvy6V
Edit: in your case, since it seems like you are doing webapp testing, I believe WebDriverWait may work for your needs.
This is not true. BufferedReader.readLine() will not block if the underlying stream has reached the end of input. It will return null. See http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#readLine().
If your method is getting stuck there is another explanation.
Carefully check all of your exception handling and stream closing logic.
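To illustrate the point about readLine() at end of input, here is a small sketch using an in-memory StringReader as a stand-in for the network stream (the class and method names are hypothetical). The last line has no trailing line break, yet the loop still terminates because readLine() returns that line and then null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReadLineAtEof {
    // Collect every line; the loop ends when readLine() returns null at end of input.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Note: no "\n" after the final line.
        System.out.println(readLines(new StringReader("first\nlast line without newline")));
    }
}
```

A network stream differs in that end of input is only reached once the server closes the connection or the response is complete, so a hang there points at the connection rather than at a missing line break.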
I have following piece of code :
fis = new FileInputStream(new File(st[0]));
br = new BufferedReader(new InputStreamReader(fis));
while(fis.available()!=-1)
{
System.out.println(br.readLine());
System.out.println(fis.available());
}
The first println statement prints the whole of my file, but the second println statement always shows 0. Why is it showing 0 when there is actual content to read?
And what should I use as the end condition here?
You want to stop when readLine() returns null, something like this:
String sCurrentLine;
br = new BufferedReader(new FileReader("C:\\testing.txt"));
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
}
The first println statement prints the whole of my file, but the second println statement always shows 0.
You're checking available() twice. After you've read some data, it's no longer available to read, so the available() value printed is different to the one used for the loop condition above.
Secondly, you're reading from the BufferedReader, which does its own buffering of the data from the input stream. That means it's wrong to then sneak around the reader's back to call the available method of the underlying input stream!
Try this:
for (;;) {
String line = br.readLine();
if (line == null) break;
System.out.println(line);
}
available() returns an estimate of the number of bytes that can be read from that InputStream without blocking. Your readLine() call has most likely already pulled the file's bytes into the reader's own buffer, so none remain in the underlying stream.
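The effect can be reproduced in memory. The sketch below (hypothetical class AvailableDemo, with a ByteArrayInputStream standing in for the FileInputStream) records available() before and after the first readLine(). With current OpenJDK implementations a small input is consumed entirely by the reader's first fill, although the API does not promise an exact buffer size.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class AvailableDemo {
    // Returns { available() before any read, available() after the first readLine() }.
    static int[] availableBeforeAndAfterFirstLine(byte[] content) throws IOException {
        InputStream stream = new ByteArrayInputStream(content);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        int before = stream.available();
        reader.readLine();               // the reader fills its buffer from the stream here
        int after = stream.available();  // bytes still left in the underlying stream
        return new int[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8);
        int[] counts = availableBeforeAndAfterFirstLine(content);
        System.out.println("before=" + counts[0] + ", after=" + counts[1]);
    }
}
```

Only the first line has been returned to the caller at that point, but the remaining lines already sit in the reader's buffer, which is why the stream itself reports nothing left.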
I am trying to read text from a web document using a BufferedReader over an InputStreamReader on an URL (to the file on some Apache server).
String result = "";
URL url = new URL("http://someserver.domain/somefile");
BufferedReader in = null;
in = new BufferedReader(new InputStreamReader(url.openStream(), "iso-8859-1"));
result += in.readLine();
Now this works just fine. But obviously I'd like the reader to read not just one line, but as many as there are in the file.
Looking at the BufferedReader API the following code should do just that:
while (in.ready()) {
result += in.readLine();
}
I.e. read all lines while there are more lines, stop when no more lines are there. This code does not work however - the reader just never reports ready() = true!
I can even print the ready() value right before reading a line (which reads the correct string from the file) but the reader will report 'false'.
Am I doing something wrong? Why does the BufferedReader return 'false' on ready when there is actually stuff to read?
ready() != has more
ready() does not indicate that there is more data to be read. It only indicates whether a read could block the thread. It is likely to return false before you have read all the data.
To find out if there is no more data check if readLine() returns null.
String line = in.readLine();
while(line != null){
...
line = in.readLine();
}
Another way you can do this that bypasses the in.ready() is something like:
while ((nextLine = in.readLine()) != null) {
result += nextLine;
}
You will just continue reading until you are done. This way you do not need to worry about the problem with in.ready().
I think the standard way to write this is to just attempt to read the line and verify that it returned something. Something like this:
String nextLine;
while ((nextLine = in.readLine()) != null) {
//System.out.println(nextLine);
result += nextLine;
}
So you just continue to go until you get null returned from the stream. See here for extra information:
http://download.oracle.com/javase/1.5.0/docs/api/java/io/BufferedReader.html#readLine()
The BufferedReader.ready() method is behaving as specified:
The Reader.ready() javadoc says the following:
[Returns] true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
Then the BufferedReader.ready() javadoc says the following:
Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready.
If you put these two together, it is clear that BufferedReader.ready() can return false in situations where there are characters available. In short, you shouldn't rely on ready() to test for logical end-of-file or end-of-stream.
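The mismatch between ready() and end-of-stream also shows up in the opposite direction. In this sketch (hypothetical class ReadyIsNotEof) the reader wraps a StringReader, whose ready() always returns true because an in-memory read never blocks. After every line has been consumed, ready() is still true while readLine() returns null, so only the null check identifies the end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class ReadyIsNotEof {
    // Returns { ready() after the last line was read, whether the next readLine() is null }.
    static boolean[] probeAtEnd(String text) throws IOException {
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.readLine() != null) {
                // consume every line
            }
            boolean readyAtEnd = in.ready();           // "will not block", not "has more data"
            boolean nullAtEnd = in.readLine() == null; // the actual end-of-stream signal
            return new boolean[] { readyAtEnd, nullAtEnd };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = probeAtEnd("alpha\nbeta\n");
        System.out.println("ready at end: " + result[0] + ", readLine() is null: " + result[1]);
    }
}
```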
This is what we have been using consistently for years - not sure if it is the "standard" method. I'd like to hear comments about the pros and cons of using URL.openStream() directly, and whether that is causing the OP's problems. This code works for both HTTP and HTTPS connections.
URL getURL = new URL (servletURL.toString() + identifier+"?"+key+"="+value);
URLConnection uConn = getURL.openConnection();
BufferedReader br = new BufferedReader (new
InputStreamReader (uConn.getInputStream()));
for (String s = br.readLine() ; s != null ; s = br.readLine()) {
System.out.println ("[ServletOut] " + s);
// do stuff with s
}
br.close();
Basically, the BufferedReader.ready() method can be used to check whether the underlying stream is ready to provide data to the caller; if it is not, we can make the thread wait for some time until it becomes ready.
But the real problem is that after we have completely read the data stream, it will return false,
so we don't know whether the stream has been fully read or the underlying stream is just busy.
If you want to use in.ready(), the following worked for me well:
for (int i = 0; i < 10; i++) {
System.out.println("is InputStreamReader ready: " + in.ready());
if (!in.ready()) {
Thread.sleep(1000);
} else {
break;
}
}
I need to read lines of an HTML page at a URL, starting from a specific line.
For now, I have the following code:
u = new URL("http://s.ll/message/" + counter);
is = u.openStream(); // throws an IOException
dis = new DataInputStream(new BufferedInputStream(is));
while ((s = dis.readLine()) != null) {
if (s.contains("%"))
...
}
I know that this content will not be before the 50th line.
How can I read just from this line?
And is it the quickest way to read URLs?
How can I read just from this line?
Count the lines and ignore the line when the count is below 50. There's no magic way to go straight to line 50 other than just reading the stream and counting the lines. The stream has to be read in anyway.
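A minimal sketch of that counting approach, assuming 1-based line numbers. The class LinesFrom and the method linesFrom are hypothetical names, and a StringReader stands in for the URL stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class LinesFrom {
    // Keep lines numbered firstLine and later (1-based); earlier lines are read and discarded.
    static List<String> linesFrom(BufferedReader reader, int firstLine) throws IOException {
        List<String> kept = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (lineNumber >= firstLine) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd"));
        System.out.println(linesFrom(reader, 3));
    }
}
```

For the page in the question, the second argument would be 50.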
And is it the quickest way to read URLs?
Depends. However, a more common approach is BufferedReader + InputStreamReader wherein you specify the charset the webpage is encoded in to avoid mojibake.
You're on the right track. To read data from URLs, the simplest way is to just use the URL object. For more complicated HTTP communication tasks you might consider HTTPClient.
The method you're using, DataInputStream.readLine(), is deprecated, since you can't specify the character set used when converting from bytes to a string.
I'd do it like this:
u = new URL("http://s.ll/message/" + counter);
is = u.openStream(); // throws an IOException
// XXX notice the charset set to utf-8 here.
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"));
while ((s = reader.readLine()) != null) {
if (s.contains("%"))
...
}
Finding the 50th line requires you to skip to it. Since you can't know at which byte offset into the stream the 50th '\n' (or '\r' or '\r\n' depending on Unix, Mac or Windows line breaks) is - you simply have to count from the beginning.
We're streaming a CSV file from a web service. It appears that we're losing the new line characters when streaming - the client gets the file all on a single line. Any idea what we're doing wrong?
Code:
public static void writeFile(OutputStream out, File file) throws IOException {
BufferedReader input = new BufferedReader(new FileReader(file)); //File input stream
String line;
while ((line = input.readLine()) != null) { //Read file
out.write(line.getBytes()); //Write to output stream
out.flush();
}
input.close();
}
Don't use BufferedReader. You already have an OutputStream at hand, so just get an InputStream of the file and pipe the bytes from input to output the usual Java IO way. This way you also don't need to worry about newlines being eaten by BufferedReader:
public static void writeFile(OutputStream output, File file) throws IOException {
InputStream input = null;
byte[] buffer = new byte[10240]; // 10KB.
try {
input = new FileInputStream(file);
for (int length = 0; (length = input.read(buffer)) > 0;) {
output.write(buffer, 0, length);
}
} finally {
if (input != null) try { input.close(); } catch (IOException logOrIgnore) {}
}
}
Using a Reader/Writer would involve character encoding problems if you don't know or specify the encoding beforehand. You don't actually need to know the encoding here, so just leave it aside.
To improve performance a bit more, you can always wrap the InputStream and OutputStream in a BufferedInputStream and BufferedOutputStream respectively.
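The buffered wrapping mentioned above might look like the following sketch. The class BufferedCopy and the method copy are hypothetical names, and in-memory streams stand in for the file and the web service output. Because only bytes are moved, line terminators pass through unchanged.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BufferedCopy {
    // Copy raw bytes from in to out through buffered wrappers; no charset conversion happens.
    static void copy(InputStream in, OutputStream out) throws IOException {
        InputStream input = new BufferedInputStream(in);
        OutputStream output = new BufferedOutputStream(out);
        byte[] chunk = new byte[1024];
        int length;
        while ((length = input.read(chunk)) != -1) {
            output.write(chunk, 0, length);
        }
        output.flush(); // push buffered bytes through; the caller still owns and closes both streams
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "a,b\r\nc,d\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(csv), sink);
        System.out.println(Arrays.equals(csv, sink.toByteArray()));
    }
}
```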
The readLine method uses the newline characters to delimit what gets read, so the newlines themselves are not returned by readLine.
Don't use readLine. You can use a BufferedInputStream and read the file one byte at a time if you want, or pass your own buffer into OutputStream.write.
Note that, as BalusC and Michael Borgwardt say, Readers and Writers are for text. If you just want to copy the file, you should use InputStream and OutputStream, since you are only concerned with bytes.
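The stripping behaviour is easy to see in isolation. In this sketch (hypothetical class ReadLineStripsTerminators, with a StringReader in place of the file) the lines returned by readLine() are concatenated exactly as the question's code writes them, and both the "\r\n" and the "\n" terminators are gone from the result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class ReadLineStripsTerminators {
    // Concatenate the lines exactly as readLine() returns them, without re-adding terminators.
    static String joinWithoutTerminators(String text) throws IOException {
        StringBuilder joined = new StringBuilder();
        try (BufferedReader input = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = input.readLine()) != null) {
                joined.append(line);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two CSV rows with different terminators collapse into a single line.
        System.out.println(joinWithoutTerminators("a,b\r\nc,d\n"));
    }
}
```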
There are several things wrong with that code. It may also mutilate any non-ASCII text, since it converts via the platform default encoding twice, and for no good reason at all.
Don't use a Reader to read the file, use a FileInputStream and transfer bytes, avoiding the unnecessary and potentially destructive charset conversions. The line break problem will also be gone.
Any idea what we're doing wrong?
Yes. This line drops the "new line character"
while ((line = input.readLine()) != null) {
And then you write it without it:
out.write(line.getBytes());
See this related question.
BufferedReader.readLine() does not preserve the newline, so you'll have to add it back when writing the line out.
You can use a PrintWriter, which offers a println() method. This will also save you from converting the string into an array of bytes.
public static void writeFile(OutputStream o, File file) throws IOException {
PrintWriter out = new PrintWriter(new OutputStreamWriter(o));
BufferedReader input = new BufferedReader(new FileReader(file)); //File input stream
String line;
while ((line = input.readLine()) != null) { //Read file
out.println(line); //Write to output stream
out.flush();
}
input.close();
}