I need to read lines in URL html page from specific line.
For now, I have the following code:
URL u = new URL("http://s.ll/message/" + counter);
InputStream is = u.openStream(); // throws an IOException
DataInputStream dis = new DataInputStream(new BufferedInputStream(is));
String s;
while ((s = dis.readLine()) != null) {
    if (s.contains("%")) {
        ...
    }
}
I know that this content will not be before the 50th line.
How can I read just from this line?
And is it the quickest way to read URLs?
How can I read just from this line?
Count the lines and ignore the line when the count is below 50. There's no magic way to go straight to line 50 other than just reading the stream and counting the lines. The stream has to be read in anyway.
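A minimal sketch of that counting approach, assuming the page has already been wrapped in a BufferedReader. The class SkipLines and its linesFrom method are hypothetical names chosen for illustration, and line numbers are taken to be 1-based:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SkipLines {
    // Returns every line whose 1-based number is >= startLine.
    // The earlier lines still have to be read: there is no way to know
    // where line startLine begins without scanning the stream.
    static List<String> linesFrom(BufferedReader reader, int startLine) throws IOException {
        List<String> result = new ArrayList<>();
        String line;
        int lineNumber = 0;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            if (lineNumber < startLine) {
                continue; // skip the lines before startLine
            }
            result.add(line);
        }
        return result;
    }
}
```

For the question's case you would call something like SkipLines.linesFrom(reader, 50) and then look for '%' in the returned lines.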
And is it the quickest way to read URLs?
Depends. However, a more common approach is BufferedReader + InputStreamReader wherein you specify the charset the webpage is encoded in to avoid mojibake.
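Here is one possible sketch of that combination, assuming UTF-8 purely as an example charset (use whatever the page actually declares). The class CharsetLines is hypothetical, and it uses BufferedReader.lines() with Stream.skip() instead of a manual counter:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.List;
import java.util.stream.Collectors;

final class CharsetLines {
    // Decodes the bytes with an explicit charset (instead of the platform
    // default) and drops the first (startLine - 1) lines with Stream.skip.
    static List<String> readFrom(InputStream in, Charset charset, int startLine) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines()
                    .skip(Math.max(0, startLine - 1))
                    .collect(Collectors.toList());
        }
    }
}
```

With a URL you would pass u.openStream() as the InputStream. Note that the try-with-resources block closes the reader and, with it, the underlying stream.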
You're on the right track. To read data from URLs, the simplest way is to just use the URL object. For more complicated HTTP communication tasks you might consider HTTPClient.
The DataInputStream.readLine() method you're using is deprecated because it does not properly convert bytes to characters: you can't specify the character set to use.
I'd do it like this:
URL u = new URL("http://s.ll/message/" + counter);
InputStream is = u.openStream(); // throws an IOException
// XXX notice the charset set to utf-8 here.
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"));
String s;
while ((s = reader.readLine()) != null) {
    if (s.contains("%")) {
        ...
    }
}
Finding the 50th line requires reading past the lines before it. Since you can't know at which byte offset into the stream the 50th line starts (line breaks may be '\n', '\r' or '\r\n' depending on Unix, Mac or Windows conventions), you simply have to count from the beginning.
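If you would rather not keep the counter yourself, java.io.LineNumberReader can track it for you; it still reads from the beginning, as described above. A sketch, with LineCounter.lineAt as a hypothetical helper name:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;

final class LineCounter {
    // Returns line targetLine (1-based), or null if the input is shorter.
    // LineNumberReader counts "\n", "\r" and "\r\n" each as one line break.
    static String lineAt(Reader source, int targetLine) throws IOException {
        LineNumberReader reader = new LineNumberReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            // getLineNumber() has just been incremented for the line we read
            if (reader.getLineNumber() == targetLine) {
                return line;
            }
        }
        return null;
    }
}
```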
Related
How can I read a file twice, either with a BufferedReader or by using a stream twice?
I need to manipulate large amounts of data in the code, so performance needs to be considered.
Sample code 1 below gives the exception "stream closed":
URL url = new URL("http://www.google.com");
InputStream in = url.openStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in));
Stream<String> ss = br.lines(); // read all the lines
List<String> ll = ss.collect(Collectors.toList());
br.close(); // also closes the underlying InputStream
br = new BufferedReader(new InputStreamReader(in));
br.readLine(); // exception occurs: the stream is closed
Sample code 2 below gives the exception "stream closed/being used":
URL url = new URL("http://www.google.com");
InputStream in = url.openStream();
BufferedReader br = new BufferedReader(new InputStreamReader(in));
Stream<String> ss = br.lines(); // read all the lines
List<String> ll = ss.collect(Collectors.toList());
List<String> xx = ss.collect(Collectors.toList()); // exception occurs: stream has already been operated upon or closed
Please ignore the syntax; it's just draft code.
Kindly suggest.
Here is an example: read the lines into a list once, then iterate over the list as many times as you wish.
List<String> ans = new ArrayList<>();
// Read rows once, keeping them in memory
try (BufferedReader br = new BufferedReader(new FileReader("users/desktop/xxx.txt"))) {
    String strLine;
    while ((strLine = br.readLine()) != null) {
        System.out.println(strLine);
        ans.add(strLine);
    }
}
// Read again, from the list instead of the file
for (String result : ans) {
    System.out.println(result);
}
reference
https://www.dreamincode.net/forums/topic/272652-reading-from-same-file-twice/
You cannot. A stream is just like its real-life watery counterpart. You can observe the water going under the bridge you're standing on, but you can't instruct the water to go back to the top of the hill so that you can observe it again.
Either have each consumer process each line before moving on to the next line, or, if that is not possible, create your own "buffer" of the entire thing: i.e. store each line in a Collection<String> that the second (and third, and fourth...) consumer can iterate over. The potential problem with this is the bigger memory overhead, although the HTML of most websites is not likely to be much of a problem in this regard.
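The first option (every consumer sees each line before the next one is read) could look roughly like this. SinglePass and forEachLine are hypothetical names; the consumers stand in for whatever processing you need to do:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;

final class SinglePass {
    // Reads the input exactly once and hands every line to each consumer
    // before moving on, so no consumer needs the stream to be rewound
    // and no copy of the whole document is kept in memory.
    static void forEachLine(BufferedReader reader, List<Consumer<String>> consumers) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            for (Consumer<String> consumer : consumers) {
                consumer.accept(line);
            }
        }
    }
}
```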
Your last example can be trivially fixed by copying the list.
List<String> ll = ss.collect(Collectors.toList());
List<String> xx = new ArrayList<>(ll);
In terms of usage, a stream is somewhat analogous to an iterator in that it can only be consumed once.
If you want to use the contents of the same stream again, you need to create a new stream, as you did the first time.
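One way to apply this without re-reading the source is to collect the lines once and let a Supplier create a fresh Stream over that list on each call. A sketch, with ReusableLines as a hypothetical helper name:

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class ReusableLines {
    // A Stream can be consumed only once, but this Supplier creates a new
    // Stream over the same in-memory list every time get() is called.
    static Supplier<Stream<String>> supplierOf(List<String> lines) {
        return lines::stream;
    }
}
```

You would build the list once, e.g. List<String> lines = br.lines().collect(Collectors.toList());, and then call get() on the supplier for every pass. Running a terminal operation twice on the same Stream object still throws IllegalStateException.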
As of Java 12, you can pass the values of the same stream into two branches by using the Collectors.teeing() method:
list.stream().collect(Collectors.teeing(
        collector1, // do something with the stream
        collector2, // do something else with the stream
        merger));   // BiFunction used to merge the two results
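A concrete example, assuming Java 12 or later (Collectors.filtering needs Java 9). It counts all lines and, in the same pass, the lines containing '%'; TeeingExample and summarize are hypothetical names:

```java
import java.util.List;
import java.util.stream.Collectors;

final class TeeingExample {
    // Both downstream collectors see every element of a single pass over
    // the stream; the merger combines their two results.
    static String summarize(List<String> lines) {
        return lines.stream().collect(Collectors.teeing(
                Collectors.counting(), // total number of lines
                Collectors.filtering((String s) -> s.contains("%"), Collectors.counting()), // lines with '%'
                (total, matches) -> matches + " of " + total + " lines contain '%'"));
    }
}
```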
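You can also do this.
Supplier<Stream<String>> ss1 = br::lines; // each get() creates a new Stream over br
Supplier<Stream<String>> ss2 = br::lines;
Now ss1.get() and ss2.get() return two separate Stream objects, so you won't get the "already operated upon" exception. Note, however, that both read from the same BufferedReader: whichever stream is consumed first gets the lines, and the other one only sees what is left.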
I found this code in another question:
private void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(data));
String line;
String input = "";
while ((line = file.readLine()) != null)
input += line + "\n";
input = input.replace(toUpdate, updated);
FileOutputStream os = new FileOutputStream(data);
os.write(input.getBytes());
file.close();
os.close();
}
This is my file before I replace some lines:
example1
example2
example3
But when I replace a line, the file now looks like this:
example1example2example3
This makes the file impossible to read when it contains a lot of lines.
How would I go about editing the code above so that the file looks like it did at the start?
Use System.lineSeparator() instead of \n.
while ((line = file.readLine()) != null)
input += line + System.lineSeparator();
The issue is that on Unix systems, the line separator is \n while on Windows systems, it's \r\n.
In Java versions older than Java 7, you would have to use System.getProperty("line.separator") instead.
As pointed out in the comments, if you have concerns about memory usage, it would be wise to not store the entire output in a variable, but write it out line-by-line in the loop that you're using to process the input.
If you read and modify the file line by line, you don't need to fit the whole file in memory. I'm not sure if this is possible in your case, but it is generally a good thing to aim for streaming. In your case this would also remove the need to concatenate strings, and you wouldn't need to choose a line terminator, because you can write each transformed line with println(). It requires writing to a different file, which is generally a good thing anyway because it is crash-safe: if you rewrite a file in place and get aborted, you lose data.
private void updateLine(String toUpdate, String updated) throws IOException {
    // try-with-resources closes both files even if an exception is thrown
    try (BufferedReader file = new BufferedReader(new FileReader(data));
         PrintWriter writer = new PrintWriter(new File(data + ".out"), "UTF-8")) {
        String line;
        while ((line = file.readLine()) != null) {
            line = line.replace(toUpdate, updated);
            writer.println(line); // println appends the platform line separator
        }
        // PrintWriter swallows IOExceptions; checkError() flushes and reports them
        if (writer.checkError()) {
            throw new IOException("cannot write");
        }
    }
}
This assumes that you only need to replace text within complete lines, not across multiple lines. I also added an explicit encoding and used a writer, since you have a string to output.
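The answer leaves the result in data + ".out". A sketch of finishing the job by replacing the original file once writing has succeeded, assuming a java.nio.file.Path in place of the question's data field and UTF-8 for both reading and writing (both are assumptions; LineUpdater is a hypothetical name):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class LineUpdater {
    // Streams `file` line by line into a sibling ".out" file, then moves that
    // file over the original. If the process dies before the move, the
    // original file is still intact.
    static void updateLine(Path file, String toUpdate, String updated) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".out");
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(Files.newBufferedWriter(tmp, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line.replace(toUpdate, updated)); // platform line separator
            }
            if (writer.checkError()) { // flushes and reports any hidden IOException
                throw new IOException("cannot write " + tmp);
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

If your file system supports it, StandardCopyOption.ATOMIC_MOVE can be passed to Files.move as well; where it is not supported, that option makes the call throw AtomicMoveNotSupportedException.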
This is because you use an OutputStream, which is better suited to binary data. Try using a PrintWriter and don't add any line terminator at the end of the lines yourself. Example is here
I am trying to read text from a web document using a BufferedReader over an InputStreamReader on a URL (pointing to a file on some Apache server).
String result = "";
URL url = new URL("http://someserver.domain/somefile");
BufferedReader in = null;
in = new BufferedReader(new InputStreamReader(url.openStream(), "iso-8859-1"));
result += in.readLine();
Now this works just fine. But obviously I'd like the reader to read not just one line, but as many as there are in the file.
Looking at the BufferedReader API the following code should do just that:
while (in.ready()) {
result += in.readLine();
}
I.e. read all lines while there are more lines, stop when no more lines are there. This code does not work however - the reader just never reports ready() = true!
I can even print the ready() value right before reading a line (which reads the correct string from the file) but the reader will report 'false'.
Am I doing something wrong? Why does the BufferedReader return 'false' on ready when there is actually stuff to read?
ready() != has more
ready() does not indicate that there is more data to be read. It only shows whether a read is guaranteed not to block the thread. It is likely to return false before you have read all the data.
To find out if there is no more data check if readLine() returns null.
String line = in.readLine();
while(line != null){
...
line = in.readLine();
}
Another way you can do this that bypasses the in.ready() is something like:
while ((nextLine = in.readLine()) != null) {
result += nextLine;
}
You will just continue reading until you are done. This way you do not need to worry about the problem with in.ready().
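A slightly fuller sketch of the same loop, assuming you want to keep the line breaks that readLine() removes (the question's result += in.readLine() drops them). ReadAll is a hypothetical name, and '\n' is an arbitrary choice of separator:

```java
import java.io.BufferedReader;
import java.io.IOException;

final class ReadAll {
    // Reads until readLine() returns null, the reliable end-of-stream signal.
    // readLine() strips the terminator, so '\n' is appended back; a
    // StringBuilder avoids the repeated copying of `result += line`.
    static String readAll(BufferedReader in) throws IOException {
        StringBuilder result = new StringBuilder();
        String nextLine;
        while ((nextLine = in.readLine()) != null) {
            result.append(nextLine).append('\n');
        }
        return result.toString();
    }
}
```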
I think the standard way to write this is to just attempt to read the line and verify that it returned something. Something like this:
String nextLine;
while ((nextLine = in.readLine()) != null) {
    //System.out.println(nextLine);
    result += nextLine;
}
So you just continue to go until you get null returned from the stream. See here for extra information:
http://download.oracle.com/javase/1.5.0/docs/api/java/io/BufferedReader.html#readLine()
The BufferedReader.ready() method is behaving as specified:
The Reader.ready() javadoc says the following:
[Returns] true if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
Then the BufferedReader.ready() javadoc says the following:
Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready.
If you put these two together, it is clear that BufferedReader.ready() can return false in situations where there are characters available. In short, you shouldn't rely on ready() to test for logical end-of-file or end-of-stream.
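A small demonstration, using a hypothetical Reader that forwards reads to a StringReader but does not override ready(), so it inherits Reader's default, which returns false. BufferedReader.ready() then reports false even though a whole line can be read:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadyDemo {
    // Forwards reads to a StringReader but keeps Reader.ready()'s default (false).
    static Reader neverReady(String text) {
        StringReader delegate = new StringReader(text);
        return new Reader() {
            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                return delegate.read(buf, off, len);
            }

            @Override
            public void close() {
                delegate.close();
            }
        };
    }

    // Returns {result of ready() before reading, first line actually read}.
    static String[] demo() throws IOException {
        try (BufferedReader in = new BufferedReader(neverReady("hello\nworld"))) {
            boolean readyBefore = in.ready(); // false: buffer empty, underlying reader "not ready"
            String first = in.readLine();     // yet a line is available
            return new String[] {String.valueOf(readyBefore), first};
        }
    }
}
```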
This is what we have been using consistently for years; I'm not sure if it is the "standard" method. I'd like to hear comments about the pros and cons of using URL.openStream() directly, and whether that is causing the OP's problems. This code works for both HTTP and HTTPS connections.
URL getURL = new URL (servletURL.toString() + identifier+"?"+key+"="+value);
URLConnection uConn = getURL.openConnection();
BufferedReader br = new BufferedReader(new InputStreamReader(uConn.getInputStream()));
for (String s = br.readLine() ; s != null ; s = br.readLine()) {
System.out.println ("[ServletOut] " + s);
// do stuff with s
}
br.close();
Basically, the BufferedReader.ready() method can be used to check whether the underlying stream is ready to provide data to the caller; if not, we can make the thread wait for some time until it becomes ready.
But the real problem is that after we have completely read the data stream, it also returns false,
so we can't tell whether the stream has been fully read or the underlying stream is just busy.
If you want to use in.ready(), the following worked well for me:
for (int i = 0; i < 10; i++) {
System.out.println("is InputStreamReader ready: " + in.ready());
if (!in.ready()) {
Thread.sleep(1000);
} else {
break;
}
}
I am writing a small Java app that scans a text file for instances of a particular word. It needs to be able to report, for example, that an instance of the word was found as the 14th word in the file, on the third line.
For this I tried the following code, which I thought would check whether the input was a newline (\n) character and then increment a line variable that I created:
FileInputStream fileStream = new FileInputStream("src/file.txt");
DataInputStream dataStream = new DataInputStream(fileStream);
BufferedReader buffRead = new BufferedReader(new InputStreamReader(dataStream));
String strLine;
String Sysnewline = System.getProperty("line.separator");
CharSequence newLines = Sysnewline;
int lines = 1;
while ((strLine = buffRead.readLine()) != null)
{
if(strLine.contains(newLines))
{
System.out.println("Line Found");
lines++;
}
}
System.out.println("Total Number Of Lines In File: " + lines);
This does not work for me; it simply displays 0 at the end. I know the data is being placed into strLine during the while loop, because if I change the code slightly to output each line, it successfully gets every line from the file.
Would anyone happen to know the reason why the above code does not work?
Read the javadocs for readLine.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
readLine() strips newlines. Just increment every iteration of the loop. Also, you're overcomplicating your file reading code. Just do new BufferedReader(new FileReader("src/file.txt"))
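Applying that to the original goal (line number plus position of the word in the file), a sketch could look like this. It assumes words are separated by whitespace and compared exactly, with no punctuation handling; WordFinder and find are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class WordFinder {
    // Reports each occurrence of `word` as {lineNumber, wordNumberInFile},
    // both 1-based.
    static List<int[]> find(BufferedReader reader, String word) throws IOException {
        List<int[]> hits = new ArrayList<>();
        int lineNumber = 0;
        int wordNumber = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++; // one readLine() call = one line; the terminator is already stripped
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // a blank line splits into a single empty token
                }
                wordNumber++;
                if (token.equals(word)) {
                    hits.add(new int[] {lineNumber, wordNumber});
                }
            }
        }
        return hits;
    }
}
```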
I'm having some problems with the FileReader class.
How do I specify an offset in the lines it goes through, and how do I tell it when to stop?
Let's say I want it to go through each line in a .txt file, but only lines 100-200 and then stop?
How would I do this? Right now I'm using readLine(), but I don't think there's a way to specify an offset with that.
Any fast help is VERY appreciated. Thanks.
You can't. FileReader reads a character at a time (or a line at a time when wrapped in a BufferedReader). Obviously, you can write your own code that extends or wraps it to skip the unneeded lines.
An aside: Be CAREFUL using FileReader or FileWriter; they use the platform's default character set. If you want to force a character set, use OutputStreamWriter or InputStreamReader. Example:
Writer w = new FileWriter(file); can be replaced by
Writer w = new OutputStreamWriter(new FileOutputStream(file), "UTF-8"); // the character set is set explicitly here
An alternative: If you have FIXED-WIDTH text, then look at RandomAccessFile which lets you seek to any position. This doesn't help you much unless you have fixed width text or an index to skip to a line. But it is handy :)
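A sketch of the fixed-width case, assuming every line is exactly recordLength bytes including its "\n" terminator and that the text is single-byte (ASCII). FixedWidthLines and its parameters are hypothetical:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class FixedWidthLines {
    // Line n (1-based) starts at byte (n - 1) * recordLength, so we can seek
    // straight to it without reading the lines before it.
    static String readLine(String path, int recordLength, long lineNumber) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek((lineNumber - 1) * recordLength);
            byte[] record = new byte[recordLength];
            int n = 0;
            // read() may return fewer bytes than requested, so keep reading
            while (n < recordLength) {
                int r = file.read(record, n, recordLength - n);
                if (r < 0) {
                    break; // end of file
                }
                n += r;
            }
            if (n == 0) {
                return null; // the requested line is past the end of the file
            }
            // Strip the trailing line terminator ("\n" or "\r\n")
            return new String(record, 0, n, StandardCharsets.US_ASCII).replaceAll("[\\r\\n]+$", "");
        }
    }
}
```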
Read all the lines but use another variable to count which line you are on. Call continue if you are on a line that you don't want to process (say, before the 100th line) and break when you will not want to process any more lines (after the 200th line).
There is not a way to tell the reader to only read certain lines, you can just use a counter to do it.
try (BufferedReader in = new BufferedReader(new FileReader("infilename"))) {
    String str;
    int lineNumber = 0;
    while ((str = in.readLine()) != null) {
        lineNumber++;
        if (lineNumber < 100) {
            continue; // not in the range yet
        }
        if (lineNumber > 200) {
            break; // past the range, stop reading
        }
        System.out.println("Line " + lineNumber + ": " + str);
    }
} catch (IOException e) {
    e.printStackTrace(); // don't silently swallow I/O errors
}
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
for (int i = 0; i < 100; i++) {
    in.readLine(); // skip lines 1 to 100
}
String line101 = in.readLine();
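On Java 8 or later, the lines 100-200 case can also be written with java.nio.file.Files.lines plus skip() and limit(). This is a sketch assuming UTF-8 and 1-based inclusive line numbers; LineRange is a hypothetical name. It still reads lines 1-99 internally, since there is no way to jump to a line in a variable-width file:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LineRange {
    // Returns lines first..last (1-based, inclusive). limit() short-circuits,
    // so the file is not read to the end (apart from read-ahead buffering).
    static List<String> lines(Path path, long first, long last) throws IOException {
        try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
            return stream.skip(first - 1)
                         .limit(last - first + 1)
                         .collect(Collectors.toList());
        }
    }
}
```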