2 BufferedReaders from one InputStream - java

I made the following code:
try {
URL url = new URL("http://bbc.com");
is = url.openStream();
BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"));
System.out.println(in.readLine());
// in.close(); // with this, the next lines throw java.io.IOException: stream is closed
in = new BufferedReader(new InputStreamReader(is, "iso-8859-2"));
System.out.println(in.readLine().length());
} catch (Exception ex) {
ex.printStackTrace();
}
The problem is that the second BufferedReader starts reading from a slightly different point on almost every program run (the printed length differs). The same problem occurs when both readers use the same encoding. How can I read the encoding and then read the content with that encoding without creating a new InputStream (every new InputStream takes 0.1 to 3 s to create, depending on the site)?

I suggest you copy the entire stream, e.g. by repeatedly calling read() and then writing the results into a ByteArrayOutputStream. You can then get a byte array from that, and create multiple independent ByteArrayInputStream wrappers around the byte array.
(You can use Guava's ByteStreams.toByteArray(is) as an alternative for the first part.)
Another alternative would be to wrap the original InputStream in a BufferedInputStream, call mark immediately with a "large enough" limit, then reset it after you've read the first line, before creating the second BufferedReader.
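The first suggestion can be sketched as follows. The underlying cause of the symptom is that BufferedReader and InputStreamReader read ahead into their own buffers, so the first reader consumes more bytes than the one line it returns, and a second reader on the same stream starts wherever that read-ahead happened to stop. The class and method names below (ReReadableStream, readFully, readerFor) are made up for illustration, and an in-memory stream stands in for url.openStream():

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReReadableStream {

    // Drain the stream once into memory so it can be re-read any number of times.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Each call wraps a fresh ByteArrayInputStream, so every reader starts at byte 0.
    static BufferedReader readerFor(byte[] data, String charsetName) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charsetName));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(); any InputStream works here.
        InputStream is = new ByteArrayInputStream(
                "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        byte[] data = readFully(is);

        BufferedReader first = readerFor(data, "UTF-8");
        System.out.println(first.readLine());

        // The second reader is independent of the first and starts from the beginning again.
        BufferedReader second = readerFor(data, "ISO-8859-2");
        System.out.println(second.readLine().length());
    }
}
```

This holds the whole response in memory, which is usually fine for a web page but not for very large downloads.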

I wouldn't use a BufferedReader to read the first line, I would just read characters until I find a '\n'.

Related

Reading a large compressed file using Apache Commons Compress

I'm trying to read a bz2 file using Apache Commons Compress.
The following code works for a small file.
However, for a large file (over 500MB), it ends after reading a few thousand lines without any error.
try {
InputStream fin = new FileInputStream("/data/file.bz2");
BufferedInputStream bis = new BufferedInputStream(fin);
CompressorInputStream input = new CompressorStreamFactory()
.createCompressorInputStream(bis);
BufferedReader br = new BufferedReader(new InputStreamReader(input,
"UTF-8"));
String line = "";
while ((line = br.readLine()) != null) {
System.out.println(line);
}
} catch (Exception e) {
e.printStackTrace();
}
Is there another good way to read a large compressed file?
I was having the same problem with a large file, until I noticed that CompressorStreamFactory has a couple of overloaded constructors that take a boolean decompressUntilEOF parameter.
Simply changing to the following may be all that's missing...
CompressorInputStream input = new CompressorStreamFactory(true)
.createCompressorInputStream(bis);
Clearly, whoever wrote this factory seems to think it's better to create new compressor input streams at certain points, with the same underlying buffered input stream so that the new one picks up where the last one left off. They seem to think that's a better default, or preferred way of doing it over allowing one stream to decompress data all the way to the end of the file. I've no doubt they are cleverer than me, and I haven't worked out what trap I'm setting for future me by setting this parameter to true. Maybe someone will tell me in the comments! :-)

HttpURLConnection : What is the right way to read data from a WebService, character by character or line by line?

I know this has been asked before, but since I haven't been able to find an answer with a definitive conclusion, or at least one that shows the pros and cons of the possible approaches, I have to ask:
When it comes to reading data from the Internet, from a webservice for instance, what is the correct or most efficient way to read this data?
From all the books I have glanced over, I've found at least 4 ways to read data:
1) Reading a specific amount of characters at a time.
In this case the data is read in chunks of 4096 characters
BufferedReader reader = new BufferedReader(
new InputStreamReader(in, encoding));
char[] buffer = new char[4096];
StringBuilder sb = new StringBuilder();
int downloadedBytes = 0;
int len1 = 0;
while ((len1 = reader.read(buffer)) > 0) {
sb.append(buffer);
}
return sb.toString();
2) Read the data knowing the content length
int length = ((HttpURLConnection) urlConnection).getContentLength();
InputStream inputStream = urlConnection.getInputStream();
BufferedReader bufferedReader =new BufferedReader(new InputStreamReader(inputStream));
StringBuilder stringBuilder = new StringBuilder(length);
char[] buffer = new char[length];
int charsRead;
while ((charsRead = bufferedReader.read(buffer)) != -1) {
stringBuilder.append(buffer, 0, charsRead);
}
return stringBuilder.toString();
3) Read the data line by line :
BufferedReader reader=new BufferedReader(new InputStreamReader(c.getInputStream()));
StringBuilder buf=new StringBuilder();
String line=null;
while ((line=reader.readLine()) != null) {
buf.append(line);
}
return(buf.toString());
4) Read the data character by character:
InputStream in = mConnection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(
in, enconding));
int ch;
StringBuilder sb = new StringBuilder();
while ((ch=reader.read()) > 0) {
sb.append((char)ch);
}
return sb.toString().trim();
I have tried three of these four techniques (all except number 3, reading the data line by line), and out of the three only the fourth has given me good results.
The first method didn't work for me because, when I read large amounts of data, it often cut the data, giving me invalid JSON strings or strings with white space at the end.
I wasn't able to use the second approach because getContentLength is not always reliable, and if the value is not set there's nothing we can do about it. That's my case.
I didn't try the third method because I wasn't sure about reading data "line" by "line". Does this apply to data that contains an array of JSON objects, or only to files that really contain lines?
The last technique was the only choice I was left with, so I tried it and it worked, BUT I don't think that reading a large amount of data character by character is efficient at all.
So now I would really appreciate your opinions and ideas. What approach do you use when it comes to reading data from webservices? and more importantly why?
Thanks.
P.S. I know I could easily have used DefaultHttpClient, but the documentation clearly discourages it:
For Android 2.3 (Gingerbread) and later, HttpURLConnection is the best
choice. Its simple API and small size makes it great fit for Android.
Transparent compression and response caching reduce network use,
improve speed and save battery.
I've tried all the methods that you have mentioned. One problem I faced was the reply not being read completely. After some research, the most efficient/fastest way I found was to go about it like this:
DefaultHttpClient client = new DefaultHttpClient();
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("Accept", "application/json");
httpGet.setHeader("Content-type", "application/json");
// I've set JSON headers because I'm using JSON
try {
HttpResponse execute = client.execute(httpGet);
String responseStr = EntityUtils.toString(execute.getEntity());
} catch (IOException e) {
e.printStackTrace();
}
responseStr will contain the webservice reply and it reads it in one go. Hope this helps
If the data volume is not too big, it doesn't really matter which approach you use. If it is, then it makes sense to use buffering and read the data in chunks.
The 2nd approach is not too good, as you cannot always get the Content-Length.
Then, if your data is text/HTML/JSON, you can use the 3rd approach, as you don't have to bother with the chunk size. Also, you can print the incoming data line by line to aid debugging.
If your data is a binary/base64 stream such as an image, you should use the 1st approach and read the data in 4k blocks (the commonly used size).
UPDATE:
BTW, instead of the dreaded DefaultHttpClient I'm using the AndroidHttpClient as a singleton and it works smooth :)
It matters. Best for performance is to read from the InputStream into a buffer of a reasonable size. This way you transfer a decent amount of data at a time, rather than repeating the same operation thousands of times. Do not always rely on the Content-Length header value. For gzipped content it might show an incorrect size.
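A minimal sketch of such a buffered read is below. The names ChunkedRead and readAll are illustrative, and an in-memory stream stands in for the connection's input stream. The important detail is appending only the characters actually read, `append(buffer, 0, n)`. Approach 1 in the question calls `sb.append(buffer)`, which appends the whole array on every pass, including stale characters left over from an earlier read. That would explain the truncated-looking JSON and trailing junk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ChunkedRead {

    // Read the whole stream as text, 4096 chars at a time.
    static String readAll(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            StringBuilder sb = new StringBuilder();
            int n;
            // read() returns the number of chars placed in the buffer, or -1 at end of stream
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n); // only the n chars read in this pass
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlConnection.getInputStream()
        InputStream in = new ByteArrayInputStream(
                "{\"items\":[1,2,3]}".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

Nothing here depends on the content containing line breaks or on Content-Length being set.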

Java reading a file different methods

It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream etc.) My personal favorite is Scanner with a File in the constructor (it's just simpler, works with mathy data processing better, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
Let's start at the beginning. The question is what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc, these bytes are your data. There are various levels of abstraction above that that Java provides:
File(Input|Output)Stream - read these bytes as a stream of byte.
File(Reader|Writer) - read from a stream of bytes as a stream of char.
Scanner - read from a stream of char and tokenise it.
RandomAccessFile - read these bytes as a seekable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the Decorators, for example you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader (currently the only way to specify character encoding for a Reader).
So - when wouldn't I want to use it [a Scanner]?
You would not use a Scanner if you wanted to (these are some examples):
Read in data as bytes
Read in a serialized Java object
Copy bytes from one file to another, maybe with some filtering.
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding - this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Further, the stream isn't buffered.
So you may be better off with
// file is the File (or path String) you want to read
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
//do stuff
}
Ugly, I know.
It's worth noting that Java 7 provides a further layer of abstraction to remove the need to loop over files - these are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt")).
flatMap((in) -> Arrays.stream(in.split("\\b")));
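A runnable variant of that idea is sketched below. The stream returned by Files.lines keeps the file open, so it is wrapped in try-with-resources here. This sketch splits on whitespace ("\\s+") rather than on "\\b", which is a deliberate swap: splitting on word boundaries also yields the whitespace between words as tokens. The class name, method name and temp file are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordsFromFile {

    // Lazily read lines, split each into words, and collect the result.
    static List<String> words(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    .filter(word -> !word.isEmpty()) // split can yield "" for leading whitespace
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("alpha beta", "gamma"), StandardCharsets.UTF_8);
        System.out.println(words(tmp)); // [alpha, beta, gamma]
        Files.delete(tmp);
    }
}
```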
SCANNER:
can parse primitive types and strings using regular expressions.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
DATA INPUT STREAM:
Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
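As an illustration of that write-then-read pairing, here is a small round trip through an in-memory byte array. The record layout (an int, a double, a string) and the names are invented for the example. The reader must call the read methods in the same order in which the writer called the write methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {

    // Encode three values in DataOutputStream's binary format.
    static byte[] write(int id, double price, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);       // 4 bytes, big-endian
            out.writeDouble(price); // 8 bytes
            out.writeUTF(name);     // 2-byte length prefix + modified UTF-8 bytes
        }
        return bytes.toByteArray();
    }

    // Decode the values in exactly the order they were written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + " " + price + " " + name;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write(42, 9.5, "widget"))); // 42 9.5 widget
    }
}
```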
BufferedReader:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
NOTE: This approach is outdated, as Boris points out in his comment. I will leave it here for history, but you should use the methods available in the JDK.
It depends on what kind of operation you are doing and the size of the file you are reading.
In most cases, I recommend using commons-io for small files.
byte[] data = FileUtils.readFileToByteArray(new File("myfile"));
You can read it as string or character array...
Now, if you are handling big files, or changing parts of a file directly on the filesystem, then the best option is to use a RandomAccessFile and potentially even a FileChannel to do it "nio" style.
Using BufferedReader
BufferedReader reader;
char[] buffer = new char[10];
reader = new BufferedReader(new FileReader("FILE_PATH"));
//or
reader = Files.newBufferedReader(Paths.get("FILE_PATH"));
int n;
while ((n = reader.read(buffer)) != -1) {
// print only the n chars actually read in this pass
System.out.print(new String(buffer, 0, n));
}
//or
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
reader.close();
Using FileInputStream-Read Binary Files to Bytes
InputStream fis;
byte[] buffer = new byte[10];
fis = new FileInputStream("FILE_PATH");
//or
fis = Files.newInputStream(Paths.get("FILE_PATH"));
int n;
while ((n = fis.read(buffer)) != -1) {
// convert only the n bytes actually read (uses the platform default charset)
System.out.print(new String(buffer, 0, n));
}
fis.close();
Using Files– Read Small File to List of Strings
List<String> allLines = Files.readAllLines(Paths.get("FILE_PATH"));
for (String line : allLines) {
System.out.println(line);
}
Using Scanner – Read Text File as Iterator
Scanner scanner = new Scanner(new File("FILE_PATH"));
while (scanner.hasNextLine()) {
System.out.println(scanner.nextLine());
}
scanner.close();
Using RandomAccessFile-Reading Files in Read-Only Mode
RandomAccessFile file = new RandomAccessFile("FILE_PATH", "r");
String str;
while ((str = file.readLine()) != null) {
System.out.println(str);
}
file.close();
Using Files.lines-Reading lines as stream
Files.lines(Paths.get("FILE_PATH")).forEach(s -> System.out.println(s));
Using FileChannel - for increasing performance by using off-heap (direct) memory; a MappedByteBuffer can be used as well
FileInputStream i = new FileInputStream("FILE_PATH");
ReadableByteChannel r = i.getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while (r.read(buffer) != -1) {
buffer.flip();
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
buffer.clear();
}
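The MappedByteBuffer variant mentioned in the heading is not shown above, so here is a hedged sketch of it. The class and method names are illustrative, and the example assumes a UTF-8 text file small enough to map in one piece (a single mapping is limited to about 2 GB). Mapping lets the operating system page the file in on demand instead of copying it through a Java-side buffer.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {

    // Map the whole file read-only and decode it as UTF-8 text.
    static String readMapped(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("mapped", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readMapped(tmp));
    }
}
```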

TeeInputStream and PipedStream does not work in any case

I ran into a problem while "cloning" an InputStream.
This does not work:
InputStream is = ClassLoader.getSystemResourceAsStream("myResource");
But this works:
InputStream is = new BufferedInputStream(new FileInputStream("/afas.cfg"));
My code is:
// Create a piped input stream for one of the readers.
PipedInputStream in = new PipedInputStream();
// Create a tee-splitter for the other reader.(from apache commons io)
TeeInputStream tee = new TeeInputStream(is, new PipedOutputStream(in));
// Create the two buffered readers.
BufferedReader br1 = new BufferedReader(new InputStreamReader(tee));
BufferedReader br2 = new BufferedReader(new InputStreamReader(in));
// Do some interleaved reads from them.
System.out.println("One line from br1:");
System.out.println(br1.readLine());
System.out.println();
System.out.println("Two lines from br2:");
System.out.println(br2.readLine());
System.out.println(br2.readLine());
System.out.println();
System.out.println("One line from br1:");
System.out.println(br1.readLine());
System.out.println();
The problem occurs at the first br1.readLine() call. It just gets stuck in PipedInputStream.awaitSpace() and loops endlessly.
Are the PipedStreams only for threads? Meaning that when writing to the PipedOutputStream, the PipedInputStream would "wake up"?
What do I have to do to get this to work in any case?
This is a misuse of piped streams. They are intended to be used by different threads. They will not work as you are using them here, because the pipe has a small fixed-size buffer (1024 bytes by default) and the writer blocks when it fills. From the Javadoc:
Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread.
Personally I have never encountered a valid use for these piped streams since May 1997. I used one once back then and took it out immediately in favour of a queue.
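For completeness, this is roughly what the intended two-thread usage looks like: one thread writes into the PipedOutputStream while another drains the PipedInputStream, so a full pipe buffer only makes the writer wait briefly instead of deadlocking. The names PipedDemo and transfer are made up for this sketch.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class PipedDemo {

    // Push text through a pipe: a separate thread writes, the calling thread reads.
    static String transfer(String text) throws IOException, InterruptedException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            // Closing the writer closes the pipe, which signals end of stream to the reader.
            try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
                w.write(text);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        writer.join();
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transfer("hello through a pipe"));
    }
}
```

For the original goal of reading the same data twice from one thread, buffering the bytes in memory (or in a queue, as suggested above) is the simpler route.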

Java, read from file throws io exception - read error

I'm reading from a file (data.bin) using the following approach -
fis1 = new FileInputStream(file1);
String data;
dis1 = new DataInputStream(fis1);
buffread1=new BufferedReader(new InputStreamReader(dis1));
while( (data= buffread1.readLine())!=null){
}
Now I'm getting an IOException with the message "read error". I'm guessing that I'm probably not able to read the data in the file because it is stored in the following format.
#SP,IN-1009579,13:00:33,20/01/2010, $Bœ™šAe%N
B\VÈ–7$B™šAciC B]|XçF [s + ýŒ 01210B3âEªP6#·B.
The above is just one line of the file, and I want to read every line of that file and carry out an operation on the data that is read.
Any pointers on how the above can be accomplished would be of great help.
Cheers
That looks like binary data. You don't want to read it entirely as character data. Rather, use an InputStream instead of a Reader to read binary data. To learn more about the IO essentials, consult Sun's own IO tutorial.
I guess what you want is just: (DataInputStream expects some objects that have been serialized as an array of bytes...)
buffread1=new BufferedReader(new FileReader(file1));
while( (data= buffread1.readLine())!=null){
}
