Using an InputStream for Logging and then XML parsing - java

What I want to do is log the output from an InputStream that I got using
org.apache.http.HttpEntity entity = response.getEntity();
InputStream content = entity.getContent();
// Print the result to the screen for debugging purposes
if (Logging.DEBUG) {
    int i;
    StringBuilder b = new StringBuilder();
    while ((i = content.read()) != -1) {
        b.append((char) i);
    }
    Log.d(TAG, b.toString());
}
Now after I have finished logging, I want to pass the exact same stream to an XML parser. The problem is that it tells me that the stream has already been used.
I tried to use the mark() and reset() calls before and after debugging, but it didn't work.

It depends on whether the InputStream that is returned supports it. In the InputStream class itself, markSupported() returns false, mark() does nothing, and reset() throws an IOException, as described in the API. So you can't be sure whether the returned stream actually supports it. To be sure, you should wrap it in a BufferedInputStream, which does support these methods.

In general mark() and reset() won't work on an arbitrary InputStream. They only work on subclasses such as ByteArrayInputStream or BufferedInputStream, where the data is held in (or buffered into) memory. A plain FileInputStream, for instance, reports markSupported() as false.
For something like a SocketInputStream or a console InputStream, your only option will be to read and buffer the entire stream contents somewhere, e.g. in memory or by writing it to a temporary file.
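The "read and buffer the entire stream" approach could look roughly like the sketch below. The class and method names (ReplayableStream, readFully) are made up for illustration, the ByteArrayInputStream in main is only a stand-in for entity.getContent(), and the code assumes the body is UTF-8 text that fits comfortably in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: buffer a one-shot stream so it can be consumed twice.
final class ReplayableStream {

    /** Reads the whole stream into memory; only suitable for bodies that fit in RAM. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // copy only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); any one-shot InputStream behaves the same way.
        InputStream content = new ByteArrayInputStream(
                "<root><item/></root>".getBytes(StandardCharsets.UTF_8));

        byte[] body = readFully(content);

        // First use: logging. Decoding as UTF-8 is an assumption about the response.
        System.out.println(new String(body, StandardCharsets.UTF_8));

        // Second use: a fresh stream over the same bytes, ready to hand to the XML parser.
        InputStream forParser = new ByteArrayInputStream(body);
        System.out.println("bytes ready for the parser: " + forParser.available());
    }
}
```

Because the bytes are kept in an array, any number of fresh ByteArrayInputStream instances can be created from them, so the logging step no longer consumes what the parser needs.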

Related

equivalent to Files.readAllLines() for InputStream or Reader?

I have a file that I've been reading into a List via the following method:
List<String> doc = java.nio.file.Files.readAllLines(new File("/path/to/src/resources/citylist.csv").toPath(), StandardCharsets.UTF_8);
Is there any nice (single-line) Java 7/8/nio2 way to pull off the same feat with a file that's inside an executable Jar (and presumably, has to be read with an InputStream)? Perhaps a way to open an InputStream via the classloader, then somehow coerce/transform/wrap it into a Path object? Or some new subclass of InputStream or Reader that contains an equivalent to Files.readAllLines(...)?
I know I could do it the traditional way in a half page of code, or via some external library... but before I do, I want to make sure that recent releases of Java can't already do it "out of the box".
An InputStream represents a stream of bytes. Those bytes don't necessarily form (text) content that can be read line by line.
If you know that the InputStream can be interpreted as text, you can wrap it in an InputStreamReader and use BufferedReader#lines() to consume it line by line.
try (InputStream resource = Example.class.getResourceAsStream("resource")) {
List<String> doc =
new BufferedReader(new InputStreamReader(resource,
StandardCharsets.UTF_8)).lines().collect(Collectors.toList());
}
You can use Apache Commons IOUtils#readLines:
List<String> doc = IOUtils.readLines(inputStream, StandardCharsets.UTF_8);

Creating a Java InputStream from an Enumerator[Array[Byte]]

I've been reading up a lot on Iteratees & Enumerators in order to implement a new module in my application.
I'm now at a point where I'm integrating with a 3rd party Java library, and am stuck at working with this method:
public Email addAttachment(String name, InputStream file) throws IOException {
this.attachments.put(name, file);
return this;
}
What I have in my API is the body returned from a WS HTTP call that is an Enumerator[Array[Byte]].
I am now wondering how to write an Iteratee that would process the chunks of Array[Byte] and create an InputStream to use in this method.
(Side bar): There are other versions of the addAttachment method that take java.io.File however I want to avoid writing to the disk in this operation, and would rather deal with streams.
I attempted to start by writing something like this:
Iteratee.foreach[Array[Byte]] { bytes =>
???
}
However I'm not sure how to interact with the java InputStream here. I found something called a ByteArrayInputStream however that takes the entire Array[Byte] in its constructor, which I'm not sure would work in this scenario as I'm working with chunks ?
I probably need some Java help here!
Thanks for any help in advance.
If I'm following you, I think you want to work with PipedInputStream and PipedOutputStream:
https://docs.oracle.com/javase/8/docs/api/java/io/PipedInputStream.html
You always use them in pairs. You can construct the pair like so:
PipedInputStream in = new PipedInputStream(); // can also specify a buffer size
PipedOutputStream out = new PipedOutputStream(in);
Pass the input stream to the API, and in your own code iterate through your chunks and write your bytes.
The only caveat is that you need to read and write in separate threads. In your case, it's probably good to do your iterating and writing in a separate thread. I'm sure you can handle it in Scala better than me; in Java it would be something like:
PipedInputStream in = new PipedInputStream(); // can also specify a buffer size
PipedOutputStream out = new PipedOutputStream(in);
new Thread(() -> {
    // do your looping in here, write to 'out'
    out.close();
}).start(); // start() runs the writer on its own thread; run() would execute it on the calling thread
email.addAttachment(name, in);
email.send();
in.close();
(Leaving out exception handling & resource handling for clarity)
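A more complete version of the piped-stream idea, with the exception and resource handling filled in, might look like the following sketch. The names PipedChunks and toInputStream are invented for this example, and a List<byte[]> stands in for the chunks that the Enumerator would deliver; how the chunks actually arrive on the Scala side is not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: expose a sequence of byte chunks as a single InputStream.
final class PipedChunks {

    /** Returns a stream that yields the chunks in order; a background thread feeds the pipe. */
    static InputStream toInputStream(List<byte[]> chunks) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread writer = new Thread(() -> {
            // try-with-resources closes the output side, which signals end of stream to the reader.
            try (PipedOutputStream o = out) {
                for (byte[] chunk : chunks) {
                    o.write(chunk); // blocks while the pipe's buffer is full
                }
            } catch (IOException e) {
                // The reader closed its end or the pipe broke; there is nothing left to write.
            }
        });
        writer.setDaemon(true);
        writer.start(); // start(), not run(): the writer must not share the reader's thread
        return in;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> chunks = Arrays.asList(
                "hello ".getBytes(StandardCharsets.UTF_8),
                "world".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = toInputStream(chunks)) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                collected.write(buf, 0, n);
            }
            System.out.println(new String(collected.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

The stream returned by toInputStream is what would be passed to addAttachment; the library then reads from it while the background thread is still writing.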

URLConnection returning empty inputStream

I am trying to fetch a PDF as an input stream from a URLConnection, but I am getting an empty input stream. Can anyone please tell me what I am doing wrong? Following is the code:
URL fileUrl = new URL("https://www.dropbox.com/s/ao3up7xudju4qm0/Amalgabond%20Adhesive%20Agent.pdf");
HttpURLConnection connection = (HttpURLConnection)fileUrl.openConnection();
connection.connect();
InputStream is = connection.getInputStream();
Log.i("TAG", "is.available(): " + is.available());
is.available() returns 0, i.e. the stream appears to be empty.
According to the Javadoc, available() does not block and wait until all data is available, so you might not have received everything yet when it's called.
You should use something like this instead of available():
int bytesRead;
byte[] buffer = new byte[100000];
while ((bytesRead = is.read(buffer)) != -1) {
    // Do something here with the first bytesRead bytes of buffer
}
read() is a blocking method.
You're misusing the available() method. It doesn't tell you the length of the input stream, so the fact that it returns zero doesn't indicate that it's empty. See the Javadoc, where all this is explicitly stated.
Just read it until end of stream.
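To make the point about available() concrete, here is a small self-contained sketch. The class AvailableIsNotLength and its methods are hypothetical; the anonymous stream only implements read() and therefore inherits InputStream.available(), which returns 0 regardless of how much data is left.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo: available() == 0 does not mean the stream is empty.
final class AvailableIsNotLength {

    /** A stream that only implements read(); it inherits the default available(), which returns 0. */
    static InputStream streamOver(byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    /** Counts bytes by reading until read() reports end of stream (-1). */
    static int countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream is = streamOver(new byte[] {1, 2, 3, 4, 5});
        System.out.println("available(): " + is.available());
        System.out.println("bytes read:  " + countBytes(is));
    }
}
```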
If your ultimate goal is to download a file from Dropbox, you should use the Dropbox Java API, or maybe this simpler solution. Otherwise, a URLConnection to a file on Dropbox will download a web page (in HTML) showing you a link to click (with a lot of other stuff !) for downloading your file.

Java reading a file different methods

It seems that there are many, many ways to read text files in Java (BufferedReader, DataInputStream etc.) My personal favorite is Scanner with a File in the constructor (it's just simpler, works with mathy data processing better, and has familiar syntax).
Boris the Spider also mentioned Channel and RandomAccessFile.
Can someone explain the pros and cons of each of these methods? To be specific, when would I want to use each?
(edit) I think I should be specific and add that I have a strong preference for the Scanner method. So the real question is, when wouldn't I want to use it?
Let's start at the beginning. The question is: what do you want to do?
It's important to understand what a file actually is. A file is a collection of bytes on a disc; these bytes are your data. There are various levels of abstraction above that which Java provides:
File(Input|Output)Stream - read these bytes as a stream of bytes.
File(Reader|Writer) - read from a stream of bytes as a stream of chars.
Scanner - read from a stream of chars and tokenise it.
RandomAccessFile - read these bytes as a seekable byte[].
FileChannel - read these bytes in a safe multithreaded way.
On top of each of those there are the decorators; for example, you can add buffering with BufferedXXX. You could add linebreak awareness to a FileWriter with PrintWriter. You could turn an InputStream into a Reader with an InputStreamReader, which also lets you specify the character encoding.
"So - when wouldn't I want to use it [a Scanner]?"
You would not use a Scanner if you wanted to (these are some examples):
Read in data as bytes
Read in a serialized Java object
Copy bytes from one file to another, maybe with some filtering.
It is also worth noting that the Scanner(File file) constructor takes the File and opens a FileInputStream with the platform default encoding; this is almost always a bad idea. It is generally recognised that you should specify the encoding explicitly to avoid nasty encoding-based bugs. Further, the stream isn't buffered.
So you may be better off with
try (final Scanner scanner = new Scanner(new BufferedInputStream(new FileInputStream(file)), "UTF-8")) {
    // do stuff
}
Ugly, I know.
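For a runnable illustration of a Scanner with an explicit encoding and a buffered stream, consider the sketch below. ScannerWithCharset and readInts are names invented for this example, and an in-memory stream is used in place of a file so that it runs without any file on disk.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo: tokenise whitespace-separated integers with an explicit charset.
final class ScannerWithCharset {

    /** Reads integers until the next token is not an integer or the input ends. */
    static List<Integer> readInts(InputStream in) {
        List<Integer> values = new ArrayList<>();
        // The charset is passed explicitly instead of relying on the platform default.
        try (Scanner scanner = new Scanner(new BufferedInputStream(in), "UTF-8")) {
            while (scanner.hasNextInt()) {
                values.add(scanner.nextInt());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("10 20 30".getBytes(StandardCharsets.UTF_8));
        System.out.println(readInts(in));
    }
}
```

With a real file, the ByteArrayInputStream would simply be replaced by a FileInputStream.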
It's worth noting that Java 7 provides a further layer of abstraction to remove the need to loop over files - these are in the Files class:
byte[] Files.readAllBytes(Path path)
List<String> Files.readAllLines(Path path, Charset cs)
Both these methods read the entire file into memory, which might not be appropriate. In Java 8 this is further improved by adding support for the new Stream API:
Stream<String> Files.lines(Path path, Charset cs)
Stream<Path> Files.list(Path dir)
For example to get a Stream of words from a Path you can do:
final Stream<String> words = Files.lines(Paths.get("myFile.txt")).
flatMap((in) -> Arrays.stream(in.split("\\b")));
SCANNER:
Can parse primitive types and strings using regular expressions.
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types. More can be read at http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
DATA INPUT STREAM:
Lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream. DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class. More can be read at http://docs.oracle.com/javase/7/docs/api/java/io/DataInputStream.html
BUFFERED READER:
Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines. The buffer size may be specified, or the default size may be used. The default is large enough for most purposes. In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. For example,
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
will buffer the input from the specified file. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient. Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader. More details are at http://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
NOTE: This approach is outdated, as Boris points out in his comment. I will leave it here for history, but you should use the methods available in the JDK.
It depends on what kind of operation you are doing and the size of the file you are reading.
In most cases, I recommend using commons-io for small files.
byte[] data = FileUtils.readFileToByteArray(new File("myfile"));
You can also read it as a string or character array...
Now, if you are handling big files, or changing parts of a file directly on the filesystem, then the best option is to use a RandomAccessFile and potentially even a FileChannel to do it "nio" style.
Using BufferedReader
BufferedReader reader;
char[] buffer = new char[10];
reader = new BufferedReader(new FileReader("FILE_PATH"));
// or
reader = Files.newBufferedReader(Paths.get("FILE_PATH"));
int count;
while ((count = reader.read(buffer)) != -1) {
    // print only the chars actually read; the last chunk may be shorter than the buffer
    System.out.print(new String(buffer, 0, count));
}
// or, line by line
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();
Using FileInputStream - Read Binary Files to Bytes
InputStream fis;
byte[] buffer = new byte[10];
fis = new FileInputStream("FILE_PATH");
// or
fis = Files.newInputStream(Paths.get("FILE_PATH"));
int count;
while ((count = fis.read(buffer)) != -1) {
    // decode only the bytes actually read, using the platform default charset
    System.out.print(new String(buffer, 0, count));
}
fis.close();
Using Files - Read Small File to List of Strings
List<String> allLines = Files.readAllLines(Paths.get("FILE_PATH"));
for (String line : allLines) {
System.out.println(line);
}
Using Scanner – Read Text File as Iterator
Scanner scanner = new Scanner(new File("FILE_PATH"));
while (scanner.hasNextLine()) {
System.out.println(scanner.nextLine());
}
scanner.close();
Using RandomAccessFile - Reading Files in Read-Only Mode
RandomAccessFile file = new RandomAccessFile("FILE_PATH", "r");
String str;
while ((str = file.readLine()) != null) {
System.out.println(str);
}
file.close();
Using Files.lines - Reading lines as a stream
Files.lines(Paths.get("FILE_PATH")).forEach(s -> System.out.println(s));
Using FileChannel - for better performance with an off-heap (direct) buffer; a MappedByteBuffer is a further option
FileInputStream i = new FileInputStream("FILE_PATH");
ReadableByteChannel r = i.getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while (r.read(buffer) != -1) {
buffer.flip();
while (buffer.hasRemaining()) {
System.out.print((char) buffer.get());
}
buffer.clear();
}

"available" of DataInputStream from Socket

I have this code on the client side:
DataInputStream dis = new DataInputStream(socketChannel.socket().getInputStream());
while (dis.available() > 0) {
    SomeOtherClass.method(dis);
}
But available() keeps returning 0, although there is readable data in the stream. So after the actual data to be read is finished, empty data is passed to the other class to be read and this causes corruption.
After a little searching, I found that available() is not reliable when used with sockets, and that I should read the first few bytes from the stream to actually see if data is available to parse.
But in my case, I have to pass the DataInputStream reference I get from the socket to some other class that I cannot change.
Is it possible to read a few bytes from a DataInputStream without corrupting it, or are there any other suggestions?
Putting a PushbackInputStream in between allows you to read some bytes without corrupting the data.
EDIT: Untested code example below. This is from memory.
static class MyWrapper extends PushbackInputStream {
    MyWrapper(InputStream in) {
        super(in);
    }
    @Override
    public int available() throws IOException {
        int b = super.read(); // note: this may block until a byte arrives
        if (b == -1) {
            return 0; // end of stream: nothing to push back
        }
        // do something specific?
        super.unread(b);
        return super.available();
    }
}
public static void main(String... args) {
InputStream originalSocketStream = null;
DataInputStream dis = new DataInputStream(new MyWrapper(originalSocketStream));
}
This should work:
PushbackInputStream pbi = new PushbackInputStream(socketChannel.socket().getInputStream(), 1);
int singleByte;
DataInputStream dis = new DataInputStream(pbi);
while((singleByte = pbi.read()) != -1) {
pbi.unread(singleByte);
SomeOtherClass.method(dis);
}
But please note that this code will behave differently from the example with available() (if available() worked), because available() does not block but read() may block.
"But available() keeps returning 0, although there is readable data in the stream"
If available() returns zero, either:
The input stream you are using doesn't support available() and so it just returns zero. That isn't the case here, as you are using a DataInputStream wrapped directly around the socket's input stream, and that configuration does support available(), OR ...
There is no readable data in the stream. That appears to be the case here. In fact the only possible way you can know there is readable data in the stream without actually reading it is to call available() and get a positive result. There is no other way of telling.
There are few correct uses of available(), and this isn't one of them. Why should you fall out of that loop just because there isn't any data in the socket receive buffer? The only way you should get out of that loop is by getting an end of stream condition.
"I should be reading first few bytes from stream to actually see if data is available to parse."
That doesn't even make sense. If you can read anything from the stream, there is data available, and if you can't, there isn't.
Just read, block, and react correctly to EOS, in its various manifestations.
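As a rough sketch of "read, block, and react to EOS" with a DataInputStream: the loop below keeps reading until the stream signals its end with an EOFException. The class ReadUntilEos, the method readAllInts, and the assumption that the protocol is a plain sequence of 4-byte ints are all made up for illustration; a real protocol would read whatever message format the peer sends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: loop on blocking reads and stop only at end of stream.
final class ReadUntilEos {

    /** Reads ints until end of stream; never consults available(). */
    static List<Integer> readAllInts(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        List<Integer> values = new ArrayList<>();
        while (true) {
            try {
                values.add(dis.readInt()); // blocks until 4 bytes have arrived
            } catch (EOFException e) {
                // End of stream (also thrown if the stream ends in the middle of an int).
                return values;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the socket's input stream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(1);
        out.writeInt(2);
        out.writeInt(3);
        System.out.println(readAllInts(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```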
