Good day,
Currently, we are using ByteArrayInputStream for our reset-able InputStream. My problem with it is that it consumes a lot of memory (it loads all the bytes it represents in memory unlike some other InputStream implementations).
My question then is, is there any lighter implementation of InputStream which supports mark() & reset()?
I tried searching in commons-io as well, but I failed to find any.
Thanks,
Franz
Would using a BufferedInputStream work for you? Without knowing where the original data is coming from (e.g., why you have a ByteArrayInputStream to begin with), it is a bit hard to answer your question.
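To make the BufferedInputStream suggestion concrete, here is a minimal sketch. It assumes Java 11+ (for readNBytes), and the class and method names are made up for illustration. The ByteArrayInputStream in main only stands in for whatever the real source is (a file, a socket, etc.). The wrapper only has to remember the bytes read since mark(), not the whole stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class MarkResetSketch {

    /** Reads the first n bytes twice using mark()/reset() and returns "first|second". */
    static String peekTwice(InputStream source, int n) throws IOException {
        BufferedInputStream in = new BufferedInputStream(source);
        in.mark(n);                        // the mark stays valid for up to n bytes read
        byte[] first = in.readNBytes(n);   // Java 11+
        in.reset();                        // rewind to the marked position
        byte[] second = in.readNBytes(n);  // same bytes again
        return new String(first, StandardCharsets.US_ASCII)
                + "|" + new String(second, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stand-in data; a real source would be a FileInputStream or similar.
        InputStream source = new ByteArrayInputStream(
                "HEADERbody".getBytes(StandardCharsets.US_ASCII));
        System.out.println(peekTwice(source, 6));
    }
}
```

Note that reset() is only guaranteed to work if no more than the readlimit passed to mark() has been read since the mark.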
I most often use a PushbackInputStream when parsing data, and have the need to go back and re-read the data. Here is an explanation:
http://tutorials.jenkov.com/java-io/pushbackinputstream.html
There is also a PushbackReader should you need a character based stream instead.
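A small sketch of the pushback idea, assuming Java 9+ (for readAllBytes). The class name, the method name and the "starts with '<'" check are invented for the example. It reads one byte to decide what kind of data is coming, pushes it back, and then reads the stream as if nothing had been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class PushbackSketch {

    /** Peeks at the first byte, unreads it, then returns "xml:" or "other:" plus the full content. */
    static String sniffAndRead(InputStream source) throws IOException {
        PushbackInputStream in = new PushbackInputStream(source, 1); // 1-byte pushback buffer
        int first = in.read();
        if (first != -1) {
            in.unread(first); // put the byte back so the next read sees it again
        }
        String kind = (first == '<') ? "xml:" : "other:";
        return kind + new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("<a/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(sniffAndRead(source));
    }
}
```

The pushback buffer size in the constructor is the limit on how many bytes can be unread at once; unreading more than that throws an IOException.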
Good afternoon everyone,
First of all, this is purely a personal project: I build small things like this to improve my Java knowledge and to better understand how developers work with sockets and bytes, which should help with my future ideas.
I'm currently writing a lightweight HTTP server in Java to understand how it works. I've been reading the documentation but still have difficulty understanding parts of it. The main problem I'm facing is that the Content-Length seems to be larger than the amount of data I get from the BufferedReader. I don't know whether this is caused by the way bytes are decoded into chars by the BufferedReader, so that it ends up with less data. If so, I probably have to treat this part as binary and read the bytes from the InputStream directly, but that leads to the real problem.
A Reader reads a certain number of bytes ahead and keeps them as its buffer, so that data has been consumed by the Reader and is no longer on the stream; calling read() on the stream then returns -1 because there are no more bytes to read. A multipart body is divided into multiple elements separated by a boundary, with a newline that delimits the information from the content. I still need the information as a String to process it, but the content should be handled as binary data. Without adjusting the buffer length, which would require knowing the exact length needed to read only the information, the most likely result is that the content ends up in the BufferedReader's buffer. Is it possible to recover the content from the data already processed by the BufferedReader, or should I find a way to get that content as binary without it being processed?
As I said, I'm new to working with sockets and services, so I don't know exactly what the possibilities are or whether this is a different kind of issue. Any help would be appreciated. Thank you in advance.
Answer from Remy Lebeau, found in the comments, which was useful for me:
since multipart data is both textual and binary, you are going to have to do your own buffering of the socket data so you have more control and know where the data switches back and forth. At the very least, since you can read binary data directly from a BufferedInputStream, and access its internal buffer, you can let it handle the actual buffering for you, and it is not difficult to write a custom readLine() method that can read a line of text from a BufferedInputStream without using BufferedReader
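One way the custom readLine() idea could look, as a sketch only. The class name is made up, lines are assumed to end in LF or CRLF, and header text is decoded as ISO-8859-1 (an assumption; pick whatever your protocol requires). Because it reads byte by byte from the same stream, nothing beyond the line terminator is consumed, so binary content can be read from the same stream afterwards.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class BinaryLineReader {

    /** Reads one line (terminated by LF or CRLF) without a Reader; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                break; // end of line; the terminator is not stored
            }
            line.write(b);
        }
        if (b == -1 && line.size() == 0) {
            return null; // nothing left on the stream
        }
        byte[] bytes = line.toByteArray();
        int length = bytes.length;
        if (length > 0 && bytes[length - 1] == '\r') {
            length--; // drop the CR of a CRLF pair
        }
        return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Content-Type: text/plain\r\n\r\nBODY".getBytes(StandardCharsets.ISO_8859_1);
        // The BufferedInputStream keeps the byte-at-a-time reads cheap.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        String header;
        while ((header = readLine(in)) != null && !header.isEmpty()) {
            System.out.println("header: " + header);
        }
        System.out.println("body bytes left: " + in.readAllBytes().length); // Java 9+
    }
}
```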
As I understand it, an InputStream, once consumed, cannot be reused or re-read, as it has no content left.
But using PushbackInputStream, we can unread the bytes we have read back into the input stream.
Can we use PushbackInputStream safely to make InputStream consumable repeatedly?
Yes, within its pushback limit, and so is BufferedInputStream with the mark() and reset() features and an adequate buffer, but if you have a solution that involves re-reading input, you are designing it wrong. A compiler can proceed left to right along the input file without ever backtracking: so can you.
While reading the Java Tutorials, the topic Basic I/O says to use InputStreamReader and OutputStreamWriter when there are no prepackaged character stream classes.
1) What are prepackaged character stream classes?
Does it mean a file that already has some text?
The term is quite vague and doesn't really seem to be defined anywhere, so good question.
As best I understand it, it means things like FileReader, FileWriter, StringWriter, etc.: classes that have wrapped up a particular kind of stream for you and provide the functionality required to work with it.
Note that these classes work with characters, not bytes, and that is generally what you want in Java for dealing with String data in files. (Their byte-oriented counterparts, such as FileInputStream and FileOutputStream, are byte streams, not character streams.) On the other hand, if you are reading a raw byte source, the data will come in as bytes and you can then use InputStreamReader to convert those bytes to characters.
So a prepackaged stream reader is one that already provides you the data pre-packaged in the form that you want it.
I believe it to mean classes which inherit Reader or Writer. Such classes "wrap" byte streams so as to convert them automatically to character streams. Example: FileReader, FileWriter; they can read text from files directly.
If no such classes exist for your particular stream needs but you know what you get out of it/put into it is text, then you must use these two wrapper classes.
Classical example: HTML. It is text, but what you get from sockets is byte streams; if you want to read it as HTML, use a Reader (with the correct encoding!) over the socket stream (but of course, many APIs today don't require you to do that).
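As a rough illustration of wrapping a byte stream in a Reader with an explicit encoding: the class and method names below are invented, and the ByteArrayInputStream stands in for something like a socket's input stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class ByteToCharSketch {

    /** Decodes the byte stream with the given charset and returns its first line of text. */
    static String firstLine(InputStream bytes, Charset charset) throws IOException {
        // InputStreamReader is the bridge from bytes to chars; always name the charset.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(bytes, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<h1>caf\u00e9</h1>\n<p>menu</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(new ByteArrayInputStream(html), StandardCharsets.UTF_8));
    }
}
```

Decoding the same bytes with the wrong charset would not fail, it would just produce the wrong characters, which is why the encoding matters.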
I can see there are a number of posts about reusing an InputStream. I understand an InputStream is a one-time thing and cannot be reused.
However, I have a use case like this:
I have downloaded the file from DropBox by obtaining the DropBoxInputStream using DropBox's Java SDK. I then need to upload the file to another system by passing the InputStream. However, as part of the upload, I have to provide the MD5 of the file. So I have to read the file from the stream before uploading the file. Because the DropBoxInputStream I received can only be used once, I have to get another DropBoxInputStream after I have calculated the MD5 and before uploading the file. The procedure is like:
1. Get the first DropBoxInputStream
2. Read from the DropBoxInputStream and calculate MD5
3. Get the second DropBoxInputStream
4. Upload file using the MD5 and the second DropBoxInputStream.
I am wondering whether there is any way for me to "cache" or "back up" the InputStream before I calculate the MD5, so that I can skip step 3 of obtaining the same DropBoxInputStream again.
Many thanks
EDIT:
Sorry I missed some information.
What I am currently doing is that I use a MD5DigestOutputStream to calculate MD5. I stream data across the MD5DigestOutputStream and save them locally as a temp file. Once the data goes through the MD5DigestOutputStream, it will calculate the MD5.
I then call a third party library to upload the file using the calculated md5 and a FileInputStream which reads from the temp file.
However, this sometimes requires huge disk space and I want to remove the need for a temp file. The library I use only accepts an MD5 and an InputStream. This means I have to calculate the MD5 on my end. My plan is to use my MD5DigestOutputStream to write data to /dev/null (not keeping the file) so that I can calculate the MD5, and then get the InputStream from DropBox again and pass that to the library I use. I assume the library will be able to get the file directly from DropBox without the need for me to cache the file either in memory or on disk. Will it work?
Input streams aren't really designed for creating copies or re-using, they're specifically for situations where you don't want to read off into a byte array and use array operations on that (this is especially useful when the whole array isn't available, as in, e.g., socket communication). You could buffer up into a byte array, which is the process of reading sections from the stream into a byte array buffer until you have enough information.
But that's unnecessary for calculating an md5. Notice that InputStream is abstract, so it needs to be implemented in a subclass. It has many implementations: GZIPInputStream, FileInputStream etc. Some of these are, in design pattern speak, decorators of the IO stream: they wrap another stream and add extra functionality to the abstract base IO classes. For example, GZIPInputStream decompresses gzip data as it is read from the underlying stream.
So, what you need is a stream to do this for md5. There is, joyfully, a well documented similar thing: see this answer. So you should just be able to pass your dropbox input stream (as it will be itself an input stream) to create a new DigestInputStream, and then you can both take the md5 and continue to read as before.
Worried about type casting? The idea with decorators in Java is that, since the InputStream base class interfaces all the methods and 'beef' you need to do your IO, there's no harm in passing instances of objects inheriting from InputStream in the constructor of each stream implementation, and you can still do the same core IO.
Finally, I should probably answer your actual question: say you still want to "cache" or "back up" the stream anyway? Well, you could just write it to a byte array. This is well documented, but can become a faff when your streams get more complicated. Alternatively, try looking at a PushbackInputStream. Here, you can easily write a function to read off n bytes, perform an operation on them, and then restore them to the stream. It is generally good to avoid these implementations of streams in Java, as they are bad for memory use, but no worse than buffering everything up, which you'd otherwise have to do.
Or, of course, I would have a go with DigestInputStream.
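A minimal sketch of the DigestInputStream approach, for illustration. The class and method names are made up, and the ByteArrayInputStream/ByteArrayOutputStream pair stands in for the DropBox stream and the upload target. The MD5 is computed as a side effect of the bytes passing through, so no temp file or second stream is needed for the digest itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of any library.
final class Md5WhileStreaming {

    /** Copies source to sink and returns the hex MD5 of every byte that passed through. */
    static String copyAndDigest(InputStream source, OutputStream sink)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream in = new DigestInputStream(source, md5)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n); // each read also updates the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream source = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copyAndDigest(source, sink));
    }
}
```

One caveat: the digest is only complete once the whole stream has been read. If the upload library needs the MD5 before it starts reading the stream, a single pass like this is not enough and you are back to two passes or a cache.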
Hope this helps,
Best.
You don't need to open a new InputStream from DropBox.
Once you have read the file from DropBox, you have it locally. So it is either in memory (in a byte array) or you stored it in a local file. Now you can create an InputStream that reads the data from memory (ByteArrayInputStream) or disk (FileInputStream) in order to upload the file.
So instead of caching the InputStream (which you can't) you cache the contents (which you can).
Is it bad style to keep the references to streams "further down" a filter chain, and use those lower level streams again, or even to swap one type of stream for another? For example:
OutputStream os = new FileOutputStream("file");
PrintWriter pw = new PrintWriter(os);
pw.print("print writer stream");
pw.flush();
pw = null;
DataOutputStream dos = new DataOutputStream(os);
dos.writeBytes("dos writer stream");
dos.flush();
dos = null;
os.close();
If so, what are the alternatives if I need to use the functionality of both streams, e.g. if I want to write a few lines of text to a stream, followed by binary data, or vice versa?
This can be done in some cases, but it's error-prone. You need to be careful about buffers and stuff like the stream headers of ObjectOutputStream.
if I want to write a few lines of text to a stream, followed by binary
data, or vice versa?
For this, all you need to know is that you can convert text to binary data and back but always need to specify an encoding. However, it is also error-prone because people tend to use the API methods that use the platform default encoding, and of course you're basically implementing a parser for a custom binary file format - lots of things can go wrong there.
All in all, if you're creating a file format, especially when mixing text and binary data, it's best to use an existing framework like Google Protocol Buffers.
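If you do end up hand-rolling it, here is a sketch of one way to mix text and binary on a single stream without juggling two wrappers. The length-prefixed layout, the class name and the method names are all invented for this example; they are not a standard format. The text is converted to bytes with an explicit encoding and everything goes through one DataOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the record layout is an assumption made for the example.
final class TextThenBinarySketch {

    /** Writes a length-prefixed UTF-8 string followed by a length-prefixed int array. */
    static byte[] write(String text, int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] encoded = text.getBytes(StandardCharsets.UTF_8); // explicit encoding
            out.writeInt(encoded.length); // the reader learns where the text ends
            out.write(encoded);
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the record back in the same order it was written. */
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] encoded = new byte[in.readInt()];
            in.readFully(encoded);
            StringBuilder result = new StringBuilder(new String(encoded, StandardCharsets.UTF_8));
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                result.append(',').append(in.readInt());
            }
            return result.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write("header line", new int[] {1, 2, 3})));
    }
}
```

Using a single wrapper avoids the buffering pitfalls of switching between a PrintWriter and a DataOutputStream on the same underlying stream.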
If you have to do it, then you have to do it. So if you're dealing with an external dependency that you don't have control over, you just have to do it.
I think the bad style is the fact that you would need to do it. If you had to send binary data across sometimes, and text across at others, it would probably be best to have some kind of message object and send the object itself over the wire with Serialization. The data overhead isn't too much if structured properly.
I don't see why not. I mean, the implementations of the various stream classes should protect you from writing invalid data. So long as you're reading it back the same way, and your code is otherwise understandable, I don't see why that would be a problem.
Style doesn't always mean you have to do it the way you've seen others do it. So long as it's logical, and someone reading the code would see what (and why) you're doing it without you needing to write a bunch of comments, then I don't see what the issue is.
Since you're flushing between, it's probably fine. But it might be cleaner to use one OutputStream and just use os.write(string.getBytes()); to write the strings.