What are prepackaged character stream classes? - java

While reading the Java Tutorials, I found that the Basic I/O topic says to use InputStreamReader and OutputStreamWriter when there are no prepackaged character stream classes.
1) What are prepackaged character stream classes?
Does it mean that a file already has some text in it?

The term is quite vague and doesn't really seem to be defined anywhere, so good question.
As best I understand it, it means things like FileReader, FileWriter, StringReader, CharArrayWriter, etc.: classes that have already wrapped up a particular kind of source or destination as a character stream and provide the functionality required to work with it.
Note that these classes work with characters, not bytes, and that is generally what you want in Java for dealing with String data in files. Classes with Stream in their name, such as FileInputStream, FileOutputStream and ByteArrayOutputStream, work with bytes instead. If all you have is such a byte source, the data will come in as bytes and you can then use InputStreamReader to convert those bytes to characters.
So a prepackaged character stream class is one that already provides you the data in the form that you want it.

I believe it to mean classes which extend Reader or Writer. Such classes "wrap" byte streams so as to convert them automatically to character streams. Example: FileReader, FileWriter; they can read text from and write text to files directly.
If no such classes exist for your particular stream needs but you know that what you get out of it or put into it is text, then you must use the two wrapper classes InputStreamReader and OutputStreamWriter.
Classic example: HTML. It is text, but what you get from sockets is byte streams; if you want to read it as HTML, use a Reader (with the correct encoding!) over the socket stream (but of course, many APIs today don't require you to do that).
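A minimal sketch of that wrapping, assuming the bytes are UTF-8. The class name BridgeExample and the helper readAll are made up for illustration, and a ByteArrayInputStream stands in for the socket stream:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BridgeExample {
    // Hypothetical helper: decodes every byte of the stream as UTF-8 and returns the text.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // InputStreamReader is the byte-to-character bridge; the charset is given explicitly.
        try (Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream(): raw bytes that happen to be UTF-8 text.
        byte[] raw = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(raw)));
    }
}
```

With a real socket you would pass socket.getInputStream() instead, and take the charset from the protocol (for HTTP, typically the Content-Type header) rather than assuming UTF-8.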

Related

Proper Java classes for reading and writing files?

While reading some sources about managing file I/O in Java, I learned that there is more than one alternative for input and output operations.
These are:
BufferedReader and BufferedWriter
FileReader and FileWriter
FileInputStream and FileOutputStream
InputStreamReader and OutputStreamWriter
Scanner class
Which of these is the best alternative for handling text files? What's the best alternative for serialization? What does Java NIO say about it?
Two kinds of data
Generally speaking there are two "worlds":
binary data
text data
When it's a file (or a socket, or a BLOB in a DB, or ...), then it's always binary data first.
Some of that binary data can be treated as text data (which involves something called an "encoding" or "character encoding").
Binary Data
Whenever you want to handle the binary data then you need to use the InputStream/OutputStream classes (generally, everything that contains Stream in its name).
That's why there's a FileInputStream and a FileOutputStream: those read from and write to files and they handle binary data.
Text Data
Whenever you want to handle text data, then you need to use the Reader/Writer classes.
Whenever you need to convert binary data to text (or vice versa), then you need some kind of encoding (common ones are UTF-8, UTF-16, ISO-8859-1 (and related ones) and the good old US-ASCII). "Luckily" the Java platform also has something called the "default platform encoding" which it will use whenever it needs one but the code doesn't specify one.
The platform default encoding is a double-edged sword, however:
it makes writing code easier, because you don't have to specify an encoding for each operation but
it might not match the data you have: If the platform-default encoding is ISO-8859-1 and the file you read is actually UTF-8, then you will get a scrambled output!
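The scrambling is easy to reproduce without touching any files. The following is only a sketch: the class and method names are invented for illustration, and the "wrong" charset is passed in explicitly instead of coming from the platform default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Hypothetical helper: encodes text as UTF-8, then decodes the same bytes
    // with a different charset, as a mismatched platform default would do.
    static String decodeUtf8BytesAs(String text, Charset wrong) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, wrong);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, two bytes in UTF-8
        String scrambled = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        // Print lengths instead of the characters to stay independent of the console encoding.
        System.out.println(original.length() + " character in, " + scrambled.length() + " characters out");
    }
}
```

The single character comes back as two characters because ISO-8859-1 maps each of the two UTF-8 bytes to its own character.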
For reading, we should also mention the BufferedReader which can be wrapped around any other Reader and adds the ability to handle whole lines at once.
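As a rough illustration of the line handling, here is a sketch that wraps an arbitrary Reader in a BufferedReader. The names LineReading and readLines are made up, and a StringReader stands in for a file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineReading {
    // Hypothetical helper: collects all lines from any Reader.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of input and strips the line terminator.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLines(new StringReader("first\r\nsecond\nthird")));
    }
}
```

Because readLine() accepts \n, \r and \r\n and drops them, this is convenient for text but loses information when the exact bytes matter.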
Scanner is a special class that's meant to parse text input into tokens. It's most useful for structured text but often used on System.in to provide a very simple way to read data from stdin (i.e. from what the user inputs on the keyboard).
Bridging the gap
Now, confusingly enough, there are classes that bridge those two worlds, and they generally have both parts in their names:
an InputStreamReader consumes an InputStream and is itself a Reader.
an OutputStreamWriter is a Writer and writes to an OutputStream.
And then there are "shortcut classes" that basically combine two other classes that are often combined.
a FileReader is basically a combination of a FileInputStream with an InputStreamReader
a FileWriter is basically a combination of a FileOutputStream with an OutputStreamWriter
Note that FileReader and FileWriter have a major drawback compared to their more complicated "hand-built" alternative: before Java 11 they always use the platform default encoding, which might not be what you want! (Java 11 added constructors that take a Charset.)
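The "hand-built" combination could look roughly like this. It is a sketch only: the class and method names are invented, UTF-8 is an assumed choice of encoding, and the demo writes to a temporary file:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingFileIo {
    // FileOutputStream + OutputStreamWriter: what FileWriter combines, but with a chosen charset.
    static void writeText(File file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // FileInputStream + InputStreamReader: what FileReader combines, but with a chosen charset.
    static String readText(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("encoding-demo", ".txt");
        tmp.deleteOnExit();
        writeText(tmp, "gr\u00fc\u00df dich");
        System.out.println(readText(tmp).length() + " characters read back");
    }
}
```

Writing and reading with the same explicit charset keeps the result independent of the machine the code happens to run on.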
What about serialization?
ObjectOutputStream and ObjectInputStream are special streams used for serialization.
As the names of the classes imply, serialization involves only binary data (even when serializing String objects), so you'll want to use *Stream classes exclusively. As long as you avoid any Reader/Writer classes, you should be fine.
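A small sketch of a serialization round trip that stays entirely in the *Stream world. The class and method names are illustrative, and byte arrays stand in for a file or socket:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical helper: serializes any Serializable object to a byte array.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Hypothetical helper: reads a single object back from the bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize("a String is serialized as binary data too");
        System.out.println(deserialize(data));
    }
}
```

For files, replacing the byte-array streams with FileOutputStream and FileInputStream should be all that changes.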
Further resources
the Basic I/O trail.
Joel's old-ish article on Unicode (good introduction, slightly light on technical detail)
On the evils of platform default encoding (also this)

Tokenize Java InputStream into streams, not Strings

I know the Java libraries pretty well, so I was surprised when I realized that, apparently, there's no easy way to do something seemingly simple with a stream. I'm trying to read an HTTP request containing multipart form data (large, multiline tokens separated by delimiters that look like, for example, ------WebKitFormBoundary5GlahTkFmhDfanAn--), and I want to read until I encounter a part of the request with a given name, and then return an InputStream of that part.
I'm fine with just reading the stream into memory and returning a ByteArrayInputStream, because the files submitted should never be larger than 1MB. However, I want to make sure that the reading method throws an exception if the file is larger than 1MB, so that excessively-large files don't fill up the JVM's memory and crash the server. The file data may be binary, so that rules out BufferedReader.readLine() (it drops newlines, which could be any of \r, \n, or \r\n, resulting in loss of data).
All of the obvious tokenizing solutions, such as Scanner, read the tokens as Strings, not streams, which could cause OutOfMemoryErrors for large files--exactly what I'm trying to avoid. As far as I can tell, there's no equivalent to Scanner that returns each token as an InputStream without reading it into memory. Is there something I'm missing, or is there any way to create something like that myself, using just the standard Java libraries (no Apache Commons, etc.), that doesn't require me to read the stream a character at a time and write all of the token-scanning code myself?
Addendum: Shortly before posting this, I realized that the obvious solution to my original problem was simply to read the full request body into memory, failing if it's too large, and then to tokenize the resulting ByteArrayInputStream with a Scanner. This is inefficient, but it works. However, I'm still interested to know if there's a way to tokenize an InputStream into sub-streams, without reading them into memory, without using extra libraries, and without resorting to character-by-character processing.
It's not possible without loading them into memory (the solution you don't want) or saving them to disk (becomes I/O heavy). Tokenizing the stream into separate streams without loading it into memory implies that you can read the stream (to tokenize it) and be able to read it again later. In short, what you want is impossible unless your stream is seekable, but these are generally specialized streams for very specific applications and specialized I/O objects, like RandomAccessFile.
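For the in-memory route with a size limit that the question describes, a minimal sketch could look like this. BoundedRead and readAtMost are hypothetical names, and the limit is whatever the caller decides (1MB in the question):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedRead {
    // Hypothetical helper: reads the whole stream into memory,
    // failing as soon as more than maxBytes have been seen.
    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("input exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = readAtMost(new ByteArrayInputStream(new byte[500]), 1024 * 1024);
        System.out.println(body.length + " bytes read");
    }
}
```

At most the limit plus one buffer's worth of data is ever held, so an oversized upload fails early instead of exhausting the heap. The resulting array can then be wrapped in a ByteArrayInputStream.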

Understand Java IO

I am still a newbie Java programmer. I was learning about Java IO and noticed that the book as well as the online tutorials do not talk about the Scanner class. They always mention creating input/output stream reader objects and using them to read or write.
I am very familiar with the Scanner class, and after reading I started to think that maybe Scanner is not the right way to read console input or files in Java.
Please clarify my doubt, and if you could point me to an easy-to-understand tutorial, that would be great. I have already looked at the Oracle docs and other popular websites, and read Herbert Schildt's book and the awful Head First Java book (barf..barf).
You need to understand that a) a lot of material about Java was written years ago and Scanner is relatively recent, and b) Scanner is the right tool in some situations, but you can use raw streams for binary data or readers for text in all situations.
As you suspect, Scanner is the right choice for simple text documents.
You have to evaluate the material you are reading and give it context (like how old it is). There isn't any tutorial which will help you with that. ;)
The Scanner class is a text parser that is well suited to reading text files and console input. If you want to read other file types, the Scanner class is not optimal.
A good overview can be found here: java i/o. The summary from there:
The java.io package contains many classes that your programs can use to read and write data. Most of the classes implement sequential access streams. The sequential access streams can be divided into two groups: those that read and write bytes and those that read and write Unicode characters. Each sequential access stream has a speciality, such as reading from or writing to a file, filtering data as it's read or written, or serializing an object.
After reading this, you should look into Apache Commons I/O, which gives you some handy utility classes for I/O.
The java.io package supports byte-level and character-level operations. Both can be done in a streamed or buffered fashion. Examples of these I/O types can be found here.
A Scanner object is useful for breaking down formatted input into tokens and translating individual tokens according to their data type.
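To make the "tokens translated by data type" part concrete, here is a small sketch. The class name, the method name and the input layout (a word, an integer, a decimal number) are assumptions for illustration:

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerTokens {
    // Hypothetical helper: parses "name count price" from whitespace-separated text.
    static String describe(String input) {
        // Locale.US fixes '.' as the decimal separator regardless of the machine's locale.
        try (Scanner sc = new Scanner(input).useLocale(Locale.US)) {
            String name = sc.next();        // next token as a String
            int count = sc.nextInt();       // next token parsed as an int
            double price = sc.nextDouble(); // next token parsed as a double
            return name + ":" + count + ":" + price;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("apple 3 1.5")); // prints "apple:3:1.5"
    }
}
```

The same calls work on new Scanner(System.in) for console input. If a token does not match the requested type, nextInt() and nextDouble() throw an InputMismatchException.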
Example about the Scanner

Stream chaining in Java

Is it bad style to keep the references to streams "further down" a filter chain, and use those lower level streams again, or even to swap one type of stream for another? For example:
OutputStream os = new FileOutputStream("file");
PrintWriter pw = new PrintWriter(os);
pw.print("print writer stream");
pw.flush();
pw = null;
DataOutputStream dos = new DataOutputStream(os);
dos.writeBytes("dos writer stream");
dos.flush();
dos = null;
os.close();
If so, what are the alternatives if I need to use the functionality of both streams, e.g. if I want to write a few lines of text to a stream, followed by binary data, or vice versa?
This can be done in some cases, but it's error-prone. You need to be careful about buffers and stuff like the stream headers of ObjectOutputStream.
"if I want to write a few lines of text to a stream, followed by binary data, or vice versa?"
For this, all you need to know is that you can convert text to binary data and back but always need to specify an encoding. However, it is also error-prone because people tend to use the API methods that use the platform default encoding, and of course you're basically implementing a parser for a custom binary file format - lots of things can go wrong there.
All in all, if you're creating a file format, especially when mixing text and binary data, it's best to use an existing framework like Google Protocol Buffers.
If you have to do it, then you have to do it. So if you're dealing with an external dependency that you don't have control over, you just have to do it.
I think the bad style is the fact that you would need to do it. If you had to send binary data across sometimes, and text across at others, it would probably be best to have some kind of message object and send the object itself over the wire with Serialization. The data overhead isn't too much if structured properly.
I don't see why not. I mean, the implementations of the various stream classes should protect you from writing invalid data. So long as you're reading it back the same way, and your code is otherwise understandable, I don't see why that would be a problem.
Style doesn't always mean you have to do it the way you've seen others do it. So long as it's logical, and someone reading the code would see what (and why) you're doing it without you needing to write a bunch of comments, then I don't see what the issue is.
Since you're flushing in between, it's probably fine. But it might be cleaner to use one OutputStream and just use os.write(string.getBytes(StandardCharsets.UTF_8)); to write the strings, so that the encoding is explicit rather than the platform default.
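One way to avoid juggling two wrappers is to send both the text and the binary data through a single DataOutputStream. This is only a sketch: the length-prefixed layout and the names MixedTextAndBinary, write and read are assumptions, not an established format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MixedTextAndBinary {
    // Hypothetical layout: [int byte length][UTF-8 bytes of the text][int number].
    static byte[] write(String text, int number) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] encoded = text.getBytes(StandardCharsets.UTF_8); // explicit encoding
            out.writeInt(encoded.length); // length prefix so the reader knows where the text ends
            out.write(encoded);
            out.writeInt(number);         // binary data follows the text
        }
        return bytes.toByteArray();
    }

    // Reads the data back in the same order it was written.
    static String read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] encoded = new byte[in.readInt()];
            in.readFully(encoded);
            int number = in.readInt();
            return new String(encoded, StandardCharsets.UTF_8) + "|" + number;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write("some text", 42))); // prints "some text|42"
    }
}
```

With only one wrapper there is a single buffer to flush and no risk of two writers interleaving their output. It is still a home-grown format, though, so the earlier caveats about custom parsers apply.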

Lightweight java.io.InputStream implementation that supports mark() & reset()

Good day,
Currently, we are using ByteArrayInputStream for our reset-able InputStream. My problem with it is that it consumes a lot of memory (it loads all the bytes it represents in memory unlike some other InputStream implementations).
My question then is, is there any lighter implementation of InputStream which supports mark() & reset()?
I tried searching in commons-io as well, but I fail to see any.
Thanks,
Franz
Would using a BufferedInputStream work for you? Without knowing where the original data is coming from (e.g., why you have a ByteArrayInputStream to begin with), it is a bit hard to answer your question.
I most often use a PushbackInputStream when parsing data and I need to go back and re-read the data. Here is an explanation:
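A rough sketch of what that could look like. The names MarkResetExample and peekTwice are made up, and a ByteArrayInputStream is used here only as a stand-in for whatever the real source is:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetExample {
    // Hypothetical helper: reads up to n bytes, rewinds with reset(), reads them again,
    // and returns both reads concatenated as ASCII text.
    static String peekTwice(InputStream source, int n) throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(source)) {
            in.mark(n); // remember this position; valid for up to n bytes read
            byte[] first = new byte[n];
            int a = Math.max(in.read(first, 0, n), 0);
            in.reset(); // jump back to the marked position
            byte[] second = new byte[n];
            int b = Math.max(in.read(second, 0, n), 0);
            return new String(first, 0, a, StandardCharsets.US_ASCII)
                    + new String(second, 0, b, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdef".getBytes(StandardCharsets.US_ASCII);
        System.out.println(peekTwice(new ByteArrayInputStream(data), 3)); // prints "abcabc"
    }
}
```

Unlike a ByteArrayInputStream over the full data, the BufferedInputStream only needs to keep the bytes between the mark and the current position (up to the read limit passed to mark()), so memory use depends on how far back you need to rewind.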
http://tutorials.jenkov.com/java-io/pushbackinputstream.html
There is also a PushbackReader should you need a character based stream instead.
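As a sketch of the push-back idea, the following peeks at the first byte and puts it back so that later reads still see it. The class and method names are invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class PushbackExample {
    // Hypothetical helper: reports whether the stream starts with '<' without consuming that byte.
    static boolean startsWithAngleBracket(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return false; // empty stream, nothing to push back
        }
        in.unread(b); // the next read() will return this byte again
        return b == '<';
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<a>".getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data))) {
            System.out.println(startsWithAngleBracket(in)); // prints "true"
            System.out.println((char) in.read());           // prints "<"
        }
    }
}
```

The default push-back buffer holds a single byte. A larger one can be requested with the two-argument constructor, which keeps the memory cost limited to however many bytes you need to look ahead.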
