I just asked a question about why my thread shutdown wasn't working. It turned out that readLine() was blocking my thread before the shutdown flag could be recognised. This was easy to fix by checking ready() before calling readLine().
However, I'm now using a DataInputStream to do the following in series:
int x = reader.readInt();
int y = reader.readInt();
byte[] z = new byte[y];
reader.readFully(z);
I know I could implement my own buffering that checks the running flag while filling the buffer, but that would be tedious. Instead, I could let the InputStream class do the buffering, wait until the n bytes I need are available, and then do a read that won't block, since I know exactly how much I need to read:
4 bytes for the first integer
4 bytes for the second integer y
and y bytes for the z byte array.
Instead of using ready() to check if there is a line in the buffer, is there some equivalent ready(int bytesNeeded)?
The available() method returns an estimate of the number of bytes that can be read without blocking, which for a buffered stream includes the bytes already in its internal buffer.
So, one can do something like:
while (reader.available() < 4) checkIfShutdown();
reader.readInt();
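For illustration, here is a minimal sketch of that polling approach applied to the question's x/y/z message, assuming the shutdown flag is a shared AtomicBoolean. The class names, ShutdownRequestedException and the 10 ms sleep are hypothetical choices for this sketch. The payload is read in chunks of whatever available() reports instead of waiting for all y bytes at once, because available() may never report more than the stream can buffer.

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical exception used to report that shutdown was requested while waiting.
class ShutdownRequestedException extends IOException {
    ShutdownRequestedException() { super("shutdown requested"); }
}

class PollingFrameReader {
    static final class Frame {
        final int x;
        final byte[] z;
        Frame(int x, byte[] z) { this.x = x; this.z = z; }
    }

    private final DataInputStream in;
    private final AtomicBoolean shutdown;

    PollingFrameReader(InputStream raw, AtomicBoolean shutdown) {
        this.in = new DataInputStream(new BufferedInputStream(raw));
        this.shutdown = shutdown;
    }

    // Polls available() until at least `needed` bytes are reported, checking the flag
    // between polls. At end of stream available() stays 0, so in that case this only
    // returns by throwing once the flag is set.
    private void awaitBytes(int needed) throws IOException {
        while (in.available() < needed) {
            if (shutdown.get()) throw new ShutdownRequestedException();
            try {
                Thread.sleep(10); // avoid a hot spin; 10 ms is an arbitrary choice
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new ShutdownRequestedException();
            }
        }
    }

    Frame readFrame() throws IOException {
        awaitBytes(8);                      // header: two 4-byte ints
        int x = in.readInt();
        int y = in.readInt();
        byte[] z = new byte[y];
        int off = 0;
        while (off < y) {
            awaitBytes(1);                  // at least one payload byte is ready
            int want = Math.min(in.available(), y - off);
            int n = in.read(z, off, want);  // will not block: want <= available()
            if (n < 0) throw new EOFException();
            off += n;
        }
        return new Frame(x, z);
    }
}
```

Keep in mind that available() is only an estimate, so this relies on the concrete stream reporting it sensibly.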
You can use InputStream.available() to get an estimate of the amount of bytes that can be read. Quoting the Javadoc:
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking, which may be 0, or 0 when end of stream is detected. The read might be on the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In other words, if available() returns n, you know you can call read(buffer, 0, n) without blocking (although it may return fewer than n bytes). Note that, as the Javadoc states, the value returned is an estimate. For example, InflaterInputStream.available() always returns 1 until EOF is reached. Check the documentation of the InputStream subclass you will be using to ensure it meets your needs.
You are going to need to implement your own equivalent of BufferedInputStream, either as the sole owner of an InputStream and a thread (possibly borrowed from a pool) to block in, or with NIO.
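A sketch of the first option, assuming a dedicated thread is acceptable: that thread blocks in readInt()/readFully() and hands complete frames to a queue, while the main thread polls the queue with a timeout and checks its shutdown flag between polls. BackgroundFrameReader and its methods are hypothetical names. Closing the underlying stream is the assumed way to stop the reader thread (for a socket, this makes the blocked read fail), because interrupting a thread does not unblock a java.io read.

```java
import java.io.*;
import java.util.concurrent.*;

class BackgroundFrameReader implements Closeable {
    private final InputStream raw;
    private final BlockingQueue<byte[]> frames = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile IOException failure;

    BackgroundFrameReader(InputStream raw) {
        this.raw = raw;
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        worker = new Thread(() -> {
            try {
                while (true) {
                    in.readInt();                 // x: ignored in this sketch
                    int y = in.readInt();
                    byte[] z = new byte[y];
                    in.readFully(z);              // blocking is fine: only this thread waits
                    frames.put(z);
                }
            } catch (EOFException e) {
                // end of stream (a truncated frame also ends up here in this sketch)
            } catch (IOException e) {
                failure = e;                      // e.g. the stream was closed by close()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "frame-reader");
        worker.setDaemon(true);
        worker.start();
    }

    // Waits up to `timeout` for the next frame and returns null if none arrived,
    // so the caller can check its shutdown flag between calls.
    byte[] poll(long timeout, TimeUnit unit) throws IOException, InterruptedException {
        byte[] frame = frames.poll(timeout, unit);
        IOException f = failure;
        if (frame == null && f != null) throw f;
        return frame;
    }

    // Closing the underlying stream is what unblocks the worker's read.
    @Override
    public void close() throws IOException {
        worker.interrupt();
        raw.close();
    }
}
```

In the caller, a loop such as `while (!shutdown) { byte[] frame = reader.poll(100, TimeUnit.MILLISECONDS); ... }` never waits more than about 100 ms before rechecking the flag.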
How can I make this piece of code extremely quick?
It reads a raw image using a RandomAccessFile (in) and writes it to a file using a DataOutputStream (out):
final int WORD_SIZE = 4;
byte[] singleValue = new byte[WORD_SIZE];
long position = 0;   // file offset of the next word (assumed to start at 0)
for (int i=1; i<=100000; i++)
{
out.writeBytes(i + " ");
for(int j=1; j<=17; j++)
{
in.seek(position);
in.readFully(singleValue);   // read() may return fewer than 4 bytes; readFully() does not
String str = Integer.toString(ByteBuffer.wrap(singleValue).order(ByteOrder.LITTLE_ENDIAN).getInt());
out.writeBytes(str + " ");
position+=WORD_SIZE;
}
out.writeBytes("\n");
}
The inner for loop writes 17 values and a newline is then written, so each line of the output file holds 17 elements.
Thanks
I assume that the reason you are asking is because this code is running really slowly. If that is the case, then one reason is that each seek and read call is doing a system call. A RandomAccessFile has no buffering. (I'm guessing that singleValue is a byte[] of length 1.)
So the way to make this go faster is to step back and think about what it is actually doing. If I understand it correctly, it is reading every 4th byte in the file, converting them to decimal numbers and outputting them as text, 17 to a line. You could easily do that using a BufferedInputStream like this:
int b = bis.read(); // read a byte
bis.skip(3); // skip 3 bytes.
(with a bit of error checking ....). If you use a BufferedInputStream like this, most of the read and skip calls will operate on data that has already been buffered, and the number of syscalls will reduce to 1 for every N bytes, where N is the buffer size.
UPDATE - my guess was wrong. You are actually reading alternate words, so ...
bis.read(singleValue);
bis.skip(4);
Every 100000 offsets I have to jump 200000 and then do it again till the end of the file.
Use bis.skip(800000) (200000 words of 4 bytes) to do that. It should do a big skip by moving the file position without actually reading any data: one syscall at most (for a FileInputStream, at least).
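One caveat worth sketching: InputStream.skip() is allowed to skip fewer bytes than requested, and BufferedInputStream.skip() in particular only skips what is left in its buffer when that buffer is not empty, so a single bis.skip(800000) may come up short. A small loop handles that; skipFully is a hypothetical helper name (on Java 12 and later, InputStream.skipNBytes(n) does the same job). Note that FileInputStream.skip() may move past the end of the file without reporting an error, so this sketch only detects EOF for streams whose skip() returns 0 at the end.

```java
import java.io.*;

class Skips {
    // Skips exactly n bytes or throws EOFException, looping because skip()
    // may skip fewer bytes than requested.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {   // skip() made no progress: probe one byte for EOF
                throw new EOFException("EOF with " + n + " bytes left to skip");
            } else {
                n--;                        // the probe consumed one byte
            }
        }
    }
}
```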
You can also speed up the output side by a roughly equivalent amount by wrapping the DataOutputStream around a BufferedOutputStream.
But System.out is already buffered.
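Putting the buffering advice together, here is a sketch of the question's loop rewritten over a buffered stream. It assumes the words are consecutive little-endian 32-bit ints starting at the current position of the input (as in the question's code as posted), reads one row of words per readFully() call, and writes through a Writer that the caller is expected to buffer. RawToText and convert are hypothetical names.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class RawToText {
    static final int WORD_SIZE = 4;

    // Reads rows * perRow consecutive little-endian 32-bit ints from `in` and writes
    // each row as text: the 1-based row number, then the values, each followed by a space.
    static void convert(InputStream in, Writer out, int rows, int perRow) throws IOException {
        // 64 KB buffer: roughly one underlying read per 64 KB instead of a seek+read per word
        DataInputStream data = new DataInputStream(new BufferedInputStream(in, 1 << 16));
        byte[] row = new byte[perRow * WORD_SIZE];
        ByteBuffer words = ByteBuffer.wrap(row).order(ByteOrder.LITTLE_ENDIAN);
        StringBuilder line = new StringBuilder();
        for (int i = 1; i <= rows; i++) {
            data.readFully(row);                  // whole row at once; EOFException if short
            line.setLength(0);
            line.append(i).append(' ');
            for (int j = 0; j < perRow; j++) {
                // absolute get: no buffer position bookkeeping needed
                line.append(words.getInt(j * WORD_SIZE)).append(' ');
            }
            line.append('\n');
            out.append(line);
        }
        out.flush();
    }
}
```

A caller would typically pass a FileInputStream and a BufferedWriter around an OutputStreamWriter, e.g. `convert(fileIn, writer, 100000, 17)`, and close both when done.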
I have a small TCP server program and a corresponding client, and they communicate via ServerSocket and Socket classes and DataInputStream/DataOutputStream. And I have a problem with sending Strings to the server.
connection = new Socket("localhost", 2233);
outStream = new DataOutputStream(connection.getOutputStream());
outStream.writeBytes(fileName);
fileName is, at this point in time, a hard-coded String with the value "listener.jardesc". The server reads the string with the following code:
inStream = new DataInputStream(connection.getInputStream());
String fileName = inStream.readLine();
The string is received properly, but three zero-value bytes have been added to the end. Why is that and how can I stop it from happening? (I could, of course, trim the received string or somehow else stop this problem from mattering, but I'd rather prevent the problem completely)
I'm just going to throw this out there. You're using the readLine() method, which has been deprecated since JDK 1.1 (and still is in Java 5, 6 and 7). The API docs state quite clearly that this method "does not properly convert bytes to characters". I would read the data as bytes or use a BufferedReader.
http://docs.oracle.com/javase/1.5.0/docs/api/java/io/DataInputStream.html#readLine%28%29
writeBytes() does not add extra bytes.
The code you've written can't work as shown: you aren't writing a newline, so readLine() blocks until a line terminator arrives or the connection is closed.
In trying to debug this, you appear to have read the bytes some other way, probably with read(), ignored the value returned by read(), and concluded that read() filled the buffer you provided. It didn't: it left the last three bytes in their initial state, which is zero.
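If you control both ends, one way to avoid the newline problem entirely is a length-prefixed format. The sketch below swaps writeBytes()/readLine() for DataOutputStream.writeUTF() and DataInputStream.readUTF(), which write and read a 2-byte length followed by the string in modified UTF-8 (limited to 65535 encoded bytes). FileNameProtocol, sendName and receiveName are hypothetical names.

```java
import java.io.*;

class FileNameProtocol {
    // Sender: writeUTF() prefixes the string with its encoded length,
    // so the receiver knows exactly how many bytes belong to it.
    static void sendName(OutputStream raw, String name) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeUTF(name);
        out.flush();
    }

    // Receiver: readUTF() reads the length, then exactly that many bytes.
    static String receiveName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        return in.readUTF();
    }
}
```

If you would rather keep a text protocol, the alternative is to write fileName followed by a newline through a Writer and read it with BufferedReader.readLine() on the server side.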
Currently I have the below code for reading an InputStream. I am storing the whole file into a StringBuilder variable and processing this string afterwards.
public static String getContentFromInputStream(InputStream inputStream)
// public static String getContentFromInputStream(InputStream inputStream,
// int maxLineSize, int maxFileSize)
{
StringBuilder stringBuilder = new StringBuilder();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String lineSeparator = System.getProperty("line.separator");
String fileLine;
boolean firstLine = true;
try {
// Expect some function which checks for line size limit.
// eg: reading character by character to an char array and checking for
// linesize in a loop until line feed is encountered.
// if max line size limit is passed then throw an exception
// if a line feed is encountered append the char array to a StringBuilder
// after appending check the size of the StringBuilder
// if file size exceeds the max file limit then throw an exception
fileLine = bufferedReader.readLine();
while (fileLine != null) {
if (!firstLine) stringBuilder.append(lineSeparator);
stringBuilder.append(fileLine);
fileLine = bufferedReader.readLine();
firstLine = false;
}
} catch (IOException e) {
//TODO : throw or handle the exception
}
//TODO : close the stream
return stringBuilder.toString();
}
The code went for a review with the Security team and the following comments were received:
BufferedReader.readLine is susceptible to DOS (Denial of Service) attacks (line of infinite length, huge file containing no line feed/carriage return)
Resource exhaustion for the StringBuilder variable (cases when a file containing data greater than the available memory)
Below are the solutions I could think of:
Create an alternate implementation of readLine method (readLine(int limit)), which checks for the no. of bytes read and if it exceeds the specified limit, throw a custom exception.
Process the file line by line without loading the file in entirety. (pure non-Java solution :) )
Please suggest if there are any existing libraries which implement the above solutions.
Also suggest any alternate solutions which offer more robustness or are more convenient to implement than the proposed ones. Though performance is also a major requirement, security comes first.
Updated Answer
You want to avoid all sorts of DOS attacks (on lines, on size of the file, etc). But at the end of the function, you're converting the entire file into one single String! Assume that you limit each line to 8 KB: what happens if somebody sends you a file with millions of 8 KB lines? The line reading part will pass, but when you finally combine everything into a single String, it will consume all available memory.
So since you end up converting everything into one single String, limiting the line size neither helps nor makes it safe. You have to limit the size of the entire file.
Secondly, what you're basically trying to do is, you're trying to read data in chunks. So you're using BufferedReader and reading it line-by-line. But what you're trying to do, and what you really want at the end - is some way of reading the file piece by piece. Instead of reading one line at a time, why not instead read 2 KB at a time?
BufferedReader, as its name says, has a buffer inside it, and you can configure its size (in characters). Let's say you create a BufferedReader with a buffer of 2048 characters (2 KB of ASCII text):
BufferedReader reader = new BufferedReader(..., 2048);
Now if the InputStream that you pass to BufferedReader holds 100 KB of (ASCII) data, BufferedReader will refill its buffer 2 KB at a time, so about 50 times (50 x 2 KB = 100 KB). Similarly, if you create the BufferedReader with a 10 KB buffer, it will refill it about 10 times (10 x 10 KB = 100 KB).
BufferedReader already does the work of reading your file chunk-by-chunk. So you don't want to add an extra layer of line-by-line above it. Just focus on the end result - if your file at the end is too big (> available RAM) - how are you going to convert it into a String at the end?
One better way is to just pass things around as a CharSequence. That's what Android does: throughout the Android APIs, you will see that they return CharSequence everywhere. Since StringBuilder also implements CharSequence, Android will internally use either a String, a StringBuilder or some other optimized string class based on the size/nature of input. So you could directly return the StringBuilder object itself once you've read everything, rather than converting it to a String. This avoids the full copy that toString() makes, which matters for large data. A StringBuilder still keeps its contents in a single internal array that grows as needed, though, so it does not remove the need for a size limit.
So overall:
Limit the overall file size since you're going to deal with the entire content at some point. Forget about limiting or splitting lines
Read in chunks
Using Apache Commons IO, here is how you would read data from a BoundedInputStream into a StringBuilder, splitting by 2 KB blocks instead of lines:
// import org.apache.commons.io.output.StringBuilderWriter;
// import org.apache.commons.io.input.BoundedInputStream;
// import org.apache.commons.io.IOUtils;
BoundedInputStream boundedInput = new BoundedInputStream(originalInput, <max-file-size>);
BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInput), 2048);
StringBuilder output = new StringBuilder();
StringBuilderWriter writer = new StringBuilderWriter(output);
IOUtils.copy(reader, writer); // copies data from "reader" => "writer"
return output;
Original Answer
Use BoundedInputStream from the Apache Commons IO library. Your work becomes much easier.
The following code will do what you want:
public static String getContentFromInputStream(InputStream inputStream) {
inputStream = new BoundedInputStream(inputStream, <number-of-bytes>);
// The rest of the code stays the same
You just simply wrap your InputStream with a BoundedInputStream and you specify a maximum size. BoundedInputStream will take care of limiting reads up to that maximum size.
Or you can do this when you're creating the reader:
BufferedReader bufferedReader = new BufferedReader(
new InputStreamReader(
new BoundedInputStream(inputStream, <no-of-bytes>)
)
);
Basically what we're doing here is, we're limiting the read size at the InputStream layer itself, rather than doing that when reading lines. So you end up with a reusable component like BoundedInputStream which limits reading at the InputStream layer, and you can use that wherever you want.
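One thing to be aware of: by default, BoundedInputStream simply reports end of stream once the limit is reached, so over-long input is silently truncated rather than rejected. If you would rather fail loudly, here is a JDK-only sketch that counts characters as it reads (in 2 K-character chunks, not lines) and throws once a limit is exceeded. LimitedReader, readLimited and the UTF-8 charset are assumptions of this sketch, and the limit is on characters, not bytes.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class LimitedReader {
    // Reads the whole stream as UTF-8 text, but throws as soon as more than maxChars
    // characters have been read, instead of silently truncating.
    static String readLimited(InputStream in, int maxChars) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[2048];      // read in chunks, not lines
        int n;
        while ((n = reader.read(chunk)) != -1) {
            if (sb.length() + n > maxChars) {
                throw new IOException("Input exceeds " + maxChars + " characters");
            }
            sb.append(chunk, 0, n);
        }
        return sb.toString();
    }
}
```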
There are basically 4 ways to do file processing:
1. Stream-Based Processing (the java.io.InputStream model): Optionally put a BufferedReader around the stream, iterate and read the next available text from the stream (if no text is available, block until some becomes available), and process each piece of text independently as it's read (catering for widely varying sizes of text pieces).
2. Chunk-Based Non-Blocking Processing (the java.nio.channels.Channel model): Create a set of fixed-size buffers (representing the "chunks" to be processed) and read into each of the buffers in turn without blocking (the NIO API delegates to native IO, using fast OS-level threads); your main processing thread picks each buffer in turn once it is filled and processes the fixed-size chunk, while other buffers continue to be loaded asynchronously.
3. Part File Processing (including line-by-line processing; can leverage 1 or 2 to isolate or build up each "part"): Break your file format down into semantically meaningful sub-parts (if possible! breaking into lines could be possible!), iterate through stream pieces or chunks and build up content in memory until the next part is completely built, and process each part as soon as it's built.
4. Entire File Processing (the java.nio.file.Files model): Read the entire file into memory in one operation, then process the complete contents.
Which one should you use?
It depends - on your file contents and the type of processing you require.
From a resource-use efficiency perspective (best to worst) is: 1,2,3,4.
From a processing speed & efficiency perspective (best to worst) is: 2,1,3,4.
From an ease of programming perspective (best to worst): 4,3,1,2.
However, some types of processing might require more than the smallest piece of text (ruling out 1, and maybe 2) and some file formats may not have internal parts (ruling out 3).
You're doing 4. I suggest you shift to 3 (or lower), if you can.
Under 4, there's only one way to avoid DOS: limit the size before it's read into memory (or, for that matter, copied to your file system). It's too late once it's read in. If this is not possible, then try 3, 2 or 1.
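For option 4, when the data is already a file on disk, a minimal sketch of limiting the size up front is to check Files.size() and refuse before reading anything. readSmallFile and maxBytes are hypothetical names, and a file that grows between the size check and the read is not covered by this sketch.

```java
import java.io.IOException;
import java.nio.file.*;

class SizeCheckedRead {
    // Refuses to load files larger than maxBytes; the check happens before any data is read.
    static byte[] readSmallFile(Path path, long maxBytes) throws IOException {
        long size = Files.size(path);
        if (size > maxBytes) {
            throw new IOException(path + " is " + size + " bytes; limit is " + maxBytes);
        }
        return Files.readAllBytes(path);
    }
}
```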
Limiting File Size
Often the file is uploaded via an HTML form.
If uploading using the Servlet @MultipartConfig annotation and request.getPart(name).getInputStream(), you have control over how much data you read from the stream. Also, request.getPart(name).getSize() returns the file size in advance, and if it's small enough, you can call request.getPart(name).write(path) to write the file to disk.
If uploading using JSF, then JSF 2.2 (very new) has the standard HTML component <h:inputFile> (javax.faces.component.html.HtmlInputFile), which has an attribute for maxLength; pre-JSF 2.2 implementations have similar custom components (e.g. Tomahawk has <t:inputFileUpload> with a maxLength attribute; PrimeFaces has <p:fileUpload> with a sizeLimit attribute).
Alternatives to Read Entire File
Your code which uses InputStream, StringBuilder, etc, is an efficient way to read the entire file, but is not necessarily the simplest way (least lines of code).
Junior/average developers could get the misapprehension that you're doing efficient stream-based processing, when you're processing the entire file - so include appropriate comments.
If you want less code, you could try one of the following:
List<String> stringList = java.nio.file.Files.readAllLines(path, charset);
or
byte[] byteContents = java.nio.file.Files.readAllBytes(path);
But they require care, or they could be inefficient in resource usage. If you use readAllLines and then concatenate the List elements into a single String, then you would consume double the memory (for the List elements + the concatenated String). Similarly, if you use readAllBytes, followed by encoding to String (new String(byteContents, charset)), then again, you're using "double" the memory. So best to process directly against List<String> or byte[], unless you limit your files to a small enough size.
Instead of readLine(), use read(), which reads a given number of chars.
In each loop iteration, check how much data has been read; if it's more than a certain amount (more than the maximum expected input), stop, return an error and log it.
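A sketch of that idea as a bounded replacement for readLine(): it reads one character at a time from a BufferedReader (so each read() is served from the buffer rather than the underlying stream) and throws as soon as a line exceeds maxLen characters. BoundedLines.readLine is a hypothetical name; like BufferedReader.readLine(), it treats "\n", "\r" and "\r\n" as line terminators.

```java
import java.io.*;

class BoundedLines {
    // Reads one line of at most maxLen characters. Returns null at end of stream and
    // throws IOException if the line is longer than maxLen. The terminator is not returned.
    static String readLine(BufferedReader br, int maxLen) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = br.read()) != -1) {
            if (c == '\n') {
                return sb.toString();
            }
            if (c == '\r') {
                br.mark(1);                   // peek: swallow the '\n' of a "\r\n" pair
                if (br.read() != '\n') {
                    br.reset();
                }
                return sb.toString();
            }
            if (sb.length() == maxLen) {
                throw new IOException("Line longer than " + maxLen + " characters");
            }
            sb.append((char) c);
        }
        return sb.length() == 0 ? null : sb.toString();
    }
}
```

To also cap the whole file, the caller can add up the lengths of the lines it has accepted and stop once a total limit is reached.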
I faced a similar issue when copying a huge binary file (which generally does not contain newline characters). Calling readLine() leads to reading the entire binary file into one single String, causing an OutOfMemoryError (Java heap space).
Here is a simple JDK alternative:
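// needs: import java.io.File; import java.io.FileInputStream; import java.io.FileOutputStream;
public static void main(String[] args) throws Exception
{
    byte[] array = new byte[1024];
    // try-with-resources closes both streams even if read() or write() throws
    try (FileInputStream fis = new FileInputStream(new File("<Path-to-input-file>"));
         FileOutputStream fos = new FileOutputStream(new File("<Path-to-output-file>")))
    {
        int length;
        // read() returns the number of bytes actually read, or -1 at end of file
        while ((length = fis.read(array)) != -1)
        {
            fos.write(array, 0, length);   // write only the bytes that were read
        }
    }
}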
Things to note:
The above example copies the file using a 1 KB buffer. However, if you are doing this copy over a network, you may want to tweak the buffer size.
If you would like to use FileChannel or libraries like Commons IO, just make sure that the implementation boils down to something like the above.
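If the copy should also refuse oversized input, the same loop can count bytes as it goes. copyLimited and maxBytes are hypothetical names; in this sketch the data up to the last accepted chunk has already been written when the limit trips, so the caller should discard the partial output.

```java
import java.io.*;

class CappedCopy {
    // Copies `in` to `out` in 8 KB chunks and fails once more than maxBytes have been read.
    // Returns the number of bytes copied.
    static long copyLimited(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int length;
        while ((length = in.read(buffer)) != -1) {
            total += length;
            if (total > maxBytes) {
                // the chunk that crossed the limit is not written
                throw new IOException("Input exceeds " + maxBytes + " bytes");
            }
            out.write(buffer, 0, length);
        }
        return total;
    }
}
```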
This worked for me without any problems.
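// Reads at most MAX_BUFFER_SIZE characters from br; anything after that is left unread
char[] charArray = new char[ MAX_BUFFER_SIZE ];
int i = 0;
int c;
// test the limit first, so that no character is read and then dropped
while (i < MAX_BUFFER_SIZE && (c = br.read()) != -1) {
    charArray[i++] = (char) c;
}
return Arrays.copyOfRange(charArray, 0, i);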
I cannot think of a solution other than Apache Commons IO FileUtils.
It's pretty simple with the FileUtils class, as the so-called DOS attack won't come directly from the top layer.
Reading and writing a file is very much simple as you can do it with just one line of code like
String content = FileUtils.readFileToString(new File(filePath), StandardCharsets.UTF_8);
You can explore more about this.
There is a class EntityUtils in Apache HttpCore. Use its toString() method to get the String from the response content.
Here is the code recommended by a Fortify scan. You can adapt it to other InputStreams, such as an HTTP request's InputStream.
InputStream zipInput = zipFile.getInputStream(zipEntry);
Reader zipReader = new InputStreamReader(zipInput);
BufferedReader br = new BufferedReader(zipReader);
StringBuilder sb = new StringBuilder();
int intC;
// read one character at a time so the line length is checked before each append
while ((intC = br.read()) != -1) {
    char c = (char) intC;
    if (c == '\n') {                 // end of line
        break;
    }
    if (sb.length() >= MAX_STR_LEN) {
        throw new IOException("Input too long");
    }
    sb.append(c);
}
String line = sb.toString();
For good or bad I have been using code like the following without any problems:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream(name);
int theSize = zipInput.available();
byte[] content = new byte[theSize];
zipInput.read(content, 0, theSize);
I have used it (this logic of obtaining the available size and reading directly to a byte buffer)
for File I/O without any issues and I used it with zip files as well.
But recently I ran into a case where zipInput.read(content, 0, theSize); actually reads 3 bytes fewer than theSize.
And since the code does not loop to check the length returned by zipInput.read(content, 0, theSize);, I read the file with the last 3 bytes missing,
and later the program cannot function properly (the file is a binary file).
Strangely enough, with different zip files of larger size, e.g. 1075 bytes (in my case the problematic zip entry is 867 bytes), the code works fine!
I understand that the logic of the code is probably not the "best" but why am I suddenly getting this problem now?
And how come if I run the program immediately with a larger zip entry it works?
Any input is highly welcome
Thanks
From the InputStream read API docs:
An attempt is made to read as many as len bytes, but a smaller number
may be read.
... and:
Returns: the total number of bytes read into the buffer, or -1 if
there is no more data because the end of the stream has been reached.
In other words, unless the read method returns -1, there is still more data available to read, but you cannot guarantee that read will read exactly the specified number of bytes. The specified number of bytes is the upper bound on the amount of data it will read.
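In practice that means looping until the requested number of bytes has arrived. A minimal sketch follows; readExactly is a hypothetical name, and DataInputStream.readFully(byte[], int, int) already implements the same loop. For a zip entry, ZipEntry.getSize() gives the uncompressed size when it is known (it returns -1 otherwise), which is a better basis for sizing the buffer than available().

```java
import java.io.*;

class ExactRead {
    // Reads exactly len bytes into buf starting at off, looping because a single
    // read() may return fewer bytes than requested. Throws EOFException if the
    // stream ends first.
    static void readExactly(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n == -1) {
                throw new EOFException(len + " bytes still expected");
            }
            off += n;
            len -= n;
        }
    }
}
```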
available() is not guaranteed to return the total number of bytes remaining until the end of the stream.
Refer to Java InputStream's available() method. It says that
Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.
An example solution for your problem can be as follows:
ZipFile aZipFile = new ZipFile(fileName);
InputStream zipInput = aZipFile.getInputStream( caImport );
int available = zipInput.available();
byte[] contentBytes = new byte[ available ];
while ( available != 0 )
{
    // read() may fill only part of the buffer, so use its return value
    int count = zipInput.read( contentBytes );
    if ( count == -1 ) break;
    // here, do whatever you want with contentBytes[0 .. count)
    available = zipInput.available();
} // while available
...
This works for sure on all sizes of input files.
The best way to do this would be as follows:
public static byte[] readZipFileToByteArray(ZipFile zipFile, ZipEntry entry)
throws IOException {
InputStream in = null;
try {
in = zipFile.getInputStream(entry);
return IOUtils.toByteArray(in);
} finally {
IOUtils.closeQuietly(in);
}
}
where the IOUtils.toByteArray(in) method keeps reading until EOF and then returns the byte array.
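On Java 9 or later, the JDK can do the same without Commons IO, because InputStream.readAllBytes() also reads until end of stream. A sketch using try-with-resources (ZipBytes and readEntry are hypothetical names):

```java
import java.io.*;
import java.util.zip.*;

class ZipBytes {
    // Reads a whole zip entry; readAllBytes() keeps reading until EOF,
    // so no available() or size guess is involved.
    static byte[] readEntry(ZipFile zipFile, ZipEntry entry) throws IOException {
        try (InputStream in = zipFile.getInputStream(entry)) {
            return in.readAllBytes();
        }
    }
}
```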