Reading another process's **unbuffered** output stream - java

I'm programming a little GUI for a file converter in java. The file converter writes its current progress to stdout. Looks like this:
Flow_1.wav: 28% complete, ratio=0,447
I wanted to illustrate this in a progress bar, so I'm reading the process' stdout like this:
ProcessBuilder builder = new ProcessBuilder("...");
builder.redirectErrorStream(true);
Process proc = builder.start();
InputStream stream = proc.getInputStream();
byte[] b = new byte[32];
int length;
while (true) {
    length = stream.read(b);
    if (length < 0) break;
    // processing data
}
Now the problem is that regardless of which byte array size I choose, the stream is read in chunks of 4 KB. So my code runs until length = stream.read(b); and then blocks for quite a while. Once the process has generated 4 KB of output, my program gets this chunk and works through it in 32-byte slices. Then it waits again for the next 4 KB.
I tried to force java to use smaller buffers like this:
BufferedInputStream stream = new BufferedInputStream(proc.getInputStream(), 32);
Or this:
BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()), 32);
But neither changed anything.
Then I found this: Process source (around line 87)
It seems that the Process class is implemented in such a way that it pipes the process's stdout to a file. So what proc.getInputStream() actually does is return a stream to a file, and this file seems to be written with a 4 KB buffer.
Does anyone know some kind of workaround for this situation? I just want to get the process' output instantly.
EDIT: As suggested by Ian Roberts, I also tried to pipe the converter's output into the stderr stream, since this stream doesn't seem to be wrapped in a BufferedInputStream. Still 4k chunks.
Another interesting thing is: I actually don't get exactly 4096 bytes, but about 5 more. I'm afraid the FileInputStream itself is buffered natively.

Looking at the code you linked to, the process's standard output stream gets wrapped in a BufferedInputStream, but its standard error remains unbuffered. So one possibility might be to execute not the converter directly, but a shell script (or the Windows equivalent, if you're on Windows) that sends the converter's stdout to stderr:
ProcessBuilder builder = new ProcessBuilder("/bin/sh", "-c",
        "exec /path/to/converter args 1>&2");
Don't redirectErrorStream, and then read from proc.getErrorStream() instead of proc.getInputStream().
It may be the case that your converter is already using stderr for its progress reporting, in which case you don't need the script bit; just turn off redirectErrorStream(). If the converter program writes to both stdout and stderr then you'll need to spawn a second thread to consume stdout as well (the script approach gets around this by sending everything to stderr), as in the sketch below.
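To make that concrete, here is a minimal sketch of the whole approach; the shell command and converter path are placeholders, and the progress parsing is left as a stub:

import java.io.IOException;
import java.io.InputStream;

public class ProgressReader {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder("/bin/sh", "-c",
                "exec /path/to/converter args 1>&2"); // placeholder command
        // Deliberately NOT calling redirectErrorStream(true): stderr must stay
        // separate, because only stdout gets wrapped in a BufferedInputStream.
        Process proc = builder.start();

        // Drain stdout on a second thread so the child can never block on a full pipe.
        Thread drainer = new Thread(() -> {
            try (InputStream out = proc.getInputStream()) {
                while (out.read() != -1) { /* discard */ }
            } catch (IOException ignored) {
            }
        });
        drainer.setDaemon(true);
        drainer.start();

        // Progress arrives on the unbuffered stderr pipe as soon as it is written.
        try (InputStream err = proc.getErrorStream()) {
            byte[] b = new byte[32];
            int length;
            while ((length = err.read(b)) != -1) {
                String chunk = new String(b, 0, length);
                System.out.print(chunk); // parse "28% complete" here and update the bar
            }
        }
        proc.waitFor();
    }
}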

Related

Why is BufferedReader readLine reading past EOF

I have a very large file (~6GB) that contains fixed-width text separated by \r\n, so I'm using a BufferedReader to read it line by line. This process can be interrupted or stopped, and if it is, it uses a checkpoint "lastProcessedLineNbr" to fast-forward to the correct place to resume reading. This is how the reader is initialized:
private void initializeBufferedReader(Integer lastProcessedLineNbr) throws IOException {
    reader = new BufferedReader(new InputStreamReader(getInputStream(), "UTF-8"));
    if (lastProcessedLineNbr == null) { lastProcessedLineNbr = 0; }
    for (int i = 0; i < lastProcessedLineNbr; i++) {
        reader.readLine();
    }
    currentLineNumber = lastProcessedLineNbr;
}
This seems to work fine, and I read and process the data in this method:
public Object readItem() throws Exception {
    if ((currentLine = reader.readLine()) == null) {
        return null;
    }
    currentLineNumber++;
    return parse(currentLine);
}
And again, everything works fine until I reach the last line in the document. readLine() in the latter method throws an error:
17:06:49,980 ERROR [org.jberet] (Batch Thread - 1) JBERET000007: Failed to run job ProdFileRead, parse, org.jberet.job.model.Chunk#3965dcc8: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
at java.lang.StringBuffer.append(StringBuffer.java:369)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.rational.batch.reader.TextLineReader.readItem(TextLineReader.java:55)
Curiously, it seems to be reading past the end of the file and allocating so much space that it runs out of memory. I tried looking at the contents of the file using Cygwin and "tail file.txt", and the console gave me the expected 10 lines. But when I did "tail file.txt > output.txt", output.txt ended up being about 1.8GB, much larger than the 10 lines I expected. So it seems Cygwin is doing the same thing. As far as I can tell there is no special EOF character; it's just the last byte of data, and the file ends abruptly.
Anyone have any idea on how I can get this working? I'm thinking I could resort to counting the number of bytes read until I get the full size of the file, but I was hoping there was a better way.
> But when I did tail file.txt > output.txt, output.txt ended up being like 1.8GB, much larger than the 10 lines I expected
What this indicates to me is that the file is padded with 1.8GB of binary zeroes, which Cygwin's tail command ignored when writing to the terminal, but which Java is not ignoring. This would explain your OutOfMemoryError as well, as the BufferedReader continued reading data looking for the next \r\n, never finding it before overflowing memory.
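If the padding really is binary zeroes, one possible workaround (a sketch, assuming the real data never contains NUL bytes; StopAtNulInputStream is a made-up helper name) is to wrap the input stream in a filter that reports end-of-file at the first zero byte, so readLine() never tries to buffer the padding:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Treats the first NUL byte as end-of-file so that a BufferedReader on top
// never tries to accumulate the zero padding into one giant "line".
class StopAtNulInputStream extends FilterInputStream {
    private boolean eof;

    StopAtNulInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (eof) return -1;
        int b = in.read();
        if (b == 0) { eof = true; return -1; }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (eof) return -1;
        int n = in.read(buf, off, len);
        for (int i = 0; i < n; i++) {
            if (buf[off + i] == 0) {      // the padding starts here
                eof = true;
                return i == 0 ? -1 : i;   // hand back only the real bytes
            }
        }
        return n;
    }
}

The reader would then be initialized with new BufferedReader(new InputStreamReader(new StopAtNulInputStream(getInputStream()), "UTF-8")).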

Java compressor not reading file completely

We have an issue unzipping bz2 files in Java, whereby the input stream thinks it's finished after reading ~3% of the file.
We would welcome any suggestions for how to decompress and read large bz2 files which have to be processed line by line.
Here are the details of what we have done so far:
For example, one bz2 file is 2.09 GB in size, and uncompressed it is 24.9 GB.
The code below reads only 343,800 of the roughly 10 million lines the file contains.
Modifying the code to decompress the bz2 into a text file (FileInputStream straight into the CompressorInputStream) results in a file of ~190 MB, irrespective of the size of the bz2 file.
I have tried setting a buffer value of 2048 bytes, but this has no effect on the outcome.
We have executed the code on Windows 64 bit and Linux/CentOS both with the same outcome.
Could the buffered reader come to an empty, "null" line and cause the code to exit the while-loop?
import org.apache.commons.compress.compressors.*;
import java.io.*;
...
CompressorInputStream is = new CompressorStreamFactory()
        .createCompressorInputStream(
                new BufferedInputStream(
                        new FileInputStream(filePath)));
lineNumber = 0;
line = "";
br = new BufferedReader(new InputStreamReader(is));
while ((line = br.readLine()) != null) {
    this.processLine(line, ++lineNumber);
}
Even this code, which forces an exception when the end of the stream is reached, has exactly the same result:
byte[] buffer = new byte[1024];
while (true) {
    // is.read() returns -1 at end of stream, which makes write() throw
    // IndexOutOfBoundsException; that is the forced exception.
    out.write(buffer, 0, is.read(buffer));
    out.flush();
}
There is nothing obviously wrong with your code; it should work. This means the problem must be elsewhere.
Try to enable logging (i.e., print the lines as you process them). Make sure there are no gaps in the input (maybe write the lines to a new file and do a diff). Use bzip2 --test to make sure the input file isn't corrupt. Check whether it always fails at the same line (maybe the input contains odd characters or binary data?).
The issue lies with the bz2 files: they were created using a version of Hadoop which includes bad block headers inside the files.
Current Java solutions stumble over this, while others ignore it or handle it somehow.
Will look for a solution/workaround.
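For what it's worth, Hadoop commonly produces multi-stream bz2 files, and by default Commons Compress stops at the end of the first stream, which would match the "finished after ~3%" symptom. A sketch of a possible workaround, assuming that is the cause, using BZip2CompressorInputStream's decompressConcatenated flag:

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

public class Bzip2LineCount {
    public static void main(String[] args) throws IOException {
        String filePath = args[0];
        // 'true' = decompressConcatenated: keep reading after the first
        // bz2 stream ends instead of reporting EOF there.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new BZip2CompressorInputStream(
                        new BufferedInputStream(new FileInputStream(filePath)),
                        true)))) {
            long lineNumber = 0;
            while (br.readLine() != null) {
                lineNumber++;
            }
            System.out.println(lineNumber + " lines");
        }
    }
}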

How to read Java InputStream without delay?

I have a C++ program (feeder.exe) that prints some data:
printf("%s\n", line);
On average it produces 20-30 lines per second, but not uniformly.
I want to catch this data in Java program by running the exe from Java code:
package temp_read;
import java.io.*;

public class Main {
    public static void main(String[] args) throws Throwable {
        Process p = Runtime.getRuntime().exec("d:/feeder.exe");
        InputStream is = p.getInputStream();
        BufferedReader in = new BufferedReader(new InputStreamReader(is));
        String line = null;
        while ((line = in.readLine()) != null) {
            System.out.println(System.currentTimeMillis() + "," + line);
        }
    }
}
But when I look at the output, I see that it receives a batch of lines once every 3-5 seconds.
Question: how to receive the data from feeder.exe immediately without any delay when it prints to stdout?
PS (an unrelated question): how do I stop feeder.exe if I stop the Java program with Ctrl+C?
If redirected, stdout is probably buffered, meaning that the problem is in the C++ code, not on the Java side. The C++ process buffers its output and flushes several printf calls at once, as soon as the buffer is full.
If you are able to modify the C++ software, try to do a fflush(stdout); after the printf to force the output buffer to be flushed.
The most likely cause is that the feeder.exe is not flushing its output stream regularly, and the data is sitting in its output buffer until the buffer fills and is written as a result.
If that is the cause, there is nothing you can do on the Java side to avoid this. The problem can only be fixed by modifying the feeder application.
Note that if the data was in the "pipe" that connected the two processes, then reading on the Java side would get it. Assuming that the end-of-line had been written to the pipe, the readLine() call would deliver the line without blocking.
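As for the PS: one common approach (a sketch; shutdown hooks do run on Ctrl+C, but not on a hard kill of the JVM) is to register a hook that destroys the child process:

Process p = Runtime.getRuntime().exec("d:/feeder.exe");
// Kill feeder.exe when the JVM shuts down, e.g. after Ctrl+C.
Runtime.getRuntime().addShutdownHook(new Thread(p::destroy));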

Realtime read of Runtime.getRuntime().exec()

My problem is with a program that produces its output in real time.
For example: getevent
But when I try to read the data coming from the process, exec delivers it in chunks of at least 4096 bytes!
For example:
if getevent has produced 1000 bytes of text: stdout.available() == 0
if getevent has produced 4000 bytes of text: stdout.available() == 0
if getevent has produced 4096 bytes of text: stdout.available() == 4096
if getevent has produced 8192 bytes of text: stdout.available() == 8192
if getevent has produced 10000 bytes of text: stdout.available() == 8192
If I use stdout.read(), the call blocks until 4096*n bytes have accumulated or until getevent exits.
How do I read the data as it arrives, instead of waiting until 4096 bytes have been collected?
Process p = Runtime.getRuntime().exec(new String[]{"su", "-c", "system/bin/sh"});
DataOutputStream stdin = new DataOutputStream(p.getOutputStream());
stdin.writeBytes("getevent\n");
stdin.flush(); // the pipe to the child is buffered, so push the command through
InputStream stdout = p.getInputStream();
byte[] buffer = new byte[1];
int read;
String out = new String();
// stop at end of stream instead of passing read == -1 to new String(...)
while ((read = stdout.read(buffer)) != -1) {
    out += new String(buffer, 0, read);
    System.out.println("MYLOG: " + new String(buffer, 0, read));
}
I found this in the documentation!
> Copies the InputStream into the OutputStream, until the end of the stream has been reached. This method uses a buffer of 4096 kbyte.
>> Documentation
The most likely cause of this is that the external application is buffering its output. This is pretty typical for an application that is writing to its "standard output". The solution is to modify the external application so that it "flushes" its output at the appropriate time.
There is nothing in your Java code that will cause it to delay if there is data that is available to be read. In particular, use of a DataOutputStream won't cause this.
It should also be noted that available() does not give reliable information. If you read the API documentation carefully, you will see that a return value of N only means that a simultaneous attempt to read more than N bytes might block. A thread cannot call both available() and read() simultaneously, so by the time you come to use the information supplied by available() it could be out of date.
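To illustrate the last point: a plain blocking read loop needs no available() at all, because read() returns as soon as the pipe contains any bytes (a sketch, reusing the question's stdout stream):

InputStream stdout = p.getInputStream(); // same stream as in the question
byte[] buffer = new byte[4096];
int read;
// read() blocks only until SOME bytes are in the pipe, then returns them
// all at once, up to the buffer size; no polling of available() needed.
while ((read = stdout.read(buffer)) != -1) {
    System.out.print(new String(buffer, 0, read));
}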

Sending images through a socket using Qt and reading them using Java

I'm trying to send an image from a Qt server through a socket and display it in a Java client. Until now I have only transferred strings to communicate on both sides, and I have tried different examples for sending images, but with no results.
The code I used to transfer the image in qt is:
QImage image;
image.load("../punton.png");
qDebug()<<"Image loaded";
QByteArray ban; // Construct a QByteArray object
QBuffer buffer(&ban); // Construct a QBuffer object using the QByteArray
image.save(&buffer, "PNG"); // Save the QImage data into the QBuffer
socket->write(ban);
In the other end the code to read in Java is:
BufferedInputStream in = new BufferedInputStream(socket.getInputStream(), 1);
File f = new File("C:\\Users\\CLOUDMOTO\\Desktop\\JAVA\\image.png");
System.out.println("Receiving...");
FileOutputStream fout = new FileOutputStream(f);
byte[] by = new byte[1];
for (int len; (len = in.read(by)) > 0;) {
    fout.write(by, 0, len);
    System.out.println("Done!");
}
The Java process gets stuck until I close the Qt server, and after that the generated file is corrupt.
I'll appreciate any help, because I need to get this working and I'm new to programming in both languages.
Also, I've used the following commands, and the receiving process now ends and shows a message, but the file is still corrupt:
socket->write(ban+"-1");
socket->close();
in Qt, and in Java:
System.out.println(by);
String received = new String(by, 0, by.length, "ISO8859_1");
System.out.println(received);
System.out.println("Done!");
You cannot transport a file over a socket in such a simple way. You are not giving the receiver any clue about how many bytes are coming. Read the javadoc for InputStream.read() carefully: your receiver is in an endless loop because it keeps waiting for the next byte until the stream is closed, which is why calling socket->close() at the sender side partially fixed it. Ideally, you should write the length of ban into the socket before the buffer itself, read that length at the receiver side, and then receive only that number of bytes. Also flush and close the receiver stream before trying to read the received file.
I have absolutely no idea what you wanted to achieve with socket->write(ban+"-1"). Your logged output starts with %PNG, which is correct, but I can see "-1" at the end, which means you appended characters to the binary image data and thereby corrupted it. Why?
And no, a 1x1 PNG does not have a size of 1 byte. It does not even have 4 bytes (red, green, blue, alpha); a PNG needs things like a header and control checksums. Have a look at the size of the file on the filesystem: that is the number of bytes you actually need to receive.
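To illustrate the length-prefix idea on the Java side, here is a sketch, assuming the Qt sender first writes the image size as a 4-byte big-endian integer (QDataStream's default byte order, which matches DataInputStream); host, port, and file name are placeholders:

import java.io.DataInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.Socket;

public class ImageReceiver {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("localhost", 12345)) { // placeholder host/port
            DataInputStream in = new DataInputStream(socket.getInputStream());
            int size = in.readInt();        // 4-byte big-endian length from the sender
            byte[] data = new byte[size];
            in.readFully(data);             // blocks until exactly 'size' bytes arrive
            try (FileOutputStream fout = new FileOutputStream("image.png")) {
                fout.write(data);
            }
            System.out.println("Done!");
        }
    }
}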
