In Java I need to put content from an OutputStream (I fill that stream with data myself) into a ByteBuffer. How can I do it in a simple way?
You can create a ByteArrayOutputStream and write to it, and extract the contents as a byte[] using toByteArray(). Then ByteBuffer.wrap(byte []) will create a ByteBuffer with the contents of the output byte array.
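A minimal sketch of that approach:

ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] data = "some data".getBytes();
out.write(data, 0, data.length);        // fill the stream yourself
// toByteArray() copies the internal buffer into a fresh byte[]
ByteBuffer buffer = ByteBuffer.wrap(out.toByteArray());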
There is a more efficient variant of @DJClayworth's answer.
As @seh correctly noticed, ByteArrayOutputStream.toByteArray() returns a copy of the backing byte[] object, which may be inefficient. However, the backing byte[] object as well as the count of the bytes are both protected members of the ByteArrayOutputStream class. Hence, you can create your own variant of ByteArrayOutputStream that exposes them directly:
public class MyByteArrayOutputStream extends ByteArrayOutputStream {

    public MyByteArrayOutputStream() {
    }

    public MyByteArrayOutputStream(int size) {
        super(size);
    }

    public int getCount() {
        return count;
    }

    public byte[] getBuf() {
        return buf;
    }
}
Using this class is easy:
MyByteArrayOutputStream out = new MyByteArrayOutputStream();
fillTheOutputStream(out);
return new ByteArrayInputStream(out.getBuf(), 0, out.getCount());
As a result, once all the output is written, the same backing buffer is used as the basis of an input stream, with no copy.
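The same accessors also let you wrap the backing array straight into the ByteBuffer the question asks for, again without copying:

ByteBuffer bb = ByteBuffer.wrap(out.getBuf(), 0, out.getCount());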
Though the above-mentioned answers solve your problem, none of them is as efficient as you might expect from NIO. ByteArrayOutputStream or MyByteArrayOutputStream first writes the data into Java heap memory and then copies it to the ByteBuffer, which greatly affects performance.
An efficient implementation would be writing a ByteBufferOutputStream class yourself. Actually, it's quite easy to do: you only have to provide a write() method. See this link for a ByteBufferInputStream.
// access the protected members buf and count from the extending class
class ByteArrayOutputStream2ByteBuffer extends ByteArrayOutputStream {

    public ByteBuffer toByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }
}
Try using PipedOutputStream instead of OutputStream. You can then connect a PipedInputStream to read the data back out of the PipedOutputStream.
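A minimal sketch of the piped approach (the payload is illustrative; for anything large, the writer should run on its own thread, because the pipe's internal buffer is small and a full pipe blocks the writer):

PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out); // connect the pair

new Thread(() -> {
    try {
        out.write("some data".getBytes()); // your code fills the stream
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}).start();

byte[] chunk = new byte[4096];
int n = in.read(chunk);                    // read the data back out
ByteBuffer bb = ByteBuffer.wrap(chunk, 0, n);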
You say you're writing to this stream yourself? If so, maybe you could implement your own ByteBufferOutputStream and plug n' play.
The base class would look like so:
public class ByteBufferOutputStream extends OutputStream {

    //protected WritableByteChannel wbc; //if you need to write directly to a channel
    protected static int bs = 2 * 1024 * 1024; //2MB buffer size, change as needed
    protected ByteBuffer bb = ByteBuffer.allocate(bs);

    public ByteBufferOutputStream(...) {
        //wbc = ... //again for writing to a channel
    }

    @Override
    public void write(int i) throws IOException {
        if (!bb.hasRemaining()) flush();
        bb.put((byte) i);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // note: assumes len never exceeds the buffer capacity
        if (bb.remaining() < len) flush();
        bb.put(b, off, len);
    }

    /* do something with the buffer when it's full (perhaps write to channel?)
    @Override
    public void flush() throws IOException {
        bb.flip();
        wbc.write(bb);
        bb.clear();
    }

    @Override
    public void close() throws IOException {
        flush();
        wbc.close();
    }
    */
}
See this efficient implementation of a ByteBuffer-backed OutputStream with dynamic re-allocation:
/**
 * Wraps a {@link ByteBuffer} so it can be used like an {@link OutputStream}. This is similar to a
 * {@link java.io.ByteArrayOutputStream}, just that this uses a {@code ByteBuffer} instead of a
 * {@code byte[]} as internal storage.
 */
public class ByteBufferOutputStream extends OutputStream {

    private ByteBuffer wrappedBuffer;
    private final boolean autoEnlarge;
https://gist.github.com/hoijui/7fe8a6d31b20ae7af945
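The gist has the complete class; the core of the dynamic re-allocation looks roughly like this (a sketch based on the description above, not the gist's exact code):

// grow the internal ByteBuffer when a write would overflow it
private void growTo(int minCapacity) {
    int newCapacity = Math.max(wrappedBuffer.capacity() * 2, minCapacity);
    ByteBuffer newBuffer = ByteBuffer.allocate(newCapacity);
    wrappedBuffer.flip();         // switch the old buffer to read mode
    newBuffer.put(wrappedBuffer); // copy the existing content
    wrappedBuffer = newBuffer;
}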
Related
I have created a shared memory segment (of size 200MB) which is mapped into both a Java process and a C++ process running on the system. The C++ process writes 50MB of data into this shared memory. On the Java side, a JNI function which has mapped the same shared memory reads this data into a direct buffer like this:
JNIEXPORT jobject JNICALL Java_service_SharedMemoryJNIService_getDirectByteBuffer
  (JNIEnv *env, jclass clazz, jlong buf_addr, jint buf_len) {
    return env->NewDirectByteBuffer((void *)buf_addr, buf_len);
}
Now, on the Java side, I need to upload this 50MB of data to S3. Currently, I have to copy this direct buffer into a buffer on the JVM heap, like this:
public String uploadByteBuffer(String container, String objectKey, ByteBuffer bb) {
    BlobStoreContext context = getBlobStoreContext();
    BlobStore blobStore = context.getBlobStore();

    byte[] buf = new byte[bb.capacity()];
    bb.get(buf);

    ByteArrayPayload payload = new ByteArrayPayload(buf);

    Blob blob = blobStore.blobBuilder(objectKey)
            .payload(payload)
            .contentLength(bb.capacity())
            .build();
    blobStore.putBlob(container, blob);
    return objectKey;
}
I want to avoid this extra copy from shared memory to the JVM heap. Is there a way to directly upload data contained in a direct buffer to S3?
Thanks
BlobBuilder.payload can take a ByteSource and you can use a ByteBuffer wrapper:
public class ByteBufferByteSource extends ByteSource {

    private final ByteBuffer buffer;

    public ByteBufferByteSource(ByteBuffer buffer) {
        this.buffer = checkNotNull(buffer);
    }

    @Override
    public InputStream openStream() {
        return new ByteBufferInputStream(buffer);
    }

    private static final class ByteBufferInputStream extends InputStream {

        private final ByteBuffer buffer;
        private boolean closed = false;

        ByteBufferInputStream(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        @Override
        public synchronized int read() throws IOException {
            if (closed) {
                throw new IOException("Stream already closed");
            }
            try {
                // mask to an unsigned value: read() must return 0-255, or -1 at EOF
                return buffer.get() & 0xFF;
            } catch (BufferUnderflowException bue) {
                return -1;
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            closed = true;
        }
    }
}
You will want to override read(byte[], int, int) for efficiency. I also proposed this pull request to jclouds: https://github.com/apache/jclouds/pull/158, which you can improve on.
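A sketch of that bulk-read override for the inner ByteBufferInputStream above:

@Override
public synchronized int read(byte[] b, int off, int len) throws IOException {
    if (closed) {
        throw new IOException("Stream already closed");
    }
    if (!buffer.hasRemaining()) {
        return -1; // EOF
    }
    int n = Math.min(len, buffer.remaining());
    buffer.get(b, off, n); // one bulk copy instead of a byte-at-a-time loop
    return n;
}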
I am wondering if BufferedOutputStream offers any way to provide the count of bytes it has written. I am porting code from C# to Java; the C# code uses Stream.Position to obtain the count of written bytes.
Could anyone shed some light on this? It is not a huge deal, because I can easily add a few lines of code to track the count, but it would be nice if BufferedOutputStream already had this functionality.
For text there is LineNumberReader, but there is nothing that counts the progress of an OutputStream. You can add that with a wrapper class, a FilterOutputStream.
public class CountingOutputStream extends FilterOutputStream {

    private long count;
    private int bufferCount;

    public CountingOutputStream(OutputStream out) {
        super(out);
    }

    public long written() {
        return count;
    }

    public long willBeWritten() {
        return count + bufferCount;
    }

    @Override
    public void flush() throws IOException {
        count += bufferCount;
        bufferCount = 0;
        super.flush();
    }

    @Override
    public void write(int b) throws IOException {
        ++bufferCount;
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        bufferCount += len;
        // write through to the wrapped stream directly: FilterOutputStream's own
        // write(byte[], int, int) would call write(int) per byte and double-count
        out.write(b, off, len);
    }
}
One could also consider using a MappedByteBuffer (a memory-mapped file) for better speed/memory behavior, created from a RandomAccessFile or FileChannel.
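For example, a minimal mapping sketch (the file name and region size are illustrative):

try (RandomAccessFile raf = new RandomAccessFile("data.bin", "rw");
     FileChannel channel = raf.getChannel()) {
    // map the first 1MB of the file directly into memory
    MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
    mapped.put((byte) 42); // writes go straight to the mapped region
}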
If all this is too involved, use the Files class, which has many utilities such as copy. It uses Path, a generalisation of the (disk I/O) File, and works for files from the internet, files inside zip files, classpath resources, and so on.
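For instance (paths are illustrative):

Path source = Paths.get("input.bin");
Path target = Paths.get("output.bin");
Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);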
This is a followup to anonymous file streams reusing descriptors
As per my previous question, I can't depend on code like this (happens to work in JDK8, for now):
RandomAccessFile r = new RandomAccessFile(...);

FileInputStream f_1 = new FileInputStream(r.getFD());
// some io, not shown
f_1 = null;

FileInputStream f_2 = new FileInputStream(r.getFD());
// some io, not shown
f_2 = null;

FileInputStream f_3 = new FileInputStream(r.getFD());
// some io, not shown
f_3 = null;
However, to prevent accidental errors and as a form of self-documentation, I would like to invalidate each file stream after I'm done using it - without closing the underlying file descriptor.
Each FileInputStream is meant to be independent, with positioning controlled by the RandomAccessFile. I share the same FileDescriptor to prevent any race conditions arising from opening the same path multiple times. When I'm done with one FileInputStream, I want to invalidate it so as to make it impossible to accidentally read from it while using the second FileInputStream (which would cause the second FileInputStream to skip data).
How can I do this?
notes:
the libraries I use require compatibility with java.io.*
if you suggest a library (I prefer built-in Java semantics if at all possible), it must be commonly available (packaged) for Linux (the main target) and usable on Windows (experimental target)
but Windows support isn't absolutely required
Edit: in response to a comment, here is my workflow:
RandomAccessFile r = new RandomAccessFile(path, "r");

int header_read;
int header_remaining = 4; // header length, initially
byte[] ba = new byte[header_remaining];
ByteBuffer bb = ByteBuffer.allocate(header_remaining);

while ((header_read = r.read(ba, 0, header_remaining)) > 0) {
    header_remaining -= header_read;
    bb.put(ba, 0, header_read);
}
byte[] header = bb.array();
// process header, not shown
// the RandomAccessFile above reads only a small amount, so buffering isn't required

r.seek(0);
FileInputStream f_1 = new FileInputStream(r.getFD());
Library1Result result1 = library1.Main.entry_point(f_1);
// process result1, not shown
// Library1 reads the InputStream in large chunks, so buffering isn't required
// invalidate f_1 (this question)

r.seek(0);
byte[] buffer = new byte[4096];
int read;
while ((read = r.read(buffer)) > 0 && library1.continue()) {
    library2.process(buffer, read);
}
// the RandomAccessFile above is read in large chunks, so buffering isn't required
// in a previous edit the RandomAccessFile was used to create a FileInputStream. Obviously that's not required, so ignore

r.seek(0);
Reader r_1 = new BufferedReader(new InputStreamReader(new FileInputStream(r.getFD())));
Library3Result result3 = library3.Main.entry_point(r_1);
// process result3, not shown
// I'm not sure how Library3 uses the reader, so I'm providing buffering
// invalidate r_1 (this question) - bonus: frees the buffer

r.seek(0);
FileInputStream f_2 = new FileInputStream(r.getFD());
result1 = library1.Main.entry_point(f_2);
// process result1 (reassigned), not shown
// Yes, I actually have to call 'library1.Main.entry_point' *again* - same comments apply as from before
// invalidate f_2 (this question)

//
// I've been told to be careful when opening multiple streams from the same
// descriptor if one is buffered. This is very vague. I assume because I only
// ever use any stream once and exclusively, this code is safe.
//
A pure Java solution might be to create a forwarding decorator that checks on each method call whether the stream is validated or not. For InputStream this decorator may look like this:
public final class CheckedInputStream extends InputStream {

    final InputStream delegate;
    boolean validated;

    public CheckedInputStream(InputStream stream) {
        delegate = stream;
        validated = true;
    }

    public void invalidate() {
        validated = false;
    }

    void checkValidated() {
        if (!validated) {
            throw new IllegalStateException("Stream is invalidated.");
        }
    }

    @Override
    public int read() throws IOException {
        checkValidated();
        return delegate.read();
    }

    @Override
    public int read(byte b[]) throws IOException {
        checkValidated();
        return read(b, 0, b.length);
    }

    @Override
    public int read(byte b[], int off, int len) throws IOException {
        checkValidated();
        return delegate.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        checkValidated();
        return delegate.skip(n);
    }

    @Override
    public int available() throws IOException {
        checkValidated();
        return delegate.available();
    }

    @Override
    public void close() throws IOException {
        checkValidated();
        delegate.close();
    }

    @Override
    public synchronized void mark(int readlimit) {
        checkValidated();
        delegate.mark(readlimit);
    }

    @Override
    public synchronized void reset() throws IOException {
        checkValidated();
        delegate.reset();
    }

    @Override
    public boolean markSupported() {
        checkValidated();
        return delegate.markSupported();
    }
}
You can use it like:
CheckedInputStream f_1 = new CheckedInputStream(new FileInputStream(r.getFD()));
// some io, not shown
f_1.invalidate();
f_1.read(); // throws IllegalStateException
Under Unix you could generally avoid such problems by dup'ing the file descriptor.
Since Java does not offer such a feature, one option would be a native library which exposes it; jnr-posix does that, for example. On the other hand, jnr depends on a lot more JDK implementation properties than your original question does.
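A rough sketch of the jnr-posix route; note that extracting the raw int from a java.io.FileDescriptor requires reflection on its private fd field, which is exactly the kind of JDK implementation detail mentioned above:

POSIX posix = POSIXFactory.getPOSIX();

// pull the raw int out of the FileDescriptor (non-portable, JDK-internal)
Field fdField = FileDescriptor.class.getDeclaredField("fd");
fdField.setAccessible(true);
int rawFd = fdField.getInt(r.getFD());

// dup() gives an independent descriptor referring to the same open file
int dupFd = posix.dup(rawFd);
// wrapping dupFd back into a FileInputStream again needs reflection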
I have got an OutputStream which can be initialized as a chain of OutputStreams. There could be any level of chaining. The only thing guaranteed is that at the end of the chain is a FileOutputStream.
I need to recreate this chained OutputStream with a modified file name in the FileOutputStream. This would have been possible if the out variable (which stores the underlying chained OutputStream) were accessible, as shown below.
public OutputStream recreateChainedOutputStream(OutputStream os) throws IOException {
    if (os instanceof FileOutputStream) {
        return new FileOutputStream("somemodified.filename");
    } else if (os instanceof FilterOutputStream) {
        // does not compile: FilterOutputStream.out is protected
        return recreateChainedOutputStream(os.out);
    }
}
Is there any other way of achieving the same?
You can use reflection to access the os.out field of the FilterOutputStream; this has however some drawbacks:
If the other OutputStream is also a kind of RolloverOutputStream, you can have a hard time reconstructing it,
If the other OutputStream has custom settings, like a GZip compression parameter, you cannot reliably read these,
If there is a
A quick and dirty implementation of recreateChainedOutputStream might be:
private static final Field OUT_FIELD;

static {
    try {
        OUT_FIELD = FilterOutputStream.class.getDeclaredField("out");
        OUT_FIELD.setAccessible(true);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

public OutputStream recreateChainedOutputStream(OutputStream os) throws IOException {
    if (os instanceof FilterOutputStream) {
        try {
            Constructor<?> con = os.getClass().getConstructor(OutputStream.class);
            return (OutputStream) con.newInstance(
                    recreateChainedOutputStream((OutputStream) OUT_FIELD.get(os)));
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
    } else {
        // other output streams, e.g. the FileOutputStream at the end of the chain...
        return new FileOutputStream("somemodified.filename");
    }
}
While this may be OK in your current application, this is a big no-no in the production world because of the large number of different kinds of OutputStreams your application may receive.
A better way to solve this would be a kind of Function<String, OutputStream> that works as a factory to create OutputStreams for the named file. This way the external API keeps its control over the OutputStreams while your API can address multiple file names. An example of this would be:
public class MyApi {

    private final Function<String, OutputStream> fileProvider;
    private OutputStream current;

    public MyApi(Function<String, OutputStream> fileProvider, String defaultFile) throws IOException {
        this.fileProvider = fileProvider;
        selectNewOutputFile(defaultFile);
    }

    public void selectNewOutputFile(String name) throws IOException {
        OutputStream old = this.current;
        this.current = fileProvider.apply(name);
        if (old != null) old.close(); // close() can throw, hence the throws clause
    }
}
This can then be used in other applications as:
MyApi api = new MyApi(name -> new FileOutputStream(name), defaultFile);
for simple FileOutputStreams, or as:
MyApi api = new MyApi(name ->
        new GZIPOutputStream(
                new CipherOutputStream(
                        new CheckedOutputStream(
                                new FileOutputStream(name),
                                new CRC32()),
                        cipher),
                1024,
                true),
        defaultFile);
for a file stream that is checksummed using new CRC32(), encrypted using cipher, and gzipped with a 1024-byte buffer in sync-flush mode.
So, I'm feeding file data to an API that takes a Reader, and I'd like a way to report progress.
It seems like it should be straightforward to write a FilterInputStream implementation that wraps the FileInputStream, keeps track of the number of bytes read vs. the total file size, and fires some event (or, calls some update() method) to report fractional progress.
(Alternatively, it could report absolute bytes read, and somebody else could do the math -- maybe more generally useful in the case of other streaming situations.)
I know I've seen this before and I may even have done it before, but I can't find the code and I'm lazy. Has anyone got it laying around? Or can someone suggest a better approach?
One year (and a bit) later...
I implemented a solution based on Adamski's answer below, and it worked, but after some months of usage I wouldn't recommend it. When you have a lot of updates, firing/handling unnecessary progress events becomes a huge cost. The basic counting mechanism is fine, but much better to have whoever cares about the progress poll for it, rather than pushing it to them.
(If you know the total size, you can try only firing an event every > 1% change or whatever, but it's not really worth the trouble. And often, you don't.)
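A minimal sketch of the polling variant (names are illustrative): the stream only maintains a counter, and whoever cares about progress reads it on its own schedule.

public class PollableCountingInputStream extends FilterInputStream {

    private final AtomicLong bytesRead = new AtomicLong();

    public PollableCountingInputStream(InputStream in) {
        super(in);
    }

    // called by the progress UI on a timer; the stream never pushes events
    public long bytesRead() {
        return bytesRead.get();
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) bytesRead.incrementAndGet();
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) bytesRead.addAndGet(n);
        return n;
    }
    // read(byte[]) is deliberately not overridden; FilterInputStream delegates
    // it to read(byte[], int, int), which would double-count otherwise
}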
Here's a fairly basic implementation that fires PropertyChangeEvents when additional bytes are read. Some caveats:
The class does not support mark or reset operations, although these would be easy to add.
The class does not check whether the total number of bytes read ever exceeds the maximum number of bytes anticipated, although this could always be dealt with by client code when displaying progress.
I haven't tested the code.
Code:
public class ProgressInputStream extends FilterInputStream {

    private final PropertyChangeSupport propertyChangeSupport;
    private final long maxNumBytes;
    private volatile long totalNumBytesRead;

    public ProgressInputStream(InputStream in, long maxNumBytes) {
        super(in);
        this.propertyChangeSupport = new PropertyChangeSupport(this);
        this.maxNumBytes = maxNumBytes;
    }

    public long getMaxNumBytes() {
        return maxNumBytes;
    }

    public long getTotalNumBytesRead() {
        return totalNumBytesRead;
    }

    public void addPropertyChangeListener(PropertyChangeListener l) {
        propertyChangeSupport.addPropertyChangeListener(l);
    }

    public void removePropertyChangeListener(PropertyChangeListener l) {
        propertyChangeSupport.removePropertyChangeListener(l);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) updateProgress(1); // don't count the EOF marker
        return b;
    }

    @Override
    public int read(byte[] b) throws IOException {
        return (int) updateProgress(super.read(b));
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return (int) updateProgress(super.read(b, off, len));
    }

    @Override
    public long skip(long n) throws IOException {
        return updateProgress(super.skip(n));
    }

    @Override
    public void mark(int readlimit) {
        throw new UnsupportedOperationException();
    }

    @Override
    public void reset() throws IOException {
        throw new UnsupportedOperationException();
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    private long updateProgress(long numBytesRead) {
        if (numBytesRead > 0) {
            long oldTotalNumBytesRead = this.totalNumBytesRead;
            this.totalNumBytesRead += numBytesRead;
            propertyChangeSupport.firePropertyChange("totalNumBytesRead",
                    oldTotalNumBytesRead, this.totalNumBytesRead);
        }
        return numBytesRead;
    }
}
Guava's com.google.common.io package can help you a little. The following is uncompiled and untested but should put you on the right track.
final long total = file1.length();
final long[] progress = {0}; // array holder so the anonymous class can mutate it
final OutputStream out = new FileOutputStream(file2);
boolean success = false;
try {
    ByteStreams.readBytes(Files.newInputStreamSupplier(file1),
        new ByteProcessor<Void>() {
            public boolean processBytes(byte[] buffer, int offset, int length)
                    throws IOException {
                out.write(buffer, offset, length);
                progress[0] += length;
                updateProgressBar((double) progress[0] / total);
                // or only update it periodically, if you prefer
                return true; // keep reading
            }

            public Void getResult() {
                return null;
            }
        });
    success = true;
} finally {
    Closeables.close(out, !success);
}
This may look like a lot of code, but I believe it's the least you'll get away with. (note other answers to this question don't give complete code examples, so it's hard to compare them that way.)
The answer by Adamski works, but there is a small bug: the overridden read(byte[] b) method calls the read(byte[] b, int off, int len) method through the super class.
So updateProgress(long numBytesRead) is called twice for every read action, and you end up with a totalNumBytesRead that is twice the size of the file once the whole file has been read.
Not overriding the read(byte[] b) method solves the problem.
If you’re building a GUI application there’s always ProgressMonitorInputStream. If there’s no GUI involved wrapping an InputStream in the way you describe is a no-brainer and takes less time than posting a question here.
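For reference, a minimal ProgressMonitorInputStream usage sketch (the parent component, message, and file name are illustrative):

InputStream in = new ProgressMonitorInputStream(
        parentComponent,          // any Swing component, or null
        "Reading data.bin...",
        new FileInputStream("data.bin"));
// read from 'in' as usual; Swing shows a progress dialog if reading is slow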
To complete the answer given by @Kevin Bourillion, it can be applied to network content as well using this technique (which prevents reading the stream twice: once for the size and once for the content):
final HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
InputSupplier<InputStream> supplier = new InputSupplier<InputStream>() {
    public InputStream getInput() throws IOException {
        return httpURLConnection.getInputStream();
    }
};
long total = httpURLConnection.getContentLength();
final ByteArrayOutputStream bos = new ByteArrayOutputStream();
ByteStreams.readBytes(supplier, new ProgressByteProcessor(bos, total));
Where ProgressByteProcessor is an inner class :
public class ProgressByteProcessor implements ByteProcessor<Void> {

    private final OutputStream bos;
    private final long total;
    private long progress;

    public ProgressByteProcessor(OutputStream bos, long total) {
        this.bos = bos;
        this.total = total;
    }

    public boolean processBytes(byte[] buffer, int offset, int length) throws IOException {
        bos.write(buffer, offset, length);
        progress += length; // length is already the byte count for this chunk
        publishProgress((float) progress / total);
        return true;
    }

    public Void getResult() {
        return null;
    }
}