For the life of me, I haven't been able to find a question that matches what I'm trying to do, so I'll explain what my use-case is here. If you know of a topic that already covers the answer to this, please feel free to direct me to that one. :)
I have a piece of code that uploads a file to Amazon S3 periodically (every 20 seconds). The file is a log file being written by another process, so this function is effectively a means of tailing the log so that someone can read its contents in semi-real-time without having to have direct access to the machine that the log resides on.
Up until recently, I've simply been using the S3 PutObject method (using a File as input) to do this upload. But in AWS SDK 1.9, this no longer works because the S3 client rejects the request if the content size actually uploaded is greater than the content-length that was promised at the start of the upload. This method reads the size of the file before it starts streaming the data, so given the nature of this application, the file is very likely to have increased in size between that point and the end of the stream. This means that I need to now ensure I only send N bytes of data regardless of how big the file is.
I don't have any need to interpret the bytes in the file in any way, so I'm not concerned about encoding. I can transfer it byte-for-byte. Basically, what I want is a simple method where I can read the file up to the Nth byte, then have it terminate the read even if there's more data in the file past that point. (In other words, insert EOF into the stream at a specific point.)
For example, if my file is 10000 bytes long when I start the upload, but grows to 12000 bytes during the upload, I want to stop uploading at 10000 bytes regardless of that size change. (On a subsequent upload, I would then upload the 12000 bytes or more.)
I haven't found a pre-made way to do this - the best I've found so far appears to be IOUtils.copyLarge(InputStream, OutputStream, offset, length), which can be told to copy a maximum of "length" bytes to the provided OutputStream. However, copyLarge is a blocking method, as is PutObject (which presumably calls a form of read() on its InputStream), so it seems that I couldn't get that to work at all.
I haven't found any methods or pre-built streams that can do this, so it's making me think I'd need to write my own implementation that directly monitors how many bytes have been read. That would probably then work like a BufferedInputStream where the number of bytes read per batch is the lesser of the buffer size or the remaining bytes to be read. (e.g. with a buffer size of 3000 bytes, I'd do three batches at 3000 bytes each, followed by a batch with 1000 bytes + EOF.)
Does anyone know a better way to do this? Thanks.
EDIT Just to clarify, I'm already aware of a couple alternatives, neither of which are ideal:
(1) I could lock the file while uploading it. Doing this would cause loss of data or operational problems in the process that's writing the file.
(2) I could create a local copy of the file before uploading it. This could be very inefficient and take up a lot of unnecessary disk space (this file can grow into the several-gigabyte range, and the machine it's running on may not have that much disk space to spare).
EDIT 2: My final solution, based on a suggestion from a coworker, looks like this:
private void uploadLogFile(final File logFile) {
if (logFile.exists()) {
long byteLength = logFile.length();
try (
FileInputStream fileStream = new FileInputStream(logFile);
InputStream limitStream = ByteStreams.limit(fileStream, byteLength);
) {
ObjectMetadata md = new ObjectMetadata();
md.setContentLength(byteLength);
// Set other metadata as appropriate.
PutObjectRequest req = new PutObjectRequest(bucket, key, limitStream, md);
s3Client.putObject(req);
} // plus exception handling
}
}
LimitInputStream was what my coworker suggested, apparently not aware that it had been deprecated. ByteStreams.limit is the current Guava replacement, and it does what I want. Thanks, everyone.
It is relatively straightforward to wrap an InputStream so as to cap the number of bytes it will deliver before signaling end-of-data. FilterInputStream is targeted at this general kind of job, but since you have to override pretty much every method for this particular job, it just gets in the way.
Here's a rough cut at a solution:
import java.io.IOException;
import java.io.InputStream;

/**
 * An {@code InputStream} wrapper that provides up to a maximum number of
 * bytes from the underlying stream. Does not support mark/reset, even
 * when the wrapped stream does, and does not perform any buffering.
 */
public class BoundedInputStream extends InputStream {

    /** This stream's underlying {@code InputStream} */
    private final InputStream data;

    /** The maximum number of bytes still available from this stream */
    private long bytesRemaining;

    /**
     * Initializes a new {@code BoundedInputStream} with the specified
     * underlying stream and byte limit
     *
     * @param data the {@code InputStream} serving as the source of this
     *        one's data
     * @param maxBytes the maximum number of bytes this stream will deliver
     *        before signaling end-of-data
     */
    public BoundedInputStream(InputStream data, long maxBytes) {
        this.data = data;
        bytesRemaining = Math.max(maxBytes, 0);
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(data.available(), bytesRemaining);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    @Override
    public synchronized void mark(int limit) {
        // does nothing
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (bytesRemaining > 0) {
            int nRead = data.read(
                    buf, off, (int) Math.min(len, bytesRemaining));
            // only decrement for an actual read; nRead is -1 at end-of-stream
            if (nRead > 0) {
                bytesRemaining -= nRead;
            }
            return nRead;
        } else {
            return -1;
        }
    }

    @Override
    public int read(byte[] buf) throws IOException {
        return this.read(buf, 0, buf.length);
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = data.skip(Math.min(n, bytesRemaining));
        bytesRemaining -= skipped;
        return skipped;
    }

    @Override
    public int read() throws IOException {
        if (bytesRemaining > 0) {
            int c = data.read();
            if (c >= 0) {
                bytesRemaining -= 1;
            }
            return c;
        } else {
            return -1;
        }
    }
}
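A usage sketch for the upload case in the question (bucket, key and s3Client as in the question's own code; the byte limit is the file length sampled before the upload begins):

long snapshotLength = logFile.length(); // size observed before the upload starts
try (InputStream in = new BoundedInputStream(new FileInputStream(logFile), snapshotLength)) {
    ObjectMetadata md = new ObjectMetadata();
    md.setContentLength(snapshotLength);
    s3Client.putObject(new PutObjectRequest(bucket, key, in, md));
}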
Thanks for reading this.
Well, what I'm trying to do is take a .wav file (only a short audio clip) and convert it to ints, where each one represents a tone of the audio...
If you're asking why I'm doing this, it's because I'm working on an Arduino project, and I want to make the Arduino play a song; to do that I need an int array where every int is a tone.
So I thought, "well, if I write a little application that converts any .wav file to a txt file storing the ints that represent the melody notes, I just need to copy those values into the Arduino project code";
So after all this, maybe you're asking, "What is your problem?";
I wrote the code and it's "working"; the only problem is that the txt file just has "1024" on every line...
So obviously I have a problem; not all the tones are 1024 -_-
package WaveToText;
import java.io.*;
/**
 *
 * @author Luis Miguel Mejía Suárez
 * @project This project is to convert a wav music file to an int array
 * which is going to be printed in a txt file to be used for an arduino
 * @serial 1.0.1 (05/11/201)
 */
public final class Converter
{
/**
 *
 * @Class Here is where all the code for the application is going to live
 *
 * @Param Text is a .txt file where the ints are going to be stored
 * @Param MyFile is the input of the wav file to be converted
 */
PrintStream Text;
InputStream MyFile;
public Converter () throws FileNotFoundException, IOException
{
MyFile = new FileInputStream("C:\\Users\\luismiguel\\Dropbox\\ESTUDIO\\PROGRAMAS\\JAVA\\WavToText\\src\\WaveToText\\prueba.wav");
Text = new PrintStream(new File("Notes.txt"));
}
public void ConvertToTxt() throws IOException
{
BufferedInputStream in = new BufferedInputStream(MyFile);
int read;
byte[] buff = new byte[1024];
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
Text.close();
}
/**
* #param args the command line arguments
*/
public static void main(String[] args) throws IOException{
// TODO code application logic here
Converter Exc = new Converter();
Exc.ConvertToTxt();
}
}
Wait, wait, wait... a lot of things aren't right here.
You can't just read the bytes and send them to the Arduino because, as you say, the Arduino expects note numbers. The numbers in a WAV file are, first, the "header" with audio info, and then the numbers representing discrete points in the signal (the waveform). If you want to get notes you need some algorithm for pitch detection or music transcription; reading the raw samples, at least, is straightforward, as the sketch below shows.
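If you only need the raw sample values with the WAV header already parsed away, the standard javax.sound.sampled API gets you that far; pitch detection then has to operate on those samples. A minimal sketch (the file name is the one from the question):

import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class WavInfo {
    public static void main(String[] args) throws Exception {
        // getAudioInputStream consumes the WAV header and exposes only sample data
        AudioInputStream ais = AudioSystem.getAudioInputStream(new File("prueba.wav"));
        AudioFormat fmt = ais.getFormat();
        System.out.println("sample rate: " + fmt.getSampleRate()
                + ", bits per sample: " + fmt.getSampleSizeInBits()
                + ", channels: " + fmt.getChannels());
        // ais.read(...) now yields waveform bytes, not header bytes
    }
}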
Pitch detection could work if your music is monophonic or close to monophonic. For full-band songs it would be troublesome. So... I guess the "Arduino part" will play monophonic music, and you need to extract the fundamental frequency of the signal at a particular moment in time. This is called pitch detection, and there are different ways to do it: autocorrelation, AMDF, spectral analysis. You must also keep the timing of the notes.
When you have extracted the frequencies, there is a formula to convert a frequency into the integer number of the corresponding note on a piano: n = 12 * log2(f / 440) + 49, where n is the integer note number and f is the fundamental frequency of the note. Before calculating, you should also quantize the frequencies you get from the pitch detection algorithm to the closest exact note frequency (google for the exact note frequencies).
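That formula translates directly into Java; Math.round snaps to the nearest key, which also covers the quantization step (frequencyToNoteNumber is a made-up helper name):

// n = 12 * log2(f / 440) + 49, where f is the fundamental frequency in Hz
static int frequencyToNoteNumber(double f) {
    return (int) Math.round(12 * (Math.log(f / 440.0) / Math.log(2)) + 49);
}
// frequencyToNoteNumber(440.0) == 49 (A4); frequencyToNoteNumber(261.63) == 40 (middle C)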
However, I really suggest doing some more research. It would be really difficult to detect notes in music where you have a few instruments playing, drums, a singer, all together...
while ((read = in.read(buff)) > 0)
{
Text.println(read);
}
This bit of code reads 1024 bytes of data from in, then assigns the number of bytes read to read, which is 1024, until the end of file. You then print read to your text file.
You probably wanted to print buff to your text file, but that is going to write 1024 bytes, rather than the 1024 ints you want.
You will need to create a for loop to print the individual bytes as ints.
while ((read = in.read(buff)) > 0)
{
    // only the first "read" bytes are valid on the final chunk
    for (int i = 0; i < read; i++)
        Text.println((int) buff[i]); // println, so each value gets its own line
}
Dude, I'm using the following code to read up a large file (2MB or more) and do some business with the data.
I have to read 128 bytes on each data read call.
At first I used this code (no problem, works well).
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
for(int idx=0;idx<128;idx++){
read=is.read(); if(read==-1){return;}//end of stream
buff[idx]=(byte)read;
}
process_data(buff);
}
Then I tried this code, which is where the problems appeared (errors! weird responses sometimes):
InputStream is;//= something...
int read=-1;
byte[] buff=new byte[128];
while(true){
//ERROR! java doesn't read 128 bytes while it's available
if((read=is.read(buff,0,128))==128){process_data(buff);}else{return;}
}
The above code doesn't work all the time; I'm sure the data is available, but read sometimes returns 127, or 125, or 123. What is the problem?
I also found code for this that uses DataInputStream#readFully(byte[]), which works too, but I just wonder why the second solution doesn't fill the array while the data is available.
Thanks buddy.
Consulting the javadoc for FileInputStream (I'm assuming, since you're reading from a file):
Reads up to len bytes of data from this input stream into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.
The key here is that the method only blocks until some data is available. The returned value tells you how many bytes were actually read. The reason you may be reading fewer than 128 bytes could be a slow drive or implementation-defined behavior.
For a proper read sequence, you should check that read() does not equal -1 (End of stream) and write to a buffer until the correct amount of data has been read.
Example of a proper implementation of your code:
InputStream is; // = something...
int read;
int read_total;
byte[] buf = new byte[128];
// Infinite loop
while(true){
read_total = 0;
// Repeatedly perform reads until break or end of stream, offsetting at last read position in array
while((read = is.read(buf, read_total, buf.length - read_total)) != -1){
// Gets the amount read and adds it to a read_total variable.
read_total = read_total + read;
// Break if read_total is the buffer length (128)
if(read_total == buf.length){
break;
}
}
if(read_total != buf.length){
    // Incomplete read before 128 bytes: the stream ended, so stop looping
    break;
}else{
    process_data(buf);
}
}
Edit:
Don't try to use available() as an indicator of data availability (sounds weird I know), again the javadoc:
Returns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking by the next invocation of a method for this input stream. Returns 0 when the file position is beyond EOF. The next invocation might be the same thread or another thread. A single read or skip of this many bytes will not block, but may read or skip fewer bytes.
In some cases, a non-blocking read (or skip) may appear to be blocked when it is merely slow, for example when reading large files over slow networks.
The key there is estimate, don't work with estimates.
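For completeness, DataInputStream#readFully, which the question mentions, wraps exactly this kind of loop for you. A sketch (process_data as in the question's own code):

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

static void readChunks(InputStream is) throws IOException {
    DataInputStream din = new DataInputStream(is);
    byte[] buf = new byte[128];
    try {
        while (true) {
            din.readFully(buf); // loops internally until all 128 bytes arrive
            process_data(buf);
        }
    } catch (EOFException e) {
        // stream ended; note that a trailing partial chunk is discarded here
    }
}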
Since the accepted answer was provided, a new option has become available. Starting with Java 9, the InputStream class has a readNBytes(byte[], int, int) method (and Java 11 added a readNBytes(int) overload) that eliminates the need for the programmer to write a read loop. For example, your method could look like:
public static void some_method(String[] args) throws IOException {
InputStream is = new FileInputStream(args[1]);
byte[] buff = new byte[128];
while (true) {
int numRead = is.readNBytes(buff, 0, buff.length);
if (numRead == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff, numRead);
}
}
or the slightly simpler
public static void some_method(String[] args) throws IOException {
InputStream is = new FileInputStream(args[1]);
while (true) {
byte[] buff = is.readNBytes(128);
if (buff.length == 0) {
break;
}
// The last read before end-of-stream may read fewer than 128 bytes.
process_data(buff);
}
}
According to the Java documentation, the readlimit parameter of the mark method in the class InputStream serves to set "the maximum limit of bytes that can be read before the mark position becomes invalid."
I have a file named sample.txt whose content is "hello". And I wrote this code:
import java.io.*;
public class InputStreamDemo { // renamed so it doesn't shadow java.io.InputStream
public static void main (String[] args) throws IOException {
InputStream reader = new FileInputStream("sample.txt");
BufferedInputStream bis = new BufferedInputStream(reader);
bis.mark(1);
bis.read();
bis.read();
bis.read();
bis.read();
bis.reset();
System.out.println((char)bis.read());
}
}
The output is "h". But if i after the mark method read more than one bytes, shouldn't i get an error for the invalid reset method call?
I would put this down to a documentation error.
The method-level doc for BufferedInputStream.mark is just "See the general contract of the mark method of InputStream," which to me indicates that BufferedInputStream does not behave differently, the parameter doc notwithstanding.
And the general contract, as specified by InputStream, is
The readlimit arguments tells this input stream to allow that many bytes to be read before the mark position gets invalidated [...] the stream is not required to remember any data at all if more than readlimit bytes are read from the stream
In other words, readlimit is a suggestion; the stream is free to under-promise and over-deliver.
If you look at the source, particularly the fill() method, you can see (after a while!) that it only invalidates the mark when it absolutely has to, i.e. it is more tolerant than the documentation might suggest.
...
else if (pos >= buffer.length) /* no room left in buffer */
if (markpos > 0) { /* can throw away early part of the buffer */
int sz = pos - markpos;
System.arraycopy(buffer, markpos, buffer, 0, sz);
pos = sz;
markpos = 0;
} else if (buffer.length >= marklimit) {
markpos = -1; /* buffer got too big, invalidate mark */
pos = 0; /* drop buffer contents */
....
The default buffer size is relatively large (8K), so invalidation won't be triggered in your example.
Looking at the implementation of BufferedInputStream, it describes the significance of the marker position in the JavaDocs (of the protected markpos field):
[markpos is] the value of the pos field at the time the last mark method was called.
This value is always in the range -1 through pos. If there is no marked position in the input stream, this field is -1. If there is a marked position in the input stream, then buf[markpos] is the first byte to be supplied as input after a reset operation. If markpos is not -1, then all bytes from positions buf[markpos] through buf[pos-1] must remain in the buffer array (though they may be moved to another place in the buffer array, with suitable adjustments to the values of count, pos, and markpos); they may not be discarded unless and until the difference between pos and markpos exceeds marklimit.
Hope this helps. Take a peek at the definitions of read, reset and the private method fill in the class to see how it all ties together.
In short, only when the class retrieves more data to fill its buffer will the mark position be taken into account. It will be correctly invalidated if more bytes are read than the call to mark allowed. As a result, calls to read will not necessarily trigger the behaviour advertised in the public JavaDoc comments.
This looks like a subtle bug. If you reduce the buffer size you'll get an IOException:
public static void main(String[] args) throws IOException {
InputStream reader = new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5, 6, 7, 8});
BufferedInputStream bis = new BufferedInputStream(reader, 3);
bis.mark(1);
bis.read();
bis.read();
bis.read();
bis.read();
bis.reset();
System.out.println((char)bis.read());
}
I am at a stage of development where I have two modules: from one I get output as an OutputStream, and the second one accepts only an InputStream. Do you know how to convert an OutputStream to an InputStream (not vice versa, I mean really this way) so that I can connect these two parts?
Thanks
There seem to be many links and other such stuff, but no actual code using pipes. The advantage of using java.io.PipedInputStream and java.io.PipedOutputStream is that there is no additional consumption of memory. ByteArrayOutputStream.toByteArray() returns a copy of the original buffer, so that means that whatever you have in memory, you now have two copies of it. Then writing to an InputStream means you now have three copies of the data.
The code using lambdas (hat-tip to @John Manko from the comments):
PipedInputStream in = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(in);
// in a background thread, write the given output stream to the
// PipedOutputStream for consumption; writeTo throws IOException,
// which a Runnable lambda has to catch itself
new Thread(() -> {
    try {
        originalOutputStream.writeTo(out);
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}).start();
One thing that @John Manko noted is that in certain cases, when you don't have control of the creation of the OutputStream, you may end up in a situation where the creator may clean up the OutputStream object prematurely. If you are getting the ClosedPipeException, then you should try inverting the constructors:
final PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
new Thread(() -> {
    try {
        originalOutputStream.writeTo(out);
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}).start();
Note you can invert the constructors for the examples below too.
Thanks also to @AlexK for correcting me with starting a Thread instead of just kicking off a Runnable.
The code using try-with-resources:
// take the copy of the stream and re-write it to an InputStream
PipedInputStream in = new PipedInputStream();
new Thread(new Runnable() {
public void run () {
// try-with-resources here
// putting the try block outside the Thread will cause the
// PipedOutputStream resource to close before the Runnable finishes
try (final PipedOutputStream out = new PipedOutputStream(in)) {
// write the original OutputStream to the PipedOutputStream
// note that in order for the below method to work, you need
// to ensure that the data has finished writing to the
// ByteArrayOutputStream
originalByteArrayOutputStream.writeTo(out);
}
catch (IOException e) {
// logging and exception handling should go here
}
}
}).start();
The original code I wrote:
// take the copy of the stream and re-write it to an InputStream
PipedInputStream in = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(in);
new Thread(new Runnable() {
public void run () {
try {
// write the original OutputStream to the PipedOutputStream
// note that in order for the below method to work, you need
// to ensure that the data has finished writing to the
// ByteArrayOutputStream
originalByteArrayOutputStream.writeTo(out);
}
catch (IOException e) {
// logging and exception handling should go here
}
finally {
    // close the PipedOutputStream here because we're done writing data
    // once this thread has completed its run; close() itself can throw,
    // and run() cannot, so catch it here
    try {
        out.close();
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}
}
}).start();
This code assumes that originalByteArrayOutputStream is a ByteArrayOutputStream, as it is usually the only usable output stream unless you're writing to a file. The great thing about this is that since it runs in a separate thread, it also works in parallel, so whatever is consuming your input stream will be streaming out of your old output stream too. That is beneficial because the buffer can remain smaller and you'll have less latency and less memory usage.
If you don't have a ByteArrayOutputStream, then instead of using writeTo(), you will have to use one of the write() methods in the java.io.OutputStream class or one of the other methods available in a subclass.
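For example, if the data comes from some InputStream instead, the writer thread can pump it into the pipe with a plain copy loop. A sketch, where source is a hypothetical InputStream and in/out are the piped pair from the examples above:

new Thread(() -> {
    // copy "source" into the pipe chunk by chunk, then close it
    // (via try-with-resources) to signal end-of-data to the reader
    try (PipedOutputStream po = out) {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = source.read(buffer)) != -1) {
            po.write(buffer, 0, n);
        }
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}).start();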
An OutputStream is one where you write data to. If some module exposes an OutputStream, the expectation is that there is something reading at the other end.
Something that exposes an InputStream, on the other hand, is indicating that you will need to listen to this stream, and there will be data that you can read.
So it is possible to connect an InputStream to an OutputStream:
InputStream----read---> intermediateBytes[n] ----write----> OutputStream
As someone mentioned, this is what the copy() method from IOUtils lets you do. It does not make sense to go the other way... hopefully this makes some sense.
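For reference, the Commons IO call looks like this (assuming you already hold both ends):

import org.apache.commons.io.IOUtils;

// reads from the InputStream and writes everything it finds
// to the OutputStream, buffering internally
IOUtils.copy(inputStream, outputStream);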
UPDATE:
Of course, the more I think of this, the more I can see how this actually would be a requirement. I know some of the comments mentioned piped input/output streams, but there is another possibility.
If the output stream that is exposed is a ByteArrayOutputStream, then you can always get the full contents by calling the toByteArray() method. Then you can create an input stream wrapper by using the ByteArrayInputStream sub-class. These two are pseudo-streams; they both basically just wrap an array of bytes. Using the streams this way, therefore, is technically possible, but to me it is still very strange...
As input and output streams are just start and end points, the solution is to temporarily store the data in a byte array. So you must create an intermediate ByteArrayOutputStream, from which you create the byte[] that is used as input for a new ByteArrayInputStream.
public void doTwoThingsWithStream(InputStream inStream, OutputStream outStream){
//create temporary byte array output stream
ByteArrayOutputStream baos = new ByteArrayOutputStream();
doFirstThing(inStream, baos);
//create input stream from baos
InputStream isFromFirstData = new ByteArrayInputStream(baos.toByteArray());
doSecondThing(isFromFirstData, outStream);
}
Hope it helps.
// this cast only succeeds if aOutputStream really is a ByteArrayOutputStream
ByteArrayOutputStream buffer = (ByteArrayOutputStream) aOutputStream;
byte[] bytes = buffer.toByteArray();
InputStream inputStream = new ByteArrayInputStream(bytes);
You will need an intermediate class which will buffer between. Each time InputStream.read(byte[]...) is called, the buffering class will fill the passed in byte array with the next chunk passed in from OutputStream.write(byte[]...). Since the sizes of the chunks may not be the same, the adapter class will need to store a certain amount until it has enough to fill the read buffer and/or be able to store up any buffer overflow.
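A minimal sketch of such an adapter, using a bounded queue as the intermediate buffer; one byte per queue element keeps it short but slow, and a -1 sentinel marks end-of-data (all names here are made up for illustration):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueAdapter {
    private static final int EOF = -1;
    private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(8192);

    // writer side: blocks when the queue (the buffer) is full
    public final OutputStream sink = new OutputStream() {
        @Override
        public void write(int b) throws IOException {
            put(b & 0xFF);
        }

        @Override
        public void close() throws IOException {
            put(EOF); // sentinel telling the reader no more data is coming
        }
    };

    // reader side: blocks until the writer provides a byte or closes
    public final InputStream source = new InputStream() {
        @Override
        public int read() throws IOException {
            int b = take();
            if (b == EOF) {
                put(EOF); // keep signaling EOF to any later read() calls
            }
            return b;
        }
    };

    private void put(int b) throws IOException {
        try {
            queue.put(b);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while buffering", e);
        }
    }

    private int take() throws IOException {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while reading", e);
        }
    }
}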
This article has a nice breakdown of a few different approaches to this problem:
http://blog.ostermiller.org/convert-java-outputstream-inputstream
The easystream open source library has direct support to convert an OutputStream to an InputStream: http://io-tools.sourceforge.net/easystream/tutorial/tutorial.html
// create conversion
final OutputStreamToInputStream<Void> out = new OutputStreamToInputStream<Void>() {
@Override
protected Void doRead(final InputStream in) throws Exception {
LibraryClass2.processDataFromInputStream(in);
return null;
}
};
try {
LibraryClass1.writeDataToTheOutputStream(out);
} finally {
// don't miss the close (or a thread would not terminate correctly).
out.close();
}
They also list other options: http://io-tools.sourceforge.net/easystream/outputstream_to_inputstream/implementations.html
Write the data into a memory buffer (ByteArrayOutputStream), get the byte array, and read it again with a ByteArrayInputStream. This is the best approach if you're sure your data fits into memory.
Copy your data to a temporary file and read it back.
Use pipes: this is the best approach both for memory usage and speed (you can take full advantage of the multi-core processors) and also the standard solution offered by Sun.
Use InputStreamFromOutputStream and OutputStreamToInputStream from the easystream library.
I encountered the same problem with converting a ByteArrayOutputStream to a ByteArrayInputStream and solved it by using a derived class from ByteArrayOutputStream which is able to return a ByteArrayInputStream that is initialized with the internal buffer of the ByteArrayOutputStream. This way no additional memory is used and the 'conversion' is very fast:
package info.whitebyte.utils;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
/**
* This class extends the ByteArrayOutputStream by
* providing a method that returns a new ByteArrayInputStream
* which uses the internal byte array buffer. This buffer
* is not copied, so no additional memory is used. After
* creating the ByteArrayInputStream the instance of the
* ByteArrayInOutStream can not be used anymore.
* <p>
* The ByteArrayInputStream can be retrieved using <code>getInputStream()</code>.
* @author Nick Russler
*/
public class ByteArrayInOutStream extends ByteArrayOutputStream {
/**
* Creates a new ByteArrayInOutStream. The buffer capacity is
* initially 32 bytes, though its size increases if necessary.
*/
public ByteArrayInOutStream() {
super();
}
/**
* Creates a new ByteArrayInOutStream, with a buffer capacity of
* the specified size, in bytes.
*
* @param size the initial size.
* @exception IllegalArgumentException if size is negative.
*/
public ByteArrayInOutStream(int size) {
super(size);
}
/**
* Creates a new ByteArrayInputStream that uses the internal byte array buffer
* of this ByteArrayInOutStream instance as its buffer array. The initial value
* of pos is set to zero and the initial value of count is the number of bytes
* that can be read from the byte array. The buffer array is not copied. This
* instance of ByteArrayInOutStream can not be used anymore after calling this
* method.
* @return the ByteArrayInputStream instance
*/
public ByteArrayInputStream getInputStream() {
// create new ByteArrayInputStream that respects the current count
ByteArrayInputStream in = new ByteArrayInputStream(this.buf, 0, this.count);
// set the buffer of the ByteArrayOutputStream
// to null so it can't be altered anymore
this.buf = null;
return in;
}
}
I put the stuff on github: https://github.com/nickrussler/ByteArrayInOutStream
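Usage is then a drop-in swap for the ByteArrayOutputStream pattern shown in the other answers; a sketch, where writeSomething stands for whatever produces your data:

ByteArrayInOutStream stream = new ByteArrayInOutStream();
writeSomething(stream);                   // use it like any OutputStream
InputStream in = stream.getInputStream(); // hands over the internal buffer, no copy
// "stream" must not be used after this point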
The library io-extras may be useful. For example if you want to gzip an InputStream using GZIPOutputStream and you want it to happen synchronously (using the default buffer size of 8192):
InputStream is = ...
InputStream gz = IOUtil.pipe(is, o -> new GZIPOutputStream(o));
Note that the library has 100% unit test coverage (for what that's worth of course!) and is on Maven Central. The Maven dependency is:
<dependency>
<groupId>com.github.davidmoten</groupId>
<artifactId>io-extras</artifactId>
<version>0.1</version>
</dependency>
Be sure to check for a later version.
From my point of view, java.io.PipedInputStream/java.io.PipedOutputStream is the best option to consider. In some situations you may want to use ByteArrayInputStream/ByteArrayOutputStream. The problem is that you need to duplicate the buffer to convert a ByteArrayOutputStream to a ByteArrayInputStream. Also, ByteArrayOutputStream/ByteArrayInputStream are limited to 2GB. Here is an OutputStream/InputStream implementation I wrote to bypass the ByteArrayOutputStream/ByteArrayInputStream limitations (Scala code, but easily understandable for Java developers):
import java.io.{IOException, InputStream, OutputStream}
import scala.annotation.tailrec
/** Acts as a replacement for ByteArrayOutputStream
*
*/
class HugeMemoryOutputStream(capacity: Long) extends OutputStream {
private val PAGE_SIZE: Int = 1024000
private val ALLOC_STEP: Int = 1024
/** Pages array
*
*/
private var streamBuffers: Array[Array[Byte]] = Array.empty[Array[Byte]]
/** Allocated pages count
*
*/
private var pageCount: Int = 0
/** Allocated bytes count
*
*/
private var allocatedBytes: Long = 0
/** Current position in stream
*
*/
private var position: Long = 0
/** Stream length
*
*/
private var length: Long = 0
allocSpaceIfNeeded(capacity)
/** Gets page count based on given length
*
* @param length Buffer length
* @return Page count to hold the specified amount of data
*/
private def getPageCount(length: Long) = {
var pageCount = (length / PAGE_SIZE).toInt + 1
if ((length % PAGE_SIZE) == 0) {
pageCount -= 1
}
pageCount
}
/** Extends pages array
*
*/
private def extendPages(): Unit = {
if (streamBuffers.isEmpty) {
streamBuffers = new Array[Array[Byte]](ALLOC_STEP)
}
else {
val newStreamBuffers = new Array[Array[Byte]](streamBuffers.length + ALLOC_STEP)
Array.copy(streamBuffers, 0, newStreamBuffers, 0, streamBuffers.length)
streamBuffers = newStreamBuffers
}
pageCount = streamBuffers.length
}
/** Ensures buffers are big enough to hold the specified amount of data
*
* @param value Amount of data
*/
private def allocSpaceIfNeeded(value: Long): Unit = {
@tailrec
def allocSpaceIfNeededIter(value: Long): Unit = {
val currentPageCount = getPageCount(allocatedBytes)
val neededPageCount = getPageCount(value)
if (currentPageCount < neededPageCount) {
if (currentPageCount == pageCount) extendPages()
streamBuffers(currentPageCount) = new Array[Byte](PAGE_SIZE)
allocatedBytes = (currentPageCount + 1).toLong * PAGE_SIZE
allocSpaceIfNeededIter(value)
}
}
if (value < 0) throw new Error("AllocSpaceIfNeeded < 0")
if (value > 0) {
allocSpaceIfNeededIter(value)
length = Math.max(value, length)
if (position > length) position = length
}
}
/**
* Writes the specified byte to this output stream. The general
* contract for <code>write</code> is that one byte is written
* to the output stream. The byte to be written is the eight
* low-order bits of the argument <code>b</code>. The 24
* high-order bits of <code>b</code> are ignored.
* <p>
* Subclasses of <code>OutputStream</code> must provide an
* implementation for this method.
*
* @param b the <code>byte</code>.
*/
@throws[IOException]
override def write(b: Int): Unit = {
val buffer: Array[Byte] = new Array[Byte](1)
buffer(0) = b.toByte
write(buffer)
}
/**
* Writes <code>len</code> bytes from the specified byte array
* starting at offset <code>off</code> to this output stream.
* The general contract for <code>write(b, off, len)</code> is that
* some of the bytes in the array <code>b</code> are written to the
* output stream in order; element <code>b[off]</code> is the first
* byte written and <code>b[off+len-1]</code> is the last byte written
* by this operation.
* <p>
* The <code>write</code> method of <code>OutputStream</code> calls
* the write method of one argument on each of the bytes to be
* written out. Subclasses are encouraged to override this method and
* provide a more efficient implementation.
* <p>
* If <code>b</code> is <code>null</code>, a
* <code>NullPointerException</code> is thrown.
* <p>
* If <code>off</code> is negative, or <code>len</code> is negative, or
* <code>off+len</code> is greater than the length of the array
* <code>b</code>, then an <tt>IndexOutOfBoundsException</tt> is thrown.
*
* @param b the data.
* @param off the start offset in the data.
* @param len the number of bytes to write.
*/
@throws[IOException]
override def write(b: Array[Byte], off: Int, len: Int): Unit = {
@tailrec
def writeIter(b: Array[Byte], off: Int, len: Int): Unit = {
val currentPage: Int = (position / PAGE_SIZE).toInt
val currentOffset: Int = (position % PAGE_SIZE).toInt
if (len != 0) {
val currentLength: Int = Math.min(PAGE_SIZE - currentOffset, len)
Array.copy(b, off, streamBuffers(currentPage), currentOffset, currentLength)
position += currentLength
writeIter(b, off + currentLength, len - currentLength)
}
}
allocSpaceIfNeeded(position + len)
writeIter(b, off, len)
}
/** Gets an InputStream that points to HugeMemoryOutputStream buffer
*
* @return InputStream
*/
def asInputStream(): InputStream = {
new HugeMemoryInputStream(streamBuffers, length)
}
private class HugeMemoryInputStream(streamBuffers: Array[Array[Byte]], val length: Long) extends InputStream {
/** Current position in stream
*
*/
private var position: Long = 0
/**
* Reads the next byte of data from the input stream. The value byte is
* returned as an <code>int</code> in the range <code>0</code> to
* <code>255</code>. If no byte is available because the end of the stream
* has been reached, the value <code>-1</code> is returned. This method
* blocks until input data is available, the end of the stream is detected,
* or an exception is thrown.
*
* <p> A subclass must provide an implementation of this method.
*
* @return the next byte of data, or <code>-1</code> if the end of the
* stream is reached.
*/
@throws[IOException]
def read: Int = {
    val buffer: Array[Byte] = new Array[Byte](1)
    if (read(buffer) <= 0) -1 // end of stream, per the InputStream contract
    else buffer(0) & 0xFF // return the byte as an unsigned value
}
/**
* Reads up to <code>len</code> bytes of data from the input stream into
* an array of bytes. An attempt is made to read as many as
* <code>len</code> bytes, but a smaller number may be read.
* The number of bytes actually read is returned as an integer.
*
* <p> This method blocks until input data is available, end of file is
* detected, or an exception is thrown.
*
* <p> If <code>len</code> is zero, then no bytes are read and
* <code>0</code> is returned; otherwise, there is an attempt to read at
* least one byte. If no byte is available because the stream is at end of
* file, the value <code>-1</code> is returned; otherwise, at least one
* byte is read and stored into <code>b</code>.
*
* <p> The first byte read is stored into element <code>b[off]</code>, the
* next one into <code>b[off+1]</code>, and so on. The number of bytes read
* is, at most, equal to <code>len</code>. Let <i>k</i> be the number of
* bytes actually read; these bytes will be stored in elements
* <code>b[off]</code> through <code>b[off+</code><i>k</i><code>-1]</code>,
* leaving elements <code>b[off+</code><i>k</i><code>]</code> through
* <code>b[off+len-1]</code> unaffected.
*
* <p> In every case, elements <code>b[0]</code> through
* <code>b[off]</code> and elements <code>b[off+len]</code> through
* <code>b[b.length-1]</code> are unaffected.
*
* <p> The <code>read(b,</code> <code>off,</code> <code>len)</code> method
* for class <code>InputStream</code> simply calls the method
* <code>read()</code> repeatedly. If the first such call results in an
* <code>IOException</code>, that exception is returned from the call to
* the <code>read(b,</code> <code>off,</code> <code>len)</code> method. If
* any subsequent call to <code>read()</code> results in a
* <code>IOException</code>, the exception is caught and treated as if it
* were end of file; the bytes read up to that point are stored into
* <code>b</code> and the number of bytes read before the exception
* occurred is returned. The default implementation of this method blocks
* until the requested amount of input data <code>len</code> has been read,
* end of file is detected, or an exception is thrown. Subclasses are encouraged
* to provide a more efficient implementation of this method.
*
* @param b the buffer into which the data is read.
* @param off the start offset in array <code>b</code>
* at which the data is written.
* @param len the maximum number of bytes to read.
* @return the total number of bytes read into the buffer, or
* <code>-1</code> if there is no more data because the end of
* the stream has been reached.
* @see java.io.InputStream#read()
*/
@throws[IOException]
override def read(b: Array[Byte], off: Int, len: Int): Int = {
@tailrec
def readIter(acc: Int, b: Array[Byte], off: Int, len: Int): Int = {
val currentPage: Int = (position / PAGE_SIZE).toInt
val currentOffset: Int = (position % PAGE_SIZE).toInt
val count: Int = Math.min(len, length - position).toInt
if (count == 0 || position >= length) acc
else {
val currentLength = Math.min(PAGE_SIZE - currentOffset, count)
Array.copy(streamBuffers(currentPage), currentOffset, b, off, currentLength)
position += currentLength
readIter(acc + currentLength, b, off + currentLength, len - currentLength)
}
}
val bytesRead = readIter(0, b, off, len)
if (bytesRead == 0 && len > 0) -1 else bytesRead // -1 signals end-of-stream
}
/**
* Skips over and discards <code>n</code> bytes of data from this input
* stream. The <code>skip</code> method may, for a variety of reasons, end
* up skipping over some smaller number of bytes, possibly <code>0</code>.
* This may result from any of a number of conditions; reaching end of file
* before <code>n</code> bytes have been skipped is only one possibility.
* The actual number of bytes skipped is returned. If <code>n</code> is
* negative, the <code>skip</code> method for class <code>InputStream</code> always
* returns 0, and no bytes are skipped. Subclasses may handle the negative
* value differently.
*
* The <code>skip</code> method of this class creates a
* byte array and then repeatedly reads into it until <code>n</code> bytes
* have been read or the end of the stream has been reached. Subclasses are
* encouraged to provide a more efficient implementation of this method.
* For instance, the implementation may depend on the ability to seek.
*
* @param n the number of bytes to be skipped.
* @return the actual number of bytes skipped.
*/
@throws[IOException]
override def skip(n: Long): Long = {
    if (n < 0) 0
    else {
        val previous = position
        position = Math.min(position + n, length)
        position - previous // report how many bytes were actually skipped
    }
}
}
}
Easy to use, no buffer duplication, no 2GB memory limit
val out: HugeMemoryOutputStream = new HugeMemoryOutputStream(initialCapacity /*may be 0*/)
out.write(...)
...
val in1: InputStream = out.asInputStream()
in1.read(...)
...
val in2: InputStream = out.asInputStream()
in2.read(...)
...
As some here have answered already, there is no efficient way to just 'convert' an OutputStream to an InputStream. The trick to solving a problem like yours is to execute all the code that requires the OutputStream in its own thread. By using piped streams, we can then transfer the data out of the created thread into an InputStream.
Example usage:
public static InputStream downloadFileAsStream(final String uriString) throws IOException {
final InputStream inputStream = runInOwnThreadWithPipedStreams((outputStream) -> {
try {
downloadUriToStream(uriString, outputStream);
} catch (final Exception e) {
LOGGER.error("Download of uri '{}' has failed", uriString, e);
}
});
return inputStream;
}
Helper function:
public static InputStream runInOwnThreadWithPipedStreams(
final Consumer<OutputStream> outputStreamConsumer) throws IOException {
final PipedInputStream inputStream = new PipedInputStream();
final PipedOutputStream outputStream = new PipedOutputStream(inputStream);
new Thread(new Runnable() {
public void run() {
try {
outputStreamConsumer.accept(outputStream);
} finally {
try {
outputStream.close();
} catch (final IOException e) {
LOGGER.error("Closing outputStream has failed. ", e);
}
}
}
}).start();
return inputStream;
}
Unit Test:
@Test
void testRunInOwnThreadWithPipedStreams() throws IOException {
final InputStream inputStream = LoadFileUtil.runInOwnThreadWithPipedStreams((OutputStream outputStream) -> {
try {
IOUtils.copy(IOUtils.toInputStream("Hello World", StandardCharsets.UTF_8), outputStream);
} catch (final IOException e) {
LoggerFactory.getLogger(LoadFileUtilTest.class).error(e.getMessage(), e);
}
});
final String actualResult = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
Assertions.assertEquals("Hello World", actualResult);
}
If you want to make an InputStream from an OutputStream there is one basic problem: a method writing to an OutputStream blocks until it is done, so the result is only available once the writing method has finished. This has 2 consequences:
If you use only one thread, you need to wait until everything is written (so you need to store the stream's data in memory or disk).
If you want to access the data before it is finished, you need a second thread.
Variant 1 can be implemented using byte arrays or files.
Variant 2 can be implemented using pipes (either directly or with extra abstraction, e.g. a RingBuffer or the google lib from the other comment).
Indeed, with standard Java there is no other way to solve the problem. Each solution is an implementation of one of these.
There is one concept called "continuation" (see wikipedia for details). In this case basically this means:
there is a special output stream that expects a certain amount of data
if the amount is reached, the stream gives control to its counterpart, which is a special input stream
the input stream makes the amount of data available until it is read; after that, it passes control back to the output stream
While some languages have this concept built in, for Java you need some "magic". For example, "commons-javaflow" from Apache implements this for Java. The disadvantage is that it requires special bytecode modifications at build time, so it would make sense to put all the stuff in an extra library with custom build scripts.
Though you cannot convert an OutputStream to an InputStream, Java provides a way using PipedOutputStream and PipedInputStream: data written to a PipedOutputStream becomes available through an associated PipedInputStream. Some time back I faced a similar situation when dealing with third-party libraries that required an InputStream instance to be passed to them instead of an OutputStream instance. The way I fixed this issue was to use a PipedInputStream and PipedOutputStream. By the way, they are tricky to use and you must use multithreading to achieve what you want. I recently published an implementation on github which you can use. Here is the link. You can go through the wiki to understand how to use it.
Old post, but it might help others. Use it this way:

ByteArrayOutputStream out = new ByteArrayOutputStream();
...
out.write(data); // write your bytes here
...
// use toByteArray(), not toString().getBytes(), which would corrupt binary data
InputStream in = new ByteArrayInputStream(out.toByteArray());
// wrap this in an ObjectInputStream only if the bytes were
// written by an ObjectOutputStream in the first place