Byte array to File object without saving to disk

I have a method that takes in a byte[] that came from Files.readAllBytes() in a different part of the code for either .txt or .docx files. I want to create a new File object from the bytes to later read contents from, without saving that file to disk. Is this possible? Or is there a better way to get the contents from the File bytes?

That's not how it works. A java.io.File object is a light wrapper: check out the source code. It has a String field that contains the path, and that is all it has aside from some bookkeeping.
It is not possible to represent arbitrary data with a java.io.File object. java.io.File objects represent paths to files on disk and are not capable of representing anything else.
Files.readAllBytes already gives you the contents of the file as bytes; that is why the method has that name.
The usual solution is that a method in some library that takes a File is overloaded; there will also be a method that takes a byte[], or, if that isn't around, a method that takes an InputStream (you can make an IS from a byte[] easily: new ByteArrayInputStream(byteArr) will do the job).
If the API you are using doesn't contain any such methods, it's a bad API and you should either find something else, or grit your teeth and accept that you're using a bad API, with all the workarounds that this implies, including having to save bytes to disk just to satisfy the asinine API.
But look first; I bet there is a byte[] and/or InputStream variant (or possibly URL or ByteBuffer or ByteStream or a few other more exotic variants).
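As a minimal sketch of the in-memory route, assuming the bytes came from a plain .txt file encoded as UTF-8: the byte[] is wrapped in a ByteArrayInputStream and read line by line, with no file involved. The class and method names (InMemoryText, readLines) are made up for illustration. A .docx is a zip archive, so for those you would hand the same ByteArrayInputStream to a library that accepts an InputStream instead of decoding it as text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads text lines straight from a byte[] without a temp file.
final class InMemoryText {

    // Assumes the bytes are UTF-8 text (reasonable for many .txt files, not for .docx).
    static List<String> readLines(byte[] bytes) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream, but the reader API declares it.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] bytes = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(bytes)); // [first line, second line]
    }
}
```

If all you need is the whole text at once, new String(bytes, StandardCharsets.UTF_8) does the same job without any stream.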

Related

How do I get a NetcdfFile out of a byte array?

I am relatively new to using the netcdf-java library, and I've immediately run into a problem when trying to load a file. The problem is that there doesn't seem to be a way to load a NetcdfFile from a byte array stored in memory, and that is the base form of my data. To elaborate a little, it is actually a .cdf file uploaded through a client, which the client then converts into a byte array for the server code to read. So the server, where my code is running, cannot see the uploaded file at all. I also cannot assume the server itself is writable, so essentially there is no "location" to pass into the typical NetcdfFile loading methods.
The FAQ on ucar.edu does mention the possibility of reading from a non-file source, here. It says I should write my own IOSP, which I am happy to do. However, there is very little guidance on how to do this.
I don't know how to implement isValidFile when the only thing passed into the function is a RandomAccessFile, which the FAQ says can be ignored.
I don't know how my IOSP will obtain the byte array in question for use in readData.
I don't know why the minimal example in the FAQ advises me to make a new NetcdfFile class, when it seems I could just use the default one but pass in my custom IOSP.
This question is a little vague, but I am truly lost without many clues on where to even begin. Any guidance would be appreciated.
EDIT: I'm using 5.4.2 of the netcdf-java library.

I found this answer in the support archives. The solution is to use InMemoryRandomAccessFile. The constructor takes a String location and a byte array containing the file's contents. From my testing, I think the location can be any arbitrary string. Here is the code that worked for me.
byte[] filebytes = retrieveFileBytes(clientFilepath);
InMemoryRandomAccessFile raf = new InMemoryRandomAccessFile(clientFilepath, filebytes);
NetcdfFile file = NetcdfFiles.open(raf, clientFilepath, null, null);
Variable peakRetentionTime = file.findVariable("peak_retention_time");
if (peakRetentionTime == null) {
    displayWarning("peak_retention_time null!");
} else {
    Array data = peakRetentionTime.read();
    displayInfo(Ncdump.printArray(data));
}

Java - ByteBuffer or ArrayList<Byte>?

Recently I created a wrapper to read and write data into a byte array. To do it, I've been using an ArrayList<Byte>, but I was wondering if this is the most efficient way to do it, because:
addAll() doesn't work with byte arrays (even using Arrays.asList(), which gives me a List<byte[]>). To work around it I loop and add one byte per iteration, but I suppose that means a lot of function calls and so has a performance cost.
The same happens for getting a byte[] from the ArrayList. I can't cast from Byte[] to byte[], so I have to use a loop for it.
I suppose storing Byte instead of byte uses more memory.
I know ByteArrayInputStream and ByteArrayOutputStream could be used for this, but they have some drawbacks:
I wanted to implement methods for reading different data types in different byte orders (for example, readInt, readLEInt, readUInt, etc.), while those classes can only read / write a byte or a byte array. This isn't really a problem because I could fix that in the wrapper. But here comes the second problem.
I wanted to be able to write and read at the same time because I'm using this to decompress some files. So to create a wrapper for it I would need to include both ByteArrayInputStream and ByteArrayOutputStream. I don't know if those could be synchronized in some way or if I'd have to write the entire data of one to the other each time I wrote to the wrapper.
And so, here comes my question: would using a ByteBuffer be more efficient? I know you can take integers, floats, etc. from it, and can even change the byte order. What I was wondering is if there is a real performance difference between using a ByteBuffer and an ArrayList<Byte>.

Definitely ByteBuffer or ByteArrayOutputStream. In your case ByteBuffer seems fine. Check the Javadoc, as it has useful methods. For putInt/getInt and the like, you may want to set the byte order (of those 4 bytes):
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
With files you could use getChannel() or variants and then use a MappedByteBuffer.
A ByteBuffer can wrap an existing byte array (ByteBuffer.wrap) or allocate its own (ByteBuffer.allocate).
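To make the byte-order point concrete, here is a small sketch of the readLEInt and readUInt helpers mentioned in the question, built on ByteBuffer. The class name EndianDemo and the exact signatures are assumptions for illustration; only ByteBuffer, ByteOrder and Integer.toUnsignedLong are standard JDK calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers showing typed reads with an explicit byte order.
final class EndianDemo {

    // Reads a 32-bit signed int stored in little-endian order at the given offset.
    static int readLEInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Reads a 32-bit big-endian (the ByteBuffer default) unsigned int,
    // widened to long so the value stays non-negative.
    static long readUInt(byte[] data, int offset) {
        return Integer.toUnsignedLong(ByteBuffer.wrap(data).getInt(offset));
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x00, 0x00, 0x00,
                       (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(readLEInt(data, 0)); // 1
        System.out.println(readUInt(data, 4));  // 4294967295
    }
}
```

Wrapping does not copy the array, so creating a buffer view per call is cheap; a real wrapper would more likely keep one ByteBuffer and use its position.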

Keep in mind that every object has overhead associated with it including a bit of memory per object and garbage collection once it goes out of scope.
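Using List<Byte> means storing a reference to a boxed Byte object per element (typically 4 or 8 bytes each, versus 1 byte per element in a byte[]) and boxing/unboxing on every access, which is very wasteful.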

A ByteBuffer is typically a wrapper around a byte array. It doesn't resize dynamically like ArrayList, but it uses less memory per byte and is faster.
If you know the size you need, use ByteBuffer. If you don't, you could use ByteArrayOutputStream (perhaps wrapped in an ObjectOutputStream, which has methods to write different kinds of data). To read back the data you have written to a ByteArrayOutputStream, you can extend it and access the fields buf[] and count. Those fields are protected, so a subclass can access them. It looks like:
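// Excerpt from the JDK source of java.io.ByteArrayOutputStream:
public class ByteArrayOutputStream extends OutputStream {
    /**
     * The buffer where data is stored.
     */
    protected byte buf[];
    /**
     * The number of valid bytes in the buffer.
     */
    protected int count;
    ...
}

// Subclass that exposes indexed reads of the bytes written so far.
public class ReadableBAOS extends ByteArrayOutputStream {
    public byte readByte(int index) {
        // Valid indices are 0 .. count - 1; buf may be longer than count.
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return buf[index];
    }
}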
This way you can add methods to your subclass that read bytes from the underlying buffer without making a copy of its contents each time, as toByteArray() does.

Writing BitSet to output file without overhead?

I get a line of overhead ("java.util.BitSet") when writing a BitSet to an output file using ObjectOutputStream.writeObject().
Any way around it?

That is not "overhead"; that's the marker that lets Java figure out what type it needs to create when deserializing the object from that file.
Since ObjectInputStream has no idea what you have serialized into a file, and has no way for you to provide a "hint", ObjectOutputStream must "embed" something for the input stream to be able to decide what class needs to be instantiated. That is why it places the "java.util.BitSet" string in front of the data of your BitSet.
You cannot get around writing this marker when you use the serialization capabilities built into the BitSet class. If you are writing the object to a file by itself, with no other objects going in with it, you could write the result of a toByteArray() call to the file, and call BitSet.valueOf(byteArray) after reading byteArray back from the file.
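A quick sketch of that round trip, kept in memory for brevity. The class name BitSetBytes is invented for the example; in real code the byte[] would go through something like Files.write and Files.readAllBytes.

```java
import java.util.BitSet;

// Hypothetical helper: converts a BitSet to raw bytes and back, with no class marker.
final class BitSetBytes {

    // Produces only the bit data; nothing identifying java.util.BitSet is written.
    static byte[] toBytes(BitSet bits) {
        return bits.toByteArray();
    }

    // Rebuilds a BitSet from bytes previously produced by toBytes.
    static BitSet fromBytes(byte[] bytes) {
        return BitSet.valueOf(bytes);
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(9);

        byte[] raw = toBytes(bits);
        System.out.println(raw.length);                  // 2
        System.out.println(fromBytes(raw).equals(bits)); // true
    }
}
```

Note that toByteArray() only covers bits up to the highest set bit, so if a fixed logical size matters you would have to store that size yourself.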

Java Thrift Client and Binary data

So by my understanding for thrift, Java is the only language supported that does not have binary-safe Strings, hence the thrift binary type. My problem is it doesn't seem to work.
My definition File is:
service myService {
    i32 myMethod(1:binary input)
}
My Java client builds a ByteBuffer from binary data that is observed to have positive length, and printable bytes prior to calling myMethod.
Immediately inside the C++ implementation of myMethod (from the thrift generated server skeleton), attempts to print input show it as always being empty of size 0.
Any ideas what I'm missing here? Changing binary to string makes everything work like a charm, minus the fact that I don't want the unsafe java-converted string to deal with later...

Most likely you're having a problem because ByteBuffer in Java has mutable state: any relative read operation modifies the ByteBuffer, since it advances the read position.
The simplest (though not the most efficient) way to work with Thrift binaries in Java is to create them as byte arrays and wrap them into buffers immediately before the invocation, i.e.:
byte[] input = ....;
myService.myMethod(ByteBuffer.wrap(input));
Another possible solution is to use ByteBuffer.duplicate to keep original buffer safe, i.e.:
ByteBuffer input = ....;
dump(input.duplicate());// dump function may change buffer position
myService.myMethod(input);
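The position effect is easy to see without Thrift at all. In this sketch, drain is a made-up stand-in for any code that reads the buffer (a dump or print routine, for example); everything else is plain java.nio.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of how reading a ByteBuffer consumes it, and how duplicate() avoids that.
final class DuplicateDemo {

    // Reads every remaining byte with relative get(), which advances the position.
    static int drain(ByteBuffer buffer) {
        int read = 0;
        while (buffer.hasRemaining()) {
            buffer.get();
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        ByteBuffer input = ByteBuffer.wrap(new byte[] {1, 2, 3});

        // The duplicate shares the bytes but has its own position and limit.
        drain(input.duplicate());
        System.out.println(input.remaining()); // 3

        // Reading the original directly leaves nothing for a later consumer.
        drain(input);
        System.out.println(input.remaining()); // 0
    }
}
```

A buffer left with zero remaining bytes is consistent with the empty input seen on the server side, though that diagnosis is only the likely cause, not a confirmed one.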

Can I use Xuggler to encode video/audio to a byte array?

It seems all methods expect either files or urls. I see some methods that work with OutputStream, but I haven't managed to open an IContainer using one of those methods; I always get an invalid return value.

Create your own implementation of the IURLProtocolHandler interface and pass it to IContainer.open(...) to open any kind of media you want.

You can look at this answer I posted on another question to write to an OutputStream (which could easily be a ByteArrayOutputStream).
The gist of it is to use com.xuggle.xuggler.io.XugglerIO to map an OutputStream to a special kind of file URL so that FFMPEG can access the stream.
IMediaWriter writer = ToolFactory.makeWriter(XugglerIO.map(outputStream));
Keep in mind that you'll now have to manually set your format (because it can't detect it from the filename). For example:
IContainerFormat containerFormat = IContainerFormat.make();
containerFormat.setOutputFormat("ogg", null, "application/ogg");
writer.getContainer().setFormat(containerFormat);
