Java: How to "trim" a byte array?

So I have some code that reads a certain number of bytes from a file and returns the resulting byte array (this is basically used for chunking up files to send over the network as (eventually) base64-encoded ASCII text).
It works fine, except that the last chunk of the file isn't a full chunk, so the resulting byte array isn't full. However, the array is a constant size, which means that when the file is reassembled there is a whole bunch of extra data (0's maybe) appended to the end.
How can I make it so that the byte[] for the last chunk of the file contains only the data it needs? The code looks like this:
private byte[] readData(File f, int startByte, int chunkSize) throws Exception {
RandomAccessFile raf = new RandomAccessFile(f, "r");
raf.seek(startByte);
byte[] data = new byte[chunkSize];
raf.read(data);
raf.close();
return data;
}
So if chunkSize is bigger than the remaining bytes in the file, a full-sized byte[] gets returned, but it's only partly filled with data.

You'll have to check the return value of RandomAccessFile.read() to determine the number of bytes read. If it's different than the chunkSize, you'll have to copy the array over to a smaller one and return that.
private byte[] readData(File f, int startByte, int chunkSize) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(f, "r");
    try {
        raf.seek(startByte);
        byte[] data = new byte[chunkSize];
        int bytesRead = raf.read(data);
        if (bytesRead < 0) {
            // read() returns -1 at end of file; there is nothing to return
            return new byte[0];
        }
        if (bytesRead != chunkSize) {
            // Short read: copy only the bytes that were actually read
            byte[] smallerData = new byte[bytesRead];
            System.arraycopy(data, 0, smallerData, 0, bytesRead);
            data = smallerData;
        }
        return data;
    } finally {
        raf.close();
    }
}

RandomAccessFile.read() returns the number of bytes read, so you can copy the array if needed:
private byte[] readData(File f, int startByte, int chunkSize) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(f, "r");
    raf.seek(startByte);
    byte[] data = new byte[chunkSize];
    int read = raf.read(data);
    raf.close();
    if (read == data.length) return data;
    // read is -1 at end of file, so clamp it to 0 before copying
    return Arrays.copyOf(data, Math.max(read, 0));
}
If you are using Java pre-6, then you need to implement Arrays.copyOf yourself:
byte[] r = new byte[read];
System.arraycopy(data, 0, r, 0, read);
return r;

You could also use the size of the file to calculate the remaining number of bytes.
private byte[] readData(File f, int startByte, int chunkSize) throws Exception {
RandomAccessFile raf = new RandomAccessFile(f, "r");
raf.seek(startByte);
int size = (int) Math.min(chunkSize, raf.length()-startByte);
byte[] data = new byte[size];
raf.read(data);
// TODO check the value returned by read (throw Exception or loop)
raf.close();
return data;
}
This way you don't create an additional Array and do not need the copy. Probably not a big impact.
One important point IMO: check the value returned by read, I think it can be less than the remaining bytes. The javadoc states:
The number of bytes read is, at most, equal to the length of b
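To act on the caveat above (a single read may return fewer bytes than requested), one option is RandomAccessFile.readFully, which keeps reading until the array is full. The following is a minimal sketch, not the original poster's code: the class name ChunkReader, the method name readChunk, and the use of a long offset are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ChunkReader {
    // Hypothetical helper: reads up to chunkSize bytes starting at startByte.
    // The returned array is sized to the data actually available
    // (empty when startByte is at or past the end of the file).
    static byte[] readChunk(File f, long startByte, int chunkSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            long remaining = raf.length() - startByte;
            if (remaining <= 0) {
                return new byte[0];
            }
            int size = (int) Math.min(chunkSize, remaining);
            byte[] data = new byte[size];
            raf.seek(startByte);
            // readFully loops internally until the array is full,
            // or throws EOFException if the file ends early
            raf.readFully(data);
            return data;
        }
    }
}
```

Because the array is sized from the file length before reading, no second array or copy is needed, and try-with-resources closes the file even if the read fails.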

Related

Java Reading large files into byte array chunk by chunk

So I've been trying to make a small program that inputs a file into a byte array, then it will turn that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save it as a custom file.
I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).
This is the code that is not a complete failure
public void rundis(Path pp) {
byte bb[] = null;
try {
bb = Files.readAllBytes(pp); //Files.toByteArray(pathhold);
System.out.println("byte array made");
} catch (Exception e) {
e.printStackTrace();
}
if (bb != null && bb.length != 0) { // check for null first, otherwise bb.length throws a NullPointerException
System.out.println("byte array filled");
//send to method to turn into hex
} else {
System.out.println("byte array NOT filled");
}
}
I know how the process should go, but I don't know how to code that properly.
The process, if you are interested:
1. Input file using File
2. Read the file chunk by chunk into a byte array, e.g. each byte array holds 600 bytes
3. Send that chunk to be turned into a hex value --> Integer.toHexString
4. Send that hex value chunk to be made into a binary value --> Integer.toBinaryString
5. Mess around with the binary value
6. Save to custom file line by line
Problem: I don't know how to turn a huge file into a byte array chunk by chunk to be processed.
Any and all help will be appreciated, thank you for reading :)
To chunk your input use a FileInputStream:
Path pp = FileSystems.getDefault().getPath("logs", "access.log");
final int BUFFER_SIZE = 1024*1024; //this is actually bytes
FileInputStream fis = new FileInputStream(pp.toFile());
byte[] buffer = new byte[BUFFER_SIZE];
int read = 0;
while( ( read = fis.read( buffer ) ) > 0 ){
// call your other methods here; only the first 'read' bytes of buffer are valid
}
fis.close();
To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files, but as you noticed not so much for large files.
In pseudocode it would look something like this:
while there are more bytes available
read some bytes
process those bytes
(write the result back to a file, if needed)
In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:
FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));
We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:
byte[] buf = new byte[4096];
How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:
int read = is.read(buf);
This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 at the end of the file. Then we process the bytes:
//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);
process() in above example is your processing method. It takes in a byte-array, the number of bytes it should process and returns the result as byte-array.
Last, we write the result back to a file:
os.write(ret);
We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:
int read = 0;
while((read = is.read(buf)) > 0) {
byte[] ret = process(buf, read);
os.write(ret);
}
and finally close the streams
is.close();
os.close();
And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could send it over TCP, drop it if it's not needed, or even read from TCP instead of a file. The basic logic is the same.
This still needs proper error handling to deal with missing files or wrong permissions, but that's up to you to implement.
An example implementation for the process method:
//returns the hex-representation of the bytes (two ASCII hex digits per input byte)
public static byte[] process(byte[] bytes, int length) {
    // needs: import java.nio.charset.StandardCharsets;
    final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    byte[] ret = new byte[length * 2]; // byte[] rather than char[], to match the return type
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF;
        ret[i * 2] = hexchars[b >>> 4];
        ret[i * 2 + 1] = hexchars[b & 0x0F];
    }
    return ret;
}
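The answer leaves error handling and stream cleanup to the reader. As one possible shape for it, here is a minimal sketch that puts the read/process/write loop into a single method and uses try-with-resources so both streams are closed even when an exception is thrown. The class name HexStreamer and the method names hexEncode and hexEncodeFile are hypothetical, chosen only for this illustration.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class HexStreamer {
    private static final byte[] HEX = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);

    // Reads 'in' in 4096-byte chunks and writes two hex digits per input byte to 'out'.
    // Returns the number of input bytes processed.
    static long hexEncode(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        byte[] hex = new byte[buf.length * 2];
        long total = 0;
        int read;
        while ((read = in.read(buf)) != -1) {
            for (int i = 0; i < read; i++) {
                int b = buf[i] & 0xFF;
                hex[i * 2] = HEX[b >>> 4];
                hex[i * 2 + 1] = HEX[b & 0x0F];
            }
            out.write(hex, 0, read * 2); // write only the part filled in this round
            total += read;
        }
        return total;
    }

    // File-to-file variant; try-with-resources closes both streams on success or failure.
    static long hexEncodeFile(String inputPath, String outputPath) throws IOException {
        try (InputStream in = new FileInputStream(inputPath);
             OutputStream out = new FileOutputStream(outputPath)) {
            return hexEncode(in, out);
        }
    }
}
```

Reusing one output buffer per chunk avoids allocating a new array on every iteration, which matters when the input is large.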

Reading binary from any type of file

I'm looking for a way that I can read the binary data of a file into a string.
I've found one that reads the bytes directly and converts the bytes to binary, the only problem is that it takes up a significant amount of RAM.
Here's the code I'm currently using
try {
byte[] fileData = new byte[(int) sellect.length()];
FileInputStream in = new FileInputStream(sellect);
in.read(fileData);
in.close();
getBinary(fileData[0]);
getBinary(fileData[1]);
getBinary(fileData[2]);
} catch (IOException e) {
e.printStackTrace();
}
And the getBinary() method
public String getBinary(byte bite) {
String output = String.format("%8s", Integer.toBinaryString(bite & 0xFF)).replace(' ', '0');
System.out.println(output); // 10000001
return output;
}
Can you do something like this:
int buffersize = 1000;
byte[] fileData = new byte[buffersize];
int numBytesRead;
String string;
// The array offset stays 0: each read() reuses the buffer from the start
while ((numBytesRead = in.read(fileData, 0, buffersize)) != -1)
{
    string = getBinary(fileData); // Adjust this so it can work with a whole array of bytes at once (only the first numBytesRead bytes are valid)
    out.write(string);
}
This way, you never hold more in RAM than the byte buffer and the string for one chunk. The file is read up to 1000 bytes at a time, each chunk is translated to a string, and the string is written to a new file. read() returns the number of bytes it actually read, or -1 at the end of the file.
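The loop above asks for getBinary to be adjusted so that it handles a whole array of bytes at once. A minimal sketch of such a method follows; the class name BinaryText and the method name toBinary are hypothetical, and the second parameter is an assumption added so that only the bytes actually read are converted.

```java
final class BinaryText {
    // Hypothetical helper: converts the first 'length' bytes of 'data' to a string
    // of '0' and '1' characters, eight per byte, most significant bit first.
    static String toBinary(byte[] data, int length) {
        StringBuilder sb = new StringBuilder(length * 8);
        for (int i = 0; i < length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            for (int bit = 7; bit >= 0; bit--) {
                sb.append(((b >> bit) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }
}
```

Inside the loop this would be called as toBinary(fileData, numBytesRead). Note that the binary text is eight times the size of the input, so writing it out chunk by chunk rather than collecting it in one string is what keeps memory use bounded.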
This link can help you: File to byte[] in Java
public static byte[] toByteArray(InputStream input) throws IOException
Gets the contents of an InputStream as a byte[]. This method buffers the input internally, so there is no need to use a BufferedInputStream.
Parameters: input - the InputStream to read from
Returns: the requested byte array
Throws: NullPointerException - if the input is null; IOException - if an I/O error occurs

java: read large binary file

I need to read out a given large file that contains 500000001 binaries. Afterwards I have to translate them into ASCII.
My Problem occurs while trying to store the binaries in a large array. I get the warning at the definition of the array ioBuf:
"The literal 16000000032 of type int is out of range."
I have no clue how to save these numbers to work with them! Has somebody an idea?
Here is my code:
public byte[] read(){
try{
BufferedInputStream in = new BufferedInputStream(new FileInputStream("data.dat"));
ByteArrayOutputStream bs = new ByteArrayOutputStream();
BufferedOutputStream out = new BufferedOutputStream(bs);
byte[] ioBuf = new byte[16000000032];
int bytesRead;
while ((bytesRead = in.read(ioBuf)) != -1){
out.write(ioBuf, 0, bytesRead);
}
out.close();
in.close();
return bs.toByteArray();
}
The maximum length of an array is Integer.MAX_VALUE, and 16000000032 is greater than Integer.MAX_VALUE:
Integer.MAX_VALUE = 2^31-1 = 2147483647
2147483647 < 16000000032
You could overcome this by checking whether the array is full, then creating another one and continuing to read.
But I'm not sure your approach is the best way to do this. byte[Integer.MAX_VALUE] is huge ;)
Maybe you can split the input file into smaller chunks and process them one at a time.
EDIT: This is how you could read a single int of your file. You can resize the buffer's size to the amount of data you want to read. But you tried to read the whole file at once.
//Allocate buffer with 4byte = 32bit = Integer.SIZE
byte[] ioBuf = new byte[4];
int bytesRead;
while ((bytesRead = in.read(ioBuf)) != -1){
//if bytesRead == 4 you read 1 int
//do your stuff
}
If you need to declare a large constant, append an 'L' to it, which tells the compiler that it is a long constant. However, as mentioned in another answer, you can't declare arrays that large.
I suspect the purpose of the exercise is to learn how to use the java.nio.Buffer family of classes.
I made some progress by starting from scratch! But I still have a problem.
My idea is to read the first 32 bytes and convert them to an int number, then the next 32 bytes, and so on. Unfortunately I only get the first number and don't know how to proceed.
I discovered the following method for converting these numbers to int:
public static int byteArrayToInt(byte[] b){
final ByteBuffer bb = ByteBuffer.wrap(b);
bb.order(ByteOrder.LITTLE_ENDIAN);
return bb.getInt();
}
so now I have:
BufferedInputStream in=null;
byte[] buf = new byte[32];
try {
in = new BufferedInputStream(new FileInputStream("ndata.dat"));
in.read(buf);
System.out.println(byteArrayToInt(buf));
in.close();
} catch (IOException e) {
System.out.println("error while reading ndata.dat file");
}
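The snippet above reads only once, which is why only the first number appears. A Java int is 32 bits, that is 4 bytes, so a 4-byte buffer read in a loop is one way to walk through the file. The sketch below assumes the file is a plain sequence of little-endian 4-byte integers (as the ByteOrder.LITTLE_ENDIAN call above suggests); the class name IntReader and the method name forEachLittleEndianInt are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.function.IntConsumer;

final class IntReader {
    // Reads consecutive 4-byte little-endian integers until the stream ends and
    // passes each one to 'action'. Returns how many integers were read.
    // Assumption: trailing bytes that do not form a full int are ignored.
    static long forEachLittleEndianInt(InputStream in, IntConsumer action) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte[] buf = new byte[4];
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        long count = 0;
        while (true) {
            try {
                din.readFully(buf); // fills all 4 bytes or throws EOFException
            } catch (EOFException e) {
                break;
            }
            action.accept(bb.getInt(0)); // absolute get, does not move the buffer position
            count++;
        }
        return count;
    }
}
```

With a file, the stream would typically be new BufferedInputStream(new FileInputStream("ndata.dat")) inside a try-with-resources block. Handing each value to a callback instead of storing it means the whole file never has to fit in memory.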

How to use ByteStream to read 1Mb of a file into a string

What I have now is using FileInputStream
int length = 1024*1024;
FileInputStream fs = new FileInputStream(new File("foo"));
fs.skip(offset);
byte[] buf = new byte[length];
int bufferSize = fs.read(buf, 0, length);
String s = new String(buf, 0, bufferSize);
I'm wondering how can I realize the same result by using ByteStreams in guava library.
Thanks a lot!
Here's how you could do it with Guava:
byte[] bytes = Files.asByteSource(new File("foo"))
.slice(offset, length)
.read();
String s = new String(bytes, Charsets.US_ASCII);
There are a couple of problems with your code (though it may work fine for files, it won't necessarily for any type of stream):
fs.skip(offset);
This doesn't necessarily skip all offset bytes. You have to either check the number of bytes it skipped in the return value until you've skipped the full amount or use something that does that for you, such as ByteStreams.skipFully.
int bufferSize = fs.read(buf, 0, length);
Again, this won't necessarily read all length bytes, and the number of bytes it does read can be an arbitrary amount--you can't rely on it in general.
String s = new String(buf, 0, bufferSize);
This implicitly uses the system default Charset, which usually isn't a good idea--and when you do want it, it's best to make it explicit with Charset.defaultCharset().
Also note that, in general, an arbitrary number of bytes may not translate to a legal sequence of characters, depending on the Charset being used: with ASCII you're fine, but with a multi-byte encoding such as UTF-8 or UTF-16 the cut can fall in the middle of a character.
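For readers who want to stay with plain java.io, the first two problems above (partial skips and partial reads) can be handled with small loops. This is a minimal sketch under that assumption, not Guava's implementation; the class name StreamSlice and its method names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class StreamSlice {
    // Skips exactly n bytes, or throws EOFException if the stream ends first.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) { // skip() may return 0 without being at EOF
                throw new EOFException("stream ended before skip completed");
            } else {
                n--; // the single read() consumed one byte
            }
        }
    }

    // Reads until 'length' bytes are in buf or the stream ends; returns the count read.
    static int readUpTo(InputStream in, byte[] buf, int length) throws IOException {
        int total = 0;
        while (total < length) {
            int r = in.read(buf, total, length - total);
            if (r == -1) {
                break;
            }
            total += r;
        }
        return total;
    }

    // Skips 'offset' bytes, reads up to 'length' bytes, and decodes them with an explicit charset.
    static String readString(InputStream in, long offset, int length, Charset cs) throws IOException {
        skipFully(in, offset);
        byte[] buf = new byte[length];
        int n = readUpTo(in, buf, length);
        return new String(buf, 0, n, cs);
    }
}
```

Passing the Charset explicitly addresses the third point. The multi-byte boundary issue is not solved here: a slice that ends in the middle of a character will still decode with a replacement character at the end.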
Why use Guava when it's not necessary?
In this case, it looks like you're looking exactly for a RandomAccessFile.
File file = new File("foo");
long offset = ... ;
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
byte[] buffer = new byte[1024*1024]; // 1 MB
raf.seek(offset);
raf.readFully(buffer);
return new String(buffer, Charset.defaultCharset());
}
I'm not aware of a more elegant solution:
public static void main(String[] args) throws IOException {
final int offset = 20;
StringBuilder to = new StringBuilder();
CharStreams.copy(CharStreams.newReaderSupplier(new InputSupplier<InputStream>() {
@Override
public InputStream getInput() throws IOException {
FileInputStream fs = new FileInputStream(new File("pom.xml"));
ByteStreams.skipFully(fs, offset);
return fs;
}
}, Charset.defaultCharset()), to);
System.out.println(to);
}
The only advantage is that you can save some GC time when your String is really big by avoiding conversion into String.

Reading a binary input stream into a single byte array in Java

The documentation says that one should not use available() method to determine the size of an InputStream. How can I read the whole content of an InputStream into a byte array?
InputStream in; //assuming already present
byte[] data = new byte[in.available()];
in.read(data);//now data is filled with the whole content of the InputStream
I could read multiple times into a buffer of a fixed size, but then, I will have to combine the data I read into a single byte array, which is a problem for me.
The simplest approach IMO is to use Guava and its ByteStreams class:
byte[] bytes = ByteStreams.toByteArray(in);
Or for a file:
byte[] bytes = Files.toByteArray(file);
Alternatively (if you didn't want to use Guava), you could create a ByteArrayOutputStream, and repeatedly read into a byte array and write into the ByteArrayOutputStream (letting that handle resizing), then call ByteArrayOutputStream.toByteArray().
Note that this approach works whether you can tell the length of your input or not - assuming you have enough memory, of course.
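The ByteArrayOutputStream alternative described above can be sketched as follows. The class name StreamBytes and the 8192-byte buffer size are arbitrary choices for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {
    // Reads the whole stream into memory without knowing its length in advance.
    // ByteArrayOutputStream grows its internal array as needed.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // write only the bytes read in this round
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, InputStream.readAllBytes() does the same job in a single call. Either way the result is limited by the maximum array length and by available heap.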
Please keep in mind that the answers here assume that the length of the file is less than or equal to Integer.MAX_VALUE (2147483647).
If you are reading in from a file, you can do something like this:
File file = new File("myFile");
byte[] fileData = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(fileData);
dis.close();
UPDATE (May 31, 2014):
Java 7 adds some new features in the java.nio.file package that can be used to make this example a few lines shorter. See the readAllBytes() method in the java.nio.file.Files class. Here is a short example:
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
// ...
Path p = FileSystems.getDefault().getPath("", "myFile");
byte [] fileData = Files.readAllBytes(p);
Android has support for this starting in Api level 26 (8.0.0, Oreo).
You can use Apache commons-io for this task:
Refer to this method:
public static byte[] readFileToByteArray(File file) throws IOException
Update:
Java 7 way:
byte[] bytes = Files.readAllBytes(Paths.get(filename));
and if it is a text file and you want to convert it to String (change encoding as needed):
StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString()
You can read it by chunks (byte buffer[] = new byte[2048]) and write the chunks to a ByteArrayOutputStream. From the ByteArrayOutputStream you can retrieve the contents as a byte[], without needing to determine its size beforehand.
I believe buffer length needs to be specified, as memory is finite and you may run out of it
Example:
File fileFileName = new File(strFileName);
InputStream in = new FileInputStream(fileFileName);
long length = fileFileName.length();
if (length > Integer.MAX_VALUE) {
throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException("Could not completely read file " + fileFileName.getName());
}
in.close();
The maximum array length is Integer.MAX_VALUE, which is about 2 GB for a byte array (2^31 - 1 = 2,147,483,647).
Your input stream can be bigger than 2 GB, so you have to process data in chunks, sorry.
InputStream is; // assumed to be already opened
final byte[] buffer = new byte[512 * 1024 * 1024]; // 512 MB
while(true) {
final int read = is.read(buffer);
if ( read < 0 ) {
break;
}
// do processing
}
