Java: is it possible to store raw data in a source file?

OK, I know this is a bit of a weird question:
I'm writing this piece of Java code and need to load raw data (approximately 130,000 floating-point values):
This data never changes, and since I don't want to write different loading methods for PC and Android, I was thinking of embedding it into the source file as a float[].
Too bad, there seems to be a limit of 65535 entries (a large array initializer exceeds the 64 KB bytecode limit for a single method); is there an efficient way to do it?

Store that data in a file on the classpath; then read that data as a ByteBuffer which you then "convert" to a FloatBuffer. Note that the code below assumes big-endian data:
final InputStream in = getClass().getResourceAsStream("/path/to/data");
final ByteArrayOutputStream out = new ByteArrayOutputStream();
final byte[] chunk = new byte[8192];
int count;
try {
    // Copy the resource into memory in 8 KB chunks.
    while ((count = in.read(chunk)) != -1)
        out.write(chunk, 0, count);
} finally {
    out.close();
    in.close();
}
// Wrap the bytes and view them as big-endian floats (the ByteBuffer default).
final FloatBuffer floatBuffer = ByteBuffer.wrap(out.toByteArray()).asFloatBuffer();
You can then .get() from the FloatBuffer.
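For instance, a minimal sketch of draining the buffer into a plain float[] for indexed access:
// Bulk-copy the decoded floats out of the buffer.
final float[] data = new float[floatBuffer.remaining()];
floatBuffer.get(data); // advances the buffer to its limit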

You could use 2 or 3 arrays to get around the limit, if that was your only problem with that approach.
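A rough sketch of that workaround; the initializers are placeholders for the real data, and each chunk lives in its own method because the 64 KB bytecode limit applies per method (including the static initializer):
// Each method's array initializer stays under the per-method bytecode limit.
static float[] part1() { return new float[] { /* first chunk of values */ }; }
static float[] part2() { return new float[] { /* second chunk of values */ }; }

static float[] allData() {
    float[] p1 = part1(), p2 = part2();
    float[] all = new float[p1.length + p2.length];
    System.arraycopy(p1, 0, all, 0, p1.length);
    System.arraycopy(p2, 0, all, p1.length, p2.length);
    return all;
}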


Java: read large binary file

I need to read a given large file that contains 500000001 binary numbers. Afterwards I have to translate them into ASCII.
My problem occurs while trying to store the numbers in a large array. I get this error at the definition of the array ioBuf:
"The literal 16000000032 of type int is out of range."
I have no clue how to store these numbers so I can work with them! Does somebody have an idea?
Here is my code:
public byte[] read() {
    try {
        BufferedInputStream in = new BufferedInputStream(new FileInputStream("data.dat"));
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(bs);
        byte[] ioBuf = new byte[16000000032]; // <-- "The literal 16000000032 of type int is out of range."
        int bytesRead;
        while ((bytesRead = in.read(ioBuf)) != -1) {
            out.write(ioBuf, 0, bytesRead);
        }
        out.close();
        in.close();
        return bs.toByteArray();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
The maximum index of an array is Integer.MAX_VALUE, and 16000000032 is greater than Integer.MAX_VALUE:
Integer.MAX_VALUE = 2^31 - 1 = 2147483647
2147483647 < 16000000032
You could work around this by checking whether the array is full, then creating another one and continuing to read into it.
But I'm not quite sure your approach is the best way to do this; byte[Integer.MAX_VALUE] is huge ;)
Maybe you can split the input file into smaller chunks and process them.
EDIT: This is how you could read a single int from your file. You can adjust the buffer size to the amount of data you want to read per pass, but you tried to read the whole file at once:
// Allocate a buffer of 4 bytes = 32 bits = Integer.SIZE bits
byte[] ioBuf = new byte[4];
int bytesRead;
while ((bytesRead = in.read(ioBuf)) != -1) {
    // if bytesRead == 4, you have read one complete int
    // do your stuff
}
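Alternatively, a minimal sketch using DataInputStream, assuming the file holds the 500000001 values as raw 32-bit integers (readInt() consumes exactly 4 big-endian bytes per call):
static void readAll() throws IOException {
    try (DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream("data.dat")))) {
        for (long i = 0; i < 500000001L; i++) {
            int value = in.readInt(); // exactly 4 bytes, big-endian
            // use Integer.reverseBytes(value) if the file is little-endian
            // ... process value ...
        }
    }
}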
If you need to declare a large constant, append an 'L' to it, which indicates to the compiler that it is a long constant. However, as mentioned in another answer, you can't declare arrays that large.
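For example:
long size = 16000000032L; // the 'L' suffix makes this literal a long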
I suspect the purpose of the exercise is to learn how to use the java.nio.Buffer family of classes.
I made some progress by starting from scratch! But I still have a problem.
My idea is to read the first 32 bits and convert them to an int, then the next 32 bits, etc. Unfortunately I just get the first one and don't know how to proceed.
I discovered the following method for converting these numbers to int:
public static int byteArrayToInt(byte[] b) {
    final ByteBuffer bb = ByteBuffer.wrap(b);
    bb.order(ByteOrder.LITTLE_ENDIAN);
    return bb.getInt();
}
so now I have:
BufferedInputStream in = null;
byte[] buf = new byte[4]; // 4 bytes = one 32-bit int
try {
    in = new BufferedInputStream(new FileInputStream("ndata.dat"));
    in.read(buf);
    System.out.println(byteArrayToInt(buf));
    in.close();
} catch (IOException e) {
    System.out.println("error while reading ndata.dat file");
}

Extract tar.gz file in memory in Java

I'm using the Apache Compress library to read a .tar.gz file, something like this:
final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
try {
    TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
    while (tarEntry != null) {
        byte[] btoRead = new byte[1024];
        BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); // <- I don't want this!
        int len = 0;
        while ((len = tarIn.read(btoRead)) != -1) {
            bout.write(btoRead, 0, len);
        }
        bout.close();
        tarEntry = tarIn.getNextTarEntry();
    }
    tarIn.close();
} catch (IOException e) {
    e.printStackTrace();
}
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
You could replace the file stream with a ByteArrayOutputStream.
i.e. replace this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
with this:
ByteArrayOutputStream bout = new ByteArrayOutputStream();
and then after closing bout, use bout.toByteArray() to get the bytes.
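Putting it together, a sketch of the question's loop with the in-memory stream (what you do with each entry's bytes is up to you):
TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
while (tarEntry != null) {
    // Collect this entry entirely in memory instead of on disk.
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    byte[] btoRead = new byte[1024];
    int len;
    while ((len = tarIn.read(btoRead)) != -1) {
        bout.write(btoRead, 0, len);
    }
    byte[] entryData = bout.toByteArray(); // the entry, fully in memory
    // ... use entryData, e.g. keyed by tarEntry.getName() ...
    tarEntry = tarIn.getNextTarEntry();
}
tarIn.close();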
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
Yes, sure.
Just replace the code in the inner loop that is opening files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding, you are liable to mangle it ... irreversibly.)
Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
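For example, if you know an entry holds UTF-8 text (entryData here is the byte array from the sketch above):
// Only safe when the bytes really are text in the charset you name.
String text = new String(entryData, java.nio.charset.StandardCharsets.UTF_8);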
Copy the value of btoRead to a String, like:
String s = String.valueOf(byteVar);
and keep appending the byte values to the string until the end of the file is reached.

FileOutputStream: Something That I Am Missing?

I have this program that reads 2 KB of data from a binary file, adds a header to it, and then writes it to a new file.
The code is
try {
    FileInputStream fis = new FileInputStream(bin);
    FileOutputStream fos = new FileOutputStream(bin.getName().replace(".bin", ".xyz"));
    DataOutputStream dos = new DataOutputStream(fos);
    fos.write(big, 0, big.length);
    for (int n = 1; n <= pcount; n++) {
        fis.read(file, mark, 2048);
        mark = mark + 2048;
        prbar.setValue(n);
        prbar.setString("Converted packets:" + String.valueOf(n));
        metas = "2048";
        meta = metas.getBytes();
        pc = String.valueOf(file.length).getBytes();
        nval = String.valueOf(n).getBytes();
        System.arraycopy(pc, 0, bmeta, 0, pc.length);
        System.arraycopy(meta, 0, bmeta, 4, meta.length);
        System.arraycopy(nval, 0, bmeta, 8, nval.length);
        fos.write(bmeta, 0, bmeta.length);
        fos.flush();
        fos.write(file, 0, 2048);
        fos.flush();
    }
} catch (Exception ex) {
    erlabel.setText(ex.getMessage());
}
First it should write the header and then the file. But the output file is full of data that does not belong to it; some garbage data is being written. What may be the problem?
It's not quite clear with some of the declarations missing, but it looks like your problem is with the fis.read() method: the second argument is an offset into the byte array, not into the file (a common mistake).
You probably want to use relative reads. You also need to check the return value from .read() to see how many bytes were actually read, before writing the buffer out.
The common idiom is:
InputStream is = ...
OutputStream os = ...
byte[] buf = new byte[2048];
int len;
while ((len = is.read(buf)) != -1)
    os.write(buf, 0, len);
is.close();
os.close();
Edit
That's a pretty weird way of writing out your metadata; I assume that's what the (unused) DataOutputStream is for?
You don't need to keep flushing the output stream, just close it when you're done.
In addition to what @Dmitri has pointed out, there is something seriously wrong with the way you are writing the metadata.
You are writing the metadata every time around the loop, which cannot be right.
You are essentially allocating 4 bytes for it, via "2048".getBytes(), then copying many more than 4 bytes into it, then writing the 4 bytes. This cannot be right either; in fact it should really be throwing ArrayIndexOutOfBoundsExceptions at you.
It looks as though the metadata is supposed to contain three binary integers. However, you are putting String data into it. I suspect you should be using DataOutputStream.writeInt() directly for these fields, without all the String.valueOf()/getBytes() and System.arraycopy() nonsense.
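A sketch of that approach, reusing the dos stream already created in the question, and assuming (since the declarations are missing) that the three fields are meant to be the total file length, the per-packet payload size, and the packet number:
// Write the three header fields as raw big-endian ints, then the payload.
dos.writeInt(file.length); // total length field
dos.writeInt(2048);        // payload size for this packet
dos.writeInt(n);           // packet number
dos.write(file, 0, 2048);  // the 2 KB payload itself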
I would suggest using a community-supported library like Apache Commons IO for I/O features.
There are useful classes and methods:
org.apache.commons.io.DirectoryWalker;
org.apache.commons.io.FileUtils;
org.apache.commons.io.IOCase;
FileUtils.copyDirectory(from, to);
FileUtils.writeByteArrayToFile(file, data);
FileUtils.writeStringToFile(file, data);
FileUtils.deleteDirectory(dir);
FileUtils.forceDelete(dir);

Compressing and decompressing large size data in java?

I need to compress/decompress different types of files that are contained in a folder; the size of that folder might be more than 10-11 GB.
I used the following code, but it is taking a long time to compress the data.
BufferedReader in = new BufferedReader(new FileReader("D:/ziptest/expansion1.MPQ"));
BufferedOutputStream out = new BufferedOutputStream(
        new GZIPOutputStream(new FileOutputStream("test.gz")));
int c;
while ((c = in.read()) != -1)
    out.write(c);
in.close();
out.close();
Please suggest a fast compression/decompression library for Java. I also want to split the large file into parts, such as chunks of 100 MB each.
Reader/Writer is only for text, and if you try to read binary data with these it will get corrupted.
Instead I suggest you use FileInputStream. The fastest way to copy the data is to use your own buffer:
InputStream in = new FileInputStream("D:/ziptest/expansion1.MPQ");
OutputStream out = new GZIPOutputStream(
        new BufferedOutputStream(new FileOutputStream("test.gz")));
byte[] bytes = new byte[32 * 1024];
int len;
while ((len = in.read(bytes)) > 0)
    out.write(bytes, 0, len);
in.close();
out.close();
Since you are reading large chunks of bytes, it is more efficient not to use a BufferedInputStream, as this removes one copy. There is a BufferedOutputStream after the GZIPOutputStream because you cannot control the size of the data it produces.
BTW: If you are only reading this with Java, you can use DeflaterOutputStream; it's slightly faster and produces smaller output, but AFAIK it's only supported by Java.
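For example, a minimal sketch of that variant (the output file name here is arbitrary; the copy loop above stays the same):
// Raw DEFLATE output: no gzip header/trailer, so only readers that
// expect this format (e.g. InflaterInputStream) can decode it.
OutputStream out = new DeflaterOutputStream(
        new BufferedOutputStream(new FileOutputStream("test.deflate")));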

Java reading file into memory and how not to blow up memory

I'm a bit of a newbie in Java and I'm trying to perform a MAC calculation on a file.
Now, since the size of the file is not known in advance, I can't just load the whole file into memory, so I wrote the code to read it in chunks (4 KB in this case).
The issue I'm having: I also loaded the entire file into memory to check that both methods produce the same hash; however, they seem to produce different hashes.
Here's the bit by bit code:
FileInputStream fis = new FileInputStream("sbs.dat");
byte[] file = new byte[4096];
m = Mac.getInstance("HmacSHA1");
int i=fis.read(file);
m.init(key);
while (i != -1)
{
m.update(file);
i=fis.read(file);
}
mac = m.doFinal();
And here's the all at once approach:
File f = new File("sbs.dat");
long size = f.length();
byte[] file = new byte[(int) size];
fis.read(file);
m = Mac.getInstance("HmacSHA1");
m.init(key);
m.update(file);
mac = m.doFinal();
Shouldn't they both produce the same hash?
The question, however, is more generic: is the first snippet the correct way of loading a file into memory in pieces and performing whatever we want inside the while loop (socket send, enciphering a file, etc.)?
This question is useful because every tutorial I've seen just loads everything at once...
Update: working :-D. Will this approach also work for sending a file in pieces through a socket?
No. You have no guarantee that fis.read(file) will read file.length bytes. This is why read() returns an int telling you how many bytes it has actually read.
You should instead do this:
m.init(key);
int i = fis.read(file);
while (i != -1) {
    m.update(file, 0, i);
    i = fis.read(file);
}
taking advantage of the Mac.update(byte[] data, int offset, int len) method, which allows you to specify the length of the actual data in the byte[] array.
The read function will not necessarily fill up your entire array. So you need to check how many bytes were returned by the read function, and only use that many bytes of your buffer.
As Jason LeBrun says, the read method will not always read the specified number of bytes. For example: what do you think will happen if the file's size is not a multiple of 4096 bytes?
I would go for something like this:
FileInputStream fis = new FileInputStream(filename);
byte[] buffer = new byte[buffersize];
Mac m = Mac.getInstance("HmacSHA1");
m.init(key);
int n;
while ((n = fis.read(buffer)) != -1) {
    m.update(buffer, 0, n);
}
byte[] mac = m.doFinal();
