My DeflaterOutputStream/InputStream code corrupting data - java

I've got a simple test case that fails to compress a stream of data. I generate a byte[] of some random bytes, compress it via DeflaterOutputStream, flush() the stream, then reverse those operations to retrieve the original array. At byte 505 the reconstructed stream starts to consist entirely of 0x00 bytes, and I don't understand why:
//
// create some random bytes
//
Random rng = new Random();
int len = 5000;
byte[] data = new byte[len];
for (int i = 0; i < len; ++i)
data[i] = (byte) rng.nextInt(0xff);
//
// write to byte[] via a deflater stream
//
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DeflaterOutputStream os = new DeflaterOutputStream(baos, true);
os.write(data);
os.flush();
//
// read back into byte[] via an inflater stream
//
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
InflaterInputStream is = new InflaterInputStream(bais);
byte[] readbytes = new byte[len];
is.read(readbytes);
//
// check they match (they don't, at byte 505)
//
for (int i = 0; i < len; ++i)
if (data[i] != readbytes[i])
throw new RuntimeException("Mismatch at position " + i);
It doesn't seem to matter what's in the source array, it's always at position 505 it fails.
Here's what the two byte[] arrays look like around the region they differ:
?\m·g··gWNLErZ···,··-··=·;n=··F?···13·{·rw·······\`3···f····{/····t·1·WK$·······WZ······x
?\m·g··gWNLErZ···,··-····································································
^byte 505
All those unprintable chars are 0x00 from that point on. Why is this happening? I feel like I must be misunderstanding something fundamental about how the Deflate/Inflate streams work. The real-world use case here is a stream over a network that I thought I could easily improve the performance of by inserting Deflate/Inflate streams into

When I test this, is.read(readbytes) returns 505, the number of bytes actually read. Methods such as OutputStream.write(byte[]) and DataInputStream.readFully(byte[]) return void and guarantee that the entire array is written or read, but InputStream.read(byte[]) makes no such guarantee: it may fill only part of the array, so you must check how many bytes it returned.
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
System.err.println( "bais size = " + bais.available() );
InflaterInputStream is = new InflaterInputStream(bais);
byte[] readbytes = new byte[len];
System.err.println( "read = " + is.read(readbytes) ); // 505
This runs without throwing an error for me:
ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
System.err.println( "bais size = " + bais.available() );
InflaterInputStream is = new InflaterInputStream(bais);
byte[] readbytes = new byte[len];
for( int total = 0, result = 0; (result = is.read(readbytes, total, len-total )) != -1; )
{
total += result;
System.err.println( "reading : " + total );
if( total == len ) break;
}
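The same loop can be packaged as a reusable helper. The following is a minimal sketch, not a definitive implementation: the class name InflateRoundTrip and the methods readExactly and roundTrip are hypothetical names introduced here for illustration. It assumes the caller knows how many uncompressed bytes to expect, as in the test case from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class InflateRoundTrip {

    // Hypothetical helper: keep calling read() until len bytes have arrived.
    static byte[] readExactly(InputStream in, int len) throws IOException {
        byte[] out = new byte[len];
        int total = 0;
        while (total < len) {
            // read() may deliver fewer bytes than requested, so ask for the rest
            int n = in.read(out, total, len - total);
            if (n == -1) {
                throw new EOFException("stream ended after " + total + " bytes");
            }
            total += n;
        }
        return out;
    }

    // Compress with a sync-flushing deflater, then inflate the same number of bytes.
    static byte[] roundTrip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DeflaterOutputStream os = new DeflaterOutputStream(baos, true);
        os.write(data);
        os.flush(); // with syncFlush=true this pushes pending compressed data out
        InputStream is = new InflaterInputStream(
                new ByteArrayInputStream(baos.toByteArray()));
        return readExactly(is, data.length);
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) performs the same kind of loop, and DataInputStream.readFully(byte[]) does so on older versions.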

Related

Error in reading GZip in java but not in python

I write some data to a GZIP binary file using the Java code below:
public static void WriteDictAndIndex(HashMap<String, Term> terms, int index){
try{
GZIPOutputStream postingListOutput = new GZIPOutputStream(new FileOutputStream(String.format("./generated/posting_list_%d", index)));
GZIPOutputStream dictionaryOutput = new GZIPOutputStream(new FileOutputStream(String.format("./generated/dictionary_%d", index)));
Integer START=0, SIZE=0, VOCAB=0;
for(String s : terms.keySet()){
ArrayList<Pair<Integer, Byte>> postingList = terms.get(s).postingList;
SIZE = postingList.size()*5;
// Write one posting list to the file system
ByteBuffer list_buffer = ByteBuffer.allocate(SIZE);
int totalCount = 0;
for(Pair<Integer, Byte> p : postingList) {
// Write the docID (4 bytes)
list_buffer.putInt(p.getValue0());
// Write the term frequency (1 byte)
byte termFrequency = p.getValue1();
list_buffer.put(termFrequency);
// Counter for the total occurrences of words
totalCount += (int)termFrequency;
}
if(index == 0 && totalCount == 1)
continue;
postingListOutput.write(list_buffer.array());
// Write one dictionary entry to the file system
byte[] token = s.getBytes();
ByteBuffer dict_buffer = ByteBuffer.allocate(16+token.length);
dict_buffer.putInt(token.length);
dict_buffer.put(token);
dict_buffer.putInt(terms.get(s).documentFrequency);
dict_buffer.putInt(START);
dict_buffer.putInt(SIZE);
dictionaryOutput.write(dict_buffer.array());
START += SIZE;
VOCAB += 1;
}
//INFO
System.out.println(String.format("Vocabulary Size: %d", VOCAB));
postingListOutput.close();
dictionaryOutput.close();
}catch(IOException e){
System.err.println(e);
}
}
When I read the first 695 bytes of this file using Python, the data is as expected. But when I read the file using Java's GZIPInputStream, there are discrepancies: the last 10 of those 695 bytes are different.
I am trying to read it using the following code:
try{
GZIPInputStream postingList = new GZIPInputStream(new FileInputStream(new File(args[1])));
GZIPInputStream dictionary = new GZIPInputStream(new FileInputStream(new File(args[2])));
byte[] buf = new byte[4];
while(true){
// Get the size of the token from the dictionary
dictionary.read(buf);
int tokenSize = ByteBuffer.wrap(buf).getInt();
// Read the token
byte[] tokenBuffer = new byte[tokenSize];
dictionary.read(tokenBuffer);
String token = new String(tokenBuffer, StandardCharsets.UTF_8);
// Read the document frequency
dictionary.read(buf);
int documentFrequency = ByteBuffer.wrap(buf).getInt();
// Read the starting index of the posting list
dictionary.read(buf);
int START = ByteBuffer.wrap(buf).getInt();
// Read the size of the posting list
dictionary.read(buf);
int SIZE = ByteBuffer.wrap(buf).getInt();
// Read the posting list
for(int i=0; i<documentFrequency; i++){
byte[] ID = new byte[4];
postingList.read(ID);
int docID = ByteBuffer.wrap(ID).getInt();
byte[] frequency = new byte[1];
postingList.read(frequency);
System.out.println(String.format("%d: %d: %d",i, docID, frequency[0]));
}
break;
}
postingList.close();
dictionary.close();
}
catch(IOException e){
System.err.println(e);
}
The print statement above prints one line per posting, each after reading an integer (4 bytes) and a byte.
The last 2 lines printed should be of this form (which Python reads fine):
137: 81257: 1
138: 81737: 1
But using the Java code above I am getting:
137: 65536: 61
138: 1761673217: 63
Any pointers on what could be the mistake?
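One likely culprit, assuming the files themselves are fine (Python reads them correctly), is the short-read behaviour discussed in the first question on this page: GZIPInputStream.read(byte[]) may return before the buffer is full, and the code above never checks its return value. A hedged sketch of a reader that cannot be affected by short reads follows. PostingReader and its methods are hypothetical names, and the 5-byte record layout (4-byte big-endian docID plus 1-byte frequency) is taken from the writer code in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PostingReader {

    // Reads count postings of 5 bytes each: a 4-byte big-endian docID
    // followed by a 1-byte term frequency. Returns {docID, frequency} pairs.
    static int[][] readPostings(DataInputStream in, int count) throws IOException {
        int[][] postings = new int[count][2];
        for (int i = 0; i < count; i++) {
            postings[i][0] = in.readInt();  // reads all 4 bytes or throws EOFException
            postings[i][1] = in.readByte(); // reads exactly 1 byte
        }
        return postings;
    }

    // Decodes gzip-compressed postings held in memory.
    static int[][] decode(byte[] gz, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(gz)))) {
            return readPostings(in, count);
        }
    }

    // Demo input: gzip postings laid out the way ByteBuffer.putInt/put writes them.
    static byte[] gzipPostings(int[][] postings) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(postings.length * 5);
        for (int[] p : postings) {
            buf.putInt(p[0]);
            buf.put((byte) p[1]);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(buf.array());
        }
        return bytes.toByteArray();
    }
}
```

For the files in the question, the same idea would be to wrap each GZIPInputStream in a DataInputStream and replace every read(buf) call with readInt(), readByte(), or readFully(tokenBuffer).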

FileInputStream read until last 128 bytes of file

I'm trying to read the last 128 bytes from a file (the signature) and then trying to read until those bytes but the first part (reading the last 128 bytes) is returning an ArrayIndexOutOfBoundsException:
byte[] signature = new byte[128];
FileInputStream sigFis = new FileInputStream(f);
sigFis.read(signature, (int)f.length()-128, 128);
sigFis.close();
And then the last part doesn't seem to be working either. I'm using an offset that I increase gradually:
CipherInputStream cis = new CipherInputStream(fis, c);
FileOutputStream fos = new FileOutputStream(destFile);
int i = cis.read(data);
int offset = 0, maxsize = (int)f.length()-128;
while((i != -1) && offset<maxsize){
fos.write(data, 0, i);
sig.update(data);
fos.flush();
i = cis.read(data);
offset+=1024;
}
I get an EOFException with the RandomAccessFile (RAF) I used to do my ops...
byte[] signature = new byte[128];
int offset = (int)f.length()-128;
raf.seek(offset);
raf.readFully(signature, 0, 128);
I would use File or FileChannel to get the file size. This is how to read up to the last 128 bytes:
FileInputStream is = new FileInputStream("1.txt");
FileChannel ch = is.getChannel();
long len = ch.size() - 128;
BufferedInputStream bis = new BufferedInputStream(is);
for(long i = 0; i < len; i++) {
int b = bis.read();
...
}
If we continue reading, we will get the last 128 bytes:
ByteArrayOutputStream bout128 = new ByteArrayOutputStream();
for(int b; (b = bis.read()) != -1;) {
bout128.write(b);
}
byte[] last128 = bout128.toByteArray();
I think you got confused by the parameters of the read method.
FileInputStream sigFis = new FileInputStream(f);
sigFis.read(signature, (int)f.length()-128, 128);
// This doesn't give you the last 128 bytes.
// The offset is an index into the byte array 'signature', not a position in the file.
// That's the reason you see the ArrayIndexOutOfBoundsException.
sigFis.close();
Replace your read() call with
sigFis.read(signature);
// But now signature can no longer be a 128-byte array: it must be as long as the file, and its last 128 bytes are the signature.
InputStream read method signature looks as below:
int java.io.FileInputStream.read(byte[] b, int off, int len)
Parameters:
b the buffer into which the data is read.
off the start offset in the destination array b
len the maximum number of bytes read.
Hope this helps!
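To make the distinction between a file position and an array offset concrete, here is a minimal sketch that reads only the tail of a file with RandomAccessFile. TailReader and readTail are hypothetical names introduced for illustration; the explicit length check is an assumption about how a too-short file should be handled.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class TailReader {

    // Returns the last n bytes of the file; fails if the file is shorter than n.
    static byte[] readTail(Path file, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < n) {
                throw new IOException("file has only " + length + " bytes, need " + n);
            }
            byte[] tail = new byte[n];
            raf.seek(length - n); // a position in the FILE, not an index into the array
            raf.readFully(tail);  // fills the whole array or throws EOFException
            return tail;
        }
    }
}
```

With this helper, the signature from the question would be obtained with readTail(f.toPath(), 128), leaving the array offset at its default of 0.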

To convert the file into byte array

I am creating a directory and storing bitmap images in a file inside it. How can I convert such an image into a byte array?
File myDir = new File(root + "/saved_images");
myDir.mkdirs();
Random generator = new Random();
int n = 10000;
n = generator.nextInt(n);
String fname = "Image-"+ n +".jpg";
File file = new File (myDir, fname);
if (file.exists ()) file.delete ();
try {
FileOutputStream out = new FileOutputStream(file);
bmp.compress(Bitmap.CompressFormat.JPEG, 90, out);
out.flush();
out.close();
} catch (Exception e) {
e.printStackTrace();
}
If you just want to modify your existing code to write the image to a byte array instead of a file, then replace the try block with this code:
ByteArrayOutputStream out = new ByteArrayOutputStream();
bmp.compress(Bitmap.CompressFormat.JPEG, 90, out);
bytes = out.toByteArray();
... where bytes has type byte[], and get rid of the code that generates the filename and deletes the existing file if it exists. Since you are writing to a ByteArrayOutputStream, there is no need to call flush() or close() on out. (They won't do anything.)
Not exactly sure what you're trying to do, but you can try something like:
InputStream is = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[16384]; // any reasonably large buffer works; a power of 2 is customary
while ((nRead = is.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
buffer.flush();
byte[] byteArray = buffer.toByteArray();
Just use this to read the file you saved:
// Returns the contents of the file in a byte array.
public static byte[] getBytesFromFile(File file) throws IOException {
InputStream is = new FileInputStream(file);
// Get the size of the file
long length = file.length();
// You cannot create an array using a long type.
// It needs to be an int type.
// Before converting to an int type, check
// to ensure that file is not larger than Integer.MAX_VALUE.
if (length > Integer.MAX_VALUE) {
// File is too large
}
// Create the byte array to hold the data
byte[] bytes = new byte[(int)length];
// Read in the bytes
int offset = 0;
int numRead = 0;
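while (offset < bytes.length
&& (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file " + file.getName());
}
// Close the input stream and return bytes
is.close();
return bytes;
}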
Courtesy : http://www.exampledepot.com
I have used this code to convert an image into a byte array:
Bitmap bm = BitmapFactory.decodeResource(getResources(), R.drawable.abc);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
bm.compress(Bitmap.CompressFormat.JPEG, 40, bos);
byte[] bitmapdata = bos.toByteArray();
Log.w("Image Conversion", String.valueOf(bitmapdata.length));
String converted_txt = "";
for (int i = 0; i < bitmapdata.length; i++)
{
Log.w("Image Conversion", String.valueOf(bitmapdata[i]));
converted_txt = converted_txt + bitmapdata[i];
}
try
{
File myFile = new File("/sdcard/myImageToByteFile.jpg");
myFile.createNewFile();
// Write the raw bytes with an OutputStream; a Writer would treat them as characters
FileOutputStream fOut = new FileOutputStream(myFile);
fOut.write(bitmapdata);
fOut.close();
}
catch (Exception e)
{
Toast.makeText(getApplicationContext(), e.getMessage(), Toast.LENGTH_LONG).show();
}

create an ArrayList of bytes

I want to read bytes from a wave file into an array. Since the number of bytes read depends upon the size of the wave file, I'm creating a byte array with a maximum size of 1000000. But this is resulting in empty values at the end of the array. So, I wanted to create a dynamically increasing array and I found that ArrayList is the solution. But the read() function of the AudioInputStream class reads bytes only into a byte array! How do I pass the values into an ArrayList instead?
ArrayList isn't the solution; ByteArrayOutputStream is. Create a ByteArrayOutputStream, write your bytes to it, and then invoke toByteArray() to get the bytes.
Example of what your code should look like:
InputStream in = new BufferedInputStream(inputStream, 1024*32);
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] dataBuffer = new byte[1024 * 16];
int size = 0;
while ((size = in.read(dataBuffer)) != -1) {
out.write(dataBuffer, 0, size);
}
byte[] bytes = out.toByteArray();
You can have an array of byte like:
List<Byte> arrays = new ArrayList<Byte>();
To convert it back to arrays
Byte[] soundBytes = arrays.toArray(new Byte[arrays.size()]);
(Then, you will have to write a converter to transform Byte[] to byte[]).
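Such a converter is short. The sketch below is one possible version working directly from the List<Byte>; ByteLists and toPrimitive are hypothetical names introduced for illustration. It assumes the list contains no null elements.

```java
import java.util.List;

final class ByteLists {

    // Unboxes each element into a primitive array of the same length.
    static byte[] toPrimitive(List<Byte> boxed) {
        byte[] out = new byte[boxed.size()];
        int i = 0;
        for (Byte b : boxed) {
            out[i++] = b; // auto-unboxing; a null element would throw NullPointerException
        }
        return out;
    }
}
```

Each boxed Byte is a separate object reference, so for large audio data a List<Byte> takes considerably more memory than the equivalent byte[].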
EDIT: You are using List<Byte> wrong, I'll just show you how to read AudioInputStream simply with ByteArrayOutputStream.
AudioInputStream ais = ....;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int read;
while((read = ais.read()) != -1) {
baos.write(read);
}
byte[] soundBytes = baos.toByteArray();
PS An IOException is thrown if frameSize is not equal to 1. Hence use a byte buffer to read data, like so:
AudioInputStream ais = ....;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int bytesRead = 0;
while((bytesRead = ais.read(buffer)) != -1) {
baos.write(buffer, 0, bytesRead);
}
byte[] soundBytes = baos.toByteArray();
Something like this should do:
List<Byte> myBytes = new ArrayList<Byte>();
//assuming your javax.sound.sampled.AudioInputStream is called ais
while(true) {
int b = ais.read(); // read() returns an int so that -1 can signal end of stream
if (b != -1) { //read() returns -1 when the end of the stream is reached
myBytes.add((byte) b);
} else {
break;
}
}
Sorry if the code is a bit wrong. I haven't done Java for a while.
Also, be careful if you do implement it as a while(true) loop :)
Edit: And here's an alternative way of doing it that reads more bytes each time:
int arrayLength = 1024;
List<Byte> myBytes = new ArrayList<Byte>();
byte[] aBytes = new byte[arrayLength]; //read() needs a primitive byte[], not a Byte[]
while(true) {
int length = ais.read(aBytes); //length is the number of bytes read
if (length == -1) { //read() returns -1 when the end of the stream is reached
break; //or return if you implement this as a method
}
for (int i = 0; i < length; i++) { //only the first length entries are valid
myBytes.add(aBytes[i]);
}
}
Note that both read() methods throw an IOException - handling this is left as an exercise for the reader!

How to read a bin file to a byte array?

I have a bin file that I need to convert to a byte array. Can anyone tell me how to do this?
Here is what I have so far:
File f = new File("notification.bin");
is = new FileInputStream(f);
long length = f.length();
/*if (length > Integer.MAX_VALUE) {
// File is too large
}*/
// Create the byte array to hold the data
byte[] bytes = new byte[(int)length];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length && (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file "+f.getName());
}
But it's not working...
Kaddy
try using this
public byte[] readFromStream(InputStream inputStream) throws Exception
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
byte[] data = new byte[4096];
int count = inputStream.read(data);
while(count != -1)
{
dos.write(data, 0, count);
count = inputStream.read(data);
}
return baos.toByteArray();
}
By the way, do you want Java or C++ code? Judging by the code in your question, I assumed Java and answered accordingly.
You're probably better off using a memory mapped file. See this question
In Java, a simple solution is:
InputStream is = ...
ByteArrayOutputStream os = new ByteArrayOutputStream();
byte[] data = new byte[4096]; // A larger buffer size would probably help
int count;
while ((count = is.read(data)) != -1) {
os.write(data, 0, count);
}
byte[] result = os.toByteArray();
If the input is a file, we can preallocate a byte array of the right size:
File f = ...
long fileSize = f.length();
if (fileSize > Integer.MAX_VALUE) {
// file too big
}
InputStream is = new FileInputStream(f);
byte[] data = new byte[(int) fileSize];
if (is.read(data) != data.length) {
// file truncated while we were reading it???
}
However, there is probably a more efficient way to do this task using NIO.
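One NIO-based option, available since Java 7, is java.nio.file.Files.readAllBytes. The following is a minimal sketch; ReadWholeFile and read are hypothetical names introduced for illustration, and whether it is actually faster than the stream loop above is an assumption that would need measuring.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ReadWholeFile {

    // Reads the entire file into a byte array; the file must fit in memory.
    static byte[] read(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        return Files.readAllBytes(path);
    }
}
```

For the file in the question this would be called as ReadWholeFile.read("notification.bin").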
Unless you really need to do it just that way, maybe simplify what you're doing.
Doing everything in the for loop may seem like a very slick way of doing it, but it's shooting yourself in the foot when you need to debug and don't immediately see the solution.
In this answer I read from a URL.
You could modify it so the InputStream is from a File instead of a URLConnection.
Something like:
FileInputStream inputStream = new FileInputStream("your.binary.file");
ByteArrayOutputStream output = new ByteArrayOutputStream();
byte [] buffer = new byte[ 1024 ];
int n = 0;
while (-1 != (n = inputStream.read(buffer))) {
output.write(buffer, 0, n);
}
inputStream.close();
etc
Try open source library apache commons-io
IOUtils.toByteArray(inputStream)
You are not the first and not the last developer who needs to read a file, no need to reinvent it each time.
