I'm trying to zip a stream from .Net that can be read from Java code.
So as input I have a byte array, which I want to compress and I'm expecting to have a binary array.
I've tested with SharpZipLib and DotNetZip to produce the compressed byte array,
but unfortunately I always get an error when trying to uncompress it using the java.util.zip.Deflater class in Java.
Does anyone have a code sample for compressing a String or a byte array with .NET and decompressing it with the java.util.zip.Deflater class?
You shouldn't need to touch Deflater or Inflater directly. Deflater compresses, and Inflater decompresses the raw data of individual entries within the zip file.
ZipInputStream is the obvious class to go for. There is also ZipFile if you really need random access to an actual file (for many reasons, I wouldn't recommend it).
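As a rough illustration of the ZipInputStream route, here is a minimal sketch that reads every entry of a ZIP archive held in a byte array. The class and method names (ZipStreamSketch, zipSingleEntry, unzip) are made up for this example, and the archive is built in Java only to keep the sketch self-contained; in the question it would come from the .NET side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipStreamSketch {

    // Builds a one-entry ZIP archive in memory (a stand-in for bytes produced elsewhere).
    static byte[] zipSingleEntry(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Reads all entries; ZipInputStream takes care of the ZIP metadata around each entry.
    static Map<String, byte[]> unzip(byte[] zipBytes) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }
}
```

This only applies if the .NET side writes a real ZIP archive; for a bare ZLIB stream, Inflater is the right tool.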
Inflater doesn't read zip streams. It reads ZLIB (or DEFLATE) streams. The ZIP format surrounds a pure DEFLATE stream with additional metadata. Inflater doesn't handle that metadata.
If you are inflating on the Java side, you need Inflater.
On the .NET side you can use the Ionic.Zlib.ZlibStream class from DotNetZip to compress - in other words to produce something the Java Inflater can read.
I've just tested this; this code works. The Java side decompresses what the .NET side has compressed.
.NET side:
byte[] compressed = Ionic.Zlib.ZlibStream.CompressString(originalText);
File.WriteAllBytes("ToInflate.bin", compressed);
Java side:
public void Run()
throws java.io.FileNotFoundException,
java.io.IOException,
java.util.zip.DataFormatException,
java.io.UnsupportedEncodingException,
java.security.NoSuchAlgorithmException
{
String filename = "ToInflate.bin";
File file = new File(filename);
InputStream is = new FileInputStream(file);
// Get the size of the file
int length = (int)file.length();
byte[] deflated = new byte[length];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < deflated.length
&& (numRead=is.read(deflated, offset, deflated.length-offset)) >= 0) {
offset += numRead;
}
// Decompress the bytes
Inflater decompressor = new Inflater();
decompressor.setInput(deflated, 0, length);
byte[] result = new byte[100];
int totalRead= 0;
while ((numRead = decompressor.inflate(result)) > 0)
totalRead += numRead;
decompressor.end();
System.out.println("Inflate: total size of inflated data: " + totalRead + "\n");
result = new byte[totalRead];
decompressor = new Inflater();
decompressor.setInput(deflated, 0, length);
int resultLength = decompressor.inflate(result);
decompressor.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println("Inflate: inflated string: " + outputString + "\n");
}
(I'm kinda rusty at Java so it might stand some improvement, but you get the idea)
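One possible tidy-up of the Java side, sketched under the assumption that the input is a complete ZLIB stream: inflate in a single pass into a ByteArrayOutputStream instead of inflating twice to learn the size. The class name ZlibRoundTrip and the compress helper are invented for this example; compress merely stands in for the .NET ZlibStream output.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ZlibRoundTrip {

    // Produces a ZLIB stream (the Deflater default), standing in for the .NET side.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates a complete ZLIB stream in one pass, growing the output as needed.
    static byte[] decompress(byte[] zlibData) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(zlibData);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to feed: the data is truncated or needs a dictionary.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Truncated or dictionary-dependent ZLIB data");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

The decompressed bytes can then be turned into text with new String(bytes, StandardCharsets.UTF_8), matching the UTF-8 decoding used above.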
Here is the page from Sun on ZipStreams:
http://java.sun.com/developer/technicalArticles/Programming/compression/
Another library that deals with ZipStreams is POI. It is more focused on working with MS Office XML format docs, but it might offer some different insights into how to handle the stream. http://poi.apache.org/apidocs/org/apache/poi/openxml4j/opc/internal/marshallers/ZipPartMarshaller.html
Related
I am new to Java I/O, so please help.
I am trying to process a large file (e.g. a PDF file of 50 MB) using the Apache Commons library.
At first I try:
byte[] bytes = FileUtils.readFileToByteArray(file);
String encodeBase64String = Base64.encodeBase64String(bytes);
byte[] decoded = Base64.decodeBase64(encodeBase64String);
But knowing that FileUtils.readFileToByteArray in org.apache.commons.io will load the whole file into memory, I tried to use BufferedInputStream to read the file piece by piece:
BufferedInputStream bis = new BufferedInputStream(inputStream);
StringBuilder pdfStringBuilder = new StringBuilder();
int byteArraySize = 10;
byte[] tempByteArray = new byte[byteArraySize];
while (bis.available() > 0) {
if (bis.available() < byteArraySize) { // reaching the end of file
tempByteArray = new byte[bis.available()];
}
int len = Math.min(bis.available(), byteArraySize);
read = bis.read(tempByteArray, 0, len);
if (read != -1) {
pdfStringBuilder.append(Base64.encodeBase64String(tempByteArray));
} else {
System.err.println("End of file reached.");
}
}
byte[] bytes = Base64.decodeBase64(pdfStringBuilder.toString());
However, the two decoded byte arrays don't look quite the same. In fact, the second one only gives 10 bytes, which is my temp array size.
Can anyone please help:
What am I doing wrong when reading the file piece by piece?
Why does the decoded byte array only contain 10 bytes in the second solution?
Thanks in advance:)
After some digging, it turns out that the byte array's size has to be a multiple of 3 in order to avoid padding in the middle of the encoded output. After switching to a temp array size that is a multiple of 3, the program works.
I simply changed
int byteArraySize = 10;
to be
int byteArraySize = 1024 * 3;
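To make the multiple-of-3 point concrete, here is a minimal sketch using java.util.Base64 from the JDK (Java 8+) rather than Commons Codec. The class name ChunkedBase64 and its helper are hypothetical. It also encodes only the bytes actually read, so a short final chunk does not drag stale buffer contents into the output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

final class ChunkedBase64 {

    // A multiple of 3, so only the final chunk can end in '=' padding.
    private static final int CHUNK_SIZE = 3 * 1024;

    static String encode(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        Base64.Encoder encoder = Base64.getEncoder();
        byte[] chunk = new byte[CHUNK_SIZE];
        int filled;
        while ((filled = fill(in, chunk)) > 0) {
            // Encode exactly the bytes that were read, not the whole buffer.
            byte[] exact = (filled == chunk.length) ? chunk : Arrays.copyOf(chunk, filled);
            sb.append(encoder.encodeToString(exact));
        }
        return sb.toString();
    }

    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int offset = 0;
        while (offset < buffer.length) {
            int n = in.read(buffer, offset, buffer.length - offset);
            if (n == -1) {
                break;
            }
            offset += n;
        }
        return offset;
    }
}
```

Note that the encoded String still ends up fully in memory; to avoid that as well, the chunks would have to be written to an output stream instead of a StringBuilder.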
We are using Apache Camel for compressing and decompressing our files.
We use the standard .marshal().gzip() and .unmarshal().gzip() APIs.
Our problem is that when we get really large files, say 800 MB to more than 1 GB, our application runs out of memory, since the entire file is loaded into memory for compression and decompression.
Are there any Camel APIs or Java libraries that will help zip/unzip the file without loading the entire file into memory?
There is a similar unanswered question here
Explanation
Use a different approach: Stream the file.
That is, don't load it into memory completely; read it piece by piece and simultaneously write it back piece by piece.
Get an InputStream to the file and wrap a GZIPInputStream around it. Read from that stream and write to an OutputStream.
To compress, do the opposite: wrap the OutputStream in a GZIPOutputStream.
Code
The example uses Apache Commons Compress but the logic of the code remains the same for all libraries.
Unpacking a gz archive:
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
Path inputPath = Paths.get("archive.tar.gz");
Path outputPath = Paths.get("archive.tar");
// All three streams are declared as resources so they are closed in reverse order
try (InputStream fin = Files.newInputStream(inputPath);
     GzipCompressorInputStream in = new GzipCompressorInputStream(
         new BufferedInputStream(fin));
     OutputStream out = Files.newOutputStream(outputPath)) {
    // Read and write one buffer at a time
    final byte[] buffer = new byte[8192];
    int n = 0;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
Packing as gz archive:
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
Path inputPath = Paths.get("archive.tar");
Path outputPath = Paths.get("archive.tar.gz");
// The gzip stream must be closed, otherwise the buffered data and the gzip trailer are never written
try (InputStream in = Files.newInputStream(inputPath);
     OutputStream fout = Files.newOutputStream(outputPath);
     GzipCompressorOutputStream out = new GzipCompressorOutputStream(
         new BufferedOutputStream(fout))) {
    // Read and write one buffer at a time
    final byte[] buffer = new byte[8192];
    int n = 0;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}
You could also wrap BufferedReader and PrintWriter around the streams if you feel more comfortable with them. They manage the buffering themselves, and you can read and write lines instead of bytes. Note that this only works correctly if the file is text made up of lines; binary data such as a tar archive would be corrupted.
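The same streaming idea can be sketched with only the JDK's java.util.zip classes, in case adding Commons Compress is not an option. The class name GzipFiles and the 64 KB buffer size are arbitrary choices for this example; memory use stays at roughly one buffer regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipFiles {

    private static final int BUFFER_SIZE = 64 * 1024;

    // Compresses source into target without holding the whole file in memory.
    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target), BUFFER_SIZE)) {
            copy(in, out);
        } // closing the GZIPOutputStream writes the gzip trailer
    }

    // Decompresses source into target, again one buffer at a time.
    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source), BUFFER_SIZE);
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}
```

For example, GzipFiles.compress(Paths.get("archive.tar"), Paths.get("archive.tar.gz")) would mirror the packing example above.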
My Java client sends a file to a C++ server using this code:
FileInputStream fileInputStream = new FileInputStream(path);
byte[] buffer = new byte[64*1024];
int bytesRead = 0;
while ( (bytesRead = fileInputStream.read(buffer)) != -1)
{
if (bytesRead > 0)
{
this.outToServer.write(buffer, 0, bytesRead);
}
}
My C++ server receives the bytes using this code:
vector<char> buf5(file_length);
size_t read_bytes;
do
{
read_bytes = socket.read_some(boost::asio::buffer(buf5,file_length));
file_length -= read_bytes;
}
while(read_bytes != 0);
string file(buf5.begin(), buf5.end());
And then creates the file using this code:
ofstream out_file( (some_path).c_str() );
out_file << file << endl;
out_file.close();
However, somehow the file gets corrupted during this process.
At the end of the process, both files (the one sent and the one created) have the same size.
What am I doing wrong? Any help would be appreciated!
Edit: tried to use different code for receiving the file, same result:
char buf[file_length];
size_t length = 0;
while( length < file_length )
{
length += socket.read_some(boost::asio::buffer(&buf[length], file_length - length), error);
}
string file(buf);
1) Is it a text file?
2) If not, open the file in binary mode before writing, and do not use the << operator; use the write or put methods instead.
In your first example the problem appears to be this line:
read_bytes = socket.read_some(boost::asio::buffer(buf5,file_length));
The buffer always starts at the beginning of buf5, so every read overwrites the first N bytes of the vector instead of appending to what was already received.
In your second example the problem is likely:
string file(buf);
If buf contains any NUL characters then the string will be truncated. Use the same string creation as in your first example with a std::vector<char>.
If you still have problems I would recommend doing a binary diff of the source and copied files (most hex editors can do this). This should give you a better picture of exactly where the difference is and what may be causing it.
I want to convert a video to bytes. My code gives me a result, but I don't think it is correct, because I tested it with different videos and it gives me the same result each time.
Can anyone please help me convert a video to bytes?
String filename = "D:/try.avi";
byte[] myByteArray = filename.getBytes();
for(int i = 0; i<myByteArray.length;i ++)
{
System.out.println(myByteArray[i]);
}
Any help, please?
String filename = "D:/try.avi";
byte[] myByteArray = filename.getBytes();
That is converting the file name to bytes, not the file content.
As for reading the content of the file, see the Basic I/O lesson of the Java Tutorial.
Videos in the same container format start with the same bytes. The codec used determines the actual video data.
I suggest you read more about container file formats and codecs first if you plan developing video applications.
But you have a different problem. As Andrew Thompson correctly pointed out, you are getting the bytes of the filename string.
The correct approach would be:
private static File fl = new File("D:\\video.avi");
byte[] myByteArray = getBytesFromFile(fl);
Please also bear in mind that terminals usually have fixed buffer size (on Windows, it's several lines), so outputting a big chunk of data will display only last several lines of it.
Edit: Here's an implementation of getBytesFromFile; a java expert may offer more standard approach.
public static byte[] getBytesFromFile(File file) throws IOException {
    // Get the size of the file
    long length = file.length();
    if (length > Integer.MAX_VALUE) {
        // A Java array cannot hold more than Integer.MAX_VALUE elements
        throw new IOException(file.getPath() + " is too big");
    }
    // Create the byte array to hold the data (Java zero-initializes it)
    byte[] bytes = new byte[(int) length];
    // try-with-resources closes the stream even if reading fails
    try (InputStream is = new FileInputStream(file)) {
        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
                && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            throw new IOException("Could not completely read file " + file.getName());
        }
    }
    return bytes;
}
If you want to read the contents of the video file, then use File together with RandomAccessFile.
String filename = "D:/try.avi";
File file = new File(filename);
byte[] myByteArray = new byte[(int) file.length()];
// Open read-only; readFully keeps reading until the array is full
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
    raf.readFully(myByteArray);
}
The documentation says that one should not use available() method to determine the size of an InputStream. How can I read the whole content of an InputStream into a byte array?
InputStream in; //assuming already present
byte[] data = new byte[in.available()];
in.read(data);//now data is filled with the whole content of the InputStream
I could read multiple times into a buffer of a fixed size, but then, I will have to combine the data I read into a single byte array, which is a problem for me.
The simplest approach IMO is to use Guava and its ByteStreams class:
byte[] bytes = ByteStreams.toByteArray(in);
Or for a file:
byte[] bytes = Files.toByteArray(file);
Alternatively (if you didn't want to use Guava), you could create a ByteArrayOutputStream, and repeatedly read into a byte array and write into the ByteArrayOutputStream (letting that handle resizing), then call ByteArrayOutputStream.toByteArray().
Note that this approach works whether you can tell the length of your input or not - assuming you have enough memory, of course.
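The ByteArrayOutputStream alternative could look roughly like this. The class name StreamBytes and the 8 KB buffer size are arbitrary choices for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamBytes {

    // Reads the stream to its end without knowing the length in advance.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; -1 signals end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

On Java 9 and later, InputStream.readAllBytes() does the same job without any helper.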
Please keep in mind that the answers here assume that the length of the file is less than or equal to Integer.MAX_VALUE(2147483647).
If you are reading in from a file, you can do something like this:
File file = new File("myFile");
byte[] fileData = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(fileData);
dis.close();
UPDATE (May 31, 2014):
Java 7 adds some new features in the java.nio.file package that can be used to make this example a few lines shorter. See the readAllBytes() method in the java.nio.file.Files class. Here is a short example:
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
// ...
Path p = FileSystems.getDefault().getPath("", "myFile");
byte [] fileData = Files.readAllBytes(p);
Android has support for this starting in API level 26 (8.0.0, Oreo).
You can use Apache commons-io for this task:
Refer to this method:
public static byte[] readFileToByteArray(File file) throws IOException
Update:
Java 7 way:
byte[] bytes = Files.readAllBytes(Paths.get(filename));
and if it is a text file and you want to convert it to String (change encoding as needed):
StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString()
You can read it by chunks (byte buffer[] = new byte[2048]) and write the chunks to a ByteArrayOutputStream. From the ByteArrayOutputStream you can retrieve the contents as a byte[], without needing to determine its size beforehand.
I believe buffer length needs to be specified, as memory is finite and you may run out of it
Example:
File fileFileName = new File(strFileName);
InputStream in = new FileInputStream(fileFileName);
long length = fileFileName.length();
if (length > Integer.MAX_VALUE) {
throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException("Could not completely read file " + fileFileName.getName());
}
in.close();
The maximum array size is limited by Integer.MAX_VALUE (2^31 - 1 = 2,147,483,647), so a byte array can hold about 2 GB at most.
Your input stream can be bigger than 2 GB, so you have to process the data in chunks, sorry.
InputStream is; // assuming already opened
final byte[] buffer = new byte[512 * 1024 * 1024]; // 512 MB
while(true) {
final int read = is.read(buffer);
if ( read < 0 ) {
break;
}
// do processing
}
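As one concrete instance of "do processing", here is a sketch that computes a CRC-32 checksum over a stream of any size while holding only one buffer in memory. The class name ChunkedChecksum and the 64 KB buffer are assumptions for this example; any per-chunk work could take the place of the checksum update.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

final class ChunkedChecksum {

    // Processes the stream chunk by chunk; the total size never has to fit in an array.
    static long crc32(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024];
        int read;
        while ((read = in.read(buffer)) != -1) {
            // Only the first 'read' bytes of the buffer are valid for this chunk
            crc.update(buffer, 0, read);
        }
        return crc.getValue();
    }
}
```

A much smaller buffer than 512 MB is usually enough, since the loop simply runs more iterations.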