File gets corrupted when transferring it via socket - java

My Java client sends a file to a C++ server using this code:
FileInputStream fileInputStream = new FileInputStream(path);
byte[] buffer = new byte[64*1024];
int bytesRead = 0;
while ( (bytesRead = fileInputStream.read(buffer)) != -1)
{
if (bytesRead > 0)
{
this.outToServer.write(buffer, 0, bytesRead);
}
}
My C++ server receives the bytes using this code:
vector<char> buf5(file_length);
size_t read_bytes;
do
{
read_bytes = socket.read_some(boost::asio::buffer(buf5,file_length));
file_length -= read_bytes;
}
while(read_bytes != 0);
string file(buf5.begin(), buf5.end());
And then creates the file using this code:
ofstream out_file( (some_path).c_str() );
out_file << file << endl;
out_file.close();
However, somehow the file gets corrupted during this process.
At the end of the process, both files (the one sent and the one created) have the same size.
What am I doing wrong? Any help would be appreciated!
Edit: tried to use different code for receiving the file, same result:
char buf[file_length];
size_t length = 0;
while( length < file_length )
{
length += socket.read_some(boost::asio::buffer(&buf[length], file_length - length), error);
}
string file(buf);

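1) Is it a text file?
2) If not, open the output file in binary mode before writing, and do not use the << operator; use the write or put methods instead. Note that out_file << file << endl also appends a newline that was not part of the received data.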

In your first example the problem appears to be this line:
read_bytes = socket.read_some(boost::asio::buffer(buf5,file_length));
Every call writes to the start of buf5, so each read overwrites the first N bytes instead of appending after the data already received.
In your second example the problem is likely:
string file(buf);
If buf contains any NUL characters then the string will be truncated. Use the same string creation as in your first example with a std::vector<char>.
If you still have problems I would recommend doing a binary diff of the source and copied files (most hex editors can do this). This should give you a better picture of exactly where the difference is and what may be causing it.
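If no hex editor is at hand, the binary diff can be approximated in a few lines. The following is a minimal sketch in Java (the class and method names are made up for illustration and do not come from the posts above); it reports the offset of the first byte at which two files differ.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstDifference {
    /**
     * Returns the offset of the first byte at which the two files differ,
     * or -1 if they are identical. If one file is a prefix of the other,
     * the length of the shorter file is returned.
     */
    public static long firstDifference(Path a, Path b) throws IOException {
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            long offset = 0;
            while (true) {
                int byteA = inA.read(); // -1 signals end of stream
                int byteB = inB.read();
                if (byteA != byteB) {
                    return offset;
                }
                if (byteA == -1) {
                    return -1; // both streams ended at the same point
                }
                offset++;
            }
        }
    }
}
```

Calling firstDifference on the sent file and the received file gives the position to look at first; a difference at the very end or at every newline usually points at the writing code rather than the socket code.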

Related

Java Reading large files into byte array chunk by chunk

So I've been trying to make a small program that inputs a file into a byte array, then it will turn that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save it as a custom file.
I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).
This is the code that is not a complete failure
public void rundis(Path pp) {
byte bb[] = null;
try {
bb = Files.readAllBytes(pp); //Files.toByteArray(pathhold);
System.out.println("byte array made");
} catch (Exception e) {
e.printStackTrace();
}
if (bb.length != 0 || bb != null) {
System.out.println("byte array filled");
//send to method to turn into hex
} else {
System.out.println("byte array NOT filled");
}
}
I know how the process should go, but I don't know how to code that properly.
The process if you are interested:
1) Input the file using File
2) Read the file chunk by chunk into a byte array, e.g. each chunk holds 600 bytes
3) Send that chunk to be turned into a hex value --> Integer.toHexString
4) Send that hex value chunk to be made into a binary value --> Integer.toBinaryString
5) Mess around with the binary value
6) Save to a custom file line by line
Problem:: I don't know how to turn a huge file into a byte array chunk by chunk to be processed.
Any and all help will be appreciated, thank you for reading :)
To chunk your input use a FileInputStream:
Path pp = FileSystems.getDefault().getPath("logs", "access.log");
final int BUFFER_SIZE = 1024*1024; //this is actually bytes
FileInputStream fis = new FileInputStream(pp.toFile());
byte[] buffer = new byte[BUFFER_SIZE];
int read = 0;
while( ( read = fis.read( buffer ) ) > 0 ){
// call your other methods here, using only the first 'read' bytes of buffer
}
fis.close();
To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files, but as you noticed not so much for large files.
In pseudocode it would look something like this:
while there are more bytes available
read some bytes
process those bytes
(write the result back to a file, if needed)
In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:
FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));
We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:
byte[] buf = new byte[4096];
How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:
int read = is.read(buf);
This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 at the end of the file. Then we process the bytes:
//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);
process() in the above example is your processing method. It takes a byte array and the number of bytes it should process, and returns the result as a byte array.
Last, we write the result back to a file:
os.write(ret);
We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:
int read = 0;
while((read = is.read(buf)) > 0) {
byte[] ret = process(buf, read);
os.write(ret);
}
And finally, close the streams:
is.close();
os.close();
And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could also send it over TCP, or drop it if it's not needed, or read from TCP instead of a file. The basic logic is the same.
This still needs proper error handling to cope with missing files or wrong permissions, but that's up to you to implement.
An example implementation of the process method:
//returns the hex-representation of the bytes, encoded as ASCII characters
public static byte[] process(byte[] bytes, int length) {
final char[] hexchars = "0123456789ABCDEF".toCharArray();
// two hex digits per input byte; byte[] so it matches the return type
byte[] ret = new byte[length * 2];
for ( int i = 0; i < length; ++i) {
int b = bytes[i] & 0xFF;
ret[i * 2] = (byte) hexchars[b >>> 4];
ret[i * 2 + 1] = (byte) hexchars[b & 0x0F];
}
return ret;
}
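Putting the pieces of this answer together, with the error handling it leaves open, could look like the sketch below. It assumes Java 7 or later for try-with-resources; the class name ChunkedHex and the method names are invented for this example, and the 4096-byte chunk size is just the value used above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedHex {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hex-encodes the first `length` bytes of `bytes` as ASCII characters.
    static byte[] toHex(byte[] bytes, int length) {
        byte[] out = new byte[length * 2];
        for (int i = 0; i < length; i++) {
            int b = bytes[i] & 0xFF;
            out[i * 2] = (byte) HEX[b >>> 4];
            out[i * 2 + 1] = (byte) HEX[b & 0x0F];
        }
        return out;
    }

    // Converts `in` to hex in 4096-byte chunks and writes the result to `out`.
    // Both streams are closed automatically, even if an exception is thrown.
    public static void convert(Path in, Path out) throws IOException {
        try (InputStream is = Files.newInputStream(in);
             OutputStream os = Files.newOutputStream(out)) {
            byte[] buf = new byte[4096];
            int read;
            while ((read = is.read(buf)) != -1) {
                os.write(toHex(buf, read)); // only the bytes actually read
            }
        }
    }
}
```

Memory use stays at one chunk plus its hex form, regardless of how large the input file is.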

difference between input.read and input.read(array, offset, length)

I'm trying to understand how inputstreams work. The following block of code is one of the many ways to read data from a text file:-
File file = new File("./src/test.txt");
InputStream input = new BufferedInputStream (new FileInputStream(file));
int data;
while ((data = input.read()) != -1) // -1 means we reached the end of the file
{
// read() returns the next byte as an int in the range 0-255, so a is 97 and b is 98
System.out.println(data + " " + (char) data); // prints the number, a space, then the character
}
input.close();
Now, to use input.read(byte[], offset, length), I have this code, which I found online:
File file = new File("./src/test.txt");
InputStream input = new BufferedInputStream (new FileInputStream(file));
int totalBytesRead = 0, bytesRemaining, bytesRead;
byte[] result = new byte[ ( int ) file.length()];
while ( totalBytesRead < result.length )
{
bytesRemaining = result.length - totalBytesRead;
bytesRead = input.read ( result, totalBytesRead, bytesRemaining );
if ( bytesRead > 0 )
totalBytesRead = totalBytesRead + bytesRead;
//printing integer version of bytes read
for (int i = 0; i < bytesRead; i++)
System.out.print(result[i] + " ");
System.out.println();
//printing character version of bytes read
for (int i = 0; i < bytesRead; i++)
System.out.print((char)result[i]);
}
input.close();
I'm assuming that based on the name BYTESREAD, this read method is returning the number of bytes read. In the documentation, it says that the function will try to read as many as possible. So there might be a reason why it wouldn't.
My first question is: What are these reasons?
I could replace that entire while loop with one line of code: input.read(result, 0, result.length)
I'm sure the creator of the article thought about this. It's not about the output because I get the same output in both cases. So there has to be a reason. At least one. What is it?
The documentation of read(byte[], int, int) says that it:
Reads up to len bytes of data.
An attempt is made to read as many as len bytes
A smaller number may be read.
Since we are working with files that are right there on our hard disk, it seems reasonable to expect that the attempt will read the whole file, but input.read(result, 0, result.length) is not guaranteed to read the whole file (the documentation does not promise it anywhere). Relying on undocumented behavior is a source of bugs when that behavior changes.
For instance, the file stream may be implemented differently in other JVMs, some OS may impose a limit on the number of bytes that you may read at once, the file may be located in the network, or you may later use that piece of code with another implementation of stream, which doesn't behave in that way.
Alternatively, if you are reading the whole file into an array, you could use DataInputStream.readFully, which keeps reading until the array is full.
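A minimal sketch of the readFully approach, assuming the file fits in a single array (the class and method names are placeholders chosen for this example):

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadWholeFile {
    public static byte[] readWholeFile(File file) throws IOException {
        long size = file.length();
        if (size > Integer.MAX_VALUE) {
            throw new IOException(file + " is too large for a single array");
        }
        byte[] result = new byte[(int) size];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // Blocks until result is completely filled; throws EOFException
            // if the stream ends before that.
            in.readFully(result);
        }
        return result;
    }
}
```

Unlike a single call to read(result, 0, result.length), this either fills the array or fails with an exception, so there is no silent short read.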
About the loop with read(), it reads a single byte each time. That reduces performance if you are reading a big chunk of data, since each call to read() will perform several tests (has the stream ended? etc) and may ask the OS for one byte. Since you already know that you want file.length() bytes, there is no reason for not using the other more efficient forms.
Imagine you are reading from a network socket, not from a file. In this case you don't have any information about the total number of bytes in the stream. You would allocate a buffer of fixed size and read from the stream in a loop. During one iteration of the loop you can't expect BUFFERSIZE bytes to be available in the stream, so you fill the buffer as far as possible and iterate again until the buffer is full. This can be useful if you have data blocks of fixed size, for example serialized objects.
ArrayList<MyObject> list = new ArrayList<MyObject>();
try {
InputStream input = socket.getInputStream();
byte[] buffer = new byte[1024];
int bytesRead;
int off = 0;
int len = 1024;
while(true) {
bytesRead = input.read(buffer, off, len);
if(bytesRead == len) {
list.add(createMyObject(buffer));
// reset variables
off = 0;
len = 1024;
continue;
}
if(bytesRead == -1) break;
// buffer is not full, adjust size
off += bytesRead;
len -= bytesRead;
}
} catch(IOException io) {
// stream was closed
}
P.S. The code is not tested and is only meant to show how this function can be useful.
You specify the amount of bytes to read because you might not want to read the entire file at once or maybe you couldn't or might not want to create a buffer as large as the file.

Divide the video to bytes

I want to convert a video to bytes. My code gives me a result, but I think the result is not correct, because I tested it with different videos and it gives me the same result.
Can anyone please help me with how to convert a video to bytes?
String filename = "D:/try.avi";
byte[] myByteArray = filename.getBytes();
for(int i = 0; i<myByteArray.length;i ++)
{
System.out.println(myByteArray[i]);
}
Any help, please?
String filename = "D:/try.avi";
byte[] myByteArray = filename.getBytes();
That is converting the file name to bytes, not the file content.
As for reading the content of the file, see the Basic I/O lesson of the Java Tutorial.
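To make the difference concrete, here is a small sketch that reads bytes from the file itself rather than from its name. The class name FileHead and the method readHead are invented for this example; it reads only the first few bytes so that a large video does not have to fit in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileHead {
    // Reads at most `max` bytes from the start of the file.
    public static byte[] readHead(Path path, int max) throws IOException {
        byte[] head = new byte[max];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            // read() may return fewer bytes than requested, so keep going
            // until the array is full or the file ends.
            while (total < max && (n = in.read(head, total, max - total)) != -1) {
                total += n;
            }
        }
        return Arrays.copyOf(head, total); // trim if the file was shorter
    }
}
```

For example, readHead(Paths.get("D:/try.avi"), 16) returns the first 16 bytes of the video, whereas "D:/try.avi".getBytes() only ever returns the encoded characters of the path.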
Videos in the same container format start with the same bytes; it is the codec that determines the encoded video data inside the file.
I suggest you read more about container file formats and codecs first if you plan developing video applications.
But you have a different problem. As Andrew Thompson correctly pointed out, you are getting the bytes of the filename string.
The correct approach would be:
File fl = new File("D:\\video.avi"); // a backslash must be escaped inside a Java string
byte[] myByteArray = getBytesFromFile(fl);
Please also bear in mind that terminals usually have a fixed buffer size (on Windows, it's several lines), so outputting a big chunk of data will display only the last several lines of it.
Edit: Here's an implementation of getBytesFromFile; a Java expert may offer a more standard approach.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static byte[] getBytesFromFile(File file) throws IOException {
// Get the size of the file
long length = file.length();
if (length > Integer.MAX_VALUE) {
// File is too large to fit in a single byte array
throw new IOException(file.getPath() + " is too big");
}
// Create the byte array to hold the data (Java zero-initializes it)
byte[] bytes = new byte[(int) length];
// try-with-resources closes the stream even if reading fails
try (InputStream is = new FileInputStream(file)) {
// Read in the bytes; read() may return fewer than requested
int offset = 0;
int numRead = 0;
while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
throw new IOException("Could not completely read file " + file.getName());
}
}
return bytes;
}
If you want to read the contents of the video file, open it through a File:
String filename = "D:/try.avi";
File file = new File(filename);
byte[] myByteArray = new byte[(int) file.length()];
// "r" is enough for reading; readFully keeps reading until the array is full
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
raf.readFully(myByteArray);
}

Java reading file into memory and how not to blow up memory

I'm a bit of a newbie in Java and I'm trying to perform a MAC calculation on a file.
Now since the size of the file is not known at runtime, I can't just load all of the file into memory. So I wrote the code so it would read it in bits (4k in this case).
The issue I'm having is that I tried loading the entire file into memory to see if both methods produce the same hash. However, they seem to be producing different hashes.
Here's the bit by bit code:
FileInputStream fis = new FileInputStream("sbs.dat");
byte[] file = new byte[4096];
m = Mac.getInstance("HmacSHA1");
int i=fis.read(file);
m.init(key);
while (i != -1)
{
m.update(file);
i=fis.read(file);
}
mac = m.doFinal();
And here's the all at once approach:
File f = new File("sbs.dat");
long size = f.length();
byte[] file = new byte[(int) size];
fis.read(file);
m = Mac.getInstance("HmacSHA1");
m.init(key);
m.update(file);
mac = m.doFinal();
Shouldn't they both produce the same hash?
The question however is more generic. Is the first code the correct way of loading a file into memory in pieces and performing whatever we want to do inside the while loop (socket send, ciphering a file, etc.)?
This question is useful because every tutorial I've seen just loads everything at once...
Update: Working :-D. Will this approach work properly sending a file in pieces through a socket?
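No. You have no guarantee that fis.read(file) will read file.length bytes. This is why read() returns an int: to tell you how many bytes it has actually read. Calling m.update(file) therefore also feeds stale bytes left over from the previous read into the MAC whenever a read comes up short.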
You should instead do this:
m.init(key);
int i=fis.read(file);
while (i != -1)
{
m.update(file, 0, i);
i=fis.read(file);
}
This takes advantage of the Mac.update(byte[] data, int offset, int len) method, which allows you to specify the length of the actual data in the byte[] array.
The read function will not necessarily fill up your entire array. So, you need to check how many bytes were returned from the read function, and only use that many bytes of your buffer.
Just like Jason LeBrun says - The read method will not always read the specified amount of bytes. For example: What do you think will happen if the file does not contain a multiple of 4096 bytes?
I would go for something like this:
FileInputStream fis = new FileInputStream(filename);
byte[] buffer = new byte[buffersize];
Mac m = Mac.getInstance("HmacSHA1");
m.init(key);
int n;
while ((n = fis.read(buffer)) != -1)
{
m.update(buffer, 0, n);
}
byte[] mac = m.doFinal();
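To check that the chunked loop and the all-at-once approach agree, both can be wrapped in methods and compared on the same file. This is a sketch only: the class name FileMac, the method names and the way the key is built from raw bytes with SecretKeySpec are assumptions made for the example, since the question does not show how key is created.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class FileMac {
    // Computes HmacSHA1 over the file, reading it in chunks of bufferSize bytes.
    public static byte[] chunked(Path file, byte[] key, int bufferSize)
            throws IOException, GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                mac.update(buffer, 0, n); // only the n bytes just read
            }
        }
        return mac.doFinal();
    }

    // Computes the same MAC with the whole file in memory (small files only).
    public static byte[] allAtOnce(Path file, byte[] key)
            throws IOException, GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(key, "HmacSHA1"));
        return mac.doFinal(Files.readAllBytes(file));
    }
}
```

Comparing the two results with Arrays.equals on a file whose size is not a multiple of the buffer size exercises exactly the short final read that broke the original loop.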

Issue with Zipped Streams from .Net and reading them from Java

I'm trying to zip a stream in .Net so that it can be read from Java code.
As input I have a byte array, which I want to compress, and I expect to get a binary array back.
I've tested with SharpZipLib and DotNetZip to produce the compressed byte array,
but unfortunately I always get an error when trying to uncompress it using the java.util.zip.Deflater class in Java.
Does someone have a code sample of compressing a String or a byte array with .Net and decompressing it with the java.util.zip.Deflater class?
You shouldn't need to touch Deflater. Deflater compresses data (Inflater is its decompressing counterpart), and both work on the raw data of individual entries rather than on a zip file as a whole.
ZipInputStream is the odd class to go for. There is also ZipFile if you really need to go for random access to an actual file (for many reasons, I wouldn't recommend it).
Inflater doesn't read zip streams. It reads ZLIB (or DEFLATE) streams. The ZIP format surrounds a pure DEFLATE stream with additional metadata. Inflater doesn't handle that metadata.
If you are inflating on the Java side, you need Inflater.
On the .NET side you can use the Ionic.Zlib.ZlibStream class from DotNetZip to compress - in other words to produce something the Java Inflater can read.
I've just tested this; this code works. The Java side decompresses what the .NET side has compressed.
.NET side:
byte[] compressed = Ionic.Zlib.ZlibStream.CompressString(originalText);
File.WriteAllBytes("ToInflate.bin", compressed);
Java side:
public void Run()
throws java.io.FileNotFoundException,
java.io.IOException,
java.util.zip.DataFormatException,
java.io.UnsupportedEncodingException,
java.security.NoSuchAlgorithmException
{
String filename = "ToInflate.bin";
File file = new File(filename);
InputStream is = new FileInputStream(file);
// Get the size of the file
int length = (int)file.length();
byte[] deflated = new byte[length];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < deflated.length
&& (numRead=is.read(deflated, offset, deflated.length-offset)) >= 0) {
offset += numRead;
}
// Decompress the bytes
Inflater decompressor = new Inflater();
decompressor.setInput(deflated, 0, length);
byte[] result = new byte[100];
int totalRead= 0;
while ((numRead = decompressor.inflate(result)) > 0)
totalRead += numRead;
decompressor.end();
System.out.println("Inflate: total size of inflated data: " + totalRead + "\n");
result = new byte[totalRead];
decompressor = new Inflater();
decompressor.setInput(deflated, 0, length);
int resultLength = decompressor.inflate(result);
decompressor.end();
// Decode the bytes into a String
String outputString = new String(result, 0, resultLength, "UTF-8");
System.out.println("Inflate: inflated string: " + outputString + "\n");
}
(I'm kinda rusty at Java so it might stand some improvement, but you get the idea)
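One possible improvement is to inflate in a single pass into a growing buffer, instead of inflating twice to learn the size first. The sketch below assumes the input is a complete ZLIB stream held in memory; the class name ZlibRoundTrip is made up, and the deflate method is only a Java stand-in for the .NET side so the example can be run on its own.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibRoundTrip {
    // Inflates a complete ZLIB stream held in memory.
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                // No progress and not finished: the input is truncated or
                // needs a preset dictionary, so stop instead of looping forever.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported ZLIB stream");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native resources
        }
    }

    // Stand-in for the .NET side: produces a ZLIB stream with Deflater.
    public static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                out.write(chunk, 0, deflater.deflate(chunk));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }
}
```

The bytes returned by inflate can then be decoded with new String(bytes, "UTF-8"), as in the code above.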
Here is the page from Sun on ZipStreams:
http://java.sun.com/developer/technicalArticles/Programming/compression/
Another library that deals with ZipStreams is POI. It is more focused on working with MS Office XML format docs, but it might offer some different insights into how to handle the stream. http://poi.apache.org/apidocs/org/apache/poi/openxml4j/opc/internal/marshallers/ZipPartMarshaller.html
