SocketChannel.write(ByteBuffer[]) "corrupting" data - java

Problem: Corrupt TCP segment.
I send a sequence of ByteBuffers over and over in a SocketChannel. The sequence is the following:
\r\n
length of chunk (example: fff)
\r\n
chunk data (rubbish, a 1000 - 5000 character long string)
\r\n
length of next chunk (example: fff)
\r\n
next chunk data (rubbish, a 1000 - 5000 character long string)
...
I hope you see the pattern. The MTU on the network level is about 1500 bytes, so TCP splits the "chunk data" across several segments.
The problem in the segments: somehow(?), randomly(?), a segment's payload starts with \r\n instead of the remaining bytes of the "chunk data" that should come first.
So you get for example:
(segment 1)
\r\n
length of chunk (example: fff)
\r\n
chunk data (456 bytes)
(segment 2)
\r\n
chunk data (remaining 156 bytes)
length of next
\r\n
Instead of:
(segment 1)
\r\n
length of chunk (example: fff)
\r\n
chunk data (456 bytes)
(segment 2)
chunk data (remaining 156 bytes)
\r\n
length of next
\r\n
I'd like to know whether Java code can even cause that, given that the "chunk data" ByteBuffer itself arrives correctly; it's only the ByteBuffer containing \r\n that cuts in early...
Any help is welcome, thank you for your time!
Andrew

I will bet that you are ignoring the result of a read or write. TCP does not lose or corrupt data and neither do the Socket APIs or the Java networking libraries. At least I've never seen it in about 22 years of network programming and 14 years of Java.
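A minimal sketch of a gathering write that honors the return value (the writeFully helper is hypothetical, not the asker's code). SocketChannel.write(ByteBuffer[]) may consume only part of the buffers in a single call, so it has to be called until every buffer is drained:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

static void writeFully(SocketChannel channel, ByteBuffer[] buffers) throws IOException {
    long remaining = 0;
    for (ByteBuffer b : buffers) remaining += b.remaining();
    while (remaining > 0) {
        // write() reports how many bytes it actually consumed; even on a
        // blocking channel this can be less than the total for large writes
        long written = channel.write(buffers);
        remaining -= written;
        // on a non-blocking channel written may be 0; a real implementation
        // would register OP_WRITE with a Selector instead of spinning
    }
}
If the \r\n buffer is written in a separate call while a previous gathering write only partially completed, you get exactly the interleaving described in the question.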

It is not a network issue but a problem in the way the code reads and writes. If you read and write data in chunks, the last chunk can corrupt the output: the final read usually fills the buffer only partially, and the rest of the array still holds stale bytes (or the default value 0).
The following example shows a solution:
ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
// something local returns an InputStream from the server
InputStream dataInputStream = local.downloadFile(fileToBeRead);
int chunkSize = 10000;
byte[] chunkByteArray = new byte[chunkSize];
int bytesRead;
while ((bytesRead = dataInputStream.read(chunkByteArray)) != -1) {
    if (bytesRead < chunkSize) {
        // partially filled last chunk: trim it so no stale bytes are written
        System.out.println("Last chunk is " + bytesRead);
        chunkByteArray = getLastChunkByteArray(bytesRead, chunkByteArray);
    }
    out.write(chunkByteArray);
}
dataInputStream.close();
dataInputStream.close();
And the method:
private byte[] getLastChunkByteArray(int noOfBytesRead, byte[] partialFilledChunk) {
    byte[] lastChunk = new byte[noOfBytesRead];
    for (int i = 0; i < noOfBytesRead; i++) {
        lastChunk[i] = partialFilledChunk[i];
    }
    return lastChunk;
}
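As a side note, the trim-and-copy step can be avoided entirely by telling write() how many bytes to send; a sketch under the same assumptions (the dataInputStream and out streams from above):
byte[] chunk = new byte[10000];
int n;
while ((n = dataInputStream.read(chunk)) != -1) {
    // write only the bytes this read produced, so stale bytes
    // from an earlier iteration are never sent
    out.write(chunk, 0, n);
}
dataInputStream.close();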

Related

Sending strings (bytes) over Bluetooth in Android

I'm implementing sending data through Bluetooth in Android between two Android devices. For simplicity, I'll most likely pass strings and parse them as JSON or another string format.
Now I'm wondering, how should I read the data to be sure, that I've received all of them? Currently I'm taking the following approach:
byte[] buffer = new byte[1024];
while (!finished) {
    // MemoryStream is my own class
    MemoryStream ms = new MemoryStream();
    int bytesRead = 0;
    do {
        bytesRead = input.read(buffer, 0, buffer.length);
        ms.write(buffer, 0, bytesRead);
    } while (bytesRead == buffer.length);
    // now process the data
}
However, this will only work if input.read() always returns as many bytes as were sent by the other device.
For example, I'm assuming that if the first device sends 1234 bytes, the first call to read will return 1024 and the second 210. Or, if the first device sends exactly 1024 bytes, the first call will return 1024 and the second -1 (stream end).
Am I right? Or should I implement my own mechanism to determine whether all sent data was received (or whether I should wait for more to complete the current chunk)?
The answer is: no. It is possible for a read (in terms of input.read) to return before the whole sent buffer has arrived.
You have to frame the transmission yourself, preferably by preceding the data with its size in bytes and then reading until all of it has been transferred, as in the sketch below.
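A minimal sketch of such framing, assuming the two devices agree on a 4-byte length prefix and using DataOutputStream/DataInputStream around the socket streams (the sendFramed/receiveFramed names are made up for illustration):
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// sender: prefix the payload with its length
static void sendFramed(OutputStream out, byte[] payload) throws IOException {
    DataOutputStream dos = new DataOutputStream(out);
    dos.writeInt(payload.length); // 4-byte big-endian length header
    dos.write(payload);
    dos.flush();
}

// receiver: read the length, then block until exactly that many bytes arrived
static byte[] receiveFramed(InputStream in) throws IOException {
    DataInputStream dis = new DataInputStream(in);
    int length = dis.readInt();
    byte[] payload = new byte[length];
    dis.readFully(payload); // loops internally until the full payload is read
    return payload;
}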

Java Reading large files into byte array chunk by chunk

So I've been trying to make a small program that inputs a file into a byte array, then it will turn that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save it as a custom file.
I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).
This is the code that is not a complete failure
public void rundis(Path pp) {
    byte[] bb = null;
    try {
        bb = Files.readAllBytes(pp); // Files.toByteArray(pathhold);
        System.out.println("byte array made");
    } catch (Exception e) {
        e.printStackTrace();
    }
    if (bb != null && bb.length != 0) { // check for null before dereferencing
        System.out.println("byte array filled");
        // send to method to turn into hex
    } else {
        System.out.println("byte array NOT filled");
    }
}
I know how the process should go, but I don't know how to code that properly.
The process, if you are interested:
Input file using File
Read the file chunk by chunk into a byte array, e.g. 600 bytes per chunk
Send that chunk to be turned into a hex value --> Integer.toHexString
Send that hex chunk to be made into a binary value --> Integer.toBinaryString
Mess around with the binary value
Save to custom file line by line
Problem: I don't know how to turn a huge file into a byte array chunk by chunk to be processed.
Any and all help will be appreciated, thank you for reading :)
To chunk your input use a FileInputStream:
Path pp = FileSystems.getDefault().getPath("logs", "access.log");
final int BUFFER_SIZE = 1024 * 1024; // buffer size in bytes
FileInputStream fis = new FileInputStream(pp.toFile());
byte[] buffer = new byte[BUFFER_SIZE];
int read = 0;
while ((read = fis.read(buffer)) > 0) {
    // call your other methods here...
}
fis.close();
To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files, but as you noticed not so much for large files.
In pseudocode it would look something like this:
while there are more bytes available
read some bytes
process those bytes
(write the result back to a file, if needed)
In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:
FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));
We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:
byte[] buf = new byte[4096];
How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:
int read = is.read(buf);
this will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 at the end of the stream. Then we process the bytes:
//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);
process() in above example is your processing method. It takes in a byte-array, the number of bytes it should process and returns the result as byte-array.
Last, we write the result back to a file:
os.write(ret);
We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:
int read = 0;
while ((read = is.read(buf)) > 0) {
    byte[] ret = process(buf, read);
    os.write(ret);
}
and finally close the streams
is.close();
os.close();
And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could send it over TCP, drop it if it's not needed, or even read from TCP instead of a file; the basic logic is the same.
This still needs proper error handling for missing files and wrong permissions, but that's up to you to implement.
An example implementation of the process method:
// returns the hex representation of the bytes (two ASCII bytes per input byte)
public static byte[] process(byte[] bytes, int length) {
    final char[] hexchars = "0123456789ABCDEF".toCharArray();
    byte[] ret = new byte[length * 2];
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF;
        ret[i * 2] = (byte) hexchars[b >>> 4];
        ret[i * 2 + 1] = (byte) hexchars[b & 0x0F];
    }
    return ret;
}

TCP socket data getting scrambled

I have a multi-threaded TCP socket listener program. I do a blocking read for a fixed number of bytes (128 bytes and multiples of it), so my packet sizes are 128, 256, 384 and 512 bytes.
I am having a problem because sometimes the data gets mixed up in the socket. For example:
Supposed to read:
<header><data payload(padded with zeros to compensate size)><footer>
ex-- ABCDddddddddd0000000000WXYZ
What I sometimes read:
ex-- ABCDdd00000000000000000dddddd00
and then the next packet looks like
00000WXYZABCDddddd00000000000000000
So I close the socket, and we have defined the protocol to resend the last 2 or 3 packets to avoid losing data.
My questions are:
1. Why does the data get scrambled/mixed up?
2. Can it be avoided by any means?
Here is my code for reading data:
in = new DataInputStream(conn.getInputStream());
outStream = conn.getOutputStream();
while (m_bRunThread) {
    // read incoming stream
    in.readFully(rec_data_in_byte, 0, 128); // blocks until 128 bytes are read from the socket
    {
        // convert the read byte array into a string
        // find the size at a particular position; it determines whether further reads are required:
        // if the size is a multiple of 128 and that multiple is higher than 1, more reads are needed
        if ((Integer.parseInt(SIZE) % 128 == 0) && ((Integer.parseInt(SIZE) / 128) > 1)) {
            for (int z = 1; z < lenSIZE; z++) {
                in.readFully(rec_data_in_byte1, 0, 128); // changed from in.read(rec_data_in_byte1, 0, 128) as per suggestions
            }
            // extract, validate and process the data
        }
    }
}
UPDATE:
Implemented Peter's fix, but the problem still persists; data is getting scrambled.
Here are a few extra lines of code where the byte array is converted into a string:
byte[] REC_data = new byte[1024];
System.arraycopy(rec_data_in_byte1, 0, REC_data, 128 * z, 128);
rec_data_string = MyClass2.getData(REC_data, 0, Integer.parseInt(SIZE) - 1, Integer.parseInt(SIZE));
The getData() method is below:
String msg = ""; // the return String
int count = 1;
for (int i = 0; i < datasize; i++) {
    if (i >= startindex) {
        if (count <= lengthofpacket) {
            msg += String.valueOf((char) (bytedata[i]));
            count++;
        }
    }
}
return msg;
Can any of this be the reason for the scrambling?
P.S. The scrambling happens in the same way as it did before.
When you do
int lengthActuallyRead = in.read(rec_data_in_byte1, 0, 128);
You need to check the length read. Otherwise it might read 1 byte, or anything up to 128 in this case. Note, any bytes after what was actually read are untouched so they might be 0 or they could be garbage left from a previous message.
If you expect 128 bytes you can use readFully as you did previously
in.readFully(rec_data_in_byte, 0, 128);
Note: If the amount remaining is less than 128 you might want to do this.
int remaining = size - sizeReadSoFar;
int length = in.read(rec_data_in_byte1, 0, remaining);
This prevents you reading part of the next message while you are still reading the old one.
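Combining both points, a minimal sketch (the readMessage helper is hypothetical, not the asker's code) that reads exactly size bytes and never consumes bytes belonging to the next message:
import java.io.DataInputStream;
import java.io.IOException;

static byte[] readMessage(DataInputStream in, int size) throws IOException {
    byte[] message = new byte[size];
    // readFully loops internally until all `size` bytes have arrived
    // and throws EOFException if the connection closes mid-message
    in.readFully(message);
    return message;
}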

Is there any way to get the size in bytes of a string in Java?

I need the size in bytes of each line in a file, so I can get a percentage of the file read. I already got the size of the file with file.length(), but how do I get each line's size?
final String hello_str = "Hello World";
hello_str.getBytes().length is the "byte size", i.e. the number of bytes
You need to know the encoding - otherwise it's a meaningless question. For example, "foo" is 6 bytes in UTF-16, but 3 bytes in ASCII. Assuming you're reading a line at a time (given your question) you should know which encoding you're using as you should have specified it when you started to read.
You can call String.getBytes(charset) to get the encoded representation of a particular string.
Do not just call String.getBytes() as that will use the platform default encoding.
Note that all of this is somewhat make-work... you've read the bytes, decoded them to text, then you're re-encoding them into bytes...
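For illustration, a minimal sketch using the StandardCharsets constants, which avoid both the platform default and the checked UnsupportedEncodingException (the sample string is made up):
import java.nio.charset.StandardCharsets;

String line = "föö";
int utf8Len = line.getBytes(StandardCharsets.UTF_8).length;   // 5: 'f' is 1 byte, each 'ö' is 2
int utf16Len = line.getBytes(StandardCharsets.UTF_16).length; // 8: 2-byte BOM + 3 chars of 2 bytes each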
You probably use something like the following to read the file:
FileInputStream fis = new FileInputStream(path);
BufferedReader br = new BufferedReader(new InputStreamReader(fis, "UTF-8"));
String line;
while ((line = br.readLine()) != null) {
    /* process line */
    /* report percentage */
}
You need to specify the encoding right at the beginning. If you don't, you should get UTF-8 on Android: it is the default, but the default can be changed. I would assume that no device actually does that, though.
To repeat what the other answers already stated: the character count is not always the same as the byte count. The UTF encodings in particular are tricky. There are currently 249,764 assigned Unicode characters, with room for potentially over a million (WP), and UTF uses 1 to 4 bytes to be able to encode all of them. UTF-32 is the simplest case since it always uses 4 bytes. UTF-8 does it dynamically, using 1 to 4 bytes; simple ASCII characters use just 1 byte. (source: UTF & BOM FAQ)
To get the number of bytes you can use e.g. line.getBytes("UTF-8").length. One big disadvantage is that this is quite inefficient, since it creates a copy of the String's internal array each time and throws it away afterwards. That is performance tip #1 in Android | Performance Tips.
It is also not 100% accurate in terms of the actual bytes read from the file, for the following reasons:
UTF-16 text files, for example, often start with a special 2-byte BOM (Byte Order Mark) that signals whether they are to be interpreted little- or big-endian. Those 2 bytes (UTF-8: 3, UTF-32: 4) are not reported when you just look at the String you get from your reader, so you are already some bytes off here.
Encoding every line back to UTF-16 via getBytes includes those BOM bytes again for each line, so getBytes reports 2 bytes too many per line.
Line-ending characters are not part of the resulting line String. To make things worse, there are different ways of signaling the end of a line: usually the Unix-style '\n', which is only 1 character, or the Windows-style '\r''\n', which is two. The BufferedReader simply skips them, so your calculation is missing a variable number of bytes here: from 1 byte for Unix/UTF-8 up to 8 bytes for Windows/UTF-32.
The last two errors would cancel each other out for Unix line endings with UTF-16, but that is probably not the typical case. The effect of the error also depends on line length: if you are off by 4 bytes on every line that is only 10 bytes long in total, your progress will be considerably wrong (if my math is good, your progress would be at 140% or 60% after the last line, depending on whether your calculation assumes -4 or +4 bytes per line).
That means so far that regardless of what you do, you get no more than an approximation.
Getting the actual byte count could probably be done if you write your own special byte-counting Reader, but that would be quite a lot of work.
An alternative is a custom InputStream that counts how many bytes are actually read from the underlying stream. That's not too hard to do, and it does not care about encodings.
The big disadvantage is that it does not increase linearly with the lines you read, since BufferedReader fills its internal buffer and reads lines from there, then reads the next chunk from the file, and so on. If the buffer is large enough you are already at 100% at the first line. But I assume your files are big enough, or you would not want to track progress in the first place.
This, for example, would be such an implementation. It works, but I can't guarantee that it is perfect. It won't work if streams use mark() and reset(); file reading should not do that, though.
static class CountingInputStream extends FilterInputStream {
    private long bytesRead;

    protected CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int result = super.read();
        if (result != -1) bytesRead += 1;
        return result;
    }

    @Override
    public int read(byte[] b) throws IOException {
        int result = super.read(b);
        if (result != -1) bytesRead += result;
        return result;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int result = super.read(b, off, len);
        if (result != -1) bytesRead += result;
        return result;
    }

    @Override
    public long skip(long n) throws IOException {
        long result = super.skip(n); // skip() never returns -1, just 0
        if (result > 0) bytesRead += result;
        return result;
    }

    public long getBytesRead() {
        return bytesRead;
    }
}
Using the following code
File file = new File("mytestfile.txt");
int linesRead = 0;
long progress = 0;
long fileLength = file.length();
String line;
CountingInputStream cis = new CountingInputStream(new FileInputStream(file));
BufferedReader br = new BufferedReader(new InputStreamReader(cis, "UTF-8"), 8192);
while ((line = br.readLine()) != null) {
    long newProgress = cis.getBytesRead();
    if (progress != newProgress) {
        progress = newProgress;
        int percent = (int) ((progress * 100) / fileLength);
        System.out.println(String.format("At line: %4d, bytes: %6d = %3d%%", linesRead, progress, percent));
    }
    linesRead++;
}
System.out.println("Total lines: " + linesRead);
System.out.println("Total bytes: " + fileLength);
br.close();
I get output like
At line: 0, bytes: 8192 = 5%
At line: 82, bytes: 16384 = 10%
At line: 178, bytes: 24576 = 15%
....
At line: 1621, bytes: 155648 = 97%
At line: 1687, bytes: 159805 = 100%
Total lines: 1756
Total bytes: 159805
or in case of the same file UTF-16 encoded
At line: 0, bytes: 24576 = 7%
At line: 82, bytes: 40960 = 12%
At line: 178, bytes: 57344 = 17%
.....
At line: 1529, bytes: 303104 = 94%
At line: 1621, bytes: 319488 = 99%
At line: 1687, bytes: 319612 = 100%
Total lines: 1756
Total bytes: 319612
Instead of printing that you could update your progress.
So, what is the best approach?
If you know that you have simple ASCII text in an encoding that uses only 1 byte for those characters: just use String#length() (and maybe add +1 or +2 for the line ending)
String#length() is fast and simple and as long as you know what files you have you should have no problems.
If you have international text, where the simple approach won't work:
for smaller files, where processing each line takes rather long: String#getBytes(); the longer processing one line takes, the lower the impact of the temporary arrays and their garbage collection. The inaccuracy should be within acceptable bounds; just make sure not to crash if progress ends up above or below 100% at the end.
for larger files: the approach above. The larger the file, the better; updating progress in 0.001% steps just slows things down. Decreasing the reader's buffer size would increase the accuracy, but it also decreases read performance.
If you have enough time: write your own Reader that tells you the exact byte position. Maybe a combination of InputStreamReader and BufferedReader since Reader already operates on characters. Android's implementation may help as starting point.
If the file is an ASCII file, then you can use String.length(); otherwise it gets more complex.
Consider you have a string variable called hello_str:
final String hello_str = "Hello World";

// character length
hello_str.length()    // output will be 11

// encoded sizes
final byte[] utf8Bytes = hello_str.getBytes("UTF-8");
utf8Bytes.length      // output will be 11

final byte[] utf16Bytes = hello_str.getBytes("UTF-16");
utf16Bytes.length     // output will be 24

final byte[] utf32Bytes = hello_str.getBytes("UTF-32");
utf32Bytes.length     // output will be 44

Does Java's BufferedReader leave bytes in its internal buffer after a readLine() call?

I'm having a problem. In my server, after I send a file of X bytes, I send a string saying that file is over and another file is coming, like:
FILE: a SIZE: Y\r\n
send Y bytes
FILE a FINISHED\r\n
FILE b SIZE: Z\r\n
send Z byes
FILE b FINISHED\r\n
FILES FINISHED\r\n
On my client side it is not received properly.
I use readLine() to get the command lines after reading Y or Z bytes from the socket.
With one file it works fine; with multiple files it rarely works (yeah, I don't know how it worked once or twice).
Here is some code I use to transfer binary:
public static void readInputStreamToFile(InputStream is, FileOutputStream fout,
        long size, int bufferSize) throws Exception {
    byte[] buffer = new byte[bufferSize];
    long curRead = 0;
    long totalRead = 0;
    long sizeToRead = size;
    while (totalRead < sizeToRead) {
        if (totalRead + buffer.length <= sizeToRead) {
            curRead = is.read(buffer);
        } else {
            curRead = is.read(buffer, 0, (int) (sizeToRead - totalRead));
        }
        if (curRead == -1) { // guard against a stream that ends early
            throw new EOFException("stream ended before " + size + " bytes were read");
        }
        totalRead = totalRead + curRead;
        fout.write(buffer, 0, (int) curRead);
    }
}
public static void writeFileInputStreamToOutputStream(FileInputStream in, OutputStream out, int bufferSize) throws Exception {
    byte[] buffer = new byte[bufferSize];
    int count = 0;
    while ((count = in.read(buffer)) != -1) {
        out.write(buffer, 0, count);
    }
}
Just for note: I could solve it by replacing readLine() with this code:
ByteArrayOutputStream ba = new ByteArrayOutputStream();
int ch;
while (true) {
    ch = is.read();
    if (ch == -1)
        throw new IOException("Connection closed");
    if (ch == 13) {           // '\r'
        ch = is.read();
        if (ch == 10)         // '\n': end of line reached
            return new String(ba.toByteArray(), "ISO-8859-1");
        else
            ba.write(13);     // lone '\r': keep it and fall through
    }
    ba.write(ch);
}
PS: "is" is my input stream from socket: socket.getInputStream();
still I dont know if its the best implementation to do, im tryinf to figure out
There are no readLine() calls in the code shown here, but to answer your question: yes, calling BufferedReader.readLine() might very well leave data behind in its internal buffer. It is buffering the input.
If you wrap one of your InputStreams in a BufferedReader, you can't really get much sane behavior if you read from the BufferedReader and then later on read from the InputStream directly.
You could read bytes from your InputStream and parse out a text line yourself by looking for a pair of \r\n bytes. When you get a line saying "FILE: a SIZE: Y\r\n", you go on as usual, except that the buffer you used to parse lines may already contain the first few bytes of the file, so write those bytes out first.
Or you use the idea of FTP and use one TCP stream for commands and one TCP stream for the actual transfer, reading from the command stream with a BufferedReader.readLine(), and reading the data as you already do with an InputStream.
Yes, the main point of a BufferedReader is to buffer the data. It is reading input from its underlying Reader in bigger chunks to avoid having multiple small reads.
That it has a readLine() method is just a nice bonus which is made easily possible by the buffering.
You may want to use a DataInputStream (on top of a BufferedInputStream) and its readLine() method, if you really have to mix text and binary data over the same connection, and read the binary data from the same DataInputStream. (But take care about the encoding here; DataInputStream.readLine() is deprecated for exactly that reason.)
Call flush() on the OutputStream after you've written data that you want to be certain has been sent. So essentially at the end of each file call flush().
I guess you must flush your output stream to make sure any buffered bytes are properly sent down the stream. Closing the stream will trigger the same flush.
The Javadocs for flush() say:
Flushes this output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that, if any bytes previously written have been buffered by the implementation of the output stream, such bytes should immediately be written to their intended destination.
