Does read() return -1 if EOF is reached during the read operation, or on the subsequent call? The Java docs aren't entirely clear on this and neither is the book I'm reading.
The following code from the book reads a file containing three types of values repeating: a double, a string of varying length, and a binary long. The buffer is expected to fill at some arbitrary point in the middle of any of the values, and the code handles it. What I don't understand is: if the -1 is returned during the read operation, won't the last values miss being output in the printf statement?
try (ReadableByteChannel inCh = Files.newByteChannel(file)) {
    ByteBuffer buf = ByteBuffer.allocateDirect(256);
    buf.position(buf.limit());
    int strLength = 0;
    byte[] strChars = null;
    while (true) {
        if (buf.remaining() < 8) {
            if (inCh.read(buf.compact()) == -1) {
                break;
            }
            buf.flip();
        }
        strLength = (int) buf.getDouble();
        if (buf.remaining() < 2 * strLength) {
            if (inCh.read(buf.compact()) == -1) {
                System.err.println("EOF found while reading the prime string.");
                break;
            }
            buf.flip();
        }
        strChars = new byte[2 * strLength];
        buf.get(strChars);
        if (buf.remaining() < 8) {
            if (inCh.read(buf.compact()) == -1) {
                System.err.println("EOF found reading the binary prime value.");
                break;
            }
            buf.flip();
        }
        System.out.printf("String length: %3s String: %-12s Binary Value: %3d%n",
                strLength, ByteBuffer.wrap(strChars).asCharBuffer(), buf.getLong());
    }
    System.out.println("\n EOF Reached.");
}
I suggest making a simple test to see how it works, like this:
ReadableByteChannel in = Files.newByteChannel(Paths.get("1.txt"));
ByteBuffer b = ByteBuffer.allocate(100);
System.out.println(in.read(b));
System.out.println(in.read(b));
1.txt contains 1 byte, the test prints
1
-1
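The same behaviour can be reproduced without a file on disk by wrapping an in-memory stream in a channel (class and variable names here are just for the demo). It confirms that read() returns the final bytes normally and only reports -1 on the following call:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class EofTest {
    public static void main(String[] args) throws IOException {
        // A one-byte "file" in memory.
        ReadableByteChannel in = Channels.newChannel(new ByteArrayInputStream(new byte[]{42}));
        ByteBuffer b = ByteBuffer.allocate(100);
        System.out.println(in.read(b)); // 1: the last byte is returned normally
        System.out.println(in.read(b)); // -1: EOF is only reported on the NEXT call
    }
}
```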
I found a lot of answers here, but none of them is really what I want.
Let's say I can send an image with a maximum size of 10000 bytes. So if I send, for example, an image of 25796 bytes, I need to split the byte[] in 3: 10000 + 10000 + 5796.
So I guess I should have a method that receives a byte[] and returns a List<byte[]>, am I correct?
I'm using Arrays.copyOfRange in a loop, but the last bytes are tricky to get (for example those 5796 at the end).
Hope someone can help me.
Thanks a lot!
edit:
I think I had success. I need to test more cases, but I will do that tomorrow since I'm tired right now. If everything is right I will post it as an answer.
Here is what I have right now: (there is an extra method to check if the bytes match in the end)
public List<byte[]> byteSplitter(byte[] origin) {
    List<byte[]> byteList = new LinkedList<>();
    int splitIndex = 10000;
    int currBytes = 0;
    boolean bytesHasSameLength = false;
    while (!bytesHasSameLength) {
        if (splitIndex > origin.length) {
            splitIndex = origin.length;
            bytesHasSameLength = true;
        }
        byteList.add(Arrays.copyOfRange(origin, currBytes, splitIndex));
        currBytes = splitIndex;
        splitIndex += 10000;
    }
    return byteList;
}
public void appendByteAndCheckMatch(byte[] bytes, List<byte[]> byteList) {
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    for (byte[] b : byteList) {
        try {
            output.write(b);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    byte[] bytesFromList = output.toByteArray();
    System.out.println("list bytes: " + bytesFromList.length);
    System.out.println("request bytes: " + bytes.length);
    if (Arrays.equals(bytesFromList, bytes))
        System.out.println("bytes are equals");
    else
        System.out.println("bytes are different");
}
result:
list bytes: 25796
request bytes: 25796
bytes are equals
You need to loop over the original array in increments of "max length" while also ensuring you don't go past the length of the array. This can be done with a single for loop. For example:
public static List<byte[]> split(byte[] source, int maxLength) {
    Objects.requireNonNull(source);
    if (maxLength <= 0) {
        throw new IllegalArgumentException("maxLength <= 0");
    }
    List<byte[]> result = new ArrayList<>();
    for (int from = 0; from < source.length; from += maxLength) {
        int to = Math.min(from + maxLength, source.length);
        byte[] range = Arrays.copyOfRange(source, from, to);
        result.add(range);
    }
    return result;
}
That will handle any non-zero positive value for maxLength, even if the value is greater than or equal to the length of source.
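For instance, exercising the method with the sizes from the question (the split method is repeated here so the snippet compiles on its own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SplitDemo {
    public static List<byte[]> split(byte[] source, int maxLength) {
        Objects.requireNonNull(source);
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength <= 0");
        }
        List<byte[]> result = new ArrayList<>();
        for (int from = 0; from < source.length; from += maxLength) {
            int to = Math.min(from + maxLength, source.length);
            result.add(Arrays.copyOfRange(source, from, to));
        }
        return result;
    }

    public static void main(String[] args) {
        // 25796 bytes split into 10000-byte chunks -> 10000 + 10000 + 5796
        List<byte[]> parts = split(new byte[25796], 10000);
        System.out.println(parts.size());        // 3
        System.out.println(parts.get(2).length); // 5796
    }
}
```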
Note this is copying the data from the original array into new arrays. That means the original array is effectively duplicated which may or may not be acceptable since you now consume at least twice the memory (though you may be throwing away the original). If your use case allows it, and you need to use as little memory as possible, then consider ByteBuffer.
public static List<ByteBuffer> split(byte[] source, int maxLength) {
    return split(ByteBuffer.wrap(source), maxLength);
}

public static List<ByteBuffer> split(ByteBuffer source, int maxLength) {
    Objects.requireNonNull(source);
    if (maxLength <= 0) {
        throw new IllegalArgumentException("maxLength <= 0");
    }
    List<ByteBuffer> result = new ArrayList<>();
    for (int index = source.position(); index < source.limit(); index += maxLength) {
        int length = Math.min(maxLength, source.limit() - index);
        ByteBuffer slice = source.slice(index, length);
        result.add(slice);
    }
    return result;
}
Each ByteBuffer returned in the list uses the same backing data. Of course, one consequence of this is that changes to the source data can affect every buffer. Though if you want you can make the buffers read-only (keep in mind that if you wrap an array then changing the array directly also affects the buffers, and you can't make an array read-only).
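As a side note, the two-argument ByteBuffer.slice(index, length) used above exists since Java 13; on older versions you would set position/limit and call slice(). A small sketch of the shared-backing and read-only behaviour:

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

public class ReadOnlyDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        // Requires Java 13+ for slice(int, int)
        ByteBuffer slice = ByteBuffer.wrap(data).slice(0, 2).asReadOnlyBuffer();
        System.out.println(slice.get(0));      // 1
        data[0] = 9;                           // writing through the array...
        System.out.println(slice.get(0));      // 9: ...still shows through the buffer
        try {
            slice.put(0, (byte) 0);            // but the buffer itself rejects writes
        } catch (ReadOnlyBufferException e) {
            System.out.println("read-only");
        }
    }
}
```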
You can do it pretty easily like this. The end location is adjusted so it does not exceed the original size. (Note the loop condition uses <, not <=; otherwise a length that is an exact multiple of bufSize would produce a trailing empty chunk.)

byte[] array = new byte[123039];
int bufSize = 10_000;
int end = 0;
for (int offset = 0; offset < array.length; offset += bufSize) {
    end += bufSize;
    if (end >= array.length) {
        end = array.length;
    }
    byte[] arr = Arrays.copyOfRange(array, offset, end);
    // do something with arr
}
I'm trying to code a program where I can:
Load a file
Input a start and beginning offset addresses where to scan data from
Scan that offset range in search of specific sequence of bytes (such as "05805A6C")
Retrieve the offset of every match and write them to a .txt file
i66.tinypic.com/2zelef5.png
As the picture shows I need to search the file for "05805A6C" and then print to a .txt file the offset "0x21F0".
I'm using Java Swing for this. So far I've been able to load the file into a byte[]. But I haven't found a way to search for a specific sequence of bytes, nor to restrict that search to a range of offsets.
This is my code that opens and reads the file into a byte[]:
public class Read {
    static public byte[] readBytesFromFile() {
        try {
            JFileChooser chooser = new JFileChooser();
            int returnVal = chooser.showOpenDialog(null);
            if (returnVal == JFileChooser.APPROVE_OPTION) {
                FileInputStream input = new FileInputStream(chooser.getSelectedFile());
                byte[] data = new byte[input.available()];
                input.read(data);
                input.close();
                return data;
            }
            return null;
        } catch (IOException e) {
            System.out.println("Unable to read bytes: " + e.getMessage());
            return null;
        }
    }
}
And my code where I try to search among the bytes.
byte[] model = Read.readBytesFromFile();
String x = new String(model);
boolean found = false;
for (int i = 0; i < model.length; i++) {
    if (x.contains("05805A6C")) {
        found = true;
    }
}
if (found == true) {
    System.out.println("Yes");
} else {
    System.out.println("No");
}
Here's a bomb-proof1 way to search for a sequence of bytes in a byte array:
public boolean find(byte[] buffer, byte[] key) {
    for (int i = 0; i <= buffer.length - key.length; i++) {
        int j = 0;
        while (j < key.length && buffer[i + j] == key[j]) {
            j++;
        }
        if (j == key.length) {
            return true;
        }
    }
    return false;
}
There are more efficient ways to do this for large-scale searching; e.g. using the Boyer-Moore algorithm. However:
converting the byte array to a String and using Java string search is NOT more efficient, and it is potentially fragile depending on what encoding you use when converting the bytes to a string.
converting the byte array to a hexadecimal encoded String is even less efficient ... and memory hungry ... though not fragile if you have enough memory. (You may need up to 5 times the memory as the file size while doing the conversion ...)
1 - bomb-proof, modulo any bugs :-)
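If you also need the offset of every match (as the question asks), the same brute-force scan can return an index instead of a boolean. Note that indexOf here is a hypothetical helper name, not a standard API:

```java
public class ByteSearch {
    // Returns the index of the first occurrence of key at or after 'from', or -1.
    public static int indexOf(byte[] buffer, byte[] key, int from) {
        for (int i = Math.max(from, 0); i <= buffer.length - key.length; i++) {
            int j = 0;
            while (j < key.length && buffer[i + j] == key[j]) {
                j++;
            }
            if (j == key.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, (byte) 0x05, (byte) 0x80, 0x5A, 0x6C, 0x00};
        byte[] key = {(byte) 0x05, (byte) 0x80, 0x5A, 0x6C};
        int at = indexOf(data, key, 0);
        System.out.printf("0x%X%n", at); // 0x1
    }
}
```

Calling it repeatedly with from set past the previous hit collects every offset, which can then be written to the .txt file.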
EDIT: It seems the charset differs from system to system, so you may get different results; I approach it with another method:

String x = HexBin.encode(model);
String b = "05805A6C";
int index = 0;
while ((index = x.indexOf(b, index)) != -1) {
    // note: a match at an odd index straddles two bytes;
    // check (index % 2 == 0) before accepting it
    System.out.println("0x" + Integer.toHexString(index / 2));
    index += 2;
}
I am trying to solve this interview question.
After giving a clear definition of the UTF-8 format (ex: 1 byte:
0b0xxxxxxx, 2 bytes: ...), it asks to write a function to validate
whether the input is valid UTF-8. Input will be a string/byte array;
output should be yes/no.
I have two possible approaches.
First, if the input is a string, since UTF-8 is at most 4 byte, after we remove the first two characters "0b", we can use Integer.parseInt(s) to check if the rest of the string is at the range 0 to 10FFFF. Moreover, it is better to check if the length of the string is a multiple of 8 and if the input string contains all 0s and 1s first. So I will have to go through the string twice and the complexity will be O(n).
Second, if the input is a byte array (we can also use this method if the input is a string), we check if each 1-byte element is in the correct range. If the input is a string, first check the length of the string is a multiple of 8 then check each 8-character substring is in the range.
I know there are couple solutions on how to check a string using Java libraries, but my question is how I should implement the function based on the question.
Thanks a lot.
Let's first have a look at a visual representation of the UTF-8 design.
Now let's resume what we have to do.
Loop over all characters of the string (each character being a byte).
We will need to apply a mask to each byte depending on the code point, as the x characters represent the actual code point. We will use the binary AND operator (&), which copies a bit to the result only if it is set in both operands.
The goal of applying a mask is to strip the payload bits (the x's) so we can compare what is left against each leading-byte pattern. The pattern is 0b1xxxxxxx where 1 appears "bytes in sequence" times and the remaining bits are 0.
We can then compare with the first byte to verify it is valid, and also determine the expected sequence length.
If the byte matches none of the cases, it is invalid and we return "No".
If we make it out of the loop, every character was valid, hence the string is valid.
Make sure the expected number of continuation bytes (0b10xxxxxx) actually follows each leading byte.
The method would look like this :
public static final boolean isUTF8(final byte[] pText) {
    int expectedLength = 0;
    for (int i = 0; i < pText.length; i++) {
        if ((pText[i] & 0b10000000) == 0b00000000) {
            expectedLength = 1;
        } else if ((pText[i] & 0b11100000) == 0b11000000) {
            expectedLength = 2;
        } else if ((pText[i] & 0b11110000) == 0b11100000) {
            expectedLength = 3;
        } else if ((pText[i] & 0b11111000) == 0b11110000) {
            expectedLength = 4;
        } else if ((pText[i] & 0b11111100) == 0b11111000) {
            expectedLength = 5;
        } else if ((pText[i] & 0b11111110) == 0b11111100) {
            expectedLength = 6;
        } else {
            return false;
        }
        while (--expectedLength > 0) {
            if (++i >= pText.length) {
                return false;
            }
            if ((pText[i] & 0b11000000) != 0b10000000) {
                return false;
            }
        }
    }
    return true;
}
Edit: The actual method is not the original one (almost, but not) and is taken from here. The original one was not working properly, as per @EJP's comment.
A small solution for real world UTF-8 compatibility checking:
public static final boolean isUTF8(final byte[] inputBytes) {
    final String converted = new String(inputBytes, StandardCharsets.UTF_8);
    final byte[] outputBytes = converted.getBytes(StandardCharsets.UTF_8);
    return Arrays.equals(inputBytes, outputBytes);
}
You can check the tests results:
@Test
public void testEncoding() {
    byte[] invalidUTF8Bytes1 = new byte[]{(byte) 0b10001111, (byte) 0b10111111};
    byte[] invalidUTF8Bytes2 = new byte[]{(byte) 0b10101010, (byte) 0b00111111};
    byte[] validUTF8Bytes1 = new byte[]{(byte) 0b11001111, (byte) 0b10111111};
    byte[] validUTF8Bytes2 = new byte[]{(byte) 0b11101111, (byte) 0b10101010, (byte) 0b10111111};

    assertThat(isUTF8(invalidUTF8Bytes1)).isFalse();
    assertThat(isUTF8(invalidUTF8Bytes2)).isFalse();
    assertThat(isUTF8(validUTF8Bytes1)).isTrue();
    assertThat(isUTF8(validUTF8Bytes2)).isTrue();
    assertThat(isUTF8("\u24b6".getBytes(StandardCharsets.UTF_8))).isTrue();
}
Test cases copied from https://codereview.stackexchange.com/questions/59428/validating-utf-8-byte-array
The CharsetDecoder might be what you are looking for:

@Test
public void testUTF8() throws CharacterCodingException {
    // the desired charset
    final Charset UTF8 = Charset.forName("UTF-8");
    // prepare the decoder
    final CharsetDecoder decoder = UTF8.newDecoder();
    decoder.onMalformedInput(CodingErrorAction.REPORT);
    decoder.onUnmappableCharacter(CodingErrorAction.REPORT);

    byte[] bytes = new byte[48];
    new Random().nextBytes(bytes);
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    try {
        decoder.decode(buffer);
        fail("Should not be UTF-8");
    } catch (final CharacterCodingException e) {
        // noop, decoding is expected to fail here
    }

    final String string = "hallo welt!";
    bytes = string.getBytes(UTF8);
    buffer = ByteBuffer.wrap(bytes);
    final String result = decoder.decode(buffer).toString();
    assertEquals(string, result);
}
So your function might look like this:

public static boolean checkEncoding(final byte[] bytes, final String encoding) {
    final CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
    decoder.onMalformedInput(CodingErrorAction.REPORT);
    decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    final ByteBuffer buffer = ByteBuffer.wrap(bytes);
    try {
        decoder.decode(buffer);
        return true;
    } catch (final CharacterCodingException e) {
        return false;
    }
}
public static boolean validUTF8(byte[] input) {
    int i = 0;
    // Check for BOM
    if (input.length >= 3 && (input[0] & 0xFF) == 0xEF
            && (input[1] & 0xFF) == 0xBB && (input[2] & 0xFF) == 0xBF) {
        i = 3;
    }

    int end;
    for (int j = input.length; i < j; ++i) {
        int octet = input[i];
        if ((octet & 0x80) == 0) {
            continue; // ASCII
        }

        // Check for UTF-8 leading byte
        if ((octet & 0xE0) == 0xC0) {
            end = i + 1;
        } else if ((octet & 0xF0) == 0xE0) {
            end = i + 2;
        } else if ((octet & 0xF8) == 0xF0) {
            end = i + 3;
        } else {
            // Longer sequences are invalid: UTF-8 is capped at 4 bytes (RFC 3629)
            return false;
        }

        while (i < end) {
            i++;
            if (i >= input.length) {
                return false; // sequence truncated at end of input
            }
            octet = input[i];
            if ((octet & 0xC0) != 0x80) {
                // Not a valid trailing byte
                return false;
            }
        }
    }
    return true;
}
Well, I am grateful for the comments and the answer.
First of all, I have to agree that this is "another stupid interview question". It is true that a Java String is already decoded text, so encoding it to UTF-8 will always succeed. One way to check, given a string:
public static boolean isUTF8(String s) {
    try {
        byte[] bytes = s.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        System.exit(-1);
    }
    return true;
}
However, since all printable strings are already in Unicode form, I never managed to trigger an error.
Second, if given a byte array, each byte will always be in the range -2^7 (0b10000000) to 2^7 - 1 (0b01111111), so I couldn't make a single byte fall outside that range either.
My initial understanding of the question was that, given a string such as "0b11111111", I should check whether it is valid UTF-8; I guess I was wrong.
Moreover, Java does provide a constructor to convert a byte array to a string, and if you are interested in the decode method, check here.
One more thing: the above answer would be correct in another language as well. The only improvement could be:
In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding. This removed all 5- and 6-byte sequences, and about half of the 4-byte sequences.
So 4 bytes would be enough.
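Following that restriction, the mask-based check from the earlier answer could be trimmed to the 4-byte RFC 3629 range, roughly like this. This is a sketch: it validates sequence lengths and continuation bytes only, and does not reject overlong encodings, surrogates, or code points above U+10FFFF:

```java
public class Utf8Check {
    // Structural UTF-8 check, limited to the 1..4 byte sequences of RFC 3629.
    public static boolean isUTF8(byte[] text) {
        for (int i = 0; i < text.length; i++) {
            int continuation;
            if ((text[i] & 0b1000_0000) == 0b0000_0000) {
                continuation = 0; // ASCII
            } else if ((text[i] & 0b1110_0000) == 0b1100_0000) {
                continuation = 1;
            } else if ((text[i] & 0b1111_0000) == 0b1110_0000) {
                continuation = 2;
            } else if ((text[i] & 0b1111_1000) == 0b1111_0000) {
                continuation = 3;
            } else {
                return false; // 5- and 6-byte leaders were removed by RFC 3629
            }
            while (continuation-- > 0) {
                if (++i >= text.length || (text[i] & 0b1100_0000) != 0b1000_0000) {
                    return false; // missing or malformed continuation byte
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUTF8(new byte[]{(byte) 0b11001111, (byte) 0b10111111})); // true
        System.out.println(isUTF8(new byte[]{(byte) 0b11111000, (byte) 0b10000000})); // false
    }
}
```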
I am definitely new to this, so correct me if I am wrong. Thanks a lot.
I am writing an application to decode MP3 frames. I am having difficulty finding the headers.
An MP3 header is 32 bits and begins with the signature: 11111111 111
In the inner loop below, I look for this signature. When this signature is found, I retrieve the next two bytes and then pass the latter three bytes of the header into a custom MpegFrame() class. The class verifies the integrity of the header and parses information from it. MpegFrame.isValid() returns a boolean indicating the validity/integrity of the frame's header. If the header is invalid, the outer loop is executed again, and the signature is looked for again.
When executing my program with a CBR MP3, only some of the frames are found. The application reports many invalid frames.
I believe the invalid frames may be a result of bits being skipped. The header is 4 bytes long. When the header is determined to be invalid, I skip all 4 bytes and start seeking the signature from the next four bytes. In a case like the following: 11111111 11101101 11111111 11101001, the header signature is found in the first two bytes; however, the third byte contains an error which invalidates the header. If I skip all of the bytes because I've determined that the header beginning with the first byte is invalid, I miss the potential header starting with the third byte (as the third and fourth bytes contain a signature).
I cannot seek backwards in an InputStream, so my question is the following: When I determine a header starting with bytes 1 and 2 to be invalid, how do I run my signature finding loop starting with byte 2, rather than byte 5?
In the below code, b is the first byte of the possible header under consideration, b1 is the second byte, b2 is the third byte and b3 is the fourth byte.
int bytesRead = 0;
// 10 bytes of Tagv2
int j = 0;
byte[] tagv2h = new byte[10];
j = fis.read(tagv2h);
bytesRead += j;
ByteBuffer bb = ByteBuffer.wrap(new byte[]{tagv2h[6], tagv2h[7], tagv2h[8], tagv2h[9]});
bb.order(ByteOrder.BIG_ENDIAN);
int tagSize = bb.getInt();
byte[] tagv2 = new byte[tagSize];
j = fis.read(tagv2);
bytesRead += j;
while (bytesRead < MPEG_FILE.length()) {
    boolean foundHeader = false;
    // Seek frame
    int b = 0;
    int b1 = 0;
    while ((b = fis.read()) > -1) {
        bytesRead++;
        if (b == 255) {
            b1 = fis.read();
            if (b1 > -1) {
                bytesRead++;
                if (((b1 >> 5) & 0x7) == 0x7) {
                    System.out.println("Found header.");
                    foundHeader = true;
                    break;
                }
            }
        }
    }
    if (!foundHeader) {
        continue;
    }
    int b2 = fis.read();
    int b3 = fis.read();
    MpegFrame frame = new MpegFrame(b1, b2, b3, false);
    if (!frame.isValid()) {
        System.out.println("Invalid header # " + (bytesRead - 4));
        continue;
    }
}
You can wrap your input stream in a PushbackInputStream so that you can push back some bytes and re-parse them.
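Something like this sketch (the stream contents and pushback capacity are illustrative): after deciding a candidate header is invalid, unread the bytes you want to revisit so the next scan starts at the second byte of the candidate, not past it:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

public class PushbackDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, (byte) 0xFB, (byte) 0x90};
        // Capacity 2: up to 2 bytes can be pushed back at a time.
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 2);
        int b = in.read();           // 0xFF: looks like a sync byte
        int b1 = in.read();          // 0xFB
        // Suppose the header turns out to be invalid: push b1 back so the
        // next scan iteration starts at the SECOND byte, not after the header.
        in.unread(b1);
        System.out.println(in.read() == b1); // true: we re-read the same byte
    }
}
```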
I ended up writing a function to shift the bytes of an invalid header so it can be re-parsed. I call the function in a loop where I essentially Seek for valid frames.
Seek() returns true when a valid frame is found (elsewhere the last frame found by calling Seek() is stored). CheckHeader() verifies the integrity of a header. SkipAudioData() reads all the audio data of a frame, placing the stream marker at the end of the frame.
private boolean Seek() throws IOException {
    while (!(CheckHeader() && SkipAudioData())) {
        if (!ShiftHeader()) {
            return false;
        }
    }
    return true;
}

private boolean ShiftHeader() {
    try {
        if (bytesRead >= MPEG_FILE.length()) {
            return false;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        return false;
    }
    header[0] = header[1];
    header[1] = header[2];
    header[2] = header[3];
    try {
        int b = fis.read();
        if (b > -1) {
            header[3] = b;
            return true;
        }
    } catch (IOException ex) {
        return false;
    } catch (Exception ex) {
        return false;
    }
    return false;
}
I'm trying to implement a simple client-server application, using NIO.
As an exercise, communication should be text-based and line-oriented. But when the server reads the bytes sent by the client, it gets nothing, or rather, the buffer is filled with a bunch of zeroes.
I'm using a selector, and this is the code triggered, when the channel is readable.
private void handleRead() throws IOException {
    System.out.println("Handler Read");
    while (lineIndex < 0) {
        buffer.clear();
        switch (channel.read(buffer)) {
            case -1:
                // Close the connection.
                return;
            case 0:
                System.out.println("Nothing to read.");
                return;
            default:
                System.out.println("Converting to String...");
                buffer.flip();
                bufferToString();
                break;
        }
    }
    // Do something with the line read.
}
In this snippet, lineIndex is an int holding the index at which the first \n occurred, when reading. It is initialized with -1, meaning there's no \n present.
The variable buffer references a ByteBuffer, and channel represents a SocketChannel.
To keep it simple, without Charsets and whatnot, this is how bufferToString is coded:
private void bufferToString() {
    char c;
    System.out.println("-- Buffer to String --");
    for (int i = builder.length(); buffer.remaining() > 1; ++i) {
        c = buffer.getChar();
        builder.append(c);
        System.out.println("Appending: " + c + "(" + (int) c + ")");
        if (c == '\n' && lineIndex < 0) {
            System.out.println("Found a new-line character!");
            lineIndex = i;
        }
    }
}
The variable builder holds a reference to a StringBuilder.
I expected getChar to do a reasonable conversion, but all I get in my output is a bunch (corresponding to half of the buffer capacity) of
Appending: (0)
Terminated by a
Nothing to read.
Any clues of what may be the cause? I have similar code in the client which is also unable to properly read anything from the server.
If it is of any help, here is a sample of what the writing code looks like:
private void handleWrite() throws IOException {
    buffer.clear();
    String msg = "Some message\n";
    for (int i = 0; i < msg.length(); ++i) {
        buffer.putChar(msg.charAt(i));
    }
    channel.write(buffer);
}
I've also confirmed that the result from channel.write is greater than zero, reassuring that the bytes are indeed written and sent.
Turns out, this was a buffer indexing problem. In the server, a flip() was missing before writing to the socket. In the client code, a few flip() were missing too, after reading and before writing. Now everything works as expected.
Current writing code (server side):
private void handleWrite() throws IOException {
    String s = extractLine();
    for (int i = 0, len = s.length(); i < len; ) {
        buffer.clear();
        while (buffer.remaining() > 1 && i < len) {
            buffer.putChar(s.charAt(i));
            ++i;
        }
        buffer.flip();
        channel.write(buffer);
    }
    // some other operations...
}
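For completeness, the read side follows the same pattern in reverse: read() to fill, flip() before draining with getChar(), then compact() to keep a possible half-char for the next pass. This is a sketch under the same two-bytes-per-char scheme, not the asker's exact client code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadSide {
    // Drains complete chars from the channel into 'builder'; returns false on EOF.
    public static boolean readChars(ReadableByteChannel channel, ByteBuffer buffer,
                                    StringBuilder builder) throws IOException {
        if (channel.read(buffer) == -1) {
            return false;
        }
        buffer.flip();                     // switch from filling to draining
        while (buffer.remaining() >= 2) {  // getChar() consumes 2 bytes
            builder.append(buffer.getChar());
        }
        buffer.compact();                  // keep a possible half-char for next read
        return true;
    }
}
```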