Java String encoding

What's the difference between
"hello world".getBytes("UTF-8");
and
Charset.forName("UTF-8").encode("hello world").array();
?
The second code produces a byte array with 0-bytes at the end in most cases.

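Your second snippet uses ByteBuffer.array(), which just returns the whole array backing the ByteBuffer. That array's length is the buffer's capacity, which may well be larger than the number of encoded bytes actually written (the buffer's limit), hence the trailing 0-bytes.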
Basically, I would use the first approach if you want a byte[] from a String :) You could use other ways of dealing with the ByteBuffer to convert it to a byte[], but given that String.getBytes(Charset) is available and convenient, I'd just use that...
Sample code to retrieve the bytes from a ByteBuffer:
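import java.nio.ByteBuffer;
import java.nio.charset.Charset;

ByteBuffer buffer = Charset.forName("UTF-8").encode("hello world");
// limit() is the number of encoded bytes (position is 0 right after encode)
byte[] array = new byte[buffer.limit()];
buffer.get(array);
System.out.println(array.length); // 11
System.out.println(array[0]); // 104 (encoded 'h')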
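A minimal self-contained sketch of the difference described above, assuming Java 7+ for StandardCharsets (the class name EncodeComparison is made up for illustration). It prints the buffer's limit next to its backing array's length, and checks that copying only the encoded bytes matches getBytes(). The exact capacity of the backing array depends on the JDK's encoder, so no particular value is claimed for it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodeComparison {
    // Copies only the encoded bytes (from position up to limit) out of the buffer,
    // rather than returning the whole backing array as array() does.
    static byte[] encodedBytes(String s) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        String s = "hello world";
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        // limit() counts the encoded bytes; array().length is the capacity and may be larger
        System.out.println("limit=" + buffer.limit()
                + ", backing array length=" + buffer.array().length);
        // Copying just the encoded bytes gives the same result as getBytes()
        System.out.println(Arrays.equals(s.getBytes(StandardCharsets.UTF_8), encodedBytes(s))); // true
    }
}
```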

Related

conversion of byte array to string causing OOM

In my application I'm storing strings using RandomAccessFile, and while reading a string back I need to convert a byte array to a String, which is causing an OOM (OutOfMemoryError). Is there a better way to convert other than this?
str = new String(b, "UTF-8");
where b is the byte array

Is there a better way to convert other than new String(bytes, "UTF-8")?
This is actually a rather complicated question.
This constructor cannot simply incorporate the byte[] into the string:
Prior to Java 9, it is always necessary to decode the byte array to a UTF-16 encoded array of char. Since each char takes two bytes, for mostly-ASCII text the constructor is liable to allocate roughly double the memory used by the source byte[].
With Java 9 and later, String has a compact representation (JEP 254, "compact strings"), which is enabled by default and can be turned off with -XX:-CompactStrings. If it is enabled AND the UTF-8 encoded byte array only contains code points in the Latin-1 range (\u0000 to \u00ff), then the String's value is a byte[] holding one byte per character. However, even in this case the constructor must copy the bytes to a new byte[].
In both cases, there is no more space-efficient way to create a String from a byte[]. Furthermore, I don't think there is a more space-efficient way to do the conversion starting with a stream of bytes and a character count. (I am excluding things like modifying the java.lang.* implementation, or breaking abstraction using reflection.)
Bottom line: when converting a byte[] to a String you should allow at least twice as much contiguous free memory as the original byte[] if you want your code to work on older JVMs.
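If the application doesn't actually need each stored string as one String object (the question doesn't say), one way around the peak allocation is to decode incrementally. The sketch below is that alternative, not something the answer above proposes: it feeds an InputStream through an InputStreamReader and handles the text in fixed-size chunks. The class and method names are hypothetical; for a RandomAccessFile, one possible stream is Channels.newInputStream(raf.getChannel()), which is an assumption about how the file would be wired up and is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {
    // Decodes UTF-8 bytes from 'in' one chunk at a time. Only the fixed-size char[]
    // is held in memory; no String for the whole content is ever built.
    // Here each chunk is merely counted (in UTF-16 code units); real code would process it.
    static long countChars(InputStream in) {
        char[] chunk = new char[8192];
        long total = 0;
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                total += n; // process chunk[0] .. chunk[n - 1] here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length + " bytes decoded to "
                + countChars(new ByteArrayInputStream(data)) + " chars");
    }
}
```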

Slicing byte arrays in Java

I'm trying to slice a byte array to prune the first part of the array. I'm using ByteBuffer but it does not behave like I would expect.
byte[] myArray = new byte[10];
ByteBuffer buf = ByteBuffer.wrap(myArray);
buf.position(5);
ByteBuffer slicedBuf = buf.slice();
byte[] newArray = slicedBuf.array();
I would expect the size of newArray to be 5, containing only the last portion of my ByteBuffer. Instead, the full byte array is returned. I understand that this is because the "backing buffer" is the same all along.
How can I slice to have only the desired part of the array?
EDIT: Added context
The bytes are received from the network. The buffer is formed like this:
[ SHA1 hash ] [ data... lots of it ]
I already have a function that takes a byte array as a parameter and calculates the SHA1 hash. What I want is to slice the full buffer so that I pass only the data, without the expected hash.

You can use the Arrays.copyOfRange method. For example:
import java.util.Arrays;

// copy indexes 5 to 9 (the end index, 10, is exclusive)
byte[] slice = Arrays.copyOfRange(myArray, 5, 10);
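Applied to the layout in the question, a sketch might look like this. It assumes the hash is a raw 20-byte SHA-1 digest of the data that follows it; PacketSplit and its methods are hypothetical names, while the MessageDigest calls are from the standard java.security API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class PacketSplit {
    // A SHA-1 digest is always 20 bytes long.
    static final int SHA1_LENGTH = 20;

    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Copies the leading hash out of the packet.
    static byte[] expectedHash(byte[] packet) {
        return Arrays.copyOfRange(packet, 0, SHA1_LENGTH);
    }

    // Copies everything after the leading hash out of the packet.
    static byte[] payload(byte[] packet) {
        return Arrays.copyOfRange(packet, SHA1_LENGTH, packet.length);
    }

    // True if the packet's leading 20 bytes are the SHA-1 of the rest.
    static boolean isValid(byte[] packet) {
        if (packet.length < SHA1_LENGTH) {
            return false; // too short to even hold the hash
        }
        return MessageDigest.isEqual(expectedHash(packet), sha1(payload(packet)));
    }
}
```

Note that Arrays.copyOfRange allocates a new array for the payload, so for very large packets the data briefly exists twice in memory.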

The ByteBuffer you created is being backed by that array. When you call slice() you effectively receive a specific view of that data:
Creates a new byte buffer whose content is a shared subsequence of this buffer's content.
So calling array() on that returned ByteBuffer returns the backing array in its entirety.
To extract all the bytes from that view, you could do:
byte[] bytes = new byte[slicedBuf.remaining()];
slicedBuf.get(bytes);
The bytes from that view would be copied to the new array.
Edit to add from comments below: It's worth noting that if all you're interested in doing is copying bytes from one byte[] to another byte[], there's no reason to use a ByteBuffer; simply copy the bytes.
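If the hashing function can be changed, the copy can be avoided altogether: MessageDigest accepts an offset and length (update(byte[], int, int)) as well as a ByteBuffer (update(ByteBuffer)). A sketch, again assuming a leading 20-byte SHA-1; DigestInPlace and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestInPlace {
    static final int SHA1_LENGTH = 20;

    static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Hashes packet[SHA1_LENGTH .. packet.length) directly via update(byte[], offset, length),
    // so this code never copies the payload into a new array.
    static byte[] sha1OfPayload(byte[] packet) {
        MessageDigest md = newSha1();
        md.update(packet, SHA1_LENGTH, packet.length - SHA1_LENGTH);
        return md.digest();
    }

    // Same, starting from a ByteBuffer: slice() gives a view of the payload,
    // which update(ByteBuffer) consumes without an explicit copy here.
    static byte[] sha1OfPayload(ByteBuffer packet) {
        ByteBuffer view = packet.duplicate(); // leave the caller's position untouched
        view.position(view.position() + SHA1_LENGTH);
        MessageDigest md = newSha1();
        md.update(view.slice());
        return md.digest();
    }
}
```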

how to convert php unpack() in a similar method in Java

I've no coding experience in PHP at all. But while looking for a solution for my Java project, I found an example of the problem in PHP, which incidentally is alien to me.
Can anyone please explain how PHP's unpack('N*', "string") function works, what it returns, and how to implement it in Java?
An example would help me a lot!
Thanks!

In PHP (and in Perl, where PHP copied it from), unpack("N*", ...) takes a string (actually representing a sequence of bytes) and parses each 4-byte segment of it as an unsigned 32-bit big-endian ("network byte order") integer, returning them in an array.
There are several ways to do the same in Java, but one way would be to wrap the input byte array in a java.nio.ByteBuffer, convert it to an IntBuffer and then read the integers from that:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public static int[] unpackNStar ( byte[] bytes ) {
// first, wrap the input array in a ByteBuffer:
ByteBuffer byteBuf = ByteBuffer.wrap( bytes );
// then turn it into an IntBuffer, using big-endian ("Network") byte order:
byteBuf.order( ByteOrder.BIG_ENDIAN );
IntBuffer intBuf = byteBuf.asIntBuffer();
// finally, dump the contents of the IntBuffer into an array
int[] integers = new int[ intBuf.remaining() ];
intBuf.get( integers );
return integers;
}
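One difference worth knowing: PHP's "N" format yields unsigned values, while Java's int is signed, so any value of 2^31 or more comes back negative. Here is a sketch of a usage example, with a compact version of unpackNStar and an unsigned variant that widens to long via Integer.toUnsignedLong (Java 8+); the class name is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.util.Arrays;

public class UnpackUnsigned {
    // Same idea as unpackNStar: read consecutive big-endian 32-bit ints.
    static int[] unpackNStar(byte[] bytes) {
        IntBuffer intBuf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).asIntBuffer();
        int[] integers = new int[intBuf.remaining()];
        intBuf.get(integers);
        return integers;
    }

    // "N" is unsigned, but Java ints are signed, so widen each value to long.
    static long[] unpackNStarUnsigned(byte[] bytes) {
        int[] signed = unpackNStar(bytes);
        long[] result = new long[signed.length];
        for (int i = 0; i < signed.length; i++) {
            result[i] = Integer.toUnsignedLong(signed[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = {0, 0, 0, 1, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(Arrays.toString(unpackNStar(bytes)));         // [1, -1]
        System.out.println(Arrays.toString(unpackNStarUnsigned(bytes))); // [1, 4294967295]
    }
}
```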
Of course, if you just want to iterate over the integers, you don't really need the IntBuffer or the array:
ByteBuffer buf = ByteBuffer.wrap( bytes );
buf.order( ByteOrder.BIG_ENDIAN );
// getInt() needs 4 bytes, so stop before any incomplete trailing group
while ( buf.remaining() >= 4 ) {
int num = buf.getInt();
// do something with num...
}
In fact, iterating over a ByteBuffer like this is a convenient way to emulate the behavior of even more complicated examples of unpack() in Perl or PHP.
(Disclaimer: I have not tested this code. I believe it should work, but it's always possible that I may have mistyped or misunderstood something. Please test before using.)
Ps. If you're reading the bytes from an input stream, you could also wrap it in a DataInputStream and use its readInt() method. Of course, it's also possible to use a ByteArrayInputStream to read the input from a byte array, achieving the same results as the ByteBuffer examples above.
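The DataInputStream variant mentioned in the P.S. could look roughly like this. readInt() always reads big-endian, matching "N"; end of stream shows up as an EOFException, and any trailing 1 to 3 bytes are consumed and dropped. The class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class UnpackFromStream {
    // Reads big-endian 32-bit ints until the stream runs out.
    static List<Integer> readInts(InputStream in) {
        List<Integer> result = new ArrayList<>();
        try (DataInputStream data = new DataInputStream(in)) {
            while (true) {
                result.add(data.readInt());
            }
        } catch (EOFException end) {
            // readInt() throws EOFException when fewer than 4 bytes were left
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```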

How to convert Integer array to InputStream?

I would like to convert an integer array in Java to an InputStream, and then decompress that stream of bytes using an LZMA library.
int [] header = new int[copy.length];
edu.coeia.Compression.LZMA.Decoder decoder = new edu.coeia.Compression.LZMA.Decoder();
ByteArrayInputStream bStream = new ByteArrayInputStream(bheader);
bStream.coder(// InputStream of bytes);

What you need to do is convert the array of integers into an equivalent array of bytes, and then use the ByteArrayInputStream(byte[]) constructor to create the input stream. Finally, decode the stream using the code that you already have.
The first step (conversion) is probably the one that you are having difficulty with, but the code depends on how the bytes are represented in the integer array: for example, whether each int holds a single byte value, or packs four bytes.
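Two common representations, sketched under explicit assumptions since the question doesn't say which one applies: either each int holds a single byte value (0 to 255), or each int packs four bytes, most significant first. The class and method names are hypothetical, and the LZMA decoder call from the question is left out because its API isn't shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

public class IntsToStream {
    // Assumption 1: each int holds one byte value (0..255).
    static ByteArrayInputStream oneBytePerInt(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i]; // keeps only the low 8 bits
        }
        return new ByteArrayInputStream(bytes);
    }

    // Assumption 2: each int packs four bytes, most significant byte first.
    static ByteArrayInputStream fourBytesPerInt(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 4); // big-endian by default
        for (int v : values) {
            buffer.putInt(v);
        }
        return new ByteArrayInputStream(buffer.array());
    }
}
```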

pass array byte to getReader

How can I pass a byte array to getReader without changing the data?
byte_msg = some byte array
println(">>>" + byte_msg)
HttpServletRequest.getReader returns new BufferedReader(
new InputStreamReader(new ByteArrayInputStream(byte_msg)))
And the POST receiver:
byte_msg = IOUtils.toByteArray(post.request.getReader)
println("<<<" + byte_msg)
And the printed output follows. Why do I get different results?
>>>[B@38ffd135
<<<[B@60c0c8b5

You're printing out the result of byte[].toString() - which isn't the contents of the byte array... it's just the value returned by Object.toString(): [B for "byte array", @, and then the hash code in hex. You need to convert the data to hex or something like that - which you need to do explicitly. For example, you could use the Hex class from Apache Commons Codec:
String hex = Hex.encodeHexString(byte_msg);
Note that if this is arbitrary binary data, you should not use InputStreamReader to convert it to a string in the first place. InputStreamReader is designed for binary data that is encoded text, and IMO you should specify the encoding, too.
If you want to transfer arbitrary binary data, you should either transfer it without any conversion into text (so see whether your post class allows that) or use something like hex or base64 to convert to/from binary data safely.
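A small sketch of why the Reader route is risky for binary data, assuming UTF-8 as the charset: byte sequences that are not valid UTF-8 are replaced with U+FFFD during decoding, so re-encoding does not give back the original bytes. java.util.Base64 (Java 8+) is shown as one safe text encoding; the class name is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryAsText {
    // Pushes bytes through a Reader (as getReader() would) and re-encodes the text.
    // Invalid UTF-8 input is replaced with U+FFFD, so the bytes can change.
    static byte[] throughReader(byte[] data) {
        StringWriter text = new StringWriter();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Base64 maps arbitrary bytes to ASCII text and back without loss.
    static byte[] throughBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }
}
```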

IOUtils.toByteArray creates a new ByteArrayOutputStream and then calls its toByteArray(), which creates a new byte[]. Being a new object, that array has a different identity hash code (the number you see printed), and this happens even if the contents of the array did not change.
In this case the mere observation (via IOUtils.toByteArray) has altered the output, because this check creates a new byte[] ;)
As Jon said, check the content of the array to see if there are any changes.

To print the contents of the arrays, you can convert each array to a string using:
java.util.Arrays.toString(byte[])
and then print the result to stdout.
println(">>>" + Arrays.toString(byte_msg));
See the java.util.Arrays documentation for details.
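A sketch contrasting identity and contents, which is what the two printed lines in the question come down to (the class name is made up for illustration):

```java
import java.util.Arrays;

public class CompareContents {
    public static void main(String[] args) {
        byte[] sent = {104, 105};
        byte[] received = sent.clone(); // a different array object with the same contents

        System.out.println(sent == received);              // false: different objects
        System.out.println(Arrays.equals(sent, received)); // true: same contents
        System.out.println(Arrays.toString(received));     // [104, 105]
    }
}
```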

Develop Reference
