getBytes then toString don't give the same result each time - java

I am trying to use Android's BluetoothChat sample, but there is something I don't understand:
byte[] send = message.getBytes();
Log.d("SEND_BYTE", send.toString());
mChatService.write(send);
Here, message is a String, which is then converted to bytes, I guess in order to be sent. But when I check the log, the send.toString() output is really short even if the message I type is long. Worse, if I type the same message twice, I get two different log entries, which I find really weird.
Here is what I get in the log for the message hello, three times in a row:
[B@413d62e0
[B@41390078
[B@413ed3d8
There must be something (maybe really simple) I didn't get, but I can't figure out what it is. Can you help me with this?
Edit:
Maybe it is useful to add the code that follows, so here is the complete code:
byte[] send = message.getBytes();
Log.d("SEND_BYTE", send.toString());
mChatService.write(send);
// Reset out string buffer to zero and clear the edit text field (buffer is used in the write function)
mOutStringBuffer.setLength(0);
mOutEditText.setText(mOutStringBuffer);

Yes, calling toString() on a byte array is a bad idea. Arrays don't override toString(), so you get the default behaviour of Object.toString().
To reverse the String.getBytes() call, you want:
Log.d("SEND_BYTE", new String(send));
Or to see the bytes more directly:
Log.d("SEND_BYTE", Arrays.toString(send));
However, I would strongly encourage you not to do that directly. Instead, you should specify an encoding when you convert to or from binary, otherwise it will use the platform default encoding. What encoding is the chat service expecting? For example, if it's expecting UTF-8:
byte[] send = message.getBytes("UTF-8");
Log.d("SEND_BYTE", Arrays.toString(send));
mChatService.write(send);
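To make the difference between the three calls concrete, here is a minimal sketch in plain Java (no Android dependencies, so System.out stands in for Log.d). The class and method names are made up for illustration, and UTF-8 is only an assumption about what the chat service expects. StandardCharsets needs Java 7 or later (API level 19 on Android).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteArrayLogging {
    // Encodes the message with an explicit charset instead of the platform default.
    // UTF-8 is an assumption; use whatever the receiving side expects.
    static byte[] encode(String message) {
        return message.getBytes(StandardCharsets.UTF_8);
    }

    // Lists the byte values, which is usually what you want in a log line.
    static String describe(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Reverses encode(): decodes the bytes back into text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] send = encode("hello");
        // Object.toString(): type tag "[B" plus an identity hash, so it differs per array.
        System.out.println(send.toString());
        System.out.println(describe(send)); // [104, 101, 108, 108, 111]
        System.out.println(decode(send));   // hello
    }
}
```

Two arrays with identical contents still print different toString() values, because the hash identifies the array object rather than its contents.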

You need to create a new string object to get the actual string
String senddata=new String(send);

Try:
Log.d("SEND_BYTE", new String(send, "UTF-8"));

Related

Converting string to byte[] returns wrong value (encoding?)

I read a byte[] from a file and convert it to a String:
byte[] bytesFromFile = Files.readAllBytes(...);
String stringFromFile = new String(bytesFromFile, "UTF-8");
I want to compare this to another byte[] I get from a web service:
String stringFromWebService = webService.getMyByteString();
byte[] bytesFromWebService = stringFromWebService.getBytes("UTF-8");
So I read a byte[] from a file and convert it to a String and I get a String from my web service and convert it to a byte[]. Then I do the following tests:
// works!
org.junit.Assert.assertEquals(stringFromFile, stringFromWebService);
// fails!
org.junit.Assert.assertArrayEquals(bytesFromFile, bytesFromWebService);
Why does the second assertion fail?
Other answers have covered the likely fact that the file is not UTF-8 encoded giving rise to the symptoms described.
However, I think the most interesting aspect of this is not that the byte[] assert fails, but that the assert that the string values are the same passes. I'm not 100% sure why this is, but I think the following trawl through the source code might give us the answer:
Looking at how new String(bytesFromFile, "UTF-8"); works - we see that the constructor calls through to StringCoding.decode()
This in turn, if supplied with the UTF-8 character set, calls through to StringDecoder.decode()
This calls through to CharsetDecoder.decode() which decides what to do if the character is unmappable (which I guess will be the case if a non-UTF-8 character is presented)
In this case it uses an action defined by
private CodingErrorAction unmappableCharacterAction
= CodingErrorAction.REPORT;
Which means that it still reports the character it has decoded, even though it's technically unmappable.
I think this means that even when the code gets an unmappable character, it substitutes its best guess - so I'm guessing that its best guess is correct and hence the String representations are the same under comparison, but the byte[] are no longer the same.
This hypothesis is kind of supported by the fact that the catch block for CharacterCodingException in StringCoding.decode() says:
} catch (CharacterCodingException x) {
// Substitution is always enabled,
// so this shouldn't happen
I don't understand it fully, but here's what I've got so far:
The problem is that the data contains some bytes which are not valid UTF-8, as the following check shows:
// returns false for my data!
public static boolean isValidUTF8(byte[] input) {
    CharsetDecoder cs = Charset.forName("UTF-8").newDecoder();
    try {
        cs.decode(ByteBuffer.wrap(input));
        return true;
    } catch (CharacterCodingException e) {
        return false;
    }
}
When I change the encoding to ISO-8859-1 everything works fine. The strange thing (which I don't understand yet) is why my conversion (new String(bytesFromFile, "UTF-8");) doesn't throw any exception (unlike my isValidUTF8 method), although the data is not valid UTF-8.
However, I think I will go another way and encode my byte[] as a Base64 string, as I don't want more trouble with encoding.
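On the question of why the String conversion never throws: the Javadoc for the String(byte[], Charset) constructor says it always replaces malformed input with the charset's default replacement string (U+FFFD for UTF-8), while a decoder obtained from newDecoder() reports errors by default. The String(byte[], String) overload leaves this case unspecified in its Javadoc, but OpenJDK appears to substitute there as well, which matches the "Substitution is always enabled" comment quoted above. Below is a small sketch of the difference, using one made-up byte (0xD8, which is "Ø" in ISO-8859-1 but an incomplete sequence in UTF-8). The class and method names are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class Utf8Leniency {
    // Strict: a fresh decoder reports malformed input by throwing.
    static boolean isValidUtf8(byte[] input) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient: the String constructor substitutes U+FFFD instead of throwing.
    static String lenientDecode(byte[] input) {
        return new String(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] notUtf8 = { (byte) 0xD8 }; // hypothetical sample data
        System.out.println(isValidUtf8(notUtf8));                      // false
        System.out.println(lenientDecode(notUtf8).equals("\uFFFD"));   // true
        // Re-encoding the replacement character yields three bytes, not the original one.
        System.out.println(lenientDecode(notUtf8).getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```

Because the substitution is lossy, decoding and re-encoding such data does not give back the original bytes, which is consistent with the byte[] comparison failing.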
The real problem in your code is that you don't know the real file encoding.
When you read the string from the web service you get a sequence of chars; when you convert the string from chars to bytes, the conversion is done correctly because you specify how to transform chars into bytes with a specific encoding ("UTF-8"). When you read a text file you face a different problem. You have a sequence of bytes that needs to be converted to chars. To do that properly you must know how the chars were converted to bytes, i.e. what the file encoding is. For files (unless specified) it's a platform constant; on Windows files are typically encoded in windows-1252 (which is very close to ISO-8859-1); on Linux/Unix it depends, but I think UTF-8 is the default.
By the way, the web service call did a decoding operation under the hood; the HTTP call uses a header that defines how chars are encoded, i.e. how to read the bytes from the socket and transform them to chars. So calling a SOAP web service gives you back XML (which can be unmarshalled into a Java object) with all the encoding operations done properly.
So if you must read chars from a file you must face the encoding issue; you can use Base64 as you stated, but you lose one of the main benefits of text files: they are human readable, which eases debugging and development.

How to represent header value and actual message in Byte Array Java?

I need to make a byte array in which I will have header values initially and my actual message will come after the header values.
My header values will have - data center which is a string, client_id which is integer, pool_id which is also integer and data_count is also an integer.
And my actual message which will come after header values is - hello world
In my case, my header length may grow so I need to initialize that as a variable so that I can increase it later on as needed.
I am a little bit confused about how to use a byte array here. How can I represent this in a byte array in network byte order so that a C++ program can decode it properly on an Ubuntu 12.04 machine?
You can use Protocol Buffers to represent the messages (header and content). It handles the transformations between languages and platforms. It also leaves room for further expansion and supports multiple message versions.
For your example you can define the message format like this (e.g. in messageModel.proto):
package common;
option java_package = "my.java.package";
option java_outer_classname = "MessageProto";
message MyMessage {
    // String defaults must be quoted string literals.
    optional string dataCenter = 1 [default = "DEFAULT_DC"];
    optional int64 clientId = 2;
    optional int64 poolId = 3;
    optional int64 dataCount = 4;
    optional string body = 5;
}
Then compile it with protoc, for example:
protoc -I src/java/ --java_out=src/java/ messageModel.proto
This generates the transport objects and the utility classes to marshal them from one endpoint to another (even for different messages). Please check the Java tutorial for more details.
To create a MyMessage from Java you will be able to do something like:
MessageProto.MyMessage.Builder mb = MessageProto.MyMessage.newBuilder();
mb.setDataCenter("aDC");
mb.setClientId(12);
mb.setPoolId(14);
mb.setDataCount(2);
mb.setBody("hello world");
MessageProto.MyMessage message = mb.build();
To transform the message into a byte array, you will use: message.toByteArray()
If C++ is your destination, you will need to generate the C++ classes from the same model too (protoc's --cpp_out option). To decode the message on the Java side, you would do something like:
MessageProto.MyMessage message = MessageProto.MyMessage.parseFrom(buffer);
Where buffer will represent the received content.
If this is only a homework assignment then you can serialize your header and body message using a DataOutputStream, but I would suggest investigating Protocol Buffers as well.
Try using a DataOutputStream that is targeted to a ByteArrayOutputStream. When you're done with writing the message to the DataOutputStream, you can obtain the constructed byte array from the ByteArrayOutputStream.
Like this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutputStream dos = new DataOutputStream(baos);
dos.writeInt(client_id);
dos.writeUTF(data_center);
// etc...
byte[] message = baos.toByteArray();
Protocol Buffers are also a good option if you want more flexibility and higher performance. It depends on what you want to get out of this application: whether it needs higher performance, and whether it's a one-off throwaway app or something that you expect to grow and be maintained over the longer term. DataOutputStream and DataInputStream are simple to use and you can start right away; you need to invest a bit more time to learn Protocol Buffers.
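As a fuller sketch of the DataOutputStream approach: DataOutputStream writes multi-byte values high byte first, which is network byte order, so a C++ reader can use ntohl on each int. The layout below (length-prefixed UTF-8 strings, then the three ints, then the length-prefixed body) is an assumption made for illustration, not something the question specifies. It uses explicit length prefixes rather than writeUTF, because writeUTF emits Java's modified UTF-8 with a two-byte length, which is less convenient to decode from C++. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class FramedMessage {
    // Assumed layout, all ints big-endian:
    // [dcLength][dc bytes][clientId][poolId][dataCount][bodyLength][body bytes]
    static byte[] build(String dataCenter, int clientId, int poolId,
                        int dataCount, String body) throws IOException {
        byte[] dc = dataCenter.getBytes(StandardCharsets.UTF_8);
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            dos.writeInt(dc.length);      // variable-length header field
            dos.write(dc);
            dos.writeInt(clientId);
            dos.writeInt(poolId);
            dos.writeInt(dataCount);
            dos.writeInt(payload.length); // lets the reader find the end of the body
            dos.write(payload);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] message = build("aDC", 12, 14, 2, "hello world");
        System.out.println(message.length); // 34
    }
}
```

Since every variable-length field carries its own length, the header can grow later without the reader needing a fixed header size.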

Protocol Buffer: From Java to Objective-C using byte[]

I am currently using Google's Protocol Buffers. It works painlessly between Java and C#; however, I am running into problems trying to achieve the same with Objective-C.
The Java WS returns a byte[]. The code that uses the protocol buffer API is simple enough:
productGroup.toByteArray();
I am able to recreate the object with Skeet's C# port, using:
byte[] result = searchWebService.SearchProductsProtocolBuffer(search);
ProductProtoGroup products = ProductProtoGroup.ParseFrom(result);
However, on the obj-c side, I am struggling to work with the return value.
The NSString I receive from the same web service RPC is this:
CmYKEzgwMDAwMUFELTEzMjUyNzk5MTQySUZPT0QgJiBCRV...
I'm not quite sure what to do with this, because I don't know what it is, save it was generated from an array of bytes. I tried parsing it directly to NSData using
NSData* data = [returnValue dataUsingEncoding:NSUTF8StringEncoding];
but on [ProductProtoGroup parseFromData:data];, I get an InvalidProtocolBuffer
I've checked the byte string and hexadecimal representations of the original byte[] on the Java side, and they don't match the string I receive from the web service.
Hexadecimal is 0-F. Perhaps each byte was converted to a char? No, that doesn't match.
Any help would be appreciated.
Dane
With a fresher mind and some helpful comments, I finally got it.
I was wondering how the web service automagically sent a byte[] in Java, then reconstructed it in C# later, and also how to store this message later.
Turns out the string,
CmYKEzgwMDAwMUFELTEzMjUyNzk5MTQySUZPT0QgJiBCRV...
was indeed what was being sent in the SOAP envelopes. So in between creation of the byte[] in Java and transmission, something was happening. Looking at the message handler in C#,
[System.Web.Services.Protocols.SoapDocumentMethodAttribute("",
RequestNamespace="x", ResponseNamespace="x",
Use=System.Web.Services.Description.SoapBindingUse.Literal,
ParameterStyle=System.Web.Services.Protocols.SoapParameterStyle.Wrapped)]
[return: System.Xml.Serialization.XmlElementAttribute("return",
Form=System.Xml.Schema.XmlSchemaForm.Unqualified,
DataType="base64Binary", IsNullable=true)]
public byte[] searchProductProtocolBuffer([System.Xml.Serialization.XmlElementAttribute(Form=System.Xml.Schema.XmlSchemaForm.Unqualified)] string arg0)
{
object[] results = this.Invoke("searchProductProtocolBuffer", new object[] {
arg0});
return ((byte[])(results[0]));
}
So base64Binary!, which I admittedly am meeting for the first time. Poking around, Skeet says it is the safest way:
How can I safely convert a byte array into a string and back?
So, knowing the encoding, the solution becomes straightforward. Using the algorithm presented in an answer to this question: How do I do base64 encoding on iphone-sdk?, my final code becomes:
NSString* returnValue = [WebServiceUtil processStringReturnValue:value];
NSData* data = [Encoding base64DataFromString:returnValue];
ProductProtoGroup* products = [ProductProtoGroup parseFromData:data];
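For comparison, the same idea on the Java side can be sketched with java.util.Base64 (Java 8 and later). The example decodes only the first four characters of the string shown above, "CmYK", and contrasts the result with the UTF-8 bytes of the text itself, which is effectively what the earlier dataUsingEncoding attempt produced. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class Base64RoundTrip {
    // What the SOAP layer does to a byte[] return value (base64Binary).
    static String toWire(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // What the client has to undo before handing the bytes to the parser.
    static byte[] fromWire(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Decoding the Base64 text recovers the original bytes...
        System.out.println(Arrays.toString(fromWire("CmYK"))); // [10, 102, 10]
        // ...whereas encoding the text as UTF-8 just yields the character codes.
        System.out.println(Arrays.toString("CmYK".getBytes(StandardCharsets.UTF_8))); // [67, 109, 89, 75]
    }
}
```

The decoded bytes start with 0x0A, which is the protobuf tag for field 1 with a length-delimited value, so the decoded form at least looks like the start of a protobuf message.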
I know very little about Objective C, but an NSString is no byte array. Have you tried converting the NSString to char* using -[NSString UTF8String]?

Convert String to a JMS BytesMessage in Java

I am trying to convert a String to a JMS BytesMessage. Is there a good way to do this?
I need to do this because I have a method that takes in a String that is decrypted and I need to convert it into a BytesMessage in order to decrypt the message.
Thanks
byte[] bArray = "foo".getBytes("UTF-8");
BytesMessage msg = session.createBytesMessage(); // throws JMSException
msg.writeBytes(bArray);
Of course, like Arcadien said, you need to execute the code in an appropriate environment from which you can obtain the javax.jms.Session object.
A bit late, but better late than never. @Carlo's answer is essentially correct, except that you must catch an exception (UnsupportedEncodingException) if you use getBytes with a String parameter.
Here is a variation (of the getBytes call) that will not throw an exception:
final byte[] byteArray = "blammy".getBytes(StandardCharsets.UTF_8);
final BytesMessage bytesMessage = session.createBytesMessage();
bytesMessage.writeBytes(byteArray);
The StandardCharsets class defines charsets that are guaranteed to be available on every implementation of the Java platform.
You can use
byte[] String#getBytes()
to get a byte array from a string and write it to a BytesMessage.

Read special characters (æ ø å) with Java from Oracle database

I have a problem when reading special characters from an Oracle database (using the JDBC driver and GlassFish TopLink).
I store the name "GRØNLÅEN KJÆTIL" in the database through a web service, and the data is stored correctly.
But when I read this String, print it to the log file and convert it to a byte array with this code:
int pos = 0;
byte[] msg=new byte[1024];
String F = "F" + passenger.getName();
logger.debug("Add " + F + " " + F.length());
msg = addStringToArrayBytePlusSeparator(msg, F,pos);
..............
private byte[] addStringToArrayBytePlusSeparator(byte[] arrDest,String strToAdd,int destPosition)
{
System.arraycopy(strToAdd.getBytes(Charset.forName("ISO-8859-1")), 0, arrDest, destPosition, strToAdd.getBytes().length);
arrDest = addSeparator(arrDest,destPosition+strToAdd.getBytes().length,1);
return arrDest;
}
1) In the log file there is: "Add FGRÃNLÃ " (the name isn't correct and F.length() is not printed).
2) The code throws:
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at it.edea.ebooking.business.chi.control.VingCardImpl.addStringToArrayBytePlusSeparator(Test.java:225).
Any solution?
Thanks
You're calling strToAdd.getBytes() without specifying the character encoding, within the System.arraycopy call - that will be using the system default encoding, which may well not be ISO-8859-1. You should be consistent in which encoding you use. Frankly I'd also suggest that you use UTF-8 rather than ISO-8859-1 if you have the choice, but that's a different matter.
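To see why the mismatch can end in an ArrayIndexOutOfBoundsException, compare the encoded lengths. The sketch below assumes the platform default encoding is UTF-8 (an assumption; the question does not say), and the class and method names are illustrative only. The name is written with Unicode escapes so that the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class EncodedLength {
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String name = "GR\u00D8NL\u00C5EN KJ\u00C6TIL"; // GRØNLÅEN KJÆTIL
        System.out.println(latin1Length(name)); // 15: one byte per character
        System.out.println(utf8Length(name));   // 18: Ø, Å and Æ take two bytes each
    }
}
```

Under that assumption, the original code asks System.arraycopy to copy 18 bytes out of a 15-byte source array, which is enough to trigger the exception.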
Why are you dealing with byte arrays anyway at this point? Why not just use strings?
Also note that your addStringToArrayBytePlusSeparator method doesn't give any indication of how many bytes it's copied, which means the caller won't have any idea what to do with it afterwards. If you must use byte arrays like this, I'd suggest making addStringToArrayBytePlusSeparator return either the new "end of logical array" or the number of bytes copied. For example:
private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

/**
 * (Insert fuller description here.)
 * Returns the number of bytes written to the array
 */
private static int addStringToArrayBytePlusSeparator(byte[] arrDest,
                                                     String strToAdd,
                                                     int destPosition)
{
    // Encode once, with the same charset used for the length below
    byte[] encodedText = strToAdd.getBytes(ISO_8859_1);
    // TODO: Verify that there's enough space in the array
    System.arraycopy(encodedText, 0, arrDest, destPosition, encodedText.length);
    return encodedText.length;
}
Encoding/decoding problems are hard. In every processing step you have to do the correct encoding/decoding. So:
Familiarize yourself with the difference between bytes (InputStream) and characters (Readers, Strings).
Choose which character encoding you want to store your data in the database with, and which character encoding you want to expose through your web service. Make sure that when you load initial data into the database it's in the right encoding.
Connect with the right database properties. MySQL requires an addition to the connection URL: ?useUnicode=true&characterEncoding=UTF-8 when using UTF-8; I don't know about Oracle.
If you print/debug at a certain step and it looks OK, you can't be sure you did it right. The logger can write with the wrong encoding (sometimes making something look OK while in fact it's broken). Your terminal might not handle unusual byte encodings correctly. The same holds for command-line database clients. Your data might be stored wrongly, but your wrongly configured terminal interprets/shows the data as correct.
In XML, it's not only the stream encoding that matters, but also the encoding attribute of the XML declaration.
