Send a string from C++ to Java through a socket

I have a block of code on the C++ end that sends a string like this:
char someName[100]="Some String here";
send(sock,someName,sizeof(someName),0);
and on the other end I have Java code looking for a string message like this:
DataInputStream dIn= new DataInputStream(SOCK.getInputStream());
String filename=dIn.readUTF(); //Looks for "Some String here"
The code does not continue and throws a UTFDataFormatException. So I'm basically looking for a way to convert the C++ someName to a format readUTF() accepts, so both ends will be happy!
Thanks!
EDIT:
I tried using a BufferedReader on the Java side and got something like:
‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~‡áø~

DataInputStream.readUTF() reads a string encoded in modified UTF-8, which makes it a bad candidate for reading data that was not written with DataOutputStream.writeUTF().
Also, your C++ program sends a fixed-length string padded with zeros. This can also cause problems when the receiver doesn't expect it.
C++11 also supports UTF-8 string literals (the u8 prefix), see here.
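A minimal Java-side sketch, assuming the C++ code keeps sending the fixed 100-byte, zero-padded buffer shown above: read exactly 100 bytes, strip the NUL padding, and decode the rest yourself instead of calling readUTF().
DataInputStream dIn = new DataInputStream(SOCK.getInputStream());
byte[] buffer = new byte[100];        // must match sizeof(someName) on the C++ side
dIn.readFully(buffer);                // blocks until all 100 bytes have arrived
int length = 0;
while (length < buffer.length && buffer[length] != 0) {
    length++;                         // stop at the first NUL terminator
}
String filename = new String(buffer, 0, length, java.nio.charset.StandardCharsets.UTF_8);
Alternatively, if the C++ side can be changed, sending a two-byte big-endian length followed by the string's UTF-8 bytes matches what writeUTF() produces (for plain ASCII text the two encodings are identical), and then the original readUTF() call works unchanged.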

Alternatively, write the data into a text file from the C/C++ application and read that text file from the Java application.

Related

Converting string to byte[] returns wrong value (encoding?)

I read a byte[] from a file and convert it to a String:
byte[] bytesFromFile = Files.readAllBytes(...);
String stringFromFile = new String(bytesFromFile, "UTF-8");
I want to compare this to another byte[] I get from a web service:
String stringFromWebService = webService.getMyByteString();
byte[] bytesFromWebService = stringFromWebService.getBytes("UTF-8");
So I read a byte[] from a file and convert it to a String and I get a String from my web service and convert it to a byte[]. Then I do the following tests:
// works!
org.junit.Assert.assertEquals(stringFromFile, stringFromWebService);
// fails!
org.junit.Assert.assertArrayEquals(bytesFromFile, bytesFromWebService);
Why does the second assertion fail?
Other answers have covered the likely fact that the file is not UTF-8 encoded giving rise to the symptoms described.
However, I think the most interesting aspect of this is not that the byte[] assert fails, but that the assert that the string values are the same passes. I'm not 100% sure why this is, but I think the following trawl through the source code might give us the answer:
Looking at how new String(bytesFromFile, "UTF-8"); works - we see that the constructor calls through to StringCoding.decode()
This in turn, if supplied with the UTF-8 character set, calls through to StringDecoder.decode()
This calls through to CharsetDecoder.decode() which decides what to do if the character is unmappable (which I guess will be the case if a non-UTF-8 character is presented)
In this case the decoder's default action is defined by
private CodingErrorAction unmappableCharacterAction
= CodingErrorAction.REPORT;
However, StringCoding overrides that default and configures its decoder with CodingErrorAction.REPLACE, so unmappable or malformed input does not raise an error and is substituted instead.
I think this means that even when the code gets an unmappable character, it substitutes its best guess, so the String representations come out the same under comparison, but the byte[] are no longer the same.
This hypothesis is kind of supported by the fact that the catch block for CharacterCodingException in StringCoding.decode() says:
} catch (CharacterCodingException x) {
// Substitution is always enabled,
// so this shouldn't happen
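A small self-contained demonstration (my own example, not part of the original answer) shows that substitution behaviour and why it makes the byte arrays diverge:
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    public static void main(String[] args) {
        // 0xC3 starts a two-byte UTF-8 sequence, but 0x28 ('(') is not a valid
        // continuation byte, so this array is malformed UTF-8.
        byte[] invalid = { (byte) 0xC3, (byte) 0x28 };

        // The String constructor does not throw; the bad sequence becomes U+FFFD.
        String s = new String(invalid, StandardCharsets.UTF_8);
        System.out.println(s);                    // prints "�("

        // Re-encoding yields EF BF BD 28, which no longer matches the original
        // two bytes, the same effect as the failing assertArrayEquals.
        byte[] roundTrip = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrip.length);     // 4, not 2
    }
}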
I don't understand it fully, but here's what I get so far:
The problem is that the data contains some bytes which are not valid UTF-8, as I know from the following check:
// returns false for my data!
// (uses java.nio.ByteBuffer, java.nio.charset.Charset, CharsetDecoder and CharacterCodingException)
public static boolean isValidUTF8(byte[] input) {
    CharsetDecoder cs = Charset.forName("UTF-8").newDecoder();
    try {
        cs.decode(ByteBuffer.wrap(input));
        return true;
    } catch (CharacterCodingException e) {
        return false;
    }
}
When I change the encoding to ISO-8859-1 everything works fine. The strange thing (which I don't understand yet) is why my conversion (new String(bytesFromFile, "UTF-8");) doesn't throw any exception (like my isValidUTF8 method does), although the data is not valid UTF-8.
However, I think I will go another way and encode my byte[] as a Base64 string, as I don't want more trouble with encodings.
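For what it's worth, that Base64 round-trip is only a couple of lines with java.util.Base64 (Java 8+), applied to the bytesFromFile array read earlier:
String transportSafe = Base64.getEncoder().encodeToString(bytesFromFile);  // safe to store or send as plain text
byte[] restored = Base64.getDecoder().decode(transportSafe);               // byte-for-byte identical to bytesFromFile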
The real problem in your code is that you don't know the real file encoding.
When you read the string from the web service you get a sequence of chars; when you convert the string from chars to bytes, the conversion is done correctly because you specify how to transform chars into bytes with a specific encoding ("UTF-8"). When you read a text file you face a different problem: you have a sequence of bytes that needs to be converted to chars. In order to do that properly you must know how the chars were converted to bytes, i.e. what the file encoding is. For files (unless specified) it's a platform default; on Windows, files are typically encoded in windows-1252 (which is very close to ISO-8859-1); on Linux/Unix it depends, but UTF-8 is usually the default.
By the way, the web service call did a second decoding operation under the hood: the HTTP call uses a header that defines how chars are encoded, i.e. how to read the bytes from the socket and transform them into chars. So calling a SOAP web service gives you back XML (which can be unmarshalled into a Java object) with all the encoding operations done properly.
So if you must read chars from a file, you have to face the encoding issue; you can use Base64 as you stated, but you lose one of the main benefits of text files: they are human-readable, which eases debugging and development.
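If the file's encoding is known or agreed upon, reading it with an explicit charset removes the dependence on the platform default. A minimal sketch; the path and the choice of ISO-8859-1 here are only placeholders:
Path path = Paths.get("myfile.txt");
byte[] bytesFromFile = Files.readAllBytes(path);
String stringFromFile = new String(bytesFromFile, StandardCharsets.ISO_8859_1);

// Or stream it line by line with the same explicit charset:
try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.ISO_8859_1)) {
    String firstLine = reader.readLine();
}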

Writing strings with chars like "ñ" to a txt file

I'm having a strange issue trying to write text files containing strings with characters like "ñ", "á" and so on. Let me first show you my little piece of code:
import java.io.*;

public class test {
    public static void main(String[] args) throws Exception {
        String content = "whatever";
        int c;
        c = System.in.read();
        content = content + (char) c;
        FileWriter fw = new FileWriter("filename.txt");
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write(content);
        bw.close();
    }
}
In this example, I'm just reading a char from the keyboard input and appending it to a given string, then writing the final string into a txt file. The problem is that if I type an "ñ", for example (I have a Spanish-layout keyboard), when I check the txt file it shows a strange char "¤" where there should be an "ñ"; that is, the content of the file is "whatever¤". The same happens with "ç", "ú" and so on. However, it writes fine ("whateverñ") if I just forget about the keyboard input and write:
...
String content = "whateverñ";
...
or
...
content = content + "ñ";
...
It makes me think that there might be something wrong with the read() method. Or maybe I'm using it wrongly? Or should I use a different method to get the keyboard input? I'm a bit lost here.
(I'm using JDK 7u45 on Windows 7 Pro x64.)
So ...
It works (i.e. you can read the accented characters on the output file) if you write them as literal strings.
It doesn't work when you read them from System.in and then write them.
This suggests that the problem is on the input side. Specifically, I think your console / keyboard must be using a character encoding for the input stream that does not match the encoding that Java thinks should be used.
You should be able to confirm this tentative diagnosis by outputting the characters you are reading in hexadecimal, and then checking the codes against the Unicode tables (which you can find at unicode.org, for example).
It strikes me as "odd" that the "platform default encoding" appears to be working on the output side, but not the input side. Maybe someone else can explain ... and offer a concrete suggestion for fixing it. My gut feeling is that the problem is in the way your keyboard is configured, not in Java or your application.
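A quick way to run that diagnostic (my own sketch, not part of the original answer) is to print the raw value System.in.read() returns next to the character Java makes of it:
int c = System.in.read();
// For 'ñ' you would expect U+00F1; anything else points at a console-encoding mismatch.
System.out.printf("read 0x%02X -> '%c'%n", c, (char) c);
For what it's worth, on a Spanish Windows setup the console typically uses code page 850, where 0xA4 is 'ñ', while the JVM's default file encoding is windows-1252, where 0xA4 is '¤'; that mismatch would explain the exact ñ-to-¤ substitution described in the question.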
Files do not remember their encoding format; when you look at a .txt file, the text editor makes a "best guess" at the encoding used.
If you try to read the file back into your program, the text should be back to normal.
Also, try printing the "strange" character directly.

Convert an object which contains UTF-8 strings to a string with the proper encoding

I'm processing an MMS and got its text part as:
mmsBodyPart.getContent();
It's simply an Object. Now I need to convert it to a String using UTF-8. I have tried:
String contentText = (String) mmsBodyPart.getContent();
but it doesn't work with specific characters and some strange chars appear.
I also tried:
String content = new String(contentText.getBytes("UTF-8"), "UTF-8");
Unsurprisingly, that also failed.
How that can be done ?
EDIT: The problem was caused by bad encoding in the file. There was nothing wrong with the code; I just didn't think of that in the first place...
Strings don't have an encoding in Java. An encoding only comes into play when converting between byte[] and String, so if you need a specific one, decode a byte[] with that encoding to get your String.
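In practice that means getting at the part's raw bytes and decoding them yourself. A rough sketch, assuming a javax.mail-style part that exposes getInputStream() (adapt if your MMS library hands you the raw content differently):
try (InputStream in = mmsBodyPart.getInputStream();
     ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
    byte[] chunk = new byte[4096];
    int n;
    while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);       // collect the raw bytes of the body part
    }
    String contentText = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
}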

Encryption/Decryption, getting IllegalBlockSizeException

I'm working with Java's encryption library and getting a IllegalBlockSizeException.
I am currently trying to extract database contents in XML file format. During the data dump, I am creating a manifest file with a string that gets encrypted using a key defined in the database.
Later, when the contents of the XML's files are loaded into another database, it gets the key from that database and uses it to decrypt the manifest. If the decrypted manifest does not match the original contents, that means the encryption keys in the source and destination databases do not match and the user is notified of this.
The following is the code. The EncryptionEngine object is a singleton that uses the Java encryption library to abstract away a lot of the details of encryption. Assume that it works correctly, as it's fairly old and mature code.
This is all in a class I've made. First, we have these data members:
private final String encryptedManifestContents;
private final static String DECRYPTED_MANIFEST_CONTENTS = "This file contains the encrypted string for validating data in the dump and load process";
final static String ENCRYPTED_MANIFEST_FILENAME = "manifest.bin";
First the encryption process. The string is encrypted like the following:
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS); // The contents get converted to bytes via getBytes("UTF-8")
Then written to the manifest file (destination is just a variable holding the file path as a string):
EncryptedManifestUtil encryptedManifestUtil = new EncryptedManifestUtil(); // The class I've created. The constructor is the code above, which just initialized the EncryptionEngine and encrypted the manifest string.
manifestOut = new FileOutputStream(destination + "/" + ENCRYPTED_MANIFEST_FILENAME);
manifestOut.write(encryptedManifestUtil.encryptedManifestContents.getBytes("UTF-8"));
At this point, the encryption process is done. We've taken a String, encrypted it, and written the contents to a file, in that order. Now when someone loads the data, the decryption process starts:
BufferedReader fileReader = new BufferedReader(new FileReader(filename)); // Filename is the manifest's file name and location
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
String decryptedManifest = encryptionEngine.decryptString(fileReader.readLine().getBytes("UTF-8")); // This is a symmetric decrypt
When the decryption happens, it throws this exception:
Caused by: javax.crypto.IllegalBlockSizeException: last block incomplete in decryption
at org.bouncycastle.jcajce.provider.symmetric.util.BaseBlockCipher.engineDoFinal(Unknown Source)
at javax.crypto.Cipher.doFinal(DashoA13*..)
It appears to read and write correctly to the file, but the contents are gibberish to me. The result from the fileReader.readLine() is:
9�Y�������䖷�߾��=Ă��� s7Cx�t�b��_-(�b��LFA���}�6�f����Ps�n�����ʢ�#�� �%��%�5P�p
Thanks for the help.
EDIT: So I changed the way I write to a file.
Recall this line:
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS);
The encrypt first gets the bytes from the input string, encrypts them, and then turns the result back into a string by Base64-encoding it and converting the Base64 byte array to a string.
With this in mind, I changed the file writer to a PrintWriter instead of a FileOutputStream and write the string directly to the file instead of the bytes. Unfortunately I'm still getting the error, although there seem to be fewer � characters in the String that comes back from readLine().
It looks like the problem is with your fileReader.readLine() - you're writing a byte stream to a file, and then reading it back in as a string. Instead, you should either read in a byte stream, e.g. refer to this question, or else use Base64 Encoding to convert your byte array to a string, write it to a file, read it from a file, and convert it back to a byte array.
I believe you are incorrectly using a Reader, which is an object defined to read characters, when you actually want to be dealing strictly in bytes. This is most likely not the entirety of your problem, but if you write bytes you should read bytes, not characters.
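A minimal sketch of the Base64 route suggested above. The method names and the assumption that the ciphertext is available as a byte[] are mine, not part of the question's EncryptionEngine API:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ManifestIo {

    // Base64-encode the raw ciphertext and store it as one line of plain text.
    static void writeManifest(Path file, byte[] cipherBytes) throws IOException {
        String line = Base64.getEncoder().encodeToString(cipherBytes);
        Files.write(file, line.getBytes(StandardCharsets.US_ASCII));
    }

    // Read the line back and decode it into exactly the same bytes, ready for decryption.
    static byte[] readManifest(Path file) throws IOException {
        String line = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
        return Base64.getDecoder().decode(line);
    }
}
Because Base64 output is plain ASCII, it survives being written and read as text, which is exactly what the readLine()-based loading code expects.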

How to handle character encoding with XML, JDom, JNI and C++

I am developing an application that reads in an XML document and passes the contents with JNI to a C++-DLL which validates it.
For this task I am using JDom and JUniversalChardet to parse the XML file in the correct encoding. My C++ code accepts a const char* for the contents of the XML file and needs it in the encoding "ISO-8859-15", otherwise it will throw an exception because of malformed characters.
My first approach was to use the shipped OutputFormatter of JDom and tell it to use Charset.forName("ISO-8859-15") while formatting the JDom document to a String. After that the header part of the XML in this String says:
<?xml version="1.0" encoding="ISO-8859-15"?>
The problem is that it is still stored in a Java String, and therefore in UTF-16, if I got that right.
My native method looks something like this:
public native String jniApiCall(String xmlFileContents);
So I pass the above-mentioned String from JDom's OutputFormatter into this JNI method. Everything is still UTF-16, right?
In the JNI-C++-Method I access the xmlFileContents String with
const string xmlDataString = env->GetStringUTFChars(xmlFileContents, NULL);
So, do I now have my above-mentioned String in UTF-16 or UTF-8? And my next question would be: how can I change the character encoding of the std::string xmlDataString to ISO-8859-15? Or is the way I am doing this not exactly elegant? Is there a way to do the character encoding completely in Java?
Thanks for your help!
Marco
You can always convert a String to a byte array in the needed character encoding using the byte[] getBytes(Charset charset) method (or byte[] getBytes(String charsetName)).
In Java you can use myString.getBytes("ISO-8859-15"); to get the byte array of the String in the character encoding passed as the parameter (in this case ISO-8859-15).
And then use that byte array in C++ to build the std::string with something like:
std::string myNewString(reinterpret_cast<const char*>(myByteArray), myByteArrayLength); // pass the length explicitly; the bytes are not NUL-terminated
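On the Java side, a hedged sketch of that idea: change the native signature to take the already-encoded bytes, so the encoding happens entirely in Java before crossing the JNI boundary. The byte[]-taking variant of jniApiCall and the xmlString variable are illustrative, not the questioner's existing code:
// Hypothetical variant of the native method that accepts pre-encoded bytes.
public native String jniApiCall(byte[] xmlFileBytes);

// Caller: encode the XML text explicitly before handing it to the native code.
byte[] isoBytes = xmlString.getBytes(Charset.forName("ISO-8859-15")); // xmlString: the text produced by JDom
String result = jniApiCall(isoBytes);
On the native side the array can then be read with GetByteArrayElements and GetArrayLength and copied into a std::string with an explicit length, which avoids both the modified-UTF-8 semantics of GetStringUTFChars and the missing length in the cast above.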
