NewString() & NewStringUTF() showing error not valid Modified UTF-8:

I am trying to pass a char* from C++ to Java using JNI on Android.
I have tried a number of ways to pass that data:
1) Using NewStringUTF:
const char* data = getData(); // this method returns a char array.
env->NewStringUTF(data);
Executing the code above produces the following error:
JNI WARNING: input is not valid Modified UTF-8: illegal continuation byte 0x70.
2) Using NewString:
const char* data = getData(); // this method returns a char array.
// passing a byte array to java
jbyteArray trackIDArray = env->NewByteArray(strlen(data));
env->SetByteArrayRegion(trackIDArray, 0, strlen(data), (const jbyte*)trackID);
On the Java side, I am getting garbage values. I don't understand how to get this char array into Java.

1) Your data is simply not a valid UTF-8 string. Not every char array is automatically valid UTF-8. You probably have it in some single-byte encoding (such as an ISO-8859 variant or a Windows code page), or it is not a readable string at all.
2) This should be OK, but show the code that fills trackID from data. The fact that you need a hard cast to jbyte* is suspicious. This code might be correct, but you can also make a mistake on the Java side:
If data is not a readable string, or is in a single-byte encoding that is not the platform's default charset, the java.lang.String(byte[]) constructor won't be able to make a readable string out of it. In that case, you must convert to UTF-8 on the C side. That also frees you from depending on the platform-specific encoding (which may differ wildly between platforms).
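One way to test the "not valid UTF-8" diagnosis is to hand the raw bytes to Java and run them through a strict decoder. The following is a minimal sketch; the class name Utf8Check and the method isValidUtf8 are made up for illustration. It checks standard UTF-8 only, not the Modified UTF-8 variant that JNI uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8Check {
    // Returns true only if the whole array decodes as standard UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // A malformed sequence was found, e.g. a lead byte
            // followed by something that is not a continuation byte.
            return false;
        }
    }
}
```

For example, the two bytes 0xE9 0x70 ("ép" in ISO-8859-1) fail this check, because 0xE9 announces a multi-byte sequence and 0x70 is not a legal continuation byte. That is the same kind of complaint as in the JNI warning from the question.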

I suspect you meant to pass data rather than trackID:
env->SetByteArrayRegion(trackIDArray, 0, strlen(data), (const jbyte*)data);
Then you have the bytes, and on the Java side you can check which encoding they are in, with a hex dump or other inspection.
Later:
String s = new String(data, "Cp1252"); // or whichever encoding it turns out to be

NewStringUTF expects you to pass a Modified UTF-8 string.
You are likely trying to pass standard UTF-8.
There are multiple ways to fix it:
The most obvious one is to encode the string as Modified UTF-8 in C++ before passing it to Java.
Another way is to pass it to Java as a byte array and use the String constructor to decode it from UTF-8.
The second way might be more efficient, since Java uses UTF-16 for its string representation in the end.
As an alternative approach, you could convert the string to UTF-16 in C++ and pass it to the NewString JNI function, which expects UTF-16.
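The byte-array route and the difference between the two encodings can be sketched on the Java side as follows. The class and method names are made up for illustration. The sketch uses DataOutputStream.writeUTF only as a convenient way to produce Modified UTF-8 in pure Java, which is an assumption of this example and not something the answer prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class Utf8Variants {
    // Decode a byte[] that native code filled with standard UTF-8.
    static String fromUtf8Bytes(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode as Modified UTF-8 via writeUTF, then drop the
    // 2-byte length prefix that writeUTF puts in front.
    static byte[] toModifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = bos.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }
}
```

The two encodings agree for most text. They differ for U+0000, which Modified UTF-8 writes as two bytes, and for characters outside the Basic Multilingual Plane, which Modified UTF-8 writes as two 3-byte surrogates instead of one 4-byte sequence.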

I put very large byte sources (> 2 KB) behind JNI like this.
Content of a CSV table file, wrapped in a C++ raw string literal:
R"xxx(tbl_Cbla,Column 02,Column 03,Column 04
sdfsdsad,sdfasd,dsfaddf,fdsasdf,fafasa
18,1,10,8,0)xxx"
std::string data1 =
#include "big_table1.csv"
;
std::string data2 =
#include "big_table2.csv"
;
extern "C" JNIEXPORT jbyteArray JNICALL
Java_com_example_bigData_rawResource(
        JNIEnv *env,
        jobject /* this */, jint index) {
    // Pick the embedded table; any other index yields an empty array.
    std::string values;
    switch (index) {
        case 0: values = data1; break;
        case 1: values = data2; break;
        default: break;
    }
    const jsize byteCount = static_cast<jsize>(values.size());
    jbyteArray ret = env->NewByteArray(byteCount);
    if (ret == nullptr) {
        return nullptr; // allocation failed; an OutOfMemoryError is pending
    }
    // Copy the raw bytes; no charset conversion happens on the native side.
    env->SetByteArrayRegion(ret, 0, byteCount,
                            reinterpret_cast<const jbyte*>(values.data()));
    return ret;
}
In Java you can get it back like this (declaring the native method is up to you):
ByteArrayInputStream bis = null;
try {
    bis = new ByteArrayInputStream(rawResource(1));
} catch (Exception e) {
    e.printStackTrace();
}
BufferedReader buffer = new BufferedReader(new InputStreamReader(bis, Charset.forName("UTF-8")));
How you handle the buffered reader is also up to you. A small example:
String line;
while ((line = buffer.readLine()) != null) {
    // play around with 'line'
}

Related

In Java Is it possible to convert character set 1047 into another character set, say 500?

I have a program that reads a message from MQ. The character set is 1047. Since my Java version is very old, it doesn't support this character set.
Is it possible to convert this string to character set 500 in the program, after receiving it but before reading it?
For example:
public void fun(String str) { // str is in char set 1047. **1047 is not supported on my system**
    /* Can I convert str into char set 500 here? Convert it into a byte stream and then back to a string. Something like this: */
    byte[] b = str.getBytes();
    ByteArrayOutputStream baos = null;
    try {
        baos = new ByteArrayOutputStream();
        baos.write(b);
        String converted = baos.toString("IBM-500");
        System.out.println(converted);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

byte[] b = str.getBytes(); converts the string (inside the JVM a String is always Unicode) to bytes using the default charset (file.encoding). First check whether str contains the correct text. If it does, you don't need to care about the 1047 encoding: just call str.getBytes("IBM-500") and you will get the bytes encoded in code page 500. Again, a String object is always Unicode; the encoding only matters when you convert the string to a byte array or back.
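The decode-then-encode idea can be sketched as a small helper. The class and method names are made up for illustration. The important detail is that the source charset is applied when the bytes become a String, and the target charset when the String becomes bytes again.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
final class Transcode {
    // Re-encode text bytes from one charset to another,
    // going through a (always Unicode) String in between.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // bytes -> Unicode
        return text.getBytes(to);              // Unicode -> bytes
    }
}
```

For the case in the question this would be called as transcode(raw, Charset.forName("IBM1047"), Charset.forName("IBM500")), assuming both charsets are available in the runtime. Charset.isSupported(name) tells you whether that is the case. If the runtime really lacks code page 1047, the bytes cannot be decoded correctly by this route.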

How to convert a String-represented ByteBuffer into a byte array in Java

I'm new to Java and I'm not sure how to do the following:
A Scala application somewhere converts a String into bytes:
ByteBuffer.wrap(str.getBytes)
I collect this byte array as a Java String, and I wish to do the inverse of what the Scala code above did, hence get the original String (object str above).
Getting the ByteBuffer as a String to begin with is the only option I have, as I'm reading it from an AWS Kinesis stream (or is it?). The Scala code shouldn't change either.
Example string:
String str = "AAAAAAAAAAGZ7dFR0XmV23BRuufU+eCekJe6TGGUBBu5WSLIse4ERy9............";
How can this be achieved in Java?
EDIT
Okay, so I'll try to elaborate a little more about the process:
A 3rd party Scala application produces CSV rows which I need to consume
Before storing those rows in an AWS Kinesis stream, the application does the following to each row:
ByteBuffer.wrap(output.getBytes);
I read the data from the stream as a string, and the string could look like the following one:
String str = "AAAAAAAAAAGZ7dFR0XmV23BRuufU+eCekJe6TGGUBBu5WSLIse4ERy9............";
I need to restore the contents of the string above into its original, readable, form;
I hope I've made it clearer now, sorry for puzzling you all to begin with.

If you want to go from byte[] to String, try new String(yourBytes).
Both getBytes() and the String(byte[]) constructor use the default character encoding.
From Amazon Kinesis Service API Reference:
The data blob to put into the record, which is Base64-encoded when the blob is serialized.
You need to base64 decode the string. Using Java 8 it would look like:
byte[] bytes = Base64.getDecoder().decode("AAAAAAAAAAGZ7dFR0XmV23BR........");
String str = new String(bytes, "UTF-8");
Other options: Base64 Encoding in Java
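As a self-contained version of the decoding step, here is a minimal sketch. The class name KinesisPayload is made up, and it assumes the producer's getBytes call effectively used UTF-8, which depends on the default charset of the machine running the Scala application.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the Kinesis SDK.
final class KinesisPayload {
    // Undo the Base64 transport encoding, then decode the text bytes.
    static String decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, StandardCharsets.UTF_8); // assumed charset
    }
}
```

For instance, decode("YSxiLGM=") yields the CSV row a,b,c. If the decoded bytes of the real record still look like binary data, the payload probably has another layer (compression or a framing header) that has to be removed first.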

I'm not sure I understand the question exactly, but do you mean this?
String decoded = new String(bytes);

public static void main(String[] args) {
    // bytesData is the byte[] you received (not defined here).
    String decoded = new String(bytesData); // uses the platform default charset
    String actualString;
    try {
        actualString = new String(bytesData, "UTF-8");
        System.out.println("String is " + actualString);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}

Sorry, wrong answer.
Again, ByteBuffer is a Java class, so they may work the same way.
You need a Java version.
From Kafka's ApiUtils:
def writeShortString(buffer: ByteBuffer, string: String) {
    if (string == null) {
        buffer.putShort(-1)
    } else {
        val encodedString = string.getBytes("utf-8")
        if (encodedString.length > Short.MaxValue) {
            // placeholder: substitute your own exception and message
            throw new IllegalArgumentException("String exceeds the maximum size")
        } else {
            buffer.putShort(encodedString.length.asInstanceOf[Short])
            buffer.put(encodedString)
        }
    }
}

For Kinesis data blobs:
private CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.decode(record.getData()).toString();

JAVA: failing to get encrypted data in string using xor

I am trying to print encrypted text as a string, but I have gone wrong somewhere. I am doing a simple XOR on a plain text. I then put the resulting encrypted string into a C program and apply the same XOR again to get the plain text back.
But in between, I am not able to get a proper string of the encrypted text to pass to C.
String xorencrypt(byte[] passwd,int pass_len){
char[] st = new char[pass_len];
byte[] crypted = new byte[pass_len];
for(int i = 0; i<pass_len;i++){
crypted[i] = (byte) (passwd[i]^(i+1));
st[i] = (char)crypted[i];
System.out.println((char)passwd[i]+" "+passwd[i] +"= " + (char)crypted[i]+" "+crypted[i]);/* characters are printed fine, but the problem is when I am converting them into a string */
}
return st.toString();
}
I don't know whether some kind of encoding is also needed, because if I add one, how will I decode and decrypt in the C program?
For example, suppose passwd = bond007;
then the Java program should return akkb78>
and the C program will then decrypt akkb78> back to bond007.

Use
return new String(crypted);
in that case you don't need st[] array at all.
By the way, the encoded value for bond007 is cmm`560 and not what you posted.
EDIT
While the solution above will most likely work in most Java environments, to be safe about encoding,
as suggested by Alex, provide an encoding parameter to the String constructor.
For example, if you want your string to carry 8-bit bytes:
return new String(crypted, "ISO-8859-1");
You need the same parameter when getting the bytes back from your string:
byte[] bytes = myString.getBytes("ISO-8859-1");
Alternatively, use the solution provided by Alex:
return new String(st);
But convert bytes to chars properly:
st[i] = (char) (crypted[i] & 0xff);
Otherwise all negative bytes (crypted[i] < 0) are sign-extended when cast to char, and you get surprising results.
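Both points, the explicit ISO-8859-1 round trip and the & 0xff mask, can be checked with a small sketch. The class and method names are made up for illustration; the XOR key (position + 1) is the one from the question.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class XorDemo {
    // XOR every byte with its 1-based position, as in the question.
    static byte[] xor(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ (i + 1));
        }
        return out;
    }

    // ISO-8859-1 maps every byte value 0..255 to exactly one char,
    // so bytes -> String -> bytes is lossless. XOR is its own inverse,
    // so the same method encrypts and decrypts.
    static String xor(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(xor(bytes), StandardCharsets.ISO_8859_1);
    }

    // Mask before widening; a plain (char) cast sign-extends negative bytes.
    static char toChar(byte b) {
        return (char) (b & 0xff);
    }
}
```

With this, xor("bond007") gives cmm`560 and applying xor again gives bond007 back. The C side has to treat the received data as raw bytes of the same length and must not rely on NUL termination, since an XOR result can contain a zero byte.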

Replace this line:
return st.toString();
with this:
return new String(st);

Sending java string as c/c++ bytes

I'm working on an Android app that acts as a TCP server/client. I want to send data from the app to a server/client written in C/C++ (built with the Boost library). I have a normal Java function that calls a native C function to convert a string to bytes.
The function is defined as follows (the native function is ConvertString):
// Send buffer, the method can be used by both client and server objects.
public void SendBuffer(String Buffer){
try {
// Convert char to string to byte
byte[] Temp = new byte[10];
String Teststring = "AAAAAAAABB";
Temp = ConvertString(Teststring);
//byte[] Temp = new String(Buffer).getBytes();
// Get socket output stream
OutputStream OutputBuffer = ClientSocket.getOutputStream();
//Write byte data to outputstream
OutputBuffer.write(Temp);
// Neatly flush and close the outputbuffer
OutputBuffer.flush();
OutputBuffer.close();
}
catch (IOException e) {
Log.e("TCPIPCommunicator: ", "Client: Failed to send", e);
e.printStackTrace();
}
}
ConvertString is a native function that converts the Java string to a C/C++ string and returns it as Java bytes. It is defined as follows:
JNIEXPORT jbyteArray JNICALL Java_com_example_communicationmoduleTCPIP_communicationmoduleTCPIP_ConvertString(
JNIEnv * env, jobject,
jstring Buffer)
{
// Array to fill with data
jbyteArray Array;
// Init java byte array
Array = env->NewByteArray(10);
const char* NewBuffer = env->GetStringUTFChars(Buffer, 0);
// Set byte array region with the size of the SendData CommStruct.
// Now we can send the data back.
env->SetByteArrayRegion(Array, 0, 10, (jbyte*)NewBuffer);
env->ReleaseStringUTFChars(Buffer, NewBuffer);
// Return java array
return Array;
}
}
When I run the program, I get 'AAAA' twice on the C side, but not the whole array (so no 'AAAAAAAABB'). I think the problem is that the server sends 'AAAA' twice and not the whole array at once. The client crashes with the following error:
'boost::exception_detail::clone_impl >'
what(): read: End of file
Does the Java server send the data incorrectly? Can anyone give me a suggestion? All feedback is welcome!

Do you want to send the data, or to make the JNI part work?
In the former case, use Java to convert the string to UTF-8 (which is identical to ASCII for English text).
A byte[] -> byte[] conversion of text is not exactly what you need, but you'll get the idea:
//byte[] result;
//byte[] source;
String s = new String(source,"UTF-8");
result = s.getBytes("UTF-16LE");
For the second case, I can share a portion of working code; it calls Java to convert from one encoding to another:
// it returns NULL in the case of an exception
// the returned memory is calloc()'d; it's the caller's responsibility to free() it.
char* changeEncoding(const char*source, int len, int direction)
{
JNIEnv* env = threadUnsafeInfo.env;
jobject obj = threadUnsafeInfo.obj;
if (!source) {
JNU_ThrowByName(env, "java/lang/NullPointerException", 0);
return NULL;
}
jbyteArray srcArray = env->NewByteArray(len);
jclass cls = env->FindClass("com/xyz/MyClass");
jmethodID mid = env->GetMethodID(cls, "convert", "([BI)[B");
if (mid != NULL && srcArray != NULL) {
env->SetByteArrayRegion(srcArray, 0, len, (jbyte*)source);
env->ExceptionClear();
//jbyteArray resArray = (jbyteArray)env->CallStaticObjectMethod(cls, mid, srcArray, direction);
jbyteArray resArray = (jbyteArray)env->CallObjectMethod(obj, mid, srcArray, direction);
if(env->ExceptionOccurred()) {
DLOG("exception in convert ([BI)[B");
env->ExceptionDescribe();
//env->ExceptionClear(); // ??
return NULL;
}
int resultLen = env->GetArrayLength(resArray);
char* result = (char*)calloc(2 + resultLen,1); // why 2: a bit of healthy paranoia ain't gonna hurt anyone
if (result == 0) {
JNU_ThrowByName(env, "java/lang/OutOfMemoryError", 0);
return NULL;
}
env->GetByteArrayRegion(resArray, 0, resultLen, (jbyte *)result);
env->DeleteLocalRef(cls);
env->DeleteLocalRef(resArray);
env->DeleteLocalRef(srcArray);
return result;
} else {
JNU_ThrowByName(env, "java/lang/NullPointerException", 0);
myassert(("method id = 0",0));
}
return NULL;
}
In the code that I have at hand, I did not use jstrings, preferring byte arrays.

Get Multilingual Data from ByteBuffer

I am receiving ByteBuffers in a UDP Java application.
The data in a ByteBuffer can be any strings, in any language and with any special characters, separated by zero bytes.
I use the following code to get the strings from it:
public String getString() {
byte[] remainingBytes = new byte[this.byteBuffer.remaining()];
this.byteBuffer.slice().get(remainingBytes);
String dataString = new String(remainingBytes);
int stringEnd = dataString.indexOf(0);
if(stringEnd == -1) {
return null;
} else {
dataString = dataString.substring(0, stringEnd);
this.byteBuffer.position(this.byteBuffer.position() + dataString.getBytes().length + 1);
return dataString;
}
}
These strings are stored in a MySQL database with everything set to UTF-8.
If I run the application on Windows, special characters like ® are displayed but Chinese characters are not.
After adding the VM argument -Dfile.encoding=UTF8, Chinese characters are displayed but characters like ® are shown as ??.
Please help.
Edit:
The input strings in the UDP packet are variable-length byte fields, encoded in UTF-8 and terminated by 0x00.
For JDBC I also use useUnicode=true&characterEncoding=UTF-8

String dataString = new String(remainingBytes); is wrong. You should almost never do that. You should find out what encoding was used to put the bytes into the UDP packet, and use the same encoding on that line:
String dataString = new String(remainingBytes, encoding); // e.g. "UTF-8"
Edit: based on your updated question, encoding should be "UTF-8"
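Beyond naming the charset, it may be safer to look for the 0x00 terminator in the raw bytes and decode only afterwards. Then the buffer position is advanced by a byte count that does not depend on re-encoding the string. A minimal sketch under that assumption; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class ZeroTerminated {
    // Returns the next 0x00-terminated UTF-8 string and moves the buffer
    // position past the terminator, or returns null if none is found.
    static String next(ByteBuffer buf) {
        int start = buf.position();
        for (int i = start; i < buf.limit(); i++) {
            if (buf.get(i) == 0) {          // absolute read, position unchanged
                byte[] bytes = new byte[i - start];
                buf.get(bytes);             // relative read, position is now i
                buf.position(i + 1);        // skip the terminator
                return new String(bytes, StandardCharsets.UTF_8);
            }
        }
        return null;
    }
}
```

Searching at byte level works because the byte 0x00 never occurs inside a multi-byte UTF-8 sequence, so the first zero byte really is the terminator.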

I'm not sure, but dataString contains only the data up to this zero, because stringEnd points at the first zero position, not past it.
dataString = dataString.substring(0, stringEnd+1);
or
String specChar = dataString.substring(stringEnd, stringEnd+1); which should return only the special character. But as I said at the beginning, I'm not sure.
