Java - Write a string prefixed with the 7 bit encoded length [duplicate]


I have a program in C# (.NET) which writes 1 integer and 3 strings to a file using BinaryWriter.Write().
Now I am programming in Java (for Android; I'm new to Java), and I have to access the data that was previously written to the file using C#.
I tried using DataInputStream.readInt() and DataInputStream.readUTF(), but I can't get proper results. I usually get a UTFDataFormatException:
java.io.UTFDataFormatException: malformed input around byte 21
or the String and int I get are wrong...
FileInputStream fs = new FileInputStream(strFilePath);
DataInputStream ds = new DataInputStream(fs);
int i;
String str1,str2,str3;
i=ds.readInt();
str1=ds.readUTF();
str2=ds.readUTF();
str3=ds.readUTF();
ds.close();
What is the proper way of doing this?

I wrote a quick example of how to read .NET's BinaryWriter format in Java here.
Excerpt from the link:
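import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
/**
 * Get a string from a binary stream written by .NET's BinaryWriter.
 * The string is prefixed with its length in bytes, 7 bits at a time:
 * if len <= 0x7F, it is encoded on one byte as b0 = len;
 * if len <= 0x3FFF, it is encoded on 2 bytes as b0 = (len & 0x7F) | 0x80, b1 = len >> 7;
 * if len <= 0x1FFFFF, it is encoded on 3 bytes as b0 = (len & 0x7F) | 0x80,
 * b1 = ((len >> 7) & 0x7F) | 0x80, b2 = len >> 14; etc.
 *
 * @param is the stream to read from
 * @return the decoded string
 * @throws IOException if the stream ends early
 */
public static String getString(final InputStream is) throws IOException {
    int val = getStringLength(is);
    byte[] buffer = new byte[val];
    // a single read() may return fewer bytes than requested; readFully loops until the buffer is full
    new DataInputStream(is).readFully(buffer);
    // BinaryWriter writes UTF-8 unless another encoding was passed to its constructor
    return new String(buffer, StandardCharsets.UTF_8);
}
/**
 * Binary files are encoded with a variable-length prefix that tells you
 * the size of the string in bytes. The prefix is encoded in a 7-bit format where
 * the 8th bit tells you whether to continue: if the 8th bit is set, you
 * need to read the next byte.
 *
 * @param is the stream to read from
 * @return the string length in bytes
 * @throws IOException if the stream ends inside the prefix
 */
public static int getStringLength(final InputStream is) throws IOException {
    int count = 0;
    int shift = 0;
    boolean more = true;
    while (more) {
        int b = is.read();
        if (b < 0) {
            // without this check, -1 has the 8th bit set and the loop would never end
            throw new EOFException("EOF inside length prefix");
        }
        count |= (b & 0x7F) << shift;
        shift += 7;
        more = (b & 0x80) != 0;
    }
    return count;
}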
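The excerpt only handles the strings. The int at the start of the question's file is a separate issue: BinaryWriter.Write(int) stores it little-endian, while DataInputStream.readInt() reads big-endian, so its bytes have to be swapped. Here is a minimal sketch of a reader for the question's layout (one int, then three strings), assuming the C# side used BinaryWriter's default UTF-8 encoding; the class DotNetRecordReader and its method names are hypothetical:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that reads values written by .NET's BinaryWriter. */
final class DotNetRecordReader {
    private final DataInputStream in;

    DotNetRecordReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    /** BinaryWriter.Write(int) is little-endian; readInt() is big-endian, so swap the bytes. */
    int readInt32() throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    /** Decodes the 7-bit encoded length prefix (at most 5 bytes for a 32-bit value). */
    int read7BitEncodedInt() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.readUnsignedByte(); // throws EOFException at end of stream
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("Malformed 7-bit encoded length");
    }

    /** BinaryWriter.Write(string): byte length prefix, then the UTF-8 bytes. */
    String readString() throws IOException {
        int length = read7BitEncodedInt();
        if (length < 0) {
            throw new IOException("Negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For the question's file this would be used as new DotNetRecordReader(new FileInputStream(strFilePath)), followed by one readInt32() call and three readString() calls.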

As its name implies, BinaryWriter writes in a binary format (.NET's binary format, to be precise), and since Java is not a .NET language, it has no built-in way of reading it. You have to use an interoperable format.
You can choose an existing format, like XML or JSON or any other interop format.
Or you can create your own, provided your data is simple enough (it seems to be the case here). Just write a string to your file (using a StreamWriter, for instance), provided you know your string's format. Then read your file from Java as a string and parse it.
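For the text route, the Java side could look roughly like this. It is a minimal sketch assuming (hypothetically) that the C# program writes each value on its own line with StreamWriter.WriteLine, using StreamWriter's default UTF-8 encoding without a byte-order mark, and that the strings contain no line breaks; the class and field names are illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical record: one int and three strings, one value per line. */
final class TextRecord {
    final int number;
    final String first;
    final String second;
    final String third;

    private TextRecord(int number, String first, String second, String third) {
        this.number = number;
        this.first = first;
        this.second = second;
        this.third = third;
    }

    static TextRecord parse(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        int number = Integer.parseInt(requireLine(reader).trim());
        // Java evaluates arguments left to right, so the lines are consumed in order
        return new TextRecord(number, requireLine(reader), requireLine(reader), requireLine(reader));
    }

    private static String requireLine(BufferedReader reader) throws IOException {
        String line = reader.readLine(); // handles both \n and \r\n line endings
        if (line == null) {
            throw new IOException("Unexpected end of file");
        }
        return line;
    }
}
```

To read the file, wrap it with an explicit charset, e.g. new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8), rather than relying on the platform default.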

There is a very good explanation of the format used by BinaryWriter in this question. With it, it should be possible to read the data with a ByteArrayInputStream and write a simple translator.

Related

Store Binary Data in QR Code (ZXING Java Library)

My Java program needs to send a binary payload via QR Code, but I can't get it to work. I have tried several QR Code libraries and many approaches, but all seem to have this problem. My current implementation uses ZXING.
The problem is that all the Java libraries I've tried seem to be focused on String payloads, and do not provide support for binary data. The common suggested solution to this is to encode the binary data as Base64. However, my data is already near the size limit of QR Codes. With the 33% inflation caused by Base64 encoding, my data is too big. I have already expended significant effort on reducing the size of the payload, and it currently consists of 4-character hashes delimited by newlines, all inside max-level compression by the Java Deflater class. I can't make it any smaller.
I need a way to store binary data in a QR code with minimal data inflation overhead.

Update:
I recently went back and published the referenced code as a project on GitHub for anyone who wants to use it.
https://github.com/yurelle/Base45Encoder
I developed a solution which introduces a storage efficiency loss of only about 8%. It exploits a built-in compression optimization of the ZXING QR Code library.
Explanation
ZXING will automatically detect if your String payload is purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2 AlphaNumeric characters into 11 bits. The definition ZXING uses for "alphanumeric" is all-caps only, 0-9, and a few special symbols ('/', ':', etc.). All told, their definition allows 45 possible values. Then, it packs 2 of these Base45 digits into 11 bits.
2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048 possible states. This is only a loss of 1.1% in storage efficiency compared to raw binary.
45 ^ 2 = 2,025
2 ^ 11 = 2,048
2,048 - 2,025 = 23
23 / 2,048 = 0.01123046875 = 1.123%
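To make the pair packing concrete: in QR alphanumeric mode each character maps to a value from 0 to 44, and a pair is stored as 45 × first + second in 11 bits (a leftover single character takes 6 bits). The following is a minimal sketch using the standard QR alphanumeric character order; it is an illustration, not ZXING's code, and the class and method names are hypothetical:

```java
/** Hypothetical illustration of QR alphanumeric-mode pair packing. */
final class AlphanumericPairs {
    // QR alphanumeric character set: a character's index is its value (0..44)
    static final String CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

    static int code(char c) {
        int value = CHARSET.indexOf(c);
        if (value < 0) {
            throw new IllegalArgumentException("Not alphanumeric: " + c);
        }
        return value;
    }

    /** Two characters become one value in [0, 2024], which always fits in 11 bits. */
    static int packPair(char first, char second) {
        return code(first) * 45 + code(second);
    }

    /** Zero-padded binary string, to show the bits that would be written. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(value));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}
```

The largest pair, "::", packs to 2024, so the 23 values from 2025 to 2047 are the unused states counted above.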
However, this is the ideal / theoretical efficiency. My implementation processes data in chunks, using a long as a computational buffer. However, since Java longs are signed, we can only use the lower 7 bytes. The conversion code requires consistently non-negative values; using the highest 8th byte would contaminate the sign bit and randomly produce negative values.
Real-World Test:
Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
Raw Binary Size: 2,048
Encoded String Size: 3,218
QR Code Alphanum Size: 2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
This is a real-world storage efficiency loss of only 8%.
2,213 - 2,048 = 165
165 / 2,048 = 0.08056640625 = 8.0566%
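These figures can be reproduced from first principles. A chunk of n bytes needs the smallest k with 45^k ≥ 256^n base-45 digits, and QR alphanumeric mode spends 11 bits per character pair plus 6 bits for an odd leftover character. Here is a minimal sketch using BigInteger (the class and method names are hypothetical); it recomputes the {0,2,3,5,6,8,9,11,12} table that the implementation below hard-codes, and the 3,218-character and 2,213-byte figures for a 2,048-byte input split into 7-byte chunks. The 2,213 bytes cover only the character data, not the QR mode indicator or character-count header.

```java
import java.math.BigInteger;

/** Hypothetical helper reproducing the size figures of the Base45 QR encoding. */
final class Base45Sizing {
    static final BigInteger BASE = BigInteger.valueOf(45);
    static final int CHUNK_BYTES = 7; // Long.BYTES - 1, as in the implementation

    /** Smallest k with 45^k >= 256^numBytes, i.e. enough digits for any n-byte value. */
    static int digitsForBytes(int numBytes) {
        BigInteger needed = BigInteger.ONE.shiftLeft(8 * numBytes); // 256^n
        BigInteger capacity = BigInteger.ONE;
        int k = 0;
        while (capacity.compareTo(needed) < 0) {
            capacity = capacity.multiply(BASE);
            k++;
        }
        return k;
    }

    /** Length of the encoded string: full chunks plus one partial chunk. */
    static int encodedLength(int inputBytes) {
        return (inputBytes / CHUNK_BYTES) * digitsForBytes(CHUNK_BYTES)
                + digitsForBytes(inputBytes % CHUNK_BYTES);
    }

    /** Bits used by QR alphanumeric mode for the character data alone. */
    static int alphanumericDataBits(int numChars) {
        return (numChars / 2) * 11 + (numChars % 2) * 6;
    }
}
```

For 2,048 bytes this gives 292 full chunks of 11 digits plus 6 digits for the last 4 bytes, i.e. 3,218 characters, and 1,609 pairs × 11 bits = 17,699 bits, which rounds up to 2,213 bytes.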
Solution
I implemented it as a self-contained static utility class, so all you have to do is call:
//Encode
final byte[] myBinaryData = ...;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(myBinaryData);
//Decode
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);
Alternatively, you can also do it via InputStreams:
//Encode
final InputStream in_1 = ... ;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(in_1);
//Decode
final InputStream in_2 = ... ;
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(in_2);
Here's the implementation:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;
/**
* For some reason none of the Java QR Code libraries support binary payloads. At least, none that
* I could find anyway. The commonly suggested workaround for this is to use Base64 encoding.
* However, this results in a 33% payload size inflation. If your payload is already near the size
* limit of QR codes, this is a lot.
*
* This class implements an encoder which takes advantage of a built-in compression optimization
* of the ZXING QR Code library, to enable the storage of Binary data into a QR Code, with a
* storage efficiency loss of only about 8%.
*
* The built-in optimization is this: ZXING will automatically detect if your String payload is
* purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2
* AlphaNumeric characters into 11 bits.
*
*
* ----------------------
*
*
* The included ALPHANUMERIC_TABLE is the conversion table used by the ZXING library as a reverse
* index for determining if a given input data should be classified as alphanumeric.
*
* See:
*
* com.google.zxing.qrcode.encoder.Encoder.chooseMode(String content, String encoding)
*
* which scans through the input string one character at a time and passes them to:
*
* getAlphanumericCode(int code)
*
* in the same class, which uses that character as a numeric index into the
* ALPHANUMERIC_TABLE.
*
* If you examine the values, you'll notice that it ignores / disqualifies certain values, and
* effectively converts the input into base 45 (0 -> 44; -1 is interpreted by the calling code
* to mean a failure). This is confirmed in the function:
*
* appendAlphanumericBytes(CharSequence content, BitArray bits)
*
* where they pack 2 of these base 45 digits into 11 bits. This presents us with an opportunity.
* If we can take our data, and convert it into a compatible base 45 alphanumeric representation,
* then the QR Encoder will automatically pack that data into sub-byte chunks.
*
* 2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048
* possible states. This is only a loss of 1.1% in storage efficiency compared to raw binary.
*
* 45 ^ 2 = 2,025
* 2 ^ 11 = 2,048
* 2,048 - 2,025 = 23
* 23 / 2,048 = 0.01123046875 = 1.123%
*
* However, this is the ideal / theoretical efficiency. This implementation processes data in
* chunks, using a long as a computational buffer. However, since Java longs are signed, we
* can only use the lower 7 bytes. The conversion code requires consistently non-negative values;
* using the highest 8th byte would contaminate the sign bit and randomly produce negative
* values.
*
*
* Real-World Test:
*
* Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
*
* Raw Binary Size: 2,048
* Encoded String Size: 3,218
* QR Code Alphanum Size: 2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
*
* This is a real-world storage efficiency loss of only 8%.
*
* 2,213 - 2,048 = 165
* 165 / 2,048 = 0.08056640625 = 8.0566%
*/
public class BinaryToBase45Encoder {
public final static int[] ALPHANUMERIC_TABLE;
/*
* You could probably just copy & paste the array literal from the ZXING source code; it's only
* an array definition. But I was unsure of the licensing issues with posting it on the internet,
* so I did it this way.
*/
static {
//Copy lookup table from ZXING Encoder class
try {
final Field SOURCE_ALPHANUMERIC_TABLE = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredField("ALPHANUMERIC_TABLE");
SOURCE_ALPHANUMERIC_TABLE.setAccessible(true);
//Store
ALPHANUMERIC_TABLE = (int[]) SOURCE_ALPHANUMERIC_TABLE.get(null);
} catch (NoSuchFieldException | IllegalAccessException e) {
//Fail fast with the real cause; a null table would only surface later as a NullPointerException
throw new ExceptionInInitializerError(e);
}
}
public static final int NUM_DISTINCT_ALPHANUM_VALUES = 45;
public static final char[] alphaNumReverseIndex = new char[NUM_DISTINCT_ALPHANUM_VALUES];
static {
//Build AlphaNum Index
final int len = ALPHANUMERIC_TABLE.length;
for (int x = 0; x < len; x++) {
// The base45 result which the alphanum lookup table produces.
// i.e. the base45 digit value which String characters are
// converted into.
//
// We use this value to build a reverse lookup table to find
// the String character we have to send to the encoder, to
// make it produce the given base45 digit value.
final int base45DigitValue = ALPHANUMERIC_TABLE[x];
//Ignore the -1 records
if (base45DigitValue > -1) {
//The index into the lookup table which produces the given base45 digit value.
//
//i.e. to produce a base45 digit with the numeric value in base45DigitValue, we need
//to send the Encoder a String character with the numeric value in x.
alphaNumReverseIndex[base45DigitValue] = (char) x;
}
}
}
/*
* The storage capacity of one digit in the number system; i.e. the maximum
* possible number of distinct values which can be stored in 1 logical digit
*/
public static final int QR_PAYLOAD_NUMERIC_BASE = NUM_DISTINCT_ALPHANUM_VALUES;
/*
* We can't use all 8 bytes, because the Long is signed, and the conversion math
* requires consistently positive values. If we populated all 8 bytes, then the
* last byte has the potential to contaminate the sign bit, and break the
* conversion math. So, we only use the lower 7 bytes, and avoid this problem.
*/
public static final int LONG_USABLE_BYTES = Long.BYTES - 1;
//The following mapping was determined by brute-forcing -1 Long (all bits 1), and compressing to base45 until it hit zero.
public static final int[] BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION = new int[] {0,2,3,5,6,8,9,11,12};
public static final int NUM_BASE45_DIGITS_PER_LONG = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[LONG_USABLE_BYTES];
public static final Map<Integer, Integer> BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION = new HashMap<>();
static {
//Build Reverse Lookup
int len = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION.length;
for (int x=0; x<len; x++) {
int numB45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[x];
BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.put(numB45Digits, x);
}
}
public static String encodeToBase45QrPayload(final byte[] inputData) throws IOException {
return encodeToBase45QrPayload(new ByteArrayInputStream(inputData));
}
public static String encodeToBase45QrPayload(final InputStream in) throws IOException {
//Init conversion state vars
final StringBuilder strOut = new StringBuilder();
int data;
long buf = 0;
// Process all input data in chunks of LONG_USABLE_BYTES bytes; this allows for economies of scale
// so we can process more digits of arbitrary size before we hit the wall of the binary
// chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
// left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
// and the powers of 2 don't quite line up.
while(in.available() > 0) {
//Fill buffer
int numBytesStored = 0;
while (numBytesStored < LONG_USABLE_BYTES && in.available() > 0) {
//Read next byte
data = in.read();
//Push byte into buffer
buf = (buf << 8) | data; //8 bits per byte
//Increment
numBytesStored++;
}
//Write out in lower base
final StringBuilder outputChunkBuffer = new StringBuilder();
final int numBase45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[numBytesStored];
int numB45DigitsProcessed = 0;
while(numB45DigitsProcessed < numBase45Digits) {
//Chunk out a digit
final byte digit = (byte) (buf % QR_PAYLOAD_NUMERIC_BASE);
//Drop digit data from buffer
buf = buf / QR_PAYLOAD_NUMERIC_BASE;
//Write Digit
outputChunkBuffer.append(alphaNumReverseIndex[(int) digit]);
//Track output digits
numB45DigitsProcessed++;
}
/*
* The way this code works, the processing output results in a First-In-Last-Out digit
* reversal. So, we need to buffer the chunk output, and feed it to the OutputStream
* backwards to correct this.
*
* We could probably get away with writing the bytes out in inverted order, and then
* flipping them back on the decode side, but just to be safe, I'm always keeping
* them in the proper order.
*/
strOut.append(outputChunkBuffer.reverse().toString());
}
//Return
return strOut.toString();
}
public static byte[] decodeBase45QrPayload(final String inputStr) throws IOException {
//Prep for InputStream
final byte[] buf = inputStr.getBytes(java.nio.charset.StandardCharsets.US_ASCII);//Every character in the QR alphanumeric set is ASCII, so each char maps to exactly one byte
return decodeBase45QrPayload(new ByteArrayInputStream(buf));
}
public static byte[] decodeBase45QrPayload(final InputStream in) throws IOException {
//Init conversion state vars
final ByteArrayOutputStream out = new ByteArrayOutputStream();
int data;
long buf = 0;
int x=0;
// Process all input data in chunks of NUM_BASE45_DIGITS_PER_LONG digits; this allows for economies of scale
// so we can process more digits of arbitrary size before we hit the wall of the binary
// chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
// left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
// and the powers of 2 don't quite line up.
while(in.available() > 0) {
//Convert & Fill Buffer
int numB45Digits = 0;
while (numB45Digits < NUM_BASE45_DIGITS_PER_LONG && in.available() > 0) {
//Read in next char
char c = (char) in.read();
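//Translate back through lookup table, rejecting characters outside the alphanumeric set
if (c >= ALPHANUMERIC_TABLE.length || ALPHANUMERIC_TABLE[c] < 0) {
throw new IOException("Not a base45 QR payload character: " + c);
}
int digit = ALPHANUMERIC_TABLE[(int) c];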
//Shift buffer up one digit to make room
buf *= QR_PAYLOAD_NUMERIC_BASE;
//Append next digit
buf += digit;
//Increment
numB45Digits++;
}
//Write out in higher base
final LinkedList<Byte> outputChunkBuffer = new LinkedList<>();
final int numBytes = BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.get(numB45Digits);
int numBytesProcessed = 0;
while(numBytesProcessed < numBytes) {
//Chunk out 1 byte
final byte chunk = (byte) buf;
//Shift buffer to next byte
buf = buf >> 8; //8 bits per byte
//Write byte to output
//
//Again, we need to invert the order of the bytes, so as we chunk them off, push
//them onto a FILO stack; inverting their order.
outputChunkBuffer.push(chunk);
//Increment
numBytesProcessed++;
}
//Write chunk buffer to output stream (in reverse order)
while (outputChunkBuffer.size() > 0) {
out.write(outputChunkBuffer.pop());
}
}
//Return
out.flush();
out.close();
return out.toByteArray();
}
}
Here are some tests I ran to verify the code (wrapped in a test class; the imports assume JUnit 4):
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import com.google.zxing.common.BitArray;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import org.junit.Test;
public class BinaryToBase45EncoderTest {
    @Test
    public void stringEncodingTest() throws IOException {
        //Init test data
        final String testStr = "Some cool input data! !@#$%^&*()_+";
        //Encode
        final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(testStr.getBytes(StandardCharsets.UTF_8));
        //Decode
        final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);
        final String decodedStr = new String(decodedBytes, StandardCharsets.UTF_8);
        //Output (assertEquals works without the -ea JVM flag, unlike assert)
        assertEquals(testStr, decodedStr);
        System.out.println("They match!");
    }
    @Test
    public void binaryEncodingAccuracyTest() throws IOException {
        //Init test data
        final int maxBytes = 10_000;
        for (int x = 1; x <= maxBytes; x++) {
            System.out.print("x: " + x + "\t");
            //Encode
            final byte[] inputArray = getTestBytes(x);
            final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(inputArray);
            //Decode
            final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);
            //Output: compares the lengths as well as the contents
            assertArrayEquals(inputArray, decodedBytes);
            System.out.println("Passed!");
        }
    }
    @Test
    public void binaryEncodingEfficiencyTest() throws IOException, NoSuchMethodException, InvocationTargetException, IllegalAccessException {
        //Init test data
        final byte[] inputData = new byte[2048];
        new Random().nextBytes(inputData);
        //Encode
        final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(inputData);
        //Write to QR Code Encoder // Have to use Reflection to force access, since the function is not public.
        final BitArray qrCode = new BitArray();
        final Method appendAlphanumericBytes = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredMethod("appendAlphanumericBytes", CharSequence.class, BitArray.class);
        appendAlphanumericBytes.setAccessible(true);
        appendAlphanumericBytes.invoke(null, encodedStr, qrCode);
        //Output
        final int origSize = inputData.length;
        final int qrSize = qrCode.getSizeInBytes();
        System.out.println("Raw Binary Size:\t\t" + origSize + "\nEncoded String Size:\t" + encodedStr.length() + "\nQR Code Alphanum Size:\t" + qrSize);
        //Calculate Storage Efficiency Loss (extra bytes relative to the raw input)
        final int delta = qrSize - origSize;
        final double efficiencyLoss = ((double) delta) / origSize;
        System.out.println("Storage Efficiency Loss: " + String.format("%.3f", efficiencyLoss * 100) + "%");
    }
    public static byte[] getTestBytes(int numBytes) {
        final byte[] bytes = new byte[numBytes];
        new Random().nextBytes(bytes);
        return bytes;
    }
}

How to convert specific bytes from binary file into string most efficiently

So I have binary FRX files, from which I need to extract strings into Java.
I wrote this into my Java program like so:
FileInputStream ReadFRX = null ;
FileOutputStream TempCapt = null ;
try{
// refNum is hex number on end of VB form property converted to decimal, ex: $"frmResidency.frx":0134
int refNum = Integer.parseInt(line.substring(line.length() - 4, line.length()), 16);
// FRXtemp.txt is created, to temporarily write FRX captions onto to be read from.
PrintWriter writer = new PrintWriter("FRXtemp.txt", "UTF-8");
writer.close();
//opens corresponding FRX file to read into
ReadFRX = new FileInputStream("FRXFiles\\"+curFrmName + ".frx");
//LittleEndianDataInputStream (from Guava) must be used so that readInt() reads little-endian values
LittleEndianDataInputStream ActReadFRX = new LittleEndianDataInputStream(ReadFRX);
TempCapt = new FileOutputStream("FRXtemp.txt");
ActReadFRX.skipBytes(refNum);
int length = ActReadFRX.readInt();
int c;
for (c = 0; c < length; c++) {
// first read byte and check for EOF
TempCapt.write(ActReadFRX.read());
}
}
//If caption is not read properly (ie. possibly wrong bytes), EOF Exception will occur and designer will break
catch (EOFException e){
System.out.println("ERROR : FRX Caption property was mishandled");
break;
}
//Read data from FRXtemp.txt into string
String actCaption = "\"" + new Scanner(new File("FRXtemp.txt")).useDelimiter("\\A").next() + " \" ";
This works perfectly; however, I think writing to a temporary file just so that I can read from it must be highly unnecessary.
Why I can't think of a more efficient method:
I feel like a much more practical approach would be to use a byte[] array and then convert that to a string; however, I must only have the bytes in which the string is stored. Research led me to believe that RandomAccessFile was then necessary so that I could set an offset from readInt to begin reading bytes; however, RandomAccessFile assumes big-endian format, whereas I have little-endian format. I can obviously convert, but at that point my current solution seems just as viable.
My question is, is there an efficient way to convert a specific section of bytes corresponding to a 4-byte integer (from a binary file with little-endian format) into a string in Java?
I feel as though I must be overlooking something much more simple. Thanks :)

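You can do this any number of ways; however, the simplest might be:
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.nio.charset.StandardCharsets;
try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {
dis.skip(bytesToSkip); // jump to the 4-byte length prefix
// readInt() is big-endian; reverseBytes turns the little-endian value around
int length = Integer.reverseBytes(dis.readInt());
byte[] bytes = new byte[length];
dis.readFully(bytes); // fills the whole array or throws EOFException
return new String(bytes, StandardCharsets.UTF_8);
}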
The method you might have been looking for is Integer.reverseBytes:
/**
* Returns the value obtained by reversing the order of the bytes in the
* two's complement representation of the specified {@code int} value.
*
* @param i the value whose bytes are to be reversed
* @return the value obtained by reversing the bytes in the specified
* {@code int} value.
* @since 1.5
*/
public static int reverseBytes(int i) {
return ((i >>> 24) ) |
((i >> 8) & 0xFF00) |
((i << 8) & 0xFF0000) |
((i << 24));
}

Something like this maybe?
// mybytes holds the 4 length bytes; little-endian, so mybytes[0] is the least significant
long length = (mybytes[0] & 0xff)
| (mybytes[1] & 0xff) << 8
| (mybytes[2] & 0xff) << 16
| (long) (mybytes[3] & 0xff) << 24;

You can use the InputStream that you have as the source and use a ByteBuffer to correct the endianness when creating the Strings as needed. This would be the most efficient way.
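A minimal sketch of the ByteBuffer route, assuming the whole .frx file is small enough to load into memory (for example with Files.readAllBytes) and that you know which charset the caption bytes use; the class name, method name and charset parameter are illustrative assumptions:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

final class FrxStrings {
    /** Hypothetical helper: a little-endian int length at offset, then that many bytes of text. */
    static String readLengthPrefixedString(byte[] file, int offset, Charset charset) {
        // absolute getInt(index) honours the buffer's byte order and does not move its position
        int length = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        // decode straight out of the array: no temporary file and no extra copy
        return new String(file, offset + 4, length, charset);
    }
}
```

With the question's variables this would be something like readLengthPrefixedString(Files.readAllBytes(Paths.get("FRXFiles", curFrmName + ".frx")), refNum, charset).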

Error when show UTF8 String with build mode in java

I have a hex string (sA) converted from a UTF-8 string.
When I convert the hex string sA back to a UTF-8 string, it doesn't display in the form UI in build mode (running the .jar file), but in run mode or debug mode the UTF-8 string displays correctly in the form UI.
I use NetBeans IDE 7.3.1.
My code is below:
public String hexToString(String txtInHex) {
byte[] txtInByte = new byte[txtInHex.length() / 2];
int j = 0;
for (int i = 0; i < txtInHex.length(); i += 2) {
txtInByte[j++] = Byte.parseByte(txtInHex.substring(i, i + 2), 16);
}
return new String(txtInByte);
}
private String asHex(byte[] buf) {
char[] chars = new char[2 * buf.length];
for (int i = 0; i < buf.length; ++i) {
chars[2 * i] = HEX_CHARS[(buf[i] & 0xF0) >>> 4];
chars[2 * i + 1] = HEX_CHARS[buf[i] & 0x0F];
}
return new String(chars);
}

There are multiple problems with this code.
The valid range for byte values is -128 to 127, or -0x80 to 0x7F in hex, and Byte.parseByte enforces this. If your asHex method has to process a character whose second byte is greater than 127, it will produce a string that can't be decoded by hexToString.
The asHex method processes only the second byte of the input characters, so it will work correctly only for the first 256 Unicode characters and produce bogus output for the rest of them.
The hexToString method decodes a string from a byte array assuming some platform-specific default encoding, which will give incorrect results if the data was supposedly encoded in UTF-8 and the default encoding is something else.
Why are you trying to create your own methods for encoding and decoding hex strings instead of using a well known and tested library?
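For reference, here is a minimal sketch of the two helpers with those problems addressed: each hex pair is parsed with Integer.parseInt (which accepts 80 to FF) and then narrowed to a byte, and UTF-8 is named explicitly in both directions. The class and method names are illustrative; on Java 17 and later, java.util.HexFormat can handle the hex part.

```java
import java.nio.charset.StandardCharsets;

final class HexText {
    private static final char[] HEX_CHARS = "0123456789ABCDEF".toCharArray();

    /** UTF-8 encode the text, then write each byte as two hex digits. */
    static String stringToHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        char[] chars = new char[2 * bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[2 * i] = HEX_CHARS[(bytes[i] & 0xF0) >>> 4];
            chars[2 * i + 1] = HEX_CHARS[bytes[i] & 0x0F];
        }
        return new String(chars);
    }

    /** Parse hex pairs back into bytes, then decode them as UTF-8. */
    static String hexToString(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // parseInt accepts "80".."FF", unlike Byte.parseByte; the cast keeps the low 8 bits
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```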

new String(txtInByte, "UTF-8");
Without an explicit encoding, the platform default encoding is used (for instance Windows-1252). The same holds for the inverse, String.getBytes:
String s = "....";
byte[] b = s.getBytes("UTF-8");

Java Convert 7bit Charset Octets to Readable String (From PDU SMS)

I'm receiving SMS messages from a GSM modem in PDU format; the TP-User-Data is "C8329BFD06DDDF72363904",
and what I get is "�2����r69", while the sent SMS is "Hello World!".
Here is my java code:
private String fromPDUText(String PDUSMSText) {
String endoding = PDUSMSText.substring(0, 2);
PDUSMSText = PDUSMSText.substring(18);
byte bs[] = new byte[PDUSMSText.length() / 2];
for(int i = 0; i < PDUSMSText.length(); i += 2) {
bs[i / 2] = (byte) Integer.parseInt(PDUSMSText.substring(i, i + 2), 16);
}
try {
String out = new String(bs, "ASCII");
} catch(UnsupportedEncodingException e) {
e.printStackTrace();
return "";
} finally {
return out;
}
}

The input is packed at 7 bits per character, which means that every 7 bytes encode 8 characters. Constructing a parser for this format can be a fun exercise or a frustrating experience, depending on how you take it. You are probably better off using a library, and a quick Google search reveals several code examples.

This is how 7-bit characters are packed; see "Encoding-Decoding-7-bit-User-Data-for-SMS-PDU-PDU".
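The packing direction can be sketched with a bit accumulator: push 7 bits in per character, take 8 bits out whenever enough are available, and pad the last octet with zero bits. This is a minimal sketch that assumes the text only uses characters whose GSM 03.38 default-alphabet code equals their ASCII code (letters, digits, space and common punctuation such as ! ? . ,); a real implementation needs the full GSM table. The names are hypothetical.

```java
final class SeptetPacker {
    /** Packs 7-bit characters into octets, least significant bits first. */
    static byte[] pack(String text) {
        byte[] out = new byte[(text.length() * 7 + 7) / 8]; // ceil(7n / 8)
        int pipe = 0;     // bits not yet written
        int bitCount = 0; // number of valid bits in pipe
        int pos = 0;
        for (int i = 0; i < text.length(); i++) {
            int septet = text.charAt(i);
            if (septet > 0x7F) {
                throw new IllegalArgumentException("Not a 7-bit character: " + text.charAt(i));
            }
            pipe |= septet << bitCount; // new bits go above the ones already waiting
            bitCount += 7;
            while (bitCount >= 8) {
                out[pos++] = (byte) pipe;
                pipe >>>= 8;
                bitCount -= 8;
            }
        }
        if (bitCount > 0) {
            out[pos] = (byte) pipe; // remaining bits, padded with zeros
        }
        return out;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```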

Personally I find it easiest to attack this kind of problem by viewing it as a pipe where you feed 8 bits in at one end and retrieve 7 bits at the other. As long as there are at least 7 bits in the pipe, you read from it. When there are fewer than 7 bits, you need to add some more, so you write 8 new bits into it. So what you need is:
A pipe that can hold at least 14 bits (but why be cheap? Go with a 32-bit int!).
A counter keeping track of how many bits are in the pipe at any given moment.
The algorithm in pseudo code is as follows:
pipe = 0;
bitCount = 0;
while(hasMoreData())
{
pipe |= readByte() << bitCount;
bitCount += 8;
while(bitCount >= 7)
{
writeByte(pipe & 0x7F);
pipe >>= 7;
bitCount -= 7;
}
}
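Translated into Java, the pipe might look like the following minimal sketch. It takes the TP-User-Data as a hex string plus the number of septets (the TP-UDL field); the count matters because trailing padding bits could otherwise be read as an extra character (septet 0, '@' in the GSM alphabet). It maps septets straight to chars, so it only handles characters that the GSM 03.38 default alphabet shares with ASCII. The names are hypothetical.

```java
final class SeptetDecoder {
    static String decode(String userDataHex, int septetCount) {
        StringBuilder out = new StringBuilder(septetCount);
        int pipe = 0;     // 32-bit pipe, as suggested above
        int bitCount = 0; // bits currently in the pipe
        for (int i = 0; i + 1 < userDataHex.length() && out.length() < septetCount; i += 2) {
            int octet = Integer.parseInt(userDataHex.substring(i, i + 2), 16);
            pipe |= octet << bitCount; // write 8 new bits above the waiting ones
            bitCount += 8;
            while (bitCount >= 7 && out.length() < septetCount) {
                out.append((char) (pipe & 0x7F)); // read 7 bits out
                pipe >>>= 7;
                bitCount -= 7;
            }
        }
        return out.toString();
    }
}
```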

Converting char array into byte array and back again

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:
char[] password = "password".toCharArray();
byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);
byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8);
passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF);
}
String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);
System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));
The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing a null terminator or something? What can I do to make the conversion and unconversion work?

Conversion between char and byte is character set encoding and decoding. I prefer to make it as clear as possible in code. It doesn't really mean extra code volume:
Charset latin1Charset = Charset.forName("ISO-8859-1");
CharBuffer charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // call toString() on the result to get a String
ByteBuffer byteBuffer = latin1Charset.encode(charBuffer); // encode(String) also works
Aside:
java.nio classes and java.io Reader/Writer classes use ByteBuffer and CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to use these classes directly. However, you can always do:
byteArray = byteBuffer.array(); byteBuffer = ByteBuffer.wrap(byteArray);
byteBuffer.get(byteArray); byteBuffer.put(byteArray);
charArray = charBuffer.array(); charBuffer = CharBuffer.wrap(charArray);
charBuffer.get(charArray); charBuffer.put(charArray);

Original Answer
public byte[] charsToBytes(char[] chars){
Charset charset = Charset.forName("UTF-8");
ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes){
Charset charset = Charset.forName("UTF-8");
CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Edited to use StandardCharsets
public byte[] charsToBytes(char[] chars)
{
final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes)
{
final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Here is a JavaDoc page for StandardCharsets.
Note this on the JavaDoc page:
These charsets are guaranteed to be available on every implementation of the Java platform.
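Since the question is about a password, it may also be worth wiping the intermediate buffers: the ByteBuffer returned by Charset.encode can have a backing array larger than its limit, and that array still holds the encoded password after Arrays.copyOf. Here is a minimal sketch of that variant (the class and method names are illustrative). It reduces, but cannot fully rule out, copies left in memory, and the caller should still clear the original arrays when done.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PasswordBytes {
    /** UTF-8 encodes the chars, then zeroes the encoder's working array. */
    static byte[] charsToBytesAndWipe(char[] chars) {
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] result = Arrays.copyOfRange(byteBuffer.array(), byteBuffer.position(), byteBuffer.limit());
        Arrays.fill(byteBuffer.array(), (byte) 0); // clear the password bytes left behind
        return result;
    }

    /** UTF-8 decodes the bytes, then zeroes the decoder's working array. */
    static char[] bytesToCharsAndWipe(byte[] bytes) {
        CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        char[] result = Arrays.copyOfRange(charBuffer.array(), charBuffer.position(), charBuffer.limit());
        Arrays.fill(charBuffer.array(), '\0');
        return result;
    }
}
```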

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing; if you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.

What I would do is use one loop to convert to bytes and another to convert back to chars.
char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
bytes[i*2] = (byte) (chars[i] >> 8);
bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++)
chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);

If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just does a UTF-16 conversion (big-endian by default; you can set the byte order with the order method), since Java Strings, and thus your char[], use this encoding internally.
Use Charset.forName(charsetName), and then its encode or decode method, or the newEncoder /newDecoder.
When converting your byte[] to String, you also should indicate the encoding (and it should be the same one).
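The choice between Charset.encode and newEncoder() matters for passwords: Charset.encode silently replaces characters the charset cannot represent (for ISO-8859-1, with '?'), so two different passwords could end up as the same bytes. Here is a minimal sketch using newEncoder() with CodingErrorAction.REPORT, which throws instead; the class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

final class StrictEncoding {
    static byte[] encodeStrict(char[] chars, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // throws instead of substituting a replacement byte
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(chars));
        return Arrays.copyOfRange(bytes.array(), bytes.position(), bytes.limit());
    }
}
```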

This is an extension to Peter Lawrey's answer. For the backward (bytes-to-chars) conversion to work correctly for the whole range of chars, the code should be as follows:
char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}
We need to "unsign" the bytes (& 0xff) before using them. Otherwise half of all possible char values will not come back correctly. For instance, chars within the [0x80..0xff] range will be affected.

You should make use of getBytes() instead of toCharArray()
Replace the line
char[] password = "password".toCharArray();
with
byte[] password = "password".getBytes();

When you call getBytes() on a String in Java without specifying a charset, the result depends on your computer's default encoding (e.g. UTF-8 or ISO-8859-1).
So whenever you want to get the bytes of a String object, make sure to specify an encoding, like:
String sample = "abc";
byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);
Let's check what happens in the code.
In Java, the String named sample is a sequence of UTF-16 chars: every char in the String is a 2-byte code unit.
sample: value "abc", as chars (hex): 00 61 00 62 00 63
a -> 00 61
b -> 00 62
c -> 00 63
But when we call getBytes on the String, we get:
byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);
//result is: 61 62 63
//length: 3 bytes
byte[] b_byte = sample.getBytes(StandardCharsets.UTF_16BE);
//result is: 00 61 00 62 00 63
//length: 6 bytes
To get the original two bytes of each char of the String, we can split every char into its high and low byte. Below is the sample code:
public static byte[] charArray2ByteArray(char[] chars){
int length = chars.length;
// 2 bytes per char, high byte first (the same order as UTF-16BE)
byte[] result = new byte[length*2];
int i = 0;
for(int j = 0 ;j<chars.length;j++){
result[i++] = (byte)( (chars[j] & 0xFF00) >> 8 );
result[i++] = (byte)((chars[j] & 0x00FF)) ;
}
return result;
}
Usage:
String sample = "abc";
//First get the chars of the String; each char has two bytes in Java.
char[] sample_chars = sample.toCharArray();
//Get the bytes
byte[] result = charArray2ByteArray(sample_chars);
//Back to String.
//Make sure we use UTF_16BE, because each char was written high byte first,
//which is the same byte order as UTF-16BE.
String sample_back = new String(result, StandardCharsets.UTF_16BE);
