I have the following C++ code to cipher a string with XOR.
#define MPI_CIPHER_KEY "qwerty"

Buffer FooClient::cipher_string(const Buffer& _landing_url)
{
    String key(MPI_CIPHER_KEY);
    Buffer key_buf(key.chars(), key.length());
    Buffer landing_url_cipher = FooClient::XOR(_landing_url, key_buf);
    Buffer b64_url_cipher;
    base64_encode(landing_url_cipher, b64_url_cipher);
    return b64_url_cipher;
}
Buffer FooClient::XOR(const Buffer& _data, const Buffer& _key)
{
    Buffer retval(_data);
    unsigned int klen = _key.length();
    unsigned int dlen = _data.length();
    unsigned int k = 0;
    unsigned int d = 0;
    for (; d < dlen; d++)
    {
        retval[d] = _data[d] ^ _key[k];
        k = (++k < klen ? k : 0);
    }
    return retval;
}
I have seen this Java implementation in another question. Would it work for this case?
String s1, s2;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s1.length() && i < s2.length(); i++)
    sb.append((char) (s1.charAt(i) ^ s2.charAt(i)));
String result = sb.toString();
Or is there an easier way to do it?
It doesn't look the same to me. The C++ version loops across all of _data no matter what the _key length is, cycling through _key as necessary (the k = (++k < klen ? k : 0); in the C++ code).
Yours returns as soon as the shorter of key or data is exhausted.
Personally, I'd start with the closest literal translation from C++ to Java that you can do, keeping parameter and local names the same.
Then write unit tests for it that have known inputs and outputs from the C++ version.
Then start refactoring the Java version to use Java idioms, ensuring the tests still pass.
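For example, a minimal JUnit 5 sketch of such a test, assuming the port exposes a static FooClient.xor(byte[], byte[]) method (hypothetical name) and using one vector computed by hand:

import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import org.junit.jupiter.api.Test;

class XorPortTest {
    @Test
    void matchesCppOutput() {
        // "AB" XOR "qw" by hand: 0x41 ^ 0x71 = 0x30, 0x42 ^ 0x77 = 0x35
        byte[] expected = { 0x30, 0x35 };
        assertArrayEquals(expected, FooClient.xor("AB".getBytes(), "qwerty".getBytes()));
    }
}

Real tests should use inputs and outputs captured from the C++ build.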
No - the Java code will only XOR up to the length of the smaller string, whereas the C++ code XORs the entire data.
Assuming s1 is your "key", this can be fixed by changing to:

for (int i = 0; i < s2.length(); i++)
    sb.append((char) (s1.charAt(i % s1.length()) ^ s2.charAt(i)));
Also, the Base64 encoding of the return value is missing.
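Putting both fixes together, a complete Java port of cipher_string might look like this (an untested sketch; it assumes the data is UTF-8 text and uses java.util.Base64, available since JDK 8):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FooClient {
    private static final String CIPHER_KEY = "qwerty";

    public static String cipherString(String landingUrl) {
        byte[] data = landingUrl.getBytes(StandardCharsets.UTF_8);
        byte[] key = CIPHER_KEY.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(xor(data, key));
    }

    // Mirrors the C++ XOR: walks all of data, cycling through key as needed.
    public static byte[] xor(byte[] data, byte[] key) {
        byte[] retval = new byte[data.length];
        for (int d = 0, k = 0; d < data.length; d++) {
            retval[d] = (byte) (data[d] ^ key[k]);
            k = (k + 1 < key.length) ? k + 1 : 0;
        }
        return retval;
    }
}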
I'm porting my JNA-based library to "pure" Java using the Foreign Function and Memory API (JEP 424) in JDK 19.
One frequent use case my library handles is reading (null-terminated) Strings from native memory. For most *nix applications these are "C Strings", and the MemorySegment.getUtf8String() method is sufficient for the task.
Native Windows Strings, however, are stored in UTF-16 (LE). Referenced as arrays of TCHAR or as "Wide Strings", they are treated similarly to "C Strings" except that each character consumes 2 bytes.
JNA provides a Native.getWideString() method for this purpose which invokes native code to efficiently iterate over the appropriate character set.
I don't see a UTF-16 equivalent to the getUtf8String() (and corresponding set...()) optimized for these Windows-based applications.
I can work around the problem with a few approaches:
If I'm reading from a fixed size buffer, I can create a new String(bytes, StandardCharsets.UTF_16LE) and:
If I know the memory was cleared before being filled, use trim()
Otherwise split() on the null delimiter and extract the first element
If I'm just reading from a pointer offset with no knowledge of the total size (or a very large total size I don't want to instantiate into a byte[]) I can iterate character-by-character looking for the null.
While certainly I wouldn't expect the JDK to provide native implementations for every character set, I would think that Windows represents a significant enough usage share to support its primary native encoding alongside the UTF-8 convenience methods. Is there a method to do this that I haven't discovered yet? Or are there any better alternatives than the new String() or character-based iteration approaches I've described?
Since Java’s char is a UTF-16 unit, there’s no need for special “wide string” support in the Foreign API, as the conversion (which may be a mere copying operation in some cases) already exists:
public static String fromWideString(MemorySegment wide) {
    var cb = wide.asByteBuffer().order(ByteOrder.nativeOrder()).asCharBuffer();
    int limit = 0; // check for zero termination
    for (int end = cb.limit(); limit < end && cb.get(limit) != 0; limit++) {}
    return cb.limit(limit).toString();
}
public static MemorySegment toWideString(String s, SegmentAllocator allocator) {
    MemorySegment ms = allocator.allocateArray(ValueLayout.JAVA_CHAR, s.length() + 1);
    ms.asByteBuffer().order(ByteOrder.nativeOrder()).asCharBuffer().put(s).put('\0');
    return ms;
}
This is not using UTF-16LE specifically, but the current platform’s native order, which is usually the intended thing on a platform with native wide strings. Of course, when running on Windows x86 or x64, this will result in the UTF-16LE encoding.
Note that CharBuffer implements CharSequence which implies that for a lot of use cases you can omit the final toString() step when reading a wide string and effectively process the memory segment without a copying step.
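For example, a quick round-trip sketch under the JDK 19 preview API, where a confined MemorySession doubles as the SegmentAllocator (note these class names shifted in later previews):

try (MemorySession session = MemorySession.openConfined()) {
    MemorySegment ms = toWideString("Hello Windows", session);
    String round = fromWideString(ms);
    System.out.println(round); // prints "Hello Windows"
}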
A charset decoder provides a way to convert a null-terminated UTF-16LE wide MemorySegment to a String on Windows using the Foreign Memory API. This may not be any different from, or an improvement on, your workaround suggestions, as it involves scanning the resulting character buffer for the null position.
public static String toJavaString(MemorySegment wide) {
    return toJavaString(wide, StandardCharsets.UTF_16LE);
}

public static String toJavaString(MemorySegment segment, Charset charset) {
    // JDK Panama only handles UTF-8; it does a strlen() scan for 0 in the segment,
    // which is valid as all code points of 2 and 3 bytes lead with high bit "1".
    if (StandardCharsets.UTF_8 == charset)
        return segment.getUtf8String(0);
    // if (StandardCharsets.UTF_16LE == charset) {
    //     return Holger's answer
    // }
    // This conversion is convoluted: MemorySegment -> ByteBuffer -> CharBuffer -> String
    CharBuffer cb = charset.decode(segment.asByteBuffer());
    // cb.array() isn't valid unless cb.hasArray() is true, so use cb.get() to
    // find a null terminator character, ignoring it and the remaining characters
    final int max = cb.limit();
    int len = 0;
    while (len < max && cb.get(len) != '\0')
        len++;
    return cb.limit(len).toString();
}
Going the other way String -> null terminated Windows wide MemorySegment:
public static MemorySegment toCString(SegmentAllocator allocator, String s, Charset charset) {
    // "==" is OK here as StandardCharsets.UTF_8 == Charset.forName("UTF8")
    if (StandardCharsets.UTF_8 == charset)
        return allocator.allocateUtf8String(s);
    // else if (StandardCharsets.UTF_16LE == charset) {
    //     return Holger's answer
    // }
    // For multi-byte charsets it is safer to append a '\0' to the string and let the
    // charset encoder produce the appropriate null-termination bytes (typically 1, 2, or 4)
    return allocator.allocateArray(JAVA_BYTE, (s + "\0").getBytes(charset));
}

/** Convert a Java String to Windows wide string format. */
public static MemorySegment toWideString(String s, SegmentAllocator allocator) {
    return toCString(allocator, s, StandardCharsets.UTF_16LE);
}
Like you, I'd also like to know if there are better approaches than the above.
I'm currently trying to move some encoding code over from Java to NodeJS.
At the moment the Java code is as follows:
public static final char[] chars = "0123456789abcdef".toCharArray();

public static String sha1Digest(String str) {
    try {
        MessageDigest instance = MessageDigest.getInstance("SHA-1");
        instance.reset();
        instance.update(str.getBytes(StandardCharsets.UTF_8));
        return lastEncode(instance.digest());
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException(e);
    }
}

public static String lastEncode(byte[] bArr) {
    StringBuilder encoded = new StringBuilder(bArr.length * 2);
    for (byte b : bArr) {
        encoded.append(chars[(b >> 4) & 15]);
        encoded.append(chars[b & 15]);
    }
    return encoded.toString();
}
The initial parameter passed to the sha1Digest function is a string that consists of a URL appended with a secret key.
Currently, I'm trying to transfer the code over to NodeJS, where I have this (for now):
async function sha1Digest(str) {
try {
const sha1 = crypto.createHmac("SHA1");
const hmac = sha1.update(new Buffer(str, 'utf-8'));
return encoder(hmac.digest());
} catch (e) {
console.dir(e);
}
}
async function lastEncode(bArr) {
let chars = "0123456789abcdef".split('')
let sb = '';
for (b in bArr) {
sb = sb + (chars[(b >> 4) & 15]);
sb = sb + (chars[b & 15]);
}
return sb;
}
Sadly, though, I have no understanding of what the part in the for loop in lastEncode does.
Is anybody able to help me out with this, and also verify that the sha1Digest function seems correct in the NodeJS version?
Much appreciated!
lastEncode turns a byte array into hex nibbles. It turns the array: new byte[] {10, 16, (byte) 255} into the string "0a10ff". (0a is hex notation for 10, ff is hex notation for 255, etc - if this sounds like gobbledygook to you, the web has many tutorials on hexadecimal :P).
Your JavaScript translation messes up because for (b in bArr) iterates over the array's indices (as strings), not its byte values. More generally, to do that 'bytes to nibbles' operation from before, see this SO answer.
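For reference, since JDK 17 the Java side can do the same bytes-to-hex conversion with java.util.HexFormat, which is handy for generating expected strings to test the port against:

import java.util.HexFormat;

public class HexDemo {
    public static void main(String[] args) {
        byte[] bytes = { 10, 16, (byte) 255 };
        // formatHex emits two lowercase hex characters per byte
        System.out.println(HexFormat.of().formatHex(bytes)); // prints 0a10ff
    }
}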
Just test the lastEncode function by itself. Run it in java, then run your javascript port, and ensure the exact same string is produced in both variants. Only then, move on to the hashing part.
NB: To be clear, this protocol is rather idiotic - you can just hash the byte array itself, there is no need to waste a ton of time turning that into hex nibbles (which is always exactly 2x as large as the input) and then hashing that. But, presumably, you can't mess with the protocol at this point. But if you can, change it. It'll be faster, simpler to explain, and less code. Win-win-win.
EDIT: NB: You are also using a different hash algorithm on the JavaScript side (HMAC-SHA1 is not the same as plain SHA-1, and createHmac requires a key argument).
Native code: writing the number 27 using fwrite().
#include <errno.h>
#include <stdio.h>

int main()
{
    int a = 27;
    FILE *fp = fopen("/data/tmp.log", "w");
    if (!fp)
        return -errno;
    fwrite(&a, 4, 1, fp);
    fclose(fp);
    return 0;
}
Reading back the data(27) using DataInputStream.readInt() :
public int readIntDataInputStream(void)
{
String filePath = "/data/tmp.log";
InputStream is = null;
DataInputStream dis = null;
int k;
is = new FileInputStream(filePath);
dis = new DataInputStream(is);
k = dis.readInt();
Log.i(TAG, "Size : " + k);
return 0;
}
Output:
Size : 452984832
That in hex is 0x1b000000.
0x1b is 27, but readInt() reads the data as big-endian while my native code writes it as little-endian. So instead of 0x0000001b I get 0x1b000000.
Is my understanding correct? Has anyone come across this problem before?
From the Javadoc for readInt():
This method is suitable for reading bytes written by the writeInt method of interface DataOutput
If you want to read something written by a C program you'll have to do the byte swapping yourself, using the facilities in java.nio. I've never done this but I believe you would read the data into a ByteBuffer, set the buffer's order to ByteOrder.LITTLE_ENDIAN and then create an IntBuffer view over the ByteBuffer if you have an array of values, or just use ByteBuffer#getInt() for a single value.
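For example, a sketch along those lines, assuming the file holds a single little-endian 32-bit value as in the question:

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReadLittleEndianInt {
    public static void main(String[] args) throws IOException {
        try (FileInputStream fis = new FileInputStream("/data/tmp.log")) {
            byte[] raw = new byte[4];
            if (fis.read(raw) != 4)
                throw new IOException("short read");
            // Wrap the bytes and read them back in little-endian order
            int value = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
            System.out.println("Size : " + value); // prints 27
        }
    }
}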
All that aside, I agree with @EJP that the external format for the data should be big-endian for greatest compatibility.
There are multiple issues in your code:
You assume that the size of int is 4, which is not necessarily true; since you want to deal with 32-bit ints, you should use int32_t or uint32_t.
You must open the file in binary mode to write binary data reliably. The above code would fail on Windows for less trivial output. Use fopen("/data/tmp.log", "wb").
You must deal with endianness. You are using the file to exchange data between different platforms that may have different native endianness and/or endian-specific APIs. Java's DataInput/DataOutput use big-endian, aka network byte order, so you should convert the values on the C side with the htonl() utility function. This is unlikely to have a significant impact on performance, as the function is usually expanded inline, possibly as a single instruction, and most of the time will be spent waiting for I/O anyway.
Here is a modified version of the code:
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t a = htonl(27);
    FILE *fp = fopen("/data/tmp.log", "wb");
    if (!fp) {
        return errno;
    }
    fwrite(&a, sizeof a, 1, fp);
    fclose(fp);
    return 0;
}
I'm facing a curious problem. Some code is better than a long story:
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
buffer.write(...); // I write byte[] data
// In debugger I can see that buffer's count = 449597
String szData = buffer.toString();
int iSizeData = buffer.size();
// But here, szData's count = 240368
// & iSizeData = 449597
So my question is: why doesn't szData contain all the buffer's data? (Only one thread runs this code.) After that kind of operation, I don't want szData.charAt(iSizeData - 1) to crash!
EDIT: szData.getBytes().length = 450566. There is an encoding problem, I think. Would it be better to use a byte[] instead of a String after all?
In Java, char ≠ byte: depending on the character encoding used, a single char may be encoded as anything up to 4 bytes, and several bytes may decode to a single char. You work either with bytes (binary data) or with characters (strings); you cannot (easily) switch between them.
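A small sketch of the mismatch, and of the lossless alternative (toByteArray()):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BufferDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write("\u00e9".getBytes(StandardCharsets.UTF_8)); // "é": 1 char, 2 bytes in UTF-8
        System.out.println(buffer.size());                       // 2 (bytes)
        System.out.println(buffer.toString("UTF-8").length());   // 1 (chars)
        byte[] copy = buffer.toByteArray();                      // lossless binary copy
        System.out.println(copy.length);                         // 2
    }
}

So if the data is binary rather than text, keep it in a byte[] via toByteArray() instead of converting it to a String.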
For String operations like strncasecmp in C, use the methods of the String class, e.g. String.compareToIgnoreCase(String str). Also have a look at the StringUtils class from the Apache Commons Lang library.
I'm writing a code generator that is replaying events recorded during a packet capture.
The JVM is pretty limited, it turns out: a single method's bytecode can't exceed 64KB. So I added all kinds of trickery to make my code generator split up Java methods.
But now I have a new problem. I was taking a number of byte[] arrays and making them static variables in my class, e.g.:
public class myclass {
    private static byte[] byteArray = { 0x3c, 0x3f, ...
        ...
    };
    private static byte[] byteArray2 = { 0x1a, 0x20, ...
        ...
    };
    ...
    private static byte[] byteArray_n = { 0x0a, 0x0d, ...
        ...
    };
}
Now I get the error: "The code for the static initializer is exceeding the 65535 bytes limit".
I DO NOT WANT TO HAVE AN EXTERNAL FILE AND READ IN THE DATA FROM THERE. I WANT TO USE CODE GENERATED IN A SINGLE FILE.
What can I do? Can I declare the arrays outside the class? Or should I be using a string with unicode for the values 128-255 (e.g. \u009c instead of (byte)0x9c)? Or am I the only person in the world right now that wants to use statically initialised data?
UPDATE
The technique I'm now using is auto-creation of functions like the following:
private byte[] byteArray_6() {
    String localString = "\u00ff\u00d8\u00ff\u00e0\u0000\u0010JFIF\u0000" +
        "(0%()(\u00ff\u00db\u0000C\u0001\u0007\u0007\u0007\n\u0008\n\u0013\n" +
        "\u0000\u00b5\u0010\u0000\u0002\u0001\u0003\u0003\u0002\u0004\u0003";
    byte[] localBuff = new byte[localString.length()];
    for (int localInt = 0; localInt < localString.length(); localInt++) {
        localBuff[localInt] = (byte) localString.charAt(localInt);
    }
    return localBuff;
}
Note: Java keeps on surprising. You'd think you could just encode every value in the range 0-255 as \u00XX (where XX is the 2-character hex representation). But you'd be wrong. Unicode escapes are processed before the source is tokenized, so the compiler treats \u000A as a literal line break in your code, which breaks the compilation of your string literal. Your strings can be littered with Unicode escapes, but you'll have to use "\n" and "\r" instead of \u000a and \u000d respectively. And it doesn't hurt to put printable characters as they are in the strings instead of the 6-character Unicode escape representation.
Generally, you would put the data in a literal String and then have a method which decodes that to a byte[]. A plain getBytes()-style conversion is of limited use here, as UTF-8 won't produce all possible byte sequences, and some values don't appear at all.
This technique is quite popular when trying to produce small object code. Removing huge sequences of array-initialisation code will also help start-up time.
Off the top of my head:
public static byte[] toBytes(String str) {
char[] src = str.toCharArray();
int len = src.length;
byte[] buff = new byte[len];
for (int i=0; i<len; ++i) {
buff[i] = (byte)src[i];
}
return buff;
}
More compact schemes are available. For instance, you could limit string character contents to [1, 127] (in class files, 0 is encoded in a non-normalised two-byte form, for really bad reasons). Or something more complicated. I believe JDK 8 will have a public API for Base64 decoding, which isn't too bad and is nicely standardised.
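With that API (java.util.Base64 in JDK 8), the generated code could shrink to something like this sketch (class and field names are just examples; the literal below decodes to the bytes of <?xml version="1.0"?>):

import java.util.Base64;

public class GeneratedData {
    // The static initializer stays tiny regardless of the array size:
    // one decode() call on a string constant, not 64KB of bytecode.
    private static final byte[] byteArray =
            Base64.getDecoder().decode("PD94bWwgdmVyc2lvbj0iMS4wIj8+");
}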
Declare an ArrayList and populate it in a static initializer block.
Maybe you can use nested classes for storing the static arrays.
This approach is not the best in terms of performance, but I think you could get it working with minimal changes to your code.
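A sketch of that idea (names are just examples): each nested class gets its own static initializer, so each one has its own 64KB bytecode budget, and an array is only initialized when its holder class is first referenced:

public class GeneratedArrays {
    static final class Part1 {
        static final byte[] DATA = { 0x3c, 0x3f /* ... up to ~64KB of initializer bytecode ... */ };
    }

    static final class Part2 {
        static final byte[] DATA = { 0x1a, 0x20 /* ... */ };
    }

    // Touching Part1.DATA triggers only Part1's initializer, not Part2's.
    public static byte[] firstArray() {
        return Part1.DATA;
    }
}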