I am trying to understand the implications of using an MD5 Hash as Cassandra Key, in terms of "memory/storage consumption":
MD5 hash of my content (in Java) = byte[] is 16 bytes long. (The 16 bytes is from Wikipedia for generic MD5; I am not sure if the Java implementation also returns 16 bytes.)
Hex-encode this value to be able to print it in a human-readable format => 1 byte becomes 2 hex digits.
I have to represent every hex digit as a "character" in Java => result = "two string character values" (for example, "FF" is a string of length/size = 2).
Java uses UTF-16 => so every "string character" is encoded with two bytes. "FF" would require 2 x 2 = 4 bytes?
Conclusion => The MD5 hash in byte format is 16 bytes, but represented as a Java hex UTF-16 string it consumes 16 x 2 x 2 = 64 bytes (in memory)?! Is this correct?
What is the storage consumption in Cassandra, using this as a row key?
If I had directly used the byte array from the hash function, I would assume it consumes 16 bytes in Cassandra?
But if I use the hex-string representation (as noted above), can Cassandra "compress" it to 16 bytes, or will it also take 64 bytes in Cassandra? I assume 64 bytes in Cassandra, is this correct?
What kind of keys do you use? Do you use the output of a hash function directly, or do you first encode it into a hex string and then use the string?
(In MySQL, whenever I used a hash key, I always used its hex-string representation, so it is directly readable in the MySQL tools and in the whole application. But I now realize it wastes storage???)
Maybe my thinking is completely incorrect; then it would be kind of you to explain where I am wrong.
Thanks very much!
Jens
Correct on both counts: byte[] would be 16 bytes, utf16-as-hex would be 64.
In 0.8, Cassandra has key metadata so you can tell it "this key is a byte[]" and it will display in hex in the cli.
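For illustration, a minimal sketch of the sizes involved (assuming java.security.MessageDigest for the MD5 step; the 64-byte figure ignores object headers and applies to JVMs that store strings as UTF-16 chars):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5KeySize {
    public static void main(String[] args) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest("some content".getBytes(StandardCharsets.UTF_8));
        System.out.println(digest.length);   // 16 -> raw MD5 is always 16 bytes, also in Java

        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));   // each byte becomes 2 hex characters
        }
        System.out.println(hex.length());    // 32 characters
        // As a Java String these 32 chars are UTF-16 code units => roughly 32 x 2 = 64 bytes
        // of character data in memory; stored as ASCII/UTF-8 text it would be 32 bytes.
    }
}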
I wrote an RSA encryption in Java. I am trying to turn the numbers that it outputs into text or characters. For example, if I feed it "Hello" I get:
23805663430659911910
However, online RSA encryptions return something to the effect of this:
GVom5zCerZ+dmOCE7YAp0F+N3L26L
I would just like to know how to convert my numbers into something similar. The number returned by my system is a BigInteger. This is what I've tried so far:
RSA rsa = new RSA("Hello");
BigInteger cypher_number = rsa.encrypt(); // 23805663430659911910
byte[] cypher_bytes = cypher_number.toByteArray(); // [B#368102c8
String cypher_text = new String(cypher_bytes); // J^��*���
// Now even though cypher_text is J^��*��� I wouldn't care as long as I can turn it back.
byte[] plain_bytes = cypher_text.getBytes(); // [B#6996db8 | Not the same as cypher_bytes, but let's keep going.
BigInteger plain_number = new BigInteger(plain_bytes); // 28779359581043512470254837759607478877667261
// plain_number has more than doubled in size compared to cypher_number and won't decrypt properly.
Using bytes is the only way I can think of. Can someone please help me understand what I'm supposed to be doing, or if it's even possible?
This is generally a 2-step process:
convert to binary encoding of the number;
convert the binary encoding to a text-based encoding.
For both steps there are multiple schemes possible.
For binary encoding: the PKCS#1 specifications have always included one that converts the number to a statically sized integer. To be precise, it describes converting the number into a statically sized, unsigned, big-endian octet string. An octet string is nothing but a byte array.
Now, BigInteger.toByteArray returns a dynamically sized, signed, big-endian octet string. So you need to implement the possible resizing and removal of the initial 00 sign byte in a separate method, which I have at my other post here. Fortunately, going back to a number is much easier, as the Java implementation provides a BigInteger(int signum, byte[] magnitude) constructor that reads in an unsigned number and skips leading zero bytes.
Having a standardized and statically sized octet string can be terribly useful, so I would not go for any other scheme.
This leaves the conversion to and from text. For that you can (indeed) use the java.util.Base64 class, which doesn't need much explaining. The only note I must make is that some of its methods produce an ASCII byte[], so you should use encodeToString(byte[] src) instead.
Another option would be hexadecimals, but since Java didn't ship a hex encoder for byte arrays in the base classes until java.util.HexFormat arrived in Java 17, I'd go for Base64 instead.
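As an illustration of those two steps, here is a minimal sketch (the fixed length of 16 octets and the helper name toFixedLength are arbitrary choices for the example, not part of PKCS#1):

import java.math.BigInteger;
import java.util.Arrays;
import java.util.Base64;

public class NumberToTextSketch {
    // Statically sized, unsigned, big-endian octet string (assumes n fits in `length` octets).
    static byte[] toFixedLength(BigInteger n, int length) {
        byte[] signed = n.toByteArray();                          // dynamically sized, signed
        if (signed.length == length + 1 && signed[0] == 0) {
            return Arrays.copyOfRange(signed, 1, signed.length);  // drop leading 00 sign byte
        }
        byte[] out = new byte[length];                            // left-pad with zero octets
        System.arraycopy(signed, 0, out, length - signed.length, signed.length);
        return out;
    }

    public static void main(String[] args) {
        BigInteger cipher = new BigInteger("23805663430659911910");
        String text = Base64.getEncoder().encodeToString(toFixedLength(cipher, 16));
        BigInteger back = new BigInteger(1, Base64.getDecoder().decode(text));
        System.out.println(cipher.equals(back));                  // true
    }
}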
I have found the answer. In case you've found this looking for the answer, you just need to encode the numbers into Base64.
The following code converts the number into a dynamically sized, signed, big endian encoded integer, and then converts it back into a number using the reverse process.
import java.math.BigInteger;
import java.util.Base64;

// Encode
BigInteger numbers = new BigInteger("5109763");
byte[] bytes = Base64.getEncoder().encode(numbers.toByteArray());
String encoded = new String(bytes); // Encoded value

// Decode
byte[] decoded_bytes = Base64.getDecoder().decode(encoded.getBytes());
BigInteger numbers_again = new BigInteger(decoded_bytes); // Original numbers
As we can see from the following questions:
Java HmacSHA256 with key
Java vs. Golang for HOTP (rfc-4226)
Java doesn't really play nicely when using a key in a TOTP / HOTP / HmacSHA256 use case. My analysis is that the following cause trouble:
String.getBytes will (of course) give negative byte values for characters with a character value > 127;
javax.crypto.Mac and javax.crypto.spec.SecretKeySpec both externally and internally use byte[] for accepting and transforming the key.
We have acquired a number of Feitian C-200 Single Button OTP devices, and they come with a hexadecimal string secret whose decoded bytes include values > 127.
We have successfully created a PoC in Ruby for these tokens, which works flawlessly. Since we want to integrate these in Keycloak, we need to find a Java solution.
Since every implementation of TOTP / HOTP / HmacSHA256 we have seen makes use of the javax.crypto library and byte[], we fear we have to rewrite all the classes used, but with int instead, in order to support this scenario.
Q: Is there another way? How can we use secrets in a HmacSHA256 calculation in Java of which the bytes have values > 127 without having to rewrite everything?
Update
I was looking in the wrong direction. My problem was that the key was represented as a String (UTF-16 in Java), which contained Unicode characters that were exploded into two bytes by getBytes() before being passed into the SecretKeySpec.
Forcing StandardCharsets.ISO_8859_1 on this conversion fixes the problem.
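For what it's worth, a tiny sketch of the effect described in the update (assuming the platform default charset is UTF-8, which is what makes getBytes() expand such key bytes):

import java.nio.charset.StandardCharsets;

public class KeyByteExplosion {
    public static void main(String[] args) {
        String key = "\u00E1";   // a key "character" with a value > 127
        System.out.println(key.getBytes(StandardCharsets.UTF_8).length);       // 2 -> corrupted key material
        System.out.println(key.getBytes(StandardCharsets.ISO_8859_1).length);  // 1 -> the original byte 0xE1
    }
}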
Signed vs. unsigned is a presentation issue that's mainly relevant to humans only. The computer doesn't know or care whether 0xFF means -1 or 255 to you. So no, you don't need to use ints, using byte[] works just fine.
This doesn't mean that you can't break things, since some operations work based on default signed variable types. For example:
byte b = (byte)255; // b is -1 or 255, depending on how you interpret it
int i = b; // i is -1, or 4294967295 (2³² - 1) if you interpret it as unsigned, instead of 255
int u = b & 0xFF; // u is 255
It seems to throw many people off that Java has only signed primitives (boolean and char notwithstanding). However, Java is perfectly capable of performing cryptographic operations, so all these questions where something is "impossible" are just user errors. Which is not something you want when writing security-sensitive code.
Don't be afraid of Java :) I've tested dozens of tokens from different vendors, and everything is fine with Java; you just need to pick the correct converter.
It's a common issue to get bytes from a String via getBytes() instead of using a proper converter. The file you have from your vendor represents secret keys in hex format, so just google 'java hex string to byte array' and choose a solution that works for you.
Hex, Base32 and Base64 are just representations, and you can easily convert from one to another.
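For example, a minimal sketch along those lines (the hexToBytes helper is a hypothetical hand-rolled converter and the seed value is made up):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacFromHexSeed {
    // Hypothetical hex-string-to-byte-array converter; any correct implementation will do.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String seedHex = "1234567890ABCDEF1234567890ABCDEF";   // made-up example seed
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(hexToBytes(seedHex), "HmacSHA256"));
        byte[] tag = mac.doFinal(new byte[8]);                  // e.g. an 8-byte HOTP counter block
        System.out.println(tag.length);                         // 32, regardless of "negative" key bytes
    }
}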
I've run into exactly the same issue (some years later): we got Feitian devices and had to set up their server-side code.
None of the available implementations worked with them (neither PHP nor Java).
Solution: Feitian devices come with seeds in hexadecimal. First you have to decode the seed into raw binary (e.g. in PHP using hex2bin()). That data is the correct input for the TOTP/HOTP functions.
The Java equivalent of hex2bin() is a bit tricky, and its solution is clearly written in the OP's question.
(Long story short: you have to interpret the result of hex2bin() with StandardCharsets.ISO_8859_1, otherwise some bytes will be interpreted as 2-byte UTF-16 chars, which produces a different passcode in the end.)
String hex = "1234567890ABCDEF"; // original seed from Feitian
String secretKey = new String(hex2bin(hex), StandardCharsets.ISO_8859_1);
Key key = new SecretKeySpec(secretKey.getBytes(StandardCharsets.ISO_8859_1), "RAW");
// or without String representation:
Key key = new SecretKeySpec(hex2bin(hex), "RAW");
I want to convert a String of any length to byte32 in Java.
Code
String s="9c46267273a4999031c1d0f7e40b2a59233ce59427c4b9678d6c3a4de49b6052e71f6325296c4bddf71ea9e00da4e88c4d4fcbf241859d6aeb41e1714a0e";
//Convert into byte32
From the comments it became clear that you want to reduce the storage space of that string to 32 bytes.
The given string can easily be compressed from the 124 bytes to 62 bytes by doing a hexadecimal conversion.
However, there is no algorithm and there will not be an algorithm that can compress any data to 32 bytes. Imagine that would be possible: it would have been implemented and you would be able to get ZIP files of just 32 bytes for any file you compress.
So, unfortunately, the answer is: it's not possible.
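A small sketch of that hexadecimal conversion (using java.util.HexFormat, which exists from Java 17 onward; the shorter hex string here is just a stand-in for the 124-character value in the question):

import java.util.HexFormat;

public class HexHalving {
    public static void main(String[] args) {
        String s = "9c46267273a4999031c1d0f7e40b2a59";   // example: 32 hex characters
        byte[] raw = HexFormat.of().parseHex(s);          // requires Java 17+
        System.out.println(s.length());                   // 32 characters
        System.out.println(raw.length);                   // 16 bytes: half the size, but not bounded at 32
    }
}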
You cannot convert a string of arbitrary length to a byte array of length 32.
Java uses UTF-16 as its string encoding, so in order to store 100% of the string, 1:1, as a fixed-length 32-byte array, you would at first glance be limited to 16 characters.
If you are willing to live with the limitation of 16 characters, byte[] bytes = s.getBytes(); should give you a variable-length byte array, but it's best to specify an explicit encoding, e.g. byte[] array2 = str.getBytes("UTF-16"); (note that the "UTF-16" charset also prepends a 2-byte byte-order mark; "UTF-16BE" avoids that).
This doesn't completely solve your problem. You will now likely have to check that the byte array doesn't exceed 32 bytes, and come up with strategies for padding and possibly null termination (which may potentially eat into your character budget); a sketch follows this answer.
Now, if you don't need the entire UTF-16 string space that Java uses for strings by default, you can get away with longer strings, by using other encodings.
If this is to be used with some other standard (I see references to Ethereum being thrown around), then you will need to follow their standards.
Unless you are writing your own library for dealing with it directly, I highly recommend using a library that already exists, and appears to be well tested, and used.
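If you do go the fixed-width route, here is a hedged sketch of one possible padding strategy (the helper toBytes32 and the zero-padding scheme are just illustrative choices, not a standard format):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedWidthBytes {
    // Hypothetical helper: UTF-8 encode and zero-pad to exactly 32 bytes, rejecting strings that don't fit.
    static byte[] toBytes32(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > 32) {
            throw new IllegalArgumentException("does not fit in 32 bytes");
        }
        return Arrays.copyOf(encoded, 32);   // right-padded with zero bytes
    }

    public static void main(String[] args) {
        System.out.println(toBytes32("hello").length);   // 32
    }
}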
You can achieve this with the following:
byte[] bytes = s.getBytes();
I am using the AES encryption algorithm in Java to encrypt my database values. My encryption function returns the encrypted value as a String, but columns of type "Int" fail to store such string values, which is quite logical. Is there a way to encrypt the integers as integers (numerical values)? Thank you.
Plain AES returns an array of bytes. You can store this as an array of bytes, a Base64 text string or as a BigInteger:
BigInteger myBigInt = new BigInteger(AESByteArray);
It is very unlikely that the 128 bit, or larger, AES result will fit into a 32 bit Java int.
If you want 32 bit input and 32 bit output, so everything fits into a Java int, then either write your own 32 bit Feistel cipher, or use Hasty Pudding Cipher, which can be set for any bit size you require.
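A rough sketch of those storage options (AES/ECB with NoPadding on a hand-padded block is used purely to keep the example short; it is not a recommended mode, and the key handling is a placeholder):

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Base64;

public class AesIntStorage {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");   // illustration only
        cipher.init(Cipher.ENCRYPT_MODE, key);

        int value = 42;
        byte[] block = ByteBuffer.allocate(16).putInt(12, value).array();  // int in the last 4 of 16 bytes
        byte[] ct = cipher.doFinal(block);                                  // always 16 bytes of output

        System.out.println(Base64.getEncoder().encodeToString(ct));  // fits a text column
        System.out.println(new BigInteger(1, ct));                   // numeric, but up to 39 digits - not an int
    }
}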
Encrypting an integer into an integer is FPE (format-preserving encryption). FPE does not change the data type or data length.
Here is the reason why databases implement FPE only for character data, never for int.
AES-128 encrypts a 128-bit block, which is 16 bytes.
If you want to encrypt a 32- or 64-bit integer (4- or 8-byte values), you still have to encrypt a 16-byte block. This problem can be solved by adding 12 (or 8) bytes to the int32 or int64 value. That creates an issue: if the added bytes are always 0, you create a huge weakness in the encryption, as your data set is severely limited; it can be used for a brute-force attack on AES, etc. In turn, this can be solved by filling the added 8 or 12 bytes with a cryptographically strong random number (that also creates a weakness, as most likely your random generator is not strong enough). When decrypting, you can purge the extra added bytes and extract only the 4 or 8 bytes out of the 16 bytes.
Still, life is not perfect. AES encryption does not change the size of the block; it always produces 16 bytes. You can encrypt your int into 16 bytes, but the database can store only 8 bytes for an int.
Unless you store the data in a binary(16) column. But that is not an integer, and you are asking for an integer.
In theory, numeric(38) takes 16 bytes. In some databases it is possible to set those 16 bytes to an arbitrary value and extract them later, but I have not seen that implemented.
You can always encode your string in an integer; however, it could be a large integer.
If you can't afford a large integer, you can encode it in multiple small integers.
If you can afford neither a large integer nor multiple integers, maybe you can't do it well anyway; using a block cipher in ECB mode is almost always a bad idea.
Try converting the output of the encryption from string to binary, and then from binary to a decimal integer.
I need to be able to convert an int into a string which represents a series of bytes, and back. To do this, I came up with this code:
Int -> Byte[] -> String
new String(ByteBuffer.allocate(5).putInt(num).array())
String -> Byte[] -> Int
ByteBuffer.allocate(4).put(team.getBytes()).getInt(0)
One of my test cases is the number 4231. When viewed as a string, none of the characters are visible, but that's not completely unusual, and when I invoke its .length() method, it returns 4. But when I use .getBytes(), I get [0, 0, 16, -17, -65, -67], which causes a BufferOverflowException. Can someone explain this result to me?
Without knowing the platform default encoding of your machine, it's slightly hard to say - and you should avoid calling String.getBytes without specifying an encoding, IMO.
However, basically a String represents a sequence of characters, encoded as a sequence of UTF-16 code units. Not every character is representable in one byte in many encodings, and you certainly shouldn't assume it is. (You shouldn't even assume there's one character per char, due to surrogate pairs used to represent non-BMP characters.)
Fundamentally, you shouldn't treat a string like this - if you want to encode non-text data in a string, use hex or base64 to encode the binary data, and then decode it appropriately. Otherwise you can easily get invalid strings, and lose data - and more importantly, you're simply not treating the type for the purpose it was designed.
When you convert a byte[] into a String, you're saying "This is the binary representation of some text, in a particular encoding" (either explicitly or using the platform default). That's simply not the case here - there's no text to start with, just a number... the binary data isn't encoded text, it's an encoded integer.
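A hedged sketch of the Base64 route suggested above, applied to the int from the question:

import java.nio.ByteBuffer;
import java.util.Base64;

public class IntRoundTrip {
    public static void main(String[] args) {
        int num = 4231;
        // Encode: int -> 4 bytes -> Base64 text (safe to store or send as a String).
        String encoded = Base64.getEncoder()
                .encodeToString(ByteBuffer.allocate(4).putInt(num).array());
        // Decode: Base64 text -> 4 bytes -> int.
        int decoded = ByteBuffer.wrap(Base64.getDecoder().decode(encoded)).getInt();
        System.out.println(decoded);   // 4231
    }
}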
First, the integer is converted to 4 bytes. 4231 in hex is 00001087, so putInt() produces [ 0, 0, 16, -121 ]: the zeroes are obviously zero, 10 (hex) is 16, and 87 (hex) is -121 as a signed byte (0x87 is 135, and 135 - 256 = -121).
So the only real mystery is where the -17, -65, -67 came from. They are produced by the String round trip: 0x87 is not valid in the platform encoding (apparently UTF-8 here), so when the byte[] was turned into a String that byte became the replacement character U+FFFD, and getBytes() encodes it as EF BF BD, i.e. [ -17, -65, -67 ]. That is also why six bytes come back instead of four, which no longer fits the 4-byte buffer.
If this kind of stuff isn't obvious to you, you probably shouldn't mess with native encodings and should instead separate and combine the numbers yourself using things like multiplication and division. (Just use decimal numbers and strings.)
What you are trying to do is view bytes as characters. That concept became invalid with the introduction of multi-byte characters in operating systems and languages.
In Java, Strings are composed of characters, not bytes. A common mistake is to assume that a conversion byte[] -> String -> byte[] using new String(byte[]) / getBytes() will yield the original bytes. That's simply not true: depending on the encoding, byte[] -> String may already lose information (if the byte[] contains values invalid for that encoding). Likewise, not every encoding can encode every possible character.
So you are chaining two possibly lossy operations and wonder why information is lost.
The proper way to encode the information contained in the int is to select a specific representation for the int (e.g. decimal or hexadecimal) and encode/decode that.
Try this for encoding/decoding:
String hex = Integer.toString(i, 16);
int decoded = Integer.parseInt(hex, 16);