Adler32 Repeating Very Quickly

I'm using the adler32 checksum algorithm to generate a number from a database id. So, when I insert a row into the database, I take the identity of that row and use it to create the checksum. The problem that I'm running into is that I just generated a repeat checksum after only 207 inserts into the database. This is much much faster than I expected. Here is my code:
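import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Checksum the decimal string form of the database id
String dbIdStr = Long.toString(dbId);
byte[] bytes = dbIdStr.getBytes();
Checksum checksum = new Adler32();
checksum.update(bytes, 0, bytes.length);
long result = checksum.getValue();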
Is there something wrong with what/how I'm doing? Should I be using a different method to create unique strings? I'm doing this because I don't want to use the db id in a url... a change to the structure of the db will break all the links out there in the world.
Thanks!

You should not be using Adler-32 as a hash code generator. That's not what it's for. You should use an algorithm that has good hash properties, which, among other things, minimizes the probability of collisions.
You can simply use Java's hashCode method (on any object). For a String, the hash code is the sum of the string's character values multiplied by successive powers of 31. There can be collisions with very short strings, but it's not a horrible algorithm. It's definitely a lot better than Adler-32 as a hash algorithm.
The suggestions to use a cryptographically secure hash function (like SHA-256) are certainly overkill for your application, both in terms of execution time and hash code size. You should try Java's hashCode and see how many collisions you get. If collisions seem much more frequent than you'd expect for a 2^-n probability (where n is the number of bits in the hash code), then you can override it with a better one. You can find a link here for decent Java hash functions.
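To see why Adler-32 repeats so soon on short inputs, here is a small sketch that counts collisions over sequential ids, hashing the decimal string as in the question. The class and method names are made up for illustration. Adler-32 is built from two running sums of the bytes, so short strings with the right digit differences (for example "120" and "201") end up with identical sums.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.Adler32;

public class AdlerCollisionDemo {
    // Adler-32 of the decimal string form of an id, as in the question.
    public static long adler32(String s) {
        Adler32 a = new Adler32();
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        a.update(bytes, 0, bytes.length);
        return a.getValue();
    }

    // Counts ids in 1..maxId whose Adler-32 value was already produced by a smaller id.
    public static int adlerCollisions(int maxId) {
        Set<Long> seen = new HashSet<>();
        int collisions = 0;
        for (int id = 1; id <= maxId; id++) {
            if (!seen.add(adler32(Integer.toString(id)))) {
                collisions++;
            }
        }
        return collisions;
    }

    // Same count, but using String.hashCode() instead of Adler-32.
    public static int hashCodeCollisions(int maxId) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int id = 1; id <= maxId; id++) {
            if (!seen.add(Integer.toString(id).hashCode())) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(adler32("120") == adler32("201")); // prints true
        System.out.println(adlerCollisions(1000));
        System.out.println(hashCodeCollisions(1000));         // prints 0
    }
}
```

With ids starting at 1, the first repeat shows up at id 201, which is in the same range as the 207 inserts reported in the question.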

Try and use a secure hash function like SHA-256. If you ever find a collision for any data that is not binary equal, you'll get $1000 on your bank account, with compliments. Offer ends if/when SHA-2 is cracked and you enter a collision deliberately. That said, the output is 32 bytes instead of 32 bits.
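A minimal sketch of the SHA-256 route, assuming the id is hashed in its decimal string form and the result is hex encoded for use in a URL. The class and method names are illustrative only. Note that hashing a bare sequential id does not hide it from anyone willing to hash integers themselves, and truncating the output to make it shorter brings the collision risk back.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Token {
    // Returns the SHA-256 digest of the input as 64 lower-case hex characters.
    public static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long dbId = 207L; // example id
        System.out.println(sha256Hex(Long.toString(dbId)));
    }
}
```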

Related

Generate small UID with uniqueness

I need to generate the UID (alphanumeric) for my use case but that should be a maximum of 7 characters long as we want UID to be random but manageable, like a PNR (CYB6KL) for example.
Now if I am not wrong, I can generate a random UID that is small, but uniqueness might be compromised because of collisions (birthday paradox), so for 32 bits, 50% collision probability would be around 77k UID generations.
So in essence, I need a way to generate UIDs that are:
Small (max 7 character)
Random
Unique
Don't require lookups for previous existence.
I will be storing this UID in a database column and it's imperative that the UID is unique. It will NOT be the table's primary key which right now is an autogenerated ID.
I am thinking of something along these lines, but I am not sure about uniqueness.
BigInteger big = new BigInteger(32, new SecureRandom());
return big.toString(32).toUpperCase();
Really appreciate any thoughts that might help on this. Generation must be unique.
Thanks in advance.
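The 77k figure in the question matches the usual birthday approximation n ≈ sqrt(2 · N · ln(1 / (1 - p))) for a space of N values and collision probability p. A small sketch to reproduce it (class and method names are made up):

```java
public class BirthdayBound {
    // Approximate number of uniformly random draws from 2^bits values
    // at which the chance of at least one collision reaches p.
    public static double drawsForCollisionProbability(int bits, double p) {
        double space = Math.pow(2, bits);
        return Math.sqrt(2.0 * space * Math.log(1.0 / (1.0 - p)));
    }

    public static void main(String[] args) {
        // Roughly 77 thousand draws for a 32-bit space at p = 0.5
        System.out.printf("%.0f%n", drawsForCollisionProbability(32, 0.5));
    }
}
```

So purely random 32-bit values cannot be unique without some check against what already exists; that follows from the arithmetic, not from the choice of generator.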

You can use a library like hashids for this purpose, which implements a bijective (reversible) translation that encodes a numeric value into a string code with a custom alphabet. This should do exactly what you want. If you need this to be traversal-secure, you should use some kind of SecureRandom as the source for the underlying numeric value. If not, you could even base this on the auto-increment value you already have. The benefit of reusing the primary key is that you can just translate the string code back and do a lookup by primary key.
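To illustrate the idea of a reversible code, here is a hand-rolled sketch. It is not the hashids library or its API, just plain base conversion with a custom alphabet. The alphabet is a hypothetical 32-symbol choice that leaves out 0, O, 1 and I. Unlike hashids there is no salt or shuffling, so consecutive ids give consecutive-looking codes.

```java
public class ShortCode {
    // Hypothetical alphabet: 32 symbols, avoiding easily confused characters.
    private static final String ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ";

    // Encodes a non-negative id as a base-32 string over ALPHABET.
    public static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        int base = ALPHABET.length();
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (id % base)));
            id /= base;
        } while (id > 0);
        return sb.reverse().toString();
    }

    // Decodes a string produced by encode back into the id.
    public static long decode(String code) {
        int base = ALPHABET.length();
        long id = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            id = id * base + digit;
        }
        return id;
    }

    public static void main(String[] args) {
        String code = encode(123456789L);
        System.out.println(code + " -> " + decode(code));
    }
}
```

Seven characters over a 32-symbol alphabet cover 32^7 = 2^35 distinct ids, and uniqueness comes for free because the mapping is one-to-one with the primary key.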

java - is there a way to confirm that a string is a sha256 hash?

I'd like to validate that a String is a sha256 representation of another without having to decrypt it. Is this possible?

Yes and no.
You can test that a string is hex very easily (a hex-encoded SHA-256 value is exactly 64 hex characters). You can then test that it contains a statistically sensible mix of digits and letters. That will rule out some common non-SHA-256 strings.
But if someone creates a random string designed to look like a sha256, I don't think it's possible to distinguish it from the real thing by any mathematical test. The algorithm is designed to be robust to that.
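The format check can be sketched like this, assuming the hash is hex encoded (a Base64 form would need a different pattern). The names are made up. It only tells you the string has the right shape, not that it is the hash of anything in particular.

```java
import java.util.regex.Pattern;

public class Sha256Format {
    // 64 hex characters = 256 bits.
    private static final Pattern HEX_64 = Pattern.compile("[0-9a-fA-F]{64}");

    public static boolean looksLikeSha256Hex(String s) {
        return s != null && HEX_64.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeSha256Hex("not a hash")); // prints false
    }
}
```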

A SHA-256 value is just a 256-bit (32-byte) value, which you usually represent as a String or as a byte[] in Java.
As a value per se it's meaningless: if you want to tell whether a specific String is a hash, then any 32-byte number is the hash of infinitely many unknown plaintexts. It's like asking "how do I know that a 32-byte number is a number?"; you see that you are going nowhere.
It's useful only when it's paired with a plaintext, so that you can compare it with the hash computed from the plaintext to verify they match.
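When you do have the plaintext, the check is simply "hash it and compare". A sketch, assuming the candidate is hex encoded and the plaintext was hashed as UTF-8 bytes (both are assumptions, as are the names):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashCheck {
    // Decodes a hex string (two characters per byte) into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // True if candidateHex is the SHA-256 digest of plainText.
    public static boolean isSha256Of(String plainText, String candidateHex) {
        try {
            byte[] actual = MessageDigest.getInstance("SHA-256")
                    .digest(plainText.getBytes(StandardCharsets.UTF_8));
            return MessageDigest.isEqual(actual, hexToBytes(candidateHex));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```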

I think what you could do is hash the other string and then compare the two hashes with each other.
No idea if this would help you, but I read that this is common practice when building rainbow tables for cracking passwords.
EDIT: Oh, I forgot: this is also the way to compare passwords in PHP when you log in to a webpage, IIRC. At least I had to do it like this for university.

How to decrypt a SHA-256 encrypted string?

I have a string that was salted, hashed with SHA-256, then base64 encoded. Is there a way to decode this string back to its original value?

SHA-256 is a cryptographic (one-way) hash function, so there is no direct way to decode it. The entire purpose of a cryptographic hash function is that you can't undo it.
One thing you can do is a brute-force strategy, where you guess what was hashed, then hash it with the same function and see if it matches. Unless the hashed data is very easy to guess, it could take a long time though.
You may find the question "Difference between hashing a password and encrypting it" interesting.
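The guess-and-check strategy can be sketched as follows. The exact scheme is an assumption here: the salt is known, it is prepended to the value, everything is UTF-8, and the digest is standard Base64. The real data may have been built differently, and the names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.Optional;

public class GuessAndCheck {
    // Assumed scheme: Base64(SHA-256(salt + value)) over UTF-8 bytes.
    public static String saltedSha256Base64(String salt, String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest((salt + value).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes each candidate the same way and returns the first one that matches the target.
    public static Optional<String> findMatch(String salt, String targetBase64, List<String> candidates) {
        for (String candidate : candidates) {
            if (saltedSha256Base64(salt, candidate).equals(targetBase64)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

This only succeeds if the original value is in the candidate list, which is why it is practical for weak passwords and hopeless for long random data.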

It should be noted that SHA-256 does not encrypt the content of your string; it instead generates a fixed-size hash, using your input string as a seed.
This being the case, I could feed in the content of an encyclopedia, which would easily be 100 MB of text, but the resulting hash would still be 256 bits in size.
It's impossible for you to reverse the hash to get that 100 MB of data back out of the fixed-size hash. The best you can do is try to guess/compute the seed data, hash it, and then see if the hash matches the hash you're trying to break.
If you could reverse the hash, you would have the greatest form of compression to date.

SHA* is a hash function. It creates a representation (hash) of the original data. This hash is never intended to be used to recreate the original data. Thus it's not encryption. Rather the same hash function can be used at 2 different locations on the same original data to see if the same hash is produced. This method is commonly used for password verification.

You've done the correct thing by using a salt aka SSHA.
SHA-1 and SHA-2 (e.g. SHA-256) by themselves, without a salt, are NOT considered secure for storing passwords anymore! Salting a SHA hash is called Salted SHA or SSHA.
Below is a simple example of how easy it is to de-hash SHA-1. The same can be done for SHA-2 without much effort as well.
Enter a password into this URL:
http://www.xorbin.com/tools/sha1-hash-calculator
Copy paste the hash into this URL:
https://hashes.com/en/decrypt/hash
Here's a page which de-hashes SHA-2. The way this page works is that somebody must have hashed your password before, otherwise it won't find it:
md5hashing dot net/hashing/sha256
Here's a page that claims to have complete SHA-2 tables available for download for a "donation" (I haven't tried it yet):
crackstation dot net/buy-crackstation-wordlist-password-cracking-dictionary.htm
Here's a good article that explains why you have to use SSHA over SHA:
crackstation dot net/hashing-security.htm

Store a number as ASCII text in Java?

It's probably a stupid question but here's the thing. I was reading this question:
Storing 1 million phone numbers
and the accepted answer was what I was thinking: using a trie. In the comments Matt Ball suggested:
I think storing the phone numbers as ASCII text and compressing is a very reasonable suggestion
Problem: how do I do that in Java? And does ASCII text mean String?

For in-memory storage as indicated in the question:
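import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Closing the writer finishes the GZIP stream; without it, baos holds incomplete data.
try (Writer out = new OutputStreamWriter(
        new GZIPOutputStream(baos), StandardCharsets.US_ASCII)) {
    for (String number : numbers) {
        out.write(number);
        out.write('\n');
    }
}
byte[] data = baos.toByteArray();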
But as Pete remarked: this may be good for memory efficiency, but you can't really do anything with the data afterwards, so it's not really very useful.

Yes, ASCII means Strings in this case. You can store compressed data in Java using the java.util.zip.GZIPOutputStream.

In answer to an implied, but different question;
Q: You have 1 billion phone numbers and you need to send these over a low bandwidth connection. You only need to send whether the phone number is in the collection or not. (No other information required)
A: This is the general approach
First sort the list if it's not sorted already.
From the lowest number, find regions of continuous numbers. Send the start of the region and the phones which are taken. This can be stored as a BitSet (1 bit per possible number). Send the phone number at the start and the BitSet whenever the gap is more than some threshold.
Write the stream to a compressed data set.
Test this to compare with a simple sending of all numbers.
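A rough sketch of one such region, assuming the numbers fit in a long and the region is small enough for int offsets. Splitting the sorted list into regions by a gap threshold is left out, and the class name is made up.

```java
import java.util.BitSet;

public class PhoneRange {
    private final long start;                  // first number in the region
    private final BitSet taken = new BitSet(); // bit i set => start + i is in the collection

    // Builds one region from an ascending, non-empty array of numbers.
    public PhoneRange(long[] sortedNumbers) {
        if (sortedNumbers.length == 0) {
            throw new IllegalArgumentException("need at least one number");
        }
        this.start = sortedNumbers[0];
        for (long n : sortedNumbers) {
            taken.set(Math.toIntExact(n - start));
        }
    }

    public boolean contains(long number) {
        long offset = number - start;
        return offset >= 0 && offset <= Integer.MAX_VALUE && taken.get((int) offset);
    }

    // The payload to send along with the start number.
    public byte[] toBytes() {
        return taken.toByteArray();
    }
}
```

For a dense region this costs about one bit per possible number, compared with several bytes per number when sending the digits as text.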
You can use Strings in a sorted TreeMap. One million numbers is not very much and will use about 64 MB. I don't see the need for a more complex solution.
The latest version of Java can store ASCII text efficiently by using a byte[] instead of a char[]; however, the overhead of your data structure is likely to be larger.
If you need to store phone numbers as keys, you could store them with the assumption that large ranges will be continuous. As such you could store them like
NavigableMap<String, PhoneDetails[]>
In this structure, the key would define the start of the range and you could have a phone details for each number. This could be not much bigger than the reference to the PhoneDetails (which is the minimum)
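A sketch of how a lookup in such a structure might work. Two things here are assumptions: the keys are Long rather than String so the offset into the range is a simple subtraction, and PhoneDetails is a placeholder type since the original only names it.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PhoneRangeIndex {
    // Placeholder value type for illustration.
    public static final class PhoneDetails {
        public final String owner;

        public PhoneDetails(String owner) {
            this.owner = owner;
        }
    }

    // Key: first number of a continuous range; value: details for start, start + 1, ...
    private final NavigableMap<Long, PhoneDetails[]> ranges = new TreeMap<>();

    public void addRange(long start, PhoneDetails[] details) {
        ranges.put(start, details);
    }

    // Finds the range starting at or below the number, then indexes into it.
    public PhoneDetails lookup(long number) {
        Map.Entry<Long, PhoneDetails[]> entry = ranges.floorEntry(number);
        if (entry == null) {
            return null;
        }
        long offset = number - entry.getKey();
        PhoneDetails[] details = entry.getValue();
        return offset < details.length ? details[(int) offset] : null;
    }
}
```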
BTW: You can invent very efficient structures if you don't need access to the data. If you never access the data, don't keep it in memory, in fact you can just discard it as it won't ever be needed.
A lot depends on what you want to do with the data and why you have it in memory at all.
You can use a DeflaterOutputStream wrapped around a ByteArrayOutputStream, which will be very small, but not very useful.
I suggest using DeflaterOutputStream as it's more lightweight/faster/smaller than GZIPOutputStream.
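A small sketch comparing the two, using DeflaterOutputStream and GZIPOutputStream from java.util.zip (the helper names are made up). Both use the same DEFLATE compression, so with default settings the size difference comes from GZIP's larger header and trailer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class CompressCompare {
    // Compresses with the zlib wrapper (small header, Adler-32 trailer).
    public static byte[] deflate(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Compresses with the GZIP wrapper (larger header, CRC-32 and length trailer).
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] numbers = "5550100\n5550101\n5550105\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(deflate(numbers).length + " vs " + gzip(numbers).length);
    }
}
```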

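Java Strings are not stored as UTF-8; internally they are UTF-16 (and since Java 9, strings containing only Latin-1 characters are stored one byte per character). To get ASCII bytes you have to specify the encoding when converting, for example with getBytes(StandardCharsets.US_ASCII).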

Making a line of code difficult to read

I'm writing a way of checking if a customer's serial number matches my hard-coded number. Is there a way of making this as hard to read as possible in case an undesirable gets their hands on the code?
I am working in java.
For instance (pseudo code)
if (x != y) jump out of code and return error
Cheers , apologies if this is a bit of an odd one

Relying on security through obscurity is always a bad idea. You don't need to avoid obscurity, but you should not rely solely on it.
Either encrypt your serials with a key you type in at startup of the service, or just specify the serials as hex or base64, not ASCII.

The normal way to do this would be to use a hash.
Create a hash of your serial code.
To validate the client serial, hash that using the same function.
If the hashes match, the serial was correct, even though the serial itself was not in the code.
By design, it's almost impossible to deduce the original code from the hash.
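The steps above can be sketched like this. The embedded constant is the SHA-256 of the placeholder serial "abc", chosen only for illustration; a real build would embed the hash of the real serial. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SerialCheck {
    // SHA-256 of the placeholder serial "abc", hex encoded. The serial itself is not in the code.
    private static final String EXPECTED_HASH_HEX =
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

    static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the entered serial with the same function and compares the hashes.
    public static boolean isValidSerial(String entered) {
        return MessageDigest.isEqual(
                sha256Hex(entered).getBytes(StandardCharsets.US_ASCII),
                EXPECTED_HASH_HEX.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Keep in mind that if serials are short or follow a predictable pattern, someone with the hash can still find the serial by trying every possibility.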

Making the code look complex to avoid being hacked never helps!

You can try SHA1 or some other one-way hash (MD5 is not so secure, but it's pretty good). Don't do this:
if (userPassword equals myHardCodedPassword)
Do this:
if (HASH(userPassword) equals myHardCodedPasswordHash)
So the code reader can only see a hashed value (which is very, very difficult to reverse).

Tangle the control structure of the released code?
e.g feed the numbers in at a random point in the code under a different variable and at some random point make them equal x and y?
http://en.wikipedia.org/wiki/Spaghetti_code

There is a wikipedia article on code obfuscation. Maybe the tricks there can help you =)

Instead of trying to make the code complex, you can implement other methods which will not expose your hard-coded serial number.
Try storing the hard-coded number at some permanent location as an encrypted byte array. That way it's not readable. For comparison, encrypt the client serial code with the same algorithm and compare.
