Why do I sometimes get different SHA256 hashes in Java and PHP?

So I have an odd little problem with the hashing function in PHP. It only happens some of the time, which is what is confusing me. Essentially, I have a Java app and a PHP page, both of which calculate the SHA256 of the same string. There haven't been any issues between the two, as they (generally) calculate the same hash. The one exception is that every once in a while, PHP's output is one character longer than Java's.
I have this code in PHP:
$token = $_GET["token"];
$token = hash("sha256", $token."<salt>");
echo "Your token is " . $token;
99% of the time, I get the right hash. But every once in a while, I get something like this (space added to show the difference):
26be60ec9a36f217df83834939cbefa33ac798776977c1970f6c38ba1cf92e92 # PHP
26be60ec9a36f217df83834939cbefa33ac798776977c197 f6c38ba1cf92e92 # Java
As you can see, they're nearly identical. But the top one (computed by PHP) has one more 0 for some reason. I haven't really noticed a rhyme or reason to it, but it's certainly stumped me. I've tried thinking of things like the wrong encoding, or wrong return value, but none of them really explain why they're almost identical except for that one character.
Any help on this issue would be much appreciated.
EDIT: The space is only in the bottom one to highlight where the extra 0 is. The actual hash has no space, and is indeed a valid hash, as it's the same one that Java produces.
EDIT2: Sorry about that. I checked the lengths with Notepad++, and since it's different than my normal text editor, I misread the length by 1. So yes, the top one is indeed right. Which means that it's a bug in my Java code. I'm going to explore Ignacio's answer and get back to you.

The top hash is the correct length; the bottom hash is shorter because the hexadecimal values were not zero-filled on output (note that the missing digit is the most significant nibble of a byte). So it is a bug in the Java program, unrelated to the hash algorithm.
>>> '%04x %02x%02x %x%x' % (0x1201, 0x12, 0x01, 0x12, 0x01)
'1201 1201 121'
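The same effect is easy to reproduce in Java. The question does not show the Java side, so the following is only a sketch of what the bug and its fix might look like; the class and method names are made up for illustration. It contrasts Integer.toHexString, which drops the leading zero of any byte below 0x10, with String.format("%02x", ...), which always emits two digits per byte.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HexPadding {
    // Buggy variant: Integer.toHexString drops the leading zero of bytes below 0x10.
    static String toHexUnpadded(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(b & 0xff));
        }
        return sb.toString();
    }

    // Fixed variant: %02x always emits exactly two hex digits per byte.
    static String toHexPadded(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // SHA-256 of a UTF-8 string as a zero-padded, 64-character hex string.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return toHexPadded(md.digest(input.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {0x12, 0x01};
        System.out.println(toHexUnpadded(sample));      // prints 121
        System.out.println(toHexPadded(sample));        // prints 1201
        System.out.println(sha256Hex("abc").length());  // prints 64
    }
}
```

With the padded variant every digest comes out at 64 characters, so it should line up with PHP's hash() output.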

Actually it's the SECOND hash which seems to have an incorrect length (63). Could it be that it is generated by assembling two different tokens, and maybe the last one - which should be 16 characters - gets the initial zero removed?

Related

java - is there a way to confirm that a string is a sha256 hash?

I'd like to validate that a String is a sha256 representation of another without having to decrypt it. Is this possible?

Yes and no.
You can test that a string is hex very easily. You can then test that it contains a statistically sensible number of digits and letters. That will rule out some common non sha256 strings.
But if someone creates a random string designed to look like a sha256, I don't think it's possible to distinguish it from the real thing by any mathematical test. The algorithm is designed to be robust to that.
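A minimal sketch of the "is it hex" test in Java, assuming the hash is given as a hex string (the class and method names are hypothetical). It only checks the shape: a SHA-256 digest is 32 bytes, so exactly 64 hex characters. Passing the check says nothing about whether the string really is a hash of anything.

```java
import java.util.regex.Pattern;

final class Sha256Format {
    // A SHA-256 digest is 32 bytes, i.e. exactly 64 hexadecimal characters.
    private static final Pattern HEX_64 = Pattern.compile("[0-9a-fA-F]{64}");

    // True if the string has the shape of a hex-encoded SHA-256 digest.
    static boolean looksLikeSha256(String s) {
        return s != null && HEX_64.matcher(s).matches();
    }

    public static void main(String[] args) {
        // 64 hex characters: has the right shape
        System.out.println(looksLikeSha256(
            "26be60ec9a36f217df83834939cbefa33ac798776977c1970f6c38ba1cf92e92"));
        // 63 hex characters: rejected
        System.out.println(looksLikeSha256(
            "26be60ec9a36f217df83834939cbefa33ac798776977c197f6c38ba1cf92e92"));
    }
}
```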

A SHA-256 value is just a 256-bit (32-byte) value, which you usually represent as a String or as a byte[] in Java.
As a value on its own it is meaningless. If you want to tell whether a specific String is a hash, then any 32-byte number could be the hash of infinitely many unknown plain texts. It's like asking "how do I know that a 32-byte number is a number?"; you can see that this goes nowhere.
It's useful only when it's paired to a plain text so that you can compare it with the hash computed from the plain text to verify they match.

I think what you could do is to hash the other string and then compare these two strings with each other.
No idea if this would help you, but I read that it is common practice when creating rainbow tables for cracking passwords.
EDIT: Oh, I forgot: this is also the way to compare passwords in PHP when you log in to a webpage, iirc. At least I had to do it like this for university.

Is my encryption safe?

I made my own encryption and I would like to know whether it is safe or not.
First of all, it's written in Java.
I started with this String:
"Never gonna give you up, never gonna let you down"
It ends up as a byte array when encrypted, but to visualize it, I changed it to hexadecimal.
"6b4053405705424a4b4b4405424c5340055c4a5005505509054b4053405705424a4b4b4405494051055c4a5005414a524b"
Now, is it safe or should I rethink?

Converting each pair of hex characters to a byte and reading it as ASCII gives (with control bytes such as 0x05 shown as spaces)
k@S@W BJKKD BLS@ \JP PU K@S@W BJKKD I@Q \JP AJRK
which is just a simple substitution cipher.
They have these in the newspaper, and people solve them with a pen and paper.
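To show how little work this takes, here is a small Java sketch. It assumes (this is my guess, the question does not say so) that every byte was XORed with one fixed key byte, which is one kind of simple substitution. Because the question gives both the plaintext and the ciphertext, the key byte falls out of the very first character. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

final class SingleByteXor {
    // Decode a hex string such as "6b40" into raw bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // XOR every byte with the same key byte and read the result as ASCII.
    static String xorWithKey(byte[] data, int key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] cipher = fromHex(
            "6b4053405705424a4b4b4405424c5340055c4a5005505509054b4053405705"
            + "424a4b4b4405494051055c4a5005414a524b");
        // Known plaintext: the message starts with 'N', so key = 0x6b ^ 'N' = 0x25.
        int key = cipher[0] ^ 'N';
        System.out.println(xorWithKey(cipher, key));
        // prints: Never gonna give you up, never gonna let you down
    }
}
```

Even without the known plaintext there are only 256 possible key bytes to try, so a loop over all of them would find the readable one almost instantly.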

Cryptanalysis: XOR of two plaintext files

I have a file which contains the result of two XORed plaintext files. How do I attack this file in order to decrypt either of the plaintext files? I have searched quite a bit, but could not find any answers. Thanks!
EDIT:
Well, I also have the two ciphertexts which I XORed to get the XOR of the two plaintexts. The reason I ask this question is that, according to Bruce Schneier, pg. 198, Applied Cryptography, 1996, "...she can XOR them together and get two plaintext messages XORed with each other. This is easy to break, and then she can XOR one of the plaintexts with the ciphertext to get the keystream." (This is in relation to a simple stream cipher.) But beyond that he provided no explanation, which is why I asked here. Forgive my ignorance.
Also, the algorithm used is a simple one, and a symmetric key is used whose length is 3.
FURTHER EDIT:
I forgot to add: I'm assuming that a simple stream cipher was used for encryption.

I'm no cryptanalyst, but if you know something about the characteristics of the files you might have a chance.
For example, let's assume that you know that both original plaintexts:
contain plain ASCII English text
are articles about sports (or whatever)
Given those 2 pieces of information, one approach you might take is to scan through the ciphertext 'decrypting' using words that you might expect to be in them, such as "football", "player", "score", etc. Perform the decryption using "football" at position 0 of the ciphertext, then at position 1, then 2 and so on.
If the result of decrypting a sequence of bytes appears to be a word or word fragment, then you have a good chance that you've found plaintext from both files. That may give you a clue as to some surrounding plaintext, and you can see if that results in a sensible decryption. And so on.
Repeat this process with other words/phrases/fragments that you might expect to be in the plaintexts.
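The guessing procedure described above could be sketched in Java roughly like this. The two sample sentences, the "letters and spaces only" heuristic, and all names are my own assumptions for illustration; real text would need a better plausibility test (punctuation, digits, a dictionary).

```java
import java.nio.charset.StandardCharsets;

final class CribDrag {
    // XOR two byte arrays, truncated to the shorter length.
    static byte[] xor(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    // XOR the guessed word against the combined stream at one offset.
    static String dragAt(byte[] xored, String crib, int offset) {
        byte[] c = crib.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[c.length];
        for (int i = 0; i < c.length; i++) {
            out[i] = (byte) (xored[offset + i] ^ c[i]);
        }
        return new String(out, StandardCharsets.US_ASCII);
    }

    // Crude plausibility test: only ASCII letters and spaces.
    static boolean looksLikeText(String s) {
        for (char ch : s.toCharArray()) {
            boolean letter = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
            if (!letter && ch != ' ') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-ins for the two unknown plaintexts; only their XOR is "known".
        byte[] a = "the football match".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "a player scored it".getBytes(StandardCharsets.US_ASCII);
        byte[] xored = xor(a, b);

        String crib = "football";
        for (int offset = 0; offset + crib.length() <= xored.length; offset++) {
            String candidate = dragAt(xored, crib, offset);
            if (looksLikeText(candidate)) {
                System.out.println(offset + ": \"" + candidate + "\"");
            }
        }
    }
}
```

At the offset where the guess is right, the output is a fragment of the other plaintext, which in turn suggests the next guess to try.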
In response to your question's edit: what Schneier is talking about is that if someone has 2 ciphertexts that have been XOR encrypted using the same key, XORing those ciphertexts will 'cancel out' the keystream, since:
(A ^ k) - ciphertext of A
(B ^ k) - ciphertext of B
(A ^ k) ^ (B ^ k) - the two ciphertexts XOR'ed together which simplifies to:
A ^ B ^ k ^ k - which continues to simplify to
A ^ B ^ 0
A ^ B
So now, the attacker has a new ciphertext that's composed only of the two plaintexts. If the attacker knows one of the plaintexts (say the attacker has legitimate access to A, but not B), that can be used to recover the other plaintext:
A ^ (A ^ B)
(A ^ A) ^ B
0 ^ B
B
Now the attacker has the plaintext for B.
It's actually worse than this - if the attacker has A and the ciphertext for A then he can recover the keystream already.
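The algebra above can be checked with a few lines of Java. This is just a sketch: the two messages and the 3-byte repeating key are invented for the demonstration (the question only says the key length is 3), and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeystreamReuse {
    // XOR two byte arrays, truncated to the shorter length.
    static byte[] xor(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            out[i] = (byte) (x[i] ^ y[i]);
        }
        return out;
    }

    // Repeat a short key until it covers n bytes.
    static byte[] keystream(byte[] key, int n) {
        byte[] ks = new byte[n];
        for (int i = 0; i < n; i++) {
            ks[i] = key[i % key.length];
        }
        return ks;
    }

    public static void main(String[] args) {
        byte[] a = "attack at dawn".getBytes(StandardCharsets.US_ASCII);
        byte[] b = "retreat at six".getBytes(StandardCharsets.US_ASCII);
        byte[] k = keystream(new byte[] {0x13, 0x37, 0x42}, a.length);

        byte[] c1 = xor(a, k);            // ciphertext of A
        byte[] c2 = xor(b, k);            // ciphertext of B
        byte[] combined = xor(c1, c2);    // the key cancels out: equals A ^ B

        System.out.println(Arrays.equals(combined, xor(a, b)));  // prints true
        // Knowing A, recover B from A ^ B.
        System.out.println(new String(xor(a, combined), StandardCharsets.US_ASCII));
        // Knowing A and its ciphertext, recover the keystream directly.
        System.out.println(Arrays.equals(xor(a, c1), k));        // prints true
    }
}
```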
But, the guessing approach I gave above is a variant of the above with the attacker using (hopefully good) guesses instead of a known plaintext. Obviously it's not as easy, but it's the same concept, and it can be done without starting with known plaintext. Now the attacker has a ciphertext that 'tells' him when he's correctly guessed some plaintext (because it results in other plaintext from the decryption). So even if the key used in the original XOR operation is random gibberish, an attacker can use the file that has that random gibberish 'removed' to gain information when he's making educated guesses.

You need to take advantage of the fact that both files are plain text. There are a lot of implications which can be derived from that fact. Assuming that both texts are English, you can use the fact that some letters are much more common than others. See this article.
Another hint is to note the structure of correct English text. For example, every time one sentence ends and the next begins, there is a (dot, space, capital letter) sequence.
Note that in ASCII code, space is binary "0010 0000" and changing that bit in a letter will change the letter case (lower to upper and vice versa). There will be a lot of XORing using space, if both files are plain text, right?
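A tiny Java illustration of the space trick (names are hypothetical, and this is a heuristic, not proof). XORing a letter with a space flips its case, so a byte of the combined file that turns into a letter when XORed with 0x20 is a hint that one text has a space at that position and the other has that letter in the opposite case.

```java
final class SpaceXor {
    // XOR with the space character (0x20) toggles the case of an ASCII letter.
    static char flipCase(char letter) {
        return (char) (letter ^ ' ');
    }

    // True if this byte of (A ^ B) could be "letter XOR space".
    static boolean mayBeLetterXorSpace(int xoredByte) {
        int c = xoredByte ^ 0x20;
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        System.out.println(flipCase('a'));                   // prints A
        System.out.println(flipCase('Q'));                   // prints q
        System.out.println(mayBeLetterXorSpace('e' ^ ' '));  // prints true
        System.out.println(mayBeLetterXorSpace('a' ^ 'b'));  // prints false
    }
}
```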
Analyse the printable characters table on this page.
Also, at the end you can use a spell checker.
I know I didn't provide a solution for your question.
I just gave you some hints. Have fun, and please share your findings.
It's really an interesting task.

That is interesting. The Schneier book does indeed say that it is easy to break this. And then he kind of leaves it hanging at that. I guess you have to leave some exercises up to the reader!
There is an article by Dawson and Nielsen that apparently describes an automated process for this task for text files. It's a bit on the $$ side to buy the single article. However, a second paper, titled A Natural Language Approach to Automated Cryptanalysis of Two-time Pads, references the Dawson and Nielsen work and describes some assumptions they made (primarily that the text was limited to 27 characters). This second paper appears to be freely available and describes the authors' own system. I don't know for sure that it is free, but it is openly available on a Johns Hopkins University server.
That paper is about 10 pages long and looks interesting. I don't have time to read it at the moment but may later. I find it interesting (and telling) that it takes a 10 page paper to describe a task that another cryptographer describes as "easy".

I don't think you can - not without knowing anything about the structure of the two files.

Unless you have one of the plaintext files, you can't get the original information of the other. Mathematically expressed:
p1 XOR p2 = en
You have one equation with two unknowns, you can't possibly get something meaningful out of it.

Making a line of code difficult to read

I'm writing a way of checking whether a customer's serial number matches my hard-coded number. Is there a way of making this as hard to read as possible in case an undesirable gets their hands on the code?
I am working in java.
For instance (pseudo code)
if (x != y) jump out of code and return error
Cheers, apologies if this is a bit of an odd one

Security through obscurity is always a bad idea. You don't need to avoid it, but you should not rely solely on it.
Either encrypt your serials with a key you type in at startup of the service, or just specify the serials as hex or base64, not ASCII.

The normal way to do this would be to use a hash.
Create a hash of your serial code.
To validate the client serial, hash that using the same function.
If the hashes match, the serial was correct, even though the serial itself was not in the code.
By design, it is almost impossible to deduce the original code from the hash.
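A rough Java sketch of the steps above. Everything specific here is an assumption for illustration: the class and method names are made up, SHA-256 is just one possible hash function, and the stored constant is the well-known SHA-256 test vector for the placeholder serial "abc", not a real serial.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SerialCheck {
    // SHA-256 of the placeholder serial "abc"; the serial itself is not in the code.
    private static final String EXPECTED_HASH_HEX =
        "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

    // SHA-256 of a UTF-8 string as a zero-padded lowercase hex string.
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Hash the customer's serial and compare it with the stored hash.
    static boolean isValidSerial(String candidate) {
        byte[] expected = EXPECTED_HASH_HEX.getBytes(StandardCharsets.US_ASCII);
        byte[] actual = sha256Hex(candidate).getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual compares the arrays in a time-constant way.
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        System.out.println(isValidSerial("abc"));  // prints true
        System.out.println(isValidSerial("abd"));  // prints false
    }
}
```

One thing to keep in mind with this design: if the serial is short or follows a predictable pattern, someone with the hash can still try candidates until one matches.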

Making the code look complex to avoid being hacked never helps!

You can try SHA1 or some other one-way encrypting (MD5 not so secure but it's pretty good). Don't do this:
if (userPassword equals myHardCodedpassword)
Do this:
if (ENCRYPTED(userPassword) equals myhardcodedEncryptedpassword)
So the code-reader only can see an encrypted (and very very very difficult to decrypt) value.

Tangle the control structure of the released code?
e.g. feed the numbers in at a random point in the code under a different variable, and at some random point make them equal x and y?
http://en.wikipedia.org/wiki/Spaghetti_code

There is a wikipedia article on code obfuscation. Maybe the tricks there can help you =)

Instead of trying to make the code complex, you can implement other methods which will not expose your hard-coded serial number.
Try storing the hard-coded number at some permanent location as an encrypted byte array. That way it's not readable. For comparison, encrypt the client serial code with the same algorithm and compare.

How best to search binary data for variable length bit strings?

Can anyone tell me the best way to decode binary data with variable length bit strings in java?
For example:
The binary data is 10101000 11100010 01100001 01010111 01110001 01010110
I might need to find the first match of any of the following 01, 100, 110, 1110, 1010...
In this case the match would be 1010. I then need to do the same for the remainder of the binary data. The bit strings can be up to 16 bits long and cross the byte boundaries.
Basically, I'm trying to Huffman decode JPEGs using the bit strings I created from the Huffman tables in the headers. I can do it, only it's very messy: I'm turning everything, binary data included, into StringBuffers first, and I know that isn't the right way.
Before I loaded everything in string buffers I tried using just numbers in binary but of course I can't ignore the leading 0s in a code like 00011. I'm sure there must be some clever way using bit wise operators and the like to do this, but I've been staring at pages explaining bit masks and leftwise shifts etc and I still don't have a clue!
Thanks a lot for any help!
EDIT:
Thanks for all the suggestions. I've gone with the binary tree approach as it seems to be the standard way with Huffman stuff. Makes sense really, as Huffman codes are created using trees. I'll also look into storing the binary data I need to search in a big integer. Don't know how to mark multiple answers as correct, but thanks all the same.

You might use a state machine consuming zeros and ones. The state machine would have final states for all the patterns that you want to detect. Whenever it enters one of the final states, it sends a message to you with the matched pattern and goes back to the initial state.
Finally you would have only one state machine in form of a DAG which contains all your patterns.
To implement it use the state pattern (http://en.wikipedia.org/wiki/State_pattern) or any other implementation of a state machine.

Since you are decoding Huffman encoded-data, you should create a binary tree, where leaves hold the decoded bit string as data, and the bits of each Huffman code are the path to the corresponding data. The bits of the Huffman code are accessed with bit-shift and bit-mask operations. When you get to a leaf, you output the data at that leaf and go back to the root of the tree. It's very fast and efficient.
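A minimal sketch of that tree walk in Java. The class, the symbol numbers, and the decision to stop when no code matches are my own assumptions; a real JPEG decoder would build the tree from the DHT tables and also handle byte stuffing and markers. Each code is stored as a value plus a bit length, so leading zeros in a code like 00011 are not lost.

```java
import java.util.ArrayList;
import java.util.List;

final class HuffmanTreeDecoder {
    private static final class Node {
        Node zero;       // child followed on a 0 bit
        Node one;        // child followed on a 1 bit
        Integer symbol;  // non-null only at leaves
    }

    private final Node root = new Node();

    // Register a code by value and bit length, e.g. addCode(0b00011, 5, sym).
    void addCode(int code, int length, int symbol) {
        Node n = root;
        for (int i = length - 1; i >= 0; i--) {
            int bit = (code >>> i) & 1;
            if (bit == 0) {
                if (n.zero == null) n.zero = new Node();
                n = n.zero;
            } else {
                if (n.one == null) n.one = new Node();
                n = n.one;
            }
        }
        n.symbol = symbol;
    }

    // Walk the tree bit by bit, most significant bit of each byte first.
    List<Integer> decode(byte[] data) {
        List<Integer> out = new ArrayList<>();
        Node n = root;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                int bit = (b >>> i) & 1;
                n = (bit == 0) ? n.zero : n.one;
                if (n == null) {
                    return out;  // no code matches this bit sequence: stop
                }
                if (n.symbol != null) {
                    out.add(n.symbol);  // reached a leaf: emit, restart at root
                    n = root;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        HuffmanTreeDecoder d = new HuffmanTreeDecoder();
        // The example codes from the question, numbered 0..4 arbitrarily.
        d.addCode(0b01, 2, 0);
        d.addCode(0b100, 3, 1);
        d.addCode(0b110, 3, 2);
        d.addCode(0b1110, 4, 3);
        d.addCode(0b1010, 4, 4);
        // First two bytes of the sample data: 10101000 11100010
        System.out.println(d.decode(new byte[] {(byte) 0xA8, (byte) 0xE2}));
        // prints [4, 1, 0, 2]
    }
}
```

Codes crossing a byte boundary need no special handling here, because the current node simply carries over from one byte to the next.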

You could try stuffing it into a BigInteger and then using the shift and test methods. Then use a loop to walk the bits and accept each sub-pattern.
If the Huffman codes are in a tree, 1 == right node, 0 == left node.
for( int i = numbitsTotal - 1; i >= 0; --i )
{
    // testBit returns a boolean; bit 0 is the least significant bit,
    // so start at the top bit and walk down to bit 0
    boolean bit = bigInt.testBit( i );
    if( bit )
    {
        // take right node -- if null accept code, apply from top
    }
    else
    {
        // take left node -- if null accept code, apply from top
    }
}

I would suggest a trie. It is explicitly designed for prefix searching. In your case, it would be a binary trie.

You could use a java.util.BitSet to store your binary data and then you can implement some search functions to find the position of a smaller BitSet inside the big one...
