Get hash of a String as String - java

I'm here:
String password = "123";
byte passwordByte[] = password.getBytes();
MessageDigest md = MessageDigest.getInstance("SHA-512");
byte passwortHashByte[] = md.digest(passwordByte);
The passwortHashByte array contains only a lot of numbers. I want to convert these numbers to one String which contains the hash code as plaintext.
How do I do this?

I want to convert these numbers to one String which contains the hash code as plaintext.
The hash isn't plain-text. It's binary data - arbitrary bytes. That isn't plaintext any more than an MP3 file is.
You need to work out what textual representation you want to use for the binary data. That in turn depends on what you want to use the data for. For the sake of easy diagnostics I'd suggest a pure-ASCII representation - probably either base64 or hex. If you need to easily look at individual bytes, hex is simpler to read, but base64 is a bit more compact.
It's also important to note that a single pass of a fast, general-purpose hash (MD5, or the SHA-512 in your snippet) isn't a particularly good way of hashing passwords... and it looks like you're not even salting them. It may be good enough for a demo app which will never be released into the outside world, but you should really look into more secure approaches. See Jeff Atwood's blog post on the topic for an introduction, and ideally get hold of a book about writing secure code.

Here is how I did it for my website.
private static byte[] fromHex(String hex) {
    byte[] bytes = new byte[hex.length() / 2];
    for (int i = 0; i < bytes.length; i++) {
        // two hex digits make one byte; the cast wraps 128..255 into the negative byte range
        bytes[i] = (byte)(Character.digit(hex.charAt(i * 2), 16) * 16 + Character.digit(hex.charAt(i * 2 + 1), 16));
    }
    return bytes;
}
private static String toHex(byte[] bytes) {
    StringBuilder hex = new StringBuilder(bytes.length * 2);
    for (int i = 0; i < bytes.length; i++) {
        // mask with 0xff so negative bytes map to 0..255 (standard hex, not shifted by 128)
        String c = Integer.toHexString(bytes[i] & 0xff);
        if (c.length() == 1) hex.append('0');
        hex.append(c);
    }
    return hex.toString();
}
That'll allow you to convert your byte array to and from a hex string.

Well, byte passwortHashByte[] = md.digest(passwordByte); can contain control characters (and byte sequences that are not valid text at all), which will break your String. If you really need a String from it, consider encoding passwortHashByte to Base64. You can use Apache Commons Codec to produce the Base64 form.
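A minimal sketch of the two textual forms suggested in the answers above (hex and Base64), assuming Java 8 or later so that java.util.Base64 can stand in for Apache Commons Codec. The class name HashText and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class HashText {
    // Hash the UTF-8 bytes of the input with SHA-512 (64 raw bytes).
    static byte[] sha512(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            return md.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected for SHA-512 on a standard JDK; rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    // Two lowercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Standard Base64 with '=' padding.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] hash = sha512("123");
        System.out.println(toHex(hash));    // 128 hex characters
        System.out.println(toBase64(hash)); // 88 Base64 characters
    }
}
```

Hex doubles the length of the data, while Base64 adds roughly a third, which is the trade-off between readability and compactness mentioned earlier.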

Related

Replicating Java password hashing code in Python (PBKDF2WithHmacSHA1)

I have been trying to replicate the Java password authentication in Python, but the resulting hash is different.
password: abcd1234
password token (java): $31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w
generated password token (python): $pbkdf2$16$c1d5MWRERXg1MnZ3UVVDcw$qPQvE4QbrnYJTmRXk0M7wlfhH5U
From the Java code, the iteration count is 16, the salt should be the first 16 characters of sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w, which is sWy1dDEx52vwQUCs, and the hash should be wXDYMQMzTJC39g1_nmrK384T4-w.
However, applying these values in Python gives me qPQvE4QbrnYJTmRXk0M7wlfhH5U, which is different from Java's hash.
What did I miss?
Java:
private static final String ALGORITHM = "PBKDF2WithHmacSHA1";
private static final int SIZE = 128;
private static final Pattern layout = Pattern.compile("\\$31\\$(\\d\\d?)\\$(.{43})");
public boolean authenticate(char[] password, String token)
{
    Matcher m = layout.matcher(token);
    if (!m.matches())
        throw new IllegalArgumentException("Invalid token format");
    int iterations = iterations(Integer.parseInt(m.group(1)));
    byte[] hash = Base64.getUrlDecoder().decode(m.group(2));
    byte[] salt = Arrays.copyOfRange(hash, 0, SIZE / 8);
    byte[] check = pbkdf2(password, salt, iterations);
    int zero = 0;
    for (int idx = 0; idx < check.length; ++idx)
        zero |= hash[salt.length + idx] ^ check[idx];
    return zero == 0;
}
Python:
from passlib.hash import pbkdf2_sha1
def hasher(password):
    size = 128
    key0 = "abcd1234"
    iter = int(password.split("$")[2])
    salt0 = password.split("$")[3][0: 16]
    hash = pbkdf2_sha1.using(rounds=iter, salt = salt0.encode()).hash(key0)
    print(hash.split('$')[4])
    return hash
Original Link for Java code: How can I hash a password in Java?
There's a bunch of things different between how that java code does things, and how passlib's pbkdf2_sha1 hasher does things.
The java hash string contains a log cost parameter, which needs passing through 1<<cost to get the number of rounds / iterations.
The salt+digest needs to be base64 decoded, then take the first 16 bytes as the salt (which actually corresponds to first 21 1/3 characters of base64 data).
Similarly, since the digest's bits start in the middle of a base64 character, when the salt+digest is decoded and the digest is then encoded separately, the base64 string would be AzNMkLf2DX-easrfzhPj7A (noticeably different from the original encoded string).
Based on that, the following bit of code converts a java hash into the format used by pbkdf2_sha1.verify:
from passlib.utils.binary import b64s_decode, ab64_encode
def adapt_java_hash(jhash):
    _, ident, cost, data = jhash.split("$")
    assert ident == "31"
    # map the URL-safe alphabet ("-", "_") back to standard base64 ("+", "/")
    data = b64s_decode(data.replace("_", "/").replace("-", "+"))
    return "$pbkdf2$%d$%s$%s" % (1<<int(cost), ab64_encode(data[:16]),
                                 ab64_encode(data[16:]))
>>> adapt_java_hash("$31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w")
'$pbkdf2$65536$sWy1dDEx52vwQUCswXDYMQ$AzNMkLf2DX.easrfzhPj7A'
The resulting string should be suitable for passing into pbkdf2_sha1.verify("abcd1234", hash), except for one issue:
The java code truncates the sha1 digest to 16 bytes rather than the full 20 bytes, and the way passlib's hasher is coded, the digest must be the full 20 bytes.
If you alter the java code to use SIZE=160 instead of SIZE=128, running the hash through the above adapt_java_hash() function should then work in passlib.
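The token layout described in this answer (a log2 cost, then 16 salt bytes followed by the digest inside one URL-safe base64 string) can be checked from the Java side as well. The following is a sketch only: the class TokenSplit and its method names are hypothetical, and it relies on nothing beyond java.util.Base64 and java.util.Arrays.

```java
import java.util.Arrays;
import java.util.Base64;

class TokenSplit {
    static final int SALT_BYTES = 16; // SIZE / 8 in the Java code from the question

    // The token stores a log2 cost; the iteration count is 2^cost.
    static int iterations(String token) {
        String[] parts = token.split("\\$"); // ["", "31", cost, data]
        return 1 << Integer.parseInt(parts[2]);
    }

    // Decode the combined salt+digest field (URL-safe base64, no padding).
    static byte[] decode(String token) {
        String[] parts = token.split("\\$");
        return Base64.getUrlDecoder().decode(parts[3]);
    }

    static byte[] salt(String token) {
        return Arrays.copyOfRange(decode(token), 0, SALT_BYTES);
    }

    static byte[] digest(String token) {
        byte[] all = decode(token);
        return Arrays.copyOfRange(all, SALT_BYTES, all.length);
    }

    static String urlBase64(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        String token = "$31$16$sWy1dDEx52vwQUCswXDYMQMzTJC39g1_nmrK384T4-w";
        System.out.println(iterations(token));        // 65536
        System.out.println(urlBase64(salt(token)));   // sWy1dDEx52vwQUCswXDYMQ
        System.out.println(urlBase64(digest(token))); // AzNMkLf2DX-easrfzhPj7A
    }
}
```

The digest re-encodes to a different string than the tail of the token because byte 16 starts two bits into a base64 character, so slicing the text at character 16 does not split the data at byte 16.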

Write bits in a file and retrieve them to a string of "0101.." in java?

I am working on a compression algorithm, and for that I need to write strings of bits to a binary file and read exactly the same bits back into a String.
Say I have a string "10100100100....."; I want to write it to a file as bits (not as the chars '0' and '1'), then read it back as bits and convert it to a string again.
This is for a large amount of data (>100 megabytes).
Is there a neat and fast way of doing this?
So far I have tried (and failed) to pack each 8-bit substring into a byte, append those bytes as ASCII characters to a string, and finally write that string to a .txt file.
{
    String Bits="10001010100000000000"; // a lot larger in actual program
    String nCoded="";
    char nextChar;
    int i = 0;
    for(i=0; i < Bits.length()-8; i += 8){
        nextChar = (char)Integer.parseInt( Bits.substring(i, i+8), 2 );
        nCoded += nextChar;
    }
    // for the remainding bits, padding
    if(newBits.length()%8 != 0){
        nCoded+=(char)Integer.parseInt(Bits.substring(i), 2);
    }
    nCoded+=(char)Bits.length()%8; //to track the remainder of Bits that was padded
    writeToTextFile( nCoded, "file.txt"); //write the nCoded string to file
}
But this seems to corrupt information and is inefficient.
Again, for clarification: I don't want the String itself to be written; it's just a representation of the actual data. So I want to convert each 0 and 1 from the string representation to its binary form and write that to the file.
Here is a method you can use to convert the String to a series of bits, ready for output to file:
private byte[] toByteArray(String input){
    //to charArray, padded with trailing '0' up to a multiple of 8 bits
    char[] preBitChars = input.toCharArray();
    int bitShortage = (8 - (preBitChars.length%8)) % 8; // 0 when already aligned
    char[] bitChars = new char[preBitChars.length + bitShortage];
    System.arraycopy(preBitChars, 0, bitChars, 0, preBitChars.length);
    for (int i= 0; i < bitShortage; i++) {
        bitChars[preBitChars.length + i]='0';
    }
    //to bytearray: character i becomes bit (7 - i%8) of byte i/8, so the first character is the most significant bit
    byte[] byteArray = new byte[bitChars.length/8];
    for(int i=0; i<bitChars.length; i++) {
        if (bitChars[i]=='1'){
            byteArray[i/8] |= 1<<(7 - (i%8));
        }
    }
    return byteArray;
}
Passing the String "01010101" will return the result [85] as a byte[].
It turns out there is an easier way for a single byte. There is a static Byte.parseByte(String, int) that takes a radix and returns a byte. Calling:
byte aByte = Byte.parseByte("01010101", 2);
System.out.println(aByte);
Displays the same value: 85.
So you may ask a couple of questions here.
1) Why are we passing a String that is 8 characters in length? Byte.parseByte() parses a signed value, so the digits can only describe 0 to 127; an 8-character String whose first bit is 1 (128 or more) throws a NumberFormatException. The sign goes in an optional leading character, which the documentation for Byte.parseByte() describes as:
An ASCII minus sign '-' ('\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value.
So from this information, you would need to break up your String manually into 8 bit Strings and call Byte.parseByte() on each. For arbitrary bit patterns, (byte) Integer.parseInt(chunk, 2) avoids the range problem.
2) What about writing bits to a file? No, file writing is done in bytes. If you need to write the file, then read it back in and convert back to a String, you will need to reverse the process: read the file in as a byte[], then convert that to its String representation.
A Hint on how to convert a byte to a nice String format can be found here:
Convert byte (java data type) value to bits (a string containing only 8 bits)
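Putting the pieces of this answer together, here is one possible sketch of a full round trip. The file layout (a 4-byte bit count followed by the packed bytes) is an assumption of this example rather than anything stated in the question, and BitFile with its method names is hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BitFile {
    // Pack '0'/'1' characters into bytes, most significant bit first.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return out;
    }

    // Expand the first bitCount bits of the packed data back into a string.
    static String unpack(byte[] packed, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (packed[i / 8] >> (7 - (i % 8))) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    // Assumed layout: bit count as a 4-byte int, then the packed bytes.
    static void write(Path file, String bits) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(bits.length());
            out.write(pack(bits));
        }
    }

    static String read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int bitCount = in.readInt();
            byte[] packed = new byte[(bitCount + 7) / 8];
            in.readFully(packed);
            return unpack(packed, bitCount);
        }
    }
}
```

Storing the bit count up front means the padding bits in the last byte never need to be guessed when reading the file back. For very large inputs it would probably be better to stream the data in chunks instead of holding one String and one byte[] in memory at the same time.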
You can get an InputStream from a String, read each byte and write it to a file (a byte is the smallest unit that you can read or write). Once everything is written, you can read the data back in a similar way (i.e. with an InputStream) and use it. Note that this stores each '0' or '1' character as a whole byte rather than packing eight of them into one. Below is an example:
String hugeString = "10101010010101010110101001010101";
InputStream in = new ByteArrayInputStream(hugeString.getBytes());
OutputStream out = new FileOutputStream("Test.txt");
int b; // keep the int so that -1 (end of stream) stays distinct from the byte 0xFF
while((b = in.read()) != -1){
    out.write(b);
}
in.close();
out.close(); // flush and release the file before reading it back
in = new FileInputStream("Test.txt");
//Read data

Efficiently convert Java string into null-terminated byte[] representing a C string? (ASCII)

I would like to transform a Java String str into a byte[] b with the following characteristics:
b is a valid C string (i.e. b.length == str.length() + 1 and b[str.length()] == 0).
The characters in b are obtained by converting the characters in str to 8-bit ASCII characters.
What is the most efficient way to do this, preferably with an existing library function? Sadly, str.getBytes("ISO-8859-1") doesn't meet my first requirement...
// do this once to setup
CharsetEncoder enc = Charset.forName("ISO-8859-1").newEncoder();
// for each string
int len = str.length();
byte b[] = new byte[len + 1];
ByteBuffer bbuf = ByteBuffer.wrap(b);
enc.encode(CharBuffer.wrap(str), bbuf, true);
// you might want to ensure that bbuf.position() == len
b[len] = 0;
This requires allocating a couple of wrapper objects, but does not copy the string characters twice.
You can use str.getBytes("ISO-8859-1") with a little trick at the end:
byte[] stringBytes=str.getBytes("ISO-8859-1");
byte[] ntBytes=new byte[stringBytes.length+1];
System.arraycopy(stringBytes, 0, ntBytes, 0, stringBytes.length);
arraycopy is relatively fast as it can use native tricks and optimizations in many cases. The new array is filled with null bytes everywhere we didn't overwrite it (basically just the last byte).
ntBytes is the array you need.
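A shorter variant of the same idea, as a sketch: Arrays.copyOf allocates the longer array and zero-fills the extra slot in one call. The class and method names (CString, toCString) are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CString {
    // Encode as ISO-8859-1, then grow the array by one element.
    // Arrays.copyOf pads with zero, which serves as the C string terminator.
    static byte[] toCString(String str) {
        byte[] raw = str.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.copyOf(raw, raw.length + 1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toCString("abc"))); // [97, 98, 99, 0]
    }
}
```

Characters that ISO-8859-1 cannot represent are replaced with '?' by getBytes, so the array length still equals str.length() + 1.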

Short, case-insensitive string obfuscation strategy

I am looking for a way to identify (i.e. encode and decode) a set of Java strings with one token. The identification should not involve DB persistence. So far I have looked into Base64 encoding and DES encryption, but both are not optimal with respect to the following requirements:
Token should be as short as possible
Token should be insensitive to casing
Token should survive a URLEncoder/Decoder round-trip (i.e. will be used in URLs)
Is Base32 my best shot or are there better options? Note that I'm primarily interested in shortening & obfuscating the set, encryption/security is not important.
What's the structure of the text (i.e. the set of strings)? You could use your knowledge of it to encode it in a shorter form. E.g. if you have a large decimal number such as "1234567890", you could translate it into a base-36 number, which will be shorter.
Otherwise it looks like you are trying to invent a universal archiver.
If you don't care about length, then yes, processing with an alphabet-based encoder (such as Base32) is the only choice.
Also, if the text is large enough, maybe you could save some space by gzipping it.
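The base-36 idea from this answer can be sketched with BigInteger, assuming the strings really are plain decimal numbers. The Base36 class is hypothetical, and leading zeros in the input would be lost.

```java
import java.math.BigInteger;

class Base36 {
    // Re-express a decimal digit string in base 36 (digits 0-9, then a-z).
    static String encode(String decimalDigits) {
        return new BigInteger(decimalDigits).toString(36);
    }

    // Parsing accepts upper and lower case letters alike,
    // so the token is case-insensitive.
    static String decode(String token) {
        return new BigInteger(token, 36).toString();
    }

    public static void main(String[] args) {
        String token = encode("1234567890");
        System.out.println(token);                        // kf12oi
        System.out.println(decode(token.toUpperCase()));  // 1234567890
    }
}
```

Ten decimal digits shrink to six characters here, and the result contains only letters and digits, so it should pass through URL encoding unchanged.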
Rot13 obfuscates but does not shorten. Zip shortens (usually) but does not survive the URL round trip. Encryption will not shorten, and may lengthen. Hashing shortens but is one-way. You do not have an easy problem. Base32 is case insensitive, but takes more space than Base64, which isn't. I suspect that you are going to have to drop or modify your requirements. Which requirements are most important and which least important?
I have spent some time on this and I have a good solution for you.
Encode as base64 then as a custom base32 that uses 0-9a-v. Essentially, you lay out the bits 6 at a time (your chars are 0-9a-zA-Z) then encode them 5 at a time. This leads to hardly any extra space. For example, ABCXYZdefxyz123789 encodes as i9crnsuj9ov1h8o4433i14
Here's an implementation that works, including some test code that proves it is case-insensitive:
// Note: You can add 1 more char to this if you want to
static String chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static String decodeToken(String encoded) {
    // Lay out the bits 5 at a time
    StringBuilder sb = new StringBuilder();
    for (byte b : encoded.toLowerCase().getBytes())
        sb.append(asBits(chars.indexOf(b), 5));
    // Drop the padding bits that do not fill a whole 6 bit group
    sb.setLength(sb.length() - (sb.length() % 6));
    // Consume it 6 bits at a time
    int length = sb.length();
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < length; i += 6)
        result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 6), 2)));
    return result.toString();
}
private static String generateToken(String x) {
    StringBuilder sb = new StringBuilder();
    for (byte b : x.getBytes())
        sb.append(asBits(chars.indexOf(b), 6));
    // Round up to a 5 bit multiple by padding with zeros
    int length = sb.length();
    sb.append("00000".substring(0, (5 - length % 5) % 5));
    // Consume it 5 bits at a time
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < length; i += 5)
        result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 5), 2)));
    return result.toString();
}
private static String asBits(int index, int width) {
    String bits = "000000" + Integer.toBinaryString(index);
    return bits.substring(bits.length() - width);
}
// Assert below is JUnit's Assert class
public static void main(String[] args) {
    String input = "ABCXYZdefxyz123789";
    String token = generateToken(input);
    System.out.println(input + " ==> " + token);
    Assert.assertEquals("mixed", input, decodeToken(token));
    Assert.assertEquals("lower", input, decodeToken(token.toLowerCase()));
    Assert.assertEquals("upper", input, decodeToken(token.toUpperCase()));
    System.out.println("pass");
}

Efficient way to calculate byte length of a character, depending on the encoding

What's the most efficient way to calculate the byte length of a character, taking the character encoding into account? The encoding would be only known during runtime. In UTF-8 for example the characters have a variable byte length, so each character needs to be determined individually. As far now I've come up with this:
char c = getCharSomehow();
String encoding = getEncodingSomehow();
// ...
int length = new String(new char[] { c }).getBytes(encoding).length;
But this is clumsy and inefficient in a loop since a new String needs to be created every time. I can't find other, more efficient ways in the Java API. There's a String#valueOf(char), but according to its source it does basically the same as above. I imagine that this can be done with bitwise operations like bit shifting, but that's my weak point and I'm unsure how to take the encoding into account here :)
If you question the need for this, check this topic.
Update: the answer from @Bkkbrad is technically the most efficient:
char c = getCharSomehow();
String encoding = getEncodingSomehow();
CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
// ...
int length = encoder.encode(CharBuffer.wrap(new char[] { c })).limit();
However, as @Stephen C pointed out, there are more problems with this. There may for example be combined/surrogate characters which need to be taken into account as well. But that's another problem, which needs to be solved in the step before this one.
Use a CharsetEncoder and reuse a CharBuffer as input and a ByteBuffer as output.
On my system, the following code takes 25 seconds to encode 100,000,000 single characters (10,000 repetitions of 10,000 characters):
Charset utf8 = Charset.forName("UTF-8");
char[] array = new char[1];
for (int reps = 0; reps < 10000; reps++) {
    for (array[0] = 0; array[0] < 10000; array[0]++) {
        int len = new String(array).getBytes(utf8).length;
    }
}
However, the following code does the same thing in under 4 seconds:
Charset utf8 = Charset.forName("UTF-8");
CharsetEncoder encoder = utf8.newEncoder();
char[] array = new char[1];
CharBuffer input = CharBuffer.wrap(array);
ByteBuffer output = ByteBuffer.allocate(10);
for (int reps = 0; reps < 10000; reps++) {
    for (array[0] = 0; array[0] < 10000; array[0]++) {
        output.clear();
        input.clear();
        encoder.encode(input, output, false);
        int len = output.position();
    }
}
Edit: Here's a solution that reads from a CharBuffer and keeps track of surrogate pairs:
Charset utf8 = Charset.forName("UTF-8");
CharsetEncoder encoder = utf8.newEncoder();
CharBuffer input = //allocate in some way, or pass as parameter
ByteBuffer output = ByteBuffer.allocate(10);
int limit = input.limit();
while(input.position() < limit) {
    output.clear();
    input.mark();
    // look at most two chars ahead, without running past the real end of the input
    input.limit(Math.min(input.position() + 2, limit));
    if (Character.isHighSurrogate(input.get()) && (!input.hasRemaining() || !Character.isLowSurrogate(input.get()))) {
        //Malformed surrogate pair; do something!
    }
    input.limit(input.position());
    input.reset();
    encoder.encode(input, output, false);
    int encodedLen = output.position();
}
If you can guarantee that the input is well-formed UTF-8, then there's no reason to find code points at all. One of the strengths of UTF-8 is that you can detect the start of a code point from any position in the string. Simply search backwards until you find a byte such that (b & 0xc0) != 0x80, and you've found another character. Since a UTF-8 encoded code point is always 4 bytes or less (6 under the original, pre-RFC 3629 definition), you can copy the intermediate bytes into a fixed-length buffer.
Edit: I forgot to mention, even if you don't go with this strategy, it is not sufficient to use a Java "char" to store arbitrary code points since code point values can exceed 0xffff. You need to store code points in an "int".
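For UTF-8 specifically, the byte length of a code point follows directly from its numeric range, so no encoder is needed at all. The sketch below assumes well-formed input (no unpaired surrogates) and covers only UTF-8, not an arbitrary runtime-selected encoding; Utf8Length is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Number of bytes UTF-8 needs for one code point (held in an int).
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;     // ASCII
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane
        return 4;                           // supplementary planes
    }

    // Sum over a string, stepping by code point rather than by char.
    static int utf8Length(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            total += utf8Length(cp);
            i += Character.charCount(cp);
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // 1 + 2 + 3 + 4 bytes
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Both lines print the same number for well-formed input. For other encodings the reusable CharsetEncoder approach shown earlier remains the general option.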
It is possible that an encoding scheme could encode a given character as a variable number of bytes, depending on what comes before and after it in the character sequence. The byte length you get from encoding a single character String is therefore not the whole answer.
(For example, you could theoretically receive Baudot / teletype characters encoded as 4 characters every 3 bytes, or you could theoretically treat UTF-16 plus a stream compressor as an encoding scheme. Yes, it is all a bit implausible, but ...)
Try Charset.forName("UTF-8").encode("string").limit(); Might be a bit more efficient, maybe not.
