I am trying to read a text file in which each line holds the path to a file, a space as a separator, and the hash that goes with that file. I call checkVersion(), and loadStrings(File f_) returns a String[] with one entry per line. When I check the hashes, however, I end up with something that isn't even hex and is twice as long as it should be. It's probably something obvious that my eyes are just overlooking. The idea behind this is an auto-update for my game that saves bandwidth. Thanks for your time.
The code is fixed; here is the final version in case anyone else has this issue. Thanks a lot, everyone.
void checkVersion() {
String[] v = loadStrings("version.txt");
for(int i=0; i<v.length; i++) {
String[] piece = split(v[i], " "); //BREAKS INTO FILENAME, HASH
println("Checking "+piece[0]+"..."+piece[1]);
if(checkHash(piece[0], piece[1])) {
println("ok!");
} else {
println("NOT OKAY!");
//CONTINUE TO DOWNLOAD FILE AND THEN CALL CHECKVERSION AGAIN
}
}
}
boolean checkHash(String path_, String hash_) {
return createHash(path_).equals(hash_);
}
byte[] messageDigest(byte[] data, String algorithm) {
try {
java.security.MessageDigest md = java.security.MessageDigest.getInstance(algorithm);
md.update(data);
return md.digest();
} catch(java.security.NoSuchAlgorithmException e) {
println(e.getMessage());
return null;
}
}
String createHash(String path_) {
//HASH THE RAW FILE BYTES; A ROUND TRIP THROUGH new String(...) CAN CORRUPT BINARY DATA
byte[] md5hash = messageDigest(loadBytes(path_), "MD5");
java.math.BigInteger bigInt = new java.math.BigInteger(1, md5hash);
//PAD TO 32 HEX DIGITS, BECAUSE toString(16) DROPS LEADING ZEROS
return String.format("%032x", bigInt);
}
The String.getBytes() method returns the bytes that encode the characters of the string. It doesn't parse the string as a number in some radix. For example, "AA".getBytes() would yield 0x41 0x41 on Windows, not 10101010b, which is what it appears you were expecting. To get that you could use, for example, (byte) Integer.parseInt("AA", 16). Note that Byte.parseByte("AA", 16) throws a NumberFormatException, because 0xAA (170) is outside the signed byte range.
The library you're using to create hashes probably has a method for taking back in its own string representation. How to convert back depends on the representation, which you didn't give us.
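To make the difference concrete, here is a minimal sketch that contrasts the character bytes of "AA" with the single byte the hex digits denote. The class name HexVsText and the helper parseHex are made up for this illustration; they are not part of any library.

```java
import java.nio.charset.StandardCharsets;

class HexVsText {
    // Hypothetical helper: parses a hex string such as "AA" or "0bb2"
    // into the bytes it denotes, two hex digits per byte.
    static byte[] parseHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string needs an even number of digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // parseInt yields 0..255 here; the cast wraps values above 127.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Character codes of the two letters: 0x41 0x41
        byte[] text = "AA".getBytes(StandardCharsets.US_ASCII);
        // The value the hex digits stand for: one byte, 0xAA
        byte[] value = parseHex("AA");
        System.out.println(text.length + " bytes vs " + value.length + " byte");
    }
}
```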
Use the following code to convert the hash bytes to a string:
//byte[] md5sum = digest.digest();
BigInteger bigInt = new BigInteger(1, md5sum);
String output = bigInt.toString(16);
System.out.println("MD5: " + output);
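One thing to watch with this approach: BigInteger.toString(16) drops leading zeros, so a digest whose first byte is below 0x10 comes out shorter than expected. A minimal sketch of a padded variant follows; the class name PaddedHex and the method toHex are made up for this example.

```java
import java.math.BigInteger;

class PaddedHex {
    // Formats the bytes as hex, left-padded with zeros to two digits per byte.
    static String toHex(byte[] digest) {
        if (digest.length == 0) {
            return "";
        }
        String format = "%0" + (digest.length * 2) + "x";
        return String.format(format, new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0f, (byte) 0xa0};
        // Unpadded: the leading zero digits are lost.
        System.out.println(new BigInteger(1, sample).toString(16)); // fa0
        // Padded: always two digits per byte.
        System.out.println(toHex(sample));                          // 000fa0
    }
}
```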
I want to compute the MD5 of many different files. I am following this answer to do that, but the main problem is that computing the MD5 of all the files (possibly hundreds of them) takes a long time.
Is there any way to find the MD5 of a file without consuming so much time?
Note: the files may be large (up to 300 MB).
This is the code I am using:
import java.io.*;
import java.security.MessageDigest;
public class MD5Checksum {
public static byte[] createChecksum(String filename) throws Exception {
InputStream fis = new FileInputStream(filename);
byte[] buffer = new byte[1024];
MessageDigest complete = MessageDigest.getInstance("MD5");
int numRead;
do {
numRead = fis.read(buffer);
if (numRead > 0) {
complete.update(buffer, 0, numRead);
}
} while (numRead != -1);
fis.close();
return complete.digest();
}
// see this How-to for a faster way to convert
// a byte array to a HEX string
public static String getMD5Checksum(String filename) throws Exception {
byte[] b = createChecksum(filename);
String result = "";
for (int i=0; i < b.length; i++) {
result += Integer.toString( ( b[i] & 0xff ) + 0x100, 16).substring( 1 );
}
return result;
}
public static void main(String args[]) {
try {
System.out.println(getMD5Checksum("apache-tomcat-5.5.17.exe"));
// output :
// 0bb2827c5eacf570b6064e24e0e6653b
// ref :
// http://www.apache.org/dist/
// tomcat/tomcat-5/v5.5.17/bin
// /apache-tomcat-5.5.17.exe.MD5
// 0bb2827c5eacf570b6064e24e0e6653b *apache-tomcat-5.5.17.exe
}
catch (Exception e) {
e.printStackTrace();
}
}
}
You cannot use hashes to determine any similarity of content.
For instance, generating the MD5 of hellostackoverflow1 and hellostackoverflow2 calculates two hashes where none of the characters of the string representation match (7c35[...]85fa vs b283[...]3d19). That's because a hash is calculated based on the binary data of the file, thus two different formats of the same thing - e.g. .txt and a .docx of the same text - have different hashes.
But as already noted, some speed might be achieved by using native code, thus the NDK. Additionally, if you still want to compare files for exact matches, first compare the size in bytes, after that use a hashing algorithm with enough speed and a low risk of collisions. As stated, CRC32 is fine.
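The "compare sizes first, then a cheap checksum" idea could look roughly like the sketch below. It uses java.util.zip.CRC32 from the standard library; the class name QuickCompare, the method names, and the 64 KB buffer size are assumptions made for this example. A matching CRC32 only means the files are probably identical, so treat it as a filter rather than a proof.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

class QuickCompare {
    // Streams the file through CRC32 without loading it into memory at once.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024]; // assumed buffer size
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // Cheap size check first; the files are only read if the sizes match.
    static boolean probablySame(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        return crc32(a) == crc32(b);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            System.out.println(probablySame(Paths.get(args[0]), Paths.get(args[1])));
        }
    }
}
```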
Hash/CRC calculation takes some time as the file has to be read completely.
The createChecksum code you presented is nearly optimal. The only part that can be tweaked is the read buffer size (I would use a buffer of 2048 bytes or larger). However, this may get you a speed improvement of 1-2% at most.
If this is still too slow the only option left is to implement the hashing in C/C++ and use it as native method. Besides that there is nothing you can do.
I am sending a hex string in a URL parameter and trying to convert it back into a string on the server side.
The user input is converted with the following JavaScript encoding function:
function encode(string) {
var number = "";
var length = string.trim().length;
string = string.trim();
for (var i = 0; i < length; i++) {
number += string.charCodeAt(i).toString(16);
}
return number;
}
Now I'm trying to parse the hex string 419, for the Russian character Й, in Java code as follows:
byte[] bytes = "".getBytes();
try {
bytes = Hex.decodeHex(hex.toCharArray());
sb.append(new String(bytes,"UTF-8"));
} catch (DecoderException e) {
e.printStackTrace(); // Here it gives error 'Odd number of characters'
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
but it gives the following error:
"org.apache.commons.codec.DecoderException: Odd number of characters."
How can this be resolved? Many Russian characters have a 3-digit hex code, so the hex string ends up with an odd number of characters and Hex.decodeHex() cannot decode it.
Use Base64 instead
val aes = KeyGenerator.getInstance("AES")
aes.init(128)
val secretKeySpec = aes.generateKey()
val base64 = Base64.encodeToString(secretKeySpec.encoded, 0)
val bytes = Base64.decode(base64, 0)
SecretKeySpec(bytes, 0, bytes.size, "AES") == secretKeySpec
In the case you mentioned, Й is U+0419, and most Cyrillic characters have a code point that starts with a leading 0. This suggests that prepending a 0 to odd-length hex strings before converting would help.
Testing the JavaScript shows that this is safe only for strings that are one letter long: Ѓ (U+0403) returned 403 and Ѕ (U+0405) returned 405, but ЃЅ returned 403405 instead of 04030405 or 4030405. That is even worse, because the length is even, so it would not trigger the exception and could decode to something completely different.
This question dealing with padding with leading zeros may help with the javascript part.
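One way around the ambiguity is a fixed width of four hex digits per UTF-16 code unit, so the decoder always knows where one character ends. The sketch below shows both directions in Java; the JavaScript side would have to pad each charCodeAt value to four digits in the same way. The class name FixedWidthHex and its methods are made up for this example, and it bypasses Hex.decodeHex and the UTF-8 step entirely.

```java
class FixedWidthHex {
    // Encodes each UTF-16 code unit as exactly four hex digits.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 4);
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Reverses encode(): every group of four hex digits becomes one char.
    static String decode(String hex) {
        if (hex.length() % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        StringBuilder sb = new StringBuilder(hex.length() / 4);
        for (int i = 0; i < hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String twoLetters = "\u0403\u0405";       // the two Cyrillic letters from the test above
        String hex = encode(twoLetters);
        System.out.println(hex);                  // 04030405
        System.out.println(decode(hex).equals(twoLetters));
    }
}
```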
Instead of
sb.append(new String(bytes,"UTF-8"));
Try this
sb.append(new String(bytes,"Windows-1251"));
I am trying to get encrypted text out as a string, but I am going wrong somewhere. I do a simple XOR on the plain text. I then put the resulting encrypted string into a C program, which does the same XOR again to recover the plain text.
In between, however, I cannot get a proper string of the encrypted text to pass to the C program.
String xorencrypt(byte[] passwd,int pass_len){
char[] st = new char[pass_len];
byte[] crypted = new byte[pass_len];
for(int i = 0; i<pass_len;i++){
crypted[i] = (byte) (passwd[i]^(i+1));
st[i] = (char)crypted[i];
System.out.println((char)passwd[i]+" "+passwd[i] +"= " + (char)crypted[i]+" "+crypted[i]);/* characters are printed fine but problem is when i am convering it in to string */
}
return st.toString();
}
I don't know whether some kind of encoding is also needed; if I used one, how would I decode and decrypt in the C program?
For example, suppose passwd = bond007;
then the Java program should return akkb78>
and the C program should then decrypt akkb78> back to bond007.
Use
return new String(crypted);
in that case you don't need st[] array at all.
By the way, the encoded value for bond007 is cmm`560 and not what you posted.
EDIT
While solution above would most likely work in most java environments, to be safe about encoding,
as suggested by Alex, provide encoding parameter to String constructor.
For example if you want your string to carry 8-bit bytes :
return new String(crypted, "ISO-8859-1");
You would need the same parameter when getting bytes from your string :
byte[] bytes = myString.getBytes("ISO-8859-1");
Alternatively, use solution provided by Alex :
return new String(st);
But, convert bytes to chars properly :
st[i] = (char) (crypted[i] & 0xff);
Otherwise, all negative bytes, crypted[i] < 0 will not be converted to char properly and you get surprising results.
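Putting the pieces together, a round trip through ISO-8859-1 might look like the sketch below. The class name XorRoundTrip and the helper xor are made up for this example; the XOR key (position + 1) is the one from the question.

```java
import java.nio.charset.StandardCharsets;

class XorRoundTrip {
    // XORs every byte with its 1-based position, as in the question.
    // Applying it twice gives back the original bytes.
    static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ (i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "bond007".getBytes(StandardCharsets.ISO_8859_1);
        // ISO-8859-1 maps each byte 0..255 to exactly one char, so nothing is lost.
        String encrypted = new String(xor(plain), StandardCharsets.ISO_8859_1);
        System.out.println(encrypted); // cmm`560
        byte[] back = xor(encrypted.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(new String(back, StandardCharsets.ISO_8859_1)); // bond007
    }
}
```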
Change this line:
return st.toString();
with this
return new String(st);
I have a byte[] array named byteval in java and if I do System.out.println(byteval), I can read: d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
Now I need what I read there as a String, but if I try to convert it with Byte.toString or a String constructor, the value is not the same; mostly I get some numbers instead.
So how can I get the byte[] array as a String called strval, also cutting off the [B#40d150e0?
Now: System.out.println(byteval)>> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
Goal: System.out.println(strval)>> d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Lot of thanks!
Danny
EDIT: Working solution for me:
byte[] byteval = getValue();
// Here System.out.println(byteval) is
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7[B#40d150e0
BigInteger bi = new BigInteger(1, byteval);
String strval = bi.toString(16);
if ((strval.length() % 2) != 0) {
strval = "0" + strval;
}
System.out.println(strval);
// Here the String output is
// d3e1547c254ff7cec8dbcef2262b5cf10ec079c7
Thanks to everyone who answered.
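An alternative that needs no length fix-up afterwards is to format each byte on its own as two hex digits. This is only a sketch; the class name BytesToHex and the method toHex are made up for this example.

```java
class BytesToHex {
    // Converts each byte to exactly two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Mask with 0xff so negative bytes map to 0x80..0xff.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] byteval = {(byte) 0xd3, (byte) 0xe1, 0x54, 0x7c, 0x05};
        System.out.println(toHex(byteval)); // d3e1547c05
    }
}
```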
just do
System.out.println(byteval.toString())
instead of
System.out.println(byteval)
this will remove the ending part(Actually that is just the address of the object referenced)
Try this:
String str= new String(byteval, "ISO-8859-1");
System.out.println(str);
according to the specification: http://wiki.theory.org/BitTorrentSpecification
info_hash: urlencoded 20-byte SHA1 hash of the value of the info key from the Metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key above.
torrentMap is my dictionary, I get the info key which is another dictionary, I calculate the hash and I URLencode it.
But I always get an invalid info_hash message when I try to send it to the tracker.
This is my code:
public String GetInfo_hash() {
String info_hash = "";
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutput out = null;
try {
out = new ObjectOutputStream(bos);
out.writeObject(torrentMap.get("info"));
byte[] bytes = bos.toByteArray(); //Map => byte[]
MessageDigest md = MessageDigest.getInstance("SHA1");
info_hash = urlencode(md.digest(bytes)); //Hashing and URLEncoding
out.close();
bos.close();
} catch (Exception ex) { }
return info_hash;
}
private String urlencode(byte[] bs) {
StringBuffer sb = new StringBuffer(bs.length * 3);
for (int i = 0; i < bs.length; i++) {
int c = bs[i] & 0xFF;
sb.append('%');
if (c < 16) {
sb.append('0');
}
sb.append(Integer.toHexString(c));
}
return sb.toString();
}
This is almost certainly the problem:
out = new ObjectOutputStream(bos);
out.writeObject(torrentMap.get("info"));
What you're going to be hashing is the Java binary serialization format of the value of torrentMap.get("info"). I find it very hard to believe that all BitTorrent programs are meant to know about that.
It's not immediately clear to me from the specification what the value of the "info" key is meant to be, but you need to work out some other way of turning it into a byte array. If it's a string, I'd expect some well-specified encoding (e.g. UTF-8). If it's already binary data, then use that byte array directly.
EDIT: Actually, it sounds like the value will be a "bencoded dictionary" as per your quote, which looks like it will be a string. Quite how you're meant to encode that string (which sounds like it may include values which aren't in ASCII, for example) before hashing it is up for grabs. If your sample strings are all ASCII, then using "ASCII" and "UTF-8" as the encoding names for String.getBytes(...) will give the same result anyway, of course...
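As a rough illustration of hashing the bencoded form instead of the Java serialization, here is a minimal sketch. The Bencode class is hypothetical, not a real library. It assumes the info dictionary has been parsed into Java Strings, byte[] values, Integers/Longs, Lists and Maps with ASCII String keys, and that Strings can be written as UTF-8. If the original bytes of the info dictionary from the .torrent file are still available, hashing those directly is the safer route, since re-encoding only works when it reproduces them exactly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Bencode {
    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] a = s.getBytes(StandardCharsets.US_ASCII);
        out.write(a, 0, a.length);
    }

    // Hypothetical minimal bencoder for the value types listed above.
    static void encode(Object value, ByteArrayOutputStream out) {
        if (value instanceof byte[]) {
            byte[] b = (byte[]) value;
            writeAscii(out, b.length + ":");          // byte string: <length>:<bytes>
            out.write(b, 0, b.length);
        } else if (value instanceof String) {
            // Assumption: text is stored as UTF-8.
            encode(((String) value).getBytes(StandardCharsets.UTF_8), out);
        } else if (value instanceof Integer || value instanceof Long) {
            writeAscii(out, "i" + value + "e");       // integer: i<number>e
        } else if (value instanceof List) {
            out.write('l');
            for (Object item : (List<?>) value) {
                encode(item, out);
            }
            out.write('e');
        } else if (value instanceof Map) {
            // Keys are written in sorted order; String order matches
            // byte order only for ASCII keys (assumed here).
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            out.write('d');
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                encode(e.getKey(), out);
                encode(e.getValue(), out);
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("unsupported type: " + value);
        }
    }

    // SHA-1 over the bencoded bytes; the 20-byte result still needs URL-encoding.
    static byte[] infoHash(Map<String, ?> info) throws NoSuchAlgorithmException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(info, out);
        return MessageDigest.getInstance("SHA-1").digest(out.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> info = new HashMap<>();
        info.put("name", "a");
        info.put("length", 12);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(info, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII));
        System.out.println(infoHash(info).length + " bytes");
    }
}
```

The 20 bytes returned by infoHash could then be passed to the urlencode method from the question.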