I am getting an error while trying to check the MD5 hash of a file.
The file, notice.txt, has the following contents:
My name is sanjay yadav . i am in btech computer science .>>
When I checked online with onlineMD5.com, it gave the MD5 as: 90F450C33FAC09630D344CBA9BF80471.
My program output is:
My name is sanjay yadav . i am in btech computer science .
Read 58 bytes
d41d8cd98f00b204e9800998ecf8427e
Here's my code:
import java.io.*;
import java.math.BigInteger;
import java.security.DigestException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class MsgDgt {
    public static void main(String[] args) throws IOException, DigestException, NoSuchAlgorithmException {
        FileInputStream inputstream = null;
        byte[] mybyte = new byte[1024];
        inputstream = new FileInputStream("e://notice.txt");
        int total = 0;
        int nRead = 0;
        MessageDigest md = MessageDigest.getInstance("MD5");
        while ((nRead = inputstream.read(mybyte)) != -1) {
            System.out.println(new String(mybyte));
            total += nRead;
            md.update(mybyte, 0, nRead);
        }
        System.out.println("Read " + total + " bytes");
        md.digest();
        System.out.println(new BigInteger(1, md.digest()).toString(16));
    }
}
There's a bug in your code, and I believe the online tool is giving the wrong answer. You're currently computing the digest twice:
md.digest();
System.out.println(new BigInteger(1, md.digest()).toString(16));
Each time you call digest(), it resets the internal state. You should remove the first call to digest(). That then leaves you with this as the digest:
2f4c6a40682161e5b01c24d5aa896da0
That's the same result I get from C#, and I believe it to be correct. I don't know why the online checker is giving an incorrect result. (If you put it into the text part of the same site, it gives the right result.)
A couple of other points on your code though:
You're currently using the platform default encoding when converting the bytes to a string. I would strongly discourage you from doing that.
You're currently converting the whole buffer to a string, instead of only the bit you've read.
I don't like using BigInteger as a way of converting binary data to hex. You potentially need to pad it with zeros, and it's basically not what the class was designed for. Use a dedicated hex conversion class, e.g. from Apache Commons Codec (or one of the standalone helpers posted in various Stack Overflow answers; a minimal sketch follows this list).
You're not closing your input stream. You should do so in a finally block, or using a try-with-resources statement in Java 7.
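For example, a minimal standalone hex helper in the spirit of those answers (the name toHex is mine, not from any library):

// Convert a digest to lowercase hex, padding every byte to two digits.
public static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
        sb.append(String.format("%02x", b));
    }
    return sb.toString();
}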
I use this function:
public static String md5Hash(File file) {
    try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream is = new FileInputStream(file);
        byte[] buffer = new byte[1024];
        try {
            is = new DigestInputStream(is, md);
            while (is.read(buffer) != -1) { }
        } finally {
            is.close();
        }
        byte[] digest = md.digest();
        BigInteger bigInt = new BigInteger(1, digest);
        String output = bigInt.toString(16);
        while (output.length() < 32) {
            output = "0" + output;
        }
        return output;
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
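A hypothetical call site (the path is illustrative):

String hash = md5Hash(new File("e:/notice.txt"));
System.out.println(hash); // 32-character lowercase hex digest, or null if hashing failed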
Related
I am experimenting with Java and created a small program that copies a file and generates an MD5 checksum. The program works and generates a checksum, but the resulting copied file does not match the original checksum.
I am new to Java and do not understand what the problem is here. Am I writing the wrong buffer to the output file?
package com.application;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
public class Main {
    static int secure_copy(String src, String dest) throws Exception {
        InputStream inFile = new FileInputStream(src);
        OutputStream outFile = new FileOutputStream(dest);
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[1024];
        int numRead;
        do {
            numRead = inFile.read(buf);
            if (numRead > 0) {
                md.update(buf, 0, numRead);
                outFile.write(buf);
                outFile.flush();
            }
        } while (numRead != -1);
        inFile.close();
        outFile.close();
        BigInteger no = new BigInteger(1, md.digest());
        String result = no.toString(16);
        while (result.length() < 32) {
            result = "0" + result;
        }
        System.out.println("MD5: " + result);
        return 0;
    }

    public static void main(String[] args) {
        try {
            secure_copy(args[0], args[1]);
        } catch (Exception e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
Output from source file: (Correct)
MD5: 503ea121d2bc6f1a2ede8eb47f0d13ef
The file from the copy function, checked via md5sum:
md5sum file.mov
56883109c28590c33fb31cc862619977 file.mov
You are writing the entire buffer to the output file, not just the portion that has data from the latest read. The fix is simple:
if (numRead > 0) {
    md.update(buf, 0, numRead);
    outFile.write(buf, 0, numRead);
}
On every read from the InputStream, the code keeps changing the data whose hash is being calculated. Instead of calling md.update(buf, 0, numRead); within the loop, it should read the entire file into a byte[] and then call md.update(entireFileByteArray) once. (See this answer for a way to find the appropriate array size ahead of opening the file.)
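For reference, a minimal sketch of that single-update approach (java.nio.file.Files.readAllBytes is my choice of reading mechanism, and it only works for files that fit in memory):

// Read the whole file into memory, then hash it with one update() call.
byte[] entireFileByteArray = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(src));
MessageDigest md = MessageDigest.getInstance("MD5");
md.update(entireFileByteArray);
byte[] digest = md.digest();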
import java.io.*;
import java.nio.*;
import java.util.Base64;
import java.util.UUID;
import java.io.UnsupportedEncodingException;
public class Abc {
    public static String readFileAsString(String filePath) throws IOException {
        DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
        try {
            long len = new java.io.File(filePath).length();
            if (len > Integer.MAX_VALUE) throw new IOException("File " + filePath + " too large");
            byte[] bytes = new byte[(int) len];
            dis.readFully(bytes);
            String ans = new String(bytes, "UTF-8");
            return ans;
        } finally {
            dis.close();
        }
    }

    public static void main(String args[]) throws IOException {
        String base64encodedString = null;
        FileOutputStream stream = new FileOutputStream("C:\\Users\\EMP142738\\Desktop\\New folder\\Readhjbdsdsefd.pdf");
        String filePath = new String("C:\\Users\\EMP142738\\Desktop\\New folder\\Readers Quick Ref Card.pdf");
        try {
            base64encodedString = java.util.Base64.getUrlEncoder().encodeToString(new Abc().readFileAsString(filePath).getBytes("utf-8"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        try {
            byte[] base64decodedBytes = java.util.Base64.getUrlDecoder().decode(base64encodedString);
            stream.write(base64decodedBytes);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            stream.close();
        }
    }
}
I'm trying to encode and decode a PDF file using Base64. What I'm doing is converting the PDF (a binary file) to a byte array, then returning the byte array as a String. I then encode this String in Base64 using java.util.Base64. When I try to backtrack through the process, I can convert back to a PDF, but the file is corrupted/damaged. Also, the output file after the entire encode-decode process is significantly larger than the input file; I expected both to be the same size. What am I doing wrong here?
Edit 1 (7/13/16):
In the main method, I modified the code as per Jim's suggestion.
I tried using Base64.encode(byte[] src) after reading the documentation. However, it keeps giving the error "cannot find symbol Base64.encode(byte[])", even though I've used the encodeToString method from the same class (java.util.Base64.Encoder). I'm unable to understand the issue here.
Here's the modified main method used after returning a byte[] from the readFileAsString method.
public void main(String args[]) throws IOException {
    String filePath = new String("C:\\Users\\EMP142738\\Desktop\\New folder\\Readers Quick Ref Card.pdf");
    byte[] src = new Abc().readFileAsString(filePath);
    byte[] destination = Base64.encode(src);
}
The problem is in your flow:
byte[] -> String -> base64 string
You need to omit the conversion to String and go directly:
byte[] -> base64 string
Converting to String will corrupt a binary stream as it involves a decode operation from the input character set to 16-bit Unicode characters.
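For example, a minimal sketch going directly from bytes to Base64 and back (inputPdfPath and outputPdfPath are placeholder names; Base64 is the same java.util.Base64 the question already uses):

byte[] src = Files.readAllBytes(Paths.get(inputPdfPath));    // raw PDF bytes, no String step
String base64 = Base64.getUrlEncoder().encodeToString(src);  // byte[] -> base64 string
byte[] decoded = Base64.getUrlDecoder().decode(base64);      // base64 string -> byte[]
Files.write(Paths.get(outputPdfPath), decoded);              // byte-for-byte copy of the input

Note also that with java.util.Base64 the encode call lives on an encoder instance: Base64.getEncoder().encode(src), not Base64.encode(src), which is why the compiler reports "cannot find symbol".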
I wrote the following program to calculate SHA-256 hash value of a string in Java:
import java.security.MessageDigest;
import sun.misc.BASE64Encoder;

public class ToHash {
    public static void main(String[] args) throws Exception {
        byte[] data = "test".getBytes("UTF8");
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest(data);
        System.out.println(new BASE64Encoder().encode(hash));
    }
}
Well, that works fine. As a next step, I want to extend it to accept a file and calculate its hash value. My plan was to read the whole file into a string array and then call the digest() method on it, but there are two problems:
I have no idea how to read the whole file into an array. Currently I think I must read it line by line and append each new line to the array!
That methodology needs a lot of memory for big files!
This is my current program to read a file:
public class ToHash {
    public static void main(String[] args) throws NoSuchAlgorithmException, UnsupportedEncodingException, FileNotFoundException, IOException {
        // The name of the file to open.
        String fileName = "C:\\Users\\ghasemi\\Desktop\\1.png";
        BufferedReader br = null;
        try {
            String sCurrentLine;
            br = new BufferedReader(new FileReader(fileName));
            while ((sCurrentLine = br.readLine()) != null) {
                byte[] data = sCurrentLine.getBytes("UTF8");
                System.out.println(new BASE64Encoder().encode(data));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null) {
                    br.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
It seems that BufferedReader has no method to read the whole file in one call.
You can read the file and calculate the value of the hash as you go.
byte[] buffer = new byte[8192];
int count;
MessageDigest digest = MessageDigest.getInstance("SHA-256");
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(fileName));
while ((count = bis.read(buffer)) > 0) {
    digest.update(buffer, 0, count);
}
bis.close();
byte[] hash = digest.digest();
System.out.println(new BASE64Encoder().encode(hash));
This doesn't assume anything about character sets or about the file fitting into memory, and it doesn't ignore line terminators either.
Or you can use a DigestInputStream.
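A minimal sketch of that variant (same fileName as above; bytes are fed to the digest as a side effect of reading):

MessageDigest digest = MessageDigest.getInstance("SHA-256");
DigestInputStream dis = new DigestInputStream(
        new BufferedInputStream(new FileInputStream(fileName)), digest);
byte[] buffer = new byte[8192];
// Reading drives the digest; the loop body stays empty.
while (dis.read(buffer) != -1) { }
dis.close();
byte[] hash = digest.digest();
System.out.println(new BASE64Encoder().encode(hash));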
I would like to hash (MD5) all the files of a given directory, which holds 1000 2MB photos.
I tried just running a for loop and hashing a file at a time, but that caused memory issues.
I need a method to hash each file in an efficient manner (memory wise).
I have posted 3 questions with my problem, but now instead of fixing my code, I want to see what would be the best general approach to my requirement.
Thank you very much for the help.
public class MD5 {
    public static void main(String[] args) throws IOException {
        File file = new File("/Users/itaihay/Desktop/test");
        for (File f : file.listFiles()) {
            try {
                model.MD5.hash(f);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    private static MessageDigest md;
    private static BufferedInputStream fis;
    private static byte[] dataBytes;
    private static byte[] mdbytes;

    private static void clean() throws NoSuchAlgorithmException {
        md = MessageDigest.getInstance("MD5");
        dataBytes = new byte[8192];
    }

    public static void hash(File file) {
        try {
            clean();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
        try {
            fis = new BufferedInputStream(new FileInputStream(file));
            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
            nread = 0;
            mdbytes = md.digest();
            System.out.println(javax.xml.bind.DatatypeConverter.printHexBinary(mdbytes).toLowerCase());
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                fis.close();
                dataBytes = null;
                md = null;
                mdbytes = null;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
As others have said, using built-in Java MD5 code, you should be able to keep your memory footprint very small. I do something similar when hashing a large number of Jar files (up to a few MB apiece, usually 500MB-worth at a time) and get decent performance. You'll definitely want to play around with different buffer sizes until you find the optimal size for your system configuration. The following code-snippet uses no more than bufSize+128 bytes at a time, plus a negligible amount of overhead for the File, MessageDigest, and InputStream objects used to compute the md5 hash:
InputStream is = null;
File f = ...
int bufSize = ...
byte[] md5sum = null;
try {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    is = new FileInputStream(f);
    byte[] buffer = new byte[bufSize];
    int read = 0;
    while ((read = is.read(buffer)) > 0) digest.update(buffer, 0, read);
    md5sum = digest.digest();
} catch (Exception e) {
} finally {
    try {
        if (is != null) is.close();
    } catch (IOException e) {}
}
Increasing your Java heap space could solve it short term.
Long term, you want to look into reading images into a fixed-size queue that can fit in the memory. Don't read them all in at once. Enqueue the most recent image and dequeue the earliest image.
MD5 updates its state in 64 byte chunks, so you only need 64 bytes of a file in memory at a time. The MD5 state itself is 128 bits (16 bytes), as is the output size.
The most memory conservative approach would be to read 64 bytes at a time from each file, file-by-file, and use it to update that file's MD5 state. You would need at most 999 * 16 + 64 = 16048 ~= 16k of memory.
But such small reads would be very inefficient, so from there you can increase the read size from a file to fit within your memory constraints.
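A minimal sketch of that file-by-file approach with a larger reusable buffer (the 8 KB size is my choice, not from the answer):

// Hash each file in turn, reusing one 8 KB buffer for the whole directory.
byte[] buffer = new byte[8192];
for (File f : new File("/Users/itaihay/Desktop/test").listFiles()) {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream in = new FileInputStream(f);
    try {
        int n;
        while ((n = in.read(buffer)) != -1) {
            md.update(buffer, 0, n);
        }
    } finally {
        in.close();
    }
    System.out.println(f.getName() + ": "
            + javax.xml.bind.DatatypeConverter.printHexBinary(md.digest()).toLowerCase());
}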
How to write a byte array to a file in Java?
As Sebastian Redl points out, the most straightforward way is now java.nio.file.Files.write. Details for this can be found in the Reading, Writing, and Creating Files tutorial.
Old answer:
FileOutputStream.write(byte[]) would be the most straightforward. What is the data you want to write?
The tutorials for the Java I/O system may be of some use to you.
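A minimal sketch (the file name and data are placeholders):

byte[] data = ...;  // whatever bytes you want to persist
FileOutputStream out = new FileOutputStream("output.bin");
try {
    out.write(data);  // writes the whole array in one call
} finally {
    out.close();
}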
You can use IOUtils.write(byte[] data, OutputStream output) from Apache Commons IO.
KeyGenerator kgen = KeyGenerator.getInstance("AES");
kgen.init(128);
SecretKey key = kgen.generateKey();
byte[] encoded = key.getEncoded();
FileOutputStream output = new FileOutputStream(new File("target-file"));
IOUtils.write(encoded, output);
As of Java 1.7, there's a new way: java.nio.file.Files.write
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

KeyGenerator kgen = KeyGenerator.getInstance("AES");
kgen.init(128);
SecretKey key = kgen.generateKey();
byte[] encoded = key.getEncoded();
Files.write(Paths.get("target-file"), encoded);
Java 1.7 also resolves the embarrassment that Kevin describes: reading a file is now:
byte[] data = Files.readAllBytes(Paths.get("source-file"));
A commenter asked "why use a third-party library for this?" The answer is that it's way too much of a pain to do it yourself. Here's an example of how to properly do the inverse operation of reading a byte array from a file (sorry, this is just the code I had readily available, and it's not like I want the asker to actually paste and use this code anyway):
public static byte[] toByteArray(File file) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    boolean threw = true;
    InputStream in = new FileInputStream(file);
    try {
        byte[] buf = new byte[BUF_SIZE];
        while (true) {
            int r = in.read(buf);
            if (r == -1) {
                break;
            }
            out.write(buf, 0, r);
        }
        threw = false;
    } finally {
        try {
            in.close();
        } catch (IOException e) {
            if (threw) {
                log.warn("IOException thrown while closing", e);
            } else {
                throw e;
            }
        }
    }
    return out.toByteArray();
}
Everyone ought to be thoroughly appalled by what a pain that is.
Use Good Libraries. I, unsurprisingly, recommend Guava's Files.write(byte[], File).
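Usage is then a one-liner; a sketch (file name and data are placeholders):

// Guava: com.google.common.io.Files.write(byte[] from, File to)
com.google.common.io.Files.write(data, new File("target-file"));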
To write a byte array to a file, use the method
public void write(byte[] b) throws IOException
from the BufferedOutputStream class.
java.io.BufferedOutputStream implements a buffered output stream. By setting up such an output stream, an application can write bytes to the underlying output stream without necessarily causing a call to the underlying system for each byte written.
For your example you need something like:
String filename = "C:/SO/SOBufferedOutputStreamAnswer";
BufferedOutputStream bos = null;
try {
    // create an object of FileOutputStream
    FileOutputStream fos = new FileOutputStream(new File(filename));
    // create an object of BufferedOutputStream
    bos = new BufferedOutputStream(fos);
    KeyGenerator kgen = KeyGenerator.getInstance("AES");
    kgen.init(128);
    SecretKey key = kgen.generateKey();
    byte[] encoded = key.getEncoded();
    bos.write(encoded);
}
// catch and handle exceptions...
Apache Commons IO Utils has a FileUtils.writeByteArrayToFile() method. Note that if you're doing any file/IO work then the Apache Commons IO library will do a lot of work for you.
No need for external libs to bloat things, especially when working with Android. Here is a native solution that does the trick. This is a piece of code from an app that stores a byte array as an image file.
// Byte array with image data.
final byte[] imageData = params[0];

// Write bytes to tmp file.
final File tmpImageFile = new File(ApplicationContext.getInstance().getCacheDir(), "scan.jpg");
FileOutputStream tmpOutputStream = null;
try {
    tmpOutputStream = new FileOutputStream(tmpImageFile);
    tmpOutputStream.write(imageData);
    Log.d(TAG, "File successfully written to tmp file");
} catch (FileNotFoundException e) {
    Log.e(TAG, "FileNotFoundException: " + e);
    return null;
} catch (IOException e) {
    Log.e(TAG, "IOException: " + e);
    return null;
} finally {
    if (tmpOutputStream != null) {
        try {
            tmpOutputStream.close();
        } catch (IOException e) {
            Log.e(TAG, "IOException: " + e);
        }
    }
}
File file = ...
byte[] data = ...
try {
    FileOutputStream fos = new FileOutputStream(file);
    fos.write(data);
    fos.flush();
    fos.close();
} catch (Exception e) {
    e.printStackTrace();
}
Note that write(byte[]) writes the entire array in a single call, so no manual loop is needed even for arrays longer than 1024 bytes.
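For completeness, a sketch of the same write with try-with-resources (Java 7+), which closes the stream even when write fails:

try (FileOutputStream fos = new FileOutputStream(file)) {
    fos.write(data);  // the whole array is written; the stream closes automatically
} catch (IOException e) {
    e.printStackTrace();
}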