When I use Apache Commons Codec's md5Hex to get an InputStream's MD5, I get a different result the second time. The example code is below:
public static void main(String[] args) {
    String data = "D:\\test.jpg";
    File file = new File(data);
    InputStream is = null;
    try {
        is = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    String digest = null, digest2 = null;
    try {
        System.out.println(is.hashCode());
        digest = DigestUtils.md5Hex(is);
        System.out.println(is.hashCode());
        digest2 = DigestUtils.md5Hex(is);
        System.out.println(is.hashCode());
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("Digest = " + digest);
    System.out.println("Digest2 = " + digest2);
}
and the result is:
1888654590
1888654590
1888654590
Digest = 5cc6c20f0b3aa9b44fe952da20cc928e
Digest2 = d41d8cd98f00b204e9800998ecf8427e
Thank you for your answers!
d41d8cd98f00b204e9800998ecf8427e is the MD5 hash of the empty string ("").
That is because is is a stream, meaning that once you've read it (in the first DigestUtils.md5Hex(is) call), the "cursor" is at the end of the stream, where there is no more data to read, so any further read returns 0 bytes.
I suggest reading the contents of the stream to a byte[] instead, and hashing that.
For how to get a byte[] from an InputStream, see this question.
The InputStream can be traversed only once. The first call traverses it and returns the MD5 of your input file. When you call md5Hex the second time, the InputStream is positioned at end-of-file, so digest2 is the MD5 of empty input.
You cannot move backwards within an InputStream, so invoking this twice:
DigestUtils.md5Hex(is);
does not hash the same data. Better to read the stream into a byte array once and use:
public static String md5Hex(byte[] data)
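A minimal sketch of that approach (assuming Java 9+ for InputStream.readAllBytes(); on older versions, Commons IO's IOUtils.toByteArray(is) does the same job):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.codec.digest.DigestUtils;

public class Md5Twice {
    public static void main(String[] args) throws Exception {
        byte[] data;
        try (InputStream is = Files.newInputStream(Paths.get("D:\\test.jpg"))) {
            data = is.readAllBytes(); // read the file once into memory
        }
        // A byte[] has no read position, so it can be hashed any number of times.
        System.out.println(DigestUtils.md5Hex(data));
        System.out.println(DigestUtils.md5Hex(data)); // same value both times
    }
}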
I have the following task: obtain a PDF from a URL and return it as a Base64 string.
What I have currently (sorry, I am not a Java expert):
public String readPDFSOAP(String var, Container container) throws StreamTransformationException {
    try {
        //get the url page from the arguments array
        URL url = new URL("URLPDF");
        try {
            //get input Stream from URL
            InputStream in = new BufferedInputStream(url.openStream());
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[131072];
            int n = 0;
            while (-1 != (n = in.read(buf))) {
                out.write(buf, 0, n);
            }
            out.close();
            in.close();
            byte[] response = out.toByteArray();
            String string = new String(response);
        } catch (Exception e) {
            e.printStackTrace();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return String;
}
But the string can't be returned.
Any help is appreciated.
Thanks,
Julian
Your code is all kinds of wrong. For starters, use the Base64 class to handle encoding your byte array. And no need to assign it to a variable, just return it.
return Base64.getEncoder().encodeToString(response);
and on your last line, outside of your try/catch block, just throw an exception. If you get there, you weren't able to properly retrieve and encode the response, so there is no need to return a value; you're in an error condition.
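Put together, a sketch of the method with those fixes applied (URLPDF is the placeholder URL from the question; imports of java.util.Base64, java.io.*, and java.net.URL are assumed; a RuntimeException is thrown at the end since, per the above, reaching it means you're in an error condition):

public String readPDFSOAP(String var, Container container) throws StreamTransformationException {
    try (InputStream in = new BufferedInputStream(new URL("URLPDF").openStream());
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        byte[] buf = new byte[131072];
        int n;
        while (-1 != (n = in.read(buf))) {
            out.write(buf, 0, n);
        }
        // Encode the raw bytes; do not round-trip them through new String(...)
        return Base64.getEncoder().encodeToString(out.toByteArray());
    } catch (Exception e) {
        // Error condition: throw rather than return a value
        throw new RuntimeException("Could not retrieve and encode the PDF", e);
    }
}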
Use java.util.Base64.
PDFs can be pretty large. Instead of reading it into memory, encode the InputStream directly:
ByteArrayOutputStream out = new ByteArrayOutputStream();
// The wrapping encoder stream must be closed to flush the final Base64
// padding before the buffer's content is read.
try (InputStream in = new BufferedInputStream(url.openStream());
     OutputStream b64 = Base64.getEncoder().wrap(out)) {
    in.transferTo(b64);
}
String base64 = out.toString(StandardCharsets.US_ASCII);
The Base64 encoded version is even larger than the original file. I don’t know what you plan to do with the encoded version, but if you’re planning to write it somewhere, you want to avoid keeping any version of the file—original or encoded—in memory. You can do that by having your method accept an OutputStream as an argument:
public void readPDFSOAP(OutputStream destination,
                        String var,
                        Container container)
        throws StreamTransformationException, IOException {
    URL url = new URL("https://example.com/doc.pdf");
    // Closing the wrapping encoder stream flushes the final Base64 padding;
    // note that this also closes 'destination'.
    try (InputStream in = new BufferedInputStream(url.openStream());
         OutputStream b64 = Base64.getEncoder().wrap(destination)) {
        in.transferTo(b64);
    }
}
Update:
Since you have said you cannot use a try-with-resources statement:
A try-with-resources statement is just a convenient way to guarantee an InputStream (or other closeable resource) is closed. This:
try (InputStream in = new BufferedInputStream(url.openStream())) {
// code that uses 'in'
}
is (nearly) equivalent to this:
InputStream in = null;
try {
    in = new BufferedInputStream(url.openStream());
    // code that uses 'in'
} finally {
    if (in != null) {
        try {
            in.close();
        } catch (IOException e) {
            // Suppress
        }
    }
}
I get the data as 32 KB byte buffers and want to calculate the checksum of the whole data. Using MessageDigest, I keep updating it with the bytes as they arrive, and at the end I call the digest method to compute the checksum over everything read. The checksum calculated by this method is wrong. Below is the code. Any idea how to get it right?
private MessageDigest messageDigest;

// Keep getting a 32 KB ByteBuffer until EOF is read
public int write(ByteBuffer src) throws IOException {
    try {
        ByteBuffer copiedByteBuffer = src.duplicate();
        try {
            messageDigest = MessageDigest.getInstance(MD5_CHECKSUM);
            while (copiedByteBuffer.hasRemaining()) {
                messageDigest.update(copiedByteBuffer.get());
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
        copiedByteBuffer = null;
    } catch (Exception e) {
    }
    // ... write 'src' to the destination and return the byte count (rest omitted in the question)
}
// Called after the whole file has been read in the write function
public void calculateDigest() {
    if (messageDigest != null) {
        byte[] digest = messageDigest.digest();
        checkSumMultiPartFile = toHex(digest); // converting bytes into hexadecimal
    }
}
Updated try #2
// Will keep getting a 32 KB ByteBuffer until EOF is read
public int write(ByteBuffer original) throws IOException {
    try {
        ByteBuffer copiedByteBuffer = cloneByteBuffer(original);
        messageDigest = MessageDigest.getInstance(MD5_CHECKSUM);
        messageDigest.update(copiedByteBuffer);
        copiedByteBuffer = null;
    } catch (Exception e) {
    }
    // ... rest of method omitted in the question
}
public static ByteBuffer cloneByteBuffer(ByteBuffer original) {
    final ByteBuffer clone = original.isDirect()
            ? ByteBuffer.allocateDirect(original.capacity())
            : ByteBuffer.allocate(original.capacity());
    final ByteBuffer readOnlyCopy = original.asReadOnlyBuffer();
    readOnlyCopy.flip();
    clone.put(readOnlyCopy);
    clone.position(original.position());
    clone.limit(original.limit());
    clone.order(original.order());
    return clone;
}
After trying the above code, I was able to see that the message digest was being updated with all the bytes read: for example, if the file size is 5,242,892 bytes, it was updated with 5,242,892 bytes. But the checksum calculated with certutil -hashfile <file> MD5 on the command line and the one calculated by the above method do not match.
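One thing that stands out in both attempts (an observation, not a verified fix): write() calls MessageDigest.getInstance(...) on every 32 KB chunk, which replaces the digest object and discards everything accumulated so far, so the final digest() covers only the last chunk. A minimal sketch that creates the digest once (the class name ChecksumChannel is illustrative):

private final MessageDigest messageDigest;

public ChecksumChannel() throws NoSuchAlgorithmException {
    // Create the digest ONCE; re-creating it per chunk resets its state.
    messageDigest = MessageDigest.getInstance("MD5");
}

public int write(ByteBuffer src) throws IOException {
    int n = src.remaining();
    // duplicate() shares the content but has its own position/limit,
    // so reading it does not consume 'src'.
    messageDigest.update(src.duplicate());
    return n;
}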
I have a little problem: I decompress a byte array, and everything is OK with the following code, but sometimes with some data it throws DataFormatException with "incorrect data check". Any ideas?
private byte[] decompress(byte[] compressed) throws DecoderException {
    Inflater decompressor = new Inflater();
    decompressor.setInput(compressed);
    ByteArrayOutputStream outPutStream = new ByteArrayOutputStream(compressed.length);
    byte[] temp = new byte[8196];
    while (!decompressor.finished()) {
        try {
            int count = decompressor.inflate(temp);
            logger.info("count = " + count);
            outPutStream.write(temp, 0, count);
        } catch (DataFormatException e) {
            logger.info(e.getMessage());
            throw new DecoderException("Wrong format", e);
        }
    }
    try {
        outPutStream.close();
    } catch (IOException e) {
        throw new DecoderException("Cant close outPutStream ", e);
    }
    return outPutStream.toByteArray();
}
Try a different compression level, or use the nowrap option.
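For reference, the nowrap flag is the boolean constructor argument (only appropriate if the sender produced raw deflate data without the zlib header and Adler-32 checksum):

// true = nowrap: expect raw deflate data, no zlib header/checksum
Inflater decompressor = new Inflater(true);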
1. Some sanity checks first: do you use the same algorithm on both sides? Do you work with bytes (not String)? Do your arrays have the right sizes?
2. I suggest you check step by step, catching exceptions, checking sizes and nulls, and comparing bytes, like this: Using Java Deflater/Inflater with custom dictionary causes IllegalArgumentException
Take your input, compress it, copy your bytes, decompress them, and compare the output with the input (a round-trip sketch follows below).
3. If you can't find it, take another example which works, and modify it step by step.
Hope it helps.
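A minimal round-trip sketch of step 2's checklist:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] input = "some test data".getBytes(StandardCharsets.UTF_8);

        // compress it
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[1024];
        int compressedLength = deflater.deflate(buf);
        deflater.end();

        // copy your bytes (only the ones actually produced)
        byte[] compressed = Arrays.copyOf(buf, compressedLength);

        // decompress them
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] output = new byte[input.length];
        int resultLength = inflater.inflate(output);
        inflater.end();

        // compare output with input
        System.out.println(Arrays.equals(input, Arrays.copyOf(output, resultLength)));
    }
}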
I found out why it's happening:
byte[] temp = new byte[8196];
It's too big; it must be exactly the size of the decompressed array, because the data was Base64 encoded earlier. How can I get this size before decompressing it?
I am getting an error while trying to check the MD5 hash of a file.
The file, notice.txt has the following contents:
My name is sanjay yadav . i am in btech computer science .>>
When I checked online with onlineMD5.com it gave the MD5 as: 90F450C33FAC09630D344CBA9BF80471.
My program output is:
My name is sanjay yadav . i am in btech computer science .
Read 58 bytes
d41d8cd98f00b204e9800998ecf8427e
Here's my code:
import java.io.*;
import java.math.BigInteger;
import java.security.DigestException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MsgDgt {
    public static void main(String[] args) throws IOException, DigestException, NoSuchAlgorithmException {
        FileInputStream inputstream = null;
        byte[] mybyte = new byte[1024];
        inputstream = new FileInputStream("e://notice.txt");
        int total = 0;
        int nRead = 0;
        MessageDigest md = MessageDigest.getInstance("MD5");
        while ((nRead = inputstream.read(mybyte)) != -1) {
            System.out.println(new String(mybyte));
            total += nRead;
            md.update(mybyte, 0, nRead);
        }
        System.out.println("Read " + total + " bytes");
        md.digest();
        System.out.println(new BigInteger(1, md.digest()).toString(16));
    }
}
There's a bug in your code and I believe the online tool is giving the wrong answer. Here, you're currently computing the digest twice:
md.digest();
System.out.println(new BigInteger(1, md.digest()).toString(16));
Each time you call digest(), it resets the internal state. You should remove the first call to digest(). That then leaves you with this as the digest:
2f4c6a40682161e5b01c24d5aa896da0
That's the same result I get from C#, and I believe it to be correct. I don't know why the online checker is giving an incorrect result. (If you put it into the text part of the same site, it gives the right result.)
A couple of other points on your code, though (a cleaned-up sketch applying them follows after this list):
You're currently using the platform default encoding when converting the bytes to a string. I would strongly discourage you from doing that.
You're currently converting the whole buffer to a string, instead of only the bit you've read.
I don't like using BigInteger as a way of converting binary data to hex. You potentially need to pad it with 0s, and it's basically not what the class was designed for. Use a dedicated hex conversion class, e.g. from Apache Commons Codec (or various Stack Overflow answers which provide standalone classes for the purpose).
You're not closing your input stream. You should do so in a finally block, or by using a try-with-resources statement in Java 7.
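Putting those points together, one possible cleaned-up version (a sketch: Hex is org.apache.commons.codec.binary.Hex from Commons Codec, as suggested above, and UTF-8 is assumed as the file's encoding; the path is the one from the question):

import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import org.apache.commons.codec.binary.Hex;

public class MsgDgt {
    public static void main(String[] args) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[1024];
        int nRead;
        // try-with-resources closes the stream even if an exception is thrown
        try (InputStream in = Files.newInputStream(Paths.get("e://notice.txt"))) {
            while ((nRead = in.read(buffer)) != -1) {
                md.update(buffer, 0, nRead);
                // convert only the bytes actually read, with an explicit charset
                System.out.print(new String(buffer, 0, nRead, StandardCharsets.UTF_8));
            }
        }
        // digest() exactly once; a second call would hash the freshly reset state
        System.out.println(Hex.encodeHexString(md.digest()));
    }
}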
I use this function:
public static String md5Hash(File file) {
    try {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream is = new FileInputStream(file);
        byte[] buffer = new byte[1024];
        try {
            is = new DigestInputStream(is, md);
            while (is.read(buffer) != -1) { }
        } finally {
            is.close();
        }
        byte[] digest = md.digest();
        BigInteger bigInt = new BigInteger(1, digest);
        String output = bigInt.toString(16);
        while (output.length() < 32) {
            output = "0" + output;
        }
        return output;
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
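Usage is then straightforward (the path is just an example); the zero-padding loop guarantees the usual 32-character lowercase hex form:

String checksum = md5Hash(new File("D:\\test.jpg"));
System.out.println(checksum);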
On server (C++), binary data is compressed using ZLib function:
compress2()
and it's sent over to client (Java).
On client side (Java), data should be decompressed using the following code snippet:
public static String unpack(byte[] packedBuffer) {
    InflaterInputStream inStream = new InflaterInputStream(new ByteArrayInputStream(packedBuffer));
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    int readByte;
    try {
        while ((readByte = inStream.read()) != -1) {
            outStream.write(readByte);
        }
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
    }
    // ... the rest of the code follows
}
The problem is that when it tries to read in the while loop, it always throws:
java.util.zip.ZipException: invalid stored block lengths
Before I check for other possible causes, can someone please tell me whether I can compress on one side with compress2 and decompress on the other side using the above code, so I can eliminate this as a problem? Also, does anyone have a clue about what might be wrong here? (I know I didn't provide much of the code here, but the projects are rather big.)
Thanks.
I think the problem is not with the unpack method but with the packedBuffer content. Unpack works fine:
public static byte[] pack(String s) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DeflaterOutputStream dout = new DeflaterOutputStream(out);
    dout.write(s.getBytes());
    dout.close();
    return out.toByteArray();
}

public static void main(String[] args) throws Exception {
    byte[] a = pack("123");
    String s = unpack(a); // calls your unpack
    System.out.println(s);
}
output
123
public static String unpack(byte[] packedBuffer) {
    try (InflaterInputStream inStream = new InflaterInputStream(
            new ByteArrayInputStream(packedBuffer));
         ByteArrayOutputStream outStream = new ByteArrayOutputStream()) {
        inStream.transferTo(outStream);
        //...
        return outStream.toString(StandardCharsets.UTF_8);
    } catch (Exception e) {
        JMDCLog.logError(" unpacking buffer of size: " + packedBuffer.length);
        e.printStackTrace();
        throw new IllegalArgumentException(e);
    }
}
compress2() produces zlib-format data (RFC 1950), which is exactly what InflaterInputStream expects. (GZIPInputStream, by contrast, reads the gzip format, RFC 1952, and would reject a zlib stream.)
As you seem to expect the bytes to represent text, hence to be in some encoding, pass that encoding, as a Charset, to the conversion to String (which always holds Unicode).
Note: UTF-8 is assumed as the encoding of the bytes here. In your case it might be another encoding.
The try-with-resources syntax closes the streams even on an exception or, as here, on the return.
I rethrew the exception as a runtime exception (IllegalArgumentException), as it seems dangerous to continue with no result.