I'm working on moving some files to a different directory in my project, and it's working great except that I can't verify the files moved properly.
I want to verify that the length of the copy is the same as the original, and then delete the original. I'm closing both FileStreams before I do my verification, but it still fails because the sizes are different. Below is my code for closing the streams, verification, and deletion.
in.close();
out.close();
if (encCopyFile.exists() && encCopyFile.length() == encryptedFile.length())
encryptedFile.delete();
The rest of the code before this uses a Util class to copy the streams, and it's all working fine, so really I just need a better verification method.
One wonderful way to check is to compare MD5 hashes. A matching file length doesn't mean the files are the same. A matching MD5 hash doesn't strictly prove it either, but it is much better than checking the length, albeit a longer process.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Main {

    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        System.out.println("Are identical: " + isIdentical("c:\\myfile.txt", "c:\\myfile2.txt"));
    }

    public static boolean isIdentical(String leftFile, String rightFile) throws IOException, NoSuchAlgorithmException {
        return md5(leftFile).equals(md5(rightFile));
    }

    private static String md5(String file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        InputStream is = new FileInputStream(new File(file));
        byte[] buffer = new byte[8192];
        int read;
        try {
            while ((read = is.read(buffer)) > 0) {
                digest.update(buffer, 0, read);
            }
            byte[] md5sum = digest.digest();
            // Pad to 32 hex digits: BigInteger.toString(16) drops leading zeros.
            return String.format("%032x", new BigInteger(1, md5sum));
        } finally {
            is.close();
        }
    }
}
You could include a checksum in your copy operation. Perform a checksum on the destination file and see that it matches a checksum on the source.
You could use commons io:
org.apache.commons.io.FileUtils.contentEquals(File file1, File file2)
or you could use checksum methods:
org.apache.commons.io.FileUtils:
static Checksum checksum(File file, Checksum checksum) //Computes the checksum of a file using the specified checksum object.
static long checksumCRC32(File file) //Computes the checksum of a file using the CRC32 checksum routine.
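For instance, a minimal sketch of both approaches (the file paths are placeholders):

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class VerifyCopy {
    public static void main(String[] args) throws IOException {
        File original = new File("original.bin"); // placeholder paths
        File copy = new File("copy.bin");

        // Byte-by-byte comparison of the two files
        boolean sameContent = FileUtils.contentEquals(original, copy);

        // Or compare CRC32 checksums instead
        boolean sameCrc = FileUtils.checksumCRC32(original) == FileUtils.checksumCRC32(copy);

        System.out.println("contentEquals: " + sameContent + ", CRC32 match: " + sameCrc);
    }
}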
If you get no exception while copying the streams, you should be OK. Make sure you don't ignore exceptions thrown by the close method!
Update: if you use FileOutputStream, you can also make sure everything was written properly by calling fileOutputStream.getFD().sync() before closing your FileOutputStream.
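A minimal sketch of that pattern (the stream name and path are placeholders):

FileOutputStream fileOutputStream = new FileOutputStream("copy.bin"); // placeholder path
try {
    // ... write the copied bytes ...
    fileOutputStream.flush();        // empty any stream-level buffers
    fileOutputStream.getFD().sync(); // ask the OS to commit the bytes to the device
} finally {
    fileOutputStream.close();        // and don't ignore exceptions thrown here
}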
Of course, if you want to be absolutely sure that the files are the same, you can compare their checksums/digests, but that sounds a bit paranoid to me.
If the sizes are different, perhaps you are not flushing the output stream before closing it.
Which file is bigger? What are the sizes of each file? Have you actually looked at the two files to see what is different?
I am base64 encoding an Excel file and sending it somewhere, where it is saved. Apparently after this, Excel complains that the file is incorrect and asks if I want to attempt a repair. The code I am using (actually a quick test main method) is:
public static void main(String[] args) throws IOException {
    Path p = Paths.get("C:\\VariousJunk\\excel-test", "test.xlsx");
    ByteArrayOutputStream base64StringOutputStream = new ByteArrayOutputStream();
    OutputStream base64EncodingStream = Base64.getEncoder().wrap(base64StringOutputStream);
    Files.copy(p, base64EncodingStream);
    base64StringOutputStream.close();
    String b64 = base64StringOutputStream.toString();

    byte[] data = Base64.getDecoder().decode(b64);
    FileOutputStream fos = new FileOutputStream("C:\\VariousJunk\\excel-test\\test-backup.xlsx");
    fos.write(data);
    fos.close();
}
Now I have compared the binary data of both files, and it appears that the output file is only missing one last byte, with value 0. I added that byte as a test:
fos.write(data);
fos.write(0);
fos.close();
And it works fine. The problem is I will be using this for any other type of data, and so I am not sure whether hardcoding a last byte is a good idea, possibly it might crash other filetypes. Is this a feature of this Base64 encoding method or am I doing something wrong?
Apparently the missing piece was calling base64EncodingStream.close() just after Files.copy(). The wrapping encoder stream buffers the last partial group of input bytes until close(), which writes the final Base64 quantum and any padding; without it the last bytes never reach the underlying stream.
OutputStream base64EncodingStream = Base64.getEncoder().wrap(base64StringOutputStream);
Files.copy(p, base64EncodingStream);
base64EncodingStream.close();
String b64 = base64StringOutputStream.toString();
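Equivalently, a try-with-resources block makes the close hard to forget. A minimal sketch, using the same variables as above:

ByteArrayOutputStream base64StringOutputStream = new ByteArrayOutputStream();
try (OutputStream base64EncodingStream = Base64.getEncoder().wrap(base64StringOutputStream)) {
    Files.copy(p, base64EncodingStream);
} // close() here flushes the encoder's last partial group and padding
String b64 = base64StringOutputStream.toString();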
I'm trying to develop a file updater for some files in a folder, to sync an FTP server with a local folder, using Java on the client and PHP on the server side.
On the server side, I'm calculating md5_file($filename) for each file and returning all of them in a JSON response.
In Java, I first check whether the file exists in the local folder. If it exists, I then compare MD5 checksums to see whether the local file is exactly the same as the online one.
The MD5s do not match for .txt or .lua files. They match for other file types, such as .dds texture files.
The MD5 method I'm using in Java is this:
private String md5(File f) throws FileNotFoundException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    InputStream is = new FileInputStream(f);
    byte[] buffer = new byte[8192];
    int read;
    try {
        while ((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
        byte[] md5sum = digest.digest();
        // Pad to 32 hex digits: BigInteger.toString(16) drops leading zeros,
        // while PHP's md5_file() always returns 32 characters.
        return String.format("%032x", new BigInteger(1, md5sum));
    } catch (IOException e) {
        throw new RuntimeException("Unable to process file for MD5", e);
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            throw new RuntimeException("Unable to close input stream for MD5 calculation", e);
        }
    }
}
As an example, for a description.lua file, with the following contents:
livery = {
{"KC-130_fusel", 0 ,"KC-130_map_fus",false};
{"KC-130_wing", 0 ,"KC-130_map_wingS",false};
{"KC-130_wing_2", 0 ,"KC-130_map_wings_2",false};
{"KC-130_notes", 0 ,"KC-130_notes_empty",true};
{"KC-130_FPod", 0 ,"kc-130_map_drg",false};
}
name = "Spain ALA 31 TK.10-06"
countries = {"SPN"} -- and any others you want to add
PHP md5_file($filename) = d0c32f9e38cc6e1bb8b54a6aca4a0190
JAVA md5(File) = 08bf57441b904c69e9ce3ca02a9257c7
I've been trying to find a relation between those two hashes to see what's making the difference, but have not found any. I have checked around 10 MD5 implementations for Java and all of them give the same result.
Is there any way I can fix this?
EDIT: Solution given in the first comment: change the transfer type on the FTP client to binary. ASCII mode was converting the text files' line endings, changing their length and MD5.
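For example, if the Java client uses Apache Commons Net (an assumption; the question doesn't name the FTP library), binary mode is a single call on the FTPClient (host and credentials are placeholders):

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

FTPClient ftpClient = new FTPClient();
ftpClient.connect("ftp.example.com"); // placeholder host
ftpClient.login("user", "password");  // placeholder credentials
// ASCII mode rewrites line endings in text files, changing their length and MD5;
// binary mode transfers the bytes verbatim.
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);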
I want to compute the MD5 of many different files. I am following this answer to do that, but the main problem is that the time taken to compute the MD5 of the files (maybe in the hundreds) is a lot.
Is there any way to find the MD5 of a file without consuming much time?
Note: the size of a file may be large (it may go up to 300 MB).
This is the code which I am using:
import java.io.*;
import java.security.MessageDigest;

public class MD5Checksum {

    public static byte[] createChecksum(String filename) throws Exception {
        InputStream fis = new FileInputStream(filename);
        byte[] buffer = new byte[1024];
        MessageDigest complete = MessageDigest.getInstance("MD5");
        int numRead;
        do {
            numRead = fis.read(buffer);
            if (numRead > 0) {
                complete.update(buffer, 0, numRead);
            }
        } while (numRead != -1);
        fis.close();
        return complete.digest();
    }

    // see this How-to for a faster way to convert
    // a byte array to a HEX string
    public static String getMD5Checksum(String filename) throws Exception {
        byte[] b = createChecksum(filename);
        String result = "";
        for (int i = 0; i < b.length; i++) {
            result += Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1);
        }
        return result;
    }

    public static void main(String args[]) {
        try {
            System.out.println(getMD5Checksum("apache-tomcat-5.5.17.exe"));
            // output :
            // 0bb2827c5eacf570b6064e24e0e6653b
            // ref :
            // http://www.apache.org/dist/
            // tomcat/tomcat-5/v5.5.17/bin
            // /apache-tomcat-5.5.17.exe.MD5
            // 0bb2827c5eacf570b6064e24e0e6653b *apache-tomcat-5.5.17.exe
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
You cannot use hashes to determine any similarity of content.
For instance, generating the MD5 of hellostackoverflow1 and hellostackoverflow2 produces two hashes in which none of the characters of the string representation match (7c35[...]85fa vs b283[...]3d19). That's because a hash is calculated from the binary data of the file, so two different formats of the same thing, e.g. a .txt and a .docx of the same text, have different hashes.
But as already noted, some speed might be gained by using native code, i.e. the NDK. Additionally, if you still want to compare files for exact matches, first compare their sizes in bytes, and after that use a hashing algorithm with enough speed and a low risk of collisions. As stated, CRC32 is fine.
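A minimal sketch of that order of checks, using the JDK's built-in java.util.zip.CRC32:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public class QuickCompare {
    // Cheap length check first; only hash when the sizes match.
    public static boolean probablyIdentical(File a, File b) throws IOException {
        if (a.length() != b.length()) return false; // different sizes: definitely different
        return crc32(a) == crc32(b);                // equal CRC32: almost certainly identical
    }

    private static long crc32(File f) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = new FileInputStream(f)) {
            int read;
            while ((read = in.read(buffer)) > 0) {
                crc.update(buffer, 0, read);
            }
        }
        return crc.getValue();
    }
}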
Hash/CRC calculation takes some time as the file has to be read completely.
The code of createChecksum you presented is nearly optimal. The only part that can be tweaked is the read buffer size (I would use a buffer of 2048 bytes or larger). However, this may get you at most a 1-2% speed improvement.
If this is still too slow, the only option left is to implement the hashing in C/C++ and call it as a native method. Besides that, there is nothing you can do.
I use the "get" method from the Java Drive API, and I can get the InputStream, but I can't open the file that I create from the InputStream. It seems like the file is broken.
private static String fileurl = "C:\\googletest\\drive\\";

public static void newFile(String filetitle, InputStream stream) throws IOException {
    String filepath = fileurl + filetitle;
    BufferedInputStream bufferedInputStream = new BufferedInputStream(stream);
    byte[] buffer = new byte[bufferedInputStream.available()];
    File file = new File(filepath);
    if (!file.exists()) {
        file.getParentFile().mkdirs();
        BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(new FileOutputStream(filepath));
        while (bufferedInputStream.read(buffer) != -1) {
            bufferedOutputStream.write(buffer);
        }
        bufferedOutputStream.flush();
        bufferedOutputStream.close();
    }
}
Firstly, C:\googletest\drive\ is not a URL. It is a file system pathname.
Next, the following probably does not do what you think it does:
byte[] buffer = new byte[bufferedInputStream.available()];
The problem is that the available() call can return zero ... for a non-empty stream. The value returned by available() is an estimate of how many bytes that are currently available to read ... right now. That is not necessarily the stream length ... or anything related to it. And indeed the device drivers for some devices consistently return zero, even when there is data to be read.
Finally, this is wrong:
while( bufferedInputStream.read(buffer) != -1) {
bufferedOutputStream.write(buffer);
You are assuming that each read call fills the buffer, and that only the final read (returning -1) may not. That is not so. Any one of the read calls could return with a partly full buffer. But then you write the entire buffer contents to the output stream ... including "junk" from previous reads.
Either or both of the 2nd and 3rd problems could lead to file corruption. In fact, the third one is likely to.
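A minimal sketch of a copy loop that avoids both problems (a fixed buffer size, and writing only the bytes each read actually returned), reusing the question's variable names:

byte[] buffer = new byte[8192]; // fixed size; don't size it with available()
int bytesRead;
while ((bytesRead = bufferedInputStream.read(buffer)) != -1) {
    // write only the bytes this read call produced, not the whole buffer
    bufferedOutputStream.write(buffer, 0, bytesRead);
}
bufferedOutputStream.flush();
bufferedOutputStream.close();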
I'm essentially trying to do the following on a Java/JSP-driven web site:
User supplies a password
Password is used to build a strongly-encrypted archive file (zip, or anything else) containing a text file as well as a number of binary files that are stored on the server. It's essentially a backup of the user's files and settings.
Later, the user can upload the file, provide the original password, and the site will decrypt and unpack the archive, save the extracted binary files to the appropriate folder on the server, and then read the text file so the site can restore the user's old settings and metadata about the binary files.
It's the building/encrypting the archive and then extracting its contents that I'm trying to figure out how to do. I really don't care about the archive format, other than that it is very secure.
The ideal solution to my problem will be very easy to implement, and will require only tried-and-tested libraries with free and nonrestrictive licenses (e.g. apache, berkeley, lgpl).
I'm aware of the TrueZIP and WinZipAES libraries; the former seems like massive overkill and I can't tell how stable the latter is... Are there other solutions out there that would work well?
If you know how to create a zip file using the java.util.zip package, you can create a PBE Cipher and pass that to a CipherOutputStream or a CipherInputStream (depending on if you're reading or writing).
The following should get you started:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

public class ZipTest {

    public static void main(String[] args) throws Exception {
        String password = "password";
        write(password);
        read(password);
    }

    private static void write(String password) throws Exception {
        OutputStream target = new FileOutputStream("out.zip");
        target = new CipherOutputStream(target, createCipher(Cipher.ENCRYPT_MODE, password));
        ZipOutputStream output = new ZipOutputStream(target);

        ZipEntry e = new ZipEntry("filename");
        output.putNextEntry(e);
        output.write("helloWorld".getBytes());
        output.closeEntry();

        e = new ZipEntry("filename1");
        output.putNextEntry(e);
        output.write("helloWorld1".getBytes());
        output.closeEntry();

        output.finish();
        output.flush();
        // Closing the stream chain is essential: CipherOutputStream only writes
        // the final padded cipher block when it is closed.
        output.close();
    }

    private static Cipher createCipher(int mode, String password) throws Exception {
        String alg = "PBEWithSHA1AndDESede"; // BouncyCastle has better algorithms
        PBEKeySpec keySpec = new PBEKeySpec(password.toCharArray());
        SecretKeyFactory keyFactory = SecretKeyFactory.getInstance(alg);
        SecretKey secretKey = keyFactory.generateSecret(keySpec);

        Cipher cipher = Cipher.getInstance(alg);
        // Hardcoded salt and iteration count for the demo; use random,
        // per-archive values in real code.
        cipher.init(mode, secretKey, new PBEParameterSpec("saltsalt".getBytes(), 2000));
        return cipher;
    }

    private static void read(String password) throws Exception {
        InputStream target = new FileInputStream("out.zip");
        target = new CipherInputStream(target, createCipher(Cipher.DECRYPT_MODE, password));
        ZipInputStream input = new ZipInputStream(target);

        ZipEntry entry = input.getNextEntry();
        while (entry != null) {
            System.out.println("Entry: " + entry.getName());
            System.out.println("Contents: " + toString(input));
            input.closeEntry();
            entry = input.getNextEntry();
        }
        input.close();
    }

    private static String toString(InputStream input) throws Exception {
        byte[] data = new byte[1024];
        StringBuilder result = new StringBuilder();
        int bytesRead;
        while ((bytesRead = input.read(data)) != -1) {
            result.append(new String(data, 0, bytesRead));
        }
        return result.toString();
    }
}
The answer is already given (use a cipher, as Kevin pointed out), so I am only making a suggestion about an important matter which seems to be missing from your question: ensure that you're using HTTPS instead of HTTP. Otherwise someone with a network sniffer would be able to grab the user-supplied password from the packets. How to set this up depends on the appserver in question, so it is best to refer to its documentation. If it is, for example, Apache Tomcat, then you can find everything in the Tomcat SSL HOW-TO.
Hope this helps.
Though it may not be specific to your query, I wonder if TrueCrypt could be of use. Your webserver could create an encrypted container into which the zip file would be copied, and the encrypted container could then be downloaded. Potentially a little messy; however, the encryption should be strong, and the downloaded image could be mounted on a variety of operating systems.
There are surely a few suggestions here on how to solve your problem, but I'm missing a very big BUT in the responses. You cannot fulfill both "password based" and "strong encryption" for any reasonable definition of "strong encryption": the encryption key is derived from the user's password, so the scheme is only as strong as that password, and typical user passwords have far too little entropy to resist brute-force guessing.