I have two text files, each larger than 600 MB, and I want to check whether their contents are the same, ignoring any whitespace at the start or end of each line (i.e. trim() each line).
I am thinking of reading each file line by line, trimming each line, and comparing the results.
Is there a better idea? If not, what is the fastest implementation of this idea?
Thanks in advance.
If you only need to know whether the files are byte-for-byte identical, calculate the MD5 value of each file and compare them. Note that this does not ignore leading or trailing whitespace on each line:
import java.io.FileInputStream;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
public class MainServer {
public static void main(String[] args) {
String filePath1 = "D:\\Download\\a.mp3";
String filePath2 = "D:\\Download\\b.mp3";
String file1_md5 = md5HashCode(filePath1);
String file2_md5 = md5HashCode(filePath2);
System.out.println(file1_md5);
System.out.println(file2_md5);
if(file1_md5.equals(file2_md5)){
System.out.println("Two files are the same ");
}
}
/**
* get file md5 value
*/
public static String md5HashCode(String filePath) {
// try-with-resources closes the stream even if reading fails
try (InputStream fis = new FileInputStream(filePath)) {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] buffer = new byte[8192];
int length;
while ((length = fis.read(buffer)) != -1) {
md.update(buffer, 0, length);
}
byte[] md5Bytes = md.digest();
BigInteger bigInt = new BigInteger(1, md5Bytes);
// Pad to 32 hex digits; toString(16) would drop leading zeros
return String.format("%032x", bigInt);
} catch (Exception e) {
e.printStackTrace();
return "";
}
}
}
If you need to compare the files line by line (note that readAllLines loads each file entirely into memory):
List<String> file1_lines = null;
List<String> file2_lines = null;
try {
file1_lines = Files.readAllLines(Paths.get("D:/a.txt"), StandardCharsets.UTF_8);
file2_lines = Files.readAllLines(Paths.get("D:/b.txt"), StandardCharsets.UTF_8);
} catch (IOException e) {
e.printStackTrace();
}
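// Files with different line counts cannot match
boolean same = file1_lines.size() == file2_lines.size();
for (int i = 0; same && i < file1_lines.size(); i++) {
String file1_line = file1_lines.get(i).trim();
String file2_line = file2_lines.get(i).trim();
if (!file1_line.equals(file2_line)) {
same = false; // first mismatch found at line i
}
}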
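For files of this size, a streaming comparison avoids holding both files in memory and can stop at the first difference. The following is only a minimal sketch, assuming both files are UTF-8 encoded; the class name TrimmedFileCompare and the method name sameIgnoringEdgeWhitespace are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
class TrimmedFileCompare {
    /** Returns true if both files contain the same lines once trim() is applied to each line. */
    static boolean sameIgnoringEdgeWhitespace(Path first, Path second) throws IOException {
        // Assumption: both files are UTF-8 text.
        try (BufferedReader r1 = Files.newBufferedReader(first, StandardCharsets.UTF_8);
             BufferedReader r2 = Files.newBufferedReader(second, StandardCharsets.UTF_8)) {
            while (true) {
                String line1 = r1.readLine();
                String line2 = r2.readLine();
                if (line1 == null || line2 == null) {
                    // Equal only if both files ended at the same line.
                    return line1 == null && line2 == null;
                }
                if (!line1.trim().equals(line2.trim())) {
                    return false; // stop at the first difference
                }
            }
        }
    }
}
```

It could be called as TrimmedFileCompare.sameIgnoringEdgeWhitespace(Paths.get("D:/a.txt"), Paths.get("D:/b.txt")). Only one line per file is held in memory at a time.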
Why must I use DigestInputStream and not FileInputStream to get a digest of a file?
I have written a program that reads ints from a FileInputStream, converts them to bytes and passes them to the update method of a MessageDigest object. But I suspect that it doesn't work properly, because it calculates the digest of a very large file instantly. Why doesn't it work?
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class DigestDemo {
public static byte[] getSha1(String file) {
FileInputStream fis = null;
MessageDigest md = null;
try {
fis = new FileInputStream(file);
} catch(FileNotFoundException exc) {
System.out.println(exc);
}
try {
md = MessageDigest.getInstance("SHA-1");
} catch (NoSuchAlgorithmException exc) {
System.out.println(exc);
}
byte b = 0;
do {
try {
b = (byte) fis.read();
} catch (IOException e) {
System.out.println(e);
}
if (b != -1)
md.update(b);
} while(b != -1);
return md.digest();
}
public static void writeBytes(byte[] a) {
for (byte b : a) {
System.out.printf("%x", b);
}
}
public static void main(String[] args) {
String file = "C:\\Users\\Mike\\Desktop\\test.txt";
byte[] digest = getSha1(file);
writeBytes(digest);
}
}
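You need to change the type of b to int. read() returns an int so that -1 can signal end of stream; casting it to byte makes every 0xFF byte in the file look like end of stream, so the loop can stop long before the real end of the file. The digest() call you already have finalizes the hash (MessageDigest has no doFinal() method; that belongs to Cipher and Mac). Reading one byte at a time is also horrifically inefficient. Try reading into a byte array and updating from that.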
There's too much try-catching in this code. Reduce it to one try and two catches, outside the loop.
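The advice above might look roughly like the following. This is a sketch rather than a drop-in replacement: the class name BufferedDigest, the 8192-byte buffer size, and the toHex helper are illustrative choices, and exceptions are simply declared on the method instead of being caught.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name, chosen for this example.
class BufferedDigest {
    static byte[] sha1(String file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(Paths.get(file))) {
            byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
            int n; // an int, so -1 (end of stream) cannot be confused with the byte 0xFF
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // hash only the bytes actually read
            }
        }
        return md.digest();
    }

    /** Formats each byte as two hex digits so leading zeros are kept. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Printing with "%02x" instead of "%x" matters because a byte below 0x10 would otherwise be printed as a single digit and the hex string would come out too short.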
I'm having a few issues with an extra credit assignment for my Java class. The objective is to decrypt a file without the password. It is encrypted with the PBEWithSHA1AndDESede algorithm and the password is a dictionary word with no numbers or special characters.
The way I'm trying to solve this is by guessing the password over and over again until I get it right using the code below.
The problem I'm running into is that the extra_out.txt file is being output after the first cycle of the for loop, when I want it to only be output if the correct word is guessed.
So when it runs, I get the exception "Encryption Error" and then the extra_out.txt file is output (still encrypted) and then 9999 more "Encryption Errors."
Any helpful advice is greatly appreciated!
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Random;
import java.util.Scanner;
public class WordGuess {
public static void main(String[] args) {
ArrayList<String> words = new ArrayList();
Random numGen = new Random();
String curWord = "";
try {
File aFile = new File("english.txt");
Scanner reader = new Scanner(aFile);
while (reader.hasNext()) {
curWord = reader.next();
if (curWord.length() == 5) {
words.add(curWord);
}
}
}
catch (FileNotFoundException e) {
System.out.println("Error: " + e);
}
for(int i = 0; i < 10000; i++){
int rand = Math.abs(numGen.nextInt(words.size()));
File fileIn = new File("extracredit.enc");
File fileOut = new File("extra_out.txt");
String password = words.get(rand);
crackFile(fileIn, fileOut, password);
}
}
public static void crackFile(File input, File output, String password) {
try{
Crypt c = new Crypt(password);
byte[] bytes = FileIO.read(input);
FileIO.write(output, c.decrypt(bytes));
}
catch (IOException e) {
System.out.println("Could not read/write file");
}
catch (Exception e) {
System.out.println("Encryption error");
}
}
}
I've read the documentation and the examples, but I'm having a hard time putting it all together. I'm just trying to take a test PDF file, convert it to a byte array, convert the byte array back into a PDF, and then write that PDF to disk.
It probably doesn't help much, but this is what I've got so far:
package javaapplication1;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
public class JavaApplication1 {
private COSStream stream;
public static void main(String[] args) {
try {
PDDocument in = PDDocument.load("C:\\Users\\Me\\Desktop\\JavaApplication1\\in\\Test.pdf");
byte[] pdfbytes = toByteArray(in);
PDDocument out;
} catch (Exception e) {
System.out.println(e);
}
}
private static byte[] toByteArray(PDDocument pdDoc) throws IOException, COSVisitorException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
try {
pdDoc.save(out);
pdDoc.close();
} catch (Exception ex) {
System.out.println(ex);
}
return out.toByteArray();
}
public void PDStream(PDDocument document) {
stream = new COSStream(document.getDocument().getScratchFile());
}
}
You can use Apache Commons IO, which is essential in any Java project IMO.
Then you can use FileUtils's readFileToByteArray(File file) and writeByteArrayToFile(File file, byte[] data).
(here is commons-io, which is where FileUtils is: http://commons.apache.org/proper/commons-io/download_io.cgi )
For example, I just tried this here and it worked beautifully.
try {
File file = new File("/example/path/contract.pdf");
byte[] array = FileUtils.readFileToByteArray(file);
FileUtils.writeByteArrayToFile(new File("/example/path/contract2.pdf"), array);
} catch (IOException e) {
e.printStackTrace();
}
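If adding a dependency is not an option, the same round trip can probably be done with java.nio.file.Files from the standard library (Java 7 and later). This is a minimal sketch; the class name FileRoundTrip and the method name copyViaByteArray are made up for illustration, and like the FileUtils version it reads the whole file into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, shown only to illustrate the standard-library calls.
class FileRoundTrip {
    /** Reads source into a byte array, writes that array to target, and returns the array. */
    static byte[] copyViaByteArray(Path source, Path target) throws IOException {
        byte[] data = Files.readAllBytes(source); // entire file held in memory
        Files.write(target, data);                // creates the file or overwrites it
        return data;
    }
}
```

For example, FileRoundTrip.copyViaByteArray(Paths.get("/example/path/contract.pdf"), Paths.get("/example/path/contract2.pdf")) would mirror the FileUtils calls above.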
Here is my code:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.Date;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.NoSuchPaddingException;
import javax.crypto.spec.SecretKeySpec;
public class EncryptedLogger {
private static Date lastLogTime = null;
private static EncryptedLogger instance = null;
private static FileOutputStream fos = null;
private static CipherOutputStream cos = null;
private static PrintWriter writer = null;
private Cipher cipher;
byte[] Key ={(byte) 0x12,(byte) 0x34,0x55,(byte) 0x66,0x67,(byte)0x88,(byte)0x90,0x12,(byte) 0x23,0x45,0x67,(byte)0x89,0x12,0x33,(byte) 0x55,0x74};
public static EncryptedLogger getInstance(){
if (instance==null) {
instance = new EncryptedLogger();
}
return instance;
}
private EncryptedLogger(){
class SQLShutdownHook extends Thread{
@Override
public void run() {
EncryptedLogger.close();
super.run();
}
}
SecretKeySpec sks = new SecretKeySpec(Key,"AES");
try {
cipher = Cipher.getInstance("AES/ECB/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE,sks);
fos = new FileOutputStream(new File("log.txt"),true);
} catch (InvalidKeyException e) {
e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
} catch (NoSuchPaddingException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
cos = new CipherOutputStream(fos, cipher);
writer = new PrintWriter(cos);
SQLShutdownHook hook = new SQLShutdownHook();
Runtime.getRuntime().addShutdownHook(hook);
}
public synchronized void logSQL(String s){
if ((lastLogTime==null)||((new Date().getTime() -lastLogTime.getTime())>1000)){
lastLogTime = new Date();
writer.printf("-- %1$tm-%1$te-%1$tY %1$tH-%1$tM-%1$tS\n%2$s\n",new Date(),s);
}
else{
writer.println(s);
}
}
public synchronized void logComment(String s){
writer.printf("-- %1$tm-%1$te-%1$tY %1$tH-%1$tM-%1$tS: %2$s\n",new Date(),s);
}
public static void close(){
writer.flush();
writer.close();
}
public static void main(String[] args) throws InterruptedException {
EncryptedLogger.getInstance().logSQL("1");
EncryptedLogger.getInstance().logSQL("22");
EncryptedLogger.getInstance().logSQL("33333");
EncryptedLogger.getInstance().logSQL("4900");
EncryptedLogger.getInstance().logSQL("5");
EncryptedLogger.getInstance().logSQL("66666");
EncryptedLogger.getInstance().logSQL("Some test logging statement");
EncryptedLogger.getInstance().logSQL("AAAAAAAAAAAAAAAAAAAAAAAAAA");
EncryptedLogger.getInstance().logComment("here is test commentary");
}
}
As you can see, I'm trying to encrypt text entries by piping them through a PrintWriter->CipherOutputStream->FileOutputStream chain. But when I decrypt the resulting file, bytes are missing. I tried flushing cos and fos in the EncryptedLogger.close() method, with the same result. Obviously I'm missing something. What is wrong?
EDIT: here is the decryption code I use. It's not mine; it was taken from a tutorial or something...
It works fine with similar encryption code, but not with mine...
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
public class AESDecrypter
{
Cipher dcipher;
public AESDecrypter(SecretKey key)
{
try
{
dcipher = Cipher.getInstance("AES");
dcipher.init(Cipher.DECRYPT_MODE, key);
}
catch (Exception e)
{
e.printStackTrace();
}
}
byte[] buf = new byte[1024];
public void decrypt(InputStream in, OutputStream out)
{
System.out.println("decrypting");
try
{
in = new CipherInputStream(in, dcipher);
int numRead = 0;
while ((numRead = in.read(buf)) >= 0)
{
out.write(buf, 0, numRead);
}
out.close();
}
catch (java.io.IOException e)
{
}
}
public static void main(String args[])
{
try
{
byte[] keystr ={(byte) 0x12,(byte) 0x34,0x55,(byte) 0x66,0x67,(byte)0x88,(byte)0x90,0x12,(byte) 0x23,0x45,0x67,(byte)0x89,0x12,0x33,(byte) 0x55,0x74};
SecretKeySpec sks = new SecretKeySpec(keystr,"AES");
AESDecrypter encrypter = new AESDecrypter(sks);
encrypter.decrypt(new FileInputStream("sqllogenc.log"),new FileOutputStream("sqllogdec.log"));
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
EDIT2: when I write directly to fos I get this output:
-- 04-19-2012 16-17-56
1
22
33333
4900
5
66666 + delay starting 1100
Some test logging statement
AAAAAAAAAAAAAAAAAAAAAAAAAA
-- 04-19-2012 16-17-56: here is test commentary
and when writing using cos and decrypting:
-- 04-19-2012 16-22-13
1
22
33333
4900
5
66666 + delay starting 1100
Some test logging statement
AAAAAAAAAAAAAAAAAAAAAAAAAA
-- 04-19-2012 16-22-13: here
As you can see, part of the last line is missing, including the line break.
You should use the same cryptographic transformation (such as AES/ECB/NoPadding) on both sides. Also, note that NoPadding doesn't allow you to pass data of arbitrary size (the length must be a multiple of the block size), so you need to specify some other kind of padding.
So, you need to construct the Ciphers as Cipher.getInstance("AES/ECB/PKCS5Padding") on both sides.
Also, note the suggestion of rossum about use of CBC or CTR instead of ECB.
Well, AES has a fixed block size of 128 bits.
When you use AES/ECB/NoPadding, you take responsibility for making sure the size of your message is a multiple of the block size.
It probably isn't, so you get less text when you decrypt.
You should use AES/ECB/PKCS5Padding for text of arbitrary length.
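To illustrate the point about padding, here is a small in-memory round trip that uses the same transformation string on both sides. It is a sketch under assumptions: the class name PaddedAesRoundTrip is made up, byte-array streams stand in for the log file, and ECB is kept only to match the question (CBC or CTR, as suggested above, would also need an IV).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical class name; in-memory streams replace the files from the question.
class PaddedAesRoundTrip {
    // Same transformation for encryption and decryption.
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";

    static byte[] encrypt(byte[] key, byte[] plain) throws GeneralSecurityException, IOException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, cipher)) {
            out.write(plain);
        } // close() finishes the cipher, which emits the final padded block
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] key, byte[] encrypted) throws GeneralSecurityException, IOException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(new ByteArrayInputStream(encrypted), cipher)) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                plain.write(buffer, 0, n);
            }
        }
        return plain.toByteArray();
    }
}
```

With padding enabled, a message whose length is not a multiple of 16 bytes still decrypts in full, because the last partial block is padded when the stream is closed instead of being dropped.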