I have two Strings that are fully qualified file names, and I want to verify that the two files are the same. So I converted both Strings to File objects and compared them using Guava's Files.equal(File file1, File file2) method, but the value returned was false. Wondering what was wrong, I converted both File objects to byte arrays and printed their lengths, which were equal. So, does anyone know why Files.equal considers them unequal?
I'm just curious why the method returns false, because according to the docs, Files.equal compares the two files byte by byte.
Thanks.
Code:
public class WhenEncrypting {
private String[] args = new String[4];
/**
* encrypts a plain text file
*
* @throws IOException
* IOException could occur
*/
@Test
public void normalEncryption() throws IOException {
this.args[0] = "-e";
this.args[1] = "./src/decoderwheel/tests/valid.map";
this.args[2] = "./src/decoderwheel/tests/input.txt";
this.args[3] = "./src/decoderwheel/tests/crypt.txt";
DecoderWheel.main(this.args);
File plainFile = new File("./src/decoderwheel/tests/input.txt");
File crypted = new File("./src/decoderwheel/tests/crypt.txt");
byte[] f1 = Files.toByteArray(plainFile);
byte[] f2 = Files.toByteArray(crypted);
int number = f1.length;
int size = f2.length;
Files.equal(crypted, plainFile);
System.out.println(number);
System.out.println(size);
System.out.println(Files.equal(crypted, plainFile));
assertTrue(Files.equal(crypted, plainFile));
}
}
Output:
360
360
false
Based on what you've shown us, I think that the problem is most likely to be that the two files' contents are NOT equal.
The fact that the two byte arrays (read from the files) have the same lengths does not mean that their contents (and hence the files' contents) are the same.
Add something like this:
for (int i = 0; i < f1.length; i++) {
if (f1[i] != f2[i]) {
System.out.println("File content mismatch at index " + i + ": " +
f1[i] + " != " + f2[i]);
}
}
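As an aside, if you're on Java 12 or later, java.nio.file.Files.mismatch does this check for you without a hand-rolled loop: it returns the offset of the first differing byte, or -1 if the contents match. A quick sketch using the paths from the question:
import java.nio.file.Files;
import java.nio.file.Path;
// Returns -1 if the two files have identical contents; otherwise the
// position of the first mismatched byte (requires Java 12+).
long firstMismatch = Files.mismatch(
        Path.of("./src/decoderwheel/tests/input.txt"),
        Path.of("./src/decoderwheel/tests/crypt.txt"));
System.out.println(firstMismatch);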
I am exploring an option to compare two files in Java and show the difference in html.
Below is the code I am using:
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;
import org.apache.commons.text.diff.CommandVisitor;
import org.apache.commons.text.diff.StringsComparator;
public class FileDiff {
public static void main(String[] args) throws IOException {
// Read both files with line iterator.
LineIterator file1 = FileUtils.lineIterator(new File("file-1.txt"), "utf-8");
LineIterator file2 = FileUtils.lineIterator(new File("file-2.txt"), "utf-8");
// Initialize visitor.
FileCommandsVisitor fileCommandsVisitor = new FileCommandsVisitor();
// Read file line by line so that comparison can be done line by line.
while (file1.hasNext() || file2.hasNext()) {
/*
* In case both files have different number of lines, fill in with empty
* strings. Also append newline char at end so next line comparison moves to
* next line.
*/
String left = (file1.hasNext() ? file1.nextLine() : "") + "\n";
String right = (file2.hasNext() ? file2.nextLine() : "") + "\n";
// Prepare diff comparator with lines from both files.
StringsComparator comparator = new StringsComparator(left, right);
if (comparator.getScript().getLCSLength() > (Integer.max(left.length(), right.length()) * 0.4)) {
/*
 * If both lines have at least 40% commonality, only then compare them with each
 * other so that they are aligned with each other in the final diff HTML.
 */
comparator.getScript().visit(fileCommandsVisitor);
} else {
/*
 * If both lines do not have 40% commonality, compare each with an empty line so
 * that they are not aligned to each other in the final diff; instead they show
 * up on separate lines.
 */
StringsComparator leftComparator = new StringsComparator(left, "\n");
leftComparator.getScript().visit(fileCommandsVisitor);
StringsComparator rightComparator = new StringsComparator("\n", right);
rightComparator.getScript().visit(fileCommandsVisitor);
}
}
fileCommandsVisitor.generateHTML();
}
}
/*
* Custom visitor for file comparison which stores comparison & also generates
* HTML in the end.
*/
class FileCommandsVisitor implements CommandVisitor<Character> {
// Spans with red & green highlights to put highlighted characters in HTML
private static final String DELETION = "<span style=\"background-color: #FB504B\">${text}</span>";
private static final String INSERTION = "<span style=\"background-color: #45EA85\">${text}</span>";
private String left = "";
private String right = "";
@Override
public void visitKeepCommand(Character c) {
// For new line use <br/> so that in HTML also it shows on next line.
String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
// KeepCommand means c is present in both left & right. So add this to both
// without any highlight.
left = left + toAppend;
right = right + toAppend;
}
@Override
public void visitInsertCommand(Character c) {
// For new line use <br/> so that in HTML also it shows on next line.
String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
// InsertCommand means character is present in right file but not in left. Show
// with green highlight on right.
right = right + INSERTION.replace("${text}", "" + toAppend);
}
@Override
public void visitDeleteCommand(Character c) {
// For new line use <br/> so that in HTML also it shows on next line.
String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
// DeleteCommand means character is present in left file but not in right. Show
// with red highlight on left.
left = left + DELETION.replace("${text}", "" + toAppend);
}
public void generateHTML() throws IOException {
// Get template & replace placeholders with left & right variables with actual
// comparison
String template = FileUtils.readFileToString(new File("difftemplate.html"), "utf-8");
String out1 = template.replace("${left}", left);
String output = out1.replace("${right}", right);
// Write file to disk.
FileUtils.write(new File("finalDiff.html"), output, "utf-8");
System.out.println("HTML diff generated.");
}
}
For smaller files this works well and gives me good results on my laptop. But if the file size is larger (200 MB, with half a million rows), then IntelliJ seems to hang. My laptop has 16 GB of RAM.
How can I improve this to handle large files for comparison?
Thanks
The way you wrote FileCommandsVisitor might prevent it from being optimized. What you're doing is concatenating strings for every character visited, for instance:
left = left + toAppend;
right = right + toAppend;
That can create a new instance of a String for every addition you do - a new instance of a string that by the end is nearly 200 MB long, a new one for every character you visit. And the old ones will need to be garbage collected. If your class held StringBuilders instead, and you used the append() method, it might drastically speed things up. For more details, read String concatenation: concat() vs "+" operator
For clarity (since according to comments you missed the point twice now):
class FileCommandsVisitor implements CommandVisitor<Character> {
//StringBuilder as properties
private StringBuilder left = new StringBuilder();
private StringBuilder right = new StringBuilder();
@Override
public void visitKeepCommand(Character c) {
String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
// append to the StringBuilders where you would concat strings
left.append(toAppend);
right.append(toAppend);
}
//same as above for other methods
..
public void generateHTML() throws IOException {
String template = FileUtils.readFileToString(new File("difftemplate.html"), "utf-8");
//turn StringBuilders into Strings only when you actually need a String.
String out1 = template.replace("${left}", left.toString());
String output = out1.replace("${right}", right.toString());
FileUtils.write(new File("finalDiff.html"), output, "utf-8");
System.out.println("HTML diff generated.");
}
}
If that doesn't help, however, and it was already being optimized at runtime, I don't see anything else fundamentally wrong with the way you're doing it. Comparing huge files is not a cheap operation; it won't be faster than the speed with which you can read two files line by line from your hard drive. You're also taking a shortcut (one that increases speed, not decreases it) in having your FileCommandsVisitor hold both diffs in memory instead of writing them as it goes, which means that at best your code can diff a file of a size equal to half your available RAM. I note, however, that you never mentioned how long it actually takes, so it's hard to say whether the time you're seeing is expected or an anomaly.
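If memory does become the limit, one option is to have the visitor stream each side of the diff to a temporary file instead of accumulating it. Below is a minimal sketch of that idea (the class name and output paths are placeholders); note the final merge into the HTML template would then also need streaming reads rather than one big template.replace:
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.commons.text.diff.CommandVisitor;
class StreamingFileCommandsVisitor implements CommandVisitor<Character> {
    private final BufferedWriter left;
    private final BufferedWriter right;
    StreamingFileCommandsVisitor(Path leftOut, Path rightOut) throws IOException {
        // Each side of the diff goes to disk as it is produced,
        // so memory use stays flat regardless of input file size.
        this.left = Files.newBufferedWriter(leftOut);
        this.right = Files.newBufferedWriter(rightOut);
    }
    @Override
    public void visitKeepCommand(Character c) {
        // Characters kept on both sides are written to both files, unhighlighted.
        write(left, render(c));
        write(right, render(c));
    }
    @Override
    public void visitInsertCommand(Character c) {
        write(right, "<span style=\"background-color: #45EA85\">" + render(c) + "</span>");
    }
    @Override
    public void visitDeleteCommand(Character c) {
        write(left, "<span style=\"background-color: #FB504B\">" + render(c) + "</span>");
    }
    private static String render(Character c) {
        return c == '\n' ? "<br/>" : String.valueOf(c);
    }
    private static void write(Writer w, String s) {
        try {
            w.write(s);
        } catch (IOException e) {
            // The visitor methods can't declare checked exceptions.
            throw new UncheckedIOException(e);
        }
    }
    public void close() throws IOException {
        left.close();
        right.close();
    }
}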
Hi, I'm working on a simple imitation of Pandas' fillna method, which requires me to replace a null/missing value in a CSV file with an input (passed as a parameter). Almost everything is working fine, but I have one issue: my CSV reader can't recognize a null/missing value at the beginning or at the end of a row. For example,
Name,Age,Class
John,20,CLass-1
,18,Class-1
,21,Class-3
It will return errors.
The same goes for this example:
Name,Age,Class
John,20,CLass-1
Mike,18,
Tyson,21,
But in this case (the end-of-row problem), I can solve it by adding another comma at the end, like this:
Name,Age,Class
John,20,CLass-1
Mike,18,,
Tyson,21,,
However, for the beginning-of-row problem, I have no idea how to solve it.
Here's my code for the CSV file reader:
public void readCSV(String fileName) {
fileLocation = fileName;
File csvFile = new File(fileName);
Scanner sfile;
// noOfColumns = 0;
// noOfRows = 0;
data = new ArrayList<ArrayList>();
int colCounter = 0;
int rowCounter = 0;
try {
sfile = new Scanner(csvFile);
while (sfile.hasNextLine()) {
String aLine = sfile.nextLine();
Scanner sline = new Scanner(aLine);
sline.useDelimiter(",");
colCounter = 0;
while (sline.hasNext()) {
if (rowCounter == 0)
data.add(new ArrayList<String>());
data.get(colCounter).add(sline.next());
colCounter++;
}
rowCounter++;
sline.close();
}
// noOfColumns = colCounter;
// noOfRows = rowCounter;
sfile.close();
} catch (FileNotFoundException e) {
System.out.println("File to read " + csvFile + " not found!");
}
}
Unless you write the CSV file yourself, the writer mechanism will never arbitrarily add delimiters to suit the needs of your application, so give up on that train of thought altogether; you shouldn't do it either. If you do indeed have access to the CSV file creation process, then the simple solution would be to not allow null or empty values to enter the file in the first place. In other words, have defaults (in such a case) placed into empty elements as the CSV file is being written, as sketched below.
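A minimal sketch of that writer-side idea (the method name and default value here are hypothetical):
// Replaces empty or null fields with a default before the line is written,
// so the file never contains truly empty elements.
static String toCsvLine(String delimiter, String defaultValue, String... fields) {
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
        if (i > 0) { line.append(delimiter); }
        String field = fields[i];
        line.append(field == null || field.isEmpty() ? defaultValue : field);
    }
    return line.toString();
}
For example, toCsvLine(",", "N/A", "Mike", "18", "") produces Mike,18,N/A.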
The Header line within a CSV file is there for a reason: it tells you the number of data columns and the names of those columns within each line (row) that makes up the file. From the header line and the actual data in the file, you can also form a pretty good idea of what each column's Data Type should be.
In my opinion, the first thing your readCSV() method should do is read this Header Line (if it exists) and gather some information about the file that the method is about to iterate through. In your case the Header Line consists of:
Name,Age,Class
Right off the start we know that each line within the file consists of three (3) data columns: the first column is named Name, the second column is named Age, and the third column is named Class. Based on the information provided within the CSV file, we can quickly infer the data types:
Name (String)
Age (Integer)
Class (String)
I'm only pointing this out because, in my opinion, although not mandatory, it would be better to store the CSV data in an ArrayList (or the List interface) of an object class, for example:
ArrayList<Student> studentData = new ArrayList<>();
// OR //
List<Student> studentData = new ArrayList<>();
where Student is an object class.
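A minimal sketch of such a Student class (hypothetical; the fields simply mirror the header columns, and Age is kept as a String so a default like "Unknown Age" can be stored in it):
public class Student {
    private final String name;
    private final String age;
    private final String classRoom;
    public Student(String name, String age, String classRoom) {
        this.name = name;
        this.age = age;
        this.classRoom = classRoom;
    }
    public String getName() { return name; }
    public String getAge() { return age; }
    public String getClassRoom() { return classRoom; }
}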
You seem to want everything within a 2D ArrayList, so with that in mind, below is a method to read CSV files and place their contents into this 2D ArrayList. Any file column elements that contain the word null or nothing at all will have a default string applied. There are lots of comments within the code explaining what is going on, and I suggest you give them a read. This code can be easily modified to suit your needs. At the very least, I hope it gives you an idea of what can be done to apply defaults to empty values within the CSV file:
/**
* Reads a supplied CSV file with any number of columnar rows and returns
* the data within a 2D ArrayList of String ({@code ArrayList<ArrayList<String>>}).
* <br><br>File delimited data that contains 'null' or nothing (a Null String (""))
* will have a supplied common default applied to that column element before it is
* stored within the 2D ArrayList.<br><br>
*
* Modify this code to suit your needs.<br>
*
* @param fileName (String) The CSV file to process.<br>
*
* @param csvDelimiterUsed (String) The delimiter used in the CSV file.<br>
*
* @param commonDefault (String) A default String value that can be common
* to all columnar elements within the CSV file that contain the string
* 'null' or nothing at all (a Null String ("")). Those empty elements will
* end up containing this supplied string value postfixed with the name of
* that column. As an example, if the CSV file Header line is
* 'Name,Age,Class Room', the string "Unknown " is supplied to the
* commonDefault parameter, and during file parsing a specific data column
* (let's say Age) contains the word 'null' or nothing at all (ex:
* Bob,null,Class-Math OR Bob,,Class-Math), then this line will be stored
* within the 2D ArrayList as:<pre>
*
* Bob, Unknown Age, Class-Math</pre>
*
* @return (2D ArrayList of String Type - {@code ArrayList<ArrayList<String>>})
*/
public ArrayList<ArrayList<String>> readCSV(final String fileName, final String csvDelimiterUsed,
final String commonDefault) {
String fileLocation = fileName; // The student data file name to process.
File csvFile = new File(fileLocation); // Create a File Object (use in Scanner reader).
/* The 2D ArrayList that will be returned containing all the CSV Row/Column data.
You should really consider creating a Class to hold Student instances of this
data; however, that can be accomplished by working the ArrayList later on when
it is received. */
ArrayList<ArrayList<String>> fileData = new ArrayList<>();
// Open the supplied data file using Scanner (as per OP).
try (Scanner reader = new Scanner(csvFile)) {
/* Read the Header Line and gather information... This array
will ultimately be setup to hold default values should
any file columnar data hold null OR null-string (""). */
String[] columnData = reader.nextLine().split("\\s*\\" + csvDelimiterUsed + "\\s*");
/* How many columns of data will be expected per row.
This will be used in the String#split() method later
on as the limiter when we parse each file data line.
This limiter value is rather important in this case
since it ensures that a Null String ("") is in place
of where valid Array element should be should there
be no data available instead of just providing an
array of 'lesser length'. */
int csvValuesPerLineCount = columnData.length;
// Copy column Names Array: To just hold the column Names.
String[] columnName = new String[columnData.length];
System.arraycopy(columnData, 0, columnName, 0, columnData.length);
/* Create default data for columns based on the supplied
commonDefault String. Here the supplied default prefixes
the actual column name (see JavaDoc). */
for (int i = 0; i < columnData.length; i++) {
columnData[i] = commonDefault + columnData[i];
}
// An ArrayList to hold each row of columnar data.
ArrayList<String> rowData;
// Iterate through in each row of file data...
while (reader.hasNextLine()) {
rowData = new ArrayList<>(); // Initialize a new ArrayList.
// Read file line and trim off any leading or trailing white-spaces.
String aLine = reader.nextLine().trim();
// Only Process lines that contain something (blank lines are ignored).
if (!aLine.isEmpty()) {
/* Split the read-in line based on the supplied CSV file
delimiter and the number of columns established from
the Header line. We do this to determine if a default
value will be required for a specific column that
contains no value at all (null or Null String ("")). */
String[] aLineParts = aLine.split("\\s*\\" + csvDelimiterUsed + "\\s*", csvValuesPerLineCount);
/* Here we determine if default values will be required
and apply them. We then add the columnar row data to
the rowData ArrayList. */
for (int i = 0; i < aLineParts.length; i++) {
rowData.add((aLineParts[i].isEmpty() || aLineParts[i].equalsIgnoreCase("null"))
? columnData[i] : aLineParts[i]);
}
/* Add the rowData ArrayList to the fileData
ArrayList since we are now done with this
file row of data and will now iterate to
the next file line for processing. */
fileData.add(rowData);
}
}
}
// Process the 'File Not Found Exception'.
catch (FileNotFoundException ex) {
System.err.println("The CSV file to read (" + csvFile + ") can not be found!");
}
// Return the fileData ArrayList to the caller.
return fileData;
}
And to use the method above you might do this:
ArrayList<ArrayList<String>> list = readCSV("MyStudentsData.txt", ",", "Unknown ");
if (list == null) { return; }
StringBuilder sb;
for (int i = 0; i < list.size(); i++) {
sb = new StringBuilder("");
for (int j = 0; j < list.get(i).size(); j++) {
if (!sb.toString().isEmpty()) { sb.append(", "); }
sb.append(list.get(i).get(j));
}
System.out.println(sb.toString());
}
I create an empty file with a desired length in Android using Java like this:
long length = 10L * 1024 * 1024 * 1024; // the L suffix keeps the multiplication from overflowing int
String file = "PATH\\File.mp4";
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "rw");
randomAccessFile.setLength(length);
That code creates a file with the desired length, filled with NULL data. Then I write data into the file like this:
randomAccessFile.write(DATA);
Now my question is: I want to find the end of the data written into the file. I have written this function to find the end of the data as fast as possible using binary search:
long extractEndOfData(RandomAccessFile accessFile, long from, long end) throws IOException {
accessFile.seek(from);
if (accessFile.read() == 0) {
//this means no data has written into the file
return 0;
}
accessFile.seek(end);
if (accessFile.read() != 0) {
return end + 1;
}
long mid = (from + end) / 2;
accessFile.seek(mid);
if (accessFile.read() == 0) {
return extractEndOfData(accessFile, from, mid - 1);
} else {
if (accessFile.read() == 0) {
return mid + 1;
} else {
return extractEndOfData(accessFile, mid + 1, end);
}
}
}
and I call that function like this to find the end of the data in the file:
long endOfData = extractEndOfData(randomAccessFile, 0, randomAccessFile.length() - 1);
That function works fine for files whose data begins with NON-NULL bytes and has no NULL bytes within the data, like this:
But for some files it does not, because some files begin with NULL data, like this:
What can I do to solve this problem? Thanks a lot.
I think your issue is clear: you will never be able to find out how much data has been written (or where the end of the content is) when you are only searching for a NULL inside the file. The reason is that NULL is a byte with the value 0x00, which appears in all kinds of binary files (though maybe not text files), and on the other side, your file is initialized with NULLs.
What you could do, for example, is store the size of the data written to the file in the first four bytes of the file.
So when writing the DATA to the file, first write its length, and then the actual data content.
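A minimal sketch of that idea, assuming a four-byte (int) length prefix is large enough for your data sizes:
import java.io.IOException;
import java.io.RandomAccessFile;
// Write the payload's length into the first four bytes, then the payload itself.
static void writeWithLength(RandomAccessFile file, byte[] data) throws IOException {
    file.seek(0);
    file.writeInt(data.length);
    file.write(data);
}
// The prefix then tells us exactly where the data ends - no searching needed.
static byte[] readWithLength(RandomAccessFile file) throws IOException {
    file.seek(0);
    int length = file.readInt();
    byte[] data = new byte[length];
    file.readFully(data);
    return data;
}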
But I am still wondering why you don't initialize the file's size to the size you need.
I need your help converting a PRN file to a CSV file using Java.
Below is my PRN file.
I would like to make it show like this:
Thank you so much.
In your example you have four entries as input, each in a row. In your result table they are all in one row. I assume the input describes a complete PRN set, so if a file contained n PRN sets, it would have n * 4 rows.
To map the PRN set to a CSV file you have to:
read in the entries from the input file
write a header row (with eight titles)
extract in each entry the relevant values
combine the extracted values from four entries in sequence to one csv row
write the row
repeat steps 3 to 5 as long as there are further entries
Here is my suggestion:
public class PrnToCsv {
private static final String DILIM_PRN = " ";
private static final String DILIM_CSV = ",";
private static final Pattern PRN_SPLITTER = Pattern.compile(DILIM_PRN);
public static void main(String[] args) throws URISyntaxException, IOException {
List<String> inputLines = Files.readAllLines(new File("C://Temp//csv/input.prn").toPath());
List<String[]> inputValuesInLines = inputLines.stream().map(l -> PRN_SPLITTER.split(l)).collect(Collectors.toList());
try (BufferedWriter bw = Files.newBufferedWriter(new File("C://Temp//csv//output.csv").toPath())) {
// header
bw.append("POL1").append(DILIM_CSV).append("POL1_Time").append(DILIM_CSV).append("OLV1").append(DILIM_CSV).append("OLV1_Time").append(DILIM_CSV);
bw.append("POL2").append(DILIM_CSV).append("POL2_Time").append(DILIM_CSV).append("OLV2").append(DILIM_CSV).append("OLV2_Time");
bw.newLine();
// data
for (int i = 0; i + 3 < inputValuesInLines.size(); i = i + 4) {
String[] firstValues = inputValuesInLines.get(i);
bw.append(getId(firstValues)).append(DILIM_CSV).append(getDateTime(firstValues)).append(DILIM_CSV);
String[] secondValues = inputValuesInLines.get(i + 1);
bw.append(getId(secondValues)).append(DILIM_CSV).append(getDateTime(secondValues)).append(DILIM_CSV);
String[] thirdValues = inputValuesInLines.get(i + 2);
bw.append(getId(thirdValues)).append(DILIM_CSV).append(getDateTime(thirdValues)).append(DILIM_CSV);
String[] fourthValues = inputValuesInLines.get(i + 3);
bw.append(getId(fourthValues)).append(DILIM_CSV).append(getDateTime(fourthValues));
bw.newLine();
}
}
}
public static String getId(String[] values) {
return values[1];
}
public static String getDateTime(String[] values) {
return values[2] + " " + values[3];
}
}
Some remarks to the code:
Using the nio-API you can read the whole file with one line of code.
To extract the values of an entry line I used a Pattern to split the line into an array with each single word as a value.
Then it is easy to get the relevant values of an entry using the appropriate array indexes.
To write the csv file line by line (without additional libs) you can use a BufferedWriter.
The file you're writing to is a resource. It is recommended to handle resources with the try-with-resources statement.
I hope I could answer your question.
We have a data file for which we need to generate a CRC. (As a placeholder, I'm using CRC32 while the others figure out what CRC polynomial they actually want.) This code seems like it ought to work:
broken:
Path in = ......;
try (SeekableByteChannel reading =
Files.newByteChannel (in, StandardOpenOption.READ))
{
System.err.println("byte channel is a " + reading.getClass().getName() +
" from " + in + " of size " + reading.size() + " and isopen=" + reading.isOpen());
java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
ByteBuffer buffer = ByteBuffer.allocate (reasonable_buffer_size);
int bytesread = 0;
int loops = 0;
while ((bytesread = reading.read(buffer)) > 0) {
byte[] raw = buffer.array();
System.err.println("Claims to have read " + bytesread + " bytes, have buffer of size " + raw.length + ", updating CRC");
placeholder.update(raw);
loops++;
buffer.clear();
}
// do stuff with placeholder.getValue()
}
catch (all the things that go wrong with opening files) {
and handle them;
}
The System.err and loops stuff is just for debugging; we don't actually care how many times it takes. The output is:
byte channel is a sun.nio.ch.FileChannelImpl from C:\working\tmp\ls2kst83543216xuxxy8136.tmp of size 7196 and isopen=true
finished after 0 time(s) through the loop
There's no way to run the real code inside a debugger to step through it, but from looking at the source of sun.nio.ch.FileChannelImpl.read(), it looks like 0 is returned if the file magically becomes closed while internal data structures are being prepared; the code below is copied from the Java 7 reference implementation, with comments added by me:
// sun.nio.ch.FileChannelImpl.java
public int read(ByteBuffer dst) throws IOException {
ensureOpen(); // this throws if file is closed...
if (!readable)
throw new NonReadableChannelException();
synchronized (positionLock) {
int n = 0;
int ti = -1;
Object traceContext = IoTrace.fileReadBegin(path);
try {
begin();
ti = threads.add();
if (!isOpen())
return 0; // ...argh
do {
n = IOUtil.read(fd, dst, -1, nd);
} while (......)
.......
But the debugging code tests isOpen() and gets true. So I don't know what's going wrong.
As the current test data files are tiny, I dropped this in place just to have something working:
works for now:
try {
byte[] scratch = Files.readAllBytes(in);
java.util.zip.CRC32 placeholder = new java.util.zip.CRC32();
placeholder.update(scratch);
// do stuff with placeholder.getValue()
}
I don't want to slurp the entire file into memory for the Real Code, because some of those files can be large. I do note that readAllBytes uses an InputStream in its reference implementation, which has no trouble reading the same file that SeekableByteChannel failed on. So I'll probably rewrite the code to use input streams instead of byte channels. I'd still like to figure out what went wrong, in case a future scenario comes up where we need byte channels. What am I missing with SeekableByteChannel?
Check that 'reasonable_buffer_size' isn't zero.
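For reference, a sketch of the loop with that fixed (64 KB is an arbitrary non-zero choice). It also feeds only the bytes actually read into the CRC; update(raw) with the whole array would include stale bytes beyond bytesread on the final partial read:
java.util.zip.CRC32 crc = new java.util.zip.CRC32();
ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // must be > 0, or read() returns 0 immediately
int bytesRead;
while ((bytesRead = reading.read(buffer)) > 0) {
    // Only the first 'bytesRead' bytes of the backing array are valid.
    crc.update(buffer.array(), 0, bytesRead);
    buffer.clear();
}
// use crc.getValue()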