Comparing text files with Junit - java

I am comparing text files in JUnit using:
public static void assertReaders(BufferedReader expected,
BufferedReader actual) throws IOException {
String line;
while ((line = expected.readLine()) != null) {
assertEquals(line, actual.readLine());
}
assertNull("Actual had more lines than the expected.", actual.readLine());
assertNull("Expected had more lines than the actual.", expected.readLine());
}
Is this a good way to compare text files? What is preferred?

Here's one simple approach for checking if the files are exactly the same:
assertEquals("The files differ!",
FileUtils.readFileToString(file1, "utf-8"),
FileUtils.readFileToString(file2, "utf-8"));
Where file1 and file2 are File instances, and FileUtils is from Apache Commons IO.
There is not much code of your own to maintain, which is always a plus. :) It is also very easy if you already happen to use Apache Commons in your project. But you get no nice, detailed error messages like in mark's solution.
Edit:
Heh, looking closer at the FileUtils API, there's an even simpler way:
assertTrue("The files differ!", FileUtils.contentEquals(file1, file2));
As a bonus, this version works for all files, not just text.
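If pulling in Commons IO is not an option, a similar byte-for-byte check can be sketched with the JDK alone. This assumes JDK 12 or later, where Files.mismatch is available; the class and method names are only illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class MismatchSketch {
    /** Returns -1 when both files hold identical bytes, otherwise the offset of the first differing byte. */
    static long firstDifference(Path expected, Path actual) throws IOException {
        return Files.mismatch(expected, actual); // requires JDK 12+
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data written to temporary files.
        Path expected = Files.createTempFile("expected", ".txt");
        Path actual = Files.createTempFile("actual", ".txt");
        Files.write(expected, "foo 1\nfoo 2\n".getBytes(StandardCharsets.UTF_8));
        Files.write(actual, "foo 1\nfoo 20\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("first differing byte offset: " + firstDifference(expected, actual));
    }
}
```

In a test this could be used as assertEquals(-1L, firstDifference(file1, file2)), which at least reports where the files start to differ.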

junit-addons has nice support for it: FileAssert
It gives you exceptions like:
junitx.framework.ComparisonFailure: aa Line [3] expected: [b] but was:[a]

Here is a more exhaustive list of file comparators in various third-party Java libraries:
org.apache.commons.io.FileUtils
org.dbunit.util.FileAsserts
org.fest.assertions.FileAssert
junitx.framework.FileAssert
org.springframework.batch.test.AssertFile
org.netbeans.junit.NbTestCase
org.assertj.core.api.FileAssert

As of 2015, I would recommend AssertJ, an elegant and comprehensive assertion library. For files, you can assert against another file:
@Test
public void file() {
File actualFile = new File("actual.txt");
File expectedFile = new File("expected.txt");
assertThat(actualFile).hasSameTextualContentAs(expectedFile);
}
or against inline strings:
@Test
public void inline() {
File actualFile = new File("actual.txt");
assertThat(linesOf(actualFile)).containsExactly(
"foo 1",
"foo 2",
"foo 3"
);
}
The failure messages are very informative as well. If a line is different, you get:
java.lang.AssertionError:
File:
<actual.txt>
and file:
<expected.txt>
do not have equal content:
line:<2>,
Expected :foo 2
Actual :foo 20
and if one of the files has more lines you get:
java.lang.AssertionError:
File:
<actual.txt>
and file:
<expected.txt>
do not have equal content:
line:<4>,
Expected :EOF
Actual :foo 4

Simple comparison of the content of two files with java.nio.file API.
byte[] file1Bytes = Files.readAllBytes(Paths.get("Path to File 1"));
byte[] file2Bytes = Files.readAllBytes(Paths.get("Path to File 2"));
String file1 = new String(file1Bytes, StandardCharsets.UTF_8);
String file2 = new String(file2Bytes, StandardCharsets.UTF_8);
assertEquals("The content in the strings should match", file1, file2);
Or if you want to compare individual lines:
List<String> file1 = Files.readAllLines(Paths.get("Path to File 1"));
List<String> file2 = Files.readAllLines(Paths.get("Path to File 2"));
assertEquals(file1.size(), file2.size());
for(int i = 0; i < file1.size(); i++) {
System.out.println("Comparing line: " + i);
assertEquals(file1.get(i), file2.get(i));
}

I'd suggest using Assert.assertThat and a hamcrest matcher (junit 4.5 or later - perhaps even 4.4).
I'd end up with something like:
assertThat(fileUnderTest, containsExactText(expectedFile));
where my matcher is:
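class FileMatcher {
    static Matcher<File> containsExactText(final File expectedFile) {
        return new TypeSafeMatcher<File>() {
            private String failure; // set when matchesSafely finds a difference

            @Override
            public boolean matchesSafely(File underTest) {
                // Create a reader for each file and compare line by line.
                try (BufferedReader expected = new BufferedReader(new FileReader(expectedFile));
                     BufferedReader actual = new BufferedReader(new FileReader(underTest))) {
                    String line;
                    int lineNumber = 0;
                    while ((line = expected.readLine()) != null) {
                        lineNumber++;
                        String actualLine = actual.readLine();
                        if (!line.equals(actualLine)) {
                            failure = "line " + lineNumber + " was <" + actualLine + "> instead of <" + line + ">";
                            return false;
                        }
                    }
                    // Record a failure for uneven line counts.
                    if (actual.readLine() != null) {
                        failure = "had more lines than the expected file";
                        return false;
                    }
                    return true;
                } catch (IOException e) {
                    failure = "could not be read: " + e;
                    return false;
                }
            }

            @Override
            public void describeTo(Description description) {
                description.appendText("a file with the same text as ").appendValue(expectedFile);
            }

            @Override
            protected void describeMismatchSafely(File underTest, Description mismatch) {
                mismatch.appendText(failure); // this hook exists in Hamcrest 1.3 and later
            }
        };
    }
}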
Matcher pros:
Composition and reuse
Use in normal code as well as test
Collections
Used in mock framework(s)
Can be used as a general predicate function
Really nice log-ability
Can be combined with other matchers and descriptions and failure descriptions are accurate and precise
Cons:
Well it's pretty obvious right? This is way more verbose than assert or junitx (for this particular case)
You'll probably need to include the hamcrest libs to get the most benefit

FileUtils sure is a good one. Here's yet another simple approach for checking if the files are exactly the same.
assertEquals(FileUtils.checksumCRC32(file1), FileUtils.checksumCRC32(file2));
While assertEquals() does provide a little more feedback than assertTrue(), the result of checksumCRC32() is a long, so the failure message may not be intrinsically helpful.
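For reference, the same kind of checksum can be computed without Commons IO. The following is a minimal sketch using java.util.zip from the JDK; the class and method names are made up for illustration. Keep in mind that equal checksums make equal content very likely but do not strictly prove it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

class ChecksumSketch {
    /** Computes the CRC-32 of a file by streaming it through a CheckedInputStream. */
    static long crc32(Path file) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(Files.newInputStream(file), new CRC32())) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Nothing to do here: reading through the stream updates the checksum.
            }
            return in.getChecksum().getValue();
        }
    }
}
```

A test would then compare crc32(path1) with crc32(path2) in an assertEquals.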

If expected has more lines than actual, you'll fail an assertEquals before getting to the assertNull later.
It's fairly easy to fix though:
public static void assertReaders(BufferedReader expected,
BufferedReader actual) throws IOException {
String expectedLine;
while ((expectedLine = expected.readLine()) != null) {
String actualLine = actual.readLine();
assertNotNull("Expected had more lines than the actual.", actualLine);
assertEquals(expectedLine, actualLine);
}
assertNull("Actual had more lines than the expected.", actual.readLine());
}
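If the line number of the first difference is also of interest, the same loop can return a description instead of asserting directly. This is only a sketch with illustrative names; it has no JUnit dependency, so it can be wrapped in whatever assertion the test framework offers.

```java
import java.io.BufferedReader;
import java.io.IOException;

class ReaderDiffSketch {
    /** Returns null when both readers yield the same lines, otherwise a description of the first difference. */
    static String firstDifference(BufferedReader expected, BufferedReader actual) throws IOException {
        int lineNumber = 0;
        while (true) {
            lineNumber++;
            String expectedLine = expected.readLine();
            String actualLine = actual.readLine();
            if (expectedLine == null && actualLine == null) {
                return null; // both ended at the same time without a difference
            }
            if (expectedLine == null) {
                return "line " + lineNumber + ": actual has an extra line <" + actualLine + ">";
            }
            if (actualLine == null) {
                return "line " + lineNumber + ": actual ended, expected <" + expectedLine + ">";
            }
            if (!expectedLine.equals(actualLine)) {
                return "line " + lineNumber + ": expected <" + expectedLine + "> but was <" + actualLine + ">";
            }
        }
    }
}
```

In JUnit, assertNull(firstDifference(expected, actual)) would then fail with the description in the message.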

This is my own implementation of equalFiles, no need to add any library to your project.
private static boolean equalFiles(String expectedFileName,
String resultFileName) {
boolean equal;
BufferedReader bExp;
BufferedReader bRes;
String expLine ;
String resLine ;
equal = false;
bExp = null ;
bRes = null ;
try {
bExp = new BufferedReader(new FileReader(expectedFileName));
bRes = new BufferedReader(new FileReader(resultFileName));
if ((bExp != null) && (bRes != null)) {
expLine = bExp.readLine() ;
resLine = bRes.readLine() ;
equal = ((expLine == null) && (resLine == null)) || ((expLine != null) && expLine.equals(resLine)) ;
while(equal && expLine != null)
{
expLine = bExp.readLine() ;
resLine = bRes.readLine() ;
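// Null-safe comparison: at end of file expLine is null, and calling equals() on it would throw.
equal = ((expLine == null) && (resLine == null)) || ((expLine != null) && expLine.equals(resLine)) ;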
}
}
} catch (Exception e) {
} finally {
try {
if (bExp != null) {
bExp.close();
}
if (bRes != null) {
bRes.close();
}
} catch (Exception e) {
}
}
return equal;
}
To use it, just call the regular JUnit assertTrue method:
assertTrue(equalFiles(expected, output)) ;

Related

finding character count between two special symbols

I am trying to find the character count between = and the \n newline character using the Java code below, but \n is not matched in my case.
I am using the org.apache.commons.lang3.StringUtils class.
Here is my Java code.
public class CharCountInLine {
public static void main(String[] args)
{
BufferedReader reader = null;
try
{
reader = new BufferedReader(new FileReader("C:\\wordcount\\sample.txt"));
String currentLine = reader.readLine();
String[] line = currentLine.split("=");
while (currentLine != null ){
String res = StringUtils.substringBetween(currentLine, "=", "\n"); // \n is not working.
if(res != null) {
System.out.println("line -->"+res.length());
}
currentLine = reader.readLine();
}
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
try
{
reader.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
}
Please find my sample text file.
sample.txt
Karthikeyan=123456
sathis= 23546
Arun = 23564
Well, you're reading the string using readLine(), which according to the Javadoc (emphasis mine):
Returns:
A String containing the contents of the line, not including
any line-termination characters, or null if the end of the stream has
been reached
So your code doesn't work because the string does not contain a newline character.
You can address this in a number of ways:
Use StringUtils.substringAfter() instead of StringUtils.substringBetween().
If it meets the requirements, treat your file as a Java properties file so you don't need to parse it yourself.
Use String.split().
Use String.lastIndexOf().
Some simple regex matching and grouping.
You don't need to change how you read the lines, simply change your logic to extract the text after =.
Pattern p = Pattern.compile("(?:.+)=(.+)$");
Matcher m = p.matcher("Karthikeyan=123456");
if (m.find()) {
System.out.println(m.group(1).length());
}
No need for Apache StringUtils either, simple Java regex will do. If you don't want to count whitespace, trim the string before calling length().
Alternatively, you can also split the line around = as discussed here.
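A minimal sketch of the split variant, using the sample lines from the question. The class and method names are hypothetical, and returning -1 for lines without = is just one possible convention.

```java
class AfterEqualsSketch {
    /** Returns the number of characters after the first '=', ignoring surrounding whitespace, or -1 if the line has no '='. */
    static int lengthAfterEquals(String line) {
        String[] parts = line.split("=", 2); // limit 2: split at the first '=' only
        if (parts.length < 2) {
            return -1;
        }
        return parts[1].trim().length();
    }

    public static void main(String[] args) {
        String[] sample = {"Karthikeyan=123456", "sathis= 23546", "Arun = 23564"};
        for (String line : sample) {
            System.out.println(line + " --> " + lengthAfterEquals(line));
        }
    }
}
```

Drop the trim() call if whitespace after = should be counted.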
Much simpler code with Java 8 streams:
Path p = Paths.get("C:\\wordcount\\sample.txt");
try (Stream<String> lines = Files.lines(p)) {
    lines.forEach(line -> {
        // Put the above code here
    });
}

Java compare strings from two places and exclude any matches

I'm trying to end up with a results.txt minus any matching items, having successfully compared some string inputs against another .txt file. Been staring at this code for way too long and I can't figure out why it isn't working. New to coding so would appreciate it if I could be steered in the right direction! Maybe I need a different approach? Apologies in advance for any loud tutting noises you may make. Using Java8.
//Sending a String[] into 'searchFile', contains around 8 small strings.
//Example of input: String[]{"name1","name2","name 3", "name 4.zip"}
^ This is my exclusions list.
public static void searchFile(String[] arr, String separator)
{
StringBuilder b = new StringBuilder();
for(int i = 0; i < arr.length; i++)
{
if(i != 0) b.append(separator);
b.append(arr[i]);
String findME = arr[i];
searchInfo(MyApp.getOptionsDir()+File.separator+"file-to-search.txt",findME);
}
}
^This works fine. I'm then sending the results to 'searchInfo' and trying to match and remove any duplicate (complete, not part) strings. This is where I am currently failing. Code runs but doesn't produce my desired output. It often finds part strings rather than complete ones. I think the 'results.txt' file is being overwritten each time...but I'm not sure tbh!
file-to-search.txt contains: "name2","name.zip","name 3.zip","name 4.zip" (text file is just a single line)
public static String searchInfo(String fileName, String findME)
{
StringBuffer sb = new StringBuffer();
try {
BufferedReader br = new BufferedReader(new FileReader(fileName));
String line = null;
while((line = br.readLine()) != null)
{
if(line.startsWith("\""+findME+"\""))
{
sb.append(line);
//tried various replace options with no joy
line = line.replaceFirst(findME+"?,", "");
//then goes off with results to create a txt file
FileHandling.createFile("results.txt",line);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return sb.toString();
}
What I'm trying to end up with is a result file MINUS any matching complete strings (not partial strings):
e.g. results.txt to end up with: "name.zip","name 3.zip"
With the information given, here is one thing you can do:
List<String> result = new ArrayList<>();
String content = FileUtils.readFileToString(file, "UTF-8");
for (String s : content.split(", ")) {
if (!s.equals(findME)) { // assuming both have string quotes added already
result.add(s);
}
}
FileUtils.write(newFile, String.join(", ", result), "UTF-8");
This uses Apache Commons IO's FileUtils for convenience. Adjust the separator (with or without a space after the comma) to match your file.
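The same idea can be sketched with the JDK alone, checking every entry against the whole exclusion list in one pass. The names below are illustrative, and the sketch assumes the names themselves contain no commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExclusionSketch {
    /** Removes every quoted entry whose unquoted text is exactly one of the exclusions. */
    static String removeExcluded(String line, String[] exclusions) {
        Set<String> excluded = new HashSet<>(Arrays.asList(exclusions));
        List<String> kept = new ArrayList<>();
        for (String token : line.split(",")) {
            String entry = token.trim();
            // Strip the surrounding quotes only for the comparison.
            boolean quoted = entry.length() >= 2 && entry.startsWith("\"") && entry.endsWith("\"");
            String bare = quoted ? entry.substring(1, entry.length() - 1) : entry;
            if (!excluded.contains(bare)) {
                kept.add(entry); // keep the entry as it appeared, quotes included
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String line = "\"name2\",\"name.zip\",\"name 3.zip\",\"name 4.zip\"";
        String[] exclusions = {"name1", "name2", "name 3", "name 4.zip"};
        System.out.println(removeExcluded(line, exclusions));
    }
}
```

Because the comparison uses whole entries, "name 3" does not remove "name 3.zip". Writing the result once, after all exclusions have been applied, also avoids overwriting results.txt on every call.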

JAVA: Getting the content of specific strings from text files

I have a text file like this:
text
text
text
.
.
#data
instances1
instances2
.
.
instancesN
I want to get the contents of this file from #data until the end of the file. How can I do that?
I found this FileUtils method (from Apache Commons IO), but it is usable only if I already know the line number.
String ln = FileUtils.readLines(new File("arff_file/"+results.get(0)))
.get(lineNumber);
Since you are using Apache Commons, you can do it in one line:
String contents = FileUtils.readFileToString(new File("arff_file/"+results.get(0)), "UTF-16").replaceAll("(?s)^.*?(?=#data)", "");
This works by
reading the whole file into a single String
using regex-based replaceAll() to remove (by replacing with an empty string) everything up to, but not including, #data
The regex breakdown of (?s)^.*?(?=#data) is:
(?s) the DOTALL flag, so that . also matches line terminators; without it the match cannot span multiple lines
^ start of input
.*? a reluctantly quantified wildcard
(?=#data) a positive (non-consuming) look ahead that asserts that the next input is #data
The reluctant quantifier matters: it makes the match stop at the first #data, in case it appears more than once in the input.
public static void main(String[] args) {
    String file = "fileName";
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            if (line.equals("#data")) {
                // Hand the reader over so the rest of the file is read in one pass;
                // you could set a boolean flag instead.
                ArrayList<String> data = nowRead(br);
                System.out.println(data);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
static ArrayList<String> nowRead(BufferedReader br) throws IOException {
ArrayList<String> s = new ArrayList<String>();// do it as you wish
String line;
while ((line = br.readLine()) != null) {
s.add(line);
}
return s;
}
Path start = Paths.get("test.txt");
try
{
List<String> lines = Files.readAllLines(start);
for (Iterator<String> it = lines.iterator(); it.hasNext();)
{
String line = it.next();
if (!"#data".equals(line.trim()))
{
it.remove();
}
else
{
break;
}
}
System.out.println(lines);
}
catch (IOException e)
{
e.printStackTrace();
}
I was reading about Path online, so why not something like this as an alternative to Bohemian's code?
Maybe something could also be done with Java 8 streams, but I have not found a way yet.
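On the streams idea: from Java 9 onward, Stream.dropWhile makes this fairly direct. The following is a sketch under that assumption (it will not compile on Java 8), with illustrative names, and it keeps the #data line itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FromMarkerSketch {
    /** Returns all lines from the first line equal to the marker (inclusive) to the end of the file. */
    static List<String> linesFrom(Path file, String marker) throws IOException {
        try (Stream<String> lines = Files.lines(file)) { // Files.lines reads UTF-8 by default
            return lines
                    .dropWhile(line -> !marker.equals(line.trim())) // Java 9+
                    .collect(Collectors.toList());
        }
    }
}
```

Appending .skip(1) after dropWhile would leave out the marker line, and an empty list is returned when the marker never appears.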

Iterating over the content of a text file line by line - is there a best practice? (vs. PMD's AssignmentInOperand)

We have a Java Application that has a few modules that know to read text files. They do it quite simply with a code like this:
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
while ((line = br.readLine()) != null)
{
... // do stuff to file here
}
I ran PMD on my project and got the 'AssignmentInOperand' violation on the while (...) line.
Is there a simpler way of doing this loop other than the obvious:
String line = br.readLine();
while (line != null)
{
... // do stuff to file here
line = br.readLine();
}
Is this considered a better practice? (although we "duplicate" the line = br.readLine() code?)
I know this is an old post, but I just had (almost) the same need and solved it using a LineIterator from FileUtils in Apache Commons IO.
From their javadoc:
LineIterator it = FileUtils.lineIterator(file, "UTF-8");
try {
while (it.hasNext()) {
String line = it.nextLine();
// do something with line
}
} finally {
it.close();
}
Check the documentation:
http://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/LineIterator.html
The support for streams and lambdas in Java 8 and try-with-resources in Java 7 allows you to achieve what you want with more compact syntax.
Path path = Paths.get("c:/users/aksel/aksel.txt");
try (Stream<String> lines = Files.lines(path)) {
lines.forEachOrdered(line->System.out.println(line));
} catch (IOException e) {
//error happened
}
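When the input is an arbitrary Reader rather than a file path, BufferedReader.lines() (also Java 8) gives the same style without an assignment inside the loop condition. A small sketch with illustrative names; the blank-line filter is only there to show some per-line processing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

class ReaderLinesSketch {
    /** Collects the non-blank lines of any Reader. */
    static List<String> nonBlankLines(Reader source) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            return br.lines()
                    .filter(line -> !line.trim().isEmpty())
                    .collect(Collectors.toList());
        }
    }
}
```

Note that a read error inside lines() surfaces as an UncheckedIOException rather than an IOException.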
I generally prefer the former. I don't generally like side-effects within a comparison, but this particular example is an idiom which is so common and so handy that I don't object to it.
(In C# there's a nicer option: a method to return an IEnumerable<string> which you can iterate over with foreach; that isn't as nice in Java because there's no auto-dispose at the end of an enhanced for loop... and also because you can't throw IOException from the iterator, which means you can't just make one a drop-in replacement for the other.)
To put it another way: the duplicate line issue bothers me more than the assignment-within-operand issue. I'm used to taking in this pattern at a glance - with the duplicate line version I need to stop and check that everything's in the right place. That's probably habit as much as anything else, but I don't think it's a problem.
I routinely use the while((line = br.readLine()) != null) construct... but recently I came across this nice alternative:
BufferedReader br = new BufferedReader(new FileReader(file));
for (String line = br.readLine(); line != null; line = br.readLine()) {
... // do stuff to file here
}
This is still duplicating the readLine() call code, but the logic is clear, etc.
The other time I use the while(( ... ) ...) construct is when reading from a stream in to a byte[] array...
byte[] buffer = new byte[size];
InputStream is = .....;
int len = 0;
while ((len = is.read(buffer)) >= 0) {
....
}
This can also be transformed in to a for loop with:
byte[] buffer = new byte[size];
InputStream is = .....;
for (int len = is.read(buffer); len >= 0; len = is.read(buffer)) {
....
}
I am not sure I really prefer the for-loop alternatives.... but, it will satisfy any PMD tool, and the logic is still clear, etc.
Based on Jon's answer I got to thinking it should be easy enough to create a decorator to act as a file iterator so you can use a foreach loop:
public class BufferedReaderIterator implements Iterable<String> {
private BufferedReader r;
public BufferedReaderIterator(BufferedReader r) {
this.r = r;
}
@Override
public Iterator<String> iterator() {
return new Iterator<String>() {
@Override
public boolean hasNext() {
try {
r.mark(1);
if (r.read() < 0) {
return false;
}
r.reset();
return true;
} catch (IOException e) {
return false;
}
}
@Override
public String next() {
try {
return r.readLine();
} catch (IOException e) {
return null;
}
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
};
}
}
Fair warning: this suppresses IOExceptions that might occur during reads and simply stops the reading process. It's unclear that there's a way around this in Java without throwing runtime exceptions as the semantics of the iterator methods are well defined and must be conformed to in order to use the for-each syntax. Also, running multiple iterators here would have some strange behavior; so I'm not sure this is recommended.
I did test this, though, and it does work.
Anyway, you get the benefit of for-each syntax using this as a kind of decorator:
for(String line : new BufferedReaderIterator(br)){
// do some work
}
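If throwing a runtime exception is acceptable, Java 8's UncheckedIOException is one way to avoid silently swallowing read errors. The variant below is a sketch, not a drop-in replacement: it reads one line ahead instead of using mark/reset, the class name is made up, and like the original it supports only a single pass over the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class LineIterable implements Iterable<String> {
    private final BufferedReader reader;

    LineIterable(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;     // the line read ahead, or null at end of input
            private boolean fetched; // whether 'next' currently holds a read-ahead result

            private void fetch() {
                if (!fetched) {
                    try {
                        next = reader.readLine();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e); // surface the error instead of hiding it
                    }
                    fetched = true;
                }
            }

            @Override
            public boolean hasNext() {
                fetch();
                return next != null;
            }

            @Override
            public String next() {
                fetch();
                if (next == null) {
                    throw new NoSuchElementException();
                }
                fetched = false; // force a new read on the following call
                return next;
            }
        };
    }
}
```

Callers that care about I/O failures can catch UncheckedIOException around the for-each loop and unwrap it with getCause().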
Google's Guava Libraries provide an alternative solution using the static method CharStreams.readLines(Readable, LineProcessor<T>) with an implementation of LineProcessor<T> for processing each line.
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
CharStreams.readLines(br, new MyLineProcessorImpl());
} catch (IOException e) {
// handling io error ...
}
The body of the while loop is now placed in the LineProcessor<T> implementation.
class MyLineProcessorImpl implements LineProcessor<Object> {
@Override
public boolean processLine(String line) throws IOException {
boolean shouldContinue = true; // replace with a check of whether processing should continue
if (shouldContinue) {
// do sth. with line
return true;
} else {
// stop processing
return false;
}
}
@Override
public Object getResult() {
// return a result based on processed lines if needed
return new Object();
}
}
I'm a bit surprised the following alternative was not mentioned:
while( true ) {
String line = br.readLine();
if ( line == null ) break;
... // do stuff to file here
}
Before Java 8 it was my favorite because of its clarity and not requiring repetition. IMO, break is a better option to expressions with side-effects. It's still a matter of idioms, though.
AssignmentInOperand is a controversial rule in PMD. The stated reason for the rule is: "this can make code more complicated and harder to read" (see http://pmd.sourceforge.net/rules/controversial.html).
You could disable that rule if you really want to do it that way. Personally, I prefer the former.

Modify a .txt file in Java

I have a text file that I want to edit using Java. It has many thousands of lines. I basically want to iterate through the lines and change/edit/delete some text. This will need to happen quite often.
From the solutions I saw on other sites, the general approach seems to be:
Open the existing file using a BufferedReader
Read each line, make modifications to each line, and add it to a StringBuilder
Once all the text has been read and modified, write the contents of the StringBuilder to a new file
Replace the old file with the new file
This solution seems slightly "hacky" to me, especially if I have thousands of lines in my text file.
Anybody know of a better solution?
I haven't done this in Java recently, but reading an entire file into memory seems like a bad idea.
The best idea that I can come up with is open a temporary file in writing mode at the same time, and for each line, read it, modify if necessary, then write into the temporary file. At the end, delete the original and rename the temporary file.
If you have modify permissions on the file system, you probably also have deleting and renaming permissions.
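A rough sketch of that temporary-file approach using java.nio.file. The class name, the UnaryOperator callback, and the convention that null means "delete this line" are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

class RewriteSketch {
    /** Streams the file through editLine; a null result drops the line. */
    static void rewrite(Path file, UnaryOperator<String> editLine) throws IOException {
        // Create the temporary file next to the original so the final move stays on one file system.
        Path dir = file.toAbsolutePath().getParent();
        Path temp = Files.createTempFile(dir, "rewrite", ".tmp");
        try {
            try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String edited = editLine.apply(line);
                    if (edited != null) {
                        out.write(edited);
                        out.newLine();
                    }
                }
            }
            // Replace the original only after the copy has been written completely.
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp); // no-op after a successful move
        }
    }
}
```

Only one line is held in memory at a time, so the size of the file does not matter much.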
If the file is just a few thousand lines, you should be able to read the entire file in one read and convert it to a String.
You can use Apache Commons IO's IOUtils, or helper methods like the following.
public static String readFile(String filename) throws IOException {
    File file = new File(filename);
    int len = (int) file.length();
    byte[] bytes = new byte[len];
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        // Read outside of an assert: assertions are disabled by default,
        // so a read placed inside one would be skipped entirely.
        int off = 0;
        while (off < len) {
            int n = fis.read(bytes, off, len - off);
            if (n < 0) {
                throw new EOFException("Unexpected end of file: " + filename);
            }
            off += n;
        }
    } finally {
        close(fis); // close on success as well as on failure
    }
    return new String(bytes, "UTF-8");
}
public static void writeFile(String filename, String text) throws IOException {
    FileOutputStream fos = null;
    try {
        fos = new FileOutputStream(filename);
        fos.write(text.getBytes("UTF-8"));
    } finally {
        close(fos);
    }
}
public static void close(Closeable closeable) {
    if (closeable == null) {
        return; // the stream was never opened
    }
    try {
        closeable.close();
    } catch (IOException ignored) {
    }
}
You can use RandomAccessFile in Java to modify the file on one condition:
The size of each line has to be fixed; otherwise, when the new string is written back, it might overwrite part of the next line.
Therefore, in my example, I set the line length to 100 and pad each line with spaces when creating the file and writing back to it.
So in order to allow updates, you need to set the line length a little larger than the longest line in this file.
public class RandomAccessFileUtil {
public static final long RECORD_LENGTH = 100;
public static final String EMPTY_STRING = " ";
public static final String CRLF = "\n";
public static final String PATHNAME = "/home/mjiang/JM/mahtew.txt";
/**
 * Sample file content:
 *
 * one two three
 * Text to be appended with
 * five six seven
 * eight nine ten
 *
 * @param args
 * @throws IOException
 */
public static void main(String[] args) throws IOException
{
String starPrefix = "Text to be appended with";
String replacedString = "new text has been appended";
RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
String line = "";
while((line = file.readLine()) != null)
{
if(line.startsWith(starPrefix))
{
file.seek(file.getFilePointer() - RECORD_LENGTH - 1);
file.writeBytes(replacedString);
}
}
}
public static void createFile() throws IOException
{
RandomAccessFile file = new RandomAccessFile(new File(PATHNAME), "rw");
String line1 = "one two three";
String line2 = "Text to be appended with";
String line3 = "five six seven";
String line4 = "eight nine ten";
file.writeBytes(paddingRight(line1));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line2));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line3));
file.writeBytes(CRLF);
file.writeBytes(paddingRight(line4));
file.writeBytes(CRLF);
file.close();
System.out.println(String.format("File is created in [%s]", PATHNAME));
}
public static String paddingRight(String source)
{
StringBuilder result = new StringBuilder(100);
if(source != null)
{
result.append(source);
for (int i = 0; i < RECORD_LENGTH - source.length(); i++)
{
result.append(EMPTY_STRING);
}
}
return result.toString();
}
}
If the file is large, you might want to use a FileOutputStream for output, but that seems like pretty much the simplest process for what you're asking (and without more specifics, i.e. on what types of changes, edits, or deletions you're trying to make, it's impossible to determine what more complicated way might work).
No reason to buffer the entire file.
Simply write each line as your read it, insert lines when necessary, delete lines when necessary, replace lines when necessary.
Fundamentally, you will not get around having to recreate the file wholesale, especially if it's just a text file.
What kind of data is it? Do you control the format of the file?
If the file contains name/value pairs (or similar), you could have some luck with Properties, or perhaps cobbling together something using a flat file JDBC driver.
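As a sketch of the Properties route, assuming the file really is in name=value form (the class and method names are illustrative):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class PropertiesSketch {
    /** Loads the file, sets one key, and writes the whole file back. */
    static void setValue(Path file, String key, String value) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        props.setProperty(key, value);
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, null); // null: no header comment (a timestamp line is still written)
        }
    }
}
```

Be aware that store() does not preserve the original line order or any comments in the file.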
Alternatively, have you considered not writing the data so often? Operating on an in-memory copy of your file should be relatively trivial. If there are no external resources which need real time updates of the file, then there is no need to go to disk every time you want to make a modification. You can run a scheduled task to write periodic updates to disk if you are worried about data backup.
In general you cannot edit the file in place; it's simply a very long sequence of characters, which happens to include newline characters. You could edit in place if your changes don't change the number of characters in each line.
Can't you use regular expressions, if you know what you want to change? Jakarta Regexp should probably do the trick.
Although this question was posted some time ago, I think it is worth adding my answer here.
I think the best approach in this scenario is to use FileChannel from the java.nio.channels package, but only if you need good performance. You get a FileChannel via a RandomAccessFile, like this:
java.nio.channels.FileChannel channel = new java.io.RandomAccessFile("/my/fyle/path", "rw").getChannel();
After this, you need to create a ByteBuffer into which you read from the FileChannel.
It looks something like this:
java.nio.ByteBuffer inBuffer = java.nio.ByteBuffer.allocate(100);
long pos = 0;
int read;
StringBuilder sb = new StringBuilder();
while ((read = channel.read(inBuffer, pos)) != -1) {
    pos += read;
    byte[] b = inBuffer.array();
    sb.setLength(0);
    // Append only the bytes actually read; the cast assumes single-byte characters.
    for (int i = 0; i < read; ++i) {
        sb.append((char) b[i]);
    }
    // here you can do your stuff on sb
    inBuffer.clear(); // reuse the buffer for the next chunk
}
Hope that my answer will help you!
I think FileOutputStream.getChannel() will help a lot; see the FileChannel API:
http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html
private static void modifyFile(String filePath, String oldString, String newString) {
File fileToBeModified = new File(filePath);
StringBuilder oldContent = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(fileToBeModified))) {
String line = reader.readLine();
while (line != null) {
oldContent.append(line).append(System.lineSeparator());
line = reader.readLine();
}
String content = oldContent.toString();
// Note: replaceAll treats oldString as a regular expression; use replace() for literal text.
String newContent = content.replaceAll(oldString, newString);
try (FileWriter writer = new FileWriter(fileToBeModified)) {
writer.write(newContent);
}
} catch (IOException e) {
e.printStackTrace();
}
}
You can change the .txt file to a Java file by choosing "Save As" and saving it with a .java extension.
