I am just starting out with Java and TestNG test cases.
I need to write a class that reads data from a file, builds an in-memory data structure (DS), and uses that data structure for further processing. I would like to test whether this DS is being populated correctly. This would call for dumping the DS into a file and then comparing the input file with the dumped file. Is there a TestNG assert available for file matching? Is this a common practice?
I think it would be better to compare the data itself, not the written-out data.
So I would write a method in the class to return this data structure (let's call it getDataStructure()) and then write a unit test to compare it with the correct data.
This only requires a correct equals() method in your data structure class; then you can do:
Assert.assertEquals(yourClass.getDataStructure(), correctData);
Of course if you need to write out the data structure to a file, then you can test the serialization and deserialization separately.
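To make the equals()-based comparison concrete, here is a minimal sketch. RecordTable, fromLines and the "skip blank lines" rule are made-up names and assumptions for illustration, not anything from the question; the point is only that once equals() and hashCode() are defined on the structure, two instances can be handed straight to Assert.assertEquals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory structure built from the lines of an input file.
final class RecordTable {
    private final List<String> rows = new ArrayList<>();

    // Builds the structure from already-read lines; skipping blank lines
    // is an assumed parsing rule, used here only to have something to test.
    static RecordTable fromLines(List<String> lines) {
        RecordTable table = new RecordTable();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                table.rows.add(trimmed);
            }
        }
        return table;
    }

    // Two tables are equal when they hold the same rows in the same order.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof RecordTable)) return false;
        return rows.equals(((RecordTable) other).rows);
    }

    // Kept consistent with equals(), as the Object contract requires.
    @Override
    public int hashCode() {
        return Objects.hash(rows);
    }
}
```

In a test, the expected instance would be built by hand and the actual one by the class under test, and the two compared with the assert shown above.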
File compare/matching can be extracted to a utility method or something like that.
If you need it only for testing, there are add-ons for JUnit:
http://junit-addons.sourceforge.net/junitx/framework/FileAssert.html
If you need to compare files outside the testing environment, you can use this simple function:
public static boolean fileContentEquals(String filePathA, String filePathB) throws Exception {
if (!compareFilesLength(filePathA, filePathB)) return false;
BufferedInputStream streamA = null;
BufferedInputStream streamB = null;
try {
File fileA = new File(filePathA);
File fileB = new File(filePathB);
streamA = new BufferedInputStream(new FileInputStream(fileA));
streamB = new BufferedInputStream(new FileInputStream(fileB));
int chunkSizeInBytes = 16384;
byte[] bufferA = new byte[chunkSizeInBytes];
byte[] bufferB = new byte[chunkSizeInBytes];
int totalReadBytes = 0;
while (totalReadBytes < fileA.length()) {
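int readBytes = streamA.read(bufferA);
if (readBytes == -1) break; // end of stream A
// Read exactly as many bytes from B as were read from A;
// a single read() call may return fewer bytes than requested.
int readBytesB = 0;
while (readBytesB < readBytes) {
int n = streamB.read(bufferB, readBytesB, readBytes - readBytesB);
if (n == -1) return false; // stream B ended early
readBytesB += n;
}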
MessageDigest digestA = MessageDigest.getInstance(CHECKSUM_ALGORITHM);
MessageDigest digestB = MessageDigest.getInstance(CHECKSUM_ALGORITHM);
digestA.update(bufferA, 0, readBytes);
digestB.update(bufferB, 0, readBytes);
if (!MessageDigest.isEqual(digestA.digest(), digestB.digest()))
{
closeStreams(streamA, streamB);
return false;
}
totalReadBytes += readBytes;
}
closeStreams(streamA, streamB);
return true;
} finally {
closeStreams(streamA, streamB);
}
}
public static void closeStreams(Closeable ...streams) {
for (int i = 0; i < streams.length; i++) {
Closeable stream = streams[i];
closeStream(stream);
}
}
public static boolean compareFilesLength(String filePathA, String filePathB) {
File fileA = new File(filePathA);
File fileB = new File(filePathB);
return fileA.length() == fileB.length();
}
private static void closeStream(Closeable stream) {
try {
if (stream != null) stream.close(); // streams may still be null if opening failed
} catch (IOException e) {
// ignore exception
}
}
Your choice, but having a utility class with that functionality that can be reused is better, IMHO.
Good luck and have fun.
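If a checksum comparison is what you are after, it may be simpler to digest each file once, one after the other, and compare the two results at the end instead of hashing chunk by chunk. The following is only a sketch under assumptions: FileDigestCompare is a made-up class name and SHA-256 is my choice of algorithm, not something the code above prescribes. Equal digests make equal content overwhelmingly likely but do not strictly prove it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileDigestCompare {
    // Streams the file through a MessageDigest without loading it whole.
    static byte[] digestOf(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[16 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // only the bytes actually read
            }
        }
        return digest.digest();
    }

    // Reads file a completely, then file b, so the two reads never interleave.
    static boolean sameDigest(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digestOf(a, "SHA-256"), digestOf(b, "SHA-256"));
    }
}
```

A side benefit is that a digest can be stored and reused when one file has to be compared against many others.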
Personally I would do the opposite. Surely you need a way to compare two of these data structures in the Java world, so the test would read from the file, build the DS, do its processing, and then assert that it is equal to an "expected" DS you set up in your test.
(using JUnit4)
@Test
public void testProcessingDoesWhatItShould() {
final DataStructure original = readFromFile(filename);
final DataStructure actual = doTheProcessingYouNeedToDo(original);
final DataStructure expected = generateMyExpectedResult();
Assert.assertEquals("data structure", expected, actual);
}
If this DS is a simple Java Bean, then you can use EqualsBuilder from Apache Commons to compare two objects.
Compare the bytes loaded from the file system with the bytes you are going to write to the file system.
Pseudo code:
byte[] loadedBytes = loadFileContentFromFile(file); // maybe Apache Commons IOUtils.toByteArray(InputStream input)
byte[] writeBytes = constructBytesFromDataStructure(dataStructure);
Assert.assertTrue(java.util.Arrays.equals(writeBytes, loadedBytes));
Related
How would you write a Java function boolean sameContent(Path file1, Path file2) which determines if the two given paths point to files which store the same content? Of course, first, I would check if the file sizes are the same. This is a necessary condition for storing the same content. But then I'd like to hear your approaches. If the two files are stored on the same hard drive (as in most of my cases), it's probably not the best way to jump too many times between the two streams.
This is exactly what the FileUtils.contentEquals method of Apache Commons IO does.
Try something like:
File file1 = new File("file1.txt");
File file2 = new File("file2.txt");
boolean isTwoEqual = FileUtils.contentEquals(file1, file2);
It does the following checks before actually doing the comparison:
Both files exist.
Both paths refer to files, not directories.
The lengths in bytes are the same (files of different lengths cannot have equal content).
The two paths do not refer to one and the same file.
Then it compares the contents.
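For readers who want to see what such pre-checks look like without the library, here is a rough sketch using only the JDK. It is not the Commons IO source, just my reading of the behaviour described above; ContentEqualsSketch is a made-up name, and reading both files fully at the end is a simplification that only suits small files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

final class ContentEqualsSketch {
    static boolean contentEquals(File a, File b) throws IOException {
        // Existence: one missing file means "different"; if both are
        // missing this sketch treats them as equal.
        boolean aExists = a.exists();
        if (aExists != b.exists()) return false;
        if (!aExists) return true;

        // Only regular files can be compared, not directories.
        if (a.isDirectory() || b.isDirectory()) {
            throw new IOException("Can't compare directories, only files");
        }

        // Different lengths can never have equal content.
        if (a.length() != b.length()) return false;

        // Same underlying file: nothing to read at all.
        if (a.getCanonicalFile().equals(b.getCanonicalFile())) return true;

        // Finally compare the bytes (simplified: whole files in memory).
        return Arrays.equals(Files.readAllBytes(a.toPath()),
                             Files.readAllBytes(b.toPath()));
    }
}
```

The cheap checks run first so that the expensive byte comparison happens only when it can still change the answer.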
If you don't want to use any external libraries, then simply read the files into byte arrays and compare them using Arrays.equals (this won't work before Java 7; file1 and file2 are Path objects here):
byte[] f1 = Files.readAllBytes(file1);
byte[] f2 = Files.readAllBytes(file2);
boolean same = Arrays.equals(f1, f2);
If the files are large, then instead of reading the entire files into arrays, you should use BufferedInputStream and read the files chunk-by-chunk as explained here.
Since Java 12 there is the method Files.mismatch, which returns -1 if there is no mismatch in the content of the files. Thus the function would look like the following:
private static boolean sameContent(Path file1, Path file2) throws IOException {
return Files.mismatch(file1, file2) == -1;
}
If the files are small, you can read both into the memory and compare the byte arrays.
If the files are not small, you can either compute the hashes of their content (e.g. MD5 or SHA-1) one after the other and compare the hashes (but this still leaves a very small chance of error), or you can compare their content but for this you still have to read the streams alternating.
Here is an example:
boolean sameContent(Path file1, Path file2) throws IOException {
final long size = Files.size(file1);
if (size != Files.size(file2))
return false;
if (size < 4096)
return Arrays.equals(Files.readAllBytes(file1), Files.readAllBytes(file2));
try (InputStream is1 = Files.newInputStream(file1);
InputStream is2 = Files.newInputStream(file2)) {
// Compare byte-by-byte.
// Note that this can be sped up drastically by reading large chunks
// (e.g. 16 KBs) but care must be taken as InputStream.read(byte[])
// does not necessarily read a whole array!
int data;
while ((data = is1.read()) != -1)
if (data != is2.read())
return false;
}
return true;
}
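The chunked variant hinted at in the comment could look roughly like this. It is a sketch, not a tuned implementation: ChunkedFileCompare and readFully are names I made up, and 16 KB is just the chunk size mentioned above. The helper loops until the buffer is full or the stream ends, which is the "care" that InputStream.read(byte[]) requires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedFileCompare {
    private static final int CHUNK = 16 * 1024;

    // Fills buf as far as possible. Returns the number of bytes read,
    // which is smaller than buf.length only when the stream has ended.
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) break;
            total += n;
        }
        return total;
    }

    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) return false;
        try (InputStream inA = Files.newInputStream(a);
             InputStream inB = Files.newInputStream(b)) {
            byte[] bufA = new byte[CHUNK];
            byte[] bufB = new byte[CHUNK];
            while (true) {
                int nA = readFully(inA, bufA);
                int nB = readFully(inB, bufB);
                if (nA != nB) return false; // one file ended before the other
                if (nA == 0) return true;   // both ended, no difference found
                for (int i = 0; i < nA; i++) {
                    if (bufA[i] != bufB[i]) return false;
                }
            }
        }
    }
}
```

Larger chunks also mean fewer switches between the two files, which addresses the concern about jumping between streams on the same drive.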
This should help you with your problem:
package test;
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
public class CompareFileContents {
public static void main(String[] args) throws IOException {
File file1 = new File("test1.txt");
File file2 = new File("test2.txt");
File file3 = new File("test3.txt");
boolean compare1and2 = FileUtils.contentEquals(file1, file2);
boolean compare2and3 = FileUtils.contentEquals(file2, file3);
boolean compare1and3 = FileUtils.contentEquals(file1, file3);
System.out.println("Are test1.txt and test2.txt the same? " + compare1and2);
System.out.println("Are test2.txt and test3.txt the same? " + compare2and3);
System.out.println("Are test1.txt and test3.txt the same? " + compare1and3);
}
}
If it is for a unit test, then AssertJ provides a method named hasSameContentAs. An example:
Assertions.assertThat(file1).hasSameContentAs(file2);
I know I'm pretty late to the party on this one, but memory-mapped IO is a pretty simple way to do this if you want to use straight Java APIs and no third-party dependencies. It's only a few calls to open the files, map them, and then use ByteBuffer.equals(Object) to compare them.
This is probably going to give you the best performance if you expect the particular file to be large because you're offloading a majority of the IO legwork onto the OS and the otherwise highly optimized bits of the JVM (assuming you're using a decent JVM).
Straight from the FileChannel JavaDoc:
For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
public class MemoryMappedCompare {
public static boolean areFilesIdenticalMemoryMapped(final Path a, final Path b) throws IOException {
try (final FileChannel fca = FileChannel.open(a, StandardOpenOption.READ);
final FileChannel fcb = FileChannel.open(b, StandardOpenOption.READ)) {
final MappedByteBuffer mbba = fca.map(FileChannel.MapMode.READ_ONLY, 0, fca.size());
final MappedByteBuffer mbbb = fcb.map(FileChannel.MapMode.READ_ONLY, 0, fcb.size());
return mbba.equals(mbbb);
}
}
}
It's compatible with JRE 6 and later, library-free, and doesn't read all the content at once.
public static boolean sameFile(File a, File b) {
if (a == null || b == null) {
return false;
}
if (a.getAbsolutePath().equals(b.getAbsolutePath())) {
return true;
}
if (!a.exists() || !b.exists()) {
return false;
}
if (a.length() != b.length()) {
return false;
}
boolean eq = true;
FileChannel channelA;
FileChannel channelB;
try {
channelA = new RandomAccessFile(a, "r").getChannel();
channelB = new RandomAccessFile(b, "r").getChannel();
long channelsSize = channelA.size();
ByteBuffer buff1 = channelA.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
ByteBuffer buff2 = channelB.map(FileChannel.MapMode.READ_ONLY, 0, channelsSize);
for (int i = 0; i < channelsSize; i++) {
if (buff1.get(i) != buff2.get(i)) {
eq = false;
break;
}
}
} catch (FileNotFoundException ex) {
Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(HotUtils.class.getName()).log(Level.SEVERE, null, ex);
}
return eq;
}
package test;
import org.junit.jupiter.api.Test;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import static org.junit.jupiter.api.Assertions.assertEquals;
public class CSVResultDIfference {
@Test
public void csvDifference() throws IOException {
Path file_F = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestX", "yolo2.csv");
long size_F = Files.size(file_F);
Path file_I = FileSystems.getDefault().getPath("C:\\Projekts\\csvTestZ", "yolo2.csv");
long size_I = Files.size(file_I);
assertEquals(size_F, size_I);
}
}
it worked for me :)
I built a classic Huffman coder, with an encoder and a decoder. I noticed a problem: I use a BitSet to hold the code when compressing the input file, but with the BitSet the decoder does not work for all the files I give it. For example, a txt file works great, but other files such as BMP do not.
Before I used BitSet the code worked, but without any compression, so I'm afraid the problem is with BitSet.
The decoder I built is:
public void Decompress(String[] input_names, String[] output_names) {
HuffmanVerticle tree = new HuffmanVerticle();
tree = readTreeFile(output_names);
restoreInput(tree, output_names, input_names);
}
public static void restoreInput(HuffmanVerticle tree, String[] binary_names, String[] original_names) {
BitSet huffmanCodeBit;
try {
FileOutputStream to_original = new FileOutputStream(original_names[0]);
FileInputStream binary = new FileInputStream(binary_names[0]);
ObjectInputStream s = new ObjectInputStream(binary);
huffmanCodeBit = (BitSet) s.readObject();
System.out.println(huffmanCodeBit.toString());
int index = 0;
while(huffmanCodeBit.length() > index)
{
HuffmanVerticle tmp = tree;
while (!tmp.isNullTree())
{
boolean bit = huffmanCodeBit.get(index);
index++;
System.out.println(bit);
if (!bit)
tmp = tmp.left;
else
tmp = tmp.right;
}
to_original.write(tmp.character);
}
binary.close();
to_original.close();
} catch (Exception e) {
e.printStackTrace();
}
}
What am I missing here? Why doesn't the code work for certain files? When I run it on some files, the decoded files that come back are unusable.
For BMP files the code does not finish at all, even after half an hour, whereas for txt files it runs very fast.
Thanks for your help.
I am trying to encode PDF documents to Base64. With a small number of documents (around 2,000) it works nicely, but I have more than 100k documents to encode.
It takes a long time to encode all those files. Is there a better approach for encoding a large data set?
Please find my current approach below:
String filepath=doc.getPath().concat(doc.getFilename());
file = new File(filepath);
if(file.exists() && !file.isDirectory()) {
try {
FileInputStream fileInputStreamReader = new FileInputStream(file);
byte[] bytes = new byte[(int) file.length()];
fileInputStreamReader.read(bytes);
encodedfile = new String(Base64.getEncoder().encodeToString(bytes));
fileInputStreamReader.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
Try this:
Figure out how many files you need to encode.
long files = Files.list(Paths.get(directory)).count();
Split them up into batches of a reasonable size that a thread can handle in Java. For example, if you have 100k files to encode, split them into 100 lists of 1,000, or something like that.
int currentIndex = 0;
for (File file : filesInDir) {
if (fileMap.get(currentIndex).size() >= cap)
currentIndex++;
fileMap.get(currentIndex).add(file);
}
/** It's going to take a little more effort than this, but this is the idea I'm trying to show you. */
Execute the worker threads one after another, as long as the computer's resources are available.
for (Integer key : fileMap.keySet()) {
new WorkerThread(fileMap.get(key)).start();
}
You can check the current resources available with:
public boolean areResourcesAvailable() {
return imNotThatNice();
}
/**
* Gets the resource utility instance
*
 * @return the current instance of the resource utility
*/
private static OperatingSystemMXBean getInstance() {
if (ResourceUtil.instance == null) {
ResourceUtil.instance = ManagementFactory.getOperatingSystemMXBean();
}
return ResourceUtil.instance;
}
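As an alternative to hand-rolled batches and raw threads, the same idea can be sketched with a fixed-size thread pool, which does the splitting and scheduling for you. Everything named here (ParallelBase64Encoder, encodeAll, the threads parameter) is made up for illustration. Collecting every encoded string in a list is only for the demo; with 100k PDFs you would more likely write each result out as soon as it is ready.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBase64Encoder {
    // Reads the whole file and encodes it; readAllBytes avoids the
    // partial-read pitfall of a single InputStream.read(byte[]) call.
    static String encode(Path file) {
        try {
            return Base64.getEncoder().encodeToString(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes all files on a pool of the given size; results keep input order.
    static List<String> encodeAll(List<Path> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Path file : files) {
                Callable<String> task = () -> encode(file);
                futures.add(pool.submit(task));
            }
            List<String> encoded = new ArrayList<>();
            for (Future<String> future : futures) {
                encoded.add(future.get()); // waits for that file's result
            }
            return encoded;
        } finally {
            pool.shutdown();
        }
    }
}
```

A pool size around Runtime.getRuntime().availableProcessors() is a common starting point, though for this workload disk throughput may well be the real limit.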
I'm currently trying to write a custom streams proxy (let's call it in that way) that can change the content from the given input stream and produce a modified, if necessary, output. This requirement is really necessary because sometimes I have to modify the streams in my application (e.g. compress the data truly on the fly). The following class is pretty easy and it uses internal buffering.
private static class ProxyInputStream extends InputStream {
private final InputStream iStream;
private final byte[] iBuffer = new byte[512];
private int iBufferedBytes;
private final ByteArrayOutputStream oBufferStream;
private final OutputStream oStream;
private byte[] oBuffer = emptyPrimitiveByteArray;
private int oBufferIndex;
ProxyInputStream(InputStream iStream, IFunction<OutputStream, ByteArrayOutputStream> oStreamFactory) {
this.iStream = iStream;
oBufferStream = new ByteArrayOutputStream(512);
oStream = oStreamFactory.evaluate(oBufferStream);
}
@Override
public int read() throws IOException {
if ( oBufferIndex == oBuffer.length ) {
iBufferedBytes = iStream.read(iBuffer);
if ( iBufferedBytes == -1 ) {
return -1;
}
oBufferIndex = 0;
oStream.write(iBuffer, 0, iBufferedBytes);
oStream.flush();
oBuffer = oBufferStream.toByteArray();
oBufferStream.reset();
}
return oBuffer[oBufferIndex++];
}
}
Let's assume we also have a sample test output stream that simply adds a space character before every written byte ("abc" -> " a b c") like this:
private static class SpacingOutputStream extends OutputStream {
private final OutputStream outputStream;
SpacingOutputStream(OutputStream outputStream) {
this.outputStream = outputStream;
}
@Override
public void write(int b) throws IOException {
outputStream.write(' ');
outputStream.write(b);
}
}
And the following test method:
private static void test(final boolean useDeflater) throws IOException {
final FileInputStream input = new FileInputStream(SOURCE);
final IFunction<OutputStream, ByteArrayOutputStream> outputFactory = new IFunction<OutputStream, ByteArrayOutputStream>() {
@Override
public OutputStream evaluate(ByteArrayOutputStream outputStream) {
return useDeflater ? new DeflaterOutputStream(outputStream) : new SpacingOutputStream(outputStream);
}
};
final InputStream proxyInput = new ProxyInputStream(input, outputFactory);
final OutputStream output = new FileOutputStream(SOURCE + ".~" + useDeflater);
int c;
while ( (c = proxyInput.read()) != -1 ) {
output.write(c);
}
output.close();
proxyInput.close();
}
This test method simply reads the file content and writes it to another stream, possibly modifying it on the way. If the test method is run with useDeflater=false, the approach works as expected. But if it is invoked with useDeflater set, it behaves really strangely and writes almost nothing (apart from the header 78 9C). I suspect that the deflater class may not be designed for the approach I'd like to use, but I always believed that the ZIP format and deflate compression were designed to work on the fly.
I'm probably wrong at some point about the specifics of the deflate compression algorithm. What am I missing? Perhaps there is another way to write a "streams proxy" that behaves exactly as I want. How can I compress the data on the fly while being limited to streams only?
Thanks in advance.
UPD: The following basic version works pretty well with both deflater and inflater:
public final class ProxyInputStream<OS extends OutputStream> extends InputStream {
private static final int INPUT_BUFFER_SIZE = 512;
private static final int OUTPUT_BUFFER_SIZE = 512;
private final InputStream iStream;
private final byte[] iBuffer = new byte[INPUT_BUFFER_SIZE];
private final ByteArrayOutputStream oBufferStream;
private final OS oStream;
private final IProxyInputStreamListener<OS> listener;
private byte[] oBuffer = emptyPrimitiveByteArray;
private int oBufferIndex;
private boolean endOfStream;
private ProxyInputStream(InputStream iStream, IFunction<OS, ByteArrayOutputStream> oStreamFactory, IProxyInputStreamListener<OS> listener) {
this.iStream = iStream;
oBufferStream = new ByteArrayOutputStream(OUTPUT_BUFFER_SIZE);
oStream = oStreamFactory.evaluate(oBufferStream);
this.listener = listener;
}
public static <OS extends OutputStream> ProxyInputStream<OS> proxyInputStream(InputStream iStream, IFunction<OS, ByteArrayOutputStream> oStreamFactory, IProxyInputStreamListener<OS> listener) {
return new ProxyInputStream<OS>(iStream, oStreamFactory, listener);
}
@Override
public int read() throws IOException {
if ( oBufferIndex == oBuffer.length ) {
if ( endOfStream ) {
return -1;
} else {
oBufferIndex = 0;
do {
final int iBufferedBytes = iStream.read(iBuffer);
if ( iBufferedBytes == -1 ) {
if ( listener != null ) {
listener.afterEndOfStream(oStream);
}
endOfStream = true;
break;
}
oStream.write(iBuffer, 0, iBufferedBytes);
oStream.flush();
} while ( oBufferStream.size() == 0 );
oBuffer = oBufferStream.toByteArray();
oBufferStream.reset();
}
}
return !endOfStream || oBuffer.length != 0 ? (int) oBuffer[oBufferIndex++] & 0xFF : -1;
}
}
I don't believe that DeflaterOutputStream.flush() does anything meaningful. The deflater will accumulate data until it has something to write out to the underlying stream. The only way to force the remaining bit of data out is to call DeflaterOutputStream.finish(). However, this would not work for your current implementation, as you can't call finish() until you are entirely done writing.
It's actually very difficult to write a compressed stream and read it within the same thread. In the RMIIO project I actually do this, but you need an arbitrarily sized intermediate output buffer (and you basically need to push data in until something comes out compressed on the other end, then you can read it). You might be able to use some of the util classes in that project to accomplish what you want to do.
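One thing not mentioned in this thread, so take it as a side note rather than the accepted approach: the JDK itself ships java.util.zip.DeflaterInputStream, an input stream filter that compresses whatever is read through it, which is close to the "proxy" being built by hand here. A small round-trip sketch follows; OnTheFlyDeflate and drain are made-up names, and in-memory byte arrays stand in for the file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterInputStream;

final class OnTheFlyDeflate {
    // Reads a stream to its end in small chunks, as a consumer would.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[512];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Reading from DeflaterInputStream yields the compressed form of the
    // wrapped stream; the end of the data is handled by the stream itself.
    static byte[] compress(byte[] raw) throws IOException {
        try (InputStream in = new DeflaterInputStream(new ByteArrayInputStream(raw))) {
            return drain(in);
        }
    }

    // The inverse direction, to check the round trip.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return drain(in);
        }
    }
}
```

This sidesteps the flush/finish problem because the compressing stream sees the end of its input directly and can finish the deflater on its own.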
Why not use GZIPOutputStream?
I'm a little lost, but I would simply use the original outputStream when I don't want to compress and new GZIPOutputStream(outputStream) when I do want to compress. That's all. In any case, check that you are flushing the output streams.
Gzip vs zip
Also: one thing is GZIP (compress a stream, that's what you're doing) and another thing is writing a valid zip file (file headers, file directory, entries (header,data)*). Check ZipOutputStream.
Be careful: if you use the method int read(byte b[], int off, int len) somewhere and an exception is thrown in the line
final int iBufferedBytes = iStream.read(iBuffer);
you will get stuck in an infinite loop.
Is it possible to write objects in Java to a binary file? The objects I want to write would be 2 arrays of String objects. The reason I want to do this is to save persistent data. If there is some easier way to do this let me know.
You could:
Serialize the arrays, or a class that contains the arrays.
Write the arrays as two lines in a formatted way, such as JSON, XML or CSV.
Here is some code for the first one (it uses a two-dimensional String array to hold both arrays):
Serialize
public static void main(String args[]) {
String[][] theData = new String[2][1];
theData[0][0] = ("r0 c1");
theData[1][0] = ("r1 c1");
System.out.println(java.util.Arrays.deepToString(theData));
// serialize the array
System.out.println("serializing theData");
try {
FileOutputStream fout = new FileOutputStream("thedata.dat");
ObjectOutputStream oos = new ObjectOutputStream(fout);
oos.writeObject(theData);
oos.close();
}
catch (Exception e) { e.printStackTrace(); }
}
Deserialize
public static void main(String args[]) {
String[][] theData = null;
// deserialize the array
System.out.println("deserializing theData");
try {
FileInputStream fin = new FileInputStream("thedata.dat");
ObjectInputStream ois = new ObjectInputStream(fin);
theData = (String[][]) ois.readObject();
ois.close();
}
catch (Exception e) { e.printStackTrace(); }
System.out.println(java.util.Arrays.deepToString(theData));
}
The second one is more complicated, but has the benefit of being human-readable as well as readable by other languages.
Read and Write as XML
import java.beans.XMLEncoder;
import java.beans.XMLDecoder;
import java.io.*;
public class XMLSerializer {
public static void write(String[][] f, String filename) throws Exception{
XMLEncoder encoder =
new XMLEncoder(
new BufferedOutputStream(
new FileOutputStream(filename)));
encoder.writeObject(f);
encoder.close();
}
public static String[][] read(String filename) throws Exception {
XMLDecoder decoder =
new XMLDecoder(new BufferedInputStream(
new FileInputStream(filename)));
String[][] o = (String[][])decoder.readObject();
decoder.close();
return o;
}
}
To and From JSON
Google has a good library to convert to and from JSON at http://code.google.com/p/google-gson/. You could simply convert your object to JSON and then write it to a file. To read, do the opposite.
You can do it using Java's serialization mechanism, but beware that serialization is not a good solution for long-term persistent storage of objects. The reason for this is that serialized objects are very tightly coupled to your Java code: if you change your program, then the serialized data files become unreadable, because they are not compatible anymore with your Java code. Serialization is good for temporary storage (for example for an on-disk cache) or for transferring objects over a network.
For long-term storage, you should use a standard and well-documented format (for example XML, JSON or something else) that is not tightly coupled to your Java code.
If, for some reason, you absolutely want to use a binary format, then there are several options available, for example Google protocol buffers or Hessian.
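If a plain binary file is really wanted and the data is just two String arrays, a small hand-defined format is another possibility besides the libraries named above. The sketch below is my own illustration, not one of those libraries: StringArrayCodec is a made-up name, and the layout (an int count followed by each string via writeUTF) is an assumption chosen for simplicity. Note that writeUTF limits each string to 65535 encoded bytes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class StringArrayCodec {
    // Layout: 4-byte element count, then each string in writeUTF format.
    static void write(DataOutputStream out, String[] values) throws IOException {
        out.writeInt(values.length);
        for (String value : values) {
            out.writeUTF(value); // at most 65535 encoded bytes per string
        }
    }

    // Reads back exactly one array written by write().
    static String[] read(DataInputStream in) throws IOException {
        int count = in.readInt();
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = in.readUTF();
        }
        return values;
    }
}
```

Calling write twice on a DataOutputStream wrapped around a FileOutputStream stores both arrays back to back, and two read calls restore them. The file does not depend on any class definition, so it stays readable when the program changes.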
One possibility besides serialization is to write Objects to XML files to make them more human-readable. The XStream API is capable of this and uses an approach that is similar to serialization.
http://x-stream.github.io/
If you want to write arrays of String, you may be better off with a text file. The advantage of using a text file is that it can be easily viewed and edited, and is usable by many other tools on your system, which means you don't have to write these tools yourself.
You can also find that a simple text format will be faster and more compact than using XML or JSON. Note: Those formats are more useful for complex data structures.
public static void writeArray(PrintStream ps, String... strings) {
for (String string : strings) {
assert !string.contains("\n") && string.length()>0;
ps.println(string);
}
ps.println();
}
public static String[] readArray(BufferedReader br) throws IOException {
List<String> strings = new ArrayList<String>();
String string;
while((string = br.readLine()) != null) {
if (string.length() == 0)
break;
strings.add(string);
}
return strings.toArray(new String[strings.size()]);
}
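If you start with
String[][] theData = { { "a0 r0", "a0 r1", "a0 r2" }, {"r1 c1"} };
and write each inner array with writeArray, this could result in the following (each array is terminated by an empty line):
a0 r0
a0 r1
a0 r2

r1 c1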
As you can see this is easy to edit/view.
This makes some assumptions about what a string can contain (see the assert). If these assumptions are not valid, there are ways of working around this.
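One possible workaround, sketched under the assumption that backslash escaping is acceptable in the file: escape line breaks and backslashes before writing and reverse that after reading. LineEscaper is a made-up helper, not part of the code above. It deals with embedded line breaks only; empty strings would still clash with the empty-line terminator and need a convention of their own.

```java
final class LineEscaper {
    // Turns a string into a single line: backslash, \n and \r are escaped.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else sb.append(c);
        }
        return sb.toString();
    }

    // Reverses escape(): a backslash always introduces a two-character escape.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                if (next == 'n') sb.append('\n');
                else if (next == 'r') sb.append('\r');
                else sb.append(next); // covers the escaped backslash
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this in place, writeArray would print escape(string) and readArray would add unescape(string), while the file stays a readable line-per-entry text file.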
You need to write an object, not a class, right? Because classes are already compiled to binary .class files.
Try ObjectOutputStream; there's an example at
http://java.sun.com/javase/6/docs/api/java/io/ObjectOutputStream.html