Fastest way to write multiple files in Java

I have a requirement where I need to write multiple input streams to a temp file in Java. I have the code snippet below for the logic. Is there a more efficient way to do this?
final String tempZipFileName = "log" + "_" + System.currentTimeMillis();
File tempFile = File.createTempFile(tempZipFileName, ".zip"); // note the dot: "zip" alone is not treated as an extension
try (FileOutputStream oswriter = new FileOutputStream(tempFile)) {
    byte[] buffer = new byte[102400];
    for (final InputStream inputStream : readerSuppliers) {
        int bytesRead;
        while ((bytesRead = inputStream.read(buffer)) > 0) {
            oswriter.write(buffer, 0, bytesRead);
        }
        oswriter.write(System.getProperty("line.separator").getBytes());
        inputStream.close();
    }
}
I have multiple files ranging in size from 45 MB to 400 MB; for typical 45 MB and 360 MB files this method takes around 3 minutes on average. Can this be improved further?

You could try a BufferedInputStream
As @StephenC replied, a BufferedInputStream is irrelevant in this case because your buffer is already big enough.
I reproduced the behaviour on my computer (with an SSD drive), using a 100 MB file.
It took 110 ms to create the new file with your example code.
With a BufferedInputStream and a plain OutputStream: 120 ms.
With a plain InputStream and a BufferedOutputStream: 120 ms.
With a BufferedInputStream and a BufferedOutputStream: 110 ms.
I don't see execution times anywhere near as long as yours.
Maybe the problem comes from your readerSuppliers ?
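To isolate where the time goes, here is a minimal timing harness (the source path is a placeholder) that runs only the raw copy loop from the question, without the readerSuppliers:
import java.io.*;

public class CopyTimer {
    public static void main(String[] args) throws IOException {
        File src = new File("src.bin"); // placeholder: any large test file
        File dst = File.createTempFile("copytest", ".bin");
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buffer = new byte[102400]; // same 100 KB buffer as the question
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Copied " + src.length() + " bytes in " + elapsedMs + " ms");
    }
}
If the harness is fast but your real code is slow, the time is being spent producing the input streams, not writing the file.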

Related

Fast modular addition of 2 (potentially large) files in Java

I am doing a cryptography experiment with One Time Pads.
I have two files, OTP and TEXT (1K-10K), with bytes in them. OTP is large (>1GB). I want to create a third file, CYPHERTEXT (same size as TEXT), by performing modular addition of TEXT with OTP, using an offset into OTP. I coded this by hand using java.io, and it works, but isn't very snappy, even with buffered I/O (streams or writers).
I was looking for a way to add one of the underlying byte buffers to the other using NIO, but could not find a (built-in) way to do that, or to filter the contents of TEXT with the data from OTP, except by hand. Is there any way to do something like this without reinventing the wheel? I thought I could use a selector. Ideally I'd like to be able to handle files larger than 2GB for both the OTP and the TEXT, which is why I was looking at NIO.
private static void createOTP() {
    ...
    System.out.print("Generating " + filename + " ");
    long startTime = System.nanoTime();
    FileOutputStream fos = new FileOutputStream(f);
    BufferedOutputStream bos = new BufferedOutputStream(fos, MB);
    SecureRandom random = new SecureRandom(); // create once; constructing it inside the loop is expensive
    for (long currentSize = 0; currentSize < OTPSize; currentSize += baSize) {
        random.nextBytes(ba);
        bos.write(ba);
        if (currentSize % (MB * 20L * (long) sizeInGB) == 0) {
            System.out.print(".");
        }
    }
    long elapsedTime = System.nanoTime() - startTime;
    System.out.println(" OTP generation elapsed Time is " + (elapsedTime / 1000000.0) + " msec");
    bos.close(); // closing the buffered stream flushes it and closes the underlying FileOutputStream
    ...
}
private static void symetricEncryptionDecryption(boolean encrypt) {
    ...
    outtext = new File(intext.getParentFile(), direction + ".OTP");
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(outtext), MB);
    byte[] plaindata = new byte[(int) intext.length()];
    DataInputStream dataIs = new DataInputStream(new FileInputStream(intext));
    dataIs.readFully(plaindata);
    dataIs.close();
    ByteBuffer bb = ByteBuffer.wrap(plaindata);
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(otpFile)));
    in.skip(offset);
    while (bb.hasRemaining()) {
        bos.write(bb.get() + (encrypt ? in.readByte() : -in.readByte()));
    }
    bos.close();
    in.close();
    System.out.println("Offset: " + offset);
}
So is there a far slicker way to do this:
while (bb.hasRemaining()) {
    bos.write(bb.get() + (encrypt ? in.readByte() : -in.readByte()));
}
Or to generate the OTP, for that matter?
It's not clear what you are trying to do, but if you memory-map the OTP file, giving you random access, and you read/process 8 bytes at a time (i.e. long values), you should be able to write an encrypted 10K text file in under 100 ms, where most of that time will be spent starting the JVM.
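A minimal sketch of the memory-mapping part (file names and the offset value are placeholders; it keeps the byte-wise modular addition rather than packing 8 bytes into a long, since a plain long add would leak carries between adjacent bytes):
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class MappedOtp {
    public static void main(String[] args) throws IOException {
        long offset = 0; // placeholder offset into the pad
        byte[] text = Files.readAllBytes(Paths.get("TEXT")); // TEXT is small (1K-10K)
        try (FileChannel otp = FileChannel.open(Paths.get("OTP"), StandardOpenOption.READ)) {
            // Map only the region of the pad we need; the position argument is a long,
            // so pads larger than 2 GB are fine.
            MappedByteBuffer pad = otp.map(FileChannel.MapMode.READ_ONLY, offset, text.length);
            byte[] cypher = new byte[text.length];
            for (int i = 0; i < text.length; i++) {
                cypher[i] = (byte) (text[i] + pad.get(i)); // addition mod 256 via byte overflow
            }
            Files.write(Paths.get("CYPHERTEXT"), cypher);
        }
    }
}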
BTW: if you have access to the encrypted TEXT and the OTP file, you might decode the text without the offset, i.e. you could work it out using brute force.

Java file copy at the same speed (USB2 -> USB3 on a Linux server). What's wrong?

I have a very simple file copy in Java. It takes about 6 minutes to copy the DB file to my external (USB3) HDD.
// First database:
try {
    fileInputStream = new FileInputStream(selectedfile);
    bufferedInputStream = new BufferedInputStream(fileInputStream);
    outputFile = new File("" + chosenDestination);
    fileOutputStream = new FileOutputStream(outputFile);
    bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    size = selectedfile.length();
    byte[] buffer = new byte[1024 * 6];
    dbLabel.setText("Copying Database...");
    while ((data = bufferedInputStream.read(buffer)) > 0) {
        bufferedOutputStream.write(buffer, 0, data);
        total += data;
        String percent = "" + total / (size / 100);
        pbar.setValue(Integer.valueOf(percent));
        sizeLabel.setText("(" + total / (1024 * 1024) + " / " + (size / (1024 * 1024)) + " MB) ");
    }
} finally {
    bufferedOutputStream.close(); // closing the buffered streams also closes the wrapped file streams
    bufferedInputStream.close();
}
Yesterday I bought a Transcend TS-PDU3 (PCI-E) USB add-on card, so both my computer and my HDD are now USB3-capable. But when I tried the copy job, it copied the file at the same speed. It is a Linux server, so no driver is needed (lspci sees the card), and I think everything works, so I suspect the error is in the Java code. What buffer size should I choose for USB3? Is 6 * 1024 too small a buffer, or should I look for the error elsewhere? Thanks.
Provided you use Java 7 or later, you have a much easier way to copy data from one file to another. With NIO, you can do this:
final Path src = Paths.get(selectedFile);
final Path dst = Paths.get(outputFile);
Files.copy(src, dst);
The JVM does a pretty good job at that. If you don't see an improvement from your current code, well... Replace your code with this first, and investigate further. As you said, no specific drivers are needed in theory.
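One caveat: Files.copy throws FileAlreadyExistsException if the destination exists, unless you pass a copy option (java.nio.file.StandardCopyOption):
Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);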

BufferedInputStream to ByteArrayOutputStream very slow

I have a problem very similar to the link below:
PDF to byte array and vice versa
The main difference is that I am reading binary data from a Socket connection (accepted via a ServerSocket), rather than from a file.
This works as expected.
However, the problem I am having is that this process takes quite a long time to read into memory: about 1 minute 30 seconds for 500 bytes (although the size of each stream will vary massively).
Here's my code:
BufferedInputStream input = new BufferedInputStream(theSocket.getInputStream());
byte[] buffer = new byte[8192];
int bytesRead;
ByteArrayOutputStream output = new ByteArrayOutputStream();
while ((bytesRead = input.read(buffer)) != -1) {
    output.write(buffer, 0, bytesRead);
}
byte[] outputBytes = output.toByteArray();
// Continue ... and eventually close the input stream
If I log its progress inside the while loop to the terminal, it seems to read all the bytes quite quickly (i.e. it reaches the end of the stream), but then it pauses for a while before breaking out of the loop and continuing.
Hope that makes sense.
Well, you're reading until the socket is closed, basically; that's when read will return -1.
So my guess is that the other end of the connection is holding it open for 90 seconds before closing it. Fix that, and you'll fix your problem.
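If you control the sender, a common fix is to length-prefix each message so the reader knows exactly how many bytes to expect instead of waiting for end-of-stream. A minimal sketch of the reading side, assuming (hypothetically) that the sender writes the payload length as an int first:
DataInputStream in = new DataInputStream(
        new BufferedInputStream(theSocket.getInputStream()));
int length = in.readInt();     // sender writes the payload size first
byte[] payload = new byte[length];
in.readFully(payload);         // returns as soon as 'length' bytes have arrived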
ByteArrayOutputStream(int size);
By default the internal buffer is 32 bytes, and it doubles each time it fills: 32 -> 64 -> 128 -> 256 -> ...
So initialize it with a bigger capacity.
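For example, if you expect payloads on the order of a megabyte (the figure is just an illustrative guess):
// Pre-sizing avoids the repeated array doubling and copying.
ByteArrayOutputStream output = new ByteArrayOutputStream(1024 * 1024);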
You can time how long it takes to copy data between a BufferedInputStream and a ByteArrayOutputStream.
int size = 256 << 20; // 256 MB
ByteArrayInputStream bais = new ByteArrayInputStream(new byte[size]);
long start = System.nanoTime();
BufferedInputStream input = new BufferedInputStream(bais);
byte[] buffer = new byte[8192];
int bytesRead;
ByteArrayOutputStream output = new ByteArrayOutputStream();
while ((bytesRead = input.read(buffer)) != -1) {
    output.write(buffer, 0, bytesRead);
}
byte[] outputBytes = output.toByteArray();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f seconds to copy %,d MB %n", time / 1e9, size >> 20);
prints
Took 0.365 seconds to copy 256 MB
It will be much faster for smaller messages, i.e. those much smaller than 256 MB.

Stop HtmlUnit download after specified file size is reached

I'm stuck trying to stop a download initiated with HtmlUnit after a certain size is reached. The InputStream
InputStream input = button.click().getWebResponse().getContentAsStream();
downloads the complete file correctly. However, it seems that using
byte[] buffer = new byte[1024]; // size varied between 1024 and 102400 in my tests
OutputStream output = new FileOutputStream(fileName);
int bytesRead;
int total = 0;
while ((bytesRead = input.read(buffer)) != -1 && total < MAX_SIZE) {
    output.write(buffer, 0, bytesRead);
    total += bytesRead;
    System.out.print(total + "\n");
}
output.flush();
output.close();
input.close();
somehow downloads the file to a different location (unknown to me) and, once finished, copies the first MAX_SIZE bytes into the file "fileName". No System.out is printed during this process. Interestingly, while running the debugger in NetBeans and stepping through slowly, the total is printed and I get the MAX_SIZE file.
Varying the buffer size in a range between 1024 to 102400 didn't make any difference.
I also tried Commons'
BoundedInputStream b = new BoundedInputStream(button.click().getWebResponse().getContentAsStream(), MAX_SIZE);
without success.
There's this 2.5-year-old post, but I couldn't figure out how to implement the proposed solution.
Is there something I'm missing in order to stop the download at MAX_SIZE?
(Exceptions handling and other etcetera omitted for brevity)
There is no need to use HtmlUnit for this. Actually, using it for such a simple task is overkill and will make things slow. The best approach I can think of is the following:
final String url = "http://yoururl.com";
final String file = "/path/to/your/outputfile.zip";
final int MAX_BYTES = 1024 * 1024 * 5; // 5 MB
URLConnection connection = new URL(url).openConnection();
InputStream input = connection.getInputStream();
byte[] buffer = new byte[4096];
int pendingRead = MAX_BYTES;
int n;
OutputStream output = new FileOutputStream(new File(file));
while ((n = input.read(buffer)) >= 0 && pendingRead > 0) {
    int toWrite = Math.min(pendingRead, n);
    output.write(buffer, 0, toWrite);
    pendingRead -= toWrite; // decrement by what was actually written, so the cap is exact
}
input.close();
output.close();
In this case I've set a maximum download size of 5 MB and a buffer of 4 KB. The file will be written to disk in every iteration of the while loop, which seems to be what you're looking for.
Of course, make sure you handle all the needed exceptions (e.g. FileNotFoundException).
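For what it's worth, the BoundedInputStream you tried should also work once HtmlUnit is out of the picture. A sketch combining it with the plain connection above, assuming Commons IO is on the classpath:
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.BoundedInputStream;

try (InputStream input = new BoundedInputStream(
         new URL(url).openConnection().getInputStream(), MAX_BYTES);
     OutputStream output = new FileOutputStream(file)) {
    IOUtils.copy(input, output); // the wrapper reports EOF once MAX_BYTES is reached
}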

Reading a binary file in Java vs C++

I have a binary file (about 100 MB) that I need to read in quickly. In C++ I could just load the file into a char pointer and march through it by incrementing the pointer. This of course would be very fast.
Is there a comparably fast way to do this in Java?
If you use a memory-mapped file or a regular buffer, you will be able to read the data as fast as your hardware allows.
File tmp = File.createTempFile("deleteme", ".bin");
tmp.deleteOnExit();
int size = 1024 * 1024 * 1024;

long start0 = System.nanoTime();
FileChannel fc0 = new FileOutputStream(tmp).getChannel();
ByteBuffer bb = ByteBuffer.allocateDirect(32 * 1024).order(ByteOrder.nativeOrder());
for (int i = 0; i < size; i += bb.capacity()) {
    fc0.write(bb);
    bb.clear();
}
fc0.close();
long time0 = System.nanoTime() - start0;
System.out.printf("Took %.3f ms to write %,d MB using ByteBuffer%n", time0 / 1e6, size / 1024 / 1024);

long start = System.nanoTime();
FileChannel fc = new FileInputStream(tmp).getChannel();
MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);
LongBuffer longBuffer = buffer.order(ByteOrder.nativeOrder()).asLongBuffer();
long total = 0; // used to prevent a micro-optimisation
while (longBuffer.remaining() > 0)
    total += longBuffer.get();
fc.close();
long time = System.nanoTime() - start;
System.out.printf("Took %.3f ms to read %,d MB MemoryMappedFile%n", time / 1e6, size / 1024 / 1024);

long start2 = System.nanoTime();
FileChannel fc2 = new FileInputStream(tmp).getChannel();
bb.clear();
while (fc2.read(bb) > 0) {
    bb.flip(); // flip before consuming the bytes just read
    while (bb.remaining() > 0)
        total += bb.get();
    bb.clear();
}
fc2.close();
long time2 = System.nanoTime() - start2;
System.out.printf("Took %.3f ms to read %,d MB File via NIO%n", time2 / 1e6, size / 1024 / 1024);
prints
Took 305.243 ms to write 1,024 MB using ByteBuffer
Took 286.404 ms to read 1,024 MB MemoryMappedFile
Took 155.598 ms to read 1,024 MB File via NIO
This is for a file 10x larger than the one you have. It's this fast because the data is being cached in memory (and I have an SSD drive). If you have fast hardware, the data can be read pretty fast.
Sure, you could use a memory mapped file.
Here are two good links with sample code:
Thinking in Java: Memory-mapped files
Java Tips: How to create a memory-mapped file
If you don't want to go this route, just use an ordinary InputStream (such as a DataInputStream) after wrapping it in a BufferedInputStream.
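For instance, a minimal sketch (the file name is a placeholder):
try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream("data.bin")))) {
    int header = in.readInt(); // typed reads on top of buffered I/O
    System.out.println("First int: " + header);
}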
Most files will not need memory mapping and can simply be read with the standard Java I/O, especially a file of this size. A reasonable way to read such files is with a BufferedInputStream:
InputStream in = new BufferedInputStream(new FileInputStream("somefile.ext"));
Buffering is already well tuned in Java for most computers. For much larger files you would look at optimizing further.
Reading the file from the disk is going to be the slowest part by miles, so it's likely to make no difference whatsoever. For this individual operation, of course; the JVM still takes a decade to start up, so add that time in.
Take a look at this blog post on how to read a binary file into a byte array in Java:
http://www.spartanjava.com/2008/read-a-file-into-a-byte-array/
Copied from link:
File file = new File("/somepath/myfile.ext");
FileInputStream is = new FileInputStream(file);

// Get the size of the file
long length = file.length();
if (length > Integer.MAX_VALUE) {
    throw new IOException("The file is too big");
}

// Create the byte array to hold the data
byte[] bytes = new byte[(int) length];

// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
        && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}

// Ensure all the bytes have been read in
if (offset < bytes.length) {
    throw new IOException("The file was not completely read: " + file.getName());
}

// Close the input stream; all file contents are in the bytes variable
is.close();
Using the DataInputStream from the Java SDK can be helpful here. DataInputStream provides methods such as readByte() or readChar(), if that's what is needed.
A simple example can be:
DataInputStream dis = new DataInputStream(new FileInputStream("file.dat"));
try {
    while (true) {
        byte b = dis.readByte();
        // Do something with the byte
    }
} catch (EOFException eofe) {
    // Stream ended
} catch (IOException ioe) {
    // Input exception
}
Hope it helps. You can, of course, read the entire stream into a byte array and iterate through it as well...
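Since Java 7, reading the whole file into a byte array is also a one-liner (assuming it fits comfortably in memory):
byte[] bytes = Files.readAllBytes(Paths.get("file.dat"));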
