FileChannel.transferTo for large files on Windows - Java

Using Java NIO you can copy files faster. I found mainly two kinds of method on the internet for this job.
public static void copyFile(File sourceFile, File destinationFile) throws IOException {
    if (!destinationFile.exists()) {
        destinationFile.createNewFile();
    }
    FileChannel source = null;
    FileChannel destination = null;
    try {
        source = new FileInputStream(sourceFile).getChannel();
        destination = new FileOutputStream(destinationFile).getChannel();
        destination.transferFrom(source, 0, source.size());
    } finally {
        if (source != null) {
            source.close();
        }
        if (destination != null) {
            destination.close();
        }
    }
}
In "20 very useful Java code snippets for Java Developers" I found a different variant with an interesting comment:
public static void fileCopy(File in, File out) throws IOException {
    FileChannel inChannel = new FileInputStream(in).getChannel();
    FileChannel outChannel = new FileOutputStream(out).getChannel();
    try {
        // inChannel.transferTo(0, inChannel.size(), outChannel); // original -- apparently has trouble copying large files on Windows
        // magic number for Windows, (64Mb - 32Kb)
        int maxCount = (64 * 1024 * 1024) - (32 * 1024);
        long size = inChannel.size();
        long position = 0;
        while (position < size) {
            position += inChannel.transferTo(position, maxCount, outChannel);
        }
    } finally {
        if (inChannel != null) {
            inChannel.close();
        }
        if (outChannel != null) {
            outChannel.close();
        }
    }
}
But I didn't find or understand the meaning of
"magic number for Windows, (64Mb - 32Kb)"
It says that inChannel.transferTo(0, inChannel.size(), outChannel) has a problem on Windows. Is 67,076,096 bytes (= (64 * 1024 * 1024) - (32 * 1024)) the optimum chunk size for this method?

Windows has a hard limit on the maximum transfer size, and if you exceed it you get a runtime exception. So you need to tune. The second version you give is superior because it doesn't assume the file was transferred completely with one transferTo() call, which agrees with the Javadoc.
Setting the transfer size more than about 1MB is pretty pointless anyway.
EDIT Your second version has a flaw. You should decrement size by the amount transferred each time. It should be more like:
while (size > 0) { // we still have bytes to transfer
    long count = inChannel.transferTo(position, size, outChannel);
    if (count > 0) {
        position += count; // seeking position to last byte transferred
        size -= count;     // {count} bytes have been transferred, remaining {size}
    }
}
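Putting the two ideas together, here is a minimal sketch of a chunked copy with the corrected loop (assuming Java 7+ for try-with-resources; the 64 MB - 32 KB cap is carried over from the snippet above, although per the note about transfer size, ~1 MB would do just as well):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public final class ChunkedFileCopy {

    // 64 MB - 32 KB, the "magic number" discussed above
    private static final long MAX_CHUNK = (64L * 1024 * 1024) - (32 * 1024);

    public static void copy(File in, File out) throws IOException {
        try (FileChannel inChannel = new FileInputStream(in).getChannel();
             FileChannel outChannel = new FileOutputStream(out).getChannel()) {
            long size = inChannel.size();
            long position = 0;
            while (size > 0) {
                // transferTo() may transfer fewer bytes than requested
                long count = inChannel.transferTo(position, Math.min(MAX_CHUNK, size), outChannel);
                if (count > 0) {
                    position += count;
                    size -= count;
                }
            }
        }
    }
}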

I have read that it is for compatibility with the Windows 2000 operating system.
Source: http://www.rgagnon.com/javadetails/java-0064.html
Quote: In Win2000, transferTo() does not transfer files larger than 2^31-1 bytes; it throws "java.io.IOException: Insufficient system resources exist to complete the requested service". The workaround is to copy in a loop, 64 MB each time, until there is no more data.

There appears to be anecdotal evidence that attempting to transfer more than 64 MB at a time on certain Windows versions results in a slow copy. Hence the check: this appears to be the result of some detail of the underlying native code that implements the transferTo operation on Windows.

Related

Parsing files over 2.15 GB in Java using Kaitai Struct

I'm parsing large PCAP files in Java using Kaitai-Struct. Whenever the file size exceeds Integer.MAX_VALUE bytes I face an IllegalArgumentException caused by the size limit of the underlying ByteBuffer.
I haven't found references to this issue elsewhere, which leads me to believe that this is not a library limitation but a mistake in the way I'm using it.
Since the problem is caused by trying to map the whole file into the ByteBuffer I'd think that the solution would be mapping only the first region of the file, and as the data is being consumed map again skipping the data already parsed.
As this is done within the Kaitai Struct runtime library, it would mean writing my own class extending from KaitaiStream and overriding the auto-generated fromFile(...) method, which doesn't really seem like the right approach.
The auto-generated method to parse from file for the PCAP class is:
public static Pcap fromFile(String fileName) throws IOException {
    return new Pcap(new ByteBufferKaitaiStream(fileName));
}
And the ByteBufferKaitaiStream provided by the Kaitai Struct Runtime library is backed by a ByteBuffer.
private final FileChannel fc;
private final ByteBuffer bb;

public ByteBufferKaitaiStream(String fileName) throws IOException {
    fc = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ);
    bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
}
Which in turn is limited by the ByteBuffer maximum size.
Am I missing some obvious workaround? Is it really a limitation of the Java implementation of Kaitai Struct?
There are two separate issues here:
Running Pcap.fromFile() for large files is generally not a very efficient approach, as you eventually get the whole file parsed into an in-memory array at once. An example of how to avoid that is given in kaitai_struct/issues/255. The basic idea is that you want control over how you read every packet, and then dispose of each packet after you've parsed / accounted for it somehow.
The 2 GB limit on Java's mmapped files. To mitigate that, you can use the alternative RandomAccessFile-based KaitaiStream implementation, RandomAccessFileKaitaiStream; it might be slower, but it should avoid the 2 GB problem (see the sketch below).
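A minimal sketch of that second point, assuming the generated Pcap class from the question and the RandomAccessFileKaitaiStream(String) constructor provided by the Java runtime:
import io.kaitai.struct.RandomAccessFileKaitaiStream;
import java.io.IOException;

public static Pcap openLargePcap(String fileName) throws IOException {
    // Reads from the file on demand via RandomAccessFile instead of mmapping
    // the whole thing, so it is not bound by the 2 GB ByteBuffer limit.
    return new Pcap(new RandomAccessFileKaitaiStream(fileName));
}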
This library provides a ByteBuffer implementation which uses long offsets. I haven't tried this approach, but it looks promising. See the section "Mapping Files Bigger than 2 GB":
http://www.kdgregory.com/index.php?page=java.byteBuffer
public int getInt(long index)
{
    return buffer(index).getInt();
}

private ByteBuffer buffer(long index)
{
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}

public MappedFileBuffer(File file, int segmentSize, boolean readWrite)
throws IOException
{
    if (segmentSize > MAX_SEGMENT_SIZE)
        throw new IllegalArgumentException(
            "segment size too large (max " + MAX_SEGMENT_SIZE + "): " + segmentSize);

    _segmentSize = segmentSize;
    _fileSize = file.length();

    RandomAccessFile mappedFile = null;
    try
    {
        String mode = readWrite ? "rw" : "r";
        MapMode mapMode = readWrite ? MapMode.READ_WRITE : MapMode.READ_ONLY;

        mappedFile = new RandomAccessFile(file, mode);
        FileChannel channel = mappedFile.getChannel();

        _buffers = new MappedByteBuffer[(int)(_fileSize / segmentSize) + 1];
        int bufIdx = 0;
        for (long offset = 0 ; offset < _fileSize ; offset += segmentSize)
        {
            long remainingFileSize = _fileSize - offset;
            long thisSegmentSize = Math.min(2L * segmentSize, remainingFileSize);
            _buffers[bufIdx++] = channel.map(mapMode, offset, thisSegmentSize);
        }
    }
    finally
    {
        // close quietly
        if (mappedFile != null)
        {
            try
            {
                mappedFile.close();
            }
            catch (IOException ignored) { /* */ }
        }
    }
}
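For illustration, hypothetical usage of that class would look roughly like this (assuming the complete MappedFileBuffer from the linked article is available; the 64 MB segment size and the file name are arbitrary, and exception handling is omitted):
// Map an arbitrarily large file read-only in 64 MB segments,
// then address it with long offsets past the 2 GB mark.
MappedFileBuffer buf = new MappedFileBuffer(new File("huge-capture.pcap"), 64 * 1024 * 1024, false);
int firstWord = buf.getInt(0L);
int lateWord = buf.getInt(3_000_000_000L);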

PDF file encoding to Base64 takes more time when 100k documents are to be encoded

I am trying to encode PDF documents to Base64. With a small number (like 2,000 documents) it works nicely, but I have 100k-plus documents to encode.
It takes a long time to encode all those files. Is there a better approach for encoding a large data set?
Please find my current approach below:
String filepath = doc.getPath().concat(doc.getFilename());
file = new File(filepath);
if (file.exists() && !file.isDirectory()) {
    try {
        FileInputStream fileInputStreamReader = new FileInputStream(file);
        byte[] bytes = new byte[(int) file.length()];
        fileInputStreamReader.read(bytes);
        encodedfile = Base64.getEncoder().encodeToString(bytes);
        fileInputStreamReader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Try this:
Figure out how many files you need to encode.
long files = Files.list(Paths.get(directory)).count();
Split them up into a reasonable amount that a thread can handle in Java, e.g. if you have 100k files to encode, split them into 100 lists of 1,000, something like that.
int currentIndex = 0;
for (File file : filesInDir) {
    if (fileMap.get(currentIndex).size() >= cap)
        currentIndex++;
    fileMap.get(currentIndex).add(file);
}
/** It's going to take a little more effort than this, but it's the idea I'm trying to show you. */
Execute each worker thread one after another if the computer's resources are available.
for (Integer key : fileMap.keySet()) {
    new WorkerThread(fileMap.get(key)).start();
}
You can check the current resources available with:
public boolean areResourcesAvailable() {
    return imNotThatNice();
}

/**
 * Gets the resource utility instance
 *
 * @return the current instance of the resource utility
 */
private static OperatingSystemMXBean getInstance() {
    if (ResourceUtil.instance == null) {
        ResourceUtil.instance = ManagementFactory.getOperatingSystemMXBean();
    }
    return ResourceUtil.instance;
}
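A minimal sketch of the same idea using an ExecutorService instead of hand-rolled worker threads; the directory handling and pool size are illustrative assumptions, and what you do with each encoded string is left open:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public final class BatchEncoder {
    public static void encodeAll(String directory) throws IOException, InterruptedException {
        // Size the pool to the machine, not to the number of files.
        ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try (Stream<Path> paths = Files.list(Paths.get(directory))) {
            paths.filter(Files::isRegularFile).forEach(file ->
                pool.submit(() -> {
                    try {
                        String encoded = Base64.getEncoder().encodeToString(Files.readAllBytes(file));
                        // hand 'encoded' off to whatever stores or sends it
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all encoding tasks to finish
    }
}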

Memory issue when storing images in byteArray

I have an app that needs to access a large number of images very quickly, so I need to load those images into memory in some way. Doing so as bitmaps used over 100MB of RAM, which was completely out of the question, so I opted to read jpg files into memory, storing them inside a byteArray. Then I decode them and write them to the canvas as each is needed. This works pretty well, cutting out the slow disk access, while also respecting memory limits.
However, memory usage seems 'off' to me. I'm storing 450 jpgs with a file size of approximately 33kb each. This totals around 15MB of data. However, the app continually runs at between 35MB and 40MB of RAM as reported by both Eclipse DDMS and Android (on a physical device). I've tried modifying how many jpgs are loaded and the RAM used by the app tends to decrease by around 60-70kb per jpg, indicating that each image is stored twice in RAM. Memory usage does not fluctuate which implies that there is not an actual 'leak' involved.
Here is the relevant loading code:
private byte[][] bitmapArray = new byte[totalFrames][];

for (int x = 0; x < totalFrames; x++) {
    File file = null;
    if (cWidth <= cHeight) {
        file = new File(directory + "/f" + x + ".jpg");
    } else {
        file = new File(directory + "/f" + x + "-land.jpg");
    }
    bitmapArray[x] = getBytesFromFile(file);
    imagesLoaded = x + 1;
}

public byte[] getBytesFromFile(File file) {
    byte[] bytes = null;
    try {
        InputStream is = new FileInputStream(file);
        long length = file.length();
        bytes = new byte[(int) length];
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        if (offset < bytes.length) {
            throw new IOException("Could not completely read file " + file.getName());
        }
        is.close();
    } catch (IOException e) {
        // TODO Write your catch method here
    }
    return bytes;
}
Eventually, they get written to screen like so:
SurfaceHolder holder = getSurfaceHolder();
Canvas c = null;
try {
    c = holder.lockCanvas();
    if (c != null) {
        int canvasWidth = c.getWidth();
        int canvasHeight = c.getHeight();
        Rect destinationRect = new Rect();
        destinationRect.set(0, 0, canvasWidth, canvasHeight);
        c.drawBitmap(BitmapFactory.decodeByteArray(bitmapArray[bgcycle], 0, bitmapArray[bgcycle].length), null, destinationRect, null);
    }
} finally {
    if (c != null)
        holder.unlockCanvasAndPost(c);
}
Am I correct that there is some sort of duplication going on here? Or is there just that much overhead involved in storing jpgs in a byteArray like this?
Storing bytes in RAM is very different from storing data on hard drives... There is a lot more overhead to it. The references to the objects as well as the byte array structures all take up additional memory. There isn't really a single source of all the additional memory, but just remember that loading a file into RAM normally takes up 2~3x more space (from experience; I'm afraid I can't quote any documentation here).
Consider this:
File F = //Some file here (Less than 2 GB please)
FileInputStream fIn = new FileInputStream(F);
ByteArrayOutputStream bOut = new ByteArrayOutputStream(((int) F.length()) + 1);
int r;
byte[] buf = new byte[32 * 1000];
while ((r = fIn.read(buf)) != -1) {
    bOut.write(buf, 0, r);
}
// Do a memory measurement at this point. You'll see you're using nearly 3x the memory in RAM compared to the file.
// If you're actually going to try this, remember to surround it with try-catch and close the streams as appropriate.
Also remember that unused memory is not instantly cleared up. The method getBytesFromFile() may be returning a copy of a byte array, causing memory duplication that is not immediately garbage collected. If you want to be safe, check that getBytesFromFile(file) is not leaking any references that should be cleaned up. It won't appear as a memory leak, since you only call it a finite number of times.
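For the memory measurement mentioned in the comment above, a rough sketch; this only gives a coarse heap figure, and the gc() call is merely a hint to make the numbers a little more stable:
static long approxUsedHeapBytes() throws InterruptedException {
    Runtime rt = Runtime.getRuntime();
    rt.gc();             // hint only; the VM is free to ignore it
    Thread.sleep(200);   // give the collector a moment to settle
    return rt.totalMemory() - rt.freeMemory();
}
// usage: call once before and once after loading the data, then compare the two values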
It might be because your byte array is two-dimensional. You only need one dimension for loading an image into a byte array, and the second dimension could potentially double the RAM needed, as for each byte you would have an empty but still existing byte that you don't use.

Android to computer FTP resuming upload strange phenomenon

I have a strange phenomenon when resuming a file transfer.
Look at the picture below and you will see the bad section.
This happens apparently at random, maybe every 10th time.
I'm sending the picture from my Android phone to a Java server over FTP.
What is it that I forgot here?
I see the connection is killed due to a java.net.SocketTimeoutException.
The transfer resumes like this:
Resume at : 287609 Sending 976 bytes more
The bytes are always correct when the file is completely received, even for the picture below.
I don't know where to start debugging this, since it works most of the time.
Any suggestions or ideas would be great; I think I totally missed something here.
The device sender code (only the send loop):
int count = 1;
// Sending N files, looping N times
while (count <= max) {
    String sPath = batchFiles.get(count - 1);
    fis = new FileInputStream(new File(sPath));
    int fileSize = bis.available();
    out.writeInt(fileSize); // size
    String nextReply = in.readUTF();
    // if the file exists,
    if (nextReply.equals(Consts.SERVER_give_me_next)) {
        count++;
        continue;
    }
    long resumeLong = 0; // skip this many bytes
    int val = 0;
    buffer = new byte[1024];
    if (nextReply.equals(Consts.SERVER_file_exist)) {
        resumeLong = in.readLong();
    }
    // UPDATE FOR @Justin Breitfeller, thanks
    long skiip = bis.skip(resumeLong);
    if (resumeLong != -1) {
        if (!(resumeLong == skiip)) {
            Log.d(TAG, "ERROR skip is not the same as resumeLong");
            skiip = bis.skip(resumeLong);
            if (!(resumeLong == skiip)) {
                Log.d(TAG, "ERROR ABORTING skip is not the same as resumeLong");
                return;
            }
        }
    }
    while ((val = bis.read(buffer, 0, 1024)) > 0) {
        out.write(buffer, 0, val);
        fileSize -= val;
        if (fileSize < 1024) {
            val = (int) fileSize;
        }
    }
    reply = in.readUTF();
    if (reply.equals(Consts.SERVER_file_receieved_ok)) {
        // check if all files are sent
        if (count == max) {
            break;
        }
    }
    count++;
}
The receiver code (very truncated):
//receiving N files, looping N times
while(count < totalNrOfFiles){
int ii = in.readInt(); // File size
fileSize = (long)ii;
String filePath = Consts.SERVER_DRIVE + Consts.PTPP_FILETRANSFER;
filePath = filePath.concat(theBatch.getFileName(count));
File path = new File(filePath);
boolean resume = false;
//if the file exist. Skip if done or resume if not
if(path.exists()){
if(path.length() == fileSize){ // Does the file has same size
logger.info("File size same skipping file:" + theBatch.getFileName(count) );
count++;
out.writeUTF(Consts.SERVER_give_me_next);
continue; // file is OK don't upload it again
}else {
// Resume the upload
out.writeUTF(Consts.SERVER_file_exist);
out.writeLong(path.length());
resume = true;
fileSize = fileSize-path.length();
logger.info("Resume at : " + path.length() +
" Sending "+ fileSize +" bytes more");
}
}else
out.writeUTF("lets go");
byte[] buffer = new byte[1024];
// ***********************************
// RECEIVE FROM PHONE
// ***********************************
int size = 1024;
int val = 0;
bos = new BufferedOutputStream(new FileOutputStream(path,resume));
if(fileSize < size){
size = (int) fileSize;
}
while (fileSize >0) {
val = in.read(buffer, 0, size);
bos.write(buffer, 0, val);
fileSize -= val;
if (fileSize < size)
size = (int) fileSize;
}
bos.flush();
bos.close();
out.writeUTF("file received ok");
count++;
}
Found the error, and the problem was bad logic on my part. Say no more.
I was sending pictures that were being resized just before they were sent.
The problem was that when the resume kicked in after a failed transfer, the resized picture was not used; instead the code used the original picture, which had a larger size.
I have now set up a short-lived cache that holds the resized temporary pictures.
In light of the complexity of the app I'm making, I simply forgot that the files during resume were not the same as the originals.
With a BufferedOutputStream and BufferedInputStream, you need to watch out for the following:
Create the BufferedOutputStream before the BufferedInputStream (on both client and server), and flush just after creating it.
Flush after every write (not just before close).
That worked for me.
Edited
Add sentRequestTime, receivedRequestTime, sentResponseTime, receivedResponseTime to your packet payload. Use System.nanoTime() for these, run your server and client on the same host, use an ExecutorService to run multiple clients against that server, and plot the (received - sent) time delay for both request and response packets on an Excel chart (from some CSV format). Do this before and after adding the buffered streams. You will be pleased to know that your performance has been boosted by 100%. Made me very happy to plot that graph; it took about 45 minutes.
I have also heard that using custom buffers further improves performance.
Edited again
In my case I am using object IO streams. I added a payload of 4 long variables to the object, and I initialize sentRequestTime when I send the packet from the client, receivedRequestTime when the server receives it, and so forth for the response from server to client too. I then take the difference between received and sent times to find the delay in the request and the response. Be careful to run this test on localhost: if you run it between different hardware/devices, their actual time difference may interfere with your test results, since requestReceivedTime is timestamped at the server end and requestSentTime at the client end. In other words, each device stamps its own local time (obviously), and having both of these devices run the exact same time to the nanosecond is not possible. If you must run it between different devices, at least make sure you have NTP running (to keep them time-synchronized). That said, you are comparing the performance before and after buffered IO (you don't really care about the actual time delays, right?), so time drift should not really matter. Comparing a set of results before buffering and after buffering is your actual interest.
Enjoy!!!
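A minimal sketch of the stream setup described above; the host, port and handshake are illustrative assumptions, and the server side mirrors the same creation order:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

static void openAndHandshake(String host, int port, int fileSize) throws IOException {
    Socket socket = new Socket(host, port);
    // Create the BufferedOutputStream first and flush right after creating it...
    DataOutputStream out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream()));
    out.flush();
    // ...then the BufferedInputStream.
    DataInputStream in = new DataInputStream(new BufferedInputStream(socket.getInputStream()));

    out.writeInt(fileSize);
    out.flush();            // flush after every write, not just before close
    String reply = in.readUTF();
    System.out.println("Server replied: " + reply);
}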

Search in an InputStream

Is there a way to do an efficient search for 2 fixed bytes in an InputStream?
Background
I have to deal with multipart HTTP traffic on Android (Motion JPEG from an IP webcam).
I already found some classes on anddev.org to deal with it; now I'm doing some performance improvements. To find the start of a JPEG, I need to find the JPEG magic number (SOI = FFD8) in the InputStream.
Since you have no idea where in the stream those 2 bytes are, you'll have to look at the entire input, which means your performance will be at least linear. Finding two bytes linearly is straightforward:
static long search(InputStream inputStream) throws IOException {
    BufferedInputStream is = new BufferedInputStream(inputStream);
    int previous = is.read();
    long pos = 0;
    int current;
    while ((current = is.read()) != -1) {
        pos++;
        if (previous == 0xff && current == 0xd8) {
            return pos;
        }
        previous = current;
    }
    throw new RuntimeException("There ain't no pic in here.");
}
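Usage would be along these lines (the file name is just an illustration; in the MJPEG case you would pass the HTTP response stream instead):
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static void main(String[] args) throws IOException {
    try (InputStream in = new FileInputStream("stream.mjpeg")) {
        long soiPos = search(in); // position of the 0xD8 byte of the first SOI marker
        System.out.println("Found JPEG SOI at offset " + soiPos);
    }
}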
