Download file in Java from WampServer

I created a Java downloader and downloaded a 900 MB file from my WampServer. It works correctly, but when I check the RAM usage, it increases a lot, and I don't know why.
I used IDM to download the same file from my WampServer and it didn't use much RAM.
When I look at the process and disk usage of my Java downloader, it takes about 50 MB and 5% CPU, but when I look at the Performance tab, my RAM usage increases a lot.
This is the performance screenshot: http://i.stack.imgur.com/zfxNv.png
And this is my code; my app creates 8 threads to download this file simultaneously:
private void downloadFile() {
    try {
        this.response = this.con.getInputStream();
        this.bis = new BufferedInputStream(this.response, 32 * 1024);
        this.responseContentSize = this.con.getContentLength();
        if (this.responseContentSize == (this.end_range - this.start_range) + 1) {
            int MAX_BUFFER_SIZE = 32*1024;
            byte buffer[] = new byte[MAX_BUFFER_SIZE];
            makeTmpFile();
            out = new FileOutputStream(this.tmpdir + this.tmpFilename);
            bos = new BufferedOutputStream(out, 32 * 1024);
            while (true) {
                int r = this.bis.read(buffer, 0, this.MAX_BUFFER_SIZE);
                if (r == -1)
                    break;
                bos.write(buffer, 0, r);
                downloadedBytes += r;
            }
            if (bos != null)
                bos.close();
            if (out != null)
                out.close();
            if (bis != null)
                bis.close();
            if (response != null)
                response.close();
        }
    } catch (IOException e) {
        sharedDownloadStatus.setCell(this.threadIndex, STATUS, 0);//error
        log.setLog("func : downloadFile =>\n couldnt download file\n" + e.getMessage());
    }
}

You allocate a new data buffer for each chunk. These buffers temporarily occupy a lot of memory before the garbage collector reclaims them.
Try holding only one data buffer per thread (using ThreadLocal, for example).
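A minimal sketch of the one-buffer-per-thread idea, assuming a generic stream-to-stream copy. The class name `BufferPerThread`, the methods `copy` and `currentBuffer`, and the 32 KB size are illustrative choices, not part of the original code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferPerThread {
    // Each thread lazily gets its own 32 KB array and reuses it on every call,
    // instead of allocating a fresh buffer for every chunk it downloads.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[32 * 1024]);

    private BufferPerThread() { }

    /** Copies all bytes from in to out using this thread's buffer; returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = BUFFER.get();
        long total = 0;
        int r;
        while ((r = in.read(buffer)) != -1) {
            out.write(buffer, 0, r);
            total += r;
        }
        return total;
    }

    /** Returns the current thread's buffer (useful to check that it is reused). */
    static byte[] currentBuffer() {
        return BUFFER.get();
    }
}
```

If the worker threads come from a pool, each buffer lives as long as its thread. That is usually what you want for a downloader that keeps a small, fixed number of threads.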

You can use the jvisualvm.exe program to look at your memory consumption and see where it's being allocated. jvisualvm is located in the bin folder of your JDK installation (JDK 6 to 8; newer JDKs no longer bundle it, and VisualVM is downloaded separately).
Here's what I would change in your code:
Get rid of the BufferedInputStream/BufferedOutputStream wrappers. These buffered streams are unnecessary in your program: you already read into your own 32 KB array, so each wrapper just adds another 32 KB allocation without any benefit.
Don't use 8 threads. I've built a file download application to transfer very large files across networks, and I have found the optimal number of threads is 2 or 3. You can't download the file faster than your connection allows, so adding extra threads does nothing but waste memory and time.
Use a smaller buffer. A 32 KB buffer is quite large; I would test a smaller buffer such as 1 KB.
Don't confuse WAMP's memory/CPU usage with your Java application's memory/CPU usage. Use the 'Processes' tab of the Windows Task Manager instead of the Performance tab.
With that said, I don't have your entire program and can't test and debug its performance for you, but these are the things I would do. Below is your code, modified as much as possible.
private void downloadFile() {
    try {
        this.response = this.con.getInputStream();
        this.responseContentSize = this.con.getContentLength();
        if (this.responseContentSize == (this.end_range - this.start_range) + 1) {
            final int MAX_BUFFER_SIZE = 1024;
            byte[] buffer = new byte[MAX_BUFFER_SIZE];
            makeTmpFile();
            out = new FileOutputStream(this.tmpdir + this.tmpFilename);
            int r;
            while ((r = response.read(buffer)) != -1) {
                out.write(buffer, 0, r);
                downloadedBytes += r; //accumulate the total, don't overwrite it
            }
        }
    } catch (IOException e) {
        sharedDownloadStatus.setCell(this.threadIndex, STATUS, 0);//error
        log.setLog("func : downloadFile =>\n couldnt download file\n" + e.getMessage());
    } finally {
        //close() can throw IOException too, so guard each call separately
        try {
            if (out != null)
                out.close();
        } catch (IOException ignored) {
        }
        try {
            if (response != null)
                response.close();
        } catch (IOException ignored) {
        }
    }
}
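To use 2 or 3 threads instead of 8, the file has to be split into that many byte ranges. The sketch below assumes the server honours HTTP `Range` requests. `RangeSplitter`, `split` and `rangeHeader` are hypothetical helpers. Each range is inclusive, so its length matches the `(end_range - start_range) + 1` check in the code above.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitter {
    private RangeSplitter() { }

    /**
     * Splits [0, contentLength) into at most `parts` inclusive ranges {start, end}.
     * When the length does not divide evenly, the first ranges get one extra byte.
     */
    static List<long[]> split(long contentLength, int parts) {
        if (contentLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("contentLength and parts must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = contentLength / parts;
        long extra = contentLength % parts;
        long start = 0;
        for (int i = 0; i < parts && start < contentLength; i++) {
            long size = base + (i < extra ? 1 : 0);
            if (size == 0) {
                break; // fewer bytes than parts: stop once every byte is assigned
            }
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    /** Value for the HTTP "Range" request header, e.g. "bytes=0-3". */
    static String rangeHeader(long[] range) {
        return "bytes=" + range[0] + "-" + range[1];
    }
}
```

Each worker would call something like `con.setRequestProperty("Range", RangeSplitter.rangeHeader(r))` on its `HttpURLConnection` before connecting. For example, `split(10, 3)` yields `{0,3}`, `{4,6}`, `{7,9}`.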

Related

I'm having trouble copying large files

I'm having problems with my code. I'm encrypting a file of more than 300 MB in Base64, but my application gives errors when I open the encrypted file.
This is my code. It crashes on the byte array, and I don't understand why:
private void encript(final File file) {
    new AsyncTask<Void, Void, Void>() {
        @Override
        protected Void doInBackground(Void[] p) {
            File new_file = null;
            try {
                new_file = new File(file.getAbsolutePath() + ".enc.txt");
                if (!new_file.exists()) {
                    new_file.createNewFile();
                }
                BufferedInputStream mInputStream = new BufferedInputStream(new FileInputStream(file));
                OutputStream mOutputStream = new DataOutputStream(new FileOutputStream(new_file));
                byte[] data = new byte[mInputStream.available()];
                int len = 0;
                while (true) {
                    len = mInputStream.read(data);
                    if (len > 0) {
                        mOutputStream.write(Base64.encode(data, 0, len, Base64.DEFAULT));
                    }
                    break;
                }
                mOutputStream.flush();
                if (mOutputStream != null) {
                    mOutputStream.close();
                }
                if (mInputStream != null) {
                    mInputStream.close();
                }
            } catch (Exception io) {
                Toast.makeText(MainActivity.this, io.toString(), Toast.LENGTH_LONG).show();
            }
            return null;
        }
        @Override
        protected void onPostExecute(Void res) {
            Toast.makeText(MainActivity.this, "Sucesss", Toast.LENGTH_LONG).show();
        }
    }.execute(new Void[0]);
}
Note that what you are doing here is Base64 encoding the file contents. Don't imagine that someone can't trivially crack this (so-called) "encryption".
There are lots of things wrong with your attempt. I shall go through the more important ones:
@Override
protected Void doInBackground(Void[] p) {
File new_file = null;
try {
Problem: You should be using try with resources to avoid resource leaks.
new_file = new File(file.getAbsolutePath() + ".enc.txt");
if (!new_file.exists()) {
new_file.createNewFile();
}
Problems:
On the one hand, there is no need to use createNewFile to pre-create an output file. Opening the file using FileOutputStream will create it if it doesn't exist already.
On the other hand, this won't prevent (or report) errors in cases where the file's parent directory doesn't exist, is not writeable and so on.
It would be better to use java.nio.file.Path and java.nio.file.Files from Java 7 / Android API 26. Path and Files are better APIs and they will report problems as exceptions so that you can (hypothetically) report them to the user via your exception handler.
There are even some Files.copy methods, though they are not directly applicable to your use-case since you are encoding the data as you copy it.
BufferedInputStream mInputStream =
new BufferedInputStream(new FileInputStream(file));
OutputStream mOutputStream =
new DataOutputStream(new FileOutputStream(new_file));
Problem:
I don't think you need a DataOutputStream. It won't actually be doing anything.
byte[] data = new byte[mInputStream.available()];
Problem:
The available() method should not be used for this. It returns the number of bytes that are "available" to be read right now. The value you get is context dependent. For a socket stream it is typically the number of bytes that are currently in the kernel buffers ready to read. For a "regular" file it may be the length of the input file.
So if you are copying a "really big" file, then you may be attempting to allocate a buffer that will hold the entire file. In the worst case, that will cause your app to OOME!
NOTE - Such an OOME might be the "out of nowhere" problem that you are seeing.
The "best" way is debatable, but I would just use a fixed buffer size ... if I was doing an explicit read / write copy of a stream. The size of the buffer affects throughput, but if you are looking for ultimate performance you shouldn't be doing it this way.
int len = 0;
while (true) {
len = mInputStream.read(data);
if (len > 0) {
mOutputStream.write(
Base64.encode(data, 0, len, Base64.DEFAULT));
}
break;
}
Problem: This loop is simply wrong. You are unconditionally breaking on the first iteration. You should be doing something like this:
int len;
while ((len = mInputStream.read(data)) > 0) {
mOutputStream.write(Base64.encode(data, 0, len, Base64.DEFAULT));
}
In other words, keep reading / writing until read returns a non-positive result.
Note: I'm not sure which Base64 class you are using there. It doesn't appear to be java.util.Base64
mOutputStream.flush();
if (mOutputStream != null) {
mOutputStream.close();
}
if (mInputStream != null) {
mInputStream.close();
}
Problems:
The flush() is not necessary. Closing the stream will flush. And besides, what happens with your attempted flush if mOutputStream is null?
This version leaks resources (file descriptors). If an exception has been thrown, these statements won't be executed, and the stream objects will not be closed.
This is all unnecessary if you use try with resources instead.
} catch (Exception io) {
Toast.makeText(MainActivity.this, io.toString(),
Toast.LENGTH_LONG).show();
}
return null;
}
Problems:
Catching Exception is a bad idea. A better idea is to catch and handle the expected exceptions, and let the unexpected ones propagate so that they can be handled further up the stack.
In this case, it looks like you are assuming that the exception will be some sort of I/O exception. In fact, it could also be an unchecked exception such as an NPE. (An OOME is also possible, though this catch wouldn't catch that because OOME is an Error, not an Exception.)
You are throwing away the exception details. Unexpected exceptions should be logged so that you can diagnose them via logcat.
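Putting these points together, here is a sketch that streams the file through `java.util.Base64` instead of encoding one big array. It assumes Java 8+ for `java.util.Base64` and `java.nio.file` (Android API 26+), and the class and method names are illustrative. `Base64.getEncoder().wrap(...)` carries leftover bytes over between writes, so the output is one continuous Base64 string even though `read` returns chunks of arbitrary size. Encoding each chunk separately, as in the original loop, only gives the same result when every chunk length is a multiple of 3; otherwise `=` padding ends up in the middle of the output.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class Base64FileEncoder {
    private Base64FileEncoder() { }

    /** Writes the Base64 encoding of source into target, reading with a fixed-size buffer. */
    static void encodeFile(Path source, Path target) throws IOException {
        byte[] buffer = new byte[8 * 1024]; // fixed size, independent of the file length
        // try-with-resources closes both streams even on failure; closing the
        // encoder stream writes the final padding and closes the file stream.
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(target))) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
    }
}
```

On older Android versions, `android.util.Base64OutputStream` offers a similar wrapping stream. The memory use stays at one small buffer regardless of the file size.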

Improve performance when reading file from URL and writing it to disk

I made a program which accesses some URLs and downloads the PDFs from there. The files vary from 2 MB to 40 MB. The program works with no problems, but is there a way to improve its performance? For the larger files it takes a long time.
The code below is the one used for reading / writing the file. This is called in a for loop with different fileNameURLPath.
@Override
public void downloadFile(String fileNameURLPath, String titleCellValue) throws FileException {
try (BufferedInputStream inputStream
= new BufferedInputStream(new URL(fileNameURLPath).openStream())){
FileOutputStream fileOS = new FileOutputStream(FileConstants.MandatoryDownloadProperties.path + titleCellValue + ".pdf");
byte data[] = new byte[32*1024];
int byteContent;
while((byteContent = inputStream.read(data,0 , data.length)) != -1) {
fileOS.write(data, 0 , byteContent);
}
inputStream.close();
fileOS.close();
} catch (MalformedURLException e) {
throw new FileException("Error while processing url. Make sure it is correct");
} catch (IOException e) {
throw new FileException("Error while downloading file. Make sure the download path is correct");
}
}
I read something about Java NIO, but I couldn't quite understand it, or whether it can help me in this situation.
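Since the question mentions NIO, here is a sketch of two standard-library ways (Java 7+) to let the JDK run the copy loop. `Files.copy(InputStream, Path, CopyOption...)` copies the stream with an internal buffer, and `FileChannel.transferFrom` pulls from a channel into the file. The helper names are hypothetical. In the question's code the input would be `new URL(fileNameURLPath).openStream()`. Note also that the question's `FileOutputStream` is not in the try-with-resources, so it leaks if an exception is thrown. A download is usually limited by network bandwidth rather than by the copy loop, so any speed-up may be modest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class StreamToFile {
    private StreamToFile() { }

    /** Option 1: Files.copy runs the read/write loop; the input is closed afterwards. */
    static long copyWithFiles(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Option 2: channel transfer; loops because transferFrom may move fewer bytes than asked. */
    static long copyWithChannel(InputStream in, Path target) throws IOException {
        try (ReadableByteChannel src = Channels.newChannel(in);
             FileChannel dst = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long n;
            // transferFrom returns 0 once the source is exhausted
            while ((n = dst.transferFrom(src, position, 1L << 20)) > 0) {
                position += n;
            }
            return position;
        }
    }
}
```

A call from `downloadFile` might look like `StreamToFile.copyWithFiles(new URL(fileNameURLPath).openStream(), Paths.get(FileConstants.MandatoryDownloadProperties.path + titleCellValue + ".pdf"))`, with both streams closed by the helper.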

How to read file part by part while writing it?

I'm working on an app that records video, and I need to send the data already written to the video file to a server as a Base64 string without stopping the recording process. Does anyone know how to do it with less memory consumption?
For now I'm doing it this way:
private void sendNewVideos(String path) {
try {
Log.i(TAG, "VIDEO PATH - " + path);
FileWriter fileWriter = new FileWriter(new File(pathToFolder + "/temp.txt"));
String base64String = new String();
File file = new File(path);
Long size = 0L;
base64String = Base64.encodeToString(readFile(file, size), Base64.DEFAULT);
fileWriter.append(base64String);
fileWriter.flush();
boolean flag = true;
while (flag) {
if (size < file.length()) {
base64String = Base64.encodeToString(readFile(file, size), Base64.DEFAULT);
fileWriter.append(base64String);
fileWriter.flush();
size = file.length();
}
}
fileWriter.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private byte[] readFile(File file, Long size) {
try {
RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
randomAccessFile.seek(size);
FileChannel fileChannel = randomAccessFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024 * 2);
while (fileChannel.read(buffer) > 0) {
buffer.flip();
byte[] temp = new byte[buffer.limit()];
for (int i = 0; i < buffer.limit(); i++) {
temp[i] = buffer.get(i);
}
buffer.clear();
return temp;
}
fileChannel.close();
randomAccessFile.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
Writing to a file is just to check how it works. But after some time the recording stops. Sometimes Logcat shows something like this:
I/art: Thread[3,tid=23425,WaitingInMainSignalCatcherLoop,Thread*=0x7fe42c410800,peer=0x22c08080,"Signal Catcher"]: reacting to signal 3
I/art: Wrote stack traces to '/data/anr/traces.txt'
I think that's because of either a memory leak or just an out-of-memory problem.
Some possible solutions:
Don't use Base64 to encode video for sending over the network (even Wi-Fi), as it increases the amount of data by about a third (4 output bytes for every 3 input bytes). That is not very good for the battery and could kill or hang your process/service.
Avoid reading a file that is in the process of being written, as it can and will slow down IO operations.
If you still need to send data from such a file, use an algorithm like the following:
get access to the file (for example, with a buffered input stream);
read part of the file into a buffer;
do as little work with it as possible. For example, send the buffer to the server in a separate thread with HttpURLConnection. You can find an example here.
Control the memory you use; otherwise the system may try to kill your process.
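The steps above can be sketched as a small reader that opens the file once, remembers how far it has read, and on each call returns only the bytes appended since then, using one reusable fixed-size buffer. `GrowingFileReader` and `readNewBytes` are hypothetical names, and sending the chunk to the server is left out. Note that `readFile` in the question returns from inside its `while` loop, so `fileChannel.close()` and `randomAccessFile.close()` are never reached and every call leaks an open file. Together with the busy `while (flag)` loop, that may contribute to the problem.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

final class GrowingFileReader implements Closeable {
    private final RandomAccessFile file; // opened once, closed in close()
    private final byte[] buffer;         // reused for every read
    private long offset;                 // number of bytes already consumed

    GrowingFileReader(File f, int bufferSize) throws IOException {
        this.file = new RandomAccessFile(f, "r");
        this.buffer = new byte[bufferSize];
    }

    /**
     * Returns up to bufferSize bytes appended since the last call,
     * or an empty array if nothing new has been written yet.
     */
    byte[] readNewBytes() throws IOException {
        file.seek(offset);
        int n = file.read(buffer);
        if (n <= 0) {
            return new byte[0];
        }
        offset += n;
        return Arrays.copyOf(buffer, n); // copy so the caller can hand it to another thread
    }

    long offset() {
        return offset;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A caller would poll `readNewBytes()` with a short sleep between empty results instead of spinning. Each returned chunk would be handed to a sender thread, so at most a few buffers of data are in memory at a time.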

Faster way of copying data in Java?

I have been given the task of copying data from a server. I am using BufferedInputStream and an output stream to copy the data, and I am doing it byte by byte. It runs, but it takes ages to copy the data because some files are hundreds of MB, so it's definitely not going to work. Can anyone suggest an alternative to byte-by-byte copying so that my code can copy files of a few hundred MB?
The buffer size is 2048.
Here is what my code looks like:
static void copyFiles(SmbFile[] files, String parent) throws IOException {
SmbFileInputStream input = null;
FileOutputStream output = null;
BufferedInputStream buf_input = null;
try {
for (SmbFile f : files) {
System.out.println("Working on files :" + f.getName());
if (f.isDirectory()) {
File folderToBeCreated = new File(parent+f.getName());
if (!folderToBeCreated.exists()) {
folderToBeCreated.mkdir();
System.out.println("Folder name " + parent
+ f.getName() + "has been created");
} else {
System.out.println("exists");
}
copyFiles(f.listFiles(), parent + f.getName());
} else {
input = (SmbFileInputStream) f.getInputStream();
buf_input = new BufferedInputStream(input, BUFFER);
File t = new File(parent + f.getName());
if (!t.exists()) {
t.createNewFile();
}
output = new FileOutputStream(t);
int c;
int count;
byte data[] = new byte[BUFFER];
while ((count = buf_input.read(data, 0, BUFFER)) != -1) {
output.write(data, 0, count);
}
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (input != null) {
input.close();
}
if (output != null) {
output.close();
}
}
}
Here is a link to an excellent post explaining how to use nio channels to make copies of streams. It introduces a helper method ChannelTools.fastChannelCopy that lets you copy streams like this:
final InputStream input = new FileInputStream(inputFile);
final OutputStream output = new FileOutputStream(outputFile);
final ReadableByteChannel inputChannel = Channels.newChannel(input);
final WritableByteChannel outputChannel = Channels.newChannel(output);
//ChannelTools.fastChannelCopy is the helper introduced in the linked post
ChannelTools.fastChannelCopy(inputChannel, outputChannel);
//closing a channel from Channels.newChannel also closes the wrapped stream
inputChannel.close();
outputChannel.close();
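The linked post isn't reproduced here, so this is only a sketch of what a `fastChannelCopy`-style helper commonly looks like, not that post's code. A direct `ByteBuffer` is filled from the source channel, flipped, written, and compacted, so that bytes left behind by a partial write are kept for the next round. The class name `ChannelTools` mirrors the snippet above, and the 16 KB buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

final class ChannelTools {
    private ChannelTools() { }

    static void fastChannelCopy(ReadableByteChannel src, WritableByteChannel dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
        while (src.read(buffer) != -1) {
            buffer.flip();      // switch from filling to draining
            dest.write(buffer); // may write only part of the buffer
            buffer.compact();   // keep unwritten bytes and make room for more input
        }
        buffer.flip();          // drain whatever is left after end of input
        while (buffer.hasRemaining()) {
            dest.write(buffer);
        }
    }
}
```

With an SMB source the network is likely to be the bottleneck, so a larger buffer may matter as much as the channel API itself.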
Well, since you're using a BufferedInputStream, you aren't reading byte by byte, but rather in chunks of the buffer size. You could just try increasing the buffer size.
Reading/writing byte-by-byte is definitely going to be slow, even though the actual reading/writing is done by chunks of the buffer size. One way to speed it up is to read/write by blocks. Have a look at read(byte[] b, int off, int len) method of BufferedInputStream. However it probably won't give you enough of the improvement.
What would be much better is to use the java.nio package (New I/O) to copy data using NIO channels. Have a look at the NIO documentation for more info.
I would suggest using FileUtils from org.apache.commons.io. It has plenty of utility methods for file operations.
org.apache.commons.io.FileUtils API Here

Java match/exceed performance of readline

For my application, I had to write a custom "readline" method since I wanted to detect and preserve the newline endings in an ASCII text file. The Java readLine() method does not tell which newline sequence (\r, \n, \r\n) or EOF was encountered, so I cannot put the exact same newline sequence when writing to the modified file.
Here is the SSCCE of my test example.
public class TestLineIO {
public static java.util.ArrayList<String> readLineArrayFromFile1(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(file));
String strLine;
while ((strLine = br.readLine()) != null) {
lineArray.add(strLine);
}
br.close();
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile1(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
out.newLine();
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static java.util.ArrayList<String> readLineArrayFromFile2(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.FileInputStream stream = new java.io.FileInputStream(file);
try {
java.nio.channels.FileChannel fc = stream.getChannel();
java.nio.MappedByteBuffer bb = fc.map(java.nio.channels.FileChannel.MapMode.READ_ONLY, 0, fc.size());
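java.nio.CharBuffer cb = java.nio.charset.Charset.defaultCharset().decode(bb);
char[] fileArray = new char[cb.remaining()]; //array() can be longer than the decoded text
cb.get(fileArray);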
if (fileArray == null || fileArray.length == 0) {
return lineArray;
}
int length = fileArray.length;
int start = 0;
int index = 0;
while (index < length) {
if (fileArray[index] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
} else if (fileArray[index] == '\r') {
if (index == length - 1) { //last character in the file
lineArray.add(new String(fileArray, start, length - start));
start = length;
break;
} else {
if (fileArray[index + 1] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 2));
start = index + 2;
index++;
} else {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
}
}
}
index++;
}
if (start < length) {
lineArray.add(new String(fileArray, start, length - start));
}
} finally {
stream.close();
}
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
e.printStackTrace();
return lineArray;
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile2(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static void main(String[] args) {
System.out.println("Begin");
String fileName = "test.txt";
long start = 0;
long stop = 0;
start = java.util.Calendar.getInstance().getTimeInMillis();
java.io.File f = new java.io.File(fileName);
java.util.ArrayList<String> javaLineArray = readLineArrayFromFile1(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File oj = new java.io.File(fileName + "_readline.txt");
writeLineArrayToFile1(javaLineArray, oj);
start = java.util.Calendar.getInstance().getTimeInMillis();
java.util.ArrayList<String> myLineArray = readLineArrayFromFile2(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File om = new java.io.File(fileName + "_custom.txt");
writeLineArrayToFile2(myLineArray, om);
System.out.println("End");
}
}
Version 1 uses readLine(), whereas version 2 is my version, which preserves newline characters.
On a text file with about 500K lines, version1 takes about 380 ms, whereas version2 takes 1074 ms.
How can I speed up the performance of version 2?
I checked Google guava and apache-commons libraries but cannot find a suitable replacement for "readLine()" that will tell which newline character was encountered when reading a text file.
Whenever the issue regards a program's speed, the main thing you should keep in mind is that, for any continuous process within that program, the speed is nearly always limited by one of two things: CPU (processing power) or IO (memory allocation and transfer speed).
Usually either your CPU is faster than your IO, or the contrary. Because of this, your program's speed-limit is almost always dictated by one of them, and it's usually easy to know which:
A program that does a lot of calculations but makes only a few, small operations with files, is almost certainly CPU-bound.
A program that reads a lot of data from files, or writes a lot of data to them, but is not very demanding towards processing, is almost certainly IO-bound.
Things are fairly straightforward when trying to improve a CPU-bound program's speed. It mostly comes down to achieving the same goal or effect while performing fewer operations.
This, on the other hand, does not make the process any easier. In fact, it's usually much harder to optimize CPU-bound programs than IO-bound ones, because each CPU-related operation is usually unique and has to be revised individually.
Although generally easier once you have the experience, things are not so straightforward with IO-bound programs. There is a lot more to consider when dealing with IO-bound processes.
I'll be using Hard-Disk Drives (HDDs) as the basis, since the characteristics I'll mention affect HDDs the strongest (because they are mechanical), but you should keep in mind that many of the same concepts apply, to some extent, to almost every memory-storage hardware, including Solid-State Drives (SSDs) and even RAM!
These are the main performance characteristics of most memory-storage hardware:
Access time: Also known as response time, it is the time it takes before the hardware can actually transfer data.
For mechanical hardware such as HDDs, this is mostly related to the mechanical nature of the drive, in other words, its rotating disk and moving "heads". As such, the access time of mechanical drives can vary significantly from one drive to another.
For circuital hardware such as SSDs and RAM, this time is not dependent on moving parts, but rather electrical connections, so the access time is very quick and consistent, and you shouldn't worry about it.
Seek time: The time it takes for the hardware to seek (reach) the correct position within its internal subdivisions, in order to read from or write to addresses in that section.
For mechanical drives, mainly rotary ones, the seek time measures the time it takes the head assembly on the actuator arm to travel to the track of the disk where the data will be read from or written to.
Average seek time ranges from about 3 ms for high-end server drives to about 15 ms for mobile drives, with the most common desktop drives typically having a seek time of around 9 ms.
With RAM and SSDs, there are no moving parts, so a measurement of the seek time is only testing the electronic circuits, and preparing a particular location on the memory in the device for the operation.
Typical SSDs have a seek time of about 0.08 to 0.16 ms, with RAM being even faster.
Command-Processing time: Also known as command overhead, it is the time it takes for the drive's electronics to set up the necessary communication between the various internal components, so it can read or write the data.
This is in the range of 0.003 ms for both mechanical and circuital devices, and is usually ignored in benchmarks.
Settle time: It is the time it takes for the heads to settle on the target track and stop vibrating, so that they do not read or write off-track.
This amount is usually very small (typically less than 0.1 ms), and typically included in benchmarks as part of the seek time.
Data-Transfer rate: Also called throughput, it covers both the internal rate, which is the speed at which data moves between the disk surface and the controller on the drive, and the external rate, which is the speed at which data moves between the drive's controller and an external component in the host system. It has a few sub-factors:
Media rate: Speed at which the drive can read bits from the media. In other words, the actual read/write speed.
Sector overhead: Additional time (bytes) needed for control structures and other information necessary to manage the drive, locate and validate data and perform other support functions.
Allocation speed: Similar to sector overhead, it's the time taken for the drive to determine the slots that will be written to, and to register them in its address dictionary. Only needed for write operations.
Head-Switch time: Time required to electrically switch from one head to another; only applies to multi-head drives and is about 1 to 2 ms.
Cylinder-switch time: Time required to move to an adjacent track; the name cylinder is used because typically all the tracks of a drive with more than one head or data surface are read before moving the actuator, implying the image of a circle or cylinder rather than a track. This time is exclusive to rotary mechanical drives, and is typically about 2 to 3 ms.
This means that the main performance issues regarding IO are caused by going back and forth between IO and processing. This can be enormously diminished by using buffers, and by processing and reading/writing data in bigger chunks rather than byte by byte.
As you can also see, although many of the speed characteristics are still present, RAM and SSDs do not have the same internal limits of HDDs, so their internal and external transfer rates often reach the maximum capabilities of the drive-to-host interface.
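The point about going back and forth between IO and processing can be made concrete by counting how often the underlying stream is actually asked for data. In this sketch, `CountingInputStream` is a hypothetical class written for this illustration. Reading 100,000 bytes one at a time from an unbuffered stream makes 100,001 calls (the last one returns -1). The same byte-at-a-time loop over a `BufferedInputStream` reaches the underlying stream only a handful of times, because each refill grabs up to 8 KB (the default buffer size). With a real file or socket, each of those calls is typically a system call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CountingInputStream extends FilterInputStream {
    int calls; // number of read requests that reached this stream

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }

    /** Reads data one byte at a time and reports how many calls reached the counting stream. */
    static int callsForByteByByteRead(byte[] data, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        try (InputStream in = buffered ? new BufferedInputStream(counter) : counter) {
            while (in.read() != -1) {
                // just consume the bytes
            }
        }
        return counter.calls;
    }
}
```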
Chunk approach example:
This example will create a Test folder on the desktop, and generate a Test.txt file within.
The file is generated with a specified number of lines, each line containing the word "test" repeated a specific number of times (for file-size purposes). Each line is ended by "\r", "\n" or "\r\n", in rotation.
It's meaningless to accumulate the results of every chunk in memory, as doing so would eventually put the whole file in memory, which is nearly the same problem as not using chunks to begin with.
As such, an output file is created in the same Test folder, and the result of each chunk is written to it once that chunk is finished.
The base file is read using buffers, and those buffers are additionally used as the chunks.
The process here is simply printing a textual version of the line-separator ("\\r", "\\n" or "\\r\\n"), followed by ": ", followed by the line contents; But for the last line, "EOF" is used instead.
To actually operate with chunks, it's probably easier to manage with a class-based approach, rather than a purely function-based one.
Anyway, here is the code:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ChunkExample {
    //Assumed location: a "Test" folder on the desktop (TEST_FOLDER was not defined in the original snippet).
    static final File TEST_FOLDER = new File(System.getProperty("user.home"), "Desktop" + File.separator + "Test");

    public static void main(String[] args) throws IOException {
        TEST_FOLDER.mkdirs();
        File file = new File(TEST_FOLDER, "Test.txt");
        //These settings create a 122 MB file.
        generateTestFile(file, 500000, 50);
        long clock = System.nanoTime();
        processChunks(file, 8 * (int) Math.pow(1024, 2));
        clock = System.nanoTime() - clock;
        float millis = clock / 1000000f;
        float seconds = millis / 1000f;
        System.out.printf(""
                + "%12d nanos%n"
                + "%12.3f millis%n"
                + "%12.3f seconds%n",
                clock, millis, seconds);
    }

    public static File prepareResultFile(File source) {
        String ofn = source.getName(); //Original File Name.
        int extPos = ofn.lastIndexOf('.'); //Extension index.
        String ext = ofn.substring(extPos); //Get extension.
        ofn = ofn.substring(0, extPos); //Get name without extension reusing 'ofn'.
        return new File(source.getParentFile(), ofn + "_Result" + ext);
    }

    //Writes one processed line (textual separator, ": ", contents) and resets the accumulator.
    private static void emit(BufferedOutputStream bos, String separator, StringBuilder sb)
            throws IOException {
        bos.write((separator + ": " + sb + System.lineSeparator()).getBytes());
        sb.setLength(0);
    }

    public static void processChunks(File file, int buffSize) throws IOException {
        //No need for buffers bigger than the file itself (but never a 0-byte buffer).
        if (file.length() < buffSize) {
            buffSize = (int) Math.max(1, file.length());
        }
        byte[] buffer = new byte[buffSize];
        StringBuilder sb = new StringBuilder();
        boolean pendingCR = false; //The previous chunk ended with '\r'.
        try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file), buffSize);
             BufferedOutputStream bos = new BufferedOutputStream(
                     new FileOutputStream(prepareResultFile(file)), buffSize)) {
            int n;
            while ((n = bis.read(buffer)) != -1) {
                int i = 0;
                if (pendingCR) { //Resolve a "\r" or "\r\n" split between two chunks.
                    if (buffer[0] == '\n') {
                        emit(bos, "\\r\\n", sb);
                        i = 1; //Skip '\n'.
                    } else {
                        emit(bos, "\\r", sb);
                    }
                    pendingCR = false;
                }
                for (; i < n; i++) { //Only the n bytes actually read are valid.
                    if (buffer[i] == '\r') {
                        if (i + 1 < n) {
                            if (buffer[i + 1] == '\n') {
                                emit(bos, "\\r\\n", sb);
                                i++; //Skip '\n'.
                            } else {
                                emit(bos, "\\r", sb);
                            }
                        } else {
                            pendingCR = true; //Decided when the next chunk arrives.
                        }
                    } else if (buffer[i] == '\n') {
                        emit(bos, "\\n", sb);
                    } else {
                        sb.append((char) buffer[i]);
                    }
                }
            }
            if (pendingCR) { //The file ended with '\r'.
                emit(bos, "\\r", sb);
            }
            bos.write(("EOF: " + sb).getBytes());
        }
        System.out.println("Finished!");
    }

    public static boolean generateTestFile(File file, int lines, int elements)
            throws IOException {
        String[] lineBreakers = {"\r", "\n", "\r\n"};
        //try-with-resources replaces the manual close (which threw an NPE if opening failed).
        try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file))) {
            for (int i = 0; i < lines; i++) {
                for (int ii = 1; ii < elements; ii++) {
                    bos.write("test ".getBytes());
                }
                bos.write("test".getBytes());
                bos.write(lineBreakers[i % 3].getBytes());
            }
            bos.flush();
            System.out.printf("LOG: Test file \"%s\" created.%n", file.getName());
            return true;
        } catch (IOException ex) {
            System.err.println("ERR: Could not write file.");
            throw ex;
        }
    }
}
I don't know what IDE you are using, but if it's NetBeans, make a memory profile of your code and compare it to a profile of this one. You should notice a big difference in the amount of memory needed during processing.
Here, the chunk approach's memory usage, which includes not only the chunk itself but also the program's own variables and structures, does not go over 40 MB even though we are dealing with a file bigger than 100 MB.
It also spends very little time in GC, mostly less than 5% at any given point.
The second version doesn't seem to use BufferedReader or another form of buffer. It might be the cause of slow down.
Since you seem to read the whole file in memory, you can perhaps read it as a big string (with a buffer) then parse it in memory to analyze the line endings.
You are doubling the out statements (one for the line and one for the newline):
Can you try the following (use System.lineSeparator() to get the line separator and append it before writing):
out.write(lineArray.get(i)+System.lineSeparator());
Don't reinvent the wheel.
Check the BufferedReader#readLine() code
Copy, paste, and make the changes you need to keep the line separator inside the line
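Along those lines, here is a sketch (not the JDK's `readLine()` source) of a reader that returns each line together with its terminator, so `"\r"`, `"\n"` and `"\r\n"` can be written back unchanged. It reads characters through a `BufferedReader`, so the file is still fetched in large blocks, and it keeps one character of lookahead after a `'\r'`. The class and method names are hypothetical, and the charset is whatever the caller's `Reader` uses.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;

final class EolPreservingReader implements Closeable {
    private static final int NONE = -2;
    private final BufferedReader in;
    private int lookahead = NONE; // char read past a '\r' that belongs to the next line

    EolPreservingReader(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    private int next() throws IOException {
        if (lookahead != NONE) {
            int c = lookahead;
            lookahead = NONE;
            return c;
        }
        return in.read();
    }

    /** Returns the next line including its "\r", "\n" or "\r\n" terminator, or null at EOF. */
    String readLineWithTerminator() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = next()) != -1) {
            sb.append((char) c);
            if (c == '\n') {
                return sb.toString();
            }
            if (c == '\r') {
                int following = in.read(); // peek to detect "\r\n"
                if (following == '\n') {
                    sb.append('\n');
                } else if (following != -1) {
                    lookahead = following; // keep it for the next line
                }
                return sb.toString();
            }
        }
        return sb.length() > 0 ? sb.toString() : null; // last line without a terminator
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Because every returned string keeps its own terminator, writing the strings back one after another (as `writeLineArrayToFile2` does) reproduces the original line endings.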
