Limit Android Filesize - java

Background
I'm keeping a relatively large text file in Android storage and appending to it periodically, while limiting the file's size to some arbitrary limit (say 2 MB).
Hopefully I'm missing a function somewhere, or there is a better way to do this.
Currently, when file A grows over that limit, I create a temporary file B and copy into it the relevant portion of A (more or less the substring of A starting at byte xxx, where xxx is the number of bytes by which A would exceed the limit if I wrote the next bit of data to the log) plus the new data. Then I overwrite A with B.
This is obviously terribly inefficient...
Another solution that I'm not terribly fond of is to keep two files and toggle between them: when the current one is full, clear the other one and switch output to it.
However, it would be super handy if I could just do something like this:
File A = new File("output");
A.chip(500);
or maybe
A.subfile(500,A.length()-500);
TL;DR:
Is there a function or perhaps library available for Android that can remove a portion of a file?

Have you looked at RandomAccessFile? Although you cannot remove portions of a file, you can seek to any position within it and even set its length. So if you detect that your file has grown too large, grab the relevant portion and jump back to the beginning: set the length to 0 and write the new data.
EDIT:
I wrote a small demo. It shows what happens if the file size is limited to 10 bytes. If you write the values 10 to 15 as strings separated by commas, the file is rewritten from the beginning after 10,11,12, so after 15 it reads 13,14,15.
import android.app.Activity;
import android.os.Bundle;
import android.util.Log;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class MainActivity extends Activity {
    private static final String TAG = MainActivity.class.getSimpleName();
    private static final long MAX = 10; // maximum file size in bytes
    private static final String FILE_TXT = "file.txt";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        for (int i = 10; i <= 15; i++) {
            if (i > 10) {
                writeToFile(",");
            }
            writeToFile(Integer.toString(i));
        }
    }

    private void writeToFile(String text) {
        // Compare byte counts, not char counts: they differ for non-ASCII text.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        File f = new File(getFilesDir(), FILE_TXT);
        try (RandomAccessFile file = new RandomAccessFile(f, "rw")) {
            if (file.length() + bytes.length > MAX) {
                file.setLength(0); // too big: start over from the beginning
            }
            file.seek(file.length()); // append at the end
            file.write(bytes);
        } catch (IOException e) {
            Log.e(TAG, "writeToFile()", e);
        }
        printFileContents();
    }

    private void printFileContents() {
        StringBuilder sb = new StringBuilder();
        try (FileInputStream fin = openFileInput(FILE_TXT)) {
            int ch;
            while ((ch = fin.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            Log.e(TAG, "printFileContents()", e);
        }
        Log.d(TAG, "current content: " + sb.toString());
    }
}
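The demo empties the file once the next write would exceed the limit. If you would rather keep the most recent data, which is closer to what the question asks, the same RandomAccessFile calls (seek, readFully, write, setLength) can shift the tail of the file to the front. This is a minimal sketch under stated assumptions: TailLimitedLog and appendKeepingTail are hypothetical names, and the trim works on raw bytes, so it can cut an entry in half.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: "keep only the last maxBytes bytes" with RandomAccessFile.
public class TailLimitedLog {

    /** Appends data, then trims the file so that only its last maxBytes bytes remain. */
    static void appendKeepingTail(File f, byte[] data, int maxBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(raf.length());   // append at the end
            raf.write(data);
            long len = raf.length();
            if (len <= maxBytes) {
                return;               // still within the limit
            }
            byte[] tail = new byte[maxBytes];
            raf.seek(len - maxBytes); // start of the newest maxBytes bytes
            raf.readFully(tail);
            raf.seek(0);              // move them to the front...
            raf.write(tail);
            raf.setLength(maxBytes);  // ...and drop everything after them
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("tail", ".log");
        f.deleteOnExit();
        for (int i = 10; i <= 15; i++) {
            appendKeepingTail(f, (i + ",").getBytes(StandardCharsets.US_ASCII), 10);
        }
        // Prints ",13,14,15," : the byte-level cut does not respect entry boundaries.
        System.out.println(new String(Files.readAllBytes(f.toPath()), StandardCharsets.US_ASCII));
    }
}
```

Each trim copies maxBytes bytes, so with a 2 MB limit you would probably trim down to a lower watermark (say half the limit) so that the copy happens rarely rather than on every append.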

Related

Holding a static StringBuilder

We want a StringBuilder that holds a record of some events on the device.
We considered writing to and reading from a file, but the cost of opening and closing a file every time we write to it seems too high.
The issue is that we sometimes get a StackOverflowError, even though we try to keep the StringBuilder within a fixed size:
public class DiagnosticUtil {
    private static final int DIAGNOSTIC_SIZE = 5000;
    public static StringBuilder DIAGNOSTICS_HOLDER = new StringBuilder(DIAGNOSTIC_SIZE);

    public static void addDiagnosticLine(String message) {
        try {
            //Limits the size of the diagnostics collection by keeping only the last 2000 characters
            if (DiagnosticUtil.DIAGNOSTICS_HOLDER.length() > DIAGNOSTIC_SIZE - 300) {
                DiagnosticUtil.DIAGNOSTICS_HOLDER.delete(0, DiagnosticUtil.DIAGNOSTICS_HOLDER.length() - 2000);
            }
            DIAGNOSTICS_HOLDER.append(TimeUtils.getCurrentDate()).append(message).append("\n");
        } catch (Exception e) {
            Timber.d("Error saving additional data");
        }
    }
}
The question is: is this a good approach, or should we save these logs to an external file?
Thanks!
When you create a StringBuilder with an initial capacity, you consume that memory up front, because StringBuilder internally allocates a char[] array of that size before you append any String to it, so you should use the default constructor there. Why did you decide to use a StringBuilder instead of a List? I don't see the whole picture, but I think you might prefer something that combines the two approaches (in-memory log and file log): when you have collected a certain number of messages, simply write them to the file. That way you don't touch the file system for every message, and you don't fill memory with all that log data. You need code something like this:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DiagnosticUtil {
    private static final int THRESHOLD = 1000;
    private static final File LOG = new File("path to your file");
    private static List<String> messages = new ArrayList<>();

    public static synchronized void addDiagnosticLine(String message) {
        //Always record the message first, so none is dropped when flushing.
        messages.add(TimeUtils.getCurrentDate() + message + "\n");
        if (messages.size() >= THRESHOLD) {
            //Open in append mode (true) so earlier batches are not overwritten.
            try (BufferedWriter file = new BufferedWriter(new FileWriter(LOG, true))) {
                for (String msg : messages) {
                    file.write(msg);
                }
            } catch (IOException e) {
                Timber.d("Error saving additional data " + e);
            }
            messages = new ArrayList<>();
        }
    }
}
Note that this is procedural code, not OOP; utility classes like this are bad practice.
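If the goal is only to keep the most recent diagnostics in memory, the List idea from this answer can also be made self-limiting: an ArrayDeque that drops its oldest entry once a maximum line count is exceeded never grows past that bound, and evicting a line does not shift the remaining text the way StringBuilder.delete(0, …) does. This is a minimal sketch; BoundedLog and its methods are hypothetical names, and the methods are synchronized on the assumption that several threads may log at once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bounded in-memory log: keeps only the newest maxLines entries.
public class BoundedLog {
    private final int maxLines;
    private final Deque<String> lines = new ArrayDeque<>();

    public BoundedLog(int maxLines) {
        this.maxLines = maxLines;
    }

    /** Adds a line and evicts the oldest ones once maxLines is exceeded. */
    public synchronized void add(String line) {
        lines.addLast(line);
        while (lines.size() > maxLines) {
            lines.removeFirst();
        }
    }

    /** Returns a copy of the current contents, oldest first (e.g. to write to a file). */
    public synchronized List<String> snapshot() {
        return new ArrayList<>(lines);
    }

    public static void main(String[] args) {
        BoundedLog log = new BoundedLog(3);
        for (int i = 1; i <= 5; i++) {
            log.add("event " + i);
        }
        System.out.println(log.snapshot()); // [event 3, event 4, event 5]
    }
}
```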

Reading a block of bytes from one file and writing to another until all blocks are read?

I am working on a project in which I have to do some file reading and writing. I have to read 8 bytes at a time from a file, perform some operations on that block, and then write the block to a second file, repeating the cycle until the first file has been completely read in chunks of 8 bytes. After manipulation, the data should be appended to the second file. However, I am facing some problems. This is what I am trying:
private File readFromFile1(File file1) {
int offset = 0;
long message= 0;
try {
FileInputStream fis = new FileInputStream(file1);
byte[] data = new byte[8];
file2 = new File("file2.txt");
FileOutputStream fos = new FileOutputStream(file2.getAbsolutePath(), true);
DataOutputStream dos = new DataOutputStream(fos);
while(fis.read(data, offset, 8) != -1)
{
message = someOperation(data); // operation according to business logic
dos.writeLong(message);
}
fos.close();
dos.close();
fis.close();
} catch (IOException e) {
System.out.println("Some error occurred while reading from File:" + e);
}
return file2;
}
I am not getting the desired output this way. Any help is appreciated.
Consider the following code:
private File readFromFile1(File file1) {
int offset = 0;
long message = 0;
File file2 = null;
try {
FileInputStream fis = new FileInputStream(file1);
byte[] data = new byte[8]; //Read buffer
byte[] tmpbuf = new byte[8]; //Temporary chunk buffer
file2 = new File("file2.txt");
FileOutputStream fos = new FileOutputStream(file2.getAbsolutePath(), true);
DataOutputStream dos = new DataOutputStream(fos);
int readcnt; //Read count
int chunk; //Chunk size to write to tmpbuf
while ((readcnt = fis.read(data, 0, 8)) != -1) {
//// POINT A ////
//Skip chunking system if an 8 byte octet is read directly.
if(readcnt == 8 && offset == 0){
message = someOperation(data); // the 8 bytes are in data here, not in tmpbuf
dos.writeLong(message);
continue;
}
//// POINT B ////
chunk = Math.min(tmpbuf.length - offset, readcnt); //Determine how much to add to the temp buf.
System.arraycopy(data, 0, tmpbuf, offset, chunk); //Copy bytes to temp buf
offset = offset + chunk; //Sets the offset to temp buf
if (offset == 8) {
message = someOperation(tmpbuf); // operation according to business logic
dos.writeLong(message);
if (chunk < readcnt) {
System.arraycopy(data, chunk, tmpbuf, 0, readcnt - chunk);
offset = readcnt - chunk;
} else {
offset = 0;
}
}
}
//// POINT C ////
//Process remaining bytes here...
//message = foo(tmpbuf);
//dos.writeLong(message);
dos.close(); //Also closes fos.
fis.close();
} catch (IOException e) {
System.out.println("Some error occurred while reading from File:" + e);
}
return file2;
}
In this excerpt of code, what I did was:
Modified your reading code to use the number of bytes actually returned by the read() method (stored in readcnt).
Added a byte chunking system (processing does not happen until there are at least 8 bytes in the chunking buffer).
Allowed for separate processing of the final bytes that do not make up a full 8-byte block.
As you can see from the code, the data being read is first stored in a chunking buffer (tmpbuf) until at least 8 bytes are available. This only happens when read() does not return 8 bytes at once: if 8 bytes are available directly and nothing is pending in the chunking buffer, they are processed directly (see "POINT A" in the code). This is an optimization to avoid unnecessary array copies.
The chunking system uses an offset that is incremented every time bytes are written to tmpbuf, until it reaches 8 (it never goes over, because the Math.min() call used to compute chunk limits the value). When offset == 8, the processing code runs.
If that particular read produced more bytes than were processed, the leftover bytes are copied to the start of tmpbuf and offset is set accordingly; otherwise offset is reset to 0.
The cycle then repeats.
The code leaves the last few bytes that do not fill an 8-byte block in tmpbuf, with offset indicating how many there are. This data can then be processed separately at POINT C.
This seems a lot more complicated than it should be, and there is probably a better solution (possibly using existing Java library methods), but off the top of my head, this is what I came up with. I hope this is clear enough to follow.
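One library method that removes most of the bookkeeping is InputStream.readNBytes(byte[], int, int), available since Java 9: it keeps reading until the requested number of bytes has arrived or the stream ends, so a short count can only happen at the end of the input. The sketch below assumes Java 9+; BlockProcessor is a hypothetical name, and someOperation is a stand-in for the asker's business logic that just reinterprets the 8 bytes as a big-endian long.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class BlockProcessor {

    // Hypothetical placeholder for the real business logic.
    static long someOperation(byte[] block) {
        return ByteBuffer.wrap(block).getLong();
    }

    /**
     * Processes the input in 8-byte blocks and writes one long per block.
     * Returns how many trailing bytes (0..7) did not form a full block;
     * those bytes are not processed here.
     */
    static int process(InputStream in, OutputStream out) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        byte[] block = new byte[8];
        int n;
        // readNBytes returns fewer than 8 bytes only at end of stream (0 if nothing was left).
        while ((n = in.readNBytes(block, 0, block.length)) == block.length) {
            dos.writeLong(someOperation(block));
        }
        dos.flush();
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[20];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int leftover = process(new ByteArrayInputStream(input), out);
        System.out.println(out.size() + " bytes written, " + leftover + " left over");
    }
}
```

For real files you would pass new BufferedInputStream(new FileInputStream(file1)) and new BufferedOutputStream(new FileOutputStream(file2, true)); the true keeps the asker's append behaviour, and the buffers mean each 8-byte read is served from memory instead of going to the file.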
You could use the following; it uses NIO, and especially the ByteBuffer class for the long handling. You can of course implement it the standard Java IO way, but since I am a NIO fan, here is a possible solution.
The major problem in your code is that while(fis.read(data, offset, 8) != -1) reads up to 8 bytes, not always 8 bytes; also, reading in such small portions is not very efficient.
I have put some comments in my code; if something is unclear, please leave a comment. My someOperation(...) function just reads the next long value from the buffer.
Update:
added finally block to close the files.
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;
public class TestFile {
static final int IN_BUFFER_SIZE = 1024 * 8;
static final int OUT_BUFFER_SIZE = 1024 *9; // make the out-buffer > in-buffer, i am lazy and don't want to check for overruns
static final int MIN_READ_BYTES = 8;
static final int MIN_WRITE_BYTES = 8;
private File readFromFile1(File inFile) {
final File outFile = new File("file2.txt");
final ByteBuffer inBuffer = ByteBuffer.allocate(IN_BUFFER_SIZE);
final ByteBuffer outBuffer = ByteBuffer.allocate(OUT_BUFFER_SIZE);
FileChannel readChannel = null;
FileChannel writeChannel = null;
try {
// open a file channel for reading and writing
readChannel = FileChannel.open(inFile.toPath(), StandardOpenOption.READ);
writeChannel = FileChannel.open(outFile.toPath(), StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND); // append, as in the question
long totalReadByteCount = 0L;
long totalWriteByteCount = 0L;
boolean readMore = true;
while (readMore) {
// read some bytes into the in-buffer
int readOp = 0;
while (inBuffer.hasRemaining() && (readOp = readChannel.read(inBuffer)) != -1) { // stop when full, otherwise read() keeps returning 0 forever
totalReadByteCount += readOp;
} // while
// prepare the in-buffer to be consumed
inBuffer.flip();
// check whether the end of the file was reached
if (readOp == -1) {
// end of file reached, read no more
readMore = false;
} // if
// now consume the in-buffer as long as there are at least MIN_READ_BYTES in it
while (inBuffer.remaining() >= MIN_READ_BYTES) {
// add data to the write buffer
outBuffer.putLong(someOperation(inBuffer));
} // while
// compact the in-buffer and prepare for the next read, if we need to read more.
// that way the possible remaining bytes of the in-buffer can be consumed after leaving the loop
if (readMore) inBuffer.compact();
// prepare the out-buffer to be consumed
outBuffer.flip();
// write the out-buffer until the buffer is empty
while (outBuffer.hasRemaining())
totalWriteByteCount += writeChannel.write(outBuffer);
// prepare the out-buffer for writing again
outBuffer.clear(); // clear(), not flip(): flip() would shrink the limit and a later putLong would overflow
} // while
// error handling
if (inBuffer.hasRemaining()) {
System.err.println("Truncated data! Not a long value! bytes remaining: " + inBuffer.remaining());
} // if
System.out.println("read total: " + totalReadByteCount + " bytes.");
System.out.println("write total: " + totalWriteByteCount + " bytes.");
} catch (IOException e) {
System.out.println("Some error occurred while reading from File: " + e);
} finally {
if (readChannel != null) {
try {
readChannel.close();
} catch (IOException e) {
System.out.println("Could not close read channel: " + e);
} // catch
} // if
if (writeChannel != null) {
try {
writeChannel.close();
} catch (IOException e) {
System.out.println("Could not close write channel: " + e);
} // catch
} // if
} // finally
return outFile;
}
private long someOperation(ByteBuffer bb) {
// consume the buffer, do whatever you want with the buffer.
return bb.getLong(); // consumes 8 bytes of the buffer.
}
public static void main(String[] args) {
TestFile testFile = new TestFile();
File source = new File("input.txt");
testFile.readFromFile1(source);
}
}

Java match/exceed performance of readline

For my application, I had to write a custom "readline" method, since I wanted to detect and preserve the newline endings in an ASCII text file. Java's readLine() method does not tell you which newline sequence (\r, \n, \r\n) or EOF was encountered, so I cannot write the exact same newline sequence to the modified file.
Here is the SSCCE of my test example.
public class TestLineIO {
public static java.util.ArrayList<String> readLineArrayFromFile1(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(file));
String strLine;
while ((strLine = br.readLine()) != null) {
lineArray.add(strLine);
}
br.close();
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile1(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
out.newLine();
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static java.util.ArrayList<String> readLineArrayFromFile2(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.FileInputStream stream = new java.io.FileInputStream(file);
try {
java.nio.channels.FileChannel fc = stream.getChannel();
java.nio.MappedByteBuffer bb = fc.map(java.nio.channels.FileChannel.MapMode.READ_ONLY, 0, fc.size());
char[] fileArray = java.nio.charset.Charset.defaultCharset().decode(bb).array();
if (fileArray == null || fileArray.length == 0) {
return lineArray;
}
int length = fileArray.length;
int start = 0;
int index = 0;
while (index < length) {
if (fileArray[index] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
} else if (fileArray[index] == '\r') {
if (index == length - 1) { //last character in the file
lineArray.add(new String(fileArray, start, length - start));
start = length;
break;
} else {
if (fileArray[index + 1] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 2));
start = index + 2;
index++;
} else {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
}
}
}
index++;
}
if (start < length) {
lineArray.add(new String(fileArray, start, length - start));
}
} finally {
stream.close();
}
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
e.printStackTrace();
return lineArray;
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile2(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static void main(String[] args) {
System.out.println("Begin");
String fileName = "test.txt";
long start = 0;
long stop = 0;
start = java.util.Calendar.getInstance().getTimeInMillis();
java.io.File f = new java.io.File(fileName);
java.util.ArrayList<String> javaLineArray = readLineArrayFromFile1(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File oj = new java.io.File(fileName + "_readline.txt");
writeLineArrayToFile1(javaLineArray, oj);
start = java.util.Calendar.getInstance().getTimeInMillis();
java.util.ArrayList<String> myLineArray = readLineArrayFromFile2(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File om = new java.io.File(fileName + "_custom.txt");
writeLineArrayToFile2(myLineArray, om);
System.out.println("End");
}
}
Version 1 uses readLine(), whereas version 2 is my version, which preserves newline characters.
On a text file with about 500K lines, version 1 takes about 380 ms, whereas version 2 takes 1074 ms.
How can I speed up version 2?
I checked the Google Guava and Apache Commons libraries but cannot find a suitable replacement for readLine() that tells which newline character was encountered when reading a text file.
Whenever a program's speed is the issue, the main thing to keep in mind is that, for any continuous process within that program, the speed is nearly always limited by one of two things: CPU (processing power) or IO (memory allocation and transfer speed).
Usually either your CPU is faster than your IO, or vice versa. Because of this, your program's speed limit is almost always dictated by one of them, and it's usually easy to tell which:
A program that does a lot of calculations but only a few small operations on files is almost certainly CPU-bound.
A program that reads a lot of data from files, or writes a lot of data to them, but is not very demanding in terms of processing, is almost certainly IO-bound.
Things are fairly straightforward when trying to improve a CPU-bound program's speed: it mostly comes down to achieving the same goal or effect with fewer operations.
That, on the other hand, does not make the process any easier. In fact, it's usually much harder to optimize CPU-bound programs than IO-bound ones, because each CPU-related operation is usually unique and has to be revised individually.
Although generally easier once you have the experience, things are not so straightforward with IO-bound programs: there is a lot more to consider when dealing with IO-bound processes.
I'll use hard-disk drives (HDDs) as the basis, since the characteristics I'll mention affect HDDs most strongly (because they are mechanical), but keep in mind that many of the same concepts apply, to some extent, to almost every kind of storage hardware, including solid-state drives (SSDs) and even RAM.
These are the main performance characteristics of most memory-storage hardware:
Access time: Also known as response time, it is the time it takes before the hardware can actually transfer data.
For mechanical hardware such as HDDs, this is mostly related to the mechanical nature of the drive, in other words, its rotating disks and moving "heads". As such, the access time of mechanical drives can vary significantly from one drive to another.
For electronic hardware such as SSDs and RAM, this time does not depend on moving parts but on electrical connections, so the access time is very short and consistent, and you shouldn't need to worry about it.
Seek time: The time it takes for the hardware to seek (reach) the correct position within its internal subdivisions in order to read from or write to addresses in that section.
For mechanical drives, mainly rotary ones, the seek time measures the time it takes the head assembly on the actuator arm to travel to the track of the disk where the data will be read from or written to.
Average seek time ranges from about 3 ms for high-end server drives to about 15 ms for mobile drives, with the most common desktop drives typically around 9 ms.
With RAM and SSDs there are no moving parts, so the seek time only measures the electronic circuits preparing a particular location in the device's memory for the operation.
Typical SSDs have a seek time of about 0.08 to 0.16 ms, with RAM being even faster.
Command-processing time: Also known as command overhead, it is the time it takes for the drive's electronics to set up the necessary communication between the various internal components so that it can read or write the data.
This is around 0.003 ms for both mechanical and electronic devices, and is usually ignored in benchmarks.
Settle time: The time it takes for the heads to settle on the target track and stop vibrating, so that they do not read or write off-track.
This is usually very small (typically less than 0.1 ms) and is typically included in benchmarks as part of the seek time.
Data-transfer rate: Also called throughput, it covers both the internal rate, which is the speed at which data moves between the disk surface and the drive's controller, and the external rate, which is the speed at which data moves between the drive's controller and an external component in the host system. It has a few sub-factors:
Media rate: The speed at which the drive can read bits from the media; in other words, the actual read/write speed.
Sector overhead: Additional time (and bytes) needed for control structures and other information necessary to manage the drive, locate and validate data, and perform other support functions.
Allocation speed: Similar to sector overhead, it's the time taken for the drive to determine the slots that will be written to and to register them in its address dictionary. Only needed for write operations.
Head-switch time: The time required to electrically switch from one head to another; it only applies to multi-head drives and is about 1 to 2 ms.
Cylinder-switch time: The time required to move to an adjacent track. The name cylinder is used because typically all the tracks of a drive with more than one head or data surface are read before moving the actuator, which suggests the image of a cylinder rather than a single track. This time is exclusive to rotary mechanical drives and is typically about 2 to 3 ms.
This means that the main IO performance issues are caused by going back and forth between IO and processing, an issue that can be greatly reduced by using buffers and by processing and reading/writing bigger chunks of data rather than single bytes.
As you can also see, although many of these speed characteristics still apply, RAM and SSDs do not have the same internal limits as HDDs, so their internal and external transfer rates often reach the maximum capabilities of the drive-to-host interface.
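To see the effect of buffering concretely, you can count how many times the underlying stream is actually asked for data. This sketch is illustrative only: CountingStream and drain are hypothetical helpers, and an in-memory ByteArrayInputStream stands in for the file, so it counts calls rather than measuring disk time. Reading 100,000 bytes one at a time directly costs one call per byte, while the same loop through a BufferedInputStream only reaches the underlying stream once per buffer refill.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {

    /** Counts every read request that reaches the wrapped stream. */
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream byte by byte, as a naive parser would. */
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) {
            // discard the byte
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];

        CountingStream direct = new CountingStream(new ByteArrayInputStream(data));
        drain(direct);

        CountingStream underlying = new CountingStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(underlying, 8192));

        System.out.println("unbuffered calls: " + direct.calls);     // one per byte, plus the final -1
        System.out.println("buffered calls:   " + underlying.calls); // roughly one per 8 KB refill
    }
}
```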
Chunk approach example:
This example creates a Test folder on the desktop and generates a Test.txt file in it.
The file is generated with a specified number of lines, each containing the word "test" repeated a specified number of times (to control the file size). The lines end with "\r", "\n" and "\r\n" in turn.
It's pointless to accumulate the results of each chunk in memory, since doing so would eventually put the whole file in memory, which is nearly the same problem as not using chunks at all.
Instead, an output file is created in the same Test folder, and the result of each chunk is written to it once that chunk is finished.
The base file is read using buffers, and those buffers also serve as the chunks.
The processing here simply writes a textual version of the line separator ("\\r", "\\n" or "\\r\\n"), followed by ": ", followed by the line contents; for the last line, "EOF" is used instead.
To actually operate on chunks, a class-based approach is probably easier to manage than a purely function-based one.
Anyway, here is the code:
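//Assumed definition (not shown in the original): the Test folder on the desktop.
static final File TEST_FOLDER = new File(System.getProperty("user.home"), "Desktop" + File.separator + "Test");

public static void main(String[] args) throws IOException {
    TEST_FOLDER.mkdirs(); //Make sure the folder exists.
    File file = new File(TEST_FOLDER, "Test.txt");
    //These settings create a file of 125,166,666 bytes (about 125 MB).
    generateTestFile(file, 500000, 50);
    long clock = System.nanoTime();
    processChunks(file, 8 * 1024 * 1024); //8 MB chunks.
    clock = System.nanoTime() - clock;
    float millis = clock / 1000000f;
    float seconds = millis / 1000f;
    System.out.printf(""
            + "%12d nanos\n"
            + "%12.3f millis\n"
            + "%12.3f seconds\n",
            clock, millis, seconds);
}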
public static File prepareResultFile(File source) {
    String ofn = source.getName(); //Original File Name.
    int extPos = ofn.lastIndexOf('.'); //Extension index, -1 if there is none.
    String ext = extPos >= 0 ? ofn.substring(extPos) : ""; //Get extension.
    if (extPos >= 0) {
        ofn = ofn.substring(0, extPos); //Get name without extension reusing 'ofn'.
    }
    return new File(source.getParentFile(), ofn + "_Result" + ext);
}
public static void processChunks(File file, int buffSize) throws IOException {
    //No need for buffers bigger than the file itself (but keep at least 1 byte).
    if (file.length() < buffSize) {
        buffSize = (int) Math.max(1, file.length());
    }
    byte[] buffer = new byte[buffSize];
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file), buffSize);
         BufferedOutputStream bos = new BufferedOutputStream(
                 new FileOutputStream(prepareResultFile(file)), buffSize)) {
        StringBuilder sb = new StringBuilder();
        boolean pendingCR = false; //The previous chunk ended with '\r'.
        int n;
        //read() may return fewer bytes than buffer.length, so only the first n are valid.
        while ((n = bis.read(buffer)) != -1) {
            int i = 0;
            //Check if a "\r\n" was split between chunks.
            if (pendingCR) {
                pendingCR = false;
                if (buffer[0] == '\n') {
                    writeLine(bos, "\\r\\n", sb);
                    i = 1; //Skip '\n'.
                } else {
                    writeLine(bos, "\\r", sb);
                }
            }
            for (; i < n; i++) {
                if (buffer[i] == '\r') {
                    if (i + 1 < n) {
                        if (buffer[i + 1] == '\n') {
                            writeLine(bos, "\\r\\n", sb);
                            i++; //Skip '\n'.
                        } else {
                            writeLine(bos, "\\r", sb);
                        }
                    } else {
                        pendingCR = true; //Decided by the first byte of the next chunk.
                    }
                } else if (buffer[i] == '\n') {
                    writeLine(bos, "\\n", sb);
                } else {
                    sb.append((char) buffer[i]);
                }
            }
        }
        if (pendingCR) {
            writeLine(bos, "\\r", sb); //The file ended with a lone '\r'.
        }
        bos.write(("EOF: " + sb).getBytes());
    }
    System.out.println("Finished!");
}

//Writes "<separator>: <line>" and resets the accumulator.
private static void writeLine(OutputStream out, String separator, StringBuilder line)
        throws IOException {
    out.write((separator + ": " + line + System.lineSeparator()).getBytes());
    line.setLength(0);
}
public static boolean generateTestFile(File file, int lines, int elements)
        throws IOException {
    String[] lineBreakers = {"\r", "\n", "\r\n"};
    try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file))) {
        for (int i = 0; i < lines; i++) {
            for (int ii = 1; ii < elements; ii++) {
                bos.write("test ".getBytes());
            }
            bos.write("test".getBytes());
            bos.write(lineBreakers[i % 3].getBytes());
        }
    } catch (IOException ex) {
        System.err.println("ERR: Could not write file.");
        throw ex;
    }
    System.out.printf("LOG: Test file \"%s\" created.\n", file.getName());
    return true;
}
I don't know which IDE you are using, but if it's NetBeans, take a memory profile of your code and compare it with a profile of this one. You should notice a big difference in the amount of memory needed during processing.
Here, the chunk approach's memory usage, which includes not only the chunk itself but also the program's own variables and structures, does not go over 40 MB, even though we are dealing with a file bigger than 100 MB.
It also spends very little time in GC (garbage collection), mostly less than 5% at any given point.
The second version doesn't seem to use a BufferedReader or another form of buffering. That might be the cause of the slowdown.
Since you seem to read the whole file into memory anyway, you could read it as one big string (with a buffer) and then parse it in memory to analyze the line endings.
You are doubling the calls on out (one for the line and one for the newline).
Try the following instead (use System.lineSeparator() to get the line separator and append it before writing):
out.write(lineArray.get(i)+System.lineSeparator());
Don't reinvent the wheel.
Check the BufferedReader#readLine() source code.
Copy it, paste it, and make the changes you need to keep the line separator inside the line.
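Along the lines of that suggestion, here is a sketch that reads through a BufferedReader (so the file is not touched for every character) and keeps each line's own terminator, so writing the file back is plain concatenation. It is a hypothetical helper, not the JDK's readLine() source, and it has not been benchmarked against either version in the question. A PushbackReader peeks one character after \r so that \r\n is recognised even when it straddles the reader's internal buffer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TerminatorPreservingReader {

    /**
     * Splits the text into lines that keep their own terminator ("\n", "\r" or "\r\n").
     * The last line has no terminator if the text does not end with one.
     * The caller remains responsible for closing source.
     */
    static List<String> readLinesKeepingTerminators(Reader source) throws IOException {
        PushbackReader in = new PushbackReader(new BufferedReader(source, 1 << 16), 1);
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            line.append((char) c);
            if (c == '\n') {
                lines.add(line.toString());
                line.setLength(0);
            } else if (c == '\r') {
                int next = in.read();      // peek at the following character
                if (next == '\n') {
                    line.append('\n');     // "\r\n" belongs to this line
                } else if (next != -1) {
                    in.unread(next);       // start of the next line: push it back
                }
                lines.add(line.toString());
                line.setLength(0);
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());    // final line ended by EOF
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "a\rb\nc\r\nd";
        List<String> lines = readLinesKeepingTerminators(new StringReader(text));
        System.out.println(lines.size());                        // 4
        System.out.println(String.join("", lines).equals(text)); // true
    }
}
```

For a file you would pass something like new InputStreamReader(new FileInputStream(file), StandardCharsets.US_ASCII), since the question deals with ASCII text.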

Java FTP Download progress

I have looked at many examples and tried to understand what I'm doing wrong, but with no success; maybe you can help me. It always stops at the second file, and the first one is just created on C:\ with a size of 0 KB.
files_server is an ArrayList with all the files that I want to download.
My code:
public FTPConnection() {
StartD std = new StartD();
std.start();
}
class StartD extends Thread{
@Override
public void run()
{
for (int i = 0; i < files_server.size(); i++) {
err = ftpDownload(files_server.get(i), "C:/"+ files_server.get(i));
if (!err)
{
System.out.println("Error in download, breaking");
break;
}
}
}
public boolean ftpDownload(String srcFilePath, String desFilePath)
{
try {
FileOutputStream desFileStream = new FileOutputStream(desFilePath);
InputStream input = mFTPClient.retrieveFileStream(srcFilePath);
byte[] data = new byte[1024];
int count;
while ((count = input.read(data)) != -1)
{
desFileStream.write(data, 0, count);
}
desFileStream.close();
} catch (Exception e) {
return false;
}
return true;
}}
If I use the following function:
public boolean ftpDownload(String srcFilePath, String desFilePath) {
boolean status = false;
try {
FileOutputStream desFileStream = new FileOutputStream(desFilePath);
status = mFTPClient.retrieveFile(srcFilePath, desFileStream);
desFileStream.close();
return status;
} catch (Exception e) {
}
return status;
}
instead, everything works just fine, but I can't monitor the download progress.
I've only used it for unzipping files, not for FTP, but in that case InputStream buffers can return zero, so I'd say it's worth trying to change your while loop to something like:
while ((count = input.read(data)) >= 0)
public int read(byte[] b) throws IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.
If the length of b is zero, then no bytes are read and 0 is returned;
It could also be that you're assigning count twice, which would throw away the first chunk of data:
int count = input.read(data);
while ((count = input.read(data)) != -1)
So don't assign anything to count when you declare it.
Let's assume your library is the FTP client from the commons-net package. It's not easy to figure out what's wrong with your code, because we can't run it and because your description (it stops at the second file) is not sufficient: does it throw an exception? Does it hang forever? Does it complete without any side effect? Anyway, I have a couple of pieces of advice:
Use a CountingOutputStream (from Apache commons-io) to monitor progress
Use a ProtocolCommandListener to log what's going on
Also, note that the first 1024 bytes are always lost. Finally, I'm not sure how safe it is to put a file in C:\ with the same name it has on the server. At best, it could lead to permission problems; at worst, it could introduce a security flaw. This doesn't apply if you have some degree of control over the filenames, but consider this advice anyway.
This is a sample client:
public class FTP {
public static void main(String[] args) throws SocketException, IOException {
FTPClient client = new FTPClient();
client.addProtocolCommandListener(new ProtocolCommandListener(){
@Override
public void protocolCommandSent(ProtocolCommandEvent evt) {
logger.debug(evt.getMessage());
}
@Override
public void protocolReplyReceived(ProtocolCommandEvent evt) {
logger.debug(evt.getMessage());
}
});
client.connect("ftp.mozilla.org");
client.login("anonymous", "");
client.enterLocalPassiveMode();
OutputStream out = new CountingOutputStream(new NullOutputStream()) {
@Override
public void beforeWrite(int count) {
super.beforeWrite(count);
logger.info("Downloaded " + getCount() + " bytes");
}
};
for (String filename: new String[] {"MD5SUMS", "SHA1SUMS"})
client.retrieveFile("pub/firefox/releases/15.0b4/" + filename, out);
out.close();
client.disconnect();
}
private static Logger logger;
static {
logger = Logger.getLogger(FTP.class.getCanonicalName());
}
}
Once configured, the logger will output the whole FTP command/reply conversation, which may help you understand the problem better, provided it's on the FTP side and not in the application's IO.
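To get progress while still streaming the data yourself, the copy loop can report a running total after each chunk. The sketch below uses only the standard library so it can run without a server; ProgressCopy and copyWithProgress are hypothetical names. When feeding it from Apache Commons Net's FTPClient.retrieveFileStream(), that method's documentation requires you to close the returned stream and then call completePendingCommand() to finish the transfer, otherwise subsequent commands may behave unexpectedly. The question's loop never calls it, which may be why the download stalls on the second file. retrieveFileStream() can also return null if the transfer could not be opened.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.LongConsumer;

public class ProgressCopy {

    /** Copies in to out, reporting the running byte total after every chunk; returns the total. */
    static long copyWithProgress(InputStream in, OutputStream out, LongConsumer onProgress)
            throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int count;
        // read() returns -1 at end of stream; only the first `count` bytes of buf are valid.
        while ((count = in.read(buf)) != -1) {
            out.write(buf, 0, count);
            total += count;
            onProgress.accept(total);
        }
        out.flush();
        return total;
    }

    // Possible use with a Commons Net FTPClient `ftp` (not compiled here):
    //   InputStream in = ftp.retrieveFileStream(src);   // check for null before using it
    //   try (OutputStream out = new FileOutputStream(dest)) {
    //       copyWithProgress(in, out, n -> System.out.println(n + " bytes"));
    //   } finally {
    //       in.close();
    //   }
    //   boolean ok = ftp.completePendingCommand();       // finalizes the transfer

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyWithProgress(new ByteArrayInputStream(new byte[20_000]), out,
                total -> System.out.println("Downloaded " + total + " bytes"));
        System.out.println(copied + " bytes copied");
    }
}
```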

Java multiple connection downloading

I'd like some advice. I have started a new project to create a Java download accelerator that uses multiple connections, and I want to know how best to go about it.
So far I have figured out that I can use HttpURLConnection with the Range request property, but I want to know an efficient way of doing this. Once I have downloaded the parts over the multiple connections, I will have to join them so that we end up with a fully downloaded file.
Thanks in advance :)
Get the content length of the file to download.
Divide it according to some criterion (size, speed, …).
Run multiple threads that download the file starting at different positions, and save the parts in different files: myfile.part1, myfile.part2, …
Once downloaded, join the parts into one single file.
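The first three steps could be sketched as follows, assuming the server reports the content length and honours Range requests (answering 206 Partial Content). splitRanges and downloadRange are hypothetical helpers; the split is by equal size only, only the range arithmetic runs in main, and nothing here handles retries or servers that ignore Range. Each downloadRange call would run on its own thread, for example through an ExecutorService.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class RangePlanner {

    /** Splits [0, length) into at most `parts` contiguous inclusive ranges {start, end}. */
    static List<long[]> splitRanges(long length, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long chunk = (length + parts - 1) / parts; // ceiling division
        for (long start = 0; start < length; start += chunk) {
            ranges.add(new long[] {start, Math.min(start + chunk, length) - 1});
        }
        return ranges;
    }

    /** Downloads bytes start..end (inclusive) of url into partFile. */
    static void downloadRange(URL url, long start, long end, Path partFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Range not honoured, HTTP " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, partFile, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        for (long[] r : splitRanges(10_000, 3)) {
            System.out.println("bytes=" + r[0] + "-" + r[1]);
        }
    }
}
```

For a 10,000-byte file and 3 parts this plans the ranges bytes=0-3333, bytes=3334-6667 and bytes=6668-9999.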
I tried the following code to get the content length:
public Downloader(String path) throws IOException {
    URL url = new URL(path);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    //A HEAD request fetches only the headers, so the body is not downloaded.
    connection.setRequestMethod("HEAD");
    //getContentLengthLong() avoids int overflow for files over 2 GB; -1 means unknown.
    long len = connection.getContentLengthLong();
    System.out.println(len);
    System.out.println(connection.getContentType());
    connection.disconnect();
}
Here's some sample code to join the files:
public void join(String filePath) {
    //Concatenates filePath1.sp, filePath2.sp, ... (in order, until one is missing) into filePath.
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(filePath))) {
        for (int count = 1; ; count++) {
            File part = new File(filePath + count + ".sp");
            if (!part.exists()) {
                break;
            }
            //Files.copy streams the part in large blocks instead of one byte per call.
            Files.copy(part.toPath(), out);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
If you want to avoid joining segments after downloading you could use a FileChannel.
With a FileChannel, you can write to any position of a file (even with multiple threads).
So you could first allocate the whole file, and then
write the segments where they belong as they come in.
See the Javadocs page for more info.
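A minimal sketch of the pre-allocate-then-write idea, assuming each segment's bytes are already in memory (in a real downloader they would arrive from the network). FileChannel.write(ByteBuffer, long) writes at an absolute position without moving the channel's own position, so several threads can share one channel. SegmentWriter, assemble and the demo data are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentWriter {

    /** Writes all of src at the given absolute position; a positional write may be partial, so loop. */
    static void writeFully(FileChannel ch, ByteBuffer src, long position) throws IOException {
        while (src.hasRemaining()) {
            position += ch.write(src, position);
        }
    }

    /** Pre-allocates target and writes each segment at its offset, one thread per segment. */
    static void assemble(Path target, byte[][] segments) throws Exception {
        long total = 0;
        for (byte[] s : segments) {
            total += s.length;
        }
        ExecutorService pool = Executors.newFixedThreadPool(segments.length);
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            raf.setLength(total); // allocate the whole file up front
            FileChannel ch = raf.getChannel();
            List<Future<?>> jobs = new ArrayList<>();
            long offset = 0;
            for (byte[] s : segments) {
                final long pos = offset;
                jobs.add(pool.submit(() -> {
                    writeFully(ch, ByteBuffer.wrap(s), pos);
                    return null;
                }));
                offset += s.length;
            }
            for (Future<?> job : jobs) {
                job.get(); // wait for completion, rethrowing any failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("download", ".bin");
        byte[][] segments = {
            "Hello, ".getBytes(StandardCharsets.US_ASCII),
            "multi-".getBytes(StandardCharsets.US_ASCII),
            "part world".getBytes(StandardCharsets.US_ASCII)
        };
        assemble(target, segments);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.US_ASCII));
        Files.delete(target);
    }
}
```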
JDownloader is the best downloader I've seen. If you are interested, it's open source and surely you can learn a lot from their code.
