Split mp3 file into chunks using multiple threads - java

I have to write a program which can split and merge files with various extensions. While splitting and merging it should use multiple threads. My code can do only half of the task - if I don't use multithreading, it splits the file perfectly. If I do use multithreading, it splits the file, but saves only the first part several times.
What should I fix to make it work?
A method of my Splitter class:
public void splitFile(CustomFile customFile, int dataSize) {
    for (int i = 1; i <= partsNumber; i++) {
        FileSplitterThread thread = new FileSplitterThread(customFile, i, dataSize);
        thread.start();
    }
}
Run method of my thread:
@Override
public void run() {
    try {
        fileInputStream = new FileInputStream(initialFile.getData());
        byte[] b = new byte[dataSize];
        String fileName = initialFile.getName() + "_part_" + index + "." + initialFile.getExtension();
        fileOutputStream = new FileOutputStream(fileName);
        int i = fileInputStream.read(b);
        fileOutputStream.write(b, 0, i);
        fileOutputStream.close();
        fileOutputStream = null;
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The reason is that you cannot achieve multi-threaded file splitting with just an InputStream. Each thread opens its own stream and always reads the file from the beginning, so every thread gets the same bytes.
For a simple file splitting mechanism, the following could be the general steps:
Get the size of the file (data size)
Chunk it into offsets for each thread to read. For example, if you have 2 threads and the data is 1000 bytes, the offsets will be 0 and 500, each with a read length of 500: the first thread reads bytes 0 to 499, and the second thread starts at 500 and reads up to 999.
Get two InputStreams and position them using a Channel (here is a good post: Java how to read part of file from specified position of bytes?)
Encapsulate the above info (InputStream, offset, length to read, output file name, etc.) and provide it to each of the threads (a sketch of this follows below).
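A minimal sketch of that idea, assuming a hypothetical SplitTask and using FileChannel.transferTo to do the copying (names, part naming and error handling are illustrative only; the usual java.nio imports are assumed):
// Hypothetical sketch: each task positions itself at its own offset in the
// source file and copies 'length' bytes into its own part file.
class SplitTask implements Runnable {
    private final Path source;
    private final long offset;
    private final long length;
    private final int index;

    SplitTask(Path source, long offset, long length, int index) {
        this.source = source;
        this.offset = offset;
        this.length = length;
        this.index = index;
    }

    @Override
    public void run() {
        Path part = Paths.get(source.toString() + "_part_" + index);
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(part,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long transferred = 0;
            while (transferred < length) {
                // transferTo copies directly between channels, starting at 'offset'
                long n = in.transferTo(offset + transferred, length - transferred, out);
                if (n <= 0) {
                    break; // reached end of file
                }
                transferred += n;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Each thread then reads only its own slice, so no two parts contain the same bytes.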


Is it ok to use a parallel stream with FileWriter?

I want to write a Stream to a file. However, the Stream is big (a few GB when written to a file), so I want to process it in parallel. At the end of the process, I would like to write it to a file (I am using FileWriter).
I would like to ask if that can potentially cause any problems with the file.
Here is some code
function to write stream to file
public static void writeStreamToFile(Stream<String> ss, String fileURI) {
    try (FileWriter wr = new FileWriter(fileURI)) {
        ss.forEach(line -> {
            try {
                if (line != null) {
                    wr.write(line + "\n");
                }
            } catch (Exception ex) {
                System.err.println("error when write file");
            }
        });
    } catch (IOException ex) {
        Logger.getLogger(OaStreamer.class.getName()).log(Level.SEVERE, null, ex);
    }
}
how I use my stream
Stream<String> ss = Files.lines(path).parallel()
        .map(x -> dosomething(x))
        .map(x -> dosomethingagain(x));
writeStreamToFile(ss, "path/to/output.csv");
As others have mentioned, this approach should work; however, you should question whether it is the best method. Writing to a file is a shared operation between threads, meaning you are introducing thread contention.
While it is easy to think that having multiple threads will speed up performance, in the case of I/O operations the opposite is true. Remember that I/O bandwidth is finite, so more threads will not increase performance. In fact, this I/O contention will slow down access to the shared resource because of the constant locking and unlocking needed to write to it.
The bottom line is that only one thread can write to a file at a time, so parallelizing write operations is counterproductive.
Consider using multiple threads to handle your CPU intensive tasks, and then having all threads post to a queue/buffer. A single thread can then pull from the queue and write to your file. This solution (and more detail) was suggested in this answer.
Check out this article for more info on thread contention and locks.
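As a rough sketch of that producer/consumer idea (the class name, the POISON_PILL end marker and the IntStream source are made up for illustration; the mapping step stands in for your dosomething calls):
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.IntStream;

public class SingleWriterSketch {
    private static final String POISON_PILL = "__EOF__"; // hypothetical end-of-data marker

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // The single writer thread: the only thread that touches the file.
        Thread writer = new Thread(() -> {
            try (FileWriter out = new FileWriter("output.csv")) {
                String line;
                while (!(line = queue.take()).equals(POISON_PILL)) {
                    out.write(line + System.lineSeparator());
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // CPU-intensive work runs in parallel and only enqueues results.
        IntStream.range(0, 1_000).parallel()
                .mapToObj(i -> "row " + i) // stand-in for the real mapping work
                .forEach(queue::add);

        queue.add(POISON_PILL); // tell the writer there is nothing more to come
        writer.join();
    }
}
The point is that the parallel part never touches the file; only the single writer does.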
Yes, it is OK to use FileWriter as you are using it; I have some other approaches which may be helpful to you.
As you are dealing with large files, FileChannel can be faster than standard IO. The following code writes a String to a file using FileChannel:
@Test
public void givenWritingToFile_whenUsingFileChannel_thenCorrect()
        throws IOException {
    RandomAccessFile stream = new RandomAccessFile(fileName, "rw");
    FileChannel channel = stream.getChannel();
    String value = "Hello";
    byte[] strBytes = value.getBytes();
    ByteBuffer buffer = ByteBuffer.allocate(strBytes.length);
    buffer.put(strBytes);
    buffer.flip();
    channel.write(buffer);
    stream.close();
    channel.close();
    // verify
    RandomAccessFile reader = new RandomAccessFile(fileName, "r");
    assertEquals(value, reader.readLine());
    reader.close();
}
Reference : https://www.baeldung.com/java-write-to-file
You can use Files.write with stream operations as below, which converts the Stream to an Iterable:
Files.write(Paths.get(filepath), (Iterable<String>)yourstream::iterator);
For example:
Files.write(Paths.get("/dir1/dir2/file.txt"),
(Iterable<String>)IntStream.range(0, 1000).mapToObj(String::valueOf)::iterator);
If you have stream of some custom objects, you can always add the .map(Object::toString) step to apply the toString() method.
It is not a problem as long as it is okay for the file to have the lines in arbitrary order. You are processing the content in parallel, not in sequence, so there is no guarantee at which point any given line comes in for processing.
That is the only thing to keep in mind here.
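If you do need the output in the original order, one possible variation (just a sketch, assuming the same dosomething/dosomethingagain methods and the usual java.nio.file / java.util.stream imports) is to keep the parallel mapping but do the writing with forEachOrdered, which preserves the stream's encounter order:
try (Stream<String> lines = Files.lines(path).parallel();
     BufferedWriter wr = Files.newBufferedWriter(Paths.get("path/to/output.csv"))) {
    lines.map(x -> dosomething(x))
         .map(x -> dosomethingagain(x))
         .forEachOrdered(line -> {
             try {
                 wr.write(line);
                 wr.newLine();
             } catch (IOException e) {
                 throw new UncheckedIOException(e); // surface write failures
             }
         });
} catch (IOException e) {
    e.printStackTrace();
}
Note that forEachOrdered serializes the terminal action, so the writing itself is no longer parallel; only the mapping is.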

Reading a >4GB file in Java

I have a mainframe data file which is greater than 4 GB. I need to read and process the data 500 bytes at a time. I have tried using FileChannel, however I am getting an error with the message Integer.Max_VALUE exceeded.
public void getFileContent(String fileName) {
    RandomAccessFile aFile = null;
    FileChannel inChannel = null;
    try {
        aFile = new RandomAccessFile(Paths.get(fileName).toFile(), "r");
        inChannel = aFile.getChannel();
        ByteBuffer buffer = ByteBuffer.allocate(500 * 100000);
        while (inChannel.read(buffer) > 0) {
            buffer.flip();
            for (int i = 0; i < buffer.limit(); i++) {
                byte[] data = new byte[500];
                buffer.get(data);
                processData(new String(data));
                buffer.clear();
            }
        }
    } catch (Exception ex) {
        // TODO
    } finally {
        try {
            inChannel.close();
            aFile.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Can you help me out with a solution?
The worst problem of your code is the
catch (Exception ex) {
    // TODO
}
part, which implies that you won't notice any exceptions thrown by your code. Since there is nothing in the JRE printing an "Integer.Max_VALUE exceeded" message, that problem must be connected to your processData method.
It might be worth noting that this method will be invoked way too often with repeated data.
Your loop
for (int i = 0; i < buffer.limit(); i++) {
implies that you iterate as many times as there are bytes within the buffer, up to 500 * 100000 times. You are extracting 500 bytes from the buffer in each iteration, processing a total of up to 500 * 500 * 100000 bytes after each read, but since you have a misplaced buffer.clear(); at the end of the loop body, you will never experience a BufferUnderflowException. Instead, you will invoke processData each of the up to 500 * 100000 times with the first 500 bytes of the buffer.
But the whole conversion from bytes to a String is unnecessarily verbose and contains unnecessary copy operations. Instead of implementing this yourself, you can and should just use a Reader.
Besides that, your code makes a strange detour. It starts with a Java 7 API, Paths.get, to convert it to a legacy File object, create a legacy RandomAccessFile to eventually acquire a FileChannel. If you have a Path and want a FileChannel, you should open it directly via FileChannel.open. And, of course, use a try(…) { … } statement to ensure proper closing.
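For illustration, opening the channel directly from the Path with proper resource handling could look roughly like this (a sketch only, assuming the usual java.nio imports):
try (FileChannel inChannel = FileChannel.open(Paths.get(fileName), StandardOpenOption.READ)) {
    ByteBuffer buffer = ByteBuffer.allocate(500 * 100000);
    while (inChannel.read(buffer) > 0) {
        buffer.flip();
        // ... process the buffer contents here ...
        buffer.clear();
    }
} catch (IOException ex) {
    ex.printStackTrace();
}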
But, as said, if you want to process the contents as Strings, you surely want to use a Reader instead:
public void getFileContent(String fileName) {
    try (Reader reader = Files.newBufferedReader(Paths.get(fileName))) {
        CharBuffer buffer = CharBuffer.allocate(500 * 100000);
        while (reader.read(buffer) > 0) {
            buffer.flip();
            while (buffer.remaining() > 500) {
                processData(buffer.slice().limit(500).toString());
                buffer.position(buffer.position() + 500);
            }
            buffer.compact();
        }
        // there might be a remaining chunk of less than 500 characters
        if (buffer.position() > 0) {
            processData(buffer.flip().toString());
        }
    } catch (Exception ex) {
        // the *minimum* to do:
        ex.printStackTrace();
        // TODO real exception handling
    }
}
There is no problem with processing files >4GB, I just tested it with a 8GB file. Note that the code above uses the UTF-8 encoding. If you want to retain the behavior of your original code of using whatever happens to be your system’s default encoding, you may create the Reader using
Files.newBufferedReader(Paths.get(fileName), Charset.defaultCharset())
instead.

Java match/exceed performance of readline

For my application, I had to write a custom "readline" method since I wanted to detect and preserve the newline endings in an ASCII text file. The Java readLine() method does not tell which newline sequence (\r, \n, \r\n) or EOF was encountered, so I cannot put the exact same newline sequence when writing to the modified file.
Here is the SSCCE of my test example.
public class TestLineIO {
public static java.util.ArrayList<String> readLineArrayFromFile1(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(file));
String strLine;
while ((strLine = br.readLine()) != null) {
lineArray.add(strLine);
}
br.close();
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile1(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
out.newLine();
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static java.util.ArrayList<String> readLineArrayFromFile2(java.io.File file) {
java.util.ArrayList<String> lineArray = new java.util.ArrayList<String>();
try {
java.io.FileInputStream stream = new java.io.FileInputStream(file);
try {
java.nio.channels.FileChannel fc = stream.getChannel();
java.nio.MappedByteBuffer bb = fc.map(java.nio.channels.FileChannel.MapMode.READ_ONLY, 0, fc.size());
char[] fileArray = java.nio.charset.Charset.defaultCharset().decode(bb).array();
if (fileArray == null || fileArray.length == 0) {
return lineArray;
}
int length = fileArray.length;
int start = 0;
int index = 0;
while (index < length) {
if (fileArray[index] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
} else if (fileArray[index] == '\r') {
if (index == length - 1) { //last character in the file
lineArray.add(new String(fileArray, start, length - start));
start = length;
break;
} else {
if (fileArray[index + 1] == '\n') {
lineArray.add(new String(fileArray, start, index - start + 2));
start = index + 2;
index++;
} else {
lineArray.add(new String(fileArray, start, index - start + 1));
start = index + 1;
}
}
}
index++;
}
if (start < length) {
lineArray.add(new String(fileArray, start, length - start));
}
} finally {
stream.close();
}
} catch (java.io.IOException e) {
System.err.println("Could not read file");
System.err.println(e);
e.printStackTrace();
return lineArray;
}
lineArray.trimToSize();
return lineArray;
}
public static boolean writeLineArrayToFile2(java.util.ArrayList<String> lineArray, java.io.File file) {
try {
java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
int size = lineArray.size();
for (int i = 0; i < size; i++) {
out.write(lineArray.get(i));
}
out.close();
} catch (java.io.IOException e) {
System.err.println("Could not write file");
System.err.println(e);
return false;
}
return true;
}
public static void main(String[] args) {
System.out.println("Begin");
String fileName = "test.txt";
long start = 0;
long stop = 0;
start = java.util.Calendar.getInstance().getTimeInMillis();
java.io.File f = new java.io.File(fileName);
java.util.ArrayList<String> javaLineArray = readLineArrayFromFile1(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File oj = new java.io.File(fileName + "_readline.txt");
writeLineArrayToFile1(javaLineArray, oj);
start = java.util.Calendar.getInstance().getTimeInMillis();
java.util.ArrayList<String> myLineArray = readLineArrayFromFile2(f);
stop = java.util.Calendar.getInstance().getTimeInMillis();
System.out.println("Total time = " + (stop - start) + " ms");
java.io.File om = new java.io.File(fileName + "_custom.txt");
writeLineArrayToFile2(myLineArray, om);
System.out.println("End");
}
}
Version 1 uses readLine(), whereas version 2 is my version, which preserves newline characters.
On a text file with about 500K lines, version 1 takes about 380 ms, whereas version 2 takes 1074 ms.
How can I speed-up the performance of version2?
I checked Google guava and apache-commons libraries but cannot find a suitable replacement for "readLine()" that will tell which newline character was encountered when reading a text file.
Whenever the issue regards a program's speed, the main thing you should keep in mind is that, for any continuous process within that program, the speed is nearly always limited by one of two things: CPU (processing power) or IO (memory allocation and transfer speed).
Usually either your CPU is faster than your IO, or the contrary. Because of this, your program's speed-limit is almost always dictated by one of them, and it's usually easy to know which:
A program that does a lot of calculations but makes only a few, small operations with files, is almost certainly CPU-bound.
A program that reads a lot of data from files, or writes a lot of data to them, but is not very demanding towards processing, is almost certainly IO-bound.
Things are kinda straightforward when trying to improve a CPU-bound program's speed: it mostly comes down to achieving the same goal or effect while performing fewer operations.
That, on the other hand, does not make the process any easier. In fact, it's usually much harder to optimize CPU-bound programs than IO-bound ones, because each CPU-related operation is usually unique and has to be revised individually.
Although generally easier once you have the experience, things are not so straightforward with IO-bound programs; there is a lot more to consider when dealing with IO-bound processes.
I'll be using Hard-Disk Drives (HDDs) as the basis, since the characteristics I'll mention affect HDDs most strongly (because they are mechanical), but you should keep in mind that many of the same concepts apply, to some extent, to almost all memory-storage hardware, including Solid-State Drives (SSDs) and even RAM!
These are the main performance characteristics of most memory-storage hardware:
Access time: Also known as response time, it is the time it takes before the hardware can actually transfer data.
For mechanical hardware such as HDDs, this is mostly related to the mechanical nature of the drive, in other words, its rotating disks and moving "heads". As such, the access time of mechanical drives can vary significantly from one drive to another.
For circuital hardware such as SSDs and RAM, this time is not dependent on moving parts, but rather electrical connections, so the access time is very quick and consistent, and you shouldn't worry about it.
Seek time: The time it takes for the hardware to seek (reach) the correct position within its internal subdivisions, in order to read from or write to addresses in that section.
For mechanical drives, mainly rotary ones, the seek time measures the time it takes the head assembly on the actuator arm to travel to the track of the disk where the data will be read from or written to.
Average seek time ranges from 3 ms (~) for high-end server drives, to 15 ms (~) for mobile drives, with the most common desktop drives typically having a seek time around 9 ms (~).
With RAM and SSDs, there are no moving parts, so a measurement of the seek time is only testing the electronic circuits, and preparing a particular location on the memory in the device for the operation.
Typical SSDs will have a seek time between 0.08 to 0.16 ms (~), with RAM being even faster.
Command-Processing time: Also known as command overhead, it is the time it takes for the drive's electronics to set up the necessary communication between the various internal components, so it can read or write the data.
This is in the range of 0.003 ms (~) for both, mechanical and circuital devices, and is usually ignored in benchmarks.
Settle time: It is the time it takes for the heads to settle on the target track and stop vibrating, so that they do not read or write off-track.
This amount is usually very small (typically less than 0.1 ms), and typically included in benchmarks as part of the seek time.
Data-Transfer rate: Also called throughput, it covers both: The internal rate, which is the time it takes to move data between the disk surface and the controller on the drive. And the external rate, which is the time to move data between the controller on the drive and an external component in the host system. It has a few sub-factors within:
Media rate: Speed at which the drive can read bits from the media. In other words, the actual read/write speed.
Sector overhead: Additional time (bytes) needed for control structures and other information necessary to manage the drive, locate and validate data and perform other support functions.
Allocation speed: Similar to sector overhead, it's the time taken for the drive to determine the slots that will be written to, and to register them in its address dictionary. Only needed for write operations.
Head-Switch time: Time required to electrically switch from one head to another; Only applies to multi-head drives and is about 1 to 2 ms.
Cylinder-switch time: Time required to move to an adjacent track; The name cylinder is used because typically all the tracks of a drive with more than one head or data surface are read before moving the actuator, implying the image of a circle or cylinder rather than a track. This time is exclusive to rotary mechanical drives, and is typically about 2 to 3 ms.
This means that the main performance issues regarding IO are caused by going back and forth between IO and processing. This is an issue that can be enormously diminished by using buffers, and by processing and reading/writing bigger chunks of data rather than single bytes.
As you can also see, although many of the speed characteristics are still present, RAM and SSDs do not have the same internal limits of HDDs, so their internal and external transfer rates often reach the maximum capabilities of the drive-to-host interface.
Chunk approach example:
This example will create a Test folder on the desktop, and generate a Test.txt file within.
The file is generated with a specified number of lines, each line containing the word "Test" repeated a specific number of times (for file-size purposes). Each line is ended by "\r", "\n" or "\r\n", sequentially.
It's meaningless to save the results of each chunk in memory cumulatively, as doing so would eventually lead to the whole file ending up in memory, which is nearly the same problem as not using chunks to begin with.
As such, an output file is created in the same Test folder, to which the result of every chunk is stored at, once that chunk is finished.
The base file is read using buffers, and those buffers are additionally used as the chunks.
The process here is simply printing a textual version of the line-separator ("\\r", "\\n" or "\\r\\n"), followed by ": ", followed by the line contents; But for the last line, "EOF" is used instead.
To actually operate with chunks, it's probably easier to manage with a class-based approach, rather than a purely function-based one.
Anyways, here goes the code:
public static void main(String[] args) throws FileNotFoundException, IOException {
File file = new File(TEST_FOLDER, "Test.txt");
//These settings create a 122 MB file.
generateTestFile(file, 500000, 50);
long clock = System.nanoTime();
processChunks(file, 8 * (int) Math.pow(1024, 2));
clock = System.nanoTime() - clock;
float millis = clock / 1000000f;
float seconds = millis / 1000f;
System.out.printf(""
+ "%12d nanos\n"
+ "%12.3f millis\n"
+ "%12.3f seconds\n",
clock, millis, seconds);
}
public static File prepareResultFile(File source) {
String ofn = source.getName(); //Original File Name.
int extPos = ofn.lastIndexOf('.'); //Extension index.
String ext = ofn.substring(extPos); //Get extension.
ofn = ofn.substring(0, extPos); //Get name without extension reusing 'ofn'.
return new File(source.getParentFile(), ofn + "_Result" + ext);
}
public static void processChunks(File file, int buffSize)
throws FileNotFoundException, IOException {
//No need for buffers bigger than the file itself.
if (file.length() < buffSize) {
buffSize = (int)file.length();
}
byte[] buffer = new byte[buffSize];
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file), buffSize);
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(
prepareResultFile(file)), buffSize);
StringBuilder sb = new StringBuilder();
while (bis.read(buffer) > (-1)) {
//Check if a "\r\n" was split between chunks.
boolean skipFirst = false;
if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '\r') {
if (buffer[0] == '\n') {
bos.write(("\\r\\n: " + sb.toString() + System.lineSeparator()).getBytes());
sb = new StringBuilder();
skipFirst = true;
}
}
for (int i = skipFirst ? 1 : 0; i < buffer.length; i++) {
if (buffer[i] == '\r') {
if (i + 1 < buffer.length) {
if (buffer[i + 1] == '\n') {
bos.write(("\\r\\n: " + sb.toString() + System.lineSeparator()).getBytes());
i++; //Skip '\n'.
} else {
bos.write(("\\r: " + sb.toString() + System.lineSeparator()).getBytes());
}
sb = new StringBuilder(); //Reset accumulator.
} else {
//A "\r\n" might be split between two chunks.
}
} else if (buffer[i] == '\n') {
bos.write(("\\n: " + sb.toString() + System.lineSeparator()).getBytes());
sb = new StringBuilder(); //Reset accumulator.
} else {
sb.append((char) buffer[i]);
}
}
}
bos.write(("EOF: " + sb.toString()).getBytes());
bos.flush();
bos.close();
bis.close();
System.out.println("Finished!");
}
public static boolean generateTestFile(File file, int lines, int elements)
throws IOException {
String[] lineBreakers = {"\r", "\n", "\r\n"};
BufferedOutputStream bos = null;
try {
bos = new BufferedOutputStream(new FileOutputStream(file));
for (int i = 0; i < lines; i++) {
for (int ii = 1; ii < elements; ii++) {
bos.write("test ".getBytes());
}
bos.write("test".getBytes());
bos.write(lineBreakers[i % 3].getBytes());
}
bos.flush();
System.out.printf("LOG: Test file \"%s\" created.\n", file.getName());
return true;
} catch (IOException ex) {
System.err.println("ERR: Could not write file.");
throw ex;
} finally {
try {
bos.close();
} catch (IOException ex) {
System.err.println("WRN: Could not close stream.");
Logger.getLogger(Q_13458142_v2.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
I don't know what IDE you are using, but if it's NetBeans, make a memory-profile of your code and compare to a profile of this one. You should notice a big difference in the amount of memory needed during processing.
Here, the chunk approach's memory usage, which includes not only the chunk itself but also the program's own variables and structures, does not go over 40 MB even though we are dealing with a file bigger than 100 MB. It also spends very little time in GC, mostly less than 5% at any given point (profiler screenshots omitted).
The second version doesn't seem to use a BufferedReader or another form of buffering. That might be the cause of the slowdown.
Since you seem to read the whole file in memory, you can perhaps read it as a big string (with a buffer) then parse it in memory to analyze the line endings.
You are doubling the out statements (one for the line and one for the newline):
Can you try the below (use lineSeparator() to get the line separator and append it before writing):
out.write(lineArray.get(i)+System.lineSeparator());
Don't reinvent the wheel.
Check the BufferedReader#readLine() code
Copy, paste, and make the changes you need to keep the line separator inside the line
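A rough sketch of such a variant (this is not the JDK code, just an illustration; it assumes the Reader supports mark/reset, e.g. a BufferedReader):
// Returns one line with its terminator ("\r", "\n" or "\r\n") still attached,
// or null at end of stream.
static String readLineKeepingTerminator(Reader in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
        sb.append((char) c);
        if (c == '\n') {
            return sb.toString();
        }
        if (c == '\r') {
            in.mark(1);
            int next = in.read();
            if (next == '\n') {
                sb.append('\n');          // "\r\n" sequence
            } else if (next != -1) {
                in.reset();               // lone '\r': push the look-ahead back
            }
            return sb.toString();
        }
    }
    return sb.length() == 0 ? null : sb.toString(); // last line without terminator
}
BufferedReader already buffers the underlying stream, so the character-by-character loop above is not as slow as it looks.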

How to write byte by byte and display it continuously

I have an encrypted video file, and while decrypting it I have defined a byte[] input = new byte[1024]; buffer to write it to the output file.
Here I want to write the first 1024 bytes to the output file, and at the same time, if I want to play that video file, I should be able to play the output file without waiting for the whole file to be written, like video streaming.
When the first 1024 bytes are written, the video file should start playing while the rest of the file is still being written.
You'll have to set up your input stream and output stream depending on where you're getting the data and where you're saving/viewing it. Performance could also likely be improved with some buffering on the output. You should get the general idea.
public class DecryptionWotsit {
private final BlockingDeque<Byte> queue = new LinkedBlockingDeque<Byte>();
private final InputStream in;
private final OutputStream out;
public DecryptionWotsit(InputStream in, OutputStream out) {
this.in = in;
this.out = out;
}
public void go() {
final Runnable decryptionTask = new Runnable() {
@Override
public void run() {
try {
byte[] encrypted = new byte[1024];
byte[] decrypted = new byte[1024];
while (true) {
int encryptedBytes = in.read(encrypted);
// TODO: decrypt into decrypted, set decryptedBytes
int decryptedBytes = 0;
for (int i = 0; i < decryptedBytes; i++)
queue.addFirst(decrypted[i]);
}
}
catch (Exception e) {
// exception handling left for the reader
throw new RuntimeException(e);
}
}
};
final Runnable playTask = new Runnable() {
@Override
public void run() {
try {
while (true) {
out.write(queue.takeLast());
}
}
catch (Exception e) {
throw new RuntimeException(e);
}
}
};
Executors.newSingleThreadExecutor().execute(decryptionTask);
Executors.newSingleThreadExecutor().execute(playTask);
}
}
You will have to do the writing in a separate thread.
Since writing to file is a lot slower than displaying video, expect the file-writing thread to be running long after you've quit watching the video. Unless (as I understand it) you intend to write only the first 1024 bytes to file.
If you intend to write the entire video to file, a single 1024 byte buffer will slow you down. You will either have to use a buffer that is a lot larger, or need a lot of these 1024-byte buffers. (I suppose the 1024 byte buffer size is a consequence of the decryption algorithm?)
Also, you may want to look at how much memory is available for the JVM, to make sure that you won't get an OutOfMemoryException halfway. You can use the -Xms and -Xmx options to set the amount of memory available to the JVM.
A simple way to write to a file that you also want to process is to open the file twice (or more times). In one thread you write to the file and update a counter saying how much you have written, e.g. a long protected by a synchronized block. In the reading thread(s) you can get this value and read up to that point, repeatedly, until the writer has finished. A simple way to signal that the writer has finished is to set the size to Long.MAX_VALUE, causing the readers to read until EOF. To stop the readers busy-waiting, you can have them wait() until the amount of data written is greater than the amount read.
This approach always uses a fixed amount of memory e.g. 16 - 128K, regardless of how far behind the readers are from the writer.
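A very condensed sketch of that idea (class and method names are made up; error handling and the actual file I/O are omitted):
// The writer calls advance() after each write and finish() at the end;
// readers call awaitMoreThan() before reading past what they have consumed.
class SharedProgress {
    private long written = 0;

    synchronized void advance(long bytes) {
        written += bytes;
        notifyAll();              // wake any waiting reader
    }

    synchronized void finish() {
        written = Long.MAX_VALUE; // signal "read until EOF"
        notifyAll();
    }

    synchronized long awaitMoreThan(long alreadyRead) throws InterruptedException {
        while (written <= alreadyRead) {
            wait();               // avoid busy waiting
        }
        return written;
    }
}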

Java multiple connection downloading

I wanted to get some advice, I have started on a new project to create a java download accelerator that will use multiple connections. I wanted to know how best to go about this.
So far I have figured out that I can use HttpURLConnection with the Range request property, but I wanted to know an efficient way of doing this. Once I have downloaded the parts over the multiple connections, I will then have to join them so that we end up with a fully downloaded file.
Thanks in advance :)
Get the content length of the file to download.
Divide it according to some criterion (size, speed, …).
Run multiple threads to download the file starting at different positions,
and save them in different files: myfile.part1, myfile.part2, …
Once downloaded, join the parts into one single file.
I tried the following code to get the content length:
public Downloader(String path) throws IOException {
    int len = 0;
    URL url = new URL(path);
    URLConnection connectUrl = url.openConnection();
    System.out.println(len = connectUrl.getContentLength());
    System.out.println(connectUrl.getContentType());
    InputStream input = connectUrl.getInputStream();
    int i = len;
    int c = 0;
    System.out.println("=== Content ===");
    while (((c = input.read()) != -1) && (--i > 0)) {
        System.out.print((char) c);
    }
    input.close();
}
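To actually download one segment, the Range request header mentioned in the question could be used roughly like this (a sketch; it assumes the server supports range requests and answers 206 Partial Content):
// Downloads bytes [start, end] of 'url' into 'partFile'. Illustrative only.
static void downloadRange(URL url, long start, long end, File partFile) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
    try (InputStream in = conn.getInputStream();
         FileOutputStream out = new FileOutputStream(partFile)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    } finally {
        conn.disconnect();
    }
}
Each part can then be saved to its own file and joined afterwards, for example with the join method below.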
Here's a sample code to join the files:
public void join(String FilePath) {
    long leninfile = 0, leng = 0;
    int count = 1, data = 0;
    try {
        File filename = new File(FilePath);
        RandomAccessFile outfile = new RandomAccessFile(filename, "rw");
        while (true) {
            filename = new File(FilePath + count + ".sp");
            if (filename.exists()) {
                RandomAccessFile infile = new RandomAccessFile(filename, "r");
                data = infile.read();
                while (data != -1) {
                    outfile.write(data);
                    data = infile.read();
                }
                leng++;
                infile.close();
                count++;
            } else break;
        }
        outfile.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
If you want to avoid joining segments after downloading you could use a FileChannel.
With a FileChannel, you can write to any position of a file (even with multiple threads).
So you could first allocate the whole file, and then
write the segments where they belong as they come in.
See the Javadocs page for more info.
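For example, each download thread could write its segment at its own offset with a positional write, roughly like this (a sketch; partIndex, partSize and downloadedBytes are hypothetical names):
// Writes one downloaded segment at its offset in the shared target file.
static void writeSegment(Path target, byte[] downloadedBytes,
                         long partIndex, long partSize) throws IOException {
    try (FileChannel out = FileChannel.open(target,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
        ByteBuffer segment = ByteBuffer.wrap(downloadedBytes);
        long offset = partIndex * partSize;
        while (segment.hasRemaining()) {
            out.write(segment, offset + segment.position()); // positional write
        }
    }
}
Because the writes are positional, the threads never have to coordinate a shared file pointer.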
JDownloader is the best downloader I've seen. If you are interested, it's open source and surely you can learn a lot from their code.
