Using final variable in lambda expression Java


I have an app that fetches a lot of data, so I would like to paginate the data into chunks and process those chunks individually rather than dealing with all the data at once. So I wrote a function that I call every n seconds to check whether a chunk is done and, if so, process that chunk. My problem is that I have no way of keeping track of the fact that I just processed a chunk and should move on to the next chunk when it is available. I was thinking of something along the lines of the code below; however, I cannot call multiplier++; because the compiler complains that multiplier is no longer effectively final. I would like to use something like multiplier so that once the code processes a chunk it 1) doesn't process the same chunk again and 2) moves on to the next chunk. Is it possible to do this? Is there a modifier one can put on multiplier to help avoid race conditions?
int multiplier = 1;
CompletableFuture<String> completionFuture = new CompletableFuture<>();
final ScheduledFuture<?> checkFuture = executor.scheduleAtFixedRate(() -> {
// parse json response
String response = getJSONResponse();
JsonObject jsonObject = ConverterUtils.parseJson(response, true)
.getAsJsonObject();
int pages = jsonObject.get("stats").getAsJsonObject().get("pages").getAsInt();
// if we have a chunk of n pages records then process them with dataHandler function
if (pages > multiplier * bucketSize) {
dataHandler.apply(getResponsePaginated((multiplier - 1) * bucketSize, bucketSize));
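multiplier++; // compile error: local variables referenced from a lambda expression must be final or effectively final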
}
if (jsonObject.has("finishedAt") && !jsonObject.get("finishedAt").isJsonNull()) {
// we are done!
completionFuture.complete("");
}
}, 0, sleep, TimeUnit.SECONDS);

You can use an AtomicInteger. Since this is a mutable type, you can assign it to a final variable while still being able to change its value. This also addresses the synchronization issue between the callbacks:
final AtomicInteger multiplier = new AtomicInteger(1);
executor.scheduleAtFixedRate(() -> {
//...
multiplier.incrementAndGet();
}, 0, sleep, TimeUnit.SECONDS);
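To make the answer above concrete, here is a minimal runnable sketch. It assumes a hypothetical `fakePages` counter in place of the question's getJSONResponse()/ConverterUtils parsing, and moves the bucket logic into a hypothetical `processAvailable` helper so it can be tested on its own. The captured AtomicInteger reference is effectively final, so the lambda compiles, while its value still advances after each processed bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ChunkPoller {
    // Processes every bucket that is complete according to the question's test
    // (pages > multiplier * bucketSize) and returns the start offsets it handled.
    // `multiplier` holds the 1-based number of the next bucket to process.
    static List<Integer> processAvailable(int pages, int bucketSize, AtomicInteger multiplier) {
        List<Integer> offsets = new ArrayList<>();
        // a while loop (instead of the question's if) catches up when several
        // buckets became ready between two polls
        while (pages > multiplier.get() * bucketSize) {
            offsets.add((multiplier.get() - 1) * bucketSize);
            multiplier.incrementAndGet(); // never revisit this bucket
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        int bucketSize = 10;
        AtomicInteger multiplier = new AtomicInteger(1); // final reference, mutable value
        AtomicInteger fakePages = new AtomicInteger();   // hypothetical stand-in for the JSON "pages" stat
        CompletableFuture<String> completionFuture = new CompletableFuture<>();
        ScheduledFuture<?> checkFuture = executor.scheduleAtFixedRate(() -> {
            int pages = fakePages.addAndGet(7);          // pretend 7 more pages arrived
            for (int offset : processAvailable(pages, bucketSize, multiplier)) {
                System.out.println("processing pages " + offset + " to " + (offset + bucketSize - 1));
            }
            if (pages >= 50) {                           // hypothetical stand-in for the "finishedAt" check
                completionFuture.complete("");
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        completionFuture.get();                          // wait until the job reports completion
        checkFuture.cancel(false);
        executor.shutdown();
    }
}
```

Since scheduleAtFixedRate never runs two executions of the same task concurrently, the AtomicInteger here mainly makes the counter mutable from inside the lambda and guarantees that each run sees the previous run's update.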

Related

REST parallel calls to a service - multithreading in Java

I have a REST API where the maximum number of results returned per call is 1000, starting at page=1:
{
  "status": "OK",
  "payload": {
    "EMPList": [],
    "count": 5665
  }
}
So to get the other results I have to change the start page to 2 and hit the service again, which again returns only 1000 results.
After the first call, I want to make the remaining calls in parallel, collect and combine the results, and send them back to the calling service in Java. Please suggest an approach; I am new to Java. I tried using Callable but it's not working.

It seems to me that ideally you should be able to configure the max count to one appropriate for your use case. I'm assuming you aren't able to do that. Here is a simple, lock-less multithreading scheme that acts as a simple reduction operation for your two network calls:
// online runnable: https://ideone.com/47KsoS
int resultSize = 5;
int[] result = new int[resultSize*2];
Thread pg1 = new Thread(){
public void run(){
System.out.println("Thread 1 Running...");
// write numbers 1-5 to indexes 0-4
for(int i = 0 ; i < resultSize; i ++) {
result[i] = i + 1;
}
System.out.println("Thread 1 Exiting...");
}
};
Thread pg2 = new Thread(){
public void run(){
System.out.println("Thread 2 Running");
// write numbers 6-10 to indexes 5-9
for(int i = 0 ; i < resultSize; i ++) {
result[i + resultSize] = i + 1 + resultSize;
}
System.out.println("Thread 2 Exiting...");
}
};
pg1.start();
pg2.start();
// ensure that pg1 execution finishes
pg1.join();
// ensure that pg2 execution finishes
pg2.join();
// print result of reduction operation
System.out.println(Arrays.toString(result));
There is a very important caveat with this implementation, however: the two threads never write to the same memory locations. This is essential, because if you were to simply change int[] result to ArrayList<Integer>, the concurrent writes could corrupt the result of the reduction; this is called a race condition (the standard ArrayList implementation in Java is not thread-safe). The join() calls matter too: they guarantee that the main thread sees everything both threads wrote. Since we know in advance how large the result will be, I would highly suggest sticking with an array for this multithreaded implementation, as ArrayList hides a lot of implementation logic that you likely won't fully understand until you take a basic data-structures course.
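For the paging scenario in the question, an alternative to raw threads is to fetch the remaining pages with CompletableFuture on an ExecutorService. A minimal sketch, assuming a hypothetical `fetchPage` that stands in for the REST call (here it just fabricates IDs) and a page size of 1000 as in the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelPager {
    static final int PAGE_SIZE = 1000;

    // Hypothetical stand-in for the REST call: returns the IDs on one
    // 1-based page of a result set holding `total` records.
    static List<Integer> fetchPage(int page, int total) {
        int from = (page - 1) * PAGE_SIZE;
        int to = Math.min(from + PAGE_SIZE, total);
        return IntStream.range(from, to).boxed().collect(Collectors.toList());
    }

    // Number of pages needed for `count` records (ceiling division).
    static int pageCount(int count) {
        return (count + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    static List<Integer> fetchAll(int total, ExecutorService pool) {
        // Page 1 is fetched first; in the real API its response carries "count".
        List<Integer> all = new ArrayList<>(fetchPage(1, total));
        // Pages 2..N run concurrently on the pool.
        List<CompletableFuture<List<Integer>>> futures = IntStream.rangeClosed(2, pageCount(total))
                .mapToObj(p -> CompletableFuture.supplyAsync(() -> fetchPage(p, total), pool))
                .collect(Collectors.toList());
        // join() in list order keeps the pages in their original order.
        for (CompletableFuture<List<Integer>> f : futures) {
            all.addAll(f.join());
        }
        return all;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(fetchAll(5665, pool).size()); // 5665
        } finally {
            pool.shutdown();
        }
    }
}
```

Each page is collected into its own list and only merged on the calling thread, so no shared, non-thread-safe collection is written concurrently.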

Update a string for N seconds in a while loop

I just started learning Java and I'm stuck with this problem: I have an infinite while loop that builds a message to send over a socket; currently the message is not sent until a certain number of elements have been polled from a queue.
String msg = null;
String toSend = "";
int currentNumOfMsg = 0;
final int MAX_MSG_TO_SEND = 200;
while(true) {
if ((msg = messageQueue.poll()) != null) { // if there is an element in the list
toSend += (msg + "#");
currentNumOfMsg++;
if (currentNumOfMsg == MAX_MSG_TO_SEND) {
try {
sendMessage(toSend); // send to socket
} finally {
msg = null;
toSend = "";
currentNumOfMsg = 0;
}
}
}
}
My goal is to send the message after N seconds, without waiting to reach MAX_MSG_TO_SEND. Is it possible to do that, or should I continue with this approach?

While the other answer is perfectly valid, I thought it may be valuable to tell you that ScheduledExecutorService lets you run a function foo() every n seconds using its scheduleAtFixedRate() method.
Setting up the executor is as easy as:
ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
ses.scheduleAtFixedRate(foo, 0, n, TimeUnit.SECONDS); // foo is a Runnable (a lambda or a method reference)
I think putting any more code in here is a bit unnecessary. I would really recommend doing it this way, as this class is part of the standard library (java.util.concurrent, so no extra dependencies) and you don't have to worry much about the multithreading and scheduling; it takes care of all that for you. But that's just my $.02.
Leave a question or comment if you have one and I'll try to answer it.
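A minimal sketch of how the scheduler could be combined with the question's batching, assuming a hypothetical `TimedBatcher` class and a `Consumer<String>` standing in for sendMessage: the polling loop calls add(), the scheduler calls flush() every N seconds, and `synchronized` keeps the two threads from touching the buffer at the same time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class TimedBatcher {
    private final StringBuilder pending = new StringBuilder();
    private final int maxMessages;
    private final Consumer<String> sender; // hypothetical stand-in for sendMessage(...)
    private int count = 0;

    TimedBatcher(int maxMessages, Consumer<String> sender) {
        this.maxMessages = maxMessages;
        this.sender = sender;
    }

    // Called by the polling loop for every message taken off the queue.
    synchronized void add(String msg) {
        pending.append(msg).append('#');
        count++;
        if (count == maxMessages) {
            flush(); // batch is full: send right away (the lock is reentrant)
        }
    }

    // Called by the scheduler every N seconds, and by add() when the batch is full.
    synchronized void flush() {
        if (count == 0) {
            return; // nothing accumulated since the last send
        }
        String toSend = pending.toString();
        pending.setLength(0);
        count = 0;
        sender.accept(toSend);
    }

    public static void main(String[] args) throws InterruptedException {
        TimedBatcher batcher = new TimedBatcher(200, s -> System.out.println("sent: " + s));
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        // flush every 2 seconds, even if fewer than 200 messages have arrived
        ses.scheduleAtFixedRate(batcher::flush, 2, 2, TimeUnit.SECONDS);
        batcher.add("hello");
        batcher.add("world");
        Thread.sleep(2500); // give the first scheduled flush (at 2 s) time to run
        ses.shutdown();
    }
}
```

Holding the lock while sending keeps the sketch simple, but it also blocks add() during a slow socket write; swapping the buffer under the lock and sending outside it is one way around that.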

Yes, you can definitely do that. But first you should store the messages you receive in a data structure, and when you want to send data via the socket, send the contents of that data structure.
You can also use Guava's Stopwatch to measure the elapsed time and decide when to send. For further information, see https://dzone.com/articles/guava-stopwatch
Otherwise, you can store System.currentTimeMillis() in a long variable and, on each iteration, check whether the expected time has elapsed, as in the sample code below:
long l = System.currentTimeMillis();
// inside the while loop:
if (System.currentTimeMillis() - l >= 10000) { // 10 seconds have passed
//send data
l = System.currentTimeMillis(); // restart the 10-second window
}
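The elapsed-time check can also be folded into the question's loop. A sketch under these assumptions: the queue is a BlockingQueue (so poll can wait with a timeout instead of spinning), sending is represented by a `Consumer<String>`, and a hypothetical `batches` parameter bounds the loop so the example terminates. It uses System.nanoTime(), which is intended for measuring elapsed time, rather than System.currentTimeMillis().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class TimedSendLoop {
    // Sends the accumulated "msg#msg#..." string as soon as maxMsgs messages are collected
    // or maxWaitMillis have elapsed, whichever comes first. `batches` only bounds the loop
    // so the sketch terminates; the question's version would use while (true).
    static void run(BlockingQueue<String> queue, int maxMsgs, long maxWaitMillis,
                    int batches, Consumer<String> sendMessage) throws InterruptedException {
        StringBuilder toSend = new StringBuilder();
        int currentNumOfMsg = 0;
        int sent = 0;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (sent < batches) {
            // block until a message arrives or the deadline passes, instead of spinning
            String msg = queue.poll(Math.max(0, deadline - System.nanoTime()), TimeUnit.NANOSECONDS);
            if (msg != null) {
                toSend.append(msg).append('#');
                currentNumOfMsg++;
            }
            boolean timeUp = System.nanoTime() - deadline >= 0; // overflow-safe comparison
            if (timeUp || currentNumOfMsg == maxMsgs) {
                if (currentNumOfMsg > 0) {
                    sendMessage.accept(toSend.toString()); // hypothetical stand-in for sendMessage(toSend)
                    sent++;
                    toSend.setLength(0);
                    currentNumOfMsg = 0;
                }
                deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        queue.add("c");
        // 2 messages fill a batch; "c" goes out on its own once 500 ms have passed
        run(queue, 2, 500, 2, batch -> System.out.println("sending " + batch));
    }
}
```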

Sending data to a database in size-limited chunks

I have a method that takes a Partition enum as a parameter. This method is called by multiple background threads (15 max) at around the same time, each passing a different partition value. Here dataHoldersByPartition is a map from Partition to ConcurrentLinkedQueue<DataHolder>.
private final ImmutableMap<Partition, ConcurrentLinkedQueue<DataHolder>> dataHoldersByPartition;
//... some code to populate entry in `dataHoldersByPartition`
private void validateAndSend(final Partition partition) {
ConcurrentLinkedQueue<DataHolder> dataHolders = dataHoldersByPartition.get(partition);
Map<byte[], byte[]> clientKeyBytesAndProcessBytesHolder = new HashMap<>();
int totalSize = 0;
DataHolder dataHolder;
while ((dataHolder = dataHolders.poll()) != null) {
byte[] clientKeyBytes = dataHolder.getClientKey().getBytes(StandardCharsets.UTF_8);
if (clientKeyBytes.length > 255)
continue;
byte[] processBytes = dataHolder.getProcessBytes();
int clientKeyLength = clientKeyBytes.length;
int processBytesLength = processBytes.length;
int additionalLength = clientKeyLength + processBytesLength;
if (totalSize + additionalLength > 50000) {
Message message = new Message(clientKeyBytesAndProcessBytesHolder, partition);
// here size of `message.serialize()` byte array should always be less than 50k at all cost
sendToDatabase(message.getAddress(), message.serialize());
clientKeyBytesAndProcessBytesHolder = new HashMap<>();
totalSize = 0;
}
clientKeyBytesAndProcessBytesHolder.put(clientKeyBytes, processBytes);
totalSize += additionalLength;
}
// calling again with remaining values only if clientKeyBytesAndProcessBytesHolder is not empty
if(!clientKeyBytesAndProcessBytesHolder.isEmpty()) {
Message message = new Message(clientKeyBytesAndProcessBytesHolder, partition);
// here size of `message.serialize()` byte array should always be less than 50k at all cost
sendToDatabase(message.getAddress(), message.serialize());
}
}
And below is my Message class:
public final class Message {
private final byte dataCenter;
private final byte recordVersion;
private final Map<byte[], byte[]> clientKeyBytesAndProcessBytesHolder;
private final long address;
private final long addressFrom;
private final long addressOrigin;
private final byte recordsPartition;
private final byte replicated;
public Message(Map<byte[], byte[]> clientKeyBytesAndProcessBytesHolder, Partition recordPartition) {
this.clientKeyBytesAndProcessBytesHolder = clientKeyBytesAndProcessBytesHolder;
this.recordsPartition = (byte) recordPartition.getPartition();
this.dataCenter = Utils.CURRENT_LOCATION.get().datacenter();
this.recordVersion = 1;
this.replicated = 0;
long packedAddress = new Data().packAddress();
this.address = packedAddress;
this.addressFrom = 0L;
this.addressOrigin = packedAddress;
}
// Output of this method should always be less than 50k
public byte[] serialize() {
int bufferCapacity = getBufferCapacity(clientKeyBytesAndProcessBytesHolder); // 36 + dataSize + 1 + 1 + keyLength + 8 + 2;
ByteBuffer byteBuffer = ByteBuffer.allocate(bufferCapacity).order(ByteOrder.BIG_ENDIAN);
// header layout
byteBuffer.put(dataCenter).put(recordVersion).putInt(clientKeyBytesAndProcessBytesHolder.size())
.putInt(bufferCapacity).putLong(address).putLong(addressFrom).putLong(addressOrigin)
.put(recordsPartition).put(replicated);
// now the data layout
for (Map.Entry<byte[], byte[]> entry : clientKeyBytesAndProcessBytesHolder.entrySet()) {
byte keyType = 0;
byte[] key = entry.getKey();
byte[] value = entry.getValue();
byte keyLength = (byte) key.length;
short valueLength = (short) value.length;
ByteBuffer dataBuffer = ByteBuffer.wrap(value);
long timestamp = valueLength > 10 ? dataBuffer.getLong(2) : System.currentTimeMillis();
byteBuffer.put(keyType).put(keyLength).put(key).putLong(timestamp).putShort(valueLength)
.put(value);
}
return byteBuffer.array();
}
private int getBufferCapacity(Map<byte[], byte[]> clientKeyBytesAndProcessBytesHolder) {
int size = 36;
for (Map.Entry<byte[], byte[]> entry : clientKeyBytesAndProcessBytesHolder.entrySet()) {
size += 1 + 1 + 8 + 2;
size += entry.getKey().length;
size += entry.getValue().length;
}
return size;
}
// getters and to string method here
}
Basically, what I have to make sure is that whenever the sendToDatabase method is called, the size of the message.serialize() byte array is always less than 50k, at all costs. My sendToDatabase method sends the byte array coming out of the serialize method. Because of that condition I am doing the validation below, plus a few other things. In the method, I iterate over the dataHolders CLQ and extract clientKeyBytes and processBytes from each entry. Here is the validation I am doing:
If the clientKeyBytes length is greater than 255, I skip it and continue iterating.
I keep incrementing the totalSize variable by the sum of clientKeyLength and processBytesLength, and this totalSize should always be less than 50000 bytes.
As soon as it reaches the 50000 limit, I send the clientKeyBytesAndProcessBytesHolder map to the sendToDatabase method, clear out the map, reset totalSize to 0 and start populating again.
If it doesn't reach that limit and dataHolders becomes empty, it sends whatever it has.
I believe there is a bug in my current code, because of which some records may not be sent properly or may be dropped somewhere due to my condition, and I am not able to figure it out. To properly enforce this 50k condition, do I have to use the getBufferCapacity method to figure out the correct size before calling sendToDatabase?
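On the question's closing guess: a sketch of a size check that counts what serialize() actually writes, namely the 36-byte header plus 12 bytes of per-entry overhead plus each key and value, exactly as the question's own getBufferCapacity does, and that flushes before an entry would reach the limit. The `SizeLimitedBatcher` and `Entry` names are hypothetical, and sending to the database is left out.

```java
import java.util.ArrayList;
import java.util.List;

class SizeLimitedBatcher {
    static final int HEADER_BYTES = 36;    // fixed header, as in getBufferCapacity
    static final int PER_ENTRY_BYTES = 12; // keyType(1) + keyLength(1) + timestamp(8) + valueLength(2)

    static final class Entry {
        final byte[] key;
        final byte[] value;

        Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }

        int serializedSize() {
            return PER_ENTRY_BYTES + key.length + value.length;
        }
    }

    // Same arithmetic as the question's getBufferCapacity: header plus every entry.
    static int serializedSize(List<Entry> batch) {
        int size = HEADER_BYTES;
        for (Entry e : batch) {
            size += e.serializedSize();
        }
        return size;
    }

    // Splits entries into batches whose serialized size stays strictly below maxBytes.
    // Entries too large to fit even alone go to `rejected` instead of producing an
    // oversized (or empty) message.
    static List<List<Entry>> batch(List<Entry> entries, int maxBytes, List<Entry> rejected) {
        List<List<Entry>> batches = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        int size = HEADER_BYTES; // bytes the current batch would serialize to
        for (Entry e : entries) {
            int add = e.serializedSize();
            if (HEADER_BYTES + add >= maxBytes) {
                rejected.add(e);
                continue;
            }
            if (size + add >= maxBytes) { // adding e would reach the limit: close this batch
                batches.add(current);
                current = new ArrayList<>();
                size = HEADER_BYTES;
            }
            current.add(e);
            size += add;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Two other spots in serialize() may be worth checking while hunting for lost records: (byte) key.length is negative for keys longer than 127 bytes unless the reader masks it with & 0xFF, and (short) value.length overflows for values longer than 32767 bytes.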

I checked your code and it looks correct for your logic. You said it should always send less than 50K, but as written it can send up to exactly 50K. To keep it below 50K you have to change the if condition to if (totalSize + additionalLength >= 50000).
If your code still doesn't meet your requirement (i.e. it sends data when totalSize + additionalLength is greater than 50K), I can suggest a few things.
Since up to 15 threads call this method concurrently, you need to consider synchronizing two sections of your code.
The first is the shared container, dataHoldersByPartition. If multiple concurrent lookups happen on this container object, the outcome might not be correct. Check whether the container type is thread-safe (Guava's ImmutableMap, as declared in the question, is immutable and therefore safe for concurrent reads); if it is not, wrap the lookup like this:
ConcurrentLinkedQueue<DataHolder> dataHolders;
synchronized (this) {
    dataHolders = dataHoldersByPartition.get(partition);
}
Beyond that, I can give only two suggestions for fixing this issue. The first is that, instead of if (totalSize + additionalLength > 50000), you can check the serialized size of clientKeyBytesAndProcessBytesHolder against 50000 (Java has no sizeof operator; the calculation in your getBufferCapacity method gives exactly this size). The second is to narrow down whether the problem is a side effect of multithreading. These suggestions are meant to help you find exactly where the problem is; the fix itself will have to come from your end.
First, check whether your validateAndSend method exactly satisfies your requirement. To do that, synchronize the whole validateAndSend method and check whether everything works or you still get the same result. If you still get the same result, the problem is not multithreading; your code does not match the requirement. If it works, the problem is multithreading. If synchronizing the method fixes your issue but degrades performance, remove the synchronization from the method, then wrap each small block of code that might cause the issue in a synchronized block, one at a time, removing it again if it does not fix the issue. This way you eventually locate the block of code that is actually causing the problem, and you can leave that block synchronized as the fix.
For example, first attempt:
`private synchronized void validateAndSend(final Partition partition)`
Second attempt: remove the synchronized keyword from the method and do the following:
synchronized (this) {
    Message message = new Message(clientKeyBytesAndProcessBytesHolder, partition);
    sendToDatabase(message.getAddress(), message.serialize());
}
If I have misunderstood you, please let me know.

In validateAndSend I would put all the data on a queue and do the whole processing in a separate thread; please consider the command pattern. That way every thread just puts its load on the queue, and the consumer thread has all the data and information in place and can process it quite efficiently. The only complicated part is sending a response or result back to the calling thread; since that is not needed in your case, so much the better. This pattern has some other benefits too; take a look at Netflix Hystrix.
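A minimal sketch of that idea, assuming the question's validateAndSend work can run on one thread: a single-thread ExecutorService is a queue plus one consumer thread that runs commands in order, and each producer submits a command instead of doing the work itself. The `handled` list is a hypothetical stand-in for the real processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CommandQueueDemo {
    // Plain, non-thread-safe state: safe because only the consumer thread touches it.
    private final List<Integer> handled = new ArrayList<>();
    // A queue plus one consumer thread that runs the submitted commands in order.
    private final ExecutorService consumer = Executors.newSingleThreadExecutor();

    // Producer side: hand over a command instead of doing the work on the calling thread.
    void submit(int partition) {
        consumer.execute(() -> handled.add(partition)); // hypothetical stand-in for validateAndSend(partition)
    }

    // Runs after every previously submitted command; Future.get() publishes the copy safely.
    List<Integer> snapshot() throws Exception {
        return consumer.submit(() -> new ArrayList<>(handled)).get();
    }

    void shutdown() {
        consumer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        CommandQueueDemo demo = new CommandQueueDemo();
        List<Thread> producers = new ArrayList<>();
        for (int p = 0; p < 15; p++) { // 15 background threads, as in the question
            final int partition = p;
            Thread t = new Thread(() -> demo.submit(partition));
            producers.add(t);
            t.start();
        }
        for (Thread t : producers) {
            t.join();
        }
        System.out.println(demo.snapshot().size()); // 15
        demo.shutdown();
    }
}
```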

Read large file multithreaded

I am implementing a class that should receive a large text file. I want to split it into chunks, with each chunk handled by a different thread that counts the frequency of each character in that chunk. I expected that starting more threads would give better performance, but performance actually gets worse. Here's my code:
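import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
public class Main {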
public static void main(String[] args)
throws IOException, InterruptedException, ExecutionException, ParseException
{
// save the current run's start time
long startTime = System.currentTimeMillis();
// create options
Options options = new Options();
options.addOption("t", true, "number of threads to be start");
// variables to hold options
int numberOfThreads = 1;
// parse options
CommandLineParser parser = new DefaultParser();
CommandLine cmd;
cmd = parser.parse(options, args);
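String threadsNumber = cmd.getOptionValue("t", "1"); // default to 1 thread if -t is omitted
numberOfThreads = Integer.parseInt(threadsNumber);
// read file (the first non-option argument; args[0] may be "-t" itself)
RandomAccessFile raf = new RandomAccessFile(cmd.getArgs()[0], "r");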
MappedByteBuffer mbb
= raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
Set<Future<int[]>> set = new HashSet<Future<int[]>>();
long chunkSize = raf.length() / numberOfThreads;
byte[] buffer = new byte[(int) chunkSize];
while(mbb.hasRemaining())
{
int remaining = buffer.length;
if(mbb.remaining() < remaining)
{
remaining = mbb.remaining();
}
mbb.get(buffer, 0, remaining);
// on the last chunk only the first `remaining` bytes are new; the rest is left over from the previous chunk
String content = new String(buffer, 0, remaining, "ISO-8859-1");
Callable<int[]> callable = new FrequenciesCounter(content);
Future<int[]> future = pool.submit(callable);
set.add(future);
}
pool.shutdown(); // no more tasks: lets the pool threads (and the JVM) exit once the work is done
raf.close();
// let's assume we will use extended ASCII characters only
int alphabet = 256;
// hold how many times each character is contained in the input file
int[] frequencies = new int[alphabet];
// sum the frequencies from each thread
for(Future<int[]> future: set)
{
for(int i = 0; i < alphabet; i++)
{
frequencies[i] += future.get()[i];
}
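}
// report the run time measured from startTime
System.out.println("Elapsed: " + (System.currentTimeMillis() - startTime) + " ms");
}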
}
// helper class for multithreaded frequency counting
class FrequenciesCounter implements Callable<int[]>
{
private int[] frequencies = new int[256];
private char[] content;
public FrequenciesCounter(String input)
{
content = input.toCharArray();
}
public int[] call()
{
System.out.println("Thread " + Thread.currentThread().getName() + " started");
for(int i = 0; i < content.length; i++)
{
frequencies[(int)content[i]]++;
}
System.out.println("Thread " + Thread.currentThread().getName() + " finished");
return frequencies;
}
}

As suggested in the comments, you (usually) will not get better performance by reading from multiple threads. Rather, you should process the chunks you have read on multiple threads. Processing usually involves some blocking I/O operations (saving to another file? saving to a database? an HTTP call?), and your performance will improve if you process on multiple threads.
For processing, you can use an ExecutorService (with a sensible number of threads). Use java.util.concurrent.Executors to obtain an instance of java.util.concurrent.ExecutorService.
With an ExecutorService instance, you can submit your chunks for processing. Submitting chunks does not block. The ExecutorService will process each chunk on a separate thread (the details depend on the configuration of the ExecutorService). You can submit instances of Runnable or Callable.
Finally, after you have submitted all items, call shutdown() on your ExecutorService so that it accepts no new tasks, and then call awaitTermination, which waits until processing of all submitted items has finished or the timeout expires (awaitTermination only waits for completion after a shutdown request). If awaitTermination returns false, call shutdownNow() to abort processing (otherwise it may hang indefinitely, processing some rogue task).
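A sketch of that lifecycle applied to the question's character counting, assuming the data is already in memory as a byte[] (the hypothetical `countParallel` splits it into one chunk per thread). It submits Callables, sums the partial arrays via Future.get(), and then calls shutdown(), awaitTermination() and, only if that times out, shutdownNow().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ChunkedFrequencies {
    // Counts byte values 0-255 in data[from, to).
    static int[] count(byte[] data, int from, int to) {
        int[] freq = new int[256];
        for (int i = from; i < to; i++) {
            freq[data[i] & 0xFF]++; // & 0xFF because Java bytes are signed
        }
        return freq;
    }

    static int[] countParallel(byte[] data, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            int chunk = Math.max(1, (data.length + threads - 1) / threads); // ceiling: no leftover bytes
            for (int from = 0; from < data.length; from += chunk) {
                int start = from;
                int end = Math.min(from + chunk, data.length);
                futures.add(pool.submit(() -> count(data, start, end))); // submit() does not block
            }
            int[] total = new int[256];
            for (Future<int[]> f : futures) {
                int[] part = f.get(); // waits for this chunk's result
                for (int i = 0; i < 256; i++) {
                    total[i] += part[i];
                }
            }
            return total;
        } finally {
            pool.shutdown(); // accept no new tasks
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow(); // give up on anything still running
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abracadabra".getBytes(StandardCharsets.ISO_8859_1);
        int[] freq = countParallel(data, 3);
        System.out.println("a=" + freq['a'] + " b=" + freq['b']); // a=5 b=2
    }
}
```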

Your program is almost certainly limited by the speed of reading from disk. Using multiple threads does not help with this since the limit is a hardware limit on how fast the information can be transferred from disk.
In addition, the use of both RandomAccessFile and a subsequent buffer likely results in a small slowdown, since you are moving the data in memory after reading it in but before processing, rather than just processing it in place. You would be better off not using an intermediate buffer.
You might get a slight speedup by reading from the file directly into the final buffers and dispatching those buffers to be processed by threads as they are filled, rather than waiting for the entire file to be read before processing. However, most of the time would still be used by the disk read, so any speedup would likely be minimal.
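One way to avoid the intermediate byte[], String and char[] copies is to count directly from the mapped buffer. A sketch, assuming the file is under 2 GB so a single mapping covers it; the class name is hypothetical. Each task gets its own duplicate() of the MappedByteBuffer, because buffers themselves are not safe for use by multiple concurrent threads. As the answer says, disk speed will likely still dominate; this only removes the in-memory copying.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MappedFrequencies {
    // Counts byte values 0-255 straight from a memory-mapped file.
    // Assumes the file is smaller than 2 GB (one mapping, int offsets).
    static int[] countFile(Path path, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            int size = (int) channel.size();
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int chunk = Math.max(1, (size + threads - 1) / threads);
            List<Future<int[]>> parts = new ArrayList<>();
            for (int from = 0; from < size; from += chunk) {
                // duplicate() shares the mapped memory but has its own position and limit
                ByteBuffer slice = mapped.duplicate();
                slice.limit(Math.min(from + chunk, size));
                slice.position(from);
                parts.add(pool.submit(() -> {
                    int[] freq = new int[256];
                    while (slice.hasRemaining()) {
                        freq[slice.get() & 0xFF]++;
                    }
                    return freq;
                }));
            }
            int[] total = new int[256];
            for (Future<int[]> f : parts) {
                int[] part = f.get();
                for (int i = 0; i < 256; i++) {
                    total[i] += part[i];
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] freq = countFile(Paths.get(args[0]), Integer.parseInt(args[1]));
        System.out.println("newlines: " + freq['\n']);
    }
}
```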

Multithreading a massive file read

I'm still in the process of wrapping my brain around how concurrency works in Java. I understand that (if you're subscribing to the OO Java 5 concurrency model) you implement a Task or Callable with a run() or call() method (respectively), and it behooves you to parallelize as much of that implemented method as possible.
But I'm still not understanding something inherent about concurrent programming in Java:
How is a Task's run() method assigned the right amount of concurrent work to be performed?
As a concrete example, what if I have an I/O-bound readMobyDick() method that reads the entire contents of Herman Melville's Moby Dick into memory from a file on the local system. And let's just say I want this readMobyDick() method to be concurrent and handled by 3 threads, where:
Thread #1 reads the first 1/3rd of the book into memory
Thread #2 reads the second 1/3rd of the book into memory
Thread #3 reads the last 1/3rd of the book into memory
Do I need to chunk Moby Dick up into three files and pass each to its own task, or do I just call readMobyDick() from inside the implemented run() method and (somehow) the Executor knows how to break the work up amongst the threads?
I am a very visual learner, so any code examples of the right way to approach this are greatly appreciated! Thanks!

You have probably chosen by accident the absolute worst example of parallel activities!
Reading in parallel from a single mechanical disk is actually slower than reading with a single thread, because you are in fact bouncing the mechanical head to different sections of the disk as each thread gets its turn to run. This is best left as a single threaded activity.
Let's take another example, which is similar to yours but can actually offer some benefit: assume I want to search for the occurrences of a certain word in a huge list of words (this list could even have come from a disk file, but like I said, read by a single thread). Assume I can use 3 threads like in your example, each searching on 1/3rd of the huge word list and keeping a local counter of how many times the searched word appeared.
In this case you'd want to partition the list in 3 parts, pass each part to a different object whose type implements Runnable and have the search implemented in the run method.
The runtime itself has no idea how to do the partitioning or anything like that, you have to specify it yourself. There are many other partitioning strategies, each with its own strengths and weaknesses, but we can stick to the static partitioning for now.
Let's see some code:
class SearchTask implements Runnable {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public void run() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
}
public int getCounter() { return localCounter; }
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
SearchTask task1 = new SearchTask(0, 10000, words, "John");
SearchTask task2 = new SearchTask(10000, 20000, words, "John");
SearchTask task3 = new SearchTask(20000, 30000, words, "John");
// create threads for each task
Thread t1 = new Thread(task1);
Thread t2 = new Thread(task2);
Thread t3 = new Thread(task3);
// start threads
t1.start();
t2.start();
t3.start();
// wait for threads to finish
t1.join();
t2.join();
t3.join();
// collect results
int counter = 0;
counter += task1.getCounter();
counter += task2.getCounter();
counter += task3.getCounter();
This should work nicely. Note that in practical cases you would build a more generic partitioning scheme. You could alternatively use an ExecutorService and implement Callable instead of Runnable if you wish to return a result.
So an alternative example using more advanced constructs:
class SearchTask implements Callable<Integer> {
private int localCounter = 0;
private int start; // start index of search
private int end;
private List<String> words;
private String token;
public SearchTask(int start, int end, List<String> words, String token) {
this.start = start;
this.end = end;
this.words = words;
this.token = token;
}
public Integer call() {
for(int i = start; i < end; i++) {
if(words.get(i).equals(token)) localCounter++;
}
return localCounter;
}
}
// meanwhile in main :)
List<String> words = new ArrayList<String>();
// populate words
// let's assume you have 30000 words
// create tasks
List<Callable<Integer>> tasks = new ArrayList<>();
tasks.add(new SearchTask(0, 10000, words, "John"));
tasks.add(new SearchTask(10000, 20000, words, "John"));
tasks.add(new SearchTask(20000, 30000, words, "John"));
// create thread pool and start tasks
ExecutorService exec = Executors.newFixedThreadPool(3);
List<Future<Integer>> results = exec.invokeAll(tasks);
// invokeAll waits for all tasks to finish; now collect the results
int counter = 0;
for (Future<Integer> f : results) {
    counter += f.get();
}
exec.shutdown(); // let the pool threads exit
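As a sketch of the "more generic partitioning scheme" mentioned above, the hypothetical `ranges` helper splits any list size into contiguous ranges whose lengths differ by at most one, so the word count no longer has to be a multiple of the thread count:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedSearch {
    // Splits [0, n) into `parts` contiguous ranges whose sizes differ by at most one,
    // e.g. n = 10, parts = 3 gives [0,4) [4,7) [7,10).
    static int[][] ranges(int n, int parts) {
        int[][] result = new int[parts][2];
        int base = n / parts;
        int extra = n % parts; // the first `extra` ranges get one more item
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int length = base + (i < extra ? 1 : 0);
            result[i][0] = start;
            result[i][1] = start + length;
            start += length;
        }
        return result;
    }

    static int count(List<String> words, String token, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int[] range : ranges(words.size(), threads)) {
                tasks.add(() -> {
                    int localCounter = 0; // per-task counter: nothing shared between tasks
                    for (int i = range[0]; i < range[1]; i++) {
                        if (words.get(i).equals(token)) localCounter++;
                    }
                    return localCounter;
                });
            }
            int counter = 0;
            for (Future<Integer> f : exec.invokeAll(tasks)) {
                counter += f.get();
            }
            return counter;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < 30000; i++) {
            words.add(i % 7 == 0 ? "John" : "word" + i);
        }
        System.out.println(count(words, "John", 3)); // 4286
    }
}
```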

You picked a bad example, as Tudor was so kind to point out. Spinning disk hardware is subject to physical constraints of moving platters and heads, and the most efficient read implementation is to read each block in order, which reduces the need to move the head or wait for the disk to align.
That said, some operating systems don't always store files contiguously on disk, and for those who remember, defragmentation could provide a disk performance boost if your OS or filesystem didn't do the job for you.
As you mentioned wanting a program that would benefit, let me suggest a simple one, matrix addition.
Assuming you made one thread per core, you can trivially divide the rows of the two matrices being added among the N threads. Matrix addition (if you recall) works as follows:
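$$A + B = C, \qquad c_{ij} = a_{ij} + b_{ij}$$
$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
+
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \\ a_{31}+b_{31} & a_{32}+b_{32} & a_{33}+b_{33} \end{bmatrix}
$$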
So to distribute this across N threads, we simply take each row index modulo the number of threads to get the "thread id" of the thread that will add that row.
matrix with 20 rows across 3 threads
row % 3 == 0 (for rows 0, 3, 6, 9, 12, 15, and 18)
row % 3 == 1 (for rows 1, 4, 7, 10, 13, 16, and 19)
row % 3 == 2 (for rows 2, 5, 8, 11, 14, and 17)
// row 20 doesn't exist, because we number rows from 0
Now each thread "knows" which rows it should handle, and the results for each row can be computed trivially because they do not cross into another thread's domain of computation.
All that is needed now is a "result" data structure which tracks when the values have been computed, and when last value is set, then the computation is complete. In this "fake" example of a matrix addition result with two threads, computing the answer with two threads takes approximately half the time.
// the following assumes that threads don't get rescheduled to different cores for
// illustrative purposes only. Real Threads are scheduled across cores due to
// availability and attempts to prevent unnecessary core migration of a running thread.
[ done, done, done ] // filled in at about the same time as row 2 (runs on core 3)
[ done, done, done ] // filled in at about the same time as row 1 (runs on core 1)
[ done, done, .... ] // filled in at about the same time as row 4 (runs on core 3)
[ done, ...., .... ] // filled in at about the same time as row 3 (runs on core 1)
More complex problems can be solved by multithreading, and different problems are solved with different techniques. I purposefully picked one of the simplest examples.
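The row % N assignment described above can be sketched in Java as follows (a hypothetical `RowPartitionedAdd`, assuming equally sized rectangular int matrices). Each thread writes only its own rows of the result, so no locking is needed, and join() makes every row visible to the caller.

```java
import java.util.Arrays;

class RowPartitionedAdd {
    // Adds two equally sized matrices using `threads` threads; thread `id`
    // handles every row r with r % threads == id.
    static int[][] add(int[][] a, int[][] b, int threads) throws InterruptedException {
        int rows = a.length;
        int[][] c = new int[rows][];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int r = id; r < rows; r += threads) {
                    int[] row = new int[a[r].length];
                    for (int j = 0; j < row.length; j++) {
                        row[j] = a[r][j] + b[r][j];
                    }
                    c[r] = row; // only this thread ever writes row r
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // waits for the thread and makes its writes visible here
        }
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        int[][] b = {{10, 10, 10}, {10, 10, 10}, {10, 10, 10}};
        System.out.println(Arrays.deepToString(add(a, b, 2))); // [[11, 12, 13], [14, 15, 16], [17, 18, 19]]
    }
}
```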

you implement a Task or Callable with a run() or call() method (respectively), and it behooves you to parallelize as much of that implemented method as possible.
A Task represents a discrete unit of work
Loading a file into memory is a discrete unit of work and can therefore be delegated to a background thread, i.e. a background thread runs this task of loading the file.
It is a discrete unit of work since it needs no other dependencies to do its job (load the file) and has clear boundaries.
What you are asking is to divide this further into tasks, i.e. one thread loads the first third of the file, another thread the second third, and so on.
If you were able to divide the task into further subtasks, then by definition it would not have been a task in the first place. So loading a file is a single task by itself.
To give you an example:
Let's say that you have a GUI and you need to present to the user data from 5 different files. To present them you need also to prepare some data structures to process the actual data.
All these are separate tasks.
E.g. the loading of files is 5 different tasks so could be done by 5 different threads.
The preparation of the data structures could be done a different thread.
The GUI, of course, runs in yet another thread.
All of these can happen concurrently.
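To illustrate the "one file, one task" view, here is a sketch that loads several files concurrently with invokeAll, one Callable per file. The class name is hypothetical and the files are assumed to be UTF-8 text; each individual file is still read by a single thread.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LoadFilesConcurrently {
    // One Callable per file: each file load is a discrete unit of work.
    static List<String> loadAll(List<Path> files) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, files.size()));
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
            }
            List<String> contents = new ArrayList<>();
            // invokeAll waits for every task; futures come back in the same order as `tasks`
            for (Future<String> f : pool.invokeAll(tasks)) {
                contents.add(f.get());
            }
            return contents;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        List<String> contents = loadAll(files);
        for (int i = 0; i < files.size(); i++) {
            System.out.println(files.get(i) + ": " + contents.get(i).length() + " chars");
        }
    }
}
```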

If your system supports high-throughput I/O, here is how you can do it:
How to read a file using multiple threads in Java when a high throughput (3GB/s) file system is available
Here is the solution to read a single file with multiple threads.
Divide the file into N chunks, read each chunk in a thread, then merge them in order. Beware of lines that cross chunk boundaries. This is the basic idea suggested by user slaks.
Benchmark of the implementation below reading a single 20 GB file:
1 Thread : 50 seconds : 400 MB/s
2 Threads: 30 seconds : 666 MB/s
4 Threads: 20 seconds : 1GB/s
8 Threads: 60 seconds : 333 MB/s
Equivalent Java7 readAllLines() : 400 seconds : 50 MB/s
Note: This may only work on systems that are designed to support high-throughput I/O, and not on typical personal computers.
Here are the essential bits of the code; for complete details, follow the link.
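import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import static java.lang.Math.toIntExact;
public class FileRead implements Runnable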
{
private FileChannel _channel;
private long _startLocation;
private int _size;
int _sequence_number;
public FileRead(long loc, int size, FileChannel chnl, int sequence)
{
_startLocation = loc;
_size = size;
_channel = chnl;
_sequence_number = sequence;
}
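@Override
public void run()
{
System.out.println("Reading the channel: " + _startLocation + ":" + _size);
//allocate memory
ByteBuffer buff = ByteBuffer.allocate(_size);
try
{
//Read file chunk to RAM; a positional read may return fewer bytes than requested, so loop until the buffer is full
while (buff.hasRemaining())
{
if (_channel.read(buff, _startLocation + buff.position()) < 0)
{
break; //end of file
}
}
}
catch (IOException e)
{
//run() cannot throw checked exceptions, so rethrow unchecked
throw new UncheckedIOException(e);
}
//chunk to String (a multi-byte UTF-8 character may be split at a chunk boundary)
String string_chunk = new String(buff.array(), 0, buff.position(), Charset.forName("UTF-8"));
System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);
}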
//args[0] is path to read file
//args[1] is the size of the thread pool; try different values to find the sweet spot
public static void main(String[] args) throws Exception
{
FileInputStream fileInputStream = new FileInputStream(args[0]);
FileChannel channel = fileInputStream.getChannel();
long remaining_size = channel.size(); //get the total number of bytes in the file
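//file_size/threads, at least 1 byte so the loop below always terminates;
//each chunk must fit in an int (< 2 GB) because toIntExact and ByteBuffer.allocate work with int sizes
long chunk_size = Math.max(1, remaining_size / Integer.parseInt(args[1]));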
//thread pool
ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(args[1]));
long start_loc = 0;//file pointer
int i = 0; //loop counter
while (remaining_size >= chunk_size)
{
//launches a new thread
executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i));
remaining_size = remaining_size - chunk_size;
start_loc = start_loc + chunk_size;
i++;
}
//load the last remaining piece
executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i));
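//Tear Down: wait for every read to finish, then close the file
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);
fileInputStream.close();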
}
}
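The snippet above reads the chunks but does not show the "merge them in order" step. A sketch of one way to do it, assuming each chunk fits in an int-sized ByteBuffer; the class name is hypothetical. Each Callable does positional FileChannel reads (file channels are safe for use by multiple concurrent threads), and invokeAll returns the futures in task order, so concatenating their results rebuilds the file. Because the merge works on raw bytes, a line or multi-byte character split across a chunk boundary is rejoined before any decoding happens.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedChunkRead {
    // Reads `path` as `chunks` pieces on `threads` pool threads and returns the bytes
    // re-assembled in file order. Assumes each chunk is smaller than 2 GB.
    static byte[] read(Path path, int chunks, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            long chunkSize = Math.max(1, (size + chunks - 1) / chunks);
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (long start = 0; start < size; start += chunkSize) {
                long from = start;
                int length = (int) Math.min(chunkSize, size - start);
                tasks.add(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(length);
                    // positional reads do not move a shared file pointer; they may return
                    // fewer bytes than requested, so keep reading until the buffer is full
                    while (buf.hasRemaining()) {
                        if (channel.read(buf, from + buf.position()) < 0) {
                            break; // file got shorter than expected
                        }
                    }
                    return buf.array();
                });
            }
            ByteArrayOutputStream merged = new ByteArrayOutputStream();
            for (Future<byte[]> f : pool.invokeAll(tasks)) { // same order as `tasks`
                merged.write(f.get());
            }
            return merged.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = read(Paths.get(args[0]), Integer.parseInt(args[1]), Integer.parseInt(args[2]));
        System.out.println("read " + data.length + " bytes");
    }
}
```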

