ExecutorService with huge number of tasks - java

I have a list of files and a list of analyzers that analyze those files. The number of files can be large (200,000) and the number of analyzers too (1,000), so the total number of operations can be really large (200,000,000). Now I need to apply multithreading to speed things up. I followed this approach:
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (File file : listOfFiles) {
    for (Analyzer analyzer : listOfAnalyzers) {
        executor.execute(() -> {
            boolean exists = file.exists();
            if (exists) {
                analyzer.analyze(file);
            }
        });
    }
}
executor.shutdown();
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
But the problem with this approach is that it takes too much memory, and I guess there is a better way to do it. I'm still a beginner at Java and multithreading.

Where are 200M tasks going to reside? Not in memory, I hope, unless you plan to implement your solution in a distributed fashion. In the meantime, you need to instantiate an ExecutorService that does not accumulate a massive queue. Use the "caller runs policy" (ThreadPoolExecutor.CallerRunsPolicy) when you create the service. If you try to put another task in the queue when it's already full, you'll end up executing it yourself, which is probably what you want.
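For example, a minimal sketch of such a bounded executor (the queue capacity of 1,000 is an arbitrary illustration):
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService executor = new ThreadPoolExecutor(
        threads, threads,
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(1000),             // bounded queue instead of an unbounded one
        new ThreadPoolExecutor.CallerRunsPolicy()); // a full queue makes the submitting thread run the task itself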
OTOH, now that I look at your question more conscientiously, why not analyze a single file concurrently? Then the queue is never larger than the number of analyzers. That's what I'd do, frankly, since I'd like a readable log that has a message for each file as I load it, in the correct order.
I apologize for not being more helpful:
List<Future<?>> futures = analyzers.stream()
        .map(analyzer -> executor.submit(() -> analyzer.analyze(file)))
        .collect(Collectors.toList());
for (Future<?> future : futures) {
    future.get(); // wait for every analyzer to finish with this file
}
Basically, create a bunch of futures for a single file, then wait for all of them before you move on.

One idea is to employ the fork/join framework and group items (files) into batches in order to process them individually.
My suggestion is the following:
Firstly, filter out all files that do not exist - they occupy resources unnecessarily.
The following pseudo-code demonstrates the algorithm that might help you out:
public static class CustomRecursiveTask extends RecursiveTask<Integer> {

    private final Analyzer[] analyzers;
    private final int threshold;
    private final File[] files;
    private final int start;
    private final int end;

    public CustomRecursiveTask(Analyzer[] analyzers,
                               final int threshold,
                               File[] files,
                               int start,
                               int end) {
        this.analyzers = analyzers;
        this.threshold = threshold;
        this.files = files;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        final int filesProcessed = end - start;
        if (filesProcessed < threshold) {
            return processSequentially();
        } else {
            final int middle = (start + end) / 2;
            final ForkJoinTask<Integer> left =
                    new CustomRecursiveTask(analyzers, threshold, files, start, middle);
            // the ranges are half-open, so the right half starts at middle, not middle + 1
            final ForkJoinTask<Integer> right =
                    new CustomRecursiveTask(analyzers, threshold, files, middle, end);
            left.fork();
            right.fork();
            return left.join() + right.join();
        }
    }

    private Integer processSequentially() {
        for (int i = start; i < end; i++) {
            File file = files[i];
            for (Analyzer analyzer : analyzers) {
                analyzer.analyze(file);
            }
        }
        return end - start; // number of files processed by this leaf task
    }
}
And the usage looks like this:
public static void main(String[] args) {
    final Analyzer[] analyzers = new Analyzer[]{};
    final File[] files = new File[]{};
    // keep the threshold at least 1 so that small inputs still reach the sequential branch
    final int threshold = Math.max(1, files.length / 5);
    // invoke() blocks until the whole task tree is done, unlike execute()
    final int processed = ForkJoinPool.commonPool().invoke(
            new CustomRecursiveTask(
                    analyzers,
                    threshold,
                    files,
                    0,
                    files.length
            )
    );
    System.out.println("Processed " + processed + " files");
}
Notice that depending on your constraints you can manipulate the task's constructor arguments so that the algorithm adjusts to the number of files.
You could specify different thresholds, let's say, depending on the number of files:
final int threshold;
if (files.length > 100_000) {
    threshold = files.length / 4;
} else {
    threshold = files.length / 8;
}
You could also specify the number of worker threads in the ForkJoinPool depending on the input size.
Measure, adjust, modify, you will solve the problem eventually.
Hope that helps.
UPDATE:
If the result of the analysis is of no interest, you could replace the RecursiveTask with a RecursiveAction; the Integer result in the pseudo-code above adds auto-boxing overhead in between.
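A minimal sketch of that variant (same splitting logic as above, no return value and therefore no boxing):
public static class CustomRecursiveAction extends RecursiveAction {

    private final Analyzer[] analyzers;
    private final int threshold;
    private final File[] files;
    private final int start;
    private final int end;

    public CustomRecursiveAction(Analyzer[] analyzers, int threshold,
                                 File[] files, int start, int end) {
        this.analyzers = analyzers;
        this.threshold = threshold;
        this.files = files;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start < threshold) {
            // small enough: analyze this slice directly
            for (int i = start; i < end; i++) {
                for (Analyzer analyzer : analyzers) {
                    analyzer.analyze(files[i]);
                }
            }
        } else {
            final int middle = (start + end) / 2;
            // invokeAll forks the subtasks and waits for both to complete
            invokeAll(new CustomRecursiveAction(analyzers, threshold, files, start, middle),
                      new CustomRecursiveAction(analyzers, threshold, files, middle, end));
        }
    }
}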

Related

Compartmentalizing loops over a large iteration

The goal of my question is to enhance the performance of my algorithm by splitting the range of my loop iterations over a large array list.
For example: I have an ArrayList with about 10 billion long values. The goal I am trying to achieve is to start the loop from 0 to 100 million entries and output the result for those 100 million entries of whatever calculations are inside the loop; then begin at 100 million to 200 million, doing the same and outputting the result; then 200-300 million, 300-400 million, and so on and so forth.
After I get all of the per-100-million results, I can then sum them up outside of the loop, collecting the results from the loop outputs in parallel.
I have tried to use a range that might achieve something similar with a dynamic range-shift method, but I can't seem to get the logic fully implemented like I would like to.
public static void tt4() {
    long essir2 = 0;
    long essir3 = 0;
    List<Long> cc = new ArrayList<>();
    List<Long> range = new ArrayList<>();
    // Breakpoint is a method that returns list values; they were converted to
    // String because of some concatenations and are converted back to long here
    for (String ari1 : Breakpoint()) {
        cc.add(Long.valueOf(ari1));
    }
    // the size of the list is huge, about 1 trillion entries at the minimum
    long hy = cc.size() - 1;
    for (long k = 0; k < hy; k++) {
        long t1 = cc.get((int) k);
        long t2 = cc.get((int) (k + 1));
        // My main question: I am trying to iterate the entire list in a dynamic way
        // which would exclude repeated endpoints on each iteration.
        range = LongStream.rangeClosed(t1 + 1, t2)
                .boxed()
                .collect(Collectors.toList());
        for (long i : range) {
            // Hard is another method called in the iteration;
            // complexcalc is a method as well
            essir2 = complexcalc((int) i, (int) Hard(i));
            essir3 += essir2;
        }
    }
    System.out.println("\n" + essir3);
}
I don't have any errors; I am just looking for a way to improve performance and running time. I can do a million entries in under a second directly, but when I use the size I require it runs forever. The sizes I'm giving are abstractions to illustrate orders of magnitude; I don't want opinions like "100 billion is not much". If I can do a million in under a second, I'm talking about massively huge numbers I need to iterate over doing complex tasks and calls. I just need help with the logic I'm trying to achieve, if I can.
One thing I would suggest right off the bat would be to store your Breakpoint return value inside a simple array rather than using a List. This should improve your execution time significantly:
List<Long> cc = new ArrayList<>();
for (String ari1 : Breakpoint()) {
    cc.add(Long.valueOf(ari1));
}
Long[] ccArray = cc.toArray(new Long[0]);
I believe what you're looking for is to split your tasks across multiple threads. You can do this with an ExecutorService, "which simplifies the execution of tasks in asynchronous mode".
Note that I am not overly familiar with this whole concept but have experimented with it a bit recently and give you a quick draft of how you could implement this.
I welcome those more experienced with multi-threading to either correct this post or provide additional information in the comments to help improve this answer.
Runnable Task class
public class CompartmentalizationTask implements Runnable {

    private final ArrayList<Long> cc;
    private final long index;

    public CompartmentalizationTask(ArrayList<Long> list, long index) {
        this.cc = list;
        this.index = index;
    }

    @Override
    public void run() {
        Main.compartmentalize(cc, index);
    }
}
Main class
private static ExecutorService exeService = Executors.newCachedThreadPool();
private static List<Future<?>> futureTasks = new ArrayList<>();
// accumulates results across tasks; AtomicLong makes the concurrent additions safe
private static final AtomicLong essir3 = new AtomicLong();

public static void tt4() throws ExecutionException, InterruptedException
{
    ArrayList<Long> cc = new ArrayList<>();
    // Breakpoint is a method that returns list values; they were converted to
    // String because of some concatenations and are converted back to long here
    for (String ari1 : Breakpoint()) {
        cc.add(Long.valueOf(ari1));
    }
    // the size of the list is huge, about 1 trillion entries at the minimum
    long hy = cc.size() - 1;
    for (long k = 0; k < hy; k++) {
        futureTasks.add(exeService.submit(new CompartmentalizationTask(cc, k)));
    }
    for (Future<?> futureTask : futureTasks) {
        futureTask.get(); // wait for every task to finish
    }
    exeService.shutdown();
    System.out.println("\n" + essir3.get());
}

public static void compartmentalize(ArrayList<Long> cc, long index)
{
    long t1 = cc.get((int) index);
    long t2 = cc.get((int) (index + 1));
    // iterate the range while excluding the repeated endpoint of each interval
    List<Long> range = LongStream.rangeClosed(t1 + 1, t2)
            .boxed()
            .collect(Collectors.toList());
    long sum = 0;
    for (long i : range) {
        // Hard and complexcalc are the question's methods
        sum += complexcalc((int) i, (int) Hard(i));
    }
    essir3.addAndGet(sum);
}

Read large file multithreaded

I am implementing a class that should receive a large text file. I want to split it into chunks, and each chunk should be held by a different thread that will count the frequency of each character in that chunk. I expected that starting more threads would give better performance, but it turns out performance is getting poorer. Here's my code:
public class Main {

    public static void main(String[] args)
            throws IOException, InterruptedException, ExecutionException, ParseException
    {
        // save the current run's start time
        long startTime = System.currentTimeMillis();
        // create options
        Options options = new Options();
        options.addOption("t", true, "number of threads to be started");
        // variables to hold options
        int numberOfThreads = 1;
        // parse options
        CommandLineParser parser = new DefaultParser();
        CommandLine cmd;
        cmd = parser.parse(options, args);
        String threadsNumber = cmd.getOptionValue("t");
        numberOfThreads = Integer.parseInt(threadsNumber);
        // read file
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        MappedByteBuffer mbb
                = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        Set<Future<int[]>> set = new HashSet<Future<int[]>>();
        long chunkSize = raf.length() / numberOfThreads;
        byte[] buffer = new byte[(int) chunkSize];
        while (mbb.hasRemaining())
        {
            int remaining = buffer.length;
            if (mbb.remaining() < remaining)
            {
                remaining = mbb.remaining();
            }
            mbb.get(buffer, 0, remaining);
            String content = new String(buffer, "ISO-8859-1");
            Callable<int[]> callable = new FrequenciesCounter(content);
            Future<int[]> future = pool.submit(callable);
            set.add(future);
        }
        raf.close();
        // let's assume we will use extended ASCII characters only
        int alphabet = 256;
        // hold how many times each character is contained in the input file
        int[] frequencies = new int[alphabet];
        // sum the frequencies from each thread
        for (Future<int[]> future : set)
        {
            for (int i = 0; i < alphabet; i++)
            {
                frequencies[i] += future.get()[i];
            }
        }
        pool.shutdown();
    }
}
// helper class for multithreaded frequency counting
class FrequenciesCounter implements Callable<int[]>
{
    private int[] frequencies = new int[256];
    private char[] content;

    public FrequenciesCounter(String input)
    {
        content = input.toCharArray();
    }

    public int[] call()
    {
        System.out.println("Thread " + Thread.currentThread().getName() + " started");
        for (int i = 0; i < content.length; i++)
        {
            frequencies[content[i]]++;
        }
        System.out.println("Thread " + Thread.currentThread().getName() + " finished");
        return frequencies;
    }
}
As suggested in the comments, you will usually not get better performance by reading from multiple threads. Rather, you should process the chunks you have read on multiple threads. Usually processing involves blocking I/O operations (saving to another file? saving to a database? an HTTP call?) and your performance will get better if you process on multiple threads.
For processing, you can use an ExecutorService (with a sensible number of threads); use java.util.concurrent.Executors to obtain an instance of java.util.concurrent.ExecutorService.
With the ExecutorService instance, you can submit your chunks for processing. Submitting chunks does not block; the ExecutorService will start to process each chunk on a separate thread (the details depend on the configuration of the ExecutorService). You can submit instances of Runnable or Callable.
Finally, after you have submitted all items, you should call shutdown() and then awaitTermination on your ExecutorService; awaitTermination waits until the processing of all submitted items is finished. If it times out, you can call shutdownNow() to abort the remaining work (otherwise a rogue task may keep the processing hanging indefinitely).
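A minimal sketch of that structure, assuming a process(String) method of your own that does the actual work (the file name, pool size, and timeout here are illustrative):
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
    String chunk;
    while ((chunk = reader.readLine()) != null) {
        final String c = chunk;
        pool.submit(() -> process(c)); // reading stays on this thread, processing moves to the pool
    }
}
pool.shutdown();
if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
    pool.shutdownNow(); // abort whatever is still running after the timeout
}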
Your program is almost certainly limited by the speed of reading from disk. Using multiple threads does not help with this since the limit is a hardware limit on how fast the information can be transferred from disk.
In addition, the use of both RandomAccessFile and a subsequent buffer likely results in a small slowdown, since you are moving the data in memory after reading it in but before processing, rather than just processing it in place. You would be better off not using an intermediate buffer.
You might get a slight speedup by reading from the file directly into the final buffers and dispatching those buffers to be processed by threads as they are filled, rather than waiting for the entire file to be read before processing. However, most of the time would still be used by the disk read, so any speedup would likely be minimal.
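If you want to experiment with that, here is a rough sketch, assuming a countFrequencies(ByteBuffer) worker of your own (the chunk size and names are illustrative):
final int chunkSize = 1 << 20; // 1 MiB per chunk, an arbitrary choice
FileChannel channel = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ);
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
List<Future<int[]>> results = new ArrayList<>();
ByteBuffer buf = ByteBuffer.allocate(chunkSize);
while (channel.read(buf) > 0) {
    buf.flip();
    final ByteBuffer filled = buf;                        // dispatch the filled buffer as soon as it's ready
    results.add(pool.submit(() -> countFrequencies(filled)));
    buf = ByteBuffer.allocate(chunkSize);                 // fresh buffer for the next read
}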

(Thread pools in Java) Increasing number of threads creates slow down for simple for loop. Why?

I've got a little bit of work that is easily parallelizable, and I want to use Java threads to split up the work across my four-core machine. It's a genetic algorithm applied to the traveling salesman problem. It doesn't sound easily parallelizable, but the first loop very easily is. The second part, where I talk about the actual evolution, may or may not be, but I want to know if I'm getting slowdown because of the way I'm implementing threading, or if it's the algorithm itself.
Also, if anyone has better ideas on how I should be implementing what I'm trying to do, that would be very much appreciated.
In main(), I have this:
final ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(numThreads*numIter);
ThreadPoolExecutor tpool = new ThreadPoolExecutor(numThreads, numThreads, 10, TimeUnit.SECONDS, queue);
barrier = new CyclicBarrier(numThreads);
k.init(tpool);
I have a loop that is done inside of init() and looks like this:
for (int i = 0; i < numCities; i++) {
    x[i] = rand.nextInt(width);
    y[i] = rand.nextInt(height);
}
That I changed to this:
int errorCities = 0, stepCities = 0;
stepCities = numCities/numThreads;
errorCities = numCities - stepCities*numThreads;
// Split up work, assign to threads
for (int i = 1; i <= numThreads; i++) {
    int startCities = (i-1)*stepCities;
    int endCities = startCities + stepCities;
    // This is a bit messy: give the last thread the leftover cities
    // (i <= numThreads would be true for every thread, so test i == numThreads)
    if (i == numThreads) endCities += errorCities;
    tpool.execute(new citySetupThread(startCities, endCities));
}
And here is the citySetupThread class:
public class citySetupThread implements Runnable {

    int start, end;

    public citySetupThread(int s, int e) {
        start = s;
        end = e;
    }

    public void run() {
        for (int j = start; j < end; j++) {
            x[j] = ThreadLocalRandom.current().nextInt(0, width);
            y[j] = ThreadLocalRandom.current().nextInt(0, height);
        }
        try {
            barrier.await();
        } catch (InterruptedException ie) {
            return;
        } catch (BrokenBarrierException bbe) {
            return;
        }
    }
}
The above code is run once in the program, so it was sort of a test case for my threading constructs (this is my first experience with Java threads). I implemented the same sort of thing in a real critical section, specifically the evolution part of the genetic algorithm, whose class is as follows:
public class evolveThread implements Runnable {

    int start, end;

    public evolveThread(int s, int e) {
        start = s;
        end = e;
    }

    public void run() {
        // Get midpoint
        int n = population.length/2, m;
        for (m = start; m > end; m--) {
            int i, j;
            i = ThreadLocalRandom.current().nextInt(0, n);
            do {
                j = ThreadLocalRandom.current().nextInt(0, n);
            } while (i == j);
            population[m].crossover(population[i], population[j]);
            population[m].mutate(numCities);
        }
        try {
            barrier.await();
        } catch (InterruptedException ie) {
            return;
        } catch (BrokenBarrierException bbe) {
            return;
        }
    }
}
Which exists in a function evolve() that is called in init() like so:
for (int p = 0; p < numIter; p++) evolve(p, tpool);
Yes, I know that's not terribly good design, but for other reasons I'm stuck with it. Inside of evolve() are the relevant parts, shown here:
// Threaded inner loop
int startEvolve = popSize - 1,
    endEvolve = (popSize - 1) - (popSize - 1)/numThreads;
// Split up work, assign to threads
for (int i = 0; i < numThreads; i++) {
    endEvolve = (popSize - 1) - (popSize - 1)*(i + 1)/numThreads + 1;
    tpool.execute(new evolveThread(startEvolve, endEvolve));
    startEvolve = endEvolve;
}
// Wait for our comrades
try {
    barrier.await();
} catch (InterruptedException ie) {
    return;
} catch (BrokenBarrierException bbe) {
    return;
}
population[1].crossover(population[0], population[1]);
population[1].mutate(numCities);
population[0].mutate(numCities);
// Pick out the strongest
Arrays.sort(population, population[0]);
current = population[0];
generation++;
What I really want to know is this:
1. What role does the "queue" have? Am I right to create a queue with as many jobs as I think will be executed across all threads in the pool? If the size isn't sufficiently large, I get RejectedExecutionExceptions. I just decided to do numThreads*numIterations because that's how many jobs there would be (for the actual evolution method that I mentioned earlier). It's weird though: I shouldn't have to do this if the barrier.await()s were working, which leads me to...
2. Am I using barrier.await() correctly? Currently I have it in two places: inside the run() method of the Runnable object, and after the for loop that executes all the jobs. I would've thought only one would be required, but I get errors if I remove one or the other.
3. I'm suspicious of contention between the threads, as that is the only thing I can glean from the absurd slowdown (which does scale with the input parameters). I want to know if it has anything to do with how I'm implementing the thread pool and barriers. If not, then I'll have to look inside the crossover() and mutate() methods, I suppose.
First, I think you may have a bug in how you intended to use the CyclicBarrier. Currently you are initializing it with the number of executor threads as the number of parties. You have an additional party, however: the main thread. So I think you need to do:
barrier = new CyclicBarrier(numThreads + 1);
I think this should work, but personally I find it an odd use of the barrier.
When using a worker-queue thread-pool model I find it easier to use a Semaphore or Java's Future model.
For a semaphore:
class MyRunnable implements Runnable {

    private final Semaphore sem;

    public MyRunnable(Semaphore sem) {
        this.sem = sem;
    }

    public void run() {
        // do work
        // signal completion
        sem.release();
    }
}
Then in your main thread:
Semaphore sem = new Semaphore(0);
for (int i = 0; i < numJobs; ++i) {
    threadPool.execute(new MyRunnable(sem));
}
sem.acquire(numJobs);
It's really doing the same thing as the barrier, but I find it easier to think about the worker tasks "signaling" that they are done instead of "syncing up" with the main thread again.
For example, if you look at the example code in the CyclicBarrier Javadoc, the call to barrier.await() is inside the loop inside the worker. So it is really syncing up multiple long-running worker threads, and the main thread is not participating in the barrier. Calling barrier.await() at the end of the worker, outside the loop, is more like signaling completion.
As you increase the number of tasks, you increase the overhead that each task adds. This means you want to minimise the number of tasks, e.g. down to the same as the number of CPUs you have. For some workloads, using double the number of CPUs can be better when the work is not evenly distributed.
BTW: You don't need a barrier in each task, you can wait for the future of each task to complete by calling get() on each one.
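For instance, a minimal sketch of that Future-based variant, reusing the tpool and evolveThread names from the question (the loop bounds are illustrative):
List<Future<?>> futures = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
    // submit() returns a Future, unlike execute()
    futures.add(tpool.submit(new evolveThread(startEvolve, endEvolve)));
}
for (Future<?> f : futures) {
    f.get(); // blocks until that task has finished
}
With this, the barrier.await() call inside evolveThread.run() can be dropped entirely.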

Java strange performance inconsistency

I have a simple recursive method, a depth first search. On each call, it checks if it's in a leaf, otherwise it expands the current node and calls itself on the children.
I'm trying to make it parallel, but I notice the following strange (for me) problem.
I measure execution time with System.currentTimeMillis().
When I break the search into a number of subsearches and add up the total execution time, I get a bigger number than for the sequential search. I only measure execution time, not communication or synchronization, etc. I would expect to get the same time when I add the times of the subtasks. This happens even if I just run one task after the other, so without threads: if I just break the search into some subtasks and run the subtasks one after the other, I get a bigger time.
If I add the number of method calls for the subtasks, I get the same number as the sequential search. So, basically, in both cases I do the same number of method calls, but I get different times.
I'm guessing there's some overhead on initial method calls or something else caused by a JVM mechanism. Any ideas what it could be?
For example, one sequential search takes around 3300 ms. If I break it into 13 tasks, it takes a total time of 3500 ms.
My method looks like this:
private static final int dfs(State state) {
    method_calls++;
    if (state.isLeaf()) {
        return 1;
    }
    State[] children = state.expand();
    int result = 0;
    for (int i = 0; i < children.length; i++) {
        result += dfs(children[i]);
    }
    return result;
}
Whenever I call it, I do it like this:
for (int i = 0; i < num_tasks; i++) {
    long start = System.currentTimeMillis();
    dfs(tasks[i]);
    totalTime += (System.currentTimeMillis() - start);
}
The problem is that totalTime increases with num_tasks, and I would expect it to stay the same because the method_calls variable stays the same.
You should average the numbers over longer runs. Secondly, the precision of currentTimeMillis may not be sufficient; you can try using System.nanoTime().
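A minimal sketch of the same measuring loop using nanoTime (the millisecond conversion at the end is just for readability):
long totalNanos = 0;
for (int i = 0; i < num_tasks; i++) {
    long start = System.nanoTime();
    dfs(tasks[i]);
    totalNanos += System.nanoTime() - start; // monotonic, unaffected by clock adjustments
}
System.out.println(totalNanos / 1_000_000 + " ms");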
As in all programming languages, whenever you call a procedure or a method, you have to push the current environment, initialize the new one, execute the program's instructions, return the value on the stack, and finally restore the previous environment. That costs a bit! Creating a thread costs even more!
I suppose that if you enlarge the search tree you will benefit from the parallelization.
Adding up system clock time for several threads seems a weird idea. Either you are interested in the time until processing is complete, in which case adding doesn't make sense, or in CPU usage, in which case you should only count the time when the thread is actually scheduled to execute.
What probably happens is that, at least part of the time, more threads are ready to execute than the system has CPU cores, and the scheduler puts one of your threads to sleep, which causes it to take longer to complete. It makes sense that this effect is exacerbated the more threads you use. (Even if your program uses fewer threads than you have cores, other programs, such as your development environment, might.)
If you are interested in CPU usage, you might wish to query ThreadMXBean.getCurrentThreadCpuTime().
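A minimal sketch of that query (the value is in nanoseconds; -1 means CPU time measurement is disabled):
java.lang.management.ThreadMXBean threadBean =
        java.lang.management.ManagementFactory.getThreadMXBean();
long cpuNanos = threadBean.getCurrentThreadCpuTime(); // CPU time of the calling thread, in nanoseconds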
I'd expect to see Threads used. Something like this:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class Puzzle {

    // atomics, because several pool threads update these counters concurrently
    static final AtomicLong totalTime = new AtomicLong();
    static final AtomicInteger method_calls = new AtomicInteger();

    /**
     * @param args
     */
    public static void main(String[] args) {
        final int num_tasks = 13;
        final State[] tasks = new State[num_tasks];
        ExecutorService threadPool = Executors.newFixedThreadPool(5);
        for (int i = 0; i < num_tasks; i++) {
            threadPool.submit(new DfsRunner(tasks[i]));
        }
        try {
            threadPool.shutdown();
            // give the pool enough time; one second would cut the searches short
            threadPool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            System.out.println("Interrupted");
        }
        System.out.println(method_calls.get() + " methods in " + totalTime.get() + " msecs");
    }

    static final int dfs(State state) {
        method_calls.incrementAndGet();
        if (state.isLeaf()) {
            return 1;
        }
        State[] children = state.expand();
        int result = 0;
        for (int i = 0; i < children.length; i++) {
            result += dfs(children[i]);
        }
        return result;
    }
}
With the runnable bit like this:
public class DfsRunner implements Runnable {

    private final State state;

    public DfsRunner(State state) {
        super();
        this.state = state;
    }

    @Override
    public void run() {
        long start = System.currentTimeMillis();
        Puzzle.dfs(state);
        Puzzle.totalTime.addAndGet(System.currentTimeMillis() - start);
    }
}

How to speed up/optimize file write in my program

OK. I am supposed to write a program that takes a 20 GB file as input with 1,000,000,000 records and creates some kind of index for faster access. I have basically decided to split the 1 billion records into 10 buckets and 10 sub-buckets within those. I am calculating two hash values for each record to locate its appropriate bucket. Now, I create 10*10 files, one for each sub-bucket. As I hash each record from the input file, I decide which of the 100 files it goes to; then I append the record's offset to that particular file.
I have tested this with a sample file with 10,000 records. I have repeated the process 10 times, effectively emulating a 100,000 record file. For this it takes me around 18 seconds. This means it's going to take me forever to do the same for a 1 billion record file.
Is there any way I can speed up/optimize my writing?
And I am going through all this because I can't store all the records in main memory.
import java.io.*;

// PROGRAM DOES THE FOLLOWING
// 1. READS RECORDS FROM A FILE.
// 2. CALCULATES TWO SETS OF HASH VALUES N, M.
// 3. APPENDS THE OFFSET OF THAT RECORD IN THE ORIGINAL FILE TO ANOTHER FILE "NM.TXT",
//    i.e. REPLACE THE VALUES OF N AND M.
// 4.
class storage
{
    public static int siz = 10;
    public static FileWriter[][] f;
}

class proxy
{
    static String[][] virtual_buffer;

    public static void main(String[] args) throws Exception
    {
        virtual_buffer = new String[storage.siz][storage.siz]; // TEMPORARY STRING BUFFER TO REDUCE WRITES
        String s, tes;
        for (int y = 0; y < storage.siz; y++)
        {
            for (int z = 0; z < storage.siz; z++)
            {
                virtual_buffer[y][z] = ""; // INITIALISING ALL ELEMENTS TO THE EMPTY STRING
            }
        }
        int offset_in_file = 0;
        long start = System.currentTimeMillis();
        // READING FROM THE SAME INPUT FILE 20 TIMES TO EMULATE A SINGLE BIGGER FILE OF SIZE 20 * INPUT FILE
        for (int h = 0; h < 20; h++) {
            BufferedReader in = new BufferedReader(new FileReader("outTest.txt"));
            while ((s = in.readLine()) != null)
            {
                tes = (s.split(";"))[0];
                int n = calcHash(tes);  // FINDING FIRST HASH VALUE
                int m = calcHash2(tes); // SECOND HASH
                index_up(n, m, offset_in_file); // METHOD TO WRITE TO THE APPROPRIATE FILE, I.E. NM.TXT
                offset_in_file++;
            }
            in.close();
        }
        System.out.println(offset_in_file);
        long end = System.currentTimeMillis();
        System.out.println((end - start));
    }

    static int calcHash(String s) throws Exception
    {
        char[] charr = s.toCharArray();
        int i, tot = 0;
        for (i = 0; i < charr.length; i++)
        {
            if (i % 2 == 0) tot += (int) charr[i];
        }
        tot = tot % storage.siz;
        return tot;
    }

    static int calcHash2(String s) throws Exception
    {
        char[] charr = s.toCharArray();
        int i, tot = 1;
        for (i = 0; i < charr.length; i++)
        {
            if (i % 2 == 1) tot += (int) charr[i];
        }
        tot = tot % storage.siz;
        if (tot < 0)
            tot = tot * -1;
        return tot;
    }

    static void index_up(int a, int b, int off) throws Exception
    {
        virtual_buffer[a][b] += Integer.toString(off) + "'"; // THIS BUFFER STORES THE DATA TO BE WRITTEN
        if (virtual_buffer[a][b].length() > 2000)            // TO A FILE BEFORE WRITING, TO REDUCE THE NO. OF WRITES
        {
            String file = "c:\\adsproj\\" + a + b + ".txt";
            new writethreader(file, virtual_buffer[a][b]); // DOING THE ACTUAL WRITE PART IN A THREAD.
            virtual_buffer[a][b] = "";
        }
    }
}

class writethreader implements Runnable
{
    Thread t;
    String name, data;

    writethreader(String name, String data)
    {
        this.name = name;
        this.data = data;
        t = new Thread(this);
        t.start();
    }

    public void run()
    {
        try {
            File f = new File(name);
            if (!f.exists()) f.createNewFile();
            FileWriter fstream = new FileWriter(name, true); // APPEND MODE
            fstream.write(data);
            fstream.flush();
            fstream.close();
        }
        catch (Exception e) {}
    }
}
Consider using VisualVM to pinpoint the bottlenecks. Everything else below is based on guesswork - and performance guesswork is often really, really wrong.
I think you have two issues with your write strategy.
The first is that you're starting a new thread on each write; the second is that you're re-opening the file on each write.
The thread problem is especially bad, I think, because I don't see anything preventing one thread writing on a file from overlapping with another. What happens then? Frankly, I don't know - but I doubt it's good.
Consider, instead, creating an array of open files for all 100. Your OS may have a problem with this - but I think probably not. Then create a queue of work for each file. Create a set of worker threads (100 is too many - think 10 or so) where each "owns" a set of files that it loops through, outputting and emptying the queue for each file. Pay attention to the interthread interaction between queue reader and writer - use an appropriate queue class.
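A rough sketch of that design (all names here are made up for illustration, and shutdown/flushing of remaining queue contents is omitted):
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class IndexWriters
{
    static final int SIZ = 10;
    static final FileWriter[][] writers = new FileWriter[SIZ][SIZ];
    static final BlockingQueue<String>[][] queues = new ArrayBlockingQueue[SIZ][SIZ]; // one work queue per file

    static void start(int workerCount) throws IOException
    {
        for (int a = 0; a < SIZ; a++)
            for (int b = 0; b < SIZ; b++)
            {
                writers[a][b] = new FileWriter("c:\\adsproj\\" + a + b + ".txt", true); // opened once, append mode
                queues[a][b] = new ArrayBlockingQueue<>(10_000);
            }
        for (int w = 0; w < workerCount; w++)
        {
            final int worker = w;
            new Thread(() -> {
                try {
                    while (true)
                    {
                        // each worker owns a fixed slice of the 100 files; a FileWriter is
                        // only ever touched by its owner, so no write can overlap another
                        for (int cell = worker; cell < SIZ * SIZ; cell += workerCount)
                        {
                            String chunk;
                            while ((chunk = queues[cell / SIZ][cell % SIZ].poll()) != null)
                                writers[cell / SIZ][cell % SIZ].write(chunk);
                        }
                        Thread.sleep(10); // avoid spinning when every queue is empty
                    }
                } catch (InterruptedException | IOException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}
A producer then just calls queues[n][m].put(offsetString) instead of opening and closing the file itself.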
I would throw away the entire requirement and use a database.
