I am writing a program that reads words from a file and sorts them in alphabetical order. You provide the input and output files in the command line, and the program reads the words from the input file and writes a sorted list back to the output file. This is done, and it works as it should do. No questions here.
I am not looking for specific code, but rather help on how to approach a problem. The next part of the assignment states that in the command line, you are to be able to set the number of Threads you want the program to use in the sorting process.
For instance, if you run the program with the following:
java Sort 12 infile.txt outfile.txt
The above program is meant to use 12 Threads to sort the words from "infile.txt". Each Thread is to sort a number of N = (numberOfWords)/(numberOfThreads) words. All the words are read into memory, before the Threads are started. I'm aware that this might sound cryptic, but I have been googling around looking for a good explanation on "multithreading"/defining the number of Threads in a Java program, yet I am not any wiser.
If anyone knows how to explain how you can set the number of Threads in Java, even with a small example, I would be very grateful!
Thanks!
You could use the Executors.newFixedThreadPool(int nThreads) method (see details here) to get a ThreadPool with the required number of threads. Then, divide your work into the appropriate number of chunks (12 in your example), create a Runnable object for each chunk of work and pass those Runnable objects to the ThreadPool's submit method.
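A minimal sketch of that approach, assuming the words are already in memory as a String[]. The class name ChunkedSort and the helper merge are made up for illustration; the merge step is my own addition, since the sorted chunks still have to be combined somehow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ChunkedSort {
    // Sorts words with numberOfThreads pool threads: one chunk per task, then a merge.
    static String[] sort(String[] words, int numberOfThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        int chunkSize = (words.length + numberOfThreads - 1) / numberOfThreads; // ceil(N / threads)
        List<String[]> chunks = new ArrayList<>();
        for (int start = 0; start < words.length; start += chunkSize) {
            String[] chunk = Arrays.copyOfRange(words, start, Math.min(start + chunkSize, words.length));
            chunks.add(chunk);
            Runnable task = () -> Arrays.sort(chunk); // each chunk is touched by exactly one task
            pool.submit(task);
        }
        pool.shutdown(); // no new tasks; the submitted ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("sorting timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Merge the sorted chunks one after another on the calling thread.
        String[] result = new String[0];
        for (String[] chunk : chunks) {
            result = merge(result, chunk);
        }
        return result;
    }

    // Standard two-way merge of two sorted arrays.
    static String[] merge(String[] a, String[] b) {
        String[] out = new String[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            out[k++] = a[i].compareTo(b[j]) <= 0 ? a[i++] : b[j++];
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana", "cherry", "date"};
        System.out.println(Arrays.toString(sort(words, 3)));
    }
}
```

In the real program, numberOfThreads would come from Integer.parseInt(args[0]) and words from the input file.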
Oh sure. Well a thread is just a class with a "run" method.
You create the class and have it either extend Thread or implement Runnable. If you extend Thread, you can just call start() on an instance and that starts the thread. If you implement Runnable instead, you have to do something like Thread t = new Thread(yourRunnableObject); and then call t.start().
So for your example:
public class Sort {
    // static, so main() can create instances without an enclosing Sort object
    static class RunnableClass implements Runnable {
        private final String[] words;
        RunnableClass(String[] words) {
            this.words = words;
        }
        @Override
        public void run() {
            //Do your sorting
        }
    }
    public static void main(String[] args) {
        int numberOfThreads = Integer.parseInt(args[0]);
        //some code that reads the words and chops them into one array per thread
        String[][] chunks = new String[numberOfThreads][];
        for (int x = 0; x < numberOfThreads; x++) {
            Thread t = new Thread(new RunnableClass(chunks[x]));
            t.start();
        }
        //something to manage the threads and coordinate their work
    }
}
You could make this more elaborate or complex. One simple implementation would be to loop over the words, passing two to each thread to sort, and then, once the threads complete, move along the list and repeat until a pass changes no order. That's a form of bubble sort. In other words, Thread A sorts words 1 and 2, Thread B sorts words 3 and 4, and so on.
The threads can communicate with each other, share state or have their own state, etc. There are many ways to implement this.
The threads could terminate, or be re-entrant, could have state, etc.
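For the "manage the threads and coordinate their work" part, here is a rough sketch with plain Thread objects and join(). PlainThreadSort and sortChunks are hypothetical names, and it assumes the words have already been split into one array per thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlainThreadSort {
    // Sorts every chunk in place, one Thread per chunk, and returns once all threads are done.
    static void sortChunks(String[][] chunks) {
        List<Thread> threads = new ArrayList<>();
        for (String[] chunk : chunks) {
            Thread t = new Thread(() -> Arrays.sort(chunk));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for this worker; its writes are visible after join() returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        String[][] chunks = {{"pear", "apple"}, {"kiwi", "fig"}, {"date", "banana"}};
        sortChunks(chunks);
        System.out.println(Arrays.deepToString(chunks));
    }
}
```

After sortChunks returns, each chunk is sorted on its own; combining them into one sorted list is a separate step.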
The Executors class has a static method newFixedThreadPool(int nThreads) that takes the number of threads to pool. For example, if you have a class implementing Runnable
public class MyCustomThread implements Runnable {
@Override
public void run() {
//do your work
}
}
you can create pool with 5 threads like this
..
int numberOfThreads = 5;
ExecutorService srv = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
srv.execute(new MyCustomThread());
}
Using ExecutorService it will be much easier for you to manage lifecycles of threads. Read Oracle concurrency tutorial for more information.
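To illustrate the lifecycle part, a small sketch of shutting a pool down and waiting for it. PoolLifecycle and runAndWait are invented names, and the 10 second timeout is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolLifecycle {
    // Runs `tasks` trivial tasks on a pool of numberOfThreads threads and
    // returns how many of them had finished when the wait ended.
    static int runAndWait(int numberOfThreads, int tasks) {
        ExecutorService srv = Executors.newFixedThreadPool(numberOfThreads);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            srv.execute(() -> finished.incrementAndGet()); // stand-in for real work
        }
        srv.shutdown(); // stop accepting tasks; queued ones still run
        try {
            if (!srv.awaitTermination(10, TimeUnit.SECONDS)) {
                srv.shutdownNow(); // give up: interrupt whatever is still running
            }
        } catch (InterruptedException e) {
            srv.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(5, 20) + " tasks finished");
    }
}
```

Without the shutdown() call the pool's threads stay alive and the JVM will not exit on its own.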
First, a question: which version of Java are you using? This task is not trivial, as you have to take care of a couple of things such as joining threads. Java 7 has a feature called 'Fork/Join' that you can leverage for this task.
You can refer the following for an example.
Sorting using Fork/Join
You can start from this
What you're looking for is a Fork/Join framework. This splits a single task into parts, handing the parts to multiple threads to be processed.
ExecutorService's FixedThreadPool allows you to create 12 worker threads, but leaves you with all the hard work of separating the work between the threads. The Fork/Join framework makes this easy, using a recursive system to break the process down if needed so it could be split between threads.
http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
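As a rough idea of what that looks like for sorting, here is a sketch of a merge sort built on RecursiveAction. The class name ForkJoinSort and the THRESHOLD value are assumptions chosen for illustration, not something taken from the tutorial.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkJoinSort extends RecursiveAction {
    private static final int THRESHOLD = 1_000; // assumed cutoff below which we sort directly
    private final String[] words;
    private final int from, to; // this task sorts words[from, to)

    ForkJoinSort(String[] words, int from, int to) {
        this.words = words;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            Arrays.sort(words, from, to);
            return;
        }
        int mid = (from + to) >>> 1;
        // Fork both halves and wait for them; the pool spreads them over its threads.
        invokeAll(new ForkJoinSort(words, from, mid), new ForkJoinSort(words, mid, to));
        merge(mid);
    }

    // Merges the sorted halves words[from, mid) and words[mid, to) in place.
    private void merge(int mid) {
        String[] left = Arrays.copyOfRange(words, from, mid);
        int i = 0, j = mid, k = from;
        while (i < left.length && j < to) {
            words[k++] = left[i].compareTo(words[j]) <= 0 ? left[i++] : words[j++];
        }
        while (i < left.length) words[k++] = left[i++];
        // any remaining right-hand elements are already in their final place
    }

    // parallelism would be the thread count from the command line
    static void sort(String[] words, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new ForkJoinSort(words, 0, words.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig", "kiwi", "banana"};
        sort(words, 4);
        System.out.println(Arrays.toString(words));
    }
}
```

Note that this splits by size rather than into exactly N = words/threads pieces, so it does not match the assignment's wording literally.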
Define a Runnable, then in a loop add new threads with that Runnable to a list. Then start all the threads, either in the same loop or in a separate one, passing the words each one needs to process to its Runnable on construction.
You will also have to control access to the output file, and possibly the input file depending on how you access it, otherwise your threads will run into trouble, so take a look at race conditions and how to deal with them.
I don't understand what I get in the first case and in the second when I create a Thread.
And, in general, is there a difference between them?
ExecutorService executorService = Executors.newCachedThreadPool();
NewThread newThread = new NewThread(Thread.MAX_PRIORITY);
for(int i = 0;i < 5; i++){
executorService.execute(newThread);
}
ExecutorService executorService = Executors.newCachedThreadPool();
for(int i = 0;i < 5; i++){
NewThread newThread = new NewThread(Thread.MAX_PRIORITY);
executorService.execute(newThread);
}
The best answer, given what you've provided, is: in the first case you'll probably get errors. The second way is totally safe (assuming that you're not doing something unsafe, of course).
I know that's not very helpful, so let's get you some background.
NewThread most probably implements Runnable, so it should have a method public void run(), like this:
class NewThread implements Runnable {
    @Override
    public void run() {
        //do something
    }
}
Now, we don't know what's the actual implementation, but we still can do some analysis. The whole outcome of your examples depends on whether NewThread is stateful or stateless. "Stateful" means that instance of that class has state, for example some internal fields (attributes). "Stateless" is just "not stateful".
If NewThread is stateless, then in both cases the outcome will be the same - underneath ExecutorService executes the run() method in new thread, and as there is no state of variables anyway, we won't have any problems.
If NewThread is stateful, there may be some problems in first of your examples. Compiler won't be of much help here, as the code is OK, but the logic may be broken. Imagine this:
class NewThread implements Runnable {
    int x = 0;
    @Override
    public void run() {
        while (x < 10)
            x = x + 1;
    }
}
What you see here is a textbook example of a race condition. Better authors than me have explained the issue far better than I can, so I'm just going to provide some links to read, like this, this and this (also: use Google, of course). Basically, the race condition in this case is that when we do x = x + 1 we first need to read x, then write to it. Between the read and the write some other thread may have modified the value of x, and that modification would be overwritten by this thread.
There is a case in which NewThread is stateful, but still works properly. This happens if you synchronize your code by-hand - either using synchronized keyword (for example, see 3rd link above) or by using synchronized data structures:
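import java.util.concurrent.atomic.AtomicInteger;

class NewThread implements Runnable {
    final AtomicInteger x = new AtomicInteger(0);
    @Override
    public void run() {
        // note: the check and the increment are still two separate steps, so x can overshoot 10
        while (x.get() < 10)
            x.incrementAndGet(); //getAndIncrement would work too - we don't care about the result, only about incrementing
    }
}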
"Atomic" means that every operation on that class is considered a single step, like a read or a write (while x = x + 1 is two steps, which is exactly what leads to the race condition). There are already several atomic classes available in the JDK. If you would like to implement something similar yourself, you'd probably be using the synchronized keyword or some lock-like object to guard the variable.
In the first case you are creating one NewThread instance and attempting to execute it 5 times; in the second you are creating 5 different instances and executing each of them once. Does that answer your question?
I think your question is rooted in bad naming. You are doing
executorService.execute(newThread);
and now you are probably wondering why that service (which is based on a thread pool) is dealing with Threads.
Simple answer: it isn't. That interface Executor.execute() takes a Runnable object.
In other words: your code will call that run method that your class NewThread provides.
Of course, the "direct" answer to your question is: in the first case, you are sending the same Runnable 5 times to the Executor; whereas in the second case, you are sending 5 different Runnables to the Executor.
Different in the sense of: different objects - as they are of the same class, the very same thing should happen for both examples. Unless you do some nasty static stuff in NewThread; which wouldn't be too surprising given the overall impression of your question.
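To make the "same Runnable 5 times" case concrete, here is a sketch where one stateful Runnable is submitted repeatedly and guards its field with synchronized. SynchronizedCounter is a made-up stand-in for NewThread, and the loop count of 10,000 is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SynchronizedCounter implements Runnable {
    private int x = 0; // shared state: every submission of this object touches the same field

    @Override
    public void run() {
        for (int i = 0; i < 10_000; i++) {
            synchronized (this) { // the read-modify-write happens as one step
                x = x + 1;
            }
        }
    }

    synchronized int value() {
        return x;
    }

    // Submits ONE instance `submissions` times, like the first snippet in the question.
    static int runShared(int submissions) {
        SynchronizedCounter shared = new SynchronizedCounter();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 0; i < submissions; i++) {
            pool.execute(shared); // the pool calls shared.run() on one of its own threads
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.value();
    }

    public static void main(String[] args) {
        System.out.println(runShared(5)); // 5 submissions x 10,000 increments
    }
}
```

With the synchronized block removed, the total would usually come out lower than 50,000, because increments from different pool threads overwrite each other.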
I haven't tried it, but the first case should execute once and then start throwing exceptions. Once an instance of Thread has terminated, it is illegal to try to start it again. See the javadoc for start:
IllegalThreadStateException - if the thread was already started.
Your second example is the more sensible of the two since it's creating 5 separate Thread instances.
I have a simple multithreading problem (in Java). I have 2 sets of 4 very large arrays and I have 4 threads, 1 for each array in the set. I want the threads to check, in parallel, whether the arrays in both sets have identical values. If one of the values in one of the arrays does not match the value at the corresponding index in the other array, then the two sets are not identical and all threads should stop what they are doing and move on to the next 2 sets of 4 very large arrays. This process continues until all the pairs of array sets have been compared and deemed equal or not equal. I want all the threads to stop when one of the threads finds a mismatch. What is the correct way to implement this?
Here's one simple solution, but I don't know if it's the most efficient: Simply declare an object with a public boolean field.
public class TerminationEvent {
public volatile boolean terminated = false; // volatile, so a write by one thread is visible to the others
}
Before starting the threads, create a new TerminationEvent object. Use this object as a parameter when you construct the thread objects, e.g.
public class MyThread implements Runnable {
private TerminationEvent terminationEvent;
public MyThread(TerminationEvent event) {
terminationEvent = event;
}
}
The same object will be passed to every MyThread, so they will all see the same boolean.
Now, the run() method in each MyThread will have something like
if (terminationEvent.terminated) {
break;
}
in the loop, and will set terminationEvent.terminated = true; when the other threads need to stop.
(Normally I wouldn't use public fields like terminated, but you said you wanted efficiency. I think this is a bit more efficient than a getter method, but I haven't tried benchmarking anything. Also, in a simple case like this, you don't need full synchronization when the threads read or write the terminated field, but the field should be declared volatile; otherwise a thread is not guaranteed to ever see another thread's write.)
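Put together, the shared-flag idea could look roughly like this. It is only a sketch: ArraySetComparer is a hypothetical name, it assumes both sets contain the same number of arrays, and it uses AtomicBoolean in place of a hand-written flag class (my substitution).

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ArraySetComparer {
    // Returns true if a[k] equals b[k] element-wise for every k; one thread per array pair.
    static boolean setsEqual(int[][] a, int[][] b) {
        AtomicBoolean mismatch = new AtomicBoolean(false); // the one flag shared by all workers
        Thread[] workers = new Thread[a.length];
        for (int k = 0; k < a.length; k++) {
            int[] left = a[k], right = b[k];
            workers[k] = new Thread(() -> {
                if (left.length != right.length) {
                    mismatch.set(true);
                    return;
                }
                // The loop ends as soon as ANY worker has reported a mismatch.
                for (int i = 0; i < left.length && !mismatch.get(); i++) {
                    if (left[i] != right[i]) {
                        mismatch.set(true);
                    }
                }
            });
            workers[k].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !mismatch.get();
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2, 3}, {4, 5, 6}};
        int[][] b = {{1, 2, 3}, {4, 0, 6}};
        System.out.println(setsEqual(a, a) + " " + setsEqual(a, b));
    }
}
```

For the next pair of sets you would simply call setsEqual again, which creates a fresh flag.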
Stopping other threads is usually done through interrupts. Thread.stop() is deprecated because it is unsafe: it unlocks all monitors held by the thread, possibly letting other threads see objects in an inconsistent state (Ref: http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html). An interrupt does not "stop" the thread as such; it only sets the thread's interrupted flag.
The thread should check the interrupted flag at intervals (not too frequently) between computations:
if (Thread.interrupted()) {
throw new InterruptedException();
}
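A small self-contained sketch of the interrupt approach follows. InterruptDemo is an invented name, the counter loop merely stands in for the array comparison, and the check interval of 1024 iterations is an arbitrary assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    // Starts a busy worker, interrupts it, and reports whether it noticed and exited.
    static boolean stopByInterrupt() {
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            started.countDown();
            long checked = 0;
            while (true) {
                checked++; // stand-in for comparing one array element
                // Thread.interrupted() reads AND clears this thread's interrupted flag
                if (checked % 1024 == 0 && Thread.interrupted()) {
                    sawInterrupt.set(true);
                    return;
                }
            }
        });
        worker.setDaemon(true); // so a stuck worker cannot keep the JVM alive
        worker.start();
        try {
            started.await();    // make sure the worker is actually running
            worker.interrupt(); // only sets the flag; the worker decides when to stop
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped after interrupt: " + stopByInterrupt());
    }
}
```

If the workers run in an ExecutorService, shutdownNow() delivers the same kind of interrupt to all running tasks.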
Use a volatile variable to set the abort condition. In your check loop that is run by all threads, let those threads check a number N of values uninterrupted so they don't have to fetch the volatile too often, which may be costly compared to the value match test. Benchmark your solution to find the optimum for N on your target hardware.
Another way would be to use a ForkJoin approach where your result is true if a mismatch was found. Divide your array slices down to a minimum size similar to N.
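The volatile variant with a check every N elements might be sketched like this. BatchedComparer and its member names are hypothetical, batch plays the role of N and must be greater than zero, and the arrays of a pair are assumed to have equal length.

```java
class BatchedComparer {
    private volatile boolean abort = false; // set by whichever thread finds a mismatch
    private final int batch;                // N: elements compared between two reads of abort

    BatchedComparer(int batch) {
        this.batch = batch;
    }

    // Compares x and y element-wise, giving up early once any thread has set abort.
    void compare(int[] x, int[] y) {
        for (int start = 0; start < x.length && !abort; start += batch) {
            int end = Math.min(start + batch, x.length);
            for (int i = start; i < end; i++) { // no volatile read inside this inner loop
                if (x[i] != y[i]) {
                    abort = true;
                    return;
                }
            }
        }
    }

    // Runs one thread per array pair; true means no thread found a mismatch.
    static boolean setsEqual(int[][] a, int[][] b, int batch) {
        BatchedComparer cmp = new BatchedComparer(batch);
        Thread[] workers = new Thread[a.length];
        for (int k = 0; k < a.length; k++) {
            int[] x = a[k], y = b[k];
            workers[k] = new Thread(() -> cmp.compare(x, y));
            workers[k].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !cmp.abort;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}};
        int[][] b = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 11}};
        System.out.println(setsEqual(a, a, 2) + " " + setsEqual(a, b, 2));
    }
}
```

A larger batch means fewer volatile reads but a longer delay before the other threads notice the abort, which is the trade-off to benchmark.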
I need to split an array into 5 parts (the last one will probably not be of equal size) and feed them to threads for processing in parallel.
An attempt is as below:
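int skip = (arr.length + 4) / 5; // ceil(arr.length / 5), so there are at most 5 parts
for (int i = 0; i < arr.length; i += skip) {
    // the last part holds whatever is left over
    int[] sub = new int[Math.min(skip, arr.length - i)];
    System.arraycopy(arr, i, sub, 0, sub.length);
    // Worker is a placeholder for your own Runnable that takes the barrier and the part
    Thread t = new Thread(new Worker(barrier, sub));
    t.start();
}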
Any suggestion to make it more functional and avoiding local arrays is welcome.
As for organizing the threads well, you should look into ExecutorService (see: Simple youtube video on ExecutorService).
As for organizing the arrays, you can either split them up front or create a Queue that the threads take their work from; see Java Queue implementations, which one?
These threads may work at different speeds, so a Queue is recommended.
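One way to avoid the local sub-arrays altogether is to hand each task an index range over the shared array. The sketch below does that with an ExecutorService; SplitArray and partialSums are made-up names, and summing is only a placeholder for the real per-part processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitArray {
    // Splits arr into `parts` index ranges and sums each range on a pool thread.
    static long[] partialSums(int[] arr, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                // Range [from, to) of part p; sizes differ by at most one element.
                int from = (int) ((long) p * arr.length / parts);
                int to = (int) ((long) (p + 1) * arr.length / parts);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += arr[i]; // tasks only read the shared array, so no copy is needed
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long[] sums = new long[parts];
            for (int p = 0; p < parts; p++) {
                sums[p] = futures.get(p).get(); // blocks until part p is done
            }
            return sums;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        System.out.println(Arrays.toString(partialSums(arr, 5)));
    }
}
```

If the tasks need to modify their part, this still works as long as no two ranges overlap.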
I have a threading issue in my code that should not be happening, but is, so I'm trying to build a workaround. I will try to explain my problem with code that is as simple as I can make it, because the code where I'm experiencing the issue is big and complicated.
So, in short, the code:
...................
..................
void createAndRunThreads(){
List<Path> pathList = readPath(); //read paths from DB
for(Path p : pathList){
RunJob rj = new RunJob(p);
Thread t = new Thread(rj);
t.start();
}
}
class RunJob implements Runnable {
private Path path;
private ExecuteJob execJob;
public RunJob(Path path){
this.path = path;
this.execJob = new ExecuteJob();
}
public void run() {
execJob.execute(path);
}
}
class ExecuteJob {
private static Job curentExecutingJob;
public void execute(Path path){
//here every thread should get a different job list from the others, but this is not happening
//so what happens eventually is that two threads execute the same Job at once and it gets messy
List<Job> jobList = getJobsFromPath(path);
for(Job job : jobList) {
curentExecutingJob=job;
//workaround that I'm trying: if another thread tries to run the same job, it has to wait on the lock (I don't know if this is possible)
synchronized(curentExecutingJob){
if(job.getStatus().equals("redy")){
//do some initialization
//and database changes
job.run();
}
}
}
}
}
So my concern is whether this is going to work. I don't know if the object in the lock is compared by memory (it needs to be the exact same object) or by equals (so I would have to implement equals on it).
What happens when the static curentExecutingJob member holds one object in the first thread, which locks on it (in the synchronized block), and a second thread changes that value and tries to enter the synchronized block? (My hope is that thread 2 will continue executing, and that the only time it would block is when it gets the same Job from the DB that the first thread got previously.)
I don't know if this approach can work and whether it makes sense.
Two threads are running the following code, which is inside a method:
1 Job j = getJobByIdFromDB(1111);
2 if(j.status.equals("redye")){
3 do stuff
4 make database changes
5 j.run();
6 j.state="running";
7 }
ThreadA is suspended by the JVM at line 3; its state changes from running to runnable and it is put back to wait in the pool.
ThreadB is then given a chance by the JVM and executes lines 1, 2, 3, 4, 5, 6, which I don't want to happen. I want the first thread that enters the code in lines 2 and 3 to finish before any of the other threads gets a chance to enter the same code.
The problem in accomplishing this is that the two threads execute the example method on different instances, so synchronizing the whole method won't work. I also have other code that is executed in this method, and I don't want that to be synchronized too.
So, is there a solution to my problem?
Also, if I use synchronized(this.class){} it will lose the benefits and the point of multithreading.
The problem is that the 'currentExecutingJob' is defined as static, meaning that all instances of ExecuteJob share the same 'instance' of this variable. In addition, you are setting the value of this variable outside of a synchronization block, which means that each thread will set it in an uncontrolled way. Your following synchronization block should have no practical impact whatsoever.
Given the way your sample code is written, it appears to me that you don't need any static variables and you don't need any synchronization, as there are no resources shared across multiple threads.
However, your comments in the code indicate that you want to prevent two threads from executing the same job at the same time. Your code does not achieve this, as there is no comparison of running jobs to see if the same job is running, and even if there were a comparison, your getJobsFromPath() would need to construct a job list such that the same object instance is reused when two threads/paths encounter the same 'job'.
I don't see any of this in your code.
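One possible way to get "the same object instance for the same job" is a shared map from job id to lock object. This is only a sketch under the assumption that jobs have an integer id (as in getJobByIdFromDB(1111)); JobLocks and runExclusively are invented names, and the map is never cleaned up here.

```java
import java.util.concurrent.ConcurrentHashMap;

class JobLocks {
    private final ConcurrentHashMap<Integer, Object> locks = new ConcurrentHashMap<>();

    // Runs work while holding the one lock object that belongs to jobId.
    void runExclusively(int jobId, Runnable work) {
        // computeIfAbsent is atomic: all threads asking for the same id get the same object
        Object lock = locks.computeIfAbsent(jobId, id -> new Object());
        synchronized (lock) {
            work.run(); // e.g. check the status, make the DB changes, run the job
        }
    }

    // Demo: several threads hammer the same job id; the lock keeps the plain int consistent.
    static int demo(int threads, int incrementsPerThread) {
        JobLocks jobLocks = new JobLocks();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    jobLocks.runExclusively(1111, () -> counter[0]++);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 1000));
    }
}
```

Threads working on different job ids get different lock objects and do not block each other, so the rest of the code stays parallel. The JobLocks instance itself has to be shared by all threads, for example through a single static final field.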
Can't comment so I'll put it as an answer. Sorry.
The block
synchronized(curentExecutingJob)
will synchronize on the object curentExecutingJob (in your terms, memory). If you synchronize on another object otherExecutingJob with currentExecutingJob.equals(otherExecutingJob) == true, both synchronize statements will not influence each other.
To your question/problem: It would be helpful if you describe what getJobsFromPath is doing or should do and what you actually want to do and what your problem actually is. It's not really clear to me.
I saw that your code checks the status of the job to see whether it is ready or not; I don't think this is a feasible way.
You can use the Callable interface instead of Runnable.
Here is a detailed example which may help you:
Java Concurrency
Is it possible to create a complete fork of a 'PROGRAM' in execution into two sub-programs from a single execution sequence ?
The sub-programs produced are completely identical. They have the same execution sequences and values but now they are two different programs. It is like creating a clone of an Object, thus giving us two different objects of the same type to work on. But instead of just an object and some values, here we want to create a completely parallel execution sequence of a Program already loaded in the JVM (would prefer an answer for Java).
You seem to be looking for the Java equivalent of the fork system call from Unix.
That does not exist in Java, and it's unclear whether it would even be possible, as processes in Unix don't have a direct equivalent in the JVM (threads are less independent than processes).
There is however a fork framework planned for Java 7:
http://www.ibm.com/developerworks/java/library/j-jtp11137.html
It's not the same as Unix's fork, but it shares some ideas and might be useful.
Of course you can do concurrent programming in Java, it's just not done via fork(), but using Threads.
I'm not sure exactly what you're trying to do here. It sounds to me like you have a solution in mind to a problem that would best be solved in another way. Would something like this accomplish your end goal?
public class MyApp implements Runnable
{
public MyApp(int foo, String bar)
{
// Set stuff up...
}
@Override
public void run()
{
// Do stuff...
}
public static void main(String[] argv)
{
// Parse command line args...
Thread thread0 = new Thread(new MyApp(foo, bar));
Thread thread1 = new Thread(new MyApp(foo, bar));
thread0.start();
thread1.start();
}
}
Though I would probably put main() in another object in a real app, since life-cycle management is a separate concern.
Well, using ProcessBuilder you can spawn another program.
See Java equivalent of fork in Java task of Ant?