I am fetching thousands of URLs and parsing tables from them using parseDocument(doc). Parsing all the tables would take hours, so I want to use threads to parse many of them at the same time, but I don't know how to do it.
The code below is a for loop that I need to use threads on:
for(int i = 0; i < urlList.size(); i++) {
Document doc = Jsoup.connect(urlList.get(i)).get();
reader.add(parseDocument(doc));
}
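for (int i = 0; i < urlList.size(); i++) {
    // An anonymous class can only capture (effectively) final locals, so copy the URL first
    final String url = urlList.get(i);
    Thread t = new Thread(new Runnable() {
        public void run() {
            try {
                Document doc = Jsoup.connect(url).get();
                reader.add(parseDocument(doc)); // reader is shared by all threads, so it must be thread-safe
            } catch (Exception e) { // catching exceptions
                e.printStackTrace();
            }
        }
    });
    t.start();
}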
We create a Thread variable called t that we can use later; for example, we can wait for the thread to finish with t.join(). The thread needs a run() method where the actual work is done: we pass it a Runnable, an interface that declares that run() method. This is where you parse the document. Finally, we start the thread by calling t.start().
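Starting one Thread per URL means thousands of threads at once. A common alternative, swapped in here rather than taken from the answer above, is a fixed-size pool from java.util.concurrent. This is a sketch under assumptions: fetchAndParse(String) is a hypothetical stand-in for Jsoup.connect(url).get() followed by parseDocument(doc), and the pool size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParse {

    // Hypothetical stand-in for Jsoup.connect(url).get() + parseDocument(doc).
    static String fetchAndParse(String url) {
        return "parsed:" + url;
    }

    static List<String> parseAll(List<String> urlList, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one task per URL; at most poolSize of them run at the same time.
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urlList) {
                futures.add(pool.submit(() -> fetchAndParse(url)));
            }
            // get() blocks until each task finishes and rethrows task failures
            // wrapped in ExecutionException; results keep submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = List.of("http://a.example", "http://b.example", "http://c.example");
        System.out.println(parseAll(urls, 8));
    }
}
```

Collecting results through Future.get() also avoids sharing a single reader object between threads.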
In a certain part of the Java code I am working on, I need to place a timer inside a run() method. Each thread executes all of the code inside run(), but I need to start measuring after block of code (1) and before block of code (2), so the timer needs to be triggered there.
for (int i = 0; i < total_threads-1; i++){
final int id = i+1;
th[i] = new Thread(new Runnable() {
public final void run(){
/* ... block of code (1) executed by multiple threads ... */
/* How can i start this counter only once? */
final long begin = System.currentTimeMillis();
/* ... another block of code (2) executed by multiple threads i need to measure!!! ... */
}
});
th[i].start();
}
for(int i = 0 ; i < total_threads-1 ; i++) {
try {
th[i].join();
}
catch (InterruptedException e) {}
}
final long end = System.currentTimeMillis();
System.out.println((end-begin) / 1000.0);
But every thread will have its own begin variable and start the counter, which is a problem because System.currentTimeMillis() should be called once, not by every thread.
I could probably split the code of run() into two separate parallel regions, but that would mean creating the threads twice, which would be unacceptable in terms of performance.
Is there a technique similar to the OpenMP directive #pragma omp master for Java threads?
How can I measure the time correctly here?
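You can check the thread ID so that the line executes in only one thread:
if (id == YOUR_THREAD_ID)
{
    begin = System.currentTimeMillis();
}
For this to compile, begin must be a shared field (for example a volatile one) instead of a final local inside run(). Note also that the chosen thread is not necessarily the first one to finish block (1).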
The simplest way would just be to record the time in the master thread (i.e. the thread which creates all the children) instead of in the children themselves.
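A minimal sketch of recording the time in the master thread, assuming the per-thread work can be stood in for by a hypothetical doWork(int id) that just sleeps: there is a single begin variable, read before the children start, and the clock is read again only after every child has been joined. Note that this measures block (1) as well.

```java
class MasterTiming {

    // Hypothetical per-thread work standing in for blocks (1) and (2).
    static void doWork(int id) {
        try {
            Thread.sleep(50L * id);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static double timeThreads(int totalThreads) throws InterruptedException {
        Thread[] th = new Thread[totalThreads];
        // Only the master thread reads the clock, so there is exactly one begin value.
        final long begin = System.currentTimeMillis();
        for (int i = 0; i < totalThreads; i++) {
            final int id = i + 1;
            th[i] = new Thread(() -> doWork(id));
            th[i].start();
        }
        for (Thread t : th) {
            t.join(); // wait for every child before stopping the clock
        }
        final long end = System.currentTimeMillis();
        return (end - begin) / 1000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(timeThreads(4) + " s");
    }
}
```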
But if you really want to start the timer in the children (maybe there's some expensive setup?), you could use a Guava Stopwatch object.
The code would look something like:
import com.google.common.base.Stopwatch;
import java.util.concurrent.TimeUnit;

Stopwatch timer = Stopwatch.createUnstarted();
for (int i = 0; i < total_threads-1; i++){
    final int id = i+1;
    th[i] = new Thread(() -> {
        /* ... block of code (1) executed by multiple threads ... */
        synchronized (timer) {
            // Only the first thread to get here starts the watch;
            // start() throws IllegalStateException if it is already running.
            if (!timer.isRunning()) {
                timer.start();
            }
        }
        /* ... another block of code (2) executed by multiple threads i need to measure!!! ... */
    });
    th[i].start();
}
for(int i = 0 ; i < total_threads-1 ; i++) {
    try {
        th[i].join();
    } catch (InterruptedException e) {}
}
timer.stop();
// elapsed(TimeUnit.SECONDS) would truncate to whole seconds
System.out.println(timer.elapsed(TimeUnit.MILLISECONDS) / 1000.0);
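If Guava isn't available, the same "first thread to get here starts the clock" idea can be sketched with java.util.concurrent.atomic.AtomicBoolean; this is a swapped-in technique, not part of the answer above. compareAndSet(false, true) succeeds for exactly one caller, so begin is written once. The class and method names (FirstThreadTimer, markStart, beginNanos, elapsedSeconds) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class FirstThreadTimer {

    private final AtomicBoolean started = new AtomicBoolean(false);
    private volatile long begin; // written by exactly one worker thread

    // Called by every worker between block (1) and block (2).
    void markStart() {
        // compareAndSet(false, true) succeeds for exactly one caller.
        if (started.compareAndSet(false, true)) {
            begin = System.nanoTime();
        }
    }

    long beginNanos() {
        return begin;
    }

    // Called by the master thread after it has joined all workers.
    double elapsedSeconds() {
        return (System.nanoTime() - begin) / 1e9;
    }

    public static void main(String[] args) throws InterruptedException {
        FirstThreadTimer timer = new FirstThreadTimer();
        Thread[] th = new Thread[4];
        for (int i = 0; i < th.length; i++) {
            th[i] = new Thread(() -> {
                /* ... block of code (1) ... */
                timer.markStart();
                /* ... block of code (2) to be measured ... */
            });
            th[i].start();
        }
        for (Thread t : th) {
            t.join();
        }
        System.out.println(timer.elapsedSeconds() + " s");
    }
}
```

System.nanoTime() is used because it is meant for measuring elapsed time, whereas currentTimeMillis() follows the wall clock.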
Can someone explain the following behavior to me?
Given this code:
for(int j = 0; j<100; j+=10) {
for(int i = 0; i<10; i++) {
threads[i] = new Thread(new RunAmounts(i+j));
threads[i].start();
}
for(Thread thread : threads) {
try {
if(thread != null)
thread.join();
} catch(InterruptedException ie) {
ie.printStackTrace();
return;
}
}
System.gc();
}
Assume that RunAmounts does nothing but print its parameter. One would expect each number 0-99 to be printed exactly once, but each number ends up being printed several times. Can someone explain this behavior of threads?
EDIT: This may be due to run(). In reality, the code passes a unique pageNum to RunAmounts, which appends it to an SQL statement:
class RunAmounts extends Thread {
private int pageNum;
public RunAmounts(int pageNum) {
this.pageNum = pageNum;
}
public void run() {
ResultSet rs = null;
String usdAmt, row[] = new String[5], extr[] = new String[3];
LinkedList<String[]> toWrite = new LinkedList<String[]>();
CSVWriter fw = null;
boolean cont;
try {
fw = new CSVWriter(new FileWriter("Amounts.csv", true), ',');
do {
//executes SQL command, initializes rs & pst
cont = pst.execute();
while(rs.next()) {
//does a bit of parsing
toWrite.addFirst(row);
synchronized(this) {
fw.writeAll(toWrite);
fw.flush();
}
toWrite.clear();
}
System.out.println("page: " + Integer.toString(pageNum));
rs.close();
} while(cont);
fw.close();
} catch(Exception e) {e.printStackTrace();}
}
This example was hard for me to read; it takes careful reading to see that it only calls join on the 10 most recently started threads. The array can be dropped (unless you want to keep references to the threads so you can call interrupt on them, in which case you need a bigger array, of course). The equivalent functionality can be written in Groovy like this:
class RunAmounts implements Runnable {
final int i
public void run() {
println i
}
RunAmounts(int i) {this.i = i}
}
def foo() {
(0 .. 90).step(10).each { j ->
(0 .. 9).each { i ->
t = new Thread(new RunAmounts(i + j) as Runnable)
t.start()
t.join()
}
}
}
and it works fine. I can add the array part back in (using a list here, but it's the same concept):
def foo() {
(0 .. 90).step(10).each { j ->
threads = []
(0 .. 9).each { i ->
t = new Thread(new RunAmounts(i + j) as Runnable)
t.start()
threads << t
}
threads.each { it.join() }
}
}
and it still works.
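For comparison, the same batching can be sketched in Java. RunAmounts is reduced here to recording its parameter in a thread-safe queue instead of running SQL, which is an assumption made so the sketch runs on its own. Each batch of ten threads is started and then joined before the next batch begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class BatchedThreads {

    // Simplified RunAmounts: records its parameter instead of running SQL.
    static class RunAmounts implements Runnable {
        private final int pageNum;
        private final Queue<Integer> seen;

        RunAmounts(int pageNum, Queue<Integer> seen) {
            this.pageNum = pageNum;
            this.seen = seen;
        }

        @Override
        public void run() {
            seen.add(pageNum);
            System.out.println("page: " + pageNum);
        }
    }

    static Queue<Integer> runBatches() throws InterruptedException {
        Queue<Integer> seen = new ConcurrentLinkedQueue<>(); // thread-safe collection
        for (int j = 0; j < 100; j += 10) {
            List<Thread> batch = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                Thread t = new Thread(new RunAmounts(i + j, seen));
                t.start();
                batch.add(t);
            }
            // Join only this batch before starting the next one.
            for (Thread t : batch) {
                t.join();
            }
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBatches().size()); // 100
    }
}
```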
So I think you're looking in the wrong spot. Either you edited out the real issue when you created this example or your problem is somewhere else.
How you get the database connection for your RunAmounts object is redacted from your example. JDBC objects are not thread-safe. (Connections are technically thread-safe, but not in a way that helps the application developer; as a practical matter, each one should be used by only one thread at a time.) If you get this part wrong, it could cause your problem.
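A separate detail in the question's code may also be worth checking, offered as a hedged aside since the full class isn't shown: synchronized(this) inside RunAmounts.run() locks each RunAmounts instance separately, so it does not stop two threads from writing to Amounts.csv at the same time. A lock shared by all instances does. In this sketch a StringBuilder stands in for the shared CSVWriter, and the names SharedLockDemo, Writer and writeAll are hypothetical.

```java
class SharedLockDemo {

    // One lock shared by every instance. synchronized(this) would give each
    // instance its own lock and provide no mutual exclusion between threads.
    private static final Object FILE_LOCK = new Object();

    static class Writer implements Runnable {
        private final int pageNum;
        private final int rows;
        private final StringBuilder out; // stand-in for the shared CSVWriter

        Writer(int pageNum, int rows, StringBuilder out) {
            this.pageNum = pageNum;
            this.rows = rows;
            this.out = out;
        }

        @Override
        public void run() {
            for (int row = 0; row < rows; row++) {
                synchronized (FILE_LOCK) {
                    out.append(pageNum).append(',').append(row).append('\n');
                }
            }
        }
    }

    static String writeAll(int nThreads, int rowsPerThread) throws InterruptedException {
        StringBuilder out = new StringBuilder();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(new Writer(i, rowsPerThread, out));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join makes the writes visible to this thread
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeAll(10, 100).split("\n").length); // 1000
    }
}
```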
Your assertion would be valid if you removed the inner loop:
Thread[] threads = new Thread[100];
for (int j = 0; j < threads.length; j++) {
    //for(int i = 0; i<10; i++) {
    threads[j] = new Thread(new RunAmounts(j));
    threads[j].start();
    // }
}
for (Thread thread : threads) {
    try {
        if (thread != null)
            thread.join();
    } catch (InterruptedException ie) {
        ie.printStackTrace();
        return;
    }
}
If you read the manual, you will see that I'm telling you line 3 probably has a problem with the addition.
Whereas
/* thread 1 */
t1 = new Date().getTime();
/* thread 2 */
t2 = new Date().getTime();
if(t2 < t1){
System.out.println("You're wrong with your assumption");
}
This check would never fire if clock precision weren't a problem.
docs.oracle.com/javase/specs/jls/se7/html/jls-17.html
Use atomic operations in order to ensure memory barriers.
It doesn't matter; that's why I'm telling you.
Your assumption of timing is probably wrong:
t2 - t1 > 0
I have given some thought to timing, and anything involving more than one t variable sounds like madness. Your problem is a fundamental one of computation and synchronization.
I'd like you to refer to a blog post I've written:
http://sourceforge.net/p/ags/blog/2014/07/mathematical-properties-of-timing/
I do thread synchronization on my system, and it's really unreliable and experiences unexpected crashes.
So let's say I'm creating and starting a bunch of threads in a for loop, that is being executed in the run method of a launcher thread. Let's also say that I want to be able to interrupt the launcher thread and all threads that the thread has created, and I do this through a button.
So something like this -
try{
for(int i = 0; i < n;i++){
Worker currThread = new Worker(someArgs);
workerThreads.add(currThread);
currThread.start();
}
} catch (InterruptedException e){
e.printStackTrace();
}
The button handler:
public void actionPerformed(ActionEvent arg0) {
List<Worker> threads = launchThread.getWorkerThreads();
for(int i = 0; i < threads.size();i++){
threads.get(i).interrupt();
}
launchThread.interrupt();
}
Now, let's say I want to make sure the interrupts cannot happen at the same time as thread creation. I think one way to do this would be to create a dummy object and put both pieces of code inside a lock on it:
synchronized(dummyObject){
//thread creation or interruption code here (shown above)
}
Will this work? I ask because I'm not sure how to test whether it does.
Start the threads separately from creating them.
for(int i = 0; i < n; i++) {
Worker currThread = new Worker(someArgs);
workerThreads.add(currThread);
}
// later
for (Worker w : workerThreads) {
w.start();
}
If that's still not enough, your dummyObject synchronization should work just fine.
// You probably need to make this a (private final) field
Object lock = new Object();
// later
synchronized (lock) {
    for (int i = 0; i < n; i++) {
        Worker currThread = new Worker(someArgs);
        workerThreads.add(currThread);
        currThread.start();
    }
}
// later still
public void actionPerformed(ActionEvent arg0) {
synchronized (lock) {
// interruption code here
}
}
The concept of synchronization stays the same, however complicated the underlying operations are.
As you specified, there are two types of mutually exclusive tasks (thread creation and interruption). So locking is pretty much the canonical tool for the job.
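A self-contained sketch of the lock-based approach, under stated assumptions: Worker is replaced by a hypothetical worker thread that sleeps until interrupted, and the launcher code and the button handler become plain methods (launch and interruptAll, both hypothetical names). Because both methods synchronize on the same private final lock, interruptAll cannot run while launch is in the middle of creating and starting threads.

```java
import java.util.ArrayList;
import java.util.List;

class Launcher {

    private final Object lock = new Object();
    private final List<Thread> workerThreads = new ArrayList<>(); // guarded by lock

    // Hypothetical worker: sleeps until it is interrupted.
    private static Thread newWorker() {
        return new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted by interruptAll(); just exit
            }
        });
    }

    void launch(int n) {
        synchronized (lock) {
            for (int i = 0; i < n; i++) {
                Thread currThread = newWorker();
                workerThreads.add(currThread);
                currThread.start();
            }
        }
    }

    // What the button's actionPerformed would call.
    void interruptAll() {
        synchronized (lock) {
            for (Thread t : workerThreads) {
                t.interrupt();
            }
        }
    }

    List<Thread> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(workerThreads);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Launcher launcher = new Launcher();
        launcher.launch(5);
        launcher.interruptAll();
        for (Thread t : launcher.snapshot()) {
            t.join(); // returns quickly because every worker was interrupted
        }
        System.out.println("all workers stopped");
    }
}
```

If a worker is interrupted before it reaches sleep(), the interrupt flag is already set, so sleep() throws immediately and the worker still exits.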
I have read in a few posts that using JUnit to test concurrency is not ideal, but I have no choice for now. I have just encountered an exception that I can't explain.
I run a test where, in summary:
I submit 1000 runnables to an executor
each runnable adds an element to a list
I wait for the executor termination
JUnit tells me the list only has 999 elements
no exception is printed in the runnable catch block
What could cause that behavior?
Note: I only get the exception from time to time. The code has some non related stuff but I left it there in case I missed something. XXXQuery is an enum.
public void testConcurrent() throws InterruptedException {
final int N_THREADS = 1000;
final XXXData xxxData = new AbstractXXXDataImpl();
final List<QueryResult> results = new ArrayList<>();
ExecutorService executor = Executors.newFixedThreadPool(N_THREADS);
for (int i = 0; i < N_THREADS; i++) {
final int j = i;
executor.submit(new Runnable() {
@Override
public void run() {
try {
results.add(xxxData.get(XXXQuery.values()[j % XXXQuery.values().length]));
} catch (Exception e) {
System.out.println(e);
}
}
});
}
executor.shutdown();
executor.awaitTermination(10, TimeUnit.SECONDS);
assertEquals(N_THREADS, results.size());
}
You cannot add to the results ArrayList in your Runnable.run() method in multiple threads without synchronizing around it.
The assertion failure shows that although N_THREADS calls to add() were made, the ArrayList ended up with fewer entries because of race conditions between the threads.
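One way to make the test's list safe, assuming order doesn't matter, is to wrap it with Collections.synchronizedList so every add() takes a lock. The sketch replaces xxxData.get(...) with a hypothetical computation so it runs on its own, uses an arbitrary pool of 16 threads, and checks the boolean returned by awaitTermination, which the original test ignores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentAdd {

    static List<Integer> collect(int nTasks) throws InterruptedException {
        // Every add() on this wrapper is synchronized, so no updates are lost.
        final List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(16);
        for (int i = 0; i < nTasks; i++) {
            final int j = i;
            executor.submit(() -> {
                results.add(j * 2); // hypothetical stand-in for xxxData.get(...)
            });
        }
        executor.shutdown();
        if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collect(1000).size()); // 1000
    }
}
```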
I would use a final array instead of a list, so that each task writes to its own slot. Something like:
final QueryResult[] results = new QueryResult[N_THREADS];
for (int i = 0; i < N_THREADS; i++) {
...
public void run() {
    results[j] = xxxData.get(XXXQuery.values()[j % XXXQuery.values().length]);
}
Also, I don't quite get the XXXQuery.values() call, but I'd pull it into a variable above the loop unless it changes; values() returns a new copy of the array on every call.
I'm writing a Java app that writes a bunch of data to an Excel sheet, and it takes a while to do so.
I'd like to show progress by printing dots to the screen, like on Linux when you're installing something.
Is that possible in Java? Printing dots while another thread does the actual writing to Excel, and then, once that's finished, having the thread that displays the dots quit as well?
I'd like to print dots to console.
A variation on @John V.'s answer would be to use a ScheduledExecutorService:
// SETUP
Runnable notifier = new Runnable() {
public void run() {
System.out.print(".");
}
};
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
// IN YOUR WORK THREAD
scheduler.scheduleAtFixedRate(notifier, 1, 1, TimeUnit.SECONDS);
// DO YOUR WORK
scheduler.shutdownNow();
Modify the notifier object to suit your individual needs.
It's very possible. Use a newSingleThreadExecutor to print the dots while the other thread does the work. For example:
ExecutorService e = Executors.newSingleThreadExecutor();
Future<?> f = e.submit(new Runnable() {
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
                return; // cancel(true) interrupts the sleep; stop printing
            }
            System.out.print(".");
        }
    }
});
// do excel work
f.cancel(true);
e.shutdownNow();
Yes, it is possible. Have your working thread set a variable to indicate that it is working and when it has finished; declare the variable volatile so the other thread is guaranteed to see the change. Then create a thread, either by extending the Thread class or by implementing the Runnable interface. This thread should loop indefinitely, doing whatever printing you want on each pass and then checking the variable to see whether the work is done. When the value changes, break out of the loop and let the thread end.
One note: watch your processing speed. If CPU usage goes way up, call Thread.sleep() inside the loop; this thread should not be labour-intensive. (System.gc() is not a way to make a thread wait; it only requests a garbage collection.)
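A minimal sketch of that flag-based approach, assuming the Excel writing is passed in as a Runnable (the class DotPrinter and its run method are hypothetical names): the done flag is volatile so the printer sees the update, and the printer sleeps between dots instead of spinning. A do-while loop guarantees at least one dot.

```java
class DotPrinter {

    private volatile boolean done = false; // written by the worker, read by the printer
    private int dots = 0;                  // touched only by the printer; read after join()

    int run(Runnable work, long intervalMs) throws InterruptedException {
        Thread printer = new Thread(() -> {
            do {
                System.out.print(".");
                dots++;
                try {
                    Thread.sleep(intervalMs); // keeps the printer from burning CPU
                } catch (InterruptedException e) {
                    return;
                }
            } while (!done);
            System.out.println(" done");
        });
        printer.start();
        try {
            work.run();      // the long-running work, e.g. writing the Excel sheet
        } finally {
            done = true;     // tell the printer to stop
            printer.join();  // wait for it to print its final message
        }
        return dots;
    }

    public static void main(String[] args) throws InterruptedException {
        new DotPrinter().run(() -> {
            try {
                Thread.sleep(1000); // hypothetical stand-in for the Excel writing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200);
    }
}
```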
Not an elegant solution, but it gets the job done. It prints 0, 1, 2, 3, 0, 1... dots in a loop, erasing them each time, and terminates everything after 5 seconds.
public class Busy {
public Busy() {
Indicator i = new Indicator();
ExecutorService ex = Executors.newSingleThreadExecutor();
ex.submit(i);
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
e.printStackTrace();
}
i.finished = true;
ex.shutdown();
}
public static void main(String[] args) {
new Busy();
}
private class Indicator implements Runnable {
private static final int DOTS_NO = 3;
private volatile boolean finished = false;
@Override
public void run() {
for (int i = 0; !finished; i = (i + 1) % (DOTS_NO + 1)) {
for (int j = 0; j < i; j++) {
System.out.print('.');
}
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
for (int j = 0; j < i; j++) {
System.out.print("\b \b");
}
}
}
}
}