Hibernate out of Memory Heap error

I've got a Java application that, among other things, goes out to our Active Directory server every hour, and pulls down a list of all the accounts, and dumps them in a database; this work is done via a thread that's spawned new every hour, and the database interfacing is done via Hibernate. The thread's run method (essentially the only thing this thread does) looks like so:
public void run() {
try {
Thread.sleep(3600000); //we run once an hour, so we sleep for an hour
Thread newHourlyRunThread = new Thread(new HourlyRunThread());
newHourlyRunThread.start();
LDAPNewUsersReport report = new LDAPNewUsersReport();
Calendar calendar = Calendar.getInstance();
calendar.set(0, 0, 0, 0, 0); //We tell the report to look for everything from 12AM Jan 1 0 AD, which should be sufficient to find all created AD objects.
report.runReport(calendar.getTime(), new Date());
HashSet<LDAPEntry> allEntries = report.getAllEntries();
Iterator it = allEntries.iterator();
while (it.hasNext()) {
ContactParser.parseContact((LDAPEntry) it.next());
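}
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); //restore the interrupt flag and let the thread end
}
}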
The relevant methods from ContactParser are below:
public static void parseContact(LDAPEntry entry) {
Contact chosenContact = null;
Session session = HibernateUtil.getSessionFactory().getCurrentSession();
session.beginTransaction();
List contacts = session.getNamedQuery("ContactByCanonicalName").setString(0, entry.getDN()).list();
Iterator it = contacts.iterator();
if (it.hasNext()) {
chosenContact = (Contact) it.next();
chosenContact = ContactParser.fillContactFields(chosenContact, entry);
} else {
chosenContact = ContactParser.fillContactFields(new Contact(), entry);
}
session.saveOrUpdate(chosenContact);
session.getTransaction().commit();
}
private static Contact fillContactFields(Contact chosenContact, LDAPEntry entry) {
chosenContact.setCanonicalName(entry.getDN());
chosenContact.setFirstName(ContactParser.getEntryField(entry, "givenName"));
chosenContact.setLastName(ContactParser.getEntryField(entry, "sn"));
chosenContact.setUserName(ContactParser.getEntryField(entry, "sAMAccountname"));
chosenContact.setEmployeeID(ContactParser.getEntryField(entry, "employeeID"));
chosenContact.setMiddleName(ContactParser.getEntryField(entry, "initials"));
chosenContact.setEmail(ContactParser.getEntryField(entry, "mail"));
if(chosenContact.getFirstSeen() == null){
chosenContact.setFirstSeen(new Date());
}
chosenContact.setLastSeen(new Date());
return chosenContact;
}
private static String getEntryField(LDAPEntry entry, String fieldName){
String returnString = "";
if(entry.getAttribute(fieldName) != null){
returnString = entry.getAttribute(fieldName).getStringValue();
}
return returnString;
}
This all works very nicely if we're only running a single instance (so, no new threads are spawned after the fact), but if we run this thread more than once (i.e., I speed up execution to ~30 seconds so that I can see issues), Hibernate reports a lack of heap space. This doesn't seem like a particularly intense set of data (only about 6K entries), but I'm seeing that same error when we bump the code over to the staging area to prepare to push to production. I'm inexperienced when it comes to writing effective threads, and very inexperienced when it comes to Hibernate, so if anyone has an idea what might be exhausting our heap space (the other major thread in this application isn't running at the same time, and takes up a few hundred kilobytes of memory total) from looking at the code, I'd greatly appreciate any suggestions.
Thanks in advance.

You can rewrite this using a ScheduledExecutorService. I suspect part of the problem is that you are creating lots of HourlyRunThread objects when you only need one.
For example, this test illustrates how to schedule a task to run every second for 10 seconds:
@Test(expected = TimeoutException.class)
public void testScheduledExecutorService() throws InterruptedException, ExecutionException, TimeoutException {
final AtomicInteger id = new AtomicInteger();
final ScheduledExecutorService service = Executors.newScheduledThreadPool(1);
service.scheduleAtFixedRate(new Runnable() {
public void run() {
System.out.println("Thread" + id.incrementAndGet());
}
}, 1, 1, TimeUnit.SECONDS).get(10, TimeUnit.SECONDS);
}
This gives the output you'd expect when running, whereas this test creates almost 10k threads in its 10-second runtime:
private static final class HourlyRunThread extends Thread {
private static final AtomicInteger id = new AtomicInteger();
private final int seconds;
private HourlyRunThread(final int seconds) {
super("Thread" + id.incrementAndGet());
this.seconds = seconds;
}
public void run() {
try {
Thread.sleep(seconds);
if (seconds < 10) {
Thread newHourlyRunThread = new Thread(new HourlyRunThread(seconds));
newHourlyRunThread.start();
}
// do stuff
System.out.println(getName());
} catch (InterruptedException e) {
}
}
}
@Test
public void testThreading() {
final Thread t = new HourlyRunThread(1);
t.start();
}

It looks like you are doing batch insertions or updates, in which case you should periodically flush and clear the Hibernate Session so that the session-level cache does not outgrow the heap you have allocated.
See the chapter in the Hibernate manual about Batch Processing for advice on how to accomplish this.
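As a rough illustration of the flush-and-clear pattern described above, here is a minimal sketch. It does not depend on Hibernate: `UnitOfWork` is a hypothetical stand-in whose three methods mirror `save`, `flush` and `clear` on Hibernate's `Session`, and the batch size is an assumption you would tune to your own configuration.

```java
import java.util.List;

class BatchFlushSketch {
    /** Hypothetical stand-in for the Session methods used here (save, flush, clear). */
    interface UnitOfWork {
        void save(Object entity);
        void flush();
        void clear();
    }

    /**
     * Saves every entity, flushing and clearing after each full batch so the
     * session-level cache never holds more than batchSize entities.
     * Returns the number of flush/clear cycles performed.
     */
    static int saveAll(UnitOfWork session, List<?> entities, int batchSize) {
        int cycles = 0;
        int pending = 0;
        for (Object entity : entities) {
            session.save(entity);
            if (++pending == batchSize) {
                session.flush(); // push pending SQL to the database
                session.clear(); // drop cached entities so they can be collected
                pending = 0;
                cycles++;
            }
        }
        if (pending > 0) { // flush the final, partial batch
            session.flush();
            session.clear();
            cycles++;
        }
        return cycles;
    }
}
```

With a real Session this loop would run inside a single transaction rather than one transaction per entry.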
In addition I'd strongly suggest finding another way to launch your tasks on a scheduled timeframe, either using a ScheduledExecutorService as suggested by Jon Freedman or using a library such as Quartz Scheduler. Sleeping the thread for 3600000 milliseconds before launching the actual thread to do the work seems like it'd be a highly problematic (and non-deterministic) way to handle this.

Memory Analyzer is a free, open-source, powerful Java heap analyzer. I have already used it several times to identify the source of memory leaks. With this tool you will be able to quickly see if Hibernate is the one to punish ;-)

Thanks for the suggestions everyone, but as it turned out, the error we were receiving was actually caused by a configuration error between local testing and staging -- the database was new and the permissions weren't configured correctly to allow the staging area to speak to the created database. When run with the correct permissions, it works like a charm.
I will definitely look into setting up batch settings for Hibernate, and moving to a thread scheduler instead of my current hacked together system.

I was accidentally creating a new SessionFactory for each transaction. For some reason the GC was not able to clean up those old SessionFactory instances.
Always using the same SessionFactory instance solved my problems.

Related

Execution of Tasks in ExecutorService without Thread pauses

I have a thread pool with 8 threads
private static final ExecutorService SERVICE = Executors.newFixedThreadPool(8);
My mechanism emulating the work of 100 users (100 Tasks):
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) { // Number of users == 100
callableTasks.add(new Task(client));
}
SERVICE.invokeAll(callableTasks);
SERVICE.shutdown();
The user performs the Task of generating a document.
Get UUID of Task;
Get Task status every 10 seconds;
If Task is ready get document.
public class Task implements Callable<Boolean> {
private final ReportClient client;
public Task(ReportClient client) {
this.client = client;
}
@Override
public Boolean call() {
final var uuid = client.createDocument(documentId);
GetStatusResponse status = null;
do {
try {
Thread.sleep(10000); // This stop current thread, but not a Task!!!!
} catch (InterruptedException e) {
return Boolean.FALSE;
}
status = client.getStatus(uuid);
} while (Status.PENDING.equals(status.status()));
final var document = client.getReport(uuid);
return Boolean.TRUE;
}
}
I want to give the idle time (10 seconds) to another task. But when Thread.sleep(10000); is called, the current thread suspends its execution: the first 8 Tasks sleep while the other 92 Tasks wait in the queue. How can I have all 100 Tasks in progress at the same time?

The Answer by Yevgeniy looks correct, regarding Java today. You want to have your cake and eat it too, in that you want a thread to sleep before repeating a task but you also want that thread to do other work. That is not possible today, but may be in the future.
Project Loom
In current Java, a Java thread is mapped directly to a host OS thread. In all common OSes such as macOS, BSD, Linux, Windows, and such, when code executing in a host thread blocks (stops to wait for sleep, or storage I/O, or network I/O, etc.) the thread too blocks. The blocked thread suspends, and the host OS generally runs another thread on that otherwise unused core. But the crucial point is that the suspended thread performs no further work until your blocking call to sleep returns.
This picture may change in the not-so-distant future. Project Loom seeks to add virtual threads to the concurrency facilities in Java.
In this new technology, many Java virtual threads are mapped to each host OS thread. Juggling the many Java virtual threads is managed by the JVM rather than by the OS. When the JVM detects that a virtual thread's executing code is blocking, that virtual thread is "parked", set aside by the JVM, with another virtual thread swapped in for execution on that "real" host OS thread. When the parked thread's blocking call returns, it can be reassigned to a "real" host OS thread for further execution. Under Project Loom, the host OS threads are kept busy, never idled while any pending virtual thread has work to do.
This swapping between virtual threads is highly efficient, so that thousands, even millions, of threads can be running at a time on conventional computer hardware.
Using virtual threads, your code will indeed work as you had hoped: A blocking call in Java will not block the host OS thread. But virtual threads are experimental, still in development, scheduled as a preview feature in Java 19. Early-access builds of Java 19 with Loom technology included are available now for you to try. But for production deployment today, you'll need to follow the advice in the Answer by Yevgeniy.
Take my coverage here with a grain of salt, as I am not an expert on concurrency. You can hear it from the actual experts, in the articles, interviews, and presentations by members of the Project Loom team including Ron Pressler and Alan Bateman.

EDIT: I just posted this answer and realized that you seem to be using that code to emulate real user interactions with some system. I would strongly recommend just using a load testing utility for that, rather than trying to come up with your own. However, in that case just using a CachedThreadPool might do the trick, although probably not a very robust or scalable solution.
Thread.sleep() behavior here is working as intended: it suspends the thread to let the CPU execute other threads.
Note that in this state a thread can be interrupted for a number of reasons unrelated to your code, and in that case your Task returns false: I'm assuming you actually have some retry logic down the line.
So you want two mutually exclusive things: on the one hand, if the document isn't ready, the thread should be free to do something else, but should somehow return and check that document's status again in 10 seconds.
That means you have to choose:
You definitely need that once-every-10-seconds check for each document - in that case, maybe use a cachedThreadPool and have it generate as many threads as necessary, just keep in mind that you'll carry the overhead for numerous threads doing virtually nothing.
Or, you can first initiate that asynchronous document creation process and then only check for status in your callables, retrying as needed.
Something like:
public class Task implements Callable<Boolean> {
private final ReportClient client;
private final UUID uuid;
// all args constructor omitted for brevity
@Override
public Boolean call() {
GetStatusResponse status = client.getStatus(uuid);
if (!Status.PENDING.equals(status.status())) {
//no longer pending: the document can be fetched
final var document = client.getReport(uuid);
return Boolean.TRUE;
} else {
return Boolean.FALSE; //still pending, retry next time
}
}
}
}
List<Callable<Boolean>> callableTasks = new ArrayList<>();
for (int i = 0; i < 100; i++) {
var uuid = client.createDocument(documentId); //not sure where documentId comes from here in your code
callableTasks.add(new Task(client, uuid));
}
List<Future<Boolean>> results = SERVICE.invokeAll(callableTasks);
// retry logic until all results come back as `true` here
This assumes that createDocument is relatively efficient, but that stage can be parallelized just as well, you just need to use a separate list of Runnable tasks and invoke them using the executor service.
Note that we also assume that the document's status will indeed eventually change to something other than PENDING, and that might very well not be the case. You might want to have a timeout for retries.
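One way to combine the periodic status check with the retry timeout mentioned above, without parking a pool thread in sleep, is sketched below. `pollUntilReady` and its parameters are hypothetical names, and the `BooleanSupplier` stands in for a call such as "status is no longer PENDING"; treat it as a sketch under those assumptions rather than a drop-in replacement.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class PollingSketch {
    /**
     * Checks isReady every periodMs on the scheduler. The returned future completes
     * with true once isReady reports true, or with false once timeoutMs has elapsed.
     */
    static CompletableFuture<Boolean> pollUntilReady(ScheduledExecutorService scheduler,
                                                     BooleanSupplier isReady,
                                                     long periodMs,
                                                     long timeoutMs) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        ScheduledFuture<?> poller = scheduler.scheduleWithFixedDelay(() -> {
            try {
                if (isReady.getAsBoolean()) {
                    result.complete(true);
                } else if (System.nanoTime() - deadline >= 0) {
                    result.complete(false); // gave up: still not ready at the deadline
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e); // surface failures instead of polling forever
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        // Stop the periodic check as soon as the outcome is known
        result.whenComplete((ok, error) -> poller.cancel(false));
        return result;
    }
}
```

Between checks the scheduler's threads are free, so a small pool can watch many documents at once.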

In your case, it seems like you need to check whether a certain condition is met every x seconds. In fact, from your code the document generation seems asynchronous, and all the Task does after that is wait for the document generation to finish.
You could launch every document generation from your main thread and use a ScheduledThreadPoolExecutor to verify every x seconds whether the document generation has been completed. At that point, you retrieve the result and cancel the corresponding Task's scheduling.
Basically, one ConcurrentHashMap is shared among the main thread and the Tasks you've scheduled (mapRes), while the other, mapTasks, is just used locally within the main thread to keep track of the ScheduledFuture returned by every Task.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(8);
//ConcurrentHashMap shared among the submitted tasks, keyed by uuid: each Task sets its outcome to true as soon as the document has been produced
ConcurrentHashMap<String, Boolean> mapRes = new ConcurrentHashMap<>();
String uuid;
ScheduledFuture<?> schedFut;
//HashMap containing the ScheduledFuture returned by scheduling each Task, used to cancel its repetition as soon as the document has been produced
Map<String, ScheduledFuture<?>> mapTasks = new HashMap<>();
for (int i = 0; i < 100; i++) {
//Starting the document generation from the main thread (client and documentId come from your own code)
uuid = client.createDocument(documentId);
mapRes.put(uuid, false);
//Checking each Task every 10 seconds, with an initial delay of i*10 ms so that they do not all start at the same time
schedFut = pool.scheduleWithFixedDelay(new Task(client, uuid, mapRes), i * 10, 10000, TimeUnit.MILLISECONDS);
//Adding the ScheduledFuture to the map
mapTasks.put(uuid, schedFut);
}
//Keep checking the outcome of each task until all of them have been canceled due to completion
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (String key : mapTasks.keySet()) {
//Canceling the task's scheduling if:
// - Its result is positive (i.e. its verification is terminated)
// - The task hasn't been canceled already
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
//... eventually adding a sleep to check the completion every x seconds ...
}
pool.shutdown();
}
}
class Task implements Runnable {
private final ReportClient client;
private final String uuid;
private final ConcurrentHashMap<String, Boolean> mapRes;
public Task(ReportClient client, String uuid, ConcurrentHashMap<String, Boolean> mapRes) {
this.client = client;
this.uuid = uuid;
this.mapRes = mapRes;
}
@Override
public void run() {
//This is taken from your code and I'm assuming that if it's not pending then it's completed
if (!Status.PENDING.equals(client.getStatus(uuid).status())) {
mapRes.replace(uuid, true);
}
}
}
I've tested your case locally, by emulating a scenario where n Tasks wait for a folder with their same id to be created (or uuid in your case). I'll post it right here as a sample in case you'd like to try something simpler first.
public class Main {
public static void main(String[] args) {
ScheduledThreadPoolExecutor pool = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(2);
ConcurrentHashMap<Integer, Boolean> mapRes = new ConcurrentHashMap<>();
for (int i = 0; i < 16; i++) {
mapRes.put(i, false);
}
ScheduledFuture<?> schedFut;
Map<Integer, ScheduledFuture<?>> mapTasks = new HashMap<>();
for (int i = 0; i < 16; i++) {
schedFut = pool.scheduleWithFixedDelay(new MyTask(i, mapRes), i * 20, 3000, TimeUnit.MILLISECONDS);
mapTasks.put(i, schedFut);
}
while (!mapTasks.values().stream().allMatch(v -> v.isCancelled())) {
for (Integer key : mapTasks.keySet()) {
if (mapRes.get(key) && !mapTasks.get(key).isCancelled()) {
schedFut = mapTasks.get(key);
schedFut.cancel(true);
}
}
}
pool.shutdown();
}
}
class MyTask implements Runnable {
private final int num;
private final ConcurrentHashMap<Integer, Boolean> mapRes;
public MyTask(int num, ConcurrentHashMap<Integer, Boolean> mapRes) {
this.num = num;
this.mapRes = mapRes;
}
@Override
public void run() {
System.out.println("Task " + num + " is checking whether the folder exists: " + Files.exists(Path.of("./" + num)));
if (Files.exists(Path.of("./" + num))) {
mapRes.replace(num, true);
}
}
}

Starting threads in a scheduled task in Spring Web Application?

I'm using Spring Boot for a web server, and there is a scheduled task that I have to run every hour. It involves making thousands of http requests which I have stored in a list (retrieved and set from a different endpoint), which obviously will take long. To speed things up, inside the scheduled method I start up four threads to each handle a fourth of the http calls that I have to make. There is absolutely no risk of deadlock or race-conditions. It's rather simple: I have 1000 http requests to make every hour, thread one will handle the first 250, thread two will handle the next 250, etc.
@Component
public class MyComponent {
private List<URI> uris;
...
@Scheduled(fixedRate = 3600000)
public void process() {
List<List<URI>> uriList = //method that will divide up the uri's into equal fourths
uriList.forEach(uri -> new Thread(new URIProcessor(uri)).start());
}
}
Would this be an acceptable practice? I know Spring offers its own abstractions for multithreading but I feel such a simple task shouldn't require using them.

One important point to consider: you have configured this process method to run every hour, but nothing here checks what happened to the threads started by the previous run.
1) If the previous threads are still running because opening the URIs takes a long time, you will end up adding more threads every hour. So make sure you measure how long your threads take to complete.
2) If a thread gets stuck, say for a technical reason, what would you like to happen? This needs to be accounted for.
One solution is to set a global variable or indicator, say a file or a database entry, that tells a newly starting run whether the old one has completed; otherwise, find some way to be informed, for example by logging exceptions or sending yourself an email.
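For a single JVM, the "indicator" can be as small as an in-memory flag. The sketch below uses an AtomicBoolean; the class and method names (`OverlapGuardSketch`, `runIfIdle`) are hypothetical, and it assumes the job passed in blocks until all of its work has finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class OverlapGuardSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /**
     * Runs the job only if no previous run is still in progress.
     * Returns true if the job ran, false if it was skipped.
     */
    boolean runIfIdle(Runnable job) {
        if (!running.compareAndSet(false, true)) {
            return false; // previous run still active: skip, log, or alert here
        }
        try {
            job.run();
            return true;
        } finally {
            running.set(false); // always release the flag, even if the job throws
        }
    }
}
```

If the scheduled method only starts threads and returns immediately, the flag would be released too early; the job needs to wait for its workers (for example via ExecutorService.invokeAll) before returning.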

You may think of something like this to handle start, stop and restarts
@Component
@Scope("prototype")
public class AutoTimerService {
private ScheduledExecutorService scheduledThreadPool = null;
private Runnable autoTask = null;
private Long currentDelayIntervalInMs;
private boolean isTaskRunning = false;
public AutoTimerService(String name, Long delayIntervalInMs, Runnable autoTask){
if (name == null || name.isEmpty()){
throw new RuntimeException("Please specify a friendly name to the timer service");
}
if (autoTask == null){
throw new RuntimeException("Please specify the task to be scheduled, of type java.lang.Runnable");
}
this.autoTask = autoTask;
this.currentDelayIntervalInMs = delayIntervalInMs;
}
public synchronized void startTask() {
if (!isTaskRunning) {
scheduledThreadPool = Executors.newScheduledThreadPool(1);
scheduledThreadPool.scheduleWithFixedDelay(autoTask, 0, currentDelayIntervalInMs, TimeUnit.MILLISECONDS);
isTaskRunning = true;
}
}
public synchronized void resetTask(Long delayIntervalInMs) {
stopTask();
this.currentDelayIntervalInMs = delayIntervalInMs;
startTask();
}
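public synchronized void stopTask() {
if (isTaskRunning) {
scheduledThreadPool.shutdown();
try {
//Block until running tasks finish instead of spinning on isTerminated()
scheduledThreadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
isTaskRunning = false;
}
}
}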

Why cannot `ExecutorService` consistently schedule threads?

I am attempting to reimplement my concurrent code using CyclicBarrier, which is new to me. I can do without it, but I am time trialling it against my other solution. The problem I have is a deadlock situation with the following code:
//instance variables (fully initialised elsewhere).
private final ExecutorService exec = Executors.newFixedThreadPool(4);
private ArrayList<IListener> listeners = new ArrayList<IListener>();
private int[] playerIds;
private class WorldUpdater {
final CyclicBarrier barrier1;
final CyclicBarrier barrier2;
volatile boolean anyChange;
List<Callable<Void>> calls = new ArrayList<Callable<Void>>();
class SyncedCallable implements Callable<Void> {
final IListener listener;
private SyncedCallable(IListener listener) {
this.listener = listener;
}
@Override
public Void call() throws Exception {
listener.startUpdate();
if (barrier1.await() == 0) {
anyChange = processCommons();
}
barrier2.await();
listener.endUpdate(anyChange);
return null;
}
}
public WorldUpdater(ArrayList<IListener> listeners, int[] playerIds) {
barrier2 = new CyclicBarrier(listeners.size());
barrier1 = new CyclicBarrier(listeners.size());
for (int i : playerIds)
calls.add(new SyncedCallable(listeners.get(i)));
}
void start(){
try {
exec.invokeAll(calls);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
void someMethodCalledEveryFrame() {
//Calls some Fisher-something method that shuffles int[]
shufflePIDs();
WorldUpdater updater = new WorldUpdater(listeners, playerIds);
updater.start();
}
I use the debugger in Android Studio (IntelliJ) to pause execution at this stage. Multiple threads show my await calls as the last of my code to be executed:
->Unsafe.park
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->CyclicBarrier.doWait
->CyclicBarrier.await
At least one thread will have this stack:
->Unsafe.park
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->LinkedBlockingQueue.take
->ThreadPoolExecutor.getTask
->ThreadPoolExecutor.runWorker
->ThreadPoolExecutor$Worker.run
->Thread.run
I notice that the CyclicBarrier plays no part in these latter stray threads.
processCommons is calling exec.invokeAll (on the 3 listeners), I suppose this means I am running out of threads. But many times this doesn't happen so please could someone clarify why ExecutorService cannot consistently schedule my threads? They have their own stack and program counter so I would have thought this to not be a problem. I only ever have max 4 running at once. Someone help me with the math?

What is the value of listeners.size() when your WorldUpdater is created? If it is more than four, then your threads will never get past the barrier.
Your ExecutorService has exactly four threads. No more, no fewer. The callers of barrier1.await() and barrier2.await() will not get past the barrier until exactly listeners.size() threads are waiting.
My gut reaction is, it would be a mistake for pool threads to use a CyclicBarrier. CyclicBarrier is only useful when you know exactly how many threads will be using it. But, when you're using a thread pool, you often do not know the size of the pool. In fact, in a real-world (i.e., commercial) application, if you're using a thread pool, it probably was not created by your code at all. It probably was created somewhere else, and passed in to your code as an injected dependency.
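The effect of a barrier that expects more parties than the pool has threads can be reproduced in a few lines. This is only a sketch: `failedParties` is a hypothetical helper, and it uses the timed `await` so the demonstration ends with an exception instead of hanging forever as the untimed call would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BarrierStarvationSketch {
    /** Submits `parties` tasks that all wait on one barrier; returns how many failed to pass it. */
    static int failedParties(int poolSize, int parties, long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CyclicBarrier barrier = new CyclicBarrier(parties);
        List<Callable<Boolean>> calls = new ArrayList<>();
        for (int i = 0; i < parties; i++) {
            calls.add(() -> {
                try {
                    barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                    return true;  // every party arrived in time
                } catch (TimeoutException | BrokenBarrierException e) {
                    return false; // the barrier never filled up
                }
            });
        }
        int failed = 0;
        try {
            for (Future<Boolean> outcome : pool.invokeAll(calls)) {
                try {
                    if (!outcome.get()) {
                        failed++;
                    }
                } catch (ExecutionException e) {
                    failed++;
                }
            }
        } finally {
            pool.shutdownNow();
        }
        return failed;
    }
}
```

With a pool of 2 and 3 parties, the third task stays queued while the first two occupy both threads, so nobody gets past the barrier; with a pool of 3, all three pass.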

I did a little experiment and came up with:
@Override
public Void call() throws Exception {
System.out.println("startUpdate, Thread:" + Thread.currentThread());
listener.startUpdate();
if (barrier1.await() == 0) {
System.out.println("processCommons, Thread:" + Thread.currentThread());
anyChange = processCommons();
}
barrier2.await();
System.out.println("endUpdate, Thread:" + Thread.currentThread());
listener.endUpdate(anyChange);
return null;
}
Which revealed when using a pool of 3 with 3 listeners, I will always hang in processCommons which contains the following:
List<Callable<Void>> calls = new ArrayList<Callable<Void>>();
for (IListener listener : listeners)
calls.add(new CommonsCallable(listener));
try {
exec.invokeAll(calls);
} catch (InterruptedException e) {
e.printStackTrace();
}
With 2 threads waiting at the barrier and the third attempting to create 3 more. I needed one extra thread in the ExecutorService and the 2 at the barrier could be "recycled" as I was asking in my question. I've got references to 6 threads at this stage when exec is only holding 4. This can run happily for many minutes.
private final ExecutorService exec = Executors.newFixedThreadPool(8);
Should be better, but it was not.
Finally I did breakpoint stepping in IntelliJ (thanks ideaC!)
The problem is
if (barrier1.await() == 0) {
anyChange = processCommons();
}
barrier2.await();
Between the 2 await calls you may get several suspended threads that haven't actually reached the await. In the case of 3 listeners out of a pool of 4 it only takes one to get "unscheduled" (or whatever) and barrier2 will never get the full complement. But what about when I have a pool of 8? The same behaviour manifests, with all but two of the threads showing this stack of limbo:
->Unsafe.park
->LockSupport.park
->AbstractQueuedSynchronizer$ConditionObject.await
->LinkedBlockingQueue.take
->ThreadPoolExecutor.getTask
->ThreadPoolExecutor.runWorker
->ThreadPoolExecutor$Worker.run
->Thread.run
What can be happening here to disable all 5 threads? I should have taken James Large's advice and avoided crowbarring in this over-elaborate CyclicBarrier.
--UPDATE-- It can run all night now without CyclicBarrier.

How to calculate run-time for a multi-threaded program?

I am trying to test the performance (in terms of execution time) for my webcrawler but I am having trouble timing it due to multi-threading taking place.
My main class:
class WebCrawlerTest {
//methods and variables etc
WebCrawlerTest(List<String> websites){
//
}
if(!started){
startTime = System.currentTimeMillis();
executor = Executors.newFixedThreadPool(32); //this is the value I'm tweaking
started=true;
}
for(String site : websites){
executor.submit(webProcessor = new AllWebsiteProcessorTest(site, deepSearch));
}
executor.shutdown();
//tried grabbing end time here with no luck
AllWebsiteProcessorTest class:
class AllWebsiteProcessorTest implements Runnable{
//methods and var etc
AllWebsiteProcessorTest(String site, boolean deepSearch) {
}
public void run() {
scanSingleWebsite(websites);
for(String email:emails){
System.out.print(email + ", ");
}
private void scanSingleWebsite(String website){
try {
String url = website;
Document document = Jsoup.connect(url).get();
grabEmails(document.toString());
}catch (Exception e) {}
With another class (with a main method), I create an instance of WebCrawlerTest and then pass in an array of websites. The crawler works fine but I can't seem to figure out how to time it.
I can get the start time (System.getCurrentTime...();), but the problem is the end time. I've tried adding the end time like this:
//another class
public static void main(.....){
long start = getCurrent....();
WebCrawlerTest w = new WebCrawlerTest(listOfSites, true);
long end = getCurrent....();
}
Which doesn't work. I also tried adding the end after executor.shutdown(), which again doesn't work (instantly triggered). How do I grab the time for the final completed thread?

After shutting down your executor pool
executor.shutdown();
//tried grabbing end time here with no luck
You can simply
executor.awaitTermination(timeout, unit)
This call blocks until all tasks have completed or the timeout elapses, whichever comes first. Take the time, subtract T0 from it and voila, we have the execution time.
The shutdown() method just ensures that no new tasks will be accepted into the execution queue. Tasks already in the queue will still be performed (shutdownNow() drops pending tasks). To wait for all currently running tasks to complete, you have to call awaitTermination().
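Put together, the timing could look roughly like the sketch below. The sleeping tasks are placeholders for the real crawl work, and `timeTasksMillis` plus the one-minute timeout are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TimingSketch {
    /** Runs `tasks` placeholder jobs on `threads` threads and returns the wall-clock time in ms. */
    static long timeTasksMillis(int threads, int tasks, long sleepMs) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            executor.submit(() -> {
                try {
                    Thread.sleep(sleepMs); // placeholder for scanning one website
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown(); // stop accepting new tasks; queued ones still run
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish within the timeout");
        }
        // Only now have all tasks completed, so this is the true end time
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Checking the boolean returned by awaitTermination matters: false means the timeout expired with tasks still running, in which case the measured time would be meaningless.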

JUnit: test that the correct number of threads has started

So I have a method that starts five threads. I want to write a unit test just to check that the five threads have been started. How do I do that? Sample code is much appreciated.

Instead of writing your own method to start threads, why not use an Executor, which can be injected into your class? Then you can easily test it by passing in a dummy Executor.
Edit: Here's a simple example of how your code could be structured:
public class ResultCalculator {
private final ExecutorService pool;
private final List<Future<Integer>> pendingResults;
public ResultCalculator(ExecutorService pool) {
this.pool = pool;
this.pendingResults = new ArrayList<Future<Integer>>();
}
public void startComputation() {
for (int i = 0; i < 5; i++) {
Future<Integer> future = pool.submit(new Robot(i));
pendingResults.add(future);
}
}
public int getFinalResult() throws ExecutionException, InterruptedException {
int total = 0;
for (Future<Integer> robotResult : pendingResults) {
total += robotResult.get();
}
return total;
}
}
public class Robot implements Callable<Integer> {
private final int input;
public Robot(int input) {
this.input = input;
}
@Override
public Integer call() throws Exception {
// Some very long calculation
Thread.sleep(10000);
return input * input;
}
}
And here's how you'd call it from your main():
public static void main(String[] args) throws Exception {
// Note that the number of threads is now specified here
ExecutorService pool = Executors.newFixedThreadPool(5);
ResultCalculator calc = new ResultCalculator(pool);
try {
calc.startComputation();
// Maybe do something while we're waiting
System.out.printf("Result is: %d\n", calc.getFinalResult());
} finally {
pool.shutdownNow();
}
}
And here's how you'd test it (assuming JUnit 4 and Mockito):
@Test
@SuppressWarnings("unchecked")
public void testStartComputationAddsRobotsToQueue() {
ExecutorService pool = mock(ExecutorService.class);
Future<Integer> future = mock(Future.class);
when(pool.submit(any(Callable.class))).thenReturn(future);
ResultCalculator calc = new ResultCalculator(pool);
calc.startComputation();
verify(pool, times(5)).submit(any(Callable.class));
}
Note that all this code is just a sketch which I have not tested or even tried to compile yet. But it should give you an idea of how the code can be structured.

Rather than saying you are going to "test the five threads have been started", it would be better to step back and think about what the five threads are actually supposed to do. Then test to make sure that that "something" is actually being done.
If you really just want to test that the threads have been started, there are a few things you could do. Are you keeping references to the threads somewhere? If so, you could retrieve the references, count them, and call isAlive() on each one (checking that it returns true).
I believe there is some method on some Java platform class which you can call to find how many threads are running, or to find all the threads which are running in a ThreadGroup, but you would have to search to find out what it is.
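If you do keep references to the threads, the isAlive() check could be sketched as follows. The latches are there only to make the check deterministic: each worker signals that it has started and then stays alive until released. The helper names are hypothetical. For a rougher count without references, Thread.activeCount() gives an estimate for the current thread group, and Thread.getAllStackTraces() returns a map keyed by every live thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class ThreadCountSketch {
    /** Starts n workers that signal `started` and then block until `release` is opened. */
    static List<Thread> startWorkers(int n, CountDownLatch started, CountDownLatch release) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread worker = new Thread(() -> {
                started.countDown(); // tell the test this thread is running
                try {
                    release.await(); // stay alive until the test has counted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            worker.start();
            threads.add(worker);
        }
        return threads;
    }

    /** Counts how many of the given threads are currently alive. */
    static long countAlive(List<Thread> threads) {
        return threads.stream().filter(Thread::isAlive).count();
    }
}
```

A test would wait on the `started` latch, assert that countAlive returns five, then open `release` and join the workers.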
More thoughts in response to your comment
If your code is as simple as new Thread(runnable).start(), I wouldn't bother to test that the threads are actually starting. If you do so, you're basically just testing that the Java platform works (it does). If your code for initializing and starting the threads is more complicated, I would stub out the thread.start() part and make sure that the stub is called the desired number of times, with the correct arguments, etc.
Regardless of what you do about that, I would definitely test that the task is completed correctly when running in multithreaded mode. From personal experience, I can tell you that as soon as you start doing anything remotely complicated with threads, it is devilishly easy to get subtle bugs which only show up under certain conditions, and perhaps only occasionally. Dealing with the complexity of multithreaded code is a very slippery slope.
Because of that, if you can do it, I would highly recommend you do more than just simple unit testing. Do stress tests where you run your task with many threads, on a multicore machine, on very large data sets, and make sure all the answers are exactly as expected.
Also, although you are expecting a performance increase from using threads, I highly recommend that you benchmark your program with varying numbers of threads, to make sure that the desired performance increase is actually achieved. Depending on how your system is designed, it's possible to wind up with concurrency bottlenecks which may make your program hardly faster with threads than without. In some cases, it can even be slower!
