How do I reduce/change delay after crawling? - java

Does anybody have experience with Crawler4j?
I followed the example from the project page to implement my own crawler. The crawler works fine and crawls very fast. The only problem is that there is always a delay of 20–30 seconds after crawling finishes. Is there a way to avoid this waiting time?

I just checked the crawler4j source code. The CrawlController.start method has a lot of fixed 10-second "pauses" to make sure that the threads are done and ready to be cleaned up:
// Make sure again that none of the threads are alive.
logger.info("It looks like no thread is working, waiting for 10 seconds to make sure...");
sleep(10);
// ... more code ...
logger.info("No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...");
sleep(10);
// ... more code ...
logger.info("Waiting for 10 seconds before final clean up...");
sleep(10);
Also, the main loop checks only every 10 seconds whether the crawling threads are done:
while (true) {
    sleep(10);
    // code to check if some thread is still working
}
The sleep helper itself is just a wrapper around Thread.sleep:
protected void sleep(int seconds) {
    try {
        Thread.sleep(seconds * 1000);
    } catch (Exception ignored) {
    }
}
So it may be worth fine-tuning those calls and reducing the sleeping time.
A better solution, if you can spare some time, would be to rewrite this method. I would replace the List<Thread> threads with an ExecutorService; its awaitTermination method would be particularly handy. Unlike Thread.sleep, awaitTermination(10, TimeUnit.SECONDS), called after shutdown(), returns as soon as all tasks are done instead of always waiting the full 10 seconds.
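As a rough sketch of that idea (this is not crawler4j's actual code; the class name PoolShutdownSketch, the worker count, and the simulated task are assumptions made for illustration), the fixed sleeps could be replaced by a single bounded wait:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolShutdownSketch {

    /**
     * Runs the given number of simulated crawl tasks and returns how many
     * milliseconds passed between shutdown() and the pool terminating.
     */
    static long runAndAwait(int workers, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(taskMillis); // stands in for real crawling work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; submitted ones still run
        long start = System.nanoTime();
        try {
            // Returns as soon as every task has finished, or after 10 s at the latest.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // give up and interrupt the stragglers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("Waited " + runAndAwait(4, 200) + " ms for termination");
    }
}
```

With tasks that take 200 ms, the wait ends shortly after 200 ms rather than after a fixed 10 seconds.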

Related

Thread.sleep() / robot.delay() more accurate?

I want to write a bot for an online game using the Robot class. My problem is that Thread.sleep() and robot.delay() are too inaccurate. Outside the game they work perfectly fine, with a deviation of only about 2–3 ms. But when the game is in focus, the methods deviate by +5 to +20 ms or even more. Sadly, that is enough to make my bot unusable. Is there any way to make these methods more accurate? Or are there other ways to solve this problem?
There is no difference between the two.
If you browse the source for the JDK, Robot.delay() ends up calling Thread.sleep():
public void delay(int ms) {
    checkDelayArgument(ms);
    Thread thread = Thread.currentThread();
    if (!thread.isInterrupted()) {
        try {
            Thread.sleep(ms);
        } catch (final InterruptedException ignored) {
            thread.interrupt(); // Preserve interrupt status
        }
    }
}
You might be able to give the Java process a higher priority than the game, so that its threads are scheduled sooner once the sleep has expired.
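A different technique that is sometimes used when sleep overshoots is to sleep for most of the interval and busy-wait for the rest, measuring against System.nanoTime(). The following is only a sketch under assumptions: the class name PreciseDelay and the 2 ms margin SPIN_THRESHOLD_NANOS are invented for illustration and are not part of Robot or the JDK.

```java
public class PreciseDelay {

    // Assumed margin: sleep until this close to the deadline, then spin.
    private static final long SPIN_THRESHOLD_NANOS = 2_000_000L; // 2 ms

    /**
     * Blocks for at least the given number of nanoseconds,
     * unless the thread is interrupted, in which case it returns early.
     */
    static void delayNanos(long nanos) {
        final long deadline = System.nanoTime() + nanos;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            if (remaining > SPIN_THRESHOLD_NANOS) {
                try {
                    // Coarse part: hand the CPU back to the scheduler.
                    Thread.sleep((remaining - SPIN_THRESHOLD_NANOS) / 1_000_000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return;
                }
            }
            // Fine part: when less than the margin remains, the loop simply
            // re-checks the clock until the deadline has passed (busy-wait).
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        delayNanos(50_000_000L); // 50 ms
        System.out.printf("Elapsed: %.3f ms%n", (System.nanoTime() - start) / 1e6);
    }
}
```

The trade-off is CPU usage: the spinning part keeps one core busy, and the margin has to be larger than the typical oversleep to help, so with overshoots of 5 to 20 ms the margin would need to be raised accordingly.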

Why does ForkJoinPool.commonPool().execute(runnable) take more time to run the thread

I am using ForkJoinPool.commonPool().execute(runnable) as a handy way to spawn a thread in many places across my application. But at one particular invocation, it takes a long time (more than 10 seconds) before the code in the runnable starts running in a thread. What could be the reason for that, and how can I avoid it?
EDIT: As per @ben's answer, avoiding long-running tasks in the thread pool seems to be the solution. Creating a new thread manually instead of using the common ForkJoinPool solved my problem.
So after some quick testing I found the issue. Look at the following example code:
List<Runnable> runnables = new ArrayList<>();
for (int i = 0; i < 20; ++i)
{
    runnables.add(() -> {
        System.out.println("Runnable start");
        try
        {
            Thread.sleep(10000);
        }
        catch (InterruptedException e)
        {
        }
        System.out.println("Runnable end");
    });
}
for (Runnable run : runnables)
{
    //ForkJoinPool.commonPool().execute(run);
    //new Thread(run).start();
}
Uncomment one of the two lines.
We create a number of runnables that print a message, sit idle for 10 seconds, and print a message again. Quite simple.
When each Runnable gets its own Thread, all of them print Runnable start, 10 seconds pass, and then all of them print Runnable end.
When using the commonPool(), only some of them print Runnable start; 10 seconds pass, they print Runnable end, and then the next batch prints Runnable start, and so on until all of them are finished.
This is simply because the number of cores on your system determines how many threads the pool holds (by default, the common pool's parallelism is one less than the number of available processors). When all of them are busy, new tasks are not executed until a thread is freed up.
So the moral of the story: only use a thread pool when you know how it works internally and that is the behavior you want.
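One possible alternative to creating raw threads, sketched here under assumptions (the class name BlockingTasksSketch and the task durations are made up for illustration), is to give blocking tasks their own executor that grows with demand, and to leave the common pool to short CPU-bound work:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class BlockingTasksSketch {

    /** Runs `count` blocking tasks on a cached pool; returns the total wall time in ms. */
    static long runBlockingTasks(int count, long sleepMillis) {
        // A cached pool creates a new thread whenever all existing ones are busy.
        ExecutorService pool = Executors.newCachedThreadPool();
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(sleepMillis); // stands in for blocking work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("20 blocking tasks took "
                + runBlockingTasks(20, 500) + " ms on a cached pool");
    }
}
```

Because every blocking task gets a thread immediately, the tasks overlap instead of queueing behind each other. An unbounded pool can create many threads under load, so a bounded pool sized for the expected number of concurrent blocking tasks may be the safer choice.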

How to terminate a specific code block after a certain time, even in a loop

For example, I have a code block that takes more than 30 seconds to execute, and I want to stop it once it has run for 30 seconds. I am trying executor.shutdown(), executor.awaitTermination(30, TimeUnit.SECONDS) and executor.shutdownNow(), but I cannot understand where I have to put the code block that I want to terminate after a specific time. Please give a complete example.
It is pretty simple: when using threads, there is no reliable way to kill a thread (see here for example).
The only choice is to start another JVM in another process, because a process is something you can actually kill. See here for details.
Of course, this is rarely the way to go. A better way is to implement your long-running task so that it regularly checks for a "cancel" request, for example.
To use the methods you mentioned, add your task to the executor and call executor.shutdown() on the next line. This stops the executor from accepting further tasks. Then call executor.awaitTermination(30, TimeUnit.SECONDS), which acts as the "timer": it waits up to that long for the task to complete.
A simple sample code snippet:
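ExecutorService taskExecutor = Executors.newFixedThreadPool(1);
taskExecutor.execute(new MyTask());
taskExecutor.shutdown(); // no new tasks are accepted; MyTask keeps running
try {
    // Wait at most 30 seconds for MyTask to finish.
    if (!taskExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
        // Timed out: interrupt the task. It only stops if it responds to interruption.
        taskExecutor.shutdownNow();
    }
} catch (InterruptedException e) {
    ...
}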
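Putting the two answers together, a time-limited block might look like the following sketch. The names TimeLimitedBlock, runWithTimeout, and longLoop are hypothetical, and the approach only works because the block itself checks the interrupt flag, as recommended above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeLimitedBlock {

    /** Runs the task; returns true if it finished in time, false if it was cancelled. */
    static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit); // returns early if the task completes
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // sets the interrupt flag of the worker thread
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // the task itself failed
        } finally {
            executor.shutdownNow();
        }
    }

    /** Hypothetical long-running block: loops until it is interrupted. */
    static void longLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            // one small unit of work per iteration
        }
    }

    public static void main(String[] args) {
        // 2 seconds keeps the demo short; the question's limit would be 30.
        boolean finished = runWithTimeout(TimeLimitedBlock::longLoop, 2, TimeUnit.SECONDS);
        System.out.println(finished ? "Finished in time" : "Cancelled after timeout");
    }
}
```

The code block to be limited goes into the Runnable passed to runWithTimeout. A block that never checks for interruption and never calls an interruptible method would keep running even after cancel(true).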

Break out of a recursion in Java when the time runs out

I'm implementing an AI for a chess-like game. I intend to use recursion to try all the possible states of the board and pick the "best move".
Because of the time limit per move, I need some mechanism to break out of the recursive procedure whenever the time limit is reached. Of course I can keep checking the time before making a recursive call and break out if the current time is near the limit, but that is a trade-off with the performance of my program.
It would be great if there were a way to break out of the recursive procedure whenever a timer ends. However, since I'm new to Java, I don't know if there is any way to do so. Can you give some example code? :)
Checking the time, e.g. with System.currentTimeMillis(), costs about 200 ns per call (the exact cost depends on the platform). However, if this is too much for you, you can have another thread set a flag to stop.
There is a mechanism to do this already.
ExecutorService es = Executors.newSingleThreadExecutor();
Future<?> f = es.submit(new Runnable() {
    @Override
    public void run() {
        long start = System.nanoTime();
        while (!Thread.interrupted()) {
            // busy wait.
        }
        long time = System.nanoTime() - start;
        System.out.printf("Finished task after %,d ns%n", time);
    }
});
try {
    f.get(1, TimeUnit.SECONDS); // returns early if the task completes.
} catch (TimeoutException e) {
    f.cancel(true); // interrupts the task, which ends the loop above
}
es.shutdown();
prints
Finished task after 1,000,653,574 ns
Note: you don't need to start/stop the ExecutorService every time.
I don't think there is any nice way of doing this that doesn't involve checking if you can continue.
Even if you did check the time, what happens if you have 8 milliseconds remaining? Can you guarantee that your recursive call will finish in that time? Do you check the time after every little step (this may add a lot of extra overhead)?
One way is to have your execution (recursion) logic running in one thread and a timer in another thread. When the timer completes, it invokes interrupt() on your execution thread. In your worker thread, every time you complete a recursion, you save the state that you need. Then, if it gets interrupted, you return the last saved state.
That's just a brief description of one way to do it, and by no means the best way.
You can use a boolean flag to signal when the AI task has to stop.
Create a thread that runs the AI task and checks a boolean variable before each recursive call. Checking a boolean variable is cheaper than calling a method to get the time. Have the parent thread sleep for the time limit; after it wakes up, it sets the boolean flag to stop the child thread. Declare the flag volatile so that the worker thread is guaranteed to see the change.
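A minimal sketch of the flag approach follows. The class TimedSearchSketch and its search method are hypothetical stand-ins for a real game-tree search (it only counts visited nodes), and a scheduled task is used here to set the flag instead of a sleeping parent thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimedSearchSketch {

    // volatile so the searching thread sees the write made by the timer thread
    private volatile boolean stop;

    /** Stand-in for a recursive game-tree search: counts the nodes it visits. */
    long search(int depth, int branching) {
        if (stop || depth == 0) {
            return 1;
        }
        long visited = 1;
        for (int i = 0; i < branching && !stop; i++) {
            visited += search(depth - 1, branching);
        }
        return visited;
    }

    /** Searches until finished or until the time limit passes; returns nodes visited. */
    static long searchWithLimit(int depth, int branching, long limitMillis) {
        TimedSearchSketch searcher = new TimedSearchSketch();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.schedule(() -> { searcher.stop = true; }, limitMillis, TimeUnit.MILLISECONDS);
        try {
            return searcher.search(depth, branching);
        } finally {
            timer.shutdownNow(); // cancel the timer if the search finished first
        }
    }

    public static void main(String[] args) {
        // Far too large to finish; the flag cuts it off after about 500 ms.
        long visited = searchWithLimit(40, 3, 500);
        System.out.println("Visited " + visited + " nodes before stopping");
    }
}
```

A real search would also have to keep the best move found so far, since the result of a subtree that was cut off is incomplete.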

question related with java

Can you please suggest how to use an until command in Java? I have to call System.exit(0); 3 seconds after the current system time, so I am thinking of doing it like this:
long time = System.currentTimeMillis();
until (System.currentTimeMillis() < (time + 3000))
{
    System.exit(0);
}
But it reports an error.
Java does not have an until command; use a while loop or a do-while loop instead.
Note: Thread.sleep(3000); would be a better way to sleep for three seconds.
I might have misunderstood your requirement but if you just want to wait for 3 seconds then call System.exit(0), you can just use:
Thread.sleep(3000);
System.exit(0);
I apologise if I have misunderstood your question.
If you're trying to wait for a specific period of time, constantly polling the elapsed system time is not the way to go. Instead, you can ask the thread scheduler to pause execution of the current thread and wake it up when the time is up. This allows other threads in a multithreaded environment to get things done while you wait.
try {
    Thread.sleep(3000);
} catch (InterruptedException ex) {
    // Insert appropriate exception handling here...
}
Thread.sleep puts the currently executing thread into a timed waiting state for 3000 ms, during which it uses no CPU time. The thread becomes runnable again once at least 3 seconds have passed; when it actually resumes running is up to the scheduler.
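If the waiting thread should stay free to do other work, another option is to schedule the action instead of sleeping. This is only a sketch: the class DelayedAction and the helper runAfter are names invented for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedAction {

    /** Runs the action once after the given delay without blocking the caller. */
    static ScheduledExecutorService runAfter(Runnable action, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.schedule(action, delayMillis, TimeUnit.MILLISECONDS);
        // By default, a delayed task that is already scheduled still runs after shutdown().
        scheduler.shutdown();
        return scheduler;
    }

    public static void main(String[] args) {
        runAfter(() -> System.exit(0), 3000);
        System.out.println("Exit scheduled; the main thread is free to continue");
    }
}
```

For the simple case in the question, where nothing else has to happen during the 3 seconds, Thread.sleep(3000) followed by System.exit(0) remains the shorter solution.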
