Simultaneous downloading of webpages/files in EJB (Java)

I have a small problem with creating threads in EJB. I understand why I cannot use them in EJB, but I don't know how to replace them with something that gives the same functionality. I am trying to download 30-40 webpages/files, and I need to start all the downloads at (approximately) the same time. This is needed because if I run them one after another in a single thread, it takes more than 3 minutes.
I tried the @Asynchronous annotation, but nothing happened.
public void execute(String lang2, String lang1, int number) {
    Stopwatch timer = new Stopwatch().start();
    htmlCodes.add(URL2String(URLs.get(number)));
    timer.stop();
    System.out.println(number + ":" + Thread.currentThread().getName() + " " + timer.elapsedMillis() + " milliseconds");
}
private void findMatches(String searchedWord, String lang1, String lang2) {
    articles = search(searchedWord);
    for (int i = 0; i < articles.size(); i++) {
        execute(lang1, lang2, i);
    }
}

Here are two really good SO answers that can help. This one gives you your options, and this one explains why you shouldn't spawn threads in an EJB. The problem with the first answer is that it doesn't cover the EJB 3.0 options in much depth. So, here's a tutorial on using @Asynchronous.
No offense, but I don't see any evidence in your code that you've read this tutorial yet. Your asynchronous method should return a Future. As the tutorial says:
The client may retrieve the result using one of the Future.get methods. If processing hasn’t been completed by the session bean handling the invocation, calling one of the get methods will result in the client halting execution until the invocation completes. Use the Future.isDone method to determine whether processing has completed before calling one of the get methods.
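The "start everything, then collect the Futures" pattern the tutorial describes can be sketched outside a container as follows. This is only an illustration under assumptions: a plain ExecutorService stands in for the threads the EJB container would supply to an @Asynchronous method, and fetch is a hypothetical stand-in for the question's URL2String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFetchSketch {

    // Hypothetical stand-in for URL2String(url); a real version would do the HTTP call.
    static String fetch(String url) {
        return "<html>" + url + "</html>";
    }

    // Starts every download first, then collects the results in submission order.
    static List<String> fetchAll(List<String> urls) throws Exception {
        int threads = Math.max(1, Math.min(urls.size(), 8));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<String> task = () -> fetch(url);
                pending.add(pool.submit(task)); // returns immediately
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> future : pending) {
                pages.add(future.get()); // blocks until this download has finished
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(Arrays.asList("a", "b", "c")));
    }
}
```

The key point is that no get call happens until every task has been submitted; calling get right after each submit would serialize the downloads again.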

Related

How to automatically collapse repetitive log output in log4j

Every once in a while, a server or database error causes thousands of the same stack trace in the server log files. It might be a different error/stacktrace today than a month ago. But it causes the log files to rotate completely, and I no longer have visibility into what happened before. (Alternately, I don't want to run out of disk space, which for reasons outside my control right now is limited--I'm addressing that issue separately). At any rate, I don't need thousands of copies of the same stack trace--just a dozen or so should be enough.
I would like it if I could have log4j/log4j2/another system automatically collapse repetitive errors, so that they don't fill up the log files. For example, a threshold of maybe 10 or 100 exceptions from the same place might trigger log4j to just start counting, and wait until they stop coming, then output a count of how many more times they appeared.
What pre-made solutions exist (a quick survey with links is best)? If this is something I should implement myself, what is a good pattern to start with and what should I watch out for?
Thanks!

Will the BurstFilter do what you want? If not, please create a Jira issue with the algorithm that would work for you and the Log4j team would be happy to consider it. Better yet, if you can provide a patch it would be much more likely to be incorporated.

Log4j's BurstFilter will certainly help prevent you filling your disks. Remember to configure it so that it applies in as limited a section of code as you can, or you'll filter out messages you might want to keep (that is, don't use it on your appender, but on a particular logger that you isolate in your code).
I wrote a simple utility class at one point that wrapped a logger and filtered based on n messages within a given Duration. I used instances of it around most of my warning and error logs to protect against the off chance that I'd run into problems like you did. It worked pretty well for my situation, especially because it was easy to adapt quickly to different situations.
Something like:
...
public DurationThrottledLogger(Logger logger, Duration throttleDuration, int maxMessagesInPeriod) {
    ...
}
public void info(String msg) {
    getMsgAddendumIfNotThrottled().ifPresent(addendum -> logger.info(msg + addendum));
}
private synchronized Optional<String> getMsgAddendumIfNotThrottled() {
    LocalDateTime now = LocalDateTime.now();
    String msgAddendum;
    if (throttleDuration.compareTo(Duration.between(lastInvocationTime, now)) <= 0) {
        // last one was sent longer than throttleDuration ago - send it and reset everything
        if (throttledInDurationCount == 0) {
            msgAddendum = " [will throttle future msgs within throttle period]";
        } else {
            msgAddendum = String.format(" [previously throttled %d msgs received before %s]",
                    throttledInDurationCount, lastInvocationTime.plus(throttleDuration).format(formatter));
        }
        totalMessageCount++;
        throttledInDurationCount = 0;
        numMessagesSentInCurrentPeriod = 1;
        lastInvocationTime = now;
        return Optional.of(msgAddendum);
    } else if (numMessagesSentInCurrentPeriod < maxMessagesInPeriod) {
        // within throttle period, but haven't sent max messages yet - send it
        msgAddendum = String.format(" [message %d of %d within throttle period]", numMessagesSentInCurrentPeriod + 1, maxMessagesInPeriod);
        totalMessageCount++;
        numMessagesSentInCurrentPeriod++;
        return Optional.of(msgAddendum);
    } else {
        // throttle it
        totalMessageCount++;
        throttledInDurationCount++;
        return emptyOptional;
    }
}
I'm pulling this from an old version of the code, unfortunately, but the gist is there. I wrote a bunch of static factory methods that I mainly used because they let me write a single line of code to create one of these for that one log message:
} catch (IOException e) {
    DurationThrottledLogger.error(logger, Duration.ofSeconds(1), "Received IO Exception. Exiting current reader loop iteration.", e);
}
This probably won't be as important in your case; for us, we were using a somewhat underpowered graylog instance that we could hose down fairly easily.
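For anyone who wants to experiment with the idea, here is a stripped-down, self-contained sketch of the same throttling logic. It is not the original class: the name ThrottleSketch, the filter method, and the fixed window that starts at the first message (rather than at the last invocation) are all assumptions made for illustration, and the current time is passed in so the behavior can be tested without sleeping.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

class ThrottleSketch {
    private final Duration window;
    private final int maxPerWindow;
    private Instant windowStart;   // null until the first message arrives
    private int sentInWindow;
    private int suppressed;

    ThrottleSketch(Duration window, int maxPerWindow) {
        this.window = window;
        this.maxPerWindow = maxPerWindow;
    }

    // Returns the text to log, or empty if this message should be dropped.
    synchronized Optional<String> filter(String msg, Instant now) {
        boolean windowExpired = windowStart == null
                || Duration.between(windowStart, now).compareTo(window) >= 0;
        if (windowExpired) {
            // New window: report how many messages were dropped in the previous one.
            String suffix = suppressed > 0
                    ? " [suppressed " + suppressed + " earlier messages]"
                    : "";
            windowStart = now;
            sentInWindow = 1;
            suppressed = 0;
            return Optional.of(msg + suffix);
        }
        if (sentInWindow < maxPerWindow) {
            sentInWindow++;
            return Optional.of(msg);
        }
        suppressed++;
        return Optional.empty();
    }

    public static void main(String[] args) {
        ThrottleSketch throttle = new ThrottleSketch(Duration.ofSeconds(1), 2);
        for (int i = 0; i < 5; i++) {
            throttle.filter("Received IO Exception", Instant.now())
                    .ifPresent(System.out::println);
        }
    }
}
```

A caller would wrap the real logger call, for example `throttle.filter(msg, Instant.now()).ifPresent(logger::error)`, where logger is whatever logging facade is already in use.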

How would I avoid using Thread.sleep()?

I have the below snippet of code, designed to check if a message was sent to a phone number:
public static boolean checkMessages(long sendTime, String phone) {
    boolean gotMessage = false;
    while (!gotMessage) {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        gotMessage = MessageHandler.getResponse(sendTime, phone);
    }
    return gotMessage;
}
This code itself is called through a CompletableFuture, so it can run in parallel with another check. If neither check is satisfied within a certain amount of time, both will expire.
Now, according to my IDE and this site, using Thread.sleep() is bad for a number of reasons, so I'd like to remove it from my code somehow.
Is there a way to do this such that this method will only ever return true, like it currently is?
MessageHandler.getResponse() is a handler I wrote to check if I received a text message containing a specific (hardcoded) string of text from a specific phone number. It does block execution until it finishes checking, but the API I'm using has very aggressive rate limits. The API offers no callbacks -- it must manually be called.

It's not very clear what your whole code does. As commented by others, knowing what MessageHandler does would add some context.
The Thread.sleep static invocation will make the current thread sleep for at least the given amount of time, "subject to the precision and accuracy of system timers and schedulers" (see API).
If your MessageHandler.getResponse invocation blocks before returning, then you probably don't need to sleep at all.
However, if this task is repeated "endlessly", you probably want to use a ScheduledExecutorService instead, and run it based on a schedule.
Bottom line: Thread.sleep is not "bad practice" per se, but you seldom need to actually use it.
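A minimal sketch of the ScheduledExecutorService approach, assuming the check can be expressed as a BooleanSupplier (in the question it would wrap MessageHandler.getResponse(sendTime, phone)). The class and method names are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class PollingSketch {

    // Runs check every periodMillis until it returns true, then completes the future.
    static CompletableFuture<Boolean> pollUntilTrue(BooleanSupplier check, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                if (check.getAsBoolean()) {
                    result.complete(true);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        // Stop polling once the future is done, including when the caller cancels it.
        result.whenComplete((value, error) -> scheduler.shutdownNow());
        return result;
    }

    public static void main(String[] args) throws Exception {
        long deadline = System.currentTimeMillis() + 300;
        boolean got = pollUntilTrue(() -> System.currentTimeMillis() >= deadline, 50)
                .get(5, TimeUnit.SECONDS);
        System.out.println(got);
    }
}
```

Because the result is a CompletableFuture, it composes with the other check mentioned in the question, and cancelling it when the overall deadline passes also shuts the scheduler down. If the future is never completed or cancelled, the scheduler thread keeps running, so a real caller should always do one or the other.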

I fully agree with Mena's response, but to offer an alternate implementation to Thread.sleep, you can use a CountDownLatch to perform your looping:
public void blockingWaitForMessage(long sendTime, String phone) throws InterruptedException {
    final CountDownLatch latch = new CountDownLatch(1);
    // await returns false each time the 5 seconds elapse, and true once the latch has been counted down
    while (!latch.await(5, TimeUnit.SECONDS)) {
        if (MessageHandler.getResponse(sendTime, phone)) {
            latch.countDown();
        }
    }
}
Using CountDownLatch.await handles both your boolean and temporal states!

You can use the guava-retrying library. Its Retryer has a nice API.
Or you can decorate ScheduledThreadPoolExecutor from the Java standard library.

Ideas on concurrent datastructure

I am not sure if I can put my question in the clearest fashion, but I will try my best.
Let's say I am retrieving some information from a third-party API. The retrieved information will be huge in size. To gain performance, instead of retrieving all the info in one go, I will be retrieving it in a paged fashion (the API gives me that facility, basically an iterator). The return type is basically a list of objects.
My aim here is to process the information I have in hand (that includes comparing, storing in the DB, and many other operations) while I am still getting paged responses to the request.
My question to the expert community is: what data structure do you prefer in such a case? Also, does a framework like Spring Batch help you get performance gains in such cases?
I know the question is a bit vague, but I am looking for general ideas, tips, and pointers.

In these cases, the data structure for me is java.util.concurrent.CompletionService.
For purposes of example, I'm going to assume a couple of additional constraints:
You want only one outstanding request to the remote server at a time
You want to process the results in order.
Here goes:
// a class that knows how to update the DB given a page of results
class DatabaseUpdater implements Callable<Object> { ... }
// a background thread to do the work
final CompletionService<Object> exec = new ExecutorCompletionService<>(
        Executors.newSingleThreadExecutor());
// first call
List<Object> results = ThirdPartyAPI.getPage( ... );
// start loading those results to the DB on the background thread
exec.submit(new DatabaseUpdater(results));
while( you need to ) {
    // another call to the remote service (reuse the variable; redeclaring it here would not compile)
    results = ThirdPartyAPI.getPage( ... );
    // wait for existing work to complete; get() rethrows any failure
    exec.take().get();
    // send more work to the background thread
    exec.submit(new DatabaseUpdater(results));
}
// wait for the last task to complete
exec.take().get();
This is just a simple two-thread design. The first thread is responsible for getting data from the remote service and the second is responsible for writing to the database.
Any exception thrown by DatabaseUpdater is propagated to the main thread, wrapped in an ExecutionException, when the result is retrieved (via exec.take().get()).
Good luck.
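A runnable version of this two-thread pipeline might look like the sketch below. The paged API and the database are both simulated: an Iterator of pages stands in for the third-party calls and an in-memory list stands in for the DB, so all names here are assumptions rather than part of any real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PagedPipelineSketch {

    // Fetches pages on the calling thread while a single worker stores the previous page.
    // Returns the items in the order the worker stored them.
    static List<String> run(Iterator<List<String>> pages) throws Exception {
        List<String> stored = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletionService<Integer> done = new ExecutorCompletionService<>(worker);
        try {
            int inFlight = 0;
            while (pages.hasNext()) {
                List<String> page = pages.next();   // stands in for the remote call
                if (inFlight > 0) {
                    done.take().get();              // rethrows a worker failure as ExecutionException
                    inFlight--;
                }
                done.submit(() -> {                 // stands in for the DB update
                    stored.addAll(page);
                    return page.size();
                });
                inFlight++;
            }
            if (inFlight > 0) {
                done.take().get();                  // wait for the last page
            }
        } finally {
            worker.shutdown();
        }
        return new ArrayList<>(stored);
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));
        System.out.println(run(pages.iterator()));
    }
}
```

With a single worker thread, pages are stored in the order they were fetched, and at most one page is waiting while the next one is being retrieved.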

In terms of doing the actual parallelism, one very useful construct in Java is the ThreadPoolExecutor. A rough sketch of what that might look like is this:
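public class YourApp {
    // static so that it can be instantiated from main without an enclosing instance
    static class Processor implements Runnable {
        private final Widget toProcess;
        public Processor(Widget toProcess) {
            this.toProcess = toProcess;
        }
        @Override
        public void run() {
            // commit the Widget to the DB, etc
        }
    }
    public static void main(String[] args) {
        // note: with an unbounded queue the pool never grows beyond its core size (1 here)
        ThreadPoolExecutor executor =
                new ThreadPoolExecutor(1, 10, 30,
                        TimeUnit.SECONDS,
                        new LinkedBlockingDeque<Runnable>());
        while (thereAreStillWidgets()) {
            ArrayList<Widget> widgets = doExpensiveDatabaseCall();
            for (Widget widget : widgets) {
                Processor processor = new Processor(widget);
                executor.execute(processor);
            }
        }
        // let queued work finish, then release the pool's threads
        executor.shutdown();
    }
}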
But as I said in a comment: calls to an external API are expensive. It's very likely that the best strategy is to pull all the Widget objects down from the API in one call, and then process them in parallel once you've got them. Doing more API calls gives you the overhead of sending the data all the way from the server to you, every time -- it's probably best to pay that cost the fewest number of times that you can.
Also, keep in mind that if you're doing DB operations, it's possible that your DB doesn't allow for parallel writes, so you might get a slowdown there.

Play Framework await() makes the application act weird

I am having some strange trouble with the Controller method await(Future future).
Whenever I add an await line anywhere in my code, some GenericModels that have nothing to do with where I placed await start loading incorrectly, and I cannot access any of their attributes.
The weirdest thing is that if I change something in a completely different Java file anywhere in the project, Play tries to recompile (I guess), and at that moment it starts working perfectly, until I clean tmp again.

When you use await in a controller it does bytecode enhancement to break a single method into two threads. This is pretty cool, but definitely one of the 'black magic' tricks of Play1. But, this is one place where Play often acts weird and requires a restart (or as you found, some code changing) - the other place it can act strange is when you change a Model class.
http://www.playframework.com/documentation/1.2.5/asynchronous#SuspendingHTTPrequests
To make it easier to deal with asynchronous code we have introduced continuations. Continuations allow your code to be suspended and resumed transparently. So you write your code in a very imperative way, as:
public static void computeSomething() {
    Promise delayedResult = veryLongComputation(…);
    String result = await(delayedResult);
    render(result);
}
In fact here, your code will be executed in 2 steps, in 2 different threads. But as you see it, it’s very transparent for your application code.
Using await(…) and continuations, you could write a loop:
public static void loopWithoutBlocking() {
    for(int i=0; i<=10; i++) {
        Logger.info(i);
        await("1s");
    }
    renderText("Loop finished");
}
And using only 1 thread (which is the default in development mode) to process requests, Play is able to run concurrently these loops for several requests at the same time.
To respond to your comment:
public static void generatePDF(Long reportId) {
    // report is assumed to be loaded from reportId; that lookup is not shown here
    Promise<InputStream> pdf = new ReportAsPDFJob(report).now();
    InputStream pdfStream = await(pdf);
    renderBinary(pdfStream);
}
and ReportAsPDFJob is simply a Play Job class with doJobWithResult overridden, so it returns the object. See http://www.playframework.com/documentation/1.2.5/jobs for more on jobs.
Calling job.now() returns a future/promise, which you can use like this: await(job.now())

How do I perform a Unit Test using threads? [duplicate]

This question already has answers here:
How should I unit test multithreaded code?
(29 answers)
Closed 5 years ago.
Executive Summary: When assertion errors are thrown in the threads, the unit test doesn't die. This makes sense, since one thread shouldn't be allowed to crash another thread. The question is how do I either 1) make the whole test fail when the first of the helper threads crashes or 2) loop through and determine the state of each thread after they have all completed (see code below). One way of doing the latter is by having a per thread status variable, e.g., "boolean[] statuses" and have "statuses[i] == false" mean that the thread failed (this could be extended to capture more information). However, that is not what I want: I want it to fail just like any other unit test when the assertion errors are thrown. Is this even possible? Is it desirable?
I got bored and I decided to spawn a bunch of threads in my unit test and then have them call a service method, just for the heck of it. The code looks approximately like:
Thread[] threads = new Thread[MAX_THREADS];
for( int i = 0; i < threads.length; i++ ) {
    threads[i] = new Thread( new Runnable() {
        private final int ID = threadIdSequenceNumber++;
        public void run() {
            try {
                resultRefs[ID] = runTest( Integer.toString( ID ) ); // returns an object
            }
            catch( Throwable t ) {
                // this code is EVIL - it catches even
                // Errors - don't copy it - more on this below
                final String message = "error testing thread with id => "
                        + ID;
                logger.debug( message, t );
                throw new IllegalStateException( message, t );
                // need to wrap throwable in a
                // run time exception so it will compile
            }
        }
    } );
}
After this, we will loop through the array of threads and start each one. After that we will wait for them all to finish. Finally, we will perform some checks on the result references.
for( Thread thread : threads )
    thread.start();
logger.debug( "waiting for threads to finish ..." );
boolean done = false;
while( !done ) {
    done = true;
    for( Thread thread : threads )
        if( thread.isAlive() )
            done = false;
}
for( int i = 0; i < resultRefs.length; i++ ) {
    assertTrue( "you've got the world messed, dawg!",
            myCondition(resultRefs[i]) );
}
Here's the problem. Did you notice that nasty try-catch-throwable block? I just added that as a temporary hack so I could see what was going on. In runTest( String ) a few assertions are made, e.g., assertNotNull( null ), but since it is in a different thread, it doesn't cause the unit test to fail!!!!
My guess is that we will need to somehow iterate over the threads array, check the status of each, and manually cause an assertion error if the thread terminated in a nasty way. What's the name of the method that gives this information (the stack trace of the dead thread)?

Concurrency is one of those things that are very difficult to unit test. If you are just trying to test that the code inside each thread is doing what it is supposed to do, maybe you should just test this code in isolation from the threading context.
If in this example the threads collaborate to reach a result, maybe you can test that collaboration without using threads. That would be done by executing all the collaborative parts sequentially.
If you want to test for race conditions and that kind of thing, unit testing is not the best way. You will get tests that sometimes fail and sometimes don't.
To summarize, I think your problem may be that you are unit testing at too high a level.
Hope this helps

The Google Testing Blog had an excellent article on this subject that's well worth reading: http://googletesting.blogspot.com/2008/08/tott-sleeping-synchronization.html
It's written in Python, but I think the principles are directly transferable to Java.

Unit testing in a multithreaded environment is tough, so some adjustments need to be made. Unit tests must be repeatable and deterministic. As a result, anything with multiple threads fails this criterion. Tests with multiple threads also tend to be slow.
I'd first try to see if I can get by with testing on a single thread: does the logic under test really need multiple threads?
If that doesn't work, go with the member variable approach that you can check against an expected value at the end of the test, when all the threads have finished running.
Hey, seems like there's another question just like this. Check my post for a link to a longer discussion at the tdd yahoogroup:
Unit testing a multithreaded application?

Your runnable wrapper should pass the exception object back to your test class, which can then store the exceptions in a collection. When all the threads have finished, you can check the collection. If it isn't empty, iterate over the exceptions, call .printStackTrace() on each, and then fail.
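A minimal sketch of that wrapper, using only the standard library. The class and method names are made up for illustration, and join() replaces the busy-wait on isAlive() from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ThreadFailureCollector {

    // Runs each task on its own thread and returns everything the tasks threw.
    static List<Throwable> runAll(List<Runnable> tasks) throws InterruptedException {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread thread = new Thread(() -> {
                try {
                    task.run();
                } catch (Throwable e) {   // includes AssertionError from failed assertions
                    failures.add(e);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join();                // wait for every thread to finish
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = new ArrayList<>();
        tasks.add(() -> { });
        tasks.add(() -> { throw new AssertionError("boom"); });
        System.out.println(runAll(tasks));
    }
}
```

In the test method itself, the returned list can then be asserted on from the main thread, for example `assertTrue(failures.toString(), failures.isEmpty())`, so a failure in any helper thread fails the test like any other assertion.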

Implement an UncaughtExceptionHandler that sets some flags (which the Threads periodically check) and set it on each Thread.
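A simplified sketch of the handler idea: instead of flags that the other threads poll, it just records the Throwable so the test thread can inspect it after join(). The names are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtHandlerSketch {

    // Runs the task on a new thread and returns what it threw, or null if it finished normally.
    static Throwable runAndCapture(Runnable task) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread thread = new Thread(task);
        // The handler runs on the dying thread, before join() returns in the caller.
        thread.setUncaughtExceptionHandler((t, error) -> failure.set(error));
        thread.start();
        thread.join();
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable error = runAndCapture(() -> {
            throw new IllegalStateException("bad");
        });
        System.out.println(error);
    }
}
```

This only works if the Runnable lets the exception escape; the try-catch-Throwable block in the question would have to be removed, or it would need to rethrow as it currently does.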

Another popular option for JUnit concurrent thread testing is Matthieu Carbou's method using a custom JUnit runner and a simple annotation.
See the full documentation

It is possible to make the unit test fail by using a special synchronization object. Take a look at the following article:
Sprinkler - Advanced synchronization object
I'll try to explain the main points here.
You want to be able to externalize internal thread failures to the main thread, which in your case is the test. So you have to use a shared object/lock that both the internal thread and the test use to synchronize with each other.
See the following test. It creates a thread that simulates a thrown exception by calling a shared object named Sprinkler.
The main thread (the test) is blocked on Sprinkler.getInstance().await(CONTEXT, 10000), which is freed once release is called and then catches the thrown exception.
In the catch block you can write the assert that fails the test.
@Test
public void testAwait_InnerThreadExternalizeException() {
    final int CONTEXT = 1;
    final String EXCEPTION_MESSAGE = "test inner thread exception message";
    // release will occur sometime in the future - simulate exception in the releaser thread
    ExecutorServiceFactory.getCachedThreadPoolExecutor().submit(new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            Sprinkler.getInstance().release(CONTEXT, new RuntimeException(EXCEPTION_MESSAGE));
            return null;
        }
    });
    Throwable thrown = null;
    try {
        Sprinkler.getInstance().await(CONTEXT, 10000);
    } catch (Throwable t) {
        // if the releaser thread delivers an exception it will be externalized to this thread
        thrown = t;
    }
    Assert.assertTrue(thrown instanceof SprinklerException);
    Assert.assertEquals(EXCEPTION_MESSAGE, thrown.getCause().getMessage());
}
