Java multithreading (newCachedThreadPool), then writing the result to one file? - java

I have tried to do this with simple threads and succeeded, but I believe that a thread pool could do the same thing more efficiently.
simple threads:
public static class getLogFile implements Runnable {
private String file;
public void setFilename(String namefile){
file=namefile;
}
public int run1(String Filenamet) {
connectToServer(XXX, Filenamet, XXX, XXX, XXX, XXX);//creates a file and downloads it
return 0;
}
public void run() {
run1(file);
}
}
in main:
for(x=0 ; x < 36 ; x++){
String Filename1=Filename+x;
getLogFile n=new getLogFile();
n.setFilename(Filename1);
(new Thread(n)).start();
}
The program connects to the server and executes 36 commands at the same time (using a thread pool or simple threads?). It then either downloads 36 result files and merges them, or maybe it could just write to one file on the server and then download that?
How do I transform this code to use a thread pool?
How do I write data to one file from 36 threads?

I can only offer you directions.
To use a thread pool, look at how ExecutorService works. Almost any example found through Google will give you enough information. As an example, look at:
http://www.deitel.com/articles/java_tutorials/20051126/JavaMultithreading_Tutorial_Part4.html
As for each of the 36 threads writing to its own file versus all of them writing to one file: I cannot say anything about several threads writing to the same file, but you can use a CyclicBarrier to wait until all threads have finished writing. An example of its use can be found here:
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/CyclicBarrier.html
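The CyclicBarrier suggestion could be sketched roughly as follows. This is only an illustration under assumptions: the class name BarrierDemo, the method runAndMerge and the "result-N" strings are hypothetical stand-ins for the 36 downloads. The barrier action runs exactly once, after the last worker has arrived, which makes it a natural place to merge the partial results.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BarrierDemo {
    // Runs `workers` tasks and returns their results merged by the barrier action.
    static String runAndMerge(int workers) {
        String[] parts = new String[workers];
        StringBuilder merged = new StringBuilder();
        // The barrier action runs once, after the last worker calls await().
        CyclicBarrier barrier = new CyclicBarrier(workers, () -> {
            for (String p : parts) {
                merged.append(p).append('\n');
            }
        });
        // The pool needs at least `workers` threads, otherwise await() would never complete.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            final int id = i;
            pool.execute(() -> {
                parts[id] = "result-" + id; // stand-in for downloading file number id
                try {
                    barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        System.out.print(runAndMerge(4));
    }
}
```

Note that a barrier ties up one thread per worker until everyone has arrived, so it suits a small, known number of tasks better than a large queue of work.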

It's not clear what you want to do. My thoughts are that creating 36 separate connections to the server will be a sizable load that it could well do without.
Can the server assemble these 36 files itself and look after the threading itself? That seems a more logical partition of duties. The server would have knowledge of how parallelisable this work is, and there's a substantial impact on the server of servicing multiple connections (including potentially blocking out other clients).

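A simple way to do it is with the java.util.concurrent task executors, as follows:
ExecutorService executor = Executors.newFixedThreadPool(100);
for(int i=0;i<100;i++)
{
    // Runnable is an interface and cannot be instantiated directly;
    // MyTask stands for your own Runnable implementation, such as getLogFile above
    executor.execute(new MyTask(i));
}
executor.shutdown(); // accept no new tasks; already submitted ones still run
You can also use Spring task executors, which may be easier. However, I would also suggest using a single connection, as mentioned above.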
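For the second part of the question (one output file from many threads), one option is to let the workers return their output and have a single thread do all the writing. The sketch below assumes that each command's output fits in memory as a String; the fetch method is a hypothetical stand-in for connectToServer(...), and the pool size of 8 is an arbitrary choice.

```java
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolToOneFile {
    // Hypothetical stand-in for connectToServer(...): returns the command output as text.
    static String fetch(String name) {
        return "output of " + name;
    }

    static void downloadAll(String baseName, int count, Path target) throws Exception {
        // A bounded pool, unlike starting one thread per command.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int x = 0; x < count; x++) {
                final String name = baseName + x;
                tasks.add(() -> fetch(name));
            }
            // invokeAll blocks until every task is done and keeps the submission order.
            List<Future<String>> results = pool.invokeAll(tasks);
            // Only this thread touches the file, so no locking is needed.
            try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(target))) {
                for (Future<String> f : results) {
                    out.println(f.get()); // rethrows a task failure as ExecutionException
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("merged", ".log");
        downloadAll("Filename", 36, target);
        System.out.println(Files.readAllLines(target).size() + " lines in " + target);
    }
}
```

Because the file is written in submission order by one thread, the merged output is deterministic regardless of which download finishes first.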

Related

Ideas on concurrent datastructure

I am not sure I can put my question in the clearest fashion, but I will try my best.
Let's say I am retrieving some information from a third-party API. The retrieved information will be huge. To gain performance, instead of retrieving all the information in one go, I will retrieve it in a paged fashion (the API gives me that facility, basically an iterator). The return type is basically a list of objects.
My aim is to process the information I already have in hand (which includes comparing, storing in a DB and many other operations) while I wait for the next paged response.
My question to the expert community is: what data structure do you prefer in such a case? Also, does a framework like Spring Batch help in getting performance gains in such cases?
I know the question is a bit vague, but I am looking for general ideas, tips and pointers.
In these cases, the data structure for me is java.util.concurrent.CompletionService.
For purposes of example, I'm going to assume a couple of additional constraints:
You want only one outstanding request to the remote server at a time
You want to process the results in order.
Here goes:
// a class that knows how to update the DB given a page of results
class DatabaseUpdater implements Callable<Object> { ... }
// a background thread to do the work
final CompletionService<Object> exec = new ExecutorCompletionService<Object>(
    Executors.newSingleThreadExecutor());
// first call
List<Object> results = ThirdPartyAPI.getPage( ... );
// Start loading those results to DB on background thread
exec.submit(new DatabaseUpdater(results));
while( you need to ) {
    // Another call to remote service (reuse the variable; redeclaring it here would not compile)
    results = ThirdPartyAPI.getPage( ... );
    // wait for existing work to complete; get() rethrows any failure
    exec.take().get();
    // send more work to background thread
    exec.submit(new DatabaseUpdater(results));
}
// wait for the last task to complete
exec.take().get();
This is just a simple two-thread design. The first thread is responsible for getting data from the remote service and the second is responsible for writing to the database.
Any exception thrown by DatabaseUpdater is propagated to the main thread, wrapped in an ExecutionException, when get() is called on the Future returned by exec.take(). Calling exec.take() alone only waits for the task to complete.
Good luck.
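To make the point about exception propagation concrete, here is a small self-contained sketch. The names TakeVersusGet and observeFailure are hypothetical, and the failing task merely stands in for a DatabaseUpdater whose database write goes wrong.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TakeVersusGet {
    // Submits one failing task and reports what the calling thread observes.
    static String observeFailure() throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletionService<Void> exec = new ExecutorCompletionService<>(worker);
        try {
            Callable<Void> failingUpdate = () -> {
                // stand-in for a DatabaseUpdater error
                throw new IllegalStateException("db write failed");
            };
            exec.submit(failingUpdate);
            try {
                // take() returns the completed Future normally; get() rethrows the failure
                exec.take().get();
                return "no failure seen";
            } catch (ExecutionException e) {
                return "caught: " + e.getCause().getMessage();
            }
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeFailure()); // prints "caught: db write failed"
    }
}
```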
In terms of doing the actual parallelism, one very useful construct in Java is the ThreadPoolExecutor. A rough sketch of what that might look like is this:
public class YourApp {
    // static, so that it can be instantiated from the static main method
    static class Processor implements Runnable {
        Widget toProcess;
        public Processor(Widget toProcess) {
            this.toProcess = toProcess;
        }
        public void run() {
            // commit the Widget to the DB, etc
        }
    }
    public static void main(String[] args) {
        // With an unbounded queue the pool never grows past its core size,
        // so core and maximum are both set to 10 here.
        ThreadPoolExecutor executor =
            new ThreadPoolExecutor(10, 10, 30,
                TimeUnit.SECONDS,
                new LinkedBlockingDeque<Runnable>());
        while(thereAreStillWidgets()) {
            ArrayList<Widget> widgets = doExpensiveDatabaseCall();
            for(Widget widget : widgets) {
                Processor processor = new Processor(widget);
                executor.execute(processor);
            }
        }
        executor.shutdown(); // let queued work finish, then release the threads
    }
}
But as I said in a comment: calls to an external API are expensive. It's very likely that the best strategy is to pull all the Widget objects down from the API in one call, and then process them in parallel once you've got them. Doing more API calls gives you the overhead of sending the data all the way from the server to you, every time -- it's probably best to pay that cost the fewest number of times that you can.
Also, keep in mind that if you're doing DB operations, it's possible that your DB doesn't allow for parallel writes, so you might get a slowdown there.
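One detail worth checking when sizing a ThreadPoolExecutor like the one above: the pool only adds threads beyond corePoolSize when the queue refuses a task, and an unbounded queue never refuses. The following sketch (class and method names are hypothetical) measures how many threads the pool actually used for a batch of blocking tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    // Returns the largest number of threads the pool ever used for `tasks` blocking tasks.
    static int largestPoolSize(int core, int max, int tasks) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                core, max, 30, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                try {
                    // hold the thread so that later tasks must queue or get a new thread
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = executor.getLargestPoolSize();
        release.countDown();
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        return largest;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(largestPoolSize(1, 10, 20));  // prints 1
        System.out.println(largestPoolSize(10, 10, 20)); // prints 10
    }
}
```

If real parallelism is wanted with an unbounded queue, setting the core size to the desired thread count (or using Executors.newFixedThreadPool) is the simpler choice.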

Simultaneous downloading of web pages/files in EJB (Java)

I have a small problem with creating threads in EJB. OK, I understand why I cannot use them in EJB, but I don't know how to get the same functionality without them. I am trying to download 30-40 web pages/files, and I need to start downloading all of them at (approximately) the same time. This is needed because if I run them one after another in a single thread, it takes more than 3 minutes.
I tried the @Asynchronous annotation, but nothing happened.
public void execute(String lang2, String lang1,int number) {
    Stopwatch timer = new Stopwatch().start();
    htmlCodes.add(URL2String(URLs.get(number)));
    timer.stop();
    System.out.println( number +":"+ Thread.currentThread().getName() + timer.elapsedMillis()+" milliseconds");
}
private void findMatches(String searchedWord, String lang1, String lang2) {
    articles = search(searchedWord);
    for (int i = 0; i < articles.size(); i++) {
        execute(lang1,lang2,i);
    }
}
Here are two really good SO answers that can help. This one gives you your options, and this one explains why you shouldn't spawn threads in an EJB. The problem with the first answer is that it doesn't contain a lot of knowledge about EJB 3.0 options. So, here's a tutorial on using @Asynchronous.
No offense, but I don't see any evidence in your code that you've read this tutorial yet. Your asynchronous method should return a Future. As the tutorial says:
The client may retrieve the result using one of the Future.get methods. If processing hasn’t been completed by the session bean handling the invocation, calling one of the get methods will result in the client halting execution until the invocation completes. Use the Future.isDone method to determine whether processing has completed before calling one of the get methods.
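The Future contract quoted above suggests a two-phase shape: start every download first, and only then collect the results. The sketch below shows that shape in plain Java SE, with an ExecutorService standing in for the container; in an EJB, the assumption is that the @Asynchronous method itself would return the Future and no pool would be created by hand. The download method is a hypothetical stand-in for URL2String(URLs.get(number)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StartAllThenCollect {
    // Hypothetical stand-in for URL2String(URLs.get(number)).
    static String download(int number) {
        return "page-" + number;
    }

    static List<String> downloadAll(int count) throws InterruptedException, ExecutionException {
        // Stand-in for the container; an EJB would not create its own pool.
        ExecutorService pool = Executors.newFixedThreadPool(count);
        try {
            // Phase 1: start every download; submit() returns at once with a Future.
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int number = i;
                Callable<String> task = () -> download(number);
                pending.add(pool.submit(task));
            }
            // Phase 2: collect; get() blocks only until that particular download is done.
            List<String> htmlCodes = new ArrayList<>();
            for (Future<String> f : pending) {
                htmlCodes.add(f.get());
            }
            return htmlCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(downloadAll(5)); // prints [page-0, page-1, page-2, page-3, page-4]
    }
}
```

Calling get() inside the same loop that starts the work would serialize the downloads again, which is why the two phases are kept apart.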

Play Framework await() makes the application act weird

I am having some strange trouble with the method await(Future future) of the Controller.
Whenever I add an await line anywhere in my code, some GenericModels that have nothing to do with where I placed the await start loading incorrectly, and I cannot access any of their attributes.
The weirdest thing is that if I change something in a completely different Java file anywhere in the project, Play tries to recompile (I guess), and from that moment it works perfectly, until I clean tmp again.
When you use await in a controller it does bytecode enhancement to break a single method into two threads. This is pretty cool, but definitely one of the 'black magic' tricks of Play1. But, this is one place where Play often acts weird and requires a restart (or as you found, some code changing) - the other place it can act strange is when you change a Model class.
http://www.playframework.com/documentation/1.2.5/asynchronous#SuspendingHTTPrequests
To make it easier to deal with asynchronous code we have introduced
continuations. Continuations allow your code to be suspended and
resumed transparently. So you write your code in a very imperative
way, as:
public static void computeSomething() {
Promise delayedResult = veryLongComputation(…);
String result = await(delayedResult);
render(result); }
In fact here, your code will be executed in 2 steps, in 2 different threads. But as you see it, it’s very
transparent for your application code.
Using await(…) and continuations, you could write a loop:
public static void loopWithoutBlocking() {
for(int i=0; i<=10; i++) {
Logger.info(i);
await("1s");
}
renderText("Loop finished"); }
And using only 1 thread (which is the default in development mode) to process requests, Play is able to
run concurrently these loops for several requests at the same time.
To respond to your comment:
public static void generatePDF(Long reportId) {
    // report is assumed to have been loaded from reportId before this point
    Promise<InputStream> pdf = new ReportAsPDFJob(report).now();
    InputStream pdfStream = await(pdf);
    renderBinary(pdfStream);
}
and ReportAsPDFJob is simply a play Job class with doJobWithResult overridden - so it returns the object. See http://www.playframework.com/documentation/1.2.5/jobs for more on jobs.
Calling job.now() returns a future/promise, which you can use like this: await(job.now())

Java Multithreaded I/O and communication problem

I am using Java to create an application for network management. In this application I establish communication with network devices using the SNMP4j library (for the SNMP protocol). I'm supposed to scan certain values of the network devices using this protocol and put the results into a file for caching. At some point I decided to make my application multi-threaded and assign each device to a thread. I created a class that implements the Runnable interface and scans for the values that I want for each device.
When I run this class alone, it works fine. But when I run multiple threads at the same time, the output gets messed up: additional or out-of-order output is printed into the files. Now I wonder whether this problem is due to the I/O or to the communication.
Here is some of the code, so that you can see what I'm doing and help me figure out what's wrong.
public class DeviceScanner implements Runnable{
private final SNMPCommunicator comm;
private OutputStreamWriter out;
public DeviceScanner(String ip, OutputStream output) throws IOException {
this.device=ip;
this.comm = new SNMPV1Communicator(device);
oids=MIB2.ifTableHeaders;
out = new OutputStreamWriter(output);
}
@Override
public void run(){
//Here I use the communicator to request for desired data goes something like ...
String read="";
for (int j=0; j<num; j++){
read= comm.snmpGetNext(oids);
out.write(read);
this.updateHeaders(read);
}
out.flush();
//...
}
}
Some of the expected output would be something like:
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
1.3.6.1.2.1.1.3.0 = 26 days, 22:35:02.31
1.3.6.1.2.1.1.4.0 = admin
1.3.6.1.2.1.1.5.0 = els
1.3.6.1.2.1.1.6.0 = Computer Room
but instead I get something like this (it varies):
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
1.3.6.1.2.1.1.4.0 = admin
1.3.6.1.2.1.1.5.0 = els
1.3.6.1.2.1.1.3.0 = 26 days, 22:35:02.31
1.3.6.1.2.1.1.6.0 = Computer Room
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
* Currently I have one output file per device scanner.
I get the devices from a list of IPs, as shown below. I'm also using a small thread pool to limit the number of threads running at the same time.
for (String s: ips){
output= new FileOutputStream(new File(path+s));
threadpool.add(new DeviceScanner(s, output));
}
I suspect SNMPV1Communicator(device) is not thread-safe. As far as I can see, it is not part of the SNMP4j library.
Taking a wild guess at what's going on here, try putting everything inside a synchronized() block, like this:
synchronized (DeviceScanner.class)
{
for (int j=0; j<num; j++){
read= comm.snmpGetNext(oids);
out.write(read);
this.updateHeaders(read);
}
out.flush();
}
If this works, my guess is right and the reason for the problems you're seeing is that you have many OutputStreamWriters (one on each thread), all writing to a single OutputStream. Each OutputStreamWriter has its own buffer. When this buffer is full, it passes the data to the OutputStream. It's essentially random when each OutputStreamWriter's buffer is full; it might well be in the middle of a line.
The synchronized block above means that only one thread at a time can be writing to that thread's OutputStreamWriter. The flush() at the end means that before leaving the synchronized block, the OutputStreamWriter's buffer should have been flushed to the underlying OutputStream.
Note that synchronizing in this way on the class object isn't what I'd consider best practice. You should probably be looking at using a single instance of some other kind of stream class - or something like a LinkedBlockingQueue, with all of the SNMP threads passing their data over to a single file-writing thread. I've added the synchronized as above because it was the only thing available to synchronize on within your pasted example code.
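The LinkedBlockingQueue alternative mentioned above could look roughly like this. It is a sketch under assumptions: the class name, the end-of-input marker and the "device-N value-M" lines are hypothetical, and a StringWriter stands in for the real file writer. Scanner threads only enqueue complete lines, and a single thread is the only one that ever writes.

```java
import java.io.StringWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterDemo {
    // Hypothetical end-of-input marker; it must never occur as a real line.
    private static final String POISON = "\u0000EOF";

    // Producers enqueue complete lines; one consumer thread does all the writing.
    static String collect(int producers, int linesEach) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StringWriter sink = new StringWriter(); // stand-in for a file writer
        Thread writer = new Thread(() -> {
            try {
                for (String line = queue.take(); !line.equals(POISON); line = queue.take()) {
                    sink.write(line + "\n");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            final int id = p;
            pool.execute(() -> {
                for (int j = 0; j < linesEach; j++) {
                    // stand-in for the value read via comm.snmpGetNext(oids)
                    queue.add("device-" + id + " value-" + j);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        queue.add(POISON); // all producers are done, so this is the last element
        writer.join();
        return sink.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.print(collect(3, 4));
    }
}
```

Lines from different devices may still interleave with each other, but no line is ever split, and each device's own lines stay in the order they were produced.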
You've got multiple threads all using buffered output, all writing to the same file.
There are no guarantees as to when those threads will be scheduled to run, so the order of the output will be fairly random, dictated by the thread scheduling.

How do I make "simple" throughput servlet-filter?

I'm looking to create a filter that can give me two things: the number of requests per minute and the average response time per minute. I already have the individual readings; I'm just not sure how to add them up.
My filter captures every request, and it records the time each request takes:
public void doFilter(ServletRequest request, ...()
{
long start = System.currentTimeMillis();
chain.doFilter(request, response);
long stop = System.currentTimeMillis();
String time = Util.getTimeDifferenceInSec(start, stop);
}
This information will be used to create some pretty Google Chart charts. I don't want to store the data in any database; I just need a way to get the current numbers out when requested.
As this is a high-volume application, low overhead is essential.
I'm assuming my application server doesn't provide this information.
I did something similar once. If I remember well, I had something like
public class StatisticsFilter implements ...
{
public static Statistics stats;
public class PeriodicDumpStat extends Thread
{
...
}
public void doFilter(ServletRequest request, ...()
{
long start = System.currentTimeMillis();
chain.doFilter(request, response);
long stop = System.currentTimeMillis();
stats.add( stop - start );
}
public void init()
{
Thread t = new PeriodicDumpStat();
t.setDaemon( true );
t.start();
}
}
(That's only a sketch)
Make sure that the Statistics object is correctly synchronized, as it will be accessed concurrently.
I had a background DumpStatistics thread that periodically dumped the stats to an XML file, to be processed later. For better encapsulation, I had the thread as an inner class. You can of course use Runnable as well. As @Trevor Tippins pointed out, it's also good to flag the thread as a daemon thread.
I also used Google Chart and actually had another ShowStatisticsServlet that would read the XML file and turn the data into a nice chart. The servlet did not depend on the filter, only on the XML file, so the two were decoupled. The XML file can be created as a temporary file with File.createTempFile. (Another variant would of course be to keep all data in memory, but storing the data was handy for us to back up the results of performance tests and analyze them later.)
Some colleagues claimed that the synchronization in the Statistics object would "kill" the app's performance, but in practice the cost was negligible. So was the overhead of dumping the file, given that it was done every 10 seconds or so.
Hope it helps, or gives you some ideas.
PS: And as @William Louth commented, you should write such infrastructure code only if you can't solve your issue with an existing solution. In my case, I was also benchmarking the internal time of my code, not only the complete request processing time.
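For illustration, a synchronized Statistics object of the kind described above might look like the following. This is a guess at its shape, not the original class: the method snapshotAndReset and the {count, average} return format are assumptions made for the sketch.

```java
class Statistics {
    private long count;
    private long totalMillis;

    // Called by the filter after each request.
    synchronized void add(long elapsedMillis) {
        count++;
        totalMillis += elapsedMillis;
    }

    // Called by the periodic thread: returns {request count, average ms} and starts a new period.
    synchronized double[] snapshotAndReset() {
        double average = count == 0 ? 0.0 : (double) totalMillis / count;
        double[] snapshot = { count, average };
        count = 0;
        totalMillis = 0;
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        Statistics stats = new Statistics();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    stats.add(2);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        double[] s = stats.snapshotAndReset();
        System.out.println((long) s[0] + " requests, " + s[1] + " ms average"); // prints "4000 requests, 2.0 ms average"
    }
}
```

Both methods hold the same lock, so a snapshot always sees a count and a total that belong together. Calling snapshotAndReset once a minute yields the two per-minute numbers the question asks for.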
