Exchange data in real time over AJAX with multiple threads

I am developing an application in JSF 2.0 and I would like to have a multiline textbox which displays output data which is being read (line by line) from a file in real time.
So the goal is to have a page with a button on it that triggers the backend to start reading from the file and then displaying the results as it's reading in the textbox.
I had thought about doing this in the following way:
Have the local page keep track of what lines it has retrieved/displayed in the textbox so far.
Periodically the local page will poll the backend using AJAX and request any new data that has been read (tell it what lines the page has so far and only retrieve the new lines since then).
This will continue until the entire file has been completely retrieved.
The issue is that the bean method that reads from the file is running a while loop that blocks. So to read from the data structure it is writing to at the same time will require using additional Threads, correct? I hear that spawning new Threads in a web application is a potentially dangerous move and that Thread pools should be used, etc.
Can anyone shed some insight on this?
Update: I tried a couple of different things with no luck. But I did manage to get it working by spawning a separate Thread to run my blocking loop while the main thread could be used to read from it whenever an AJAX request is processed. Is there a good library I could use to do something similar to this that still gives JSF some lifecycle control over this Thread?

Have you considered using the Future interface (part of the java.util.concurrent API since Java 5)? Basically, as you read in the file, you could split it into sections and submit each section as a task that returns a Future object. Each Future then yields its result once its computation has completed.
This way you avoid having to access the structure while it is still being manipulated by the loop, and you also split the operation into smaller computations, reducing the amount of time any single lock is held (total lock time might be greater, but other areas get a faster response). If you maintain the order in which your Future objects were created, then you don't need to track line numbers. Note that calling Future.get() blocks until the result is ready, so check Future.isDone() first if you only want completed sections.
The rest of your approach would be similar: make the AJAX call to get the content of all 'ready' Future objects from a FIFO queue.
I think I understand what you're trying to accomplish, but a bit more info would help.
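As a rough illustration of the polling idea from the question (the page remembers how many lines it has and asks only for newer ones), here is a minimal sketch. The class name LineTail and its methods are hypothetical, not part of JSF or any library, and it assumes a plain ExecutorService is available; in a container, a container-managed executor would be the safer source of threads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper: reads lines in the background and serves them to pollers. */
class LineTail {
    private final List<String> lines = new ArrayList<>();
    private boolean done = false;

    /** Starts reading on the given executor; the Future completes at end of input. */
    Future<?> start(BufferedReader reader, ExecutorService executor) {
        return executor.submit(() -> {
            try (BufferedReader in = reader) {
                String line;
                while ((line = in.readLine()) != null) {
                    synchronized (this) {
                        lines.add(line);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                synchronized (this) {
                    done = true;
                }
            }
        });
    }

    /** Returns a copy of the lines read so far, starting at index from. */
    synchronized List<String> linesSince(int from) {
        int start = Math.min(from, lines.size());
        return new ArrayList<>(lines.subList(start, lines.size()));
    }

    /** True once the reader has hit end of input (or failed). */
    synchronized boolean isDone() {
        return done;
    }
}
```

An AJAX handler would call linesSince(n) with the number of lines the page already shows, append the result to the textbox, and stop polling once isDone() is true and no new lines come back.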

Related

Reading huge file in Java

I read a huge file (almost 5 million lines). Each line contains a date and a request, and I must parse the requests between two specific dates. I use a BufferedReader to read the file up to the start date and then start parsing lines. Can I use threads for parsing the lines, because it takes a lot of time?

It isn't entirely clear from your question, but it sounds like you are reparsing your 5 million-line file every time a client requests data. You certainly can solve the problem by throwing more threads and more CPU cores at it, but a better solution would be to improve the efficiency of your application by eliminating duplicate work.
If this is the case, you should redesign your application to avoid reparsing the entire file on every request. Ideally you should store data in a database or in-memory instead of processing a flat text file on every request. Then on a request, look up the information in the database or in-memory data structure.
If you cannot eliminate the 5 million-line file entirely, you can periodically recheck the large file for changes, skip/seek to the end of the last record that was parsed, then parse only new records and update the database or in-memory data structure. This can all optionally be done in a separate thread.
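The "seek to the end of the last record that was parsed" step could be sketched as follows. IncrementalReader is a hypothetical name, and the sketch assumes the file is UTF-8, is only ever appended to, and that the writer always finishes a line before the reader runs (a half-written last line would be returned as-is).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: returns only the lines appended since the previous call. */
class IncrementalReader {
    // Byte position just past the last line already consumed.
    private long offset = 0;

    List<String> readNewLines(String path) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset); // skip everything parsed on earlier calls
            String raw;
            while ((raw = file.readLine()) != null) {
                // readLine() maps each byte to one char, so re-decode as UTF-8.
                byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
                result.add(new String(bytes, StandardCharsets.UTF_8));
                offset = file.getFilePointer();
            }
        }
        return result;
    }
}
```

RandomAccessFile.readLine() is unbuffered, so this is only reasonable when each call picks up a modest number of new lines; the initial pass over the whole file is better done with a BufferedReader.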

Firstly, 5 million lines of 1000 characters is only about 5 GB, which is not necessarily prohibitive for a JVM. If this is actually a critical use case with lots of hits, then buying more memory is almost certainly the right thing to do.
Secondly, if that is not possible, most likely the right thing to do is to build an ordered Map based on the date. So every date is a key in the map and points to a list of line numbers which contain the requests. You can then go directly to the relevant line numbers.
Something of the form
TreeMap<Date, ArrayList<Integer>>
would do nicely (a TreeMap keeps its keys sorted; a HashMap does not). Storing one 32-bit line number per line takes on the order of 5,000,000*32/8 bytes = 20 MB, plus some per-object overhead, which should be fine.
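A date-to-line-numbers index with a range lookup might look like the sketch below. DateIndex and its methods are hypothetical names, and java.time.LocalDate (Java 8+) is used in place of java.util.Date purely for convenience.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical index: date -> numbers of the lines recorded on that date. */
class DateIndex {
    private final NavigableMap<LocalDate, List<Integer>> index = new TreeMap<>();

    /** Records that the given line belongs to the given date. */
    void add(LocalDate date, int lineNumber) {
        index.computeIfAbsent(date, d -> new ArrayList<>()).add(lineNumber);
    }

    /** Returns line numbers for all dates in [from, to], in date order. */
    List<Integer> linesBetween(LocalDate from, LocalDate to) {
        List<Integer> result = new ArrayList<>();
        for (List<Integer> lines : index.subMap(from, true, to, true).values()) {
            result.addAll(lines);
        }
        return result;
    }
}
```

The sorted keys are what make subMap() possible; with a hash-based map, every key would have to be compared against the date range.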
You could also use the FileChannel class to keep the I/O handle open as you jump from one line to another. It also allows memory mapping.
See http://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html
And http://en.wikipedia.org/wiki/Memory-mapped_file
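A minimal sketch of jumping to a line through a FileChannel that stays open is shown below. MappedLines is a hypothetical name; the sketch assumes UTF-8 text, that the index stores byte offsets of line starts (line numbers alone are not enough to seek), and that no line is longer than the 64 KB window.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: keeps one channel open and reads lines by byte offset. */
class MappedLines implements AutoCloseable {
    // Assumed upper bound on the length of a single line, in bytes.
    private static final int WINDOW = 64 * 1024;
    private final FileChannel channel;

    MappedLines(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Reads the line that starts at byteOffset, excluding the newline. */
    String lineAt(long byteOffset) throws IOException {
        long size = Math.min(WINDOW, channel.size() - byteOffset);
        MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, byteOffset, size);
        int length = 0;
        while (length < buffer.limit() && buffer.get(length) != '\n') {
            length++;
        }
        byte[] bytes = new byte[length];
        buffer.get(bytes); // relative read from the start of the window
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Mapping a small window per lookup sidesteps the 2 GB limit on a single mapping, at the cost of one map() call per line; mapping larger regions once and reusing them would be the next refinement.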

A good way to parallelize a lot of small tasks is to wrap the processing of each task in a FutureTask and then pass each task to a ThreadPoolExecutor to run them. The executor should be sized to the number of CPU cores your system has available.
When you call executor.execute(future), the future will be queued for background processing. To avoid creating and destroying too many threads, the ThreadPoolExecutor will only create as many threads as you specified and execute the futures one after another.
To retrieve the result of a future, call future.get(). If the future hasn't completed yet (or hasn't even started), this method will block until it has. Other futures keep executing in the background while you wait.
Remember to call executor.shutdown() when you don't need the executor anymore, so that it terminates the background threads it would otherwise keep around.
tl;dr pseudocode:
create executor
for each line in file
    create new FutureTask which parses that line
    pass future task to executor
    add future task to a list
for each entry in task list
    call entry.get() to retrieve result
executor.shutdown()
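The pseudocode above could translate into Java roughly as follows. ParallelParser and its parse() placeholder are hypothetical; the sketch uses ExecutorService.submit(), which creates the FutureTask internally, rather than constructing FutureTask objects by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: parse lines in parallel, collect results in order. */
class ParallelParser {
    /** Placeholder parse step; a real parser would extract the date and request. */
    static String parse(String line) {
        return line.trim().toUpperCase();
    }

    static List<String> parseAll(List<String> lines)
            throws InterruptedException, ExecutionException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(cores);
        try {
            // Queue one task per line and remember the futures in input order.
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                futures.add(executor.submit(() -> parse(line)));
            }
            // get() blocks until the corresponding task has finished.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }
}
```

With millions of very short lines, one task per line may cost more in scheduling than it saves; submitting batches of lines per task is a common adjustment.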

Java simple Analytics/Event Stream Processing with front end

My application takes a lot of measurements of its internal processes. For example, I time certain methods, I time external webservice calls, and I also have variables which have a changing value and processes which have a 'state' (e.g. PAUSED, WAITING etc).
The application uses 100 to 200 threads, and each bit of data would be associated with a particular thread.
I am looking for some software that I can channel all this information into that would produce useful metrics and graphs of the data (ideally in real time or close to real time), let me set thresholds to trigger warnings, would allow me to filter the data by thread or thread group, etc etc.
The application is performing time critical tasks so the software/api would need to be very fast and never block.
The application is written in java, and ideally the software/api would be in java as well. I think what I'm looking for is called Event Stream Processing, but I'm really not sure what language to use to describe it.
All I've found so far are Esper and ERMA. Can anyone give me a recommendation? I'm the only one working on this project so I'm hoping for something that is pretty easy to set up and use, and has a workable front end.

In the end I found Graphite which was pretty close to being exactly what I wanted. Not the simplest to set up and configure however, but I got it working in the end.
http://graphite.wikidot.com/
In my case I send data directly from my application to StatsD (via UDP), which collects the data and does some pre-processing before it ends up in the Whisper back end. There is a simple example of a Java interface here: https://github.com/etsy/statsd/commit/2253223f3c19d2149d65ec5bc802198ff93da4cb
Alternatively you could send your data directly to graphite, example here http://neopatel.blogspot.co.uk/2011/04/logging-to-graphite-monitoring-tool.html
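For a sense of how little code the UDP route needs, here is a minimal sketch that sends one timing value in the StatsD line format (name:value|ms). StatsdTimer and the metric name used in the note below are hypothetical; this is not the client from the linked commit.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical fire-and-forget sender for StatsD timing metrics. */
class StatsdTimer {
    /** Builds a StatsD timing line, e.g. "app.db.query:12|ms". */
    static String format(String metric, long millis) {
        return metric + ":" + millis + "|ms";
    }

    /** Sends one timing value over UDP; no reply is expected. */
    static void send(String host, int port, String metric, long millis)
            throws IOException {
        byte[] payload = format(metric, millis).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            InetAddress address = InetAddress.getByName(host);
            socket.send(new DatagramPacket(payload, payload.length, address, port));
        }
    }
}
```

A call such as StatsdTimer.send("localhost", 8125, "app.db.query", 12) would target StatsD's default port. For a time-critical application, reusing one socket and resolving the address once would avoid per-call overhead.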

Graphically/Text display of thread progress and status

I am working on a program (Java) that uses concurrent threading quite heavily. I run into issues with the work being performed by these threads very regularly. It's not an issue with the actual thread handling, instead it is the actual stuff it's doing (db access, math computations, file IO etc).
I would like to provide some way of seeing the status of threads in realtime from the console. Perhaps something like this:
THREAD ID THREAD STATUS TABLE NAME ELAPSED TIME
Thread 1: Dumping MSF011 22s
Thread 2: Conversion MSF002 2h 8m
Thread 3: Conversion MSF020 10s
Thread 4: Loading MSF001 14m
ITEMS LEFT IN QUEUE: MSF033, MSF123, MSFXYZ
sort of thing.
Ideally I'd like to see that updated in place (so no new lines etc.), but I am open to ANY idea that lets me see information like this quickly.

How important is the console output? I mean, will other mechanisms (ie graphical) be ok?
Either way, I'd approach it as two steps.
1. Instrument your threads
2. Display the instrument data
Instrument your threads
If JConsole and the default thread information (WAITING, stack traces etc.) aren't enough, you can get your threads to post updates to their state as they go along. I like to use MBeans for this, because that way you can separate the posting of updates from the reading. Otherwise you could update some shared location with the state and have the reading done in the same VM. Perhaps even dump the process information to a file?
Display the instrument data
Once you've got the threads updating their process information, displaying it should be straightforward. If you really want console output that does not scroll, I think something like ncurses is your only choice.
Otherwise, it's probably simpler to write a little UI that reads the instrument data and updates a display. You can read this data via the MBean server if you're using MBeans (and so physically separate the UI from the server), or just read it from, say, a file. JFreeChart is nice if you want some pretty graphs.
Having said all that, Haim has written a 'top' style thing to monitor threads. See here. Might be useful
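The "shared location with the state" variant can be quite small. The sketch below is one possible shape, with StatusBoard and its methods being hypothetical names: worker threads post their current status, and a separate reader renders a table like the one in the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical shared status board that worker threads post updates to. */
class StatusBoard {
    private static final class Entry {
        final String status;
        final String table;
        final long startedNanos;

        Entry(String status, String table, long startedNanos) {
            this.status = status;
            this.table = table;
            this.startedNanos = startedNanos;
        }
    }

    // Sorted by thread name so the rendered rows keep a stable order.
    private final Map<String, Entry> entries = new ConcurrentSkipListMap<>();

    /** Called by a worker whenever it starts a new step. */
    void update(String threadName, String status, String table) {
        entries.put(threadName, new Entry(status, table, System.nanoTime()));
    }

    /** Renders one row per thread with the seconds spent in the current step. */
    String render() {
        StringBuilder out = new StringBuilder(
                String.format("%-12s %-12s %-10s %s%n", "THREAD", "STATUS", "TABLE", "ELAPSED"));
        long now = System.nanoTime();
        entries.forEach((name, e) -> out.append(String.format(
                "%-12s %-12s %-10s %ds%n",
                name, e.status, e.table, (now - e.startedNanos) / 1_000_000_000L)));
        return out.toString();
    }
}
```

A worker would call update(Thread.currentThread().getName(), "Dumping", "MSF011") at each step. On terminals that understand ANSI escape codes, printing "\033[H\033[2J" before each render() clears the screen, which gives a rough update-in-place effect without ncurses.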

Handling asynchronous saving with the possibility of time-critical errors?

So, to explain this, I'll start out by going through the application stack.
The system is running JSP with jQuery on top, talking through a controller layer with a service layer, which in turn utilizes a persistence layer implemented in Hibernate.
Now, traditionally, errors like overlapping contracts have been handled by throwing exceptions up through the layers until they're translated into an error message for the user.
Now I have an object that at any given time can only be tied to one contract. At the moment, when I save a contract, I look at all of these objects and check if they're already covered by an existing contract. However, since multiple clients can be saving at any given time, this introduces the risk of getting past the check on two separate contracts, leading to one object being tied to two contracts at the same time.
To combat this, the idea was to use a queue, put objects into the queue from the main thread, and then have a separate thread take them out one by one, saving them.
However, here's the problem. For one, I would like the user to know that the saving is currently happening. For another, if by accident the scenario above happens and two contracts covering the same object at the same time are in the queue, the second one will fail, and this needs to be sent back to the user.
My initial attempt was to keep data fields on the object put into the queue, and then check against those in a blocking wait, and then throw an exception or report success based on what happens. That deadlocked the system completely.
Anyone able to point me in the right direction with regards to techniques and patterns I should be using for this?

I can't really tell why you have a deadlock without seeing your code. I can think of some other options though:
Poll the thread to see its state (not as good).
Use some kind of eventing system. You would have an event listener (OverlappingContractEventListener perhaps) and then you would trigger the event from the thread when the scenario happens. The event handler would need to persist this information somehow.
If you are going for this approach, then on the client side you will need to poll.
You can poll a specific controller (using setInterval and AJAX) that looks up the corresponding information for the object to see what state it's in. This information should have been persisted by your event listener.
You can use web workers (this is supported in Chrome, Firefox, Safari, and Opera. IE will support it in 10) and perform the polling in the background.
There is one other way that doesn't involve eventing. It depends on you figuring out the source of your deadlock though. Once you fix the source of your deadlock you can do one of two things:
Perform an AJAX call to the controller. The controller will wait for the service to return information. The code to issue feedback to the user will be inside the success handler of your AJAX call.
Use a web worker to perform the call in the background. The web worker would also perform an AJAX call and wait for the response.
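One way to have the controller wait on a queued save without hand-rolled blocking checks is to let a single-threaded executor act as the queue and hand back a Future. The sketch below is an assumption-laden illustration: ContractSaver is a hypothetical class, and the in-memory set stands in for the real Hibernate persistence check.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical service: saves run one at a time, callers wait on a Future. */
class ContractSaver {
    // A single worker thread serializes all saves, so check-then-save cannot race.
    private final ExecutorService saveQueue = Executors.newSingleThreadExecutor();
    // Stand-in for the persistence layer; only touched by the save thread.
    private final Set<String> coveredObjects = new HashSet<>();

    /** Queues a save; the Future fails if the object is already covered. */
    Future<String> save(String contractId, String objectId) {
        return saveQueue.submit(() -> {
            if (!coveredObjects.add(objectId)) {
                throw new IllegalStateException(
                        "Object " + objectId + " is already covered by a contract");
            }
            return contractId;
        });
    }

    /** What a controller could do: wait with a timeout, then report the outcome. */
    String saveAndReport(String contractId, String objectId) {
        try {
            return "Saved " + save(contractId, objectId).get(10, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            return "Error: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            return "Still saving, check back later";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted";
        }
    }

    void shutdown() {
        saveQueue.shutdown();
    }
}
```

The exception thrown inside the queued task travels back to the waiting caller wrapped in an ExecutionException, so the existing "throw it up the layers" error handling can stay in place.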

Shouldn't you be doing the check for duplicate contracts in the database? Depending on the case, you can do this with a constraint, trigger, or stored procedure. If it fails, send an exception up the stack. That's normally the way to handle things like this. You can then catch the error in jQuery and display a message:
jQuery Ajax error handling, show custom exception messages
Hope this helps.

Multithreading a jsp?

I'm new to jersey, jsp's and web application development in general so hopefully this isn't a silly question. I've got a jsp and currently when the user hits a button on it, it starts a HTTP request which takes about 5-10 minutes to return. Once it finishes they're redirected to another page.
I'm wondering, is it possible or even advisable to multithread the application so that the heavy processing starts but the user gets redirected to the next .jsp right away? If multithreading is not possible, is there another method that you would recommend for dealing with heavy processing in a web application?

A JSP is basically a Servlet (it's translated into a Java Servlet class and compiled). Theoretically you can start a new thread in a servlet (and hence in a JSP, via a scriptlet), but that's really not advised for multiple reasons.
A better approach is to make an asynchronous HTTP call via AJAX, immediately show something else to the user once the call has been issued, and display the results when the callback fires.

Rather than creating a new thread each time, it might be more efficient to have a worker thread which continually polls a shared queue. Using, for example, ArrayBlockingQueue, your web request can simply add an object to the queue and return to the user, and your worker thread (or repeating scheduled job) can take care of the heavyweight processing.
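A minimal sketch of that worker, assuming jobs can be described by a String and that a queue capacity of 100 is acceptable (both are illustrative choices, as is the JobWorker name):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical worker: request threads enqueue jobs, one thread processes them. */
class JobWorker implements Runnable {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
    private final Consumer<String> handler;

    JobWorker(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Called from the request thread; returns false at once if the queue is full. */
    boolean enqueue(String job) {
        return queue.offer(job);
    }

    @Override
    public void run() {
        try {
            while (true) {
                handler.accept(queue.take()); // blocks while the queue is empty
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupt is the stop signal
        }
    }
}
```

The worker would be started once (for example from a ServletContextListener) with new Thread(worker).start() and interrupted on shutdown. Because offer() never blocks, the request can return to the user immediately and report "busy" when the queue is full.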

Instead of waiting for the process to complete in a JSP, you can create a TimerTask (or Quartz Job), schedule it for immediate execution, and redirect the user to some other page. Have that job store the result in some central place that can be accessed by another JSP (in case you want to pull the result of the job later, maybe through AJAX). Doing so, you save yourself from managing threads manually (which is error-prone), you get async functionality, and the user does not need to stare at a blank browser screen for 5-10 minutes.

It is possible.
Create a thread, store its reference somewhere that is available everywhere (a static Map) and store its key (in the session, in the code of the JSP's answer).
Following calls can retrieve the thread and check its state/results.
Anyway, use with care:
a) You will need to make sure that old results are deleted. It is inevitable that sometimes the browser will be closed, so you need a watchdog to clear data that is obviously no longer needed.
b) Users are not used to this kind of behavior. There is a serious risk that they will just "go back" and try to launch the thread again, and again, and again. Try to control this (ideally the id of the thread will be linked to the user, so that as long as an older thread is active a user cannot launch another one).
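The static-map idea, including the "one active job per user" guard from point b), could be sketched as below. JobRegistry, its method names and the pool size of 4 are hypothetical, and the sketch stores Future objects from a small thread pool rather than raw Thread references.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical registry of long-running jobs, keyed by user id. */
class JobRegistry {
    // Daemon threads so an idle pool never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private static final Map<String, Future<String>> JOBS = new ConcurrentHashMap<>();

    /** Starts work for the user unless an earlier job of theirs is still running. */
    static synchronized boolean start(String userId, Callable<String> work) {
        Future<String> existing = JOBS.get(userId);
        if (existing != null && !existing.isDone()) {
            return false;
        }
        JOBS.put(userId, POOL.submit(work));
        return true;
    }

    /** Reports UNKNOWN, RUNNING or DONE for the user's most recent job. */
    static String status(String userId) {
        Future<String> job = JOBS.get(userId);
        if (job == null) {
            return "UNKNOWN";
        }
        return job.isDone() ? "DONE" : "RUNNING";
    }

    /** Cleanup hook for the watchdog mentioned in point a). */
    static void forget(String userId) {
        JOBS.remove(userId);
    }
}
```

The first request calls start() and redirects; later requests (or AJAX polls) call status() and fetch the result once it reports DONE. A session listener or scheduled task calling forget() would play the watchdog role.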
