Implementing threads in existing code - Java

I am not a thorough Java professional, but I have programming experience, though none with threads. I have an application which currently does the following:
Make a connection to a DB.
Pull records from the DB into a collection (each record has an 'action code' from 1 to 5, besides other things).
Each record is picked one by one and, based on its action code, a particular method (one for each action code) is called from the class EVENTHANDLER.
These individual methods also use/share some other methods in EVENTHANDLER and in some other classes for common functionality.
Finally, the db_sequence is updated.
When all records are processed, the run finishes.
Now I have a requirement, which is a little vague right now, to introduce threads into the above, primarily for a performance enhancement, along with prioritizing the processing of records with some specific action codes above the others. For example, a record with action code 2 should have higher priority than 1, then 3, and then 4.
My question is, first, how to approach implementing this. Secondly, this has to be done in Java 1.6, so which classes should I use? Any direction code-wise (example code) or based on the functional flow above would be greatly helpful.
A very direct question: for the action codes (1-5) above, should I have five threads running concurrently in total, or one thread for each record (there can be hundreds), irrespective of action code?
Thanks in advance.

I'd be concerned if I were you or the person who asked you to do this.
Do you have numbers to show what the performance for the existing app is? If yes, do they exceed the target for the expected performance? I wouldn't make a judgment regarding threads until I had both.
Threading is an advanced topic that's easy to get wrong, even if you're experienced.
It sounds to me like the database portion can be a single thread. The handlers might be long-running, so I'd run those using an Executor and the newer constructs in the Java concurrency package (java.util.concurrent). Under no circumstances should you do this with raw Threads.
It sounds to me like you'll need help. I'd find someone that knows Java better than you do to consult.
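A minimal sketch of the Executor approach for this question, written without lambdas or diamonds so it should also compile on Java 6. It assumes the priority order from the question (2, then 1, 3, 4, 5) and uses hypothetical names (`PriorityRecordProcessor`, `RecordTask`); `run()` is where the matching EVENTHANDLER method would be called. A `ThreadPoolExecutor` backed by a `PriorityBlockingQueue` hands each idle worker the highest-priority waiting record, and a sequence number keeps records with the same action code in fetch order.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PriorityRecordProcessor {
    // Assumed priority order: action code 2 first, then 1, 3, 4, 5.
    // Index = action code, value = rank (lower runs first); index 0 is unused.
    private static final int[] RANK_BY_CODE = {Integer.MAX_VALUE, 1, 0, 2, 3, 4};
    private static final AtomicLong SEQUENCE = new AtomicLong();

    /** One DB record to process; subclasses call the matching EVENTHANDLER method in run(). */
    static abstract class RecordTask implements Runnable, Comparable<RecordTask> {
        final int actionCode;
        // Tie-breaker so records with the same priority keep their fetch order.
        private final long seq = SEQUENCE.getAndIncrement();

        RecordTask(int actionCode) {
            this.actionCode = actionCode;
        }

        public int compareTo(RecordTask other) {
            int byRank = compare(RANK_BY_CODE[actionCode], RANK_BY_CODE[other.actionCode]);
            return byRank != 0 ? byRank : compare(seq, other.seq);
        }

        // Integer.compare/Long.compare only arrived in Java 7.
        private static int compare(long a, long b) {
            return a < b ? -1 : (a == b ? 0 : 1);
        }
    }

    /** A fixed-size pool whose idle workers take the highest-priority queued record next. */
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new PriorityBlockingQueue<Runnable>());
    }
}
```

Usage, under the same assumptions: create one pool sized to your CPU cores or database connections (neither one thread per record nor one per action code), call `pool.execute(task)` for each fetched record, then `shutdown()` and `awaitTermination(...)` before updating db_sequence. Use `execute()`, not `submit()`: `submit()` wraps the task in a `FutureTask`, which is not `Comparable`, so the priority queue cannot order it. Priorities only apply to records waiting in the queue (the first few start immediately on idle threads), and any EVENTHANDLER state shared between handlers must be made thread-safe.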

Determine if a method call is executing on multiple threads

So recently I was asked this in an interview:
Suppose you have been given a third party black box java library, you are calling a method from that library. How would you determine if that method call is executing on multiple threads?
I mentioned that I could:
Get a thread dump and find out
Run a profiler and check from there
Check using Thread.currentThread().getName()
But they didn't seem satisfied with the answers. What would be the proper way of answering this?
There is the method Thread.activeCount(), which returns an estimate of the number of active threads in the current thread's thread group (and its subgroups).
You could store the returned value before calling the third-party method and compare it with the value returned while the method is running. That second reading has to be taken from another thread, since the calling thread is busy inside the call.
You have to make sure not to create or terminate threads outside the method while it's running; otherwise the result would be affected.
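A rough sketch of that comparison, assuming the black-box call can be wrapped in a `Runnable` (`ThreadCountProbe` and `peakExtraThreads` are hypothetical names). A sampler thread polls `Thread.activeCount()` while the call runs on the current thread, and the sampler itself is subtracted from the peak. Because `activeCount()` is only an estimate limited to the caller's thread group, a result above zero is evidence rather than proof, and threads the library reuses from a pool it created earlier will not show up at all.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadCountProbe {
    /**
     * Runs task on the calling thread while a sampler thread polls Thread.activeCount().
     * Returns how many threads above the baseline were seen at the peak (0 = none observed).
     */
    public static int peakExtraThreads(Runnable task) {
        final int baseline = Thread.activeCount();
        final AtomicInteger peak = new AtomicInteger(baseline);
        Thread sampler = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    int now = Thread.activeCount();
                    if (now > peak.get()) {
                        peak.set(now); // only this thread writes, so get/set is enough
                    }
                    try {
                        Thread.sleep(1);
                    } catch (InterruptedException e) {
                        return; // the task has finished
                    }
                }
            }
        }, "thread-count-sampler");
        sampler.setDaemon(true);
        sampler.start();
        try {
            task.run(); // the third-party call goes here
        } finally {
            sampler.interrupt();
            try {
                sampler.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Subtract one for the sampler thread itself.
        return Math.max(0, peak.get() - baseline - 1);
    }
}
```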
Well, the obvious answer would be to look at the source code. I know he said "black box", but there ain't no such thing where code is involved. It's easy if I have both a library.jar and a library-sources.jar, but even if all I have is the library.jar, I can just use a decompiler on it. That moves the problem from "evidence of multithreading" to "multithreading confirmed".
The second best way would be to RTFM. Is it documented as using threads and/or being thread-safe? Well then, there ya go. But as we all know, documentation is always lacking, so I don't have a lot of hope here. Still, sometimes you get lucky.
Then... I would go where you went, with profilers, thread dumps, etc.
That said, it's a pretty poor question. As an operational question, who cares if a library is multithreaded as long as it executes correctly and it performs expeditiously. It's not your code, so if it fails (accuracy, or performance), you don't care why - go tell whoever owns it to fix it.
As an interview question, you're evaluating problem-solving skills. Does the applicant approach it in different ways? The more ways to attack the problem (source code inspection, decompilation, profiles, thread dumps, log messages, documentation, etc.), the more senior a person you are likely dealing with. The most senior folks will give you the "who cares, if it's a black box, it's not my problem. Tell someone else to fix it and meet my SLA." :)
If you don't have a clue what skill set(s) the interviewer was looking for, then the problem may be the interviewer. Junior interviewers often assume there is only one answer (or only one correct answer), when in fact multiple exist, some better than others depending on the situation. If you've already given your answer and still get the 'frowny face', (politely) ask them what they were looking for. You'll actually learn A LOT with this question. Some will pull the 'I'm not telling, it's my interview' (a command-driven organization where collaboration is not sought). Some will try to guide you toward the right answer, or tell you the question-within-the-question, which gives you a chance to recover.
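Thread.getAllStackTraces() can be used to list the methods currently active on every live thread (StackWalker and Thread.getStackTrace() only cover a single thread at a time, StackWalker the current one). Something like this should do the job, more or less: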
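// Requires: import java.util.Map;
// Snapshot every live thread's stack and print the frames that belong to the library
for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
    for (StackTraceElement frame : entry.getValue()) {
        // "com.thirdparty." is a placeholder for the black-box library's package
        if (frame.getClassName().startsWith("com.thirdparty.")) {
            System.out.println(entry.getKey().getName() + " -> "
                    + frame.getClassName() + "." + frame.getMethodName());
        }
    }
}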
Stack traces across all threads change very quickly, so it's not a super reliable method, but it might be what they were looking for.
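Sampling repeatedly while the call is in flight makes this more reliable than a single snapshot. A sketch, assuming the library's classes share a common class-name prefix (the names `StackSampler` and `threadsSeenIn` are hypothetical): the black-box call runs on a helper thread while the current thread snapshots all stacks every few milliseconds and records which threads were executing library code. More than one name in the result suggests the call fans out to other threads.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StackSampler {
    /**
     * Calls task on a helper thread and, until it returns, repeatedly snapshots every
     * thread's stack. Returns the names of threads seen executing a frame whose class
     * name starts with classPrefix (e.g. the black-box library's package).
     */
    public static Set<String> threadsSeenIn(String classPrefix, Runnable task) {
        Thread caller = new Thread(task, "probe-caller");
        Set<String> seen = new HashSet<String>();
        caller.start();
        try {
            while (caller.isAlive()) {
                for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
                    for (StackTraceElement frame : entry.getValue()) {
                        if (frame.getClassName().startsWith(classPrefix)) {
                            seen.add(entry.getKey().getName());
                            break; // one matching frame is enough for this thread
                        }
                    }
                }
                Thread.sleep(5); // sampling interval; short calls may still slip through
            }
            caller.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```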

Agent-based modeling in AnyLogic

I need help :(
I am new to AnyLogic. The problem is that I have 4 identical machines, and each machine has 5 different critical parts.
I want these critical parts to make up one machine. What I tried: I created a machine agent type with a population of 4, and inside the machine agent's diagram I created 5 critical part agent types (i.e. cp1, cp2, ..., cp5), each with an initial number of agents = 1, and I extended those CPs to the machine agent type. Is this correct? I am confused because I have 4 machines: should the initial number of each CP be 4, so they can be distributed to the 4 machines?
I know it is a very stupid question :)
Thank you
If this behavior only occurs in the case of a failure, you can model it in a different way. Incorporate failures in the ResourcePool and select the flowchart option (instead of modeling it with a delay). In that flowchart you have a pickup (or a similar action) from a queue that should contain the spare parts. Tweaking this behavior will probably be a better approach than modeling the 5 critical parts and using them all.
I would suggest the following approach.
Create a resource pool for each part and require its use in the service (see image):
Then, for each of the resource pools, model failures as in the picture, with the repair task as a flowchart.
You will need a queue to represent the spare parts storage. From there you can remove the specific part you want (this will require you to model that information in the agent type and then search the queue, but I expect you know how to do that).
The repair task is very simple in my example but you can and should improve it to your needs.
Hope this is enough for you to solve your problem.
Best regards,
Luís

What action should a client application take after executing a command?

Background
This question is best illustrated using an example. Say I have a client application (e.g. desktop application, mobile app, etc.) that consumes information from a web service. One of the screens has a list of products that are queried from the web service when the client application starts up and are bound to the UI element. Now, the user creates a new product. This causes the client application to send a command to the web service to add that product to a database.
Question
In the client application, what should happen after the command is issued and is successful? Do you:
Query the full product list from the service and refresh the entire product list in the client application?
Query just the newly added product(s) and add them to the product list?
Don't query, and instead just use the information available in the client application to create the new products in the GUI, and then add them to the list?
The same questions apply to updates too. If you update a product, do you get confirmation of a successful update from the service, and then just let the GUI update the product without further requests to the service?
Edit - Additional details added
From initial feedback, the takeaway appears to be: go with the simplest approach unless it:
Leads to performance concerns
Negatively impacts user experience
There is a major portion of my application where the main way to interact with it is to drag grid records between a number of different grids. For example, dragging a product onto another grid would create a new order, which would need to be sent to the service. Some of these grids are more complex than a standard grid: records can be grouped, and each group can be collapsed/expanded. In this case, while the grid can be refreshed from the service very quickly, this would probably lead to usability concerns: when a grid is refreshed with all new data, any groups the user had expanded or collapsed would be lost.
So, while most grids in my application could probably just be refreshed all at once, the more complex ones will need to be updated more carefully. I would think this lends itself to option 1 or 2 (at least for creating new records). One thought I had was that the client application could create GUIDs for new records and send them with the request. That way, no follow-up query would need to be made to the service, as the client application would already have the unique ID. The client application would then just wait for a successful response from the service before showing the user the new record.
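A minimal sketch of the client-generated-ID idea, assuming the service accepts an ID chosen by the client and reports success or failure. `ProductService`, `Product` and `addProduct` are hypothetical names, and the synchronous `create` call stands in for the real web-service request. No follow-up query is needed because the client already knows the new record's ID, and the row is only shown once the service confirms it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ClientGeneratedIdExample {
    /** Hypothetical service boundary; a real client would make the web-service call here. */
    interface ProductService {
        boolean create(UUID id, String name); // true if the service stored the product
    }

    static final class Product {
        final UUID id;
        final String name;

        Product(UUID id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final List<Product> visibleProducts = new ArrayList<Product>();
    private final ProductService service;

    ClientGeneratedIdExample(ProductService service) {
        this.service = service;
    }

    /** Generates the ID locally, sends the command, and shows the row only after success. */
    Product addProduct(String name) {
        UUID id = UUID.randomUUID();
        if (!service.create(id, name)) {
            return null; // report the error instead of showing a row that was never saved
        }
        Product product = new Product(id, name);
        visibleProducts.add(product); // no re-query: the client already has the ID
        return product;
    }

    List<Product> visibleProducts() {
        return visibleProducts;
    }
}
```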
Get the whole list
I guess it depends on how costly the request/response is. If possible and efficient, I would always choose your first option (get the whole list) until there is a performance concern.
As the saying goes:
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization – For experts only: Don't do it yet.
There are simply fewer scenarios to cover, less code to write, and less code to maintain, since you'll need the "get the whole list" service no matter what.
It also returns the "most up to date list of products" in case another client added products simultaneously.
Only pros, until there is a performance concern, in my opinion. These last 3 words would imply that this question will only lead to opinions and should be closed...
I don't think there's any definitive right answer; these kinds of questions need to be thought of on a case by case basis. #3 by itself is often not an option - for example, if you need the client to have a database-generated field like an ID, it's gotta get from point A to point B somehow. You also need to think about how you're exposing any errors to your user, because it's a terrible experience if you make it appear that everything succeeded, but you actually had an error and the product didn't really save.
Beyond that, I'd look at usability as my next criterion. What's the experience like for your users if you refresh the list versus adding just a couple of products? Is there a significant difference? A lot comes down to your specific application, and also the workflow being done. If adding products is the main part of someone's job, where they may spend hours a day doing it, shaving even a second off the time is a real win for your users, while if it's an uncommon workflow that people do from time to time, the performance expectations are somewhat lower.
And last I'd look at code maintenance and complexity. If two paths are giving relatively similar experiences, pick the one that's easier to build and maintain.
There are other options, too. You can go with a hybrid approach - for example, maybe on the client you add the data to the product list immediately (perhaps showing some kind of "saving" indicator), while also asynchronously querying the database so you can refresh the product listing and report any errors. Such approaches tend to be the most complex, but you might go down that route if usability demands it.
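A sketch of the hybrid approach, assuming the save is exposed as a `CompletableFuture<Boolean>` (Java 8+); `OptimisticProductList` and `Row` are hypothetical names. The row appears at once with a `saving` flag, is confirmed when the save succeeds, and is removed again if the save fails or completes exceptionally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class OptimisticProductList {
    /** A row in the grid; saving would drive a "saving..." indicator. */
    static final class Row {
        final String name;
        volatile boolean saving = true;

        Row(String name) {
            this.name = name;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    /** Shows the row immediately, then confirms or rolls it back when the async save completes. */
    public CompletableFuture<Boolean> add(String name, CompletableFuture<Boolean> save) {
        Row row = new Row(name);
        synchronized (rows) {
            rows.add(row);
        }
        return save.whenComplete((ok, error) -> {
            if (error == null && Boolean.TRUE.equals(ok)) {
                row.saving = false; // confirmed by the service
            } else {
                synchronized (rows) {
                    rows.remove(row); // roll back; a real UI would also report the error here
                }
            }
        });
    }

    /** Copy of the current rows, safe to iterate while saves complete. */
    public List<Row> snapshot() {
        synchronized (rows) {
            return new ArrayList<>(rows);
        }
    }
}
```

In a real desktop client the completion callback usually runs on a background thread, so changes to visible controls would have to be handed to the UI thread (for example with `SwingUtilities.invokeLater` or JavaFX's `Platform.runLater`).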

Writing hundreds of data objects to a Mongo database

I am working on a Minecraft network which has several servers manipulating 'user objects', each of which is just a Mongo document. After a user object is modified, it needs to be written to the database immediately; otherwise it may be overwritten by other servers (which have an older version of the user object). But sometimes hundreds of objects need to be written in a short amount of time (a few seconds). My question is: how can I easily write objects to a MongoDB database without overloading it?
I have been thinking of an idea, but I have no idea whether it is relevant:
- Create some sort of queue in another thread. Every time a data object needs to be saved to the database, it goes into the queue, and the 'queue thread' saves the objects one by one at some interval.
Thanks in advance
BTW, I'm using Morphia as the framework in Java.
"hundreds of objects [...] in a few seconds" doesn't sound that much. How much can you do at the moment?
The setting most important for the speed of write operations is the WriteConcern. What are you using at the moment and is this the right setting for your project (data safety vs speed)?
If you need to do many write operations at once, you can probably speed things up with bulk operations. They were added in MongoDB 2.6, and Morphia supports them as well.
I would be very cautious with a queue:
Do you really need it? Depending on your hardware and configuration you should be able to do hundreds or even thousands of write operations per second.
Is async really the best approach for you? The producer of the write operation / message can only assume his change has been applied, but it probably has not and is still waiting in the queue to be written. Is this the intended behaviour?
Does it make your life easier? You need to know another piece of software, which adds many new and most likely unforeseen problems.
If you need to scale your writes, why not use sharding? No additional technology and your code will behave the same with and without it.
You might want to read the following blogpost on why you probably want to avoid queues for this kind of operation in general: http://widgetsandshit.com/teddziuba/2011/02/the-case-against-queues.html

Cache update with db changes

We have a Java-based product which keeps a Calculation object in the database as a blob. At runtime we keep it in memory for fast performance. Another process updates this Calculation object in the database at regular intervals. What would be the best strategy so that, when this object gets updated in the database, the cache removes the stored object and fetches it again from the database?
I would prefer not to use a caching framework unless it is a must.
I'd appreciate responses on this.
It is very difficult to give you good answer to your question without any knowledge of your system architecture, design constraints, your IT strategy etc.
Personally, I would use the Messaging pattern to solve this issue. A few advantages of that pattern are:
Your system components (Calculation process, update process) can be loosely coupled
Depending on the implementation of the Messaging pattern, you can "connect" many Calculation processes (scaling out) and many update processes (with a master-slave approach).
However, implementing Messaging pattern might be very challenging task and I would recommend taking one of the existing frameworks or products.
I hope that will help at least a bit.
I did some work similar to your scenario before; generally there are 2 ways.
First, the cache holder polls the database regularly, fetches the data it needs and keeps it in memory. The data can be stored in a HashMap or some other collection. This approach is simple and easy to implement, and no extra framework or library is needed. But users will have to put up with stale data from time to time. Besides, polling puts a lot of pressure on the DB if the number of pollers is large or the query is not fast enough. However, it is generally not a bad option if your real-time requirements are not that strict and your system is relatively small.
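A minimal sketch of the polling approach, assuming a `Function` that reads one object (for example the Calculation blob) from the database by key; `PollingCache` and the loader are hypothetical. A single daemon thread re-reads every cached key at a fixed rate, so readers may see stale data for up to one period, which is the trade-off described above. The loader must not return `null`, since `ConcurrentHashMap` rejects null values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class PollingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical DB read, e.g. of the Calculation blob
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "cache-poller");
        t.setDaemon(true); // don't keep the JVM alive just for cache refreshes
        return t;
    });

    public PollingCache(Function<K, V> loader, long period, TimeUnit unit) {
        this.loader = loader;
        poller.scheduleAtFixedRate(() -> {
            try {
                refreshAll();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs; try again next period.
            }
        }, period, period, unit);
    }

    /** Returns the cached value, loading it from the database on first access. */
    public V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Re-reads every cached key; package-private so it can also be triggered directly. */
    void refreshAll() {
        for (K key : entries.keySet()) {
            entries.put(key, loader.apply(key));
        }
    }

    public void close() {
        poller.shutdownNow();
    }
}
```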
The other approach is that the cache holder subscribes to notifications from the data updater and updates its data after being notified. It provides a better user experience, but it brings more complexity to your system, because you have to get some messaging infrastructure, such as JMS, involved. Development and tuning are more time-consuming.
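On the cache side, the notification approach can be reduced to an invalidation hook. This sketch assumes something (a JMS `MessageListener`, for instance) calls `onUpdated(key)` when the updater commits a change; `InvalidatingCache` and the loader are hypothetical names. The next `get` after an invalidation reloads the value from the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class InvalidatingCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical DB read; must not return null

    public InvalidatingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading it from the database if it is missing. */
    public V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Wire this to the updater's notification; the next get() reloads the value. */
    public void onUpdated(K key) {
        entries.remove(key);
    }
}
```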
I know I am quite late responding to this, but it might help somebody searching for the same issue.
Here was my problem: I was storing requestPerMinute information in a HashMap in a Java filter, which gets loaded when the application starts. The problem is that if somebody updates the DB with new information, the map doesn't know about it.
Solution: I added a variable updateTime to my Java filter, which stores when the HashMap was last updated. On every request it checks whether more than 24 hours have passed since then, and if so, it reloads the HashMap from the database. So every 24 hours it refreshes the whole HashMap.
My use case did not require real-time updates, so this approach fits it.
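A sketch of that check-on-request refresh, with hypothetical names (`DailyRefreshedLimits`, the loader `Supplier`) and an injectable clock so it can be tested; in production the clock would be `System::nanoTime`, a monotonic source suited to measuring elapsed time. Each call to `current()` reloads the whole map if it is more than 24 hours old and otherwise returns the cached copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class DailyRefreshedLimits {
    private static final long MAX_AGE_NANOS = TimeUnit.HOURS.toNanos(24);

    private final Supplier<Map<String, Integer>> loadFromDb; // hypothetical DB query
    private final LongSupplier clock;                         // System::nanoTime in production
    private Map<String, Integer> limits;
    private long loadedAt; // plays the role of updateTime in the answer above

    DailyRefreshedLimits(Supplier<Map<String, Integer>> loadFromDb, LongSupplier clock) {
        this.loadFromDb = loadFromDb;
        this.clock = clock;
        reload();
    }

    /** Called on every request; reloads the whole map if it is more than 24 hours old. */
    synchronized Map<String, Integer> current() {
        if (clock.getAsLong() - loadedAt > MAX_AGE_NANOS) {
            reload();
        }
        return limits;
    }

    // Only called from the constructor and from the synchronized current().
    private void reload() {
        limits = Collections.unmodifiableMap(new HashMap<>(loadFromDb.get()));
        loadedAt = clock.getAsLong();
    }
}
```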
