"Caching" Alternatives on Process Intense Calculations


Background: I have done a bit of looking into Caching in Spring and it seems like a great way to save time for common read operations. My code currently has a loop over a large number of items, where I am performing logic to see if certain other objects are connected in a way through common items. A way to think about this is similar to a shopping website's related items showing up when you view a certain item. The values I use to determine this are complex, but that is the basic idea.
On loading the item page there is a very long load time while the system computes which other items are related, so that it can display links to them. Instead of computing this list every time an item page loads, I have started "caching" items with a list of their recommended items. Many things in the system can trigger a need to recalculate these relations: adding/removing properties to items, adding/removing items, etc.
Problem: My "cache" is simply a singleton object containing a Map of items to their related objects. Iterating through every item in the system whenever any change to the cache is needed is very time-consuming and process-intensive. Java caches don't seem to be the right answer because the items change constantly. Are there any other design patterns that I am overlooking for this design? Caches seem to be close, but I am not sure this problem fits the mold of caching, since it is a little more complex than a large number of reads of a single item.
Are caches the way to go with this? If caching isn't the right solution, what is?

It seems that caches are not a solution for your problem, but they might help you in reaching a solution.
For example, instead of caching the generated lists, another approach is to cache information that rarely changes but is crucial for creating the lists.
Spring's annotation-based caching (i.e. @Cacheable) might come in handy, either for caching or for invalidation.
The next level is to examine different types of caches (e.g. Redis) and what they offer in terms of algorithms, sorting and Pub/Sub.
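One way to avoid walking every item on every change is to keep the rarely changing inputs in an index and invalidate only the cached lists that a change can affect. The sketch below is only an illustration: the class name, the method names and the relation rule ("two items are related if they share at least one property") are assumptions, since the question says the real rule is more complex.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: related-item lists are computed lazily and only the
// entries touched by a change are thrown away.
class RelatedItemsCache {
    // Inverted index: property -> items carrying it (assumed relation rule).
    private final Map<String, Set<String>> itemsByProperty = new HashMap<>();
    private final Map<String, Set<String>> propertiesByItem = new HashMap<>();
    // Computed results; an entry is removed whenever it may be stale.
    private final Map<String, Set<String>> relatedCache = new HashMap<>();

    synchronized void addProperty(String item, String property) {
        propertiesByItem.computeIfAbsent(item, k -> new HashSet<>()).add(property);
        Set<String> sharers = itemsByProperty.computeIfAbsent(property, k -> new HashSet<>());
        sharers.add(item);
        // Only items sharing this property can have a changed list.
        relatedCache.keySet().removeAll(sharers);
    }

    synchronized void removeProperty(String item, String property) {
        Set<String> props = propertiesByItem.get(item);
        if (props == null || !props.remove(property)) {
            return; // nothing changed, nothing to invalidate
        }
        Set<String> sharers = itemsByProperty.get(property);
        relatedCache.keySet().removeAll(sharers); // still contains 'item' here
        sharers.remove(item);
    }

    synchronized Set<String> relatedTo(String item) {
        return relatedCache.computeIfAbsent(item, this::compute);
    }

    // Recomputes one list from the index instead of scanning all items.
    private Set<String> compute(String item) {
        Set<String> related = new TreeSet<>();
        for (String p : propertiesByItem.getOrDefault(item, Collections.<String>emptySet())) {
            related.addAll(itemsByProperty.get(p));
        }
        related.remove(item);
        return Collections.unmodifiableSet(related);
    }
}
```

With this shape, a change to one property costs work proportional to the number of items sharing that property, not to the number of items in the system.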

Related

Finding object by index, HashMaps or looping through an ArrayList? Which is fastest? (Java) [duplicate]

I'm building tree pagination in JSF 1.2 and RichFaces 3.3.2, because I have a lot of tree nodes (something like 80k) and it's slow.
So, as a first attempt, I created a HashMap from the page to the list of nodes on that page.
But the performance isn't good enough.
So I was wondering whether there is something faster than a HashMap, maybe a List of Lists or something.
Does anyone have experience with this? What can I do?
Thanks in advance.
EDIT.
The big problem is that I have to validate the users' permissions on the child nodes of the tree. I know that this is the big problem: the validation is slow because I have to go inside the nodes, and I don't have a good way to know whether the user has permission on a 10th-level node without iterating over all of them. On top of this, the same tree is used in more places.
The basic reason I was doing this pagination is that the client side would be very slow because of the structure generated by RichFaces: a lot of tr's and td's, and the browser just goes crazy with it.
So, unfortunately, I have to load all the nodes and paginate only on the client side, and I need to know which structure is faster to iterate.
Sorry for my bad English.

A hash map is the fastest data structure if you want to get all nodes for a page. The list of nodes can be fetched in constant time (O(1)), while searching a list takes O(n) time (n = number of pages; faster on sorted lists, but never getting near O(1)).
Which operations on your data structure are too slow? That's what you have to analyse before you start optimizing.

It's probably more due to the fact that JSF is a performance pig than a data structure choice. The one attempt I've seen to create a JSF app could be timed with a sundial.
You're making a mistake by guessing about solutions without more knowledge about the root cause. I'd recommend that you profile your app to see where the time is being spent.

The data structure to use always depends on how you need to store the data and how you need to access it. HashMap<K, V> is supposed to have constant time complexity for accessing a value, given the key. When you call get(key), the hashCode() of key is computed and used to retrieve the related value. Unless you've got many different keys that have the same hash code (in which case you may be doing something wrong: although it is not mandatory, different objects should have different hash codes, at least in the majority of cases), this is usually fast.
Searching for an element in a plain list requires scanning the list, which will (almost) always be slower than computing a hash code.
If you need to associate values with keys, a Map is the way. And HashMap should be fast enough.
I don't know too much about JSF, but I think - if the data structure and access pattern is the one that a Map is designed for - the problem is not the HashMap itself.
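As a rough illustration of the access pattern described above, here is a minimal sketch of a page index built once from the full node list and then queried by page number. The class and method names are made up for this example, and it does nothing about the permission checks, which the question identifies as the real cost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: page number -> nodes on that page.
class PageIndex<T> {
    private final Map<Integer, List<T>> pages = new HashMap<>();

    PageIndex(List<T> nodes, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // One pass over the nodes; each slice is copied into its own list.
        for (int start = 0; start < nodes.size(); start += pageSize) {
            int end = Math.min(start + pageSize, nodes.size());
            pages.put(start / pageSize, new ArrayList<>(nodes.subList(start, end)));
        }
    }

    // Expected constant-time lookup; unknown pages yield an empty list.
    List<T> page(int number) {
        return pages.getOrDefault(number, Collections.<T>emptyList());
    }

    int pageCount() {
        return pages.size();
    }
}
```

If fetching a page from such a structure is already fast, the time is being spent elsewhere, which is what a profiler would show.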

I would solve this with JavaScript/Ajax calls that fetch the child nodes.

Data structure for continuous additions and cheap deletions

I am reading this blog post about making animations with Gnuplot and the Cairo terminal, whose algorithm's plan is simply
to save PNG images to the working directory, and
to save the latest video to the working directory.
I would like to have something more, such that the user can also browse the images in real time while the images are being converted:
Data-parallelism model - data structure regularly arranged in an array
to give the user some list in some interface which the user can browse by arrow buttons
in this interface, new images are being added to the end of the list
the user can also remove bad images from the stream in real time
This may work well in the data-parallelism model of parallel programming, i.e. a data set regularly structured in an array.
The operations (additions, deletions) can operate on this data, but independently, in distinct processes.
Let's assume that there is no need for efficient searches for simplicity in Version 1.
However, if you come with a model which can do that also, I am happy to consider it - let's call it Version 2.
I think a list is not a good data structure here because of the wanted opportunity for deletions and continuous easy addition to the end of the data structure.
The data structure stack is not going to work either because of deletions.
I think some sort of tree data structure can work because of rather cheap deletions and cheap search there.
However, a simple array in the Data-parallelism model can be sufficient.
Languages
I think Java is a good option here because of parallelism.
However, any language and pseudocode are good too.
Frontend
I have an intuition that the frontend requirement for such a system should be Qt as a terminal emulator.
What is a better data structure for cheap deletions and continuous additions to the end?

Java's LinkedList seems to be the thing you could use for version 1. You can use its single-parameter add() to append to the list in constant time. If by "real-time" you mean when the image is on the user's display and thus pointed to somehow, you can delete it in constant time as well.
It makes optimum use of memory, with no re-instantiation of the backing array as you'd have with an ArrayList.
Any doubly linked list implemented on objects (as opposed to an array) would do.
Your second version isn't clear enough.
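One caveat with java.util.LinkedList: its iterators are fail-fast, so a cursor held by the viewer would throw a ConcurrentModificationException once the producer appends through the list itself. The following is a minimal sketch of the "doubly linked list implemented on objects" idea with a built-in cursor instead. The class name, the use of file paths as strings and the coarse synchronization are assumptions made for illustration.

```java
// Hypothetical sketch: doubly linked list with a cursor for the image viewer.
class ImageStrip {
    private static final class Node {
        final String path;
        Node prev;
        Node next;

        Node(String path) {
            this.path = path;
        }
    }

    private Node head;
    private Node tail;
    private Node current; // image the user is looking at
    private int size;

    // Constant-time append, called by the process producing images.
    synchronized void append(String path) {
        Node n = new Node(path);
        if (tail == null) {
            head = tail = current = n;
        } else {
            n.prev = tail;
            tail.next = n;
            tail = n;
        }
        size++;
    }

    synchronized String current() {
        return current == null ? null : current.path;
    }

    // Arrow-key navigation; returns false at either end of the list.
    synchronized boolean next() {
        if (current == null || current.next == null) return false;
        current = current.next;
        return true;
    }

    synchronized boolean previous() {
        if (current == null || current.prev == null) return false;
        current = current.prev;
        return true;
    }

    // Constant-time removal of the displayed image; the cursor moves to the
    // following image, or to the preceding one if the last was removed.
    synchronized void removeCurrent() {
        if (current == null) return;
        Node n = current;
        if (n.prev == null) head = n.next; else n.prev.next = n.next;
        if (n.next == null) tail = n.prev; else n.next.prev = n.prev;
        current = (n.next != null) ? n.next : n.prev;
        size--;
    }

    synchronized int size() {
        return size;
    }
}
```

Searching by name would still be linear here, which matches the "no efficient search" assumption of Version 1.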

Cache update with db changes

We have a Java-based product which keeps a Calculation object in the database as a blob. During runtime we keep this in memory for fast performance. There is another process which updates this Calculation object in the database at regular intervals. What would be the best strategy to implement so that when this object gets updated in the database, the cache removes the stored object and fetches it again from the database?
I would prefer not to use a caching framework unless it is really necessary.
I would appreciate any response on this.

It is very difficult to give you a good answer to your question without any knowledge of your system architecture, design constraints, your IT strategy etc.
Personally I would use the Messaging pattern to solve this issue. A few advantages of that pattern are as follows:
Your system components (Calculation process, update process) can be loosely coupled.
Depending on the implementation of the Messaging pattern, you can "connect" many Calculation processes (scaling out) and many update processes (with a master-slave approach).
However, implementing the Messaging pattern might be a very challenging task, and I would recommend taking one of the existing frameworks or products.
I hope that will help at least a bit.

I did some work similar to your scenario before. Generally there are two ways.
One: the cache holder polls the database regularly, fetches the data it needs and keeps it in memory. The data can be stored in a HashMap or some other collection. This approach is simple and easy to implement, with no extra framework or library needed. But users will have to endure stale data from time to time. Besides, polling will put a lot of pressure on the DB if the number of pollers is huge or the query is not fast enough. However, it is generally not a bad choice if your real-time requirements are not that high and the scale of your system is relatively small.
The other approach is that the cache holder subscribes to notifications from the data updater and updates its data after being notified. It provides a better user experience, but it brings more complexity to your system because you have to get some messaging infrastructure, such as JMS, involved. Developing and tuning is more time-consuming.

I know I am quite late responding to this, but it might help somebody searching for the same issue.
Here was my problem: I was storing requestPerMinute information in a HashMap in a Java filter which gets loaded at application startup. The problem is that if somebody updates the DB with new information, the map doesn't know about it.
Solution: I added a variable updateTime to my Java filter which stores when my HashMap was last updated. With every request it checks whether more than 24 hours have passed since then; if so, it reloads the HashMap from the database. So every 24 hours it just refreshes the whole HashMap.
My use case did not require real-time updates, so this fits.
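The "remember when it was last loaded and reload when it gets too old" idea can be sketched without any framework. Everything below is hypothetical naming: the loader stands in for the database query, and the clock is passed in only so that the behaviour can be exercised without waiting (in real use it could be System::currentTimeMillis).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: a map that reloads itself once it is older than maxAgeMillis.
class RefreshingCache<K, V> {
    private final Supplier<Map<K, V>> loader; // stands in for the DB query
    private final long maxAgeMillis;
    private final LongSupplier clock;         // e.g. System::currentTimeMillis

    private Map<K, V> snapshot = Collections.emptyMap();
    private long loadedAt;
    private boolean loaded;

    RefreshingCache(Supplier<Map<K, V>> loader, long maxAgeMillis, LongSupplier clock) {
        this.loader = loader;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // Checked on every request, as in the filter described above.
    synchronized V get(K key) {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= maxAgeMillis) {
            snapshot = new HashMap<>(loader.get()); // replace the whole map
            loadedAt = now;
            loaded = true;
        }
        return snapshot.get(key);
    }
}
```

Readers can see data that is up to maxAgeMillis old, so this only suits cases where that staleness is acceptable.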

Appropriate datastructure or implementation class for holding a small but changing collection

I have written a game server and want to provide a groups feature. Groups will allow grouping together players who are "nearby" on screen. In fast action games, this group would be changing fast since players will be moving in and out of their zones constantly.
Since each player would need to listen to events from other players in the group, players will subscribe to the group.
This brings me to my question. What is the appropriate data structure or Java collection class to use in this scenario to hold the changing set of event listeners on a group? The number of listeners on a group would rarely exceed 20 in my opinion, and should be less than that in most scenarios. It is a multi-threaded environment.
The one I am planning to use is a CopyOnWriteArrayList. But since there will be a reasonable number of updates (due to changing subscriptions), is this class appropriate? What other class would be good to use? If you have any custom implementation using arrays etc., please share.

Unless you have millions of changes per second (which seems unlikely in your scenario) a CopyOnWriteArrayList should be good enough for what you need. If I were you, I would use that.
IF you notice a performance issue AND you have profiled your application AND you have identified that the CopyOnWriteArrayList is the bottleneck, then you can find a better structure. But I doubt it will be the case.
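For reference, a group backed by CopyOnWriteArrayList can be as small as the sketch below. The Group class and its method names are invented for this example; listeners are modelled as java.util.function.Consumer for brevity.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: a group whose listener list changes while events flow.
class Group<E> {
    private final List<Consumer<E>> listeners = new CopyOnWriteArrayList<>();

    // Each add/remove copies the backing array: O(n), cheap for ~20 listeners.
    void subscribe(Consumer<E> listener) {
        listeners.add(listener);
    }

    void unsubscribe(Consumer<E> listener) {
        listeners.remove(listener);
    }

    // Iterates over the snapshot taken when the loop starts, so concurrent
    // subscribe/unsubscribe calls never cause ConcurrentModificationException.
    void publish(E event) {
        for (Consumer<E> listener : listeners) {
            listener.accept(event);
        }
    }
}
```

One consequence of the snapshot behaviour: a listener removed while a publish is in progress may still receive that one event.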

Do players have integer IDs? If so then I have a lightweight, immutable array-based set class that might make sense for you:
http://code.google.com/p/mikeralib/source/browse/trunk/Mikera/src/main/java/mikera/persistent/IntSet.java
This was written for similar kinds of situations in game engines.
However I also have an alternative approach to consider: If you are updating the groups automatically based on vicinity, then you might want to consider not tracking groups at all. Instead, consider using a spatial data structure that allows you to quickly search for nearby players whenever an event occurs, and directly send the event to nearby players.
Typically you could use a 2D or 3D grid or octree with the smallest division size set to be equal to the max range for your groups. Then a vicinity search will only need to check 9 (2D case) or 27 (3D case) locations in order to find all nearby players. I think doing this search whenever needed will be faster and simpler than the overhead of maintaining lists of groups and listeners all the time....
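The 2D case of this grid idea might look roughly like the sketch below. It is not the linked library's code: the class, the method names and the way cell coordinates are packed into a long key are assumptions, and the key packing only works while cell indices fit in 32 bits. The query assumes the range is no larger than the cell size, as described above, so that checking the 9 surrounding cells is enough.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: uniform 2D grid for "who is near this point" queries.
class SpatialGrid {
    private final double cellSize; // assumed >= the maximum query range
    private final Map<Long, Set<Integer>> cells = new HashMap<>();
    private final Map<Integer, double[]> positions = new HashMap<>();

    SpatialGrid(double cellSize) {
        this.cellSize = cellSize;
    }

    private long cellOf(double v) {
        return (long) Math.floor(v / cellSize);
    }

    // Packs two cell indices into one key (assumes both fit in 32 bits).
    private static long key(long cx, long cy) {
        return (cx << 32) ^ (cy & 0xffffffffL);
    }

    void move(int playerId, double x, double y) {
        remove(playerId);
        positions.put(playerId, new double[] {x, y});
        cells.computeIfAbsent(key(cellOf(x), cellOf(y)), k -> new HashSet<>()).add(playerId);
    }

    void remove(int playerId) {
        double[] old = positions.remove(playerId);
        if (old == null) return;
        long k = key(cellOf(old[0]), cellOf(old[1]));
        Set<Integer> cell = cells.get(k);
        cell.remove(playerId);
        if (cell.isEmpty()) cells.remove(k);
    }

    // Checks the 3x3 block of cells around (x, y) and filters by distance.
    List<Integer> nearby(double x, double y, double range) {
        long cx = cellOf(x);
        long cy = cellOf(y);
        List<Integer> result = new ArrayList<>();
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                Set<Integer> cell = cells.getOrDefault(key(cx + dx, cy + dy),
                        Collections.<Integer>emptySet());
                for (int id : cell) {
                    double[] p = positions.get(id);
                    double ddx = p[0] - x;
                    double ddy = p[1] - y;
                    if (ddx * ddx + ddy * ddy <= range * range) result.add(id);
                }
            }
        }
        return result;
    }
}
```

This sketch is not thread-safe; in the multi-threaded server described in the question it would need external synchronization or a concurrent map.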

From what I've gathered, you have choice between CopyOnWriteArrayList and ConcurrentHashMap:
CopyOnWriteArrayList:
Add/remove operation computational cost is linear to the size of the list. May happen multiple times during a single iteration (group notification).
Simpler data structure with constant read time.
ConcurrentHashMap:
Add/remove operation is a constant time operation. Additions or Removal of subscribers do not affect iteration already in progress and blocking is minimized.
Larger data structure that requires slightly longer read time.
Creating a custom solution is possible when it comes to efficiency but probably not as safe when it comes to thread safety. I'm leaning towards ConcurrentHashMap but the winner will probably depend heavily on how your game turns out.
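For comparison with the CopyOnWriteArrayList option, a ConcurrentHashMap variant could be sketched as follows. Keying listeners by an integer player ID is an assumption made for this example, as are the class and method names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: listeners keyed by player ID in a ConcurrentHashMap.
class KeyedGroup<E> {
    private final ConcurrentHashMap<Integer, Consumer<E>> listeners = new ConcurrentHashMap<>();

    // Expected constant-time add and remove, with no array copying.
    void subscribe(int playerId, Consumer<E> listener) {
        listeners.put(playerId, listener);
    }

    void unsubscribe(int playerId) {
        listeners.remove(playerId);
    }

    // Iteration is weakly consistent: it never throws on concurrent changes,
    // but may or may not see listeners added or removed while it runs.
    void publish(E event) {
        for (Consumer<E> listener : listeners.values()) {
            listener.accept(event);
        }
    }

    int size() {
        return listeners.size();
    }
}
```

Unlike the copy-on-write list, iteration order here is unspecified, so this fits only if listeners do not depend on being notified in subscription order.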

Java Social Networking, friends' activity status awareness efficient method

Assume you have a social networking website where you can have friends and also view their up-to-date activities. The question is, what is the most efficient way (avoiding performance problems) to be informed of their activities right away, such as profile changes, when they are online and you are online at the same time?
I have two different ways of working this out, but I am not sure which one will be the most efficient from the DB point of view as well as the Java memory point of view. The following are my methods; please let me know if you have a better way:
1- Using java HTTP Session Listener to get session of each single user and traverse through for updates.
2- Checking database for new updates after every few seconds and then updating the map.

First: you will only know after you have done measurements.
Having said this, there is always a space-time-tradeoff. Meaning that if you store a lot of stuff in memory, it will be fast, but you have a large memory footprint. If you go via DB, you will have a small (java) footprint, but will be a lot slower.
So you need to decide what to do. Memory is cheap, so putting stuff in a memory cache will work nicely. But on the other hand, do you really need sub-second updates? Or can an update of a profile be 20sec old, before it is detected?
There is a great episode of the SE-Radio podcast on NoSQL databases that talks a lot about those decisions to make: http://www.se-radio.net/2010/05/episode-162-project-voldemort-with-jay-kreps/ (I hope it is this one)
This episode about memory-grids is also quite good: http://www.se-radio.net/2010/11/episode-169-memory-grid-architecture-with-nati-shalom/
