avoiding calculation every time a class method is called - java

I don't know if the title is appropriate but this is a design question.
I am designing a Java class with a method that does a heavy calculation, and I am wondering if there is a clean way to avoid this calculation every time the method is called. I know that the calling code can handle this, but should it always be the responsibility of the calling code?
To elaborate - I was writing a class for thousand-dimensional vectors with a method to calculate the magnitude. So every time this method is called, it will calculate the magnitude over all the dimensions.

The concept you are looking for is called Memoization

Just cache the results in some structure internal to your class. When the method is called, it checks whether it has the previously calculated result in the cache and returns it. Otherwise it does the calculation and stores the result in the cache. Be careful with the memory, though.
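For example, a lazily cached magnitude could look something like this (a minimal sketch assuming the components never change after construction; the class and field names are illustrative, not from the question):

    public final class Vector {
        private final double[] components;
        private Double cachedMagnitude; // null until the first call

        public Vector(double[] components) {
            this.components = components.clone();
        }

        public double magnitude() {
            if (cachedMagnitude == null) {
                double sumOfSquares = 0.0;
                for (double c : components) {
                    sumOfSquares += c * c;
                }
                cachedMagnitude = Math.sqrt(sumOfSquares); // computed at most once
            }
            return cachedMagnitude;
        }
    }

Because the value is filled in lazily, callers that never ask for the magnitude pay nothing, and repeated calls cost a single field read.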

Use a flag to indicate whether there has been a change to your vectors. If there is a change, the method should do a full calculation, or apply the calculation only to the changes, but you will need to be careful with the rest of your class's implementation and make sure that the flag is properly set every time a value is modified.
The second method is to use a cache. This is done by storing the previously calculated result and looking it up before doing the calculation. However, this only works well if there isn't much variety in the key values of your objects; otherwise you will end up using a lot of memory. In particular, if your key value is of type double, it is possible that the key will never be found unless the values are exactly equal.
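A minimal sketch of the dirty-flag variant for a mutable vector (names are illustrative; the point is that every mutator must reset the flag):

    public class MutableVector {
        private final double[] components;
        private double cachedMagnitude;
        private boolean dirty = true; // true means the cached value is stale

        public MutableVector(int dimensions) {
            this.components = new double[dimensions];
        }

        public void set(int index, double value) {
            components[index] = value;
            dirty = true; // every modification must invalidate the cache
        }

        public double magnitude() {
            if (dirty) {
                double sumOfSquares = 0.0;
                for (double c : components) {
                    sumOfSquares += c * c;
                }
                cachedMagnitude = Math.sqrt(sumOfSquares);
                dirty = false;
            }
            return cachedMagnitude;
        }
    }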

If the "thousand dimensional vectors" are passed in c'tor you can calculate the magnitude in c'tor and store in some private member variable.
Few things to take care of are:
If there are methods to add / delete vectors or contents of vectors then you need to update the magnitude in those methods.
If your class is supposed to be thread-safe then ensure appropriate write functions are atomic.

How often are the magnitudes changed? Is this immutable? How much of the interface for the vector do you control? Specifically, do you have any way to identify rotations or other magnitude-preserving transformations in your 1000 dimensional space? You could just store state for the magnitude, flag when the value changes, and recalculate only when necessary. If your transformations have nice internals, you might be able to skip the calculation based on that knowledge.

Related

testing a custom data structure's big-O complexity

As an assignment, I implemented a custom data structure and a few test cases in order to make sure it works properly.
The code itself is not really needed for the question but you can assume it is some sort of a SortedList.
My problem is that I was also asked to test big-O complexity, e.g. make sure put() is O(n), etc.
I am having a lot of trouble understanding how I can write such a test.
One way that came to mind was to count the number of iterations inside the put() method with a simple counter and then check that it is equal to the size of the list, but that would require me to change the code of the list itself to count the exact number, and I would much rather do it properly, outside the class, holding only an instance of it.
Any ideas, anyone? I would really appreciate the help!
With unit testing you test the interface of a class, but the number of iterations is not part of the interface here. You could use a timer to check the runtime behaviour with different sizes. If it's O(n) there should be a linear dependency between time and n.
Get the current time at the start of the test, call the method that you're testing a large number of times, get the current time after, and check the difference to get the execution time. Repeat with a different size input, and check that the execution time follows the expected relationship.
For an O(n) algorithm, a suitable check would be that doubling the input size approximately doubles the execution time.
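A rough sketch of such a timing test (JUnit 4; SortedList and its put(int) method are stand-ins for the actual class under test, and the sizes and tolerance are illustrative):

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class SortedListComplexityTest {

        // Builds a list of the given size (untimed), then times a fixed batch of
        // extra put() calls on it.
        private long timeBatchOfPuts(int listSize, int batch) {
            SortedList list = new SortedList();
            for (int i = 0; i < listSize; i++) {
                list.put(i);
            }
            long start = System.nanoTime();
            for (int i = 0; i < batch; i++) {
                list.put(i);
            }
            return System.nanoTime() - start;
        }

        @Test
        public void putLooksLinearInListSize() {
            int batch = 1_000;
            long timeAtN = timeBatchOfPuts(100_000, batch);
            long timeAt2N = timeBatchOfPuts(200_000, batch);
            double ratio = (double) timeAt2N / timeAtN;
            // If put() is O(n), doubling the list size should roughly double the
            // batch time; the generous bounds absorb JIT and GC noise.
            assertTrue("ratio was " + ratio, ratio > 1.2 && ratio < 4.0);
        }
    }

Timing-based tests like this are inherently noisy, so treat a failure as a prompt to re-run and investigate rather than as hard proof.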

Insert and Update in log(n) time with better performance

I am developing some financial algorithms using Java. I have a complex data structure with many properties that need to be updated during the lifetime of the algorithm. Sometimes this data structure is updated more than 1000 times ...
To improve the performance, especially for get (search)/update/insert, I decided to use TreeMap as the container, which is quite efficient in that regard.
Now comes the challenging part. I need to update the data structure's properties, for which I need to retrieve it from the container, which requires:
check if container has the object
if yes, then get the object, else create new object and add to map
update the object if it is present in the container
This process takes THREE x log(n), i.e. check, get, and put. I want to do this in a SINGLE log(n) time.
For that, my solution is:
I always add the object to the map using put. put returns the old object, and I update the current object with the old values, which gets it down to a single log(n), but other references to the previous object are lost, because the new value replaces it in the map.
Is there any better solution or a better container for updating the data structure? I could use a List with Collections.binarySearch, but for that I would need to sort the data structure again, as the list is not kept sorted.
Kindly Guide
I think you are doing pretty well.
O(k · log(n)) = O(log(n))
where k is a constant. So your time complexity is actually O(log(n)).
You can achieve 1 and 2 in one hit if you switch to ConcurrentMap.computeIfAbsent(...). It returns the new/old object so you can update it.
If you are on Java 7, use putIfAbsent, but that requires an extra new - perhaps a bad thing if construction is expensive.
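A minimal sketch of the computeIfAbsent approach (Java 8+). PositionBook, Position and addQuantity are illustrative names, not from the question; ConcurrentSkipListMap is used because it keeps keys sorted, like the TreeMap in the question:

    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    class PositionBook {
        private final ConcurrentMap<String, Position> positions = new ConcurrentSkipListMap<>();

        void recordTrade(String symbol, double quantity) {
            // One map lookup: creates the entry only if it is missing, then
            // returns whichever object is now associated with the key.
            Position p = positions.computeIfAbsent(symbol, Position::new);
            p.addQuantity(quantity); // update the (possibly new) object in place
        }

        // Minimal mutable value object, purely for illustration.
        static class Position {
            private final String symbol;
            private double quantity;

            Position(String symbol) { this.symbol = symbol; }

            void addQuantity(double delta) { this.quantity += delta; }
        }
    }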
If you are not scared of having mutable objects around (which you seem to have, given your proposed solution), you can do it in 1-2 operations. Instead of
1. contains()
2a. exists? get(), modify, put()
2b. doesn't exist? create, put()
you can just do
1. get()
2a. null? create, put()
2b. not-null? modify object contents, as you already have reference
this way you have 1 search op for existing objects and 2 search ops for non-existing objects.
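A minimal sketch of that get-first pattern (Counter is an illustrative mutable value type):

    import java.util.TreeMap;

    class GetThenPut {
        static class Counter {
            double total;
        }

        private final TreeMap<String, Counter> book = new TreeMap<>();

        void add(String key, double delta) {
            Counter existing = book.get(key);   // 1st (and usually only) search
            if (existing == null) {
                Counter created = new Counter();
                created.total = delta;
                book.put(key, created);         // 2nd search, only for new keys
            } else {
                existing.total += delta;        // in-place update, no extra map ops
            }
        }
    }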
If you want to improve it further, you may want to use ConcurrentHashMap (after you get over your distrust of hashcodes ;) and putIfAbsent
1. old = putIfAbsent(createFresh())
2. old not null? update old
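A sketch of that putIfAbsent variant (again with an illustrative Counter value type; note the candidate object is constructed eagerly, which is the "extra new" mentioned above, and the field update itself is not synchronized, which is fine for the single-threaded usage in the question):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class PutIfAbsentUpdate {
        static class Counter {
            double total;
        }

        private final ConcurrentMap<String, Counter> map = new ConcurrentHashMap<>();

        void add(String key, double delta) {
            Counter fresh = new Counter();             // the "extra new"
            Counter old = map.putIfAbsent(key, fresh); // single map operation
            Counter target = (old != null) ? old : fresh;
            target.total += delta;                     // update the object actually in the map
        }
    }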
Having said all that, I generally try to avoid mutable objects for things that live longer than the lifetime of a single method. At some point you might want to multithread your processing, and having mutable things is going to make it a lot more complicated. But there are various tradeoffs (like memory pressure), so it is up to you. But please look into hashmaps seriously; they are probably the biggest optimization you can make here, regardless of object (im)mutability.

Is ImmutableMap a sub-optimal choice for a large volume of keys/objects?

I was doing some tests with a colleague, and we were pulling data in from a database (about 350,000 records), converting each record into an object and a key object, and then populating them into an ImmutableMap.Builder.
When we called the build() method it took forever, probably due to all the data integrity checks that come with ImmutableMap (dupe keys, nulls, etc.). To be fair we tried to use a HashMap as well, and that took a while but not as long as the ImmutableMap. We finally settled on just using ConcurrentHashMap, which we populated with 9 threads as the records were iterated, and wrapped that in an unmodifiable map. The performance was good.
I noticed the documentation reads that ImmutableMap is not optimized for "equals()" operations. As a die-hard immutable-ist, I'd like the ImmutableMap to work for large data volumes, but I'm getting the sense it is not meant for that. Is that assumption right? Is it optimized only for small/medium-sized data sets? Is there a hidden implementation I need to invoke via "copyOf()" or something?
I guess your key.equals() is a time-consuming method.
key.equals() will be called many more times in ImmutableMap.build() than in HashMap.put() (called in a loop), while key.hashCode() is called the same number of times by both. As a result, if key.equals() takes a long time, the overall durations can differ a lot.
key.equals() is called only a few times during HashMap.put() (a good hash algorithm leads to few collisions).
In the case of ImmutableMap.build(), however, key.equals() will be called many times in checkNoConflictInBucket(): O(n) calls to key.equals().
Once the map is built, the two types of Map should not differ much on access, as both are hash-based.
Sample:
With 10000 random Strings to put as keys, HashMap.put() calls String.equals() 2 times, while ImmutableMap.build() calls it 3000 times.
My experience is that none of Java's built-in Collection classes are really optimised for performance at huge volumes. For example, HashMap uses simple iteration once hashCode has been used as an index into the array, and compares the key via equals to each item with the same hash. If you are going to store several million items in the map then you need a very well-designed hash and a large capacity. These classes are designed to be as generic and safe as possible.
So performance optimisations to try if you wish to stick with the standard Java HashMap:
Make sure your hash function provides as close as possible to even distribution. Many domains have skewed values and your hash needs to take account of this.
When you have a lot of data HashMap will be expanded many times. Ideally set initial capacity as close as possible to the final value.
Make sure your equals implementation is as efficient as possible.
There are massive performance optimisations you can apply if you know (for example) that your key is an integer, such as using some form of b-tree after the hash has been applied and using == rather than equals.
So the simple answer is that I believe you will need to write your own collection to get the performance you want or use one of the more optimised implementations available.
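For what it's worth, presizing is cheap to try before writing anything custom. A minimal sketch with illustrative numbers (350,000 matches the record count in the question):

    import java.util.HashMap;
    import java.util.Map;

    class MapLoader {
        static <K, V> Map<K, V> preSizedMap(int expectedEntries) {
            float loadFactor = 0.75f; // the HashMap default
            // Capacity chosen so expectedEntries stays below the resize threshold,
            // so the map never rehashes while being loaded.
            int initialCapacity = (int) (expectedEntries / loadFactor) + 1;
            return new HashMap<>(initialCapacity, loadFactor);
        }
    }

    // e.g. Map<RecordKey, Record> byKey = MapLoader.preSizedMap(350_000);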

Library for optimizing / parameter scanning a method

I have several Java classes which implement some quite complicated (non-linear) business logic. Basically the user provides several (numeric) input parameters, and the application computes a scalar.
I would like to do a parameter scan on the input data, that is I would like to know what parameter values create the maximum output value.
The easiest and most time-consuming method would be to create some simple loops with "small" steps on the input parameters, and constantly check the output one.
But as I said, this takes quite a long time; there are several mathematical solutions for this problem (e.g. Newton's method).
My question is: are there any free/open-source Java libraries which provide this parameter-scanning functionality?
Thanks,
krisy
You might be able to adjust OptaPlanner for this. Call your business logic in a SimpleScoreCalculator and wrap the returned scalar in a Score instance.
Make sure you use at least 6.1.0.Beta2 as that supports IntValueRange and DoubleValueRange.
As for optimization algorithms: if the variables are few, integer and small in range, "Branch And Bound" can work, guaranteeing you the optimal solution (but much faster than Brute Force).
Otherwise, you'll need to go with a limited selection Construction Heuristic followed by Local Search with custom Move implementations.

Is there a parallel processing implementation of HashMap available to Java? Is it even possible?

Searching for the magic ParallelHashMap class
More succinctly, can you use multiple threads to speed up HashMap lookups? Are there any implementations out there that do this already?
In my project, we need to maintain a large map of objects in memory. We never modify the map after it is created, so the map is strictly read-only. However, read and look-up performance on this map is absolutely critical for the success of the application. The systems that the application will be installed on typically have many hardware threads available. Yet, our look-ups only utilize a single thread to retrieve values from the HashMap. Could a divide and conquer approach using multiple threads (probably in a pool) help improve look-up speed?
Most of my google searches have been fruitless - returning lots of results about concurrency problems rather than solutions. Any advice would be appreciated, but if you know of an out of the box solution, you are awesome.
Also of note, all keys and values are immutable. Hash code values are precomputed and stored in the objects themselves on instantiation.
As for the details of the implementation, the Map has about 35,000 items in it. Both keys and values are objects. Keys are a custom look-up key and values are strings. Currently, we can process about 5,000 look-ups per second max (this included a bit of overhead from some other logic, but the main bottleneck is the map implementation itself). However, in order to keep up with our future performance needs, I want to get this number up to around 10,000 look-ups per second. By most normal standards our current implementation is fast - it is just that we need it faster.
In our Map of 35,000 values we have about one hash code collision on average, so I'm guessing that the hash codes are reasonably well-distributed.
So your hash codes are precomputed and the equals function is fast - your hashmap gets should be very fast in this case.
Have you profiled your application to prove that the hashmap gets are indeed the bottleneck?
If you have multiple application threads, they should all be able to perform their own gets from the hashmap at the same time - since you aren't modifying the map, you don't need to externally synchronize the gets. Is the application that uses the hashmap sufficiently threaded to be able to make use of all your hardware threads?
Since the contents of the hash table are immutable, it might be worth looking into perfect hashing - with a perfect hash function, you should never have collisions or need chaining in the hash table, which may improve performance. I don't know of a Java implementation offhand, but I know that in C/C++ there is gperf.
Sounds like you should profile. You could have a high collision rate. You could also try using a lower loadFactor in the HashMap - to reduce collision probability.
Also, if the hashCodes are precomputed, then there is not much work for get() to do except mod and a few equals(). How fast is equals() on your key objects?
To answer your question: yes, absolutely. AS LONG AS YOU AREN'T WRITING TO IT.
You're going to have to make it by hand, and it's going to be a little tricky. Before trying that, have you optimized as much as possible?
In C++, check out Google's dense hash map class in their sparsehash package.
In Java, if you're mapping with a primitive key, use Trove or Colt maps.
That said, here's a start for your parallel hash map: if you choose n hash functions and spawn n threads to search down each path (probing/chaining at each of the n insertion points) you'll get a decent speedup. Be careful because there's a high cost to creating threads, so spawn the threads on construction and then block them until they're needed.
Hopefully the cost of locking won't be higher than the cost of lookup, but that's up to you to experiment with.
From the HashMap documentation (I've changed the emphasis):
Note that this implementation is not synchronized. If multiple threads access this map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
Since your HashMap is never modified you can safely let multiple threads read from it. Implementing locking is not necessary. (The same is true for any case where threads share access to immutable data; in general the simplest way to achieve thread safety is not to share any writable memory)
To ensure that your code doesn't modify the map by accident, I would wrap the map with Collections.unmodifiableMap immediately after its construction. Don't let any references to the original modifiable map linger.
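A minimal sketch of that wrapping (the key type is left generic; the loading logic is whatever the application already does):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    class LookupTable<K> {
        private final Map<K, String> table;

        LookupTable(Map<K, String> loaded) {
            // Defensive copy, then an unmodifiable view; no reference to a
            // modifiable map escapes this class.
            this.table = Collections.unmodifiableMap(new HashMap<>(loaded));
        }

        String lookup(K key) {
            return table.get(key); // safe for concurrent readers once fully built
        }
    }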
You mentioned this in a comment:
I'm doing equals checks between 5 numbers referenced
From this I infer that your hash computation is also doing some calculations with these 5 numbers. For good HashMap performance, the results of this computation should be randomly dispersed over all possible int values. From the HashMap documentation:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
In other words, look-up times should remain constant regardless of the element count, if you have a good hash function. Example of a good hashCode() function for a class that stores three numbers (using a prime number to reduce the chance of the XOR yielding zero, as suggested in a comment):
return this.a.hashCode() ^ (31 * (this.b.hashCode() ^ (31 * this.c.hashCode())));
Example of a bad hashCode function:
return (this.a + this.b + this.c);
HashMaps have constant lookup times. Not sure how you could really speed that up since trying to have multiple threads execute the hashing function will only cause it to go slower.
I think you need evidence that the get() method on the HashMap is where your delay is. I think this is highly unlikely. Put a loop around your get() method to make it repeat 1,000 times and your application probably won't slow down at all. Then you'll know that the delay is elsewhere.
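A quick-and-dirty sketch of that check (names are illustrative): repeat each lookup 1,000 times inside the application's normal path and see whether throughput drops proportionally.

    import java.util.Map;

    class GetHotspotCheck {
        // If overall throughput barely changes with this in place, the delay
        // is not in map.get() but elsewhere in the application.
        static <K, V> V repeatedGet(Map<K, V> map, K key) {
            V value = null;
            for (int i = 0; i < 1_000; i++) {
                value = map.get(key);
            }
            return value;
        }
    }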
