Redis: Beginner's Issue - Java

I have to store more than 100 million key-value pairs in my HashMultiMap (a key can have multiple values). Now I want to use Jedis for that. I downloaded it from here - Jedis 2.0.0.0.jar - as recommended to me here. After a bit of searching, I could not find any good document that helps me as a beginner:
1) How do I use Jedis (specifically, do I treat it as a normal .jar file in Java, e.g. like Guava)?
2) How do I implement a HashMultiMap (a key can have multiple values) in Redis?
3) How do I perform insertion, searching, etc. in Redis?
4) Searching for Redis, I found many options like Jedis, Redis, JRedis etc. What are those variations, and which one would be best for solving this?
Any information and/or link to a document will be helpful. Sorry if I am asking stupid questions; I have no idea about Redis, so a starting point will be valuable to me. Thanks.

I'm afraid there isn't a simple way to achieve what you want. Redis only has plain hashes: one key, one value.
However, you can serialize your multiple values into a string and store that as the value. Of course, you then lose the ability to insert/update/remove individual items; you'll have to rewrite the whole value every time. But this might not be a problem for you.

Redis has a few built-in types like lists, sets and hashes. I guess you can use sets for your case. That is better than serializing the whole data, because operations on the built-in types are atomic, so you won't have to worry about possible race conditions.
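For example, each multimap key can be backed by one Redis set; a minimal Jedis sketch (the host, key and value names here are made up):

import java.util.Set;
import redis.clients.jedis.Jedis;

Jedis jedis = new Jedis("localhost", 6379);

// one Redis set per multimap key: SADD adds another value under the same key
jedis.sadd("user:42", "first value");
jedis.sadd("user:42", "second value");

// SMEMBERS returns every value stored under that key
Set<String> values = jedis.smembers("user:42");

Each SADD is atomic, so concurrent writers cannot corrupt the entry the way a read-modify-write of a serialized string could.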

Check out https://github.com/xetorthio/jedis/wiki and http://redis.io/commands
There are several ways, all of which amount to using a list/sorted set/hash as a single field of your multimap. Then
a) make use of subdatabases to provide separate namespaces, i.e. to limit what your overall multimap is (SELECT), and/or
b) use the rich semantics that keys have in Redis (see example here). You could build your multimap simply with regular key/value mappings (SET/GET), with the key name additionally describing your map fields, as sketched below. You have a variety of options to get what you want. One of the last resorts is scripting.
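For illustration only, option b) with plain SET/GET and a key-naming convention might look roughly like this (the "mymap:" prefix and member indices are made up, and KEYS is only for exploration - it is expensive on a 100-million-key database):

import java.util.Set;
import redis.clients.jedis.Jedis;

Jedis jedis = new Jedis("localhost", 6379);

// encode the multimap key and a member index into the Redis key name
jedis.set("mymap:user42:0", "first value");
jedis.set("mymap:user42:1", "second value");

// collect all values for "user42" by key pattern
Set<String> keys = jedis.keys("mymap:user42:*");
for (String k : keys) {
    String value = jedis.get(k);
    // ... use value
}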
Depends!
AFAIK, Jedis is the most mature.

Related

Calculation of Dependent Values in Java

Using Java, I've got a source data set of integers; it's big but not huge - let's say it won't get bigger than 30,000 values.
Using the source dataset I have some summary values I want to create (these are domain specific so not something you'll find in a library such as Apache Math).
There is a relationship between the summary values like this:
[source data] -> summary1 -> summary2 -> summary3
                     \                       ^
                      \______________________|
I don't want to over-engineer the solution, but I do expect that in future there may be additional summary values that build upon this graph. Currently my solution is a domain object with a 'getter' for each summary that merely checks whether the summary has already been computed, and computes and stores it if needed. This works fine, but I don't like having all this compute logic in my domain object.
It feels to me like this could be represented more as a key -> calculator design, where results are stored in a map and calculators know which "keys" they need. Before I go off and implement something like this, it's hard to imagine someone hasn't already done it (a thousand times).
Can anyone advise me on idioms or libraries worth looking at for this kind of problem space? I'm familiar with things like JGraph, but I don't believe it will let me associate a calculator with a node; it merely provides a graph model. Perhaps this is more a problem for a caching library?
The key -> calculator idea looks like a typical application of a loading cache (aka auto-populating, aka read-through). An example with cache2k:
Cache<Key, Integer> summary1cache = new Cache2kBuilder<Key, Integer>() {}
    .loader(this::calculateSummary1)
    .build();

int calculateSummary1(Key key) {
    ...
}
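Reading from the cache then goes through the loader transparently; a one-line usage example:

Integer s1 = summary1cache.get(key);   // first access computes via calculateSummary1, later accesses hit the cache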
To achieve the best performance, I recommend one cache per summary type. The user guide has more information about cache loaders / read-through.
You can do exactly the same with other caches, e.g. Guava Cache or Caffeine.
An alternative pattern is Map.computeIfAbsent(key, function). However, if the loader function is known from the start, I recommend configuring the cache with it.
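A minimal sketch of that variant, reusing the calculateSummary1 method from the snippet above (backed by a java.util.concurrent.ConcurrentHashMap):

Map<Key, Integer> summaryCache = new ConcurrentHashMap<>();
Integer s1 = summaryCache.computeIfAbsent(key, this::calculateSummary1);   // computed once, then reused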
Disclaimer: I don't know 100% whether this is the best solution, since it is not totally clear from the question how many different keys/summaries you will have and what the access pattern looks like.

Is it a good practice to iterate a hashmap?

This is a very generic question; I am just using Java's HashMap as an example.
I have a HashMap:
Map<Integer,Integer> idPriceMap=new HashMap<Integer,Integer>();
idPriceMap.put(10,20);
idPriceMap.put(11,25);
idPriceMap.put(12,0);
idPriceMap.put(13,100);
idPriceMap.put(14,20);
idPriceMap.put(15,40);
idPriceMap.put(16,90);
Requirements might differ, e.g.:
UseCase1: I want the value for a particular key, assuming that I know the key (PS: I know that in this scenario a HashMap is the best structure).
UseCase2: I want to get all the values.
For now, consider only UseCase2. The question is: is that good practice?
In another scenario I have both UseCase1 and UseCase2 at the same time. What would you suggest?
I tried to Google it; all I got was the best ways to iterate a HashMap. :(
UseCase1 (value for a specific key): yes, a HashMap is the best structure for this.
UseCase2 (all values): since you want all values anyway, it does not matter whether it's a hash map, a list or a tree.
So I'm not sure what your question is, but you can easily look up the time complexity of the different data structures if you Google for it:
http://en.wikipedia.org/wiki/Hash_table (see the big-O summary on the right side)
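For UseCase2, a plain loop over values() is all there is to it, e.g.:

for (Integer price : idPriceMap.values()) {
    // do something with each price; iteration order of a HashMap is unspecified
}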
If you have a mixed usage scenario, then there is no generic answer to your question. It depends on the frequency and distribution (scenario 1 or 2) of your requests, and the volatility of your map contents.
Recommendation: Just use the standard hash map. Then profile your application. Most of the time the bottleneck is not where you expect it first and you achieve performance gains with cheap changes in other places.
If the hash map really is the bottleneck:
Ask again, with specific frequencies of your usages ;)
If you want to do "premature optimization" (see the good old c2 wiki: http://c2.com/cgi/wiki?PrematureOptimization), then just keep your values in a separate array to speed up requests for the complete set of values a little. But if you have lots of modifications, your overall performance will degrade. That is the same trade-off as having a database index or not, as sketched below.
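For illustration only, a minimal sketch of that trade-off (the put method and field names are made up): keep a redundant list of values next to the map and pay for it on every modification.

Map<Integer, Integer> idPriceMap = new HashMap<>();
List<Integer> allPrices = new ArrayList<>();   // redundant copy for fast "all values" reads

void put(int id, int price) {
    Integer old = idPriceMap.put(id, price);
    if (old != null) {
        allPrices.remove(old);   // removes one occurrence; O(n) on updates - the cost of the shortcut
    }
    allPrices.add(price);
}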
Hope that was useful.

Is Hazelcast's built in CountAggregation really inefficient?

I've been looking into replacing our Oracle database of currently-executing commands with a Hazelcast distributed map implementation. To do this, I need to replace our SQL queries with the Hazelcast equivalent. Hazelcast provides some built-in aggregations, such as a count. I've been happily using this, but when I came to writing my own aggregations, I had a look at the source code for the CountAggregation. It can be found here: http://grepcode.com/file/repo1.maven.org/maven2/com.hazelcast/hazelcast/3.3-RC2/com/hazelcast/mapreduce/aggregation/impl/CountAggregation.java
Aggregations in Hazelcast are implemented using the MapReduce algorithm. But to me, the source above seems to be really inefficient. For the Mapper stage of the algorithm, they use a SupplierConsumingMapper, which simply emits mappings using the same key as the supplied key. What this then means is that the reducing stage doesn't actually reduce anything, because all of the emitted keys are different, and you end up with a whole load of 1's to count up at the final collation stage, rather than a number of partial counts to add together.
Surely what they should be doing is using a mapper which always emits the same key? Then the combiners and reducers could actually do some combining and reducing. It seems to me that the source code above is incorrectly using the MapReduce model, although the result you end up with is correct. Have I misunderstood something?
Hey, you're absolutely correct. The implementation is a bit too simple in that place :) Can you please file an issue on GitHub so we won't forget to fix that one? Thanks, Chris
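For reference, the kind of mapper the question describes might look roughly like this (a sketch against the Hazelcast 3.x com.hazelcast.mapreduce API; the class name is made up):

import com.hazelcast.mapreduce.Context;
import com.hazelcast.mapreduce.Mapper;

// emits every entry under one constant key, so combiners/reducers can
// sum partial counts per partition instead of collating a flood of 1's
class ConstantKeyCountMapper<K, V> implements Mapper<K, V, String, Long> {
    @Override
    public void map(K key, V value, Context<String, Long> context) {
        context.emit("count", 1L);
    }
}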

SortedBiTreeMultimap data structure in Java?

Is there any Java library with TreeMap-like data structure which also supports all of these:
lookup by value (like Guava's BiMap)
possibility of non-unique keys as well as non-unique values (like Guava's Multimap)
keeps track of sorted values as well as sorted keys
If it exists, it would probably be called SortedBiTreeMultimap, or similar :)
This can be produced using a few data structures together, but I never took time to unite them in one nice class, so I was wondering if someone else has done it already.
I think you are looking for a "Graph". You might be interested in this slightly similar question asked a while ago, as well as this discussion thread on BiMultimaps / Graphs. Google has a BiMultimap in its internal code base, but they haven't yet decided whether to open source it.
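If stitching it together yourself is acceptable, a minimal sketch with Guava (eagerly inverting into a second TreeMultimap; the keys and values are made up):

import com.google.common.collect.Multimaps;
import com.google.common.collect.TreeMultimap;

TreeMultimap<String, Integer> forward = TreeMultimap.create();   // keys and values both kept sorted, a key can map to several values
forward.put("a", 2);
forward.put("a", 1);
forward.put("b", 2);

// lookup by value: invert into another TreeMultimap (a snapshot, not a live view)
TreeMultimap<Integer, String> byValue =
        Multimaps.invertFrom(forward, TreeMultimap.<Integer, String>create());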

Is there an STL-Multiset equivalent container in Java?

I'm still seeking an ideal solution to this question. To summarize, I am modeling a power subsystem in Java and I need a Directed-Acyclic-Graph (DAG)-type container for my data.
I found exactly what I need in C++'s Standard Template Library (STL). It is the multiset, which supports storing multiple data values for the same key. I can clearly see how storing power nodes and keys, and their upstream/downstream connections as values, could be pulled off with this data structure.
My customer has a hard-requirement that I write the power subsystem model in Java, so I need a data structure identical to the STL multiset. I could potentially roll my own, but it's late in the game and I can't afford the risk of making a mistake.
I'm supremely disappointed that Java is so light on Tree / Graph collections.
Has anyone found a multiset-type structure in Java?
Check out Guava's Multiset. In particular the HashMultiset and the TreeMultiset.
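A quick taste of it (the element names are made up); TreeMultiset keeps its elements sorted and, like std::multiset, allows duplicates:

import com.google.common.collect.Multiset;
import com.google.common.collect.TreeMultiset;

Multiset<String> nodes = TreeMultiset.create();
nodes.add("busA");
nodes.add("busA");
nodes.add("busB");
int n = nodes.count("busA");   // 2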
Have you looked at Google's version: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multiset.html
