I'm still seeking an ideal solution to this question. To summarize, I am modeling a power subsystem in Java and I need a Directed-Acyclic-Graph (DAG)-type container for my data.
I found exactly what I need in C++'s Standard Template Library (STL). It is the multiset, which supports storing multiple data values for the same key. I can clearly see how storing power nodes as keys, and their upstream/downstream connections as values, could be pulled off with this data structure.
My customer has a hard-requirement that I write the power subsystem model in Java, so I need a data structure identical to the STL multiset. I could potentially roll my own, but it's late in the game and I can't afford the risk of making a mistake.
I'm supremely disappointed that Java is so light on Tree / Graph collections.
Has anyone found a multiset-type structure in Java?
Check out Guava's Multiset. In particular the HashMultiset and the TreeMultiset.
Have you looked at Google's version: http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multiset.html
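Guava's Multiset counts occurrences of equal elements. The question's description ("multiple data values for the same key") is closer to a multimap, which Guava also provides as Multimap. If adding a dependency is not an option, a JDK-only sketch is a TreeMap whose values are lists. The class name SimpleMultimap and its methods are hypothetical, not a library API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Sketch of a sorted multimap: each key maps to a list of values, duplicates allowed. */
class SimpleMultimap<K extends Comparable<? super K>, V> {
    private final TreeMap<K, List<V>> map = new TreeMap<>();

    /** Appends a value under the key, creating the key's list on first use. */
    void put(K key, V value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Read-only view of the values for the key; empty if the key is absent. */
    List<V> get(K key) {
        List<V> values = map.get(key);
        return values == null ? Collections.emptyList() : Collections.unmodifiableList(values);
    }

    /** Removes one occurrence of value under key; drops the key once its list is empty. */
    boolean remove(K key, V value) {
        List<V> values = map.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            map.remove(key);
        }
        return true;
    }

    /** Total number of key-value pairs. */
    int size() {
        int n = 0;
        for (List<V> values : map.values()) {
            n += values.size();
        }
        return n;
    }
}
```

For the power model, keys could be node IDs and values their downstream neighbours, with a second instance holding upstream links. A TreeMap keeps keys sorted, as std::multimap does; a HashMap would do if ordering does not matter.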
I have about a year of experience coding in Java. To hone my skills, I'm trying to write a calendar/journal-entry desktop app in Java. I've realized that I still have no experience with data persistence and still don't really understand what the persistence options would be for this program. So perhaps I'm jumping the gun, and the design choices I'm hoping to implement won't even apply once I get into the nitty-gritty.
I mainly want to write a calendar app that allows you to log daily journal entries with associated activity logs for time spent on daily tasks. In terms of adding, editing and viewing the journal entries, using a hash table with the dates of the entries as keys and the entries themselves as the values seems most Big-Oh efficient (O(1) average case for each using a hash table).
However, I'm also hoping to implement a feature that could, given a certain range of dates, provide a simple analysis of average amount of time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that perhaps a sorted array would be more Big-Oh efficient? Especially considering that the data entries are generally expected to already be added date by date.
Or perhaps there's another option I'm unaware of?
I'm asking because of the answer to this question: Why not use hashing/hash tables for everything?
And I'm unsure whether I'm even asking the right question because of the answer to this one: What's the best data structure for a calendar / day planner?
If so, I would really appreciate being directed to other resources on data persistence in Java.
Thank you for the help!
Use the NavigableMap interface (implemented by TreeMap, a red-black tree).
This allows you to easily and efficiently select date ranges and traverse over events in key order.
As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.
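A minimal sketch of that approach for the journal's per-task statistics, assuming (hypothetically) that each day's time on one task is stored as a total in minutes. The query uses subMap(from, true, to, false), i.e. the half-open interval [from, to), and averages over calendar days, so days with no entries count as zero. The class and method names are illustrative.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Sketch: minutes logged for one task, keyed by day, queried over half-open date ranges. */
class TaskLog {
    // Hypothetical storage: total minutes per day (days with no entry are absent).
    private final NavigableMap<LocalDate, Integer> minutesByDay = new TreeMap<>();

    /** Adds minutes to a day's running total. */
    void log(LocalDate day, int minutes) {
        minutesByDay.merge(day, minutes, Integer::sum);
    }

    /** Average minutes per calendar day over [from, to); days without entries count as 0. */
    double averagePerDay(LocalDate from, LocalDate to) {
        long days = ChronoUnit.DAYS.between(from, to);
        if (days <= 0) {
            return 0.0;
        }
        // subMap(from, true, to, false) is a live view of keys k with from <= k < to
        int total = minutesByDay.subMap(from, true, to, false).values().stream()
                .mapToInt(Integer::intValue)
                .sum();
        return (double) total / days;
    }
}
```

Because consecutive ranges such as [Jan 1, Jan 8) and [Jan 8, Jan 15) share a boundary without overlapping, weekly reports never double-count a day.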
Depends on how serious you want your project to be. In all cases, be careful of premature optimization: trying too hard to make your code "efficient" and sacrificing readability/maintainability in the process. For example, there is likely a way of doing manual memory management with native code to make a more efficient implementation of a data structure for your calendar, but that likely does not outweigh the benefits of using familiar APIs etc. It might, but you only find out by running and measuring your code.
Write readable code
Run it, test for performance issues
Use a profiler (e.g. JProfiler) to identify the code that is responsible for poor performance
Optimise that code
Repeat
For code that will "work" but will not be very scalable, a simple List will usually do fine. You can serialize your objects as JSON, using a library such as Jackson Databind to map between the List and JSON, and then simply save that to a file for persistence.
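The answer suggests JSON via Jackson Databind. As a dependency-free illustration of the same save-a-list-to-a-file idea, here is a sketch that swaps JSON for a plain tab-separated text file using only java.nio.file. The Entry class and the one-line-per-entry format are assumptions, and entry text must not contain tabs or line breaks (Jackson would handle such escaping for you).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Sketch: persist journal entries as "date<TAB>text" lines (not JSON). */
class EntryFileStore {
    /** Hypothetical journal entry. */
    static final class Entry {
        final LocalDate date;
        final String text;

        Entry(LocalDate date, String text) {
            if (text.indexOf('\t') >= 0 || text.indexOf('\n') >= 0 || text.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("text must not contain tabs or line breaks");
            }
            this.date = date;
            this.text = text;
        }
    }

    /** Overwrites the file with one line per entry; LocalDate.toString() is ISO yyyy-MM-dd. */
    static void save(Path file, List<Entry> entries) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Entry e : entries) {
            lines.add(e.date + "\t" + e.text);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    /** Reads the entries back, splitting each line at its first tab. */
    static List<Entry> load(Path file) throws IOException {
        List<Entry> entries = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            entries.add(new Entry(LocalDate.parse(line.substring(0, tab)), line.substring(tab + 1)));
        }
        return entries;
    }
}
```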
For an application that you want to be more robust and protected against data corruption, a database is probably better. With one, you can guarantee, for example, that data is not partially written and that concurrent access to the same data will not corrupt it, along with a whole host of other benefits. However, you will typically need a database server running alongside your application. You can use JDBC and a suitable driver for your database vendor (e.g. MySQL) to connect to, read from, and write to the database.
For a serious application, you will probably want to create an API for your persistence. A framework like Spring is very helpful for this, as it allows you to declare REST endpoints using annotations, and introduces useful programming concepts, such as containers, IoC/Dependency Injection, Testing (unit tests and integration tests), JPA/ORM systems and more.
Like I say, this is all context dependent, but above all else, avoid premature optimization.
This thread might give you some ideas about which data structure to use for range queries:
Data structure for range query
It might even be easier to use a database and query it for the desired range through its API.
If you are using (or are able to use) Guava, you might consider using RangeMap (*).
This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".
One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).
(*) I work for Google; Guava is Google's open-source Java library. This is the library I would use, but other libraries with similar range-map offerings are available.
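Without Guava, a rough JDK-only approximation of "what event is occurring at time T" is a TreeMap keyed by start time and queried with floorEntry. This sketch assumes half-open [start, end) events and, in line with the drawback mentioned above, simply rejects overlapping events. The Event class and the method names are hypothetical.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: non-overlapping events keyed by start time, looked up by instant. */
class EventTimeline {
    /** Hypothetical event occupying the half-open interval [start, end). */
    static final class Event {
        final String name;
        final Instant start;
        final Instant end;

        Event(String name, Instant start, Instant end) {
            if (!start.isBefore(end)) {
                throw new IllegalArgumentException("start must be before end");
            }
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    private final TreeMap<Instant, Event> byStart = new TreeMap<>();

    /** Adds an event, rejecting it if it overlaps an existing one (no double-booking). */
    void add(Event e) {
        Map.Entry<Instant, Event> before = byStart.floorEntry(e.start);  // latest start <= e.start
        Map.Entry<Instant, Event> after = byStart.ceilingEntry(e.start); // earliest start >= e.start
        boolean clashesBefore = before != null && before.getValue().end.isAfter(e.start);
        boolean clashesAfter = after != null && after.getKey().isBefore(e.end);
        if (clashesBefore || clashesAfter) {
            throw new IllegalArgumentException("overlaps an existing event: " + e.name);
        }
        byStart.put(e.start, e);
    }

    /** Returns the event covering instant t, or null if none does. */
    Event at(Instant t) {
        Map.Entry<Instant, Event> candidate = byStart.floorEntry(t);
        return candidate != null && t.isBefore(candidate.getValue().end) ? candidate.getValue() : null;
    }
}
```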
I'm looking for an in-memory map with Java-friendly APIs (not necessarily written in Java) that supports range queries. Our design doesn't yet call for it to be distributed.
Any suggestions? Thanks!
Use a TreeMap. A range query can be done with the navigation methods lowerKey/lowerEntry and higherKey/higherEntry: find the greatest key smaller than the left end of the range and the smallest key greater than the right end, then return everything between them.
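A sketch of that range query over any NavigableMap (such as TreeMap), next to subMap(fromKey, true, toKey, true), which returns the same closed range as a view in one call. The class and method names are illustrative; keys are assumed non-null and lo <= hi.

```java
import java.util.NavigableMap;

/** Sketch: two equivalent ways to get the entries with lo <= key <= hi. */
class RangeQueries {
    /** Uses lowerKey/higherKey to find the neighbours just outside [lo, hi]. */
    static <K, V> NavigableMap<K, V> byNeighbours(NavigableMap<K, V> map, K lo, K hi) {
        K below = map.lowerKey(lo);  // greatest key strictly less than lo, or null
        K above = map.higherKey(hi); // least key strictly greater than hi, or null
        NavigableMap<K, V> view = map;
        if (below != null) {
            view = view.tailMap(below, false); // keys > below, i.e. keys >= lo
        }
        if (above != null) {
            view = view.headMap(above, false); // keys < above, i.e. keys <= hi
        }
        return view;
    }

    /** The same closed range in a single call. */
    static <K, V> NavigableMap<K, V> bySubMap(NavigableMap<K, V> map, K lo, K hi) {
        return map.subMap(lo, true, hi, true);
    }
}
```

Both return views backed by the original map, so no entries are copied; for a TreeMap, locating the range endpoints takes O(log n).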
Depending on how flexible and extensible you need things to be, you could consider using an in-memory database. That would give you far more capability than you've mentioned here, and is probably only worthwhile if you think you might need a lot more one day: you would be taking on a lot of complexity, and possibly space, for something extremely flexible. But you should be aware that several free Java databases offer in-memory configurations, including Derby (which shipped with JDK 6 through 8 as Java DB).
Is an interval tree perhaps what you're looking for?
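For illustration, a minimal interval tree in the augmented-BST style: each node stores an interval keyed by its low end, plus the largest high end in its subtree, which lets a point ("stabbing") query skip subtrees that cannot contain the point. This sketch is unbalanced and uses int endpoints for brevity; a production version would sit on a balanced tree (e.g. red-black) to keep the height logarithmic. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal (unbalanced) augmented-BST interval tree over closed intervals [lo, hi]. */
class IntervalTree {
    private static final class Node {
        final int lo;
        final int hi;
        int maxHi; // largest hi anywhere in this subtree
        Node left;
        Node right;

        Node(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
            this.maxHi = hi;
        }
    }

    private Node root;

    void insert(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi");
        }
        root = insert(root, lo, hi);
    }

    private static Node insert(Node n, int lo, int hi) {
        if (n == null) {
            return new Node(lo, hi);
        }
        if (lo < n.lo) {
            n.left = insert(n.left, lo, hi);
        } else {
            n.right = insert(n.right, lo, hi);
        }
        n.maxHi = Math.max(n.maxHi, hi);
        return n;
    }

    /** Returns every stored interval with lo <= point <= hi, as {lo, hi} pairs ordered by lo. */
    List<int[]> stab(int point) {
        List<int[]> out = new ArrayList<>();
        stab(root, point, out);
        return out;
    }

    private static void stab(Node n, int point, List<int[]> out) {
        if (n == null || n.maxHi < point) {
            return; // no interval in this subtree reaches the point
        }
        stab(n.left, point, out);
        if (n.lo <= point && point <= n.hi) {
            out.add(new int[] {n.lo, n.hi});
        }
        if (n.lo <= point) {
            stab(n.right, point, out); // right subtree only holds intervals with lo >= n.lo
        }
    }
}
```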
I have to store more than 100 million key-value pairs in my HashMultiMap (a key can have multiple values). Now I want to use Jedis for that. I downloaded it from here - Jedis 2.0.0.0.jar, as was recommended to me here. However, after a bit of searching, I could not find any good documentation for a beginner:
1) How do I use Jedis? Specifically, do I treat it like any other .jar file in Java, e.g. Guava?
2) How do I implement a HashMultiMap (a key can have multiple values) in Redis?
3) How do I perform insertion, searching, etc. in Redis?
4) While searching for Redis, I found many options, such as Jedis, Redis, JRedis, etc. What are these variations? And which one would be best for solving this?
Any information and/or links to documentation would be helpful. Sorry if any of these questions are stupid; I have no idea about Redis, so a starting point would be valuable for me. Thanks.
I'm afraid there isn't a simple way to achieve what you want. Redis only has normal hashes. One key - one value.
However, you can serialize your multiple values to a string and store that as a value. Of course, you lose ability to individually insert/update/remove items, you'll have to reset the whole value every time. But this might not be a problem for you.
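As a Redis-client-agnostic sketch of that serialization step, the class below packs a list of values into a single string using length prefixes (so values may contain any characters, including the separator) and unpacks it again. The format and class name are assumptions; the resulting string would be stored under one key with whichever client you use, and every update still means reading, modifying, and rewriting the whole string.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: packs several values into one "length:value" string and back. */
final class PackedValues {
    /** Encodes each value as its length, a colon, then the value itself. */
    static String pack(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v.length()).append(':').append(v);
        }
        return sb.toString();
    }

    /** Reverses pack(): reads a length, skips the colon, takes exactly that many chars. */
    static List<String> unpack(String packed) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < packed.length()) {
            int colon = packed.indexOf(':', i);
            int len = Integer.parseInt(packed.substring(i, colon));
            out.add(packed.substring(colon + 1, colon + 1 + len));
            i = colon + 1 + len;
        }
        return out;
    }
}
```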
Redis has a few native data types, such as lists, sets, and hashes. I guess you can use sets for your case. That's better than serializing the whole data, because operations on the native types are atomic, so you won't need to worry about possible race conditions.
check out https://github.com/xetorthio/jedis/wiki and http://redis.io/commands
There are several approaches that use a list, sorted set, or hash as the value for a single field of your multimap. Then you can
a) make use of separate databases to provide separate namespaces, i.e. to delimit your overall multimap (see SELECT), and/or
b) use the rich semantics that keys have in Redis (see example here). You could build your multimap simply from regular key/value mappings with SET/GET, with the key name additionally describing your map fields. You have a variety of options to get what you want. One of the last resorts is scripting.
Depends!
AFAIK, Jedis is the most mature.
Is there any Java library with TreeMap-like data structure which also supports all of these:
lookup by value (like Guava's BiMap)
possibility of non-unique keys as well as non unique values (like Guava's Multimap)
keeps track of sorted values as well as sorted keys
If it exists, it would probably be called SortedBiTreeMultimap, or similar :)
This can be produced using a few data structures together, but I never took time to unite them in one nice class, so I was wondering if someone else has done it already.
I think you are looking for a "Graph". You might be interested in this slightly similar question asked a while ago, as well as this discussion thread on BiMultimaps / Graphs. Google has a BiMultimap in its internal code base, but they haven't yet decided whether to open source it.
Well, is there a high-performance graph library for working with primitives, without the generics/autoboxing overhead? For lists of doubles you can use Trove, and for linear algebra you can use netlib-java (examples to help you understand the point of my interest in this question).
As for graphs/networks: all the libraries I've found use generics and are probably not that performant. I may as well run some tests on that, but I believe that heap-managed network link weights would be inferior to a double[] with some bit offsets to compute the index for i and j. The usage scenario: there are hundreds of such networks (most of them sparse) of size 4k*4k, and a genetic optimization runs over that set of networks, doing some flow/min-route estimations for each specimen.
So there are JGraphT, JUNG, ANNAS, and JDSL (the links lead to the APIs/code samples, which expose the miserable Java generics/Object wrappers in all of them). Are there any Trove-ish alternatives? I've already created a simplistic implementation, but decided to look around to avoid reinventing the wheel...
Any opinions, suggestions?
Thanks,
Anton
PS: Please don't start on performance of generics-laden Java code, at least without linking to some decent benchmark, ok? ;)
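A sketch of the "double[] with bit offsets" idea from the question: when the node count n is a power of two (4096 = 2^12), the index of (i, j) is (i << log2n) | j, with no boxing. The class name is hypothetical. For n = 4096 the array holds 2^24 doubles, i.e. 128 MiB per network, so for hundreds of mostly sparse networks a sparse layout is likely necessary.

```java
import java.util.Objects;

/** Sketch: dense n x n weight matrix in one double[], n = 2^log2n, index = (i << log2n) | j. */
final class DenseNetwork {
    private final int log2n;
    private final int n;
    private final double[] weights;

    DenseNetwork(int log2n) {
        if (log2n < 0 || log2n > 15) { // keeps n * n <= 2^30, within int indexing
            throw new IllegalArgumentException("log2n must be in [0, 15]");
        }
        this.log2n = log2n;
        this.n = 1 << log2n;
        this.weights = new double[n * n];
    }

    int size() {
        return n;
    }

    private int index(int i, int j) {
        // Check both coordinates: an out-of-range j would otherwise silently alias into row i + 1.
        Objects.checkIndex(i, n);
        Objects.checkIndex(j, n);
        return (i << log2n) | j;
    }

    double weight(int i, int j) {
        return weights[index(i, j)];
    }

    void setWeight(int i, int j, double w) {
        weights[index(i, j)] = w;
    }
}
```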
You could use a sparse matrix with compressed row storage. It's not the best fit and not specialized, but you can build upon it.
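A minimal compressed-sparse-row (CSR) sketch using only primitive arrays: row i's edges occupy positions rowStart[i] to rowStart[i + 1] - 1 of the col and weight arrays, sorted by column so that a single weight can be found by binary search. The class name, the constructor's parallel-array input, and returning NaN for a missing edge are assumptions. Memory is roughly 4(n + 1) + 12m bytes for m edges, versus 8n^2 for a dense double array.

```java
import java.util.Arrays;

/** Sketch: compressed sparse row (CSR) adjacency with double weights, primitives only. */
final class CsrNetwork {
    final int n;
    final int[] rowStart;  // row i's edges live at [rowStart[i], rowStart[i + 1])
    final int[] col;       // target node of each edge, sorted within each row
    final double[] weight; // weight of each edge

    /** Builds from parallel arrays: edge k is src[k] -> dst[k] with weight w[k]; duplicates are kept. */
    CsrNetwork(int n, int[] src, int[] dst, double[] w) {
        this.n = n;
        int m = src.length;
        rowStart = new int[n + 1];
        for (int s : src) {
            rowStart[s + 1]++; // count out-degree of each node
        }
        for (int i = 0; i < n; i++) {
            rowStart[i + 1] += rowStart[i]; // prefix sums give row offsets
        }
        col = new int[m];
        weight = new double[m];
        int[] next = Arrays.copyOf(rowStart, n); // next free slot per row
        for (int k = 0; k < m; k++) {
            int p = next[src[k]]++;
            col[p] = dst[k];
            weight[p] = w[k];
        }
        // Insertion-sort each row by column, moving col/weight pairs together.
        for (int i = 0; i < n; i++) {
            for (int a = rowStart[i] + 1; a < rowStart[i + 1]; a++) {
                int c = col[a];
                double x = weight[a];
                int b = a - 1;
                while (b >= rowStart[i] && col[b] > c) {
                    col[b + 1] = col[b];
                    weight[b + 1] = weight[b];
                    b--;
                }
                col[b + 1] = c;
                weight[b + 1] = x;
            }
        }
    }

    /** Weight of edge i -> j, or Double.NaN if absent (binary search within row i). */
    double weight(int i, int j) {
        int pos = Arrays.binarySearch(col, rowStart[i], rowStart[i + 1], j);
        return pos >= 0 ? weight[pos] : Double.NaN;
    }

    int degree(int i) {
        return rowStart[i + 1] - rowStart[i];
    }
}
```

Iterating a node's neighbours is a plain loop from rowStart[i] to rowStart[i + 1], which suits the inner loops of flow and shortest-route estimations.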
Well, there are some general-purpose sparse matrix implementations that don't use generics, and one rather solid performance benchmark:
java-matrix-benchmark on google code
ujmp related overview
The most convincing is MTJ's sparse matrix.
Please add answers to the question if you have any suggestions or updates. I'll accept any better ideas. Thanks.
If you need performant data structures, you should check out the fastutil project, which provides implementations of the Java Collections Framework that are efficient in both time and memory. Part of the performance comes from avoiding boxing and unboxing of primitive types.
fastutil's data structures are very efficient. If you need a graph ADT implementation, you could check out this one, an efficient in-memory graph implementation based on fastutil.
The project was part of my MS thesis, which was about community detection in big graphs.
Hope it helps!