I need to implement an n:m relation in Java.
The use case is a catalog.
a product can be in multiple categories
a category can hold multiple products
My current solution is to have a mapping class that has two hashmaps.
The key of the first hashmap is the product id and the value is a list of category ids
The key to the second hashmap is the category id and the value is a list of product ids
This is totally redundant, and I need a setter class that always makes sure the data is stored/deleted in both hashmaps.
But this is the only way I found to make the following performant in O(1):
Which products does a category hold?
Which categories is a product in?
I want to avoid full array scans or anything like that at all costs.
But there must be another, more elegant solution where I don't need to index the data twice.
Please enlighten me. I only have plain Java available: no database, SQLite, or anything like that. I also don't really want to implement a B-tree structure if possible.
If you associate Categories with Products via a member collection, and vice versa, then you can accomplish the same thing:
public class Product {
private Set<Category> categories = new HashSet<Category>();
//implement hashCode and equals, potentially by id for extra performance
}
public class Category {
private Set<Product> contents = new HashSet<Product>();
//implement hashCode and equals, potentially by id for extra performance
}
The only difficult part is populating such a structure, where some intermediate maps might be needed.
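To illustrate the populating step, here is a minimal sketch that assumes both sets are only ever changed through one pair of helper methods, so the two directions cannot drift apart. The `Catalog.link`/`unlink` names and the `id` field are hypothetical, not part of the answer above.

```java
import java.util.HashSet;
import java.util.Set;

class Product {
    final String id;
    // Categories this product belongs to; only modified via Catalog.link/unlink.
    final Set<Category> categories = new HashSet<>();
    Product(String id) { this.id = id; }
    // Identity by id, as the comment in the answer suggests.
    @Override public boolean equals(Object o) {
        return o instanceof Product && ((Product) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

class Category {
    final String id;
    // Products held by this category; only modified via Catalog.link/unlink.
    final Set<Product> contents = new HashSet<>();
    Category(String id) { this.id = id; }
    @Override public boolean equals(Object o) {
        return o instanceof Category && ((Category) o).id.equals(id);
    }
    @Override public int hashCode() { return id.hashCode(); }
}

final class Catalog {
    // Updates both sides together so the relation stays symmetric.
    static void link(Product p, Category c) {
        p.categories.add(c);
        c.contents.add(p);
    }

    static void unlink(Product p, Category c) {
        p.categories.remove(c);
        c.contents.remove(p);
    }
}
```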
But the approach of using auxiliary hashmaps/trees for indexing is not a bad one. After all, most indices placed on databases for example are auxiliary data structures: they coexist with the table of rows; the rows aren't necessarily organized in the structure of the index itself.
Using an external structure like this empowers you to keep optimizations and data separate from each other; that's not a bad thing. Especially if tomorrow you want to add O(1) look-ups for Products given a Vendor, e.g.
Edit: By the way, it looks like what you want is an implementation of a Multimap optimized to do reverse lookups in O(1) as well. I don't think Guava has anything that does that, but you could implement the Multimap interface so at least you don't have to deal with maintaining the HashMaps separately. Actually it's more like a BiMap that is also a Multimap, which is contradictory given their definitions. I agree with MStodd that you probably want to roll your own layer of abstraction to encapsulate the two maps.
Your solution is perfectly good. Remember that putting an object into a HashMap doesn't make a copy of the Object, it just stores a reference to it, so the cost in time and memory is quite small.
I would go with your first solution. Have a layer of abstraction around two hashmaps. If you're worried about concurrency, implement appropriate locking for CRUD.
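A minimal sketch of such an abstraction layer, assuming single-threaded use; the `ManyToMany` class and its method names are hypothetical. The two HashMaps stay private, so callers cannot update one without the other, and both directions are expected O(1) lookups.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper around the asker's two HashMaps.
final class ManyToMany<A, B> {
    private final Map<A, Set<B>> forward = new HashMap<>();
    private final Map<B, Set<A>> backward = new HashMap<>();

    public void link(A a, B b) {
        forward.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        backward.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public void unlink(A a, B b) {
        removeFrom(forward, a, b);
        removeFrom(backward, b, a);
    }

    // Read-only view of everything linked to a (e.g. categories of a product).
    public Set<B> rightOf(A a) {
        return Collections.unmodifiableSet(forward.getOrDefault(a, Collections.emptySet()));
    }

    // Read-only view of everything linked to b (e.g. products in a category).
    public Set<A> leftOf(B b) {
        return Collections.unmodifiableSet(backward.getOrDefault(b, Collections.emptySet()));
    }

    // Removes value from key's set and drops the key once its set is empty.
    private static <K, V> void removeFrom(Map<K, Set<V>> map, K key, V value) {
        Set<V> values = map.get(key);
        if (values != null && values.remove(value) && values.isEmpty()) {
            map.remove(key);
        }
    }
}
```

With your id types plugged in for `A` and `B`, the catalog becomes a single `ManyToMany` instance; if you need concurrency, the locking mentioned above would go inside `link`/`unlink`.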
If you're able to use an immutable data structure, Guava's ImmutableMultimap offers an inverse() method, which enables you to get a collection of keys by value.
My dataset looks like this:
Task-1, Priority1, (SkillA, SkillB)
Task-2, Priority2, (SkillA)
Task-3, Priority3, (SkillB, SkillC)
The calling application (client) will send in a list of skills, say (SkillD, SkillA).
lookup:
Search through the dataset for SkillD first, and find nothing.
Search for SkillA. We will find two entries - Task-1 with Priority1, Task-2 with Priority2.
Identify the task with highest priority (in this case, Task-1)
Remove Task-1 from that dataset & return Task-1 to client
Design considerations:
There will be a lot of adds/updates/deletes to the dataset when the website goes live.
There are only a few skills (about 10), but the list is not static, and for each skill there can be thousands of tasks. So the lookup/retrieval will have to be extremely fast.
I have considered a simple List with binarySearch(comparator) or a Map<Skill, SortedSet<Task>>, but I'm looking for more ideas.
What is the best way to design a data structure for this kind of dataset that allows a complex key and sorted array of tasks associated with that key.
How about changing the approach a bit?
You can use Guava, and a Multimap in particular.
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.
There are two ways to think of a Multimap conceptually: as a collection of mappings from single keys to single values:
I would suggest having a Multimap of skills to tasks; the answer to your problem lies in a powerful feature of Multimap called views.
Good luck!
I would consider MongoDB. The data object for one of your rows sounds like a better fit for a JSON document than for a row in a table, because the skill set list may grow. In a classic relational DB you solve this in one of three ways: have ever-expanding columns to make sure you have the max number of skill set columns (this is very ugly), have a separate table that groups skill sets matched to an ID, or store the skill sets as a comma-delimited list. Each of these sucks. In MongoDB you can have array fields, and the items in the array are indexable.
So with this in mind I would do all the querying in MongoDB and let it handle everything. I would create a POJO that looks like this:
public class TaskPriority {
String taskId;
String priorityId;
List<String> skillIds;
}
In MongoDB you can index all these fields to get fast searching and querying.
If it is the case that you have to cache these items locally and do these queries off of Java data structures then what you can do is create an index for the items you care about that reference instances of the TaskPriority object.
For example, to map each skill to the TaskPriority objects that need it, the following Map can be used:
Map<String, List<TaskPriority>> skillToTaskPriorities;
You can repeat this for taskId and priorityId. You would have to manage these indexes. This is usually the job of your DB to do.
Finally, you can then have POJOs and tables (or MongoDB collections) that map the taskId to a Task object containing any metadata about that task that you may wish to have. The same is true for Priority and SkillSet. So that's 4 MongoDB collections: Tasks, Priorities, SkillSets, and TaskPriorities.
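If you do keep the data in local Java structures, the question's `Map<Skill, SortedSet<Task>>` idea can be sketched as below. This assumes skills and task ids are plain strings and that a lower priority number means higher priority (Priority1 beats Priority2, as in the example); the `Task` and `TaskBoard` names are hypothetical.

```java
import java.util.*;

// Hypothetical task type; a lower priority number is assumed to be more urgent.
final class Task {
    final String id;
    final int priority;
    final Set<String> skills;
    Task(String id, int priority, String... skills) {
        this.id = id;
        this.priority = priority;
        this.skills = new HashSet<>(Arrays.asList(skills));
    }
}

final class TaskBoard {
    // Orders by priority, then id, so two tasks with equal priority are not treated as duplicates.
    private static final Comparator<Task> ORDER =
            Comparator.comparingInt((Task t) -> t.priority).thenComparing((Task t) -> t.id);

    // One sorted set per skill: first() is the most urgent task for that skill.
    private final Map<String, TreeSet<Task>> bySkill = new HashMap<>();

    void add(Task t) {
        for (String skill : t.skills) {
            bySkill.computeIfAbsent(skill, k -> new TreeSet<>(ORDER)).add(t);
        }
    }

    // Tries the requested skills in order; takes the most urgent task of the first
    // skill that has any, removes it from every skill index, and returns it.
    Optional<Task> take(List<String> requestedSkills) {
        for (String skill : requestedSkills) {
            TreeSet<Task> tasks = bySkill.get(skill);
            if (tasks != null && !tasks.isEmpty()) {
                Task best = tasks.first();
                for (String s : best.skills) {
                    bySkill.get(s).remove(best);
                }
                return Optional.of(best);
            }
        }
        return Optional.empty();
    }
}
```

Each `take` costs one hash lookup per requested skill plus O(log n) per skill of the removed task, so there is no full scan even with thousands of tasks per skill.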
The recommended way of using merge() is to first get the DTO first before inputting the changes.
public void merge(PersonModel model) {
    Person inputDTO = PersonBuilder.build(model);
    Person dto = get(pk);
    dto.setName(inputDTO.getName());
    dto.getChildren().clear();
    Iterator<Child> iter = inputDTO.getChildren().iterator();
    while (iter.hasNext()) {
        dto.getChildren().add(iter.next());
    }
    dto.merge();
}
Is there a more elegant way of performing such operation translating domain model to dto and merging it so that no data are accidentally deleted.
Example of problem:
Hibernate: prevent delete orphan when using merge();
I find having to clear the list and add everything back again very wasteful.
Can someone recommend me a design pattern or a way to code it properly?
Thank you
ADD ON:
1) Possible to use Hibernate Hashset to replace List? Will hibernate hashset replace elements base on primary keys?
any help?
"The recommended way of using merge() is to first get the DTO first before inputting the changes"
Who recommended you to do this?
"Is there a more elegant way of performing such operation translating domain model to dto and merging it so that no data are accidentally deleted."
I don't think you can translate domain objects to DTOs. A DTO is just about data, a domain object is data, behaviour and context. Completely different.
If you don't have behaviour and context in your domain objects (a.k.a. anemic domain model), you don't need an extra DTO layer that just duplicates the objects.
Because you tagged this question with Hibernate and mentioned it in your question, you don't need to call merge yourself because you just got the object from the database and Hibernate will flush the session to synchronize the changes with the database.
"Possible to use Hibernate Hashset to replace List? Will hibernate hashset replace elements base on primary keys?"
I would replace the List with a HashSet, since the table where the data is going to be stored is a set, not a list (you can't have duplicate records). A HashSet will not replace elements based on primary keys. A set (any set; Hibernate's implementation is no different) works by preventing duplicates. It uses your equals() and hashCode() implementations to find out if there is already an equal object in the set. If there is, the new object won't be added and the original is kept.
I've looked a lot into being able to use Hibernate to persist a map like Map<String, Set<Entity>> with little luck (especially since I want it all to be on one table).
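A small sketch of that behaviour, assuming a hypothetical `Child` whose equals()/hashCode() use its database id. It also shows one way to sync the collection without clear(): `retainAll` drops children that are gone and `addAll` adds new ones, while matching children keep their original instance (so changed fields on them are not copied over).

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical child entity whose identity is its primary key.
final class Child {
    final Long id;
    final String name;
    Child(Long id, String name) { this.id = id; this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof Child && Objects.equals(id, ((Child) o).id);
    }
    @Override public int hashCode() { return Objects.hashCode(id); }
}

final class ChildSync {
    // Brings 'current' in line with 'incoming' without clearing it first:
    // removes children that are gone, adds new ones, leaves matching ones untouched.
    static void sync(Set<Child> current, Set<Child> incoming) {
        current.retainAll(incoming);
        current.addAll(incoming);
    }
}
```

Using the id in equals() only works once ids are assigned; unsaved children with a null id would all be considered equal to each other.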
Mapping MultiMaps with Hibernate is the thing that seems to get referenced the most, which describes in detail how to go about implementing this using a UserCollectionType.
I was wondering, since that was written over four years ago, is there any better way of doing it now?
So, for example, I would like to have on EntityA a map like Map<String, Set/List<EntityB>>.
There would be two tables: EntityA and EntityB (with EntityB having a foreign key back to EntityA).
I don't want any intermediate tables.
The way it's done on my current project is that we transform beans/collections to XML using XStream:
public static String toXML(Object instance) {
XStream xs = new XStream();
StringWriter writer = new StringWriter();
xs.marshal(instance, new CompactWriter(writer));
return writer.toString();
}
and then use the @Lob type in Hibernate for persisting:
@Lob
@Column(nullable = false)
private String data;
I found this approach very generic; you can effectively implement flexible key/value storage with it. If you don't like the XML format, the XStream framework has a built-in driver for transforming objects to JSON. Give it a try, it's really cool.
Cheers
EDIT: Response to comment.
Yes, if you want to overcome the limitations of the classic approach, you will probably sacrifice something like indexing and/or search. You could still implement indexing/searching/foreign/child relationships through collections/generic entity beans yourself: just maintain a separate key/value table with property name/property value(s) for the properties you think need to be searchable.
I've seen a number of database designs for products where a flexible and dynamic schema (i.e. creating new attributes for domain objects without downtime) is needed, and many of them use key/value tables for storing domain attributes and references from owner objects to child ones. Those products cost millions of dollars (banking/telco), so I guess this design is already proven to be effective.
Sorry, that's not answer to your original question since you asked about solution without intermediate tables.
It depends :) When things are getting complex, you should understand what your application is doing.
In some situations, you may represent your Set as a TreeSet, and represent this TreeSet as an ordered, coded String, such as ["1", "8", "12"] where 1, 8, 12 are primary keys, and then write the code!
Obviously, it's not a general answer for, in my opinion, a too general question.
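For the TreeSet idea above, a minimal sketch of encoding and decoding such a string, assuming numeric (Long) primary keys; the `KeySetCodec` name is hypothetical. Using `TreeSet<Long>` rather than `TreeSet<String>` matters here: strings would sort lexicographically as ["1", "12", "8"].

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper: stores a set of primary keys as one ordered string column.
final class KeySetCodec {
    // Sorts numerically via TreeSet<Long>, then quotes and joins: ["1", "8", "12"].
    static String encode(Set<Long> ids) {
        return new TreeSet<>(ids).stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    // Parses the format produced by encode() back into a sorted set.
    static TreeSet<Long> decode(String encoded) {
        String body = encoded.trim();
        body = body.substring(1, body.length() - 1).trim(); // strip [ and ]
        TreeSet<Long> ids = new TreeSet<>();
        if (body.isEmpty()) {
            return ids;
        }
        for (String part : body.split(",")) {
            ids.add(Long.parseLong(part.trim().replace("\"", "")));
        }
        return ids;
    }
}
```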
I need a map that has two keys, e.g.
Map2<String /*ssn*/, String /*empId*/, Employee> _employees;
So that I can
_employees.put(e.ssn(), e.empId(), e);
And later
_employees.get1(someSsn);
_employees.get2(someEmpId);
Or even
_employees.remove2(someEmpId);
I am not sure why I want to stop at two, why not more; probably because that's the case I need right now :-) But the type needs to handle a fixed number of keys to be type-safe; type parameters cannot be varargs :-)
Appreciate any pointers, or advice on why this is a bad idea.
I imagine the main key would be empId, so I would build a Map with that as the key, i.e. empId ---> Employee. All other unique attributes (e.g. ssn) would be treated as secondary and will use separate Maps as a lookup table for empId (e.g. ssn ---> empId).
This implementation makes it easy to add/remove employees, since you only need to change one Map, i.e. empId ---> Employee; the other Maps can be rebuilt only when needed.
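A minimal sketch of that design, with hypothetical `Employee` and `EmployeeDirectory` classes. Unlike the lazy rebuild suggested above, this version simply updates the ssn ---> empId table on every put/remove, which is cheap for a single extra map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical employee type with two unique attributes.
final class Employee {
    final String empId;
    final String ssn;
    Employee(String empId, String ssn) { this.empId = empId; this.ssn = ssn; }
}

final class EmployeeDirectory {
    // Primary map: empId ---> Employee (the only place Employee objects live).
    private final Map<String, Employee> byEmpId = new HashMap<>();
    // Secondary lookup table: ssn ---> empId.
    private final Map<String, String> empIdBySsn = new HashMap<>();

    void put(Employee e) {
        remove(e.empId); // drop any stale ssn entry for a replaced employee
        byEmpId.put(e.empId, e);
        empIdBySsn.put(e.ssn, e.empId);
    }

    Optional<Employee> getByEmpId(String empId) {
        return Optional.ofNullable(byEmpId.get(empId));
    }

    // Two hash lookups: ssn ---> empId ---> Employee.
    Optional<Employee> getBySsn(String ssn) {
        return Optional.ofNullable(empIdBySsn.get(ssn)).map(byEmpId::get);
    }

    void remove(String empId) {
        Employee old = byEmpId.remove(empId);
        if (old != null) {
            empIdBySsn.remove(old.ssn);
        }
    }
}
```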
My first thought was: the easiest way to do this, I think, would be two maps.
Map< String, Map< String,Employee> > _employees;
But from what it looks like, you just want to be able to look up an employee by either SSN or ID. What's to stop you then from making two maps, or at worst a class that contains two maps?
As a clarification: are you looking for a compound key, because employees are uniquely identified by the combination of their SSN and ID but not by either one alone, or are you looking for two different ways of referencing an employee?
The Spiffy Framework appears to provide exactly what you're looking for. From the Javadocs:
"A two-dimensional hashmap, is a HashMap that enables you to refer to values via two keys rather than one"
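If it is the first case (a compound key), a single map keyed by a small value class is enough. A minimal sketch, with a hypothetical `SsnEmpIdKey` class; note that with a compound key you can no longer look up by SSN alone without scanning, which is what the two-map approach is for.

```java
import java.util.Objects;

// Hypothetical compound key: only the (ssn, empId) pair together identifies an entry.
final class SsnEmpIdKey {
    final String ssn;
    final String empId;
    SsnEmpIdKey(String ssn, String empId) { this.ssn = ssn; this.empId = empId; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof SsnEmpIdKey)) return false;
        SsnEmpIdKey k = (SsnEmpIdKey) o;
        return ssn.equals(k.ssn) && empId.equals(k.empId);
    }

    // Must be consistent with equals() so HashMap lookups work.
    @Override public int hashCode() { return Objects.hash(ssn, empId); }
}
```

Usage would then be a plain `Map<SsnEmpIdKey, Employee>`.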
The relevant class is TwoDHashMap. It also provides a ThreeDHashMap.
Suppose you have a collection of a few hundred in-memory objects and you need to query this List to return objects matching some SQL or Criteria like query. For example, you might have a List of Car objects and you want to return all cars made during the 1960s, with a license plate that starts with AZ, ordered by the name of the car model.
I know about JoSQL, has anyone used this, or have any experience with other/homegrown solutions?
Filtering is one way to do this, as discussed in other answers.
Filtering is not scalable, though. On the surface, time complexity would appear to be O(n) (i.e. already not scalable if the number of objects in the collection will grow), but because one or more tests need to be applied to each object depending on the query, time complexity is more accurately O(n·t), where t is the number of tests to apply to each object.
So performance will degrade as additional objects are added to the collection, and/or as the number of tests in the query increases.
There is another way to do this, using indexing and set theory.
One approach is to build indexes on the fields within the objects stored in your collection and which you will subsequently test in your query.
Say you have a collection of Car objects and every Car object has a field color. Say your query is the equivalent of "SELECT * FROM cars WHERE Car.color = 'blue'". You could build an index on Car.color, which would basically look like this:
'blue' -> {Car{name=blue_car_1, color='blue'}, Car{name=blue_car_2, color='blue'}}
'red' -> {Car{name=red_car_1, color='red'}, Car{name=red_car_2, color='red'}}
Then given a query WHERE Car.color = 'blue', the set of blue cars could be retrieved in O(1) time complexity. If there were additional tests in your query, you could then test each car in that candidate set to check if it matched the remaining tests in your query. Since the candidate set is likely to be significantly smaller than the entire collection, time complexity is less than O(n) (in the engineering sense, see comments below). Performance does not degrade as much, when additional objects are added to the collection. But this is still not perfect, read on.
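A minimal sketch of such an index on `Car.color`, assuming a simple `Car` class with `name` and `color` fields; the `ColorIndex` class is hypothetical. The index narrows the candidates with one hash lookup, and only those candidates are checked against the remaining tests.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class Car {
    final String name;
    final String color;
    Car(String name, String color) { this.name = name; this.color = color; }
}

// Hypothetical hash index on Car.color: color -> cars with that color.
final class ColorIndex {
    private final Map<String, Set<Car>> byColor = new HashMap<>();

    void add(Car car) {
        byColor.computeIfAbsent(car.color, k -> new HashSet<>()).add(car);
    }

    void remove(Car car) {
        Set<Car> cars = byColor.get(car.color);
        if (cars != null) {
            cars.remove(car);
            if (cars.isEmpty()) byColor.remove(car.color);
        }
    }

    // WHERE color = ? AND <remaining tests>: the index yields the candidate set,
    // then only those candidates are tested against the remaining conditions.
    List<Car> query(String color, Predicate<Car> remainingTests) {
        return byColor.getOrDefault(color, Collections.emptySet()).stream()
                .filter(remainingTests)
                .collect(Collectors.toList());
    }
}
```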
Another approach is what I would refer to as a standing query index. To explain: with conventional iteration and filtering, the collection is iterated and every object is tested to see if it matches the query. So filtering is like running a query over a collection. A standing query index would be the other way around, where the collection is instead run over the query, but only once for each object in the collection, even though the collection could be queried any number of times.
A standing query index would be like registering a query with some sort of intelligent collection, such that as objects are added to and removed from the collection, the collection would automatically test each object against all of the standing queries which have been registered with it. If an object matches a standing query then the collection could add/remove it to/from a set dedicated to storing objects matching that query. Subsequently, objects matching any of the registered queries could be retrieved in O(1) time complexity.
The information above is taken from CQEngine (Collection Query Engine). This basically is a NoSQL query engine for retrieving objects from Java collections using SQL-like queries, without the overhead of iterating through the collection. It is built around the ideas above, plus some more. Disclaimer: I am the author. It's open source and in maven central. If you find it helpful please upvote this answer!
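To make the standing-query idea concrete, here is a toy sketch of the concept only; it is not CQEngine's API, and the `StandingQueryCollection` name and its methods are hypothetical. Each registered query keeps its own result set, updated as objects are added and removed, so reading the results is O(1).

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical sketch: queries are registered up front, and each object is
// tested against them once, when it is added, instead of on every query.
final class StandingQueryCollection<T> {
    private final Map<String, Predicate<T>> queries = new HashMap<>();
    private final Map<String, Set<T>> results = new HashMap<>();
    private final Set<T> all = new HashSet<>();

    void register(String name, Predicate<T> query) {
        queries.put(name, query);
        Set<T> matches = new HashSet<>();
        for (T t : all) { // one pass over existing objects at registration time
            if (query.test(t)) matches.add(t);
        }
        results.put(name, matches);
    }

    void add(T t) {
        if (all.add(t)) {
            queries.forEach((name, q) -> {
                if (q.test(t)) results.get(name).add(t);
            });
        }
    }

    void remove(T t) {
        if (all.remove(t)) {
            for (Set<T> matches : results.values()) matches.remove(t);
        }
    }

    // O(1): returns a read-only view of the registered query's current results.
    Set<T> matching(String name) {
        return Collections.unmodifiableSet(results.getOrDefault(name, Collections.emptySet()));
    }
}
```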
I have used Apache Commons JXPath in a production application. It allows you to apply XPath expressions to graphs of objects in Java.
Yes, I know it's an old post, but new technologies appear every day and the answer changes over time.
I think this is a good problem to solve it with LambdaJ. You can find it here:
http://code.google.com/p/lambdaj/
Here you have an example:
LOOK FOR ACTIVE CUSTOMERS // (Iterable version)
List<Customer> activeCustomers = new ArrayList<Customer>();
for (Customer customer : customers) {
if (customer.isActive()) {
activeCustomers.add(customer);
}
}
LambdaJ version
List<Customer> activeCustomers = select(customers,
having(on(Customer.class).isActive()));
Of course, this kind of beauty comes at a cost in performance (a little... on average about 2 times slower), but can you find more readable code?
It has many many features, another example could be sorting:
Sort Iterative
List<Person> sortedByAgePersons = new ArrayList<Person>(persons);
Collections.sort(sortedByAgePersons, new Comparator<Person>() {
public int compare(Person p1, Person p2) {
return Integer.valueOf(p1.getAge()).compareTo(p2.getAge());
}
});
Sort with lambda
List<Person> sortedByAgePersons = sort(persons, on(Person.class).getAge());
Update: since Java 8 you can use lambda expressions out of the box, like:
List<Customer> activeCustomers = customers.stream()
.filter(Customer::isActive)
.collect(Collectors.toList());
Continuing the Comparator theme, you may also want to take a look at the Google Collections API. In particular, they have an interface called Predicate, which serves a similar role to Comparator, in that it is a simple interface that can be used by a filtering method, like Sets.filter. They include a whole bunch of composite predicate implementations, to do ANDs, ORs, etc.
Depending on the size of your data set, it may make more sense to use this approach than a SQL or external relational database approach.
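Guava's Predicate has a JDK counterpart since Java 8: `java.util.function.Predicate`, which provides `and()`, `or()` and `negate()` for composing conditions. A sketch using the JDK type instead of Guava, with a hypothetical `Car` class and hypothetical predicate factory names:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicateDemo {
    // Hypothetical car type for illustration.
    static final class Car {
        final int year;
        final String license;
        Car(int year, String license) { this.year = year; this.license = license; }
    }

    // Small reusable predicates that can be combined with and()/or()/negate().
    static Predicate<Car> builtIn(int fromYear, int toYear) {
        return c -> c.year >= fromYear && c.year <= toYear;
    }

    static Predicate<Car> plateStartsWith(String prefix) {
        return c -> c.license.startsWith(prefix);
    }

    static List<Car> filter(List<Car> cars, Predicate<Car> p) {
        return cars.stream().filter(p).collect(Collectors.toList());
    }
}
```

For example, `builtIn(1960, 1969).and(plateStartsWith("AZ"))` expresses the 1960s/AZ condition from the question as one composed predicate.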
If you need a single concrete match, you can have the class implement Comparator, then create a standalone object with all the hashed fields included and use it to return the index of the match. When you want to find (potentially) more than one object in the collection, you'll have to turn to a library like JoSQL (which has worked well in the trivial cases I've used it for).
In general, I tend to embed Derby into even my small applications, use Hibernate annotations to define my model classes and let Hibernate deal with caching schemes to keep everything fast.
I would use a Comparator that takes a range of years and license plate pattern as input parameters. Then just iterate through your collection and copy the objects that match. You'd likely end up making a whole package of custom Comparators with this approach.
The Comparator option is not bad, especially if you use anonymous classes (so as not to create redundant classes in the project), but eventually when you look at the flow of comparisons, it's pretty much just like looping over the entire collection yourself, specifying exactly the conditions for matching items:
for (Car car : cars) {
if (1959 < car.getYear() && 1970 > car.getYear() &&
car.getLicense().startsWith("AZ")) {
result.add(car);
}
}
Then there's the sorting... that might be a pain in the backside, but luckily there's class Collections and its sort methods, one of which receives a Comparator...
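Since Java 8, the filtering and the sorting can be expressed together with streams. A sketch of the original 1960s/AZ/model-name query, assuming a hypothetical `Car` with `model`, `year` and `license` fields; note that this still iterates the whole list, so it improves readability, not complexity.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class CarQuery {
    // Hypothetical car type matching the question's example fields.
    static final class Car {
        final String model;
        final int year;
        final String license;
        Car(String model, int year, String license) {
            this.model = model;
            this.year = year;
            this.license = license;
        }
    }

    // Equivalent of: WHERE year BETWEEN 1960 AND 1969 AND license LIKE 'AZ%' ORDER BY model
    static List<Car> sixtiesAzByModel(List<Car> cars) {
        return cars.stream()
                .filter(c -> c.year >= 1960 && c.year <= 1969)
                .filter(c -> c.license.startsWith("AZ"))
                .sorted(Comparator.comparing((Car c) -> c.model))
                .collect(Collectors.toList());
    }
}
```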