I am reading a huge XML file with a Java SAX parser:
http://api.steampowered.com/IEconItems_440/GetSchema/v0001/?format=xml
(2.82 MB)
This file contains several thousand 'items', each with properties like 'name', 'level', etc. One of the properties is a unique integer identifier called 'defindex'. I am creating POJOs for each of these items with some of the properties mentioned above as fields (defindex is one of them).
I will need to look up these item objects frequently by their defindex.
I won't change the objects' data fields, though.
My question is: How should I store these item objects?
My first thought was to store them in an array and use the defindex as the actual array index, but the array would be huge and not all defindexes are used; e.g., it jumps from 2k to 30k at one point.
Use a Map.
Map objects store relationships between unique "keys" and values.
Implementations of Map include HashMap and TreeMap, among others. Map is generic, with one type parameter for the key and one for the value.
You could use the following. This is DEFINITELY pseudocode; adapt it to however you are going to be manipulating these objects. I did not take the SAX API into account; this just demonstrates how to use a Map.
Map<Integer, Item> items = new HashMap<>();
for (Item item : file) { // or however you iterate
    items.put(item.getDefindex(), item);
}
// data retrieval
Item itemToRetrieve = items.get(defindexToGet);
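Since the pseudocode above leaves out the SAX side, here is a minimal sketch of a DefaultHandler that fills the map while parsing. It assumes (hypothetically; check the real schema) that each item is an <item> element with direct <defindex> and <name> children, and Item here is a stand-in POJO. Only direct children of <item> are read, so nested elements that reuse a tag name such as <name> do not overwrite the item's own values.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Stand-in immutable POJO: fields are final because the data never changes after parsing.
final class Item {
    private final int defindex;
    private final String name;

    Item(int defindex, String name) {
        this.defindex = defindex;
        this.name = name;
    }

    int getDefindex() { return defindex; }
    String getName() { return name; }
}

// Collects <item> elements into a HashMap keyed by defindex (assumed element layout).
class ItemHandler extends DefaultHandler {
    final Map<Integer, Item> items = new HashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int depth;          // current element nesting depth
    private int itemDepth = -1; // depth of the <item> being read, -1 when outside one
    private Integer defindex;
    private String name;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        text.setLength(0); // each leaf element starts with an empty text buffer
        if (itemDepth < 0 && qName.equals("item")) {
            itemDepth = depth;
            defindex = null;
            name = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (itemDepth >= 0) {
            if (depth == itemDepth + 1 && qName.equals("defindex")) {
                defindex = Integer.valueOf(text.toString().trim());
            } else if (depth == itemDepth + 1 && qName.equals("name")) {
                name = text.toString().trim();
            } else if (depth == itemDepth) { // closing </item>
                if (defindex != null) {
                    items.put(defindex, new Item(defindex, name));
                }
                itemDepth = -1;
            }
        }
        depth--;
    }

    // Parses an XML string; for the real file, pass an InputStream to parser.parse instead.
    static Map<Integer, Item> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ItemHandler handler = new ItemHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse item XML", e);
        }
    }
}
```

Unused defindexes cost nothing here, unlike the sparse-array idea: the HashMap only holds entries that actually appear in the file, and `items.get(defindex)` is an expected O(1) lookup.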
Related
I have a Java ArrayList that looks like this:
{[{},{}], [{},{}], [{},{}], [{},{}]} with around four thousand records.
I have a particular key that I want to search for in one of the objects in this list, and then fetch the array whose record matches. The search key is a string.
Is there a way to do this without traversing the entire list?
It is basically a list that is constructed like this:
List<Object[]> list = new ArrayList<>();
I am using this to fetch the data from two tables using a join. Individual records of each table map to these objects.
Say table1: {a:1,b:2,c:3} and table2: {x:1,y:2,z:3}
the data returned would be
{[{a:1,b:2,c:3}, {x:1,y:2,z:3}],[{a:2,b:3,c:4}, {x:2,y:3,z:4}]}
How can I find, say, which array in the list has a=2?
Thanks
If you do not want to be stuck with a linear search, you should consider using a data structure other than a List.
The use case you described seems like a good match for a Map in general. If you want constant-time key lookup, use a HashMap.
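A minimal sketch of that idea: pay for one pass over the list to build a HashMap index, then answer every "which row has a=2?" query with a single get. Table1Row and Table2Row are hypothetical stand-ins for the two joined tables (Java 16+ records), and the index keys on the int field a from the example; swap in String if your real search column is a string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types for the two joined tables.
record Table1Row(int a, int b, int c) {}
record Table2Row(int x, int y, int z) {}

class JoinIndex {
    // One O(n) pass builds the index; afterwards each lookup is an expected O(1) HashMap get.
    // A list per key is kept because several joined rows may share the same value of a.
    static Map<Integer, List<Object[]>> indexByA(List<Object[]> rows) {
        Map<Integer, List<Object[]>> index = new HashMap<>();
        for (Object[] row : rows) {
            Table1Row t1 = (Table1Row) row[0]; // assumes table1's record is always at position 0
            index.computeIfAbsent(t1.a(), k -> new ArrayList<>()).add(row);
        }
        return index;
    }
}
```

Building the index only pays off if you search more than once; for a single lookup, a linear scan costs the same.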
In a Hazelcast project, I want to use the map or multimap bundled with its library. The issue is that I want to map objects of a custom class (say Computer, which contains two String and two integer fields) as values, with Strings such as "Laptop" and "Desktop" as keys.
The problem is that I can't figure out how to use a list or array of objects as a map value.
Alternatively: if I store the objects individually rather than as a list or array, how should I write the code to read all of them back using the key as input? Any help is appreciated :)
My dataset looks like this:
Task-1, Priority1, (SkillA, SkillB)
Task-2, Priority2, (SkillA)
Task-3, Priority3, (SkillB, SkillC)
Calling application (client) will send in a list of skills - say (SkillD, SkillA).
Lookup:
Search the dataset for SkillD first, and find nothing.
Search for SkillA. We find two entries: Task-1 with Priority1 and Task-2 with Priority2.
Identify the task with the highest priority (in this case, Task-1).
Remove Task-1 from the dataset and return Task-1 to the client.
Design considerations:
There will be a lot of adds/updates/deletes to the dataset when the website goes live.
There are only a few skills (about 10), though the list is not static, but each skill can have thousands of tasks. So lookup/retrieval has to be extremely fast.
I have considered a simple List with binarySearch(comparator), or a Map<Skill, SortedSet<Task>>, but I am looking for more ideas.
What is the best way to design a data structure for this kind of dataset, one that supports a complex key and a sorted collection of tasks associated with that key?
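For reference, the Map-of-sorted-sets option from the question can be sketched in plain Java. This assumes a lower priority number means higher priority (Priority1 beats Priority2) and reads the lookup as "take from the first requested skill that has any task"; Task and TaskIndex are hypothetical names, and Task is a Java 16+ record.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical task type: a lower priority number is more urgent.
record Task(String id, int priority, Set<String> skills) {}

class TaskIndex {
    // Sort by priority, breaking ties by id so distinct tasks never compare as equal.
    private static final Comparator<Task> ORDER =
            Comparator.comparingInt(Task::priority).thenComparing(Task::id);

    private final Map<String, TreeSet<Task>> bySkill = new HashMap<>();

    // O(k log n) for a task with k skills; an update is remove followed by add.
    void add(Task task) {
        for (String skill : task.skills()) {
            bySkill.computeIfAbsent(skill, s -> new TreeSet<>(ORDER)).add(task);
        }
    }

    // Removes the task from every skill set it is listed under.
    void remove(Task task) {
        for (String skill : task.skills()) {
            TreeSet<Task> tasks = bySkill.get(skill);
            if (tasks != null) {
                tasks.remove(task);
            }
        }
    }

    // Tries the requested skills in order; the first non-empty set yields its most urgent task,
    // which is then removed from the whole dataset.
    Optional<Task> take(List<String> requestedSkills) {
        for (String skill : requestedSkills) {
            TreeSet<Task> tasks = bySkill.get(skill);
            if (tasks != null && !tasks.isEmpty()) {
                Task best = tasks.first();
                remove(best);
                return Optional.of(best);
            }
        }
        return Optional.empty();
    }
}
```

With about 10 skills, the HashMap lookup is trivial and the TreeSet keeps each skill's tasks ordered, so taking the top task is O(log n) plus the removal from its other skills.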
How about changing the approach a bit?
You can use Guava, and its Multimap in particular.
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.
There are two ways to think of a Multimap conceptually: as a collection of mappings from single keys to single values, or as a mapping from unique keys to collections of values.
I would suggest using a Multimap; the answer to your problem lies in a powerful feature of Multimap called views.
Good luck!
I would consider MongoDB. The data object for one of your rows sounds like a better fit for a JSON document than for a row in a table, because the skill list may grow. In a classic relational DB you solve this in one of three ways: ever-expanding columns so that you always have the maximum number of skill columns (very ugly), a separate table that groups skill sets under an ID, or the skills stored as a comma-delimited list. Each of these sucks. In MongoDB you can have array fields, and the items in the array are indexable.
So, with this in mind, I would do all the querying in MongoDB and let it handle everything. I would create a POJO that looks like this:
public class TaskPriority {
String taskId;
String priorityId;
List<String> skillIds;
}
In MongoDB you can index all these fields to get fast searching and querying.
If you have to cache these items locally and run these queries against Java data structures, you can create an index for each field you care about, holding references to the TaskPriority objects.
For example, to find the TaskPriority objects that need a given skill, you can use the following Map (a set per skill, since many tasks can share one skill):
Map<String, Set<TaskPriority>> skillIdToTaskPriorities;
You can repeat this for taskId and priorityId. You would have to maintain these indexes yourself; this is usually your database's job.
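A minimal sketch of maintaining such indexes by hand. TaskPriority mirrors the POJO above with a constructor added; TaskPriorityIndex and its method names are hypothetical. Every map stores references to the same objects, and add/remove are the only entry points, so the indexes cannot drift apart.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Mirrors the TaskPriority POJO; fields are final, so an update means remove + add.
class TaskPriority {
    final String taskId;
    final String priorityId;
    final List<String> skillIds;

    TaskPriority(String taskId, String priorityId, List<String> skillIds) {
        this.taskId = taskId;
        this.priorityId = priorityId;
        this.skillIds = List.copyOf(skillIds);
    }
}

// No equals/hashCode override on TaskPriority, so the sets match by identity:
// remove() must receive the same instance that was added.
class TaskPriorityIndex {
    private final Map<String, TaskPriority> byTaskId = new HashMap<>();
    private final Map<String, Set<TaskPriority>> byPriorityId = new HashMap<>();
    private final Map<String, Set<TaskPriority>> bySkillId = new HashMap<>();

    void add(TaskPriority tp) {
        byTaskId.put(tp.taskId, tp); // assumes taskId is unique
        byPriorityId.computeIfAbsent(tp.priorityId, k -> new HashSet<>()).add(tp);
        for (String skill : tp.skillIds) {
            bySkillId.computeIfAbsent(skill, k -> new HashSet<>()).add(tp);
        }
    }

    void remove(TaskPriority tp) {
        byTaskId.remove(tp.taskId);
        removeFrom(byPriorityId, tp.priorityId, tp);
        for (String skill : tp.skillIds) {
            removeFrom(bySkillId, skill, tp);
        }
    }

    TaskPriority byTask(String taskId) { return byTaskId.get(taskId); }
    Set<TaskPriority> withPriority(String priorityId) { return view(byPriorityId, priorityId); }
    Set<TaskPriority> withSkill(String skillId) { return view(bySkillId, skillId); }

    // Read-only view so callers cannot modify an index behind add/remove.
    private static Set<TaskPriority> view(Map<String, Set<TaskPriority>> index, String key) {
        return Collections.unmodifiableSet(index.getOrDefault(key, Collections.emptySet()));
    }

    // Drops the key entirely once its set is empty, so no empty sets accumulate.
    private static void removeFrom(Map<String, Set<TaskPriority>> index, String key, TaskPriority tp) {
        Set<TaskPriority> set = index.get(key);
        if (set != null && set.remove(tp) && set.isEmpty()) {
            index.remove(key);
        }
    }
}
```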
Finally, you can have POJOs and tables (or MongoDB collections) that map the taskId to a Task object containing any metadata about that task that you may want. The same is true for Priority and SkillSet. So that's four MongoDB collections: Tasks, Priorities, SkillSets, and TaskPriorities.
I have an object like this:
class MyObj{
public long id_1;
...
}
My HBM (Hibernate mapping file) says that id_1 is my ID. What I want to do is cache this entity in a HashMap<MyObj, NestObj>, i.e., MyObj will become the key of the HashMap.
Here is my question:
Even though I use the whole object as the key, I want storing and retrieving in the HashMap to be based on the MyObj.id_1 value. The easiest way would be to loop over all retrieved objects and add them to a Map<Integer, MyObj>, but then I would have to maintain two maps (one for MyObj and the other for NestedObj), which I want to avoid.
How can I make my HashMap use MyObj.id_1 for hashing and equality? Should I override hashCode and equals? And if I do, would it affect Hibernate's comparisons when storing/retrieving entities?
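Overriding equals and hashCode is the standard way to make a HashMap compare keys by a field. A minimal sketch follows; the label field and constructor are hypothetical additions, and a String stands in for the NestObj value. How this interacts with Hibernate is not settled in this thread; one caveat worth checking in the Hibernate docs is that a generated id is not assigned before the entity is saved, so with a primitive long every unsaved MyObj would have id 0 and compare equal.

```java
import java.util.HashMap;
import java.util.Map;

class MyObj {
    public long id_1;
    public String label; // hypothetical extra field, deliberately ignored by equals/hashCode

    MyObj(long id_1, String label) {
        this.id_1 = id_1;
        this.label = label;
    }

    // Two MyObj instances are the same key whenever their id_1 values match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyObj)) return false;
        return id_1 == ((MyObj) o).id_1;
    }

    // Must agree with equals: equal ids produce equal hash codes.
    @Override
    public int hashCode() {
        return Long.hashCode(id_1);
    }
}
```

A key's id_1 must not change while it sits in the map; otherwise the entry lands in the wrong bucket and becomes unreachable.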
I need to implement an n:m relation in Java.
The use case is a catalog.
a product can be in multiple categories
a category can hold multiple products
My current solution is to have a mapping class that has two hashmaps.
The key of the first hashmap is the product id and the value is a list of category ids
The key to the second hashmap is the category id and the value is a list of product ids
This is totally redundant, and I need a class that always takes care of storing/deleting the data in both hashmaps.
But this is the only way I found to answer both of the following in O(1):
Which products does a category hold?
Which categories is a product in?
I want to avoid full array scans or anything like that at all costs.
But there must be another, more elegant solution where I don't need to index the data twice.
Please enlighten me. I only have plain Java: no database, SQLite, or anything similar. I also don't really want to implement a B-tree structure if possible.
If you associate Categories with Products via a member collection, and vice versa, then you can accomplish the same thing:
public class Product {
private Set<Category> categories = new HashSet<Category>();
//implement hashCode and equals, potentially by id for extra performance
}
public class Category {
private Set<Product> contents = new HashSet<Product>();
//implement hashCode and equals, potentially by id for extra performance
}
The only difficult part is populating such a structure, where some intermediate maps might be needed.
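One way to keep that object graph consistent while populating it is to create and remove links only through a single helper that touches both sides. A minimal sketch, assuming ids are longs; Catalog, link and unlink are hypothetical names, and equals/hashCode are implemented by id as the comments above suggest.

```java
import java.util.HashSet;
import java.util.Set;

class Product {
    final long id;
    final Set<Category> categories = new HashSet<>();

    Product(long id) { this.id = id; }

    // Equality by id only, so hashing never walks the linked collections.
    @Override public boolean equals(Object o) { return o instanceof Product && ((Product) o).id == id; }
    @Override public int hashCode() { return Long.hashCode(id); }
}

class Category {
    final long id;
    final Set<Product> contents = new HashSet<>();

    Category(long id) { this.id = id; }

    @Override public boolean equals(Object o) { return o instanceof Category && ((Category) o).id == id; }
    @Override public int hashCode() { return Long.hashCode(id); }
}

// The only place links change, so both directions always agree.
final class Catalog {
    private Catalog() {}

    static void link(Product p, Category c) {
        p.categories.add(c);
        c.contents.add(p);
    }

    static void unlink(Product p, Category c) {
        p.categories.remove(c);
        c.contents.remove(p);
    }
}
```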
But the approach of using auxiliary hashmaps/trees for indexing is not a bad one. After all, most indices placed on databases for example are auxiliary data structures: they coexist with the table of rows; the rows aren't necessarily organized in the structure of the index itself.
Using an external structure like this lets you keep optimizations and data separate from each other, and that's not a bad thing, especially if tomorrow you want to add O(1) lookups of Products by Vendor, for example.
Edit: By the way, it looks like what you want is a Multimap implementation that also supports reverse lookups in O(1). I don't think Guava has one, but you could implement the Multimap interface so that at least you don't have to maintain the HashMaps separately. Actually, it's more like a BiMap that is also a Multimap, which is contradictory given their definitions. I agree with MStodd that you probably want to roll your own layer of abstraction to encapsulate the two maps.
Your solution is perfectly good. Remember that putting an object into a HashMap doesn't make a copy of the Object, it just stores a reference to it, so the cost in time and memory is quite small.
I would go with your first solution. Have a layer of abstraction around two hashmaps. If you're worried about concurrency, implement appropriate locking for CRUD.
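A minimal sketch of that abstraction layer: one class owns both HashMaps and exposes only link/unlink and the two lookups, so callers cannot update one direction without the other. The class and method names are hypothetical, ids are assumed to be longs, and plain `synchronized` methods are just one simple choice of locking.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProductCategoryIndex {
    private final Map<Long, Set<Long>> categoriesByProduct = new HashMap<>();
    private final Map<Long, Set<Long>> productsByCategory = new HashMap<>();

    // Every write updates both directions under the same lock.
    synchronized void link(long productId, long categoryId) {
        categoriesByProduct.computeIfAbsent(productId, k -> new HashSet<>()).add(categoryId);
        productsByCategory.computeIfAbsent(categoryId, k -> new HashSet<>()).add(productId);
    }

    synchronized void unlink(long productId, long categoryId) {
        removeFrom(categoriesByProduct, productId, categoryId);
        removeFrom(productsByCategory, categoryId, productId);
    }

    // The map lookup is O(1); copying the result costs O(size of the result)
    // but keeps callers from mutating the internal sets.
    synchronized Set<Long> categoriesOf(long productId) {
        return Set.copyOf(categoriesByProduct.getOrDefault(productId, Set.of()));
    }

    synchronized Set<Long> productsIn(long categoryId) {
        return Set.copyOf(productsByCategory.getOrDefault(categoryId, Set.of()));
    }

    // Removes one value and drops the key once its set is empty.
    private static void removeFrom(Map<Long, Set<Long>> map, long key, long value) {
        Set<Long> values = map.get(key);
        if (values != null && values.remove(value) && values.isEmpty()) {
            map.remove(key);
        }
    }
}
```

Because a HashMap stores references rather than copies, the second map costs little beyond the ids themselves.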
If you're able to use an immutable data structure, Guava's ImmutableMultimap offers an inverse() method, which enables you to get a collection of keys by value.