Hashmap vs Array performance - java

Is it (performance-wise) better to use Arrays or HashMaps when the indexes of the Array are known? Keep in mind that the 'objects array/map' in the example is just an example, in my real project it is generated by another class so I cant use individual variables.
ArrayExample:
SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");
void doSomethingToObject(String Identifier){
SomeObject object;
if(Identifier.equals("Obj1")){
object=objects[0];
}else if(){
object=objects[1];
}
//do stuff
}
HashMapExample:
HashMap objects = HashMap();
objects.put("Obj1",new SomeObject());
objects.put("Obj2",new SomeObject());
void doSomethingToObject(String Identifier){
SomeObject object = (SomeObject) objects.get(Identifier);
//do stuff
}
The HashMap one looks much much better but I really need performance on this so that has priority.
EDIT: Well Array's it is then, suggestions are still welcome
EDIT: I forgot to mention, the size of the Array/HashMap is always the same (6)
EDIT: It appears that HashMaps are faster
Array: 128ms
Hash: 103ms
When using less cycles the HashMaps was even twice as fast
test code:
import java.util.HashMap;
import java.util.Random;
public class Optimizationsest {
private static Random r = new Random();
private static HashMap<String,SomeObject> hm = new HashMap<String,SomeObject>();
private static SomeObject[] o = new SomeObject[6];
private static String[] Indentifiers = {"Obj1","Obj2","Obj3","Obj4","Obj5","Obj6"};
private static int t = 1000000;
public static void main(String[] args){
CreateHash();
CreateArray();
long loopTime = ProcessArray();
long hashTime = ProcessHash();
System.out.println("Array: " + loopTime + "ms");
System.out.println("Hash: " + hashTime + "ms");
}
public static void CreateHash(){
for(int i=0; i <= 5; i++){
hm.put("Obj"+(i+1), new SomeObject());
}
}
public static void CreateArray(){
for(int i=0; i <= 5; i++){
o[i]=new SomeObject();
}
}
public static long ProcessArray(){
StopWatch sw = new StopWatch();
sw.start();
for(int i = 1;i<=t;i++){
checkArray(Indentifiers[r.nextInt(6)]);
}
sw.stop();
return sw.getElapsedTime();
}
private static void checkArray(String Identifier) {
SomeObject object;
if(Identifier.equals("Obj1")){
object=o[0];
}else if(Identifier.equals("Obj2")){
object=o[1];
}else if(Identifier.equals("Obj3")){
object=o[2];
}else if(Identifier.equals("Obj4")){
object=o[3];
}else if(Identifier.equals("Obj5")){
object=o[4];
}else if(Identifier.equals("Obj6")){
object=o[5];
}else{
object = new SomeObject();
}
object.kill();
}
public static long ProcessHash(){
StopWatch sw = new StopWatch();
sw.start();
for(int i = 1;i<=t;i++){
checkHash(Indentifiers[r.nextInt(6)]);
}
sw.stop();
return sw.getElapsedTime();
}
private static void checkHash(String Identifier) {
SomeObject object = (SomeObject) hm.get(Identifier);
object.kill();
}
}

HashMap uses an array underneath so it can never be faster than using an array correctly.
Random.nextInt() is many times slower than what you are testing, even using array to test an array is going to bias your results.
The reason your array benchmark is so slow is due to the equals comparisons, not the array access itself.
HashTable is usually much slower than HashMap because it does much the same thing but is also synchronized.
A common problem with micro-benchmarks is the JIT which is very good at removing code which doesn't do anything. If you are not careful you will only be testing whether you have confused the JIT enough that it cannot workout your code doesn't do anything.
This is one of the reason you can write micro-benchmarks which out perform C++ systems. This is because Java is a simpler language and easier to reason about and thus detect code which does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)

arrays when the indexes are know are faster (HashMap uses an array of linked lists behind the scenes which adds a bit of overhead above the array accesses not to mention the hashing operations that need to be done)
and FYI HashMap<String,SomeObject> objects = HashMap<String,SomeObject>(); makes it so you won't have to cast

For the example shown, HashTable wins, I believe. The problem with the array approach is that it doesn't scale. I imagine you want to have more than two entries in the table, and the condition branch tree in doSomethingToObject will quickly get unwieldly and slow.

Logically, HashMap is definitely a fit in your case. From performance standpoint is also wins since in case of arrays you will need to do number of string comparisons (in your algorithm) while in HashMap you just use a hash code if load factor is not too high. Both array and HashMap will need to be resized if you add many elements, but in case of HashMap you will need to also redistribute elements. In this use case HashMap loses.

Arrays will usually be faster than Collections classes.
PS. You mentioned HashTable in your post. HashTable has even worse performance thatn HashMap. I assume your mention of HashTable was a typo
"The HashTable one looks much much
better "

The example is strange. The key problem is whether your data is dynamic. If it is, you could not write you program that way (as in the array case). In order words, comparing between your array and hash implementation is not fair. The hash implementation works for dynamic data, but the array implementation does not.
If you only have static data (6 fixed objects), array or hash just work as data holder. You could even define static objects.

Related

ArrayLists better practice

I have written 2 methods in Java. Second method looks cleaner to me because I come from python background, but I think it will be slower than first because indexOf() also does the iteration? Is there a way to use for in loop correctly in situation like this? Also, if there is better way to do it (without Streams), how can it be done?
private ArrayList<MyObject> myObjects;
First method:
private int findObject(String objectName) {
for(int i=0; i<this.myObjects.size(); i++) {
MyObject myObject = this.myObjects.get(i);
if(myObject.getName().equals(objectName)) return i;
}
return -1;
}
Second method:
private int findObject(String objectName) {
for(MyObject myObject: this.myObjects) {
if(myObject.getName().equals(objectName)) return this.myObjects.indexOf(myObject);
}
return -1;
}
I think it will be slower than first because indexOf() also does the iteration?
You are correct.
Is there a way to use for each loop correctly in situation like this?
You can use a for each AND an index variable.
private int findObject(String objectName) {
int i = 0;
for (MyObject myObject: this.myObjects) {
if (myObject.getName().equals(objectName)) return i;
i++;
}
return -1;
}
This would be a good solution if myObjects.get(i) is an expensive operation (e.g. on a LinkedList where get(n) is O(N)) or if it is not implementable (e.g. if you were iterating a Stream).
You could also use a ListIterator provided that myObjects has a method that returns a ListIterator; see #Andy Turner's answer for an example. (It won't work for a typical Set or Map class.)
The first version is perfect if you know you're working with an ArrayList (or some other array-based List, e.g. Vector).
If myObject happens to be a LinkedList or similar, your performance will degrade with longer lists, as then get(i) no longer executes in constant time.
Your second approach will handle LinkedLists as well as ArrayLists, but it iterates twice over your list, once in your for loop, and once in the indexOf() call.
I'd recommend a third version: use the for loop from the second approach, and add an integer counting variable, incrementing inside the loop. This way, you get the best of both: iterating without performance degradation, and cheap position-counting.
The better way of doing this (that avoids you having to maintain a separate index variable; and works for non-RandomAccess lists too) would be to use a ListIterator:
for (ListIterator<MyObject> it = myObjects.listIterator(); it.hasNext();) {
MyObject myObject = it.next();
if(myObject.getName().equals(objectName)) return it.prevIndex();
}
return -1;

Creating a dictionary: Method to prevent the same word from being added more than once

I need to create a method to determine whether or not the word I'm trying to add to my String[] dictionary has already been added. We were not allowed to use ArrayList for this project, only arrays.
I started out with this
public static boolean dictHasWord(String str){
for(int i = 0; i < dictionary.length; i++){
if(str.equals(dictionary[i])){
return true;
}
}
return false;
}
However, my professor told me not to use this, because it is a linear function O(n), and is not effective. What other way could I go about solving this method?
This is a example of how to quickly search through a Array with good readability. I would suggest using this method to search your array.
import java.util.*;
public class test {
public static void main(String[] args) {
String[] list = {"name", "ryan"
};
//returns boolean here
System.out.println(Arrays.asList(list).contains("ryan"));
}
}
If you are allowed to use the Arrays class as part of your assignment, you can sort your array and use a binary search instead, which is not O(n).
public static boolean dictHasWord(String str){
if(Arrays.binarySearch(dictionary, str) != -1){
return true;
}
return false;
}
Just keep in mind you must sort first.
EDIT:
Regarding writing your own implementation, here's a sample to get you going. Here are the javadocs for compareTo() as well. Heres another sample (int based example) showing the difference between recursive and non recursive, specifically in Java.
Although it maybe an overkill in this case, but a hash-table would not be O(n).
This uses the fact that every String can be turnt into an int via hashCode(), and equal strings will produce the same hash.
Our dictionary can be declared as:
LinkedList<String>[] dictionary;
In other words in each place several strings may reside, this is due to possible collisions (different strings producing the same result).
The simplest solution for addition would be:
public void add(String str)
{
dictionary[str.hashCode()].add(str);
}
But in order to do this, you would need to make an array size equal to 1 less the maximum of hashCode() function. Which is probably too much memory for you. So we can do a little differently:
public void add(String str)
{
dictionary[str.hashCode()%dictionary.length].add(str);
}
This way we always mod the hash. For best results you should make your dictionary size some prime number, or at least a power of a single prime.
Then when you want to test the existence of the string you do exactly what you had in the original, but you use the specific LinkedList that you get from the hash:
public static boolean dictHasWord(String str)
{
for(String existing : dictionary[str.hashCode()%dictionary.length])
{
if(str.equals(existing)){
return true;
}
}
return false;
}
At which point you may ask "Isn't it O(n)?". And the answer is that it is not, since the hash function did not take into consideration the number of elements in array. The more memory you will give to your array, less collisions you will have, and more this approach moves towards O(1).
If somebody finds this answer searching for a real solution (not homework assignment). Then just use HashMap.

String IdentityHashMap vs HashMap performance

Identity HashMap is special implementation in Java which compares the objects reference instead of equals() and also uses identityHashCode() instead of hashCode(). In addition, it uses linear-probe hash table instead of Entry list.
Map<String, String> map = new HashMap<>();
Map<String, String> iMap = new IdentityHashMap<>();
Does that mean for the String keys IdentifyHashMap will be usually faster if tune correctly ?
See this example:
public class Dictionary {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new FileReader("/usr/share/dict/words"));
String line;
ArrayList<String> list = new ArrayList<String>();
while ((line = br.readLine()) != null) {
list.add(line);
}
System.out.println("list.size() = " + list.size());
Map<String, Integer> iMap = new IdentityHashMap<>(list.size());
Map<String, Integer> hashMap = new HashMap<>(list.size());
long iMapTime = 0, hashMapTime = 0;
long time;
for (int i = 0; i < list.size(); i++) {
time = System.currentTimeMillis();
iMap.put(list.get(i), i);
time = System.currentTimeMillis() - time;
iMapTime += time;
time = System.currentTimeMillis();
hashMap.put(list.get(i), i);
time = System.currentTimeMillis() - time;
hashMapTime += time;
}
System.out.println("iMapTime = " + iMapTime + " hashMapTime = " + hashMapTime);
}
}
Tried very basic performance check. I am reading dictionary words (235K) & pushing into the both maps. It prints:
list.size() = 235886
iMapTime = 101 hashMapTime = 617
I think this is very good improvment to ignore, unless I am doing something wrong here.
How does IdentityHashMap<String,?> work?
To make IdentityHashMap<String,?> work for arbitrary strings, you'll have to String.intern() both the keys you put() and potential keys you pass to get(). (Or use an equivalent mechanism.)
Note: unlike stated in #m3th0dman's answer, you don't need to intern() the values.
Either way, interning a string ultimately requires looking it up in some kind of hash table of already interned strings. So unless you had to intern your strings for some other reason anyway (and thus already paid the cost), you won't get much of an actual performance boost out of this.
So why does the test show that you can?
Where your test is unrealistic is that you keep the exact list of keys you used with put() and you iterate across them one by one in list order. Note (the same could be achieved by inserting the elements into a LinkedHashMap and simply calling iterator() on its entry set.
What's the point of IdentityHashMap then?
There are scenarios where it is guaranteed (or practically guaranteed) that object identity is the same as equals(). Imagine trying to implement your own ThreadLocal class for example, you'll probably write something like this:
public final class ThreadLocal<T> {
private final IdentityHashMap<Thread,T> valueMap;
...
public T get() {
return valueMap.get( Thread.currentThread() );
}
}
Because you know that threads have no notion of equality beyond identity. Same goes if your map keys are enum values and so on.
You will see significantly faster performance on IdentityHashMap, however that comes at a substantial cost.
You must be absolutely sure that you will never ever have objects added to the map that have the same value but different identities.
That's hard to guarantee both now and for the future, and a lot of people make mistaken assumptions.
For example
String t1 = "test";
String t2 = "test";
t1==t2 will return true.
String t1 = "test";
String t2 = new String("test");
t1==t2 will return false.
Overall my recommendation is that unless you absolutely critically need the performance boost and know exactly what you are doing and heavily lock down and comment access to the class then by using IdentityHashMap you are opening yourself up to massive risks of very hard to track down bugs in the future.
Technically you can do something like this to make sure you have the same instance of the string representation:
public class StringIdentityHashMap extends IdentityHashMap<String, String>
{
#Override
public String put(String key, String value)
{
return super.put(key.intern(), value.intern());
}
#Override
public void putAll(Map<? extends String, ? extends String> m)
{
m.entrySet().forEach(entry -> put(entry.getKey().intern(), entry.getValue().intern()));
}
#Override
public String get(Object key)
{
if (!(key instanceof String)) {
throw new IllegalArgumentException();
}
return super.get(((String) key).intern());
}
//implement the rest of the methods in the same way
}
But this won't help you very much since intern() calls equals() to make sure the given String exists or not in the String pool so you end up with the performance of the typical HashMap.
This, however will only help you to improve memory and not CPU. There is no way to achieve better CPU usage and to be sure your program is correct (without possible using some internal knowledge of JVM which might change) because Strings can be in String pool or not and you cannot know if they are in without (not implicitly) calling equals().
Interestingly, IdentityHashMap can be SLOWER. I am using Class objects as keys, and seeing a ~50% performance INCREASE with HashMap over IdentityHashMap.
IdentityHashMap and HashMap are different internally, so if the equals() method of your keys is really fast, HashMap seems better.

TreeSet Comparator

I have a TreeSet and a custom comparator.
I get the values from server according to the changes in the stock
ex: if time=0 then server will send all the entries on the stock (unsorted)
if time=200 then server will send entries added or deleted after the time 200(unsorted)
In client side i am sorting the entries. My question is which is more efficient
1> fetch all entries first and then call addAll method
or
2> add one by one
there can be millions of entries.
/////////updated///////////////////////////////////
private static Map<Integer, KeywordInfo> hashMap = new HashMap<Integer, KeywordInfo>();
private static Set<Integer> sortedSet = new TreeSet<Integer>(comparator);
private static final Comparator<Integer> comparator = new Comparator<Integer>() {
public int compare(Integer o1, Integer o2) {
int integerCompareValue = o1.compareTo(o2);
if (integerCompareValue == 0) return integerCompareValue;
KeywordInfo k1 = hashMap.get(o1);
KeywordInfo k2 = hashMap.get(o2);
if (null == k1.getKeyword()) {
if (null == k2.getKeyword())
return integerCompareValue;
else
return -1;
} else {
if (null == k2.getKeyword())
return 1;
else {
int compareString = AlphaNumericCmp.COMPARATOR.compare(k1.getKeyword().toLowerCase(), k2.getKeyword().toLowerCase());
//int compareString = k1.getKeyword().compareTo(k2.getKeyword());
if (compareString == 0)
return integerCompareValue;
return compareString;
}
}
}
};
now there is an event handler which gives me an ArrayList of updated entries,
after adding them to my hashMap i am calling
final Map<Integer, KeywordInfo> mapToReturn = new SubMap<Integer, KeywordInfo>(sortedSet, hashMap);
I think your bottleneck can be probably more network-related than CPU related. A bulk operation fetching all the new entries at once would be more network efficient.
With regards to your CPU, the time required to populate a TreeSet does not change consistently between multiple add()s and addAll(). The reason behind is that TreeSet relies on AbstractCollection's addAll() (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/util/AbstractCollection.java#AbstractCollection.addAll%28java.util.Collection%29) which in turn creates an iterator and calls multiple times add().
So, my advice on the CPU side is: choose the way that keeps your code cleaner and more readable. This is probably obtained through addAll().
In general it is less memory overhead when on being loaded alread data is stored. This should be time efficient too, maybe using small buffers. Memory allocation costs time too.
However time both solutions, in a separate prototype. You really have to test with huge numbers, as network traffic costs much too. That is a bit Test Driven Development, and adds to QA both quantitative statistics, as correctness of implementation.
The actual implementation is a linked list, so add one by one will be faster if you do it right. And i think in the near future this behaviour wont be change.
For your problem a Statefull comparator may help.
// snipplet, must not work fine
public class NaturalComparator implements Comparator{
private boolean anarchy = false;
private Comparator parentComparator;
NaturalComparator(Comparator parent){
this.parentComparator = parent;
}
public void setAnarchy(){...}
public int compare(A a, A b){
if(anarchy) return 1
else return parentCoparator.compare(a,b);
}
}
...
Set<Integer> sortedSet = new TreeSet<Integer>(new NaturalComparator(comparator));
comparator.setAnarchy(true);
sortedSet.addAll(sorted);
comparator.setAnarchy(false);

Optimized implementations of java.util.Map and java.util.Set?

I am writing an application where memory, and to a lesser extent speed, are vital. I have found from profiling that I spend a great deal of time in Map and Set operations. While I look at ways to call these methods less, I am wondering whether anyone out there has written, or come across, implementations that significantly improve on access time or memory overhead? or at least, that can improve these things given some assumptions?
From looking at the JDK source I can't believe that it can't be made faster or leaner.
I am aware of Commons Collections, but I don't believe it has any implementation whose goal is to be faster or leaner. Same for Google Collections.
Update: Should have noted that I do not need thread safety.
Normally these methods are pretty quick.
There are a couple of things you should check: are your hash codes implemented? Are they sufficiently uniform? Otherwise you'll get rubbish performance.
http://trove4j.sourceforge.net/ <-- this is a bit quicker and saves some memory. I saved a few ms on 50,000 updates
Are you sure that you're using maps/sets correctly? i.e. not trying to iterate over all the values or something similar. Also, e.g. don't do a contains and then a remove. Just check the remove.
Also check if you're using Double vs double. I noticed a few ms performance improvements on ten's of thousands of checks.
Have you also set up the initial capacity correctly/appropriately?
Have you looked at Trove4J ? From the website:
Trove aims to provide fast, lightweight implementations of the java.util.Collections API.
Benchmarks provided here.
Here are the ones I know, in addition to Google and Commons Collections:
http://trove4j.sourceforge.net/
http://javolution.org/
http://fastutil.dsi.unimi.it/
Of course you can always implement your own data structures which are optimized for your use cases. To be able to help better, we would need to know you access patterns and what kind of data you store in the collections.
Try improving the performance of your equals and hashCode methods, this could help speed up the standard containers use of your objects.
You can extend AbstractMap and/or AbstractSet as a starting point. I did this not too long ago to implement a binary trie based map (the key was an integer, and each "level" on the tree was a bit position. left child was 0 and right child was 1). This worked out well for us because the key was EUI-64 identifiers, and for us most of the time the top 5 bytes were going to be the same.
To implement an AbstractMap, you need to at the very least implement the entrySet() method, to return a set of Map.Entry, each of which is a key/value pair.
To implement a set, you extend AbstractSet and supply implementations of size() and iterator().
That's at the very least, however. You will want to also implement get and put, since the default map is unmodifiable, and the default implementation of get iterates through the entrySet looking for a match.
You can possibly save a little on memory by:
(a) using a stronger, wider hash code, and thus avoiding having to store the keys;
(b) by allocating yourself from an array, avoiding creating a separate object per hash table entry.
In case it's useful, here's a no-frills Java implementation of the Numerical Recipies hash table that I've sometimes found useful. You can key directly on a CharSequence (including Strings), or else you must yourself come up with a strong-ish 64-bit hash function for your objects.
Remember, this implementation doesn't store the keys, so if two items have the same hash code (which you'd expect after hashing in the order of 2^32 or a couple of billion items if you have a good hash function), then one item will overwrite the other:
public class CompactMap<E> implements Serializable {
static final long serialVersionUID = 1L;
private static final int MAX_HASH_TABLE_SIZE = 1 << 24;
private static final int MAX_HASH_TABLE_SIZE_WITH_FILL_FACTOR = 1 << 20;
private static final long[] byteTable;
private static final long HSTART = 0xBB40E64DA205B064L;
private static final long HMULT = 7664345821815920749L;
static {
byteTable = new long[256];
long h = 0x544B2FBACAAF1684L;
for (int i = 0; i < 256; i++) {
for (int j = 0; j < 31; j++) {
h = (h >>> 7) ^ h;
h = (h << 11) ^ h;
h = (h >>> 10) ^ h;
}
byteTable[i] = h;
}
}
private int maxValues;
private int[] table;
private int[] nextPtrs;
private long[] hashValues;
private E[] elements;
private int nextHashValuePos;
private int hashMask;
private int size;
#SuppressWarnings("unchecked")
public CompactMap(int maxElements) {
int sz = 128;
int desiredTableSize = maxElements;
if (desiredTableSize < MAX_HASH_TABLE_SIZE_WITH_FILL_FACTOR) {
desiredTableSize = desiredTableSize * 4 / 3;
}
desiredTableSize = Math.min(desiredTableSize, MAX_HASH_TABLE_SIZE);
while (sz < desiredTableSize) {
sz <<= 1;
}
this.maxValues = maxElements;
this.table = new int[sz];
this.nextPtrs = new int[maxValues];
this.hashValues = new long[maxValues];
this.elements = (E[]) new Object[sz];
Arrays.fill(table, -1);
this.hashMask = sz-1;
}
public int size() {
return size;
}
public E put(CharSequence key, E val) {
return put(hash(key), val);
}
public E put(long hash, E val) {
int hc = (int) hash & hashMask;
int[] table = this.table;
int k = table[hc];
if (k != -1) {
int lastk;
do {
if (hashValues[k] == hash) {
E old = elements[k];
elements[k] = val;
return old;
}
lastk = k;
k = nextPtrs[k];
} while (k != -1);
k = nextHashValuePos++;
nextPtrs[lastk] = k;
} else {
k = nextHashValuePos++;
table[hc] = k;
}
if (k >= maxValues) {
throw new IllegalStateException("Hash table full (size " + size + ", k " + k);
}
hashValues[k] = hash;
nextPtrs[k] = -1;
elements[k] = val;
size++;
return null;
}
public E get(long hash) {
int hc = (int) hash & hashMask;
int[] table = this.table;
int k = table[hc];
if (k != -1) {
do {
if (hashValues[k] == hash) {
return elements[k];
}
k = nextPtrs[k];
} while (k != -1);
}
return null;
}
public E get(CharSequence hash) {
return get(hash(hash));
}
public static long hash(CharSequence cs) {
if (cs == null) return 1L;
long h = HSTART;
final long hmult = HMULT;
final long[] ht = byteTable;
for (int i = cs.length()-1; i >= 0; i--) {
char ch = cs.charAt(i);
h = (h * hmult) ^ ht[ch & 0xff];
h = (h * hmult) ^ ht[(ch >>> 8) & 0xff];
}
return h;
}
}
Check out GNU Trove:
http://trove4j.sourceforge.net/index.html
There is at least one implementation in commons-collections that is specifically built for speed: Flat3Map it's pretty specific in that it'll be really quick as long as there are no more than 3 elements.
I suspect that you may get more milage through following #thaggie's advice add look at the equals/hashcode method times.
You said you profiled some classes but have you done any timings to check their speed? I'm not sure how you'd check their memory usage. It seems like it would be nice to have some specific figures at hand when you're comparing different implementations.
There are some notes here and links to several alternative data-structure libraries: http://www.leepoint.net/notes-java/data/collections/ds-alternatives.html
I'll also throw in a strong vote for fastutil. (mentioned in another response, and on that page) It has more different data structures than you can shake a stick at, and versions optimized for primitive types as keys or values. (A drawback is that the jar file is huge, but you can presumably trim it to just what you need)
I went through something like this a couple of years ago -- very large Maps and Sets as well as very many of them. The default Java implementations consumed way too much space. In the end I rolled my own, but only after I examined the actual usage patterns that my code required. For example, I had a known large set of objects that were created early on and some Maps were sparse while others were dense. Other structures grew monotonically (no deletes) while in other places it was faster to use a "collection" and do the occasional but harmless extra work of processing duplicate items than it was to spend the time and space on avoiding duplicates. Many of the implementations I used were array-backed and exploited the fact that my hashcodes were sequentially allocated and thus for dense maps a lookup was just an array access.
Take away messages:
look at your algorithm,
consider multiple implementations, and
remember that most of the libraries out there are catering for general purpose use (eg insert and delete, a range of sizes, neither sparse nor dense, etc) so they're going to have overheads that you can probably avoid.
Oh, and write unit tests...
At times when I have see Map and Set operations are using a high percentage of CPU, it has indicated that I have over used Map and Set and restructuring my data has almost eliminated collections from the top 10% CPU consumer.
See if you can avoid copies of collections, iterating over collections and any other operation which results in accessing most of the elements of the collection and creating objects.
It's probably not so much the Map or Set which causing the problem, but the objects behind them. Depending upon your problem, you might want a more database-type scheme where "objects" are stored as a bunch of bytes rather than Java Objects. You could embed a database (such as Apache Derby) or do your own specialist thing. It's very dependent upon what you are actually doing. HashMap isn't deliberately big and slow...
Commons Collections has FastArrayList, FastHashMap and FastTreeMap but I don't know what they're worth...
Commons Collections has an id map which compares through ==, which should be faster.
-[Joda Primities][1] as has primitive collections, as does Trove. I experimented with Trove and found that its memory useage is better.
I was mapping collections of many small objects with a few Integers. altering these to ints saved nearly half the memory (although requiring some messier application code to compensate).
It seems reasonable to me that sorted trees should consume less memory than hashmaps because they don't require the load factor (although if anyone can confirm or has a reason why this is actually dumb please post in the comments).
Which version of the JVM are you using?
If you are not on 6 (although I suspect you are) then a switch to 6 may help.
If this is a server application and is running on windows try using -server to use the correct hotspot implementation.
I use the following package (koloboke) to do a int-int hashmap, because it supports promitive type and it stores two int in a long variable, this is cool for me. koloboke

Categories

Resources