Mark an empty space into an array without using null - java

I am extending AbstractMap and I want to implement my own hash-map using two parallel arrays:
K[] keys;
V[] values;
Suppose I want to store null values as well, how could I initialize these two arrays so that I can differentiate between a space in the array where I could place some new key-value pairs and a space where I am storing a null?

Might I suggest not using two arrays, and instead do something along the lines of:
class Node {
K key;
V value;
}
Node[] nodes;
Then a non-entry is an element in nodes that is equal to null.

If the values can be null but the keys cannot be null then having a null key would mean that there is no key.
If the key can also be null you can use a parallel array of booleans to store whether each space is taken or not.
K[] keys;
V[] values;
boolean[] hasValue;
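A minimal sketch of the flag-array idea follows. It assumes a fixed capacity and uses a linear scan in place of real hashing; the class name FlaggedArrayMap and its methods are made up for illustration.

```java
import java.util.Objects;

// Hypothetical fixed-capacity map: a linear scan stands in for real hashing.
class FlaggedArrayMap<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private final boolean[] hasValue; // true where a slot holds an entry

    FlaggedArrayMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
        hasValue = new boolean[capacity];
    }

    // Index of the occupied slot holding key, or -1 if the key is absent.
    private int indexOf(Object key) {
        for (int i = 0; i < keys.length; i++) {
            if (hasValue[i] && Objects.equals(keys[i], key)) {
                return i;
            }
        }
        return -1;
    }

    boolean containsKey(Object key) {
        return indexOf(key) >= 0;
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int i = indexOf(key);
        return i >= 0 ? (V) values[i] : null;
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        int i = indexOf(key);
        if (i >= 0) { // key already present: overwrite and return the old value
            V old = (V) values[i];
            values[i] = value;
            return old;
        }
        for (i = 0; i < keys.length; i++) {
            if (!hasValue[i]) { // first free slot
                keys[i] = key;
                values[i] = value;
                hasValue[i] = true;
                return null;
            }
        }
        throw new IllegalStateException("map is full");
    }
}
```

Because occupancy lives in hasValue, both a null key and a null value can be stored, and containsKey can tell "mapped to null" apart from "not present".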

I'm not quite sure about the details of your question, but you could always have some special object for your "blank".
private static final Object BLANK = new Object();
Then if the item in the array == BLANK, then consider it to be an empty slot.
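As a rough sketch of that idea (SentinelSlots and its methods are hypothetical names, and the key array is declared as Object[] so that it can hold the marker):

```java
import java.util.Arrays;

class SentinelSlots {
    // Unique marker object; no caller can ever pass this instance as a key.
    private static final Object BLANK = new Object();
    private final Object[] keys;

    SentinelSlots(int capacity) {
        keys = new Object[capacity];
        Arrays.fill(keys, BLANK); // every slot starts out empty
    }

    // Identity comparison is intended here: only the marker itself means "empty".
    boolean isEmpty(int slot) {
        return keys[slot] == BLANK;
    }

    void store(int slot, Object key) {
        keys[slot] = key; // key may be null; the slot still counts as occupied
    }
}
```

The array has to be filled with the marker up front, because a freshly allocated Object[] contains only nulls.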

Since there can only be one null key, you can simply have a special reference value (not in the array) that holds the value of the object mapped from this null key (and possibly a boolean indicating if this value has been set). Unfortunately this will probably complicate iteration.
E.g.
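private boolean isNullMapped = false;
private V nullValue = null;
public V put(K key, V value)
{
if (key == null) {
V old = nullValue; // previous value for the null key, if any
nullValue = value;
isNullMapped = true; // distinguishes "null key maps to null" from "no null key"
return old;
}
...
}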
Alternatively, you can wrap all keys in a wrapper object (supposing you still want to use parallel arrays instead of entries), and if the value contained in this wrapper object is null, then it represents the null key.
E.g.
private static class KeyWrapper<K>
{
public K key;
}
Lastly, as a question for consideration, if you are not having entries in your arrays, but instead are directly holding arrays of K and V, then how are you accounting for different keys that happen to share the same hash code? The java.util implementation has arrays of entries that also act as linked lists to account for this possibility (and incidentally, the null key is always mapped to array index 0).

Storing a null value is not a problem in your scenario. So long as keys[n] != null, just return values[n] whether values[n] is null or not.
Remember that you are not being asked to key on n but objects of type K so every access of the Map will require a search through keys to find the key they are looking for.
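However, if you want to allow the storage of a value against a null key, then using a dedicated sentinel such as private static final Object NULL_KEY = new Object() would probably do the trick, as the other suggestions point out. (A String literal like "NULL" is a poor sentinel, because a caller could legitimately use an equal String as a key.)
private static final Object NULL_KEY = new Object();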
K[] keys;
V[] values;
private int find(K key) {
for (int i = 0; i < keys.length; i++) {
if (key.equals(keys[i])) { // compare by equality, not identity; key is never null here
return i;
}
}
return -1;
}
public V put(K key, V value) {
V old = null;
if (key != null) {
int i = find(key);
if (i >= 0) {
old = values[i];
values[i] = value;
} else {
// ...
}
} else {
return put((K) NULL_KEY, value);
}
return old;
}
public V get(K key) {
if (key != null) {
int i = find(key);
if (i >= 0) {
return values[i];
}
return null;
} else {
return (get((K) NULL_KEY));
}
}

Older versions of the java.util.HashMap implementation (Java 5, for example) used a special object to represent the null key.

Related

Java - how to get a key object (or entry) stored in HashMap by key?

I'd like to get the "canonical" key object for each key usable to query a map. See here:
Map<UUID, String> map = new HashMap<>();
UUID a = UUID.fromString("ABC...");
map.put(a, "Tu nejde o zamykání.");
UUID b = UUID.fromString("ABC...");
String string = map.get(b); // This gives that string.
// This is what I am looking for:
UUID againA = map.getEntry(b).key();
boolean thisIsTrue = a == againA;
A HashMap uses equals(), which is the same for multiple unique objects. So I want to get the actual key from the map, which will always be the same, no matter what object was used to query the map.
Is there a way to get the actual key object from the map? I don't see anything in the interface, but perhaps some clever trick I overlooked?
(Iterating all entries or keys doesn't count.)
Is there a way to get the actual key object from the map?
OK, so I am going to make some assumptions about what you mean. After all, you said that your question doesn't need clarification, so the obvious meaning that I can see must be the correct one. Right? :-)
The answer is No. There isn't a way.
Example scenario (not compilable!)
UUID uuid = UUID.fromString("xxxx-yyy-zzz");
UUID uuid2 = UUID.fromString("xxxx-yyy-zzz"); // same string
println(uuid == uuid2); // prints false
println(uuid.equals(uuid2)); // prints true
Map<UUID, String> map = new ...
map.put(uuid, "fred");
println(map.get(uuid)); // prints fred
println(map.get(uuid2)); // prints fred (because uuid.equals(uuid2) is true)
... but the Map API does not provide a way to find the actual key (in the example above it is uuid) in the map, apart from iterating the key or entry sets. And I'm not aware of any existing Map class (standard or 3rd-party) that does provide this [1].
However, you could implement your own Map class with an additional method for returning the actual key object. There is no technical reason why you couldn't, though you would have more code to write, test, maintain, etcetera.
But I would add that I agree with Jim Garrison. If you have a scenario where you have UUID objects (with equality-by-value semantics) and you also want to implement equality by identity semantics, then there is probably something wrong with your application's design. The correct approach would be to change the UUID.fromString(...) implementation to always return the same UUID object for the same input string.
[1] This is not to say that such a map implementation doesn't exist. But if it does, you should be able to find it if you look hard enough. Note that questions asking us to find or recommend a library are off-topic!
There is a (relatively) simple way of doing this. I've done so in my applications from time to time, when needed ... not for the purpose of == testing, but to reduce the number of identical objects being stored when tens of thousands of objects exist and are cross-referenced with each other. This significantly reduced my memory usage and improved performance ... while still using equals() for equality tests.
Just maintain a parallel map for interning the keys.
Map<UUID, UUID> interned_keys = ...
UUID key = ...
if (interned_keys.containsKey(key))
key = interned_keys.get(key);
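A self-contained sketch of such an interning map could look like this. KeyInterner and intern are hypothetical names, and unlike the snippet above it also registers keys it has not seen before.

```java
import java.util.HashMap;
import java.util.Map;

class KeyInterner<K> {
    // Maps every key to the first instance seen that is equal to it.
    private final Map<K, K> canonical = new HashMap<>();

    // Returns the canonical instance for key, registering key itself if it is new.
    K intern(K key) {
        K existing = canonical.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }
}
```

After interning, == on two interned keys agrees with equals(), at the cost of one extra map entry per distinct key.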
Of course, it is far better when the object being stored knows what its own identity is. Then you get the interning basically for free.
class Item {
UUID key;
// ...
}
Map<UUID, Item> map = ...
map.put(item.key, item);
UUID key = ...
key = map.get(key).key; // get interned key
I think there are valid reasons for wanting the actual key. For example, to save memory. Also keep in mind that the actual key may store other objects. For instance, suppose you have a vertex of a graph. The vertex can store the actual data (say a String, for instance), as well as the incident vertices. A vertex's hash value can depend only on the data. So to look up a vertex with some data D, look up a vertex with data D and with no incident vertices. Now if you can return the actual vertex stored in the map, you will be able to get the actual vertices incident to it.
It seems to me that many map implementations could easily provide a getEntry method. For example, the HashMap implementation for get is:
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
One could use the getNode method to return an Entry:
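public Map.Entry<K,V> getEntry(Object key){
Node<K,V> e = getNode(hash(key),key);
if(e == null) return null;
// Map.Entry is an interface, so return a concrete immutable copy of the entry
return new AbstractMap.SimpleImmutableEntry<>(e.key,e.value);
}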
The easiest way is to duplicate the reference to the key in the value using a generic Pair type, like this:
HashMap<UUID,Pair<UUID,String>> myMap = new HashMap<>();
When you put them in the map, you provide the reference to the key to the pair. The cost is one reference per entry.
void add(UUID uuid, String str)
{
myMap.put(uuid,Pair.of(uuid,str));
}
Pair<UUID,String> get(UUID uuid)
{
return myMap.get(uuid);
}
Then getFirst() of the Pair is your key. getSecond() is the value.
Whatever you do, it's going to cost you in either time or space.
Your Pair class will be something like:
public class Pair<A,B>
{
private final A a;
private final B b;
public Pair(A a, B b)
{
this.a = a;
this.b = b;
}
/**
* @return the first argument of the Pair
*/
public A getFirst()
{
return this.a;
}
/**
* @return the second argument of the Pair
*/
public B getSecond()
{
return this.b;
}
/**
* Create a Pair.
*
* @param a The first argument (of type A)
* @param b The second argument (of type B)
*
* @return A Pair of A and B
*/
public static <A,B> Pair<A,B> of(A a, B b)
{
return new Pair<>(a,b);
}
// Don't forget to get your IDE to produce a hashcode()
// and equals() method for you, depending
// on if you allow nulls or not, or DIY.
}
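The same idea can be sketched with only JDK classes, using Map.Entry as the pair. CanonicalKeyMap and canonicalKey are hypothetical names; the sketch keeps the first key instance when an equal key is put again, which mirrors how HashMap itself retains the original key object.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

class CanonicalKeyMap<K, V> {
    // Each value carries a reference to the key instance that was stored first.
    private final Map<K, Map.Entry<K, V>> entries = new HashMap<>();

    V put(K key, V value) {
        Map.Entry<K, V> old = entries.get(key);
        K canonical = old != null ? old.getKey() : key; // keep the first key instance
        entries.put(key, new AbstractMap.SimpleImmutableEntry<>(canonical, value));
        return old != null ? old.getValue() : null;
    }

    V get(Object key) {
        Map.Entry<K, V> e = entries.get(key);
        return e != null ? e.getValue() : null;
    }

    // The stored key instance that equals(key), or null if there is none.
    K canonicalKey(Object key) {
        Map.Entry<K, V> e = entries.get(key);
        return e != null ? e.getKey() : null;
    }
}
```

The cost is one extra small object per entry, in line with the "time or space" trade-off mentioned above.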
This might help. You can use a for-each loop like the one below.
Map<String,Object> map = new HashMap<>();
map.put("hello1", new String("Hello"));
map.put("hello2", new String("World"));
map.put("hello3", new String("How"));
map.put("hello4", new String("Are u"));
for(Map.Entry<String,Object> e: map.entrySet()){
System.out.println(e.getKey());
}

TreeMap java implementation - putting 1st element

public V put(K key, V value) {
Entry<K,V> t = root;
if (t == null) {
compare(key, key); // type (and possibly null) check
root = new Entry<>(key, value, null);
size = 1;
modCount++;
return null;
}
int cmp;
...
}
final int compare(Object k1, Object k2) {
return comparator==null ? ((Comparable<? super K>)k1).compareTo((K)k2)
: comparator.compare((K)k1, (K)k2);
}
After hitting a bug in my application, I had to debug TreeMap's put method. My issue was in comparing the objects that were put in the map. What is odd is that when I put the FIRST element into the map, its key gets compared with itself. I can't understand why it would work like that. Any insights (besides the commented "type (and possibly null) check")? Why wouldn't they just check whether the key was null? What kind of "type" check is made there, and what for?
As mentioned in the comment, https://bugs.openjdk.java.net/browse/JDK-5045147 is the issue where this was introduced. From the discussion in that issue, the original fix was the following:
BT2:SUGGESTED FIX
Doug Lea writes:
"Thanks! I have a strong sense of deja vu that I've
added this before(!) but Treemap.put should have the
following trap added."
public V put(K key, V value) {
Entry<K,V> t = root;
if (t == null) {
+ if (key == null) {
+ if (comparator == null)
+ throw new NullPointerException();
+ comparator.compare(key, key);
+ }
incrementSize();
root = new Entry<K,V>(key, value, null);
return null;
}
The intention seems to be to throw an NPE when the key is null and either the TreeMap has no comparator or the comparator does not accept null keys (which conforms to the API specification). It seems the fix was shortened to one line:
compare(key, key);
which is defined as:
@SuppressWarnings("unchecked")
final int compare(Object k1, Object k2) {
return comparator==null ? ((Comparable<? super K>)k1).compareTo((K)k2)
: comparator.compare((K)k1, (K)k2);
}
Hence this test will do both the null check and the type check, namely the cast to Comparable.
I believe this is the place where TreeMap< K,V > checks if K implements Comparable if no Comparator is supplied. You get a ClassCastException otherwise.
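A small experiment makes both checks visible. This is only a sketch: FirstPutCheck and firstPut are made-up names, and it assumes a JDK that contains the fix discussed above (Java 7 or later) and a TreeMap created without a Comparator.

```java
import java.util.TreeMap;

class FirstPutCheck {
    // Puts key into a fresh, empty TreeMap (no Comparator) and reports
    // the simple name of the exception thrown, or "none" if the put succeeds.
    static String firstPut(Object key) {
        TreeMap<Object, String> map = new TreeMap<>();
        try {
            map.put(key, "value");
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPut("a"));          // a Comparable key
        System.out.println(firstPut(new Object())); // not Comparable
        System.out.println(firstPut(null));         // null key, natural ordering
    }
}
```

Without the self-comparison on the first put, a non-Comparable or null key could slip into an empty map and only cause a failure on a later operation.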

Can't understand why my generic code is not working [duplicate]

This question already has answers here:
How can I properly compare two Integers in Java?
(10 answers)
Closed 8 years ago.
Below is a simple implementation of a linked list. I have just added the relevant code.
First, I add some values to the list: 10, 990 and 10000. When I search for the same values, I get true for key = 10, but false for key = 990 and key = 10000, though it should be true. Also, if I change the second value from 990 to 99 and search for key = 99, this time I get true.
I am not sure about using generic type. I guess I am doing something wrong there. Because if I replace generic type with int, I get the correct behavior. Please suggest.
public class LinkedListTest {
public static void main(String[] args) {
LinkedList<Integer> num = new LinkedList<Integer>();
num.add(10);
num.add(990);
num.add(10000);
int key = 10;
System.out.println("Key " + key + " found ?" + num.findValue(key));
key = 990; //also checked for Integer key = 990
System.out.println("Key " + key + " found ?" + num.findValue(key));
key = 10000;
System.out.println("Key " + key + " found ?" + num.findValue(key));
}
}
class LinkedList<T>{
private Node<T> first;
private class Node<T>{
private T data;
private Node<T> next;
public Node(T data){
this.data = data;
this.next = next;
}
}
public void add(T data){
Node<T> nn = new Node<T>(data);
nn.next = first;
first = nn;
}
public boolean findValue(T key){
Node current = first;
while(current != null){
if(current.data == key)
return true;
else
current = current.next;
}
return false;
}
}
The == operator compares two object references to see if they refer to the same object. With Integer values, the JVM will cache Integers from -128 through 127.
From the Integer.valueOf javadocs:
This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.
When 10 and 99 are boxed, they result in the same Integer object (respectively) when another 10 and 99 are boxed. However, boxing non-cached Integer objects such as 990 and 10000 will result in different Integer objects each time.
Replace == with the equals method, to compare the key contents, not the key references.
if(current.data != null && current.data.equals(key))
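The difference can be reproduced in isolation. The following is a small sketch with made-up names (BoxedCompare, sameReference, sameValue); the result for 990 depends on the JVM's cache settings, so it is described rather than promised.

```java
class BoxedCompare {
    // Reference comparison: true only when both variables point at the same object.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    // Value comparison: true when both hold the same number.
    static boolean sameValue(Integer a, Integer b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        Integer small1 = 10, small2 = 10;   // boxed via Integer.valueOf, inside the cached range
        Integer large1 = 990, large2 = 990; // outside the guaranteed cache range
        System.out.println(sameReference(small1, small2)); // true
        System.out.println(sameReference(large1, large2)); // usually false on a default JVM
        System.out.println(sameValue(large1, large2));     // true
    }
}
```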
You should be using .equals() instead of == when checking if you've found the value you're looking for:
public boolean findValue(T key){
Node<T> current = first;
while(current != null){
if(current.data != null && current.data.equals(key))
return true;
else
current = current.next;
}
return false;
}
When you declare your LinkedList as a list of Integers, your primitive int is wrapped in an Integer object before it is stored in the node. Thus == doesn't always work because you're not comparing two primitives.
Your problem is that you are using == instead of equals.
It works for int which is a primitive type, but for Integer (object) == returns true only if the two members are the same instance.
In order to be "generic" you should use equals method instead of == operator.

Recursively traverse HashMap?

Is there a way to recursively traverse a HashMap so that value1 of key1 is actually the new key2 which returns value2 that again will be the next key3 and so on ... till it returns null? The logic is as follows:
hm.get(key)
hm.get(hm.get(key))
hm.get(hm.get(hm.get(key)))
......
I'm assuming this may be done through some recursive procedure? Please correct me if I am wrong. Thanks!
Is this the procedure you wanted? It returns the final value reached by traversing the hashmap:
public Object traverseMap(Object key) {
while(hm.get(key) != null){
key = hm.get(key);
}
return key;
}
If the hashmap would be set up this way (i.e. it contains a value which is also the key for another value) it would be possible. You could do that in a recursive method but a loop would be sufficient:
Object key = someInitialKey;
Object value = null;
do {
value = hm.get( key );
key = value;
} while( value != null );
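A loop like this never terminates if the map contains a cycle (for example a -> b and b -> a). Here is a hedged sketch of a loop-based variant that guards against that; ChainWalker and followChain are hypothetical names, and it returns the last element reached rather than null.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ChainWalker {
    // Follows start -> map.get(start) -> ... and returns the last element
    // whose lookup yields null. If an element repeats (a cycle), the walk
    // stops and that repeated element is returned instead.
    static <T> T followChain(Map<T, T> map, T start) {
        Set<T> seen = new HashSet<>();
        T current = start;
        while (seen.add(current)) { // add returns false once current repeats
            T next = map.get(current);
            if (next == null) {
                return current;
            }
            current = next;
        }
        return current;
    }
}
```

What to do on a cycle (return something, return null, or throw) is a design choice; returning the repeated element is just one option.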
Well, anyway, that's the (tail!) recursive version you asked for:
public class Qsdf {
public static Object traverseMap(Map m, Object key) {
return traverseMap(m, key, new HashSet());
}
public static Object traverseMap(Map m, Object key, Set traversed) {
if (key == null) { // the first key must not be null
throw new NullPointerException();
}
traversed.add(key);
Object value = m.get(key);
if (traversed.contains(value)) { // added after Stephen C's comment on other answer
// cycle found, either throw exception, return null, or return key
return key;
}
return value != null ?
traverseMap(m, value, traversed) :
key; // I guess you want to return the last value that isn't also a key
}
public static void main(String[] args) {
final HashMap<Integer, Integer> m = new HashMap<Integer, Integer>();
m.put(0, 1);
m.put(1, 2);
m.put(3, 4);
m.put(2, 3);
final Object o = traverseMap(m, 0);
System.out.println(o);
}
}

Find element position in a Java TreeMap

I am working with a TreeMap of Strings, TreeMap<String, String>, and using it to implement a Dictionary of words.
I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.
Each file should have a vector representing it with following properties:
vector should have same size as dictionary
for each word contained in the file the vector should have a 1 in the position corresponding to the word position in dictionary
for each word not contained in the file the vector should have a -1 in the position corresponding to the word position in dictionary
So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called Boolean Model - http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
The problem I am facing in the procedure to create this vector is that I need a way to find position of a word in the dictionary, something like this:
String key;
int i = get_position_of_key_in_Treemap(key); // <--- purely invented method...
1) Is there any method like this I can use on a TreeMap? If not, could you provide some code to help me implement it myself?
2) Is there an iterator on TreeMap (it's alphabetically ordered on keys) from which I can get the position?
3) Alternatively, should I use another class to implement the dictionary (if you think that I can't do what I need with TreeMaps)? If so, which?
Thanks in advance.
ADDED PART:
Solution proposed by dasblinkenlight looks fine but has the problem of complexity (linear with dimension of dictionary due to copying keys into an array), and the idea of doing it for each file is not acceptable.
Any other ideas for my questions?
Once you have constructed your tree map, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(logN) time. If you need the value, do a lookup on the original map too.
Edit: this is how you copy keys into an array
String[] mapKeys = new String[treeMap.size()];
int pos = 0;
for (String key : treeMap.keySet()) {
mapKeys[pos++] = key;
}
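Putting the copy and the lookup together, a minimal sketch might look like this. SortedKeyIndex and positionOf are hypothetical names, and it assumes the TreeMap uses the natural String ordering, since that is what Arrays.binarySearch uses on a String[].

```java
import java.util.Arrays;
import java.util.TreeMap;

class SortedKeyIndex {
    private final String[] sortedKeys;

    // Takes a snapshot of the keys; it must be rebuilt if the map changes afterwards.
    SortedKeyIndex(TreeMap<String, ?> map) {
        // keySet() of a TreeMap iterates in ascending key order, so the array is sorted.
        sortedKeys = map.keySet().toArray(new String[0]);
    }

    // Position of word in sorted order, or -1 if it is not a key.
    int positionOf(String word) {
        int i = Arrays.binarySearch(sortedKeys, word);
        return i >= 0 ? i : -1;
    }
}
```

The copy is O(N) but happens once per dictionary, not once per file; each lookup afterwards is O(log N).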
An alternative solution would be to use TreeMap's headMap method. If the word exists in the TreeMap, then the size() of its head map is equal to the index of the word in the dictionary. It may be a bit wasteful compared to my other answer, though.
Here is how you code it in Java:
import java.util.*;
class Test {
public static void main(String[] args) {
TreeMap<String,String> tm = new TreeMap<String,String>();
tm.put("quick", "one");
tm.put("brown", "two");
tm.put("fox", "three");
tm.put("jumps", "four");
tm.put("over", "five");
tm.put("the", "six");
tm.put("lazy", "seven");
tm.put("dog", "eight");
for (String s : new String[] {
"quick", "brown", "fox", "jumps", "over",
"the", "lazy", "dog", "before", "way_after"}
) {
if (tm.containsKey(s)) {
// Here is the operation you are looking for.
// It does not work for items not in the dictionary.
int pos = tm.headMap(s).size();
System.out.println("Key '"+s+"' is at the position "+pos);
} else {
System.out.println("Key '"+s+"' is not found");
}
}
}
}
Here is the output produced by the program:
Key 'quick' is at the position 6
Key 'brown' is at the position 0
Key 'fox' is at the position 2
Key 'jumps' is at the position 3
Key 'over' is at the position 5
Key 'the' is at the position 7
Key 'lazy' is at the position 4
Key 'dog' is at the position 1
Key 'before' is not found
Key 'way_after' is not found
https://github.com/geniot/indexed-tree-map
I had the same problem. So I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own IndexedNavigableMap:
public interface IndexedNavigableMap<K, V> extends NavigableMap<K, V> {
K exactKey(int index);
Entry<K, V> exactEntry(int index);
int keyIndex(K k);
}
The implementation is based on updating node weights in the red-black tree when it is changed. The weight of a node is the number of nodes beneath it, plus one for the node itself. For example, when a tree is rotated to the left:
private void rotateLeft(Entry<K, V> p) {
if (p != null) {
Entry<K, V> r = p.right;
int delta = getWeight(r.left) - getWeight(p.right);
p.right = r.left;
p.updateWeight(delta);
if (r.left != null) {
r.left.parent = p;
}
r.parent = p.parent;
if (p.parent == null) {
root = r;
} else if (p.parent.left == p) {
delta = getWeight(r) - getWeight(p.parent.left);
p.parent.left = r;
p.parent.updateWeight(delta);
} else {
delta = getWeight(r) - getWeight(p.parent.right);
p.parent.right = r;
p.parent.updateWeight(delta);
}
delta = getWeight(p) - getWeight(r.left);
r.left = p;
r.updateWeight(delta);
p.parent = r;
}
}
updateWeight simply updates weights up to the root:
void updateWeight(int delta) {
weight += delta;
Entry<K, V> p = parent;
while (p != null) {
p.weight += delta;
p = p.parent;
}
}
And when we need to find the element by index here is the implementation that uses weights:
public K exactKey(int index) {
if (index < 0 || index > size() - 1) {
throw new ArrayIndexOutOfBoundsException();
}
return getExactKey(root, index);
}
private K getExactKey(Entry<K, V> e, int index) {
if (e.left == null && index == 0) {
return e.key;
}
if (e.left == null && e.right == null) {
return e.key;
}
if (e.left != null && e.left.weight > index) {
return getExactKey(e.left, index);
}
if (e.left != null && e.left.weight == index) {
return e.key;
}
return getExactKey(e.right, index - (e.left == null ? 0 : e.left.weight) - 1);
}
Also comes in very handy finding the index of a key:
public int keyIndex(K key) {
if (key == null) {
throw new NullPointerException();
}
Entry<K, V> e = getEntry(key);
if (e == null) {
throw new NullPointerException();
}
if (e == root) {
return getWeight(e) - getWeight(e.right) - 1;//index to return
}
int index = 0;
int cmp;
if (e.left != null) {
index += getWeight(e.left);
}
Entry<K, V> p = e.parent;
// split comparator and comparable paths
Comparator<? super K> cpr = comparator;
if (cpr != null) {
while (p != null) {
cmp = cpr.compare(key, p.key);
if (cmp > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
} else {
Comparable<? super K> k = (Comparable<? super K>) key;
while (p != null) {
if (k.compareTo(p.key) > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
}
return index;
}
You can find the result of this work at https://github.com/geniot/indexed-tree-map
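The weight idea can be illustrated without the red-black machinery. The following is a simplified sketch, not the IndexedTreeMap code: SizedBst, indexOf and keyAt are hypothetical names, and the tree is a plain unbalanced binary search tree over Strings, so the O(log N) bound only holds when the tree happens to be balanced.

```java
class SizedBst {
    private static final class Node {
        final String key;
        Node left, right;
        int size = 1; // number of nodes in this subtree, including this node

        Node(String key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node n) {
        return n == null ? 0 : n.size;
    }

    void add(String key) {
        root = add(root, key);
    }

    private Node add(Node n, String key) {
        if (n == null) {
            return new Node(key);
        }
        int cmp = key.compareTo(n.key);
        if (cmp < 0) {
            n.left = add(n.left, key);
        } else if (cmp > 0) {
            n.right = add(n.right, key);
        }
        n.size = 1 + size(n.left) + size(n.right); // refresh on the way back up
        return n;
    }

    // Sorted position of key, or -1 if the key is absent.
    int indexOf(String key) {
        int index = 0;
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp < 0) {
                n = n.left;
            } else if (cmp > 0) {
                index += size(n.left) + 1; // skip the left subtree and n itself
                n = n.right;
            } else {
                return index + size(n.left);
            }
        }
        return -1;
    }

    // Key at the given sorted position.
    String keyAt(int index) {
        if (index < 0 || index >= size(root)) {
            throw new IndexOutOfBoundsException();
        }
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (index < leftSize) {
                n = n.left;
            } else if (index == leftSize) {
                return n.key;
            } else {
                index -= leftSize + 1;
                n = n.right;
            }
        }
    }
}
```

A balanced tree additionally has to fix up the sizes during rotations, which is what the rotateLeft code above does with its weight deltas.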
There's no such implementation in the JDK itself. Although TreeMap iterates in natural key ordering, its internal data structures are all based on trees and not arrays (remember that Maps do not order keys by definition, even though ordering is a very common use case).
That said, you have to make a choice, as it is not possible to have O(1) computation time for both insertion into the Map and the indexOf(key) calculation. This is because lexicographical order is not stable in a mutable data structure (as opposed to insertion order, for instance). An example: once you insert the first key-value pair (entry) into the map, its position is the first one. However, depending on the second key inserted, that position might change, as the new key may be "greater" or "lower" than the one already in the Map. You can certainly implement this by maintaining and updating an indexed list of keys during the insertion operation, but then you'll have O(n log(n)) for your insert operations (as you will need to re-order an array). That might be acceptable or not, depending on your data access patterns.
ListOrderedMap and LinkedMap in Apache Commons both come close to what you need but rely on insertion order. You can check out their implementation and develop your own solution to the problem with little to moderate effort, I believe (that should be just a matter of replacing ListOrderedMap's internal backing array with a sorted list, TreeList in Apache Commons for instance).
You can also calculate the index yourself, by counting the number of elements that are lower than the given key (which should be faster than iterating through the list searching for your element, in the most frequent case, as you're not comparing anything).
I agree with Isolvieira. Perhaps the best approach would be to use a different structure than TreeMap.
However, if you still want to go with computing the index of the keys, a solution would be to count how many keys are lower than the key you are looking for.
Here is a code snippet:
java.util.SortedMap<String, String> treeMap = new java.util.TreeMap<String, String>();
treeMap.put("d", "content 4");
treeMap.put("b", "content 2");
treeMap.put("c", "content 3");
treeMap.put("a", "content 1");
String key = "d"; // key to get the index for
System.out.println( treeMap.keySet() );
final String firstKey = treeMap.firstKey(); // assuming treeMap structure doesn't change in the mean time
System.out.format( "Index of %s is %d %n", key, treeMap.subMap(firstKey, key).size() );
I'd like to thank all of you for the effort you put into answering my question. The answers were all very useful, and taking the best from each of them led me to the solution I actually implemented in my project.
What I believe to be the best answers to my individual questions are:
2) There is no Iterator defined on TreeMaps, as @Isoliveira says:
There's no such implementation in the JDK itself.
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition,
in spite of that the very common use case).
and as I found in the SO answer "How to iterate over a TreeMap?", the only way to iterate over the elements of a Map is to use map.entrySet() and the Iterators defined on Set (or on some other class with Iterators).
3) It's possible to use a TreeMap to implement the Dictionary, but this guarantees only O(logN) complexity for finding the index of a contained word (the cost of a lookup in a tree data structure).
Using a HashMap with the same procedure instead gives expected O(1) complexity.
1) No such method exists. The only solution is to implement it yourself.
As @Paul stated
Assumes that once getPosition() has been called, the dictionary is not changed.
the solution assumes that once the Dictionary is created it will not be changed afterwards: that way the position of a word will always be the same.
Given this assumption, I found a solution that builds the Dictionary with O(N) insertions plus a single O(N log N) sort, and afterwards guarantees that the index of a contained word can be looked up in constant time, O(1).
I defined Dictionary as a HashMap like this:
public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();
key --> the String representing the word contained in Dictionary
value --> an Object of a created class WordStruct
where WordStruct class is defined like this:
public class WordStruct {
private int DictionaryPosition; // defines the position of word in dictionary once it is alphabetically ordered
public WordStruct(){
}
public void SetWordPosition(int pos){
this.DictionaryPosition = pos;
}
}
and allows me to keep memory of any kind of attribute I like to couple with the word entry of the Dictionary.
Now I fill dictionary iterating over all words contained in all files of my collection:
THE FOLLOWING IS PSEUDOCODE
for(int i = 0; i < number_of_files ; i++){
get_file(i);
while (file_contains_words){
dictionary.put( word(j) , new WordStruct());
}
}
Once the HashMap is filled, in whatever order, I use the procedure indicated by @dasblinkenlight to order it once and for all. The sort costs O(N log N), but it is done only once:
Object[] dictionaryArray = dictionary.keySet().toArray();
Arrays.sort(dictionaryArray);
for(int i = 0; i < dictionaryArray.length; i++){
String word = (String) dictionaryArray[i];
dictionary.get(word).SetWordPosition(i);
}
From now on, to get the index of a word in the alphabetical order of the dictionary, the only thing needed is to access its DictionaryPosition variable:
since the word is known, you just need to look it up, and this has constant cost in a HashMap.
Thanks again and I wish you all a Merry Christmas!!
Have you thought to make the values in your TreeMap contain the position in your dictionary? I am using a BitSet here for my file details.
This doesn't work nearly as well as my other idea below.
Map<String,Integer> dictionary = new TreeMap<String,Integer> ();
private void test () {
// Construct my dictionary.
buildDictionary();
// Make my file data.
String [] file1 = new String[] {
"1", "3", "5"
};
BitSet fileDetails = getFileDetails(file1, dictionary);
printFileDetails("File1", fileDetails);
}
private void printFileDetails(String fileName, BitSet details) {
System.out.println("File: "+fileName);
for ( int i = 0; i < details.length(); i++ ) {
System.out.print ( details.get(i) ? 1: -1 );
if ( i < details.length() - 1 ) {
System.out.print ( "," );
}
}
}
private BitSet getFileDetails(String [] file, Map<String, Integer> dictionary ) {
BitSet details = new BitSet();
for ( String word : file ) {
// The value in the dictionary is the index of the word in the dictionary.
details.set(dictionary.get(word));
}
return details;
}
String [] dictionaryWords = new String[] {
"1", "2", "3", "4", "5"
};
private void buildDictionary () {
for ( String word : dictionaryWords ) {
// Initially make the value 0. We will change that later.
dictionary.put(word, 0);
}
// Make the indexes.
int wordNum = 0;
for ( String word : dictionary.keySet() ) {
dictionary.put(word, wordNum++);
}
}
Here the building of the file details consists of a single lookup in the TreeMap for each word in the file.
If you were planning to use the value in the dictionary TreeMap for something else you could always compose it with an Integer.
Added
Thinking about it further, if the value field of the Map is earmarked for something you could always use special keys that calculate their own position in the Map and act just like Strings for comparison.
private void test () {
// Dictionary
Map<PosKey, String> dictionary = new TreeMap<PosKey, String> ();
// Fill it with words.
String[] dictWords = new String[] {
"0", "1", "2", "3", "4", "5"};
for ( String word : dictWords ) {
dictionary.put( new PosKey( dictionary, word ), word );
}
// File
String[] fileWords = new String[] {
"0", "2", "3", "5"};
int[] file = new int[dictionary.size()];
// Initially all -1.
for ( int i = 0; i < file.length; i++ ) {
file[i] = -1;
}
// Temp file words set.
Set<String> fileSet = new HashSet<String>( Arrays.asList( fileWords ) );
for ( PosKey key : dictionary.keySet() ) {
if ( fileSet.contains( key.getKey() ) ) {
file[key.getPosition()] = 1;
}
}
// Print out.
System.out.println( Arrays.toString( file ) );
// Prints: [1, -1, 1, 1, -1, 1]
}
class PosKey
implements Comparable<PosKey> {
final String key;
// Initially -1
int position = -1;
// The map I am keying on.
Map<PosKey, ?> map;
public PosKey ( Map<PosKey, ?> map, String word ) {
this.key = word;
this.map = map;
}
public int getPosition () {
if ( position == -1 ) {
// First access to the key.
int pos = 0;
// Calculate all positions in one loop.
for ( PosKey k : map.keySet() ) {
k.position = pos++;
}
}
return position;
}
public String getKey () {
return key;
}
public int compareTo ( PosKey it ) {
return key.compareTo( it.key );
}
public int hashCode () {
return key.hashCode();
}
}
NB: Assumes that once getPosition() has been called, the dictionary is not changed.
I would suggest that you write a SkipList to store your dictionary, since this will still offer O(log N) lookups, insertion and removal while also being able to provide an index (tree implementations generally cannot return an index since the nodes don't know it, and there would be a cost to keeping them updated). Unfortunately the Java implementation, ConcurrentSkipListMap, does not provide an index, so you would need to implement your own version.
Getting the index of an item would be O(log N). If you wanted both the index and the value without doing two lookups, you would need to return a wrapper object holding both.
