I am working with a TreeMap of Strings TreeMap<String, String>, and using it to implement a Dictionay of words.
I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.
Each file should have a vector representing it with following properties:
vector should have same size as dictionary
for each word contained in the file the vector should have a 1 in the position corresponding to the word position in dictionary
for each word not contained in the file the vector should have a -1 in the position corresponding to the word position in dictionary
So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called Boolean Model - http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
The problem I am facing in the procedure to create this vector is that I need a way to find position of a word in the dictionary, something like this:
String key;
int i = get_position_of_key_in_Treemap(key); <--- purely invented method...
1) Is there any method like this I can use on a TreeMap?If not could you provide some code to help me implement it by myself?
2) Is there an iterator on TreeMap (it's alphabetically ordered on keys) of which I can get position?
3)Eventually should I use another class to implement dictionary?(If you think that with TreeMaps I can't do what I need) If yes, which?
Thanks in advance.
ADDED PART:
Solution proposed by dasblinkenlight looks fine but has the problem of complexity (linear with dimension of dictionary due to copying keys into an array), and the idea of doing it for each file is not acceptable.
Any other ideas for my questions?
Once you have constructed your tree map, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(logN) time. If you need the value, do a lookup on the original map too.
Edit: this is how you copy keys into an array
String[] mapKeys = new String[treeMap.size()];
int pos = 0;
for (String key : treeMap.keySet()) {
mapKeys[pos++] = key;
}
An alternative solution would be to use TreeMap's headMap method. If the word exists in the TreeMap, then the size() of its head map is equal to the index of the word in the dictionary. It may be a bit wasteful compared to my other answer, through.
Here is how you code it in Java:
import java.util.*;
class Test {
public static void main(String[] args) {
TreeMap<String,String> tm = new TreeMap<String,String>();
tm.put("quick", "one");
tm.put("brown", "two");
tm.put("fox", "three");
tm.put("jumps", "four");
tm.put("over", "five");
tm.put("the", "six");
tm.put("lazy", "seven");
tm.put("dog", "eight");
for (String s : new String[] {
"quick", "brown", "fox", "jumps", "over",
"the", "lazy", "dog", "before", "way_after"}
) {
if (tm.containsKey(s)) {
// Here is the operation you are looking for.
// It does not work for items not in the dictionary.
int pos = tm.headMap(s).size();
System.out.println("Key '"+s+"' is at the position "+pos);
} else {
System.out.println("Key '"+s+"' is not found");
}
}
}
}
Here is the output produced by the program:
Key 'quick' is at the position 6
Key 'brown' is at the position 0
Key 'fox' is at the position 2
Key 'jumps' is at the position 3
Key 'over' is at the position 5
Key 'the' is at the position 7
Key 'lazy' is at the position 4
Key 'dog' is at the position 1
Key 'before' is not found
Key 'way_after' is not found
https://github.com/geniot/indexed-tree-map
I had the same problem. So I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own IndexedNavigableMap:
public interface IndexedNavigableMap<K, V> extends NavigableMap<K, V> {
K exactKey(int index);
Entry<K, V> exactEntry(int index);
int keyIndex(K k);
}
The implementation is based on updating node weights in the red-black tree when it is changed. Weight is the number of child nodes beneath a given node, plus one - self. For example when a tree is rotated to the left:
private void rotateLeft(Entry<K, V> p) {
if (p != null) {
Entry<K, V> r = p.right;
int delta = getWeight(r.left) - getWeight(p.right);
p.right = r.left;
p.updateWeight(delta);
if (r.left != null) {
r.left.parent = p;
}
r.parent = p.parent;
if (p.parent == null) {
root = r;
} else if (p.parent.left == p) {
delta = getWeight(r) - getWeight(p.parent.left);
p.parent.left = r;
p.parent.updateWeight(delta);
} else {
delta = getWeight(r) - getWeight(p.parent.right);
p.parent.right = r;
p.parent.updateWeight(delta);
}
delta = getWeight(p) - getWeight(r.left);
r.left = p;
r.updateWeight(delta);
p.parent = r;
}
}
updateWeight simply updates weights up to the root:
void updateWeight(int delta) {
weight += delta;
Entry<K, V> p = parent;
while (p != null) {
p.weight += delta;
p = p.parent;
}
}
And when we need to find the element by index here is the implementation that uses weights:
public K exactKey(int index) {
if (index < 0 || index > size() - 1) {
throw new ArrayIndexOutOfBoundsException();
}
return getExactKey(root, index);
}
private K getExactKey(Entry<K, V> e, int index) {
if (e.left == null && index == 0) {
return e.key;
}
if (e.left == null && e.right == null) {
return e.key;
}
if (e.left != null && e.left.weight > index) {
return getExactKey(e.left, index);
}
if (e.left != null && e.left.weight == index) {
return e.key;
}
return getExactKey(e.right, index - (e.left == null ? 0 : e.left.weight) - 1);
}
Also comes in very handy finding the index of a key:
public int keyIndex(K key) {
if (key == null) {
throw new NullPointerException();
}
Entry<K, V> e = getEntry(key);
if (e == null) {
throw new NullPointerException();
}
if (e == root) {
return getWeight(e) - getWeight(e.right) - 1;//index to return
}
int index = 0;
int cmp;
if (e.left != null) {
index += getWeight(e.left);
}
Entry<K, V> p = e.parent;
// split comparator and comparable paths
Comparator<? super K> cpr = comparator;
if (cpr != null) {
while (p != null) {
cmp = cpr.compare(key, p.key);
if (cmp > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
} else {
Comparable<? super K> k = (Comparable<? super K>) key;
while (p != null) {
if (k.compareTo(p.key) > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
}
return index;
}
You can find the result of this work at https://github.com/geniot/indexed-tree-map
There's no such implementation in the JDK itself. Although TreeMap iterates in natural key ordering, its internal data structures are all based on trees and not arrays (remember that Maps do not order keys, by definition, in spite of that the very common use case).
That said, you have to make a choice as it is not possible to have O(1) computation time for your comparison criteria both for insertion into the Map and the indexOf(key) calculation. This is due to the fact that lexicographical order is not stable in a mutable data structure (as opposed to insertion order, for instance). An example: once you insert the first key-value pair (entry) into the map, its position will always be one. However, depending on the second key inserted, that position might change as the new key may be "greater" or "lower" than the one in the Map. You can surely implement this by maintaining and updating an indexed list of keys during the insertion operation, but then you'll have O(n log(n)) for your insert operations (as will need to re-order an array). That might be desirable or not, depending on your data access patterns.
ListOrderedMap and LinkedMap in Apache Commons both come close to what you need but rely on insertion order. You can check out their implementation and develop your own solution to the problem with little to moderate effort, I believe (that should be just a matter of replacing the ListOrderedMaps internal backing array with a sorted list - TreeList in Apache Commons, for instance).
You can also calculate the index yourself, by subtracting the number of elements that are lower than then given key (which should be faster than iterating through the list searching for your element, in the most frequent case - as you're not comparing anything).
I agree with Isolvieira. Perhaps the best approach would be to use a different structure than TreeMap.
However, if you still want to go with computing the index of the keys, a solution would be to count how many keys are lower than the key you are looking for.
Here is a code snippet:
java.util.SortedMap<String, String> treeMap = new java.util.TreeMap<String, String>();
treeMap.put("d", "content 4");
treeMap.put("b", "content 2");
treeMap.put("c", "content 3");
treeMap.put("a", "content 1");
String key = "d"; // key to get the index for
System.out.println( treeMap.keySet() );
final String firstKey = treeMap.firstKey(); // assuming treeMap structure doesn't change in the mean time
System.out.format( "Index of %s is %d %n", key, treeMap.subMap(firstKey, key).size() );
I'd like to thank all of you for the effort you put in answering my question, they all were very useful and taking the best from each of them made me come up to the solution I actually implemented in my project.
What I beleive to be best answers to my single questions are:
2) There is not an Iterator defined on TreeMaps as #Isoliveira sais:
There's no such implementation in the JDK itself.
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition,
in spite of that the very common use case).
and as I found in this SO answer How to iterate over a TreeMap?, the only way to iterate on elements in a Map is to use map.entrySet() and use Iterators defined on Set (or some other class with Iterators).
3) It's possible to use a TreeMap to implement Dictionary, but this will garantuee a complexity of O(logN) in finding index of a contained word (cost of a lookup in a Tree Data Structure).
Using a HashMap with same procedure will instead have complexity O(1).
1) There exists no such method. Only solution is to implement it entirely.
As #Paul stated
Assumes that once getPosition() has been called, the dictionary is not changed.
assumption of solution is that once that Dictionary is created it will not be changed afterwards: in this way position of a word will always be the same.
Giving this assumption I found a solution that allows to build Dictionary with complexity O(N) and after garantuees the possibility to get index of a word contained with constat time O(1) in lookup.
I defined Dictionary as a HashMap like this:
public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();
key --> the String representing the word contained in Dictionary
value --> an Object of a created class WordStruct
where WordStruct class is defined like this:
public class WordStruct {
private int DictionaryPosition; // defines the position of word in dictionary once it is alphabetically ordered
public WordStruct(){
}
public SetWordPosition(int pos){
this.DictionaryPosition = pos;
}
}
and allows me to keep memory of any kind of attribute I like to couple with the word entry of the Dictionary.
Now I fill dictionary iterating over all words contained in all files of my collection:
THE FOLLOWING IS PSEUDOCODE
for(int i = 0; i < number_of_files ; i++){
get_file(i);
while (file_contais_words){
dictionary.put( word(j) , new LemmaStruct());
}
}
Once HashMap is filled in whatever order I use procedure indicated by #dasblinkenlight to order it once and for all with complexity O(N)
Object[] dictionaryArray = dictionary.keySet().toArray();
Arrays.sort(dictionaryArray);
for(int i = 0; i < dictionaryArray.length; i++){
String word = (String) dictionaryArray[i];
dictionary.get(word).SetWordPosition(i);
}
And from now on to have index position in alphatebetic order of word in dictionary only thing needed is to acces it's variable DictionaryPosition:
since word is know you just need to access it and this has constant cost in a HashMap.
Thanks again and Iwish you all a Merry Christmas!!
Have you thought to make the values in your TreeMap contain the position in your dictionary? I am using a BitSet here for my file details.
This doesn't work nearly as well as my other idea below.
Map<String,Integer> dictionary = new TreeMap<String,Integer> ();
private void test () {
// Construct my dictionary.
buildDictionary();
// Make my file data.
String [] file1 = new String[] {
"1", "3", "5"
};
BitSet fileDetails = getFileDetails(file1, dictionary);
printFileDetails("File1", fileDetails);
}
private void printFileDetails(String fileName, BitSet details) {
System.out.println("File: "+fileName);
for ( int i = 0; i < details.length(); i++ ) {
System.out.print ( details.get(i) ? 1: -1 );
if ( i < details.length() - 1 ) {
System.out.print ( "," );
}
}
}
private BitSet getFileDetails(String [] file, Map<String, Integer> dictionary ) {
BitSet details = new BitSet();
for ( String word : file ) {
// The value in the dictionary is the index of the word in the dictionary.
details.set(dictionary.get(word));
}
return details;
}
String [] dictionaryWords = new String[] {
"1", "2", "3", "4", "5"
};
private void buildDictionary () {
for ( String word : dictionaryWords ) {
// Initially make the value 0. We will change that later.
dictionary.put(word, 0);
}
// Make the indexes.
int wordNum = 0;
for ( String word : dictionary.keySet() ) {
dictionary.put(word, wordNum++);
}
}
Here the building of the file details consists of a single lookup in the TreeMap for each word in the file.
If you were planning to use the value in the dictionary TreeMap for something else you could always compose it with an Integer.
Added
Thinking about it further, if the value field of the Map is earmarked for something you could always use special keys that calculate their own position in the Map and act just like Strings for comparison.
private void test () {
// Dictionary
Map<PosKey, String> dictionary = new TreeMap<PosKey, String> ();
// Fill it with words.
String[] dictWords = new String[] {
"0", "1", "2", "3", "4", "5"};
for ( String word : dictWords ) {
dictionary.put( new PosKey( dictionary, word ), word );
}
// File
String[] fileWords = new String[] {
"0", "2", "3", "5"};
int[] file = new int[dictionary.size()];
// Initially all -1.
for ( int i = 0; i < file.length; i++ ) {
file[i] = -1;
}
// Temp file words set.
Set fileSet = new HashSet( Arrays.asList( fileWords ) );
for ( PosKey key : dictionary.keySet() ) {
if ( fileSet.contains( key.getKey() ) ) {
file[key.getPosiion()] = 1;
}
}
// Print out.
System.out.println( Arrays.toString( file ) );
// Prints: [1, -1, 1, 1, -1, 1]
}
class PosKey
implements Comparable {
final String key;
// Initially -1
int position = -1;
// The map I am keying on.
Map<PosKey, ?> map;
public PosKey ( Map<PosKey, ?> map, String word ) {
this.key = word;
this.map = map;
}
public int getPosiion () {
if ( position == -1 ) {
// First access to the key.
int pos = 0;
// Calculate all positions in one loop.
for ( PosKey k : map.keySet() ) {
k.position = pos++;
}
}
return position;
}
public String getKey () {
return key;
}
public int compareTo ( Object it ) {
return key.compareTo( ( ( PosKey )it ).key );
}
public int hashCode () {
return key.hashCode();
}
}
NB: Assumes that once getPosition() has been called, the dictionary is not changed.
I would suggest that you write a SkipList to store your dictionary, since this will still offer O(log N) lookups, insertion and removal while also being able to provide an index (tree implementations can generally not return an index since the nodes don't know it, and there would be a cost to keeping them updated). Unfortunately the java implementation of ConcurrentSkipListMap does not provide an index, so you would need to implement your own version.
Getting the index of an item would be O(log N), if you wanted both the index and value without doing 2 lookups then you would need to return a wrapper object holding both.
Related
In a way, it is the reverse of the problem of generating subsets of size k from an array containing k+1 elements.
For example, if somebody gives me the pairs {a,b} , {a,c} , {b,c} , {a,e} , {b,e}, {a,f}, I need an algorithm that will tell me the triplets {a,b,c} and (a,b,e} are completely covered for their pairwise combinations in the pairs given to me. I need to generalize from pair/triplet in my example to the case k/k+1
My hunch was that there would be a well documented and efficient algorithm that solves my problem. Sadly, searching the internet did not help obtaining it. Questions already posted in stackoverflow do not cover this problem. I am thereby compelled to post this question to find my solution.
I'm not familiar with an established algorithm for this and you didn't ask for a specific language so I've written up a C# algorithm that accomplishes what you've asked and matches the test values provided. It doesn't have much real-world error checking. I've got a .Net fiddle you can run to see the results within a web browser. https://dotnetfiddle.net/ErwTeg
It works by converting your array of arrays (or similar container) to a dictionary with every unique value as a key and the value for each key being every value that is found within any list with the key. From your sample, a gets {b,c,e,f} (We'll call them friends, and this is what the GetFriends function does)
The AreFriendsWithEachother function indicates whether or not all passed values are friends with all other values.
The results of the friends list are then fed to the MakeTeam function which makes teams of a given size by enumerating every friend that a key has and trying every size length permutation of these. For instance, in the original example a has friend permutations of {{a,b,c},{a,b,e},{a,b,f},{a,c,b},{a,c,e},{a,c,f},{a,e,b},{a,e,c},{a,e,f},{a,f,b},{a,f,c},{a,f,e}}. Of these we make sure that all three values are friends by checking the friends list we created earlier. If all values within a permutation are friends then we add it to our results cache. The results would then be culled for all duplicate sets. This is handled in C# by using HashSet which only adds items that aren't already on the list.
The MakeTeam function is terrible looking because it contains a runtime variable number of loops (normally visualized by foreach). I am rolling up and down through enumerators and emulating the foreach loops myself.
I've included versions for MakeTeamOf3 and MakeTeamOf4 which show static loop structures, which are very easily adapted when you know your k value ahead of time.
The same code is provided here
using System;
using System.Collections.Generic;
using System.Linq;
namespace kfromkm1 // k elements from k minus 1
{
public class Program
{
static readonly string[][] pairs =
{
new string[] { "a", "b" },
new string[] { "a", "c" },
new string[] { "b", "c" },
new string[] { "a", "e" },
new string[] { "b", "e" },
new string[] { "a", "f" }
};
static readonly string[][] pairsExpectedResult =
{
new string[] { "a", "b", "c" },
new string[] { "a", "b", "e" }
};
static readonly string[][] triplets =
{
new string[] { "a", "b", "c" },
new string[] { "a", "b", "d" },
new string[] { "a", "c", "d" },
new string[] { "b", "c", "d" },
new string[] { "b", "c", "e" }
};
static readonly string[][] tripletsExpectedResults =
{
new string[] { "a", "b", "c", "d" }
};
public static void Main(string[] args)
{
Dictionary<string, HashSet<string>> friendsList = GetFriends(pairs);
Dump(nameof(pairs), pairs);
Console.WriteLine();
Dump(nameof(pairsExpectedResult), pairsExpectedResult);
Console.WriteLine();
HashSet<HashSet<string>> teams = MakeTeams(friendsList, 3);
Dump(nameof(teams), teams);
Console.WriteLine();
friendsList = GetFriends(triplets);
Dump(nameof(triplets), triplets);
Console.WriteLine();
Dump(nameof(tripletsExpectedResults), tripletsExpectedResults);
Console.WriteLine();
teams = MakeTeams(friendsList, 4);
Dump(nameof(teams), teams);
Console.ReadLine();
}
// helper function to display results
static void Dump<T>(string name, IEnumerable<IEnumerable<T>> values)
{
Console.WriteLine($"{name} =");
int line = 0;
bool notfirst;
foreach (IEnumerable<T> layer in values)
{
Console.Write($"{line}: {{");
notfirst = false;
foreach (T value in layer)
{
if (notfirst)
Console.Write($", {value}");
else
{
Console.Write(value);
notfirst = true;
}
}
Console.WriteLine("}");
line++;
}
}
// items are friends if they show up in a set (pair in the example) together
// list can be a list of lists, array of arrays, list of arrays, etc
// {a, b} means a and b are friends
// {a, b, c} means a is friends with b and c, b is friends with a and c, c is friends with a and b
static Dictionary<T, HashSet<T>> GetFriends<T>(IEnumerable<IEnumerable<T>> list) where T : IEquatable<T>
{
Dictionary<T, HashSet<T>> result = new Dictionary<T, HashSet<T>>();
foreach (IEnumerable<T> set in list) // one set at a time
{
foreach (T current in set) // enumerate the set from front to back
{
foreach (T other in set) // enumerate the set with a second pointer to compare every item
{
if (!current.Equals(other)) // ignore self
{
if (!result.ContainsKey(current)) // initialize this item's result hashset
result[current] = new HashSet<T>();
result[current].Add(other); // add friend (hashset will ignore duplicates)
}
}
}
}
return result;
}
// indicates whether or not all items are friends
static bool AreFriendsWithEachother<T>(Dictionary<T, HashSet<T>> friendsList, IEnumerable<T> values)
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
foreach (T first in values)
{
if (!friendsList.ContainsKey(first)) // not on list, has no friends
return false;
foreach (T other in values)
{
if (!friendsList[first].Contains(other) && !first.Equals(other)) // false if even one doesn't match, don't count self as non-friend for computational ease
return false;
}
}
return true; // all matched so true
}
// size represents how many items should be in each team
static HashSet<HashSet<T>> MakeTeams<T>(Dictionary<T, HashSet<T>> friendsList, int size) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
if (size < 2)
throw new ArgumentOutOfRangeException(nameof(size), size, "Size should be greater than 2");
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values = new T[size];
IEnumerator<T>[] enumerators = new IEnumerator<T>[size - 1]; // gotta cache our own enumerators with a variable number of "foreach" layers
int layer;
bool moveNext;
foreach (T key in friendsList.Keys) // this is a mess because it's a runtime variable number of copies of enumerators running over the same list
{
values[0] = key;
for (int index = 0; index < size - 1; index++)
enumerators[index] = friendsList[key].GetEnumerator();
moveNext = true;
layer = 0;
while (moveNext)
{
while (layer < size - 1 && moveNext)
{
if (enumerators[layer].MoveNext())
layer++;
else
{
if (layer == 0)
moveNext = false;
else
{
enumerators[layer].Reset();
layer--;
}
}
}
for (int index = 1; index < size; index++)
values[index] = enumerators[index - 1].Current;
if (values.Distinct().Count() == size && AreFriendsWithEachother(friendsList, values))
result.Add(new HashSet<T>(values));
layer--;
}
}
return result;
}
// provided as an example
static HashSet<HashSet<T>> MakeTeamsOf3<T>(Dictionary<T, HashSet<T>> friendsList) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values;
foreach (T key in friendsList.Keys) // start with every key
{
foreach (T first in friendsList[key])
{
foreach (T second in friendsList[key])
{
values = new T[] { key, first, second };
if (values.Distinct().Count() == 3 && AreFriendsWithEachother(friendsList, values)) // there's no duplicates and they are friends
result.Add(new HashSet<T>(values));
}
}
}
return result;
}
// provided as an example
static HashSet<HashSet<T>> MakeTeamsOf4<T>(Dictionary<T, HashSet<T>> friendsList) where T : IEquatable<T>
{
if (friendsList == null) // no list = no results
throw new ArgumentNullException(nameof(friendsList));
HashSet<HashSet<T>> result = new HashSet<HashSet<T>>(HashSet<T>.CreateSetComparer());
T[] values;
foreach (T key in friendsList.Keys) // start with every key
{
foreach (T first in friendsList[key])
{
foreach (T second in friendsList[key])
{
foreach (T third in friendsList[key])
{
values = new T[] { key, first, second, third };
if (values.Distinct().Count() == 4 && AreFriendsWithEachother(friendsList, values)) // there's no duplicates and they are friends
result.Add(new HashSet<T>(values));
}
}
}
}
return result;
}
}
}
Function to generate SetOfkNbrdElementCombinations
//to generate outputs with k values greater than two (pairwise)
Take SetOfkNbrdElementCombinations as an input
//Example - {{a,b},{b,c},...} : here k is 2 (though variable name will retain the letter k); elements are a,b,c,..; sets {a,b}, {b,c} are 2-numbered combinations of elements
Take nextSize as an input
//nextSize should be bigger than the k in the input SetOfkNbrdElementCombinations by 1.
//For example above where k is 2, nextSize would be 3
//Logic:
Comb(SetOfkNbrdElementCombinations={S1,S2,...Sn},nextSize) = {S1,Comb({SetOfkNbrdElementCombinations-a1},nextSize-l)}
//The recursive algorithm specified in the line above generates sets containing unique nextSize numbered combinations of the combinations in SetOfkNbrdElementCombinations
//Code that implements the algorithm is available at Rosetta code
//In our example it would generate {{{a,b},{b,c},{b,e}},{{a,b},{b,c},{a,c}},...} (triplets of pairs)
//My logic to generate nextSize numbered combinations of elements is below
// Example of my output, based on the example input above, would be {{a,b,c},{a,c,e},...}
Intitialize oputSetOfkNbrdElementCombinations to empty
For each nextSize sized combination of combinations generated above
Join the contained combinations in a union set
If the nbr of elements in the union is nextSize, add the union set to oputSetOfkNbrdElementCombinations
Output oputSetOfkNbrdElementCombinations
Here is the Java implementation of the algorithm. You can copy, paste and run on https://ide.geeksforgeeks.org/
/* This program takes k sized element combinations and generates the k+1 sized element combinations
that are possible.
For example if the program is given {a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}, {b,c,e}
which are 3 sized combinations, it will identify {a,b,c,d} the
4 sized combination that has all the 3 sized combinations of its elements covered
in what were provided to the program
The program can scale to higher values of k.
The program uses only the hashset data structure
*/
//AUTHOR: Suri Chitti
import java.util.*;
public class uppOrdCombsFromCombs {
//sample CSV strings...let us pretend they came from a file
//This is a sample of input to the program
static String[] csvStrings = new String[] {
"a,b,c",
"a,b,d",
"a,c,d",
"b,c,d",
"b,c,e"
};
/* //Another sample CSV strings...let us pretend they came from a file
//This is another sample of input to the program
static String[] csvStrings = new String[] {
"a,b",
"b,c",
"a,c",
"c,e",
"a,e"
};
*/ /////USE ONLY ONE SAMPLE
//Before we can generate a k+1 sized combination of elements from a bunch
//of k sized combinations we need to obtain groups containing k+1 number of
// k sized combinations
//The method below, called SetOfNxtSizeNbrdkElementCombinationsets will do it for us
//It takes a bunch of k sized combinations called the parameter hsSetOfkNbrdCombinationsetsPrm
//which is a hashset.
//It also takes k+1 as input called the parameter nextSize which is an integer
//Outputs k+1 sized groups of k sized element combinations as a variable called hsSetOfNxtSizeNbrdCombinationsets
//which is hashset
static HashSet SetOfNxtSizeNbrdCombinationsets(HashSet hsSetOfkNbrdCombinationsetsPrm, Integer nextSize){
HashSet hsSetOfNxtSizeNbrdCombinationsets = new HashSet<>();//is it better to have nested <HashSet> tokens in this declaration?
HashSet hsRecursor = new HashSet<>(hsSetOfkNbrdCombinationsetsPrm);
Iterator <HashSet> loopIterator1 = hsSetOfkNbrdCombinationsetsPrm.iterator();
while (loopIterator1.hasNext()) {
HashSet hsName = loopIterator1.next();
if(nextSize == 1){
hsSetOfNxtSizeNbrdCombinationsets.add(hsName);
}
else {
HashSet hsConc1 = new HashSet<>();
hsRecursor.remove(hsName);
hsConc1 = SetOfNxtSizeNbrdCombinationsets(hsRecursor,nextSize-1);
Iterator <HashSet> loopIterator2 = hsConc1.iterator();
while (loopIterator2.hasNext()) {
HashSet hsConc2 = new HashSet<>();
HashSet hsConc3 = new HashSet<>();
hsConc2 = loopIterator2.next();
Iterator <HashSet> loopIterator3 = hsConc2.iterator();
Object obj = loopIterator3.next();
if (String.class.isInstance(obj)) {
hsConc3.add(hsName);
hsConc3.add(hsConc2);
}
else {
loopIterator3 = hsConc2.iterator();
hsConc3.add(hsName);
while (loopIterator3.hasNext()) {
hsConc3.add(loopIterator3.next());
}
}
hsSetOfNxtSizeNbrdCombinationsets.add(hsConc3);
}
}
}
return hsSetOfNxtSizeNbrdCombinationsets;
}
//The method below takes the k+1 sized groupings of k sized element combinations
//generated by the method above and generates all possible K+1 sized combinations of
//elements contained in them
//Name of the method is SetOfkNbrdCombinationsets
//It takes the k+1 sized groupings in a parameter called hsSetOfNxtSizeNbrdCombinationsetsPrm which is a HashSet
//It takes the value k+1 as a parameter called nextSize which is an Integer
//It returns k+1 sized combinations as a variable called hsSetOfkNbrdCombinationsets which is a HashSet
//This is the intended output of the whole program
static HashSet SetOfkNbrdCombinationsets(HashSet hsSetOfNxtSizeNbrdCombinationsetsPrm, Integer nextSize){
HashSet hsSetOfkNbrdCombinationsets = new HashSet<>();
HashSet hsMember = new HashSet<>();
Iterator <HashSet> loopIteratorOverParam = hsSetOfNxtSizeNbrdCombinationsetsPrm.iterator();
while (loopIteratorOverParam.hasNext()) {
hsMember = loopIteratorOverParam.next();
HashSet hsInnerUnion = new HashSet<>();
Iterator <HashSet> loopIteratorOverMember = hsMember.iterator();
while (loopIteratorOverMember.hasNext()) {
HashSet hsInnerMemb = new HashSet<>(loopIteratorOverMember.next());
hsInnerUnion.addAll(hsInnerMemb);
}
if (hsInnerUnion.size()==nextSize) {
HashSet hsTemp = new HashSet<>(hsInnerUnion);
hsSetOfkNbrdCombinationsets.add(hsTemp);
}
hsInnerUnion.clear();
}
return hsSetOfkNbrdCombinationsets;
}
public static void main(String args[]) {
HashSet hsSetOfkNbrdCombinationsets = new HashSet<>();//should this have nested <HashSet> tokens?
HashSet hsSetOfNxtSizeNbrdCombinationsets = new HashSet<>();//should this have nested <HashSet> tokens?
Integer innerSize=0,nextSize = 0;
System.out.println("Ahoy");
//pretend we are looping through lines in a file here
for(String line : csvStrings)
{
String[] linePieces = line.split(",");
List<String> csvPieces = new ArrayList<String>(linePieces.length);
for(String piece : linePieces)
{
//System.out.println(piece); will print each piece in separate lines
csvPieces.add(piece);
}
innerSize = csvPieces.size();
Set<String> hsInner = new HashSet<String>(csvPieces);
hsSetOfkNbrdCombinationsets.add(hsInner);
}
nextSize = innerSize+1; //programmatically obtain nextSize
hsSetOfNxtSizeNbrdCombinationsets = SetOfNxtSizeNbrdCombinationsets(hsSetOfkNbrdCombinationsets,nextSize);
hsSetOfkNbrdCombinationsets = SetOfkNbrdCombinationsets(hsSetOfNxtSizeNbrdCombinationsets, nextSize);
System.out.println("The " + nextSize + " sized combinations from elements present in the input combinations are: " + hsSetOfkNbrdCombinationsets);
} //end of main
} //end of class
I'm developing a Java Application that reads a lot of strings data likes this:
1 cat (first read)
2 dog
3 fish
4 dog
5 fish
6 dog
7 dog
8 cat
9 horse
...(last read)
I need a way to keep all couple [string, occurrences] in order from last read to first read.
string occurrences
horse 1 (first print)
cat 2
dog 4
fish 2 (last print)
Actually i use two list:
1) List<string> input; where i add all data
In my example:
input.add("cat");
input.add("dog");
input.add("fish");
...
2)List<string> possibilities; where I insert the strings once in this way:
if(possibilities.contains("cat")){
possibilities.remove("cat");
}
possibilities.add("cat");
In this way I've got a sorted list where all possibilities.
I use it like that:
int occurrence;
for(String possible:possibilities){
occurrence = Collections.frequency(input, possible);
System.out.println(possible + " " + occurrence);
}
That trick works good but it's too slow(i've got millions of input)... any help?
(English isn’t my first language, so please excuse any mistakes.)
Use a Map<String, Integer>, as #radoslaw pointed, to keep the insertion sorting use LinkedHashMap and not a TreeMap as described here:
LinkedHashMap keeps the keys in the order they were inserted, while a TreeMap is kept sorted via a Comparator or the natural Comparable ordering of the elements.
Imagine you have all the strings in some array, call it listOfAllStrings, iterate over this array and use the string as key in your map, if it does not exists, put in the map, if it exists, sum 1 to actual result...
Map<String, Integer> results = new LinkedHashMap<String, Integer>();
for (String s : listOfAllStrings) {
if (results.get(s) != null) {
results.put(s, results.get(s) + 1);
} else {
results.put(s, 1);
}
}
Make use of a TreeMap, which will keep ordering on the keys as specified by the compare of your MyStringComparator class handling MyString class which wraps String adding insertion indexes, like this:
// this better be immutable
class MyString {
private MyString() {}
public static MyString valueOf(String s, Long l) { ... }
private String string;
private Long index;
public hashcode(){ return string.hashcode(); }
public boolean equals() { // return rely on string.equals() }
}
class MyStringComparator implements Comparator<MyString> {
public int compare(MyString s1, MyString s2) {
return -s1.getIndex().compareTo(s2.gtIndex());
}
}
Pass the comparator while constructing the map:
Map<MyString,Integer> map = new TreeMap<>(new MyStringComparator());
Then, while parsing your input, do
Long counter = 0;
while (...) {
MyString item = MyString.valueOf(readString, counter++);
if (map.contains(item)) {
map.put(map.get(item)+1);
} else {
map.put(item,1);
}
}
There will be a lot of instantiation because of the immutable class, and the comparator will not be consistent with equals, but it should work.
Disclaimer: this is untested code just to show what I'd do, I'll come back and recheck it when I get my hands on a compiler.
Here is the complete solution for your problem,
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class DataDto implements Comparable<DataDto>{
public int count = 0;
public String string;
public long lastSeenTime;
public DataDto(String string) {
this.string = string;
this.lastSeenTime = System.currentTimeMillis();
}
public boolean equals(Object object) {
if(object != null && object instanceof DataDto) {
DataDto temp = (DataDto) object;
if(temp.string != null && temp.string.equals(this.string)) {
return true;
}
}
return false;
}
public int hashcode() {
return string.hashCode();
}
public int compareTo(DataDto o) {
if(o != null) {
return o.lastSeenTime < this.lastSeenTime ? -1 : 1;
}
return 0;
}
public String toString() {
return this.string + " : " + this.count;
}
public static final void main(String[] args) {
String[] listOfAllStrings = {"horse", "cat", "dog", "fish", "cat", "fish", "dog", "cat", "horse", "fish"};
Map<String, DataDto> results = new HashMap<String, DataDto>();
for (String s : listOfAllStrings) {
DataDto dataDto = results.get(s);
if(dataDto != null) {
dataDto.count = dataDto.count + 1;
dataDto.lastSeenTime = System.nanoTime();
} else {
dataDto = new DataDto(s);
results.put(s, dataDto);
}
}
List<DataDto> finalResults = new ArrayList<DataDto>(results.values());
System.out.println(finalResults);
Collections.sort(finalResults);
System.out.println(finalResults);
}
}
Ans
[horse : 1, cat : 2, fish : 2, dog : 1]
[fish : 2, horse : 1, cat : 2, dog : 1]
I think this solution will be suitable for your requirement.
If you know that your data is not going to exceed your memory capacity when you read it all into memory, then the solution is simple - using a LinkedList or a and a LinkedHashMap.
For example, if you use a Linked list:
LinkedList<String> input = new LinkedList();
You then proceed to use input.add() as you did originally. But when the input list is full, you basically use Jordi Castilla's solution - but put the entries in the linked list in reverse order. To do that, you do:
Iterator<String> iter = list.descendingIterator();
LinkedHashMap<String,Integer> map = new LinkedHashMap<>();
while (iter.hasNext()) {
String s = iter.next();
if ( map.containsKey(s)) {
map.put( s, map.get(s) + 1);
} else {
map.put(s, 1);
}
}
Now, the only real difference between his solution and mine is that I'm using list.descendingIterator() which is a method in LinkedList that gives you the entries in backwards order, from "horse" to "cat".
The LinkedHashMap will keep the proper order - whatever was entered first will be printed first, and because we entered things in reverse order, then whatever was read last will be printed first. So if you print your map the result will be:
{horse=1, cat=2, dog=4, fish=2}
If you have a very long file, and you can't load the entire list of strings into memory, you had better keep just the map of frequencies. In this case, in order to keep the order of entry, we'll use an object such as this:
private static class Entry implements Comparable<Entry> {
private static long nextOrder = Long.MIN_VALUE;
private String str;
private int frequency = 1;
private long order = nextOrder++;
public Entry(String str) {
this.str = str;
}
public String getString() {
return str;
}
public int getFrequency() {
return frequency;
}
public void updateEntry() {
frequency++;
order = nextOrder++;
}
#Override
public int compareTo(Entry e) {
if ( order > e.order )
return -1;
if ( order < e.order )
return 1;
return 0;
}
#Override
public String toString() {
return String.format( "%s: %d", str, frequency );
}
}
The trick here is that every time you update the entry (add one to the frequency), it also updates the order. But the compareTo() method orders Entry objects from high order (updated/inserted later) to low order (updated/inserted earlier).
Now you can use a simple HashMap<String,Entry> to store the information as you read it (I'm assuming you are reading from some sort of scanner):
Map<String,Entry> m = new HashMap<>();
while ( scanner.hasNextLine() ) {
String str = scanner.nextLine();
Entry entry = m.get(str);
if ( entry == null ) {
entry = new Entry(str);
m.put(str, entry);
} else {
entry.updateEntry();
}
}
Scanner.close();
Now you can sort the values of the entries:
List<Entry> orderedList = new ArrayList<Entry>(m.values());
m = null;
Collections.sort(orderedList);
Running System.out.println(orderedList) will give you:
[horse: 1, cat: 2, dog: 4, fish: 2]
In principle, you could use a TreeMap whose keys contained the "order" stuff, rather than a plain HashMap like this followed by sorting, but I prefer not having either mutable keys in a map, nor changing the keys constantly. Here we are only changing the values as we fill the map, and each key is inserted into the map only once.
What you could do:
Reverse the order of the list using
Collections.reverse(input). This runs in linear time - O(n);
Create a Set from the input list. A Set garantees uniqueness.
To preserve insertion order, you'll need a LinkedHashSet;
Iterate over this set, just as you did above.
Code:
/* I don't know what logic you use to create the input list,
* so I'm using your input example. */
List<String> input = Arrays.asList("cat", "dog", "fish", "dog",
"fish", "dog", "dog", "cat", "horse");
/* by the way, this changes the input list!
* Copy it in case you need to preserve the original input. */
Collections.reverse(input);
Set<String> possibilities = new LinkedHashSet<String>(strings);
for (String s : possibilities) {
System.out.println(s + " " + Collections.frequency(strings, s));
}
Output:
horse 1
cat 2
dog 4
fish 2
My hashmap contains one of entry as **key: its-site-of-origin-from-another-site##NOUN** and **value: its##ADJ site-of-origin-from-another-site##NOUN**
i want to get the value of this key on the basis of only key part of `"its-site-of-origin-from-another-site"``
If hashmap contains key like 'its-site-of-origin-from-another-site' then it should be first pick 'its' and then 'site-of-origin-from-another-sit' only not the part after '##'
No. It would be a String so it will pick up whatever after "##" as well. If you need value based on substring then you would have to iterate over the map like:
String value = map.get("its...");
if (value != null) {
//exact match for value
//use it
} else {//or use map or map which will reduce your search time but increase complexity
for (Map.Entry<String, String> entry : map.entrySet()) {
if (entry.getKey().startsWith("its...")) {
//that's the value i needed.
}
}
}
You can consider using a Patricia trie. It's a data structure like a TreeMap where the key is a String and any type of value. It's kind of optimal for storage because common string prefix between keys are shared, but the property which is interesting for your use case is that you can search for specific prefix and get a sorted view of the map entries.
Following is an example with Apache Common implementation.
import org.apache.commons.collections4.trie.PatriciaTrie;
public class TrieStuff {
public static void main(String[] args) {
// Build a Trie with String values (keys are always strings...)
PatriciaTrie<String> pat = new PatriciaTrie<>();
// put some key/value stuff with common prefixes
Random rnd = new Random();
String[] prefix = {"foo", "bar", "foobar", "fiz", "buz", "fizbuz"};
for (int i = 0; i < 100; i++) {
int r = rnd.nextInt(6);
String key = String.format("%s-%03d##whatever", prefix[r], i);
String value = String.format("%s##ADJ %03d##whatever", prefix[r], i);
pat.put(key, value);
}
// Search for all entries whose keys start with "fiz"
SortedMap<String, String> fiz = pat.prefixMap("fiz");
fiz.entrySet().stream().forEach(e -> System.out.println(e));
}
}
Prints all keys that start with "fiz" and sorted.
fiz-000##whatever
fiz-002##whatever
fiz-012##whatever
fiz-024##whatever
fiz-027##whatever
fiz-033##whatever
fiz-036##whatever
fiz-037##whatever
fiz-041##whatever
fiz-045##whatever
fiz-046##whatever
fiz-047##whatever
fizbuz-008##whatever
fizbuz-011##whatever
fizbuz-016##whatever
fizbuz-021##whatever
fizbuz-034##whatever
fizbuz-038##whatever
I have an array of objects (telephone directory entries, stored in the form Entry( surname,initials,extension) ) which I would like to search efficiently. In order to do this I'm trying to use Arrays.binarySearch(). I have two separate methods for searching the array, one using names and the other using numbers. The array is sorted by Surname in alphabetical order as I insert each element in the correct place in my addEntry() method. I can use binarySearch() when searching by name as the array is sorted in alphabetical order, but the problem I have is the array is not sorted when I search by number. I have overridden compareTo() in my Entry class for comparing surnames, but when I search by number I need to sort my array in ascending order of numbers, I am unsure how to do this?
public int lookupNumberByName(String surname, String initials) {
int index = 0;
if (countElements() == directory.length) {
Entry lookup = new Entry(surname, initials);
index = Arrays.binarySearch(directory, lookup);
}
else if (countElements() != directory.length) {
Entry[] origArray = directory;
Entry[] cutArray = Arrays
.copyOfRange(directory, 0, countElements());
directory = cutArray;
Entry lookup = new Entry(surname, initials);
index = Arrays.binarySearch(directory, lookup);
directory = origArray;
}
return index;
}
I would like to do something like this for my LookupByNumber() method -
public int LookupByNumber(int extension) {
Entry[] origArray1 = directory;
Entry[] cutArray1 = Arrays.copyOfRange(directory, 0, countElements());
directory = cutArray1;
Arrays.sort(directory); //sort in ascending order of numbers
Entry lookup1 = new Entry(extension);
int index1 = Arrays.binarySearch(directory, lookup1);
String surname1 = directory[index1].getSurname();
String initals1 = directory[index1].getInitials();
directory = origArray1;
int arrayPos = lookupNumberByName(surname1,initials1);
return arrayPos;
My compareTo method -
public int compareTo(Entry other) {
return this.surname.compareTo(other.getSurname());
}
Help very much appreciated
edit - I realize arrays are not the best data structure for this, but I have been specifically asked to use an array for this task.
Update - How exactly does sort(T[] a, Comparator<? super T> c) work? when I try writing my own Comparator -
public class numberSorter implements Comparator<Entry> {
#Override
public int compare(Entry o1, Entry o2) {
if (o1.getExtension() > o2.getExtension()) {
return 1;
}
if (o1.getExtension() == o2.getExtension()) {
return 0;
}
if (o1.getExtension() < o2.getExtension()) {
return -1;
}
return -1;
}
}
And calling Arrays.sort(directory,new numberSorter()); I get the following exception -
java.lang.NullPointerException
at java.lang.String.compareTo(Unknown Source)
at project.Entry.compareTo(Entry.java:45)
at project.Entry.compareTo(Entry.java:1)
at java.util.Arrays.binarySearch0(Unknown Source)
at java.util.Arrays.binarySearch(Unknown Source)
at project.ArrayDirectory.LookupByNumber(ArrayDirectory.java:128)
at project.test.main(test.java:29)
What exactly am I doing wrong?
Rather than keeping the Entry objects in Arrays, keep them in Maps. For example, you'd have one Map that mapped the Surname to the Entry, and another that mapped the Extension to the Entry. You can then efficiently look up the entry by Surname or Extension by calling the get() method on the appropriate Map.
If the Map is a TreeMap, the lookup is about the same speed as a binary search (O log(n)). If you use a HashMap, it can be even faster once you get a large number of entries.
The problem I have is an example of something I've seen often. I have a series of strings (one string per line, lets say) as input, and all I need to do is return how many times each string has appeared. What is the most elegant way to solve this, without using a trie or other string-specific structure? The solution I've used in the past has been to use a hashtable-esque collection of custom-made (String, integer) objects that implements Comparable to keep track of how many times each string has appeared, but this method seems clunky for several reasons:
1) This method requires the creation of a comparable function which is identical to the String's.compareTo().
2) The impression that I get is that I'm misusing TreeSet, which has been my collection of choice. Updating the counter for a given string requires checking to see if the object is in the set, removing the object, updating the object, and then reinserting it. This seems wrong.
Is there a more clever way to solve this problem? Perhaps there is a better Collections interface I could use to solve this problem?
Thanks.
One posibility can be:
public class Counter {
public int count = 1;
}
public void count(String[] values) {
Map<String, Counter> stringMap = new HashMap<String, Counter>();
for (String value : values) {
Counter count = stringMap.get(value);
if (count != null) {
count.count++;
} else {
stringMap.put(value, new Counter());
}
}
}
In this way you still need to keep a map but at least you don't need to regenerate the entry every time you match a new string, you can access the Counter class, which is a wrapper of integer and increase the value by one, optimizing the access to the array
TreeMap is much better for this problem, or better yet, Guava's Multiset.
To use a TreeMap, you'd use something like
Map<String, Integer> map = new TreeMap<>();
for (String word : words) {
Integer count = map.get(word);
if (count == null) {
map.put(word, 1);
} else {
map.put(word, count + 1);
}
}
// print out each word and each count:
for (Map.Entry<String, Integer> entry : map.entrySet()) {
System.out.printf("Word: %s Count: %d%n", entry.getKey(), entry.getValue());
}
Integer theCount = map.get("the");
if (theCount == null) {
theCount = 0;
}
System.out.println(theCount); // number of times "the" appeared, or null
Multiset would be much simpler than that; you'd just write
Multiset<String> multiset = TreeMultiset.create();
for (String word : words) {
multiset.add(word);
}
for (Multiset.Entry<String> entry : multiset.entrySet()) {
System.out.printf("Word: %s Count: %d%n", entry.getElement(), entry.getCount());
}
System.out.println(multiset.count("the")); // number of times "the" appeared
You can use a hash-map (no need to "create a comparable function"):
Map<String,Integer> count(String[] strings)
{
Map<String,Integer> map = new HashMap<String,Integer>();
for (String key : strings)
{
Integer value = map.get(key);
if (value == null)
map.put(key,1);
else
map.put(key,value+1);
}
return map;
}
Here is how you can use this method in order to print (for example) the string-count of your input:
Map<String,Integer> map = count(input);
for (String key : map.keySet())
System.out.println(key+" "+map.get(key));
You can use a Bag data structure from the Apache Commons Collection, like the HashBag.
A Bag does exactly what you need: It keeps track of how often an element got added to the collections.
HashBag<String> bag = new HashBag<>();
bag.add("foo");
bag.add("foo");
bag.getCount("foo"); // 2