Searching through Collections in Java

I have a Java properties file containing key/value pairs of country names and codes. I will load the contents of this file into a collection such as a List or HashMap.
Then I want users to be able to search for a country. For example, if they type 'Aus' in a textbox and click submit, I want to search the collection of country code/name pairs (e.g. AUS=>Australia) and return the countries that match.
Is there a more efficient way of doing this than looping through the elements of the collection and using charAt()?

If performance is important, you can hold the country names in a TreeSet or TreeMap and use the following to identify the countries that start with a given string.
NavigableMap<String, String> countries = new TreeMap<String, String>();
countries.put("australia", "Australia");
...
String userText = ...
String tmp = userText.toLowerCase();
List<String> hits = new ArrayList<String>();
Map.Entry<String, String> entry = countries.ceilingEntry(tmp);
while (entry != null && entry.getKey().startsWith(tmp)) {
    hits.add(entry.getValue());
    entry = countries.higherEntry(entry.getKey());
}
// hits now contains all country names starting with the value of `userText`,
// ignoring differences in letter case.
Finding the first match is O(log N), where N is the number of countries, and each further match costs one more O(log N) lookup. By contrast, a linear search of a collection is O(N).

Looping with String.contains() is the way unless you want to move in some heavy artillery like Lucene.

Short of indexing the collection with something like Lucene, you have to check manually by looping through all of the elements. You could use startsWith() rather than looping over the characters of each string:
String userText = ...
for (Map.Entry<String, String> entry : map.entrySet()) {
    boolean entryMatches = entry.getKey().startsWith(userText);
    ...
Or alternatively use regular expressions:
// use Pattern.quote(userText) if the input should be matched literally
Pattern pattern = Pattern.compile(userText);
for (Map.Entry<String, String> entry : map.entrySet()) {
    boolean entryMatches = pattern.matcher(entry.getKey()).find();
    ...

Since the list is small enough to load into memory, sort it and then do a binary search using the static method java.util.Collections.binarySearch(). This returns an index whether or not the exact string is in the list: if it is not, the result is the negative number -(insertion point) - 1, so check for that and convert it back to the insertion point. Then, starting from that index, just iterate forward to find all the strings with that prefix. As a nice side effect, the output will be in alphabetical order.
To make the whole thing case insensitive, convert to lowercase when loading the list, and convert the prefix to lowercase before searching.
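The binary-search approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name PrefixBinarySearch and the method findByPrefix are made up for the example, and it assumes the list is already sorted, lower-cased, and free of duplicates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative only.
class PrefixBinarySearch {

    // Returns every entry of sortedLowercase that starts with prefix.
    // Assumes the list is sorted, lower-cased and contains no duplicates.
    static List<String> findByPrefix(List<String> sortedLowercase, String prefix) {
        String key = prefix.toLowerCase(Locale.ROOT);
        int index = Collections.binarySearch(sortedLowercase, key);
        if (index < 0) {
            // Not found: binarySearch returned -(insertion point) - 1,
            // so recover the insertion point.
            index = -index - 1;
        }
        List<String> hits = new ArrayList<>();
        // Walk forward while entries still share the prefix.
        while (index < sortedLowercase.size()
                && sortedLowercase.get(index).startsWith(key)) {
            hits.add(sortedLowercase.get(index));
            index++;
        }
        return hits;
    }
}
```

Because every string that starts with the prefix sorts at or after the prefix itself, the matches form one contiguous run beginning at the returned index.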

Related

How do I store this data in Java?

I want a dictionary of values. The keys are all strings. Each key corresponds with some sort of list of strings. How do I make a list of strings for each key and update that accordingly? I'll explain:
I have a loop that is reading lines of a word list. The words are then converted into a string code and set as keys in the dictionary. Here is an example of the string code/word relationship.
123, [the]
456, [dog]
328, [bug]
...
However, my program keeps looping through the word list and will eventually run into a different word with the same code as "the", let's say "cat". So I want the list to look like:
123, [the, cat]
456, [dog]
...
How do I get it to make an ArrayList for every key that I can then add to on the fly when needed? My end goal is to be able to print out the list of words for a given code (via .get()).
You can make a HashMap. In your case,
HashMap<Integer, ArrayList<String>> works fine.
As has already been said, a multimap seems to be what you need. Guava has already been suggested, and it's a good option. There is also an implementation in commons-collections that you can use.
From commons-collections documentation:
MultiValuedMap<K, String> map = new MultiValuedHashMap<K, String>();
map.put(key, "A");
map.put(key, "B");
map.put(key, "C");
Collection<String> coll = map.get(key); // returns ["A", "B", "C"]
You can always implement your own MultiMap if you don't want to use an external library. Use a HashMap<String,List<String>> to store your values and wrap it with your own put, get and whatever other methods you see fit.
It sounds like you want a Multimap from the Guava library.
You can also go the route of using a Map<Integer, List<String>>, but then you will need to manually handle the case where the list is null (probably just allocate a new list in that case).
You can use a HashMap that links each id to a list of strings:
Map<String, List<String>> dictionary = new HashMap<String,List<String>>();
Now let's say you read two Strings: id and word. To add them to your dictionary, first check whether the id has already been seen (using the containsKey() method). If it has, append the word to the list for that id; if not, create a new list containing the word:
if (dictionary.containsKey(id)) {
    // The list already exists, so add the new word to it.
    // The map holds a reference to this list, so there is no need to remove and re-insert it.
    dictionary.get(id).add(word);
} else {
    // Otherwise we create a new list for that id
    List<String> newList = new ArrayList<String>();
    newList.add(word);
    dictionary.put(id, newList);
}
Then whenever you want to retrieve the list of strings for a certain id, you can simply use dictionary.get(id);
You can find more information on HashMaps in the Java documentation.
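On Java 8 or later, the "create the list if it is missing, then add" pattern can be written with Map.computeIfAbsent. The sketch below is only an illustration: the class WordDictionary and its method names are invented for the example and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper around a Map<String, List<String>>.
class WordDictionary {
    private final Map<String, List<String>> wordsByCode = new HashMap<>();

    // Adds word under code, creating the list the first time the code is seen.
    void add(String code, String word) {
        wordsByCode.computeIfAbsent(code, k -> new ArrayList<>()).add(word);
    }

    // Returns the words stored for code, or an empty list if there are none.
    List<String> get(String code) {
        return wordsByCode.getOrDefault(code, Collections.<String>emptyList());
    }
}
```

Calling add("123", "the") and then add("123", "cat") leaves both words in the list returned by get("123"), in insertion order.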
I assumed you didn't want repeats in your list so I used Set instead.
Map<String,Set<String>> mapToSet = new HashMap<>();
List<String []>keyvals = Arrays.asList(new String[][]{{"123","the"},{"123","cat"}});
for(String kv[] : keyvals) {
Set<String> s = mapToSet.get(kv[0]);
if(null == s) {
s = new HashSet<String>();
}
s.add(kv[1]);
mapToSet.put(kv[0], s);
}

contains(cs) is case sensitive in Android

I have a search method written in onTextChanged. It searches an ArrayList of HashMaps. The problem is that I only get search results when the case of the input matches exactly.
How can I solve this by making the search case insensitive?
Here is the code I have:
public void onTextChanged(CharSequence s, int start, int before, int count)
{
song = 2;
songsList2.clear();
for (HashMap<String, String> map : songsList)
{
if(map.get(KEY_TITLE).contains(s))
{
HashMap<String, String> map2 = new HashMap<String, String>();
// adding each child node to HashMap key => value
map2.put(KEY_ID, map.get(KEY_ID));
map2.put(KEY_TITLE, map.get(KEY_TITLE));
map2.put(KEY_ARTIST, map.get(KEY_ARTIST));
map2.put(KEY_DURATION, map.get(KEY_DURATION));
map2.put(KEY_THUMB_URL, map.get(KEY_THUMB_URL));
songsList2.add(map2);
adapter=new LazyAdapter(CustomizedListView.this, songsList2);
list.setAdapter(adapter);
}
}
adapter.notifyDataSetChanged();
}
Thanks!
Yes, the contains method is case sensitive. To make the check case insensitive, you can either compile a regex pattern and use that instead, or simply convert the value you are searching to lowercase and then check it, like this:
if(map.get(KEY_TITLE).toLowerCase().contains(s)...
assuming that the char sequence is already in lowercase.
Edit:
As pointed out in the comments, to get a match without assuming anything about the case of the char sequence, you have to force the search phrase to the same case by calling toLowerCase on it as well:
if(map.get(KEY_TITLE).toLowerCase().contains(s.toString().toLowerCase())...
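The check above can be pulled into a small helper so that the loop body stays readable. This is a sketch under assumptions: the class CaseInsensitiveMatch and the method containsIgnoreCase are hypothetical names, and Locale.ROOT is used so that the lower-casing does not depend on the device's default locale.

```java
import java.util.Locale;

// Hypothetical helper; not part of the Android or Java APIs.
class CaseInsensitiveMatch {

    // Returns true if text contains query, ignoring letter case.
    // A null text or query is treated as "no match".
    static boolean containsIgnoreCase(String text, CharSequence query) {
        if (text == null || query == null) {
            return false;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = query.toString().toLowerCase(Locale.ROOT);
        return haystack.contains(needle);
    }
}
```

In the loop from the question, the condition would then read containsIgnoreCase(map.get(KEY_TITLE), s).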
First, avoid using concrete classes when you can use interfaces. As far as I understand, songsList is a list or array of HashMaps. Change the definition to List<Map<String, String>>. Now use TreeMap instead of HashMap, but pass the String.CASE_INSENSITIVE_ORDER comparator when creating the instance:
Map<String, String> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
Now you do not need to change your code. You can put "Yellow Submarine" in as a key and get it back with "yellow submarine", "YELLOW SUBMARINE" or any other casing. The key actually stored in the map remains "Yellow Submarine" (the value that you put), so if you iterate over the entries you get the "correct" value.

Counting occurrences of words in an array

I've been working on something which takes a stream of characters, forms words, makes an array of the words, then creates a vector which contains each unique word and the number of times it occurs (basically a word counter).
Anyway, I've not used Java in a long time, or done much programming to be honest, and I'm not happy with how this currently looks. The part that builds the vector looks ugly to me, and I wanted to know if I could make it less messy.
int counter = 1;
Vector<Pair<String, Integer>> finalList = new Vector<Pair<String, Integer>>();
Pair<String, Integer> wordAndCount = new Pair<String, Integer>(wordList.get(1), counter); // wordList contains " " as first word, starting at wordList.get(1) skips it.
for(int i= 1; i<wordList.size();i++){
if(wordAndCount.getLeft().equals(wordList.get(i))){
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter++);
}
else if(!wordAndCount.getLeft().equals(wordList.get(i))){
finalList.add(wordAndCount);
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter=1);
}
}
finalList.add(wordAndCount); //UGLY!!
As a secondary question, this gives me a vector with all the words in alphabetical order (as in the array). I want to have it sorted by occurrence, then alphabetically within that.
Would the best option be:
Iterate down the vector, testing each occurrence int against the one above, using Collections.swap() if it is higher, then checking the next one above (as it has now moved up by 1), and so on until it is no longer larger than anything above it. Any occurrence of 1 could be skipped.
Iterate down the vector again, testing each element against the first element of the vector and then iterating downwards until the number of occurrences is lower, and inserting it above that element. All occurrences of 1 would once again be skipped.
The first method would do more iterating over the elements, but the second requires adding and removing elements of the vector (I think?), so I don't know which is more efficient, or whether it's worth considering.
Why not use a Map to solve your problem?
String[] words = ...; // your incoming array of words
Map<String, Integer> wordMap = new HashMap<String, Integer>();
for(String word : words) {
if(!wordMap.containsKey(word))
wordMap.put(word, 1);
else
wordMap.put(word, wordMap.get(word) + 1);
}
Sorting can be done using Java's sorted collections:
SortedMap<Integer, SortedSet<String>> sortedMap = new TreeMap<Integer, SortedSet<String>>();
for(Entry<String, Integer> entry : wordMap.entrySet()) {
if(!sortedMap.containsKey(entry.getValue()))
sortedMap.put(entry.getValue(), new TreeSet<String>());
sortedMap.get(entry.getValue()).add(entry.getKey());
}
Nowadays you should leave the sorting to the language's libraries. They have been proven correct over the years.
Note that the code may use a lot of memory because of all the data structures involved, but that is what we pay for higher level programming (and memory is getting cheaper every second).
I didn't run the code to see that it works, but it does compile (copied it directly from eclipse)
Re: sorting, one option is to write a custom Comparator that first compares the number of times each word appears and then, if those are equal, compares the words alphabetically.
private final class PairComparator implements Comparator<Pair<String, Integer>> {
    @Override
    public int compare(Pair<String, Integer> p1, Pair<String, Integer> p2) {
        /* compare by Integer */
        /* compare by String, if necessary */
        /* return a negative number, a positive number, or 0 as appropriate */
    }
}
You'd then sort finalList by calling Collections.sort(finalList, new PairComparator());
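To make the comparator idea concrete, here is one possible sketch that counts the words with a Map and then sorts by count (highest first) and alphabetically within equal counts. It uses Map.Entry in place of the question's Pair class, which is not a standard JDK type; the class WordFrequency and the method countAndSort are hypothetical names, and descending order by count is an assumption about what "sorted by occurrence" means.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
class WordFrequency {

    // Counts each word, then orders the entries by count (descending)
    // and by word (ascending) when counts are equal.
    static List<Map.Entry<String, Integer>> countAndSort(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the old count
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }
}
```

The lambda plays the role of PairComparator: compare by Integer first, then by String only if the counts tie.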
How about using the Google Guava library?
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
int countFoo = multiset.count("foo");
From their javadocs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Simple enough?

How to search an array for a part of string?

I have an ArrayList<String> of words. I sort it using Collections.sort(wordsList);
I'm using this list for an auto-suggest drop-down box, so that as the user types, they are given a list of suggestions similar to what they have typed so far.
How do I search this list for a prefix? Say the user types in "mount" and the list contains the word "mountain": how can I search the list and return the matching values?
Here's my code so far:
public List<Interface> returnSuggestedList(String prefix) {
String tempPrefix = prefix;
suggestedPhrases.clear();
//suggestedPhrases = new ArrayList<Interface>();
//Vector<String> list = new Vector<String>();
//List<Interface> interfaceList = new ArrayList<Interface>();
Collections.sort(wordsList);
System.out.println("Sorted Vector contains : " + wordsList);
int i = 0;
while (i != wordsList.size()) {
int index = Collections.binarySearch(wordsList, prefix);
String tempArrayString = wordsList.get(index).toString();
if (tempArrayString.toLowerCase().startsWith(prefix.toLowerCase())) {
ItemInterface itemInt = new Item(tempArrayString);
suggestedPhrases.add(itemInt);
System.out.println(suggestedPhrases.get(i).toString());
System.out.println("Element found at : " + index);
}
i++;
}
return suggestedPhrases;
}
The most basic approach would be
List<String> result = new ArrayList<String>();
for (String str : words) {
    if (str.contains(keyword)) {
        result.add(str);
    }
}
You can improve on this version: if you only care about startsWith rather than contains, you can distribute the words in a HashMap (for example, keyed by first letter) so that each search covers a narrower set.
For this task, there are better data structures than a sorted array of strings. You might look e.g. at DAWG (Directed acyclic word graph).
If wordList is fixed (it does not change from one method call to the next), you should sort it somewhere else, because sorting is costly, and store it in lowercase.
In the rest of the method you would do something like:
String lowerPrefix = prefix.toLowerCase();
List<String> selected = new ArrayList<String>();
for (String w : wordList) {
    if (w.startsWith(lowerPrefix)) // or .contains(), depending on
        selected.add(w);           // what you want exactly
}
return selected;
Also see the trie data structure. This question has useful info. I should think its getPrefixedBy() will be more efficient than anything you can roll by hand quickly.
Of course, this will work for prefix searches only. Contains search is a different beast altogether.
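For readers unfamiliar with tries, a bare-bones version might look like the sketch below. It is an illustration written for this discussion, not the implementation behind the getPrefixedBy() mentioned above: the class Trie and the methods insert and wordsWithPrefix are hypothetical, and children are kept in a TreeMap so that results come back in alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal illustrative trie; names are hypothetical.
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    // Adds word to the trie, one node per character.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns all stored words that start with prefix, in alphabetical order.
    List<String> wordsWithPrefix(String prefix) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return results; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), results);
        return results;
    }

    // Depth-first walk that rebuilds each word below node.
    private void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> child : node.children.entrySet()) {
            current.append(child.getKey());
            collect(child.getValue(), current, out);
            current.setLength(current.length() - 1); // backtrack
        }
    }
}
```

Reaching the prefix node costs time proportional to the prefix length, regardless of how many words are stored; the rest of the work is proportional to the size of the subtree holding the matches.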
As @Jiri says, you can use a DAWG, but if you don't want to go that far there are some simple and useful things you can do.
Make use of the sorting
If you want to sort the array of words, do it beforehand. Don't sort it on every call.
Since it's sorted, you can find the first and the last word in the list that match. Then use list.subList(from, to) to return the sublist. That is a little more efficient than adding each match one by one.
Use a pre-sorted structure
Use a TreeSet<String> for storing the strings (they will be sorted internally).
Then use treeSet.subSet(from, true, to, false);
where from is the prefix and to is the "prefix plus one char". For example, if you're looking for abc, to must be abd. If you don't want to do that char transformation, you can instead ask for treeSet.tailSet(from) and iterate over it until the elements no longer start with the prefix.
This is especially useful if you read more than you write. Ordering the strings may be a little expensive, but once they are ordered you can find them very fast (O(log n)).
Case insensitive comparing
You can provide a Comparator<String> to the tree set to specify how it must order the strings. You can implement it yourself or use the built-in String.CASE_INSENSITIVE_ORDER.
If you write your own, its code should be:
int compare(String a, String b) {
return a.toLowerCase().compareTo(b.toLowerCase());
}
Here is a similar example:
-> http://samuelsjoberg.com/archive/2009/10/autocompletion-in-swing
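The subSet idea with a "prefix plus one char" upper bound can be sketched like this. It is a minimal example under assumptions: the class PrefixSet and the method withPrefix are hypothetical names, the set uses natural (case-sensitive) ordering, and the prefix's last character is assumed not to be Character.MAX_VALUE, since incrementing it would overflow.

```java
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; names are illustrative only.
class PrefixSet {

    // Returns a view of the words that start with prefix.
    // Assumes natural String ordering and that the last character of
    // prefix is not Character.MAX_VALUE.
    static SortedSet<String> withPrefix(NavigableSet<String> words, String prefix) {
        if (prefix.isEmpty()) {
            return words; // every word starts with the empty prefix
        }
        int lastIndex = prefix.length() - 1;
        char next = (char) (prefix.charAt(lastIndex) + 1);
        // "abc" -> "abd": the smallest string greater than every "abc..." string.
        String upper = prefix.substring(0, lastIndex) + next;
        return words.subSet(prefix, true, upper, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> words = new TreeSet<>();
        words.add("mound");
        words.add("mount");
        words.add("mountain");
        words.add("mounted");
        words.add("mouse");
        System.out.println(withPrefix(words, "mount"));
    }
}
```

The returned set is a live view backed by the TreeSet, so it costs no copying; wrap it in a new ArrayList if a snapshot is needed.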

Java Hashtable problem

I am having a problem with a Java Hashtable. These are my hashtable's keys and values:
{corpus\2.txt=[cat sparrow], corpus\4.txt=[elephant sparrow], corpus\1.txt=[elephant cow], corpus\3.txt=[cow cat]}
So if I want to access the first tuple, I have to pass the key "corpus\2.txt" to get its value. If I pass the value, I can get its key. But I want to write a function to which I pass 1, 2, 3, 4 etc. and get back both the key and the value. Any ideas?
2nd question:
Is it possible to store an index with the key and value too? Or is it possible to get an index (0, 1, 2, 3 etc.) from an existing hashtable?
Thanks !
For starters, I would use a HashMap rather than the (now obsolete) Hashtable. If you do that, then you can use Map.Entry to return a key/value pair (as per your first question).
You can't easily store an index with your key. You might want to create a special Key object thus:
public class Key {
private String name;
private int index;
....
}
with a suitable equals()/hashCode() implementation (as pointed out below in the comments) and use that as the key in your HashMap. You'd have to perform lookups using this key, and thus construct one from your current String-based key, but I don't think that's a big deal.
There is no method in the API to get a specific entry from a Java hash table. You can access the collection of all entries with the entrySet method, and iterating over that you will get all the key-value pairs as Map.Entry objects.
Hash tables are completely unordered. They are just mappings from keys to values and do not have any definite indices. There is a specific order that the entries will be processed if you iterate over the entrySet result, but this might also change when you modify the hash table.
Take a look at LinkedHashMap, a map implementation that preserves input ordering.
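As a rough sketch of how a LinkedHashMap could serve the "pass an index, get key and value" request: the map itself has no get-by-index method, so the example copies its entries into a list first. The class IndexedLookup and the method entryAt are hypothetical names, and the index is zero-based in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative only.
class IndexedLookup {

    // Returns the entry at the given zero-based position in insertion order.
    // Copies the entry set on every call, so each lookup is O(n).
    static <K, V> Map.Entry<K, V> entryAt(LinkedHashMap<K, V> map, int index) {
        List<Map.Entry<K, V>> entries = new ArrayList<>(map.entrySet());
        return entries.get(index);
    }
}
```

If lookups by index are frequent and the map rarely changes, building the list once and reusing it avoids the repeated copy.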
Rather use a Map<Integer, ValueObject> wherein ValueObject is just a custom javabean class with two properties e.g. filename and description.
Basic kickoff example:
public class ValueObject {
private String filename;
private String description;
public ValueObject() {
// Always keep default constructor alive.
}
public ValueObject(String filename, String description) {
this.filename = filename;
this.description = description;
}
// Add/generate public getters and setters for filename and description.
}
which you can use as follows:
Map<Integer, ValueObject> map = new HashMap<Integer, ValueObject>();
map.put(1, new ValueObject("corpus1.txt", "elephant cow"));
map.put(2, new ValueObject("corpus2.txt", "cat sparrow"));
map.put(3, new ValueObject("corpus3.txt", "cow cat"));
map.put(4, new ValueObject("corpus4.txt", "elephant sparrow"));
ValueObject vo = map.get(1); // Returns VO with corpus1.txt and elephant cow.
There's no way to access a Map by index. However, if what you really want to do is access the key-value pairs in the map one by one, you can just do:
for (Map.Entry<String, List<String>> nameAndWords : hashmap.entrySet()) {
String name = nameAndWords.getKey();
List<String> words = nameAndWords.getValue();
// do your stuff here
}
If you actually need indexing, you can add an external order to the map by keeping the keys in a list, which must be updated when you edit the map:
HashMap<String, List<String>> wordsByCorpus;
List<String> corpusNames;
public void addCorpus(String name, List<String> words) {
List<String> oldValue = wordsByCorpus.put(name, words);
if (oldValue == null) corpusNames.add(name);
}
public void removeCorpus(String name) {
wordsByCorpus.remove(name);
corpusNames.remove(name);
}
public Map.Entry<String, List<String>> getCorpus(int i) {
String name = corpusNames.get(i);
List<String> words = wordsByCorpus.get(name);
return new AbstractMap.SimpleImmutableEntry<String, List<String>>(name, words); // java.util.AbstractMap, Java 6+
}
You could use a LinkedHashMap, which iterates over its entries in the order they were added. It has no get-by-index method, though, so reaching the n-th entry means iterating to it.
Or you could use two HashMaps: one keyed by the string, and a second one that converts the integer index into the string key of the first map. It is then simple to get the key and value from an index:
String key = mapByIntToStringKey.get(index);
V value = mapByStringKey.get(key);
// now have both key and value, no linear searching so should be fast
Thus your maps would contain:
mapByStringKey={corpus\2.txt=[cat sparrow], corpus\4.txt=[elephant sparrow], corpus\1.txt=[elephant cow], corpus\3.txt=[cow cat]}
mapByIntToStringKey={2=corpus\2.txt, 4=corpus\4.txt, 1=corpus\1.txt, 3=corpus\3.txt}
although this is assuming that all your keys are not simply "corpus"+index+".txt".
If all keys are as above then if the indexes are not sparse then you could use a simple ArrayList (previously mentioned) and use get(index) which is fast (directly looks up in an array, can't get much faster than that), and then reconstruct the string key using the expression above.
If the indexes are sparse (i.e. some are missing, there are gaps) then just use the mapByIntToStringKey but replace with mapByIntToValue and reconstruct any string key you need using previous string expression.
The current top answer seems very odd to me, in that the suggestion is to key the map using only the int index part of a compound key. Unless I'm reading it wrong, it means that you lose the ability to look up values in the map using the string key alone, or maybe it just implies that you can always deduce the int index from the string key.
