Using the class methods of something in a TreeMap - java

I know there are already topics on this exact thing, but none of them actually answers my question. Is there a way to do this?
If I have a TreeMap that uses strings as the keys and TreeSet objects as the values, is there a way to add an int to the set associated with a specific key?
What I'm supposed to do is make a concordance from a text file using the TreeMap and TreeSet classes. My plan is this: use the words in the text file as the TreeMap keys, and the values will be sets of line numbers on which each word appears. You step through the text file, and every time you get a word you check the TreeMap to see whether you already have that key. If you don't, you add it and create a new TreeSet of line numbers starting with the one you are on. If you already have it, you just add the line number to the set. So what I need to do is access the .add() method of the set,
something like
map.get(identifier).add(lineNumber);
I know that doesn't work but how do I do it?
I mean if there is an easier way to do what I'm trying to do I'd be happy to do that instead, but I would still like to know how to do it this way just for you know learning and experience and all that.

Consider the following logic (I assume the input words are in an array):
TreeMap<String, TreeSet<Integer>> index = new TreeMap<String, TreeSet<Integer>>();
for (int pos = 0; pos < input.length; pos++) {
String word = input[pos];
TreeSet<Integer> wordPositions = index.get(word);
if (wordPositions == null) {
wordPositions = new TreeSet<Integer>();
index.put(word, wordPositions);
}
wordPositions.add(pos);
}
This results in the index you need, which maps from strings to the set of positions where the string appears. Depending on your specific needs, the outer/inner data structure can be changed to HashMap/HashSet respectively.
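For the line-number concordance described in the question, the same lazy-initialization idea can be sketched with Map.computeIfAbsent (Java 8+). This is only a sketch under assumptions: the class and method names are made up for illustration, the text is already available as a list of lines, and words are split on whitespace without stripping punctuation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class; the name is not from the original thread.
class ConcordanceSketch {
    // Maps each word to the sorted set of 1-based line numbers on which it appears.
    static Map<String, TreeSet<Integer>> build(List<String> lines) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            // Assumption: words are separated by whitespace; punctuation is kept.
            for (String word : lines.get(i).trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields a single empty token
                }
                // Create the set the first time the word is seen, then add the line number.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(i + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Prints {cat=[1], dog=[2], the=[1, 2]}
        System.out.println(build(Arrays.asList("the cat", "the dog")));
    }
}
```

Because computeIfAbsent returns the set that is in the map (new or existing), the chained add never sees null.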

Why not use a Map from String to List<Integer>, something like:
Map<String, List<Integer>> map = new HashMap<String, List<Integer>>();
Then, whenever you get a word, you check whether it already exists in the Map. If it does, you add the line number to its List; if not, you create a new entry in the Map for the given word and line number.
if (map.get(word ) != null) {
map.get(word).add(line);
}
else{
final List<Integer> list = new ArrayList<Integer>();
list.add(line);
map.put(word, list);
}

If I understand correctly, you want a TreeMap in which each key refers to a TreeSet storing the line numbers on which the key has appeared. It is definitely doable, and the implementation is quite simple. I am not sure why your map.get(identifier).add(lineNumber); is not working. This is how I would do it:
TreeMap<String, TreeSet<Integer>> map = new TreeMap<String, TreeSet<Integer>>();
TreeSet<Integer> set = new TreeSet<Integer>();
set.add(1234);
map.put("hello", set);
map.get("hello").add(123);
It all works fine.

The only reason your construct won't work is that map.get(identifier) can return null. Personally, I like the lazy initialization solution that @EyalSchneider answered with. But there is an alternative if you know all your identifiers ahead of time: for example, if you preload your Map with all known English words. Then you can do something like:
for (String word : allEnglishWords) {
    map.put(word, new LinkedList<Integer>());
}
for (int pos = 0; pos < input.length; pos++) {
    String word = input[pos];
    map.get(word).add(pos);
}

Related

Java: matching ArrayList strings to an iterator, and incrementing the integers of a different ArrayList at the same index

noob here, so sorry if I say anything dumb.
I'm comparing strings in an ArrayList to an iterator of strings in an iterator of Sets. When I find a match, I want to grab the index of the matched string in the ArrayList and increment that same index in a different ArrayList of integers. I have something that looks (to me) like it should work, but after this code runs, my integer ArrayList contains mostly -1, with a few 2s, 1s, and 0s.
I'm interested in fixing my code first, but I'd also be interested in different approaches, so here's the larger picture: I have a map where the keys are usernames in a social network, and the values are sets of usernames of the people they follow. I need to return a list of all usernames in descending order of followers. In the code below I'm only trying to make an ArrayList of strings (containing ALL the usernames in the map) that corresponds to a different ArrayList of integers, like:
usernamesList ... numberOfFollowers
theRealJoe ... 7
javaNovice ... 3
FakeTinaFey ... 3
etc
Map<String, Set<String>> map = new HashMap<String, Set<String>>();
//edit: this map is populated. It's a parameter of the method I'm trying to write.
List<String> usernamesList = new ArrayList<String>();
//populate usernamesList with all strings in map
Iterator<Set<String>> setIter = map.values().iterator();
Iterator<String> strIter;
int strIterIndex = 0;
int w = 0;
List<Integer> numOfFollowers = new ArrayList<Integer>();
//initialize all elements to 0. not sure if necessary
for (int i = 0; i < usernamesList.size(); i++) {
numOfFollowers.add(0);
}
while (setIter.hasNext()) {
Set<String> currentSetIter = setIter.next();
strIter = currentSetIter.iterator();
while (strIter.hasNext()) {
String currentstrIter = strIter.next();
if (usernamesList.contains(currentstrIter)) {
strIterIndex = usernamesList.indexOf(currentstrIter);
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
w++;
System.out.println("if statement has run " + w + " times." );
} else {
throw new RuntimeException("Should always return true. all usernames from guessFollowsGraph should be in usernamesList");
}
}
}
I think everything looks OK, except this line:
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
When you call numOfFollowers.indexOf, you are looking for the index of an element whose value is strIterIndex (and you get -1 when there is none). What you want is the value (follower count) of the element at index strIterIndex:
numOfFollowers.set(strIterIndex, numOfFollowers.get(strIterIndex) +1);
I would also suggest using int[] (array) instead of a list of indices. It would be faster and more straightforward.
Oh, one more thing: correct the "fake" constructors please, they won't work since there is no "new" keyword after the assignment...
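Since the question also asks about different approaches, here is one possible sketch that avoids the two parallel lists altogether: count followers in a Map, then sort the usernames by that count. The class and method names are made up for illustration, and the alphabetical tie-break is an assumption the question does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of the original question's code.
class FollowerRanking {
    // follows: username -> set of usernames that user follows.
    // Returns all usernames, most-followed first; ties are broken alphabetically (an assumption).
    static List<String> rank(Map<String, Set<String>> follows) {
        Map<String, Integer> followerCount = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : follows.entrySet()) {
            followerCount.putIfAbsent(e.getKey(), 0); // users nobody follows still appear
            for (String followed : e.getValue()) {
                followerCount.merge(followed, 1, Integer::sum); // one more follower
            }
        }
        List<String> names = new ArrayList<>(followerCount.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(followerCount.get(b), followerCount.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return names;
    }
}
```

Each lookup in the count map is constant time on average, whereas contains and indexOf on an ArrayList scan the whole list for every followed user.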

Java Collection, Set, Map or Array to hold unique sorted non blank strings

I am fairly new to java and would like a container that I can use to hold strings that are not empty and have them sorted.
So far, I have mostly been using ArrayList, but this seems a bit limited for this case.
Thanks
Use TreeSet or TreeMap, depending on your requirements. Both are collections that accept unique elements and keep them sorted.
A Set is what you want, as the items in it have to be unique.
As the Strings should be sorted you'll need a TreeSet.
As for the non-blank Strings, you have to override the insertion methods like this:
Set<String> sortedSetOfStrings = new TreeSet<String>() {
    @Override
    public boolean add(String s) {
        if (s.isEmpty())
            return false;
        return super.add(s);
    }
};
EDIT: Simplified thanks to Peter Rader's comment.
Thanks for all the help. Here is what I eventually came up with, using TreeSet and Apache Commons StringUtils. My input is a CSV string, so I didn't use the check on the input.
String csvString = "Cat,Dog, Ball, Hedge,, , Ball, Cat";
String[] array = StringUtils.split(csvString, ",");
for (int i = 0; i < array.length; i++) {
    array[i] = array[i].trim(); // Remove unwanted whitespace
}
Set<String> set = new TreeSet<String>(Arrays.asList(array));
set.remove(""); // Remove the one empty string if it is there
set now contains: Ball,Cat,Dog,Hedge
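The same result can probably be had with the JDK alone. The following is a rough sketch, not the poster's code: the class and method names are invented for illustration, and it filters blanks while inserting instead of removing "" afterwards.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; names are illustrative only.
class NonBlankSortedSet {
    // Splits on commas, trims each token, and keeps only non-empty tokens in sorted order.
    static Set<String> parse(String csv) {
        Set<String> result = new TreeSet<>();
        for (String token : csv.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed); // TreeSet drops duplicates and keeps natural order
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Ball, Cat, Dog, Hedge]
        System.out.println(parse("Cat,Dog, Ball, Hedge,, , Ball, Cat"));
    }
}
```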

Counting occurrences of words in an array

I've been working on something which takes a stream of characters, forms words, makes an array of the words, then creates a vector which contains each unique words and the number of times it occurs (basically a word counter).
Anyway I've not used Java in a long time, or much programming to be honest and I'm not happy with how this currently looks. The part I have which makes the vector looks ugly to me and I wanted to know if I could make it less messy.
int counter = 1;
Vector<Pair<String, Integer>> finalList = new Vector<Pair<String, Integer>>();
Pair<String, Integer> wordAndCount = new Pair<String, Integer>(wordList.get(1), counter); // wordList contains " " as first word, starting at wordList.get(1) skips it.
for(int i= 1; i<wordList.size();i++){
if(wordAndCount.getLeft().equals(wordList.get(i))){
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter++);
}
else if(!wordAndCount.getLeft().equals(wordList.get(i))){
finalList.add(wordAndCount);
wordAndCount = new Pair<String, Integer>(wordList.get(i), counter=1);
}
}
finalList.add(wordAndCount); //UGLY!!
As a secondary question, this gives me a vector with all the words in alphabetical order (as in the array). I want to have it sorted by number of occurrences, then alphabetically within that.
Would the best option be:
Iterate down the vector, testing each occurrence int with the one above, using Collections.swap() if it was higher, then checking the next one above (as its now moved up 1) and so on until it's no longer larger than anything above it. Any occurrence of 1 could be skipped.
Iterate down the vector again, testing each element against the first element of the vector and then iterating downwards until the number of occurrences is lower and inserting it above that element. All occurrences of 1 would once again be skipped.
The first method would do more in terms of iterating over the elements, but the second one requires you to add and remove components of the vector (I think?), so I don't know which is more efficient, or whether it's worth considering.
Why not use a Map to solve your problem?
String[] words // your incoming array of words.
Map<String, Integer> wordMap = new HashMap<String, Integer>();
for(String word : words) {
if(!wordMap.containsKey(word))
wordMap.put(word, 1);
else
wordMap.put(word, wordMap.get(word) + 1);
}
Sorting can be done using Java's sorted collections:
SortedMap<Integer, SortedSet<String>> sortedMap = new TreeMap<Integer, SortedSet<String>>();
for(Entry<String, Integer> entry : wordMap.entrySet()) {
if(!sortedMap.containsKey(entry.getValue()))
sortedMap.put(entry.getValue(), new TreeSet<String>());
sortedMap.get(entry.getValue()).add(entry.getKey());
}
Nowadays you should leave the sorting to the language's libraries. They have been proven correct over the years.
Note that the code may use a lot of memory because of all the data structures involved, but that is what we pay for higher level programming (and memory is getting cheaper every second).
I didn't run the code to see that it works, but it does compile (copied it directly from eclipse)
re: sorting, one option is to write a custom Comparator which first examines the number of times each word appears, then (if equal) compares the words alphabetically.
private final class PairComparator implements Comparator<Pair<String, Integer>> {
    @Override
    public int compare(Pair<String, Integer> p1, Pair<String, Integer> p2) {
        /* compare by Integer */
        /* compare by String, if necessary */
        /* return a negative number, a positive number, or 0 as appropriate */
    }
}
You'd then sort finalList by calling Collections.sort(finalList, new PairComparator());
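To make the comparator idea concrete without depending on the asker's Pair class (whose full API isn't shown), here is a sketch that uses Map.Entry<String, Integer> in its place. The class names are invented, and "highest count first" is my reading of "sorted by occurrence".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Orders entries by count, highest first, then alphabetically by word.
// Map.Entry stands in for the asker's Pair class (an assumption).
class CountThenWordComparator implements Comparator<Map.Entry<String, Integer>> {
    @Override
    public int compare(Map.Entry<String, Integer> e1, Map.Entry<String, Integer> e2) {
        int byCount = e2.getValue().compareTo(e1.getValue()); // reversed: larger counts first
        if (byCount != 0) {
            return byCount;
        }
        return e1.getKey().compareTo(e2.getKey()); // same count: alphabetical
    }
}

// Hypothetical helper showing how the comparator would be applied.
class WordCountSorter {
    static List<Map.Entry<String, Integer>> sortByCountThenWord(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, new CountThenWordComparator());
        return entries;
    }
}
```

This replaces both hand-written swap/insert schemes from the question with a single library sort.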
How about using the Google Guava library?
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
int countFoo = multiset.count("foo");
From their javadocs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Simple enough?

How do I utilize hashtables to hold words and frequency of use?

I am so confused right now. I am supposed to write a program that uses a hashtable. The hashtable holds words along with their frequency of use. The class "Word" holds a counter and the string. If the word is already in the table then its frequency increases. I have been researching how to do this but am just lost. I need to be pointed in the right direction. Any help would be great.
Hashtable<String, Word> words = new Hashtable<String, Word>();
public void addWord(String s) {
    if (words.containsKey(s)) {
        words.get(s).plusOne();
    } else {
        words.put(s, new Word(s));
    }
}
This will do it.
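The snippet above relies on a Word class that the question describes but never shows. A minimal sketch of what it might look like follows; the field names, the plusOne() and getCount() methods, and the starting count of 1 are all assumptions, not the asker's actual class.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical Word class: holds the string and its counter, as the question describes.
class Word {
    private final String text;
    private int count = 1; // assumption: a Word is created the first time it is seen

    Word(String text) {
        this.text = text;
    }

    void plusOne() {
        count++;
    }

    String getText() {
        return text;
    }

    int getCount() {
        return count;
    }
}

// Hypothetical wrapper around the Hashtable required by the exercise.
class WordFrequencyTable {
    private final Map<String, Word> words = new Hashtable<>();

    void addWord(String s) {
        Word existing = words.get(s); // single lookup instead of containsKey + get
        if (existing == null) {
            words.put(s, new Word(s));
        } else {
            existing.plusOne();
        }
    }

    int frequencyOf(String s) {
        Word w = words.get(s);
        return w == null ? 0 : w.getCount();
    }
}
```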
Hashtable would be an unusual choice for any new Java code these days. I assume this is some kind of exercise.
I would be slightly concerned by any exercise that hadn't been updated to use newer mechanisms.
HashMap will give you better performance than Hashtable in any single threaded scenario.
But as Emmanuel Bourg points out, Bag will do all of this for you without needing the Word class at all: just add String objects to the Bag, and the bag will automatically keep count for you.
Anyway, you're being asked to use a Map, and a map lets you find things quickly by using a key. The key can be any Object, and Strings are very commonly used: they are immutable and have good implementations of hashCode and equals, which make them ideal keys.
The javadoc for Map talks about how you use maps. Hashtable is one implementation of this interface, though it isn't a particularly good one.
You need a good key to let you find existing Word objects quickly, so that you can increment the counter. While you could make the Word object itself into the key, you would have some work to do: better is to use the String that the Word contains as the key.
You find whether the Word is already in the map by looking for the value object that has the String as its key.
You'd better use a Bag, it keeps the count of each element:
http://commons.apache.org/collections/api-release/org/apache/commons/collections/Bag.html
This piece of code should solve your problem
Hashtable<String, Word> myWords = new Hashtable<String, Word>();
Word w1 = new Word("test");
Word w2 = new Word("anotherTest");
myWords.put("test", w1);
myWords.put("anotherTest", w2);
String inputWord = "test";
if (myWords.containsKey(inputWord)) {
    Word found = myWords.get(inputWord);
    found.setCounter(found.getCounter() + 1);
}
Given that the class Word has a counter and a string, I'd use a HashMap<String, Word>. If your input is an array of Strings, you can accomplish something like this by using:
public Map<String, Word> getWordCount(String[] input) {
Map<String, Word> output = new HashMap<String, Word>();
for (String s : input) {
Word w = output.get(s);
if (w == null) {
w = new Word(s, 0);
}
w.incrementValue(); // Or w = new Word(s, w.getCount() + 1) if you have no such function
output.put(s, w);
}
return output;
}

Java - Optimize finding a string in a list

I have an ArrayList of objects where each object contains a string 'word' and a date. I need to check to see if the date has passed for a list of 500 words. The ArrayList could contain up to a million words and dates. The dates I store as integers, so the problem I have is attempting to find the word I am looking for in the ArrayList.
Is there a way to make this faster? In python I have a dict and mWords['foo'] is a simple lookup without looping through the whole 1 million items in the mWords array. Is there something like this in java?
for (int i = 0; i < mWords.size(); i++) {
    if (word.equals(mWords.get(i).word)) { // compare contents, not references
        return mWords.get(i);
    }
}
If the words are unique then use HashMap. I mean, {"a", 1}, {"b", 2}
Map<String, Integer> wordsAndDates = new HashMap<String, Integer>();
wordsAndDates.put("a", 1);
wordsAndDates.put("b", 2);
and wordsAndDates.get("a") returns 1.
If not, you shouldn't use HashMap, because put overwrites the previous value. I mean
wordsAndDates.put("a", 1);
wordsAndDates.put("b", 2);
wordsAndDates.put("a", 3);
and wordsAndDates.get("a") returns 3.
In that case you can use an ArrayList and search it.
If you're not stuck with an ArrayList you should use some kind of hash based data structure. In this case it seems like a HashMap should fit nicely (it's pretty close to python's dict). This will give you an O(1) lookup time (compared to your current method of linear search).
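As a rough sketch of that idea applied to the asker's list of objects: build the map once, then each of the 500 lookups is a single get. The DatedWord class below is a stand-in for the asker's object type (its real name and fields beyond word are not shown in the question), and WordIndex is likewise an invented name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the asker's object type: a word plus a date stored as an integer.
class DatedWord {
    final String word;
    final int date;

    DatedWord(String word, int date) {
        this.word = word;
        this.date = date;
    }
}

// Hypothetical helper that indexes the list by word.
class WordIndex {
    // One pass over the list; afterwards byWord.get(word) replaces the linear search.
    static Map<String, DatedWord> indexByWord(List<DatedWord> list) {
        Map<String, DatedWord> byWord = new HashMap<>();
        for (DatedWord dw : list) {
            byWord.put(dw.word, dw); // a later duplicate word overwrites an earlier one
        }
        return byWord;
    }
}
```

Building the index costs one pass over the million entries, which pays off as soon as more than a handful of lookups follow.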
You want to use a Map in Java
Map<String,Integer> mWords = new HashMap<String, Integer>();
mWords.put ("foo", 112345);
What about Collections.binarySearch() (NB: the list must be sorted) if you are stuck with the ArrayList?
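A sketch of the binary-search route, assuming the list can be sorted by word once up front. WordDate is a made-up stand-in for the asker's object type, and the probe object passed to binarySearch carries a dummy date because only the word is compared.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Stand-in for the asker's object type (an assumption).
class WordDate {
    final String word;
    final int date;

    WordDate(String word, int date) {
        this.word = word;
        this.date = date;
    }
}

// Hypothetical helper around Collections.binarySearch.
class SortedWordLookup {
    private static final Comparator<WordDate> BY_WORD =
            (a, b) -> a.word.compareTo(b.word);

    // Must be called once before any lookup; binarySearch requires a sorted list.
    static void sortByWord(List<WordDate> list) {
        Collections.sort(list, BY_WORD);
    }

    // Returns the matching entry, or null if the word is not present.
    static WordDate find(List<WordDate> sorted, String word) {
        int i = Collections.binarySearch(sorted, new WordDate(word, 0), BY_WORD);
        return i >= 0 ? sorted.get(i) : null;
    }
}
```

Each lookup is O(log n) rather than O(n), at the cost of one O(n log n) sort.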
