Java - Optimize finding a string in a list

I have an ArrayList of objects where each object contains a string 'word' and a date. I need to check to see if the date has passed for a list of 500 words. The ArrayList could contain up to a million words and dates. The dates I store as integers, so the problem I have is attempting to find the word I am looking for in the ArrayList.
Is there a way to make this faster? In Python I have a dict, and mWords['foo'] is a simple lookup that does not loop through the whole 1 million items in the mWords array. Is there something like this in Java?
for (int i = 0; i < mWords.size(); i++) {
    // Compare string contents with equals(); == only compares references.
    if (word.equals(mWords.get(i).word)) {
        return mWords.get(i);
    }
}

If the words are unique, use a HashMap. For example, with the pairs {"a", 1} and {"b", 2}:
Map<String, Integer> wordsAndDates = new HashMap<String, Integer>();
wordsAndDates.put("a", 1);
wordsAndDates.put("b", 2);
wordsAndDates.get("a") then returns 1.
If the words are not unique, a plain HashMap will not work, because put() overwrites the previous value stored under the same key. For example:
wordsAndDates.put("a", 1);
wordsAndDates.put("b", 2);
wordsAndDates.put("a", 3);
Now wordsAndDates.get("a") returns 3.
In that case you can keep the ArrayList and search it.

If you're not stuck with an ArrayList, you should use some kind of hash-based data structure. In this case a HashMap should fit nicely (it's pretty close to Python's dict). This will give you O(1) lookup time on average, compared to the O(n) linear search you have now.

You want to use a Map in Java
Map<String,Integer> mWords = new HashMap<String, Integer>();
mWords.put ("foo", 112345);
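A minimal sketch of how the existing ArrayList could be indexed once and then queried for each of the 500 words. The WordEntry class, its field names, and the integer date encoding are assumptions, since the question does not show the real class:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordIndex {
    // Hypothetical stand-in for the asker's object: a word plus an integer date.
    static final class WordEntry {
        final String word;
        final int date;

        WordEntry(String word, int date) {
            this.word = word;
            this.date = date;
        }
    }

    // Build the index once in O(n); each later lookup is O(1) on average.
    static Map<String, WordEntry> buildIndex(List<WordEntry> entries) {
        Map<String, WordEntry> index = new HashMap<>();
        for (WordEntry e : entries) {
            index.put(e.word, e); // a repeated word keeps the last entry seen
        }
        return index;
    }

    // True when the word is known and its date is earlier than "today"
    // (both assumed to use the same integer encoding).
    static boolean hasPassed(Map<String, WordEntry> index, String word, int today) {
        WordEntry e = index.get(word);
        return e != null && e.date < today;
    }

    public static void main(String[] args) {
        List<WordEntry> mWords = new ArrayList<>();
        mWords.add(new WordEntry("foo", 20120101));
        mWords.add(new WordEntry("bar", 20301231));

        Map<String, WordEntry> index = buildIndex(mWords);
        System.out.println(hasPassed(index, "foo", 20200101)); // true
        System.out.println(hasPassed(index, "bar", 20200101)); // false
    }
}
```

Building the map costs one pass over the million entries, so it only pays off when the index is reused for many lookups, as with the 500 words here.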

What about Collections.binarySearch() (NB: the list must be sorted) if you are stuck with the ArrayList?
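A rough sketch of the binary search route, assuming a hypothetical WordEntry class with word and date fields (the question does not show the real one). The list has to be sorted with the same comparator that is passed to Collections.binarySearch():

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortedWordSearch {
    // Hypothetical stand-in for the asker's object.
    static final class WordEntry {
        final String word;
        final int date;

        WordEntry(String word, int date) {
            this.word = word;
            this.date = date;
        }
    }

    // Orders entries by their word only.
    static final Comparator<WordEntry> BY_WORD =
            Comparator.comparing((WordEntry e) -> e.word);

    // Returns the matching entry, or null. The list must already be sorted with BY_WORD.
    static WordEntry find(List<WordEntry> sortedByWord, String word) {
        WordEntry probe = new WordEntry(word, 0); // only the word matters to BY_WORD
        int pos = Collections.binarySearch(sortedByWord, probe, BY_WORD);
        return pos >= 0 ? sortedByWord.get(pos) : null;
    }

    public static void main(String[] args) {
        List<WordEntry> mWords = new ArrayList<>();
        mWords.add(new WordEntry("pear", 3));
        mWords.add(new WordEntry("apple", 1));
        mWords.add(new WordEntry("fig", 2));
        mWords.sort(BY_WORD); // one-time O(n log n) sort

        WordEntry hit = find(mWords, "fig");
        System.out.println(hit == null ? "not found" : hit.word + " -> " + hit.date);
    }
}
```

Each lookup is O(log n) on an ArrayList, which is slower than a HashMap lookup but needs no extra data structure.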

Related

How to keep track of number of occurrences between two dependent LinkedLists?

I'm having some trouble figuring out how to keep track of occurrences between two dependent LinkedLists.
Let me elaborate with an example:
These are the linked lists in question.
They are dependent because each value in first list corresponds to the value in the second list with the same index i. Both lists are always the same length.
{sunny, sunny, rainy, sunny, cloudy, sunny, ...}
{yes, no, no, maybe, yes, no, ...}
What I need is to somehow keep track of the "pairs" of occurrences.
For example:
sunny -> 1 yes, 1 maybe, 2 no
rainy -> 1 no
cloudy -> 1 yes
Note: There doesn't have to be exactly 3 options. There can be more or less. Also the names of the items of the lists aren't known previously.
So yea, I'm wondering which is the best way to go about storing this information as I've hit a dead end.
Any help is appreciated.

You can do that with a Map<String, Map<String, Integer>>.
The key of the outer map is the weather (sunny, rainy etc.)
The value is another map which contains each possible value (yes, no, maybe...) and the number of times that value occurs.
Something like this to merge the two lists:
public static Map<String, Map<String, Integer>> count(List<String> weathers, List<String> answers) {
    // weathers holds the strings 'sunny', 'rainy', 'sunny'...
    // answers holds the strings 'yes', 'no'...
    // this code assumes both lists have the same size; you can enforce this with a check, or throw if it is not the case
    Map<String, Map<String, Integer>> merged = new HashMap<>();
    for (int j = 0; j < weathers.size(); j++) {
        if (merged.containsKey(weathers.get(j))) {
            Map<String, Integer> counts = merged.get(weathers.get(j));
            counts.put(answers.get(j), counts.getOrDefault(answers.get(j), 0) + 1);
        } else {
            Map<String, Integer> newAnswer = new HashMap<>();
            newAnswer.put(answers.get(j), 1);
            merged.put(weathers.get(j), newAnswer);
        }
    }
    return merged;
}
The logic of the code above is that you loop through each position of the lists and check whether your map already contains that weather.
If it does, you get the existing inner map and increase the count for that answer (if the answer is not present yet, you start from zero).
If it does not, you add a new inner map for that weather, containing only the first answer with count 1.
Sample usage:
Map<String, Map<String, Integer>> resume = count(weathers, answers);
//How many times 'sunny' weather was 'maybe'?
Integer answer1 = resume.get("sunny").get("maybe");
//How many times 'rainy' weather was 'no'?
Integer answer2 = resume.get("rainy").get("no");
//etc.
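The same counting can be written more compactly with Map.computeIfAbsent and Map.merge (Java 8+). This is only a sketch under the assumptions above; the class name PairCounter is made up for illustration. It walks both lists with iterators, because get(j) on a LinkedList is O(n) per call:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PairCounter {
    // Counts, for every weather value, how often each answer occurs at the same position.
    static Map<String, Map<String, Integer>> count(List<String> weathers, List<String> answers) {
        Map<String, Map<String, Integer>> merged = new HashMap<>();
        Iterator<String> w = weathers.iterator();
        Iterator<String> a = answers.iterator();
        while (w.hasNext() && a.hasNext()) {
            merged.computeIfAbsent(w.next(), k -> new HashMap<>()) // inner map created on first use
                  .merge(a.next(), 1, Integer::sum);               // start at 1, otherwise add 1
        }
        if (w.hasNext() || a.hasNext()) {
            throw new IllegalArgumentException("lists must have the same size");
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> weathers = Arrays.asList("sunny", "sunny", "rainy", "sunny", "cloudy", "sunny");
        List<String> answers = Arrays.asList("yes", "no", "no", "maybe", "yes", "no");
        Map<String, Map<String, Integer>> result = count(weathers, answers);
        System.out.println(result.get("sunny").get("no")); // 2
    }
}
```

Note that looking up a combination that never occurred, such as "rainy" with "maybe", returns null rather than 0, so getOrDefault may be the safer way to read the counts.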

Java: matching ArrayList strings to an iterator, and incrementing the integers of a different ArrayList at the same index

noob here, so sorry if I say anything dumb.
I'm comparing strings in an ArrayList to an iterator of strings in an iterator of Sets. When I find a match, I want to grab the index of the matched string in the ArrayList and increment that same index in a different ArrayList of integers. I have something that looks (to me) like it should work, but after this code runs, my integer ArrayList contains mostly -1 with a few 2, 1, and 0.
I'm interested in fixing my code first, but I'd also be interested in different approaches, so here's the larger picture: I have a map where the keys are usernames in a social network, and the values are sets of usernames of the people they follow. I need to return a list of all usernames in descending order of followers. In the code below I'm only trying to make an ArrayList of strings (that contains ALL the usernames in the map) that corresponds to a different ArrayList of integers, like:
usernamesList ... numberOfFollowers
theRealJoe ... 7
javaNovice ... 3
FakeTinaFey ... 3
etc
Map<String, Set<String>> map = new HashMap<String, Set<String>>();
//edit: this map is populated. It's a parameter of the method I'm trying to write.
List<String> usernamesList = new ArrayList<String>();
//populate usernamesList with all strings in map
Iterator<Set<String>> setIter = map.values().iterator();
Iterator<String> strIter;
int strIterIndex = 0;
int w = 0;
List<Integer> numOfFollowers = new ArrayList<Integer>();
//initialize all elements to 0. not sure if necessary
for (int i = 0; i < usernamesList.size(); i++) {
    numOfFollowers.add(0);
}
while (setIter.hasNext()) {
    Set<String> currentSetIter = setIter.next();
    strIter = currentSetIter.iterator();
    while (strIter.hasNext()) {
        String currentstrIter = strIter.next();
        if (usernamesList.contains(currentstrIter)) {
            strIterIndex = usernamesList.indexOf(currentstrIter);
            numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
            w++;
            System.out.println("if statement has run " + w + " times." );
        } else {
            throw new RuntimeException("Should always return true. all usernames from guessFollowsGraph should be in usernamesList");
        }
    }
}

I think everything looks OK, except this line:
numOfFollowers.set(strIterIndex, numOfFollowers.indexOf(strIterIndex) +1);
When you call numOfFollowers.indexOf, you are looking for the index of an element whose value is strIterIndex, and indexOf returns -1 when no such element exists. What you want is the value (the follower count) of the element at index strIterIndex:
numOfFollowers.set(strIterIndex, numOfFollowers.get(strIterIndex) +1);
I would also suggest using an int[] (array) instead of a list for the counts. It would be faster and more straightforward.
Oh, one more thing: please correct the "fake" constructors; they won't work since there is no "new" keyword after the assignment...
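Since the question also asks for different approaches, here is one possible sketch that skips the parallel lists entirely: count followers in a Map and sort the usernames by that count. The class and method names are invented for illustration, and breaking ties alphabetically is an assumption the question does not state:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FollowerRanking {
    // follows maps each username to the set of usernames that user follows.
    static List<String> byFollowersDescending(Map<String, Set<String>> follows) {
        Map<String, Integer> followerCount = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : follows.entrySet()) {
            followerCount.putIfAbsent(e.getKey(), 0); // users nobody follows still appear
            for (String followed : e.getValue()) {
                followerCount.merge(followed, 1, Integer::sum); // one more follower
            }
        }
        List<String> names = new ArrayList<>(followerCount.keySet());
        names.sort((a, b) -> {
            int byCount = Integer.compare(followerCount.get(b), followerCount.get(a)); // descending
            return byCount != 0 ? byCount : a.compareTo(b); // ties: alphabetical (assumption)
        });
        return names;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> follows = new HashMap<>();
        follows.put("theRealJoe", new HashSet<>());
        follows.put("javaNovice", new HashSet<>(Arrays.asList("theRealJoe")));
        follows.put("FakeTinaFey", new HashSet<>(Arrays.asList("theRealJoe", "javaNovice")));
        System.out.println(byFollowersDescending(follows));
        // [theRealJoe, javaNovice, FakeTinaFey]
    }
}
```

This avoids the repeated contains() and indexOf() calls on the ArrayList, each of which is a linear scan.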

Using the class methods of something in a TreeMap

I know there are already topics on this exact thing, but none of them actually answer my question. Is there a way to do this?
If I have a TreeMap that uses strings as the keys and TreeSet objects as the values, is there a way that I can add an int to the set that is associated with a specific key?
What I'm supposed to do is make a concordance from a text file using the TreeMap and TreeSet classes. My plan is to use the words in the text file as the TreeMap keys, with the values being sets of the line numbers on which each word appears. So you step through the text file, and every time you get a word you check the TreeMap to see if you already have that key. If you don't, you add it and create a new TreeSet of line numbers starting with the one you are on. If you already have it, you just add the line number to the set. So what I need to do is access the .add() function of the set,
something like
map.get(identifier).add(lineNumber);
I know that doesn't work but how do I do it?
I mean if there is an easier way to do what I'm trying to do I'd be happy to do that instead, but I would still like to know how to do it this way just for you know learning and experience and all that.

Consider the following logic (I assume the input words are in an array):
TreeMap<String, TreeSet<Integer>> index = new TreeMap<String, TreeSet<Integer>>();
for (int pos = 0; pos < input.length; pos++) {
    String word = input[pos];
    TreeSet<Integer> wordPositions = index.get(word);
    if (wordPositions == null) {
        wordPositions = new TreeSet<Integer>();
        index.put(word, wordPositions);
    }
    wordPositions.add(pos);
}
This results in the index you need, which maps from strings to the set of positions where the string appears. Depending on your specific needs, the outer/inner data structure can be changed to HashMap/HashSet respectively.
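On Java 8 and later, the null check can be folded into Map.computeIfAbsent. The sketch below adapts the idea to line numbers, as the question describes; splitting on whitespace and passing the file as a list of lines are assumptions made to keep the example self-contained:

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

public class Concordance {
    // Maps each word to the sorted set of 1-based line numbers it appears on.
    static TreeMap<String, TreeSet<Integer>> build(List<String> lines) {
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line: split() yields a single empty string
                }
                // Creates the set the first time a word is seen, then adds to it.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(lineNumber);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or", "not to be");
        System.out.println(build(lines));
        // {be=[1, 2], not=[2], or=[1], to=[1, 2]}
    }
}
```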

Why not use a Map from String to List<Integer> (generics cannot hold the primitive int), something like:
Map<String, List<Integer>> map = new HashMap<String, List<Integer>>();
Then, whenever you get a word, you check whether it already exists in the Map. If it does, you add the line number to its List; if not, you create a new entry in the Map for the given word and the given line number.
if (map.get(word) != null) {
    map.get(word).add(line);
} else {
    final List<Integer> list = new ArrayList<Integer>();
    list.add(line);
    map.put(word, list);
}

If I understand correctly, you want a TreeMap with each key referring to a TreeSet that stores the line numbers on which the key has appeared. It is definitely doable, and the implementation is quite simple. I am not sure why your map.get(identifier).add(lineNumber); is not working. This is how I would do it:
TreeMap<String, TreeSet<Integer>> map = new TreeMap<String, TreeSet<Integer>>();
TreeSet<Integer> set = new TreeSet<Integer>();
set.add(1234);
map.put("hello", set);
map.get("hello").add(123);
It all works fine.

The only reason your construct won't work is that the result of map.get(identifier) can be null. Personally, I like the lazy initialization solution that @EyalSchneider answered with. But there is an alternative if you know all your identifiers ahead of time: for example, if you preload your Map with all known English words. Then you can do something like:
for (String word : allEnglishWords) {
    map.put(word, new LinkedList<Integer>());
}
for (int pos = 0; pos < input.length; pos++) {
    String word = input[pos];
    map.get(word).add(pos);
}

Implementing search based on 2 fields in a java class

I am trying to present a simplified version of my requirement here for ease of understanding.
I have this class
public class MyClass {
    private byte[] data1;
    private byte[] data2;
    private long hash1; // Hash value for data1
    private long hash2; // Hash value for data2
    // getters and setters
}
Now I need to search between 2 List instances of this class, find how many hash1's match between the 2 instances and for all matches how many corresponding hash2's match. The 2 list will have about 10 million objects of MyClass.
Now I am planning to iterate over first list and search in the second one. Is there a way I can optimize the search by sorting or ordering in any particular way? Should I sort both list or only 1?

The best solution would be to iterate; there is no faster solution than this. You can create a HashMap and take advantage of the fact that a map does not store the same key twice, but building it has its own overhead.

Sort only the second list, iterate over the first, and binary search in the second: the sort is O(n log n), and n binary searches are O(n log n) in total.
Or use a HashSet for the second list, iterate over the first, and look each item up in the set: O(n) on average.
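The sort-and-binary-search variant might look roughly like this, assuming the hash1 values have first been copied out of each list into primitive long[] arrays (that extraction step, and the class name, are not from the question):

```java
import java.util.Arrays;

public class HashMatchCount {
    // Counts how many values in first also occur in second.
    // A value repeated in first is counted once per occurrence.
    static int countMatches(long[] first, long[] second) {
        long[] sorted = second.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);            // O(n log n), done once
        int matches = 0;
        for (long hash : first) {
            if (Arrays.binarySearch(sorted, hash) >= 0) { // O(log n) per lookup
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] first = {11L, 42L, 7L, 99L};
        long[] second = {99L, 5L, 42L};
        System.out.println(countMatches(first, second)); // 2
    }
}
```

With 10 million objects per list, primitive arrays also avoid boxing every hash into a Long, which a HashSet<Long> would require.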

If you have to check all the elements, I think you should iterate over the first list and have a HashMap for the second one, as AmitD said.
You just have to correctly override equals and hashCode in your MyClass class. Finally, I recommend using primitive types as much as possible. For example, for the first list, a simple array would be better than a List.
Also, at the beginning you could select which of the two lists is the shorter one (if there's a difference in the size) and iterate over that one.

I think you should create a HashMap for one of the lists (say list1):
Map<Long, MyClass> map = new HashMap<Long, MyClass>(list1.size()); // specify the capacity
// populate the map with put(myClass.getHash1(), myClass) for each element in list1
Now just iterate through the second list (there is no point in sorting both):
int hash1MatchCount = 0;
int hash2MatchCount = 0;
for (MyClass myClass : list2) {
    MyClass mc = map.get(myClass.getHash1());
    if (mc != null) {
        hash1MatchCount++;
        if (myClass.getHash2() == mc.getHash2()) {
            hash2MatchCount++;
        }
    }
}
Note: this assumes the hash1 values in list1 are unique. If they are not, a later element overwrites an earlier one with the same hash1 in the map.

Counting occurrences of words in an array

I've been working on something which takes a stream of characters, forms words, makes an array of the words, then creates a vector which contains each unique words and the number of times it occurs (basically a word counter).
Anyway I've not used Java in a long time, or much programming to be honest and I'm not happy with how this currently looks. The part I have which makes the vector looks ugly to me and I wanted to know if I could make it less messy.
int counter = 1;
Vector<Pair<String, Integer>> finalList = new Vector<Pair<String, Integer>>();
Pair<String, Integer> wordAndCount = new Pair<String, Integer>(wordList.get(1), counter); // wordList contains " " as first word, starting at wordList.get(1) skips it.
for(int i= 1; i<wordList.size();i++){
    if(wordAndCount.getLeft().equals(wordList.get(i))){
        wordAndCount = new Pair<String, Integer>(wordList.get(i), counter++);
    }
    else if(!wordAndCount.getLeft().equals(wordList.get(i))){
        finalList.add(wordAndCount);
        wordAndCount = new Pair<String, Integer>(wordList.get(i), counter=1);
    }
}
finalList.add(wordAndCount); //UGLY!!
As a secondary question, this gives me a vector with all the words in alphabetical order (as in the array). I want to have it sorted by occurrence, then alphabetically within that.
Would the best option be:
Iterate down the vector, testing each occurrence int with the one above, using Collections.swap() if it was higher, then checking the next one above (as its now moved up 1) and so on until it's no longer larger than anything above it. Any occurrence of 1 could be skipped.
Iterate down the vector again, testing each element against the first element of the vector and then iterating downwards until the number of occurrences is lower and inserting it above that element. All occurrences of 1 would once again be skipped.
The first method would do more in terms of iterating over the elements, but the second one requires you to add and remove components of the vector (I think?), so I don't know which is more efficient, or whether it's worth considering.

Why not use a Map to solve your problem?
String[] words // your incoming array of words.
Map<String, Integer> wordMap = new HashMap<String, Integer>();
for(String word : words) {
    if(!wordMap.containsKey(word))
        wordMap.put(word, 1);
    else
        wordMap.put(word, wordMap.get(word) + 1);
}
Sorting can be done using Java's sorted collections:
SortedMap<Integer, SortedSet<String>> sortedMap = new TreeMap<Integer, SortedSet<String>>();
for(Entry<String, Integer> entry : wordMap.entrySet()) {
    if(!sortedMap.containsKey(entry.getValue()))
        sortedMap.put(entry.getValue(), new TreeSet<String>());
    sortedMap.get(entry.getValue()).add(entry.getKey());
}
Nowadays you should leave the sorting to the language's libraries. They have been proven correct over the years.
Note that the code may use a lot of memory because of all the data structures involved, but that is what we pay for higher level programming (and memory is getting cheaper every second).
I didn't run the code to see that it works, but it does compile (copied it directly from eclipse)

Re: sorting, one option is to write a custom Comparator which first compares the number of times each word appears, then (if equal) compares the words alphabetically. Note that the method to implement on Comparator is compare, not compareTo.
private final class PairComparator implements Comparator<Pair<String, Integer>> {
    @Override
    public int compare(Pair<String, Integer> p1, Pair<String, Integer> p2) {
        /* compare by Integer */
        /* compare by String, if necessary */
        /* return a negative number, a positive number, or 0 as appropriate */
    }
}
You'd then sort finalList by calling Collections.sort(finalList, new PairComparator());
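A filled-in sketch of that comparison, applied to Map entries instead of the asker's Pair class (whose full API is not shown). Sorting by count in descending order is an assumption; the question only says "sorted by occurrence":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequency {
    // Counts the words, then sorts by count (descending) and alphabetically within equal counts.
    static List<Map.Entry<String, Integer>> sortedCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // first sighting stores 1, later ones add 1
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());              // higher counts first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // then A to Z
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "pear", "fig", "apple", "pear"};
        for (Map.Entry<String, Integer> e : sortedCounts(words)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // pear 3
        // apple 2
        // fig 1
    }
}
```

List.sort delegates to the library's sort implementation, so neither of the hand-written swapping schemes from the question is needed.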

How about using the Google Guava library?
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
    multiset.add(word);
}
int countFoo = multiset.count("foo");
From their javadocs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Simple enough?
