I am trying to present a simplified version of my requirement here for ease of understanding.
I have this class
public class MyClass {
private byte[] data1;
private byte[] data2;
private long hash1; // Hash value for data1
private long hash2; // Hash value for data2
// getters and setters
}
Now I need to compare 2 List instances of this class, find how many hash1 values match between the 2 lists, and, for all matches, how many of the corresponding hash2 values also match. Each of the 2 lists will have about 10 million MyClass objects.
Now I am planning to iterate over the first list and search in the second one. Is there a way I can optimize the search by sorting or ordering in any particular way? Should I sort both lists or only one?
The best solution is to iterate; there is no faster approach than this. You can build a HashMap and take advantage of the fact that a map does not add the same key twice, but creating it has its own overhead.
Sort only the second list, iterate over the first, and binary search in the second: the sort is O(n log n), and binary searching for n items is also O(n log n).
Or use a HashSet for the second list, iterate over the first, and look each element up in the second: O(n) on average.
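The sort-then-binary-search option above can be sketched roughly as follows. This is a minimal illustration that assumes the hash1 values have already been pulled out into long arrays; the class name SortedHashSearch and the method countMatches are hypothetical.

```java
import java.util.Arrays;

class SortedHashSearch {
    // Counts how many values in 'first' also occur in 'second'.
    // Sorting costs O(n log n); each of the n lookups costs O(log n).
    static int countMatches(long[] first, long[] second) {
        long[] sorted = second.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int matches = 0;
        for (long h : first) {
            if (Arrays.binarySearch(sorted, h) >= 0) {
                matches++;
            }
        }
        return matches;
    }
}
```

Working on primitive long arrays also avoids boxing 10 million Long objects, which is one reason to prefer this over a hash-based collection when memory is tight.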
If you have to check all the elements, I think you should iterate over the first list and have a HashMap for the second one, as AmitD said.
You just have to correctly override equals and hashCode in your MyClass class. Finally, I would recommend using basic types as much as possible. For example, for the first list, it would be better to use a simple array instead of a List.
Also, at the beginning you could check which of the two lists is shorter (if there's a difference in size) and iterate over that one.
I think you should create a hashmap for one of the lists (say list1) -
Map<Long, MyClass> map = new HashMap<Long, MyClass>(list1.size());//specify the capacity
//populate map like - put(myClass.getHash1(), myClass) : for each element in the list
Now just iterate through the second list (there is no point in sorting both) -
int hash1MatchCount = 0;
int hash2MatchCount = 0;
for(MyClass myClass : list2) {
MyClass mc = map.get(myClass.getHash1());
if(mc != null) {
hash1MatchCount++;
if(myClass.getHash2() == mc.getHash2()) {
hash2MatchCount++;
}
}
}
Note: This assumes that hash1 values are unique within list1; if there are duplicates, later entries overwrite earlier ones in the map.
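If hash1 can repeat within list1, one possible variation is to keep every hash2 seen for a given hash1. The sketch below is only an assumption about how duplicates should be counted (an element of list2 counts once if any list1 element shares its hash1); the class name DuplicateAwareMatcher is hypothetical, and each {hash1, hash2} row stands in for one MyClass object.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateAwareMatcher {
    // Returns {hash1MatchCount, hash2MatchCount}.
    // Each row is a {hash1, hash2} pair standing in for one MyClass object.
    static int[] countMatches(long[][] list1, long[][] list2) {
        // Group all hash2 values of list1 under their hash1.
        Map<Long, Set<Long>> hash2ByHash1 = new HashMap<>();
        for (long[] e : list1) {
            hash2ByHash1.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
        }
        int hash1Matches = 0;
        int hash2Matches = 0;
        for (long[] e : list2) {
            Set<Long> candidates = hash2ByHash1.get(e[0]);
            if (candidates != null) {
                hash1Matches++;
                if (candidates.contains(e[1])) {
                    hash2Matches++;
                }
            }
        }
        return new int[] {hash1Matches, hash2Matches};
    }
}
```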
Related
My program has to use the Collections sort method to sort the ArrayList of Strings lexicographically but each String has a corresponding integer value stored in a separate ArrayList. I want to sort them both the same so the integer values stay with the correct Strings. And if you know a better way to store both values I'm all ears.
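import java.util.ArrayList;
import java.util.Scanner;
public class a5p1b {
public static void main(String[] args) {
// "A-Z" rather than "A-z": the latter also admits [, \, ], ^, _ and ` as word characters
Scanner input = new Scanner(System.in).useDelimiter("[^a-zA-Z]+");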
// ArrayLists to store the Strings and the frequencies
ArrayList<String> lst = new ArrayList<String>();
ArrayList<Integer> intLst = new ArrayList<Integer>();
//loops through as long as there is user input
while (input.hasNext()) {
String str = input.next().toLowerCase();
// if the list already has the string it doesn't add it and it
// ups the count by 1
if (lst.contains(str)) {
int index = lst.indexOf(str);
intLst.set(index, intLst.get(index) + 1);
} else {
// if the word hasn't been found yet, it adds it to the list
lst.add(str);
intLst.add(1);
}
}
}
}
You are getting your abstractions wrong. If that string and that number belong together, then do not keep them in two distinct lists.
Instead create a class (or maybe use one of the existing Pair classes) that holds those two values. You can then provide an equals method for that class; plus a specific comparator, that only compares the string elements.
Finally, you put objects of that class into a single list; and then you sort that list.
The whole idea of good OO programming is to create helpful abstractions!
For the record: as dnault suggests, if there is really no "tight" coupling between strings and numbers you could also use a TreeMap (to be used as TreeMap<String, Integer>) to take care of sorting strings that have a number with them.
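A minimal sketch of the "one class, one list" idea described above. The class name WordCount and the helper sortByWord are hypothetical, and the equals method mentioned in the answer is left out for brevity.

```java
import java.util.Comparator;
import java.util.List;

class WordCount {
    final String word;
    int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Sorts the entries lexicographically by word; each count travels with its word.
    static void sortByWord(List<WordCount> entries) {
        entries.sort(Comparator.comparing((WordCount wc) -> wc.word));
    }
}
```

Because the string and its count live in the same object, there is no second list to keep in step with the first.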
Try
inList.sort(Comparator.comparing(i -> i.toString()));
Although I don't think keeping two lists is a good idea.
You should use a Map to associate each unique String key with an Integer value.
Then you can copy the keys returned by keySet() into a List and invoke Collections.sort on it (Collections.sort accepts only a List, not a Set).
Additionally, if you use a SortedMap such as TreeMap, it is not necessary to sort the keys. However that solution may not fulfill the requirements of your "Assignment 5 Problem 1b."
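As a rough sketch of the SortedMap variant, assuming the words have already been split out of the input (the class name WordFrequency and the method count are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {
    // Counts occurrences of each word; TreeMap keeps its keys in
    // natural (lexicographic) order, so no separate sort is needed.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value
            counts.merge(w.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }
}
```

Iterating over the returned map's entrySet() then yields each word with its count in sorted order.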
I am trying to create a linked list that will take a large amount of data, either integers or strings, and get the frequency that they occur. I know how to create a basic linked list that would achieve this but since the amount of data is so large, I want to find a quicker way to sort through the data, instead of going through the entire linked list every time I call a certain method. In order to do this I need to make a Pair of <Object, Integer> where the Object is the data and the integer is the frequency it occurs.
So far I have tried creating arrays and lists that would help me sort out the data but cannot figure out how to get it into a Pair that represents the data and frequency. If you have any ideas that can help me at least get started that would be much appreciated.
First of all you must define your own data type, let's say
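public class FrequencyCount<T> implements Comparable<FrequencyCount<T>>
{
public final T data;
public int frequency;
public FrequencyCount(T data, int frequency) {
this.data = data;
this.frequency = frequency;
}
public int compareTo(FrequencyCount<T> other) {
// implement this method to choose your correct natural ordering;
// ascending frequency is only an illustrative assumption, and with it
// a TreeSet would keep just one entry per distinct frequency
return Integer.compare(frequency, other.frequency);
}
}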
With such an object, sorting becomes trivial:
List<FrequencyCount<Some>> data = new ArrayList<FrequencyCount<Some>>();
Collections.sort(data);
Set<FrequencyCount<Some>> sortedData = new TreeSet<FrequencyCount<Some>>(data);
You could place all values into a List, create a Set from it and then iterate over the Set to find the frequency in the List using Collections.frequency: http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#frequency(java.util.Collection,%20java.lang.Object)
List<Integer> allValues = ...;
Set<Integer> uniqueValues = new HashSet<Integer>(allValues);
for(Integer val : uniqueValues) {
int frequency = Collections.frequency(allValues, val);
// use val and frequency as key and value as you wish
}
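Note that Collections.frequency scans the whole list on every call, so the loop above costs roughly (number of unique values) × (list size). For large inputs, a single-pass count is an alternative; the sketch below uses hypothetical names (FrequencyCounter, countAll) and is not part of the original answer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyCounter {
    // Builds a value -> frequency map in a single pass over the input.
    static <T> Map<T, Integer> countAll(List<T> values) {
        Map<T, Integer> counts = new HashMap<>();
        for (T v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```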
I am trying to merge multiple sorted lists into one TreeSet, and then I am thinking of applying a binary search on that TreeSet to retrieve an element in O(log n) time.
Below is my code, in which I pass a List of Lists to one of my methods and combine them into a TreeSet to avoid duplicates. All the lists inside inputs are sorted:
private TreeSet<Integer> tree = new TreeSet<Integer>();
public void mergeMultipleLists(final List<List<Integer>> inputs) {
tree = new TreeSet<Integer>();
for (List<Integer> input : inputs) {
for(Integer ii : input) {
tree.add(ii);
}
}
}
public List<Integer> getItem(final Integer x) {
// extract elements from TreeSet in O(log n)
}
First of all, is this the right way to merge multiple sorted lists into a TreeSet? Is there any direct way to merge multiple sorted lists into a TreeSet efficiently?
Secondly, how would I extract an element from that TreeSet in O(log n) time? I would like to find an element x in that TreeSet; if it is there, return it, and if it is not there, return the next largest value from the TreeSet.
Or maybe I would be better off with a different data structure than the one I am using currently?
UPDATED CODE:-
private TreeSet<Integer> tree = new TreeSet<Integer>();
public SearchItem(final List<List<Integer>> inputs) {
tree = new TreeSet<Integer>();
for (List<Integer> input : inputs) {
tree.addAll(input);
}
}
public Integer getItem(final Integer x) {
if(tree.contains(x)) {
return x;
} else {
// now how do I extract next largest
// element from it if x is not present
}
}
TreeSet is backed by a NavigableMap, a TreeMap specifically. Calling contains() on a TreeSet delegates to TreeMap.containsKey(), which performs a binary search down the tree.
You can check if an object is contained in the set by using TreeSet.contains(), but you have to have the object first. If you want to be able to look up and retrieve an object, then a Map implementation will be better.
You could use TreeSet.floor(), which according to the docs
Returns the greatest element in this set less than or equal to the given element, or null if there is no such element.
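If "next largest" in the question means the smallest element at or above x, then TreeSet.ceiling() is the counterpart of floor(). Both run in O(log n). A small sketch, with hypothetical class and method names:

```java
import java.util.TreeSet;

class NearestLookup {
    // Returns x if present, otherwise the smallest element greater than x,
    // or null if no such element exists.
    static Integer atOrAbove(TreeSet<Integer> tree, int x) {
        return tree.ceiling(x);
    }

    // Returns x if present, otherwise the greatest element less than x,
    // or null if no such element exists.
    static Integer atOrBelow(TreeSet<Integer> tree, int x) {
        return tree.floor(x);
    }
}
```

With either method, the separate contains() check in getItem is no longer needed, since an element that is present is returned as-is.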
TreeSet is, by its nature, a sorted set, and it uses a red-black tree (via TreeMap) as its backing structure.
Basically: TreeSet.add(e) -> TreeMap.put(e, PRESENT), where PRESENT is a shared dummy value.
As it is already a sorted binary tree structure, any 'get' or 'contains' will result in an O(log n) operation.
Your code and your question though don't line up.
You're flattening a List<List<Integer>> and just putting them all in to get all unique elements (or, at least, that's what this code will do).
But then your following method says "given this integer, give me a List<Integer>" which isn't achievable in the above code
So, let me answer your questions in order:
Sure, yes.
No. You misunderstand Sets (by design, you can't extract from them). If Set.contains(e) returns true, then you already HAVE the element and need not extract anything.
If you need to do something like a "Set extraction", then use a TreeMap, or turn your set back into a sorted list and do myList.get(Collections.binarySearch(myList, myElement)); note that binarySearch returns a negative value when the element is absent.
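For the list-based route, the negative return value of Collections.binarySearch encodes the insertion point as -(insertionPoint) - 1, which is enough to find the next larger element. A minimal sketch, assuming a sorted list and hypothetical names (SortedListLookup, findOrNext):

```java
import java.util.Collections;
import java.util.List;

class SortedListLookup {
    // Returns key if present, otherwise the next larger element,
    // or null if every element is smaller than key. The list must be sorted.
    static Integer findOrNext(List<Integer> sorted, Integer key) {
        int idx = Collections.binarySearch(sorted, key);
        if (idx < 0) {
            idx = -idx - 1; // convert the "not found" result to the insertion point
        }
        return idx < sorted.size() ? sorted.get(idx) : null;
    }
}
```

This is O(log n) only for random-access lists such as ArrayList; on a LinkedList the search degrades to linear traversal.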
I have the following problem: I need to find pairs of the same elements in two lists, which are unordered. The thing about these two lists is that they are "roughly equal" - only certain elements are shifted by a few indexes e.g. (Note, these objects are not ints, I am just using integers in this example):
[1,2,3,5,4,8,6,7,10,9]
[1,2,3,4,5,6,7,8,9,10]
My first attempt would be to iterate through both lists and generate two HashMaps based on some unique key for each object. Then, upon the second pass, I would simply pull the elements from both maps. This yields O(2N) in space and time.
I was thinking about a different approach: we would keep pointers to the current element in both lists, as well as a currentlyUnmatched set for each list. The pseudocode would be something like the following:
while(elements to process)
elem1 = list1.get(index1)
elem2 = list2.get(index2)
if(elem1 == elem2){ //do work
... index1++;
index2++;
}
else{
//Move the index of the list that has no unmatched elems
if(firstListUnmatched.size() == 0){
//Didn't find it in the other list either, so we save it for later
if(secondListUnmatched.remove(elem1) != true)
firstListUnmatched.insert(elem1)
index1++
}
else {
// same but with the other index
}
}
The above probably does not work... I just wanted to get a rough idea of what you think about this approach. Basically, this maintains a hash set alongside each list, whose size << problem size. This should be ~O(N) for a small number of misplaced elements and for small "gaps". Anyway, I look forward to your replies.
EDIT: I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching
I cannot simply return a set intersection of two object lists, as I need to perform operations (multiple operations even) on the objects I find as matching/non-matching
You can maintain a set of the objects which don't match. This will be O(M) in space where M is the largest number of swapped elements at any point. It will be O(N) for time where N is the number of elements.
interface Listener<T> {
void matched(T t1);
void onlyIn1(T t1);
void onlyIn2(T t2);
}
public static <T> void compare(List<T> list1, List<T> list2, Listener<T> tListener) {
Set<T> onlyIn1 = new HashSet<T>();
Set<T> onlyIn2 = new HashSet<T>();
for (int i = 0; i < list1.size(); i++) {
T t1 = list1.get(i);
T t2 = list2.get(i);
if (t1.equals(t2)) {
tListener.matched(t1);
continue;
}
if (onlyIn2.remove(t1))
tListener.matched(t1);
else
onlyIn1.add(t1);
// t2 was seen earlier in list1, so it is a match as well
if (onlyIn1.remove(t2))
tListener.matched(t2);
else
onlyIn2.add(t2);
}
for (T t1 : onlyIn1)
tListener.onlyIn1(t1);
for (T t2 : onlyIn2)
tListener.onlyIn2(t2);
}
If I have understood your question correctly, you can use Collection.retainAll and then iterate over the collection that has been retained and do what you have to do. Note that retainAll modifies the list it is called on.
list2.retainAll(list1);
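Calling retainAll with a plain list as the argument checks every element with a linear contains(), which gets slow for large inputs. One possible refinement, sketched here with hypothetical names (Intersection, common) and assuming the elements implement equals and hashCode consistently, is to copy first and pass a HashSet:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

class Intersection {
    // Returns the elements of list2 that also occur in list1, in list2's order,
    // without modifying either input.
    static <T> List<T> common(List<T> list1, List<T> list2) {
        List<T> result = new ArrayList<>(list2);  // copy, since retainAll mutates
        result.retainAll(new HashSet<>(list1));   // HashSet makes each contains() O(1) on average
        return result;
    }
}
```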
Approaches based on a sorted map such as TreeMap will be O(n log(n)) at best, because building the map is effectively an insertion sort (a HashMap avoids the ordering work, at the cost of hashing every element). The effect is to do an insertion sort on both lists and then compare them.
If the lists are nearly sorted to begin with, a sort step should take less time than the average case, which scales with O(n log(n)), so just sort both and compare. This allows you to step through both lists and perform your operations on the items that match or do not match as appropriate.
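The sort-both-and-compare idea can be sketched as a merge-style walk over the two sorted copies. This is only an illustration under stated assumptions: the elements are Comparable, no element repeats within a list, and the names SortedCompare and matches are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCompare {
    // Sorts copies of both lists, then walks them in lockstep.
    // Returns the elements present in both; assumes no duplicates within a list.
    static <T extends Comparable<? super T>> List<T> matches(List<T> a, List<T> b) {
        List<T> x = new ArrayList<>(a);
        List<T> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        List<T> common = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.size() && j < y.size()) {
            int c = x.get(i).compareTo(y.get(j));
            if (c == 0) {          // same element in both lists
                common.add(x.get(i));
                i++;
                j++;
            } else if (c < 0) {    // x[i] occurs only in the first list
                i++;
            } else {               // y[j] occurs only in the second list
                j++;
            }
        }
        return common;
    }
}
```

The two non-matching branches are where per-element work for "only in list 1" and "only in list 2" would go.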
I have a nested list like below (but it has 1,000's of the holder lists within the one main list). Say I need to sort the main list listEmailData by the value at the holder.get(2) index of each of its holder lists. I can't seem to figure out how to do this; any advice is appreciated.
ArrayList listEmailData = new ArrayList();
ArrayList holder = new ArrayList();
holder.add(3);
holder.add(323);
holder.add(2342);
listEmailData.add(holder);
EDIT: To clarify, I have a list where each list entry contains a sub-list, within this sub-list a specific index contains a value that is a ranking. I need to sort the main list based on this ranking value within each sub-list.
2ND EDIT: Thanks for the help on this. I got it working, but it seems that it's putting smaller numbers first and larger numbers later. I was hoping to reverse this so it goes from largest to smallest as I am
You should implement Comparator<T> to compare lists, then call
Collections.sort(listEmailData, comparator);
Your comparator would have to compare any two "sublists" - e.g. by fetching a particular value. For example:
public class ListComparator implements Comparator<List<Integer>>
{
private final int indexToCompare;
public ListComparator(int indexToCompare)
{
this.indexToCompare = indexToCompare;
}
public int compare(List<Integer> first, List<Integer> second)
{
// TODO: null checking
Integer firstValue = first.get(indexToCompare);
Integer secondValue = second.get(indexToCompare);
return firstValue.compareTo(secondValue);
}
}
Note that this is using generics - hopefully your real code is too.
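For the largest-to-smallest order asked about in the second edit, the comparator can be reversed, for example with Collections.reverseOrder(new ListComparator(2)). The sketch below shows the same idea in a self-contained form using Comparator.comparing and reversed(); the names RankSort and sortDescending are hypothetical, and it assumes every inner list has an element at the given index.

```java
import java.util.Comparator;
import java.util.List;

class RankSort {
    // Sorts the outer list by the value at 'index' of each inner list, largest first.
    static void sortDescending(List<List<Integer>> rows, int index) {
        Comparator<List<Integer>> byRank = Comparator.comparing(row -> row.get(index));
        rows.sort(byRank.reversed());
    }
}
```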