How to count similar rows in Arraylist

I have an ArrayList list1 of type SearchList, which is my POJO class. I want to find the similar rows in list1 and how often each of them occurs.
How do I achieve this?
My code:
List<SearchList> list1 = new ArrayList<SearchList>();
SearchList sList = new SearchList();
for (int k = 0; k < list.size(); k++) {
sList.setArea(list.get(k).getArea());
sList.setLocation(list.get(k).getLocation());
list1.add(sList);
}
I have written this to count the frequencies, but the loop body never executes:
Set<SearchList> uniqueSet = new HashSet<SearchList>(list1);
for (SearchList temp : uniqueSet) {
System.out.println(temp + ": " + Collections.frequency(list1, temp));
}

Try implementing equals() on SearchList, since Collections.frequency() uses it to compare elements (the HashSet also needs a matching hashCode()), and make sure that uniqueSet is not empty.

You should do the following:
Make sure there are entries in your globally defined list (and consider giving it a less misleading name).
Make sure you add new instances of SearchList to list1. At the moment you are adding the same SearchList object over and over to list1, so whenever you call setArea or setLocation on sList, all of your list entries change their values. Each list entry is just a reference pointing to the same object in memory.
Make sure you supply proper equals() and hashCode() implementations for your SearchList class.
See this question on equals() and hashCode(): Why do I need to override the equals and hashCode methods in Java?
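A minimal sketch of the three fixes above, assuming SearchList has only the area and location fields with plain getters and setters (the real class may have more). The helper names copyRows and row are hypothetical; Arrays.asList keeps it Java 8 compatible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Assumed shape of the question's POJO: only area and location, with getters/setters.
class SearchList {
    private String area;
    private String location;

    public String getArea() { return area; }
    public void setArea(String area) { this.area = area; }
    public String getLocation() { return location; }
    public void setLocation(String location) { this.location = location; }

    // Two rows are "similar" when both area and location are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SearchList)) return false;
        SearchList other = (SearchList) o;
        return Objects.equals(area, other.area) && Objects.equals(location, other.location);
    }

    // Consistent with equals(), so the HashSet treats similar rows as one element.
    @Override
    public int hashCode() {
        return Objects.hash(area, location);
    }

    @Override
    public String toString() {
        return area + "/" + location;
    }
}

class FrequencyDemo {
    // Hypothetical helper: copies every source row into its OWN SearchList instance.
    static List<SearchList> copyRows(List<SearchList> list) {
        List<SearchList> list1 = new ArrayList<>();
        for (SearchList source : list) {
            SearchList sList = new SearchList(); // created inside the loop, not shared
            sList.setArea(source.getArea());
            sList.setLocation(source.getLocation());
            list1.add(sList);
        }
        return list1;
    }

    // Hypothetical factory for the sample data.
    static SearchList row(String area, String location) {
        SearchList s = new SearchList();
        s.setArea(area);
        s.setLocation(location);
        return s;
    }

    public static void main(String[] args) {
        List<SearchList> list = Arrays.asList(row("A", "India"), row("B", "India"), row("A", "India"));
        List<SearchList> list1 = copyRows(list);
        Set<SearchList> uniqueSet = new HashSet<>(list1);
        for (SearchList temp : uniqueSet) {
            System.out.println(temp + ": " + Collections.frequency(list1, temp));
        }
    }
}
```

With these three sample rows it reports A/India twice and B/India once; the line order follows the HashSet and is not guaranteed.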

List<SearchList> list1= new ArrayList<SearchList>(); //list1 is empty
list1 is empty, so the code within the for loop:
for (SearchList temp : uniqueSet) {
System.out.println(temp + ": " + Collections.frequency(list1,
temp));
}
would not execute (uniqueSet must contain at least one element for the loop body to run), because the set will also be empty:
Set<SearchList> uniqueSet = new HashSet<SearchList>(list1); // set is empty

If you want to determine distinct rows based on both area and location (this relies on the equals() and hashCode() overrides in the SearchList class below), then with Java 8 streams it is as simple as:
int countSimilar = (int) searchListList.stream().distinct().count();
Or, if the distinctness of a row depends only on area:
int countSimilarByArea = searchListList
.stream()
.collect(
Collectors.toCollection(() -> new TreeSet<SearchList>((
p1, p2) -> p1.getArea().compareTo(p2.getArea()))))
.size();
Or, if the distinctness of a row depends only on location:
int countSimilarByLocation = searchListList
.stream()
.collect(
Collectors.toCollection(() -> new TreeSet<SearchList>((
p1, p2) -> p1.getLocation().compareTo(
p2.getLocation())))).size();
Usage:
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
final class SearchList {
private final String area;
private final String location;
public SearchList(String area, String location) {
this.area = area;
this.location = location;
}
public String getArea() {
return area;
}
@Override
public String toString() {
return "SearchList [area=" + area + ", location=" + location + "]";
}
public String getLocation() {
return location;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((area == null) ? 0 : area.hashCode());
result = prime * result
+ ((location == null) ? 0 : location.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SearchList other = (SearchList) obj;
if (area == null) {
if (other.area != null)
return false;
} else if (!area.equals(other.area))
return false;
if (location == null) {
if (other.location != null)
return false;
} else if (!location.equals(other.location))
return false;
return true;
}
}
public class CountSimilar {
public static void main(String[] args) {
List<SearchList> searchListList = new ArrayList<>();
searchListList.add(new SearchList("A", "India"));
searchListList.add(new SearchList("B", "India"));
searchListList.add(new SearchList("A", "India"));
searchListList.add(new SearchList("A", "USA"));
searchListList.add(new SearchList("A", "USA"));
int countSimilar = (int) searchListList.stream().distinct().count();
System.out.println(countSimilar);
int countSimilarByArea = searchListList
.stream()
.collect(
Collectors.toCollection(() -> new TreeSet<SearchList>((
p1, p2) -> p1.getArea().compareTo(p2.getArea()))))
.size();
System.out.println(countSimilarByArea);
int countSimilarByLocation = searchListList
.stream()
.collect(
Collectors.toCollection(() -> new TreeSet<SearchList>((
p1, p2) -> p1.getLocation().compareTo(
p2.getLocation())))).size();
System.out.println(countSimilarByLocation);
}
}
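The code above counts distinct rows, while the question also asks how often each row occurs. A sketch using Collectors.groupingBy with Collectors.counting(), assuming the same area/location equality as the SearchList class above (re-declared compactly with java.util.Objects so the snippet compiles on its own; CountFrequency and frequencies are hypothetical names):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Compact stand-in for the SearchList class above (same fields, same equality semantics).
final class SearchList {
    private final String area;
    private final String location;

    SearchList(String area, String location) {
        this.area = area;
        this.location = location;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SearchList)) return false;
        SearchList other = (SearchList) o;
        return Objects.equals(area, other.area) && Objects.equals(location, other.location);
    }

    @Override
    public int hashCode() {
        return Objects.hash(area, location);
    }

    @Override
    public String toString() {
        return "SearchList [area=" + area + ", location=" + location + "]";
    }
}

class CountFrequency {
    // Maps each distinct row to its number of occurrences; LinkedHashMap keeps first-seen order.
    static Map<SearchList, Long> frequencies(List<SearchList> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<SearchList> searchListList = Arrays.asList(
                new SearchList("A", "India"),
                new SearchList("B", "India"),
                new SearchList("A", "India"),
                new SearchList("A", "USA"),
                new SearchList("A", "USA"));
        frequencies(searchListList).forEach((row, count) -> System.out.println(row + ": " + count));
    }
}
```

For the sample data this lists (A, India) with 2, (B, India) with 1 and (A, USA) with 2, in the order the rows first appear, because the map factory is a LinkedHashMap.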

Related

Java group by custom variable and return same object

I want to count the number of duplicates in my list by a custom field (myHash):
Map<PersonHash, Long> result = list.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
This counts duplicates by id, which is the field used in hashCode and equals. How can I count them by a custom field instead? In my case it is byte[] myHash.
My POJO:
public class PersonHash implements Serializable {
private Long id;
private byte[] myHash;
....
}

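You can't group by myHash and get a PersonHash instance as the key if myHash is not the identifier used in equals and hashCode.
If myHash is not part of equals and hashCode,
add a getter for myHash
public class PersonHash implements Serializable {
// ... existing fields ...
public byte[] getMyHash() {
return myHash;
}
}
and group by the contents of the array. Arrays use identity-based equals and hashCode, so grouping directly by byte[] would put every instance into its own group; wrapping the array in a java.nio.ByteBuffer gives a key whose equals and hashCode compare the contents:
Map<ByteBuffer, Long> result = list.stream()
.collect(Collectors.groupingBy(ph -> ByteBuffer.wrap(ph.getMyHash()), Collectors.counting()));
Afterwards you can match the list with the results to find the objects with the given hash.
Or use
Map<ByteBuffer, List<PersonHash>> result = list.stream()
.collect(Collectors.groupingBy(ph -> ByteBuffer.wrap(ph.getMyHash())));
to get the list of PersonHash objects with the same myHash value.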
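A self-contained sketch of the ByteBuffer-based grouping, using a minimal stand-in for PersonHash (only id and myHash, no equals/hashCode override; GroupByHash and countByHash are hypothetical names):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal stand-in for the question's PersonHash; equals/hashCode are NOT overridden.
class PersonHash {
    private final Long id;
    private final byte[] myHash;

    PersonHash(Long id, byte[] myHash) {
        this.id = id;
        this.myHash = myHash;
    }

    Long getId() { return id; }
    byte[] getMyHash() { return myHash; }
}

class GroupByHash {
    // ByteBuffer.wrap yields a key whose equals/hashCode look at the array contents.
    static Map<ByteBuffer, Long> countByHash(List<PersonHash> list) {
        return list.stream()
                .collect(Collectors.groupingBy(ph -> ByteBuffer.wrap(ph.getMyHash()), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<PersonHash> list = Arrays.asList(
                new PersonHash(1L, new byte[]{1, 2, 3}),
                new PersonHash(2L, new byte[]{4, 5, 6}),
                new PersonHash(3L, new byte[]{1, 2, 3}));
        // key.array() returns the wrapped byte[] for buffers created with ByteBuffer.wrap
        countByHash(list).forEach((key, count) ->
                System.out.println(Arrays.toString(key.array()) + " " + count));
    }
}
```

This should print one line with count 2 for [1, 2, 3] and one with count 1 for [4, 5, 6], in an unspecified order. Grouping by the raw byte[] instead would produce three groups of 1.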

Another approach that doesn't change your current POJO (changing equals and hashCode might cause errors somewhere else) is to sort your list by the myHash field and then use an AtomicReference to build your map. Note that this relies on a sequential stream and a non-empty list:
List<PersonHash> list = ...; // your list
Comparator<PersonHash> byMyHash = (a,b) -> Arrays.compare(a.getMyHash(),b.getMyHash()); // Arrays.compare(byte[], byte[]) requires Java 9+
BiPredicate<PersonHash,PersonHash> pred = (a,b) -> Arrays.equals(a.getMyHash(),b.getMyHash());
list.sort(byMyHash);
AtomicReference<PersonHash> ai = new AtomicReference<>(list.get(0));
Map<PersonHash, Long> result = list.stream()
.collect(Collectors.groupingBy(ph -> {
if (pred.test(ph,ai.get())){
return ai.get();
}
else {
ai.set(ph);
return ph;
}
} , Collectors.counting()));
System.out.println(result);
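If you would rather get a PersonHash back as the key without the AtomicReference bookkeeping, a TreeMap built from the same kind of comparator can serve as the map factory for groupingBy. This is a different technique from the answer above, sketched under the assumption of Java 9+ (for Arrays.compare) and a minimal PersonHash stand-in; GroupByComparator and countByHash are hypothetical names:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal stand-in for the question's PersonHash; equals/hashCode are NOT overridden.
class PersonHash {
    private final Long id;
    private final byte[] myHash;

    PersonHash(Long id, byte[] myHash) {
        this.id = id;
        this.myHash = myHash;
    }

    Long getId() { return id; }
    byte[] getMyHash() { return myHash; }
}

class GroupByComparator {
    // Orders PersonHash instances by the contents of myHash (Java 9+).
    static final Comparator<PersonHash> BY_MY_HASH =
            (a, b) -> Arrays.compare(a.getMyHash(), b.getMyHash());

    // TreeMap treats keys with compare(...) == 0 as the same key, so the first
    // PersonHash seen for each hash value stays as the representative key.
    static Map<PersonHash, Long> countByHash(List<PersonHash> list) {
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(),
                        () -> new TreeMap<PersonHash, Long>(BY_MY_HASH),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<PersonHash> list = Arrays.asList(
                new PersonHash(1L, new byte[]{1, 2, 3}),
                new PersonHash(2L, new byte[]{4, 5, 6}),
                new PersonHash(3L, new byte[]{1, 2, 3}));
        countByHash(list).forEach((ph, count) ->
                System.out.println("id=" + ph.getId() + " " + Arrays.toString(ph.getMyHash()) + " " + count));
    }
}
```

Here the output is sorted by hash: id=1 with [1, 2, 3] and count 2, then id=2 with [4, 5, 6] and count 1. The trade-off is that the map's ordering is inconsistent with equals, which TreeMap tolerates but which means it does not strictly obey the general Map contract.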

You have to override the equals and hashCode methods of your object. Then you can do this with Function.identity(). I have overridden those methods as follows:
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
PersonHash personHash = (PersonHash) o;
return hashCompare(personHash) == 0;
}
@Override
public int hashCode() {
return myHash.length;
}
public int hashCompare(PersonHash other) {
int i = this.myHash.length - other.myHash.length;
if (i != 0) {
return i;
}
for (int j = 0; j < this.myHash.length; j++) {
i = this.myHash[j] - other.myHash[j];
if (i != 0) {
return i;
}
}
return 0;
}
And now with the following code:
PersonHash personHash1 = new PersonHash();
personHash1.setId(1L);
personHash1.setMyHash(new byte[]{1, 2, 3});
PersonHash personHash1_2 = new PersonHash();
personHash1_2.setId(3L);
personHash1_2.setMyHash(new byte[]{1, 2, 3});
PersonHash personHash2 = new PersonHash();
personHash2.setId(2L);
personHash2.setMyHash(new byte[]{4, 5, 6});
List<PersonHash> list = new LinkedList<>();
list.add(personHash1);
list.add(personHash1_2);
list.add(personHash2);
Map<PersonHash, Long> result = list.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
result.forEach((k, v) -> System.out.println(Arrays.toString(k.getMyHash()) + " " + v));
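You will get the following output (the order of the entries is not guaranteed, since groupingBy makes no promise about the iteration order of the map it returns):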
[4, 5, 6] 1
[1, 2, 3] 2
PS: Please write a better hashCode() implementation; this one is only for demonstration.
Edit: As @WJS commented, we could override the equals method like this, and then we no longer need the hashCompare function:
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
return Arrays.equals(myHash, ((PersonHash) o).getMyHash());
}
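With the Arrays.equals version of equals, a matching content-based hashCode is Arrays.hashCode(myHash). A compact sketch, assuming PersonHash has only the id and myHash fields with the setters used in the example (CountByMyHash and person are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class PersonHash {
    private Long id;
    private byte[] myHash;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public byte[] getMyHash() { return myHash; }
    public void setMyHash(byte[] myHash) { this.myHash = myHash; }

    // Equal when the myHash arrays have the same contents (id is ignored).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Arrays.equals(myHash, ((PersonHash) o).myHash);
    }

    // Content-based hash, consistent with Arrays.equals above.
    @Override
    public int hashCode() {
        return Arrays.hashCode(myHash);
    }
}

class CountByMyHash {
    // Hypothetical factory for the sample data.
    static PersonHash person(long id, byte[] hash) {
        PersonHash p = new PersonHash();
        p.setId(id);
        p.setMyHash(hash);
        return p;
    }

    static Map<PersonHash, Long> count(List<PersonHash> list) {
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<PersonHash> list = Arrays.asList(
                person(1L, new byte[]{1, 2, 3}),
                person(3L, new byte[]{1, 2, 3}),
                person(2L, new byte[]{4, 5, 6}));
        count(list).forEach((k, v) -> System.out.println(Arrays.toString(k.getMyHash()) + " " + v));
    }
}
```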

finding duplicates using java 8 [duplicate]

This question already has answers here: Java 8, Streams to find the duplicate elements
I have an Employee class with id, name and address fields. Two employees are considered the same if their id and name are exactly the same. I have a list of employees, and my task is to get the collection of duplicate employees.
Here is my code for the Employee class, with hashCode and equals overridden based on the id and name fields.
class Employee {
int id;
String name;
String address;
public Employee(int id, String name, String address) {
this.id = id;
this.name = name;
this.address = address;
}
@Override
public String toString() {
return "Employee [id=" + id + ", name=" + name + ", address=" + address + "]";
}
// auto generated by eclipse based on fields for id and name
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + id;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Employee other = (Employee) obj;
if (id != other.id)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
Now I have this code to find the duplicate employees
public static void main(String[] args) {
Employee e1 = new Employee(1, "John", "SFO");
Employee e2 = new Employee(2, "Doe", "NY");
Employee e3 = new Employee(1, "John", "NJ");
List<Employee> list = Arrays.asList(e1, e2, e3);
Set<Employee> set = new HashSet<>();
for (int i = 0; i < list.size(); i++) {
for (int j = i + 1; j < list.size(); j++) {
if (list.get(i).equals(list.get(j))) {
set.add(list.get(i));
}
}
}
System.out.println(set);
}
This code works fine and gives me the employee with id 1 in my set.
How can I do the same using Java 8 lambdas and streams? Is flatMap helpful in this case?

Your requirement is rather specific and not very useful in most cases. I would do something like this instead:
final Map<Employee, Long> groupedWithCount = employees.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
Now you have all the information you need, and more. Each employee is mapped to its count; for your data the map looks like this:
{
Employee [id=2, name=Doe, address=NY] = 1,
Employee [id=1, name=John, address=SFO] = 2
}
Obviously, duplicates are entries with value > 1.
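A complete sketch of this approach that returns only the duplicates, assuming an Employee class with the same id/name equality as in the question (re-declared here with java.util.Objects to keep the snippet self-contained; FindDuplicates and duplicates are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Equality on id and name only; address is ignored, as in the question.
class Employee {
    final int id;
    final String name;
    final String address;

    Employee(int id, String name, String address) {
        this.id = id;
        this.name = name;
        this.address = address;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Employee other = (Employee) o;
        return id == other.id && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public String toString() {
        return "Employee [id=" + id + ", name=" + name + ", address=" + address + "]";
    }
}

class FindDuplicates {
    // Counts each employee, then keeps only the keys that occur more than once.
    static Set<Employee> duplicates(List<Employee> employees) {
        Map<Employee, Long> groupedWithCount = employees.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return groupedWithCount.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Employee> list = Arrays.asList(
                new Employee(1, "John", "SFO"),
                new Employee(2, "Doe", "NY"),
                new Employee(1, "John", "NJ"));
        System.out.println(duplicates(list));
    }
}
```

For the question's data this prints a set containing only the John employee with id 1 (the SFO instance, since groupingBy keeps the first key it sees).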

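Another approach (assuming static imports of Collectors.groupingBy, Collectors.counting, Collectors.toList and Function.identity):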
list.stream()
.collect(groupingBy(identity(), counting()))
.entrySet()
.stream()
.filter(e -> e.getValue() != 1)
.map(Map.Entry::getKey)
.collect(toList());
or :
list.stream()
.collect(groupingBy(identity()))
.values()
.stream()
.filter(l -> l.size() != 1)
.map(l -> l.get(0)) // The list cannot be empty
.collect(toList());

Best way to map a triplet to an int in Java

I'd like to map triplets to an int, like so:
(12,6,6) -> 1
(1,0,6) -> 1
(2,3,7) -> 0
I need to be able to access the int and each individual value in the triplet.
What's the most efficient way of doing this in Java?
Thanks

Java has no built-in tuple type.
But you can easily create one on your own. Take a look at this simple generic Triple class:
public class Triple<A, B, C> {
private final A mFirst;
private final B mSecond;
private final C mThird;
public Triple(final A first, final B second, final C third) {
this.mFirst = first;
this.mSecond = second;
this.mThird = third;
}
public A getFirst() {
return this.mFirst;
}
public B getSecond() {
return this.mSecond;
}
public C getThird() {
return this.mThird;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((this.mFirst == null) ? 0 : this.mFirst.hashCode());
result = prime * result + ((this.mSecond == null) ? 0 : this.mSecond.hashCode());
result = prime * result + ((this.mThird == null) ? 0 : this.mThird.hashCode());
return result;
}
@Override
public boolean equals(final Object obj) {
if (this == obj) {
return true;
}
if (obj == null) {
return false;
}
if (getClass() != obj.getClass()) {
return false;
}
final Triple<?, ?, ?> other = (Triple<?, ?, ?>) obj;
if (this.mFirst == null) {
if (other.mFirst != null) {
return false;
}
} else if (!this.mFirst.equals(other.mFirst)) {
return false;
}
if (this.mSecond == null) {
if (other.mSecond != null) {
return false;
}
} else if (!this.mSecond.equals(other.mSecond)) {
return false;
}
if (this.mThird == null) {
if (other.mThird != null) {
return false;
}
} else if (!this.mThird.equals(other.mThird)) {
return false;
}
return true;
}
}
The class just holds the three values and provides getters. Additionally it overrides equals and hashCode by comparing all three values.
Don't be scared of how equals and hashCode are implemented. They were generated by an IDE (most IDEs are capable of doing this).
You can then create your mappings using a Map like this:
Map<Triple<Integer, Integer, Integer>, Integer> map = new HashMap<>();
map.put(new Triple<>(12, 6, 6), 1);
map.put(new Triple<>(1, 0, 6), 1);
map.put(new Triple<>(2, 3, 7), 0);
And access them by Map#get:
Triple<Integer, Integer, Integer> key = ...
int value = map.get(key); // throws NullPointerException if key is absent (auto-unboxing of null)
Alternatively, you could add a fourth value to your Triple class, such as an id, or build a Quadruple class instead.
For convenience you could also create a generic factory method like Triple#of and add it to the Triple class:
public static <A, B, C> Triple<A, B, C> of(final A first,
final B second, final C third) {
return new Triple<>(first, second, third);
}
You can then use it to create instances of Triple slightly more compactly. Compare both approaches:
// Using constructor
new Triple<>(12, 6, 6);
// Using factory
Triple.of(12, 6, 6);
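On Java 16 or later, a record gives you the same fields, accessors, equals and hashCode without writing them by hand. A sketch with int components (IntTriple and TripleMapDemo are hypothetical names; this requires Java 16+, and the map values are still boxed Integers):

```java
import java.util.HashMap;
import java.util.Map;

// Java 16+ record: accessors, equals, hashCode and toString are generated from the components.
record IntTriple(int first, int second, int third) { }

class TripleMapDemo {
    static Map<IntTriple, Integer> buildMap() {
        Map<IntTriple, Integer> map = new HashMap<>();
        map.put(new IntTriple(12, 6, 6), 1);
        map.put(new IntTriple(1, 0, 6), 1);
        map.put(new IntTriple(2, 3, 7), 0);
        return map;
    }

    public static void main(String[] args) {
        Map<IntTriple, Integer> map = buildMap();
        IntTriple key = new IntTriple(12, 6, 6);
        // Each component is available through its accessor, and equal triples find the same entry.
        System.out.println(key.first() + "," + key.second() + "," + key.third() + " -> " + map.get(key));
    }
}
```

This prints 12,6,6 -> 1.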

You can use org.apache.commons.lang3.tuple.Triple from Apache Commons Lang:
HashMap<Triple<Integer, Integer, Integer>, Integer> tripletMap = new HashMap<>();
tripletMap.put(Triple.of(12, 6, 6), 1);

Data Structure for keeping frequency count of pairwise data?

I have a table with hundreds of records in which a field is paired with other values of the same field based on an id. I want to know a good data structure for keeping frequency counts of the number of times a pair has appeared together, irrespective of the order in which they appeared.
Sample data:
ID Feature
5 F1
5 F2
6 F1
6 F2
7 F3
7 F1
7 F2
8 F1
9 F1
10 F1
The sample output is:
F1 F2 F3
F1 0 3 1
F2 3 0 1
F3 1 1 0
One option is to sort all features and use a 2-dimensional int array to represent the pairwise data, but then more than half of the array is useless or duplicated: for example, array[i][i] = 0 and array[i][j] = array[j][i]. Given that I have hundreds of features, this approach won't work.
I thought of using a map, but then the key needs to represent a pair, e.g. (F1, F3). I am hoping for other solutions too; if there are none, I will use a map.

Create a class, say MyPair, to use as hash keys; it stores pairs of your items and overrides Object#equals(...) (and Object#hashCode()) so that order doesn't matter (e.g. by ordering the items lexicographically).
Create a Map<MyPair,Integer> to store the frequency count of your pairs.
class MyPair {
public final String feature1;
public final String feature2;
public MyPair(String s1, String s2) {
// Order features so comparison is order-independent.
if (s1.compareTo(s2) <= 0) { // TODO: null check
feature1 = s1;
feature2 = s2;
} else {
feature1 = s2;
feature2 = s1;
}
}
@Override public int hashCode() {
return 31 * feature1.hashCode() + feature2.hashCode(); // TODO: cache for performance.
}
@Override public boolean equals(Object obj) {
if (!(obj instanceof MyPair)) {
return false;
}
MyPair that = (MyPair) obj;
return that.feature1.equals(this.feature1)
&& that.feature2.equals(this.feature2);
}
}
Then you can hash pairs as expected:
Map<MyPair,Integer> freq = new HashMap<MyPair,Integer>();
MyPair pair1 = new MyPair("F1", "F2");
freq.get(pair1); // => null
freq.put(pair1, 1);
MyPair pair2 = new MyPair("F2", "F1");
freq.get(pair2); // => 1
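Putting the pieces together for the sample data, here is a sketch that groups features by ID and increments the count of every unordered pair with Map.merge. FeaturePair is a hypothetical stand-in for MyPair (non-null features assumed), PairCounter and countPairs are hypothetical names, and the sketch assumes a feature appears at most once per ID:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Order-independent pair key in the spirit of MyPair above.
final class FeaturePair {
    final String first;
    final String second;

    FeaturePair(String a, String b) {
        // Store the lexicographically smaller feature first so (F1, F2) equals (F2, F1).
        if (a.compareTo(b) <= 0) { first = a; second = b; } else { first = b; second = a; }
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FeaturePair)) return false;
        FeaturePair p = (FeaturePair) o;
        return first.equals(p.first) && second.equals(p.second);
    }

    @Override public int hashCode() { return Objects.hash(first, second); }
}

class PairCounter {
    // ids[i] and features[i] together form one row of the table.
    static Map<FeaturePair, Integer> countPairs(int[] ids, String[] features) {
        // Group features by ID (the rows do not need to be sorted).
        Map<Integer, List<String>> byId = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            byId.computeIfAbsent(ids[i], k -> new ArrayList<>()).add(features[i]);
        }
        // Count every unordered pair within each ID exactly once.
        Map<FeaturePair, Integer> freq = new HashMap<>();
        for (List<String> group : byId.values()) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    freq.merge(new FeaturePair(group.get(i), group.get(j)), 1, Integer::sum);
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] ids = {5, 5, 6, 6, 7, 7, 7, 8, 9, 10};
        String[] features = {"F1", "F2", "F1", "F2", "F3", "F1", "F2", "F1", "F1", "F1"};
        Map<FeaturePair, Integer> freq = countPairs(ids, features);
        System.out.println(freq.get(new FeaturePair("F2", "F1"))); // prints 3
    }
}
```

For the sample table this yields 3 for (F1, F2) and 1 each for (F1, F3) and (F2, F3), matching the expected output, and only pairs that actually occur take up space in the map.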

This is a simple algorithm. I assume that the data is initially sorted by ID. It may not be written as well as I would like, but it should point you in the right direction :)
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
public class NeighborListExample {
static class Pair {
private String feature;
private int cnt = 1;
Pair(String feature) {
this.feature = feature;
}
void incr() {
cnt++;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((feature == null) ? 0 : feature.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Pair other = (Pair) obj;
if (feature == null) {
if (other.feature != null)
return false;
} else if (!feature.equals(other.feature))
return false;
return true;
}
@Override
public String toString() {
return "(" + feature + ", " + cnt + ")";
}
}
static Map<String, List<Pair>> feature2neighbors = new HashMap<>();
private static int getId(Object[][] data, int i) {
return ((Integer) data[i][0]).intValue();
}
private static String getFeature(Object[][] data, int i) {
return data[i][1].toString();
}
private static void processFeatures(String[] array) {
for (int i = 0; i < array.length; i++) {
for (int j = 0; j < array.length; j++) {
if (i != j) {
List<Pair> pairs = feature2neighbors.get(array[i]);
if (pairs == null) {
pairs = new LinkedList<>();
feature2neighbors.put(array[i], pairs);
}
Pair toAdd = new Pair(array[j]);
int index = pairs.indexOf(toAdd);
if (index == -1) {
pairs.add(toAdd);
} else {
pairs.get(index).incr();
}
}
}
}
}
static void print(Map<String, List<Pair>> feature2neighbors) {
StringBuilder builder = new StringBuilder();
for (Map.Entry<String, List<Pair>> e : feature2neighbors.entrySet()) {
builder.append(e.getKey()).append(" -> ");
Iterator<Pair> it = e.getValue().iterator();
builder.append(it.next().toString());
while(it.hasNext()) {
builder.append(" ").append(it.next().toString());
}
builder.append("\n");
}
System.out.println(builder.toString());
}
public static void main(String[] args) {
//I assume that data is sorted
Object[][] data = { { 5, "F1" }, //
{ 5, "F2" }, //
{ 6, "F1" }, //
{ 6, "F2" }, //
{ 7, "F3" }, //
{ 7, "F1" }, //
{ 7, "F2" }, //
{ 8, "F1" }, //
{ 9, "F1" }, //
{ 10, "F1" }, //
};
List<String> features = new LinkedList<>();
int id = getId(data, 0);
for (int i = 0; i < data.length; i++) {
if (id != getId(data, i)) {
processFeatures(features.toArray(new String[0]));
features = new LinkedList<>();
id = getId(data, i);
}
features.add(getFeature(data, i));
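}
// Process the last ID group, which the loop above never flushes
processFeatures(features.toArray(new String[0]));
print(feature2neighbors);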
}
}
Out:
F1 -> (F2, 3) (F3, 1)
F3 -> (F1, 1) (F2, 1)
F2 -> (F1, 3) (F3, 1)

HashMap in java cannot hash MyObject

I have defined a simple private class named SetOb that contains an int and a Set. In the main method I have a HashMap with SetOb as the key and Integer as the value. As you can see in the main method, when I put a SetOb instance into the HashMap and then look up a different instance with exactly the same values, it returns null. This has happened to me quite a few times when using my own classes like SetOb as HashMap keys. Can someone please point out what I am missing?
Please note that in the constructor of SetOb class, I copy the Set passed as argument.
public class Solution {
public static Solution sample = new Solution();
private class SetOb {
public int last;
public Set<Integer> st;
public SetOb(int l , Set<Integer> si ){
last = l;
st = new HashSet<Integer>(si);
}
}
public static void main(String[] args) {
Map<SetOb, Integer> m = new HashMap< SetOb, Integer>();
Set<Integer> a = new HashSet<Integer>();
for(int i =0; i<10; i++){
a.add(i);
}
SetOb x = sample.new SetOb(100, a);
SetOb y = sample.new SetOb(100, a);
m.put(x,500);
Integer val = m.get(y);
if(val!= null) System.out.println("Success: " + val);
else System.out.println("Failure");
}
}

Your x and y are different object instances, and because SetOb does not override equals and hashCode, get cannot match y against x, so the lookup does not find the matching key/value in the Map.
If you want the match to succeed, implement (override) the hashCode and equals methods in SetOb so that they compare the field values.
Sample methods (generated by Eclipse):
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + last;
result = prime * result + ((st == null) ? 0 : st.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SetOb other = (SetOb) obj;
if (last != other.last)
return false;
if (st == null) {
if (other.st != null)
return false;
} else if (!st.equals(other.st))
return false;
return true;
}

The default implementation of hashCode uses object identity to determine the hash code. You will need to implement hashCode (and equals) in your private class if you want value equality. For instance:
private class SetOb {
public int last;
public Set<Integer> st;
public SetOb(int l , Set<Integer> si ){
last = l;
st = new HashSet<Integer>(si);
}
@Override
public boolean equals(Object other) {
if (other != null && other.getClass() == SetOb.class) {
SetOb otherSetOb = (SetOb) other;
return otherSetOb.last == last && otherSetOb.st.equals(st);
}
return false;
}
@Override
public int hashCode() {
return 37 * last + st.hashCode();
}
}

SetOb needs to override hashCode() and, consequently, equals().
Hash-based collections use these methods to store (hashCode()) and retrieve (hashCode() and equals()) your objects.
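A runnable version of the question's program with the overrides in place, as a sketch: SetOb is made a static nested class with final fields for brevity (the question uses an inner class created via sample.new), and equals/hashCode use java.util.Objects rather than the IDE-generated code above. The lookup method is a hypothetical helper:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class Solution {
    static final class SetOb {
        final int last;
        final Set<Integer> st;

        SetOb(int l, Set<Integer> si) {
            last = l;
            st = new HashSet<>(si); // copy of the passed set, as in the question
        }

        // Value equality: same last and a set with the same elements.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SetOb)) return false;
            SetOb other = (SetOb) o;
            return last == other.last && st.equals(other.st);
        }

        // Consistent with equals(): equal SetObs produce equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hash(last, st);
        }
    }

    static Integer lookup() {
        Set<Integer> a = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            a.add(i);
        }
        Map<SetOb, Integer> m = new HashMap<>();
        m.put(new SetOb(100, a), 500);
        return m.get(new SetOb(100, a)); // a different instance with equal field values
    }

    public static void main(String[] args) {
        Integer val = lookup();
        System.out.println(val != null ? "Success: " + val : "Failure");
    }
}
```

It prints Success: 500, because the second instance now has the same hash code as the first and compares equal to it.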
