TreeSet to find k most frequent words in a book?


The common question of finding the k most frequent words in a book (where words can be added dynamically) is usually solved using a combination of a trie and a heap.
However, I think a TreeSet alone should suffice and be cleaner, with O(log n) performance for inserts and retrievals.
The treeset would contain a custom object:
class MyObj implements Comparable<MyObj> {
String value;
int count;
public void incrementCount() { count++; }
//override equals and hashCode to make this object unique by string 'value'
//override compareTo to compare count
}
Whenever we insert a word into the TreeSet, we first check whether it is already present; if it is, we get the object and increment its count.
Whenever we want the k most frequent words, we just iterate over the first k elements of the TreeSet.
What are your views on this approach? I feel it is easier to code and understand, and it matches the time complexity of the trie-and-heap approach for getting the k largest elements.
EDIT: As stated in one of the answers, incrementing the count after a MyObj has been inserted does not re-sort the TreeSet/TreeMap. So, after incrementing the count, I will additionally need to remove and reinsert the object in the TreeSet/TreeMap.

Once you add an object to a TreeSet, the TreeSet (or the underlying TreeMap) does not reorder its elements if the properties used by compareTo change. Hence, this approach does not work as you expect.
Here's a simple example to demonstrate it:
public static class MyObj implements Comparable<MyObj> {
String value;
int count;
MyObj(String v, int c) {
this.value = v;
this.count = c;
}
public void incrementCount(){
count++;
}
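@Override
public int compareTo(MyObj o) {
return Integer.compare(this.count, o.count); // ascending by frequency; swap the arguments to put the most frequent first
}
@Override
public String toString() {
return value + "-" + count; // matches the printed output shown below
}
}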
public static void main(String[] args) {
Set<MyObj> set = new TreeSet<>();
MyObj o1 = new MyObj("a", 1);
MyObj o2 = new MyObj("b", 4);
MyObj o3 = new MyObj("c", 2);
set.add(o1);
set.add(o2);
set.add(o3);
System.out.println(set);
//The above prints [a-1, c-2, b-4]
//Increment the count of c 4 times
o3.incrementCount();
o3.incrementCount();
o3.incrementCount();
o3.incrementCount();
System.out.println(set);
//The above prints [a-1, c-6, b-4]
//As we can see, the object corresponding to c-6 does not get moved to the end.
//Insert a new object
set.add(new MyObj("d", 3));
System.out.println(set);
//this prints [a-1, d-3, c-6, b-4]
}
EDIT:
Caveats/Problems:
Comparing two words by count alone would drop one of them if both have the same frequency, because the TreeSet treats a compareTo result of 0 as a duplicate. So you need to compare the actual words when their frequencies are the same.
It would work if we remove the object and reinsert it with the updated frequency. But for that, you need to get the object (the MyObj instance for a given value, to know its frequency so far) from the TreeSet. A Set does not have a get method. Its contains method just delegates to the underlying TreeMap's containsKey method, which identifies the object using the compareTo logic (not equals). Since compareTo also takes the frequency of the word into account, we cannot find the word in the set to remove it (unless we iterate over the whole set on each add).
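One way around both caveats, as a minimal sketch rather than a definitive implementation: keep the counts in a separate HashMap so the current frequency of a word can always be looked up, and keep immutable (count, word) entries in a TreeSet ordered by count and then by word. The names below (WordCounter, WordCount, add, topK) are hypothetical and do not come from the question or the answer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: the HashMap finds a word's current count,
// the TreeSet keeps (count, word) entries in sorted order.
class WordCounter {
    // Immutable entry, so a key can never change while it sits inside the TreeSet.
    private static final class WordCount {
        final String word;
        final int count;
        WordCount(String word, int count) { this.word = word; this.count = count; }
    }

    private final Map<String, Integer> counts = new HashMap<>();
    // Highest count first; ties are broken by the word so equal counts are not dropped.
    private final TreeSet<WordCount> byCount = new TreeSet<>(
            Comparator.comparingInt((WordCount e) -> e.count).reversed()
                      .thenComparing((WordCount e) -> e.word));

    void add(String word) {
        int old = counts.getOrDefault(word, 0);
        if (old > 0) {
            byCount.remove(new WordCount(word, old)); // remove the stale entry first
        }
        counts.put(word, old + 1);
        byCount.add(new WordCount(word, old + 1));    // reinsert with the new count
    }

    List<String> topK(int k) {
        List<String> result = new ArrayList<>();
        for (WordCount e : byCount) {
            if (result.size() == k) break;
            result.add(e.word + "-" + e.count);
        }
        return result;
    }
}
```

After adding the words a, b, b, c, c, c, b, b in that order, topK(2) returns [b-4, c-3]. Each add costs one hash lookup plus two O(log n) tree operations.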

A TreeMap should work if you remove and reinsert the object: use the frequency as an integer key and a list of MyObj as the value, so that the keys are sorted by frequency. An updated version of the above code demonstrates it:
public class MyObj {
String value;
int count;
MyObj(String v, int c) {
this.value = v;
this.count = c;
}
public int getCount() {
return count;
}
public void incrementCount() {
count++;
}
@Override
public String toString() {
return value + " " + count;
}
public static void put(Map<Integer, List<MyObj>> map, MyObj value) {
List<MyObj> myObjs = map.get(value.getCount());
if (myObjs == null) {
myObjs = new ArrayList<>();
map.put(value.getCount(),myObjs);
}
myObjs.add(value);
}
public static void main(String[] args) {
TreeMap<Integer, List<MyObj>> set = new TreeMap<>();
MyObj o1 = new MyObj("a", 1);
MyObj o2 = new MyObj("b", 4);
MyObj o3 = new MyObj("c", 2);
MyObj o4 = new MyObj("f", 4);
put(set,o1);
put(set,o2);
put(set,o3);
System.out.println(set);
put(set,o4);
System.out.println(set);
}
}

Related

How to check if two objects in a ArrayList are the same? [duplicate]

How could I go about detecting (returning true/false) whether an ArrayList contains more than one of the same element in Java?
Many thanks,
Terry
Edit
Forgot to mention that I am not looking to compare "Blocks" with each other but their integer values. Each "block" has an int and this is what makes them different.
I find the int of a particular Block by calling a method named "getNum" (e.g. table1[0][2].getNum();).

Simplest: dump the whole collection into a Set (using the HashSet(Collection) constructor or Set.addAll), then see if the Set has the same size as the ArrayList.
List<Integer> list = ...;
Set<Integer> set = new HashSet<Integer>(list);
if(set.size() < list.size()){
/* There are duplicates */
}
Update: If I'm understanding your question correctly, you have a 2d array of Block, as in
Block table[][];
and you want to detect if any row of them has duplicates?
In that case, I would do the following, assuming that Block implements "equals" and "hashCode" correctly:
for (Block[] row : table) {
Set<Block> set = new HashSet<Block>();
for (Block cell : row) {
set.add(cell);
}
if (set.size() < row.length) { //has duplicate
}
}
I'm not 100% sure of that for-each syntax, so it might be safer to write it as
for (int i = 0; i < 6; i++) {
Set<Block> set = new HashSet<Block>();
for (int j = 0; j < 6; j++)
set.add(table[i][j]);
...
Set.add returns false if the item being added is already in the set, so you could even short-circuit and bail out on any add that returns false if all you want to know is whether there are any duplicates.

Improved code, using return value of Set#add instead of comparing the size of list and set.
public static <T> boolean hasDuplicate(Iterable<T> all) {
Set<T> set = new HashSet<T>();
// Set#add returns false if the set does not change, which
// indicates that a duplicate element has been added.
for (T each: all) if (!set.add(each)) return true;
return false;
}

With Java 8+ you can use Stream API:
boolean areAllDistinct(List<Block> blocksList) {
return blocksList.stream().map(Block::getNum).distinct().count() == blocksList.size();
}

If you are looking to avoid having duplicates at all, then you should just cut out the middle process of detecting duplicates and use a Set.

Improved code to return the duplicate elements
Can find duplicates in a Collection
return the set of duplicates
Unique Elements can be obtained from the Set
public static <T> List<T> getDuplicate(Collection<T> list) {
final List<T> duplicatedObjects = new ArrayList<T>();
Set<T> set = new HashSet<T>() {
@Override
public boolean add(T e) {
if (contains(e)) {
duplicatedObjects.add(e);
}
return super.add(e);
}
};
for (T t : list) {
set.add(t);
}
return duplicatedObjects;
}
public static <T> boolean hasDuplicate(Collection<T> list) {
if (getDuplicate(list).isEmpty())
return false;
return true;
}

I needed to do a similar operation for a Stream, but couldn't find a good example. Here's what I came up with.
public static <T> boolean areUnique(final Stream<T> stream) {
final Set<T> seen = new HashSet<>();
return stream.allMatch(seen::add);
}
This has the advantage of short-circuiting when duplicates are found early rather than having to process the whole stream and isn't much more complicated than just putting everything in a Set and checking the size. So this case would roughly be:
List<T> list = ...
boolean allDistinct = areUnique(list.stream());

If your elements are somehow Comparable (whether the order has any real meaning is irrelevant; it just needs to be consistent with your definition of equality), the fastest duplicate removal solution is going to sort the list (O(n log(n))) and then do a single pass looking for repeated elements, that is, equal elements that follow each other (this is O(n)).
The overall complexity is going to be O(n log(n)), which is roughly the same as what you would get with a Set (n times log(n)), but with a much smaller constant. This is because the constant in sort/dedup results from the cost of comparing elements, whereas the cost from the set is most likely to result from a hash computation, plus one (possibly several) hash comparisons. That is, if you are using a hash-based Set implementation, because a tree-based one is going to give you O(n log²(n)), which is even worse.
As I understand it, however, you do not need to remove duplicates, but merely test for their existence. So you should hand-code a merge or heap sort algorithm on your array, that simply exits returning true (i.e. "there is a dup") if your comparator returns 0, and otherwise completes the sort, and traverse the sorted array testing for repeats. In a merge or heap sort, indeed, when the sort is completed, you will have compared every duplicate pair unless both elements were already in their final positions (which is unlikely). Thus, a tweaked sort algorithm should yield a huge performance improvement (I would have to prove that, but I guess the tweaked algorithm should be in the O(log(n)) on uniformly random data)

If you want the set of duplicate values:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class FindDuplicateInArrayList {
public static void main(String[] args) {
Set<String> uniqueSet = new HashSet<String>();
List<String> dupesList = new ArrayList<String>();
for (String a : args) {
if (uniqueSet.contains(a))
dupesList.add(a);
else
uniqueSet.add(a);
}
System.out.println(uniqueSet.size() + " distinct words: " + uniqueSet);
System.out.println(dupesList.size() + " dupesList words: " + dupesList);
}
}
And probably also think about trimming values or using lowercase ... depending on your case.

Simply put:
1) make sure all items are comparable
2) sort the array
3) iterate over the array and find duplicates
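The steps above can be sketched as follows. This is only a minimal sketch with hypothetical names (SortedDuplicateCheck, hasDuplicate); it assumes the elements are non-null and that compareTo returns 0 exactly for the elements you consider equal. It sorts a copy so that the caller's list keeps its order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedDuplicateCheck {
    // Sorts a copy of the list, then compares each element with its neighbour.
    static <T extends Comparable<? super T>> boolean hasDuplicate(List<T> items) {
        List<T> sorted = new ArrayList<>(items);
        Collections.sort(sorted);                    // O(n log n)
        for (int i = 1; i < sorted.size(); i++) {    // single O(n) pass
            if (sorted.get(i - 1).compareTo(sorted.get(i)) == 0) {
                return true;                         // equal neighbours: a duplicate
            }
        }
        return false;
    }
}
```

For example, hasDuplicate(Arrays.asList(3, 1, 2, 1)) is true, while hasDuplicate(Arrays.asList(3, 1, 2)) is false.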

To find the duplicates in a List, use the following code. It returns a set containing the duplicated elements.
public Set<?> findDuplicatesInList(List<?> beanList) {
System.out.println("findDuplicatesInList::"+beanList);
Set<Object> duplicateRowSet=null;
duplicateRowSet=new LinkedHashSet<Object>();
for(int i=0;i<beanList.size();i++){
Object superString=beanList.get(i);
System.out.println("findDuplicatesInList::superString::"+superString);
for(int j=0;j<beanList.size();j++){
if(i!=j){
Object subString=beanList.get(j);
System.out.println("findDuplicatesInList::subString::"+subString);
if(superString.equals(subString)){
duplicateRowSet.add(beanList.get(j));
}
}
}
}
System.out.println("findDuplicatesInList::duplicationSet::"+duplicateRowSet);
return duplicateRowSet;
}

The best way to handle this issue is to use a HashSet:
ArrayList<String> listGroupCode = new ArrayList<>();
listGroupCode.add("A");
listGroupCode.add("A");
listGroupCode.add("B");
listGroupCode.add("C");
HashSet<String> set = new HashSet<>(listGroupCode);
ArrayList<String> result = new ArrayList<>(set);
Just print the result ArrayList to see the list without duplicates :)

This answer is written in Kotlin, but can easily be translated to Java.
If your ArrayList's size is within a fixed small range, then this is a great solution.
var duplicateDetected = false
if(arrList.size > 1){
for(i in 0 until arrList.size){
for(j in 0 until arrList.size){
if(i != j && arrList.get(i) == arrList.get(j)){
duplicateDetected = true
}
}
}
}

private boolean isDuplicate() {
for (int i = 0; i < arrayList.size(); i++) {
for (int j = i + 1; j < arrayList.size(); j++) {
if (arrayList.get(i).getName().trim().equalsIgnoreCase(arrayList.get(j).getName().trim())) {
return true;
}
}
}
return false;
}

//Removes duplicates in place, without using any other collection.
for (int i = 0; i < l.size(); i++) {
String tempVal = l.get(i); //take the ith object out of list
//scan from the end so that removals do not shift the entries still to be checked
for (int j = l.size() - 1; j > i; j--) {
if (tempVal.equals(l.get(j))) {
l.remove(j); //remove the later matching entry by index
}
}
}
Note: this still has a major performance cost, since it is quadratic and every removal from an ArrayList shifts the elements after it; scanning from the end, as above, keeps that shifting small.
Using a LinkedList instead of an ArrayList avoids the shifting, though indexed access then becomes linear. Interview questions often ask for duplicates to be removed from a List without using any other collection, and the example above is the answer to that. In the real world though, if I had to do this, I would simply put the elements of the List into a Set!

/**
* Method to detect presence of duplicates in a generic list.
* Depends on the equals method of the concrete type. make sure to override it as required.
*/
public static <T> boolean hasDuplicates(List<T> list){
int count = list.size();
T t1,t2;
for(int i=0;i<count;i++){
t1 = list.get(i);
for(int j=i+1;j<count;j++){
t2 = list.get(j);
if(t2.equals(t1)){
return true;
}
}
}
return false;
}
An example of a concrete class that has overridden equals() :
public class Reminder{
private long id;
private int hour;
private int minute;
public Reminder(long id, int hour, int minute){
this.id = id;
this.hour = hour;
this.minute = minute;
}
@Override
public boolean equals(Object other){
if(other == null) return false;
if(this.getClass() != other.getClass()) return false;
Reminder otherReminder = (Reminder) other;
if(this.hour != otherReminder.hour) return false;
if(this.minute != otherReminder.minute) return false;
return true;
}
}

ArrayList<String> withDuplicates = new ArrayList<>();
withDuplicates.add("1");
withDuplicates.add("2");
withDuplicates.add("1");
withDuplicates.add("3");
//First occurrences go into the set; repeated words go into the duplicates list.
HashSet<String> seen = new HashSet<>();
ArrayList<String> duplicates = new ArrayList<String>();
for (String word : withDuplicates) {
if (!seen.add(word)) { //add returns false when the word is already in the set
duplicates.add(word);
}
}
ArrayList<String> withoutDuplicates = new ArrayList<>(seen);
System.out.println(duplicates);
System.out.println(withoutDuplicates);

A simple solution for learners.
//Method to find the duplicates.
public static List<Integer> findDublicate(List<Integer> numList){
List<Integer> dupLst = new ArrayList<Integer>();
//Compare each number against all the other numbers except itself.
for(int i =0;i<numList.size();i++) {
for(int j=0 ; j<numList.size();j++) {
if(i!=j && numList.get(i).equals(numList.get(j))) { //equals, because == compares Integer references
boolean isNumExist = false;
//The loop below avoids adding the same duplicate to the result list again
for(Integer aNum: dupLst) {
if(aNum.equals(numList.get(i))) {
isNumExist = true;
break;
}
}
if(!isNumExist) {
dupLst.add(numList.get(i));
}
}
}
}
return dupLst;
}

How to match the exact string value in the list of comma separated string [duplicate]

I have a String[] with values like so:
public static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
Given String s, is there a good way of testing whether VALUES contains s?

Arrays.asList(yourArray).contains(yourValue)
Warning: this doesn't work for arrays of primitives (see the comments).
Since Java 8 you can use Streams.
String[] values = {"AB","BC","CD","AE"};
boolean contains = Arrays.stream(values).anyMatch(s::equals); // s is the (non-null) String to look for
To check whether an array of int, double or long contains a value use IntStream, DoubleStream or LongStream respectively.
Example
int[] a = {1,2,3,4};
boolean contains = IntStream.of(a).anyMatch(x -> x == 4);

Concise update for Java SE 9
Reference arrays are bad. For this case we are after a set. Since Java SE 9 we have Set.of.
private static final Set<String> VALUES = Set.of(
"AB","BC","CD","AE"
);
"Given String s, is there a good way of testing whether VALUES contains s?"
VALUES.contains(s)
O(1).
The right type, immutable, O(1) and concise. Beautiful.*
Original answer details
Just to clear the code up to start with. We have (corrected):
public static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
This is a mutable static, which FindBugs will tell you is very naughty. Do not modify statics and do not allow other code to do so either. At an absolute minimum, the field should be private:
private static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
(Note, you can actually drop the new String[]; bit.)
Reference arrays are still bad and we want a set:
private static final Set<String> VALUES = new HashSet<String>(Arrays.asList(
new String[] {"AB","BC","CD","AE"}
));
(Paranoid people, such as myself, may feel more at ease if this was wrapped in Collections.unmodifiableSet - it could then even be made public.)
(*To be a little more on brand, the collections API is predictably still missing immutable collection types and the syntax is still far too verbose, for my tastes.)

You can use ArrayUtils.contains from Apache Commons Lang
public static boolean contains(Object[] array, Object objectToFind)
Note that this method returns false if the passed array is null.
There are also methods available for primitive arrays of all kinds.
Example:
String[] fieldsToInclude = { "id", "name", "location" };
if ( ArrayUtils.contains( fieldsToInclude, "id" ) ) {
// Do some stuff.
}

Just simply implement it by hand:
public static <T> boolean contains(final T[] array, final T v) {
for (final T e : array)
if (e == v || v != null && v.equals(e))
return true;
return false;
}
Improvement:
The v != null condition is constant inside the method. It always evaluates to the same Boolean value during the method call. So if the input array is big, it is more efficient to evaluate this condition only once, and we can use a simplified/faster condition inside the for loop based on the result. The improved contains() method:
public static <T> boolean contains2(final T[] array, final T v) {
if (v == null) {
for (final T e : array)
if (e == null)
return true;
}
else {
for (final T e : array)
if (e == v || v.equals(e))
return true;
}
return false;
}

Four Different Ways to Check If an Array Contains a Value
Using List:
public static boolean useList(String[] arr, String targetValue) {
return Arrays.asList(arr).contains(targetValue);
}
Using Set:
public static boolean useSet(String[] arr, String targetValue) {
Set<String> set = new HashSet<String>(Arrays.asList(arr));
return set.contains(targetValue);
}
Using a simple loop:
public static boolean useLoop(String[] arr, String targetValue) {
for (String s: arr) {
if (s.equals(targetValue))
return true;
}
return false;
}
Using Arrays.binarySearch():
The code below is only correct for sorted arrays; it is listed here for completeness. binarySearch() can ONLY be used on sorted arrays, and on an unsorted array the result is unpredictable. This is the best option when the array is already sorted.
public static boolean binarySearch(String[] arr, String targetValue) {
return Arrays.binarySearch(arr, targetValue) >= 0;
}
Quick Example:
String testValue="test";
String newValueNotInList="newValue";
String[] valueArray = { "this", "is", "java" , "test" };
Arrays.asList(valueArray).contains(testValue); // returns true
Arrays.asList(valueArray).contains(newValueNotInList); // returns false

If the array is not sorted, you will have to iterate over everything and make a call to equals on each.
If the array is sorted, you can do a binary search, there's one in the Arrays class.
Generally speaking, if you are going to do a lot of membership checks, you may want to store everything in a Set, not in an array.
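If many lookups are made against the same array, the sorting cost can be paid once up front and every later check can use a binary search. A minimal sketch, assuming the array and the search key contain no nulls; the class name SortedLookup is hypothetical:

```java
import java.util.Arrays;

class SortedLookup {
    private final String[] sorted;

    SortedLookup(String[] values) {
        sorted = values.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);       // one-time O(n log n) cost
    }

    boolean contains(String s) {
        // binarySearch returns a non-negative index only when the key is found
        return Arrays.binarySearch(sorted, s) >= 0;
    }
}
```

Each contains call is then O(log n). A HashSet built once from the array gives the same convenience with expected constant-time lookups.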

For what it's worth I ran a test comparing the 3 suggestions for speed. I generated random integers, converted them to a String and added them to an array. I then searched for the highest possible number/string, which would be a worst case scenario for the asList().contains().
When using a 10K array size the results were:
Sort & Search : 15
Binary Search : 0
asList.contains : 0
When using a 100K array the results were:
Sort & Search : 156
Binary Search : 0
asList.contains : 32
So if the array is created in sorted order the binary search is the fastest, otherwise the asList().contains would be the way to go. If you have many searches, then it may be worthwhile to sort the array so you can use the binary search. It all depends on your application.
I would think those are the results most people would expect. Here is the test code:
import java.util.*;
public class Test {
public static void main(String args[]) {
long start = 0;
int size = 100000;
String[] strings = new String[size];
Random random = new Random();
for (int i = 0; i < size; i++)
strings[i] = "" + random.nextInt(size);
start = System.currentTimeMillis();
Arrays.sort(strings);
System.out.println(Arrays.binarySearch(strings, "" + (size - 1)));
System.out.println("Sort & Search : "
+ (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
System.out.println(Arrays.binarySearch(strings, "" + (size - 1)));
System.out.println("Search : "
+ (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
System.out.println(Arrays.asList(strings).contains("" + (size - 1)));
System.out.println("Contains : "
+ (System.currentTimeMillis() - start));
}
}

Instead of using the quick array initialisation syntax too, you could just initialise it as a List straight away in a similar manner using the Arrays.asList method, e.g.:
public static final List<String> STRINGS = Arrays.asList("firstString", "secondString" ...., "lastString");
Then you can do (like above):
STRINGS.contains("the string you want to find");

With Java 8 you can create a stream and check whether any entry in the stream matches s:
String[] values = {"AB","BC","CD","AE"};
boolean sInArray = Arrays.stream(values).anyMatch(s::equals);
Or as a generic method:
public static <T> boolean arrayContains(T[] array, T value) {
return Arrays.stream(array).anyMatch(value::equals);
}

You can use the Arrays class to perform a binary search for the value. If your array is not sorted, you will have to use the sort functions in the same class to sort the array, then search through it.

ObStupidAnswer (but I think there's a lesson in here somewhere):
enum Values {
AB, BC, CD, AE
}
try {
Values.valueOf(s);
return true;
} catch (IllegalArgumentException exc) {
return false;
}

Actually, if you use HashSet<String> as Tom Hawtin proposed you don't need to worry about sorting, and your speed is the same as with binary search on a presorted array, probably even faster.
It all depends on how your code is set up, obviously, but from where I stand, the order would be:
On an unsorted array:
1) HashSet
2) asList
3) sort & binary
On a sorted array:
1) HashSet
2) Binary
3) asList
So either way, HashSet for the win.

Developers often do:
Set<String> set = new HashSet<String>(Arrays.asList(arr));
return set.contains(targetValue);
The above code works, but there is no need to convert the list to a set first. Converting a list to a set takes extra time. It can be as simple as:
Arrays.asList(arr).contains(targetValue);
or
for (String s : arr) {
if (s.equals(targetValue))
return true;
}
return false;
The first one is more readable than the second one.

If you have the google collections library, Tom's answer can be simplified a lot by using ImmutableSet (http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ImmutableSet.html)
This really removes a lot of clutter from the initialization proposed
private static final Set<String> VALUES = ImmutableSet.of("AB","BC","CD","AE");

In Java 8 use Streams.
List<String> myList =
Arrays.asList("a1", "a2", "b1", "c2", "c1");
myList.stream()
.filter(s -> s.startsWith("c"))
.map(String::toUpperCase)
.sorted()
.forEach(System.out::println);

One possible solution:
import java.util.Arrays;
import java.util.List;
public class ArrayContainsElement {
public static final List<String> VALUES = Arrays.asList("AB", "BC", "CD", "AE");
public static void main(String args[]) {
if (VALUES.contains("AB")) {
System.out.println("Contains");
} else {
System.out.println("Not contains");
}
}
}

Using a simple loop is the most efficient way of doing this.
boolean useLoop(String[] arr, String targetValue) {
for(String s: arr){
if(s.equals(targetValue))
return true;
}
return false;
}
Courtesy to Programcreek

the shortest solution
the array VALUES may contain duplicates
since Java 9
List.of(VALUES).contains(s);

Use the following (the contains() method is ArrayUtils.in() in this code):
ObjectUtils.java
public class ObjectUtils {
/**
* A null safe method to detect if two objects are equal.
* @param object1
* @param object2
* @return true if either both objects are null, or equal, else returns false.
*/
public static boolean equals(Object object1, Object object2) {
return object1 == null ? object2 == null : object1.equals(object2);
}
}
ArrayUtils.java
public class ArrayUtils {
/**
* Find the index of an object in the given array,
* starting from the given inclusive index.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @param start The index from where the search must start.
* @return Index of the given object in the array if it is there, else -1.
*/
public static <T> int indexOf(final T[] ts, final T t, int start) {
for (int i = start; i < ts.length; ++i)
if (ObjectUtils.equals(ts[i], t))
return i;
return -1;
}
/**
* Find the index of an object in the given array, starting from 0.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @return indexOf(ts, t, 0)
*/
public static <T> int indexOf(final T[] ts, final T t) {
return indexOf(ts, t, 0);
}
/**
* Detect if the given object is in the given array.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @return If indexOf(ts, t) is greater than -1.
*/
public static <T> boolean in(final T[] ts, final T t) {
return indexOf(ts, t) > -1;
}
}
As you can see in the code above, there are other utility methods, ObjectUtils.equals() and ArrayUtils.indexOf(), that were used at other places as well.

For arrays of limited length use the following (as given by camickr). This is slow for repeated checks, especially for longer arrays (linear search).
Arrays.asList(...).contains(...)
For fast performance if you repeatedly check against a larger set of elements
An array is the wrong structure. Use a TreeSet and add each element to it. It sorts elements and has a fast contains() method (a tree lookup, comparable to a binary search).
If the elements implement Comparable & you want the TreeSet sorted accordingly:
ElementClass.compareTo() method must be compatible with ElementClass.equals(): see Triads not showing up to fight? (Java Set missing an item)
TreeSet<ElementClass> myElements = new TreeSet<>();
// Do this for each element (implementing *Comparable*)
myElements.add(nextElement);
// *Alternatively*, if an array is forceably provided from other code:
myElements.addAll(Arrays.asList(myArray));
Otherwise, use your own Comparator:
class MyComparator implements Comparator<ElementClass> {
@Override
public int compare(ElementClass element1, ElementClass element2) {
// Your comparison of elements
// Should be consistent with object equality
}
@Override
public boolean equals(Object otherComparator) {
// Your equality of comparators (optional to override)
}
}
// construct TreeSet with the comparator
TreeSet<ElementClass> myElements = new TreeSet<>(new MyComparator());
// Do this for each element (implementing *Comparable*)
myElements.add(nextElement);
The payoff: check existence of some element:
// Fast tree lookup through sorted elements (performance ~ log(size)):
boolean containsElement = myElements.contains(someElement);

If you don't want it to be case sensitive
Arrays.stream(VALUES).anyMatch(s::equalsIgnoreCase);

Try this:
ArrayList<Integer> arrlist = new ArrayList<Integer>(8);
// use add() method to add elements in the list
arrlist.add(20);
arrlist.add(25);
arrlist.add(10);
arrlist.add(15);
boolean retval = arrlist.contains(10);
if (retval == true) {
System.out.println("10 is contained in the list");
}
else {
System.out.println("10 is not contained in the list");
}

Check this
String[] VALUES = new String[]{"AB", "BC", "CD", "AE"};
String s;
for (int i = 0; i < VALUES.length; i++) {
if (VALUES[i].equals(s)) {
// do your stuff
} else {
//do your stuff
}
}

Calling Arrays.asList() and then contains() will always work, but a plain search loop is better, since you don't need to create the lightweight list wrapper around the array that Arrays.asList() produces.
public boolean findString(String[] strings, String desired){
for (String str : strings){
if (desired.equals(str)) {
return true;
}
}
return false; //if we get here… there is no desired String, return false.
}

Use below -
String[] values = {"AB","BC","CD","AE"};
String s = "A";
boolean contains = Arrays.stream(values).anyMatch(v -> v.contains(s));

Use Arrays.binarySearch(array, obj) to check whether the given object is in the array; the array must already be sorted.
Example:
if (Arrays.binarySearch(str, i) > -1) → true: exists
otherwise false: does not exist

Try using Java 8 predicate test method
Here is a full example of it.
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
public class Test {
public static final List<String> VALUES =
Arrays.asList("AA", "AB", "BC", "CD", "AE");
public static void main(String args[]) {
Predicate<String> containsAB = value -> value.contains("AB"); // tests one String at a time
for (String i : VALUES) {
System.out.println(containsAB.test(i));
}
}
}
http://mytechnologythought.blogspot.com/2019/10/java-8-predicate-test-method-example.html
https://github.com/VipulGulhane1/java8/blob/master/Test.java

Create a boolean initially set to false. Run a loop to check every value in the array and compare to the value you are checking against. If you ever get a match, set boolean to true and stop the looping. Then assert that the boolean is true.
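The description above translates almost line for line into Java. A minimal sketch, assuming a String[] as in the question; the class and method names (LinearSearch, contains) are hypothetical:

```java
class LinearSearch {
    static boolean contains(String[] values, String target) {
        boolean found = false;                  // initially false
        for (String value : values) {           // check every value in the array
            if (value != null && value.equals(target)) {
                found = true;                   // a match: remember it
                break;                          // and stop looping
            }
        }
        return found;
    }
}
```

With the array from the question, LinearSearch.contains(VALUES, "CD") is true and LinearSearch.contains(VALUES, "XY") is false.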

As I'm dealing with low level Java using primitive types byte and byte[], the best so far I got is from bytes-java https://github.com/patrickfav/bytes-java seems a fine piece of work

You can check it by two methods
A) By converting the array into a string and then checking for the required string with the contains method (note that this also matches substrings, so "B" would be reported as found)
String a = Arrays.toString(VALUES);
System.out.println(a.contains("AB"));
System.out.println(a.contains("BC"));
System.out.println(a.contains("CD"));
System.out.println(a.contains("AE"));
B) This is a more efficient method
Scanner s = new Scanner(System.in);
String u = s.next();
for (int i = 0; i < VALUES.length; i++) {
if (VALUES[i].equals(u))
System.out.println(VALUES[i] + " " + u + " " + VALUES[i].equals(u));
}

Representing multisets as LinkedLists

A multiset is similar to a set except that the duplications count.
We want to represent multisets as linked lists. The first representation
that comes to mind uses a LinkedList<T> where the same item can occur at
several indices.
For example, the multiset
{ "Ali Baba" , "Papa Bill", "Marcus", "Ali Baba", "Marcus", "Ali Baba" }
can be represented as a linked list
of strings with "Ali Baba" at index 0, "Papa Bill" at index 1,
"Marcus" at index 2, "Ali Baba" at index 3, and so on, for a total of
6 strings.
The professor wants a representation of the multiset as a list of pairs <item,integer>, where the integer, called the multiplicity of the item, tells us how many times the item occurs in the multiset. This way the above multiset is represented as the linked list with Pair("Ali Baba", 3) at index 0, Pair("Papa Bill", 1) at index 1, and Pair("Marcus", 2) at index 2.
The method is (he wrote good luck, how nice of him >:[ )
public static <T> LinkedList<Pair<T,Integer>> convert(LinkedList<T> in){
//good luck
}
The method transforms the first representation into the Pair representation.
If in is null, convert returns null. Also feel free to modify the input list.
He gave us the Pair class:
public class Pair<T,S>
{
// the fields
private T first;
private S second;
// the constructor
public Pair(T f, S s)
{
first = f;
second = s;
}
// the get methods
public T getFirst()
{
return first;
}
public S getSecond()
{
return second;
}
// the set methods
// set first to v
public void setFirst(T v)
{
first = v;
}
// set second to v
public void setSecond(S v)
{
second = v;
}
}
I am new to programming and I've been doing well, however I have no idea how to even start this program. Never done something like this before.

If you are allowed to use a temporary LinkedList, you could do something like this:
import java.util.LinkedList;
public class Main {
public static void main(String[] args) {
LinkedList<String> test = new LinkedList<String>();
test.add("Ali Baba");
test.add("Papa Bill");
test.add("Marcus");
test.add("Ali Baba");
test.add("Marcus");
test.add("Ali Baba");
LinkedList<Pair<String, Integer>> result = convert(test);
for(Pair<String, Integer> res : result) {
System.out.println(res.getFirst() + " :" + res.getSecond());
}
}
public static <T> LinkedList<Pair<T, Integer>> convert(LinkedList<T> in) {
LinkedList<Pair<T, Integer>> returnList = new LinkedList<>();
LinkedList<T> tmp = new LinkedList<T>();
// iterate over your list to count the items
for(T item : in) {
// if you already counted the current item, skip it
if(tmp.contains(item)) {
continue;
}
// counter for the current item
int counter = 0;
//iterate again over your list to actually count the item
for(T item2 : in) {
if(item.equals(item2)) {
counter ++;
}
}
// create your pair for your result list and add it
returnList.add(new Pair<T, Integer>(item, counter));
// mark your item as already counted
tmp.add(item);
}
return returnList;
}
}
With that I get the desired output:
Ali Baba :3
Papa Bill :1
Marcus :2

Your requirements state:
your input: LinkedList<T>
your output: LinkedList<Pair<T,Integer>>
1 - Write a loop to read your input.
2 - Process and store it in a convenient way: use a Map. In fact, use a LinkedHashMap, which keeps the insertion order.
2bis - If you can't use a Map, do the same thing directly with two arrays: an array of T and an array of integers. You must manage insertion and search, and keep count.
3 - Iterate over your map (or your arrays) and create your output.
It is easier to begin with 2 and, if it works, replace it with 2bis.
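Steps 1 to 3 with a LinkedHashMap could look roughly like the sketch below. It assumes Java 8 or later (for Map.merge). The class name MultisetConverter is made up for illustration, and the nested Pair is only a trimmed stand-in for the professor's class so that the snippet compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;

class MultisetConverter {
    // Trimmed stand-in for the Pair class from the question.
    static class Pair<T, S> {
        private final T first;
        private final S second;
        Pair(T f, S s) { first = f; second = s; }
        T getFirst() { return first; }
        S getSecond() { return second; }
    }

    static <T> LinkedList<Pair<T, Integer>> convert(LinkedList<T> in) {
        if (in == null) {
            return null; // required by the assignment
        }
        // Steps 1 and 2: one pass over the input, counting each item.
        // LinkedHashMap remembers the order in which items were first seen.
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T item : in) {
            counts.merge(item, 1, Integer::sum);
        }
        // Step 3: turn every (item, count) entry into a Pair.
        LinkedList<Pair<T, Integer>> out = new LinkedList<>();
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            out.add(new Pair<>(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

This reads the input once instead of rescanning it for every distinct item, at the cost of relying on hashCode and equals of T.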

List of string with occurrences count and sort

I'm developing a Java application that reads a lot of string data like this:
1 cat (first read)
2 dog
3 fish
4 dog
5 fish
6 dog
7 dog
8 cat
9 horse
...(last read)
I need a way to keep all [string, occurrences] pairs ordered from last read to first read.
string occurrences
horse 1 (first print)
cat 2
dog 4
fish 2 (last print)
Currently I use two lists:
1) List<String> input; where I add all the data
In my example:
input.add("cat");
input.add("dog");
input.add("fish");
...
2) List<String> possibilities; where I insert each string once, in this way:
if(possibilities.contains("cat")){
possibilities.remove("cat");
}
possibilities.add("cat");
This way I get a list of all the possibilities, ordered by when each was last read.
I use it like that:
int occurrence;
for(String possible:possibilities){
occurrence = Collections.frequency(input, possible);
System.out.println(possible + " " + occurrence);
}
That trick works, but it's too slow (I have millions of inputs). Any help?
(English isn’t my first language, so please excuse any mistakes.)

Use a Map<String, Integer>, as @radoslaw pointed out. To keep the insertion order, use a LinkedHashMap and not a TreeMap, as described here:
LinkedHashMap keeps the keys in the order they were inserted, while a TreeMap is kept sorted via a Comparator or the natural Comparable ordering of the elements.
Imagine you have all the strings in some array, call it listOfAllStrings. Iterate over this array and use each string as a key in your map: if the key does not exist, put it in the map with a count of 1; if it exists, add 1 to the current count.
Map<String, Integer> results = new LinkedHashMap<String, Integer>();
for (String s : listOfAllStrings) {
if (results.get(s) != null) {
results.put(s, results.get(s) + 1);
} else {
results.put(s, 1);
}
}
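On Java 8 or later, the same loop can be written more compactly with Map.merge. A small sketch; the class and method names are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OccurrenceCounter {
    // Counts every string; the keys stay in the order they were first read.
    static Map<String, Integer> countInInsertionOrder(List<String> listOfAllStrings) {
        Map<String, Integer> results = new LinkedHashMap<>();
        for (String s : listOfAllStrings) {
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            results.merge(s, 1, Integer::sum);
        }
        return results;
    }
}
```

Keep in mind that this yields first-read order, so it is not yet the last-read-first order the question asks for.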

Make use of a TreeMap, which keeps its keys ordered as specified by the compare method of your MyStringComparator class. The comparator handles a MyString class, which wraps a String and adds an insertion index, like this:
// this better be immutable
class MyString {
    private MyString() {}
    public static MyString valueOf(String s, Long l) { ... }
    private String string;
    private Long index;
    public Long getIndex() { return index; }
    @Override
    public int hashCode() { return string.hashCode(); }
    @Override
    public boolean equals(Object o) { /* rely on string.equals() */ }
}
class MyStringComparator implements Comparator<MyString> {
    public int compare(MyString s1, MyString s2) {
        // negated, so that higher (later) indexes sort first
        return -s1.getIndex().compareTo(s2.getIndex());
    }
}
Pass the comparator while constructing the map:
Map<MyString,Integer> map = new TreeMap<>(new MyStringComparator());
Then, while parsing your input, do
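long counter = 0;
while (...) {
    MyString item = MyString.valueOf(readString, counter++);
    if (map.containsKey(item)) {
        map.put(item, map.get(item) + 1);
    } else {
        map.put(item, 1);
    }
}
There will be a lot of instantiation because of the immutable class, and the comparator will not be consistent with equals. Note that a TreeMap locates keys through its comparator rather than through equals(), so verify that a repeated string actually finds its existing entry before relying on this.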
Disclaimer: this is untested code just to show what I'd do, I'll come back and recheck it when I get my hands on a compiler.

Here is a complete solution for your problem:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class DataDto implements Comparable<DataDto> {
    public int count = 1; // the first sighting counts as one occurrence
    public String string;
    public long lastSeenTime;
    public DataDto(String string) {
        this.string = string;
        // same clock as the updates in main(), so the values are comparable
        this.lastSeenTime = System.nanoTime();
    }
    @Override
    public boolean equals(Object object) {
        if (object instanceof DataDto) {
            DataDto temp = (DataDto) object;
            return temp.string != null && temp.string.equals(this.string);
        }
        return false;
    }
    @Override
    public int hashCode() {
        return string.hashCode();
    }
    // most recently seen first; returns 0 for equal timestamps
    @Override
    public int compareTo(DataDto o) {
        return Long.compare(o.lastSeenTime, this.lastSeenTime);
    }
    @Override
    public String toString() {
        return this.string + " : " + this.count;
    }
    public static void main(String[] args) {
        String[] listOfAllStrings = {"horse", "cat", "dog", "fish", "cat", "fish", "dog", "cat", "horse", "fish"};
        Map<String, DataDto> results = new HashMap<String, DataDto>();
        for (String s : listOfAllStrings) {
            DataDto dataDto = results.get(s);
            if (dataDto != null) {
                dataDto.count = dataDto.count + 1;
                dataDto.lastSeenTime = System.nanoTime();
            } else {
                dataDto = new DataDto(s);
                results.put(s, dataDto);
            }
        }
        List<DataDto> finalResults = new ArrayList<DataDto>(results.values());
        System.out.println(finalResults); // HashMap order
        Collections.sort(finalResults);
        System.out.println(finalResults); // last seen first
    }
}
Output:
[horse : 2, cat : 3, fish : 3, dog : 2]
[fish : 3, horse : 2, cat : 3, dog : 2]
I think this solution will be suitable for your requirement.

If you know that your data will fit in memory when you read it all in, then the solution is simple: use a LinkedList and a LinkedHashMap.
For example, if you use a linked list:
LinkedList<String> input = new LinkedList<>();
You then proceed to use input.add() as you did originally. Once the input list is complete, you basically use Jordi Castilla's solution, but process the entries of the linked list in reverse order. To do that:
Iterator<String> iter = input.descendingIterator();
LinkedHashMap<String,Integer> map = new LinkedHashMap<>();
while (iter.hasNext()) {
String s = iter.next();
if ( map.containsKey(s)) {
map.put( s, map.get(s) + 1);
} else {
map.put(s, 1);
}
}
Now, the only real difference between his solution and mine is that I'm using input.descendingIterator(), a LinkedList method that gives you the entries in reverse order, from "horse" to "cat".
The LinkedHashMap will keep the proper order - whatever was entered first will be printed first, and because we entered things in reverse order, then whatever was read last will be printed first. So if you print your map the result will be:
{horse=1, cat=2, dog=4, fish=2}
If you have a very long file, and you can't load the entire list of strings into memory, you had better keep just the map of frequencies. In this case, in order to keep the order of entry, we'll use an object such as this:
private static class Entry implements Comparable<Entry> {
private static long nextOrder = Long.MIN_VALUE;
private String str;
private int frequency = 1;
private long order = nextOrder++;
public Entry(String str) {
this.str = str;
}
public String getString() {
return str;
}
public int getFrequency() {
return frequency;
}
public void updateEntry() {
frequency++;
order = nextOrder++;
}
@Override
public int compareTo(Entry e) {
if ( order > e.order )
return -1;
if ( order < e.order )
return 1;
return 0;
}
@Override
public String toString() {
return String.format( "%s: %d", str, frequency );
}
}
The trick here is that every time you update the entry (add one to the frequency), it also updates the order. But the compareTo() method orders Entry objects from high order (updated/inserted later) to low order (updated/inserted earlier).
Now you can use a simple HashMap<String,Entry> to store the information as you read it (I'm assuming you are reading from some sort of scanner):
Map<String,Entry> m = new HashMap<>();
while ( scanner.hasNextLine() ) {
String str = scanner.nextLine();
Entry entry = m.get(str);
if ( entry == null ) {
entry = new Entry(str);
m.put(str, entry);
} else {
entry.updateEntry();
}
}
scanner.close();
Now you can sort the values of the entries:
List<Entry> orderedList = new ArrayList<Entry>(m.values());
m = null;
Collections.sort(orderedList);
Running System.out.println(orderedList) will give you:
[horse: 1, cat: 2, dog: 4, fish: 2]
In principle, you could use a TreeMap whose keys contain the "order" information, rather than a plain HashMap followed by sorting as shown here, but I prefer to have neither mutable keys in a map nor keys that change constantly. Here we only change the values as we fill the map, and each key is inserted into the map only once.
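As a different technique from the Entry class above, a LinkedHashMap can track the last-read order on its own: removing a key and putting it back moves it to the end of the iteration order. A rough single-pass sketch with made-up names, not the approach described in this answer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LastReadFirstCounter {
    // Returns lines such as "dog: 4", most recently read string first.
    static List<String> countLastReadFirst(Iterable<String> input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String s : input) {
            // remove() followed by put() re-inserts the key at the end.
            Integer previous = counts.remove(s);
            counts.put(s, previous == null ? 1 : previous + 1);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        // The map iterates oldest first, so flip it to get newest first.
        Collections.reverse(lines);
        return lines;
    }
}
```

Only the map of distinct strings is kept in memory, and no final sort is needed; the trade-off is one remove and one put per input string.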

What you could do:
Reverse the order of the list using Collections.reverse(input). This runs in linear time, O(n).
Create a Set from the input list. A Set guarantees uniqueness. To preserve insertion order, you'll need a LinkedHashSet.
Iterate over this set, just as you did above.
Code:
/* I don't know what logic you use to create the input list,
* so I'm using your input example. */
List<String> input = Arrays.asList("cat", "dog", "fish", "dog",
"fish", "dog", "dog", "cat", "horse");
/* by the way, this changes the input list!
* Copy it in case you need to preserve the original input. */
Collections.reverse(input);
Set<String> possibilities = new LinkedHashSet<String>(input);
for (String s : possibilities) {
System.out.println(s + " " + Collections.frequency(input, s));
}
Output:
horse 1
cat 2
dog 4
fish 2

ArrayList as key in HashMap

Would it be possible to use an ArrayList as the key of a HashMap? I would like to keep the frequency count of bigrams. The bigram is the key and the value is its frequency.
For each of the bigrams like "he is", I create an ArrayList for it and insert it into the HashMap. But I am not getting the correct output.
public HashMap<ArrayList<String>, Integer> getBigramMap(String word1, String word2) {
HashMap<ArrayList<String>, Integer> hm = new HashMap<ArrayList<String>, Integer>();
ArrayList<String> arrList1 = new ArrayList<String>();
arrList1 = getBigram(word1, word2);
if (hm.get(arrList1) != null) {
hm.put(arrList1, hm.get(arrList1) + 1);
} else {
hm.put(arrList1, 1);
}
System.out.println(hm.get(arrList1));
return hm;
}
public ArrayList<String> getBigram(String word1, String word2) {
ArrayList<String> arrList2 = new ArrayList<String>();
arrList2.add(word1);
arrList2.add(word2);
return arrList2;
}

Yes, you can have ArrayLists as keys in a hash map, but it is a very bad idea since they are mutable.
If you change the ArrayList in any way (or any of its elements), the mapping will basically be lost, since the key won't have the same hashCode as it had when it was inserted.
The rule of thumb is to use only immutable data types as keys in a hash map. As suggested by Alex Stybaev, you probably want to create a Bigram class like this:
final class Bigram {
private final String word1, word2;
public Bigram(String word1, String word2) {
this.word1 = word1;
this.word2 = word2;
}
public String getWord1() {
return word1;
}
public String getWord2() {
return word2;
}
@Override
public int hashCode() {
return word1.hashCode() ^ word2.hashCode();
}
@Override
public boolean equals(Object obj) {
return (obj instanceof Bigram) && ((Bigram) obj).word1.equals(word1)
&& ((Bigram) obj).word2.equals(word2);
}
}
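Also note that getBigramMap in the question creates a new HashMap on every call, so each call can only ever report a count of 1. A minimal sketch that keeps one shared map and uses an immutable key along the lines of the class above; BigramCounter, add and frequency are made-up names, and Map.merge needs Java 8 or later:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class BigramCounter {
    // Immutable key, in the spirit of the Bigram class above.
    static final class Bigram {
        private final String word1;
        private final String word2;
        Bigram(String word1, String word2) {
            this.word1 = word1;
            this.word2 = word2;
        }
        @Override
        public int hashCode() {
            return Objects.hash(word1, word2);
        }
        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Bigram)) {
                return false;
            }
            Bigram other = (Bigram) obj;
            return word1.equals(other.word1) && word2.equals(other.word2);
        }
    }

    // One map for the whole run, not one per call.
    private final Map<Bigram, Integer> counts = new HashMap<>();

    void add(String word1, String word2) {
        counts.merge(new Bigram(word1, word2), 1, Integer::sum);
    }

    int frequency(String word1, String word2) {
        return counts.getOrDefault(new Bigram(word1, word2), 0);
    }
}
```

Calling add("he", "is") twice and then frequency("he", "is") gives 2, because both calls hit the same map and equal keys.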

Why can't you use something like this:
class Bigram{
private String firstItem;
private String secondItem;
<getters/setters>
@Override
public int hashCode(){
...
}
@Override
public boolean equals(Object obj){
...
}
}
instead of using a dynamic collection for a fixed number of items (two)?

From the documentation:
Note: great care must be exercised if mutable objects are used as map
keys. The behavior of a map is not specified if the value of an object is
changed in a manner that affects equals comparisons while the
object is a key in the map. A special case of this prohibition is that it
is not permissible for a map to contain itself as a key. While it is
permissible for a map to contain itself as a value, extreme caution is
advised: the equals and hashCode methods are no longer
well defined on such a map.
You have to take care when you are using mutable objects as keys for the sake of hashCode and equals.
The bottom line is that it is better to use immutable objects as keys.
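To see what that warning means in practice, here is a small sketch (names made up) that mutates an ArrayList after using it as a key. The documentation leaves the behavior unspecified, so treat the result as what typically happens with the standard HashMap, not as a guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutableKeyDemo {
    // Returns { found via mutated key, found via original content, size is still 1 }.
    static boolean[] lookupsAfterMutation() {
        Map<List<String>, Integer> map = new HashMap<>();
        List<String> key = new ArrayList<>(Arrays.asList("he", "is"));
        map.put(key, 1);

        // Changing the list changes its hashCode while it sits in the map.
        key.add("here");

        boolean foundByMutatedKey = map.containsKey(key);
        boolean foundByOriginalContent =
                map.containsKey(new ArrayList<>(Arrays.asList("he", "is")));
        return new boolean[] {foundByMutatedKey, foundByOriginalContent, map.size() == 1};
    }
}
```

The entry is still in the map, but neither the mutated list nor a fresh list with the original content finds it: the stored hash belongs to the old content, while equals now compares against the new content.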

Try this, it will work.
public Map<List<String>, Integer> getBigramMap(String word1, String word2) {
Map<List<String>, Integer> hm = new HashMap<List<String>, Integer>();
List<String> arrList1 = new ArrayList<String>();
arrList1 = getBigram(word1, word2);
if(hm.get(arrList1) !=null){
hm.put(arrList1, hm.get(arrList1)+1);
}
else {
hm.put(arrList1, 1);
}
System.out.println(hm.get(arrList1));
return hm;
}

I've come up with this solution. It is obviously not usable in all cases, for example when the hash code overflows the int range, or when copying the list causes complications: if the input list is changed, the key stays the same as intended, but when the items of the list are mutable, the copied list holds references to the same items, so changing an item changes the key itself.
import java.util.ArrayList;
public class ListKey<T> {
    private final ArrayList<T> list;
    public ListKey(ArrayList<T> list) {
        // defensive copy, so later changes to the input list do not affect the key
        this.list = new ArrayList<>(list);
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        for (T item : this.list) {
            result = prime * result + ((item == null) ? 0 : item.hashCode());
        }
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        // compare key with key, so that equals stays symmetric
        return (obj instanceof ListKey) && this.list.equals(((ListKey<?>) obj).list);
    }
}
---------
public static void main(String[] args) {
    ArrayList<Float> createFloatList2 = createFloatList();
    Hashtable<ListKey<Float>, String> table = new Hashtable<>();
    table.put(new ListKey<>(createFloatList2), "IT WORKS!");
    // look up with a key built from the current content of the list
    System.out.println(table.get(new ListKey<>(createFloatList2)));
    createFloatList2.add(1f);
    System.out.println(table.get(new ListKey<>(createFloatList2)));
    createFloatList2.remove(3);
    System.out.println(table.get(new ListKey<>(createFloatList2)));
}
public static ArrayList<Float> createFloatList() {
    ArrayList<Float> floatee = new ArrayList<>();
    floatee.add(34.234f);
    floatee.add(33f);
    floatee.add(null);
    return floatee;
}
Output:
IT WORKS!
null
IT WORKS!

Sure, it's possible. I suppose the issue is in your put. Try obtaining the count for the bigram, incrementing it, removing the entry for this bigram, and inserting the updated value.

Unlike an array, a List can be used as the key of a HashMap, but it is not a good idea, since we should always try to use an immutable object as the key.
The String representation obtained from the .toString() method is a good key choice in many cases, since String is an immutable object and can stand in for the array or list.

Please check my code below to understand what happens when the key in a Map is an ArrayList and how the JVM handles the inputs.
Here I write hashCode and equals methods for the TesthashCodeEquals class.
package com.msq;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
class TesthashCodeEquals {
private int a;
private int b;
public TesthashCodeEquals() {
// TODO Auto-generated constructor stub
}
public TesthashCodeEquals(int a, int b) {
super();
this.a = a;
this.b = b;
}
public int getA() {
return a;
}
public void setA(int a) {
this.a = a;
}
public int getB() {
return b;
}
public void setB(int b) {
this.b = b;
}
public int hashCode() {
return this.a + this.b;
}
public boolean equals(Object o) {
if (o instanceof TesthashCodeEquals) {
TesthashCodeEquals c = (TesthashCodeEquals) o;
return ((this.a == c.a) && (this.b == c.b));
} else
return false;
}
}
public class HasCodeEquals {
public static void main(String[] args) {
Map<List<TesthashCodeEquals>, String> m = new HashMap<>();
List<TesthashCodeEquals> list1=new ArrayList<>();
list1.add(new TesthashCodeEquals(1, 2));
list1.add(new TesthashCodeEquals(3, 4));
List<TesthashCodeEquals> list2=new ArrayList<>();
list2.add(new TesthashCodeEquals(10, 20));
list2.add(new TesthashCodeEquals(30, 40));
List<TesthashCodeEquals> list3=new ArrayList<>();
list3.add(new TesthashCodeEquals(1, 2));
list3.add(new TesthashCodeEquals(3, 4));
m.put(list1, "List1");
m.put(list2, "List2");
m.put(list3, "List3");
for(Map.Entry<List<TesthashCodeEquals>,String> entry:m.entrySet()){
for(TesthashCodeEquals t:entry.getKey()){
System.out.print("value of a: "+t.getA()+", value of b: "+t.getB()+", map value is:"+entry.getValue() );
System.out.println();
}
System.out.println("######################");
}
}
}
Output:
value of a: 10, value of b: 20, map value is:List2
value of a: 30, value of b: 40, map value is:List2
######################
value of a: 1, value of b: 2, map value is:List3
value of a: 3, value of b: 4, map value is:List3
######################
So the map checks the number of objects in the list and the values of the variables in each object. If the number of objects is the same and the values of the instance variables are also the same, it treats the key as a duplicate and overwrites the value of the existing entry.
Now, if I change only the first object added to list3 to
list3.add(new TesthashCodeEquals(2, 2));
then it will print:
Output:
value of a: 2, value of b: 2, map value is:List3
value of a: 3, value of b: 4, map value is:List3
######################
value of a: 10, value of b: 20, map value is:List2
value of a: 30, value of b: 40, map value is:List2
######################
value of a: 1, value of b: 2, map value is:List1
value of a: 3, value of b: 4, map value is:List1
######################
So it always checks the number of objects in the list and the values of the instance variables of each object.
thanks

ArrayList.equals() is inherited from java.lang.Object - therefore equals() on ArrayList is independent of the content of the list.
If you want to use an ArrayList as a map key, you will need to override equals() and hashcode() in order to make two arraylists with the same content in the same order return true on a call to equals() and return the same hashcode on a call to hashcode().
Is there any particular reason you have to use an ArrayList as opposed to say a simple String as the key?
Edit: ignore this answer. As Joachim Sauer pointed out below, it is wrong: ArrayList does not use Object's equals(); it inherits content-based equals() and hashCode() from AbstractList.
