How to remove all words from a TreeSet ignoring case - java

I created two sets:
public static Set<String> COMMON_ENGLISH_WORDS = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
public static Set<String> NON_ENGLISH_WORDS = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
I keep my common English words (around 58,000) and my non-English words (around 1,700) in two separate files. I load each file and assign the result to the two variables above. I checked in the debugger that the assignment happens properly.
public static void finalNonEnglishWords(){
ToolsConstants.COMMON_ENGLISH_WORDS = CSVFileUtil.readCSVToTreeSet(ToolsConstants.COMMON_ENGLISH_WORDS_FILE);
ToolsConstants.NON_ENGLISH_WORDS = CSVFileUtil.readCSVToTreeSet(ToolsConstants.NON_ENGLISH_WORDS_FILE);
System.out.println(ToolsConstants.NON_ENGLISH_WORDS.size());
ToolsConstants.NON_ENGLISH_WORDS.removeAll(ToolsConstants.COMMON_ENGLISH_WORDS);
System.out.println(ToolsConstants.NON_ENGLISH_WORDS.size());
}
But nothing is removed: I see the same number in both outputs, even though I checked both files and they do share some words.
I tried the same thing with a sample of just 7 elements and it works perfectly. I followed the same approach; the only difference is the number of elements in the collections.
public static void removeAllDemo(){
List<String> list1 = new ArrayList<>(
Arrays.asList("BOB", "Joe", "john", "MARK","MARk", "dave", "Bill")
);
List<String> list2 = Arrays.asList("JOE", "MARK", "DAVE", "Ravi");
// Add all values of list1 in a case insensitive collection
Set<String> set1 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
set1.addAll(list1);
// Add all values of list2 in a case insensitive collection
Set<String> set2 = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
set2.addAll(list2);
// Remove all common Strings ignoring case
System.out.println(set1);
set1.removeAll(set2);
System.out.println(set1);
// Keep in list1 only the remaining Strings ignoring case
list1.retainAll(set1);
}

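So in general you should apply the same approach as in the demo method, instead of reassigning the fields (reassignment replaces the sets with whatever readCSVToTreeSet returns, which may not use the case-insensitive order):
the empty sets with the case-insensitive order are already created
use addAll to populate them
remove the common words from the set of non-English words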
public static void finalNonEnglishWords() {
ToolsConstants.COMMON_ENGLISH_WORDS.addAll(
CSVFileUtil.readCSVToTreeSet(ToolsConstants.COMMON_ENGLISH_WORDS_FILE)
);
ToolsConstants.NON_ENGLISH_WORDS.addAll(
CSVFileUtil.readCSVToTreeSet(ToolsConstants.NON_ENGLISH_WORDS_FILE)
);
System.out.println(ToolsConstants.NON_ENGLISH_WORDS.size());
ToolsConstants.NON_ENGLISH_WORDS.removeAll(
ToolsConstants.COMMON_ENGLISH_WORDS
);
System.out.println(ToolsConstants.NON_ENGLISH_WORDS.size());
}
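One possible explanation for the difference, sketched under assumptions: TreeSet inherits removeAll from AbstractSet, which iterates the receiver and calls contains on the argument whenever the receiver is not larger than the argument. If readCSVToTreeSet returns a set with natural (case-sensitive) ordering, which is an assumption because its source is not shown, the lookup in the larger set is case-sensitive. The class name, the helper remaining and the word lists below are made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class RemoveAllComparatorDemo {

    // Hypothetical helper: copies `small` into a case-insensitive set,
    // then removes everything found in `big`.
    static Set<String> remaining(Set<String> small, Set<String> big) {
        Set<String> copy = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        copy.addAll(small);
        // The receiver is not larger than `big`, so AbstractSet.removeAll
        // asks `big.contains(...)`, which uses the comparator of `big`.
        copy.removeAll(big);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> nonEnglish = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        nonEnglish.addAll(Arrays.asList("hola", "HELLO"));

        // Natural ordering: case-sensitive lookups.
        Set<String> caseSensitive = new TreeSet<>(Arrays.asList("hello", "world", "tree"));

        // Same words, but with the case-insensitive comparator.
        Set<String> caseInsensitive = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        caseInsensitive.addAll(caseSensitive);

        System.out.println(remaining(nonEnglish, caseSensitive));   // [HELLO, hola]
        System.out.println(remaining(nonEnglish, caseInsensitive)); // [hola]
    }
}
```

Populating the existing case-insensitive sets with addAll, as in the answer above, keeps the comparator on both sides, so the result no longer depends on which set is larger.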

Related

Check if an arrayList contains an array

I have an arrayList that contains arrays. How do I check if the arrayList contains a specified array? I used .contains method and it returns false instead of expected true.
import java.util.ArrayList;
import java.util.Arrays;
public class main {
public static void main(String[] args) {
ArrayList<String[]> action = new ArrayList<String[]>();
action.add(new String[]{"appple", "ball"});
String[] items = new String[]{"appple", "ball"};
if (action.contains(new String[]{"appple", "ball"})) {
System.out.println("Yes");
}
System.out.println(action.contains(items)); // False
}
}
As you are creating different arrays (even if the contents are the same), contains will return false, because arrays are compared by reference.
However, if you do this:
List<String[]> action = new ArrayList<String[]>();
String[] items = new String[]{"apple","ball"};
action.add(items);
if (action.contains(items))
System.out.println("Yes");
This will print Yes.
Also, some examples of the behaviour:
String[] items = new String[]{"apple","ball"};
action.add(items);
String[] clone = items.clone();
String[] mirror = items;
action.contains(clone); // false
action.contains(mirror); // true
items[0]="horse";
System.out.println(mirror[0]); // "horse"
System.out.println(clone[0]); // "apple"
System.out.println(action.get(0)[0]); // "horse"
mirror[1]="crazy";
System.out.println(clone[1]); // "ball"
System.out.println(action.get(0)[1]); // "crazy"
System.out.println(items[1]); // "crazy"
clone[1]="yolo";
System.out.println(action.get(0)[1]); // "crazy"
System.out.println(items[1]); // "crazy"
System.out.println(mirror[1]); // "crazy"
System.out.println(action.get(0).hashCode()); //2018699554
System.out.println(items.hashCode()); //2018699554
System.out.println(clone.hashCode()); //1311053135
System.out.println(mirror.hashCode()); //2018699554
Custom "contains"
The issue is that if you want to look up a specific array later and no longer hold the original reference, contains cannot find it, not even when you pass a new array with exactly the same values.
As a workaround, you could implement your own contains method.
For example, if you wish to get the index:
static int indexOfArray(List<String[]> list, String[] twin)
{
for (int i=0;i<list.size();i++)
if (Arrays.equals(list.get(i),twin))
return i;
return -1;
}
And then, call it like:
String[] toSearch = new String[]{"apple","ball"};
int index = indexOfArray(action, toSearch);
if (index >= 0) // 0 is a valid index
System.out.println("Array found at index "+index);
else
System.out.println("Array not found");
If the index is bigger than -1, you can get your original array by just:
String[] myArray = action.get(index);
HashMap + identifier
An alternative would be storing the arrays into a HashMap by declaring an identifier for each array. For example:
Base64 ID
This will give the same result for the same values, as the encoded value is based on the entries, not the Object's reference.
static String getIdentifier(String[] array)
{
String all="";
for (String s : array)
all+=s;
return Base64.getEncoder().encodeToString(all.getBytes());
}
And then you could:
Map<String, String[]> arrayMap= new HashMap<>();
String[] items = new String[]{"apple","pear", "banana"}; // *[1234]
action.add(items);
arrayMap.put(getIdentifier(items), items); // the id is derived from the values
//....
//Directly finding the twin will fail
String[] toSearch = new String[]{"apple","pear", "banana"}; // *[1556]
System.out.println(action.contains(toSearch)); // false
//But if we get the identifier based on the values
String arrId = getIdentifier(toSearch); // same id, because the values are the same
System.out.println(action.contains(arrayMap.get(arrId))); //true
//arrayMap.get(arrId)-> *[1234]
//.....
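One caveat with the concatenation-based identifier, shown as a small sketch: because getIdentifier above joins the entries without a separator, different arrays can produce the same id. A possible alternative, which is my suggestion and not part of the original answer, is to use a List copy of the array as the map key, since List has value-based equals and hashCode. The class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ArrayKeyDemo {

    // Same idea as getIdentifier: concatenate the entries, then Base64-encode.
    static String concatenatedId(String[] array) {
        String all = String.join("", array);
        return Base64.getEncoder().encodeToString(all.getBytes(StandardCharsets.UTF_8));
    }

    // Assumed alternative: an independent List copy used as the key.
    static List<String> listKey(String[] array) {
        return new ArrayList<>(Arrays.asList(array));
    }

    public static void main(String[] args) {
        String[] first = {"ab", "c"};
        String[] second = {"a", "bc"};

        // Both arrays concatenate to "abc", so the ids collide.
        System.out.println(concatenatedId(first).equals(concatenatedId(second))); // true

        // List keys keep the element boundaries, so they differ.
        System.out.println(listKey(first).equals(listKey(second))); // false

        Map<List<String>, String[]> arrayMap = new HashMap<>();
        arrayMap.put(listKey(first), first);

        // A new array with the same values finds the stored one.
        String[] twin = {"ab", "c"};
        System.out.println(arrayMap.get(listKey(twin)) == first); // true
    }
}
```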
Name.
Choose a representative name and use it as Id
Map<String, String[]> arrayMap= new HashMap<>();
String[] items = new String[]{"apple","pear", "banana"};
action.add(items);
arrayMap.put("fruits", items);
//...
System.out.println(action.contains(arrayMap.get("fruits"))); // true
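ArrayList.contains compares elements with equals(), and arrays inherit equals() from Object, which is reference equality (the identity hashCode values printed below differ for the same reason).
So if you pass the same array instance, as in the line marked *this below, the check succeeds.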
public class main {
public static void main(String[] args) {
ArrayList<String[]> action = new ArrayList<String[]>();
String[] items = new String[]{"apple","ball"};
action.add(items);
System.out.println("TO STRING");
System.out.println("--"+action.get(0));
System.out.println("--"+new String[]{"apple","ball"});
System.out.println("HASHCODES");
String[] sameValues = new String[]{"apple","ball"};
System.out.println("--"+action.get(0).hashCode());
System.out.println("--"+items.hashCode());
System.out.println("--"+sameValues.hashCode());
System.out.println("CONTAINS");
System.out.println("--"+action.contains(items)); // *this
System.out.println("--"+action.contains(sameValues));
System.out.println("--"+action.contains(new String[]{"apple","ball"}));
}
}
result is:
TO STRING
--[Ljava.lang.String;@7b1d7fff
--[Ljava.lang.String;@299a06ac
HASHCODES
--1243554231
--1243554231
--2548778887
CONTAINS
--true
--false
--false
Regarding the text shown when printing the array: arrays don't override toString(), so you get:
getClass().getName() + '@' + Integer.toHexString(hashCode())
For example:
[Ljava.lang.String;@7b1d7fff
[ stands for a single-dimension array
Ljava.lang.String; stands for the element type
@ is the separator
7b1d7fff is the hex representation of the hash code
However, if you want to compare the values, there is the following method.
public class main {
public static void main(String[] args) {
String[] items = new String[]{"apple","ball"};
ArrayList<String> action = new ArrayList<>(Arrays.asList(items));
if (action.contains("apple")) {
System.out.println("Yes");
}
}
}
You can iterate over this list and for each element, i.e. array, call Arrays.equals method to check equality of arrays until first match, or till the end of the list if none match. In this case it can return true for each element:
List<String[]> list = List.of(
new String[]{"appple", "ball"},
new String[]{"appple", "ball"});
String[] act = new String[]{"appple", "ball"};
System.out.println(list.stream()
.anyMatch(arr -> Arrays.equals(arr, act))); // true
This method internally calls String#equals method for each element of the array, i.e. String, so this code also returns true:
List<String[]> list = List.of(
new String[]{new String("appple"), new String("ball")},
new String[]{new String("appple"), new String("ball")});
String[] act = new String[]{new String("appple"), new String("ball")};
System.out.println(list.stream()
.anyMatch(arr -> Arrays.equals(arr, act))); // true
According to the Javadoc, List.contains uses the equals method to check whether an object is contained (hash-based collections such as HashSet use hashCode as well).
A leading question:
Do you know what the implementation of equals is for arrays?
Check it and you will probably understand your code's result (hint: ==).
As "Hovercraft Full Of Eels" said, a better design would be a list of some Collection whose equals and hashCode methods you DO understand and control.
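As a minimal sketch of that design suggestion, the rows can be stored as List<String> instead of String[], because List.equals compares the elements in order. The class name and the containsRow helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListOfListsDemo {

    // Hypothetical helper: true if `rows` holds a row with the same values in the same order.
    static boolean containsRow(List<List<String>> rows, String... values) {
        // List.contains uses equals(), and List.equals is element-wise.
        return rows.contains(Arrays.asList(values));
    }

    public static void main(String[] args) {
        List<List<String>> action = new ArrayList<>();
        action.add(Arrays.asList("apple", "ball"));

        System.out.println(containsRow(action, "apple", "ball")); // true
        System.out.println(containsRow(action, "ball", "apple")); // false, order matters
    }
}
```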

How to merge two corresponding values of two different variables in arraylist?

I want to merge two corresponding values of two different variables with comma separator in a row :
like
Plate Numbers(Output) : MH 35353, AP 35989, NA 24455, DL 95405.
There are two different variables: one holds the plate states and the other the plate numbers. I want to merge them by position, i.e. the 1st plate state with the 1st plate number, then a comma, and so on.
I tried this code snippet but didn't work :
ArrayList<String> list1 = new ArrayList<String>();
list1.add("MH");
list1.add("AP");
list1.add("NA ");
list1.add("DL");
ArrayList<String> list2 = new ArrayList<String>();
list2.add("35353");
list2.add("35989");
list2.add("24455");
list2.add("95405");
list1.addAll(list2);
Use this:
ArrayList<String> list1 = new ArrayList<String>();
list1.add("MH");
list1.add("AP");
list1.add("NA");
list1.add("DL");
ArrayList<String> list2 = new ArrayList<String>();
list2.add("35353");
list2.add("35989");
list2.add("24455");
list2.add("95405");
// Walk list2 in step with the sequential stream over list1
Iterator<String> iterator = list2.iterator();
List<String> list3 = list1.stream()
.map(x -> x + " " + iterator.next())
.collect(Collectors.toList());
String output = String.join(", ", list3);
System.out.println(output);
From ArrayList#addAll Javadoc:
Appends all of the elements in the specified collection to the end of this list[...]
This is not what you want, because you actually don't want to append the objects, you want to merge the String of the first list with the String from the second list. So in a sense, not merge the List but merge the objects (Strings) in the lists.
The easiest (most beginner friendly) solution would be to just create a simple helper method yourself, that does what you need.
Something like this:
public static void main(String[] args) {
ArrayList<String> list1 = new ArrayList<String>();
list1.add("MH");
list1.add("AP");
list1.add("NA");
list1.add("DL");
ArrayList<String> list2 = new ArrayList<String>();
list2.add("35353");
list2.add("35989");
list2.add("24455");
list2.add("95405");
ArrayList<String> combined = combinePlateNumbers(list1, list2);
System.out.println(combined);
}
private static ArrayList<String> combinePlateNumbers(List<String> list1, List<String> list2) {
ArrayList<String> result = new ArrayList<>();
if (list1.size() != list2.size()) {
// lists don't have equal size, not compatible
// your decision on how to handle this
return result;
}
// iterate the list and combine the strings (added optional whitespace here)
for (int i = 0; i < list1.size(); i++) {
result.add(list1.get(i).concat(" ").concat(list2.get(i)));
}
return result;
}
Output:
[MH 35353, AP 35989, NA 24455, DL 95405]
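If the goal is the single comma-separated line from the question rather than a list, an index-based stream is one way to pair the elements. This is only a sketch; the class name, the joinPlates helper, the trimming, and throwing on a size mismatch are my own choices, not part of the answers above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PlateJoiner {

    // Hypothetical helper: pairs states.get(i) with numbers.get(i)
    // and joins the pairs with ", ".
    static String joinPlates(List<String> states, List<String> numbers) {
        if (states.size() != numbers.size()) {
            // Assumption: a size mismatch is treated as a caller error.
            throw new IllegalArgumentException("lists must have the same size");
        }
        return IntStream.range(0, states.size())
                // trim() guards against stray spaces such as "NA "
                .mapToObj(i -> states.get(i).trim() + " " + numbers.get(i).trim())
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("MH", "AP", "NA ", "DL");
        List<String> numbers = Arrays.asList("35353", "35989", "24455", "95405");
        System.out.println(joinPlates(states, numbers));
        // prints: MH 35353, AP 35989, NA 24455, DL 95405
    }
}
```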

How do I calculate intersection between more than two HashSets?

Considering the code below and the fact that the 4 HashSets are populated elsewhere.
My aim is to contain all element(s) that are common in all 4 HashSets.
My question is that first of all, am I doing it right? Secondly, if I'm doing it right, is there a better way to do it? If not, then what solution do I have for this problem?
static Set<String> one=new HashSet<>();
static Set<String> two=new HashSet<>();
static Set<String> three=new HashSet<>();
static Set<String> four=new HashSet<>();
private static void createIntersectionQrels() {
ArrayList<String> temp = new ArrayList<>();
Set<String> interQrels = new HashSet<>();
temp.addAll(one);
one.retainAll(two);
interQrels.addAll(one);
one.addAll(temp);
one.retainAll(three);
interQrels.addAll(one);
one.addAll(temp);
one.retainAll(four);
interQrels.addAll(one);
one.addAll(temp);
interQrels.retainAll(two);
interQrels.retainAll(three);
interQrels.retainAll(four);
}
I think you can simply call retainAll() on a copy of the first set, passing the second, third, and fourth sets in turn:
private static Set<String> getIntersectionSet() {
// copy one, in case you don't wish to modify it
Set<String> interQrels = new HashSet<>(one);
interQrels.retainAll(two); // intersection with two (and one)
interQrels.retainAll(three); // intersection with three (and two, one)
interQrels.retainAll(four); // intersection four (and three, two, one)
return interQrels;
}
I'm a bit new to Java 8, but this seems pretty readable:
Set<String> intersection = one.stream()
.filter(two::contains)
.filter(three::contains)
.filter(four::contains)
.collect(Collectors.toSet());
Here's a quick Junit test to try out:
@Test
public void testIntersectionBetweenSets() {
Collection<String> one = new HashSet<>(4);
one.add("Larry");
one.add("Mark");
one.add("Henry");
one.add("Andrew");
Set<String> two = new HashSet<>(2);
two.add("Mark");
two.add("Andrew");
Set<String> three = new HashSet<>(3);
three.add("Mark");
three.add("Mary");
three.add("Andrew");
Set<String> four = new HashSet<>(3);
four.add("Mark");
four.add("John");
four.add("Andrew");
Set<String> intersection = one.stream()
.filter(two::contains)
.filter(three::contains)
.filter(four::contains)
.collect(Collectors.toSet());
Collection<String> expected = new HashSet<>(2);
expected.add("Andrew");
expected.add("Mark");
Assert.assertEquals(expected, intersection);
}
I would think the best way to handle this is with Groovy. I know you didn't ask for groovy, but anytime I can convert all that code into one line, it's hard to resist.
println one.intersect(two).intersect(three).intersect(four)
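The retainAll approach also generalizes to any number of sets. The following is a sketch with a hypothetical intersectAll helper; starting from the smallest set and returning an empty set for empty input are my own choices.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class SetIntersection {

    // Hypothetical helper: elements common to every set; inputs are not modified.
    static <T> Set<T> intersectAll(Collection<? extends Set<T>> sets) {
        Set<T> smallest = null;
        for (Set<T> s : sets) {
            if (smallest == null || s.size() < smallest.size()) {
                smallest = s;
            }
        }
        if (smallest == null) {
            return new HashSet<>(); // assumption: no sets means an empty result
        }
        // Copy the smallest set so the working set never grows beyond it.
        Set<T> result = new HashSet<>(smallest);
        for (Set<T> s : sets) {
            result.retainAll(s);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> one = new HashSet<>(Arrays.asList("Larry", "Mark", "Henry", "Andrew"));
        Set<String> two = new HashSet<>(Arrays.asList("Mark", "Andrew"));
        Set<String> three = new HashSet<>(Arrays.asList("Mark", "Mary", "Andrew"));
        Set<String> four = new HashSet<>(Arrays.asList("Mark", "John", "Andrew"));

        Set<String> common = intersectAll(Arrays.asList(one, two, three, four));
        // TreeSet only to print in a stable order
        System.out.println(new TreeSet<>(common)); // [Andrew, Mark]
    }
}
```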

Efficient way to find multiple objects in Arraylist

I have to check whether an ArrayList contains any of the values passed in through an object.
Consider an arraylist with values "abc", "jkl","def", "ghi".
And String check="abc,ghi"
We have to check whether any of the values in the string (abc or ghi) is present in the ArrayList, and we can stop checking as soon as a match is found.
Traditionally, we can split the String check on commas and call arraylist.contains() in a loop for each comma-separated value.
But this is time-consuming. Is there a better way to do this check?
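One way would be to use the retainAll method and Sets. Note that the equals check below is true only when every search value is present; to test for any match, check !clone.isEmpty() instead.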
Example
// note an additional "ghi" here
List<String> original = new ArrayList<String>(Arrays.asList(new String[]{"abc", "jkl","def", "ghi", "ghi"}));
Set<String> clone = new HashSet<String>(original);
Set<String> control = new HashSet<String>(Arrays.asList(new String[]{"abc","ghi"}));
clone.retainAll(control);
System.out.println(clone.equals(control));
Output
true
This is still O(n), but you could build a set from the search strings and just iterate over the list once:
HashSet<String> checks = new HashSet<String>();
checks.addAll(Arrays.asList(check.split(",")));
for (String item : arraylist) {
if (checks.contains(item)) {
// Found one
}
}
You could transform check into a regexp and loop only once through the ArrayList.
String check = "abc,ghi";
Pattern p = Pattern.compile("(" + check.replace(',', '|') + ")");
List<String> list = Arrays.asList(new String[] { "abc", "jkl", "def", "ghi" });
for (String element : list) {
if (p.matcher(element).matches()) {
System.out.println("match: " + element);
}
}
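For the "any of the values" check specifically, the standard library also offers Collections.disjoint, which returns true when two collections share no element. Below is a small sketch; the class name and the containsAny helper are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AnyMatch {

    // Hypothetical helper: true if `list` contains at least one of the
    // comma-separated values in `check`.
    static boolean containsAny(List<String> list, String check) {
        Set<String> wanted = new HashSet<>(Arrays.asList(check.split(",")));
        // disjoint == no common element, so "any match" is its negation
        return !Collections.disjoint(wanted, list);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("abc", "jkl", "def", "ghi");
        System.out.println(containsAny(list, "abc,ghi")); // true
        System.out.println(containsAny(list, "xyz,uvw")); // false
    }
}
```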

Java Compare Two Lists

I have two lists ( not java lists, you can say two columns)
For example
**List 1**      **List 2**
milan           hafil
dingo           iga
iga             dingo
elpha           binga
hafil           mike
meat            dingo
milan
elpha
meat
iga
neeta.peeta
I'd like a method that returns how many elements are the same. For this example it should be 3.
It should also return the similar values of both lists and the different values too.
Should I use hashmap if yes then what method to get my result?
Please help
P.S: It is not a school assignment :) So if you just guide me it will be enough
EDIT
Here are two versions. One using ArrayList and other using HashSet
Compare them and create your own version from this, until you get what you need.
This should be enough to cover the:
P.S: It is not a school assignment :) So if you just guide me it will be enough
part of your question.
continuing with the original answer:
You may use a java.util.Collection and/or java.util.ArrayList for that.
The retainAll method does the following:
Retains only the elements in this collection that are contained in the specified collection
see this sample:
import java.util.Collection;
import java.util.ArrayList;
import java.util.Arrays;
public class Repeated {
public static void main( String [] args ) {
Collection<String> listOne = new ArrayList<>(Arrays.asList("milan","dingo", "elpha", "hafil", "meat", "iga", "neeta.peeta"));
Collection<String> listTwo = new ArrayList<>(Arrays.asList("hafil", "iga", "binga", "mike", "dingo"));
listOne.retainAll( listTwo );
System.out.println( listOne );
}
}
EDIT
For the second part ( similar values ) you may use the removeAll method:
Removes all of this collection's elements that are also contained in the specified collection.
This second version gives you also the similar values and handles repeated ( by discarding them).
This time the Collection could be a Set instead of a List ( the difference is, the Set doesn't allow repeated values )
import java.util.Collection;
import java.util.HashSet;
import java.util.Arrays;
class Repeated {
public static void main( String [] args ) {
Collection<String> listOne = Arrays.asList("milan","iga",
"dingo","iga",
"elpha","iga",
"hafil","iga",
"meat","iga",
"neeta.peeta","iga");
Collection<String> listTwo = Arrays.asList("hafil",
"iga",
"binga",
"mike",
"dingo","dingo","dingo");
Collection<String> similar = new HashSet<String>( listOne );
Collection<String> different = new HashSet<String>();
different.addAll( listOne );
different.addAll( listTwo );
similar.retainAll( listTwo );
different.removeAll( similar );
System.out.printf("One:%s%nTwo:%s%nSimilar:%s%nDifferent:%s%n", listOne, listTwo, similar, different);
}
}
Output:
$ java Repeated
One:[milan, iga, dingo, iga, elpha, iga, hafil, iga, meat, iga, neeta.peeta, iga]
Two:[hafil, iga, binga, mike, dingo, dingo, dingo]
Similar:[dingo, iga, hafil]
Different:[mike, binga, milan, meat, elpha, neeta.peeta]
If it doesn't do exactly what you need, it gives you a good start so you can handle from here.
Question for the reader: How would you include all the repeated values?
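You can try the intersection() and subtract() methods from CollectionUtils (Apache Commons Collections).
intersection() gives you a collection containing the common elements, and subtract(a, b) gives you the elements of a that are not in b; disjunction() gives the elements that appear in only one of the two.
They also take duplicate elements into account (cardinality is respected).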
If you are looking for a handy way to test the equality of two collections, you can use org.apache.commons.collections.CollectionUtils.isEqualCollection, which compares two collections regardless of the ordering.
Are these really lists (ordered, with duplicates), or are they sets (unordered, no duplicates)?
Because if it's the latter, then you can use, say, a java.util.HashSet<E> and do this in expected linear time using the convenient retainAll.
List<String> list1 = Arrays.asList(
"milan", "milan", "iga", "dingo", "milan"
);
List<String> list2 = Arrays.asList(
"hafil", "milan", "dingo", "meat"
);
// intersection as set
Set<String> intersect = new HashSet<String>(list1);
intersect.retainAll(list2);
System.out.println(intersect.size()); // prints "2"
System.out.println(intersect); // prints "[milan, dingo]"
// intersection/union as list
List<String> intersectList = new ArrayList<String>();
intersectList.addAll(list1);
intersectList.addAll(list2);
intersectList.retainAll(intersect);
System.out.println(intersectList);
// prints "[milan, milan, dingo, milan, milan, dingo]"
// original lists are structurally unmodified
System.out.println(list1); // prints "[milan, milan, iga, dingo, milan]"
System.out.println(list2); // prints "[hafil, milan, dingo, meat]"
Of all the approaches, I find using org.apache.commons.collections.CollectionUtils#isEqualCollection is the best approach. Here are the reasons -
I don't have to declare any additional list/set myself
I am not mutating the input lists
It's very efficient. It checks the equality in O(N) complexity.
If it's not possible to have apache.commons.collections as a dependency, I would recommend implementing the algorithm it follows to check the equality of the lists, because of its efficiency.
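A dependency-free version of that idea could count the occurrences of each element and compare the counts. This is a sketch of the general technique, not a copy of the Commons Collections source; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class MultisetEquality {

    // Counts how often each element occurs.
    static <T> Map<T, Integer> counts(Collection<T> items) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // True if both collections hold the same elements the same number of times,
    // regardless of order. Inputs are not modified.
    static <T> boolean sameElements(Collection<T> a, Collection<T> b) {
        return a.size() == b.size() && counts(a).equals(counts(b));
    }

    public static void main(String[] args) {
        System.out.println(sameElements(
                Arrays.asList("one", "two", "six"),
                Arrays.asList("one", "six", "two"))); // true
        // Same size and same distinct values, but different multiplicities.
        System.out.println(sameElements(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "b", "b"))); // false
    }
}
```

With hash-based counting this runs in expected linear time, and it distinguishes cases such as {a, a, b} and {a, b, b} that a containsAll-plus-size check treats as equal.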
Using java 8 removeIf
public int getSimilarItems(){
List<String> one = Arrays.asList("milan", "dingo", "elpha", "hafil", "meat", "iga", "neeta.peeta");
List<String> two = new ArrayList<>(Arrays.asList("hafil", "iga", "binga", "mike", "dingo")); //Cannot remove directly from array backed collection
int initial = two.size();
two.removeIf(one::contains);
return initial - two.size();
}
Simple solution :-
List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "d", "c"));
List<String> list2 = new ArrayList<String>(Arrays.asList("b", "f", "c"));
list.retainAll(list2);
list2.removeAll(list);
System.out.println("similiar " + list);
System.out.println("different " + list2);
Output :-
similiar [b, c]
different [f]
Assuming hash1 and hash2 are Maps whose keys are the elements of the two lists:
List<String> sames = new ArrayList<>();
List<String> diffs = new ArrayList<>();
for (String key : hash1.keySet())
{
if (hash2.containsKey(key))
{
sames.add(key);
}
else
{
diffs.add(key);
}
}
// sames.size() is the number of similar elements.
// Note: diffs only collects keys of hash1 that are missing from hash2.
I found a very basic example of List comparison at List Compare
This example verifies the size first and then checks the availability of the particular element of one list in another.
public static boolean compareList(List<?> ls1, List<?> ls2) {
// Same size, and every element of ls2 is present in ls1
return ls1.size() == ls2.size() && ls1.containsAll(ls2);
}
public static void main(String[] args) {
ArrayList<String> one = new ArrayList<String>();
one.add("one");
one.add("two");
one.add("six");
ArrayList<String> two = new ArrayList<String>();
two.add("one");
two.add("six");
two.add("two");
System.out.println("Output1 :: " + compareList(one, two));
two.add("ten");
System.out.println("Output2 :: " + compareList(one, two));
}
protected <T> boolean equals(List<T> list1, List<T> list2) {
if (list1 == list2) {
return true;
}
if (list1 == null || list2 == null || list1.size() != list2.size()) {
return false;
}
// to prevent wrong results on {a,a,a} and {a,b,c}
// iterate over list1 and then list2
return list1.stream()
.filter(val -> !list2.contains(val))
.collect(Collectors.toList())
.isEmpty() &&
list2.stream()
.filter(val -> !list1.contains(val))
.collect(Collectors.toList())
.isEmpty();
}
