Is there a better way to combine two string sets in java?

I need to combine two string sets while filtering out redundant entries. This is the solution I came up with; is there a better way that anyone can suggest? Perhaps something built in that I overlooked? I didn't have any luck with Google.
Set<String> oldStringSet = getOldStringSet();
Set<String> newStringSet = getNewStringSet();
for(String currentString : oldStringSet)
{
if (!newStringSet.contains(currentString))
{
newStringSet.add(currentString);
}
}

Since a Set cannot contain duplicate entries, you can combine the two with:
newStringSet.addAll(oldStringSet);
It does not matter if you add an element twice; the set will contain it only once. In other words, there is no need to check with the contains method first.
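A minimal sketch of the same idea, assuming the two original sets should stay untouched. The class and the `union` helper are illustrative names, not part of the question:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SetUnionExample {
    // Hypothetical helper: copies the first set, then adds the second.
    // addAll skips elements that are already present, so no contains() check is needed.
    static <T> Set<T> union(Set<? extends T> a, Set<? extends T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> oldStringSet = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> newStringSet = new HashSet<>(Arrays.asList("b", "c"));
        System.out.println(union(oldStringSet, newStringSet));
    }
}
```

Copying first costs one extra pass, but neither input is modified, which is usually the safer default when the sets come from elsewhere.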

You can do it using this one-liner
Set<String> combined = Stream.concat(newStringSet.stream(), oldStringSet.stream())
.collect(Collectors.toSet());
With a static import it looks even nicer
Set<String> combined = concat(newStringSet.stream(), oldStringSet.stream())
.collect(toSet());
Another way is to use flatMap method:
Set<String> combined = Stream.of(newStringSet, oldStringSet).flatMap(Set::stream)
.collect(toSet());
Also any collection could easily be combined with a single element
Set<String> combined = concat(newStringSet.stream(), Stream.of(singleValue))
.collect(toSet());

The same with Guava:
Set<String> combinedSet = Sets.union(oldStringSet, newStringSet);

By definition, a Set contains only unique elements.
Set<String> distinct = new HashSet<String>();
distinct.addAll(oldStringSet);
distinct.addAll(newStringSet);
To enhance your code you may create a generic method for that
public static <T> Set<T> distinct(Collection<T>... lists) {
Set<T> distinct = new HashSet<T>();
for(Collection<T> list : lists) {
distinct.addAll(list);
}
return distinct;
}

If you are using Guava you can also use a builder to get more flexibility:
ImmutableSet.<String>builder().addAll(someSet)
.addAll(anotherSet)
.add("A single string")
.build();

If you are using Apache Commons Collections, use the SetUtils class (org.apache.commons.collections4.SetUtils):
SetUtils.union(setA, setB);

Just use newStringSet.addAll(oldStringSet). No need to check for duplicates as the Set implementation does this already.

http://docs.oracle.com/javase/7/docs/api/java/util/Set.html#addAll(java.util.Collection)
Since sets can't have duplicates, just adding all the elements of one to the other generates the correct union of the two.

newStringSet.addAll(oldStringSet);
This produces the union of the two sets, stored in newStringSet.

If you care about performance, you don't need to keep your two sets, and one of them can be huge, I would suggest checking which set is larger and adding the elements of the smaller one to it.
Set<String> newStringSet = getNewStringSet();
Set<String> oldStringSet = getOldStringSet();
Set<String> myResult;
if(oldStringSet.size() > newStringSet.size()){
oldStringSet.addAll(newStringSet);
myResult = oldStringSet;
} else{
newStringSet.addAll(oldStringSet);
myResult = newStringSet;
}
In this way, if your new set has 10 elements and your old set has 100 000, you only do 10 operations instead of 100 000.
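The same idea can be sketched as a reusable method. This is only an illustration: `mergeIntoLarger` is a made-up name, and it assumes that mutating one of the inputs is acceptable, as the answer above does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class MergeIntoLarger {
    // Hypothetical helper: mutates and returns whichever set is larger,
    // so the number of add operations equals the size of the smaller set.
    static <T> Set<T> mergeIntoLarger(Set<T> a, Set<T> b) {
        Set<T> larger = a.size() >= b.size() ? a : b;
        Set<T> smaller = (larger == a) ? b : a;
        larger.addAll(smaller);
        return larger;
    }

    public static void main(String[] args) {
        Set<String> oldStringSet = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> newStringSet = new HashSet<>(Arrays.asList("c", "d"));
        System.out.println(mergeIntoLarger(oldStringSet, newStringSet));
    }
}
```

The returned reference is one of the two arguments, so callers should treat that input as consumed.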

Set.addAll()
Adds all of the elements in the specified collection to this set if they're not already present (optional operation). If the specified collection is also a set, the addAll operation effectively modifies this set so that its value is the union of the two sets.
newStringSet.addAll(oldStringSet);

Related

Creating an ArrayList from a method which returns a String

I have a custom class InfoAQ which has a method public String getSeqInf(). Now I have an ArrayList<InfoAQ> infList and
I need an ArrayList<String> strList = new ArrayList<String>() with the content from getSeqInf() for each element.
This is the way I'm doing it right now:
for(InfoAQ currentInf : infList)
strList.add(currentInf.getSeqInf());
Is there an alternative way to do it? Maybe a faster one, or a one-liner?

Yes, there is:
strList = infList.stream().map(e -> e.getSeqInf()).collect(Collectors.toList());
The map step can also be written another way:
strList = infList.stream().map(InfoAQ::getSeqInf).collect(Collectors.toList());
which is known as a method reference. The two solutions are equivalent.

Also maybe this one:
List<String> strList = new ArrayList<String>();
infList.forEach(e -> strList.add(e.getSeqInf()));

And there is another one (-liner, if you format it in a single line):
infList.forEach(currentInf -> {strList.add(currentInf.getSeqInf());});
while I would prefer a formatting in more lines:
infList.forEach(currentInf -> {
strList.add(currentInf.getSeqInf());
});

Using streams:
ArrayList<String> strList = infList.stream()
.map(InfoAQ::getSeqInf)
.collect(Collectors.toCollection(ArrayList::new));
Collectors.toCollection is used here to create an ArrayList that holds the results, as in your code. This matters if you care about the type of the result list, because Collectors.toList() makes no guarantee about it.
This may not be the fastest option, since streams carry some overhead. You need to benchmark to find out how it performs in your case.
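Put together as something that compiles on its own, the stream version might look like the sketch below. The InfoAQ class here is a stand-in, since only its getSeqInf() method is known from the question, and `seqInfs` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SeqInfExample {
    // Stand-in for the asker's InfoAQ class (assumed shape).
    static final class InfoAQ {
        private final String seqInf;
        InfoAQ(String seqInf) { this.seqInf = seqInf; }
        public String getSeqInf() { return seqInf; }
    }

    // Maps each element to its getSeqInf() value and collects into an ArrayList specifically.
    static ArrayList<String> seqInfs(List<InfoAQ> infList) {
        return infList.stream()
                .map(InfoAQ::getSeqInf)
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        List<InfoAQ> infList = Arrays.asList(new InfoAQ("x"), new InfoAQ("y"));
        System.out.println(seqInfs(infList));
    }
}
```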

This code iterates over all the elements of the list. Since getSeqInf returns a String, the collect method stores every value returned by getSeqInf in a list.
List<String> listString = infList.stream().map(InfoAQ::getSeqInf).collect(Collectors.toList());
or
ArrayList<String> listString = new ArrayList<>();
for(int i = 0; i < infList.size(); i++) {
listString.add(infList.get(i).getSeqInf());
}

What is the best optimized way to find if a list contains every element of another list?

I have two list:
List<String> firstList = new LinkedList<String>();
List<String> secondList = new LinkedList<String>();
I want to know if every element of a list is contained by the other list.
A possible solution could be:
public boolean function(List<String> first, List<String> second)
{
first = firstList;
second = secondList;
for (String item : firstList)
{
for (String elem : secondList)
{
if(elem.compareTo(item)!=0)
return false;
}
}
return true;
}
As we can see, the time is quadratic. Is there a way to do it better?

You have an O(n*m) implementation with O(1) space requirements; you could make an O(n+m) implementation with O(m) space requirements by adding elements of the first list to HashSet<String>, and then verifying that all elements of the second list are present:
Set<String> firstSet = new HashSet<String>(firstList);
for (String elem : secondList) {
if(!firstSet.contains(elem)) {
return false;
}
}
return true;
or even better
return new HashSet<>(firstList).containsAll(secondList);
(thanks, bradimus!)
Note: Your approach uses a sub-optimal comparison mechanism: rather than calling compareTo, you could call equals, because you do not need to know whether one word sorts alphabetically before or after the other.
Another problem is that your approach will often return false when it should return true, because it returns false as soon as any pair of elements differs.
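As a self-contained sketch of the HashSet approach (the class and method names are illustrative only):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class ContainsAllExample {
    // Returns true when every element of 'needles' occurs somewhere in 'haystack'.
    // Building the HashSet is O(n); each lookup is then expected O(1).
    static boolean containsEvery(List<String> haystack, List<String> needles) {
        return new HashSet<>(haystack).containsAll(needles);
    }

    public static void main(String[] args) {
        List<String> firstList = Arrays.asList("a", "b", "c");
        List<String> secondList = Arrays.asList("c", "a");
        System.out.println(containsEvery(firstList, secondList));
    }
}
```

Note that this treats the lists as sets: how often an element occurs in either list is ignored.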

public boolean function(List<String> first, List<String> second) {
return (second.size() == first.size() && first.containsAll(second));
}

Depending on whether one or both lists change regularly, you may find it more efficient to use a HashSet<String> for the one that changes rarely.
See Set operations: union, intersection, difference, symmetric difference, is subset, is superset for more details.

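Try this:
!Collections.disjoint(list1, list2);
Note that this only tells you whether the two lists share at least one element. To check that every element is contained, use containsAll.
Hope it helps!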

return first.containsAll(second);

There is an existing answer that I believe addresses your question.
The idea is basically to use two Sets and compute the size of their intersection.
If you can afford to use a Set instead of a List, you save time on contains, and the overall complexity is O(n) in every case.
Edit
If you care about matching duplicates in your input Lists, the above solution may not give the result you want, because a Set discards duplicates. Be careful about that.
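If duplicates are meant to count (an assumption, since the question does not say so), one possible sketch is to tally occurrences in a map instead of using a plain Set. The names below are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultisetContains {
    // True when 'first' holds each element at least as many times as 'second' does.
    static boolean containsAllWithCounts(List<String> first, List<String> second) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : first) {
            counts.merge(s, 1, Integer::sum); // tally occurrences in the first list
        }
        for (String s : second) {
            Integer left = counts.get(s);
            if (left == null || left == 0) {
                return false; // 'second' needs more copies than 'first' has
            }
            counts.put(s, left - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsAllWithCounts(
                Arrays.asList("a", "a", "b"), Arrays.asList("a", "a")));
    }
}
```

This stays linear in the combined size of the two lists, at the cost of one map entry per distinct element of the first list.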

Can the new for loop in Java be used with two variables?

We can use the old for loop (for(i = 0, j = 0; i<30; i++,j++)) with two variables.
Can we use the for-each loop (or the enhanced for loop) in Java (for(Item item : items)) with two variables? What's the syntax for that?

Unfortunately, Java supports only a rudimentary foreach loop, called the enhanced for loop. Other languages, especially FP ones like Scala, support a construct known as list comprehension (Scala calls it for comprehension) which allows nested iterations, as well as filtering of elements along the way.

No, you can't. It is syntactic sugar for using an Iterator. Refer here for a good answer on this issue.
You need to have an object that contains both variables.
A Map illustrates this, for example:
for (Map.Entry<String,String> e: map.entrySet()) {
// you can use e.getKey() and e.getValue() here
}

The following should have the same (performance) effect that you are trying to achieve:
List<Item> aItems = new ArrayList<Item>();
List<Item> bItems = new ArrayList<Item>();
...
Iterator<Item> aIterator = aItems.iterator();
Iterator<Item> bIterator = bItems.iterator();
while (aIterator.hasNext() && bIterator.hasNext()) {
Item aItem = aIterator.next();
Item bItem = bIterator.next();
}

The foreach loop assumes that there is only one collection of things. You can do something with one element per iteration. How would you want it to behave if you could iterate over two collections at once? What if they have different lengths?
Assuming that you have
Collection<T1> collection1;
Collection<T2> collection2;
You could write an iterable wrapper that iterates over both and returns some sort of merged result.
for(TwoThings<T1, T2> thing : new TwoCollectionWrapper(collection1, collection2)) {
// one of them could be null if collections have different length
T1 t1 = thing.getFirst();
T2 t2 = thing.getSecond();
}
That's the closest thing I can think of, but I don't see much use for it. If both collections are meant to be iterated together, it would be simpler to create a Collection<TwoThings> in the first place.
Besides iterating in parallel, you might also want to iterate sequentially. There are implementations for that, e.g. Guava's Iterables.concat().

The simple answer "No" has already been given. But you could implement an Iterator that takes two iterators as arguments and returns Pairs of the elements coming from the two iterators, Pair being a class with two fields. You'd either have to implement that yourself, or it probably exists in Apache Commons or a similar library.
This new Iterator could then be used in the foreach loop.
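A rough sketch of such a pairing iterator, using only the JDK. The `zip` helper is hypothetical, Map.Entry stands in for the Pair class, and stopping at the shorter input is just one possible policy:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

final class Zip {
    // Hypothetical helper: pairs up elements of two Iterables, stopping at the shorter one.
    static <A, B> Iterable<Map.Entry<A, B>> zip(Iterable<A> first, Iterable<B> second) {
        return () -> new Iterator<Map.Entry<A, B>>() {
            private final Iterator<A> a = first.iterator();
            private final Iterator<B> b = second.iterator();

            @Override
            public boolean hasNext() {
                return a.hasNext() && b.hasNext();
            }

            @Override
            public Map.Entry<A, B> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return new SimpleImmutableEntry<>(a.next(), b.next());
            }
        };
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        List<Integer> numbers = Arrays.asList(1, 2);
        for (Map.Entry<String, Integer> pair : zip(names, numbers)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Because `zip` returns an Iterable, it can be used directly in the enhanced for loop, which gives the "two variables" effect through getKey() and getValue().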

I had a task where I needed to collect various data from XML, store it in Set objects, and then output it to a CSV file.
I read the data and stored it in three Set objects, x, y and z.
For the CSV file header, I used a StringBuffer:
StringBuffer buffer = new StringBuffer("");
buffer.append("FIRST_NAME,LAST_NAME,ADDRESS\r\n");
Set<String> x = new HashSet<String>();
Set<String> y = new HashSet<String>();
Set<String> z = new HashSet<String>();
....
Iterator<String> iterator1 = x.iterator();
Iterator<String> iterator2 = y.iterator();
Iterator<String> iterator3 = z.iterator();
while(iterator1.hasNext() && iterator2.hasNext() && iterator3.hasNext()){
String fN = iterator1.next();
String lN = iterator2.next();
String aDS = iterator3.next();
buffer.append(""+fN+","+lN+","+aDS+"\r\n");
}

How to make a new list with a property of an object which is in another list

Imagine that I have a list of certain objects:
List<Student>
And I need to generate another list including the ids of Students in the above list:
List<Integer>
Avoiding using a loop, is it possible to achieve this by using apache collections or guava?
Which methods should be useful for my case?

Java 8 way of doing it:
List<Integer> idList = students.stream().map(Student::getId).collect(Collectors.toList());

With Guava you can use Function like -
private enum StudentToId implements Function<Student, Integer> {
INSTANCE;
@Override
public Integer apply(Student input) {
return input.getId();
}
}
and you can use this function to convert List of students to ids like -
Lists.transform(studentList, StudentToId.INSTANCE);
Of course it will still loop to extract all the ids, but remember that Guava's transform methods return a view, and the Function is only applied when you iterate over the List<Integer>.
If you don't iterate, the function is never applied.
Note: Because this is a view, if you want to iterate multiple times it is better to copy the content into another List<Integer>, like:
ImmutableList.copyOf(Iterables.transform(students, StudentToId.INSTANCE));

Thanks to Premraj for the cool alternative option; upvoted.
I have used Apache CollectionUtils and BeanUtils, and I am satisfied with the performance of the following code:
List<Long> idList = (List<Long>) CollectionUtils.collect(objectList,
new BeanToPropertyValueTransformer("id"));
I will compare the performance of the Guava approach (provided by Premraj) with the CollectionUtils approach above, and choose the faster one.

Java 8 lambda expression solution:
List<Integer> iDList = students.stream().map((student) -> student.getId()).collect(Collectors.toList());

If someone gets here after a few years:
List<String> stringProperty = (List<String>) CollectionUtils.collect(listOfBeans, TransformerUtils.invokerTransformer("getProperty"));

You can use Eclipse Collections for this purpose
Student first = new Student(1);
Student second = new Student(2);
Student third = new Student(3);
MutableList<Student> list = Lists.mutable.of(first, second, third);
List<Integer> result = list.collect(Student::getId);
System.out.println(result); // [1, 2, 3]

The accepted answer can be written in an even shorter form in JDK 16, which adds a toList() method directly on Stream instances.
Java 16 solution
List<Integer> idList = students.stream().map(Student::getId).toList();

It is mathematically impossible to do this without a loop. In order to create a mapping, F, of a discrete set of values to another discrete set of values, F must operate on each element in the originating set. (A loop is required to do this, basically.)
That being said:
Why do you need a new list? You could be approaching whatever problem you are solving in the wrong way.
If you have a list of Student, then you are only a step or two away, when iterating through this list, from iterating over the I.D. numbers of the students.
for(Student s : list)
{
int current_id = s.getId();
// Do something with current_id
}
If you have a different sort of problem, then comment/update the question and we'll try to help you.

How to lowercase every element of a collection efficiently?

What's the most efficient way to lower case every element of a List or Set?
My idea for a List:
final List<String> strings = new ArrayList<String>();
strings.add("HELLO");
strings.add("WORLD");
for(int i=0,l=strings.size();i<l;++i)
{
strings.add(strings.remove(0).toLowerCase());
}
Is there a better, faster way? What would this example look like for a Set? As there is currently no method for applying an operation to each element of a Set (or List), can it be done without creating an additional temporary Set?
Something like this would be nice:
Set<String> strings = new HashSet<String>();
strings.apply(
function (element)
{ this.replace(element, element.toLowerCase()); }
);
Thanks,

Yet another solution, but with Java 8 and above:
List<String> result = strings.stream()
.map(String::toLowerCase)
.collect(Collectors.toList());

This seems like a fairly clean solution for lists. It should allow for the particular List implementation being used to provide an implementation that is optimal for both the traversal of the list--in linear time--and the replacing of the string--in constant time.
public static void replace(List<String> strings)
{
ListIterator<String> iterator = strings.listIterator();
while (iterator.hasNext())
{
iterator.set(iterator.next().toLowerCase());
}
}
This is the best that I can come up with for sets. As others have said, the operation cannot be performed in-place in the set for a number of reasons. The lower-case string may need to be placed in a different location in the set than the string it is replacing. Moreover, the lower-case string may not be added to the set at all if it is identical to another lower-case string that has already been added (e.g., "HELLO" and "Hello" will both yield "hello", which will only be added to the set once).
public static void replace(Set<String> strings)
{
String[] stringsArray = strings.toArray(new String[0]);
for (int i=0; i<stringsArray.length; ++i)
{
stringsArray[i] = stringsArray[i].toLowerCase();
}
strings.clear();
strings.addAll(Arrays.asList(stringsArray));
}
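The collapsing behaviour described above can be seen in a short sketch that builds a new set instead of refilling the old one. The class and method names are illustrative, and the use of Locale.ROOT is an added assumption to keep the result independent of the default locale:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class LowerCaseSet {
    // Builds a new set; entries that differ only by case end up as a single element.
    static Set<String> toLowerCase(Set<String> strings) {
        return strings.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        Set<String> strings = new HashSet<>(Arrays.asList("HELLO", "Hello", "WORLD"));
        System.out.println(toLowerCase(strings).size());
    }
}
```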

You can do this with Google Collections:
Collection<String> lowerCaseStrings = Collections2.transform(strings,
new Function<String, String>() {
public String apply(String str) {
return str.toLowerCase();
}
}
);

If you are fine with changing the input list here is one more way to achieve it.
strings.replaceAll(String::toLowerCase);

Well, there is no real elegant solution due to two facts:
Strings in Java are immutable
Java gives you no real nice map(f, list) function as you have in functional languages.
Asymptotically speaking, you can't get a better run time than your current method. You will have to create a new string using toLowerCase() and you will need to iterate by yourself over the list and generate each new lower-case string, replacing it with the existing one.

Try CollectionUtils#transform in Commons Collections for an in-place solution, or Collections2#transform in Guava if you need a live view.

This is probably faster:
for(int i=0,l=strings.size();i<l;++i)
{
strings.set(i, strings.get(i).toLowerCase());
}

I don't believe it is possible to do the manipulation in place (without creating another Collection) if you change strings to be a Set. This is because you can only iterate over the Set using an iterator or a for each loop, and cannot insert new objects whilst doing so (it throws an exception)
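A small sketch of the failure mode, assuming a HashSet with at least two upper-case entries. The fail-fast check is documented as best-effort, so this illustrates typical behaviour rather than a guarantee, and the names are made up:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

final class InPlaceSetEdit {
    // Tries to add lower-cased copies while iterating and reports whether the iterator objected.
    static boolean failsWhenAddingDuringIteration(Set<String> strings) {
        try {
            for (String s : strings) {
                strings.add(s.toLowerCase()); // structural change in the middle of iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> strings = new HashSet<>(Arrays.asList("A", "B"));
        System.out.println(failsWhenAddingDuringIteration(strings));
    }
}
```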

Referring to the ListIterator method in the accepted (Matthew T. Staebler's) solution. How is using the ListIterator better than the method here?
public static Set<String> replace(List<String> strings) {
Set<String> set = new HashSet<>();
for (String s: strings)
set.add(s.toLowerCase());
return set;
}

I was looking for something similar, but was stuck because my ArrayList object was not declared with a generic type; I received it from elsewhere as a raw List. All I had was an ArrayList object "_products". What I did is shown below, and it worked for me:
List<String> dbProducts = _products;
for(int i = 0; i<dbProducts.size(); i++) {
dbProducts.set(i, dbProducts.get(i).toLowerCase());
}
That is, I first assigned my _products to a generic list (as it contained only strings), then I applied toLowerCase() to the list elements, which did not work previously because of the non-generic ArrayList.
The toLowerCase() method used here belongs to the String class:
String java.lang.String.toLowerCase()
not to ArrayList or Object.
Please correct me if I'm wrong. A Java newbie seeking guidance. :)

Using a Java 8 parallel stream, this can become faster for large lists:
List<String> input= new ArrayList<>();
input.add("A");
input.add("B");
input.add("C");
input.add("D");
List<String> output = input.parallelStream().map((item) -> item.toLowerCase())
.collect(Collectors.toList());
