I have reduced my test to a minimal example so the problem is easier to reproduce:
Minimal test
public class test {
public static void main(String[] args) {
List<TestBean> obj_list = Arrays.asList(new TestBean("aa"), new TestBean("bb" ), new TestBean("bb")).stream()
.distinct().map(tt -> {
tt.col = tt.col + "_t";
return tt;
}).collect(Collectors.toList());
System.out.println(obj_list);
List<String> string_obj_list = Arrays.asList(new String("1"), new String("2"), new String("2")).stream().distinct().map(t -> t + "_t").collect(Collectors.toList());
System.out.println(string_obj_list);
List<String> string_list = Arrays.asList("1", "2", "2").stream().distinct().map(t -> t + "_t").collect(Collectors.toList());
System.out.println(string_list);
}
}
@Data
@AllArgsConstructor
@EqualsAndHashCode
class TestBean {
String col;
}
The result is below. I cannot understand the first line, where the duplicate is not removed:
[TestBean(col=aa_t), TestBean(col=bb_t), TestBean(col=bb_t)]
[1_t, 2_t]
[1_t, 2_t]
----------original question is below -------------------------------
My logic, step by step:
1. produce a stream from a list
2. map each element to a list
3. flatten those lists into one stream
4. apply distinct() to the elements
5. apply a map function to each element and collect the result into a list
However, the result does not reflect the distinct() step (step 4), and I cannot understand why.
public class test {
public static void main(String[] args) {
List<TestBean> stage1 = Arrays.asList(new TestBean("aa", null), new TestBean("bb", null), new TestBean("bb", null)).stream()
.map(tt -> {
return Arrays.asList(tt);
})
.flatMap(Collection::stream).distinct().collect(Collectors.toList());
List<Object> stage2 = stage1.stream().map(tt -> {
tt.setCol2(tt.col1);
return tt;
}).collect(Collectors.toList());
System.out.println(stage1);
System.out.println(stage2);
List<TestBean> stage_all = Arrays.asList(new TestBean("aa", null), new TestBean("bb", null), new TestBean("bb", null)).stream()
.map(tt -> {
return Arrays.asList(tt);
})
.flatMap(Collection::stream).distinct().map(tt -> {
tt.setCol2(tt.col1);
return tt;
}).collect(Collectors.toList());
System.out.println(stage_all);
}
}
@Data
@AllArgsConstructor
@EqualsAndHashCode
class TestBean {
String col1;
String col2;
}
the result is
[TestBean(col1=aa, col2=aa), TestBean(col1=bb, col2=bb)]
[TestBean(col1=aa, col2=aa), TestBean(col1=bb, col2=bb)]
[TestBean(col1=aa, col2=aa), TestBean(col1=bb, col2=bb), TestBean(col1=bb, col2=bb)]
The third line is the one I cannot explain: the duplicate bb entry is still there.
The distinct() operation filters the items in the stream using Object.equals(Object); in practice it remembers the items it has already seen in a hash-based set, so it depends on hashCode() as well.
However, you are mutating the items as they are streamed, which is a very bad idea for set operations. In a sequential stream the elements flow through the pipeline one at a time, so the first TestBean(col=bb) has already been changed to TestBean(col=bb_t) by map() before the final TestBean(col=bb) is handled by distinct(). At that moment the distinct() step has seen three unequal items, so the last map() sees all three.
You can verify this by re-running the stream without the side effect in map(), or by adding .distinct() after .map().
The takeaway: don't use distinct() or other set-like operations on objects whose fields used in equals() or hashCode() change while they are being processed, because the underlying set.add() calls then let duplicates through. This is where Java records are useful: they cannot be changed, which eliminates this type of side-effect error:
record TestBean(String col) {}
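As a minimal sketch of that advice (assuming Java 16 or later for records; the class name DistinctWithRecord and the helper distinctThenSuffix are made up for this illustration), the pipeline from the question can build new beans in map() instead of mutating the ones distinct() has already seen:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctWithRecord {
    // Immutable value type: equals() and hashCode() are derived from col and cannot change.
    record TestBean(String col) {}

    // Hypothetical helper: de-duplicate first, then derive NEW beans instead of mutating.
    static List<TestBean> distinctThenSuffix(List<TestBean> input) {
        return input.stream()
                .distinct()
                .map(bean -> new TestBean(bean.col() + "_t"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TestBean> result = distinctThenSuffix(
                List.of(new TestBean("aa"), new TestBean("bb"), new TestBean("bb")));
        System.out.println(result); // [TestBean[col=aa_t], TestBean[col=bb_t]]
    }
}
```

Because nothing that distinct() has stored is ever modified, the second bb is recognised as a duplicate and only two elements reach the result.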
Example
The @EqualsAndHashCode annotation on TestBean generates the equals and hashCode methods, which are essential for Set / distinct() operations to work. If the result of hashCode/equals changes after an item has been added, the set won't work properly, because it fails to recognise the previously added element as a duplicate of a newly added one. Consider this simpler definition of TestBean, used in a loop that adds the same instance 5 times:
public static void main(String... args) {
class TestBean {
String col;
TestBean(String col) {
this.col = col;
}
// This is a bad choice of hashCode because it changes whenever "col" changes:
public int hashCode() {
return col.hashCode();
}
}
Set<TestBean> set = new HashSet<>();
TestBean x = new TestBean("bb");
for (int i = 0; i < 5; i++) {
System.out.println("set.add(x)="+set.add(x));
System.out.println("set.size()="+set.size());
// Comment out next line or whole hashCode method:
x.col += "_t";
}
}
Run the above, which adds the same element to a set 5 times. You will see that set.size() may be 5, not 1. Comment out the line that causes the hash code to change, or the hashCode() method itself, and set.size() stays at 1 as expected.
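The same effect can be seen from the lookup side. The following is a small sketch, not part of the original example: the Bean class and the method foundAfterMutation are hypothetical names, and Bean implements both equals() and hashCode() on its mutable field.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutatedElementLookup {
    // Hypothetical mutable bean with value-based equals() and hashCode().
    static final class Bean {
        String col;
        Bean(String col) { this.col = col; }
        @Override public boolean equals(Object o) {
            return o instanceof Bean && Objects.equals(col, ((Bean) o).col);
        }
        @Override public int hashCode() { return Objects.hashCode(col); }
    }

    // Returns whether the set still finds x after x was mutated in place.
    static boolean foundAfterMutation() {
        Set<Bean> set = new HashSet<>();
        Bean x = new Bean("bb");
        set.add(x);
        x.col = "bb_t";          // the hash code changes while x sits in the set
        return set.contains(x);  // the lookup now uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + foundAfterMutation()); // false
    }
}
```

The element is still physically in the set, but it was filed under the hash code of "bb", so a lookup with the hash code of "bb_t" misses it. This is the same mechanism that lets the second bb slip past distinct().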
Related
I am reading data from an Excel file using Apache POI and transforming it into a list of objects. Now I want to extract any duplicates, based on certain rules, into another list of those objects, and also get the list of non-duplicates.
Condition to check for a duplicate
name
email
phone number
gst number
A match on any one of these properties makes a duplicate, i.e. the conditions are combined with OR, not AND.
Party Class
public class Party {
private String name;
private Long number;
private String email;
private String address;
private BigDecimal openingBalance;
private LocalDateTime openingDate;
private String gstNumber;
// Getter Setter Skipped
}
Let's say this is the list produced so far by the logic that reads the Excel data:
var firstParty = new Party();
firstParty.setName("Valid Party");
firstParty.setAddress("Valid");
firstParty.setEmail("Valid");
firstParty.setGstNumber("Valid");
firstParty.setNumber(1234567890L);
firstParty.setOpeningBalance(BigDecimal.ZERO);
firstParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var secondParty = new Party();
secondParty.setName("Valid Party");
secondParty.setAddress("Valid Address");
secondParty.setEmail("Valid Email");
secondParty.setGstNumber("Valid GST");
secondParty.setNumber(7593612247L);
secondParty.setOpeningBalance(BigDecimal.ZERO);
secondParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var thirdParty = new Party();
thirdParty.setName("Valid Party 1");
thirdParty.setAddress("address");
thirdParty.setEmail("email");
thirdParty.setGstNumber("gst");
thirdParty.setNumber(7593612888L);
thirdParty.setOpeningBalance(BigDecimal.ZERO);
thirdParty.setOpeningDate(DateUtil.getDDMMDateFromString("01/01/2020"));
var validParties = List.of(firstParty, secondParty, thirdParty);
What I have attempted so far:
var partyNameOccurrenceMap = validParties.parallelStream()
.map(Party::getName)
.collect(Collectors.groupingBy(Function.identity(), HashMap::new, Collectors.counting()));
var partyNameOccurrenceMapCopy = SerializationUtils.clone(partyNameOccurrenceMap);
var duplicateParties = validParties.stream()
.filter(party-> {
var occurrence = partyNameOccurrenceMap.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMap.put(party.getName(), occurrence - 1);
return true;
}
return false;
})
.toList();
var nonDuplicateParties = validParties.stream()
.filter(party -> {
var occurrence = partyNameOccurrenceMapCopy.get(party.getName());
if (occurrence > 1) {
partyNameOccurrenceMapCopy.put(party.getName(), occurrence - 1);
return false;
}
return true;
})
.toList();
The above code only checks the party name, but we also need to check email, phone number and GST number.
The code above works fine, but readability, conciseness and performance might be an issue, since the data set is fairly large (around 10k rows in the Excel file).
Never ignore the equals/hashCode contract
name, email, number, gstNumber
Any of these properties can result in a duplicate, which means OR
Your definition of a duplicate implies that a match on any one of these properties is enough, whilst the others might differ.
It means that it's impossible to provide an equals/hashCode implementation that matches this definition and doesn't violate the hashCode contract:
If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.
I.e. if you implement equals in such a way that a match on any (not all) of these properties (name, email, number, gstNumber) is enough to consider two objects equal, then there's no way to implement a useful hashCode correctly: two objects that share only a name would need the same hash, and so would any object that shares an email with either of them, and so on. Such an equals would not even be transitive.
As a consequence, you can't use an object with a broken equals/hashCode implementation in a hash-based collection, because equal objects might end up in different buckets (since they can produce different hashes). I.e. a HashMap would not be able to recognize the duplicated keys, hence groupingBy() with Function.identity() as the classifier function would not work properly.
Therefore, to address this problem, you need to implement equals() based on all 4 properties: name, email, number, gstNumber (i.e. all these values have to be equal), and similarly all these values must contribute to the hash code.
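To make the problem concrete, here is a small sketch of the OR-style comparison written as a plain helper method instead of equals(). The Contact class is a hypothetical, cut-down stand-in for Party with only two of the four properties.

```java
import java.util.Objects;

public class AnyPropertyMatch {
    // Hypothetical, cut-down stand-in for Party with only two of the four properties.
    static final class Contact {
        final String name;
        final String email;
        Contact(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    // "Duplicate" in the OR sense: a match on either property is enough.
    static boolean isDuplicate(Contact a, Contact b) {
        return Objects.equals(a.name, b.name) || Objects.equals(a.email, b.email);
    }

    public static void main(String[] args) {
        Contact a = new Contact("n1", "e1");
        Contact b = new Contact("n1", "e2");
        Contact c = new Contact("n2", "e2");
        System.out.println(isDuplicate(a, b)); // true  (same name)
        System.out.println(isDuplicate(b, c)); // true  (same email)
        System.out.println(isDuplicate(a, c)); // false (nothing in common)
    }
}
```

If this relation were used as equals(), a and c would both have to share b's hash code even though they are not equal to each other, which is why the duplicate rule is better kept out of equals/hashCode.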
How to determine Duplicates
There's no easy way to determine duplicates by multiple criteria. The solution you've provided is not viable, since we can't rely on the equals/hashCode.
The only way is to generate a HashMap separately for each and every attribute (i.e. in this case we need 4 maps). But can we improve on this, avoiding repeating the same steps for each map and hard-coding the logic?
Yes, we can.
We can create a custom generic accumulation type (it would be suitable for any class - no hard-coded logic) that would encapsulate all the logic of determining duplicates and maintain an arbitrary number of maps under the hood. After consuming all the elements from the given collection, this custom object would be aware of all the duplicates in it.
That's how it can be implemented.
A custom accumulation type that would be used as the container of a custom Collector. Its constructor expects varargs of functions; each function corresponds to a property that should be taken into account when checking whether an object is a duplicate.
public static class DuplicateChecker<T> implements Consumer<T> {

    private List<DuplicateHandler<T>> handles;
    private Set<T> duplicates;

    @SafeVarargs
    public DuplicateChecker(Function<T, ?>... keyExtractors) {
        this.handles = Arrays.stream(keyExtractors)
            .map(DuplicateHandler::new)
            .toList();
    }

    @Override
    public void accept(T t) {
        handles.forEach(h -> h.accept(t));
    }

    public DuplicateChecker<T> merge(DuplicateChecker<T> other) {
        // Both checkers are built from the same key extractors in the same order,
        // so each handler is merged only with its counterpart for the same property.
        for (int i = 0; i < handles.size(); i++) {
            handles.get(i).merge(other.handles.get(i));
        }
        return this;
    }

    public DuplicateChecker<T> finish() {
        duplicates = handles.stream()
            .flatMap(handler -> handler.getDuplicates().stream())
            .flatMap(Set::stream)
            .collect(Collectors.toSet());
        return this;
    }

    public boolean isDuplicate(T t) {
        return duplicates.contains(t);
    }
}
A helper class representing a single criterion (like name, email, etc.), which encapsulates a HashMap. keyExtractor is used to obtain a key from an object of type T.
public static class DuplicateHandler<T> implements Consumer<T> {

    private Map<Object, Set<T>> itemByKey = new HashMap<>();
    private Function<T, ?> keyExtractor;

    public DuplicateHandler(Function<T, ?> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    @Override
    public void accept(T t) {
        itemByKey.computeIfAbsent(keyExtractor.apply(t), k -> new HashSet<>()).add(t);
    }

    public void merge(DuplicateHandler<T> other) {
        other.itemByKey.forEach((k, v) ->
            itemByKey.merge(k, v, (oldV, newV) -> { oldV.addAll(newV); return oldV; }));
    }

    public Collection<Set<T>> getDuplicates() {
        Collection<Set<T>> duplicates = itemByKey.values();
        duplicates.removeIf(set -> set.size() == 1); // the object is proved to be unique by this particular property
        return duplicates;
    }
}
And this is the method responsible for generating the map of duplicates, which would be used from the client code. The given collection is partitioned into two parts: one mapped to the key true (duplicates), the other mapped to the key false (unique objects).
@SafeVarargs
public static <T> Map<Boolean, List<T>> getPartitionByProperties(Collection<T> parties,
                                                                 Function<T, ?>... keyExtractors) {
    DuplicateChecker<T> duplicateChecker = parties.stream()
        .collect(Collector.of(
            () -> new DuplicateChecker<>(keyExtractors),
            DuplicateChecker::accept,
            DuplicateChecker::merge,
            DuplicateChecker::finish
        ));

    return parties.stream()
        .collect(Collectors.partitioningBy(duplicateChecker::isDuplicate));
}
And that's how you can apply it to your particular case.
main()
public static void main(String[] args) {
    List<Party> parties = List.of(); // initialize the list of parties here
    Map<Boolean, List<Party>> isDuplicate = getPartitionByProperties(parties,
        Party::getName, Party::getNumber,
        Party::getEmail, Party::getGstNumber);
}
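For readers who want to try the idea without the collector machinery, the following is a compact, sequential sketch of the same "one map per property" approach. It is not the code above: the class AnyKeyPartition, the method partition and the cut-down Contact class are hypothetical names chosen for this illustration, and null property values are simply grouped together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AnyKeyPartition {
    // Hypothetical, cut-down stand-in for Party.
    static final class Contact {
        final String name;
        final String email;
        Contact(String name, String email) {
            this.name = name;
            this.email = email;
        }
    }

    // Partitions items into duplicates (key true) and unique items (key false).
    @SafeVarargs
    static <T> Map<Boolean, List<T>> partition(List<T> items, Function<T, ?>... keyExtractors) {
        // Identity-based set, so the result does not depend on T's equals()/hashCode().
        Set<T> duplicates = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Function<T, ?> extractor : keyExtractors) {
            // One map per property: property value -> all items carrying that value.
            Map<Object, List<T>> byKey = new HashMap<>();
            for (T item : items) {
                byKey.computeIfAbsent(extractor.apply(item), k -> new ArrayList<>()).add(item);
            }
            for (List<T> group : byKey.values()) {
                if (group.size() > 1) {
                    duplicates.addAll(group); // value shared by several items
                }
            }
        }
        return items.stream().collect(Collectors.partitioningBy(duplicates::contains));
    }

    public static void main(String[] args) {
        List<Contact> contacts = List.of(
                new Contact("n1", "e1"), new Contact("n1", "e2"),
                new Contact("n2", "e2"), new Contact("n3", "e3"));
        Map<Boolean, List<Contact>> result = partition(contacts, c -> c.name, c -> c.email);
        System.out.println("duplicates: " + result.get(true).size());  // 3
        System.out.println("unique: " + result.get(false).size());     // 1
    }
}
```

Each property costs one pass over the list, so the whole thing stays linear in the number of rows for a fixed number of properties.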
I would create a map for each property, where
the key is the property value we want to check for duplicates
the value is a Set containing the indexes of all elements in the list with the same key.
Then we can
filter the values in each map that contain more than 1 index (i.e. the duplicate indexes).
union all the duplicate indexes
determine whether each element is a duplicate or unique by looking up its index in that union.
The time complexity is roughly O(n).
public class UniquePerEachProperty {
private static void separate(List<Party> partyList) {
Map<String, Set<Integer>> nameToIndexesMap = new HashMap<>();
Map<String, Set<Integer>> emailToIndexesMap = new HashMap<>();
Map<Long, Set<Integer>> numberToIndexesMap = new HashMap<>();
Map<String, Set<Integer>> gstNumberToIndexesMap = new HashMap<>();
for (int i = 0; i < partyList.size(); i++) {
Party party = partyList.get(i);
nameToIndexesMap.putIfAbsent(party.getName(), new HashSet<>());
nameToIndexesMap.get(party.getName()).add(i);
emailToIndexesMap.putIfAbsent(party.getEmail(), new HashSet<>());
emailToIndexesMap.get(party.getEmail()).add(i);
numberToIndexesMap.putIfAbsent(party.getNumber(), new HashSet<>());
numberToIndexesMap.get(party.getNumber()).add(i);
gstNumberToIndexesMap.putIfAbsent(party.getGstNumber(), new HashSet<>());
gstNumberToIndexesMap.get(party.getGstNumber()).add(i);
}
Set<Integer> duplicatedIndexes = Stream.of(
nameToIndexesMap.values(),
emailToIndexesMap.values(),
numberToIndexesMap.values(),
gstNumberToIndexesMap.values()
).flatMap(Collection::stream).filter(indexes -> indexes.size() > 1)
.flatMap(Set::stream).collect(Collectors.toSet());
List<Party> duplicatedList = new ArrayList<>();
List<Party> uniqueList = new ArrayList<>();
for (int i = 0; i < partyList.size(); i++) {
Party party = partyList.get(i);
if (duplicatedIndexes.contains(i)) {
duplicatedList.add(party);
} else {
uniqueList.add(party);
}
}
System.out.println("duplicated:" + duplicatedList);
System.out.println("unique:" + uniqueList);
}
public static void main(String[] args) {
separate(List.of(
// name duplicate
new Party("name1", 1L, "email1", "gstNumber1"),
new Party("name1", 2L, "email2", "gstNumber2"),
// number duplicate
new Party("name3", 3L, "email3", "gstNumber3"),
new Party("name4", 3L, "email4", "gstNumber4"),
// email duplicate
new Party("name5", 5L, "email5", "gstNumber5"),
new Party("name6", 6L, "email5", "gstNumber6"),
// gstNumber duplicate
new Party("name7", 7L, "email7", "gstNumber7"),
new Party("name8", 8L, "email8", "gstNumber7"),
// unique
new Party("name9", 9L, "email9", "gstNumber9")
));
}
}
Assume Party has the constructor and toString() below (for testing):
public class Party {
public Party(String name, Long number, String email, String gstNumber) {
this.name = name;
this.number = number;
this.email = email;
this.address = "";
this.openingBalance = BigDecimal.ZERO;
this.openingDate = LocalDateTime.MIN;
this.gstNumber = gstNumber;
}
@Override
public String toString() {
return "Party{" +
"name='" + name + '\'' +
", number=" + number +
", email='" + email + '\'' +
", gstNumber='" + gstNumber + '\'' +
'}';
}
...
}
In Java docs it is given -
Modifier and Type Method and Description
static <T> Predicate<T> isEqual(Object targetRef)
Returns a predicate that tests if two arguments are equal according to Objects.equals(Object, Object).
In https://www.geeksforgeeks.org/java-8-predicate-with-examples/
it is given -
isEqual(Object targetRef) : Returns a predicate that tests if two arguments are equal according to Objects.equals(Object, Object).
static Predicate isEqual(Object targetRef)
Returns a predicate that tests if two arguments are
equal according to Objects.equals(Object, Object).
T : the type of arguments to the predicate
Parameters:
targetRef : the object reference with which to
compare for equality, which may be null
Returns: a predicate that tests if two arguments
are equal according to Objects.equals(Object, Object)
I can't get a grasp of what this Objects.equals(Object, Object) might be.
I wrote the following code to try it out -
Class Fruits -
Fruits.java -
public class Fruits {
private String fruit;
public Fruits(String fruit) {
this.fruit = fruit;
}
public String getFruit() {
return fruit;
}
}
Here, the other methods of predicate seem to be quite easy to understand -
Predicate<List<Fruits>> containsApple = list -> {
boolean myReturn = false;
Iterator<Fruits> iterator = list.iterator();
while (iterator.hasNext()) {
Fruits fruits = iterator.next();
String fruit = fruits.getFruit();
if (fruit.equals("Apple")) {
myReturn = true;
break;
}
}
return myReturn;
};
Predicate<List<Fruits>> containsOrange = list -> {
boolean myReturn = false;
Iterator<Fruits> iterator = list.iterator();
while (iterator.hasNext()) {
Fruits fruits = iterator.next();
String fruit = fruits.getFruit();
if (fruit.equals("Orange")) {
myReturn = true;
break;
}
}
return myReturn;
};
Predicate<List<Fruits>> containsAppleAndOrange = list -> {
return containsApple.and(containsOrange).test(list);
};
Predicate<List<Fruits>> containsAppleOrRange = list -> {
return containsApple.or(containsOrange).test(list);
};
Predicate<List<Fruits>> notContainsApple = list -> {
return containsApple.negate().test(list);
};
Predicate<List<Fruits>> notContainsOrange = list -> {
return containsOrange.negate().test(list);
};
Predicate<List<Fruits>> notContainsAppleAndOrange = list -> {
return containsAppleAndOrange.negate().test(list);
};
Predicate<List<Fruits>> notContainsAppleOrOrange = list -> {
return containsAppleOrRange.negate().test(list);
};
Here I test it with following data -
List<Fruits> list1 = new ArrayList<>(List.of(
new Fruits("Apple"),
new Fruits("Orange"),
new Fruits("Mango"),
new Fruits("Banana")
));
List<Fruits> list2 = new ArrayList<>(List.of(
new Fruits("Apple"),
new Fruits("Mango"),
new Fruits("Banana"),
new Fruits("Berry")
));
List<Fruits> list3 = new ArrayList<>(List.of(
new Fruits("Orange"),
new Fruits("Mango"),
new Fruits("Banana"),
new Fruits("Berry")
));
Result is as expected.
But I cannot work out how to use the isEqual() method -
To see whether two arguments are equal or not, I created another predicate -
Predicate<List<Fruits>> containsApple2 = list -> {
boolean myReturn = false;
Iterator<Fruits> iterator = list.iterator();
while (iterator.hasNext()) {
Fruits fruits = iterator.next();
String fruit = fruits.getFruit();
if (fruit.equals("Apple")) {
myReturn = true;
break;
}
}
return myReturn;
};
I try something like (without understanding why) -
System.out.println(Predicate.isEqual(containsApple).test(list1));
Output - false
Now what happened here?
System.out.println(Predicate.isEqual(containsApple2).test(containsApple));
Output - false
Now again what happened here?
So, how to exactly use this isEqual method?
Predicate.isEqual is a factory method that creates predicates that test if a given thing is equal to the parameter passed in.
Predicate.isEqual(containsApple) creates a Predicate<Predicate<List<Fruits>>> that tests if a given thing is equal to containsApple. However, since containsApple refers to an instance created from a lambda, and nothing much is guaranteed about the equality of instances created from lambda expressions (See the JLS), nothing much can be said about the result of calling test on it. The classes of the lambda instances may or may not implement equals, and containsApple may or may not be the same instance as containsApple2, depending on the implementation.
Rather than comparing lambda instances, a typical example of using Predicate.isEqual is:
Fruits apple = new Fruits("Apple");
Predicate<Fruits> isApple = Predicate.isEqual(apple);
// rather than this slightly longer version:
// Predicate<Fruits> isApple = x -> Objects.equals(x, apple);
Then you can pass isApple around, to other methods that take Predicates, and/or call test on it. isApple.test(apple) would be true, isApple.test(new Fruits("something else")) would be false. I would also recommend that you override equals and hashCode in Fruits.
Note that we generally make predicates that test against individual objects, rather than lists (collections) of things. We would pass these predicates to other methods (such as Stream.filter), and let them do the filtering. For example, to filter a list to get all the apples:
List<Fruits> apples = fruitsList.stream()
.filter(Predicate.isEqual(apple)).toList();
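Putting these pieces together, here is a self-contained sketch. The class IsEqualDemo and the helper countEqualTo are made-up names, and Fruit is a simplified version of the class from the question with equals and hashCode added. It also shows that a predicate built from a null target matches null elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class IsEqualDemo {
    // Simplified fruit with value-based equality, which Predicate.isEqual relies on.
    static final class Fruit {
        private final String name;
        Fruit(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Fruit && name.equals(((Fruit) o).name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Counts the elements that are equal to target according to Objects.equals.
    static long countEqualTo(List<Fruit> fruits, Fruit target) {
        Predicate<Fruit> isTarget = Predicate.isEqual(target);
        return fruits.stream().filter(isTarget).count();
    }

    public static void main(String[] args) {
        List<Fruit> fruits = Arrays.asList(
                new Fruit("Apple"), new Fruit("Mango"), new Fruit("Apple"), null);
        System.out.println(countEqualTo(fruits, new Fruit("Apple"))); // 2
        System.out.println(countEqualTo(fruits, null));               // 1
    }
}
```

Without the equals override, the first call would print 0, because the default Object.equals only matches the very same instance.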
One should use the singular, Fruit, for the class name here.
First you must establish equality of Fruit. Also, should you ever want to store it in a HashMap or HashSet, a hashCode implementation is important.
public class Fruit {

    private final String fruit; // Or name.

    public Fruit(String fruit) {
        this.fruit = fruit;
    }

    public String getFruit() {
        return fruit;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Fruit && ((Fruit) other).fruit.equals(fruit);
    }

    @Override
    public int hashCode() {
        return fruit.hashCode();
    }
}
The Iterator interface is rather old; its primary advantage is that you can walk through a list and still remove an element with iterator.remove(), which is not allowed on the List itself inside a for-each loop (it throws ConcurrentModificationException). For a plain search, a for-each loop is enough:
Predicate<List<Fruit>> containsApple = list -> {
for (Fruit fruit: list) {
if (fruit.getFruit().equals("Apple")) {
return true;
}
}
return false;
};
Predicate<List<Fruit>> containsApple = list -> list.contains(new Fruit("Apple"));
It is advisable to get acquainted with Stream (for example, for iterating through a collection) and its expressive power.
Predicate<List<Fruit>> containsApple = list ->
list.stream()
.anyMatch(fr -> fr.getFruit().equals("Apple"));
As mentioned by @user16320675 in the comments, one of the simplest examples would be -
import java.util.function.Predicate;
public class App {
public static void main(String[] args) {
Integer num1 = 2;
Integer num2 = 3;
Predicate<Integer> predicate = Predicate.isEqual(num1);
System.out.println(predicate.test(num1));
System.out.println(predicate.test(num2));
}
}
Output -
true
false
The code can also be rewritten as -
System.out.println(Predicate.isEqual(num1).test(num1));
System.out.println(Predicate.isEqual(num1).test(num2));
with same output.
A practical application in Java streams -
Code -
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
public class App {
public static void main(String[] args) {
List<String> list = new ArrayList<>();
list.add("Elephant");
list.add("Hippo");
list.add("Rhino");
list.add("Deer");
list.add("Hippo");
list.add("Zebra");
Predicate<String> predicate = Predicate.isEqual("Hippo");
list.stream().filter(predicate).forEach(System.out::println);
}
}
Output -
Hippo
Hippo
The expectation is to derive 3 lists, itemIsBoth, aItems and bItems, from the input list items.
How can code like the one below be converted to a functional style? (I understand this code is clear enough in an imperative style, but I want to know whether the declarative style really fails to deal with such a simple example.) Thanks.
for (Item item: items) {
if (item.isA() && item.isB()) {
itemIsBoth.add(item);
} else if (item.isA()) {
aItems.add(item);
} else if (item.isB()){
bItems.add(item);
}
}
The question title is quite broad (convert if-else ladder), but since the actual question asks about a specific scenario, let me offer a sample that can at least illustrate what can be done.
Because the if-else structure creates three distinct lists based on a predicate applied to the item, we can express this behavior more declaratively as a grouping operation. The only extra needed to make this work out of the box would be to collapse the multiple Boolean predicates using a tagging object. For example:
class Item {
enum Category {A, B, AB}
public Category getCategory() {
return /* ... */;
}
}
Then the logic can be expressed simply as:
Map<Item.Category, List<Item>> categorized =
items.stream().collect(Collectors.groupingBy(Item::getCategory));
where each list can be retrieved from the map given its category.
If it's not possible to change the Item class, the same effect can be achieved by moving the enum declaration and the categorization method outside the Item class (the method would become a static method).
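A runnable sketch of that variant, with the categorization kept outside Item as a static method, could look like the following. The names CategorizeItems, categorize and group are invented for this illustration, and the enum constants differ from the ones above: NEITHER is an added assumption to cover items that the original if-else ladder silently skips.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CategorizeItems {
    // Hypothetical tagging enum; NEITHER covers items the if-else ladder ignores.
    enum Category { BOTH, A_ONLY, B_ONLY, NEITHER }

    static final class Item {
        private final boolean a;
        private final boolean b;
        Item(boolean a, boolean b) { this.a = a; this.b = b; }
        boolean isA() { return a; }
        boolean isB() { return b; }
    }

    // The if-else ladder is reduced to a pure function from Item to Category.
    static Category categorize(Item item) {
        if (item.isA() && item.isB()) return Category.BOTH;
        if (item.isA()) return Category.A_ONLY;
        if (item.isB()) return Category.B_ONLY;
        return Category.NEITHER;
    }

    static Map<Category, List<Item>> group(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(CategorizeItems::categorize));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item(true, true), new Item(true, false),
                new Item(false, true), new Item(true, false), new Item(false, false));
        Map<Category, List<Item>> byCategory = group(items);
        System.out.println(byCategory.get(Category.BOTH).size());   // 1
        System.out.println(byCategory.get(Category.A_ONLY).size()); // 2
        System.out.println(byCategory.get(Category.B_ONLY).size()); // 1
    }
}
```

A category with no items has no entry in the resulting map, so byCategory.getOrDefault(category, List.of()) is the safer way to read a list when some categories may be empty.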
Another solution using Vavr and doing only one iteration over a list of items might be achieved using foldLeft:
list.foldLeft(
Tuple.of(List.empty(), List.empty(), List.empty()), //we declare 3 lists for results
(lists, item) -> Match(item).of(
//both predicates pass, add to first list
Case($(allOf(Item::isA, Item::isB)), lists.map1(l -> l.append(item))),
//is a, add to second list
Case($(Item::isA), lists.map2(l -> l.append(item))),
//is b, add to third list
Case($(Item::isB), lists.map3(l -> l.append(item)))
))
);
It will return a tuple containing three lists with results.
Of course you can. The functional way is to express this declaratively.
Mathematically, you are defining an equivalence relation, so you can write
Map<String, List<Item>> ys = xs
    .stream()
    .collect(groupingBy(x -> /* your equivalence relation here */));
A simple example shows this
public class Main {
static class Item {
private final boolean a;
private final boolean b;
Item(boolean a, boolean b) {
this.a = a;
this.b = b;
}
public boolean isB() {
return b;
}
public boolean isA() {
return a;
}
}
public static void main(String[] args) {
List<Item> xs = asList(new Item(true, true), new Item(true, true), new Item(false, true));
Map<String, List<Item>> ys = xs.stream().collect(groupingBy(x -> x.isA() + "," + x.isB()));
ys.entrySet().forEach(System.out::println);
}
}
With output
true,true=[com.foo.Main$Item@64616ca2, com.foo.Main$Item@13fee20c]
false,true=[com.foo.Main$Item@4e04a765]
Another way to get rid of the if-else is to replace it with Predicate and Consumer:
Map<Predicate<Item>, Consumer<Item>> actions =
Map.of(item.predicateA(), aItems::add, item.predicateB(), bItems::add);
actions.forEach((key, value) -> items.stream().filter(key).forEach(value));
For this, you need to enhance your Item with the two methods predicateA() and predicateB(), using the logic you have implemented in isA() and isB().
Btw, I would still suggest using your if-else logic.
Since you've mentioned vavr as a tag, I'm gonna provide a solution using vavr collections.
import static io.vavr.Predicates.allOf;
import static io.vavr.Predicates.not;
...
final Array<Item> itemIsBoth = items.filter(allOf(Item::isA, Item::isB));
final Array<Item> aItems = items.filter(allOf(Item::isA, not(Item::isB)));
final Array<Item> bItems = items.filter(allOf(Item::isB, not(Item::isA)));
The advantage of this solution is that it's simple to understand at a glance and it's as functional as you can get with Java. The drawback is that it will iterate over the original collection three times instead of once. That's still O(n), but with a constant multiplier of 3. On non-critical code paths and with small collections it might be worth trading a few CPU cycles for code clarity.
Of course, this works with all the other vavr collections too, so you can replace Array with List, Vector, Stream, etc.
Not functional in the sense of using lambdas or the like, but quite functional in the sense of using only functions (in the mathematical sense) and no local state/variables anywhere:
/* returns 0, 1, 2 or 3 according to isA/isB */
int getCategory(Item item) {
    // The parentheses are required: the conditional operator binds more loosely than +.
    return (item.isA() ? 1 : 0) + 2 * (item.isB() ? 1 : 0);
}
@SuppressWarnings("unchecked")
LinkedList<Item>[] lists = new LinkedList[] {
    new LinkedList<Item>(), new LinkedList<Item>(), new LinkedList<Item>(), new LinkedList<Item>()
};
{
    for (Item item: items) {
        lists[getCategory(item)].addLast(item);
    }
}
The question is somewhat controversial, as it seems (+5/-3 at the time of writing this).
As you mentioned, the imperative solution here is most likely the most simple, appropriate and readable one.
The functional or declarative style does not really "fail". It's rather raising questions about the exact goals, conditions and context, and maybe even philosophical questions about language details (like why there is no standard Pair class in core Java).
You can apply a functional solution here. One simple, technical question is then whether you really want to fill the existing lists, or whether it's OK to create new lists. In both cases, you can use the Collectors#groupingBy method.
The grouping criterion is the same in both cases: Namely, any "representation" of the specific combination of isA and isB of one item. There are different possible solutions for that. In the examples below, I used an Entry<Boolean, Boolean> as the key.
(If you had further conditions, like isC and isD, then you could in fact also use a List<Boolean>).
The example shows how you can either add the item to existing lists (as in your question), or create new lists (which is a tad simpler and cleaner).
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.stream.Collectors;
public class FunctionalIfElse
{
public static void main(String[] args)
{
List<Item> items = new ArrayList<Item>();
items.add(new Item(false, false));
items.add(new Item(false, true));
items.add(new Item(true, false));
items.add(new Item(true, true));
fillExistingLists(items);
createNewLists(items);
}
private static void fillExistingLists(List<Item> items)
{
System.out.println("Filling existing lists:");
List<Item> itemIsBoth = new ArrayList<Item>();
List<Item> aItems = new ArrayList<Item>();
List<Item> bItems = new ArrayList<Item>();
Map<Entry<Boolean, Boolean>, List<Item>> map =
new LinkedHashMap<Entry<Boolean, Boolean>, List<Item>>();
map.put(entryWith(true, true), itemIsBoth);
map.put(entryWith(true, false), aItems);
map.put(entryWith(false, true), bItems);
items.stream().collect(Collectors.groupingBy(
item -> entryWith(item.isA(), item.isB()),
() -> map, Collectors.toList()));
System.out.println("Both");
itemIsBoth.forEach(System.out::println);
System.out.println("A");
aItems.forEach(System.out::println);
System.out.println("B");
bItems.forEach(System.out::println);
}
private static void createNewLists(List<Item> items)
{
System.out.println("Creating new lists:");
Map<Entry<Boolean, Boolean>, List<Item>> map =
items.stream().collect(Collectors.groupingBy(
item -> entryWith(item.isA(), item.isB()),
LinkedHashMap::new, Collectors.toList()));
List<Item> itemIsBoth = map.get(entryWith(true, true));
List<Item> aItems = map.get(entryWith(true, false));
List<Item> bItems = map.get(entryWith(false, true));
System.out.println("Both");
itemIsBoth.forEach(System.out::println);
System.out.println("A");
aItems.forEach(System.out::println);
System.out.println("B");
bItems.forEach(System.out::println);
}
private static <K, V> Entry<K, V> entryWith(K k, V v)
{
return new SimpleEntry<K, V>(k, v);
}
static class Item
{
private boolean a;
private boolean b;
public Item(boolean a, boolean b)
{
this.a = a;
this.b = b;
}
public boolean isA()
{
return a;
}
public boolean isB()
{
return b;
}
@Override
public String toString()
{
return "(" + a + ", " + b + ")";
}
}
}
I'm trying to find a data structure in Java (or Groovy) where something like this works:
MemberAdressableSetsSet mass = new MemberAdressableSetsSet();
mass.addSet(["a","b"]);
mass.addSet(["c","d","e"]);
mass.get("d").add("f");
String output = Arrays.toString(mass.get("e").toArray());
System.out.println(output); // [ "c", "d", "e", "f" ] (ordering irrelevant)
Does anything like that exist? And if not, is there a way to implement something like this with normal Java code that doesn't give the CPU or the memory nightmares for weeks?
Edit: more rigorously
MemberAdressableSetsSet mass = new MemberAdressableSetsSet();
Set<String> s1 = new HashSet<String>();
s1.add("a");
Set<String> s2 = new HashSet<String>();
s2.add("c");s2.add("d");s2.add("e");
mass.addSet(s1);
mass.addSet(s2);
Set<String> s3 = new HashSet<String>();
s3.add("a");s3.add("z");
mass.addSet(s3);
/* s3 contains "a", which is already in a subset of mass, so:
* Either
* - does nothing and returns false or throws Exception
* - deletes "a" from its previous subset before adding s3
* => possibly returns the old subset
* => deletes the old subset if that leaves it empty
* => maybe requires an optional parameter to be set
* - removes "a" from the new subset before adding it
* => possibly returns the new subset that was actually added
* => does not add the new subset if purging it of overlap leaves it empty
* => maybe requires an optional parameter to be set
* - merges all sets that would end up overlapping
* - adds it with no overlap checks, but get("a") returns an array of all sets containing it
*/
mass.get("d").add("f");
String output = Arrays.toString(mass.get("e").toArray());
System.out.println(output); // [ "c", "d", "e", "f" ] (ordering irrelevant)
mass.get("d") would return the Set<T> in mass that contains "d". Analogous to how get() works in, say, HashMap:
HashMap<String,LinkedList<Integer>> map = new HashMap<>();
LinkedList<Integer> list = new LinkedList<>();
list.add(9);
map.put("d",list);
map.get("d").add(4);
map.get("d"); // returns a LinkedList with contents [9,4]
The best I could come up with so far looks like this:
import java.util.HashMap;
import java.util.Set;
public class MemberAdressableSetsSet {
private int next_id = 1;
private HashMap<Object,Integer> members = new HashMap();
private HashMap<Integer,Set> sets = new HashMap();
public boolean addSet(Set s) {
if (s.size()==0) return false;
for (Object member : s) {
if (members.get(member)!=null) return false;
}
sets.put(next_id,s);
for (Object member : s) {
members.put(member,next_id);
}
next_id++;
return true;
}
public boolean deleteSet(Object member) {
Integer id = members.get(member);
if (id==null) return false;
Set set = sets.get(id);
for (Object m : set) {
members.remove(m);
}
sets.remove(id);
return true;
}
public boolean addToSet(Object member, Object addition) {
Integer id = members.get(member);
if (id==null) throw new IndexOutOfBoundsException();
if (members.get(addition)!=null) return false;
sets.get(id).add(addition);
members.put(addition,id);
return true;
}
public boolean removeFromSet(Object member) {
Integer id = members.get(member);
if (id==null) return false;
Set s = sets.get(id);
if (s.size()==1) sets.remove(id);
else s.remove(member);
members.remove(member);
return true;
}
public Set getSetClone(Object member) {
Integer id = members.get(member);
if (id==null) throw new IndexOutOfBoundsException();
Set copy = new java.util.HashSet(sets.get(id));
return copy;
}
}
Which has some drawbacks:
The sets are not directly accessible, so any Set method or property that is not explicitly wrapped by a translation method in this class is unavailable, unless working on clones is an acceptable option.
Type information is lost.
Say a Set<Date> is added.
It would not complain about trying to add, for example, a File object to that set.
At least the lost type information for the Sets doesn't extend to their members: the Set.contains() still works exactly as expected, despite both sides having been typecast to Object before being compared by contains(). So a set containing (Object)3 won't return true when asked whether it contains (Object)3L and vice versa, for example.
A set containing (Object)(new java.util.Date(10L)) will return true when asked whether it contains (Object)(new java.sql.Date(10L)) (and the other way round), but that's true even without the (Object) in front, so I guess that's "works as intended" ¯\_(ツ)_/¯
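The Integer versus Long behaviour described above can be checked with a short sketch. The class and method names here (ContainsDemo, containsAfterUpcast) are hypothetical, and the set is declared as Set<Object> only to mimic the lost type information:

```java
import java.util.HashSet;
import java.util.Set;

class ContainsDemo {
    // Stores one element in a Set<Object> and reports whether the set
    // claims to contain the probe. Both arguments are upcast to Object.
    static boolean containsAfterUpcast(Object stored, Object probe) {
        Set<Object> set = new HashSet<>();
        set.add(stored);
        return set.contains(probe);
    }

    public static void main(String[] args) {
        System.out.println(containsAfterUpcast(3, 3));   // Integer vs Integer: true
        System.out.println(containsAfterUpcast(3, 3L));  // Integer vs Long: false
        System.out.println(containsAfterUpcast(3L, 3));  // Long vs Integer: false
    }
}
```

contains() relies on equals() and hashCode() of the runtime objects, so the static type of the reference (Object here) plays no part in the comparison.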
How often do you need to access by one element? Might be worth using a map and storing the same Set reference under multiple keys.
I would prevent external mutation of the map and its subsets, and provide helper methods to do all of the updates:
public class MemberAdressableSets<T> {
Map<T, Set<T>> data = new HashMap<>();
public void addSet(Set<T> dataSet) {
if (dataSet.stream().anyMatch(data::containsKey)) {
throw new IllegalArgumentException("Key already in member addressable data");
}
Set<T> protectedSet = new HashSet<>(dataSet);
dataSet.forEach(d -> data.put(d, protectedSet));
}
public void updateSet(T key, T... newData) {
Set<T> dataSet = data.get(key);
Arrays.stream(newData).forEach(dataSet::add);
Arrays.stream(newData).forEach(d -> data.put(d, dataSet));
}
public Set<T> get(T key) {
return Collections.unmodifiableSet(data.get(key));
}
}
Alternatively, you could change addSet and updateSet to create new Set instances when the key doesn't exist, so that updateSet never throws. You'll also need to extend this class to handle merging sets, that is, to handle this use case:
mass.addSet(["a","b"]);
mass.addSet(["a","c"]);
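One possible way to handle that overlapping case is to merge every subset that shares a member with the new one. The following is only a minimal sketch under that assumption (merging is one of several policies the question lists); the class name MergingMemberAddressableSets is hypothetical and not part of the code above:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MergingMemberAddressableSets<T> {
    // Every member maps to the single shared Set instance that contains it.
    private final Map<T, Set<T>> data = new HashMap<>();

    // Adds a set; existing subsets that overlap with it are merged into one.
    public void addSet(Set<T> newSet) {
        Set<T> merged = new HashSet<>(newSet);
        for (T member : newSet) {
            Set<T> existing = data.get(member);
            if (existing != null) {
                merged.addAll(existing);
            }
        }
        // Re-point every member, old and new, at the merged set.
        for (T member : merged) {
            data.put(member, merged);
        }
    }

    // Read-only view of the subset containing key, or an empty set if unknown.
    public Set<T> get(T key) {
        Set<T> set = data.get(key);
        return set == null ? Collections.<T>emptySet() : Collections.unmodifiableSet(set);
    }

    public static void main(String[] args) {
        MergingMemberAddressableSets<String> mass = new MergingMemberAddressableSets<>();
        mass.addSet(new HashSet<>(Arrays.asList("a", "b")));
        mass.addSet(new HashSet<>(Arrays.asList("a", "c")));
        // "b" now lives in the merged subset containing a, b and c.
        System.out.println(mass.get("b"));
    }
}
```

Each merge costs time proportional to the size of the merged subset, because every member has to be re-pointed. If merges are frequent and the subsets are large, a union-find structure may be a better fit.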
This solution allows for things like mass.get("d").add("f"); to affect the subset stored in mass, but with major drawbacks.
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
public class MemberAdressableSetsSetDirect {
private LinkedHashSet<Set> sets = new LinkedHashSet<>();
public void addSet(Set newSet) {
sets.add(newSet);
}
public Set removeSet(Object member) {
Iterator<Set> it = sets.iterator();
while (it.hasNext()) {
Set s = it.next();
if (s.contains(member)) {
it.remove();
return s;
}
}
return null;
}
public int removeSets(Object member) {
int removed = 0;
Iterator<Set> it = sets.iterator();
while (it.hasNext()) {
Set s = it.next();
if (s.contains(member)) {
it.remove();
removed++;
}
}
return removed;
}
public void deleteEmptySets() {
sets.removeIf(Set::isEmpty);
}
public Set get(Object member) {
for (Set s : sets) {
if (s.contains(member)) return s;
}
return null;
}
public Set[] getAll(Object member) {
LinkedHashSet<Set> results = new LinkedHashSet<>();
for (Set s : sets) {
if (s.contains(member)) results.add(s);
}
// toArray() without an argument returns Object[], which cannot be cast to Set[]
return results.toArray(new Set[0]);
}
}
There is no built-in protection against overlap, so access by member is unreliable. It also allows countless empty sets to accumulate, which need to be purged periodically with a manual call to deleteEmptySets(), because this solution can't detect when a subset was modified by direct access.
MemberAdressableSetsSetDirect massd = new MemberAdressableSetsSetDirect();
Set s1 = new HashSet();Set s2 = new HashSet();Set s3 = new HashSet();
s1.add("a");s1.add("b");
s2.add("c");s2.add("d");
s3.add("e");
massd.addSet(s1);massd.addSet(s2);massd.addSet(s3);
massd.get("c").add("a");
// massd.get("a") will now either return the Set ["a","b"] or the Set ["a","c","d"]
// (could be that my usage of a LinkedHashSet as the basis of massd
// at least makes it consistently return the set added first)
massd.get("e").remove("e");
// the third set is now empty, can't be accessed anymore,
// and massd has no clue about that until it's told to look for empty sets
massd.get("c").remove("d");
massd.get("c").remove("c");
// if LinkedHashSet makes this solution act as I suspected above,
// this makes the second subset (now just ["a"]) inaccessible except via massd.getAll("a")[1]
Additionally, this solution also can't preserve type information.
This will not even give warnings:
MemberAdressableSetsSetDirect massd = new MemberAdressableSetsSetDirect();
Set<Long> s = new HashSet<Long>();
s.add(3L);
massd.addSet(s);
massd.get(3L).add("someString");
// massd.get(3L) will now return a Set with contents [3L, "someString"]
I want to convert a List<Bitmap> to a List<String>, and then turn that into a Map<String, String>.
The key would be "img" + index. How can I do it?
example:
from List ["a", "b", "c"]
to Map {"img1": "a", "img2": "b", "img3": "c"}
Observable.from(bitmapList.toArray(new Bitmap[0]))
.subscribeOn(Schedulers.io())
.map(new Func1<Bitmap, String>() {
@Override
public String call(Bitmap bitmap) {
return doSomething(bitmap);
}
})
.toMap(new Func1<String, String>() {
@Override
public String call(String s) {
return "img"; // how do I get the index here?
}
})
...;
In order to combine a value with an index you need some internal state: you need to keep track of a counter within the stream. You can do this with the scan operator. Because you need to keep track of both that counter and the actual value, we first need to introduce a simple class that can hold two values:
private static class Tuple<T, S> {
final T first;
final S second;
Tuple(T k, S v) {
this.first = k;
this.second = v;
}
}
The scan operator requires two parameters: an initial value for the state, and an accumulator function that takes the previous state and a new value and transforms them into a new state. The initial state is simple: it combines the empty String ("") with an initial index. Because the accumulator increments the counter before emitting, a seed of 0 gives the first real element the index 1; use -1 if you want the indices to start at 0. The accumulator is easy now: it takes the new value, increments the counter from the previous state, and combines them in a new Tuple.
Because the initial state is not what you want to see here, you need to do a skip(1) to get rid of the first emitted element.
Finally you can do toMap, but you need to take the version with two arguments: the keySelector and the valueSelector where you get the key and value out of the Tuple respectively.
The final code looks as follows:
public static void main(String[] args) {
Observable.from(Arrays.asList("a", "b", "c"))
.scan(new Tuple<>("", 0), (tuple, s) -> new Tuple<>(s, tuple.second + 1))
.skip(1)
.toMap(tuple -> "img" + tuple.second, tuple -> tuple.first)
.subscribe(System.out::println);
}
Notice that this combination of scan and skip is in fact a zipWithIndex, as it is called in, for example, RxScala. Java does not have tuples in the language, so you cannot do this directly; you have to create your own Tuple class for it to work.
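To see why the skip(1) is needed, the same steps can be traced without RxJava. The sketch below is plain Java, not the RxJava API: scan here is a hypothetical helper that mimics the operator by returning the seed followed by every intermediate state, and ScanIndexDemo and indexWithPrefix are names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class ScanIndexDemo {
    static final class Tuple<T, S> {
        final T first;
        final S second;
        Tuple(T first, S second) {
            this.first = first;
            this.second = second;
        }
    }

    // Stand-in for the scan operator: emits the seed first, then each accumulated state.
    static <T, R> List<R> scan(List<T> source, R seed, BiFunction<R, T, R> accumulator) {
        List<R> states = new ArrayList<>();
        R state = seed;
        states.add(state);
        for (T item : source) {
            state = accumulator.apply(state, item);
            states.add(state);
        }
        return states;
    }

    static Map<String, String> indexWithPrefix(List<String> values, String prefix) {
        List<Tuple<String, Integer>> states = scan(
                values,
                new Tuple<String, Integer>("", 0),
                (tuple, s) -> new Tuple<>(s, tuple.second + 1));
        Map<String, String> result = new LinkedHashMap<>();
        // subList(1, ...) plays the role of skip(1): it drops the seed ("", 0).
        for (Tuple<String, Integer> tuple : states.subList(1, states.size())) {
            result.put(prefix + tuple.second, tuple.first);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {img1=a, img2=b, img3=c}
        System.out.println(indexWithPrefix(Arrays.asList("a", "b", "c"), "img"));
    }
}
```

Without the subList step, the result would also contain an entry "img0" mapped to the empty String, which is exactly the seed state that skip(1) removes in the Rx version.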