Join custom lists in Java

I have two custom lists as follows.
List<OfficeName> = [{id: 1, offname: "Office1"}, {id: 2, offname: "Office2"}]
List<OfficeLocation> = [{id: 1, offlocation: "location1"}, {id: 2, offlocation: "location2"}]
I want result as follows:
List<OfficeDetails> = [{id: 1, offname: "Office1", offlocation: "location1"},
{id: 2, offname: "Office2", offlocation: "location2"}]
The first two lists need to be joined on the basis of "id" to give a new list, equivalent to a join operation on SQL tables.
My model classes are
public class OfficeName {
int id;
String offname;
//getter and setter
}
.................
public class OfficeLocation{
int id;
String offlocation;
//getter and setter
}
.........
Currently I am iterating and manually adding to a LinkedHashSet as follows.
{
    List<OfficeName> officeName = new ArrayList<OfficeName>();
    onr.findById(id).forEach(officeName::add); // adding values from autowired repository
    List<OfficeLocation> officeLocation = new ArrayList<OfficeLocation>();
    olr.findById(id).forEach(officeLocation::add); // adding values from autowired repository
    LinkedHashSet<LinkedHashSet<String>> lhs = new LinkedHashSet<LinkedHashSet<String>>();
    OfficeName officeName1 = new OfficeName();
    OfficeLocation officeLocation1 = new OfficeLocation();
    Iterator<OfficeName> onIterator = officeName.iterator();
    Iterator<OfficeLocation> olIterator = officeLocation.iterator();
    while (onIterator.hasNext()) {
        officeName1 = onIterator.next();
        int idon = officeName1.getId();
        while (olIterator.hasNext()) {
            officeLocation1 = olIterator.next();
            int idol = officeLocation1.getId();
            if (idon == idol) {
                lhs.add(new LinkedHashSet<String>(Arrays.asList(
                        String.valueOf(officeName1.getId()),
                        officeName1.getOffname(),
                        officeLocation1.getOfflocation())));
                olIterator.remove();
                break;
            }
        }
    }
}
I am not sure whether this is the correct way to achieve this, as I am new to Java. In C#, this could be achieved through data tables. Please suggest whether there is a faster way.

Assuming both input lists:
Are distinct, with no duplicate id values in either, and…
Are complete, with a single object in both lists for each possible id value
… then we can get the work done with little code.
I use NavigableSet or SortedSet implementations to hold our input lists, the names and the locations. Though I have not verified, I assume being sorted will yield better performance when searching for a match across input collections.
To get the sorting done, we define a Comparator for each input collection: Comparator.comparingInt( OfficeName :: id ) & Comparator.comparingInt( OfficeLocation :: id ), where the double colons make a method reference. To each NavigableSet we add the contents of our inputs, an unmodifiable list made with the convenient literal syntax of List.of.
To get the actual work done of joining these two input collections, we make a stream of either input collection. Then we produce a new object of our third joined class using inputs from each element of the stream plus its counterpart found via a stream of the other input collection. These newly produced objects of the third joined class are then collected into a list.
NavigableSet < OfficeName > officeNames = new TreeSet <>( Comparator.comparingInt( OfficeName :: id ) );
officeNames.addAll( List.of( new OfficeName( 1 , "Office1" ) , new OfficeName( 2 , "Office2" ) ) );
NavigableSet < OfficeLocation > officeLocations = new TreeSet <>( Comparator.comparingInt( OfficeLocation :: id ) );
officeLocations.addAll( List.of( new OfficeLocation( 1 , "location1" ) , new OfficeLocation( 2 , "location2" ) ) );
List < Office > offices = officeNames
        .stream()
        .map( officeName -> new Office(
                officeName.id() ,
                officeName.name() ,
                officeLocations
                        .stream()
                        .filter( officeLocation -> officeLocation.id() == officeName.id() )
                        .findAny()
                        .get()
                        .location()
        ) )
        .toList();
Results:
officeNames = [OfficeName[id=1, name=Office1], OfficeName[id=2, name=Office2]]
officeLocations = [OfficeLocation[id=1, location=location1], OfficeLocation[id=2, location=location2]]
offices = [Office[id=1, name=Office1, location=location1], Office[id=2, name=Office2, location=location2]]
Our three classes, the two inputs and the third joined one, are all written as records here for their convenient brevity. This Java 16+ feature is a brief way to declare a class whose main purpose is to communicate data transparently and immutably. The compiler implicitly creates the constructor, getters, equals & hashCode, and toString. Note that a record can be defined locally as well as nested or separate.
public record OfficeName( int id , String name ) { }
public record OfficeLocation( int id , String location ) { }
public record Office( int id , String name , String location ) { }
Given the conditions outlined above, we could optimize by hand-writing loops to manage the matching of objects across the input collections, rather than using streams. But I would not be concerned about the performance impact unless you had huge amounts of data that had proven to be a bottleneck. Otherwise, using streams makes for less code and more fun.
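For the curious, such a hand-written loop might look like the following sketch: a single-pass merge join over the two id-sorted sets, O(n + m) rather than a nested scan. The class and method names are illustrative; the records mirror the ones above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class MergeJoinSketch {
    record OfficeName(int id, String name) {}
    record OfficeLocation(int id, String location) {}
    record Office(int id, String name, String location) {}

    // Single-pass merge join over two id-sorted sets.
    static List<Office> join(NavigableSet<OfficeName> names, NavigableSet<OfficeLocation> locations) {
        List<Office> result = new ArrayList<>();
        Iterator<OfficeName> n = names.iterator();
        Iterator<OfficeLocation> l = locations.iterator();
        if (!n.hasNext() || !l.hasNext()) return result;
        OfficeName cn = n.next();
        OfficeLocation cl = l.next();
        while (true) {
            if (cn.id() == cl.id()) {
                result.add(new Office(cn.id(), cn.name(), cl.location()));
                if (!n.hasNext() || !l.hasNext()) return result;
                cn = n.next();
                cl = l.next();
            } else if (cn.id() < cl.id()) {   // no counterpart yet; advance names
                if (!n.hasNext()) return result;
                cn = n.next();
            } else {                          // no counterpart yet; advance locations
                if (!l.hasNext()) return result;
                cl = l.next();
            }
        }
    }

    public static void main(String[] args) {
        NavigableSet<OfficeName> names = new TreeSet<>(Comparator.comparingInt(OfficeName::id));
        names.addAll(List.of(new OfficeName(1, "Office1"), new OfficeName(2, "Office2")));
        NavigableSet<OfficeLocation> locations = new TreeSet<>(Comparator.comparingInt(OfficeLocation::id));
        locations.addAll(List.of(new OfficeLocation(1, "location1"), new OfficeLocation(2, "location2")));
        System.out.println(join(names, locations));
    }
}
```

Note that, unlike the streams version, this relies on both sets being sorted by the same key; if the inputs were plain unsorted lists, the map-based approach in the next answer would be the better fit.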

One of the lists (e.g. locations) should be converted into a map (HashMap) keyed by the field on which the join is made, in this case the id field.
Then, assuming that the OfficeDetails class has an all-args constructor, the resulting list may be retrieved by streaming the other list (offices) and mapping its contents into new OfficeDetails, filling the remaining location argument by looking up the map.
List<OfficeName> offices = Arrays.asList(
new OfficeName(1, "Office1"), new OfficeName(2, "Office2"), new OfficeName(3, "Office3")
);
List<OfficeLocation> locations = Arrays.asList(
new OfficeLocation(1, "Location 1"), new OfficeLocation(2, "Location 2"), new OfficeLocation(4, "Location 4")
);
Map<Integer, OfficeLocation> mapLoc = locations
.stream()
.collect(Collectors.toMap(
OfficeLocation::getId,
loc -> loc,
(loc1, loc2) -> loc1 // to resolve possible duplicates
));
List<OfficeDetails> details = offices
.stream()
.filter(off -> mapLoc.containsKey(off.getId())) // inner join
.map(off -> new OfficeDetails(
off.getId(), off.getOffname(),
mapLoc.get(off.getId()).getOfflocation() // look up the map
))
.collect(Collectors.toList());
details.forEach(System.out::println);
Output (assuming toString is implemented in OfficeDetails):
{id: 1, offname: "Office1", offlocation: "Location 1"}
{id: 2, offname: "Office2", offlocation: "Location 2"}
If the offices list is not filtered by the mapLoc.containsKey condition, an implementation of LEFT JOIN is possible (null locations are then stored in the resulting OfficeDetails).
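A minimal, self-contained LEFT JOIN sketch following that description; records stand in for the getter-based classes here, so the accessors are id()/offname()/offlocation() rather than getters:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class LeftJoinSketch {
    record OfficeName(int id, String offname) {}
    record OfficeLocation(int id, String offlocation) {}
    record OfficeDetails(int id, String offname, String offlocation) {}

    public static void main(String[] args) {
        List<OfficeName> offices = Arrays.asList(
            new OfficeName(1, "Office1"), new OfficeName(2, "Office2"), new OfficeName(3, "Office3"));
        List<OfficeLocation> locations = Arrays.asList(
            new OfficeLocation(1, "Location 1"), new OfficeLocation(2, "Location 2"));

        Map<Integer, OfficeLocation> mapLoc = locations.stream()
            .collect(Collectors.toMap(OfficeLocation::id, loc -> loc, (a, b) -> a));

        // LEFT JOIN: keep every office; the location is null when there is no match.
        List<OfficeDetails> details = offices.stream()
            .map(off -> new OfficeDetails(
                off.id(), off.offname(),
                Optional.ofNullable(mapLoc.get(off.id()))
                        .map(OfficeLocation::offlocation)
                        .orElse(null)))
            .collect(Collectors.toList());

        details.forEach(System.out::println);
        // Office3 appears with a null offlocation.
    }
}
```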
To implement RIGHT JOIN (with null office names and all available locations), a lookup map should be created for offices instead, and the main iteration has to run over the locations list.
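Symmetrically, a RIGHT JOIN sketch (again self-contained with records for brevity): the office map is the lookup side and the locations drive the iteration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class RightJoinSketch {
    record OfficeName(int id, String offname) {}
    record OfficeLocation(int id, String offlocation) {}
    record OfficeDetails(int id, String offname, String offlocation) {}

    public static void main(String[] args) {
        List<OfficeName> offices = Arrays.asList(
            new OfficeName(1, "Office1"), new OfficeName(2, "Office2"));
        List<OfficeLocation> locations = Arrays.asList(
            new OfficeLocation(1, "Location 1"), new OfficeLocation(2, "Location 2"),
            new OfficeLocation(4, "Location 4"));

        // Lookup map built from offices; iteration driven by locations.
        Map<Integer, OfficeName> mapOff = offices.stream()
            .collect(Collectors.toMap(OfficeName::id, off -> off, (a, b) -> a));

        List<OfficeDetails> details = locations.stream()
            .map(loc -> new OfficeDetails(
                loc.id(),
                Optional.ofNullable(mapOff.get(loc.id())).map(OfficeName::offname).orElse(null),
                loc.offlocation()))
            .collect(Collectors.toList());

        details.forEach(System.out::println);
        // Location 4 appears with a null offname.
    }
}
```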
To implement FULL JOIN (where either name or location parts of OfficeDetails can be null), two maps need to be created and then joined:
Map<Integer, OfficeName> mapOff = offices
.stream()
.collect(Collectors.toMap(
OfficeName::getId,
off -> off,
(off1, off2) -> off1, // to resolve possible duplicates
LinkedHashMap::new
));
List<OfficeDetails> fullDetails = Stream.concat(mapOff.keySet().stream(), mapLoc.keySet().stream())
.distinct()
.map(id -> new OfficeDetails(
id,
Optional.ofNullable(mapOff.get(id)).map(OfficeName::getOffname).orElseGet(()->null),
Optional.ofNullable(mapLoc.get(id)).map(OfficeLocation::getOfflocation).orElseGet(()->null)
))
.collect(Collectors.toList());
fullDetails.forEach(System.out::println);
Output:
{id: 1, offname: "Office1", offlocation: "Location 1"}
{id: 2, offname: "Office2", offlocation: "Location 2"}
{id: 3, offname: "Office3", offlocation: null}
{id: 4, offname: null, offlocation: "Location 4"}

Related

Merging two stream operation into one in Java for performance improvement

I have this object
Class A {
int count;
String name;
}
I have a list of my above custom object as below :
List<A> aList = new ArrayList<>();
A a = new A(1,"abc");
A b = new A(0,"def");
A c = new A(0,"xyz");
aList.add(a);
aList.add(b);
aList.add(c);
I will get this list as input in my service. Now based upon some scenario, first I need to set "count" to ZERO for all elements in the list and based on a check with "name" I need to set the count as ONE for a particular name.
This is how I am doing now :
String tempName = "Some Name like abc/def/xyz";
aList.stream().forEach(x -> x.setCount(0));
aList.stream().filter(x -> x.getName().equalsIgnoreCase(tempName))
    .findFirst()
    .ifPresent(y -> y.setCount(1));
This does the job, but I want to know if I can simplify the logic, use one single stream instead of two, and improve performance by avoiding looping through the list twice.
Just check if the name matches in the first loop:
aList.forEach(x -> x.setCount(x.getName().equalsIgnoreCase(tempName) ? 1 : 0));
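A runnable sketch of that single pass, with one hedge: unlike the original findFirst() version, this sets the count for every matching element, which is equivalent only if names are unique in the list (sample data and names below are illustrative).

```java
import java.util.ArrayList;
import java.util.List;

public class SinglePassDemo {
    static class A {
        int count;
        String name;
        A(int count, String name) { this.count = count; this.name = name; }
    }

    public static void main(String[] args) {
        List<A> aList = new ArrayList<>(List.of(new A(1, "abc"), new A(0, "def"), new A(0, "xyz")));
        String tempName = "def";
        // One loop: reset to 0 unless the name matches, in which case set 1.
        aList.forEach(x -> x.count = x.name.equalsIgnoreCase(tempName) ? 1 : 0);
        aList.forEach(x -> System.out.println(x.name + " -> " + x.count));
        // prints abc -> 0, def -> 1, xyz -> 0
    }
}
```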

Create a Map of Optionals in Java using Streams

I need to find a method (using streams) that returns a Map<Category, Optional<ToDo>>, which helps me group an ArrayList and gives me the ToDo object with the highest priority of each category.
public record ToDo(String name, Category category,
int priority, LocalDate date) {}
public enum Category { HOME, WORK }
An example of the input data:
List<ToDo> todo = List.of(
new ToDo("Eat", Category.HOME, 1, LocalDate.of(2022, 8, 29)),
new ToDo("Sleep", Category.HOME, 2, LocalDate.of(2022, 8, 30)),
new ToDo("Learn", Category.WORK, 2, LocalDate.of(2022, 9, 3)),
new ToDo("Work", Category.WORK, 3, LocalDate.of(2022, 10, 3))
);
And in the end, I want to have something like this as a result:
{HOME=[ToDo{Description='Eat', category=HOME, priority=1, deadline=2022-08-29}],
WORK=[ToDo{Description='Learn', category=WORK, priority=2, deadline=2022-09-03}]}
I was trying to use
.collect(Collectors.groupingBy(p -> p.getCategory()));
and
.sorted(Comparator.comparing(ToDo::getPriority)).findFirst();
But I can't do it in a single method and get Optional as a result. How can I resolve this problem?
The practice of storing Optionals in a Collection is discouraged.
It might seem like a smart move at first, but in fact you're creating a Map which can hand you a null or a potentially empty Optional, which doesn't sound very handy.
Besides that, it goes against the design goal of Optional, which is intended to be used as a return type. Optional is meant only for transitioning data (not storing it); for that reason it was designed non-serializable, and storing it might cause issues.
Also, for every category encountered in the list there will always be a corresponding ToDo object. If your intention was to have all members of Category in the map, so that you can safely fire an action on the Optional returned by get() via ifPresent(), then you can instead implement the Null-object pattern and map a null-object ToDo to every category that wasn't present in the list via putIfAbsent().
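A sketch of that Null-object idea; the EMPTY sentinel and its field values are illustrative assumptions, not part of the question:

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

public class NullObjectSketch {
    enum Category { HOME, WORK }
    record ToDo(String name, Category category, int priority, LocalDate date) {}

    // Illustrative null object: safe to print, never confused with real data.
    static final ToDo EMPTY = new ToDo("<none>", null, Integer.MAX_VALUE, null);

    public static void main(String[] args) {
        Map<Category, ToDo> byCategory = new HashMap<>();
        byCategory.put(Category.HOME, new ToDo("Eat", Category.HOME, 1, LocalDate.of(2022, 8, 29)));
        // Map the null object to every category missing from the input.
        for (Category c : Category.values()) {
            byCategory.putIfAbsent(c, EMPTY);
        }
        System.out.println(byCategory);
        // WORK maps to the EMPTY placeholder; HOME keeps its real ToDo.
    }
}
```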
If you want to find the ToDo with the highest priority (lowest priority value) using the collector groupingBy(), as you've mentioned in the question, you can use the collector minBy() in conjunction with collectingAndThen() as a downstream of groupingBy(). That would be far more efficient than the combination .sorted().findFirst().
But since we need a single value mapped to each key (not a collection of values), as @Holger has pointed out, the proper way of handling this task is the collector toMap() instead of groupingBy() + downstream collectors. It results in less verbose and more intuitive code.
List<ToDo> todo = List.of(
new ToDo("Eat", Category.HOME, 1, LocalDate.of(2022, 8, 29)),
new ToDo("Sleep", Category.HOME, 2, LocalDate.of(2022, 8, 30)),
new ToDo("Learn", Category.WORK, 2, LocalDate.of(2022, 9, 3)),
new ToDo("Work", Category.WORK, 3, LocalDate.of(2022, 10, 3))
);
Map<Category, ToDo> highestPriorityTaskByCategory = todo.stream()
.collect(Collectors.toMap(
ToDo::category,
Function.identity(),
BinaryOperator.minBy(Comparator.comparingInt(ToDo::priority))
));
highestPriorityTaskByCategory.forEach((k, v) -> System.out.println(k + " -> " + v));
Output:
WORK -> ToDo[name=Learn, category=WORK, priority=2, date=2022-09-03]
HOME -> ToDo[name=Eat, category=HOME, priority=1, date=2022-08-29]
I assume that the key in your expected output should be HOME, because that's the category inside your object. I also assume that the value should be an Optional<ToDo>.
The groupingBy you used will return a Map<Category, List<ToDo>>. That's the right key but the wrong value. You need to solve that by also supplying a downstream collector, that will collect ToDo elements:
Map<Category, Optional<ToDo>> result = todo.stream()
    .collect(Collectors.groupingBy(
        ToDo::category,
        Collectors.minBy(Comparator.comparingInt(ToDo::priority))
    ));
You can improve the comparator to Comparator.comparingInt(ToDo::priority).thenComparing(ToDo::date) to pick the entry with the earliest date in case several share the same lowest priority.
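That tie-breaking comparator could be sketched like this, using the record accessors priority() and date() from the question (the sample tasks are made up to force a tie):

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {
    enum Category { HOME, WORK }
    record ToDo(String name, Category category, int priority, LocalDate date) {}

    public static void main(String[] args) {
        // Lowest priority value first; among equal priorities, earliest date first.
        Comparator<ToDo> byPriorityThenDate =
            Comparator.comparingInt(ToDo::priority).thenComparing(ToDo::date);

        List<ToDo> tied = List.of(
            new ToDo("Late", Category.HOME, 1, LocalDate.of(2022, 9, 1)),
            new ToDo("Early", Category.HOME, 1, LocalDate.of(2022, 8, 1)));

        ToDo winner = tied.stream().min(byPriorityThenDate).orElseThrow();
        System.out.println(winner.name());
        // prints Early
    }
}
```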
Note that this will only include entries for categories that are part of your input. If you need all, use an additional loop:
for (Category category : Category.values()) {
result.computeIfAbsent(category, c -> Optional.empty());
}

Java Vector of Object and Sort/Group by third value

I have a Vector of customers:
Vector<Customer> customers = new Vector<Customer>();
Customer looks like:
public class Customer {
private String firstName;
private String lastName;
private Customer partner;
}
Sample data:
Peter; Doe; Gloria Ven
John; Doe; null
Gloria; Ven; Peter Doe
Jonny; Tab; null
I would like to sort this vector by grouping the partners together. So the result should look like this:
Peter; Doe; Gloria Ven
Gloria; Ven; Peter Doe
John; Doe; null
Jonny; Tab; null
Any ideas how I can do this efficiently?
First I tried to solve this with a custom sort function (for collections) by overriding the compare method.
The problem is that I only managed to do it when comparing the same field.
In this case I would have to compare something like this.compareTo(getPartner().getPartner()), but I couldn't get it running.
My second try would obviously be looping over the vector manually and sorting it "by hand", which I would prefer to avoid, because I thought this is a common problem and someone has already come up with a better solution.
Thank you!
Note: class Vector is obsolete and may be replaced with other implementations of List interface like ArrayList.
Simpler sorting may be applied by placing the customers without the partners to the end with the help of nullsLast comparator.
Let's assume we have the following test data:
Customer a = new Customer("Peter", "Doe");
Customer x = new Customer("Gloria", "Ven", a);
Customer b = new Customer("Adam", "Swan");
Customer z = new Customer("Mary", "Blake", b);
Customer p = new Customer("John", "Doe");
Customer y = new Customer("Kyle", "Flint");
List<Customer> data = Arrays.asList(a, b, p, x, y, z);
// a Vector may be created similarly
// Vector<Customer> data = new Vector<>(Arrays.asList(a, b, p, x, y, z));
Collections.shuffle(data);
System.out.println(data);
Then the custom comparators byLastAndFirst and byPartnerLastAndFirst are implemented, comparing by the partner field first and then by the customer itself.
Comparator<Customer> byLastAndFirst = Comparator.comparing(Customer::getLastName)
.thenComparing(Customer::getFirstName);
Comparator<Customer> byPartnerLastAndFirst = Comparator
.comparing(Customer::getPartner, Comparator.nullsLast(byLastAndFirst))
.thenComparing(byLastAndFirst);
data.sort(byPartnerLastAndFirst);
System.out.println(data);
This provides the following results:
[`Adam Swan` & `Mary Blake`, `Gloria Ven` & `Peter Doe`, `Mary Blake` & `Adam Swan`, `Peter Doe` & `Gloria Ven`, `John Doe` & NULL, `Kyle Flint` & NULL]
However, the partners are not grouped together in this solution.
Additional grouping can be implemented with Stream API by collecting the customers into a sorted map where a key is a sorted list of a customer and a partner, the partner may be null, and then the keys of the map are mapped back to list of the customers using Stream::flatMap and excluding null partners:
Comparator<List<Customer>> sort = Comparator.comparing(list -> list.get(0), byPartnerLastAndFirst);
data.stream()
.collect(Collectors.groupingBy(
c -> Arrays.asList(c, c.getPartner())
.stream()
.sorted(Comparator.nullsLast(byLastAndFirst))
.collect(Collectors.toList()),
() -> new TreeMap<>(sort),
Collectors.toList()
))
.keySet().stream() // Stream<List<Customer>>
.flatMap(list -> list.stream().filter(Objects::nonNull))
.forEach(System.out::println);
Output (may be improved to not show NULL partners)
`Mary Blake` & `Adam Swan`
`Adam Swan` & `Mary Blake`
`Peter Doe` & `Gloria Ven`
`Gloria Ven` & `Peter Doe`
`John Doe` & NULL
`Kyle Flint` & NULL

rearrange an array into a nested array in java

I have a table in sql of doctor names and their clients
Each doctor has multiple clients
And one client can visit multiple doctors.
Here is the array (from a simple table):
[
{doctor="illies",client=4},
{doctor="illies",client=7},
{doctor="illies",client=1},
{doctor="houari",client=5},
{doctor="abdou",client=1},
{doctor="illies",client=2},
{doctor="abdou",client=1},
]
These data are already ordered. The task is to let each client know its place in the queue.
For example:
The client with ID 1 is in the third place in the queue of doctor "illies",
and in the first place in the queue of doctor "abdou".
I don't know if I explained it well. A friend of mine suggested
rearranging the array into a nested array like this (this array is not totally correct, but it has the idea):
[doctor="abdou" => clients=[cleint1="1",client2="2" ], doctor="illies"=>clients=[...] ]
Now I just need an idea that could help me with my project. All this work is to display the client's position in the doctor's queue. Thank you so much.
It seems that each row in the input array can be presented as a class like this:
class DocClient {
    private String doctor;
    private int client;
    public DocClient(String doctor, int client) {
        this.doctor = doctor;
        this.client = client;
    }
    public String getDoctor() { return this.doctor; }
    public int getClient() { return this.client; }
}
Then the array or list of DocClient needs to be converted not into a "nested array" but into a map, where the doctor is used as the key and the value is the list of clients: Map<String, List<Integer>>.
This map can be conveniently built with the Java Stream API using the collectors Collectors.groupingBy and Collectors.mapping:
List<DocClient> list = Arrays.asList(
    new DocClient("illies", 4), new DocClient("illies", 7), new DocClient("illies", 1),
    new DocClient("houari", 5), new DocClient("abdou", 1), new DocClient("illies", 2),
    new DocClient("abdou", 1)
);
Map<String, List<Integer>> map = list
.stream()
.collect(Collectors.groupingBy(
DocClient::getDoctor, // use doctor as key via reference to getter
Collectors.mapping(
DocClient::getClient, // use `client` field
Collectors.toList() // convert to list
) // List<Integer> is value in map entry
));
// print the map
map.forEach((doc, clients) -> System.out.printf("%s -> %s%n", doc, clients));
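Run end to end, the grouping produces one list of client ids per doctor, and because groupingBy preserves encounter order within each value list, a client's queue position is simply its index plus one. A self-contained version for verification (a record replaces the getter-based class here):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DocClientDemo {
    record DocClient(String doctor, int client) {}

    public static void main(String[] args) {
        List<DocClient> list = Arrays.asList(
            new DocClient("illies", 4), new DocClient("illies", 7), new DocClient("illies", 1),
            new DocClient("houari", 5), new DocClient("abdou", 1), new DocClient("illies", 2),
            new DocClient("abdou", 1));

        Map<String, List<Integer>> map = list.stream()
            .collect(Collectors.groupingBy(
                DocClient::doctor,
                Collectors.mapping(DocClient::client, Collectors.toList())));

        // Encounter order within each list is preserved,
        // so a client's queue position is its index + 1.
        System.out.println(map.get("illies"));            // [4, 7, 1, 2]
        System.out.println(map.get("abdou"));             // [1, 1]
        System.out.println(map.get("illies").indexOf(1) + 1); // client 1 is 3rd with illies
    }
}
```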

filter KeyValueGrouped Dataset in spark

I have a typed Dataset of a custom class and use the groupByKey method on it. As you know, this results in a KeyValueGroupedDataset. I want to filter this new dataset, but there is no filter method for this type of dataset. So my question is: how can I filter this type of dataset? (A Java solution is needed; Spark version: 2.3.1.)
sampleData:
"id":1,"fname":"Gale","lname":"Willmett","email":"gwillmett0#nhs.uk","gender":"Female"
"id":2,"fname":"Chantalle","lname":"Wilcher","email":"cwilcher1#blinklist.com","gender":"Female"
"id":3,"fname":"Polly","lname":"Grandisson","email":"pgrandisson2#linkedin.com","gender":"Female"
"id":3,"fname":"Moshe","lname":"Pink","email":"mpink3#twitter.com","gender":"Male"
"id":2,"fname":"Yorke","lname":"Ginnelly","email":"yginnelly4#apple.com","gender":"Male"
And What I did:
Dataset<Person> peopleDS = spark.read().format("parquet").load("/path").as(Encoders.bean(Person.class));
KeyValueGroupedDataset<String, Person> KVDS = peopleDS.groupByKey( (MapFunction<Person, String> ) f -> f.getGender() , Encoders.STRING());
//How Can I filter on KVDS's id field?
Update1 (use of flatMapGroups):
Dataset<Person> persons = KVDS.flatMapGroups((FlatMapGroupsFunction <String,Person,Person>) (f,k) -> (Iterator<Person>) k , Encoders.bean(Person.class));
Update2 (use of MapGroups)
Dataset<Person> peopleMap = KVDS.mapGroups((MapGroupsFunction <String,Person,Person>) (f,g) -> {
while (g.hasNext()) {
//What can I do here?
}
},Encoders.bean(Person.Class);
Update3: I want to filter the groups whose number of distinct ids is greater than 1. For example, in the sample above I want just the Female groups, because the count of their distinct ids is greater than 1 (the first field is id; the others are fname, lname, email and gender).
Update4: I did what I want with RDD, but I want to do exactly this part of the code with Dataset:
List<Tuple2<String, Iterable<Person>>> f = PersonRDD
.mapToPair(s -> new Tuple2<>(s.getGender(), s)).groupByKey()
.filter(t -> ((Collection<Person>) t._2()).stream().mapToInt(e -> e.getId()).distinct().count() > 1)
.collect();
Why don't you filter on id before grouping? groupByKey is an expensive operation; it should be faster to filter first.
If you really want to group first, you may then have to use .flatMapGroups with an identity function.
Not sure about the Java code, but the Scala version would be something as follows:
peopleDS
.groupByKey(_.gender)
.flatMapGroups { case (gender, persons) => persons.filter(your condition) }
But again, you should filter first :). Especially since your id field is already available before grouping.
Grouping is used for aggregation functions; you can find functions like agg in the KeyValueGroupedDataset class. If you apply an aggregation function, for example count, you will get a Dataset, and the filter function will be available.
groupByKey without an aggregation function looks strange; another function, for example distinct, can be used instead.
Filtering example with "FlatMapGroupsFunction":
.flatMapGroups(
(FlatMapGroupsFunction<String, Person, Person>) (f, k) -> {
List<Person> result = new ArrayList<>();
while (k.hasNext()) {
Person value = k.next();
// filter condition here
if (value != null) {
result.add(value);
}
}
return result.iterator();
},
Encoders.bean(Person.class))
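To port the Update4 requirement (keep only groups whose distinct id count exceeds 1) into that flatMapGroups body, the predicate itself can be prototyped with plain collections first. A sketch with a minimal Person stand-in; the field names follow the question, everything else (sample data included) is illustrative, and this is the collections analog, not Spark code:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupFilterSketch {
    record Person(int id, String fname, String gender) {}

    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person(1, "Gale", "Female"),
            new Person(2, "Chantalle", "Female"),
            new Person(3, "Polly", "Female"),
            new Person(3, "Moshe", "Male"),
            new Person(3, "Yorke", "Male"));

        // Group by gender, then keep only groups with more than one distinct id.
        Map<String, List<Person>> kept = people.stream()
            .collect(Collectors.groupingBy(Person::gender))
            .entrySet().stream()
            .filter(e -> e.getValue().stream().map(Person::id).distinct().count() > 1)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        System.out.println(kept.keySet()); // only Female survives in this sample
    }
}
```

Inside the Spark FlatMapGroupsFunction above, the same predicate would mean buffering the group's elements, counting distinct ids, and returning either the buffered list's iterator or an empty iterator.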
