Use of guava Sets.cartesianProduct with unknown number of arguments - java

I'm stuck in the following problem:
I have a dynamic List of items defined like this:
List<Object> itemList = ArrayList(ArrayList<Object1, Object2, Double[]>)
The Object1 and Object2 are not of interest here, but the array Double[] which contains an arbitrary number of entries.
In my code I iterate over the outer ArrayList and try to calculate the Cartesian product with Guava. Up to now I have something like this (partly non-working code, sorry ...):
private Set<List<Set<Double>>> getValueCombinations() {
List<Set<Double>> valuesOfInnerArrays = new ArrayList<>();
// Loop over the list of device data sets in the class and add the trim value vectors to a list for further
// processing and cartesian product generation.
for (Integer indexCounter = 0; indexCounter < OuterArrayList.size(); indexCounter++) {
final List<Object> innerDataSet = (List<Object>) OuterArrayList.get(indexCounter);
final Set<Double> innerDoubleArray = ImmutableSet.of(((List<Double>) innerDataSet.get(2)).toArray(new Double[]));
valuesOfInnerArrays.add(innerDoubleArray);
}
ImmutableList<Set<Double>> test = ImmutableList.of(valuesOfInnerArrays)
// generate Cartesian product of all trim vectors = a n x m matrix of all combinations of settings
final Set<List<Set<Double>>> cartesianProduct = Sets.cartesianProduct(ImmutableList.of(valuesOfInnerArrays));
return cartesianProduct;
}
In all the examples I found, cartesianProduct is always called with a fixed, known number of Sets, which I cannot do:
Set<Double> first = ImmutableSet.of(1., 2.);
Set<Double> second = ImmutableSet.of(3., 4.);
Set<List<Double>> result =
Sets.cartesianProduct(ImmutableList.of(first, second));
What I would like to have in the end are all combinations of the numbers stored in the inner Double[] arrays.
Any help appreciated.

Thanks to the post "Java Guava CartesianProduct", I solved my problem. My final solution looks like this:
private Set<List<Double>> getValueCombinations() {
final List<Set<Double>> valuesOfInnerArrays = new ArrayList<>();
// Loop over the list of device data sets in the class and add the value vectors to a list for further
// processing and cartesian product generation.
for (int indexCounter = 0; indexCounter < outerArrayList.size(); indexCounter++) {
final List<Object> innerDataSet = (List<Object>) outerArrayList.get(indexCounter);
final SortedSet<Double> innerDoubleArray = new TreeSet<>((List<Double>) innerDataSet.get(2));
valuesOfInnerArrays.add(innerDoubleArray);
}
return Sets.cartesianProduct(valuesOfInnerArrays);
}
Additionally I changed the format of my input list:
List<Object> itemList = ArrayList(ArrayList<Object1, Object2, List<Double>>)
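For reference, a minimal self-contained sketch of the final approach (assuming Guava on the classpath; the hard-coded inner lists are stand-ins for the dynamic value vectors):

```java
import com.google.common.collect.Sets;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CartesianDemo {
    // Turns a dynamic list of value vectors into the cartesian product of their sets
    static Set<List<Double>> combinations(List<List<Double>> valueVectors) {
        List<Set<Double>> sets = new ArrayList<>();
        for (List<Double> vector : valueVectors) {
            sets.add(new TreeSet<>(vector)); // dedupe and order each vector
        }
        return Sets.cartesianProduct(sets);
    }

    public static void main(String[] args) {
        Set<List<Double>> product = combinations(Arrays.asList(
                Arrays.asList(1.0, 2.0),
                Arrays.asList(3.0, 4.0, 5.0)));
        System.out.println(product.size()); // 2 x 3 = 6 combinations
    }
}
```

Because Sets.cartesianProduct accepts a List of Sets built at runtime, the number of inner arrays does not need to be known in advance.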

Related

How to Sort a Nested List of strings

I want to sort the list by the values in the inner lists, doing a multi-level sort based on a few of the fields. Here is a sample of what the data looks like.
Note: I am not able to convert List<List<String>> into a list of objects.
List<List<String>> data = new ArrayList<>();
List<String> list1 = new ArrayList<>();
List<String> list2 = new ArrayList<>();
List<String> list3 = new ArrayList<>();
list1.add("Siva");
list1.add("20");
list1.add("Hyd");
list1.add("TA");
list1.add("India");
list2.add("Suresh");
list2.add("22");
list2.add("Banglore");
list2.add("KA");
list2.add("India");
list3.add("Ramesh");
list3.add("24");
list3.add("Chennai");
list3.add("TN");
list3.add("India");
data.add(list1);
data.add(list2);
data.add(list3);
I want to do multi sorting based on name, age and city.
It's just sample data. List of lists is dynamic. Sorting parameters will also change sometimes.
I want to do sorting on a list of lists of strings only.
Expected Output: List<List<String>> sortedData
Solution by Maintaining your Structure
If you can't really create a class wrapping the data in your nested List (for whatever reason), you could use the collection stream and define the sorted operation's logic as follows:
List<List<String>> listRes = data.stream()
.sorted((x, y) -> {
int res = x.get(0).compareTo(y.get(0)); //Comparing by name
if (res != 0) return res;
res = Integer.valueOf(x.get(1)).compareTo(Integer.valueOf(y.get(1))); //Comparing by age (numeric value)
if (res != 0) return res;
return x.get(2).compareTo(y.get(2)); //Comparing by city
})
.collect(Collectors.toList());
Link to test the code above:
https://ideone.com/RhW1VI
Alternative Solution
However, as it has been pointed out in the comments, a better approach would be to create a custom class representing your data in the nested List. Perhaps a simple record if you're using Java 16 or later (records were a preview feature in 14 and 15), with a factory method to retrieve an instance of your class from a nested List.
Then, with a stream you could map each nested list to your custom class and sort it with a Comparator.
Here is a snippet of the implementation:
public static void main(String[] args) {
List<List<String>> data = /* ... your initialization ... */
List<MyClass> listSorted = data.stream()
.map(list -> MyClass.createMyClass(list))
.sorted(Comparator.comparing(MyClass::name).thenComparingInt(MyClass::age).thenComparing(MyClass::city))
.collect(Collectors.toList());
System.out.println(listSorted);
}
Mapping record
record MyClass(String name, int age, String city, String code, String country) {
public static MyClass createMyClass(List<String> list) {
if (list == null || list.size() < 5) {
return null;
}
// records are immutable, so build the instance via the canonical constructor
return new MyClass(list.get(0), Integer.parseInt(list.get(1)),
list.get(2), list.get(3), list.get(4));
}
}
Here there is also a link with both implementations:
https://ideone.com/UK9trV
To impose an order on the lists inside a nested list, you need to define a Comparator.
As you've said that the contents of the list can't be predicted in advance, I assume that the nested lists might be of arbitrary size and their sizes might not be equal.
A comparator that can handle such a case might be written like this:
Comparator<List<String>> listComparator = new Comparator<>() {
@Override
public int compare(List<String> o1, List<String> o2) {
int limit = Math.min(o1.size(), o2.size());
for (int i = 0; i < limit; i++) {
int localResult = o1.get(i).compareTo(o2.get(i));
if (localResult != 0)
return localResult;
}
return o1.size() - o2.size();
}
};
To sort the list, you can call its sort() method (available since Java 8), which expects a comparator:
data.sort(listComparator);
And you can make a defensive copy of the list before applying sort(), if its initial order might be useful for you:
List<List<String>> initialOrder = new ArrayList<>(data);
data.sort(listComparator);

I'm trying to iterate through two arrays in Java, while also checking to see if the values are equal

I am trying to iterate through many arrays, two at a time. They contain upwards of ten thousand entries each, including the source text, in which I am trying to assign each word to either a noun, verb, adjective, or adverb.
I can't seem to figure out a way to compare two arrays without writing an if-else statement thousands of times.
I searched on Google and SO for similar issues. I couldn't find anything to move me forward.
package wcs;
import dictionaryReader.dicReader;
import sourceReader.sourceReader;
public class Assigner {
private static String source[], snArray[], svArray[], sadvArray[], sadjArray[];
private static String nArray[], vArray[], advArray[], adjArray[];
private static boolean finished = false;
public static void sourceAssign() {
sourceReader srcRead = new sourceReader();
//dicReader dic = new dicReader();
String[] nArray = dicReader.getnArray(), vArray = dicReader.getvArray(), advArray = dicReader.getAdvArray(),
adjArray = dicReader.getAdjArray();
String source[] = srcRead.getSource();
// Noun Store
for (int i = 0; i < source.length; i++) {
if (source[i] == dicReader.getnArray()[i]) {
source[i] = dicReader.getnArray()[i];
}else{
}
}
// Verb Store
// Adverb Store
// Adjective Store
}
}
Basically, this is a simple way to get a list of the items that are in both lists:
// construct a list of item for first list
List<String> firstList = new ArrayList<>(Arrays.asList(new String[0])); // add items
//this function will only keep items in `firstList` if the value is in both lists
firstList.retainAll(Arrays.asList(new String[0]));
// iterate to do your work
for(String val:firstList) {
}
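Applied to the word-tagging problem above, a sketch (the word lists here are made-up stand-ins for the dictionary arrays; a HashSet makes each membership check O(1) instead of scanning the whole array):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WordTagDemo {
    // Keeps only the words that also appear in the given dictionary,
    // preserving the order of the source text
    static List<String> keepIfInDictionary(List<String> words, Set<String> dictionary) {
        List<String> result = new ArrayList<>(words);
        result.retainAll(dictionary); // O(1) lookups against the HashSet
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the source text and the noun dictionary
        List<String> sourceWords = Arrays.asList("dog", "run", "fast", "cat");
        Set<String> nouns = new HashSet<>(Arrays.asList("dog", "cat", "house"));
        System.out.println(keepIfInDictionary(sourceWords, nouns)); // [dog, cat]
    }
}
```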

How to count the amount of each object in an array list (Java)?

I am a beginner in Java so this may be a very basic question but I haven't been able to find an answer anywhere.
I would like to know how to count the number of each type of object in an ArrayList, where the items are not named in the code.
I have 2 classes, one to simulate a vending machine, and the other representing the snacks. The constructor of the snacks class initiates the snack type as a string type as so:
public TypeOfSnack(String SnkType)
Then in the vending machine class these can be added onto the arraylist. The field for this is:
private ArrayList<TypeOfSnack> snacks;
I know I can get the total number of snacks in the ArrayList with snacks.size(), but how do I return the count of each type? So far I have done:
public int countSnacks(String SnkType)
{
return packets.size();
}
which just gives me the total count rather than the count for the type passed into the method.
Thanks
You could build a set of the unique items and then use Collections.frequency to find how many times each element appears in the list.
Example:
public static void main(String[] args) {
List<Integer> numbers = new ArrayList<>();
numbers.add(5);
numbers.add(5);
numbers.add(5);
numbers.add(3);
numbers.add(5);
numbers.add(5);
numbers.add(2);
numbers.add(5);
Map<Integer, Integer> result = new HashMap<>();
for(Integer unique : new HashSet<>(numbers)) {
result.put(unique, Collections.frequency(numbers, unique));
}
System.out.println(result);
}
Output:
{2=1, 3=1, 5=6}
As far as I understand, you want to store the count of each snack. An ArrayList is not suitable for this problem; you should use a HashMap.
Maybe something like the following code:
HashMap<String, Integer> snacks = new HashMap<String, Integer>();
snacks.put("Chips", 5);
snacks.put("Fried", 2);
To get the number of a snack:
int numberOfChips = snacks.get("Chips");
Or if you want to do it your way:
public int getNumberOf(String snackType) {
int value = 0;
for(TypeOfSnack s : snacks) {
if(s.getType().equals(snackType)) {
value++;
}
}
return value;
}
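The per-type count can also be written with streams; a sketch under the assumption that TypeOfSnack exposes the getType() accessor used above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SnackCountDemo {
    // Minimal stand-in for the TypeOfSnack class from the question
    static class TypeOfSnack {
        private final String type;
        TypeOfSnack(String type) { this.type = type; }
        String getType() { return type; }
    }

    // Count of one type
    static long countSnacks(List<TypeOfSnack> snacks, String snackType) {
        return snacks.stream().filter(s -> s.getType().equals(snackType)).count();
    }

    // Counts of every type at once
    static Map<String, Long> countAll(List<TypeOfSnack> snacks) {
        return snacks.stream()
                .collect(Collectors.groupingBy(TypeOfSnack::getType, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<TypeOfSnack> snacks = Arrays.asList(new TypeOfSnack("Chips"),
                new TypeOfSnack("Chips"), new TypeOfSnack("Fried"));
        System.out.println(countSnacks(snacks, "Chips")); // 2
        System.out.println(countAll(snacks));             // counts per type
    }
}
```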

How to get random string value without duplicate?

I want to fetch only a single company name and I want to fetch it only once. So if it already was fetched, it should not be fetched again.
Here is the code:
private static String[] billercompanies = {
"1st",
"TELUS Communications",
"Rogers Cablesystems",
"Shaw Cable",
"TELUS Mobility Inc",
"Nanaimo Regional District of",
"Credit Union MasterCard",
};
public static String GetBillerCompany(){
String randomBillerComp = "";
randomBillerComp = (billercompanies[new Random().nextInt(billercompanies.length)]);
return randomBillerComp;
}
Just shuffle the array you want using Collections
Collections.shuffle(List);
So simply create a list from your array
List<E> list = Arrays.asList(array);
Then shuffle it using the method above
Collections.shuffle(list);
Your list can now be read from left to right, since its order is random.
So simply save the index:
int currentIndex = 0;
public E getRandom(){
//If at the end, start over
if(++currentIndex == list.size()) {
currentIndex = 0;
Collections.shuffle(list);
}
return list.get(currentIndex);
}
Whenever you want to forget which values have already been used, simply shuffle the list again:
Collections.shuffle(list);
Without index
You could simply remove the first value each time; once the list is empty, recreate it from the original array. As Ole V.V. pointed out, a List generated by Arrays.asList(E[]) doesn't support the remove methods, so it is necessary to generate a new instance from it.
Here is a quick and simple class using this solution:
public class RandomList<E>{
E[] array;
List<E> list;
public RandomList(E[] array){
this.array = array;
buildList(array);
}
public E getRandom(){
if(list.isEmpty()) buildList(array);
return list.remove(0);
}
public void buildList(E[] array){
list = new ArrayList<E>(Arrays.asList(array));
Collections.shuffle(list);
}
}
And the test was done with this small code :
Integer[] array = {1,2,3,4,5};
RandomList<Integer> rl = new RandomList<>(array);
int i = 0;
while(i++ < 10)
System.out.println(rl.getRandom());
Make a copy in a List and remove the element when it was already fetched.
Arrays.asList(array) is not modifiable but you can wrap it in a full featured List.
List<String> billercompaniesList = new ArrayList<>(Arrays.asList(billercompanies));
String randomBillerComp = "";
Random random = new Random();
// first retrieval
int index = random.nextInt(billercompaniesList.size());
randomBillerComp = billercompaniesList.get(index);
billercompaniesList.remove(index);
// second retrieval
index = random.nextInt(billercompaniesList.size());
randomBillerComp = billercompaniesList.get(index);
billercompaniesList.remove(index);
// and so forth
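Putting those retrieval steps into a loop gives a small self-contained sketch (the sample names are only for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class UniqueRandomDemo {
    // Draws every element exactly once, in random order, removing as it goes
    static List<String> drawAll(List<String> pool, Random random) {
        List<String> drawn = new ArrayList<>();
        while (!pool.isEmpty()) {
            drawn.add(pool.remove(random.nextInt(pool.size())));
        }
        return drawn;
    }

    public static void main(String[] args) {
        String[] billercompanies = {"TELUS Communications", "Rogers Cablesystems", "Shaw Cable"};
        // Wrap the fixed array in a modifiable list before removing from it
        List<String> pool = new ArrayList<>(Arrays.asList(billercompanies));
        for (String company : drawAll(pool, new Random())) {
            System.out.println(company); // each name printed exactly once
        }
    }
}
```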

PairWise matching millions of records

I have an algorithmic problem at hand. To easily explain the problem, I will be using a simple analogy.
I have an input file
Country,Exports
Austrailia,Sheep
US, Apple
Austrailia,Beef
End Goal:
I have to find the common products between the pairs of countries, so:
{"Austrailia","New Zealand"}:{"apple","sheep"}
{"Austrailia","US"}:{"apple"}
{"New Zealand","US"}:{"apple","milk"}
Process :
I read in the input and store it in a TreeMap<String, List<String>>, where the strings are interned due to many duplicates.
Essentially, I am aggregating by country: the key is the country, the values are its exports.
{"austrailia":{"apple","sheep","koalas"}}
{"new zealand":{"apple","sheep","milk"}}
{"US":{"apple","beef","milk"}}
I have about 1200 keys (countries) and total number of values(exports) is 80 million altogether.
I sort all the values of each key:
{"austrailia":{"apple","sheep","koalas"}} -- > {"austrailia":{"apple","koalas","sheep"}}
This is fast as there are only 1200 Lists to sort.
for(k1:keys)
for(k2:keys)
if(k1.compareTo(k2) <0){ //Dont want to double compare
List<String> intersectList = intersectList_func(k1's exports,k2's exports);
countriespair.put({k1,k2},intersectList)
}
This code block takes very long. I realise it is O(n^2), with around 1200*1200 comparisons; it has been running for almost 3 hours so far.
Is there any way I can speed it up or optimise it?
An algorithmic improvement would be the best option, but are there other technologies to consider?
Edit:
Since both lists are sorted beforehand, intersectList is linear: each iteration advances i or j, so it takes at most listOne.length + listTwo.length steps, and NOT O(n^2) as discussed below.
private static List<String> intersectList(List<String> listOne, List<String> listTwo){
int i=0, j=0;
List<String> listResult = new LinkedList<String>();
while(i != listOne.size() && j != listTwo.size()){
int compareVal = listOne.get(i).compareTo(listTwo.get(j));
if(compareVal == 0){
listResult.add(listOne.get(i));
i++; j++;
}
else if(compareVal < 0) i++;
else j++;
}
return listResult;
}
Update 22 Nov
My current implementation is still running for almost 18 hours. :|
Update 25 Nov
I ran the new implementation as suggested by Vikram and a few others; it has been running since Friday.
My question is: how does grouping by exports rather than by country save computational complexity? I find that the complexity is the same. As Groo mentioned, the complexity for the second part is O(E*C^2), where E is the number of exports and C is the number of countries.
This can be done in one statement as a self-join using SQL:
Test data: first create a test data set:
Lines <- "Country,Exports
Austrailia,Sheep
Austrailia,Apple
New Zealand,Apple
New Zealand,Sheep
New Zealand,Milk
US,Apple
US,Milk
"
DF <- read.csv(text = Lines, as.is = TRUE)
sqldf: now that we have DF, issue this command:
library(sqldf)
sqldf("select a.Country, b.Country, group_concat(Exports) Exports
from DF a, DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
")
giving this output:
Country Country Exports
1 Austrailia New Zealand Sheep,Apple
2 Austrailia US Apple
3 New Zealand US Apple,Milk
With index: if it's too slow, add an index to the Country column (and be sure not to forget the main. parts):
sqldf(c("create index idx on DF(Country)",
"select a.Country, b.Country, group_concat(Exports) Exports
from main.DF a, main.DF b using (Exports)
where a.Country < b.Country
group by a.Country, b.Country
"))
If you run out of memory, then add the dbname = tempfile() sqldf argument so that it uses disk.
Store something like the following data structure (pseudocode):
ValuesSet ={
apple = {"Austrailia","New Zealand"..}
sheep = {"Austrailia","New Zealand"..}
}
for k in ValuesSet
for k1 in k.values()
for k2 in k.values()
if(k1<k2)
Set(k1,k2).add(k)
Time complexity: O(number of distinct pairs with shared products)
Note: I might be wrong, but I do not think you can reduce this time complexity.
Following is a Java implementation for your problem:
public class PairMatching {
HashMap Country;
ArrayList CountNames;
HashMap ProdtoIndex;
ArrayList ProdtoCount;
ArrayList ProdNames;
ArrayList[][] Pairs;
int products=0;
int countries=0;
public void readfile(String filename) {
try {
BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
String line;
CountNames = new ArrayList();
Country = new HashMap<String,Integer>();
ProdtoIndex = new HashMap<String,Integer>();
ProdtoCount = new ArrayList<ArrayList>();
ProdNames = new ArrayList();
products = countries = 0;
while((line=br.readLine())!=null) {
String[] s = line.split(",");
s[0] = s[0].trim();
s[1] = s[1].trim();
int k;
if(!Country.containsKey(s[0])) {
CountNames.add(s[0]);
Country.put(s[0],countries);
k = countries;
countries++;
}
else {
k =(Integer) Country.get(s[0]);
}
if(!ProdtoIndex.containsKey(s[1])) {
ProdNames.add(s[1]);
ArrayList n = new ArrayList();
ProdtoIndex.put(s[1],products);
n.add(k);
ProdtoCount.add(n);
products++;
}
else {
int ind =(Integer)ProdtoIndex.get(s[1]);
ArrayList c =(ArrayList) ProdtoCount.get(ind);
c.add(k);
}
}
System.out.println(CountNames);
System.out.println(ProdtoCount);
System.out.println(ProdNames);
} catch (FileNotFoundException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(PairMatching.class.getName()).log(Level.SEVERE, null, ex);
}
}
void FindPairs() {
Pairs = new ArrayList[countries][countries];
for(int i=0;i<ProdNames.size();i++) {
ArrayList curr = (ArrayList)ProdtoCount.get(i);
for(int j=0;j<curr.size();j++) {
for(int k=j+1;k<curr.size();k++) {
int u =(Integer)curr.get(j);
int v = (Integer)curr.get(k);
//System.out.println(u+","+v);
if(Pairs[u][v]==null) {
if(Pairs[v][u]!=null)
Pairs[v][u].add(i);
else {
Pairs[u][v] = new ArrayList();
Pairs[u][v].add(i);
}
}
else Pairs[u][v].add(i);
}
}
}
for(int i=0;i<countries;i++) {
for(int j=0;j<countries;j++) {
if(Pairs[i][j]==null)
continue;
ArrayList a = Pairs[i][j];
System.out.print("\n{"+CountNames.get(i)+","+CountNames.get(j)+"} : ");
for(int k=0;k<a.size();k++) {
System.out.print(ProdNames.get((Integer)a.get(k))+" ");
}
}
}
}
public static void main(String[] args) {
PairMatching pm = new PairMatching();
pm.readfile("Input data/BigData.txt");
pm.FindPairs();
}
}
[Update] The algorithm presented here shouldn't improve time complexity compared to the OP's original algorithm. Both algorithms have the same asymptotic complexity, and iterating through sorted lists (as OP does) should generally perform better than using a hash table.
You need to group the items by product, not by country, in order to be able to quickly fetch all countries belonging to a certain product.
This would be the pseudocode:
inputList contains a list of pairs {country, product}
// group by product
prepare mapA (product) => (list_of_countries)
for each {country, product} in inputList
{
if mapA does not contain (product)
create a new empty (list_of_countries)
and add it to mapA with (product) as key
add this (country) to the (list_of_countries)
}
// now group by country_pair
prepare mapB (country_pair) => (list_of_products)
for each {product, list_of_countries} in mapA
{
for each pair {countryA, countryB} in list_of_countries
{
if mapB does not contain country_pair {countryA, countryB}
create a new empty (list_of_products)
and add it to mapB with country_pair {countryA, countryB} as key
add this (product) to the (list_of_products)
}
}
If your input list is length N, and you have C distinct countries and P distinct products, then the running time of this algorithm should be O(N) for the first part and O(P*C^2) for the second part. Since your final list needs to have pairs of countries mapping to lists of products, I don't think you will be able to lose the P*C^2 complexity in any case.
I don't code in Java too much, so I added a C# example which I believe you'll be able to port pretty easily:
// mapA maps each product to a list of countries
var mapA = new Dictionary<string, List<string>>();
foreach (var t in inputList)
{
List<string> countries = null;
if (!mapA.TryGetValue(t.Product, out countries))
{
countries = new List<string>();
mapA[t.Product] = countries;
}
countries.Add(t.Country);
}
// note (this is very important):
// CountryPair tuple must have value-type comparison semantics,
// i.e. you need to ensure that two CountryPairs are compared
// by value to allow hashing (mapping) to work correctly, in O(1).
// In C# you can also simply use a Tuple<string,string> to
// represent a pair of countries (which implements this correctly),
// but I used a custom class to emphasize the algorithm
// mapB maps each CountryPair to a list of products
var mapB = new Dictionary<CountryPair, List<string>>();
foreach (var kvp in mapA)
{
var product = kvp.Key;
var countries = kvp.Value;
for (int i = 0; i < countries.Count; i++)
{
for (int j = i + 1; j < countries.Count; j++)
{
var pair = CountryPair.Create(countries[i], countries[j]);
List<string> productsForCountryPair = null;
if (!mapB.TryGetValue(pair, out productsForCountryPair))
{
productsForCountryPair = new List<string>();
mapB[pair] = productsForCountryPair;
}
productsForCountryPair.Add(product);
}
}
}
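A rough Java port of the C# sketch above, offered as an illustration: CountryPair is written as a record (Java 16+) so equality and hashing are by value, which the mapB lookups depend on; the pair order is normalised so {A,B} and {B,A} map to the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByProduct {
    // Records compare by value, so pairs hash correctly as map keys
    record CountryPair(String first, String second) {
        static CountryPair of(String x, String y) {
            // normalise order so {A,B} and {B,A} are the same key
            return x.compareTo(y) <= 0 ? new CountryPair(x, y) : new CountryPair(y, x);
        }
    }

    static Map<CountryPair, List<String>> commonProducts(List<String[]> countryProductPairs) {
        // mapA: product -> list of countries exporting it
        Map<String, List<String>> mapA = new HashMap<>();
        for (String[] pair : countryProductPairs) {
            mapA.computeIfAbsent(pair[1], k -> new ArrayList<>()).add(pair[0]);
        }
        // mapB: country pair -> list of shared products
        Map<CountryPair, List<String>> mapB = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : mapA.entrySet()) {
            List<String> countries = entry.getValue();
            for (int i = 0; i < countries.size(); i++) {
                for (int j = i + 1; j < countries.size(); j++) {
                    CountryPair key = CountryPair.of(countries.get(i), countries.get(j));
                    mapB.computeIfAbsent(key, k -> new ArrayList<>()).add(entry.getKey());
                }
            }
        }
        return mapB;
    }
}
```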
This is a great example to use Map Reduce.
At your map phase you just collect all the exports that belong to each Country.
Then, the reducer sorts the products (products that belong to the same country arrive together, because of the mapper).
You will benefit from a distributed, parallel algorithm that can be run on a cluster.
You are actually taking O(n^2 * time required for one intersect).
Let's see if we can improve the time for the intersect. We can maintain a map for every country which stores its products, so you have n hash maps for n countries; you just need to iterate through all products once to initialize them. If you want quick lookup, maintain a map of maps:
HashMap<String,HashMap<String,Boolean>> countryMap = new HashMap<String, HashMap<String,Boolean>>();
Now if you want to find the common products for countries str1 and str2 do:
HashMap<String,Boolean> map1 = countryMap.get("str1");
HashMap<String,Boolean> map2 = countryMap.get("str2");
ArrayList<String > common = new ArrayList<String>();
Iterator<Map.Entry<String, Boolean>> it = map1.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<String, Boolean> pairs = it.next();
// Add to common if it is also in the other map
if (map2.containsKey(pairs.getKey()))
common.add(pairs.getKey());
}
So, in total it will be O(n^2 * k) if there are k entries in one map, assuming the hash map lookup is O(1).
Using hashmaps where necessary to speed things up:
1) Go through the data and create a map with keys Items and values a list of countries associated with that item. So e.g. Sheep:Australia, US, UK, New Zealand....
2) Create a hashmap with keys each pair of countries and (initially) an empty list as values.
3) For each Item retrieve the list of countries associated with it and for each pair of countries within that list, add that item to the list created for that pair in step (2).
4) Now output the updated list for each pair of countries.
The largest costs are in steps (3) and (4) and both of these costs are linear in the amount of output produced, so I think this is not too far from optimal.
