How to add a column of counts to an ArrayList - java

I have:
TAG | REVIEW
A | hello
B | yay
A | win
in an ArrayList and I am trying to get:
TAG | COUNT
A | 8 //hello+win =8
B | 3 //yay =3
where count is the total number of characters in all strings with the same tag. I have been reading about Collections and Maps, but I am completely lost. Can someone explain how to solve this in pieces?
1) To get the count:
List<String,Integer> poll_reviewText_count=new ArrayList<>();
for(String l:poll_reviewText){
poll_reviewText_count.add({l[0],l[1].length()}) //TAG, COUNT
}
2) Then I think I need to combine all the instances of TAG that match into one sum. Not sure how to do this.

There is no such thing as List<V, T> in Java; a List takes a single type parameter. You also can't put these rows straight into a Map keyed by TAG, because inserting this:
TAG | REVIEW
A | hello
B | yay
A | win
into a map would replace A | hello with A | win (they have the same key).
One solution is to create a class that holds the TAG and REVIEW information:
class Bar {
String tag;
String review;
// setters - getters
}
Then, using the Java Stream API, you can collect the data the way you want:
Map<String, Integer> collect = poll_reviewText_count.stream()
.collect(Collectors.groupingBy(Bar::getTag, Collectors.summingInt(o -> o.getReview().length())));
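For reference, here is a minimal self-contained sketch of the same idea, using the sample rows from the question. The class name TagCounts, the method countByTag and the constructor on Bar are made up for illustration; only groupingBy and summingInt come from the answer above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TagCounts {
    // Hypothetical holder for one TAG | REVIEW row.
    static class Bar {
        private final String tag;
        private final String review;

        Bar(String tag, String review) {
            this.tag = tag;
            this.review = review;
        }

        String getTag() { return tag; }
        String getReview() { return review; }
    }

    // Groups the rows by tag and sums the review lengths within each group.
    static Map<String, Integer> countByTag(List<Bar> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Bar::getTag,
                        Collectors.summingInt(b -> b.getReview().length())));
    }

    public static void main(String[] args) {
        List<Bar> rows = Arrays.asList(
                new Bar("A", "hello"),
                new Bar("B", "yay"),
                new Bar("A", "win"));
        // A -> 5 + 3 = 8, B -> 3
        System.out.println(countByTag(rows));
    }
}
```

The result is a Map, which is fine here because each tag appears once in the output; the collision problem only applies to the raw rows.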

Related

Stream Filter List based on Combination of values from another List

Need: To filter out data in list - 1 based on the values present in list - 2 with multiple criteria i.e. combination of Date & Order Number
Issue: Able to filter based on 1 criteria. But when I try adding another filter condition it treats it as 2 separate & not as combination. Unable to figure out how to make it as a combination.
Hope issue faced is clear.
Research: I referred to my earlier query on similar need - Link1 . Also checked - Link2
List 1: (All Orders)
[Date | OrderNumber | Time | Company | Rate ]
[2014-10-01 | 12345 | 10:00:01 | CompA | 1000]
[2015-03-01 | 23456 | 08:00:01 | CompA | 2200]
[2016-08-01 | 34567 | 09:00:01 | CompA | 3300]
[2017-09-01 | 12345 | 11:00:01 | CompA | 4400]
[2017-09-01 | 98765 | 12:00:01 | CompA | 7400]
List 2: (Completed Orders)
[Date | OrderNumber | Time]
[2014-10-01 | 12345 | 10:00:01]
[2015-03-01 | 23456 | 08:00:01]
[2016-08-01 | 34567 | 09:00:01]
[2017-09-01 | 98765 | 12:00:01]
Expected O/p after filter :
[Date | OrderNumber | Time | Company | Rate]
[2017-09-01 | 12345 | 11:00:01 | CompA | 4400]
Code:
// Data extracted from MySQL database
// List 1: All Orders
List<ModelAllOrders> listOrders = getDataFromDatabase.getTable1();
// List 2: Completed Orders
List<ModelCompletedOrders> listCompletedOrders = getDataFromDatabase.getTable2();
// Filter with 1 criteria works
Set<Integer> setOrderNumbers = listCompletedOrders.stream().map(ModelCompletedOrders::getOrderNumber).collect(Collectors.toSet());
listOrders = listOrders.stream().filter(p -> !setOrderNumbers.contains(p.getOrderNumber())).collect(Collectors.toList());
// Below not working as expected when trying to combinational filter
Set<LocalDate> setDates = listCompletedOrders.stream().map(ModelCompletedOrders::getDate).collect(Collectors.toSet());
listOrders = listOrders.stream().filter(p -> !setDates.contains(p.getDate()) && !setOrderNumbers.contains(p.getOrderNumber()))
.collect(Collectors.toList());
You've asked for logic that will do this:
The combination of Date & Order Number is unique. I need to check if that unique combination is present in List-2, if yes then filter out, if not then output should contain that row.
Stream::filter() will return a subset of the stream where the filter predicate returns true (i.e. it filters out those objects in the stream where the predicate is false).
listOrders = listOrders.stream().filter(p -> !setDates.contains(p.getDate()) && !setOrderNumbers.contains(p.getOrderNumber()))
.collect(Collectors.toList());
Your expression here says "show me orders where the order's date does not appear in the list of completed orders AND where the order's number does not appear in the list of completed orders". That logical expression is wrong: the negation has to apply to the combination, not to each condition separately (in electronics terms, you are mixing up positive and negative logic).
You want either:
listOrders = listOrders.stream().filter(p -> !(setDates.contains(p.getDate()) && setOrderNumbers.contains(p.getOrderNumber())))
.collect(Collectors.toList());
"show me orders unless both the order's date and the order's number are
present in the list of completed orders"
or, equivalently by De Morgan's law:
listOrders = listOrders.stream().filter(p -> !setDates.contains(p.getDate()) || !setOrderNumbers.contains(p.getOrderNumber()))
.collect(Collectors.toList());
"show me orders where either the order's date has not been seen before
OR the order's number has not been seen before"
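One caveat with the two-set approach: the sets are built independently, so they record which dates and which order numbers occur in List 2, not which pairs occur. With the sample data, 2017-09-01 and 12345 both appear in List 2 (in different rows), so the expected row would still be removed. A possible way around this, sketched below, is to put the combination itself into one set. The Order class, the key helper and the notCompleted method are assumptions for illustration, standing in for ModelAllOrders and ModelCompletedOrders.

```java
import java.time.LocalDate;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class PairFilter {
    // Hypothetical stand-in for the model classes; only the two fields
    // that form the unique combination are modelled.
    static class Order {
        final LocalDate date;
        final int orderNumber;

        Order(String date, int orderNumber) {
            this.date = LocalDate.parse(date);
            this.orderNumber = orderNumber;
        }
    }

    // Builds the (date, orderNumber) key. SimpleEntry implements
    // equals/hashCode over both parts, so it can be used in a HashSet.
    static Map.Entry<LocalDate, Integer> key(Order o) {
        return new SimpleEntry<>(o.date, o.orderNumber);
    }

    // Keeps only the orders whose (date, orderNumber) pair is not completed.
    static List<Order> notCompleted(List<Order> all, List<Order> completed) {
        Set<Map.Entry<LocalDate, Integer>> done =
                completed.stream().map(PairFilter::key).collect(Collectors.toSet());
        return all.stream()
                .filter(o -> !done.contains(key(o)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> all = Arrays.asList(
                new Order("2014-10-01", 12345), new Order("2015-03-01", 23456),
                new Order("2016-08-01", 34567), new Order("2017-09-01", 12345),
                new Order("2017-09-01", 98765));
        List<Order> completed = Arrays.asList(
                new Order("2014-10-01", 12345), new Order("2015-03-01", 23456),
                new Order("2016-08-01", 34567), new Order("2017-09-01", 98765));
        for (Order o : notCompleted(all, completed)) {
            System.out.println(o.date + " " + o.orderNumber); // 2017-09-01 12345
        }
    }
}
```

A small value class with its own equals and hashCode would work just as well as SimpleEntry; the point is only that the set element carries both fields together.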

How to pass Scenario Outline data as an object in a step method using cucumber-jvm

I am looking for a way to pass each Scenario Outline example row as an object in cucumber-jvm.
For example, consider this scenario:
Scenario Outline: example
Given I have a url
When I choose <input_1>
Then page should hold field1 value as <validation field1> field2 value as <validation field2> fieldn value as <validation fieldn>
Examples:
| input_1 | validation field1 |validation field2|validation field n|
| input_1_case_1 | expected value 1 |expected value 1 |expected value n |
So in the step file:
public void validationMethod(String validationField1, String validationField2, String validationFieldN) {
............
............
}
So the more fields I have, the more arguments my method takes.
Now I want to pass all the validation fields to the method as a single object. Is that possible with cucumber-jvm? If so, could anyone please provide a suggestion with sample code?
You could try something like this
Then Use the following values
| <validation field1> | <validation field2> | <validation field3> |
Examples:
| input_1 | validation field1 |validation field2|validation field3 |
| input_1_case_1 | expected value 1 |expected value 2 |expected value 3 |
| input_2_case_2 | expected value 1 |expected value 2 |expected value 3 |
Step Definition
@Then("^Use the following values$")
public void useFollVal(List<String> valFields) {
//The values will be inside the list. Use index to access
}
You can even get a validation object instead of a list of strings, i.e. List<ValidationData>. To do this, add a header row to the step's data table (not the Examples table) with names matching the fields of the ValidationData class, and Cucumber will populate an object from each row.
Then Use the following values
| valField1 | valField2 | valField3 | <<<--- Header to add
| <validation field1> | <validation field2> | <validation field3> |
valField1 -> private String valField1; in ValidationData
Step Definition
@Then("^Use the following values$")
public void useFollVal(List<ValidationData> valObject) {
}
This is more of a comment: wouldn't a variable-length argument list work for you? You would need to know the order of your parameters, though, since there are no argument names to help.
public void multiParams(String... val){
}
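As a rough illustration of that trade-off, here is plain Java, independent of Cucumber (whether Cucumber can bind step arguments to a varargs parameter is not confirmed here). The class name VarargsDemo and the return type are assumptions added so the sketch can be run on its own.

```java
import java.util.Arrays;
import java.util.List;

class VarargsDemo {
    // Accepts any number of expected values. The values are only
    // identifiable by position, so caller and method must agree on the order.
    static List<String> multiParams(String... val) {
        return Arrays.asList(val);
    }

    public static void main(String[] args) {
        List<String> fields = multiParams(
                "expected value 1", "expected value 2", "expected value 3");
        // Index 1 is the second validation field, by convention only.
        System.out.println(fields.get(1)); // expected value 2
    }
}
```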

Create dataframe from rdd objectfile

How can I create a DataFrame from an RDD that was saved as an object file? I want to load the RDD, but I don't have a Java class for it, only a StructType that I want to use as the schema for the DataFrame.
I tried retrieving as Row
val myrdd = sc.objectFile[org.apache.spark.sql.Row]("/home/bipin/"+name)
But I get
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to
org.apache.spark.sql.Row
Is there a way to do this?
Edit
From what I understand, I have to read the RDD as arrays of objects and convert each one to a Row. If anyone can give a method for this, that would be an acceptable answer.
If you have an Array of Object, you only need to use the Row apply method, which takes a sequence of Any. The code would look something like this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row(x))
EDIT
You are right, @user568109: this will create a DataFrame with only one field, containing the whole Array. To expand the array into separate fields, do this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row.fromSeq(x.toSeq))
As @user568109 said, there are other ways to do this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row(x:_*))
It doesn't matter which one you use, because both are wrappers for the same code:
/**
* This method can be used to construct a [[Row]] with the given values.
*/
def apply(values: Any*): Row = new GenericRow(values.toArray)
/**
* This method can be used to construct a [[Row]] from a [[Seq]] of values.
*/
def fromSeq(values: Seq[Any]): Row = new GenericRow(values.toArray)
Let me add some explanation.
Suppose you have a MySQL table grocery with the columns (grocery_id, item, category, price) and the following contents:
+------------+---------+----------+-------+
| grocery_id | item | category | price |
+------------+---------+----------+-------+
| 1 | tomato | veg | 2.40 |
| 2 | raddish | veg | 4.30 |
| 3 | banana | fruit | 1.20 |
| 4 | carrot | veg | 2.50 |
| 5 | apple | fruit | 8.10 |
+------------+---------+----------+-------+
5 rows in set (0.00 sec)
Now, to read it within Spark, your code will look something like this:
val groceryRDD = new JdbcRDD(sc, ()=> DriverManager.getConnection(url,uname,passwd), "select item,price from grocery limit ?,?",1,10,2,r => r.getString("item")+"|"+r.getString("price"))
Note:
In the above statement I converted each ResultSet row into a String with r => r.getString("item")+"|"+r.getString("price")
So my JdbcRDD is:
groceryRDD: org.apache.spark.rdd.JdbcRDD[String] = JdbcRDD[29] at JdbcRDD at <console>:21
Now save it:
groceryRDD.saveAsObjectFile("/user/cloudera/jdbcobject")
Answer to your question
When reading the object file, you need to write the following:
val newJdbObjectFile = sc.objectFile[String]("/user/cloudera/jdbcobject")
Put simply, just use the same type parameter as the RDD you saved.
In my case, groceryRDD has the type parameter String, so I used the same here.
UPDATE:
In your case, as mentioned by jlopezmat, you need to use Array[Object].
Each element of the RDD is an object, but because the rows were converted with resultSetToObjectArray, each row is saved as an array of its column values.
For example, if I save the above RDD as below,
val groceryRDD = new JdbcRDD(sc, ()=> DriverManager.getConnection(url,uname,passwd), "select item,price from grocery limit ?,?",1,10,2,r => JdbcRDD.resultSetToObjectArray(r))
and then read it back and collect the data:
val newJdbcObjectArrayRDD = sc.objectFile[Array[Object]]("...")
val result = newJdbcObjectArrayRDD.collect
the result is of type Array[Array[Object]]:
result: Array[Array[Object]] = Array(Array(raddish, 4.3), Array(banana, 1.2), Array(carrot, 2.5), Array(apple, 8.1))
You can parse the above based on your column definitions.
Please let me know if this answers your question.

How to ensure first keys in all containers are unique in whole stack

I have two commands, "add" and "undo". With the "add" operation, one or more pairs can be sent to the program. With the "undo" operation, the most recently sent data should be taken back. To implement this functionality, I came up with the following.
I would use these data structures:
| | HashMap<String, String> : It is used for storing data
| |
| |
| |
--------
Stack : It seems best to "undo" and "add" operations.
At some point, the stack should look like this in practice:
| |
| |
| ________________________________ |
| | (filename5, date1) | | third HashMap type container
| | (filename6, date5) | | holding three items
| | (filename7, date9) | |
| |______________________________| |
| |
| ________________________________ | second HashMap type container
| | (filename3, date2) | | holding one item
| |______________________________| |
| |
| ________________________________ | first HashMap type container
| | (filename1, date1) | | holding two items
| | (filename2, date2) | |
| |______________________________| |
|----------------------------------|
My question is "How can I ensure first keys in all containers are unique in the whole stack?"
If that is not possible with the data structures above, which data structure should I use instead?
Why not just keep two structures: a Stack<String>, which you can check for whether it already contains filename_x, and a second structure, your HashMap<String, String>?
UPD: In other words, you do not need to hold these maps inside the stack as its elements.
I solved it with three Java Stacks wrapped in my own stack class, which keeps the push and pop operations synchronized across them. The implementation is:
import java.util.Stack;
class OwnStack {
private final Stack<String> file = new Stack<String>();
private final Stack<String> date = new Stack<String>();
// one entry per push() call: how many pairs that call added
private final Stack<Integer> numberOfElementsPushedAtTheSameTime
= new Stack<Integer>();
public void push(String[] files, String[] dates) {
/* Algorithm
* Iterate over files
* if file is in stack
* do not push anything to stack
* else
* push file to stack
* push date to stack
* push number of entries pushed at the same time to stack (files.length)
*/
}
public void pop() {
/*
* Algorithm
* If size is zero in file stack
* do nothing
* else
* get top number from numberOfElementsPushedAtTheSameTime stack
* iterate over stacks returned number times
* pop file stack
* pop date stack
*/
// I do not need return values
}
}
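The outline above leaves the method bodies open, so here is one possible way to fill them in, as a sketch rather than the poster's actual code. It assumes that an "add" is rejected as a whole if any file name is already stored or repeated within the batch, and it adds a HashSet (not in the original) so the uniqueness check does not have to scan the stack. The names UndoStack, batchSizes and seen are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class UndoStack {
    private final Deque<String> files = new ArrayDeque<>();
    private final Deque<String> dates = new ArrayDeque<>();
    // One entry per successful push: how many pairs that push added.
    private final Deque<Integer> batchSizes = new ArrayDeque<>();
    // All file names currently stored, for a fast uniqueness check.
    private final Set<String> seen = new HashSet<>();

    // "add": pushes the whole batch, or nothing at all if any file name is
    // already stored or repeated within the batch (assumed policy).
    boolean push(String[] newFiles, String[] newDates) {
        if (newFiles.length == 0 || newFiles.length != newDates.length) {
            return false;
        }
        Set<String> batch = new HashSet<>();
        for (String f : newFiles) {
            if (seen.contains(f) || !batch.add(f)) {
                return false;
            }
        }
        for (int i = 0; i < newFiles.length; i++) {
            files.push(newFiles[i]);
            dates.push(newDates[i]);
        }
        seen.addAll(batch);
        batchSizes.push(newFiles.length);
        return true;
    }

    // "undo": removes the most recent batch and returns how many pairs
    // were removed (0 if there is nothing to undo).
    int pop() {
        if (batchSizes.isEmpty()) {
            return 0;
        }
        int n = batchSizes.pop();
        for (int i = 0; i < n; i++) {
            seen.remove(files.pop());
            dates.pop();
        }
        return n;
    }

    int size() {
        return files.size();
    }

    public static void main(String[] args) {
        UndoStack s = new UndoStack();
        s.push(new String[] {"filename1", "filename2"}, new String[] {"date1", "date2"});
        s.push(new String[] {"filename3"}, new String[] {"date2"});
        // Rejected: filename1 is already stored.
        System.out.println(s.push(new String[] {"filename1"}, new String[] {"date9"})); // false
        System.out.println(s.pop()); // 1
    }
}
```

Skipping only the duplicate entries and recording the number actually pushed would be an equally valid policy; the outline does not say which one was intended.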

Adding ArrayList to TreeMap within a while cycle

I'm retrieving data from a table in a database and adding each row to a TreeMap.
I'm using the following code:
ArrayList<String> listaValores = new ArrayList();
TreeMap<Integer, ArrayList> tmValores = new TreeMap<>();
(...)
while (rs.next())
{
listaValores.clear();
for(int i=2; i<=ncolunas; i++)
{
listaValores.add(rs.getString(i));
}
tmValores.put(rs.getInt(1), listaValores);
}
My TreeMap keys are inserted fine, but every value is a copy of the last row returned by the SELECT * FROM Table query.
So if my table is:
id | name | last name |
1 | joe | lewis |
2 | mark | spencer |
3 | mike | carter |
4 | duke | melvin |
My TreeMap contains:
1=> duke, melvin
2=> duke, melvin
3=> duke, melvin
4=> duke, melvin
Instead of (as I wanted)
1=> joe, lewis
2=> mark, spencer
3=> mike, carter
4=> duke, melvin
Can anyone point out where the problem is?
I believe you have to reassign the value of listaValores.
Simply change
listaValores.clear();
to
listaValores = new ArrayList<String>();
Objects in Java are passed around as references, so you are in fact adding the same list for all the keys in your map.
Since you clear it at every step and then add some values to it, it will contain just the last row, after the while loop has finished.
What you really want to do is to create an instance of ArrayList for every row and add that to your map, instead of clearing the values in the old one:
while (rs.next())
{
listaValores = new ArrayList<String>();
for(int i = 2; i <= ncolunas; i++)
{
listaValores.add(rs.getString(i));
}
tmValores.put(rs.getInt(1), listaValores);
}
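The effect can be reproduced without a database. The sketch below uses a hard-coded String[][] in place of the ResultSet (an assumption for illustration, as are the class and method names) and builds the map both ways, so the difference between sharing one list and creating one per row is visible side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RowCopyDemo {
    // Reproduces the bug: one list instance is cleared and reused, so every
    // key ends up pointing at the same object.
    static Map<Integer, List<String>> sharedList(String[][] rows) {
        Map<Integer, List<String>> result = new TreeMap<>();
        List<String> values = new ArrayList<>();
        for (String[] row : rows) {
            values.clear();
            for (int i = 1; i < row.length; i++) {
                values.add(row[i]);
            }
            result.put(Integer.parseInt(row[0]), values);
        }
        return result;
    }

    // The fix: a new list is created for every row.
    static Map<Integer, List<String>> freshListPerRow(String[][] rows) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (String[] row : rows) {
            List<String> values = new ArrayList<>();
            for (int i = 1; i < row.length; i++) {
                values.add(row[i]);
            }
            result.put(Integer.parseInt(row[0]), values);
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"1", "joe", "lewis"}, {"2", "mark", "spencer"},
            {"3", "mike", "carter"}, {"4", "duke", "melvin"}
        };
        // {1=[duke, melvin], 2=[duke, melvin], 3=[duke, melvin], 4=[duke, melvin]}
        System.out.println(sharedList(rows));
        // {1=[joe, lewis], 2=[mark, spencer], 3=[mike, carter], 4=[duke, melvin]}
        System.out.println(freshListPerRow(rows));
    }
}
```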
