Count distinct while aggregating others? - java

This is what my dataset looks like:
+---------+------------+-----------------+
| name    |request_type| request_group_id|
+---------+------------+-----------------+
|Michael  | X          | 1020            |
|Michael  | X          | 1018            |
|Joe      | Y          | 1018            |
|Sam      | X          | 1018            |
|Michael  | Y          | 1021            |
|Sam      | X          | 1030            |
|Elizabeth| Y          | 1035            |
+---------+------------+-----------------+
I want to count the occurrences of each request_type per person and also count the distinct request_group_ids per person.
The result should look like this:
+---------+--------------------+---------------------+--------------------------------+
| name    |cnt(request_type(X))| cnt(request_type(Y))| cnt(distinct(request_group_id))|
+---------+--------------------+---------------------+--------------------------------+
|Michael  | 2                  | 1                   | 3                              |
|Joe      | 0                  | 1                   | 1                              |
|Sam      | 2                  | 0                   | 2                              |
|John     | 1                  | 0                   | 1                              |
|Elizabeth| 0                  | 1                   | 1                              |
+---------+--------------------+---------------------+--------------------------------+
What I've done so far (this produces the first two columns):
msgDataFrame.select(NAME, REQUEST_TYPE)
    .groupBy(NAME)
    .pivot(REQUEST_TYPE, Lists.newArrayList(X, Y))
    .agg(functions.count(REQUEST_TYPE))
    .show();
How can I count the distinct request_group_ids in this same select? Is that possible within it?
I think it is only possible by joining two datasets (my current result plus a separate aggregation of distinct request_group_id values).

Example (Scala) using a window function. countDistinct is not supported over a window, so it is replaced here with size(collect_set(...)):
val groupIdWindow = Window.partitionBy("name")
df.select($"name", $"request_type",
    size(collect_set("request_group_id").over(groupIdWindow)).alias("countDistinct"))
  .groupBy("name", "countDistinct")
  .pivot($"request_type", Seq("X", "Y"))
  .agg(count("request_type"))
  .show(false)
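To make the target result concrete, here is a minimal plain-Java sketch (no Spark involved) of the aggregation the question asks for: per name, count the X rows, count the Y rows, and collect the group ids into a set so that its size is the distinct count. The class and method names (RequestStats, Stats, aggregate) are made up for this illustration, and the rows are the sample rows from the question.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class RequestStats {
    // Per-person counters: number of X rows, number of Y rows, and the group ids seen.
    static final class Stats {
        int countX;
        int countY;
        final Set<Integer> groupIds = new LinkedHashSet<>();

        // The distinct count is simply the size of the set.
        int distinctGroups() {
            return groupIds.size();
        }
    }

    // Each row is {name, request_type, request_group_id}.
    static Map<String, Stats> aggregate(String[][] rows) {
        Map<String, Stats> byName = new LinkedHashMap<>();
        for (String[] row : rows) {
            Stats stats = byName.computeIfAbsent(row[0], k -> new Stats());
            if ("X".equals(row[1])) {
                stats.countX++;
            } else if ("Y".equals(row[1])) {
                stats.countY++;
            }
            stats.groupIds.add(Integer.parseInt(row[2]));
        }
        return byName;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Michael", "X", "1020"}, {"Michael", "X", "1018"}, {"Joe", "Y", "1018"},
            {"Sam", "X", "1018"}, {"Michael", "Y", "1021"}, {"Sam", "X", "1030"},
            {"Elizabeth", "Y", "1035"}
        };
        aggregate(rows).forEach((name, s) ->
            System.out.println(name + " X=" + s.countX + " Y=" + s.countY
                + " distinct=" + s.distinctGroups()));
    }
}
```

The window-based answer above arrives at the same per-name distinct count by attaching size(collect_set(...)) to every row before pivoting.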

Related

Extract elements from ObservableList

My Observable list looks like this : [AR | Argentina | 2 |
AU | Australia | 3 |
BE | Belgium | 1 |
BR | Brazil | 2 |
CA | Canada | 2 |
CH | Switzerland | 1 |
CN | China | 3 |
DE | Germany | 1 |
DK | Denmark | 1 |
EG | Egypt | 4 |
FR | France | 1 |
IL | Israel | 4 |
IN | India | 3 |
IT | Italy | 1 |
JP | Japan | 3 |
KW | Kuwait | 4 |
ML | Malaysia | 3 |
MX | Mexico | 2 |
NG | Nigeria | 4 |
NL | Netherlands | 1 |
SG | Singapore | 3 |
UK | United Kingdom | 1 |
US | United States of America | 2 |
ZM | Zambia | 4 |
ZW | Zimbabwe | 4 |
]
I would like to extract each of these fields and insert them into TableColumns so that it looks like this:
https://imgur.com/CXFW68K
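One possible starting point, sketched in plain Java: split each entry on the | separator and trim the pieces, which gives one value per column. This assumes every entry's text has the shape "AR | Argentina | 2 |" as in the listing above; the class and method names (CountryRowParser, parseRow) are hypothetical, and wiring the fields into a JavaFX TableView is left out.

```java
import java.util.ArrayList;
import java.util.List;

class CountryRowParser {
    // Splits one entry such as "AR | Argentina | 2 |" into its trimmed, non-empty fields.
    static String[] parseRow(String line) {
        List<String> fields = new ArrayList<>();
        for (String part : line.split("\\|")) {
            String field = part.trim();
            if (!field.isEmpty()) {
                fields.add(field);
            }
        }
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] fields = parseRow("US | United States of America | 2 |");
        // One field per table column: code, country name, number.
        System.out.println(fields[0] + " / " + fields[1] + " / " + fields[2]);
    }
}
```

Each resulting array then holds the values for one table row.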

Issue with tracing down the array in Java recursion function

I have an issue with recursion in Java. The problem is as follows:
Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example, given n = 3, a solution set is:
"((()))", "(()())", "(())()", "()(())", "()()()"
The recursive code for this problem is below:
public List<String> generateParenthesis(int n) {
    ArrayList<String> result = new ArrayList<String>();
    dfs(result, "", n, n);
    return result;
}
public void dfs(ArrayList<String> result, String s, int left, int right) {
    if (left > right)
        return;
    if (left == 0 && right == 0) {
        result.add(s);
        return;
    }
    if (left > 0) {
        dfs(result, s + "(", left - 1, right);
    }
    if (right > 0) {
        dfs(result, s + ")", left, right - 1);
    }
}
I have been able to trace the program up to a particular point, but I am unable to trace it all the way through.
if n=2
left=2;right=2;
result="(())",
__________
| s="" |
| l=2 |
| r=2 |
| |
| |
|________|
|
V
__________
| s=( |
| l 1 |
| r 2 |
| |
| |
|________|
|
V
__________
| s=(( |
| l 0 |
| r 2 |
| |
| |
|________|
|
V
__________
| s=(() |
| l 0 |
| r 1 |
| |
| |
|________|
|
V
__________
| s= (())|
| l=0 |
| r=0 |
| |
| |
|________|
How does the program continue after the point I have traced above? Can someone help me trace it? Thanks.
From where you left off:
__________
| s=( |
| l=1 |
| r=2 |
| |
| |
|________|
|
V
__________
| s=() |
| l 1 |
| r 1 |
| |
| |
|________|
|
V
__________
| s=()( |
| l 0 |
| r 1 |
| |
| |
|________|
|
V
__________
| s=()() |
| l 0 |
| r 0 |
| |
| |
|________|
If you're using Eclipse or any other IDE, it should be easy to set a breakpoint and step through your program line by line (showing all your variables and how they change). If you haven't learned debugging yet, I would encourage you to look it up and learn how to debug programs.
What your program is actually doing:
left (l=1, r=2)
left (l=0, r=2)
right (l=0, r=1)
right (l=0, r=0)
add s to result (l=0, r=0)
*here you return from 3 recursive calls, back to the frame where (l=1, r=2)*
right (l=1, r=1)
left (l=0, r=1)
right (l=0, r=0)
add s to result (l=0, r=0)
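If a debugger is not at hand, the same trace can be printed by the program itself. The sketch below is the dfs from the question with one extra parameter, indent, added purely for illustration; it grows by two spaces per recursive call so that the nesting of the calls is visible in the output.

```java
import java.util.ArrayList;
import java.util.List;

class ParenthesesTrace {
    static List<String> generate(int n) {
        List<String> result = new ArrayList<>();
        dfs(result, "", n, n, "");
        return result;
    }

    // Same logic as the question's dfs; indent only affects the printed trace.
    static void dfs(List<String> result, String s, int left, int right, String indent) {
        System.out.println(indent + "enter s=\"" + s + "\" l=" + left + " r=" + right);
        if (left > right) {
            // More ')' than '(' has been used so far: dead end.
            System.out.println(indent + "pruned");
            return;
        }
        if (left == 0 && right == 0) {
            result.add(s);
            System.out.println(indent + "added " + s);
            return;
        }
        if (left > 0) {
            dfs(result, s + "(", left - 1, right, indent + "  ");
        }
        if (right > 0) {
            dfs(result, s + ")", left, right - 1, indent + "  ");
        }
    }

    public static void main(String[] args) {
        System.out.println(generate(2));
    }
}
```

Every return hands control back to the caller's frame, whose own s, left and right are unchanged; that is why the second branch starts again from s="(" with l=1, r=2.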

Spark - sample() function duplicating data?

I want to randomly select a subset of my data and then limit it to 200 entries. But after using the sample() function, I'm getting duplicate rows, and I don't know why. Let me show you:
DataFrame df = sqlContext.sql("SELECT * " +
        " FROM temptable" +
        " WHERE conditions");
DataFrame df1 = df.select(df.col("col1"))
        .where(df.col("col1").isNotNull())
        .distinct()
        .orderBy(df.col("col1"));
df1.show();
System.out.println(df1.count());
Up until now, everything is OK. I get the output:
+-----------+
|col1 |
+-----------+
| 10016|
| 10022|
| 100281|
| 10032|
| 100427|
| 100445|
| 10049|
| 10070|
| 10076|
| 10079|
| 10081|
| 10082|
| 100884|
| 10092|
| 10099|
| 10102|
| 10103|
| 101039|
| 101134|
| 101187|
+-----------+
only showing top 20 rows
10512
with 10512 records and no duplicates. And then:
df1 = df1.sample(true, 0.5).limit(200);
df1.show();
System.out.println(df1.count());
This returns 200 rows full of duplicates:
+-----------+
|col1 |
+-----------+
| 10022|
| 100445|
| 100445|
| 10049|
| 10079|
| 10079|
| 10081|
| 10081|
| 10082|
| 10092|
| 10102|
| 10102|
| 101039|
| 101134|
| 101134|
| 101134|
| 101345|
| 101345|
| 10140|
| 10141|
+-----------+
only showing top 20 rows
200
Can anyone tell me why? This is driving me crazy. Thank you!
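You explicitly ask for a sample with replacement (the first argument is true), so there is nothing unexpected about getting duplicates:
public Dataset<T> sample(boolean withReplacement, double fraction)
To avoid duplicates, pass false for withReplacement, e.g. sample(false, 0.5).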
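The difference between the two modes can be illustrated in plain Java. This is only a sketch of the general idea, not how Spark implements sample() (Spark works with a fraction, while this example draws a fixed number k of items); the class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SamplingDemo {
    // With replacement: every draw picks from the full list, so a value can be drawn again.
    static List<Integer> sampleWithReplacement(List<Integer> data, int k, Random rnd) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            out.add(data.get(rnd.nextInt(data.size())));
        }
        return out;
    }

    // Without replacement: shuffle a copy and keep the first k, so each element appears at most once.
    static List<Integer> sampleWithoutReplacement(List<Integer> data, int k, Random rnd) {
        List<Integer> copy = new ArrayList<>(data);
        Collections.shuffle(copy, rnd);
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        Random rnd = new Random();
        System.out.println("with replacement:    " + sampleWithReplacement(data, 5, rnd));
        System.out.println("without replacement: " + sampleWithoutReplacement(data, 5, rnd));
    }
}
```

With distinct input, the second method can never return a duplicate, while the first one can.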

How to resolve Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2?

Can somebody help me solve this error?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
I am searching for data in a linked list, but when I try to copy the data into an array, I get this output:
matric | nama | sem | cc | ch | fm
32255 | izzat | 1 | ccs2 | 3 | 45.0
| | 2 | ccs3 | 3 | 56.0
32345 | khai] | 3 | ccs4 | 3 | 45.0
| | 2 | ccs5 | 3 | 2.0
32246 | fifi | 1 | cc1 | 3 | 60.0
| | 1 | ccs3 | 4 | 34.0
34567 | dudu | 2 | ccs2 | 2 | 24.0
| | 2 | ccs4 | 6 | 79.0
first-->34567-->32246-->32345-->32255-->null
first-->6-->2-->4-->3-->3-->3-->3-->3-->null
first-->2-->2-->1-->1-->2-->3-->2-->1-->null
first-->dudu-->fifi-->khai]-->izzat-->null
first-->ccs4-->ccs2-->ccs3-->cc1-->ccs5-->ccs4-->ccs3-->ccs2-->null
first-->79.0-->24.0-->34.0-->60.0-->2.0-->45.0-->56.0-->45.0-->null
42insert matric= 032345
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
2
khai]
2
3
at inputoutput.LinkedList.getcc(LinkedList.java:141)
at inputoutput.baca.getcc(baca.java:84)
at inputoutput.Inputoutput.main(Inputoutput.java:75)
Java Result: 1
BUILD SUCCESSFUL (total time: 7 seconds)
the code:
String[] getcc(int mat, int sub) {
    ListObject2 current = first2;
    int count = 0;
    String b[] = new String[2]; // 2 is the subject number == sub
    int x = 0;
    while (current != null) {
        if (count == ((mat*sub)+x) && ((mat*sub)+0)<((mat*sub)+x)<<((mat*sub)+sub)) {
            b[x] = current.data2;
            x++;
        }
        current = current.next;
        count++;
    }
    return b;
}
However, I get the expected output if I search for the last entry in the linked list, which is 032255.
This is the output:
matric | nama | sem | cc | ch | fm
32255 | izzat | 1 | ccs2 | 3 | 45.0
| | 2 | ccs3 | 3 | 56.0
32345 | khai] | 3 | ccs4 | 3 | 45.0
| | 2 | ccs5 | 3 | 2.0
32246 | fifi | 1 | cc1 | 3 | 60.0
| | 1 | ccs3 | 4 | 34.0
34567 | dudu | 2 | ccs2 | 2 | 24.0
| | 2 | ccs4 | 6 | 79.0
first-->34567-->32246-->32345-->32255-->null
first-->6-->2-->4-->3-->3-->3-->3-->3-->null
first-->2-->2-->1-->1-->2-->3-->2-->1-->null
first-->dudu-->fifi-->khai]-->izzat-->null
first-->ccs4-->ccs2-->ccs3-->cc1-->ccs5-->ccs4-->ccs3-->ccs2-->null
first-->79.0-->24.0-->34.0-->60.0-->2.0-->45.0-->56.0-->45.0-->null
42insert matric= 032255
3
izzat
2
1
ccs3//the data i want to search
ccs2//
You're going into the if statement more than twice while walking the list. If you do that, you'll go past the bounds of the b array (which can only hold two values). You should use an ArrayList instead so you can add as many items as you need.
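A minimal sketch of that suggestion, under stated assumptions: Node is a hypothetical stand-in for the question's ListObject2, and the entries for student index mat are taken to sit at list positions mat*sub up to mat*sub + sub - 1, which is what the printed lists suggest. The loop stops once that range has been read, so nothing can be written past the end of a fixed-size array.

```java
import java.util.ArrayList;
import java.util.List;

class SubjectList {
    // Hypothetical stand-in for the question's ListObject2 node.
    static final class Node {
        final String data;
        Node next;

        Node(String data) {
            this.data = data;
        }
    }

    Node first;
    Node last;

    // Appends a value at the tail of the list.
    void add(String data) {
        Node node = new Node(data);
        if (first == null) {
            first = node;
        } else {
            last.next = node;
        }
        last = node;
    }

    // Collects the entries at positions mat*sub .. mat*sub + sub - 1.
    List<String> getcc(int mat, int sub) {
        List<String> found = new ArrayList<>();
        int start = mat * sub;
        int end = start + sub; // exclusive upper bound
        int count = 0;
        for (Node current = first; current != null && count < end; current = current.next) {
            if (count >= start) {
                found.add(current.data);
            }
            count++;
        }
        return found;
    }

    public static void main(String[] args) {
        SubjectList list = new SubjectList();
        for (String cc : new String[] {"ccs4", "ccs2", "ccs3", "cc1", "ccs5", "ccs4", "ccs3", "ccs2"}) {
            list.add(cc);
        }
        System.out.println(list.getcc(2, 2));
        System.out.println(list.getcc(3, 2));
    }
}
```

In the original method nothing keeps x below 2, so a third match writes to b[2]. The search for the last student only works because the list ends before a third match can happen.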

java: how to implement math parsing

I am trying to implement a simple math parser in Java. This is for a small school project working with matrices: the user enters a simple expression, such as A^-1(B+C), the program then asks for the matrices A, B and C and outputs the result of these operations.
What I have so far is a class called MathParser that creates objects of class Operation.
Operation has methods like setOperation (one of plus, times, inverse, power) and addInput(Matrix|Operation|int), and finally executeOperation(), which loops over all items added via addInput() and executes the operation chosen with setOperation. If it finds that an input item is itself an instance of Operation, it executes that first, which is a form of recursive call. It is done this way to handle operator precedence: multiplication comes before addition, etc.
However, I don't find this solution very good. Do you have any ideas how to implement such a task?
I found a blog post describing how to parse and execute expressions in a simple calculator, which then looks at expression trees.
It might be a little over the top but it should give some tips:
http://community.bartdesmet.net/blogs/bart/archive/2006/10/11/4513.aspx
Well, maybe this solution is not exactly what you need or want to implement, or maybe it's overkill, but I'd go with a scripting engine (for example Groovy). In that case this is how your code would look:
GroovyShell shell = new GroovyShell();
shell.setVariable("a",10);
shell.setVariable("b",20);
int result = ((Number) shell.evaluate("(a+b)/2")).intValue();
Moreover, you can also evaluate formulas of any complexity, or even use your own calculation functions. You just put it all into the shell and then evaluate the input string.
Added:
Operators do not work with matrices by default, but that is not hard to implement with Groovy, as it supports operator overloading (read more about it here: http://groovy.codehaus.org/Operator+Overloading)
So here is an example with matrices:
class Matrix {
    private int[][] data;
    public Matrix(int[][] data) {
        this.data = data;
    }
    public int[][] getData() {
        return data;
    }
    // Method that overloads the Groovy '+' operator
    public Matrix plus(Matrix b) {
        Matrix result = calculateMatrixSumSomehow(this, b);
        return result;
    }
}
Now your call will look like this:
shell.setVariable("A",new Matrix(...));
shell.setVariable("B",new Matrix(...));
Matrix result = (Matrix)shell.evaluate("A+B"); //+ operator will use 'plus' function
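The calculateMatrixSumSomehow placeholder above is left open by the answer. One way it could be filled in is an element-wise sum, sketched here in plain Java on raw int[][] arrays; the class name MatrixSum and the method name add are chosen only for this illustration.

```java
class MatrixSum {
    // Element-wise sum of two matrices with identical dimensions.
    static int[][] add(int[][] a, int[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        int[][] sum = new int[a.length][];
        for (int i = 0; i < a.length; i++) {
            if (a[i].length != b[i].length) {
                throw new IllegalArgumentException("column counts differ in row " + i);
            }
            sum[i] = new int[a[i].length];
            for (int j = 0; j < a[i].length; j++) {
                sum[i][j] = a[i][j] + b[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] sum = add(new int[][] {{1, 2}, {3, 4}}, new int[][] {{5, 6}, {7, 8}});
        System.out.println(java.util.Arrays.deepToString(sum));
    }
}
```

A plus method like the one in the Matrix class could delegate to such a helper and wrap the returned array in a new Matrix.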
The canonical method for parsing mathematical expressions is the shunting yard algorithm. It is a very simple and elegant algorithm, and implementing it will teach you a lot.
http://en.wikipedia.org/wiki/Shunting-yard_algorithm has a good description, complete with a worked example.
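As a rough illustration of the algorithm, here is a small Java sketch that converts an infix expression to postfix with an operator stack and then evaluates the postfix form. It is deliberately limited, and the limits are assumptions of this sketch rather than of the algorithm: only non-negative numbers, the left-associative operators + - * / and parentheses are handled. Supporting matrices, unary minus or ^ would need further work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ShuntingYard {
    // Higher value binds tighter; -1 means "not an operator" (this includes '(').
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 1;
            case '*': case '/': return 2;
            default: return -1;
        }
    }

    // Converts an infix expression into a list of postfix (RPN) tokens.
    static List<String> toPostfix(String expr) {
        List<String> output = new ArrayList<>();
        Deque<Character> ops = new ArrayDeque<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c) || c == '.') {
                int start = i;
                while (i < expr.length()
                        && (Character.isDigit(expr.charAt(i)) || expr.charAt(i) == '.')) {
                    i++;
                }
                output.add(expr.substring(start, i)); // numbers go straight to the output
            } else if (c == '(') {
                ops.push(c);
                i++;
            } else if (c == ')') {
                // Pop operators until the matching '(' is found.
                while (!ops.isEmpty() && ops.peek() != '(') {
                    output.add(String.valueOf(ops.pop()));
                }
                if (ops.isEmpty()) {
                    throw new IllegalArgumentException("mismatched parentheses");
                }
                ops.pop(); // discard the '('
                i++;
            } else if (precedence(c) > 0) {
                // Left-associative: pop operators of equal or higher precedence first.
                while (!ops.isEmpty() && precedence(ops.peek()) >= precedence(c)) {
                    output.add(String.valueOf(ops.pop()));
                }
                ops.push(c);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') {
                throw new IllegalArgumentException("mismatched parentheses");
            }
            output.add(String.valueOf(op));
        }
        return output;
    }

    // Evaluates the postfix tokens with a value stack.
    static double evaluate(String expr) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : toPostfix(expr)) {
            char first = token.charAt(0);
            if (Character.isDigit(first) || first == '.') {
                stack.push(Double.parseDouble(token));
            } else {
                double right = stack.pop();
                double left = stack.pop();
                switch (first) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    default:  stack.push(left / right); break;
                }
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("2+3-(10+2)"));
        System.out.println(evaluate("2+3-(10+2)"));
    }
}
```

For the matrix project, the value stack could hold matrices instead of doubles, with each operator dispatching to the corresponding matrix operation.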
Have you considered using embedded scripting?
I released an expression evaluator based on Dijkstra's shunting-yard algorithm, under the terms of the Apache License 2.0:
http://projects.congrace.de/exp4j/index.html
Have a look at http://bracer.sourceforge.net. It's my implementation of the shunting-yard algorithm.
You can consider using a library built specifically for math expression parsing, such as mXparser. You will get a lot of very helpful options:
1 - Checking expression syntax
import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("2+3-");
e.checkSyntax();
mXparser.consolePrintln(e.getErrorMessage());
Result:
[mXparser-v.4.0.0] [2+3-] checking ...
[2+3-] lexical error
Encountered "<EOF>" at line 1, column 4.
Was expecting one of:
"(" ...
"+" ...
"-" ...
<UNIT> ...
"~" ...
"#~" ...
<NUMBER_CONSTANT> ...
<IDENTIFIER> ...
<FUNCTION> ...
"[" ...
[2+3-] errors were found.
[mXparser-v.4.0.0]
2 - Evaluating expression
import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("2+3-(10+2)");
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());
Result:
[mXparser-v.4.0.0] 2+3-(10+2) = -7.0
3 - Using built-in functions, constants, operators, etc.
import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("sin(pi)+e");
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());
Result:
[mXparser-v.4.0.0] sin(pi)+e = 2.718281828459045
4 - Defining your own functions, arguments and constants
import org.mariuszgromada.math.mxparser.*;
...
...
Argument z = new Argument("z = 10");
Constant a = new Constant("b = 2");
Function p = new Function("p(a,h) = a*h/2");
Expression e = new Expression("p(10, 2)-z*b/2", p, z, a);
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());
Result:
[mXparser-v.4.0.0] p(10, 2)-z*b/2 = 0.0
5 - Tokenizing expression string and playing with expression tokens
import org.mariuszgromada.math.mxparser.*;
...
...
Argument x = new Argument("x");
Argument y = new Argument("y");
Expression e = new Expression("2*sin(x)+(3/cos(y)-e^(sin(x)+y))+10", x, y);
mXparser.consolePrintTokens( e.getCopyOfInitialTokens() );
Result:
[mXparser-v.4.0.0] --------------------
[mXparser-v.4.0.0] | Expression tokens: |
[mXparser-v.4.0.0] ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] | TokenIdx | Token | KeyW | TokenId | TokenTypeId | TokenLevel | TokenValue | LooksLike |
[mXparser-v.4.0.0] ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] | 0 | 2 | _num_ | 1 | 0 | 0 | 2.0 | |
[mXparser-v.4.0.0] | 1 | * | * | 3 | 1 | 0 | NaN | |
[mXparser-v.4.0.0] | 2 | sin | sin | 1 | 4 | 1 | NaN | |
[mXparser-v.4.0.0] | 3 | ( | ( | 1 | 20 | 2 | NaN | |
[mXparser-v.4.0.0] | 4 | x | x | 0 | 101 | 2 | NaN | |
[mXparser-v.4.0.0] | 5 | ) | ) | 2 | 20 | 2 | NaN | |
[mXparser-v.4.0.0] | 6 | + | + | 1 | 1 | 0 | NaN | |
[mXparser-v.4.0.0] | 7 | ( | ( | 1 | 20 | 1 | NaN | |
[mXparser-v.4.0.0] | 8 | 3 | _num_ | 1 | 0 | 1 | 3.0 | |
[mXparser-v.4.0.0] | 9 | / | / | 4 | 1 | 1 | NaN | |
[mXparser-v.4.0.0] | 10 | cos | cos | 2 | 4 | 2 | NaN | |
[mXparser-v.4.0.0] | 11 | ( | ( | 1 | 20 | 3 | NaN | |
[mXparser-v.4.0.0] | 12 | y | y | 1 | 101 | 3 | NaN | |
[mXparser-v.4.0.0] | 13 | ) | ) | 2 | 20 | 3 | NaN | |
[mXparser-v.4.0.0] | 14 | - | - | 2 | 1 | 1 | NaN | |
[mXparser-v.4.0.0] | 15 | e | e | 2 | 9 | 1 | NaN | |
[mXparser-v.4.0.0] | 16 | ^ | ^ | 5 | 1 | 1 | NaN | |
[mXparser-v.4.0.0] | 17 | ( | ( | 1 | 20 | 2 | NaN | |
[mXparser-v.4.0.0] | 18 | sin | sin | 1 | 4 | 3 | NaN | |
[mXparser-v.4.0.0] | 19 | ( | ( | 1 | 20 | 4 | NaN | |
[mXparser-v.4.0.0] | 20 | x | x | 0 | 101 | 4 | NaN | |
[mXparser-v.4.0.0] | 21 | ) | ) | 2 | 20 | 4 | NaN | |
[mXparser-v.4.0.0] | 22 | + | + | 1 | 1 | 2 | NaN | |
[mXparser-v.4.0.0] | 23 | y | y | 1 | 101 | 2 | NaN | |
[mXparser-v.4.0.0] | 24 | ) | ) | 2 | 20 | 2 | NaN | |
[mXparser-v.4.0.0] | 25 | ) | ) | 2 | 20 | 1 | NaN | |
[mXparser-v.4.0.0] | 26 | + | + | 1 | 1 | 0 | NaN | |
[mXparser-v.4.0.0] | 27 | 10 | _num_ | 1 | 0 | 0 | 10.0 | |
[mXparser-v.4.0.0] ---------------------------------------------------------------------------------------------------------------
6 - What's equally important: you will find much more in the mXparser tutorial, mXparser math collection and mXparser API definition.
7 - mXparser supports:
JAVA
.NET/MONO
.NET Core
.NET Standard
.NET PCL
Xamarin.Android
Xamarin.iOS
Best regards
