Transactions_Table:
+---------+--------+-------------+--------------+-----+
| DocType | SFCode | Productname | WarrantyCode | QTY |
+---------+--------+-------------+--------------+-----+
| FP      | 12     | Item        | 1111-01      | 100 | -100
| FP      | 12     | Item        | 2222-22      | 200 |
| FP      | 12     | Item        | 3333-33      | 350 | -350
| LP      | 12     | Item        | 4444-44      |  10 |
| LP      | 12     | Item        | 5555-55      |  20 |
| LP      | 12     | Item        | 6666-66      |  35 | -35
| CAS     | 12     | Item        | 1111-01      |  50 | -(50 Left, show)
| CRS     | 12     | Item        | 3333-33      | 120 | -(230 Left, show)
| CRS     | 12     | Item        | 6666-66      |  35 | -(0 Left, no show)
| FPR     | 12     | Item        | 1111-01      |  10 | -(40 Left, show)
| LPR     | 12     | Item        | 5555-55      |  20 | -(0 Left, no show)
| CSR     | 12     | Item        | 1111-01      |   5 | -(50+5 Left, show)
| CRR     | 12     | Item        | 6666-66      |   5 | -(Got back 5, show)
+---------+--------+-------------+--------------+-----+
KEY:
FP: Foreign Purchase
LP: Local Purchase
CAS: Cash Sale
CRS: Credit Sale
FPR: Foreign Purchase Return
LPR: Local Purchase Return
CSR: Cash Sale Return
CRR: Credit Sale Return
There are many products, but for now I am focusing on a single SFCode, "12".
QTY is the physical stock PRESENT in the store, and DocType identifies the transaction type.
There are two things I need to do with this table:
1. Get the current stock, which is (FP + LP + CSR + CRR) - (FPR + LPR + CAS + CRS). Note: there may be no transactions of a particular DocType.
2. Get the warranty code(s) of a product that has not been sold out for a particular warranty code. Go from top to bottom through the table's last (unnamed) column and you will get the idea.
Please suggest Java/MySQL statement(s) that will help me achieve this result. Any help is appreciated.
Try something like this for #1 (with all eight DocType columns filled out):
SELECT SFCode, SUM(FP + LP + CSR + CRR - FPR - LPR - CAS - CRS) AS Total FROM
(SELECT SFCode,
        SUM(IF(DocType = 'FP',  QTY, 0)) AS FP,  SUM(IF(DocType = 'LP',  QTY, 0)) AS LP,
        SUM(IF(DocType = 'CSR', QTY, 0)) AS CSR, SUM(IF(DocType = 'CRR', QTY, 0)) AS CRR,
        SUM(IF(DocType = 'FPR', QTY, 0)) AS FPR, SUM(IF(DocType = 'LPR', QTY, 0)) AS LPR,
        SUM(IF(DocType = 'CAS', QTY, 0)) AS CAS, SUM(IF(DocType = 'CRS', QTY, 0)) AS CRS
 FROM Transactions_Table
 WHERE SFCode = '12'
 GROUP BY SFCode, DocType) AS t
GROUP BY SFCode;
This is my shot at #2 (the quoted '12' assumes SFCode isn't an integer column):
SELECT a.SFCode, a.WarrantyCode, (a.QTY - COALESCE(b.QTY, 0)) AS Stock FROM
(SELECT SFCode, WarrantyCode, SUM(QTY) AS QTY
 FROM Transactions_Table
 WHERE SFCode = '12'
   AND DocType IN ('FP','LP','CSR','CRR')
 GROUP BY SFCode, WarrantyCode) AS a
LEFT JOIN
(SELECT SFCode, WarrantyCode, SUM(QTY) AS QTY
 FROM Transactions_Table
 WHERE SFCode = '12'
   AND DocType IN ('FPR','LPR','CAS','CRS')
 GROUP BY SFCode, WarrantyCode) AS b
ON a.SFCode = b.SFCode AND a.WarrantyCode = b.WarrantyCode;
Can't really test this myself right now, but this should at least give you an idea. If you only want the warranty codes that are not sold out, add HAVING Stock > 0 at the end.
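And since you asked for Java + MySQL: here is a rough JDBC sketch of running query #1 from Java and reading the total. It is only a sketch; the JDBC URL, user and password are placeholders, and the embedded SQL is the query above with the SFCode bound as a parameter.
import java.sql.*;

public class CurrentStock {
    public static void main(String[] args) throws SQLException {
        // Query #1 from above, with SFCode as a bind parameter.
        String sql =
            "SELECT SFCode, SUM(FP + LP + CSR + CRR - FPR - LPR - CAS - CRS) AS Total FROM "
          + "(SELECT SFCode, "
          + "        SUM(IF(DocType = 'FP',  QTY, 0)) AS FP,  SUM(IF(DocType = 'LP',  QTY, 0)) AS LP, "
          + "        SUM(IF(DocType = 'CSR', QTY, 0)) AS CSR, SUM(IF(DocType = 'CRR', QTY, 0)) AS CRR, "
          + "        SUM(IF(DocType = 'FPR', QTY, 0)) AS FPR, SUM(IF(DocType = 'LPR', QTY, 0)) AS LPR, "
          + "        SUM(IF(DocType = 'CAS', QTY, 0)) AS CAS, SUM(IF(DocType = 'CRS', QTY, 0)) AS CRS "
          + " FROM Transactions_Table WHERE SFCode = ? GROUP BY SFCode, DocType) AS t "
          + "GROUP BY SFCode";
        // Placeholder connection details -- replace with your own.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/yourdb", "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "12");                       // the SFCode of interest
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("SFCode") + " -> " + rs.getLong("Total"));
                }
            }
        }
    }
}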
I have two datasets as follows:
smoothieDs.show()
|smoothie_id | smoothie | price |
|1 | Tropical | 10 |
|2 | Green veggie | 20 |
and:
ingredientDs.show()
|smoothie | ingredient |
|Tropical | Mango |
|Tropical | Passion fruit |
|Green veggie | Cucumber |
|Green veggie | Kiwi |
I want to join the two datasets so that I can get ingredient information for each smoothie whose price is lower than $15, but keep the other rows even if the price is higher, filling in the ingredient field with the string "To be communicated".
I tried smoothieDs.join(ingredientDs).filter(col("price").lt(15)) and it gives:
|smoothie_id |price | smoothie | ingredient |
|1 |10 | Tropical | Mango |
|1 |10 | Tropical | Passion fruit |
But my expected result should be:
|smoothie_id |price | smoothie | ingredient |
|1 |10 | Tropical | Mango |
|1 |10 | Tropical | Passion fruit |
|2 |20 | Green veggie | To be communicated |
Is it possible to achieve this using a join directly? If not, what is the best way to achieve it?
You can replace the ingredient based on the price after the join:
import org.apache.spark.sql.functions._
smoothieDs.join(ingredientDs, "smoothie")
.withColumn("ingredient", when('price.lt(15), 'ingredient).otherwise("To be communicated"))
.distinct()
.show()
Output:
+------------+-----------+-----+------------------+
| smoothie|smoothie_id|price| ingredient|
+------------+-----------+-----+------------------+
|Green veggie| 2| 20|To be communicated|
| Tropical| 1| 10| Mango|
| Tropical| 1| 10| Passion fruit|
+------------+-----------+-----+------------------+
Edit: another option would be to filter the ingredient dataset first and then do the join. This avoids the distinct but comes at the price of a second join; depending on the data, it may or may not be faster.
smoothieDs.join(
ingredientDs.join(smoothieDs.filter('price.lt(15)), Seq("smoothie"), "left_semi"),
Seq("smoothie"), "left_outer")
.na.fill("To be communicated", Seq("ingredient"))
.show()
My ObservableList looks like this: [AR | Argentina | 2 |
AU | Australia | 3 |
BE | Belgium | 1 |
BR | Brazil | 2 |
CA | Canada | 2 |
CH | Switzerland | 1 |
CN | China | 3 |
DE | Germany | 1 |
DK | Denmark | 1 |
EG | Egypt | 4 |
FR | France | 1 |
IL | Israel | 4 |
IN | India | 3 |
IT | Italy | 1 |
JP | Japan | 3 |
KW | Kuwait | 4 |
ML | Malaysia | 3 |
MX | Mexico | 2 |
NG | Nigeria | 4 |
NL | Netherlands | 1 |
SG | Singapore | 3 |
UK | United Kingdom | 1 |
US | United States of America | 2 |
ZM | Zambia | 4 |
ZW | Zimbabwe | 4 |
]
I would like to extract each of these values and insert them into table columns so it would look like this:
https://imgur.com/CXFW68K
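One possible approach is sketched below, assuming the observable list holds raw strings such as "AR | Argentina | 2 |" and the target is a JavaFX TableView. The class and column names here (CountryRow, "Code", "Country", "Group") are made up for illustration; the third field may mean something else in your data.
import javafx.beans.property.SimpleStringProperty;
import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.scene.control.TableColumn;
import javafx.scene.control.TableView;

public class CountryTableSketch {

    // Hypothetical row model with one field per column.
    public static class CountryRow {
        private final String code, name, group;
        public CountryRow(String code, String name, String group) {
            this.code = code; this.name = name; this.group = group;
        }
        public String getCode()  { return code; }
        public String getName()  { return name; }
        public String getGroup() { return group; }
    }

    // Split each "AR | Argentina | 2 |" entry on '|' and trim the pieces.
    public static ObservableList<CountryRow> parse(ObservableList<String> raw) {
        ObservableList<CountryRow> rows = FXCollections.observableArrayList();
        for (String line : raw) {
            String[] parts = line.split("\\|");
            if (parts.length >= 3) {
                rows.add(new CountryRow(parts[0].trim(), parts[1].trim(), parts[2].trim()));
            }
        }
        return rows;
    }

    // Wire the parsed rows into a TableView with one column per field.
    public static TableView<CountryRow> buildTable(ObservableList<String> raw) {
        TableView<CountryRow> table = new TableView<>(parse(raw));
        TableColumn<CountryRow, String> codeCol = new TableColumn<>("Code");
        codeCol.setCellValueFactory(d -> new SimpleStringProperty(d.getValue().getCode()));
        TableColumn<CountryRow, String> nameCol = new TableColumn<>("Country");
        nameCol.setCellValueFactory(d -> new SimpleStringProperty(d.getValue().getName()));
        TableColumn<CountryRow, String> groupCol = new TableColumn<>("Group");
        groupCol.setCellValueFactory(d -> new SimpleStringProperty(d.getValue().getGroup()));
        table.getColumns().add(codeCol);
        table.getColumns().add(nameCol);
        table.getColumns().add(groupCol);
        return table;
    }
}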
This is what my dataset looks like:
+---------+------------+-----------------+
| name |request_type| request_group_id|
+---------+------------+-----------------+
|Michael | X | 1020 |
|Michael | X | 1018 |
|Joe | Y | 1018 |
|Sam | X | 1018 |
|Michael | Y | 1021 |
|Sam | X | 1030 |
|Elizabeth| Y | 1035 |
+---------+------------+-----------------+
I want to count the number of each request_type per person and also count the distinct request_group_ids.
The result should be the following:
+---------+--------------------+---------------------+--------------------------------+
| name |cnt(request_type(X))| cnt(request_type(Y))| cnt(distinct(request_group_id))|
+---------+--------------------+---------------------+--------------------------------+
|Michael | 2 | 1 | 3 |
|Joe | 0 | 1 | 1 |
|Sam | 2 | 0 | 2 |
|John | 1 | 0 | 1 |
|Elizabeth| 0 | 1 | 1 |
+---------+--------------------+---------------------+--------------------------------+
What I've done so far (this derives the first two columns):
msgDataFrame.select(NAME, REQUEST_TYPE)
.groupBy(NAME)
.pivot(REQUEST_TYPE, Lists.newArrayList(X, Y))
.agg(functions.count(REQUEST_TYPE))
.show();
How can I count distinct request_group_ids within this select? Is it possible at all?
I think it's only possible by joining two datasets (my current result plus a separate aggregation of distinct request_group_ids).
Example with countDistinct (countDistinct does not work over a window, so it is replaced with size + collect_set):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_set, count, size}
import spark.implicits._   // for the $"..." column syntax

val groupIdWindow = Window.partitionBy("name")
df.select($"name", $"request_type",
size(collect_set("request_group_id").over(groupIdWindow)).alias("countDistinct"))
.groupBy("name", "countDistinct")
.pivot($"request_type", Seq("X", "Y"))
.agg(count("request_type"))
.show(false)
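Since the snippet in the question is written against the Java API, roughly the same thing in Java would look like the sketch below. It reuses the question's own constants (NAME, REQUEST_TYPE, X, Y) and Guava's Lists; REQUEST_GROUP_ID is assumed to hold the request_group_id column name.
// Rough, untested Java equivalent of the Scala snippet above.
WindowSpec groupIdWindow = Window.partitionBy(NAME);   // org.apache.spark.sql.expressions.Window
msgDataFrame
    .select(functions.col(NAME), functions.col(REQUEST_TYPE),
            functions.size(functions.collect_set(REQUEST_GROUP_ID).over(groupIdWindow))
                     .alias("countDistinct"))
    .groupBy(NAME, "countDistinct")
    .pivot(REQUEST_TYPE, Lists.newArrayList(X, Y))
    .agg(functions.count(REQUEST_TYPE))
    .show();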
I'm building a series of distribution analyses using the Java Spark library. This is the code I'm currently using to fetch the data from a JSON file and save the output.
Dataset<Row> dataset = spark.read().json("local/foods.json");
dataset.createOrReplaceTempView("cs_food");
List<GenericAnalyticsEntry> menu_distribution= spark
.sql(" ****REQUESTED QUERY ****")
.toJavaRDD()
.map(row -> Triple.of( row.getString(0), BigDecimal.valueOf(row.getLong(1)), BigDecimal.valueOf(row.getLong(2))))
.map(GenericAnalyticsEntry::of)
.collect();
writeObjectAsJsonToHDFS(fs, "/local/output/menu_distribution_new.json", menu_distribution);
The query I'm looking for is based on this structure:
+------------+-------------+------------+------------+
| FIRST_FOOD | SECOND_FOOD | DATE | IS_SPECIAL |
+------------+-------------+------------+------------+
| Pizza | Spaghetti | 11/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Lasagna | Pizza | 12/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Spaghetti | Spaghetti | 13/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Pizza | Spaghetti | 14/02/2017 | TRUE |
+------------+-------------+------------+------------+
| Spaghetti | Lasagna | 15/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Pork | Mozzarella | 16/02/2017 | FALSE |
+------------+-------------+------------+------------+
| Lasagna | Mozzarella | 17/02/2017 | FALSE |
+------------+-------------+------------+------------+
How can I achieve the output below from the code written above?
+------------+--------------------+----------------------+
| FOODS | occurrences(First) | occurrences (Second) |
+------------+--------------------+----------------------+
| Pizza | 2 | 1 |
+------------+--------------------+----------------------+
| Lasagna | 2 | 1 |
+------------+--------------------+----------------------+
| Spaghetti | 2 | 3 |
+------------+--------------------+----------------------+
| Mozzarella | 0 | 2 |
+------------+--------------------+----------------------+
| Pork | 1 | 0 |
+------------+--------------------+----------------------+
I've of course tried to figure out a solution by myself, but had no luck with my attempts. I may be wrong, but I think I need something like this:
"SELECT (first_food + second_food) as menu, COUNT(first_food), COUNT(second_food) from cs_food GROUP BY menu"
From the example data, this looks like it will produce the output you want:
select
  f.food as foods,
  coalesce(first_count, 0) as first_count,
  coalesce(second_count, 0) as second_count
from
  (select first_food as food from menus
   union select second_food from menus) as f
left join (
  select first_food, count(*) as first_count from menus
  group by first_food
) as ff on ff.first_food = f.food
left join (
  select second_food, count(*) as second_count from menus
  group by second_food
) as sf on sf.second_food = f.food
;
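If it helps, that query can go straight into the spark.sql(...) call from the question, with the table name switched from menus to the registered cs_food temp view. The wiring below is only a sketch under that assumption; the coalesce keeps the mapping to row.getLong from hitting nulls, and the rest of the question's code is reused unchanged.
Dataset<Row> dataset = spark.read().json("local/foods.json");
dataset.createOrReplaceTempView("cs_food");

List<GenericAnalyticsEntry> menu_distribution = spark
    .sql("SELECT f.food AS foods, "
       + "       COALESCE(first_count, 0) AS first_count, "
       + "       COALESCE(second_count, 0) AS second_count "
       + "FROM (SELECT first_food AS food FROM cs_food "
       + "      UNION SELECT second_food FROM cs_food) AS f "
       + "LEFT JOIN (SELECT first_food, COUNT(*) AS first_count "
       + "           FROM cs_food GROUP BY first_food) AS ff ON ff.first_food = f.food "
       + "LEFT JOIN (SELECT second_food, COUNT(*) AS second_count "
       + "           FROM cs_food GROUP BY second_food) AS sf ON sf.second_food = f.food")
    .toJavaRDD()
    .map(row -> Triple.of(row.getString(0),
                          BigDecimal.valueOf(row.getLong(1)),
                          BigDecimal.valueOf(row.getLong(2))))
    .map(GenericAnalyticsEntry::of)
    .collect();
writeObjectAsJsonToHDFS(fs, "/local/output/menu_distribution_new.json", menu_distribution);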
A simple combination of flatMap and groupBy should also do the job, something like this (sorry, can't check if it's 100% correct right now):
import org.apache.spark.sql.{Row, functions => F}
import spark.sqlContext.implicits._
val df = Seq(("Pizza", "Pasta"), ("Pizza", "Soup")).toDF("first", "second")
df.flatMap { case Row(first: String, second: String) => Seq((first, 1, 0), (second, 0, 1)) }
  .groupBy("_1")
  .agg(F.sum("_2").as("occurrences_first"), F.sum("_3").as("occurrences_second"))
  .show()
Can somebody help me solve this type of error?
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
I am searching for data in a linked list, but when I want to insert the data into an array, it turns out like this:
matric | nama | sem | cc | ch | fm
32255 | izzat | 1 | ccs2 | 3 | 45.0
| | 2 | ccs3 | 3 | 56.0
32345 | khai] | 3 | ccs4 | 3 | 45.0
| | 2 | ccs5 | 3 | 2.0
32246 | fifi | 1 | cc1 | 3 | 60.0
| | 1 | ccs3 | 4 | 34.0
34567 | dudu | 2 | ccs2 | 2 | 24.0
| | 2 | ccs4 | 6 | 79.0
first-->34567-->32246-->32345-->32255-->null
first-->6-->2-->4-->3-->3-->3-->3-->3-->null
first-->2-->2-->1-->1-->2-->3-->2-->1-->null
first-->dudu-->fifi-->khai]-->izzat-->null
first-->ccs4-->ccs2-->ccs3-->cc1-->ccs5-->ccs4-->ccs3-->ccs2-->null
first-->79.0-->24.0-->34.0-->60.0-->2.0-->45.0-->56.0-->45.0-->null
42insert matric= 032345
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
2
khai]
2
3
at inputoutput.LinkedList.getcc(LinkedList.java:141)
at inputoutput.baca.getcc(baca.java:84)
at inputoutput.Inputoutput.main(Inputoutput.java:75)
Java Result: 1
BUILD SUCCESSFUL (total time: 7 seconds)
the code:
String[] getcc(int mat,int sub) {
ListObject2 current = first2;
int count=0;
String b[]=new String[2] ;//2 is the subject number==sub
int x=0;
while (current!=null ) {
if(count==((mat*sub)+x) && ((mat*sub)+0)<((mat*sub)+x)<<((mat*sub)+sub)){
b[x]=current.data2;
x++;
}
current=current.next;
count++;
}
return b;
}
But I do get the data if I search for the last entry in the linked list, which is 032255.
This is the output (the same table and list dump as above, followed by the search result):
42insert matric= 032255
3
izzat
2
1
ccs3   // the data I want to search
ccs2   //
You're going into the if statement more than twice while walking the list. If you do that, you'll go past the bounds of the b array (which can only hold two values). You should use an ArrayList instead so you can add as many items as you need.
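For example, here is a minimal sketch of getcc collecting into an ArrayList instead (note the return type changes from String[] to List<String>). It assumes ListObject2 and first2 are your existing class/field, and that the intent of the original condition was to keep the entries whose index falls in the range [mat*sub, mat*sub + sub).
import java.util.ArrayList;
import java.util.List;

List<String> getcc(int mat, int sub) {
    List<String> matches = new ArrayList<>();   // grows as needed, unlike new String[2]
    ListObject2 current = first2;
    int count = 0;
    while (current != null) {
        // keep only the entries that belong to the requested matric's block
        if (count >= mat * sub && count < (mat * sub) + sub) {
            matches.add(current.data2);          // add() never runs out of slots
        }
        current = current.next;
        count++;
    }
    return matches;
}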