Between function in spark using java - java

I have two DataFrames:
Dataframe 1
+-----------------+-----------------+
| hour_Entre | hour_Sortie |
+-----------------+-----------------+
| 18:30:00 | 05:00:00 |
| | |
+-----------------+-----------------+
Dataframe 2
+-----------------+
| hour_Tracking |
+-----------------+
| 19:30:00 |
+-----------------+
I want to keep the hour_Tracking values that fall between hour_Entre and hour_Sortie.
I tried the following code:
boolean checked = true;
try{
if(df1.select(col("heureSortie")) != null && df1.select(col("heureEntre")) !=null){
checked = checked && df2.select(col("dateTracking_hour_minute").between(df1.select(col("heureSortie")),df1.select(col("heureEntre"))));
}
} catch (Exception e) {
e.printStackTrace();
}
But I get this error :
Operator && cannot be applied to boolean , 'org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>'

If you are looking for the difference in hours:
First, compute the date difference:
from pyspark.sql import functions as F
df = df.withColumn('date_diff', F.datediff(F.to_date(df.hour_Entre), F.to_date(df.hour_Sortie)))
Then calculate the hour difference from it:
df = df.withColumn('hours_diff', (df.date_diff*24) +
F.hour(df.hour_Entre) - F.hour(df.hour_Sortie))
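The compile error in the question comes from `df2.select(...)` returning a `Dataset<Row>`, which cannot be combined with a `boolean` using `&&`. Separately, the sample window (18:30:00 to 05:00:00) crosses midnight, so a plain "lower <= x <= upper" test would never match 19:30:00. The following is a minimal sketch of the comparison itself in plain Java, not Spark code; the class `TimeWindow` and the method `isWithin` are hypothetical names introduced for illustration.

```java
import java.time.LocalTime;

class TimeWindow {
    // Returns true when t lies in [start, end]. A start later than end is
    // treated as a window that crosses midnight (assumption for this sketch).
    static boolean isWithin(LocalTime t, LocalTime start, LocalTime end) {
        if (!start.isAfter(end)) {
            // Same-day window, e.g. 08:00 to 17:00
            return !t.isBefore(start) && !t.isAfter(end);
        }
        // Overnight window, e.g. 18:30 to 05:00: at or after start, OR at or before end
        return !t.isBefore(start) || !t.isAfter(end);
    }

    public static void main(String[] args) {
        LocalTime entre = LocalTime.parse("18:30:00");
        LocalTime sortie = LocalTime.parse("05:00:00");
        System.out.println(isWithin(LocalTime.parse("19:30:00"), entre, sortie)); // true
        System.out.println(isWithin(LocalTime.parse("12:00:00"), entre, sortie)); // false
    }
}
```

In Spark, the same condition would have to be expressed as a Column expression over joined rows rather than as a Java boolean.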

Related

Add value to map if the key has generated dates

I have a class as a helper to an entity
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class ChildReports {
private LocalDate date;
private BigDecimal amount;
}
I have entries in the database, for example:
| date | amount |
+-----------------+---------------------+
| 2022-06-20 | 10000 |
| 2023-01-15 | 8000 |
| 2023-07-05 | 6500 |
| 2024-02-11 | 5000 |
| 2024-08-18 | 1000 |
Now I want to fill in the gaps between the dates, so that the previous amount is carried forward into the months that have no data. In the end it should look something like this:
| date | amount |
+-----------------+---------------------+
| 2022-06-20 | 10000 |
| 2022-07-20 | 10000 |
| 2022-08-20 | 10000 |
| 2022-09-20 | 10000 |
.............
| 2022-12-20 | 10000 |
| 2023-01-15 | 8000 |
| 2023-02-15 | 8000 |
| 2023-03-15 | 8000 |
and so on
In the service, I started writing a method in which I took the entire range of dates, starting from dateStart and ending with dateEnd.
LocalDate dateStart = Objects.requireNonNull(childReports.stream().findFirst().orElse(null)).getDate();
LocalDate dateEnd = Objects.requireNonNull(childReports.stream().reduce((first, second) -> second).orElse(null).getDate());
long monthsBetween = ChronoUnit.MONTHS.between(dateStart, dateEnd);
List<LocalDate> totalMonths = LongStream.iterate(0,i -> i+1)
.limit(monthsBetween).mapToObj(dateStart::plusMonths)
.collect(Collectors.toList());
Map<List<LocalDate>, BigDecimal> map = new HashMap<>();
for (ChildReports childReport : childReports) {
BigDecimal amount = childReport.getAmount();
map.put(totalMonths, amount);
}
System.out.println(map);
I get this interval correctly, but now I want to attach the amount to each date, so that the final result looks like the table above.
I can't get this result.
Make sure to adjust the start date to the same day of the month as the end date before finding the months between them:
ChronoUnit.MONTHS.between(dateStart.withDayOfMonth(dateEnd.getDayOfMonth()), dateEnd)
The rest of the things should be pretty straightforward using nested loops. Given below is a sample report:
class ReportRow {
private LocalDate date;
private BigDecimal amount;
// Parametrised constructor and getters
@Override
public String toString() {
return date + " | " + amount;
}
}
public class Solution {
public static void main(String[] args) {
List<ReportRow> originalReport = List.of(
new ReportRow(LocalDate.of(2022, 6, 20), BigDecimal.valueOf(10000)),
new ReportRow(LocalDate.of(2023, 1, 15), BigDecimal.valueOf(8000)),
new ReportRow(LocalDate.of(2023, 7, 5), BigDecimal.valueOf(6500)));
System.out.println("Before:");
originalReport.forEach(System.out::println);
List<ReportRow> updatedReport = new ArrayList<>();
int size = originalReport.size();
if (size > 0)
updatedReport.add(originalReport.get(0));
if (size > 1) {
for (int i = 1; i < size; i++) {
ReportRow lastRow = originalReport.get(i - 1);
ReportRow currentRow = originalReport.get(i);
BigDecimal lastAmount = lastRow.getAmount();
LocalDate dateStart = lastRow.getDate();
LocalDate dateEnd = currentRow.getDate();
if (ChronoUnit.MONTHS.between(dateStart.withDayOfMonth(dateEnd.getDayOfMonth()), dateEnd) > 1) {
for (LocalDate date = dateStart.plusMonths(1); date.isBefore(dateEnd); date = date.plusMonths(1))
updatedReport.add(new ReportRow(date, lastAmount));
}
updatedReport.add(currentRow);
}
}
System.out.println("After:");
updatedReport.forEach(System.out::println);
}
}
Output:
Before:
2022-06-20 | 10000
2023-01-15 | 8000
2023-07-05 | 6500
After:
2022-06-20 | 10000
2022-07-20 | 10000
2022-08-20 | 10000
2022-09-20 | 10000
2022-10-20 | 10000
2022-11-20 | 10000
2022-12-20 | 10000
2023-01-15 | 8000
2023-02-15 | 8000
2023-03-15 | 8000
2023-04-15 | 8000
2023-05-15 | 8000
2023-06-15 | 8000
2023-07-05 | 6500
Note: if you plan to use a Map and intend to maintain the order, you should use LinkedHashMap instead of HashMap.
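As a rough illustration of that note, the gap filling can also be written against a date-keyed `LinkedHashMap`, which keeps the rows in insertion order. This is only a sketch: `MonthlyFill` and `fill` are hypothetical names, and the input map is assumed to iterate in ascending date order.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

class MonthlyFill {
    // Assumes `known` iterates in ascending date order
    // (for example a LinkedHashMap filled from a sorted query).
    static Map<LocalDate, BigDecimal> fill(Map<LocalDate, BigDecimal> known) {
        Map<LocalDate, BigDecimal> filled = new LinkedHashMap<>();
        LocalDate prevDate = null;
        BigDecimal prevAmount = null;
        for (Map.Entry<LocalDate, BigDecimal> e : known.entrySet()) {
            if (prevDate != null) {
                // Repeat the previous amount once per month until the next known date.
                // Offsets are taken from prevDate so the day of month does not drift.
                for (int i = 1; prevDate.plusMonths(i).isBefore(e.getKey()); i++) {
                    filled.put(prevDate.plusMonths(i), prevAmount);
                }
            }
            filled.put(e.getKey(), e.getValue());
            prevDate = e.getKey();
            prevAmount = e.getValue();
        }
        return filled;
    }

    public static void main(String[] args) {
        Map<LocalDate, BigDecimal> known = new LinkedHashMap<>();
        known.put(LocalDate.of(2022, 6, 20), BigDecimal.valueOf(10000));
        known.put(LocalDate.of(2023, 1, 15), BigDecimal.valueOf(8000));
        known.put(LocalDate.of(2023, 7, 5), BigDecimal.valueOf(6500));
        fill(known).forEach((date, amount) -> System.out.println(date + " | " + amount));
    }
}
```

With a plain `HashMap` the same entries would be stored, but the iteration order of the printed rows would not be guaranteed.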

Spark Dataset - How to create a new column by modifying an existing column value

I have a Dataset like below
Dataset<Row> dataset = ...
dataset.show()
| NAME | DOB |
+------+----------+
| John | 19801012 |
| Mark | 19760502 |
| Mick | 19911208 |
I want to convert it to below (formatted DOB)
| NAME | DOB |
+------+------------+
| John | 1980-10-12 |
| Mark | 1976-05-02 |
| Mick | 1991-12-08 |
How can I do this? Basically, I am trying to figure out how to manipulate existing column string values in a generic way.
I tried using dataset.withColumn but couldn't quite figure out how to achieve this.
Appreciate any help.
With "substring" and "concat" functions:
df.withColumn("DOB_FORMATED",
concat(substring($"DOB", 0, 4), lit("-"), substring($"DOB", 5, 2), lit("-"), substring($"DOB", 7, 2)))
Load the data into a DataFrame (deltaData) and use the following line:
deltaData.withColumn("DOB", date_format(to_date($"DOB", "yyyyMMdd"), "yyyy-MM-dd")).show()
Assuming DOB is a String, you could write a UDF:
def formatDate(s: String): String = {
// date formatting code
}
val formatDateUdf = udf(formatDate(_: String))
ds.select($"NAME", formatDateUdf($"DOB").as("DOB"))
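For reference, the string conversion that such a UDF would wrap might look like the following in plain Java. This is a sketch only: the class name `DobFormat` is hypothetical, the input is assumed to always be a valid `yyyyMMdd` string, and registering the function with Spark is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DobFormat {
    // Input pattern taken from the sample data, e.g. "19801012"
    private static final DateTimeFormatter COMPACT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Parses the compact form and renders it as ISO yyyy-MM-dd.
    // Throws DateTimeParseException when the input is not a valid date.
    static String formatDate(String s) {
        return LocalDate.parse(s, COMPACT).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(formatDate("19801012")); // 1980-10-12
        System.out.println(formatDate("19760502")); // 1976-05-02
    }
}
```

Parsing into a `LocalDate` first, instead of slicing the string, also rejects impossible dates such as month 13.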

Dataset Manipulation in Spark Java API

I have a Dataset DS1 below. I want to build DS2 using Spark Java API.
DS1:
+---------+------------+------------+
| account| amount | type |
+---------+------------+------------+
| c1 | 100 | D |
| c1 | 200 | C |
| c2 | 500 | C |
DS2:
amount1 is DS1 amount where type = D and amount2 is DS1 amount where type = C
+---------+------------+------------+
| account| amount1 | amount2 |
+---------+------------+------------+
| c1 | 100 | 200 |
| c2 | 0 | 500 |
Can someone help me please?
To transform ds1 into ds2 in the expected format, you can use the following code:
val ds2 = ds1
.withColumn("amount1", when($"type" === "D", $"amount").otherwise(0))
.withColumn("amount2", when($"type" === "C", $"amount").otherwise(0))
.select("account", "amount1", "amount2")
.groupBy($"account")
.agg(Map("amount1" -> "sum", "amount2" -> "sum"))
I hope it helps!
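The logic of that answer (route each amount into one of two columns by type, then sum per account) can be sketched over plain Java collections. This is not the Spark Java API; `AccountPivot`, `Txn` and `pivot` are hypothetical names, and amounts are assumed to be whole numbers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccountPivot {
    // Hypothetical row type standing in for one DS1 record
    static final class Txn {
        final String account;
        final long amount;
        final String type;

        Txn(String account, long amount, String type) {
            this.account = account;
            this.amount = amount;
            this.type = type;
        }
    }

    // Returns account -> {amount1 (sum of type D), amount2 (sum of type C)}
    static Map<String, long[]> pivot(List<Txn> rows) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (Txn r : rows) {
            // A new account starts with both sums at 0, matching otherwise(0)
            long[] sums = totals.computeIfAbsent(r.account, k -> new long[2]);
            if ("D".equals(r.type)) {
                sums[0] += r.amount;
            } else if ("C".equals(r.type)) {
                sums[1] += r.amount;
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Txn> ds1 = List.of(
                new Txn("c1", 100, "D"),
                new Txn("c1", 200, "C"),
                new Txn("c2", 500, "C"));
        pivot(ds1).forEach((account, s) ->
                System.out.println(account + " | " + s[0] + " | " + s[1]));
    }
}
```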

How do I map a resultset to a nested structure of objects?

I have a result set like this…
+--------------+--------------+----------+--------+
| LocationCode | MaterialCode | ItemCode | Vendor |
+--------------+--------------+----------+--------+
| 1 | 11 | 111 | 1111 |
| 1 | 11 | 111 | 1112 |
| 1 | 11 | 112 | 1121 |
| 1 | 12 | 121 | 1211 |
+--------------+--------------+----------+--------+
And so on for LocationCode 2, 3, 4, etc. I need an object (to be converted to JSON, eventually) of type List<Location>,
where the hierarchy of nested objects in the Location class is:
Location.class
    LocationCode
    List<Material>
Material.class
    MaterialCode
    List<Item>
Item.class
    ItemCode
    Vendor
This corresponds to the result set, where 1 location has 2 materials, 1 material (11) has 2 items, and 1 item (111) has 2 vendors. How do I achieve this? I have used AliasToBeanResultTransformer before, but I doubt it will be of help in this case.
I don't think there is a neat way to do that mapping. I'd just do it with nested loops and custom logic to decide when to start building the next Location, Material, Item, and so on.
Something like this pseudo-code:
// assumes rows are ordered by locationCode, then materialCode
while (row = resultSet.next()) {
    if (currentLocation == null ||
            row.locationCode != currentLocation.locationCode) {
        currentLocation = new Location(row.locationCode)
        list.add(currentLocation)
        currentMaterial = null
    }
    if (currentMaterial == null ||
            row.materialCode != currentMaterial.materialCode) {
        currentMaterial = new Material(row.materialCode)
        currentLocation.add(currentMaterial)
    }
    // every row contributes an item, including the first row of a new location or material
    currentMaterial.add(new Item(row.itemCode, row.vendorCode))
}
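A runnable variant of the same idea is sketched below. Instead of tracking the "current" location and material, it looks them up in insertion-ordered maps, so the rows do not need to be sorted. All class and field names are hypothetical, each row is assumed to be an `int[]` of {locationCode, materialCode, itemCode, vendor}, and, as in the pseudo-code, one Item is created per row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedMapping {
    static final class Item {
        final int itemCode;
        final int vendor;
        Item(int itemCode, int vendor) { this.itemCode = itemCode; this.vendor = vendor; }
    }

    static final class Material {
        final int materialCode;
        final List<Item> items = new ArrayList<>();
        Material(int materialCode) { this.materialCode = materialCode; }
    }

    static final class Location {
        final int locationCode;
        // Keyed by materialCode so an existing Material can be found again
        final Map<Integer, Material> materials = new LinkedHashMap<>();
        Location(int locationCode) { this.locationCode = locationCode; }
    }

    // Each row is {locationCode, materialCode, itemCode, vendor}
    static List<Location> group(List<int[]> rows) {
        Map<Integer, Location> locations = new LinkedHashMap<>();
        for (int[] r : rows) {
            Location loc = locations.computeIfAbsent(r[0], code -> new Location(code));
            Material mat = loc.materials.computeIfAbsent(r[1], code -> new Material(code));
            mat.items.add(new Item(r[2], r[3]));
        }
        return new ArrayList<>(locations.values());
    }

    public static void main(String[] args) {
        List<int[]> rows = List.of(
                new int[] {1, 11, 111, 1111},
                new int[] {1, 11, 111, 1112},
                new int[] {1, 11, 112, 1121},
                new int[] {1, 12, 121, 1211});
        for (Location loc : group(rows)) {
            for (Material mat : loc.materials.values()) {
                System.out.println(loc.locationCode + " / " + mat.materialCode
                        + " -> " + mat.items.size() + " item rows");
            }
        }
    }
}
```

If the JSON output needs a `List<Material>` rather than a map, `materials.values()` can be copied into a list before serialisation.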

List the child nodes in tree when parent expanded

I have a problem implementing the tree structure of OIDs. When I click a parent, I need to display only its direct children, not the children of those children.
That is, I should not display a child OID that contains a further "." (dot).
For example, suppose my OID structure is private.MIB.sample.first,
private.MIB.sample.second, and so on.
When I click on MIB, it should display only "sample", not first and second.
first and second should be displayed when I click sample.
How can I implement this in Java?
My database is MySQL. The code I tried is given below:
FilteredRowSet rs = new FilteredRowSetImpl();
// for Other Types Like OBJECT-TYPE, Object_IDENTIFIER
rs = new FilteredRowSetImpl();
rs.setCommand("Select * from MIBNODEDETAILS where " + "mn_OID like '" + OID
+ ".%' order by mn_NodeType, mn_OID");
rs.setUrl(Constants.DB_CONNECTION_URL);
rs.setFilter(new MibRowFilter(1, expString));
rs.execute();
rs.absolute(1);
rs.beforeFirst();
I guess the change is to be made in the setCommand argument.
How can I do this?
Structure of the MIBNODEDETAILS table:
+--------------------+-------------------+-------------+
| mn_OID | mn_name | mn_nodetype |
+--------------------+-------------------+-------------+
| 1 | iso | 0 |
| 1.3 | org | 1 |
| 1.3.6 | dod | 1 |
| 1.3.6.1 | internet | 1 |
| 1.3.6.1.1 | directory | 1 |
| 1.3.6.1.2 | mgmt | 1 |
| 1.3.6.1.2.1 | mib-2 | 0 |
| 1.3.6.1.2.1.1 | system | 1 |
| 1.3.6.1.2.1.10 | transmission | 1 |
You can use something like
SELECT *
FROM mibnodedetails
WHERE mn_oid LIKE '" + mn_OID + ".%'
AND LENGTH('" + mn_OID + "') + 2 = LENGTH(mn_oid)
ORDER BY mn_nodetype, mn_oid
So if you pass mn_OID as 1.3.6.1 (the row | 1.3.6.1 | internet | 1 |),
you will get the following result:
| 1.3.6.1.1 | directory | 1 |
| 1.3.6.1.2 | mgmt | 1 |
PS: This will not work when a child's last component is greater than 9 (two or more digits), because the query relies on length + 2.
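One way around that limitation is to test for "exactly one more component" instead of comparing lengths. The sketch below does this on the Java side; `OidTree` and `isDirectChild` are hypothetical names, and the check could be applied to rows fetched with a broader LIKE query.

```java
class OidTree {
    // True when candidate is exactly one level below parent,
    // e.g. parent 1.3.6.1.2.1 and candidate 1.3.6.1.2.1.10
    static boolean isDirectChild(String parent, String candidate) {
        String prefix = parent + ".";
        if (!candidate.startsWith(prefix)) {
            return false;
        }
        // What remains after the prefix must be one non-empty component without dots
        String rest = candidate.substring(prefix.length());
        return !rest.isEmpty() && rest.indexOf('.') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isDirectChild("1.3.6.1", "1.3.6.1.2"));            // true
        System.out.println(isDirectChild("1.3.6.1.2.1", "1.3.6.1.2.1.10"));   // true
        System.out.println(isDirectChild("1.3.6.1", "1.3.6.1.2.1"));          // false
    }
}
```

The same rule can be kept in SQL by pairing a LIKE 'parent.%' condition with a NOT LIKE 'parent.%.%' condition, which also avoids the length arithmetic.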
The function given below displays the tree as required.
public void populateMibValues()
{
final DefaultTreeModel model = (DefaultTreeModel) this.m_mibTree.getModel();
model.setRoot(null);
this.rootNode.removeAllChildren();
final String query_MibNodeDetailsSelect = "Select * from MIBNODEDETAILS where LENGTH(mn_oid)<=9 "
+ " and mn_OID<='1.3.6.1.4.1' order by mn_OID"; // only
this.innerNodeNames.clear();
this.innerNodes.clear();
this.innerNodesOid = null;
try {
final ResultSet deviceRS = Application.getDBHandler().executeQuery(query_MibNodeDetailsSelect, null);// inner
// nodes
while (deviceRS.next()) {
final mibNode mb = new mibNode(deviceRS.getString("mn_OID").toString(), deviceRS.getString("mn_name")
.toString());
mb.m_Type = Integer.parseInt(deviceRS.getString("mn_nodetype").toString());
createMibTree(mb);
}
}
catch (final Exception e) {
Application.showErrorInConsole(e);
NmsLogger.writeErrorLog("ERROR creating MIB tree failed", e.toString());
}
