I'd like to store in a hash table a result set coming from a query execution.
The hash table is something like this
Map<List<String>,List<Object>>
where
List<String>, the hash table key, is a subset of the extracted fields
Object is a Java object corresponding to a database tuple (all fields)
So, first, the data have to be grouped in order to create each key and collect all the items sharing that key.
The pseudo-code related to my current approach is:
while(iterate){
while(rs.next){
if(key is empty)
// build REFERENCE KEY and delete rs entry
else
// build key for i-th rs entry and compare it with the REFERENCE key.
Eventually, get data and delete rs entry
}
rs.beforeFirst()
}
In other words, the result set is iterated many times, and on each pass a new key is created in order to compare the remaining result set entries with it. Each processed entry is deleted so that the outer loop eventually terminates.
Since the result set is very large (and so is each List<Object>), performance is poor (a very high loading time per key).
Appending an order by clause to the query (in order to preliminarily group data) doesn't alleviate the problem.
Is there a more efficient approach?
Thanks everyone.
EDIT
Input ResultSet
---------------------------------------------------------------
| Field1 | Field2 | Field3 | Field4 | Field5 | Field6 | Field7 |
---------------------------------------------------------------
| X      | A      | val1_3 | val1_4 | val1_5 | val1_6 | val1_7 |
| X      | A      | val2_3 | val2_4 | val2_5 | val2_6 | val2_7 |
| Y      | B      | val3_3 | val3_4 | val3_5 | val3_6 | val3_7 |
| Z      | C      | val4_3 | val4_4 | val4_5 | val4_6 | val4_7 |
| Y      | D      | val5_3 | val5_4 | val5_5 | val5_6 | val5_7 |
---------------------------------------------------------------
Key_Fields : [Field1, Field2]
Output Map
------------------------------
| KEY   | VALUE              |
------------------------------
| [X,A] | [Object1, Object2] |
| [Y,B] | [Object3]          |
| [Z,C] | [Object4]          |
| [Y,D] | [Object5]          |
------------------------------
I'm using List<String> for the key because another ResultSet can have Key_Fields of a different length.
Here is my current, time-consuming Java code:
while (itera) {
    key = new ArrayList<String>();
    values = new ArrayList<AbstractClass>();
    while (rs.next()) {
        if (key.isEmpty()) {
            // build REFERENCE KEY
            // add first OBJECT to List<AbstractClass>
            // delete this data from ResultSet
        } else {
            // build KEY_TO_BE_COMPARED
            List<String> row_to_be_compared = new ArrayList<String>();
            // if this key equals the REFERENCE KEY
            if (row_to_be_compared.equals(key)) {
                AbstractClass value_object = new AbstractClass();
                ...
                rs.deleteRow();
            } else {
                // ORDER BY clause in the query ensures that, if keys don't match,
                // then all objects related to the REFERENCE KEY have been collected
                break;
            }
        }
    }
    rs.beforeFirst();
    map.put(key, values);
    if (!rs.next() || items_loaded == max_hash_size)
        itera = false;
    else
        rs.beforeFirst();
}
Instead of using a List as the key, use a class that holds the List as an instance variable, and override equals() and hashCode() very carefully.
Why don't you simplify your key and make it a String containing all the fields, concatenated with a special character (say ".")?
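Neither suggestion addresses the repeated scanning itself: the quadratic behavior disappears if the result set is iterated exactly once, appending each tuple to the map entry for its key. A minimal sketch with plain collections (rows are modeled here as String arrays, and the class and field indices are made up for illustration):

```java
import java.util.*;

public class GroupByKeyFields {
    // Group rows (modeled as String arrays) by the values at the given
    // key-field indices, in a single pass over the data.
    static Map<List<String>, List<String[]>> group(List<String[]> rows, int[] keyIdx) {
        Map<List<String>, List<String[]>> map = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<String> key = new ArrayList<>();
            for (int i : keyIdx) {
                key.add(row[i]);
            }
            // computeIfAbsent creates the value list the first time a key is seen
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"X", "A", "val1_3"},
                new String[]{"X", "A", "val2_3"},
                new String[]{"Y", "B", "val3_3"});
        Map<List<String>, List<String[]>> grouped = group(rows, new int[]{0, 1});
        System.out.println(grouped.get(Arrays.asList("X", "A")).size()); // prints 2
    }
}
```

With JDBC this becomes a single while (rs.next()) loop that builds the key and the tuple object per row; no deleteRow(), beforeFirst(), or ORDER BY is needed, and each row is visited exactly once.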
Related
I am new to Spark 2.4 with Java 8 and need help. Here is an example:
Source DataFrame
+--------------+
| key | Value |
+--------------+
| A | John |
| B | Nick |
| A | Mary |
| B | Kathy |
| C | Sabrina|
| B | George |
+--------------+
Meta DataFrame
+-----+
| key |
+-----+
| A |
| B |
| C |
| D |
| E |
| F |
+-----+
I would like to transform it to the following: the column names come from the Meta DataFrame, and the rows are built from the Source DataFrame.
+-----------------------------------------------+
| A | B | C | D | E | F |
+-----------------------------------------------+
| John | Nick | Sabrina | null | null | null |
| Mary | Kathy | null | null | null | null |
| null | George | null | null | null | null |
+-----------------------------------------------+
I need to write this code in Spark 2.3 with Java 8. I'd appreciate your help.
To make things clearer (and easily reproducible) let's define dataframes:
val df1 = Seq("A" -> "John", "B" -> "Nick", "A" -> "Mary",
"B" -> "Kathy", "C" -> "Sabrina", "B" -> "George")
.toDF("key", "value")
val df2 = Seq("A", "B", "C", "D", "E", "F").toDF("key")
From what I see, you are trying to create one column by value in the key column of df2. These columns should contain all the values of the value column that are associated to the key naming the column. If we take an example, column A's first value should be the value of the first occurrence of A (if it exists, null otherwise): "John". Its second value should be the value of the second occurrence of A: "Mary". There is no third value so the third value of the column should be null.
I detailed it to show that we need a notion of rank of the values for each key (windowing function), and group by that notion of rank. It would go as follows:
import org.apache.spark.sql.expressions.Window
val df1_win = df1
.withColumn("id", monotonically_increasing_id)
.withColumn("rank", rank() over Window.partitionBy("key").orderBy("id"))
// the id is just here to maintain the original order.
// getting the keys in df2. Add distinct if there are duplicates.
val keys = df2.collect.map(_.getAs[String](0)).sorted
// then it's just about pivoting
df1_win
.groupBy("rank")
.pivot("key", keys)
.agg(first('value))
.orderBy("rank")
// .drop("rank") // kept here for clarity
.show()
+----+----+------+-------+----+----+----+
|rank| A| B| C| D| E| F|
+----+----+------+-------+----+----+----+
| 1|John| Nick|Sabrina|null|null|null|
| 2|Mary| Kathy| null|null|null|null|
| 3|null|George| null|null|null|null|
+----+----+------+-------+----+----+----+
Here is the very same code in Java
Dataset<Row> df1_win = df1
.withColumn("id", functions.monotonically_increasing_id())
.withColumn("rank", functions.rank().over(Window.partitionBy("key").orderBy("id")));
// the id is just here to maintain the original order.
// getting the keys in df2. Add distinct if there are duplicates.
// Note that it is a list of objects, to match the (strange) signature of pivot
List<Object> keys = df2.collectAsList().stream()
.map(x -> x.getString(0))
.sorted().collect(Collectors.toList());
// then it's just about pivoting
df1_win
.groupBy("rank")
.pivot("key", keys)
.agg(functions.first(functions.col("value")))
.orderBy("rank")
// .drop("rank") // kept here for clarity
.show();
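To see why the rank column makes the pivot work, the same logic can be emulated with plain Java collections (an illustration of the idea only, not Spark; the class and method names are made up):

```java
import java.util.*;

public class PivotSketch {
    // Pivot (key, value) pairs into rows: the i-th output row holds,
    // for each key column, the i-th value seen for that key (or null).
    static List<Map<String, String>> pivot(List<String[]> pairs, List<String> keys) {
        // per-key value lists, in encounter order (the "rank" of each value)
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String[] p : pairs) {
            byKey.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        int maxRank = byKey.values().stream().mapToInt(List::size).max().orElse(0);
        List<Map<String, String>> out = new ArrayList<>();
        for (int r = 0; r < maxRank; r++) {
            Map<String, String> row = new LinkedHashMap<>();
            for (String k : keys) {
                List<String> vals = byKey.getOrDefault(k, Collections.<String>emptyList());
                row.put(k, r < vals.size() ? vals.get(r) : null);
            }
            out.add(row);
        }
        return out;
    }
}
```

Spark's monotonically_increasing_id plus rank() over a per-key window computes exactly this encounter-order rank, and groupBy("rank").pivot("key") then assembles the rows.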
I have a Dataset<Row> in java. I need to read value of 1 column which is a JSON string, parse it, and set the value of a few other columns based on the parsed JSON value.
My dataset looks like this:
|json | name| age |
========================================
| "{'a':'john', 'b': 23}" | null| null |
----------------------------------------
| "{'a':'joe', 'b': 25}" | null| null |
----------------------------------------
| "{'a':'zack'}" | null| null |
----------------------------------------
And I need to make it like this:
|json | name | age |
========================================
| "{'a':'john', 'b': 23}" | 'john'| 23 |
----------------------------------------
| "{'a':'joe', 'b': 25}" | 'joe' | 25 |
----------------------------------------
| "{'a':'zack'}" | 'zack'|null|
----------------------------------------
I am unable to figure out a way to do it. Please help with the code.
There is a function get_json_object in Spark.
Assuming you have a DataFrame named df, you could solve your problem this way:
df.selectExpr("get_json_object(json, '$.a') as name", "get_json_object(json, '$.b') as age" )
But first and foremost, be sure that your json attribute has double quotes instead of single ones.
Note: there is a full list of Spark SQL functions in the documentation. I use it heavily; consider bookmarking it and referring to it from time to time.
You could use UDFs
def parseName(json: String): String = ??? // parse json
val parseNameUDF = udf[String, String](parseName)
def parseAge(json: String): Int = ??? // parse json
val parseAgeUDF = udf[Int, String](parseAge)
dataFrame
.withColumn("name", parseNameUDF(dataFrame("json")))
.withColumn("age", parseAgeUDF(dataFrame("json")))
I have a Dataset like below
Dataset<Row> dataset = ...
dataset.show()
| NAME | DOB |
+------+----------+
| John | 19801012 |
| Mark | 19760502 |
| Mick | 19911208 |
I want to convert it to below (formatted DOB)
| NAME | DOB |
+------+------------+
| John | 1980-10-12 |
| Mark | 1976-05-02 |
| Mick | 1991-12-08 |
How can I do this? Basically, I am trying to figure out how to manipulate existing column string values in a generic way.
I tried using dataset.withColumn but couldn't quite figure out how to achieve this.
Appreciate any help.
With "substring" and "concat" functions:
df.withColumn("DOB_FORMATED",
concat(substring($"DOB", 0, 4), lit("-"), substring($"DOB", 5, 2), lit("-"), substring($"DOB", 7, 2)))
Load the data into a DataFrame (deltaData) and just use the following line:
deltaData.withColumn("DOB", date_format(to_date($"DOB", "yyyyMMdd"), "yyyy-MM-dd")).show()
Assuming DOB is a String, you could write a UDF:
def formatDate(s: String): String = {
  // date formatting code
}
val formatDateUdf = udf(formatDate(_: String))
ds.select($"NAME", formatDateUdf($"DOB").as("DOB"))
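The body of such a UDF (or any plain mapping function) can rely on java.time instead of manual slicing; DateTimeFormatter.BASIC_ISO_DATE matches the yyyyMMdd input. A plain-Java sketch (class and method names are made up):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DobFormatter {
    // Parse a yyyyMMdd string and re-emit it as yyyy-MM-dd,
    // rejecting invalid dates instead of silently reshuffling digits.
    static String formatDob(String s) {
        LocalDate d = LocalDate.parse(s, DateTimeFormatter.BASIC_ISO_DATE);
        return d.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(formatDob("19801012")); // prints 1980-10-12
    }
}
```

Unlike substring/concat, this validates the input: "19801301" would throw a DateTimeParseException rather than produce a nonsense date.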
My Database :
+---------+-----------+---------------+-------------------+---------------+-----------------+--------------------+
| KlantId | KlantNaam | KlantVoornaam | KlantAdres | KlantPostcode | KlantWoonplaats | KlantGeboorteDatum |
+---------+-----------+---------------+-------------------+---------------+-----------------+--------------------+
| 1 | Vervoort | Dieter | Grootstraat 1 | 3500 | Hasselt | 1/01/1991 |
| 2 | Droogmans | Jos | Kleine Laan 2 | 3500 | Hasselt | 5/05/1999 |
| 3 | Severijns | Sarah | Brede Weg 3 | 3570 | Alken | 28/02/1972 |
| 4 | Peeters | Piet | Rondplein 4 | 3600 | Genk | 6/08/1973 |
| 5 | Vreemans | Veerle | Lange Boulevard 5 | 3500 | Hasselt | 4/04/1980 |
+---------+-----------+---------------+-------------------+---------------+-----------------+--------------------+
My code where I read it:
try {
System.out.println("De klanten zijn:");
if (!rs.relative(1)) {
rs.first();
}
System.out.println("Naam en voornaam: " + rs.getString("KlantVoornaam") + " " + rs.getString("KlantNaam"));
ShowMenu();
con.close();
} catch (SQLException e) {
System.out.println(e.getMessage());
System.exit(0);
}
For some odd reason rs.relative(1) returns false on the 4th record ("Peeters Piet") while there is still one left ("Vreemans Veerle") and so it returns to the first record.
Is this a mistake in rs.relative() or in my code?
I know that I could do this with
if (!rs.next()) {
rs.first();
}
But I'm wondering why rs.relative(1) doesn't do the same job.
The API documentation explicitly says:
Note: Calling the method relative(1) is identical to calling the
method next() and calling the method relative(-1) is identical to
calling the method previous().
So I don't think there should be any difference. Try debugging it, or just print the getRow() value with System.out.println before calling relative(1) to check which row you are on.
EDIT: Looking at your code I cannot figure out why it is failing, but be aware that ShowMenu should not be called from the rest of the methods: you are chaining calls on the stack. It's very hard to reproduce manually, but enough user interactions could crash the application. A better way to do that is a loop in showMenu(), which breaks when the user presses 5.
SOLUTION: You're right, the issue is the MySQL implementation of ResultSet. Check this API doc, where the MySQL code says EXACTLY THE OPPOSITE of what we expected (current versions of the driver renamed this class to com.mysql.jdbc.ResultSetImpl, but the comment remains there):
Note: Calling relative(1) is different than calling next() since it
makes sense to call next() when there is no current row, for example,
when the cursor is positioned before the first row or after the last
row of the result set.
And method names should start with a lowercase letter (following Java conventions makes code more readable for others ;) )
I have a problem implementing the tree structure of OIDs. When I click a parent, I need to display only the child details, not the sub-children of a child.
That is, I should not display an OID which contains a "." (dot) beyond the parent's level.
For example, if my OID structure is private.MIB.sample.first, private.MIB.sample.second and so on,
then when I click on MIB, it should display only "sample", not first and second;
first and second are to be displayed when I click sample.
How can I implement this in Java?
My database is MySQL. The code I tried is given below:
// for other node types like OBJECT-TYPE, OBJECT-IDENTIFIER
FilteredRowSet rs = new FilteredRowSetImpl();
rs.setCommand("Select * from MIBNODEDETAILS where mn_OID like '" + OID
        + ".%' order by mn_NodeType, mn_OID");
rs.setUrl(Constants.DB_CONNECTION_URL);
rs.setFilter(new MibRowFilter(1, expString));
rs.execute();
rs.absolute(1);
rs.beforeFirst();
I guess the change is to be made in the setCommand argument.
How can I do this?
Structure of the mibnodedetails table:
+--------------------+-------------------+-------------+
| mn_OID | mn_name | mn_nodetype |
+--------------------+-------------------+-------------+
| 1 | iso | 0 |
| 1.3 | org | 1 |
| 1.3.6 | dod | 1 |
| 1.3.6.1 | internet | 1 |
| 1.3.6.1.1 | directory | 1 |
| 1.3.6.1.2 | mgmt | 1 |
| 1.3.6.1.2.1 | mib-2 | 0 |
| 1.3.6.1.2.1.1 | system | 1 |
| 1.3.6.1.2.1.10 | transmission | 1 |
You can use something like
SELECT *
FROM mibnodedetails
WHERE mn_oid LIKE "+mn_OID+"%
AND LENGTH ("+mn_OID+") + 2 = LENGTH (mn_oid)
ORDER BY mn_nodetype, mn_oid
So if you pass mn_OID as 1.3.6.1 (| 1.3.6.1 | internet | 1 |),
You will get following result:
| 1.3.6.1.1 | directory | 1 |
| 1.3.6.1.2 | mgmt | 1 |
Working Demo
PS: This will not work for children whose last component is greater than 9, since we rely on length + 2.
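A dot-based test avoids that limitation: an OID is a direct child when it starts with the parent plus a dot and the remainder contains no further dot. In SQL this is roughly mn_oid LIKE '<parent>.%' AND mn_oid NOT LIKE '<parent>.%.%' (preferably via a parameterized query rather than string concatenation). The same check in plain Java, as a sketch with a made-up helper name:

```java
public class OidUtil {
    // True when oid is an immediate child of parent, regardless of how
    // many digits each component has (so .10, .255, ... all work).
    static boolean isDirectChild(String parent, String oid) {
        String prefix = parent + ".";
        if (!oid.startsWith(prefix)) {
            return false;
        }
        // the remaining component must not contain another dot
        return oid.substring(prefix.length()).indexOf('.') < 0;
    }
}
```

For example, isDirectChild("1.3.6.1.2.1", "1.3.6.1.2.1.10") is true even though the child component has two digits, which the length + 2 trick would miss.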
The function given below displays the tree as required.
public void populateMibValues()
{
final DefaultTreeModel model = (DefaultTreeModel) this.m_mibTree.getModel();
model.setRoot(null);
this.rootNode.removeAllChildren();
final String query_MibNodeDetailsSelect = "Select * from MIBNODEDETAILS where LENGTH(mn_oid)<=9 "
+ " and mn_OID<='1.3.6.1.4.1' order by mn_OID"; // only
this.innerNodeNames.clear();
this.innerNodes.clear();
this.innerNodesOid = null;
try {
final ResultSet deviceRS = Application.getDBHandler().executeQuery(query_MibNodeDetailsSelect, null);// inner
// nodes
while (deviceRS.next()) {
final mibNode mb = new mibNode(deviceRS.getString("mn_OID").toString(), deviceRS.getString("mn_name")
.toString());
mb.m_Type = Integer.parseInt(deviceRS.getString("mn_nodetype").toString());
createMibTree(mb);
}
}
catch (final Exception e) {
Application.showErrorInConsole(e);
NmsLogger.writeErrorLog("ERROR creating MIB tree failed", e.toString());
    }
}