Suppose my LDT (LargeMap) bin contains the following values:
key1, value1
key2, value2
key3, value3
key4, value4
...
key50, value50
Now I retrieve the data I need with the following snippet:
Map<Object, Object> myFinalRecord = new HashMap<>();
// First call to the client to get the LargeMap associated with the bin
LargeMap largeMap = myDemoClient.getLargeMap(myPolicy, myKey, myLDTBinName, null);
for (String myLDTKey : myRequiredKeysFromLDTBin) {
    try {
        // Each get call results in one round trip to Aerospike
        myFinalRecord.putAll(largeMap.get(Value.get(myLDTKey)));
    } catch (Exception e) {
        log.warn("Key does not exist in LDT bin");
    }
}
The problem is that if myRequiredKeysFromLDTBin contains, say, 20 keys, then largeMap.get(Value.get(myLDTKey)) makes 20 calls to Aerospike.
Thus, at a retrieval time of 1 ms per transaction, one logical call to fetch 20 ids from a record turns into 20 Aerospike calls, pushing my response time to roughly 20 ms!
So is there any way to pass a set of ids to be retrieved from an LDT bin so that only one call is needed?
There is no direct API for a multi-get. One way to do this is to call the lmap API on the server side multiple times from a UDF.
Example 'mymap.lua':
local lmap = require('ldt/lib_lmap');

function getmany(rec, binname, keys)
  local resultmap = map()
  local keycount = #keys
  for i = 1, keycount, 1 do
    local rc = lmap.exists(rec, binname, keys[i])
    if (rc == 1) then
      resultmap[keys[i]] = lmap.get(rec, binname, keys[i]);
    else
      resultmap[keys[i]] = nil;
    end
  end
  return resultmap;
end
Register this Lua file:
aql> register module 'mymap.lua'
OK, 1 module added.
aql> execute lmap.put('bin', 'c', 'd') on test.demo where PK='1'
+-----+
| put |
+-----+
| 0 |
+-----+
1 row in set (0.000 secs)
aql> execute lmap.put('bin', 'b', 'c') on test.demo where PK='1'
+-----+
| put |
+-----+
| 0 |
+-----+
1 row in set (0.001 secs)
aql> execute mymap.getmany('bin', 'JSON["b","a"]') on test.demo where PK='1'
+--------------------------+
| getmany |
+--------------------------+
| {"a":NIL, "b":{"b":"c"}} |
+--------------------------+
1 row in set (0.000 secs)
aql> execute mymap.getmany('bin', 'JSON["b","c"]') on test.demo where PK='1'
+--------------------------------+
| getmany |
+--------------------------------+
| {"b":{"b":"c"}, "c":{"c":"d"}} |
+--------------------------------+
1 row in set (0.000 secs)
The Java code to invoke this would be:
try {
    resultmap = myClient.execute(myPolicy, myKey, "mymap", "getmany",
            Value.get(myLDTBinName), Value.getAsList(myRequiredKeysFromLDTBin));
} catch (Exception e) {
    log.warn("One of the keys does not exist in the LDT bin");
}
The value will be set for each key that exists; it will be NIL for keys that do not.
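For completeness, here is a rough sketch of consuming that result on the Java side and filtering out the NIL entries; it reuses the names from the snippets above and assumes the Lua map comes back as a java.util.Map (untested):
// Sketch: merge the UDF result into a final map, skipping keys that did
// not exist in the LDT bin (their values come back as null/NIL).
Object result = myClient.execute(myPolicy, myKey, "mymap", "getmany",
        Value.get(myLDTBinName), Value.getAsList(myRequiredKeysFromLDTBin));
Map<Object, Object> myFinalRecord = new HashMap<>();
if (result instanceof Map) {
    for (Map.Entry<?, ?> entry : ((Map<?, ?>) result).entrySet()) {
        if (entry.getValue() != null) {
            myFinalRecord.put(entry.getKey(), entry.getValue());
        }
    }
}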
I got an unexpected result when trying to store OffsetTime entity properties into a PostgreSQL time with time zone column using Hibernate (5.6.1).
For example (if the current default zone is +02):
| OffsetTime | timetz |
| -------- | -------- |
| 00:00+01 | 00:00+02 |
| 00:00+02 | 00:00+02 |
| 00:00+03 | 00:00+02 |
The original offset was lost and the default was stored instead.
I researched two classes:
org.hibernate.type.descriptor.sql.TimeTypeDescriptor
final Time time = javaTypeDescriptor.unwrap( value, Time.class, options );
org.hibernate.type.descriptor.java.OffsetTimeJavaDescriptor
if ( java.sql.Time.class.isAssignableFrom( type ) ) {
return (X) java.sql.Time.valueOf( offsetTime.toLocalTime() );
}
I think I have some mistake in understanding this logic, but in other answers I saw this recommendation: (LINK)
ZoneOffset zoneOffset = ZoneOffset.systemDefault().getRules()
.getOffset(LocalDateTime.now());
Notification notification = new Notification()
    // ...
    .setClockAlarm(
        OffsetTime.of(7, 30, 0, 0, zoneOffset)
    );
So, must I convert all OffsetTime values to the default time zone so that they are stored correctly?
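For reference, a minimal sketch of the normalization that recommendation implies, assuming the intent is to convert each OffsetTime to the JVM default offset before persisting (the variable original is hypothetical):
import java.time.Instant;
import java.time.OffsetTime;
import java.time.ZoneOffset;

// Shift the value to the default zone offset; the instant stays the same,
// so it survives Hibernate's LocalTime-based conversion (the offset itself
// is still not stored in the timetz column).
ZoneOffset defaultOffset = ZoneOffset.systemDefault().getRules().getOffset(Instant.now());
OffsetTime normalized = original.withOffsetSameInstant(defaultOffset);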
I have a custom function that depends on the order of the data. I want to apply this function to each group in Spark, with the groups processed in parallel. How can I do this?
For example,
public ArrayList<Integer> my_logic(ArrayList<Integer> glist) {
    boolean b = true;
    ArrayList<Integer> result = new ArrayList<>();
    for (int i = 1; i < glist.size(); i++) { // size is around 30000
        if (b && glist.get(i - 1) > glist.get(i)) {
            // some logic, then set b to false
            result.add(glist.get(i));
        } else {
            // some logic, then set b to true
        }
    }
    return result;
}
My data:
Col1 Col2
a 1
b 2
a 3
c 4
c 3
…. ….
I want something similar to the following:
df.groupBy(col("Col1")).apply(my_logic(col("Col2")));
// output
a [1,3,5…]
b [2,5,8…]
…. ….
In Spark you can use window aggregate functions directly; I will show that here in Scala.
Here is your input data (my preparation):
import scala.collection.JavaConversions._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val schema = StructType(
StructField("Col1", StringType, false) ::
StructField("Col2", IntegerType, false) :: Nil
)
val row = Seq(Row("a", 1),Row("b", 8),Row("b", 2),Row("a", 5),Row("b", 5),Row("a", 3))
val df = spark.createDataFrame(row, schema)
df.show(false)
//input:
// +----+----+
// |Col1|Col2|
// +----+----+
// |a |1 |
// |b |8 |
// |b |2 |
// |a |5 |
// |b |5 |
// |a |3 |
// +----+----+
Here is the code to obtain the desired logic:
import org.apache.spark.sql.expressions.Window
df
// NEWCOLUMN: EVALUATE/CREATE LIST OF VALUES FOR EACH RECORD OVER THE WINDOW AS FRAME MOVES
.withColumn(
"collected_list",
collect_list(col("Col2")) over Window
.partitionBy(col("Col1"))
.orderBy(col("Col2"))
)
// NEWCOLUMN: MAX SIZE OF COLLECTED LIST IN EACH WINDOW
.withColumn(
"max_size",
max(size(col("collected_list"))) over Window.partitionBy(col("Col1"))
)
// FILTER TO GET ONLY HIGHEST SIZED ARRAY ROW
.where(col("max_size") - size(col("collected_list")) === 0)
.orderBy(col("Col1"))
.drop("Col2", "max_size")
.show(false)
// output:
// +----+--------------+
// |Col1|collected_list|
// +----+--------------+
// |a |[1, 3, 5] |
// |b |[2, 5, 8] |
// +----+--------------+
Note:
You can use the collect_list() aggregate function with groupBy directly, but then you cannot guarantee the order of the collected list (see the sketch just below for one workaround).
You can explore the collect_set() aggregate function if you want to eliminate duplicates (with some changes to the above query).
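For instance, if ordering by value is what you need, you can impose the order after collection with sort_array; a small sketch in Java (the question's language), which should give the same [1, 3, 5] / [2, 5, 8] output as the window query above:
import static org.apache.spark.sql.functions.*;

// Collect per group, then sort the collected array by value.
df.groupBy(col("Col1"))
        .agg(sort_array(collect_list(col("Col2"))).as("collected_list"))
        .orderBy(col("Col1"))
        .show(false);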
EDIT 2: You can write your own custom collect_list() as a UDAF (UserDefinedAggregateFunction) like this in Scala Spark for DataFrames.
Online docs:
For Spark 2.3.0
For the latest version
The code below is for Spark version 2.3.0:
import scala.collection.mutable
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}

object Your_Collect_Array extends UserDefinedAggregateFunction {
override def inputSchema: StructType = StructType(
StructField("yourInputToAggFunction", LongType, false) :: Nil
)
override def dataType: ArrayType = ArrayType(LongType, false)
override def deterministic: Boolean = true
override def bufferSchema: StructType = {
StructType(
StructField("yourCollectedArray", ArrayType(LongType, false), false) :: Nil
)
}
override def initialize(buffer: MutableAggregationBuffer): Unit = {
buffer(0) = new Array[Long](0)
}
override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
buffer.update(
0,
buffer.getAs[mutable.WrappedArray[Long]](0) :+ input.getLong(0)
)
}
override def merge(
buffer1: MutableAggregationBuffer,
buffer2: Row
): Unit = {
buffer1.update(
0,
buffer1.getAs[mutable.WrappedArray[Long]](0) ++ buffer2
.getAs[mutable.WrappedArray[Long]](0)
)
}
override def evaluate(buffer: Row): Any =
buffer.getAs[mutable.WrappedArray[Long]](0)
}
// Below is the same query with just one line changed, i.e., calling the custom UDAF written above
df
// NEWCOLUMN: USING OUR CUSTOM UDAF
.withColumn(
"your_collected_list",
Your_Collect_Array(col("Col2")) over Window
.partitionBy(col("Col1"))
.orderBy(col("Col2"))
)
// NEWCOLUMN: MAX SIZE OF COLLECTED LIST IN EACH WINDOW
.withColumn(
"max_size",
max(size(col("your_collected_list"))) over Window.partitionBy(col("Col1"))
)
// FILTER TO GET ONLY HIGHEST SIZED ARRAY ROW
.where(col("max_size") - size(col("your_collected_list")) === 0)
.orderBy(col("Col1"))
.drop("Col2", "max_size")
.show(false)
//Output:
// +----+-------------------+
// |Col1|your_collected_list|
// +----+-------------------+
// |a |[1, 3, 5] |
// |b |[2, 5, 8] |
// +----+-------------------+
Note:
UDFs are not that efficient in Spark, so use them only when you absolutely need them; they are mainly aimed at data-analytics use cases.
I created a Dataset in Spark using Java by reading a CSV file. The following is my initial Dataset:
+---+----------+-----+---+
|_c0| _c1| _c2|_c3|
+---+----------+-----+---+
| 1|9090999999|NANDU| 22|
| 2|9999999999| SANU| 21|
| 3|9999909090| MANU| 22|
| 4|9090909090|VEENA| 23|
+---+----------+-----+---+
I want to create a DataFrame as follows (one column having null values):
+---+----+--------+
|_c0| _c1|     _c2|
+---+----+--------+
|  1|null|   NANDU|
|  2|null|    SANU|
|  3|null|    MANU|
|  4|null|   VEENA|
+---+----+--------+
Following is my existing code:
Dataset<Row> ds = spark.read().format("csv").option("header", "false").load("/home/nandu/Data.txt");
Column[] selectedColumns = new Column[2];
selectedColumns[0] = new Column("_c0");
selectedColumns[1] = new Column("_c2");
Dataset<Row> ds2 = ds.select(selectedColumns);
which creates a Dataset as follows:
+---+-----+
|_c0| _c2|
+---+-----+
| 1|NANDU|
| 2| SANU|
| 3| MANU|
| 4|VEENA|
+---+-----+
To select the two columns you want and add a new one filled with nulls, you can use the following:
import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.types.DataTypes;
ds.select(col("_c0"), lit(null).cast(DataTypes.StringType).as("_c1"), col("_c2"));
Try the following code:
import org.apache.spark.sql.functions.{ lit => flit}
import org.apache.spark.sql.types._
val ds = spark.range(100).withColumn("c2",$"id")
ds.withColumn("new_col",flit(null: String)).selectExpr("id","new_col","c2").show(5)
Hope this helps.
Cheers :)
Adding a new column with a null string value may solve the problem. Try the following code; it's written in Scala, but you'll get the idea:
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType
val ds2 = ds.withColumn("new_col", lit(null).cast(StringType)).selectExpr("_c0", "new_col as _c1", "_c2")
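Since the question uses the Java API, a direct Java equivalent of the Scala snippet above would look like this (a sketch, untested):
import static org.apache.spark.sql.functions.lit;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;

// Add the null string column, then reorder/rename to match the desired output.
Dataset<Row> ds2 = ds
        .withColumn("new_col", lit(null).cast(DataTypes.StringType))
        .selectExpr("_c0", "new_col as _c1", "_c2");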
I have a JPQL query that instantiates a Java object in the SELECT clause:
public List<ChampEtatOT> getEtatOT(Date dateDebut, Date dateFin) {
    Query query = em.createQuery("SELECT NEW ChampEtatOT(ot.numero, uo.denominationFr, ot.etat, ot.dateDebutReelle, ot.dateFinReelle, :dateParam1, :dateParam2, :dateParam3) FROM ordre ot JOIN ot.unite uo")
            .setParameter("dateParam1", dateDebut, TemporalType.DATE)
            .setParameter("dateParam2", dateFin, TemporalType.DATE)
            .setParameter("dateParam3", new Date("2015-01-01"), TemporalType.DATE);
    return query.getResultList();
}
I pass three parameters so that I can use them in the constructor.
I get this error:
Caused by: Exception [EclipseLink-6137] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.QueryException
Exception Description: An Exception was thrown while executing a ReportQuery with a constructor expression: java.lang.NoSuchMethodException: dz.elit.gmao.commun.reporting.classe.ChampEtatOT.<init>(java.lang.String, java.lang.String, java.lang.String, java.util.Date, java.util.Date)
Query: ReportQuery(referenceClass=TravOrdreTravail jpql="SELECT NEW dz.elit.gmao.commun.reporting.classe.ChampEtatOT( ot.numero, uo.denominationFr, ot.etat, ot.dateDebutReelle , ot.dateFinReelle, :dateParam1, :dateParam2, :dateParam3) FROM TravOrdreTravail ot JOIN ot.uniteOrganisationnellle uo")
I think it's not possible to put parameters in a SELECT clause. Does anyone have an idea? The constructor is as follows:
public ChampEtatOT(String numero, String denominationFr, String etat, Date dateDebutReelle, Date dateFinReelle, Date dateParam1, Date dateParam2, Date dateParam3) {
    this.numero = numero;
    this.denominationFr = denominationFr;
    if (etat.equals("OUV")) {
        if (dateDebutReelle.before(dateParam1)) {
            etatEntreeSortie = "En instance debut du mois";
        } else {
            if (dateDebutReelle.before(dateParam2)) {
                etatEntreeSortie = "En instance fin du mois";
            } else {
                if (dateDebutReelle.after(dateParam1) && dateDebutReelle.before(dateParam2)) {
                    etatEntreeSortie = "Entree/Mois";
                }
            }
        }
    }
}
Problem solved. As you suggested, bRIMOs Bor, it's not possible to pass parameters in a SELECT clause, so I retrieved all the results in a List and then filtered them according to the three dates date1, date2, and date3:
Query query = em.createQuery("SELECT NEW ChampEtatAteliers"
        + "(ot.numero, uo.denominationFr, ot.etat, ot.dateDebutReelle, ot.dateFinReelle) "
        + "FROM ordre ot JOIN ot.unite uo");
List<ChampEtatAteliers> champEtatAtelierses = query.getResultList();
for (ChampEtatAteliers champEtatAtelierse : champEtatAtelierses) {
    if (champEtatAtelierse.getDateDebutReelle().compareTo(date1) >= 0 && champEtatAtelierse.getDateDebutReelle().compareTo(date2) <= 0) {
        champEtatAtelierList2.add(new ChampEtatAteliers(champEtatAtelierse.getNumero(), champEtatAtelierse.getDenominationFr(), "Entree/Mois"));
    }
    if (champEtatAtelierse.getEtat().equals("OUV")) {
        if (champEtatAtelierse.getDateDebutReelle().compareTo(date1) < 0) {
            champEtatAtelierse.setEtatEntreeSortie("En instance début du mois");
        } else {
            if (champEtatAtelierse.getDateDebutReelle().compareTo(date2) <= 0) {
                champEtatAtelierse.setEtatEntreeSortie("En instance fin du mois");
            }
        }
    }
}
I think it's not possible to reference a parameter in the constructor.
In your case it throws a NoSuchMethodException: it means there is no method with that signature in your ChampEtatOT class (5 parameters instead of 8).
You can refer to this answer => Passing a parameter in a jpql query select
So, try to retrieve all the data, then write a filter method that sets all the etatEntreeSortie values inside the ChampEtatOT objects of the result list.
Clearly the JPQL BNF does permit passing parameters as constructor arguments:
constructor_expression ::= NEW constructor_name ( constructor_item {, constructor_item}* )
constructor_item ::= single_valued_path_expression | scalar_expression | aggregate_expression |
identification_variable
scalar_expression ::= simple_arithmetic_expression | string_primary | enum_primary |
datetime_primary | boolean_primary | case_expression | entity_type_expression
string_primary ::= state_field_path_expression | string_literal |
input_parameter | functions_returning_strings | aggregate_expression | case_expression
That is, a scalar_expression can be a string_primary, which can be an input_parameter. So your JPA provider is not meeting the JPA spec, and you should raise a bug against it.
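To make that concrete, here is a minimal hypothetical example of a parameter used as a constructor item, which the grammar above permits (Dto and MyEntity are illustrative names):
// Per the BNF, :cutoff is a scalar_expression (an input_parameter) and is
// therefore a legal constructor_item; whether it works depends on the provider.
List<Dto> rows = em.createQuery(
        "SELECT NEW com.example.Dto(e.name, :cutoff) FROM MyEntity e", Dto.class)
        .setParameter("cutoff", new java.util.Date(), TemporalType.DATE)
        .getResultList();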
Is it possible to have a RowExpander that is not HTML but rather another row? That is, a row has an expand [+] icon, and when expanded, sub-rows appear like "child rows"?
For example, I have a List<ModelData> like this:
ModelData model1 = new BaseModelData();
model1.set("Date", "11-11-11");
model1.set("Time", "11:11:11");
model1.set("Code", "abcdef");
model1.set("Status", "OK");
ModelData model2 = new BaseModelData();
model2.set("Date", "11-11-11");
model2.set("Time", "12:11:11");
model2.set("Code", "abcdef");
model2.set("Status", "Failed");
ModelData model3 = new BaseModelData();
model3.set("Date", "11-11-11");
model3.set("Time", "13:11:11");
model3.set("Code", "abcedf");
model3.set("Status", "Failed");
ModelData model4 = new BaseModelData();
model4.set("Date", "11-11-11");
model4.set("Time", "14:11:11");
model4.set("Code", "abcdef");
model4.set("Status", "Failed");
List<ModelData> data = ...
data.add(model1);
data.add(model2);
data.add(model3);
data.add(model4);
And this would be rendered in the Grid as follows (grouped by the Code and Status columns):
Date | Time | Code | Status
-------------------------------------
11-11-11 | 11:11:11 | abcedf | OK
[+] 11-11-11 | 12:11:11 | abcedf | Failed
|--->11-11-11 | 13:11:11 | abcedf | Failed
|--->11-11-11 | 14:11:11 | abcedf | Failed
Something like this.
Update:
I was advised that the solution would be to extend the RowExpander class and merge it with the GridView class.
You can take a look at GroupingView and TreeGrid and customize one of them for your purposes. That is much safer than trying to reuse GridView's row-rendering functionality.
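A rough sketch of the TreeGrid direction, using the models from the question; the GXT 2 API names here are from memory, so treat them as assumptions to verify against your version:
// Put the first Failed row in as a parent and the remaining Failed rows as
// its children, then render with a TreeGrid instead of a RowExpander.
TreeStore<ModelData> store = new TreeStore<ModelData>();
store.add(model1, false);          // the OK row, no children
store.add(model2, false);          // first Failed row becomes the parent
store.add(model2, model3, false);  // remaining Failed rows nest under it
store.add(model2, model4, false);

List<ColumnConfig> columns = new ArrayList<ColumnConfig>();
columns.add(new ColumnConfig("Date", "Date", 100));
columns.add(new ColumnConfig("Time", "Time", 100));
columns.add(new ColumnConfig("Code", "Code", 100));
columns.add(new ColumnConfig("Status", "Status", 100));

TreeGrid<ModelData> grid = new TreeGrid<ModelData>(store, new ColumnModel(columns));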