UPDATE-FROM clause in jOOQ throws an exception for CTE field - java

I am trying to convert the following PostgreSQL query into jOOQ:
UPDATE book
SET amount = bat.amount
FROM (
VALUES (2, 136),(5, 75)
) AS bat(book_id, amount)
WHERE book.book_id = bat.book_id;
The VALUES inside the FROM clause are created from a Map<Long, Integer> bookIdsAmountMap parameter, and I am trying to build them this way:
class BookUtilHelper {
    static Table<Record2<Long, Integer>> batTmp(DSLContext dsl, Map<Long, Integer> bookIdAmountMapUpdated) {
        Row2<Long, Integer>[] array = new Row2[bookIdAmountMapUpdated.size()];
        int i = 0;
        for (Map.Entry<Long, Integer> pair : bookIdAmountMapUpdated.entrySet()) {
            array[i] = DSL.row(pair.getKey(), pair.getValue());
            i++;
        }
        Table<Record2<Long, Integer>> batTmp = DSL.values(array);
        batTmp.fields("book_id", "amount");
        return batTmp;
    }
}
Then I also try to create fields that can be accessed as in this example:
Field<Long> bookIdField = DSL.field(DSL.name("bat", "book_id"), Long.class);
Field<Integer> amountField = DSL.field(DSL.name("bat", "amount"), Integer.class);
Table<Record2<Long, Integer>> batTmp = BookUtilHelper.batTmp(dsl, bookIdAmountMapUpdated);
// ctx variable is of type DSLContext
ctx.update(BOOK).set(BOOK.AMOUNT, amountField).from(batTmp.as("bat"))
When I try to update the book, I get the following exception:
column bat.book_id does not exist
Any advice on how to solve this issue would be greatly appreciated. :)

This doesn't have any effect, because it only looks up fields by name and renames nothing:
batTmp.fields("book_id", "amount");
Whereas this only renames the table, not the columns:
batTmp.as("bat")
Write this instead:
batTmp.as("bat", "book_id", "amount")


How to pass array of columns to Spark user defined function in Java?

I have a dynamic set of columns in my Spark dataset, and I want to pass an array of columns to a UDF instead of separate columns. How can I write the UDF so that it accepts an array of columns?
I have tried passing a sequence of strings, but it fails.
static UDF1<Seq<String>, String> udf = new UDF1<Seq<String>, String>() {
    public String call(Seq<String> t1) throws Exception {
        return t1.toString();
    }
};

private static Column generate(Dataset<Row> dataset, SparkSession ss) {
    ss.udf().register("generate", udf, DataTypes.StringType);
    StructField[] columnsStructType = dataset.schema().fields();
    List<Column> columnList = new ArrayList<>();
    for (StructField structField : columnsStructType) {
        columnList.add(dataset.col(structField.name()));
    }
    return functions.callUDF("generate", convertListToSeq(columnList));
}

private static Seq<Column> convertListToSeq(List<Column> inputList) {
    return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
}
I get the following error message when I try to invoke the generate function:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Invalid number of arguments for function generate. Expected: 1; Found: 14;
at org.apache.spark.sql.UDFRegistration.builder$27(UDFRegistration.scala:763)
at org.apache.spark.sql.UDFRegistration.$anonfun$register$377(UDFRegistration.scala:766)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:115)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction(SessionCatalog.scala:1273)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$143.$anonfun$applyOrElse$66(Analyzer.scala:1329)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:1329)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13$$anonfun$applyOrElse$143.applyOrElse(Analyzer.scala:1312)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:256)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:256)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:261)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDown$1(QueryPlan.scala:83)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:105)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:105)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:116)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:121)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike.map(TraversableLike.scala:233)
at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:121)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:126)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:83)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:74)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13.applyOrElse(Analyzer.scala:1312)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$13.applyOrElse(Analyzer.scala:1310)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:90)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:1310)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$.apply(Analyzer.scala:1309)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:87)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:122)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:118)
at scala.collection.immutable.List.foldLeft(List.scala:85)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:76)
at scala.collection.immutable.List.foreach(List.scala:388)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:106)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3407)
at org.apache.spark.sql.Dataset.select(Dataset.scala:1336)
at org.apache.spark.sql.Dataset.withColumns(Dataset.scala:2253)
at org.apache.spark.sql.Dataset.withColumn(Dataset.scala:2220)
In short: you should use the array method to combine your columns into one structure before passing that into your UDF.
This code should work (it's the actual working code after some refactoring).
// The UDF function implementation
static String myFunc(Seq<Object> values) {
    Iterator<Object> iterator = values.iterator();
    while (iterator.hasNext()) {
        Object object = iterator.next();
        // Do something with your column value
    }
    return ...;
}

// UDF registration; `sc` here is the Spark SQL context.
// MyClass is a placeholder for whichever class declares myFunc.
sc.udf().register("myFunc", (UDF1<Seq<Object>, String>) MyClass::myFunc, DataTypes.StringType);

// Calling the UDF; note the `array` method
Dataset<Row> ds = ...;
Seq<Column> columns = JavaConversions.asScalaBuffer(Stream
        .of(ds.schema().fields()) // assumed source of the stream: the dataset's schema fields
        .map(f -> col(f.name()))
        .collect(Collectors.toList()));
ds = ds.withColumn("myColumn", callUDF("myFunc", array(columns)));

Spark - Java UDF returning multiple columns

I'm using Spark SQL 1.6.2 (Java API) and I have to process the following DataFrame, which has a list of values in two of its columns:
ID  AttributeName  AttributeValue
0   [an1,an2,an3]  [av1,av2,av3]
1   [bn1,bn2]      [bv1,bv2]
The desired table is:
ID  AttributeName  AttributeValue
0   an1            av1
0   an2            av2
0   an3            av3
1   bn1            bv1
1   bn2            bv2
I think I have to use a combination of the explode function and a custom UDF function.
I found the following resources:
Explode (transpose?) multiple columns in Spark SQL table
How do I call a UDF on a Spark DataFrame using JAVA?
and I can successfully run an example that reads the two columns and returns the concatenation of the first two strings in a column:
UDF2 combineUDF = new UDF2<Seq<String>, Seq<String>, String>() {
    public String call(final Seq<String> col1, final Seq<String> col2) throws Exception {
        return col1.apply(0) + col2.apply(0);
    }
};
context.udf().register("combineUDF", combineUDF, DataTypes.StringType);
The problem is writing the signature of a UDF that returns two columns (in Java).
As far as I understand, I must define a new StructType like the one shown below and set it as the return type, but so far I haven't managed to get the final code working:
StructType retSchema = new StructType(new StructField[]{
    new StructField("#AttName", DataTypes.StringType, true, Metadata.empty()),
    new StructField("#AttValue", DataTypes.StringType, true, Metadata.empty()),
});
context.udf().register("combineUDF", combineUDF, retSchema);
Any help will be really appreciated.
UPDATE: I'm first trying to implement zip(AttributeName, AttributeValue), so that I then only need to apply the standard explode function in Spark SQL:
ID  AttName_AttValue
0   [[an1,av1],[an2,av2],[an3,av3]]
1   [[bn1,bv1],[bn2,bv2]]
I built the following UDF:
UDF2 combineColumns = new UDF2<Seq<String>, Seq<String>, List<List<String>>>() {
    public List<List<String>> call(final Seq<String> col1, final Seq<String> col2) throws Exception {
        List<List<String>> zipped = new LinkedList<>();
        for (int i = 0, listSize = col1.size(); i < listSize; i++) {
            List<String> subRow = Arrays.asList(col1.apply(i), col2.apply(i));
            zipped.add(subRow);
        }
        return zipped;
    }
};
But when I run the code
myDF.select(callUDF("combineColumns", col("AttributeName"), col("AttributeValue"))).show(10);
I got the following error message:
scala.MatchError: [[an1,av1],[an2,av2],[an3,av3]] (of class java.util.LinkedList)
It looks like the combining was performed correctly, but the returned Java type is not the one Scala expects.
Any help?
Finally I managed to get the result I was looking for, though probably not in the most efficient way.
Basically there are 2 steps:
Zip the two lists
Explode the list into rows
For the first step I defined the following UDF:
UDF2 concatItems = new UDF2<Seq<String>, Seq<String>, Seq<String>>() {
    public Seq<String> call(final Seq<String> col1, final Seq<String> col2) throws Exception {
        ArrayList<String> zipped = new ArrayList<>();
        for (int i = 0, listSize = col1.size(); i < listSize; i++) {
            String subRow = col1.apply(i) + ";" + col2.apply(i);
            zipped.add(subRow);
        }
        return scala.collection.JavaConversions.asScalaBuffer(zipped);
    }
};
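The UDF also has to be registered before it can be called (here assuming the same context variable as in the question):
context.udf().register("concatItems", concatItems, DataTypes.createArrayType(DataTypes.StringType));
and then I called it with the following code: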
DataFrame df2 = df.select(col("ID"), callUDF("concatItems", col("AttributeName"), col("AttributeValue")).alias("AttName_AttValue"));
At this stage df2 looks like this:
ID  AttName_AttValue
0   [[an1,av1],[an2,av2],[an3,av3]]
1   [[bn1,bv1],[bn2,bv2]]
Then I used the explode function to turn the list into rows:
DataFrame df3 = df2.select(col("ID"),explode(col("AttName_AttValue")).alias("AttName_AttValue_row"));
At this stage df3 looks like this:
ID  AttName_AttValue_row
0   [an1,av1]
0   [an2,av2]
0   [an3,av3]
1   [bn1,bv1]
1   [bn2,bv2]
Finally to split the attribute name and value into two different columns, I converted the DataFrame into a JavaRDD in order to use the map function:
JavaRDD<Row> df3RDD = df3.toJavaRDD().map(
    (Function<Row, Row>) myRow -> {
        // concatItems joined name and value with ";", so split on the same separator
        String[] info = String.valueOf(myRow.get(1)).split(";");
        return RowFactory.create(myRow.get(0), info[0], info[1]);
    });
If anybody has a better solution feel free to comment.
I hope it helps.
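The two steps described above (zip the two lists, then explode the result into rows) can be illustrated without Spark. The following is a minimal plain-Java sketch, not the Spark code from this answer; the class ZipExplodeSketch and the method zipAndExplode are hypothetical names, and it assumes that unmatched trailing elements of the longer list can simply be ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipExplodeSketch {

    // Hypothetical helper: pairs names.get(i) with values.get(i) and emits
    // one {id, name, value} row per pair, mirroring zip followed by explode.
    static List<String[]> zipAndExplode(String id, List<String> names, List<String> values) {
        List<String[]> rows = new ArrayList<>();
        int size = Math.min(names.size(), values.size()); // assumption: ignore any unmatched tail
        for (int i = 0; i < size; i++) {
            rows.add(new String[] {id, names.get(i), values.get(i)});
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.addAll(zipAndExplode("0", Arrays.asList("an1", "an2", "an3"), Arrays.asList("av1", "av2", "av3")));
        rows.addAll(zipAndExplode("1", Arrays.asList("bn1", "bn2"), Arrays.asList("bv1", "bv2")));
        // Prints one "ID name value" line per exploded row.
        for (String[] row : rows) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Keeping name and value as separate elements from the start avoids the join-then-split round trip through a separator character, which breaks as soon as a value contains that separator.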

How can I get items by DynamoDBIndexHashKey?

To be specific: can I retrieve items using only a DynamoDBIndexHashKey, without using the DynamoDBHashKey?
I have a table with the field
@DynamoDBIndexHashKey (attributeName = "count", globalSecondaryIndexName = "count-index")
I have no @DynamoDBIndexRangeKey.
Here is the code:
MyEntity myEntity = new MyEntity();
myEntity.setCount(1); // Integer
DynamoDBQueryExpression<NewsDynamoDb> queryExpression = new DynamoDBQueryExpression<NewsDynamoDb>()
List<MyEntity> myCollection = mapper.query(MyEntity.class, queryExpression);
AmazonServiceException: Status Code: 400, AWS Service: AmazonDynamoDBv2, AWS Request ID: I97S04LDGO6FSF56OCJ8S3K167VV4KQNSO5AEMVJF66Q9ASUAAJG, AWS Error Code: ValidationException, AWS Error Message: One or more parameter values were invalid: Invalid number of argument(s) for the EQ ComparisonOperator
How can I get items by DynamoDBIndexHashKey?
P.S. Scan works, but it is not an option for me, because later on I want sorting.
A query with DynamoDBHashKey works. I only have problems with DynamoDBIndexHashKey.
same example
This is the answer to my question:
public Integer getCluster_output_Id() {
    return cluster_output_Id;
}
@DynamoDBIndexHashKey(attributeName = "count", globalSecondaryIndexName = "count-index")
public Integer getCount() {
    return count;
}
@DynamoDBIndexRangeKey(attributeName = "last_fetch", globalSecondaryIndexName = "count-index")
public Date getLast_fetch() {
    return last_fetch;
}
dynamoDBMapper = new DynamoDBMapper(amazonDynamoDBClient);
MyClass myClass= new MyClass();
DynamoDBQueryExpression<MyClass > queryExpression = new DynamoDBQueryExpression<MyClass >();
queryExpression.withIndexName("count-index"); // not strictly necessary
Condition rangeKeyCondition = new Condition();
.withAttributeValueList(new AttributeValue().withS(""));
List entities = dynamoDBMapper.query(MyClass.class, queryExpression);
Thank you!
As explained here:
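Table table = dynamoDB.getTable("tableName");
Index index = table.getIndex("count-index");
ItemCollection<QueryOutcome> items = null;
QuerySpec querySpec = new QuerySpec();
// "count" is a DynamoDB reserved word, so it is referenced through the placeholder #c
querySpec.withKeyConditionExpression("#c = :v_count")
        .withNameMap(new NameMap().with("#c", "count"))
        .withValueMap(new ValueMap().withInt(":v_count", 1));
items = index.query(querySpec);
Iterator<Item> iterator = items.iterator();
while (iterator.hasNext()) {
    Item item = iterator.next();
    // process the item
}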
You cannot use Query to find items based on sort/range key only.
You can read more here.
In a Query operation, you use the KeyConditionExpression parameter to determine the items to be read from the table or index. You must specify the partition key name and value as an equality condition. You can optionally provide a second condition for the sort key (if present).
In this case your options are:
Scan operation with last_fetch as filter.
Redesign your database to have a GSI with last_fetch as partition key

How can I update a row in the database by iterating through a StringBuilder?

I am trying to update my database according to the data in "replace". replace holds a few columns of data, and I want to update those columns in my database table abcd accordingly.
But when I run this code, only the last column (that is, the last piece of data) gets updated in the DB, so I guess the iteration is not right.
Please help me; I appreciate your suggestions.
private static void updateDB(HashMap<String, HashMap<Integer, String>> map) throws ConfigException, SQLException {
MConfig config = ScriptsTools.init();
ConnPool select = ScriptsTools.openPool("database", config);
Connection write = select.getWrite();
StringBuilder replace = new StringBuilder();
replace.append("REPLACE INTO abcd (a,b,c,d,e) values ");
Iterator<Entry<String, HashMap<Integer, String>>> it = map.entrySet().iterator();
while (it.hasNext()) {
Map.Entry<String, HashMap<Integer, String>> pair = it.next();
replace.append("), ");
replace.delete(replace.length()-2, replace.length());
//System.out.println("Query : "+replace.toString());
String words = replace.toString();
Statement st = write.createStatement();
Because you use a Statement instead of a PreparedStatement, it is easy to print the query that will be executed on the database, using the following command:
System.out.println("Query : " + replace.toString());
Take the output, execute it in your preferred database client, and check whether it works as expected.
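One way to check what the loop should produce is to build the statement from the map in isolation. The sketch below is plain Java under stated assumptions: the outer map key is taken to be column a, and the inner map is assumed to hold columns b to e under the keys 1 to 4 (the question does not say how the map is laid out). The names ReplaceStatementSketch, buildSql and bindValues are hypothetical. It emits one (?,?,?,?,?) group per map entry, so every entry ends up in the statement, and the values are meant to be bound through a PreparedStatement rather than concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplaceStatementSketch {

    // Builds a multi-row REPLACE statement with one placeholder group per row.
    static String buildSql(int rowCount) {
        StringBuilder sql = new StringBuilder("REPLACE INTO abcd (a,b,c,d,e) VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) {
                sql.append(", "); // separator goes between groups, so nothing has to be deleted afterwards
            }
            sql.append("(?,?,?,?,?)");
        }
        return sql.toString();
    }

    // Flattens the map into bind values, in the same order as the placeholders.
    // Assumption: outer key = column a, inner keys 1..4 = columns b..e.
    static List<String> bindValues(Map<String, Map<Integer, String>> map) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, String>> row : map.entrySet()) {
            values.add(row.getKey());
            for (int column = 1; column <= 4; column++) {
                values.add(row.getValue().get(column));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, String>> map = new LinkedHashMap<>();
        for (String key : new String[] {"row1", "row2"}) {
            Map<Integer, String> columns = new LinkedHashMap<>();
            for (int column = 1; column <= 4; column++) {
                columns.put(column, key + "-col" + column);
            }
            map.put(key, columns);
        }
        System.out.println(buildSql(map.size()));
        System.out.println(bindValues(map));
    }
}
```

With a real connection, each element of the bind list would be passed to PreparedStatement.setString using its 1-based position before calling executeUpdate.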

Query HBase table by key using RowFilter not working

I have an HBase table (accessed from Java) and I want to query the table by a list of keys. I did the following, but it's not working.
mFilterFeatureIt = mFeatureSet.iterator();
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
while (mFilterFeatureIt.hasNext()) {
    long myfeatureId = mFilterFeatureIt.next();
    System.out.println("FeatureId:" + myfeatureId + " , ");
    RowFilter filter = new RowFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(myfeatureId)));
    filterList.addFilter(filter);
}
outputMap = HbaseUtils.getHbaseData("mytable", filterList);
System.out.println("Size of outputMap map:" + outputMap.size());
public static Map<String, Map<String, String>> getHbaseData(String table, FilterList filter) {
    Map<String, Map<String, String>> data = new HashMap<String, Map<String, String>>();
    HTable htable = null;
    try {
        htable = new HTable(HTableConfiguration.getHTableConfiguration(), table);
        Scan scan = new Scan();
        scan.setFilter(filter);
        ResultScanner resultScanner = htable.getScanner(scan);
        Iterator<Result> results = resultScanner.iterator();
        while (results.hasNext()) {
            Result result = results.next();
            String rowId = Bytes.toString(result.getRow());
            List<KeyValue> columns = result.list();
            if (null != columns) {
                HashMap<String, String> colData = new HashMap<String, String>();
                for (KeyValue column : columns) {
                    colData.put(Bytes.toString(column.getFamily()) + ":" + Bytes.toString(column.getQualifier()), Bytes.toString(column.getValue()));
                }
                data.put(rowId, colData);
            }
        }
    } catch (IOException e) {
    } finally {
        if (htable != null)
            try {
                htable.close();
            } catch (IOException e) {
            }
    }
    return data;
}
FeatureId:80515900 ,
FeatureId:80515901 ,
FeatureId:80515902 ,
Size of outputMap map: 0
I can see that the feature IDs are the ones I want, but I always get the above output even though the keys are present in the HBase table. Can anyone tell me what I am doing wrong?
I have also posted the code of my HBase util method above, so that you can point out any bugs there.
I am trying to do the equivalent of the SQL query select * FROM mytable where featureId in (80515900, 80515901, 80515902). My idea for achieving this in HBase was to create a filter list with one filter for each featureId. Is that correct?
Here is the content of my table:
scan 'mytable', {COLUMNS => ['sample:tag_count'] }
80515900 column=sample:tag_count, timestamp=1339304052748, value=4
80515901 column=sample:tag_count, timestamp=1339304052748, value=0
80515902 column=sample:tag_count, timestamp=1339304052748, value=3
80515903 column=sample:tag_count, timestamp=1339304052748, value=1
80515904 column=sample:tag_count, timestamp=1339304052748, value=2
It's not returning any data because the key was inserted into HBase as a String (see your scan result), while the value passed to the RowFilter when fetching is a long. Use this filter instead:
RowFilter filter = new RowFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(String.valueOf(myfeatureId))));
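The mismatch can be reproduced without HBase. As far as I know, Bytes.toBytes(long) writes the value as 8 big-endian bytes, while Bytes.toBytes(String) writes the UTF-8 bytes of the text. The sketch below imitates both with the JDK only (the class RowKeyEncodingSketch and its two methods are hypothetical names, not HBase API) and shows that the two encodings of the same ID differ, which is why a binary comparison against a String row key never matches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyEncodingSketch {

    // Stand-in for Bytes.toBytes(long): 8 bytes, big-endian (ByteBuffer's default order).
    static byte[] longKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Stand-in for Bytes.toBytes(String): the UTF-8 bytes of the decimal text.
    static byte[] stringKey(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long id = 80515900L;
        System.out.println("as long:   " + Arrays.toString(longKey(id)));
        System.out.println("as string: " + Arrays.toString(stringKey(id)));
        System.out.println("equal: " + Arrays.equals(longKey(id), stringKey(id))); // prints "equal: false"
    }
}
```

For this particular ID both arrays happen to be 8 bytes long, so comparing lengths alone would not reveal the problem; the contents are what differ.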
The while loop always creates a new filter and adds it to the filter list, so the list ends up containing all the keys.
Such a filter can never apply to a single row. Create only one filter in the while loop, for a known "myfeatureId".
while (mFilterFeatureIt.hasNext()) {
    long myfeatureId = mFilterFeatureIt.next();
    System.out.println("FeatureId:" + myfeatureId + " , ");
    if (myfeatureId == 80515902L) {
        RowFilter filter = new RowFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(myfeatureId)));
        filterList.addFilter(filter);
    }
}
The query, not HBase, is responsible for the number of rows returned.
HBase filters
Filters push row selection criteria out to HBase. Rows can be filtered remotely and in parallel. Using filters helps you avoid sending rows to the client that are not needed.
To match on part of the key, for example to get all rows from 80515900 to 80515909, try this.
First, remove this line from the loop:
RowFilter filter = new RowFilter(CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(myfeatureId)) );
Then add the following above the line outputMap = HbaseUtils.getHbaseData("mytable", filterList);
RowFilter filter = new RowFilter(CompareOp.EQUAL, new SubstringComparator("8051590"));
filterList.addFilter(filter);
outputMap = HbaseUtils.getHbaseData("mytable", filterList);

