I am trying to write a query in QueryDSL to fetch, for each parentId, the row with the highest revision from a table.
The SQL equivalent should be:
SELECT a.* FROM child a
INNER JOIN
(
SELECT parentId, MAX(revision) FROM child GROUP BY parentId
) b
ON (
a.parentId = b.parentId AND a.revision = b.revision
)
Now in QueryDSL I'm stuck with the syntax.
JPQLQuery<Tuple> subquery = JPAExpressions
.select(child.parent, child.revision.max())
.from(child)
.groupBy(child.parent);
HibernateQuery<Child> query = new HibernateQuery<>(session);
query.from(child)
.where(child.parent.eq(subquery.???).and(child.revision.eq(subquery.???))));
How do you write this query using a subquery?
The tables look like this:
___parent___ (not used in this query, but exists)
parentId
P1 | *
P2 | *
P3 | *
___child___
parentId | revision
P1 | 1 | *
P1 | 2 | *
P1 | 3 | *
P2 | 2 | *
P2 | 3 | *
P3 | 1 | *
___result from child, the highest revision for each parentId___
P1 | 3 | *
P2 | 3 | *
P3 | 1 | *
What I've tried so far:
.where(JPAExpressions.select(child.parent,child.revision).eq(subquery));
-> org.hibernate.hql.internal.ast.QuerySyntaxException: unexpected end of subtree
and many syntax errors ...
I use a dirty loop, for now, since I haven't found a solution yet.
You can use Expressions.list() to specify more than one column for the in clause:
query.from(child).where(Expressions.list(child.parent, child.revision).in(subquery));
The alternative is to use innerJoin(), as in your original SQL.
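Put together with the subquery from your question, a minimal sketch could look like this (assuming the generated QChild metamodel; sub is a separate alias used only inside the subquery):
QChild child = QChild.child;
QChild sub = new QChild("sub");   // separate alias for the subquery

JPQLQuery<Tuple> subquery = JPAExpressions
        .select(sub.parent, sub.revision.max())
        .from(sub)
        .groupBy(sub.parent);

HibernateQuery<Child> query = new HibernateQuery<>(session);
List<Child> latestPerParent = query.from(child)
        .select(child)
        .where(Expressions.list(child.parent, child.revision).in(subquery))
        .fetch();
This keeps the grouping in the subquery and matches the outer rows on the (parentId, revision) pair, just like the INNER JOIN in the original SQL.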
Expressions.list(ENTITY.year, ENTITY.week).in(
    Expressions.list(Expressions.constant(1029), Expressions.constant(1)),
    Expressions.list(Expressions.constant(1030), Expressions.constant(1)),
    Expressions.list(Expressions.constant(1031), Expressions.constant(1)))
would be what you are looking for, but QueryDSL generates incorrect SQL from it:
((p0_.year , p0_.week) in (1029 , 1 , (1030 , 1) , (1031 , 1)))
In JPA, subqueries can appear only in the WHERE and HAVING clauses.
Here is my take on your query:
// child2 is a second alias for the same entity, e.g. QChild child2 = new QChild("child2");
// select(...) here refers to JPAExpressions.select (static import).
select(child).from(child).where(child.revision.eq(
        select(child2.revision.max())
                .from(child2)
                .where(child2.parent.eq(child.parent))
                .groupBy(child2.parent)))
        .fetch()
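Adapted to the HibernateQuery/JPAExpressions style from the question, the same correlated subquery could be written roughly like this (a sketch; child2 is just a second alias for the Child entity):
QChild child = QChild.child;
QChild child2 = new QChild("child2");   // second alias used inside the subquery

HibernateQuery<Child> query = new HibernateQuery<>(session);
List<Child> latestPerParent = query.from(child)
        .select(child)
        .where(child.revision.eq(
                JPAExpressions.select(child2.revision.max())
                        .from(child2)
                        .where(child2.parent.eq(child.parent))
                        .groupBy(child2.parent)))
        .fetch();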
Building off of JRA_TLL's answer: nested use of Expressions.list() is, per the QueryDSL maintainers, not supported. Choice quote:
This is not really a bug, is just improper use of QueryDSL's internal list expressions for tuples.
[...]
The solution is quite simple: don't use the list expression for this. Use a Template expression instead.
Here's a version of JRA_TLL's answer with the Template paradigm recommended by the maintainers, which does the right thing:
public static SimpleExpression<Object> tuple(Expression<?>... args) {
return Expressions.template(Object.class, "({0}, {1})", args);
}
// ...
Expressions.list(ENTITY.year, ENTITY.week).in(
tuple(Expressions.constant(1029), Expressions.constant(1)),
tuple(Expressions.constant(1030), Expressions.constant(1)),
tuple(Expressions.constant(1031), Expressions.constant(1)));
This should generate the right SQL:
(p0_.year , p0_.week) in ((1029 , 1) , (1030 , 1) , (1031 , 1))
Note that this kind of query construction doesn't work on every database; for instance, it is valid in PostgreSQL, but not in Microsoft SQL Server.
I am developing an app for ordering products online. I will first present part of the ER diagram and then explain.
Here the "products" are food items, and they go into the fresh_products table. As you can see, a fresh_product consists of product species, category, type, size and product grade.
When I place an order, the order id and order_status are saved in the order table. All the ordered items are saved in the order_item table. In the order_item table, you can see I have a direct connection to fresh_products.
At certain times, management will decide to change the existing fresh_products. For example, let's take a fish product such as Tuna. This fish item has a category called Loin (fished with a loin); after some time, management may decide to remove or rename it because they no longer fish with a loin.
Now my design is affected by this change. Why? When you make a change to any of the fresh_product fields, it directly affects the order_item rows that hold the information about already ordered items. So if you rename Loin, all order_items that have Loin will now be linked to the new name, which is wrong. You can't delete a fresh_product either, because existing order history is bound to it.
My Suggestion
As a solution to this problem, I am thinking of removing the relationship between fresh_products and order_item. Instead, I will add String fields to order_item representing all the fields of fresh_products, for example productType, productCategory and so on.
Now I don't have a direct connection to fresh_products, but I still have all the information needed. At the same time, any fresh_product item can undergo any change without affecting the already purchased items in order_item.
My question is: is my suggestion the best way to solve this issue? If you have a better solution, I am open to it.
Consider the following; in this example, only the product name changes, but you could easily add columns for each attribute (although at some point this would move from the sublime to the ridiculous)...
Also, I rarely use correlated subqueries, so apologies if there's an error there...
create table product_history
(product_id int not null
,product_name varchar(12) not null
,date date not null
,primary key (product_id,date)
);
insert into product_history values
(1,'soda','2020-01-01'),
(1,'pop','2020-01-04'),
(1,'cola','2020-01-07');
create table order_detail
(order_id int not null
,product_id int not null
,quantity int not null default 1
,primary key(order_id,product_id)
);
insert into order_detail values
(100,1,4);
create table orders
(order_id serial primary key
,customer_id int not null
,order_date date not null
);
insert into orders values
(100,22,'2020-01-05');
SELECT o.order_id
, o.customer_id
, o.order_date
, ph.product_id
, ph.product_name
, od.quantity
FROM orders o
JOIN order_detail od
ON od.order_id = o.order_id
JOIN product_history ph
ON ph.product_id = od.product_id
WHERE ph.date IN
( SELECT MAX(x.date)
FROM product_history x
WHERE x.product_id = ph.product_id
AND x.date <= o.order_date
);
| order_id | customer_id | order_date | product_id | product_name | quantity |
| -------- | ----------- | ---------- | ---------- | ------------ | -------- |
| 100 | 22 | 2020-01-05 | 1 | pop | 4 |
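If the tables are mapped with JPA, the same point-in-time lookup can be expressed as a correlated subquery in JPQL. A rough sketch with hypothetical entity and field names (PurchaseOrder, OrderDetail, ProductHistory), mirroring the SQL above:
// The correlated subquery picks the product_history row that was current
// at the time of the order, exactly like the SQL version.
List<Object[]> rows = entityManager.createQuery(
        "SELECT o.orderId, o.customerId, o.orderDate, ph.productId, ph.productName, od.quantity " +
        "FROM PurchaseOrder o, OrderDetail od, ProductHistory ph " +
        "WHERE od.orderId = o.orderId " +
        "  AND ph.productId = od.productId " +
        "  AND ph.date = (SELECT MAX(x.date) FROM ProductHistory x " +
        "                 WHERE x.productId = ph.productId AND x.date <= o.orderDate)",
        Object[].class).getResultList();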
I need to check from Java if a certain user has at least one group membership. In Oracle (12, by the way) there is a big table that looks like this:
DocId | Group
-----------------
1 | Group-A
1 | Group-E
1 | Group-Z
2 | Group-A
3 | Group-B
3 | Group-W
In Java I have this information:
docId = 1
listOfUsersGroups = { "Group-G", "Group-A", "Group-Of-Something-Totally-Different" }
I have seen solutions like this, but this is not the approach I want to go for. I would like to do something like this (I know this is incorrect syntax) ...
SELECT * FROM PERMSTABLE WHERE DOCID = 1 AND ('Group-G', 'Group-A', 'Group-Of-Something-Totally-Different' ) HASATLEASTONE OF Group
... and not use any temporary SQL INSERTs. The outcome should be that after executing this query I know that my user has a match because he is member of Group-A.
You can do this (using an IN condition):
SELECT * FROM PERMSTABLE WHERE DocId = 1 AND Group IN
('Group-G', 'Group-A', 'Group-Of-Something-Totally-Different')
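From the Java side you can then build one bind placeholder per group and let the IN condition do the work. A minimal JDBC sketch (PERMSTABLE and DOCID come from the question; note that GROUP is a reserved word in Oracle, so the column would likely need quoting or a different name):
import java.sql.*;
import java.util.Collections;
import java.util.List;

public class GroupCheck {

    /** Returns true if the document is visible to at least one of the user's groups. */
    public static boolean hasAnyGroup(Connection con, long docId, List<String> groups)
            throws SQLException {
        // One "?" per group value, e.g. (?,?,?)
        String placeholders = String.join(",", Collections.nCopies(groups.size(), "?"));
        String sql = "SELECT 1 FROM PERMSTABLE WHERE DOCID = ? AND \"GROUP\" IN (" + placeholders + ")";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, docId);
            for (int i = 0; i < groups.size(); i++) {
                ps.setString(i + 2, groups.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();   // any row means the user is a member of at least one group
            }
        }
    }
}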
I use Vlad Mihalcea's library in order to map SQL arrays (PostgreSQL in my case) to JPA. Let's imagine I have an Entity, e.g.:
@TypeDefs(
    {@TypeDef(name = "string-array", typeClass = StringArrayType.class)}
)
@Entity
public class Entity {

    @Type(type = "string-array")
    @Column(columnDefinition = "text[]")
    private String[] tags;
}
The appropriate SQL is:
CREATE TABLE entity (
tags text[]
);
Using QueryDSL, I'd like to fetch the rows whose tags contain all of the given ones. The raw SQL could be:
SELECT * FROM entity WHERE tags @> '{"someTag","anotherTag"}'::text[];
(taken from: https://www.postgresql.org/docs/9.1/static/functions-array.html)
Is it possible to do this with QueryDSL? Something like the code below?
predicate.and(entity.tags.eqAll(<whatever>));
The 1st step is to generate the proper SQL: WHERE tags @> '{"someTag","anotherTag"}'::text[];
The 2nd step is described by coladict (thanks a lot!): figure out which functions are behind the operators: @> is arraycontains, and the '...'::text[] literal can be built with string_to_array.
The 3rd step is to call them properly. After hours of debugging I figured out that HQL doesn't treat functions as functions unless they are part of an expression (in my case: ... = true), so the final solution looks like this:
predicate.and(
Expressions.booleanTemplate("arraycontains({0}, string_to_array({1}, ',')) = true",
entity.tags,
tagsStr)
);
where tagsStr is a String with the values separated by commas.
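For context, a usage sketch of that predicate (QEntity is assumed to be the QueryDSL metamodel generated for the Entity class above, and tagsStr is built by joining the tags with commas):
QEntity entity = QEntity.entity;
String tagsStr = String.join(",", "someTag", "anotherTag");

BooleanBuilder predicate = new BooleanBuilder();
predicate.and(
        Expressions.booleanTemplate("arraycontains({0}, string_to_array({1}, ',')) = true",
                entity.tags, tagsStr));

List<Entity> matching = new JPAQueryFactory(entityManager)
        .selectFrom(entity)
        .where(predicate)
        .fetch();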
Since you can't use custom operators, you will have to use their functional equivalents. You can look them up in the psql console with \doS+. For \doS+ @> we get several results, but this is the one you want:
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+---------------------+-------------
pg_catalog | @> | anyarray | anyarray | boolean | arraycontains | contains
It tells us the function used is called arraycontains, so now we look up that function to see its parameters using \df arraycontains
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------------+------------------+---------------------+--------
pg_catalog | arraycontains | boolean | anyarray, anyarray | normal
From here, we transform the target query you're aiming for into:
SELECT * FROM entity WHERE arraycontains(tags, '{"someTag","anotherTag"}'::text[]);
You should then be able to use the builder's function call to create this condition.
// assuming a CriteriaBuilder cb and a Root<Entity> root
ParameterExpression<String[]> tags = cb.parameter(String[].class);
Expression<Boolean> tagcheck = cb.function("arraycontains", Boolean.class,
        root.get(Entity_.tags), tags);
Though I use a different array solution (might publish soon), I believe it should work, unless there are bugs in the underlying implementation.
An alternative to this method would be to compile the escaped string form of the array and pass it as the second parameter. It's easier to print if you don't treat the double quotes as optional. In that case, you have to replace String[] with String in the ParameterExpression line above.
For EclipseLink I created a function
CREATE OR REPLACE FUNCTION check_array(array_val text[], string_comma character varying ) RETURNS bool AS $$
BEGIN
RETURN arraycontains(array_val, string_to_array(string_comma, ','));
END;
$$ LANGUAGE plpgsql;
As pointed out by Serhii, you can then use Expressions.booleanTemplate("FUNCTION('check_array', {0}, {1}) = true", entity.tags, tagsStr)
I'm currently using JPQL (the latest version, I think) with a PostgreSQL database, and I'm trying to get the latest objects grouped by some (but not all) columns.
Let's say I have a table MyTable like this:
id | name  | color  | shape  | date_of_creation | other_stuff
---+-------+--------+--------+------------------+-------------
01 | apple | red    | circle | 30/12/2015       | stuff
02 | apple | red    | circle | 15/12/2015       | somestuff
03 | apple | green  | circle | 01/12/2015       | anotherstuff
04 | box   | orange | cubic  | 13/12/2015       | blahblah1
05 | box   | orange | cubic  | 25/12/2015       | blahblah2
And I only want the latest grouped by name, color and shape like this:
id | name  | color  | shape  | date_of_creation | other_stuff
---+-------+--------+--------+------------------+-------------
01 | apple | red    | circle | 30/12/2015       | stuff
05 | box   | orange | cubic  | 25/12/2015       | blahblah2
Let's assume that the Java equivalent is MyClass, with the same fields.
My JPQL query would look something like this:
SELECT m FROM MyClass m GROUP BY m.name, m.color, m.shape ORDER BY m.date_of_creation;
But this does not work, because "date_of_creation" must then appear in the "GROUP BY" clause, which makes no sense: I don't want to group by date_of_creation. I tried using a nested select, but then I have the same problem with the id.
I found out about the greatest-n-per-group problem (SQL Select only rows with Max Value on a Column) from SO's title suggestions after originally typing this question, so I created a named query like this:
SELECT m FROM Myclass LEFT JOIN m.id mleft WHERE
m.name = mleft.name AND
m.color = mleft.color AND
m.shape = mleft.shape AND
m.date_of_creation < mleft.date_of_creation AND
mleft.id = null
This throws a java.lang.NullPointerException at org.hibernate.hql.internal.ast.HqlSqlWalker.createFromJoinElement(HqlSqlWalker.java:395), or simply does not work (no error, nothing). I think it's because the field "id" does not have a relationship with itself. And using an inner join with a select inside does not seem to work in JPQL:
SELECT m FROM MyClass INNER JOIN (SELECT mInner from Myclass GROUP BY m.name etc.)
Maybe I'm missing something and I'll have to build a much more complicated query, but JPQL doesn't seem to be powerful enough, or I don't know enough of it.
Also, I don't really feel like calling query.getResultList() and doing the work in Java...
Any tips are welcome, thanks in advance.
You can't use subqueries in the FROM clause in JPQL; only the SELECT and WHERE clauses allow them. Try something like this:
SELECT m
FROM Myclass m
WHERE m.date_of_creation IN ( SELECT MAX(mm.date_of_creation)
                              FROM Myclass mm
                              WHERE mm.name = m.name
                                AND mm.color = m.color
                                AND mm.shape = m.shape)
P.S. Code not tested
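For completeness, a sketch of how that query might be executed from Java (MyClass and the field names are taken from the question; not tested either):
List<MyClass> latestPerGroup = entityManager.createQuery(
        "SELECT m FROM MyClass m " +
        "WHERE m.date_of_creation IN (" +
        "  SELECT MAX(mm.date_of_creation) FROM MyClass mm " +
        "  WHERE mm.name = m.name AND mm.color = m.color AND mm.shape = m.shape)",
        MyClass.class).getResultList();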
The use case that we are working to solve with Cassandra is this: We need to retrieve a list of entity UUIDs that have been updated within a certain time range within the last 90 days. Imagine that we're building a document tracking system, so our relevant entity is a Document, whose key is a UUID.
The query we need to support in this use case is: Find all Document UUIDs that have changed between StartDateTime and EndDateTime.
Question 1: What's the best Cassandra table design to support this query?
I think the answer is as follows:
CREATE TABLE document_change_events (
event_uuid TIMEUUID,
document_uuid uuid,
PRIMARY KEY ((event_uuid), document_uuid)
) WITH default_time_to_live='7776000';
And given that we can't do range queries on partition keys, we'd need to use the token() method. As such the query would then be:
SELECT document_uuid FROM document_change_events
WHERE token(event_uuid) > token(minTimeuuid(?))
AND token(event_uuid) < token(maxTimeuuid(?))
For example:
SELECT document_uuid FROM document_change_events
WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00+0000'))
AND token(event_uuid) < token(maxTimeuuid('2015-05-20 00:00+0000'))
Question 2: I can't seem to get the following Java code, using DataStax's driver, to reliably return the correct results.
If I run the following code 10 times, pausing 30 seconds between runs, I will then have 10 rows in this table:
private void addEvent() {
String cql = "INSERT INTO document_change_events (event_uuid, document_uuid) VALUES(?,?)";
PreparedStatement preparedStatement = cassandraSession.prepare(cql);
BoundStatement boundStatement = new BoundStatement(preparedStatement);
boundStatement.setConsistencyLevel(ConsistencyLevel.ANY);
boundStatement.setUUID("event_uuid", UUIDs.timeBased());
boundStatement.setUUID("document_uuid", UUIDs.random());
cassandraSession.execute(boundStatement);
}
Here are the results:
cqlsh:> select event_uuid, dateOf(event_uuid), document_uuid from document_change_events;
event_uuid | dateOf(event_uuid) | document_uuid
--------------------------------------+--------------------------+--------------------------------------
414decc0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:51:09-0500 | 92b6fb6a-9ded-47b0-a91c-68c63f45d338
9abb4be0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:53:39-0500 | 548b320a-10f6-409f-a921-d4a1170a576e
6512b960-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:52:09-0500 | 970e5e77-1e07-40ea-870a-84637c9fc280
53307a20-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:51:39-0500 | 11b4a49c-b73d-4c8d-9f88-078a6f303167
ac9e0050-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:54:10-0500 | b29e7915-7c17-4900-b784-8ac24e9e72e2
88d7fb30-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:53:09-0500 | c8188b73-1b97-4b32-a897-7facdeecea35
0ba5cf70-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:49:39-0500 | a079b30f-be80-4a99-ae0e-a784d82f0432
76f56dd0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:52:39-0500 | 3b593ca6-220c-4a8b-8c16-27dc1fb5adde
1d88f910-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:50:09-0500 | ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
2f6b3850-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:50:39-0500 | db42271b-04f2-45d1-9ae7-0c8f9371a4db
(10 rows)
But if I then run this code:
private static void retrieveEvents(Instant startInstant, Instant endInstant) {
String cql = "SELECT document_uuid FROM document_change_events " +
"WHERE token(event_uuid) > token(?) AND token(event_uuid) < token(?)";
PreparedStatement preparedStatement = cassandraSession.prepare(cql);
BoundStatement boundStatement = new BoundStatement(preparedStatement);
boundStatement.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
boundStatement.bind(UUIDs.startOf(Date.from(startInstant).getTime()),
UUIDs.endOf(Date.from(endInstant).getTime()));
ResultSet resultSet = cassandraSession.execute(boundStatement);
if (resultSet == null) {
System.out.println("None found.");
return;
}
while (!resultSet.isExhausted()) {
System.out.println(resultSet.one().getUUID("document_uuid"));
}
}
It only retrieves three results:
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
Why didn't it retrieve all 10 results? And what do I need to change to achieve the correct results to support this use case?
For reference, I've tested this against dsc-2.1.1, dse-4.6 and using the DataStax Java Driver v2.1.6.
First of all, please only ask one question at a time. Both of your questions here could easily stand on their own. I know these are related, but it just makes the readers come down with a case of tl;dr.
I'll answer your 2nd question first, because the answer ties into a fundamental understanding that is central to getting the data model correct. When I INSERT your rows and run the following query, this is what I get:
aploetz@cqlsh:stackoverflow2> SELECT document_uuid FROM document_change_events
WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00-0500'))
AND token(event_uuid) < token(maxTimeuuid('2015-05-22 00:00-0500'));
document_uuid
--------------------------------------
a079b30f-be80-4a99-ae0e-a784d82f0432
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
(4 rows)
Which is similar to what you are seeing. Why didn't that return all 10? Well, the answer becomes apparent when I include token(event_uuid) in my SELECT:
aploetz@cqlsh:stackoverflow2> SELECT token(event_uuid),document_uuid FROM document_change_events WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00-0500')) AND token(event_uuid) < token(maxTimeuuid('2015-05-22 00:00-0500'));
token(event_uuid) | document_uuid
----------------------+--------------------------------------
-2112897298583224342 | a079b30f-be80-4a99-ae0e-a784d82f0432
2990331690803078123 | 3b593ca6-220c-4a8b-8c16-27dc1fb5adde
5049638908563824288 | ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
5577339174953240576 | db42271b-04f2-45d1-9ae7-0c8f9371a4db
(4 rows)
Cassandra stores partition keys (event_uuid in your case) in order of their hashed token values, which is what you see when you use the token function. Cassandra generates partition tokens with a process called consistent hashing to ensure even distribution across the cluster. In other words, token order has nothing to do with the chronological order of your TIMEUUIDs, so querying by token range only makes sense if the hashed token values themselves are meaningful to your application.
Getting back to your first question, this means you will have to find a different column to partition on. My suggestion is to use a time-series mechanism called a "date bucket." Picking the date bucket can be tricky, as it depends on your requirements and query patterns, so it's really up to you to pick a useful one.
For the purposes of this example, I'll pick "month." So I'll re-create your table partitioning on month and clustering by event_uuid:
CREATE TABLE document_change_events2 (
event_uuid TIMEUUID,
document_uuid uuid,
month text,
PRIMARY KEY ((month),event_uuid, document_uuid)
) WITH default_time_to_live='7776000';
Now I can query by a date range while also filtering by month:
aploetz@cqlsh:stackoverflow2> SELECT document_uuid FROM document_change_events2
WHERE month='201505'
AND event_uuid > minTimeuuid('2015-05-10 00:00-0500')
AND event_uuid < maxTimeuuid('2015-05-22 00:00-0500');
document_uuid
--------------------------------------
a079b30f-be80-4a99-ae0e-a784d82f0432
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
92b6fb6a-9ded-47b0-a91c-68c63f45d338
11b4a49c-b73d-4c8d-9f88-078a6f303167
970e5e77-1e07-40ea-870a-84637c9fc280
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
c8188b73-1b97-4b32-a897-7facdeecea35
548b320a-10f6-409f-a921-d4a1170a576e
b29e7915-7c17-4900-b784-8ac24e9e72e2
(10 rows)
Again, month may not work for your application. So put some thought into choosing an appropriate column to partition on, and then you should be able to solve this.
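To tie this back to the Java side of your question, retrieveEvents() would then filter on the month bucket and on event_uuid directly instead of on token(). A sketch against document_change_events2, using the same driver APIs as your code and a month string formatted as yyyyMM to match '201505':
private static void retrieveEvents(Instant startInstant, Instant endInstant, String month) {
    String cql = "SELECT document_uuid FROM document_change_events2 " +
            "WHERE month = ? AND event_uuid > ? AND event_uuid < ?";
    PreparedStatement preparedStatement = cassandraSession.prepare(cql);
    BoundStatement boundStatement = new BoundStatement(preparedStatement);
    boundStatement.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
    boundStatement.bind(month,
            UUIDs.startOf(Date.from(startInstant).getTime()),
            UUIDs.endOf(Date.from(endInstant).getTime()));
    ResultSet resultSet = cassandraSession.execute(boundStatement);
    while (!resultSet.isExhausted()) {
        System.out.println(resultSet.one().getUUID("document_uuid"));
    }
}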