jOOQ : fetchMany and fetchAny - java

I was going through the jOOQ documentation to try and understand how fetchMany and fetchAny work, but there aren't many examples and use cases available.
Could someone show the proper use of these methods and explain how they differ from each other and from fetch()?

The general idea of the various ResultQuery.fetch() methods is outlined in the manual:
http://www.jooq.org/doc/latest/manual/sql-execution/fetching/
And in particular:
http://www.jooq.org/doc/latest/manual/sql-execution/fetching/many-fetching/
As far as your specific question is concerned, I think the relevant Javadocs might help:
fetchAny()
This executes the query and returns at most one resulting record, or null if the query returns no records.
Example:
Record randomRecord =
DSL.using(configuration)
.select()
.from(TABLE)
.fetchAny();
So, this will fetch whatever record the database returns first, whereas fetch() returns all records as a Result. A similar query would be the following one, where you explicitly limit the number of records to 1 in the database:
Record randomRecord =
DSL.using(configuration)
.select()
.from(TABLE)
.limit(1)
.fetchOne();
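To make the difference from fetch() concrete, here is a small plain-Java model of the three contracts as the jOOQ Javadoc describes them. It is not jOOQ code: the class FetchSemantics and its static methods are hypothetical stand-ins that work on an in-memory list of rows instead of a database cursor, and IllegalStateException stands in for jOOQ's TooManyRowsException.

```java
import java.util.List;

// Hypothetical model of jOOQ's fetch contracts over an in-memory list of rows.
// Not the jOOQ API: real queries read from a database cursor.
public class FetchSemantics {

    // fetch(): every record the query produces
    public static <R> List<R> fetch(List<R> rows) {
        return rows;
    }

    // fetchAny(): the first record the database returns, or null if there is none
    public static <R> R fetchAny(List<R> rows) {
        return rows.isEmpty() ? null : rows.get(0);
    }

    // fetchOne(): the only record, or null if there is none;
    // more than one record is an error (jOOQ throws TooManyRowsException)
    public static <R> R fetchOne(List<R> rows) {
        if (rows.isEmpty()) {
            return null;
        }
        if (rows.size() > 1) {
            throw new IllegalStateException("Query returned more than one record");
        }
        return rows.get(0);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a", "b", "c");
        System.out.println(fetch(rows));                   // [a, b, c]
        System.out.println(fetchAny(rows));                // a
        System.out.println(fetchOne(rows.subList(0, 1)));  // a: what fetchOne() sees after limit(1)
    }
}
```

This is why fetchAny() and limit(1).fetchOne() behave alike: both yield at most one record, but the limit(1) variant makes the row limit explicit in the SQL sent to the database.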
fetchMany()
A variety of databases support returning more than one result set from stored procedures. A Sybase ASE example:
> sp_help 'author'
+--------+-----+-----------+-------------+-------------------+
|Name |Owner|Object_type|Object_status|Create_date |
+--------+-----+-----------+-------------+-------------------+
| author|dbo |user table | -- none -- |Sep 22 2011 11:20PM|
+--------+-----+-----------+-------------+-------------------+
+-------------+-------+------+----+-----+-----+
|Column_name |Type |Length|Prec|Scale|... |
+-------------+-------+------+----+-----+-----+
|id |int | 4|NULL| NULL| 0|
|first_name |varchar| 50|NULL| NULL| 1|
|last_name |varchar| 50|NULL| NULL| 0|
|date_of_birth|date | 4|NULL| NULL| 1|
|year_of_birth|int | 4|NULL| NULL| 1|
+-------------+-------+------+----+-----+-----+
When using JDBC directly, this is rather tedious as you have to write a lot of code to fetch one result after the other:
boolean isResultSet = statement.execute();
// Repeat until there are no more results (result sets or update counts)
for (;;) {
    if (isResultSet) {
        // Empty the current result set
        try (ResultSet rs = statement.getResultSet()) {
            while (rs.next()) {
                // [ .. do something with it .. ]
            }
        }
    }
    // Neither a result set nor an update count is left: we're done
    else if (statement.getUpdateCount() == -1) {
        break;
    }
    // Move to the next result (this also closes the current ResultSet)
    isResultSet = statement.getMoreResults();
}
statement.close();
With jOOQ and fetchMany(), this is dead simple:
List<Result<Record>> results = create.fetchMany("sp_help 'author'");

Related

How to cluster rows in a postgresql table that match an input value or match a value from any of the other matching rows?

I have a table that looks like this in my PostgreSQL database:
How can I bring back a cluster of contacts where each contact in the cluster shares either the contact_id_a or contact_id_b value (or both) with another contact in the cluster?
In the example in the screenshot image above, rows 1-6 would be in the same cluster and row 8 would belong to no cluster.
How can this be achieved using either a SQL query or a SQL query in combination with Java code?
For context, this table lists all potential duplicate contacts in a list of contacts. We want to present to the list owner all of the contacts that are potential duplicates so that the user can manually manage these duplicates.
Here is my starting code:
DuplicateCandidate firstDuplicate = db.sql("select * from duplicates where list_id = " + list_id + " and ignore_duplicate is not true").first(DuplicateCandidate.class);
String sql = "select * from duplicates where list_id = " + list_id + " and ignore_duplicate is not true "
+ "and (contact_id_a = ? or contact_id_b = ? or contact_id_a = ? or contact_id_b = ?)";
List<DuplicateCandidate> groupOfDuplicates = db.sql(sql, firstDuplicate.contact_id_a, firstDuplicate.contact_id_a, firstDuplicate.contact_id_b, firstDuplicate.contact_id_b).results(DuplicateCandidate.class);
This will bring back the first row and any other rows containing 16247096 or 16247097, but not other essential rows matching the contact_ids from the second query's results.
Cheers.
You can use a recursive CTE. This walks the graph and then assigns the minimum identifier in the graph for each row. Note that your data does not have a unique identifier for each row so this starts by generating one:
with recursive d as (
select row_number() over (order by contact_id_a, contact_id_b) as id, d.*
from duplicates d
),
cte (id, contact_id_a, contact_id_b, min_id, ids, lev) as (
select id, contact_id_a, contact_id_b, id as min_id, array[id] as ids, 1 as lev
from d
union all
select d.id, d.contact_id_a, d.contact_id_b, least(d.id, cte.min_id), ids || d.id, lev + 1
from cte join
d
on cte.contact_id_a in (d.contact_id_a, d.contact_id_b) or cte.contact_id_b in (d.contact_id_a, d.contact_id_b)
where d.id <> ALL (cte.ids)
)
select distinct on (id) cte.*
from cte
order by id, min_id;
The column min_id contains the grouping you want.
Here is a db<>fiddle illustrating the code.
Clustering like this is an iterative process with an unknown number of steps. I have never found a solution that can be done within a recursive query.
I have not worked on CRM in over six years, but the following function is similar to how we used to generate match groups. Doing this row-by-row did not perform well enough for our workload, and accomplishing this via host language using e.g. Java HashMap() and HashSet() and inverted indexing creates very messy code.
Assuming this schema:
\d contact_info
Table "public.contact_info"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+---------
contact_id_a | bigint | | |
contact_id_b | bigint | | |
ignore_duplicate | boolean | | | false
list_id | integer | | | 496
select * from contact_info ;
contact_id_a | contact_id_b | ignore_duplicate | list_id
--------------+--------------+------------------+---------
16247096 | 16247097 | f | 496
16247096 | 16247098 | f | 496
16247096 | 16247099 | f | 496
16247097 | 16247098 | f | 496
16247097 | 16247099 | f | 496
16247098 | 16247099 | f | 496
16247094 | 16247095 | f | 496
(7 rows)
This function creates two temp tables to hold intermediate clusters and then returns the result once there is no more clustering possible.
create or replace function cluster_contact()
returns table (clust_id bigint, contact_id bigint)
language plpgsql as $$
declare
last_count bigint := 1;
this_count bigint := 0;
begin
create temp table contact_match (clust_id bigint, contact_id bigint) on commit drop;
create index cm_1 on contact_match (contact_id, clust_id);
create index cm_2 on contact_match using hash (clust_id);
create temp table contact_hold (clust_id bigint, contact_id bigint) on commit drop;
with dedup as (
select distinct least(ci.contact_id_a, ci.contact_id_b) as clust_id,
greatest(ci.contact_id_a, ci.contact_id_b) as contact_id
from contact_info ci
where not ci.ignore_duplicate
)
insert into contact_match
select d.clust_id, d.clust_id from dedup d
union
select d.clust_id, d.contact_id from dedup d;
while last_count > this_count loop
if this_count = 0 then
select count(distinct cm.clust_id) into last_count from contact_match cm;
else
last_count := this_count;
end if;
with new_cid as (
select cm.contact_id as clust_id_old,
min(cm.clust_id) as clust_id_new
from contact_match cm
group by cm.contact_id
)
update contact_match
set clust_id = nc.clust_id_new
from new_cid nc
where contact_match.clust_id = nc.clust_id_old;
truncate table contact_hold;
insert into contact_hold
select distinct * from contact_match;
truncate table contact_match;
insert into contact_match
select * from contact_hold;
select count(distinct cm.clust_id) into this_count from contact_match cm;
end loop;
return query select * from contact_match order by clust_id, contact_id;
end $$;
One of the biggest mental blocks I have seen developers face is neglecting to include the relationship of a contact_id to itself. This leads to disjoint handling and a mental model needlessly complicated by a left-side and a right-side.
select * from cluster_contact();
clust_id | contact_id
----------+------------
16247094 | 16247094
16247094 | 16247095
16247096 | 16247096
16247096 | 16247097
16247096 | 16247098
16247096 | 16247099
(6 rows)
Please comment if you need clarification on any of the steps in this solution or if it does not work for you.
Also note that the levenshtein() function is available in the fuzzystrmatch extension, and it works well.
If you would rather have sequential clust_id values starting at 1, change the return query in the function to this:
return query
select dense_rank() over (order by cm.clust_id) as clust_id,
cm.contact_id
from contact_match cm
order by clust_id, contact_id;
It will yield:
select * from cluster_contact();
clust_id | contact_id
----------+------------
1 | 16247094
1 | 16247095
2 | 16247096
2 | 16247097
2 | 16247098
2 | 16247099
(6 rows)
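The question also allowed for Java code. For comparison, the same grouping can be sketched in plain Java as connected components with a union-find (disjoint-set) structure, assuming the (contact_id_a, contact_id_b) pairs have already been fetched, e.g. with select contact_id_a, contact_id_b from contact_info where not ignore_duplicate. Like the function above, it treats every contact_id as a node, so a shared id links two rows whichever column it appears in, and it labels each cluster with its smallest contact_id. The class ContactClusters is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Connected components of contact_ids via union-find (disjoint sets).
// Each cluster is identified by its smallest contact_id, like clust_id above.
public class ContactClusters {
    private final Map<Long, Long> parent = new HashMap<>();

    // Returns the root of x's set, registering x as a singleton on first sight.
    private long find(long x) {
        parent.putIfAbsent(x, x);
        long root = x;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walk directly at the root
        while (parent.get(x) != root) {
            long next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    // Merges the clusters of a and b; the smaller root wins, so a root is always its set's minimum.
    public void union(long a, long b) {
        long ra = find(a);
        long rb = find(b);
        if (ra < rb) {
            parent.put(rb, ra);
        } else if (rb < ra) {
            parent.put(ra, rb);
        }
    }

    // clust_id -> sorted contact_ids in that cluster
    public SortedMap<Long, SortedSet<Long>> clusters() {
        SortedMap<Long, SortedSet<Long>> result = new TreeMap<>();
        for (long id : new ArrayList<>(parent.keySet())) {
            result.computeIfAbsent(find(id), k -> new TreeSet<>()).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        // (contact_id_a, contact_id_b) pairs from the sample, ignore_duplicate rows already filtered out
        long[][] pairs = {
            {16247096L, 16247097L}, {16247096L, 16247098L}, {16247096L, 16247099L},
            {16247097L, 16247098L}, {16247097L, 16247099L}, {16247098L, 16247099L},
            {16247094L, 16247095L}
        };
        ContactClusters cc = new ContactClusters();
        for (long[] pair : pairs) {
            cc.union(pair[0], pair[1]);
        }
        // {16247094=[16247094, 16247095], 16247096=[16247096, 16247097, 16247098, 16247099]}
        System.out.println(cc.clusters());
    }
}
```

Each pair is processed once, so unlike a path-enumerating recursive query the work stays close to linear in the number of pairs.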

HQL: How to query a table with a list of objects as a set of conditions?

I have a class called Form:
class Form {
String id;
String version;
String minVersion;
}
Now I have a List<Form> with me.
I need to query a table for rows whose id, version and minVersion, taken together as a set, exactly match one of the {id, version, minVersion} entries in my list. For example,
I have a table like this
+----+---------+------------+---------+
| id | version | minVersion | passage |
+----+---------+------------+---------+
| 1 | 2 | 2 | Hi |
| 1 | 3 | 1 | Hello |
| 2 | 2 | 2 | Hi |
| 3 | 3 | 3 | Hi |
+----+---------+------------+---------+
I want to select 2 rows from it given as a List: {id: 1, version: 3, minVersion: 1} and {id: 2, version: 2, minVersion: 2}
I wrote a query like this,
SELECT id, version, minVersion, passage FROM Content WHERE (id, version, minVersion) IN (:id, :version, :minVersion);
But how do I convert this into HQL and set the parameters for it?
A suggestion: if your table is small, fetch all rows matching the ids and filter them in the Java layer.
Use the IN syntax in your query together with the setParameterList API.
For Example:
String hql = "SELECT id, version, minVersion, passage FROM Content WHERE id IN (:ids)";
And while forming query use
Query query = session.createQuery(hql);
query.setParameterList("ids", idList);
Try this:
List<Form> findByIdInAndVersionInAndMinVersionIn(List<String> ids, List<String> versions, List<String> minVersions);
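Note that per-column IN conditions, as in the Spring Data method above, test each column independently, so they can also return rows whose combination is not in the list: with the sample data, the row (1, 2, 2) satisfies id IN (1, 2), version IN (3, 2) and minVersion IN (1, 2). A minimal sketch of the "filter in the Java layer" suggestion, assuming the rows have already been fetched (for example with the id IN (:ids) query), compares whole (id, version, minVersion) tuples through a hash set. The record types FormKey and ContentRow are hypothetical stand-ins for the question's Form class and the Content entity, and records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TupleFilter {
    // Hypothetical stand-in for Form: the (id, version, minVersion) tuple to match
    public record FormKey(String id, String version, String minVersion) {}

    // Hypothetical stand-in for one row of the Content table
    public record ContentRow(String id, String version, String minVersion, String passage) {
        FormKey key() {
            return new FormKey(id, version, minVersion);
        }
    }

    // Keeps only rows whose whole (id, version, minVersion) tuple appears in the wanted list.
    // Records implement equals/hashCode over all components, so the set lookup compares tuples.
    public static List<ContentRow> filterByTuple(List<ContentRow> rows, List<FormKey> wanted) {
        Set<FormKey> keys = new HashSet<>(wanted);
        return rows.stream()
                .filter(row -> keys.contains(row.key()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Rows as they might come back from "... WHERE id IN (:ids)" with ids = [1, 2]
        List<ContentRow> rows = List.of(
                new ContentRow("1", "2", "2", "Hi"),
                new ContentRow("1", "3", "1", "Hello"),
                new ContentRow("2", "2", "2", "Hi"));
        List<FormKey> wanted = List.of(new FormKey("1", "3", "1"), new FormKey("2", "2", "2"));
        filterByTuple(rows, wanted).forEach(System.out::println);
    }
}
```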

Avoid executing a DB query in a loop

I have a table as below,
process_name | startTime | endTime | parent_id
-------------------------------------------------------------------------
chrome | 2019-03-06 00:48:27 | 2019-03-06 00:48:58 | 111
chrome | 2019-03-07 00:48:27 | 2019-03-07 00:48:58 | 112
firefox | 2019-03-08 00:48:27 | 2019-03-08 00:48:58 | 113
IE | 2019-03-09 00:48:27 | 2019-03-09 00:48:58 | 114
firefox | 2019-03-10 00:48:27 | 2019-03-10 00:48:58 | 115
chrome | 2019-03-11 00:48:27 | 2019-03-11 00:48:58 | 116
Some points about the table:
The same process_name can appear under different parent_ids.
I already know the process_name and have an ArrayList of parent_ids where it exists.
For a given process_name, I need to find startTime and endTime for all of the parent_ids present in my ArrayList.
All I can think of is using a for loop and executing the query below on each iteration:
select startTime, endTime from myTable where parent_id = ? and process_name=?
Can I avoid executing this query in a loop and follow some other better approach?
I use Derby at this moment but the DB may get changed to Oracle.
There are multiple approaches to solve this problem:
As Svend suggested, since you already have the process_name, you can get all the start and end times for that process name.
Another approach is to build a string of parent_ids first and then send it to the database in a single query:
Code:
// list is the ArrayList<Integer> of parent_ids (assumed non-empty; values are numeric, so concatenating them is safe)
StringJoiner s = new StringJoiner(",", "(", ")"); // java.util.StringJoiner, produces e.g. (111,112,116)
for (Integer parentId : list) {
    s.add(String.valueOf(parentId));
}
String query = "select startTime, endTime from myTable where parent_id IN " + s + " and process_name=?";
It is not good practice to execute a query in a loop. Instead, why not fetch all the records belonging to a given process at once? You could use a GROUP BY clause on process_name along with aggregate functions on the other columns, if you need to compute something from those columns and want to avoid doing it in your own code.
Once you have all the records in hand, then the startTime and endTime can be used for your further processing.
Hope that helps.
Wouldn't it be possible to just do:
select parent_id, startTime, endTime from myTable where process_name=?
And get the values for all parent_id values, and then filter those on your ArrayList afterwards?
As a slight variation of @prakhar-londhe's solution, you might also use parameters in the IN clause:
String query = " select startTime, endTime from myTable where parent_id IN (?, ?, ?) and process_name = ?";
and then add a parameter for each parent_id. Of course you'd still have to assemble the query (to put a ? inside the parentheses for each parent_id) as well as the parameter list in a loop.
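A sketch of that placeholder approach with plain JDBC, assuming parent_id is an integer column and that the caller owns the Connection. It adds parent_id to the select list so each row can be matched back to its parent; InClauseQuery, buildQuery and fetchTimes are hypothetical names. Only standard java.sql calls are used, so the same code should run on Derby and on Oracle.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class InClauseQuery {
    // Builds "... parent_id IN (?, ?, ?) ..." with one placeholder per parent_id
    static String buildQuery(int parentIdCount) {
        if (parentIdCount < 1) {
            throw new IllegalArgumentException("IN () needs at least one value");
        }
        String placeholders = String.join(", ", Collections.nCopies(parentIdCount, "?"));
        return "select parent_id, startTime, endTime from myTable where parent_id IN ("
                + placeholders + ") and process_name = ?";
    }

    // Runs the single query, binding every parent_id and then the process name
    static void fetchTimes(Connection conn, List<Integer> parentIds, String processName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildQuery(parentIds.size()))) {
            int index = 1;
            for (Integer parentId : parentIds) {
                ps.setInt(index++, parentId);
            }
            ps.setString(index, processName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("parent_id") + ": "
                            + rs.getTimestamp("startTime") + " -> " + rs.getTimestamp("endTime"));
                }
            }
        }
    }
}
```

One caveat for the move to Oracle: Oracle rejects an IN list with more than 1000 expressions (ORA-01795), so a very long ArrayList would have to be split into batches of at most 1000 ids.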

JoinColumn with multi column concatenated reference. A Foreign key refering * from * has the wrong number of column

Sorry for the awkward title, There isn't an easy way to summarize my problem.
I have 2 related entities:
The real data has been modified to protect sensitive IP, but the problem is the same, so try not to be distracted by fields that don't make complete sense.
# Permission
+----------------+--------------+
| Column | type |
+----------------+--------------+
| perm_id | Number(20,0) |
| item_id1_id2 | VARCHAR2(8) |
| date_code | VARCHAR2(6) |
+----------------+--------------+
# Item
+-----------------+-------------+
| Column | type |
+-----------------+-------------+
| id1 | VARCHAR2(4) |
| id2 | VARCHAR2(4) |
| date_code | VARCHAR2(6) |
| some_data_field | VARCHAR(20) |
+-----------------+-------------+
Permission has a @ManyToOne relationship with Item. Permission links to Item via the logic in the SQL below:
SELECT p.*, i.*
FROM Permission p
JOIN (
SELECT
id1 || id2 as joined_ids, -- the p.item_id1_id2 is 2 CONCATed columns to Item.id1 and Item.id2
effective_date_code, -- this column specifies WHEN this data is effective by, i.e. all date codes for permissions between this date and not including the next greatest date should link to this record.
some_data_field, -- and arbitrary data point that gives this object its usefulness.
rank() over (partition by id1, id2 order by effective_date_code DESC) max_date_code -- this essentially gives us the
FROM Item
-- where effective_date_code <= p.date_code
ORDER BY max_date_code
) i
ON i.max_date_code = 1
and p.item_id1_id2 = i.joined_ids
;
As you can see, the join is rather complex, and so far my attempts to wrangle with Hibernate's API have been fruitless. Be aware that these entities depend heavily on legacy tables that cannot afford a schema change, so that's out of the question.
I've tried to use the @JoinColumnsOrFormulas annotation and related ones:
public class Permission {
    // ...
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumnsOrFormulas(value = {
        @JoinColumnOrFormula(formula = @JoinFormula(value = "item_id1_id2", referencedColumnName = "id1 || id2")),
        @JoinColumnOrFormula(column = @JoinColumn(name = "date_code", referencedColumnName = "effective_date_code")) // This isn't the final thing, but just for testing purposes I'm linking like this.
    })
    public Subject subject;
}
but I get the complaint:
java.lang.RuntimeException: org.hibernate.AnnotationException: A Foreign key
\ refering com.example.Item from com.example.Permission has the wrong number
\ of column. should be 3...
Am I expecting too much of the ORM and should I split up the query into more manageable and feasible parts or is this possible using hibernate?
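If the mapping proves too much for the ORM, the effective-dated link described in the SQL comments (a permission points at the Item version with the greatest effective_date_code not after its own date_code, i.e. the commented-out condition combined with rank() = 1) can also be resolved in application code once the Items are loaded. A minimal plain-Java sketch, assuming date codes are fixed-width strings (e.g. YYYYMM) whose lexicographic order is chronological; EffectiveItemLookup, addItem and resolve are hypothetical names, and only some_data_field is kept per version for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class EffectiveItemLookup {
    // id1 || id2 -> (effective_date_code -> some_data_field), versions sorted by date code
    private final Map<String, TreeMap<String, String>> items = new HashMap<>();

    public void addItem(String id1, String id2, String effectiveDateCode, String someDataField) {
        // Same key as the SQL's id1 || id2
        items.computeIfAbsent(id1 + id2, k -> new TreeMap<>()).put(effectiveDateCode, someDataField);
    }

    // The version in effect at dateCode: greatest effective_date_code <= dateCode, or null if none
    public String resolve(String itemId1Id2, String dateCode) {
        TreeMap<String, String> versions = items.get(itemId1Id2);
        if (versions == null) {
            return null;
        }
        Map.Entry<String, String> entry = versions.floorEntry(dateCode);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        EffectiveItemLookup lookup = new EffectiveItemLookup();
        lookup.addItem("AB12", "CD34", "202001", "first version");
        lookup.addItem("AB12", "CD34", "202006", "second version");
        System.out.println(lookup.resolve("AB12CD34", "202003")); // first version
        System.out.println(lookup.resolve("AB12CD34", "202006")); // second version
        System.out.println(lookup.resolve("AB12CD34", "201912")); // null: nothing in effect yet
    }
}
```

TreeMap.floorEntry does the "this date and not including the next greatest date" lookup in one call, which is the part that is hard to express as a plain join column.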

Testing a method to return unique values from a result set of a db query

Let's say there is a database table named "Students".
Students Table
|---------------|---------------|
| Name | Age |
|---------------|---------------|
| John | 9 |
|---------------|---------------|
| Jane | 7 |
|---------------|---------------|
| Dana | 8 |
|---------------|---------------|
| John | 6 |
|---------------|---------------|
I make a request to the database to return all names using:
SELECT Name FROM Students
So the result set would be:
ResultSet rs = {"John", "Jane", "Dana", "John"}
I then want to return all unique values so I write a method called populateSet() to populate a HashSet setOfNames with rs.
After executing populateSet(ResultSet rs), only unique names appear:
"John" "Jane" "Dana"
Here is the validation test:
@Test
public void testValidation() throws SQLException {
    // Skipping the test data / db connection / query part
    ResultSet rs = statement.executeQuery("SELECT Name FROM Students"); // ResultSet received back from DB
    Set<String> expected = new HashSet<>(Arrays.asList("John", "Jane", "Dana"));
    Set<String> actual = WhateverClass.populateSet(rs);
    assertEquals(expected, actual);
}
What are some of the possible unit tests that I could write?
I know I should write null, empty, and negative tests. What are some corner cases?
Potential corner cases:
Very large table (millions of rows)
Transactionality, e.g., what happens if a name is inserted/deleted/updated after you start the table scan?
Data ordering
Case sensitivity/insensitivity of names
non-ASCII characters in names
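Several of these corner cases are easier to cover if the de-duplication logic is separated from the JDBC reading, so most tests need no database or ResultSet mock at all. A minimal sketch, assuming the column is named Name as in the question; NameSets and uniqueNames are hypothetical names, and keeping SQL NULL names in the set (a HashSet stores a single null) is a design choice this sketch makes, not a given.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameSets {
    // Thin JDBC layer: read the Name column, then delegate to the pure logic below
    public static Set<String> populateSet(ResultSet rs) throws SQLException {
        List<String> names = new ArrayList<>();
        while (rs.next()) {
            names.add(rs.getString("Name")); // SQL NULL arrives as Java null
        }
        return uniqueNames(names);
    }

    // Pure de-duplication, unit-testable without a database
    public static Set<String> uniqueNames(Iterable<String> names) {
        Set<String> result = new HashSet<>();
        for (String name : names) {
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(uniqueNames(List.of("John", "Jane", "Dana", "John")).size()); // 3
        System.out.println(uniqueNames(List.of()).isEmpty());                             // true
        System.out.println(uniqueNames(List.of("John", "john")).size());                 // 2: case-sensitive
        System.out.println(uniqueNames(List.of("Zo\u00eb", "Zoe")).size());              // 2: accents kept
        System.out.println(uniqueNames(Arrays.asList("John", null, null)).size());       // 2: one null kept
    }
}
```

If names should instead be compared case-insensitively, new TreeSet<>(String.CASE_INSENSITIVE_ORDER) is one option (it does not accept null elements); either way, a test should pin down which behaviour is intended.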
