Avoid executing a DB query in a loop - Java

I have a table as below:
process_name | startTime           | endTime             | parent_id
-------------------------------------------------------------------------
chrome       | 2019-03-06 00:48:27 | 2019-03-06 00:48:58 | 111
chrome       | 2019-03-07 00:48:27 | 2019-03-07 00:48:58 | 112
firefox      | 2019-03-08 00:48:27 | 2019-03-08 00:48:58 | 113
IE           | 2019-03-09 00:48:27 | 2019-03-09 00:48:58 | 114
firefox      | 2019-03-10 00:48:27 | 2019-03-10 00:48:58 | 115
chrome       | 2019-03-11 00:48:27 | 2019-03-11 00:48:58 | 116
Some points from the table,
Same process_name can be found under different parent_ids.
I already know the process_name and have an ArrayList of parent_ids where it exists.
For a given process_name, I need to find startTime and endTime for all of the parent_ids present in my ArrayList.
All I can think of is using a for loop and executing the query below once per parent_id:
select startTime, endTime from myTable where parent_id = ? and process_name=?
Can I avoid executing this query in a loop and follow some other better approach?
I use Derby at this moment but the DB may get changed to Oracle.

There are multiple approaches to solving this problem:
As Svend suggested, since you already have the process_name, you can get all the start and end times for that process name.
Another approach is to prepare a string of parent_ids first, and then query it to the database using a single query:
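Code:
// Build "(111,112,...)" from the ArrayList of parent_ids.
// Assumes the list is non-empty and holds numeric ids.
StringBuilder s = new StringBuilder("(");
for (int i = 0; i < list.size(); i++) {
    if (i > 0) s.append(",");
    s.append(list.get(i));
}
s.append(")");
String query = "select startTime, endTime from myTable where parent_id IN " + s + " and process_name = ?";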

It is not good practice to execute a query in a loop. Instead, why not fetch all the records belonging to a given process at once? You could use a GROUP BY clause on process_name along with aggregate functions on the other columns, if you need to compute anything over those columns and want to avoid doing it in code yourself.
Once you have all the records in hand, then the startTime and endTime can be used for your further processing.
Hope that helps.

Wouldn't it be possible to just do:
select parent_id, startTime, endTime from myTable where process_name=?
And get the values for all parent_id values, and then filter those on your ArrayList afterwards?
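The idea above could be sketched in Java roughly as follows. It assumes parent_id is an integer column and the two time columns are timestamps; the class ProcessTimes, the Row holder, and the method names are made up for illustration and do not come from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ProcessTimes {

    /** One result row; a hypothetical holder, not part of the original question. */
    static final class Row {
        final int parentId;
        final Timestamp startTime;
        final Timestamp endTime;

        Row(int parentId, Timestamp startTime, Timestamp endTime) {
            this.parentId = parentId;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }

    /** Keeps only the rows whose parent_id appears in wantedIds, preserving row order. */
    static List<Row> filterByParentIds(List<Row> rows, List<Integer> wantedIds) {
        Set<Integer> wanted = new HashSet<>(wantedIds); // constant-time membership checks
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (wanted.contains(row.parentId)) {
                kept.add(row);
            }
        }
        return kept;
    }

    /** Runs one query for the process_name, then filters the rows in memory. */
    static List<Row> fetch(Connection con, String processName, List<Integer> wantedIds)
            throws SQLException {
        String sql = "select parent_id, startTime, endTime from myTable where process_name = ?";
        List<Row> rows = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, processName);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new Row(rs.getInt("parent_id"),
                            rs.getTimestamp("startTime"),
                            rs.getTimestamp("endTime")));
                }
            }
        }
        return filterByParentIds(rows, wantedIds);
    }
}
```

This trades one round trip for possibly fetching rows that are then thrown away, so it probably suits cases where the process does not appear under many more parent_ids than the list contains.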

As a slight variation on the solution by @prakhar-londhe, you might also use parameters in the IN clause, as in
String query = "select startTime, endTime from myTable where parent_id IN (?, ?, ?) and process_name = ?";
and then bind one parameter for each parent_id. Of course, you would still have to assemble the query (putting one ? inside the parentheses for each parent_id) and bind the parameters in a loop.
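A minimal sketch of that parameterized variant, assuming integer parent_ids. The class InClauseQuery and its method names are hypothetical, and parent_id is added to the select list here only so that each returned row can be matched back to its id.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class InClauseQuery {

    /** Returns one "?" per id, comma separated, e.g. "?, ?, ?" for count = 3. */
    static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one parent_id");
        }
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    /** Builds the single query with as many placeholders as there are ids. */
    static String buildSql(int idCount) {
        return "select parent_id, startTime, endTime from myTable where parent_id IN ("
                + placeholders(idCount) + ") and process_name = ?";
    }

    /** Prepares the statement and binds every parent_id, then the process_name. */
    static PreparedStatement prepare(Connection con, List<Integer> parentIds, String processName)
            throws SQLException {
        PreparedStatement ps = con.prepareStatement(buildSql(parentIds.size()));
        int index = 1;
        for (Integer id : parentIds) {
            ps.setInt(index++, id);
        }
        ps.setString(index, processName); // last placeholder is process_name
        return ps;
    }
}
```

The caller would run executeQuery() on the returned statement and close it afterwards. If the database does move to Oracle, keep in mind that Oracle limits an IN list to 1000 expressions, so very long id lists would need to be split into batches.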


How to cluster rows in a postgresql table that match an input value or match a value from any of the other matching rows?

I have a table that looks like this in my postgresql database
How can I bring back a cluster of contacts where each contact in the cluster shares either the contact_id_a or contact_id_b value (or both) with another contact in the cluster?
In the example in the screenshot image above, rows 1-6 would be in the same cluster and row 8 would belong to no cluster.
How can this be achieved using either a SQL query or a SQL query in combination with Java code?
For context, this table lists all potential duplicate contacts in a list of contacts. We want to present to the list owner all of the contacts that are potential duplicates so that the user can manually manage these duplicates.
Here is my starting code:
DuplicateCandidate firstDuplicate = db.sql("select * from duplicates where list_id = " + list_id + " and ignore_duplicate is not true").first(DuplicateCandidate.class);
String sql = "select * from duplicates where list_id = " + list_id + " and ignore_duplicate is not true "
+ "and (contact_id_a = ? or contact_id_b = ? or contact_id_a = ? or contact_id_b = ?)";
List<DuplicateCandidate> groupOfDuplicates = db.sql(sql, firstDuplicate.contact_id_a, firstDuplicate.contact_id_a, firstDuplicate.contact_id_b, firstDuplicate.contact_id_b).results(DuplicateCandidate.class);
This will bring back the first row and any other rows containing 16247096 or 16247097, but not other essential rows matching the contact_ids from the second query's results.
Cheers.
You can use a recursive CTE. This walks the graph and then assigns the minimum identifier in the graph for each row. Note that your data does not have a unique identifier for each row so this starts by generating one:
with recursive d as (
      select row_number() over (order by contact_id_a, contact_id_b) as id, d.*
      from duplicates d
     ),
     cte (id, contact_id_a, contact_id_b, min_id, ids, lev) as (
      select id, contact_id_a, contact_id_b, id as min_id, array[id] as ids, 1 as lev
      from d
      union all
      select d.id, d.contact_id_a, d.contact_id_b, least(d.id, cte.min_id), ids || d.id, lev + 1
      from cte join
           d
           on cte.contact_id_a = d.contact_id_a or cte.contact_id_b = d.contact_id_b
      where d.id <> ALL (cte.ids)
     )
select distinct on (id) cte.*
from cte
order by id, min_id;
The column min_id contains the grouping you want.
Here is a db<>fiddle illustrating the code.
Clustering like this is an iterative process with an unknown number of steps. I have never found a solution that can be done within a recursive query.
I have not worked on CRM in over six years, but the following function is similar to how we used to generate match groups. Doing this row-by-row did not perform well enough for our workload, and accomplishing this via host language using e.g. Java HashMap() and HashSet() and inverted indexing creates very messy code.
Assuming this schema:
\d contact_info
Table "public.contact_info"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+---------
contact_id_a | bigint | | |
contact_id_b | bigint | | |
ignore_duplicate | boolean | | | false
list_id | integer | | | 496
select * from contact_info ;
contact_id_a | contact_id_b | ignore_duplicate | list_id
--------------+--------------+------------------+---------
16247096 | 16247097 | f | 496
16247096 | 16247098 | f | 496
16247096 | 16247099 | f | 496
16247097 | 16247098 | f | 496
16247097 | 16247099 | f | 496
16247098 | 16247099 | f | 496
16247094 | 16247095 | f | 496
(7 rows)
This function creates two temp tables to hold intermediate clusters and then returns the result once there is no more clustering possible.
create or replace function cluster_contact()
returns table (clust_id bigint, contact_id bigint)
language plpgsql as $$
declare
  last_count bigint := 1;
  this_count bigint := 0;
begin
  create temp table contact_match (clust_id bigint, contact_id bigint) on commit drop;
  create index cm_1 on contact_match (contact_id, clust_id);
  create index cm_2 on contact_match using hash (clust_id);
  create temp table contact_hold (clust_id bigint, contact_id bigint) on commit drop;
  with dedup as (
    select distinct least(ci.contact_id_a) as clust_id,
           greatest(ci.contact_id_b) as contact_id
      from contact_info ci
     where not ci.ignore_duplicate
  )
  insert into contact_match
    select d.clust_id, d.clust_id from dedup d
    union
    select d.clust_id, d.contact_id from dedup d;
  while last_count > this_count loop
    if this_count = 0 then
      select count(distinct cm.clust_id) into last_count from contact_match cm;
    else
      last_count := this_count;
    end if;
    with new_cid as (
      select cm.contact_id as clust_id_old,
             min(cm.clust_id) as clust_id_new
        from contact_match cm
       group by cm.contact_id
    )
    update contact_match
       set clust_id = nc.clust_id_new
      from new_cid nc
     where contact_match.clust_id = nc.clust_id_old;
    truncate table contact_hold;
    insert into contact_hold
      select distinct * from contact_match;
    truncate table contact_match;
    insert into contact_match
      select * from contact_hold;
    select count(distinct cm.clust_id) into this_count from contact_match cm;
  end loop;
  return query select * from contact_match order by clust_id, contact_id;
end $$;
One of the biggest mental blocks I have seen developers face is neglecting to include the relationship of a contact_id to itself. This leads to disjoint handling and a mental model needlessly complicated by a left-side and a right-side.
select * from cluster_contact();
clust_id | contact_id
----------+------------
16247094 | 16247094
16247094 | 16247095
16247096 | 16247096
16247096 | 16247097
16247096 | 16247098
16247096 | 16247099
(6 rows)
Please comment if you need clarification on any of the steps in this solution or if it does not work for you.
Also, know that Levenshtein is available in fuzzystrmatch, and it works well.
If you would rather have sequential clust_id starting at 1, change your return query in the function to this:
return query
select dense_rank() over (order by cm.clust_id) as clust_id,
cm.contact_id
from contact_match cm
order by clust_id, contact_id;
It will yield:
select * from cluster_contact();
clust_id | contact_id
----------+------------
1 | 16247094
1 | 16247095
2 | 16247096
2 | 16247097
2 | 16247098
2 | 16247099
(6 rows)
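Since the question also allows SQL in combination with Java, here is a rough sketch of the same grouping done in memory with a union-find (disjoint-set) structure. This is a different technique from the temp-table function above, not a port of it. It assumes the (contact_id_a, contact_id_b) pairs have already been loaded from the table; the class ContactClusters and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ContactClusters {

    // Maps each contact id to its parent; a root is its own parent.
    private final Map<Long, Long> parent = new HashMap<>();

    /** Returns the representative of id's cluster, registering id on first sight. */
    long find(long id) {
        parent.putIfAbsent(id, id);
        long root = id;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walked path straight at the root.
        long cur = id;
        while (parent.get(cur) != root) {
            long next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Merges the clusters of a and b; the smaller root stays the representative. */
    void union(long a, long b) {
        long ra = find(a);
        long rb = find(b);
        if (ra == rb) {
            return;
        }
        if (ra < rb) {
            parent.put(rb, ra);
        } else {
            parent.put(ra, rb);
        }
    }

    /** Groups every seen contact id under the smallest id of its cluster. */
    Map<Long, List<Long>> clusters() {
        Map<Long, List<Long>> out = new TreeMap<>();
        for (Long id : new ArrayList<>(parent.keySet())) {
            out.computeIfAbsent(find(id), k -> new ArrayList<>()).add(id);
        }
        for (List<Long> members : out.values()) {
            Collections.sort(members);
        }
        return out;
    }

    /** Convenience entry point: each pair is {contact_id_a, contact_id_b}. */
    static Map<Long, List<Long>> cluster(long[][] pairs) {
        ContactClusters uf = new ContactClusters();
        for (long[] pair : pairs) {
            uf.union(pair[0], pair[1]);
        }
        return uf.clusters();
    }
}
```

Fed the seven sample rows shown earlier, this groups 16247094 and 16247095 under 16247094, and 16247096 through 16247099 under 16247096, which matches the function's output above.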

rank values based on column entry type

I have an SQL query that ranks the values of a column called 'Total' and returns the result in a computed column called 'Rank'. The query is as follows:
String sql ="select s1.Name as 'NAME',s1.Lang as 'LANGUAGE',"
+ "s1.Total as 'TOTAL',COUNT(DISTINCT s2.Total) AS Rank from Type1 s1 JOIN Type1 s2 on(s1.Total<=s2.Total) GROUP BY s1.ID order by s1.Name ASC ";
pst=conn.prepareStatement(sql);
rs=pst.executeQuery();
table_statistics1.setModel(DbUtils.resultSetToTableModel(rs));
and it gives the result below:
| NAME  | LANGUAGE | TOTAL | RANK |
------------------------------------------------------
| james | French   | 70.0  | 2nd  |
| jimmy | English  | 90    | 1st  |
| josh  | French   | 60    | 3rd  |
| john  | English  | 40    | 4th  |
I want to write a query that ranks the Total column within each language only. Since I have two languages, it should produce one ranking for the French entries and a separate ranking for the English entries,
so that the output looks like this:
| NAME  | LANGUAGE | TOTAL | RANK |
------------------------------------------------------
| james | French   | 70.0  | 1st  |
| jimmy | English  | 90    | 1st  |
| josh  | French   | 60    | 2nd  |
| john  | English  | 40    | 2nd  |
All suggestions will be gratefully received. Thank you.
The normal way to rank would use ANSI-standard window functions:
select t.Name, t.Lang as LANGUAGE, t.Total,
       dense_rank() over (partition by t.lang order by t.total desc) as rnk
from Type t;
In SQLite, you can do:
select t.*,
       (select count(distinct t2.total)
        from Type t2
        where t2.lang = t.lang and t2.total >= t.total
       ) as rnk
from Type t;
Thank you, Gordon. Your substitute query for dense_rank() didn't work out as posted, but I took from it an important part that was missing in mine: t2.lang = t.lang.
The final working code is:
SELECT S1.*,COUNT(DISTINCT s2.Total) AS Rank from Type s1 JOIN Type s2 on(s2.lang=s1.lang and s1.Total<=s2.Total)GROUP BY s1.ID order by s1.Name ASC
thank you very much boss.
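To make the rule behind the working query explicit: a row's rank is the number of distinct totals in the same language that are greater than or equal to its own total. The small Java sketch below only illustrates that rule in memory and is not a replacement for the SQL; the class LanguageRank and its method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

final class LanguageRank {

    /**
     * Returns, for each row i, the dense rank of totals[i] within langs[i],
     * where the highest total in a language gets rank 1.
     */
    static int[] denseRanks(String[] langs, double[] totals) {
        // Collect the distinct totals per language, kept in sorted order.
        Map<String, TreeSet<Double>> distinctTotals = new HashMap<>();
        for (int i = 0; i < langs.length; i++) {
            distinctTotals.computeIfAbsent(langs[i], k -> new TreeSet<>()).add(totals[i]);
        }
        int[] ranks = new int[langs.length];
        for (int i = 0; i < langs.length; i++) {
            // tailSet(x, true) holds the distinct totals >= x, which mirrors
            // COUNT(DISTINCT s2.Total) with s2.lang = s1.lang and s1.Total <= s2.Total.
            ranks[i] = distinctTotals.get(langs[i]).tailSet(totals[i], true).size();
        }
        return ranks;
    }
}
```

For the four sample rows (French 70, English 90, French 60, English 40) this yields ranks 1, 1, 2, 2, matching the desired output in the question.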

Showing parameter number 2 is not an OUT parameter while calling a procedure using java

I am trying to call a stored procedure using Java.
I am a beginner with stored procedures,
so I first tried the same thing in the mysql terminal:
mysql> delimiter //
mysql> create procedure get_EmployeeName(IN cust_id INT,OUT cust_name varchar(20))
-> begin
-> select name into cust_name from CUSTOMERS where id = cust_id;
-> end //
Query OK, 0 rows affected (0.11 sec)
mysql> delimiter ;
mysql> call get_EmployeeName(1,@cust_name);
Query OK, 1 row affected (0.08 sec)
mysql> select @cust_name;
+------------+
| @cust_name |
+------------+
| A |
+------------+
1 row in set (0.01 sec)
I am able to get the output.
Now tried using Java.
Create procedure statement
stmt.execute("create procedure mydb.WhoAreThey(IN id INT, OUT name VARCHAR(20)) begin select Name into name from employees where EmployeeID = id; end");
Procedure Call
Not sure if my procedure call is right or wrong
String IN = "1000";
cs = con.prepareCall("{call WhoAreThey(?,?)}");
cs.setString(1, IN);
cs.registerOutParameter(2, Types.VARCHAR);
cs.execute();
The above call throws an SQLException:
"SQLException: Parameter number 2 is not an OUT parameter"
UPDATE
It worked after changing the call to cs = con.prepareCall("{call mydb.WhoAreThey(?,?)}");
But when I ran select @name, it showed NULL.
I expected the output to be "David Walker" from the employees table, since EmployeeID is 1000.
mysql> select * from employees;
+------------+--------------+--------+---------------------+
| EmployeeID | Name | Office | CreateTime |
+------------+--------------+--------+---------------------+
| 1000 | David Walker | HQ101 | 2016-12-21 13:00:37 |
| 1001 | Paul Martin | HQ101 | 2016-12-21 13:00:37 |
+------------+--------------+--------+---------------------+
2 rows in set (0.00 sec)
Am I checking in the wrong way (select @name)?
Instead of calling WhoAreThey, call mydb.WhoAreThey:
String IN = "1000";
cs = con.prepareCall("{call mydb.WhoAreThey(?,?)}");
cs.setString(1, IN);
cs.registerOutParameter(2, Types.VARCHAR);
cs.execute();
To fetch the value of the OUT parameter, use:
String name = cs.getString(2);
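Putting the pieces of this answer together, a minimal sketch might look like the following. It assumes an open Connection to the MySQL server and the mydb.WhoAreThey procedure from the question; the class WhoAreTheyCall and the method lookupName are illustrative names.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

final class WhoAreTheyCall {

    // Schema-qualified call, as in the answer above.
    static final String CALL_SQL = "{call mydb.WhoAreThey(?,?)}";

    /** Calls the procedure and returns the OUT parameter (null if the procedure set none). */
    static String lookupName(Connection con, int employeeId) throws SQLException {
        try (CallableStatement cs = con.prepareCall(CALL_SQL)) {
            cs.setInt(1, employeeId);                  // IN id
            cs.registerOutParameter(2, Types.VARCHAR); // OUT name
            cs.execute();
            return cs.getString(2);                    // read the OUT value from the statement
        }
    }
}
```

The OUT value is read back from the CallableStatement itself. MySQL user variables such as @name are scoped to a session, so a select @name typed in a separate mysql terminal would not be expected to show the value produced by the JDBC call.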

Caused by: java.sql.SQLException: Subquery returns more than 1 row on all rows where emailAccess is the same

I am trying to retrieve all rows where emailAccess is equal to john@yahoo.com.
The user table is structured this way:
id | name | email          | emailAccess
1  | john | john@yahoo.com | john@yahoo.com
2  | jeff | jeff@yahoo.com | john@yahoo.com
I have a log table like this:
id | userId | message
1  | 1      | bla bla
2  | 2      | 1234
Now I am using the following HQL query to retrieve the log based on the userId, where emailAccess from the session is john@yahoo.com:
String hql = "FROM Chat c WHERE c.user = (FROM User u WHERE u.emailAccess = :emailAccess)";
return _sessionFactory.getCurrentSession().createQuery(hql).setParameter("emailAccess", emailAccess).list();
trying to use the above hql query gives me this error
Caused by: org.hibernate.exception.DataException: could not extract ResultSet
at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:135)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
second stacktrace
Caused by: java.sql.SQLException: Subquery returns more than 1 row
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:957)
Where am I failing.
Kindly assist!
HQL lets you access objects and their attributes, so try this query:
String hql = "FROM Chat c WHERE c.user.emailAccess = :emailAccess";
(FROM User u WHERE u.emailAccess = :emailAccess)
This returns more than one row and you have:
WHERE c.user =
You can't have an equals with multiple results on the right. Either change your query to return a single row or change to be something like:
WHERE c.user in

java.sql.SQLException: Column count doesn't match value count at row 1 [duplicate]

This question already has answers here:
java.sql.SQLException: Column count doesn't match value count at row 1
(3 answers)
Closed 6 years ago.
I'm trying to update values using JDBC and I continue to get the same error for different tables and with different schemas.
Let's say that I have a table like this
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| field1 | varchar(50) | YES | | NULL | |
| field2 | varchar(50) | YES | | NULL | |
+----------------+-------------+------+-----+---------+-------+
then, I try to add a row:
String Text1 = text1;
String Text2 = text2;
String Query_String = "INSERT INTO tablename(field1,field2) VALUES ('"+Text1+","+Text2+"')";
Query_Statement.executeUpdate(Query_String);
the number of columns is the same, and also in the text there are not other commas, but I continue to get the error "java.sql.SQLException: Column count doesn't match value count at row 1"
I'm sure it's something simple, probably on the syntax since I managed to make it to work with a single column...
Thanks in advance
There is something wrong with:
String Query_String = "INSERT INTO tablename(field1,field2) VALUES ('"+Text1+","+Text2+"')";
You've missed some quotes between Text1 and Text2:
String Query_String = "INSERT INTO tablename(field1,field2) VALUES ('"+Text1+"','"+Text2+"')";
String Query_String = "INSERT INTO tablename(field1,field2) VALUES ('"+Text1+"' , '"+Text2+"');";
It should look like this; note the ' characters around each value.
A PreparedStatement would be a better choice.
You have made a mistake with your quotes.
The following is what gets executed:
INSERT INTO tablename(field1,field2) VALUES ('Text1,Text2');
You have to write:
String Query_String = "INSERT INTO tablename(field1,field2) "
+ "VALUES ('"+Text1+"','"+Text2+"')";
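As one of the answers suggests, a PreparedStatement avoids the quoting problem altogether. Below is a minimal sketch assuming an open Connection and the two-column table from the question; the class InsertRow and its methods are illustrative names, and concatenatedAsInQuestion exists only to show what the original string concatenation produces.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class InsertRow {

    // One placeholder per column, so the column count and value count always match.
    static final String INSERT_SQL = "INSERT INTO tablename(field1, field2) VALUES (?, ?)";

    /** Reproduces the question's concatenation to show the SQL it actually yields. */
    static String concatenatedAsInQuestion(String text1, String text2) {
        return "INSERT INTO tablename(field1,field2) VALUES ('" + text1 + "," + text2 + "')";
    }

    /** Inserts one row using bound parameters; returns the number of rows inserted. */
    static int insert(Connection con, String text1, String text2) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setString(1, text1);
            ps.setString(2, text2);
            return ps.executeUpdate();
        }
    }
}
```

Calling concatenatedAsInQuestion("a", "b") gives VALUES ('a,b'), a single quoted value for two columns, which is what triggers the column count error. With bound parameters the driver handles the quoting, and values containing quotes or commas no longer break the statement.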
