I'm not a pro in SQL at all :)
I'm having a very critical performance issue.
Here is the info directly related to the problem.
I have 2 tables in my DB: table condos and table items.
Table condos has the fields:
id (PK)
name
city
country
Table items has the fields:
id (PK)
name
multiple fields not related to issue
condo_id (FK)
I have 1000+ rows in the condos table and 1000+ in the items table.
The problem is how I perform the items search. Currently it is:
For example, I want to get all the items for city = 'Sydney':
Perform a SELECT condos.id FROM public.condos WHERE city = 'Sydney'
Make a SELECT * FROM public.items WHERE items.condo_id = ? for each id I get in step 1.
The issue is that once I have 1000+ rows in the condos table, the second request is performed 1000+ times, once for each condo id that belongs to 'Sydney', and the execution takes more than 2 minutes, which is a critical performance issue.
So, the question is:
What is the best way for me to perform such a search? Should I put 1000+ ids in a single WHERE clause? Or?
For additional info, I use PostgreSQL 9.4 and Spring MVC.
Use a table join so that you do not need to perform an additional query. In your case you can join condos and items by condo_id, which is something like:
SELECT i.*
FROM public.items i JOIN public.condos c ON i.condo_id = c.id
WHERE c.city = 'Sydney'
Note that performance tuning is a broad topic. It varies from environment to environment, and depends on how you structure the data in your tables and how you organize the data in your code.
Here are some other suggestions that may also help:
Try adding an index to the fields you use for sorting and searching, e.g. city in condos and condo_id in items. There is a good answer explaining how indexing works.
I also recommend running EXPLAIN on your query to see the query plan, and to check whether there is a full table scan that may cause the performance issue.
Hope this helps.
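To make the suggestion concrete, here is a minimal runnable sketch of the single-join approach, using SQLite in place of PostgreSQL (table and column names follow the question; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condos (id INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT);
CREATE TABLE items  (id INTEGER PRIMARY KEY, name TEXT, condo_id INTEGER REFERENCES condos(id));
-- indexes on the columns used for searching and joining, as suggested above
CREATE INDEX idx_condos_city    ON condos (city);
CREATE INDEX idx_items_condo_id ON items (condo_id);
INSERT INTO condos VALUES (1, 'Harbour View', 'Sydney',    'Australia'),
                          (2, 'City Loft',    'Melbourne', 'Australia');
INSERT INTO items  VALUES (1, 'Chair', 1), (2, 'Table', 1), (3, 'Lamp', 2);
""")

# one join instead of 1 + N separate queries
rows = conn.execute("""
    SELECT i.*
    FROM items i JOIN condos c ON i.condo_id = c.id
    WHERE c.city = 'Sydney'
    ORDER BY i.id
""").fetchall()
print(rows)
```

The application code issues exactly one query regardless of how many condos match 'Sydney'.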
Essentially what you need is to eliminate the N+1 queries and, at the same time, ensure that your city field is indexed. You have 3 mechanisms to choose from. One is already stated in one of the other answers you have received: the SUBSELECT approach. Beyond that, you have another two.
You can use what you have stated:
SELECT condos.id FROM public.condos WHERE city = 'Sydney'
SELECT *
FROM public.items
WHERE items.condo_id IN (up to 1000 ids here)
The reason I am stating up to 1000 is that some SQL providers have limitations on the size of an IN list.
You can also do a join as a way to eliminate the N+1 selects:
SELECT *
FROM public.items JOIN public.condos ON items.condo_id = condos.id AND condos.city = 'Sydney'
Now, what is the difference between the 3 queries?
The pro of the subselect query is that you get everything at once. The con is that if you have too many elements, performance may suffer.
The pro of the simple IN clause is that it effectively solves the N+1 problem. The con is that it may lead to some extra queries compared to the subselect.
The pro of the joined query is that you can initialize both Condo and Item in one go. The con is that it leads to some data duplication on the Condo side.
If we look into a framework like Hibernate, we find that in most cases the fetch strategy used is either the JOIN or the IN strategy; SUBSELECT is used rarely.
Also, since performance is critical for you, you may consider reading everything into memory and serving it from there. Judging from the content of these two tables, it should be fairly easy to load them into a Map.
Effectively, anything that solves your N+1 query problem is a solution in your case, if we are talking about just two tables of 1000 rows each. All three options are solutions.
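The IN-list and subselect variants can be compared side by side in a small sketch (SQLite stands in for PostgreSQL; the schema is assumed from the question, and chunking the id list is only needed when the provider caps the IN-list size):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condos (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE items  (id INTEGER PRIMARY KEY, name TEXT, condo_id INTEGER);
INSERT INTO condos VALUES (1, 'A', 'Sydney'), (2, 'B', 'Sydney'), (3, 'C', 'Melbourne');
INSERT INTO items  VALUES (1, 'Chair', 1), (2, 'Lamp', 2), (3, 'Desk', 3);
""")

# variant 1: two queries -- fetch the ids first, then a single IN query
# (split `ids` into chunks if your provider limits IN-list size, e.g. 1000)
ids = [r[0] for r in conn.execute("SELECT id FROM condos WHERE city = 'Sydney'")]
placeholders = ", ".join("?" * len(ids))
in_rows = conn.execute(
    f"SELECT * FROM items WHERE condo_id IN ({placeholders}) ORDER BY id", ids
).fetchall()

# variant 2: one query with a subselect
sub_rows = conn.execute("""
    SELECT * FROM items
    WHERE condo_id IN (SELECT id FROM condos WHERE city = 'Sydney')
    ORDER BY id
""").fetchall()
```

Both variants return the same rows; the difference is only in how many round trips reach the database.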
You could use the first query as a subquery in an IN operator in the second query:
SELECT *
FROM public.items
WHERE items.condo_id IN (SELECT condos.id
FROM public.condos
WHERE city = 'Sydney')
I know we are all using a bunch of ORMs for SQL, but I wanted to give native drivers a try. What is the correct way to map the data after executing a Postgres join? Please see the two queries below.
I have tables:
articles (id, title, text)
comments (id, article_id, text)
The data I want to pass back to the client is an array of articles, and each article has an array of comments:
[{ id: 1,
title: "Article 1",
text: "Random text",
comments: [{
id: 1,
article_id: 1,
text: "Hello",
}],
}];
I could achieve this using two different approaches.
SELECT a.id, a.title, c.id, c.text, c.article_id
FROM articles a JOIN "comments" c ON a.id = c.article_id
or
SELECT a.id, a.title, array_agg((c.id, c.text, c.article_id)) AS comments
FROM articles a
JOIN comments c ON a.id = c.article_id
GROUP BY a.id
First query: I will need to group the data manually on the backend by c.article_id.
Second query: PostgreSQL will group it and create an array, but is this the correct approach?
I am asking specifically about everyday use, which means somewhat larger tables. Or maybe you know some other approach? I would also love to know which of these approaches popular ORMs use, because with an ORM you can easily get the data shaped as in the articles array example above.
ORMs use the first approach for one-to-many relations. One or several queries execute, depending on the fetch strategy. Grouping is performed on the backend, out of the box by the framework.
For example, see the Hibernate documentation.
This is a common solution because aggregation functions are implemented differently across databases, and it is difficult to support database-specific implementations. So an advantage of the first approach is that it will work in the same way on any database.
Regarding the second approach, I see only one advantage: the number of returned records will be significantly decreased, and we will probably save time communicating with the DB, since the columns of the parent table are not repeated.
But we can't say that one of the approaches will always be faster; it depends on the query complexity and the number of affected records.
Also, there can be issues connected to high memory consumption by the array_agg function.
For example:
Out of memory
memory issue with array_agg
limit causes runaway memory
IMHO:
I would prefer the first approach as the main solution for the general case. An aggregation function is good where it can increase performance without high DB memory consumption.
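A minimal sketch of the first approach's backend grouping, with SQLite standing in for a native Postgres driver (the data follows the question's articles/comments example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, text TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, text TEXT);
INSERT INTO articles VALUES (1, 'Article 1', 'Random text');
INSERT INTO comments VALUES (1, 1, 'Hello'), (2, 1, 'World');
""")

rows = conn.execute("""
    SELECT a.id, a.title, c.id, c.text, c.article_id
    FROM articles a JOIN comments c ON a.id = c.article_id
    ORDER BY a.id, c.id
""").fetchall()

# group the flat join rows by article id on the application side
articles = {}
for a_id, title, c_id, c_text, c_article_id in rows:
    article = articles.setdefault(a_id, {"id": a_id, "title": title, "comments": []})
    article["comments"].append({"id": c_id, "article_id": c_article_id, "text": c_text})
result = list(articles.values())
```

`result` has the nested shape shown in the question, built from a single flat query.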
Problem
I'm trying to get the most frequent item in a collection that belongs to a table. For instance, if I have table 'Library' and table 'Book', and 'Library' has a collection of 'Book's, I would like to retrieve ALL books from ALL libraries and, from this result, get the most frequent book. The problem is that I need this in one query, if possible. It would also be OK if I just got a list of ALL the books, sorted by occurrence.
What I've Tried
SELECT l.books, COUNT(l.books) AS occur FROM Library l
SELECT b FROM Library l, l.books b ORDER BY b.name
Sadly, the second one does not order ALL books; it sorts each collection on its own.
If more information is needed I can provide it, of course.
I hope somebody can help me :(
Your first query is pretty close. The ANSI standard version of the query is:
SELECT l.books, COUNT(l.books) AS occur
FROM Library l
GROUP BY l.books
ORDER BY occur DESC
FETCH FIRST 1 ROW ONLY;
Some databases use TOP or LIMIT to fetch only one row.
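For example, in SQLite (where LIMIT plays the role of FETCH FIRST 1 ROW ONLY) the pattern looks like this, with a flattened book table assumed purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT, library_id INTEGER);
INSERT INTO book (name, library_id) VALUES
    ('Dune', 1), ('Dune', 2), ('Emma', 1);
""")

# count occurrences across ALL libraries, keep only the most frequent
top = conn.execute("""
    SELECT name, COUNT(*) AS occur
    FROM book
    GROUP BY name
    ORDER BY occur DESC
    LIMIT 1
""").fetchone()
```

Dropping the LIMIT gives the alternative the asker mentioned: the full list of books sorted by occurrence.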
I'm creating a heavy HQL query and trying to optimize it. There is a table Product which needs some statistics calculated for every single product; let's call it stat. I'm trying this kind of query to fetch all products and their stats at once (this is a simplified query; the real one is much more complex):
select new map(min(product) as prod, sum(somestat) as stat)
from Product product
left join product.stats somestat
group by product.id, product.name
order by product.name
However, when I execute this kind of query, it first runs the primary select, and then executes X times SELECT product.* FROM product WHERE product.id = ?, selecting every product that was returned.
Is there a way to make it take the results from the first query to create those Product instances?
Thanks in advance.
If you want the whole product, then Hibernate is already doing the only reasonable thing: executing N+1 selects. You happen to group by the primary key, so theoretically one could imagine doing it, but even in SQL you are generally not allowed to select columns that are not used in the GROUP BY. Anyway, such custom trickery is beyond an ORM such as Hibernate.
I have trouble understanding how to avoid the N+1 selects in JPA or Hibernate.
From what I read, there's 'left join fetch', but I'm not sure whether it still works with more than one list (OneToMany).
Could someone explain it to me, or give me a link to a clear, complete explanation, please?
I'm sorry if this is a noob question, but I can't find a really clear article or doc on this issue.
Thanks
Apart from the join, you can also use subselect(s). This results in 2 queries being executed (or in general m + 1, if you have m lists), but it scales well for a large number of lists too, unlike join fetching.
With join fetching, if you fetch 2 tables (or lists) with your entity, you get a cartesian product, i.e. all combinations of pairs of rows from the two tables. If the tables are large, the result can be huge: if both tables have 1000 rows, the cartesian product contains 1 million rows!
A better alternative for such cases is to use subselects. Here you issue 2 extra selects - one for each table - on top of the main select (which loads the parent entity), so if each list has 100 elements, altogether you load 1 + 100 + 100 rows with 3 queries.
For the record, the same with lazy loading would result in 201 separate selects, each loading a single row.
Update: here are some examples:
a tutorial: Tuning Lazy Fetching, with a section on subselects towards the end (btw, it also explains the N+1 selects problem and all the strategies to deal with it),
examples of HQL subqueries from the Hibernate reference,
just in case, the chapter on fetching strategies from the Hibernate reference - similar content to the first one, but much more thorough.
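The subselect strategy described above can be sketched in plain SQL, with hypothetical parent/child tables and SQLite standing in for the real database, to show the m + 1 = 3 queries for two collections:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent  (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE child_a (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE child_b (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO parent VALUES (1, 'Sydney'), (2, 'Sydney');
INSERT INTO child_a (parent_id) VALUES (1), (1), (2);
INSERT INTO child_b (parent_id) VALUES (1), (2), (2);
""")

parent_filter = "SELECT id FROM parent WHERE city = 'Sydney'"
parents = conn.execute(parent_filter).fetchall()                # query 1
list_a = conn.execute(                                          # query 2
    f"SELECT * FROM child_a WHERE parent_id IN ({parent_filter})").fetchall()
list_b = conn.execute(                                          # query 3
    f"SELECT * FROM child_b WHERE parent_id IN ({parent_filter})").fetchall()
# 3 queries load everything; a join fetch of both lists would instead
# return a cartesian product of the two collections per parent row
```

Each collection query reuses the parent filter as a subselect, which is exactly what Hibernate generates for a SUBSELECT fetch.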
How do I build an Oracle PL/SQL query dynamically from a Java application? The user will be presented with a bunch of columns that are present in different tables in the database. The user can select any set of columns, and the application should build the complete SELECT query using only the tables that contain the selected columns.
For example, let's say there are 3 tables in the database. The user selects col11 and col22. In this case, the application should build the query using Tabl1 and Tabl2 only.
How do I achieve this?
Tabl1
- col11
- col12
- col13
Tabl2
- fkTbl1
- col21
- col22
- col23
Tabl3
- col31
- col32
- col33
- fkTbl1
Ad hoc reporting is an old favourite. It frequently appears as a one-liner at the end of the Reports Requirements section: "Users must be able to define and run their own reports". The only snag is that ad hoc reporting is an application in its own right.
You say:
"The user will be presented with a bunch of columns that are present in different tables in the database."
You can avoid some of the complexities I discuss below if the "bunch of columns" (and the spread of tables) is preselected and tightly controlled. Alas, it is in the nature of ad hoc reporting that users will want pretty much all columns from all tables.
Let's start with your example. The user has selected col11 and col22, so you need to generate this query:
SELECT tabl1.col11
, tabl2.col22
FROM tabl1 JOIN tabl2
ON (TABL1.ID = TABL2.FKTABL1)
/
That's not too difficult. You just need to navigate the data dictionary views USER_CONSTRAINTS and USER_CONS_COLUMNS to establish the columns in the join condition - provided you have defined foreign keys (please have foreign keys!).
Things become more complicated if we add a fourth table:
Tabl4
- col41
- col42
- col43
- fkTbl2
Now when the user chooses col11 and col42, you need to navigate the data dictionary to establish that Tabl2 acts as an intermediary table joining Tabl4 and Tabl1 (presuming you are not using composite primary keys, as most people don't). But suppose the user selects col31 and col41. Is that a legitimate combination? Let's say it is. Now you have to join Tabl4 to Tabl2 to Tabl1 to Tabl3. Hmmm...
And what if the user selects columns from two completely unrelated tables - Tabl1 and Tabl23? Do you blindly generate a CROSS JOIN, or do you hurl an exception? The choice is yours.
Going back to that first query: it will return all the rows in both tables. Almost certainly your users will want the option to restrict the result set, so you need to offer them the ability to add filters to the WHERE clause. Gotchas here include:
ensuring that supplied values are of an appropriate data type (no strings for a number, no numbers for a date)
providing look-ups to reference data values
handling multiple values (an IN list rather than equals)
ensuring date ranges are sensible (opening bound before closing bound)
handling free-text searches (are you going to allow it? do you need to use TEXT indexes, or will you run the risk of users executing LIKE '%whatever%' against some CLOB column?)
The last point highlights one risk inherent in ad hoc reporting: if users can assemble a query from any tables with any filters, they can assemble a query that drains all the resources from your system. So it is a good idea to apply profiles to prevent that happening. Also, as I have already mentioned, it is possible for users to build nonsensical queries. Bear in mind that you don't need very many tables in your schema to generate too many permutations to test.
Finally, there is the tricky proposition of security policies. If users are restricted to seeing subsets of data on the basis of their department or job role, then you will need to replicate those rules. In such cases the automatic application of policies through Row Level Security is a real boon.
All of which might lead you to conclude that the best solution would be to persuade your users to acquire an off-the-shelf product instead. Although that approach isn't without its own problems.
The way I've done this kind of thing in the past is simply to construct the SQL query on the fly using a StringBuilder, and then execute it via JDBC as a non-prepared statement. This is rather inefficient, since the Oracle DB has to repeat all of the query analysis and optimization work for each query.
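A hedged sketch of that string-building approach (in Python rather than Java for brevity; the column catalog is hard-coded here as an assumption, whereas in Oracle you would derive the tables, columns and join conditions from the data dictionary views mentioned above):

```python
# hypothetical catalog: which columns live in which table, and how each
# table joins back to Tabl1 (names follow the question's example schema)
TABLES = {
    "Tabl1": {"columns": {"col11", "col12", "col13"}, "join": None},
    "Tabl2": {"columns": {"col21", "col22", "col23"}, "join": "Tabl2.fkTbl1 = Tabl1.id"},
    "Tabl3": {"columns": {"col31", "col32", "col33"}, "join": "Tabl3.fkTbl1 = Tabl1.id"},
}

def build_query(selected):
    # keep only the tables that actually contain a selected column
    tables = [t for t, meta in TABLES.items() if meta["columns"] & selected]
    cols = ", ".join(f"{t}.{c}" for t in tables
                     for c in sorted(TABLES[t]["columns"] & selected))
    joins = [TABLES[t]["join"] for t in tables if TABLES[t]["join"]]
    sql = f"SELECT {cols} FROM {', '.join(tables)}"
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    return sql

query = build_query({"col11", "col22"})
```

Note that this naive version still routes every join through Tabl1; handling intermediary tables, unrelated table pairs, and user-supplied filters runs into exactly the gotchas listed in the answer above.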