I'm writing a heavy HQL query and trying to optimize it. There is a table Product, and some statistics need to be calculated for every single product; let's call the result stat. I'm trying this kind of query to fetch all products and their stats at once (this is a simplified query, the real one is much more complex):
select new map(min(product) as prod, sum(somestat) as stat)
from Product product
left join product.stats somestat
group by product.id, product.name
order by product.name
However, when I execute this kind of query, Hibernate first runs the primary select and then runs SELECT product.* FROM product WHERE product.id=? X times, once for every product that was returned.
Is there a way to make it use the results of the first query to create those Product instances?
Thanks in advance.
If you want the whole product, then Hibernate is already doing the only thing it reasonably can: executing N+1 selects. You happen to group by the primary key, so in theory one could imagine doing it in a single query, but many databases reject selecting non-aggregated columns that are not listed in the GROUP BY clause. Anyway, such custom trickery is beyond an ORM such as Hibernate.
I'm looking to have a GUI where when I click an Invoice it displays the information from both Customer and Product also, such as name, brand etc all in one row.
Do I have to put Name, brand, etc into Invoice too and inner join everything?
Invoice Table Customer Table Product Table
EDIT:
No, there is no need to modify the tables you're referring to. Each of them contains a unique primary key column that is referenced from the invoice table. The INNER JOIN can be formulated based on those keys.
Maybe also worth mentioning: Don't confuse the INNER JOIN with the SELF JOIN which also exists.
The difference is that the INNER JOIN joins two different tables based on specific columns (e.g. id), whereas the SELF JOIN joins a single table with itself.
Yes what you'll need is the INNER JOIN combining the information from your invoice table with the one from the customer table as well as the product table - all based on your given invoice id (column: idInvoice).
To obtain the needed information you don't need to add - and therefore repeat - it in the invoice table. Due to the join they'll be available for selection in one single query.
Your query should look like:
SELECT *
FROM invoice inv, customer cust, product prod
WHERE
inv.idCustomer = cust.idCustomer
AND
inv.idProduct = prod.idProduct
AND
inv.idInvoice = ${theIdOfTheInvoiceTheUserClickedOn}
Note: If you don't need all the information (columns) from the three tables (what the "*" stands for) you can replace the "*" with an enumeration explicitly stating only the columns you want to show. E.g. inv.id, cust.FirstName, cust.LastName.
The exact syntax depends on the database technology/dialect you're using. The example above would be suitable for an Oracle database and should also suit most other databases, since only basic SQL features are used in the query.
I'm assuming you're not using any ORM framework like Hibernate and that you'll need to write the query yourself (since you didn't provide any more detail on your technology stack). In case you're using an ORM framework the approach would need to look different, but the final generated query should look similar.
In the query above, the first two conditions in the WHERE clause form the INNER JOIN implicitly, whereas the third one specifies which exact entry you're looking for.
Although you've asked only if an INNER JOIN is needed, I've provided the query here to you since your question implied you're not sure how to write one.
You might take it as a working example you can compare your solution with. You should try to understand how it works and how it can be written, and also read up on the SQL basics so that you can write it on your own as well.
Tip: PreparedStatements are the way to go to execute such queries against a database from Java in a safe way.
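To make the tip concrete, here is a minimal sketch of the same lookup through a PreparedStatement. It is only an illustration under assumptions: the class and method names are made up, the invoice id is assumed to be numeric, the customer key is assumed to be called idCustomer, and the implicit join is written with explicit JOIN ... ON syntax, which is equivalent.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper class; table and column names follow the query above.
class InvoiceLookup {
    // The ? placeholder replaces the invoice id that was inlined into the SQL text.
    static final String INVOICE_SQL =
        "SELECT * "
      + "FROM invoice inv "
      + "JOIN customer cust ON inv.idCustomer = cust.idCustomer "
      + "JOIN product prod ON inv.idProduct = prod.idProduct "
      + "WHERE inv.idInvoice = ?";

    // Prints every column of the matching row(s). The caller owns the Connection.
    static void printInvoice(Connection connection, long invoiceId) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(INVOICE_SQL)) {
            // The driver binds the value, so it is never concatenated into the SQL string.
            stmt.setLong(1, invoiceId);
            try (ResultSet rs = stmt.executeQuery()) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        System.out.println(meta.getColumnLabel(i) + " = " + rs.getObject(i));
                    }
                }
            }
        }
    }
}
```

Because the id is bound as a parameter, whatever the user clicked on cannot change the structure of the query.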
In my opinion, depending on your application, you can use a flat table that includes everything you need so that no joins are required. This solution is applicable when you have small datasets (e.g. in banking, the relationships between the Terminal table and the ATMTerminal, POSTerminal and CashlessTerminal tables).
For situations where one side of the relationship is static data (as in the example above) and the other side is transactional data (like banking transactions), you should use a foreign key from the transaction table to the static data table.
I recommend the second solution for your problem: handle the relationships using foreign keys and get the columns you need using joins.
I am currently new to neo4j and exploring with cypher queries for a task at hand. I am using neo4j bolt driver in Java
Here is what I am trying to achieve. I have something like below data available as a Java ArrayList(stored in a HashMap):
employerId 2 : [employeeId 1, employeeId 2, employeeId3, ...]
Which basically shows relationship between employer and employee (these are the employees of employer 2)
Now, I need to find these employees & employer in the graph(they may or may not exist already) and create a "(x:Employer) -[Employs]->(y:Employee)" relationship between them.
One way (maybe, naive) that I can think of, is to search for employer and employee every time and run a separate CREATE query for each.
match (employer:Employer{name:"John"}), (employee:Employee{name:"Snow"}) CREATE (employer)-[pr:EMPLOYES]->(employee)
But I feel that it would unnecessarily search for the same Employer node multiple times. And as time is an important criterion for me right now, I am looking for a better way (if one exists).
As a newbie to neo4j, all I can think of is, to do a search for the employer ID once, and then run multiple queries using that result, with Employee ID being searched every time. But I am unable to find the correct query to do this. Moreover, will this be the right approach? I need to prepare this query from Java. So should I query multiple times or send a single query?
This query below looks similar to the one from #Lju. However, it has a few improvements.
The MERGE for the Employer only needs to be done once, so it should come before the UNWIND. Otherwise, it would be done for every Employee.
You should pass the employer name (or id) and the list of employee names (or ids) in parameters. In the following example, the Cypher code refers to the parameters as $employerName and $names.
Also, since the WITH clause between the 2 MERGE clauses was just passing all identifiers forward, it is not needed. (However, Cypher syntax does require a WITH clause between a MERGE and an UNWIND).
Query:
MERGE (employer:Employer {name: $employerName})
WITH employer
UNWIND $names AS name
MERGE (employee:Employee {name: name})
MERGE (employer)-[:Employs]->(employee)
RETURN *
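Since the question asks about preparing this from Java, here is a rough sketch of how the two parameters could be assembled. The class and method names are hypothetical, and RETURN * is left out because the result is not used here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: holds the query text and builds its parameter map.
class EmploysQuery {
    static final String QUERY =
        "MERGE (employer:Employer {name: $employerName}) "
      + "WITH employer "
      + "UNWIND $names AS name "
      + "MERGE (employee:Employee {name: name}) "
      + "MERGE (employer)-[:Employs]->(employee)";

    // Keys must match the $employerName and $names placeholders in QUERY.
    static Map<String, Object> parameters(String employerName, List<String> employeeNames) {
        Map<String, Object> params = new HashMap<>();
        params.put("employerName", employerName);
        // Copy the list so later changes by the caller do not affect the parameters.
        params.put("names", new ArrayList<>(employeeNames));
        return params;
    }
}
```

With the Bolt driver you would then pass both to the session in a single call, along the lines of session.run(EmploysQuery.QUERY, EmploysQuery.parameters(name, employees)), so one round trip per employer creates all of its relationships.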
I'm not a pro in SQL at all :)
Having a very critical performance issue.
Here is the info directly related to problem.
I have 2 tables in my DB: table condos and table items.
Table condos has the fields:
id (PK)
name
city
country
table items:
id (PK)
name
multiple fields not related to issue
condo_id (FK)
I have 1000+ entities in condos table and 1000+ in items table.
The problem is how I perform the items search.
Currently it is:
For example, I want to get all the items for city = Sydney
Perform a SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'
Make a SELECT * FROM public.items WHERE items.condo_id = ? for each condo_id I get in step 1.
The issue is that once I have 1000+ entities in the condos table, the request is performed 1000+ times, once for each condo_id that belongs to 'Sydney'. Executing these requests takes more than 2 minutes, which is a critical performance issue.
So, the question is:
What is the best way for me to perform such a search? Should I put 1000+ ids in a single WHERE clause? Or something else?
For add info, i use PostgreSQL 9.4 and Spring MVC.
Use a table join so that you do not need to perform an additional query. In your case you can join condos and items by condo_id, which looks something like:
SELECT i.*
FROM public.items i join public.condos c on i.condo_id = c.condo_id
WHERE c.city = 'Sydney'
Note that performance tuning is a broad topic. It varies from environment to environment and depends on how you structure the data in your tables and how you organize the data in your code.
Here are some other suggestions that may also help:
Try adding an index to the fields you sort and search on, e.g. city in condos and condo_id in items. There is a good answer that explains how indexing works.
I also recommend running EXPLAIN to see the query plan for your query and check whether it does a full table scan, which may cause a performance issue.
Hope this can help.
Essentially, you need to eliminate the N+1 queries and at the same time ensure that your city field is indexed. You have three mechanisms to choose from. One, the subselect approach, is already stated in one of the other answers you have received. Beyond it you have two more.
You can use what you have stated:
SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'
SELECT *
FROM public.items
WHERE items.condo_id IN (up to 1000 ids here)
The reason I say up to 1000 is that some databases limit the number of entries in an IN list (Oracle, for example, allows at most 1000).
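If you go the IN route from Java, the id list has to be split into batches and each batch needs its own placeholder list. A minimal sketch, assuming hypothetical helper names and that the ids come from the first query:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for running the items query in batches of ids.
class InClauseBatches {
    // Splits ids into consecutive chunks of at most maxSize elements.
    static <T> List<List<T>> chunk(List<T> ids, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxSize) {
            int end = Math.min(start + maxSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    // Builds the items query with one ? placeholder per id in the chunk.
    static String itemsSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("an IN list needs at least one id");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT * FROM public.items WHERE condo_id IN (" + placeholders + ")";
    }
}
```

For each chunk you would prepare itemsSql(chunk.size()) and bind the ids in order, so 2500 condos cost three queries instead of 2500.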
You can also use a join to eliminate the N+1 selects:
SELECT *
FROM public.items join public.condos on items.condo_id=condos.condo_id and condos.city='Sydney'
Now, what is the difference between the three queries?
Subselect query: the pro is that you get everything at once. The con is that performance may suffer if you have too many elements.
Simple IN clause: the pro is that it effectively solves the N+1 problem. The con is that it may lead to some extra queries compared to the subselect.
Joined query: the pro is that you can initialize both Condo and Item in one go. The con is that it leads to some data duplication on the Condo side.
If we look at a framework like Hibernate, in most cases the fetch strategy used is either the joined or the IN strategy. Subselect is used rarely.
Also, if performance is critical, you may consider reading everything into memory and serving it from there. Judging from the content of these two tables, it should be fairly easy to just load it into a Map.
Effectively, anything that solves your N+1 query problem is a solution in your case, since we are talking about just two tables of 1000 rows each. All three options are solutions.
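For the in-memory option, the idea can be sketched like this. The Condo and Item classes are stripped-down stand-ins I made up for the two tables, and the cache is assumed to be filled once from two plain SELECTs:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory index of items by the city of their condo.
class ItemCache {
    // Minimal stand-ins for rows of the condos and items tables.
    static final class Condo {
        final long id;
        final String city;
        Condo(long id, String city) { this.id = id; this.city = city; }
    }

    static final class Item {
        final long id;
        final String name;
        final long condoId;
        Item(long id, String name, long condoId) { this.id = id; this.name = name; this.condoId = condoId; }
    }

    private final Map<String, List<Item>> itemsByCity = new HashMap<>();

    // Builds the index once; each lookup afterwards is a single map access.
    ItemCache(List<Condo> condos, List<Item> items) {
        Map<Long, String> cityByCondoId = new HashMap<>();
        for (Condo condo : condos) {
            cityByCondoId.put(condo.id, condo.city);
        }
        for (Item item : items) {
            String city = cityByCondoId.get(item.condoId);
            if (city != null) { // skip items whose condo is unknown
                itemsByCity.computeIfAbsent(city, key -> new ArrayList<>()).add(item);
            }
        }
    }

    // Returns the items for a city, or an empty list if there are none.
    List<Item> itemsIn(String city) {
        return itemsByCity.getOrDefault(city, Collections.emptyList());
    }
}
```

The trade-off is that the cache has to be refreshed whenever the tables change, so this fits best when the data is read far more often than it is written.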
You could use the first query as a subquery in an in operator in the second query:
SELECT *
FROM public.items
WHERE items.condo_id IN (SELECT condos.condo_id
FROM public.condos
WHERE city = 'Sydney')
I have two tables, Person and House; the mapping is one to one.
Now I have to set the address of both Person and House (which can currently be different) to the same address.
There are more than 5000 records. Which will be faster? Using code to update the entities one by one, e.g.
for (Long id : Ids) {
Person person = PersonDAO.find(id);
person.setAddress ("abc");
}
and then doing same with House;
Or should I use JPQL to update both in two different queries, e.g.
UPDATE Person p SET p.Address = "abc" WHERE ID IN(.....ID QUERY)
My question is what will be faster? Will the update using JPQL have the same performance, same as that in code? Or should I use native query to NOT load the entities, as I only want performance.
Using the query will be faster (and much more memory efficient), as the query provider will translate the JPQL query to native SQL. Also, if you use entities directly, the number of queries made against the database will be significantly higher (one select and one update for each and every row).
The native query will be slightly faster still, as it doesn't have to be translated.
If you want it to be even faster, you can use a PreparedStatement. With the addBatch() method you add the statement to the batch, and with the executeBatch() method you execute the full batch, which reduces the number of round trips to the database.
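A rough sketch of the batch approach with plain JDBC follows. The table name person and the columns address and id are assumptions based on the entity in the question, and the class name is made up.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper that updates many rows with one batched statement.
class AddressBatchUpdate {
    // Table and column names are assumptions; adjust them to the real schema.
    static final String UPDATE_SQL = "UPDATE person SET address = ? WHERE id = ?";

    // Queues one update per id and sends them together.
    // Returns how many statements were in the batch.
    static int updateAddresses(Connection connection, List<Long> ids, String address) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(UPDATE_SQL)) {
            for (Long id : ids) {
                stmt.setString(1, address);
                stmt.setLong(2, id);
                stmt.addBatch(); // queue this parameter set, nothing is sent yet
            }
            int[] counts = stmt.executeBatch(); // send all queued updates
            return counts.length;
        }
    }
}
```

When every row gets the same value, as here, a single UPDATE with a WHERE clause covering all ids does the same work in one statement; batching pays off more when each row needs different values.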
One thing I like about EclipseLink is that it has this great thing called the batch query hint, which I'm yet to find a Hibernate equivalent of.
Basically doing a whole bunch of joins gets messy real quick and you end up querying way more data than you necessarily want (remember that if you join person to 6 addresses the person information is returned 6 times; now keep multiplying that out by extra joins).
Imagine a Person entity with 0:M collections of Address, Email, Phone and OrderHistory. Joining all that is not good but with the batch method:
List persons = entityManager.createQuery("select p from Person p")
.setHint(QueryHints.BATCH, "p.address")
.setHint(QueryHints.BATCH, "p.email")
.setHint(QueryHints.BATCH, "p.phone")
.setHint(QueryHints.BATCH, "p.orderHistory")
.getResultList();
This will do a query on the Person table and that's it. When you first access an address record, it will do a single query for the entire Address table. If you specified a where clause on the Person query, the same criteria will be used for the Address load too.
So instead of doing 1 query, you do 5.
If you were doing that with joins you might get it all in one query but you may very well be loading way more data because of the joins.
Anyway, I've gone looking in the Hibernate docs for an equivalent to this but don't see one. Is there one?
There isn't one.
There are two things I know of that might help:
1) hibernate.default_batch_fetch_size
2) Criteria.setFetchMode and Criteria.setFetchSize