I'm looking for something which would allow me to replicate something like Jackson's #JsonView functionality but on a database level. Often I feel like I've hit the wall trying to balance between query performance and having all the necessary data returned from the API.
For example I have a class structure like:
class Employee {
String name;
#ManyToOne(fetchType = EAGER)
Company company;
// A lot of other data
}
class Company {
String name;
// A lot of other data
}
Now I would like to display a list of Employees and additionally show the Company name of each employee in the table. Ideally in raw SQL world I would do something like
select e.name, c.name from Employees as e left join Companies as c on e.companyId = c.id
But what instead happens is that hibernate queries the whole company data as well including other relations and their relations which often makes it really slow. Lazy loading is also slower and doesn't work when you need to return the queried data from a controller.
Maybe there is something which would allow me to declare a data view in a manner like:
class EmployeeListView {
String name = Path.to("Employee.name");
String companyName = Path.to("Employee.company.name")
}
?
#NamedEntityGraph is the answer, for anyone concerned
Related
I have a JPA select where you receibe a parameter then we can search using some attributes (username, email, identifier) The user only have a text field to write the criteria text.
The problem is perfomance, in the database We have about 9Millions of users registereds, and the search is too slow, using JPA,
Form:
- Value (Input text)
User sends the value in the form (He doesn't say if he is using username, email or identifier)
User (Table) Fields:
- Identifier
- Name
- Email
JPA Query:
select u from UserEntity u where u.alias LIKE lower(:query) OR u.email LIKE lower(:query) OR lower(u.identifier) LIKE lower(:query) ORDER BY u.alias
I don't know what the best method to improve the speed of the search, (We have some indexes in the table in these fields), if we remove the lower in the u.identifier field the speed improves a lot (almost instant). But we can have the identifier in a lot of ways (migrations, registers, manual client inserts..)
We have about 9 Millions of users registered, and the search is too slow, using JPA
Please note the search will be too slow regardless the technology involved in executing the query just because the SQL query itself is too slow. That being said the problem is not JPA but how to improve the SQL query.
Combining LOWER() function with LIKE operator adds a lot of overhead because the RDBMS must apply a text function to 3 fields before analyse the LIKE match.
IMHO a good approach is using a VIEW to leverage the LOWER() part on the 3 fields and then execute the query over this view (BTW mapping a View with JPA is just as simple as mapping a Table):
CRAETE VIEW user_view AS SELECT id, lower(identifier) AS identifier, lower(alias) AS alias, lower(email) AS email FROM user;
Then create an entity for searching:
#Entity
#Table("user_view")
public class UserView {
#Basic private Long id;
#Basic private String identifier;
#Basic private String alias;
#Basic private String email;
// getters and setters as required
}
And finally the JPQL query:
String jqpl = "SELECT u FROM UserView u WHERE u.alias LIKE :query OR u.email LIKE :query OR u.identifier LIKE :query";
Note that you can pass query parameter directly in lower case as a Query parameter and you can order the result list after the query execution. Both will result in the RDBMS making less effort and consequently improving the overall response time.
You can solve it using different techniques.
One is using a collate that ignores case. For example in MySQL https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html
Another strategy is to have a derived field with an index. In DB2, for example, you can create a field 'lower_identifier generated always as lower(identifier)' and an index in the field and it will use it automatically when doing a lower search. You don't have to map the field in JPA.
Let's say I have a List of entities:
List<SomeEntity> myEntities = new ArrayList<>();
SomeEntity.java:
#Entity
#Table(name = "entity_table")
public class SomeEntity{
#Id
#GeneratedValue(strategy = GenerationType.AUTO)
private long id;
private int score;
public SomeEntity() {}
public SomeEntity(long id, int score) {
this.id = id;
this.score = score;
}
MyEntityRepository.java:
#Repository
public interface MyEntityRepository extends JpaRepository<SomeEntity, Long> {
List<SomeEntity> findAllByScoreGreaterThan(int Score);
}
So when I run:
myEntityRepository.findAllByScoreGreaterThan(10);
Then Hibernate will load all of the records in the table into memory for me.
There are millions of records, so I don't want that. Then, in order to intersect, I need to compare each record in the result set to my List.
In native MySQL, what I would have done in this situation is:
create a temporary table and insert into it the entities' ids from the List.
join this temporary table with the "entity_table", use the score filter and then only pull the entities that are relevant to me (the ones that were in the list in the first place).
This way I gain a big performance increase, avoid any OutOfMemoryErrors and have the machine of the database do most of the work.
Is there a way to achieve such an outcome with Spring Data JPA's query methods (with hibernate as the JPA provider)? I couldn't find in the documentation or in SO any such use case.
I understand you have a set of entity_table identifiers and you want to find each entity_table whose identifier is in that subset and whose score is greater than a given score.
So the obvious question is: how did you arrive to the initial subset of entity_tables and couldn't you just add the criteria of that query to your query that also checks for "score is greater than x"?
But if we ignore that, I think there's two possible solutions. If the list of some_entity identifiers is small (what exactly is "small" depends on your database), you could just use an IN clause and define your method as:
List<SomeEntity> findByScoreGreaterThanAndIdIn(int score, Set<Long) ids)
If the number of identifiers is too large to fit in an IN clause (or you're worried about the performance of using an IN clause) and you need to use a temporary table, the recipe would be:
Create an entity that maps to your temporary table. Create a Spring Data JPA repository for it:
class TempEntity {
#Id
private Long entityId;
}
interface TempEntityRepository extends JpaRepository<TempEntity,Long> { }
Use its save method to save all the entity identifiers into the temporary table. As long as you enable insert batching this should perform all right -- how to enable differs per database and JPA provider, but for Hibernate at the very least set the hibernate.jdbc.batch_size Hibernate property to a sufficiently large value. Also flush() and clear() your entityManager regularly or all your temp table entities will accumulate in the persistence context and you'll still run out of memory. Something along the lines of:
int count = 0;
for (SomeEntity someEntity : myEntities) {
tempEntityRepository.save(new TempEntity(someEntity.getId());
if (++count == 1000) {
entityManager.flush();
entityManager.clear();
}
}
Add a find method to your SomeEntityRepository that runs a native query that does the select on entity_table and joins to the temp table:
#Query("SELECT id, score FROM entity_table t INNER JOIN temp_table tt ON t.id = tt.id WHERE t.score > ?1", nativeQuery = true)
List<SomeEntity> findByScoreGreaterThan(int score);
Make sure you run both methods in the same transaction, so create a method in a #Service class that you annotate with #Transactional(Propagation.REQUIRES_NEW) that calls both repository methods in succession. Otherwise your temp table's contents will be gone by the time the SELECT query runs and you'll get zero results.
You might be able to avoid native queries by having your temp table entity have a #ManyToOne to SomeEntity since then you can join in JPQL; I'm just not sure if you'll be able to avoid actually loading the SomeEntitys to insert them in that case (or if creating a new SomeEntity with just an ID would work). But since you say you already have a list of SomeEntity that's perhaps not a problem.
I need something similar myself, so will amend my answer as I get a working version of this.
You can:
1) Make a paginated native query via JPA (remember to add an order clause to it) and process a fixed amount of records
2) Use a StatelessSession (see the documentation)
I have this class mapped
#Entity
#Table(name = "USERS")
public class User {
private long id;
private String userName;
}
and I make a query:
Query query = session.createQuery("select id, userName, count(userName) from User order by count(userName) desc");
return query.list();
How can I access the values returned by the query?
I mean, how should I treat the query.list()? As a User or what?
To strictly answer your question, queries that specify a property of a class in the select clause (and optionally call aggregate functions) return "scalar" results i.e. a Object[] (or a List<Object[]>). See 10.4.1.3. Scalar results.
But your current query doesn't work. You'll need something like this:
select u.userName, count(u.userName)
from User2633514 u
group by u.userName
order by count(u.userName) desc
I'm not sure how Hibernate handles aggregates and counts, but I'm not sure if your query is going to work at all. You're trying to select a aggregate (i.e. the "count(userName)"), but you don't have a "group by" clause for userName.
If the query does in fact work, and Hibernate can figure out what to do with it, the results you get back will most likely be a raw Object[], because Hibernate will not be able to map your "count(userName)" data into any field on your mapped objects.
Overall, when you get into using aggregates in queries, Hibernate can get a little more tricky, since you're no longer mapping tables/columns directly into classes/fields. It might be a good idea to read up more on how to do aggregates in Hibernate, from their documentation.
I have two classes, Person and Company, derived from another class Contact. They are represented as polymorphically in two tables (Person and Company). The simplified classes look like this:
public abstract class Contact {
Integer id;
public abstract String getDisplayName();
}
public class Person extends Contact {
String firstName;
String lastName;
public String getDisplayName() {
return firstName + " " + lastName;
}
}
public class Company extends Contact {
String name;
public String getDisplayName() {
return name;
}
}
The problem is that I need to make a query finding all contacts with displayName containing a certain string. I can't make the query using displayName because it is not part of either table. Any ideas on how to do this query?
Because you do the concatenation in the Java class, there is no way that Hibernate can really help you with this one, sorry. It can simply not see what you are doing in this method, since it is in fact not related to persistence at all.
The solution depends on how you mapped the inheritance of these classes:
If it is table-per-hierarchy you can use this approach: Write a SQL where clause for a criteria query, and then use a case statement:
s.createCriteria(Contact.class)
.add(Restrictions.sqlRestriction("? = case when type='Person' then firstName || ' '|| lastName else name end"))
.list();
If it is table per-concrete-subclass, then you are better of writing two queries (since that is what Hibernate will do anyway).
You could create a new column in the Contact table containing the respective displayName, which you could fill via a Hibernate Interceptor so it would always contain the right string automatically.
The alternative would be having two queries, one for the Person and one for the Company table, each containing the respective search logic. You may have to use native queries to achieve looking for a concatenated string via a LIKE query (I'm not a HQL expert, though, it may well be possible).
If you have large tables, you should alternatively think about full-text indexing, as LIKE '%...%' queries require a full table scan unless your database supports full text indexes.
If you change displayName to be a mapped property (set to the name column in Company and to a formula like first||' '||last in Person), then can query for Contract and Hibernate will run two queries both of which now have a displayName. You will get back a List of two Lists, one containing Companies and one containing Persons so you'll have to merge them back together. I think you need to query by the full package name of Contract or set up a typedef to tell Hibernate about it.
What exactly does JPA's fetch strategy control? I can't detect any difference between eager and lazy. In both cases JPA/Hibernate does not automatically join many-to-one relationships.
Example: Person has a single address. An address can belong to many people. The JPA annotated entity classes look like:
#Entity
public class Person {
#Id
public Integer id;
public String name;
#ManyToOne(fetch=FetchType.LAZY or EAGER)
public Address address;
}
#Entity
public class Address {
#Id
public Integer id;
public String name;
}
If I use the JPA query:
select p from Person p where ...
JPA/Hibernate generates one SQL query to select from Person table, and then a distinct address query for each person:
select ... from Person where ...
select ... from Address where id=1
select ... from Address where id=2
select ... from Address where id=3
This is very bad for large result sets. If there are 1000 people it generates 1001 queries (1 from Person and 1000 distinct from Address). I know this because I'm looking at MySQL's query log. It was my understanding that setting address's fetch type to eager will cause JPA/Hibernate to automatically query with a join. However, regardless of the fetch type, it still generates distinct queries for relationships.
Only when I explicitly tell it to join does it actually join:
select p, a from Person p left join p.address a where ...
Am I missing something here? I now have to hand code every query so that it left joins the many-to-one relationships. I'm using Hibernate's JPA implementation with MySQL.
Edit: It appears (see Hibernate FAQ here and here) that FetchType does not impact JPA queries. So in my case I have explicitly tell it to join.
JPA doesn't provide any specification on mapping annotations to select fetch strategy. In general, related entities can be fetched in any one of the ways given below
SELECT => one query for root entities + one query for related mapped entity/collection of each root entity = (n+1) queries
SUBSELECT => one query for root entities + second query for related mapped entity/collection of all root entities retrieved in first query = 2 queries
JOIN => one query to fetch both root entities and all of their mapped entity/collection = 1 query
So SELECT and JOIN are two extremes and SUBSELECT falls in between. One can choose suitable strategy based on her/his domain model.
By default SELECT is used by both JPA/EclipseLink and Hibernate. This can be overridden by using:
#Fetch(FetchMode.JOIN)
#Fetch(FetchMode.SUBSELECT)
in Hibernate. It also allows to set SELECT mode explicitly using #Fetch(FetchMode.SELECT) which can be tuned by using batch size e.g. #BatchSize(size=10).
Corresponding annotations in EclipseLink are:
#JoinFetch
#BatchFetch
"mxc" is right. fetchType just specifies when the relation should be resolved.
To optimize eager loading by using an outer join you have to add
#Fetch(FetchMode.JOIN)
to your field. This is a hibernate specific annotation.
The fetchType attribute controls whether the annotated field is fetched immediately when the primary entity is fetched. It does not necessarily dictate how the fetch statement is constructed, the actual sql implementation depends on the provider you are using toplink/hibernate etc.
If you set fetchType=EAGER This means that the annotated field is populated with its values at the same time as the other fields in the entity. So if you open an entitymanager retrieve your person objects and then close the entitymanager, subsequently doing a person.address will not result in a lazy load exception being thrown.
If you set fetchType=LAZY the field is only populated when it is accessed. If you have closed the entitymanager by then a lazy load exception will be thrown if you do a person.address. To load the field you need to put the entity back into an entitymangers context with em.merge(), then do the field access and then close the entitymanager.
You might want lazy loading when constructing a customer class with a collection for customer orders. If you retrieved every order for a customer when you wanted to get a customer list this may be a expensive database operation when you only looking for customer name and contact details. Best to leave the db access till later.
For the second part of the question - how to get hibernate to generate optimised SQL?
Hibernate should allow you to provide hints as to how to construct the most efficient query but I suspect there is something wrong with your table construction. Is the relationship established in the tables? Hibernate may have decided that a simple query will be quicker than a join especially if indexes etc are missing.
Try with:
select p from Person p left join FETCH p.address a where...
It works for me in a similar with JPA2/EclipseLink, but it seems this feature is present in JPA1 too:
If you use EclipseLink instead of Hibernate you can optimize your queries by "query hints". See this article from the Eclipse Wiki: EclipseLink/Examples/JPA/QueryOptimization.
There is a chapter about "Joined Reading".
to join you can do multiple things (using eclipselink)
in jpql you can do left join fetch
in named query you can specify query hint
in TypedQuery you can say something like
query.setHint("eclipselink.join-fetch", "e.projects.milestones");
there is also batch fetch hint
query.setHint("eclipselink.batch", "e.address");
see
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
I had exactly this problem with the exception that the Person class had a embedded key class.
My own solution was to join them in the query AND remove
#Fetch(FetchMode.JOIN)
My embedded id class:
#Embeddable
public class MessageRecipientId implements Serializable {
#ManyToOne(targetEntity = Message.class, fetch = FetchType.LAZY)
#JoinColumn(name="messageId")
private Message message;
private String governmentId;
public MessageRecipientId() {
}
public Message getMessage() {
return message;
}
public void setMessage(Message message) {
this.message = message;
}
public String getGovernmentId() {
return governmentId;
}
public void setGovernmentId(String governmentId) {
this.governmentId = governmentId;
}
public MessageRecipientId(Message message, GovernmentId governmentId) {
this.message = message;
this.governmentId = governmentId.getValue();
}
}
Two things occur to me.
First, are you sure you mean ManyToOne for address? That means multiple people will have the same address. If it's edited for one of them, it'll be edited for all of them. Is that your intent? 99% of the time addresses are "private" (in the sense that they belong to only one person).
Secondly, do you have any other eager relationships on the Person entity? If I recall correctly, Hibernate can only handle one eager relationship on an entity but that is possibly outdated information.
I say that because your understanding of how this should work is essentially correct from where I'm sitting.