In my DB schema I have conversations with several emails. I want to get the newest emails from a list of conversations. In PostgreSql the query:
select distinct on (conversation_id) *
from email
where conversation_id in (7085214, 7084964)
order by conversation_id, processing_date desc
OpenJPA:
(List<Email>) entityManager.createQuery(
        "SELECT DISTINCT(email.conversation.id), email FROM Email email "
      + "WHERE email.conversation.id IN :id "
      + "ORDER BY email.conversation.id, email.processingDate DESC")
    .setParameter("id", conversationIds)
    .getResultList();
It gives back a map of the conversation ids and the whole list of emails in the conversations.
How could I make it right?
Thanks
Use native SQL.
The only other way to do what you want is to develop a patch to OpenJPA that "teaches" it how to use the PostgreSQL extension DISTINCT ON in its JPQL parser and query generator. Most ORMs accept such extensions via dialect hooks. Don't expect this to be a simple task, though - unless you're writing a lot of these queries, native SQL is almost certain to be much easier.
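A minimal sketch of that native-SQL route, mapping the rows straight back onto the Email entity from the question. Whether a collection can be bound to the IN list of a native query is provider-specific, so the conversation ids are inlined here:
// Hedged sketch: native PostgreSQL DISTINCT ON mapped onto the Email entity.
@SuppressWarnings("unchecked")
List<Email> newest = entityManager.createNativeQuery(
        "SELECT DISTINCT ON (conversation_id) * "
      + "FROM email "
      + "WHERE conversation_id IN (7085214, 7084964) "
      + "ORDER BY conversation_id, processing_date DESC",
        Email.class)
    .getResultList();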
You can't just use DISTINCT or DISTINCT ON like functions. They aren't functions; they're completely separate syntax. A JPQL engine would try to convert it into a true function call that would fail at runtime - or, in the case of DISTINCT ON, just fail to parse in the first place.
BTW, DISTINCT ON is a bit like GROUP BY in some other vendors' databases, such as MySQL, where you're allowed to put columns in the SELECT that appear neither in the GROUP BY nor in an aggregate. So in MySQL people probably get this effect by producing a technically invalid query that MySQL accepts anyway - and it's quite likely the OpenJPA JPQL handler won't notice the problem, so it passes through fine. That trick won't work for DISTINCT ON, and PostgreSQL is strict about GROUP BY: it won't let you produce a non-deterministic query that selects columns outside the GROUP BY or an aggregate.
I have a requirement to perform a scheduled dump of a SQL query from a web application. Initially it was an entire table (only the table name was configurable), but then the addition of a configurable WHERE clause was raised, along with a subset of columns.
The configurable options now required are:
columns
table name
where clause
At this point, it might as well just be the entire query, right?!
I know that SQLi can be mitigated somewhat by java.sql.PreparedStatement, but as far as I can tell, that relies on knowing the columns and datatypes at compile time.
The configurable items will not be exposed to end users. They will sit in a properties file under WEB-INF/classes, so the users I am defending against here are sysadmins who are not as good as they think they are.
Am I being over cautious here?
If nothing else, can java.sql.PreparedStatement prevent multiple queries from being executed if, say, the WHERE clause was Robert'); DROP TABLE students;--?
A prepared statement will not handle this for you. With a prepared statement you can only safely add parameters to your query, not table names, column names or entire where clauses.
Especially the latter makes it virtually impossible to prevent injection if there are no constraints whatsoever. Column and table name parameters could be checked against a list of valid values, either statically defined or derived dynamically from your database structure. You could do some basic regex checking on the WHERE parameter, but that will only really help against obvious SQL injection.
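A minimal sketch of that whitelisting idea, assuming the allowed identifiers can be listed up front (the table and column names below are placeholders). Identifiers are validated before concatenation, and only the WHERE value is bound as a parameter:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Set;

class ExportQueryBuilder {
    // Hypothetical whitelists; in practice these could be read from the schema metadata.
    private static final Set<String> ALLOWED_TABLES  = Set.of("students", "courses");
    private static final Set<String> ALLOWED_COLUMNS = Set.of("id", "name", "enrolled_on");

    PreparedStatement build(Connection con, String table, List<String> columns,
                            Object whereValue) throws SQLException {
        if (!ALLOWED_TABLES.contains(table) || !ALLOWED_COLUMNS.containsAll(columns)) {
            throw new IllegalArgumentException("identifier not on the whitelist");
        }
        // Identifiers are concatenated only after validation; the value stays a bind parameter.
        String sql = "SELECT " + String.join(", ", columns)
                   + " FROM " + table
                   + " WHERE id = ?";
        PreparedStatement ps = con.prepareStatement(sql);
        ps.setObject(1, whereValue);
        return ps;
    }
}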
With the flexibility you intend to offer in the form of SELECT ... FROM ... WHERE ... you could have queries like this:
SELECT mycolumn FROM mytable WHERE id = 1 AND 'username' in (SELECT username FROM users)
You could look at something like jOOQ to offer safe dynamic query building while still being able to constrain the things your users are allowed to query for.
Constraining your users in one way or another is key here. Not doing so means you have to worry not just about SQL injection, but also about performance issues, for instance. Providing them with a visual (drag-and-drop) query builder is one way to do that.
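If jOOQ is an option, a rough sketch of constrained dynamic building might look like the following (the emp/salary/dept names are illustrative; jOOQ renders the value as a bind parameter):
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;
import static org.jooq.impl.DSL.using;

import org.jooq.DSLContext;
import org.jooq.Result;
import org.jooq.SQLDialect;

// 'connection' is an open java.sql.Connection
DSLContext ctx = using(connection, SQLDialect.POSTGRES);
Result<?> result = ctx
        .select(field("salary"))
        .from(table("emp"))
        .where(field("dept").eq("sales"))   // rendered as a bind parameter
        .fetch();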
"It all depends".
If you have an application where users can type in the where clause as free text, then yes, they can construct SQL Injection attacks. They can also grind your server to a halt by selecting huge cartesian joins.
You could create a visual query builder - use the schema metadata to show a list of tables, and once the table is selected the columns, and for each column the valid comparisons. You can then construct the query as a parameterized query, and limit the human input to the comparison values, which you can in turn use as parameters.
It's a lot of work, though, and in most production systems of any scale, letting users run this kind of query is usually not particularly useful...
It's insecure to allow users to execute arbitrary queries. This is the kind of thing you'd see at Equifax. You don't want to allow it.
Prepared statements don't help make SQL expressions safe. Using parameters in prepared statements helps make values safe. You can use a parameter only in the place where you would normally put a constant value, like a number, a quoted string, or a quoted date.
The easiest solution would be to NOT allow arbitrary queries or expressions on demand.
Instead, allow users to submit their custom query for review.
The query is reviewed by a human being, who may authorize the stored query to be run by the user (or other users). If you think you can develop some kind of automatic validator, be my guest, but IMHO that's bound to be a lot more work than just having a qualified database administrator review it.
Subsequently, the user is allowed to run the stored query on demand, but only by its id.
Here's another alternative idea: users who want to run custom queries can apply to get a replica of the database, to host on their own computer. They will get a dump of the subset of data they are authorized to view. Then if they run queries that trash the data, or melt their computer, that's their business.
Which one is better among following(EJB 3 JPA)
//Query
a). getEntityManager().createQuery("select o from User o");
//Named Query where findAllUser is defined at Entity level
b). getEntityManager().createNamedQuery("User.findAllUser");
//Native Query
c). getEntityManager().createNativeQuery("SELECT * FROM TBLMUSER ");
Please explain which approach is better in which case.
createQuery()
It should be used for dynamic query creation.
//Example dynamic query
StringBuilder builder = new StringBuilder("select e from Employee e");
if (empName != null) {
    builder.append(" where e.name = :name");
}
Query query = getEntityManager().createQuery(builder.toString());
if (empName != null) {
    query.setParameter("name", empName);
}
createNamedQuery()
It is like a constant that can be reused by name. You should use it for common database calls, such as "find all users", "find by id", etc.
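For example, a named query declared on the entity and reused by name, following the question's User.findAllUser naming (the typed createNamedQuery overload is JPA 2.0+):
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedQuery;

@Entity
@NamedQuery(name = "User.findAllUser", query = "SELECT u FROM User u")
public class User {
    @Id
    private Long id;
    // other fields omitted
}

// elsewhere, reused by name:
List<User> users = getEntityManager()
        .createNamedQuery("User.findAllUser", User.class)
        .getResultList();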
createNativeQuery()
This creates a query that depends completely on the underlying database's SQL scripting language support. It is useful when a complex query is required and the JPQL syntax does not support it.
However, it can impact your application and require more work if the underlying database is changed from one vendor to another, for example if your development environment uses MySQL while your production environment runs Oracle. Also, binding the returned results can be complex when a query selects more than a single column or entity.
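For instance, a scalar native query that selects more than one column comes back as a list of Object[] rows that you unpack by position (the column names here are assumptions on top of the TBLMUSER example above):
@SuppressWarnings("unchecked")
List<Object[]> rows = getEntityManager()
        .createNativeQuery("SELECT USER_ID, USER_NAME FROM TBLMUSER")
        .getResultList();

for (Object[] row : rows) {
    Number id   = (Number) row[0];
    String name = (String) row[1];
    // map into a DTO or entity by hand
}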
For me, the better choice is clearly one of the first two, i.e. the JPQL queries. With the second, the entity manager compiles (and validates) the named queries while loading the persistence unit, whereas the first only surfaces errors at execution time.
You also get support in some IDEs, and JPQL supports the object notation (e.g. select b from EntityA a left join a.entityB b) and other constructs introduced by the object-relational mapping (collections, indexes, etc.).
On the other hand, use native queries only as a last resort, for the corner cases JPQL cannot express (such as window functions, e.g. select id, row_number() over (partition by group_id) from table).
Native SQL is not necessarily faster than a Hibernate/JPA query; a Hibernate/JPA query is ultimately translated into SQL anyway. In some cases Hibernate/JPA does not generate the most efficient statements, and then native SQL can be faster - but with native SQL your application loses portability between databases, so it is normally better to tune the Hibernate/JPA mapping and the JPQL/HQL statement so that more efficient SQL is generated. On the other hand, with native SQL you miss out on the Hibernate cache, so in some cases native SQL can even be slower than a Hibernate/JPA query.
As for the performance argument: in most cases it is irrelevant whether you load all columns or only the needed ones. In database access the time is spent locating the row, not transferring the data into your application, so reading only the necessary columns rarely gains much.
Simple Answer:
1) createQuery() - when you need to build your query dynamically at runtime.
2) createNamedQuery() - for common, reusable database calls such as findBy<attribute>, findAll, etc.
3) createNativeQuery() - when you need database-vendor-specific SQL; this brings a portability challenge.
Named queries are the same as ordinary queries. They are named only to make them reusable, and they can be declared in various places, e.g. in class mappings or configuration files (so you can change a query without changing the actual code).
Native queries are just that: raw SQL. You have to do everything JPA queries do for you, e.g. binding and quoting values. JPA queries use DBMS-independent syntax (JPQL in your case), so changing the database system (say from MySQL to PostgreSQL or H2) requires less work, because it does not (or not always) require rewriting queries the way native SQL would.
Named Query:
All the required queries for an entity are written in one place and differentiated by name, and we can use them based on that name. There is no need to write the entire query each time; just use the name of the query.
For example:
@NamedQuery(name = "User_detailsbyId", query = "SELECT u FROM UserDetails u WHERE u.userId = :userId")
I am trying to implement memcached in my web application and just wanted to get suggestions on whether what I am doing is right in terms of design.
I have a class SimpleDataAccessor which runs all my insert, update and select SQL queries. So any query that has to be performed is executed inside a method of this class.
Inside the method where I have my select query implementation, I have a method which stores the resultset in memcached like this:
storeinMC(resultset.getJSON(),sqlquery);
The sqlquery here is my key.
Also, before running the select query I check in memcached whether I already have a resultset for that query:
String res = getRSFromMC(sqlquery);
if (res == null) { /* run the query against the database and store the result */ }
So I've tried to keep it plain and simple.
Do you see any issues with this?
As rai.skumar rightly pointed out, your SQL statements could be constructed differently (e.g. a WHERE clause could contain the same conditions in a different order).
So to overcome the above-mentioned issues, you need to parse your SQL and extract all the relevant pieces from it. Then you can combine these pieces into a cache key.
You can take a look at SQL parsers: ZQL, JSqlParser, General SQL Parser for Java that return you java classes out of your SQL.
Another option would be to use JPA instead of straight JDBC. For example, Hibernate has great JPA support and is fully capable of caching your queries.
If you feel closer to JDBC, you could use MyBatis, which has a very JDBC-like syntax and caching support.
Consider below queries:
String k1 = "Select * from table"; //Query1
String k2 = "Select * from TABLE"; // Query2 ; notice TABLE is in caps
Both of the above SQL queries are the same and will fetch the same data. But if they are used as keys in memcached, they will be stored in different places (as k1.equals(k2) will return false).
Even if you could somehow ensure that there are no typos or extra spaces, it would still not be very efficient, as the keys/queries could be very big.
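A minimal sketch of a normalized cache key: trim, collapse whitespace, then hash so very long SQL strings don't become bulky keys. Note that naively lower-casing the whole string would also change string literals, so safely folding identifier case (the table vs TABLE case above) really needs a SQL parser, as the previous answer suggests:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

static String cacheKey(String sql) throws NoSuchAlgorithmException {
    // collapse runs of whitespace; leave case alone to avoid touching literals
    String normalized = sql.trim().replaceAll("\\s+", " ");
    byte[] digest = MessageDigest.getInstance("SHA-256")
            .digest(normalized.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
        hex.append(String.format("%02x", b));
    }
    return hex.toString();
}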
Can anyone suggest some good implementation examples or usage scenarios where SQL parsers can be used in Java?
I have an application where we need to filter data to be presented on UI based on certain parameters, sort criteria etc.
I have some doubts regarding this:
1) Can an SQL parser be a good solution for this?
2) How can the UI play a role in providing the query to the Java layer?
Do you want to construct the SQL query dynamically in your Java app and then fetch data using that SQL? Let's say you have SQL like this:
select salary from emp where dept='sales'
After users pick some additional filters from the UI, such as age > 40, the SQL becomes:
select salary from emp where dept='sales' and age > 40
Of course, you may also want to add more complicated filters or a sort clause.
In order to achieve this, a full SQL parser is helpful; here is a demo that illustrates how to deconstruct, modify and rebuild a SQL statement using a Java SQL parser.
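A rough sketch of the deconstruct-modify-rebuild idea, here using JSqlParser rather than the parser in the linked demo (the library choice and API names, as of its 4.x releases, are my assumption):
import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.expression.Expression;
import net.sf.jsqlparser.expression.operators.conditional.AndExpression;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.statement.select.PlainSelect;
import net.sf.jsqlparser.statement.select.Select;

static String addFilter(String sql, String condition) throws JSQLParserException {
    Select select = (Select) CCJSqlParserUtil.parse(sql);
    PlainSelect plain = (PlainSelect) select.getSelectBody();

    // append the extra condition to the existing WHERE clause, if any
    Expression extra = CCJSqlParserUtil.parseCondExpression(condition);
    plain.setWhere(plain.getWhere() == null
            ? extra
            : new AndExpression(plain.getWhere(), extra));
    return select.toString();
}

// addFilter("select salary from emp where dept='sales'", "age > 40")
// -> roughly: SELECT salary FROM emp WHERE dept = 'sales' AND age > 40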
I don't know what a SQL parser is supposed to be here. Applications don't have to parse SQL. They have to execute SQL queries, and thus potentially generate SQL queries dynamically. But it's the database that parses the SQL, not the application.
The UI doesn't typically provide a query to the service layer. The UI doesn't have to know how the data is persisted or which queries to execute; that's not its responsibility. The UI should just pass Filter or Criteria objects to the service layer, which transforms the filter or criteria into a SQL query, executes the query, transforms the results into Java objects, and returns those objects to the UI, which displays them.
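A hedged sketch of that idea with the JPA Criteria API; the Employee entity and its dept/age fields are assumptions, standing in for whatever the UI lets the user filter on:
import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

public List<Employee> find(EntityManager em, String dept, Integer minAge) {
    CriteriaBuilder cb = em.getCriteriaBuilder();
    CriteriaQuery<Employee> cq = cb.createQuery(Employee.class);
    Root<Employee> e = cq.from(Employee.class);

    // each non-null UI filter becomes a predicate; values are bound, never concatenated
    List<Predicate> where = new ArrayList<>();
    if (dept != null) {
        where.add(cb.equal(e.get("dept"), dept));
    }
    if (minAge != null) {
        where.add(cb.ge(e.<Integer>get("age"), minAge));
    }

    cq.select(e).where(where.toArray(new Predicate[0]));
    return em.createQuery(cq).getResultList();
}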
How do I build oracle pl/sql query dynamically from a java application? The user will be presented with a bunch of columns that are present in different tables in the database. The user can select any set of column and the application should build the complete select query using only the tables that contain the selected columns.
For example, let's consider that there are 3 tables in the database. The user selects col11 and col22. In this case, the application should build the query using Tabl1 and Tabl2 only.
How do I achieve this?
Tabl1
- col11
- col12
- col13
Tabl2
- fkTbl1
- col21
- col22
- col23
Tabl3
- col31
- col32
- col33
- fkTbl1
Ad hoc reporting is an old favourite. It frequently appears as a one-liner at the end of the Reports Requirements section: "Users must be able to define and run their own reports". The only snag is that ad hoc reporting is an application in its own right.
You say
"The user will be presented with a bunch of columns that are present in different tables in the database."
You can avoid some of the complexities I discuss below if the "bunch of columns" (and the spread of tables) is preselected and tightly controlled. Alas, it is in the nature of ad hoc reporting that users will want pretty much all columns from all tables.
Let's start with your example. The user has selected col11 and col22, so you need to generate this query:
SELECT tabl1.col11
, tabl2.col22
FROM tabl1 JOIN tabl2
ON (TABL1.ID = TABL2.FKTABL1)
/
That's not too difficult. You just need to navigate the data dictionary views USER_CONSTRAINTS and USER_CONS_COLUMNS to establish the columns in the join condition - providing you have defined foreign keys (please have foreign keys!).
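A rough sketch of that dictionary lookup over JDBC, finding the FK columns that join a child table to its parent (the dictionary views and their columns are standard Oracle; the table names are the caller's input, and 'connection' is assumed to be an open java.sql.Connection):
import java.sql.PreparedStatement;
import java.sql.ResultSet;

String sql =
    "SELECT cc.column_name AS child_column, rc.column_name AS parent_column "
  + "FROM user_constraints c "
  + "JOIN user_cons_columns cc ON cc.constraint_name = c.constraint_name "
  + "JOIN user_cons_columns rc ON rc.constraint_name = c.r_constraint_name "
  + "                         AND rc.position = cc.position "
  + "WHERE c.constraint_type = 'R' "
  + "  AND c.table_name = ? "      // child, e.g. TABL2
  + "  AND rc.table_name = ?";     // parent, e.g. TABL1
try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setString(1, "TABL2");
    ps.setString(2, "TABL1");
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // e.g. FKTBL1 -> ID, which is enough to emit "ON (TABL1.ID = TABL2.FKTBL1)"
            System.out.println(rs.getString("child_column") + " -> "
                             + rs.getString("parent_column"));
        }
    }
}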
Things become more complicated if we add a fourth table:
Tabl4
- col41
- col42
- col43
- fkTbl2
Now when the user chooses col11 and col42 you need to navigate the data dictionary to establish that Tabl2 acts as an intermediary table to join Tabl4 and Tabl1 (presuming you are not using composite primary keys, as most people don't). But suppose the user selects col31 and col41. Is that a legitimate combination? Let's say it is. Now you have to join Tabl4 to Tabl2 to Tabl1 to Tabl3. Hmmm...
And what if the user selects columns from two completely unrelated tables - Tabl1 and Tabl23? Do you blindly generate a CROSS JOIN or do you hurl an exception? The choice is yours.
Going back to that first query, it will return all the rows in both tables. Almost certainly your users will want the option to restrict the result set. So you need to offer them the ability to add filters to the WHERE clause (a sketch of doing this with bind parameters follows the list). Gotchas here include:
- ensuring that supplied values are of an appropriate data type (no strings for a number, no numbers for a date)
- providing look-ups to reference data values
- handling multiple values (an IN list rather than equals)
- ensuring date ranges are sensible (opening bound before closing bound)
- handling free text searches (are you going to allow it? do you need to use TEXT indexes, or will you run the risk of users executing LIKE '%whatever%' against some CLOB column?)
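Here is a hedged sketch of appending user-chosen filters as bind parameters rather than literals. The Filter class and its accessors are illustrative, and each filter's column is assumed to have already been validated against the data dictionary:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

PreparedStatement buildFiltered(Connection connection, List<Filter> filters) throws SQLException {
    // Filter is a hypothetical holder of a validated column name plus its values
    StringBuilder sql = new StringBuilder(
        "SELECT tabl1.col11, tabl2.col22 FROM tabl1 JOIN tabl2 ON tabl1.id = tabl2.fktabl1 WHERE 1 = 1");
    List<Object> binds = new ArrayList<>();

    for (Filter f : filters) {
        if (f.getValues().size() > 1) {
            // multiple values: IN list rather than equals
            String placeholders = String.join(", ",
                    Collections.nCopies(f.getValues().size(), "?"));
            sql.append(" AND ").append(f.getColumn()).append(" IN (").append(placeholders).append(')');
            binds.addAll(f.getValues());
        } else {
            sql.append(" AND ").append(f.getColumn()).append(" = ?");
            binds.add(f.getValues().get(0));
        }
    }

    PreparedStatement ps = connection.prepareStatement(sql.toString());
    for (int i = 0; i < binds.size(); i++) {
        ps.setObject(i + 1, binds.get(i));   // the JDBC driver handles the data type
    }
    return ps;
}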
The last point highlights one risk inherent in ad hoc reporting: if the users can assemble a query from any tables with any filters they can assemble a query which can drain all the resources from your system. So it is a good idea to apply profiles to prevent that happening. Also, as I have already mentioned, it is possible for the users to build nonsensical queries. Bear in mind that you don't need very many tables in your schema to generate too many permutations to test.
Finally there is the tricky proposition of security policies. If users are restricted to seeing subsets of data on the basis of their department or their job role, then you will need to replicate those rules. In such cases the automatic application of policies through Row Level Security is a real boon.
All of which might lead you to conclude that the best solution would be to persuade your users to acquire an off-the-shelf product instead, although that approach isn't without its own problems.
The way that I've done this kind of thing in the past is to simply construct the SQL query on the fly using a StringBuilder and then execute it using a non-prepared JDBC Statement. This is rather inefficient, since the Oracle DB has to repeat all of the query analysis and optimization work for each query.