I have a database structure built with JPA. To give a little bit of context: the user is shown a datatable along with some (many...) filters with which they can filter the entries shown in the datatable. For example, every entry of the datatable corresponds to a certain factory (of, e.g., 40 factories in total) and the user can filter by one or more factories.
To prevent overhead, the user can only filter by factories which actually occur in the dataset the datatable is built from, i.e. factories which would give an empty result after filtering (since there aren't any entries with this factory) aren't shown in the factory filter at all.
Furthermore, the datatable is paginated, i.e. only the first (second, third, etc.) 50 entries are given to the frontend, but -- and that's the crucial point -- the filters should of course correspond to all entries (in particular also the entries on the pages which are not shown). That means if factory B only occurs on page 2 and page 1 is loaded (i.e. there is no factory B among the shown entries, or rather the entries the frontend receives), the factory filter should still list factory B.
My approach is the following: when constructing the result, I use the same SQL query to get all values one can filter by. So my constructed table looks, so to speak, like this:
factory | status      | ...
F1      | done        | ...
F2      | in progress | ...
F1      | in progress | ...
containing the entries over all pages (of which just the first (second, third, ...) 50 entries are given to the frontend), projected onto the properties one can filter by.
Now I just get the needed values via e.g. SELECT DISTINCT(entry.factory) FROM ... in a typed query. Basically, at this point I would create an SQL view of the constructed table and then get 1) all distinct factories, 2) all distinct statuses, etc. But JPA doesn't allow a CREATE VIEW when creating a typed query, so at the moment, for each column I construct the same table (the one above) and get the distinct values of the current column.
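(For illustration, one such per-column query might look like this - the entity and attribute names are just examples, and in reality the same expensive WHERE conditions as in the main query have to be repeated each time:)

// Per-column DISTINCT query; MyEntity and "factory" are illustrative names.
// The main query's filter conditions would have to be appended here too,
// which is exactly the redundancy I would like to avoid.
TypedQuery<String> factoryQuery = entityManager.createQuery(
        "SELECT DISTINCT e.factory FROM MyEntity e", String.class);
List<String> distinctFactories = factoryQuery.getResultList();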
(Just for the sake of completeness: for performance reasons, it is not possible to get all entities and then iterate over the list of results, e.g. using the Stream API, mapping to the "filterable" attributes; i.e. filtering has to happen directly in the database using SQL (I guess?).)
I have two questions:
Is this implementation reasonable?
If yes, how can I approach the idea of creating a view? Ideally directly via JPA - but unfortunately I didn't find a real way to do this with JPA.
Edit: To break it down to the very basic problem:
QueryPart queryPart = new QueryPartDTO("<some JPA query>");
QueryFactory<MyEntity> queryFactory = new QueryFactory<>();
final TypedQuery<MyEntity> query = queryFactory.buildTypedQuery(queryPart, entityManager, MyEntity.class);
List<MyEntity> resultList = query.getResultList();
// at this point, the query fires ↑
// that's okay because I do need the List<MyEntity> in order to give it to the frontend
// but instead of just getting a List<MyEntity>,
// I would like to have an SQL view containing the result (or rather the whole result) as well,
// because moreover, I have to get e.g. all factories, all statuses, etc.
// which occur in the given data - which could of course be done with
// the already given List<MyEntity> using the Stream API, but since the list
// is in general very large, this approach would be too inefficient.
// Instead, I would like to do these computations using SQL - but for this,
// I need the data given by the constructed query above in an SQL table.
// Executing the query over and over again (first getting all distinct factories,
// second getting all distinct statuses, etc.) is not possible for
// performance reasons - the query is expensive. Hence, I need to
// save the result the query gives - in a way that lets me execute some more
// queries on it: a view, I guess.
Thanks in advance.
Indeed, JPA doesn't support view creation. It doesn't support any DDL statements at all.
Dynamic view creation is a fishy idea. Normally you design your database schema only once. Moreover, you'd need separate views for different users, which starts to sound like a nightmare.
What you can do is create a table with user filtering results, keyed with the user ID. Whenever the user applies his filters, you can DELETE the old result rows for this user and INSERT the new ones. Then you do the SELECT DISTINCT queries over this smaller dataset.
JPA won't help you much with this, since it doesn't support the INSERT ... SELECT SQL statement which you'd like to use here.
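A minimal sketch of that approach with native queries (the filter_result table and its columns are made up for illustration; the expensive filter conditions would go into the SELECT):

// Replace this user's old filter-result rows; table/columns are hypothetical.
entityManager.createNativeQuery(
        "DELETE FROM filter_result WHERE user_id = ?1")
        .setParameter(1, userId)
        .executeUpdate();

// JPQL has no INSERT, so fall back to a native INSERT ... SELECT.
entityManager.createNativeQuery(
        "INSERT INTO filter_result (user_id, factory, status) "
      + "SELECT ?1, e.factory, e.status FROM entry e")
        .setParameter(1, userId)
        .executeUpdate();

// The SELECT DISTINCT queries now run over this much smaller dataset.
List<?> factories = entityManager.createNativeQuery(
        "SELECT DISTINCT factory FROM filter_result WHERE user_id = ?1")
        .setParameter(1, userId)
        .getResultList();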
EDIT. I changed this answer to keep useful comments below. The previous version was completely different and not useful anyway.
I am working on a JavaEE application, and there are 1000+ tables in the database; now I have to query the records by the parameters from the client.
Generally I would create one Entity for each table, and create the DAO and Service to do the query.
However I face two problems:
1 Number of tables
As I said, 1000+ tables with 40+ columns each; it would be a nightmare to create the entities one by one.
2 Schema updates
Even if I can create the entities programmatically, the schema of the data may change sometimes, which is out of my control.
And in my application, only read operations are performed on these kinds of data; no update, delete, or create is required.
So I wonder if the following solution is possible:
1 Use Map instead of POJOs
Do not create POJOs at all; use a plain Map to wrap the column names and values.
2 Row mapping
When querying using Hibernate or Spring JdbcTemplate or something else, use a mapper to map each row to an entry in the map.
If yes, I would use the ResultSetMetaData to detect the column name, type, and value:
ResultSetMetaData rmd = rs.getMetaData();         // java.sql.ResultSetMetaData
Map<String, Object> row = new HashMap<>();        // one entry per column
for (int i = 1; i <= rmd.getColumnCount(); i++) { // JDBC columns are 1-based
    int t = rmd.getColumnType(i);                 // a java.sql.Types constant
    String name = rmd.getColumnName(i);
    if (t == Types.VARCHAR) {
        row.put(name, rs.getString(i));
    } else if (t == Types.INTEGER) {
        row.put(name, rs.getInt(i));
    }
    // ... further types as needed
}
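(For what it's worth, Spring's JdbcTemplate - one of the options mentioned above - appears to provide exactly this mapping out of the box via queryForList; a sketch, assuming a configured jdbcTemplate and a placeholder table name:)

// Each row comes back as a Map keyed by column name - no POJO needed.
List<Map<String, Object>> rows =
        jdbcTemplate.queryForList("SELECT * FROM some_table WHERE id = ?", id);
for (Map<String, Object> row : rows) {
    Object value = row.get("SOME_COLUMN");
}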
This looks like part of JPA's job - can any library be used here?
If not, any other alternatives?
I have a request that should extract data from three tables A, B, and C based on two conditions; these tables A, B, and C are located in the same data source.
Does BIRT 3.1 support joint data sets with more than two tables?
Otherwise, is there a way to overcome this limitation?
You don't say what your data source is, but assuming that it is a SQL database, you can do something like this in the SQL. You only need to do BIRT joins if the data is in different data sources.
select TableA.Field
, TableB.OtherField
, TableC.SomeOtherField
from dbo.TableA
left join dbo.TableB
on TableA.Same = TableB.Same
left join dbo.TableC
on TableA.Same = TableC.Same
where TableA.Important = 'Something'
In addition to James' answer:
In many cases just joining the tables using SQL is the best solution (you should know SQL if you are developing with BIRT, unless someone else prepared the Data Sets and corresponding report items for you).
As an alternative, keep in mind that BIRT does not have a "data model" like other report designers (e.g. Oracle Reports) and that you link data from different data sets by creating a corresponding layout structure, with data set parameter bindings.
You didn't mention the logical structure of your data.
If it's master-detail-detail (for example, artist-album-title), then you would use for example a list item bound to DS "artist", containing a list or table item bound to DS "album" which in turn contains a table bound to DS "title".
The DS "album" would need a DS parameter like "artist_id" or whatever (which you use in the WHERE clause of the SELECT statement), and in the list/table item bound to DS "album", you would use row["artist_id"] as the value for the DS parameter "artist_id".
This is similar for the table item bound to DS "title". However, if the primary key consists of (artist_id, album_id, title_no), you probably need access to the current artist from the outer-most list item. To access this, you use row._outer["artist_id"].
The solution for this problem is to use a stored procedure query: you write your procedure with whatever SQL you want, compile it with your DBMS, and call it from BIRT with the syntax
{call nameOfYourProcedure(?, ?, ? ...)}
Question marks refer to the parameters that you will pass to your stored procedure.
I'm not sure if something special exists for this use case - but it felt like a case where someone was likely to have made some sort of useful structure/technique/design-pattern.
My Situation
I have a set of SQL commands executed from the middle tier (Java) to insert/update/delete data in any of a set of very large tables, via joins with a related staging table.
I have more SQL commands which update various derived tables based on the staging table/actual table contents. Different tables will interact with different derived tables via different queries (as usual). These commands may have to be interleaved with the first set depending on the use case - so, I can't necessarily execute set 1 then set 2 all at once.
My Question
So, I need to build a chain of commands that get executed sequentially, and I need to trigger a rollback if any of them fail. I'd like to do this in the most clear, documented way possible.
Does anyone know a standard way of coding this? I'm sure anyone migrating from stored procedure code to middle tier code has done this before and I don't want to reinvent the wheel if there are good options out there.
Additional Information
One of my main concerns is making everything clear. To elaborate, I'll have a set of queries specifically designed to:
1. Truncate staging table A' and populate it with primary keys targeting deletion records
2. Delete from actual table A based on join with A'
3. Truncate staging table A' and populate it with full data for upserts
4. Update/Insert records from A' to A based on joins
The same logic will apply to tables B, C, D, etc. Unfortunately, it can be the case where just A and C need an extra step, like syncing deletes to a certain derived table, to be done after the deletions but before the upserts.
I'd obviously like to group all the logic for updating a table, and I'd like to group all the logic for updating a derived table as well, but at execution time they have to be intelligently interleaved and this sounds messy to me.
Don't write such a thing yourself. This is what JTA was born for.
You can use either JPA or Spring to do it.
Annotate the unit of work as transactional and let the database and JDBC handle it.
If you must do it yourself, follow the aspect-oriented approach and make it a decorator-style "before & after" implementation.
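For the do-it-yourself route, a minimal JDBC sketch of such a chain (the dataSource and orderedCommands names are placeholders for your own wiring and interleaved statement list):

// Manual transaction demarcation: all commands commit together or not at all.
try (Connection con = dataSource.getConnection()) {
    con.setAutoCommit(false);
    try (Statement st = con.createStatement()) {
        for (String sql : orderedCommands) {
            st.executeUpdate(sql);   // each step of the chain, in order
        }
        con.commit();                // everything succeeded
    } catch (SQLException e) {
        con.rollback();              // any failure undoes the whole chain
        throw e;
    }
}

One caveat: if any step is DDL (e.g. TRUNCATE on some databases), it may commit implicitly and break the rollback semantics.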
I have an application in which there are Courses, Topics, and Tags. Each Topic can be in many Courses and have many Tags. I want to look up every Topic that has a specific Tag x and is in specific Course y.
1. Naively, I give each Topic a list of Course ids and Tag ids, so I can select * from Topic where tagIds = x && courseIds = y. I think this query would require an exploding index: with 30 courses and 30 tags we're looking at ~900 index entries, right? At 50 x 20 I'm well over the 5000-entry limit.
2. I could just select * from Topic where tagIds = x, and then use a for loop to go through the result, choosing only Topics whose courseIds.contain(y). This returns way more results than I'm interested in and spends a lot of time deserializing those results, but the index stays small.
3. I could select __KEY__ from Topic where tagIds = x AND select __KEY__ from Topic where courseIds = y and find the intersection in my application code. If the sets are small this might not be unreasonable.
4. I could make a sort of join table, TopicTagLookup, with a tagId and courseId field. The parent key of these entities would point to the relevant Topic. Then I would need to make one of these TopicTagLookup entities for every combination of courseId x tagId x relevant topic id. This is effectively like creating my own index. It would still explode, but there would be no 5000-entry limit. Now, however, I need to write 5000 entities to the same entity group, which would run up against the entity-group write-rate limit!
5. I could precalculate each query. A TopicTagQueryCache entity would hold a tagId, courseId, and a List<TopicId>. Then the query looks like select * from TopicTagQueryCache where tagId=x && courseId = y, fetching the list of topic ids, and then using a getAllById call on the list. Similar to #4, but I only have one entity per courseId x tagId. There's no need for entity groups, but now I have this potentially huge list to maintain transactionally.
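(To make option 5 concrete, here is roughly what I have in mind using the low-level datastore API; the kind and property names follow the naming above but are otherwise illustrative:)

import com.google.appengine.api.datastore.*;
import com.google.appengine.api.datastore.Query.CompositeFilterOperator;
import com.google.appengine.api.datastore.Query.FilterOperator;
import com.google.appengine.api.datastore.Query.FilterPredicate;

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();

// Fetch the single precalculated cache entity for this (tagId, courseId) pair.
Query q = new Query("TopicTagQueryCache")
        .setFilter(CompositeFilterOperator.and(
                new FilterPredicate("tagId", FilterOperator.EQUAL, tagId),
                new FilterPredicate("courseId", FilterOperator.EQUAL, courseId)));
Entity cache = datastore.prepare(q).asSingleEntity();

// The stored list of topic keys, fetched in one batch (the getAllById call).
@SuppressWarnings("unchecked")
List<Key> topicKeys = (List<Key>) cache.getProperty("topicKeys");
Map<Key, Entity> topics = datastore.get(topicKeys);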
Appengine seems great for queries you can precalculate. I just don't quite see a way to precalculate this query efficiently. The question basically boils down to:
What's the best way to organize data so that we can do set operations like finding the Topics in the intersection of a Course and a Tag?
Your assessment of your options is correct. If you don't need any sort criteria, though, option 3 is more or less already done for you by the App Engine datastore, with the merge join strategy. Simply do a query as you detail in option 1, without any sorts or inequality filters, and App Engine will do a merge join internally in the datastore, and return only the relevant results.
Options 4 and 5 are similar to the relation index pattern documented in this talk.
I like #5 - you are essentially creating your own (exploding) index. It will be fast to query.
The only downsides are that you have to manually maintain it (next paragraph), and retrieving the Topic entity will require an extra query (first you query TopicTagQueryCache to get the topic ID and then you need to actually retrieve the topic).
Updating the TopicTagQueryCache you suggested shouldn't be a problem either. I wouldn't worry about doing it transactionally - this "index" will just be stale for a short period of time when you update a Topic (at worst, your Topic will temporarily show up in results it should no longer show up in, and it may take a moment before it shows up in new results which it should show up in - this doesn't seem so bad). You can even do this update on the task queue (to make sure this potentially large number of database writes all succeed, and so that you can quickly finish the request so your user isn't waiting).
As you said yourself, you should arrange your data to facilitate the scaling of your app. So, to the question "What's the best way to organize data so that we can do set operations like finding the Topics in the intersection of a Course and a Tag?":
You can hold your own indexes of these sets by creating CourseRef and TopicRef entities which consist of a Key only, with the ID portion being the actual Key of the corresponding entity. These "Ref" entities live under a specific Tag, so there are no actual Key duplicates. So the structure for a given Tag is: Tag\CourseRef...\TopicRef...
This way, given a Tag and a Course, you construct the Key Tag\CourseRef and do an ancestor query, which gets you a set of keys you can fetch. This is extremely fast as it is effectively a direct access, and it should handle large lists of courses or topics without the issues of list properties.
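(Roughly, with the low-level datastore API; the kind names follow the structure above, everything else is illustrative:)

// Build the key path Tag -> CourseRef, then fetch all TopicRef children
// with a keys-only ancestor query (effectively a direct access).
Key tagKey = KeyFactory.createKey("Tag", tagId);
Key courseRefKey = KeyFactory.createKey(tagKey, "CourseRef", courseId);
Query q = new Query("TopicRef").setAncestor(courseRefKey).setKeysOnly();
for (Entity topicRef : datastore.prepare(q).asIterable()) {
    // each TopicRef's own ID encodes the Key of the actual Topic to fetch
}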
This method will require you to use the DataStore API to some extent.
As you can see this gives answer to a specific question, and the model will do no good for other type of Set operations.
How do I build an Oracle SQL query dynamically from a Java application? The user will be presented with a bunch of columns that are present in different tables in the database. The user can select any set of columns and the application should build the complete select query using only the tables that contain the selected columns.
For example, let's consider that there are 3 tables in the database. The user selects col11 and col22. In this case, the application should build the query using Tabl1 and Tabl2 only.
How do I achieve this?
Tabl1
- col11
- col12
- col13
Tabl2
- fkTbl1
- col21
- col22
- col23
Tabl3
- col31
- col32
- col33
- fkTbl1
Ad hoc reporting is an old favourite. It frequently appears as a one-liner at the end of the Reports Requirements section: "Users must be able to define and run their own reports". The only snag is that ad hoc reporting is an application in its own right.
You say
"The user will be presented with a bunch of columns that are present in different tables in the database."
You can avoid some of the complexities I discuss below if the "bunch of columns" (and the spread of tables) is preselected and tightly controlled. Alas, it is in the nature of ad hoc reporting that users will want pretty much all columns from all tables.
Let's start with your example. The user has selected col11 and col22, so you need to generate this query:
SELECT tabl1.col11
, tabl2.col22
FROM tabl1 JOIN tabl2
ON (TABL1.ID = TABL2.FKTABL1)
/
That's not too difficult. You just need to navigate the data dictionary views USER_CONSTRAINTS and USER_CONS_COLUMNS to establish the columns in the join condition - provided you have defined foreign keys (please have foreign keys!).
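For instance, a rough sketch of that dictionary lookup over JDBC - finding which column of Tabl2 references which column of Tabl1 (the table-name literals are placeholders):

// Navigate USER_CONSTRAINTS/USER_CONS_COLUMNS to find the FK join columns.
String sql =
      "SELECT fk_col.column_name AS fk_column, pk_col.column_name AS pk_column "
    + "FROM user_constraints fk "
    + "JOIN user_cons_columns fk_col ON fk_col.constraint_name = fk.constraint_name "
    + "JOIN user_cons_columns pk_col ON pk_col.constraint_name = fk.r_constraint_name "
    + "                            AND pk_col.position = fk_col.position "
    + "WHERE fk.constraint_type = 'R' "   // 'R' = referential, i.e. a foreign key
    + "AND fk.table_name = ? AND pk_col.table_name = ?";
try (PreparedStatement ps = con.prepareStatement(sql)) {
    ps.setString(1, "TABL2");  // the table owning the foreign key
    ps.setString(2, "TABL1");  // the referenced table
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // e.g. FKTABL1 -> ID: the columns for the generated ON clause
            System.out.println(rs.getString("fk_column") + " -> " + rs.getString("pk_column"));
        }
    }
}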
Things become more complicated if we add a fourth table:
Tabl4
- col41
- col42
- col43
- fkTbl2
Now when the user chooses col11 and col42 you need to navigate the data dictionary to establish that Tabl2 acts as an intermediary table to join Tabl4 and Tabl1 (presuming you are not using composite primary keys, as most people don't). But suppose the user selects col31 and col41. Is that a legitimate combination? Let's say it is. Now you have to join Tabl4 to Tabl2 to Tabl1 to Tabl3. Hmmm...
And what if the user selects columns from two completely unrelated tables - Tabl1 and Tabl23? Do you blindly generate a CROSS JOIN or do you hurl an exception? The choice is yours.
Going back to that first query: it will return all the rows in both tables. Almost certainly your users will want the option to restrict the result set, so you need to offer them the ability to add filters to the WHERE clause. Gotchas here include:
- ensuring that supplied values are of an appropriate data type (no strings for a number, no numbers for a date)
- providing look-ups to reference data values
- handling multiple values (IN list rather than equals)
- ensuring date ranges are sensible (opening bound before closing bound)
- handling free text searches (are you going to allow it? do you need to use TEXT indexes, or will you run the risk of users executing LIKE '%whatever%' against some CLOB column?)
The last point highlights one risk inherent in ad hoc reporting: if the users can assemble a query from any tables with any filters they can assemble a query which can drain all the resources from your system. So it is a good idea to apply profiles to prevent that happening. Also, as I have already mentioned, it is possible for the users to build nonsensical queries. Bear in mind that you don't need very many tables in your schema to generate too many permutations to test.
Finally there is the tricky proposition of security policies. If users are restricted to seeing subsets of data on the basis of their department or their job role, then you will need to replicate those rules. In such cases the automatic application of policies through Row Level Security is a real boon.
All of which might lead you to conclude that the best solution would be to persuade your users to acquire an off-the-shelf product instead. Although that approach isn't without its own problems.
The way that I've done this kind of thing in the past is to simply construct the SQL query on the fly using a StringBuilder and then execute it using a non-prepared JDBC Statement. This is rather inefficient, since the Oracle DB has to repeat all of the query analysis and optimization work for each query.
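A bare-bones sketch of that technique (the table, join, and column names are placeholders, and user-supplied values must still be validated or bound to avoid SQL injection):

// Assemble the statement text dynamically from the user's column choices,
// e.g. selectedColumns = ["tabl1.col11", "tabl2.col22"].
StringBuilder sql = new StringBuilder("SELECT ");
sql.append(String.join(", ", selectedColumns));
sql.append(" FROM tabl1 JOIN tabl2 ON tabl1.id = tabl2.fktabl1");

try (Statement st = connection.createStatement();
     ResultSet rs = st.executeQuery(sql.toString())) {
    while (rs.next()) {
        // read each row of the ad hoc result set
    }
}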