Is it possible to perform the following transaction with DynamoDB? - java

I am working with DynamoDB using Java. I have 2 tables with me, say Table1 and Table2.
Table1 has items with attributes like this: <attr1, attr2, attr3>
Table2 has items with attributes like this: <attr3, attr4, attr5>
[In the above 2 lines, attr1 and attr3 are the partition keys, and attr2 and attr4 are the range keys.]
My API gets attr1 as input, using that I will have to access an item from Table1, then using the attr3 from the returned item, I will have to access an item from Table2 (I have the attr4 value available with me).
I want these 2 steps to be completed as a transaction.
Is it possible to perform this type of transaction in DynamoDB?
TransactionLoad seems promising, but I couldn't find a way to specify that I want a particular item from Table1 using the attr1 value, and then, using the attr3 value obtained, get the item from Table2.

To expand on the accepted answer, it looks like you're trying to implement joins using a NoSQL database.
One of the defining differences between SQL and NoSQL solutions is the normalization of your data. In SQL, we often normalize the data and use SQL to combine the data into the structure our application needs. This can become problematic at scale, as doing complex joins (or other unbounded SQL queries) can make it difficult to predict performance at scale.
NoSQL databases, on the other hand, typically denormalize data to match the specific needs of our application. This approach eliminates the need to join data across tables and improves performance at scale.
The approach you are describing in your question is a blend of these two approaches. You are using NoSQL to store your data in a normalized fashion and are simulating a join in your application code (because DDB does not give you joins). While this approach may make sense in certain scenarios, it suggests that you may not be modeling your data optimally for a NoSQL database.
The "NoSQL" way of achieving what you're after is to store the data in Table1 and Table2 in the same partition (in the same table).

First, I would like to point out that DynamoDB (or any NoSQL database) is not well suited for that kind of operation spread across multiple tables: NoSQL Design for DynamoDB
As for your question, the transactional read capability of DynamoDB is TransactGetItems, where you can read from multiple tables, but it is not possible to make a transaction that reads one table and then, using the results, reads another; by definition those are two transactions.
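To make the distinction concrete, here is a rough sketch (AWS SDK for Java v2) of the dependent lookup done as two separate, non-transactional reads, which is effectively what you would have to do. It assumes string-typed attributes and that attr1 alone identifies the Table1 item; the literal values are placeholders:

import java.util.Map;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

public class DependentReads {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // First read: look up the Table1 item by attr1.
        Map<String, AttributeValue> item1 = ddb.getItem(GetItemRequest.builder()
                .tableName("Table1")
                .key(Map.of("attr1", AttributeValue.builder().s("input-value").build()))
                .build()).item();

        // Second read: use attr3 from the first result (attr4 is already known).
        String attr3 = item1.get("attr3").s();
        Map<String, AttributeValue> item2 = ddb.getItem(GetItemRequest.builder()
                .tableName("Table2")
                .key(Map.of(
                        "attr3", AttributeValue.builder().s(attr3).build(),
                        "attr4", AttributeValue.builder().s("known-value").build()))
                .build()).item();

        System.out.println(item2);
    }
}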

Related

Adding custom fields in my application

I have a SAAS product, which is built with Spring MVC and Hibernate. Generally SAAS products allow users to customize the product, for example by adding extra fields to a table. So I want to give users the flexibility to create custom fields in the tables for themselves. Please provide all the viable solutions to achieve it. Thank you so much for your help.
I'm guessing you're trying to back this with a relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free-form data well.
So one solution is to use a flexible document structure, like XML (and perhaps ditch the database), but databases have features which are nice, so let's also consider the database-backed approaches.
You could create a "custom field" table which would have columns (composite primary key) for
ExtendedTable
ColumnName
but you'd also have to store the data somewhere
(ExtendedKey)
DataItem
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type be of a DataItem? A general solution would be quite complex (being a type of free form database). Hopefully you could limit the solution to solve only the problems you require solved.
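As a rough illustration only, that "custom field" (entity-attribute-value) table could be mapped in Hibernate/JPA along these lines. The entity and column names are made up, and every value is stored as plain text, which is exactly where the constraint problem shows up:

import java.io.Serializable;
import javax.persistence.Embeddable;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;

// Composite key: which table/row the custom value extends, and which custom column it is.
@Embeddable
class CustomFieldId implements Serializable {
    String extendedTable;   // name of the table being extended
    String extendedKey;     // primary key of the row being extended
    String columnName;      // user-defined column name
}

// One row per custom value; every value is stored as text, so type checking
// and constraints have to be enforced in application code.
@Entity
class CustomFieldValue {
    @EmbeddedId
    CustomFieldId id;

    String dataItem;
}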
Another approach is to use a single "extra" column that contains an XML record which embeds its own "column and value" extensions, but if you wanted to display a table of these efficiently, you'd have to parse every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.
Your requirement is best suited use case for NoSQL databases (like MongoDB).
Dynamically creating relational database tables and columns (modifying schemas) upon user requests in an application is not a best practice, as this involves DDL operations, which are very powerful; if you don't handle them carefully, the whole application's database can end up in an inconsistent state.

Hbase sort on column qualifiers

I have an HBase table with a couple of million records. Each record has a couple of properties describing it, each stored in a column qualifier (mostly int or string values).
I have a requirement that I should be able to see the records paginated and sorted based on a column qualifier (or even more than one, in the future). What would be the best approach to do this? I have looked into secondary indexes using coprocessors (mostly hindex from Huawei), but it doesn't seem to match my use case exactly. I've also thought about replicating all the data into multiple tables, one for each sort property, which would be included in the rowkey, and then redirecting queries to those tables. But this seems very tedious, as I already have quite a few of these properties.
Thanks for any suggestions.
It sounds like you need your NoSQL database to work just like an RDBMS, and given the size of your data your life would be a lot simpler if you stuck to one, unless you expect exponential growth :) Also, you don't mention whether your data gets updated; this is very important for making a good decision.
Having said that, you have a lot of options, here are some:
If you can wait for the results: write a MapReduce task to do the scan, sort it, and retrieve the top X rows. Do you really need more than 1000 pages (20-50k rows) for each sort type? Another option would be using something like Hive.
If you can aggregate the data and "reduce" the dataset: write a MapReduce task to periodically export the newest aggregated data to a SQL table (which will handle the queries). I've done this a few times and it works like a charm, but it depends on your requirements.
If you have plenty of storage: write a MapReduce task to periodically regenerate (or append to) a new table for each property (sorting by it in the row key). You don't need multiple tables; just use a prefix in your rowkeys for each case, or, if you do not want tables and you won't have a lot of queries, simply write the sorted data to CSV files and store them in HDFS, where they can easily be read by your frontend app.
Manually maintain a secondary index: this would not be very tolerant of schema updates and new properties, but it would work great for near-real-time results (see the sketch after this list of options). To do it, you have to update your code to also write to the secondary table, with a good buffer to help with performance while avoiding hot regions. Think about this type of rowkey: [4B SORT FIELD ID (4 chars)] [8B SORT FIELD VALUE] [8B timestamp], with just one column storing the rowkey of the main table. To retrieve the data sorted by any of the fields, just perform a SCAN using the SORT FIELD ID as the start row, plus the starting sort field value as the pivot for pagination (ignore it to get the first page, then set it to the last value retrieved). That way you'll have the rowkeys of the main table, and you can just perform a multiget against it to retrieve the full data. Keep in mind that you'll need a small script to scan the main table and write the data to the index table for the existing rows.
Rely on any of the automatic secondary indexing through coprocessors like you mentioned, although I do not like this option at all.
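For the manual secondary index option above, here is a rough sketch using the HBase 2.x Java client. The table name, column family, and sort-field id are all made up, error handling is omitted, and in practice you would also bound the scan with a stop row for the field id:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SecondaryIndexSketch {

    // Index rowkey: [4-char sort field id][8-byte sort field value][8-byte timestamp]
    static byte[] indexRowKey(String fieldId, long fieldValue, long timestamp) {
        return Bytes.add(Bytes.toBytes(fieldId), Bytes.toBytes(fieldValue), Bytes.toBytes(timestamp));
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table index = conn.getTable(TableName.valueOf("main_index"))) {

            // Whenever the main table is written, also write an index row pointing back to it.
            Put put = new Put(indexRowKey("PRC1", 4200L, System.currentTimeMillis()));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("mainRow"), Bytes.toBytes("main-row-key"));
            index.put(put);

            // To read a page sorted by that field, scan from the field id
            // (or from the last value seen on the previous page).
            Scan scan = new Scan().withStartRow(Bytes.toBytes("PRC1")).setLimit(20);
            try (ResultScanner scanner = index.getScanner(scan)) {
                for (Result r : scanner) {
                    byte[] mainRowKey = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("mainRow"));
                    System.out.println(Bytes.toString(mainRowKey));
                }
            }
        }
    }
}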
You have mostly enumerated the options. HBase natively does not support secondary indexes, as you are aware. In addition to hindex you may consider Phoenix
https://github.com/forcedotcom/phoenix
(from Salesforce), which in addition to secondary indexes has a JDBC driver and SQL support.

Data structure/Java Technique for managing a list of sequential commands

I'm not sure if something special exists for this use case - but it felt like a case where someone was likely to have made some sort of useful structure/technique/design-pattern.
My Situation
I have a set of SQL commands executed from middle tier (Java) to insert/update/delete data to any of a set of very large tables via joins from a related staging table.
I have more SQL commands which update various derived tables based on the staging table/actual table contents. Different tables will interact with different derived tables via different queries (as usual). These commands may have to be interleaved with the first set depending on the use case - so, I can't necessarily execute set 1 then set 2 all at once.
My Question
So, I need to build a chain of commands that get executed sequentially, and I need to trigger a rollback if any of them fail. I'd like to do this in the most clear, documented way possible.
Does anyone know a standard way of coding this? I'm sure anyone migrating from stored procedure code to middle tier code has done this before and I don't want to reinvent the wheel if there are good options out there.
Additional Information
One of my main concerns is making everything clear. To elaborate, I'll have a set of queries specifically designed to:
- Truncate staging table A' and populate it with primary keys targeting deletion records
- Delete from actual table A based on join with A'
- Truncate staging table A' and populate it with full data for upserts
- Update/Insert records from A' to A based on joins
The same logic will apply to tables B, C, D, etc. Unfortunately, it can be the case where just A and C need an extra step, like syncing deletes to a certain derived table, to be done after the deletions but before the upserts.
I'd obviously like to group all the logic for updating a table, and I'd like to group all the logic for updating a derived table as well, but at execution time they have to be intelligently interleaved and this sounds messy to me.
Don't write such a thing yourself. This is what JTA was born for.
You can use either JPA or Spring to do it.
Annotate the unit of work as transactional and let the database and JDBC handle it.
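For example, with Spring and JdbcTemplate the whole chain can be expressed roughly like this (table names are hypothetical, and DELETE is used instead of TRUNCATE because on many databases TRUNCATE would commit implicitly and break the transaction):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class StagingSyncService {

    private final JdbcTemplate jdbc;

    public StagingSyncService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // All statements run in one transaction; any runtime exception rolls back the whole unit.
    @Transactional
    public void syncTableA() {
        jdbc.update("DELETE FROM staging_a");
        jdbc.update("INSERT INTO staging_a (id) SELECT id FROM delete_feed");
        jdbc.update("DELETE FROM table_a WHERE id IN (SELECT id FROM staging_a)");
        // ... interleave the derived-table statements here as the use case requires
    }
}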
If you must do it yourself, follow the aspect-oriented approach and make it a decorator-style "before & after" implementation.

Can we write describe table query in JPQL?

I am taking a 'Keyword' and a table name from the user.
Now, I want to find all the columns of the table whose data type is varchar (String).
Then I will create a query which compares the keyword with those columns, and the matching rows will be returned as a result set.
I tried a desc table_name query, but it didn't work.
Can we write a describe table query in JPQL?
If not, is there any other way to solve the above situation?
Please help, and thank you in advance.
No workaround is necessary, because this is not a drawback of the technology. It is not JPQL that needs to change, it is your choice of technology. In JPQL you cannot even select data from a table. You select from classes, and these can be mapped to multiple tables at once, resulting in SQL joins even for the simplest queries. Describing such a join would be meaningless. And even if you could describe a table, you do not use the names of columns in JPQL, but the properties of objects. Describing tables in JPQL makes no sense.
JPQL is meant for querying objects, not tables. Also, it is meant for static work (where classes are mapped to relations once and for good), not for dynamic things like mapping tables to objects on the fly or live inspection of the database (that is what Ruby on Rails' ActiveRecord is for). Dynamic discovery of properties is not a part of that.
Depending on what you really want to achieve (we only know what you are trying to do, that's different) you have two basic choices:
if you are trying to write a piece of software in a dynamic way, so that it adjusts itself to changes in the schema - drop JPQL (and any other ORM). Java classes are meant to be static; you can't really map them to dynamic tables (or grow new attributes). Use rowsets; they work fine and they will let you use SQL;
if you are building a clever library that can be shared by many projects and so has to work with many different static mappings, use the reflection API to find the properties of the objects that you query for. Names of columns in the table will not help you anyway, since in JPQL queries you have to use the names defined in the mappings.
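A minimal sketch of that reflection approach, assuming simple field-access mappings without @Column renames or inheritance (in which case you would also have to read the JPA annotations):

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class StringPropertyFinder {

    // Collect the names of all String-typed fields of a mapped entity class;
    // these are the properties that can appear in a LIKE comparison in JPQL.
    static List<String> stringProperties(Class<?> entityClass) {
        List<String> names = new ArrayList<>();
        for (Field field : entityClass.getDeclaredFields()) {
            if (field.getType() == String.class) {
                names.add(field.getName());
            }
        }
        return names;
    }
}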
Map the database dictionary tables and read the required data from them. For an Oracle database you would need to select from these three views: user_tab_comments, user_tab_cols, and user_col_comments, to achieve the full functionality of the describe statement.
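As a rough sketch, you could also just read user_tab_cols through a native query from your existing EntityManager to get the varchar columns the question asks about. This is Oracle-specific and assumes the table belongs to the connected schema:

import java.util.List;
import javax.persistence.EntityManager;

public class VarcharColumnLookup {

    // Returns the names of all VARCHAR2 columns of the given table,
    // read from Oracle's user_tab_cols dictionary view via a native query.
    @SuppressWarnings("unchecked")
    static List<String> varcharColumns(EntityManager em, String tableName) {
        return em.createNativeQuery(
                "SELECT column_name FROM user_tab_cols "
                + "WHERE table_name = ?1 AND data_type = 'VARCHAR2'")
                .setParameter(1, tableName.toUpperCase())
                .getResultList();
    }
}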
There has been some talk in the community about dynamic definition of the persistence unit in future releases of JPA: http://www.oracle.com/goto/newsletters/javadev/0111/blogs_sun_devoxx.html?msgid=3-3156674507
As far as I know, you cannot write a describe query in JPQL.

build oracle sql query dynamically from java application

How do I build an Oracle SQL query dynamically from a Java application? The user will be presented with a bunch of columns that are present in different tables in the database. The user can select any set of columns, and the application should build the complete select query using only the tables that contain the selected columns.
For example, let's consider that there are 3 tables in the database. The user selects col11 and col22. In this case, the application should build the query using Tabl1 and Tabl2 only.
How do I achieve this?
Tabl1
- col11
- col12
- col13
Tabl2
- fkTbl1
- col21
- col22
- col23
Tabl3
- col31
- col32
- col33
- fkTbl1
Ad hoc reporting is an old favourite. It frequently appears as a one-liner at the end of the Reports Requirements section: "Users must be able to define and run their own reports". The only snag is that ad hoc reporting is an application in its own right.
You say "The user will be presented with a bunch of columns that are present in different tables in the database."
You can avoid some of the complexities I discuss below if the "bunch of columns" (and the spread of tables) is preselected and tightly controlled. Alas, it is in the nature of ad hoc reporting that users will want pretty much all columns from all tables.
Let's start with your example. The user has selected col11 and col22, so you need to generate this query:
SELECT tabl1.col11
, tabl2.col22
FROM tabl1 JOIN tabl2
ON (TABL1.ID = TABL2.FKTABL1)
/
That's not too difficult. You just need to navigate the data dictionary views USER_CONSTRAINTS and USER_CONS_COLUMNS to establish the columns in the join condition - provided you have defined foreign keys (please have foreign keys!).
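As a rough sketch of that dictionary navigation (plain JDBC against the Oracle dictionary views; with the example schema above, printJoinColumns(conn, "TABL2", "TABL1") should print TABL2.FKTABL1 = TABL1.ID):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ForeignKeyLookup {

    // Find the join columns of every foreign key pointing from one table to another,
    // by walking USER_CONSTRAINTS / USER_CONS_COLUMNS.
    static void printJoinColumns(Connection conn, String childTable, String parentTable) throws Exception {
        String sql =
            "SELECT a.column_name AS child_column, b.column_name AS parent_column "
          + "FROM user_constraints c "
          + "JOIN user_cons_columns a ON a.constraint_name = c.constraint_name "
          + "JOIN user_constraints p  ON p.constraint_name = c.r_constraint_name "
          + "JOIN user_cons_columns b ON b.constraint_name = p.constraint_name "
          + "                        AND b.position = a.position "
          + "WHERE c.constraint_type = 'R' "
          + "  AND c.table_name = ? AND p.table_name = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, childTable.toUpperCase());
            ps.setString(2, parentTable.toUpperCase());
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(childTable + "." + rs.getString("child_column")
                            + " = " + parentTable + "." + rs.getString("parent_column"));
                }
            }
        }
    }
}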
Things become more complicated if we add a fourth table:
Tabl4
- col41
- col42
- col43
- fkTbl2
Now when the user chooses col11 and col42, you need to navigate the data dictionary to establish that Tabl2 acts as an intermediary table to join Tabl4 and Tabl1 (presuming you are not using composite primary keys, as most people don't). But suppose the user selects col31 and col41. Is that a legitimate combination? Let's say it is. Now you have to join Tabl4 to Tabl2 to Tabl1 to Tabl3. Hmmm...
And what if the user selects columns from two completely unrelated tables - Tabl1 and Tabl23? Do you blindly generate a CROSS JOIN or do you hurl an exception? The choice is yours.
Going back to that first query, it will return all the rows in both tables. Almost certainly your users will want the option to restrict the result set, so you need to offer them the ability to add filters to the WHERE clause. Gotchas here include:
- ensuring that supplied values are of an appropriate data type (no strings for a number, no numbers for a date)
- providing look-ups to reference data values
- handling multiple values (IN list rather than equals)
- ensuring date ranges are sensible (opening bound before closing bound)
- handling free text searches (are you going to allow it? do you need to use TEXT indexes, or will you run the risk of users executing LIKE '%whatever%' against some CLOB column?)
The last point highlights one risk inherent in ad hoc reporting: if the users can assemble a query from any tables with any filters they can assemble a query which can drain all the resources from your system. So it is a good idea to apply profiles to prevent that happening. Also, as I have already mentioned, it is possible for the users to build nonsensical queries. Bear in mind that you don't need very many tables in your schema to generate too many permutations to test.
Finally, there is the tricky proposition of security policies. If users are restricted to seeing subsets of data on the basis of their department or their job role, then you will need to replicate those rules. In such cases the automatic application of policies through Row Level Security is a real boon.
All of which might lead you to conclude that the best solution would be to persuade your users to acquire an off-the-shelf product instead. Although that approach isn't without its own problems.
The way that I've done this kind of thing in the past is to simply construct the SQL query on the fly using a StringBuilder and then execute it using a JDBC non-prepared statement. This is rather inefficient, since the Oracle DB has to repeat all of the query analysis and optimization work for each query.
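Something along these lines, with the join from the earlier example hardcoded for brevity; in a real implementation the FROM/JOIN clause would itself be derived from the dictionary metadata, and the selectable columns must come from a whitelist rather than raw user input:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

public class AdHocQueryBuilder {

    // Build "SELECT <cols> FROM tabl1 JOIN tabl2 ON (tabl1.id = tabl2.fktbl1)" on the fly.
    // Column and table names must come from a whitelist (never from raw user input),
    // or the application is wide open to SQL injection.
    static String buildQuery(List<String> columns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", columns));
        sql.append(" FROM tabl1 JOIN tabl2 ON (tabl1.id = tabl2.fktbl1)");
        return sql.toString();
    }

    static void run(Connection conn, List<String> columns) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(buildQuery(columns))) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}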
