I am trying to sync multiple databases whose items use GUIDs as IDs, meaning that a given item has the same ID in every database.
My question is:
If I modify or create an item in one database and want to synchronize this change to the other database, should I:
1.) Check whether the item is new or just modified; if it is new, use the save() function, and if it is modified, use the update() function,
or
2.) Not check whether it is new or modified and just use the saveOrUpdate() function?
After seeing your use case in the comments, I think the best approach is to track (on both the client and server) when the last updated/last synced time was. In the event that the last sync time is null, or comes before the last updated time, you know that the data needs to be synced.
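For illustration, a minimal sketch of that comparison, assuming two timestamps you track per record (the names lastSyncedAt and lastUpdatedAt are mine, not part of any framework):

    import java.time.Instant;

    public final class SyncCheck {

        /**
         * True when the record still needs to be pushed to the other database:
         * it has never been synced, or it changed after the last sync.
         */
        public static boolean needsSync(Instant lastSyncedAt, Instant lastUpdatedAt) {
            if (lastUpdatedAt == null) {
                return false;          // nothing has been written locally yet
            }
            return lastSyncedAt == null || lastSyncedAt.isBefore(lastUpdatedAt);
        }
    }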
Now, on to the heart of your question: how to sync it. The client need not know the state of the server when it sends an object to you. In fact, it shouldn't. Consider the case where the client posts an object, your server receives and processes it, but the connection dies before your client receives the response. This is an entirely valid scenario and will result in a mismatch of data. As a result, any way that you try to determine whether or not the server has received an object (from the client) is likely to end up in a bad state.
The best solution is really to create an idempotent endpoint on the server (an upsert method, or saveOrUpdate as you referred to it in your question) which is able to determine what to do with the object. The server can query its database by primary key to determine whether it has the object or not. If it does, it can update; if not, it can insert.
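A minimal JDBC sketch of such an endpoint, assuming an item table keyed by the GUID (table and column names are placeholders); if you are on Hibernate, Session.saveOrUpdate() makes the same decision for you:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public final class ItemUpsert {

        /** Idempotent: insert the row if the GUID is unknown, otherwise update it. */
        public static void saveOrUpdate(Connection conn, String guid, String name) throws SQLException {
            boolean exists;
            try (PreparedStatement check = conn.prepareStatement("SELECT 1 FROM item WHERE id = ?")) {
                check.setString(1, guid);
                try (ResultSet rs = check.executeQuery()) {
                    exists = rs.next();
                }
            }

            String sql = exists
                    ? "UPDATE item SET name = ? WHERE id = ?"
                    : "INSERT INTO item (name, id) VALUES (?, ?)";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, name);
                stmt.setString(2, guid);
                stmt.executeUpdate();
            }
        }
    }

Sending the same object twice leaves the database in the same state, which is exactly what makes the endpoint safe to retry after a dropped connection.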
Understandably, performance matters as well as correctness. But stick with primary keys in the database, and the one additional select query you add should have extremely minimal cost (sub-10ms). If you really want to squeeze out some more performance, you could always use memcached or Redis as a caching layer to determine whether a certain GUID is in your database. That way, you only have to hit memory (not your database) to determine whether an object exists. The overhead of that would be measured only in the latency between your web server and cache server (since a memory read is incredibly cheap).
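If you go the caching route, a hedged sketch with the Jedis client (the set name item:guids and the way it is kept in step with the database are assumptions):

    import redis.clients.jedis.Jedis;

    public final class GuidCache {

        // Hypothetical Redis set holding every GUID known to the database.
        private static final String GUID_SET = "item:guids";

        /** Answers "have we seen this GUID?" from memory instead of the database. */
        public static boolean exists(Jedis jedis, String guid) {
            return jedis.sismember(GUID_SET, guid);
        }

        /** Call after a successful insert so the cache stays consistent with the DB. */
        public static void remember(Jedis jedis, String guid) {
            jedis.sadd(GUID_SET, guid);
        }
    }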
tl;dr
Upsert (or saveOrUpdate) is the way to go. Try not to track the state of one machine on another.
Related
I am developing a java application that loads certain things from a database, such as client records and product info. When the user navigates to say the 'products' tab, I query for products in the database and update a table with that information.
I am wondering if there is a way to see if the query results have changed since the last check, in order to avoid querying and loading all info from the database, and instead just load updates. Is there a way to do this, or perhaps just load changes only from a query into my table list? My goal is to make the program run faster when switching between tabs.
I am wondering if there is a way to see if the query results have changed since the last check
Stated differently, you want a way to automatically answer the question “is this the same result?” without retrieving the entire result.
The general approach to this problem would be to come up with some fast-to-query proxy for the entire state of the result set, and query that instead.
Once you have determined a stable fast computation for the entire result set, you can compute that any time the relevant data changes; and only poll that stored proxy to see whether the data has changed.
For example, you could say that “the SHA-256 hash of fields lorem, ipsum, and dolor” is your proxy. You can now:
Implement that computation inside the database as a function, maybe products_hash.
Create a latest_products_hash table that stores a created timestamp and the products_hash that was computed at that time.
In your application, retrieve the most recent record from latest_products_hash and keep it for reference.
In the database, have a scheduled job, or a trigger on some event you decide makes sense, that will compute and store the products_hash in latest_products_hash automatically without any action from the application.
To determine whether there have been updates yet, the application will query the latest_products_hash table again and compare its most recent record with the one the application stored for reference.
Only if that most-recent latest_products_hash value is different do you query the products table and get the full result set.
That way, the application is polling a much faster query (the most-recent record in latest_products_hash) frequently, and avoiding the full products query until it knows the result set will be new.
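As a rough illustration, the application-side poll could look like this with plain JDBC (the table, columns, and MySQL-style LIMIT syntax follow the example above and are assumptions):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public final class ProductsChangeDetector {

        private String lastSeenHash;   // hash retrieved on the previous poll

        /** Cheap poll: fetch only the most recent stored hash and compare it. */
        public boolean hasChanged(Connection conn) throws SQLException {
            String sql = "SELECT products_hash FROM latest_products_hash "
                       + "ORDER BY created DESC LIMIT 1";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                if (!rs.next()) {
                    return true;               // no hash stored yet: treat as changed
                }
                String currentHash = rs.getString(1);
                boolean changed = !currentHash.equals(lastSeenHash);
                lastSeenHash = currentHash;
                return changed;
            }
        }
    }

Only when hasChanged() returns true does the application run the full products query and rebuild the table model.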
I'm currently developing an application in Java that connects to a MySQL database using JDBC and displays records in a JTable. The application is going to be run by more than one user at a time, and I'm trying to implement a way to see if the table has been modified, e.g. if user one modifies a column such as stock level, and then user two accesses the same record and tries to change it based on the old level before seeing user one's change.
At the moment I'm storing the checksum of the table that's being displayed as a variable, and when a user tries to modify a record it checks whether the stored checksum is the same as one generated just before the edit.
I'm not sure if this is a correct way to do it or not, as I'm new to this and have no experience in this matter.
Calculating the checksum of an entire table seems like a very heavy-handed solution and definitely something that wouldn't scale in the long term. There are multiple ways of handling this, but the core theme is to do as little work as possible so that you can scale as the number of users increases. Imagine implementing the checksum-based solution on a table with a million rows continuously updated by hundreds of users!
One of the solutions (which requires minimal rework) would be to "check" the stock name against which the value is updated. In the background, you'd fire off a query to the table to see if the data for "that particular stock" has been updated after the table was populated. If yes, you can warn the user or mark the updated cell as dirty to indicate that its value has changed. The problem here is that the query won't be fired off till the user tries to save the updated value. You could poll the database to avoid that, but again, that's hardly an efficient solution.
As a more robust solution, I would recommend using a database which implements native "push notifications" to all the connected clients. Redis is a NoSQL database which comes to mind for this.
Another tried and tested technique would be to forgo direct database connections and use a middleware layer such as a message queue (e.g. RabbitMQ). Message queues enable the design of systems that communicate using messages. So, for example, every update to the stock value in the JTable would be sent as a message to an "update database" queue. Once the update is done, a message would be sent to an "update notification" queue to which all clients are connected. This lets all of them know that the value of a given stock has been updated and act accordingly. The advantage of this solution is that you get to keep your existing stack (Java, MySQL) and can implement notifications without polling the DB and killing it.
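To make the queue idea concrete, here is a minimal sketch of the notification side with the RabbitMQ Java client (the exchange name and localhost broker are placeholders):

    import java.nio.charset.StandardCharsets;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public final class StockUpdateNotifier {

        private static final String EXCHANGE = "stock-updates";   // hypothetical fanout exchange

        /** Broadcast "this stock row changed" to every connected client. */
        public static void publishUpdate(String stockId) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");                          // assumption: local broker
            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {
                channel.exchangeDeclare(EXCHANGE, "fanout");
                channel.basicPublish(EXCHANGE, "", null,
                        stockId.getBytes(StandardCharsets.UTF_8));
            }
        }
    }

Each client binds its own queue to the exchange and, on receiving a message, refreshes the affected row in its JTable.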
A checksum is one way to see if data has changed.
Anyway, I would suggest you store a "last_update_date" column that is updated on every update of the record.
Then you just have to store this date (with datetime precision) and do the check against that.
You can also add a version-number column: a simple counter incremented by 1 on each update.
Note:
You can add an ON UPDATE trigger to maintain last_update_date; that should be 100% reliable. You may not need a trigger if you control all updates yourself.
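As a sketch, the version-number variant turns into a classic optimistic-locking update (table and column names are assumptions):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public final class StockDao {

        /**
         * Updates the row only if nobody changed it since we read it.
         * Returns false when the stored version no longer matches, i.e. another
         * user modified the record and it must be reloaded before editing.
         */
        public static boolean updateStockLevel(Connection conn, long id, int newLevel,
                                               int expectedVersion) throws SQLException {
            String sql = "UPDATE stock SET level = ?, version = version + 1 "
                       + "WHERE id = ? AND version = ?";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setInt(1, newLevel);
                stmt.setLong(2, id);
                stmt.setInt(3, expectedVersion);
                return stmt.executeUpdate() == 1;
            }
        }
    }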
When used in network communication:
A checksum is a count of the number of bits in a transmission unit
that is included with the unit so that the receiver can check to see
whether the same number of bits arrived. If the counts match, it's
assumed that the complete transmission was received.
The same idea can be translated to checking whether two objects differ, so your approach is correct.
I am going to call a function that retrieves some data from the database, but before that I am sending that data. I am just checking with this function call whether the data was inserted properly. However, inserting the data takes some time, and my function call starts before the data is actually inserted into the database. Because of that, it finds that no data has been inserted. Can anyone tell me how to resolve this issue and how to synchronize this, so that I get the proper result only after the insert has completed? I can't use the Runnable interface or the Thread class here. What I'd like to do is call the data-access function after a certain time, so that the data has enough time to be inserted into the database. Please help me out.
I don't know what language you are using, but maybe the function has a parameter that causes it to wait until the query finishes before returning? Something mentioning the word "synchronous"?
Use the Database Driver
I'm not familiar with JDBC so I'm not sure which tools are/aren't available to you, but it seems like you're doing more work than you need to.
Typically the database driver will inform you whether the query was executed successfully, so you should not need to have your application query the data afterward to verify that the data is there. Instead, ask the driver for errors to see whether there was a problem with the query.
If you're inserting a large amount of data and your database supports it, you may want to use a transaction to perform your insert. This will pass all of the data into the DB, attempt the insert, and warn you of any problems.
If there are problems with the transaction, you can roll back, and the database state will be the same as when you started (obviously you will need to handle the errors to save your data). If there are no problems, you can finish committing the transaction, and rest assured that the database state matches the application state.
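A minimal JDBC sketch of that pattern (the statement and table name are placeholders):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public final class BatchInsert {

        /** Inserts all values in one transaction; rolls back if any statement fails. */
        public static void insertAll(Connection conn, List<String> values) throws SQLException {
            boolean oldAutoCommit = conn.getAutoCommit();
            conn.setAutoCommit(false);                 // start the transaction
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO my_table (value) VALUES (?)")) {
                for (String value : values) {
                    stmt.setString(1, value);
                    stmt.executeUpdate();
                }
                conn.commit();                         // everything made it in
            } catch (SQLException e) {
                conn.rollback();                       // database is back where it started
                throw e;                               // let the caller handle/save the data
            } finally {
                conn.setAutoCommit(oldAutoCommit);
            }
        }
    }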
Alternatives
If for some reason the above methods won't work, you can try to resolve the race condition using an event pattern. In simple terms, you want to raise an event when the data is done inserting to alert the validator that it can start reading data. The validator will listen for that event and trigger when it hears it.
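In Java terms, a bare-bones version of that event pattern might look like this (the listener interface is a made-up name, not a library type):

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public final class InsertEvents {

        /** Hypothetical callback fired once the insert has actually finished. */
        public interface InsertCompleteListener {
            void onInsertComplete();
        }

        private final List<InsertCompleteListener> listeners = new CopyOnWriteArrayList<>();

        public void addListener(InsertCompleteListener listener) {
            listeners.add(listener);
        }

        /** Call this only after the driver reports that the insert succeeded. */
        public void fireInsertComplete() {
            for (InsertCompleteListener listener : listeners) {
                listener.onInsertComplete();           // the validator starts reading here
            }
        }
    }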
Is there an efficient way to create a copy of table structure+data in HBase, in the same cluster? Obviously the destination table would have a different name. What I've found so far:
The CopyTable job, which has been described as a tool for copying data between different HBase clusters. I think it would support intra-cluster operation, but have no knowledge on whether it has been designed to handle that scenario efficiently.
Use the export+import jobs. Doing that sounds like a hack but since I'm new to HBase maybe that might be a real solution?
Some of you might be asking why I'm trying to do this. My scenario is that I have millions of objects I need access to, in a "snapshot" state if you will. There is a batch process that runs daily which updates many of these objects. If any step in that batch process fails, I need to be able to "roll back" to the original state. Not only that, during the batch process I need to be able to serve requests to the original state.
Therefore the current flow is that I duplicate the original table to a working copy, continue to serve requests using the original table while I update the working copy. If the batch process completes successfully I notify all my services to use the new table, otherwise I just discard the new table.
This has worked fine using BDB but I'm in a whole new world of really large data now so I might be taking the wrong approach. If anyone has any suggestions of patterns I should be using instead, they are more than welcome. :-)
All data in HBase has a timestamp. You can do reads (Gets and Scans) with a parameter indicating that you want the latest version of the data as of a given timestamp. One thing you could do is perform the reads that serve your requests with this parameter pointing to a time before the batch process begins. Once the batch completes, bump your read timestamp up to the current state.
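A hedged sketch of such a time-bounded read with the HBase client API (the table name is a placeholder, and exact signatures vary slightly between HBase versions):

    import java.io.IOException;

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public final class SnapshotReads {

        /**
         * Scans the table as it looked at snapshotTs (e.g. just before the batch
         * started), ignoring any newer cell versions written by the batch process.
         */
        public static void scanAsOf(Connection connection, long snapshotTs) throws IOException {
            Scan scan = new Scan();
            scan.setTimeRange(0L, snapshotTs);         // only cells written before the snapshot
            try (Table table = connection.getTable(TableName.valueOf("objects"));
                 ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    // serve the request from the pre-batch version of each row
                }
            }
        }
    }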
A couple things to be careful of, if you take this approach:
HBase tables are configured to store the most recent N versions of a given cell. If you overwrite the data in the cell with N newer values, then you will lose the older value during the next compaction. (You can also configure them with a TTL to expire cells, but that doesn't quite sound like it matches your case.)
Similarly, if you delete the data as part of your process, then you won't be able to read it after the next compaction.
So, if you don't issue deletes as part of your batch process, and you don't write more versions of the same data that already exists in your table than you've configured it to save, you can keep serving old requests out of the same table that you're updating. This effectively gives you a snapshot.
There are two different processes developed in Java running independently.
If either of the processes modifies the table, can I get any notification that the table was modified? My objective is to keep an object always in sync with a table in the database: if any modification happens to the table, I want to update the object.
If the table is modified, can I get any notification about this? Do databases provide any facility like this?
We use SQL Server and have certain triggers that fire when a table is modified and call an external binary. The binary we call sends a Tib rendezvous message to notify other applications that the table has been updated.
However, I'm not a huge fan of this solution - Much better to control writing to your table through one "custodian" process and have other applications delegate to that. To enforce this you could change permissions on your table so that only your custodian process can write to the database.
The other advantage of this approach is being able to provide a caching layer within your custodian process to cater for common access patterns. Granted that a DBMS performs caching anyway, but by offering it at the application layer you will have more control / visibility over it.
No, the database doesn't provide these services. You have to query it periodically to check for modifications, or use some JMS solution to send notifications from one app to another.
You could add a timestamp column (last_modified) to the tables and check it periodically for updates, or use sequence numbers that are incremented on updates (similar in concept to optimistic locking).
You could use JBoss Cache, which provides update mechanisms.
One way you can do this: enclose your database statement in a method that returns true when it completes successfully. Keep that flag in scope in your code so that you can check it whenever you want to know whether the table has been modified. Why not try it like this?
If you're willing to take the hack approach, and your database stores tables as files (e.g. MySQL), you could always have something that checks the modification time of the files on disk to see if anything has changed.
Of course, for databases like Oracle, where tables are assigned to tablespaces and the tablespaces are what have storage on disk, it won't work.
(yes, I know this is a bad approach, that's why I said it's a hack -- but we don't know all of the requirements, and if he needs something quick, without re-writing the whole application, this would technically work for some databases)