I am developing under Java, Ejb3.0, WebLogic.
I would like to have a system design suggestain from you about feature which I am going to develop. (not too complicated)
The main goal is to have a system which takes information from couple of databases and sync between them.
for example:
let's say I have database A, Database B and Database C.
if we compare B against C: ( B is master DB)
desired target:
scenario 1. A has a record which is missing in B. action we take = B add to its table the missing record.
scenario 2. A has a record and B has also that record. action we take = B is updating the record information exactly as it shown in A.
(The same goes with Database A against Database B).
The compare method suppose to compare between specific table columns information.
Now I could take everything and drop it to objects and then compare.
Other hand I can do manually sync.
Would like to hear some design advice (could be OOP design or any other pattern). Even if it's a bit overhead for some special design. I still Would like to do it so I can learn something new, and also use this mechanism to sync other systems.
Thanks in advance,
ray.
A good answer on this does depend on the amount of data.
If the amount is little, just get all objects from all databases and put it within a collection. Thats the most easy to maintain.
With a minor load of data coming from one database and major load of data from another, maybe its a good idead to take the minor data, pass it to the database with the major data and let the database do the work.
Mostly best practice is to keep the dataflow between your application and the database low.
Maybe you can explain details about your questions a bit more...
--- edit ---
Ok, so you want sync all to your B Master DB.
There are several approaches, depending on several environment parameters you have, the two main directions would be
Make a full iteration every time (easy to program and maintain, very bad performance)
Make a full sync once and delta updates after that (harder up to very hard to maintain, very good performance)
To 1.)
If all items from a table fit into your main memory without problem, take all of them into there, and do your stuff there.
If not you have to do it bunch for bunch.
To 2.)
a)
To allow to make deltas you have to identify changed items.
For that you can use DB triggers, but this is very DB specific and very hard to maintain,
or
b)
you can introduce table columns which have version numbers only for your sync purpose, which you count up, if a entity is done.
The version number could be introduced with Frameworks like Hibernate more easily, but still you have a bigger code base to do your things, since you have to check the version, etc.
But the much better performance will make delta updates the most commonly used approach.
This just sounds like data replication, which is best handled by the database itself. Consult the documentation for your database technology, there should be a multitude of different ways to configure replication. Don't re-invent the wheel.
Related
Recently I come across a schema model like this
Structure looks exactly the same, i just renamed with Entity name like Table (*)
Starting from Table C, all the tables are having close to 200 Columns, from C to L
Reason for posting this is like, I never come across structure like this before, if anyone who have already experienced like this or worked similar or more complex than this please do share your idea,
Having a structure like this is good or bad, and why?
Assume we need to have API to save data for the table structure like this,
how to design the API
How we are going to manage the Transactional across all these tables
In service code, there are few cases where we might need to get data from these table and transfer to external system.
Catch here is, external system is accepting the request in the flatten structure not in the hierarchy which we have as mentioned above. If this data needs to be transferred to external system, how can we manage marshaling and un marshaling
Last but not least, API which is going to manage the data like this can be consumed atleast 2K a day.
What is your thought on this, I don't know exactly why we need it, it needs a detailed discussion and we need to break up the things.
If I consider Spring Data JPA, Hibernate. What are all things i need to consider,
More Importantly, all these tables row values will be limited based on the the ownerId/tenantId, so the data needs to be consistent across all the tables.
I can not comment on the general aspect of the structure as that is pretty domain specific and one would need to know why this structure was chosen to be able to say if it's good or not. Either way, you probably can't change this anyway, so why bother asking if it's good or not?
Having said that, with such a model there are a few aspects that you should consider:
When updating data, it is pretty important to update only columns that really changed to avoid index trashing and allow the DB to use spare storage in pages. This is a performance concern that usually comes up when using Hibernate with such models as Hibernate usually updates all "updatable" columns, not just the dirty ones. There is an option to do dynamic updates though. Without dynamic updates, you might produce a few more IOs per update and thus keep locks for a longer time which affects the overall scalability.
When reading data, it is very important not to use join fetching by default as that might result in a result set size explosion.
This is my first post on stackoverflow, so please be nice to me :-)
So let me explain the context. I'm developing a web service with a standard layer (resources, services, DAO Layer...). I use JPA with hibernate implementation for my object model with the database.
For a class A parent and a class B child, most of the time when i want to find an object B on the collection, I use the streamAPI to filter the collection based on what i want. My question here is more general, is it better to search an object by requesting the database (from my point of view this gonna cause a lot of calls to the database but it's gonna use less CPU), or do the opposite by searching over the model object and process over collection (this gonna cause less database calls, but more CPU process)
If you consider latency, the database will always be slower.
So you gotta ask yourself some questions:
how far away is the database (latency)?
how big is the dataset?
How do I process them ?
do I have any major runtime issues ?
from my point of view this gonna cause a lot of calls to the database but it's gonna use less CPU), or do the opposite by searching over the model object and process over collection (this gonna cause less database calls, but more CPU process)
You're program is probably not very performant programmed. I suggest you check the O-Notation if you have any major runtime leaks.
Your Question is very broad, so it's hard to tell you, for your use-case, which might be the best.
Use database to return data what you need and Java to perform processing on them that would be complicated to do in a JPQL/SQL query.
Databases are designed to perform queries more efficiently than Java (stream or no).
Besides, fetching many data from a database to finally keep only a part of them is not efficient.
The database is usually faster since it is optimized for requesting specific data. Usually one would add indexes to speed up querying on certain fields.
TLDR: Filter your data in the database and process them from java.
This isn't an easy question to answer, since there are many different factors that would influence my decision to go to the db or not. First, I think it's fair to say that, for almost every app I've worked on in the past 20 years, hitting the DB for information is the default strategy. More recently (say past 10 or so years) data access through web service calls has become common as well.
For me, the main question would be something along the lines of, "Are there any situations when I would not hit an external resource (DB, Service, or even file read) for data every time I need it?"
So, I'll outline some of the things I would consider.
Is the data search space very small?
If you are searching a data space of tens of different records, then this information might be a candidate for non-db storage. On the other hand, once you get past a fairly small set records, this approach becomes increasingly untenable. Examples of these "small sets" might be something like salutations (Mr., Ms., Dr., Mrs., Lord). I looks for small sets of data that rarely change, which I, as a lazy developer, wouldn't mind typing into a configuration file. Once I get past something like 50 different records (like US States, for example), I want to pull that info from a DB or service call.
Are the data cacheable?
If you have multiple requests that could legitimately use the exact same data, then leverage caching in your application. Examine the data and expected usage of your service for opportunities to leverage regularities in data and likely requests to cache data whenever possible. Remember to consider cache keys, how long items should be cached, and when cached items should be evicted.
In many web usage scenarios, it's not uncommon that each display could include a fairly large amount of cached information, and a small amount of dynamic data. Menu and other navigation items are good candidates for caching. User-specific data, such as contract-sepcific pricing in an eCommerce app are often poor candidates.
Can you pre-load some data into cache?
Some items can be read once and cached for the entire duration of your application. A list of US States and/or Canadian Provinces is a good example here. These almost never change, so once read from the db, you would rarely need to read them again. Consider application components that can load such data on startup, and then hold this data in an appropriate collection.
I am hitting a REST API to get data from a service. I transform this data and store it in a database. I will have to do this on some interval, 15 minutes, and then make sure this database has latest information.
I am doing this in a Java program. I am wondering if it would be better, after I have queried all data, to do
1. SELECT statements and compare vs transformed data and do UPDATEs (DELETE all associated records to what was changed and INSERT new)
OR
DELETE ALL and INSERT ALL every time.
Option 1 has potential to be a lot less transactions, guaranteed SELECT on all records because we are comparing, but potentially not a lot of UPDATEs since I don't expect data to be changing much. But it has downside of doing comparisons on all records to detect a change
I am planning on doing this using Spring Boot, JPA layer and possibly postgres
The short answer is "It depends. Test and see for your usecase."
The longer answer: this feels like preoptimization. And the general response for preoptimization is "don't." Especially in DB realms like this, what would be best in one situation can be awful in another. There are a number of factors, including (and not exclusive to) schema, indexes, HDD backing speed, concurrency, amount of data, network speed, latency, and so on:
First, get it working
Identify what's wrong → get a metric
Measure against that metric
Make any obvious or necessary changes
Repeat 1 through 4 as appropriate
The first question I would ask of you is "What does better mean?" Once you define that, the path forward will likely become clearer.
In troubleshooting operations issues, I'm finding it difficult at times to diagnose a problem without more details. I see from timestamps that a merchant record changed on a particular date, for example, and the processing of transactions on the prior day is called into question. Logging what changed could help quickly rule out possibilities.
Are there any utilities out there that do that sort of comparison automatically? I'd like it to be able to do something like:
String logDelta=SomeLibrary.describeChanges(bean1, bean2);
I'd be hoping for a one-liner with something like:
"lastName{'Onassis','Kennedy Onassis'}, favoriteNumber{16,50}"
This is called an audit trail or an audit log and it's generally done in the database using triggers or stored procedures to make a copy of the row in the database being changed with the name of the user and the timestamp. It's very common to do this for compliance reasons. I haven't seen any packages that manage it for you because it's usually very tightly coupled to the database design.. you don't necessarily want a copy of every single row or every field, and it can become very expensive to do this in a highly transactional environment.
Try googling 'audit trail'
I have to go through a database and modify it according to a logic. The problem looks something like this. I have a history table in my database and I have to modify.
Before modifying anything I have to look at whether an object (which has several rows in the history table) had a certain state, say 4 or 9. If it had state 4 or 9 then I have to check the rows between the currently found row and the next state 4 or 9 row. If such a row (between those states) has a specific value in a specific column then I do something in the next row. I hope this is simple enough to give you an idea. I have to do this check for all the objects. Keep in mind that any object can be modified anywhere in its life cycle (of course until it reaches a final state).
I am using a SQL Sever 2005 and Hibernate. AFAIK I can not do such a complicated check in Transact SQL! So what would you recommend for me to do? So far I have been thinking on doing it as JUnit test. This would have the advantage of having Hibernate to help me do the modifications and I would have Java for lists and other data structures I might need and don't exist in SQL. If I am doing it as a JUnit test I am not loosing my mapping files!
I am curious what approaches would you use?
I think you should be able to use cursors to manage the complicated checks in SQL Server. You didn't mention how frequently you need to do this, but if this is a one-time thing, you can either do it in Java or SQL Server, depending on your comfort level.
If this check needs to be applied on every CRUD operation, perhaps database trigger is the way to go. If the logic may change frequently over the time, I would much rather writing the checks in Hibernate assuming no one will hit the database directly.