We are trying to capture, in [near] real time, certain transactions occurring on the core database and replicate them to a remote database connected via VPN.
These transactions can be identified easily, but we are struggling to decide on the workflow and which technology to use.
For example:
1.) Dumping a CSV file every x seconds.
The core system creates a CSV file every x seconds with the required information. We then push/pull this file to the remote system and process it.
2.) Web Service
We will have two web services, one on the sender side and another on the receiver side.
Every x seconds the sender web service will execute a query, fetch records from the source database, and push the data to the receiver web service in batches of 'y' records.
The receiver will then process the records and send an acknowledgement for the 'y' records.
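For illustration, here is a rough Java sketch of option 2's sender side, assuming an incrementing id column as the watermark and JSON over HTTP to the receiver; the table, column, and URL names are placeholders, not an existing design:

```java
// Sketch only: poll the source DB every x seconds and push batches of y
// records to a receiver service. Connection strings, SQL, and JSON shape are
// illustrative; adjust the row-limiting syntax for your RDBMS.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.*;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchPusher {
    private static final int BATCH_SIZE = 500;    // 'y' records per push
    private static final int POLL_SECONDS = 10;   // 'x' seconds between polls
    private long lastPushedId = 0;                // watermark; persist this in practice

    public static void main(String[] args) {
        BatchPusher pusher = new BatchPusher();
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleWithFixedDelay(pusher::pushOnce, 0, POLL_SECONDS, TimeUnit.SECONDS);
    }

    void pushOnce() {
        try (Connection con = DriverManager.getConnection("jdbc:...source...");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT id, payload FROM transactions WHERE id > ? ORDER BY id FETCH FIRST ? ROWS ONLY")) {
            ps.setLong(1, lastPushedId);
            ps.setInt(2, BATCH_SIZE);
            StringBuilder json = new StringBuilder("[");
            long maxId = lastPushedId;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    if (json.length() > 1) json.append(',');
                    // no escaping here; a real implementation would use a JSON library
                    json.append("{\"id\":").append(rs.getLong("id"))
                        .append(",\"payload\":\"").append(rs.getString("payload")).append("\"}");
                    maxId = rs.getLong("id");
                }
            }
            json.append(']');
            if (maxId == lastPushedId) return;    // nothing new this tick

            // Push the batch; advance the watermark only after a 200 acknowledgement.
            HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("https://receiver.example/batch"))
                           .header("Content-Type", "application/json")
                           .POST(HttpRequest.BodyPublishers.ofString(json.toString()))
                           .build(),
                HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() == 200) lastPushedId = maxId;
        } catch (Exception e) {
            e.printStackTrace();                  // leave the watermark alone; retry next tick
        }
    }
}
```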
Note.
1.) Ideally we would like the process to be real time. Both of the ideas above are [near] real time, not truly real time.
2.) The source database system is not fixed; it can be Oracle, MS SQL Server, MySQL, Sybase, Informix, etc.
3.) The remote target database is Oracle.
Any ideas are most welcome, and the technology used can be flexible.
The main focus is on minimizing the load this process puts on the core database.
Edit:
It is becoming increasingly clear to me that achieving true real time with heterogeneous database systems will be nearly impossible, as trigger/notify mechanisms for record insertion are RDBMS specific.
I would like to shift the focus of the question toward better near-real-time ideas beyond the two examples shared above.
Also please note that we have little to no control over the source database or over the process/service that originally inserts the records; we only have control over the records themselves.
See this article for an example of how to listen for database changes (in this case via a database trigger) in PostgreSQL. Basically you set up a function, fired by the trigger, that sends an event to all interested clients. Your application then listens for this event and can start the sync whenever the trigger is executed. The example applies the trigger to new insertions on a specific table.
Related
We have a situation where we have to run a lengthy query against the database based on human input. As the input changes, the query has to be run over and over, and the input may change as often as once per second.
The problem is that we know this will cause a spike in server activity for several seconds. Since it is not critical to have an answer immediately or on every input change, we can afford to skip the query.
The criterion we would like to use is the current state of the database server: only run the query if the server is under low or medium load, and skip it when the server is under stress.
We use an Oracle database for this, and so far the only way we have found to do this from Java is to send the server a known query and benchmark it, but that essentially adds load to the server. So my question: is there any other way, specifically for Oracle, to discover from the Java side of the application how loaded the database is?
Depending on how you define "low or medium load state", I'd guess that hitting v$osstat would give you the information you're after. Of course, hitting v$osstat constantly will also add to the load on the server, so you may want to write a job that periodically copies the v$osstat data into a table you control (and can thus index appropriately), so that your application can hit that table rather than hitting the dynamic performance view constantly.
Depending on the goal (i.e. are you trying to ensure that other users have enough resources, or that your app remains responsive?), you may want to use Resource Manager to control resource utilization among users, run the query asynchronously from the application, and/or use some sort of cache in the middle tier to avoid hitting the database every time.
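A minimal sketch of the v$osstat idea from Java, assuming the application account has been granted SELECT on the view; the load-per-CPU threshold is an illustrative heuristic, not a recommended value:

```java
// Check the OS load average reported by Oracle before running the heavy query.
// 'LOAD' and 'NUM_CPUS' are standard v$osstat stat names; tune the threshold per host.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DbLoadCheck {
    public static boolean isLoadAcceptable(Connection con, double maxLoadPerCpu) throws SQLException {
        double load = 0, cpus = 1;
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT stat_name, value FROM v$osstat WHERE stat_name IN ('LOAD','NUM_CPUS')")) {
            while (rs.next()) {
                if ("LOAD".equals(rs.getString(1)))     load = rs.getDouble(2);
                if ("NUM_CPUS".equals(rs.getString(1))) cpus = rs.getDouble(2);
            }
        }
        return (load / cpus) <= maxLoadPerCpu;   // e.g. skip the heavy query above ~0.7 per CPU
    }
}
```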
I have C applications that will run on multiple machines at different sites.
Now I want to control and monitor these C applications. For that I am considering a Java web application using servlets/JSP.
The C applications will connect to the Java web application over TCP. In the web application, I plan to implement a manager that communicates with the C applications over TCP; the manager will be started as a separate thread when the web application starts, and it will communicate with servlet requests via the ServletContext and Session. So whenever the user does something in the browser, I want to reach the manager's functionality on the server, with the ServletContext and Session as the interface.
That is what I have in mind. Is there a better approach, or am I doing anything wrong? Can anyone suggest a better solution?
EDIT
Current workflow: whenever I need to start/stop a C application, I have to SSH into the remote machine with a PuTTY terminal, type long commands, and start/stop it. Whenever there is an issue, I have to scroll through very long log files. There are a couple of other things, like the live status of what the application is doing/processing every second, that I can't always write to a log file.
I find this workflow cumbersome, and things like live status cannot be monitored at all.
Now I want a web application interface for it. I can modify my C applications and implement the web application from scratch.
New workflow to implement: I want to start/stop the C applications from a web page. I want to view logs and live status reports/graphs on the web page (monitoring what each C application is doing). I also want to monitor machine status on the web page.
I plan to design the web interface in Java using JSP/servlets.
So I will modify my C applications so they can communicate with the web application.
Question:
I just need guidelines/best practices for implementing the new workflow.
EDIT 2
Sorry for the confusion between "controller" and "manager"; they are the same thing.
My thoughts:
The system will consist of C applications running at different sites, a Java controller and a Java web app running side by side in a Tomcat server, and a DB.
1) The C applications will connect to the controller over TCP, so the controller acts as the server and the C applications as clients.
2) The C applications will be multithreaded: they will receive tasks from the controller and spawn a new thread to perform each task. When the controller says to stop a task, the C application will stop that task's thread. Additionally, the C applications will send work progress (logs) to the controller every second.
3) The controller receives task commands from the web application (both run in the same Tomcat server, in the same JVM instance), and the web application receives commands from the user over HTTP.
4) The controller will insert the work progress (logs) it receives every second from the C applications into the DB for later analysis (I need to consider whether it is wise to insert logs into a MySQL RDBMS; it may mean a lot of inserts, perhaps 100 or 1,000 every second, indefinitely). The web application may also request the last 5 minutes of logs from the controller and send them to the user over HTTP. If the user is monitoring logs, the web application will have to retrieve logs from the controller every second and send them to the user over HTTP.
5) A user monitoring C application tasks will see progress in a graph, updated every second, plus text lines of logs for info/error events that may occasionally happen in the C applications.
6) There will be one C application per machine, which will execute any task the user sends from the web browser. Each C application will run as a service on the machine: it will start at machine startup, connect to the server, and stay connected indefinitely, idling when there are no tasks to perform.
It is a valid approach. I believe sockets are how most distributed systems communicate, and more often than not even different services on the same box communicate that way. I also believe what you are suggesting for the Java web service is very typical and will work well (it will probably grow in complexity beyond what you are currently thinking, but the architecture you describe is a good start).
If your C services are also made to run independently of the management system, then you might want to reverse it and have the management system connect to the services (unless your firewall prevents it).
You will certainly want a small, well-defined protocol. If you are sending lots of fields you could even make everything you send JSON or XML, since both already have parsers to validate the format.
Be careful about security! On the C side, ensure that you can't get any buffer overflows, and if you parse the information yourself, be strict about throwing away (and logging!) data that doesn't look right. On the Java side buffer overruns aren't as much of a problem, but be sure to log packets that don't fit your protocol exactly, to detect both bugs and intrusions.
Another solution you might consider: since your systems all share a database already, you could send commands and responses through the DB (assuming the commands/responses don't happen too often). We don't do exactly this, but we share a variable table in which we place name/value pairs indicating different aspects of our systems' performance and configuration (it's two-way). This is probably not optimal, but it has been amazingly flexible, since it allows us to reconfigure our system at runtime (the values are cached locally in each service and re-read/updated every 30 seconds).
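A rough sketch of that shared variable table idea in Java, with an assumed table name and a 30-second refresh, just to make the pattern concrete:

```java
// Each service keeps a local cache of name/value pairs and refreshes it
// periodically from a shared table. Table name and interval are assumptions.
import java.sql.*;
import java.util.Map;
import java.util.concurrent.*;

public class SharedConfig {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final String jdbcUrl;

    public SharedConfig(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleWithFixedDelay(this::refresh, 0, 30, TimeUnit.SECONDS);
    }

    public String get(String name) { return cache.get(name); }

    private void refresh() {
        try (Connection con = DriverManager.getConnection(jdbcUrl);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT name, value FROM shared_variables")) {
            while (rs.next()) cache.put(rs.getString(1), rs.getString(2));
        } catch (SQLException e) {
            // keep serving the last known values if the DB is briefly unreachable
        }
    }
}
```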
I might be able to give you more info if I knew more specifics about what you expect to do: for instance, how often will your browser update its fields, what kind of command signals or data requests will be sent, and what kind of data do you expect back? You certainly don't have to post that here, but you must consider it; I suggest mocking up your browser page to start.
edits based on comments:
Sounds good, just a couple comments:
2) Any good database should be able to handle that volume of data for logging but you may want to use a good cache on top of your DB.
5) You will probably want a web framework to render the graph and manage updates. There are a lot and most can do what you are saying pretty easily, but trying to do it all yourself without a framework of some sort might be tough. I only say this because you didn't mention it.
6) Be sure you can handle dropped connections and reconnecting. When you are testing, pull the plug on your server (at least the network cable) and leave it out for 10 minutes, then make sure when you plug it back in you get the results you expect (Should the client automatically reconnect? Should it hold onto the logs or throw them away? How long will it hold onto logs?)
You may want to build in a way to "reboot" your C services. Since they were started as a service, simply sending a command that tells them to terminate/exit will generally work, since the system will restart them. You may also want a little monitoring loop that restarts them under certain criteria (e.g. they haven't received a command from the server for n minutes). This can come in handy when you're in California at 10am trying to work with a C service in Australia at 2am.
Also, consider that an attacker can insert himself between your client and server. If you are using an SSL socket you should be okay, but if it's a raw socket you must be VERY careful.
Correction:
You may have problems putting that many records into a MySQL database. If the table is not indexed and you minimize queries against it, you may be okay. You can achieve this by keeping the last 5 minutes of logs in memory (so you don't have to index the table) and by grouping inserts or using a very well-tuned cache.
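As a sketch of the "grouping inserts" suggestion, something like the following could buffer log lines and flush them with a single JDBC batch every few seconds; the table and column names are made up:

```java
// Buffer incoming log lines and write them in batches instead of one INSERT
// per line. Call flush() from a scheduler every few seconds.
import java.sql.*;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LogBatcher {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    public void add(String line) { buffer.offer(line); }

    /** One round trip persists many rows; keeps per-second insert overhead low. */
    public void flush(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO app_log (logged_at, line) VALUES (CURRENT_TIMESTAMP, ?)")) {
            String line;
            int n = 0;
            while ((line = buffer.poll()) != null) {
                ps.setString(1, line);
                ps.addBatch();
                if (++n % 1000 == 0) ps.executeBatch();   // keep batches bounded
            }
            if (n % 1000 != 0) ps.executeBatch();
        }
    }
}
```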
A better approach might be to forgo the database and just use flat log files, pre-filtered to what a single user might want to see. So if the user asks for the last 5 minutes of "WARN" and "DEBUG" messages from a machine, you could just read that machine's log file into memory, skipping all but warn/debug messages, and display those. This has its own problems but should be more scalable than an indexed database. It would also allow you to zip up older data (that a user won't want to query against any more) for a 70-90% savings in disk space.
Here are my recommendations on your current design, given that you haven't defined a specific scope for this project:
Define a protocol for communication between your C apps and your monitor app. You probably don't need the same info from all the C apps in the same format, and some metrics may matter more for some C apps than for others. I would recommend using plain JSON for this and defining a minimum schema to fulfill, so that the C side can produce the data and the Java side can consume and validate it (see the sketch below).
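As a sketch only, a minimum schema on the Java side could look like the class below; the field names are assumptions, and Jackson is used purely as an example parser:

```java
// One message of the assumed JSON protocol, parsed and minimally validated.
import com.fasterxml.jackson.databind.ObjectMapper;

public class MetricMessage {
    public String appId;        // which C application sent this
    public String metric;       // e.g. "cpu" or "queue_depth" (illustrative names)
    public double value;
    public long   timestampMs;

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static MetricMessage parse(String json) throws Exception {
        MetricMessage m = MAPPER.readValue(json, MetricMessage.class);
        if (m.appId == null || m.metric == null)   // enforce the minimum schema
            throw new IllegalArgumentException("missing required fields: " + json);
        return m;
    }
}
```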
Use a database to store the results of monitoring your C apps. The generic option would be an RDBMS, probably an open-source one like MySQL or PostgreSQL, or, if you (or your company) can get the licenses, SQL Server, Oracle, or another commercial product. This applies if you need to maintain a history of the results; you can clear the data periodically.
You probably want/need the latest monitoring results available in some sort of cache (because here performance is critical), so you may use an in-memory store like Hazelcast or Redis, or just a simple cache like EhCache or Infinispan. Storing the data in an external component is better than storing it in the plain ServletContext, because these technologies are aware of multithreading and offer ACID-style guarantees, which is not the primary use case for the ServletContext but seems necessary for the monitor.
Separate the monitor that receives the data from the C apps from the web app. If the monitor fails or takes too long to perform some operation, the web application will still be available, without the overhead of receiving and managing the data from the C apps. On the other hand, if the web app becomes slow (due to implementation problems or something that should be discovered with a profiler), you can restart it, and your monitor will continue gathering data from the C apps and storing it in your data source.
For the threads in the monitor app, since it will apparently be based on Java, use an ExecutorService rather than creating and managing threads manually.
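A minimal sketch of what that could look like for the monitor's TCP accept loop; the port, pool size, and handler body are placeholders:

```java
// Accept connections from the C apps and hand each one to a pooled worker.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MonitorServer {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(50);  // roughly one worker per connected C app
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                Socket client = server.accept();
                pool.submit(() -> handle(client));
            }
        }
    }

    private static void handle(Socket client) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // parse the JSON message and hand it to the log/metrics pipeline
            }
        } catch (Exception e) {
            // connection dropped; the C client is expected to reconnect
        }
    }
}
```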
For this part:
A user monitoring C application tasks will see progress in a graph, updated every second, plus text lines of logs for info/error events that may occasionally happen in the C applications.
You may use RxJava, or another reactive programming model such as the Play Framework, so that your view (JSP, Facelets, plain HTML, or whatever you use) does not have to poll: the data is read continuously from the database (and the cache, if you use one) and pushed straight to the users of the web app. If you don't want to use that programming model, then at least use a push technology like Comet or WebSockets. If this part is not that important, use a simple refresh timer as explained here: How to reload page every 5 second?
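If you go the WebSocket route, a bare-bones JSR 356 (javax.websocket) endpoint could look roughly like this; the endpoint path and the once-per-second broadcast call are assumptions about how you would wire it in:

```java
// Browsers connect to /live-status; the monitor calls broadcast() with the
// latest JSON snapshot (e.g. once per second) and every open session gets it.
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import javax.websocket.OnClose;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

@ServerEndpoint("/live-status")
public class LiveStatusEndpoint {
    private static final Set<Session> SESSIONS = new CopyOnWriteArraySet<>();

    @OnOpen
    public void onOpen(Session session) { SESSIONS.add(session); }

    @OnClose
    public void onClose(Session session) { SESSIONS.remove(session); }

    public static void broadcast(String json) {
        for (Session s : SESSIONS) {
            try {
                s.getBasicRemote().sendText(json);
            } catch (Exception e) {
                SESSIONS.remove(s);   // drop sessions whose connection has gone away
            }
        }
    }
}
```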
For this part:
There will be one C application per machine, which will execute any task the user sends from the web browser
You could reuse the JSON protocol between the C apps and the monitor, and have another thread in each C app translate the action and execute it.
I need to create a Java agent that becomes aware of, and executes its instructions as soon as, any update to particular tables in a MySQL or PostgreSQL database occurs.
Everything needs to happen automatically.
Given that I'm a novice in Java, I was wondering if you could give me any advice.
My options are:
1) Having a trigger that, after a commit, wakes up my Java application (using pg_notify and the like).
2) Having the Java application subscribe to a particular ID in the database (I'm not sure whether this can be done, given that asynchronous updates are not possible and I might need to have my agent poll the database every x seconds for changes).
Thanks!
Yes, a trigger that uses NOTIFY is a good way to do it in PostgreSQL. The important caveat when using the JDBC driver is that there is no way to receive notifications asynchronously; you have to poll. This is usually fine, as the NOTIFY/LISTEN mechanism is very lightweight: if you want to poll 10 (or 100?) times a second, you can do so without causing performance problems. See http://jdbc.postgresql.org/documentation/83/listennotify.html for more.
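A minimal sketch of that polling loop with the PostgreSQL JDBC driver, assuming a trigger on the watched table issues NOTIFY on a channel named table_changed (the channel name and connection details are placeholders):

```java
// LISTEN on a channel and poll for pending notifications via the driver API.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class ChangeListener {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        try (Statement st = con.createStatement()) {
            st.execute("LISTEN table_changed");
        }
        PGConnection pg = con.unwrap(PGConnection.class);
        while (true) {
            PGNotification[] events = pg.getNotifications();   // returns immediately
            if (events != null) {
                for (PGNotification e : events) {
                    System.out.println("change on channel " + e.getName()
                            + ", payload: " + e.getParameter());
                    // kick off the agent's work here
                }
            }
            Thread.sleep(100);   // roughly 10 polls per second
        }
    }
}
```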
MySQL is a little less helpful; you'll need to have triggers INSERT rows into a monitoring table and repeatedly poll that table with SELECT * (and then DELETE). This will work, but you are more likely to end up in a latency/performance trade-off.
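And a rough sketch of the MySQL variant, assuming a single poller and an illustrative change_log table populated by the triggers:

```java
// Drain the trigger-populated monitoring table: read the pending rows, handle
// them, then delete everything up to the last id seen, in one transaction.
import java.sql.*;

public class MySqlChangePoller {
    public static void pollOnce(Connection con) throws SQLException {
        con.setAutoCommit(false);
        long lastId = -1;
        try {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT id, table_name, row_pk FROM change_log ORDER BY id")) {
                while (rs.next()) {
                    lastId = rs.getLong("id");
                    // handle the change, e.g. rs.getString("table_name"), rs.getString("row_pk")
                }
            }
            if (lastId >= 0) {
                try (PreparedStatement del = con.prepareStatement(
                        "DELETE FROM change_log WHERE id <= ?")) {
                    del.setLong(1, lastId);
                    del.executeUpdate();
                }
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        }
    }
}
```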
I have a swing desktop application that is installed on many desktops within a LAN. I have a mysql database that all of them talk to. At precisely 5 PM everyday, there is a thread that will wake up in each of these applications and try to back up files to a remote server. I would like to prevent all the desktop applications from doing the same thing.
The way I was thinking of doing this was:
After waking up at 5 PM, all the applications will try to write a row to a MySQL table. They will all write the same information, so only one will succeed and the others will get a duplicate-row exception. Whichever one succeeds then goes on to run the backup program.
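For what it's worth, a sketch of that "only one insert wins" idea in JDBC could look like this; the table name and key format are assumptions, and whether the driver raises SQLIntegrityConstraintViolationException for duplicates depends on the driver (with MySQL you can also check error code 1062):

```java
// Every client tries to insert the same primary key for today's backup; the
// one whose INSERT succeeds runs the backup, the rest hit the duplicate key.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;
import java.time.LocalDate;

public class BackupElection {
    /** Returns true if this client won the right to run today's backup. */
    public static boolean tryClaim(Connection con, String clientId) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO backup_job (job_date, claimed_by) VALUES (?, ?)")) {
            ps.setString(1, LocalDate.now().toString());   // job_date is the primary key
            ps.setString(2, clientId);
            ps.executeUpdate();
            return true;                                    // insert succeeded: we run the backup
        } catch (SQLIntegrityConstraintViolationException dup) {
            return false;                                   // someone else already claimed it
        }
    }
}
```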
My questions are:
Is this the right way of doing things? Is there a better (easier) way?
I know we could do this using sockets as well, but I don't want to go down that route: too much coding, and I would also need to ensure that all the systems can talk to each other first (ping).
Will MySQL support such a feature? My tables use InnoDB, so I am thinking it does. Typically I will have about 20-30 users in the LAN; will this cause a huge overhead for the DB to handle?
If you could put an intermediate class between the applications and the database that would queue up the results and allow them to proceed in an orderly manner, you'd have it knocked.
It sounds like the applications all go directly against the database. You'll have to modify the applications to avoid this issue.
I have a lot of questions about the design:
Why are they all writing "the same row"? Aren't they writing information for their own individual instance?
Why would every one of them have exactly the same primary key? If there were an auto-increment or timestamp column you wouldn't have this problem.
What's the isolation set to on the database connection? If it's set to SERIALIZABLE, you'll force each one to wait until the previous one is done, at the cost of performance.
Could you have them all write files to a common directory and pick them up later in an orderly way?
I'm just brainstorming now.
It seems you want to back up server data, not client data.
I recommend to use a 3-tier architecture using Java EE.
You could use a Timer Service then to trigger the backup.
Usually, though, a backup program is an independent program, e.g. started by a cron job on the server. But again: you'll need a server to do this properly, not just a shared folder.
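For reference, a minimal sketch of the Timer Service approach mentioned above, using a singleton EJB with a calendar-based @Schedule; the time and class name are placeholders:

```java
// A server-side timer fires the backup once a day, so the desktop clients no
// longer need to coordinate among themselves.
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class NightlyBackup {

    // Runs at 17:00 server time every day.
    @Schedule(hour = "17", minute = "0", persistent = false)
    public void runBackup() {
        // copy the files to the remote server here
    }
}
```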
Here is what I would suggest. Instead of having all clients wake up at the same time and try to perform the backup, stagger the times at which they wake up.
So when a client wakes up:
- It will check some table in your DB (MySQL) to see if a backup job has completed or is currently running. If the job has completed, the client will go on with its normal duties. You can decide how to handle the case where the job is running.
- If the client finds that the backup job has not been run for the day, it will start the backup job. At the same time, it will modify the row to indicate that the backup job has started. Once the backup has completed, the client will modify the table to indicate that it has finished.
This approach will prevent a spurt in network activity and can also provide a rudimentary form of failover: if one client fails, another client can attempt the backup at a later time. (This is a bit more involved, though; basically it comes down to what a client should do when it sees that a backup job is ongoing.)
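A sketch of the "check, then claim" step, assuming a PENDING row for the day already exists (for example, inserted by whichever client checks first); a single conditional UPDATE acts as the lock, and the table and status values are illustrative:

```java
// Only one client's UPDATE will match the PENDING row, so exactly one client
// sees an update count of 1 and performs the backup.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.time.LocalDate;

public class StaggeredBackup {
    /** Returns true if this client should perform today's backup. */
    public static boolean claimTodaysBackup(Connection con, String clientId) throws Exception {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE backup_job SET status = 'RUNNING', claimed_by = ? " +
                "WHERE job_date = ? AND status = 'PENDING'")) {
            ps.setString(1, clientId);
            ps.setString(2, LocalDate.now().toString());
            return ps.executeUpdate() == 1;
        }
    }
}
```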
Similar to how Google Analytics sends beacons from JavaScript to track events, what are the most efficient ways to collect that beacon data and return a response to the client as quickly as possible?
For example, if I have a server-to-server beacon call, I want that call to complete as fast as possible on the client's server.
PHP to flat files?
PHP to a local queue?
A Java server that logs to a queue and maintains a connection to the remote queue the whole time?
A custom C++ server?
This would be on the order of 1000 requests per second.
There are 2 aspects to this.
1) The client's beacon call should complete as quickly as possible. This means the incoming HTTP request should respond 200 OK and exit as soon as possible, so it probably shouldn't do the actual data writing itself. It should hand that off to another process in the background, either via a background shell execution or by using a queue/job mechanism like Gearman (see the sketch after point 2).
2) The data writing itself, done in a background thread away from the client's request, has a little more time to spare. 1,000 writes per second should be fine for a well-tuned database with row-level locking on modern hardware, as long as it isn't being SELECTed from too heavily at the same instant. This could, though, be a good usage scenario for a key-value store as the immediate data store; a separate analysis/reporting process could then query the key-value store offline for all stored data, process it, and eventually copy it into a database.
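As a sketch of aspect 1 in Java, a servlet could acknowledge immediately and hand the payload to an in-memory queue drained by a background writer; the servlet path is a placeholder, and the println stands in for the key-value store or batched DB write from aspect 2:

```java
// Fast-ack beacon endpoint: the request thread only enqueues the payload and
// returns 200 OK; a daemon thread persists events asynchronously.
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/beacon")
public class BeaconServlet extends HttpServlet {
    private static final BlockingQueue<String> QUEUE = new LinkedBlockingQueue<>(100_000);

    static {
        Thread writer = new Thread(() -> {
            while (true) {
                try {
                    String event = QUEUE.take();
                    System.out.println("persist: " + event);   // replace with KV store / batched DB write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String payload = req.getQueryString();
        if (payload != null) QUEUE.offer(payload);   // drop silently if the queue is full
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}
```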