I have C applications that will run on multiple machines at different sites.
Now I want to control and monitor these C applications. For that I am thinking about a Java web application using Servlets/JSP.
I am thinking that the C applications will connect to the Java web application over TCP. In the web application I plan to implement a manager that communicates with the C applications over TCP. The manager will start as a separate thread when the web application starts, and it will communicate with servlet requests via the ServletContext and Session. So whenever a user does something in the browser, I want to use the functionality of my manager on the server, with the ServletContext and Session as the interface.
That is what I have in mind. Is there a better approach, or am I doing anything wrong? Can anyone suggest a better solution?
EDIT
Current workflow: whenever I need to start / stop a C application, I have to SSH into the remote machine from a PuTTY terminal, type long commands, and start / stop it. Whenever there is an issue, I have to scroll through very long log files. There are a couple of other things, like the live status of what the application is doing / processing every second, that I can't always write to a log file.
So I find this workflow difficult, and things like live status I can't monitor at all.
Now I want a web application interface to it. I can modify my C application and implement the web application from scratch.
New workflow to implement: I want to start / stop a C application from a web page. I want to view logs and live status reports / live graphs on a web page (monitoring what the C application is doing). I also want to monitor machine status on a web page.
I am thinking of designing the web interface in Java using JSP/servlets.
So, I will modify my C application so it can communicate with the web application.
Question:
I just need guidelines / best practices for building the new workflow.
EDIT 2
Sorry for the confusion between controller and manager. Both are the same thing.
My thoughts:
The system will consist of C applications running at different sites, a Java controller and a Java web app running in parallel in a Tomcat server, and a DB.
1) The C applications will connect to the controller over TCP. So the controller here becomes the server and the C applications are clients.
2) The C applications will be multithreaded; each will receive tasks from the controller and spawn a new thread to perform each task. When the controller says to stop a task, the C application will stop that task's thread. Additionally, the C applications will send work progress (logs) to the controller every second.
3) The controller receives task commands from the web application (both run in parallel in the Tomcat server, in the same JVM instance), and the web application receives commands from the user over HTTP.
4) The controller will insert the work progress (logs) it receives every second from the C applications into the DB for later analysis (I need to consider whether it is a good idea to insert logs into a MySQL RDBMS; it may need a lot of inserts, maybe 100 or 1000 every second, forever). The web application may also request the most recent 5 minutes of logs from the controller and send them to the user over HTTP. If the user is monitoring logs, the web application will have to retrieve logs from the controller every second and send them to the user over HTTP.
5) A user monitoring C application tasks will see progress in a graph, updated every second, plus text lines of logs for info/error events that may occasionally happen in the C applications.
6) There will be one C application per machine, which will execute any task the user sends from the web browser. The C application will run as a service on the machine: it will start on machine startup, connect to the server, and stay connected forever. It can sit idle if there are no tasks to perform.
It is a valid approach. I believe sockets are how most distributed systems communicate, and more often than not even different services on the same box communicate that way. I also believe what you are suggesting for the Java web service is very typical and will work well (it will probably grow in complexity beyond what you are currently thinking, but the architecture you describe is a good start).
If your C services are made to also run independently of the management system, then you might want to reverse it and have the management system connect to the services (unless your firewall prevents it).
You will certainly want a small, well-defined protocol. If you are sending lots of fields you could even make everything you send JSON or XML, since parsers that validate the format already exist for both.
Be careful about security! On the C side ensure that you won't get any buffer overflows and if you parse the information yourself, be strict about throwing away (and logging!) data that doesn't look right. On Java the buffer overruns aren't as much of a problem but be sure that you log packets that don't fit your protocol exactly to detect both bugs and intrusions.
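To make the "small, well-defined protocol" and "be strict and log what you throw away" points concrete, here is a minimal sketch of the Java side. The wire format (one text line per message, `TYPE|taskId|payload`), the message types, the size limit and the class name `AgentMessage` are all assumptions made for illustration; nothing in the question fixes them, and a JSON-based protocol would follow the same validate-or-drop pattern.

```java
import java.util.Optional;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical wire format (an assumption, not from the original posts):
// one text line per message, "TYPE|taskId|payload".
final class AgentMessage {
    private static final Logger LOG = Logger.getLogger(AgentMessage.class.getName());
    private static final Set<String> KNOWN_TYPES = Set.of("PROGRESS", "LOG", "STATUS");
    private static final int MAX_LINE_LENGTH = 1024;

    final String type;
    final int taskId;
    final String payload;

    private AgentMessage(String type, int taskId, String payload) {
        this.type = type;
        this.taskId = taskId;
        this.payload = payload;
    }

    /** Returns the parsed message, or empty if the line breaks the protocol in any way. */
    static Optional<AgentMessage> parse(String line) {
        if (line == null) return reject("<null>", "missing line");
        if (line.length() > MAX_LINE_LENGTH) return reject(line, "line too long");
        // Limit 3 keeps any further '|' characters inside the payload field.
        String[] parts = line.split("\\|", 3);
        if (parts.length != 3) return reject(line, "expected 3 fields");
        if (!KNOWN_TYPES.contains(parts[0])) return reject(line, "unknown type");
        int taskId;
        try {
            taskId = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return reject(line, "task id is not an integer");
        }
        if (taskId < 0) return reject(line, "negative task id");
        return Optional.of(new AgentMessage(parts[0], taskId, parts[2]));
    }

    // Log what was dropped (truncated) so bugs and intrusion attempts leave a trace.
    private static Optional<AgentMessage> reject(String line, String reason) {
        String shown = line.length() > 80 ? line.substring(0, 80) + "..." : line;
        LOG.warning("Dropping malformed message (" + reason + "): " + shown);
        return Optional.empty();
    }
}
```

A caller would read one line from the socket, call `AgentMessage.parse(line)`, and only act on the message when the returned `Optional` is present.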
Another solution that you might consider: if your systems all share a database already, you could send commands and responses through the DB (assuming the commands/responses are not happening too often). We don't do exactly this, but we share a variable table in which we place name/value pairs indicating different aspects of our system's performance and configuration (it's 2-way). This is probably not optimal, but it has been amazingly flexible, since it allows us to reconfigure our system at runtime (the values are cached locally in each service and re-read/updated every 30 seconds).
I might be able to give you more info if I knew more specifics about what you expect to do. For instance, how often will your browser update its fields, what kind of command signals or data requests will be sent, and what kind of data do you expect back? Although you certainly don't have to post that stuff here, you must consider it. I suggest mocking up your browser page to start.
edits based on comments:
Sounds good, just a couple comments:
2) Any good database should be able to handle that volume of data for logging but you may want to use a good cache on top of your DB.
5) You will probably want a web framework to render the graph and manage updates. There are a lot and most can do what you are saying pretty easily, but trying to do it all yourself without a framework of some sort might be tough. I only say this because you didn't mention it.
6) Be sure you can handle dropped connections and reconnecting. When you are testing, pull the plug on your server (at least the network cable) and leave it out for 10 minutes, then make sure when you plug it back in you get the results you expect (Should the client automatically reconnect? Should it hold onto the logs or throw them away? How long will it hold onto logs?)
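One common way to answer the "should the client automatically reconnect?" question is a capped exponential backoff between attempts. The sketch below shows only the delay calculation; the class name `ReconnectPolicy` and the concrete delays are assumptions, and the same arithmetic would be ported to the C client, since that is the side that reconnects in this design.

```java
// Capped exponential backoff: base, 2*base, 4*base, ... up to a maximum.
// Names and values are illustrative assumptions, not part of the original design.
final class ReconnectPolicy {
    private final long baseMillis;
    private final long maxMillis;

    ReconnectPolicy(long baseMillis, long maxMillis) {
        if (baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
        }
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
    }

    /** Delay to wait before the next attempt, given how many attempts have already failed. */
    long delayMillis(int failedAttempts) {
        long delay = baseMillis;
        for (int i = 0; i < failedAttempts; i++) {
            // Stop doubling once another doubling would exceed the cap (also avoids overflow).
            if (delay > maxMillis / 2) return maxMillis;
            delay *= 2;
        }
        return delay;
    }
}
```

With `new ReconnectPolicy(1000, 60000)` the client would wait 1 s, 2 s, 4 s and so on, and never more than 60 s, so a 10-minute outage does not turn into a reconnect storm when the server comes back.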
You may want to build in a way to "reboot" your C services. Since they were started as a service, simply sending a command that tells them to terminate/exit will generally work, since the system will restart them. You may also want a little monitoring loop that restarts them under certain criteria (like they haven't gotten a command from the server for n minutes). This can come in handy when you're in California at 10am trying to work with a C service in Australia at 2am.
Also, consider that an attacker can insert himself between your client and server. If you are using an SSL socket you should be okay, but if it's a raw socket you must be VERY careful.
Correction:
You may have problems putting that many records into a MySQL database. If it is not indexed and you minimize queries against it you may be okay. You can achieve this by keeping the last 5 minutes of all your logs in memory so you don't have to index your database and by grouping inserts or having a very well tuned cache.
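Keeping the last 5 minutes of logs in memory can be as simple as a time-bounded queue in the controller. This is a rough sketch under a few assumptions: the class name `RecentLogBuffer` is made up, timestamps are supplied by the caller in milliseconds, and entries arrive in roughly increasing time order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory window of recent log lines; older entries are evicted on every add.
final class RecentLogBuffer {
    private static final class Entry {
        final long timestampMillis;
        final String line;

        Entry(long timestampMillis, String line) {
            this.timestampMillis = timestampMillis;
            this.line = line;
        }
    }

    private final long windowMillis;
    private final Deque<Entry> entries = new ArrayDeque<>();

    RecentLogBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Adds a line and drops everything older than the window, relative to this timestamp. */
    synchronized void add(long timestampMillis, String line) {
        entries.addLast(new Entry(timestampMillis, line));
        long cutoff = timestampMillis - windowMillis;
        while (!entries.isEmpty() && entries.peekFirst().timestampMillis < cutoff) {
            entries.removeFirst();
        }
    }

    /** Returns the buffered lines with a timestamp at or after fromMillis, oldest first. */
    synchronized List<String> since(long fromMillis) {
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            if (e.timestampMillis >= fromMillis) result.add(e.line);
        }
        return result;
    }
}
```

The web layer can then answer "give me the recent logs" from this buffer without querying the database, while a separate writer groups the same entries into batched inserts.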
A better approach might be to forgo the database and just use flat log files pre-filtered to what a single user might want to see, so if the user asks for the last 5 minutes of "WARN" and "DEBUG" messages from a machine, you could just read the logfile from that machine into memory, skipping all but warn/debug messages, and display those. This has its own problems but should be more scalable than an indexed database. It would also allow you to zip up older data (that a user won't want to query against any more) for a 70-90% savings in disk space.
Here are my recommendations on your current design, given that you haven't defined a specific scope for this project:
Define a protocol for communication between your C apps and your monitor app. You probably don't need the same info in the same format from all the C apps, and some metrics will matter more for some C apps than for others. I would recommend plain JSON for this, with a minimum schema that both sides must fulfill, so that the C apps can produce the data and the Java side can consume and validate it.
Use a database to store the results of monitoring your C apps. The generic option would be an RDBMS, probably an open source one like MySQL or PostgreSQL, or, if you (or your company) can get the licenses, SQL Server, Oracle or another one. This is for the case where you need to maintain a history of the results; you can clear the data periodically.
You probably want/need to have the latest monitoring results available in a sort of cache (because here performance is critical), so you may use an in-memory database like Hazelcast or Redis, or just a simple cache like EhCache or Infinispan. Storing the data in an external element is better than storing it in the plain ServletContext because these technologies are aware of multithreading and support ACID, which is not the primary use case for ServletContext but seems necessary for the monitor.
Separate the monitor that will receive the data from the C apps from the web app. If the monitor fails or takes too much time to perform some operations, the web application will still be available, without the overhead of receiving and managing the data from the C apps. On the other hand, if the web app starts to get slower (due to problems in the implementation of the app or something that should be discovered using a profiler), then you may restart it, and your monitor should continue gathering the data from the C apps and storing it in your data source.
For the threads in the monitor app, since it seems it will be based on Java, use ExecutorService rather than creating and managing the threads manually.
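As a minimal sketch of what that looks like, the example below submits one job per agent to a fixed-size pool and collects the results. The class name `MonitorWorkers`, the pool size of 4 and the placeholder `poll` method are assumptions for illustration; in the real monitor, `poll` would read from the agent's socket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MonitorWorkers {
    /** Polls every agent on a shared pool and returns the results in input order. */
    static List<String> pollAll(List<String> agentIds) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String id : agentIds) {
                Callable<String> job = () -> poll(id);
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that job has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling agents", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("polling an agent failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    // Placeholder for real work such as reading a status line from the agent's socket.
    static String poll(String agentId) {
        return agentId + ":ok";
    }
}
```

In a long-running monitor you would normally create the pool once at startup and shut it down when the application stops, rather than per call as in this self-contained sketch.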
For this part:
User monitoring C application tasks, will see progress in graph, updated every second. Additionally text lines of logs of info/error events that may happen occasionally in C applications
You may use RxJava, or another reactive programming model such as the one in Play Framework, to read the data continuously from the database (and cache, if you use one) and push updates to the view (JSP, Facelet, plain HTML or whatever you use) directly for the users of the web app. If you don't want to use this programming model, then at least use a push technology like Comet or WebSockets. If this part is not that important, then use a simple refresh timer as explained here: How to reload page every 5 second?
For this part:
C applications will be per machine, which will execute any task user sends from web browser
You could reuse the JSON protocol between the C apps and the monitor for this, with another thread in each C app that translates the requested action and executes it.
Related
I want to launch scripts (.sh, .jar, .py, .pl, ...) that run for more than a few hours with a single click in a user interface (for example a nice JSF page).
What is the best way to launch the process?
The Runtime.exec() method, threads, or something else?
Thanks.
Definitely do not use long-running threads inside a Java web app. Send a message with the task (ideally via a queue, for example RabbitMQ) to a separate application with its own thread pool, which will then independently handle the long-running tasks requested by the webapp users.
If your system does not have a messaging system installed, and you find the overhead of managing one too high, and you are already using some sql or no-sql or whatever storage, you can probably adapt this storage to also be used for communication between the webapp and your new separate long requests runner app.
I have a swing desktop application that is installed on many desktops within a LAN. I have a mysql database that all of them talk to. At precisely 5 PM everyday, there is a thread that will wake up in each of these applications and try to back up files to a remote server. I would like to prevent all the desktop applications from doing the same thing.
The way I was thinking to do this was:
After waking up at 5 PM, all the applications will try to write a row to a MySQL table. They will all write the same information. Only one will succeed and the others will get a duplicate row exception. Whichever one succeeds then goes on to run the backup program.
My questions are:
Is this the right way of doing things? Is there any better (easier) way?
I know we can do this using sockets as well, but I don't want to go down that route: it is too much coding, and I would also need to ensure that all the systems can talk to each other first (ping).
Will MySQL support such a feature? My DB is InnoDB, so I am thinking it does. Typically I will have about 20-30 users on the LAN. Will this cause a huge overhead for the DB to handle?
If you could put an intermediate class in between the applications and the database that would queue up the results and allow them to proceed in an orderly manner you'd have it knocked.
It sounds like the applications all go directly against the database. You'll have to modify the applications to avoid this issue.
I have a lot of questions about the design:
Why are they all writing "the same row"? Aren't they writing information for their own individual instance?
Why would every one of them have exactly the same primary key? If there was an auto increment or timestamp you wouldn't have this problem.
What's the isolation set to on the database connection? If it's set to SERIALIZABLE, you'll force each one to wait until the previous one is done, at the cost of performance.
Could you have them all write files to a common directory and pick them up later in an orderly way?
I'm just brainstorming now.
It seems you want to backup server data not client data.
I recommend using a 3-tier architecture with Java EE.
You could use a Timer Service then to trigger the backup.
Though usually a backup program is an independent program e.g. started by a cron job on the server. But again: you'll need a server to do this properly, not just a shared folder.
Here is what I would suggest. Instead of having all clients wake up at the same time and trying to perform the backup, stagger the time at which they wake up.
So when a client wakes up
- It will check some table in your DB (MYSQL) to see if a back up job has completed or is running currently. If the job has completed, the client will go on with its normal duties. You can decide how to handle the case when the job is running.
- If the client finds that the back up job has not been run for the day, it will start the back up job. At the same time will modify the row to indicate that the back up job has started. Once the back up has completed the client will modify the table to indicate that the back up has completed.
This approach will prevent a spurt in network activity and can also provide a rudimentary form of failover. So if one client fails, another client at a later time can attempt the backup. (this is a bit more involved though. Basically it comes down to what a client should do when it sees that a back up job is on going).
I am a bit confused. I wrote a Java stand-alone app and now I want to use GAE to deploy it on the web and on the way also to learn about GAE.
In my application, I read data from file, store it in memory, process it, and then store the results in memory or file.
I understand that now I need to store the results in GAE's data store, which is fine. So I can run my program independently on my computer, write the results to a file, and then use GAE to upload all the results to the data store, where users can query them. However, is there a way to transfer the entire process into the GAE application, so that the application reads data from a file, does the processing (using the memory on the application server and not my computer; it needs at least 4GB of RAM), and then when it's done (which might take 1-2 hours) writes everything to the GAE data store? (So it's an internal "offline" process with no users involved.)
I'm a bit confused since Google doesn't mention anything about a memory quota.
Thanks!
You will not be able to do your offline processing the way you are envisioning. There is a limit to how much memory your app can use, but that is not the main problem. All processing in app engine is done in request handlers. In other words, any action you want your app to do will be written as if it is handling a web request. Each of these handlers is limited to 30 seconds of running time. If your process tries to run longer, it will get shut down. App engine is optimized for serving web requests, not doing heavy computations.
All that being said, you may be able to break up your computational tasks into 30 second chunks and store intermediate results in the datastore or memcache. In that case you could use a cron job or task queue (both described in the app engine docs) to keep calling your processing handlers until the data crunching was done.
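The chunking idea can be sketched independently of App Engine: each invocation does a bounded slice of work, records where it stopped, and reports whether more slices are needed. In the sketch below the `Checkpoint` object stands in for state you would persist in the datastore or memcache between requests, and the work itself (summing integers) and the per-slice item limit are placeholders; a real handler would more likely stop on elapsed time.

```java
// Resumable processing in bounded slices. Checkpoint is a stand-in for state that
// would be saved to the datastore or memcache between handler invocations.
final class ChunkedJob {
    static final class Checkpoint {
        int nextIndex;    // first item that has not been processed yet
        long partialSum;  // intermediate result carried across slices
    }

    /**
     * Processes at most maxItemsPerSlice items starting at the checkpoint.
     * Returns true when all data has been processed, false if another slice is needed.
     */
    static boolean runSlice(int[] data, Checkpoint checkpoint, int maxItemsPerSlice) {
        int end = Math.min(data.length, checkpoint.nextIndex + maxItemsPerSlice);
        for (int i = checkpoint.nextIndex; i < end; i++) {
            checkpoint.partialSum += data[i];
        }
        checkpoint.nextIndex = end;
        return end == data.length;
    }
}
```

A cron job or task queue entry would keep invoking the handler, which loads the checkpoint, calls `runSlice`, saves the checkpoint, and re-enqueues itself until `runSlice` returns true.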
In summary, yes, it may be possible to do what you want, but it might not be worth the trouble. Look into other cloud solutions like Amazon's EC2 or Hadoop if you want to do computationally intensive things.
I am developing a client-server application for financial alerts, where the client can set a value as the alert for a chosen financial instrument, and when this value is reached the monitoring server will somehow alert the client (email, SMS ... not important). The server will monitor updates that come from a data generator program. The server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates coming every 1-2 seconds). I've written servers before, but never with such performance requirements, and I'm afraid that a basic approach (like before) will just not do it. So how should I design the server? What kind of data structures are best suited? What about multithreading? In general, what should I do (and what should I not do) to squeeze every drop of performance out of it?
Thanks.
I've worked on servers like this before. They were all written in C (or fairly simple C++). But they were even higher performance -- handling 20K updates per second (all updates from most major stock exchanges).
We would focus on not copying memory around. We were very careful in what STL classes we used. As far as updates, each financial instrument would be an object, and any clients that wanted to hear about that instrument would subscribe to it (ie get added to a list).
The server was multi-threaded, but not heavily so -- maybe a thread handling incoming updates, one handling outgoing client updates, one handling client subscribe/release notifications (don't remember that part -- just remember it had fewer threads than I would have expected, but not just one).
EDIT: Oh, and before I forget, the number of financial transactions happening is growing at an exponential rate. That 20K/sec server was just barely keeping up and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.
I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.
Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.
Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .Net servers.
You will have a lot of trouble getting this type of performance from a single Java or .Net server - really you need to consider C or C++. A clustered architecture is much more error prone to build and deploy and harder to guarantee uptime from.
For really high volumes you need to think in terms of using asynchronous I/O for networking (i.e. poll(), select() and asynchronous writes or their Windows equivalents), possibly with a pool of worker threads. Read up about the C10K problem for some more insight into this.
There is a very mature C++ framework called ACE (Adaptive Communications Environment) which was designed for high volume server applications in telecommunications. It may be a good foundation for your product - it has support for quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.
One thread receives instrument updates, processes each update and puts it in a BlockingQueue.
One thread takes the update from the BlockingQueue and hands it off to the process that handles that instrument, or set of instruments. This process will need to serialize the events for an instrument so the customer does not receive notices out of order.
This process (thread) will need to iterate through the list of customers registered to receive notifications and create a list of customers who should be notified based on their criteria. The process should then hand off the list to another process that will notify the customers of the change.
The notification process should iterate through the list and send each notification event to another process that handles how the customer wants to be notified (email, etc.).
One of the problems, with 100,000 customers, will be synchronizing access to the list of customers and their criteria to be monitored.
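The first two stages of this pipeline can be sketched with a `LinkedBlockingQueue` and a single consumer thread, which keeps updates in arrival order. The class name `UpdatePipeline`, the `Update` type and the sentinel used to stop the consumer are assumptions for illustration; the notification step is reduced to recording a string where the real system would match customer criteria.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class UpdatePipeline {
    static final class Update {
        final String instrument;
        final double price;

        Update(String instrument, double price) {
            this.instrument = instrument;
            this.price = price;
        }
    }

    // Sentinel object ("poison pill") that tells the dispatcher thread to exit.
    private static final Update STOP = new Update("", 0.0);

    private final BlockingQueue<Update> queue = new LinkedBlockingQueue<>();
    private final List<String> notifications = new CopyOnWriteArrayList<>();
    private final Thread dispatcher = new Thread(this::dispatchLoop, "dispatcher");

    void start() {
        dispatcher.start();
    }

    /** Called by the receiving thread for every incoming instrument update. */
    void publish(String instrument, double price) {
        queue.add(new Update(instrument, price));
    }

    /** Stops the dispatcher after it has drained everything published so far. */
    List<String> stop() {
        queue.add(STOP);
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return notifications;
    }

    // A single consumer thread means updates are handled strictly in queue order.
    private void dispatchLoop() {
        try {
            while (true) {
                Update update = queue.take();
                if (update == STOP) return;
                // Placeholder for "find interested customers and hand them to the notifier".
                notifications.add(update.instrument + "@" + update.price);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

To scale beyond one dispatcher while preserving per-instrument ordering, one option is several such queues with updates routed by instrument, so each instrument is always handled by the same thread.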
You should try to find a way to organize the alerts as a tree and be able to quickly decide what alerts can be triggered by an update.
For example, let's assume that the alert is the level of a certain indicator. Said indicator can have a range of 0 to n. I would group the clients who want to be notified of the level of said indicator in a sort of binary tree. That way you can scale it properly (you can actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
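A single-machine version of this idea can be sketched with `java.util.TreeMap`, a red-black tree keyed by alert level: finding the alerts crossed by a move from one value to another is then a range query instead of a scan over all clients. The class name `AlertIndex` and the rule used here (an alert fires when the indicator reaches or passes its level) are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Alerts for one indicator, indexed by level in a sorted tree.
final class AlertIndex {
    private final TreeMap<Double, List<String>> clientsByLevel = new TreeMap<>();

    void register(double level, String clientId) {
        clientsByLevel.computeIfAbsent(level, key -> new ArrayList<>()).add(clientId);
    }

    /**
     * Clients whose level was reached or passed when the indicator moved from previous
     * to current. Cost is O(log n) to locate the range plus the number of matches.
     */
    List<String> crossed(double previous, double current) {
        if (previous == current) return Collections.emptyList();
        NavigableMap<Double, List<String>> range = current > previous
                ? clientsByLevel.subMap(previous, false, current, true)   // rising: (previous, current]
                : clientsByLevel.subMap(current, true, previous, false);  // falling: [current, previous)
        List<String> hit = new ArrayList<>();
        for (List<String> clients : range.values()) {
            hit.addAll(clients);
        }
        return hit;
    }
}
```

With alerts at 90, 100 and 110, a move from 95 to 105 touches only the entry at 100; the other levels are never visited.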
Probably the Apache MINA network application framework, as well as Apache Camel for message routing, are good starting points. The Kilim message-passing framework also looks very promising.
We have a Java program run as root on Unix, that therefore can read for example the content of the folders /home/user1 and /home/user2. However, if the Unix user "user1" is logged in in our application, he should not be able to access "/home/user2" data.
We would like to use directly the Unix rights and not recreate all the permissions in our application !
So, could we...
- try to change the UID of our program depending on the user logged in? Sounds difficult, and each file access is in a different thread, so the UID would be different on each thread of our program...
- use JNI to read the permissions of "/home/user2"... and then determine if user1 has sufficient permissions on "/home/user2"? (how?)
Use SecurityManager!
Put current unix user id into ThreadLocal
Create your own SecurityManager that checks unix user permissions on checkRead() and checkWrite()
System.setSecurityManager(new MySecurityManager())
Enjoy
Update
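There is, of course, no portable way to read Unix file permissions in older JDKs (it's not WORA); Java 7 and later can read POSIX owner, group and permission bits through java.nio.file.attribute.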
But I have tried briefly to find a ready to use library, and found this one:
http://jan.newmarch.name/java/posix/ It uses JNI, but you don't need to write your own JNI code, which is a big relief. :) I'm sure there must also be others.
Class Stat from there gives you all required access information:
http://jan.newmarch.name/java/posix/posix.Stat.html
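On Java 7 or later, the same information is available without JNI through `java.nio.file.attribute.PosixFileAttributes`. The sketch below is a simplified check and makes several assumptions: the class and method names are made up, the caller supplies the logged-in user's name and group names, only the mode bits of the path itself are examined (not the search permission on parent directories), and ACLs are ignored. `Files.readAttributes` with `PosixFileAttributes.class` throws `UnsupportedOperationException` on file systems without POSIX attributes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

final class PosixReadCheck {
    /**
     * Classic Unix rule: exactly one class applies, in the order owner, group, others.
     * An owner without OWNER_READ is denied even if others may read.
     */
    static boolean modeAllowsRead(Set<PosixFilePermission> perms, boolean isOwner, boolean inGroup) {
        if (isOwner) return perms.contains(PosixFilePermission.OWNER_READ);
        if (inGroup) return perms.contains(PosixFilePermission.GROUP_READ);
        return perms.contains(PosixFilePermission.OTHERS_READ);
    }

    /** Reads owner, group and mode bits of the path and applies the rule above. */
    static boolean mayRead(Path path, String userName, Set<String> userGroups) throws IOException {
        PosixFileAttributes attrs = Files.readAttributes(path, PosixFileAttributes.class);
        boolean isOwner = attrs.owner().getName().equals(userName);
        boolean inGroup = userGroups.contains(attrs.group().getName());
        return modeAllowsRead(attrs.permissions(), isOwner, inGroup);
    }
}
```

A custom check (for example inside the permission hook described above) could call `mayRead` with the user stored in the `ThreadLocal` before allowing access to a path.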
Update 2
As folks mentioned, this approach fails to check for "non-standard" Unix security features, such as ACLs or POSIX capabilities (maybe; I'm not sure whether they apply to files). But if the goal is to be totally in sync with host OS security, then we need SecurityManager even more, because it's a JVM-wide protection mechanism! Yes, we can start a child SUID process to verify the permissions (and keep it running while the user is logged in, talking to it via a pipe), but we need to do so from SecurityManager!
The simplest and most portable way would be to spawn a child process, have it exec a wrapper written in C which changes the UID, drops all the privileges (be careful, writing a wrapper to do that is tricky - it is as hard as writing a setuid wrapper), and execs another Java instance to which you talk via RMI. That Java instance would do all the filesystem manipulation on behalf of the user.
For single-threaded Linux programs, you could instead use setfsuid()/setfsgid(), but that is not an option for portable or multithreaded programs.
If you only want the app to be allowed to read user1's files, I strongly suggest that the app runs as user1.
If everything else fails, you can run a shellscript from java and parse the result.
Described for example here
For those who were wondering, it's apparently not possible to do this by calling setuid with JNI for each independent thread. setuid affects the whole process, not just the thread that invoked it.
Should you want to call setuid within a single-threaded Java program there's a good example at http://www2.sys-con.com/itsg/virtualcd/Java/archives/0510/Silverman/index.html.
Another option would be to invert the approach: instead of the code running as root most of the time and either changing the user ID or somehow checking the permissions whenever it has to use some restricted resource, run as the user most of the time and talk to a smaller daemon running as root when it needs to do something only root can do. This also has the added benefit of reducing the attack surface.
Of course, you then have to authenticate the connection from the process running as the user to the process running as root.
I am also having the exact problem as Mikael, and got to this page looking for answers.
None of the answers is 100% satisfactory for me. So I am thinking of 4 alternatives:
1) Use a Linux group that has access to all the users. Run a single Java app under that group. This Java app can communicate with the 'root' app by whatever means.
Potentially, it can be "hotel"-ed, e.g. 1 "hotel" (app with group permissions) per 100 users (or as appropriate). So if you have 10,000 users you need 100 hotels, which is quite manageable.
2) Spawn a JVM for each child app under its own user ID. This is like calling a script, but rather than using stdin/stdout/stderr, use any communication protocol. In my case, I'm using XMPP and IO Data (which, since it's already in use by other components, means it doesn't matter "where", i.e. in which JVM, it runs).
3) Create a Super-Server 'root' app. This can be part of the original 'root' app or a separate service dedicated to service management.
The Super-Server is responsible for handling incoming requests (i.e. it practically becomes a reverse proxy) for the user-specific sub-apps, launching the real child apps (if they're not running already), and passing messages back and forth between the client and the child app(s).
Additionally, the child apps can be pooled (or even "passivated", if there's such a thing), much in the way a Java EE EJB container does it. So even if there are 10,000 users and (potentially) 10,000 child apps servicing them, the maximum number of child apps running is capped. Idle apps are shut down to make room for others.
4) Same as #3, but rather than creating a proprietary service management mechanism, integrate with Upstart (or the service management framework of the underlying OS), i.e. there is a 'root' service that can control Upstart. Upstart can start, stop, restart and query the status of the child services, just like it can control mysqld, Apache, etc.
For me, now, the quickest and simplest to implement would be #1. However, my ideal solution would be #4, but it will take time and testing whether it works well. (the concept itself borrows from inetd/xinetd and EJB, so I think it's pretty sound fundamentally)