Producer Consumer in PHP and Java

I have a system wherein I get requests via HTTP calls to my PHP code (the producer). This code adds the request parameters to a table in MySQL (the queue). The rows are then taken and processed by a Java program (the consumer). In my first implementation both producer and consumer were in PHP (with the MySQL queue). As load increased this proved inefficient, so I rewrote the consumer in Java. Now polling the MySQL queue table from my Java app is getting inefficient too (very high CPU usage for the MySQL process). Is there a better way to implement this queue (sharing memory between the PHP code and the Java app, or something similar)?

Yes, you've got many options. The first is obviously to convert this into a client-server service and pass either text or binary messages between them. You might want to look into web services if you're a masochist, a simpler REST service, or CORBA / COM+ and others for binary serialization. And then there are the various message queues, like MQSeries, RabbitMQ, etc. Sometimes the middleman is fast enough and efficient enough, and sometimes a direct call will suffice.
The next option is a more direct link, if your platforms are within the same server or cluster: something like JavaBridge and others (do a search for "java php bridge" and several will crop up). There's even a PHP interpreter written in Java for the JVM, which gives you full compatibility between the two and might do the trick for you.
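For illustration, here is a minimal sketch of the message-queue option using the RabbitMQ Java client; the host and queue name are assumptions. The broker pushes messages to the consumer as they arrive, so there is no MySQL polling at all:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;
    import java.nio.charset.StandardCharsets;

    public class RequestConsumer {
        private static final String QUEUE = "requests"; // hypothetical queue name

        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // assumes a local broker
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();
            channel.queueDeclare(QUEUE, true, false, false, null);

            // The broker delivers each message as it arrives; no polling loop.
            DeliverCallback onMessage = (consumerTag, delivery) -> {
                String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
                process(body); // your existing consumer logic goes here
            };
            channel.basicConsume(QUEUE, true, onMessage, consumerTag -> { });
        }

        private static void process(String body) {
            System.out.println("processing: " + body);
        }
    }

On the PHP side, the producer would publish the request parameters to the same queue instead of inserting a row into MySQL.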

Related

Is instantiating a new JVM for every server request a large load?

I'm using Apache PHP for my web application. I'd like to use the exec function in PHP to call a Java class (I'm not going to be using a Java-to-PHP bridge), but this now requires not only dispatching the request thread in Apache but also starting a new JVM for each Java execution. Is this going to put an extremely large load on server resources if we have a significant number of users?
The only reason I'm not doing a Java-to-PHP bridge is that it seems a bit difficult and time-consuming to get up and running.
You need only one Java process to make this work.
Write a Java socket server. Then you can write a PHP client with PHP sockets to send commands to your Java server, which does the work.
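A minimal sketch of such a socket server; the port and the one-command-per-line protocol are assumptions, not a fixed API:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class CommandServer {
        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(9090)) { // hypothetical port
                while (true) {
                    Socket client = server.accept();
                    new Thread(() -> handle(client)).start(); // one thread per PHP request
                }
            }
        }

        private static void handle(Socket client) {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String command = in.readLine();  // one command per line from the PHP client
                out.println("done: " + command); // reply so the PHP client's read returns
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

The PHP side would connect with fsockopen and write one line per command; the JVM stays up between requests, so you pay the startup cost only once.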
Yes, starting a new process (Java or otherwise) for each request is going to perform very poorly.
Google App Engine does this, and it can handle on the order of 2 to 10 requests per second per server. If you use a Java server which is running all the time, you should expect around 100-1,000 requests per second. If you use a persistent connection and efficient messages, you can handle over 100K messages per second.
In short, it could work, but I would have trouble accepting such an inefficient solution. ;)

in HTTP mode, does node.js have substantial performance advantage over Java?

I just started coding in node.js a little while ago. Here is one of my questions about it:
In HTTP apps, given the request-response model, the single app thread is blocked until all the back-end tasks are done and the response is returned to the client, so the performance improvement seems to be limited to fine-tuning back-end things like parallelizing IO requests. (Well, this improvement matters when many heavy and independent IO operations are involved, but usually that condition also implies that by redesigning the data structures you could eliminate a large number of the IO requests and possibly end up with even better performance than just issuing parallelized operations.)
If that is true, how could node.js produce performance superior to frameworks based on Java (or PHP, Python, etc.)?
I also referred to the article Understanding the node.js event loop, which explains the situation:
It really is a single thread running: you can’t do any parallel code
execution; doing a “sleep” for example will block the server for one
second:
var now = new Date().getTime();
while(new Date().getTime() < now + 1000) {
// do nothing
}
…however, everything runs in parallel except your code.
I personally verified this by putting exactly that "sleep" code into one IO callback closure, then submitting a request that leads to this callback, followed by another one. Both requests trigger a console log when they are processed, and my observation is that the latter was blocked until the former had returned a response.
So does this imply that only in socket mode, where both sides can emit events and push messages to each other at any time, can the full power of its asynchronous processing capability be utilized?
I'm a little confused about that. Any comment or advice is welcome. Thanks!
Update
I ask this question because some performance evaluation cases have been reported, for instance Node.js is taking over the Enterprise – whether you like it or not, and LinkedIn Moved from Rails to Node: 27 Servers Cut and Up to 20x Faster. Some radical opinions claim that J2EE will be totally replaced: J2EE is Dead: Long-live Javascript Backed by JSON Services.
Node.js uses libuv, so IO operations are non-blocking. Yes, your Node app uses one thread; however, all IO requests are pushed onto an event queue. When a request is made, its response obviously will not be readable from the socket, file, etc. at time zero, so whatever is ready in the queue is popped and handled. In the meantime, further requests can be answered; there might be chunks or full data ready to be read, but they simply wait in the queue until they are processed. This goes on until no events remain and the open sockets are closed; only then can Node.js finally end its execution.
As you see, Node.js is not like other frameworks; it is pretty different. If you have a long-running, non-IO (and therefore blocking) operation, such as matrix operations or image and video processing, you can spawn other processes, assign them the job, and use message passing between them (TCP, IPC, whatever you like).
The main point of Node.js is to remove unnecessary context switches, which bring significant overhead when used improperly. In Node.js, why would you want context switches? All the jobs are pushed onto the event queue, and they are probably small in computation, since all they do is issue multiple IOs (read from the db, update the db, write to the client, write to a bare TCP socket, read from the cache); it is not logical to stop them in the middle and switch to another job. So, with libuv's help, whichever IO is ready can be executed right away.
For reference, please look at the libuv documentation: http://nikhilm.github.io/uvbook/basics.html#event-loops
I have also noticed a lot of radical opinions regarding Node.js performance compared to Java. From a queuing theory perspective, I was skeptical that a single thread with no blocking could outperform multiple threads that block, so I decided to conduct my own investigation into how well Node.js performs against a more established and mature technology.
I evaluated Node.js by writing a functionally identical multiple-datasource micro-service in both Node.js and DropWizard / Java, then subjected both implementations to the same load test. I collected performance measurements of the results from both tests and analyzed the data.
At one fifth the code size, Node.js had comparable latency and 16% lower throughput than DropWizard.
I can see how Node.js has caught on with early stage start-up companies. It is easier to write micro-services very quickly in Node.js and get them running than it is with Java. As companies mature, their focus tends to shift from finding product / market fit to improving economies of scale. This might explain why more established companies prefer Java with its higher scalability.
As far as my (admittedly brief) experience with node.js goes, I agree that the performance of a node.js server cannot be compared with full-fledged web servers like Tomcat, as stated somewhere in the node.js docs:
It really is a single thread running: you can’t do any parallel code
execution; doing a “sleep” for example will block the server for one
second:
So we used it not as an alternative to a full-fledged web server like Tomcat, but just to take some load off Tomcat where the single-threaded model is acceptable. There has to be a trade-off somewhere.
Also see http://www.sitepoint.com/node-js-is-the-new-black/ – a nice article about node.js.

Hard time choosing ... IO vs NIO

I would like to ask what would be more appropriate to choose when developing a server similar to SmartFoxServer. I intend to develop a similar yet different server. In the benchmarks made by the developers of the above server, they handled something like 10,000 concurrent clients.
I did a bit of research regarding the cost of using too many threads (>500) but cannot decide which way to go. I once wrote a server in Java, but that was for a small application and had nothing to do with heavy loads.
Thanks
Take a look at Apache MINA. They've done a lot of the heavy lifting required to use NIO effectively in a networking application. Whether or not NIO increases your ability to process concurrent connections really depends on your implementation, but the performance boosts in Tomcat, JBoss and Jetty are already plenty of evidence in the positive.
I'm not familiar with SmartFoxServer, so I can only speak generically (which is not always good :P but here I go).
I think those are two different questions. On one hand, there is the IO performance of native Java sockets vs. native sockets written in C (like Tomcat's).
The other question is how to scale up to that kind of concurrency level. Other than that, I'd always choose native sockets (i.e., C).
Now, how to scale: it's not a good idea to have a lot of threads running at the same time (OS constraints, etc.), so I'd choose to scale horizontally, meaning add a load balancer that can send requests to different servers linked by messaging (using JMS, like RabbitMQ or ActiveMQ, or even a protocol like STOMP or AMQP).
Another solution is a cloud environment that allows you to grow your installation as you need.
In most benchmarks which test 10K or 100K connections, the server is doing no work; unless your server does next to nothing, these tests are unrealistic.
You need to have a clear idea of how many concurrent connections you want to support.
If you have fewer than 1K connections, using a thread per connection will work OK. This is the simplest approach to take. Using a dispatcher model with NIO will work better if your requests are very simple; otherwise it won't matter much.
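For illustration, a minimal single-threaded NIO dispatcher (here it just echoes bytes back; the port is an assumption):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioEchoServer {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9090)); // hypothetical port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (true) {
                selector.select(); // one thread waits on all connections at once
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) { client.close(); continue; }
                        buffer.flip();
                        client.write(buffer); // echo the request back
                    }
                }
            }
        }
    }

The single dispatch thread never blocks on any one connection, which is what lets this model hold many more sockets than threads.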
If you have more than 1K connections, it is likely you will want to use more than one server, as each connection is getting less than 1% of a core and the cost of a basic server is relatively cheap these days.

Advice for writing Client-Server based game

I'm thinking about writing a game which is based around a server, and several client programs connect to it. The game (very) basically consists of a list of items which a user can 'accept', which would remove it from the list on all connected computers (this needs to update very quickly).
I'm thinking about using a Java applet for the client, since I would like this to be portable, run from a browser (mostly on Windows), and update quickly; and either a C++ or Java server running on Linux (currently just a home server, but possibly moving to a VPS).
A previous 'incarnation' of this game ran in a browser and used PHP+MySQL for the backend, but this swamped the server quite a bit when several people connected (that was with about 8 people; this version would eventually need to handle a lot more).
The users would probably all be in the same physical location (with the same public IP address), and the system would get several requests per second, all of which would require sending the list back to the clients.
Some computers may have firewall restrictions on them, so would you recommend using HTTP traffic, a custom port, or perhaps through SSH or some existing protocol?
Could anyone suggest some tips (threading, multiple requests for one item?), tools, databases (MySQL?), or APIs to help me get started on this project? I would prefer C++ for the backend as it would be faster, but using Java would allow me to reuse code.
Thanks!
I wouldn't choose C++ because of speed alone. It is highly unlikely that the difference in performance will make a real difference to your game. (Your network is likely to cloud any performance difference, unless you have 10 GigE between the client and server.) I would choose between C++ and Java based on which one you will get working first.
For anyone looking for a good networking API for C++, I always suggest Boost.Asio. It has the advantage of being platform independent, so you can compile a server for Linux, Windows, etc. However, if you are not too familiar with C++ templates/Boost, the code can be a little overwhelming. Have a look, give it a try.
In terms of general advice: given the description above, you seem to need a relatively simple server. I would suggest keeping it very basic: a single-threaded polling loop. Read a message from your connected clients (wait on multiple sockets), and respond appropriately. This eliminates any issues around multiple accesses to your list and other synchronization problems.
I might also suggest, before you rewrite your initial incarnation, trying to improve it, since as you have stated:
and the system would get several requests per second, all of which would require sending the list back to the clients.
Given that each request removes an item from this list, why not just inform your users which item was removed, rather than sending the entire list over the network time and time again? If the list is of any significant size, this minor change will yield a large improvement.
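A sketch of the idea, assuming one writer per connected client (the names and the line protocol are hypothetical):

    import java.io.PrintWriter;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class ItemBroadcaster {
        private final List<PrintWriter> clients = new CopyOnWriteArrayList<>();

        void addClient(PrintWriter out) {
            clients.add(out);
        }

        // When a user accepts an item, send only the delta, not the whole list.
        void itemAccepted(int itemId) {
            String delta = "REMOVE " + itemId; // a few bytes instead of the full list
            for (PrintWriter client : clients) {
                client.println(delta);
            }
        }
    }

Each client applies the REMOVE message to its local copy of the list; only newly connected clients need the full list, and only once.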

Critically efficient server

I am developing a client-server application for financial alerts, where the client can set a value as the alert for a chosen financial instrument, and when this value is reached the monitoring server will somehow alert the client (email, SMS... not important). The server will monitor updates that come from a data generator program. Now, the server has to be very efficient, as it has to handle many clients (possibly over 50,000-100,000 alerts, with updates coming every 1-2 seconds). I've written servers before, but never with such imposed performance requirements, and I'm simply afraid that a basic approach (like before) just won't do it. So how should I design the server? What kind of data structures are best suited? What about multithreading? In general, what should I do (and what should I not do) to squeeze every drop of performance out of it?
Thanks.
I've worked on servers like this before. They were all written in C (or fairly simple C++). But they were even higher performance -- handling 20K updates per second (all updates from most major stock exchanges).
We focused on not copying memory around, and we were very careful about which STL classes we used. As for updates, each financial instrument was an object, and any clients that wanted to hear about that instrument would subscribe to it (i.e. get added to a list).
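The same subscription structure expressed in Java might look like this (a minimal sketch; ClientSession is a hypothetical type):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class InstrumentBook {
        // instrument symbol -> clients subscribed to its updates
        private final Map<String, List<ClientSession>> subscribers = new ConcurrentHashMap<>();

        void subscribe(String symbol, ClientSession client) {
            subscribers.computeIfAbsent(symbol, s -> new CopyOnWriteArrayList<>()).add(client);
        }

        // On an update, touch only the clients who asked for this instrument.
        void onUpdate(String symbol, double price) {
            List<ClientSession> list = subscribers.get(symbol);
            if (list == null) return;
            for (ClientSession c : list) {
                c.send(symbol + " " + price);
            }
        }

        interface ClientSession { void send(String message); }
    }

The point is that an update costs work proportional to the number of interested clients, not to the total client count.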
The server was multi-threaded, but not heavily so: maybe one thread handling incoming updates, one handling outgoing client updates, and one handling client subscribe/release notifications (I don't remember that part exactly; I just remember it had fewer threads than I would have expected, but more than one).
EDIT: Oh, and before I forget, the number of financial transactions happening is growing at an exponential rate. That 20K/sec server was just barely keeping up and the architects were getting stressed about what to do next year. I hear all major financial firms are facing similar problems.
You might want to look into using a proven message queue system, as it sounds like this is basically what you are doing in your application.
Projects like Apache's ActiveMQ or RabbitMQ are already widely used and highly tuned, and should be able to support the type of load you are talking about out of the box.
I would think that squeezing every drop of performance out of it is not what you want to do, as you really never want that server to be under load significant enough to take it out of a real-time response scenario.
Instead, I would use a separate machine to handle messaging clients, and let that main, critical server focus directly on processing input data in "real time" to watch for alert criteria.
Best advice is to design your server so that it scales horizontally.
This means distributing your input events to one or more servers (on the same or different machines), that individually decide whether they need to handle a particular message.
Will you be supporting 50,000 clients on day 1? Then that should be your focus: how easily can you define a single client's needs, and how many clients can you support on a single server?
Second-best advice is not to artificially constrain yourself. If you say "we can't afford to have more than one machine," then you've already set yourself up for failure.
Beware of any architecture that needs clustered application servers to get a reasonable degree of performance. The London Stock Exchange had just such a problem recently when they pulled an existing Tandem-based system and replaced it with clustered .Net servers.
You will have a lot of trouble getting this type of performance from a single Java or .Net server; really, you need to consider C or C++. A clustered architecture is much more error-prone to build and deploy, and it is harder to guarantee uptime with one.
For really high volumes you need to think in terms of using asynchronous I/O for networking (i.e. poll(), select() and asynchronous writes or their Windows equivalents), possibly with a pool of worker threads. Read up about the C10K problem for some more insight into this.
There is a very mature C++ framework called ACE (Adaptive Communications Environment) which was designed for high volume server applications in telecommunications. It may be a good foundation for your product - it has support for quite a variety of concurrency models and deals with most of the nuts and bolts of synchronisation within the framework. You might find that the time spent learning how to drive this framework pays you back in less development and easier implementation and testing.
One thread receives instrument updates; it processes each update and puts it in a BlockingQueue (see the sketch after these steps).
One thread takes each update from the BlockingQueue and hands it off to the process that handles that instrument, or set of instruments. This process will need to serialize the events per instrument so that a customer will not receive notices out of order.
This process (thread) will need to iterate through the list of customers registered for notification and build the list of customers who should be notified based on their criteria. It should then hand that list off to another process that will notify each customer of the change.
The notification process should iterate through the list and send each notification event to another process that handles how the customer wants to be notified (email, etc.).
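Here is that sketch: a minimal version of the first two stages (the Update type and queue capacity are assumptions):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class AlertPipeline {
        // Hypothetical update type: an instrument id plus its new value.
        static class Update {
            final String instrument;
            final double value;
            Update(String instrument, double value) { this.instrument = instrument; this.value = value; }
        }

        private final BlockingQueue<Update> queue = new ArrayBlockingQueue<>(10_000);

        // Stage 1: the receiving thread enqueues each incoming update.
        void onUpdate(Update u) throws InterruptedException {
            queue.put(u); // blocks when full, applying back-pressure to the feed
        }

        // Stage 2: the dispatch thread drains the queue in arrival order.
        void dispatchLoop() throws InterruptedException {
            while (true) {
                Update u = queue.take();
                dispatch(u); // hand off to the handler for u.instrument
            }
        }

        private void dispatch(Update u) {
            // route to the per-instrument handler, preserving order per instrument
        }
    }

Because the queue preserves arrival order and each instrument is routed to a single handler, customers never see out-of-order notices for the same instrument.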
One of the problems, with 100,000 customers, will be synchronizing access to the list of customers and the criteria they want monitored.
You should try to organize the alerts as a tree, so you can quickly decide which alerts can be triggered by a given update.
For example, let's assume that the alert is on the level of a certain indicator, and that the indicator can take values in the range [0, n]. I would group the clients who want to be notified about the level of that indicator in a sort of binary tree. That way you can scale it properly (you can actually implement a subtree as a process on a different machine), and the number of comparisons required to find the proper subset of clients will always be logarithmic.
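In Java terms, a TreeMap (a red-black tree) gives you this directly; a minimal sketch, assuming an alert fires when the indicator reaches or exceeds a client's threshold:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class ThresholdIndex {
        // threshold level -> ids of clients alerting at that level
        private final NavigableMap<Double, List<String>> byThreshold = new TreeMap<>();

        void register(double threshold, String clientId) {
            byThreshold.computeIfAbsent(threshold, t -> new ArrayList<>()).add(clientId);
        }

        // Locating the matching range is O(log n); the walk is linear only
        // in the number of clients that actually trigger.
        List<String> triggeredBy(double level) {
            List<String> hits = new ArrayList<>();
            for (List<String> clients : byThreshold.headMap(level, true).values()) {
                hits.addAll(clients);
            }
            return hits;
        }
    }

An update that triggers nothing costs only the O(log n) lookup, which is what keeps 100,000 registered alerts affordable at one update every 1-2 seconds per instrument.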
Probably the Apache MINA network application framework, as well as Apache Camel for message routing, are good starting points. The Kilim message-passing framework also looks very promising.
