PHP Java combination for multithreaded processing - good or bad?

PHP Java combination for multithreaded processing - good or bad? - java

i need to make multiple calls to different web services using PHP and i was wondering if the php-java combination would be more appropriate in dealing with this issue.
The multiple calls to the services if called sequentially will create a significant amount of delay, so i am looking for ways to overcome that.
I have read articles that 'simulate' concurrent processing in php and deal with this particular issue but i was wondering if the introduction of let's say a java socket server that accepts requests and creates worker threads would be more efficient (faster).
Any comments appreciated.
regards,

Interestingly I've been thinking about this issue as well. You have a number of options:
Use PHP calls to fork new processes;
Use a worker framework like beanstalkd to create work requests and have something pick them up;
Use something else like memcache to create work requests.
(2) is the interesting one (to me). You could run CLI PHP scripts to do the processing of beanstalk requests. Or you could use Java. Which depends on a large number of factors. I'd generally favour a single language environment over multi-language where possible and practical. But I can also envision instances where a Java backend would be a good idea.

That's exactly the reason why we switched from php to java - because of multithreading. We had an app that reads rss feeds through http. Switching from single threaded php app to several threads in java gave about 10x boost. I can't say anything about php threading simulation though.

Related

Huge data processing/ HPC in Java - suggest me how to begin

I am thinking to work on a programming problem for which, I suppose, I will need to know a lot of advanced programming concepts. For some reasons I have decided to code it in Java - even though I am not proficient in it.
So I want you to help me with suggestions, guidance, pointers to resources, books, tutorials or any generic advises that you think is pertinent.
Here is the basic nature of my problem:
I need to create a client-server architecture. Server supports multiple concurrent clients. Clients send it simple instructions (may be server exposes some kind of API/ runs listener on specific port), server executes the instructions and send result back to client.
The main job of the server is to do huge volume of data processing based on the instructions given to it. It takes data from backend database/ file systems. Data volume can easily surge up to ~ 200GB - 700GB. Data will be usually streamed to it, but it may require to hold huge volume of data in memory cache during processing (and if RAM is not enough, then page it to disk). Computations are generally numerically intensive in nature (let's say taking the inverse of a matrix)
The server should be able to do multithreading (I don't know what this term mean in Java, what I wish is, the server should be able to distribute the job in multiple parallel sub-processes.)
The server itself should be very lightweight. I Do NOT need any GUI Interface.
It will be great if I design it in a way so that I can integrate it later with HPC frameworks like Hadoop.
Now if I got to do this, what kind of programming do I need to learn? By the way, I have good understanding on OOP, I am somewhat familiar with Data Structures and algorithms, I know basic Java (never done any network or multithreaded programming in Java before, but have used typical oop concepts, generics, comparable interfaces etc.). I basically work in database programming, but have also done lot of C, C++, C#, Python in the past.
Given the requirement and my background, please suggest,
How should I begin to work on this project? What is the way to architect the project?
Should I create some basic API definitions first and then start working on the details?
Should I follow any particular design pattern? Where to learn them from?
What are the things I need to learn in Java and where to learn them from?
What is the best way to read huge data in memory? Is Java nio good solution?
If I instantiate a class with huge amount of data, would it work? (example, let's say I have a Vector class to represent a matrix with millions of elements and the constructor of the class reads huge data set in the memory). What's the best way to handle that?

You will want to define how the client and server will talk to eachother. The easiest way is to use established protocols such as HTTP by creating REST services that the client can call without much coding.
Most frameworks that support HTTP create several listeners that run in different threads. This gives you multi threading out of the box.
I'd suggest looking into I prefer Spring Controllers. Spring is fairly light weight.
If you want to use these frameworks, you will want to quickly find, and incorporate them into your application for compilation and packaging.
I would suggest looking into Maven for this. It's a big time saver. In particular using archetypes to create your project's folder structure, and auto download dependencies, and their dependencies.
Finally my words of wisdom. Ensure your services are singleton stateless services. This means you only create the objects once, and each thread uses the same objects. There is lots less garbage collection happening. This makes a huge difference when processing large amounts of requests.
Be careful not to use class level variables to hold state, in these services. If you do, different threads will over write each others data.

First thing I would like to say that as per your explanation of the things you seem to be in a pretty good shape to use java as your server side language.
The kind of client server architecture you choose may depend on what kind of clients actually you are serving to. Would they be typical GUI or CUI based desktop clients or the web clients.
In the latter case you could use Spring Framework in a normal fashion and for the former one you could go further to explore Spring's support for Restful Web services. I would advise not to go with socket or TCP based networking solutions or use java networking.
Spring's RESTful API gives you a very cool abstraction over things like networking and multi threading even for a desktop based client. In case of a desktop client you can use JSON/XML as response and can use HttpClient library for making calls to server, which is a very cool abstraction of the underlying networking stuff.
Further up Spring's design patterns follow a very linear flow of data. A lot of your fundamental design considerations are catered by the Spring itself using Dependency Injection and Inversion of Control which are extremely simple to incorporate.
For a detailed analysis of design patterns related to specific requirements I would suggest you to read the book called Java Design Patterns: A Tutorial of Addison Wesley publications and the author is James W. Cooper.
One more thing about the API design. It would be preferable for you to first create a API specification and then go further to implement them.

Advice around Web Service consuming another Web Service

I'm designing an application and trying to do some research on how it should work and any tips etc that I could use.
I need to develop a middleware Web service running on Tomcat 6.
Client program consumes my webservice.
My Webservice in turns needs to run a number of searches, 10, based on information from the client. These searches are using an 3rd Party web service. 3rd Party supply Java Stub Classes.
Can/Should I write my web service to be multi-threaded so each thread is created and used for a search and results collated and returned to client?
Searches can take a while to complete approx 200-500ms
All advice is gratefully received,

I'm a bit uncertain as to what your needs are exactly. Are the searches able to be run in parallel? If that is the case it may not be a bad idea to use multi-threading for executing them.
We have something similar in an application I'm working on - a long-running search is run in a separate thread, so that other processing can continue, and then when it is finished the results are sent back to the client.
There is no problem with this, we run on Tomcat 6 and it works fine. Obviously the usual caveats with multi-threading apply, we're using the Java 6 java.util.concurrent library which is really useful.

There do seem to be potential benefits here in having several back-end queries running in parallel, so some kind of multi-threading seems like a good idea.
There are a few issues that occur to me:
The direct spawning of threads in a Java EE container is not generally recomended - Java EE containers like to be in control of that - so there are specifically supported APIs for doing that in the Java EE world (see this answer for more on this topic.) I don't know whether Tomcat supports such APIs these days, if not then perhaps something like this may work.
You need a good strategy to deal will broken and slow responses. Suppose you have 7 out of 8 responses and the 8th seems to take a very long time is it better to give a rapid partial responses. Best to think about this up front.
Which leads to is it better to have some kind of "noticebaord" approach, send a request, come back later to collect interim results, come back yet later to collect more complete results.
Some back end systems might react badly to excessive requests from the same source. You may need to throttle request freuqencies both to be "sociable" and also to avoid any black-listing policies.

How can I communicate between PHP and a Java program?

I'm working on a web application that frequently requires a calculation intense query to be run, the results of which are stored in a separate table. Using MySQL, this query takes about 500ms (as optimized as possible, believe me). To eliminate this bottleneck, I've created a Java program that loads the relevant DB data into memory and performs the query itself; it takes about 8ms (something I'm a little bit proud of). I'd like to use this Java program to get the results, and if it fails or is unavailable, failover to having PHP run a MySQL query.
Since loading the data into the Java application takes some time, it's going to load it once and remain running as a background process. Now, the question is how do I communicate with this Java application via PHP?
Keep in mind:
Multiple instances of PHP may need to communicate with this Java process simultaneously.
If the Java instance cannot be found (eg: it crashes for some reason) PHP should progress by using the older and slower MySQL method.
An intermediary process, such as Memcache, is acceptable.
Ideally, the solution would withstand race conditions.
I would preferably not like to use MySQL as the intermediary.
I was going to use Memcache, where PHP would write to a known key and poll until that key changed to "completed", meanwhile Java would poll that key and once it found something perform the job and set it to "completed". However, this wouldn't work for two reasons. First, both PHP and Java read/write to Memcache using serialized objects, and there's no way to change that, and I don't want Java to unserialize PHP objects and vice/versa -- it's too messy. Second, this is not ACID compliant -- if a queue built up there would be race conditions.
For now, I'm stuck with polling MySQL "selects" to see if a job is off the queue or not, which is far from an optimal solution because the poll time will need to be slower so MySQL doesn't get pinged too frequently. I need a better solution!
Thanks.
Edit: Duh. It looks like I will be using some sort of SocketServer in Java, which I'm unfamiliar with. An example might help :)

I'm using socket server on the Java end, and PHP sockets on the PHP end. Works great.
There's no need to overcomplicate things with PHP/Java bridge, and no need for overhead of creating a web server.
Sockets work great, and I'm actually a bit ashamed I even asked the question to begin work.

My suggestion is to use WebServices... Write and run webservice in Java, and then request it in php by using f.e. NuSOAP. This solution have one more advantage - your webservice can be used easily in other applications like f.e. .NET ones...
Another option which might be easier if you have small number of methods is to build Servlet in Java which will take the parameters as GET request.
Both those solutions are strictly web-based, and both of them are working on separate threads so they guarantee you good performance.

Fastest(performance-wise) way to share data(not objects) between .Net & Java

I know of at least one post which has same words like this. But this is not exactly same as that post. I'm trying to work a way to "share" data between a .NET and Java application. I'm not concerned about objects, but just plain strings if u like.
I have a .NET application capturing real-time data and a Java application which has capability to analyze and work on this data. I'm looking for ways to re-use this same java app without coding it entirely in .NET.
My problem is that the data is "fairly" REAL-Time (.NET), and so has to be the analysis (Java). I can live with microsecond delays but I can't afford one second delay. WebServices, Queues (as in Messaging Queues), RDBMS are some of the options I can think of. Is there any better way?
Or has anybody got some real performance numbers for the solutions I mentioned above to select one of them? And just to get started: RDBMSs' are not "THAT" good for concurrent (connections doing) insertion/updation/reading, at least with the crude way of doing DBMS stuff. (Deadlocks?)

What are "objects" if not a mechanism for describing "data"? But I digress - I suspect I would look at a TCP socket between the two. If the data is very basic, then fine - just write directly to the stream; if there is any complexity, perhaps use something like "protocol buffers" to provide an easy way of reading/writing dense data to a stream without having to write every last byte yourself.
I think microsecond delays are going to be a challenge for any approach here... will millisecond delays do?

For completeness:
Another possible is to use Named pipes, it should be pretty quick, and I'd imagine (being a java guy I can only imagine) that .NET has native support for them. The down side is that on windows you'll have to either write a JNI extension or use a library like JNA to poke around at the Win32 API from Java.

Sounds like a local socket could do. The latency should be in low ms or less.

Depending on your program you may get some milage out of what #Cowan reports in answer to 'Any Concept of shared memory in java', his answer is: Any concept of shared memory in Java
In summary: he say's that you can use memory mapped files between two processes on the same machine. This in theory could work between .NET and java assuming .NET has some memory mapped file support.

Different machines communicate with each other by sending messages into sockets. Please check the below link for example.
Socket programming in the real world

Answers provided here are great. One idea that might be of interest, but is probably asking for more trouble than it's worth is to load both VMs in a single process (both the JVM and the CLR can be loaded within a native Windows application) and give them access to native code. Java via JNI and .Net via the mapping functions to native code that they allow.
You could also leverage native queue semaphores to wake up a thread on one side or the other when data is updated.
While JNI transitions are expense, they would probably still be faster than the native local socket implementation.

How is your Java application currently deployed? It sounds to me like you're willing to make some modification to it, so I'm assuming you have access to the source code.
I know this is a little out there, but could you compile the Java application in the J# compiler, so that your .NET app has native access to it?

You can convert your compiled java application to .NET by IKVM. After that you can change logic of your .NET application so it will not make data transfers to Java application, but just call data processing code written in Java as it were written and compiled for .NET.

There are a number of JMS servers which support .NET and Java clients. These can perform messages in under a millisecond.
However you might like to try an RPC solution like Hessian RPC or Protobuf RPC. These can achieve lower latencies and can give the appearance of direct calls between platforms. These support .NET and Java as well.

Connect PHP code to Java backend

I am implementing a website using PHP for the front end and a Java service as the back end. The two parts are as follows:
PHP front end listens to http requests and interacts with the database.
The Java back end run continuously and responds to calls from the front end.
More specifically, the back end is a daemon that connects and maintain the link to several IM services (AOL, MSN, Yahoo, Jabber...).
Both of the layers will be deployed on the same system (a CentOS box, I suppose) and introducing a middle layer (for instance: using XML-RPC) will reduce the performance (the resource is also rather limited).
Question: Is there a way to link the two layers directly? (no more web services in between)

Since this is communication between two separate running processes, a "direct" call (as in JNI) is not possible. The easiest ways to do such interprocess communcation are probably named pipes and network sockets. In both cases, you'll have to define a communication protocol and implement it on both sides. Using a standard protocol such as XML-RPC makes this easier, but is not strictly necessary.

There are generally four patterns for application integration:
via Filesystem, ie. one producers writes data to a directory monitored by the consumer
via Database, ie. two applications share a schema or table and use it to swap data
via RMI/RPC/web service/any blocking, sync call from one app to another. For PHP to Java you can pick from the various integration libraries listed above, or use some web services standards like SOAP.
via messaging/any non-blocking, async operation where one app sends a message to another app.
Each of these patterns has pros and cons, but a good rule of thumb is to pick the one with the loosest coupling that you can get away with. For example, if you selected #4 your Java app could crash without also taking down your PHP app.
I'd suggest before looking at specific libraries or technologies listed in the answers here that you pick the right pattern for you, then investigate your specific options.

I have tried PHP-Java bridge(php-java-bridge.sourceforge.net/pjb/) and it works quite well. Basically, we need to run a jar file (JavaBridge.jar) which listens on port(there are several options available like Local socket, 8080 port and so on). Your java class files must be availabe to the JavaBridge in the classpath. You need to include a file Java.inc in your php and you can access the Java classes.

Sure, there are lots of ways, but you said about the limited resource...
IMHO define your own lightweight RPC-like protocol and use sockets on TCP/IP to communicate. Actually in this case there's no need to use full advantages of RPC etc... You need only to define API for this particular case and implement it on both sides. In this case you can serialize your packets to quite small. You can even assign a kind of GUIDs to your remote methods and use them to save the traffic and speed-up your intercommunication.
The advantage of sockets usage is that your solution will be pretty scalable.

You could try the PHP/Java integration.
Also, if the communication is one-way (something like "sendmail for IM"), you could write out the PHP requests to a file and monitor that in your Java app.

I was also faced with this problem recently. The Resin solution above is actually a complete re-write of PHP in Java along the lines of JRuby, Jython and Rhino. It is called Quercus. But I'm guessing for you as it was for me, tossing out your Apache/PHP setup isn't really an option.
And there are more problems with Quercus besides: the free version is GPL, which is tricky if you're developing commercial software (though not as tricky as Resin would like you to believe (but IANAL)) and on top of that the free version doesn't support compiling to byte code, so its basically an interpreter written in Java.
What I decided on in the end was to just exchange simple messages over HTTP. I used PHP's json_encode()/json_decode() and Java's json-lib to encode the messages in JSON (simple, text-based, good match for data model).
Another interesting and light-weight option would be to have Java generate PHP code and then use PHP include() directive to fetch that over HTTP and execute it. I haven't tried this though.
If its the actual HTTP calls you're concerned about (for performance), neither of these solutions will help there. All I can say is that I haven't had problems with the PHP and Java on the same LAN. My feeling is that it won't be a problem for the vast majority of applications as long as you keep your RPC calls fairly course-grained (which you really should do anyway).

Sorry, this is a bit of a quick answer but: i heard the Resin app server has support for integrating java and PHP.
They claim they can smash php and java together: http://www.caucho.com/resin-3.0/quercus/
I've used resin for serving J2ee applications, but not for its PHP support.
I'd be interested to hear of such adventures.

Why not use web service?
Make a Java layer and put a ws access(Axis, SpringWS, etc...) and the Php access the Java layer using one ws client.
I think it's simple and useful.

I've come across this page which introduces a means to link the two layers. However, it still requires a middle layer (TCP/IP). Moreover, other services may exploit the Java service as well because it accepts all incoming connections.
http://www.devx.com/Java/Article/20509
[Researching...]

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.