Questions about PHP and Java

I want to know how Java (JSP) on Tomcat compares to PHP on Apache in terms of performance.
Take two servers with the same hardware configuration, one running Tomcat/Java (JSP) and the other Apache/PHP, both maxed out on the number of connections they can handle at once. Would they be somewhat close, or would one pull away from the other by a large margin? I basically just want to know if Tomcat/Java (JSP) is going to be a big performance hit if I switch to it from PHP. If anyone can give a detailed answer on why one is faster than the other, that would be amazing. Links are great too; surprisingly, I was unable to find anything online.
Please no Java vs PHP wars, this is about performance only, nothing to do with the languages themselves.
Note: If there are any other concerns I should have about switching to Java from PHP, please let me know. I REALLY hate asking this question because I'm usually the first person to say "program in what you like", but in my situation I also need what's good for the projects I work on. I know that there are large sites written in JSP, but that doesn't mean they're better.
Thanks

What's good for the projects you're working on is to spend as little time as possible writing them, since developer time is way more expensive than any perceived differences in performance. So stick with what you're familiar with.
The answer to your question is: they are both fast enough.
Any such comparison is hard because you end up doing things differently in different languages. Java bytecode is probably faster to interpret, but then again any decent PHP install uses an opcode cache, largely negating any such advantage in real terms.
Java also has a more complicated development model because Web processes are persistent. This can have a performance advantage but also can create problems like memory and other resource leakage, which PHP doesn't tend to have because everything is created and destroyed on each request (barring session information, memcache and so on).
Also PHP extensions can be created for any parts that you want to speed up.
$10,000 can buy an awful lot of hardware. It can buy the hardware to run SO. It doesn't buy much developer time.
I've got experience doing both Java and PHP development. I will generally choose PHP for Web development because of:
quicker to test changes in development (ie no build/deploy steps and Java hot-deploy has serious limitations). Words cannot express how freeing it is to test changes by saving the file you're working on and clicking reload on a browser vs running an Ant/Maven build process;
far fewer issues of memory/resource leakage;
extensive library of functions to do pretty much anything you want;
cheaper to host (at the low end).
I will use Java for some things, like anything that involves a lot of background processing and threading, which aren't PHP's strong points.
You'll note that performance (or the lack thereof) doesn't even rate as a reason for or against.
Sorry if that doesn't answer your question, but such concerns over performance are a pointless distraction.

The best way to answer performance questions is with a benchmark. Implement some simple page in both PHP and Java and then benchmark them using ab (Apache Benchmark).
Having said that, I suspect Java will outperform PHP because of the nature of the two platforms. Java is compiled to bytecode (once) and then executed by a virtual machine, which interprets it and JIT-compiles the hot paths. When Tomcat runs, the JVM loads the classes required for any given page and keeps them in memory so they're ready to go when an HTTP request hits the web server. Contrast that with PHP, which reloads and re-interprets the code from scratch with each invocation by Apache. This is helped to a large degree by opcode caching, but still not to the level of what happens in the JVM.
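If ab isn't at hand, the same kind of measurement can be roughed out in a few lines. This is only a minimal sketch, not a replacement for a real load-testing tool: the names `fetch_url` and `load_test` are made up for illustration, and the defaults for request count and concurrency are arbitrary assumptions.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def fetch_url(url):
    """Fetch one page and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
        return response.status


def load_test(fetch, total_requests=200, concurrency=20):
    """Call fetch() total_requests times from a pool of worker threads."""

    def timed_call(_):
        # Time a single request from the client's point of view.
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_requests)))
    wall_time = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "requests_per_second": len(latencies) / wall_time,
        "median_latency_s": median(latencies),
        "max_latency_s": max(latencies),
    }
```

You would run it once against the JSP page and once against the PHP page, for example `load_test(lambda: fetch_url("http://localhost:8080/hello.jsp"))`, where the URL is a placeholder for your own test page. The rough ab equivalent is `ab -n 200 -c 20 <url>`.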

Related

Is there a way to benchmark your computer with Java code?

I'm a Java developer and my goal is to understand which computer is best suited for some statistical evaluation. I have 3 different desktops with different OSes (Windows 7, macOS, Ubuntu).
A JVM-based program seems best suited for this benchmark.
Is there some Maven-based package which I can add as a dependency and run on all these desktops to get an HDD/CPU/memory benchmark?
The question is about Java libraries which provide CPU/IO/memory benchmarks...

Not in any meaningful way, AFAIK. The purpose you have proposed "some statistical evaluation" is too broad for meaningful benchmarking.
In fact, the only meaningful approach would be to:
Select the statistical application that you are going to use.
Select a bunch of representative problems; i.e. problems that are typical of what you are going to be doing ... in both quality and "size".
Code the solutions using your selected application.
Run the solutions, and measure the times taken.
Tune the solutions / application and repeat the previous step until you are satisfied that you are getting the best performance out of the application.
Run the application on the candidate machines.
Compare the times, across all of your problems on all machines.
I would posit that unless you are trying to run really large analyses on an underpowered machine, it is not going to make much difference which OS you use. The critical issues are likely to be using a fast enough machine with enough memory (if the analysis requires lots of memory), picking the right application, coding the solutions correctly, and tuning the application. The choice of OS probably won't matter ... unless you push the memory envelope too hard.
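The measuring and comparing steps above don't need a special library. A rough sketch of the idea follows, written in Python purely for illustration (the same shape works in Java); `time_workload` and `compare_machines` are hypothetical helper names, and the workload you pass in would be a call into your own statistical application.

```python
import time
from statistics import median


def time_workload(workload, repeats=5, warmups=1):
    """Return the median wall-clock time of workload() in seconds."""
    for _ in range(warmups):
        workload()  # warm caches (and, on a JVM, the JIT) before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare_machines(timings):
    """timings maps machine name -> {problem name -> seconds}.

    Returns machine names sorted from fastest to slowest by total time.
    """
    totals = {name: sum(per_problem.values()) for name, per_problem in timings.items()}
    return sorted(totals, key=totals.get)
```

Run `time_workload` for each representative problem on each candidate machine, collect the numbers into one dictionary, and let `compare_machines` rank them. Using the median of several runs keeps one noisy run from deciding the outcome.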
I will disagree. If what you are saying were correct, there would be no such thing as SuperPI, 3DMark etc. The only problem with those tools is that they are OS-specific, so I can only compare two Windows laptops. Performance can easily be measured with elementary operations such as disk and memory reads/writes and arithmetic operations. That is actually the universe of possible computer operations.
Well fine.
If you think you can find a meaningful benchmark that compares application-level performance across different OSes ... go find one.
And if you think such a benchmark is going to give you numbers that are applicable to running Java statistical analysis tools, feel free to use it. (Hint: the OS-specific benchmarks like SuperPI, 3DMark, etc. are not great predictors of performance when running applications.)
And if you think that Java application performance is only about how fast disk read/write, memory read/write and basic arithmetic instructions ... feel free to continue believing that.
Unfortunately, reality is very different.
But my guess is that it doesn't make a lot of difference what OS you choose, provided that the hardware is up to it.

Resource usage of Google Go vs Python and Java on App Engine

Will Google Go use less resources than Python and Java on App Engine? Are the instance startup times for Go faster than Java's and Python's startup times?
Is the Go program uploaded as binaries or source code, and if it is uploaded as source code, is it then compiled once or at each instance startup?
In other words: Will I benefit from using Go in App Engine from a cost perspective? (only taking into account the cost of the App Engine resources, not development time)

Will Google Go use less resources than Python and Java on App Engine? Are the instance startup times for Go faster than Java's and Python's startup times?
Yes, Go instances have a lower memory footprint than Python and Java (< 10 MB).
Yes, Go instances start faster than their Java and Python equivalents because the runtime only needs to read a single executable file to start an application.
Also, even though they are single-threaded at the moment, Go instances handle incoming requests concurrently using goroutines, meaning that if one goroutine is waiting for I/O, another one can process an incoming request.
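To make the "single-threaded but still concurrent" point concrete, here is an analogy in Python using asyncio. It is not Go and says nothing about how App Engine actually schedules goroutines; it only shows the general idea that one thread can start a second request while the first is waiting on I/O. The names `handle_request`, `serve` and `run_demo` are invented for this sketch, and the sleep stands in for a datastore or network call.

```python
import asyncio


async def handle_request(request_id, events, io_delay=0.01):
    events.append(f"start {request_id}")
    # Stand-in for waiting on I/O; control returns to the event loop here.
    await asyncio.sleep(io_delay)
    events.append(f"done {request_id}")


async def serve(request_count):
    """Handle several requests on one thread and record the event order."""
    events = []
    await asyncio.gather(*(handle_request(i, events) for i in range(request_count)))
    return events


def run_demo(request_count=3):
    return asyncio.run(serve(request_count))
```

In the returned event list every request has started before any of them is done, which is the behaviour described above: waiting on I/O does not block the other requests.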
Is the Go program uploaded as binaries or source code, and if it is uploaded as source code, is it then compiled once or at each instance startup?
A Go program is uploaded as source code and compiled (once) to a binary when deploying a new version of your application using the SDK.
In other words: Will I benefit from using Go in App Engine from a cost perspective?
The Go runtime definitely has an edge when it comes to performance / price ratio; however, it doesn't affect the pricing of other API quotas, as described in Peter's answer.

The cost of instances is only part of the cost of your app. I only use the Java runtime right now, so I don't know how much more or less efficient things would be with Python or Go, but I don't imagine it will be orders of magnitude different. I do know that instances are not the only cost you need to consider. Depending on what your app does, you may find API or storage costs are more significant than any minor differences between runtimes. All of the API costs will be the same with whatever runtime you use.
Language "might" affect these costs:
On-demand Frontend Instances
Reserved Frontend Instances
Backend Instances
Language Independent Costs:
High Replication Datastore (per gig stored)
Outgoing Bandwidth (per gig)
Datastore API (per ops)
Blobstore API storage (per gig)
Email API (per email)
XMPP API (per stanza)
Channel API (per channel)
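A quick way to see how much the runtime choice can matter is to split your own bill along the two lists above. The sketch below is just arithmetic; `monthly_bill` is a made-up helper, and every price and quantity you feed it is your own assumption, not an App Engine rate.

```python
def monthly_bill(instance_hours, instance_hour_price, other_costs):
    """Split a monthly bill into the runtime-dependent and runtime-independent parts.

    other_costs maps a label (datastore, bandwidth, ...) to its monthly cost.
    """
    instance_cost = instance_hours * instance_hour_price
    fixed_cost = sum(other_costs.values())
    total = instance_cost + fixed_cost
    return {
        "instance_cost": instance_cost,
        "language_independent_cost": fixed_cost,
        "total": total,
        # Share of the bill that a faster runtime could influence at all.
        "instance_share": instance_cost / total if total else 0.0,
    }
```

With invented numbers, say 100 instance hours at 0.5 per hour plus 50 of datastore and bandwidth charges, instances are half of the bill. A runtime that cut instance hours by 20% would then lower the total by only 10%.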

The question is mostly irrelevant.
The minimum memory footprint for a Go app is less than a Python app which is less than a Java app. They all cost the same per-instance, so unless your application performs better with extra heap space, this issue is irrelevant.
Go startup time is less than Python startup time which is less than Java startup time. Unless your application has a particular reason to churn through lots of instance startup/shutdown cycles, this is irrelevant from a cost perspective. On the other hand, if you have an app that is exceptionally bursty in very short time periods, the startup time may be an advantage.
As mentioned by other answers, many costs are identical among all platforms - in particular, datastore operations. To the extent that Go vs Python vs Java will have an effect on the instance-hours bill, it is related to:
Does your app generate a lot of garbage? For many applications, the biggest computational cost is the garbage collector. Java has by far the most mature GC and basic operations like serialization are dramatically faster than with Python. Go's garbage collector seems to be an ongoing subject of development, but from cursory web searches, doesn't seem to be a matter of pride (yet).
Is your app computationally intensive? Java (JIT-compiled) and Go are probably better than Python for mathematical operations.
All three languages have their virtues and curses. For the most part, you're better off letting other issues dominate - which language do you enjoy working with most?

It's probably more about how you allocate the resources than your language choice. I read that GAE was built to be language-agnostic, so there is probably no built-in advantage for any language, but you can get an advantage from choosing the language you are comfortable with and motivated by. I use Python, and what made my deployment much more cost-effective was the upgrade to Python 2.7; you can only make that upgrade if you use the correct subset of 2.6, which is good. So if you choose a language you're comfortable with, it's likely that you will gain more from your ability with the language than from the combination of language and environment itself.
In short, I'd recommend Python, but that's the only App Engine language I've tried. It's my choice even though I know Java rather well, because the code for a project will be much more compact in my favorite language, Python.
My apps are small to medium sized and they cost next to nothing.

I haven't used Go, but I would strongly suspect it would load and execute instances much faster, and use less memory purely because it is compiled. Anecdotally from the group, I believe that Python is more responsive than Java, at least in instance startup time.
Instance load/startup times are important because when your instance is hit by more requests than it can handle, it spins up another instance. This makes that request take much longer, possibly giving the impression that the site is generally slow. Both Java and Python have to start up their virtual machine/interpreter, so I would expect Go to be an order of magnitude faster here.
There is one other issue: now that Python 2.7 is available, Go is the only option that is single-threaded (ironically, given that Go is designed as a modern multi-process language). So although Go requests should be handled faster, an instance can only handle requests serially. I'd be very surprised if this limitation lasts long, though.

Which programming language for compute-intensive trading portfolio simulation?

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).
I plan on employing Amazon web services to take on the entire load of the application.
I have four choices that I am considering as language.
Java
C++
C#
Python
Here are the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:
Weekly simulation of 10,000,000 trading systems.
(Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
Real-time production of portfolio w/ 100,000 trading strategies
Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)
Speed is a concern, but I believe that Java can handle the load.
I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.
The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.
Python - I've read some things on PyPy and Psyco that claim Python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides the fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.
To sum up:
real time production
weekly simulations of a large number of systems
weekly/monthly optimizations of portfolios
large numbers of connections to collect data from
There is no dealing with millisecond or even second-based trades. The only consideration is whether Java can deal with this kind of load when spread out over the necessary number of EC2 servers.
Thank you guys so much for your wisdom.

Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.

While I am a huge fan of Python and personally I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.
For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.
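The server-cost point is easy to check with back-of-the-envelope arithmetic. In this sketch, `servers_needed` is a hypothetical helper, the 10,000,000 weekly simulations come from the question, and the seconds-per-simulation figures are pure guesses chosen to show the effect.

```python
import math


def servers_needed(jobs_per_week, seconds_per_job, hours_available=168.0):
    """Number of servers required to finish all jobs within the available hours.

    Assumes one job at a time per server and perfect load balancing.
    """
    total_hours = jobs_per_week * seconds_per_job / 3600
    return max(1, math.ceil(total_hours / hours_available))
```

At an assumed 2 seconds per simulation, `servers_needed(10_000_000, 2.0)` comes out at 34 servers running all week; at 1 second it is 17. Whatever the real per-job time turns out to be, the server count scales with it almost linearly.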
For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise-grade frameworks for parallel processing and cross-server deployment, and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as JavaSpaces.
I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable in that if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.
C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.

I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each Object has an overhead of 8 Bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.
In C++ you can have collections of primitive types, which Java doesn't have. You would have to use external libraries to get them.
If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that a garbage collection pause can hit your program when you least expect it.

Why only one language for your system? If I were you, I would build the entire system in Python and use C or C++ for the performance-critical components. That way, you will have a very flexible and extensible system with fast-enough performance. You can even find tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing with each other; they complement each other.

Write it in your preferred language. To me that sounds like Python. When you start running the system you can profile it and see where the bottlenecks are. Once you have done some basic optimisations, if it's still not acceptable you can rewrite portions in C.
A consideration could be writing this in IronPython to take advantage of the CLR and DLR in .NET. Then you can leverage .NET 4 and the parallel extensions. If anything will give you performance increases, it'll be some flavour of threading, which .NET does extremely well.
Edit:
Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.
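For the "profile it and see where the bottlenecks are" step, Python's standard library already ships a profiler. A minimal sketch follows; `slow_feature_score` and `simulate` are invented stand-ins for real simulation code, and `profile_report` is a hypothetical wrapper around `cProfile` and `pstats`.

```python
import cProfile
import io
import pstats


def slow_feature_score(values):
    """Deliberately naive O(n^2) stand-in for an expensive simulation step."""
    return sum(a * b for a in values for b in values)


def simulate(system_count=5, feature_count=100):
    values = list(range(feature_count))
    return [slow_feature_score(values) for _ in range(system_count)]


def profile_report(func, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

Printing `profile_report(simulate)` lists the functions that account for most of the run time. Those are the candidates to optimise, parallelise or rewrite in C; everything else can stay in the language you find pleasant.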

It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.
If the inner loop is a matrix operation, then I suggest Python and SciPy, but if the inner loop is not a matrix operation, then I would worry about Python being slow. (Or maybe I would wrap C++ in Python using SWIG or boost::python.)
The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.
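The inner-loop concern can be seen even without SciPy. The sketch below uses only the standard library to compare a dot product whose loop runs in Python bytecode against one whose loop runs inside C-implemented builtins. For real matrix work NumPy/SciPy would be the tool; this is only an illustration of the principle, and the function names are made up.

```python
import operator
import timeit


def dot_loop(xs, ys):
    """Dot product with the inner loop written in Python bytecode."""
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_builtin(xs, ys):
    """Same dot product (up to floating-point rounding), looped inside builtins."""
    return sum(map(operator.mul, xs, ys))


def compare(n=100_000, repeats=5):
    """Return the best-of-repeats time in seconds for each variant."""
    xs = [float(i) for i in range(n)]
    ys = [float(n - i) for i in range(n)]
    return {
        "loop_s": min(timeit.repeat(lambda: dot_loop(xs, ys), number=1, repeat=repeats)),
        "builtin_s": min(timeit.repeat(lambda: dot_builtin(xs, ys), number=1, repeat=repeats)),
    }
```

Calling `compare()` on your own machine shows how much of the time goes to interpreter overhead rather than arithmetic. If your real inner loop can't be expressed through library calls like this, that's the case where wrapping C++ starts to look attractive.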

I would go with pypy. If not, http://lolcode.com/.

Is there any benchmark which compares PHP and JSP?

I don't want to start a holy war. I just want to know if there is such a benchmark. Or maybe you can say something about this topic from your experience?

I just stumbled over the language shootout again yesterday, where you can compare some performance characteristics of both languages while running different programs. I didn't find a benchmark for web performance, though.
The fact is that interpreted languages like PHP are always slower than compiled languages. JSP files get compiled, too, so once the server is up and running and doesn't get changed anymore, the performance will be better than that of a PHP script that gets interpreted every time a request comes in.
On the other hand, the first performance bottleneck you hit will probably be the database speed anyway. And then there are still lots of other ways of improving performance, like precompiling your PHP scripts, moving heavy calculations into C etc. And compared to the monster of Java web development, PHP is easy to learn and to get along with. In the end, if you have the choice, you should go with the language you are most comfortable with. If you are starting a new project you may not even know if all the performance considerations will ever be important, because you don't have the users yet and just want to get your application out there quickly.

While Daff's explanation of PHP vs JSP is technically wrong, the essential gist of his post is correct: choose the language that is best for you. Only very rarely will you find yourself in a position where performance really matters badly. At that point, you are much more likely to be able to make significant architectural optimizations in your language of choice - and these optimizations are likely to have significantly more effect than the difference between PHP and JSP.
One of the core rules of programming has always been to avoid premature optimization - if for no other reason than because until you're actually under pressure you don't know what you actually need to optimize, nor do you have a means of determining whether it worked.
In the event that you believe there's a possibility you may face performance issues, no website can help you. The most vital thing is to create your own load testing benchmarks that represent the specifics of how your site works, simulating how your users do things. Only once you have done that are you able to move on to tweaking your code, implementing things like caching, load balancing, data and request partitioning etc with any confidence that the changes you are making are having a positive impact on your site performance.
There are books specifically about the process of optimization in general, but the key sequence is this:
1. Benchmark
2. Change test
3. Benchmark to see if change indicates performance improvement
4. Go live
5. Evaluate live response to see if benchmark prediction was correct
(People forget #5 a lot and cause themselves grief)
If you're going to spend time worrying about performance, spend time setting up that sequence, don't spend time worrying about your language choice.
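The decision points in that sequence can be written down so they aren't left to gut feeling. Here is a small sketch, assuming you already collect response times in seconds from your own load tests; the function names and the 5% and 10% thresholds are arbitrary choices for illustration.

```python
from statistics import median


def relative_change(baseline_times, candidate_times):
    """Fractional change in median time; negative means the candidate is faster."""
    base = median(baseline_times)
    return (median(candidate_times) - base) / base


def accept_change(baseline_times, candidate_times, min_gain=0.05):
    """Benchmark step: accept only if the candidate is at least min_gain faster."""
    return relative_change(baseline_times, candidate_times) <= -min_gain


def prediction_held(predicted, live_baseline_times, live_times, tolerance=0.1):
    """Live step: check that the live change is within tolerance of the prediction."""
    return abs(relative_change(live_baseline_times, live_times) - predicted) <= tolerance
```

`accept_change` covers the second benchmark run, and `prediction_held` covers the evaluation after going live, the step that tends to get forgotten. If the live numbers don't match the prediction, the benchmark isn't representative and needs fixing before the next round.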

Can the JVM provide snapshot persistence?

Is it possible to dump an image of a running JVM and later restore the previous state by loading the image into the JVM? I'm fairly certain the answer is negative, but would love to be wrong.
With all the dynamic languages available for the JVM comes an increase in interactivity. Being able to save a coding session would save the time spent manually restoring the VM to a previous session.

There was a JSR 323 proposed for this a while back but it was rejected. You can find some links in those articles about the research behind this and what it would take. It was mostly rejected as an idea that was too immature.
I have heard of at least one startup (unfortunately don't recall the name) that was working on a virtualization technology over a hypervisor (probably Xen) that was getting pretty close to being able to move JVMs, including even things like file system refs and socket endpoints. Because they were at the hypervisor level, they had access to all of that stuff. By hooking that and the JVM, they had most of the pieces. I think they might have gone under though.
The closest thing you can get today is Terracotta, which allows you to cluster a portion of your JVM heap, storing it in a server array, which can be made persistent. On JVM startup, you connect to the cluster and can continue using whatever portions of your heap are specified as clustered. The actual objects are faulted in on an as-needed basis.

Not possible at present. In general, pausing and restarting a memory image of a process in a different context is incredibly hard to achieve: what are you going to do with open OS resources? Transfers to machines with different instruction sets? Database connections?
Also images of the running JVM are probably quite large - maybe much larger than the subset of the state you are actually interested in. So it's not a good idea from a performance perspective.
A much better strategy is to have code that persists and recreates the application state: this is relatively feasible with most JVM dynamic languages. I do similar stuff in Clojure, where you have an interactive environment (REPL) and it is quite possible to create and run a sequence of operations that rebuild the application state that you want in another JVM.
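The "persist the operations, not the memory image" idea looks roughly like this. The sketch is in Python rather than Clojure and is only an illustration of the pattern; the `Session` class and its two operations are invented for the example.

```python
import json


# The only operations this toy session understands.
OPERATIONS = {
    "set": lambda state, key, value: state.__setitem__(key, value),
    "append": lambda state, key, value: state.setdefault(key, []).append(value),
}


class Session:
    """Application state plus the log of operations that produced it."""

    def __init__(self):
        self.state = {}
        self.log = []

    def apply(self, op, key, value):
        OPERATIONS[op](self.state, key, value)
        self.log.append([op, key, value])

    def dump(self):
        """Serialize only the operation log, not the in-memory objects."""
        return json.dumps(self.log)

    @classmethod
    def restore(cls, text):
        """Rebuild the state in a fresh process by replaying the log."""
        session = cls()
        for op, key, value in json.loads(text):
            session.apply(op, key, value)
        return session
```

Saving `session.dump()` to a file and calling `Session.restore(...)` in a new process rebuilds the same state. Open files, sockets and database connections never enter the log, so they have to be reopened explicitly, which is exactly the part a raw memory snapshot can't handle.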

This is currently not possible in any of the JVMs I know. It would not be very difficult to implement something like this in the JVM if programs run disconnected from their environments. However, many programs have hooks into their environment (think file handles, database connections) which would make implementing something like this very hairy.

As of early 2023, there's some progress in this space and it seems a lot of things can at least be tried, even if without claims for their production readiness.
One such feature is called CRaC. You can check their docs or even get an OpenJDK build that includes the feature. The project has its own repo under OpenJDK and looks quite promising.
Other vendors/products to check:
Azul ReadyNow!
OpenJ9 InstantOn
What's also really exciting is AWS Lambda SnapStart. It doesn't give you full snapshotting capabilities and is intrinsically vendor-specific, but it's what a ton of Java engineers who use AWS Lambda have been waiting on for so long.
