Resource usage of google Go vs Python and Java on Appengine

Resource usage of google Go vs Python and Java on Appengine - java

Will google Go use less resources than Python and Java on Appengine? Are the instance startup times for go faster than Java's and Python's startup times?
Is the go program uploaded as binaries or source code and if it is uploaded as source code is it then compiled once or at each instance startup?
In other words: Will I benefit from using Go in app engine from a cost perspective? (only taking to account the cost of the appengine resources not development time)

Will google Go use less resources than Python and Java on Appengine?
Are the instance startup times for go faster than Java's and Python's
startup times?
Yes, Go instances have a lower memory than Python and Java (< 10 MB).
Yes, Go instances start faster than Java and Python equivalent because the runtime only needs to read a single executable file for starting an application.
Also even if being atm single threaded, Go instances handle incoming request concurrently using goroutines, meaning that if 1 goroutine is waiting for I/O another one can process an incoming request.
Is the go program uploaded as binaries or source code and if it is
uploaded as source code is it then compiled once or at each instance
startup?
Go program is uploaded as source code and compiled (once) to a binary when deploying a new version of your application using the SDK.
In other words: Will I benefit from using Go in app engine from a cost
perspective?
The Go runtime has definitely an edge when it comes to performance / price ratio, however it doesn't affect the pricing of other API quotas as described by Peter answer.

The cost of instances is only part of the cost of your app. I only use the Java runtime right now, so I don't know how much more or less efficient things would be with Python or Go, but I don't imagine it will be orders of magnitude different. I do know that instances are not the only cost you need to consider. Depending on what your app does, you may find API or storage costs are more significant than any minor differences between runtimes. All of the API costs will be the same with whatever runtime you use.
Language "might" affect these costs:
On-demand Frontend Instances
Reserved Frontend Instances
Backed Instances
Language Independent Costs:
High Replication Datastore (per gig stored)
Outgoing Bandwidth (per gig)
Datastore API (per ops)
Blobstore API storge (per gig)
Email API (per email)
XMPP API (per stanza)
Channel API (per channel)

The question is mostly irrelevant.
The minimum memory footprint for a Go app is less than a Python app which is less than a Java app. They all cost the same per-instance, so unless your application performs better with extra heap space, this issue is irrelevant.
Go startup time is less than Python startup time which is less than Java startup time. Unless your application has a particular reason to churn through lots of instance startup/shutdown cycles, this is irrelevant from a cost perspective. On the other hand, if you have an app that is exceptionally bursty in very short time periods, the startup time may be an advantage.
As mentioned by other answers, many costs are identical among all platforms - in particular, datastore operations. To the extent that Go vs Python vs Java will have an effect on the instance-hours bill, it is related to:
Does your app generate a lot of garbage? For many applications, the biggest computational cost is the garbage collector. Java has by far the most mature GC and basic operations like serialization are dramatically faster than with Python. Go's garbage collector seems to be an ongoing subject of development, but from cursory web searches, doesn't seem to be a matter of pride (yet).
Is your app computationally intensive? Java (JIT-compiled) and Go are probably better than Python for mathematical operations.
All three languages have their virtues and curses. For the most part, you're better off letting other issues dominate - which language do you enjoy working with most?

It's probably more about how you allocate the resources than your language choice. I read that GAE was built the be language-agnostic so there is probably no builtin advantage for any language, but you can get an advantage from choosing the language you are comfortable and motivated with. I use python and what made my deployment much more cost-effective was the upgrade to python 2.7 and you can only make that upgrade if you use the correct subset of 2.6, which is good. So if you choose a language you're comfortable with, it's likely that you will gain an advantage from your ability using the language rather than the combo language + environment itself.
In short, I'd recommend python but that's the only app engine language I tried and that's my choice even though I know Java rather well the code for a project will be much more compact using my favorite language python.
My apps are small to medium sized and they cost like nothing:

I haven't used Go, but I would strongly suspect it would load and execute instances much faster, and use less memory purely because it is compiled. Anecdotally from the group, I believe that Python is more responsive than Java, at least in instance startup time.
Instance load/startup times are important because when your instance is hit by more requests than it can handle, it spins up another instance. This makes that request take much longer, possibly giving the impression that the site is generally slow. Both Java and Python have to startup their virtual machine/interpreter, so I would expect Go to be an order of magnitude faster here.
There is one other issue - now Python2.7 is available, Go is the only option that is single-threaded (ironically, given that Go is designed as a modern multi-process language). So although Go requests should be handled faster, an instance can only handle requests serially. I'd be very surprised if this limitation last long, though.

Related

Why is Erlang slower than Java on all these small math benchmarks?

While considering alternatives for Java for a distributed/concurrent/failover/scalable backend environment I discovered Erlang. I've spent some time on books and articles where nearly all of them (even Java addicted guys) says that Erlang is a better choice in such environments, as many useful things are out of the box in a less error prone way.
I was sure that Erlang is faster in most cases mainly because of a different garbage collection strategy (per process), absence of shared state (b/w threads and processes) and more compact data types. But I was very surprised when I found comparisons of Erlang vs Java math samples where Erlang is slower by several orders, e.g. from x10 to x100.
Even on concurrent tasks, both on several cores and a single one.
What's the reasons for that? These answers came to mind:
Usage of Java primitives (=> no heap/gc) on most of the tasks
Same number of threads in Java code and Erlang processes so the actor model has no advantage here
Or just that Java is statically typed, while Erlang is not
Something else?
If that's because these are very specific math algorithms, can anybody show more real/practice performance tests?
UPDATE: I've got the answers so far summarizing that Erlang is not the right tool for such specific "fast Java case", but the thing that is unclear to me - what's the main reason for such Erlang inefficiency here: dynamic typing, GC or poor native compiling?

Erlang was not built for math. It was built with communication, parallel processing and scalability in mind, so testing it for math tasks is a bit like testing if your jackhammer gives you refreshing massage experience.
That said, let's offtop a little:
If you want Erlang-style programming in JVM, take a look at Scala Actors or Akka framework or Vert.x.

Benchmarks are never good for saying anything else than what they are really testing. If you feel that a benchmark is only testing primitives and a classic threading model, that is what you get knowledge about. You can now with some confidence say that Java is faster than Erlang on mathematics on primitives as well as the classic threading model for those types of problems. You don't know anything about the performance with large number of threads or for more involved problems because the benchmark didn't test that.
If you are doing the types of math that the benchmark tested, go with Java because it is obviously the right tool for that job. If you want to do something heavily scalable with little to no shared state, find a benchmark for that or at least re-evaluate Erlang.
If you really need to do heavy math in Erlang, consider using HiPE (consider it anyway for that matter).

I took interest to this as some of the benchmarks are a perfect fit for erlang, such as gene sequencing. So on http://benchmarksgame.alioth.debian.org/ the first thing I did was look at reverse-complement implementations, for both C and Erlang, as well as the testing details. I found that the test is biased because it does not discount the time it takes erlang to start the VM /w the schedulers, natively compiled C is started much faster. The way those benchmarks measure is basically:
time erl -noshell -s revcomp5 main < revcomp-input.txt
Now the benchmark says Java took 1.4 seconds and erlang /w HiPE took 11. Running the (Single threaded) Erlang code took me 0.15 seconds, and if you discount the time it took to start the vm, the actual workload took only 3000 microseconds (0.003 seconds).
So I have no idea how that is benchmarked, if its done 100 times then it makes no sense as the cost of starting the erlang VM will be x100. If the input is a lot longer than given, it would make sense, but I see no details on the webpage of that. To make the benchmarks more fair for Managed languages, have the code (Erlang/Java) send a Unix signal to the python (that is doing the benchmarking) that it hit the startup function.
Now benchmark aside, the erlang VM essentially just executes machine code at the end, as well as the Java VM. So there is no way a math operation would take longer in Erlang than in Java.
What Erlang is bad at is data that needs to mutate often. Such as a chained block cypher. Say you have the chars "0123456789", now your encryption xors the first 2 chars by 7, then xors the next two chars by the result of the first two added, then xors the previous 2 chars by the results of the current 2 subtracted, then xors the next 4 chars.. etc
Because objects in Erlang are immutable this means that the entire char array needs to be copied each time you mutate it. That is why erlang has support for things called NIFS which is C code you can call into to solve this exact problem. In fact all the encryption (ssl,aes,blowfish..) and compression (zlib,..) that ship with Erlang are implemented in C, also there is near 0 cost associated with calling C from Erlang.
So using Erlang you get the best of both worlds, you get the speed of C with the parallelism of Erlang.
If I were to implement the reverse-complement in the FASTEST way possible, I would write the mutating code using C but the parallel code using Erlang. Assuming infinite input, I would have Erlang split on the >
<<Line/binary, ">", Rest/binary>> = read_stream
Dispatch the block to the first available scheduler via round robin, consisting of infinite EC2 private networked hidden nodes, being added in real time to the cluster every millisecond.
Those nodes then call out to C via NIFS for processing (C was the fastest implementation for reverse-compliment on alioth website), then send the output back to the node master to send out to the inputer.
To implement all this in Erlang I would have to write code as if I was writing a single threaded program, it would take me under a day to create this code.
To implement this in Java, I would have to write the single threaded code, I would have to take the performance hit of calling from Managed to Unmanaged (as we will be using the C implementation for the grunt work obviously), then rewrite to support 64 cores. Then rewrite it to support multiple CPUS. Then rewrite it again to support clustering. Then rewrite it again to fix memory issues.
And that is Erlang in a nutshell.

As pointed in other answers - Erlang is designed to solve effectively real life problems, which are bit opposite to benchmark problems.
But I'd like to enlighten one more aspect - pithiness of erlang code (in some cases means rapidness of development), which could be easily concluded, after comparing benchmarks implementations.
For example, k-nucleotide benchmark:
Erlang version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=hipe&id=3
Java version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=java&id=3
If you want more real-life benchmarks, I'd suggest you Comparing C++ And Erlang For Motorola Telecoms Software

The Erlang solution uses ETS, Erlang Term Storage, which is like an in-memory database running in a separate process. Consequent to it being in a separate process, all messages to and from that process must be serialized/deserialized. This would account for a lot of the slowness, I should think. For example, if you look at the "regex-dna" benchmark, Erlang is only slightly slower than Java there, and it doesn't use ETS.

The fact that erlang has to allocate memory for every value whereas in java you will typically reuse variables if you want it to be fast, means it will always be faster for 'tight loop' bench marks.
It would be interesting to benchmark a java version using the -client flag and boxed primitives and compare that to erlang.
I believe using hipe is unfair since it is not an active project. I would be interested to know if any mission critical software is running on this.

I don't know anything about Erlang, but this seems to be a compare apples to oranges approach anyways. You must be aware that considerable effort was spent over more than a decade to improve java preformance to the point where it is today.
Its not surprising (to me) that a language implementation done by volunteers or a small company can not outmatch that effort.

Why Java is called more efficient (in terms of resources consumption) but less speedy, and how is that advantageous in context of web applications?

I have heard a lot of times Java being more resource efficient than being a very fast language. What does efficiency in terms of resources consumption actually means and how is that beneficial for web applications context ?

For web applications, raw performance of the application code is mostly irrelevant (of course if you make it excessively slow, you're still in trouble) as the main performance bottlenecks are outside your application code.
They're network, databases, application servers, browser rendering time, etc. etc..
So even if Java were slow (it isn't) it wouldn't be so much of a problem.
An application however that consumes inordinate amounts of memory or CPU cycles can bring a server to its knees.
But again, that's often less a problem of the language you use than of the way you use it.
For example, creating a massive memory structure and not properly disposing of it can cause memory problems always. Java however makes it harder to not dispose of your memory (at the cost of a tiny bit of performance and maybe hanging on to that memory longer than strictly necessary). But similar garbage collection systems can be (and have been) built and used with other languages as well.

Resources are cheap these days, esp memory, CPU, disk. So unless you are programming for a phone, or a very complex problem efficiency is unlikely the best reason to use/not use Java.
Java is more efficient than some languages, less efficient than others.
For web applications, there are many other considerations, like user experience to consider first.

A programming language can't be "fast" or "slow" or "resource efficient".

For architecture overview, MSDN Architecture Guide seems nice.
For most accurate/technical description of difference between interface and abstract class, refer to the spec for your language. For Java it's Java Language Spec.

Which programming language for compute-intensive trading portfolio simulation?

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).
I plan on employing Amazon web services to take on the entire load of the application.
I have four choices that I am considering as language.
Java
C++
C#
Python
Here is the scope of the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:
Weekly simulation of 10,000,000 trading systems.
(Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
Real-time production of portfolio w/ 100,000 trading strategies
Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)
Speed is a concern, but I believe that Java can handle the load.
I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.
The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.
Python - I've read somethings on PyPy and pyscho that claim python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides that fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.
To sum up:
real time production
weekly simulations of a large number of systems
weekly/monthly optimizations of portfolios
large numbers of connections to collect data from
There is no dealing with millisecond or even second based trades. The only consideration is if Java can possibly deal with this kind of load when spread out of a necessary amount of EC2 servers.
Thank you guys so much for your wisdom.

Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.

While I am a huge fan of Python and personaly I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.
For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.
For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise grade frameworks for parallell processing and cross-server deployment and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as Javaspaces.
I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable inthat if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.
C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.

I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each Object has an overhead of 8 Bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.
In C++ you can have collections of primitive types, which Java hasn't. You would have to use external libraries to get them.
If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that your program can make that garbage collection pause whenever you don't expect it.

Why only one language for your system? If I were you, I will build the entire system in Python, but C or C++ will be used for performance-critical components. In this way, you will have a very flexible and extendable system with fast-enough performance. You can find even tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing each other; they are complementing.

Write it in your preferred language. To me that sounds like python. When you start running the system you can profile it and see where the bottlenecks are. Once you do some basic optimisations if it's still not acceptable you can rewrite portions in C.
A consideration could be writing this in iron python to take advantage of the clr and dlr in .net. Then you can leverage .net 4 and parallel extensions. If anything will give you performance increases it'll be some flavour of threading which .net does extremely well.
Edit:
Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.

It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.
If the inner loop is a matrix operation, then I suggest python and scipy, but of the inner loop if not a matrix operation, then I would worry about python being slow. (Or maybe I would wrap c++ in python using swig or boost::python)
The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.

I would go with pypy. If not, http://lolcode.com/.

Questions about PHP and Java

I want to know how Java (JSP) on Tomcat compares to PHP on Apache in terms of performance.
Two servers with the same hardware configurations, one running Tomcat/Java (JSP) the other Apache/PHP, both servers maxed out with how many connections they can handle at once. Would they be somewhat close or would one pull away from the other one by a large margin? I basically just want to know if Tomcat/Java (JSP) is going to be a big performance hit if I switch to it vs PHP. If anyone can give a detailed answer on why one is faster than the other that would be amazing. Links are great too, I was unable to find anything online surprisingly.
Please no Java vs PHP wars, this is about performance only, nothing to do with the languages themselves.
Note: If there is any other concerns I should have for switching to Java from PHP please let me know. I REALLY hate asking this question because I'm usually the first person to say "program in what you like" but in my situation I need whats also good for the projects I work for. I know that there are large sites written in JSP, but it doesn't mean that they're better.
Thanks

What's good for the projects you're working on is to spend as little time as possible to write them as developer time is way more expensive than any perceived differences in performance. So stick with what you're familiar with.
The answer to your question is: they are both fast enough.
Any such comparison is hard because you end up doing things differently in different languages. Java bytecode is probably faster to interpret but then again any decent PHP install uses as opcode cache largely negating any such advantage in real terms.
Java also has a more complicated development model because Web processes are persistent. This can have a performance advantage but also can create problems like memory and other resource leakage, which PHP doesn't tend to have because everything is created and destroyed on each request (barring session information, memcache and so on).
Also PHP extensions can be created for any parts that you want to speed up.
$10,000 can buy an awful lot of hardware. It can buy the hardware to run SO. It doesn't buy much developer time.
I've got experience doing both Java and PHP development. I will generally choose PHP for Web development because of:
quicker to test changes in development (ie no build/deploy steps and Java hot-deploy has serious limitations). Words cannot express how freeing it is to test changes by saving the file you're working on and clicking reload on a browser vs running an Ant/Maven build process;
far fewer issues of memory/resource leakage;
extensive library of functions to do pretty much anything you want;
cheaper to host (at the low end).
I will use Java for some things, like anything that involves a lot of background processing and threading, which aren't PHP's strong points.
You'll note that performance (or the lack thereof) doesn't even rate as a reason for or again.
Sorry if that doesn't answer your question, but such concerns over performance are a pointless distraction.

The best way to answer performance questions is with a benchmark. Implement some simple page in both PHP and Java and then benchmark them using ab (Apache Benchmark).
Having said that, I suspect Java will outperform PHP because of the nature of the 2 platforms. Java is compiled to optimized bytecode (once) and then interpreted by a virtual machine. When Tomcat runs, the JVM loads the classes required for any given page and keeps them in memory so they're ready to go when an HTTP request hits the web server. Contrast that with PHP which reloads and re-interprets the code from scratch with each invocation by Apache. This is helped to a large degree by op-code caching, but still not to the level of what happens in the JVM.

Java on mainframes

I work for a large corporation that runs a lot of x86 based servers on which we run JVMs.
We have experimented successfully with VMWare ESX to get better usage out of our data center. But these still consume a lot of power per processing unit.
I had a mad idea that we should resurrect mainframes, we could host either lots of JVMs or virtual machines.
Has anyone tried this? Are there any good cost-benefits?
Do you lose flexibility? E.g. we have mainframes in other parts of the company but they seem to have much more rigid usage of the machines.. lots of change control, long lead times etc

IBM makes a special Java co-processor that you should seriously consider. I would not run Java on the general engines as this may increase MPU charges for licensed software.

All this assumes you’re talking about Java on Z/OS and not running Linux VM’s on the mainframe to take advantage of the cost savings that come with fewer machines.
My thoughts on virtualization are at the end of this and it’s probably the route you want to look at but I’ll start out with Z/OS since it’s what mainframes are traditionally associated with and what I have familiarity with. I have some experience with mainframe Java.
The short answer is, it depends, but probably not. What exactly are your applications? The mainframe is a difficult environment compared to x86 servers. If you're running I/O-intensive workloads under something like Websphere, it might be worth it, assuming your mainframe is underutilized.
In my experience, Java is horribly slow on a mainframe but that’s because the system I used was set up for developer flexibility rather than performance. That just goes to prove performance tuning on the mainframe is usually much more complicated then on an average server since mainframes will be running many more workloads then a generic x86 server.
Remember that the mainframe is designed primarily for I/O throughput and can outperform any normal x86 server at that. It was not designed to do a lot of computationally intensive calculations so won’t outperform a small cluster of x86 servers if your doing a lot of math.
The change controls on mainframes are there for a good reason - if one x86 server has a problem, you reboot it. If a mainframe has a problem, every second that it’s down is costing the company money. You also have to take into account any native code your apps depend on or third party libraries that may use native code. All that code would have to be ported.
Configuration of a mainframe also takes a lot longer on average then on an x86 server. I would suggest that, if you want to seriously look into this, you make a better business case than power savings, such as tight integration with current business apps and start out small either with a proof of concept or a new application. One that is not business critical, that can be implemented to take advantage of the mainframes strengths.
IBM mainframes can also run Linux in either native mode or a virtualized environment similar to VMWare. Unless your company is the exception to the rule, your Linux instances would run as virtual machines. I haven’t had much experience with this but, if your app depends on no native code and runs under Linux, it would probably work on a mainframe running Linux. For more info about Linux on mainframes see this link.

We have extensive experience running Java under Windows, Linux and on IBM SystemI (or iSeries, or AS/400, depending on IBM's mood that year) minicomputers. It is my opinion that the mini-computer platform seems to deliver much less bang for your buck against modern multi-core x86 CPU's.
Note that Java benefits more readily from having multiple cores available than typical software today, because of it's inherently multithreaded nature - this would be even more true as you run multiple JVMs.
That said, you will typically be capable of getting many more CPU cores available with better bandwidth to access memory on a mini or mainframe, and better throughput on disk subsystems (overall) so these systems may very well scale much better as you toss more JVMs on them.

IBM allows this. Some of their mainframes can hold Java accelerator processors that run the bytecode natively for more performance. They also have DB2 accelerators, and possibly some for XML operations.
I've never gotten to play with any of them, but I'd sure love to.

Though I've been in the Industry since 1975, I'm no longer sure what a "mainframe" is. My current development machine has four 3GHZ processors in it, 8GB of RAM, and 750GB of disk space (RAID 1, so it's really double that), and two 19-inch flatscreen monitors.
That's because I'm there on a contract. The employees all have much more powerful boxes than mine.
I understand that the server machines, especially the database servers, are much faster.
Mainframe?

Depending on you workload this is worth looking at!
There are a bewildering number of options available to you just using the IBM hardware:
Its definately worth considering the add on java processors.
(these are actually not that different form the standard cpus its just they are
restricted to java jvm workloads -- and -- most importantly are excluded from
cpu based software license pricing).
You can run multple Linux VMs each ruuning thier own Java app.
You can run multiple native VMs running thier minimalist operating system
used to be called DOS but they change the name every couple of years.
The software licenses are cheaper than the main OS, but it has very limited
functionality which turns out to be an advantage if you are running self contained
applications.
You can run in the monster z/OS environment either:-
a. Within USS (Unix System Services) which is pretty much a full UNIX
OS running inside the parent z/OS.
b. Run your java app in its own started task (== unix daemon).
c. Run your app inside CICS.
(Probably not as you need to use CICS/Java API where you would
normally use Servlet/J2EE APIs so you app would require a rewrite.)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.