I'm going to build a little service that monitors an IMAP email account and acts on the messages it reads. It only has to run every 10 minutes or so, with no external trigger required, but I want to host it externally so that I don't need to worry about uptime.
To be machine-independent I could write the service in Java or Python. What are good hosting providers for this, and which of the two languages is better supported?
The service either has to run the whole time (doing the waiting itself) or has to be kicked off every 10 minutes. I guess most (web) hosts are geared towards request-driven code (e.g. JSP), and I assume they shut down processes that run forever. Who offers hosting for user-written services like this one?
Depending on what actions you require and on your resource requirements, Google App Engine (GAE) might be quite suitable for both Python and Java services (GAE supports both languages quite decently). Cron jobs can be set to run every 10 minutes (the URL I gave shows how to do that with Python), and you can queue more tasks if the amount of work you need to perform on a given occasion exceeds GAE's 30-second limit.
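A rough sketch of the "queue more tasks" idea, in plain Java with no App Engine APIs: work through the messages, stop before the deadline, and return the leftovers so they can be handed to a follow-up task. The class name `DeadlineBatch`, the time-budget parameter, and the injectable clock (there only to make it testable) are illustrative assumptions, not part of GAE.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: processes items until a time budget runs out and
 * returns the unprocessed remainder, which the caller could enqueue as a
 * follow-up task.
 */
class DeadlineBatch {

    static <T> List<T> processWithinBudget(List<T> items,
                                           Consumer<T> handler,
                                           Duration budget,
                                           LongSupplier nanoClock) {
        long start = nanoClock.getAsLong();
        long budgetNanos = budget.toNanos();
        int i = 0;
        while (i < items.size()) {
            // Do not start another item once the budget is used up.
            if (nanoClock.getAsLong() - start >= budgetNanos) {
                break;
            }
            handler.accept(items.get(i));
            i++;
        }
        // Everything not yet handled goes back to the caller.
        return new ArrayList<>(items.subList(i, items.size()));
    }

    /** Convenience overload using the real monotonic clock. */
    static <T> List<T> processWithinBudget(List<T> items, Consumer<T> handler, Duration budget) {
        return processWithinBudget(items, handler, budget, System::nanoTime);
    }
}
```

You would pick a budget comfortably below the platform limit (say 20 seconds against 30) and, on GAE, enqueue any returned remainder as a new task; that enqueueing step is deliberately left out.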
GAE is particularly nice for getting started and experimenting, since it has reasonably generous free quotas for almost all the resources your jobs could consume (to let your jobs consume more than their free quota, though, you need to enable billing, provide a credit card, and set a budget).
If you decide that GAE has limitations you can't stand, or that billed use of resources beyond the free quotas would cost you too much, any hosting provider that supports a Unix-like cron scheduler should be acceptable. Starting a Python script from scratch every 10 minutes may be faster than starting a JVM from scratch, but that depends on what you have to do every 10 minutes. For some kinds of tasks Python will be just as fast, or maybe even faster; for others it will be slower. We have no way to guess what kinds of tasks you require, or at what "tipping point" the possibly faster JVM "pays for its own startup" relative to the possibly slower Python, so basically you'll need to assess that for yourself!-)
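For a plain cron host, the two styles from the question (kicked off every 10 minutes vs. running all the time and doing the waiting itself) might look roughly like this in Java. `MailPoller` and `pollOnce()` are hypothetical names, and the IMAP work is only a placeholder counter.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical mail poller showing both hosting styles:
 *  - run once and exit (an external scheduler such as cron starts it), or
 *  - stay alive and schedule itself.
 */
class MailPoller {
    // Counts completed passes; stands in for real IMAP work.
    final AtomicInteger polls = new AtomicInteger();

    /** One pass: connect, read new messages, act on them (placeholder). */
    void pollOnce() {
        // Real code would open the IMAP connection and process messages here.
        polls.incrementAndGet();
    }

    /** Style 1: cron starts the JVM, which does one pass and exits. */
    public static void main(String[] args) {
        new MailPoller().pollOnce();
    }

    /** Style 2: a long-running process that does its own waiting. */
    ScheduledFuture<?> startPeriodic(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        // Fixed delay: the next pass starts 'period' after the previous one
        // finishes, so a slow pass never overlaps the next one.
        return scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs.
                e.printStackTrace();
            }
        }, 0, period, unit);
    }
}
```

With cron, an entry along the lines of `*/10 * * * * java -cp mailpoller.jar MailPoller` (jar name hypothetical) starts a fresh JVM for each pass; in the long-running style you would call `startPeriodic(Executors.newSingleThreadScheduledExecutor(), 10, TimeUnit.MINUTES)` once and keep the process alive, which only works on a host that tolerates never-ending processes.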
You are lucky, since Google App Engine provides cron jobs for both Python and Java.
GAE - Python
GAE - Java
Check out Google App Engine. You can set up a cron job for your Java or Python script.
I'm developing an enrollment application. The client side is an Android application that lets clients enter their information, which is stored using Google Cloud's data storage service, while the images they enter are stored using the blob storage service.
The server side is a J2EE application that extracts the previously entered data and blobs and runs tests on them, such as face recognition and alphanumeric matching. These tests run asynchronously and continuously. I thought of using multithreading for this server-side processing.
Is that recommended in this case? Is there another solution?
There are several limitations in GAE that somewhat limit its multiprocessing abilities:
Each request can create up to 50 threads, but threads cannot outlive the request, which itself has a 60-second limit. Also, threads must be created via GAE's own ThreadManager, which limits the use of most external processing libraries.
Background threads independent of the current request are available and can be long-lived, but there is a limit of 10 background threads per instance.
For async processing you should look into Task Queues: they have all the limitations above, but a task can run for 10 minutes. You can start periodic processing via cron jobs.
Note that GAE instances are quite limited (the default is a single-core 600 MHz CPU with 128 MB of RAM). They are also quite expensive given how low-powered they are. If you need more processing power you should look into Compute Engine (powerful, stand-alone, unmanaged, no access to GAE services, fairly priced for the power), or, preferably in your case, Managed VMs (powerful, managed, limited access to GAE services, same price as Compute Engine).
So if your processing is light, use Task Queues; if you need more power, use Managed VMs (currently in preview).
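As a hedged illustration of living within the per-request thread limits, this sketch runs independent checks (face matching, text matching, etc.) in parallel on a small bounded pool and waits for all of them before returning, so no thread outlives the request. It takes a `ThreadFactory` so that, on App Engine, you could pass `ThreadManager.currentRequestThreadFactory()` from the App Engine SDK; the class name and the cap of 10 threads are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.function.Function;

/**
 * Hypothetical helper: applies 'work' to each input in parallel inside one
 * request and waits for every result before returning.
 */
class RequestScopedParallel {
    // Assumed cap, well below the per-request thread limit.
    static final int MAX_THREADS = 10;

    static <I, O> List<O> mapInParallel(List<I> inputs, Function<I, O> work, ThreadFactory factory)
            throws InterruptedException, ExecutionException {
        int n = Math.max(1, Math.min(MAX_THREADS, inputs.size()));
        ExecutorService pool = Executors.newFixedThreadPool(n, factory);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(pool.submit(() -> work.apply(input)));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures) {
                results.add(f.get()); // blocks until that check has finished
            }
            return results; // results are in input order
        } finally {
            // Threads must not outlive the request, so always stop the pool.
            pool.shutdownNow();
        }
    }
}
```

Anything that cannot finish within the request deadline belongs in a Task Queue task instead, as the answer suggests.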
In order to improve the execution speed of a Java program running on Google App Engine, can I create additional Java threads at runtime to make use of idle machines in the data center?
I've found conflicting data thus far.
If your primary concern is to improve the execution time, take a look at Memcache and Tasks. They can be used to reduce or avoid the latency of reading from or writing to the Datastore or other storage options, fetching URLs, sending emails, etc. If you do a lot of difficult computations that can run in parallel, look at the MapReduce API.
Once you remove all the delays from your program, there will be no reason to use multiple threads within a single request.
Note that App Engine instances can use multithreading to execute multiple requests at the same time, so they tend to use allocated resources efficiently. To enable it, see:
https://developers.google.com/appengine/docs/java/config/appconfig#Java_appengine_web_xml_Using_concurrent_requests
If you have a problem that calls for a multithreaded solution, you can use threads (as described in the link that you included in your question).
However, based on your reasoning ("to make use of idle machines in the data center"), it seems you're misguided. You should not use threads for that reason. You use the machine hours that you pay for and no more. The only time you will have an idle machine is if you tell App Engine to keep an extra idle machine around so that it doesn't have to start up a new one when your app gets a big usage spike.
Most of the time, unless you are truly doing parallel computation, you won't need to use multiple threads in App Engine. For instance, the datastore has an asynchronous API so that you can do multiple datastore operations in parallel without having to deal with threads yourself.
Does that make sense?
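To illustrate the point about asynchronous APIs with a plain-Java analogy: start several slow lookups at once with `CompletableFuture`, then collect the results, without creating or joining threads by hand. `slowLookup` is a hypothetical stand-in for a remote call such as a datastore read; on App Engine you would use the datastore's own asynchronous API rather than this code, since thread creation there is restricted.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration: issue several slow operations at once and wait
 * for all of them, instead of managing threads yourself.
 */
class AsyncLookups {
    // Stand-in for a slow remote call (assumption: ~100 ms of latency).
    static String slowLookup(String key) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-of-" + key;
    }

    static List<String> fetchAll(List<String> keys) {
        // Start every lookup first...
        List<CompletableFuture<String>> pending = keys.stream()
                .map(k -> CompletableFuture.supplyAsync(() -> slowLookup(k)))
                .collect(Collectors.toList());
        // ...then collect the results in the original key order.
        return pending.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

The design point is the same as with the datastore's async calls: the waiting overlaps, so total latency approaches that of the slowest call rather than the sum of all of them.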
I'm new here and not very experienced with CPU usage and multithreading, but I was wondering why my web app is consuming so much CPU. My program updates values in the background so that users don't have to wait for the data to be processed and only need to fetch it upon request. The updates are scheduled tasks using an executor that fires off 8 threads every 5 seconds to update my data.
Now I'm wondering why my application consumes so much CPU. Is it because of bad code, or because of a low-spec server? (2 cores, running 2 databases and 1 other major application alongside my web app)
Thank you very much for your help.
You need to profile your application to find out where the CPU is actually being consumed. Java has some basic profiling methods built in, or, if your environment permits it, you could run the built-in "hprof" profiler:
java -Xrunhprof ...
(In reality, you probably want to set some extra options: Google "hprof" for more details. Note that hprof was removed in JDK 9, so on newer JVMs you will need a different profiler.)
The latter is easier in principle, but I mention the possibility of adding your own profiling routine because it's more flexible, and you can use it, e.g., in a servlet environment where running another profiler is more cumbersome.
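One way to write your own profiling routine in plain Java is `java.lang.management.ThreadMXBean`, which can report CPU time per thread. The sketch below is a minimal version under that assumption; you might log `report()` periodically from the web app to see whether, say, the 8 updater threads dominate. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical do-it-yourself profiling helper based on ThreadMXBean.
 */
class CpuByThread {

    /** One line per live thread: name and CPU time consumed so far. */
    static List<String> report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return lines; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has died
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            lines.add(String.format("%-30s %8d ms", info.getThreadName(), cpuNanos / 1_000_000));
        }
        return lines;
    }

    /** CPU time (not wall-clock time) the calling thread spends running 'task'. */
    static long cpuMillisOf(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long before = mx.getCurrentThreadCpuTime();
        task.run();
        return (mx.getCurrentThreadCpuTime() - before) / 1_000_000;
    }
}
```

Comparing CPU time with wall-clock time for one update cycle also tells you whether the threads are burning CPU or mostly waiting on the databases.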
Paulo,
It is not possible for someone here to say whether the problem is that your code is inefficient or the server is under spec. It could be either or both of those, or something else.
You are going to need to do some research of your own:
Profile the code. This will allow you to identify where your webapp is spending most of its time.
Look at the OS-level stats that are available to you. This might tell you that the real problem is memory usage or disk I/O.
Look at the performance of the back-end database. Is it using a lot of CPU?
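For a quick in-process look at some of the same figures the OS tools show, the standard management beans expose the system load average and heap usage. A minimal sketch (class name hypothetical); tools such as `top` or `vmstat` will give a fuller picture, including disk I/O.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/**
 * Hypothetical helper exposing a few host-level numbers from inside the JVM.
 */
class JvmHostStats {

    /** Load average per core (roughly 1.0 means all cores busy), or -1 if unavailable. */
    static double loadPerCore() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // -1 on platforms without it, e.g. Windows
        return load < 0 ? -1 : load / os.getAvailableProcessors();
    }

    /** Heap currently in use, in MB; a heap near its maximum can cause GC-driven CPU load. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.printf("load per core: %.2f%n", loadPerCore());
        System.out.printf("used heap: %d MB of max %d MB%n",
                usedHeapMb(), Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```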
Once you have identified the area(s) where the CPU is being used, you need to figure out what the real cause of the problem is and work out how to fix it. Once you've implemented a potential fix, you can rerun your profiling, etc., to see whether it has helped.
We have a simple Java app that runs user requests (commands) on the command line and outputs the results. For example, a user might submit a job along the lines of:
cmd.exe /c ping -n 10 google.com
Typically we have around 100 users submitting hundreds of requests to be executed. Currently the Java app simply queues the jobs and runs them sequentially, without being "democratic" or "greedy" about it.
What we would like is the ability to prioritize jobs and to run jobs in a more "even" fashion (say 2 users have each submitted 100 jobs; we would like to run 1 job per user and switch back and forth).
To this end, I was wondering if there are any open-source tools, such as PBS:
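The "one job per user, switching back and forth" behavior can be sketched without any external tool as a round-robin over per-user FIFO queues. This is a minimal, hypothetical `FairJobQueue`, not an existing library, and it is not thread-safe as written.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical fair job queue: one FIFO per user, served round-robin.
 * Not thread-safe; guard it with a lock if several threads submit or take jobs.
 */
class FairJobQueue<J> {
    // Users in rotation order (insertion order); each has a non-empty queue.
    private final Map<String, Deque<J>> perUser = new LinkedHashMap<>();

    void submit(String user, J job) {
        perUser.computeIfAbsent(user, u -> new ArrayDeque<>()).addLast(job);
    }

    /** Takes the next job from the user at the front of the rotation. */
    Optional<J> next() {
        Iterator<Map.Entry<String, Deque<J>>> it = perUser.entrySet().iterator();
        if (!it.hasNext()) {
            return Optional.empty();
        }
        Map.Entry<String, Deque<J>> first = it.next();
        String user = first.getKey();
        Deque<J> jobs = first.getValue();
        J job = jobs.pollFirst();
        // Move this user to the back of the rotation, or drop them if done.
        it.remove();
        if (!jobs.isEmpty()) {
            perUser.put(user, jobs);
        }
        return Optional.ofNullable(job);
    }
}
```

With alice submitting a1, a2, a3 and bob submitting b1, b2, repeated calls to `next()` yield a1, b1, a2, b2, a3.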
http://code.google.com/p/pbs4java/
which could integrate with Java. A Google search does not reveal much; any comments or suggestions would be much appreciated.
UPDATE: The most important criteria I am after are:
[1] Should be open source
[2] Should be able to integrate with Java.
I suggest taking a look at Akka's dispatch and priority features. You decide how priorities are assigned to the tasks: in your case, a user who keeps submitting more tasks could see them degrade to lower priorities over time. The metrics should be defined by your problem's requirements and injected into the library.
As a side note, Akka uses HawtDispatch ideas, so you might also take a look at that library to see if it suits you.
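The priority-degradation idea can also be sketched with the JDK's `PriorityQueue` instead of Akka: each additional job from the same user gets a score one step worse, and ties run in submission order. The penalty rule and all names here are assumptions for illustration; Akka's own API is not shown.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical scheduler: heavy submitters gradually yield to other users.
 * Not thread-safe as written.
 */
class DecayingPriorityQueue {

    static final class Job {
        final String user;
        final String command;
        final int score;  // lower runs sooner
        final long seq;   // submission order, used to break ties (FIFO)

        Job(String user, String command, int score, long seq) {
            this.user = user;
            this.command = command;
            this.score = score;
            this.seq = seq;
        }
    }

    private final PriorityQueue<Job> queue = new PriorityQueue<>(
            Comparator.comparingInt((Job j) -> j.score).thenComparingLong(j -> j.seq));
    private final Map<String, Integer> submittedSoFar = new HashMap<>();
    private long nextSeq = 0;

    /** basePriority: lower runs sooner; each earlier job by the same user adds 1. */
    void submit(String user, String command, int basePriority) {
        // Number of jobs this user submitted before this one (0 for the first).
        int earlier = submittedSoFar.merge(user, 1, Integer::sum) - 1;
        queue.add(new Job(user, command, basePriority + earlier, nextSeq++));
    }

    /** Next job to run, or null when the queue is empty. */
    Job poll() {
        return queue.poll();
    }
}
```

This counter never resets; a real scheduler would more likely count only the user's currently queued or recently run jobs.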
A program I am working on takes forever to complete (~3 days, every time).
Is there some place on the internet where I can leave the code, have some robot run it for me, and come back later to collect the results? Some online judge that offers this capability?
[I am not talking about optimizations here.]
You may need to go to something like this:
"Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers."
http://aws.amazon.com/ec2/
If you really cannot run it on your own machines, you can run it on the Amazon EC2 cloud. You would need to maintain a virtual machine, but Amazon provides some preconfigured machine images.
Pricing starts at $0.085/hour (~$6 for three days: 72 hours × $0.085 ≈ $6.12). The actual price depends on how long you use the machine and how much CPU you need; higher CPU capacity is more expensive.