Numerical computing environment on cloud? [ Undergrad Project ] - java

I am a computer science undergraduate currently in my final year. As my final year project, I am thinking of creating a MATLAB-like numerical computing environment as SaaS that supports matrix manipulations, plotting of functions and data, image processing operations etc. The project is going to be created in Java + Scala. Scala will be used for the application's DSL. The rest of the application is going to be programmed in Java.
I was thinking of implementing this system on Google App Engine so that we could parallelize various algorithms across a number of servers and thus obtain faster results. However, I do not have any prior experience with web development (except some simple sites in PHP).
So I had the following key questions:
First of all, does it make sense to have an application like MATLAB hosted in the cloud?
How easy or difficult would it be to write such an application on Google App Engine, considering my limited experience with web development?
Can you please point me to some existing projects that parallelize mathematical, graph and image processing algorithms?
I know the question is very much subjective but I still request you all not to close it as I am very much confused regarding my project and need some expert advice.
Any help would be greatly appreciated!
Thanks!

About half a year ago I thought about making such a thing.
Thoughts ended up with nothing except some code at http://code.google.com/p/metaplasm...
In fact, the tricky thing with GAE is that computation must be sliced into thirty-second slices with no shared memory (only memcache and the database). Once you accomplish that, everything else will go smoothly :-)
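A minimal sketch of that slicing idea in plain Java, with no App Engine APIs involved. The class and method names are hypothetical, the per-slice budget is counted in steps rather than seconds to keep the example deterministic, and the `Checkpoint` object stands in for whatever state would be written to memcache or the database between requests.

```java
import java.io.Serializable;

final class SlicedSum {
    /** Runs at most maxSteps additions of i*i, resuming from cp, up to and including limit. */
    static Checkpoint runSlice(Checkpoint cp, long limit, long maxSteps) {
        long i = cp.next;
        long sum = cp.partial;
        long steps = 0;
        while (i <= limit && steps < maxSteps) {
            sum += i * i;
            i++;
            steps++;
        }
        return new Checkpoint(i, sum);
    }

    static boolean isDone(Checkpoint cp, long limit) {
        return cp.next > limit;
    }

    public static void main(String[] args) {
        long limit = 1000;
        Checkpoint cp = new Checkpoint(1, 0);
        int slices = 0;
        while (!isDone(cp, limit)) {       // each iteration models one separate request
            cp = runSlice(cp, limit, 300); // assumed budget: 300 steps per slice
            slices++;
        }
        System.out.println("slices=" + slices + " sum=" + cp.partial);
    }
}

/** State carried between slices; serializable so it could be stored outside the process. */
final class Checkpoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final long next;    // next integer still to be processed
    final long partial; // sum accumulated so far

    Checkpoint(long next, long partial) {
        this.next = next;
        this.partial = partial;
    }
}
```

Because each slice depends only on the checkpoint it receives, it does not matter which server instance picks up the next slice.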

App Engine probably isn't the right platform for this. App Engine is targeted at web applications where each request does a modest amount of computation, but you need to service a lot of them - most traditional webapps, such as social networking sites, blogs, web-based games, and so on and so forth. It isn't targeted at services that need to do intensive computation for a single user request, and while it has services to do parallel background processing, they're asynchronous, which is probably also not what you want for your use-case.
What I would recommend is looking at other cloud environments, such as Amazon's EC2, for the processing power and parallelism you need. App Engine would still do an admirable job as a frontend for such a service, though! For example, you could use an App Engine app to manage jobs, dispatch them to backends, and turn up and down VM instances as required by load.

This absolutely makes sense, and there are two existing projects that run numerical routines in the cloud: Biocep (free, runs R & Scilab on EC2 or Eucalyptus) and Monkey Analytics (commercial, runs R, Octave or Python on EC2).

Why not try BOINC, the open-source distributed computing system?
http://boinc.berkeley.edu/
It supports multiple platforms and multiple hosting environments, and it serves all kinds of numerical computation jobs that depend on parallel environments.
Moreover, you don't need any web development knowledge. You just need to create a new project in BOINC and try running it in an existing volunteer computing environment.

You might encounter issues with this type of service on GAE, as it's quite restrictive on what you are allowed to do in the sandbox. From the GAE docs:
"An App Engine application cannot: spawn a sub-process or thread. A web request to an application must be handled in a single process within a few seconds. Processes that take a very long time to respond are terminated to avoid overloading the web server."
This could make it tricky to offer the types of services you describe. The scaling that GAE offers enables you to grow the number of requests you can handle but doesn't really offer you good tools for scaling the CPU resources for a single request.
Sounds like an interesting idea for a project though, good luck.

It makes little sense to me to write the rest in Java. That's precisely where I think Scala would make the most difference.

I'm hosting my Java math online demo on Google App Engine. This non-parallelized demo of course hits the App Engine quota limits for time-expensive requests.
But with the help of the appengine-mapreduce library you can parallelize your mathematical algorithms and avoid these limits.
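To illustrate the kind of decomposition that makes such parallelization possible, here is a plain-Java sketch of a map step followed by a reduce step. It deliberately does not use the appengine-mapreduce API; the class name and the matrix example are assumptions chosen only to show that each row can be processed independently before the results are combined.

```java
import java.util.stream.IntStream;

final class MapReduceSketch {
    /** Map step: dot product of one matrix row with the vector, independent of all other rows. */
    static double rowDot(double[] row, double[] v) {
        double s = 0.0;
        for (int j = 0; j < row.length; j++) {
            s += row[j] * v[j];
        }
        return s;
    }

    /** Maps every row in parallel, then reduces the per-row results to a single sum. */
    static double sumOfProduct(double[][] m, double[] v) {
        return IntStream.range(0, m.length)
                .parallel()
                .mapToDouble(i -> rowDot(m[i], v)) // map
                .sum();                            // reduce
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2}, {3, 4}, {5, 6} };
        double[] v = { 1, 1 };
        System.out.println(sumOfProduct(m, v));
    }
}
```

On a single JVM the parallel stream spreads rows over local threads; on a cluster the same split would hand rows (or blocks of rows) to separate workers.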


Application Insights for Java: how to create a child unit of work

Out of the box, AI works very well when it comes to correlating web requests with their dependencies, events, etc.
But assuming I have a long running background job, I would like to split its tracking into smaller pieces, units of work or spans, all within the same parent (and this division could go further too). Just like that image
Can I somehow do it in code?
A new SDK called Microsoft.ApplicationInsights.WorkerService was released very recently, which is best suited for these types of background jobs and for non-HTTP workloads like messaging, console applications etc. This is the blog announcing the Worker service SDK, with detailed instructions on onboarding different types of supported scenarios with examples here.
However, since this is a .NET Core solution, I can check with our internal teams about the available options for Java applications and get back to you when I have more information. To pursue this further, please help me with the following additional details about your app environment:
What Azure services do you use in this application's ecosystem? (VMs / DBs / App Services?)
What is the framework used in code?
Are there any Load Balancers or PaaS services also involved? If yes, how is their performance measured? Where are their logs configured?
This information would certainly help us understand your setup better. Thanks!

Tools/techniques to Load/Soak test a deployed Java EE application?

I'm quite new to performance testing and am looking to be pointed in the right direction.
I have a Java project which contains two parts, deployed separately:
A service-broker, published as a webservice; which has service and db wrappers.
A front end, which has a service-broker facade, business logic and a Spring MVC UI.
It is deployed on tomcat, which is running on a fresh install of Windows server 2008.
I need to do basic soak testing on this project, to highlight major memory leaks and performance issues.
I've been told SOAP UI is the tool I need to do this.
Now for my questions:
Soap UI (Load UI) is only appropriate as the load-generator for testing the service-broker aspect of the project, right?
What additional tools would be helpful (Something to visualize garbage collection, memory use, heap/stack size etc?)
Can I use Load UI as a load generator for a Spring MVC Front end? If not, what's an appropriate alternative?
Thanks a lot.
Here are my opinions:
SoapUI is good enough for microbenchmark tests but not for large-scale testing, so I recommend using another load testing tool. LoadUI can be a solution, but I would recommend nGrinder. I have used it and it works very well. Apache JMeter is a common tool, but it is JVM-based, so JMeter itself needs tuning.
To monitor the application during performance testing, the easiest way is to use VisualVM. It can monitor everything you mentioned, but it only shows you data from the Java Virtual Machine's perspective. I would rather recommend an APM (Application Performance Monitoring) tool; AppDynamics would be a good solution.
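The heap and garbage collection figures that VisualVM displays can also be sampled in code through the standard `java.lang.management` beans. The sketch below is an illustration under assumptions: the class name is hypothetical, and it reports on the JVM it runs in, so for a Tomcat deployment it would have to run inside the web application or be replaced by a remote JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

final class JvmSnapshot {
    /** Returns one line describing current heap use and cumulative GC activity. */
    static String sample() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        return String.format(Locale.ROOT,
                "heapUsedMB=%d heapCommittedMB=%d gcCount=%d gcTimeMs=%d",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, gcCount, gcMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) { // during a soak test this loop would run for hours
            System.out.println(sample());
            Thread.sleep(1000);
        }
    }
}
```

A heap-used figure that keeps climbing across many collections during a long run is the usual first hint of a memory leak.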
About UX testing: the big difference is that it needs a record-and-play feature. You can do that with LoadUI, but nGrinder can also cover it by implementing the HTTP requests in code. (That is one reason why people use such expensive tools as LoadRunner.)
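As a rough idea of what "implementing the requests in code" amounts to, here is a tiny load driver written with the standard `java.util.concurrent` classes. It is not nGrinder's scripting API; `MiniLoad` and its methods are hypothetical names, and the request itself is passed in as a `Callable` so that the sketch stays independent of any particular HTTP client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MiniLoad {
    /** Totals for one run: successful calls, failed calls, summed latency. */
    static final class Result {
        final long ok;
        final long failed;
        final long totalNanos;

        Result(long ok, long failed, long totalNanos) {
            this.ok = ok;
            this.failed = failed;
            this.totalNanos = totalNanos;
        }
    }

    /** Runs one thread per simulated user; each thread issues its requests in sequence. */
    static Result run(int users, int requestsPerUser, Callable<Boolean> request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                Callable<long[]> task = () -> oneUser(requestsPerUser, request);
                futures.add(pool.submit(task));
            }
            long ok = 0, failed = 0, nanos = 0;
            for (Future<long[]> f : futures) {
                long[] r = f.get();
                ok += r[0];
                failed += r[1];
                nanos += r[2];
            }
            return new Result(ok, failed, nanos);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run aborted", e);
        } finally {
            pool.shutdown();
        }
    }

    private static long[] oneUser(int requests, Callable<Boolean> request) {
        long ok = 0, failed = 0, nanos = 0;
        for (int r = 0; r < requests; r++) {
            long start = System.nanoTime();
            boolean success;
            try {
                success = request.call();
            } catch (Exception e) {
                success = false; // any exception counts as a failed request
            }
            nanos += System.nanoTime() - start;
            if (success) ok++; else failed++;
        }
        return new long[] { ok, failed, nanos };
    }

    public static void main(String[] args) {
        // Placeholder action; a real test would send an HTTP request here and check the status.
        Result r = run(4, 25, () -> { Thread.sleep(2); return true; });
        double avgMs = (r.totalNanos / 1_000_000.0) / (r.ok + r.failed);
        System.out.println("ok=" + r.ok + " failed=" + r.failed + " avgMs=" + avgMs);
    }
}
```

Dedicated tools add what this sketch lacks: ramp-up profiles, distributed agents, recording, and reporting.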
I hope this will be useful to you.
Cheers
There are four sets of requirements you need to cover in a tool:
Can it exercise my interface? (Any HTTP test tool will do this for a web services application.)
Can it monitor my infrastructure? Now you are getting into the details of whether your underlying OS and virtual machine can be monitored in an integrated fashion. Not all tools allow for this, and you need to be very explicit as to the level of detail you are interested in.
Will it report appropriately for my requirements and in a way which allows for easier identification of system bottlenecks? This is a mix of objective and subjective items. You have not indicated what level of reporting you need.
Does my user community have the skills to use the tool? Get the top three right and miss this one and even a free as in beer tool goes to a negative ROI almost immediately.
It's time to button up the requirements or just hire a firm with a set of tools included to do the job.

(.Net + SQL Server +Azure ) vs (Java + Oracle +Google App Engine) which is better for this project specifically?

I have to develop an ERP System for a 2,000+ end users organisation.
Could you please suggest, with points of comparison, which technology (Java or .NET) I should invest money and time in? I have done some average-sized projects in both, but this project is going to be very big in the near future in terms of scalability.
I would like to hear your experiences and tips, so that I can develop and deploy this project efficiently.
I rate .Net > Java for this project only due to less development time available.
We have to use some Rapid App Development technology.
I have to deploy this on Cloud (Azure or Google App engine).
It would be best to get answers from people who have worked with both (.NET and Java).
I will appreciate answers from your experiences.
I would suggest creating a very small proof-of-concept project in both technologies that does something real - like allowing people to log in, see messages, type in new messages, and log out again.
Even if the project is laughably small, if you do it well, you will have a finished product on each platform which has shown you by experience how things work and whether you like the way you had to do them. You will be able to see if you can debug in the cloud, if you can profile when load testing, and if you can do fast work in-house which then works well when deployed to the cloud.
And you will need to figure out things. Are the online resources good? How responsive is the StackOverflow community for each platform when you ask questions?
Personally, I consider the ".NET is Windows-only" to be important. Except for that I do not believe there is any technical showstopper for either platform.
I think both approaches can be used to deliver this successfully. I would expect you to have the same amount of success/pain with either choice. When it comes to making a decision you should base it on the amount of expertise that you have to hand. That is, your own and that of your existing colleagues and the resources that you can acquire (new recruits, contractors, consultants etc.).
That said a couple of technical notes:
The Java approach tends to have more freedom, i.e. more frameworks and choice of technologies for various solutions (although GAE will bring in some restrictions).
There is less choice in the .NET space, but that is not always a bad thing. E.g. you tend not to end up in tireless debates about logging frameworks.
Java is starting to age as a language and C# is a bit nicer; however, there are a number of newer languages that run on the Java VM (Scala, Groovy, Ruby, Clojure).

Thinking in AppEngine

I'm looking for resources to help migrate my design skills from a traditional RDBMS data store over to the AppEngine DataStore (i.e. 'Soft Schema' style). I've seen several presentations and all touch on the overarching themes and some specific techniques.
I'm wondering if there's a place we could pool knowledge from experience ("from the trenches") on real-world approaches to rethinking how data is structured, especially porting existing applications. We're heavily Hibernate based and have probably travelled a bit down the wrong path with our data model already, generating some gnarly queries which our DB is struggling with.
Please respond if:
You have ported a non-trivial application over to AppEngine
You've created a common type of application from scratch in AppEngine
You've done neither of the above, but are considering it and want to share your own findings so far.
I'm wondering if there's a place we could pool knowledge from experience
Various Google Groups are good for that, though I don't know if any are directly applicable to Java-GAE yet -- my GAE experience so far is all-Python (I'm kind of proud to say that Guido van Rossum, inventor of Python and now working at Google on App Engine, told me I had taught him a few things about how his brainchild worked -- his recommendation mentioning that is now the one I'm proudest of among all those on my LinkedIn profile;-). [I work at Google but my impact on App Engine was very peripheral -- I worked on "building the cloud", cluster and network management SW, and App Engine is about making that infrastructure useful for third party developers].
There are indeed many essays & presentations on how best to denormalize and shard your data for optimal GAE scaling and performance -- they're of varying quality, though. The books that are out so far are so-so; many more are coming in the next few months, hopefully better ones (I had a project to write one of those, with two very skilled friends, but we're all so busy that we ended up dropping it). In general, I'd recommend the Google I/O videos and the essays that Google blessed in its app engine site and blogs, PLUS every bit of content from appenginefan's blog -- what Guido commended me for teaching him about GAE, I in turn mostly learned from appenginefan (partly through the wonderful app engine meetup in Palo Alto, but his blog is great too;-).
I played around with Google App Engine for Java and found that it had many shortcomings:
This is not general-purpose Java application hosting. In particular, you do not have access to a full JRE (e.g. you cannot create threads, etc.). Given this fact, you pretty much have to build your application from the ground up with the Google App Engine JRE in mind. Porting any non-trivial application would be impossible.
More pertinent to your datastore questions...
The datastore performance is abysmal. I was trying to write 5000 weather observations per hour -- nothing too massive -- but I could not do it because I kept running into timeout exceptions, both with the datastore and with the HTTP request. Using the "low-level" datastore API helped somewhat, but not enough.
I wanted to delete those weather observations after 24 hours so as not to fill up my quota. Again, I could not do it because the delete operation took too long. This problem in turn led to my datastore quota filling up. Insanely, you cannot easily delete large swaths of data in the GAE datastore.
There are some features that I did like. Eclipse integration is snazzy. The appspot application server UI is a million times better than working with Tomcat (e.g. nice views of logs). But the minuses far outweighed those benefits for me.
In sum, I constantly found myself having to shave the yak, in order to do something that would have been pretty trivial in any normal Java / application hosting environment.
The timeouts are tight and performance was ok but not great, so I found myself using extra space to save time; for example I had a many-to-many relationship between trading cards and players, so I duplicated the information of who owns what: Card objects have a list of Players and Player objects have a list of Cards.
Normally storing all your information twice would have been silly (and prone to get out of sync) but it worked really well.
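A small in-memory sketch of that duplicated many-to-many relationship, written in Java for illustration. The class and field names are hypothetical and no datastore API is used; the point is only that every change goes through one method that updates both copies together.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class Ownership {
    static final class Player {
        final String name;
        final Set<String> cardIds = new LinkedHashSet<>();   // copy 1: cards this player owns

        Player(String name) { this.name = name; }
    }

    static final class Card {
        final String id;
        final Set<String> ownerNames = new LinkedHashSet<>(); // copy 2: players owning this card

        Card(String id) { this.id = id; }
    }

    /** Records ownership on both sides so neither lookup needs a join-style query. */
    static void give(Player p, Card c) {
        p.cardIds.add(c.id);
        c.ownerNames.add(p.name);
    }

    /** Removes ownership from both sides; skipping one side would leave the copies out of sync. */
    static void take(Player p, Card c) {
        p.cardIds.remove(c.id);
        c.ownerNames.remove(p.name);
    }

    public static void main(String[] args) {
        Player alice = new Player("alice");
        Card dragon = new Card("dragon");
        give(alice, dragon);
        System.out.println(alice.cardIds + " " + dragon.ownerNames);
    }
}
```

Reading either side is then a single lookup, which is the space-for-time trade described above.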
In Python they recently released a remote API so you can get an interactive shell to the datastore so you can play with your datastore without any timeouts or limits (for example, you can delete large swaths of data, or refactor your models); this is fantastically useful since otherwise as Julien mentioned it was very difficult to do any bulk operations.
The non relational database design essentially involves denormalization wherever possible.
Example: Since BigTable doesn't provide enough aggregation features, the sum(cash) operation that you would use in the RDBMS world is not available. Instead, the sum has to be stored on the model, and the model's save method must be overridden to compute the denormalized sum field.
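The same idea sketched in Java, as an illustration only: `Account`, `addCash` and `save` are hypothetical names, and the `save` method stands in for whatever persistence hook the real model would override.

```java
import java.util.ArrayList;
import java.util.List;

final class Account {
    private final List<Long> cashEntries = new ArrayList<>(); // individual amounts, in cents
    private long cashTotal;                                   // denormalized sum(cash)

    void addCash(long cents) {
        cashEntries.add(cents);
    }

    /** Stand-in for the overridden save: recompute the stored total before persisting. */
    void save() {
        long total = 0;
        for (long cents : cashEntries) {
            total += cents;
        }
        cashTotal = total;
        // A real model would write the entity, including cashTotal, to the datastore here.
    }

    /** Reads the precomputed value; no aggregation query is needed at read time. */
    long storedTotal() {
        return cashTotal;
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.addCash(150);
        account.addCash(250);
        account.save();
        System.out.println(account.storedTotal());
    }
}
```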
Essential basic design that comes to mind is that each template has its own model where all the required fields to be populated are present denormalized in the corresponding model; and you have an entire signals-update-bots complexity going on in the models.

What are some of the major shifts in thinking required to become a good Rich Internet Application (RIA) developer?

I've been experimenting with Adobe Flex recently. Being a long-time server-side web app developer, I'm faced with difficulties that I last experienced when I dabbled in Java Swing development a long time ago. It mainly revolves around the flow of control between my code and the framework's code. Most things are asynchronous as to not freeze the UI.
So, I'm looking for all the seasoned developers out there who have seen it all to put into words the shifts in thinking required to make the transition from traditional web apps to RIAs.
Update: Moved the distracting parts to another question.
There are two models I'm seeing in the market right now:
Blended UI. The server is still involved in the UI construction effort, but a lot of it is offloaded to JavaScript. This is how a lot of the JavaScript toolkits work (except dojo, extjs, ...).
Separated concerns. The server is treated as a data storage and synchronization method only. The app runs entirely client-side, possibly even with local storage. This is how flex works.
I think we're going to be migrating towards the second model, because it means that you don't have to track UI state on the server, which dramatically simplifies the architecture. I've been toying with ExtJS and Flex, and the development experience is a lot like building a desktop app, only without the fancy drag-and-drop IDEs. It's hard to think of large differences between a three-tier desktop app and a web app in this fashion.
So my advice would be: stop thinking you're building web apps, always put into doubt whether something belongs on the server, because in the new model it often won't. Also, use gears or the browser cache effectively, because if your app is client-side, downloading all that code every time will be too slow.
Two pieces of advice:
Your server should never ever trust anything given to it by the client. Like any web app, data originating on the client can be compromised.
Visualise. That's the real (perhaps only) benefit of RIA: the ability to give rich interactive visualisations of data, that can be mixed in interesting ways. Make the most of it.
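To make the first piece of advice concrete, here is a hedged Java sketch of server-side validation. Everything in it is hypothetical (the class name, the price, the allowed range); it only illustrates that the server re-parses and range-checks client input and recomputes derived values itself instead of accepting the client's figures.

```java
final class OrderValidator {
    static final long UNIT_PRICE_CENTS = 250; // hypothetical price, known only to the server

    /** Returns the total the server will charge, or throws if the client input is unacceptable. */
    static long validatedTotal(String quantityField, long clientClaimedTotal) {
        if (quantityField == null) {
            throw new IllegalArgumentException("quantity is missing");
        }
        int quantity;
        try {
            quantity = Integer.parseInt(quantityField.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("quantity is not a number");
        }
        if (quantity < 1 || quantity > 100) {
            throw new IllegalArgumentException("quantity out of range");
        }
        // The client's total is never used for charging; it is only compared for diagnostics.
        long serverTotal = quantity * UNIT_PRICE_CENTS;
        if (serverTotal != clientClaimedTotal) {
            System.err.println("client total mismatch, using server value");
        }
        return serverTotal;
    }

    public static void main(String[] args) {
        System.out.println(validatedTotal("3", 1)); // the client claims a total of 1 cent
    }
}
```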
