Java Process Servers Good Idea or Not?

Just want to shout out to the community to see what people's thoughts are on Java process servers in general.
IBM in particular tends to make a lot of noise about WebSphere Process Server. I can see the idea behind process servers if you're working in a web service world, but in practice are they really effective or are they just overkill?
BPEL is another closely linked technology that tends to get a lot of hype from IBM, but I have yet to see an implementation in real life.
General thoughts welcome.

Some projects/companies do have complex business processes that involve many services, applications, and human interactions, for which using a BPM engine, its connectors, and its modeling tools can be justified. But this is clearly not for everybody.
Now, to use IBM Process Server, you'll need a license, you'll need an app server to deploy it on (at random, WebSphere), some (IBM) machines, maybe some expensive connectors, some licenses for the modeling tools, etc. So I'm not surprised that IBM makes noise about it (even if I don't really have the same feeling); selling such a solution must be a good deal for them (not even mentioning the consulting they will add to the bill).
And BPEL, which is a standardized language to describe flows as sequences of services consuming or producing XML messages, i.e. a generalization of BPM through XML and Web Services, is another brick that helps promote SOA a bit further, feeding the marketing soup. So, again, there is nothing surprising in the fact that software vendors try to promote it.
Conceptually, I don't think that BPM, BPEL, etc are bad ideas. But as I said, they are not for everybody. If they don't solve anything for you, then using them would be a bad idea. But this does not necessarily invalidate them as concepts.

IBM has multiple offerings now in this space.
The Lombardi acquisition and the heritage WPS are now merged as IBM Business Process Manager. There is also FileNet BPM, available from IBM, which is targeted at document-centric BPM solutions.
The Lombardi stack effectively uses BPMN while WPS uses BPEL as the orchestration mechanism.
The IBM/Oracle camp chose the BPEL path, while others such as Appian, Lombardi, and Pega came in using BPMN as the execution model for the business process.
Both of them are widely used and have a meaningful reason to exist.
HTH
Manglu

Related

Building long term application Architecture

I am interested in learning how web applications are evolving. The idea is that if new technologies or design methodologies should be introduced in a Java-based web application, what would be the top 5-10 technologies worth exploring? Also, it would be helpful if someone could point out good books or online resources for this research.

One key aspect is developer productivity. There has been some excellent research and presentations done in this space by Matt Raible (generally, the non-EE space) and Adam Bien for the EE6 space.

The ability of your application to successfully evolve over time has more to do with your software architecture than it has to do with either your technology or methodology. However, your choice of technologies and methodologies will influence how you architect your software.
First off, know what you are up against. In the late 1960s, people started studying what happens to applications over time. Over the last 40 years, those observations have been turned into a set of laws (cf. Meir Lehman). This stuff might seem obvious, but it is a good starting place:
As a system evolves, its complexity increases unless work is done to maintain or reduce it.
The functionality of a system must be continually increased to maintain user satisfaction over its lifetime. The quality of a system will appear to decline unless it is rigorously maintained and adapted to operational environment changes.
If you're in this for the long haul, the biggest questions are probably organizational and not technical. For example, what technologies does the development staff already know and enjoy using? If the developers plan to stick with the company for 5-10 years, ask what it is about the future that excites them. One of the best places to collect ideas about "hot" Web app technologies is http://www.infoq.com/.
Consider what methodologies are a good fit to both the technical and business culture of your organization. Agile development is great, but it isn't the right fit for every organization or every environment.
Consider vendors. I worked at a site once that was a true-blue IBM shop because IBM made solid software and hardware. However, the client was really locked into the vendor. The client was still using token ring networks and OS/2 in 1997. Give yourself some room to switch tools and technologies when the need arises. A living, breathing application almost never survives a decade of use without switching technology stacks at least once.
To really create a software design that will hold up to changes in the business environment, follow the old adage "build one to throw away". We once built a new system using a new operating system, a new programming paradigm, a switch from green-screen terminals to fat-client GUIs... it was a complete reinvention of the company's information technology. We would never have succeeded if we had not built a prototype and thrown it away. We didn't pick all the right technologies and methods the first time around, when we built the prototype. But we got the chance to correct those mistakes when we built the production system. This only works if you can create a prototype and then throw it away before it is used for real business needs. Once the application goes into production, the window to "throw one away" is gone.
Best of luck!
-Aaron

For all my quick and working web solutions I use Grails. It gives productivity, reasonable performance. It is supported by VMWare, so long term support looks OK.

Can ESB/BPM allow to totally get rid of coding apart from wrapping webservices?

In a big company I work for, a very costly ESB has been bought. The purpose is to be able to align with business goals quickly by reusing legacy infrastructure, wrapping it with web services, that is to say, no more coding needed. Are ESB/BPM products really mature enough for that, now that they are more than 10 years old, or is it just another vendor promise?

Almost certainly just a vendor promise. If this becomes a reality for your company, they'll be the first to be so lucky!
This is the same sales job being done again and again for over a dozen years now (remember 4GLs?).
Most companies find the reality is that 1) it takes far more effort to install and integrate the ESB/BPM tool than they were led to believe, 2) only the most trivial changes can be made with the tool; it still takes coders to perform any meaningful process change or addition, 3) whenever the ESB/BPM tool vendor upgrades their tool, it's a huge effort to upgrade and remain eligible for support (look into the history of any of these tools and what pains shops go through to upgrade, particularly webMethods and BEA/Oracle's products over the years), 4) support services are expensive and rarely provide help (I know of companies that have paid for premium support and filed dozens of tickets, only to have one or two of them resolved by the idiots on the phone before someone in-house finally found the solution or work-around themselves).

You certainly can use an ESB / BPM to wrap legacy infrastructure and facilitate a migration towards a more modern target architecture. In fact that's one of the best reasons to adopt an ESB/SOA strategy in a complex application environment.
However, it's a complete fallacy to say that this implies "no more coding needed". After all, you will need to orchestrate a potentially complex sequence of web services with detailed knowledge of the state and transactional semantics of the legacy systems. Another word for that is.... coding.
P.S. It may be too late for you now but, for the sake of others reading this, I feel obliged to point out that costly proprietary ESBs are often a waste of money. What you need can be done perfectly well (and sometimes even better!) by the open source solutions. JBoss and Mule spring immediately to mind. Since you are going to need to do most of the hard work in-house anyway, you might as well spend the time learning a great open source toolkit rather than locking yourself in to a vendor's proprietary solution.

building a high scale java app, what stack would you use?

If you needed to build a highly scalable web application using Java, what framework would you use and why?
I'm just reading Thinking in Java, Head First Servlets and Manning's Spring framework book, but really I want to focus on highly scalable architectures etc.
Would you use Tomcat, Hibernate, Ehcache?
(Just assume you have to design for scale; I'm not looking for 'worry about it when you get traffic' type responses.)

The answer depends on what we mean by "scalable". A lot depends on your application, not on the framework you choose to implement it with.
No matter what framework you choose, the fact is that the hardware you deploy it on will have an upper limit on the number of simultaneous requests it'll be able to handle. If you want to handle more traffic, you'll have to throw more hardware at it and include load balancing, etc.
The part that's pertinent in that case has to do with shared state. If you have a lot of shared state, you'll have to make sure that it's thread safe, "sticky" when it needs to be, replicated throughout a cluster, etc. All that has to do with the app server you deploy it to and the way you design your app, not the framework.
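To make the "thread safe" part of this concrete, here is a minimal sketch of shared state that many request threads can update at once. The class name HitCounter and its methods are made up for illustration; it only covers safety inside one JVM, so replication across a cluster would still be up to the app server or an external store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example class: counts hits per page across request threads.
final class HitCounter {
    private final ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();

    // Safe to call from many threads at once: computeIfAbsent is atomic per key,
    // and LongAdder accepts concurrent increments without an explicit lock.
    void record(String page) {
        hits.computeIfAbsent(page, k -> new LongAdder()).increment();
    }

    // Returns 0 for pages that were never recorded.
    long count(String page) {
        LongAdder adder = hits.get(page);
        return adder == null ? 0L : adder.sum();
    }
}
```

A plain HashMap with a long value would lose updates under the same load, which is the kind of bug that only shows up once traffic grows.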
Tomcat's not a "framework", it's a servlet/JSP engine. It's got clustering capabilities, but so do most other Java EE app servers. You can use Tomcat if you've already chosen Spring, because it implies that you don't have EJBs. Jetty, Resin, WebLogic, JBOSS, Glassfish - any of them will do.
Spring is a good choice if you already know it well. I think following the Spring idiom will make it more likely that your app is layered and architecturally sound, but that's not the deciding factor when it comes to scalability.
Hibernate will make your development life easier, but the scalability of your database depends a great deal on the schema, indexes, etc. Hibernate isn't a guarantee.
"Scalable" is one of those catch-all terms (like "lightweight") that is easy to toss off but encompasses many considerations. I'm not sure that a simple choice of framework will solve the issue once and for all.

I would check out Apache Mina. From the home page:
"Apache MINA is a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract, event-driven, asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO."
It has an HTTP engine AsyncWeb built on top of it.
A less radical suggestion (!) is Jetty - a servlet container geared towards performance and a small footprint.

The two keywords I would mainly focus on are Asynchronous and Stateless. Or at least "as stateless as possible". Of course you need state, but maybe, instead of going for a full-fledged RDBMS, have a look at document-centered datastores.
Have a look at Akka concerning async, and CouchDB or MongoDB as datastores...

Frameworks are more geared towards speeding up development, not performance. There will be some overhead with any framework because of use cases it handles that you don't need. Granted, the overhead may be low, and most frameworks will point you towards patterns that have been proven to scale, but those patterns can be used without the framework as well.
So I would design your architecture assuming 'bare metal', i.e. pure servlets (yes, you could go even lower level, but I'm assuming you don't want to write your own http socket layer), straight JDBC, etc. Then go back and figure out which frameworks best fit your architecture, speed up your development, and don't add too much overhead. Tomcat versus other containers, Hibernate versus other ORMs, Struts versus other web frameworks - none of that matters if you make the wrong decisions about the key performance bottlenecks.
However, a better approach might be to choose a framework that optimizes for development time and then find the bottlenecks and address those as they occur. Otherwise, you could spin your wheels optimizing prematurely for cases that never occur. But that probably falls in the category of 'worry about it when you get traffic'.

All popular modern frameworks (and "stacks") are well-written and don't pose any threat to performance and scaling, if used correctly. So focus on what stack will be best for your requirements, rather than starting with the scalability upfront.
If you have a particular requirement, then you can ask a question about it and get recommendations about what's best for handling it.

There is no framework that is magically going to make your web service scalable.
The key to scalability is replicating the functionality that is (or would otherwise be) a bottleneck. If you are serious about making your service scalable, you need to start with a good understanding of the characteristics of your application, and hence an idea of where the bottlenecks are likely to be:
Is it a read-only service or do user requests cause primary data to change?
Do you have / need sessions, or is the system RESTful?
Are the requests normal HTTP requests with HTML responses, or are you doing AJAX or callbacks or something?
Are user requests computation intensive, I/O intensive, rendering intensive?
How big/complicated is your backend database?
What are the availability requirements?
Then you need to decide how scalable you want it to be. Do you need to support hundreds, thousands, millions of simultaneous users? (Different degrees of scalability require different architectures, and different implementation approaches.)
Once you have figured these things out, then you decide whether there is an existing framework that can cope with the level of traffic that you need to support. If not, you need to design your own system architecture to be scalable in the problem areas.

If you are able to work with a commercial system, then I'd suggest taking a look at Jazz Foundation at http://jazz.net. It's the base for IBM Rational's new products. The project is led by the guys that developed Eclipse within IBM before it was open-sourced. It has pluggable DB layer as well as supporting multiple App Servers. It's designed to handle clustering and multi-site type deployments. It has nice capabilities like OAuth support and License management.

In addition to the above:
Take a good look at JMS (Java Message Service). This is a much underrated technology. There are vendor solutions such as TIBCO EMS, Oracle etc. But there are also free stacks such as ActiveMQ.
JMS will allow you to build synchronous and asynchronous solutions using queues. You can choose to have persistent or non-persistent queues.
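The decoupling a queue buys you can be sketched without a broker. The following is not the JMS API; it is an in-memory stand-in built on java.util.concurrent, roughly comparable to a non-persistent queue, and every name in it (OrderQueueDemo, run, the STOP sentinel) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Not JMS: an in-memory stand-in showing producer/consumer decoupling via a queue.
final class OrderQueueDemo {
    private static final String STOP = "__STOP__"; // sentinel telling the consumer to finish

    static List<String> run(List<String> orders) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        // Consumer: handles messages whenever they arrive (the asynchronous side).
        Thread consumer = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    processed.add("processed:" + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: enqueues and moves on without waiting for processing.
        for (String order : orders) {
            queue.add(order); // the queue is unbounded, so add() does not block
        }
        queue.add(STOP);

        try {
            consumer.join(); // after join, the consumer's writes to 'processed' are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumer", e);
        }
        return processed;
    }
}
```

With a real JMS provider the queue would live in the broker, so producer and consumer could sit in different processes or machines, and a persistent queue would survive a restart, which this sketch does not.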

As others have already replied, scalability isn't about what framework you use. Sure, it is nice to squeeze as much performance as possible out of each node, but what you ideally want is that by adding another node you scale your app in a linear fashion.
The application should be architected in distinct layers so it is possible to add more power to different layers of the application without a rewrite, and also to add caching at different layers. Caching is key to achieving speed.
One example of layers for a big webapp:
Load balancers (TCP level)
Caching reverse proxies
CDN for static content
Front end webservers
Appservers (business logic of the app)
Persistent storage (RDBMS, key/value, document)
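As a small illustration of caching inside the appserver layer, here is a sketch of a bounded least-recently-used cache built on the JDK's LinkedHashMap. The class name LruCache is made up, and as written it is not thread-safe, so a real appserver would need to wrap it or use a dedicated cache library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-process cache: evicts the least recently used entry
// once more than 'capacity' entries are stored. Not thread-safe as written.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder=true: entries are ordered by last access
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

For example, with a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.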

Why is Java EE scalable?

I have heard from various sources that Java EE is highly scalable, but to me it seems that you could never scale a Java EE application to the level of the Google search engine or any other large website.
I would like to hear the technical reasons why it is so scalable.

Java EE is considered scalable because if you consider the EJB architecture and run on an appropriate application server, it includes facilities to transparently cluster and allow the use of multiple instances of the EJB to serve requests.
If you managed things manually in plain-old-java, you would have to figure out all of this yourself, for example by opening ports, synchronizing states, etc.
I am not sure you could define Google as a "large website". That would be like likening the internet to your office LAN. Java EE was not meant to scale to the global level, which is why sites like Amazon and Google use their own technologies (e.g., with use of MapReduce).
There are many papers discussing the efficiency of Java EE scalability.
For example this

What makes Java EE scalable is what makes anything scalable: separation of concerns. As your processing or IO needs increase, you can add new hardware and redistribute the load semi-transparently (mostly transparent to the app, obviously less so to the configuration monkeys) because the separated, isolated concerns don't know or care if they're on the same physical hardware or on different processors in a cluster.
You can make scalable applications in any language or execution platform. (Yes, even COBOL on ancient System 370 mainframes.) What application frameworks like Java EE (and others, naturally -- Java EE is hardly unique in this regard!) give you is the ability to easily (relatively speaking) do this by doing much of the heavy lifting for you.
When my web app uses, say, an EJB to perform some business logic, that EJB may be on the same CPU core, on a different core in the same CPU, on a different CPU entirely or, in extreme cases, perhaps even across the planet. I don't know and, for the most part, provided the performance is there, I don't care. Similarly when I send a message out on the message bus to get handled, I don't know nor do I care where that message goes, which component does the processing and where that processing takes place, again as long as the performance falls within my needs. That's all for the configuration monkeys to work out. The technology permits this and the tools are in place to assess what pieces have to go where to get acceptable performance as the system scales up in size.
Now when I try and hand roll all of this, I start with the problems right away. If I don't think about all the proxying and scheduling and distribution and such in advance, when my app expands beyond the bounds of a single machine's handling I now have major rewrites in place as I shift some of the application to another box. And then each time my capacities grow I have to do this again and again.
If I do think about all of this in advance, I'm writing a whole lot of boilerplate code for each application that does minor variations of all the same things. I can code things in a scalable way, but do I want to do this every. damned. time. I write an app?
So what Java EE (and other frameworks) bring to the table is pre-written boilerplate for the common requirements of making scalable applications. Writing my apps to these doesn't guarantee they'd be scalable, of course, but the frameworks make writing said scalable apps a whole lot easier.

One could look at a scalable architecture from the point of view of what the base framework (like Java EE) provides. But that's just the beginning.
Designing for a scalable infrastructure is an architectural art. It's like the art of projection ... how will it behave when it's blown up real big. The base questions are:
Where do I keep commonly accessed stuff so that when so many persons are asking for it, I don't have to go for it so many times (cache)?
Where do I keep each individual's stuff so that when there are so many individuals needing stuff kept, I won't have trouble managing them all?
How do I remember what a person did here the last time they came here, since they may not be coming back to the same particular node they visited the last time?
How long will I have to wait for (block on) a long-running procedure if so many persons are requesting it?
...
that sort of thing is beyond what a framework can wrap. In other words, the framework could be scalable but the product is wired too tight to scale.
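On the last question, blocking on a long-running procedure, one common approach is to hand the work to a separate thread pool so the request thread is not tied up. A minimal sketch with the JDK's CompletableFuture follows; AsyncReport, buildReport and the 50 ms sleep are placeholders invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

final class AsyncReport {
    // Hypothetical slow operation standing in for a long-running procedure.
    static String buildReport(String customer) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "report for " + customer;
    }

    // Returns immediately; the work runs on the supplied pool, not on the caller's thread.
    static CompletableFuture<String> buildReportAsync(String customer, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> buildReport(customer), pool)
                .thenApply(String::toUpperCase);
    }
}
```

A caller would pass in a pool such as Executors.newFixedThreadPool(n) and attach a callback, or call join() only at the point where the result is really needed. The pool size then becomes an explicit limit on how many slow jobs run at once, which is a design decision the framework cannot make for you.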
Java EE, as a framework, is quite scalable, like most modern microprocessor-targeting enterprise frameworks. But I have seen amazing (not in a good way) stuff built out of even the best of them.
For a plethora of references, please search Google for "Designing for Scalability"

The "scalability" thing talks about "what will you do when your application doesn't fit in a single computer anymore?".
Scalable applications can grow over more computers than one.
Note that large servers can have VERY large applications with lots of memory and lots of cpu's - see http://www.sun.com/servers/highend/m9000/ or http://www-03.ibm.com/systems/i/hardware/595/index.html - but it is usually more expensive than having lots of small servers with the application spreading over them.

Is Web Service suitable for ETL purpose?

My company is considering using a web service as the means for an ETL process. However, I don't think a web service fits this purpose, for several reasons:
1. web service could possibly consume a lot of memory when generating large xml.
2. xml is a bloated format.
3. possibly time-out if the server takes huge amount of time to generate data
4. file size limitation? (for windows, it's 2Gb, if my memory serves me right)
I am not a web service expert, so I need your opinions. :)
Thanks.

There are plenty of technologies in the Web Services tool shed that circumvent all the problems you list. There is stream-oriented XML shredding, there are XML compression formats for delivery, there are protocols that deal with fragmentation and fairness, and there are many storage systems that can hold terabytes upon terabytes of data.
If by web service you imagine some college freshman homework concoction of an interface that accepts a single glop argument with a 2GB serialized table in it, then all your arguments are valid. But if you give your requirements to an experienced team with knowledge of the concepts involved in WS-ReliableMessaging and WS-Transaction, then there is no reason not to have an ETL process around Web Services. Note that I do not advocate the SOAP protocols per se, but I do advocate knowledge and understanding of the concepts involved.
Now that being said, whether a Web Service oriented ETL process makes sense for you or not depends on a whole set of other reasons. However, your rebuttal of the Web Service technologies does not hold water.
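To illustrate what stream-oriented XML processing looks like in practice, here is a small sketch using StAX (javax.xml.stream, part of the JDK). It reads one parse event at a time, so memory use does not grow with document size. The element name "row" and the attribute "amount" are invented for the example, as is the class name RowStreamer.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RowStreamer {
    // Sums the 'amount' attribute of every <row> element, one parse event at a time.
    // Assumes every <row> carries a numeric 'amount'; a missing one raises NumberFormatException.
    static long sumAmounts(Reader xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            long total = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    total += Long.parseLong(reader.getAttributeValue(null, "amount"));
                }
            }
            reader.close();
            return total;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML stream", e);
        }
    }
}
```

The same loop works whether the Reader wraps a short string or a multi-gigabyte HTTP response body, since nothing beyond the running total is retained.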

I would not use a web service for an ETL task. There are specialized tools for that task (e.g., Ab Initio, Informatica, etc.) that are better suited.
If you have a large amount of data, I'd say that the price of the extra latency that the network would introduce would be prohibitive.

It really does depend on what you are doing and how you are trying to accomplish it. In general webservices require more care and feeding than you would normally put into an ETL process, but they can be surprisingly effective at the task as well. I did not get enough specifics for your scenario to say whether it would work.
I have worked on web services which transmit and receive 100+ MB documents, some encoded in XML and some not, and do it in seconds (on a closed local network). These services required a good deal of tuning and planning, but they did work well for our scenario, and they allowed a wide variety of clients to connect and transmit differing amounts of data through a fairly standard interface. This differed from some of the other ETL jobs we had, where the job was specific to each client and had to be set up and maintained for each client.
It all depends on what you are doing and what your constraints are.
If you are going to pursue this route sit down and draft out the process from beginning to end, including how you want clients to connect, verify that the data was received and verify that the job is finished. Consider some of the scenarios, the clients and the types of data being transmitted and then work out what would be needed. Contrast that with what is already available in other tools, and how much time you have to get it done.

I'm really wondering why your company is not considering using a real ETL tool like those mentioned by duffymo in his answer, or Talend or CloverETL if open source is an option.
They are in general good for ETL purposes :)
Building your own solution sounds like reinventing the wheel.
Many of them have web services oriented features (see Export a job as webservice in Talend's wiki or CloverETL Server HTTP Launch Services for example).
I'm not an ETL product expert and I didn't check them all but I'm pretty sure this is something to consider.

Look up MTOM, to start with, which allows arbitrary non-XML data to be streamed in a web service.

Web services are just fine for ETL tasks. Remember that each task is going to get handled in its own thread for free, and you're guaranteed proper cleanup between requests. Using web services inside something like Tomcat wouldn't be nearly as heavy as you think.
If you're concerned over the bloat of XML, consider JSON format.
