Is it Possible to Convert Existing Java Projects into Spring-Batch Jobs

Is it Possible to Convert Existing Java Projects into Spring-Batch Jobs - java

My question without the fluff
Is Spring-Batch the right tool for converting a handful of one-off (one and done type) java projects that tend to interact with a database at some point, into their own separate "Jobs" that are part of a Spring-Batch project?
With Fluff/Background
The company I work for has several "one off" java projects that really only do one thing (either at some arbitrary time, or just whenever we tell it) and then they are done. They tend to interact with the database at some point in their lifecycle, but not necessarily at the same time.
There are a couple more projects/scripts that I'm tasked with creating, but instead of having nearly 10 different perl scripts and a few handful of jars out on our box, we would like to maybe stuff all these projects in one place and just be ran when we need to run them (ideally on a remote box via command line args). From the research I've done on Spring-Batch, it kind of sounds like exactly what we are looking for, especially the functionality described here.
I'm moderately familiar with Spring-Boot and the Spring Framework, but for some reason reading up on Jobs and the setup, the domain language seems a little foreign to me, and I just want to make sure I'm not wasting my time trying to figure this out if it's not realistic. We would like this to just be one cozy place for any current or future projects that are ran independently so that we minimize clutter on our box. Just to clarify, I keep saying projects, but from what I understand, these projects need to be converted to Jobs if they are to be part of a Spring-Batch project. Is this realistic?

(converting comments to an answer)
If things like task scheduling, transaction management, consuming data in chunks, flexibility on start/stop operations, retrying mechanisms are things you find yourself spending time into coding yourself then yes definitely take a look at Spring Batch it has robust already-implemented facilities of all of the above.
On the grander scheme of things if your application has many moving parts, consuming a lot of external resources you might as well dive into the EIP (Enterprise Integration Pattern) waters with Apache Camel and Spring Integration very solid implementations.
By your description your one-off project(s) are reasonable single-focused so the learning curve of new frameworks might not worth the while. In that case it might suffice to focus on the re-usable components of your projects and externalize them on a core/helper lib for re-usabilitly purposes if that makes sense.

Related

independent prototyping with java

I am new to java (well I played with it a few times), and I am wondering:
=> How to do fast independent prototypes ? something like one file projects.
The last few years, I worked with python. Each time I had to develop some new functionality or algorithm, I would make a simple python module (i.e. file) just for it. I could then integrate all or part of it into my projects. So, how should I translate such "modular-development" workflow into a java context?
Now I am working on some relatively complex java DB+web project using spring, maven and intelliJ. And I can't see how to easily develop and run independent code into this context.
Edit:
I think my question is unclear because I confused two things:
fast developement and test of code snippets
incremental development
In my experience (with python, no web), I could pass from the first to the second seemlessly.
For the sake of consistency with the title, the 1st has priority. However it is good only for exploration purpose. In practice, the 2nd is more important.

Definitely take a look at Spring Boot. Relatively new project. Its aim is to remove initial configuration phase and spin up Spring apps quickly. You can think about it as convention over configuration wrapper on top of Spring Framework.
It's also considered as good fit for micro-services architecture.
It has embedded servlet container (Jetty/Tomcat), so you don't need to configure it.
It also has various different bulk dependencies for different technology combinations/stacks. So you can pick good fit for your needs.

What does "develop and run independent code in this context" mean?
Do you mean "small standalone example code snippets?"
Use the Maven exec plugin
Write unit/integration tests
Bring your Maven dependencies into something like a JRuby REPL

Share data between Java EE servers

What products/projects could help me with the following scenario?
More than one server (same location)
Some state should be shared between server (for instance information if a scheduled task is running and on what server).
The obvious answer could of course be databases but we are using Seam and there doesn't seem to be a good way to nest transactions inside a Seam-bean so I need to find a way where I don't have to go crazy over configuration (tried to use EJB:s but persistence.xml wasn't pretty afterwards). So i need another way around this problem until Seam support nested transactions.
This is basically the same scenario as I have if you need more details: https://community.jboss.org/thread/182126.
Any ideas?

Sounds like you need to do distributed job management.
The reality is that in the Java EE world, you are going to end up having to do Queues, as in MoM [Message-oriented Middleware]. Seam will work with JMS, and you can have publish and subscribe queues.
Where you might want to take a look for an alternative is at Akka. It gives you the ability to distribute jobs across machines using an Actor/Agent model that is transparent. That's to say your agents can cooperate with each other whether they are on the same instance or across the network from each other, and you are not writing a ton of code to make that happen, or having to special handle things up and down the message chain.
The other thing Akka has going for it is the notion of Supervision, aka Go Ahead and Fail, or Let it Crash. This is the idea (followed by the Telcos for years), that systems will fail and you should design for it and have a means of making things resilient.
Finally, the state of other options job wise in the Java world is dismal. Have used Seam for years. It's great, but they decided to just support Quartz for jobs, which is useless.
Akka is built on Netty, too, which does some pretty crazy stuff in terms of concurrency and performance.
[Not a TypeSafe employee, btw…]

Ripping out Hibernate/Mysql for MongoDB or Couch for a Java/Spring/Tomcat web application

I have an application that is undergoing massive rework, and I've been exploring different options - chug along 'as is', redo the project in a different framework or platform, etc.
When I really think about it, here are 3 major things I really dislike about java:
Server start/stops when modifying controllers or other classes. Dynamic languages are a huge win over Java here.
Hibernate, Lazyloading exceptions (especially those that occur in asynchronous service calls or during Jackson JSON marshalling) and ORM bloat in general. Hibernate, all by itself, is responsible for slow integration start up times and insanely slow application start up times.
Java stupidity - inconsistent class-loading problems when running your app inside of your IDE compared to Tomcat. Granted once you iron out these issues, you most likely won't see them again. Even still, most of these are actually caused by Hibernate since it insists on a specific Antlr version and so on.
After thinking about the problem... I could solve or at least improve the situation in all 3 of these areas if I just got rid of Hibernate.
Have any of you reworked a 50+ entity java application to use mongo or couch or similar database? What was the experience like? Do you recommend it? How long did it take you assuming you have some pretty great unit/integration tests? Does the idea sound better than it really is?
My application would actually benefit in many areas if I could store documents. It would actually open up some very cool and interesting features for this application. However, I do like being able to create dynamic queries for complex searches... and I'm told that Couch can't do those.
I'm really green when it comes to NoSQL databases, so any advice on migrating (or not migrating) a big java/spring project would be really helpful. Also, if this is a good idea, what books would you recommend I pick up to get me up to speed and really make use of them for this application in the best way possible?
Thanks

In any way, your rant doesn't just cover problems with the previously made (legacy) decision for Hibernate but also with your development as a programmer in general.
This is how I would do it, should a similar project be dropped in my lap and in dire need of refactoring or improvement.
It depends on the stage in your software's lifetime and the time pressure involved if you should make big changes or stick with smaller ones. Nevertheless, migrating in increments seems to be your best option in the long term.
Keeping the application written in Java for the short term seems wise, a major rewrite in another language will definitely break acceptance and integration tests.
Like suggested by Joseph, make the step from Hibernate to JPA. It shouldn't cost too much time. And from there you can switch the back-end to some other way of storage. Work towards a way of seperating concerns. Pick whatever concept seems best, some prefer MVC while others might opt for CQRS and still others adore another style of segmentation/seperation.
Since the JVM supports many languages, you can always switch to any of those or at least partially implement functionality in more dynamic languages. This will solve part of the problem where you keep bumping into the "stupidity" of Java, while still retaining the excellent optimizations of current JVMs at runtime.
In addition, you might want to set up automatic integration tests... since the application will hopefully never be run from your IDE, these tests will give you honest results.
Side note: I never trust my IDE to get dependencies right if the IDE has capabilities to inject its own libraries into my build or runtime path.
So to recap in short: small steps; lose Hibernate and go more abstract to JPA; if Java becomes stupid, then gradually switch to a clever language. Your primary concern should be to restructure the code base without losing functionality, keeping in mind to have an open design which will make adding interesting and cool features easier later on.

Well, much depends on things like "what exactly are the pain points with Hibernate?" (I know, you gave three examples...)
But those aren't core issues over the long haul. What you're running into is the nature of a compiled language vs. a dynamic one; at runtime, it works out better for you (as Java is faster and more scalable than the dynamic languages, based on my not-quite-exhaustive tests), but at development time, it's less amenable to just hacking crap together and hoping it works.
NoSQL isn't going to fix things, although document stores could, but there's a migration step you're going to have to go through.
Important: I work for a vendor in this space, which explains my experience in the area, as well as the bias in the next paragraph:
You're focusing on open source projects, I suppose, although what I would suggest is using a commercial product: GigaSpaces (http://gigaspaces.com). There's a community edition, that would allow you to migrate JPA-based java objects to a document model (via the SpaceDynamicProperties annotation); you could use JPA for the code you've written and slowly migrate to a fully document-oriented model at your convenience, plus complex queries aren't an issue.

All of those points are usually causing problems due to incompetence, rather than hibernate or java being problematic:
apart from structural modifications (adding fields or methods), all changes in the java code are hot-swapped in debug mode, so that you can save & test (without any redeploy).
the LazyInitializationException is a problem for hibernate-beginners only. There are many and clear solutions to it, and you'll find them with a simple google or SO search. And you can always set your collections to fetch=FetchType.EAGER. Or you can use Hibernate.initialize(..) to initialize lazy collections.
It is entirely normal for a library to require a specific version of another library (the opposite would be suspicious and wrong). If you keep your classpath clean (for example by using maven or ivy), you won't have any classloading issues. I have never had.
Now, I will provide an alternative. spring-data is a new portfolio project by springsource, that allows you to use your entities for a bunch of NoSQL stores.

any experience with "Play" java web development framework? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I've just stumbled upon the following new java web framework: Play
http://www.playframework.org/
http://www.playframework.org/documentation/1.0/home
with such a stunning list of features, I'm pretty much surprised I haven't heard of it before...
Sounds like the java web development promised land...
has anybody tried it? any real experience with it? do you think it's worth studying it?

I agree with Jason that Play might just prove to be better than Grails. With four Grails projects under my belt (preceded by two Tapestry projects and one Wicket project), I'm seriously looking at Play next.
One of the things I thought was cool about Grails is that "everything's Groovy." That is, you use Groovy to write everything (except the HTML and the CSS) -- domains, controllers, services, page templates (GSP), tag libraries, Hibernate API (GORM), unit tests (GUnit), and build scripts (GANT). You can even write shell scripts in Groovy. So, being able to code all aspects of an app using a single language again seemed like a simplification that was long overdue -- hearkening back to the days of writing desktop apps in a single language like C++ or Delphi. However, I've learned that one size does not fit all here.
For one, the IDE support for Groovy isn't great. IntelliJ does the best job, but with Groovy being dynamic, it can only go so far. The refactoring tools do not (cannot) catch everything, so you can't trust them 100%. This means you have to be especially vigilant with unit testing. Here again, because Grails relies so much on dynamic "magic" that happens at runtime, the unit testing in Grails must rely on an extensive mocking layer to emulate it, and that mocking layer is quirky. A third issue is that much of the so-called Groovy code that you're writing is actually domain-specific-language (DSL) code. (To make a long story short, DSLs are short-hand Groovy, taking advantage of the fact that in Groovy and lot of the syntax is optional.) Grails uses different DSLs for various configurations, URL mapping, etc. and it is inconsistent. How you specify log4j settings, for example, looks nothing like how you specify the data sources, and neither looks like the pure Java upon which Groovy is based. So, the promise of "everything's Groovy" falls apart anyway.
That being the case, I see where the Play team is coming from.
Going back to regular Java for the domains, controllers, services, and JUnits makes sense. Strong typing means the IDE can reliably help with inteli-sense, code navigation, refactoring, etc. (And thus you don't need to pay for IntelliJ if you're happy with Eclipse.) Having to write more verbose code in order to gain back strong tool support seems like a good deal to me right now. We'll see.
I like that I still get to use Groovy in the page templates. I'm afraid I may end up putting more code in the templates than I should, though.
I have no experience with JPA, but it seems like it's pretty close to what GORM does for me, so that's cool.
The Spring IOC support in Grails is completely transparent whereas Play's support seems minimal; however, I think that IOC is way overused and I'm perfectly willing to hand code a Spring XML mapping on the rare occasion that I really need one. (One of my open questions is that I'm assuming that JPA has transaction support which is why Play doesn't need Spring for that like Grails does, no?)
I've never been a fan of Python, so I cringed when I read that Play uses Python for its build scripts. But I agree that Grails' GANT scripts run pretty slow. Plus I find that, while GANT is a huge improvement over XML ANT, it's still hard to wrap your head around the ANT concepts. The Grails GANT scripts are pretty convoluted. So, I'll go in to it with an open mind.
The Play "application module" model sounds to be just like Grails' "plugin" model, so that's cool.
I'm quite impressed with the Play documentation that I've read so far. I had a huge number of questions going in, but half of them were answered right off the bat.
I'll report back again later as I dive deeper in.

I've tried Play and I'm impressed: it does a great job of delivering a useful development model that is far simpler than most frameworks'. More than anything else, the runtime's ability in 'development mode' to parse .java files directly is worth a lot: just reloading the web page in the browser without running a build script or waiting for a redeployment is worth a lot of development speed. The error messages shown in the browser are really good too.
Another thing that impressed me was the overall aesthetic: it is perhaps a small thing that the tutorial application actually looks good (both the code and the web page design), but this extends to the whole framework, the API as well as the documentation.

After prodding from a colleague I looked at it, followed the tutorial, and got hooked. Getting immediate feedback right in your browser means you don't have to use an IDE. I love Eclipse, but let's face it: after you've added some extras, it's not as stable as a simple text editor. On a Mac with TextMate you can even click on the error message in your browser and TextMate pops up with the cursor on that line.
Testing in Play is also nicely done, with one button press you run unit tests, functional tests and Selenium-based tests.
Play is exciting because it's still small and uncomplicated. It uses just ant to build and does so in 25 seconds. Contributing to the beautiful documentation is a matter of editing the .textile files and reloading the docs in any play app.
That's how I wound up on a quest to translate the tutorial to use Scala, adding to the Scala support where needed to get it as nice as possible.

I like it, I'm using it for small projects and so far it looks perfect for the job.
However, there's one thing I miss very much that's been left out on purpose: Service/DAO/Model layers separation! Documentation says it clearly, one of the goals of Play is to avoid the "Anemic data model":
http://www.playframework.org/documentation/1.0.1/model
but in my experience the classical Service/DAO/Model layers separation saves tons of development time when the application needs to be refactored! With Play you're stuck with static methods that rely on Play-specific transaction management and peculiarities...
However, many thumbs up for: development speed, code cleanness, and in the end... fun!

I've used Grails, Tapestry 4/5 and straight Java/JSP/Spring/Hibernate.
I think this is going in the right direction for the first time in a long time. Grails was a really good first step but Play! looks like something that could really have legs. Scala support is coming in 1.1. If there is a chance I can write my controllers/domain in Clojure, I'm sold ;)

Since one year and no visible bug after 18 small releases, we use Play! 1.2.4 in a production "absences" intranet application for a school (actors: >100 teachers, > 700 students, administrativ team). Client side has been written with FLEX 4.6 from Adobe (very beautiful views). Data are send and receive in AMF3 format (Cinnamon module). We use a own simple dao layer based on JPA EclipseLink and MySql for the DB. Application is stored on a Linux virtual server. I'm a very fan developer of Play for its simplicity and its very productive approach.

I like the look of Play, but haven't tried it. From scanning through the docs one thing that stood out was the heavy use of static methods. From a unit testing point of view this always makes things much harder (I'm thinking mocks), and is a departure from the OO-everywhere approach in typical Java development. Maybe this is the point, but it's just something that made me a little less enthused...

I currently build web applications at work using play framework which does massive data processing. I must say that the speed that play offers alone is significant and more than what RoR can provide. Besides, play is a java based framework and hence Multi-Threading can be done easily. Next is the sheer performance you get when you use java-based modules like Japid and Netty along with play.It's like an endless amount of tweaking can be done for performance. A must try in my opinion.

I'm using Play in a small project, and seems to be exactly what they've said about. But one feature I think should to be present by default in the framework: ability to work with more than one datasource (e.g. use more than one database schema). This is the only missing feature I've found until now.
Regards,
Uilian.

What is the difference between Hudson and CruiseControl for Java projects?

I think the title sums it up. I just want to know why one or the other is better for continous integration builds of Java projects from Svn.

I agree with this answer, but wanted to add a few points.
In short, Hudson (update: Jenkins) is likely the better choice now. First and foremost because creating and configuring jobs ("projects" in CC vocabulary) is just so much faster through Hudson's web UI, compared to editing CruiseControl's XML configuration file (which we used to keep in version control just to keep track of it better). The latter is not especially difficult - it simply is slower and more tedious.
CruiseControl has been great, but as noted in Dan Dyer's aptly-named blog post, Why are you still not using Hudson?, it suffers from being first. (Um, like Britain, if you will, later into the industrial revolution, when others started overtaking it with newer technologies.)
We used CruiseControl heavily and have gradually switched over to Hudson, finally using it exclusively. And even more heavily: in the process we've started using the CI server for many other things than before, because setting up and managing Hudson jobs is so handy. (We now have some 40+ jobs in Hudson: the usual build & test jobs for stable and development branches; jobs related to releasing (building installers etc); jobs that run some (experimental) metrics against the codebase; ones that run (slow) UI or integration tests against a specific database version; and so on.)
From this experience I'd argue that even if you have lots of builds, including complicated ones, Hudson is a pretty safe choice because, like CC, you can use it to do anything, basically. Just configure your job to run whatever Ant or Maven targets, Unix shell scripts, or Windows .bat scripts, in the order you wish.
As for the 3rd party stuff (mentioned here by Jeffrey Fredrick) - that is a good point, but my impression is that Hudson is quickly catching up, and that there's already a very large number of plugins available for it.
For me, the two things I can name that I miss about CruiseControl are:
Its warning emails about broken builds were more informative than those of Hudson. In most cases the root cause was evident from CC's nicely formatted HTML mail itself, whereas with Hudson I usually need to follow the link to Hudson web UI, and click around a little to get the details.
The CruiseControl dashboard is better suited, out of the box, as an "information radiator" (shown on a public monitor, or projected on a wall, so that you can always quickly see the status of all projects). With Hudson's front page, we needed some Greasemonkey tricks to get job rows all nicely green/red.
Minor disclaimer: I haven't been following the CC project closely for the last year or so. (But from a quick look, it has not changed in any dramatic way.)
Note (2011-02-03): Hudson has been renamed/forked as Jenkins (by Hudson creator Kohsuke Kawaguchi and others). It looks as if Oracle—which controls the Hudson name—will keep "Hudson" around too, but my personal recommendation is to go with Jenkins, no matter what Oracle says.

As a long time CruiseControl committer and someone who has never used Hudson I'm pretty biased, but my take on it is:
Hudson is much easier to get up and running (in large part from a nice web interface) and has a very active plugin development community.
CruiseControl has support from lots of 3rd party stuff and has the benefit of doing some neat tricks with the xml configuration like plugin preconfiguration and include.projects which lets you version the configuration information with the project.
If you're only going to have a few builds I think Hudson is the clear winner. If you're going to have lots -- and don't mind the xml -- then I think CruiseControl's xml configuration tricks become a real strength.

My last project, we started off on CruiseControl. Which rocked. Then we moved to Hudson, which rocked even more. The things I liked about Hudson:
The upstream and downstream projects. So a commit to your data access code will eventually also trigger a build of the presentation layer.
Easily use an existing project as the starting point of a new one - so if you are in the habit of creating development branches, then making sure these are under continuous integration is a snap.

One difference is that Hudson is the product of a single genius intellect—Kohsuke Kawaguchi. Because of that, it's consistent, coherent, and rock solid. The downside could be some limitation on the rate of progress. However, Kohsuke is incredibly prolific, so I wouldn't be too worried about that. And, it's extensible, so if there's something Kohsuke doesn't have time for (or doesn't want), you can probably do it yourself.

I looked at both Cruise Control and Hudson but choose Hudson as it was much easier to setup and configure. Hudson seems very widely used these days with regular releases and lots of extensiblity through plugins. I would highly recommend it.

Hudson is the more user-friendly alternative in my opinion. It can be set up and maintained completely via the web interface (apart from the initial installation of the webapp, of course).
The only way this could be said about CruiseControl is if you count the built-in XML file editor.
Still, having used both, I'd still prefer any one over having no automated build.

I tried Cruise control...Its good...But documents are fragmented. Dashboard is confusing. Widget creation is also confusing. Never tried hudson. Will try on weekend.

I recently setup Jenkins for building Borland BDS 2006 projects making use of Subversion and I am very happy with it. I never used CruiseControl yet, so I can't compare. Read my blog post for more information.
Continuous integration of Delphi project with Jenkins

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.