I have multiple stand-alone scripts that query a database (each script queries a different source database) and insert the data into another database local to the script. Each script is scheduled via cronjob making things relatively easy and segregated. I need to combine all those scripts into one and I am looking for pointers on a design approach (assuming the one application will now run as a start-up process).
Two high-level approaches I am thinking about are:
1) Placing each script into its own package and run a pseudo-cron from main
if (time = 7pm) then run script package #1
if (time = 10 after the hour) then run script package #2
2) Place each script in its own Thread and Thread.sleep() it
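If you go the in-process route, a ScheduledExecutorService avoids hand-rolled Thread.sleep() loops. A minimal sketch (the task bodies and periods are placeholders, and computing the initial delay to "7pm" / "10 after the hour" is left out):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScriptScheduler {
    public static void main(String[] args) throws Exception {
        // Daemon threads so the scheduler alone does not keep the JVM alive.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

        // In a real deployment the initial delays would be computed to the
        // next 7pm / ten past the hour; here both tasks start immediately.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("running script #1"), 0, 24, TimeUnit.HOURS);
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("running script #2"), 0, 1, TimeUnit.HOURS);

        Thread.sleep(200); // demo only: let the first runs fire before main exits
    }
}
```

If you need real cron expressions inside one JVM, the Quartz scheduler is the usual choice.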
Any suggestions and links to supporting documentation would be appreciated.
Thanks
this (what I am about to write) doesn't feel worthy of an "answer", but it's better than using comments. Also, I don't have specific knowledge of either MySQL or Oracle, so this will be generic.
Reading your situation, I feel there's much more going on here than simply 'combining scripts'.
We were given the directive to get our projects down to the minimum
number of repositories.
Is it okay to have multiple scripts in the one repo? I'm assuming here that at the very least you can have one project/folder with any number of script files in it. I'm not sure why that means you have to have one script file. And this comment...
Placing each script into its own package
...seems a bit odd - I'm not sure how you have 'one script' and 'each script in separate packages' :)
I am looking for pointers on a design approach (assuming the one
application will now run as a start-up process)
Managing (scheduling, running and monitoring) jobs is complicated to do reliably from scratch, but fortunately it's a common challenge, so the right tooling exists.
Have you checked out the MySQL event scheduler?
If you weren't familiar with that, have a look and see if it might work for you. If it looks promising, see what guidance the docs have which might inform how you compose your scripts.
The reading and learning you do now might seem like too much effort for this specific problem, but it will pay off in the long term.
Related
My question without the fluff
Is Spring-Batch the right tool for converting a handful of one-off (one and done type) java projects that tend to interact with a database at some point, into their own separate "Jobs" that are part of a Spring-Batch project?
With Fluff/Background
The company I work for has several "one off" java projects that really only do one thing (either at some arbitrary time, or just whenever we tell it) and then they are done. They tend to interact with the database at some point in their lifecycle, but not necessarily at the same time.
There are a couple more projects/scripts I'm tasked with creating, but instead of having nearly 10 different Perl scripts and a few handfuls of jars out on our box, we would like to put all these projects in one place and just run them when we need to (ideally on a remote box via command-line args). From the research I've done on Spring-Batch, it sounds like exactly what we are looking for, especially the functionality described here.
I'm moderately familiar with Spring-Boot and the Spring Framework, but for some reason the domain language around Jobs and their setup seems a little foreign to me, and I just want to make sure I'm not wasting my time trying to figure this out if it's not realistic. We would like this to just be one cozy place for any current or future projects that are run independently, so that we minimize clutter on our box. Just to clarify: I keep saying projects, but from what I understand, these projects need to be converted to Jobs if they are to be part of a Spring-Batch project. Is this realistic?
(converting comments to an answer)
If things like task scheduling, transaction management, consuming data in chunks, flexibility in start/stop operations, and retry mechanisms are things you find yourself spending time coding yourself, then yes, definitely take a look at Spring Batch: it has robust, already-implemented facilities for all of the above.
In the grander scheme of things, if your application has many moving parts and consumes a lot of external resources, you might as well dive into the EIP (Enterprise Integration Patterns) waters with Apache Camel and Spring Integration, both very solid implementations.
From your description, your one-off projects are reasonably single-focused, so the learning curve of new frameworks might not be worth the while. In that case it might suffice to focus on the reusable components of your projects and externalize them into a core/helper lib for reusability purposes, if that makes sense.
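If the frameworks do feel like overkill, the "stuff all these projects in one place and run them via command-line args" idea can be sketched in plain Java without Spring Batch (the job names and bodies below are purely illustrative):

```java
import java.util.Map;

// Hypothetical sketch: each formerly separate one-off project becomes a
// Runnable job, selected by name from the command line.
public class JobRunner {
    static final Map<String, Runnable> JOBS = Map.of(
            "cleanup", () -> System.out.println("cleanup job ran"),
            "report",  () -> System.out.println("report job ran"));

    public static void run(String jobName) {
        Runnable job = JOBS.get(jobName);
        if (job == null) {
            throw new IllegalArgumentException("unknown job: " + jobName);
        }
        job.run();
    }

    public static void main(String[] args) {
        run(args[0]); // e.g. java -jar jobs.jar report
    }
}
```

Each former project becomes one Runnable (or a class implementing a small Job interface), and something like `java -jar jobs.jar report` picks it by name; Spring Batch adds the restart/retry/chunking machinery on top of this basic shape.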
Could anybody please recommend a good solution (framework) for accessing HBase on a Hadoop cluster from a Scala (or Java) application?
So far I've been moving in the Scalding direction. The prototypes I built let me combine the Scalding library with Maven and separate the Scalding job JAR from the 'library' code packages. This in turn allowed me to run Scalding-based Hadoop jobs from outside the cluster with minimal overhead per job (the 'library' code is pushed to the cluster's distributed cache only when it changes, which is rarely needed, so I can deploy job code fast).
Now I'm actually starting to play with HBase itself, and I see that Scalding is good but not so 'native' to HBase. Yes, there are things like hbase-scalding, but since I'm at the point of planning future actions anyway, I'd like to know about other good solutions I may have missed.
What is expected:
Application (job) startup overhead should be low; I need to run a lot of them.
It should be possible (the easier the better) to run jobs from outside the cluster without any SSH (just based on the 'hadoop jar' command, or even simply by executing the application).
The job language itself should allow short, logical semantics. Ideally the code should be simple enough to be generated automatically.
The solution should be performant on big enough HBase tables (initially up to 100,000,000 entries).
The solution should be 'live' (actively developed), but relatively good in terms of general stability.
I think the argumentation here could be even more useful than the solution itself, and this question should provide a couple of ideas for many people.
Any piece of advice?
If you're using scalding (which I recommend) there's a new project with updated cascading and scalding wrappers for accessing HBase. You might want to check it out - https://github.com/ParallelAI/SpyGlass
HPaste http://www.gravity.com/labs/hpaste/ may be what you are looking for.
You may be interested in the Kiji project (https://github.com/kijiproject/). It provides a "schema-ed" layer on top of HBase.
It also has a Scalding adapter (KijiExpress) so that you can do functional collections operations (map, groupby, etc.) on "pipes" of tuples sourced from these schema-ed HBase tables.
Update (August 2014): Stratosphere is now called Apache Flink (incubating)
Check out Stratosphere. It offers a Scala API, has an HBase module, and is under active development.
Starting a job should be possible within a second or so (depending on your cluster size).
You can submit jobs remotely (it has a class called RemoteExecutor which allows you to programmatically submit jobs on remote clusters)
Please contact me if you have further questions!
I am currently trying to maintain hbase-scalding in my free time, as I am also picking up Scala.
Please take a look at it on GitHub.
I have been developing an online poker game, but I keep hitting a wall. I want to implement awards in the system, but I want them to be dynamic, meaning I don't want to recompile for every award I'd like to add.
I have thought about using Python code for each award: when the server checks whether the user qualifies for an award, it runs the Python script with Jython (the server is in Java and Netty NIO), and if the function returns a certain value I grant the award to the user. That could work, but is there a more efficient technique that won't force me to run hundreds of Python scripts every time I need to check whether a user earned an award?
And when are the best times to do these checks? I have thought about a hook system where I would specify hooks like [onconnect], [ondisconnect], [chatmessage.received]. That could also work, but it feels a bit crude, and I would still have to run all the scripts from the database.
If I were you, I'd have a totally separate process that grants awards. It runs perhaps once a day on the underlying database that contains all your player/game data.
Your core customer-facing app knows about awards, but all it knows about them is data it loads from the DB -- something like a title, image, description, maybe how many people have the award, etc., and (based on DB tables) who has won the award.
Your "award granter" process simply runs in batch mode, once per day / hour etc, and grants new awards to eligible players. Then the core customer-facing app notifies them but doesn't actually have to know the smarts of how to grant them. This gives you the freedom to recompile and re-run your award granter any time you want with no core app impact.
Another approach, depending on how constrained your awards are, would be to write a simple rules interface that allows you to define rules in data. That would be ideal to achieve what you describe, but it's quite a bit of work for not much reward, in my opinion.
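That rules-in-data idea can be quite small. A hypothetical sketch where each award is a row (name, stat, threshold) that could be loaded from the DB, so adding an award needs no recompile:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: awards defined as data rows rather than code.
public class AwardRules {
    // e.g. DB columns: award name, stat to check, threshold to reach
    record Rule(String awardName, String stat, int threshold) {}

    static boolean qualifies(Rule rule, Map<String, Integer> playerStats) {
        return playerStats.getOrDefault(rule.stat(), 0) >= rule.threshold();
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(                 // imagine: SELECT * FROM award_rules
                new Rule("Centurion", "handsPlayed", 100),
                new Rule("Chatterbox", "chatMessages", 500));
        Map<String, Integer> stats = Map.of("handsPlayed", 150, "chatMessages", 20);
        for (Rule r : rules) {
            // prints Centurion: true, then Chatterbox: false
            System.out.println(r.awardName() + ": " + qualifies(r, stats));
        }
    }
}
```

Anything richer than threshold checks (combinations, time windows) pushes you toward a real rules engine such as Drools, which is where the "quite a bit of work" comes in.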
PS -- in running something like an online poker server, you're going to run into versions of this problem all the time. You are absolutely going to need to develop a way to deploy new code without killing your service or having a downtime window. Working around a java-centric code solution for awards is not going to solve that problem for you in the long run. You should look into the literature on running true 24/7 services, there are quite a few ways to address the issue and it's actually not that difficult these days.
There are a number of options I can think of:
OSGi as described above - it comes at a cost, but is probably the most generic and dynamic solution out there
If you're open to restarts (just not recompiles), a collection of jars in a well-known folder plus Spring gives you a cheaper but equally generic solution. Just have your award beans implement a standard interface, declare them as beans, and let Spring @Autowire all the available awards into your checker.
If your award execution is fairly standard, and the only variation between awards is the rules themselves, you can use some kind of scripted configuration. There are many options there, from the Python you described (except I'd go for one big script managing all awards) to basic regular expressions, with Lua and Drools in the middle. In all cases you're looking at some kind of rules-engine architecture, which is flexible in terms of what an award can trigger on but doesn't offer much flexibility in terms of what an award can lead to (i.e. perfect for achievements).
Some comments on the answer with the batch ideas:
Implementing a Dynamic Award System
The batch processes can run on a separate server/machine, so you can recompile the app or restart that server at any time. New awards can be handled, for example, with the mentioned approach of adding jars and restarting the server; new batch jobs can also be introduced at any time, and so on. So your core application keeps running 99% of the time, while the batch server can be restarted frequently. Separate batch machines are good to have.
When you need to deploy a new version of the core app, I think you can just stop it, deploy, and start it again with a maintenance notice to users. That approach is used even by top poker rooms with great software (e.g. FullTiltPoker did so; right now it is down due to the loss of its license, but their site says 'System Update' :) ).
So one approach for versions update is to redeploy/restart in off-hours.
Another approach is real-time updates. As a rule this is done by migrating users, batch by batch, to the new version, so at the same time some users are on the old version and some on the new. Not great for poker software, where users with different versions can interact, but if you are sure of the versions' compatibility you can go with that approach, checking the user's client version, for example, on login.
In my answer I tried to say that you need not introduce 24/7-support logic into your code. Leave the system-availability problems to the hardware (failovers, load balancers, etc.). You can follow whatever good coding techniques you're used to; you only need to remember that your crucial core logic is deployed infrequently (for example once a week), while the batch part can be updated/restarted at any time if needed.
As far as I understand you, you probably do not have to run external processes from your application, nor use OSGi.
Just create a simple Java interface and implement each plugin ('award') as a class implementing that interface. You could then simply compile any new plugin and load it as a class file into your application at run-time using Class.forName(String className).
Any logic you need from such a plugin would be contained in methods on the interface.
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Class.html#forName(java.lang.String)
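A minimal sketch of that interface-plus-Class.forName approach (all class and method names here are made up for illustration):

```java
// Hypothetical plugin contract: every award is a class implementing this.
interface Award {
    boolean qualifies(int handsPlayed); // example criterion only
    String name();
}

// A sample plugin; in practice this would be compiled and dropped in later.
class FirstHundredHands implements Award {
    public boolean qualifies(int handsPlayed) { return handsPlayed >= 100; }
    public String name() { return "First Hundred Hands"; }
}

public class AwardLoader {
    // Resolve the class by name (the name could come from a DB table)
    // and instantiate it reflectively via its no-arg constructor.
    public static Award load(String className) throws Exception {
        return (Award) Class.forName(className)
                            .getDeclaredConstructor()
                            .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Award a = load("FirstHundredHands");
        System.out.println(a.name() + " qualified: " + a.qualifies(150));
    }
}
```

New awards then only require compiling one class file and recording its name; the running server never needs a rebuild, only access to the new .class on its classpath (or via a URLClassLoader pointed at a plugins folder).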
I'm really confused - and it's pretty weird, because I know two programming languages but I can't figure out something simple like this...
I've been looking for ages but I can't seem to get my head around it.
You see, for a long time I've been writing in AutoIt, and I've written two programs in it that are due to go on sale soon! They were never meant to be out for a long time, though (kind of like Windows Vista), so lately I've been learning Java with success. I've written a few very, very simple applications in Eclipse while going through Java tutorials! I'm now ready to port my programs to Java, to gain a wider market thanks to Java's cross-platform ability, but I never anticipated distribution being this complex.
My first problem is ease of use on multiple OSs: I don't want my customers to have to deal with the JRE or multiple files. I need a double-click solution that will work on Mac, Windows, Linux, etc., so that even complete computer newbies can launch my software! Secondly - this is not so much a problem as something I'm not sure how to do - I need to include files in my software package; some need to stay separate from the program, but others could be bundled. Actually, I suppose this can be worked around by an installer, which would probably be easier! And finally, the other program I made is stand-alone so it can work on USB sticks (which is what it was designed for). Now, how do you suppose I make it launch on multiple operating systems when it's plugged in, without any hassle?
Update :: Forgot to add :: My concerns about security
I have read, and from personal experience I know, how easy it is to decompile a .jar and, if it's not protected properly, read the source code! I know about obfuscation, and I know I'll have legal backup, but it still worries me - even from the point of view that users may get the wrong first impression of my software.
So to conclude in one sentence (please still read the above):
I need to let people use my software written in Java by double-clicking, e.g. like something made in AutoIt - a standard application, i.e. an *.exe.
Thanks in advance
There are two routes you can take without needing any extra software involved.
The first is to just make an executable jar (Java Archive) file. Java automatically associates the .jar extension with the Java interpreter on most systems. The JAR's manifest file will tell it which class to launch when you double-click it.
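For the executable-jar route, the manifest just needs a Main-Class entry pointing at the class with your main method (the class name below is illustrative):

```
Main-Class: com.example.poker.Launcher
Class-Path: lib/helper.jar
```

Build it with something like `jar cfm MyApp.jar manifest.txt -C classes .`, and most desktop JRE installs will then launch MyApp.jar on double-click.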
The second, less recommended, route is to make a Java Web Start application with a JNLP launcher file. This is aimed more at applications distributed from web sites.
I'll suggest a third way: write several platform-specific launch scripts for your application, for example run.bat for Windows and run.sh for Linux. Inside each script, write a command to run the JRE with all the necessary parameters. You can also do some pre-launch checks (is the JRE installed?) or platform-specific actions in these scripts.
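As a sketch of that idea (the file and jar names are illustrative), run.sh might look like the script below; here it is written to disk and syntax-checked rather than executed, since the jar itself isn't present:

```shell
# Hypothetical Linux/macOS launcher written out as run.sh.
cat > run.sh <<'EOF'
#!/bin/sh
# Refuse to start without a JRE on the PATH.
if ! command -v java >/dev/null 2>&1; then
    echo "Java is required but was not found. Please install a JRE." >&2
    exit 1
fi
# Launch the jar that sits next to this script, passing arguments through.
exec java -jar "$(dirname "$0")/myapp.jar" "$@"
EOF
chmod +x run.sh
sh -n run.sh && echo "run.sh syntax OK"
```

A matching run.bat would do the equivalent checks with Windows batch syntax; for the USB-stick case, both scripts living next to the jar give users something to double-click on each platform.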
I have a situation where I need to execute some code that for a number of reasons (that I won't get into here) is better done in Java than in PL/SQL. As I see it there are two options:
Create a jar with all my compiled code and other supporting files/other jars, load the jar into Oracle (we're running 10g), and execute the Java from a stored procedure.
Pros: The Java code integrates very nicely with the rest of the system, can be called from existing PL/SQL.
Cons: I have very little experience with running Java in Oracle.
Leave the Java in a separate jar and execute it through shell scripts.
Pros: I've written Java like this before so I'm familiar with it.
Cons: Poor integration with everything else, will probably require extra manual steps to run and manage.
The Java code will have to read XML data from Oracle tables and write (non-XML) data to other tables, so the amount of database integration made me think loading the Java code into the database might be a good idea, but I'm just not sure...
What experiences do people have loading and running Java code from within Oracle? How easy is it to test and debug? Are there are any special tools required? Any "gotchas" I should be aware of?
I would go with option number 1: loading your Java code into your database. I have had good experiences with this approach, and you don't need much experience to get a good solution working with this method.
The only part where embedding Java in your database gets complicated is when you need access to external resources (network or file I/O, to name a few); you'll have to set special permissions for your Java code. That is clearly not your situation here.
For your scenario, doing it within the database seems the right approach. Reasons to do it outside could be:
dependence on Java libs that are not compatible with Oracle's built-in JVM
need to run it under a different linux user account than Oracle's
but I don't see either in your scenario