How to practice Hadoop online? [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Is there a way to find an online Hadoop database and practice on it using Java?
I found that you can practice on www.gethue.com, but I don't think you can do it using Java.
Thank you

You can try Cloudera Live.
It's in beta, but seems to work pretty well.

I made a small list of free offers that let you manage your own Hadoop cluster. It's not technically a ready-made database, but you can fill these clusters with whatever data you want.
Here is the list:
Microsoft Azure HDInsight: they offer you 150€ to spend on their products. You can rent a Hadoop cluster and work on it.
Qubole: they give you preconfigured Hadoop clusters, with 75 computing hours for free.
Joyent: you can have one VM for free for a year.
You may also try Amazon's Elastic MapReduce (EMR), although I'm not sure this specific offer is included in their free trial. An advantage of using it is that you can access free datasets more easily (for instance, this one).
Please also note that all these services (except Qubole) require a credit card for registration.
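Whichever cluster you pick, the usual first Java exercise is a word-count job. As a warm-up that needs no cluster at all, the plain-Java sketch below imitates the map, shuffle and reduce phases in memory. It only illustrates the programming model: it does not use Hadoop's own `Mapper`/`Reducer` classes, and `WordCountSketch` and its methods are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory imitation of map -> shuffle -> reduce for a word count (not the Hadoop API). */
public class WordCountSketch {

    /** Map phase: emit a (word, 1) pair for every token in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // split can yield a leading empty token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group emitted values by key, as the framework does between map and reduce. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three phases over all input lines and returns word -> count. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(wordCount(input)); // {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

On a real cluster the same three steps are split across machines: mappers run next to the input blocks, the framework performs the shuffle over the network, and reducers each receive all values for their keys.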

Related

How to distribute the load in a billing system? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
My use case is developing a telecom billing system in Java. Since I must calculate 60,000 bills per day, I need to distribute the calculation load across several (three or four) servers.
That is called clustering (correct me if I'm wrong).
My solution is to develop middleware that gives each server a list of clients to be charged; each server then calculates the bills and generates a PDF file.
Could you give me some more ideas, for example which Java classes I need or which methods I should use?
Thank you
If you use JMS (maybe as part of J2EE?), you could simply use a JMS queue for this: have several consumers (MDBs in J2EE) on every node, and send the clients to process to the queue.
The queue guarantees that every message (one per client) is handled by one and only one listener, and since each node has a limited number of listeners, you get work distribution this way.
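To illustrate the competing-consumers idea without a JMS broker, here is a single-JVM sketch using `java.util.concurrent.BlockingQueue`: several worker threads `take()` from one queue, and each element is handed to exactly one of them. The threads stand in for MDB instances, the billing step is only a placeholder, and the "poison pill" stop signal is a convention of this sketch, not part of JMS. A real deployment would use a JMS queue shared by all nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process analogue of a point-to-point queue with competing consumers. */
public class CompetingConsumersSketch {

    // Sentinel telling a worker that no more work is coming (sketch convention only).
    private static final String POISON_PILL = "__DONE__";

    /** Distributes client ids over `workers` threads; returns client id -> worker that handled it. */
    static Map<String, String> process(List<String> clientIds, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Map<String, String> processedBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < workers; i++) {
            String workerName = "worker-" + i;
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String clientId = queue.take(); // each element goes to exactly one taker
                        if (clientId.equals(POISON_PILL)) {
                            return;
                        }
                        // Placeholder for "calculate the bill and generate the PDF".
                        processedBy.put(clientId, workerName);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        try {
            for (String id : clientIds) {
                queue.put(id);
            }
            for (int i = 0; i < workers; i++) {
                queue.put(POISON_PILL); // one stop signal per worker, queued after all real work
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while distributing work", e);
        }
        return processedBy;
    }

    public static void main(String[] args) {
        List<String> clients = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            clients.add("client-" + i);
        }
        Map<String, String> result = process(clients, 4);
        System.out.println(result.size() + " clients billed"); // 20 clients billed
    }
}
```

Because faster workers simply take more items, the load balances itself without the producer deciding in advance which node gets which client.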
Try Apache Hadoop.
http://hadoop.apache.org/
It will be perfect for this kind of divide and conquer tasks.
Spring Batch would be a good fit for that: http://projects.spring.io/spring-batch/
It has support for automatic restart of jobs and partitioning (distribution over many machines).
There is an entire chapter in the docs dedicated to scaling/distribution: http://docs.spring.io/spring-batch/2.2.x/reference/html/scalability.html
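The partitioning idea (which is also what the middleware described in the question would do) boils down to splitting the client list into one chunk per server. Below is a minimal plain-Java sketch of that split; it does not use Spring Batch, and `ClientPartitioner` and `partition` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits client ids into contiguous chunks, one per server (hypothetical helper). */
public class ClientPartitioner {

    /** Returns `servers` chunks whose sizes differ by at most one. */
    static List<List<String>> partition(List<String> clientIds, int servers) {
        if (servers <= 0) {
            throw new IllegalArgumentException("servers must be positive");
        }
        List<List<String>> partitions = new ArrayList<>();
        int n = clientIds.size();
        int base = n / servers;
        int remainder = n % servers;
        int start = 0;
        for (int i = 0; i < servers; i++) {
            // The first `remainder` chunks take one extra client so nothing is left over.
            int size = base + (i < remainder ? 1 : 0);
            partitions.add(new ArrayList<>(clientIds.subList(start, start + size)));
            start += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> clients = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            clients.add("client-" + i);
        }
        for (List<String> chunk : partition(clients, 3)) {
            System.out.println(chunk.size()); // sizes 4, then 3, then 3
        }
    }
}
```

With 60,000 bills and three servers, each chunk would hold 20,000 clients; with four servers, 15,000. Static splitting like this is simple, but unlike the queue approach it does not rebalance if one server is slower than the others.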

Java application as a service - what are the options out there? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a Java application. I can expose it using web services or REST (JAX-WS or JAX-RS).
Now I actually want to run it "in a cloud" and expose it as a service. I have read that there are services such as Apigee, Rackspace, and Google App Engine. The idea is that I don't want to worry about scale and performance; I want the host to handle that.
What are the options for Java?
Thanks,
David.
After a bit of homework, here's what I am gathering:
This is really about Java PaaS offerings (platform as a service).
In addition to what I previously mentioned,
Google App Engine
Rackspace
Apigee
it's worth adding:
Jelastic
CloudBees
There's also a guide - albeit old - from InfoQ that can be read here.
It depends on the amount of money you can spend.
This cloud hosting seems interesting.
There's Heroku too, or even AWS.
Amazon AWS provides auto-scaling features that you can configure so you don't have to "worry about scale" day to day, though you do have to set it up in the first place (you will also have to monitor your bill in case you are scaling big-time ;) ). It works well and provides decent monitoring/visualization if you are happy to do the set up.
I can't say what the other systems you are investigating offer in terms of automatic scaling, though.

Using a NoSQL database for a very large, write-heavy dataset of small records with moderate reads [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
What is a better NoSQL database for a system that records advertisement data at about 50 to 200 million inserts per day? The aggregated data will be used to show patterns in how users engage with the ads. I really like MongoDB, but it seems that major industry players are picking Riak for the job. Mongo had to iron out some caveats in the last two releases, and the current version seems pretty good for the job. Any ideas?
MongoDB with Hadoop (http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/) seems to fit your data requirements. You can store the data in MongoDB and run aggregation jobs (map/reduce) on a Hadoop cluster.
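For a sense of what such an aggregation job computes, here is an in-memory sketch in plain Java (no MongoDB or Hadoop APIs) that groups hypothetical ad events by ad id and reduces each group to a click-through rate. The `AdEvent` shape is an assumption; on the volumes in the question this grouping would run as a distributed map/reduce job rather than a single-JVM stream.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory sketch of a per-ad engagement aggregation (hypothetical event schema). */
public class AdEngagementSketch {

    /** Hypothetical event: which ad was shown, and whether the impression was clicked. */
    static final class AdEvent {
        final String adId;
        final boolean clicked;

        AdEvent(String adId, boolean clicked) {
            this.adId = adId;
            this.clicked = clicked;
        }
    }

    /** Returns ad id -> click-through rate (clicks / impressions). */
    static Map<String, Double> clickThroughRates(List<AdEvent> events) {
        return events.stream()
                .collect(Collectors.groupingBy(
                        (AdEvent e) -> e.adId, // "map" side: key each event by its ad
                        TreeMap::new,
                        // "reduce" side: averaging 1/0 per event gives the fraction clicked
                        Collectors.averagingDouble((AdEvent e) -> e.clicked ? 1.0 : 0.0)));
    }

    public static void main(String[] args) {
        List<AdEvent> events = List.of(
                new AdEvent("ad-1", true),
                new AdEvent("ad-1", false),
                new AdEvent("ad-2", false),
                new AdEvent("ad-2", false));
        System.out.println(clickThroughRates(events)); // {ad-1=0.5, ad-2=0.0}
    }
}
```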
I'd use Oracle's Berkeley DB Java Edition.
It is very powerful and easy to use; open source, but not free.
This is not the type of question that can be answered with "Product1" or "Product2": too little information is given. There is no information about the environment the system will run in, what type of data will be inserted, or how you are going to aggregate it.
The best way is to try:
write a test using Product1,
write the same thing with Product2,
insert data that looks as close as possible to the data you expect in the real environment,
measure speed and whatever other factors you need,
and only on that basis will you be able to determine what suits you.
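The measure-it-yourself approach can be sketched as a small harness. The `InsertTarget` interface is a hypothetical adapter you would implement once per candidate product by wrapping its real client library; the in-memory target below exists only so the harness runs on its own, and its numbers say nothing about any database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Skeleton for comparing candidate stores on the same synthetic workload. */
public class InsertBenchmark {

    /** Hypothetical adapter: implement once for Product1 and once for Product2. */
    interface InsertTarget {
        void insert(String key, String value);

        long count();
    }

    /** Stand-in target used only to exercise the harness. */
    static final class InMemoryTarget implements InsertTarget {
        private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

        public void insert(String key, String value) {
            data.put(key, value);
        }

        public long count() {
            return data.size();
        }
    }

    /** Inserts `n` generated records and returns the throughput in inserts per second. */
    static double measureInsertsPerSecond(InsertTarget target, int n, Supplier<String> valueGenerator) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            target.insert("event-" + i, valueGenerator.get());
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start); // avoid division by zero
        return n / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        InsertTarget target = new InMemoryTarget();
        // Replace this generator with records shaped like your real ad events.
        double rate = measureInsertsPerSecond(target, 100_000, () -> "{\"adId\":\"ad-1\",\"clicked\":false}");
        System.out.printf("%d records, %.0f inserts/s%n", target.count(), rate);
    }
}
```

As a yardstick, 50 to 200 million inserts per day averages roughly 580 to 2,300 inserts per second (50,000,000 / 86,400 ≈ 579; 200,000,000 / 86,400 ≈ 2,315), and peak traffic will be higher than that average.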

Machine Learning framework with Hadoop [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Which frameworks besides Mahout exist for implementing machine learning algorithms in Java, such that the underlying framework takes the Java code and runs it on Hadoop?
I am looking for alternatives to Mahout because I need an SVM and an agglomerative clustering implementation on Hadoop, and only SVM is supported in Mahout.
I recommend this Apache Hadoop-based machine learning / data mining library, similar to Apache Mahout:
http://www.openankus.org/pages/viewpage.action?pageId=2195722
It makes MapReduce job processing simple and easy. Interested? See the wiki for more (http://www.openankus.org).
Well, if SVM is on Hadoop, the rest is easy to implement!
Note that the naive agglomerative clustering algorithm is not efficient for large data (at least O(n^2) complexity). Such complexity makes it impractical to run the algorithm on a large dataset, even on a big cluster, unless you try one of its extensions like this one: ftp://193.167.42.127/franti/papers/GraphPnn-TPAMI.pdf
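To see where the cost comes from, here is the simplest single-linkage variant on 2-D points in plain Java (an illustration only, not taken from any of the libraries mentioned). Each merge scans every pair of points across clusters, so one merge costs O(n^2) distance computations and the whole run about O(n^3); smarter implementations cut this down, but the pairwise-distance step alone is already quadratic.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive single-linkage agglomerative clustering on 2-D points (illustration only). */
public class NaiveAgglomerative {

    /** Smallest distance between any point of cluster a and any point of cluster b. */
    static double singleLinkage(List<double[]> a, List<double[]> b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : a) {
            for (double[] q : b) {
                best = Math.min(best, Math.hypot(p[0] - q[0], p[1] - q[1]));
            }
        }
        return best;
    }

    /** Repeatedly merges the closest pair of clusters until only k remain. */
    static List<List<double[]>> cluster(List<double[]> points, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        List<List<double[]>> clusters = new ArrayList<>();
        for (double[] p : points) {
            List<double[]> singleton = new ArrayList<>();
            singleton.add(p);
            clusters.add(singleton); // every point starts in its own cluster
        }
        while (clusters.size() > k) {
            int bestI = -1;
            int bestJ = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            // Scanning every pair of clusters on every merge is what makes the naive method expensive.
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = singleLinkage(clusters.get(i), clusters.get(j));
                    if (d < bestDist) {
                        bestDist = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // bestJ > bestI, so removing bestJ does not shift bestI.
            clusters.get(bestI).addAll(clusters.remove(bestJ));
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
                new double[] {0, 0}, new double[] {0, 1},
                new double[] {10, 10}, new double[] {10, 11});
        for (List<double[]> c : cluster(points, 2)) {
            System.out.println(c.size() + " points"); // "2 points", printed twice
        }
    }
}
```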
Pattern. It has a Java API and you can use R too.
http://www.cascading.org/pattern/
A quick Googling gave the following
http://java-ml.sourceforge.net/ - After close to 3 years, there was a release. Not sure how well it is supported and what algorithms are implemented.
http://sourceforge.net/projects/weka/ - Some recent recommendations by others look good.
Also, see this thread.
I haven't tried either of them.

Good Website for aspiring Java technical architects [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I was trying to find a good website for aspiring technical architects. To be precise, I have worked for 10 years in Java/J2EE, and I would now like to gain more knowledge of the architecture side of applications. It would also be a great help to see upcoming technology trends that could provide a roadmap for Java professionals.
I usually have an hour or two to spend on extra things, such as scanning websites and reading articles.
I would like to know from experts which sites I could regularly spend an hour or two on to build good knowledge. Sharing your own experience would certainly help too.
I like InfoQ.
I like Java Posse. They have a lot of relevant podcasts for technical architects: http://www.javaposse.com/
Javalobby on DZone is quite a nice place, but you have probably already found that out. They publish a variety of excellent-quality articles.
