What is the basic system configuration required to install Hadoop in pseudo-distributed mode?
Capacity of the following:
1. RAM
2. Processor
3. Hard disk, etc.
Follow any tutorial to set up Hadoop on your machine. Here is a good document by Michael Noll.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
You can also refer to the Apache website for a detailed explanation of the prerequisites and the installation procedure to set up Hadoop.
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
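For reference, the Apache single-node guide linked above boils pseudo-distributed mode down to two small configuration files; the sketch below follows that guide for Hadoop 2.x (adjust the host, port and paths to your own environment):

    etc/hadoop/core-site.xml:
        <configuration>
            <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
            </property>
        </configuration>

    etc/hadoop/hdfs-site.xml:
        <configuration>
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
        </configuration>

After that, the guide has you format the NameNode (bin/hdfs namenode -format) and start HDFS (sbin/start-dfs.sh).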
Hadoop is designed to run on commodity (affordable) hardware. If you are using Hadoop for learning or experimental purposes, 2 GB of RAM, a 40 GB HDD and a dual-core processor will be enough. I would prefer to set up a native Linux environment rather than using a VM on Windows.
To try it out on Windows using VMware, you can download one of the freely available VM sandboxes from vendors such as Cloudera or Hortonworks.
I am looking for a way to pull heap sizes (min, max, used) from a Java process on Linux. I need a lightweight tool/command to do the job. Big monitoring packages are not an option.
I did some googling but could not find a viable alternative. The only possible option I have found so far is to use the JMX protocol. I enabled JMX on the Java application and successfully polled it using various Java tools that implement the JMX protocol/library. But these Java tools are slow and take a lot of CPU during startup when allocating memory. What I want is a simple command-line tool that would, for example, speak the JMX protocol and poll the process for heap sizes.
I am using IBM's J9 version of Java, and the jstat tool is not available there.
Any ideas anyone?
Your need has probably been taken care of by now but one option, for others who might stumble on this thread, is a tool called 'jvmtop' (link: https://code.google.com/p/jvmtop/). It works with the IBM J9 JVM (among others).
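If you would rather script the polling yourself, the same heap numbers can be read over JMX with a few lines of plain Java. The sketch below assumes the target JVM was started with remote JMX enabled on port 9010 (a hypothetical port, with authentication and SSL disabled for brevity):

    // Minimal JMX heap poller (sketch). Assumes the target JVM was started with:
    //   -Dcom.sun.management.jmxremote.port=9010
    //   -Dcom.sun.management.jmxremote.authenticate=false
    //   -Dcom.sun.management.jmxremote.ssl=false
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.openmbean.CompositeData;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapPoller {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // HeapMemoryUsage is a CompositeData holding init/used/committed/max in bytes
                CompositeData heap = (CompositeData) mbs.getAttribute(
                        new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
                System.out.println("init      = " + heap.get("init"));
                System.out.println("used      = " + heap.get("used"));
                System.out.println("committed = " + heap.get("committed"));
                System.out.println("max       = " + heap.get("max"));
            } finally {
                connector.close();
            }
        }
    }

Compile it once and run it from a shell loop or cron job; it stays far lighter than attaching a full monitoring package.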
I have a headless Java application, running on a remote server as a daemon-style process.
I want to extract Java level profiling information from the process, of the sort displayed by JVisualVM. For example, it should show method invocation times and so on.
What is the best way of doing this? My understanding is that JVisualVM does not profile when connecting remotely.
Ideally, the profiling information would be stored in a file for later inspection, in a manner similar to generating a Java heap dump (with jmap) and analyzing it later (with a heap inspector).
You can use the NetBeans profiler's remote profiling capability.
Remote Profiling
Profile an application that is running on a different system than your NetBeans IDE. The profiler's remote pack can be installed on a remote system, allowing you to profile an application that is started on that system.
In fact, VisualVM is based on it.
Beside various monitoring features, the tool contains a built-in profiler based on the NetBeans profiler. While the profiler UI in VisualVM looks simple (especially when compared to the NetBeans profiler), the profiling capabilities are almost as powerful as in NetBeans.
Here is a detailed blog post about profiling a remote Java server using NetBeans.
According to the link, you will have to set up jstatd and JMX:
http://javadevsoup.blogspot.de/2012/02/remote-java-profiling-using-visual-vm.html
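As a rough illustration of what that setup involves (the port and paths below are placeholders, not values taken from the blog post), the target JVM is started with the remote-JMX system properties and jstatd needs a small security policy file:

    Target JVM options (no authentication/SSL, so only for a trusted network):
        -Dcom.sun.management.jmxremote.port=9010
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false

    jstatd.all.policy:
        grant codebase "file:${java.home}/../lib/tools.jar" {
            permission java.security.AllPermission;
        };

    Start jstatd on the server:
        jstatd -J-Djava.security.policy=jstatd.all.policy

VisualVM (or the NetBeans profiler) on your workstation can then add the remote host and connect to the process over JMX.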
I just found another product that I think does this: http://chrononsystems.com/products/chronon-recording-server
I have developed a nice multi-threaded genetic algorithm in Java that runs on a 16-core system running CentOS with 128 GB of RAM.
I want to use a code profiler to see if I can figure out which portion of the code is getting bogged down when I increase the number of mutations in my simulation beyond a certain point. Memory doesn't seem to be the issue.
So I installed VNC and Eclipse 3.6SR2 on the server and installed the TPTP plugin.
PROBLEM: The biggest issue is that it doesn't look like Eclipse is using more than one core when I am doing the TPTP "execution time analysis" (I checked using 'top'). Normally, when the program is run from the command line, it uses as many cores as there are threads in the program.
Is there a way to fix this in the Eclipse configuration?
Disclaimer: My company develops JProfiler
A profiler that uses JVMTI should not change the multi-core thread distribution with respect to a regular execution. TPTP may not be the best option for you.
There are several powerful Java profilers on the market. The most well-known free option is VisualVM; a commercial alternative with much more powerful analytic capabilities in the areas of multi-threading and monitor contention analysis is JProfiler - there's a fully functional free trial.
Get a real profiler like YourKit and add the agent to your application at startup.
Then you can open an SSH tunnel to the port where the agent is running and profile your application remotely. It has quite good documentation and a healthy community in its forums. In my opinion, YourKit is great for multithreaded applications; I use it a lot.
No need for VNC and installing Eclipse on a production server.
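Roughly, that workflow looks like the sketch below (the agent path and the port are placeholders; check the YourKit documentation and the agent's startup log for the actual values on your system):

    Start the application on the server with the profiling agent loaded:
        java -agentpath:/path/to/libyjpagent.so -jar myapp.jar

    From your workstation, tunnel the agent port (10001 is used here only as an example):
        ssh -L 10001:localhost:10001 user@server

    Then point the YourKit UI at localhost:10001 to profile the remote JVM.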
I'm looking for a tool which can profile a Java application running as a Windows service (remotely or locally, either way).
I've come across VisualVM as one option. Are there any other products available besides VisualVM? I'm more interested in a reputable product. Can JProfiler do this for me?
Also, does VisualVM give a class-wise list of profiling results?
See this article if you want to use Java VisualVM. It describes how a Java application running as a Windows service can be monitored and/or profiled using VisualVM.
JVisualVM is your best shot. It's free, comes with the JDK, and gives you a pretty decent range of functionality. I'm not sure what you mean by a "class-wise list of profiling results", but it will show you where the majority of your execution time is spent.
You can launch jvisualvm by going to $JAVA_HOME/bin and typing jvisualvm. Then select the VM you wish to profile.
You can use BTrace to instrument your application and measure the parts you are interested in. BTrace logs its output to files, which you can transfer, later or in real time, to a monitoring application such as EurekaJ (which I wrote myself).
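For example, a minimal BTrace script that prints a line every time a method of interest is entered might look like the sketch below (the traced class and method names are hypothetical placeholders):

    // Sketch of a BTrace script; run it against a live JVM with: btrace <pid> MethodTrace.java
    import com.sun.btrace.annotations.BTrace;
    import com.sun.btrace.annotations.OnMethod;
    import static com.sun.btrace.BTraceUtils.*;

    @BTrace
    public class MethodTrace {
        // Fires on entry to the (hypothetical) method com.example.Worker.process()
        @OnMethod(clazz = "com.example.Worker", method = "process")
        public static void onProcess() {
            println(strcat("process() entered at ", str(timeMillis())));
        }
    }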
Both tools are open source and free to use. BTrace uses the "GNU Public License v.2 w/Classpath Exception" license, while EurekaJ uses the GPLv3 license.
Here is an InfoQ write-up on the two tools, which also covers VisualVM and a few command-line tools: http://www.infoq.com/articles/java-profiling-with-open-source
The link "How can I monitor my Java application running as a Windows Service with VisualVM?" says that on recent Windows versions only JMX mode can be used, not local mode.
Is there any profiler available in the Java environment which can be used on a remote machine?
I have a .jar file (plain Java code, nothing fancy) running on a remote machine, and I want to profile it. However, I can't install a profiler on the remote machine since I do not have the necessary permissions. Is there any way I can profile the application from my local machine?
All Java profilers I know of have that ability, since the JVM Tool Interface (JVMTI) is inherently network-capable.
VisualVM has basic (but often sufficient) profiling features and comes with the JDK.