I have configured HBase and integrated it with HDFS on Windows successfully. I am using HBase version 0.98.6.1-hadoop2 and Hadoop version 2.5.1.
I followed the HBase quick start tutorial.
If I run HBase normally (without the hbase.cluster.distributed property), it works fine. Otherwise it shows "This is not implemented yet. Stay tuned."
How do I start HBase in distributed mode on Windows without Cygwin?
As far as I know, you can do it in one of these ways:
1) Use Cygwin (ruled out by your requirements).
2) Use VMware or VirtualBox.
3) Use Microsoft HDInsight (probably the best fit for you).
Before starting HBase, make sure Hadoop is running in distributed mode and working; only then will HBase run in distributed mode, otherwise it falls back to local mode.
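For reference, a minimal hbase-site.xml sketch for running HBase on top of HDFS with that property set; the hbase.rootdir host and port are assumptions and must match your Hadoop fs.defaultFS:

    <configuration>
      <!-- Run the HBase daemons as separate processes instead of a single JVM. -->
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <!-- Store HBase data in HDFS; adjust host/port to your Hadoop setup. -->
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://localhost:9000/hbase</value>
      </property>
    </configuration>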
I am currently trying to copy data from my Postgres database, which runs in a Docker container, to my Windows host. For this purpose I implemented a Java application (also in a Docker container) that uses the Postgres JDBC driver and its CopyManager to copy specific data to the host via a mapped volume.
Problem: when I copy the data to the mapped Windows directory, it becomes very slow (writing 1 GB of data takes about 40 minutes; without volume mapping, only 1 minute).
Docker-compose:
    exportservice:
      build: ./services/exportservice
      volumes:
        - samplePath:/export_data
I have already read that it's a known problem, but I haven't found a suitable solution.
My services have to run in a production environment that is based on Windows.
So what's the way to solve this issue? WSL2?
Looking forward to your advice!
Mounting a Windows folder into a Docker container is always slow no matter how you do it. WSL2 is even slower than WSL1 in that respect.
The best solution is to install WSL2, copy all your project files into the Linux file system (mounted in Windows at \\wsl$\<distro>\), run containers from there and mount Linux directories accordingly. That bypasses any Windows file interaction.
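As an illustration, the compose volume would then point at a path inside the WSL2 distro rather than a Windows folder (the path below is a hypothetical example):

    exportservice:
      build: ./services/exportservice
      volumes:
        # Hypothetical path inside the WSL2 distro's Linux file system.
        - /home/youruser/project/export_data:/export_data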
I wrote a Docker for Web Developers book and video course because I couldn't find good getting-started tutorials which explained how to create local development environments. It includes Hyper-V and WSL2 instructions and advice. Use the discount code dock30 for 30% off.
Use WSL2 instead of WSL 1 and keep the data on the Linux file system. You could also reduce the number of write cycles, and therefore the write overhead, by using a BufferedWriter in Java.
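A minimal sketch of that idea, assuming the pgjdbc driver and hypothetical connection details, table name, and output path; it wraps the COPY output in a large BufferedWriter so the mounted volume sees fewer, larger writes:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class ExportToVolume {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust to your environment.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://postgres:5432/mydb", "myuser", "mypassword")) {

                CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();

                // An 8 MB buffer batches many small writes into fewer large ones
                // before they hit the mounted /export_data volume.
                try (BufferedWriter out = new BufferedWriter(
                        new FileWriter("/export_data/my_table.csv"), 8 * 1024 * 1024)) {
                    copy.copyOut("COPY my_table TO STDOUT WITH (FORMAT csv)", out);
                }
            }
        }
    }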
I need a simulator to run some servers on Hadoop, with these requirements:
Able to work with a database.
I want to run a Java program on it and see its results.
Run Hadoop without MapReduce.
You don't run servers on Hadoop. It's the other way around.
If you want to create a Hadoop environment without installing Hadoop on your own, you can download a virtual machine or start an account with any of the major cloud providers.
Hadoop just starts YARN and HDFS. If you want to run code that isn't MapReduce, you'll need to find/install another tool such as Spark, Pig, Hive, Flink, etc., each of which can be used to query databases but is not a database itself.
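For example, a minimal sketch of a Spark job in Java (submitted to the cluster with spark-submit on YARN) that queries a database over JDBC; the URL, table, and credentials are placeholders, and the matching JDBC driver must be on the classpath:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JdbcQueryExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("jdbc-query-example")
                    .getOrCreate();

            // Read a table from a relational database over JDBC (placeholder values).
            Dataset<Row> df = spark.read()
                    .format("jdbc")
                    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
                    .option("dbtable", "public.my_table")
                    .option("user", "myuser")
                    .option("password", "mypassword")
                    .load();

            df.show();
            spark.stop();
        }
    }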
I have created a 4-node Hadoop cluster. I start all the DataNodes, the NameNode, the ResourceManager, etc.
To find out whether all of my nodes are working or not, I tried the following procedure:
Step 1. I run my program when all nodes are active.
Step 2. I run my program when only the master is active.
The completion time in both cases was almost the same.
So, I would like to know if there is any other means by which I can tell how many nodes are actually used while running the program.
We discussed this in chat. The problem is caused by an incorrect Hadoop installation: in both cases the job was started locally using the LocalJobRunner.
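One quick way to confirm this from code is to print the settings the job client actually sees (run it with the hadoop command so your configuration directory is on the classpath); a sketch, assuming Hadoop 2.x:

    import org.apache.hadoop.mapred.JobConf;

    public class FrameworkCheck {
        public static void main(String[] args) {
            // JobConf pulls in mapred-site.xml from the config directory on the classpath.
            JobConf conf = new JobConf();
            // "local" means jobs run in a single JVM via the LocalJobRunner;
            // "yarn" means they are actually submitted to the cluster.
            System.out.println("mapreduce.framework.name = "
                    + conf.get("mapreduce.framework.name", "local"));
            // "file:///" means the local file system is used instead of HDFS.
            System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "file:///"));
        }
    }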
Some recommendations:
Install Hadoop using Ambari (http://ambari.apache.org/)
Switch the platform to CentOS 6.4+
Use Oracle JDK 7
Be careful with host names and firewall settings
Get familiar with the cluster health-diagnostic commands and the default Hadoop web UIs
What is the basic configuration required in a system to install Hadoop in pseudo-distributed mode?
Specifically, the capacity of the following:
1. RAM
2. Processor
3. Hard disk, etc.
Follow any tutorial to set up Hadoop on your machine. Here is a good document by Michael Noll:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
You can also refer to the Apache website for a detailed explanation of the prerequisites and the installation procedure to set up Hadoop:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
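As a quick orientation, the pseudo-distributed setup in that Apache guide comes down to two small config files (the port below follows the guide's example value):

    <!-- etc/hadoop/core-site.xml -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- etc/hadoop/hdfs-site.xml -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>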
Hadoop is designed to run on commodity (affordable) hardware. I think if you are using Hadoop for learning or experimental purposes, then 2 GB of RAM, a 40 GB HDD, and a dual-core processor will be enough. I would prefer to set up a Linux environment rather than use a VM on Windows.
To try it out on Windows using VMware, you can download a sandbox VM, freely available from vendors such as Cloudera or Hortonworks.
I have a cluster of 32 servers and I need a tool to distribute a Java service, packaged as a JAR file, to each machine and remotely start the service. The cluster consists of Linux (SUSE 10) servers with 8 cores per blade. The application is a data grid which uses Oracle Coherence. What is the best tool for doing this?
I asked something similar once, and it seems that the Java Parallel Processing Framework might be what you need:
http://www.jppf.org/
From the web site:
JPPF is an open source Grid Computing platform written in Java that makes it easy to run applications in parallel, and speed up their execution by orders of magnitude. Write once, deploy once, execute everywhere!
Have a look at OpenMOLE: http://www.openmole.org/
This tool enables you to distribute a computing workflow across several kinds of resources: from multicore machines to clusters and computing grids.
It is nicely documented and can be controlled through Groovy code or a GUI.
Distributing a jar on a cluster should be very easy to do with OpenMOLE.
Is your service packaged as an EJB? JBoss does a fairly good job with clustering.
Use BitTorrent. Using peer-to-peer sharing on clusters can really boost your deployment speed.
It depends on which operating system you have and how security is setup on your network.
If you can use NFS or a Windows share, I suggest you put the software on an NFS drive that is visible to all machines. That way you can run them all from one copy.
If you have remote shell (rsh) or secure shell (ssh) access, you can write a script that runs the same command on each machine, e.g. start on all machines or stop on all machines.
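A minimal sketch of that idea in Java (rather than a shell script), shelling out to scp/ssh; the host names, paths, and start command are placeholders:

    import java.util.Arrays;
    import java.util.List;

    public class DeployAll {
        public static void main(String[] args) throws Exception {
            // Placeholder host names; in practice read them from a file.
            List<String> hosts = Arrays.asList("node01", "node02", "node03");

            for (String host : hosts) {
                // Copy the jar, then start it detached on the remote machine.
                run("scp", "service.jar", host + ":/opt/service/service.jar");
                run("ssh", host, "nohup java -jar /opt/service/service.jar > /var/log/service.log 2>&1 &");
            }
        }

        private static void run(String... cmd) throws Exception {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new RuntimeException("Command failed: " + String.join(" ", cmd));
            }
        }
    }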
If you have Windows you might want to set up a service on each machine. If you have Linux you might want to add a startup/shutdown script to each machine.
When you have a number of machines, it may be useful to have a tool that monitors that all your services are running, collects the logs and errors in one place, and/or allows you to start/stop them from a GUI. There are a number of tools for this; I'm not sure which is best these days.