Configure Spark for a given cluster - java

I have to send some applications written in Python to an Apache Spark cluster. I am given a cluster manager and some worker nodes with the addresses to send the applications to.
My question is: how do I set up and configure Spark on my local computer so that I can send those requests, along with the data to be processed, to the cluster?
I am working on Ubuntu 16.xx and have already installed Java and Scala. I have searched the internet, but most of what I find is either how to build the cluster itself or old, out-of-date advice on how to do it.

I assume your remote cluster is running and you are able to submit jobs to it from the remote server itself. What you need is SSH tunneling. Keep in mind that it does not work with AWS.
ssh -f user@personal-server.com -L 2000:personal-server.com:7077 -N
read more here: http://www.revsys.com/writings/quicktips/ssh-tunnel.html
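With the tunnel in place, one rough sketch (assuming the standalone master really is listening on port 7077 and that your application file is a hypothetical app.py) is to point spark-submit at the forwarded local port:
./spark-submit --master spark://localhost:2000 app.py
Whether the workers can then reach back to your machine depends on the network setup, so treat this as a starting point rather than a guaranteed configuration.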

Your question is unclear. If the data is on your local machine, you should first copy it to the cluster's HDFS filesystem. Spark can work in three modes with YARN (are you using YARN or Mesos?): cluster, client, and standalone. What you are looking for is client mode or cluster mode, but if you want to start the application from your local machine, use client mode. If you have SSH access, you are free to use either.
The simplest way is to copy your code directly onto the cluster, assuming it is properly configured, and then start the application with the ./spark-submit script, providing the class to use as an argument. It works with Python scripts as well as Java/Scala classes (I only use Python, so I don't know the details for the latter).
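As a minimal sketch of that call in client mode (master-host and my_app.py are placeholders for your cluster manager's address and your script):
./spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  my_app.py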

Related

How to get user count on an EC2 Windows instance (AWS)

I'm trying to incorporate some AWS features into my JSF application. I have multiple EC2 instances running Windows Server, and I would like to know how many Windows users are connected to each instance and whether or not they are actively using the system.
That info will then be used to create and terminate instances on the fly. I've tried using an ELB, but there is no metric for the number of connected users or whether they are active.
Currently I'm using the Java AWS SDK 1.11.657 due to some application constraints. Given that I have a list of my instances and the ability to create and terminate them, how would I go about finding the number of users connected to each instance? I did not find anything online using the Java SDK. Thank you.
You can use the Remote Desktop Services API/SDK or PowerShell cmdlet (Get-RDUserSession) to determine the count of active RDP sessions. There's also, allegedly, a more sophisticated cross-server PowerShell script.
To remotely invoke PowerShell scripts on Windows instances, you can use SSM Run Command. Here's an example of using the awscli to do this:
aws ssm send-command \
--document-name "AWS-RunShellScript" \
--comment "List Windows services" \
--instance-ids "i-1234567890,i-0987654321" \
--parameters commands="service --status-all" \
--output text
Note that your Windows instances need to be set up in advance to support this, so see this tutorial. It's likely Linux-oriented, but hopefully it gets you started in the right direction.
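For Windows instances specifically, a sketch along the same lines (the instance IDs are placeholders) would use the AWS-RunPowerShellScript document with a command such as quser, which lists the logged-on user sessions:
aws ssm send-command \
  --document-name "AWS-RunPowerShellScript" \
  --comment "List RDP sessions" \
  --instance-ids "i-1234567890" "i-0987654321" \
  --parameters commands="quser" \
  --output text
The same call can also be made from the Java SDK you're already using, via the SSM client's sendCommand operation.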

Connect to a remote Windows server using Java and modify some files?

Hi, I have been doing some manual tasks which consume some of my time, and I want to automate them. The tasks are:
Connecting to a remote Windows server using the mstsc command and restarting some services.
Connecting to a remote Windows server, modifying some files, checking the modifications, and then reverting the changes once they have been tested.
I want to know whether I can achieve a one-click solution for this scenario by writing some Java code, thereby reducing the manual effort.
Or is there any other solution for this that would be generic and could be applied to other servers too?
The steps the solution would need to cover are:
Connect to the remote machine using a username and password.
Restart the services from the code, or just execute a batch file that does the same and sits in some folder on that machine.
Modify some files on the remote machine.

Is it possible to run a Java command line app from Python on AWS EC2?

I am working on some machine learning for chemical modelling in Python. I need to run a Java app (from the command line, via Python's subprocess.call) and a Python web server. Is this possible on AWS EC2?
I currently have this setup running on my Mac, but I am curious how to set it up on AWS.
Thanks in advance!
Since you're just making a command line call to the Java app, the path of least resistance would be to make that call from another server using ssh. You can easily adapt the command you've been using with subprocess.call to use ssh -- more or less, subprocess.call(['ssh', '{user}@{server}', command]) (although have fun figuring out the quotation marks). As an aside along those lines, I usually find that adding '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' stabilizes scripted SSH calls in my environment.
The more involved part will be setting up the environments to properly run the components you need. You'll need to set up ssh configs so that your Django app can ssh over -- probably with private key authentication. Then, you'll need to make sure that your EC2 security groups allow ssh access to your Java server on port 22, where sshd listens by default.
None of this is that hairy, but all the same, it might be more stable to just wrap your Java service in an HTTP server that your Django app can hit. Anyway, hope this is helpful.

Cannot Connect to HBase Server from Java Code

I have installed Hadoop and HBase on a VirtualBox VM running Ubuntu; both Hadoop and HBase run successfully in pseudo-distributed mode. I have disabled IPv6 on Ubuntu and changed localhost to 127.0.0.1 in the hosts file on the VM.
I am trying to write some basic Java code on a Windows machine in Eclipse to connect to the HBase instance, create a table, insert and retrieve data, etc. The code fails with an error saying it cannot connect to the master; however, it makes the ZooKeeper connection to the VM just fine.
On the Windows machine, I am able to view the HBase instance's info pages in a web browser using the same IP address and port that I specify in the Java code.
I have searched everywhere and tried everything I could find, but it still fails to connect to the master after making the ZooKeeper connection.
I have read that others have had this problem too, but no one has posted a solution.
Please help! Thanks!
The IP and port used to view that information are not the ones used to read from and write to HBase. To do that, you need to use either the REST API (included in HBase) or Apache Thrift (two Thrift servers are included with HBase: thrift and thrift2).
I would recommend you use Apache Thrift (thrift2).
To start REST, use:
$HBASE_INSTALL_DIR/bin/hbase-daemon.sh start rest
To start Thrift, use:
$HBASE_INSTALL_DIR/bin/hbase-daemon.sh start thrift
To start Thrift (v2), use:
$HBASE_INSTALL_DIR/bin/hbase-daemon.sh start thrift2
To use the Thrift client from Java, for example, you will need to install Thrift on the server and then generate the Java classes using the .thrift definition file shipped with HBase.
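As a rough sketch (the location of the .thrift definition file differs between HBase versions and packages, so the path below is a placeholder), the generation step with the Thrift compiler looks like:
thrift --gen java /path/to/hbase.thrift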
By default, Thrift listens on port 9090 and REST on port 8080.
Useful links:
HBase Thrift
HBase REST
OK -- someone gave me some one-on-one help that fixed the problem, and I wanted to pass it along. It turned out to be an IP addressing issue between the VM and my Windows machine. First, in the /etc/hosts file on the VM, I had to take out '127.0.0.1 localhost' and instead insert '<VM IP address> localhost'. Second, in my Windows hosts file, I had to add '<VM IP address> <VM hostname>'. Thankfully, that fixed the problem. Please let me know if this is unclear, since I have seen this problem posted quite a few times without a suitable resolution. Also, since I am writing Java code to access the HBase instance in the VM, there was no need to use Thrift or REST -- the Java API was sufficient.
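For reference, a minimal sketch of that kind of Java API usage (the ZooKeeper quorum host, table name, and column names below are placeholders, and this assumes an HBase 1.x+ client on the classpath):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseSmokeTest {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // The VM's hostname as resolvable from the Windows machine (placeholder).
        conf.set("hbase.zookeeper.quorum", "hbase-vm");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("test_table"))) {
            // Write one cell, then read it back.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}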

Connecting to Hadoop from Java app using SSH

I'm trying to connect to a remote Hadoop cluster which isn't accessible through HDFS alone. Right now it is used this way: the user connects to a jump box through SSH (e.g. ssh user@somejumboxhost.com), and then from the jump box server we connect to Hadoop, also with ssh (e.g. ssh user@realhadoopcluster1.com). What I'm trying to do is access files from my Scala/Java application using the HDFS client. Now I'm feeling like I'm in The Matrix -- "I must go deeper" -- and I don't know how to reach the server.
Maybe someone has had a similar experience? Right now I'm trying to connect to the first server with an SSH client from my app, but then I don't know how to invoke the HDFS client.
Any ideas will be appreciated, thanks!
I can think of something like this. There is the "ganymed-ssh2" API, which lets you connect to a server using ssh and run Unix commands from there. Using this, you can connect to your jump box.
From there you can run a command like "ssh user@realhadoopcluster1.com hadoop fs something",
since we can run commands over ssh like this.
From your jump box, set up passwordless ssh to your hadoopcluster machine, or you can use sshpass with a password.
You can visit the following link to see how to use this API:
http://souravgulati.webs.com/apps/forums/topics/show/8116298-how-to-execute-unix-command-from-java-
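A rough sketch of what that could look like with ganymed-ssh2 (the hostnames, user, and password are placeholders, and the nested hadoop fs command is whatever you actually need to run):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import ch.ethz.ssh2.Connection;
import ch.ethz.ssh2.Session;
import ch.ethz.ssh2.StreamGobbler;

public class JumpBoxHdfsList {
    public static void main(String[] args) throws IOException {
        Connection conn = new Connection("somejumboxhost.com");
        conn.connect();
        if (!conn.authenticateWithPassword("user", "password")) {
            throw new IOException("Authentication failed");
        }
        Session session = conn.openSession();
        // Hop from the jump box to the cluster and run an HDFS command there;
        // this assumes passwordless ssh from the jump box to the cluster node.
        session.execCommand("ssh user@realhadoopcluster1.com 'hadoop fs -ls /'");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new StreamGobbler(session.getStdout())))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        session.close();
        conn.close();
    }
}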
Hadoop is implemented in Java, so you could just run the Hadoop cluster directly from your application. Use Java RMI if it's a remote cluster. This extra pipework you're trying to do makes no sense.
