Geode clientSecurity example fails to run - java

Version 1.10, Apache Geode examples, clientSecurity.
When I build the project and execute the 'start' task, a GemFireSecurityException always occurs when starting the server, even though I can find the file "example_security.json" in the directory build/resources/main/.
The locator can find the file, but the server can't. Why?
> Task :clientSecurity:start
1. Executing - start locator --name=locator --bind-address=127.0.0.1 --connect=false --security-properties-file=******** --classpath=../build/resources/main/
........
Locator in C:\Users\kenneth\Desktop\geode-examples-master\clientSecurity\locator on 127.0.0.1[10334] as locator is currently online.
2. Executing - start server --name=server1 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=******** --server-port=0 --user=superUser --password=********
...The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in C:\Users\kenneth\Desktop\geode-examples-master\clientSecurity\server1 for full details.
Exception in thread "main" org.apache.geode.security.GemFireSecurityException: ExampleSecurityManager: unable to find json resource "example_security.json" as specified by [security-json].
at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:842)
at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:732)
at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:251)
************************* Execution Summary ***********************
Script file: C:\Users\kenneth\Desktop\geode-examples-master\clientSecurity\scripts\start.gfsh
Command-1 : start locator --name=locator --bind-address=127.0.0.1 --connect=false --security-properties-file=example_security.properties --classpath=../build/resources/main/
Status : PASSED
Command-2 : start server --name=server1 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=./example_security.properties --server-port=0 --user=superUser --password=123
Status : FAILED

I've just tried this myself locally and it worked just fine; below is the execution output:
user@localhost~/git/geode-examples ((rel/v1.10.0)): cd clientSecurity/
user@localhost~/git/geode-examples/clientSecurity ((rel/v1.10.0)): ../gradlew build
> Task :clientSecurity:compileJava
Note: /Users/user/git/geode-examples/clientSecurity/src/main/java/org/apache/geode_examples/clientSecurity/ExampleAuthInit.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
BUILD SUCCESSFUL in 17s
5 actionable tasks: 4 executed, 1 up-to-date
user@localhost~/git/geode-examples/clientSecurity ((rel/v1.10.0)): ../gradlew start
> Task :clientSecurity:start
1. Executing - start locator --name=locator --bind-address=127.0.0.1 --connect=false --security-properties-file=******** --classpath=../build/resources/main/
......
Locator in /Users/user/git/geode-examples/clientSecurity/locator on 127.0.0.1[10334] as locator is currently online.
Process ID: 3103
Uptime: 8 seconds
Geode Version: 1.10.0
Java Version: 1.8.0_221
Log File: /Users/user/git/geode-examples/clientSecurity/locator/locator.log
JVM Arguments: -DgemfireSecurityPropertyFile=/Users/user/git/geode-examples/clientSecurity/example_security.properties -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-core-1.10.0.jar:../build/resources/main/:/Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-dependencies.jar
2. Executing - start server --name=server1 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=******** --server-port=0 --user=superUser --password=********
...
Server in /Users/user/git/geode-examples/clientSecurity/server1 on 10.255.203.195[50649] as server1 is currently online.
Process ID: 3119
Uptime: 3 seconds
Geode Version: 1.10.0
Java Version: 1.8.0_221
Log File: /Users/user/git/geode-examples/clientSecurity/server1/server1.log
JVM Arguments: -DgemfireSecurityPropertyFile=/Users/user/git/geode-examples/clientSecurity/./example_security.properties -Dgemfire.locators=127.0.0.1[10334] -Dgemfire.security-username=superUser -Dgemfire.start-dev-rest-api=false -Dgemfire.security-password=******** -Dgemfire.use-cluster-configuration=true -XX:OnOutOfMemoryError=kill -KILL %p -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-core-1.10.0.jar:../build/resources/main/:../build/classes/java/main/:/Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-dependencies.jar
3. Executing - start server --name=server2 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=******** --server-port=0 --user=superUser --password=********
...
Server in /Users/user/git/geode-examples/clientSecurity/server2 on 10.255.203.195[50674] as server2 is currently online.
Process ID: 3120
Uptime: 3 seconds
Geode Version: 1.10.0
Java Version: 1.8.0_221
Log File: /Users/user/git/geode-examples/clientSecurity/server2/server2.log
JVM Arguments: -DgemfireSecurityPropertyFile=/Users/user/git/geode-examples/clientSecurity/./example_security.properties -Dgemfire.locators=127.0.0.1[10334] -Dgemfire.security-username=superUser -Dgemfire.start-dev-rest-api=false -Dgemfire.security-password=******** -Dgemfire.use-cluster-configuration=true -XX:OnOutOfMemoryError=kill -KILL %p -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-core-1.10.0.jar:../build/resources/main/:../build/classes/java/main/:/Users/user/git/geode-examples/build/apache-geode-1.10.0/lib/geode-dependencies.jar
4. Executing - connect --user=superUser --password=******** --use-ssl=true --key-store=keystore.jks --key-store-password=******** --trust-store=truststore.jks --trust-store-password=********
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=10.255.203.195, port=1099] ..
Successfully connected to: [host=10.255.203.195, port=1099]
5. Executing - create region --name=region1 --type=REPLICATE
Member | Status | Message
------- | ------ | --------------------------------------
server1 | OK | Region "/region1" created on "server1"
server2 | OK | Region "/region1" created on "server2"
Cluster configuration for group 'cluster' is updated.
6. Executing - create region --name=region2 --type=PARTITION
Member | Status | Message
------- | ------ | --------------------------------------
server1 | OK | Region "/region2" created on "server1"
server2 | OK | Region "/region2" created on "server2"
Cluster configuration for group 'cluster' is updated.
************************* Execution Summary ***********************
Script file: /Users/user/git/geode-examples/clientSecurity/scripts/start.gfsh
Command-1 : start locator --name=locator --bind-address=127.0.0.1 --connect=false --security-properties-file=example_security.properties --classpath=../build/resources/main/
Status : PASSED
Command-2 : start server --name=server1 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=./example_security.properties --server-port=0 --user=superUser --password=123
Status : PASSED
Command-3 : start server --name=server2 --locators=127.0.0.1[10334] --classpath=../build/resources/main/:../build/classes/java/main/ --security-properties-file=./example_security.properties --server-port=0 --user=superUser --password=123
Status : PASSED
Command-4 : connect --user=superUser --password=123 --use-ssl=true --key-store=keystore.jks --key-store-password=password --trust-store=truststore.jks --trust-store-password=password
Status : PASSED
Command-5 : create region --name=region1 --type=REPLICATE
Status : PASSED
Command-6 : create region --name=region2 --type=PARTITION
Status : PASSED
BUILD SUCCESSFUL in 28s
8 actionable tasks: 2 executed, 6 up-to-date
user@localhost~/git/geode-examples/clientSecurity ((rel/v1.10.0)): ../gradlew stop
> Task :clientSecurity:stop
1. Executing - connect --locator=127.0.0.1[10334] --user=superUser --password=******** --use-ssl=true --key-store=./keystore.jks --key-store-password=******** --trust-store=./truststore.jks --trust-store-password=********
Connecting to Locator at [host=127.0.0.1, port=10334] ..
Connecting to Manager at [host=10.255.203.195, port=1099] ..
Successfully connected to: [host=10.255.203.195, port=1099]
2. Executing - shutdown --include-locators=true
Shutdown is triggered
************************* Execution Summary ***********************
Script file: /Users/user/git/geode-examples/clientSecurity/scripts/stop.gfsh
Command-1 : connect --locator=127.0.0.1[10334] --user=superUser --password=123 --use-ssl=true --key-store=./keystore.jks --key-store-password=password --trust-store=./truststore.jks --trust-store-password=password
Status : PASSED
Command-2 : shutdown --include-locators=true
Status : PASSED
BUILD SUCCESSFUL in 3s
2 actionable tasks: 1 executed, 1 up-to-date
user@localhost~/git/geode-examples/clientSecurity ((rel/v1.10.0)):
I've tried on macOS, and I've noticed you're using Windows instead; maybe the problem is caused by the path separator used within the start.gfsh script?
Can you change the scripts under geode-examples\clientSecurity\scripts to use full paths and give it a try?

I changed the locator and server paths to full paths; here is the full output:
PS C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security> C:\Users\hw83770\Documents\pivotal-gemfire-9.8.0\bin\gfsh.bat run --file=.\scripts\start.gfsh
1. Executing - start locator --name=clocator --bind-address=127.0.0.1 --connect=false --security-properties-file=******** --classpath=C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\resources\main
......
Locator in C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\clocator on 127.0.0.1[10334] as clocator is currently online.
Process ID: 28816
Uptime: 7 seconds
Geode Version: 9.8.0
Java Version: 1.8.0_161
Log File: C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\clocator\clocator.log
JVM Arguments: -DgemfireSecurityPropertyFile=C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\example_security.properties -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: C:\Users\hw83770\Documents\pivotal-gemfire-9.8.0\lib\geode-core-9.8.0.jar;C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\resources\main;C:\Users\hw83770\Documents\pivotal-gemfire-9.8.0\lib\geode-dependencies.jar;C:\Users\hw83770\Documents\pivotal-gemfire-9.8.0\extensions\gemfire-greenplum-3.4.1.jar
2. Executing - start server --name=cserver1 --locators=127.0.0.1[10334] --classpath=C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\resources\main:C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\classes\java\main --security-properties-file=******** --server-port=0 --user=superUser --password=********
...The Cache Server process terminated unexpectedly with exit status 1. Please refer to the log file in C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\cserver1 for full details.
Exception in thread "main" org.apache.geode.security.GemFireSecurityException: ExampleSecurityManager: unable to find json resource "example_security.json" as specified by [security-json].
at org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:824)
at org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:716)
at org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:236)
************************* Execution Summary ***********************
Script file: .\scripts\start.gfsh
Command-1 : start locator --name=clocator --bind-address=127.0.0.1 --connect=false --security-properties-file=example_security.properties --classpath=C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\resources\main
Status : PASSED
Command-2 : start server --name=cserver1 --locators=127.0.0.1[10334] --classpath=C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\resources\main:C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\build\classes\java\main --security-properties-file=./example_security.properties --server-port=0 --user=superUser --password=123
Status : FAILED
Besides, it says:
1. Please refer to the log file in C:\Users\hw83770\git\frameworkpoc\rio-geode-cli\client-security\cserver1 for full details
Actually, there is no log file at all, and I'm not familiar with the Geode source code, so I don't know how to deal with this.
I'm working on a POC of Geode; our team needs to ensure that Geode supports security for clients and endpoints. It's very important, so I'm here for some help.

I'm still convinced the problem is caused by your environment, specifically the classpath. As you can see here, the start.gfsh script sets the member's classpath to contain ../build/resources/main/, exactly the folder under which the example_security.json file should be located after building the project with Gradle.
I've also just noticed that, at the very start of your output, you have
C:\Users\hw83770\Documents\pivotal-gemfire-9.8.0\bin\gfsh.bat run --file=.\scripts\start.gfsh... why is that? According to the instructions, you should execute $ ../gradlew start under the clientSecurity directory instead. Running gfsh.bat directly changes the folder from which the script is executed, so ../build/resources/main/ no longer points where it should; this is probably the reason why the example fails.
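For reference, this kind of security manager resolves example_security.json from the JVM classpath, which is why the relative entry ../build/resources/main/ must resolve correctly from the member's working directory. Below is a minimal, hypothetical sketch of such a classpath lookup (illustrative only, not Geode's actual ExampleSecurityManager source):

import java.io.InputStream;

public class ResourceLookupSketch {
    public static void main(String[] args) throws Exception {
        String resource = "example_security.json";
        // The resource is searched for on the JVM classpath; if
        // ../build/resources/main/ resolves to the wrong directory, this
        // lookup returns null, which is the situation behind the
        // GemFireSecurityException quoted above.
        try (InputStream in = Thread.currentThread()
                .getContextClassLoader().getResourceAsStream(resource)) {
            System.out.println(in == null
                    ? "unable to find json resource \"" + resource + "\""
                    : "resource found on the classpath");
        }
    }
}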
Last, but not least, you must not mix Pivotal GemFire with Apache Geode; things will probably not work as expected.

Related

Gitlab ci selenium testing with docker not connecting to RemoteWebDriver

I want to run Selenium tests automatically with gitlab-ci and Docker.
Locally everything works fine, but it seems there are some connection issues between Docker and Selenium.
The job is failing with
selenium.test.dashboard.MyTest > myFirstTest FAILED
org.openqa.selenium.remote.UnreachableBrowserException
Caused by: java.net.ConnectException
Caused by: java.net.ConnectException
java.lang.NullPointerException
I tried different URLs to connect to the Selenium server, and I thought there might be a port issue, but every combination I tried ended up with the same result.
.gitlab-ci.yml
image: gradle:alpine

variables:
  GRADLE_OPTS: "-Dorg.gradle.daemon=false"

before_script:
  - export GRADLE_USER_HOME=`pwd`/.gradle

stages:
  - build
  - seleniumTesting

build:
  stage: build
  script:
    - echo $CI_JOB_STAGE
    - echo $CI_COMMIT_REF_NAME
    - gradle --build-cache war
  artifacts:
    paths:
      - public
  cache:
    key: "$CI_COMMIT_REF_NAME"
    policy: push
    paths:
      - build
      - .gradle

seleniumTestingChrome:
  stage: seleniumTesting
  script: gradle integrationTest
  # services:
  #   - selenium/standalone-chrome:latest
  services:
    - name: selenium/standalone-chrome:latest
  artifacts:
    paths:
      - build/reports/tests/
  cache:
    key: "$CI_COMMIT_REF_NAME"
    policy: push
    paths:
      - build
      - .gradle
Java code for RemoteWebDriver
import java.net.MalformedURLException;
import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setBrowserName(DesiredCapabilities.chrome().getBrowserName());
try {
    // driver = new RemoteWebDriver(new URL("http://selenium_standalone-chrome:4444/wd/hub"), capabilities);
    WebDriver driver = new RemoteWebDriver(new URL("http://127.0.0.1:4444/wd/hub"), capabilities);
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Created containers on the runner:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
92f018da4cbe 8017d8c2ba74 "sh -c 'if [ -x /usr…" 20 seconds ago Up 19 seconds runner-Y2QWpCBd-project-4-concurrent-0-build-4
9dfdc838a7af 9e599fb82f84 "/opt/bin/entry_poin…" 40 seconds ago Up 38 seconds 4444/tcp runner-Y2QWpCBd-project-4-concurrent-0-selenium__standalone-chrome-0
docker logs output on the runner:
2019-08-30 17:06:02,099 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2019-08-30 17:06:02,101 INFO supervisord started with pid 7
2019-08-30 17:06:03,106 INFO spawned: 'xvfb' with pid 10
2019-08-30 17:06:03,109 INFO spawned: 'selenium-standalone' with pid 11
17:06:03.826 INFO [GridLauncherV3.parse] - Selenium server version: 3.141.59, revision: e82be7d358
2019-08-30 17:06:03,830 INFO success: xvfb entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-08-30 17:06:03,830 INFO success: selenium-standalone entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
17:06:04.065 INFO [GridLauncherV3.lambda$buildLaunchers$3] - Launching a standalone Selenium Server on port 4444
2019-08-30 17:06:04.200:INFO::main: Logging initialized #1058ms to org.seleniumhq.jetty9.util.log.StdErrLog
17:06:04.804 INFO [WebDriverServlet.<init>] - Initialising WebDriverServlet
17:06:05.050 INFO [SeleniumServer.boot] - Selenium Server is up and running on port 4444
I expect the test to run on the gitlab-ci runner in a Docker container, connect to the Selenium server, and execute the Selenium test against a publicly available URL.
As pointed out by @Sascha Frinken:
The URL used to connect to the RemoteWebDriver was wrong. I missed one underscore.
"http://selenium_standalone-chrome:4444/wd/hub"
vs.
"http://selenium__standalone-chrome:4444/wd/hub"
There is a topic in the GitLab docs, https://docs.gitlab.com/ee/ci/services/#accessing-the-services, which states:
Everything after the colon (:) is stripped.
Slash (/) is replaced with double underscores (__) and the primary alias is created.
Slash (/) is replaced with a single dash (-) and the secondary alias is created (requires GitLab Runner v1.1.0 or higher).
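Applying those rules to the image selenium/standalone-chrome:latest, the primary alias is selenium__standalone-chrome (tag stripped, slash replaced with a double underscore). A minimal sketch of the corrected driver construction, assuming the Selenium 3 client API used in the question:

import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class RemoteDriverFactory {
    public static WebDriver create() throws Exception {
        DesiredCapabilities capabilities = new DesiredCapabilities();
        capabilities.setBrowserName(DesiredCapabilities.chrome().getBrowserName());
        // Primary service alias: the slash in the image name becomes "__".
        return new RemoteWebDriver(
                new URL("http://selenium__standalone-chrome:4444/wd/hub"),
                capabilities);
    }
}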

Spark Cannot assign requested address: Service Driver failed after 16 retries

I have a Spark cluster of 3 workers (worker-1, worker-2, worker-3) running Spark 2.0.2.
The Spark Master is started on worker-1.
I submit my application with the following script:
#!/bin/bash
sparkMaster=spark://worker-1:6066
mainClass=my.package.Main
jar=/path/to/my/jar-with-dependencies.jar
driverPort=7079
blockPort=7082
deployMode=cluster
$SPARK_HOME/bin/spark-submit \
--conf "spark.driver.port=${driverPort}"\
--conf "spark.blockManager.port=${blockPort}"\
--class $mainClass \
--master $sparkMaster \
--deploy-mode $deployMode \
$jar
When my driver is started on worker-1 (Worker + Master), everything is OK, and my application is correctly executed using all workers.
But when my driver starts on another worker (worker-2 or worker-3), it fails with this error:
Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.2-bin-hadoop2.7/conf/:/root/spark-2.0.2-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.submit.deployMode=cluster" "-Dspark.app.name=my.package.Main" "-Dspark.driver.port=7083" "-Dspark.blockManager.port=7082" "-Dspark.master=spark://worker-1:7077" "-Dspark.jars=file:/path/to/my/jar-with-dependencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181001132624-0001/jar-with-dependencies.jar" "my.package.Main"
========================================
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
...
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Service 'Driver' could not bind on port 0. Attempting port 1.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'Driver' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'Driver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
My 3 workers are configured as follows:
SPARK_LOCAL_IP=worker-[X]
SPARK_LOCAL_DIRS=/data/spark/tmp
SPARK_WORKER_PORT=7078
SPARK_WORKER_DIR=/data/spark/work
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=86400 -Dspark.worker.cleanup.interval=1800"
After multiple attempts to solve this problem, I tried to force the driver to start on the master machine by adding the following option to my submit:
--conf "spark.driver.host=worker-1"
But the driver still starts on a random worker, so this does not solve my problem.
Edit:
When I submit with the spark.driver.host option, the option does not appear in the Launch Command log (but spark.driver.port does appear, so I don't understand why this option is not picked up).
Edit 2:
I have done some deeper tests:
I now have only one worker, running on worker-2, and I am still submitting from worker-1, where my master is running.
When I submit my application, I can see in my worker logs:
2018-10-04 11:27:39,794 | dispatcher-event-loop-6 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Asked to launch driver driver-20181004112739-0003
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying user jar file:/path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:39,833 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Copying /path/to/myjar-with-depencies.jar to /data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar
2018-10-04 11:27:40,243 | DriverRunner for driver-20181004112739-0003 | INFO | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) | Launch Command: "/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
2018-10-04 11:27:42,692 | dispatcher-event-loop-8 | WARN | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66) | Driver driver-20181004112739-0003 exited with failure
And I still have the same error in my driver logs.
I then tried to manually run the command launched by the DriverRunner:
"/usr/java/jdk1.8.0_181-amd64/jre/bin/java" "-cp" "/root/spark-2.0.0-bin-hadoop2.7/conf/:/root/spark-2.0.0-bin-hadoop2.7/jars/*" "-Xmx1024M" "-Dspark.driver.supervise=false" "-Dspark.history.fs.cleaner.interval=12h" "-Dspark.submit.deployMode=cluster" "-Dspark.master=spark://worker-1:7077" "-Dspark.history.fs.cleaner.maxAge=1d" "-Dspark.app.name=my.package.Main" "-Dspark.jars=file:/path/to/myjar-with-depencies.jar" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@worker-2:7078" "/data/spark/work/driver-20181004112739-0003/myjar-with-depencies.jar" "my.package.Main"
And when I do that, the application starts correctly (surprisingly).
What is the difference between my manual start and the one from the DriverRunner that can cause this binding error?
Note:
I made no modifications to the DriverRunner command line to get it to work.
I manually launched the command line as root, and my Spark runs as root too.
I had the same behavior on Spark 2.0.0 and Spark 2.0.2.
So, I'll answer my own question, as I found the reason for this weird behavior.
It happens when I run spark-submit from a machine that has a spark-env.sh file, and more precisely when SPARK_LOCAL_IP is set on that machine.
To avoid this problem, I created a 4th machine running only a Spark Master, with no spark-env.sh file, and I run my spark-submit from there.
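For what it's worth, the underlying failure is a plain socket bind error: the driver JVM inherits a SPARK_LOCAL_IP belonging to another host and cannot bind a server socket to it. A standalone sketch reproducing the same java.net.BindException (10.0.0.99 is a hypothetical address not assigned to the local machine):

import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        // Binding to an IP that is not assigned to any local interface fails
        // with "java.net.BindException: Cannot assign requested address" --
        // the same error the Spark driver retries 16 times before giving up.
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress("10.0.0.99", 7079));
        }
    }
}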

tachyon 0.8.2 deployed with hadoop 2.6.0, but the IPC versions do not match

I want to deploy Tachyon 0.8.2 on my Ubuntu 14.04 system, where I already have Hadoop and Spark:
on the master
bd@master$ jps
11871 Jps
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
on the slave
bd@slave$ jps
4350 Jps
2778 NodeManager
2647 DataNode
2879 Worker
And I edited tachyon-env.sh:
export TACHYON_MASTER_ADDRESS=${TACHYON_MASTER_ADDRESS:-master}
export TACHYON_UNDERFS_ADDRESS=${TACHYON_UNDERFS_ADDRESS:-hdfs://master:9000}
Then I ran bin/tachyon format and bin/tachyon-start.sh local.
I cannot see TachyonMaster in jps:
/usr/local/bigdata/tachyon-0.8.2 [06:06:32]
bd$ bin/tachyon-start.sh local
Killed 0 processes on master
Killed 0 processes on master
Connecting to master as bd...
Killed 0 processes on master
Connection to master closed.
[sudo] password for bd:
Formatting RamFS: /mnt/ramdisk (512mb)
Starting master @ master
Starting worker @ master
/usr/local/bigdata/tachyon-0.8.2 [06:06:54]
bd$ jps
12183 TachyonWorker
3388 Master
2919 NameNode
3266 ResourceManager
3123 SecondaryNameNode
12203 Jps
And I see this in the master logs:
2015-12-27 18:06:50,635 ERROR MASTER_LOGGER (MetricsConfig.java:loadConfigFile) - Error loading metrics configuration file.
2015-12-27 18:06:51,735 ERROR MASTER_LOGGER (HdfsUnderFileSystem.java:<init>) - Exception thrown when trying to get FileSystem for hdfs://master:9000
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.underfs.hdfs.HdfsUnderFileSystem.<init>(HdfsUnderFileSystem.java:74)
at tachyon.underfs.hdfs.HdfsUnderFileSystemFactory.create(HdfsUnderFileSystemFactory.java:30)
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:116)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
2015-12-27 18:06:51,742 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master
java.lang.IllegalArgumentException: All eligible Under File Systems were unable to create an instance for the given path: hdfs://master:9000
java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at tachyon.underfs.UnderFileSystemRegistry.create(UnderFileSystemRegistry.java:132)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:100)
at tachyon.underfs.UnderFileSystem.get(UnderFileSystem.java:83)
at tachyon.master.TachyonMaster.connectToUFS(TachyonMaster.java:412)
at tachyon.master.TachyonMaster.startMasters(TachyonMaster.java:280)
at tachyon.master.TachyonMaster.start(TachyonMaster.java:261)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:64)
What should I do about this problem?
This exception arises from a version mismatch between the Hadoop client and server: server IPC version 9 corresponds to Hadoop 2.x, while client version 4 means the Tachyon build is using Hadoop 1.x client libraries. Check your Hadoop version, and then recompile Tachyon against that version using this command:
mvn -Dhadoop.version=your_hadoop_version clean install
Example: mvn -Dhadoop.version=2.4.0 clean install
Now configure your compiled Tachyon and it should work fine.
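If you are unsure which Hadoop client version actually ends up on the classpath, a quick check is possible with Hadoop's own VersionInfo API (a sketch; it assumes the hadoop-common jar that Tachyon bundles is on the classpath):

import org.apache.hadoop.util.VersionInfo;

public class HadoopVersionCheck {
    public static void main(String[] args) {
        // Prints the version of the Hadoop client libraries actually loaded,
        // which should match the version of the Hadoop cluster (here 2.6.0).
        System.out.println("Hadoop client version: " + VersionInfo.getVersion());
    }
}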

Can't create Java application in OpenShift

I'm developing a web application using Struts 2, Hibernate, etc., and I am not using Maven; I put the needed jars into the app.
I put my app on git (https://github.com/vahidhiv/vaphap) and want to use OpenShift to test it, but I get the error below.
What should I do?
The initial build for the application failed: Shell command '/sbin/runuser -s /bin/sh 55966353e0b8cdebf9000040 -c "exec /usr/bin/runcon 'unconfined_u:system_r:openshift_t:s0:c2,c481' /bin/sh -c \"gear postreceive --init >> /tmp/initial-build.log 2>&1\""' returned an error. rc=255
Last 10 kB of build output:
The jbossews cartridge is already stopped
Repairing links for 1 deployments
Building git ref 'master', commit 6e5a635
Skipping Maven build due to absence of pom.xml
Preparing build for deployment
Deployment id is 3ec7fc2a
Activating deployment
Starting jbossews cartridge
jbossews process failed to start
-------------------------
Git Post-Receive Result: failure
Activation status: failure
Activation failed for the following gears:
55966353e0b8cdebf9000040 (Error activating gear: CLIENT_ERROR: Failed to execute: 'control start' for /var/lib/openshift/55966353e0b8cdebf9000040/jbossews)
Deployment completed with status: failure
postreceive failed

Tooltwist Controller connection reset by peer

I am trying to deploy our designer using the Tooltwist Controller and I keep receiving the following error:
+------------------------------------------------------------------------------------------+
| |
| GENERATION PHASE |
| |
+------------------------------------------------------------------------------------------+
...
**
** Check the server is running
**
Setting JAVA_OPTS=-Xms512m -Xmx5g -XX:MaxPermSize=512m
Starting the launchpad...
$ ./startup.sh
Wait a bit...
-
Error with http request: Connection reset by peer
==>> Status is error - Connection reset by peer
==>> Status is down
**
** Fatal error: Could not start the launchpad.
**
Finished: SUCCESS
I have tried changing the Tomcat version in the payloads, using both Tomcat 7.0.54 and Tomcat 7.0.40, but the issue persists.
It appears that the launchpad server is not starting correctly. There are many reasons why Tomcat might not start, so the best first step is to look at the Tomcat log file, which will be located somewhere like /ControllerV8/launchpads/<launchpad-name>/image/tomcat/logs/catalina.out.
One possibility could be that another launchpad's server is already running using the same launchpad ports.
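One quick way to test the port-conflict theory is to try binding the suspect port yourself before starting the launchpad. A small sketch (8080 is a hypothetical placeholder; substitute the port your launchpad is configured to use):

import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    public static void main(String[] args) {
        int port = 8080; // hypothetical launchpad port -- substitute your own
        try (ServerSocket s = new ServerSocket(port)) {
            System.out.println("port " + port + " is free");
        } catch (IOException e) {
            System.out.println("port " + port + " is already in use: " + e.getMessage());
        }
    }
}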
