File could only be written to 0 of the 1 minReplication nodes - java

Server A: 192.168.96.130, OS: CentOS 7.x
Server B: localhost (my computer), OS: Windows 10
I installed Hadoop 3.1.2 on server A and wrote a Java application that writes data into HDFS on server A.
When the Java application is deployed on server A, it writes files with content to HDFS successfully.
When the Java application is deployed on server B, it can create files on HDFS but cannot write any content into them. I always get this error:
2020-03-18 20:56:43,460 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call Call#4 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.96.1:53463
java.io.IOException: File /canal/canal_1/canal_1-2020-3-19-4.txt could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
And below is my Java application code:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.96.1:9000/"), conf, "root");
FSDataOutputStream out = fs.create(new Path("/canal/canal_1/canal_1-2020-03-10.txt"));
out.writeBytes("15, kevin15, 2020.3.15");
out.flush();
out.close();
fs.close();
How can I solve this problem?

I think you should first check your cluster health at http://namenode1:50070.
Then, maybe you have not turned off iptables, so you cannot reach the DataNode port from server B. You can try to execute the command telnet SERVER1_IP 50020 on both server1 and server2 to check it.
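In cases where the NameNode is reachable but the DataNode is not (the file gets created but no block can be written, exactly as described above), another client-side setting that is sometimes relevant is dfs.client.use.datanode.hostname, which makes the client contact DataNodes by hostname instead of the cluster-internal IP. The sketch below is only a hedged illustration of that idea, reusing the NameNode URI from the question's code; whether it applies depends on your network setup, and the DataNode hostname must be resolvable from server B (for example via the hosts file):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the client to resolve DataNodes by hostname rather than the
        // internal IP the NameNode reports; the hostname must be resolvable
        // from the client machine (server B).
        conf.set("dfs.client.use.datanode.hostname", "true");

        // NameNode URI taken from the question's code; path is a placeholder.
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.96.1:9000/"), conf, "root");
        try (FSDataOutputStream out = fs.create(new Path("/canal/canal_1/test.txt"))) {
            out.writeBytes("15, kevin15, 2020.3.15");
        }
        fs.close();
    }
}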

Related

SparkJob in multinode cluster: WARN TaskSetManager: Lost task 0.0 in stage 0.0: java.io.FileNotFoundException

I have just set up a Spark multi-node cluster. My cluster is made up of an iMac and a couple of Raspberry Pis, all linked via Ethernet, with passwordless SSH access to one another.
The Spark command I'm trying to execute is:
spark-submit --master spark://10.0.0.20:7077 rdd/WordCount.py
My slave nodes are:
10.0.0.10
10.0.0.11
The code exits with the error shown in the following snippet of the log:
21/01/13 13:54:38 INFO Utils: Fetching ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text to /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/fetchFileTemp8028573497969255747.tmp
...
21/01/13 13:54:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.0.10, executor 1): java.io.FileNotFoundException: File file:/private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/word_count.text does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:282)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:281)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
The file word_count.text is retrieved inside the Python script via FTP at this URL:
"ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text"
Apparently, the file is fetched on the master into the
/private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676 directory, and then Spark tries to retrieve the same file from the same directory on the slaves. Of course, on the slaves Spark cannot find that path.
Why?
As a further test, I created the path /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676 on the slave and manually put the file there, but I wasn't able to get rid of the error.
Can somebody help?
Thank you in advance.
[SOLVED]: What none of the tutorials I found online say is that you have to mount the exact same path where the input file is fetched on the master in each and every worker.
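A related point, offered only as a sketch: the error also goes away if the input path resolves to the same file on the driver and on every worker, for example a shared mount, an NFS export, or HDFS. The Java word-count below illustrates that idea; the path file:///data/shared/word_count.text is a hypothetical placeholder for wherever the shared copy actually lives:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("WordCount")
                .setMaster("spark://10.0.0.20:7077");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // This path must resolve to the same file on the driver and on
            // every worker (shared mount, NFS export, or HDFS path).
            JavaRDD<String> lines = sc.textFile("file:///data/shared/word_count.text");
            lines.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                 .mapToPair(word -> new Tuple2<>(word, 1))
                 .reduceByKey(Integer::sum)
                 .collect()
                 .forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}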

ClassNotFoundException: com.mongodb.ConnectionString for Apache Kafka Mongodb connector

I am configuring a Kafka Mongodb sink connector on my Windows machine.
My connect-standalone.properties file has
plugin.path=E:/Tools/kafka_2.12-2.4.0/plugins
My MongoSinkConnector.properties file has
name=mongo-sink
topics=first_topic
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true
# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017,mongo1:27017,mongo2:27017,mongo3:27017
database=test_kafka
collection=transactions
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect
In the E:/Tools/kafka_2.12-2.4.0/plugins folder I have the mongo-kafka-connect-1.0.1.jar file.
The command I run is:
bin\windows\connect-standalone config\connect-standalone.properties config\MongoSinkConnector.properties
The error I get is:
[2020-03-23 04:04:12,376] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone)
java.lang.NoClassDefFoundError: com/mongodb/ConnectionString
at com.mongodb.kafka.connect.sink.MongoSinkConfig.createConfigDef(MongoSinkConfig.java:140)
at com.mongodb.kafka.connect.sink.MongoSinkConfig.<clinit>(MongoSinkConfig.java:78)
at com.mongodb.kafka.connect.MongoSinkConnector.config(MongoSinkConnector.java:62)
at org.apache.kafka.connect.connector.Connector.validate(Connector.java:129)
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:313)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:194)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:115)
Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
Which other jar files should I place in the plugins folder, and/or which config changes do I have to make?
UPDATE 1
I have also placed the mongodb-driver-core-4.0.1 and bson-4.0.1 jar files in the plugins folder, but I still get the same error.
Finally, I managed to make the mongo-kafka-connector work on Windows.
Here is what worked for me:
The Kafka installation folder is E:\Tools\kafka_2.12-2.4.0.
E:\Tools\kafka_2.12-2.4.0\plugins has the mongo-kafka-1.0.1-all.jar file.
I downloaded this from https://www.confluent.io/hub/mongodb/kafka-connect-mongodb
Click the blue Download button at the left to get the mongodb-kafka-connect-mongodb-1.0.1.zip file.
The zip also contains the file MongoSinkConnector.properties in its etc folder.
Move it to kafka_installation_folder\plugins.
My connect-standalone.properties file has the following entries:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=E:/Tools/kafka_2.12-2.4.0/plugins/mongo-kafka-1.0.1-all.jar
My MongoSinkConnector.properties file has the following entries:
name=mongo-sink
topics=topic1,topic2
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
connection.uri=mongodb://localhost:27017,localhost:27017,localhost:27017
database=test_kafka
collection=transactions
max.num.retries=3
retries.defer.timeout=5000
field.renamer.mapping=[]
field.renamer.regex=[]
max.batch.size = 0
rate.limiting.timeout=0
rate.limiting.every.n=0
How To Run
Start MongoDB, ZooKeeper, and the Kafka server in three consoles.
In a 4th console, start Kafka Connect:
bin\windows\connect-standalone config\connect-standalone.properties config\MongoSinkConnector.properties
In a 5th console, send messages to a topic (I did this for topic1):
bin\windows\kafka-console-producer --broker-list localhost:9092 --topic topic1
>{"Hello":1}
>{"Mongo":2}
>{"World":3}
Open a mongo client and check your database/collections. You will see these three messages.
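If you prefer to verify from code instead of the mongo shell, here is a minimal sketch using the MongoDB Java driver. It assumes the 4.x sync driver is on the classpath and reuses the database and collection names from the properties above:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CheckSinkOutput {
    public static void main(String[] args) {
        // Connect to the same MongoDB instance the sink connector writes to.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test_kafka").getCollection("transactions");
            // Print every document the connector has written so far.
            for (Document doc : coll.find()) {
                System.out.println(doc.toJson());
            }
        }
    }
}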

there are 1 datanode(s) running and 1 node(s) are excluded in this operation

I have configured hadoop-2.7.2 as a pseudo-distributed (single-node) cluster on Windows. I have created a client by copying the Hadoop package to another machine.
I am able to list, create, and delete directories from the client. But when I run the example job using the command below
hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.2.jar wordcount /names /names1
I get the following exception:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hadoop-yarn/staging/Kumar/.staging/job_1455644013935_0008/job.jar could only be replicated to 0 nodes instead of minReplication (=1). There is 1 datanode(s) running and 1 node(s) are excluded in this operation.
But when I run the same command on the node where Hadoop is running, it executes successfully.
Can someone help me submit the job from the client machine without any issue?
It looks like your DataNodes are full. Could you please check the disk space?
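If you want to check remaining HDFS capacity from the client machine as well (in addition to running hdfs dfsadmin -report on the cluster), a small hedged sketch using the standard FileSystem API could look like this; the NameNode URI below is a placeholder:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class DfsCapacityCheck {
    public static void main(String[] args) throws Exception {
        // Replace with your actual NameNode address.
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode-host:9000/"), new Configuration());
        FsStatus status = fs.getStatus();
        System.out.printf("capacity:  %d bytes%n", status.getCapacity());
        System.out.printf("used:      %d bytes%n", status.getUsed());
        System.out.printf("remaining: %d bytes%n", status.getRemaining());
        fs.close();
    }
}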

How to configure hadoop with eclipse

I am new to Hadoop. I have downloaded the Hortonworks Sandbox image and mounted it in VirtualBox. The Sandbox UI comes up when I type 192.168.56.101/ in Chrome, and I am also able to log in to the Hadoop shell with the hue/hadoop username and password. Now I want to run a simple program in Eclipse. I have added the hadoop-0.18.3-eclipse-plugin to Eclipse and then tried the following steps:
1. Chose Map/Reduce from Eclipse.
2. Went to the Hadoop location editor:
host name: localhost
under Map/Reduce master
port: 9000
under DFS master
port: 9001
But I am getting this error:
Cannot connect to the Map/Reduce location: localhost Call to
localhost/127.0.0.1:9001 failed on connection exception:
java.net.ConnectException: Connection refused: no further information
VirtualBox is running.
Add the required Hadoop dependency jar files to your Eclipse classpath.
In the main method of your MapReduce program, add these lines:
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://localhost:50000");
conf.set("mapreduce.job.tracker", "localhost:50001");
If you are running Hadoop in a virtual machine, change localhost to the
required IP address (the machine where the Hadoop daemon runs); you can get the
IP address by typing ifconfig.
Run the MapReduce program as a simple Java program;
you will get the output in the Eclipse console.
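To make the answer above concrete, here is a hedged, self-contained driver sketch showing where those conf.set(...) lines would go. It assumes a Hadoop 2.x client API and uses Hadoop's built-in TokenCounterMapper/IntSumReducer so it compiles on its own; the IP comes from the question, the ports from the answer above, and the input/output paths are only placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the sandbox VM instead of localhost, as the
        // answer above suggests (ports taken from that answer).
        conf.set("fs.default.name", "hdfs://192.168.56.101:50000");
        conf.set("mapreduce.job.tracker", "192.168.56.101:50001");

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class); // built-in word-splitting mapper
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Hypothetical HDFS input/output paths; substitute your own.
        FileInputFormat.addInputPath(job, new Path("/user/hue/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/hue/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}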

Creating a connection with hsqldb

I have to connect a database to a Java program. I'm using HSQLDB on OS X 10.7.4 with Eclipse Helios and the SQLExplorer plugin.
This is the content of the runServer.sh file:
#!/bin/bash
cd ../data
java -classpath ../lib/hsqldb.jar org.hsqldb.server.Server -database.0 file:mantenimiento -dbname.0 mantenimiento
You can find it in:
http://i45.tinypic.com/jfw6tw.png
When I execute the script, this is what I get:
MacBook-Pro-de-Luis:bin luis$ ./runServer.sh
: No such file or directory ../data
[Server@6016a786]: [Thread[main,5,main]]: checkRunning(false) entered
[Server@6016a786]: [Thread[main,5,main]]: checkRunning(false) exited
[Server@6016a786]: Startup sequence initiated from main() method
[Server@6016a786]: Could not load properties from file
[Server@6016a786]: Using cli/default properties only
[Server@6016a786]: Initiating startup sequence...
[Server@6016a786]: Server socket opened successfully in 7 ms.
] opened sucessfully in 505 ms.dex=0, id=0, db=file:mantenimiento, alias=mantenimiento
[Server@6016a786]: Startup sequence completed in 513 ms.
[Server@6016a786]: 2012-05-18 10:54:51.396 HSQLDB server 2.2.8 is online on port 9001
[Server@6016a786]: To close normally, connect and execute SHUTDOWN SQL
[Server@6016a786]: From command line, use [Ctrl]+[C] to abort abruptly
The cd ../data step fails; the database is then created and the server says it is working, but the database files end up in /bin and not in /data.
When I try to create the connection in Eclipse with SQLExplorer I get this:
http://i45.tinypic.com/21d3cl2.png
And the terminal says:
[Server@6016a786]: [Thread[HSQLDB Connection @60f47bf5,5,HSQLDB Connections @6016a786]]: database alias=mantenimiento does not exist
Does anyone know what I am doing wrong?
Thank you.
No such file or directory ../data
Create the ../data directory: it doesn't exist, so obviously you can't cd into it.
If you look at your folder structure, you'll see that the runServer.sh file sits inside the bin folder, yet the script tries to go BACK one level to find the ../data folder; the .. means go back one directory.
Try changing ../data to just data and see what happens.
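For reference, once the server starts with the alias actually registered, a minimal JDBC connection test against the server-mode database might look like the sketch below. This assumes hsqldb.jar is on the classpath and the default SA user with an empty password; adjust the host, port, and credentials to your setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HsqldbConnectionTest {
    public static void main(String[] args) throws Exception {
        // Server-mode URL: host, port 9001 (HSQLDB default), and the database
        // alias configured with -dbname.0 mantenimiento in runServer.sh.
        String url = "jdbc:hsqldb:hsql://localhost:9001/mantenimiento";
        try (Connection conn = DriverManager.getConnection(url, "SA", "");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("CALL CURRENT_TIMESTAMP")) {
            while (rs.next()) {
                System.out.println("Connected, server time: " + rs.getString(1));
            }
        }
    }
}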
