MapReduce wordcount example is giving an error - java

I followed the step-by-step guide below.
https://phoenixnap.com/kb/install-hadoop-ubuntu
Then I tried to run the MapReduce word count job on a text file.
The problem is that the program does not run: I get the "AM Container for appattempt ... exited with exitCode: 1" and "Exception from container-launch" errors.
Is there any solution to this?
All nodes are running:
6544 Jps
3041 NameNode
3842 NodeManager
3219 DataNode
3494 SecondaryNameNode
3706 ResourceManager
Below is the yarn status output for my application.
doop#contactkarim-VirtualBox:~/hadoop-3.3.1$ yarn app -status application_1667981786519_0006
2022-11-09 11:35:22,184 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /127.0.0.1:8032
2022-11-09 11:35:22,522 INFO conf.Configuration: resource-types.xml not found
2022-11-09 11:35:22,522 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
Application Report :
Application-Id : application_1667981786519_0006
Application-Name : word count
Application-Type : MAPREDUCE
User : hdoop
Queue : default
Application Priority : 0
Start-Time : 1667982679380
Finish-Time : 1667982691120
Progress : 0%
State : FAILED
Final-State : FAILED
Tracking-URL : http://contactkm-VirtualBox:8088/cluster/app/application_1667981786519_0006
RPC Port : -1
AM Host : N/A
Aggregate Resource Allocation : 20250 MB-seconds, 8 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : DISABLED
Diagnostics : Application application_1667981786519_0006 failed 2 times due to AM Container for appattempt_1667981786519_0006_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2022-11-09 11:31:31.113]Exception from container-launch.
Container id: container_1667981786519_0006_02_000001
Exit code: 1
[2022-11-09 11:31:31.116]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2022-11-09 11:31:31.116]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://contactkarim-VirtualBox:8088/cluster/app/application_1667981786519_0006 Then click on links to logs of each attempt.
. Failing the application.
Unmanaged Application : false
Application Node Label Expression : <Not set>
AM container Node Label Expression : <DEFAULT_PARTITION>
TimeoutType : LIFETIME ExpiryTime : UNLIMITED RemainingTime : -1seconds
Thanks
I have troubleshot many things, e.g. I checked the site settings and the resources, and I went through the configurations multiple times; permissions etc. have been set.
I suspect the Java version is the problem here.

The linked blog says nothing about MapReduce, only cluster setup (for which I always recommend following the official Apache Hadoop site, not third-party blogs).
"No appenders could be found" means you're missing a log4j.properties file submitted with your job - see http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
You won't be able to see the real runtime error/log output until you add that file, e.g. if you've submitted your own jar built by Maven/Gradle, put it under src/main/resources.
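As an illustration (this is a generic log4j 1.2 setup, not something taken from the question), a minimal log4j.properties placed in src/main/resources so that it ends up on the job classpath could look like this:

# log4j.properties - minimal console logging so the MRAppMaster output becomes visible
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n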

Related

logback logs not getting exported to datadog by opentelemetry-collector

I have created a Spring Boot application with OpenTelemetry. I use Spring Cloud Sleuth to export the traces to an OpenTelemetry collector, which ultimately exports them to Datadog. I can see the exported traces in Datadog.
Now I also have to add some logging to the application, and OpenTelemetry does not support logging directly, so I have used opentelemetry-logback-appender to export the logs to Datadog as well. In the console I can see that the log has the same trace id and span id as the exported traces. However, the logs are not getting forwarded to Datadog.
My code :-
otel-collector-config.yaml :-
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  batch:
exporters:
  datadog:
    api:
      site: datadoghq.com
      key: ${DD_API_KEY}
  file:
    path: /tmp/signals.json
  logging:
    loglevel: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog, logging, file]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog, logging, file]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, file]
Log line in the console, added with SLF4J (Logback):
spring-cloud-sleuth-otel-slf4j-spring-cloud-sleuth-otel-slf4j-1 | 09:25:45.835 [http-nio-8181-exec-1] ERROR com.uplight.web.MyController traceId: c9c54856c474a11e22e3716b6e97ec4b spanId: 569063cd0411d3a6 - Logging error using SLF4J LOGGER--------------------------------------------------------------------
As seen in the image, the log is not available in the trace. Can someone please suggest if I am missing anything?
Logs support for the datadogexporter was added in collector version 0.61.0 (#2651). If you are running an older version, update your collector and add datadog to the logs pipeline; the logs should then appear in Datadog.
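For reference, a sketch of the updated logs pipeline in otel-collector-config.yaml (keeping the existing logging and file exporters) would be:

logs:
  receivers: [otlp]
  processors: [batch]
  exporters: [datadog, logging, file]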

Hadoop : There are 1 datanode(s) running and 1 node(s) are excluded in this operation

We are running Hadoop on another machine on the same network and are trying to put files into Hadoop from this system, but when the application tries to put a file into Hadoop we get the following error.
The Hadoop setup is a single-node setup.
The error from Hadoop:
Fetching value from cache - key: 254af6643e1bf515123de68e6ea6b3256e0b9f11
Found file import entry for datasetId: 254af6643e1bf515123de68e6ea6b3256e0b9f11
21/09/12 09:55:52 WARN DataStreamer: Abandoning BP-452841745-127.0.0.1-1631439723175:blk_1073741825_1001
21/09/12 09:55:52 WARN DataStreamer: Excluding datanode DatanodeInfoWithStorage[127.0.0.1:9866,DS-2b79652d-f07c-4924-a901-7f38141af2b5,DISK]
21/09/12 09:55:52 WARN DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/test/uploadedDatasets/254af6643e1bf515123de68e6ea6b3256e0b9f11/_temporary/0/_temporary/attempt_20210912095551948982914863741166_0023_m_000000_0/part-00000 could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation
Running
./start-dfs.sh
Running jps shows
1747400 DataNode
1747647 SecondaryNameNode
1763807 Jps
1747191 NameNode

Hadoop Docker Setup - WordCount Tutorial

I was following the tutorial to run WordCount.java mentioned here, and when I run the following line from the tutorial
hadoop jar wordcount.jar org.myorg.WordCount /user/cloudera/wordcount/input /user/cloudera/wordcount/output
I get the following error -
17/09/04 01:57:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/09/04 01:57:30 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
The docker image that I used was docker pull cloudera/quickstart
There were no setup tutorials for Hadoop with Docker, so it would be helpful if you could tell me what configuration changes need to be made to overcome these issues.
That tutorial assumes you are inside the cluster, with the hadoop client command available and the Hadoop services started and properly configured.
0.0.0.0:8032 is the default YARN ResourceManager address, so you need to configure your HADOOP_CONF_DIR XML files (specifically yarn-site.xml for this error) to point at the Docker container for the correct YARN addresses. core-site.xml and hdfs-site.xml will need to be configured to point at HDFS as well; see the sketch below.
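For illustration, a minimal client-side configuration could look like the following. The hostname quickstart.cloudera is an assumption (it is the QuickStart image's usual hostname); substitute whatever hostname or IP your container is reachable under, and the ports your container actually exposes.

<!-- yarn-site.xml (in HADOOP_CONF_DIR on the client) -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>quickstart.cloudera</value>
  </property>
</configuration>

<!-- core-site.xml (in HADOOP_CONF_DIR on the client) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
</configuration>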

Neo4j 3.2.1 from the command line fails with a java error

Trying to start Neo4j 3.2.1 from the command line (using Invoke-Neo4j console) fails with a Java error. The application starts fine from the desktop icon, and the command line option works fine with v3.2.0.
I've raised a bug, but has anyone experienced this? I'm running Invoke-Neo4j console and get the following Java error - note the "Unrecognized log entry version" cause at the end.
Error log:
2017-06-16 11:54:03.206+0000 INFO [o.n.k.i.DiagnosticsManager] --- INITIALIZED diagnostics END ---
2017-06-16 11:54:03.551+0000 INFO [o.n.b.v.r.WorkerFactory] Bolt Server extension loaded.
2017-06-16 11:54:03.552+0000 INFO [o.n.b.v.r.WorkerFactory] Bolt enabled on 0.0.0.0:7687.
2017-06-16 11:54:03.722+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store D:\Apps\Neo4j CE 3.1.4\data\databases\graph.db
2017-06-16 11:54:03.750+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store D:\Apps\Neo4j CE 3.1.4\data\databases\graph.db
2017-06-16 11:54:03.751+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-06-16 11:54:04.474+0000 INFO [o.n.k.i.a.i.IndexingService] IndexingService.init: indexes not specifically mentioned above are ONLINE
2017-06-16 11:54:05.445+0000 WARN [o.n.k.NeoStoreDataSource] Exception occurred while starting the datasource. Attempting to close things down. Component 'org.neo4j.kernel.recovery.Recovery#e9890a4' failed to initialize. Please see attached cause exception.
org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.recovery.Recovery#e9890a4' failed to initialize. Please see attached cause exception.
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:416)
    at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:62)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:98)
    at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:511)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
    at org.neo4j.kernel.impl.transaction.state.DataSourceManager.start(DataSourceManager.java:100)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:205)
    at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:124)
    at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
    at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:89)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:434)
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
    at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:211)
    at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:107)
    at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:80)
    at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:28)
Caused by: java.lang.IllegalArgumentException: Unrecognized log entry version -10. At position LogPosition{logVersion=0, byteOffset=86193010} and entry version null
    at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.byVersion(LogEntryVersion.java:162)
    at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:97)
    at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:54)
    at org.neo4j.kernel.recovery.LatestCheckPointFinder.find(LatestCheckPointFinder.java:82)
    at org.neo4j.kernel.recovery.PositionToRecoverFrom.apply(PositionToRecoverFrom.java:89)
    at org.neo4j.kernel.recovery.DefaultRecoverySPI.getPositionToRecoverFrom(DefaultRecoverySPI.java:81)
    at org.neo4j.kernel.recovery.Recovery.init(Recovery.java:80)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:406)
    ... 18 more
I was able to solve this by re-extracting the entire zip file distribution; earlier I had only added the files under /bin to my existing Windows installation.
Summary: when using the zip file distribution to run Neo4j from the command line, use the full zip file distribution.

Error: Could not create the Java Virtual Machine. - Apache Hadoop

I am trying to run the wordcount example that ships with Hadoop in the following environment (pseudo-distributed mode):
Windows 7
Hadoop 2.7.1
JDK 1.7.x
RAM 4 GB
The jps command returns
C:\deploy\hadoop-2.7.1>jps
2336 ResourceManager
7500 NameNode
4984 Jps
6900 NodeManager
4940 DataNode
The command I use for setting the hadoop heap size
set HADOOP_HEAPSIZE=512
The command I use from the hadoop home installation directory is
bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
I see the following stack trace
C:\deploy\hadoop-2.7.1>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
15/08/14 22:36:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/08/14 22:36:27 INFO input.FileInputFormat: Total input paths to process : 1
15/08/14 22:36:28 INFO mapreduce.JobSubmitter: number of splits:1
15/08/14 22:36:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439571873038_0001
15/08/14 22:36:28 INFO impl.YarnClientImpl: Submitted application application_1439571873038_0001
15/08/14 22:36:28 INFO mapreduce.Job: The url to track the job: http://XXX-PC:8088/proxy/application_1439571873038_0001/
15/08/14 22:36:28 INFO mapreduce.Job: Running job: job_1439571873038_0001
15/08/14 22:36:37 INFO mapreduce.Job: Job job_1439571873038_0001 running in uber mode : false
15/08/14 22:36:37 INFO mapreduce.Job: map 0% reduce 0%
15/08/14 22:36:37 INFO mapreduce.Job: Job job_1439571873038_0001 failed with state FAILED due to: Application application_1439571873038_0001 failed 2 times due to AM Container for appattempt_1439571873038_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://XXX-PC:8088/cluster/app/application_1439571873038_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1439571873038_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Shell output: 1 file(s) moved.
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
15/08/14 22:36:37 INFO mapreduce.Job: Counters: 0
When I went to the stderr logs mentioned in the above stack trace, the actual error turned out to be
Error: Could not create the Java Virtual Machine.
When I try to increase HADOOP_HEAPSIZE to 1024, the namenode, datanode and yarn daemons do not start at all and give me the same "could not create the Java Virtual Machine" error.
Has anyone had the same problem? How can I solve this issue?
You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh as shown below:
export HADOOP_HEAPSIZE=3072
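Since the question is about a Windows installation, the same setting would presumably go in etc\hadoop\hadoop-env.cmd under the Hadoop install directory rather than in hadoop-env.sh (that file location is an assumption based on the stock 2.7.x layout, not something stated in the thread):

rem etc\hadoop\hadoop-env.cmd - heap size for the Hadoop daemons, in MB (mirrors the value suggested above)
set HADOOP_HEAPSIZE=3072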
