Reading a file using map reduce in hadoop

Reading a file using map reduce in hadoop - java

I am trying to write java code in map reduce form it ran in Eclipse but when am trying to implement in map reduce form i am getting this error please help me what does this error means and how to fix this
16/07/15 14:05:17 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/07/15 14:05:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/07/15 14:05:17 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/07/15 14:05:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/07/15 14:05:18 INFO mapred.FileInputFormat: Total input paths to process : 2
16/07/15 14:05:18 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/hadoop1148163758/.staging/job_local1148163758_0001
Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:54310/TcTest/NewTest
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:320)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
at TextClassification.run(TextClassification.java:38)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at TextClassification.main(TextClassification.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

You need to provide the complete path of the input file as arguments to your program. Right click on the program --> Run configurations-->Arguments
Before this you need to add few jar files from your hadoop-version/share/hadoop folder. You can refer to the below blog for complete details on how to run map reduce program through eclipse in local mode.
https://acadgild.com/blog/running-mapreduce-in-local-mode-2/

Related

Why apache spark does not work with java 10? We get illegal reflective then java.lang.IllegalArgumentException

Is there any technical reason why spark 2.3 does not work with java 1.10 (as of July 2018)?
Here is the output when I run SparkPi example using spark-submit.
$ ./bin/spark-submit ./examples/src/main/python/pi.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-07-13 14:31:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-13 14:31:31 INFO SparkContext:54 - Running Spark version 2.3.1
2018-07-13 14:31:31 INFO SparkContext:54 - Submitted application: PythonPi
2018-07-13 14:31:31 INFO Utils:54 - Successfully started service 'sparkDriver' on port 58681.
2018-07-13 14:31:31 INFO SparkEnv:54 - Registering MapOutputTracker
2018-07-13 14:31:31 INFO SparkEnv:54 - Registering BlockManagerMaster
2018-07-13 14:31:31 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-07-13 14:31:31 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-07-13 14:31:31 INFO DiskBlockManager:54 - Created local directory at /private/var/folders/mp/9hp4l4md4dqgmgyv7g58gbq0ks62rk/T/blockmgr-d24fab4c-c858-4cd8-9b6a-97b02aa630a5
2018-07-13 14:31:31 INFO MemoryStore:54 - MemoryStore started with capacity 434.4 MB
2018-07-13 14:31:31 INFO SparkEnv:54 - Registering OutputCommitCoordinator
...
2018-07-13 14:31:32 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
Traceback (most recent call last):
File "~/Documents/spark-2.3.1-bin-hadoop2.7/./examples/src/main/python/pi.py", line 44, in <module>
count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 862, in reduce
File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in collect
File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)
2018-07-13 14:31:33 INFO SparkContext:54 - Invoking stop() from shutdown hook
...
I resolved the issue by switching to Java8 instead of Java10 as mentioned here.

Primary technical reason is that Spark depends heavily on direct access to native memory with sun.misc.Unsafe, which has been made private in Java 9.
https://issues.apache.org/jira/browse/SPARK-24421
http://apache-spark-developers-list.1001551.n3.nabble.com/Java-9-td20875.html

Committer here. It's actually a fair bit of work to support Java 9+: SPARK-24417
It's also almost done and should be ready for Spark 3.0, which should run on Java 8 through 11 and beyond.
The goal (well, mine) is to make it work without opening up module access. The key issues include:
sun.misc.Unsafe usage has to be removed or worked around
Changes to the structure of boot classloader
Scala support for Java 9+
A bunch of dependency updates to work with Java 9+
JAXB no longer automatically available

Spark depends on the memory API's which has been changed in JDK 9 so it is not available starting JDK 9.
And that is the reason for this.
Please check the issue:
https://issues.apache.org/jira/browse/SPARK-24421

Running mahout jar on emr gives Wrong FS error for s3n://*/ input link

I need to run the seqdirectory command on emr.
I created a Fatjar(without specifying any main class) including the mahout-integration-0.9, mahout-core-0.9 and the commons-cli-2.0-mahout jar files.
and gave the following jar arguments-
org.apache.mahout.text.SequenceFilesFromDirectory -i s3n://mybucket/dataset/2013-04-20-22/txt/ -o s3n://mybucket/sequence_files/ -c UTF-8
It gives a wrong FS msg on stderr-
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://trecdata/dataset/2013-04-20-22/txt, expected: hdfs://172.31.27.158:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:647)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:191)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:102)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:162)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:91)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
and this on syslog-
2015-07-13 21:34:18,684 INFO org.apache.mahout.common.AbstractJob (main): Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[s3n://trecdata/dataset/2013-04-20-22/txt/], --keyPrefix=[], --method=[mapreduce], --output=[s3n://trecdata/sequence_files/], --startPhase=[0], --tempDir=[temp]}
2015-07-13 21:34:19,103 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2015-07-13 21:34:19,103 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
2015-07-13 21:34:19,103 INFO org.apache.hadoop.conf.Configuration.deprecation (main): mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir

Hadoop jobs.setJar does not working for jars on HDFS

I am trying to solve an issue when a Hadoop app throws java.lang.ClassNotFoundException:
WARN mapreduce.FaunusCompiler: Using the distribution Faunus job jar: ../lib/faunus-0.4.4-hadoop2-job.jar
INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: VerticesMap.Map > CountMapReduce.Map > CountMapReduce.Reduce
INFO mapreduce.FaunusCompiler: Job data location: output/job-0
INFO client.RMProxy: Connecting to ResourceManager at yuriys-bigdata3/172.31.8.161:8032
WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1402963354379_0016
INFO impl.YarnClientImpl: Submitted application application_1402963354379_0016
INFO mapreduce.Job: The url to track the job: http://local-bigdata3:8088/proxy/application_1402963354379_0016/
INFO mapreduce.Job: Running job: job_1402963354379_0016
INFO mapreduce.Job: Job job_1402963354379_0016 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Task Id : attempt_1402963354379_0016_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
com.tinkerpop.blueprints.util.DefaultVertexQuery
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The app does create a "fat" jar file, where all the dependency jars (including the one that contains the not found class) are included under the lib node
The app does set the Job.setJar on this fat jar file.
The code does not do anything strange:
job.setJar(hadoopFileJar);
...
boolean success = job.waitForCompletion(true);
Besides, I looked up the configuration in the yarn-site.xml and verified that a job dir under yarn.nodemanager.local-dirs does contain that jar (it is renamed to job.jar though) and also that lib directory with extracted jars in them.
i.e. the jar that contains the missing class is there. Yarn/MR recreates this dir with all these required files after each job schedule, so the files do get transferred there.
I've discovered so far, is that the classpath environment variable on the java worker processes that execute the failing code is set as
C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
and this jar just contains a manifest.mf That manifest contains paths to the directory with the "fat.jar" file and its directories (original formatting saved):
file:/c:/hdp/data/hadoop/loc al/usercache/user/appcache/application_1402963354379_0013/container
_1402963354379_0013_02_000001/job.jar/job.jar file:/c:/hdp/data/hadoo p/local/usercache/user/appcache/application_1402963354379_0013/cont ainer_1402963354379_0013_02_000001/job.jar/classes/ file:/c:/hdp/data /hadoop/local/usercache/user/appcache/application_1402963354379_001 3/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetain fo file:/c:/hdp/data/hadoop/local/usercache/user/appcache/applicati on_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmi tDir/job.split file:/c:/hdp/data/hadoop/local/usercache/user/appcac he/application_1402963354379_0013/container_1402963354379_0013_02_000 001/job.xml file:/c:/hdp/data/hadoop/local/usercache/user/appcache/ application_1402963354379_0013/container_1402963354379_0013_02_000001 /job.jar/
However, this path does not explicitly adds the jars in the directories, i.e. the directory from the above manifest
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar file with the class that is not being found by yarn (as this directory contains all the jars from the "fat" jar lib section), but for JAVA world this kind of setting of classpath seems incorrect – this directory was supposed to be included with star*,
e.g:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
What I am doing wrong with passing dependencies to Yarn?
Could cluster configuration be an issue or possibly this is a bug on my Hadoop distro (HDP 2.1, Windows x64)?

Hadoop-LZO strange native-lzo library not available error

I've installed the Cloudera Hadoop-LZO package and added the following settings into my client environment safety valve:
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
However, I get the strangest native-lzo library not available error:
13/08/05 23:59:06 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/08/05 23:59:06 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6298911ef75545c61859c08add6a74a83e0183ad]
13/08/05 23:59:07 INFO mapred.JobClient: Running job: job_201308052350_0003
13/08/05 23:59:08 INFO mapred.JobClient: map 0% reduce 0%
13/08/05 23:59:18 INFO mapred.JobClient: Task Id : attempt_201308052350_0003_m_000000_0, Status : FAILED
java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.getDecompressorType(LzopCodec.java:96)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:131)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Why would it say that the native-lzo library was loaded successfully, and then complain that the native-lzo library was not available? Are these exceptions coming out of the DataNodes?

The issue was that we did not have lzop installed on the datanodes themselves! After the following instruction, all was well:
sudo apt-get install lzop
Hope that helps!

Cassandra's server starting error on OS X

I'm trying to set up apache cassandra on my OS X macbook. I'm trying to test NoSql databases, I've already set up memcached and redis. But with cassandra I'm having
I'm using jdk 1.7.0_09 from oracle
I've followed the installation instructions, but when I'm trying to start the server, I get this the issue from the console:
MacBook-Air-Urij:bin urijvoskresenskij$ ./cassandra -f
xss = -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/cassandra/system.log (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
at java.io.FileOutputStream.<init>(FileOutputStream.java:136)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:207)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:395)
at org.apache.log4j.PropertyWatchdog.doOnChange(PropertyConfigurator.java:922)
at org.apache.log4j.helpers.FileWatchdog.checkAndConfigure(FileWatchdog.java:89)
at org.apache.log4j.helpers.FileWatchdog.<init>(FileWatchdog.java:58)
at org.apache.log4j.PropertyWatchdog.<init>(PropertyConfigurator.java:914)
at org.apache.log4j.PropertyConfigurator.configureAndWatch(PropertyConfigurator.java:461)
at org.apache.cassandra.service.AbstractCassandraDaemon.initLog4j(AbstractCassandraDaemon.java:100)
at org.apache.cassandra.thrift.CassandraDaemon.<clinit>(CassandraDaemon.java:61)
INFO 17:40:17,717 Logging initialized
INFO 17:40:17,720 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.7.0_09
INFO 17:40:17,720 Heap size: 1052770304/1052770304
INFO 17:40:17,721 Classpath: ./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/antlr-3.2.jar:./../lib/apache-cassandra-1.1.7.jar:./../lib/apache-cassandra-clientutil-1.1.7.jar:./../lib/apache-cassandra-thrift-1.1.7.jar:./../lib/avro-1.4.0-fixes.jar:./../lib/avro-1.4.0-sources-fixes.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commons-lang-2.4.jar:./../lib/compress-lzf-0.8.4.jar:./../lib/concurrentlinkedhashmap-lru-1.3.jar:./../lib/guava-r08.jar:./../lib/high-scale-lib-1.1.2.jar:./../lib/jackson-core-asl-1.9.2.jar:./../lib/jackson-mapper-asl-1.9.2.jar:./../lib/jamm-0.2.5.jar:./../lib/jline-0.9.94.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.7.0.jar:./../lib/log4j-1.2.16.jar:./../lib/metrics-core-2.0.3.jar:./../lib/servlet-api-2.5-20081211.jar:./../lib/slf4j-api-1.6.1.jar:./../lib/slf4j-log4j12-1.6.1.jar:./../lib/snakeyaml-1.6.jar:./../lib/snappy-java-1.0.4.1.jar:./../lib/snaptree-0.1.jar:./../lib/jamm-0.2.5.jar
INFO 17:40:17,722 JNA not found. Native methods will be disabled.
INFO 17:40:17,734 Loading settings from file:/Users/urijvoskresenskij/apache-cassandra-1.1.7/conf/cassandra.yaml
INFO 17:40:17,924 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 17:40:18,149 Global memtable threshold is enabled at 334MB
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:46)
at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:56)
at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:38)
at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:76)
at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:84)
at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:438)
at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:114)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
at java.lang.Runtime.loadLibrary0(Runtime.java:845)
at java.lang.System.loadLibrary(System.java:1084)
at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 17 more
WARN 17:40:18,269 Cannot initialize native Snappy library. Compression on new tables will be disabled.
ERROR 17:40:18,317 Exception encountered during startup
java.lang.AssertionError: Directory /var/lib/cassandra/data is not accessible.
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:155)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
java.lang.AssertionError: Directory /var/lib/cassandra/data is not accessible.
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:155)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:389)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
Exception encountered during startup: Directory /var/lib/cassandra/data is not accessible.
</pre></code>
I don't know how to solve the problem. Please, help me, guys :)

You don't have permission to write to /var/log, which is to be expected. You'll need to run Cassandra as sudo or create the directory yourself with the proper permissions.

config/cassandra.yaml - contains paths to all folders required by Cassandra. Change those paths to place where your user has write access so that you do not need to run it as sudo.

I think a potentially less intrusive solution for working on local dev machines is not to mess with root permissions at all. Just change the log locations to anywhere you want by editing the log4j properties file in the <cassandra_install_dir>/conf directory.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Reading a file using map reduce in hadoop - java

Related

Why apache spark does not work with java 10? We get illegal reflective then java.lang.IllegalArgumentException

Running mahout jar on emr gives Wrong FS error for s3n://*/ input link

Hadoop jobs.setJar does not working for jars on HDFS

Hadoop-LZO strange native-lzo library not available error

Cassandra's server starting error on OS X

Categories

Resources