MapReduce job throwing "EPERM: Operation not permitted" when using copyToLocalFile()? - java

I have written Map-Reduce code to copy a file from HDFS to the local filesystem, and when I run the Map-Reduce job it throws the error below.
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:295)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:922)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:903)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:800)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:368)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2016)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1985)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1961)
at com.mani.pmml_mr.PMMLMapper.map(PMMLMapper.java:64)
at com.mani.pmml_mr.PMMLMapper.map(PMMLMapper.java:35)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
EPERM: Operation not permitted
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:708)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:295)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:922)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:903)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:800)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:368)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2016)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1985)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1961)
In my code I am using the copyToLocalFile() method, but I am not sure why it is throwing errors. I gave full permissions (777) to the local folder where the file will be copied, but it still throws the error.
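For reference, a minimal sketch of the kind of call involved (the paths are hypothetical, not from the question; the four-argument overload shown at the end is a real FileSystem method that bypasses the ChecksumFileSystem wrapper visible in the trace and may be worth trying, though whether it avoids the failing chmod depends on the Hadoop version):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToLocalSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        // Hypothetical paths. Note: inside a mapper, "local" means local to
        // the cluster node running the task (as the YARN container user),
        // not the machine the job was submitted from, so the 777 directory
        // must exist and be writable on every worker node.
        Path src = new Path("/user/mani/model.pmml");
        Path dst = new Path("/tmp/pmml/model.pmml");

        // Two-argument form: goes through ChecksumFileSystem and chmods the
        // local copy, which is where the EPERM in the trace is raised.
        hdfs.copyToLocalFile(src, dst);

        // Four-argument form (delSrc=false, useRawLocalFileSystem=true):
        // skips the checksum wrapper and the .crc side file.
        hdfs.copyToLocalFile(false, src, dst, true);
    }
}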
Another problem I noticed is that even though the job throws an error (which I saw in the Job Browser of Hue), the console still shows the job as successful:
17/06/29 10:51:16 INFO mapreduce.Job: Job job_1495430640647_0231 completed successfully
Can someone please help me with this?

Related

Spark possible race condition in driver

I have a Spark job that processes several folders on S3 per run and stores its state in DynamoDB. In other words, we run the job once per day; it looks for new folders added by another job, transforms them one by one, and writes state to DynamoDB. Here's rough pseudocode:
object App {
  val allFolders = S3Folders.list()
  val foldersToProcess = DynamoDBState.getFoldersToProcess(allFolders)
  Transformer.run(foldersToProcess)
}

object Transformer {
  def run(folders: List[String]): Unit = {
    val sc = new SparkContext()
    folders.foreach(process(sc, _))
  }

  def process(sc: SparkContext, folder: String): Unit = ??? // transform and write to S3
}
This approach works well if S3Folders.list() returns a relatively small number of folders (up to a few thousand); if it returns more (4-8K), we very often see the following error (which at first glance has nothing to do with Spark):
17/10/31 08:38:20 ERROR ApplicationMaster: User class threw exception: shadeaws.SdkClientException: Failed to sanitize XML document destined for handler class shadeaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
shadeaws.SdkClientException: Failed to sanitize XML document destined for handler class shadeaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.parseListObjectsV2Response(XmlResponsesSaxParser.java:315)
at shadeaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:88)
at shadeaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
at shadeaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at shadeaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at shadeaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at shadeaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1553)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1271)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055)
at shadeaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at shadeaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at shadeaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at shadeaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at shadeaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4188)
at shadeaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:865)
at me.chuwy.transform.S3Folders$.com$chuwy$transform$S3Folders$$isGlacierified(S3Folders.scala:136)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at me.chuwy.transform.S3Folders$.list(S3Folders.scala:112)
at me.chuwy.transform.Main$.main(Main.scala:22)
at me.chuwy.transform.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: shadeaws.AbortedException:
at shadeaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at shadeaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at shadeaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:186)
... 36 more
For a large number of folders (~20K) this happens all the time and the job cannot start.
Previously we had a very similar, but much more frequent, error when getFoldersToProcess did a GetItem for every folder from allFolders and therefore took much longer:
17/09/30 14:46:07 ERROR ApplicationMaster: User class threw exception: shadeaws.AbortedException:
shadeaws.AbortedException:
at shadeaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:51)
at shadeaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at shadeaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:489)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:126)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:215)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1240)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:802)
at shadeaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:109)
at shadeaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:43)
at shadeaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at shadeaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1503)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1226)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
at shadeaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
at shadeaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at shadeaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at shadeaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at shadeaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2089)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2065)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1173)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1149)
at me.chuwy.tranform.sdk.Manifest$.contains(Manifest.scala:179)
at me.chuwy.tranform.DynamoDBState$$anonfun$getUnprocessed$1.apply(ProcessManifest.scala:44)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at me.chuwy.transform.DynamoDBState$.getFoldersToProcess(DynamoDBState.scala:44)
at me.chuwy.transform.Main$.main(Main.scala:19)
at me.chuwy.transform.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
I believe that the current error has nothing to do with XML parsing or an invalid response, but originates from some race condition inside Spark, because:
There's a clear connection between the amount of time "state-fetching" takes and the chance of failure.
The tracebacks have an underlying AbortedException, which AFAIK is caused by a swallowed InterruptedException, which can mean something inside the JVM (spark-submit or even YARN) calls Thread.sleep on the main thread.
Right now I'm using EMR AMI 5.5.0, Spark 2.1.0, and shaded AWS SDK 1.11.208, but we had a similar error with AWS SDK 1.10.75.
I'm deploying this job on EMR via command-runner.jar spark-submit --deploy-mode cluster --class ....
Does anyone have any idea where this exception originates from and how to fix it?
foreach does not guarantee ordered computation; it applies the operation(s) to each element of an RDD, meaning that it will be instantiated for every element, which in turn may overwhelm the executor.
The problem was that getFoldersToProcess is a blocking (and very long) operation, which prevents the SparkContext from being instantiated. The SparkContext itself should signal its own instantiation to YARN, and if it doesn't do so within a certain amount of time, YARN assumes the driver node has fallen off and kills the whole application.
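A minimal sketch of the reordering this implies, shown with the Java Spark API for illustration (S3Folders, DynamoDBState, and Transformer are the question's own pseudocode helpers, assumed here):

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class App {
    public static void main(String[] args) {
        // Create the SparkContext FIRST, so the driver registers with YARN
        // immediately instead of sitting in blocking S3/DynamoDB calls and
        // looking dead to the ResourceManager.
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("transform"));

        List<String> allFolders = S3Folders.list();                             // blocking, but safe now
        List<String> toProcess = DynamoDBState.getFoldersToProcess(allFolders); // blocking, but safe now
        for (String folder : toProcess) {
            Transformer.process(sc, folder); // transform and write to S3
        }
        sc.stop();
    }
}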

How to determine the maximum amount of data that can be handled by one run of an MR2 job?

I am running a YARN job on a CDH 5.3 cluster. I have the default configurations.
Number of nodes = 3
yarn.nodemanager.resource.cpu-vcores=8
yarn.nodemanager.resource.memory-mb=10GB
mapreduce.[map/reduce].cpu.vcores=1
mapreduce.[map/reduce].memory.mb=1GB
mapreduce.[map/reduce].java.opts.max.heap=756MB
While doing a run on 4.5 GB of CSV data spread over 11 files, I get the following error:
2015-10-12 05:21:04,507 FATAL [IPC Server handler 18 on 50388] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1444634391081_0005_r_000000_0 - exited : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#9
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Then I tuned mapreduce.reduce.memory.mb=1GB up to mapreduce.reduce.memory.mb=3GB and the job ran fine.
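For reference, a minimal sketch of applying that change programmatically (the property names are the standard MR2 ones; the java.opts value is an assumption following the common ~75%-of-container rule of thumb, not a figure from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJobSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.reduce.memory.mb", 3072);     // 3 GB reduce container
        conf.set("mapreduce.reduce.java.opts", "-Xmx2304m"); // assumed ~75% of the container
        Job job = Job.getInstance(conf, "tuned-job");
        // ... set mapper/reducer/input/output paths as usual ...
    }
}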
So how does one decide how much data a single reducer can handle at most, assuming that all the input to the mappers has to be processed by only one reducer?
Generally there is no limit on the data that can be processed by a single reducer. The memory allocation can slow down the process, but it should not restrict or fail to process the data. I believe that after allocating minimum memory to the reducer, data processing should not be an issue. Can you please share a code snippet so we can check for any memory leak issues?
We used to process 6+ GB files in a single reducer without any issues. I believe you might be having memory leak issues.

How to run an MR job with normal privileges

I have installed Hadoop 2.3.0 and am able to execute MR jobs successfully. But when I try to execute MR jobs with normal privileges (without admin privileges), the job fails with the following exception.
I tried the "WordCount.jar" sample.
14/10/28 09:16:12 INFO mapreduce.Job: Task Id : attempt_1414467725299_0002_r_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at org.apache.hadoop.mapred.Task.getFsStatistics(Task.java:347)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:478)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:414)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
By debugging the source, I drilled down to where the problem occurs, in the class YarnChild.java:
childUGI.doAs(new PrivilegedExceptionAction<Object>() {
  @Override
  public Object run() throws Exception {
    // use job-specified working directory
    FileSystem.get(job).setWorkingDirectory(job.getWorkingDirectory());
    taskFinal.run(job, umbilical); // run the task
    return null;
  }
});
But if I start the NodeManager with admin privileges, the above exception does not occur. I don't know why the MR job does not work when I start the NodeManager with normal privileges.
If anyone knows the reason for and the solution to the above problem, please help me as soon as possible.

Runtime partition failed for this job in Hama BSP

I encountered the following problem when starting to run a Hama BSP job. The exception occurs when Hama tries to load and partition the input data, before it actually runs my own code. This is a known problem discussed on some websites, but unfortunately without a known cause (e.g. see here).
My BSP job works perfectly when I run it on only part of the data set. However, when I run the full data set, the problem occurs :(
Can I know how to resolve or avoid this problem?
13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.BSPJobClient: Running job: job_201311180115_0002
13/11/18 01:19:33 INFO bsp.BSPJobClient: Current supersteps number: 0
13/11/18 01:19:33 INFO bsp.BSPJobClient: Job failed.
13/11/18 01:19:33 ERROR bsp.BSPJobClient: Error partitioning the input path.
java.io.IOException: Runtime partition failed for the job.
at org.apache.hama.bsp.BSPJobClient.partition(BSPJobClient.java:465)
at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:333)
at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:293)
at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:228)
at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:235)
at edu.wisc.cs.db.opener.hama.ConnectedEntityBspDriver.main(ConnectedEntityBspDriver.java:183)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hama.util.RunJar.main(RunJar.java:146)
After being stuck on this problem for several hours, I found that this error occurs whenever the number of input files is greater than the number of allowed BSP tasks. I think it is probably a bug that Hama should fix in the future.
A quick fix for this problem is to increase the maximum number of BSP tasks, specified by the variable bsp.tasks.maximum in the hama-site.xml file. For example, the following uses 10 instead of the default setting of 3:
<property>
  <name>bsp.tasks.maximum</name>
  <value>10</value>
  <description>The maximum number of BSP tasks that will be run simultaneously
  by a groom server.</description>
</property>
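A quick way to sanity-check that condition before submitting (a sketch; it assumes the standard HamaConfiguration API and simply counts the files in the input directory):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hama.HamaConfiguration;

public class PartitionCheck {
    public static void main(String[] args) throws Exception {
        HamaConfiguration conf = new HamaConfiguration();
        int maxTasks = conf.getInt("bsp.tasks.maximum", 3); // default noted above
        FileSystem fs = FileSystem.get(conf);
        int inputFiles = fs.listStatus(new Path(args[0])).length;
        if (inputFiles > maxTasks) {
            System.out.printf("%d input files > %d max BSP tasks: partitioning may fail%n",
                    inputFiles, maxTasks);
        }
    }
}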

Q job unsuccessful mahout ssvd

I'm trying to run ssvd on some tfidf-vectors in Mahout. When I run it from Java code as follows (with the Mahout 0.6 jars), it works fine:
public static void main(String[] args) throws IOException {
    runSSVDOnSparseVectors(vectorOutputPath + "/tfidf-vectors/part-r-00000",
            ssvdOutputPath, 1, 0, 30000, 1);
}

private static void runSSVDOnSparseVectors(String inputPath, String outputPath,
        int rank, int oversampling, int blocks,
        int reduceTasks) throws IOException {
    Configuration conf = new Configuration();
    // get number of reduce tasks from config?
    SSVDSolver solver = new SSVDSolver(conf, new Path[] { new Path(
            inputPath) }, new Path(outputPath), blocks, rank, oversampling,
            reduceTasks);
    solver.setcUHalfSigma(true);
    solver.setcVHalfSigma(true);
    solver.run();
}
I decided that I wanted to convert it to a bash script and just use the CLI command instead, but when I do, I get the following error (I tried this on versions 0.5 and 0.7; neither worked. I could try 0.6, but I don't think it's a version thing):
[username#hostname lsa]$ $MAHOUT/mahout ssvd -i $H/test_lsa/v_out/tfidf-vectors -o $H/test_lsa/svd_out -k 1 -p 0 -r 30000 -t 1
Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/lib/mahout-distribution-0.7/mahout-examples-0.7-job.jar
12/07/23 15:00:47 INFO common.AbstractJob: Command line arguments: {--abtBlockHeight=[200000], --blockHeight=[30000], --broadcast=[true], --computeU=[true], --computeV=[true], --endPhase=[2147483647], --input=[/path/to/folder/test_lsa/v_out/tfidf-vectors], --minSplitSize=[-1], --outerProdBlockHeight=[30000], --output=[/path/to/folder/test_lsa/svd_out], --oversampling=[0], --pca=[false], --powerIter=[0], --rank=[1], --reduceTasks=[100], --startPhase=[0], --tempDir=[temp], --uHalfSigma=[false], --vHalfSigma=[false]}
12/07/23 15:00:49 INFO input.FileInputFormat: Total input paths to process : 100
Exception in thread "main" java.io.IOException: Q job unsuccessful.
at org.apache.mahout.math.hadoop.stochasticsvd.QJob.run(QJob.java:230)
at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:377)
at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.run(SSVDCli.java:141)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.main(SSVDCli.java:171)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
I'm running this in distributed mode on a cluster. I've read that Q job failure can have something to do with block size, but mine is greater than p+k. I also realize I'm using a ridiculously small input (4 vectors), but like I said, it works in the Java code. I'm mostly baffled as to why it would work in Java but not in the CLI. I'm pretty sure I've got all of the same arguments to the function. I could always just package up the Java code into a jar and put it into the bash script, but that would be pretty hacky...
The log for the job says:
2012-07-23 15:00:55,413 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2012-07-23 15:00:55,417 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6ce53220
2012-07-23 15:00:55,638 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-07-23 15:00:55,697 ERROR org.apache.mahout.common.IOUtils: new m can't be less than n
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,731 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-07-23 15:00:55,733 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalArgumentException: new m can't be less than n
at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-07-23 15:00:55,736 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
Thanks in advance for the help.
I actually think this was because some of the sequence files in tfidf-vectors were empty, because I was using too many reducers. This seems like a bug to me.
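If that diagnosis is right, a hedged pre-check along these lines could confirm it by flagging record-less part files before running ssvd (the directory path is illustrative, taken from the question's CLI output; note that an "empty" sequence file still has a header, so file length alone is not a reliable test):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class EmptyPartFileCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/path/to/folder/test_lsa/v_out/tfidf-vectors");

        // Open each part file and report any that contain zero records.
        for (FileStatus status : fs.listStatus(dir, p -> p.getName().startsWith("part-"))) {
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, status.getPath(), conf);
            try {
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                if (!reader.next(key, val)) {
                    System.out.println("Empty (no records): " + status.getPath());
                }
            } finally {
                reader.close();
            }
        }
    }
}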
