I'm new to Hadoop. I'm currently implementing a word counter for a user-supplied keyword. I also read that using the Job class is preferred over JobConf. So I have this code in my main class:
...
Configuration conf = new Configuration();
conf.set("keyword", args[0]);
Job job = new Job(conf);
...
So how can I get my keyword back in the Mapper? As I understand it, I need to get my Job object, get the Configuration object from it using the getConfiguration() method, and then call its get("keyword") method.
But how do I get the Job from the Mapper class?
Thanks for your time.
When map is called on your Mapper implementation, it is passed a Context object that exposes a getConfiguration() method, which returns the same Configuration you populated in the driver.
The code you used to set the parameter on conf looks fine.
Inside the mapper, this is what you need to do:
Configuration conf = context.getConfiguration();
String keyword = conf.get("keyword");
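Putting it all together, here is a minimal sketch of such a mapper (the class name and output types are illustrative, not from the original question):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KeywordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private String keyword;

    @Override
    protected void setup(Context context) {
        // Read the value the driver stored with conf.set("keyword", args[0]).
        keyword = context.getConfiguration().get("keyword");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit a count of 1 for each occurrence of the keyword in this line.
        for (String word : value.toString().split("\\s+")) {
            if (word.equals(keyword)) {
                context.write(new Text(word), ONE);
            }
        }
    }
}

Reading the value once in setup() avoids repeating the Configuration lookup on every map() call.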
I wanted to pass a collection of Strings as a step parameter.
Since I didn't find a way to construct a JobParameter from a collection, I decided to pass it as a single string of comma-separated values.
My code to execute the job:
@Autowired
private JobLauncher jobLauncher;
@Autowired
private Job myJob;
public void execute() {
List<String> myCollection = getMyCollection();
jobLauncher.run(myJob, new JobParameters(ImmutableMap.<String, JobParameter> builder()
.put("myCollection", new JobParameter(String.join(",", myCollection)))
.build())
...
}
I define the Step as follows:
@Bean
@StepScope
public Step myStep(@Value("#{jobParameters['myCollection']}") String myCollectionString) {
List<String> myCollection = Arrays.asList(myCollectionString.split(","));
...
}
But when execution starts, I get this error:
org.postgresql.util.PSQLException: ERROR: value too long for type character varying(250)
Since the job parameters are stored as a column value, I can't pass strings that are too long as a parameter.
Could you suggest how I could overcome this?
The default length of job parameters of type String is 250, see BATCH_JOB_EXECUTION_PARAMS. The scripts provided by Spring Batch are just a starting point, you can update them as needed. So in your case, you need to increase the length of BATCH_JOB_EXECUTION_PARAMS#STRING_VAL as required.
I don't think there is a way to pass a collection as a job parameter directly. You could instead:
Split the string into chunks of 250 characters each and send them as multiple parameters, or
Save the parameters somewhere like a temp table or a file and read them in the job wherever required (see the sketch below).
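A rough sketch of the second option, using a temp-file hand-off (the parameter name myCollectionFile and the helper class are illustrative, not part of any API):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

public class CollectionHandOff {

    // Launching side: persist the collection and pass only the short file
    // path as the job parameter, so the 250-character limit is never hit.
    public JobParameters toJobParameters(List<String> myCollection) throws IOException {
        Path tempFile = Files.createTempFile("myCollection", ".txt");
        Files.write(tempFile, myCollection);
        return new JobParametersBuilder()
                .addString("myCollectionFile", tempFile.toString())
                .toJobParameters();
    }

    // Step side: resolve the path back into the collection. In the step
    // definition this string would be injected with
    // @Value("#{jobParameters['myCollectionFile']}").
    public List<String> fromJobParameter(String myCollectionFile) throws IOException {
        return Files.readAllLines(Paths.get(myCollectionFile));
    }
}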
Please check these threads
how to send a custom object as Job Parameter in Spring Batch?
ArrayList cannot be cast to org.springframework.batch.core.JobParameter
My project needs to build a file containing logs and load it to S3.
To do this, whenever a Spring Batch job is run, I create a file like this:
String startTime = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss").format(new Date());
new File(startTime + "_error_logs");
I then add it to a set of JobParameters which get passed to my JobLauncher like this:
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
JobParameters param = new JobParametersBuilder()
.addString("startTime", startTime)
.toJobParameters();
jobLauncher.run(getJob(), param);
Then, throughout the project, I want to be able to access this job parameter. For example, I can get it in my JobCompletionNotificationListener class, whose afterJob method takes a JobExecution as a parameter, meaning that I can do this:
String startTime = jobExecution.getJobParameters().getString("startTime");
However, in the GlobalControllerExceptionHandler class, I don't have access to this JobExecution object, meaning that I cannot get hold of the startTime parameter.
Is there any way I can pass data to it? Or is there a better approach to this problem? I know that I will have the same issue in other classes. Other approaches I have thought of won't work either; for example, storing the String in a file won't work because if another job is run in parallel, it will get confused.
You can inject a JobExplorer into your GlobalControllerExceptionHandler and query for the JobExecution you are interested in. Once you get a handle on the execution you want, you can get the parameter as you mentioned:
String startTime = jobExecution.getJobParameters().getString("startTime");
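A rough sketch of that approach (the job name "myJob" is an assumption, and how you select the right execution depends on your setup):

import java.util.List;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.ControllerAdvice;

@ControllerAdvice
public class GlobalControllerExceptionHandler {

    @Autowired
    private JobExplorer jobExplorer;

    // Illustrative lookup: fetch the latest instance of "myJob" and read
    // its startTime parameter.
    private String findStartTime() {
        List<JobInstance> instances = jobExplorer.getJobInstances("myJob", 0, 1);
        if (instances.isEmpty()) {
            return null;
        }
        List<JobExecution> executions = jobExplorer.getJobExecutions(instances.get(0));
        return executions.isEmpty() ? null
                : executions.get(0).getJobParameters().getString("startTime");
    }
}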
I am trying to write a Spring Batch job that has two steps in it. They are both the same step but with different file locations, so I need to pass multiple Strings into the job bean to let the step know where to send the different files. However, if I try to pass the Resource values, I get a NoSuchBeanDefinitionException. The answer I found to this is that I need to add a @Value to the bean to tell it that the bean needs a parameter to work.
But that is for only one value.
Is there a way to pass multiple @Values to a bean using Java configuration? Below is the code I am using.
@Value("#{'${batch.outputFile}'}")
Resource outputFilePath;
@Value("#{'${batch.outputFileTrg}'}")
Resource outputFilePathTrg;
@Bean
public Step toServerStep(Resource outputFile) {
return stepBuilderFactory.get("toServerStep")
.chunk(1)
.reader(xmlFileItemReader())
.writer((ItemWriter<? super Object>) flatFileItemWriter(outputFile))
.build();
}
@Bean
public Job fileToServerJob(JobBuilderFactory jobBuilderFactory){
return jobBuilderFactory.get("fileToServerJob")
.start(toServerStep(outputFilePath))
.next(toServerStep(outputFilePathTrg))
.build();
}
You can pass a delimited String as the property and break it apart into a list in your @Value expression.
@Value("#{'${batch.outputFiles}'.split(',')}")
private List<String> outputFilePaths;
Given an application.properties entry like the following:
batch.outputFiles=/tmp/a,/tmp/b,/tmp/c
You can then use these path strings to grab the appropriate Resource to be used by your writer.
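For example, a small sketch of that last step, assuming the paths point at the local filesystem:

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

// Turn each configured path string into a Resource for the writer.
List<Resource> outputFiles = outputFilePaths.stream()
        .map(FileSystemResource::new)
        .collect(Collectors.toList());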
You are putting in a String, but shouldn't you be putting in a Resource? I have a not-so-nice example that once worked for me here. Maybe it can help you.
I am trying to write a unit test for a Hadoop job. The catch is that the mapper uses the Context argument passed to it to determine which file it is reading at that moment. It makes the following call:
String inputFile = ((FileSplit) context.getInputSplit()).getPath().toString();
However, while writing a unit test for the mapper using MRUnit, I can't seem to find any way to mock out this Context object. Even MapDriver does not seem to have any option for setting a new Context object. Is there a way I can write a unit test for this mapper class?
MockInputSplit is what you need:
http://mrunit.apache.org/documentation/javadocs/0.9.0-incubating/org/apache/hadoop/mrunit/mock/MockInputSplit.html
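If MockInputSplit doesn't fit your Hadoop API version, an alternative sketch is to stub the context yourself with plain Mockito (the file path below is hypothetical):

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Stub a Context whose getInputSplit() reports a known file path.
@SuppressWarnings("unchecked")
Mapper<LongWritable, Text, Text, Text>.Context context = mock(Mapper.Context.class);
FileSplit split = new FileSplit(new Path("/data/input/part-0001"), 0, 100, null);
when(context.getInputSplit()).thenReturn(split);
// myMapper.map(key, value, context) will now see the expected path.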
I have a Spring Batch job which takes parameters, and the parameters are usually the same every time the job is run. By default, Spring Batch doesn't let you re-use the same parameters like that... so I created a simple incrementer and added it to my job like this:
http://numberformat.wordpress.com/2010/02/07/multiple-batch-runs-with-spring-batch/
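For reference, a minimal incrementer along those lines might look like this (a sketch, not the code from the linked post):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class SimpleRunIncrementer implements JobParametersIncrementer {

    @Override
    public JobParameters getNext(JobParameters parameters) {
        // Bump a numeric run.id so each launch gets a unique JobInstance.
        long nextRun = (parameters == null) ? 1 : parameters.getLong("run.id", 0L) + 1;
        return new JobParametersBuilder()
                .addLong("run.id", nextRun)
                .toJobParameters();
    }
}

(Spring Batch also ships a RunIdIncrementer that does essentially this.)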
When using the standard CommandLineJobRunner to run my job, I have to pass the -next parameter in order for my incrementer to be used.
However, when I run an end-to-end job test from within a JUnit class, using JobLauncherTestUtils.launchJob( JobParameters )... I can't find a way to declare that my incrementer should be used. The job is just quietly skipped, presumably because it has already been run with those parameters (see note below).
The JobParameters class is meant to hold a collection of name-value pairs... but the -next parameter is different: it starts with a dash and has no corresponding value. I tried various experiments, but adding something to the JobParameters collection doesn't seem to be the ticket.
Does anyone know the JUnit equivalent to passing -next to CommandLineJobRunner?
NOTE: I presume that the issue is my incrementer being ignored, because:
The job works the first time, and it works if I wipe out the job repository database. It only fails on retries.
The job works fine, retries and all, when I hardcode the variables and remove the parameters altogether.
The JobLauncherTestUtils class contains a getUniqueJobParameters method which serves exactly this need.
/**
 * @return a new JobParameters object containing only a parameter for the
 * current timestamp, to ensure that the job instance will be unique.
 */
public JobParameters getUniqueJobParameters() {
    Map<String, JobParameter> parameters = new HashMap<String, JobParameter>();
    parameters.put("random", new JobParameter((long) (Math.random() * JOB_PARAMETER_MAXIMUM)));
    return new JobParameters(parameters);
}
Sample usage would be:
JobParameters params = new JobParametersBuilder(jobLauncherTestUtils.getUniqueJobParameters())
        .toJobParameters();
// extra parameters to be added
JobExecution jobExecution = jobLauncherTestUtils.launchJob(params);