Extract JobID etc from Hadoop Job - java

I am running a Hadoop jar file inside a cluster. From the documentation, I know that Hadoop tracks the job ID, start time, and so on. Is it possible to read these values so that we can show them on our web interface and let the user know how long the job will take (e.g. an estimated duration)?

All the details shown in the JobTracker UI can be obtained through the APIs Hadoop provides.
See the JobClient API: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobClient.html
and the JobStatus API: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/JobStatus.html
JobClient's jobsToComplete() and getAllJobs() return JobStatus objects, from which you can retrieve the job ID. Once you have the job ID, you can get the other details by calling the corresponding methods in the API.
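Since the question asks for an estimated duration, here is a minimal sketch of one way to derive it from the values a JobStatus exposes (start time and map/reduce progress). It uses only the JDK so it can be tried without a cluster. The class name JobEta, the linear extrapolation, and the equal weighting of the map and reduce phases are my assumptions, not something Hadoop prescribes.

```java
class JobEta {
    /**
     * Linear extrapolation: if the fraction `progress` of the work took
     * (now - start) milliseconds, assume the rest proceeds at the same rate.
     * Returns -1 when no estimate is possible (no progress yet, or bad input).
     */
    static long estimateRemainingMillis(long startTimeMillis, long nowMillis, float progress) {
        if (progress <= 0f || progress > 1f || nowMillis < startTimeMillis) {
            return -1L;
        }
        long elapsed = nowMillis - startTimeMillis;
        double estimatedTotal = elapsed / (double) progress;
        return Math.round(estimatedTotal - elapsed);
    }

    /**
     * Crude overall progress: map and reduce phases weighted equally.
     * This weighting is an assumption for illustration only.
     */
    static float overallProgress(float mapProgress, float reduceProgress) {
        return (mapProgress + reduceProgress) / 2f;
    }
}
```

With a real job you would feed in the start time and the map and reduce progress read from the JobStatus, plus System.currentTimeMillis(). The estimate is rough, because the reduce phase rarely runs at the same speed as the map phase.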

Related

Create AEM packages via code

Is there a way to create an AEM package from Java code?
We need to package some content every night via a service run by a cron job.
I checked online and it seems to be possible using a curl command. Either way, I need this done by a daily service running Java code.
Please refer to the links below:
1) https://helpx.adobe.com/experience-manager/using/dynamic_aem_packages.html
2) http://cq5experiences.blogspot.in/2014/01/creating-packages-using-java-code-in-cq5.html
The main code goes something like this:
final JcrPackage jcrPackage = getPackageHelper().createPackageFromPathFilterSets(
        packageResources,
        request.getResourceResolver().adaptTo(Session.class),
        properties.get(PACKAGE_GROUP_NAME, getDefaultPackageGroupName()),
        properties.get(PACKAGE_NAME, getDefaultPackageName()),
        properties.get(PACKAGE_VERSION, DEFAULT_PACKAGE_VERSION),
        PackageHelper.ConflictResolution.valueOf(properties.get(CONFLICT_RESOLUTION,
                PackageHelper.ConflictResolution.IncrementVersion.toString())),
        packageDefinitionProperties
);
So first create a scheduler, and in the scheduler's run method write the logic that packages the required filter paths.
Hoping this is helpful for you.
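To illustrate the scheduling part only, here is a minimal sketch using the plain JDK ScheduledExecutorService. Inside AEM you would more likely use the scheduler the platform provides, so treat this as a stand-in. The class name NightlyPackager and the packagingTask parameter are hypothetical; the task would wrap the package-creation call shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NightlyPackager {
    /** Time from `now` until the next occurrence of `runAt` (today or tomorrow). */
    static Duration delayUntilNextRun(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has already passed
        }
        return Duration.between(now, next);
    }

    /** Runs the (hypothetical) packaging task once a day, starting at the next `runAt`. */
    static ScheduledExecutorService scheduleNightly(Runnable packagingTask, LocalTime runAt) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        long initialDelayMillis = delayUntilNextRun(LocalDateTime.now(), runAt).toMillis();
        executor.scheduleAtFixedRate(packagingTask, initialDelayMillis,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

A call such as NightlyPackager.scheduleNightly(task, LocalTime.of(2, 0)) would run the task at about 02:00 each night. The fixed 24-hour period ignores daylight-saving shifts, which is usually acceptable for a nightly packaging job.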

AS400 JOB Queue via Java jt400

I am writing an interface between a Java application and an AS400.
For this purpose I use jt400. I managed to get information about the system status, such as CPU usage, and I also managed to retrieve the current status of subsystems and jobs.
Now I am looking for a way to inspect the different job queues on the AS400.
For example: I would like to know how many jobs are in which queue.
Is there a solution via jt400, or a different approach to access this information from Java?
The corresponding command inside AS400 is WRKJOBQ
Best
LStrike
[Edit]
The following code is my filter for JobList. But how do I configure QSYSObjectPathName so that it matches what WRKJOBQ shows?
QSYSObjectPathName path = new QSYSObjectPathName(.....);
JobList jList = new JobList(as400);
jList.addJobSelectionCriteria(JobList.SELECTION_PRIMARY_JOB_STATUS_JOBQ, true);
jList.addJobSelectionCriteria(JobList.SELECTION_JOB_QUEUE, path.getPath());
Job[] jobs = jList.getJobs(-1, 1);
System.out.println("Jobs Size: " + jobs.length);
You can use a JobList object for that, using SELECTION_JOB_QUEUE to filter jobs.
Once the selection suits your needs, JobList#getLength() will give you the number of jobs.
See also this question
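For the "how many jobs are in which queue" part, one option is to select all jobs that are on a job queue and tally them by queue yourself. The sketch below shows only the tallying step with plain JDK collections. It assumes you have already collected one queue path string per job from the jt400 Job objects; how you obtain that string is not shown and the sample paths in the usage note are purely illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JobQueueCounter {
    /**
     * Counts how many jobs sit on each queue.
     * Input: one queue path per job (assumed to come from the jt400 job list).
     * Output: queue path -> number of jobs, sorted by queue path.
     */
    static Map<String, Integer> countByQueue(List<String> queuePathPerJob) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String queuePath : queuePathPerJob) {
            counts.merge(queuePath, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, passing the list ["/QSYS.LIB/QGPL.LIB/QBATCH.JOBQ", "/QSYS.LIB/QGPL.LIB/QBATCH.JOBQ", "/QSYS.LIB/QGPL.LIB/QTXTSRCH.JOBQ"] yields a count of 2 for the first queue and 1 for the second.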

AS400 Job's Thread details

I have already retrieved the details of a specific AS/400 job by its job number. Now I want to get that job's thread details. Some jobs are multithreaded, and I need the list of threads for a specific job along with the details of each thread. I checked the jt400 documentation for a suitable class, but I could not find one :(
Thanks in advance!
JobList jobList = new JobList(System);
jobList.clearJobSelectionCriteria();
jobList.addJobSelectionCriteria(JobList.SELECTION_JOB_NUMBER, jobNumber);
Enumeration list = jobList.getJobs();
while (list.hasMoreElements()) {
    Job j = (Job) list.nextElement();
    System.out.println(j.getName());
    System.out.println(j.getStatus());
    System.out.println(j.getOutputQueue());
}
The API you're looking for is QWCOLTHD. JTOpen 8.1 was recently released and I don't see the QWCOLTHD API implemented.
It looks like you either need to email the developers and ask for this API, or write the implementation yourself. JTOpen is open source; you can get the source code and see how similar APIs are implemented and then write the appropriate classes for QWCOLTHD.

Running Talend Job from within Java application

I am developing a web app using Spring MVC. Simply put, a user uploads a file which can be of different types (.csv, .xls, .txt, .xml), and the application parses this file and extracts data for further processing. The problem is that the format of the file can change frequently, so there must be some way to customize the parsing quickly and easily. Being a bit familiar with Talend, I decided to give it a shot and use it as the ETL tool for my app. This short tutorial shows how to run a Talend job from within a Java app - http://www.talendforge.org/forum/viewtopic.php?id=2901
However, jobs created using Talend read from and write to physical files, directories or databases. Is it possible to modify a Talend job so that it can be given a Java object as a parameter and return a Java object, like an ordinary Java method?
For example something like:
String[] param = new String[]{"John Doe"};
String talendJobOutput = teaPot.myjob_0_1.myJob.main(param);
where teaPot.myjob_0_1.myJob is the Talend job integrated into my app
I did something similar, I guess. I created a mapping in Talend using tMap and exported it as a Talend job (a Java SE program). If you include the libraries of that job, you can run the Talend job as described by others.
To pass arbitrary Java objects you can use the following methods, which are present in every Talend job:
public Object getValueObject() {
    return this.valueObject;
}
public void setValueObject(Object valueObject) {
    this.valueObject = valueObject;
}
In your job you have to cast this object. For example, you can pass in a List of HashMaps and use Java reflection to populate rows. Use tJavaFlex or a custom component for that.
Using this method I can adjust the mapping of my data visually in Talend, but still use the generated code as a library in my Java application.
Now that I better understand what you want, I think this is NOT possible, because Talend's architecture is that of a standalone app with a "main" entry point, much like the Java main() method:
public String[][] runJob(String[] args) {
    int exitCode = runJobInTOS(args);
    String[][] bufferValue = new String[][] { { Integer.toString(exitCode) } };
    return bufferValue;
}
That is to say: the Talend execution entry point only accepts a String array as input and doesn't return anything as output (except the exit code).
So you won't be able to link to the Talend-generated code as a library; you can only treat it as an isolated tool that you parameterize (using context variables, see my other response) before launching.
You can see that in the Talend help center and forum the only integration described is "external" job execution:
Talend knowledge base "Calling a Talend Job from an external Java application" article
Talend Community Forum "Java Object to Talend" topic
Maybe you have to rethink the architecture of your application if you want to use Talend as the ETL tool for your purpose.
From the Talend ETL point of view: if you want to parameterize the execution environment of your jobs (for example, the physical directory of the uploaded files), you should use context variables, which can be loaded at execution time from a configuration file as described here:
https://help.talend.com/display/TalendOpenStudioforDataIntegrationUserGuide53EN/2.6.6+Context+settings
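As a small illustration of parameterizing a job before launching it, here is a sketch that turns a map of context values into the String array passed to the job's entry point. It assumes the generated job accepts context overrides as "--context_param name=value" argument pairs, which is how Talend-generated launch scripts conventionally pass them; verify this against your own generated code. The class name TalendArgs and the variable name uploadDir in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TalendArgs {
    /**
     * Builds the argument array for a Talend job from context values.
     * Each entry becomes two arguments: "--context_param" and "name=value"
     * (assumed convention, see the text above).
     */
    static String[] toContextArgs(Map<String, String> contextParams) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> entry : contextParams.entrySet()) {
            args.add("--context_param");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        return args.toArray(new String[0]);
    }
}
```

The resulting array could then be handed to runJob(String[] args) as shown above, for instance with a map containing uploadDir mapped to the directory of the uploaded file.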

Retrieving Jobs and/or Process

Is there an easy way to retrieve a job and check, for example, its status with Play?
I have a few encoding and downloading jobs which run for a long time. In some cases I want to cancel them.
Is there a way to retrieve a list of jobs or something similar?
For example, one job calls the FFMPEG encoder using ProcessBuilder. I would like to be able to get this job and kill the process if it is no longer required (e.g. the wrong file was uploaded and I don't want to wait an hour for it to finish). If I can get a handle to that job, then I can get to the process as well.
I am using Play 1.2.4.
See JobsPlugin.java to see how to list all the scheduled jobs.
Getting the task currently being executed is trickier, but you can find your jobs in the JobsPlugin.scheduledJobs list by checking the job's class, then call a method on your custom job to tell it to cancel.
Something like:
for (Job<?> job : JobsPlugin.scheduledJobs) {
    if (job instanceof MyJob) {
        ((MyJob) job).cancelWork();
    }
}
where cancelWork is your custom method
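A minimal sketch of what such a cancelWork method could look like for a job that wraps an external process, written against the plain JDK rather than Play's Job class so it stands alone. The class name CancellableProcessTask and the method runAndWait are hypothetical; in a Play job the equivalent logic would live in your own job class.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

class CancellableProcessTask {
    // Handle to the external process while it runs, null otherwise.
    private final AtomicReference<Process> running = new AtomicReference<>();
    private volatile boolean cancelled;

    /**
     * Starts the process and blocks until it exits.
     * Returns the exit code, or -1 if the task was cancelled before starting.
     */
    int runAndWait(ProcessBuilder builder) throws IOException, InterruptedException {
        if (cancelled) {
            return -1;
        }
        Process process = builder.start();
        running.set(process);
        if (cancelled) {
            process.destroy(); // cancel arrived while the process was starting
        }
        try {
            return process.waitFor();
        } finally {
            running.set(null);
        }
    }

    /** Marks the task as cancelled and kills the external process if one is running. */
    void cancelWork() {
        cancelled = true;
        Process process = running.get();
        if (process != null) {
            process.destroy();
        }
    }

    boolean isCancelled() {
        return cancelled;
    }
}
```

The job's work method would call runAndWait with the ProcessBuilder for the encoder, and the loop above would call cancelWork from another thread. Keeping the process handle in a field is what lets a different thread reach and kill it.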
