Executing sub-transformation in a Pentaho Job executed using Java Springboot - java

I am trying to execute a Pentaho job and transformation using Spring Boot, and I have been able to execute both of them successfully. But the problem arises when I try to execute a Pentaho job that has transformations linked within it, which I have connected using the ${Internal.Job.Filename.Directory} parameter. It works successfully in Pentaho PDI, but when I try to execute it using my Spring Boot code, I get the following error:
2022/10/10 10:51:04 - data-fetch - Starting entry [Check S3 DB Connections]
2022-10-10T10:51:04.632+0530
(org.pentaho.di.job.Job) [http-nio-8085-exec-10] INFO - [src/main/resources/pentaho/data-fetch.kjb] Starting entry [Check S3 DB Connections]
2022/10/10 10:51:14 - data-fetch - Starting entry [S3-Transformation]
2022-10-10T10:51:14.828+0530
(org.pentaho.di.job.Job) [http-nio-8085-exec-10] INFO - [src/main/resources/pentaho/data-fetch.kjb] Starting entry [S3-Transformation]
2022/10/10 10:51:14 - S3-Transformation - ERROR (version 9.0.0.1-497, build 9.0.0.1-497 from 2020-03-19 08.25.00 by buildguy) : Unable to run job data-fetch. The S3-Transformation has an error. The transformation path ${Internal.Job.Filename.Directory}/S3-fetch.ktr is invalid, and will not run successfully.
2022/10/10 10:51:14 - S3-Transformation - ERROR (version 9.0.0.1-497, build 9.0.0.1-497 from 2020-03-19 08.25.00 by buildguy) : org.pentaho.di.core.exception.KettleXMLException:
2022/10/10 10:51:14 - S3-Transformation - The transformation path ${Internal.Job.Filename.Directory}/S3-fetch.ktr is invalid, and will not run successfully.
Is there a different parameter that I should be using?

So Pentaho doesn't automatically populate runtime variables when a job is run from code, and we need to provide them explicitly during execution. I added the following line of code and the job executed successfully.
job.setVariable("Internal.Job.Filename.Directory", pentahoDir);
Where pentahoDir is a variable that points to the absolute path of the directory containing the job file and needs to be set by the user.
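For reference, here is a minimal sketch of how the whole call can look when a .kjb is run from Spring Boot through the Kettle API. The class name and file layout are assumptions for illustration, not the original poster's code:

import java.io.File;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class DataFetchRunner {

    public void runDataFetch() throws Exception {
        // Initialize the Kettle engine once per JVM
        KettleEnvironment.init();

        // Assumed location of the .kjb file; adjust to your own layout.
        // Internal.Job.Filename.Directory expects an absolute path.
        String pentahoDir = new File("src/main/resources/pentaho").getAbsolutePath();

        JobMeta jobMeta = new JobMeta(pentahoDir + "/data-fetch.kjb", null);
        Job job = new Job(null, jobMeta);

        // Explicitly provide the runtime variable so the linked .ktr resolves
        job.setVariable("Internal.Job.Filename.Directory", pentahoDir);

        job.start();
        job.waitUntilFinished();

        if (job.getErrors() > 0) {
            throw new IllegalStateException("Pentaho job finished with errors");
        }
    }
}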

Related

Apache Livy : Could not find or load main class org.apache.livy.server.LivyServer

I am trying to start an Apache Livy 0.8.0 server on my Windows 10 machine for Spark 3.1.2 and Hadoop 3.2.1. I am taking help from here. I have successfully built Apache Livy using Maven (I have attached a screenshot of it), but I am not able to run the Livy server. When I run it I get the following error -
> starting C:/AmazonJDK/jdk1.8.0_332/bin/java -cp /d/ApacheLivy/incubator-livy-master/incubator-livy-master/server/target/jars/*:/d/ApacheLivy/incubator-livy-master/incubator-livy-master/conf:D:/Program_files/spark/conf:D:/ApacheHadoop/hadoop-3.2.1/etc/hadoop: org.apache.livy.server.LivyServer, logging to D:/ApacheLivy/incubator-livy-master/incubator-livy-master/logs/livy--server.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch C:/AmazonJDK/jdk1.8.0_332/bin/java -cp /d/ApacheLivy/incubator-livy-master/incubator-livy-master/server/target/jars/*:/d/ApacheLivy/incubator-livy-master/incubator-livy-master/conf:D:/Program_files/spark/conf:D:/ApacheHadoop/hadoop-3.2.1/etc/hadoop: org.apache.livy.server.LivyServer:
Error: Could not find or load main class org.apache.livy.server.LivyServer
full log in D:/ApacheLivy/incubator-livy-master/incubator-livy-master/logs/livy--server.out
I am using Git Bash. If you need more information, I will provide it.
The error got resolved when I used Windows Subsystem for Linux (WSL).

Java home path differs

Hello, I try to execute my project with bootRun in IntelliJ and I get the following error:
Execution failed for task ':bootRun'.
> Process 'command '/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-1.b14.fc24.x86_64/bin/java'' finished with non-zero exit value 1
I checked my Java paths and here they are:
echo $JAVA_HOME
/home/mypc123/Downloads/jdk1.8.0_101/bin/java
$ which java
/usr/bin/java
I have jdk1.8.0 in /usr/bin
I looked more in depth and found this:
ERROR org.apache.tomcat.jdbc.pool.ConnectionPool - Unable to create initial connections of pool.
org.postgresql.util.PSQLException: FATAL: role "syn12" does not exist
However, when I connect to PostgreSQL the syn12 role exists, and all my Gradle JVMs are of the form usr/lib/jvm/java.......
Well, we got down to this: Can't load library: /opt/symmetry/ste/java/libste-java.so. How can I install this library?
Seems like you have your project JDK pointed at another installation. The second issue is most likely different, like an incorrect JDBC URL that happens to point to another, existing database schema in which the required role does not exist.
IntelliJ IDEA doesn't use the $JAVA_HOME from your system but relies on its own JDK definitions.
It looks like your application is not starting because of the SQL error you found in the logs, and Spring Boot returns 1 because it failed to start.
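To narrow down the JDBC side, a small standalone check can confirm which database the URL really reaches and whether the role exists there. This is a sketch, not code from the question; the URL, user and password are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RoleCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details: replace with the values from your datasource config.
        // Requires the PostgreSQL JDBC driver on the classpath.
        String url = "jdbc:postgresql://localhost:5432/mydb";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT 1 FROM pg_roles WHERE rolname = ?")) {
            ps.setString(1, "syn12");
            try (ResultSet rs = ps.executeQuery()) {
                System.out.println(rs.next() ? "role syn12 exists" : "role syn12 missing");
            }
        }
    }
}

If the role is missing here, the application is pointed at a different database than the one you checked manually.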

Managed VM deploy failed because "env" setting is not supported

I'm trying to deploy a Java app to Google App Engine Managed VM. I'm using the gcloud command-line tool and an already prepared WAR file, plus app.yaml.
Using the following command:
gcloud preview app deploy ./build/libs/app.yaml
Right now it fails with:
Building and pushing image for module [default]
-------------------------------------------------------------------------------- DOCKER BUILD OUTPUT --------------------------------------------------------------------------------
Step 0 : FROM gcr.io/google_appengine/jetty9
---> 005014071b64
Step 1 : ADD webapp-webapp.war $JETTY_BASE/webapps/root.war
---> 3e9023930cc8
Removing intermediate container 342e8a2f5750
Successfully built 3e9023930cc8
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Beginning teardown of remote build environment (this may take a few seconds).
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [400] "env" setting is not supported for this deployment.
I see a similar error (there) for maven-gcloud-plugin, which happens when the project is not configured as a WAR. But note that:
- I'm using the plain command-line tool gcloud, the latest version
- my project is packaged into a WAR already
Also, I'm using the following app.yaml (which I got from the Maven plugin sources):
runtime: java
env: 2
api_version: 1
handlers:
- url: .*
  script: dynamic
So the question: where is this error coming from (the Docker image is already prepared at this point, right?), what does it mean, and how do I fix it?
Update
I noticed that it uses FROM gcr.io/google_appengine/jetty9 for the VM, but for App Engine it should be FROM gcr.io/google_appengine/jetty9-compat. I've tried switching to an exploded app instead of a WAR, and it started using the correct Docker base image. But it still fails:
Building and pushing image for module [default]
-------------------------------------------------------------------------------- DOCKER BUILD OUTPUT --------------------------------------------------------------------------------
Step 0 : FROM gcr.io/google_appengine/jetty9-compat
---> 2ad8572ef3d8
Step 1 : ADD . /app/
---> b10f4bc6718e
Removing intermediate container 8b149f4baf9c
Successfully built b10f4bc6718e
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Beginning teardown of remote build environment (this may take a few seconds).
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [400] "env" setting is not supported for this deployment.
The reason was this line in app.yaml:
env: 2
It was too simple and too obvious to try deploying without this option. Also, every doc, official and unofficial, mentions that you need the env: 2 option set to deploy your app as an App Engine app. That's really strange.
Removing this line also changed the base Docker image to gcr.io/google_appengine/java-compat. I guess it means that the Jetty images, including jetty9-compat, aren't compatible with App Engine apps.

PIG/Hadoop issue: ERROR 2081: Unable to setup the load function [duplicate]

This question already has answers here:
how to load files on hadoop cluster using apache pig?
(3 answers)
Closed 3 years ago.
I'm running Pig 0.13.0 and Hadoop 2.5.1, both installed from the Apache distros; they're not packages from Hortonworks or Cloudera or anything.
I'm working with a tutorial and can get it to work fine when running Pig locally ($> ./pig -x local), but when trying to run it on the Hadoop instance I get an error that I'm having a hard time researching on the internet.
This command:
movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
DUMP movies;
Works fine running locally. When I run it in Hadoop/MR mode, it seems to work fine when I run the first line of code:
grunt> movies = LOAD '/home/hduser/pig-tutorial-master/movies_data.csv' USING PigStorage(',') as (id,name,year,rating,duration);
2014-10-29 18:16:26,281 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-29 18:16:26,281 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
But when I try to $> DUMP movies it gives me this trace:
grunt> dump movies
2014-10-29 18:17:15,419 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-10-29 18:17:15,420 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-10-29 18:17:15,445 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2014-10-29 18:17:15,469 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2081: Unable to setup the load function.
Details at logfile: /usr/local/pig/pig_1414606194436.log
The ERROR 2081 is what I'm trying to diagnose, but can't find anything that helps point me in the right direction. Any ideas of where to start? I assume it's something to do with my Hadoop installation and not Pig, but I don't know. Any suggestions will be helpful.
Thanks,
Mark
EDIT: Here is the full log output:
ERROR 2081: Unable to setup the load function.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias movies
at org.apache.pig.PigServer.openIterator(PigServer.java:912)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias movies
at org.apache.pig.PigServer.storeEx(PigServer.java:1015)
at org.apache.pig.PigServer.store(PigServer.java:974)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: movies: Store(hdfs://localhost:54310/tmp/temp-1276361014/tmp-2000190966:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1): org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNextTuple(POStore.java:143)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.runPipeline(FetchLauncher.java:160)
at org.apache.pig.backend.hadoop.executionengine.fetch.FetchLauncher.launchPig(FetchLauncher.java:81)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:275)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1367)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1352)
at org.apache.pig.PigServer.storeEx(PigServer.java:1011)
... 14 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:127)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
... 21 more
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/home/hduser/pig-tutorial-master/movies_data.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:190)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:146)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.setUp(POLoad.java:95)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNextTuple(POLoad.java:123)
... 22 more
================================================================================
If you are running the Pig commands from the Grunt shell on a Hadoop cluster, set the property:
set opt.fetch false;
By setting the above property, DUMP will run in MapReduce mode; by default the property is set to true.
If you are working with Hadoop 2.6.0 and Pig 0.14, downgrading Pig to 0.13 may help. This worked for me.

Liquibase, "Migration Failed: Java heap space" error when generateChangeLog with "data"

I am trying to create a baseline on one of my development databases using Liquibase. Here's my environment:
- Database -> Oracle 10g, with 500+ tables and lots of configuration data; the Oracle export dump file is about 70 MB
- Java - Java 6
- Oracle JDBC Driver - ojdbc14.jar (downloaded from Oracle web site)
- Command-line execution:
liquibase --changeLogFile=base.changelog.data.xml --diffTypes="data" generateChangeLog
Execution result:
- Liquibase is configured to run with "-Xmx512m -Xms256m" JVM parameters; it fails with the error message "Migration Failed: Java heap space"
- configured to run with "-Xmx1024m -Xms512m", same error occurs
- configured to run with "-Xmx2048 -Xms512m", same error
What other options do I have in order to create a baseline for my development projects, so that we can start version-controlling our DB?
Appreciate your advice, thanks!
James
Which version of Liquibase are you using? There have been some improvements in the performance of the diff support in the upcoming 2.0. The latest build can be obtained from http://liquibase.org/ci/latest (once the Bamboo server is fully upgraded).
