I have this code from the Google Cloud Platform Dataflow Templates.
I wish to add more functionality to it, namely support for a JavaScript UDF. When I try to compile the file using this:
mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.${PIPELINE_NAME} \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=${PROJECT_ID} \
--stagingLocation=gs://${PROJECT_ID}/dataflow/${PIPELINE_FOLDER}/staging \
--tempLocation=gs://${PROJECT_ID}/dataflow/${PIPELINE_FOLDER}/temp \
--runner=DataflowRunner \
--windowDuration=2m \
--numShards=1 \
--topic=projects/${PROJECT_ID}/topics/windowed-files \
--outputDirectory=gs://${PROJECT_ID}/temp/ \
--outputFilenamePrefix=windowed-file \
--outputFilenameSuffix=.txt"
When compiling the file, I get the following error:
An exception occured while executing the Java class. Class interface com.google.cloud.teleport.templates.PubsubToText$Options missing a property named 'topic'. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project google-cloud-teleport-java: An exception occured while executing the Java class. Class interface com.google.cloud.teleport.templates.PubsubToText$Options missing a property named 'topic'.
This happens even though I've passed the --topic flag with the appropriate value plugged in.
The example at the top is wrong. You have to pass --inputTopic instead of --topic. You can see this in the code where the ValueProvider is defined:
@Description("The Cloud Pub/Sub topic to read from.")
@Required
ValueProvider<String> getInputTopic();
void setInputTopic(ValueProvider<String> value);
You can also run the template from the Console UI and the job details will show that the option is indeed inputTopic:
The invocation example in the javadoc should now reflect the correct input parameter (--inputTopic).
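For completeness, here is the invocation from the question with only that one flag renamed (everything else unchanged):
mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.${PIPELINE_NAME} \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=${PROJECT_ID} \
--stagingLocation=gs://${PROJECT_ID}/dataflow/${PIPELINE_FOLDER}/staging \
--tempLocation=gs://${PROJECT_ID}/dataflow/${PIPELINE_FOLDER}/temp \
--runner=DataflowRunner \
--windowDuration=2m \
--numShards=1 \
--inputTopic=projects/${PROJECT_ID}/topics/windowed-files \
--outputDirectory=gs://${PROJECT_ID}/temp/ \
--outputFilenamePrefix=windowed-file \
--outputFilenameSuffix=.txt"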
Related
I need help figuring out why my pipeline won't work. I have already figured out how to connect Jenkins with my GitHub repository. All I need to do now is get the code to show up in SonarQube for analysis. I am using Gradle, not Maven, and most tutorials only show good examples for Maven...
Here is my code:
pipeline {
    agent any
    stages {
        stage('SonarQube analysis') {
            steps {
                withSonarQubeEnv(installationName: 'Sonar9.5', credentialsId: 'sonarqube-token') {
                    sh './gradlew sonarqube \
                        -Dsonar.projectKey=${#key} \
                        -Dsonar.host.url=${"#url"} \
                        -Dsonar.login=${#auth} \
                        -Dsonar.projectName=${#name} \
                        -Dsonar.projectVersion=${"not provided"}'
                }
            }
        }
    }
}
Here is the error message:
java.io.IOException: CreateProcess error=2, The system cannot find the file specified
I have been running my Play projects using the deprecated Activator wrapper for SBT, and it allows me to specify -D options for the JVM it launches like so:
> activator -Dhttp.port=9000 -Dplay.server.websocket.frame.maxLength=10485760 "run 9000"
This is very useful as it allows me to create separate .bat files for running a given project on different ports, which is great as I'm working on several different websites in parallel.
However, I've been unable to transition this command line to use SBT directly:
> sbt -Dhttp.port=9000 -Dplay.server.websocket.frame.maxLength=10485760 "run 9000"
...
[error] Expected letter
[error] Expected symbol
[error] Expected '+'
[error] Expected '++'
[error] Expected 'java++'
[error] Expected 'java+'
[error] Expected '^'
[error] Expected '^^'
[error] Expected '+-'
[error] Expected 'debug'
[error] Expected 'info'
[error] Expected 'warn'
[error] Expected 'error'
[error] Expected 'addPluginSbtFile'
[error] Expected 'show'
[error] Expected 'all'
[error] Expected 'Global'
[error] Expected '*'
[error] Expected 'Zero'
[error] Expected 'ThisBuild'
[error] Expected 'ProjectRef('
[error] Expected '{'
[error] Expected project ID
[error] Expected configuration
[error] Expected configuration ident
[error] Expected key
[error] Expected end of input.
[error] Expected ';'
[error] Expected 'early('
[error] Expected '-'
[error] Expected '--'
[error] Expected '!'
[error] .port
[error] ^
Adding -J as suggested by https://stackoverflow.com/a/47914062/708381
> sbt -J-Dhttp.port=9000 -J-Dplay.server.websocket.frame.maxLength=10485760 "run 9000"
...
[error] Expected symbol
[error] Not a valid command: -
[error] Expected end of input.
[error] Expected '--'
[error] Expected 'debug'
[error] Expected 'info'
[error] Expected 'warn'
[error] Expected 'error'
[error] Expected 'addPluginSbtFile'
[error] -J-Dhttp
[error] ^
The SBT documentation lists many properties (all of which contain dots) but fails to provide any full command line examples of how to actually specify them. It seems like you should be able to "just" do -Dprop=value as in my first example: https://www.scala-sbt.org/1.x/docs/Command-Line-Reference.html
(Yes, there are more recent versions of SBT available, but I'm blocked on this bug: https://github.com/sbt/sbt/issues/5046 - ideally any solution works with any recent-ish version of SBT)
First, some background...
There are different ways to install SBT. Usually it comes with a wrapper shell script which makes it convenient to run, so you don't have to specify the sbt jar file location. However, depending on your installation method, you might have a very simple or a much more advanced sbt wrapper script.
I suggest actually checking your sbt runner script to see what it does. Some basic or manually created scripts do NOT pass command-line arguments to the JVM at all!
Here is one popular sbt runner script you can use: https://github.com/paulp/sbt-extras.
To get it, simply make a script like this (get_sbt.sh):
#!/bin/bash
curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt > sbt \
&& chmod 0755 sbt
and it will download it for you.
On Fedora Linux (and other non-Windows OSes) you can simply install sbt with the package manager:
dnf install sbt
Here is the help page from my Fedora sbt package:
$ sbt --help
...
# jvm options and output control
JAVA_OPTS environment variable, if unset uses "-Dfile.encoding=UTF-8"
.jvmopts if this file exists in the current directory, its contents
are appended to JAVA_OPTS
SBT_OPTS environment variable, if unset uses ""
.sbtopts if this file exists in the current directory, its contents
are prepended to the runner args
/etc/sbt/sbtopts if this file exists, it is prepended to the runner args
-Dkey=val pass -Dkey=val directly to the java runtime
-J-X pass option -X directly to the java runtime
(-J is stripped)
-S-X add -X to sbt's scalacOptions (-S is stripped)
And a similar help from the github script above:
$ ./sbt -h
...
# passing options to the jvm - note it does NOT use JAVA_OPTS due to pollution
# The default set is used if JVM_OPTS is unset and no -jvm-opts file is found
<default> -Xms512m -Xss2m -XX:MaxInlineLevel=18
JVM_OPTS environment variable holding either the jvm args directly, or
the reference to a file containing jvm args if given path is prepended by '#' (e.g. '#/etc/jvmopts')
Note: "#"-file is overridden by local '.jvmopts' or '-jvm-opts' argument.
-jvm-opts <path> file containing jvm args (if not given, .jvmopts in project root is used if present)
-Dkey=val pass -Dkey=val directly to the jvm
-J-X pass option -X directly to the jvm (-J is stripped)
# passing options to sbt, OR to this runner
SBT_OPTS environment variable holding either the sbt args directly, or
the reference to a file containing sbt args if given path is prepended by '#' (e.g. '#/etc/sbtopts')
Note: "#"-file is overridden by local '.sbtopts' or '-sbt-opts' argument.
-sbt-opts <path> file containing sbt args (if not given, .sbtopts in project root is used if present)
-S-X add -X to sbt's scalacOptions (-S is stripped)
... Now the answer:
You can test whether the properties are passed correctly to sbt like so:
$ echo 'sys.props.get("test")' | sbt -Dtest=works console
...
scala> sys.props.get("test")
res0: Option[String] = Some(works)
If you see None, then your runner script is not doing its job; reinstall SBT or replace the script.
If that works but your port doesn't change, then perhaps the path in your config is different. Each dot in a Typesafe Config key represents a level of hierarchy (as in JSON). You can print the full config on startup to see how it's structured.
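As a concrete illustration (reusing the properties from the question, and assuming a runner script like the ones above), the options can be passed on the command line or kept in a .jvmopts file in the project root:
# passed straight on the command line (forwarded to the JVM by the runner)
sbt -Dhttp.port=9000 -Dplay.server.websocket.frame.maxLength=10485760 "run 9000"

# or kept in .jvmopts, which the runner picks up automatically
cat > .jvmopts <<'EOF'
-Dhttp.port=9000
-Dplay.server.websocket.frame.maxLength=10485760
EOF
sbt "run 9000"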
I'm having a problem running a Spark application written in Java on AWS EMR.
Locally, everything runs fine. When I submit a job to EMR, I always get "Completed" within 20 seconds even though the job should take minutes. No output is produced and no log messages are printed.
I'm still confused as to whether it should be run as a Spark application or as a CUSTOM_JAR step type.
Here is my main method:
public static void main(String[] args) throws Exception {
SparkSession spark = SparkSession
.builder()
.appName("RandomName")
.getOrCreate();
//process stuff
String from_path = args[0];
String to_path = args[1];
Dataset<String> dataInput = spark.read().json(from_path).toJSON();
JavaRDD<ResultingClass> map = dataInput.toJavaRDD().map(row -> convertData(row)); // convertData is defined elsewhere; not included here
Dataset<Row> dataFrame = spark.createDataFrame(map, ResultingClass.class);
dataFrame
.repartition(1)
.write()
.mode(SaveMode.Append)
.partitionBy("year", "month", "day", "hour")
.parquet(to_path);
spark.stop();
}
I've tried these:
1)
aws emr add-steps --cluster-id j-XXXXXXXXX --steps \
Type=Spark,Name=MyApp,Args=[--deploy-mode,cluster,--master,yarn, \
--conf,spark.yarn.submit.waitAppCompletion=false, \
--class,com.my.class.with.main.Foo,s3://mybucket/script.jar, \
s3://partitioned-input-data/*/*/*/*/*.txt, \
s3://output-bucket/table-name], \
ActionOnFailure=CONTINUE --region us-west-2 --profile default
It completes in 15 seconds with no error, but without the output result or the log messages I've added.
2)
aws emr add-steps --cluster-id j-XXXXXXXXX --steps \
Type=CUSTOM_JAR, \
Jar=s3://mybucket/script.jar, \
MainClass=com.my.class.with.main.Foo, \
Name=MyApp, \
Args=[--deploy-mode,cluster, \
--conf,spark.yarn.submit.waitAppCompletion=true, \
s3://partitioned-input-data/*/*/*/*/*.txt, \
s3://output-bucket/table-name], \
ActionOnFailure=CONTINUE \
--region us-west-2 --profile default
This reads the parameters wrongly: it sees --deploy-mode as the first argument and cluster as the second, instead of the input and output buckets.
3)
aws emr add-steps --cluster-id j-XXXXXXXXX --steps \
Type=CUSTOM_JAR, \
Jar=s3://mybucket/script.jar, \
MainClass=com.my.class.with.main.Foo, \
Name=MyApp, \
Args=[s3://partitioned-input-data/*/*/*/*/*.txt, \
s3://output-bucket/table-name], \
ActionOnFailure=CONTINUE \
--region us-west-2 --profile default
I get this: Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession
When I include all the dependencies (which I do not need to do locally),
I get: Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
I do not want to hardcode "yarn" into the app.
I find the AWS documentation very confusing as to the proper way to run this.
Update:
Running the command directly on the server does work, so the problem must be in the way I'm defining the CLI command.
spark-submit --class com.my.class.with.main.Foo \
s3://mybucket/script.jar \
"s3://partitioned-input-data/*/*/*/*/*.txt" \
"s3://output-bucket/table-name"
Approach 1) was actually working.
The step overview in the AWS console said that the task finished within 15 seconds, but in reality it was still running on the cluster. It took an hour to do the work, and I can see the result.
I do not know why the step is misreporting the result. I'm using emr-5.9.0 with Ganglia 3.7.2, Spark 2.2.0, and Zeppelin 0.7.2.
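One likely explanation (my assumption, not something verified above): approach 1) passes --conf spark.yarn.submit.waitAppCompletion=false, which makes the submitter return as soon as the application is handed to YARN, so the EMR step reports completion almost immediately while the job keeps running. A sketch of the same step with that setting dropped, so the step should track the real application status:
aws emr add-steps --cluster-id j-XXXXXXXXX --steps \
Type=Spark,Name=MyApp,Args=[--deploy-mode,cluster,--master,yarn, \
--class,com.my.class.with.main.Foo,s3://mybucket/script.jar, \
s3://partitioned-input-data/*/*/*/*/*.txt, \
s3://output-bucket/table-name], \
ActionOnFailure=CONTINUE --region us-west-2 --profile default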
I'm trying to run a Java application as a Windows service with WinRun4J.
I copied WinRun4J64c.exe into my application directory and placed the following service.ini file beside it:
service.class=org.boris.winrun4j.MainService
service.id=MyAPP
service.name=MyAPP
service.description=some description
classpath.1=./lib/*
classpath.2=WinRun4J.jar
[MainService]
class=play.core.server.NettyServer
But if I start the service with WinRun4J64c.exe --WinRun4J:RegisterService, I get:
Service control dispatcher error: 1063
What is wrong?
I didn't get it working, so my workaround is to use Apache Commons Daemon. I executed the included prunsrv.exe with the following parameters:
prunsrv.exe install "MeineAnwendung" \
--Install="C:/pfad/zu/prunsrv.exe" \
--JvmOptions=-Dpidfile.path=NUL \
--Jvm=auto \
--Startup=auto \
--StartMode=jvm \
--Classpath="c:/irgendwo/anwendung/lib/*;" \
--StartClass=play.core.server.NettyServer
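Once the service is installed this way, it can be started like any other Windows service (using the service name from the install command above):
net start MeineAnwendung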
Team,
I am using IKVM to convert a jar file to a DLL, so that I can use it with C# to test the Java application... I don't have the original Java source code or the class files.
Here is what I am doing and the error I get:
ikvmc myApplication.jar
Note IKVMC0002: output file is "asapi.dll"
Warning IKVMC0100: class "org.apache.log4j.Logger" not found
Warning IKVMC0111: emitted java.lang.NoClassDefFoundError in "com.myApp.authenticateUser(L...
Warning IKVMC0111: emitted java.lang.NoClassDefFoundError in "...vices.AsApi.authenticateWithArtifact(Ljava.lang.String;Lcom.myApp.AppApi)....
Any ideas? This jar file doesn't contain a main method...
Regards,
Deekshit
I believe you need something more along the lines of:
/usr/bin/mono \
/path/to/ikvm-0.42.0.6/bin/ikvmc.exe \
/path/to/project/target/project-1.2.3.4.jar \
-out:/path/to/project/target/project-1.2.3.4.dll \
-keyfile:/path/to/project/target/private-key.snk \
-assembly:project-1.2.3.4 \
-fileversion:1.2.3.4 \
-version:1.2.3.4
If your application depends on third-party jars, you might need to add them to your jar using shading (which is not good practice at all).
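As an alternative to shading, ikvmc can (as far as I recall) take several input jars and compile them into a single assembly, so you may be able to feed the missing dependencies in directly. A rough sketch, with hypothetical jar names and paths:
/usr/bin/mono /path/to/ikvm-0.42.0.6/bin/ikvmc.exe \
  myApplication.jar \
  log4j-1.2.17.jar \
  -target:library \
  -out:asapi.dll
If that works, the IKVMC0100/IKVMC0111 warnings about org.apache.log4j.Logger should also go away, since the class would then be compiled in as well.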