I have a simple pipeline that just copies files from source to destination. I'm trying to write tests for the windowing I have set up.
Is there a way to use the TestStream class for files?
For example:
@Test
public void elementsAreInCorrectWindows() {
TestStream<FileIO.ReadableFile> testStream = TestStream.create(ReadableFileCoder.of())
.advanceWatermarkTo(start)
.addElements(readableFile1)
.advanceWatermarkTo(end)
.addElements(readableFile2)
.advanceWatermarkToInfinity();
}
However, the constructor for ReadableFile is package-protected, so I wouldn't be able to create those objects.
I think it would be a reasonable feature/pull request to make this constructor public. In the meantime, you could have a TestStream that produces elements of another type that you then transform into ReadableFiles inside the pipeline.
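For example, something like this might work (an untested sketch, assuming a TestPipeline named pipeline; the paths and the metadataFor helper are hypothetical, and Beam's built-in FileIO.readMatches() stands in for a hand-written DoFn):

// Sketch: drive the windowing test with MatchResult.Metadata elements,
// then convert them to ReadableFiles inside the pipeline. ReadableFile
// only opens the underlying file lazily, so for a pure window-assignment
// test the files do not need to exist.
TestStream<MatchResult.Metadata> testStream =
    TestStream.create(MetadataCoder.of())
        .advanceWatermarkTo(start)
        .addElements(metadataFor("/tmp/file1.txt"))
        .advanceWatermarkTo(end)
        .addElements(metadataFor("/tmp/file2.txt"))
        .advanceWatermarkToInfinity();

PCollection<FileIO.ReadableFile> files =
    pipeline
        .apply(testStream)
        .apply(FileIO.readMatches());

// Hypothetical helper: builds Metadata for a path without touching the
// file system.
private static MatchResult.Metadata metadataFor(String path) {
  return MatchResult.Metadata.builder()
      .setResourceId(FileSystems.matchNewResource(path, /* isDirectory= */ false))
      .setSizeBytes(0)
      .setIsReadSeekEfficient(true)
      .build();
}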
We have a use case where we want to pass hundreds of lines of JSON spec to our Apache Beam pipeline. One straightforward way is to create a custom pipeline option, as shown below. Is there any other way we can pass the input as a file?
public interface CustomPipelineOptions extends PipelineOptions {
@Description("The Json spec")
String getJsonSpec();
void setJsonSpec(String jsonSpec);
}
I want to deploy the pipeline on the Google Dataflow engine. Even if I pass the spec as a file path and read the file contents inside the Beam code before starting the pipeline, how do I bundle the spec file as part of the pipeline?
P.S. Note: I don't want to commit the spec file (in the resources folder) as part of the source code where my Beam code lives. It needs to be configurable, i.e. I want to pass a different spec file for different Beam pipeline jobs.
You can pass the options as a POJO.
public class JsonSpec {
public String stringArg;
public int intArg;
}
Then reference it in your options:
public interface CustomPipelineOptions extends PipelineOptions {
@Description("The Json spec")
JsonSpec getJsonSpec();
void setJsonSpec(JsonSpec jsonSpec);
}
Options will be parsed into the class; I believe this is done by Jackson, though I'm not sure.
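For example, a sketch of how this would be used (assuming Beam's standard options parsing; the JSON values are hypothetical):

// Hypothetical invocation; the JSON string is bound to the JsonSpec POJO
// by the options parser (Jackson under the hood, as far as I know):
//   --jsonSpec={"stringArg": "hello", "intArg": 42}
CustomPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
    .withValidation()
    .as(CustomPipelineOptions.class);
JsonSpec spec = options.getJsonSpec();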
I am wondering why you want to pass in "hundreds of lines of JSON" as a pipeline option; this doesn't seem like a very "Beam" way of doing things. Pipeline options should carry configuration: do you really need hundreds of lines of configuration per pipeline run? If you intend to pass data to create a PCollection, you are better off using TextIO and then processing the lines as JSON.
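A minimal sketch of that alternative (the bucket path is hypothetical):

// Read the JSON as pipeline data rather than as configuration; each
// element of the PCollection is one line of the file.
PCollection<String> jsonLines =
    pipeline.apply(TextIO.read().from("gs://my-bucket/spec.json"));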
Beam PipelineOptions, as the name implies, are intended to provide small configuration parameters to configure a pipeline. PipelineOptions are usually read at job submission. So even if you get your JSON spec to the job submission program using a PipelineOption, you have to make sure your program is written so that your DoFns have access to this file at runtime. For this:
1. You have to save your files in a distributed storage system that the Dataflow VMs have access to (for example, GCS).
2. You have to pass your input file to the transform that is reading the file.
There are multiple ways to do (2). For example:
Directly pass the file path to the constructor of your DoFn.
Pass the file path as a side input to your transform (which allows you to configure it at runtime), as sketched below.
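A rough sketch of the side-input approach (untested; getSpecPath and the input collection are hypothetical names):

// Read the spec file at runtime and make its contents available to a
// DoFn as a singleton side input.
PCollectionView<String> specView =
    pipeline
        .apply(FileIO.match().filepattern(options.getSpecPath()))
        .apply(FileIO.readMatches())
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((FileIO.ReadableFile f) -> {
              try {
                // Read the whole spec file into a single string.
                return f.readFullyAsUTF8String();
              } catch (IOException e) {
                throw new RuntimeException(e);
              }
            }))
        .apply(View.asSingleton());

input.apply(ParDo.of(new DoFn<String, String>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    // The spec file contents are available at runtime via the side input.
    String spec = c.sideInput(specView);
    // ... use the spec to process c.element() ...
  }
}).withSideInputs(specView));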
I am developing a Spring Boot application with GraphQL. Since the data structure is already declared in Protobuf, I tried to use it. This is an example of my code:
@Service
public class Query implements GraphQLQueryResolver {
public MyProto getMyProto() {
/**/
}
}
I want to make code like the structure above. To do this, I divided the job into two sections.
Since a .proto file can be converted to a Java class, I will use this class as the return type.
The second section is the main matter.
A schema is also required. At first I tried to write the schema by hand, but the real proto is about 1,000 lines. So I want to know: is there any way to convert a .proto file to a .graphqls file?
There is a way. I am using a protoc plugin for that purpose: go-proto-gql
It is fairly simple to use, for example:
protoc --gql_out=paths=source_relative:. -I=. ./*.proto
Hope this works for you as well.
I am trying to create a plugin to generate some Java code and write it back to the main source module. I was able to create a simple POJO class using JavaPoet and write it to src/main/java.
To make this useful, it should read the code from the src/main/java folder and analyze the classes using reflection: look for some annotation, then generate some code. Should I use SourceTask for this case? It looks like I can only access the classes as files. Is it possible to read the Java classes as classes and analyze them using reflection?
Since you specified what you want to do:
You'll need to implement an annotation processor. This has absolutely nothing to do with Gradle, and a Gradle plugin is actually the wrong way to go about this. Please look into Java annotation processors and come back with more questions if any come up.
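For orientation, a minimal processor skeleton looks something like this (the annotation name com.example.MyAnnotation is hypothetical):

// Minimal annotation processor: visits elements annotated with a custom
// annotation and can generate sources via the Filer (e.g. with JavaPoet).
@SupportedAnnotationTypes("com.example.MyAnnotation")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class MyProcessor extends AbstractProcessor {
  @Override
  public boolean process(Set<? extends TypeElement> annotations,
                         RoundEnvironment roundEnv) {
    for (TypeElement annotation : annotations) {
      for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
        // Inspect 'element' and write generated code through
        // processingEnv.getFiler().createSourceFile(...)
      }
    }
    return true;
  }
}

The processor is registered through META-INF/services/javax.annotation.processing.Processor (or Gradle's annotationProcessor configuration) and runs inside javac, so the generated sources are compiled in the same build.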
With JavaForger you can read input classes and generate source code based on them. It also provides an API to insert it into existing classes or create new classes based on the input file. In contrast to JavaPoet, JavaForger has a clear separation between the code to be generated and the settings for where and how to insert it. An example of a template for a POJO can look like this:
public class ${class.name}Data {
<#list fields as field>
private ${field.type} ${field.name};
</#list>
<#list fields as field>
public ${field.type} ${field.getter}() {
return ${field.name};
}
public void ${field.setter}(${field.type} ${field.name}) {
this.${field.name} = ${field.name};
}
</#list>
}
The example below uses a template called "myTemplate.javat" and adds some extra settings, like creating the file if it does not exist and changing the path where the file will be created by replacing "path" with "pathToPojo". Then the path to the input class is given, from which the class name, fields, and more are read.
JavaForgerConfiguration config = JavaForgerConfiguration.builder()
.withTemplate("myTemplate.javat")
.withCreateFileIfNotExists(true)
.withMergeClassProvider(ClassProvider.fromInputClass(s -> s.replace("path", "pathToPojo")))
.build();
JavaForger.execute(config, "MyProject/path/inputFile.java");
If you are looking for a framework that allows changing code more programmatically, you can also look at JavaParser. With this framework you can construct an abstract syntax tree from a Java class and make changes to it.
I tried using TestNG groups read from an external file. It gives a compile-time error stating that groups can only take string constants. It looks like below:
@Test(dataProvider="myData", dataProviderClass=MyDataProvider.class, groups=MyGroups.getGroups())
public void test()
{
//...
}
I cannot do the above with TestNG as of now. So is there a way of doing this?
Maybe you can try building an implementation around the org.testng.IAnnotationTransformer interface, which TestNG provides as a listener; within its org.testng.IAnnotationTransformer#transform method you can inject the group information dynamically. Your transform() implementation could be enriched so that it reads the group information from an external data source. That should solve your problem.
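A sketch of such a transformer (MyGroups.getGroups() stands in for whatever external lookup you use):

// Listener that injects group names at runtime instead of compile time.
public class GroupTransformer implements IAnnotationTransformer {
  @Override
  public void transform(ITestAnnotation annotation, Class testClass,
                        Constructor testConstructor, Method testMethod) {
    // Read group names from an external source (file, DB, ...) here.
    annotation.setGroups(MyGroups.getGroups());
  }
}

Register the transformer as a listener (for example via a <listeners> entry in testng.xml) so it runs before the tests are picked up.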
I have a class that takes in a single file, finds the file related to it, and opens it. Something along the lines of:
class DummyFileClass
{
private File fileOne;
private File fileTwo;
public DummyFileClass(File fileOne)
{
this.fileOne = fileOne;
fileTwo = findRelatedFile(fileOne);
}
public void someMethod()
{
// Do something with files one and two
}
}
In my unit test, I want to be able to test someMethod() without having physical files sitting somewhere. I can mock fileOne and pass it to the constructor, but since fileTwo is computed in the constructor, I don't have control over it.
I could mock the method findRelatedFile(), but is this the best practice? I'm looking for the best design rather than a pragmatic workaround here; I'm fairly new to mocking frameworks.
In this sort of situation, I would use physical files for testing the component and not rely on a mocking framework. As fge mentions, it may be easier, plus you don't have to worry about any incorrect assumptions you may make in your mock.
For instance, if you rely upon File#listFiles(), you might have your mock return a fixed list of Files; however, the order they are returned in is not guaranteed, a fact you may only discover when you run your code on a different platform.
I would consider using JUnit's TemporaryFolder rule to help you set up the file and directory structure you need for your test, e.g.:
public class DummyFileClassTest {
@Rule
public TemporaryFolder folder = new TemporaryFolder();
@Test
public void someMethod() {
// given
final File file1 = folder.newFile("myfile1.txt");
final File file2 = folder.newFile("myfile2.txt");
// ... etc ...
}
}
The rule should clean up any created files and directories when the test completes.