BigQueryIO: Query configured via options, but "Value only available at runtime" - java

Apache Beam 2.9.0
I have set up a pipeline that pulls data from BigQuery and does a series of transforms on it. The options have a start date attached to them using a ValueProvider:
ValueProvider<String> getStartTime();
void setStartTime(ValueProvider<String> startTime);
I then go to pull the data with BigQueryIO (changing things around a bit for the sake of making it explicit what is going on):
BigQueryIO.read(
        (SerializableFunction<SchemaAndRecord, AggregatedRowRecord>)
            input -> new BigQueryParser().apply(input.getRecord()))
    .withoutValidation()
    .withTemplateCompatibility()
    .fromQuery(
        ValueProvider.NestedValueProvider.of(
            opts.getStartTime(),
            (SerializableFunction<String, String>)
                input -> {
                    Instant instant = Instant.parse(input);
                    return String.format(
                        <large SQL statement with a %s in it>,
                        String.format(
                            "%d_%d_%d",
                            instant.get(ChronoField.YEAR),
                            instant.get(ChronoField.MONTH_OF_YEAR),
                            instant.get(ChronoField.DAY_OF_MONTH)));
                }))
    .withCoder(<coder for AggregatedRowRecords>)
    .usingStandardSql()
This is then added to a pipeline normally (p.apply(<above>)).
Now I run it:
--project=<project> \
--tempLocation=<directory> \
--stagingLocation=<directory> \
--network=dataflow \
--subnetwork=<subnetwork> \
--defaultWorkerLogLevel=DEBUG \
--appName=<name> \
--runner=DirectRunner
This causes the following error:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=startTime, default=null}
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
at <class>.main(<class>.java:<>)
Caused by: java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=startTime, default=null}
at org.apache.beam.sdk.options.ValueProvider$RuntimeValueProvider.get(ValueProvider.java:228)
at org.apache.beam.sdk.options.ValueProvider$NestedValueProvider.get(ValueProvider.java:131)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.createBasicQueryConfig(BigQueryQuerySource.java:230)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.dryRunQueryIfNeeded(BigQueryQuerySource.java:175)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:115)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:102)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead$2.processElement(BigQueryIO.java:783)
The use of NestedValueProvider comes from this example on setting up templates:
The user provides a substring for a BigQuery query, such as a specific date. The transform uses the substring to create the full query. Calling .get() returns the full query.
Removing the value provider logic doesn't seem to help, however. Removing the ValueProvider entirely from the fromQuery section works fine, but that defeats the purpose of being able to set it via options.

The exception explains the issue: Apache Beam first builds the pipeline graph and its classes, and only then starts running data through the pipeline. During that construction stage you cannot access runtime option values; they are just metadata for building the pipeline.
The way to overcome this is to create a ParDo function or PTransform that receives the options it needs as constructor parameters; it can then use them in its logic.
See the example below (my own use case, where I hit the same issue a few days ago).
The pipeline:
HistoryProcessingOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
        .as(HistoryProcessingOptions.class);
Pipeline pipeline = Pipeline.create(options);
pipeline.apply(SourceRead.of(options.getSourceBigQueryTable().get(),
        options.getSourceBigQueryDataset().get(),
        options.getSourceBigQueryProject().get(),
        options.getFromDate().get(),
        options.getToDate().get()
));
The transformer itself:
public class SourceRead extends PTransform<PBegin, PCollection<TableRow>> {

    private String sourceBigQueryTable;
    private String sourceBigQueryDataset;
    private String sourceBigQueryProject;
    private String formDate;
    private String toDate;
    private static Logger logger = LoggerFactory.getLogger(SourceRead.class);

    public SourceRead(String sourceBigQueryTable, String sourceBigQueryDataset, String sourceBigQueryProject, String formDate, String toDate) {
        this.sourceBigQueryTable = sourceBigQueryTable;
        this.sourceBigQueryDataset = sourceBigQueryDataset;
        this.sourceBigQueryProject = sourceBigQueryProject;
        this.formDate = formDate;
        this.toDate = toDate;
    }

    public static SourceRead of(String sourceBigQueryTable, String sourceBigQueryDataset, String sourceBigQueryProject, String yearToLoad, String dateToLoad) {
        return new SourceRead(sourceBigQueryTable, sourceBigQueryDataset, sourceBigQueryProject, yearToLoad, dateToLoad);
    }

    @Override
    public PCollection<TableRow> expand(PBegin input) {
        String query = "SELECT * FROM TABLE_DATE_RANGE([" + sourceBigQueryProject + ":" + sourceBigQueryDataset + "." + sourceBigQueryTable + "],"
                + "TIMESTAMP('" + formDate + "'),"
                + "TIMESTAMP('" + toDate + "'))";
        logger.info("query is " + query);
        return input.apply(BigQueryIO.readTableRows()
                .fromQuery(query));
    }
}

Related

What are the solutions for findByN1QL in SDK3?

When migrating from Couchbase SDK 2 to SDK 3 certain document formats seem to have been removed.
How can this format or an alternative output be used in Couchbase SDK 3 to handle the below-indicated API change?
This is one of the sample classes that used findByN1QL in the existing system.
private List<Document> getBspReconciledAgentTransactionDataList(
BspReconciliationAgentTransactionLogicData transactionLogicData) {
final String bucketName = getBucketName(repository);
String query = getTransactionQueryStatement(transactionLogicData).toString();
query = query.split(N1qlQueryUtil.WHERE)[NumberConstants.ONE];
query = N1qlQueryUtil.selectOf(N1qlQueryUtil.metaOf(bucketName, "id", "_ID"),
N1qlQueryUtil.metaOf(bucketName, "cas", "_CAS"), N1QlQueryConstants.COUNTRY_NAME,
N1QlQueryConstants.COUNTRYCODE, N1QlQueryConstants.AIRLINECODE, N1QlQueryConstants.TRANSACTIONTYPE,
N1QlQueryConstants.SUBMISSIONSTATUS, N1QlQueryConstants.RECONCILIATIONSTATUS,
N1QlQueryConstants.IATA_CODE_CONST, N1QlQueryConstants.AGENT_CODE_CONST,
N1QlQueryConstants.TRANSACTION_DATE_CONST, N1QlQueryConstants.DPC_PROCESSING_DATE,
N1QlQueryConstants.E_TICKET_NO, N1QlQueryConstants.ORDER_ID, N1QlQueryConstants.PASSENGER_NAME,
N1QlQueryConstants.CURRENCY, N1QlQueryConstants.DEBIT_AMOUNT_POSTED,
N1QlQueryConstants.CREDIT_AMOUNT_POSTED, N1QlQueryConstants.DEBIT_AMOUNT_FROM_HOT_FILE,
N1QlQueryConstants.CREDIT_AMOUNT_FROM_HOT_FILE, N1QlQueryConstants.ACMADM_REF_ID,
N1QlQueryConstants.DOCUMENT_ID) +
N1qlQueryUtil.fromOf(bucketName)
+ N1qlQueryUtil.whereOf(N1QlQueryConstants.CLASS_DETAIL_DOCUMENT + query);
return getCouchbaseOperations(repository).findByN1QL(couchbaseConfiguration.cluster().query(query),
BSPReconciledDetailDocument.class);
}
private <T, I extends Serializable> CouchbaseOperations getCouchbaseOperations(
CouchbaseRepository<T, I> repository) {
return repository.getOperations();
}
The error shows on the line that calls findByN1QL:
return getCouchbaseOperations(repository).findByN1QL(couchbaseConfiguration.cluster().query(query),
BSPReconciledDetailDocument.class);
What would be the best possible options?
I have a possible solution for findByN1QL that uses the Cluster:
// return getCouchbaseOperations(repository).findByN1QL(N1qlQuery.simple(query),
// BSPReconciledDetailDocument.class);
return cluster.query(query).rowsAs(BSPReconciledDetailDocument.class);
With this approach CouchbaseRepository is no longer used; the Cluster is used directly:
@Autowired
private Cluster cluster;
private List<Document> getBspReconciledAgentTransactionDataList(
BspReconciliationAgentTransactionLogicData transactionLogicData) {
final String bucketName = getBucketName(repository);
String query = getTransactionQueryStatement(transactionLogicData).toString();
query = query.split(N1qlQueryUtil.WHERE)[NumberConstants.ONE];
query = N1qlQueryUtil.selectOf(N1qlQueryUtil.metaOf(bucketName, "id", "_ID"),
N1qlQueryUtil.metaOf(bucketName, "cas", "_CAS"), N1QlQueryConstants.COUNTRY_NAME,
N1QlQueryConstants.COUNTRYCODE, N1QlQueryConstants.AIRLINECODE, N1QlQueryConstants.TRANSACTIONTYPE,
N1QlQueryConstants.SUBMISSIONSTATUS, N1QlQueryConstants.RECONCILIATIONSTATUS,
N1QlQueryConstants.IATA_CODE_CONST, N1QlQueryConstants.AGENT_CODE_CONST,
N1QlQueryConstants.TRANSACTION_DATE_CONST, N1QlQueryConstants.DPC_PROCESSING_DATE,
N1QlQueryConstants.E_TICKET_NO, N1QlQueryConstants.ORDER_ID, N1QlQueryConstants.PASSENGER_NAME,
N1QlQueryConstants.CURRENCY, N1QlQueryConstants.DEBIT_AMOUNT_POSTED,
N1QlQueryConstants.CREDIT_AMOUNT_POSTED, N1QlQueryConstants.DEBIT_AMOUNT_FROM_HOT_FILE,
N1QlQueryConstants.CREDIT_AMOUNT_FROM_HOT_FILE, N1QlQueryConstants.ACMADM_REF_ID,
N1QlQueryConstants.DOCUMENT_ID) +
N1qlQueryUtil.fromOf(bucketName)
+ N1qlQueryUtil.whereOf(N1QlQueryConstants.CLASS_DETAIL_DOCUMENT + query);
return cluster.query(query).rowsAs(BSPReconciledDetailDocument.class);
}
Does it work or not? Does anyone have any idea about it?
You can use a SpEL expression in an @Query method on a repository interface, as shown below (assuming the N1QL statement returns Persons).
#Query("#{[0]}")
List<Person> myN1ql(String n1qlString);
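Applied to the use case above, a repository sketch could look roughly like the following (the interface name, document type, and method name are assumptions, not from the original code); the whole statement is supplied by the caller through the SpEL placeholder #{[0]}:
public interface ReconciledDetailRepository extends CouchbaseRepository<BSPReconciledDetailDocument, String> {

    // the entire N1QL statement is passed in as the first (and only) argument
    @Query("#{[0]}")
    List<BSPReconciledDetailDocument> findByDynamicN1ql(String n1qlStatement);
}
// usage from the service, in place of the old findByN1QL call:
// List<BSPReconciledDetailDocument> docs = reconciledDetailRepository.findByDynamicN1ql(query);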

lambda SNSEvent with simple Json

I am new to Lambda and trying to create a function where I can consume both an SNSEvent and simple JSON as the payload in my requestHandler method. How can I do that in Java? Should I take my input as the Object type?
public class LogEvent implements RequestHandler<SNSEvent, Object> {
public Object handleRequest(SNSEvent request, Context context){
String timeStamp = new SimpleDateFormat("yyyy-MM-dd_HH:mm:ss").format(Calendar.getInstance().getTime());
context.getLogger().log("Invocation started: " + timeStamp);
context.getLogger().log(request.getRecords().get(0).getSNS().getMessage());
timeStamp = new SimpleDateFormat("yyyy-MM-dd_HH:mm:ss").format(Calendar.getInstance().getTime());
context.getLogger().log("Invocation completed: " + timeStamp);
return null;
}
}
This works fine, but what if I want the flexibility of passing simple JSON like the following
{
"req": "test"
}
from the AWS console Lambda test section, to manually trigger a few tests without sending an actual SNSEvent object? How should I modify my code?
Note: the code and test above are not exactly what I have, but any suggestions on the given code will be helpful.
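One possible direction, sketched below under the assumption that taking the payload as a raw Map is acceptable (this is a sketch, not code from the question): accept the event as a Map and branch on whether it looks like an SNS envelope, which carries a Records array with an Sns.Message field.
import java.util.List;
import java.util.Map;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class LogEvent implements RequestHandler<Map<String, Object>, Object> {

    @Override
    @SuppressWarnings("unchecked")
    public Object handleRequest(Map<String, Object> request, Context context) {
        if (request.containsKey("Records")) {
            // looks like an SNS envelope: Records[0].Sns.Message
            List<Map<String, Object>> records = (List<Map<String, Object>>) request.get("Records");
            Map<String, Object> sns = (Map<String, Object>) records.get(0).get("Sns");
            context.getLogger().log("SNS message: " + sns.get("Message"));
        } else {
            // plain JSON from the console test section, e.g. {"req": "test"}
            context.getLogger().log("Test payload: " + request.get("req"));
        }
        return null;
    }
}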

Use "Selector modules" with DataMovement SDK MarkLogic [Java] [MarkLogic] [dmsdk] [data-movement-sdk][ml-java-api]

I'm using the Data Movement SDK from the MarkLogic Java API to transform several documents. So far I can transform documents by using a query batcher and a transform, but I'm only able to select URIs through StructuredQuery objects.
My question is: how can I use a selector module from my database instead of defining the selection in my Java application?
Update:
So far I have code that looks up document URIs and applies a transform to them. I want to replace that query batcher's query with a selector module instead of looking for all documents in a directory.
public TransformExecutionResults applyTransformByModule(String transformName, String filterText, int batchSize, int threadCount, String selectorModuleName, Map<String,String> parameters ) {
final ConcurrentHashMap<String, TransformExecutionResults> transformResult = new ConcurrentHashMap<>();
try {
// Specify a server-side transformation module (stored procedure) by name
ServerTransform transform = new ServerTransform(transformName);
ApplyTransformListener transformListener = new ApplyTransformListener().withTransform(transform).withApplyResult(ApplyResult.REPLACE) // Transform in-place, i.e. rewrite
.onSuccess(batch -> {
transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Success);
System.out.println("Transformation " + transformName + " executed succesfully.");
}).onSkipped(batch -> {
System.out.println("Transformation " + transformName + " skipped succesfully.");
transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Skipped);
}).onFailure((batchListener, throwable) -> {
System.err.println("Transformation " + transformName + " executed with errors.");
transformResult.compute(transformName, (k, v) -> TransformExecutionResults.Failed); // failed
});
// Apply the transformation to only the documents that match a query.
QueryManager qm = DbClient.newQueryManager();
StructuredQueryBuilder sqb = qm.newStructuredQueryBuilder();
// instead of this StruturedQueryDefinition, I want to use a module to get all URIS
StructuredQueryDefinition queryBySubdirectory = sqb.directory(true, "/temp/" + filterText + "/");
final QueryBatcher batcher = DMManager.newQueryBatcher(queryBySubdirectory);
batcher.withBatchSize(batchSize);
batcher.withThreadCount(threadCount);
batcher.withConsistentSnapshot();
batcher.onUrisReady(transformListener).onQueryFailure(exception -> {
exception.printStackTrace();
System.out.println("There was an error on Transform process.");
});
final JobTicket ticket = DMManager.startJob(batcher);
batcher.awaitCompletion();
DMManager.stopJob(ticket);
} catch (Exception fault) {
transformResult.compute(transformName, (k, v) -> TransformExecutionResults.GeneralException); // general exception
}
return transformResult.get(transformName);
}
If the job is small enough, you can implement the document rewriting within your e-node code, either by making a call to a resource service extension:
http://docs.marklogic.com/guide/java/resourceservices#id_27702
http://docs.marklogic.com/javadoc/client/com/marklogic/client/extensions/ResourceServices.html
or by invoking a main module:
http://docs.marklogic.com/guide/java/resourceservices#id_84134
If the job is too long to fit in a single transaction, you can create a QueryBatcher with a document URI iterator instead of with a query. See:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/DataMovementManager.html#newQueryBatcher-java.util.Iterator-
For some examples illustrating the approach, see the second half of the second example in the class description for QueryBatcher:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/QueryBatcher.html
as well as the second half of this example:
http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/UrisToWriterListener.html
In your case, you could implement an Iterator that calls a resource service extension or invokes a main module to get and return the URIs (preferably with read-ahead), blocking when necessary.
By returning the URIs to the client, it's also easy to log them for a later audit.
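For instance, a rough sketch of that iterator-driven variant, reusing the transformListener, DMManager, batchSize and threadCount from the question (fetchUrisFromSelectorModule is a hypothetical helper that calls your resource extension or invoked module and returns the URIs):
Iterator<String> uris = fetchUrisFromSelectorModule(); // hypothetical: runs the server-side selector module
QueryBatcher batcher = DMManager.newQueryBatcher(uris)
        .withBatchSize(batchSize)
        .withThreadCount(threadCount);
batcher.onUrisReady(transformListener)
        .onQueryFailure(exception -> exception.printStackTrace());
JobTicket ticket = DMManager.startJob(batcher);
batcher.awaitCompletion();
DMManager.stopJob(ticket);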
Hoping that helps,

Mockito Unit tests - Timestamps are different

Having some issues with a Mockito test.
I currently get this error:
Argument(s) are different! Wanted:
repository.save(
uk.co.withersoft.docservice.repositories.hibernate.MetaDataEntity@3e437e6c
);
-> at uk.co.withersoft.docservice.datastore.impl.MetaDataStoreImplTest.storeClaimMetadata(MetaDataStoreImplTest.java:55)
Actual invocation has different arguments:
repository.save(
uk.co.withersoft.docservice.repositories.hibernate.MetaDataEntity@3e361ee2
);
I'm pretty sure it's because the timestamps within MetaDataEntity are different.
//This is what I should be getting
id = null
metaData = "{"caseReference":"CN00000001","claimReference":"LN00000001","rpsDocumentType":"REJ","documentTitle":"Claims LN00000001 (Claimant: Mr LOCAL HOST) REJ-Rejection letter"}"
batchId = 0
state = "Saved MetaData to DB"
lastUpdatedDate = {Timestamp#1517} "2018-07-25 18:39:21.993"
createdDate = {Timestamp#1518} "2018-07-25 18:39:21.993"
// This is actually what I get.
id = null
metaData = "{"caseReference":"CN00000001","claimReference":"LN00000001","rpsDocumentType":"REJ","documentTitle":"Claims LN00000001 (Claimant: Mr LOCAL HOST) REJ-Rejection letter"}"
batchId = 0
state = "Saved MetaData to DB"
lastUpdatedDate = {Timestamp#1530} "2018-07-25 18:39:49.274"
createdDate = {Timestamp#1531} "2018-07-25 18:39:52.716"
Here is my test case:
@Test
public void storeClaimMetadata () throws JsonProcessingException {
ClaimMetaData metaData = constructMetaData();
MetaDataEntity mockResponseMetaDataEntity = new MetaDataEntity();
mockResponseMetaDataEntity.setId(1);
when(repository.save(any(MetaDataEntity.class))).thenReturn(mockResponseMetaDataEntity);
Integer result = testSubject.storeClaimMetadata(metaData);
assertEquals(Integer.valueOf(1), result);
final ObjectMapper mapper = new ObjectMapper();
String jsonMetaData = mapper.writeValueAsString(metaData);
MetaDataEntity expectedMetaDataEntity = new MetaDataEntity(null,
jsonMetaData,
0,
"Saved MetaData to DB",
new Timestamp(System.currentTimeMillis()),
new Timestamp(System.currentTimeMillis()));
Mockito.verify(repository, times(1)).save(expectedMetaDataEntity);
}
//Creates a ClaimMetaData
private ClaimMetaData constructMetaData() {
final ClaimMetaData metaData = new ClaimMetaData("CN00000001",
"LN00000001",
"REJ",
"Claims LN00000001 (Claimant: Mr LOCAL HOST) REJ-Rejection letter");
return metaData;
}
Any help would be much appreciated. This has been driving me crazy!!
This is exactly why people use dependency injection: so they can specify test collaborators that give back predictable results. Replace the hardcoded new Timestamp(System.currentTimeMillis()) calls with Timestamp.from(Instant.now(clock)).
java.time.Clock is an abstraction you can use to get your timestamp values. The real implementation can be injected into the code being tested, using one of the factory methods that returns a system clock, like this (using Spring Java configuration):
@Bean
public Clock clock() {
return Clock.systemDefaultZone();
}
and for the test code you can have an implementation where you specify the time you want the clock to return:
@Before
public void setUp() {
    // 'date' here is the fixed java.util.Date you want the clock to report
    clock = Clock.fixed(date.toInstant(), ZoneId.of("America/New_York"));
    systemUnderTest.setClock(clock);
}
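In the class under test, the injected clock then replaces the direct System.currentTimeMillis() calls; a minimal sketch, assuming the service exposes a setter for the Clock (field and method names are illustrative):
private Clock clock = Clock.systemDefaultZone(); // sensible default for production

public void setClock(Clock clock) {
    this.clock = clock;
}

private Timestamp now() {
    // every timestamp the service creates goes through the injected clock
    return Timestamp.from(Instant.now(clock));
}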
This is "works as designed".
You are invoking a service that computes timestamps. Like, now.
Then you have a test case that has some setup going on, and fetches time stamps, too. Now.
Guess what: albeit these two "nows above are close to each other, there is still a bit of delay between them.
You are checking for equality, can only work when the time stamps are identical! But they aren't, because they are created one after the other, with very well noticeable delays in between!
Meaning: you need to look how you could control which timestamps are created within your application, like saying "the timestamps should be t1 and t2". So that your test can then check "I found t1 and t2".
Alternatively, you simply change your verification step: instead of trying to have "equal" objects (that can't be equal because different time stamps!), you could compare those parts that should be equal, and for the time stamps, you could check that they are "close enough".
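For that second option, the verification could look roughly like this (a sketch only; the getter names on MetaDataEntity are assumptions):
ArgumentCaptor<MetaDataEntity> captor = ArgumentCaptor.forClass(MetaDataEntity.class);
Mockito.verify(repository, times(1)).save(captor.capture());
MetaDataEntity saved = captor.getValue();

// fields that must match exactly
assertEquals(jsonMetaData, saved.getMetaData());
assertEquals("Saved MetaData to DB", saved.getState());

// the timestamps only need to be "close enough", e.g. created within the last five seconds
long ageMillis = System.currentTimeMillis() - saved.getCreatedDate().getTime();
assertTrue(ageMillis >= 0 && ageMillis < 5000);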
In the production code, instead of using new Timestamp(System.currentTimeMillis()), you can use new Timestamp(DateTimeUtils.currentTimeMillis()). Here DateTimeUtils comes from Joda-Time.
In the test cases, you can then do the following:
private SimpleDateFormat DATE_FORMATTER = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss:SSS");
@Before
public void before() throws Exception {
// define a fixed date-time
Date fixedDateTime = DATE_FORMATTER.parse("01/07/2016 16:45:00:000");
DateTimeUtils.setCurrentMillisFixed(fixedDateTime.getTime());
}
@After
public void after() throws Exception {
// Make sure to cleanup afterwards
DateTimeUtils.setCurrentMillisSystem();
}

Call transactional method Play Java JPA Hibernate

I have two databases: one is MySQL and the other is PostgreSQL.
I am trying to get PostgreSQL data from inside a MySQL transactional method.
#Transactional(value = "pg")
public List<String> getSubordinate(){
Query q1 = JPA.em().createNativeQuery("select vrs.subordinate_number, vrs.superior_number\n" +
"from view_reporting_structure vrs\n" +
"where vrs.superior_number = :personel_number");
q1.setParameter("personel_number","524261");
List<String> me = q1.getResultList();
return me;
}
}
from another method
@Transactional
public Result getOpenRequestList(){
Subordinate subordinate = new Subordinate();
List<String> subordinateData = subordinate.getSubordinate();
....
}
I get this error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'db_hcm.view_reporting_structure' doesn't exist
So my PostgreSQL query is being executed as a MySQL transaction, and the view does not exist in the MySQL database. How do I get data from a different persistence unit within one method?
I have never done this with different databases, but I think the following may work.
For example, you have the following data source definition in application.conf:
# MySql
db.mysql.driver=com.mysql.jdbc.Driver
... the rest of setting for db.mysql
# PostgreSQL
db.postgre.driver=org.postgresql.Driver
... the rest of setting for db.postgre
Instead of using the @Transactional annotation, manage the transaction explicitly and use the JPA withTransaction API:
private static final String MYSQL_DB = "mysql";
private static final String POSTGRE_DB = "postgre";
public List<String> getSubordinate() {
    return JPA.withTransaction(MYSQL_DB, true /* this is the read-only flag */,
        () -> {
            Query q1 = JPA.em().createNativeQuery("select vrs.subordinate_number, vrs.superior_number\n" +
                "from view_reporting_structure vrs\n" +
                "where vrs.superior_number = :personel_number");
            q1.setParameter("personel_number", "524261");
            List<String> me = q1.getResultList();
            return me;
        });
}

public Result getOpenRequestList() {
    return JPA.withTransaction(POSTGRE_DB, true /* this is the read-only flag */,
        () -> {
            Subordinate subordinate = new Subordinate();
            List<String> subordinateData = subordinate.getSubordinate();
            ....
        });
}
Note: I prefer to always use withTransaction, since it allows better control over the unhappy flow. You should wrap the call in a try-catch: if JPA throws a runtime exception on commit, you can then do proper error handling. With the @Transactional annotation, the commit takes place after the controller has finished, and you cannot handle the error.
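A rough sketch of that wrapping, mirroring the calls above (the exact withTransaction overload and the error handling shown here vary with the Play version, so treat it as illustrative only):
public Result getOpenRequestList() {
    try {
        List<String> subordinateData = JPA.withTransaction(POSTGRE_DB, true /* read-only */,
                () -> new Subordinate().getSubordinate());
        return ok(Json.toJson(subordinateData));
    } catch (RuntimeException e) {
        // the commit (or the query itself) failed; report it instead of letting it propagate
        return internalServerError("Could not load open requests: " + e.getMessage());
    }
}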
