How do I iterate over a MongoDB Change Stream in Spring Boot? - java

I have read countless articles and code examples on MongoDB Change Streams, but I still can't manage to set one up properly. I'm trying to listen to a specific collection in my MongoDB, and whenever a document is inserted, updated or deleted, I want to do something.
This is what I've tried:
@Data
@Document(collection = "teams")
public class Teams {
    @MongoId(FieldType.OBJECT_ID)
    private ObjectId id;
    private Integer teamId;
    private String name;
    private String description;
}
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.changestream.FullDocument;
import com.mongodb.client.ChangeStreamIterable;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.Arrays;
import java.util.List;
public class MongoDBChangeStream {
    // connect to the local database server
    MongoClient mongoClient = MongoClients.create("db uri goes here");
    // Select the MongoDB database
    MongoDatabase database = mongoClient.getDatabase("MyDatabase");
    // Select the collection to query
    MongoCollection<Document> collection = database.getCollection("teams");
    // Create pipeline for operationType filter
    List<Bson> pipeline = Arrays.asList(
            Aggregates.match(
                    Filters.in("operationType",
                            Arrays.asList("insert", "update", "delete"))));
    // Create the Change Stream
    ChangeStreamIterable<Document> changeStream = collection.watch(pipeline)
            .fullDocument(FullDocument.UPDATE_LOOKUP);
    // Iterate over the Change Stream
    for (Document changeEvent : changeStream) {
        // Process the change event here
    }
}
So this is what I have so far, and everything is good until the for-loop, which gives three errors:
There is a red line under 'for (', which says unexpected token.
There is a red line under ' :', which says ';' expected.
There is a red line under 'changeStream)', which says unknown class: 'changeStream'.

First of all, you should put your code inside a class method, not the class body. Second, the element type when iterating a ChangeStreamIterable<Document> is ChangeStreamDocument<Document>, not Document.
Summing things up:
import com.mongodb.client.model.changestream.ChangeStreamDocument;

public class MongoDBChangeStream {

    public void someMethod() {
        // connect to the local database server
        MongoClient mongoClient = MongoClients.create("db uri goes here");
        // Select the MongoDB database
        MongoDatabase database = mongoClient.getDatabase("MyDatabase");
        // Select the collection to query
        MongoCollection<Document> collection = database.getCollection("teams");
        // Create pipeline for operationType filter
        List<Bson> pipeline = Arrays.asList(
                Aggregates.match(
                        Filters.in(
                                "operationType",
                                Arrays.asList("insert", "update", "delete")
                        )));
        // Create the Change Stream
        ChangeStreamIterable<Document> changeStream = collection.watch(pipeline)
                .fullDocument(FullDocument.UPDATE_LOOKUP);
        // Iterate over the Change Stream
        for (ChangeStreamDocument<Document> changeEvent : changeStream) {
            // Process the change event here
        }
    }
}
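Since the question is tagged Spring Boot, here is a minimal sketch of one way to kick off that blocking watch loop when the application starts, on its own thread so it doesn't hold up startup. This is only an illustration, not part of the answer above: the class name is made up, and it assumes spring-boot-starter-data-mongodb is on the classpath so Spring Boot auto-configures a MongoClient bean that can be injected.

import com.mongodb.client.ChangeStreamIterable;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import com.mongodb.client.model.changestream.FullDocument;
import org.bson.Document;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class TeamsChangeStreamRunner implements CommandLineRunner {

    // Assumption: auto-configured by Spring Boot via spring-boot-starter-data-mongodb
    private final MongoClient mongoClient;

    public TeamsChangeStreamRunner(MongoClient mongoClient) {
        this.mongoClient = mongoClient;
    }

    @Override
    public void run(String... args) {
        // Run the blocking iteration on a daemon thread so it does not block application startup
        Thread watcher = new Thread(() -> {
            MongoCollection<Document> collection =
                    mongoClient.getDatabase("MyDatabase").getCollection("teams");
            ChangeStreamIterable<Document> changeStream =
                    collection.watch().fullDocument(FullDocument.UPDATE_LOOKUP);
            for (ChangeStreamDocument<Document> changeEvent : changeStream) {
                // Process the change event here
                System.out.println(changeEvent.getOperationType() + ": " + changeEvent.getFullDocument());
            }
        }, "teams-change-stream");
        watcher.setDaemon(true);
        watcher.start();
    }
}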

Related

Not able to process kafka json message with Flink siddhi library

I am trying to create a simple application that consumes a Kafka message, does some CQL transform, and publishes back to Kafka. Below is the code:
JAVA: 1.8
Flink: 1.13
Scala: 2.11
flink-siddhi: 2.11-0.2.2-SNAPSHOT
I am using library: https://github.com/haoch/flink-siddhi
input json to Kafka:
{
"awsS3":{
"ResourceType":"aws.S3",
"Details":{
"Name":"crossplane-test",
"CreationDate":"2020-08-17T11:28:05+00:00"
},
"AccessBlock":{
"PublicAccessBlockConfiguration":{
"BlockPublicAcls":true,
"IgnorePublicAcls":true,
"BlockPublicPolicy":true,
"RestrictPublicBuckets":true
}
},
"Location":{
"LocationConstraint":"us-west-2"
}
}
}
main class:
public class S3SidhiApp {
public static void main(String[] args) {
internalStreamSiddhiApp.start();
//kafkaStreamApp.start();
}
}
App class:
package flinksidhi.app;
import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;
import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;
public class internalStreamSiddhiApp {
private static final String inputTopic = "EVENT_STREAM_INPUT";
private static final String outputTopic = "EVENT_STREAM_OUTPUT";
private static final String consumerGroup = "EVENT_STREAM1";
private static final String kafkaAddress = "localhost:9092";
private static final String zkAddress = "localhost:2181";
private static final String S3_CQL1 = "from inputStream select * insert into temp";
private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
"from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
"json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
"from temp2 select affected_resource_name,affected_resource_type, " +
"ifThenElse(encryption == ' ','Fail','Pass') as state," +
"ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";
public static void start(){
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//DataStream<String> inputS = env.addSource(new S3EventSource());
//Flink kafka stream consumer
FlinkKafkaConsumer<String> flinkKafkaConsumer =
createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);
//Add Data stream source -- flink consumer
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);
cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
cep.registerStream("inputStream", inputS, "awsS3");
inputS.print();
System.out.println(cep.getDataStreamSchemas());
//json needs extension jars to present during runtime.
DataStream<Map<String,Object>> output = cep
.from("inputStream")
.cql(S3_CQL1)
.returnAsMap("temp");
//Flink kafka stream Producer
FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
createMapProducer(env,outputTopic, kafkaAddress);
//Add Data stream sink -- flink producer
output.addSink(flinkKafkaProducer);
output.print();
try {
env.execute();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Consumer class:
package flinksidhi.app.connector;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;
import java.util.Properties;
public class Consumers {
public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", kafkaAddress);
properties.setProperty("zookeeper.connect", zookeeprAddr);
properties.setProperty("group.id",kafkaGroup);
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
topic,new SimpleStringSchema(),properties);
return consumer;
}
}
Producer class:
package flinksidhi.app.connector;
import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;
import java.util.Map;
public class Producer {
public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
}
public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
@Override
public void open(InitializationContext context) throws Exception {
}
@Override
public byte[] serialize(Map<String, Object> stringObjectMap) {
String json = ConvertJavaMapToJson.convert(stringObjectMap);
return json.getBytes();
}
});
}
}
I have tried many things, but the code where the CQL is invoked is never called, and no error is raised either, so I'm not sure where it is going wrong.
If I instead create an internal stream source and feed it the same input JSON, returning the result as a string, it works.
Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:
(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
If this doesn't help, I would suggest to decompose/simplify the job to a bare minimum, for example just a source operator and some naive sink printing/logging elements. And if that works, start adding back operators one by one. You could also start by simplifying your CEP pattern as much as possible.
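To make the watermark suggestion concrete, here is a minimal sketch of attaching a watermark strategy to the Kafka-backed stream with the Flink 1.13 API. The timestamp assigner below is a placeholder assumption; the real event time would normally be parsed out of the JSON payload.

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

// Inside start(), right after the Kafka source is added:
DataStream<String> inputWithWatermarks = inputS.assignTimestampsAndWatermarks(
        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((record, previousTimestamp) -> System.currentTimeMillis()));
// ...then register inputWithWatermarks (instead of inputS) with the Siddhi environment.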
First of all, thanks a lot @Piotr Nowojski. It was exactly your small pointer about event time, which I had pondered over many times without it clicking, that got me there. While debugging the two cases:
With the internal data source, which processed successfully, I saw while stepping through the flow that a watermark was processed after the data, but I didn't catch that it was implicitly managing the event time of the data.
With Kafka as the data source, I could see very clearly while debugging that no watermark was being processed in the flow, but it did not occur to me that this was because event time and watermarks were not handled properly.
Adding a single line to the application code, which I understood from the Flink Javadoc snippet below, solved it:
* @deprecated In Flink 1.12 the default stream time characteristic has been changed to {@link
* TimeCharacteristic#EventTime}, thus you don't need to call this method for enabling
* event-time support anymore. Explicitly using processing-time windows and timers works in
* event-time mode. If you need to disable watermarks, please use {@link
* ExecutionConfig#setAutoWatermarkInterval(long)}. If you are using {@link
* TimeCharacteristic#IngestionTime}, please manually set an appropriate {@link
* WatermarkStrategy}. If you are using generic "time window" operations (for example {@link
* org.apache.flink.streaming.api.datastream.KeyedStream#timeWindow(org.apache.flink.streaming.api.windowing.time.Time)}
* that change behaviour based on the time characteristic, please use equivalent operations
* that explicitly specify processing time or event time.
*/
I learned that by default Flink works with event time, which requires watermarks to be handled properly, which I had not done. So I added the line below to set the time characteristic of the Flink execution environment:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
and kaboom ... it started working. This call is deprecated and the proper fix needs some other configuration, but thanks a lot, it was a great pointer, it helped me a lot, and I solved the issue.
Thanks again @Piotr Nowojski
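For reference, the non-deprecated route the quoted Javadoc points to is to keep the default time characteristic and disable automatic watermark generation via the execution config. A minimal sketch, assuming processing-time semantics are really all that is needed here:

// Deprecated approach that worked in this case:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

// Non-deprecated alternative suggested by the Javadoc: disable periodic watermark emission
env.getConfig().setAutoWatermarkInterval(0);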

How to convert from Mongo Document to java Set<String>?

I've saved a Set<String> into MongoDB as an array and now I want to load it back into a Set<String>. How do I do this?
My attempt throws an exception:
package Database;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import java.util.HashSet;
import java.util.Set;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
public class StackOverflow {
public static void main(String[] args) {
// insert something to mongo:
final String URI = "mongodb://localhost:27017";
final String DB = "StackOverflowQuestion";
final String COLLECTION = "eqDoesntExcist";
MongoClientURI connection = new MongoClientURI(URI);
MongoClient mongo = new MongoClient(connection);
MongoDatabase database = mongo.getDatabase(DB);
MongoCollection<Document> collection = database.getCollection(COLLECTION);
Set<String> namesOfTroysKids = new HashSet<>();
namesOfTroysKids.add("Paul");
namesOfTroysKids.add("Jane");
namesOfTroysKids.add("Mark");
namesOfTroysKids.add("Ivona");
Document doc = new Document("name", "Troy").append("height", 185).append("kids", namesOfTroysKids);
collection.insertOne(doc);
// read something from mongo
FindIterable<Document> findIt = collection.find(eq("name", "Troy")).projection(fields(include("kids")));
Document d = findIt.first();
Set<String> kids = (Set<String>) d; // ERROR !!!
///Exception in thread "main" java.lang.ClassCastException: org.bson.Document cannot be cast to java.util.Set
//at Database.StackOverflow.main(StackOverflow.java:45)
}
}
There is a toArray() method, but it is for DBObject, which is deprecated.
The document returned by your query is:
{
"_id": {
"$oid": "_id_value_"
},
"kids": [
"Mark",
"Ivona",
"Paul",
"Jane"
]
}
That obviously cannot be implicitly coerced into a set. It's now just a matter of obtaining the kids from the document as a list and instantiating a Set from it:
public static void main(String [] args) throws Exception {
final String URI = "mongodb://localhost:27017";
final String DB = "StackOverflowQuestion";
final String COLLECTION = "eqDoesntExcist";
MongoClientURI connection = new MongoClientURI(URI);
MongoClient mongo = new MongoClient(connection);
MongoDatabase database = mongo.getDatabase(DB);
MongoCollection<Document> collection = database.getCollection(COLLECTION);
Set<String> namesOfTroysKids = new HashSet<>();
namesOfTroysKids.add("Paul");
namesOfTroysKids.add("Jane");
namesOfTroysKids.add("Mark");
namesOfTroysKids.add("Ivona");
Document doc = new Document("name", "Troy").append("height", 185).append("kids", namesOfTroysKids);
collection.insertOne(doc);
// read something from mongo
FindIterable<Document> findIt = collection.find(Filters.eq("name", "Troy")).projection(Projections.include("kids"));
Document d = findIt.first();
System.out.println("doc: " + d.toJson());
List<String> kidsList = (List<String>) d.get("kids", List.class);
Set<String> kidsSet = new HashSet<>(kidsList);
System.out.println("kids: " + kidsSet);
}
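As a side note, newer versions of the Java driver (3.10 or later, if I remember correctly) also offer Document.getList, which avoids the unchecked cast; a small sketch:

List<String> kidsList = d.getList("kids", String.class);
Set<String> kidsSet = new HashSet<>(kidsList);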
Basically, an Iterable type is something that is meant to be looped through. If you open a cursor using an Iterable type, the only way to assign it to a Java data type is to grab one iteration. You have done this by using the first() method, which grabs the first document returned by your query; note that if that is your intention, you can use sort to control which Document is returned.
Below is the code. When I do this, I generally process the data one document at a time, unless I need the entire dataset before I start my processing.
Set<String> kids = new HashSet<>();
for (Document kidDoc : collection.find(eq("name", "Troy"))) {
    kids.addAll((List<String>) kidDoc.get("kids", List.class));
}

Why eq doesn't exist for mongo-java-driver?

I've found the MongoDB tutorial for Java on how to query a collection, but the eq they use doesn't work for me! Do you know how to filter documents from a collection with MongoDB and Java?
This is my attempt:
package Database;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
public class StackOverflow {
public static void main(String[] args) {
// insert something to mongo:
final String URI = "mongodb://localhost:27017";
final String DB = "StackOverflowQuestion";
final String COLLECTION = "eqDoesntExcist";
MongoClientURI connection = new MongoClientURI(URI);
MongoClient mongo = new MongoClient(connection);
MongoDatabase database = mongo.getDatabase(DB);
MongoCollection<Document> collection = database.getCollection(COLLECTION);
Document doc = new Document("name", "Troy").append("height", 185);
collection.insertOne(doc);
doc = new Document("name", "Ann").append("height", 175);
collection.insertOne(doc);
// read something from mongo
FindIterable<Document> findIt = collection.find(eq("name", "Troy"));
// ERROR!!! the method eq(String, String) is undefined!
mongo.close();
}
}
I want something like:
SELECT * from eqDoesntExcist WHERE name = "Troy"
You can use an eq Filter there as:
Bson bsonFilter = Filters.eq("name", "Troy");
FindIterable<Document> findIt = collection.find(bsonFilter);
Or, to make it look the way the documentation suggests, add a static import for Filters.eq:
import static com.mongodb.client.model.Filters.eq;
and then use the same piece of code as yours:
FindIterable<Document> findIt = collection.find(eq("name", "Troy")); // static import is the key to such syntax
You cannot do this:
collection.find(eq("name", "Troy"));
because without a static import the compiler expects a method named eq in your class StackOverflow, and that is not what you need.
What you are looking for is defined in the Filters class:
public static <TItem> Bson eq(String fieldName, TItem value)
so it should be:
collection.find(Filters.eq("name", "Troy"));
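For completeness, a small sketch of putting that together and iterating the results (the field and value simply mirror the question):

Bson bsonFilter = Filters.eq("name", "Troy");
for (Document d : collection.find(bsonFilter)) {
    System.out.println(d.toJson());
}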

Google Cloud Dataflow issue with writing the data (TextIO or DatastoreIO)

OK, everyone. Another Dataflow question from a Dataflow newbie. (Just started playing with it this week..)
I'm creating a datapipe to take in a list of product names and generate autocomplete data. The data processing part is all working fine, it seems, but I'm missing something obvious because when I add my last ".apply" to use either DatastoreIO or TextIO to write the data out, I'm getting a syntax error in my IDE that says the following:
"The method apply(DatastoreV1.Write) is undefined for the type ParDo.SingleOutput>,Entity>"
If gives me an option add a cast to the method receiver, but that obviously isn't the answer. Do I need to do some other step before I try to write the data out? My last step before trying to write the data is a call to an Entity helper for Dataflow to change my Pipeline structure from > to , which seems to me like what I'd need to write to Datastore.
I got so frustrated with this thing the last few days, I even decided to write the data to some AVRO files instead so I could just load it in Datastore by hand. Imagine how ticked I was when I got all that done and got the exact same error in the exact same place on my call to TextIO. That is why I think I must be missing something very obvious here.
Here is my code. I included it all for reference, but you probably just need to look at the main[] at the bottom. Any input would be greatly appreciated! Thanks!
MrSimmonsSr
package com.client.autocomplete;
import com.client.autocomplete.AutocompleteOptions;
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.google.datastore.v1.Value;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import com.google.api.services.bigquery.model.TableRow;
import com.google.common.base.MoreObjects;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.extensions.jackson.ParseJsons;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.options.Validation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.List;
import java.util.ArrayList;
/*
* A simple Dataflow pipeline to create autocomplete data from a list of
* product names. It then loads that prefix data into Google Cloud Datastore for consumption by
* a Google Cloud Function. That function will take in a prefix and return a list of 10 product names
*
* Pseudo Code Steps
* 1. Load a list of product names from Cloud Storage
* 2. Generate prefixes for use with autocomplete, based on the product names
* 3. Merge the prefix data together with 10 products per prefix
* 4. Write that prefix data to the Cloud Datastore as a KV with a <String, List<String>> structure
*
*/
public class ClientAutocompletePipeline {
private static final Logger LOG = LoggerFactory.getLogger(ClientAutocompletePipeline.class);
/**
* A DoFn that keys each product name by all of its prefixes.
* This creates one row in the PCollection for each prefix<->product_name pair
*/
private static class AllPrefixes
extends DoFn<String, KV<String, String>> {
private final int minPrefix;
private final int maxPrefix;
public AllPrefixes(int minPrefix) {
this(minPrefix, 10);
}
public AllPrefixes(int minPrefix, int maxPrefix) {
this.minPrefix = minPrefix;
this.maxPrefix = maxPrefix;
}
@ProcessElement
public void processElement(ProcessContext c) {
String productName= c.element().toString();
for (int i = minPrefix; i <= Math.min(productName.length(), maxPrefix); i++) {
c.output(KV.of(productName.substring(0, i), c.element()));
}
}
}
/**
* Takes as input the top product names per prefix, and emits an entity
* suitable for writing to Cloud Datastore.
*
*/
static class FormatForDatastore extends DoFn<KV<String, List<String>>, Entity> {
private String kind;
private String ancestorKey;
public FormatForDatastore(String kind, String ancestorKey) {
this.kind = kind;
this.ancestorKey = ancestorKey;
}
@ProcessElement
public void processElement(ProcessContext c) {
// Initialize an EntityBuilder and get it a valid key
Entity.Builder entityBuilder = Entity.newBuilder();
Key key = makeKey(kind, ancestorKey).build();
entityBuilder.setKey(key);
// New HashMap to hold all the properties of the Entity
Map<String, Value> properties = new HashMap<>();
String prefix = c.element().getKey();
String productsString = "Products[";
// iterate through the product names and add each one to the productsString
for (String productName : c.element().getValue()) {
// products.add(productName);
productsString += productName + ", ";
}
productsString += "]";
properties.put("prefix", makeValue(prefix).build());
properties.put("products", makeValue(productsString).build());
entityBuilder.putAllProperties(properties);
c.output(entityBuilder.build());
}
}
/**
* Options supported by this class.
*
* <p>Inherits standard Beam example configuration options.
*/
public interface Options
extends AutocompleteOptions {
#Description("Input text file")
#Validation.Required
String getInputFile();
void setInputFile(String value);
#Description("Cloud Datastore entity kind")
#Default.String("prefix-product-map")
String getKind();
void setKind(String value);
#Description("Whether output to Cloud Datastore")
#Default.Boolean(true)
Boolean getOutputToDatastore();
void setOutputToDatastore(Boolean value);
#Description("Cloud Datastore ancestor key")
#Default.String("root")
String getDatastoreAncestorKey();
void setDatastoreAncestorKey(String value);
#Description("Cloud Datastore output project ID, defaults to project ID")
String getOutputProject();
void setOutputProject(String value);
}
public static void main(String[] args) throws IOException{
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
// create the pipeline
Pipeline p = Pipeline.create(options);
PCollection<String> toWrite = p
// A step to read in the product names from a text file on GCS
.apply(TextIO.read().from("gs://sample-product-data/clean_product_names.txt"))
// Next expand the product names into KV pairs with prefix as key (<KV<String, String>>)
.apply("Explode Prefixes", ParDo.of(new AllPrefixes(2)))
// Apply a GroupByKey transform to the PCollection "flatCollection" to create "productsGroupedByPrefix".
.apply(GroupByKey.<String, String>create())
// Now format the PCollection for writing into the Google Datastore
.apply("FormatForDatastore", ParDo.of(new FormatForDatastore(options.getKind(),
options.getDatastoreAncestorKey()))
// Write the processed data to the Google Cloud Datastore
// NOTE: This is the line that I'm getting the error on!!
.apply(DatastoreIO.v1().write().withProjectId(MoreObjects.firstNonNull(
options.getOutputProject(), options.getOutputProject()))));
// Run the pipeline.
PipelineResult result = p.run();
}
}
I think you need another closing parenthesis. I've removed some of the extraneous bits and reindented according to the parentheses:
PCollection<String> toWrite = p
    .apply(TextIO.read().from("..."))
    .apply("Explode Prefixes", ...)
    .apply(GroupByKey.<String, String>create())
    .apply("FormatForDatastore", ParDo.of(new FormatForDatastore(
            options.getKind(), options.getDatastoreAncestorKey()))
        .apply(...);
Specifically, you need another parenthesis to close the apply("FormatForDatastore", ...). Right now, it is trying to call ParDo.of(...).apply(...) which doesn't work.
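In other words, close the FormatForDatastore apply before chaining the Datastore write, roughly like this (a sketch only; note that the output of FormatForDatastore is a PCollection<Entity>, not a PCollection<String>, and the project ID resolution is kept simple here):

PCollection<Entity> entities = p
    .apply(TextIO.read().from("gs://sample-product-data/clean_product_names.txt"))
    .apply("Explode Prefixes", ParDo.of(new AllPrefixes(2)))
    .apply(GroupByKey.<String, String>create())
    .apply("FormatForDatastore", ParDo.of(new FormatForDatastore(
        options.getKind(), options.getDatastoreAncestorKey())));

entities.apply(DatastoreIO.v1().write().withProjectId(options.getOutputProject()));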

Java MongoDB Query Criteria (WHERE date > X and field = value) Ignores Second Clause

import java.util.Date;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
public class CustomQuery {
@Autowired private MongoOperations mongoOperations;
public void customQuery(Date submittalDate) {
List<Question> q1s = mongoOperations.find(
new Query(Criteria.where("category").is("New")),
Question.class);
List<Question> q2s = mongoOperations.find(
new Query(
Criteria.where("submittalDate").gt(submittalDate).and("category").is("New")
),
Question.class);
}
}
The top Spring Java MongoDB query gives back the expected results in q1s.
The bottom query should return a subset of the top query. Instead, records which match ("submittalDate").gt(submittalDate) are in the q2s results regardless of whether or not they are in the "New" category.
i.e. it is like and("category").is("New") from the second query is being ignored.
Using Mongodb version v2.0.6 32-bit with Spring Data.
Help appreciated.
Update 05/09/2012
Still doesn't work
Update 26/08/2012
This returns results on the Mongo command line:
db.foo.find( { "submittalDate":{ "$gte": ISODate("2012-07-31T23:00:00.000Z") }, "category" : "New" } )
In contrast, the Java code (for the same date parameter) doesn't work. For comparison, the query logged at DEBUG level from Java is:
[DEBUG] [http-8080-1] (MongoTemplate.java:doFind:1256) find using query:
{ "submittalDate" : { "$gte" : { "$date" : "2012-07-31T23:00:00.000Z"}} , "category" : "New"}
Yes, the log shows a date string, whereas to get the Mongo shell query working I needed to use ISODate(..).
But I'm using the MongoDB Java driver with the accepted type java.util.Date - how could the absence of ISODate(..) be the issue? The issue might have another cause.
I'm no Spring expert, but it seems like some of your imports may be conflicting with each other. It's difficult to diagnose exactly where you are going wrong given the documentation I've looked at. If you're not set on using the Spring framework for this, an alternative, more common approach would be the one below.
import java.util.Date;
import com.mongodb.Mongo;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import com.mongodb.DBCursor;
public class CustomQuery {
    public void customQuery(Date submittalDate) {
        // Build { "submittalDate": { "$gte": submittalDate }, "category": "New" }
        BasicDBObject document = new BasicDBObject("submittalDate", new BasicDBObject("$gte", submittalDate))
                .append("category", "New");
        // getDbCollection() is assumed to return the DBCollection being queried
        DBCursor cursor = getDbCollection().find(document);
    }
}
{ "$date" : "2012-07-31T23:00:00.000Z"}
is equivalent to
Date("2012-07-31T23:00:00.000Z")
and Date("2012-07-31T23:00:00.000Z") will return a string, not an ISODate().
via http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON .
I think this is a bug in org.springframework.data.mongodb.core.query.Criteria.
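If the Spring Data query keeps misbehaving on that version, one thing worth trying (an untested suggestion, not a confirmed fix) is building the two clauses explicitly with andOperator instead of chaining and():

// Combine both clauses explicitly in a single $and
Query query = new Query(new Criteria().andOperator(
        Criteria.where("submittalDate").gt(submittalDate),
        Criteria.where("category").is("New")));
List<Question> q2s = mongoOperations.find(query, Question.class);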
