Using a standalone MongoDB instance in version 4.4.1 with a Java client that connects using the latest driver (org.mongodb:mongodb-driver-sync:4.1.1), I am getting an error when calling findOneAndUpdate with the $setOnInsert operator.
Here is the query used:
final List<Bson> updates = new ArrayList<>();
updates.add(Updates.set("data", "test"));
updates.add(Updates.setOnInsert("firstSeenTime", new Date()));
final Document updatedDocument =
    this.visitorsCollection.findOneAndUpdate(
        eq("userId", "u1"),
        updates,
        new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER).upsert(true));
The error:
Exception in thread "main" com.mongodb.MongoCommandException: Command failed with error 40324 (Location40324): 'Unrecognized pipeline stage name: '$setOnInsert'' on server A.B.C.D:XXXXX. The full response is {"ok": 0.0, "errmsg": "Unrecognized pipeline stage name: '$setOnInsert'", "code": 40324, "codeName": "Location40324"}
    at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:175)
    at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:359)
    at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:280)
    at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)
    at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)
    at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)
    at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:255)
    at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:110)
    at com.mongodb.internal.operation.CommandOperationHelper$13.call(CommandOperationHelper.java:712)
    at com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:620)
    at com.mongodb.internal.operation.CommandOperationHelper.executeRetryableCommand(CommandOperationHelper.java:705)
    at com.mongodb.internal.operation.CommandOperationHelper.executeRetryableCommand(CommandOperationHelper.java:697)
    at com.mongodb.internal.operation.BaseFindAndModifyOperation.execute(BaseFindAndModifyOperation.java:69)
    at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:195)
    at com.mongodb.client.internal.MongoCollectionImpl.executeFindOneAndUpdate(MongoCollectionImpl.java:785)
    at com.mongodb.client.internal.MongoCollectionImpl.findOneAndUpdate(MongoCollectionImpl.java:765)
If I remove the Updates.setOnInsert(...) call, the update works, but not the way I want: my goal is to set some fields depending on whether the document being updated already exists. According to the documentation, $setOnInsert should be supported:
https://docs.mongodb.com/manual/reference/operator/update/#id1
Any idea what is wrong?
The problem here is that findOneAndUpdate comes in two forms. The second argument can be either:
a document containing update operator expressions
an array containing $set, $unset, and $replaceRoot aggregation stages
Since you are passing updates as an ArrayList, findOneAndUpdate tries to process it as an aggregation pipeline, which does not recognize a $setOnInsert stage.
You need to build updates as a single document for the update operators to be recognized. Following your example, you can simply wrap the list with Updates.combine(updates) and pass that to findOneAndUpdate as the second parameter.
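A minimal sketch based on the code in the question (same collection, filters, and options as above); the only change is that the list is combined into a single Bson update before being passed to findOneAndUpdate:
final List<Bson> updates = new ArrayList<>();
updates.add(Updates.set("data", "test"));
updates.add(Updates.setOnInsert("firstSeenTime", new Date()));

final Document updatedDocument =
    this.visitorsCollection.findOneAndUpdate(
        eq("userId", "u1"),
        // Updates.combine(...) produces one Bson document of update operators,
        // so the driver no longer interprets the update as an aggregation pipeline.
        Updates.combine(updates),
        new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER).upsert(true));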
I'm trying to create a pipeline that streams data from a Kafka topic to Google's BigQuery. The data in the topic is in Avro.
I call the apply function three times: once to read from Kafka, once to extract the record, and once to write to BigQuery. Here is the main part of the code:
pipeline
.apply("Read from Kafka",
KafkaIO
.<byte[], GenericRecord>read()
.withBootstrapServers(options.getKafkaBrokers().get())
.withTopics(Utils.getListFromString(options.getKafkaTopics()))
.withKeyDeserializer(
ConfluentSchemaRegistryDeserializerProvider.of(
options.getSchemaRegistryUrl().get(),
options.getSubject().get())
)
.withValueDeserializer(
ConfluentSchemaRegistryDeserializerProvider.of(
options.getSchemaRegistryUrl().get(),
options.getSubject().get()))
.withoutMetadata()
)
.apply("Extract GenericRecord",
MapElements.into(TypeDescriptor.of(GenericRecord.class)).via(KV::getValue)
)
.apply(
"Write data to BQ",
BigQueryIO
.<GenericRecord>write()
.optimizedWrites()
.useBeamSchema()
.useAvroLogicalTypes()
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withSchemaUpdateOptions(ImmutableSet.of(BigQueryIO.Write.SchemaUpdateOption.ALLOW_FIELD_ADDITION))
//Temporary location to save files in GCS before loading to BQ
.withCustomGcsTempLocation(options.getGcsTempLocation())
.withNumFileShards(options.getNumShards().get())
.withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
.withMethod(FILE_LOADS)
.withTriggeringFrequency(Utils.parseDuration(options.getWindowDuration().get()))
.to(new TableReference()
.setProjectId(options.getGcpProjectId().get())
.setDatasetId(options.getGcpDatasetId().get())
.setTableId(options.getGcpTableId().get()))
);
When running, I get the following error:
Exception in thread "main" java.lang.IllegalStateException: Unable to return a default Coder for Extract GenericRecord/Map/ParMultiDo(Anonymous).output [PCollection]. Correct one of the following root causes: No Coder has been manually specified; you may do so using .setCoder().
Inferring a Coder from the CoderRegistry failed: Unable to provide a Coder for org.apache.avro.generic.GenericRecord.
Building a Coder using a registered CoderProvider failed.
How do I set the coder to properly read Avro?
There are at least three approaches to this:
Set the coder inline:
pipeline.apply("Read from Kafka", ....)
.apply("Dropping key", Values.create())
.setCoder(AvroCoder.of(Schema schemaOfGenericRecord))
.apply("Write data to BQ", ....);
Note that the key is dropped because it is unused; with this you won't need the MapElements step any more.
Register the coder in the pipeline's instance of CoderRegistry:
pipeline.getCoderRegistry().registerCoderForClass(GenericRecord.class, AvroCoder.of(genericSchema)); // genericSchema is the records' Avro Schema
Get the coder from the schema registry via:
ConfluentSchemaRegistryDeserializerProvider.getCoder(CoderRegistry registry)
https://beam.apache.org/releases/javadoc/2.22.0/org/apache/beam/sdk/io/kafka/ConfluentSchemaRegistryDeserializerProvider.html#getCoder-org.apache.beam.sdk.coders.CoderRegistry-
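A minimal sketch of that third option, reusing the same provider that is already passed to withValueDeserializer in the question (the variable names are assumptions):
ConfluentSchemaRegistryDeserializerProvider<GenericRecord> valueProvider =
    ConfluentSchemaRegistryDeserializerProvider.of(
        options.getSchemaRegistryUrl().get(),
        options.getSubject().get());

// Derive the coder from the schema registry and make it available to the pipeline,
// either by registering it for GenericRecord as below or by calling .setCoder(...) inline as in the first option.
Coder<GenericRecord> genericRecordCoder = valueProvider.getCoder(pipeline.getCoderRegistry());
pipeline.getCoderRegistry().registerCoderForClass(GenericRecord.class, genericRecordCoder);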
I am using Java with Spark. I need to create a Tuple2 Dataset by combining two separate Datasets. I am using joinWith because I want the individual objects to remain intact (I cannot use join). However, this fails with:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate expression: NamePlaceholder
I tried it with and without an alias but still get the same error. What am I doing wrong?
Dataset<MyObject1> dsOfMyObject1;
Dataset<MyObject2> dsOfMyObject2;
Dataset<Tuple2<MyObject1, MyObject2>> tuple2Dataset =
dsOfMyObject1.as("A").
joinWith(dsOfMyObject2.as("B"),col("A.keyfield")
.equalTo(col("B.keyfield")));
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate
expression: NamePlaceholder
at org.apache.spark.sql.catalyst.expressions.Unevaluable$class.eval(Expression.scala:255)
at org.apache.spark.sql.catalyst.expressions.NamePlaceholder$.eval(complexTypeCreator.scala:243)
at org.apache.spark.sql.catalyst.expressions.CreateNamedStructLike$$anonfun$names$1.apply(complexTypeCreator.scala:289)
at org.apache.spark.sql.catalyst.expressions.CreateNamedStructLike$$anonfun$names$1.apply(complexTypeCreator.scala:289)
at scala.collection.immutable.List.map(List.scala:274)
Description: I have to read a particular field from Elasticsearch 5.4 using Apache Camel. When I use the code below, I'm not able to view the response.
Exception: Error building toString out of XContent: com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)
Code:
from("direct:start")
.process(exchange -> {
GetRequest a = new GetRequest("example", "doc", "1");
exchange.getIn().setBody(a);
})
.to("elasticsearch5://elastic?operation=GET_BY_ID&ip=<ip>&port=9300")
.log("${body}");
Complete Stacktrace:
(route1) elasticsearch5://elastic?ip=&operation=GET_BY_ID&port=9300 --> log[messageId] <<< Pattern:InOnly, Headers:{breadcrumbId=ID-NLVHPRAAB02027-53300-1510315731625-0-1}, BodyType:org.elasticsearch.action.support.PlainActionFuture, Body:Error building toString out of XContent: com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)
at com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:1897)
at com.fasterxml.jackson.core.json.JsonGeneratorImpl._reportCantWriteValueExpectName(JsonGeneratorImpl.java:244)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator._verifyValueWrite(UTF8JsonGenerator.java:1033)
at com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeStartObject(UTF8JsonGenerator.java:313)
at org.elasticsearch.common.xcontent.json.JsonXContentGenerator.writeStartObject(JsonXContentGenerator.java:161)
at org.elasticsearch.common.xcontent.XContentBuilder.startObject(XContentBuilder.java:217)
at org.elasticsearch.index.get.GetResult.toXContent(GetResult.java:251)
at org.elasticsearch.action.get.GetResponse.toXContent(GetResponse.java:158)
at org.elasticsearch.common.Strings.toString(Strings.java:901)
at org.elastic... [Body clipped after 1000 chars, total length is 4350]
Are you using the camel-elasticsearch5 component? In that case you need to do something like this:
https://github.com/apache/camel/blob/master/components/camel-elasticsearch5/src/test/java/org/apache/camel/component/elasticsearch5/ElasticsearchGetSearchDeleteExistsUpdateTest.java#L45-L57
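The linked test works with the result as a GetResponse instead of logging the raw body. A minimal sketch of the same route with the response unwrapped before logging (whether the body converts to GetResponse this way in your component version is an assumption):
from("direct:start")
    .process(exchange -> {
        GetRequest request = new GetRequest("example", "doc", "1");
        exchange.getIn().setBody(request);
    })
    .to("elasticsearch5://elastic?operation=GET_BY_ID&ip=<ip>&port=9300")
    .process(exchange -> {
        // Read the result as a GetResponse and log its source instead of the raw response object.
        GetResponse response = exchange.getIn().getBody(GetResponse.class);
        exchange.getIn().setBody(response != null ? response.getSourceAsString() : null);
    })
    .log("${body}");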
I'm using the JasperReports REST v2 client, https://github.com/Jaspersoft/jrs-rest-java-client. I'm trying to create input controls dynamically.
ClientInputControl cliInp = new ClientInputControl();
cliInp.setLabel("FUNCIONARIO_ID_1");
cliInp.setDataType(new ClientDataType().setType(TypeOfDataType.date));
cliInp.setUri("/datatypes/FUNCIONARIO_ID_1");
session.resourcesService().resource("/datatypes").createNew(cliInp);
I need to create this input control so I can add it to my report.
When executing this code I get:
Exception in thread "main" com.jaspersoft.jasperserver.jaxrs.client.core.exceptions.BadRequestException: Bad Request
EDIT
The log files show the following error:
mt error:[{
"message":"The type 0 is invalid",
"errorCode":"illegal.parameter.value.error",
"parameters":
["type",
"0"]
}]
Can someone tell me what I'm doing wrong?
You need to set a few more values:
ClientDataType type = new ClientDataType()
.setLabel("Data")
.setType(TypeOfDataType.date)
.setUri("/types");
byte singleValue = 2;
ClientInputControl inputControl = new ClientInputControl()
.setLabel("Data")
.setType(singleValue) // omitting this parameter is what causes the "The type 0 is invalid" error
.setDataType(type)
.setUri("/inputs");
I'm trying to write a computation in Flink which requires two phases.
In the first phase I start from a text file, and perform some parameter estimation, obtaining as a result a Java object representing a statistical model of the data.
In the second phase, I'd like to use this object to generate data for a simulation.
I'm unsure how to do this. I tried with a LocalCollectionOutputFormat, and it works locally, but when I deploy the job on a cluster, I get a NullPointerException - which is not really surprising.
What is the Flink way of doing this?
Here is my code:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
GlobalConfiguration.includeConfiguration(configuration);

// Phase 1: read file and estimate model
DataSource<Tuple4<String, String, String, String>> source = env
    .readCsvFile(args[0])
    .types(String.class, String.class, String.class, String.class);

List<Tuple4<Bayes, Bayes, Bayes, Bayes>> bayesResult = new ArrayList<>();
// Processing here...
....output(new LocalCollectionOutputFormat<>(bayesResult));
env.execute("Bayes");

DataSet<BTP> btp = env
    .createInput(new BayesInputFormat(bayesResult.get(0)));
// Phase 2: BayesInputFormat generates data for further calculations
// ....
This is the exception I get:
Error: The program execution failed: java.lang.NullPointerException
at org.apache.flink.api.java.io.LocalCollectionOutputFormat.close(LocalCollectionOutputFormat.java:86)
at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:176)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
at java.lang.Thread.run(Thread.java:745)
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.lang.NullPointerException
at org.apache.flink.api.java.io.LocalCollectionOutputFormat.close(LocalCollectionOutputFormat.java:86)
at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:176)
at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
at java.lang.Thread.run(Thread.java:745)
at org.apache.flink.client.program.Client.run(Client.java:328)
at org.apache.flink.client.program.Client.run(Client.java:294)
at org.apache.flink.client.program.Client.run(Client.java:288)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
at it.list.flink.test.Test01.main(Test01.java:62)
...
With the latest release (0.9-milestone-1), a collect() method was added to Flink:
public List<T> collect()
which fetches a DataSet<T> as a List<T> into the driver program. collect() also triggers an immediate execution of the program (there is no need to call ExecutionEnvironment.execute()). Right now, there is a size limitation of about 10 MB for collected data sets.
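Applied to the code in the question, a minimal sketch (estimatedModel stands in for the DataSet produced by the elided phase 1 processing, so it is an assumption):
// Phase 1: estimate the model and bring it into the driver program.
// collect() triggers execution, so the intermediate env.execute("Bayes") is no longer needed.
List<Tuple4<Bayes, Bayes, Bayes, Bayes>> bayesResult = estimatedModel.collect();

// Phase 2: use the collected model to generate the simulation data, as before.
DataSet<BTP> btp = env.createInput(new BayesInputFormat(bayesResult.get(0)));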
If you do not need to evaluate the models in the driver program, you can also chain both programs together and emit the model to the side by attaching a data sink. This is more efficient, because the data does not make a round trip through the client machine.
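A sketch of that alternative, again using the hypothetical estimatedModel DataSet from the snippet above (the output paths and the GenerateSimulationData FlatMapFunction are assumptions):
// Side sink: persist the model without routing it through the client machine.
estimatedModel.writeAsText("hdfs:///output/bayes-model");

// Phase 2 consumes the model DataSet directly in the same program,
// e.g. via a flatMap that generates the simulation data.
DataSet<BTP> btp = estimatedModel.flatMap(new GenerateSimulationData());
btp.writeAsText("hdfs:///output/simulation");

// A single execute() runs both phases together.
env.execute("Bayes estimation and simulation");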
If you're using a Flink version prior to 0.9, you may use the following snippet to collect your dataset into a local collection:
val dataJavaList = new ArrayList[K]
val outputFormat = new LocalCollectionOutputFormat[K](dataJavaList)
dataset.output(outputFormat)
env.execute("collect()")
where K is the type of object you want to collect.