MarkLogic Wildcard Search - QConsole vs. Java API - java

I'm seeing different results from a Java-based query and what I believe is the equivalent cts:search in Query Console. There's a lot of information here, and I've tried to organize it appropriately. Here are the steps to set up a simple example that replicates what I'm seeing.
Create new database with default settings
Add new forest with default settings
Enable three character searches (the only non-default database setting)
Insert the three JSON documents below into the database
Query console returns doc2. Java client returns doc2 AND doc1. Why? I would expect the same results from each. I want to get the results in Java that the query console is returning. Am I writing the query definition in Java incorrectly?
It looks like the Java client wildcard search is searching the entire document, even though I've specified that I only want to do a wildcard search inside the given json-property (city).
Is there a way to see or log the resultant server-side “cts query” given a client-side RawCombinedQueryDefinition? I'd like to see what the Java request gets translated into on the server side.
doc1.json
{
"state": "OH",
"city": "Dayton",
"notes": "not Cincinnati"
}
doc2.json
{
"state": "OH",
"city": "Cincinnati",
"notes": "real city"
}
doc3.json
{
"state": "OH",
"city": "Daytona",
"notes": "this is a made up city"
}
Query console code used to insert documents
xquery version "1.0-ml";
xdmp:document-load("/some/path/doc1.json",
<options xmlns="xdmp:document-load">
<uri>/doc1.json</uri>
</options>
);
Query console code used to search
xquery version "1.0-ml";
cts:search(fn:collection(),
cts:and-query((
cts:json-property-value-query("state", "OH"),
cts:json-property-value-query("city", "*Cincinnati*")
))
)
Java QueryManager query in easy-to-read text
{
"search": {
"query": {
"queries": [
{
"value-query": {
"type": "string",
"json-property": "state",
"text": "OH"
}
},
{
"value-query": {
"type": "string",
"json-property": "city",
"text": "*Cincinnati*"
}
}
]
}
}
}
Java code
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.DocumentPage;
import com.marklogic.client.document.DocumentRecord;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.RawCombinedQueryDefinition;
import org.junit.Test;
public class MarkLogicTest
{
@Test
public void testWildcardSearch()
{
DatabaseClientFactory.SecurityContext securityContext = new DatabaseClientFactory.DigestAuthContext("admin", "admin");
DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000, "test", securityContext);
QueryManager queryManager = client.newQueryManager();
JSONDocumentManager documentManager = client.newJSONDocumentManager();
String query = "{\n" +
" \"search\": {\n" +
" \"query\": {\n" +
" \"queries\": [\n" +
" {\n" +
" \"value-query\": {\n" +
" \"type\": \"string\",\n" +
" \"json-property\": \"state\",\n" +
" \"text\": \"OH\"\n" +
" }\n" +
" },\n" +
" {\n" +
" \"value-query\": {\n" +
" \"type\": \"string\",\n" +
" \"json-property\": \"city\",\n" +
" \"text\": \"*Cincinnati*\"\n" +
" }\n" +
" }\n" +
" ]\n" +
" }\n" +
" }\n" +
"}";
StringHandle queryHandle = new StringHandle(query).withFormat(Format.JSON);
RawCombinedQueryDefinition queryDef = queryManager.newRawCombinedQueryDefinition(queryHandle);
DocumentPage documents = documentManager.search(queryDef, 1);
while (documents.hasNext())
{
DocumentRecord document = documents.next();
StringHandle resultHandle = document.getContent(new StringHandle());
String result = resultHandle.get();
System.out.println(result);
}
}
}
System.out.println() results
{"state":"OH", "city":"Dayton", "notes":"not Cincinnati"}
{"state":"OH", "city":"Cincinnati", "notes":"real city"}
Why does the Java client return the first result where city = Dayton?
Thanks in advance!

The REST API, and thus the Java API, executes an unfiltered search by default (meaning the matches are based entirely on the indexes). By contrast, cts:search() executes a filtered search by default (meaning the result documents are inspected to throw out false positives).
If you add the "unfiltered" option to cts:search(), it also returns both documents.
The quick fix is to add the "filtered" option to the Java API search, but the better fix for performance at scale is to refine the indexes to support exact matching for the required wildcard queries.
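For example, the filtered option can ride along in the combined query itself; a minimal sketch of the payload from the question with an options block added (assuming the JSON form of the search-option query option):
{
  "search": {
    "options": {
      "search-option": [ "filtered" ]
    },
    "query": {
      "queries": [
        { "value-query": { "type": "string", "json-property": "state", "text": "OH" } },
        { "value-query": { "type": "string", "json-property": "city", "text": "*Cincinnati*" } }
      ]
    }
  }
}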
Elements are correlated with wildcards based on position.
Thus, for this query, I believe you need to turn on the index configurations for element word positions and for three character word positions.
Hoping that helps,

From a quick look at the code above, you do not have the and-query in your Java example. Therefore it is treated as an or-query of OH OR *Cincinnati*.
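If the intent is to mirror the cts:and-query used in Query Console, the two value queries can be wrapped in an explicit and-query; a sketch using the structured-query JSON syntax (combine with the filtered option above as needed):
{
  "search": {
    "query": {
      "queries": [
        {
          "and-query": {
            "queries": [
              { "value-query": { "type": "string", "json-property": "state", "text": "OH" } },
              { "value-query": { "type": "string", "json-property": "city", "text": "*Cincinnati*" } }
            ]
          }
        }
      ]
    }
  }
}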

Related

How can I get all the entities used in a SPARQL-DL Query?

I'm currently working on a project where I need to graphically represent a SPARQL-DL query.
To do that, I need to get all the entities used in a query (in the end, all the entities used in the query and the results from the query). I'm struggling with getting all the entities of the query. Is there an easy way to get all the atoms of the query?
The library I'm using is OWL-API 4.2.8 with the latest SPARQL-DL-API. I'm using the Example_Basic.java file to try my method.
Here's the query I used as an example (it gives me all the wines that are located in New Zealand):
PREFIX wine: <http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#>
SELECT ?wine
WHERE {
PropertyValue(?wine, wine:locatedIn, wine:NewZealandRegion)
}
The method I use:
private void extractAllQueryEntities(QueryResult result) {
List<QueryAtomGroup> queryAtomGroups = result.getQuery().getAtomGroups();
for (QueryAtomGroup queryAtomGroup : queryAtomGroups) {
List<QueryAtom> atoms = queryAtomGroup.getAtoms();
System.out.println("Size of the atoms: " + atoms.size());
Iterator<QueryAtom> queryAtom = atoms.iterator();
while (queryAtom.hasNext()) {
QueryAtom element = queryAtom.next();
System.out.println("atom: " + element);
List<QueryArgument> arguments = element.getArguments();
for (QueryArgument argument : arguments) {
System.out.println("type: " + argument.getType() + " : value: " + argument.getValueAsString());
}
}
}
}
and here's the result I get from my method:
Results:
.
.
.
some wines
.
.
.
Size of the atoms: 1
atom: PropertyValue(?de.derivo.sparqldlapi.Var#37b009, http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#locatedIn, http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#NewZealandRegion)
type: VAR : value: wine
type: URI : value: http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#locatedIn
type: URI : value: http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#NewZealandRegion
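For the graphical representation, the same loop can be used to collect the distinct entities instead of printing them; a minimal sketch using the QueryAtomGroup/QueryAtom/QueryArgument API from the snippet above (the string comparison on the argument type is an assumption, to avoid depending on the exact enum class name):
private Set<String> extractQueryEntityUris(QueryResult result) {
    Set<String> uris = new LinkedHashSet<>();
    for (QueryAtomGroup group : result.getQuery().getAtomGroups()) {
        for (QueryAtom atom : group.getAtoms()) {
            for (QueryArgument argument : atom.getArguments()) {
                // keep only URI-typed arguments (classes, properties, individuals); skip variables
                if ("URI".equals(argument.getType().toString())) {
                    uris.add(argument.getValueAsString());
                }
            }
        }
    }
    return uris;
}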

How to convert a json to json using external mapping

We are calling an API from our Java server which returns JSON.
Now we want to convert this JSON into another JSON, but we want to keep the conversion mapping in a separate file.
As an example,
Json A:
{
"json_id":"3",
"json_another_id":"43",
"code":34,
"json_name":"Yey"
}
To convert into Json B:
{
"id":"3",
"another_id":"43",
"code":34,
"name":"Yeyproto"
}
Below is the mapping that I want to keep in the separate file:
"json_id" -> "id"
"json_another_id" -> "another_id"
"json_name" -> "name"
Please let me know a good solution for this. I have checked out Jackson mixins, but they convert JSON to a POJO.
Try JOLT. It is a JSON-to-JSON transformation library written in Java where the "specification" for the transform is itself a JSON document. It was created on a project that was transforming a lot of JSON from an ElasticSearch "backend" to a frontend API.
One popular tool for JSON-to-JSON transforms in Java is Jolt. You can achieve this with the following Jolt spec:
Jolt Spec
[
{
"operation": "shift",
"spec": {
"#json_id": "id",
"#json_another_id": "another_id",
"#code": "code",
"#json_name": "name"
}
}
]
You can verify it with the Jolt Transform Demo: paste your original JSON string and the Jolt spec, then press the Transform button to see the result. The following code snippet shows how to achieve this in Java.
Maven dependency
<dependency>
<groupId>com.bazaarvoice.jolt</groupId>
<artifactId>jolt-core</artifactId>
<version>0.1.0</version>
</dependency>
<dependency>
<groupId>com.bazaarvoice.jolt</groupId>
<artifactId>json-utils</artifactId>
<version>0.1.0</version>
</dependency>
Code snippet
String spec = "[\r\n" +
" {\r\n" +
" \"operation\": \"shift\",\r\n" +
" \"spec\": {\r\n" +
" \"#json_id\": \"id\",\r\n" +
" \"#json_another_id\": \"another_id\",\r\n" +
" \"#code\": \"code\",\r\n" +
" \"#json_name\": \"name\"\r\n" +
" }\r\n" +
" }\r\n" +
"]";
Chainr chainr = Chainr.fromSpec(JsonUtils.jsonToList(spec));
Object transformedOutput = chainr.transform(JsonUtils.jsonToObject(jsonStr)); // jsonStr holds the original JSON (Json A) from the question
System.out.println(transformedOutput.toString());
Console output
{id=3, another_id=43, code=34, name=Yey}
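The output above is just the toString() of the resulting Map. If actual JSON text is needed, Jolt's JsonUtils can serialize the transformed object; a small follow-up, assuming the same json-utils dependency shown above:
// serialize the transform result back to a JSON string
System.out.println(JsonUtils.toJsonString(transformedOutput));
// expected: {"id":"3","another_id":"43","code":34,"name":"Yey"}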

How To Pass Variable To Pepper-Box Plain Text Config

I'm setting up my Pepper-Box Plain Text Config to pass a variable using ${accountNumber}, ${{accountNumber}}, {{accountNumber}}, and a function that returns a string, but it didn't work.
This is my message to Kafka:
{
"eventName": "OFFER",
"payload": {
"accountNumber": "${accountNumber}",
"Limit": 20000000
}
}
But the variable isn't substituted, even though the Debug Sampler shows that accountNumber is set.
I think there is a mistake in how I reference the variable; I tried a few other techniques, but they didn't work either.
The error message when I try ${{accountNumber}} is:
symbol: method accountNumber()
location: class MessageIterator1566802574812
1 error
Uncaught Exception java.lang.ClassFormatError: Truncated class file. See log file for details.
It looks like a limitation of the plugin; you're basically limited to Schema Template Functions.
Alternatively you can send a record to Kafka using JSR223 Sampler and the following Groovy code:
import org.apache.jmeter.threads.JMeterVariables
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
def kafkaProps = new Properties()
kafkaProps.put(org.apache.kafka.clients.producer.ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
kafkaProps.put(org.apache.kafka.clients.producer.ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer")
kafkaProps.put(org.apache.kafka.clients.producer.ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.LongSerializer.class.getName())
kafkaProps.put(org.apache.kafka.clients.producer.ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringSerializer.class.getName())
def producer = new KafkaProducer<>(kafkaProps)
JMeterVariables vars = new JMeterVariables()
vars.put("accountNumber", "foo")
def record = new ProducerRecord<>("test", "{\n" +
" \"eventName\": \"OFFER\",\n" +
" \"payload\": {\n" +
" \"accountNumber\": \"" + vars.get("accountNumber") + "\",\n" +
" \"Limit\": 20000000\n" +
" }\n" +
"}")
producer.send(record)
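Inside a real JSR223 Sampler there is no need to create a JMeterVariables instance, because JMeter pre-binds the thread's variables as vars; closing the producer also ensures the record is flushed. A small adjustment of the snippet above (a sketch, assuming the script runs in a JSR223 Sampler):
// "vars" is pre-bound by JMeter in a JSR223 Sampler, so the value set by earlier test elements is available directly
def payload = '{"eventName":"OFFER","payload":{"accountNumber":"' + vars.get("accountNumber") + '","Limit":20000000}}'
def record = new ProducerRecord<>("test", payload)
producer.send(record)
producer.close()   // flush buffered records and release the producer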
More information: Apache Kafka - How to Load Test with JMeter

How to create a WSResponse object from string for Play WSClient

The documentation suggests testing an API client based on WSClient against a mock web service, that is, creating a play.server.Server which responds to real HTTP requests.
I would prefer to create WSResponse objects directly from files, complete with status line, header lines and body, without real TCP connections. That would require fewer dependencies and run faster. Also, there may be other cases where this is useful.
But I can't find a simple way to do it. It seems all implementations wrapped by WSResponse are tied to reading from network.
Should I just create my own subclass of WSResponse for this, or maybe I'm wrong and it already exists?
The API for Play seems intentionally obtuse. You have to use their "Cacheable" classes, which are the only ones that seem directly instantiable from objects you'd have lying around.
This should get you started:
import play.api.libs.ws.ahc.AhcWSResponse;
import play.api.libs.ws.ahc.cache.CacheableHttpResponseBodyPart;
import play.api.libs.ws.ahc.cache.CacheableHttpResponseHeaders;
import play.api.libs.ws.ahc.cache.CacheableHttpResponseStatus;
import play.shaded.ahc.io.netty.handler.codec.http.DefaultHttpHeaders;
import play.shaded.ahc.org.asynchttpclient.Response;
import play.shaded.ahc.org.asynchttpclient.uri.Uri;
AhcWSResponse response = new AhcWSResponse(new Response.ResponseBuilder()
.accumulate(new CacheableHttpResponseStatus(Uri.create("uri"), 200, "status text", "protocols!"))
.accumulate(new CacheableHttpResponseHeaders(false, new DefaultHttpHeaders().add("My-Header", "value")))
.accumulate(new CacheableHttpResponseBodyPart("my body".getBytes(), true))
.build());
The mystery boolean values aren't documented. My guess is the boolean for BodyPart is whether that is the last part of the body. My guess for Headers is whether the headers are in the trailer of a message.
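To sanity-check the constructed object, the usual accessors can be read back; since these are the Scala WS classes, the accessors are parameterless methods when called from Java (an assumption worth verifying against your Play version):
System.out.println(response.status());   // 200
System.out.println(response.body());     // my body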
I used another way, mocking WSResponse with Mockito:
import play.libs.ws.WSRequest;
import play.libs.ws.WSResponse;
import org.mockito.Mockito;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
...
final WSResponse wsResponseMock = Mockito.mock(WSResponse.class);
Mockito.doReturn(200).when(wsResponseMock).getStatus();
final String jsonStr = "{\n"
+ " \"response\": {\n"
+ " \"route\": [\n"
+ " { \"summary\" :\n"
+ " {\n"
+ " \"distance\": 23\n"
+ " }\n"
+ " }\n"
+ " ]\n"
+ " }\n"
+ "}";
ObjectMapper mapper = new ObjectMapper();
JsonNode jsonNode = null;
try {
jsonNode = mapper.readTree(jsonStr);
} catch (IOException e) {
e.printStackTrace();
}
Mockito.doReturn(
jsonNode)
.when(wsResponseMock)
.asJson();
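The mocked response can then be handed to the code under test by stubbing the request as well, so no connection is ever opened; a sketch assuming Play's Java WS API, where WSRequest.get() returns a CompletionStage<WSResponse>:
final WSRequest wsRequestMock = Mockito.mock(WSRequest.class);
// complete the request immediately with the mocked response
Mockito.doReturn(java.util.concurrent.CompletableFuture.completedFuture(wsResponseMock))
    .when(wsRequestMock)
    .get();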
If you are using Play Framework 2.8.x and Scala, the code below can help generate a dummy WSResponse:
import play.api.libs.ws.ahc.AhcWSResponse
import play.api.libs.ws.ahc.cache.CacheableHttpResponseStatus
import play.shaded.ahc.org.asynchttpclient.Response
import play.shaded.ahc.org.asynchttpclient.uri.Uri
import play.api.libs.ws.ahc.cache.CacheableHttpResponseBodyPart
import play.shaded.ahc.io.netty.handler.codec.http.DefaultHttpHeaders
import org.scalatest.{FlatSpec, Matchers}
class OutputWriterSpec extends FlatSpec with Matchers {
val respBuilder = new Response.ResponseBuilder()
respBuilder.accumulate(new CacheableHttpResponseStatus(Uri.create("http://localhost:9000/api/service"), 202, "status text", "json"))
respBuilder.accumulate(new DefaultHttpHeaders().add("Content-Type", "application/json"))
respBuilder.accumulate(new CacheableHttpResponseBodyPart("{\n\"id\":\"job-1\",\n\"lines\": [\n\"62812ce276aa9819a2e272f94124d5a1\",\n\"13ea8b769685089ba2bed4a665a61fde\"\n]\n}".getBytes(), true))
val resp = new AhcWSResponse(respBuilder.build())
val outputWriter = OutputWriter
val expected = ("\"job-1\"", List("\"62812ce276aa9819a2e272f94124d5a1\"", "\"13ea8b769685089ba2bed4a665a61fde\""), "_SUCCESS")
"Output Writer" should "handle response from api call" in {
val actual = outputWriter.handleResponse(resp, "job-1")
println("the actual : " + actual)
actual shouldEqual(expected)
}
}

Avro schema resolution for evolving a field from a primitive to a union

I'm working with Avro 1.7.0 using its Java generic representation API, and I have a problem dealing with our current case of schema evolution. The scenario we're dealing with here is making a primitive-type field optional by changing the field to be a union of null and that primitive type.
I'm going to use a simple example. Basically, our schemas are:
Initial: A record with one field of type int
Second version: Same record, same field name but the type is now a union of null and int
According to the schema resolution chapter of Avro's spec, the resolution for such a case should be:
if reader's is a union, but writer's is not
The first schema in the reader's union that matches the writer's schema is recursively resolved against it. If none match, an error is signalled.
My interpretation is that we should resolve data serialized with the initial schema properly as int is part of the union in the reader's schema.
However, when running a test that reads back a record serialized with version 1 using version 2, I get
org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.
Here's a test that shows exactly this:
@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
Schema writerSchema = new Schema.Parser().parse("{\n" +
" \"type\":\"record\",\n" +
" \"name\":\"NeighborComparisons\",\n" +
" \"fields\": [\n" +
" {\"name\": \"test\",\n" +
" \"type\": \"int\" }]} ");
Schema readersSchema = new Schema.Parser().parse(" {\n" +
" \"type\":\"record\",\n" +
" \"name\":\"NeighborComparisons\",\n" +
" \"fields\": [ {\n" +
" \"name\": \"test\",\n" +
" \"type\": [\"null\", \"int\"],\n" +
" \"default\": null } ] }");
// Writing a record using the initial schema with the
// test field defined as an int
GenericData.Record record = new GenericData.Record(writerSchema);
record.put("test", Integer.valueOf(10));
ByteArrayOutputStream output = new ByteArrayOutputStream();
JsonEncoder jsonEncoder = EncoderFactory.get().
jsonEncoder(writerSchema, output);
GenericDatumWriter<GenericData.Record> writer = new
GenericDatumWriter<GenericData.Record>(writerSchema);
writer.write(record, jsonEncoder);
jsonEncoder.flush();
output.flush();
System.out.println(output.toString());
// We try reading it back using the second schema
// version where the test field is defined as a union of null and int
JsonDecoder jsonDecoder = DecoderFactory.get().
jsonDecoder(readersSchema, output.toString());
GenericDatumReader<GenericData.Record> reader =
new GenericDatumReader<GenericData.Record>(writerSchema,
readersSchema);
GenericData.Record read = reader.read(null, jsonDecoder);
// We should be able to assert that the value is 10 but it
// fails on reading the record before getting here
assertEquals(10, read.get("test"));
}
I would like to know either whether my expectations are correct (this should resolve successfully, right?) or where I'm not using Avro properly to handle such a scenario.
The expectation is correct: migrating a field from a primitive type to a union of null and that primitive should resolve successfully.
The problem with the code above is how the decoder is created. The decoder needs the writer's schema rather than the reader's schema, because it parses the serialized data exactly as it was written; resolution against the reader's schema then happens inside the GenericDatumReader, which is already constructed with both schemas.
Rather than doing this:
JsonDecoder jsonDecoder = DecoderFactory.get().
jsonDecoder(readersSchema, output.toString());
It should be like this:
JsonDecoder jsonDecoder = DecoderFactory.get().
jsonDecoder(writerSchema, output.toString());
Credit goes to Doug Cutting for the answer on Avro's user mailing list:
http://mail-archives.apache.org/mod_mbox/avro-user/201208.mbox/browser
