I have Elasticsearch documents which contain nested objects, and I want to be able to remove those nested objects via the Java update API. Here is the code containing the script:
UpdateRequest updateRequest = new UpdateRequest(INDEX, "thread", String.valueOf(threadId));
updateRequest.script("for (int i = 0; i < ctx._source.messages.size(); i++){if(ctx._source.messages[i]._message_id == " + messageId + ")" +
"{ctx._source.messages.remove(i);i--;}}", ScriptService.ScriptType.INLINE);
client.update(updateRequest).actionGet();
This is the mapping of my document:
{
"thread_and_messages": {
"mappings": {
"thread": {
"properties": {
"messages": {
"type": "nested",
"include_in_parent": true,
"properties": {
"message_id": {
"type": "string"
},
"message_nick": {
"type": "string"
},
"message_text": {
"type": "string"
}
}
},
"thread_id": {
"type": "long"
}
}
}
}
}
}
I'm not receiving any error messages, but when I run a query on the index to find that nested document it hasn't been removed. Could someone let me know what I am doing wrong?
Since message_id is a string, your script needs to account for that and be modified like this (see the escaped double quotes around the messageId value). There is also a second typo: your mapping declares a message_id field, but you name it _message_id in your script:
"for (int i = 0; i < ctx._source.messages.size(); i++){if(ctx._source.messages[i].message_id == \"" + messageId + "\")"
(no underscore before message_id, and escaped double quotes around the value)
Finally, also make sure that you have dynamic scripting enabled in your ES config.
UPDATE
You can try a "groovier" way of removing elements from lists, i.e. no more for loop and if; just use the Groovy power:
"ctx._source.messages.removeAll{ it.message_id == \"" + messageId + "\"}"
Normally, that will modify the messages array by removing all elements whose message_id field matches the messageId value.
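For completeness, here is how that script might plug into the original Java snippet. This is a minimal sketch assuming the same pre-2.x Java client and inline Groovy scripting shown in the question:
UpdateRequest updateRequest = new UpdateRequest(INDEX, "thread", String.valueOf(threadId));
// remove every nested message whose message_id matches the given messageId
updateRequest.script(
        "ctx._source.messages.removeAll{ it.message_id == \"" + messageId + "\"}",
        ScriptService.ScriptType.INLINE);
client.update(updateRequest).actionGet();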
I am new to RedisSearch. I have a Java client. What is the easiest way to parse this sample FT.SEARCH result into JSON or POJO or something more useful?
Sample result from FT.SEARCH (actually a string):
[
3,
movie_json: 1, [$, { "id": 1, "title": "Game of Thrones" } ],
movie_json: 3, [$, { "id": 3, "title": "Looking for Sugarman" } ],
movie_json: 2, [$, { "id": 2, "title": "Inception" } ]
]
Something like this would be useful:
{
"count": 3,
"docs": [
{ "id": 1, "title": "Game of Thrones" },
{ "id": 3, "title": "Looking for Sugarman" },
{ "id": 2, "title": "Inception" }
]
}
The most obvious approach is a regex matcher, as below (I am no regex expert).
This is the code generated by the https://regex101.com/ site, where I can get the right groups as long as I use a global flag, but it seems that Java doesn't have a GLOBAL pattern/flag! Is that true?
The code the site generated is below, and sure enough matcher.find() shows no match, presumably due to the absence of the global flag.
final String regex = "(?<=\\[\\$, ).*?(?= \\])";
final String string = respContent; // the RediSearch result string shown above
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    System.out.println("Full match: " + matcher.group(0));
    for (int i = 1; i <= matcher.groupCount(); i++) {
        System.out.println("Group " + i + ": " + matcher.group(i));
    }
}
I could use the String.split() dance too.
However, is there an existing solution that is more robust across multiple FT.SEARCH result use cases?
I imagined someone would have written a RedisSearch results parser by now but I cannot find one.
Thanks,
Murray
The high-level Redis API for Quarkus only exposes the plain Redis commands as a set of Java APIs. To handle Redis extensions, you can always fall back to the low-level API: https://quarkus.io/guides/redis-reference
Once you choose the low-level API, you are, in fact, using the underlying driver that Quarkus uses, which is the Vert.x Redis client.
In this mode, you can use any Redis extension and work with JSON directly, for example:
// set a JSON value
lowLevelClient
.send(cmd(Command.create("JSON.SET")).arg("foo").arg(".").arg("\"bar\""))
.compose(response -> {
// OK
// get a JSON value
return lowLevelClient.send(cmd(Command.create("JSON.GET")).arg("foo"));
})
.compose(response -> {
// verify that it is correct
should.assertEquals("\"bar\"", response.toString());
// do another call...
return lowLevelClient.send(cmd(Command.create("JSON.TYPE")).arg("foo").arg("."));
})
.compose(response -> {
should.assertEquals("string", response.toString());
return Future.succeededFuture();
})
.onFailure(should::fail)
.onSuccess(v -> {
test.complete();
});
While this mode is much more verbose, it gives you full control over the Redis extension you're using.
If the response can be mapped to JSON or is JSON already, you can get the content from the response holder directly without needing to parse it yourself, for example:
response.getKeys(); // returns the set of keys
response.get("key1"); // returns the JSON value for key "key1"
response.get(0); // returns the JSON value for array index 0
...
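If you also run FT.SEARCH through the low-level client, the multi-bulk reply shown in the question can be walked directly. Here is a rough sketch, assuming the documents were stored with RedisJSON and that an index named movies-idx exists (the index name is a placeholder; JsonObject/JsonArray come from Vert.x core):
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;
import io.vertx.redis.client.Command;
import io.vertx.redis.client.Response;
import static io.vertx.redis.client.Request.cmd;

lowLevelClient
    .send(cmd(Command.create("FT.SEARCH")).arg("movies-idx").arg("*"))
    .onSuccess(response -> {
        // reply layout: [count, key1, [path, json], key2, [path, json], ...]
        JsonArray docs = new JsonArray();
        for (int i = 1; i + 1 < response.size(); i += 2) {
            Response fields = response.get(i + 1);   // e.g. ["$", "{...}"]
            docs.add(new JsonObject(fields.get(1).toString()));
        }
        JsonObject result = new JsonObject()
            .put("count", response.get(0).toLong())
            .put("docs", docs);
        System.out.println(result.encodePrettily());
    })
    .onFailure(Throwable::printStackTrace);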
How do I transform a JSON response retrieved from an external system into meaningful data (key/value pairs) in ESQL?
Retrieved JSON:
{
"data": [
{
"name": "application.info.header",
"value": "headerValue"
},
{
"name": "entity.statistics.name.fullName",
"value": "fullNameValue"
},
{
"name": "application.info.matter",
"value": "matterValue"
},
{
"name": "entity.statistics.skill",
"value": "skillValue"
}
]
}
where,
name ~ hierarchy of JSON (last attribute being the key)
value ~ value against the key
Expected JSON:
{
"data": {
"application": {
"info": {
"header": "headerValue",
"matter": "matterValue"
}
},
"entity": {
"statistics": {
"name": {
"fullName": "fullNameValue"
},
"skill": "skillValue"
}
}
}
}
Needless to say, this can easily be achieved in Java through the split method; I'm looking for a suitable approach in ESQL.
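For reference, a rough Java/Jackson sketch of what I mean by the split-and-nest approach (retrievedJson and all names here are illustrative, not from my actual code):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

ObjectMapper mapper = new ObjectMapper();
ObjectNode data = mapper.createObjectNode();

for (JsonNode item : mapper.readTree(retrievedJson).get("data")) {
    String[] path = item.get("name").asText().split("\\.");
    ObjectNode node = data;
    for (int i = 0; i < path.length - 1; i++) {
        // descend one level, creating intermediate objects on first use
        JsonNode child = node.get(path[i]);
        node = (child instanceof ObjectNode) ? (ObjectNode) child : node.putObject(path[i]);
    }
    node.put(path[path.length - 1], item.get("value").asText());
}

ObjectNode result = mapper.createObjectNode();
result.set("data", data);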
Current ESQL Module:
CREATE COMPUTE MODULE getDetails_prepareResponse
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        DECLARE data REFERENCE TO InputRoot.JSON.Data.data.Item[1];
        SET OutputRoot.JSON.Data = InputRoot.JSON.Data;
        SET OutputRoot.JSON.Data.data = NULL;
        WHILE LASTMOVE(data) DO
            DECLARE keyA CHARACTER SUBSTRING(data.name BEFORE '.');
            DECLARE name CHARACTER SUBSTRING(data.name AFTER '.');
            DECLARE keyB CHARACTER SUBSTRING(name BEFORE '.');
            DECLARE key CHARACTER SUBSTRING(name AFTER '.');
            CREATE LASTCHILD OF OutputRoot.JSON.Data.data.{EVAL('keyA')}.{EVAL('keyB')}
                NAME key VALUE data.value;
            MOVE data NEXTSIBLING;
        END WHILE;
        RETURN TRUE;
    END;
END MODULE;
This is currently handled through the SUBSTRING method in ESQL (for three levels only), but the JSON levels are now dynamic (no limit on the key/value nesting) as per the requirements.
You could implement your own procedure to split a string. Take a look at this answer for an example.
ESQL for splitting a string into multiple values
The method splits S on Delim into an array in Env (Environment.Split.Array[]) and removes Environment.Split before refilling it.
As per the definition of the "default" attribute in the Avro docs: "A default value for this field, used when reading instances that lack this field (optional)."
This means that if the corresponding field is missing, the default value is taken.
But this does not seem to be the case. Consider the following student schema:
{
"type": "record",
"namespace": "com.example",
"name": "Student",
"fields": [{
"name": "age",
"type": "int",
"default": -1
},
{
"name": "name",
"type": "string",
"default": "null"
}
]
}
The schema says: if the "age" field is missing, then use -1 as the value. Likewise for the "name" field.
Now, if I try to construct the Student model from the following JSON:
{"age":70}
I get this exception:
org.apache.avro.AvroTypeException: Expected string. Got END_OBJECT
at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698)
at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:227)
It looks like the default is NOT working as expected. So, what exactly is the role of default here?
This is the code used to generate Student model:
Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, studentJson);
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
return datumReader.read(null, decoder);
(Student class is auto-generated by Avro compiler from student schema)
I think there is some misunderstanding around default values, so hopefully my explanation will help other people as well. The default value is used to supply a value when the field is not present, but essentially only when you are instantiating an Avro object (in your case, when calling datumReader.read); it does not, by itself, let you read data written with a different schema. This is why the concept of a "schema registry" is useful for this kind of situation.
The following code works and allows you to read your data:
Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, "{\"age\":70}");
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
Schema expected = new Schema.Parser().parse("{\n" +
" \"type\": \"record\",\n" +
" \"namespace\": \"com.example\",\n" +
" \"name\": \"Student\",\n" +
" \"fields\": [{\n" +
" \"name\": \"age\",\n" +
" \"type\": \"int\",\n" +
" \"default\": -1\n" +
" }\n" +
" ]\n" +
"}");
datumReader.setSchema(expected);
System.out.println(datumReader.read(null, decoder));
As you can see, I am specifying the schema used to "write" the JSON input, which does not contain the field "name". However, since your schema contains a default value, when you print the record you will see the name filled in with your default value:
{"age": 70, "name": "null"}
Just in case you do not already know: that "null" is not really a null value; it is a string with the value "null".
Just to add to what is already said in the above answer: in order for a field to be null when it is not present, union its type with null; otherwise it is just a string spelled "null" that gets in. Example schema:
{
"name": "name",
"type": [
"null",
"string"
],
"default": null
}
And then if you decode {"age":70} and retrieve the record, you will get the following:
{"age":70,"name":null}
I have a sample JSON document which I want to index into Elasticsearch.
Sample JSON indexed:
put test/names/1
{
"1" : {
"name":"abc"
},
"2" : {
"name":"def"
},
"3" : {
"name":"xyz"
}
}
where:
index name: test,
type name: names,
id: 1
Now the default mapping generated by Elasticsearch is:
{
"test": {
"mappings": {
"names": {
"properties": {
"1": {
"properties": {
"name": {
"type": "string"
}
}
},
"2": {
"properties": {
"name": {
"type": "string"
}
}
},
"3": {
"properties": {
"name": {
"type": "string"
}
}
},
"metadataFieldDefinition": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
}
}
If the map size increases from 3 (currently) to, say, a thousand or a million, then Elasticsearch will create a mapping entry for each key, which may cause a performance issue as the mapping will become huge.
I tried creating a mapping by setting:
"dynamic": false,
"type": "object"
but it was overridden by ES since it didn't match the indexed data.
Please let me know how I can define a mapping so that ES does not create one like the above.
I think there might be a little confusion here in terms of how we index documents.
put test/names/1
{...
document
...}
This says: the following document belongs to index test and is of type names with id 1. The entire document is treated as a single document of type names. Using the PUT API as you currently are, you cannot index multiple documents at once. ES immediately interprets 1, 2, and 3 as properties of type object, each containing a property name of type string.
Effectively, ES thinks you are trying to index ONE document instead of three.
To get many documents into index test with a type of names, you could do this, using the curl syntax:
curl -XPUT"http://your-es-server:9200/test/names/1" -d'
{
"name": "abc"
}'
curl -XPUT"http://your-es-server:9200/test/names/2" -d'
{
"name": "ghi"
}'
curl -XPUT"http://your-es-server:9200/test/names/3" -d'
{
"name": "xyz"
}'
This specifies the document ID in the endpoint you are indexing to. Your mapping will then look like this:
"test": {
"mappings": {
"names": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
Final Word: Split your indexing up into discrete operations, or check out the Bulk API to see the syntax on how to POST multiple operations in a single request.
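If you are indexing from Java, the Bulk API version might look roughly like this; a sketch against the older transport client, where setSource accepts a raw JSON string:
BulkRequestBuilder bulk = client.prepareBulk();
bulk.add(client.prepareIndex("test", "names", "1").setSource("{\"name\":\"abc\"}"));
bulk.add(client.prepareIndex("test", "names", "2").setSource("{\"name\":\"def\"}"));
bulk.add(client.prepareIndex("test", "names", "3").setSource("{\"name\":\"xyz\"}"));

BulkResponse bulkResponse = bulk.execute().actionGet();
if (bulkResponse.hasFailures()) {
    System.err.println(bulkResponse.buildFailureMessage());
}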
I need to run queries with 1000 objects. Using the /batch endpoint I can get this to work, but it is too slow (30 seconds with 300 items).
So I'm trying the same approach as described in this docs page: http://docs.neo4j.org/chunked/2.0.1/rest-api-cypher.html#rest-api-create-mutiple-nodes-with-properties
POST this JSON to http://localhost:7474/db/data/cypher
{
"params": {
"props": [
{
"_user_id": "177032492760",
"_user_name": "John"
},
{
"_user_id": "177032492760",
"_user_name": "Mike"
},
{
"_user_id": "100007496328",
"_user_name": "Wilber"
}
]
},
"query": "MERGE (user:People {id:{_user_id}}) SET user.id = {_user_id}, user.name = {_user_name} "
}
The problem is I'm getting this error:
{ message: 'Expected a parameter named _user_id',
exception: 'ParameterNotFoundException',
fullname: 'org.neo4j.cypher.ParameterNotFoundException',
stacktrace:
...
Maybe this works only with CREATE queries, as shown in the docs page?
Use FOREACH and MERGE with ON CREATE SET:
FOREACH (p IN {props} |
  MERGE (user:People {id: p._user_id})
  ON CREATE SET user.name = p._user_name)
POST this JSON to http://localhost:7474/db/data/cypher
{
"params": {
"props": [
{
"_user_id": "177032492760",
"_user_name": "John"
},
{
"_user_id": "177032492760",
"_user_name": "Mike"
},
{
"_user_id": "100007496328",
"_user_name": "Wilber"
}
]
},
"query": "FOREACH (p in {props} | MERGE (user:People {id:{p._user_id}}) ON CREATE user.name = {p._user_name}) "
}
Actually, the equivalent to the example in the doc would be:
{
"params": {
"props": [
{
"id": "177032492760",
"name": "John"
},
{
"id": "177032492760",
"name": "Mike"
},
{
"id": "100007496328",
"name": "Wilber"
}
]
},
"query": "CREATE (user:People {props})"
}
It might be legal to replace CREATE with MERGE, but the query may not do what you expect.
For example, if a node with the id "177032492760" already exists, but it does not have the name "John", then the MERGE will create a new node; and you'd end up with 2 nodes with the same id (but different names).
Yes, a CREATE statement can take an array of maps and implicitly convert it into several statements with one map each, but you can't use arrays of maps that way outside of simple CREATE statements. In fact, you can't use literal maps the same way either when you use MERGE and MATCH. You can CREATE ({map}), but you have to MATCH/MERGE ({prop: {map}.val}), i.e.
// {props:{name:Fred, age:2}}
MERGE (a {name:{props}.name})
ON CREATE SET a = {props}
For your purposes, either send individual parameter maps with a query like the one above, or, for an array of maps, iterate through it with FOREACH, as shown below:
FOREACH (p IN {props} |
  MERGE (user:People {id: p._user_id})
  ON CREATE SET user = p)
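For what it's worth, posting the FOREACH variant with all of your maps in one request could look roughly like this from Java; a sketch with the JDK 11 HttpClient, where the host, the absence of auth, and the two-element props array are placeholders:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

String payload =
    "{ \"query\": \"FOREACH (p IN {props} | MERGE (user:People {id: p._user_id}) "
  + "ON CREATE SET user = p)\", "
  + "\"params\": { \"props\": [ "
  + "{ \"_user_id\": \"177032492760\", \"_user_name\": \"John\" }, "
  + "{ \"_user_id\": \"100007496328\", \"_user_name\": \"Wilber\" } ] } }";

HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:7474/db/data/cypher"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build();

HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());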