Elasticsearch wildcard query in Java - find all matching fields and replace

I want to update all path fields starting with "example/one".
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");

UpdateByQueryRequest request = new UpdateByQueryRequest(index)
    .setDocTypes(type)
    .setScript(new Script(ScriptType.INLINE,
        "painless",
        "ctx._source.path = ctx._source.path.replace(params.old, params.new);",
        parameters))
    .setQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"));
client.updateByQuery(request, RequestOptions.DEFAULT);
It doesn't work: no updates, no errors (I also tried a prefixQuery, with the same result). The following query, however, does update the matching documents when run via Postman:
POST my_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.path = ctx._source.path.replace(\"example/one\", \"new/example\")"
  },
  "query": {
    "wildcard": {
      "path.tree": {
        "value": "example/one*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
What am I missing? The path hierarchy tokenizer is used on the path field.
Your help is much needed.
PS: I can't upgrade to a newer version of Elasticsearch.

While testing, I first suspected the custom analyser used on the path field, but I quickly ruled that out since the same query returned the expected result via Postman.
I eventually went with a two-step workaround (I couldn't get the update-by-query API to work): first search for all matching documents, then perform a bulk update.
NativeSearchQuery query = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"))
    .withSourceFilter(new FetchSourceFilter(new String[]{"_id", "path"}, null))
    .build();
List<MyClass> result = elasticsearchRestTemplate.queryForList(query, MyClass.class);
if (!CollectionUtils.isEmpty(result)) {
    Map<String, Object> parameters = new HashMap<>();
    parameters.put("old", "example/one");
    parameters.put("new", "new/example");
    Script script = new Script(ScriptType.INLINE,
        "painless",
        "ctx._source.path = ctx._source.path.replace(params.old, params.new)",
        parameters);
    BulkRequest request = new BulkRequest();
    for (MyClass myClass : result) {
        request.add(new UpdateRequest(index, type, myClass.getId()).script(script));
    }
    client.bulk(request, RequestOptions.DEFAULT);
}
UPDATE
It turns out that setting the type on the request was the problem:
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
    .setDocTypes(type) <--------------- remove this line
    .....
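With setDocTypes removed, the original update-by-query call works as-is. A minimal sketch of the corrected request, reusing the index, client, and parameter variables from the question:

// Corrected update-by-query: identical to the question's code, minus setDocTypes(type).
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");

UpdateByQueryRequest request = new UpdateByQueryRequest(index);
request.setScript(new Script(ScriptType.INLINE,
    "painless",
    "ctx._source.path = ctx._source.path.replace(params.old, params.new);",
    parameters));
request.setQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"));
client.updateByQuery(request, RequestOptions.DEFAULT);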

Related

Jooq dsl one to many relations

I'm using Spring Data + jOOQ DSL. As the result entity I'm not using a jOOQ-generated entity but a simple one, without any annotations. For a one-to-many relation I get this result:
[{
  "id": 2,
  "name": "James",
  "addresses": [
    {
      "id": null,
      "country": null,
      "street": null
    }
  ]
}]
Is there any way to return an empty array for addresses?
My code to perform the request:
public Set<User> getUserById(Set<Long> id) {
    Set<User> result = new HashSet<>();
    ResultQuery users = dsl.select(
            field("u.id", Long.class).as("id"),
            field("u.name", String.class).as("name"),
            field("a.id", Long.class).as("addresses_id"),
            field("a.country", String.class).as("addresses_country"),
            field("a.street", String.class).as("addresses_street")
        ).from("schema.user_table u")
        .leftJoin("schema.address_table a")
        .on("u.id = a.user_id")
        .where(field("u.id").in(id));
    try (ResultSet rs = users.fetchResultSet()) {
        JdbcMapper<User> mapper = JdbcMapperFactory
            .newInstance()
            .addKeys("id")
            .newMapper(User.class);
        result = mapper.stream(rs).collect(Collectors.toSet());
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}
Why not just use SQL/JSON to produce JSON documents directly from within your database?
public String getUserById(Set<Long> id) {
    return dsl.select(coalesce(
            jsonArrayAgg(jsonObject(
                key("id").value(field("u.id", Long.class)),
                key("name").value(field("u.name", String.class)),
                key("addresses").value(coalesce(
                    jsonArrayAgg(jsonObject(
                        key("id").value(field("a.id", Long.class)),
                        key("country").value(field("a.country", String.class)),
                        key("street").value(field("a.street", String.class))
                    )),
                    jsonArray()
                ))
            )),
            jsonArray()
        ))
        .from("schema.user_table u")
        .leftJoin("schema.address_table a")
        .on("u.id = a.user_id")
        .where(field("u.id").in(id))
        .fetchSingle().value1().data();
}
If you really need the intermediate User representation, then you can either:
Use Jackson or Gson to map the JSON document to the nested User DTO structure using reflection (works with jOOQ 3.14)
Use jOOQ 3.15's new MULTISET value constructor operator or MULTISET_AGG aggregate function along with ad-hoc converters, see below:
public Set<User> getUserById(Set<Long> id) {
    return dsl.select(
            field("u.id", Long.class),
            field("u.name", String.class),
            multisetAgg(
                field("a.id", Long.class),
                field("a.country", String.class),
                field("a.street", String.class)
            ).convertFrom(r -> r == null
                ? Collections.<Address>emptyList()
                : r.map(Records.mapping(Address::new)))
        )
        .from("schema.user_table u")
        .leftJoin("schema.address_table a")
        .on("u.id = a.user_id")
        .where(field("u.id").in(id))
        .fetchSet(Records.mapping(User::new));
}
Side note on code generation and execution
While not strictly relevant to this question, unless your schema is dynamic (not known at compile time), I really urge you to reconsider using source code generation. If you're not using it, you're missing out on a lot of jOOQ API advantages, just like when you're executing a jOOQ query with something other than jOOQ.
For me, what worked was specifying the address id as an additional key:
.addKeys("id", "addresses_id")
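For context, a minimal sketch of the full SimpleFlatMapper setup with both keys, reusing the column aliases from the question:

// Keying on both the root "id" and the joined "addresses_id" tells
// SimpleFlatMapper to treat all-null address rows (produced by the left
// join) as "no address" instead of materializing a null-filled Address.
JdbcMapper<User> mapper = JdbcMapperFactory
    .newInstance()
    .addKeys("id", "addresses_id")
    .newMapper(User.class);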

ElasticSearch Indexing 100K documents with BulkRequest API using Java RestHighLevelClient

I am reading 100k+ file paths from the index documents_qa using the scroll API. The actual files are on my local D:\ drive. Using each file path I read the actual file, convert it to Base64, and reindex a document with the Base64 content (of the file) into another index, document_attachment_qa.
My current implementation reads a filePath, converts the file to Base64, and indexes each document along with its fileContent one by one. This takes a long time; for example, indexing 4000 documents takes more than 6 hours, and the connection also terminates with an IO exception.
So now I want to index the documents using the BulkRequest API, but I am using RestHighLevelClient and I am not sure how to use the BulkRequest API together with it.
Please find my current implementation, which indexes one document at a time:
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
String id = Long.toString(doc.getId());
IndexRequest request = new IndexRequest(ATTACHMENT, "doc", id) // ATTACHMENT is the index name
    .source(jsonMap) // my single document
    .setPipeline(ATTACHMENT);
IndexResponse response = SearchEngineClient.getInstance3().index(request); // increased timeout
I found the documentation below for bulk requests:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk.html
But it is based on BulkRequestBuilder bulkRequest = client.prepareBulk();, and I am not sure how to do the equivalent when using RestHighLevelClient.
UPDATE 1
I am trying to index all 100K documents in one shot, so I create one JSONArray and put all my JSONObjects into it one by one. Finally I try to build a BulkRequest, add all my documents (the JSONArray) as its source, and index them.
Here I am not sure how to convert my JSONArray to a List of Strings.
private final static String ATTACHMENT = "document_attachment_qa";
private final static String TYPE = "doc";

JSONArray reqJSONArray = new JSONArray();
while (searchHits != null && searchHits.length > 0) {
    ...
    ...
    jsonMap = new HashMap<String, Object>();
    jsonMap.put("id", doc.getId());
    jsonMap.put("app_language", doc.getApp_language());
    jsonMap.put("fileContent", result);
    reqJSONArray.put(jsonMap);
}
String actionMetaData = String.format("{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }%n", ATTACHMENT, TYPE);
List<String> bulkData = // not sure how to convert a list of my documents in JSON strings
StringBuilder bulkRequestBody = new StringBuilder();
for (String bulkItem : bulkData) {
    bulkRequestBody.append(actionMetaData);
    bulkRequestBody.append(bulkItem);
    bulkRequestBody.append("\n");
}
HttpEntity entity = new NStringEntity(bulkRequestBody.toString(), ContentType.APPLICATION_JSON);
try {
    Response response = SearchEngineClient.getRestClientInstance().performRequest("POST", "/ATTACHMENT/TYPE/_bulk", Collections.emptyMap(), entity);
    return response.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
} catch (Exception e) {
    // do something
}
You can just use new BulkRequest() and add requests to it, without a BulkRequestBuilder, like:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("foo", "bar", "1")
    .source(XContentType.JSON, "field", "foobar"));
request.add(new IndexRequest("foo", "bar", "2")
    .source(XContentType.JSON, "field", "foobar"));
...
BulkResponse bulkResponse = myHighLevelClient.bulk(request, RequestOptions.DEFAULT);
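Adapted to the question's scroll loop, a hedged sketch (doc, result, ATTACHMENT, TYPE, and SearchEngineClient come from the question; the batch size of 1000 is an assumption, since sending all 100K documents in one request would likely hit request-size limits):

// Sketch: collect IndexRequests into a BulkRequest inside the scroll loop
// instead of a JSONArray, flushing every 1000 documents (assumed batch size).
BulkRequest bulkRequest = new BulkRequest();
while (searchHits != null && searchHits.length > 0) {
    // ... read the file and convert it to Base64 as before ...
    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("id", doc.getId());
    jsonMap.put("app_language", doc.getApp_language());
    jsonMap.put("fileContent", result);
    bulkRequest.add(new IndexRequest(ATTACHMENT, TYPE, Long.toString(doc.getId()))
        .source(jsonMap)
        .setPipeline(ATTACHMENT));
    if (bulkRequest.numberOfActions() >= 1000) { // flush a full batch
        SearchEngineClient.getInstance3().bulk(bulkRequest, RequestOptions.DEFAULT);
        bulkRequest = new BulkRequest();
    }
}
if (bulkRequest.numberOfActions() > 0) { // flush the remainder
    SearchEngineClient.getInstance3().bulk(bulkRequest, RequestOptions.DEFAULT);
}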
In addition to @chengpohi's answer, I would like to add the points below:
A BulkRequest can be used to execute multiple index, update and/or delete operations using a single request.
It requires at least one operation to be added to the Bulk request:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("posts", "doc", "1")
    .source(XContentType.JSON, "field", "foo"));
request.add(new IndexRequest("posts", "doc", "2")
    .source(XContentType.JSON, "field", "bar"));
request.add(new IndexRequest("posts", "doc", "3")
    .source(XContentType.JSON, "field", "baz"));
Note: The Bulk API supports only documents encoded in JSON or SMILE.
Providing documents in any other format will result in an error.
Synchronous Operation:
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
Here client is the RestHighLevelClient, and the call blocks until the bulk request completes.
Asynchronous Operation (Recommended Approach):
client.bulkAsync(request, RequestOptions.DEFAULT, listener);
The asynchronous execution of a bulk request requires both the BulkRequest instance and an ActionListener instance to be passed to the asynchronous method.
Listener Example:
ActionListener<BulkResponse> listener = new ActionListener<BulkResponse>() {
    @Override
    public void onResponse(BulkResponse bulkResponse) {
        // invoked when the bulk request completes successfully
    }

    @Override
    public void onFailure(Exception e) {
        // invoked when the bulk request fails as a whole
    }
};
The returned BulkResponse contains information about the executed operations and allows iterating over each result, as follows:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
    DocWriteResponse itemResponse = bulkItemResponse.getResponse();
    if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
            || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
        IndexResponse indexResponse = (IndexResponse) itemResponse;
    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
        UpdateResponse updateResponse = (UpdateResponse) itemResponse;
    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
        DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    }
}
The following arguments can optionally be provided:
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
I hope this helps.

How to use QueryStringQueryBuilder

I've tried to use QueryStringQueryBuilder in a very simple case, but I don't understand why I get a different result than the one I get from Kibana. What am I doing wrong?
Kibana:
GET .../_search
{
  "query": {
    "query_string": {
      "query": "\"this is a query\"",
      "lenient": true,
      "default_operator": "OR"
    }
  }
}
Java:
private Optional<QueryStringQueryBuilder> parseQuery(String query) {
    if (query.equals("")) {
        return Optional.empty();
    }
    QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery(query);
    queryBuilder.lenient(true);
    queryBuilder.defaultOperator(Operator.OR);
    return Optional.of(queryBuilder);
}
Result from Kibana: totalHits = 3336. Result from Java: totalHits = 10018.
EDIT:
This method calls parseQuery. Input is "this is a query".
public Optional<SearchRequestBuilder> getRequestBuilderByQuery(SearchQuery query) {
    SearchRequestBuilder builder = getBuilderWithMaxHits(query.getMaxHits());
    builder.setFetchSource(Globals.getFIELDS(query.isIncludeStory()), new String[0]);
    parseQuery(query.getQuery()).ifPresent(builder::setQuery);
    return Optional.of(builder);
}
I don't know what your input parameter 'query' contains in this case, but I think you want to set queryBuilder.queryName(String queryName) on your QueryStringQueryBuilder.
From the JavaDocs:
queryName(String queryName):
Sets the query name for the filter that can be used when searching for matched_filters per hit.
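For reference, a hedged sketch of applying that setter (the name is illustrative; as the next answer shows, the real cause turned out to be the quoting, not a missing query name):

// Hypothetical: naming the query so the per-hit matched_queries section
// reports which named query matched a given document.
QueryStringQueryBuilder named = QueryBuilders.queryStringQuery("this is a query")
    .queryName("myQuery");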
OK, I found the problem.
The query for Kibana had quotation marks, which meant it was processed as a phrase rather than a plain string. QueryStringQueryBuilder apparently parses the query itself, honoring "", AND, OR, and NOT. This is magic!
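For illustration, a hedged sketch of sending the same phrase query from Java by embedding the escaped quotes (the builder settings are copied from the question):

// Embedding the quotes makes query_string parse the input as a phrase,
// matching the Kibana request above.
QueryStringQueryBuilder queryBuilder = QueryBuilders.queryStringQuery("\"this is a query\"")
    .lenient(true)
    .defaultOperator(Operator.OR);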

Scan and filter on nested map attributes in DynamoDB with Java SDK

I have a JSON document stored in an attribute called doc that looks something like this:
{
  "doc": {
    "foo": {
      "bar": "baz"
    }
  }
}
I'd like to be able to do a table scan and filter/search on doc.foo.bar == "baz". I'm using the Java SDK and have tried the following code, but it doesn't seem to work for a sub-map of a document:
String filterExpression = "#d.#f.#b = :val";
Map<String, String> nameMap = new HashMap<>();
nameMap.put("#d", "doc");
nameMap.put("#f", "foo");
nameMap.put("#b", "bar");
Map<String, Object> valueMap = new HashMap<>();
valueMap.put(":val", "baz");
ItemCollection<ScanOutcome> items = table.scan(
    new ScanSpec()
        .withFilterExpression(filterExpression)
        .withNameMap(nameMap)
        .withValueMap(valueMap));
EDIT - I have found that this works:
String filterExpression = "#d.foo.bar = :val";
where I use only a single expression attribute name, for the first path element. Any thoughts on why it doesn't work with three ExpressionAttributeNames? What if I actually needed all three, e.g. because they were reserved words?
Any help or suggestions greatly appreciated. Thanks.

Spring Data MongoDB and Bulk Update

I am using Spring Data MongoDB and would like to perform a Bulk Update just like the one described here: http://docs.mongodb.org/manual/reference/method/Bulk.find.update/#Bulk.find.update
When using the regular driver, it looks like this:
The following example initializes a Bulk() operations builder for the items collection, and adds various multi update operations to the list of operations.
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { status: "D" } ).update( { $set: { status: "I", points: "0" } } );
bulk.find( { item: null } ).update( { $set: { item: "TBD" } } );
bulk.execute()
Is there any way to achieve a similar result with Spring Data MongoDB?
Bulk updates are supported from spring-data-mongodb 1.9.0.RELEASE. Here is a sample:
BulkOperations ops = template.bulkOps(BulkMode.UNORDERED, Match.class);
for (User user : users) {
    Update update = new Update();
    ...
    ops.updateOne(query(where("id").is(user.getId())), update);
}
ops.execute();
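For completeness, a minimal concrete version of that loop (the status field and the use of User.class for the entity mapping are hypothetical):

// Hypothetical concrete sketch: one updateOne per user, sent to the
// server as a single unordered bulk write.
BulkOperations ops = template.bulkOps(BulkMode.UNORDERED, User.class);
for (User user : users) {
    Update update = new Update().set("status", user.getStatus()); // hypothetical field
    ops.updateOne(query(where("id").is(user.getId())), update);
}
ops.execute();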
You can use this as long as the driver is current and the server you are talking to is at least MongoDB 2.6, which is required for bulk operations. I don't believe there is anything directly in Spring Data right now (and much the same for other higher-level driver abstractions), but you can of course access the native driver collection object that implements access to the Bulk API:
DBCollection collection = mongoOperation.getCollection("collection");
BulkWriteOperation bulk = collection.initializeOrderedBulkOperation();

bulk.find(new BasicDBObject("status", "D"))
    .update(new BasicDBObject(
        "$set", new BasicDBObject("status", "I").append("points", 0)
    ));

bulk.find(new BasicDBObject("item", null))
    .update(new BasicDBObject(
        "$set", new BasicDBObject("item", "TBD")
    ));

BulkWriteResult writeResult = bulk.execute();
System.out.println(writeResult);
You can either fill in the required DBObject types by defining them directly, or use the builders supplied in the Spring Mongo library, which all support "extracting" the DBObject that they build.
public <T> void bulkUpdate(String collectionName, List<T> documents, Class<T> tClass) {
    BulkOperations bulkOps = mongoTemplate.bulkOps(BulkOperations.BulkMode.UNORDERED, tClass, collectionName);
    for (T document : documents) {
        Document doc = new Document();
        mongoTemplate.getConverter().write(document, doc);
        org.springframework.data.mongodb.core.query.Query query = new org.springframework
            .data.mongodb.core.query.Query(Criteria.where(UNDERSCORE_ID).is(doc.get(UNDERSCORE_ID)));
        Document updateDoc = new Document();
        updateDoc.append("$set", doc);
        Update update = Update.fromDocument(updateDoc, UNDERSCORE_ID);
        bulkOps.upsert(query, update);
    }
    bulkOps.execute();
}
The Spring MongoTemplate is used to perform the update. The above code works as long as each document in the list provides its _id field.
