ElasticSearch Indexing 100K documents with BulkRequest API using Java RestHighLevelClient


I am reading more than 100K file paths from the index documents_qa using the scroll API. The actual files are available on my local D:\ drive. Using each file path, I read the actual file, convert it to base64, and reindex the base64 content of the file into another index, document_attachment_qa.
In my current implementation I read the filePath, convert the file to base64, and index each document along with its fileContent one at a time. This is very slow: indexing 4000 documents takes more than 6 hours, and the connection also terminates with an IO exception.
So now I want to index the documents using the BulkRequest API, but I am using RestHighLevelClient and I am not sure how to use the BulkRequest API with it.
Here is my current implementation, which indexes the documents one by one.
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
String id=Long.toString(doc.getId());
IndexRequest request = new IndexRequest(ATTACHMENT, "doc", id ) // ATTACHMENT is the index name
.source(jsonMap) // Its my single document.
.setPipeline(ATTACHMENT);
IndexResponse response = SearchEngineClient.getInstance3().index(request); // increased timeout
I found the below documentation for BulkRequest.
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk.html
But I am not sure how to do the equivalent of BulkRequestBuilder bulkRequest = client.prepareBulk(); when using RestHighLevelClient.
UPDATE 1
I am trying to index all 100K documents in one shot, so I create one JSONArray and put all my JSONObjects into it one by one. Finally I try to build a bulk request with all my documents (the JSONArray) as the source and index them.
Here I am not sure how to convert my JSONArray to a List of Strings.
private final static String ATTACHMENT = "document_attachment_qa";
private final static String TYPE = "doc";
JSONArray reqJSONArray=new JSONArray();
while (searchHits != null && searchHits.length > 0) {
...
...
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
reqJSONArray.put(jsonMap);
}
String actionMetaData = String.format("{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }%n", ATTACHMENT, TYPE);
List<String> bulkData = // not sure how to convert a list of my documents in JSON strings
StringBuilder bulkRequestBody = new StringBuilder();
for (String bulkItem : bulkData) {
bulkRequestBody.append(actionMetaData);
bulkRequestBody.append(bulkItem);
bulkRequestBody.append("\n");
}
HttpEntity entity = new NStringEntity(bulkRequestBody.toString(), ContentType.APPLICATION_JSON);
try {
Response response = SearchEngineClient.getRestClientInstance().performRequest("POST", "/" + ATTACHMENT + "/" + TYPE + "/_bulk", Collections.emptyMap(), entity); // build the path from the constants, not the literal text "ATTACHMENT/TYPE"
return response.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
} catch (Exception e) {
// do something
}
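For the conversion step, here is a minimal sketch, assuming each document is already serialized as single-line JSON (with org.json, something like reqJSONArray.getJSONObject(i).toString() should give that). BulkBodyBuilder and build are hypothetical helper names, not part of any Elasticsearch client. The sketch only assembles the newline-delimited body that the _bulk endpoint expects: one action line followed by one document line, each terminated by \n.

```java
import java.util.List;

// Hypothetical helper: builds a newline-delimited _bulk body from JSON strings.
final class BulkBodyBuilder {

    static String build(String index, String type, List<String> jsonDocs) {
        // The same action line is reused for every document (no explicit _id here).
        String action = String.format(
                "{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }", index, type);
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Each document must fit on one line, otherwise the body is ambiguous.
            if (doc.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("document must be single-line JSON");
            }
            body.append(action).append('\n');
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

The returned string could then be wrapped in the NStringEntity used above.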

You can simply create a new BulkRequest() and add the requests to it without using BulkRequestBuilder, like this:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("foo", "bar", "1")
.source(XContentType.JSON,"field", "foobar"));
request.add(new IndexRequest("foo", "bar", "2")
.source(XContentType.JSON,"field", "foobar"));
...
BulkResponse bulkResponse = myHighLevelClient.bulk(request, RequestOptions.DEFAULT);
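Sending all 100K base64 documents in a single BulkRequest is likely to run into memory or request-size limits (http.max_content_length defaults to 100mb), so it usually helps to send them in batches. Below is a minimal sketch of the batching step only, in plain Java. BulkBatcher, partition, and the two thresholds are hypothetical names and values, and the actual bulk call for each batch is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups serialized documents into bounded batches.
final class BulkBatcher {

    // Starts a new batch when either the document count or the
    // approximate UTF-8 size of the current batch would be exceeded.
    static List<List<String>> partition(List<String> docs, int maxDocs, long maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            boolean full = current.size() >= maxDocs || currentBytes + size > maxBytes;
            if (!current.isEmpty() && full) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single oversized document still gets a batch of its own.
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then become one BulkRequest (one IndexRequest per document), sent with the bulk call shown above.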

In addition to @chengpohi's answer, I would like to add the following points:
A BulkRequest can be used to execute multiple index, update and/or delete operations using a single request.
It requires at least one operation to be added to the Bulk request:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("posts", "doc", "1")
.source(XContentType.JSON,"field", "foo"));
request.add(new IndexRequest("posts", "doc", "2")
.source(XContentType.JSON,"field", "bar"));
request.add(new IndexRequest("posts", "doc", "3")
.source(XContentType.JSON,"field", "baz"));
Note: The Bulk API supports only documents encoded in JSON or SMILE.
Providing documents in any other format will result in an error.
Synchronous Operation:
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
Here client is the High-Level REST Client, and the call blocks until the bulk response is returned.
Asynchronous Operation (Recommended Approach):
client.bulkAsync(request, RequestOptions.DEFAULT, listener);
The asynchronous execution of a bulk request requires both the BulkRequest instance and an ActionListener instance to be passed to the asynchronous method.
Listener Example:
ActionListener<BulkResponse> listener = new ActionListener<BulkResponse>() {
@Override
public void onResponse(BulkResponse bulkResponse) {
}
@Override
public void onFailure(Exception e) {
}
};
The returned BulkResponse contains information about the executed operations and allows you to iterate over each result as follows:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
DocWriteResponse itemResponse = bulkItemResponse.getResponse();
if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
|| bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
IndexResponse indexResponse = (IndexResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
UpdateResponse updateResponse = (UpdateResponse) itemResponse;
} else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
}
}
The following arguments can optionally be provided:
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
I hope this helps.

Related

Elasticsearch wildcard query in Java - find all matching fields and replace

I want to update all path fields starting with "example/one".
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
.setDocTypes(type)
.setScript(new Script(ScriptType.INLINE,
"painless",
"ctx._source.path = ctx._source.path.replace(params.old, params.new);",
parameters))
.setQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"));
client.updateByQuery(request, RequestOptions.DEFAULT);
It's not working (no update, no errors - tried a prefixQuery, same). The following query however is updating the matching documents (Postman).
POST my_index/_update_by_query
{
"script": {
"lang": "painless",
"inline": "ctx._source.path = ctx._source.path.replace(\"example/one\", \"new/example\")"
},
"query": {
"wildcard": {
"path.tree": {
"value": "example/one*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
What am I missing? The Path hierarchy tokenizer is used on the field path.
Your help is much needed.
PS: I can't upgrade to a newer version of elasticsearch.

When testing the solution, I first thought it was related to the custom analyser used on the path field. But it was quickly discarded as I was getting the expected result via Postman.
I finally decided to go with a 'two steps' solution (couldn't use the update by query API). First search for all matching documents, then perform a bulk update.
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"))
.withSourceFilter(new FetchSourceFilter(new String[]{"_id", "path"}, null))
.build();
List<MyClass> result = elasticsearchRestTemplate.queryForList(query, MyClass.class);
if(!CollectionUtils.isEmpty(result)) {
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");
Script script = new Script(ScriptType.INLINE,
"painless",
"ctx._source.path = ctx._source.path.replace(params.old, params.new)",
parameters);
BulkRequest request = new BulkRequest();
for (MyClass myClass : result) {
request.add(new UpdateRequest(index, type, myClass.getId()).script(script));
}
client.bulk(request, RequestOptions.DEFAULT);
}
UPDATE
Turns out setting the type in the request was the problem.
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
.setDocTypes(type) // <-- remove this call
.....

How to get cosmos response as a JSON Array in Java using azure-cosmos sdk

I can run a query in the Azure Cosmos DB Data Explorer and see the response as a JSON array.
I want to do the same from Java with the azure-cosmos SDK.
Below is my function:
public JSONArray getCosmosResponseFromSyncClient(String databaseName, String containerName, String sqlQuery) {
try {
cosmosClient = new CosmosClientBuilder().endpoint(cosmosURI).key(cosmosPrimaryKey).buildClient();
CosmosDatabase database = cosmosClient.getDatabase(databaseName);
CosmosContainer container = database.getContainer(containerName);
int preferredPageSize = 10;
CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
queryOptions.setQueryMetricsEnabled(true);
CosmosPagedIterable<JSONArray> responsePagedIterable = container.queryItems(sqlQuery, queryOptions, JSONArray.class);
return cosmosQueryResponseObjectAsAJSONArray;
}finally {
cosmosClient.close();
}
}

Assuming that org.json.JSONArray is used for JSONArray, you can use Async API in Cosmos DB V4 SDK. The cosmosAsyncClient really should be built outside your method and re-used by all threads calling the method. See sample here for creating async client properly and consuming from multiple methods.
cosmosAsyncClient = new CosmosClientBuilder().endpoint(cosmosURI).key(cosmosPrimaryKey).buildAsyncClient();
CosmosAsyncDatabase database = cosmosAsyncClient.getDatabase(databaseName);
CosmosAsyncContainer container = database.getContainer(containerName);
Your method should look like this:
public JSONArray getCosmosResponseFromAsyncClient(String sqlQuery) {
int preferredPageSize = 10;
CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
queryOptions.setQueryMetricsEnabled(true);
CosmosPagedFlux<JsonNode> pagedFlux = container.queryItems(sqlQuery, queryOptions,
JsonNode.class);
List<JsonNode> cosmosQueryResponseObjectAsAJSONArray = pagedFlux.byPage(preferredPageSize)
// Flatten every page into individual items; calling blockLast() on per-page lists would keep only the final page.
.flatMap(pagedFluxResponse -> Flux.fromIterable(pagedFluxResponse.getResults()))
.onErrorResume((exception) -> {
logger.error(
"Exception. e: {}",
exception.getLocalizedMessage(),
exception);
return Flux.empty(); // on error, the result is an empty list rather than null
})
.collectList()
.block();
return new JSONArray(cosmosQueryResponseObjectAsAJSONArray.toString());
}

How can I convert ArrayList/Set to JSON and post data using postforobject method?

I have a Set containing the strings ["a", "b", "c"], and I want to POST JSON data in which the set is joined into a single comma-separated string, like this:
{"view" : "a,b,c",
"fruits" : "apple"}
How can I send this to the endpoint using RestTemplate's postForObject method? I have tried GSON, but it does not work in my project. Are there any other alternatives?
Here is my code:
private void run(Set<Data> datas) {
Set<String> stack = new HashSet<>();
Iterator<Data> itr = datas.iterator();
while (itr.hasNext()) {
Data macro = itr.next();
if (/* some condition */) {
stack.add(macro);
}
}
restTemplate.getMessageConverters().add(stringConverter);
String result = restTemplate.postForObject(endpoint, request, String.class);
}

If the data has a fixed, class-like shape, you could go with the POJO approach that Spring Boot encourages. But looking at your example, it seems you just want to build a one-off JSON object.
import org.json.simple.JSONObject;
public static void run(Set<Data> datas, String endpoint) {
// build your 'stack' set
String joined = String.join(",", stack);
JSONObject obj = new JSONObject();
obj.put("view", joined);
obj.put("fruits", "apple");
// return the JSONObject as the response to your entrypoint using your method
}
You could also try the following if you use the @ResponseBody annotation in Spring Boot, which converts the response body to the appropriate (JSON) format.
HashMap<String, String> map = new HashMap<>();
map.put("view", joined);
map.put("fruits", "apple");
return map;
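Putting the two pieces together, here is a minimal plain-Java sketch of building the payload from the set. ViewPayload and build are hypothetical names. An insertion-ordered set such as LinkedHashSet is assumed so the join order is predictable (a HashSet gives no ordering guarantee). With Jackson on the classpath, passing such a Map as the request object to RestTemplate's postForObject should send it as JSON, though that depends on the configured message converters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns a set of strings into the flat payload from the question.
final class ViewPayload {

    static Map<String, String> build(Set<String> views, String fruit) {
        Map<String, String> payload = new LinkedHashMap<>();
        // String.join accepts any Iterable of CharSequence, so the Set can be passed directly.
        payload.put("view", String.join(",", views));
        payload.put("fruits", fruit);
        return payload;
    }
}
```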

Apache Kafka - Implementing a KTable

I am new to the Kafka Streams API and I am trying to create a KTable. I have an input topic, s-order-topic, whose messages are in JSON format, as shown below.
{ "current_ts": "2019-12-24 13:16:40.316952",
"primary_keys": ["ID"],
"before": null,
"tokens": {"txid":"3.17.2493",
"csn":"64913009"},
"op_type":"I",
"after": { "CODE":"AAAA41",
"STATUS":"COMPLETED",
"ID":24},
"op_ts":"2019-12-24 13:16:40.316941",
"table":"S_ORDER"}
I read messages from this topic and I want to create a KTable that has as key, the field "after":"ID" and for value all the fields inside the "after" field (except for "ID").
I have successfully created a KTable only when I use the default aggregate functions i.e count. But I have difficulty creating my own aggregate function. Below I present the part of the code that I try to create the KTable.
KTable<Long, String> s_table = builder.stream("s-order-topic", Consumed.with(Serdes.Long(),Serdes.String()))
.mapValues(value -> {
String time;
JSONObject json = new JSONObject(value);
if (json.getString("op_type").equals("I")) {
time = "after";
}else {
time = "before";
}
JSONObject json2 = new JSONObject(json.getJSONObject(time).toString());
return json2.toString();
})
.groupBy((key, value) -> {
JSONObject json = new JSONObject(value);
return json.getLong("ID");
}, Grouped.with(Serdes.Long(), Serdes.String()))
.aggregate( ... );
How can I implement this KTable?
Am I approaching the problem correctly?
(mapValues -> keep only the "before"/"after" field. groupBy -> Make the ID the key of the message. Aggregate -> ? )

I figured out a solution for my case. I implemented the KTable as shown below:
KTable<String, String> s_table = builder.stream("s-order-topic", Consumed.with(Serdes.String(),Serdes.String()))
.mapValues(value -> {
String time;
JSONObject json = new JSONObject(value);
if (json.getString("op_type").equals("I")) {
time = "after";
}else {
time = "before";
}
JSONObject json2 = new JSONObject(json.getJSONObject(time).toString());
return json2.toString();
})
.groupBy((key, value) -> {
JSONObject json = new JSONObject(value);
return String.valueOf(json.getLong("ID"));
}, Grouped.with(Serdes.String(), Serdes.String()))
.reduce((prev,newval)->newval);
The aggregate function is not suitable for this case; instead I used the reduce function, which simply keeps the newest value for each key.
The output from the console consumer is shown below:
15 {"CODE":"AAAA17","STATUS":"PENDING","ID":15}
18 {"CODE":"AAAA50","STATUS":"SUBMITTED","ID":18}
4 {"CODE":"AAAA80","STATUS":"SUBMITTED","ID":4}
19 {"CODE":"AAAA83","STATUS":"SUBMITTED","ID":19}
18 {"CODE":"AAAA33","STATUS":"COMPLETED","ID":18}
5 {"CODE":"AAAA38","STATUS":"PENDING","ID":5}
10 {"CODE":"AAAA1","STATUS":"COMPLETED","ID":10}
3 {"CODE":"AAAA68","STATUS":"NOT COMPLETED","ID":3}
9 {"CODE":"AAAA89","STATUS":"PENDING","ID":9}
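The reduce((prev, newval) -> newval) step keeps only the most recent value per key. Assuming the console output above is in arrival order, that is why ID 18 appears twice in the output while the table itself would end up holding only the COMPLETED record. Here is a plain-Java analogy of that semantics; it is not the Kafka Streams API, and LatestValueTable is a hypothetical class used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the table state: the latest value per key wins.
final class LatestValueTable {
    private final Map<String, String> state = new LinkedHashMap<>();

    // Mirrors reduce((prev, newVal) -> newVal): a newer record replaces the older one.
    void apply(String key, String value) {
        state.merge(key, value, (prev, newVal) -> newVal);
    }

    String get(String key) {
        return state.get(key);
    }

    int size() {
        return state.size();
    }
}
```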

How to fix JSON format issue in JAVA?

This is my first project in Java Spring, so I am trying to figure out the best way to do things.
I have several REST APIs in my project, and they send different kinds of API responses.
In some places I get the data as a List, in others in another format, so I am trying to figure out the best way to send responses in JSON format.
One of the API responses I have is this:
{
"result": "true",
"message": null,
"data": "{\"id\":1,\"firstName\":\"test\",\"lastName\":\"test\",\"emailId\":\"test@test.com\",\"mobileNo\":\"1234567890\",\"alternateMobileNo\":\"1234567890\",\"username\":\"test\",\"password\":\"7c4a8d09ca3762af61e59520943dc26494f8941b\",\"status\":\"active\",\"userRole\":\"test\",\"dateCreated\":\"Feb 6, 2019\",\"permissions\":\"\"}"
}
My biggest issue is the formatting of data key in the above JSON.
This is my controller action:
@RequestMapping(value = "/admin/staff/get", method = RequestMethod.POST, consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
public Map get(HttpServletRequest request, @RequestParam Map<String, String> parameters) {
Map<String, String> response = new HashMap<>();
Gson gson = new Gson();
Staff staff = new Staff();
staff.setId(new Integer(parameters.get("id")));
List validateToken = loginAuthTokenService.validateToken(new Integer(request.getHeader("loginId")), request.getHeader("loginType"), request.getHeader("token"));
if (validateToken.size() > 0) {
Staff staffDetails = staffService.getStaff(staff.getId());
response.put("result", "true");
response.put("data", gson.toJson(staffDetails));
} else {
response.put("result", "false");
response.put("message", "No records found.");
}
return response;
}
Should I create a separate class for sending API responses? Please guide me on the proper way of sending a response.
Thanks

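Gson#toJson(Object) returns a String, and that String is stored as the value of the "data" key in your map, so it is serialized as a JSON string with escaped quotes instead of as a nested object.
You don't have to convert your object to JSON yourself; Spring will do it for you (it uses Jackson as its JSON mapper, so you don't have to add the Gson dependency to your project).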
A simple and working implementation could be something like:
@RequestMapping(value = "/admin/staff/get", method = RequestMethod.POST, consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
public ResponseEntity<?> get(
@RequestParam("id") Integer id,
@RequestHeader("loginId") Integer loginId,
@RequestHeader("loginType") String loginType,
@RequestHeader("token") String token) {
List validateToken = loginAuthTokenService.validateToken(loginId, loginType, token);
if (!validateToken.isEmpty()) {
Staff staff = staffService.getStaff(id);
return ResponseEntity.ok(staff);
}
// notFound() returns a builder without body(), so set the status explicitly
return ResponseEntity.status(HttpStatus.NOT_FOUND).body("No records found.");
}
Also consider returning the Staff object your front-end needs rather than a generic map. In case of failure you should return a failure object with a specific HTTP response code (e.g. 404, 400, 500...).
Take a look at this guide.
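On the question of a separate response class: one common option is a small generic wrapper, so that data stays a real object and is serialized as nested JSON rather than as an escaped string. Below is a minimal sketch, assuming Jackson's default getter-based serialization. ApiResponse and its factory methods are hypothetical names, not something Spring provides.

```java
// Hypothetical response envelope; in a real project it would be a public class in its own file.
final class ApiResponse<T> {
    private final boolean result;
    private final String message;
    private final T data;

    private ApiResponse(boolean result, String message, T data) {
        this.result = result;
        this.message = message;
        this.data = data;
    }

    // Success: carries the payload object itself, not a pre-serialized string.
    static <T> ApiResponse<T> ok(T data) {
        return new ApiResponse<>(true, null, data);
    }

    // Failure: carries only a message.
    static <T> ApiResponse<T> fail(String message) {
        return new ApiResponse<>(false, message, null);
    }

    public boolean isResult() { return result; }
    public String getMessage() { return message; }
    public T getData() { return data; }
}
```

The controller could then return ResponseEntity.ok(ApiResponse.ok(staffDetails)) on success and an ApiResponse.fail(...) body with an error status otherwise.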

To format the data attribute, you can store it in a map:
Map<String, Object> map1= new HashMap<String, Object>();
and if you have multiple data attributes you can create an ArrayList of Maps:
ArrayList<Map<String, Object>> dataClerk = new ArrayList<Map<String,Object>>();
I had a similar use case, so I used the code below:
JSONObject jobj = (JSONObject) parser.parse(response);
JSONArray jsonarr_1 = (JSONArray) jobj.get(item);
for (int i = 0; i < jsonarr_1.size(); i++) {
Map<String, Object> entry = new HashMap<String, Object>();
org.json.simple.JSONObject temp = (org.json.simple.JSONObject) jsonarr_1.get(i);
Set<String> attributes = temp.keySet();
for (String s : attributes) {
entry.put(s, temp.get(s));
}
dataClerk.add(entry); // assumed: collect each entry in the list declared above
}
