How to query ES using a RabbitMQ Spark stream - Java

I am using Spark 2. I am trying to get a stream of search text from RabbitMQ and query against Elasticsearch.
params.put("hosts", "IP");
params.put("queueName", "query");
params.put("exchangeName", "Exchangequery");
params.put("vHost", "/");
params.put("userName", "test");
params.put("password", "test");
Function<byte[], String> messageHandler = new Function<byte[], String>() {
    public String call(byte[] message) {
        return new String(message);
    }
};
JavaReceiverInputDStream<String> messages = RabbitMQUtils.createJavaStream(jssc, String.class, params, messageHandler);
messages.foreachRDD();
The above code receives the stream from RabbitMQ, but I am not sure how to connect to Elasticsearch and query for each stream batch. One thing I do know: if I use messages.foreachRDD(); and query Elasticsearch for each input item individually, it will hurt performance.
I will always query Elasticsearch using only one field. For example, my stream messages have input like
apple
orange
I have an index in ES called fruit and I want to query like ?q=apple OR orange. I know I have to frame the query using should clauses in Elasticsearch. My question is: how can I query ES using the values received from the RabbitMQ stream?

The code below makes only one call to the Elasticsearch server (it constructs a single query with many should clauses):
public static void main(String[] args) throws UnknownHostException {
    Client client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300))
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));
    List<String> messages = new ArrayList<>();
    messages.add("apple");
    messages.add("orange");
    String index = "fruit";
    String fieldName = "fruit_type";
    BoolQueryBuilder query = QueryBuilders.boolQuery();
    for (String message : messages) {
        query.should(QueryBuilders.matchQuery(fieldName, message));
        // alternative if you are not analyzing fields
        // query.should(QueryBuilders.termQuery(fieldName, message));
    }
    int size = 60; // you may want to change this since it defaults to 10
    SearchResponse response = client.prepareSearch(index).setQuery(query).setSize(size).execute().actionGet();
    long totalHits = response.getHits().getTotalHits();
    System.out.println("Found " + totalHits + " documents");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSource());
    }
}
Query generated:
{
  "bool" : {
    "should" : [ {
      "match" : {
        "fruit_type" : {
          "query" : "apple",
          "type" : "boolean"
        }
      }
    }, {
      "match" : {
        "fruit_type" : {
          "query" : "orange",
          "type" : "boolean"
        }
      }
    } ]
  }
}
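To tie the two pieces together, one approach (a sketch, assuming the RabbitMQUtils stream above) is to collect each micro-batch inside foreachRDD and issue a single request for the whole batch. The helper below sketches that per-batch query construction as a raw JSON body equivalent to the bool/should query above; the class and method names are illustrative, not part of either library:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BatchQueryBuilder {
    // Builds the same bool/should body as the QueryBuilders code above, as a
    // raw JSON string, so a whole micro-batch needs only one ES round trip.
    // Inside foreachRDD you would collect the (small) batch of search terms
    // and pass them here.
    static String buildShouldQuery(String fieldName, List<String> messages) {
        Set<String> unique = new LinkedHashSet<>(messages); // dedupe repeated terms
        StringBuilder sb = new StringBuilder("{\"query\":{\"bool\":{\"should\":[");
        boolean first = true;
        for (String m : unique) {
            if (!first) sb.append(',');
            sb.append("{\"match\":{\"").append(fieldName).append("\":\"")
              .append(m.replace("\"", "\\\"")).append("\"}}");
            first = false;
        }
        return sb.append("]}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildShouldQuery("fruit_type", Arrays.asList("apple", "orange")));
    }
}
```

With this shape, the streaming job issues one search per batch interval rather than one per message, which addresses the performance concern in the question.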

Related

Elasticsearch wildcard query in Java - find all matching fields and replace

I want to update all path fields starting with "example/one".
Map<String, Object> parameters = new HashMap<>();
parameters.put("old", "example/one");
parameters.put("new", "new/example");
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
        .setDocTypes(type)
        .setScript(new Script(ScriptType.INLINE,
                "painless",
                "ctx._source.path = ctx._source.path.replace(params.old, params.new);",
                parameters))
        .setQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"));
client.updateByQuery(request, RequestOptions.DEFAULT);
It's not working (no update, no errors; I tried a prefixQuery, same result). The following query, however, does update the matching documents (via Postman):
POST my_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.path = ctx._source.path.replace(\"example/one\", \"new/example\")"
  },
  "query": {
    "wildcard": {
      "path.tree": {
        "value": "example/one*",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
What am I missing? The path hierarchy tokenizer is used on the path field.
Your help is much appreciated.
PS: I can't upgrade to a newer version of elasticsearch.
When testing the solution, I first thought it was related to the custom analyser used on the path field, but that was quickly ruled out, since I was getting the expected result via Postman.
I finally decided to go with a two-step solution (I couldn't use the update-by-query API): first search for all matching documents, then perform a bulk update.
NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.wildcardQuery("path.tree", "example/one*"))
        .withSourceFilter(new FetchSourceFilter(new String[]{"_id", "path"}, null))
        .build();
List<MyClass> result = elasticsearchRestTemplate.queryForList(query, MyClass.class);
if (!CollectionUtils.isEmpty(result)) {
    Map<String, Object> parameters = new HashMap<>();
    parameters.put("old", "example/one");
    parameters.put("new", "new/example");
    Script script = new Script(ScriptType.INLINE,
            "painless",
            "ctx._source.path = ctx._source.path.replace(params.old, params.new)",
            parameters);
    BulkRequest request = new BulkRequest();
    for (MyClass myClass : result) {
        request.add(new UpdateRequest(index, type, myClass.getId()).script(script));
    }
    client.bulk(request, RequestOptions.DEFAULT);
}
UPDATE
Turns out setting the type in the request was the problem.
UpdateByQueryRequest request = new UpdateByQueryRequest(index)
.setDocTypes(type) <--------------- Remove
.....

ElasticSearch Indexing 100K documents with BulkRequest API using Java RestHighLevelClient

I am reading 100K+ file paths from the index documents_qa using the scroll API. The actual files are available on my local D:\ drive. Using each file path, I read the actual file, convert it into Base64, and re-index the Base64 content (of the file) into another index, document_attachment_qa.
My current implementation reads the filePath, converts the file into Base64, and indexes each document along with its fileContent one by one. This is very slow; for example, indexing 4,000 documents takes more than 6 hours, and the connection also terminates with an IO exception.
So now I want to index the documents using the BulkRequest API, but I am using RestHighLevelClient and am not sure how to use the BulkRequest API with it.
Please find my current implementation, which indexes one document at a time:
jsonMap = new HashMap<String, Object>();
jsonMap.put("id", doc.getId());
jsonMap.put("app_language", doc.getApp_language());
jsonMap.put("fileContent", result);
String id=Long.toString(doc.getId());
IndexRequest request = new IndexRequest(ATTACHMENT, "doc", id) // ATTACHMENT is the index name
        .source(jsonMap) // my single document
        .setPipeline(ATTACHMENT);
IndexResponse response = SearchEngineClient.getInstance3().index(request); // increased timeout
I found the below documentation for BulkRequest.
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk.html
But I am not sure how to use the BulkRequestBuilder bulkRequest = client.prepareBulk(); approach when using the RestHighLevelClient.
UPDATE 1
I am trying to index all 100K documents in one shot, so I create one JSONArray and put all my JSONObjects into the array one by one. Finally I try to build a BulkRequest, add all my documents (the JSONArray) as the source, and index them.
Here I am not sure how to convert my JSONArray to a list of JSON strings.
private final static String ATTACHMENT = "document_attachment_qa";
private final static String TYPE = "doc";
JSONArray reqJSONArray = new JSONArray();
while (searchHits != null && searchHits.length > 0) {
    ...
    jsonMap = new HashMap<String, Object>();
    jsonMap.put("id", doc.getId());
    jsonMap.put("app_language", doc.getApp_language());
    jsonMap.put("fileContent", result);
    reqJSONArray.put(jsonMap);
}
String actionMetaData = String.format("{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\" } }%n", ATTACHMENT, TYPE);
List<String> bulkData = // not sure how to convert a list of my documents in JSON strings
StringBuilder bulkRequestBody = new StringBuilder();
for (String bulkItem : bulkData) {
    bulkRequestBody.append(actionMetaData);
    bulkRequestBody.append(bulkItem);
    bulkRequestBody.append("\n");
}
HttpEntity entity = new NStringEntity(bulkRequestBody.toString(), ContentType.APPLICATION_JSON);
try {
    Response response = SearchEngineClient.getRestClientInstance().performRequest("POST", "/" + ATTACHMENT + "/" + TYPE + "/_bulk", Collections.emptyMap(), entity);
    return response.getStatusLine().getStatusCode() == HttpStatus.SC_OK;
} catch (Exception e) {
    // do something
}
You can just use new BulkRequest() and add requests to it directly, without a BulkRequestBuilder, like:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("foo", "bar", "1")
        .source(XContentType.JSON, "field", "foobar"));
request.add(new IndexRequest("foo", "bar", "2")
        .source(XContentType.JSON, "field", "foobar"));
...
BulkResponse bulkResponse = myHighLevelClient.bulk(request, RequestOptions.DEFAULT);
In addition to @chengpohi's answer, I would like to add the points below:
A BulkRequest can be used to execute multiple index, update and/or delete operations using a single request.
It requires at least one operation to be added to the Bulk request:
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("posts", "doc", "1")
        .source(XContentType.JSON, "field", "foo"));
request.add(new IndexRequest("posts", "doc", "2")
        .source(XContentType.JSON, "field", "bar"));
request.add(new IndexRequest("posts", "doc", "3")
        .source(XContentType.JSON, "field", "baz"));
Note: The Bulk API supports only documents encoded in JSON or SMILE.
Providing documents in any other format will result in an error.
Synchronous Operation:
BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
Here client is the RestHighLevelClient, and execution is synchronous.
Asynchronous Operation(Recommended Approach):
client.bulkAsync(request, RequestOptions.DEFAULT, listener);
The asynchronous execution of a bulk request requires both the BulkRequest instance and an ActionListener instance to be passed to the asynchronous method.
Listener Example:
ActionListener<BulkResponse> listener = new ActionListener<BulkResponse>() {
    @Override
    public void onResponse(BulkResponse bulkResponse) {
    }
    @Override
    public void onFailure(Exception e) {
    }
};
The returned BulkResponse contains information about the executed operations and allows you to iterate over each result as follows:
for (BulkItemResponse bulkItemResponse : bulkResponse) {
    DocWriteResponse itemResponse = bulkItemResponse.getResponse();
    if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.INDEX
            || bulkItemResponse.getOpType() == DocWriteRequest.OpType.CREATE) {
        IndexResponse indexResponse = (IndexResponse) itemResponse;
    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.UPDATE) {
        UpdateResponse updateResponse = (UpdateResponse) itemResponse;
    } else if (bulkItemResponse.getOpType() == DocWriteRequest.OpType.DELETE) {
        DeleteResponse deleteResponse = (DeleteResponse) itemResponse;
    }
}
The following arguments can optionally be provided:
request.timeout(TimeValue.timeValueMinutes(2));
request.timeout("2m");
I hope this helps.
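One point the asker's use case still needs: 100K documents should not go into a single BulkRequest, or the request body becomes huge and timeouts return. A common pattern is to send fixed-size batches, each batch becoming one BulkRequest passed to client.bulk(...). The batching logic is plain Java; a minimal sketch (the batch size is an assumption to tune, and the high-level client's BulkProcessor can automate this flushing for you):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkChunker {
    // Splits a large list of documents into fixed-size chunks. Each chunk
    // would become one BulkRequest (request.add(new IndexRequest(...)) per
    // element) instead of one giant request for all 100K documents.
    static <T> List<List<T>> chunk(List<T> docs, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += size) {
            // copy the sublist so each chunk is independent of the source list
            chunks.add(new ArrayList<>(docs.subList(i, Math.min(i + size, docs.size()))));
        }
        return chunks;
    }
}
```

With, say, a chunk size of 1000, the 100K-document job becomes 100 bulk calls, which keeps each request small enough to avoid the IO-exception disconnects described in the question.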

Parsing unnamed nested arrays with minimal-json?

So I'm working on a fairly simple Java program which grabs market data from cryptocurrency exchanges and displays information to the user. I am using the minimal-json library.
Here is my current code:
public class Market {
    static JsonArray arrayBittrex;

    public static void startTimer() {
        Timer timer = new Timer();
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                String url = "https://bittrex.com/api/v1.1/public/getmarketsummaries";
                try {
                    URL url2 = new URL(url);
                    URLConnection con = url2.openConnection();
                    InputStream in = con.getInputStream();
                    String encoding = "UTF-8";
                    String body = IOUtils.toString(in, encoding);
                    arrayBittrex = Json.parse(body).asObject().get("result").asArray();
                }
                catch (MalformedURLException e) {}
                catch (IOException e) {}
            }
        }, 0, 5000);
    }

    public static float getPrice(String exchange, String market) {
        for (JsonValue item : arrayBittrex) {
            float last = item.asObject().getFloat("Last", 0);
            System.out.println(last);
            return last;
        }
        return 0;
    }
}
This code works with simple JSON, for example (from https://bittrex.com/api/v1.1/public/getmarketsummary?market=btc-ltc):
{
  "success" : true,
  "message" : "",
  "result" : [{
      "MarketName" : "BTC-LTC",
      "High" : 0.01350000,
      "Low" : 0.01200000,
      "Volume" : 3833.97619253,
      "Last" : 0.01349998
    }
  ]
}
It will properly return the "Last" value in the array.
However, this can't work when the JSON has multiple objects in the result array (as in https://bittrex.com/api/v1.1/public/getmarketsummaries):
{
  "success" : true,
  "message" : "",
  "result" : [{
      "MarketName" : "BTC-888",
      "High" : 0.00000919,
      "Low" : 0.00000820,
      "Volume" : 74339.61396015,
      "Last" : 0.00000820
    }, {
      "MarketName" : "BTC-A3C",
      "High" : 0.00000072,
      "Low" : 0.00000001,
      "Volume" : 166340678.42280999,
      "Last" : 0.00000005
    }
  ]
}
So my question is: how can I get the "Last" value by searching for the array element by its "MarketName" value?
Here is a direct and null-safe way to tackle this using the Java 8 library Dynamics. We're going to parse the JSON into a Map, then read that map dynamically to get what we want.
First we can use Jackson, Gson, or similar to convert JSON → Map:
// com.fasterxml.jackson.core:jackson-databind json -> map
Map jsonMap = new ObjectMapper()
.enable(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS)
.readValue(jsonStringOrInputSourceEtc, Map.class);
We can now get a Dynamic instance and, for example, grab the BTC-A3C "Last" value:
Dynamic json = Dynamic.from(jsonMap);
BigDecimal a3cLast = json.get("result").children()
.filter(data -> data.get("MarketName").asString().equals("BTC-A3C"))
.findAny()
.flatMap(data -> data.get("Last").maybe().convert().intoDecimal())
.orElse(BigDecimal.ZERO);
// 5E-8
Or perhaps convert the whole lot into a map of MarketName → Last value:
Map<String, BigDecimal> marketNameLastValue = json.get("result").children()
// assume fields are always present, otherwise see #maybe() methods
.collect(toMap(
data -> data.get("MarketName").asString(),
data -> data.get("Last").convert().intoDecimal()
));
// {BTC-A3C=5E-8, BTC-888=0.00000820}
See more examples https://github.com/alexheretic/dynamics
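If you would rather stay with minimal-json and not add another dependency, the lookup is just a linear scan of the result array, matching on "MarketName" before reading "Last". A minimal sketch of that logic is below; plain maps stand in for minimal-json's JsonObject here (with the real library each element would be read via item.asObject().getString("MarketName", "") and item.asObject().getFloat("Last", 0)):

```java
import java.util.List;
import java.util.Map;

public class MarketLookup {
    // Scans the parsed "result" array and returns the "Last" value of the
    // entry whose "MarketName" matches, mirroring a loop over arrayBittrex.
    static float getLast(List<Map<String, Object>> markets, String marketName) {
        for (Map<String, Object> m : markets) {
            if (marketName.equals(m.get("MarketName"))) {
                return ((Number) m.get("Last")).floatValue();
            }
        }
        return 0f; // same default the original getPrice used
    }
}
```

This is O(n) per lookup; if you query many markets per refresh, building a MarketName → Last map once per poll (as the Dynamics answer shows) is the better shape.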

Saving to json and redis from map

I am trying to store some nested information in a Redis server as well as in a JSON document at the same time.
I want the structure to look like this in Redis, and then access a value (eid12345) as a key.
{
"mCat" : [eid1 : ["123", "234"], eid2 : ["1234", "234"], eid3 : ["2", "0", "1"]]
,
"fCat" : [eid1: ["986", "876"], eid3 : ["a", "hx"], eid31 : ["1"]]
}
The JSON obviously needs to have everything within double quotes.
{
"mCat" : ["eid1" : ["123", "234"], "eid2" : ["1234", "234"], "eid3" : ["2", "0", "1"]]
,
"fCat" : ["eid1": ["986", "876"], "eid3" : ["a", "hx"], "eid31" : ["1"]]
}
This is my code:
public static String getAllListsJSON() {
    String query = "MATCH (t:MALECAT) with t match (t)<-[r:CHILD_OF]-(subtag:MALECAT) WITH t,collect(subtag) as subtags return t.eid as eid, t.level as level, REDUCE(relations = \"\", rel IN subtags| relations + \",\" + rel.eid) AS relations;";
    Iterable<Map<String, Object>> itr = Neo4j.queryLagData(query, null);
    ConcurrentHashMap<String, Object> mapOfMaleCatChildren = new ConcurrentHashMap<String, Object>();
    for (Map map : itr) {
        String tagID = (String) map.get("eid");
        String[] immediateChildren = ((String) map.get("relations")).split(",");
        List<String> listOfImmediateChildren = new ArrayList<>();
        for (String eachChildTagEID : immediateChildren) {
            if (!"".equals(eachChildTagEID)) {
                listOfImmediateChildren.add(eachChildTagEID);
            }
        }
        Collections.sort(listOfImmediateChildren);
        mapOfMaleCatChildren.put(tagID, new JSONArray(listOfImmediateChildren));
    }
    RedisCacheManager.setWithInfiniteRedisTTL(maleCatListsKey, mapOfMaleCatChildren);
    return mapOfMaleCatChildren.toString();
}
Please suggest how I can use the eids as hash keys and also save the JSON in the correct form at the same time.
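One observation, as a sketch rather than a definitive answer: the first structure shown above is not valid JSON, because nested key/value groups must use braces, not brackets. If each category is stored as a Redis hash (e.g. HSET mCat eid1 '["123","234"]' via your Redis client), the eids become directly addressable keys, while the same map can be serialized once into the quoted JSON form. The helper below builds that quoted form from a plain nested map (the class name and the map shape are assumptions based on the structure in the question):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedJson {
    // Serializes {category -> {eid -> [values]}} into valid JSON with every
    // key and value double-quoted. Nested objects use braces, which is what
    // distinguishes the valid form from the bracketed sketch in the question.
    static String toJson(Map<String, Map<String, List<String>>> cats) {
        return cats.entrySet().stream()
                .map(cat -> "\"" + cat.getKey() + "\":{"
                        + cat.getValue().entrySet().stream()
                            .map(eid -> "\"" + eid.getKey() + "\":["
                                    + eid.getValue().stream()
                                        .map(v -> "\"" + v + "\"")
                                        .collect(Collectors.joining(","))
                                    + "]")
                            .collect(Collectors.joining(","))
                        + "}")
                .collect(Collectors.joining(",", "{", "}"));
    }
}
```

The per-eid strings produced inside the loop are exactly what you would store as individual hash values, so one pass over the map feeds both the Redis hashes and the JSON document.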

retrieve array from mongodb using java with mongodb api

I understand that there are many questions that ask the same thing and are answered well. The problem is that all those questions use MongoDBObject and MongoDBList to retrieve arrays, while I am using the http://api.mongodb.org/java/3.0/index.html?overview-summary.html API and am having a hard time retrieving an array and adding elements to it. I have to use MongoCollection, MongoDatabase and MongoClient. I am trying to solve an assignment from a MongoDB course. The problem statement is to find an array and update it in mongod.
Here is what I have tried:
Document post = null;
Bson filter = new Document("permalink", permalink);
Bson projection = new Document("comments", true);
List<Document> comments = postsCollection.find(filter)
        .projection(projection).into(new ArrayList<Document>());
System.out.println(comments);
post = postsCollection.find(Filters.eq("permalink", permalink)).first();
Document newComment = new Document();
newComment.append("author", name);
newComment.append("body", body);
if (email != null && (!email.equals(""))) {
    newComment.append("email", email);
}
comments.add(newComment);
Bson filter2 = new Document("_id", post.get("_id"));
System.out.println(comments);
post = postsCollection.find(filter).first();
postsCollection.updateOne(filter2, new Document("$unset", new Document("comments", true)));
postsCollection.updateOne(filter2, new Document("$set", new Document("comments", comments)));
This does not create a new comment. Instead, it creates another comments array inside the comments array itself. The array should be updated in place.
Here is the JSON data:
{
  "_id" : ObjectId("55d965eee60dd20c14e8573e"),
  "title" : "test title",
  "author" : "prasad",
  "body" : "test body",
  "permalink" : "test_title",
  "tags" : [
    "test",
    "teat"
  ],
  "date" : ISODate("2015-08-23T06:19:26.826Z"),
  "comments" : [
    {
      "_id" : ObjectId("55d965eee60dd20c14e8573e"),
      "comments" : [
        {
          "_id" : ObjectId("55d965eee60dd20c14e8573e"),
          "comments" : []
        },
        {
          "author" : "commented",
          "body" : "something in comment",
          "email" : "some@thing.com"
        }
      ]
    },
    {
      "author" : "commented",
      "body" : "something in comment",
      "email" : "some@thing.com"
    }
  ]
}
To avoid unchecked casts and linter warnings, and to avoid writing your own loop, use the library's get(final Object key, final Class<T> clazz) method:
List<Document> comments = posts.get("comments", docClazz)
where docClazz is something that you create once:
final static Class<? extends List> docClazz = new ArrayList<Document>().getClass();
You need not write this much code. Please check the following code:
public void addPostComment(final String name, final String email, final String body,
                           final String permalink) {
    Document post = findByPermalink(permalink);
    List<Document> comments = null;
    Document comment = new Document();
    if (post != null) {
        comments = (List<Document>) post.get("comments");
        comment.append("author", name).append("body", body);
        if (email != null) {
            comment.append("email", email);
        }
        comments.add(comment);
        postsCollection.updateOne(new Document("permalink", permalink),
                new Document("$set", new Document("comments", comments)));
    }
}
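A further note, offered as a sketch rather than the course's intended answer: the nesting bug in the question came from rewriting the whole comments array; MongoDB can append one comment atomically with a $push update instead, so the array is never round-tripped through the client. The shape of that update document is shown below with plain maps standing in for org.bson.Document (with the real driver you would build the same shape as a Document and pass it to postsCollection.updateOne(filter, update)):

```java
import java.util.HashMap;
import java.util.Map;

public class PushUpdate {
    // Builds the {$push: {comments: <comment>}} update shape. Appending one
    // element server-side avoids re-sending the array and accidentally
    // nesting it inside itself, as happened in the question's $unset/$set code.
    static Map<String, Object> pushComment(Map<String, Object> comment) {
        Map<String, Object> push = new HashMap<>();
        push.put("comments", comment);
        Map<String, Object> update = new HashMap<>();
        update.put("$push", push);
        return update;
    }
}
```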
This is much simpler with a newer driver (mongo-java-driver-3.12.5.jar):
comments = post.getList("comments", Document.class);
If you're forced to use an older version of the Mongo driver and can't use the method MKP mentioned, you can copy the method itself.
Here it is as a Kotlin extension:
import org.bson.Document
import java.lang.String.format

fun <T> Document.getList(key: String, clazz: Class<T>, defaultValue: List<T>): List<T> {
    val list = this.get(key, List::class.java)
    if (list == null) {
        return defaultValue
    }
    list.forEach {
        if (!clazz.isAssignableFrom(it!!::class.java)) {
            throw ClassCastException(format("List element cannot be cast to %s", clazz.getName()))
        }
    }
    return list as List<T>
}
