I am trying to make the below elasticsearch query to work with spring data. The intent is to return unique results for the field "serviceName". Just like a SELECT DISTINCT serviceName FROM table would do comparing to a SQL database.
{
"aggregations": {
"serviceNames": {
"terms": {
"field": "serviceName"
}
}
},
"size":0
}
I configured the field as a keyword and it made the query work perfectly in the index_name/_search api as per the response snippet below:
"aggregations": {
"serviceNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "service1",
"doc_count": 20
},
{
"key": "service2",
"doc_count": 8
},
{
"key": "service3",
"doc_count": 8
}
]
}
}
My problem is the same query doesn't work in Spring data when I try to run with a StringQuery I get the error below. I am guessing it uses a different api to run queries.
Cannot execute jest action , response code : 400 , error : {"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19}],"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19} , message : null
I have tried using the SearchQuery type to achieve the same results, no duplicates and no object loading, but I had no luck. The below sinnipet shows how I tried doing it.
final TermsAggregationBuilder aggregation = AggregationBuilders
.terms("serviceName")
.field("serviceName")
.size(1);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("index_name")
.withQuery(matchAllQuery())
.addAggregation(aggregation)
.withSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.withSourceFilter(new FetchSourceFilter(new String[] {"serviceName"}, new String[] {""}))
.withPageable(PageRequest.of(0, 10000))
.build();
Would someone know how to achieve no object loading and object property distinct aggregation on spring data?
I tried many things without success to print queries on spring data, but I could not, maybe because I am using the com.github.vanroy.springdata.jest.JestElasticsearchTemplate implementation.
I got the query parts with the below:
logger.info("query:" + searchQuery.getQuery());
logger.info("agregations:" + searchQuery.getAggregations());
logger.info("filter:" + searchQuery.getFilter());
logger.info("search type:" + searchQuery.getSearchType());
It prints:
query:{"match_all":{"boost":1.0}}
agregations:[{"serviceName":{"terms":{"field":"serviceName","size":1,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}]
filter:null
search type:DFS_QUERY_THEN_FETCH
I figured out, maybe can help someone. The aggregation don't come with the query results, but in a result for it self and is not mapped to any object. The Objects results that comes apparently are samples of the query elasticsearch did to run your aggregation (not sure, maybe).
I ended up by creating a method which can do a simulation of what would be on the SQL SELECT DISTINCT your_column FROM your_table, but I think this will work only on keyword fields, they have a limitation of 256 characters if I am not wrong. I explained some lines in comments.
Thanks #Val since I was only able to figure it out when debugged into Jest code and check the generated request and raw response.
public List<String> getDistinctField(String fieldName) {
List<String> result = new ArrayList<>();
try {
final String distinctAggregationName = "distinct_field"; //name the aggregation
final TermsAggregationBuilder aggregation = AggregationBuilders
.terms(distinctAggregationName)
.field(fieldName)
.size(10000);//limits the number of aggregation list, mine can be huge, adjust yours
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("your_index")//maybe can be omitted
.addAggregation(aggregation)
.withSourceFilter(new FetchSourceFilter(new String[] { fieldName }, new String[] { "" }))//filter it to retrieve only the field we ar interested, probably we can take this out.
.withPageable(PageRequest.of(0, 1))//can't be zero, and I don't want to load 10 results every time it runs, will always return one object since I found no "size":0 in query builder
.build();
//had to use the JestResultsExtractor because com.github.vanroy.springdata.jest.JestElasticsearchTemplate don't have an implementation for ResultsExtractor, if you use Spring defaults, you can probably use it.
final JestResultsExtractor<SearchResult> extractor = new JestResultsExtractor<SearchResult>() {
#Override
public SearchResult extract(SearchResult searchResult) {
return searchResult;
}
};
final SearchResult searchResult = ((JestElasticsearchTemplate) elasticsearchOperations).query(searchQuery,
extractor);
final MetricAggregation aggregations = searchResult.getAggregations();
final TermsAggregation termsAggregation = aggregations.getTermsAggregation(distinctAggregationName);//this is where your aggregation results are, in "buckets".
result = termsAggregation.getBuckets().parallelStream().map(TermsAggregation.Entry::getKey)
.collect(Collectors.toList());
} catch (Exception e) {
// threat your error here.
e.printStackTrace();
}
return result;
}
Related
I am trying to update the following JSON doc in mongodb so that a new document will be created if there is not one matching the "altKey", but if there is a document matching the altKey, any matching "records" will have their "domain" set and their "counts" incremented. I have a requirement that the JSON structure not change and that Spring-Data for mongodb is used.
{
"altKey": "value"
"records": {
"randomName1" {
"domain": "domainValue",
"count": 3
},
"randomName2" {
"domain": "domainValue2",
"count": 5
},
...
"randomNameN" {
"domain": "domainValueN",
"count": 4
}
}
}
The relevent portion of the class I have been attempting to do the update with is:
#Autowired
private MongoTemplate mongoTemplate;
#Override
public void increment(Doc doc) {
Query query = new Query().addCriteria(Criteria.where("altKey").is(doc.getAltKey());
Update update = new Update();
update.setOnInsert("altKey", doc.getAltKey());
for (final Map.Entry<String, RecordData> entry :
doc.getRecords().entrySet()) {
String domainKey = format("records.{0}.domain", entry.getKey());
String domainValue = entry.getValue().getDomain();
update.set(domainKey, domainValue);
String countKey = format("records.{0}.count", entry.getKey());
Integer countValue = entry.getValue().getCount();
update.inc(countKey, countValue);
}
mongoTemplate.upsert(query, update, Doc.class);
}
When I attempt to call the increment method the "altKey" field is successfully persisted, but none of the records persist and I am not sure of why that is. I believe the reason is my attempt to use mongo dot operation for the key when doing the set and inc update portions (ie "records.randomNameN.domain" or "records.randomNameN.count") but I haven't been able to find an alternate way to configure the Update object when I don't know until run time what the name of a particular record will be.
Anyone out there know how to set up the Update object to handle setting nested fields?
Let's say we have a entity "Device" it contains other entity "DeviceInfo", and we have a entity "Site" which contains a List of "DeviceInfo" entities, and "DeviceInfo" has a "Device" and a "Site" in its properties.
My task was to find all "Device"s which are in one "Site". To some endpoint I would send a "Site" id and page number and size of page (since it has to be pageable). I have made it work by creating a JPA specification
public static Specification<Device> bySearchRequest(final DeviceSearchRequest searchRequest) {
return (root, query, cb) -> {
final Join<Device, DeviceInfo> deviceInfo
= root.join(Device_.deviceInfo, JoinType.LEFT);
final Join<DeviceInfo, Site> site
= deviceInfo.join(DeviceInfo_.site, JoinType.LEFT);
return cb.and(cb.equal(site.get(Site.id), searchRequest.getSiteId()));
};
}
And then using I would convert the "Device"s to "IndexDevice"s which is in ES.
deviceRepository.findAll(currentUser,
DeviceRepository.Specs.bySearchRequest(searchRequest),
new PageRequest(searchRequest.getPage(), searchRequest.getSize()))
.getContent().stream().map(x ->indexedDeviceConverter.convert(x)).collect(Collectors.toList());
That is it. It works. But here I am fetching the data from DB, and I already have everything in Elasticsearch. Is there a way to make this same query to fetch the data directly from ES (with paging) ?
Only difference is that in ES "IndexedDevice" has a direct relation with a "IndexedSite" (there is no "IndexedDeviceInfo").
IndexedDevice
{
"id":"3eba5104-0c7a-4564-8270-062945cc8f5e",
"name":"D4",
"site":{
"id":"46e7ada4-3f34-4962-b849-fac59c8fe8ad",
"name":"SomeSite",
"displayInformation":"SomeSite",
"subtitle":""
},
"suggest":{
"input":[]
},
"displayInformation":"D4",
"subtitle":""
}
IndexedSite
{
"id": "46e7ada4-3f34-4962-b849-fac59c8fe8ad",
"name": "SomeSite",
"displayInformation": "SomeSite",
"subtitle": ""
}
I managed to do it. At the end it was really simple. I used ElasticsearchRepository (org.springframework.data.elasticsearch.repository).
elasticsearchRepositoy.search(QueryBuilders.termsQuery
("site.id",
searchRequest.getSite()),
new PageRequest(searchRequest.getPage(),
searchRequest.getSize()));
I'm using the Spring framework to perform an aggregation on my mongodb. However, the lookup keeps failing and I can't understand why. Here's the query:
Aggregation aggregation = newAggregation(
match(Criteria.where("idOfUser").is(loggedInAccount.getId())),
group("imgID"),
new CustomAggregationOperation(
new BasicDBObject("$lookup",
new BasicDBObject("from","img")
.append("localField","_id")
.append("foreignField","_id")
.append("as","uniqueImgs")
)
),
limit(pageable.getPageSize()),
skip(pageable.getPageSize()*pageable.getPageNumber())
);
AggregationResults aggregationResults = mongo.aggregate(aggregation, "comment", String.class); //Using String at the moment just to see the output clearly.
CustomAggregationOperation is as follows:
public class CustomAggregationOperation implements AggregationOperation {
private DBObject operation;
public CustomAggregationOperation (DBObject operation) {
this.operation = operation;
}
#Override
public DBObject toDBObject(AggregationOperationContext context) {
return context.getMappedObject(operation);
}
}
The Spring MongoDB version of lookup isn't recognised which is why I'm using this CustomAggregationOperation. AFAIK it shouldn't affect it.
Ideally what I want to happen is:
Get all the comments of the user.
Make sure that the imgID is distinct for the comments (so there are only the id's of the imgs that have been commented on)
Get the actual img objects related to these ids.
Paginate the returned imgs.
At the moment, step 3 doesn't work, and I think 4 wouldn't work either since limit and skip won't be applied to the objects in "uniqueImgs".
What is returned is:
[{ "_id" : "570e2f5cb1b9125510a443f5" , "uniqueImgs" : [ ]}]
How can I fix this?
EDIT
the imgID stored isn't an ObjectID whereas the _id in the img collection is. Would that have any effect?
The current release (at the time of writing 1.9.5) has support for the $lookup operator and can be implemented as (untested):
LookupOperation lookupOperation = LookupOperation.newLookup()
.from("img")
.localField("_id")
.foreignField("_id")
.as("uniqueImgs");
Aggregation agg = newAggregation(
match(Criteria.where("idOfUser").is(loggedInAccount.getId())),
group("imgID"),
lookupOperation,
limit(pageable.getPageSize()),
skip(pageable.getPageSize()*pageable.getPageNumber())
);
AggregationResults aggregationResults = mongo.aggregate(agg, "comment", String.clas);
I'm new in mongodb. I have following data as a JSON format in mongodb. I need to search the bookLabel or the shortLabel for the book and it should show me all the information about the book. For example: if I query for 'Cosmos' it'll show all the description about the book, like: bookLabel, writer, yearPublish, url. How can I do that in java? Need query, please help.
"Class":"Science",
"Description":[
{
"bookLabel":"Cosmos (Mass Market Paperback)",
"shortLabel":"Cosmos",
"writer":"Carl Sagan",
"yearPublish":[
"2002"
],
"url":"https://www.goodreads.com/book/show/55030.Cosmos"
},
{
"bookLabel":"The Immortal Life of Henrietta Lacks",
"shortLabel":"Immortal Life",
"writer":"Rebecca Skloot",
"yearPublish":[
"2010, 2011"
],
"url":"https://www.goodreads.com/book/show/6493208-the-immortal-life-of-henrietta-lacks"
}
],
"Class":"History",
"Description":[
{
"bookLabel":"The Rise and Fall of the Third Reich",
"shortLabel":"Rise and Fall",
"writer":"William L. Shirer",
"yearPublish":[
"1960"
],
"url":"https://www"
}
]
}
With MongoDB Java Driver v3.2.2 you can do something like this:
FindIterable<Document> iterable = collection.find(Document.parse("{\"Description.shortLabel\": {$regex: \"Cosmos\"}"));
This returns all documents containing Cosmos in the Description.shortLabel nested field. For an exact match, try this {"Description.shortLabel": "Cosmos"}. Replace shortLabel with bookLabelto search the bookLabel field. Then you can do iterable.forEach(new Block<Document>()) on the returned documents. To search both bookLabel and shortLabel, you can do a $or{}. My syntax could be wrong so check the MongoDB manual. But this is the general idea.
For this, you can use MongoDB's Text Search Capabilities. You'll have to create a text index on your collection for that.
First of all create a text index on your collection on fields bookLabel and shortLabel.
db.books.createIndex({ "Description.bookLabel" : "text", "Description.shortLabel" : "text" })
Note that this is done in the Mongo shell
Then
DBObject command = BasicDBObjectBuilder
.start("text", "books")
.append("search", "Cosmos")
.get();
CommandResult result = db.command(command);
BasicDBList results = (BasicDBList) result.get("results");
for(Object o : results) {
DBObject dbo = (DBObject) ((DBObject) o).get("obj");
String id = (String) dbo.get("_ID");
System.out.println(id);
}
Haven't really tested this. But just give it a try. Should work.
I have a template foo.mustache saved in {{ES_HOME}}/config/scripts.
POST to http://localhost:9200/forward/_search/template with the following message body returns a valid response:
{
"template": {
"file": "foo"
},
"params": {
"q": "a",
"hasfilters": false
}
}
I want to translate this to using the java API now that I've validated all the different components work. The documentation here describes how to do it in java:
SearchResponse sr = client.prepareSearch("forward")
.setTemplateName("foo")
.setTemplateType(ScriptService.ScriptType.FILE)
.setTemplateParams(template_params)
.get();
However, I would instead like to just send a plain string query (i.e. the contents of the message body from above) rather than build up the response using the java. Is there a way to do this? I know with normal queries, I can construct it like so:
SearchRequestBuilder response = client.prepareSearch("forward")
.setQuery("""JSON_QUERY_HERE""")
I believe the setQuery() method wraps the contents into a query object, which is not what I want for my template query. If this is not possible, I will just have to go with the documented way and convert my json params to Map<String, Object>
I ended up just translating my template_params to a Map<String, Object> as the documentation requires. I utilized groovy's JsonSlurper to convert the text to an object with a pretty simple method.
import groovy.json.JsonSlurper
public static Map<String,Object> convertJsonToTemplateParam(String s) {
Object result = new JsonSlurper().parseText(s);
//Manipulate your result if you need to do any additional work here.
//I.e. Programmatically determine value of hasfilters if filters != null
return (Map<String,Object>) result;
}
And you could pass in the following as a string to this method:
{
"q": "a",
"hasfilters": true
"filters":[
{
"filter_name" : "foo.untouched",
"filters" : [ "FOO", "BAR"]
},
{
"filter_name" : "hello.untouched",
"list" : [ "WORLD"]
}
]
}