Mongo Projection is not working for SubFields

I have a JSON object like the one below and want to exclude the field "placeOfBirth" from the response. I am using a projection for that, but it only works for top-level fields, not for subfields: "status" is removed from the response, but "placeOfBirth" is never excluded.
Here is my code:
Projection projectionExclude = Projection.of().exclude("subObject.placeOfBirth").exclude("status");
MorphiaCursor<T> cursor = datastore.aggregate(T.class)
.match(Filters.eq("about", id))
.project(projectionExclude).execute(T.class);
if(cursor != null && cursor.hasNext()){
result = cursor.toList().get(0);
}
JSON data:
{
"about": "testing/123",
"subObject" : [
{
"about" : "subobject/123",
"placeOfBirth": {
"birth": ["Lisbon"]
}
}
],
"status" : "approved"
}
How can I make this work? Is there another way to achieve this?

This actually works for me on 2.2. Here's the test I'm running:
MongoCollection<Document> collection = getDocumentCollection(User.class);
collection.insertOne(parse("{'about': 'testing/123', 'subObject' : [ {'about' : 'subobject/123', 'placeOfBirth': {'birth': " +
"['Lisbon']}}],'status' : 'approved'}"));
Document next = getDs().aggregate(User.class)
.match(eq("about", "testing/123"))
.project(project()
.exclude("subObject.placeOfBirth")
.exclude("status"))
.execute(Document.class)
.next();
assertFalse(next.toJson().contains("placeOfBirth"));
assertFalse(next.toJson().contains("status"));

Related

Fetching new documents on insertion in ElasticSearch with Java

I have been looking for a solution to create a sort of alert when new documents are added to ES via Logstash. I have seen some threads on here such as : stackoverflow.com/a/51980618/4604579, but that does not really serve my purposes as the plug-ins mentioned do not work with the newest version of ELK and there is no Changes API out yet.
So I have resorted to trying two different approaches:
1. Create a Scroll and run over all the documents in a given index using the Search API, retain the last document's ID, and use it after a given timeout period to get all documents that were added after it.
2. Create a Watcher that checks at a given interval (for example, every 5 minutes) whether new documents have been added to an index.
I have made progress on approach 1: I can scroll through the roughly 50k documents currently in ES and retrieve the last document's ID (I sort the query by timestamp in ascending order, so the last document is the most recently inserted one). But I don't know how efficient this approach is, and I know that a scroll context times out after a given delay, so if no new documents are inserted in that time the scroll will be removed.
I also looked into using a Watcher, but I don't really understand how to set up the condition that checks whether a new document was inserted in a given index.
I imagine I can do something like this:
PUT _watcher/watch/new_docs
{
"trigger" : {
"schedule" : {
"interval" : "5s"
}
},
"input" : {
"search" : {
"request" : {
"indices" : "logstash",
"body" : {
"size" : 0,
"query" : { "match" : { "@timestamp" : "now-5s" } }
}
}
}
},
"condition" : {
"compare" : { ?? }
},
"actions" : {
"my_webhook" : {
"webhook" : {
"method" : "POST",
"host" : "mylisteninghost",
"port" : 9200,
"path" : "/{{watch_id}}",
"body" : "New document {{document ID}} errors"
}
}
}
}
I am not exactly sure how to define or use the Watcher, or whether it would even work.
Can anyone let me know what the best course of action would be?
Thank you
EDIT:
For those interested, I found a way to poll the ES REST API using Search After. The difference is that Scroll takes a snapshot of the documents in the index, so any documents added later won't be in that snapshot. Search After, by contrast, is stateless: it sorts on unique parameters (in my case timestamp and ID) and holds on to the sort values of the last hit fetched; the next query then asks for all documents that sort after the held values. This way, any newly added documents come after the held timestamp and are fetched by the query.
Code:
public static void searchAfterElasticData()
throws FileNotFoundException, IOException, InterruptedException {
//create a search request for a given index
SearchRequest search_request = new SearchRequest(elastic_index);
SearchSourceBuilder source_builder =
getSearchSourceBuilder("@timestamp", "_id", 100);
search_request.source(source_builder);
SearchResponse search_response = null;
try {
search_response = client.search(search_request, RequestOptions.DEFAULT);
} catch (ElasticsearchException | ConnectException ex) {
log.info("Error while querying Elastic API: {}", ex.toString());
}
if (search_response != null) {
SearchHit[] search_hits = search_response.getHits().getHits();
Object[] sort_values = null;
while (search_hits != null) {
if (search_hits.length > 0) {
//if there are records retrieved, parse them
for (SearchHit hit: search_hits) {
Map<String, Object> source_map = hit.getSourceAsMap();
try {
parse((String)source_map.get("message"));
} catch (Exception ex) {
log.error("Error while parsing: {}",
(String)source_map.get("message"));
}
}
//get sorting value of last record and do new request
log.info("Getting sorting values");
sort_values = search_response.getHits()
.getAt(search_hits.length-1).getSortValues();
} else {
log.info("Waiting 1 minute for new entries");
Thread.sleep(60000);
}
//only set search_after once at least one hit has been seen
if (sort_values != null) {
source_builder.searchAfter(sort_values);
}
search_request.source(source_builder);
search_response =
client.search(search_request, RequestOptions.DEFAULT);
search_hits = search_response.getHits().getHits();
log.info("Fetched hits: {}", search_hits.length);
log.info("Searching after for new hits");
}
}
}
I would still like to know whether the same is possible with a Watcher. Also, if anyone has suggestions for making the code more elegant, please share.
Thank you
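The difference between Scroll and Search After described in the EDIT can be illustrated without a cluster. The sketch below is plain Java over an in-memory list, not Elasticsearch client code; Doc, SORT and fetchAfter are hypothetical names, and the list stands in for the index. It only shows the idea: hold the sort key of the last hit and ask for everything that sorts strictly after it, so documents added later are picked up by the next poll.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory stand-in for an index. Doc, SORT and fetchAfter are
// hypothetical names used only for this illustration.
class SearchAfterSketch {

    static final class Doc {
        final long timestamp;
        final String id;

        Doc(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    // Same ordering the query sorts on: timestamp first, id as tie-breaker.
    static final Comparator<Doc> SORT =
            Comparator.<Doc>comparingLong(d -> d.timestamp).thenComparing(d -> d.id);

    // Returns up to pageSize docs that sort strictly after `last`
    // (a null `last` means "start from the beginning").
    static List<Doc> fetchAfter(List<Doc> index, Doc last, int pageSize) {
        return index.stream()
                .sorted(SORT)
                .filter(d -> last == null || SORT.compare(d, last) > 0)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>(Arrays.asList(new Doc(1L, "a"), new Doc(2L, "b")));

        // First poll: read everything and hold the last sort key.
        List<Doc> page = fetchAfter(index, null, 10);
        Doc last = page.get(page.size() - 1);

        // A document arrives after the first poll.
        index.add(new Doc(3L, "c"));

        // Second poll: only documents after the held key come back.
        for (Doc d : fetchAfter(index, last, 10)) {
            System.out.println(d.timestamp + " " + d.id);
        }
    }
}
```

Because no server-side state is kept between polls, there is nothing that can time out while the poller sleeps, which is the property the EDIT relies on.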

Spring data aggregation query elasticsearch

I am trying to make the Elasticsearch query below work with Spring Data. The intent is to return the unique values of the field "serviceName", just as SELECT DISTINCT serviceName FROM table would in a SQL database.
{
"aggregations": {
"serviceNames": {
"terms": {
"field": "serviceName"
}
}
},
"size":0
}
I configured the field as a keyword, and the query works perfectly against the index_name/_search API, as the response snippet below shows:
"aggregations": {
"serviceNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "service1",
"doc_count": 20
},
{
"key": "service2",
"doc_count": 8
},
{
"key": "service3",
"doc_count": 8
}
]
}
}
My problem is that the same query doesn't work in Spring Data. When I run it as a StringQuery I get the error below; I am guessing it uses a different API to run queries.
Cannot execute jest action , response code : 400 , error : {"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19}],"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19} , message : null
I have tried using the SearchQuery type to achieve the same result (no duplicates and no object loading), but I had no luck. The snippet below shows how I tried doing it.
final TermsAggregationBuilder aggregation = AggregationBuilders
.terms("serviceName")
.field("serviceName")
.size(1);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("index_name")
.withQuery(matchAllQuery())
.addAggregation(aggregation)
.withSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.withSourceFilter(new FetchSourceFilter(new String[] {"serviceName"}, new String[] {""}))
.withPageable(PageRequest.of(0, 10000))
.build();
Does anyone know how to get a distinct aggregation on an object property in Spring Data without loading the objects?
I tried many things to print the queries Spring Data generates, without success, maybe because I am using the com.github.vanroy.springdata.jest.JestElasticsearchTemplate implementation.
I got the query parts with the code below:
logger.info("query:" + searchQuery.getQuery());
logger.info("agregations:" + searchQuery.getAggregations());
logger.info("filter:" + searchQuery.getFilter());
logger.info("search type:" + searchQuery.getSearchType());
It prints:
query:{"match_all":{"boost":1.0}}
agregations:[{"serviceName":{"terms":{"field":"serviceName","size":1,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}]
filter:null
search type:DFS_QUERY_THEN_FETCH

I figured it out; maybe this can help someone. The aggregation doesn't come with the query hits but in a result of its own, and it is not mapped to any object. The object results that do come back are apparently samples from the query Elasticsearch ran to compute the aggregation (I'm not sure).
I ended up creating a method that simulates what SQL's SELECT DISTINCT your_column FROM your_table would do. I think it will only work on keyword fields, which have a limit of 256 characters if I am not wrong. I explained some lines in comments.
Thanks @Val, since I was only able to figure it out after debugging into the Jest code and checking the generated request and raw response.
public List<String> getDistinctField(String fieldName) {
List<String> result = new ArrayList<>();
try {
final String distinctAggregationName = "distinct_field"; //name the aggregation
final TermsAggregationBuilder aggregation = AggregationBuilders
.terms(distinctAggregationName)
.field(fieldName)
.size(10000);//maximum number of buckets returned; mine can be huge, adjust yours
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("your_index")//maybe can be omitted
.addAggregation(aggregation)
.withSourceFilter(new FetchSourceFilter(new String[] { fieldName }, new String[] { "" }))//retrieve only the field we are interested in; this can probably be taken out
.withPageable(PageRequest.of(0, 1))//page size can't be zero, and I don't want to load 10 results on every run; this always returns one object, since I found no "size":0 in the query builder
.build();
//had to use JestResultsExtractor because com.github.vanroy.springdata.jest.JestElasticsearchTemplate doesn't have an implementation for ResultsExtractor; if you use the Spring defaults, you can probably use that instead.
final JestResultsExtractor<SearchResult> extractor = new JestResultsExtractor<SearchResult>() {
@Override
public SearchResult extract(SearchResult searchResult) {
return searchResult;
}
};
final SearchResult searchResult = ((JestElasticsearchTemplate) elasticsearchOperations).query(searchQuery,
extractor);
final MetricAggregation aggregations = searchResult.getAggregations();
final TermsAggregation termsAggregation = aggregations.getTermsAggregation(distinctAggregationName);//this is where your aggregation results are, in "buckets".
result = termsAggregation.getBuckets().parallelStream().map(TermsAggregation.Entry::getKey)
.collect(Collectors.toList());
} catch (Exception e) {
// handle your error here.
e.printStackTrace();
}
return result;
}

Improving mongo query Performance(using Sub query )

Two documents can have the same IMAGE_CONTENT_INSTANCE_HANDLE, and the state can be BOOKED or RELEASED,
but I want all image instance handles that exist only in the RELEASED state.
Currently I do this by firing two queries, which has introduced performance issues. The first query is:
{
"state" : "RELEASED"
}
with the projection { "imageContentInstance.handle" : 1 }.
I iterate through the result of this query
and, for each document, fire a second query as shown below, so that handles that also exist in the BOOKED state are not added to the list. That way I get only the handles that are in the RELEASED state.
while (cursor.hasNext()) {
String result = JSON.serialize(cursor.next());
ICI ici = objectMapper.readValue(result, ICI_COLLECTION_TYPE_REF);
// second query: is there a BOOKED document with the same handle?
Document queryDocument = new Document("imageContentInstance.handle", ici.getImageContentInstance().getHandle())
.append("state", "BOOKED");
Document bookedDoc = iciDAO.findOne(queryDocument);
if (null != bookedDoc) {
LOGGER.debug("Calling continue and skipping booked state ");
continue;
}
iciHandles.add(ici.getImageContentInstance().getHandle().toString());
LOGGER.debug("ImageInstanceHandle is added to the list and the list size is "+iciHandles.size());
}
I want to achieve this in a single Mongo query to improve performance, like the following example written in SQL. I would really appreciate your comments.
SELECT *
FROM ici i
WHERE i.state = 'RELEASED'
AND NOT EXISTS
(SELECT * FROM ici ic WHERE ic.handle = i.handle AND ic.state = 'BOOKED'
);
Example:
Suppose the documents are as below:
{
"_id" : ObjectId("58c9f524fa8cd6a517cf5ddf"),
"imageContentInstance" : {
"handle" : "ICI:1234",
"key" : null
},
"instanceHandle" : "LCI:RNBM12",
"state" : "BOOKED"
}
{
"_id" : ObjectId("58c9f524fa8cd6a517cf5ddf"),
"imageContentInstance" : {
"handle" : "ICI:1234",
"key" : null
},
"instanceHandle" : "LCI:RNBM13",
"state" : "RELEASED"
}
{
"_id" : ObjectId("58c9f524fa8cd6a517cf5ddf"),
"imageContentInstance" : {
"handle" : "ICI:456",
"key" : null
},
"instanceHandle" : "LCI:RNBM14",
"state" : "RELEASED"
}
My query should return the handle of the last document alone, i.e. the handle that exists only in the RELEASED state. I am stuck and would really appreciate your ideas for improving this.

From your question, I understand that you want all documents with state = 'RELEASED' and state != 'BOOKED', which I think you have written slightly incorrectly.
MongoDB query:
db.inventory.find({'state' : 'RELEASED'})
Also go through the MongoDB docs.
I hope this helps. I am also new to MongoDB; if there is an error, please correct it.
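What the question asks for is a group-then-filter: collect the set of states seen per handle, then keep the handles whose set contains RELEASED but not BOOKED. In MongoDB this could, I believe, be expressed as one aggregation with a $group stage on imageContentInstance.handle (collecting states with $addToSet) followed by a $match on that set. The sketch below is not driver code; it mirrors the same logic in plain Java on in-memory data so the intended result is unambiguous. Ici and releasedOnlyHandles are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of "group by handle, keep RELEASED-only groups".
// Ici and releasedOnlyHandles are hypothetical names for this sketch.
class ReleasedOnlySketch {

    static final class Ici {
        final String handle;
        final String state;

        Ici(String handle, String state) {
            this.handle = handle;
            this.state = state;
        }
    }

    static List<String> releasedOnlyHandles(List<Ici> docs) {
        // Group step: collect the set of states seen for each handle.
        Map<String, Set<String>> statesByHandle = docs.stream()
                .collect(Collectors.groupingBy(
                        d -> d.handle,
                        TreeMap::new,
                        Collectors.mapping(d -> d.state, Collectors.toSet())));

        // Filter step: the handle was released and never booked.
        return statesByHandle.entrySet().stream()
                .filter(e -> e.getValue().contains("RELEASED")
                        && !e.getValue().contains("BOOKED"))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same three documents as in the question.
        List<Ici> docs = Arrays.asList(
                new Ici("ICI:1234", "BOOKED"),
                new Ici("ICI:1234", "RELEASED"),
                new Ici("ICI:456", "RELEASED"));
        System.out.println(releasedOnlyHandles(docs));
    }
}
```

Doing the grouping on the server side would replace the per-document second query with a single round trip, which is where the performance problem in the question comes from.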

Iteration through json with multiple API calls for other requests

I am using Postman to iterate through a JSON array of about 40 pairs of items. I then need to take that array and run an API call for each element in it to return a set of results. Using the code here, I'm only able to pull the final element in the array. I tried putting postman.setNextRequest in the for loop, but then found out that no matter where it is, it always executes last.
tests["Status code is 200 (that's good!)"] = (responseCode.code === 200);
if (responseCode.code === 200) {
var jsonData = pm.response.json();
var json = [];
postman.setEnvironmentVariable("json", jsonData)
postman.setNextRequest('GetAdmins');
for (var key in jsonData ) {
if (jsonData.hasOwnProperty(key)) {
postman.setEnvironmentVariable("organizationId", jsonData[key].id)
postman.setEnvironmentVariable("orgname", jsonData[key].name)
tests[jsonData[key].name + " " + jsonData[key].id] = !!jsonData[key].name;
}
}
}
else {
postman.setNextRequest(null);
}
GetAdmins is another GET that uses {{organizationId}} in the call.
I think what I'm looking for is: what is the best way to run another API call on each element in the JSON?
Thanks in advance!
EDIT: Adding JSON output
[
{
"id": XXXXXX,
"name": "Name1"
},
{
"id": XXXXXX,
"name": "Name2"
},
{
"id": XXXXXX,
"name": "Name3"
}
]

This might work to get the data - I’ve not tried it out yet though so it might not work first time.
var jsonData = pm.response.json()
var data = _.map(jsonData, item => ({
organizationId: item.id,
orgName: item.name
}))
pm.environment.set('organizationData', JSON.stringify(data))
Then you have all of your organization data in a variable, and you can use it to iterate over the IDs in the next "Get Admins" request.
You would need some code in the Pre-request Script of the next request to access each of the IDs. You need to parse the variable like this:
var orgID = JSON.parse(pm.environment.get("organizationData"))
Then orgID[0].organizationId would be the first one in the list.
Not a complete solution for your problem but it might help you get the data.

I was able to solve this using these two guides:
Loops and dynamic variables in Postman: part 1
Loops and dynamic variables in Postman: part 2
I also had to implement the bigint fix for java, but in Postman, which was very annoying... that can be found here:
Hacking bigint in API testing with Postman Runner Newman in CI Environment
Gist
A lot of Googling plus trial and error got me up and running.
Thanks anyway for all your help everyone!
This ended up being my final code:
GetOrgs
tests["Status code is 200 (that's good!)"] = (responseCode.code === 200);
eval(postman.getGlobalVariable("bigint_fix"));
var jsonData = JSON.parse(responseBody);
var id_list = [];
jsonData.forEach(function(list) {
var testTitle = "Org: " + list.name + " has id: " + JSON.stringify(list.id);
id_list.push(list.id);
tests[testTitle] = !!list.id;
});
postman.setEnvironmentVariable("organizationId",JSON.stringify(id_list.shift()));
postman.setEnvironmentVariable("id_list", JSON.stringify(id_list));
postman.setNextRequest("GetAdmins");
GetAdmins
eval(postman.getGlobalVariable("bigint_fix"));
var jsonData = JSON.parse(responseBody);
jsonData.forEach(function(admin) {
var testTitle = "Admin: " + admin.name + " has " + admin.orgAccess;
tests[testTitle] = !!admin.name;
});
var id_list = JSON.parse(environment.id_list);
if (id_list.length > 0) {
postman.setEnvironmentVariable("organizationId", JSON.stringify(id_list.shift()));
postman.setEnvironmentVariable("id_list", JSON.stringify(id_list));
postman.setNextRequest("GetAdmins");
}
else {
postman.clearEnvironmentVariable("organizationId");
postman.clearEnvironmentVariable("id_list");
}

Fuzzy query on dates with ElasticSearch Java API

I'm trying to perform the following query through the Elasticsearch Java API:
{
"query" : {
"fuzzy" : {
"dateOfBirth" : {
"value" : "1944-11-30",
"fuzziness" : "365d"
}
}
}
}
I don't understand how to specify a fuzziness value of 365 days in this kind of query.

You can use this (fuzzyQuery is statically imported from QueryBuilders):
FuzzyQueryBuilder queryBuilder = fuzzyQuery("dateOfBirth" ,"1944-11-30" ).fuzziness(Fuzziness.build("365d"));
Hope this helps.
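As far as I understand, in the Elasticsearch versions that accept a fuzzy query on a date field, a fuzziness of 365d is treated as a range of the value plus or minus that duration. If the fuzzy form is not available in your version, a range query over the same bounds is an alternative. The sketch below only computes those bounds with java.time; DateWindowSketch and window are hypothetical names, and nothing here calls Elasticsearch.

```java
import java.time.LocalDate;

// Computes the [value - days, value + days] window that a date fuzziness
// such as "365d" is assumed to describe. Names are hypothetical.
class DateWindowSketch {

    // Returns a two-element array: {lower bound, upper bound}, both inclusive.
    static LocalDate[] window(String isoDate, long days) {
        LocalDate center = LocalDate.parse(isoDate);
        return new LocalDate[] { center.minusDays(days), center.plusDays(days) };
    }

    public static void main(String[] args) {
        LocalDate[] bounds = window("1944-11-30", 365);
        // These two values could be passed as the gte/lte bounds of a range query.
        System.out.println(bounds[0] + " .. " + bounds[1]);
    }
}
```

Note that 365 days is not the same as one calendar year here: 1944 is a leap year, so the lower bound falls on 1943-12-01 rather than 1943-11-30.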
