I want to write an Elasticsearch aggregation in Java. Can the query below be translated to Java?
... some query
"aggs": {
"ip_address": {
"terms": {
"field": "ip_address"
},
"aggs": {
"dup_docs": {
"top_hits": {
"sort": [
{
"updated_at": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
I think this can be done with the AggregationBuilders provided by Elasticsearch, but I'm not sure. Please help me.
Yes. Use the AggregationBuilders class that ships with the Java High Level REST Client (JHLRC): build the outer aggregation with the terms aggregation builder and attach the top hits aggregation builder to it as a sub-aggregation.
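To see exactly what the builders have to produce, here is a dependency-free sketch that assembles the same "aggs" body as nested maps and prints it as JSON. The class name and the obj/toJson helpers are made up for illustration and are not part of any Elasticsearch client; the resulting string could, for example, be handed to the low-level REST client as a JSON request body.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class DupDocsAggregation {

    // Hypothetical helper: a single-entry map that keeps insertion order.
    static Map<String, Object> obj(String key, Object value) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    // One terms bucket per ip_address; each bucket keeps only the most
    // recently updated document (top_hits sorted by updated_at desc, size 1).
    static Map<String, Object> buildAggs() {
        Map<String, Object> topHits = new LinkedHashMap<>();
        topHits.put("sort", Collections.singletonList(obj("updated_at", obj("order", "desc"))));
        topHits.put("size", 1);

        Map<String, Object> ipAddress = new LinkedHashMap<>();
        ipAddress.put("terms", obj("field", "ip_address"));
        ipAddress.put("aggs", obj("dup_docs", obj("top_hits", topHits)));

        return obj("aggs", obj("ip_address", ipAddress));
    }

    // Minimal JSON writer for maps, lists, strings and numbers.
    // Illustrative only: it does not escape control characters.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add(toJson(String.valueOf(entry.getKey())) + ":" + toJson(entry.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof List) {
            StringJoiner joiner = new StringJoiner(",", "[", "]");
            for (Object item : (List<?>) value) {
                joiner.add(toJson(item));
            }
            return joiner.toString();
        }
        if (value instanceof String) {
            String text = (String) value;
            return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(buildAggs()));
    }
}
```

The nesting of the maps mirrors the nesting of the builders: whatever is attached as a sub-aggregation ends up under the inner "aggs" key.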
My index holds a lot of documents with different structures. The keys shared by all the documents are: (Store,owner,products,timestamp)
{"Store":"books for school","owner":"user_15","products":40,"#timestamp":2020/08/02T18:00, "a1":1,"a2":...}
{"Store":"books for school","owner":"user_15","products":45,"#timestamp":2020/08/02T19:00,"b1":1...}
{"Store":"books for school","owner":"user_17","products":55,"#timestamp":2020/08/02T20:00, "b2":1....}
In my app, I'm trying to get the most recent values of the shared keys (owner, products) for each store. So for this example I want to get the last document shown above.
I tried to create an aggregation query on all the shared keys, but I'm not sure how to order the inner results by date (so that the newest value comes first):
{
"size": 0,
"aggs": {
"store_aggr": {
"terms": {
"field": "Store"
},
"aggs": {
"owner_aggr": {
"terms": {
"field": "owner"
}
}
,
"products_aggr": {
"terms": {
"field": "products"
}
}
}
}
}
}
How can I order the inner buckets of the query by #timestamp? That way I can just take the first value, and it will definitely be the newest.
In addition, how can I filter the data so that only documents from the last two days are included? Do I need to add a query filter on the #timestamp field?
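Yes, you'll need a range query to select only the last two days. As for the sorting, you can use a sorted top_hits aggregation to retrieve the underlying docs. Note that top_hits returns the top three hits per bucket by default, so add "size": 1 to each top_hits if you only want the newest one: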
{
"query": {
"range": {
"#timestamp": {
"gte": "now-2d"
}
}
},
"size": 0,
"aggs": {
"store_aggr": {
"terms": {
"field": "Store"
},
"aggs": {
"owner_aggr": {
"terms": {
"field": "owner"
},
"aggs": {
"top_hits_aggr": {
"top_hits": {
"sort": {
"#timestamp": {
"order": "desc"
}
}
}
}
}
},
"products_aggr": {
"terms": {
"field": "products"
},
"aggs": {
"top_hits_aggr": {
"top_hits": {
"sort": {
"#timestamp": {
"order": "desc"
}
}
}
}
}
}
}
}
}
}
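To make the semantics concrete, here is a small in-memory sketch of what the owner branch of this request computes: drop everything older than two days, bucket by store and then by owner, and keep the newest document of each bucket (which corresponds to taking the first hit of the sorted top_hits). It does not talk to Elasticsearch; the Doc class, the method names and the fourth sample document are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LatestPerOwner {

    // Hypothetical in-memory stand-in for an indexed document.
    static final class Doc {
        final String store;
        final String owner;
        final int products;
        final Instant timestamp;

        Doc(String store, String owner, int products, String isoTimestamp) {
            this.store = store;
            this.owner = owner;
            this.products = products;
            this.timestamp = Instant.parse(isoTimestamp);
        }
    }

    // Range filter (gte now-2d), then store -> owner -> newest document.
    // Buckets are ordered alphabetically here, not by document count.
    static Map<String, Map<String, Doc>> latestPerStoreAndOwner(List<Doc> docs, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(2));
        Map<String, Map<String, Doc>> byStore = new TreeMap<>();
        for (Doc doc : docs) {
            if (doc.timestamp.isBefore(cutoff)) {
                continue; // older than two days: excluded by the range query
            }
            byStore.computeIfAbsent(doc.store, key -> new TreeMap<>())
                   .merge(doc.owner, doc, (kept, candidate) ->
                           candidate.timestamp.isAfter(kept.timestamp) ? candidate : kept);
        }
        return byStore;
    }

    // The three documents from the question plus one stale document.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("books for school", "user_15", 40, "2020-08-02T18:00:00Z"),
                new Doc("books for school", "user_15", 45, "2020-08-02T19:00:00Z"),
                new Doc("books for school", "user_17", 55, "2020-08-02T20:00:00Z"),
                new Doc("books for school", "user_17", 10, "2020-07-01T09:00:00Z"));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-08-03T00:00:00Z");
        latestPerStoreAndOwner(sampleDocs(), now).forEach((store, owners) ->
                owners.forEach((owner, doc) ->
                        System.out.println(store + " / " + owner + " -> " + doc.products)));
    }
}
```

With the sample data, user_15 resolves to the 19:00 document (45 products) and user_17 to the 20:00 document (55 products); the July document never reaches a bucket.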
I'm new to both Elasticsearch and Spring. I've written a JavaScript POC that converts a JSON string into an Elasticsearch query (and performs the request).
It takes a string like this:
{
"period": "years",
"format": "xml",
"criteria": {
"operator": "OR",
"operands": [
{
"operator": "AND",
"operands": [
{
"operator": "exists",
"field": "def"
},
{
"operator": "includes",
"field": "keywords",
"value": [
"abcd"
]
}
]
},
{
"operator": "AND",
"operands": [
{
"operator": "from",
"field": "links",
"value": 1
},
{
"operator": "includes",
"field": "keywords",
"value": [
"abcd",
"efgh"
]
}
]
}
]
}
}
(Note: this query may be nested to any depth.)
... and converts it into this:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"exists": {
"field": "def"
}
},
{
"range": {
"effectiveDate": {
"gte": 1543982400,
"lt": 1575518400
}
}
}
]
}
},
{
"bool": {
"must": [
{
"terms": {
"keywords.name": [
"abcd",
"efgh"
]
}
},
{
"range": {
"effectiveDate": {
"gte": 1543982400,
"lt": 1575518400
}
}
}
]
}
}
]
}
},
{
"bool": {
"must": [
{
"bool": {
"must": {
"terms": {
"links": [
11048,
34618,
34658
]
}
}
}
},
{
"bool": {
"must": [
{
"terms": {
"keywords.name": [
"abcd",
"efgh"
]
}
},
{
"range": {
"effectiveDate": {
"gte": 1543982400,
"lt": 1575518400
}
}
}
]
}
}
]
}
}
]
}
}
}
},
"size": 0,
"aggs": {
"by_id": {
"composite": {
"sources": [
{
"agg_on_id": {
"terms": {
"field": "id"
}
}
}
],
"size": 10000,
"after": {
"agg_on_id": -1
}
},
"aggs": {
"latest_snapshot": {
"top_hits": {
"sort": [
{
"effectiveDate": "desc"
}
],
"_source": true,
"size": 1
}
}
}
}
}
}
It first creates a query (similar to above) for a first trip to Elasticsearch to extract some info ('links') needed for building this query.
Each trip to Elasticsearch may return millions of results, so it does paging using the "search_after" mechanism.
I need to convert this POC to a Spring application.
Question: Which is more appropriate for this case: Spring Data Elasticsearch or the Elasticsearch Java High Level REST Client?
Spring Data Elasticsearch seems to do a good job of creating simple queries without much effort, but would it help me in this case?
Any suggestions are much appreciated.
Thanks!
Spring Data Elasticsearch uses the high-level client provided by Elasticsearch for its non-reactive implementation.
You can also use the query builders from Elasticsearch together with Spring Data Elasticsearch, which gives you the greatest flexibility.
On top of that, Spring Data Elasticsearch adds entity mapping (POJO to JSON), repository functions and the other features of Spring Data.
So the question is not whether to use one or the other, but whether you need or want the additional functionality that Spring Data Elasticsearch offers.
Edit:
When using Spring Data Elasticsearch, you configure the RestHighLevelClient it should use (see the documentation) and can then have it injected into your other Spring beans. So you can even mix access to ES through Spring Data's ElasticsearchOperations or repositories with direct access through the RestHighLevelClient.
I would suggest using the official Java High Level REST Client, which is being actively worked on at Elastic. You can also look at all the query builders it supports (it has query builders for almost all queries).
Elasticsearch previously had no official high-level REST client for Java, but now that there is one and it is being actively improved, IMHO you should go with it: it provides a lot of options out of the box, and who understands Elasticsearch better than the company behind it :)
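Whichever client you choose, the arbitrarily nested criteria can be translated with plain recursion. The sketch below is a simplified take on that idea, not the POC's actual logic: it assumes AND maps to bool/must, OR to bool/should, "exists" to an exists query and "includes" to a terms query on the field as given, and it leaves out the effectiveDate range, the keywords.name mapping and the "links" lookup. The Node class and helper names are hypothetical. Each branch has a direct counterpart in the Elasticsearch QueryBuilders (boolQuery with must/should, existsQuery, termsQuery), so the same recursion should carry over to either client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class CriteriaTranslator {

    // Hypothetical node type mirroring the input JSON: either a group
    // (AND/OR with operands) or a leaf (operator, field, optional values).
    static final class Node {
        final String operator;
        final String field;
        final List<String> values;
        final List<Node> operands;

        private Node(String operator, String field, List<String> values, List<Node> operands) {
            this.operator = operator;
            this.field = field;
            this.values = values;
            this.operands = operands;
        }

        static Node group(String operator, Node... operands) {
            return new Node(operator, null, Collections.<String>emptyList(), Arrays.asList(operands));
        }

        static Node leaf(String operator, String field, String... values) {
            return new Node(operator, field, Arrays.asList(values), Collections.<Node>emptyList());
        }
    }

    // Single-entry JSON object.
    static Map<String, Object> obj(String key, Object value) {
        return Collections.singletonMap(key, value);
    }

    // Recursively turns a criteria node into a query-DSL fragment.
    static Map<String, Object> toQuery(Node node) {
        switch (node.operator) {
            case "AND":
                return bool("must", node.operands);
            case "OR":
                return bool("should", node.operands);
            case "exists":
                return obj("exists", obj("field", node.field));
            case "includes":
                return obj("terms", obj(node.field, node.values));
            default:
                throw new IllegalArgumentException("Unsupported operator: " + node.operator);
        }
    }

    // Translates every operand, then wraps the results in one bool clause.
    private static Map<String, Object> bool(String clause, List<Node> operands) {
        List<Object> translated = new ArrayList<>();
        for (Node operand : operands) {
            translated.add(toQuery(operand));
        }
        return obj("bool", obj(clause, translated));
    }

    public static void main(String[] args) {
        Node criteria = Node.group("OR",
                Node.group("AND",
                        Node.leaf("exists", "def"),
                        Node.leaf("includes", "keywords", "abcd")));
        System.out.println(toQuery(criteria));
    }
}
```

Because the translation only produces nested maps, it stays independent of the client decision; the maps can be serialized to JSON or replaced one-for-one by builder calls later.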
I'm having an issue implementing a Filter on a Projection that I have working in the Mongo Shell. I've got a Census object that contains a list of Employees.
{
"_id": "ID",
"name": "census1",
"employees": [ {
"eeId": "EE_ID1"
},
{
"eeId": "EE_ID2"
},
{
"eeId": "EE_ID3"
}
]
}
Realistically this could contain a lot of employees. So I'd like to be able to retrieve the main Census object, and a subset of employees. I've already implemented 'slice', so this is going to be retrieving a set of employees by their eeId.
This works fine:
db.census.aggregate(
[
{
$match: {
"_id": ObjectId("ID1")
}
},
{
$project: {
"censusName": 1,
"employees" : {
$filter : {
input: "$employees",
as: "employees",
cond: { $in: [ "$$employees.eeId", ["EE_ID1", "EE_ID3"]] }
}
}
}
}
]
).toArray()
The problem is, I can't get it implemented in Java. Here 'employeeIds' is a String of the IDs I want.
MatchOperation matchCensusIdStage = Aggregation.match(new Criteria("id").is(censusId));
ProjectionOperation projectStage = Aggregation.project("censusName")
.and(Filter.filter("employees")
.as("employees")
.by(In.arrayOf(employeeIds).containsValue("employees.eeId")))
.as("employees");
Aggregation aggregation = Aggregation.newAggregation(matchCensusIdStage, projectStage);
return mongoTemplate.aggregate(aggregation, Census.class, Census.class).getMappedResults().get(0);
For this, no results are returned. I've also tried implementing it with a BasicDBObject but got stuck there too.
EDIT (workaround):
I did get a solution using aggregation but not with the filter on the project. This is what I did:
db.parCensus.aggregate(
// Pipeline
[
{
$match: {
"_id": ObjectId("ID1")
}
},
{
$project: {
"_id": 0, "employee": "$employees"
}
},
{
$unwind: "$employee"
},
{
$match: {
"employee.eeId": { $in: ["EE_ID1", "EE_ID3"] }
}
}
]
).toArray()
Java Code:
MatchOperation matchCensusIdStage = Aggregation.match(new Criteria("id").is(censusId));
ProjectionOperation projectStage = Aggregation.project("censusName").and("employees").as("employee");
UnwindOperation unwindStage = Aggregation.unwind("employee");
MatchOperation matchEmployeeIdsStage = Aggregation.match(new Criteria("employee.eeId").in(employeeIds));
Aggregation aggregation = Aggregation.newAggregation(matchCensusIdStage, projectStage, unwindStage, matchEmployeeIdsStage);
I know I could add a $group at the end to put it back into one Census object, but I just created a separate CensusEmployee object to store it all.
The aggregation query posted in the question works fine. However, the syntax of ArrayOperators.In in the Spring Data MongoDB aggregation API is not clear to me, and I couldn't implement a solution based on that aggregation (nor could I find a related answer online).
But, the alternative solution is based on the following aggregation query - and it works fine.
db.collection.aggregate( [
{ $unwind: "$employees" },
{ $match: { "employees.eeId": { $in: ["EE_ID1", "EE_ID3"] } } },
{ $group: { _id: "$_id", name: { $first: "$name" }, employees: { $push: "$employees" } } }
] )
The Java code:
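// newAggregation, unwind, match and group are static methods of
// org.springframework.data.mongodb.core.aggregation.Aggregation
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;
List<String> empsToMatch = Arrays.asList("EE_ID1", "EE_ID3");
MongoOperations mongoOps = new MongoTemplate(MongoClients.create(), "test");
Aggregation agg = newAggregation(
    unwind("employees"), // one document per employee
    match(Criteria.where("employees.eeId").in(empsToMatch)), // keep only the wanted IDs
    group("_id") // reassemble one document per census
        .first("name").as("name")
        .push("employees").as("employees")
);
AggregationResults<Document> results = mongoOps.aggregate(agg, "collection", Document.class);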
I am trying to build a query in Java that filters all hits by a list.
Let's say I have a list of different names, and I want to build a query that returns all elements whose name is stored in my list.
Since there are going to be 100+ names in this list, I just want to pass the whole list to my query.
First I tried to build a raw query in my elasticsearch head plugin to make it easier for me to implement it into java.
At the moment my raw query looks like this:
{
"query": {
"bool": {
"filter": {
"term": {
"name": {
"value": [
"name1",
"name2"
]
}
}
}
}
}
}
I know that I have at least one element with the name "name1", and the same for "name2". But this query doesn't return anything.
What am I doing wrong?
Thanks,
Asiemie
The term query does not support arrays of values. However, the terms query does, so you can do the following:
{
"query": {
"bool": {
"filter": {
"terms": {
"name": [
"name1",
"name2"
]
}
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
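Since the goal is to pass a whole Java list, here is a minimal sketch that generates the terms body above from a List<String>. The class and method names are made up for illustration. With the Elasticsearch Java client you would more likely use QueryBuilders.termsQuery("name", names) inside boolQuery().filter(...), which should produce an equivalent request; the hand-rolled version just shows what ends up on the wire.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NamesFilter {

    // Quotes a value as a JSON string, escaping backslashes and double quotes.
    // Illustrative only: control characters are not escaped.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a bool/filter/terms body that matches any of the given values.
    static String termsFilterBody(String field, List<String> values) {
        String joined = values.stream()
                .map(NamesFilter::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"filter\":{\"terms\":{"
                + quote(field) + ":[" + joined + "]}}}}}";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("name1", "name2");
        System.out.println(termsFilterBody("name", names));
    }
}
```

The list can hold a hundred names or more without changing the shape of the query; only the array under "name" grows.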
You can also wrap term queries into a bool -> should query like so:
{
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{
"term": {
"name": "name1"
}
},
{
"term": {
"name": "name2"
}
}
]
}
}
}
}
}
I am trying to query Elasticsearch using the Java API. My query is:
curl -XGET 'http://localhost:9200/logstash-*/_search?search_type=count' -d '
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and" : [
{
"range": {
"timestamp": {
"gte": "2015-08-20",
"lt": "2015-08-21",
"format": "yyyy-MM-dd",
"time_zone": "+8:00"
}
}
},
{"query": {
"match": {
"request": {
"query": "/v2/brand"
}
}
}
},
{"term": { "response" : "200"}
}
]
}
}
},
"aggs": {
"group_by_device_id": {
"terms": {
"field": "clientip"
}
}
}
}'
The equivalent SQL logic is roughly:
select distinct(clientip) from table where timestamp between '2015-08-20' and '2015-08-21' and request like '/v2/brand%' and response = '200'
How can I implement it using the Java API?
Please guide me; I am new to Elasticsearch. Thanks in advance!
I have resolved the problem. My code is below:
// Elasticsearch 1.x Java API (FilterBuilders and filteredQuery were removed in later versions)
SearchResponse scrollResp1 = client.prepareSearch("logstash-*")
        .setSearchType(SearchType.SCAN)
        .setQuery(QueryBuilders.filteredQuery(
                QueryBuilders.matchAllQuery(),
                FilterBuilders.andFilter(
                        FilterBuilders.termFilter("response", "200"),
                        FilterBuilders.rangeFilter("timestamp").gte(startDate).lt(endDate),
                        FilterBuilders.queryFilter(QueryBuilders.matchQuery("request", "signup")))))
        // size(0) asks the terms aggregation for all buckets, not just the top ones
        .addAggregation(AggregationBuilders.terms("group_by_client_ip").size(0).field("clientip"))
        .get();