ElasticSearch Java API to get distinct values from the Query Builders

I am querying Elasticsearch using the Java API and am getting a lot of duplicate values. I want only the unique (distinct) values from the query. How can I get distinct values with the query builders?
Please find my Java code below, which returns duplicate values.
QueryBuilder qb2 = null;
List<Integer> link_id_array = new ArrayList<Integer>();
for (Replacement link_id : linkIDList) {
    link_id_array.add(link_id.getLink_id());
}
qb2 = QueryBuilders.boolQuery()
        .must(QueryBuilders.termsQuery("id", link_id_array));
I am using Elasticsearch 6.2.3 with RestHighLevelClient.

Way 1: Use the aggregations API with a terms aggregation.
Sample query to get the distinct values of the Email_client field:
{
  "query" : {
    "match_all" : { }
  },
  "aggregations" : {
    "label_agg" : {
      "terms" : {
        "field" : "Email_client",
        "size" : 100
      }
    }
  }
}
Java code sample:
SearchRequestBuilder aggregationQuery =
        client.prepareSearch("emails")
                .setQuery(QueryBuilders.matchAllQuery())
                .addAggregation(AggregationBuilders.terms("label_agg")
                        .field("Email_client").size(100));
SearchResponse response = aggregationQuery.execute().get();
Aggregation aggregation = response.getAggregations().get("label_agg");
StringTerms st = (StringTerms) aggregation;
// Each bucket key is one distinct value of Email_client
return st.getBuckets().stream()
        .map(bucket -> bucket.getKeyAsString())
        .collect(Collectors.toList());
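Way 2: Use the cardinality aggregation. Note that it returns only an approximate count of the distinct values, not the values themselves.
Sample Elasticsearch query: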
{
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "Email_client"
      }
    }
  }
}
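Java code sample:
// Assumption: query11 was not defined in the original snippet; match_all mirrors the JSON above
QueryBuilder query11 = QueryBuilders.matchAllQuery();
AggregationBuilder agg11 = AggregationBuilders.cardinality("distinct").field("Email_client");
SearchResponse response11 = client.prepareSearch("emails") // multiple index names can be given here
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(query11)
        .addAggregation(agg11)
        .setExplain(true)
        .setSize(0)
        .get();
// Read the (approximate) number of distinct values
Cardinality distinct = response11.getAggregations().get("distinct");
long distinctCount = distinct.getValue();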
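To make the difference between the two ways concrete, here is a minimal in-memory sketch in plain Java. It does not call Elasticsearch at all; the class and method names are made up for illustration. It only mirrors what each aggregation gives you back: a terms aggregation yields the distinct values themselves, while a cardinality aggregation yields just a number (and in Elasticsearch that number is approximate, which this sketch does not model).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch API.
final class DistinctSketch {

    // Analogue of a terms aggregation: the distinct values, in first-seen order.
    static List<String> distinctValues(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    // Analogue of a cardinality aggregation: only how many distinct values exist.
    static long distinctCount(List<String> values) {
        return values.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<String> clients = Arrays.asList("gmail", "outlook", "gmail", "yahoo", "outlook");
        System.out.println(distinctValues(clients));
        System.out.println(distinctCount(clients));
    }
}
```

So if you need the unique values rather than their count, Way 1 is the one to use.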

Related

MongoDB query for value in field with regex using Java

I am trying to query from my collection of documents which looks like:
{ "_id" : ObjectId("94"), "EmailAddress" : "adam@gmail.com", "Interests" : "CZ1001,CE2004" }
{ "_id" : ObjectId("44"), "EmailAddress" : "ben@gmail.com", "Interests" : "CE1001,CE4002" }
{ "_id" : ObjectId("54"), "EmailAddress" : "chris@gmail.com", "Interests" : "CE1001,CE2002" }
For example, I want to retrieve the email addresses of the documents whose "Interests" field contains the value I am looking for.
If I search for CZ1001, I should get back the EmailAddress of the first document.
If I search for CE1001, I should get back the EmailAddress of the second and third documents.
The object IDs were generated uniquely when I inserted the records, if that helps.
I am able to get the documents in the MongoDB shell using
db.users.find({Module: {"$regex": "CE1001"}})
Only the email addresses are needed.
I am trying to get all the email addresses and got stuck at this code:
Document doc = (Document) collection.find(new BasicDBObject("Module", {"$regex":"CE1001"})) .projection(Projections.fields(Projections.include("EmailAddress"), Projections.excludeId())).first();
Where
new BasicDBObject("Module", {"$regex":"CE1001"}) is not allowed.
new BasicDBObject("Module", String_variable) is allowed
For a particular interest, your MongoDB query would look like this (taking CE1001 as an example):
db.collection.find({
  $or: [
    { Interests: { $regex: "^CE1001," } },
    { Interests: { $regex: ",CE1001," } },
    { Interests: { $regex: ",CE1001$" } }
  ]
},
{
  "EmailAddress": 1
})
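One thing to watch with the three anchored patterns above: none of them matches a document whose Interests field holds exactly one code, such as "CE1001" with no commas. A single pattern of the form (^|,)CE1001(,|$) covers the start, middle, end and single-value cases. The sketch below checks that pattern with plain java.util.regex, outside MongoDB; the class and method names are hypothetical, and it assumes codes are separated by commas without surrounding spaces, as in the sample documents.

```java
import java.util.regex.Pattern;

// Hypothetical helper for testing the pattern locally; not a MongoDB driver API.
final class InterestMatcher {

    // Builds a pattern that matches the code only as a whole comma-separated entry.
    static Pattern wholeCode(String code) {
        return Pattern.compile("(^|,)" + Pattern.quote(code) + "(,|$)");
    }

    // True if the comma-separated interests string contains the code as a whole entry.
    static boolean hasInterest(String interests, String code) {
        return wholeCode(code).matcher(interests).find();
    }

    public static void main(String[] args) {
        System.out.println(hasInterest("CE1001,CE4002", "CE1001")); // code at the start
        System.out.println(hasInterest("CE1001", "CE1001"));        // single-value field
        System.out.println(hasInterest("ACE1001,CE2002", "CE1001")); // substring only
    }
}
```

A plain substring regex such as "CE1001" would also match the "ACE1001" entry, which is why the delimiters are part of the pattern.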
For others who happen to drop by this post, below is the working code. Credits to @Veeram.
FindIterable<Document> results = collection.find(new BasicDBObject("Module", new BasicDBObject("$regex", "CE1001")))
        .projection(Projections.fields(Projections.include("EmailAddress"), Projections.excludeId()));
for (Document doc : results) {
    System.out.println(doc.getString("EmailAddress"));
}

Group aggregation using spring data mongodb

I tried to write a group aggregation query using the year value from a date object as a key, but for some reason I'm getting this exception.
org.springframework.data.mapping.PropertyReferenceException: No property year(invoiceDate)
Here is the mongo query which I'm trying to replicate:
db.collection.aggregate([
{
$match:
{
"status": "Active"
}
},
{
$group:
{
"_id":{$year:"$invoiceDate"}
}
},
{
$sort:
{
"_id" : -1
}
}
])
And this is my Java implementation:
Aggregation aggregation = Aggregation.newAggregation(
match(new Criteria().andOperator(criteria())),
Aggregation.group("year(invoiceDate)")
).withOptions(newAggregationOptions().allowDiskUse(true).build());
I also couldn't find a way to sort the results of the grouping.
You're basically looking for extractYear() which maps to the $year operator with MongoDB:
Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(new Criteria().andOperator(criteria())),
        Aggregation.project().and("invoiceDate").extractYear().as("_id"),
        Aggregation.group("_id"),
        Aggregation.sort(Sort.Direction.DESC, "_id")
);
This generally needs to go into a $project in order to make the helpers happy.
If you really want the expression within the $group then you can add a custom operation expression:
Aggregation aggregation = Aggregation.newAggregation(
        Aggregation.match(new Criteria().andOperator(criteria())),
        new AggregationOperation() {
            @Override
            public Document toDocument(AggregationOperationContext aggregationOperationContext) {
                // Emits the raw stage: { $group: { _id: { $year: "$invoiceDate" } } }
                return new Document("$group",
                        new Document("_id", new Document("$year", "$invoiceDate"))
                );
            }
        },
        Aggregation.sort(Sort.Direction.DESC, "_id")
);
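If you want to sanity-check what the pipeline should return, the same match, group-by-year and descending sort can be sketched in memory with plain Java streams. This is only an illustration under assumptions: the Invoice class is a hypothetical stand-in with just the two fields the pipeline reads, and it uses LocalDate, so it ignores the fact that MongoDB's $year works on UTC date-times.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory illustration only; no Spring Data or MongoDB classes involved.
final class InvoiceYears {

    // Hypothetical stand-in for a stored document.
    static final class Invoice {
        final String status;
        final LocalDate invoiceDate;

        Invoice(String status, LocalDate invoiceDate) {
            this.status = status;
            this.invoiceDate = invoiceDate;
        }
    }

    static List<Integer> activeYearsDescending(List<Invoice> invoices) {
        return invoices.stream()
                .filter(i -> "Active".equals(i.status))   // $match: status == "Active"
                .map(i -> i.invoiceDate.getYear())        // group key: year of invoiceDate
                .distinct()                               // one entry per group
                .sorted(Comparator.reverseOrder())        // $sort: _id descending
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Invoice> invoices = Arrays.asList(
                new Invoice("Active", LocalDate.of(2016, 3, 1)),
                new Invoice("Inactive", LocalDate.of(2018, 5, 9)),
                new Invoice("Active", LocalDate.of(2017, 1, 20)),
                new Invoice("Active", LocalDate.of(2016, 11, 30)));
        System.out.println(activeYearsDescending(invoices));
    }
}
```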

Elasticsearch multilevel object search Java

I have a document given below.
{
"my_id": "123",
"content": {
"name": "abc",
"designation": "engineer"
}
}
I have written the following Java code for Elasticsearch to search on the name field inside content:
String field = "content.name";
String value = "abc";
SearchResponse response = esClient.prepareSearch("indexName")
        .setTypes("data")
        .setQuery(QueryBuilders.matchQuery(field, value))
        .get();
But this multilevel object search returns empty hits. Is there a way to access multilevel objects in Java?
The following query works from Sense:
GET indexName/_search
{
"query" : {
"match" : {
"content.name" : "abc"
}
}
}

MongoDB compare document fields in aggregation pipeline

I have a collection of documents like the following:
{ name : "John" ,
age : 25.0 ,
bornIn : "Milan" ,
city : [ {
name : "Roma" ,
state : "IT" ,
mayor : "John"
}]
}
{ name : "Jim" ,
age : 35.0 ,
bornIn : "Madrid" ,
city : [ {
name : "Madrid" ,
state : "ESP" ,
mayor : "Jim"
}]
}
I want to retrieve all the documents that have the field $bornIn equal to the field $city.name. I need to do this as an intermediate stage of a pipeline, so I can't use the $where operator.
I searched online and I found a suggestion to implement something like this:
{ $project:
{ matches:
{ $eq:[ '$bornIn', '$city.name' ] }
}
},
{ $match:
{ matches:true }
} )
But it worked neither via the shell nor via the Java driver: it reports the fields as different.
For the sake of completeness I report my code:
final DBObject eq = new BasicDBObject();
LinkedList eqFields = new LinkedList();
eqFields.add("$bornIn");
eqFields.add("$city.name");
eq.put("$eq", eqFields);
projectFields.put("matches", eq);
final DBObject proj = new BasicDBObject("$project", projectFields);
LinkedList agg = new LinkedList();
agg.add(proj);
final AggregationOutput aggregate = table.aggregate( agg);
Do you have any suggestion? I'm using MongoDB 3.2, and I need to do this via Java Driver.
Thanks!!
PS. It is not relevant but actually the documents above are the output of a $lookup stage among collections "cities" and "persons", with join on $name/$mayor.. it is super cool!! :D :D
I'm a little rusty on how Mongo deals with deep equality searching arrays of objects, but this is definitely doable with $unwind
db.foo.aggregate([
  { $unwind: "$city" },
  { $project:
    { matches:
      { $eq: [ '$bornIn', '$city.name' ] }
    }
  },
  { $match:
    { matches: true }
  }
]);
I'm not on a computer with Mongo right now, so my syntax might be off a bit.
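The reason the $unwind helps: while city is still an array, '$city.name' resolves to an array of names (for example ["Madrid"]), and comparing the string in bornIn with an array is never equal. After $unwind, each document carries a single city object, so $eq compares two strings. The plain-Java sketch below mimics that logic in memory; the Person and City classes are hypothetical stand-ins for the documents, not driver classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory illustration of the $unwind + $eq logic; no MongoDB driver involved.
final class UnwindSketch {

    static final class City {
        final String name;

        City(String name) {
            this.name = name;
        }
    }

    static final class Person {
        final String name;
        final String bornIn;
        final List<City> city;

        Person(String name, String bornIn, List<City> city) {
            this.name = name;
            this.bornIn = bornIn;
            this.city = city;
        }
    }

    // What "$city.name" yields before unwinding: a list of names, not a single string.
    static List<String> cityNames(Person p) {
        List<String> names = new ArrayList<>();
        for (City c : p.city) {
            names.add(c.name);
        }
        return names;
    }

    // One result per (person, city) pair whose names agree, as after $unwind.
    static List<String> bornInListedCity(List<Person> persons) {
        List<String> matches = new ArrayList<>();
        for (Person p : persons) {
            for (City c : p.city) {              // $unwind: one row per array element
                if (p.bornIn.equals(c.name)) {   // $eq on two plain strings
                    matches.add(p.name);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Person john = new Person("John", "Milan", Arrays.asList(new City("Roma")));
        Person jim = new Person("Jim", "Madrid", Arrays.asList(new City("Madrid")));
        System.out.println(bornInListedCity(Arrays.asList(john, jim)));
    }
}
```

As with $unwind itself, a person whose city array contains the matching name more than once would show up once per matching element.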

ElasticSearch 2.0 Java API aggregate filter with query_string

Running on ElasticSearch 2.0 connecting via the Java API. I've got the following query working via the REST API and can't figure out how to do this using the Java API.
{
"query": {
"query_string": {
"query": "myfield:*"
}
},
"aggs" : {
"foo_low": {
"filter" : {
"query" : {
"query_string" : {
"query": "myfield:[1 TO 5]"
}
}
}
},
"foo_high": {
"filter" : {
"query" : {
"query_string" : {
"query": "myfield:[6 TO 10]"
}
}
}
}
}
}
I've had a look at the examples using the addAggregation method, but I'm not sure how to pass in the query_string part.
As a bit of background, I was originally using Solr, so I have multiple Solr facet queries that need to be translated to ElasticSearch. The facet queries are a bit more complicated than I've shown in the example, with multiple fields and conditions referenced in each Solr facet query, which is why I want to use the Lucene query with query_string.
Any ideas gratefully received! Thanks.
Since it looks like myfield is an integer field, you could use a range filter instead of a query_string, which is more intended for text matching. Since you are interested in two ranges, I suggest using the range aggregation, which lets you define several range buckets (note that the to value is excluded from the range). Your query would then look like this:
{
"query": {
"query_string": {
"query": "myfield:*"
}
},
"aggs": {
"high_low": {
"range": {
"field": "myfield",
"keyed": true,
"ranges": [
{
"key": "foo_low",
"from": 1,
"to": 6
},
{
"key": "foo_high",
"from": 6,
"to": 11
}
]
}
}
}
}
Translated into Java code, it goes like this:
// 1. bootstrap the query
SearchRequestBuilder search = node.client().prepareSearch()
.setSize(0).setFrom(0)
.setQuery(QueryBuilders.queryStringQuery("myfield:*"));
// 2. create the range aggregation
RangeBuilder rangeAgg = AggregationBuilders.range("high_low").field("myfield");
rangeAgg.addRange("foo_low", 1, 6);
rangeAgg.addRange("foo_high", 6, 11);
search.addAggregation(rangeAgg);
// 3. execute the query
SearchResponse response = search.execute().actionGet();
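To see why the buckets use 6 and 11 rather than 5 and 10, here is a small plain-Java sketch of the half-open [from, to) rule. It is not Elasticsearch code; the class and method names are made up for illustration, and the bucket bounds are the ones from the query above.

```java
// In-memory illustration of half-open range buckets; not an Elasticsearch API.
final class RangeBuckets {

    // Returns the bucket key for a value, or null if it falls in neither range.
    static String bucketFor(double value) {
        if (value >= 1 && value < 6) {     // from 1 (inclusive) to 6 (exclusive)
            return "foo_low";
        }
        if (value >= 6 && value < 11) {    // from 6 (inclusive) to 11 (exclusive)
            return "foo_high";
        }
        return null;
    }

    public static void main(String[] args) {
        for (int v = 0; v <= 11; v++) {
            System.out.println(v + " -> " + bucketFor(v));
        }
    }
}
```

For integer values these buckets cover the same documents as [1 TO 5] and [6 TO 10]. For a non-integer field they would differ: 5.5 lands in foo_low here but is outside both inclusive ranges of the original query_string filters.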
** UPDATE **
As requested, here is the Java code that will generate the exact query you posted:
// 1. bootstrap the query
SearchRequestBuilder search = node.client().prepareSearch()
.setSize(0).setFrom(0)
.setQuery(QueryBuilders.queryStringQuery("myfield:*"));
// 2. create the filter aggregations
FilterAggregationBuilder lowAgg = AggregationBuilders
.filter("foo_low")
.filter(QueryBuilders.queryStringQuery("myfield:[1 TO 5]"));
search.addAggregation(lowAgg);
FilterAggregationBuilder highAgg = AggregationBuilders
.filter("foo_high")
.filter(QueryBuilders.queryStringQuery("myfield:[6 TO 10]"));
search.addAggregation(highAgg);
// 3. execute the query
SearchResponse response = search.execute().actionGet();
