I have a JSON document:
{"id": 2,"name": "Chethan","address":"Banglore"}
I am trying to group by the two fields id and name.
List<String> statFields = new ArrayList<>();
statFields.add("name");
statFields.add("id");
// 2. bootstrap the query
SearchRequestBuilder search = client.prepareSearch("student")
        .setSize(0).setFrom(0)
        .setQuery(QueryBuilders.matchAllQuery());
// 3. add a terms aggregation for each of your fields
for (String field : statFields) {
    search.addAggregation(AggregationBuilders.terms(field + "_stats").field(field));
}
// 4. execute the query
SearchResponse response = search.execute().actionGet();
for (String field : statFields) {
    Terms termAgg = (Terms) response.getAggregations().get(field + "_stats");
    for (Terms.Bucket entry : termAgg.getBuckets()) {
        System.out.println(entry.getKey() + " **** " + entry.getDocCount()); // doc count
    }
}
Below is the response:
chethan**** 2
Raj**** 1
Mohan**** 1
1 **** 1
2 **** 1
3 **** 1
But I need a combined response, like SQL:
name id count
chethan 1 1
Is this possible through the Elasticsearch Java API?
You should use a sub-aggregation, and use the keyword type for the aggregation fields.
Java REST High-Level Client
Assuming your mappings look like:
PUT student
{
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "id": {
          "type": "keyword"
        },
        "address": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
In order to group by name and id you should use this query (low-level query):
GET student/_search
{
  "size": 0,
  "aggs": {
    "name": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "id": {
          "terms": {
            "field": "id",
            "size": 10
          }
        }
      }
    }
  }
}
In Java, the query above translates to:
// note: newer high-level client versions use client.search(request, RequestOptions.DEFAULT)
SearchResponse response = client.search(new SearchRequest("student")
        .source(new SearchSourceBuilder()
                .size(0)
                .aggregation(AggregationBuilders.terms("by_name").field("name.keyword")
                        .subAggregation(AggregationBuilders.terms("by_id")
                                .field("id")))));
If you want to keep your existing code, it would look something like this:
// 2. bootstrap the query
SearchRequestBuilder search = client.prepareSearch("student")
        .setSize(0).setFrom(0)
        .setQuery(QueryBuilders.matchAllQuery());
// 3. add a terms aggregation with a sub-aggregation
TermsAggregationBuilder aggregation = AggregationBuilders.terms("name_stats").field("name.keyword");
aggregation.subAggregation(AggregationBuilders.terms("id_stats").field("id"));
search.addAggregation(aggregation);
// 4. execute the query
SearchResponse response = search.execute().actionGet();
Terms termAgg = (Terms) response.getAggregations().get("name_stats");
for (Terms.Bucket entry : termAgg.getBuckets()) {
    if (entry.getDocCount() != 0) {
        Terms terms = entry.getAggregations().get("id_stats"); // use the sub-aggregation name, not "id"
        for (Terms.Bucket sub : terms.getBuckets()) {
            System.out.println(sub.getDocCount());
            System.out.println(sub.getKeyAsString());
        }
    }
}
I removed the for loop; you should design your own result structure now that you are using sub-aggregations.
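For the SQL-style output from the question, a minimal sketch (assuming the aggregation names name_stats/id_stats used above) could walk the parent buckets together with their sub-buckets:
// sketch: print "name id count" rows by combining each parent bucket with its sub-buckets
Terms nameAgg = (Terms) response.getAggregations().get("name_stats");
for (Terms.Bucket nameBucket : nameAgg.getBuckets()) {
    Terms idAgg = nameBucket.getAggregations().get("id_stats");
    for (Terms.Bucket idBucket : idAgg.getBuckets()) {
        System.out.println(nameBucket.getKeyAsString() + " "
                + idBucket.getKeyAsString() + " "
                + idBucket.getDocCount());
    }
}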
UPDATE
Is this what you want?
GET student/_search
{
"size": 0,
"aggs" : {
"name_id" : {
"terms" : {
"script" : {
"source": "doc['name.keyword'].value + '_' + doc['id'].value",
"lang": "painless"
}
}
}
}
}
I hope this is what you aimed for.
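If you want the scripted variant from the UPDATE in Java, something along these lines should work (a sketch; new Script(String) defaults to an inline painless script on recent clients):
import org.elasticsearch.script.Script;

// sketch: terms aggregation keyed by the composite "name_id" script
Script script = new Script("doc['name.keyword'].value + '_' + doc['id'].value");
search.addAggregation(AggregationBuilders.terms("name_id").script(script));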
Related
I have an existing collection, containing several documents.
[{
"_id": "...1",
"prop1": "...",
"prop2": "...",
"someArray": [
{
"value": "sub element 1.1"
},
{
"value": "sub element 1.2"
},
{
"value": "sub element 1.3"
}
]
}, {
"_id": "...2",
"prop1": "...",
"prop2": "...",
"someArray": [
{
"value": "sub element 2.1"
},
{
"value": "sub element 2.2"
}
]
}, // many others here...
]
For each root document, I would like to add an _id property of type ObjectId on each sub-element of someArray. So, after I run my command, the content of the collection should be the following:
[{
"_id": "...1",
"prop1": "...",
"prop2": "...",
"someArray": [
{
"_id": ObjectId("..."),
"value": "sub element 1.1"
},
{
"_id": ObjectId("..."),
"value": "sub element 1.2"
},
{
"_id": ObjectId("..."),
"value": "sub element 1.3"
}
]
}, {
"_id": "...2",
"prop1": "...",
"prop2": "...",
"someArray": [
{
"_id": ObjectId("..."),
"value": "sub element 2.1"
},
{
"_id": ObjectId("..."),
"value": "sub element 2.2"
}
]
}, // ...
]
Each ObjectId being, of course, unique.
The closest I got was with this:
db.getCollection('myCollection').updateMany({}, { "$set" : { "someArray.$[]._id" : ObjectId() } });
But every sub-element of the entire collection ends up with the same ObjectId value, because ObjectId() is evaluated once, before the update is sent to the server...
Ideally, I need to get this working using the Java driver for MongoDB. The closest version I got is this (which presents the exact same problem: all the ObjectIds created have the same value).
database
.getCollection("myCollection")
.updateMany(
Filters.ne("someArray", Collections.emptyList()), // do not update empty arrays
new Document("$set", new Document("someArray.$[el]._id", "ObjectId()")), // set the new ObjectId...
new UpdateOptions().arrayFilters(
Arrays.asList(Filters.exists("el._id", false)) // ... only when the _id property doesn't already exist
)
);
With MongoDB v4.4+, you can use $function to run JavaScript that assigns the _id values in the array.
db.collection.aggregate([
{
"$addFields": {
"someArray": {
$function: {
body: function(arr) {
return arr.map(function(elem) {
elem['_id'] = new ObjectId();
return elem;
})
},
args: [
"$someArray"
],
lang: "js"
}
}
}
}
])
Here is the Mongo playground for your reference. (It's slightly different from the code above, as the playground requires the js code to be in double quotes.)
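If you need this from the Java driver, I believe the same expression can be sent as an update pipeline on MongoDB 4.4+ (a sketch, untested; requires a driver version that accepts pipeline updates):
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.Arrays;

// sketch: $set with a $function expression inside an update pipeline (MongoDB 4.4+)
Document setStage = new Document("$set", new Document("someArray",
        new Document("$function", new Document()
                .append("body", "function(arr) { return arr.map(function(elem) { elem._id = new ObjectId(); return elem; }) }")
                .append("args", Arrays.asList("$someArray"))
                .append("lang", "js"))));
collection.updateMany(new Document(), Arrays.asList(setStage));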
For older versions of MongoDB, you will need to use JavaScript to loop over the documents and update them one by one.
db.getCollection("...").find({}).forEach(function(doc) {
doc.someArray = doc.someArray.map(function(elem) {
elem['_id'] = new ObjectId();
return elem;
})
db.getCollection("...").save(doc);
})
Here is what I managed to write in the end:
MongoCollection<Document> collection = database.getCollection("myCollection");
collection
.find(Filters.ne("someArray", Collections.emptyList()), MyItem.class)
.forEach(item -> {
item.getSomeArray().forEach(element -> {
if( element.getId() == null ){
collection.updateOne(
Filters.and(
Filters.eq("_id", item.getId()),
Filters.eq("someArray.value", element.getValue())
),
Updates.set("someArray.$._id", new ObjectId())
);
}
});
});
The value property of the sub-elements had to be unique (and luckily it was), and I had to perform separate updateOne operations in order to obtain a different ObjectId for each element.
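An alternative that avoids relying on value being unique is to assign the ObjectIds in memory and write the whole array back in a single update per document (a sketch, assuming a plain Document collection and a driver recent enough for Document.getList):
import java.util.Collections;
import java.util.List;
import org.bson.Document;
import org.bson.types.ObjectId;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;

MongoCollection<Document> collection = database.getCollection("myCollection");
for (Document doc : collection.find(Filters.ne("someArray", Collections.emptyList()))) {
    List<Document> array = doc.getList("someArray", Document.class);
    for (Document element : array) {
        if (element.get("_id") == null) {
            element.put("_id", new ObjectId()); // each element gets its own ObjectId
        }
    }
    collection.updateOne(Filters.eq("_id", doc.get("_id")),
            Updates.set("someArray", array)); // one write per document
}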
I want to add a new field to a JsonObject, and this new field's name will be based on the value of another field. To be clear, here is an example of what I want to achieve.
{
"values": [
{
"id": "1",
"properties": [
{
"stat": "memory",
"data": 8
},
{
"stat": "cpu",
"data": 4
}
]
},
{
"id": "2",
"properties": [
{
"stat": "status",
"data": "OK"
},
{
"stat": "cpu",
"data": 4
}
]
}
]
}
I want to add a new field to each JSON object, using the value of the field "stat" as its name.
{
"values": [
{
"id": "1",
"properties": [
{
"stat": "memory",
"data": 8,
"memory": 8
},
{
"stat": "cpu",
"data": 4,
"cpu": 4
}
]
},
{
"id": "2",
"properties": [
{
"stat": "status",
"data": 0,
"status": 0
},
{
"stat": "cpu",
"data": 4,
"cpu": 4
}
]
}
]
}
I have tried to do the following with the JsonPath library, but it is an ugly solution for me: I parse the JSON three times and do some manual replacements.
val configuration = Configuration.builder().options(Option.DEFAULT_PATH_LEAF_TO_NULL, Option.ALWAYS_RETURN_LIST).build()
val jsonContext5 = JsonPath.using(configuration).parse(jsonStr)
val listData = jsonContext5.read("$['values'][*]['properties'][*]['data']").toString
  .replace("[", "").replace("]", "").split(",").toList
val listStat = jsonContext5.read("$['values'][*]['properties'][*]['stat']").toString
  .replace("[", "").replace("]", "")
  .replace("\"", "").split(",").toList
// Replacing values of "stat" by values of "data"
jsonContext5.map("$['values'][*]['properties'][*]['stat']", new MapFunction() {
var count = - 1
override def map(currentValue: Any, configuration: Configuration): AnyRef = {
count += 1
listData(count)
}
})
// replace field stat by its value
for( count <- 0 to listStat.size - 1){
val path = s"['values'][*]['properties'][$count]"
jsonContext5.renameKey(path, "stat", s"${listStat(count)}")
}
This is the result obtained:
{
"values": [
{
"id": "1",
"properties": [
{
"data": 8,
"memory": "8"
},
{
"data": 4,
"cpu": "4"
}
]
},
{
"id": "2",
"properties": [
{
"data": 0,
"memory": "0"
},
{
"data": 4,
"cpu": "4"
}
]
}
]
}
Is there any better method to achieve this result? I tried to do it with Gson, but it is not good at handling paths.
This is a way to do it with Gson, but I will lose the information about the other fields, since I am creating another JSON array.
val jsonArray = jsonObject.get("properties").getAsJsonArray
val iter = jsonArray.iterator()
val agreedJson = new JsonArray()
while(iter.hasNext) {
val json = iter.next().getAsJsonObject
agreedJson.add(replaceCols(json))
}
def replaceCols(json: JsonObject) = {
val fieldName = "stat"
if(json.has(fieldName)) {
val columnName = json.get(fieldName).getAsString
val value: String = if (json.has("data")) json.get("data").getAsString else ""
json.addProperty(columnName, value)
}
json
}
How about something like this?
private static void statDup(final JSONObject o) {
if (o.containsKey("properties")) {
final JSONArray a = (JSONArray) o.get("properties");
for (final Object e : a) {
final JSONObject p = (JSONObject) e;
p.put((String) p.get("stat"), p.get("data")); // the key must be a String
}
} else {
for (final Object key : o.keySet()) {
final Object value = o.get(key);
if (value instanceof JSONArray) {
for (final Object e : (JSONArray) value) {
statDup((JSONObject) e);
}
}
}
}
}
Using Gson, what you could do is create a base class that represents your initial JSON object, then extend that class with the additional attribute(s) you want to add, such as the one named after "stat". Load the JSON objects into memory, either one by one or all together, make the necessary changes to each, map them to the new class if you didn't in the prior step, and serialize the result to a file or some other storage.
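Since the new field's name is dynamic, Gson's tree model is the simplest place to add it; here is a minimal Java sketch of the idea (JsonParser.parseString assumes a recent Gson version):
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

// parse the document, then copy each "data" value under the name found in "stat"
JsonObject root = JsonParser.parseString(jsonStr).getAsJsonObject();
for (JsonElement value : root.getAsJsonArray("values")) {
    JsonArray properties = value.getAsJsonObject().getAsJsonArray("properties");
    for (JsonElement prop : properties) {
        JsonObject p = prop.getAsJsonObject();
        p.add(p.get("stat").getAsString(), p.get("data"));
    }
}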
This is a type-safe, pure-FP circe implementation with circe-optics:
object CirceOptics extends App {
import cats.Applicative
import cats.implicits._
import io.circe.{Error => _, _}
import io.circe.syntax._
import io.circe.parser._
import io.circe.optics.JsonPath._
val jsonStr: String = ???
def getStat(json: Json): Either[Error, String] =
root.stat.string.getOption(json)
.toRight(new Error(s"Missing stat of string type in $json"))
def getData(json: Json): Either[Error, Json] =
root.data.json.getOption(json)
.toRight(new Error(s"Missing data of json type in $json"))
def setField(json: Json, key: String, value: Json) =
root.at(key).setOption(Some(value))(json)
.toRight(new Error(s"Unable to set $key -> $value to $json"))
def modifyAllPropertiesOfAllValuesWith[F[_]: Applicative](f: Json => F[Json])(json: Json): F[Json] =
root.values.each.properties.each.json.modifyF(f)(json)
val res = for {
json <- parse(jsonStr)
modifiedJson <- modifyAllPropertiesOfAllValuesWith { j =>
for {
stat <- getStat(j)
data <- getData(j)
prop <- setField(j, stat, data)
} yield prop
} (json)
} yield modifiedJson
println(res)
}
The previous answer from Gene McCulley gives a solution in Java using the net.minidev.json classes. This answer uses Gson and is written in Scala.
import scala.jdk.CollectionConverters._ // Scala 2.13+; use scala.collection.JavaConverters on 2.12

def statDup(o: JsonObject): JsonObject = {
  if (o.has("properties")) {
    val a = o.get("properties").getAsJsonArray
    a.asScala.foreach { e =>
      val p = e.getAsJsonObject
      p.add(p.get("stat").getAsString, p.get("data"))
    }
  } else {
    o.keySet.asScala.foreach { key =>
      o.get(key) match {
        case jsonArr: JsonArray =>
          jsonArr.asScala.foreach { e =>
            statDup(e.getAsJsonObject)
          }
        case _ => // ignore non-array values
      }
    }
  }
  o
}
Your task is to add a new field to each record under properties in the JSON file, using the current stat value as the field name and the data value as its value. The code will be rather long if you try to do it in Java.
I suggest using SPL, an open-source Java package, to get it done. The coding is easy; you only need one line:
A1: =json(json(file("data.json").read()).values.run(properties=properties.(([["stat","data"]|stat]|[~.array()|data]).record())))
SPL offers a JDBC driver that can be invoked from Java. Just store the above SPL script as addfield.splx and invoke it in a Java application as you would call a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call addfield()");
st.execute();
…
How can I find the number of duplicates in each document in Java with MongoDB?
I have a collection like this.
Collection example:
{
"_id": {
"$oid": "5fc8eb07d473e148192fbecd"
},
"ip_address": "192.168.0.1",
"mac_address": "00:A0:C9:14:C8:29",
"url": "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes": {
"$date": "2021-02-13T02:02:00.000Z"
}
}
{
"_id": {
"$oid": "5ff539269a10d529d88d19f4"
},
"ip_address": "192.168.0.7",
"mac_address": "00:A0:C9:14:C8:30",
"url": "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes": {
"$date": "2021-02-12T19:00:00.000Z"
}
}
{
"_id": {
"$oid": "60083d9a1cad2b613cd0c0a2"
},
"ip_address": "192.168.1.5",
"mac_address": "00:0A:05:C7:C8:31",
"url": "www.facebook.com",
"datetimes": {
"$date": "2021-01-24T17:00:00.000Z"
}
}
Example query:
BasicDBObject whereQuery = new BasicDBObject();
DBCursor cursor = table1.find(whereQuery);
while (cursor.hasNext()) {
    DBObject obj = cursor.next();
    String ip_address = (String) obj.get("ip_address");
    String mac_address = (String) obj.get("mac_address");
    Date datetimes = (Date) obj.get("datetimes");
    String url = (String) obj.get("url");
    System.out.println(ip_address + " " + mac_address + " " + datetimes + " " + url);
}
In Java, how can I find which "url" values are duplicated, and how many duplicates there are?
In MongoDB you can solve this problem with an aggregation pipeline, which you then implement with the MongoDB Java Driver. The pipeline below returns only the duplicated results, together with their duplicate counts.
db.getCollection('table1').aggregate([
{
"$group": {
// group by url and calculate count of duplicates by url
"_id": "$url",
"url": {
"$first": "$url"
},
"duplicates_count": {
"$sum": 1
},
"duplicates": {
"$push": {
"_id": "$_id",
"ip_address": "$ip_address",
"mac_address": "$mac_address",
"url": "$url",
"datetimes": "$datetimes"
}
}
}
},
{ // select documents that only duplicates count higher than 1
"$match": {
"duplicates_count": {
"$gt": 1
}
}
},
{
"$project": {
"_id": 0
}
}
]);
Output Result:
{
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"duplicates_count" : 2.0,
"duplicates" : [
{
"_id" : ObjectId("5fc8eb07d473e148192fbecd"),
"ip_address" : "192.168.0.1",
"mac_address" : "00:A0:C9:14:C8:29",
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes" : {
"$date" : "2021-02-13T02:02:00.000Z"
}
},
{
"_id" : ObjectId("5ff539269a10d529d88d19f4"),
"ip_address" : "192.168.0.7",
"mac_address" : "00:A0:C9:14:C8:30",
"url" : "https://people.richland.edu/dkirby/141macaddress.htm",
"datetimes" : {
"$date" : "2021-02-12T19:00:00.000Z"
}
}
]
}
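With the MongoDB Java Driver, the same pipeline could be built roughly like this (a sketch; it pushes $$ROOT instead of listing each field, which keeps the code short):
import static com.mongodb.client.model.Accumulators.*;
import static com.mongodb.client.model.Aggregates.*;
import static com.mongodb.client.model.Filters.gt;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Projections;
import java.util.Arrays;
import org.bson.Document;

MongoCollection<Document> table1 = database.getCollection("table1");
table1.aggregate(Arrays.asList(
        group("$url",                          // group by url
                first("url", "$url"),
                sum("duplicates_count", 1),
                push("duplicates", "$$ROOT")), // keep the whole documents as duplicates
        match(gt("duplicates_count", 1)),      // only groups with more than one document
        project(Projections.excludeId())
)).forEach(doc -> System.out.println(doc.toJson()));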
If I understand your question correctly, you're trying to find the number of duplicate entries for the field url. You could iterate over all your documents and add the url values to a Set. A Set only stores unique values: when you add a value that is already present, it will not be added again. Thus the difference between the number of documents and the number of entries in the Set is the number of duplicate entries for the given field.
If you want to know which URLs are non-unique, you can evaluate the return value of Set.add(Object), which tells you whether the given value was already in the Set. If it was, you have found a duplicate.
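A minimal sketch of that idea, reusing the cursor from the question:
import java.util.HashSet;
import java.util.Set;

Set<String> seenUrls = new HashSet<>();
int duplicates = 0;
while (cursor.hasNext()) {
    String url = (String) cursor.next().get("url");
    if (!seenUrls.add(url)) { // add() returns false if the url was already in the Set
        duplicates++;
    }
}
System.out.println("duplicate url entries: " + duplicates);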
Hi, I have written a query for getting the average of values in Elasticsearch.
Elasticsearch payload: "userData": [ { "sub":1234, "value":678,"condition" :"A" },{ "sub":1234, "value":678,"condition" :"B" }]
{
"aggs": {
"student_data": {
"date_histogram": {
"field":"#timestamp",
"calendar_interval":"minute"
},
"aggs": {
"user_avg": {
"avg": {
"field":"value"
}
}
}
}
}
}
What I want is to also get the array of elements over which the average value is computed.
For example, if the average of the values matching condition 'A' is 42, with the values being {20, 10, 40, 60, 80},
the output needs a field that provides the array [20, 10, 40, 60, 80].
I don't think you can obtain an array formatted like [20, 10, 40, 60, 80] in the response of a query; I can't think of a way to obtain it by using aggregations or scripted fields. Nevertheless, you can easily (1) get that information from the same query that specifies the aggregations and the filter logic, and then (2) post-process the query response to collect all the value fields used to calculate the average, formatting them in the way you prefer. How you post-process your response depends on the client/script you are using to send queries to Elasticsearch.
For example, you can output the values used to calculate the average as query hits.
{
"size": 100, <-- adjust this upper limit to your use case
"_source": "value", <-- include only the `value` field in the response
"query": {
"match": {
"condition": "A"
}
},
"aggs": {
"user_avg": {
"avg": {
"field": "value"
}
}
}
}
Or you can output the values used to calculate the average in a more compact way, by using terms aggregations.
{
"size": 0,
"_source": "value",
"query": {
"match": {
"condition": "A"
}
},
"aggs": {
"group_by_values": {
"terms": {
"field": "value",
"size": 100 . <-- adjust this upper limit to your use case
}
},
"user_avg": {
"avg": {
"field": "value"
}
}
}
}
The result of the latter will be something like:
"aggregations" : {
"array_of_values" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 50,
"doc_count" : 2
},
{
"key" : 60,
"doc_count" : 1
},
{
"key" : 100,
"doc_count" : 1
}
]
},
"user_avg" : {
"value" : 65.0
}
}
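As an example of such post-processing, with the Java client you could expand the buckets back into the flat list of values (each key repeated doc_count times); the aggregation name matches the query above:
import java.util.ArrayList;
import java.util.List;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// rebuild the raw array of values from the terms buckets
Terms groupByValues = response.getAggregations().get("group_by_values");
List<Long> values = new ArrayList<>();
for (Terms.Bucket bucket : groupByValues.getBuckets()) {
    for (long i = 0; i < bucket.getDocCount(); i++) {
        values.add(((Number) bucket.getKey()).longValue());
    }
}
System.out.println(values); // e.g. [50, 50, 60, 100]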
Good day, guys.
I have a document type "generalTask" with an array of nested documents called "completeUser".
Here is the mapping:
{
"generalTask": {
"properties": {
"id": {
"type": "long"
},
"completeUser": {
"type": "nested",
"properties": {
"completeTime": {
"type": "long"
},
"userId": {
"type": "long"
}
}
}
}
}
}
And now, we have two documents.
e.g.
{
"_source": {
"id": 1001,
"completeUser": [
{
"userId": 1,
"completeTime": 100
},
{
"userId": 1,
"completeTime": 300
},
{
"userId":1,
"completeTime": 500
}
]
}
}
and
{
"_source": {
"id": 1002,
"completeUser": [
{
"userId": 1,
"completeTime": 200
},
{
"userId": 1,
"completeTime": 400
},
{
"userId":1,
"completeTime": 600
}
]
}
}
I can get the docCount (which is 6) with a nested aggregation like this:
BoolQueryBuilder query = QueryBuilders.boolQuery();
query.must(nestedQuery("completeUser", termQuery("completeUser.userId", 1)));
SearchRequestBuilder builder = getClient().prepareSearch(getIndexName()).setTypes(getIndexType()).setQuery(query)
        .addAggregation(AggregationBuilders.nested("nested").path("completeUser")
                .subAggregation(AggregationBuilders.count("count").field("completeUser.userId"))).setSize(0);
SearchResponse searchResponse = getSearchResponse(builder);
Nested nested = searchResponse.getAggregations().get("nested");
long docCount = nested.getDocCount(); // the docCount is 6
But there are still only 2 documents in the searchResponse:
SearchRequestBuilder builder = getClient().prepareSearch(getIndexName()).setTypes(getIndexType())
.setSearchType(SearchType.QUERY_THEN_FETCH).setQuery(query).setFrom(0).setSize(5); // the size is 5
builder.addSort(SortBuilders.fieldSort("completeUser.completeTime")
.setNestedFilter(FilterBuilders.termFilter("completeUser.userId", 1))
.order(SortOrder.DESC));
SearchResponse searchResponse = getSearchResponse(builder);
But what I want is duplicated documents based on completeTime.
How can I get 5 (the value of size) documents in the searchResponse, ordered by completeTime?
Oh, yes: the Elasticsearch version is 1.4.5.
Since you have only two documents stored in the index (or at least only two documents match your query), those two documents will be returned within the SearchResponse. You cannot directly get the first five nested documents sorted by completeTime, as the search response contains the whole objects that are stored in the index.
The solution for you would be to parse the results out in Java code:
Since you set the query size to 5, you will get at most five results back, sorted by highest completeTime first. That means you do receive all the needed data, and then some more.
Parse all of the nested documents in Java, then sort them again and take the first five of them, as in the sketch below.
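A rough sketch of that client-side step (assuming the _source layout from the question; on the 1.4.5 API, hit.getSource() returns the document as a Map):
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.elasticsearch.search.SearchHit;

// flatten all nested completeUser entries, sort by completeTime desc, keep the first five
List<Map<String, Object>> entries = new ArrayList<>();
for (SearchHit hit : searchResponse.getHits().getHits()) {
    List<Map<String, Object>> completeUser =
            (List<Map<String, Object>>) hit.getSource().get("completeUser");
    entries.addAll(completeUser);
}
entries.sort((a, b) -> Long.compare(
        ((Number) b.get("completeTime")).longValue(),
        ((Number) a.get("completeTime")).longValue()));
List<Map<String, Object>> topFive = entries.subList(0, Math.min(5, entries.size()));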