How can I fix this ElasticSearch Fielddata exception in Java code?

I'm working on Java code to create an index and query on ElasticSearch.
I keep getting this exception when trying to use the count and sort APIs:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true ......
How can I set Fielddata to true?
I used BulkRequest to create the index; how can I add a mapping to a BulkRequest?
Here is the code that creates the index:
BulkRequest request = new BulkRequest();
try {
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    String line;
    while ((line = br.readLine()) != null) {
        request.add(new IndexRequest(indexName, type).source(line, XContentType.JSON));
    }
    br.close();
    BulkResponse bulkresp = client.bulk(request);
    afterBulk(request, bulkresp);
} catch (IOException e) {
    e.printStackTrace();
}

First of all, let's go to the source of the problem: you want to sort on a text field, which requires fielddata to be enabled.
Before you enable fielddata, consider why you are using a text field
for aggregations, sorting, or in a script. It usually doesn’t make
sense to do so.
A text field is analyzed before indexing so that a value like New York
can be found by searching for new or for york. A terms aggregation on
this field will return a new bucket and a york bucket, when you
probably want a single bucket called New York.
The same applies to sorting: how are you supposed to sort on a field that has been split into that many terms?
Instead, you should have a text field for full text searches, and an
unanalyzed keyword field with doc_values enabled for aggregations,
as follows:
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
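With that mapping in place, sorting and aggregations should target the keyword sub-field rather than the analyzed text, along these lines (a sketch; the index name is hypothetical):
GET your_index/_search
{
  "sort": [ { "my_field.keyword": "asc" } ],
  "aggs": {
    "by_my_field": { "terms": { "field": "my_field.keyword" } }
  }
}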
As for the other part of the question: take a look at CreateIndexRequest, which lets you specify mappings explicitly. Right now you are most likely relying on dynamic mappings, which is why fielddata is causing you problems. More information on how to use CreateIndexRequest: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index
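A minimal sketch of that approach, assuming the same indexName and type variables as in the question and the high-level REST client (the exact create/mapping overloads vary between client versions, so treat this as an outline rather than the definitive call):
// Create the index with an explicit mapping before running the bulk request.
// CreateIndexRequest here is org.elasticsearch.action.admin.indices.create.CreateIndexRequest.
CreateIndexRequest createIndex = new CreateIndexRequest(indexName);
createIndex.mapping(type,
        "{ \"properties\": { \"my_field\": { \"type\": \"text\","
        + " \"fields\": { \"keyword\": { \"type\": \"keyword\" } } } } }",
        XContentType.JSON);
client.indices().create(createIndex, RequestOptions.DEFAULT); // some client versions take no RequestOptions argument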

Related

Fetching new documents on insertion in ElasticSearch with Java

I have been looking for a solution to create a sort of alert when new documents are added to ES via Logstash. I have seen some threads on here, such as stackoverflow.com/a/51980618/4604579, but that does not really serve my purposes, as the plug-ins mentioned do not work with the newest version of ELK and there is no Changes API out yet.
So I have resorted to trying two different approaches:
1. Create a Scroll and run over all the documents in a given index using the Search API, retain the last document's ID, and use it after a given timeout period to get all documents that were added after it.
2. Create a Watcher that checks after a given interval (for example, 5 minutes) whether new documents have been added to an index.
I have made progress on approach 1: I can scroll through the roughly 50k documents currently in ES and retrieve the last document's ID (I sort the query by timestamp in ascending order, so I know the last document is the latest one inserted). But I don't know how efficient this approach is, and I know that a scroll may time out after a given delay, so if no new documents are inserted the scroll context will be removed.
I was looking also into using a Watcher, but I don't really understand how I can set up the condition to check if a new document was inserted in a given index.
I imagine I can do something along these lines:
PUT _watcher/watch/new_docs
{
  "trigger" : {
    "schedule" : {
      "interval" : "5s"
    }
  },
  "input" : {
    "search" : {
      "request" : {
        "indices" : "logstash",
        "body" : {
          "size" : 0,
          "query" : { "match" : { "@timestamp" : "now-5s" } }
        }
      }
    }
  },
  "condition" : {
    "compare" : { ?? }
  },
  "actions" : {
    "my_webhook" : {
      "webhook" : {
        "method" : "POST",
        "host" : "mylisteninghost",
        "port" : 9200,
        "path" : "/{{watch_id}}",
        "body" : "New document {{document ID}} errors"
      }
    }
  }
}
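(For the condition, the Watcher documentation describes a compare condition on values in the execution context; something that fires whenever the scheduled search returns hits would presumably look like the sketch below, though I am not sure it fits my case:)
"condition" : {
  "compare" : { "ctx.payload.hits.total" : { "gt" : 0 } }
}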
I am not exactly sure how to define or use the Watcher, or whether it would even work.
Can anyone let me know what the best course of action would be?
Thank you
EDIT:
For those interested, I found a way to poll the ES REST API using Search After. The difference is that Scroll takes a snapshot of the documents in ES, so any documents added later won't appear in that snapshot. In contrast, Search After is stateless: it uses unique sorting parameters (in my case timestamp/id), holds on to the last values fetched, and then queries for all documents that come after them. This way, any newly added documents come after the held timestamp and are picked up by the query.
Code:
public static void searchAfterElasticData()
        throws FileNotFoundException, IOException, InterruptedException {
    // create a search request for a given index
    SearchRequest search_request = new SearchRequest(elastic_index);
    SearchSourceBuilder source_builder =
            getSearchSourceBuilder("@timestamp", "_id", 100);
    search_request.source(source_builder);
    SearchResponse search_response = null;
    try {
        search_response = client.search(search_request, RequestOptions.DEFAULT);
    } catch (ElasticsearchException | ConnectException ex) {
        log.info("Error while querying Elastic API: {}", ex.toString());
    }
    if (search_response != null) {
        SearchHit[] search_hits = search_response.getHits().getHits();
        Object[] sort_values = null;
        while (search_hits != null) {
            if (search_hits.length > 0) {
                // if there are records retrieved, parse them
                for (SearchHit hit : search_hits) {
                    Map<String, Object> source_map = hit.getSourceAsMap();
                    try {
                        parse((String) source_map.get("message"));
                    } catch (Exception ex) {
                        log.error("Error while parsing: {}",
                                (String) source_map.get("message"));
                    }
                }
                // get the sorting values of the last record and use them for the next request
                log.info("Getting sorting values");
                sort_values = search_response.getHits()
                        .getAt(search_hits.length - 1).getSortValues();
            } else {
                log.info("Waiting 1 minute for new entries");
                Thread.sleep(60000);
            }
            // guard against the first response being empty (searchAfter needs non-null sort values)
            if (sort_values != null) {
                source_builder.searchAfter(sort_values);
            }
            search_request.source(source_builder);
            search_response =
                    client.search(search_request, RequestOptions.DEFAULT);
            search_hits = search_response.getHits().getHits();
            log.info("Fetched hits: {}", search_hits.length);
            log.info("Searching after for new hits");
        }
    }
}
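The getSearchSourceBuilder helper is not shown in the post; a minimal version consistent with how it is called above might look like this (a hypothetical reconstruction, assuming a match-all query sorted ascending on both fields):
// Hypothetical reconstruction of the helper used above.
private static SearchSourceBuilder getSearchSourceBuilder(String primarySort, String tieBreaker, int size) {
    return new SearchSourceBuilder()
            .query(QueryBuilders.matchAllQuery()) // fetch everything; the ordering does the work
            .sort(primarySort, SortOrder.ASC)     // e.g. "@timestamp"
            .sort(tieBreaker, SortOrder.ASC)      // e.g. "_id", to break timestamp ties
            .size(size);
}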
I would still like to know whether it is possible to do the same using a Watcher; also, if anyone has suggestions to make the code more elegant, please share.
Thank you

Spring data aggregation query elasticsearch

I am trying to make the below Elasticsearch query work with Spring Data. The intent is to return unique results for the field "serviceName", just like SELECT DISTINCT serviceName FROM table would in a SQL database.
{
  "aggregations": {
    "serviceNames": {
      "terms": {
        "field": "serviceName"
      }
    }
  },
  "size": 0
}
I configured the field as a keyword, which made the query work perfectly against the index_name/_search API, as per the response snippet below:
"aggregations": {
"serviceNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "service1",
"doc_count": 20
},
{
"key": "service2",
"doc_count": 8
},
{
"key": "service3",
"doc_count": 8
}
]
}
}
My problem is that the same query doesn't work in Spring Data: when I try to run it as a StringQuery I get the error below. I am guessing it uses a different API to run queries.
Cannot execute jest action , response code : 400 , error : {"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19}],"type":"parsing_exception","reason":"no [query] registered for [aggregations]","line":2,"col":19} , message : null
I have tried using the SearchQuery type to achieve the same result, no duplicates and no object loading, but I had no luck. The snippet below shows how I tried doing it.
final TermsAggregationBuilder aggregation = AggregationBuilders
.terms("serviceName")
.field("serviceName")
.size(1);
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withIndices("index_name")
.withQuery(matchAllQuery())
.addAggregation(aggregation)
.withSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.withSourceFilter(new FetchSourceFilter(new String[] {"serviceName"}, new String[] {""}))
.withPageable(PageRequest.of(0, 10000))
.build();
Does anyone know how to achieve a distinct aggregation on an object property, without loading the objects, in Spring Data?
I tried many things to print the queries Spring Data generates, without success, maybe because I am using the com.github.vanroy.springdata.jest.JestElasticsearchTemplate implementation.
I got the query parts with the below:
logger.info("query:" + searchQuery.getQuery());
logger.info("agregations:" + searchQuery.getAggregations());
logger.info("filter:" + searchQuery.getFilter());
logger.info("search type:" + searchQuery.getSearchType());
It prints:
query:{"match_all":{"boost":1.0}}
agregations:[{"serviceName":{"terms":{"field":"serviceName","size":1,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}}]
filter:null
search type:DFS_QUERY_THEN_FETCH
I figured it out; maybe this can help someone. The aggregation doesn't come with the query results, but in a result of its own, and it is not mapped to any object. The object results that do come back are apparently a sample from the query Elasticsearch ran for your aggregation (not sure, maybe).
I ended up creating a method that simulates a SQL SELECT DISTINCT your_column FROM your_table, but I think this will only work on keyword fields (they have a limit of 256 characters, if I am not wrong). I explained some lines in comments.
Thanks @Val, since I was only able to figure it out when I debugged into the Jest code and checked the generated request and raw response.
public List<String> getDistinctField(String fieldName) {
    List<String> result = new ArrayList<>();
    try {
        final String distinctAggregationName = "distinct_field"; // name the aggregation
        final TermsAggregationBuilder aggregation = AggregationBuilders
                .terms(distinctAggregationName)
                .field(fieldName)
                .size(10000); // limits the number of aggregation buckets; mine can be huge, adjust to your data
        SearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withIndices("your_index") // maybe can be omitted
                .addAggregation(aggregation)
                .withSourceFilter(new FetchSourceFilter(new String[] { fieldName }, new String[] { "" })) // retrieve only the field we are interested in; this can probably be left out
                .withPageable(PageRequest.of(0, 1)) // can't be zero, and I don't want to load 10 results every time it runs; it will always return one object, since I found no "size":0 in the query builder
                .build();
        // Had to use JestResultsExtractor because com.github.vanroy.springdata.jest.JestElasticsearchTemplate
        // doesn't have an implementation for ResultsExtractor; if you use the Spring defaults, you can probably use that instead.
        final JestResultsExtractor<SearchResult> extractor = new JestResultsExtractor<SearchResult>() {
            @Override
            public SearchResult extract(SearchResult searchResult) {
                return searchResult;
            }
        };
        final SearchResult searchResult = ((JestElasticsearchTemplate) elasticsearchOperations).query(searchQuery,
                extractor);
        final MetricAggregation aggregations = searchResult.getAggregations();
        final TermsAggregation termsAggregation = aggregations.getTermsAggregation(distinctAggregationName); // this is where your aggregation results are, in "buckets"
        result = termsAggregation.getBuckets().parallelStream().map(TermsAggregation.Entry::getKey)
                .collect(Collectors.toList());
    } catch (Exception e) {
        // handle your error here
        e.printStackTrace();
    }
    return result;
}
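For the aggregation discussed above, the method would then be called with something like:
List<String> serviceNames = getDistinctField("serviceName");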

How to upsert nested value using Spring-Data for Mongodb

I am trying to update the following JSON doc in MongoDB so that a new document is created if none matches the "altKey", but if a document does match the altKey, any matching "records" have their "domain" set and their "count" incremented. I have a requirement that the JSON structure not change and that Spring Data for MongoDB is used.
{
  "altKey": "value",
  "records": {
    "randomName1": {
      "domain": "domainValue",
      "count": 3
    },
    "randomName2": {
      "domain": "domainValue2",
      "count": 5
    },
    ...
    "randomNameN": {
      "domain": "domainValueN",
      "count": 4
    }
  }
}
The relevant portion of the class I have been attempting to do the update with is:
@Autowired
private MongoTemplate mongoTemplate;

@Override
public void increment(Doc doc) {
    Query query = new Query().addCriteria(Criteria.where("altKey").is(doc.getAltKey()));
    Update update = new Update();
    update.setOnInsert("altKey", doc.getAltKey());
    for (final Map.Entry<String, RecordData> entry :
            doc.getRecords().entrySet()) {
        String domainKey = format("records.{0}.domain", entry.getKey());
        String domainValue = entry.getValue().getDomain();
        update.set(domainKey, domainValue);
        String countKey = format("records.{0}.count", entry.getKey());
        Integer countValue = entry.getValue().getCount();
        update.inc(countKey, countValue);
    }
    mongoTemplate.upsert(query, update, Doc.class);
}
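For reference, for the first record of the example document the code above builds an update document roughly like this (a sketch; it is applied as an upsert against the { "altKey": "value" } query):
{
  "$setOnInsert": { "altKey": "value" },
  "$set": { "records.randomName1.domain": "domainValue" },
  "$inc": { "records.randomName1.count": 3 }
}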
When I call the increment method, the "altKey" field is successfully persisted, but none of the records persist, and I am not sure why. I believe the reason is my use of Mongo dot notation for the keys in the set and inc portions of the update (i.e. "records.randomNameN.domain" or "records.randomNameN.count"), but I haven't found an alternative way to configure the Update object when I don't know until run time what the name of a particular record will be.
Does anyone know how to set up the Update object to handle setting nested fields?

Get all videos of a YouTube Playlist in Java

Context: I'm working with Android Studio (Java). I want to obtain all the videos of a given playlist (or the first 50; I will fetch the rest afterwards).
Problem: I see people using a URL like
https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults=50&playlistId=PLiEwZfNgb4fVrRzTonlVEMj6DB2Nmzg2M&key=AIzaSyC2_YRcTE9916fsmA0_KRnef43GbLzz8m0
but I don't know how to implement this in Java. I followed some tutorials and ended up with a way to get completely different information, like:
YouTube.Search.List query;
query = youtube.search().list("id,snippet");
query.setKey(MY_API_KEY);
query.setMaxResults((long)20);
query.setType("video");
query.setFields("items(id/videoId,snippet/title,snippet/description,snippet/thumbnails/default/url)");
And I really don't understand how to get something other than a search.
Most of the documentation is in English only...
EDIT
OK, so I kept trying; I think I am close to a solution, but I get an error.
private YouTube youtube;
private YouTube.PlaylistItems.List playlistItemRequest;
private String PLAYLIST_ID = "PLiEwZfNgb4fVrRzTonlVEMj6DB2Nmzg2M";
public static final String KEY = "AIzaSyC2_YRcTE9916fsmA0_KRnef43GbLzz8m0";
// Constructor
public YoutubeConnector(Context context)
{
youtube = new YouTube.Builder(new NetHttpTransport(),
new JacksonFactory(), new HttpRequestInitializer()
{
@Override
public void initialize(HttpRequest hr) throws IOException {}
}).setApplicationName(context.getString(R.string.app_name)).build();
}
public List<VideoItem> result()
{
List<PlaylistItem> playlistItemList = new ArrayList<PlaylistItem>();
try
{
/* HERE MUST BE MY PROBLEM ! */
playlistItemRequest = youtube.playlistItems().list("snippet");
playlistItemRequest.setPlaylistId(PLAYLIST_ID);
playlistItemRequest.setFields("items(id/videoId,snippet/title,snippet/description,snippet/thumbnails/default/url),nextPageToken,pageInfo");
playlistItemRequest.setKey(KEY);
String nextToken = "";
do {
playlistItemRequest.setPageToken(nextToken);
PlaylistItemListResponse playlistItemResult = playlistItemRequest.execute();
playlistItemList.addAll(playlistItemResult.getItems());
nextToken = playlistItemResult.getNextPageToken();
} while (nextToken != null);
}catch(IOException e)
{
Log.d("YC", "Could not initialize: "+e);
}
//[...]
}
Here is the error I got:
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"location" : "fields",
"locationType" : "parameter",
"message" : "Invalid field selection videoId",
"reason" : "invalidParameter"
} ],
"message" : "Invalid field selection videoId"
}
EDIT 2: Thanks to Martijn Woudstra.
The correct lines were:
playlistItemRequest = youtube.playlistItems().list("snippet,contentDetails");
//[...]
playlistItemRequest.setFields("items(snippet/title,snippet/description,snippet/thumbnails/default/url,contentDetails/videoId),nextPageToken,pageInfo");
//[...]
videoItem.setId(item.getContentDetails().getVideoId());
I know this is an old question, but it is important to identify which resources we are using in order to understand how to get the proper information. There are many resources in the YouTube API v3, but we usually use search, video, playlist and playlistItems.
According to the documentation, the following JSON structure shows the format of a playlistItems resource:
{
  "kind": "youtube#playlistItem",
  "etag": etag,
  "id": string,
  "snippet": {
    "publishedAt": datetime,
    "channelId": string,
    "title": string,
    "description": string,
    "thumbnails": {
      (key): {
        "url": string,
        "width": unsigned integer,
        "height": unsigned integer
      }
    },
    "channelTitle": string,
    "playlistId": string,
    "position": unsigned integer,
    "resourceId": {
      "kind": string,
      "videoId": string
    }
  },
  "contentDetails": {
    "videoId": string,
    "startAt": string,
    "endAt": string,
    "note": string,
    "videoPublishedAt": datetime
  },
  "status": {
    "privacyStatus": string
  }
}
From this structure, we may suppose that there are three ways to get the videoId. But first it is important to know how we are going to define the PARTS and the FIELDS of the resource.
To define the PARTS we use this code:
YouTube.PlaylistItems.List list = youtube.playlistItems().list("snippet");
In the previous line, "snippet" identifies a property that contains numerous fields (or child properties), including the title, description, position, and resourceId, so when we set "snippet" the API's response will contain all of those child properties.
Now, we can also limit those properties if we define the FIELDS. For example, in this code:
list.setFields("items(id/videoId,snippet/title,snippet/description," +
"snippet/thumbnails/default/url)");
If we call list.execute(), it will show an error because we didn't define id in the PARTS. Also, according to the JSON structure, id is a string and does not contain videoId as a child property. Ah, but can we extract videoId from the resourceId? Well, the answer is yes and no. Why so? Come on, Teo, the JSON structure shows it clearly. Yes, I can see that, but the documentation says:
If the snippet.resourceId.kind property's value is youtube#video, then this property will be present and its value will contain the ID that YouTube uses to uniquely identify the video in the playlist.
This means that it may sometimes not be available. Then how can we get the videoId? Well, we can add id or contentDetails to the PARTS. If we add id, then define the fields like this:
YouTube.PlaylistItems.List list = youtube.playlistItems().list("id,snippet");
list.setFields("items(id,snippet/title,snippet/description," +
"snippet/thumbnails/default/url)");
If we add contentDetails, then define the fields like this:
YouTube.PlaylistItems.List list = youtube.playlistItems()
.list("snippet,contentDetails");
list.setFields("items(contentDetails/videoId,snippet/title,snippet/description," +
"snippet/thumbnails/default/url)");
I hope this helps you guys.
id/videoId doesn't exist.
There is an id and a snippet/resourceId/videoId.
So my guess is your setFields aren't right.

Plain string template query for elasticsearch through java API?

I have a template foo.mustache saved in {{ES_HOME}}/config/scripts.
POST to http://localhost:9200/forward/_search/template with the following message body returns a valid response:
{
  "template": {
    "file": "foo"
  },
  "params": {
    "q": "a",
    "hasfilters": false
  }
}
I want to translate this to the Java API now that I've validated that all the different components work. The documentation here describes how to do it in Java:
SearchResponse sr = client.prepareSearch("forward")
.setTemplateName("foo")
.setTemplateType(ScriptService.ScriptType.FILE)
.setTemplateParams(template_params)
.get();
However, I would instead like to just send a plain string query (i.e. the contents of the message body above) rather than build up the request in Java. Is there a way to do this? I know that with normal queries I can construct it like so:
SearchRequestBuilder response = client.prepareSearch("forward")
.setQuery("""JSON_QUERY_HERE""")
I believe the setQuery() method wraps the contents in a query object, which is not what I want for my template query. If this is not possible, I will just have to go with the documented way and convert my JSON params to a Map<String, Object>.
I ended up just translating my template_params to a Map<String, Object>, as the documentation requires. I used Groovy's JsonSlurper to convert the text to an object with a pretty simple method.
import groovy.json.JsonSlurper

public static Map<String, Object> convertJsonToTemplateParam(String s) {
    Object result = new JsonSlurper().parseText(s);
    // Manipulate your result if you need to do any additional work here,
    // e.g. programmatically determine the value of hasfilters if filters != null.
    return (Map<String, Object>) result;
}
And you could pass in the following as a string to this method:
{
  "q": "a",
  "hasfilters": true,
  "filters": [
    {
      "filter_name": "foo.untouched",
      "filters": [ "FOO", "BAR" ]
    },
    {
      "filter_name": "hello.untouched",
      "list": [ "WORLD" ]
    }
  ]
}
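Putting the two together, the call would then look something like this (a sketch; jsonParams is the string above, and the prepareSearch/setTemplate* calls are the ones quoted from the documentation earlier):
// jsonParams holds the JSON parameters shown above.
Map<String, Object> templateParams = convertJsonToTemplateParam(jsonParams);
SearchResponse sr = client.prepareSearch("forward")
        .setTemplateName("foo")
        .setTemplateType(ScriptService.ScriptType.FILE)
        .setTemplateParams(templateParams)
        .get();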
