Elasticsearch Java API addMapping() and setSettings() usage - java

Problem: How to create an index from a json file using
The json file contains a definition for the index de_brochures. It also defines an analyzer de_analyzerwith custom filters that are used by the respective index.
As the json works with curl and Sense I assume I have to adapt the syntax of it to work with the java API.
I don't want to use XContentFactory.jsonBuilder() as the json comes from a file!
I have the following json file to create my mapping from and to set settings:
Using Sense with PUT /indexname it does create an index from this.
{
"mappings": {
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
"settings": {
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
}
As the above did not work with addMapping() alone I tried to split it into two seperate files (I realized that I had to remove the "mappings": and "settings": part):
------ Mapping json ------
{
"de_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "de_analyzer"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
------- Settings json --------
{
"analysis": {
"filter": {
"de_stopwords": {
"type": "stop",
"stopwords": "_german_"
},
"de_stemmer": {
"type": "stemmer",
"name": "light_german"
}
},
"analyzer": {
"de_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"de_stopwords",
"de_stemmer"
]
}
}
}
}
This is my java code to load and add/set the json.
CreateIndexRequestBuilder createIndexRequestBuilder = client.admin().indices().prepareCreate(index);
// CREATE SETTINGS
String settings_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.setSettings(settings_json);
// CREATE MAPPING
String mapping_json = new String(Files.readAllBytes(brochures_mapping_path));
createIndexRequestBuilder.addMapping("de_brochures", mapping_json);
CreateIndexResponse indexResponse = createIndexRequestBuilder.execute().actionGet();
There is no more complaint about the mapping file's structure but it now fails with the error:
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Analyzer [de_analyzer] not found for field [text]

Solution:
I managed to do it with my original json file using createIndexRequestBuilder.setSource(settings_json);

I think the problem is with structure of your mapping file.
Here is a sample example.
mapping.json
{
"en_brochures": {
"properties": {
"text": {
"type": "string",
"store": true,
"index_analyzer": "en_analyzer",
"term_vector": "yes"
},
"classification": {
"type": "string",
"index": "not_analyzed"
},
"language": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
String mapping = new String(Files.readAllBytes(Paths.get("mapping.json")));
createIndexRequestBuilder.addMapping('en_brochures', mapping);
CreateIndexResponse indexResponse =createIndexRequestBuilder.execute().actionGet();
This works in mine, you can try.

Related

Spring Data for ElasticSearch save method increments count (incorrect) even for update

I'm having an issue in which the count keeps incrementing by 1 whenever I do an update to a document using save and the count's supposed to remain the same. If I create a document with save then the count is incremented by 2. Am I setting something wrong?
This is my settings for the ElasticSearch index:
{
"aliases": {
"case": {}
},
"mappings": {
"_doc": {
"dynamic": false,
"properties": {
"created": {
"index": true,
"type": "date"
},
"modified": {
"index": true,
"type": "date"
},
"type": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"states": {
"type": "nested",
"properties": {
"from": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"to": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
},
"event": {
"index": true,
"type": "keyword", "normalizer": "lower_case_normalizer"
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
},
"analysis": {
"normalizer": {
"lower_case_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
}
}
This is how I insert a document into ES:
public Case createCase(final Case case) throws UnableToGenerateUUIDException {
final UUID caseId = uuidService.getNowTimeUUID();
final Instant now = Instant.now();
case.setCreated(now);
case.setModified(now);
case.setId(caseId.toString());
return caseRepository.save(case);
}
This is the bug from the Chrome extension I used, not from Spring Data. I apologize for the mistake. I have verified that the count() from Spring Data reflects the correct number.

How programatically convert JSON to JSON schema to compare structure of two json object [duplicate]

I want convert json document into json schema. I googled it but not got the exact idea according to my requirement.
here is JSON
{
"empId":1001,
"firstName":"jonh",
"lastName":"Springer",
"title": "Engineer",
"address": {
"city": "Mumbai",
"street": "FadkeStreet",
"zipCode":"420125",
"privatePhoneNo":{
"privateMobile": "2564875421",
"privateLandLine":"251201546"
}
},
"salary": 150000,
"department":{
"departmentId": 10521,
"departmentName": "IT",
"companyPhoneNo":{
"cMobile": "8655340546",
"cLandLine": "10251215465"
},
"location":{
"name": "mulund",
"locationId": 14500
}
}
}
I want to generate like this
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "Employee",
"properties": {
"empId": {
"type": "integer"
},
"firstName":{
"type":"string"
},
"lastName": {
"type": "string"
},
"title": {
"type": "string"
},
"address": {
"type": "object",
"properties": {
"city": {
"type": "string"
},
"street": {
"type": "string"
},
"zipCode": {
"type": "string"
},
"privatePhoneNo": {
"type": "object",
"properties": {
"privateMobile": {
"type": "string"
},
"privateLandLine": {
"type": "string"
}
}
}
}
},
"salary": {
"type": "number"
},
"department": {
"type": "object",
"properties": {
"departmentId": {
"type": "integer"
},
"departmentName": {
"type": "string"
},
"companyPhoneNo": {
"type": "object",
"properties": {
"cMobile": {
"type": "string"
},
"cLandLine": {
"type": "string"
}
}
},
"location": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"locationId": {
"type": "integer"
}
}
}
}
}
}
}
Is there any library is doing like this or what is another way?
https://github.com/perenecabuto/json_schema_generator
http://jsonschema.net/#/
I'm think this maybe will help
It's been a while since this was asked but I was having the same issue. So far the best solution I have come across is this library:
https://github.com/saasquatch/json-schema-inferrer
I found this from the json-schema doc itself. It has links to implementations for other languages as well:
https://json-schema.org/implementations.html#from-data

Elasticsearch Query String Query not working with synonym analyzer

I am trying to configure elastic search with synonyms.
These are my settings:
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
}
}
Mappings config:
"category": {
"properties": {
"name": {
"type":"string",
"search_analyzer" : "category_synonym",
"index_analyzer" : "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And the list of my synonyms
film => video,
ooh => panels , poster,
commercial => advertisement,
print => magazine
I must say that I am using Elasticsearch Java API.
I am using QueryBuilders.queryStringQuery because this is the only way how I set analyzers to my request.
So, when I am making:
QueryBuilders.queryStringQuery("name:film").analyzer(analyzer)
It returns me
[
{
"id": 71,
"name": "Pitch video",
"description": "... ",
"parent": null
},
{
"id": 25,
"name": "Video",
"description": "... ",
"parent": null
}
]
That is perfect for me, but when I am calling something like this
QueryBuilders.queryStringQuery("name:vid").analyzer(analyzer)
I expect that it should return same objects, but there is nothing: []
So, I added asterisk to queryStringQuery:
QueryBuilders.queryStringQuery("name:vid*").analyzer(analyzer)
Works well, but now
QueryBuilders.queryStringQuery("name:film*").analyzer(analyzer)
returns me []
So, how can I configure my elastic search that it will return same objects when I am searching video, vid, film and fil?
Thanks in advance!
Hm, I don't think Elasticsearch will know to "translate" fil into vid :-). So, I think you need edgeNGrams for this, both at indexing and search time.
PUT test
{
"settings": {
"analysis": {
"analyzer": {
"category_synonym": {
"tokenizer": "whitespace",
"filter": [
"synonym_filter",
"my_edgeNGram_filter"
]
},
"standard_edgeNGram": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_filter",
"my_edgeNGram_filter"
]
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms_path": "synonyms.txt"
},
"my_edgeNGram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 8
}
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "string",
"analyzer": "category_synonym",
"index_analyzer": "standard_edgeNGram",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST test/test/1
{"name": "Pitch video"}
POST test/test/2
{"name": "Video"}
GET /test/test/_search
{
"query": {
"query_string": {
"query": "name:fil"
}
}
}

How to have a mapping to not_analyze some fields for all indexes and types by default

I understand how to build a mapping for any index and type. But, I want to have a field my_field_1 and my_field_2 which will not be analyzed for all the indexes and types that will be created in future.
PUT /address_index
{
"mappings":{
"address":{
"properties":{
"state":{
"type":"string",
"fields":{
"raw":{
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
}
I also saw in one of the links on how to do for all the strings. But, I am unable to add it for just the fields mentioned above.
I will be implementing this in Java. However, just DSL JSON would be good headstart.
You can do this by creating an index template with a pattern of "*" meaning it will apply to all indices you create in the future and defining this mapping in there.
PUT 127.0.0.1:9200/_template/stacktest
{
"template": "*",
"settings": {
"number_of_shards": 1
},
"mappings": {
"address": {
"properties": {
"state": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Now you can create an index with any name and this mapping will apply to it.
PUT 127.0.0.1:9200/testindex/
GET 127.0.0.1:9200/testindex/_mapping
{
"testindex": {
"mappings": {
"address": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
}
Note that the index: not_analyzed part was transformed into the keyword datatype, as string has been deprecated. You should use text and keyword if you are on version 5.x.
Edit to address your comments
To adapt this to the specific two fields mentioned by you, the following request would create the template:
{
"template": "*",
"settings": {
"number_of_shards": 1
},
"mappings": {
"_default_": {
"properties": {
"my_field_1": {
"type": "string",
"index": "not_analyzed"
},
"my_field_2": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
If you now index a document into a new index, those two fields will be not analyzed for any type of document, any other string fields will be analyzed, which is the way I understood your original question.
PUT 127.0.0.1:9200/testindex/address
{
"my_field_1": "this is not_analyzed",
"my_field_2": "this is not_analyzed either",
"other_field": "this however is analyzed"
}
PUT 127.0.0.1:9200/testindex/differenttype
{
"my_field_1": "this is not_analyzed",
"my_field_2": "this is not_analyzed either",
"other_field": "this however is analyzed"
}
Now check the mapping and notice the difference:
{
"testindex": {
"mappings": {
"differenttype": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
},
"other_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"address": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
},
"other_field": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"_default_": {
"properties": {
"my_field_1": {
"type": "keyword"
},
"my_field_2": {
"type": "keyword"
}
}
}
}
}
}

Elasticsearch : How to retrieve only the desired nested objects

I have the below mapping structure for my Elasticsearch index.
{
"users": {
"mappings": {
"user-type": {
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"details": {
"type": "nested",
"properties": {
"lastModifiedBy": {
"type": "string"
},
"lastModifiedDate": {
"type": "date",
"format": "dateOptionalTime"
},
"views": {
"type": "nested",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"properties": {
"properties": {
"name": {
"type": "string"
},
"type": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
}
}
}
}
}
}
Basically I want to retrieve ONLY the view object inside details based on index id & view id(details.views.id).
I have tried with the below java code.But seems to be not working.
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(QueryBuilders.termQuery("_id", sid))
.setPostFilter(FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id)));
Below is the query structure for this java code.
{
"query": {
"term": {
"_id": "123"
}
},
"post_filter": {
"nested": {
"filter": {
"term": {
"details.views.id": "def"
}
},
"path": "details.views"
}
}
}
Since details is nested and view is nested inside details, you basically need two nested filters as well (one for each level) + the constraint on the _id field is best done with the ids query. The query DSL would look like this:
{
"query": {
"ids": {
"values": [
"123"
]
}
},
"post_filter": {
"nested": {
"filter": {
"nested": {
"path": "details.view",
"filter": {
"term": {
"details.views.id": "def"
}
}
}
},
"path": "details"
}
}
}
Translating this into Java code yields:
// 2nd-level nested filter
FilterBuilder detailsView = FilterBuilders.nestedFilter("details.views",
FilterBuilders.termFilter("details.views.id", id));
// 1st-level nested filter
FilterBuilder details = FilterBuilders.nestedFilter("details", detailsView);
// ids constraint
IdsQueryBuilder ids = QueryBuilders.idsQuery(this.type).addIds("123");
SearchRequestBuilder srq = this.client.prepareSearch(this.indexName)
.setTypes(this.type)
.setQuery(ids)
.setPostFilter(details);
PS: I second what #Paul said, i.e. always play around with the query DSL first and when you know you have zeroed in on the exact query you need, then you can translate it to the Java form.

Categories

Resources