ElasticSearch Rest High Level Client remapping wrong - java

I'm trying to create a class that writes automatically to Elasticsearch through the Rest High Level Client. The operations (create, createBatch, remove, removeBatch, update, updateBatch) all work, and my test cases all succeed. To add a bit more flexibility, I wanted to implement the following methods: find, findAll, getFirsts(n), getLasts(n). find(key) and findAll() both work perfectly fine, but getFirsts(n) and getLasts(n) don't at all.
Here is the context:
Before each test case -> Ensure that index "test" exists and create it if it doesn't
After each test case -> Delete index "test"
For getFirsts(n) and getLasts(n), I call create to put a few items into Elasticsearch and then search, sorted by the uniqueKey.
Here is the mapping for my Test Object:
{
  "properties": {
    "date":      { "type": "long" },
    "name":      { "type": "text" },
    "age":       { "type": "integer" },
    "uniqueKey": { "type": "keyword" }
  }
}
Here is my test case:
@Test
public void testGetFirstByIds() throws BeanPersistenceException {
    List<StringTestDataBean> beans = new ArrayList<>();

    StringTestDataBean bean1 = new StringTestDataBean();
    bean1.setName("Tester");
    bean1.setAge(22);
    bean1.setTimeStamp(23213987321712L);
    beans.add(elasticSearchService.create(bean1));

    StringTestDataBean bean2 = new StringTestDataBean();
    bean2.setName("Antonio");
    bean2.setAge(27);
    bean2.setTimeStamp(2332321117321712L);
    beans.add(elasticSearchService.create(bean2));

    Assert.assertNotNull("The beans created should not be null", beans);
    Assert.assertEquals("The uniqueKeys of the fetched list should match the existing",
        beans.stream()
            .map(ElasticSearchBean::getUniqueKey)
            .sorted((b1, b2) -> Long.compare(Long.parseLong(b2), Long.parseLong(b1)))
            .collect(Collectors.toList()),
        elasticSearchService.getFirstByIds(2).stream()
            .map(ElasticSearchBean::getUniqueKey)
            .collect(Collectors.toList())
    );
}
Here is getFirstByIds(n):
@Override
public Collection<B> getFirstByIds(int entityCount) throws BeanPersistenceException {
    assertBinding();
    FilterContext filterContext = new FilterContext();
    filterContext.setLimit(entityCount);
    filterContext.setSort(Collections.singletonList(new FieldSort("uniqueKey", true)));
    return Optional.ofNullable(find(filterContext)).orElseThrow();
}
Here is find(filterContext):
@Override
public List<B> find(FilterContext filter) throws BeanPersistenceException {
    assertBinding();
    BoolQueryBuilder query = QueryBuilders.boolQuery();
    List<FieldFilter> fields = filter.getFields();
    StreamUtil.ofNullable(fields)
        .forEach(fieldFilter -> executeFindSwitchCase(fieldFilter, query));

    SearchSourceBuilder builder = new SearchSourceBuilder().query(query);
    builder.from((int) filter.getFrom());
    builder.size(((int) filter.getLimit() == -1) ? FILTER_LIMIT : (int) filter.getLimit());

    SearchRequest request = new SearchRequest();
    request.indices(index);
    request.source(builder);

    List<FieldSort> sorts = filter.getSort();
    StreamUtil.ofNullable(sorts)
        .forEach(fieldSort -> builder.sort(SortBuilders.fieldSort(fieldSort.getField()).order(
            fieldSort.isAscending() ? SortOrder.ASC : SortOrder.DESC)));

    try {
        if (strict)
            client.indices().refresh(new RefreshRequest(index), RequestOptions.DEFAULT);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        SearchHits hits = response.getHits();
        List<B> results = new ArrayList<>();
        for (SearchHit hit : hits)
            results.add(objectMapper.readValue(hit.getSourceAsString(), clazz));
        return results;
    } catch (IOException e) {
        logger.error(e.getMessage(), e);
    }
    return null;
}
The issue happens if I run the test case more than once. The first time, the test passes fine, but as soon as we reach the second run, I get an exception:
ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]];
After looking around for over a day, I've realized that the mapping gets changed from the original one (the mapping specified at the beginning) and is automatically recreated like this:
"test": {
"aliases": {},
"mappings": {
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"timeStamp": {
"type": "long"
},
"uniqueKey": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
As you can see, the mapping changes automatically, and sorting on the dynamically mapped uniqueKey text field then throws the error.
Thanks for any help!

Elasticsearch creates a dynamic mapping only when no mapping exists for a field at the time documents are inserted. Check whether the put-mapping call actually happens before documents are added to the index. If the mappings are applied statically, make sure the documents are inserted into the right index.
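For example, a minimal sketch of that order of operations (assuming the 7.x RestHighLevelClient used in the question; the index name "test" and the mapping come from the question, the rest is illustrative):
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;

// Before any document is indexed: create "test" with the explicit mapping,
// so uniqueKey becomes a keyword and never falls back to the dynamic
// text-with-keyword-subfield mapping.
if (!client.indices().exists(new GetIndexRequest("test"), RequestOptions.DEFAULT)) {
    CreateIndexRequest createRequest = new CreateIndexRequest("test");
    createRequest.mapping(
        "{ \"properties\": { "
            + "\"date\": { \"type\": \"long\" }, "
            + "\"name\": { \"type\": \"text\" }, "
            + "\"age\": { \"type\": \"integer\" }, "
            + "\"uniqueKey\": { \"type\": \"keyword\" } } }",
        XContentType.JSON);
    client.indices().create(createRequest, RequestOptions.DEFAULT);
}
// Only after this should the tests call elasticSearchService.create(...).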

Related

Is there a way to get the required field array in a draft-07 JSON schema through Java code

I want to generate a JSON schema of draft-04 or draft-07 with a required array for the mandatory fields.
I am new to JSON Schema, but was able to generate a draft-07 schema with victools.
Sharing the code for the same:
SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_7, OptionPreset.PLAIN_JSON);
SchemaGeneratorConfig config = configBuilder.build();
SchemaGenerator generator = new SchemaGenerator(config);
JsonNode jsonSchema = generator.generateSchema(MyClassName.class);
System.out.println(jsonSchema.toString());
The output I got is a draft-07 JSON schema like this:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "ActiveOrHistoricCurrencyAndAmount": {
      "type": "object",
      "properties": {
        "ccy": { "type": "string" },
        "value": { "type": "number" }
      }
    }
  }
}
What I want is:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "ActiveOrHistoricCurrencyAndAmount": {
      "type": "object",
      "properties": {
        "ccy": { "type": "string" },
        "value": { "type": "number" }
      },
      "required": ["ccy", "value"]
    }
  }
}
I also want the required array for the mandatory fields, so how do I generate this using Java?
The victools generator 4.17.0 currently does not support the required attribute of the Jackson annotation @JsonProperty(required = true) out of the box. It is scheduled for the next version, where it will probably be enabled via the option JacksonOption.RESPECT_JSONPROPERTY_REQUIRED.
See the victools changelog.
Until then, you can customize the SchemaGeneratorConfigBuilder to support this attribute like this:
SchemaGeneratorConfigBuilder configBuilder = new SchemaGeneratorConfigBuilder(SchemaVersion.DRAFT_7, OptionPreset.PLAIN_JSON);
configBuilder.forFields()
    .withRequiredCheck(field -> {
        JsonProperty jsonProperty = field.getAnnotationConsideringFieldAndGetter(JsonProperty.class);
        if (jsonProperty == null)
            return false; // no @JsonProperty? => field not required
        else
            return jsonProperty.required(); // let's respect what 'required' says
    });
SchemaGeneratorConfig config = configBuilder.build();
SchemaGenerator generator = new SchemaGenerator(config);
JsonNode jsonSchema = generator.generateSchema(MyClassName.class);
(disclaimer: I recently submitted the PR for this feature)
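For illustration, with the check above in place, a field annotated like this (a hypothetical class matching the schema shown earlier) ends up in the generated required array:
import java.math.BigDecimal;
import com.fasterxml.jackson.annotation.JsonProperty;

public class ActiveOrHistoricCurrencyAndAmount {

    @JsonProperty(required = true) // withRequiredCheck returns true -> listed in "required"
    private String ccy;

    @JsonProperty // required defaults to false, so "value" stays optional
    private BigDecimal value;
}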

Best practice to search ingest-attachment from documents (2k+ documents with ingest-attachment)

I am fetching the indexed documents from Elasticsearch using the Java API, but I am getting null as a response when the index has a larger number of documents (2k+).
If the index has fewer documents (under 500 or so), the Java API code below works properly.
A larger number of documents in the index creates the issue (is that something like a performance issue while fetching?).
I used the ingest-attachment processor plugin for attachments; I attach PDFs to my documents.
But if I search with the same query using Kibana with a curl script, I get a response and am able to see the results in Kibana.
Please find my Java code below:
private final static String ATTACHMENT = "document_attachment";
private final static String TYPE = "doc";

public static void main(String args[]) {
    RestHighLevelClient restHighLevelClient = null;
    try {
        restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http"),
                new HttpHost("localhost", 9201, "http")));
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    SearchRequest contentSearchRequest = new SearchRequest(ATTACHMENT);
    SearchSourceBuilder contentSearchSourceBuilder = new SearchSourceBuilder();
    contentSearchRequest.types(TYPE);
    QueryStringQueryBuilder attachmentQB = new QueryStringQueryBuilder("Activa");
    attachmentQB.defaultField("attachment.content");
    contentSearchSourceBuilder.query(attachmentQB);
    contentSearchSourceBuilder.size(50);
    contentSearchRequest.source(contentSearchSourceBuilder);

    SearchResponse contentSearchResponse = null;
    try {
        contentSearchResponse = restHighLevelClient.search(contentSearchRequest); // returning null response
    } catch (IOException e) {
        e.getLocalizedMessage();
    }

    System.out.println("Request --->" + contentSearchRequest.toString());
    System.out.println("Response --->" + contentSearchResponse.toString());
    SearchHit[] contentSearchHits = contentSearchResponse.getHits().getHits();
    long contenttotalHits = contentSearchResponse.getHits().totalHits;
    System.out.println("condition Total Hits --->" + contenttotalHits);
}
Please find the script that I am using in Kibana below; I am getting a response for this script:
GET document_attachment/_search?pretty
{
  "query": {
    "match": { "attachment.content": "Activa" }
  }
}
Please find the search request from the Java API below:
SearchRequest{searchType=QUERY_THEN_FETCH, indices=[document_attachment], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=[doc], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={"size":50,"query":{"match":{"attachment.content":{"query":"Activa","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}}}}
Please find my mapping details:
{
  "document_attachment": {
    "mappings": {
      "doc": {
        "properties": {
          "app_language": {
            "type": "text"
          },
          "attachment": {
            "properties": {
              "author": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "analyzer": "custom_analyzer"
              },
              "content_length": {
                "type": "long"
              },
              "content_type": {
                "type": "text"
              },
              "date": {
                "type": "date"
              },
              "language": {
                "type": "text"
              },
              "title": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "catalog_description": {
            "type": "text"
          },
          "fileContent": {
            "type": "text"
          }
        }
      }
    }
  }
}
Please find my ingest pipeline details:
PUT _ingest/pipeline/document_attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "fileContent"
      }
    }
  ]
}
I am getting this error only when I try to search on attachment.content; if I search on some other field, I am able to get results.
I am using Elasticsearch version 6.2.3.
Please find the error below.
org.apache.http.ContentTooLongException: entity content is too long [105539255] for the configured buffer limit [104857600]
at org.elasticsearch.client.HeapBufferedAsyncResponseConsumer.onEntityEnclosed(HeapBufferedAsyncResponseConsumer.java:76)
at org.apache.http.nio.protocol.AbstractAsyncResponseConsumer.responseReceived(AbstractAsyncResponseConsumer.java:131)
at org.apache.http.impl.nio.client.MainClientExec.responseReceived(MainClientExec.java:315)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseReceived(DefaultClientExchangeHandlerImpl.java:147)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:303)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
at com.es.utility.DocumentSearch.main(DocumentSearch.java:88)

Altering dynamic mapping Elasticsearch 5.3

Many of the string fields in my application need to be mapped dynamically in Elasticsearch 5.3. All new fields that end in id or ids should be mapped and indexed automatically by Elasticsearch as such:
"_my_propertyId":
{
"type": "keyword"
}
I defined a dynamic template for the index/type like this:
"mappings": {
"my_type": {
"dynamic_templates": [
{
"id_as_keywords": {
"match": "*id|*Id|*Ids",
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
}
]
Yet, Elasticsearch still creates the properties like this:
"_someIds": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I'm not sure what I'm doing wrong or why this is now the default mapping for dynamic string fields. However, I need to be able to dynamically map all properties that end in id or ids as keywords, without ignore_above and fully indexed, so I can search for them using the search API. Ideas? Why is this the default string mapping now (I understand the introduction of keyword/text, but still)?
Update
Found a good article on these default settings:
Strings
You can use the match_pattern parameter to get more control over the match parameter. Find the updated dynamic template below:
"dynamic_templates": [
{
"id_as_keywords": {
"match_mapping_type": "string",
"match_pattern": "regex",
"match": ".*(id|Id|Ids)",
"mapping": {
"type": "keyword"
}
}
}
]
You can read more about match_pattern here.
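A quick way to verify the template (a sketch in the Kibana console, using a hypothetical my_index): index one document with a matching field name and inspect the resulting mapping.
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "id_as_keywords": {
            "match_mapping_type": "string",
            "match_pattern": "regex",
            "match": ".*(id|Id|Ids)",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{ "_someIds": "42" }

GET my_index/_mapping
With match_pattern set to regex, the match value is evaluated as a regular expression against the whole field name, so "_someIds" should now come back as a plain keyword instead of text with a keyword sub-field. The original "*id|*Id|*Ids" failed because the default match syntax is a simple wildcard and does not understand the | alternation.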

Deserializing json that is false or object with Retrofit GsonConverterFactory

I am working with a server that returns JSON. One of the elements is either an object or false when it does not exist. I know this is a very poor implementation of a server response, and there are quite a few such cases, but this is what I have to work with. How can I deal with this situation? If there is an object, I successfully deserialize it; if there is none, I get an error: EXPECTED OBJECT FOUND BOOLEAN.
Even worse, I do not know where I am going to meet such situations in the future on this project.
This is the sample json:
{
  "course": {
    "id": "47902",
    "course": "3844",
    "group": "1825",
    "teacher": "59502",
    "table": "1447",
    "client": "1",
    "course_id": "3844",
    "description": ""
  },
  "teacher": {
    "id": "59502",
    "post": "0",
    "experience": "",
    "dep_experience": "",
    "rank": "0",
    "online": "1458891283",
    "departments": [
      null
    ]
  },
  "depart": {
    "id": "100",
    "postcode": "",
    "public": "1",
    "alias": "",
    "faculty": "97",
    "client": "1"
  },
  "progress": false,
  "files": [
    {
      "teacher": {
        "id": "59502",
        "code": "53bd7c21ad05b03e",
        "photo": "1"
      },
      "files": [
        {
          "id": "0ffe41e5003ee5c0",
          "owner": "59502",
          "address": "0ffe41e5003ee5c0",
          "type": "f",
          "size": "0",
          "time": "2015-07-10 14:39:15",
          "data": ""
        }
      ]
    }
  ]
}
As you can see, progress is false here. Other times it is an ordinary object like depart. Deserialization is done by Retrofit 2.
Thanks a lot.
I'm assuming you have a top-level mapping similar to the following one and have configured your Retrofit instance for Gson:
final class Response {

    @SerializedName("progress")
    @JsonAdapter(FalseAsNullTypeAdapterFactory.class)
    final Progress progress = null;
}

final class Progress {

    final String foo = null;
}
Note that the progress property is annotated with the @JsonAdapter annotation: we're assuming this is the only place where the progress property can be a boolean (if you have many places like this one, you can either annotate each field with this annotation, or use .registerTypeAdapter() via GsonBuilder; in the case of .registerTypeAdapterFactory(), the factory must check against known types in order not to "intercept" all types).
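For that .registerTypeAdapterFactory() case, the type check inside create() could look like this (a sketch; buildFalseAsNullAdapter is a hypothetical helper standing in for the adapter-building code in the full factory below):
@Override
public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
    // Returning null tells Gson this factory cannot handle the type,
    // so Gson keeps looking for another adapter instead of intercepting everything.
    if ( !Progress.class.isAssignableFrom(typeToken.getRawType()) ) {
        return null;
    }
    return buildFalseAsNullAdapter(gson, typeToken); // hypothetical helper
}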
Now, here is a type adapter factory to deal with your issue:
final class FalseAsNullTypeAdapterFactory
        implements TypeAdapterFactory {

    // Let Gson instantiate it itself
    private FalseAsNullTypeAdapterFactory() {
    }

    @Override
    public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> typeToken) {
        // Get a downstream parser (for simplicity: get the default parser for the given type)
        final TypeAdapter<T> delegateTypeAdapter = gson.getDelegateAdapter(this, typeToken);
        return new TypeAdapter<T>() {
            @Override
            public void write(final JsonWriter out, final T value) {
                throw new UnsupportedOperationException();
            }

            @Override
            public T read(final JsonReader in)
                    throws IOException {
                // Peek whether the next JSON token is a boolean
                if ( in.peek() == JsonToken.BOOLEAN ) {
                    // Consume this JSON token as a boolean value
                    // Is it true?
                    if ( in.nextBoolean() ) {
                        // Then it's not something we can handle -- probably a boolean field annotated with @JsonAdapter(FalseAsNullTypeAdapterFactory.class)?
                        throw new MalformedJsonException("Unexpected boolean marker: true");
                    }
                    // We're assuming it's null
                    return null;
                }
                // If it's not a boolean value, then we just delegate parsing to the original type adapter
                return delegateTypeAdapter.read(in);
            }
        };
    }
}
Now just test it:
try ( final Reader reader = getPackageResourceReader(Q43231983.class, "success.json") ) {
    final Response response = gson.fromJson(reader, Response.class);
    System.out.println(response.progress.foo);
}
try ( final Reader reader = getPackageResourceReader(Q43231983.class, "failure.json") ) {
    final Response response = gson.fromJson(reader, Response.class);
    System.out.println(response.progress);
}
where the given resources are:
success.json is {"progress":{"foo": "bar"}};
failure.json is {"progress":false}.
The output is as follows:
bar
null
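Since the question mentions Retrofit 2, note that no extra converter configuration is needed: the @JsonAdapter annotation on the field is honored by any Gson instance. A sketch of the wiring (the base URL is a placeholder):
import com.google.gson.Gson;
import retrofit2.Retrofit;
import retrofit2.converter.gson.GsonConverterFactory;

Gson gson = new Gson();
Retrofit retrofit = new Retrofit.Builder()
    .baseUrl("https://example.com/") // placeholder base URL
    .addConverterFactory(GsonConverterFactory.create(gson))
    .build();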

How to apply a sub schema in the JSON Schema validator?

Hi, I'm using the JSON Schema evaluator in version 2.2.6 to validate my server responses. These responses can contain single objects of type A, B, or C, but also composite objects, e.g., D containing an array of A objects. To reuse the schema definitions of each object, I started to describe all entities in the same file, as described here. Now my problem is that I have to reference one of those single objects when validating the response.
Here is my (not working) example.
JSON schema file:
{
  "id": "#root",
  "properties": {
    "objecta": {
      "type": "object",
      "id": "#objecta",
      "properties": {
        "attribute1": { "type": "integer" },
        "attribute2": { "type": "null" }
      },
      "required": ["attribute1", "attribute2"]
    },
    "objectb": {
      "type": "object",
      "id": "#objectb",
      "properties": {
        "attribute1": { "type": "integer" },
        "attribute2": {
          "type": "array",
          "items": {
            "$ref": "#/objecta"
          }
        }
      },
      "required": ["attribute1", "attribute2"]
    }
  }
}
Now I want to validate a server response containing object B. For this, I tried the following:
public class SchemeValidator {

    public static void main(String[] args) {
        String jsonData = pseudoCodeFileLoad("exampleresponse/objectb.txt");

        final File jsonSchemaFile = new File("resources/jsonschemes/completescheme.json");
        final URI uri = jsonSchemaFile.toURI();

        ProcessingReport report = null;
        try {
            JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
            final JsonSchema schema = factory.getJsonSchema(uri.toString() + "#objectb");
            JsonNode data = JsonLoader.fromString(jsonData);
            report = schema.validate(data);
        } catch (JsonParseException jpex) {
            // ... handle parsing errors etc.
        }
    }
}
The problem is that the schema is not loaded correctly. I either get no error (even for invalid responses) or I get fatal: illegalJsonRef, as the schema seems to be empty. How can I use the schema of object B in my Java code? Thank you!!
It looks like your $ref is incorrect. It needs to be a relative reference from the base of the JSON Schema file (see here).
So your JSON schema would become:
{
  "id": "#root",
  "properties": {
    "objecta": {
      "type": "object",
      "id": "#objecta",
      "properties": {
        "attribute1": { "type": "integer" },
        "attribute2": { "type": "null" }
      },
      "required": ["attribute1", "attribute2"]
    },
    "objectb": {
      "type": "object",
      "id": "#objectb",
      "properties": {
        "attribute1": { "type": "integer" },
        "attribute2": {
          "type": "array",
          "items": {
            "$ref": "#/properties/objecta"
          }
        }
      },
      "required": ["attribute1", "attribute2"]
    }
  }
}
I've added '/properties' to your $ref. It operates like an XPath to the definition of the object in the schema file.
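On the Java side, you then point the validator at the sub-schema with the same JSON Pointer, instead of the "#objectb" id fragment. A sketch (assuming the fge json-schema-validator 2.2.6 API from the question; jsonData is the loaded response):
import java.io.File;
import com.fasterxml.jackson.databind.JsonNode;
import com.github.fge.jackson.JsonLoader;
import com.github.fge.jsonschema.core.report.ProcessingReport;
import com.github.fge.jsonschema.main.JsonSchema;
import com.github.fge.jsonschema.main.JsonSchemaFactory;

JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
// Load the whole schema file once...
JsonNode schemaNode = JsonLoader.fromFile(new File("resources/jsonschemes/completescheme.json"));
// ...then resolve the objectb definition with a JSON Pointer.
JsonSchema schema = factory.getJsonSchema(schemaNode, "/properties/objectb");
ProcessingReport report = schema.validate(JsonLoader.fromString(jsonData));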
