ElasticSearch: search more like this in Java

Let's say I've indexed a document like this :
{
"_index": "indexapm",
"_type": "membres",
"_id": "3708",
"_score": 1,
"_source": {
"firstname": "John",
"lastname": "GUERET-TALON"
}
}
I want to retrieve this document when searching for "GUER", "GUERET", "TAL" for example.
I have a Java application and I tried this :
MoreLikeThisQueryBuilder qb = QueryBuilders.moreLikeThisQuery(
"firstname^3",
"lastname^3")
.likeText("GUER");
SearchResponse response = client.prepareSearch("myindex")
.setTypes("mytype")
.setSearchType(SearchType.DFS_QUERY_AND_FETCH)
.setQuery(qb) // Query
.setFrom(0)
.setSize(limit)
.setExplain(true)
.execute()
.actionGet();
But this search doesn't retrieve my document. Of course if I try an exact match query and search for "GUERET", it works.
Does anyone know what kind of query I have to use and how to make it work with the Java library? Thanks!

The More Like This Query isn't the best choice in this case.
If, as you described, you're looking for documents using the first letters of words, you should use a Prefix Query instead, but they are limited to one field. For a search on more than one field, use the MultiMatch Query (providing the PHRASE_PREFIX type). I would try something like:
QueryBuilders.multiMatchQuery("GUER", "firstname", "lastname")
.type(MatchQueryBuilder.Type.PHRASE_PREFIX);
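For reference, that builder should generate query DSL along these lines (a sketch, with the field names from the question):

```json
{
  "multi_match": {
    "query": "GUER",
    "fields": ["firstname", "lastname"],
    "type": "phrase_prefix"
  }
}
```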

I got the result using a WildcardQueryBuilder:
QueryBuilders.boolQuery().should(QueryBuilders.wildcardQuery("lastname", "*GUER*"));
It generates the following query:
{"bool":{"should":[{"wildcard":{"lastname":"*GUER*"}}]}}

Related

Apache Camel: How to transform hierarchical data from database into pojo

I am new to Stack Overflow and also to Apache Camel, so I will try to write an understandable description of my problem.
My goal is to read hierarchical data from a database (MySQL), composed of one entry in a parent table and several rows in a child table, and to transform this data into a POJO.
Sub-goal: avoid writing much custom code and use Blueprint XML.
Since I could not find a fitting EIP for this issue, I list my approaches so far:
1. Select data by a joined query
select * from parentTable join childTable on childTable.parentId=parentTable.id
This would mean writing a custom processor to transform the result into a POJO, because each result row repeats the parent properties. Since I am trying to avoid writing a custom processor, I tried the following:
2. Select query returns JSON with correct structure to transform to pojo
select json_object(
'parentProperty1', parentProperty1
, 'parentProperty2', parentProperty2
, 'children', (select CAST(CONCAT('[',
GROUP_CONCAT(
JSON_OBJECT(
'childProperty1', childProperty1
, 'childProperty2', childProperty2
)),
']')
AS JSON)
from childTable c
where p.messageId=c.messageId
)
)
from parentTable p
;
Executing the query on mysql shell returns expected JSON:
{
"parentProperty1": "value1",
"parentProperty1": "value2",
"children": [
{
"childProperty1": "value3",
"childProperty2": "value4"
},
{
"childProperty1": "value5",
"childProperty2": "value6"
}
]
}
Running the query inside Camel, I encountered a problem for which I could not find an explanation or a solution yet.
After the query executes, the body contains the JSON, but it is surrounded by fragments of the initial query:
[{json_object('parentProperty1', parentProperty1 , 'parentProperty2', parentProperty2 , 'children', (select CAST(CONCAT('[',
={"parentProperty1": "value1", "parentProperty2": "value2", "children": [{"childProperty1": "value3", "childProperty2": "value4"}, {"childProperty1": "value5", "childProperty2": "value6"}]}}]
Questions:
Is there an existing EIP to solve my problem?
Why is there no correct JSON as the result of my 2nd approach?
Thanks in advance
What you are actually getting is a key-value pair. Since no alias was given to json_object in the query, MySQL generates a default column name; this is what you are seeing as a query snippet in the result.
Add an alias to the json_object in the query as shown below:
select json_object(
'parentProperty1', parentProperty1
, 'parentProperty2', parentProperty2
, 'children', (select CAST(CONCAT('[',
GROUP_CONCAT(
JSON_OBJECT(
'childProperty1', childProperty1
, 'childProperty2', childProperty2
)),
']')
AS JSON)
from childTable c
where p.messageId=c.messageId
)
) as result
from parentTable p;
This would return something like this:
{result={"children": [{"childProperty1": "value3", "childProperty2": "value4"}, {"childProperty1": "value5", "childProperty2": "value6"}], "parentProperty1": "value1", "parentProperty2": "value2"}}
Hope this solves your issue.
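As a side note, the Camel SQL component typically hands the route a List of Maps, one Map per row, keyed by column alias, so once the alias is in place the JSON string can be pulled out without a heavyweight processor. A minimal plain-Java sketch (the class and method names here are made up for illustration; "result" is the alias from the query above):

```java
import java.util.List;
import java.util.Map;

public class ExtractJson {

    // Each row from the SQL component is a Map keyed by column alias,
    // so the aliased json_object value can be read with the alias name.
    static String extractResult(List<Map<String, Object>> rows) {
        if (rows.isEmpty()) {
            return null;
        }
        Object value = rows.get(0).get("result");
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows =
            List.of(Map.of("result", "{\"parentProperty1\": \"value1\"}"));
        // Prints the bare JSON string, without the Map's key=value wrapping
        System.out.println(extractResult(rows));
    }
}
```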

Elasticsearch universal search query

I have the string "Jhon Abraham 18". I want to create a search query that splits the string on spaces and searches for those words in an index. The search has to cover all fields of the index, because you don't know which value maps to which field.
So, I have a document:
{
"_index": "recipient",
"_type": "recipient",
"_id": "37a15258d9",
"_version": 1,
"_score": 1,
"_source": {
"name": "Jhon ",
"surname": "Abraham",
"age": "18 ",
}
and I don't know which fields of the index the values Jhon, Abraham, and 18 correspond to. I just have a string, and with this string I want to search across all fields of the index documents. I can split it into separate words by spaces, but I don't know the exact fields to search. Also, I want to do it in Java.
I would appreciate any help.
I think you should use query_string in Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
This will solve your problem.
You can use a multi match query, listing all the fields or using field wildcards.
Multi Match Query
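For example, a multi match query with a field wildcard might look roughly like this (a sketch; whether "*" expands to all fields depends on the Elasticsearch version):

```json
{
  "query": {
    "multi_match": {
      "query": "Jhon Abraham 18",
      "fields": ["*"]
    }
  }
}
```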

Elasticsearch java api for full text search with filters

I want to do a full-text search for the string "user", which can match any field in my document, and then apply a filter so that I only get records where the value of field x is either "abc" or "xyz".
Fiddling with Sense, the request below fulfills my requirement:
GET _search
{
"query":{
"filtered":{
"query":{
"query_string": {
"query": "user"
}
},
"filter":
[
{"term": { "x": "abc"}},
{"term": { "x": "xyz"}}
]
}
}
}
But I want a Java API to do the above. I have searched the Elastic documentation and SO, but have not found what I am looking for, as APIs like QueryBuilders.filteredQuery seem deprecated. I am currently using 2.3.4 but can upgrade.
The Elasticsearch 2.x Java API doesn't have an implementation for the filtered query. According to the Elasticsearch documentation, the suggestion is to use a bool query instead, with a must clause for the query and a filter clause for the filter.
In Java you could write a wrapper method for the filtered query. This would look like:
public static QueryBuilder filteredQuery(QueryBuilder query, QueryBuilder filter){
BoolQueryBuilder filteredQuery = QueryBuilders.boolQuery().must(query);
return filteredQuery.filter(filter);
}
where,
QueryBuilder query = QueryBuilders.queryStringQuery("user");
QueryBuilder filter = QueryBuilders.boolQuery().filter(QueryBuilders.boolQuery().should(QueryBuilders.termsQuery("x", "abc", "xyz")));
This query is equivalent to the one mentioned above. Hope this helps.
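For reference, the wrapper above should produce query DSL roughly equivalent to the following (a sketch; the builders add some extra bool nesting around the filter):

```json
{
  "bool": {
    "must": { "query_string": { "query": "user" } },
    "filter": { "terms": { "x": ["abc", "xyz"] } }
  }
}
```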

Retrieve data from Elasticsearch using aggregations where the values contains hyphen

I have been working with Elasticsearch for quite some time now, and I have been facing a problem recently.
I want to group by a particular column in an Elasticsearch index. The values in that particular column have hyphens and other special characters.
SearchResponse res1 = client.prepareSearch("my_index")
.setTypes("data")
.setSearchType(SearchType.QUERY_AND_FETCH)
.setQuery(QueryBuilders.rangeQuery("timestamp").gte(from).lte(to))
.addAggregation(AggregationBuilders.terms("cat_agg").field("category").size(10))
.setSize(0)
.execute()
.actionGet();
Terms termAgg=res1.getAggregations().get("cat_agg");
for(Bucket item :termAgg.getBuckets()) {
cat_number =item.getKey();
System.out.println(cat_number+" "+item.getDocCount());
}
This is the query I have written in order to get the data grouped by the "category" column in "my_index".
The output I expected after running the code is:
category-1 10
category-2 9
category-3 7
But the output I am getting is :
category 10
1 10
category 9
2 9
category 7
3 7
I have already gone through some questions like this one, but couldn't solve my issue with those answers.
That's because your category field has a default string mapping and it is analyzed, hence category-1 gets tokenized as two tokens namely category and 1, which explains the results you're getting.
In order to prevent this, you can update your mapping to include a sub-field category.raw which is going to be not_analyzed with the following command:
curl -XPUT localhost:9200/my_index/data/_mapping -d '{
"properties": {
"category": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
After that, you need to re-index your data, and your aggregation will work and return what you expect.
Just make sure to change the following line in your Java code:
.addAggregation(AggregationBuilders.terms("cat_agg").field("category.raw").size(10))
^
|
add .raw here
When you index "category-1" you will get (by default) two terms, "category", and "1". Therefore when you aggregate you will get back two results for that.
If you want it to be considered a single term, then you need to change the analyzer used on that field when indexing. Set it to use the keyword analyzer.
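A mapping sketch using the keyword analyzer, for a pre-5.x string mapping as in the question, might look like this (an alternative to the raw sub-field; it changes the field's own indexing rather than adding a second representation):

```json
{
  "properties": {
    "category": {
      "type": "string",
      "analyzer": "keyword"
    }
  }
}
```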

elasticsearch match/term query not returning exact match

I am using elasticsearch in my project in Java, with the document format like
/index/type/_mapping
{
"my_id" : "string"
}
Now, suppose the my_id values are
A01, A02, A01.A1, A012.AB0
For the query,
{
"query" : {
"term" : {
"my_id" : "a01"
}
}
}
Observed : the documents returned are for A01, A01.A1, A012.AB0
Expected : I need the A01 document only.
I looked for a solution and found that I would have to use a custom analyzer for the my_id field. I do not want to change my mapping for the document.
Also, I used "index": "not_analyzed" in the query, but there was no change in the output.
Yes, you could map the field as not_analyzed, but also try using a term filter instead of a term query.
Also check the current mapping of the document.
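For example, with the pre-2.x filtered query syntax used elsewhere on this page, a term filter would look roughly like this (a sketch; it assumes my_id is mapped not_analyzed, so "A01" is stored as the single exact term to match):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "my_id": "A01" }
      }
    }
  }
}
```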
