Elasticsearch query_string breaks with case insensitive search

I have text messages such as "#Someone please note%",
and I need this message to be returned when searching for #someone, #Someone, #some, or #someone note.
It should also be returned when the search term contains special characters such as % or #.
I'm using Elasticsearch's BoolQueryBuilder with query_string to fetch the results.
Is there a way to make this search work, or should I use another approach such as a wildcard or match query?
The documents in my index store the text under the key message,
and it can contain any kind of text.
For example, given the message "Hello##$ ther5# #Someone",
I need a match when searching for any of the following keywords:
#someone, #Some, #some, ##, ther5#, #Someone
I'm using a Java backend with BoolQueryBuilder.

Related

Hibernate Search Lucene query parser with Special Characters

FIRST QUESTION:
Can somebody explain to me how the Lucene query in Hibernate Search handles special characters? I read the Hibernate Search documentation and the Lucene regexp syntax, but somehow they don't add up with the generated queries and results.
Let's assume I have the following database entries:
name
Will
Will Smith
Will - Smith
Will-Smith
and I am using the following query:
Query query = queryBuilder
    .keyword()
    .onField("firstName")
    .matching(input)
    .createQuery();
Now I am searching with the following inputs:
Will -> returns all 4 entries, with the following generated query: FullTextQueryImpl(firstName:will)
Will Smith -> also returns all 4 entries, with the following generated query: FullTextQueryImpl(firstName:will firstName:smith)
Will - Smith -> also returns all 4 entries, with the following generated query: FullTextQueryImpl(firstName:will firstName:smith). Where is the "-"? Shouldn't it exclude everything after the "-" according to the Lucene query syntax?
Will-Smith -> same here
Will\-Smith -> here I tried to escape the "-" with a backslash, but got the same result
Will -Smith -> same here
SECOND QUESTION: Let's assume I have the following database entries, in which the entry without a numerical ending always exists and the ones with a numerical ending may or may not be in the database.
What would a Lucene query for this look like?
name
Will
Will1
Will2

You can play around with Lucene analyzers and see what happens behind the scenes. Here is a tutorial: https://www.baeldung.com/lucene-analyzers
The tokenizer is pluggable, so you can change how special characters are treated.
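
To make the point about analysis concrete, here is a rough sketch in plain Java. It is not Lucene's StandardAnalyzer, only a simplified stand-in that I am assuming behaves similarly for these inputs: it splits on anything that is not a letter or digit and lowercases the tokens. The class name AnalyzerSketch and the method analyze are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for an analyzer. This is NOT Lucene's StandardAnalyzer;
// it only imitates "split on non-alphanumerics, then lowercase".
class AnalyzerSketch {

    // Returns the tokens that would end up in the index (or in the query).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Will\\-Smith" is the Java literal for the input Will\-Smith.
        List<String> inputs = List.of("Will", "Will Smith", "Will - Smith", "Will-Smith", "Will\\-Smith");
        for (String input : inputs) {
            System.out.println(input + " -> " + analyze(input));
        }
    }
}
```

Under that assumption the "-" and the backslash never become tokens, which would be consistent with the generated queries containing only firstName:will and firstName:smith.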

How to enable `relevance-trace` using MarkLogic Java API?

I'm implementing quite a complex search using the MarkLogic Java API. I would like to enable relevance-trace to see how my results are scored. Unfortunately, I don't know how to enable it in the Java API. I have tried something like:
DatabaseClient client = initClient();
var qmo = client.newServerConfigManager().newQueryOptionsManager();
var searchOptions = "<search:options xmlns=\"http://marklogic.com/appservices/search\">\n"
+ " <search-option>relevance-trace</search-option>\n"
+ " </search:options>";
qmo.writeOptions("searchOptions", new StringHandle(searchOptions).withFormat(Format.XML));
QueryManager qm = client.newQueryManager();
StructuredQueryBuilder qb = qm.newStructuredQueryBuilder("searchOptions");
// query definition
qm.search(query, new SearchHandle());
Unfortunately, it fails with the following error:
"Local message: /config/query write failed: Internal Server Error. Server Message: XDMP-DOCNONSBIND:
xdmp:get-request-body(\"xml\") -- No namespace binding for prefix search at line 1 . See the
MarkLogic server error log for further detail."
My question is how to use search options in the MarkLogic Java API. I'm especially interested in relevance-trace and simple-score.
Update 1
As suggested by @Jamess Kerr, I have changed my options to
var searchOptions = "<options xmlns=\"http://marklogic.com/appservices/search\">\n"
+ " <search-option>relevance-trace</search-option>\n"
+ " </options>";
but unfortunately it still doesn't work. After that change I get this error:
Local message: /config/query write failed: Internal Server Error. Server Message: XDMP-UPDATEFUNCTIONFROMQUERY: xdmp:apply(function() as item()*) -- Cannot apply an update function from a query . See the MarkLogic server error log for further detail.

Your search options XML uses the search: namespace prefix but you don't define that prefix. Since you are setting the default namespace, just drop the search: prefix from the search:options open and close tags.
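
If it helps, the prefix problem can be reproduced locally without a MarkLogic server. The sketch below uses only the JDK's namespace-aware XML parser to compare the two options strings from the question; the class name OptionsNamespaceCheck and the method parses are made up for this example. It checks well-formedness only and says nothing about the later XDMP-UPDATEFUNCTIONFROMQUERY error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Local well-formedness check only; it does not talk to MarkLogic.
class OptionsNamespaceCheck {

    // Options as first posted: the "search:" prefix is used but never declared.
    static final String WITH_UNBOUND_PREFIX =
        "<search:options xmlns=\"http://marklogic.com/appservices/search\">"
        + "<search-option>relevance-trace</search-option>"
        + "</search:options>";

    // Options after dropping the prefix and relying on the default namespace.
    static final String WITH_DEFAULT_NAMESPACE =
        "<options xmlns=\"http://marklogic.com/appservices/search\">"
        + "<search-option>relevance-trace</search-option>"
        + "</options>";

    // Returns true if a namespace-aware parser accepts the XML.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Treat recoverable errors as failures too, and keep stderr quiet.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unbound prefix parses:    " + parses(WITH_UNBOUND_PREFIX));
        System.out.println("default namespace parses: " + parses(WITH_DEFAULT_NAMESPACE));
    }
}
```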

The original Java query has both syntactic and semantic issues:
First of all, it is not a valid MarkLogic query, in the sense that it contains only the query options portion. Bypassing the namespace prefix binding is the other mistake.
To adapt your original query, replace the search text between the search:qtext tags (the pink line) and run the query.
Result:
Matched and Listing 2 documents:
Matched 1 locations in /medals/coin_1333113127296.xml with 94720 score:
73. …pulsating maple leaf coin another world-first, the [Royal Canadian Mint]is proud to launch a numismatic breakthrough from its ambitious and creative R&D team...
Matched 1 locations in /medals/coin_1333078361643.xml with 94720 score:
71. ...the [Royal Canadian Mint]and Royal Australian Mint have put an end to the dispute relating to the circulation coin colouring process...
To put it into context: without a semantic criterion, your original query is equivalent to removing search:qtext and performing a fuzzy search.
Note:
If you use a serialised term search or search constraints instead of a text search, you should get higher-scoring results.
The MarkLogic Java API operates in unfiltered mode by default, while cts:search operates in filtered mode by default. Just be mindful of how you construct the query and of the score you expect from the Java API.
The Java API is really intended for bulk data write, extract, and transformation. In my opinion, qconsole is better suited to tuning a specific query and gathering search score, relevance, and computation details.

Get list of index names matching pattern from elastic search - JAVA

I have a list of indexes in Elasticsearch as follows:
index1, index2, index3, test-index1, test-index2, test-index3
Now I want only those indexes that match my pattern "test-*".
I can achieve this with the following Sense query:
GET test-*/_aliases
I want to achieve the same result from Java code.

The REST endpoint that responds to /test-*/_aliases does the following:
GetAliasesResponse getAliasesResponse = client().admin().indices()
    .prepareGetAliases()
    .setIndices("test-*", "index-*").get();
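
As a side note, if you already have the full list of index names in memory, the same kind of pattern can be applied on the client side. This is a different technique from the API call above, not what Elasticsearch does internally: a plain-Java sketch that treats "*" as "any run of characters". The class IndexNameFilter and its methods are hypothetical names for this illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Client-side wildcard filtering of names that are already in memory.
class IndexNameFilter {

    // Turns a pattern such as "test-*" into a regex; only '*' is special.
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        for (String literal : glob.split("\\*", -1)) {
            if (!first) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literal));
            first = false;
        }
        return Pattern.compile(regex.toString());
    }

    // Keeps only the names that match the whole pattern.
    static List<String> filter(List<String> names, String glob) {
        Pattern pattern = globToRegex(glob);
        return names.stream()
            .filter(name -> pattern.matcher(name).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> all = List.of("index1", "index2", "index3",
                                   "test-index1", "test-index2", "test-index3");
        System.out.println(filter(all, "test-*"));
    }
}
```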

Cannot find document when searching for field with Id using Java API in ElasticSearch

I have a field that contains forward slashes. I'm trying to execute this query:
QueryBuilders.termQuery("id", QueryParser.escape("/my/field/val"))
and I cannot get any results. When I search for 'val' only, I get the proper results. Any ideas why this is happening? Without escaping it doesn't return any results either.
UPDATE
So QueryParser.escape escapes the string properly, but when the request goes to Elasticsearch it is double-escaped:
[2015-07-10 01:53:00,063][WARN ][index.search.slowlog.query] [Aaa AA] [index_name][4] took[420.8micros], took_millis[0], types[page], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"pageId":"\\/path\\/and\\/testestest"}}}], extra_source[],
UPDATE 2: It works when I'm using a query string, but I wouldn't like to use that and type everything by hand.

You might have to use _id instead of Id

So the reason I didn't get any results was that I had created the index with default settings.
I didn't specify a mapping for my field, so Elasticsearch applied its default analysis to it.
In the Elasticsearch documentation I read that, during the analysis process, Elasticsearch splits the string into words, lowercases them, and does some other processing.
In my case "/path/in/my/field" was split into four terms:
path
in
my
field
So when I searched for "pageId:/path/in/my/field" I didn't get any results, because pageId did not in fact contain that value as a single term.
To solve the issue I had to add a proper mapping for the pageId field that doesn't do any preprocessing (instead of four words, I now have the single term "/path/in/my/field").
Links to docs:
https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html
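
For anyone who wants to see the effect without a cluster, here is a small plain-Java sketch of the idea. It assumes a very simplified analyzer (split on non-alphanumerics, lowercase), which is not the real Elasticsearch analysis chain, and models a term query as an exact lookup in the set of indexed terms. TermLookupSketch and its methods are names invented for this example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Models why an exact term lookup misses an analyzed field.
class TermLookupSketch {

    // Simplified analysis: NOT the real Elasticsearch analyzer.
    static Set<String> analyzed(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A field with no preprocessing keeps the whole value as one term.
    static Set<String> notAnalyzed(String value) {
        return Set.of(value);
    }

    // A term query is modelled as an exact match against the indexed terms.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String value = "/path/in/my/field";
        System.out.println("analyzed terms: " + analyzed(value));
        System.out.println("full value vs analyzed:     " + termMatches(analyzed(value), value));
        System.out.println("'field' vs analyzed:        " + termMatches(analyzed(value), "field"));
        System.out.println("full value vs not analyzed: " + termMatches(notAnalyzed(value), value));
    }
}
```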

ElasticSearch - Using FilterBuilders

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about ElasticSearch and Couchbase.
According to the ElasticSearch Java API documentation, filters are better suited to cases where sorting by score is not necessary, and they can be cached.
I still haven't figured out how to use FilterBuilders and have the following questions:
1. Can FilterBuilders be used alone to search?
2. Or do they always have to be used with a query? (If so, can someone please give an example?)
3. Going through the documentation, if I want to perform a search based on field values and want to use FilterBuilders, how can I accomplish that? (Using AndFilterBuilder, TermFilterBuilder, or InFilterBuilder? I am not clear about the differences between them.)
For the 3rd question, I actually tested a search using queries and one using filters, as shown below.
I got an empty result (no rows) when I tried the search using FilterBuilders. I am not sure what I am doing wrong.
Any examples will be helpful. I have had a tough time going through the documentation, which I found sparse, and even searching led to various unreliable user forums.
private void processQuery() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
    srb.setQuery(qb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
private void searchWithFilters() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
    //AndFilterBuilder andFb = FilterBuilders.andFilter();
    //andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
    //srb.setFilter(andFb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With that resolved, I still have the following questions:
In what scenario(s) are filters used with a query? What purpose does this serve?
What is the difference between InFilter, TermFilter, and MatchAllFilter? Any illustration will help.

Right, you should use filters to exclude documents from even being considered when executing the query. Filters are faster since they don't involve any scoring, and they are cacheable as well.
That said, you have to use a filter through the search API, which executes a query and accepts an optional filter. If you only have a filter, you can just use the match_all query together with your filter. A filter can be a simple one, or a compound one that combines multiple filters.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you do setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, so the term "TX" is lowercased. That's why you don't find any match when you search using the term filter. Try searching for "tx" lowercased.
You can change your mapping if you want to keep the "TX" term as it is while indexing, probably by setting the field as not_analyzed if it should only be a single token. Otherwise you can change the filter; you might want to have a look at a query that is analyzed, so that your query will be analyzed the same way the content was indexed.
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your documents, not that useful I'd say
TermFilter: filters documents that have fields that contain a term (not analyzed)
AndFilter: compound filter used to AND together two or more filters
I don't know what you mean by InFilterBuilder; I couldn't find any filter with this name.
The query usually contains what the user types into the text search box. Filters are more a way to refine the search, for example by clicking on facet entries. That's why you would still have the query plus one or more filters.

To append to what @javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference, you might ask? Indeed, you can construct exactly the same logic in both ways.
The difference is that a query operates on both the result set and any facets you have defined, whereas a filter (when defined standalone) operates only on the result set and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/)
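
To illustrate that difference without Elasticsearch, here is a toy model in plain Java. Queries and filters are just predicates, a "facet" is a count of documents per state, and all names (FilterVsFacetSketch, Doc, the method names) are invented for this sketch. It only mirrors the behaviour described above and is not how Elasticsearch is implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model: a standalone filter narrows hits only; a filtered query narrows facets too.
class FilterVsFacetSketch {

    static final class Doc {
        final String name;
        final String state;

        Doc(String name, String state) {
            this.name = name;
            this.state = state;
        }
    }

    static List<Doc> apply(List<Doc> docs, Predicate<Doc> predicate) {
        return docs.stream().filter(predicate).collect(Collectors.toList());
    }

    // Facet: number of documents per state.
    static Map<String, Long> facetByState(List<Doc> docs) {
        return docs.stream()
            .collect(Collectors.groupingBy(d -> d.state, TreeMap::new, Collectors.counting()));
    }

    // The returned hits are the same in both styles: query AND filter.
    static List<Doc> hits(List<Doc> docs, Predicate<Doc> query, Predicate<Doc> filter) {
        return apply(docs, query.and(filter));
    }

    // Standalone filter: facets are computed from the query alone.
    static Map<String, Long> facetsWithStandaloneFilter(
            List<Doc> docs, Predicate<Doc> query, Predicate<Doc> filter) {
        return facetByState(apply(docs, query));
    }

    // Filtered query: the filter is part of the query, so facets see it too.
    static Map<String, Long> facetsWithFilteredQuery(
            List<Doc> docs, Predicate<Doc> query, Predicate<Doc> filter) {
        return facetByState(apply(docs, query.and(filter)));
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", "tx"), new Doc("b", "tx"), new Doc("c", "ca"));
        Predicate<Doc> matchAll = d -> true;
        Predicate<Doc> stateIsTx = d -> d.state.equals("tx");

        System.out.println("hits: " + hits(docs, matchAll, stateIsTx).size());
        System.out.println("facets, standalone filter: " + facetsWithStandaloneFilter(docs, matchAll, stateIsTx));
        System.out.println("facets, filtered query:    " + facetsWithFilteredQuery(docs, matchAll, stateIsTx));
    }
}
```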

To add to the other answers, InFilter is only used with FilterBuilders. The definition is: InFilter, a filter for a field based on several terms, matching on any of them.
The query Java API uses FilterBuilders, which is a factory for filter builders that can dynamically create a query from Java code. We do this using a form, and we build our query from the user's selections in its checkboxes, options, and dropdowns.
Here is some example code for FilterBuilders, including a snippet that uses InFilter, shown below:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
    filterBuilder = FilterBuilders.boolFilter()
        .should(FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName())))
        .should(FilterBuilders.nestedFilter("groupRoles", FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
    filterBuilder = FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...
