ElasticSearch - Using FilterBuilders - java

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about ElasticSearch and Couchbase.
Reading the ElasticSearch Java API, Filters are better used in cases where sort on score is not necessary and for caching.
I still haven't figured out how to use FilterBuilders and have following questions:
Can FilterBuilders be used alone to search?
Or Do they always have to be used with a Query? ( If true, can someone please list an example? )
Going through a documentation, if I want to perform a search based on field values and want to use FilterBuilders, how can I accomplish that? (using AndFilterBuilder or TermFilterBuilder or InFilterBuilder? I am not clear about the differences between them.)
For the 3rd question, I actually tested it with search using queries and using filters as shown below.
I got empty result (no rows) when I tried search using FilterBuilders. I am not sure what am I doing wrong.
Any examples will be helpful. I have had a tough time going through documentation which I found sparse and even searching led to various unreliable user forums.
private void processQuery() {
SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
srb.setQuery(qb);
SearchResponse resp = srb.execute().actionGet();
System.out.println("response :" + resp);
}
private void searchWithFilters(){
SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
//AndFilterBuilder andFb = FilterBuilders.andFilter();
//andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
//srb.setFilter(andFb);
SearchResponse resp = srb.execute().actionGet();
System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With this question resolved. I still have following questions:
In what scenario(s), are filters used with query? What purpose will this serve?
Difference between InFilter, TermFilter and MatchAllFilter. Any illustration will help.

Right, you should use filters to exclude documents from being even considered when executing the query. Filters are faster since they don't involve any scoring, and cacheable as well.
That said, it's pretty obvious that you have to use a filter with the search api, which does execute a query and accepts an optional filter. If you only have a filter you can just use the match_all query together with your filter. A filter can be a simple one, or a compund one in order to combine multiple filters together.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you do setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, thus the term "TX" is lowercased. That's why when you search using the term filter you don't find any match. Try searching for "tx" lowercased.
You can either change your mapping if you want to keep the "TX" term as it is while indexing, probably setting the field as not_analyzed if it should only be a single token. Otherwise you can change filter, you might want to have a look at a query that is analyzed, so that your query wil be analyzed the same way the content was indexed.
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your document, not that useful I'd say
TermFilter: Filters documents that have fields that contain a term (not analyzed)
AndFilter: compound filter used to put in and two or more filters
Don't know what you mean by InFilterBuilder, couldn't find any filter with this name.
The query usually contains what the user types in through the text search box. Filters are more way to refine the search, for example clicking on facet entries. That's why you would still have the query plus one or more filters.

To append to what #javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference you might ask. And indeed you can construct exactly the same logic in both ways.
The difference is that a query operates on BOTH the resultset as well as any facets you have defined. Whereas, a Filter (when defined standalone) only operates on the resultset and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/)

To add to the other answers, InFilter is only used with FilterBuilders. The definition is, InFilter: A filter for a field based on several terms matching on any of them.
The query Java API uses FilterBuilders, which is a factory for filter builders that can dynamically create a query from Java code. We do this using a form and we build our query based on user selections from it with checkboxes, options, and dropdowns.
Here is some Example code for FilterBuilders and there is a snippet from that link that uses InFilter as shown below:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
filterBuilder = FilterBuilders.boolFilter()
.should(FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName())))
.should(FilterBuilders.nestedFilter("groupRoles", FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
filterBuilder = FilterBuilders.nestedFilter("userRoles", FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...

Related

Script fields in hibernate elasticsearch

I'm using hibernate-search-elasticsearch 5.8.2.Final and I can't figure out how to get script fields:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-script-fields.html
Is there any way to accomplish this functionality?
This is not possible in Hibernate Search 5.8.
In Hibernate Search 5.10 you could get direct access to the REST client, send a REST request to Elasticsearch and get the result as a JSON string that you would have to parse yourself, but it is very low-level and you would not benefit from the Hibernate Search search APIs at all (no query DSL, no managed entity loading, no direct translation entity type => index name, ...).
If you want better support for this feature, don't hesitate to open a ticket on our JIRA, describing in details what you are trying to achieve and how you would have expected to be able to do that. We are currently working on Search 6.0 which brings a lot of improvements, in particular when it comes to using native features of Elasticsearch, so it just might be something we could slip into our backlog.
EDIT: I forgot to mention that, while you cannot use server-side scripts, you can still get the full source from your documents, and do some parsing in your application to achieve a similar result. This will work even in Search 5.8:
FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(entityManager);
FullTextQuery query = fullTextEm.createFullTextQuery(
qb.keyword()
.onField( "tags" )
.matching( "round-based" )
.createQuery(),
VideoGame.class
)
.setProjection( ElasticsearchProjectionConstants.SCORE, ElasticsearchProjectionConstants.SOURCE );
Object[] projections = (Object[]) query.getSingleResult();
for (Object projection : projections) {
float score = (float) projection[0];
String source = (String) projection[1];
}
See this section of the documentation.

In AEM query builder order by based on someproperty=somevalue

I m trying to sort the AEM query builder search results based on particular value of particular property. as we have in any database like MySQL we can sort based on column's value as well (for exp. ORDER BY FIELD('columnName','anyColumnName'). can we have something like this in AEM.
Suppose we have 5 Assets under path /content/dam/Assets.
Asset Name------------dc:title
1.jpg------------------Apple
2.jpg------------------Cat
3.jpg------------------Cat
4.jpg------------------Ball
5.jpg------------------Drag
I need assets on top of the results where dc:title = cat and also need other results also in sorting asc. expected result as given below
2.jpg------------------Cat
3.jpg------------------Cat
1.jpg------------------Apple
4.jpg------------------Ball
5.jpg------------------Drag
Note:- Using version AEM 6.2
You can use the orderby predicate with a value of #jcr:content/metadata/dc:title to sort by dc:title with the QueryBuilder. /libs/cq/search/content/querydebug.html is an interface to test queries on your instance. ACS Commons has a good breakdown of all out of the box predicates
If you want to pull Cats to the top of the results with a single query, you could write a custom predicate. The sample code from ACS Commons shows an example. Adobe has documentation as well.

What's the difference between And filter and Bool filter in Elastic Search?

I'm having a trouble understanding the difference between the Bool filter and the And filter in elastic search.
Context: say my documents have fields: X, Y, Z.
Each field can have multiple values.
Goal:
I want to send a query to elastic search in the following sense: (X=valueX1 OR X=valueX2) AND (Y=valueY1 OR Y=valueY2 OR .. ) AND (Z=valueZ1 OR Z=valueZ2 OR ...).
Attempt:
This is how I'm doing that:
BoolFilterBuilder mainClaus = FilterBuilders.boolFilter();
FilterBuilder filterBuilder = mainClaus;
BoolFilterBuilder xClaus = FilterBuilders.boolFilter();
BoolFilterBuilder yClaus = FilterBuilders.boolFilter();
BoolFilterBuilder zClaus = FilterBuilders.boolFilter();
mainClaus.must(xClaus);
mainClaus.must(yClaus);
mainClaus.must(zClaus);
//Return a document if it has at least one of those values.
xClaus.should( FilterBuilders.termFilter("X", "valueX1") );
xClaus.should( FilterBuilders.termFilter("X", "valueX2") );
xClaus.should( FilterBuilders.termFilter("X", "valueX3") );
//Return a document if it has at least one of those values.
yClaus.should( FilterBuilders.termFilter("Y", "valueY1") );
yClaus.should( FilterBuilders.termFilter("Y", "valueY2") );
yClaus.should( FilterBuilders.termFilter("Y", "valueY3") );
//Return a document if it has at least one of those values.
zClaus.should( FilterBuilders.termFilter("Z", "valueZ1") );
zClaus.should( FilterBuilders.termFilter("Z", "valueZ2") );
zClaus.should( FilterBuilders.termFilter("Z", "valueZ3") );
Questions:
Is my approach correct?
What's the difference between the Bool filter and the And filter?
The main difference lies in how they are executed. And the keyword here is bitset. Simply put, bool filters leverage bitsets, while and filters do not.
When bool filters are used, bitsets are created and then AND/OR'ed together in order to figure out the matching documents.
When and filters are used, ES simply scans through the list of documents one by one and includes it or not depending on whether it matches the filter or not.
Needless to say that a bool filter is a much faster alternative than a and one. Yet, not always. There are situations where you still want to prefer and over bool: when using geo filters, script filter and numeric range filter, i.e. when those filters are used, ES has to iterate over all documents anyway.
However, all of this only holds for ES pre-2.0, as starting in 2.0, and/or filters will be implemented as bool and the query DSL will be completely overhauled, so that there won't be any difference anymore between queries and filters.
For more in depth info, you can read the nitty gritty details in this great blog post titled: "All about ES filter bitsets"
So what you're doing is OK, but a more concise alternative would be to simply must three terms filter, like this:
BoolFilterBuilder mainClaus = FilterBuilders.boolFilter();
mainClaus.must(FilterBuilders.termsFilter("X", "valueX1", "valueX2", "valueX3"));
mainClaus.must(FilterBuilders.termsFilter("Y", "valueY1", "valueY2", "valueY3"));
mainClaus.must(FilterBuilders.termsFilter("Z", "valueZ1", "valueZ2", "valueZ3"));
Instead of using boolean filters here, you should go for a multi-match query.
Since you are comparning one variable 'X' with three different values, something like the code below would be a better approach.
String [] params = {'valueX1','valueX3','valueX3'}
queryBuilder = QueryBuilders.multiMatchQuery('X', params);
This queryBuilder can then be added as a part of a bigger 'must' query where all three variable X, Y and Z can be compared.
You can read more about multimatch queries here.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html

Cannot find document when searching for field with Id using Java API in ElasticSearch

I have a field which contains forward slashes. I'm trying to execute this Query:
QueryBuildres.termQuery("id", QueryParser.escape("/my/field/val"))
and I cannot get any results. When I'm looking for 'val' only, then I get the proper results. Any ideas why is that happening? Of course without escaping it also doesn't return the results.
UPDATE
so QP.escape parses string properly, but when request goes to elasticsearch it's double escaped
[2015-07-10 01:53:00,063][WARN ][index.search.slowlog.query] [Aaa AA] [index_name][4] took[420.8micros], took_millis[0], types[page], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"pageId":"\\/path\\/and\\/testestest"}}}], extra_source[],
UPDATE 2: It works when I'm using querystring, but I wouldn't like to user that and type everything by hand.
You might have to use _id instead of Id
So the reason why I didn't get any results is the default index which I had created.
I didn't specified mapping for my field, so ElasticSearch didn't treated my field.
In ElasticSearch documentation I read, that during the analysis process, elastic search splits the string into words, lower-case them and do some other stuff.
In my case my "/path/in/my/field" was splitted into four fields:
path
in
my
field
So when I was searching for "pageId:/path/in/my/field" I didn't get any results because pageId in fact didn't contained it.
To solve the issue I had to add proper mapping to pageId field, which didn't do any preprocessing (instead of four words, now I have one "/path/in/my/field")
Links to docs:
https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html

Boost one value of several on field with Spring Data Solr

I have a solr field visibility that has a set of possible values and I would like to perform a search using Spring Data Solr. I'd like to use in() and boost some but not all values if possible.
Here's an example of the search I need,
visibility:(visible^1000 archived)
or
visibility:(visible^1000 draft^500 archived)
Would this be possible in version 1.0.0.RELEASE or later versions? Currently I'm using 1.0.0.RELEASE.
Thanks, /w
I never managed to chain this query properly, but was enlightened by petrikainulainen.net on SimpleStringCriteria().
Criteria criteria = new SimpleStringCriteria("(visibility:(visible^1000) OR visibility:(draft^500) OR visibility:(archived))");
By doing as shown below you will give far higher boost to the docs that has the category as 'webpage'. Then add 'combined' criteria to your existing other criteria
Criteria webpage= new Criteria("category" ).is("webpage" ).boost((float) 1000);
Criteria document= new Criteria("category" ).is("document" ).boost((float) 2);
Criteria combined= webpage.or( document);

Categories

Resources