ElasticSearch Full Text Search - java

I am trying to run a full-text search with a regular expression through the Elasticsearch Java API. My filter looks like this:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all",
".*" + text + ".*");
But it matches only a single word, not a phrase. For example:
if the source contains a string like "one two three four five.." and my text string is something like "two", "our", or "thr", then it works.
But when my realTimeTextIn string is "two three", the full-text search doesn't work. I can't search for more than one word.
What am I missing here?
The rest of the code looks something like this:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all", ".*"+q+".*");
SearchResponse response = ClientProvider.instance().getClient().prepareSearch(index)
.setTypes(type)
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setPostFilter(qFilter)
.setFrom(0).setSize(250).setExplain(true)
.execute()
.actionGet();
Thanks for any help.

When the text string is empty or null, the join-based approach throws an exception.
You can use the regexp filter like this instead:
FilterBuilder qFilter = FilterBuilders.regexpFilter("_all",(".*"+q+".*").replace(" ", ".*"));
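To make the null/empty concern above concrete, here is a minimal sketch of a helper that builds the pattern string before it is passed to regexpFilter. The class and method names are hypothetical, and the input is not escaped, so regex metacharacters in the text are still interpreted as regex. The check in main uses plain java.util.regex against a whole string purely for illustration; as far as I understand, Elasticsearch evaluates a regexp against individual indexed terms, so results on an analyzed field may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the Elasticsearch API): builds the pattern
// string that would be handed to FilterBuilders.regexpFilter("_all", ...).
final class RegexpPatternSketch {

    // Returns ".*" for null or blank input, otherwise the words joined by ".*".
    static String toPattern(String text) {
        if (text == null || text.trim().isEmpty()) {
            return ".*";
        }
        return ".*" + String.join(".*", text.trim().split("\\s+")) + ".*";
    }

    public static void main(String[] args) {
        String pattern = toPattern("two three");
        System.out.println(pattern);
        // Whole-string check with java.util.regex, for illustration only.
        System.out.println(Pattern.matches(pattern, "one two three four five"));
    }
}
```

Whether a match-everything pattern is the right fallback for empty input is a design choice; returning no filter at all may fit better in some applications.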

That is an interesting question. I found something like phrase queries and phrase matching:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/phrase-matching.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_phrase_search.html
In the Java API we can do this for a query (I tested this):
SearchResponse response = client.prepareSearch(index)
.setTypes(type)
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setFrom(0).setSize(250).setExplain(true).setQuery(QueryBuilders.matchPhraseQuery(field, "one two"))
.execute()
.actionGet();
I'm sorry, but I didn't find an equivalent solution for a filter.
You can try building a script filter (inserting plain JSON into your filter instead of using a Java method) or something called a query filter:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html
I hope that helps a little.
EDIT:
Of course there is a simple solution, but I don't know if it satisfies you.
FilterBuilder qFilter = FilterBuilders.regexpFilter(
"_all", ".*" + Joiner.on(".*").join(text.split(" ")) + ".*");

I happened to build a full-text search like this using a query builder:
QueryBuilder queryBuilder = QueryBuilders.multiMatchQuery(query)
.field("name", 2.0f)
.field("email")
.field("title")
.field("jobDescription", 3.0f)
.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX);
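As I understand it, the PHRASE_PREFIX type treats the query as a phrase whose last word only needs to be a prefix. The following is a rough plain-Java sketch of that matching rule on whitespace tokens, so the difference from the regexp approach is visible. It is an illustration under simplified assumptions (no analyzer, no slop, no field boosts), not how Elasticsearch implements it, and the class and method names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified illustration of phrase-prefix matching; not Elasticsearch code.
final class PhrasePrefixSketch {

    // Lowercases and splits on whitespace; a stand-in for a real analyzer.
    static List<String> tokens(String s) {
        String trimmed = s.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // True if the query words appear consecutively and in order in the field,
    // with the last query word allowed to be only a prefix of its token.
    static boolean matches(String fieldValue, String query) {
        List<String> doc = tokens(fieldValue);
        List<String> q = tokens(query);
        if (q.isEmpty()) {
            return false;
        }
        for (int start = 0; start + q.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < q.size() && ok; i++) {
                boolean last = (i == q.size() - 1);
                String token = doc.get(start + i);
                ok = last ? token.startsWith(q.get(i)) : token.equals(q.get(i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String source = "one two three four five";
        System.out.println(matches(source, "two thr"));
        System.out.println(matches(source, "our"));
    }
}
```

Note that under this rule "our" does not match "four", unlike the ".*our.*" regexp from the question, because only the end of the last word is left open.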

Related

Spring Data MongoDb elemMatch criteria matching all search strings

I'm having an issue with custom Spring Data queries with MongoDb and Java. I'm attempting to implement a flexible search functionality against most of the fields of the document.
This document represents a person, and it contains a set of addresses embedded in it; the address has a field that is a set of strings that are the 'street address lines'.
I started with Query By Example, and this works for the single-valued fields, but it doesn't work for other types, such as this set of strings. For these, I'm building custom criteria.
The search criteria includes a set of street lines that I would like to match against the document's lines. If every line in the search is found in the document, the criteria should be considered matching.
I've tried using elemMatch, but this doesn't quite work like I want:
addressCriteriaList.add(Criteria.where("streetAddressLines").elemMatch(new Criteria().in(addressSearch.getStreetAddressLines())));
This seems to match if only ONE line in the document matches the search. If I have the following document:
"streetAddressLines": [ "123 Main Street", "Apt 1" ]
and the search looks like this:
"streetAddressLines": [ "123 Main Street", "Apt 2" ]
the elemMatch succeeds, but that's not what I want.
I've also tried looping through each of the search lines, trying an elemMatch to see if each is in the document:
var addressLinesCriteriaList = new ArrayList<Criteria>();
var streetAddressLines = address.getStreetAddressLines();
streetAddressLines.forEach(l -> addressLinesCriteriaList.add(Criteria.where("streetAddressLines").elemMatch(new Criteria().is(l))));
var matchCriteria = new Criteria().andOperator(addressLinesCriteriaList);
This doesn't seem to work. I have done some experimenting, and it may be that this part is what fails: new Criteria().is(l)
I tried this, and this DOES seem to work, but I would think that it's really inefficient to create a collection for each search line:
streetAddressLines.forEach(l ->
{
    var list = new ArrayList<String>();
    list.add(l);
    addressCriteriaList.add(Criteria.where("streetAddressLines").elemMatch(new Criteria().in(list)));
});
So I don't know exactly what's going on - does anyone have any ideas of what I'm doing wrong? Thanks in advance.
You need to use the $all operator, or the all method of the Criteria class. Something along these lines:
addressCriteriaList.add(Criteria.where("streetAddressLines").all(addressSearch.getStreetAddressLines()));
If addressSearch.getStreetAddressLines returns a list, try this:
addressCriteriaList.add(Criteria.where("streetAddressLines").all(addressSearch.getStreetAddressLines().toArray(new String[0])));
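The difference between the "any line matches" behaviour described in the question and the "every search line must be present" behaviour that $all is meant to give can be illustrated in memory with the sample data from the question. This is only a plain-Java sketch of the two semantics, with hypothetical method names; it does not touch Spring Data or MongoDB.

```java
import java.util.Arrays;
import java.util.List;

// In-memory illustration of "any" versus "all" matching; not Spring Data code.
final class AllVersusAnySketch {

    // True if at least one document line is among the search lines.
    static boolean anyLineMatches(List<String> documentLines, List<String> searchLines) {
        return documentLines.stream().anyMatch(searchLines::contains);
    }

    // True only if every search line is found in the document lines.
    static boolean allLinesPresent(List<String> documentLines, List<String> searchLines) {
        return documentLines.containsAll(searchLines);
    }

    public static void main(String[] args) {
        List<String> document = Arrays.asList("123 Main Street", "Apt 1");
        List<String> search = Arrays.asList("123 Main Street", "Apt 2");
        System.out.println("any: " + anyLineMatches(document, search));
        System.out.println("all: " + allLinesPresent(document, search));
    }
}
```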

How can I automatically convert all Lucene TermQuery objects to PrefixQuery?

I'm using QueryParser with a StandardAnalyzer to parse a queryString. With this setup, if I search for "key short", it will not match the text "keyboard shortcut".
I think it's because the queryString "key short" gets parsed as BooleanQuery(TermQuery("key"), TermQuery("short")). If I wanted it to match "keyboard shortcut", I'd have to search for "key* short*". I'd like the QueryParser to do this for me automatically, i.e. produce BooleanQuery(PrefixQuery("key"), PrefixQuery("short")) when given the queryString "key short".
Is this the right approach? If so, how should I go about doing this?
I never found a 'proper' solution to this, so I implemented a hack that appends wildcards to individual words in the raw query and then feeds that to the analyzer:
private static final Pattern QUERY_WORD_PATTERN = Pattern.compile("(?<= |^)(?!AND|OR)(\\w+)(?= |$)");
...
String processedQuery = String.format("%s OR %s",
QUERY_WORD_PATTERN.matcher(queryString).replaceAll("$1*"),
queryString);
Query query = new QueryParser(CONTENTS_FIELD, analyzer).parse(processedQuery);
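The string rewriting in this hack can be tried without Lucene. The sketch below runs the pattern from the answer on a few inputs. It also includes a hypothetical variant, which is my own assumption and not part of the original answer, that only skips AND and OR when they are whole words, because the original lookahead also skips words that merely start with those letters (for example "ORANGE").

```java
import java.util.regex.Pattern;

// Standalone demo of the wildcard-appending step; no Lucene involved.
final class WildcardQuerySketch {

    // Pattern taken from the answer above.
    static final Pattern QUERY_WORD_PATTERN =
            Pattern.compile("(?<= |^)(?!AND|OR)(\\w+)(?= |$)");

    // Hypothetical variant: excludes AND/OR only when they stand alone.
    static final Pattern WHOLE_WORD_VARIANT =
            Pattern.compile("(?<= |^)(?!(?:AND|OR)(?= |$))(\\w+)(?= |$)");

    // Appends '*' to every word the given pattern accepts.
    static String addWildcards(Pattern pattern, String queryString) {
        return pattern.matcher(queryString).replaceAll("$1*");
    }

    // Builds the combined query string as in the answer: wildcards OR original.
    static String expand(String queryString) {
        return String.format("%s OR %s",
                addWildcards(QUERY_WORD_PATTERN, queryString), queryString);
    }

    public static void main(String[] args) {
        System.out.println(expand("key short"));
        System.out.println(addWildcards(QUERY_WORD_PATTERN, "ORANGE key"));
        System.out.println(addWildcards(WHOLE_WORD_VARIANT, "ORANGE key"));
    }
}
```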

Elasticsearch similar documents in Java

I'm building a website (an auction website) using Java. I have one page that shows the product being auctioned, and I want to show 10 similar products.
To perform the search I'm using Elasticsearch (through dadoonet's Elasticsearch Java implementation).
One requirement is to show only the 10 similar documents that have date > now.
I looked at the Elasticsearch documentation and found the "More like this" query, but first of all I can't get it to work using:
new MoreLikeThisRequest("auction").searchSize(size).id(productId + "").fields(new String[] { "name", "description", "brand" }).type("string");
It always shows this error:
org.elasticsearch.index.engine.DocumentMissingException: [_na][_na] [string][2]: document missing
I also can't find a way to filter by the date.
Can someone point me in the right direction?
Thanks
My best guess is that you have the wrong id, and I also see that you are missing the type. To use more like this, you have to provide the document to use. This is defined by the combination of index, type and id. If you do not specify the document correctly, Elasticsearch cannot find it, and that is most probably why you get the document missing message.
In Java I would do something like this:
FilteredQueryBuilder queryBuilder =
new FilteredQueryBuilder(
QueryBuilders.matchAllQuery(),
FilterBuilders.rangeFilter("datefield").lte("now")
);
SearchSourceBuilder query = SearchSourceBuilder.searchSource().query(queryBuilder);
client.prepareMoreLikeThis("index","type","id")
.setField("field1","field2")
.setSearchSource(query)
.execute().actionGet();
After struggling a little bit, I found someone with the same problem. His suggestion was to set min_term_freq to 1.
So the code now looks like this:
FilteredQueryBuilder queryBuilder = new FilteredQueryBuilder(QueryBuilders.matchAllQuery(), FilterBuilders.rangeFilter("finish_date").lt("now"));
SearchSourceBuilder query = SearchSourceBuilder.searchSource().query(queryBuilder);
SearchResponse response = esClient.prepareMoreLikeThis("auction", "product", productId + "").setField("name.name", "description", "brand").setPercentTermsToMatch(0.3f)
.setMinTermFreq(1).setSearchSource(query).execute().actionGet();
But I don't know what MinTermFreq does or whether 1 is the right value. Does anyone know what this field is?
Thanks for all the help!
Once again, thank you for all the help, and sorry for all the trouble!
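On the open question about MinTermFreq: as far as the Elasticsearch documentation for More Like This describes it, min_term_freq is the minimum number of times a term must occur in the source document before it is used for finding similar documents, with a default of 2. In short fields such as a product name most words occur only once, so with the default few or no terms would be selected, which would explain why setting it to 1 helps. The sketch below illustrates just that selection step in plain Java with a whitespace tokenizer; the names and the sample text are made up, and it is not the actual Elasticsearch implementation.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration of a minimum-term-frequency cutoff; not Elasticsearch code.
final class MinTermFreqSketch {

    // Keeps only the terms that occur at least minTermFreq times in the text.
    static Set<String> selectTerms(String text, int minTermFreq) {
        Map<String, Integer> frequencies = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        Set<String> selected = new LinkedHashSet<>();
        for (Map.Entry<String, Integer> entry : frequencies.entrySet()) {
            if (entry.getValue() >= minTermFreq) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String name = "red leather sofa"; // hypothetical product name
        System.out.println(selectTerms(name, 2)); // no word occurs twice
        System.out.println(selectTerms(name, 1)); // every word qualifies
    }
}
```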

How to get facet value for all fields using Java API

I'm new to Elasticsearch and tried to query some sample documents. I issued the following query using the Java API. The query returned the correct result: the names of all categories. Now I want the count for every category name. Could you explain how to do that? I'm sorry for my bad English.
SearchResponse sr = client.prepareSearch()
.addField("Category")
.setQuery(QueryBuilders.matchAllQuery())
.addFacet(FacetBuilders.termsFacet("f")
.field("Category"))
.execute()
.actionGet();
Look at the Count API to count your results if you do not want the result set (the matched documents) but only the count.
If you do want the result set, the response to every filter or query request also includes the result size.
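For the per-category counts asked about in the question, a terms facet like the one in the query is, as far as I know, meant to return each distinct field value together with the number of matching documents that carry it, so those counts should already be in the facet part of the response. The following plain-Java sketch only illustrates what that computation amounts to on made-up sample values; it does not call the Elasticsearch API, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration of per-value counting, as a terms facet would report it.
final class TermsFacetSketch {

    // Counts how many times each category value occurs.
    static Map<String, Long> countByCategory(List<String> categories) {
        // TreeMap keeps the keys sorted so the printed output is stable.
        Map<String, Long> counts = new TreeMap<>();
        for (String category : categories) {
            counts.merge(category, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical Category values of the matched documents.
        List<String> categories = Arrays.asList("books", "toys", "books");
        System.out.println(countByCategory(categories));
    }
}
```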

Lucene: Multiple words in a single term

Let's say I have a doc like
stringfield:123456
textfield:name website stackoverflow
and if I build a query in the following manner
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
QueryParser luceneQueryParser = new QueryParser(Version.LUCENE_42, "", analyzer);
Query luceneSearchQuery = luceneQueryParser.parse("textfield:\"name website\"");
it will return the doc as expected, but if I build my query using Lucene QueryAPI
PhraseQuery firstNameQuery = new PhraseQuery();
firstNameQuery.add(new Term("textfield","name website"));
it will not give me any result; I will have to tokenize "name website" and add each token to the PhraseQuery.
Is there any default way in the query API to tokenize the way the parser does when parsing a query string?
Sure, I can do that myself, but why reinvent the wheel if it's already implemented?
You are adding the entire query as a single term to your PhraseQuery. You are on the right track, but when tokenized, that will not be a single term, but rather two. That is, your index has the terms name, website, and stackoverflow, but your query has only one term, "name website", which matches none of those.
The correct way to use a PhraseQuery is to add each term to the PhraseQuery separately.
PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("textfield", "name"));
phrase.add(new Term("textfield", "website"));
When you call:
luceneQueryParser.parse("textfield:\"name website\"");
Lucene will tokenize the string "name website" and get 2 terms.
When you create:
new Term("textfield","name website")
Lucene will not tokenize the string "name website"; instead it uses the whole string as a single term.
So, for the result you described, the field textfield MUST be indexed and tokenized when you index the document.
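The "tokenize, then add each token as its own term" step can be sketched without Lucene as below. The tokenizer here is a deliberately simplified stand-in (lowercase, split on non-word characters) and only an assumption about what the analyzer would produce; in real code the tokens should come from the same Analyzer that was used at index time. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Splits a phrase into the individual terms a PhraseQuery would need.
final class PhraseTermsSketch {

    // Simplified stand-in for an analyzer: lowercase, split on non-word chars.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Each token would become its own Term on the PhraseQuery.
        for (String term : tokenize("name website")) {
            System.out.println("phrase.add(new Term(\"textfield\", \"" + term + "\"));");
        }
    }
}
```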
