Elasticsearch search query selection

I'd like to search for terms like "GoogleEarth" (or "googleearth") using Elasticsearch.
Right now, if I search for 'Google', I get no results unless I use NGram or EdgeNGram.
I don't want to use nGram because it returns too many results. So for now I just use a Bool Query + multi-match query, but then I can't get results for partial words.
I'd like searching for 'Google Earth', 'Google' or 'Earth' to return GoogleEarth. How can I do this?
Currently only the query 'GoogleEarth' returns the right result. I want to find terms whenever the search words are contained in them.
.setQuery(QueryBuilders.boolQuery().should(QueryBuilders.multiMatchQuery(query,
        "title", "name", "tag")))
Update:
I tried searching for terms based on exact match. If I search 'google', I want to get 'google***', 'googleearth' and so on. I know that with edgeNGram or nGram I may get less relevant results, so if possible I don't want to use nGram or edgeNGram.
Do you have any ideas?

I think you need to define a custom analyzer to tokenize words based on camel case - i.e. "GoogleEarth" needs to be tokenized into the parts "Google" and "Earth".
See the camelcase tokenizer section of http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer/
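To see what such an analyzer would produce, here is a plain-Java sketch (it runs without Elasticsearch) that splits text at camel-case boundaries and lowercases the pieces, the way a pattern tokenizer followed by a lowercase filter would. The regex is adapted from the camel-case example in the Elasticsearch pattern analyzer documentation; treat it as an assumption and check it against the docs for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class CamelCaseTokenizerSketch {
    // Split points for camel-case text, adapted from the camel-case example in the
    // Elasticsearch pattern analyzer docs. The regex you configure may differ.
    private static final Pattern CAMEL_CASE = Pattern.compile(
            "([^\\p{L}\\d]+)"                                // runs of characters that are neither letters nor digits
            + "|(?<=\\D)(?=\\d)"                             // non-digit followed by digit
            + "|(?<=\\d)(?=\\D)"                             // digit followed by non-digit
            + "|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})"        // lowercase followed by uppercase: "Google|Earth"
            + "|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])" // acronym before a capitalized word: "HTML|Parser"
    );

    /** Splits text at the camel-case boundaries above, then lowercases each token. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : CAMEL_CASE.split(text)) {
            if (!part.isEmpty()) { // a leading separator yields an empty first piece
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("GoogleEarth"));  // [google, earth]
        System.out.println(tokenize("Google Earth")); // [google, earth]
        System.out.println(tokenize("googleearth"));  // [googleearth] (no case change, so no split)
    }
}
```

In the index, the same regex would go into the `pattern` setting of a custom pattern analyzer used by `title`, `name` and `tag`, so the existing multi-match query could then match 'Google' or 'Earth' against "GoogleEarth". Note the last case: an all-lowercase "googleearth" has no case boundary, so camel-case splitting alone cannot help there.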

Related

Lucene get list of matched keywords

I have a Java (Lucene 4) based application and a set of keywords fed into the application as a search query (a keyword may contain more than one word, e.g. “memory”, “old house”, “European Union law”, etc.).
I need a way to get the list of matched keywords out of an indexed document and possibly also get keyword positions in the document (also for the multi-word keywords).
I tried the Lucene highlighter package, but I need only the keywords, without any surrounding text. It also returns multi-word keywords as separate fragments.
I would greatly appreciate any help.

There's a similar (possibly the same) question here:
Get matched terms from Lucene query
Did you see this?
The solution suggested there is to break a complex query down into simpler queries until you reach a TermQuery, and then check each one with searcher.explain(query, docId) (if it matches, you know that's the term).
I don't think it's very efficient, but it worked for me until I ran into SpanQueries. It might be enough for you.
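The answer's approach uses Lucene's `IndexSearcher.explain(query, docId)` on each leaf query. The sketch below is not that; it is a plain-Java illustration of the same "check each keyword as its own leaf" idea that also records token positions, which the question asks for. The tokenization (lowercase, split on anything that is not a letter or digit) is an assumption standing in for whatever analyzer the index actually uses, and a multi-word keyword counts as matched only where its tokens appear consecutively.

```java
import java.util.*;

public class MatchedKeywordsSketch {
    /** Lowercases and splits on non-letter/non-digit runs (stand-in for the real analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** For each keyword found in the document, the token positions where its occurrences start. */
    static Map<String, List<Integer>> matchedKeywords(String document, List<String> keywords) {
        List<String> doc = tokenize(document);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String keyword : keywords) {
            List<String> phrase = tokenize(keyword); // each keyword is checked on its own
            if (phrase.isEmpty()) continue;
            for (int start = 0; start + phrase.size() <= doc.size(); start++) {
                if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                    result.computeIfAbsent(keyword, k -> new ArrayList<>()).add(start);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "The old house reflects European Union law; the old house is full of memory.";
        List<String> keywords = Arrays.asList("memory", "old house", "European Union law", "new house");
        System.out.println(matchedKeywords(doc, keywords));
        // {memory=[13], old house=[1, 8], European Union law=[4]}
    }
}
```

Inside Lucene, the per-keyword check would instead be something like `searcher.explain(leafQuery, docId).isMatch()`, with positions coming from the index rather than from re-tokenizing the text.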

Get diacritic insensitive results from Realm database query

I'm having trouble with a simple query that gets strings from the Realm engine in Java for an Android app.
As the title says, I want diacritic-insensitive results from my query.
Example:
If the user types the word "securite", I want my query to return both "securite" and "sécurité".
How can I do that?
Thanks a lot in advance for your help!

Realm doesn't support that currently. Depending on how much of the data you control, you can add a "normalized" field and use it in your search instead. One approach is described here: Remove diacritics from string in Java
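A minimal sketch of such a normalization step, using the JDK's `java.text.Normalizer`: decompose accented characters (NFD), drop the combining marks, and lowercase. Lowercasing is an extra assumption so that case doesn't matter either.

```java
import java.text.Normalizer;
import java.util.Locale;

public class DiacriticsSketch {
    /** Decomposes accented characters (NFD), removes combining marks, then lowercases. */
    public static String normalize(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD); // "é" -> "e" + U+0301
        String stripped = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("s\u00e9curit\u00e9")); // "sécurité" -> securite
        System.out.println(normalize("securite"));           // securite
    }
}
```

When saving an object you would also store `normalize(title)` in an extra field and query that field with the normalized user input, e.g. `realm.where(Item.class).equalTo("titleNormalized", DiacriticsSketch.normalize(userInput)).findAll()`, where `Item` and `titleNormalized` are placeholder names. Letters such as "ø" or "æ" have no decomposition and pass through unchanged.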

This is not possible in Realm at the moment. Your only option is to maintain tables containing all the variants of each letter of the alphabet you are interested in, something like [a, á, à, å, etc.], and then for each string compute all the possible permutations and build a huge query with equalTo() and or(). It would probably take longer to build such a query than to execute it, but that's a very interesting use case! If you end up implementing it, I would love to know the results!
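To show how quickly this approach grows, here is a sketch that expands a word into every spelling allowed by a small, hypothetical variant table (only `a` and `e` are listed). Each resulting string would become one `equalTo()` branch joined with `or()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AccentVariantsSketch {
    // Hypothetical variant table; a real one would cover every letter you care about.
    static final Map<Character, String> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put('a', "a\u00e1\u00e0\u00e2\u00e5"); // a, á, à, â, å
        VARIANTS.put('e', "e\u00e9\u00e8\u00ea");       // e, é, è, ê
    }

    /** Every spelling of word obtained by replacing letters with their listed variants. */
    static List<String> variants(String word) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (char c : word.toCharArray()) {
            String options = VARIANTS.getOrDefault(c, String.valueOf(c)); // unlisted letters stay as-is
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (char option : options.toCharArray()) {
                    next.add(prefix + option);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> all = variants("securite");
        System.out.println(all.size()); // 16 (4 choices for each of the two e's)
    }
}
```

With k variants for each of n accentable letters, a word yields k^n strings, and therefore that many `equalTo()` clauses, which is why building the query can cost more than running it.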

Solr query is returning partial matches for one field, and not a different field

I have a search autocomplete on my site, and I'm using Solr to find matching documents. I am trying to get partial matches on page titles, so that, for example, Java* would match Java, Javascript, etc. Right now, the autocomplete is set up to give me partial matches on all of the text in the page, which gives some weird results, so I've decided to switch to using the page title. However, when I switch the searched field from text (the page text) to title, the query suddenly stops picking up partial matches. Here is an example of my original query:
q=text:java^2+text:"java"
&hl=true&hl.snippets=1&hl.fragsize=25&hl.fl=title&start=0&rows=3
Unfortunately, the guy who set this up for me does not work with me any more, so I have little idea what's going on 'under the hood'. I'm using Spring/J2EE for my backend, if that makes any difference.

You need to make sure that the field is not a string-based field. You can check this by looking at your schema.xml. If you search with Java* on a string field, it will only match titles that start with "Java".
Also be aware that wildcard queries are case sensitive (see this).
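A simplified model of both points in plain Java (not Solr's actual code): a string field stores the whole value as one term, while a typical text field is tokenized and lowercased (the exact analyzer chain is an assumption; check schema.xml). A trailing-wildcard query such as `Java*` is then a case-sensitive prefix test against those terms, as in Solr versions where wildcard terms are not analyzed.

```java
import java.util.*;

public class WildcardFieldSketch {
    /** A string field indexes the whole value as one untouched term. */
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    /** An assumed text-field chain: split on non-letters/non-digits, then lowercase. */
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) terms.add(t.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    /** A trailing wildcard like Java* matches any term starting with the prefix, case-sensitively. */
    static boolean prefixMatches(List<String> terms, String prefix) {
        for (String term : terms) {
            if (term.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(prefixMatches(stringFieldTerms("Learning Java"), "Java"));     // false: the term is "Learning Java"
        System.out.println(prefixMatches(stringFieldTerms("Javascript Basics"), "Java")); // true
        System.out.println(prefixMatches(textFieldTerms("Learning Java"), "Java"));       // false: indexed term is "java"
        System.out.println(prefixMatches(textFieldTerms("Learning Java"), "java"));       // true
    }
}
```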

It depends on how the title field is analyzed. Look at schema.xml to see what type the field is and how it is analyzed into terms. An easy way to check is the Solr admin analysis page at http://localhost:8983/solr/admin/analysis.jsp: select the name option, type in the field name (I'm guessing 'title'), enter some sample text and a query, and see which terms are created and matched.

Lucene : Use SpanTermQuery to get results for words with special characters

Is it possible to search in Lucene for words that contain non-letter characters, for example "word-processing" or "foo-bar"? They don't seem to be treated as a single term when I use SpanTermQuery. I get results using QueryParser, but not with SpanTermQuery. I'm wondering how this works. Any comments or ideas on how to make SpanTermQuery work for this?

I would recommend taking a look at how your field's tokenizers and analyzers are configured. Read the javadocs for the existing out-of-the-box tokenizers/analyzers to see if one of them meets your needs. If none does, it's pretty easy to extend them and create your own Tokenizer and/or Analyzer.
http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_write_my_own_Analyzer.3F
http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/analysis/Analyzer.html
http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/analysis/Tokenizer.html
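A simplified illustration in plain Java (not Lucene's classes) of why the tokenizer matters here: `SpanTermQuery` looks up exactly the term you give it without running an analyzer, whereas `QueryParser` analyzes the query text first. If the index-time tokenizer split "word-processing" into "word" and "processing", no indexed term equals "word-processing". The two tokenizer functions below are rough stand-ins assumed for illustration.

```java
import java.util.*;

public class SpanTermSketch {
    /** Stand-in for a tokenizer that breaks on punctuation: hyphens become token boundaries. */
    static List<String> splitOnNonAlphanumerics(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Stand-in for a whitespace tokenizer: hyphenated words stay intact. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Like a SpanTermQuery: the term is compared as-is, with no analysis. */
    static boolean termMatches(List<String> indexedTerms, String rawTerm) {
        return indexedTerms.contains(rawTerm);
    }

    public static void main(String[] args) {
        String doc = "word-processing software";
        System.out.println(termMatches(splitOnNonAlphanumerics(doc), "word-processing")); // false
        System.out.println(termMatches(splitOnWhitespace(doc), "word-processing"));       // true
    }
}
```

So one option is to index the field with a tokenizer that keeps hyphenated words whole; another is to run the query text through the same analyzer and combine the resulting terms, for example as `SpanTermQuery` clauses inside a `SpanNearQuery` with slop 0 and in-order matching.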

Google App Engine and SQL LIKE

Is there any way to query the GAE datastore with a filter similar to the SQL LIKE statement? For example, if a class has a string field and I want to find all objects whose string contains a specific keyword, how can I do that?
It looks like JDOQL's matches() doesn't work... Am I missing something?
Any comments, links or code fragments are welcome.

As the GAE/J docs say, BigTable has no native support for this. You can use JDOQL String.matches for "something%" (i.e. startsWith), but that's all there is; otherwise, evaluate the filter in memory.
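Why prefix matching is the one LIKE-style filter that works: in a sorted index, all values starting with a prefix form one contiguous range, so a startsWith filter is a range scan, while a "contains" match has no such range. Here is a plain-Java sketch of that idea using a `TreeSet` (not the datastore API), assuming values never contain the character U+FFFF.

```java
import java.util.*;

public class PrefixRangeSketch {
    /** All values that start with prefix, found as one contiguous range of the sorted set. */
    static SortedSet<String> startsWith(NavigableSet<String> sorted, String prefix) {
        // Lower bound: the prefix itself (inclusive). Upper bound: prefix + U+FFFF (exclusive),
        // which sorts after prefix + c + anything, for every character c below U+FFFF.
        return sorted.subSet(prefix, true, prefix + '\uffff', false);
    }

    public static void main(String[] args) {
        NavigableSet<String> values = new TreeSet<>(Arrays.asList(
                "apple", "application", "apply", "banana", "snapple"));
        System.out.println(startsWith(values, "appl")); // [apple, application, apply]
        // "snapple" contains "appl" but lies outside the range: a contains-match needs a full scan.
    }
}
```

On the datastore itself, the same idea is usually expressed as two inequality filters on the property: greater than or equal to the prefix, and less than the prefix followed by a high code point.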

If you have a lot of items to examine, you want to avoid loading them at all. The best way would probably be to break down the inputs at write time. If you are only searching by whole words, that is easy.
For example, "Hello world" becomes "Hello" and "world"; just add both to a multi-valued property. If you have a lot of text, you want to avoid loading the multi-valued property, because you only need it for the index lookup. You can do this by creating a "Relation Index Entity"; see Brett Slatkin's Google I/O talk for details.
You may also want to break the input down into 3-character, 4-character, etc. strings, or stem the words, perhaps with a Lucene stemmer.
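A plain-Java sketch of the write-time breakdown: `words` produces the whole-word list from the "Hello world" example, and `wordsAndPrefixes` is one possible reading of the "3 character, 4 character etc strings" suggestion, here as word prefixes starting at a hypothetical `minLength` of 3. Lowercasing is an added assumption so that matching is case-insensitive. The resulting set would be stored in a multi-valued (list) property, and an equality filter on that property with the lowercased search word then becomes an index lookup instead of an in-memory scan.

```java
import java.util.*;

public class WriteTimeKeywordsSketch {
    /** Whole words, lowercased, to store in a multi-valued (list) property at write time. */
    static SortedSet<String> words(String text) {
        SortedSet<String> result = new TreeSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) result.add(w);
        }
        return result;
    }

    /** Words plus each word's prefixes of at least minLength characters (hypothetical scheme). */
    static SortedSet<String> wordsAndPrefixes(String text, int minLength) {
        SortedSet<String> result = new TreeSet<>();
        for (String w : words(text)) {
            // Words shorter than minLength are still added whole.
            for (int end = Math.min(minLength, w.length()); end <= w.length(); end++) {
                result.add(w.substring(0, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world"));               // [hello, world]
        System.out.println(wordsAndPrefixes("Hello world", 3)); // [hel, hell, hello, wor, worl, world]
    }
}
```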
