Get diacritic insensitive results from Realm database query - java

I'm having trouble with a simple query to get strings from Realm in Java for an Android app.
As the title says, I want my query to return diacritic-insensitive results.
Example:
If the user types the word "securite", I want my query to return both "securite" and "sécurité".
How can I do that?
Thanks a lot in advance for your help!

Realm doesn't support that currently. Depending on how much of the data you control, you can add a "normalized" field and use it in your search. There is an approach described here: Remove diacritics from string in Java
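A minimal sketch of how such a normalized value could be computed, using only the JDK's `java.text.Normalizer`. The class name `DiacriticNormalizer` is hypothetical, and the lowercasing step is an extra assumption (it makes the match case-insensitive as well). The idea is to store the result in the extra field when the object is saved, and to run the same function on the user's input before querying that field.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Realm.
final class DiacriticNormalizer {
    private DiacriticNormalizer() {}

    // Decompose accented characters (NFD) so that "é" becomes "e" plus a
    // combining accent, drop the combining marks, then lowercase
    // (lowercasing is an assumption added for case-insensitive matching).
    static String normalize(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "s\u00e9curit\u00e9" is "sécurité" written with Unicode escapes.
        System.out.println(normalize("s\u00e9curit\u00e9")); // prints "securite"
    }
}
```

With this in place, both "securite" and "sécurité" end up stored as "securite" in the normalized field, so an ordinary equality or contains query on that field finds both.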

This is not possible in Realm at the moment. Your only option is to maintain tables containing all the variants of each letter of the alphabet you are interested in, something like [a, á, à, å, etc.], then compute all the possible permutations for each string and build a huge query with equalTo() and or(). It would probably take longer to build such a query than to execute it, but that's a very interesting use case! If you end up implementing it, I would love to know the results!
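To get a feel for how quickly that permutation approach grows, here is a rough sketch of the expansion step only. The class name `VariantExpander` and the tiny variant table are hypothetical, and the step that would feed each variant into equalTo() and or() is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands a word into every accent variant.
final class VariantExpander {
    // Deliberately tiny, assumed variant table; a real one would cover
    // every letter you are interested in.
    private static final Map<Character, String> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put('a', "a\u00e0\u00e1\u00e2"); // a, à, á, â
        VARIANTS.put('e', "e\u00e8\u00e9\u00ea"); // e, è, é, ê
    }

    private VariantExpander() {}

    // Builds the cartesian product of the variants of each character.
    static List<String> expand(String word) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (char c : word.toCharArray()) {
            String options = VARIANTS.getOrDefault(c, String.valueOf(c));
            List<String> next = new ArrayList<>(results.size() * options.length());
            for (String prefix : results) {
                for (char option : options.toCharArray()) {
                    next.add(prefix + option);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Two 'e' characters with four variants each: 4 * 4 candidates.
        System.out.println(expand("securite").size()); // prints 16
    }
}
```

Even with this small table, "securite" already produces 16 candidate strings, and every additional letter with variants multiplies that number, which is why the query is likely to take longer to build than to run.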

Related

Lucene get list of matched keywords

I have a Java (Lucene 4) based application and a set of keywords fed into the application as a search query (a term may consist of more than one word, e.g. “memory”, “old house”, “European Union law”, etc.).
I need a way to get the list of matched keywords out of an indexed document and possibly also get the keyword positions in the document (for the multi-word keywords too).
I tried the Lucene highlight package, but I need to get only the keywords without any surrounding text. It also returns multi-word keywords as separate fragments.
I would greatly appreciate any help.
There's a similar (possibly identical) question here:
Get matched terms from Lucene query
Did you see it?
The solution suggested there is to break a complicated query down into simpler queries until you reach a TermQuery, and then check each one via searcher.explain(query, docId) (if it matches, you know that term matched).
I don't think it's very efficient, but it worked for me until I ran into SpanQueries. It might be enough for you.

Exact Phrase Match in Solr with single/multi words for text field

I have a big problem and some questions about Solr behaviour. Could you please help me solve this?
Sorry that my question is so long.
My client has the following requirement.
We need matchall and matchallpartial scenarios.
Depending on the search field, we do matchall or matchallpartial at the application level.
We also have wildcards: left, right, or both sides can be wildcard entries.
I used the keyword tokenizer for indexing as well as querying, and it satisfies my requirement in all scenarios. But synonyms, stopwords and stemming don't work at all, because the keyword tokenizer builds queries from the whole phrase. I tried StandardTokenizerFactory; it fails only in the matchall scenario, and the rest works fine.
Could you please post some example queries and suggestions for getting exact matches with single-word and multi-word values?
E.g.
If my field contains
"Indicators Indicator Components", that is the whole phrase. I get results even when I search for "indicator", and I don't want that.
If I use the keyword tokenizer I get what I want, but it fails in the synonyms and stopwords scenarios.
Sometimes (depending on the logic) I will use the same text field for the matchallpartial scenario, and then I do want results for "indicator". How can I get an exact matchall for the whole phrase/word using the standard tokenizer?
Please help me.
Thanks,
Sri
I am listing three examples that should help you get an exact match.
My first query is /select?q=name:anand kishore. This returns 1000 records whose name contains anand or kishore or both.
My second query is /select?q=name:"anand kishore". This returns 60 records whose name contains the phrase anand kishore (e.g. anand kishore tripathy, kamal anand kishore).
My third query is /select?q=name:"kamal anand kishore". This returns only one result, the one that matches exactly, i.e. kamal anand kishore.

Ignore punctuation in query in Sqlite

I'm using Sqlite with Android (Java).
I have a database that contains texts with Hebrew punctuation.
My problem is that when I do a SELECT for a certain value (without punctuation), I don't get all the results. I guess the DB does not ignore the punctuation in punctuated records and treats the punctuation marks as normal characters.
After searching, I found some answers saying I should register a collation for it (sqlite3_create_collation).
As I've never used collations, I would appreciate a hint on how to register one and use it to get the full result I want.
For example:
SELECT * FROM sometable WHERE punctuated_field LIKE '%re%'
I would like to get both the following:
dream
drém
Currently I'm getting just:
dream
I read this relevant answer but didn't manage to understand how to implement it in my query or Java code.
I would be happy if someone could write the full query I need to put in my code.
Thanks in advance!
The Android API does not allow registering custom collations.
You have to make do with the built-in collations, or with Android's LOCALIZED and UNICODE collations.
Since the Android sqlite API doesn't expose anything to set up custom collations, you'll have to figure some other way to solve the problem.
One is to add another column where you store the strings normalized, i.e. with accent marks ("punctuation" as you call it) removed. Then do your LIKE matching on this normalized column and use the original column for display purposes. The cost of this is larger data size and some extra code when inserting into the database.
I've described one such normalization approach here:
How to ignore accent in SQLite query (Android) - I have no idea how well that works with Hebrew chars though.
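As a rough sketch of what could go into that extra column: Hebrew vowel points and cantillation marks are encoded as Unicode non-spacing marks, so removing the `Mn` category after decomposition should drop them along with Latin accents. The class name `PointStripper` is hypothetical, and this is only an illustration under that assumption, not a tested recipe for every Hebrew text. Punctuation characters such as maqaf or sof pasuq are not marks and are left untouched.

```java
import java.text.Normalizer;

// Hypothetical helper for filling the normalized column.
final class PointStripper {
    private PointStripper() {}

    // Decompose (NFD), then remove non-spacing marks: Latin accents as
    // well as Hebrew points, which are also category Mn.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}+", "");
    }

    public static void main(String[] args) {
        // "dr\u00e9m" is "drém"; the stored value becomes "drem",
        // which a LIKE '%re%' on the normalized column can match.
        System.out.println(stripMarks("dr\u00e9m")); // prints "drem"
    }
}
```

The value would be computed in Java when the row is inserted or updated, and the search term would go through the same function before being placed in the LIKE pattern.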

Elasticsearch search query selection

I'd like to search for terms (GoogleEarth or googleearch) using Elasticsearch.
Right now, if I search for 'Google', I cannot get any results without NGram or EdgeNGram.
I don't want to use nGram because it returns too many results. So for now I just use a Bool Query + MultiMatchQuery. In this case, I cannot get results for partial words.
I'd like to be able to search for 'Google Earth', 'Google' or 'Earth' and get GoogleEarth. How can I do this?
Currently only the query 'GoogleEarth' gets the right result. I want terms to match whenever they are contained in the indexed value.
.setQuery(QueryBuilders.boolQuery().should(QueryBuilders.multiMatchQuery(query,
"title", "name", "tag")))
update
I tried to search for terms based on exact match. If I search for 'google', I want to get 'google***', 'googleearth' and so on. I know that if I use edgeNGram or nGram, I may get less relevant results. So, if possible, I don't want to use nGram or edgeNGram.
Do you have any ideas?
I think you need to define a custom analyzer to tokenize words based on camel case - i.e. "GoogleEarth" needs to be tokenized into the parts "Google" and "Earth".
See the camelcase tokenizer section of http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer/
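To illustrate the kind of splitting such an analyzer performs, here is a simplified sketch in plain Java. The class name `CamelCaseSplitter` and the regular expression are assumptions made for this example; the pattern in the linked documentation is more elaborate, and this is not Elasticsearch code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing camel-case tokenization outside Elasticsearch.
final class CamelCaseSplitter {
    // Zero-width split point: a lowercase letter or digit on the left
    // and an uppercase letter on the right.
    private static final String BOUNDARY = "(?<=[a-z0-9])(?=[A-Z])";

    private CamelCaseSplitter() {}

    static List<String> split(String term) {
        return Arrays.asList(term.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("GoogleEarth")); // prints [Google, Earth]
        System.out.println(split("googleearth")); // prints [googleearth]
    }
}
```

Note the second case: an all-lowercase value such as "googleearth" has no case boundary to split on, so camel-case tokenization alone would not make 'google' match it.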

Google App Engine and SQL LIKE

Is there any way to query GAE datastore with filter similar to SQL LIKE statement? For example, if a class has a string field, and I want to find all classes that have some specific keyword in that string, how can I do that?
It looks like JDOQL's matches() doesn't work... Am I missing something?
Any comments, links or code fragments are welcome.
As the GAE/J docs say, BigTable has no native support for this. You can use JDOQL String.matches for "something%" (i.e. startsWith). That's all there is. Otherwise, evaluate it in memory.
If you have a lot of items to examine, you want to avoid loading them at all. The best way would probably be to break down the inputs at write time. If you are only searching by whole words, that is easy.
For example, "Hello world" becomes "Hello", "world"; just add both to a multi-valued property. If you have a lot of text, you want to avoid loading the multi-valued property, because you only need it for the index lookup. You can do this by creating a "Relation Index Entity"; see Brett Slatkin's Google I/O talk for details.
You may also want to break down the input into 3-character, 4-character, etc. strings or stem the words, perhaps with a Lucene stemmer.
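A small sketch of the write-time breakdown described above, in plain Java with no datastore calls. The class and method names (`SearchTokens`, `words`, `ngrams`) are hypothetical, and how the resulting values are stored in a multi-valued property is left to the persistence layer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper that prepares index values at write time.
final class SearchTokens {
    private SearchTokens() {}

    // Splits text on whitespace into distinct words, keeping their order.
    static Set<String> words(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns every substring of length n, in order of appearance.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello world")); // prints [Hello, world]
        System.out.println(ngrams("world", 3));   // prints [wor, orl, rld]
    }
}
```

A keyword search then becomes an equality filter on the stored values instead of a LIKE-style scan, at the cost of more index entries per entity.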
