Hibernate Search Lucene query parser with Special Characters - java

FIRST QUESTION:
Can somebody explain to me how the Lucene query in Hibernate Search handles special characters? I read the Hibernate Search documentation and the Lucene regexp syntax, but they don't match the generated queries and results I am seeing.
Let's assume I have the following database entries:
name
Will
Will Smith
Will - Smith
Will-Smith
and I am using the following query:
Query query = queryBuilder
    .keyword()
    .onField("firstName")
    .matching(input)
    .createQuery();
Now I am looking for the following input:
Will -> returns all 4 entries, with the following generated query: FullTextQueryImpl(firstName:will)
Will Smith -> also returns all 4 entries with the following generated query: FullTextQueryImpl(firstName:will firstName:smith)
Will - Smith -> also returns all 4 entries, with the following generated query: FullTextQueryImpl(firstName:will firstName:smith). Where did the "-" go? Shouldn't it exclude everything after the "-" according to the Lucene query syntax?
Will-Smith -> same here
Will\-Smith -> here I tried to escape the hyphen with a backslash, but got the same result
Will -Smith -> same here
SECOND QUESTION: Let's assume I have the following database entries, where the entry without a numerical suffix always exists and the ones with a numerical suffix may or may not be in the database.
What would a Lucene query for this look like?
name
Will
Will1
Will2

You can play around with Lucene analyzers and see what happens behind the scenes. Here is a tutorial: https://www.baeldung.com/lucene-analyzers
The tokenizer is pluggable, so you can change how special characters are treated.
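To make the effect of the analyzer concrete: the keyword() query appears to run the input through the field's analyzer rather than through the Lucene query parser, which would explain why "-" is never treated as an operator and simply vanishes from the generated queries in the question. The sketch below is plain Java, not Lucene code. It only approximates a standard-style analyzer by lowercasing the input and splitting on anything that is not a letter or digit. The names AnalyzerSketch and analyze are hypothetical, and the assumption is that firstName uses a default, standard-style analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class AnalyzerSketch {

    // Rough approximation of a standard-style analyzer (an assumption, not Lucene's
    // actual implementation): lowercase, then split on non-letter/non-digit characters.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The inputs from the question, including the backslash-escaped variant.
        String[] inputs = {"Will", "Will Smith", "Will - Smith", "Will-Smith", "Will\\-Smith", "Will -Smith"};
        for (String input : inputs) {
            System.out.println(input + " -> " + analyze(input));
        }
    }
}
```

Under this approximation every hyphenated variant yields the same two tokens, will and smith, which is consistent with the identical generated queries. A value such as Will1 stays a single token, so it would not be matched by the token will alone.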

Related

Elasticsearch query_string breaks with case insensitive search

I have text messages like "#Someone please note%"
and I need to get search results when searching for #someone, #Someone, #some, or #someone note.
I also need results when searching for special characters like % or #.
I'm using Elasticsearch's BoolQueryBuilder with query_string for fetching.
Is there any way to make this search possible, or should I use another approach such as a wildcard or match query?
The messages are stored in documents in my index.
The key name is message,
and it can contain any type of text.
Suppose there is a message: Hello##$ ther5# #Someone
I need to get a match when searching for any of the keywords below:
#someone, #Some, #some, ##, ther5#, #Someone
I'm using a Java backend with BoolQueryBuilder.

In AEM query builder order by based on someproperty=somevalue

I'm trying to sort AEM QueryBuilder search results based on a particular value of a particular property. In a database like MySQL we can sort based on a column's value (for example, ORDER BY FIELD('columnName','anyColumnName')). Can we do something like this in AEM?
Suppose we have 5 Assets under path /content/dam/Assets.
Asset Name------------dc:title
1.jpg------------------Apple
2.jpg------------------Cat
3.jpg------------------Cat
4.jpg------------------Ball
5.jpg------------------Drag
I need the assets where dc:title = Cat at the top of the results, followed by the remaining results sorted in ascending order. The expected result is given below:
2.jpg------------------Cat
3.jpg------------------Cat
1.jpg------------------Apple
4.jpg------------------Ball
5.jpg------------------Drag
Note:- Using version AEM 6.2
You can use the orderby predicate with a value of @jcr:content/metadata/dc:title to sort by dc:title with the QueryBuilder. /libs/cq/search/content/querydebug.html is an interface to test queries on your instance. ACS Commons has a good breakdown of all out-of-the-box predicates.
If you want to pull Cats to the top of the results with a single query, you could write a custom predicate. The sample code from ACS Commons shows an example. Adobe has documentation as well.
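If a custom predicate is more than you need, a different technique is to sort by dc:title with the query and then move the Cat entries to the front in memory. The sketch below shows only that second step in plain Java and uses no AEM API. Asset, PinnedTitleSort and sortWithPinnedTitle are hypothetical names, and the assumption is that the result set is small enough to hold in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class PinnedTitleSort {

    // Hypothetical holder for an asset name and its dc:title value.
    static final class Asset {
        final String name;
        final String title;

        Asset(String name, String title) {
            this.name = name;
            this.title = title;
        }
    }

    // Returns a sorted copy: assets whose title equals pinnedTitle come first,
    // then everything else by title ascending, with the asset name as tie-breaker.
    static List<Asset> sortWithPinnedTitle(List<Asset> assets, String pinnedTitle) {
        List<Asset> sorted = new ArrayList<>(assets);
        sorted.sort(Comparator
                // false sorts before true, so pinned titles end up on top
                .comparing((Asset a) -> !a.title.equalsIgnoreCase(pinnedTitle))
                .thenComparing(a -> a.title, String.CASE_INSENSITIVE_ORDER)
                .thenComparing(a -> a.name));
        return sorted;
    }

    public static void main(String[] args) {
        List<Asset> assets = Arrays.asList(
                new Asset("1.jpg", "Apple"),
                new Asset("2.jpg", "Cat"),
                new Asset("3.jpg", "Cat"),
                new Asset("4.jpg", "Ball"),
                new Asset("5.jpg", "Drag"));
        for (Asset a : sortWithPinnedTitle(assets, "Cat")) {
            System.out.println(a.name + " " + a.title);
        }
    }
}
```

With the five assets from the question this produces the order 2.jpg, 3.jpg, 1.jpg, 4.jpg, 5.jpg. The trade-off is that pagination then has to happen after the in-memory sort, not in the query.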

How to use Regex keyword in Spring Data Repository Method

I am currently using spring-data-jpa version 1.9.4.
I have a MySql table with columns project(integer), summary(varchar), and description(varchar).
I have a regex that I would like to use to search the summary and/or description field, meaning that if it matches in summary, the regex does not need to be applied to description.
The repository method I am attempting to use is:
List<Issue> findByProjectAndSummaryOrDescriptionRegex(long project, String regex)
The error I am receiving is:
java.lang.IllegalArgumentException: Unsupported keyword REGEX (1):
[MatchesRegex, Matches, Regex]
It is difficult in my company environment to update/upgrade versions, so if the issue is NOT my syntax but rather the version, then I would be grateful if someone knows which version supports 'Regex' for query derivation or where I could find that specific information. I have looked at the changelog and it appears that 1.9.4 should support it, but apparently it does not.
Thanks for your help!
JD
EDIT 1: I am aware of the @Query annotation but have been asked by my lead to only use that as a last resort if I cannot find the correct version which supports keyword REGEX [MatchesRegex, Matches, Regex]
I would recommend using a native query (with the @Query annotation) if the Spring Data syntax does not work, e.g.:
@Query(nativeQuery=true, value="SELECT * FROM table WHERE project = ?1 AND (summary regexp ?2 OR description regexp ?2)")
List<Issue> findByProjectAndSummaryOrDescription(long project, String regex);
Update
If a native query is not an option, then (a) could you try it with a single column and see if that works, and (b) could you try appending Regex to both columns, e.g.:
List<Issue> findByProjectAndDescriptionRegex(long project, String regex);
List<Issue> findByProjectAndSummaryRegexOrDescriptionRegex(long project, String summaryRegex, String descriptionRegex);
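One thing to check with the derived method names: as far as I know, And binds more tightly than Or in derived queries, so the last method would group as (project AND summary) OR description, not as the parenthesized condition in the native query above. The plain-Java sketch below spells out the intended grouping, project = ? AND (summary matches OR description matches). It is not Spring Data code. IssueFilterSketch and matches are hypothetical names, and Matcher.find() is used only as an approximation of MySQL's partial-match REGEXP.

```java
import java.util.regex.Pattern;

class IssueFilterSketch {

    // Partial match, roughly like MySQL's REGEXP; a null column never matches.
    private static boolean found(Pattern pattern, String text) {
        return text != null && pattern.matcher(text).find();
    }

    // Intended condition: project = ? AND (summary REGEXP ? OR description REGEXP ?).
    // The || short-circuits, so description is only checked if summary does not match.
    static boolean matches(long issueProject, String summary, String description,
                           long project, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return issueProject == project
                && (found(pattern, summary) || found(pattern, description));
    }

    public static void main(String[] args) {
        System.out.println(matches(1, "NPE in parser", "stack trace attached", 1, "NPE|OOM"));
        System.out.println(matches(2, "NPE in parser", "stack trace attached", 1, "NPE|OOM"));
    }
}
```

The second call is rejected because the project differs, even though the summary matches. That is the behaviour the parentheses in the native query guarantee.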
In a follow-up, I discovered by doing some digging that the authoritative list resides in the org.springframework.data.jpa.repository.query.JpaQueryCreator class. So for future folks who want to know which keywords from the 'documented' list are ACTUALLY implemented: look inside JpaQueryCreator and you will see the supported keywords as case labels inside a switch!
Hope this helps!
PS - as suspected, REGEX was not supported in my version
Try using @Query with the parameter nativeQuery = true. Inside it you can use the database's regexp_like function:
@Query(value = "select t.* from TABLE_NAME t where regexp_like(t.column, ?1)", nativeQuery = true)
Documentation :
https://www.techonthenet.com/oracle/regexp_like.php

Neo4j slow cypher query in embedded mode

I have a huge graph database with authors, which are connected to papers, and the papers are connected to nodes which contain meta information about the paper.
I tried to select authors which match a specific pattern, and therefore I executed the following Cypher statement in Java.
String query = "MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n";
db.execute(query);
I get a result set with all matching "authors" back, but the execution is very slow. Is that because Neo4j writes the result into memory?
If I try to find nodes with the Java API, it is much faster. Of course, I am only able to search for the exact name, as in the following code example, but it is about 4 seconds faster than the query above. I tested it on a small database with about 50 nodes, of which only 6 are authors. The six authors are also in the index.
db.findNodes(NodeLabel.AUTHOR, NodeProperties.NAME, "jim knopf" );
Is there a way to speed up the Cypher query? Or a possibility to get all nodes that match a given pattern via the Java API and the findNodes() method?
Just for information, I created the index for the name of the author in java with graph.schema().indexFor(NodeLabel.AUTHOR).on("name").create();
Perhaps somebody could help. Thanks in advance.
EDIT:
I ran some tests today. If I execute the query PROFILE MATCH (n:AUTHOR) WHERE n.name = 'jim seroka' RETURN n; in the browser interface, I see only the operator NodeByLabelScan. It seems to me that Neo4j does not automatically use the index (the index for name is online). If I specify the index explicitly and execute the query PROFILE MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n; the index is used. Normally Neo4j should pick the correct index automatically. Is there any configuration to set?
I also did some testing in embedded mode again, to check the performance of the query there. I tried to select the author "jim seroka" with db.findNode(NodeLabel.AUTHOR, "name", "jim seroka");. It works, and it seems to me that the index is used, because the execution time is ~0.05 seconds.
But if I run the same query that I executed in the browser interface, with the explicit index hint, it takes ~4.9 seconds. Why? I'm a little bit helpless. The database is local and there are only 6 authors. Is the connector slow, or is the way I create the connection wrong? OK, findNode() returns just a node while execute() returns a whole Result, but a four-second difference?
The following source code should show how the database will be created and the query is executed.
public static GraphDatabaseService getNeo4jDB() {
    ....
    return new GraphDatabaseFactory().newEmbeddedDatabase(STORE_DIR);
}
private Result findAuthorNode(String searchValue) {
    db = getNeo4jDB();
    String query = "MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n";
    return db.execute(query);
}
Your query uses a regular expression and is therefore not able to use an index:
MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n
Neo4j 2.3 introduced the index-backed STARTS WITH string operator, so this query would perform much better:
MATCH (n:AUTHOR) WHERE n.name STARTS WITH 'jim' RETURN n
It is not quite the same as the regular expression (it is case-sensitive and only matches a prefix), but it will have better performance.
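To see how the two predicates differ in what they match, here is a small plain-Java comparison. It runs no Cypher. It mirrors the regular expression from the question with java.util.regex (a full-string match, as Cypher's =~ does) and models STARTS WITH 'jim' as a case-sensitive String.startsWith. NameMatchSketch and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class NameMatchSketch {

    // Same pattern as in the Cypher query: case-insensitive, "jim" anywhere in the name.
    private static final Pattern CONTAINS_JIM = Pattern.compile("(?i).*jim.*");

    // Models: n.name =~ '(?i).*jim.*'
    static boolean regexMatch(String name) {
        return CONTAINS_JIM.matcher(name).matches();
    }

    // Models: n.name STARTS WITH 'jim' (case-sensitive prefix match)
    static boolean startsWithMatch(String name) {
        return name.startsWith("jim");
    }

    public static void main(String[] args) {
        for (String name : new String[] {"jim knopf", "Jim Seroka", "tiny jim"}) {
            System.out.println(name + ": regex=" + regexMatch(name)
                    + ", startsWith=" + startsWithMatch(name));
        }
    }
}
```

Only "jim knopf" satisfies both predicates. "Jim Seroka" fails the prefix check because of the capital J, and "tiny jim" fails because the match is not at the start. If names are stored in mixed case, a lowercased copy of the name in a separate indexed property is one way to keep the prefix search usable.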

AND OR search syntax in lucene

I am using Lucene. I have three columns which are
DocId - TermID - TermFrequency
1 - 004 - 667
2 - 005 - 558
If I use MySQL, the query for an AND operation is
SELECT * FROM table_name WHERE DocId='1' AND TermID='004'
How can I write the above query in Lucene using Java? For a single-column search, the code I am using is
Query query = new QueryParser(Version.LUCENE_35,"TermID", analyzer).parse("004");
How can I use the AND operation in QueryParser?
Terms can be grouped with the AND keyword like so:
Query query = new QueryParser(Version.LUCENE_35,"TermID", analyzer).parse("004 AND DocId:1");
Note that you don't need to qualify the field for your "004" term because you've set "TermID" as the default field.
You should read the manual on the query syntax; it's pretty expressive.
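If the field/value pairs are only known at runtime, the query string can be assembled before it is handed to QueryParser.parse. The helper below is a minimal plain-Java sketch that only builds the string. AndQuerySketch and allOf are hypothetical names, and the sketch assumes the values contain no characters that are special in the query syntax (otherwise they would need escaping first, for example with QueryParser.escape).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class AndQuerySketch {

    // Joins field:value clauses with AND, in the iteration order of the map.
    // Assumes values are free of query-syntax special characters.
    static String allOf(Map<String, String> fieldValues) {
        StringJoiner joiner = new StringJoiner(" AND ");
        for (Map.Entry<String, String> entry : fieldValues.entrySet()) {
            joiner.add(entry.getKey() + ":" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> clauses = new LinkedHashMap<>();
        clauses.put("DocId", "1");
        clauses.put("TermID", "004");
        // The resulting string would be passed to QueryParser.parse(...)
        System.out.println(allOf(clauses));
    }
}
```

Every clause is field-qualified here, so the result does not depend on which default field the QueryParser was created with.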
