Hibernate search - handling null in boolean query


Is there a best practice for handling optional sub-queries? So say my search service has
query = builder.bool().must(createQuery(field1, term1)).must(createQuery(field2, term2)).createQuery();
createQuery(field, term) {
if(term != null) {
return builder.keyword().onField(field).matching(term).createQuery();
}
return null;
}
With the default QueryBuilder, if I use a query like this and a term is null, the resulting query is "+term1 +null" or something along those lines, which causes a NullPointerException when the query is executed against the index. Is there a recommended way to avoid this issue? I was thinking about a custom QueryBuilder, but I'm not sure how to tell the full-text session to use my implementation rather than its default. The only other way I can think of is something like
query;
query1 = createQuery(field1, term1);
query2 = createQuery(field2, term2);
if(query1 != null && query2 != null) {
query = builder.bool().must(query1).must(query2).createQuery();
} else if(query1 != null && query2 == null) {
query = query1;
} else if(query1 == null && query2 != null) {
query = query2;
}
createQuery(field, term) {
if(term != null) {
return builder.keyword().onField(field).matching(term).createQuery();
}
return null;
}
But this gets really messy really fast if there are more than a handful of sub-queries.

What you might do is introduce a method whose sole purpose is to add a "must" clause in a null-safe way, i.e. do something like this:
BooleanJunction junction = builder.bool();
must(junction, createQuery(field1, term1));
must(junction, createQuery(field2, term2));
query = junction.createQuery();
void must(BooleanJunction junction, Query query) {
if (query != null) {
junction.must(query);
}
}
Query createQuery(String field, Object term) {
if(term != null) {
return builder.keyword().onField(field).matching(term).createQuery();
}
return null;
}
This takes away some of the fluency of the BooleanJunction API, but since it only happens at the top level, I guess it's not so bad.
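The same null-safe idea extends to any number of optional sub-queries if the field/term pairs are kept in a map and the null results are skipped in a loop. The following is a minimal sketch using only the JDK. The class OptionalClauses and its string clauses are hypothetical stand-ins for the query-building code and for Lucene Query objects; they are not part of Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OptionalClauses {
    // Hypothetical stand-in for createQuery(field, term):
    // returns a string instead of a Lucene Query, or null when the term is missing.
    static String createClause(String field, String term) {
        return term != null ? "+" + field + ":" + term : null;
    }

    // Collects only the clauses that were actually created, in map iteration order.
    static List<String> collect(Map<String, String> termsByField) {
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> entry : termsByField.entrySet()) {
            String clause = createClause(entry.getKey(), entry.getValue());
            if (clause != null) {
                // With Hibernate Search, this is where junction.must(query) would go.
                clauses.add(clause);
            }
        }
        return clauses;
    }
}
```

A LinkedHashMap keeps the fields in insertion order and accepts null values. The case where every term is null yields an empty list, and the caller still has to decide what query to run then.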

What about this?
org.json.JSONObject json = new org.json.JSONObject();
json.put(field1, term1);
json.put(field2, term2);
...
bool = builder.bool();
for (Iterator keys = json.keys(); keys.hasNext();) {
String field = (String) keys.next();
String term = (String) json.get(field);
q = createQuery(field, term);
if (q != null) {
bool.must(q);
}
}
query = bool.createQuery();
If you have duplicate fields with different terms, you must use this:
org.json.JSONObject json = new org.json.JSONObject();
json.append(field1, term1);
json.append(field2, term2);
...
bool = builder.bool();
for (Iterator keys = json.keys(); keys.hasNext();) {
String field = (String) keys.next();
JSONArray terms = (JSONArray) json.get(field);
for (int i = 0; i < terms.length(); i++) {
String term = (String) terms.get(i);
q = createQuery(field, term);
if (q != null) {
bool.must(q);
}
}
}
query = bool.createQuery();

Related

Lucene: FastVectorHighlighter returns null

Here's what I did:
String textField1 = fastVectorHighlighter.getBestFragment(fastVectorHighlighter.getFieldQuery(query), indexReader, docId, SearchItem.FIELD_TEXT_FIELD1, DEFAULT_FRAGMENT_LENGTH);
Here's the query:
((FIELD_TEXT_FIELD1:十五*)^4.0) (FIELD_TEXT_FIELD3:十五*)
The original text is correct (indexReader.document(docId).get(SearchItem.FIELD_TEXT_FIELD3) returns the expected value) and definitely contains the characters in the query.
Here's how I index textField1:
Field textField1 = new TextField(SearchItem.FIELD_TEXT_FIELD1, "", Field.Store.YES);

Problem solved!
It turns out, I need to change
fastVectorHighlighter.getFieldQuery(query)
to
fastVectorHighlighter.getFieldQuery(query, indexReader)
Following the code into FieldQuery#flatten, we find that Lucene doesn't handle PrefixQuery the normal way:
} else if (sourceQuery instanceof CustomScoreQuery) {
final Query q = ((CustomScoreQuery) sourceQuery).getSubQuery();
if (q != null) {
flatten( applyParentBoost( q, sourceQuery ), reader, flatQueries);
}
} else if (reader != null) { // <<====== Here it is!
Query query = sourceQuery;
if (sourceQuery instanceof MultiTermQuery) {
MultiTermQuery copy = (MultiTermQuery) sourceQuery.clone();
copy.setRewriteMethod(new MultiTermQuery.TopTermsScoringBooleanQueryRewrite(MAX_MTQ_TERMS));
query = copy;
}
Query rewritten = query.rewrite(reader);
if (rewritten != query) {
// only rewrite once and then flatten again - the rewritten query could have a speacial treatment
// if this method is overwritten in a subclass.
flatten(rewritten, reader, flatQueries);
}
We can see that it needs an IndexReader to rewrite a PrefixQuery, FuzzyQuery, or any other MultiTermQuery.

how to improve code quality (mostly duplicates)

I have a set of variables that are passed into a mega method in ancient legacy code:
public List<type> check (String required, String sales, String report,
Long passId, Long seatId, String capName, String vCapName,
String attName, Long vid) {
if(required != null) {
goodA = method(required);
goodB = methodTwo(required);
goodC = methodThree(required);
}
if(sales != null) {
goodA = method(sales);
goodB = methodTwo(sales);
goodC = methodThree(sales);
}
if(report != null) {
goodA = method(report);
goodB = methodTwo(report);
goodC = methodThree(report);
if(passId != null)
... you got the point....
}
Only one of the variables passed into check can hold a valid value; all the others will be null.
For example
check("Yes",null,null,null,null,null...)
or
check(null,null,null,13212L,null,null,null,null)
Right now I am trying to rewrite this into something cleaner and less repetitive. Can anyone provide some ideas on how to do this?

How about something like this?
List<Object> items = Lists.newArrayList(required, sales, report,
capName, vCapName, attName);
for(Object item : items) {
if(item != null){
methodOne(item);
methodTwo(item);
methodThree(item);
}
}
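Because only one argument is ever non-null, another option is to pick that single value first and then run the three calls exactly once. This is a minimal sketch under that assumption. FirstNonNull, pick, and the string results are hypothetical placeholders; the real method, methodTwo, and methodThree from the question are not known.

```java
import java.util.ArrayList;
import java.util.List;

class FirstNonNull {
    // Returns the first non-null candidate, or null if all candidates are null.
    static Object pick(Object... candidates) {
        for (Object candidate : candidates) {
            if (candidate != null) {
                return candidate;
            }
        }
        return null;
    }

    // Shortened, hypothetical version of check(...): the three "good" results
    // are computed once for whichever argument was supplied.
    static List<String> check(String required, String sales, String report,
                              Long passId, Long seatId) {
        List<String> results = new ArrayList<>();
        Object value = pick(required, sales, report, passId, seatId);
        if (value != null) {
            results.add("A:" + value); // placeholder for method(value)
            results.add("B:" + value); // placeholder for methodTwo(value)
            results.add("C:" + value); // placeholder for methodThree(value)
        }
        return results;
    }
}
```

If the String and Long arguments need different handling, the pick step can be kept and the three calls overloaded per type.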

How to build an SQL query with many parameters from Java code?

I have an SQL select with parameters:
SELECT * FROM tbl t WHERE t.name = ? AND t.age = ? AND t.number = ? AND ... AND t.last_parameter = ? order by t.some desc // many parameters
I get the parameters from form fields, and some fields may be empty. I build the SQL string:
String sqlStatementText;
MessageFormat sqlStatementTextTemplate = new MessageFormat(Queries.WAR_GET_REPORT_COUNT);
List<Object> parametrs = new ArrayList<>();
if (null == subscriberMSISDN || subscriberMSISDN.length() == 0) {
parametrs.add("");
} else {
parametrs.add(Queries.WAR_REPORT_CALLING_NUMBER);
}
if (null == operatorID || operatorID.length() == 0) {
parametrs.add("");
} else {
parametrs.add(Queries.WAR_REPORT_OPERATOR_AVAYA_ID);
}
if (null == operatorNickname || operatorNickname.length() == 0) {
parametrs.add("");
} else {
parametrs.add(Queries.WAR_REPORT_NICKNAME);
}
if (null == msg1 || msg1.length() == 0) {
parametrs.add("");
} else {
parametrs.add(Queries.WAR_REPORT_MSG1);
}
if (null == msg2 || msg2.length() == 0) {
parametrs.add("");
} else {
parametrs.add(Queries.WAR_REPORT_MSG2);
}
sqlStatementText = sqlStatementTextTemplate.format(parametrs.toArray());
And then I do this:
try (Connection sqlConnection = connectionPool.getConnection();
PreparedStatement sqlStatment = sqlConnection.prepareStatement(sqlStatementText)) {
int paramID = 1;
sqlStatment.setInt(paramID++, 1);
sqlStatment.setDate(paramID++, new java.sql.Date(fromDate.getTime()));
sqlStatment.setDate(paramID++, new java.sql.Date(toDate.getTime()));
if (null != subscriberMSISDN && subscriberMSISDN.length() != 0) {
sqlStatment.setString(paramID++, subscriberMSISDN);
}
if (null != operatorID && operatorID.length() != 0) {
sqlStatment.setString(paramID++, operatorID);
}
if (null != operatorNickname && operatorNickname.length() != 0) {
sqlStatment.setString(paramID++, operatorNickname);
}
if (null != msg1 && msg1.length() != 0) {
sqlStatment.setString(paramID++, msg1);
}
if (null != msg2 && msg2.length() != 0) {
sqlStatment.setString(paramID++, msg2);
}
// try-with-resources closes the result set, statement, and connection automatically
try (ResultSet resultSet = sqlStatment.executeQuery()) {
while (resultSet.next()) {
count = resultSet.getInt(1);
}
}
}
But I think this is not the right way to do it. I don't know how to build an SQL query with many parameters when some of the parameters may be empty.

Either switch to an ORM, which will have some form of criteria-like object, or use the "param is null or column = param" SQL syntax:
select x from y where (? is null OR column1 = ?)
With the second approach you need to set the value of the parameter twice, and null cannot be a legitimate input value.

There is no way to do it, given the SQL statement you have.
You need to change the WHERE conditions of the SQL statement from things like t.name = ? to t.name = nvl(?, t.name). Then you can bind a NULL there and the condition will evaluate to true for every row whose t.name is not NULL (rows where t.name is itself NULL are still excluded). The condition then stops acting as a filter, which is what you want when the user leaves the field blank.
A better approach, if you can do it, is to keep conditions like the ones you have (e.g., t.name = ?) but build them dynamically based on which fields the user gives you. For example, if the user leaves the "name" parameter blank, just omit the t.name = ? condition entirely.
That leaves you with a shorter SQL statement that makes the Oracle optimizer's job a little bit easier. With the t.name = nvl(?, t.name) approach I gave you above, you're relying on some pretty advanced optimizer features to get the best performance, because it's not immediately clear whether, say, it would be good or bad for the optimizer to use an index on t.name.
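The dynamic approach can be sketched as follows: each non-empty field contributes one "column = ?" condition and one bind value, so the placeholders and the parameter list cannot drift apart. DynamicQuery and build are hypothetical names, and the sketch assumes that column names come from trusted code, never from user input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DynamicQuery {
    final String sql;          // SQL text with one "?" per collected parameter
    final List<Object> params; // bind values, in placeholder order

    DynamicQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    // filters maps a trusted column name to the raw user input for that column.
    static DynamicQuery build(String baseSelect, Map<String, String> filters) {
        StringBuilder sql = new StringBuilder(baseSelect);
        List<Object> params = new ArrayList<>();
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            String value = filter.getValue();
            if (value == null || value.isEmpty()) {
                continue; // blank field: omit the condition entirely
            }
            sql.append(params.isEmpty() ? " WHERE " : " AND ");
            sql.append(filter.getKey()).append(" = ?");
            params.add(value);
        }
        return new DynamicQuery(sql.toString(), params);
    }
}
```

The values can then be bound in a loop with PreparedStatement.setObject(i + 1, params.get(i)), which replaces the second chain of if statements. An ORDER BY clause would be appended after the loop.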

Segregating filtered tweets based on matched keywords : Twitter4j API

I have created a Twitter stream filtered by some keywords as follows.
TwitterStream twitterStream = getTwitterStreamInstance();
FilterQuery filtre = new FilterQuery();
String[] keywordsArray = { "iphone", "samsung" , "apple", "amazon"};
filtre.track(keywordsArray);
twitterStream.filter(filtre);
twitterStream.addListener(listener);
What is the best way to segregate tweets based on the keywords they matched? For example, all tweets that match "iphone" should be stored in the "IPHONE" table, all tweets that match "samsung" in the "SAMSUNG" table, and so on. Note: there are about 500 filter keywords.

It seems that the only way to find out which keyword a tweet belongs to is to iterate over multiple properties of the Status object. The following code requires a database service with a method insertTweet(String tweetText, Date createdAt, String keyword). A tweet is stored in the database once per matching keyword, so it is stored multiple times if multiple keywords are found. If at least one keyword is found in the tweet text, the additional properties are not searched for more keywords.
// creates a map of the keywords with a compiled pattern, which matches the keyword
private Map<String, Pattern> keywordsMap = new HashMap<>();
private TwitterStream twitterStream;
private DatabaseService databaseService; // implement and add this service
public void start(List<String> keywords) {
stop(); // stop the streaming first, if it is already running
if(keywords.size() > 0) {
for(String keyword : keywords) {
// Pattern.quote keeps regex metacharacters in a keyword from being interpreted
keywordsMap.put(keyword, Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE));
}
twitterStream = new TwitterStreamFactory().getInstance();
StatusListener listener = new StatusListener() {
@Override
public void onStatus(Status status) {
insertTweetWithKeywordIntoDatabase(status);
}
/* add the unimplemented methods from the interface */
};
twitterStream.addListener(listener);
FilterQuery filterQuery = new FilterQuery();
filterQuery.track(keywordsMap.keySet().toArray(new String[keywordsMap.keySet().size()]));
filterQuery.language(new String[]{"en"});
twitterStream.filter(filterQuery);
}
else {
System.err.println("Could not start querying because there are no keywords.");
}
}
public void stop() {
keywordsMap.clear();
if(twitterStream != null) {
twitterStream.shutdown();
}
}
private void insertTweetWithKeywordIntoDatabase(Status status) {
// search for keywords in tweet text
List<String> keywords = getKeywordsFromTweet(status.getText());
if (keywords.isEmpty()) {
StringBuffer additionalDataFromTweets = new StringBuffer();
// get extended urls
if (status.getURLEntities() != null) {
for (URLEntity url : status.getURLEntities()) {
if (url != null && url.getExpandedURL() != null) {
additionalDataFromTweets.append(url.getExpandedURL());
}
}
}
// get retweeted status -> text
if (status.getRetweetedStatus() != null && status.getRetweetedStatus().getText() != null) {
additionalDataFromTweets.append(status.getRetweetedStatus().getText());
}
// get retweeted status -> quoted status -> text
if (status.getRetweetedStatus() != null && status.getRetweetedStatus().getQuotedStatus() != null
&& status.getRetweetedStatus().getQuotedStatus().getText() != null) {
additionalDataFromTweets.append(status.getRetweetedStatus().getQuotedStatus().getText());
}
// get retweeted status -> quoted status -> extended urls
if (status.getRetweetedStatus() != null && status.getRetweetedStatus().getQuotedStatus() != null
&& status.getRetweetedStatus().getQuotedStatus().getURLEntities() != null) {
for (URLEntity url : status.getRetweetedStatus().getQuotedStatus().getURLEntities()) {
if (url != null && url.getExpandedURL() != null) {
additionalDataFromTweets.append(url.getExpandedURL());
}
}
}
// get quoted status -> text
if (status.getQuotedStatus() != null && status.getQuotedStatus().getText() != null) {
additionalDataFromTweets.append(status.getQuotedStatus().getText());
}
// get quoted status -> extended urls
if (status.getQuotedStatus() != null && status.getQuotedStatus().getURLEntities() != null) {
for (URLEntity url : status.getQuotedStatus().getURLEntities()) {
if (url != null && url.getExpandedURL() != null) {
additionalDataFromTweets.append(url.getExpandedURL());
}
}
}
String additionalData = additionalDataFromTweets.toString();
keywords = getKeywordsFromTweet(additionalData);
}
if (keywords.isEmpty()) {
System.err.println("ERROR: No Keyword found for: " + status.toString());
} else {
// insert into database
for(String keyword : keywords) {
databaseService.insertTweet(status.getText(), status.getCreatedAt(), keyword);
}
}
}
// returns a list of keywords which are found in a tweet
private List<String> getKeywordsFromTweet(String tweet) {
List<String> result = new ArrayList<>();
for (String keyword : keywordsMap.keySet()) {
Pattern p = keywordsMap.get(keyword);
if (p.matcher(tweet).find()) {
result.add(keyword);
}
}
return result;
}

Here's how you'd use a StatusListener to interrogate the received Status objects:
final Set<String> keywords = new HashSet<String>();
keywords.add("apple");
keywords.add("samsung");
// ...
final StatusListener listener = new StatusAdapter() {
@Override
public void onStatus(Status status) {
final String statusText = status.getText();
for (String keyword : keywords) {
if (statusText.contains(keyword)) {
dao.insert(keyword, status);
}
}
}
};
final TwitterStream twitterStream = getTwitterStreamInstance();
final FilterQuery fq = new FilterQuery();
fq.track(keywords.toArray(new String[0]));
twitterStream.addListener(listener);
twitterStream.filter(fq);
I see the DAO being defined along the lines of:
public interface StatusDao {
void insert(String tableSuffix, Status status);
}
You would then have a DB table corresponding to each keyword. The implementation would use the tableSuffix to store the Status in the correct table; the SQL would look roughly like:
INSERT INTO status_$tableSuffix$ VALUES (...)
Notes:
This implementation would insert a Status into multiple tables if a Tweet contained 'apple' and 'samsung' for instance.
Additionally, this is quite a naive implementation, you might want to consider batching inserts into the tables... but it depends on the volume of Tweets you'll be receiving.
As noted in the comments, the API considers other attributes when matching e.g. URLs and an embedded Tweet (if present) so searching the status text for a keyword match may not be sufficient.
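With about 500 keywords, checking every keyword against every tweet can be replaced by splitting the text into words once and looking each word up in a set. The following is a minimal sketch of that idea using only the JDK. KeywordRouter is a hypothetical name, and the sketch assumes single-word keywords and whole-word, case-insensitive matching, which is stricter than String.contains.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

class KeywordRouter {
    private final Set<String> keywords = new HashSet<>();

    KeywordRouter(Collection<String> keywords) {
        for (String keyword : keywords) {
            this.keywords.add(keyword.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the tracked keywords that occur as whole words in the text,
    // in order of first appearance.
    Set<String> match(String text) {
        Set<String> hits = new LinkedHashSet<>();
        if (text == null) {
            return hits;
        }
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (keywords.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }
}
```

Each returned keyword can be used as the table suffix for one insert. The same match method can be applied to expanded URLs or quoted tweet text when the status text alone yields no hit.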

Well, you could create a class similar to an ArrayList, call it TweetList, so that you can create an array of these lists, one per keyword. This class will need an insert function.
Then use two for loops to search through the tweets for matching keywords, which are kept in a normal ArrayList, and add each tweet to the TweetList whose index matches the index of the keyword in the keywords ArrayList:
for (int i = 0; i < tweets.length; i++)
{
String[] split = tweets[i].split(" "); // split the tweet up into words
for (int j = 0; j < split.length; j++)
if (keywords.contains(split[j])) // check each word against the keyword list
list[keywords.indexOf(split[j])].insert(tweets[i]); // add the tweet to the TweetList at the index of the matching keyword
}

Filtering data with CriteriaBuilder to compare enum values with literals not working

I have a Java class with an enum field:
org.example.Importacion {
...
@Enumerated(EnumType.STRING)
private EstadoImportacion estadoImportacion;
public static enum EstadoImportacion {
NOT_VALID, IMPORTED, ERROR, VALID
}
}
When I create a query with CriteriaBuilder and try to compare the enum value passed in as a filter using literals, the query result is not filtered by the enum value. So if I send org.example.Importacion.EstadoImportacion.ERROR to the iterator method, the final result list is not filtered by ERROR.
The companyCod filter works, so if I send "COMPANY001" as the companyCod, the query filters the final result.
I would like to know how to compare enums in the query:
public Iterator<Importacion> iterator (
long first,
long count,
String companyCod,
org.example.Importacion.EstadoImportacion estado) {
CriteriaBuilder cb = getEntityManager().getCriteriaBuilder();
CriteriaQuery<Importacion> criteria = cb.createQuery(Importacion.class);
Root<Importacion> desembolso = criteria.from(Importacion.class);
criteria.select(desembolso);
Predicate p = cb.conjunction();
if(companyCod != null) {
p = cb.and(p, cb.equal(desembolso.get("codigo"), companyCod));
//This part works fine!
}
if (estado != null) {
Expression<org.example.Importacion.EstadoImportacion> estadoImportacion = null;
if (estado.equals(org.example.Importacion.EstadoImportacion.ERROR)) {
estadoImportacion = cb.literal(org.example.Importacion.EstadoImportacion.ERROR);
}
if (estado.equals(org.example.Importacion.EstadoImportacion.IMPORTED)) {
estadoImportacion = cb.literal(org.example.Importacion.EstadoImportacion.IMPORTED);
}
if (estado.equals(org.example.Importacion.EstadoImportacion.NOT_VALID)) {
estadoImportacion = cb.literal(org.example.Importacion.EstadoImportacion.NOT_VALID);
}
if (estado.equals(org.example.Importacion.EstadoImportacion.VALID)) {
estadoImportacion = cb.literal(org.example.Importacion.EstadoImportacion.VALID);
}
p = cb.and(p, cb.equal(estadoImportacion, cb.literal(estado)));
//Doesn't seems to compare enum values
}
criteria.where(p);
javax.persistence.Query query = em.createQuery(criteria);
query.setMaxResults((int)count + (int)first + 1);
query.setFirstResult((int)first);
List resultList = query.getResultList();
Iterator iterator = (Iterator) resultList.iterator();
LOGGER.info("desembolso size: {}", resultList.size());
return iterator;
}

Your criteria compares a literal of the given enum value with another literal of the same value, which is always true. That's not what you want. You want to compare the Importacion's estadoImportacion attribute with the given estado:
Predicate p = cb.conjunction();
if(companyCod != null) {
p = cb.and(p, cb.equal(desembolso.get("codigo"), companyCod));
}
if (estado != null) {
p = cb.and(p, cb.equal(desembolso.get("estadoImportacion"), estado));
}

