Lucene: How to search by specific term

I'm trying to do a Lucene search for a specific string term.
E.g., I have the tags 1-"Hello World", 2-"Hello, Steve", 3-"Helloween" and 4-"Hello". If I search for the last tag (hello), Lucene returns all of the tags, because each of them contains "hello" at some point. I need an operator or some logic that performs the search without "like" semantics.
One way to avoid this is the "must_not" clause (the - operator), which gives the query:
term:hello -term:world. But that doesn't work in my case, because I would need to know every other word that should be excluded from the search.
private <T> Query createQuery(final Class<T> clazz, String s, final String[] fields, final SearchFactory searchFactory, final Boolean allowLeadingWildcard) throws ParseException {
final Analyzer analyzer = searchFactory.getAnalyzer(clazz);
final QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, analyzer);
Query query = null;
try{
query = parser.parse(s);
} catch(...){...}
return query;
}
My knowledge of Lucene is limited, so here is an SQL example that may be easier to understand:
/*This is what Lucene is doing. It will bring "HELLO", "HELLO WORLD", "Hello, Steve"...*/
WHERE table.tag LIKE "%HELLO%"
/*This is what I want. Match exactly the term "HELLO" and nothing more*/
WHERE table.tag = "HELLO"
I guess that this is the Analyzer used in the application:
public class AnalyserCustom extends Analyzer {
@Override
public TokenStream tokenStream(final String fieldName, final Reader reader) {
final StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
TokenStream stream = new StandardFilter(Version.LUCENE_36, tokenizer);
stream = new LowerCaseFilter(Version.LUCENE_36, stream);
return new ASCIIFoldingFilter(stream);
}
}
And the attribute TAG is this:
...
@Field
private String tagname;
...
Any suggestions?
PS: I'm new to Lucene.

You have to index the field with an analyzer that generates a single token for the whole string, so that only the exact value matches the search. Try KeywordAnalyzer.
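To make the difference concrete, here is a toy model in plain Java. It is not Lucene code: the class and method names are made up for illustration, and the tokenizer is only a rough imitation of a tokenizing analyzer (split on non-alphanumerics, lowercase). The keyword variant keeps the whole value as one token; it also lowercases, which is an assumption of this sketch that mirrors the LowerCaseFilter in the question, since KeywordAnalyzer on its own does not change case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only, NOT Lucene code: it imitates which tokens end up in the index.
final class TagMatchSketch {

    // Rough imitation of a tokenizing analyzer: split on non-alphanumerics, lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Keyword-style analysis: the whole value is a single token.
    // Lowercasing here is an assumption of the sketch, not KeywordAnalyzer behaviour.
    static List<String> keyword(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }

    // Tags that contain the term as one of their tokens.
    static List<String> matchTokenized(List<String> tags, String term) {
        List<String> hits = new ArrayList<>();
        for (String tag : tags) {
            if (tokenize(tag).contains(term)) {
                hits.add(tag);
            }
        }
        return hits;
    }

    // Tags whose single keyword token equals the term.
    static List<String> matchKeyword(List<String> tags, String term) {
        List<String> hits = new ArrayList<>();
        for (String tag : tags) {
            if (keyword(tag).contains(term)) {
                hits.add(tag);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>();
        Collections.addAll(tags, "Hello World", "Hello, Steve", "Helloween", "Hello");
        System.out.println(matchTokenized(tags, "hello"));
        System.out.println(matchKeyword(tags, "hello"));
    }
}
```

In this model the tokenized field matches every tag that has "hello" as one of its words, while the keyword field matches only the tag whose entire value is "Hello".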

Related

Lucene LongPoint Range search doesn't work

I am using Lucene 8.2.0 in Java 11.
I am trying to index a Long value so that I can filter by it using a range query, for example like so: +my_range_field:[1 TO 200]. However, any variant of that, even my_range_field:[* TO *], returns 0 results in this minimal example. As soon as I remove the + from it to make it an OR, I get 2 results.
So I think I must be making a mistake in how I index it, but I can't work out what it might be.
From the LongPoint JavaDoc:
An indexed long field for fast range filters. If you also need to store the value, you should add a separate StoredField instance.
Finding all documents within an N-dimensional shape or range at search time is efficient. Multiple values for the same field in one document is allowed.
This is my minimal example:
public static void main(String[] args) {
Directory index = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer();
try {
IndexWriter indexWriter = new IndexWriter(index, new IndexWriterConfig(analyzer));
Document document1= new Document();
Document document2= new Document();
document1.add(new LongPoint("my_range_field", 10));
document1.add(new StoredField("my_range_field", 10));
document2.add(new LongPoint("my_range_field", 100));
document2.add(new StoredField("my_range_field", 100));
document1.add(new TextField("my_text_field", "test content 1", Field.Store.YES));
document2.add(new TextField("my_text_field", "test content 2", Field.Store.YES));
indexWriter.deleteAll();
indexWriter.commit();
indexWriter.addDocument(document1);
indexWriter.addDocument(document2);
indexWriter.commit();
indexWriter.close();
QueryParser parser = new QueryParser("text", analyzer);
IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(index));
String luceneQuery = "+my_text_field:test* +my_range_field:[1 TO 200]";
Query query = parser.parse(luceneQuery);
System.out.println(indexSearcher.search(query, 10).totalHits.value);
} catch (IOException e) {
} catch (ParseException e) {
}
}

You need to first use StandardQueryParser, then provide the parser with a PointsConfig map, essentially hinting which fields are to be treated as Points. You'll now get 2 results.
// Change this line to the following
StandardQueryParser parser = new StandardQueryParser(analyzer);
IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(index));
/* Added code */
PointsConfig longConfig = new PointsConfig(new DecimalFormat(), Long.class);
Map<String, PointsConfig> pointsConfigMap = new HashMap<>();
pointsConfigMap.put("my_range_field", longConfig);
parser.setPointsConfigMap(pointsConfigMap);
/* End of added code */
String luceneQuery = "+my_text_field:test* +my_range_field:[1 TO 200]";
// Change the query to the following
Query query = parser.parse(luceneQuery, "text");

I found the solution to my problem.
I was under the impression that the query parser could just parse any query string correctly. That doesn't seem to be the case.
Using
Query rangeQuery = LongPoint.newRangeQuery("my_range_field", 1L, 11L);
Query searchQuery = new WildcardQuery(new Term("my_text_field", "test*"));
Query build = new BooleanQuery.Builder()
.add(searchQuery, BooleanClause.Occur.MUST)
.add(rangeQuery, BooleanClause.Occur.MUST)
.build();
returned the correct result.

Univocity - Is it possible to parse a file to a runtime generated bean/class?

I am using Univocity to parse some files into JavaBeans. These beans are compiled classes. However, I want to generate these classes at runtime and then parse the files into the runtime-generated classes.
Full code is here: gist
A snippet of the code that uses the Univocity library:
private static void parseBean(final Class<?> dynamicClass) throws FileNotFoundException {
@SuppressWarnings("unchecked")
final BeanListProcessor<?> rowProcessor = new BeanListProcessor<Class<?>>((Class<Class<?>>) dynamicClass);
final CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(false);
parserSettings.getFormat().setDelimiter('|');
parserSettings.setEmptyValue("");
parserSettings.setNullValue("");
final CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader("src/main/resources/person.csv"));
final List<?> beans = rowProcessor.getBeans();
for (final Object domain : beans) {
final Domain domainImpl = (Domain) domain;
System.out.println("Person id is: " + domainImpl.getIdentifier());
System.out.println("Person name is: " + domainImpl.getColumnByIndex(1));
System.out.println();
}
}
The file looks like this:
0|Eric
1|Maria
All the values seem to be null, so something is going wrong when parsing the file and mapping it to the bean:
Person id is: null
Person name is: null
Is it possible to parse files to runtime generated beans/classes using the Univocity library?

The problem here is that your code is not generating the @Parsed annotations correctly. Check this:
Object o = dynamicClass.newInstance();
Field f = dynamicClass.getDeclaredField("id");
f.setAccessible(true);
java.lang.annotation.Annotation[] annotations = f.getAnnotations();
System.out.println(Arrays.toString(annotations));
You will get an empty annotation array. I've fixed your code to generate the annotations properly:
Change your addAnnotation method to this:
private static void addAnnotation(final CtClass clazz, final String fieldName, final String annotationName, String member, int memberValue) throws Exception {
final ClassFile cfile = clazz.getClassFile();
final ConstPool cpool = cfile.getConstPool();
final CtField cfield = clazz.getField(fieldName);
final AnnotationsAttribute attr = new AnnotationsAttribute(cpool, AnnotationsAttribute.visibleTag);
final Annotation annot = new Annotation(annotationName, cpool);
annot.addMemberValue(member, new IntegerMemberValue(cpool, memberValue));
attr.addAnnotation(annot);
cfield.getFieldInfo().addAttribute(attr);
}
And call it like this:
addAnnotation(cc, "id", "com.univocity.parsers.annotations.Parsed","index", 0);
With this change, I can parse a sample input such as this:
parser.parse(new StringReader("0|John|12-04-1986"));
And will get the following output:
Person id is: 0
Person name is: John
Hope this helps.
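The reflection check used in this answer can be tried on an ordinary compiled class to see what a correctly annotated field looks like. The sketch below is a minimal illustration using only the JDK: the `Indexed` annotation is a hypothetical stand-in for Univocity's `@Parsed`, and `Person` and `indexOf` are made-up names. The point it shows is that an annotation is visible through `Field.getAnnotation` only if it is actually attached to the field and has runtime retention.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class AnnotationCheckSketch {

    // Hypothetical stand-in for com.univocity.parsers.annotations.Parsed.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {
        int index();
    }

    static class Person {
        @Indexed(index = 0)
        private String id;

        private String name; // deliberately left without an annotation
    }

    // Returns the index declared on the field, or -1 if the annotation is absent.
    static int indexOf(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            Indexed indexed = field.getAnnotation(Indexed.class);
            return indexed == null ? -1 : indexed.index();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field named " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("id   -> " + indexOf(Person.class, "id"));
        System.out.println("name -> " + indexOf(Person.class, "name"));
    }
}
```

A field that reports -1 here is in the same situation as the generated class in the question: the mapper has no annotation to read, so nothing is written to the field.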

How to write ArrayList<Object> to a csv file

I have an ArrayList<Metadata> and I want to know if there is a Java API for working with CSV files that has a write method accepting an ArrayList<> as a parameter, similar to LinqToCsv in .NET. As far as I know OpenCSV is available, but its CSVWriter class doesn't accept a collection.
My Metadata class is:
public class Metadata{
private String page;
private String document;
private String loan;
private String type;
}
ArrayList<Metadata> record = new ArrayList<Metadata>();
Once I populate the record, I want to write each row to a CSV file.
Please suggest an approach.

Surely there'll be a heap of APIs that will do this for you, but why not do it yourself for such a simple case? It will save you a dependency, which is a good thing for any project of any size.
Create a toCsvRow() method in Metadata that joins the strings separated by a comma.
public String toCsvRow() {
return Stream.of(page, document, loan, type)
.map(value -> value.replaceAll("\"", "\"\""))
.map(value -> Stream.of("\"", ",").anyMatch(value::contains) ? "\"" + value + "\"" : value)
.collect(Collectors.joining(","));
}
Collect the result of this method for every Metadata object separated by a new line.
String recordAsCsv = record.stream()
.map(Metadata::toCsvRow)
.collect(Collectors.joining(System.getProperty("line.separator")));
EDIT
Should you not be so fortunate as to have Java 8 and the Stream API at your disposal, this would be almost as simple using a traditional List.
public String toCsvRow() {
String csvRow = "";
for (String value : Arrays.asList(page, document, loan, type)) {
String processed = value;
if (value.contains("\"") || value.contains(",")) {
processed = "\"" + value.replaceAll("\"", "\"\"") + "\"";
}
csvRow += "," + processed;
}
return csvRow.substring(1);
}
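Both versions above produce the CSV text but stop short of writing it to disk. The following is a minimal sketch of that last step, assuming each record has already been reduced to a list of strings. The class and method names are made up for illustration, and quoting values that contain line breaks is an addition of this sketch rather than part of the answer above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CsvFileSketch {

    // Same quoting rule as toCsvRow() above; line breaks also trigger quoting (an addition).
    static String escape(String value) {
        boolean needsQuotes = value.contains("\"") || value.contains(",")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    // Joins the escaped values of one record with commas.
    static String toCsvRow(List<String> values) {
        List<String> escaped = new ArrayList<>();
        for (String value : values) {
            escaped.add(escape(value));
        }
        return String.join(",", escaped);
    }

    // Writes one line per record; Files.write appends the line separator itself.
    static void writeCsv(Path target, List<List<String>> rows) throws IOException {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(toCsvRow(row));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("metadata", ".csv");
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("1", "contract.pdf", "L-42", "scan"));
        rows.add(Arrays.asList("2", "notes, final", "L-43", "text"));
        writeCsv(target, rows);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

For the Metadata class in the question, each row would be built from its four fields (page, document, loan, type) before being passed to writeCsv.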

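With CSVWriter, you can convert the list to a String[] and pass that to the writer. Note that this only works when the list holds String values; an ArrayList<Metadata> cannot be copied into a String[] and the call would fail at runtime with an ArrayStoreException.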
csvWriter.writeNext(record.toArray(new String[record.size()]));

If you have an ArrayList of Objects (Metadata in your case) you would use the BeanToCSV instead of the CSVWriter.
You can look at the BeanToCSVTest in the opencsv source code for examples of how to use it.

Add field with value to existing document in MongoDB via Java API

The following code hasn't worked for me:
public void addFieldWithValueToDoc(String DBName, String collName, String docID, String key, String value) {
BasicDBObject setNewFieldQuery = new BasicDBObject().append("$set", new BasicDBObject().append(key, value));
mongoClient.getDB(DBName).getCollection(collName).update(new BasicDBObject().append("_id", docID), setNewFieldQuery);
}
Here the mongoClient variable is of type MongoClient.
It's inspired by "Add new field to a collection in MongoDB".
What's wrong and how to do it right?
Thanks.

I've written a JUnit test to prove that your code does work:
@Test
public void shouldUpdateAnExistingDocumentWithANewKeyAndValue() {
// Given
String docID = "someId";
collection.save(new BasicDBObject("_id", docID));
assertThat(collection.find().count(), is(1));
// When
String key = "newKeyName";
String value = "newKeyValue";
addFieldWithValueToDoc(db.getName(), collection.getName(), docID, key, value);
// Then
assertThat(collection.findOne().get(key).toString(), is(value));
}
public void addFieldWithValueToDoc(String DBName, String collName, String docID, String key, String value) {
BasicDBObject setNewFieldQuery = new BasicDBObject().append("$set", new BasicDBObject().append(key, value));
mongoClient.getDB(DBName).getCollection(collName).update(new BasicDBObject().append("_id", docID), setNewFieldQuery);
}
So your code is correct, although I'd like to point out some comments on style that would make it more readable:
Parameters and variables should start with a lower-case letter. DBName should be dbName.
You don't need new BasicDBObject().append(key, value); use new BasicDBObject(key, value) instead.
This code does the same thing as your code, but is shorter and simpler:
public void addFieldWithValueToDoc(String dbName, String collName, String docID, String key, String value) {
mongoClient.getDB(dbName).getCollection(collName).update(new BasicDBObject("_id", docID),
new BasicDBObject("$set", new BasicDBObject(key, value)));
}

To update existing documents in a collection, you can use the collection's updateOne() or updateMany() methods.
The updateOne method has the following form:
db.collection.updateOne(filter, update, options)
filter - the selection criteria for the update. The same query selectors as in the find() method are available. Specify an empty document { } to update the first document returned in the collection.
update - the modifications to apply.
So, if you want to add one more field using the MongoDB Java driver 3.4+, it will be:
collection.updateOne(new Document("flag", true),
new Document("$set", new Document("title", "Portable Space Ball")));
The operation above updates a single document where flag is true.
The same update can be written with the eq filter helper from com.mongodb.client.model.Filters:
collection.updateOne(eq("flag", true),
new Document("$set", new Document("title", "Portable Space Ball")));
If the title field does not exist, $set will add a new field with the specified value, provided that the new field does not violate a type constraint. If you specify a dotted path for a non-existent field, $set will create the embedded documents as needed to fulfill the dotted path to the field.

Is there a way to render a string like 'Hello, %(name)s' % {'name':'Felix'} in Java?

In Python we can do this easily:
data = {'name':'Felix'}
s = 'Hello, %(name)s' % data
s
'Hello, Felix'
Is there a similar way in Java to implement the same thing?
PS:
Sorry for the unclear question. The use case is: we have a map that stores key-value pairs, and the template only needs to name a key from the map; the value for that key is then substituted wherever the key appears in the template.

AFAIK you can use String#format for this:
String name = "Felix";
String s = String.format("Hello, %s", name);
System.out.println(s);
This will print
Hello, Felix
More information about the format specifiers accepted by String#format can be found in the java.util.Formatter syntax documentation.

You want the String.format method.
String data = "Hello, %s";
String updated = String.format(data, "Felix");

If you only want to replace Strings with Strings, the code in the second part of my answer is the better fit.
The Java Formatter class doesn't support the %(key)s form, but you can use %index$s instead, where index is counted from 1, as in this example:
System.out.format("%3$s, %2$s, %1$s", "a", "b", "c");
// indexes 1 2 3
output:
c, b, a
So all you need to do is create an array containing the values used in the pattern and replace each key name with its corresponding index (increased by 1, since the first index used by Formatter is written as 1$, not 0$ as we would expect for array indexes).
Here is an example of a method that will do it for you:
// I made this Pattern static and put it outside of method to compile it only once,
// also it will match every (xxx) that has % before it, but wont include %
static Pattern formatPattern = Pattern.compile("(?<=%)\\(([^)]+)\\)");
public static String format(String pattern, Map<String, ?> map) {
StringBuffer sb = new StringBuffer();
List<Object> valuesList = new ArrayList<>();
Matcher m = formatPattern.matcher(pattern);
while (m.find()) {
String key = m.group(1);//group 1 contains part inside parenthesis
Object value = map.get(key);
// If map doesn't contain key, value will be null.
// If you want to react somehow to null value like throw some
// Exception
// now is the good time.
if (valuesList.contains(value)) {
m.appendReplacement(sb, (valuesList.indexOf(value) + 1) + "\\$");
} else {
valuesList.add(value);
m.appendReplacement(sb, valuesList.size() + "\\$");
}
}
m.appendTail(sb);
return String.format(sb.toString(), valuesList.toArray());
}
usage
Map<String, Object> map = new HashMap<>();
map.put("name", "Felix");
map.put("age", 70);
String myPattern =
"Hi %(emptyKey)s! My name is %(name)s %(name)s and I am %(age)s years old";
System.out.println(format(myPattern, map));
output:
Hi null! My name is Felix Felix and I am 70 years old
As you can see, you can use the same key several times (here, name), and if your map doesn't contain a key used in the pattern (like emptyKey), it is replaced with null.
The version above lets you choose the conversion type, such as s, d and so on. If your data will always be replaced with Strings, you can skip String.format(sb.toString(), valuesList.toArray()) and replace the keys with their values directly.
Here is a simpler version that accepts only a Map<String, String>:
static Pattern stringsPattern = Pattern.compile("%\\(([^)]+)\\)s\\b");
public static String formatStrings(String pattern, Map<String, String> map) {
StringBuffer sb = new StringBuffer();
Matcher m = stringsPattern.matcher(pattern);
while (m.find()) {
// null can't be used as a replacement, so convert the value to a String first
// with String.valueOf; quoteReplacement keeps any '$' or '\' in the value literal
m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(map.get(m.group(1)))));
}
m.appendTail(sb);
return sb.toString();
}
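As a self-contained variation on the simpler version, the sketch below can be compiled and run on its own. It differs in two ways that are choices of this sketch, not of the answer above: a placeholder whose key is missing from the map is left untouched instead of becoming "null", and there is no trailing word-boundary check in the pattern. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedTemplateSketch {

    // Matches Python-style %(key)s placeholders; group 1 is the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\(([^)]+)\\)s");

    // Replaces every %(key)s with its value; unknown keys stay as they are.
    static String render(String template, Map<String, String> values) {
        StringBuffer out = new StringBuffer();
        Matcher m = PLACEHOLDER.matcher(template);
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops '$' and '\' in the value from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("name", "Felix");
        System.out.println(render("Hello, %(name)s", data));
    }
}
```

Leaving unknown placeholders in place makes a missing key easy to spot in the output; throwing an exception at that point would be an equally reasonable choice.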

For this use case you need a template engine such as Velocity or FreeMarker, which renders a string template from a Map-like data structure; there is no built-in module in Java that does this. For example, with Velocity:
public static void main(String[] args) {
Context context = new VelocityContext();
context.put("appid", "9876543d1");
context.put("ds", "2013-09-11");
StringWriter sw = new StringWriter();
String template = "APPID is ${appid} and DS is ${ds}";
Velocity.evaluate(context, sw, "velocity", template);
System.out.println(sw.toString());
}

If you want more advanced techniques such as i18n support, you can use the advanced MessageFormat features.
For example, in the language properties files you add the property 'template', which holds your message:
template = At {2,time,short} on {2,date,long}, \
we detected {1,number,integer} spaceships on \
the planet {0}.
Then you pass your variables as arguments in an array:
Object[] messageArguments = {
"Mars",
Integer.valueOf(7),
new Date()
};
You call the formatter this way:
MessageFormat formatter = new MessageFormat("");
formatter.setLocale(currentLocale);
formatter.applyPattern(messages.getString("template"));
String output = formatter.format(messageArguments);
The detailed example is here:
http://docs.oracle.com/javase/tutorial/i18n/format/messageFormat.html
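The same idea can be tried without a properties file. The sketch below is a minimal, self-contained version that uses only the planet and the spaceship count from the pattern above; the date argument is left out so that the output does not depend on the current time, and the locale is fixed to English as an assumption of this sketch. The class and method names are made up for illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageFormatSketch {

    // Formats a shortened version of the 'template' message with a fixed locale.
    static String report(String planet, int ships) {
        MessageFormat formatter = new MessageFormat(
                "We detected {1,number,integer} spaceships on the planet {0}.",
                Locale.ENGLISH);
        // Arguments are matched by position: {0} is the planet, {1} the count.
        return formatter.format(new Object[] { planet, ships });
    }

    public static void main(String[] args) {
        System.out.println(report("Mars", 7));
    }
}
```

Note that MessageFormat placeholders are positional ({0}, {1}), so this covers localisation and number or date formatting rather than lookup by key name.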
