MongoDB Text Index using Java Driver

Using the MongoDB Java API, I have not been able to find a full example of performing a text search. The code I am using is this:
DBCollection coll;
String searchString = "Test String";
coll.createIndex(new BasicDBObject ("blogcomments", "text"));
DBObject q = start("blogcomments").text(searchString).get();
The name of the collection I am performing the search on is blogcomments. createIndex() is the replacement for the deprecated ensureIndex() method. I have seen examples of how to use createIndex(), but not how to execute actual searches with the Java API. Is this the correct way to go about doing this?

That's not quite right. Queries that use indexes of type "text" cannot specify a field name at query time. Instead, the field names to include in the index are specified at index creation time. See the documentation for examples. Your query will look like this:
DBObject q = QueryBuilder.start().text(searchString).get();
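For reference, a fuller sketch with the legacy driver (the connection details and the database name "blog" are assumed here; the collection and field names come from the question, and MongoClient, BasicDBObject, QueryBuilder and DBCursor are the usual com.mongodb classes):
MongoClient mongo = new MongoClient("localhost", 27017);
DB db = mongo.getDB("blog"); // database name assumed
DBCollection coll = db.getCollection("blogcomments");

// The fields covered by the text index are fixed at creation time, not at query time.
coll.createIndex(new BasicDBObject("blogcomments", "text"));

String searchString = "Test String";
DBObject q = QueryBuilder.start().text(searchString).get();
DBCursor cursor = coll.find(q);
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}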

Related

Mongodb Java access collection using regex

All of the documents in my collection contain a string field, "sourceTimeStamp", which looks like, for example, 2018-11-15T14:20:06. I am trying to come up with a way to get a particular day's worth of data. I can access the data directly from RoboMongo using:
db.getCollection('archive_Nov_15_8pm_2018').find({ "tfms_object.sourceTimeStamp" : { $regex : /^2018-11-25*/}})
This returns many documents. But I need to do this in Java, so I tried this:
DBCollection collection = db.getCollection(ARCHIVE_COLLECTION);
Pattern pat = Pattern.compile("^2018-11-15.*");
BasicDBObject query = new BasicDBObject("departureTime", pat);
List<BasicDBObject> obj = new ArrayList<BasicDBObject>();
query.put("$and", obj);
However, I get 0 documents returned. Any ideas?
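For reference, the shell query above maps to the legacy Java driver roughly as follows (a sketch; it assumes the regex should target the tfms_object.sourceTimeStamp path used in the shell rather than departureTime, and it drops the empty $and clause):
DBCollection collection = db.getCollection(ARCHIVE_COLLECTION);
Pattern pattern = Pattern.compile("^2018-11-15");
BasicDBObject query = new BasicDBObject("tfms_object.sourceTimeStamp", pattern);
DBCursor cursor = collection.find(query);
while (cursor.hasNext()) {
    DBObject doc = cursor.next();
    // process each matching document here
}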

Apache Lucene createWeight() for wildcard query

I'm using Apache Lucene 6.6.0 and I'm trying to extract terms from the search query. The current version of the code looks like this:
Query parsedQuery = new AnalyzingQueryParser("", analyzer).parse(query);
Weight weight = parsedQuery.createWeight(searcher, false);
Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
It works pretty much fine, but recently I noticed that it doesn't support queries with wildcards (i.e. * sign). If the query contains wildcard(s), then I get an exception:
java.lang.UnsupportedOperationException: Query id:123*456 does not implement createWeight
    at org.apache.lucene.search.Query.createWeight(Query.java:66)
    at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:751)
    at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:60)
    at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:225)
So is there a way to use createWeight() with wildcarded queries? Or maybe there's another way to extract search terms from query without createWeight()?
Long story short, it is necessary to rewrite the query, for example, as follows:
final AnalyzingQueryParser analyzingQueryParser = new AnalyzingQueryParser("", analyzer);
// TODO: The rewrite method can be overridden.
// analyzingQueryParser.setMultiTermRewriteMethod(MultiTermQuery.CONSTANT_SCORE_BOOLEAN_REWRITE);
Query parsedQuery = analyzingQueryParser.parse(query);
// Here parsedQuery is an instance of the org.apache.lucene.search.WildcardQuery class.
parsedQuery = parsedQuery.rewrite(reader);
// Here parsedQuery is an instance of the org.apache.lucene.search.MultiTermQueryConstantScoreWrapper class.
final Weight weight = parsedQuery.createWeight(searcher, false);
final Set<Term> terms = new HashSet<>();
weight.extractTerms(terms);
Please refer to the mailing-list thread "How to get the terms matching a WildCardQuery in Lucene 6.2?" (available on Nabble and The Mail Archive) for further details.
It seems the mentioned Stack Overflow question is this one: How to get matches from a wildcard Query in Lucene 6.2.

Java method for MongoDB collection.save()

I'm having a problem with MongoDB in Java when I try to add documents with a customized _id field. When I insert a new document into the collection, I want to ignore it if its _id already exists.
In the Mongo shell, collection.save() can be used for this, but I cannot find the equivalent method in the MongoDB Java driver.
Just to add an example: I have a collection of documents containing websites' information, with the URL as the _id field (which is unique). I want to add some more documents, some of which may already exist in the current collection, so I want to keep adding all the new documents except for the duplicate ones.
This can be achieved with collection.save() in the Mongo shell, but I can't find the equivalent method in the MongoDB Java driver.
Hopefully someone can share the solution. Thanks in advance!
In the MongoDB Java driver, you could try using a BulkWriteOperation obtained from the initializeUnorderedBulkOperation() method of the DBCollection object (an ordered bulk operation would stop at the first error, which is not what you want here). This is used as follows:
MongoClient mongo = new MongoClient("localhost", port_number);
DB db = mongo.getDB("db_name");
DBCollection col = db.getCollection("collection_name");
List<DBObject> objectList = new ArrayList<>(); // fill this list with the documents to insert
BulkWriteOperation operation = col.initializeUnorderedBulkOperation();
for (DBObject obj : objectList) {
    operation.insert(obj);
}
BulkWriteResult result = operation.execute();
With an unordered bulk write the documents are sent together but each insert is handled individually, so a document with a duplicated _id produces an error for that insert while the remaining documents are still processed. If any insert fails, execute() throws a BulkWriteException; its getWriteResult() still returns a BulkWriteResult, and getInsertedCount() tells you how many documents were really inserted.
This can prove to be a bit inefficient if a lot of data is inserted this way, though. This is just sample code (found on journaldev.com and edited to fit your situation); you may need to adapt it to your configuration, and it is untested.
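To make the duplicate handling explicit, the execute() call above can be wrapped like this (a sketch; BulkWriteException is com.mongodb.BulkWriteException):
try {
    BulkWriteResult result = operation.execute();
    System.out.println("Inserted: " + result.getInsertedCount());
} catch (BulkWriteException e) {
    // Duplicate _id values end up here; the non-duplicate documents were still inserted.
    System.out.println("Inserted: " + e.getWriteResult().getInsertedCount());
    System.out.println("Rejected: " + e.getWriteErrors().size());
}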
I guess save does something like this:
fun save(doc: Document, col: MongoCollection<Document>) {
    if (doc.getObjectId("_id") == null) {
        doc.put("_id", ObjectId()) // generate a new id when the document has none
    }
    // upsert so the document is inserted when no matching _id exists yet
    col.replaceOne(Document("_id", doc.getObjectId("_id")), doc, ReplaceOptions().upsert(true))
}
Maybe they removed save so that you decide yourself how to generate the new id.
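Since the goal stated above is to skip documents whose _id already exists rather than overwrite them (which is what save() does), another option, not from the answers above but a common pattern with the modern driver, is to insert and ignore the duplicate-key error:
import com.mongodb.ErrorCategory;
import com.mongodb.MongoWriteException;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

void insertIfAbsent(Document doc, MongoCollection<Document> col) {
    try {
        col.insertOne(doc);
    } catch (MongoWriteException e) {
        // Swallow only duplicate _id errors; rethrow anything else.
        if (e.getError().getCategory() != ErrorCategory.DUPLICATE_KEY) {
            throw e;
        }
    }
}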

Using regular Expressions with Mongodb

I am using Java with Spring to work with MongoDB. I need to find documents in which the word 'manager' exists in the description field. I tried the following two methods.
Method 1
Query query = new Query();
query.addCriteria(Criteria.where("discription").regex("/\bmanager\b/"));
Method 2
Query query = new Query();
Pattern p = Pattern.compile("/\bmanager\b/");
query.addCriteria(Criteria.where("discription").regex(p));
But neither of these worked. I tried it in the MongoDB console like this:
db.test.find({discription: {$regex: /\bmanager\b/}})
It worked as I expected. What's wrong with my Java code?
You don't have to add the slashes around the regex, as the regex method takes care of that; also note that backslashes must be escaped in a Java string literal. So
Query query = new Query();
query.addCriteria(Criteria.where("description").regex("\\bmanager\\b"));
should work.
It looks like you can just pass your regex string straight through without using Pattern.compile(). Have you tried that?
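For reference, a minimal sketch of the working query in Spring Data (mongoTemplate and the Employee domain class are assumed here for illustration; the field and collection names come from the shell example, and the backslashes are doubled as required in a Java string literal):
Query query = new Query();
query.addCriteria(Criteria.where("discription").regex("\\bmanager\\b"));
List<Employee> results = mongoTemplate.find(query, Employee.class, "test");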

How to query mongodb with “like” using the java api without using Pattern Matching?

Currently I am using Java to connect to MongoDB.
I want to write this SQL query in MongoDB using the Java driver:
select * from tableA where name like("%ab%")
Is there any solution to perform the same task through Java?
The query in MongoDB is very simple, I know; the query is
db.collection.find({name:/ab/})
but how do I perform the same task in Java?
Currently I am using pattern matching to perform the task, and the code is
DBObject A = QueryBuilder.start("name").is(Pattern.compile("ab",
Pattern.CASE_INSENSITIVE)).get();
but I think it makes the query very slow. Does a solution exist that does not use pattern matching?
You can use regular expressions. Take a look at the following:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-RegularExpressions
Make sure you understand the potential performance impacts!
DBObject A = QueryBuilder.start("name").is(Pattern.compile("ab",
Pattern.CASE_INSENSITIVE)).get();
I think this is one of the possible solutions; you need to create an index on the field to make it perform well.
Why do you fear regular expressions? Once the expression is compiled they are very fast, and if the expression is "ab" the result is similar to a function that searches for a substring in a string.
However, to do what you need you have two possibilities:
The first one is using a regular expression, as you mention in your question, and I believe this is the best solution.
The second one is using $where queries.
With $where queries you can specify expressions like these:
db.foo.find({"$where" : "this.x + this.y == 10"})
db.foo.find({"$where" : "function() { return this.x + this.y == 10; }"})
and so you can use the JavaScript .indexOf() on string fields.
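For the second option, a sketch with the legacy Java driver (assuming the same name field and a collection name of "yourCollection"; keep in mind that $where runs JavaScript on the server and cannot use indexes):
DBCollection coll = db.getCollection("yourCollection");
DBObject whereQuery = new BasicDBObject("$where", "this.name.indexOf('ab') != -1");
DBCursor cursor = coll.find(whereQuery);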
Code snippet using the $regex clause (as mentioned by mikeycgto)
String searchString = "ab";
DBCollection coll = db.getCollection("yourCollection");
query.put("name",
new BasicDBObject("$regex", String.format(".*((?i)%s).*", searchString)) );
DBCursor cur = coll.find(query);
while (cur.hasNext()) {
DBObject dbObj = cur.next();
// your code to read the DBObject ..
}
As long as you are not opening and closing the connection per method call, the query should be fast.
