How to query MarkLogic with punctuation-sensitive terms using Java? - java

I have the following data stored in MarkLogic, one entry per JSON file:
1.json>> "dateSubmitted" : "2017/10/11 09:15:14"
2.json>> "dateSubmitted" : "2017/10/11 10:13:14"
3.json>> "dateSubmitted" : "2017/10/14 11:12:13"
My query term is:
String dateQuery = "2017/10/11";
I tried two methods, and neither works.
Method 1:
StructuredQueryBuilder qb = new StructuredQueryBuilder();
QueryDefinition queryDef = qb.and(qb.word(qb.jsonProperty("dateSubmitted"), dateQuery));
queryDef.setDirectory(DIRECTORY);
SearchHandle resultsHandle = new SearchHandle();
queryManager.search(queryDef, resultsHandle, start);
Method 2:
StructuredQueryBuilder qb = new StructuredQueryBuilder();
String[] wordQueryOptions = {"punctuation-sensitive", "space-sensitive"};
QueryDefinition queryDef = qb.and(qb.word(qb.jsonProperty("dateSubmitted"),
FragmentScope.DOCUMENTS,
wordQueryOptions, 100.0, dateQuery));
queryDef.setDirectory(DIRECTORY);
SearchHandle resultsHandle = new SearchHandle();
queryManager.search(queryDef, resultsHandle, start);
The expected result is to return only 1.json and 2.json.
However 3.json was also returned.
Is there a setting I'm missing in my MarkLogic admin configuration to activate options such as punctuation-sensitive?

Working with dates is often easier and more powerful if you index the property as a date. That way, you can do before and after matches on the date as well as sort on the date.
To index a property as a date, you can create a range index on the date. You can then use a range query on the date.
In MarkLogic 9, you can also use TDE to project rows from the documents with a column for the dates.
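One thing to watch for with a date range index: the stored values look like "2017/10/11 09:15:14", which is not the ISO 8601 lexical form that xs:dateTime uses (2017-10-11T09:15:14). Below is a minimal sketch, using only java.time, of converting the stored values at ingest and turning the query term into half-open day bounds that a range query could use. The class and method names are made up for illustration, and how you feed the bounds into your range query depends on how you set up the index.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; names are illustrative and not part of any MarkLogic API. */
final class SubmittedDates {
    // Format of the stored "dateSubmitted" values, e.g. "2017/10/11 09:15:14".
    private static final DateTimeFormatter STORED =
            DateTimeFormatter.ofPattern("uuuu/MM/dd HH:mm:ss");
    // Format of the query term, e.g. "2017/10/11".
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("uuuu/MM/dd");
    // ISO 8601 local date-time, the lexical form used by xs:dateTime.
    private static final DateTimeFormatter ISO =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    /** Converts a stored value to ISO 8601 so it can be indexed as a date-time. */
    static String toIso(String stored) {
        return LocalDateTime.parse(stored, STORED).format(ISO);
    }

    /** Returns {start, end}: start of the given day (inclusive) and of the next day (exclusive). */
    static String[] dayBounds(String day) {
        LocalDate d = LocalDate.parse(day, DAY);
        return new String[] {
            d.atStartOfDay().format(ISO),
            d.plusDays(1).atStartOfDay().format(ISO)
        };
    }
}
```

With the bounds for "2017/10/11", a query for values greater than or equal to the start and less than the end would match the values in 1.json and 2.json but not the one in 3.json.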
Hoping that helps,

Related

Unable to parse 2022-10-04T19:24:50Z format in Elasticsearch Java implementation

SearchRequest searchRequest = Requests.searchRequest(indexName);
SearchSourceBuilder builder = new SearchSourceBuilder();
Gson gson = new Gson();
QueryBuilder querybuilder = QueryBuilders.wrapperQuery(query);
query : {
"range": {
"timecolumn": {
"gte":"2022-10-07T09:45:13Z",
"lte":"2022-10-07T09:50:50Z"
}
}
}
When I pass the query above, I get a parser exception. I cannot change the date format, because data is being inserted into the DB in this same format.
I need advice on:
How can we parse this kind of timestamp in the Elasticsearch Java client? If that is not possible,
how can we control the pattern during data insertion? My column is defined as text and takes the date format "2022-10-07T09:45:13Z" as text.
Either I have to pass this format to the ES parser, or I have to change the format to 2022-10-07 09:45:13 during insertion itself.
I cannot convert for each row after inserting because we have lakhs of data
As you mention, Elasticsearch is storing timecolumn as a text type, so I suggest changing the mapping of timecolumn to the date type; you will then be able to use a range query with dates. If you store the date as text and apply a range query, it will not return the expected result.
{
"mappings": {
"properties": {
"timecolumn": {
"type": "date"
}
}
}
}
Now, coming to your Java code issue: since you are using the Java client, you can use the code below to create the range query.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder query = QueryBuilders.rangeQuery("timecolumn").gte("2022-10-07T09:45:13Z").lte("2022-10-07T09:50:50Z");
searchSourceBuilder.query(query);
searchRequest.source(searchSourceBuilder);
Regarding your concern about reindexing data:
I cannot convert for each row after inserting because we have lakhs of data
You can use the Reindex API to move the data from the original index to a temp index and then delete the original index. Then define the index with the proper mapping and use the same Reindex API again to copy the data from the temp index back to the original index with the new mapping.
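On the format question itself: "2022-10-07T09:45:13Z" is an ISO 8601 instant in UTC, so it can be parsed on the Java side with java.time alone. Here is a small sketch, with made-up class and method names, that parses such a value and converts it either to epoch milliseconds or to the "2022-10-07 09:45:13" form mentioned in the question. My understanding is that the default date mapping accepts the ISO form as is, so treat the second conversion as optional.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for the timestamp formats discussed in the question. */
final class TimeColumnFormats {
    // "2022-10-07 09:45:13" style, rendered in UTC.
    private static final DateTimeFormatter SPACE_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    /** Parses an ISO 8601 instant such as "2022-10-07T09:45:13Z" to epoch milliseconds. */
    static long toEpochMillis(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }

    /** Rewrites an ISO 8601 instant as "uuuu-MM-dd HH:mm:ss" in UTC. */
    static String toSpaceFormat(String iso) {
        return SPACE_FORMAT.format(Instant.parse(iso));
    }
}
```

Instant.parse throws a DateTimeParseException on malformed input, which makes it a cheap validity check before a value is sent to the index.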

Count results with MongoDB 3.0 Java Driver

I just want to get the number of results of some query. Specifically, I want to know how many users were online in the past 15 minutes. So, I set the connection up with:
mongoClient = new MongoClient("localhost", 3001);
database = mongoClient.getDatabase("database1");
Then in my method I get the collection and send a query:
MongoCollection<Document> users = database.getCollection("users");
users.find(and(gte("lastlogin", xvminago), lte("lastlogin", now)));
I'm not even sure if the last step is right. It seems so easy in JavaScript, but I can't find the .count() operation in Java, and the documentation is confusing and somehow different everywhere. (I use the MongoDB Java Driver 3.0.)
Use MongoCollection's count() method with a query filter. The example uses the DateTime class from the Joda-Time library, which simplifies date manipulation in Java, and converts the values to java.util.Date so that the driver can encode them. First get the current time and the time 15 minutes before it:
DateTime dt = new DateTime();
Date now = dt.toDate();
Date subtracted = dt.minusMinutes(15).toDate();
Then use the variables to construct a date range query for use in the count() method:
Document query = new Document("lastlogin", new Document("$gte", subtracted).append("$lte", now));
mongoClient = new MongoClient("localhost", 3001);
long count = mongoClient.getDatabase("database1")
.getCollection("users")
.count(query);
On a sharded cluster, the underlying db.collection.count() method can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress. So it's safer to use aggregate() method instead:
Iterator<Document> it = mongoClient.getDatabase("database1")
.getCollection("users")
.aggregate(Arrays.asList(
new Document("$match", new Document("lastlogin",
new Document("$gte", subtracted).append("$lte", now))
),
new Document("$group", new Document("_id", null)
.append("count",
new Document("$sum", 1)
)
)
)
).iterator();
int count = it.hasNext() ? (Integer)it.next().get("count") : 0;
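If you would rather not add Joda-Time, the same 15-minute window can be computed with java.time (Java 8+) and handed to the driver as java.util.Date values. This is only a sketch; the LoginWindow class is a hypothetical holder for the two bounds, not part of the driver.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

/** Hypothetical holder for the lower and upper bound of a time window. */
final class LoginWindow {
    final Date from; // lower bound, for "$gte"
    final Date to;   // upper bound, for "$lte"

    LoginWindow(Instant now, Duration length) {
        this.to = Date.from(now);
        this.from = Date.from(now.minus(length));
    }

    /** Window covering the 15 minutes before the current time. */
    static LoginWindow lastFifteenMinutes() {
        return new LoginWindow(Instant.now(), Duration.ofMinutes(15));
    }
}
```

The two fields can then replace subtracted and now in the query document above, for example new Document("$gte", w.from).append("$lte", w.to).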

Elastic search range dates

I have created an Elastic search index from a Mongo database.
The documents in Mongo have the following structure:
{
"_id" : ObjectId("525facace4b0c1f5e78753ea"),
"time" : ISODate("2013-10-17T09:23:56.131Z"),
"type" : "A",
"url" : "www.google.com",
"name" : "peter",
}
The index was created (apparently) without any problems.
Now, I am trying to use Elastic Search to retrieve the documents in the index between two dates. I have read that I have to use range queries, but I have tried many times things like
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("name", "peter").type(Type.PHRASE).minimumShouldMatch("99%");
LocalDateTime toLocal = new LocalDateTime(2013,12,18, 0, 0);
Date to = toLocal.toDate();
LocalDateTime fromLocal = new LocalDateTime(2013,12,17, 0, 0);
Date from = fromLocal.toDate();
RangeQueryBuilder queryDate = QueryBuilders.rangeQuery("time").to(to).from(from);
FilterBuilder filterDate = FilterBuilders.queryFilter(queryDate);
srb = esH.client.prepareSearch("my_index");
srb.setQuery(queryBuilder);
srb.setFilter(filterDate);
sr = srb.execute().actionGet();
and I get 0 hits although there should be many results. I have tried to enter strings instead of dates, but same results.
When I perform a basic query without filters such as:
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("name", "peter").type(Type.PHRASE).minimumShouldMatch("99%");
SearchRequestBuilder srb = esH.client.prepareSearch("my_index");
srb.setQuery(queryBuilder);
SearchResponse sr = srb.execute().actionGet();
I get hits that look like this:
{
"_index" : "my_index",
"_type" : "type",
"_id" : "5280d3c2e4b05e95aa703e34",
"_score" : 1.375688, "_source" : {"type":["A"],"time":["Mon Nov 11 13:55:30 CET 2013"],"name":["peter"]}
}
Here the field time no longer has the format ISODate("2013-10-17T09:23:56.131Z").
To sum up, what would be the Java code (and types) for querying between two dates (and times), taking into account the format?
You are probably passing the wrong field name to the range query at this line:
RangeQueryBuilder queryDate = QueryBuilders.rangeQuery("time").to(to).from(from);
It should probably be @timestamp (or whichever field you're using to store your timestamp) instead of time. Additionally, it seems that there is no time field in Elasticsearch for the example document you included. This also suggests that the time field wasn't converted correctly from Mongo to Elasticsearch.
Can you try
FilterBuilders.rangeFilter("@timestamp").from("from time").to("toTime")
This will work:
You can pass long timestamps to the gte and lte params.
QueryBuilders.rangeQuery("time").gte(startTime).lte(endTime);
If you write startTime and endTime as literals, make sure to add an "L" at the end, so that the compiler treats them as long and not int.
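For the long-timestamp approach, here is a rough sketch of how the two bounds could be derived with java.time instead of typing the numbers by hand. It assumes the field holds epoch milliseconds and that the dates are meant in UTC; both are assumptions about your index, and EpochBounds is just an illustrative name.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical helper for building epoch-millisecond range bounds. */
final class EpochBounds {
    /** Interprets the local date-time as UTC and returns epoch milliseconds. */
    static long toEpochMillisUtc(LocalDateTime t) {
        return t.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Same inclusive semantics as gte(start).lte(end). */
    static boolean inRange(long start, long end, long value) {
        return value >= start && value <= end;
    }
}
```

For the question's dates, startTime would be EpochBounds.toEpochMillisUtc(LocalDateTime.of(2013, 12, 17, 0, 0)) and endTime the same call for the 18th. Both are already long values, so no "L" suffix is needed.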

Mongodb + Java Drivers. Search by date range

This is my first shot at using MongoDB with the Java driver. I can query the database from the command line using JavaScript and the Date() object; however, I am having trouble using the driver. Based on my query, can anybody see what the problem is? Thanks
Date current = new Date();
DBCollection coll = db.getCollection("messages");
BasicDBObject query = new BasicDBObject("created_on", new BasicDBObject("$gte", new Date(current.getYear(), current.getMonth(), current.getDate())).
append("created_on", new BasicDBObject("$lt", new Date(current.getYear(), current.getMonth() - 1, current.getDate()))));
System.out.println("Query: " + query);
DBCursor cursor = coll.find(query);
Query: { "created_on" : { "$gte" : { "$date" :
"2012-12-06T05:00:00.000Z"} , "created_on" : { "$lt" : { "$date" :
"2012-11-06T05:00:00.000Z"}}}}
P.S. In case it is not obvious, I'm trying to find all of the records within the last month.
Seems like you are constructing the query wrong. Please try the below one:
BasicDBObject query = new BasicDBObject("created_on", //
new BasicDBObject("$gte", new DateTime().minusMonths(1).toDate()).append("$lt", new DateTime().toDate()));
DateTime comes from Joda-Time, a library that simplifies date manipulation in Java. You can check it out:
http://joda-time.sourceforge.net/
Also morphia is a nice java object-document-mapper (ODM) framework for working with mongodb through java driver. It simplifies querying through java.
https://github.com/jmkgreen/morphia
Based on the query that was output, you are looking for a document with a field created_on that also has a child named created_on. I assume no such document exists. In other words, your query is not correctly formed.
Your query object should look like this:
BasicDBObject dateRange = new BasicDBObject("$gte", new Date(current.getYear(), current.getMonth() - 1, current.getDate()));
dateRange.put("$lt", new Date(current.getYear(), current.getMonth(), current.getDate()));
BasicDBObject query = new BasicDBObject("created_on", dateRange);
Also, as a sidebar, you probably should avoid using the three-argument constructor of the java.util.Date class, as it is deprecated. When working with dates in the MongoDB Java driver, I typically use the java.util.Calendar class, and its getTime() method.
I have not used the Java driver for mongo before, but it seems that the query you have created is not correct.
Query: { "created_on" : { "$gte" : { "$date" : "2012-12-06T05:00:00.000Z"} , "created_on" : { "$lt" : { "$date" : "2012-11-06T05:00:00.000Z"}}}}
The query should in fact end up looking like:
Query: { "created_on" : {$gte: start, $lt: end}}
Where start and end are dates. It seems like the second time you refer to "created_on" is unnecessary and in fact might be breaking your query.
NOTE: I have not had the chance to test out this theory, but I am working from http://cookbook.mongodb.org/patterns/date_range/ which seems to be very relevant to the question at hand.
The Joda-Time library is very useful. Please pass DateTimeZone.UTC as the time zone parameter of DateTime; once you set the time zone, you will get accurate results. Try this:
Calendar cal = Calendar.getInstance();
//get current year, month & day using Calendar
int year=cal.get(Calendar.YEAR);
int monthNumber=cal.get(Calendar.MONTH);
int dateNumber=cal.get(Calendar.DAY_OF_MONTH);
monthNumber+=1;
BasicDBObject query = new BasicDBObject("dateCreated",new BasicDBObject("$gte", new DateTime(year, monthNumber, dateNumber, 0, 0,DateTimeZone.UTC).toDate()).append("$lte",new DateTime(year, monthNumber, dateNumber, 23, 59,DateTimeZone.UTC).toDate()));
System.out.println("formed query: "+query);
DBCursor cursor = collection.find(query);
while(cursor.hasNext())
{
System.out.println("found doc in given time range: "+cursor.next().toString());
}
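To make the intended query shape easier to see, here is a sketch that builds { "created_on" : { "$gte" : start, "$lt" : end } } for "the last month" using plain java.util maps and java.time, with no driver on the classpath. The Map here only stands in for BasicDBObject; as far as I know BasicDBObject can be constructed from a Map, but treat that as something to verify against your driver version. LastMonthQuery is a made-up name.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder; the nested Map mirrors the shape of the MongoDB query document. */
final class LastMonthQuery {
    static Map<String, Object> build(LocalDate today) {
        // Upper bound: start of today (UTC). Lower bound: the same day one month earlier.
        Date end = Date.from(today.atStartOfDay(ZoneOffset.UTC).toInstant());
        Date start = Date.from(today.minusMonths(1).atStartOfDay(ZoneOffset.UTC).toInstant());

        // Both operators live in ONE sub-document under a single "created_on" key.
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("$gte", start);
        range.put("$lt", end);

        Map<String, Object> query = new LinkedHashMap<>();
        query.put("created_on", range);
        return query;
    }
}
```

Note that the lower bound goes with $gte and the upper bound with $lt; in the original query the two dates were the other way round, which would match nothing even once the nesting is fixed.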

Hibernate search with Criteria restriction returning incorrect count

The result list is perfect but the getResultSize() is incorrect.
I've knocked up some code to illustrate.
Criteria criteria2 = this.getSession().createCriteria(Film.class);
Criterion genre = Restrictions.eq("genreAlias.genreName", details.getSearch().getGenreName());
criteria2.createAlias("genres", "genreAlias", CriteriaSpecification.INNER_JOIN);
criteria2.add(genre);
criteria2.setMaxResults(details.getMaxRows())
.setFirstResult(details.getStartResult());
FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.createFullTextEntityManager(entityManager);
org.apache.lucene.queryParser.QueryParser parser2 = new QueryParser("title", new StopAnalyzer() );
org.apache.lucene.search.Query luceneQuery2 = parser2.parse("title:" + details.getSearch());
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery( luceneQuery2, Film.class);
fullTextQuery.setCriteriaQuery(criteria2);
fullTextQuery.getResultList(); // Returns the correctly filtered list
fullTextQuery.getResultSize(); // Returns the result size without the genre restriction
From http://docs.jboss.org/hibernate/search/3.3/api/org/hibernate/search/jpa/FullTextQuery.html
int getResultSize()
Returns the number of hits for this search. Caution: The number of results might be slightly different from getResultList().size() because getResultList() may be not in sync with the database at the time of query.
You should try to use some of the more specialized queries like this one:
Query query = new FuzzyQuery(new Term("title", q));
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query, Film.class);
int filmCount = fullTextQuery.getResultSize();
and this is how you do pagination requests (I'm guessing you have implemented your pagination improperly):
FullTextQuery hits = Search.getFullTextSession(getSession()).createFullTextQuery(query, Film.class)
.setFirstResult((pageNumber - 1) * perPageItems).setMaxResults(perPageItems);
The above works for me every time. Keep in mind that the result of getResultSize() is more of an estimate. I use pagination a lot and I have seen the number change between pages, so you should say "about xxxx" results.
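The offset arithmetic used in the pagination call can be isolated and checked on its own. A minimal sketch, assuming 1-based page numbers as in the snippet above; Paging is an illustrative name and not part of Hibernate Search.

```java
/** Hypothetical helper for 1-based pagination arithmetic. */
final class Paging {
    /** Index of the first row on the given page, suitable for setFirstResult(). */
    static int firstResult(int pageNumber, int perPageItems) {
        if (pageNumber < 1 || perPageItems < 1) {
            throw new IllegalArgumentException("pageNumber and perPageItems must be >= 1");
        }
        return (pageNumber - 1) * perPageItems;
    }

    /** Number of pages needed for resultSize hits (ceiling division). */
    static int pageCount(int resultSize, int perPageItems) {
        if (resultSize < 0 || perPageItems < 1) {
            throw new IllegalArgumentException("resultSize must be >= 0 and perPageItems >= 1");
        }
        return (resultSize + perPageItems - 1) / perPageItems;
    }
}
```

Since getResultSize() may change between requests, it is probably safest to recompute pageCount on every request rather than caching it.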
