How to save a searchable and queryable JSON document in Postgres? - java

I'm receiving a person's profile as JSON. How can I model it so that every value of this JSON document is searchable?
The document should not only support free-text search. It should also be queryable, e.g. "find all the persons who like Tarantino movies".
I could define this document in a relational model with one-to-many relationships, but that approach wouldn't allow free-text search from the client side. Is there a better way to handle such scenarios? The document looks like this:
{
  "name": "FirstN LastN",
  "photo": "nicephoto.jpg",
  "location": "Boston, MA",
  "contacts": [
    {
      "type": "phone",
      "value": "701290012734"
    },
    {
      "type": "email",
      "value": "test@test.com"
    }
  ],
  "movies": [
    {
      "name": "The Godfather",
      "director": "Francis Ford Coppola",
      "releaseYear": "1972",
      "favQuote": "I'm gonna make him an offer he can't refuse. Okay?"
    },
    {
      "name": "Pulp Fiction",
      "director": "Quentin Tarantino",
      "releaseYear": "1994",
      "favQuote": "Just because you are a character doesn't mean that you have character."
    }
  ],
  "school": null
}

"find all the persons who like Tarantino movies" needs to be written or converted in SQL like:
select persons->>'name' from jdoc,json_array_elements(jdoc.persons->'movies') movies where movies->>'director' ~ 'Tarantino';
other selection criteria can be modeled in similar way.
Requires Postgres 9.3 or later
http://sqlfiddle.com/#!15/652eb/10
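Since the question is tagged java, here is a minimal JDBC sketch of running that query from Java (the connection URL, user, and password are placeholders, not from the question):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FindTarantinoFans {
    public static void main(String[] args) throws Exception {
        // Same SQL as above, with the director pattern bound as a parameter.
        String sql = "select persons->>'name' "
                + "from jdoc, json_array_elements(jdoc.persons->'movies') movies "
                + "where movies->>'director' ~ ?";
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "Tarantino");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}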
To the question "how to save":
create table jdoc (persons json);
insert into jdoc values ('{
  "name": "FirstN LastN",
  "photo": "nicephoto.jpg",
  "location": "Boston, MA",
  "contacts": [
    {
      "type": "phone",
      "value": "701290012734"
    },
    {
      "type": "email",
      "value": "test@test.com"
    }
  ],
  "movies": [
    {
      "name": "The Godfather",
      "director": "Francis Ford Coppola",
      "releaseYear": "1972",
      "favQuote": "I''m gonna make him an offer he can''t refuse. Okay?"
    },
    {
      "name": "Pulp Fiction",
      "director": "Quentin Tarantino",
      "releaseYear": "1994",
      "favQuote": "Just because you are a character doesn''t mean that you have character."
    }
  ],
  "school": null
}');
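From Java, the same insert can be done over JDBC by binding the raw JSON string and casting it to json; a sketch along these lines (connection details and the shortened document are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SaveProfile {
    public static void main(String[] args) throws Exception {
        String profileJson = "{\"name\":\"FirstN LastN\",\"photo\":\"nicephoto.jpg\"}"; // shortened sample document
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(
                     "insert into jdoc (persons) values (?::json)")) {
            ps.setString(1, profileJson);
            ps.executeUpdate();
        }
    }
}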

You might be looking for FLWOR:
for $d in doc("depts.xml")//deptno
let $e := doc("emps.xml")//employee[deptno = $d]
where count($e) >= 10
order by avg($e/salary) descending
return
  <big-dept>
    { $d,
      <headcount>{count($e)}</headcount>,
      <avgsal>{avg($e/salary)}</avgsal> }
  </big-dept>
although it doesn't look like Postgres has plans to support XQuery.

Related

How can I fix this ElasticSearch Fielddata exception in Java code?

I'm working on Java code to create an index and query on ElasticSearch.
I keep getting this exception when trying to use the count and sort APIs:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true ......
How can I set Fielddata to true?
I used BulkRequest to create the index; how can I add a mapping to the BulkRequest?
Here is the code that creates the index:
BulkRequest request = new BulkRequest();
try {
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    String line;
    while ((line = br.readLine()) != null) {
        request.add(new IndexRequest(indexName, type).source(line, XContentType.JSON));
        BulkResponse bulkresp = client.bulk(request);
        afterBulk(request, bulkresp);
    }
} catch (IOException e) {
    e.printStackTrace();
}
First of all, let's go to the source of the problem: you want to sort on a text field, which requires fielddata to be enabled.
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn't make sense to do so.
A text field is analyzed before indexing so that a value like New York can be found by searching for new or for york. A terms aggregation on this field will return a new bucket and a york bucket, when you probably want a single bucket called New York.
The same would be the case for sorting: how are you supposed to sort on a field where you have tons of terms?
Instead, you should have a text field for full-text searches, and an unanalyzed keyword field with doc_values enabled for aggregations, as follows:
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
To the other part of the question - you need to take a look at CreateIndexRequest, which allows you to specify mappings explicitly. Most likely you're using dynamic mappings right now, which is why fielddata is causing you problems. More information on how to use CreateIndexRequest: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index
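For illustration, a minimal sketch of creating the index with an explicit text-plus-keyword mapping before running the BulkRequest (indexName, type, and client are the variables from the question; exact method signatures vary a bit between high-level REST client versions, so treat this as an outline rather than the definitive API):

import java.io.IOException;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class IndexSetup {
    // Create the index with an explicit mapping so that sorting/aggregations
    // can use my_field.keyword instead of enabling fielddata on the text field.
    static void createIndexWithMapping(RestHighLevelClient client,
                                       String indexName, String type) throws IOException {
        CreateIndexRequest createIndex = new CreateIndexRequest(indexName);
        createIndex.mapping(type,
                "{ \"properties\": { \"my_field\": { \"type\": \"text\", "
              + "\"fields\": { \"keyword\": { \"type\": \"keyword\" } } } } }",
                XContentType.JSON);
        client.indices().create(createIndex); // then add documents with the BulkRequest as before
    }
}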

Solr string field search with special characters

I have just started to work on Solr. There is a phone field, and it has been defined in the schema like below:
<field docValues="true" indexed="true" multiValued="true" name="phones" stored="true" type="StrField"/>
From my understanding the string field will try to do an exact match, but the user can use any format to search for the phone number, with special characters like (111) 111-1111. So I used ClientUtils.escapeQueryChars to add a backslash before the special characters, but the search does not return any results. I have been trying to understand why - is there any rule that special characters cannot be escaped for a string field? I don't think the tokenizer matters as it is a string field, and I use the edismax parser. Any ideas?
Using Solr 7.3.1 I reproduced what you've asked and can confirm that as long as you escape (, ), and the space properly, you'll get the hits you're looking for.
Schema
id: string
phones: string (multivalued, docvalues, indexed, stored)
Documents
{
"id":"doc1",
"phones":["(111) 111-1111"],
"_version_":1602190176246824960
},
{
"id":"doc2",
"phones":["111 111-1111"],
"_version_":1602190397829808128
},
{
"id":"doc3",
"phones":["111 (111)-1111"],
"_version_":1602190400002457600
}
Query
/select?q=phones:\(111\)\ 111-1111
{
"id":"doc1",
"phones":["(111) 111-1111"],
"_version_":1602190176246824960
}
/select?debugQuery=on&q=phones:111\ 111-1111
{
"id":"doc2",
"phones":["111 111-1111"],
"_version_":1602190397829808128
}
/select?debugQuery=on&q=phones:1111111111
"response":{"numFound":0,"start":0,"docs":[]}
The behavior is exactly as described - exact matches only.
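For reference, the escaping the question attempts works from SolrJ as well; a small sketch (the core name foo and the field phones come from the example above; the SolrJ calls are standard but not re-tested here):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class PhoneSearch {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/foo").build();
        // escapeQueryChars backslash-escapes (, ), whitespace, and the other reserved characters.
        String escaped = ClientUtils.escapeQueryChars("(111) 111-1111");
        QueryResponse response = solr.query(new SolrQuery("phones:" + escaped));
        System.out.println(response.getResults().getNumFound()); // expect 1 (doc1) with the data above
        solr.close();
    }
}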
Getting the behavior you want with PatternReplaceCharFilterFactory
Let's create a custom field type that removes anything that's not a number or letter:
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type": {
    "name": "phoneStripped",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "analyzer": {
      "charFilters": [{
        "class": "solr.PatternReplaceCharFilterFactory",
        "replacement": "",
        "pattern": "[^a-zA-Z0-9]"
      }],
      "tokenizer": {
        "class": "solr.KeywordTokenizerFactory"
      }
    }
  }
}' http://localhost:8983/solr/foo/schema
Then we create a new field named phone_stripped using this new field type (you can do this in the UI), and reindex our documents - now using the new field name:
{
"id":"doc1",
"phone_stripped":"(111) 111-1111"
},
{
"id":"doc3",
"phone_stripped":"111 (111)-1111"
},
{
"id":"doc2",
"phone_stripped":"111 111-1111"
}
And then we search for just 1111111111:
"response":{"numFound":3,"start":0,"docs":[ .. all our docs ..]
Using the previous search, phone_stripped:\(111\)\ 111-1111:
"response":{"numFound":3,"start":0,"docs":[ .. all our docs ..]
And just to make sure we haven't broken things in unspeakable ways, let's search for phone_stripped:\(111\)\ 111-1112:
"response":{"numFound":0,"start":0,"docs":[]

How to add a sort query to my StructuredQueryBuilder before talking to Marklogic

I am trying to add a sort/ordering query.
At my java:
StructuredQueryBuilder qb = new StructuredQueryBuilder();
QueryDefinition queryDef = qb.and(qb.value(qb.jsonProperty("status"), "Active"));
SearchHandle resultsHandle = new SearchHandle();
queryManager.setPageLength(PAGE_SIZE_TEN);
int start = PAGE_SIZE_TEN * (pageNumber - 1) + 1;
queryManager.search(queryDef, resultsHandle, start);
The above returns the resultsHandle with the 10 JSON files found for the page specified by the variable "start", all with status "Active".
My question is how do I include a sorting query, maybe something along the lines of the following:
QueryDefinition queryDef = qb.and(qb.value(qb.jsonProperty("status"), "Active"),
        qb.sort?(qb.jsonProperty("dateCreated")));
I want it to get me the first 10 JSON files ordered by latest date. It is too late to sort with a Comparator after getting the result, as the result returns an arbitrary 10 JSON files in no particular order.
A few samples of the json files will look as such:
1.json
[
  {
    "id": "1",
    "dateCreated": "2017-10-01 12:00:00",
    "status": "Active",
    "body": "This is a test"
  }
]
2.json
[
  {
    "id": "2",
    "dateCreated": "2017-10-02 12:00:00",
    "status": "Active",
    "body": "This is a test 2"
  }
]
I realized there's an enum StructuredQueryBuilder.Ordering; how do I use it?
StructuredQueryBuilder.Ordering is specifically for use with near-query and is unrelated to what you want to do. You need to use query options to define a sort order for your search results. See the sort-order query option:
http://docs.marklogic.com/guide/search-dev/appendixa#id_44212
Options can be pre-defined and installed on MarkLogic and then referenced in your search, or you can define them at runtime and combine them with your structured query in a combined query.
Predefined: http://docs.marklogic.com/guide/java/query-options#chapter
Dynamic: http://docs.marklogic.com/guide/java/searches#id_76144
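As a rough sketch of the dynamic route (building on the variables from the question): serialize the structured query and wrap it, together with a sort-order option, in a combined query. This assumes a range index exists on dateCreated; xs:string is used here because it sorts the sample "yyyy-MM-dd HH:mm:ss" values lexicographically, which matches chronological order, but the type must agree with your index configuration.

import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.query.RawCombinedQueryDefinition;

// Combined query = the serialized structured query + options with a sort-order.
String combined =
        "<search xmlns=\"http://marklogic.com/appservices/search\">"
      +   qb.and(qb.value(qb.jsonProperty("status"), "Active")).serialize()
      +   "<options xmlns=\"http://marklogic.com/appservices/search\">"
      +     "<sort-order type=\"xs:string\" direction=\"descending\">"
      +       "<json-property>dateCreated</json-property>"
      +     "</sort-order>"
      +   "</options>"
      + "</search>";

RawCombinedQueryDefinition sortedQuery =
        queryManager.newRawCombinedQueryDefinition(
                new StringHandle(combined).withFormat(Format.XML));
queryManager.search(sortedQuery, resultsHandle, start);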

Complex queries in elasticsearch

Let's say we have an entity "Device" that contains another entity "DeviceInfo", we have an entity "Site" which contains a List of "DeviceInfo" entities, and "DeviceInfo" has a "Device" and a "Site" in its properties.
My task was to find all "Device"s which are in one "Site". To some endpoint I would send a "Site" id plus a page number and page size (since it has to be pageable). I have made it work by creating a JPA Specification:
public static Specification<Device> bySearchRequest(final DeviceSearchRequest searchRequest) {
    return (root, query, cb) -> {
        final Join<Device, DeviceInfo> deviceInfo
                = root.join(Device_.deviceInfo, JoinType.LEFT);
        final Join<DeviceInfo, Site> site
                = deviceInfo.join(DeviceInfo_.site, JoinType.LEFT);
        return cb.and(cb.equal(site.get(Site_.id), searchRequest.getSiteId()));
    };
}
And then I would convert the "Device"s to "IndexedDevice"s, which live in ES.
deviceRepository.findAll(currentUser,
        DeviceRepository.Specs.bySearchRequest(searchRequest),
        new PageRequest(searchRequest.getPage(), searchRequest.getSize()))
    .getContent().stream().map(x -> indexedDeviceConverter.convert(x)).collect(Collectors.toList());
That is it. It works. But here I am fetching the data from the DB, and I already have everything in Elasticsearch. Is there a way to make this same query fetch the data directly from ES (with paging)?
The only difference is that in ES, "IndexedDevice" has a direct relation to an "IndexedSite" (there is no "IndexedDeviceInfo").
IndexedDevice
{
  "id": "3eba5104-0c7a-4564-8270-062945cc8f5e",
  "name": "D4",
  "site": {
    "id": "46e7ada4-3f34-4962-b849-fac59c8fe8ad",
    "name": "SomeSite",
    "displayInformation": "SomeSite",
    "subtitle": ""
  },
  "suggest": {
    "input": []
  },
  "displayInformation": "D4",
  "subtitle": ""
}
IndexedSite
{
"id": "46e7ada4-3f34-4962-b849-fac59c8fe8ad",
"name": "SomeSite",
"displayInformation": "SomeSite",
"subtitle": ""
}
I managed to do it. In the end it was really simple. I used ElasticsearchRepository (org.springframework.data.elasticsearch.repository).
elasticsearchRepository.search(
        QueryBuilders.termsQuery("site.id", searchRequest.getSite()),
        new PageRequest(searchRequest.getPage(), searchRequest.getSize()));
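For completeness, the repository behind that call is just a Spring Data Elasticsearch repository interface; a minimal sketch (the interface name is mine, and the generic types are an assumption about the IndexedDevice id field):

import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// search(QueryBuilder, Pageable) is inherited from ElasticsearchRepository,
// so no extra method declarations are needed for the query above.
public interface IndexedDeviceRepository extends ElasticsearchRepository<IndexedDevice, String> {
}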

How to custom search for text query in mongodb?

I'm new to MongoDB. I have the following data in JSON format in MongoDB. I need to search the bookLabel or the shortLabel for a book, and it should show me all the information about that book. For example: if I query for 'Cosmos' it should show the whole description of the book: bookLabel, writer, yearPublish, url. How can I do that in Java? I need the query, please help.
"Class":"Science",
"Description":[
{
"bookLabel":"Cosmos (Mass Market Paperback)",
"shortLabel":"Cosmos",
"writer":"Carl Sagan",
"yearPublish":[
"2002"
],
"url":"https://www.goodreads.com/book/show/55030.Cosmos"
},
{
"bookLabel":"The Immortal Life of Henrietta Lacks",
"shortLabel":"Immortal Life",
"writer":"Rebecca Skloot",
"yearPublish":[
"2010, 2011"
],
"url":"https://www.goodreads.com/book/show/6493208-the-immortal-life-of-henrietta-lacks"
}
],
"Class":"History",
"Description":[
{
"bookLabel":"The Rise and Fall of the Third Reich",
"shortLabel":"Rise and Fall",
"writer":"William L. Shirer",
"yearPublish":[
"1960"
],
"url":"https://www"
}
]
}
With MongoDB Java Driver v3.2.2 you can do something like this:
FindIterable<Document> iterable = collection.find(Document.parse("{\"Description.shortLabel\": {$regex: \"Cosmos\"}}"));
This returns all documents containing Cosmos in the Description.shortLabel nested field. For an exact match, try {"Description.shortLabel": "Cosmos"}. Replace shortLabel with bookLabel to search the bookLabel field. Then you can do iterable.forEach(new Block<Document>()) on the returned documents. To search both bookLabel and shortLabel, you can use an $or query. My syntax could be wrong, so check the MongoDB manual, but this is the general idea.
For this, you can use MongoDB's Text Search Capabilities. You'll have to create a text index on your collection for that.
First of all, create a text index on your collection on the fields bookLabel and shortLabel.
db.books.createIndex({ "Description.bookLabel" : "text", "Description.shortLabel" : "text" })
Note that this is done in the Mongo shell
Then
DBObject command = BasicDBObjectBuilder
        .start("text", "books")
        .append("search", "Cosmos")
        .get();
CommandResult result = db.command(command);
BasicDBList results = (BasicDBList) result.get("results");
for (Object o : results) {
    DBObject dbo = (DBObject) ((DBObject) o).get("obj");
    Object id = dbo.get("_id");
    System.out.println(id);
}
Haven't really tested this. But just give it a try. Should work.
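Note that the db.command("text", ...) form shown above is the old text command, which was deprecated and later removed in favor of the $text query operator. With the 3.x Java driver mentioned in the other answer, you would query the same text index through a $text filter; a minimal sketch (database and collection names follow the example above):

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class BookTextSearch {
    public static void main(String[] args) {
        MongoClient mongo = new MongoClient("localhost", 27017);
        MongoCollection<Document> books = mongo.getDatabase("test").getCollection("books");
        // Uses the text index created on Description.bookLabel / Description.shortLabel above.
        for (Document doc : books.find(Filters.text("Cosmos"))) {
            System.out.println(doc.toJson());
        }
        mongo.close();
    }
}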
