MarkLogic search and retrieve specific fields - java

I am fairly new to MarkLogic (and NoSQL) and am currently learning the Java Client API. A search returns result snippets / matches; is it possible for the search result to also include specific fields from the document?
For example, given this document:
{"id":"1", "type":"classified", "description": "This is a classified type."}
And I search using this:
QueryManager queryMgr = client.newQueryManager();
StringQueryDefinition query = queryMgr.newStringDefinition();
query.setCriteria("classified");
queryMgr.search(query, resultsHandle);
How can I get the JSON document's 3 defined fields (id, type, description) as part of the search result - so I can display them in my UI table?
Do I need to hit the DB again by loading the document via URI (thus if I have 1000 records, that means hitting the DB again 1000 times)?

You have several options to retrieve specific fields with your search results. You could use the Pojo Data Binding Interface. You could read multiple documents matching a query which brings back the entirety of each document which you can then get as a pojo or String or any other handle. Or you can use the same API you're using above but add search options to allow you to extract a portion of a matching document.
If you're bringing back thousands of matches, you're probably not showing all those snippets to end users, so you should probably disable snippets using something like
<transform-results apply="empty-snippet" />
in your options.
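As a rough sketch of the last option, search options along these lines should return only the three properties from the question and skip snippet generation. This assumes MarkLogic 8 or later, where the extract-document-data option is available; the paths are guesses based on the sample document, so check them against your own data.

```xml
<options xmlns="http://marklogic.com/appservices/search">
  <!-- Assumed paths: top-level JSON properties of the sample document -->
  <extract-document-data selected="include">
    <extract-path>/id</extract-path>
    <extract-path>/type</extract-path>
    <extract-path>/description</extract-path>
  </extract-document-data>
  <!-- Skip snippet generation, as suggested above -->
  <transform-results apply="empty-snippet"/>
</options>
```

The extracted values then come back with each match in the same search response, so no second round trip per document should be needed.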

Related

Jira REST API - Querying for issues with all of the fields in a flat JSON structure

In the backlog view of a project, I can select one or more issues and export them to Excel. Here's what I see when I open it.
Each issue takes up a row. Each field in an issue takes up a column in excel.
If I were to visualize this in JSON it would look something like
[
issue1:{
field1:value1,
field2:value2,
..
},
issue2:{
}
]
So the issue block has all of the attributes in a flat structure.
Is there a URL mapping in the JIRA api that can get me a response in a flat structure as above? Most of their documented apis return data in nested structures (there are different levels of complex objects for "Issues").
The REST API search and issue resources will let you extract the information. You can then normalize it into the flat JSON you want. Note that some fields, such as components, can contain multiple values.
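The normalization step could look roughly like this. It is a minimal sketch that assumes the response has already been parsed into nested maps and lists by whatever JSON library you use; IssueFlattener and the dotted key format are hypothetical choices, not part of the Jira API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens one parsed issue (nested maps/lists) into a single-level map.
class IssueFlattener {

    // Returns a flat map whose keys are dotted paths, e.g. "fields.status.name".
    static Map<String, String> flatten(Map<String, Object> issue) {
        Map<String, String> flat = new LinkedHashMap<>();
        walk("", issue, flat);
        return flat;
    }

    private static void walk(String prefix, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = prefix.isEmpty()
                        ? String.valueOf(e.getKey())
                        : prefix + "." + e.getKey();
                walk(key, e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Multi-valued fields (such as components) get one indexed key per element.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + "[" + i + "]", list.get(i), out);
            }
        } else {
            // Leaf value: store its string form under the accumulated path.
            out.put(prefix, String.valueOf(node));
        }
    }
}
```

Each flattened issue then maps directly onto one row, with the keys as column names. Multi-valued fields produce several columns here; joining them into a single cell is an equally reasonable choice.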

Storing filter criteria in database

I have a web application with a JavaScript-based UI and a Jersey-based REST service. The UI has certain features requiring complex search criteria to retrieve the data. The UI defines the search criteria with various parameters, but the one most relevant to this discussion is the filter. A user can create a filter like this:
1. ( Receiver_Email_ID CONTAINS xyz#abc.com AND
2. Sender_Email_ID CONTAINS blocked#abc.com ) OR
3. ( Origin_Domain IS BADDOMAIN.com AND
4. Email_has_attachment == true )
The numbers 1 to 4 just indicate that these are different rows in the UI and that the sequence matters here. The UI represents it in JSON format which on the server could be received as a POJO.
I need the ability to store this criteria such that the UI can be repopulated with the filter and it could be executed. Along with several other tables, I plan to use the following database table to store this-
rule_id
rule_seq
op_paren
left_opr
operator
right_opr
cl_paren
and_or
Does it look right, or can I improve it? The other option I have is to store the entire JSON in an embedded document DB such as OrientDB.
From a formal point of view, each search criterion is an abstract syntax tree, and a tree does not fit your flat table structure well. Your idea of saving the search in OrientDB suits the task from this point of view, but the implementation is not simple.
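To make the tree shape concrete, here is a minimal sketch of the filter from the question as an abstract syntax tree. All names (FilterAst, Condition, Group, render) are hypothetical and only meant to illustrate the idea; a real model would likely use enums for operators and connectives.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical AST for filter criteria; every name here is illustrative.
class FilterAst {

    interface Node {
        String render();
    }

    // Leaf: a single comparison such as "Origin_Domain IS BADDOMAIN.com".
    static final class Condition implements Node {
        final String field;
        final String operator;
        final String value;

        Condition(String field, String operator, String value) {
            this.field = field;
            this.operator = operator;
            this.value = value;
        }

        public String render() {
            return field + " " + operator + " " + value;
        }
    }

    // Inner node: AND/OR over any number of children, which may themselves be groups.
    static final class Group implements Node {
        final String connective; // "AND" or "OR"
        final List<Node> children;

        Group(String connective, Node... children) {
            this.connective = connective;
            this.children = Arrays.asList(children);
        }

        public String render() {
            // Parentheses come from the tree structure, not from stored paren columns.
            return children.stream()
                    .map(Node::render)
                    .collect(Collectors.joining(" " + connective + " ", "(", ")"));
        }
    }
}
```

The example filter is then an OR group containing two AND groups of two conditions each. Nesting depth is unlimited, which is what columns such as op_paren and cl_paren struggle to express, and the same shape serializes naturally to the JSON the UI already produces.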

Is it possible to filter MongoDB query results?

I am developing a simple web application that fetches data from MongoDB.
What I need to do, is to show data matching the query on the webpage. Let's say the user has to choose a
programming language [Java, C#, Python]
project creation time [all, max week ago, max month ago]
implemented algorithm [heapsort, quicksort, mergesort]
Now, my MongoDB collection contains all types of object, some of which are not necessarily an algorithm at all (this is unavoidable, unfortunately).
Because of this, I have a specific query that finds all the documents that are eligible for further processing.
FindIterable<Document> docs = collection.find(Filters.ex(programmingLanguage));
And here comes my final question:
When I already have a FindIterable object, can I filter it so that only specific documents from previously selected documents will be chosen?
For example, I need a line of code, that will give me only documents created no longer than a month ago which are written in Java, given docs object.
Desirably I would implement it like this:
create function that applies additional filter on a FindIterable object
public static FindIterable<Document> applyLastMonth(FindIterable<Document> docs) {
return docs.<magicfunction>(Filters.gte("date", dateMonthAgo()))
}
and apply it to wherever it is needed. Is it possible?
My problem is much more complex, so please do not solve the example given above, I just want to be able to filter results returned by other query, so that I don't look at dozens of cases in my code. Unfortunately I found out that docs.filter(...) does not work for me, as it replaces the old query with the new one.

Is it possible to create a multivalued polyfield in Solr that will allow custom logic at query time?

I'm working with a pretty niche requirement to model a relational structure within Solr and thought that a custom polyfield would be the most suitable solution to my problem. In short, each record in the index will have a number of embargo and expiry dates for when the content should be considered 'available'. These dates are grouped with another kind of categorisation (let's say by device), so for example, any given item in the index may be available for mobile users between two dates, but only available for desktop users between another two dates.
Much like the currency and the latlon types, I would index the values as a comma separated list representing each availability window, for example:
mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z
So, a single index record could look like
{
id: "1234",
text: ["foobarbaz"],
availability: [
"mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z",
"pc,2013-09-22T00:00:00Z,2013-09-30T00:00:00Z"
]
}
The custom type would do the job of parsing the incoming value and storing it accordingly. Is this a viable solution? How would I approach the custom logic required at query time to filter by device and then make sure that NOW is within the provided dates?
My attempt so far has been based on the Currency field type, but now I've dialled it back to just storing the string in its un-parsed state. If I could prove that the filtering I want is even possible before using the polyfield features, then I'll know if it's worth continuing.
Does anybody else have any experience writing custom (poly)fields, or doing anything similar to what I'm doing?
Thanks!
If you want to be able to filter and search on these ranges, I don't think you'll have much luck storing records like that. It would make more sense to me to have a more structured document, something like:
id: "1234",
text: ["foobarbaz"],
mobileavailabilitystart: "2013-09-23T00:00:00Z",
mobileavailabilityend: "2013-09-30T00:00:00Z",
pcavailabilitystart: "2013-09-22T00:00:00Z",
pcavailabilityend: "2013-09-30T00:00:00Z"
Indexing the full contents of a csv line in Lucene/Solr, in a single field, would allow you to perform full-text searches on it, but would not be a good way to support querying for a specific element of it.
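With separate fields per device, the "is it available now" check becomes a pair of ordinary range queries. The following is a minimal sketch that only builds the filter query string; it assumes the fields are indexed as Solr date fields and follow the naming pattern above, and AvailabilityFilter is a hypothetical helper.

```java
// Hypothetical helper: builds a Solr filter query for one device's availability window.
class AvailabilityFilter {

    // Assumes per-device date fields named "<device>availabilitystart" and "<device>availabilityend".
    static String forDevice(String device) {
        String start = device + "availabilitystart";
        String end = device + "availabilityend";
        // Inclusive ranges: the window has started (start <= NOW) and has not ended (end >= NOW).
        return start + ":[* TO NOW] AND " + end + ":[NOW TO *]";
    }
}
```

The resulting string can be passed as an fq parameter alongside the main query. The trade-off is that every new device needs its own pair of fields, which dynamic fields could absorb.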

Storing and retrieving Json object to/from lucene indexes

I have to store a set of JSON objects in a Lucene index and also want to retrieve them from the index. I am using lucene-3.4.
Is there any library or easy mechanism to make this happen in Lucene?
Sample JSON object:
{
BOOKNAME1: {
id:1,
name:"bname1",
price:"p1"
},
BOOKNAME2: {
id:2,
name:"bname2",
price:"p2"
},
BOOKNAME3: {
id:3,
name:"bname3",
price:"p3"
}
}
Any sort of help will be appreciated.
Thanks in advance,
I would recommend indexing your JSON objects as follows:
1) Parse your json file. I usually use json simple.
2) Open an index using IndexWriterConfig
3) Add documents to the index.
4) Commit changes and close the index
5) Run your queries
If you would like to use Lucene Core instead of elasticsearch, I have created a sample project, which gets as an input a file with JSON objects and creates an Index. Also, I have added a test to query the index.
I am using the latest Lucene version (4.8), please have a look here:
http://ignaciosuay.com/getting-started-with-lucene-and-json-indexing/
If you have time, I think it is worth reading "Lucene in Action".
Hope it helps.
If you don't want to search within the json but only store it, you just need to extract the id, which will hopefully be unique. Then your lucene document would have two fields:
the id (indexed, not necessarily stored)
the json itself, as it is (only stored)
Once you have stored your JSON in Lucene, you can retrieve it by filtering on the id.
On the other hand this is pretty much what elasticsearch does with your documents. You just send some json to it via a REST api. elasticsearch will keep the json as it is and also make it searchable by default. That means you can either retrieve the json by id or search against it, out of the box without having to write any code.
Also, with lucene your documents wouldn't be available till you commit your documents or reopen the index reader, while elasticsearch adds a handy transaction log to it, so that the GET is always real time.
Also, elasticsearch offers a lot more: a nice distributed infrastructure, faceting, scripting and more. Check it out!
