I have lots of Java objects which have parent-child relationships.
These need to be put into Solr.
To do that, we convert each Java object into JSON as follows:
{
  "employee": {
    "name": "John",
    "address": {
      "apt": 100,
      "city": "New York",
      "country": "USA"
    },
    "vehicles": [
      {
        "name": "Hyundai",
        "color": "red"
      },
      {
        "name": "Toyota",
        "color": "black"
      }
    ]
  }
}
Now since Solr does not handle this, I am flattening it out as follows:
"employee.name": "John",
"employee.address.apt": 100,
"employee.address.city": "New York",
"employee.address.country": "USA",
"employee.vehicles_0.name": "Hyundai", // Note how arrays are being flattened
"employee.vehicles_0.color": "red",
"employee.vehicles_1.name": "Toyota",
"employee.vehicles_1.color": "black",
It is easy to flatten, but clients of my library do not want the flattened schema when they query.
So I need to de-flatten the above on return from Solr and convert them back to the original Java object.
Does anyone know how this can be done?
I am thinking of a somewhat crude approach: take the flattened output from Solr (as shown above) and write a parser to put the fields back into Java objects. But this seems like a lot of work. An easier way out, or an existing tool, would be much appreciated.
I am using Solr 4.5.1
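Since the flattening scheme is regular (dots for nesting, a `_N` suffix for list indices), de-flattening can be done generically over the field map returned by Solr, and the resulting nested map can then be mapped back to Java objects (for example with Jackson's `ObjectMapper.convertValue`). A rough sketch in plain Java; it assumes field names themselves never end in `_<digits>`, which is the main ambiguity of this scheme:

```java
import java.util.*;

public class Unflattener {

    /** Rebuilds a nested Map/List structure from dotted keys like
     *  "employee.vehicles_0.name"; a trailing "_N" on a path segment
     *  marks index N of a list. */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> unflatten(Map<String, Object> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : flat.entrySet()) {
            Map<String, Object> current = root;
            String[] parts = entry.getKey().split("\\.");
            for (int i = 0; i < parts.length; i++) {
                String part = parts[i];
                boolean last = (i == parts.length - 1);
                int us = part.lastIndexOf('_');
                if (us > 0 && part.substring(us + 1).matches("\\d+")) {
                    // list segment, e.g. "vehicles_0"
                    String name = part.substring(0, us);
                    int idx = Integer.parseInt(part.substring(us + 1));
                    List<Object> list =
                        (List<Object>) current.computeIfAbsent(name, k -> new ArrayList<>());
                    while (list.size() <= idx) {            // grow the list as needed
                        list.add(new LinkedHashMap<String, Object>());
                    }
                    if (last) {
                        list.set(idx, entry.getValue());
                    } else {
                        current = (Map<String, Object>) list.get(idx);
                    }
                } else if (last) {
                    current.put(part, entry.getValue());
                } else {
                    current = (Map<String, Object>)
                        current.computeIfAbsent(part, k -> new LinkedHashMap<>());
                }
            }
        }
        return root;
    }
}
```

One caveat of this naming scheme: a genuine field called, say, `phone_2` would be mistaken for a list element, so a more distinctive index separator may be worth choosing.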
Solr is designed for search, not storing deep object graphs. You might be better off optimizing the Solr records for search and then getting the original objects from the master store by recordID or some such.
Think about what you will be trying to find. For example, will you be searching for individual vehicles? If so, your document level should be a vehicle, not an employee.
You can index your documents in a parent-child structure in the first place.
Take a look at this blog post: http://blog.griddynamics.com/2013/09/solr-block-join-support.html
I'm having difficulties with aggregations over dynamic templates. I have values stored like this:
[
  {
    "country": "CZ",
    "countryName": {
      "en": "Czech Republic",
      "es": "Republica checa",
      "de": "Tschechische Republik"
    },
    "ownerName": "..."
  },
  {
    "ownerName": "..."
  }
]
The country field is a classic keyword; the mapping for the country name is indexed via a dynamic template, because I want to be able to extend it with other languages when I need to.
{
  "dynamic_templates": [
    {
      "countryName_lsi_object_template": {
        "path_match": "countryName.*",
        "mapping": {
          "type": "keyword"
        }
      }
    }
  ]
}
countryName and country are not mandatory parameters: when a document is not assigned to any country, countryName is not filled in either. However, I need to do a sorted aggregation over the country names according to a chosen language key, and I also need to include buckets with null countries. Is there any way to do that?
Previously, I used TermsValuesSourceBuilder with an order on the "country" field, but I need the data sorted according to a specific language's name, and that can't be done over country codes.
(I'm using Elasticsearch 7.7.1 and Java 8, and recreating the index / changing the data structure is not an option for me.)
I tried to use the missing bucket option, but the response does not include buckets with "countryName" missing at all.
TermsValuesSourceBuilder("countryName").field("countryName.en").missingBucket(true);
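For clarity, the desired result (buckets keyed by the chosen language's country name, plus a bucket for documents with no countryName, ordered last) can be sketched client-side in plain Java. The documents and field names below are hypothetical stand-ins for the indexed data, not Elasticsearch API calls:

```java
import java.util.*;

public class NullBucketGrouping {

    /** Groups documents by a localized country name (e.g. "en") into
     *  sorted buckets of counts, with a null bucket for documents that
     *  have no countryName at all, ordered after all named buckets. */
    public static Map<String, Integer> buckets(List<Map<String, Object>> docs, String lang) {
        // nullsLast lets the TreeMap hold a null key and sort it after real names
        Map<String, Integer> buckets =
            new TreeMap<>(Comparator.nullsLast(Comparator.<String>naturalOrder()));
        for (Map<String, Object> doc : docs) {
            @SuppressWarnings("unchecked")
            Map<String, String> names = (Map<String, String>) doc.get("countryName");
            String key = (names == null) ? null : names.get(lang);
            buckets.merge(key, 1, Integer::sum);
        }
        return buckets;
    }
}
```

This is only a fallback sketch of the grouping semantics; it does not replace a server-side aggregation when the document count is large.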
I am working on a Spring MVC application. I have a situation where I need to check some checkboxes in the UI, save the checked values as JSON in the backend, and then convert that into a string.
The picture shows more.
So I want to save something like:
[{
  Coast : 'East',
  States : [ 'NY', 'MI' ]
}, {
  Coast : 'Central',
  States : [ 'TX', 'OK' ]
}]
Please suggest how I can implement this.
Your question is quite vague so I'm going to assume because you've used the json tag that you're asking for help on how to model this information in JSON and handle it within your Spring app.
You probably want to restructure your JSON schema to support extra fields being set per state. Instead of States being a list of strings, you could change it to a list of objects which has a name and selected field.
I'd also recommend you change the keys in your JSON to be lower case, this enables more fluent mapping between your JSON and model classes.
For example, NY is selected in the JSON below, whereas MI isn't:
[{
  "coast": "East",
  "states": [{
    "name": "NY",
    "selected": true
  }, {
    "name": "MI",
    "selected": false
  }]
}, {
  ...same again for West and Central
}]
You could then have some classes along the following lines and use Jackson to map between them:
public class Region {
    String coast;
    List<State> states;
}

public class State {
    String name;
    boolean selected;
}
I'm trying to determine if there is a way, using spring-mongodb (or even the Mongo Java API), to upsert a document containing a list such that the elements of the list are always a union of the values upserted.
Suppose I have the following classes (made up to simplify things):
public class Patron {
    private String name;
    private String address;
    private List<Book> booksRead;
    // assume gets/sets
}

public class Book {
    private String title;
    private String author;
    // assume gets/sets
}
Further, let's assume I get updates on only the latest books read, but I want to keep the full list of all books read in the DB. So what I'd like to do is insert a Patron (with booksRead) if it doesn't exist, or update their booksRead if the Patron already exists.
So, on the first upsert 'John Doe' is not in the collection, so the document is inserted and looks like this:
{
  "_id": ObjectId("553450062ef7b63435ec1f57"),
  "name": "John Doe",
  "address": "123 Oak st, Anytown, NY, 13760",
  "booksRead": [
    {
      "title": "Grapes of Wrath",
      "author": "John Steinbeck"
    },
    {
      "title": "Creatures Great and Small",
      "author": "James Herriot"
    }
  ]
}
John re-reads 'Grapes of Wrath' and also reads 'Of Mice and Men'. An insert is attempted passing the 2 books as books read, but I'd like only 'Of Mice and Men' to be added to the read list, so the document looks like:
{
  "_id": ObjectId("553450062ef7b63435ec1f57"),
  "name": "John Doe",
  "address": "123 Oak st, Anytown, NY, 13760",
  "booksRead": [
    {
      "title": "Grapes of Wrath",
      "author": "John Steinbeck"
    },
    {
      "title": "Creatures Great and Small",
      "author": "James Herriot"
    },
    {
      "title": "Of Mice and Men",
      "author": "John Steinbeck"
    }
  ]
}
Everything I've tried seems to point to needing two separate calls: one for the insert and one for the update. Update.set works for the initial load (insert) but replaces the full list on the second update; $addToSet works on update, but complains about trying to insert into a 'non-array' on the initial insert.
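For reference, the set-union semantics being asked for, which $addToSet provides server-side, looks like this in plain Java, treating two books as equal when title and author match (maps are used here as stand-ins for the Book class):

```java
import java.util.*;

public class BookMerge {

    /** Adds only those incoming books not already present, keyed on
     *  title + author; mirrors what $addToSet does to the stored array. */
    public static List<Map<String, String>> addToSet(List<Map<String, String>> existing,
                                                     List<Map<String, String>> incoming) {
        List<Map<String, String>> merged = new ArrayList<>(existing);
        for (Map<String, String> book : incoming) {
            if (!merged.contains(book)) {   // Map.equals compares title and author
                merged.add(book);
            }
        }
        return merged;
    }
}
```

With real Book objects the same behavior requires equals/hashCode defined over title and author; $addToSet compares the whole embedded document the same way, field for field.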
UPDATE:
This appears to be an issue with spring-data-mongodb. I can achieve the above with Mongo Java API calls; it just fails when using the Spring equivalents (at least up to spring-data-mongodb 1.6.2).
The easiest way to accomplish this would be to remove the old entry and then re-add it with the added books. Ideally, this should be done transactionally.
I think you could do this using the $addToSet operator.
You can find the documentation on MongoDB's web site:
http://docs.mongodb.org/manual/reference/operator/update/addToSet/#up._S_addToSet
It is used with either update() or findAndModify() methods.
The issue appears to be at the Spring layer. While I get errors upserting using its FindAndModify command, I don't have an issue with $addToSet using the Mongo DBCollection.findAndModify and DBCollection.update methods.
I have millions of documents in my MongoDB database, and to implement searching I am using Apache Solr. I googled how to import data from MongoDB to Solr, but found that there is no straightforward approach using the Data Import Handler. So I decided to insert each document into Solr at the same time as I insert it into MongoDB, using the Solr client for Java, SolrJ. My documents are in the following format in MongoDB:
{
  "_id": ObjectId("51cc52c9e4b04f75b27542ba"),
  "article": {
    "summary": "As more people eschew landlines, companies are offering technologies like\npersonal cell sites and signal boosters to augment cellphones, Eric A. Taub\nwrites in The New York Times.\n\n",
    "author": "By THE NEW YORK TIMES",
    "title": "Daily Report: Trying to Fix a Big Flaw in Cellphone Technology (NYT)",
    "source": "NYT",
    "publish_date": "Thu, 27 Jun 2013 12:01:00 +0100",
    "source_url": "http://bits.blogs.nytimes.com/feed/",
    "url": "http://news.feedzilla.com/en_us/stories/top-news/315578403?client_source=api&format=json"
  },
  "topics": [
    {
      "categoryName": "Technology Internet",
      "score": "94%"
    }
  ],
  "socialTags": [
    {
      "originalValue": "Cell site",
      "importance": "1"
    },
    {
      "originalValue": "Cellular repeater",
      "importance": "1"
    },
    {
      "originalValue": "Technology Internet",
      "importance": "1"
    }
  ],
  "entities": [
    {
      "_type": "PublishedMedium",
      "name": "The New York Times"
    },
    {
      "_type": "Company",
      "name": "The New York Times"
    },
    {
      "_type": "Person",
      "name": "Eric A. Taub"
    }
  ]
}
I want to index two fields: 'summary' and 'title' of the 'article' object.
So far, what I have learned is that putting the entire document into Solr does not make sense, as it will increase the index size and make searches slower. So I decided to store the following fields in Solr: 'docId', 'title', and 'summary'. While searching in Solr I will retrieve only the docId, and then fetch the other details from MongoDB, because that is faster than retrieving stored data from Solr (due to analyzers, tokenizers, and so on).
FIRST:
So, I need to maintain a unique field 'docId'. Shall I use the default '_id' generated by mongod? But for that the document has to be inserted first, so that mongod can generate the '_id'. So I would need to retrieve the document after inserting it into MongoDB, fetch the '_id', and then insert the 'docId', 'summary', and 'title' fields into Solr. Can this be improved?
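One way to sidestep the insert-then-fetch round trip is to generate the id on the client and reuse it for both stores. A minimal sketch; the UUID and map layout are purely illustrative (the MongoDB Java driver can likewise create an ObjectId client-side before the insert, which avoids relying on mongod to assign one):

```java
import java.util.*;

public class ClientSideId {

    /** Builds a MongoDB-style document and its matching Solr document,
     *  sharing one client-generated id, so neither store needs to be
     *  read back after the insert. */
    public static Map<String, Object>[] buildDocs(String title, String summary) {
        String id = UUID.randomUUID().toString();   // generated before any insert

        Map<String, Object> mongoDoc = new LinkedHashMap<>();
        mongoDoc.put("_id", id);
        mongoDoc.put("article", Map.of("title", title, "summary", summary));

        Map<String, Object> solrDoc = new LinkedHashMap<>();
        solrDoc.put("docId", id);                   // same id as the Mongo _id
        solrDoc.put("title", title);
        solrDoc.put("summary", summary);

        @SuppressWarnings("unchecked")
        Map<String, Object>[] docs = new Map[] { mongoDoc, solrDoc };
        return docs;
    }
}
```

The actual inserts would then hand mongoDoc to the Mongo driver and solrDoc to SolrJ in whatever order suits the application, since the shared id is already known.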
SECOND:
I need to define the schema in Solr for this, where I map fields from MongoDB to fields in Solr. I have the default instance of Solr running, from solr/example/start.jar.
It has a default schema and a default collection called 'collection1'. How can I create my own collection? I cannot find anything in the admin interface for this. I want to create a collection for my project and then write a schema as I have described above.
Whatever tutorials I have found simply add documents to Solr. So do I need to override the default schema?
Note: I am new to Solr, as you will have already inferred from reading the question :D So please help!
I have a Mongo database and I am making a small web application to display the values from the database. I'm having a little problem getting a value from a nested structure.
The database structure I have is:
{
  "_id": ObjectId("4e244ddcb1633b25c38f2155"),
  "id": "2173",
  "type": "P",
  "title": "Test title",
  "tag": {
    "0": {
      "_id": ObjectId("4e244ddcb1633b25c38f1fc1"),
      "name": "Education",
      "id": "6"
    }
  }
}
What I do is get the document from the database and put it in a map, and I put this map, together with several other maps, into one map (root) that I pass to the .ftl page.
I can then read a value in .ftl like ${root.title}, which will of course print 'Test title' in the above example.
I now want to get the value of 'name' inside 'tag', so that it prints 'Education'.
I don't know how to get this. Any ideas?
You can access the value using Freemarker's built-ins for hashes, similar to the below:
<#assign tagKeys = root.tag?keys>
<#list tagKeys as tagKey>
${root.tag[tagKey].name}
</#list>
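For comparison, here is the same traversal on the Java side, over hypothetical maps mirroring the structure that gets handed to the template:

```java
import java.util.*;

public class TagLookup {

    /** Mirrors the Freemarker snippet: iterate the keys of the "tag"
     *  hash and read each entry's "name". The maps are stand-ins for
     *  the data model passed to the .ftl page. */
    public static List<String> tagNames(Map<String, Object> root) {
        List<String> names = new ArrayList<>();
        @SuppressWarnings("unchecked")
        Map<String, Object> tag = (Map<String, Object>) root.get("tag");
        for (String key : tag.keySet()) {               // like root.tag?keys
            @SuppressWarnings("unchecked")
            Map<String, Object> entry = (Map<String, Object>) tag.get(key);
            names.add((String) entry.get("name"));      // like root.tag[tagKey].name
        }
        return names;
    }
}
```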
I have not tested this yet, but generally speaking it should work. You might also want to read some previous answers on SO:
Freemarker iterating over hashmap keys