Modeling nested documents with Spring-Data-MongoDB - java

I have a MongoDB database that represents snippets of public gene information like so:
{
  _id: 1,
  symbol: "GENEA",
  db_references: {
    "DB A": "DBA000123",
    "DB B": ["ABC123", "DEF456"]
  }
}
I am trying to map this to a @Document-annotated POJO class, like this:
@Document
public class Gene {

    @Id
    private int id;
    private String symbol;
    private Map<String, Object> db_references;

    // getters and setters
}
Because of the nature of MongoDB's schema-less design, the db_references field can contain a long list of possible keys, with values sometimes being arrays or other key-value pairs. My primary concern is the speed at which I can fetch multiple Gene documents and slice up their db_references.
My question: what is the best way to represent this field to optimize fetching performance? Should I define a custom POJO and map this field to it? Should I make it a BasicDBObject? Or would it be best not to map the documents with Spring Data at all, and instead use the MongoDB Java driver directly and parse the returned DBObjects?

Sorry to see your question hasn't been answered yet.
If db_references represents an actual concept within the domain, you are much better off capturing that domain knowledge in a class. That is almost always a good idea, and MongoDB's document model helps with it a lot.
You can then store this list of nested objects inside the MongoDB document and fetch the whole aggregate in a single query. Spring Data will handle the deserialization for you as well.
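A rough sketch of that idea, assuming each reference can be normalized to a database name plus a list of accession ids (the class and field names below are hypothetical):

@Document
public class Gene {

    @Id
    private int id;

    private String symbol;

    // Persisted as an array of embedded subdocuments rather than a
    // free-form map, so every entry has the same predictable shape.
    @Field("db_references")
    private List<DbReference> dbReferences;

    // getters and setters
}

public class DbReference {

    private String database;          // e.g. "DB A"
    private List<String> accessions;  // a single id or several, always a list

    // getters and setters
}

Note that this remodels db_references from a map into an array of subdocuments, so existing documents would need migrating; in exchange, Spring Data maps the nested DbReference objects out of the box and a single query still fetches the whole aggregate.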

Related

Android Firestore limitations to custom object models

I am migrating my app to use Firebase Firestore, and one of my models is very complex (it contains lists of other custom objects). Looking at the documentation on how to commit a model object as a document, it looks like you simply create your model object with a public constructor and getters and setters.
For example, from the add data guide:
public class City {

    private String name;
    private String state;
    private String country;
    private boolean capital;
    private long population;
    private List<String> regions;

    public City() {}

    public City(String name, String state, String country, boolean capital,
                long population, List<String> regions) {
        // ... assign fields ...
    }

    // getters/setters
}
Firestore automatically translates this to and from a document without any additional steps. You pass an instance to a DocumentReference.set(city) call, and retrieve it from a call to DocumentSnapshot.toObject(City.class).
How exactly does it serialize this to a document? Through reflection? It doesn't discuss any limitations. Basically, I'm left wondering if this will work on more complex models, and how complex. Will it work for a class with an ArrayList of custom objects?
Firestore automatically translates this to and from a document without any additional steps. How exactly does it serialize this to a document? Through reflection?
You're guessing right: through reflection. As @Doug Stevenson also mentioned in his comment, that's very common for systems like Firebase when converting JSON data to and from POJOs (Plain Old Java Objects). Please also note that setters are not required: if there is no setter for a JSON property, the Firebase client will set the value directly onto the field. A constructor with arguments is not required either; while both are idiomatic, there are good cases for classes without them. Please also take a look at some information regarding the existence of the no-argument constructor.
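As a small sketch of what that means in practice (assuming the standard Android client, with made-up collection and document names):

public class City {

    private String name;
    private long population;

    public City() {}  // no-arg constructor, invoked reflectively by toObject()

    // no setters needed: the client writes values directly onto the fields
    public String getName() { return name; }
    public long getPopulation() { return population; }
}

// Reading a document back into the POJO:
db.collection("cities").document("LA").get()
        .addOnSuccessListener(snapshot -> {
            City city = snapshot.toObject(City.class);
        });

Here db is assumed to be a FirebaseFirestore instance.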
It doesn't discuss any limitations.
Yes, it does. The official documentation explains that documents have limits, so there are constraints on how much data you can put into a single document. According to the official documentation regarding usage and limits:
Maximum size for a document: 1 MiB (1,048,576 bytes)
As you can see, you are limited to 1 MiB of data in a single document. When it comes to storing text, that fits quite a lot, but as your array of custom objects gets bigger, be careful about this limitation.
Please also note that if you are storing large amounts of data in arrays and those arrays need to be updated by lots of users, there is another limitation to take care of: you are limited to roughly 1 write per second on any single document. So if you have a situation in which a lot of users are all trying to write/update data in the same document at once, you might start to see some of these writes fail. Be careful about this limitation too.
Will it work for a class with an ArrayList of custom objects?
It will work with any type of class, as long as the fields use supported data types.
Basically, I'm left wondering if this will work on more complex models, and how complex.
It will work with any kind of complex model, as long as you are using the correct data types for your objects and your documents stay within that 1 MiB limitation.

MySql Json object deserialisation optimization

I have a field of type "text" in MySQL and I am storing JSON data in it.
E.g. ["android_app","iphone_app","windows_app"]
I am interacting with MySQL through Hibernate, and when reading this field I deserialize it to an ArrayList in Java.
My question is: is this the best and fastest way to handle such cases, or are there better ways of doing it?
If you're able to take advantage of some JPA 2.1 features, you could use an AttributeConverter to handle this for you automatically, without having to deal with it in your business code.
public class YourEntity {

    // other stuff

    @Convert(converter = StringArrayToJsonConverter.class)
    List<String> textValues;
}
Then you just define the converter as follows:
@Converter
public class StringArrayToJsonConverter
        implements AttributeConverter<List<String>, String> {

    @Override
    public String convertToDatabaseColumn(List<String> list) {
        // convert the list to a JSON string here and return it
    }

    @Override
    public List<String> convertToEntityAttribute(String dbValue) {
        // convert the JSON string to the array list here and return it
    }
}
The best part is that this becomes a reusable component that you can place anywhere you need to represent a JSON array as a List<> in your Java classes while storing it as a single text field in the database.
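For illustration, the two conversion bodies could be filled in with Jackson (a minimal sketch; Jackson is my assumption here, any JSON library would do):

// inside StringArrayToJsonConverter; uses com.fasterxml.jackson.databind.ObjectMapper,
// com.fasterxml.jackson.core.type.TypeReference and java.io.IOException
private static final ObjectMapper MAPPER = new ObjectMapper();

@Override
public String convertToDatabaseColumn(List<String> list) {
    try {
        return MAPPER.writeValueAsString(list);   // List -> JSON text
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

@Override
public List<String> convertToEntityAttribute(String dbValue) {
    try {
        return MAPPER.readValue(dbValue, new TypeReference<List<String>>() {});
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}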
Another alternative would be to avoid storing the data as JSON and instead use a real table, which would let you actually query on the individual values. To do this, you'd rewrite that mapping using JPA's @ElementCollection:
@ElementCollection
private List<String> textValues;
Internally, Hibernate creates a secondary table where it stores the string values of the list, together with a reference to the owning entity's primary key; the entity's primary key plus the string value form the primary key of that secondary table.
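A slightly fuller version of that mapping might look like the following (the table and column names are hypothetical):

@Entity
public class YourEntity {

    @Id
    @GeneratedValue
    private Long id;

    // Each string becomes its own row in the secondary table,
    // keyed back to this entity's primary key.
    @ElementCollection
    @CollectionTable(name = "your_entity_text_values",
                     joinColumns = @JoinColumn(name = "your_entity_id"))
    @Column(name = "text_value")
    private List<String> textValues;
}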
You would then handle serializing the List<> to and from a JSON array in your controller/business code, to avoid mixing persistence with that type of concern, particularly given that most databases have not introduced a real JSON data type yet :).

Transpose result of hibernate query into list of POJOs

I have a generic class that contains a runQuery method with the following setup:
public Object runQuery(String query) {
    Query retVal = getSession().createSQLQuery(query);
    return retVal.list();
}
I am trying to figure out how to transpose the returned values into a list of Check objects (List&lt;Check&gt;):
public class Check {

    private int id;
    private String name;
    private String confirmationId;

    // getters and setters
}
Most queries that I run are actually stored procs in MySQL. I know of native queries and result transformers (which, if implemented, would mean I have to change my generic setup, and I'd rather not do that).
Any ideas how I can accomplish this with the current setup?
You can find tutorials on ORMs (see What is Object/relational mapping (ORM) in relation to Hibernate and JDBC?).
Basically, you add annotations to your Check class to tell Hibernate which Java field matches which DB column, write a JPQL query (it looks like SQL), and Hibernate fetches your objects and maps them from the DB to POJOs.
It's a broad subject; this is a good start: https://www.tutorialspoint.com/hibernate/hibernate_quick_guide.htm
It will require some configuration, but it's worth it. Here's a tutorial on annotation-based configuration: https://www.tutorialspoint.com/hibernate/hibernate_annotations.htm though with less explanation of how ORM works (there's also EclipseLink as an alternative ORM).
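A very rough sketch of what that annotation mapping could look like for Check (the table and column names are hypothetical):

@Entity
@Table(name = "checks")
public class Check {

    @Id
    private int id;

    @Column(name = "name")
    private String name;

    @Column(name = "confirmation_id")
    private String confirmationId;

    // getters and setters
}

// The typed createQuery overload is available from Hibernate 5.2 on:
List<Check> checks = getSession()
        .createQuery("from Check", Check.class)
        .list();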
Otherwise, you could write your own mapper, which takes values from a ResultSet and sets them on your class. For a lot of reasons, I would recommend an ORM over this method (except maybe if you have only one class stored in the DB, which I doubt).
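For completeness, the result-transformer route mentioned in the question might look roughly like this (a sketch: it assumes Check has setters and that the column aliases match its property names; you may also need addScalar(...) calls to coerce MySQL's column types):

@SuppressWarnings("unchecked")
public List<Check> runCheckQuery(String query) {
    // e.g. "SELECT id, name, confirmation_id AS confirmationId FROM checks"
    return getSession()
            .createSQLQuery(query)
            .setResultTransformer(Transformers.aliasToBean(Check.class))
            .list();
}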

Key-Value on top of Appengine

Although App Engine is already schema-less, you still need to define the entities to be stored in the Datastore through the DataNucleus persistence layer. So I am thinking of a way to get around this: a layer that stores key-value pairs at runtime, instead of compile-time entities.
The way this is done with Redis is by creating a key like this:
private static final String USER_ID_FORMAT = "user:id:%s";
private static final String USER_NAME_FORMAT = "user:name:%s";
From the docs, the Redis types are: String, Linked list, Set, Sorted set. I am not sure if there are more.
As far as the GAE Datastore is concerned, a String "Key" and a "Value" would make up the entity that gets stored.
Like:
public class KeyValue {
    private String key;
    private Value value; // value can be a String, Linked-list, Set or Sorted set etc.
    // Code omitted
}
The justification for this scheme is rooted in the RESTful access to the Datastore (provided by datanucleus-api-rest).
Using this REST API, to persist an object or entity:
POST http://datanucleus.appspot.com/dn/guestbook.Greeting
{
  "author": null,
  "class": "guestbook.Greeting",
  "content": "test insert",
  "date": 1239213923232
}
The problem with this approach is that in order to persist an entity, the actual class needs to be defined at compile time. With a key-value store mechanism, by contrast, the call could be simplified to:
POST http://datanucleus.appspot.com/dn/org.myframework.KeyValue
{
  "class": "org.myframework.KeyValue",
  "key": "user:id:johnsmith;followers",
  "value": "the_list"
}
Passing a single string as the "value" is fairly easy; I can use a JSON array for a list, set, or sorted list. The real question is how to persist the different types of data passed into the interface. Should there be multiple KeyValue entities, each representing one of the basic types it supports: KeyValueString? KeyValueList? etc.
Looks like you're using a JSON-based REST API, so why not just store the Value as a JSON string?
You do not need to use the DataNucleus layer, or any of the other fine ORM layers (like Twig or Objectify). Those are optional, and are all based on the low-level API. If I interpret what you are saying properly, it may already have the functionality that you want. See: https://developers.google.com/appengine/docs/java/datastore/entities
DataNucleus is a specific framework that runs on top of GAE. You can, however, access the database at a lower, less structured, more key/value-like level: the low-level API. That's the lowest level you can access directly.
BTW, the low-level "GAE datastore" internally runs on six global Google Megastore tables, which in turn are hosted on Google's Bigtable database system.
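A rough sketch of the low-level API route, assuming the value is serialized to JSON first (the "KeyValue" kind and property names are made up):

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Text;

// Persists one key-value pair without any compile-time model class.
void putKeyValue(String key, String jsonValue) {
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Entity kv = new Entity("KeyValue", key);      // key name = the Redis-style key
    kv.setProperty("value", new Text(jsonValue)); // Text allows values larger than plain String properties
    datastore.put(kv);
}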
Saving JSON as a String works fine. But you will need ways to retrieve your objects other than by ID. That is, you need a way to index your data to support any kind of useful query on it.

solrj: how to store and retrieve List<POJO> via multivalued field in index

My use case is an index which holds titles of online media. The provider of the data associates a list of categories with each title. I am using SolrJ to populate the index via an annotated POJO class
e.g.
#Field("title")
private String title;
#Field("categories")
private List<Category> categoryList;
The associated POJO is
public class Category {
    private Long id;
    private String name;
    ...
}
My question has two parts:
a) Is this possible via SolrJ? The docs only contain an example of @Field using a List of String, so I assume the serialization/marshalling only supports simple types?
b) How would I set up the schema to hold this? I have a naive assumption that I just need to set multiValued=true on the required field and it will all work by magic.
I'm just starting to implement this so any response would be highly appreciated.
The answer is as you thought:
a) You only have simple types available, so you will have a List of a simple type, e.g. String. The point is that you can't represent complex types inside a Lucene document, so you won't be able to deserialize them either.
b) The problem is that you are trying to represent relational thinking in a document store, and that will probably only work up to a point. If you just want to represent categories inside a Lucene document, use the name string; it is not necessary to store the id as well.
The only reason to store the id too is if you want to do a lookup in an RDBMS alongside the search. If you want that, you need to make sure the id and the category name stay soft-linked. This does not work for every 1:n relation: any 1:n relation where the related table consists only of required fields is possible, but if you have an optional field you need to put something like a filler empty-constant into that field where possible.
However, if these 1:n relations are not sparse, it is actually feasible, as long as you maintain the order in which you add fields to the document. So the category relation can probably be represented, provided you don't sort the lists. A sketch follows below.
You could then implement a method which reconstructs a Category by instantiating it with the values at position 0...n: if you want the first category, it will be at position 0 of every category-related list.
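A sketch of that parallel-lists idea, assuming both schema fields are declared multiValued=true (the field names and the Category constructor are hypothetical):

@Field("category_ids")
private List<Long> categoryIds;

@Field("category_names")
private List<String> categoryNames;

// Rebuild the i-th category from the parallel lists; this only works
// if both lists were written in the same order and never re-sorted.
public Category categoryAt(int i) {
    return new Category(categoryIds.get(i), categoryNames.get(i));
}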
