So I have an app that needs to store certain configuration info, and so I am planning on storing the configs as simple JSON documents in Mongo:
appConfig: {
fizz: true,
buzz: 34
}
This might map to a Java POJO/entity like:
public class AppConfig {
private boolean fizz;
private int buzz;
}
etc. Ordinarily, with relational databases, I use Hibernate/JPA for O/R mapping from table data to/from Java entities. I believe the closest JSON/Mongo companion to table/Hibernate is a Morphia/GSON combo: use Morphia to drive connectivity from my Java app to Mongo, and then use GSON to O/J map the JSON to/from Java POJOs/entities.
The problem here is that, over time, my appConfig document structure will change. It may be something simple like:
appConfig: {
fizz: true,
buzz: 34,
foo: "Hello!"
}
Which would then require the POJO/entity to become:
public class AppConfig {
private boolean fizz;
private int buzz;
private String foo;
}
But the problem is that I may have tens of thousands of JSON documents already stored in Mongo that don't have foo properties in them. In this specific case, the obvious solution is to set a default on the property like:
public class AppConfig {
private boolean fizz;
private int buzz;
private String foo = "Hello!";
}
However in reality, eventually the AppConfig document/schema/structure might change so much that it in no way, shape or form resembles its original design. But the kicker is: I need to be backwards-compatible and, preferably, be capable of updating/transforming documents to match the new schema/structure where appropriate.
My question: how is this "versioned document" problem typically solved?
I usually solve this problem by adding a version field to each document in the collection.
You might have several documents in the AppConfig collection:
{
_id: 1,
fizz: true,
buzz: 34
}
{
_id: 2,
version: 1,
fizz: false,
buzz: 36,
foo: "Hello!"
}
{
_id: 3,
version: 1,
fizz: true,
buzz: 42,
foo: "Goodbye"
}
In the above example, there are two documents at version one, and one older document at version zero (in this pattern, I generally interpret a missing or null version field as version zero, because I only add the field once I start versioning documents in production).
The two principles of this pattern:
Documents are always saved at the newest version when they are actually modified.
When a document is read, if it's not at the newest version, it gets transparently upgraded to the newest version.
You do this by checking the version field, and performing a migration when the version isn't new enough:
DBObject update(DBObject document) {
    Object version = document.get("version");
    int current = (version == null) ? 0 : ((Number) version).intValue();
    if (current < 1) {
        document.put("foo", "Hello!"); // add default value for foo
        document.put("version", 1);
    }
    return document;
}
This migration can fairly easily add fields with default values, rename fields, and remove fields. Since it's located in application code, you can do more complicated calculations as necessary.
Once the document has been migrated, you can run it through whatever ODM solution you like to convert it into Java objects. This solution no longer has to worry about versioning, since the documents it deals with are all current!
With Morphia this could be done using the @PreLoad annotation.
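For example, a minimal sketch of that hook (assuming the org.mongodb.morphia package layout; older Morphia versions used com.google.code.morphia):

import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.PreLoad;
import com.mongodb.DBObject;

@Entity
public class AppConfig {
    private boolean fizz;
    private int buzz;
    private String foo;

    // Called by Morphia before the DBObject is mapped onto this class,
    // so old documents are upgraded before any fields are populated.
    @PreLoad
    void migrate(DBObject document) {
        Object version = document.get("version");
        if (version == null || ((Number) version).intValue() < 1) {
            document.put("foo", "Hello!"); // default for pre-v1 documents
            document.put("version", 1);
        }
    }
}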
Two caveats:
Sometimes you may want to save the upgraded document back to the database immediately. The most common reasons for this are when the migration is expensive, the migration is non-deterministic or integrates with another database, or you're in a hurry to upgrade an old version.
Adding or renaming fields that are used as criteria in queries is a bit trickier. In practice, you may need to perform more than one query, and unify the results.
In my opinion, this pattern highlights one of the great advantages of MongoDB: since the documents are versioned in the application, you can seamlessly migrate data representations in the application without any offline "migration phase" like you would need with a SQL database.
The JSON deserializer solves this in a very simple way for you (using Java):
Just allow your POJO/entity to grow with new fields. When you deserialize your JSON from Mongo into your entity, all missing fields will be null.
mongoDocument v1 : Entity of v3
{
fizz: "abc",  --> fizz = "abc";
buzz: 123     --> buzz = 123;
              --> newObj = null;
              --> obj_v3 = null;
}
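A minimal Gson sketch of this behavior, using the field names from the example above:

import com.google.gson.Gson;

public class MissingFieldDemo {

    static class EntityV3 {
        String fizz;
        int buzz;
        String newObj;
        String obj_v3;
    }

    public static void main(String[] args) {
        // A v1 document that predates the newObj/obj_v3 fields.
        String v1Json = "{\"fizz\":\"abc\",\"buzz\":123}";
        EntityV3 entity = new Gson().fromJson(v1Json, EntityV3.class);
        System.out.println(entity.newObj); // null: absent from the document
        System.out.println(entity.obj_v3); // null: absent from the document
    }
}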
You can even use this the other way around if you like to have your legacy servers work with new database objects:
mongoDocument v3 : Entity of v1
{
fizz: "abc",    --> fizz = "abc";
buzz: 123,      --> buzz = 123;
newObj: "zzz",  --> (no matching field, ignored)
obj_v3: "b...", --> (no matching field, ignored)
}
Depending on whether the document has the fields or not, they will be populated by the deserializer.
Keep in mind that booleans are not best suited for this, since they can default to false (depending on which deserializer you use).
So unless you are actively going to work with versioning of your objects, why bother with the overhead, when you can build a legacy-safe server implementation that handles any of the older objects with just a few null checks.
I hope this proposal helps with your set-up.
I guess the thread below will help you, although it is not about versioning documents in the DB, and it uses spring-data-mongodb:
How to add a final field to an existing spring-data-mongodb document collection?
So you can assign values to the POJO based on the existence of the property in the document, using a Converter implementation.
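For instance, a hypothetical read converter along those lines (assuming AppConfig has setters; older Spring Data MongoDB versions convert from DBObject, newer ones from org.bson.Document):

import org.springframework.core.convert.converter.Converter;
import com.mongodb.DBObject;

// Supplies a default when the stored document lacks the newer "foo" property.
public class AppConfigReadConverter implements Converter<DBObject, AppConfig> {
    @Override
    public AppConfig convert(DBObject source) {
        AppConfig config = new AppConfig();
        config.setFizz((Boolean) source.get("fizz"));
        config.setBuzz(((Number) source.get("buzz")).intValue());
        Object foo = source.get("foo");
        config.setFoo(foo != null ? (String) foo : "Hello!");
        return config;
    }
}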
You have a couple of options with Morphia, at least. You could use a versioned class name and rely on Morphia's use of the className property to fetch the correct class version; your application would then just have to migrate that old object to the new class definition. Another option is to use @PreLoad and massage the DBObject coming out of Mongo into the new shape before Morphia maps the DBObject to your class. Using a version field on the class, you can determine which migration to run when the data is loaded. From that point, it would just look like the new form to Morphia and would map seamlessly. Once you save that configuration object back to Mongo, it'd be in the new form and the next load wouldn't need to run the migration.
Related
We're in the process of converting our java application from Hibernate Search 5 to 6 with an Elasticsearch backend.
For some good background info, see How to do highlighting within HibernateSearch over Elasticsearch for a question we had when upgrading our highlighting code from a Lucene to Elasticsearch backend and how it was resolved.
Hibernate Search 6 seems to support using 2 backends at the same time, Lucene and Elasticsearch, so we'd like to use Elasticsearch for all our queries and Lucene for the highlighting, if that's possible.
Here is basically what we're trying to do:
public boolean matchPhoneNumbers() {
String phoneNumber1 = "603-436-1234";
String phoneNumber2 = "603-436-1234";
LuceneBackend luceneBackend =
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
Analyzer analyzer = luceneBackend.analyzer("phoneNumberKeywordAnalyzer").get();
//... builds a Lucene Query using the analyzer and phoneNumber1 term
Query phoneNumberQuery = buildQuery(analyzer, phoneNumber1, ...);
return isMatch("phoneNumberField", phoneNumber2, phoneNumberQuery, analyzer);
}
private boolean isMatch(String field, String target, Query sourceQ, Analyzer analyzer) {
Highlighter highlighter = new Highlighter(new QueryScorer(sourceQ, field));
highlighter.setTextFragmenter(new NullFragmenter());
try {
String result = highlighter.getBestFragment(analyzer, field, target);
return StringUtils.hasText(result);
} catch (IOException e) {
...
}
}
What I've attempted so far is to configure two separate backends in the configuration properties, per the documentation, like this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
The AnalysisConfigurer class implements ElasticsearchAnalysisConfigurer, and CustomLuceneAnalysisConfigurer implements LuceneAnalysisConfigurer.
Analyzers are defined twice, once in the Elasticsearch configurer and again in the Lucene configurer.
I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary but if I don't include the lucene.type, I get Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
But if I do have both backend properties types set, I get
HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend, when attempting to retrieve the Lucene backend, like:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
And the same error when trying to retrieve the Elasticsearch backend.
I've also added @Indexed(..., backend = "elasticsearch") to my entities since I wish to have them saved into Elasticsearch and don't need them in Lucene. I also tried adding a fake entity with @Indexed(..., backend = "lucene") but it made no difference.
What have I got configured wrong?
I don't know why both hibernate.search.backends.elasticsearch.type and hibernate.search.backends.lucene.type are necessary but if I don't include the lucene.type, I get Ambiguous backend type: configuration property 'hibernate.search.backends.lucene.type' is not set.
That's because the backend name is just that: a name. Hibernate Search doesn't infer particular information from it, even if you name your backend "lucene" or "elasticsearch". You could have multiple Elasticsearch backends for all it knows :)
But if I do have both backend properties types set, I get HSEARCH000575: No default backend. Check that at least one entity is configured to target the default backend, when attempting to retrieve the Lucene backend, like:
Search.mapping(entityManager.getEntityManagerFactory())
.backend().unwrap(LuceneBackend.class);
You called .backend(), which retrieves the default backend, i.e. the backend that doesn't have a name and is configured through hibernate.search.backend.* instead of hibernate.search.backends.<somename>.* (see https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#configuration-structure ).
But you are apparently mapping all your entities to named backends, one named elasticsearch and one named lucene. So the default backend just doesn't exist.
You should call this:
Search.mapping(entityManager.getEntityManagerFactory())
.backend("lucene").unwrap(LuceneBackend.class);
I've also added @Indexed(..., backend = "elasticsearch") to my entities since I wish to have them saved into Elasticsearch
Since you obviously only want to use one backend for indexing, I would recommend reverting that change (keeping @Indexed without setting @Indexed.backend) and simply using the default backend.
In short, remove the @Indexed.backend and replace this:
properties.setProperty("hibernate.search.backends.elasticsearch.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backends.elasticsearch.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backends.elasticsearch.uris", "http://127.0.0.1:9200");
With this:
properties.setProperty("hibernate.search.backend.analysis.configurer", "com.bt.demo.search.AnalysisConfigurer");
properties.setProperty("hibernate.search.backends.lucene.analysis.configurer", "com.bt.demo.search.CustomLuceneAnalysisConfigurer");
properties.setProperty("hibernate.search.backend.type", "elasticsearch");
properties.setProperty("hibernate.search.backends.lucene.type", "lucene");
properties.setProperty("hibernate.search.backend.uris", "http://127.0.0.1:9200");
You don't technically have to do that, but I think it will be simpler in the long term. It keeps the Lucene backend as a separate hack that doesn't affect your whole application.
I also tried adding a fake entity with @Indexed(..., backend = "lucene")
I confirm you will need that fake entity mapped to the "lucene" backend, otherwise Hibernate Search will not create the "lucene" backend.
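A sketch of what that placeholder could look like (the class name is hypothetical; imports assume a javax.persistence setup):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.mapper.pojo.mapping.definition.annotation.Indexed;

// Placeholder entity whose only purpose is to make Hibernate Search
// instantiate the named "lucene" backend.
@Entity
@Indexed(backend = "lucene")
public class LuceneBackendPlaceholder {

    @Id
    private Long id;
}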
A related group of applications I've been assigned share a database table for configuration values, with columns 'application', 'config_name', 'config_type' (i.e. String, Integer), and 'config_value'. There's also a stored procedure that takes in a string (applicationName) and returns all config names, types, and values where applicationName == application.
In each application, a wrapper class is instantiated which contains a static ThreadLocal (hereafter 'static config'), and that static config pulls all values from the config table for the application.
When loading configuration values, the stored procedure returns a massive list of properties that are iterated over, going through a long chain of if-else statements testing whether the 'config_name' column matches a string literal, and if so, loading the value into a differently named variable.
EX:
if (result.isBeforeFirst()) {
    while (result.next()) {
        if (result.getString("config_name").equals("myConfig1")) {
            myConfigurationValue1 = result.getString("config_value");
        } else if (result.getString("config_name").equals("myConfig2")) {
            myConfigurationValue2 = result.getString("config_value");
        }
    }
}
These cover between 60-100ish configs per app, and each application has an identical Configuration class save for the names of the properties they're trying to read.
So my questions are:
Having one gigantic configuration class is poor design, right? I'm not entirely sure how to break them down and I can't share them here, but I'm assuming best practice would be to have multiple configuration classes that have all the components needed to perform a particular operation, e.g. 'LocalDatabaseConfig' or '(ExternalSystemName)DatabaseConfig'?
Once broken down, what's the best way to get config values where needed without static access? If I have each class instantiate the configuration it needs, I'll be doing a lot of redundant DB operations, but if I just pass them from the application entry point, then many classes have to 'hold on' to data they don't need in order to feed it to later classes... Is this a time when static config classes are the best option?
Is there an elegant way to load properties from the DB (in core Java - the company is very particular about third-party libraries) without using this massive if-else chain? I keep thinking that ideally we'd just dynamically load each property as it's referenced, but the only way I can think to do that is to use another stored procedure that takes in a unique identifier for a property and loads it that way, and that would involve a lot more string literals...
(Might be invalidated by 3) Is there a better way for the comparison in the pseudo-code above to test for a property rather than using a string literal? Could this be resolved if we just agreed to name our configuration properties in the application the same way they're named in the DB?
Currently every application just copy-pastes this configuration class and replaces the string literals and variable names; many of the values are unique in name and value, some are unique in value but named the same between applications (and vice versa), and some are the same name and value for each application. But because the stored procedure fetches values based on application, redundant DB entries are necessary (despite the fact that many such values are supposed to be the same at all times, and any change to one needs to be performed on the other versions as well). Would it make sense to create a core library class that can construct any of the proposed 'broken down' configuration classes? E.g., every application needs some basic logging configurations that don't change across the applications. We already have a core library that's a dependency for each application, but I don't know whether it would make sense to add all/some/none of the configuration classes to the core library...
Thanks for your help! Sorry for the abundance of questions!
The cascading if-then-else might be eliminated by using a while loop to copy the database-query results into two maps: a Map<String, String> for the string-based configuration variables, and a Map<String, Integer> for the integer configuration variables. Then the class could provide the following operations:
public String lookupStringVariable(String name, String defaultValue) {
String value = stringMap.get(name);
if (value == null) {
return defaultValue;
} else {
return value;
}
}
public int lookupIntVariable(String name, int defaultValue) {
Integer value = intMap.get(name);
if (value == null) {
return defaultValue;
} else {
return value.intValue();
}
}
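The maps themselves can be populated with a single loop over the stored procedure's results; a sketch, assuming the config_type column holds values like "Integer" and "String" as in the question, and the usual java.util.Map/HashMap imports:

Map<String, String> stringMap = new HashMap<>();
Map<String, Integer> intMap = new HashMap<>();
while (result.next()) {
    String name = result.getString("config_name");
    String type = result.getString("config_type");
    String value = result.getString("config_value");
    if ("Integer".equals(type)) {
        intMap.put(name, Integer.valueOf(value));
    } else {
        stringMap.put(name, value);
    }
}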
If there is a requirement (perhaps for runtime performance) to have the configuration values stored in fields of the configuration class, then the configuration class could make the above two operations private and use them to initialize fields. For example:
logLevel = lookupIntVariable("log_level", 2);
logDir = lookupStringVariable("log_dir", "/tmp");
An alternative (but complementary) suggestion is to write a code generator application that will query the DB table and generate a separate Java class for each value in the application column of the DB table. The implementation of a generated Java class would use whatever coding approach you like to query the DB table and retrieve the application-specific configuration variables. Once you have written this generator application, you can rerun it whenever the DB table is updated to add or modify configuration variables.
If you decide to write such a generator, you can use print() statements to generate the Java code. Alternatively, you might use a template engine to reduce some of the verbosity associated with print() statements. An example of a template engine is Velocity, but the "Comparison of web template engines" Wikipedia article lists dozens more.
You would be better off separating the database access from the application initialisation. A basic definition would be Map<String,String> returned by querying for one application's settings:
Map<String,String> config = dbOps.getConfig("myappname");
// which populates a map from the config_name/config_value queries:
// AS: config.put(result.getString("config_name"), result.getString("config_value"));
Your application code then can initialise from the single application settings:
void init(Map<String,String> config) {
myConfigurationValue1 = config.get("myConfig1");
myConfigurationValue2 = config.get("myConfig2");
}
A benefit of this decoupling is that you can define test cases for your application by hardwiring the config for different permutations of Map settings without needing a huge database of test configurations, and you can test the config loader independently of your application logic.
Once this is working, you might consider whether dbOps.getConfig("myappname") should cache the per-application settings to avoid excessive queries (if they don't change in the database), or whether to declare Config as a class backed by the Map but with calls for getInt / get taking default values, and which throw RuntimeException for missing keys:
void init(Config config) {
myConfigurationValue1 = config.get("myConfig1", "aDefaultVal");
myConfigurationInt2 = config.getInt("myConfig2", 100);
}
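A sketch of such a Config class (get and getInt follow the snippet above; require() is a hypothetical addition for mandatory keys):

import java.util.Map;

public class Config {

    private final Map<String, String> values;

    public Config(Map<String, String> values) {
        this.values = values;
    }

    public String get(String name, String defaultVal) {
        return values.getOrDefault(name, defaultVal);
    }

    public int getInt(String name, int defaultVal) {
        String raw = values.get(name);
        return raw == null ? defaultVal : Integer.parseInt(raw);
    }

    // Throws if a mandatory key is absent.
    public String require(String name) {
        String raw = values.get(name);
        if (raw == null) {
            throw new RuntimeException("Missing config key: " + name);
        }
        return raw;
    }
}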
I have the updateProvider(ProviderUpdateDto providerUpdt) method in my Spring controller, but I do not see the need to send the whole payload of the provider entity if, for example, the client can only update the name or some other attribute; that is, it is not necessary to send the whole entity when only one field needs updating. This produces excessive bandwidth consumption when it is not necessary.
What is a better practice for sending only the fields that are going to be updated, and how can I build such a DTO dynamically? And how would I do this if I'm using Spring Boot to build my API?
You can use the Jackson library: it provides the annotation @JsonInclude(Include.NON_NULL), and with this, only properties with non-null values will be passed to your client.
Check the link http://www.baeldung.com/jackson-ignore-null-fields for an example.
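A minimal sketch of such a DTO (field names are illustrative):

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonInclude.Include;

// Null fields are left out of the serialized JSON entirely.
@JsonInclude(Include.NON_NULL)
public class ProviderUpdateDto {
    private String name;    // serialized only when non-null
    private String address; // serialized only when non-null
    // getters and setters...
}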
There are many techniques to improve bandwidth usage:
don't pretty-print JSON
enable HTTP GZIP compression
However, it is more important to ensure your API is logically sound: omitting some fields may break business rules, and too fine-grained an API design will also increase interface complexity.
Another option would be to have a DTO object for field changes which would work for every entity you have. E.g:
class EntityUpdateDTO {
// The class of the object you are updating. Or just use a custom identifier
private Class<? extends DTO> entityClass;
// the id of such object
private Long entityId;
// the fields you are updating
private String[] updateFields;
// the values of those fields...
private Object[] updateValues;
}
Example of a json object:
{
entityClass: 'MyEntityDTO',
entityId: 324123,
updateFields: [
'property1',
'property2'
],
updateValues: [
'blabla',
25
]
}
Might bring some issues if any of your updateValues are complex objects themselves though...
Your API would become updateProvider(EntityUpdateDTO update);.
Of course you should leave out the entityClass field if you have an update API for each DTO, as you'd already know which class entity you are working on...
Still, unless you are working with huge objects I wouldn't worry about bandwidth.
I'm trying to phase out an older java codebase that uses MongoDB/Morphia. During this transition, I'd like the new platform to write to the same MongoDB database/collections so that each can live side by side for a little while. That part I'm doing alright with. My issue is that in the new platform, I need a different package/class structure for the objects I'm mapping with morphia than what is currently in the collection.
For instance, in the old platform I've got this class:
package com.foo;
@Entity
public class Bar {
@Id private String id;
private String name;
...
}
In my mongo database, I now have a collection "Bar" and its documents have the className attribute set to "com.foo.Bar". That's all wonderful.
What I'd like to do in the new platform is create a brand new class in a different package to represent that entity, but have it interact with mongo in the same way. I'm hoping to be able to do something like this:
package com.foo.legacy;
#Entity("com.foo.Bar")
public class LegacyBar {
#Id private String id;
private String name;
...
}
I realize the above doesn't work, but if I change the annotation to @Entity("Bar") I don't get any errors; however, when I look up entities by id, I always get null back.
So... is there any way for me to have 2 separate VMs with 2 class structures and 2 different configurations of Morpha such that each can write to the same database/collection in the same fashion?
If I change LegacyBar to just "Bar" and create it in a package called "com.foo" then everything works as expected. I would just REALLY prefer to have the flexibility to quarantine all of this legacy data in a semi-clean fashion.
Do you even need the className attribute?
You can disable it with
@Entity(value = "Bar", noClassnameStored = true)
and drop the attribute in the database.
Quoting the official documentation:
Why would you need it?
This is mainly used when storing different entities in the same collection and reading them back as the base or super class.
If you don't do this, it should be an easy workaround to allow different package structures.
I am using Spring-data to access a Neo4j database via REST.
One of my entities looks similar to the following one:
@NodeEntity
@TypeAlias("org.example.Foo")
public class Foo {
@GraphId
private Long nodeId;
//...
@RelatedTo(type="HAS_BAR", direction=Direction.OUTGOING)
private Set<Bar> bars;
//...
}
A typical Foo might have anywhere from 0-1000 Bars. Most of the time, those Bars are not needed when loading a Foo, so I thought I should be fine by not adding a @Fetch annotation, thus avoiding eager-loading the Bars.
However, when now loading a Foo using the generated repository methods, the Bars are loaded - at least partially (only their nodeId properties).
Is there any way to avoid this? Performance suffers quite much from this behavior.
I really would like to be able to use lazy-loading like shown in https://stackoverflow.com/a/16159051/232175 for the collection itself.
For lazy fetching to work, Spring Data creates a proxy for each Bar with just enough information (the node id) to fetch it lazily when required. That is why the Bars are being created in your case. I suggest you use the Neo4jTemplate to pull just the Foo properties that you are looking for, as shown below:
Result<Map<String, Object>> result = template.query("START n=node({0}) RETURN n.property1, n.property2, n.property3");
result.handle(new Handler<Map<String, Object>>()
{
#Override
public void handle(Map<String, Object> row)
{
System.err.println(row.get("n.property1"));
System.err.println(row.get("n.property2"));
System.err.println(row.get("n.property3"));
}
});
If you don't have any particular reason to use Neo4j via REST, you can use it embedded with the AspectJ mapping, which doesn't have this problem. You could also use it via REST, but according to this post, Neo4j Spring data POC for social RESTful layer, it is better to avoid it.
If you don't need the Bars most of the time, why not remove them from your entity and just load them when needed with Cypher?
Apart from that, spring-data-neo4j doesn't support explicit lazy loading in simple mode, but you might try your luck with the advanced mapping mode (http://static.springsource.org/spring-data/data-graph/snapshot-site/reference/html/#reference:aspectj)
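A sketch of that on-demand load, reusing the Neo4jTemplate style from the earlier answer (the relationship type comes from the entity above; getNodeId() is a hypothetical accessor for Foo's nodeId field):

Map<String, Object> params = new HashMap<>();
params.put("id", foo.getNodeId());
// Fetch the related Bars only when they are actually needed.
Result<Map<String, Object>> bars = template.query(
        "START f=node({id}) MATCH f-[:HAS_BAR]->b RETURN b", params);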