Does MongoDB duplicate subdocument with identical data? - java

I'm completely new to MongoDB and looking at moving my base persistence code (for many projects) over to it, using JDO as an agnostic layer. So I'm asking this question from the perspective of a Java developer who likes to work with beans as the basic model unit.
My question is about subdocuments and whether they exist independently or are internally consolidated by MongoDB. For example, if I had a domain structure like this:
Household - collection of Persons
Person
- name
- address
Address
- street
- postcode
If I had a document for a household it would have multiple Persons but each Person would have the same address.
Would each address be a distinct and separate entity within MongoDB (even though they are the same 'class' and have the same values)? Or does Mongo somehow identify that they refer to the same entity and internally store a UID for each Address?
More importantly, if I update the postcode for one address, does that mean that the address subdocument of every member of the Household would reflect that change?
If it does, it seems to be straying into the relational sphere, but without such referencing I can see horrible inefficiencies arising.

No, Mongo will not deduplicate those subdocuments for you. If you want to normalize that data, you'll need to save those addresses into a different collection (ideally) and store DBRefs to those documents when you save the enclosing documents. Something like Morphia or Spring Data can help manage those references for you.

If persisting data via JDO, you have the choice of embedding the Person+Address into Household, or persisting them as individual objects (just like you do with an RDBMS). If storing as not-embedded, then it's up to you whether you have multiple copies of the same Person, or a single one referred to by multiple Households. If storing as embedded, then they are part of the Household, hence the info is duplicated.
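To make the difference between the two shapes concrete, here is a minimal plain-Java sketch. It does not use the MongoDB driver or JDO; the class names, the "addr-1" id and the sample postcodes are all made up for illustration, and the HashMap only stands in for a separate address collection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two storage shapes; no MongoDB driver involved.
final class AddressModelSketch {

    static final class Address {
        String street;
        String postcode;

        Address(String street, String postcode) {
            this.street = street;
            this.postcode = postcode;
        }
    }

    // Embedded shape: each person carries its own copy of the address.
    static final class EmbeddedPerson {
        final String name;
        final Address address;

        EmbeddedPerson(String name, Address address) {
            this.name = name;
            // Copy the values, the way an embedded subdocument is stored per parent.
            this.address = new Address(address.street, address.postcode);
        }
    }

    // Referenced shape: each person stores only the id of a shared address.
    static final class ReferencingPerson {
        final String name;
        final String addressId;

        ReferencingPerson(String name, String addressId) {
            this.name = name;
            this.addressId = addressId;
        }
    }

    /** Postcodes seen by two embedded persons after updating only the first one. */
    static List<String> embeddedPostcodesAfterUpdate() {
        Address home = new Address("1 High Street", "AB1 2CD");
        EmbeddedPerson alice = new EmbeddedPerson("Alice", home);
        EmbeddedPerson bob = new EmbeddedPerson("Bob", home);
        alice.address.postcode = "ZZ9 9ZZ"; // touches Alice's copy only
        List<String> result = new ArrayList<>();
        result.add(alice.address.postcode);
        result.add(bob.address.postcode);
        return result;
    }

    /** Postcodes seen by two referencing persons after one update to the shared address. */
    static List<String> referencedPostcodesAfterUpdate() {
        Map<String, Address> addresses = new HashMap<>(); // stands in for a separate collection
        addresses.put("addr-1", new Address("1 High Street", "AB1 2CD"));
        ReferencingPerson alice = new ReferencingPerson("Alice", "addr-1");
        ReferencingPerson bob = new ReferencingPerson("Bob", "addr-1");
        addresses.get(alice.addressId).postcode = "ZZ9 9ZZ"; // single shared record
        List<String> result = new ArrayList<>();
        result.add(addresses.get(alice.addressId).postcode);
        result.add(addresses.get(bob.addressId).postcode);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("embedded:   " + embeddedPostcodesAfterUpdate());
        System.out.println("referenced: " + referencedPostcodesAfterUpdate());
    }
}
```

In the embedded shape the update reaches one person only, so changing a household's postcode means rewriting every copy. In the referenced shape one write is enough, at the cost of an extra lookup when reading a person.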

Related

java object querying ( applying SQL logic on List of java objects)

I have an Employee object, for example. Employee contains several fields (300+) like name, department, salary, age, account, etc.
The entire Employee table is cached in a Java List object, which contains 2+ million records.
Requirement
The user can search on any field present in the Employee object, e.g. employee name like Sehwag and age > 30 or salary > 100000. Based on the user's search we have to show the filtered list of Employees.
Due to performance issues we are not querying the DB; we want to apply the user's search criteria to the Java List object cached earlier.
Is there an API, framework, or any other solution that lets us query Java objects?
Below is the approach I am trying, but I feel it is not a good one:
Iterating over the Employee list and applying the user's search criteria to each Employee object. Working out which of the 300 fields the user selected is challenging; I have written a lot of enum mapping logic and some additional logic for every field to make it work.
It may work for the current requirement, but I am thinking of using an API or framework, or a better way to solve the requirement.
Thanks in advance for your help.
First of all, if you either don't want to or don't have the chance to change the existing solution, take a look at querydsl.
There is a querydsl-collections module which fits your need exactly. See more at http://www.querydsl.com/ and http://www.querydsl.com/static/querydsl/latest/reference/html/ch02s08.html
However, if you have a chance to review/rebuild the solution, you should consider something more appropriate for large-volume querying. I suggest exploring NoSQL databases (MongoDB) or indexing tools such as Lucene, or Elasticsearch, which adds a RESTful layer on top of Lucene.
I hope it helps.
I tried CQEngine's SQL-based queries (https://github.com/npgall/cqengine) and they suit my requirement.
Below are some useful links:
https://dzone.com/articles/getting-started-cqengine-linq
https://mvnrepository.com/artifact/com.googlecode.cqengine/cqengine
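For comparison, here is a minimal sketch of the same kind of search in plain Java, using neither querydsl nor CQEngine. It composes java.util.function.Predicate objects and filters with a stream. The Employee class is cut down to three fields and the sample data is invented. The question does not say how "and" and "or" should group, so the sketch assumes (name like Sehwag AND age > 30) OR salary > 100000.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Linear-scan filtering with composable predicates; illustrative only.
final class EmployeeFilterSketch {

    static final class Employee {
        final String name;
        final int age;
        final double salary;

        Employee(String name, int age, double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }
    }

    // One small predicate factory per searchable condition.
    static Predicate<Employee> nameContains(String fragment) {
        return e -> e.name.contains(fragment);
    }

    static Predicate<Employee> olderThan(int age) {
        return e -> e.age > age;
    }

    static Predicate<Employee> earnsMoreThan(double salary) {
        return e -> e.salary > salary;
    }

    /** Returns the names of all cached employees that satisfy the criteria. */
    static List<String> search(List<Employee> cache, Predicate<Employee> criteria) {
        return cache.stream()
                .filter(criteria)
                .map(e -> e.name)
                .collect(Collectors.toList());
    }

    // Invented sample data standing in for the cached table.
    static List<Employee> sampleCache() {
        List<Employee> cache = new ArrayList<>();
        cache.add(new Employee("Sehwag", 35, 50_000));
        cache.add(new Employee("Dravid", 40, 150_000));
        cache.add(new Employee("Sehwag Jr", 25, 40_000));
        cache.add(new Employee("Kumble", 28, 90_000));
        return cache;
    }

    public static void main(String[] args) {
        // Assumed grouping: (name like Sehwag AND age > 30) OR salary > 100000
        Predicate<Employee> criteria =
                nameContains("Sehwag").and(olderThan(30)).or(earnsMoreThan(100_000));
        System.out.println(search(sampleCache(), criteria));
    }
}
```

This still scans the whole list on every search, so with 2+ million records it only removes the hand-written mapping logic, not the scanning cost. Avoiding the full scan is where an indexing approach would come in.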

Google App Engine update only one property of an entity that has many efficiently java

Looking for an efficient way to update only one property for an entity in GAE.
I know I can do a get by key, set a property and then put. But won't the get be very inefficient, as it will load all properties? I have heard that you can make property-specific queries, but I am worried that if I load an entity with only, say, one or two of its properties and then put it back in the datastore, the properties not loaded in the query will be lost.
Any Advice?
PS also not sure about the query method because I heard direct gets are more efficient. Any possibility of a query that specifies simply the key and therefore will be just as efficient?
Afaik, entities are stored in a serialised form, so it makes no difference if you need one or all properties as they will all be loaded when entity's serialised form is loaded.
The "property specific queries" are actually called projection queries. They work on indexes only and only recreate "projected" fields you queried by. Since entities are only partially loaded (only projected fields are loaded) they should not be saved back to the Datastore.
Just use a normal query and then a multi-put. Yes, direct gets are more efficient (and less costly), but you need to have the key/id of the entity.
If you need to update one property far more than others, you can move it into a separate, simpler entity that you can load and update independently of the main entity. This could be a child entity, or a separate one that shares key characteristics.
E.g.
Email <- main entity
Unread <- child entity of email
When the email is created, create an unread entity. When it's read, delete the unread entity. When searching for unread emails, perform a key-only query on the Unread entities, extract parent keys to find the Email entities you want.
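The Email/Unread idea can be sketched in memory as follows. This is not Datastore code: the two collections only stand in for the Email kind and the Unread child kind, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory model of a large main entity plus a tiny marker entity.
final class UnreadMarkerSketch {

    // Main entities keyed by id; imagine many more properties than a subject.
    private final Map<Long, String> emails = new LinkedHashMap<>();

    // One marker per unread email, keyed by the parent email's id.
    private final Set<Long> unreadMarkers = new LinkedHashSet<>();

    /** Creating an email also creates its unread marker. */
    void receive(long id, String subject) {
        emails.put(id, subject);
        unreadMarkers.add(id);
    }

    /** Reading an email deletes the marker; the email itself is never rewritten. */
    void markRead(long id) {
        unreadMarkers.remove(id);
    }

    /** Scan only the markers, then fetch the parent emails by key. */
    List<String> unreadSubjects() {
        List<String> subjects = new ArrayList<>();
        for (Long parentId : unreadMarkers) {
            subjects.add(emails.get(parentId));
        }
        return subjects;
    }

    public static void main(String[] args) {
        UnreadMarkerSketch inbox = new UnreadMarkerSketch();
        inbox.receive(1L, "Hello");
        inbox.receive(2L, "Invoice");
        inbox.markRead(1L);
        System.out.println(inbox.unreadSubjects());
    }
}
```

The point of the design is that the frequently changing state lives in an entity small enough that creating or deleting it never touches the large one.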

Is modeling infinite-scale relationships in NoSQL / BigTable (GAE) possible?

My team is writing an application with GAE (Java) that has led me to question the scalability of entity relationship modeling (specifically many-to-many) in object oriented databases like BigTable.
The preferred solution for modeling unowned one-to-many and many-to-many relationships in the App Engine Datastore (see Entity Relationships in JDO) seems to be list-of-keys. However, Google warns:
"There are a few limitations to implementing many-to-many
relationships this way. First, you must explicitly retrieve the values
on the side of the collection where the list is stored since all you
have available are Key objects. Another more important one is that you
want to avoid storing overly large lists of keys..."
Speaking of overly large lists of keys, if you attempt to model this way and assume that you are storing one Long for each key, then with a per-entity limit of 1MB the theoretical maximum number of relationships per entity is ~130k. For a platform whose primary advantage is scalability, that's really not that many relationships. So now we are looking at possibly sharding entities which require more than 130k relationships.
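The ~130k figure is just the entity size limit divided by the size of a Long. A small sketch of that arithmetic, assuming 1MB means 1024 * 1024 bytes and ignoring any per-property or encoding overhead (so the practical ceiling would be lower):

```java
// Upper bound on how many 8-byte longs fit in one entity of a given size.
final class KeyListLimitSketch {

    static long maxLongsPerEntity(long entityLimitBytes) {
        return entityLimitBytes / Long.BYTES; // Long.BYTES is 8
    }

    public static void main(String[] args) {
        System.out.println(maxLongsPerEntity(1024L * 1024L)); // 131072
    }
}
```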
A different approach (Relationship Model) is outlined in the article Modeling Entity Relationships as part of the Mastering the datastore series in the AppEngine developer resources. However, even here Google warns about the performance of relational models:
"However, you need to be very careful because traversing the
connections of a collection will require more calls to the datastore.
Use this kind of many-to-many relationship only when you really need
to, and do so with care to the performance of your application."
So by now you are asking: 'Why do you need more than 130k relationships per-entity?' Well I'm glad you asked. Let's take, for example, a CMS application with say 1 million users (Hey I can dream right?!)
Users can upload content and share it with:
1. public
2. individuals
3. groups
4. any combination
Now someone logs in, and navigates to a dashboard that shows new uploads from people they are connected to in any group. This dashboard should include public content, and content shared specifically with this user or a group this user is a member of. Not too bad right? Let's dig into it.
    public class Content {
        private Long id;
        private Long authorId;
        private List<Long> sharedWith; // can be individual ids or group ids
    }
Now my query to get everything an id is allowed to see might look like this:
    List<Long> idsThatGiveMeAccess = new ArrayList<Long>();
    idsThatGiveMeAccess.add(myId);
    idsThatGiveMeAccess.add(publicId); // Let's say that sharing with 0L makes it public
    for (Group g : groupsImIn) {
        idsThatGiveMeAccess.add(g.getId());
    }
    List<Long> authorIdsThatIWantToSee = new ArrayList<Long>();
    // Add a bunch of authorIds
    Query q = new Query("Content")
        .addFilter("authorId", Query.FilterOperator.IN, authorIdsThatIWantToSee)
        .addFilter("sharedWith", Query.FilterOperator.IN, idsThatGiveMeAccess);
Obviously I've already broken several rules. Namely, using two IN filters will blow up. Even a single IN filter at any size approaching the limits we are talking about would blow up. Aside from all that, let's say I want to limit and page through the results... no no! You can't do that if you use an IN filter. I can't think of any way to do this operation in a single query - which means you can't paginate it without extensive read-time processing and managing multiple cursors.
So here are the tools I can think of for doing this: denormalization, sharding, or relationship entities. However even with these concepts I don't see how it is possible to model this data in a way that could scale. Obviously it's possible. Google and others do it all the time. I just can't see how. Can anyone shed any light on how to model this or point me toward any good resources for cms-style access control based on NoSQL DB?
Storing a list of ids as a property won't scale.
Why not simply store a new object for each new relationship (like in SQL)?
For your CMS, that object will store two properties: the id of the shared item and the user id. If it's shared with 1000 users, you will have 1000 of these. Querying it for a given user is trivial. Listing the permissions for a given item, or what a user has had shared with them, is easy too.
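A minimal in-memory sketch of that one-object-per-relationship model follows. The names are invented. The second field is called principalId because the question also shares with groups and uses id 0 for "public"; the answer itself only mentions a user id. The loop stands in for what would be an equality query on principalId in a real datastore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// One small Share object per (content, principal) pair instead of a list property.
final class ShareRelationSketch {

    static final class Share {
        final long contentId;
        final long principalId; // user id, group id, or the public id (assumed convention)

        Share(long contentId, long principalId) {
            this.contentId = contentId;
            this.principalId = principalId;
        }
    }

    private final List<Share> shares = new ArrayList<>();

    void share(long contentId, long principalId) {
        shares.add(new Share(contentId, principalId));
    }

    /** Content ids visible to any of the given principals. */
    Set<Long> visibleTo(Set<Long> principalIds) {
        Set<Long> contentIds = new TreeSet<>();
        for (Share s : shares) {
            if (principalIds.contains(s.principalId)) {
                contentIds.add(s.contentId);
            }
        }
        return contentIds;
    }

    /** Principals a given piece of content has been shared with. */
    Set<Long> sharedWith(long contentId) {
        Set<Long> principalIds = new TreeSet<>();
        for (Share s : shares) {
            if (s.contentId == contentId) {
                principalIds.add(s.principalId);
            }
        }
        return principalIds;
    }

    public static void main(String[] args) {
        ShareRelationSketch acl = new ShareRelationSketch();
        acl.share(10L, 0L);  // public
        acl.share(11L, 42L); // user 42
        acl.share(12L, 7L);  // group 7
        Set<Long> me = new TreeSet<>();
        me.add(0L);
        me.add(42L);
        me.add(7L);
        System.out.println(acl.visibleTo(me));
    }
}
```

Because each relationship is its own small object, no single entity grows with the number of users an item is shared with.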

How to accomplish CRUD operations on dynamic forms?

I have followed BalusC's 1st method to create a dynamic form from fields defined in the database.
I can get the field names and values of the posted fields.
But I am confused about how to save the values into the database.
Should I precreate a table to hold the values after creating the form and save the values there manually (by forming the SQL query manually)?
Should I convert the name/value pairs to JSON objects and save those?
Should I create a simple table with id, name, value fields and save the name/value pairs there (like an EAV scheme)?
Or is there any other way of persisting the posted values into the database?
Regards
It looks like you're trying to work bottom-up instead of top-down.
The dynamic form in the linked answer is intended to be reused among all existing tables without the need to manually create separate JSF CRUD forms in "hardcoded" Facelets files for every single table. You should already have a generic model available which contains information about all available columns in the particular DB table (which is Field in the linked answer). This information can be extracted dynamically and generically via JPA metadata (how to do that in turn depends on the JPA provider used) or just via the good ol' JDBC ResultSetMetaData class, once during the application's startup.
If you really need to work bottom-up, then it gets trickier. Creating tables/columns during runtime is a very bad design (unless you intend to develop some kind of DB management tool like PhpMyAdmin, of course). Without the need to create tables/columns at runtime, you should basically have 3 tables:
1 table which contains information about which "virtual" DB tables are available.
1 table which contains information about which columns one such "virtual" DB table has.
1 table which contains information about which values one such column has.
Then you should link them together by FK relationships.

using objectify how to get a subset of properties for an object

I have a large object that I store using objectify. I need a list of those objects with only subset of the properties populated. How can this be done?
App Engine stores and retrieves entities as encoded Protocol Buffers. There's no way for the underlying infrastructure to store, update, or retrieve only part of an entity, so there's no point having a library that does this; hence Objectify, like other libraries, doesn't. If you regularly need to access only part of an entity, split those fields into a separate entity.
It's not a good idea to split an entity in two in a NoSQL database: when you need to read a list of entries, you would be obliged to do n requests to get the second part of the list (n x m if your data is split into more entities). This follows from the fact that there are no joins in NoSQL databases.
What could be done is to "cache": duplicate the needed subset in another entity to get the most of performance. It has the disadvantage of being obliged to write twice on a persist of the main entity (if a field of the subset was changed).
What I usually do is write a /** OPTIMIZE xxxx */ comment on the class that needs to read a subset and get back to it when I need more performance.
