How to avoid creating duplicate nodes when ID is not known

I am using Neo4j to save events from GitLab webhooks.
An example of the data can be found here https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/web_hooks/web_hooks.md#push-events
One of the nodes is an Author {name,email}
Here the email is the natural unique id.
In Hibernate (JPA) there is an annotation called @Id that I could set on the Author field email (ref to docs).
How can I make Neo4j OGM persist/merge based on email instead of its id?

One of the fastest/easiest solutions would be to use a constraint:
CREATE CONSTRAINT ON (a:Author) ASSERT a.email IS UNIQUE
This way, Neo4j ensures the constraint is respected, and you don't have to implement uniqueness on the server side, since the database enforces it.

In an ETL tool you would reasonably expect to be able to define your source keys/identities. Neo4j OGM is not really an ETL tool, but if you want to use it as a data importer, you have a couple of options.
The first is to manage the key mappings yourself. Of course, this may be impractical, depending on volumes and other considerations. The second is to always try to fetch a given object from the graph via its email address before saving any item from your event feed.
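
As a rough illustration of both options, here is a minimal sketch of a find-or-create lookup keyed by email. It assumes an in-memory map standing in for the graph; `AuthorRegistry`, `Author` and `findOrCreate` are hypothetical names and are not part of Neo4j OGM. In a real importer the map lookup would be replaced by a query for the Author with that email.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph; not a Neo4j OGM class.
final class AuthorRegistry {

    // Minimal author record identified by its natural key (email).
    static final class Author {
        final long nodeId;
        final String email;
        final String name;

        Author(long nodeId, String email, String name) {
            this.nodeId = nodeId;
            this.email = email;
            this.name = name;
        }
    }

    // Key mapping managed by the importer: email -> already stored author.
    private final Map<String, Author> byEmail = new HashMap<>();
    private long nextNodeId = 0;

    // Look the author up by email first; create one only if none exists yet.
    // The first name seen for an email is kept, later names are ignored.
    Author findOrCreate(String email, String name) {
        return byEmail.computeIfAbsent(email, key -> new Author(nextNodeId++, key, name));
    }

    int size() {
        return byEmail.size();
    }

    public static void main(String[] args) {
        AuthorRegistry registry = new AuthorRegistry();
        Author first = registry.findOrCreate("admin@example.com", "Administrator");
        Author again = registry.findOrCreate("admin@example.com", "Admin");
        System.out.println("same author reused: " + (first == again));
        System.out.println("authors stored: " + registry.size());
    }
}
```

The lookup and the insert happen in one step here, which is also why a uniqueness constraint in the database is still worth having when several importers run concurrently.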

I have a couple of comments.
Administrator seems to be the value of the name property, not the email property. Are you sure the 3 Administrator nodes in your image actually have the same email property value? To make the Neo4j Browser show the email values, set the caption for the Author nodes to email instead of name.
Aside from the above, it looks like you have already tried this Cypher query, but are getting what seem to be duplicates:
MERGE (n:Author {name: {name}, email: {email}})
RETURN n
That could be explained if it is possible for the input data to contain multiple names for the same email address. In that case, the following query should prevent "duplicate" Author nodes. If an Author with the parameterized email address already exists (regardless of its name value), it just returns the existing node (without changing its name); otherwise, it creates a new Author node with the parameterized email and name properties. This solution would mean that only the first name encountered for an email address will be stored in the DB.
MERGE (n:Author {email: {email}})
ON CREATE SET n.name = {name}
RETURN n

Related

How to get users of group (with nested) in OpenLDAP (UnboundID Java API)

Hi everyone,
I am having a problem getting all the users that are inside one group. I have the group name, and my task is to get a list of all its users. I do not have the memberOf property enabled on the OpenLDAP server. So far I have been able to get the group using:
"(&(objectClass=groupOfNames)(cn=" + groupName + "))";
Once I had the group, I used its member attribute, e.g. "cn=ldapuser1,ou=Users,dc=example,dc=com".
With this I ran another query to get all users with the given names (in the example above the name would be ldapuser1). I use this query:
"(&(objectClass=inetOrgPerson)(|" + builder + "))"; // it can contain several names
The problem is that if my main group contains another group, my second query does not work.
What works for now (though it is not that straightforward) is getting the users of one single group. Two calls to the server are required: the first to get the group, and the second, based on its member attribute, to ask for the specific users.
What does not work: if a single group contains one user and one group that in turn contains two users, my current solution returns only one user. What I want in this example is three users.
I already have a working Active Directory user search and it is so simple: I just use memberOf with the "1.2.840.113556.1.4.1941" filter for nested groups and their users. Why it can not be this simple with OpenLDAP?
So my final question is: what is the best approach to implement this, and how do I build this kind of query?
I would really like some advice from you guys,
any help would be appreciated!
many thanks,
cheers

Why it can not be this simple with OpenLDAP?
"1.2.840.113556.1.4.1941" (aka LDAP_MATCHING_RULE_IN_CHAIN) is an Extensible Match operator that walks the chain of ancestry in objects all the way to the root until it finds a match and is, as far as I know, only available with Microsoft Active Directory.
You could, of course, write code to evaluate each memberOf value returned to determine if it is a group and then traverse each group.
A friend of mine once said "Complexity can neither be created nor destroyed, but only moved around". Nested groups are one of those types of complexity issues. Do it on the client or the server, but it is still complex and resource intensive.
Nested groups are a compounding problem, and even with Microsoft Active Directory, using "1.2.840.113556.1.4.1941" will fail on large nests of groups and/or large numbers of members. I recommend that nested groups be avoided.
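
If you do go the client-side route, the traversal could look roughly like the sketch below. It assumes the group entries have already been read into a map from group DN to the DNs in its member attribute; in a real client each map lookup would be one LDAP search. `NestedGroupResolver` and `resolveUsers` are hypothetical names, not part of the UnboundID SDK. The visited set is there so that groups nested in a cycle do not loop forever.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the map stands in for LDAP searches on group entries.
final class NestedGroupResolver {

    // groups: group DN -> DNs found in that group's "member" attribute.
    // Any member DN that is not a key of the map is treated as a user.
    static Set<String> resolveUsers(String rootGroupDn, Map<String, List<String>> groups) {
        Set<String> users = new LinkedHashSet<>();
        Set<String> visitedGroups = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(rootGroupDn);

        while (!pending.isEmpty()) {
            String groupDn = pending.pop();
            if (!visitedGroups.add(groupDn)) {
                continue; // already expanded, so cyclic nesting cannot loop
            }
            List<String> members = groups.getOrDefault(groupDn, Collections.<String>emptyList());
            for (String memberDn : members) {
                if (groups.containsKey(memberDn)) {
                    pending.push(memberDn); // member is itself a group: expand later
                } else {
                    users.add(memberDn);    // member is a user entry
                }
            }
        }
        return users;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new HashMap<>();
        groups.put("cn=main,ou=Groups,dc=example,dc=com", Arrays.asList(
                "cn=ldapuser1,ou=Users,dc=example,dc=com",
                "cn=sub,ou=Groups,dc=example,dc=com"));
        groups.put("cn=sub,ou=Groups,dc=example,dc=com", Arrays.asList(
                "cn=ldapuser2,ou=Users,dc=example,dc=com",
                "cn=ldapuser3,ou=Users,dc=example,dc=com"));
        // One user directly in the main group plus two in the nested group.
        System.out.println(resolveUsers("cn=main,ou=Groups,dc=example,dc=com", groups));
    }
}
```

Every nested group costs at least one more round trip to the server, which is the "moved around" complexity mentioned above.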

How to search for an unknown collection containing identifying document fields?

Is it possible to search for an unknown collection that contains a document with identifying fields like an email?
My structure is like this:
Each user gets their own collection based on a unique ID. Each collection contains a user doc and an accounts doc. The user doc contains the information about the user that I would like to search for. The accounts doc contains a collection of bank accounts that I want to use to transfer money between users.
My problem is that I don't want users to type in the long unique ID to reach the collection; they should type the email of the user, which is stored in the collection's user document. The email is unique.
Have I just made a bad structure for my project or is there something I can do?
UPDATE
Thanks, Alex and Frank for the feedback.
I went on and changed my structure to as shown:
/users/$uid/accounts/$accountid.
I ran a Java query, Query collectionReference = db.collection("users").whereEqualTo("uEmail", userEmail); and saved document.getId() as a String userId.
I then use userId in a spinner to let the user pick an account from that user's accounts collection.

As Alex said, there is no way to load data from a collection (with the client-side SDKs) unless you know the collection name.
But in this case, it seems like your collections are named after the user's UID.
That means that if the user is signed in, you can know their collection by:
String uid = FirebaseAuth.getInstance().getCurrentUser().getUid();
CollectionReference userCollection = FirebaseFirestore.getInstance().collection(uid);
A few notes:
It is much more idiomatic to store your structure with a top-level collection of users, and then a document for each user under that, and then subcollections for the other data under that. So for example: /users/$uid/accounts/$accountid.
The server-side SDKs do have a method to get a list of collections, for example like this listCollections method in the Node.js SDK. But these SDKs are only to be used in a trusted environment, such as your development machine, a server you control, or Cloud Functions, and not directly on the client. Even with this SDK though, you'll have to iterate the list of collections and check each in turn, because as said before: you can only read data from a collection of which you know the name.
If you're trying to look up the collection/UID for a user other than the one who's signed in to the app, you may need a way to map an email address to a UID. Such functionality is not available in the client-side SDKs. But, similar to the point above, there is a getUserByEmail method in the Admin SDKs.

Is it possible to search for an unknown collection that contains a document with identifying fields like an email?
No, you should know the name of your collection in order to be able to use it in your reference. There are no wildcards in Cloud Firestore paths to collections/documents. You have to identify every collection and every document by their specific ids.

using ldap credential and search in postgresql tables

I'm a beginner with LDAP, and I want to use it in a future project with a PostgreSQL database.
Suppose that I do the authentication with an LDAP server, so there will be no user table in the PostgreSQL database. The PostgreSQL database will have other tables that must be related to the identity of the user (which will be retrieved from LDAP), so I have to add a column named uid to each of these tables that stores the uid value of the user. Is my idea correct?

What you describe is perfectly fine. Just be aware that which attribute you use as the unique identifier depends on which LDAP directory you are using.
I really only know Active Directory, which does not use the uid attribute at all. AD has a few attributes that are enforced unique:
distinguishedName: Describes where the object is in the directory. It looks something like: CN=Gabriel Luci,OU=Users,DC=domain,DC=com. This is common to LDAP in general, but might be called something different in other LDAP directories.
sAMAccountName: This is commonly referred to as the "username". It must be unique on the domain, but it can be changed.
userPrincipalName: Uses the format username@domain.com. This must be unique in the AD forest, but it can be changed (a "forest" is when there are multiple AD domains in the same organization).
objectSid: (usually just called the SID). It is stored as a byte array, but can be converted to a string that looks like S-1-5-32-##########-###########-##########-#####. This is what is used by Windows in security permissions to grant accounts permissions to files, etc. This cannot be changed.
objectGUID: A GUID that is automatically assigned when the account is created. This cannot be changed.
The first three are human-readable (they will usually have the person's name in them). The other two are not, but they also stay the same for the life of the object (if the person changes their name, the SID and GUID will still be the same).
Which one you use depends on your requirements. The distinguishedName is unique and allows you to bind directly to the object when you need to (as opposed to having to search for the sAMAccountName to find the account). But if you want something that will never change even if the person's name changes, then objectSid or objectGUID is best.

Google App Engine update only one property of an entity that has many efficiently java

Looking for an efficient way to update only one property for an entity in GAE.
I know I can do a get by key, set a property and then put. But will the get not be very inefficient as it will load all properties? I have heard that you can make property specific queries but I was worried that once you load an entity with only say one or two out of its total properties, then put it back in the datastore that the properties not loaded in the query will be lost.
Any Advice?
PS also not sure about the query method because I heard direct gets are more efficient. Any possibility of a query that specifies simply the key and therefore will be just as efficient?

Afaik, entities are stored in a serialised form, so it makes no difference if you need one or all properties as they will all be loaded when entity's serialised form is loaded.
The "property specific queries" are actually called projection queries. They work on indexes only and only recreate "projected" fields you queried by. Since entities are only partially loaded (only projected fields are loaded) they should not be saved back to the Datastore.
Just use a normal query and then a multi-put. Yes, direct gets are more efficient (and less costly), but you need to have the key/id of the entity.

If you need to update one property far more than others, you can move it into a separate, simpler entity that you can load and update independently of the main entity. This could be a child entity, or a separate one that shares key characteristics.
E.g.
Email <- main entity
Unread <- child entity of email
When the email is created, create an unread entity. When it's read, delete the unread entity. When searching for unread emails, perform a key-only query on the Unread entities, extract parent keys to find the Email entities you want.

Does MongoDB duplicate subdocument with identical data?

I'm completely new to MongoDB and looking at moving my base persistence code (for many projects) over to it using JDO as an agnostic layer. So I'm asking this question from the perspective of a Java developer who likes to work with beans as the basic model unit.
My question is about subdocuments and whether they exist independently or are internally consolidated by MongoDB, i.e. if I had a domain structure like this:
Household - collection of Persons
Person
- name
- address
Address
- street
- postcode
If I had a document for a household it would have multiple Persons but each Person would have the same address.
Would each address be a distinct and separate entity within MongoDB (even though they are the same 'class' and have the same values)? Or does Mongo somehow identify that they are referring to the same entity and internally store a UID for each Address?
More importantly. If I update the postcode for one address does that mean that every member of Household's address subdocument would reflect that change?
It seems if it does then it's straying into the relational sphere but without such referencing I can see horrible inefficiencies arising?

Mongo will not deduplicate those subdocuments for you, no. If you want to normalize that data, you'll need to save those addresses into a different collection (ideally) and store DBRefs to those documents when you save the enclosing documents. Using something like Morphia or Spring Data can help manage those references for you.

If persisting data via JDO you have the choice of embedding the Person+Address into Household, or persisting them as individual objects (just like you do with an RDBMS). If storing them as not embedded, then it's up to you whether you have multiple copies of the same Person, or a single one referred to by multiple Households. If storing them as embedded, then they are part of Household, hence the info is duplicated.
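
To make the difference concrete, here is a small sketch that models the two layouts with plain Java objects rather than MongoDB or JDO calls. All class names are hypothetical, and the `addresses` map merely stands in for a separate collection. With embedded copies, changing one person's postcode leaves the others untouched; with a stored address id, every person resolving that id sees the change.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the two layouts; no MongoDB or JDO API is used here.
final class EmbeddedVsReferenced {

    static final class Address {
        final String street;
        String postcode;

        Address(String street, String postcode) {
            this.street = street;
            this.postcode = postcode;
        }
    }

    // Embedded layout: each person owns its own copy of the address,
    // the way each subdocument holds its own values.
    static final class EmbeddedPerson {
        final String name;
        final Address address;

        EmbeddedPerson(String name, String street, String postcode) {
            this.name = name;
            this.address = new Address(street, postcode);
        }
    }

    // Referenced layout: each person stores only the id of a shared address.
    static final class ReferencingPerson {
        final String name;
        final String addressId;

        ReferencingPerson(String name, String addressId) {
            this.name = name;
            this.addressId = addressId;
        }

        Address resolve(Map<String, Address> addresses) {
            return addresses.get(addressId);
        }
    }

    public static void main(String[] args) {
        EmbeddedPerson ann = new EmbeddedPerson("Ann", "1 High St", "AB1 2CD");
        EmbeddedPerson bob = new EmbeddedPerson("Bob", "1 High St", "AB1 2CD");
        ann.address.postcode = "ZZ9 9ZZ";
        System.out.println("embedded, Bob: " + bob.address.postcode);

        Map<String, Address> addresses = new HashMap<>();
        addresses.put("addr-1", new Address("1 High St", "AB1 2CD"));
        ReferencingPerson cat = new ReferencingPerson("Cat", "addr-1");
        ReferencingPerson dan = new ReferencingPerson("Dan", "addr-1");
        cat.resolve(addresses).postcode = "ZZ9 9ZZ";
        System.out.println("referenced, Dan: " + dan.resolve(addresses).postcode);
    }
}
```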
