Google Dataflow: Write to Datastore without overwriting existing entities - java

TLDR: Looking for a way to update Datastore entities without overwriting existing data via Dataflow
I'm using Dataflow 2.0.0 (Beam) to update entities in Google Datastore. My pipeline loads entities from Datastore, updates them, and then saves them back into Datastore (overwriting the existing entities).
However, during the update process I also discover additional entities that may or may not already exist. To prevent overwriting existing entities, I previously loaded all the entities from Datastore and reduced them (group by key), removing the new duplicates.
As the number of entities grows, I want to avoid loading all entities into Dataflow (instead taking them in batches based on the oldest timestamps), but I'm running into the problem that old entities get overwritten when they are not in the current batch.
I'm writing the entities to Datastore using (in two spots, one for existing entities and one for new entities):
collection.apply(DatastoreIO.v1().write().withProjectId("..."))
It would be really nice if there were something like a DatastoreIO.v1().writeNew() method, but sadly it doesn't exist. Thank you for any help.

If you want to write a new entity that does not yet exist in Datastore, you just create one with a new key and write it:
// Suppose you have new key names to store.
List<String> keyNames = Arrays.asList("L1", "L2");

// This is a typical write transform.
PTransform<PCollection<Entity>, ?> write =
    DatastoreIO.v1().write().withProjectId(project_id);

p.apply("GetInMemory", Create.of(keyNames)).setCoder(StringUtf8Coder.of()) // L1 and L2 are loaded
 .apply("Proc1", ParDo.of(new DoFn<String, Entity>() {
     @ProcessElement
     public void processElement(ProcessContext c) {
         // Generate an entity key (static imports of DatastoreHelper.makeKey/makeValue).
         Key.Builder key = makeKey("k2", c.element());
         final Entity entity = Entity.newBuilder()
             .setKey(key) // set the key
             .putProperties("p1", makeValue("test constant value")
                 .setExcludeFromIndexes(true).build())
             .build();
         c.output(entity);
     }
 }))
 .apply(write); // write them
p.run();
The full code can be found in my repository at https://github.com/yiu31802/gcp-project/commit/cc224b34
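Since DatastoreIO.v1().write() always upserts, one workaround for insert-only behaviour is to read the existing entities back and drop every candidate whose key already exists before writing. The following is only a sketch under assumptions: candidates stands for your PCollection of possibly-new entities, query for a query over the relevant kind, and the proto key's text form serves as the join key; none of these names come from the original pipeline.

PCollection<Entity> existing = p.apply("ReadExisting",
    DatastoreIO.v1().read().withProjectId(project_id).withQuery(query));

final TupleTag<Entity> newTag = new TupleTag<Entity>() {};
final TupleTag<Entity> oldTag = new TupleTag<Entity>() {};

// Key both collections by the text form of the entity key.
// (In real code, make this a static nested class so it serializes cleanly.)
class KeyByEntityKey extends DoFn<Entity, KV<String, Entity>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(KV.of(c.element().getKey().toString(), c.element()));
    }
}

PCollection<KV<String, Entity>> keyedNew =
    candidates.apply("KeyNew", ParDo.of(new KeyByEntityKey()));
PCollection<KV<String, Entity>> keyedOld =
    existing.apply("KeyOld", ParDo.of(new KeyByEntityKey()));

PCollection<Entity> insertsOnly = KeyedPCollectionTuple
    .of(newTag, keyedNew)
    .and(oldTag, keyedOld)
    .apply("JoinByKey", CoGroupByKey.<String>create())
    .apply("DropExisting", ParDo.of(new DoFn<KV<String, CoGbkResult>, Entity>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            // Emit a candidate only if no stored entity shares its key.
            if (!c.element().getValue().getAll(oldTag).iterator().hasNext()) {
                for (Entity e : c.element().getValue().getAll(newTag)) {
                    c.output(e);
                }
            }
        }
    }));

insertsOnly.apply("WriteNewOnly", DatastoreIO.v1().write().withProjectId(project_id));

Note that this is not transactional: anything written to Datastore between the read and the write can still be overwritten, and the read is itself a scan over the kind (a keys-only query keeps it cheap).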

Related

How to update only a subset of fields and update the repository?

I'm making a Spring Boot application, and I'm looking to update an existing entry in the DB through my service and controller. In my service layer I have the method below. I retrieve the fields associated with a caseID, create a ModelMapper which maps my entity object class to my VO, map the retrieved data onto my DTO, and then save via my repository. The goal is to apply only the fields specified in my request message, i.e. if I only want to update 1 field out of 20, that field is updated and the rest are left untouched. The code below runs successfully, but the field I specify in my request message in Postman does not update in the DB. Why is this? I have tried mapping different objects and saving different variables to the repository, but nothing seems to update the DB.
public StoredOutboundErrorCaseVO updateCase(OutboundErrorCaseVO outboundErrorCaseVO, Long caseNumber) {
    OutboundErrorCaseData existingCaseData = ErrorCaseDataRepository.findById(caseNumber).get();
    ModelMapper mm = new ModelMapper();
    mm.getConfiguration().setAmbiguityIgnored(true);
    OutboundErrorCaseData uiOutboundErrorCaseData =
        mm.map(outboundErrorCaseVO, OutboundErrorCaseData.class);
    mm.map(existingCaseData, uiOutboundErrorCaseData);
    ErrorCaseDataRepository.save(uiOutboundErrorCaseData);
    return mm.map(uiOutboundErrorCaseData, StoredOutboundErrorCaseVO.class);
}
Controller - code omitted for brevity, POST method (I usually use PUT for updates but I believe I can still use POST)
StoredOutboundErrorCaseVO updatedCase =
    outboundErrorService.updateCase(outboundErrorCaseVO, caseNumber);
Repo
@Repository
public interface OutboundErrorCaseDataRepository extends JpaRepository<OutboundErrorCaseData, Long> {
You are fetching the data into existingCaseData but saving uiOutboundErrorCaseData. My guess is that Hibernate is adding a new row to the database with a new id and your updated values. It of course depends on your model definition, especially the id.
I also think Hibernate won't let you save uiOutboundErrorCaseData under the same id if you already have an object in the Hibernate Session associated with that id. So why don't you update existingCaseData with the new values and save it back?
I created a working solution; it can certainly be improved, but it works. The only drawback is that I need to specify every field which can be updated, whereas ideally I want a solution which takes in n fields and updates the record.
OutboundErrorCaseData existingCaseDta = ErrorCaseDataRepository.findById(caseNumber).get();
if (outboundErrorCaseVO.getChannel() != null) {
    existingCaseDta.setChannel(outboundErrorCaseVO.getChannel());
}
ErrorCaseDataRepository.save(existingCaseDta);
ModelMapper mm = new ModelMapper();
return mm.map(existingCaseDta, StoredOutboundErrorCaseVO.class);
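A more generic variant of the same idea, as a sketch only: it assumes ModelMapper 1.1.0+ (for setSkipNullEnabled) and that fields absent from the request arrive as null in the VO; the lower-case errorCaseDataRepository field is a stand-in for however the repository is actually injected. ModelMapper then copies only the non-null fields onto the managed entity, so any number of fields can be updated without listing them:

public StoredOutboundErrorCaseVO updateCase(OutboundErrorCaseVO vo, Long caseNumber) {
    OutboundErrorCaseData existing = errorCaseDataRepository.findById(caseNumber).get();

    ModelMapper mm = new ModelMapper();
    // Skip null source properties: only the fields present in the request
    // overwrite the entity, everything else stays untouched.
    mm.getConfiguration().setSkipNullEnabled(true);
    mm.map(vo, existing); // map onto the existing managed instance

    errorCaseDataRepository.save(existing);
    return mm.map(existing, StoredOutboundErrorCaseVO.class);
}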

HibernateException when updating a collection configured with delete orphan: can't save the parent object

I work on a Java project and I have to write a new module in order to copy some data from one database to another (same tables).
I have an entity Contrat containing several fields, including the following one:
@OneToMany(mappedBy = "contrat", fetch = FetchType.LAZY)
@Fetch(FetchMode.SUBSELECT)
@Cascade({ org.hibernate.annotations.CascadeType.ALL, org.hibernate.annotations.CascadeType.DELETE_ORPHAN })
@BatchSize(size = 50)
private Set<MonElement> elements = new HashSet<MonElement>();
I must read some "Contrat" objects from one database and write them to another.
I hesitate between two solutions:
use JDBC to query the first database, get the objects, and then write those objects into the second database (paying attention to the order and the different keys). It would take a long time.
as the project currently uses Hibernate and contains all the Hibernate mapping classes, open a first session to the first database, read the Hibernate Contrat object, set the ids to null in the child elements, and write the object to the destination database with a second session. It should be quicker.
I wrote a test class for the second approach, and the process fails with the following exception:
org.hibernate.HibernateException: Don't change the reference to a
collection with cascade="all-delete-orphan"
I think the reference must change when I set the ids to null, but I am not sure: I don't understand how changing a field of a collection member can change the collection reference.
Note that if I remove DELETE_ORPHAN from the configuration, everything works: all the objects and their dependencies are written to the database.
So I would like to use the Hibernate solution, which is faster, but I have to keep the DELETE_ORPHAN feature, because the application currently uses it to ensure that every MonElement removed from the elements Set is deleted from the database.
I don't need this feature but cannot remove it.
Also, I need to set the MonElement ids to null in order to generate new ones, because their ids in the first database may already exist in the target database.
Here is the code I wrote, which works well when I remove the DELETE_ORPHAN option.
SessionFactory sessionFactory =
    new AnnotationConfiguration().configure("/hibernate.cfg.src.xml").buildSessionFactory();
Session session = sessionFactory.openSession();

// Search the Contrat object.
Criteria crit = session.createCriteria(Contrat.class);
CriteriaUtil.addEqualCriteria(crit, "column", "65465454");
Contrat contrat = (Contrat) crit.list().get(0);
session.close();

SessionFactory sessionFactoryDest =
    new AnnotationConfiguration().configure("/hibernate.cfg.dest.xml").buildSessionFactory();
Session sessionDest = sessionFactoryDest.openSession();
Transaction transaction = sessionDest.beginTransaction();

// Set the ids to null, also for the elements in the elements Set.
contrat.setId(null);
for (MonElement element : contrat.getElements()) {
    element.setId(null);
}

// Write the object to the destination database.
sessionDest.save(contrat);
transaction.commit();
sessionDest.flush();
sessionDest.close();
This is way faster than managing the queries and the primary/foreign keys and dependencies between objects myself.
Does anyone have an idea how to get rid of this exception?
Or maybe I should change the state of the Set.
In fact I'm not trying to delete any element of this Set; I just want them to be considered as new objects.
If I don't find a solution, I will do something dirty: duplicate all the Hibernate entity objects in my new project and remove the DELETE_ORPHAN parameter in the newly created Contrat.
So the application will continue using its mapping and my new project will use my specific mapping. But I want to avoid that.
Thanks
A correct solution was given by crizzis as a comment on my question. I quote him:
I'd try wrapping contrat.elements in a new collection (contrat.setElements(new HashSet<>(contrat.getElements()))) before trying to persist the contrat with the new session
It works well.
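Applied to the copy routine above, the fix would look like this (a sketch; the session handling around it is unchanged):

// Replace the collection reference itself: Hibernate's delete-orphan
// bookkeeping is tied to the original persistent collection instance.
contrat.setId(null);
contrat.setElements(new HashSet<MonElement>(contrat.getElements()));
for (MonElement element : contrat.getElements()) {
    element.setId(null);
}
sessionDest.save(contrat);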

I want to find a record from a cached list using a key in @Cacheable or any other mechanism

Hi there, I need help; I'm new to caching data. I'm using Ehcache in a Spring application with XML configuration, and I want to use different keys on different methods to find the same record. Suppose one method is annotated like this:
@Cacheable(value = "getCustomerByAreaId", key = "T(java.lang.String).valueOf(#areaid)")
public List<Customer> getCustomerByAreaId(String areaid) {
    // code
    return customerList;
}
It returns all the customers having the same area id, and this list is stored in the cache. Each customer has a unique customer id. Can I use some mechanism to fetch a single customer record from the getCustomerByAreaId cache based on the customer id?
@Cacheable(value = "getCustomerByAreaId", key = "T(java.lang.String).valueOf(#customerId)")
public Customer getCustomerById(long customerId) {
    // code
    return customer;
}
I know that if I make the key like this, it will enter a new record into the getCustomerByAreaId cache under the new key.
Instead I want to fetch the record from the list that is already cached.
If it is possible, can I do this using XML or Java?
I'm using Ehcache version 2.5.
This is not possible simply by using the Ehcache API or Spring's caching abstraction.
If you want to achieve this, you will have to write your own logic to cache not only the list but also its elements individually.
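For example (a sketch only, not tested against the poster's setup: the cache names, the CustomerRepository bean, and constructor injection of Spring's CacheManager are all assumptions), you could populate a per-customer cache as a side effect of loading the list, and fall back to it in getCustomerById:

import java.util.List;
import org.springframework.cache.Cache;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class CustomerService {

    private final CacheManager cacheManager;     // Spring's abstraction over Ehcache
    private final CustomerRepository repository; // assumed data-access bean

    public CustomerService(CacheManager cacheManager, CustomerRepository repository) {
        this.cacheManager = cacheManager;
        this.repository = repository;
    }

    @Cacheable(value = "getCustomerByAreaId", key = "#areaid")
    public List<Customer> getCustomerByAreaId(String areaid) {
        List<Customer> customers = repository.findByAreaId(areaid);
        // Cache each element individually so it can later be found by customer id.
        Cache byId = cacheManager.getCache("customerById");
        for (Customer c : customers) {
            byId.put(c.getId(), c);
        }
        return customers;
    }

    @Cacheable(value = "customerById", key = "#customerId")
    public Customer getCustomerById(long customerId) {
        // Only reached on a cache miss, i.e. when the customer was never
        // part of a previously loaded area list.
        return repository.findById(customerId);
    }
}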

JPA handle merge() of relationship

I have a unidirectional relation Project -> ProjectType:
@Entity
public class Project extends NamedEntity
{
    @ManyToOne(optional = false)
    @JoinColumn(name = "TYPE_ID")
    private ProjectType type;
}

@Entity
public class ProjectType extends Lookup
{
    @Min(0)
    private int progressive = 1;
}
Note that there's no cascade.
Now, when I insert a new Project I need to increment the type progressive.
This is what I'm doing inside an EJB, but I'm not sure it's the best approach:
public void create(Project project)
{
    em.persist(project);
    /* is it necessary to merge the type? */
    ProjectType type = em.merge(project.getType());
    /* is it necessary to set the type again? */
    project.setType(type);
    int progressive = type.getProgressive();
    type.setProgressive(progressive + 1);
    project.setCode(type.getPrefix() + progressive);
}
I'm using EclipseLink 2.6.0, but I'd like to know if there's an implementation-independent best practice, and/or if there are behavioral differences between persistence providers in this specific scenario.
UPDATE
to clarify the context when entering the EJB create method (it is invoked by a JSF @ManagedBean):
project.projectType is DETACHED
project is NEW
no transaction (I'm using JTA/CMT) is active
I am not asking about the difference between persist() and merge(). I'm asking:
whether em.persist(project) automatically "reattaches" project.projectType (I suppose not)
whether the call order is legal: first em.persist(project), then em.merge(projectType), or whether it should be inverted
since em.merge(projectType) returns a different instance, whether it is required to call project.setType(managedProjectType)
An explanation of why this works one way and not another is also welcome.
You need merge(...) only to make a detached (unmanaged) entity managed by your entity manager. Depending on the JPA implementation (not sure about EclipseLink), the instance returned by the merge call may be a different copy of the original object.
MyEntity unmanaged = new MyEntity();
MyEntity managed = entityManager.merge(unmanaged);
assert(entityManager.contains(managed)); // true if everything worked out
assert(managed != unmanaged); // probably true, depending on JPA impl.
If you call merge(entity) where entity is already managed, nothing will happen.
Calling persist(entity) will also make your entity managed, but it returns no copy; the original instance itself becomes managed. It might also call an ID generator (e.g. a sequence), which is not the case when using merge.
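For contrast, a minimal persist sketch (same hypothetical MyEntity as above):

MyEntity entity = new MyEntity();
entityManager.persist(entity);          // no copy is returned
assert(entityManager.contains(entity)); // the very same instance is now managed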
See this answer for more details on the difference between persist and merge.
Here's my proposal:
public void create(Project project) {
    ProjectType type = project.getType(); // maybe check if null
    if (!entityManager.contains(type)) {  // type is not managed
        type = entityManager.merge(type); // or load the type
        project.setType(type);            // update the reference
    }
    int progressive = type.getProgressive();
    type.setProgressive(progressive + 1); // mark as dirty, update on flush

    // Set "code" before persisting "project" ...
    project.setCode(type.getPrefix() + progressive);
    entityManager.persist(project);
    // ... now no additional UPDATE is required after the
    // INSERT on "project".
}
UPDATE
whether em.persist(project) automatically "reattaches" project.projectType (I suppose not)
No. You'll probably get an exception (Hibernate throws one, anyway) stating that you're trying to merge with a transient reference.
Correction: I tested it with Hibernate and got no exception. The project was created with the unmanaged project type (which was managed and then detached before persisting the project). But the project type's progressive was not incremented, as expected, since it wasn't managed. So yes: merge it before persisting the project.
whether the call order is legal: first em.persist(project), then em.merge(projectType), or whether it should be inverted
It's best practice to do so. But when both statements are executed within the same batch (before the entity manager gets flushed) it may even work (merging the type after persisting the project); in my test it worked, anyway. But as I said, it's better to merge the entities before persisting new ones.
since em.merge(projectType) returns a different instance, whether it is required to call project.setType(managedProjectType)
Yes. See example above. A persistence provider may return the same reference, but it isn't required to. So to be sure, call project.setType(mergedType).
Do you need to merge? Well it depends. According to merge() javadoc:
Merge the state of the given entity into the current persistence
context
Where did you get the instance of ProjectType that you attach to your Project from? If that instance is already managed, then all you need to do is just
type.setProgressive(type.getProgressive() + 1)
and JPA will automatically issue an update effective on next context flush.
Otherwise if the type is not managed then you need to merge it first.
Although not directly related, this question has some good insight about persist vs merge: JPA EntityManager: Why use persist() over merge()?
Regarding the call order of em.persist(project) vs em.merge(projectType), you should probably ask yourself what should happen if the type is gone from the database: if you merge the type first, it will get re-inserted; if you persist the project first and you have an FK constraint, the insert will fail (because nothing is cascaded).
In this code, merge() basically stores the record in a different object, so keep the result. Say you have an Account POJO:
Account account = new Account();
account = entityManager.merge(account); // keep the returned managed instance
Then you can work with the returned object.
But in your code you are using merge in a different situation:
public void create(Project project)
{
    em.persist(project);
    /* is it necessary to merge the type? */
    ProjectType type = em.merge(project.getType());
}
Here Project and ProjectType are two different POJOs: merge() returns a managed instance of the same POJO you pass in, so use the returned reference. If there is a relationship between your POJOs, the same applies.

How to create entities in one Entity group?

I am building an app based on google app engine (Java) using JDO for persistence.
Can someone give me an example or point me to some code which shows persisting of multiple entities (of the same type) using javax.jdo.PersistenceManager.makePersistentAll() within a transaction?
Basically I need to understand how to put multiple entities in one entity group so that they can be saved using makePersistentAll() inside a transaction.
This section of the docs deals with exactly that.
I did this:
public static final Key root_key = KeyFactory.createKey("Object", "RootKey");
...
So a typical datastore persistent object sets its id in the constructor instead of getting one automatically:
public DSO_MyType(String name, Key parent)
{
    KeyFactory.Builder b = new KeyFactory.Builder(parent);
    id = b.addChild(DSO_MyType.class.getSimpleName(), name).getKey();
}
and you pass root_key as the parent.
I'm not sure if you can pass different parents to objects of the same kind.
Thanks for the response, Nick.
That document only covers App Engine's implicit handling of entity groups for parent-child relationships. I want to save multiple objects of the same type using PersistenceManager.makePersistentAll(list) within a transaction, and if the objects are not in the same entity group this throws an exception. Currently I do it as below, but I think there must be a better and more appropriate approach -
User u1 = new User("a");
UserDAO.getInstance().addObject(u1);
// UserDAO.addObject uses PersistenceManager.makePersistent() in a transaction, and the user
// object now has its Key set. I want to avoid this step.

User u2 = new User("x");
u2.setKey(KeyFactory.createKey(u1.getKey(), User.class.getSimpleName(), 100 /* some random id */));
User u3 = new User("p");
u3.setKey(KeyFactory.createKey(u1.getKey(), User.class.getSimpleName(), 200));

UserDAO.getInstance().addObjects(Arrays.asList(new User[] { u2, u3 }));
// UserDAO.addObjects uses PersistenceManager.makePersistentAll() in a transaction.
Although this approach works, the problem is that you have to depend on an already-persisted entity to create an entity group.
Gopi, AFAIK you don't have to do that... this should work (haven't tested it):
List<User> userList = new ArrayList<User>();
userList.add(new User("a"));
userList.add(new User("b"));
userList.add(new User("c"));
UserDAO.getInstance().addObjects(userList);
Again, AFAIK, this should put all these objects in the same entity group. I'd love to know if I am wrong.
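For completeness, here is a sketch of the shared-parent approach that avoids depending on a pre-persisted entity, combining the synthetic root key from the earlier answer with explicit key assignment before makePersistentAll(). The PMF helper, the User.setKey accessor, and the hand-picked child ids are assumptions, not tested code:

public void saveUsersInOneGroup() {
    // Synthetic root key: never persisted itself, only used as a common
    // parent so that all the Users land in the same entity group.
    Key parent = KeyFactory.createKey("UserRoot", "root");

    List<User> users = new ArrayList<User>();
    long childId = 1;
    for (String name : Arrays.asList("a", "x", "p")) {
        User u = new User(name);
        u.setKey(KeyFactory.createKey(parent, User.class.getSimpleName(), childId++));
        users.add(u);
    }

    PersistenceManager pm = PMF.get().getPersistenceManager(); // assumed PMF helper
    Transaction tx = pm.currentTransaction();
    try {
        tx.begin();
        pm.makePersistentAll(users); // all keys share a parent, so one transaction is fine
        tx.commit();
    } finally {
        if (tx.isActive()) {
            tx.rollback();
        }
        pm.close();
    }
}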
