I'm testing the part of my Java application where I store data in a MongoDB database. My test setup looks like this:
public class MongoDataStoreTest {

    private MongoClient client;

    @Before
    public void before() throws UnknownHostException {
        this.client = new MongoClient();
    }

    @After
    public void after() throws InterruptedException {
        this.client.dropDatabase("testdb");
        this.client.close();
    }
}
In my tests I execute code which does the following:
I create a DB instance with: DB database = client.getDB("testdb")
I add a collection to the database: database.getCollection("testcoll")
And then I insert a BasicDBObject: collection.insert(object, WriteConcern.SAFE)
Directly after this I query the database using the standard cursor method, roughly as in the sketch below.
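To make the flow concrete, a minimal sketch of such a test might look like this (the document's field name, the test method name, and the statically imported JUnit assertEquals are assumptions; the database, collection, and write concern come from the description above):

@Test
public void insertAndQueryTest() {
    DB database = client.getDB("testdb");
    DBCollection collection = database.getCollection("testcoll");

    // Illustrative document; the real tests insert their own BasicDBObjects
    BasicDBObject object = new BasicDBObject("name", "value");
    collection.insert(object, WriteConcern.SAFE);

    // Query directly afterwards with the standard cursor method
    DBCursor cursor = collection.find();
    try {
        assertEquals(1, cursor.count());
    } finally {
        cursor.close();
    }
}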
As can be seen in my test setup code, after each test I drop the database and close all client connections. I execute ten such tests. When running them locally, everything happens as I expect: the objects are inserted and afterwards the database is dropped for each test (I can see this in the mongo log). However, when executing this on a Jenkins server, it sometimes happens that an object from the previous test is still in the database when I query it, even though that database should have been dropped. This looks like a concurrency problem to me, but I can't see where the race condition is. I have no access to the database log on the Jenkins server. Does anyone know what I should change to make sure my tests always succeed?
Don't drop the database. There might be some internal references to it in Mongo. I don't believe your test case needs the DB to be dropped. Usually it's enough to simply remove all documents in the collections under test.
To clear MongoDB databases our code looks like this:
public void clearData() {
    try {
        for (String collection : datastore.getDB().getCollectionNames()) {
            // We must not mess with system indexes and users as this will cause errors
            if (!collection.startsWith("system.")) {
                // Do not drop the entire database or full collections as this
                // will lead to missing index errors (for no obvious reason).
                datastore.getDB().getCollection(collection).drop();
            }
        }
    } catch (MongoException e) {
        LOG.log(Level.INFO,
                "Could not fetch all collection names - this is a permission thing, but can be ignored");
    }
    // The indexes are not automatically recreated (for no obvious
    // reason) - ensure they are still there after the drop().
    datastore.ensureIndexes();
    datastore.ensureCaps();
}
The problem was caused by the dropDatabase operation.
This operation seems to take longer on the Jenkins server than on my local machine. Since MongoDB doesn't appear to wait until the database is completely dropped, it added the new document to the old (still dropping) database.
To keep my tests as independent as possible I solved the problem by generating a different unique database name for each test.
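A minimal sketch of that fix, assuming a UUID-based naming scheme (the exact scheme is an assumption):

private String dbName;

@Before
public void before() throws UnknownHostException {
    this.client = new MongoClient();
    // Unique database per test, so a slow dropDatabase from a previous test can't interfere
    this.dbName = "testdb_" + UUID.randomUUID().toString().replace("-", "");
}

@After
public void after() {
    this.client.dropDatabase(dbName);
    this.client.close();
}

The tests then use client.getDB(dbName) instead of the hard-coded "testdb".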
So, back again
I have a JHipster-generated project which uses an Elasticsearch Java client embedded in Spring Boot.
I have recently made some major changes to the datasets, since we've been migrating a whole new batch of data from different repositories.
When deploying the application it all works fine: all SearchRepositories are loaded with no problem and all search capabilities run smoothly.
The issues come when running in the test environment. There have been no changes whatsoever to the application-test.yml file or to the Elasticsearch Java config file.
We have some code which updates the indices and I've run it several times; it seems to update the cluster's indices just fine, but where I'm suffering is in the target folder: it just won't create the new indices.
There are 12 indices that I cannot get into the target folder when running in test mode; however, only 5 of them fail in their ResourceIntTest because of the error mentioned in the title.
I don't want to fill this post with hundreds of irrelevant lines of code, so suffice it for now to include the workaround that keeps the tests from failing:
In the initTest of the 5 failing test cases, if I write the following line (obviously changing the class name in each case):
surveyDataQualitySearchRepository.save(surveyDataQualityRepository.findAll());
Then the index will create itself and the test case will not fail. However, this shouldn't need to be done manually; the index should be created when the resetIndex method in the IndexReinitializer class is called upon deployment.
resetIndex:
@PostConstruct
public void resetIndex() {
    long t = currentTimeMillis();
    elasticsearchTemplate.deleteIndex("_all");
    t = currentTimeMillis() - t;
    logger.debug("ElasticSearch indexes reset in {} ms", t);
}
Commenting out this piece of code also allows all indices to be loaded, but it should not be commented out, as it serves as an updater for the indices; besides, it works fine in an old version of the application which still points to the old dataset.
All help will be very welcome. I've been on this for almost a full day now trying to understand where the error is coming from, and I'm more than happy to upload any pieces of code that may be relevant to anyone willing to help.
EDIT: adding the code for the indices rebuild, as requested in the comments.
@Test
public void synchronizeData() throws Exception {
    resetIndex();
    activePharmaIngredientSearchRepository.save(activePharmaIngredientRepository.findAll());
    countrySearchRepository.save(countryRepository.findAll());
    dosageUnitSearchRepository.save(dosageUnitRepository.findAll());
    drugCategorySearchRepository.save(drugCategoryRepository.findAll());
    drugQualityCategorySearchRepository.save(drugQualityCategoryRepository.findAll());
    formulationSearchRepository.save(formulationRepository.findAll());
    innDrugSearchRepository.save(innDrugRepository.findAll());
    locationSearchRepository.save(locationRepository.findAll());
    manufacturerSearchRepository.save(manufacturerRepository.findAll());
    outletTypeSearchRepository.save(outletTypeRepository.findAll());
    publicationSearchRepository.save(publicationRepository.findAll());
    publicationTypeSearchRepository.save(publicationTypeRepository.findAll());
    qualityReferenceSearchRepository.save(qualityReferenceRepository.findAll());
    reportQualityAssessmentAssaySearchRepository.save(reportQualityAssessmentAssayRepository.findAll());
    //rqaaQualitySearchRepository.save(rqaaQualityRepository.findAll());
    rqaaTechniqueSearchRepository.save(rqaaTechniqueRepository.findAll());
    samplingTypeSearchRepository.save(samplingTypeRepository.findAll());
    //surveyDataQualitySearchRepository.save(surveyDataQualityRepository.findAll());
    surveyDataSearchRepository.save(surveyDataRepository.findAll());
    techniqueSearchRepository.save(techniqueRepository.findAll());
    tradeDrugApiSearchRepository.save(tradeDrugApiRepository.findAll());
    tradeDrugSearchRepository.save(tradeDrugRepository.findAll());
    publicationDrugTypesSearchRepository.save(publicationDrugTypesRepository.findAll());
    wrongApiSearchRepository.save(wrongApiRepository.findAll());
}

private void resetIndex() {
    long t = currentTimeMillis();
    elasticsearchTemplate.deleteIndex("_all");
    t = currentTimeMillis() - t;
    logger.debug("ElasticSearch indexes reset in {} ms", t);
}
Please try updating to the latest version of spring-data-elasticsearch.
I'm trying to deal with some code that runs differently in Spark stand-alone mode and on a Spark cluster. Basically, for each item in an RDD I'm trying to add it to a list, and once this is done, I want to send this list to Solr.
This works perfectly fine when I run the following code in stand-alone mode, but it does not work when the same code is run on a cluster. When I run it on a cluster, it is as if the "send to Solr" part of the code is executed before the list to be sent to Solr has been filled with items. I tried to force execution with solrInputDocumentJavaRDD.collect() after the foreach, but it seems to have no effect.
// For each RDD
solrInputDocumentJavaDStream.foreachRDD(
    new Function<JavaRDD<SolrInputDocument>, Void>() {
        @Override
        public Void call(JavaRDD<SolrInputDocument> solrInputDocumentJavaRDD) throws Exception {
            // For each item in a single RDD
            solrInputDocumentJavaRDD.foreach(
                new VoidFunction<SolrInputDocument>() {
                    @Override
                    public void call(SolrInputDocument solrInputDocument) {
                        // Add the solrInputDocument to the list of SolrInputDocuments
                        SolrIndexerDriver.solrInputDocumentList.add(solrInputDocument);
                    }
                });

            // Try to force execution
            solrInputDocumentJavaRDD.collect();

            // After having finished adding every SolrInputDocument to the list,
            // add it to the solrServer and commit, waiting for the commit to be flushed
            try {
                if (SolrIndexerDriver.solrInputDocumentList != null
                        && SolrIndexerDriver.solrInputDocumentList.size() > 0) {
                    SolrIndexerDriver.solrServer.add(SolrIndexerDriver.solrInputDocumentList);
                    SolrIndexerDriver.solrServer.commit(true, true);
                    SolrIndexerDriver.solrInputDocumentList.clear();
                }
            } catch (SolrServerException | IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }
);
What should I do so that the send-to-Solr part executes after the list of SolrInputDocuments has been filled (and also works in cluster mode)?
As I mentioned on the Spark Mailing list:
I'm not familiar with the Solr API, but provided that SolrIndexerDriver is a singleton, I guess that what's going on when running on a cluster is that the call to:
SolrIndexerDriver.solrInputDocumentList.add(elem)
is happening on different singleton instances of SolrIndexerDriver on different JVMs, while
SolrIndexerDriver.solrServer.commit
is happening on the driver.
In practical terms, the lists on the executors are being filled in but never committed, while on the driver the opposite is happening.
The recommended way to handle this is to use foreachPartition, like this:
rdd.foreachPartition { iter =>
    // prepare connection
    Stuff.connect(...)
    // add elements
    iter.foreach(elem => Stuff.add(elem))
    // submit
    Stuff.commit()
}
This way you can add the data of each partition and commit the results in the local context of each executor. Be aware that this add/commit must be thread-safe in order to avoid data loss or corruption.
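In the Java API used in the question, the same pattern might look roughly like the sketch below. The Solr URL, the HttpSolrServer client, and building the client inside each partition are assumptions; in practice you would reuse or pool the client rather than creating one per partition.

solrInputDocumentJavaDStream.foreachRDD(
    new Function<JavaRDD<SolrInputDocument>, Void>() {
        @Override
        public Void call(JavaRDD<SolrInputDocument> rdd) throws Exception {
            rdd.foreachPartition(
                new VoidFunction<Iterator<SolrInputDocument>>() {
                    @Override
                    public void call(Iterator<SolrInputDocument> docs) throws Exception {
                        // Create (or look up) the Solr client locally on the executor
                        SolrServer solrServer = new HttpSolrServer("http://solr-host:8983/solr/collection1");
                        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                        while (docs.hasNext()) {
                            batch.add(docs.next());
                        }
                        if (!batch.isEmpty()) {
                            solrServer.add(batch);
                            solrServer.commit(true, true);
                        }
                    }
                });
            return null;
        }
    });

This keeps both the add and the commit on the executor that owns the partition, so no shared driver-side list is needed.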
Have you checked the Spark UI to see the execution plan of this job?
Check how it is split into stages and what their dependencies are. That should hopefully give you an idea.
I have to serialize some specific properties (about ten film properties) for a set of 1500 entities from DBpedia. For each entity I run a SPARQL query to retrieve them, and for each ResultSet I store all the data in a TDB dataset using the default Apache Jena TDB API. I create a single statement for each property and add it using this code:
public void addSolution(QuerySolution currSolution, String subjectURI) {
    if (isWriteMode) {
        Resource currResource = datasetModel.createResource(subjectURI);
        Property prop = datasetModel.createProperty(currSolution.getResource("?prop").toString());
        Statement stat = datasetModel.createStatement(currResource, prop, currSolution.get("?value").toString());
        datasetModel.add(stat);
    }
}
What should I do to execute multiple add operations on a single dataset? What strategy should I use?
EDIT:
I'm able to execute all the code without errors, but no files are created by the TDBFactory. Why does this happen?
I think that I need Joshua Taylor's help
It sounds like the query is running against the remote DBpedia endpoint. Assuming that's correct, you can do a couple of things.
Firstly wrap the update in a transaction:
dataset.begin(ReadWrite.WRITE);
try {
    for (QuerySolution currSolution : results) {
        addSolution(...);
    }
    dataset.commit();
} finally {
    dataset.end();
}
Secondly, you might be able to save yourself some work by using CONSTRUCT to get a model back, rather than looping through the results. I'm not clear what's going on with subjectURI, but it might be as simple as:
CONSTRUCT { <subjectURI> ?prop ?value }
WHERE
... existing query body ...
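A hedged sketch of executing that CONSTRUCT against the remote endpoint and adding the result in one transaction; the DBpedia endpoint URL is an assumption, and the query body is left elided as above:

String queryString =
        "CONSTRUCT { <" + subjectURI + "> ?prop ?value } " +
        "WHERE { ... existing query body ... }";
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://dbpedia.org/sparql", queryString);
try {
    // Run the CONSTRUCT remotely and get a Model back
    Model constructed = qexec.execConstruct();
    dataset.begin(ReadWrite.WRITE);
    try {
        dataset.getDefaultModel().add(constructed);
        dataset.commit();
    } finally {
        dataset.end();
    }
} finally {
    qexec.close();
}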
I've solved my problem and I want to describe it here for anyone who runs into the same thing.
For each transaction you perform, you need to re-obtain the dataset model rather than reusing the same model across all transactions.
So for each transaction that you start, you need to obtain the dataset model just after the call to begin(), as in the sketch below.
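A minimal sketch of what that looks like, based on the code from the question (the surrounding write loop and variable names are taken from the question, not verified against the real project):

dataset.begin(ReadWrite.WRITE);
try {
    // Re-obtain the model after begin(); do not reuse a model cached before the transaction
    Model datasetModel = dataset.getDefaultModel();
    Resource currResource = datasetModel.createResource(subjectURI);
    Property prop = datasetModel.createProperty(currSolution.getResource("?prop").toString());
    datasetModel.add(datasetModel.createStatement(currResource, prop, currSolution.get("?value").toString()));
    dataset.commit();
} finally {
    dataset.end();
}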
I hope that will be helpful.
I have an instance of org.hibernate.envers.entities.mapper.relation.lazy.proxy.ListProxy that is causing some grief: whenever I programmatically try to access it I get a NullPointerException (e.g. when calling list.size()), but when I first inspect the object using Eclipse's variable inspector I see Hibernate generate a SQL statement and the list changes dynamically. After that everything works. How can I do the same thing programmatically? I've tried list.toString() but that doesn't seem to help.
Update 1
Don't know if this helps, but when I first click on the list instance I see the following in the display:
com.sun.jdi.InvocationException occurred invoking method.
Then the database query runs, and when I click again I get the correct .toString() result.
Update 2
Here is the original exception I get (when I don't inspect the element in debug mode).
java.lang.NullPointerException
at org.hibernate.envers.query.impl.EntitiesAtRevisionQuery.list(EntitiesAtRevisionQuery.java:72)
at org.hibernate.envers.query.impl.AbstractAuditQuery.getSingleResult(AbstractAuditQuery.java:104)
at org.hibernate.envers.entities.mapper.relation.OneToOneNotOwningMapper.mapToEntityFromMap(OneToOneNotOwningMapper.java:74)
at org.hibernate.envers.entities.mapper.MultiPropertyMapper.mapToEntityFromMap(MultiPropertyMapper.java:118)
at org.hibernate.envers.entities.EntityInstantiator.createInstanceFromVersionsEntity(EntityInstantiator.java:93)
at org.hibernate.envers.entities.mapper.relation.component.MiddleRelatedComponentMapper.mapToObjectFromFullMap(MiddleRelatedComponentMapper.java:44)
at org.hibernate.envers.entities.mapper.relation.lazy.initializor.ListCollectionInitializor.addToCollection(ListCollectionInitializor.java:67)
at org.hibernate.envers.entities.mapper.relation.lazy.initializor.ListCollectionInitializor.addToCollection(ListCollectionInitializor.java:39)
at org.hibernate.envers.entities.mapper.relation.lazy.initializor.AbstractCollectionInitializor.initialize(AbstractCollectionInitializor.java:67)
at org.hibernate.envers.entities.mapper.relation.lazy.proxy.CollectionProxy.checkInit(CollectionProxy.java:50)
at org.hibernate.envers.entities.mapper.relation.lazy.proxy.CollectionProxy.size(CollectionProxy.java:55)
at <MY CODE HERE, which checks list.size()>
Final Solution (Actually more of a temporary hack)
boolean worked = false;
while (!worked) {
    try {
        if (list.size() == 1) {
            // do stuff
        }
        worked = true;
    } catch (Exception e) {
        // TODO: exception must be accessed or the loop will be infinite
        e.getStackTrace();
    }
}
Well, what happens there is that you're seeing Hibernate's lazy loading in action :)
Basically, Hibernate loads proxy classes for your lazily associated relations, so that instead of a List of class C you get a List (actually a PersistentBag implementation) of Hibernate autogenerated proxies for your C class. This is Hibernate's way of deferring the load of that association's values until they are actually accessed. That's why, when you access it in the Eclipse debugger (which basically accesses an instance's fields/methods via introspection), you see the SQL Hibernate triggers to fetch the needed data.
The trick here is that depending on WHEN you access a lazy collection you might get different results. If you access it using the Eclipse debugger, you're most likely still inside the Hibernate session that started loading that thing, so everything works as expected: SQL is (lazily) triggered when the collection is accessed and the data is loaded. The problem is that if you want to access that same data in your code, but at a point where the session is already closed, you'll either get a LazyInitializationException or null (the latter if you're using some library for cleaning up Hibernate proxies, such as Gilead).
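One common way around this is to touch (initialize) the collection while the session is still open, for example inside a transactional service method, and only hand a fully loaded list to the caller. A hedged sketch, where the entity and repository names and the Spring @Transactional setup are assumptions:

@Transactional(readOnly = true)
public List<MyChild> loadChildrenAtRevision(Long id, Number revision) {
    MyEntity entity = auditReader.find(MyEntity.class, id, revision);
    List<MyChild> children = entity.getChildren();
    // Touch the proxy while the session is still open so Envers/Hibernate can run its query now
    children.size();
    // Optionally copy into a plain list so callers never hit the proxy after the session closes
    return new ArrayList<MyChild>(children);
}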
I am updating multiple records in a database. Whenever the UI sends the list of records to be updated, I have to update those records in the database. I am using the JDBC template for that.
Earlier Case
Earlier, whenever I got records from the UI, I just did:
jdbcTemplate.batchUpdate(query, List<Object[]> params)
Whenever there was an exception, I used to roll back the whole transaction.
(Update: Is batchUpdate multi-threaded, or faster than individual updates in some other way?)
Later Case
But later the requirement changed: whenever there is an exception, I should know which records failed to update, so I have to send those records back to the UI with the reason why they failed.
So I had to do something similar to this:
for (Record record : RecordList) {
    try {
        jdbcTemplate.update(sql, param); // param is the Object[] built from record
    } catch (Exception ex) {
        record.setReason("Exception : " + ex.getMessage());
        continue;
    }
}
So am I doing this the right way, by using the loop?
If yes, can someone suggest how to make it multi-threaded?
Or is there anything wrong with this approach?
To be honest, I was hesitant to use a try-catch block inside the loop :(
Please correct me; I really need to learn a better way because I feel there must be one. Thanks.
Turn every update operation into a Callable, collect them, and send the collection to a java.util.concurrent.ThreadPoolExecutor; the pool is multithreaded.
Make the Callable:
class UpdateTask implements Callable<Exception> {
    // constructor with jdbcTemplate, sql, param goes here.

    @Override
    public Exception call() throws Exception {
        try {
            jdbcTemplate.update(sql, param);
        } catch (Exception ex) {
            return ex;
        }
        return null;
    }
}
Invoke the tasks:
<T> List<Future<T>> java.util.concurrent.ExecutorService.invokeAll(Collection<? extends Callable<T>> tasks) throws InterruptedException
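A hedged usage sketch (the toParams() helper that builds the Object[] for each record is hypothetical, and the enclosing method is assumed to handle InterruptedException/ExecutionException):

ExecutorService pool = Executors.newFixedThreadPool(4);
List<Callable<Exception>> tasks = new ArrayList<Callable<Exception>>();
for (Record record : RecordList) {
    tasks.add(new UpdateTask(jdbcTemplate, sql, record.toParams()));
}
List<Future<Exception>> results = pool.invokeAll(tasks); // blocks until every task has finished
for (int i = 0; i < results.size(); i++) {
    Exception ex = results.get(i).get(); // null means that update succeeded
    if (ex != null) {
        RecordList.get(i).setReason("Exception : " + ex.getMessage());
    }
}
pool.shutdown();

JdbcTemplate itself is thread-safe, so sharing one instance across the tasks is fine.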
Your case looks like you need to do the validation in Java, filter out only the valid data, and send that to the database for updating.
BO layer
-> Filter out the valid records.
-> Invalid records should be sent back with some validation text.
DAO layer
-> Batch-update your RecordList.
This will give you the best performance; see the sketch below.
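A hedged sketch of that approach (isValid() and toParams() are hypothetical helpers; the record and variable names follow the question):

List<Record> validRecords = new ArrayList<Record>();
List<Record> invalidRecords = new ArrayList<Record>();
for (Record record : RecordList) {
    if (isValid(record)) {
        validRecords.add(record);
    } else {
        record.setReason("Validation failed"); // send these back to the UI
        invalidRecords.add(record);
    }
}
List<Object[]> params = new ArrayList<Object[]>();
for (Record record : validRecords) {
    params.add(record.toParams());
}
// Single batched round trip for everything that passed validation
jdbcTemplate.batchUpdate(sql, params);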
Never use database insert exceptions as a validation mechanism:
Exceptions are costly, as the stack trace has to be created.
Getting a database connection is another costly process and takes time.
A Java if-else check will run much faster for the same validation.