My project requires a service that repeatedly migrates data between two tables in two different databases. I have implemented this with Hibernate: a service fetches data from the primary database table and then migrates it to the second database table. However, the primary table has over 200,000 rows, and iterating over them takes quite a long time to complete.
What should I use to speed up the process?
Here is the Service code:
@Service
public class TrmInCardClientService {

    @Autowired
    TrmInCardClientDBRepository trmInCardClientDBRepository;

    @Autowired
    TrmInCardClientUKMRepository trmInCardClientUKMRepository;

    private Logger log = Logger.getLogger(TrmInCardClientService.class.getName());

    public TrmInCardClientService() {
    }

    @Scheduled(fixedRate = 300000)
    public void updatelist() {
        this.log.info("TrmInCardClient data transfer start");
        try {
            Iterable<TrmInCardClientUKM> trmInCardClientUKMS = this.trmInCardClientUKMRepository.findByDeleted(0);
            List<TrmInCardClientUKM> trmInCardClientUKMList = new ArrayList<>();
            List<TrmInCardClientDB> trmInCardClientDBList = new ArrayList<>();
            for (TrmInCardClientUKM cardClientUKM : trmInCardClientUKMS) {
                trmInCardClientUKMList.add(cardClientUKM);
                trmInCardClientDBList.add(new TrmInCardClientDB(cardClientUKM.getCard(), cardClientUKM.getClient(),
                        cardClientUKM.getDeleted(), cardClientUKM.getGlobal_id(), cardClientUKM.getVersion()));
            }
            this.trmInCardClientDBRepository.saveAll(trmInCardClientDBList);
            this.log.info("TrmInCardClient data transfer end");
        } catch (Exception e) {
            this.log.warning("Error encountered during TrmInCardClient data migration");
        }
    }
}
Related
I have fairly large CSV files which I need to parse and then persist into PostgreSQL. For example, one file contains 2_070_000 records which I was able to parse and persist in ~8 minutes (single thread). Is it possible to persist them using multiple threads?
public void importCsv(MultipartFile csvFile, Class<T> targetClass) {
    final var headerMapping = getHeaderMapping(targetClass);
    File tempFile = null;
    try {
        final var randomUuid = UUID.randomUUID().toString();
        tempFile = File.createTempFile("data-" + randomUuid, "csv");
        csvFile.transferTo(tempFile);
        final var csvFileName = csvFile.getOriginalFilename();
        final var csvReader = new BufferedReader(new FileReader(tempFile, StandardCharsets.UTF_8));
        Stopwatch stopWatch = Stopwatch.createStarted();
        log.info("Starting to import {}", csvFileName);
        final var csvRecords = CSVFormat.DEFAULT
                .withDelimiter(';')
                .withHeader(headerMapping.keySet().toArray(String[]::new))
                .withSkipHeaderRecord(true)
                .parse(csvReader);
        final var models = StreamSupport.stream(csvRecords.spliterator(), true)
                .map(record -> parseRecord(record, headerMapping, targetClass))
                .collect(Collectors.toUnmodifiableList());
        // How to save such a large list?
        log.info("Finished import of {} in {}", csvFileName, stopWatch);
    } catch (IOException ex) {
        ex.printStackTrace();
    } finally {
        tempFile.delete();
    }
}
models contains a lot of records. The parsing into records is done using a parallel stream, so it's quite fast. I'm afraid to call SimpleJpaRepository.saveAll, because I'm not sure what it will do under the hood.
The question is: What is the most efficient way to persist such a large list of entities?
P.S.: Any other improvements are greatly appreciated.
You have to use batch inserts.
Create an interface for a custom repository SomeRepositoryCustom
public interface SomeRepositoryCustom {
    void batchSave(List<Record> records);
}
Create an implementation of SomeRepositoryCustom
@Repository
class SomesRepositoryCustomImpl implements SomeRepositoryCustom {

    private JdbcTemplate template;

    @Autowired
    public SomesRepositoryCustomImpl(JdbcTemplate template) {
        this.template = template;
    }

    @Override
    public void batchSave(List<Record> records) {
        final String sql = "INSERT INTO RECORDS(column_a, column_b) VALUES (?, ?)";
        template.execute(sql, (PreparedStatementCallback<Void>) ps -> {
            for (Record record : records) {
                ps.setString(1, record.getA());
                ps.setString(2, record.getB());
                ps.addBatch();
            }
            ps.executeBatch();
            return null;
        });
    }
}
Extend your JpaRepository with SomeRepositoryCustom
@Repository
public interface SomeRepository extends JpaRepository<Record, Long>, SomeRepositoryCustom {
    // type parameters are the entity and its id type (Long is just an example here)
}
To save:
someRepository.batchSave(records);
Notes
Keep in mind that even if you use batch inserts, the database driver may not actually apply them. For example, for MySQL it is necessary to add the parameter rewriteBatchedStatements=true to the database URL.
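For example, a MySQL connection URL with that flag could look like this (host, port and database name are placeholders):

jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true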
So it is better to enable SQL logging on the driver side (not Hibernate) to verify that batching actually happens. It can also be useful to step through the driver code.
You will also need to decide whether to split the records into chunks inside the loop
for (Record record : records) {
}
The driver may do this for you, in which case you don't need to, but it is better to verify this as well.
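If you do split the records yourself, a minimal sketch of the same batchSave with a chunk size of 1000 (the size is an arbitrary example) could look like this:

private static final int BATCH_SIZE = 1000;

@Override
public void batchSave(List<Record> records) {
    final String sql = "INSERT INTO RECORDS(column_a, column_b) VALUES (?, ?)";
    template.execute(sql, (PreparedStatementCallback<Void>) ps -> {
        int count = 0;
        for (Record record : records) {
            ps.setString(1, record.getA());
            ps.setString(2, record.getB());
            ps.addBatch();
            if (++count % BATCH_SIZE == 0) {
                ps.executeBatch();   // send a full chunk to the database
            }
        }
        ps.executeBatch();           // flush the remaining rows
        return null;
    });
}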
P.S.: Don't use var everywhere.
I want to create an AWS Lambda function in Java that writes to a database in Firestore. The short story is that, while the code does what it should when I execute it on my own computer using NetBeans (it works most of the time, but not always, maybe due to problems with my internet connection), nothing at all happens when I deploy it as a Lambda function and invoke it. I suspect that this has less to do with Firestore itself than with how AWS Lambda handles asynchronous operations.
Now to the details!
As a simple example, the method that writes to the Firestore object db reads
public static void writeFirestore(Firestore db) {
    try {
        DateTime now = DateTime.now();
        String time = now.toString();
        Map<String, String> data = new HashMap<>();
        data.put("time", time);
        String collTitle = "Notebook";
        String docTitle = "Document: " + time;
        db.collection(collTitle).document(docTitle).set(data);
        System.out.println("wrote to Firestore");
    } catch (Exception e) {
        System.out.println("Could not write to db: " + e.toString());
    }
}
Now, as it takes some time to connect to Firestore and initialize db, I want to make sure that db is not passed as an argument into writeFirestore() before it has been properly retrieved. So, I define a version of db in the form of a Future object, using ExecutorService, and then retrieve the object db with the get() method. For this, I define the class TaskRunner:
public class TaskRunner {

    ExecutorService executor;

    public TaskRunner() {
        executor = Executors.newSingleThreadExecutor();
    }

    public static interface Callback<T> {
        public void onCallback(T result);
    }

    public <T> void executeAsync(Callable<T> callable, Callback<T> callback) throws Exception {
        try {
            Future<T> future = executor.submit(callable);
            T result = future.get();
            if (result != null) {
                System.out.println("result is not null; applying callback...");
                callback.onCallback(result);
            } else {
                System.out.println("result is null");
            }
        } catch (Exception e) {
            System.out.println("Problem running executeAsync: " + e.toString());
        }
    }
}
Writing the example document to my fixed database db now goes as follows:
I define the class FirestoreCreator that implements Callable with the purpose of retrieving the Firestore object db:
public static class FirestoreCreator implements Callable<Firestore> {
    @Override
    public Firestore call() throws Exception {
        String projectId = "myProjectId";
        GoogleCredentials credentials =
                GoogleCredentials.fromStream(new FileInputStream("myCredentialsFile.json"));
        FirestoreOptions firestoreOptions = FirestoreOptions.getDefaultInstance()
                .toBuilder()
                .setProjectId(projectId)
                .setCredentials(credentials)
                .build();
        Firestore db = firestoreOptions.getService();
        return db;
    }
}
I implement the TaskRunner.Callback interface using writeFirestore().
I create a TaskRunner object, taskRunner, and call its executeAsync() method with the above two objects as parameters.
These three steps are collected in the final method testUpdateFirestoreInterface() that does the job:
public static void testUpdateFirestoreInterface() {
    FirestoreCreator fsCreator = new FirestoreCreator();
    TaskRunner.Callback<Firestore> updateCallback = new TaskRunner.Callback<Firestore>() {
        @Override
        public void onCallback(Firestore result) {
            writeFirestore(result);
        }
    };
    TaskRunner taskRunner = new TaskRunner();
    try {
        taskRunner.executeAsync(fsCreator, updateCallback);
    } catch (Exception ex) {
        System.out.println("Failed to run executeAsync");
    }
}
As I already mentioned in the introduction, the code works (most of the time) when I run it on my computer, but not at all in AWS Lambda. No exception is thrown, and yet no document is written to Firestore.
The discussion about threads in AWS Lambda (https://dzone.com/articles/multi-threaded-programming-with-aws-lambda) made me suspect that the reason is that a thread started via the ExecutorService is not being handled properly.
Does anyone know what goes wrong and what a solution could look like?
I have looked at a lot of material on the internet, but I haven't found a solution for my needs.
Here is some sample code which doesn't work, but shows my requirements for better understanding.
@Service
public class FooCachedService {

    @Autowired
    private MyDataRepository dataRepository;

    private static ConcurrentHashMap<Long, Object> cache = new ConcurrentHashMap<>();

    public void save(Data data) {
        Data savedData = dataRepository.save(data);
        if (savedData.getId() != null) {
            cache.put(data.getRecipient(), null);
        }
    }

    public Data load(Long recipient) {
        Data result = null;
        if (!cache.containsKey(recipient)) {
            result = dataRepository.findDataByRecipient(recipient);
            if (result != null) {
                cache.remove(recipient);
                return result;
            }
        }
        while (true) {
            try {
                if (cache.containsKey(recipient)) {
                    result = dataRepository.findDataByRecipient(recipient);
                    break;
                }
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        return result;
    }
}
and data object:
public class Data {
    private Long id;
    private Long recipient;
    private String payload;
    // getters and setters
}
As you can see in the code above, I need to implement a service that stores new data into the database and into a cache as well.
The whole algorithm should look something like this:
Some user A creates a POST request to my controller to store data, and it fires the save method of my service.
Another user B, logged in to the system, sends a GET request to my controller, which fires the load method of my service. In this method, the id of the logged-in user who sent the request is compared with the recipients' ids in the map. If the map contains data for this user, they are fetched with the repository; otherwise the algorithm checks every second whether there is new data for that user (this checking has some timeout, for example 30 s; after 30 s the request returns empty data, the user creates a new GET request, and so on).
Can you tell me whether it is possible to do this in some elegant way, and how? How should the cache be used for this, or what is the best practice? I am new to this area, so I will be grateful for any advice.
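A minimal sketch of the polling-with-timeout load described above, reusing the cache as a "new data" marker (the 30 s limit comes from the description; note that ConcurrentHashMap does not accept null values, so save() would have to put some non-null marker):

public Data load(Long recipient) {
    long deadline = System.currentTimeMillis() + 30_000;   // 30 s timeout from the description
    while (System.currentTimeMillis() < deadline) {
        if (cache.containsKey(recipient)) {                 // save() has flagged new data for this recipient
            cache.remove(recipient);
            return dataRepository.findDataByRecipient(recipient);
        }
        try {
            Thread.sleep(1000);                             // re-check every second
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
    return null;                                            // nothing arrived within the timeout: empty response
}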
I have written a RESTful API using Apache Jersey. I am using MongoDB as my backend. I used Morphia (v1.3.4) to map and persist POJOs to the database. I tried to follow "1 application, 1 connection" in my API, as recommended everywhere, but I am not sure I succeeded. I run my API in Tomcat 8. I also ran mongostat to see the details and connections. At start, mongostat showed 1 connection to the MongoDB server. I tested my API using Postman and it was working fine. I then created a load test in SoapUI where I simulated 100 users per second. I saw the update in mongostat: there were 103 connections. Here is the gif which shows this behaviour.
I am not sure why there are so many connections. The interesting fact is that the number of Mongo connections is directly proportional to the number of users I create in SoapUI. Why is that? I found other similar questions, but I think I have implemented their suggestions:
Mongo connection leak with morphia
Spring data mongodb not closing mongodb connections
My code looks like this.
DatabaseConnection.java
// Some imports
public class DatabaseConnection {

    private static volatile MongoClient instance;
    private static String cloudhost = "localhost";

    private DatabaseConnection() { }

    public synchronized static MongoClient getMongoClient() {
        if (instance == null) {
            synchronized (DatabaseConnection.class) {
                if (instance == null) {
                    ServerAddress addr = new ServerAddress(cloudhost, 27017);
                    List<MongoCredential> credentialsList = new ArrayList<MongoCredential>();
                    MongoCredential credentia = MongoCredential.createCredential(
                            "test", "test", "test".toCharArray());
                    credentialsList.add(credentia);
                    instance = new MongoClient(addr, credentialsList);
                }
            }
        }
        return instance;
    }
}
PourService.java
@Secured
@Path("pours")
public class PourService {

    final static Logger logger = Logger.getLogger(Pour.class);
    private static final int POUR_SIZE = 30;

    @POST
    @Consumes(MediaType.APPLICATION_JSON)
    @Produces(MediaType.APPLICATION_JSON)
    public Response createPour(String request) {
        WebApiResponse response = new WebApiResponse();
        Gson gson = new GsonBuilder().setDateFormat("dd/MM/yyyy HH:mm:ss").create();
        String message = "Pour was not created.";
        HashMap<String, Object> data = null;
        try {
            Pour pour = gson.fromJson(request, Pour.class);
            // Storing the pour
            PourRepository pourRepository = new PourRepository();
            String id = pourRepository.createPour(pour);
            data = new HashMap<String, Object>();
            if ("" != id && null != id) {
                data.put("id", id);
                message = "Pour was created successfully.";
                logger.debug(message);
                return response.build(true, message, data, 200);
            }
            logger.debug(message);
            return response.build(false, message, data, 500);
        } catch (Exception e) {
            message = "Error while creating Pour.";
            logger.error(message, e);
            return response.build(false, message, new Object(), 500);
        }
    }
}
PourDao.java
public class PourDao extends BasicDAO<Pour, String> {
    public PourDao(Class<Pour> entityClass, Datastore ds) {
        super(entityClass, ds);
    }
}
PourRepository.java
public class PourRepository {

    private PourDao pourDao;
    final static Logger logger = Logger.getLogger(PourRepository.class);

    public PourRepository() {
        try {
            MongoClient mongoClient = DatabaseConnection.getMongoClient();
            Datastore ds = new Morphia().map(Pour.class)
                    .createDatastore(mongoClient, "tilt45");
            pourDao = new PourDao(Pour.class, ds);
        } catch (Exception e) {
            logger.error("Error while creating PourDao", e);
        }
    }

    public String createPour(Pour pour) {
        try {
            return pourDao.save(pour).getId().toString();
        } catch (Exception e) {
            logger.error("Error while creating Pour.", e);
            return null;
        }
    }
}
When I work with Mongo + Morphia, I get better results using a factory pattern for the Datastore rather than for the MongoClient. For instance, check the following class:
public class DatastoreFactory {
    private final Datastore datastore;

    public DatastoreFactory(String dbHost, int dbPort, String dbName) {
        final Morphia morphia = new Morphia();
        MongoClientOptions.Builder options = MongoClientOptions.builder().socketKeepAlive(true);
        morphia.getMapper().getOptions().setStoreEmpties(true);
        final Datastore store = morphia.createDatastore(new MongoClient(new ServerAddress(dbHost, dbPort), options.build()), dbName);
        store.ensureIndexes();
        this.datastore = store;
    }

    public Datastore getDatastore() {
        return datastore;
    }
}
With that approach, every time you need a Datastore you can use the one provided by the factory. Of course, this can be implemented better if you use a framework/library that supports the factory pattern (e.g. HK2 with org.glassfish.hk2.api.Factory), together with singleton binding.
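For illustration only, a rough sketch of that HK2 wiring, assuming Jersey 2 with the HK2 AbstractBinder (the factory and binder class names are made up for this example; host, port and database name are taken from the question's code):

import javax.inject.Singleton;
import org.glassfish.hk2.api.Factory;
import org.glassfish.hk2.utilities.binding.AbstractBinder;
import org.mongodb.morphia.Datastore;

public class MorphiaDatastoreFactory implements Factory<Datastore> {

    @Override
    public Datastore provide() {
        // Reuses the DatastoreFactory shown above; HK2 decides when this is called.
        return new DatastoreFactory("localhost", 27017, "tilt45").getDatastore();
    }

    @Override
    public void dispose(Datastore instance) {
        // Nothing to dispose here; the underlying MongoClient could be closed on shutdown instead.
    }
}

// Registered once (e.g. in the Jersey ResourceConfig) so the same Datastore is injected everywhere:
class AppBinder extends AbstractBinder {
    @Override
    protected void configure() {
        bindFactory(MorphiaDatastoreFactory.class).to(Datastore.class).in(Singleton.class);
    }
}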
Besides, you can check the documentation of the MongoClientOptions builder; perhaps you can find better connection control options there.
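For example, the connection pool can be tuned through that builder, roughly like this (a sketch with arbitrary placeholder values, not recommendations):

MongoClientOptions options = MongoClientOptions.builder()
        .connectionsPerHost(50)          // upper bound of the pool per host
        .minConnectionsPerHost(5)        // keep a few connections warm
        .maxConnectionIdleTime(60_000)   // close idle connections after 60 s
        .socketKeepAlive(true)
        .build();
MongoClient mongoClient = new MongoClient(new ServerAddress("localhost", 27017), options);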
In my web application I'm using stateless sessions with Hibernate to get better performance for my inserts and updates.
It was working fine with the H2 database (the one used in Play Framework in dev mode).
But when I test it with MySQL I get the following exception:
ERROR ~ Lock wait timeout exceeded; try restarting transaction
ERROR ~ HHH000315: Exception executing batch [Lock wait timeout exceeded; try restarting transaction]
Here is the code:
public static void update() {
    Session session = (Session) JPA.em().getDelegate();
    StatelessSession stateless = session.getSessionFactory().openStatelessSession();
    try {
        stateless.beginTransaction();
        // Fetch all products
        {
            List<ProductType> list = ProductType.retrieveAllWithHistory();
            for (ProductType pt : list) {
                updatePrice(pt, stateless);
            }
        }
        // Fetch all raw materials
        {
            List<RawMaterialType> list = RawMaterialType.retrieveAllWithHistory();
            for (RawMaterialType rm : list) {
                updatePrice(rm, stateless);
            }
        }
    } catch (Exception ex) {
        play.Logger.error(ex.getMessage());
        ExceptionLog.log(ex, Thread.currentThread());
    } finally {
        stateless.getTransaction().commit();
        stateless.close();
    }
}

private static void updatePrice(ProductType pt, StatelessSession stateless) {
    pt.priceDelta = computeDelta();
    pt.unitPrice = computePrice();
    stateless.update(pt);
    PriceHistory ph = new PriceHistory(pt, price);
    stateless.insert(ph);
}

private static void updatePrice(RawMaterialType rm, StatelessSession stateless) {
    rm.priceDelta = computeDelta();
    rm.unitPrice = computePrice();
    stateless.update(rm);
    PriceHistory ph = new GoodPriceHistory(rm, price);
    stateless.insert(ph);
}
In this example I have three simple entities (ProductType, RawMaterialType and PriceHistory).
computeDelta and computePrice are just algorithmic functions with no DB work.
The retrieveAllWithHistory functions fetch some data from the database using Play Framework model methods.
So this code retrieves some data, edits some of it, creates new records, and finally saves everything.
Why do I get a lock exception with MySQL and no exception with H2?
I'm not sure why you have a commit in a finally block. Give this structure a try:
try {
    factory.getCurrentSession().beginTransaction();
    // ... do the actual work here ...
    factory.getCurrentSession().getTransaction().commit();
} catch (RuntimeException e) {
    factory.getCurrentSession().getTransaction().rollback();
    throw e; // or display error message
}
Also, it might be helpful for you to check this documentation.
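Applied to the stateless session in the question, that structure might look roughly like this (a sketch only; the repositories and updatePrice methods are the ones from the question):

Session session = (Session) JPA.em().getDelegate();
StatelessSession stateless = session.getSessionFactory().openStatelessSession();
Transaction tx = stateless.beginTransaction();
try {
    for (ProductType pt : ProductType.retrieveAllWithHistory()) {
        updatePrice(pt, stateless);
    }
    for (RawMaterialType rm : RawMaterialType.retrieveAllWithHistory()) {
        updatePrice(rm, stateless);
    }
    tx.commit();                  // commit only when all updates succeeded
} catch (RuntimeException ex) {
    tx.rollback();                // release the row locks instead of holding them open
    play.Logger.error(ex.getMessage());
    throw ex;
} finally {
    stateless.close();            // always close the stateless session
}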