How can I make sure the following condition is always true:
Let's say I have 3 tables that contain 1000, 2000, and 3000 records respectively.
I have to load all of those records into the cache, but I want to be sure all the data is in the cache before a client starts using it.
Note that the cache mode is REPLICATED.
Does Apache Ignite have such a feature?
Here is what I plan to do:
I will set "atomicityMode" to "TRANSACTIONAL".
I will create a class for the cacheStoreFactory.
This class extends CacheStoreAdapter and overrides the loadCache method.
Here is the pseudocode:
public void loadCache(IgniteBiInClosure<K, V> clo, Object... args) {
    // Connect to all the databases
    /* while (true):
         try (Transaction transaction = Ignition.ignite().transactions()
                 .txStart(TransactionConcurrency.OPTIMISTIC, TransactionIsolation.SERIALIZABLE)) {
             create a PreparedStatement for each database, e.g. "select * from persons"
             collect the ResultSets from all databases
             create the corresponding object from each row and call clo.apply(objectID, object);
             before transaction.commit():
                 if possible, find the total number of records across all databases
                     (maybe determine this before entering the try block)
                 then, again if possible, compare the cache size with that total
                 if the two numbers are equal -> transaction.commit() and break
                 else                         -> continue;
         }
    */
}
You can use distributed data structures to signal that some action was completed. For example:
https://apacheignite.readme.io/docs/countdownlatch
https://apacheignite.readme.io/docs/distributed-semaphore
https://apacheignite.readme.io/docs/atomic-types
Also, you can check the size of the cache or send a message from one node to another, as in this example:
https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/messaging/MessagingExample.java
However, other nodes will not be able to read the data loaded inside the transaction until it commits.
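For example, a rough sketch of the countdown-latch approach could look like the following (the latch name "dataLoaded", the cache name "personsCache" and the isLoaderNode() check are placeholders for whatever fits your topology):
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteCountDownLatch;
import org.apache.ignite.Ignition;

public class CacheLoadSignal {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Cluster-wide latch with count = 1, created if it does not exist yet.
        IgniteCountDownLatch latch = ignite.countDownLatch("dataLoaded", 1, false, true);

        IgniteCache<Long, Object> cache = ignite.cache("personsCache");

        if (isLoaderNode(ignite)) {
            // Loader node: trigger CacheStore.loadCache() across the cluster, then signal completion.
            cache.loadCache(null);
            latch.countDown();
        } else {
            // Reader nodes: block until the loader signals that the cache is fully populated.
            latch.await();
            // Safe to read from the cache from here on.
        }
    }

    private static boolean isLoaderNode(Ignite ignite) {
        // Hypothetical check: pick the loader node however suits you, e.g. the oldest node.
        return ignite.cluster().forOldest().node().isLocal();
    }
}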
Related
I have a Spring application that runs a cron job. Every few minutes the cron fetches new data from an external API. The data should be stored in a database (MySQL) in place of the old data, i.e. the old data must be completely overwritten rather than updated. The application itself exposes a REST API, so clients read the data from the database. There should never be a situation where a client sees empty or partial data because an update is in progress.
Currently I delete all the old data and insert the new data, but there is a window in which a client can get only part of the data. I have tried this via the Spring Data deleteAll and saveAll methods.
@Override
@Transactional
public List<Country> overrideAll(@NonNull Iterable<Country> countries) {
    removeAllAndFlush();
    List<CountryEntity> countriesToCreate = stream(countries.spliterator(), false)
            .map(CountryEntity::from)
            .collect(toList());
    List<CountryEntity> createdCountries = repository.saveAll(countriesToCreate);
    return createdCountries.stream()
            .map(CountryEntity::toCountry)
            .collect(toList());
}

private void removeAllAndFlush() {
    repository.deleteAll();
    repository.flush();
}
I also thought about having a temporary table that receives the new data; when the data is complete, I would just replace the main table with the temporary one. Is that a good idea? Any other ideas?
It's a good idea. You can minimize downtime by building the new data in another table and then switching tables quickly by renaming. This also improves the perceived performance for users, because no records need to be locked as happens with UPDATE/DELETE.
In MySQL, you can use RENAME TABLE if you don't have triggers on the table. It allows renaming multiple tables at once and it works atomically (i.e. like a transaction: if any error happens, no change is made). For example:
RENAME TABLE countries TO countries_old, countries_new TO countries;
DROP TABLE countries_old;
Refer here for more details
https://dev.mysql.com/doc/refman/5.7/en/rename-table.html
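Putting it together, a rough JDBC sketch of the whole swap could look like this (table and column names are placeholders, and the Country getters are assumed):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class CountrySwapLoader {

    // Load the new data into a staging table, then swap it in atomically.
    public void reloadCountries(Connection conn, Iterable<Country> countries) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.execute("DROP TABLE IF EXISTS countries_new");
            st.execute("CREATE TABLE countries_new LIKE countries");
        }

        // Fill the staging table while clients keep reading the old table.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO countries_new (code, name) VALUES (?, ?)")) {
            for (Country c : countries) {
                ps.setString(1, c.getCode());   // assumed accessor
                ps.setString(2, c.getName());   // assumed accessor
                ps.addBatch();
            }
            ps.executeBatch();
        }

        // Atomic switch: clients never see an empty or partial table.
        try (Statement st = conn.createStatement()) {
            st.execute("RENAME TABLE countries TO countries_old, countries_new TO countries");
            st.execute("DROP TABLE countries_old");
        }
    }
}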
We are trying to build an application that returns paginated results from a Cassandra database for a UI.
The UI passes fetchSize and pagingState to our API, and based on those we return a List<MyObject> of size fetchSize. If pagingState is passed, we resume the query from the last page (as described in the Cassandra docs: https://docs.datastax.com/en/developer/java-driver/3.6/manual/paging/).
Please note that I'm using Cassandra driver version 3.6.
But when we implemented this, Cassandra always returned all entries in the database, ignoring the fetch size, which in turn results in a null value for ResultSet.getExecutionInfo().getPagingState(). How do I solve this?
I created 16 records in my database for MyObject and tried passing a fetch size of 5 to get them. All 16 records have the same partition key, ID-1.
// Util method to invoke a Statement. "session" is the Cassandra session.
public static ResultSet execute(int pageSize, Statement statement, String pageState) {
    if (isVoid(pageSize)) {
        pageSize = -1;
    }
    statement.setFetchSize(pageSize);
    if (!isVoid(pageState)) {
        statement.setPagingState(PagingState.fromString(pageState));
    }
    return session.execute(statement);
}
// Accessor interface method for my query that returns a Statement object
@Query("SELECT * FROM " + MY_TABLE + " WHERE id=:id")
Statement getAll(@Param("id") String id);

// Main code returning a list of MyObject; "mapper" is an object Mapper
Statement statement = accessor.getAll("ID1");
ResultSet rs = execute(5, statement, null);
List<MyObject> list = mapper.map(rs).all();
PagingState pageState = rs.getExecutionInfo().getPagingState();
In the above code, I expected Cassandra to return a list of 5 MyObject objects and to give me a non-null paging state in the pageState variable.
Neither worked as expected.
The list had a size of 16 (basically it fetched all the records),
and because of that, pageState was null, since all records had already been fetched.
What am I missing here?
EDIT:
From observation, the ResultSet honours the fetchSize passed in the statement, but when we map it to a List<MyObject> using the all() method, it fetches all the results in the database (not just the requested page).
So when I invoked the Result#one method 5 (= pageSize) times and pushed the results into a List, I got the paging state as well as a result list of the page size.
A sample util method for the above:
public static <T> List<T> getPaginatedList(ResultSet resultSet, Mapper<T> mapper, int pageSize) {
    List<T> entities = new ArrayList<>();
    Result<T> result = mapper.map(resultSet);
    IntStream.range(0, pageSize).forEach(i -> {
        entities.add(result.one());
    });
    return entities;
}
What is the performance impact of this?
As you were able to discern, the reason you are getting all results back despite specifying setFetchSize is that fetch size simply sets the requested size of each requested page. When you invoke all(), the driver transparently pages through all results.
Calling one() individually will not have a performance impact when compared to all(); however, I would recommend changing your logic for consuming the page, as I would expect IntStream.range(0, pageSize) to fail if you've exhausted your result set (i.e. you set the fetch size to 500, but there are only 495 rows). Instead you could use IntStream.range(0, resultSet.getAvailableWithoutFetching()).
You could also choose to iterate over the result set until ResultSet.isExhausted() returns true to prevent fetching the next page.
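For reference, a rough sketch of a page-sized util based on that idea might look like this (driver 3.6 APIs; PagedResult is a hypothetical holder class and "session" is the Cassandra session from the question):
import java.util.ArrayList;
import java.util.List;

import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Statement;
import com.datastax.driver.mapping.Mapper;
import com.datastax.driver.mapping.Result;

// Hypothetical holder for one page of results plus the serialized paging state.
public static final class PagedResult<T> {
    public final List<T> rows;
    public final String nextPageState;   // null when there are no more pages

    PagedResult(List<T> rows, String nextPageState) {
        this.rows = rows;
        this.nextPageState = nextPageState;
    }
}

// Fetch at most pageSize rows and hand back the paging state the UI should send next time.
public static <T> PagedResult<T> getPage(Statement statement, Mapper<T> mapper,
                                         int pageSize, String pageState) {
    statement.setFetchSize(pageSize);
    if (pageState != null && !pageState.isEmpty()) {
        statement.setPagingState(PagingState.fromString(pageState));
    }

    ResultSet rs = session.execute(statement);
    Result<T> result = mapper.map(rs);

    List<T> rows = new ArrayList<>(pageSize);
    // Only consume what is already available so the driver does not fetch the next page.
    int available = rs.getAvailableWithoutFetching();
    for (int i = 0; i < available && i < pageSize; i++) {
        rows.add(result.one());
    }

    PagingState next = rs.getExecutionInfo().getPagingState();
    return new PagedResult<>(rows, next == null ? null : next.toString());
}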
I have a table with approximately 62,000,000 rows, and I need to select data from it and export it to a .txt or .csv file.
My query limits the result to approximately 60,000 rows.
When I run the query on my development machine, it consumes all the memory and I get a java.lang.OutOfMemoryError.
At the moment I use Hibernate for the DAO, but I can change to a pure JDBC solution if you recommend it.
My pseudocode is:
List<Map> list = myDao.getMyData(param); // program crashes here
initFile();
for (Map map : list) {
    util.append(map); // this writes the row to the file
}
closeFile();
How do you suggest I write my file?
Note: I use .setResultTransformer(Transformers.ALIAS_TO_ENTITY_MAP) to get a Map instead of an entity.
You could use hibernate's ScrollableResults. See documentation here: http://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch11.html#objectstate-querying-executing-scrolling
This uses server-side cursors, if your database engine / driver supports them. For this to work, be sure to set the following properties:
query.setReadOnly(true);
query.setCacheable(false);

ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
while (results.next()) {
    SomeEntity entity = (SomeEntity) results.get()[0];
    // process / write the entity here
}
results.close();
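For the export use case in the question, a rough sketch could stream each row straight to the file and clear the session periodically so the heap stays flat (the Person entity, its getters and the output format are placeholders):
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.hibernate.Query;
import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;

public void exportToCsv(Session session) throws IOException {
    try (BufferedWriter writer = Files.newBufferedWriter(
            Paths.get("export.csv"), StandardCharsets.UTF_8)) {
        Query query = session.createQuery("from Person p");
        query.setReadOnly(true);
        query.setCacheable(false);
        query.setFetchSize(1000);              // hint for the JDBC driver

        ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
        int count = 0;
        while (results.next()) {
            Person p = (Person) results.get()[0];
            writer.write(p.getId() + ";" + p.getName());
            writer.newLine();

            if (++count % 1000 == 0) {
                session.clear();               // detach already written entities, keep memory flat
            }
        }
        results.close();
    }
}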
Lock the table and then perform subset selections and exports, appending to the results file. Ensure you unconditionally unlock when done.
Not nice, but the task will run to completion even on servers or clients with limited resources.
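A rough sketch of that idea, assuming MySQL syntax and placeholder table/column names, could look like this:
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public void exportWithTableLock(Connection conn) throws Exception {
    try (Statement st = conn.createStatement()) {
        st.execute("LOCK TABLES my_table READ");
        try (BufferedWriter out = Files.newBufferedWriter(
                 Paths.get("export.csv"), StandardCharsets.UTF_8);
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT id, name FROM my_table LIMIT ? OFFSET ?")) {
            int chunk = 10_000;
            for (int offset = 0; ; offset += chunk) {
                ps.setInt(1, chunk);
                ps.setInt(2, offset);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        out.write(rs.getLong("id") + ";" + rs.getString("name"));
                        out.newLine();
                        rows++;
                    }
                }
                if (rows < chunk) {
                    break;                     // last chunk written
                }
            }
        } finally {
            st.execute("UNLOCK TABLES");       // unconditionally unlock, as noted above
        }
    }
}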
I have a database fetch call with Spring JdbcTemplate, and the number of rows to fetch is around 1 million. Iterating over the result set takes too much time. After debugging the behaviour, I found that it processes some rows like a batch, then waits for some time, then takes another batch of rows and processes them. It seems that row processing is not continuous, so the overall time runs into minutes. I have used the default configuration for the data source. Please help.
[Edit]
Here is some sample code
this.prestoJdbcTempate.query(query, new RowMapper<SomeObject>() {
    @Override
    public SomeObject mapRow(final ResultSet rs, final int rowNum) throws SQLException {
        System.out.println(rowNum);
        SomeObject obj = new SomeObject();
        obj.setProp1(rs.getString(1));
        obj.setProp2(rs.getString(2));
        // ...
        obj.setProp8(rs.getString(8));
        return obj;
    }
});
As most of the comments tell you, one million records is useless and unrealistic to show in any UI - if this is a real business requirement, you need to educate your customer.
Network traffic between the application and the database server is a key factor for performance in scenarios like this. There is one optional parameter that can really help you here: the fetch size - though only to a certain extent.
Example :
Connection connection = //get your connection
Statement statement = connection.createStatement();
statement.setFetchSize(1000); // configure the fetch size
Most JDBC database drivers use a low fetch size by default, and tuning it can help you in this situation. But beware of the following:
Make sure your JDBC driver supports fetch size.
Make sure your JVM heap setting (-Xmx) is large enough to handle the objects created as a result of this.
Finally, select only the columns you need to reduce network overhead.
In Spring, JdbcTemplate lets you set the fetchSize as well, as sketched below.
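A rough sketch (the dataSource, query, column names and the handle() callback are placeholders): set the fetch size on the template and stream rows through a RowCallbackHandler instead of collecting a million objects in a list.
JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
jdbcTemplate.setFetchSize(1000);

jdbcTemplate.query("SELECT col1, col2 FROM big_table", new RowCallbackHandler() {
    @Override
    public void processRow(ResultSet rs) throws SQLException {
        // one row at a time; nothing is accumulated in memory
        handle(rs.getString("col1"), rs.getString("col2"));
    }
});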
I am executing the following set of statements in my Java application. It connects to an Oracle database.
stat = connection.createStatement();
stat1 = connection.createStatement();
ResultSet rs = stat.executeQuery(BIGQUERY);
while (rs.next()) {
    obj1.setAttr1(rs.getString(1));
    obj1.setAttr2(rs.getString(2));
    obj1.setAttr3(rs.getString(3));
    obj1.setAttr4(rs.getString(4));

    ResultSet rs1 = stat1.executeQuery(SMALLQ1);
    while (rs1.next()) {
        obj1.setAttr5(rs1.getString(1));
    }

    ResultSet rs2 = stat1.executeQuery(SMALLQ2);
    while (rs2.next()) {
        obj1.setAttr6(rs2.getString(1));
    }

    // ... the remaining small queries ...

    linkedBlockingQueue.add(obj1);
}
// all statements and connections are closed
The BIGQUERY returns around 4.5 million records, and for each record I have to execute the smaller queries, which are 14 in number. Each small query has 3 inner joins.
My multi-threaded application can currently process 90,000 records in one hour. But I may have to run the code daily, so I want to process all the records in 20 hours. I am using about 200 threads that run the above code and store the records in a LinkedBlockingQueue.
Does blindly increasing the thread count help, or is there some other way I can improve the performance of the result set processing?
PS : I am unable to post the query here, but I am assured that all queries are optimized.
To improve JDBC performance for your scenario you can apply some modifications.
As you will see, all these modifications can significantly speed up your task.
1. Using batch operations.
You can read the results of your big query and store them in some kind of buffer.
Only when the buffer is full should you run the sub-query for all the data collected in the buffer.
This significantly reduces the number of SQL statements to execute.
static final int BATCH_SIZE = 1000;
List<MyData> buffer = new ArrayList<>(BATCH_SIZE);

while (rs.next()) {
    MyData record = new MyData(rs.getString(1), ..., rs.getString(4));
    buffer.add(record);
    if (buffer.size() == BATCH_SIZE) {
        processBatch(buffer);
        buffer.clear();                  // reuse the buffer for the next batch
    }
}
if (!buffer.isEmpty()) {
    processBatch(buffer);                // last, partially filled batch
}

void processBatch(List<MyData> buffer) {
    String sql = "select ... where X and id in (" + getIDs(buffer) + ")";
    ResultSet rs1 = stat1.executeQuery(sql);   // one query for all IDs in the buffer
    while (rs1.next()) { ... }
    ...
}
2. Using efficient maps to store content from many selects.
If your records are not so big, you can store them all in memory at once, even for a 4 million row table.
I have used this approach many times for nightly processes (with no regular users online).
Such an approach may need a lot of heap memory (e.g. 100 MB - 1 GB), but it is much faster than approach 1.
To do that you need an efficient map implementation, e.g. gnu.trove.map.TIntObjectMap,
which is much better than the Java standard library maps.
final TIntObjectMap<MyData> map = new TIntObjectHashMap<MyData>(10000, 0.8f);

// query 1
while (rs.next()) {
    MyData record = new MyData(rs.getInt(1), rs.getString(2), ..., rs.getString(4));
    map.put(record.getId(), record);
}

// query 2
while (rs.next()) {
    int id = rs.getInt(1);        // my data id
    String x = rs.getString(...);
    int y = rs.getInt(...);
    MyData record = map.get(id);
    record.add(new MyDetail(x, y));
}

// query 3
// same pattern as query 2
After this you have a map filled with all the collected data, probably with a lot of memory allocated.
This is why you can only use this method if you have the resources for it.
Another topic is how to write the MyData and MyDetail classes so they are as small as possible.
You can use some tricks (a short sketch follows this list):
storing 3 integers (with limited range) in 1 long variable (using a bit-shifting utility)
storing Date objects as integer (yymmdd)
calling str.intern() for each string fetched from DB
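To illustrate the first two tricks, here is a small sketch (the 16-bit value ranges are an assumption; adjust the shifts to your actual ranges):
import java.time.LocalDate;

// Packs three bounded ints (each assumed to fit in 16 bits, i.e. 0..65535) into one long field.
final class Packed {
    private final long packed;

    Packed(int a, int b, int c) {
        this.packed = ((long) a << 32) | ((long) b << 16) | c;
    }

    int a() { return (int) (packed >>> 32) & 0xFFFF; }
    int b() { return (int) (packed >>> 16) & 0xFFFF; }
    int c() { return (int)  packed         & 0xFFFF; }

    // Second trick: encode a date as an int in yymmdd form.
    static int toYymmdd(LocalDate d) {
        return (d.getYear() % 100) * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }
}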
3. Transactions
If you have to do some updates or inserts, then 4 million records is too much to handle in one transaction.
This is too much for most database configurations.
Use approach 1 and commit the transaction for each batch.
On each newly inserted record you can have something like a RUN_ID, and if everything went well, you can mark that RUN_ID as successful.
If your queries only read, there is no problem. However, you can mark the transaction as read-only to help your database.
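For example, a rough sketch of committing once per batch with plain JDBC (table/column names, the MyData accessors, the RUN_ID column and the batch size are placeholders):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

void insertInBatches(Connection conn, List<MyData> all, long runId) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO target_table (run_id, id, val) VALUES (?, ?, ?)")) {
        int inBatch = 0;
        for (MyData d : all) {
            ps.setLong(1, runId);            // RUN_ID marking this daily execution
            ps.setInt(2, d.getId());
            ps.setString(3, d.getValue());   // assumed accessor
            ps.addBatch();
            if (++inBatch % 1000 == 0) {
                ps.executeBatch();
                conn.commit();               // one transaction per batch
            }
        }
        ps.executeBatch();
        conn.commit();                       // commit the final, partially filled batch
    }
}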
4. JDBC fetch size.
When you load a lot of records from the database, it is very, very important to set a proper fetch size on your JDBC statements.
This reduces the number of physical round trips to the database and speeds up your process.
Example:
// jdbc
statement.setFetchSize(500);
// spring
JdbcTemplate jdbc = new JdbcTemplate(datasource);
jdbc.setFetchSize(500);
Here you can find some benchmarks and patterns for using fetch size:
http://makejavafaster.blogspot.com/2015/06/jdbc-fetch-size-performance.html
5. PreparedStatement
Use PreparedStatement rather than Statement.
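For example, the small queries could be prepared once and just re-bound for each record coming from the big query (the query text and the id variable are placeholders):
PreparedStatement smallQ1 = connection.prepareStatement(
        "select x from detail_table where parent_id = ?");

smallQ1.setString(1, id);                    // key taken from the current row of the big query
try (ResultSet rs1 = smallQ1.executeQuery()) {
    while (rs1.next()) {
        obj1.setAttr5(rs1.getString(1));
    }
}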
6. Number of sql statements.
Always try to minimize the number of SQL statements you send to the database.
Try this:
resultSet.setFetchSize(100);
while (resultSet.next()) {
    ...
}
The parameter is the number of rows that should be retrieved from the database in each round trip.