Multiple threads with multiple table inserts - Java

I am working on a Spring + Hibernate based web application.
In this application I have to perform calculations on 50,000 records in the database.
Current logic:
1. Loop through 0 to 50,000 (all 50,000 records are independent of each other)
2. Select the ith element
3. Do the calculation on the ith element (drop the CALCULATION_TEMP table if it exists, create a new CALCULATION_TEMP table, and insert the calculation results into CALCULATION_TEMP)
4. Do some calculation on the step-3 table and get the result
5. Put the step-4 result into the Results table
Currently all of these calculations take around 38 hours to complete with a single thread.
Now we want to run this system with multiple threads.
For testing purposes I took 50 records.
Using a single thread it takes around 30 sec.
Using two threads:
The first thread processes half of the records and the second thread processes the rest.
Each thread now uses its own TEMP table (TEMP1 and TEMP2).
It takes 225 sec.
Rough code:
for (int i = 0; i < recordsSize; i++) {
    final int j = i;
    String recordId = list.get(i);

    // Method call: creates the CALCULATION_TEMP table.
    // CALCULATION_TEMP has a dynamic number of columns (50 to 70),
    // depending on the record data. Returns a flag value.
    boolean flag = xyzMethod(....);

    if (flag) {
        // All calculation is done in this method.
        // Around 600-700 rows are created in the CALCULATION_TEMP table
        // by the calculation logic.
        Object fileMapColumnData[] = /* Method call */;

        // Insert the result for this unique recordId into the RESULT table
        // (the result is calculated in the CALCULATION_TEMP table),
        // then drop the CALCULATION_TEMP table.
        insertIntoResultTable(....);
    } else {
        LOGGER.error("Unable to calculate ... because of some wrong data");
        loggerDTO.getCustomLogger().severe("Unable to calculate ... because of some wrong data");
    }

    if (i % 100 == 0) {
        calculationDao.flushAndClear();
    }

    // Thread for showing completion status as a percentage
    Thread t = new Thread() {
        @Override
        public void run() {
            getPercentageDone((float) recordsSize, (float) (j + 1));
        }
    };
    t.start();
}
Please suggest how I can improve the performance.

Creating/dropping temporary tables 50,000 times takes a lot of time. Can you do the calculations without a temporary table? That could improve performance.
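For instance, if the 600-700 intermediate rows per record fit in memory, a plain Java collection can stand in for the temp table. A minimal sketch, where recordIds, computeIntermediateRows, and reduceToResult are hypothetical stand-ins for the logic that currently fills and then aggregates CALCULATION_TEMP:
// Hypothetical sketch: keep each record's intermediate rows in memory
// instead of creating and dropping a CALCULATION_TEMP table per record.
for (String recordId : recordIds) {
    List<Object[]> intermediate = computeIntermediateRows(recordId); // ~600-700 rows, 50-70 columns
    Object[] result = reduceToResult(intermediate);                  // the step-4 aggregation
    insertIntoResultTable(recordId, result);                         // one row in the Results table
}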

There is no hard and fast rule for performance improvements. Given your constraints and business requirements, we would need to know the calculations, how the threads and connections are managed, the number of loops involved, and so on. There are many things to consider. Start by finding the pain points, measuring the time each one takes, and improving each method first.
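For example, a crude first step is to time the two phases of the loop from the question separately (xyzMethod and insertIntoResultTable are the question's methods; their arguments are elided there, so this is only a sketch):
long t0 = System.nanoTime();
boolean flag = xyzMethod(....); // temp-table setup
long t1 = System.nanoTime();
if (flag) {
    insertIntoResultTable(....); // calculation + result insert
}
long t2 = System.nanoTime();
LOGGER.info("setup: " + ((t1 - t0) / 1_000_000) + " ms, "
        + "calc+insert: " + ((t2 - t1) / 1_000_000) + " ms");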

Related

Java Algorithm for retrieving and storing user dates using Firestore database

So I am trying to build this algorithm. It retrieves a reference (in this case a date), and this date is checked against my Firestore database to ensure that fewer than 3 employees have booked this same date. So you will see here:
startDateReference = 202131Thursday = Thursday, 1st of April, 2021
So I am running a loop, and within this loop is a counter that keeps track of the number of times this reference is found. If it is found 3 times the program stops the loop, which is great, but I am unable to wrap my head around how to execute a condition if the loop resolves with fewer than 3 iterations. For example, at the moment my database only has 2 of these references stored, so I am successfully able to count each iteration, but the code does not take into account that there is 1 extra slot.
My code:
final Query query = db.collection("holidays").whereEqualTo("startDateReference", totalDateCode);
query.get().addOnSuccessListener(new OnSuccessListener<QuerySnapshot>() {
    @Override
    public void onSuccess(QuerySnapshot queryDocumentSnapshots) {
        // creating a map to obtain information
        Map<String, Object> test = new HashMap<>();
        // counter
        int counter = 0;
        boolean flag = false;
        // retrieve data as a hashmap document
        for (QueryDocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
            Note data = documentSnapshot.toObject(Note.class);
            test.put("startDateReference", data.getStartDateReference());
            Collection<Object> values = test.values();
            // retrieve results as single values
            for (Object string : values) {
                do {
                    //System.out.println(string);
                    if (string.equals("202131Thursday")) {
                        counter++;
                        System.out.println("Checking for holidays" + counter);
                    } else if (counter == 3) {
                        System.out.println("could not save data it is packed");
                    } else {
                        System.out.println("storing details");
                    }
                } while (counter == 3);
            }
        }
    }
});
The results I get:
> I/System.out: 0
> Checking for holidays1 I/System.out: 1
> Checking for holidays2
But after this last result I expect the else condition in my code to execute, since there is an extra slot (note: I have not coded the database store function yet; I am just using string text to see how it would work. Right now the database only has 2 of the references, so there is an extra slot which I am trying to store details in):
System.out.println("storing details");
Move the check outside of all the loops: after the loops have resolved, you can use the counter in its final form to verify whether there is space or not, using if/else statements.
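A minimal sketch of that idea, reusing the query and the Note class from the question; the counter is incremented inside the loop, and the decision is made once, after it:
query.get().addOnSuccessListener(new OnSuccessListener<QuerySnapshot>() {
    @Override
    public void onSuccess(QuerySnapshot queryDocumentSnapshots) {
        int counter = 0;
        for (QueryDocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
            Note data = documentSnapshot.toObject(Note.class);
            if ("202131Thursday".equals(data.getStartDateReference())) {
                counter++; // count existing bookings for this date
            }
        }
        // the counter is now final: decide once, outside the loop
        if (counter >= 3) {
            System.out.println("could not save data it is packed");
        } else {
            System.out.println("storing details"); // at least one slot is free
        }
    }
});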

Is there a way to consume sample records from Kinesis?

I wrote an Amazon Kinesis consumer using the KCL, which manages my record processing task. It currently processes the records without filtering.
I am looking for a way to process a sample of the records while skipping the rest.
For example, if there are 100 records in total, I would like to process just 1/10 of them (10 sample records).
Thanks!
As far as I'm aware, there's no way to tell Kinesis to return only a random subset of records. So it's up to you to ignore records once they've been returned to you.
In the processRecords method, simply ignore 9 out of 10 records at random:
private final Random random = new Random();

public void processRecords(ProcessRecordsInput processRecordsInput) {
    processRecordsInput.records()
        .forEach(r -> {
            if (random.nextInt(10) == 0) { // process ~1 in 10 records
                process(r);
            }
        });
}
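Note that this keeps 1 in 10 records on average rather than exactly. If you need a deterministic sample, a counter-based variant is possible (a sketch; in the KCL the counter lives in the record processor, so it is per shard):
private long seen = 0; // records seen by this record processor (one per shard)

public void processRecords(ProcessRecordsInput processRecordsInput) {
    processRecordsInput.records().forEach(r -> {
        if (seen++ % 10 == 0) { // keep every 10th record
            process(r);
        }
    });
}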

Java MySQL big data out of heap space

I have an application which accesses about 2 million tweets from a MySQL database. Specifically, one of the fields holds the tweet text (with a maximum length of 140 characters). I am splitting every tweet into word n-grams, where 1 <= n <= 3. For example, consider the sentence:
I am a boring sentence.
The corresponding nGrams are:
I
I am
I am a
am
am a
am a boring
a
a boring
a boring sentence
boring
boring sentence
sentence
With about 2 million tweets, I am generating a lot of data. Even so, I was surprised to get a heap error from Java:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2145)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1922)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3118)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2709)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at twittertest.NGramFrequencyCounter.FreqCount(NGramFrequencyCounter.java:49)
at twittertest.Global.main(Global.java:40)
Here is the problem code statement (line 49) as given by the above output from Netbeans:
results = stmt.executeQuery("select * from tweets");
So, if I am running out of memory it must be that it is trying to return all the results at once and then storing them in memory. What is the best way to solve this problem? Specifically I have the following questions:
How can I process pieces of results rather than the whole set?
How would I increase the heap size? (If this is possible)
Feel free to include any suggestions, and let me know if you need more information.
EDIT
Instead of select * from tweets I partitioned the table into equally sized subsets of about 10% of the total size. Then I tried running the program. It looked like it was working fine, but it eventually gave me the same heap error. This is strange to me, because I have run the same program successfully in the past with 610,000 tweets. Now I have about 2,000,000 tweets, roughly 3 times as much data. So if I split the data into thirds it should work, but I went further and split the subsets to 10% of the size.
Is some memory not being freed? Here is the rest of the code:
results = stmt.executeQuery("select COUNT(*) from tweets");
int num_tweets = 0;
if (results.next()) {
    num_tweets = results.getInt(1);
}
int num_intervals = 10; // split into equally sized subsets
int interval_size = num_tweets / num_intervals;
for (int i = 0; i < num_intervals - 1; i++) { // process 10% of the data at a time
    results = stmt.executeQuery(String.format("select * from tweets limit %s, %s",
            i * interval_size, (i + 1) * interval_size));
    while (results.next()) { // for each row in the tweets database
        tweetID = results.getLong("tweet_id");
        curTweet = results.getString("tweet");
        int colPos = curTweet.indexOf(":");
        curTweet = curTweet.substring(colPos + 1); // trim off the RT and retweeted
        if (curTweet != null) {
            curTweet = removeStopWords(curTweet);
        }
        if (curTweet == null) {
            continue;
        }
        reader = new StringReader(curTweet);
        tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
        //tokenizer = new StandardFilter(Version.LUCENE_36, tokenizer);
        //Set stopSet = StopFilter.makeStopSet(Version.LUCENE_36, stopWords, true);
        //tokenizer = new StopFilter(Version.LUCENE_36, tokenizer, stopSet);
        tokenizer = new ShingleFilter(tokenizer, 2, 3);
        charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
        while (tokenizer.incrementToken()) { // insert each nGram from each tweet into the DB
            insertNGram.setInt(1, nGramID++);
            insertNGram.setString(2, charTermAttribute.toString());
            insertNGram.setLong(3, tweetID);
            insertNGram.executeUpdate();
        }
    }
}
Don't get all the rows from the table. Try to select partial data based on your requirement by setting limits in the query. Since you are using a MySQL database, your query would be select * from tweets limit 0,10. Here 0 is the starting row and 10 means 10 rows from that start.
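A sketch of that paging loop with plain JDBC (assuming conn is an open java.sql.Connection; the page size is arbitrary):
int pageSize = 10000;
PreparedStatement ps = conn.prepareStatement("select * from tweets limit ?, ?");
int offset = 0;
int rows;
do {
    ps.setInt(1, offset);   // starting row
    ps.setInt(2, pageSize); // row count for this page
    rows = 0;
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
        rows++;
        // build n-grams for this tweet here
    }
    rs.close();
    offset += pageSize;
} while (rows == pageSize); // a short page means the table is exhausted
ps.close();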
You can always increase the heap size available to your JVM using the -Xmx argument. You should read up on all the knobs available to you (e.g. perm gen size). Google for other options or read this SO answer.
You probably can't do this kind of problem with a 32-bit machine. You'll want 64 bits and lots of RAM.
Another option would be to treat it as a map-reduce problem. Solve it on a cluster using Hadoop and Mahout.
Have you considered streaming the result set? Halfway down the linked page is a section on result sets, and it addresses your problem (I think?). Write the n-grams to a file, then process the next row. Or am I misunderstanding your problem?
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
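Concretely, the section that page refers to is about streaming result sets: with MySQL Connector/J you create a forward-only, read-only statement and set the fetch size to Integer.MIN_VALUE, and the driver then streams rows one at a time instead of buffering the whole result. A rough sketch:
Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                      ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE); // Connector/J switches to row-by-row streaming
ResultSet rs = stmt.executeQuery("select * from tweets");
while (rs.next()) {
    String tweet = rs.getString("tweet"); // only the current row is held in memory
    // process the tweet, write its n-grams out, then move on
}
rs.close();
stmt.close();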

How to perform limit query in HQL when table size is not known? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How do you do a limit query in HQL
I have a table A of large yet unknown size. I need to retrieve results in batches, i.e. 100 at a time, from the table and perform operations on them. For this, I have a variable m so that I can execute the HQL equivalent of the query "SELECT * from A LIMIT m,100" within a loop, which iterates m as 1, 101, 201, and so on. What should be the terminating condition of the loop? And what is the result when the above query is executed with m > the number of rows in the table?
Use a select count(*) query first; this will give you the total number of records. Then write Java code to set the limits accordingly:
count / m (integer division) = x = the number of full batches, and count - x*m = the number of rows in the last batch.
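In code, that arithmetic might look like this (count holds the result of the count(*) query; m is the batch size):
int m = 100;                        // batch size
int x = count / m;                  // number of full batches (integer division)
int lastBatchRows = count - x * m;  // rows in the final, partial batch (may be 0)
for (int i = 0; i < x; i++) {
    // run "... LIMIT " + (i * m) + ", " + m
}
if (lastBatchRows > 0) {
    // run "... LIMIT " + (x * m) + ", " + lastBatchRows
}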
The issue you have is better known as pagination.
And your problem is: how many pages can you retrieve from a table?
You must know the page count to use pagination.
To get the page count, you need to execute a pre-query that has the same FROM clause but only SELECT count(*) FROM .... This will always give you the correct answer.
If you cannot, or do not want to, use an extra statement, then you should ask the database for pages until the result is empty. So you execute the statement for page 1, page 2, ... page n, until the size of the result is less than the page size.
int pageSize = 100;
int page = 1;
List<?> currentPage;
do {
    currentPage = loadPage(page, pageSize);
    page++;
} while (currentPage.size() == pageSize);
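A nice property of this loop is that it needs no up-front count(*) query; the trade-off is one extra, empty page fetch at the end whenever the table size is an exact multiple of the page size.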
I suggest using the Hibernate methods setFirstResult() and setMaxResults() to achieve your desired results, e.g. below:
Query query = session.createQuery("FROM A");
int resultSize = 100;
List<?> pagedResults = null;
// use any appropriate number of iterations, or do a result count first to get the expected result size
for (int m = 0; ; m++) {
    pagedResults = query.setFirstResult(m * resultSize).setMaxResults(resultSize).list();
    // do your processing
    if (pagedResults.isEmpty() || pagedResults.size() < resultSize) {
        break; // no further iterations required
    }
}
setMaxResults() restricts the result size only (it takes the page size, not an absolute end index). If fewer records are available, you will receive only those records. I added the "if" condition to avoid unnecessary iterations.
Hope this helps!!

How do I use data from a ResultSet in a JTable?

I am doing a school project and I am having trouble storing the data from a ResultSet in a JTable. Previously I used DbUtils, but now I am wondering if there is a way to do the same thing without having to use external class files, or if it is easier to stick with DbUtils.
The data is coming from only one table, and all that needs to happen is for the data to be displayed in the JTable.
I would post my code here, but the only tutorials I could find were ones on how to populate a JTable using an Object[][]. I am using JDBC to create the connection.
Thanks in advance.
Well, this will require several steps.
I will explain my way, which is good for very large sets but a little complicated if you only want to show a few lines. Still, I'm sure it will help you. This method loads the required records on the fly, not the whole set beforehand. It creates the illusion of having the whole set, but without having to wait for a lengthy load.
1) First, let's assume that we have a nice JFrame that you can display, to start with. I will add a JScrollPane, and inside it a JTable. Run it and make sure you have a nice window with an empty JTable inside scroll bars.
2) Next you need a data source for the JTable. Since a JTable is a very generic component, not made specifically for SQL ResultSets, it requires a data source that extends javax.swing.table.AbstractTableModel, which has nothing to do with SQL. So we will now create a TableModelClass which extends AbstractTableModel; we will then set it on the JTable and it will start working. Of course, the trick is to implement all of AbstractTableModel's methods so that they get their data from our SQL result set, and this is up to you. From here on is my suggestion ->
3) Since this will be dynamic, we will not need to load all the data beforehand, but we need an initial set to display. I will have an Object[][] of a fixed size, let's say 200-300 rows. So I will initially execute the SQL and fill the array with a buffer of 200-300 rows. How much to cache depends on two things: it must be enough to cover the current display size of the JTable, and it should be small enough that, as we scroll and fetch subsequent caches, each fetch executes very fast.
4) Now let's begin implementing all of AbstractTableModel's methods.
5) First we look at the initial result set and report the number of columns. Just add a class variable, set the column count, and return it from public int getColumnCount(). This will not change from now on.
6) Also, looking at the result set metadata, make a list variable in the class and add the column names returned in the metadata. Using this list, return the column names in getColumnName(int col). Of course, the col index is the column position in the result set.
7) Now let's do public int getRowCount(). Inside the TableModelClass, keep a variable containing the row count and return it from this method. TIP: don't worry for now; set it to a fixed large number like 65000. This will let the user scroll while you dynamically load the records. Once we hit the end, we set it to the real value and the scroll pane adjusts to the correct proportions. Trust me, it works OK.
8) Now comes the fun part. As the JTable presents the first "page" of the table and the user scrolls, it begins calling getValueAt(int row, int col). This maps directly to our Object[][], but since we only have a cache, not the whole table, as the user scrolls down we will need to fetch more data. I do this:
public Object getValueAt( int row, int col )
{
    // load failed before, no more trying...
    if( loadExceptionOccur || ( row >= visualTableSize ) ) return( "" );
    // check if requested row is OUT of cache...
    try{
        if(
            // less than cache lower limit...
            ( ( row < startRow )
            ||
            // beyond cache upper limit...
            ( row >= startRow + tableDataCache.size() ) )
            // Stop unnecessary loading caused by the JTable readjusting
            // its visual table size and redrawing the entire table.
            && !tableRedraw
            // yes, get new cache...
        ){
            load( row ); // <- below is code
        }
        // now we know the row is in cache, so ->
        // verify the requested cell is in cache, or beyond the data rows
        if(
            // greater than lower limit
            ( row >= startRow )
            &&
            // less than upper limit...
            ( row < ( startRow + tableDataCache.size() ) )
        ){
            tableRedraw = false;
            // just get the data from the cache. tableDataCache is just your Object[][] array...
            Object cellValue = ( (recordClass) tableDataCache.get( row - startRow ) ).getValueAt( col );
            return ( cellValue );
        }
        else{
            // just show as blank
            return( "" );
        }
    }
    catch( Exception error ){
        …
In case of a cache miss you need to reload the cache. I normally load some rows before the requested row and some beyond, at least a JTable page's worth, so that we only go to the db once per screen render. The bigger the cache, the more scrolling before a load, but the longer each load takes. If you fine-tune it, the cache processing can be almost unnoticeable.
Here is the implementation of "load":
public void load( int rowIndex )
    throws KExceptionClass
{
    // calculate start of new cache; if there are not enough rows
    // for the top half of the cache then start from 0
    int halfCache = cacheSize / 2;
    int DBStartRow = 0;
    if( rowIndex > halfCache ) DBStartRow = rowIndex - halfCache;

    // Do query to DB
    try{
        // <- using JDBC, load cacheSize rows starting at DBStartRow.
        // Some sample SQL paging code below ->
        SQLP.load( DBStartRow, cacheSize );
    }catch( Exception loadError ){
        // if the database fails, do this so you don't get a billion errors, one per cell ->
        // set the load-failed flag, kill the window
        loadExceptionOccur = true;
        visualTableSize = 0;
        tableDataCache = new ArrayList< recordClass >();
        fireTableDataChanged(); // clear the JTable
        // log error
        log.log( this, KMetaUtilsClass.getStackTrace( loadError ) );
        // show error message
        throw new KExceptionClass( "Could not load table data! ", loadError );
    }

    // Load rows into the cache list.
    // Key field values are in the cache list as the last field in each record.
    tableDataCache.clear(); // the Object[][], wrapped in a class
    while( SQLPreprocessor.nextRowValue() ) {
        SQL.fetch( record ); // <- get JDBC rows into the table cache
        tableDataCache.add( record ); // this uses my library; change to JDBC or whatever you use to access SQL
    }
    log.log( this, "cacheList size = " + tableDataCache.size() );
    //---------
    if(
        // Last requested row number
        ( DBStartRow + cacheSize ) >
        // Last replied row number
        ( SQLPreprocessor.getloadedStartRowIndex() + SQLPreprocessor.getloadedRowCount() )
    ){
        // It is the end of the table.
        // The visual table is readjusted accordingly.
        visualTableSize = SQLPreprocessor.getloadedStartRowIndex() + SQLPreprocessor.getloadedRowCount();
        fireTableDataChanged();
        tableRedraw = true;
    }
    startRow = SQLPreprocessor.getloadedStartRowIndex();
    log.log( this, "visualTableSize = " + visualTableSize );
}
OK, this will dynamically load the data in small caches, which gives the impression of having the whole set.
If the user scrolls to the middle, or all the way to the end, the JTable asks only for the data needed for the display, not all the rows it moves past. So if you have a 10K-row table but the JTable is only 20 rows high, a scroll to the end only loads 40-50 rows. Pretty nice. Your users will be impressed.
Now, the catch is that the load assumes you have an SQL cursor that moves forwards and backwards by row number. This simple thing is quite a challenge in SQL. For Oracle, check: http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
OK, hope that helps.
Of course there's a way: iterate through the ResultSet and add what you find to the Object[][] array that gets passed to the JTable. There's one row in the 2D array for each row in the ResultSet; the columns hold the values.
The problem is that you won't know how many rows came back without iterating through the ResultSet first. That's why loading it into a List<Map<String, Object>> might be a better idea.
Here's an example showing how to do it. You'll find that method (and more) at my answer to this question:
java sql connections via class
public static List<Map<String, Object>> map(ResultSet rs) throws SQLException {
    List<Map<String, Object>> results = new ArrayList<Map<String, Object>>();
    try {
        if (rs != null) {
            ResultSetMetaData meta = rs.getMetaData();
            int numColumns = meta.getColumnCount();
            while (rs.next()) {
                // one Map per row, keyed by column name
                Map<String, Object> row = new HashMap<String, Object>();
                for (int i = 1; i <= numColumns; ++i) {
                    String name = meta.getColumnName(i);
                    Object value = rs.getObject(i);
                    row.put(name, value);
                }
                results.add(row);
            }
        }
    } finally {
        close(rs); // close() is a small null-safe helper defined in the linked answer
    }
    return results;
}
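Once you have the List<Map<String, Object>>, wiring it into a JTable is mechanical. Here is a sketch using DefaultTableModel; note that HashMap does not guarantee key order, so the column order below is simply whatever the first row's key set yields:
import java.util.List;
import java.util.Map;
import javax.swing.JTable;
import javax.swing.table.DefaultTableModel;

public static JTable toJTable(List<Map<String, Object>> rows) {
    if (rows.isEmpty()) {
        return new JTable(); // nothing to show
    }
    Object[] columnNames = rows.get(0).keySet().toArray();
    Object[][] data = new Object[rows.size()][columnNames.length];
    for (int i = 0; i < rows.size(); i++) {
        for (int j = 0; j < columnNames.length; j++) {
            data[i][j] = rows.get(i).get(columnNames[j]);
        }
    }
    return new JTable(new DefaultTableModel(data, columnNames));
}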
