I have a scenario where a bulk number of records, not less than 8000, needs to be fetched from the DB at a time and then set into a 'resultList' (an ArrayList) before rendering the result JSP page. There is a parent-child relationship in the result set (roughly 3000 are parent records and the rest are children), and I iterate over the parents with one for loop and over the child records with another for loop to populate the ArrayList.
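For reference, a hypothetical sketch of the nested-loop pattern described above; ParentRecord, ChildRecord and their accessors are placeholders, not the actual classes:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: class and getter names are assumptions.
List<Object[]> buildResultList(List<ParentRecord> parents, List<ChildRecord> children) {
    List<Object[]> resultList = new ArrayList<>();
    for (ParentRecord parent : parents) {          // ~3000 parent rows
        for (ChildRecord child : children) {       // child rows scanned again for every parent
            if (child.getParentId().equals(parent.getId())) {
                resultList.add(new Object[] { parent, child });
            }
        }
    }
    return resultList;
}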
But it is taking a minimum of 30 minutes to run these two nested loops, while my DB query takes just 1.5 minutes to fetch all the records. I am using Oracle for the DB connection.
My question is: how can I decrease the turnaround time of my code? How can I minimize the looping time? Is there any other way to fetch bulk records? Please suggest.
Thanks.
I want to fetch all data from offset to limit from a table with about 40 columns and 1,000,000 rows. I indexed the id column in Postgres and fetch the result of my select query via Java and an EntityManager.
My query needs about 1 minute to return its results, which is a bit too long. I tried a different index and also limited the query to 100 rows, but it still takes that long. How can I fix this? Do I need a better index, or is something wrong with my code?
CriteriaQuery<Entity> q = entityManager.getCriteriaBuilder().createQuery(Entity.class);
q.select(q.from(Entity.class));
TypedQuery<Entity> query = entityManager.createQuery(q);
List<Entity> entities = query.setFirstResult(offset).setMaxResults(limit).getResultList();
Right now you probably do not utilize the index at all. There is some ambiguity in how a Hibernate limit/offset translates to database operations (see this comment for the Postgres case). It may imply overhead, as described in detail in a reply to this post.
If offset and limit relate directly to the values of the id column, you could use that in a query of the form
SELECT e
FROM Entity e
WHERE e.id >= offset AND e.id < offset + limit
Given that the number of records asked for is significantly smaller than the total number of records in the table, the database will use the index.
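A minimal sketch of running that kind of query through JPA with bound parameters; the entity name Entity and a numeric id are assumptions:

import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;
import java.util.List;

// Entity and its id type are placeholders; offset/limit are the paging values from the question.
List<Entity> page(EntityManager entityManager, long offset, long limit) {
    TypedQuery<Entity> query = entityManager.createQuery(
            "SELECT e FROM Entity e WHERE e.id >= :from AND e.id < :to", Entity.class);
    query.setParameter("from", offset);
    query.setParameter("to", offset + limit);
    return query.getResultList();
}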
The next thing is that 40 columns is quite a lot. If you actually need significantly fewer for your purpose, you could define a restricted entity with just the attributes required and query for that one. This should take out some more overhead.
If you are still not within your performance requirements, you could choose a plain JDBC connection/query instead of Hibernate.
Btw., you could log the actual SQL issued by JPA/Hibernate and use it to get an execution plan from Postgres; this will show you what the query actually looks like and whether an index is utilized. Further, you could monitor the database's query execution times to get an idea of which fraction of the processing time is consumed by the database and which by your Java client plus data transfer overhead.
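As a rough sketch, assuming a plain JDBC connection to the same Postgres database (URL, credentials, and the SQL text are all placeholders), the logged statement can be fed to EXPLAIN ANALYZE:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Paste in the SQL Hibernate actually logged; everything here is a placeholder.
void printPlan() throws Exception {
    try (Connection con = DriverManager.getConnection("jdbc:postgresql://localhost/db", "user", "pass");
         Statement st = con.createStatement();
         ResultSet rs = st.executeQuery(
                 "EXPLAIN ANALYZE SELECT * FROM entity WHERE id >= 100000 AND id < 100100")) {
        while (rs.next()) {
            System.out.println(rs.getString(1)); // each row is one line of the execution plan
        }
    }
}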
There is also a technique to mimic offset+limit paging, using paging based on the key of the page's first record.
Map<Integer, String> mapPageTopRecNoToKey = new HashMap<>();
Then search for records >= the page's key and load page size + 1 records; the extra record gives you the key of the next page.
Going from page 1 to page 5 would take a bit more work but would still be fast.
This of course is a terrible kludge, but the technique at that time indeed was a speed improvement on some databases.
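A minimal sketch of that key-based paging, building on the map above and assuming a String key column; all names are illustrative:

import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeysetPager {
    // Remembers the first key of each page so visited pages can be re-read without an offset scan.
    private final Map<Integer, String> mapPageTopRecNoToKey = new HashMap<>();

    List<Entity> loadPage(EntityManager em, int pageNo, String pageKey, int pageSize) {
        TypedQuery<Entity> q = em.createQuery(
                "SELECT e FROM Entity e WHERE e.key >= :key ORDER BY e.key", Entity.class);
        q.setParameter("key", pageKey);
        List<Entity> rows = q.setMaxResults(pageSize + 1).getResultList(); // one extra row
        if (rows.size() > pageSize) {
            mapPageTopRecNoToKey.put(pageNo + 1, rows.get(pageSize).getKey()); // next page's key
            rows = rows.subList(0, pageSize);
        }
        return rows;
    }
}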
In your case it would also be worth specifying only the needed fields in JPQL: SELECT e.a, e.b is considerably faster than loading whole entities.
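For illustration, a hedged sketch of such a projection query; the attribute names a and b are placeholders, and each row comes back as an Object[]:

import javax.persistence.EntityManager;
import java.util.List;

// Only the two listed attributes are fetched per row, instead of all 40 columns.
List<Object[]> loadProjection(EntityManager em, long from, long to) {
    return em.createQuery(
            "SELECT e.a, e.b FROM Entity e WHERE e.id >= :from AND e.id < :to", Object[].class)
            .setParameter("from", from)
            .setParameter("to", to)
            .getResultList();
}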
I have a table in a database that is continuously being populated with new records that have to be simply sent to Elasticsearch.
Every 15 minutes the table accrues about 15,000 records. My assignment is to create a @Scheduled job that every 15 minutes gathers the unprocessed records and posts them to Elasticsearch.
My question is what is the most efficient way to do it? How to track unprocessed records efficiently?
My suggestion is to use the INSERTED_DATE column that is already in this table and, on each run, persist the last processed INSERTED_DATE in an auxiliary table. Nevertheless, it can happen that two or more records were inserted at the same instant but only one of them was processed. Surely there must be other corner cases that invalidate my approach.
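To make the idea concrete, a rough sketch of that watermark approach, assuming Spring's @Scheduled and JdbcTemplate; table and column names are placeholders, and the simultaneous-insert problem described above is not solved here:

import java.sql.Timestamp;
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;

public class ExportJob {

    private final JdbcTemplate jdbc;

    public ExportJob(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Scheduled(fixedDelay = 15 * 60 * 1000)
    public void export() {
        // Last watermark persisted in an auxiliary table (placeholder names).
        Timestamp lastProcessed = jdbc.queryForObject(
                "SELECT last_inserted_date FROM export_watermark", Timestamp.class);

        List<Map<String, Object>> batch = jdbc.queryForList(
                "SELECT * FROM source_table WHERE inserted_date > ? ORDER BY inserted_date",
                lastProcessed);

        // ... post the batch to Elasticsearch here ...

        if (!batch.isEmpty()) {
            Timestamp newWatermark = (Timestamp) batch.get(batch.size() - 1).get("INSERTED_DATE");
            jdbc.update("UPDATE export_watermark SET last_inserted_date = ?", newWatermark);
        }
    }
}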
Could you share any thoughts about it? To me it looks like a typical problem for a data-intensive application, but I am facing it in real life for the first time.
Suppose I execute one select query (HQL) and it gives 100K rows as a result. I would like to know if there is a way to load, in Java (or any other language), those 100K rows in 1K chunks after the query has finished executing.
The reason for breaking it into chunks is that I do not know where exactly those 100K results will be stored while I perform processing on them in Java, and I would like to keep memory consumption low.
1. execute the query (Hibernate Criteria with HQL) (suppose 100K row results)
2. pick the first 1K of them (without loading the other 99K into JVM memory or anywhere else, like lazy loading in Hibernate)
3. process
4. pick the next 1K
5. repeat from (2)
Update: I do not want to hit the query again.
Either I am not able to understand any of the answers, or you people aren't able to understand my question.
First, split up your query into two queries.
The first one gets the count of the result set by changing the SELECT part to something like SELECT COUNT(o) FROM Object o.
The second is your existing query, without changes.
Then first run the count query and request a single result. It will directly be a Long value with the result size.
Then calculate your iterations:
long pages = (long) Math.ceil(count / 1000.0);
Last but not least, iterate over the calculated pages and fire your query, setting the offset and limit before getting the result.
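A minimal sketch of that count-then-page loop with JPA/Hibernate; MyEntity and the chunk size of 1000 are assumptions standing in for the real query:

import javax.persistence.EntityManager;
import java.util.List;

// MyEntity is a placeholder; order by a unique column so pages do not overlap.
void processInChunks(EntityManager em) {
    int chunkSize = 1000;
    long count = em.createQuery("SELECT COUNT(o) FROM MyEntity o", Long.class).getSingleResult();
    long pages = (long) Math.ceil(count / (double) chunkSize);

    for (int page = 0; page < pages; page++) {
        List<MyEntity> chunk = em
                .createQuery("SELECT o FROM MyEntity o ORDER BY o.id", MyEntity.class)
                .setFirstResult(page * chunkSize)
                .setMaxResults(chunkSize)
                .getResultList();
        // process(chunk);  // work on these 1K rows, then let them go out of scope
    }
}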
I have data being collected every 1 sec and stored in HSQLDB.
I need aggregated data (per 15 sec, 1 min, etc.) for each metric in the collected data.
What is the best approach to calculating the aggregated values, and when should they be stored in the DB?
Should I calculate the values online and store them in the DB every 15 sec? Or maybe query the DB for the latest results and calculate the aggregation on them? Should I use the small aggregation (15 sec) to calculate the larger one (1 min)?
Are there free Java tools for this?
From previous experience, I would suggest using a real-time database, probably a non-relational one with a built-in ability to deal with time series. That way, you should be able to avoid storing calculated aggregate data. With a relational database, you will quickly end up with millions of rows that are difficult to manage and slow to access. Your other option is to denormalize your data and store every hour of data in a single row, in a BLOB column (in binary format).
You can use HSQLDB in MVCC mode for concurrent reads and writes.
Provided the table for the raw data has an indexed timestamp column, aggregate calculation over a range is very fast with a SELECT statement. Because SELECT statements with aggregate calculations can run concurrently, you can use separate threads to perform the operation every 1 second and every 15 seconds.
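As a small illustration of such an aggregate SELECT from Java over JDBC; the JDBC URL, credentials, and the metrics/metric_name/metric_value/ts names are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

// Averages one metric over the last 15 seconds; assumes an index on the ts column.
double lastFifteenSecondAverage(String jdbcUrl, String metric) throws Exception {
    try (Connection con = DriverManager.getConnection(jdbcUrl, "SA", "");
         PreparedStatement ps = con.prepareStatement(
                 "SELECT AVG(metric_value) FROM metrics WHERE metric_name = ? AND ts >= ?")) {
        ps.setString(1, metric);
        ps.setTimestamp(2, new Timestamp(System.currentTimeMillis() - 15_000));
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getDouble(1);
        }
    }
}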
In this Oracle Java tutorial, it says:
TYPE_FORWARD_ONLY: The result set cannot be scrolled; its cursor moves forward only, from before the first row to after the last row. The rows contained in the result set depend on how the underlying database generates the results. That is, it contains the rows that satisfy the query at either the time the query is executed or as the rows are retrieved.
"The rows contained in the result set depend on how the underlying database generates the results."
What's the difference between the query execution time and the row retrieval time?
And how can I know which behavior my database supports?
Thanks in advance.
It's the difference between eager and lazy loading. I'd recommend researching those terms.
Eager loading means all the results are made available at once. It could require a great deal of time and memory if the set is large.
Lazy loading doles out results as needed. It's along the lines of what Google does when you search for pages: they'll find millions, but only return them 25 at a time, with higher ranks first.
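As a hedged JDBC illustration of the lazy side (table name, fetch size, and connection details are placeholders): a TYPE_FORWARD_ONLY statement combined with a fetch-size hint lets the driver pull rows in batches as the cursor advances, rather than materializing the whole result up front.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// setFetchSize is only a hint; drivers differ in whether and how they honor it.
void streamRows(String url, String user, String pass) throws Exception {
    try (Connection con = DriverManager.getConnection(url, user, pass);
         Statement st = con.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        st.setFetchSize(100); // ask the driver to retrieve rows roughly 100 at a time
        try (ResultSet rs = st.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // process one row at a time; already-read rows can be discarded
            }
        }
    }
}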