How do I tell Postgres to load small tables into memory? - java

Let's say I have a small Postgres database (< 500 MB), and an app that is very read-intensive: 99% of requests are reads.
Is there a way to tell Postgres to load all tables into RAM so it can do selects faster? I think Oracle and SQL Server have that kind of functionality.
I have done some tests on my local machine with a table of 500 records: looping 100,000 times, the Java HashMap lookups took 2 ms in total, while the equivalent SQL selects took 12,000 ms.
Obviously the Java HashMap is faster because it lives in the same process, but is there a way to speed up SQL queries against small tables in Postgres? Thanks
for (int i = 0; i < 100_000; i++) {
    // 1) select * from someTable where id = 10
    // 2) get from Java HashMap by key
}
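For context, a minimal sketch of what the SQL side of that loop might look like over JDBC (the connection URL, credentials and table are assumptions); even with the statement prepared once, each iteration is still a full client-server round trip, which is likely where most of the 12,000 ms goes:
import java.sql.*;

public class LookupBenchmark {
    public static void main(String[] args) throws Exception {
        // Assumed local connection details; adjust to your environment.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/testdb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "select * from someTable where id = ?")) {
            long start = System.nanoTime();
            for (int i = 0; i < 100_000; i++) {
                ps.setInt(1, 10);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) { /* read columns */ }
                }
            }
            System.out.println("SQL: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}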

PostgreSQL caches data in RAM automatically. All you have to do is set shared_buffers to be at least as big as your database; then all data has to be read from disk only once and will stay cached in RAM.
If you want to reduce the initial slowness caused by reading the data from disk, set shared_preload_libraries = 'pg_prewarm,pg_stat_statements'.
I suspect that the excessive slowness you report is not slowness of the database, but network lag, per-query round-trip overhead or inefficient queries. pg_stat_statements will help you profile your database workload.
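A minimal sketch of the settings and commands involved (the sizes and table name are assumptions; changing shared_buffers and shared_preload_libraries requires a server restart):
# postgresql.conf (illustrative values)
shared_buffers = 1GB                     # comfortably larger than the ~500 MB database
shared_preload_libraries = 'pg_prewarm,pg_stat_statements'

-- after the restart, warm the cache and enable query statistics
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT pg_prewarm('someTable');          -- loads the table's pages into shared_buffers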

Related

MySql slow with concurrent queries

I'm running queries in parallel against a MySQL database. Each query takes less than a second to run and another half a second to a second to fetch the results.
This is acceptable for me. But when I run 10 of these queries in parallel and then attempt another set in a different session, everything slows down and a single query can take 20-plus seconds.
My ORM is Hibernate and I'm using C3P0 with <property name="hibernate.c3p0.max_size">20</property>. I'm sending the queries in parallel using Java threads, but I don't think that's related, because the slowdown also happens when I run the queries in MySQL Workbench. So I'm assuming something in my MySQL config is missing, or the machine is not fast enough.
This is the query:
select *
from schema.table
where site = 'sitename'
  and (description like '% family %' or title like '% family %')
limit 100 offset 0;
How can I make this go faster when facing, let's say, 100 concurrent queries?
I'm guessing that this is slow because the where clause does a leading-wildcard search ('% family %') on the description and title columns; that cannot use an ordinary index, so the database has to scan the entire field on every record, and that's never going to scale.
Each of those 10 concurrent queries must read the 1 million rows to fulfill the query. If you have a bottleneck anywhere in the system - disk I/O, memory, CPU - you may not hit that bottleneck with a single query, but you do hit it with 10 concurrent queries. A monitoring or profiling tool will tell you which bottleneck you're hitting.
Most of the time, those bottlenecks (CPU, memory, disk) are too expensive to upgrade - especially if you need to scale to 100 concurrent queries. So it's better to optimize the query/ORM approach.
I'd consider using Hibernate Search (Hibernate's full-text search integration) here - it requires some additional configuration, but works MUCH better when looking for arbitrary strings in a textual field. A database-level full-text index is another option, sketched below.
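As a hedged sketch of that database-level alternative (it assumes a MySQL version and storage engine with FULLTEXT support; the index name is made up, the table and column names come from the query above), a full-text index lets the search use an index instead of scanning every row:
-- one-time setup
ALTER TABLE schema.table ADD FULLTEXT INDEX ft_desc_title (description, title);
// querying via JDBC; MATCH ... AGAINST uses the full-text index instead of LIKE '%...%'
PreparedStatement ps = connection.prepareStatement(
        "SELECT * FROM schema.table " +
        "WHERE site = ? AND MATCH(description, title) AGAINST (?) LIMIT 100");
ps.setString(1, "sitename");
ps.setString(2, "family");
ResultSet rs = ps.executeQuery();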

Read 3 million records with hibernate

I'm a noob in Hibernate and I have to read 2 million records from a DB2 z/OS database with Hibernate in Java (over JDBC).
My problem is that I run out of memory (OutOfMemoryError) after about 150,000 records.
I've heard about batching etc., but the solutions I find are only about inserting new records. What I want to do is read these records into an ArrayList for further usage.
So I'm actually just selecting one column of the table to reduce the amount of data:
getEntityManager().createQuery("select t.myNumber from myTable t").getResultList();
It would also be interesting to know whether there is a better way to read such a huge number of records (maybe without Hibernate?).
The following is one way to do batch processing with Hibernate. Keep in mind this is not 100% tested; it's pseudo-logic:
int i = 0;
int batch = 100;
// note: for deterministic paging the query should really have an ORDER BY clause
List<?> numList = getEntityManager()
        .createQuery("select t.myNumber from myTable t")
        .setFirstResult(i).setMaxResults(batch)
        .getResultList();
while (!numList.isEmpty()) {
    // process numList here
    if (numList.size() < batch) break;   // last (partial) page has been processed
    i += batch;
    numList = getEntityManager()
            .createQuery("select t.myNumber from myTable t")
            .setFirstResult(i).setMaxResults(batch)
            .getResultList();
}
See the Hibernate documentation for setFirstResult() and setMaxResults().
A better approach is to use a StatelessSession (which bypasses Hibernate's caches and dirty checking) together with ScrollableResults in a forward-only scroll:
StatelessSession statelessSession = sessionFactory.openStatelessSession(connection);
try {
    ScrollableResults scrollableResults = statelessSession
            .createQuery("from Entity")
            .scroll(ScrollMode.FORWARD_ONLY);
    int count = 0;
    while (scrollableResults.next()) {
        if (++count % 100 == 0) {
            System.out.println("Fetched " + count + " entities");
        }
        Entity entity = (Entity) scrollableResults.get()[0];
        // process and write the result; nothing is accumulated in memory
    }
    scrollableResults.close();
} finally {
    statelessSession.close();
}
You should not load all records into memory but process them in batches, e.g. loop over 1000 records at a time using
createQuery(...).setFirstResult(i*1000).setMaxResults(1000);
You have found the upper limit of your heap. Have a look here to learn how to properly size your heap:
Increase heap size in Java
However, I cannot imagine why you would need to have a List of 3 million records in memory. Perhaps with more information we could find an alternative solution to your algorithm?
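For reference, a minimal, hedged example of raising the JVM heap limit (the sizes and jar name are purely illustrative; size the heap to your actual workload):
java -Xms512m -Xmx4g -jar yourApp.jar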
Yes, of course.
You could use Apache Hadoop for a large project like this. It is open-source software for reliable, scalable, distributed computing, designed to scale from single servers up to thousands of machines.
Apache Hadoop
This is basically a design question for the problem you are working on. Forget Hibernate; even if you do the same thing in plain JDBC, you will hit the memory issue, maybe a bit later. Loading such huge data and keeping it in memory is not suitable for applications requiring short request-response cycles, and it is not good for scalability either. As others have suggested, you can try batching or paging behaviour, or if you want to be more exotic you can try parallel processing via a distributed data grid (like Infinispan) or a map-reduce framework such as Hadoop.
Going by the description of your problem it seems that you need to keep the data around in memory.
If you must keep the huge data around in memory, then you can query the data in batches and keep storing it in a distributed cache (like Infinispan) which can span multiple JVMs on a single machine, or multiple machines forming a cluster. This way your data will reside partially on each node; here Infinispan can be used as a distributed cache.
There are frameworks like Spring Batch that take the route of solving such problems by dividing the work into chunks (batches) and then processing them one by one. Spring Batch even has built-in JPA-based readers and writers that perform this work in batches; a small sketch of such a reader follows below.
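As a hedged, Spring Batch 4.x-style sketch only (MyEntity and the reader name are assumptions), a paging reader like this pulls rows in fixed-size pages instead of materializing millions of records at once:
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaPagingItemReader;
import org.springframework.batch.item.database.builder.JpaPagingItemReaderBuilder;

public class MyNumberReaderFactory {
    // Pages through the JPA query 1000 rows at a time.
    public static JpaPagingItemReader<MyEntity> create(EntityManagerFactory emf) {
        return new JpaPagingItemReaderBuilder<MyEntity>()
                .name("myNumberReader")
                .entityManagerFactory(emf)
                .queryString("select t from MyEntity t order by t.id")
                .pageSize(1000)
                .build();
    }
}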

JDBC Select batching/fetch-size with MySQL

I have a Java application using JDBC that runs once per day on my server and interacts with a MySQL database (v5.5) also running on the same server. The app is querying for and iterating through all rows in a table.
The table is reasonably small at the moment (~5,000 rows) but will continue to grow indefinitely. My server's memory is limited and I don't like the idea of the app's memory consumption being indeterminate.
If I use statement.setFetchSize(n) prior to running the query, it's unclear to me what is happening here. For example, if I use something like:
PreparedStatement query = db.prepareStatement("SELECT x, y FROM z");
query.setFetchSize(n);
ResultSet result = query.executeQuery();
while (result.next()) {
    // ...
}
Is this how to appropriately control potentially large select queries? What's happening here? If n is 1000, then will MySQL only pull 1000 rows into memory at a time (knowing where it left off) and then grab the next 1000 (or however many) rows each time it needs to?
Edit:
It's clear to me now that setting the fetch size is not useful for me. Remember that my application and the MySQL server are both running on the same machine, so if MySQL pulls the entire query result into memory, that affects the app too, since they share the same physical memory.
The MySQL Connector/J driver will fetch all rows, unless the fetch size is set to Integer.MIN_VALUE (in which case it will fetch one row at a time AFAIK). See the MySQL Connector/J JDBC API Implementation Notes under ResultSet.
If you expect memory usage to become a problem (or when it actually becomes a problem), you could also implement paging using the LIMIT clause (instead of using setFetchSize(Integer.MIN_VALUE)).
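A minimal sketch of the streaming variant (connection details are assumptions; Connector/J only streams when the statement is forward-only, read-only and the fetch size is Integer.MIN_VALUE):
import java.sql.*;

public class StreamRows {
    public static void main(String[] args) throws SQLException {
        try (Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/mydb", "user", "password");
             PreparedStatement query = db.prepareStatement(
                     "SELECT x, y FROM z",
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            query.setFetchSize(Integer.MIN_VALUE); // tell Connector/J to stream row by row
            try (ResultSet result = query.executeQuery()) {
                while (result.next()) {
                    // process result.getInt("x"), result.getString("y"), ...
                }
            }
        }
    }
}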

Large table in sqlserver

I have a web application that uses SQL Server. In the database I have one big table (3 GB); the whole database is about 4 GB. The problem is that queries against another table (not the big one) are sometimes very slow. Sometimes a query takes a few seconds, but sometimes the same query takes several minutes.
My question is: can one big table slow down queries against another table?
Because I am using SQL Server 2008 Express Edition, with its limitation of 1 GB RAM and 10 GB database size, could that be the problem? Would changing the SQL Server edition solve my problem? There are about 50 users using the application at any given time.
In general, the simple existence of table A should not have any effect on queries against table B that do not involve table A. That said, if application X is querying table B while application Y is querying table A, and the query against A takes a lot of work, then that can slow down the query against table B, because the server only has so much capacity.
I can think of ways in which the existence of a large table could slow down queries against another table. For example, if the disk is fragmented with parts of small table B, then big table A, then more of small table B, any access to B has to seek across larger sections of the disk. But I doubt this would be a big issue.
I suppose there could be background processes, like accumulating table statistics for the optimizer, that would kick in on the big table just as you are running a query against the little table. Maybe someone with more knowledge of the internals could weigh in on a question like that.
It could be RAM-related: SQL Server caches data in RAM, and if a very large table gets cached, this could come at the expense of other tables not being cached and therefore being slower to access from disk.
This is just a theory, but you might want to try out the queries in
https://www.mssqltips.com/sqlservertip/2393/determine-sql-server-memory-use-by-database-and-object/
How much RAM have you allocated to SQL Server, and how many other databases and workloads are running on the machine?
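For reference, a hedged sketch of the kind of query those articles use to see which databases occupy the buffer pool (it assumes the standard sys.dm_os_buffer_descriptors DMV and 8 KB pages):
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024   AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_mb DESC;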

Splitting Long running SQL query into multiple smaller queries

I am using SQL Server 2008 and Java 6 / Spring jdbc.
We have a table with a record count of ~60 million.
We need to load this entire table into memory, but running select * on the whole table takes hours to complete.
So I am splitting the query as below:
String query = "select * from TABLE where ";
for (int i = 0; i < 10; i++) {
    StringBuilder builder = new StringBuilder(query)
            .append(" (sk_table_id % 10) = ").append(i);
    service.submit(new ParallelCacheBuilder(builder.toString(), namedParameters, jdbcTemplate));
}
Basically, I am splitting the query by adding a where condition on the primary key column;
the snippet above splits the query into 10 queries running in parallel, using Java's ExecutorCompletionService.
I am not a SQL expert, but I guess the queries above will each have to read the same data before applying the modulo operator to the primary key column.
Is this a good / bad / best / worst way? Is there any other way? Please post.
Thanks in advance!!!
If you do need all 60M records in memory, select * from ... is the fastest approach. Yes, it's a full scan; there's no way around that. It's disk-bound, so multithreading won't help you. Not having enough memory available (swapping) will kill performance instantly, and data structures that take significant time to grow will hamper performance, too.
Open the Task Manager and see how much CPU is being used; probably little. If not, profile your code or just comment out everything except the reading loop. Or maybe the bottleneck is the network between the SQL server and your machine.
Maybe SQL Server can offload data faster to an external dump file in a known format through some internal pathway (Oracle can, for example). I'd explore the possibility of dumping the table into a file and then parsing that file in your code; it could be faster, for example because it won't interfere with other queries that the SQL server is serving at the same time.
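If you do keep the split-query approach from the question, a hedged variant worth considering (it assumes sk_table_id is a reasonably dense, indexed key and that jdbcTemplate is a plain Spring JdbcTemplate; the bounds logic is illustrative only) is to partition by contiguous ranges instead of modulo, so each worker can do an index range scan rather than its own full scan:
long min = jdbcTemplate.queryForObject("select min(sk_table_id) from TABLE", Long.class);
long max = jdbcTemplate.queryForObject("select max(sk_table_id) from TABLE", Long.class);
long chunk = (max - min) / 10 + 1;
for (int i = 0; i < 10; i++) {
    long lo = min + i * chunk;
    long hi = lo + chunk - 1;
    // each worker scans one contiguous slice of the key range
    String rangeQuery = "select * from TABLE where sk_table_id between " + lo + " and " + hi;
    service.submit(new ParallelCacheBuilder(rangeQuery, namedParameters, jdbcTemplate));
}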
