Is concurrency automatically handled in SQLite? - java

On my server, in order to speed things up, I have allocated a connection pool to my SQLite ODBC source.
What happens if two or more hosts want to alter my data?
Are these multiple connections automatically handled by SQLite?

You can read this thread
If most of those concurrent accesses are reads (e.g. SELECT), SQLite can handle them very well. But if you start writing concurrently, lock contention could become an issue. A lot would then depend on how fast your filesystem is, since the SQLite engine itself is extremely fast and has many clever optimizations to minimize contention. Especially SQLite 3.
For most desktop/laptop/tablet/phone applications, SQLite is fast enough as there's not enough concurrency. (Firefox uses SQLite extensively for bookmarks, history, etc.)
For server applications, somebody some time ago said that anything less than 100K page views a day could be handled perfectly by a SQLite database in typical scenarios (e.g. blogs, forums), and I have yet to see any evidence to the contrary. In fact, with modern disks and processors, 95% of web sites and web services would work just fine with SQLite.
If you want really fast read/write access, use an in-memory SQLite database. RAM is several orders of magnitude faster than disk.
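As an illustration of the in-memory option from Java, here is a minimal sketch assuming the Xerial sqlite-jdbc driver; the table and data are just placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class InMemorySqliteDemo {
        public static void main(String[] args) throws Exception {
            // ":memory:" keeps the whole database in RAM; it disappears when this connection closes
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("CREATE TABLE hits (page TEXT, views INTEGER)");
                stmt.executeUpdate("INSERT INTO hits VALUES ('home', 1)");
                try (ResultSet rs = stmt.executeQuery("SELECT page, views FROM hits")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("page") + " -> " + rs.getInt("views"));
                    }
                }
            }
        }
    }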
And check this

In short: it is not a good solution.
Description:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time.
For your situation it is not good.
Advice: Use another RDBMS.

Related

Load existing SQLite database to memory

I have an existing database in a file. I want to load the database into memory to speed up my queries, because I'm doing a lot of them and the database isn't very large (<50 MB). Is there any way to do this?
50 MB easily fits in the OS file cache; you do not need to do anything.
If the file locking results in a noticeable overhead (which is unlikely), consider using the exclusive locking mode.
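For instance, the locking mode can be switched per connection with a PRAGMA; a minimal sketch assuming the Xerial sqlite-jdbc driver and a hypothetical app.db file:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ExclusiveLockDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db");
                 Statement stmt = conn.createStatement()) {
                // Once the lock is first acquired, SQLite keeps it instead of re-acquiring
                // and releasing it around every transaction on this connection.
                stmt.execute("PRAGMA locking_mode = EXCLUSIVE");
                // ... run the read-heavy queries here ...
            }
        }
    }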
You could create a RAM drive and have the database use those files instead of your HDD/SSD-hosted files. If you have insane performance requirements, you could go for an in-memory database as well.
Before you go for any in-memory solution: what is "a lot of queries" and what is the expected response time per query? Chances are that the database program isn't the performance bottleneck, but rather slow application code or inefficient queries / lack of indexes / ...
I think SQLite does not support concurrent write access to the database, which would cost a lot of performance. If writes occur rather infrequently, you could boost your performance by keeping copies of the database and having different threads read different SQLite instances (never tried that).
Neither of the solutions suggested by CL and Ray will perform as well as a true in-memory database, due to the simple fact of file system overhead (irrespective of whether the data is cached and/or on a RAM drive; those measures will help, but you can't beat getting the file system out of the way entirely).
SQLite allows multiple concurrent readers, but any write transaction will block readers until it is complete.
SQLite only allows a single process to use an in-memory database, though that process can have multiple threads.
You can't load (open) a persistent SQLite database as an in-memory database (at least, the last time I looked into it). You'll have to create a second in-memory database and read from the persistent database to load the in-memory database. But if the database is only 50 MB, that shouldn't be an issue. There are 3rd party tools that will then let you save that in-memory SQLite database and subsequently reload it.
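One way to do the copy yourself, sketched below under the assumption of the Xerial sqlite-jdbc driver and a placeholder table called items, is to ATTACH the on-disk file to an in-memory connection and copy the tables across (note that CREATE TABLE ... AS SELECT does not copy indexes, so recreate any you need):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LoadIntoMemory {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
                 Statement stmt = conn.createStatement()) {
                // Attach the on-disk database under the alias "disk"
                stmt.execute("ATTACH DATABASE 'mydata.db' AS disk");
                // Copy each table you need; repeat per table, then recreate indexes separately
                stmt.executeUpdate("CREATE TABLE items AS SELECT * FROM disk.items");
                stmt.execute("DETACH DATABASE disk");
                // From here on, queries against "items" run entirely in RAM on this connection
            }
        }
    }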

Threads on Multiple VMs accessing a table on single Instance of DB causing low performance and Exceptions occasionally

The application is hosted on multiple Virtual Machines and the DB is on a single server. All VMs point to a single instance of the DB.
In this architecture, I have a table with very few records. But this table is accessed and updated very heavily by threads running on the VMs. This is causing a performance bottleneck and sometimes record-level exceptions. Database-level locking does not seem to be the best option, as it introduces significant delays in request processing.
Please suggest if there is any other technique to solve this problem.
A few questions first!
Is your application using connection pooling? If not, please use it. Creating a JDBC connection is expensive! (See the pooling sketch after this list.)
Is your application read-heavy or write-heavy?
What kind of storage engine are you using for your MySQL tables, InnoDB or MyISAM? If your application is write-heavy, please use InnoDB-based tables, as InnoDB uses row-level locking and will serve concurrent requests better.
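As a rough sketch of the pooling point above, here is how a pool could be set up with HikariCP; the JDBC URL, credentials, and pool size are placeholders to adapt to your environment:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;
    import java.sql.Connection;

    public class PoolSetup {
        public static void main(String[] args) throws Exception {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:mysql://dbserver:3306/appdb"); // placeholder host and schema
            config.setUsername("app");
            config.setPassword("secret");
            config.setMaximumPoolSize(20); // size to your VM count and workload

            try (HikariDataSource pool = new HikariDataSource(config)) {
                // Borrowing from the pool reuses an existing physical connection
                try (Connection conn = pool.getConnection()) {
                    // run queries; closing here only returns the connection to the pool
                }
            }
        }
    }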
One special case - if you are implementing queues on top of database tables, find a database that has a built-in queue operation and use that, or use a reliable messaging service. Building queues on top of databases is typically not efficient. See e.g. http://mikehadlow.blogspot.co.uk/2012/04/database-as-queue-anti-pattern.html
In general, running transactions on databases is slow because at the end of each transaction the database needs to be sure that enough has been saved out to disk that if the system died right now the changes made by the transaction would be safely preserved. If you don't need this you might find it faster to write a single non-database application that does what the database does but doesn't write anything out to disk, or still does database IO but does the minimum possible. Then instead of all of the VMs talking to the database directly they would all talk to this application.

MySQL data lookup?

I am very new to MySQL, but I have some experience with OOP in Java. When you have a "database" in memory (as when programming in Java), the speed of lookups depends on whether it is stored as a linked list, hashmap, etc. A MySQL database is stored on the hard disk, as it is a file. What is the running time of a lookup in that case? Is there a way to get constant-time access to an item on the hard disk like you do when data is in memory? Does it depend on whether it's a hard disk or an SSD?
Thanks!
MySQL performance is a large and complicated topic. You can find many books on the topic and no answer here can hope to touch on all the important issues.
From the perspective of data base design, the main steps (not necessarily in order) are to design a good schema, define the right kind of indexes to support the queries you expect to process, pick the right engine for each table, and design the queries themselves for best performance. The MySQL Query Analyzer is a tool for addressing these issues, as is the EXPLAIN statement.
In terms of server configuration, there's a huge amount of tuning that goes into improving performance. As the MySQL manual topic Estimating Query Performance describes, the fundamental unit of performance is a disk seek, and theoretically this is log N in the number of rows. However, both the underlying OS and the MySQL server may do quite a bit of caching, which can make the actual performance grow much more slowly in practice (not to mention making it faster in absolute terms). Follow the links from the above page, search for some server configuration guides on the web, and/or read a book or two to get some insight into how to use all the performance-related parameters that MySQL provides.
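As a small illustration of the EXPLAIN-and-index workflow from Java, here is a hedged sketch with a hypothetical orders table and customer_id column; the connection details are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/appdb", "app", "secret");
                 Statement stmt = conn.createStatement()) {
                // Ask MySQL how it would run the query; a full table scan shows up as type=ALL
                try (ResultSet rs = stmt.executeQuery(
                        "EXPLAIN SELECT * FROM orders WHERE customer_id = 42")) {
                    while (rs.next()) {
                        System.out.println("access type=" + rs.getString("type")
                                + ", estimated rows=" + rs.getString("rows"));
                    }
                }
                // If the plan shows a scan, an index turns the lookup into a B-tree search (roughly log N)
                stmt.executeUpdate("CREATE INDEX idx_orders_customer ON orders (customer_id)");
            }
        }
    }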

Fast Oracle Select [Huge Data]

I have a project whereby I'm reading huge volumes of data from an Oracle database from Java.
I have the feeling that the application we are writing is going to process the data far faster than it will be given to us using a single-threaded SELECT query, so I've been trying to research faster ways of obtaining the data.
Does anyone have anything I could read that would help me with my plight?
You haven't given us a lot of information on why it is necessary to bring "huge volumes of data" into the Java application instead of processing it on the database side. Although there can be exceptions, usually this is a signal to re-think the design. As a general rule with Oracle, it is most efficient to do as much work as you can with pure set operations (SQL), followed by procedural processing within the RDBMS engine (PL/SQL), before bringing results back to the client application.
Oracle supports parallel DML. In particular, this applies to SELECT queries. Ultimately the bottleneck will probably be the IO read speed. Either use faster disks or stripe the data across many disks.
Update
As APC noted in the comments, Parallel Query/DML is an Enterprise Edition feature and is not available in the Standard Edition.
Also, Parallel DML/Query is not the solution to all performance problems. Since more than one process will be used by the query, it may improve throughput, but at the cost of concurrency. The purpose of parallelism is to use more resources to process the query faster. If the query is IO-bound or CPU-bound, there are no extra resources to use, and adding parallelism will only make matters worse.
From the link above:
Parallel execution is not normally useful for: environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution will not yield any benefits and indeed may be detrimental to performance.
Use the setFetchSize(int) method on the Statement or PreparedStatement before you open the query. You should experiment with different sizes. Try 75 as a starting point.
On a slightly different usage, people have said that the PL/SQL bulk fetch "sweet spot" is between 2000 and 3000, but I saw one benchmark that indicated that 75 was optimal.
A large fetch size will tend to reduce the number of round trips between client and server. But if it is too large the database has to have a big buffer and the networking software may have to break up the big message into a lot of packets.
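To make the fetch-size advice concrete, here is a sketch against a hypothetical Oracle table (big_table); the connect string and credentials are placeholders, and the only Oracle-specific detail relied on is that the driver's default fetch size is 10:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class FetchSizeDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, payload FROM big_table")) {
                // Default is 10 rows per round trip; a larger value cuts client/server round trips
                ps.setFetchSize(75);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        process(rs.getLong("id"), rs.getString("payload"));
                    }
                }
            }
        }

        private static void process(long id, String payload) {
            // application-specific work on each row goes here
        }
    }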
Firstly, 'huge data' to database people is [at least] gigabytes, in which case I suspect your problems are going to be reading those sorts of volumes into your process's memory and aggregating them there. Why do you think a single-threaded select will be the bottleneck?
If the bottleneck were getting the data from disk, then having multiple threads pulling data from the same disk wouldn't necessarily be faster and may even be slower. But if you could spread the data over separate disks, separate threads would be faster. If, using SSDs, you don't think disks will be a contention point, we can look elsewhere.
If the bottleneck was network bandwidth, again multiple threads wouldn't fit any more data through the pipe any faster. You may even benefit from unloading the data to a flat file, compressing it and transferring that.
If the select is being sorted or comes from a hash-join, you may use memory more efficiently with a single thread. Multiple sessions would have to share the machine's memory.
If there is CPU-intensive processing, then multiple threads may help. That could be as simple as having multiple connections from Java, each getting a different slice of data (e.g. A-K and L-Z), but it would very much depend on the SELECT.
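A sketch of that slicing idea, with a hypothetical customers table and placeholder connect string; each task gets its own connection and a disjoint key range:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SlicedReader {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Two slices of the (hypothetical) name column, read in parallel
            pool.submit(() -> readSlice("name >= 'A' AND name < 'L'"));
            pool.submit(() -> readSlice("name >= 'L'"));
            pool.shutdown();
        }

        private static void readSlice(String predicate) {
            String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder connect string
            try (Connection conn = DriverManager.getConnection(url, "app", "secret");
                 Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(500);
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT name, balance FROM customers WHERE " + predicate)) {
                    while (rs.next()) {
                        // process each row; keep per-row work cheap so the readers stay busy
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }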
I agree with dpbradley that you should determine the bottleneck first. If you have the data and the select, it should be simple enough to determine how long it takes (both on the local machine and through the network), and a trace would be a necessary starting point to really dig into how it could be sped up.

Terracotta + Compass = Hibernate + HSQLDB + JMS?

I am currently in need of a high performance java storage mechanism.
This means:
1) I have 10,000+ objects with 1 - Many Relationship.
2) The objects are updated every 5 seconds, with the most recent updates persisted in case of system failure.
3) The objects need to be queryable in a reasonable time (1-5 seconds). (I.e., give me all of the objects with this timestamp, or give me all of the objects within these location boundaries.)
4) The objects need to be available across various Glassfish installs.
Currently:
I have been using JMS to distribute the objects, Hibernate as an ORM, and HSQLDB to provide the needed recoverability.
I am not exactly happy with the performance. Especially the JMS part of this.
After doing some Stack Overflow research, I am wondering if this would be a better solution. Keep in mind that I have no experience with what Terracotta gives me.
I would use Terracotta to distribute objects around the system, and something else would need to provide the ability to "query" attributes of those objects.
Does this sound reasonable? Would it meet these performance constraints? What other solutions should I consider?
I know it's not what you asked, but you may want to start by switching from HSQLDB to H2. H2 is a relatively new, pure-Java DB. It is written by the same guy who wrote HSQLDB, and he claims the performance is much better. I've been using it for some time now and I'm very happy with it. It should be a very quick transition (add a jar, change the connection string, create the database), so it's worth a shot.
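For reference, the switch is mostly a matter of the driver jar and the JDBC URL; a minimal sketch with placeholder file paths and each engine's default credentials:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SwitchToH2 {
        public static void main(String[] args) throws Exception {
            // Before: embedded HSQLDB file database (default SA user, empty password)
            Connection hsqldb = DriverManager.getConnection("jdbc:hsqldb:file:data/appdb", "SA", "");
            hsqldb.close();

            // After: same JDBC code, different driver jar and URL - embedded H2 file database
            Connection h2 = DriverManager.getConnection("jdbc:h2:./data/appdb", "sa", "");
            h2.close();
        }
    }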
In general, I believe in trying to get the most of what I have before rewriting the application in a different architecture. Try profiling it to identify the bottleneck first.
First off, Lucene isn't your friend here (it is effectively read-only).
Terracotta is for scaling at the logical layer! Your problem doesn't seem to be related to the processing logic; it's more around the storage/communication point.
Identify your bottleneck! Benchmark the Storage/Logic/JMS processing time and overhead!
Kill JMS issues with a good JMS framework (e.g. ActiveMQ) and a good, tuned configuration.
Maybe a distributed key=>value store is your friend. Try Project Voldemort!
If you'd like to stay with Hibernate and HSQL, check out the Hibernate 2nd-level cache and connection pooling (c3p0, container-driven, ...)!
Several Terracotta users have built systems like this in the past, so I can tell you by proof of existence that it can be done. :)
Compass does have support for clustering with Terracotta so that might help you. I suspect you might get further faster by just being careful with how you create your clustered data structures.
Regarding your requirements and Terracotta:
1) 10k objects is quite small from a Terracotta perspective
2) A 5 sec update rate doesn't seem like an issue. It might depend on how many nodes there are and whether there is any natural partitioning you can take advantage of. All updates will be persistent.
3) 1-5 second query time seems quite easy. Building your own well-organized data structures for lookup is the tricky part. Obviously you want to avoid scanning all the data.
4) Terracotta currently supports Glassfish v1 and v2.
If you post on the Terracotta forums, you could probably get more Terracotta eyeballs on the problem.
I am currently working on writing the client for a very (very) fast key/value distributed hash DB that provides set + list semantics. The DB is C99 and requires GCC, and right now I'm battling with good old Java network IO to break my current barrier of 30,000 gets/sets per second. Hope to be done within the week. Drop me a line through my account and I'll get back when it's show time.
With such a high update rate, Lucene is almost definitely not what you're looking for, since there is no way to update a document once it's indexed. You'd have to keep all the object versions in the index and select the one with the latest time stamp, which will kill your performance.
I'm no DB expert, but I think you should look into any one of the distributed DB solutions that's been on the news lately. (CouchDB, Cassandra)
Maybe you should take a look at Prevayler.
Your objects are always in mem.
The "changes" to your objects are persisted.
From time to time you are able to take a snapshot: every object is persisted.
You don't say which vendor you are using for JMS, but it wouldn't surprise me if you have a bottleneck there. I couldn't get more than 100 messages a second from ActiveMQ, and whatever I tried in terms of configuring acknowledgement, queue size, etc., we were unable to soak the CPU beyond a few percent.
The solution was to batch many queries into one JMS message. We had a simple class that either sent a batch of messages when it got to 200 queries or reached a timeout (we used 20ms), which gave us a dramatic increase in message throughput.
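A simplified sketch of that kind of batching class, using the javax.jms 1.x API with placeholder destination and serialization choices; this stripped-down version only flushes when a new query arrives, whereas a real one would also flush from a timer so a quiet period doesn't strand a partial batch:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.ObjectMessage;
    import javax.jms.Queue;
    import javax.jms.Session;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    public class BatchingSender {
        private static final int MAX_BATCH = 200;    // flush when this many queries are queued
        private static final long MAX_WAIT_MS = 20;  // ... or when the oldest entry is this old

        private final List<Serializable> pending = new ArrayList<>();
        private long firstQueuedAt;
        private final Session session;
        private final MessageProducer producer;

        public BatchingSender(ConnectionFactory factory, Queue queue) throws Exception {
            Connection connection = factory.createConnection();
            connection.start();
            session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            producer = session.createProducer(queue);
        }

        public synchronized void enqueue(Serializable query) throws Exception {
            if (pending.isEmpty()) {
                firstQueuedAt = System.currentTimeMillis();
            }
            pending.add(query);
            if (pending.size() >= MAX_BATCH
                    || System.currentTimeMillis() - firstQueuedAt >= MAX_WAIT_MS) {
                flush();
            }
        }

        private void flush() throws Exception {
            // One ObjectMessage carries the whole batch, so broker overhead is paid once per ~200 queries
            ObjectMessage message = session.createObjectMessage(new ArrayList<>(pending));
            producer.send(message);
            pending.clear();
        }
    }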
Guaranteed messaging is going to be much slower than volatile messaging. Given every object is updated every few seconds, you might consider batching your updates (into, say, 500 changes, or by time, say 1-10 ms' worth), sending them over volatile messaging, and batching your transactions. In this case you are more likely to be limited by bandwidth. Tuning for your use case, you may find smaller batch sizes also work efficiently. If bandwidth is critical (say you have a 10 MB connection or slower), you could use compression over JMS.
You can achieve much higher performance with a custom solution (which also might be simpler), e.g. Hazelcast and JGroups are free (you can add a node or nodes to do the database synchronization so your main app doesn't slow down). There are commercial products which handle on the order of half a million durable messages/sec.
Terracotta + jofti = queryable persistent clustered data structures
Search Google for "terracotta querymap" or visit tusharkhairnar.blogspot.com for the QueryMap blog.
You may want to integrate timasync as well to update your database. The database is your system of record; use Terracotta as a caching and database-offloading mechanism. You can even batch async updates to make it faster, so that the DB contains fairly recent data.
Tushar
tusharkhairnar.blogspot.com
