SQL "like query" on varchar column in H2 not using index - java

In our application, which uses an H2 database (version 1.4.196), we have a search on a varchar field that does either a "contains" (column like '%searchterm%') or a "begins with" (column like 'searchterm%') search.
The table is quite large (approx. 400,000 entries) and the search turns out to be slow (varying between 3 seconds on my local development computer to 6 to 12 seconds on our customer's machines).
I found out that the column in question was not indexed and added an index. The search time did not improve, even when I added an explicit index hint to the query. explain showed that no index was used in either case.
From my experience with other database systems (e.g. MSSQL) I know that, at least for "begins with" queries, indexes can be used to improve search performance.
As I did not find any related documentation for the H2 database, my question is:
Is it possible to use indexes in like queries in H2?

Most DB systems WILL use the index if you have a LIKE query... but only up to the first % in it. There is pretty much no DB system in existence where the mere act of having an index on the column would have any effect on the speed of, say: SELECT * FROM my_table WHERE my_column LIKE '%findme%';.
What you are presumably looking for is an index specifically designed to aid in searching. For example, the tsquery system that postgres has (see postgres tsquery documentation).
Another way to go is to add a dependency to your software that does this. The obvious choice there is Apache Lucene.
H2 has some support for this. Please refer to H2 documentation on full text search.
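The point about an index only helping "up to the first %" can be made concrete with a small sketch. It assumes a plain LIKE pattern without an ESCAPE clause and compares strings by UTF-16 code unit, which real databases replace with their own collation rules; the class and method names are hypothetical. It extracts the literal prefix of a pattern and the range that an index scan could cover for it.

```java
final class LikePrefix {

    /** Returns the literal text that precedes the first LIKE wildcard (% or _). */
    static String indexablePrefix(String pattern) {
        int i = 0;
        while (i < pattern.length()
                && pattern.charAt(i) != '%'
                && pattern.charAt(i) != '_') {
            i++;
        }
        return pattern.substring(0, i);
    }

    /**
     * Returns the smallest string that sorts after every string starting with
     * the prefix (by UTF-16 code unit), or null when no such bound exists.
     */
    static String upperBound(String prefix) {
        int end = prefix.length();
        // A trailing maximum-valued char cannot be incremented, so skip it.
        while (end > 0 && prefix.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            return null; // empty prefix: the whole index would have to be read
        }
        char bumped = (char) (prefix.charAt(end - 1) + 1);
        return prefix.substring(0, end - 1) + bumped;
    }
}
```

For 'sea%' the prefix is "sea", so the search is equivalent to a range condition (column >= 'sea' AND column < 'seb') that an ordinary index can serve. For '%sea%' the prefix is empty and no such range exists, which is why the index does not help there.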

Related

Is there a way within hibernate to retrieve fast non-blocking row counts?

The following query generated by hibernate takes 13+ seconds and locks the table:
SELECT COUNT(auditentit0_.audit_id) AS col_0_0_ FROM Audit auditentit0_ WHERE 1=1;
The growing Microsoft SQL server database table contains 90+ million rows.
For Microsoft SQL server, I have found an accurate meta data way of getting the same information very quickly.
However, I would rather not write custom code for Microsoft sql server and oracle (the next database) if hibernate has a way of getting this information.
Here is an example meta data query for Microsoft sql server that is accurate and almost instant:
SELECT SUM (row_count) FROM sys.dm_db_partition_stats WHERE object_id=OBJECT_ID('huge_audit_table') AND (index_id=0 or index_id=1);
Is there a way to have hibernate issue a similar query for a table row count?
One posted answer has indicated that a view could be of use. I'm investigating this post to see if it can solve the issue:
https://vladmihalcea.com/map-jpa-entity-to-view-or-sql-query-with-hibernate/
In hibernate you should use projections like in the link you provided in order to guarantee that it works on multiple dbms:
protected Long countByCriteria(DetachedCriteria criteria) {
    Criteria crit = criteria.getExecutableCriteria(getSession());
    crit.setProjection(Projections.rowCount());
    return (Long) crit.uniqueResult();
}
What engine are you using in mysql? I never had a blocking problem with row count in MySql or Oracle. Maybe the following link will help you: Any way to select without causing locking in MySQL?
Also, after some quick reading I see that SQL Server does indeed block on count.
Maybe you could use a stored procedure or some other mechanism to pass the problem to the dbms.
Edit:
Projections in Hibernate are used to select the columns to fetch, the columns to group elements by, and to use built-in aggregate functions (sum, count, avg, max, min, countDistinct).
It helps you keep your application database-agnostic. Remember that hibernate supports around 30 databases.
In your case you have a specific problem with mssql, as the count blocks the table to prioritize accuracy. Using the system views is really quick because you get an estimate, but it isn't standard.
You could encapsulate the problem in a DBMS-dependent view or stored procedure. Or maybe you could try a NOLOCK hint or READ UNCOMMITTED in hibernate (for a count on an audit table it should be acceptable).
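One way to keep the DBMS-dependent part in a single place, sketched here under assumptions: the class, enum, and method names are hypothetical, the SQL Server branch reuses the metadata query from the question, and every other database falls back to a plain COUNT. This is not a Hibernate feature, only a helper that picks the SQL text.

```java
final class RowCountSql {

    enum Dbms { SQL_SERVER, OTHER }

    /** Returns the SQL text to use for counting the rows of the given table. */
    static String forTable(Dbms dbms, String table) {
        // The table name is concatenated into SQL, so accept plain identifiers only.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Not a plain identifier: " + table);
        }
        switch (dbms) {
            case SQL_SERVER:
                // Metadata lookup from the question; avoids scanning the table.
                return "SELECT SUM(row_count) FROM sys.dm_db_partition_stats"
                        + " WHERE object_id = OBJECT_ID('" + table + "')"
                        + " AND index_id IN (0, 1)";
            default:
                // Portable fallback that works everywhere but may be slow.
                return "SELECT COUNT(*) FROM " + table;
        }
    }
}
```

The resulting string could then be executed as a native query from Hibernate, so the rest of the application does not need to know which database it is talking to.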
To solve this particular problem we stepped back and changed how the UI functions. Through a collaborative effort between UIX and UI developers we agreed that unfiltered queries will NOT ask for total counts. The initial screen load will show only a page full of data. No "page 1 of 60,000" controls will exist. Only when the user enters specific criteria will the total count come into play. Those queries should be very fast. Now... it is still possible for the user to set up a query that is just as bad as the original problem, but that should be the exception rather than the norm.
So there really is not a solid answer for the OP. If you are faced with this type of problem, if you have control of the UI and API, then it is time to rethink the solution. Think of how google handles paging from a UI perspective. The days of showing a "page 1 of (XX)" are gone IMHO.

Partial search through a SQL database efficiently

What are some examples of efficiently searching through a directory as you're typing a person's name?
Say, for example, we have a database with 1 million users. We start typing "sea" in the search box, and a scrollable window displays every user whose name contains "sea" (much like searching a Skype directory). After changing a letter, the window should update immediately. All of this comes from a SQL database. What are a few efficient libraries or algorithms that can do this without much delay?
First consider changing the task from "name contains substring" to "name starts with substring". If this is possible, then add an index on the name column of your database table and use the query:
select name from table where name like :1 || '%'
Limit the number of returned rows using DBMS-specific syntax, for example, for Oracle add
and rownum < 20
This query should return your rows pretty fast.
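From Java, the "starts with" query above could be issued roughly as follows. This is a sketch under assumptions: the table users and column name are hypothetical, '!' is an arbitrary escape character chosen here so that a typed % or _ is matched literally, and setMaxRows is used as a portable row cap in place of DBMS-specific syntax (whether the cap is enforced on the server side depends on the driver).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class PrefixSearch {

    // Arbitrary escape character for this sketch; must match the ESCAPE clause below.
    private static final char ESCAPE = '!';

    /** Turns the typed text into a LIKE pattern that matches it as a literal prefix. */
    static String prefixPattern(String typed) {
        StringBuilder sb = new StringBuilder();
        for (char c : typed.toCharArray()) {
            if (c == ESCAPE || c == '%' || c == '_') {
                sb.append(ESCAPE); // neutralise wildcards typed by the user
            }
            sb.append(c);
        }
        return sb.append('%').toString();
    }

    /** Returns at most `limit` names starting with the typed text. */
    static List<String> findNames(Connection conn, String typed, int limit)
            throws SQLException {
        String sql = "SELECT name FROM users WHERE name LIKE ? ESCAPE '!' ORDER BY name";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setMaxRows(limit);
            ps.setString(1, prefixPattern(typed));
            try (ResultSet rs = ps.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
                return names;
            }
        }
    }
}
```

Binding the pattern as a parameter keeps the statement text constant, so the database can reuse its plan while the user keeps typing.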
If you really need "contains substring", then decide whether you want the search to be handled by database or by an external text indexing solution.
For database-contained solution you'll have to use a different approach depending on DBMS. Every one of these solutions requires configuration steps not described here.
For Oracle you can use Oracle Text, see
http://www.oracle.com/technetwork/documentation/index-098492.html
The query will look like
select name from table where contains(name, :1) > 0
For Postgres you can use Full Text Search.
You can also use a solution that is not dependent on the database, for example, see Apache Solr:
http://lucene.apache.org/solr/
for example
SELECT name
FROM Table
WHERE name LIKE '%sea%'

Documentum: ORDER_BY r_modify_date takes too long

I'm new to Documentum and have a simple problem: I am trying to retrieve all records ordered by last modified date.
Basically I have a datatable with 1000 records.
Currently we use
Select * from docfolder enabled (FETCH_ALL_RESULTS 1000)
The problem with the above statement is that a newly created or modified report sometimes falls outside the 1000-row range, and our users complain that the report cannot be found (a valid complaint).
Actually, the last modified record does not even need to be first in the list; it just needs to appear.
I tried using
Select * from docfolder order by r_modify_date enabled (FETCH_ALL_RESULTS 1000)
but this takes too long (it never completes). I tried replacing * with a,b,c,d (fields) but that does not work either.
Is there another solution to my issue?
I am considering the Documentum "ENABLE (RETURN_TOP 10)" hint, but I doubt it works for Oracle 11g, and how would Documentum define the top 1000?
UPDATE: It seems that using data link via toad is faster than using DQL, but I need a DQL solution due to legacy issues.
Documentum 6.0 and Oracle 11g.
What version of Documentum are you using?
Ensure that there are indexes on the r_object_id. You may also want to add an index to the r_modify_date.
Further, when adding fields a,b,c,d - ensure that these fields are "non-repeating". In this way, Documentum will not need to join the _r table making the overall query faster.
Further, in DA, if you run the query, you can see the SQL query passed to Oracle. Take this query, run it in Toad and look for optimizations. You may also register the _s table so that you can query it directly with DQL.
I managed to solve this problem by querying the underlying table in the Oracle database.
The slow performance was caused by the tables being joined behind the scenes to obtain the result.
In future, if you have exhausted all ways to optimize your DQL, just fall back to querying the Oracle database.
I have recommended that all table views and searches query Oracle directly.
Only individual reports are retrieved via Documentum; sometimes I question the purpose of having Documentum.

JDBC setMaxRows database usage

I am trying to write a database-independent application with JDBC. I now need a way to fetch the top N entries out of some table. I saw there is a setMaxRows method in JDBC, but I don't feel comfortable using it, because I am afraid the database will push out all results and only the JDBC driver will reduce the result. If I need the top 5 results in a table with a billion rows this will break my neck (the table has a usable index).
Writing special SQL statements for every kind of database isn't very nice, but will let the database do clever query planning and stop fetching more results than necessary.
Can I rely on setMaxRows to tell the database not to work too much?
I guess in the worst case I can't rely on this working in the hoped way. I'm mostly interested in Postgres 9.1 and Oracle 11.2, so if someone has experience with these databases, please step forward.
"will let the database do clever query planning and stop fetching more results than necessary."
If you use
PostgreSQL:
SELECT * FROM tbl ORDER BY col1 LIMIT 10; -- slow without index
Or:
SELECT * FROM tbl LIMIT 10; -- fast even without index
Oracle:
SELECT *
FROM (SELECT * FROM tbl ORDER BY col1 DESC)
WHERE ROWNUM <= 10;
... then only 10 rows will be returned. But if you sort your rows before picking the top 10, basically all qualifying rows will be read before they can be sorted.
Matching indexes can prevent this overhead!
If you are unsure what JDBC actually sends to the database server, run a test and have the database engine log the statements received. In PostgreSQL you can set in postgresql.conf:
log_statement = all
(and reload) to log all statements sent to the server. Be sure to reset that setting after the test or your log files may grow huge.
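The two query shapes shown above for PostgreSQL and Oracle could be produced from one place in a JDBC application, so that the row limit is always part of the SQL text the server sees. The following is only a sketch: the class, enum, and method names are hypothetical, and it covers just these two dialects.

```java
final class TopNQuery {

    enum Dialect { POSTGRESQL, ORACLE }

    /**
     * Wraps an already ordered SELECT so that the database itself
     * returns at most n rows.
     */
    static String wrap(Dialect dialect, String orderedSelect, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        switch (dialect) {
            case POSTGRESQL:
                return orderedSelect + " LIMIT " + n;
            case ORACLE:
                // ROWNUM is assigned before sorting, so the ORDER BY has to
                // live in the inner query and the cut-off in the outer one.
                return "SELECT * FROM (" + orderedSelect + ") WHERE ROWNUM <= " + n;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```

Calling Statement.setMaxRows(n) on top of this does no harm and acts as a safety net on the client side, but the limit in the SQL text is what lets the planner stop early when a matching index exists.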
The thing which could/may kill you with billion(s) of rows is the (highly likely) ORDER BY clause in your query. If this order cannot be established using an index then . . . it'll break your neck :)
I would not depend on the jdbc driver here. As a previous comment suggests it's unclear what it really does (looking at different rdbms).
If you are concerned regarding speed of your query you can use a LIMIT clause as well. If you use LIMIT you can at least be sure that it's passed on to the DB server.
Edit: Sorry, I was not aware that Oracle doesn't support LIMIT.
In direct answer to your question regarding PostgreSQL 9.1: Yes, the JDBC driver will tell the server to stop generating rows beyond what you set.
As others have pointed out, depending on indexes and the plan chosen, the server might scan a very large number of rows to find the five you want. Proper server configuration can help accurately model the costs to prevent this, but if value distribution is unusual you may need to introduce an optimization barrier (like with a CTE) to coerce the planner to produce a good plan.

Java MySQL Programming

I am accessing a MySQL table that has more than 1 million records. I am using MySQL Query Browser, which is unable to fetch all the records and breaks the connection partway through.
Now I have to write a Java program that accesses that table without the connection breaking partway through, as this table will be modified and accessed frequently.
Can you suggest how I should approach this problem? Should I create an index on the table, and if so, how do I create one?
There are different reasons why a MySQL connection might break during a query. Can you give the exact error message you receive?
A simplified explanation of how to add an index to the table for a simple query:
1. Look at the field(s) in the WHERE clause of the query.
2. Add an index on the field(s) using ALTER TABLE ADD INDEX.
3. Use EXPLAIN on the query and check whether the query is actually using the index.
If you want more specific help, post the SHOW CREATE TABLE output and the EXPLAIN of your query.
MySQL Query Browser limits the number of records displayed for performance reasons, because it is an interactive program and nobody likes to wait for half an hour before the program crashes with an out-of-memory error. You can change these limits in the settings.
Your Java program will face similar problems.
When using large datasets it is important to plan how you are going to access that dataset and create the necessary indexes.
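One possible access plan for a Java program is to walk the table in fixed-size chunks ordered by an indexed key, so that neither the driver nor the program ever holds the whole table in memory. The sketch below is an assumption-laden illustration, not the only approach: the table customer and its numeric key column id are hypothetical, as are the class and method names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

final class ChunkedTableReader {

    /** Supplies up to `limit` ids greater than `afterId`, in ascending order. */
    interface PageSource {
        List<Long> nextIds(long afterId, int limit);
    }

    /** Visits every row one chunk at a time and returns the number of rows seen. */
    static long readAll(PageSource source, int chunkSize, LongConsumer handler) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long lastId = Long.MIN_VALUE;
        long total = 0;
        while (true) {
            List<Long> ids = source.nextIds(lastId, chunkSize);
            for (long id : ids) {
                handler.accept(id);
                lastId = id; // the next chunk starts after the last key seen
            }
            total += ids.size();
            if (ids.size() < chunkSize) {
                return total; // a short chunk means the end of the table
            }
        }
    }

    /** JDBC-backed source; relies on an index (e.g. the primary key) on id. */
    static PageSource jdbcSource(Connection conn) {
        return (afterId, limit) -> {
            String sql = "SELECT id FROM customer WHERE id > ? ORDER BY id LIMIT ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, afterId);
                ps.setInt(2, limit);
                try (ResultSet rs = ps.executeQuery()) {
                    List<Long> ids = new ArrayList<>();
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                    return ids;
                }
            } catch (SQLException e) {
                throw new IllegalStateException("Chunk query failed", e);
            }
        };
    }
}
```

Because each chunk is a short, index-driven query, a dropped connection only costs one chunk: the program can reconnect and resume from the last id it handled.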
It would be useful to edit the question to show the structure of the data. Generally, creating an index looks like this:
CREATE INDEX idx_customer_name ON customer (name);
Here are more details
If you just want to dump the data to work on the data using Excel you can try this on the commandline
mysqldump -u [username] -p -t -T/path/to/directory [database] --fields-enclosed-by=\" --fields-terminated-by=,
In my experience this is a very painful exercise as Excel really is not made to deal with this amount of rows, and the dump format usually is slightly, but infuriatingly incompatible.
Your best bet is to invest an hour of your time to go through a SQL tutorial like sql fundamentals and play with MySQL Query Browser to get a feel for what you can do with SQL. I guarantee your investment will have paid for itself by tomorrow.
I am not very familiar with MySQL programming, but generally indexes are used to arrange the values of one or more columns in a database table in a specific order.
SYNTAX
CREATE INDEX IndexName ON tableName (column);
Just go through this tutorial for more information,
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
