I am new to cassandra and migrating my application from Mysql to cassandra.As i read of cassandra it says it reduces the read and write time compared to Mysql.when i tried a simple example with single node using hector, reading operation is quite faster compared to Mysql,and if i tried to insert a single column it is very fast compared to mysql.But when i tried to insert a single row with multiple columns its taking long time compared to mysql. Is there anyway to improve Write performance or please let me know if i am wrong with my way of coding.
My sql code is
INSERT into energy_usage(meter_id,reading_datetime,reading_date,reading_time,asset_id,assetid)
VALUES('164','2012-12-07 00:30:00','2012-12-07','00:00:00','1','1') "
my cassandra code is
Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
mutator.addInsertion("888999", DYN_CF, HFactory.createStringColumn("assetid","1")).
addInsertion("888999", DYN_CF, HFactory.createStringColumn("meterid", "164")).
addInsertion("888999", DYN_CF, HFactory.createStringColumn("energyusage","10")).
addInsertion("888999", DYN_CF, HFactory.createStringColumn("readdate","2012-12-07")).
addInsertion("888999", DYN_CF, HFactory.createStringColumn("readdatetime","2012-12-07 00:30:00"));
mutator.execute();
Well, if you can provide us with some benchmarks stating what do you find as 'very slow', maybe we can help you further. But one important thing is to note that a cassandra feature is that as the amount of data in your cluster grows, you are guaranteed (loosely) to get somewhat more constant read/write performance times as compared to a relational db engine.
Have you tried executing the same query in a cassandra-cli/cqlsh, for example? What kind of performance do you get then? If the difference between the cassandra-cli/cqlsh statement and the piece of code you posted is dramatic, that means that it is probably a driver issue, so I would dig deeper in Hector code.
Related
I recently got into an interview and I was asked a question
We have a table employee(id, name). And in our java code, we are writing a logic to fetch data from this table and display it in UI. The query is
Select id,name from employee
Query was that during debugging, we found that this jdbc call to fire the query and get the output is taking say 20 secs and we want to reduce this to say 5 seconds or to the optimal time. How can we you do that, or how will I tackle this problem?
As there is no where clause in the query, I didn't suggest to index the column.
As this logic is taking 20 secs every time, so, some other code getting a lock on this table is also out of question.
I suggested that limiting the number of records fetched from the table should help but the interviewer didn't look convinced
Is there anything else we can do as a developer to optimize the call. I guess DBA might tune database setting to improve the performance of this query, but is there any other way
OK, so this is an interview question, so both the problem and the solutions are hypothetical. The interviewer is asking for possible optimizations and / or approaches. Here are some that are most likely to help:
Modify the query to page the data rather than fetching the whole lot. This looks applicable for the example query. Note that this is not just "limiting the number of rows selected from the table" ... which is probably why the interviewer looked doubtful when you said that!
If you do need to display the entire selected record set but in a reduced form (e.g. summed, averaged, sorted, collated etc), do the reduction in the query rather than by fetching the records and doing it in the client.
Tune the fetchSize() as suggested by Ivan.
Here are some other ideas that are less likely to help and / or will require extensive reworking.
Look at the network configs. For example you may be able to get better throughput by OS-level tuning TCP buffer, or optimizing physical or virtual network paths.
Run the query on the database server itself (to eliminate network overheads)
Use an in-memory table
Query a secondary database server; e.g. a readonly snapshot or a slave
You can try to increase fetchSize() for Statement/PreparedStatement to decrease number of network roundtrips between application server/desktop and database server.
You can start several threads that will query some piece of data and then merge all data from several threads.
EDIT: doesn't apply to this situation because id and name are the only columns on this table, but still useful for other readers to note.
If you create an index covering both id and name, then the database can use that index to read the data faster since it wont even have to even read the table.
See this link for a more thorough explanation.
if the index contains all the columns you’re requesting it doesn’t even need to look in the table. That concept is known as index coverage.
I was curious about how fast we can execute MySQL query through a loop in Java and well it is taking extremely long time with the code I wrote. I am sure there is a faster way to do it. Here is the piece of code that is executing the query.
PreparedStatement ps=connection.prepareStatement("INSERT INTO books.author VALUES (?,?,?)");
for (int i=0;i<100000;i++)
{
ps.setString(1,test[i][0]);
ps.setString(2,test[i][1]);
ps.setString(3,test[i][2]);
ps.addBatch();
}
int []p=ps.executeBatch();
Any help would be much appreciated. Thank you
Your basic approach is correct. Since you are using MySQL you may want to add rewriteBatchedStatements=true to your connection URL as discussed in this answer. It has been known to significantly speed up batch INSERT operations to MySQL databases.
There are a number of other things that kick in when you have a huge batch. So going "too" big will slow down the rows/second. This depends on a few settings, the specific table schema, and other things. I have see a too-big batch run twice as slow as a more civilized batch.
Think of the overhead of parsing the query as being equivalent to inserting an extra 10 rows. Based on that Rule of Thumb, a batch insert of 1000 rows is within 1% of the theoretical maximum speed.
It may be faster to create a file with all the rows and do LOAD DATA. But, again, this depends on various settings, etc.
If a query is taking more time in db even after using indexed columns in join conditions then what can we do in code to minimize the execution time in Oracle and MySql.
I am feeling some daily in execution of query in Oracle from Java layer. Although I am using condition on the query on index column on numeric value column.
I am using Java Prepared Statement and execution executed from Java.
You are asking us to diagnose something without symptoms. You should provide output of EXPLAIN PLAN (or set autotrace on) and also the schema in question.
There is more to tuning than indexing columns. But without knowing, and I assume you've done all the optimization you can do, then it may be time to do pre-calculation with either tables, or materialized views.
Other options include solid state disk or parallelism (partitioning and/or parallel query) and so forth.
Not sure what you mean by "Java layer", I find that Java is often a hindrance to performance in Oracle. Stick with PL/SQL for stored procedures and daily jobs, if possible. To a Java programmer, every problem appears to be a Java problem. But Java brings little to the table as far as speeding up queries.
I need one help from you guys regarding JDBC performance optimization. One of our pojo is using jdbc to connect to a oracle database and retrieve the records. Basically the records are email addresses basing upon which emails will be sent to the users. The problem here is the performance. This process happens every weekend and the records are very huge in number, around 100k.
The performance is very slow and it worries us a lot. Only 1000 records seem to be fetched from the database every 1 hour, which means that it will take 100 hours for this process to complete (which is very bad). Please help me on this.
The database server and the java process are in two different remote servers. We have used rs_email.setFetchSize(1000); hoping that it would make any difference but no change at all.
The same query executed on server takes 0.35 seconds to complete. Any quick suggestion would of great help to us.
Thanks,
Aamer.
First look at your queries. Analyze them. See if the SQL could be made more efficient (ie, ask the database for what you want, not for what you don't want -- makes a big difference). Also check to see if there are indexes on any fields in your where and join clauses. Indexes make a big difference. But it can't be just any indexes. They have to be good indexes (ie, that the fields that make up the index provide enough uniqueness for the database to retrieve things appropriately). Work with your DBA on this. Look for either high run time against the db or check for queries with high CPU usage (even if the queries run sub-second). These are the thing that can kill your database.
Also from a code perspective, check to see if you are opening and closing your connections or if you are re-using them. Can make a big difference too.
It would help to post your code, queries, table layouts, and any indexes you have.
Use log4jdbc to get the real sql for fetching single record. Then check speed and plan for that sql. You may need a proper index or even db defragmentation.
Not sure about the Oracle driver, but I do know that the MySQL driver supports two different results retrieval methods: "stream" and "wait until you've got it all".
The streaming method lets you start process the results the moment you've got the first row returned from the query, whereas the other method retrieves the entire resultset before you can start work on it. In cases where you deal with huge recordsets, this often leads to memory exceptions, or slow performance because java hit the "memory roof" and the garbage collector can't throw away "used" records like it can in the streaming mode.
The streaming mode doesn't let you navigate/scroll the resultset the way the "normal"/"wait until you've got it all" mode...
Anyway, not sure if this is of any help but it might be worth checking out.
My answer to your question, in summary is:
1. Check network
2. Check SQL
3. Check Java code.
It sounds very slow. First thing to check would be to see if you have a slow network. You can do this pretty quickly by just pinging the database server. Or run the database server on the same machine as your JVMM. If it is not the network, get an explain plan for your SQL and ensure you are not doing table scans when you don't need to be. If it is not the network or the SQL, then it's time to check your Java code. Are you doing anything like blocking when you shouldn't be?
The problem is, we have a huge number of records (more than a million) to be inserted into a single table from a Java application. The records are created by the Java code, it's not a move from another table, so INSERT/SELECT won't help.
Currently, my bottleneck is the INSERT statements. I'm using PreparedStatement to speed-up the process, but I can't get more than 50 recods per second on a normal server. The table is not complicated at all, and there are no indexes defined on it.
The process takes too long, and the time it takes will make problems.
What can I do to get the maximum speed (INSERT per second) possible?
Database: MS SQL 2008. Application: Java-based, using Microsoft JDBC driver.
Batch the inserts. That is, only send 1000 rows at a time, rather then one row at a time, so you hugely reduce round trips/server calls
Performing Batch Operations on MSDN for the JDBC driver. This is the easiest method without reengineering to use genuine bulk methods.
Each insert must be parsed and compiled and executed. A batch will mean a lot less parsing/compiling because a 1000 (for example) inserts will be compiled in one go
There are better ways, but this works if you are limited to generated INSERTs
Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.
Also, (just in case you really do have no indexes) you may also want to consider adding an indexes - some indexes (most an index one on the primary key) may improve the performance of inserts.
The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.
Have you looked into bulk operations bulk operations?
Have you considered to use batch updates?
Is there any integrity constraint or trigger on the table ?
If so, droping it before inserts will help, but you have to be sure that you can afford the consequences.
Look into Sql Server's bcp utility.
This would mean a big change in your approach in that you'd be generating a delimited file and using an external utility to import the data. But this is the fastest method for inserting a large number of records into a Sql Server db and will speed up your load time by many orders of magnitude.
Also, is this a one-time operation you have to perform or something that will occur on a regular basis? If it's one time I would suggest not even coding this process but performing an export/import with a combination of db utilities.
I would recommend using an ETL engine for it. You can use Pentaho. It's free. The ETL engines are optimized for doing bulk loading on data and also any forms of transformation/validation that are required.