I'm having a big problem with the embedded Firebird database engine and Java SE.
I'm currently developing a filtering tool for users to filter out data.
I have made two options for filtering, and the user can choose one or both:
Filter out from a blacklist (the blacklist is controlled by the user).
Filter out according to a massive list that records every record ever uploaded and filtered out.
The data the user uploads is plain text, comma- or token-separated, like this:
(SET OF COLUMNS)| RECORD TO FILTER |
0-MANY COLUMNS | ABC2 |
0-MANY COLUMNS | ABC5 |
When I upload it to the DB, I add a flag for every filter:
(SET OF COLUMNS) | RECORD TO FILTER | FLAG FOR FILTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC2 | | |
0-MANY COLUMNS | ABC5 | | |
As for the second filter: the program starts with an empty main table on the first run of the software, then fills that table with all the records from the very first upload.
The main table will have unique records like the following table after a few text uploads made by the user:
Record | Date criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0|
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0|
When the data is processed, the software updates both the main table and the user table. For example:
(SET OF COLUMNS)| RECORD TO FILTER | FLAG FOR FILTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | |
So the main table is updated:
Record | Date criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0| ->12/11/2012:2,11/11/2012:0,10/11/2012:0
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0| ->12/11/2012:1,11/11/2012:3,10/11/2012:1
If in the last three days the criteria count has reached more than four, the user table gets filter B flagged. Notice that each date has an integer next to it.
(SET OF COLUMNS)| RECORD TO FILTER | FLAG FOR FILTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | X |
Both updates are in a single transaction. The problem is that when the user uploads more than 800,000 records, my program throws the following exception in the while loop.
I use StringBuilder parsing and append methods for maximum performance on the mutable days string.
java.lang.OutOfMemoryError: Java heap space
Here is my code; I use five days:
FactoriaDeDatos factoryInstace = FactoriaDeDatos.getInstance();
Connection cnx = factoryInstace.getConnection();
cnx.setAutoCommit(false);
PreparedStatement pstmt = null;
ResultSet rs=null;
pstmt = cnx.prepareStatement("SELECT CM.MAIL,CM.FECHAS FROM TCOMERCIALMAIL CM INNER JOIN TEMPMAIL TMP ON CM.MAIL=TMP."+colEmail);
rs=pstmt.executeQuery();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_MAIL INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_FECHAS INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
sql_com_local_tranx=0;
int trxNum=0;
int ix=0;
int ixE1=0;
int ixAc=0;
StringBuilder sb;
StringTokenizer st;
String fechas;
int pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8,pos9;
StringBuilder s1,s2,sSQL,s4,s5,s6,s7,s8,s9,s10;
long startLoop = System.nanoTime();
long time2 ;
boolean ejecutoMax=false;
//int paginador_sql=1000;
//int trx_ejecutada=0;
sb=new StringBuilder();
s1=new StringBuilder();
s2=new StringBuilder();
sSQL=new StringBuilder();
s4=new StringBuilder();
s6=new StringBuilder();
s8=new StringBuilder();
s10=new StringBuilder();
while(rs.next()){
    // From here
    actConteoDia=0;
    sb.setLength(0);
    sb.append(rs.getString(2));
    pos1= sb.indexOf(":",0);
    pos2= sb.indexOf(",",pos1+1);
    pos3= sb.indexOf(":",pos2+1);
    pos4= sb.indexOf(",",pos3+1);
    pos5= sb.indexOf(":",pos4+1);
    pos6= sb.indexOf(",",pos5+1);
    pos7= sb.indexOf(":",pos6+1);
    pos8= sb.indexOf(",",pos7+1);
    pos9= sb.indexOf(":",pos8+1);
    s1.setLength(0);
    s1.append(sb.substring(0, pos1));
    s2.setLength(0);
    s2.append(sb.substring(pos1+1, pos2));
    s4.setLength(0);
    s4.append(sb.substring(pos3+1, pos4));
    s6.setLength(0);
    s6.append(sb.substring(pos5+1, pos6));
    s8.setLength(0);
    s8.append(sb.substring(pos7+1, pos8));
    s10.setLength(0);
    s10.append(sb.substring(pos9+1));
    actConteoDia=Integer.parseInt(s2.toString());
    actConteoDia++;
    sb.setLength(0);
    //sb.append(s1).a
    if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
        actConteoDia=MAXIMO_LIMITE_POR_SEMANA+1;
    }
    sb.append(s1).append(":").append(actConteoDia).append(",").append(rs.getString(2).substring(pos2+1, rs.getString(2).length()));
    // Every date record takes approx. 8.3 ms to process
    sSQL.setLength(0);
    sSQL.append("UPDATE TCOMERCIALMAIL SET FECHAS='").append(sb.toString()).append("' WHERE MAIL='").append(rs.getString(1)).append("'");
    pstmtDet1.addBatch(sSQL.toString());
    //actConteoDia=0;
    //actConteoDia+=Integer.parseInt(s2.toString());
    actConteoDia+=Integer.parseInt(s4.toString());
    actConteoDia+=Integer.parseInt(s6.toString());
    actConteoDia+=Integer.parseInt(s8.toString());
    actConteoDia+=Integer.parseInt(s10.toString());
    if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
        sSQL.setLength(0);
        sSQL.append("UPDATE TEMPMAIL SET DIASLIMITE='S' WHERE ").append(colEmail).append("='").append(rs.getString(1)).append("'");
        pstmtDet.addBatch(sSQL.toString());
    }
    sql_com_local_tranx++;
    if(sql_com_local_tranx%2000==0 || sql_com_local_tranx%7000==0 ){
        brDias.setString("PROCESANDO "+sql_com_local_tranx);
        pstmtDet1.executeBatch();
        pstmtDet.executeBatch();
    }
    if(sql_com_local_tranx%100000==0){
        System.gc();
        System.runFinalization();
    }
}
pstmtDet1.executeBatch();
pstmtDet.executeBatch();
cnx.commit();
I've run telemetry tests so I can trace where the problem lies.
The big while loop is the problem, I think, but I don't know exactly where.
I'm adding some images of the telemetry tests; please help me interpret them properly.
The GC activity becomes inverse to the time the JVM keeps objects alive:
http://imageshack.us/photo/my-images/849/66780403.png
The heap grows from 50 MB to 250 MB; the used heap reaches the 250 MB limit, producing the OutOfMemoryError:
50 MB
http://imageshack.us/photo/my-images/94/52169259.png
REACHING 250 MB
http://imageshack.us/photo/my-images/706/91313357.png
OUT OF MEMORY
http://imageshack.us/photo/my-images/825/79083069.png
The final snapshot of generated objects, ordered by live bytes:
http://imageshack.us/photo/my-images/546/95529690.png
Any help, suggestion, or answer will be vastly appreciated.
The problem is that you are using a PreparedStatement as if it were a Statement, since you are calling addBatch(String). The javadoc of this method says:
Note: This method cannot be called on a PreparedStatement or CallableStatement.
This note was added in JDBC 4.0; before that, the javadoc said the method was optional. The fact that Jaybird allows you to call this method on a PreparedStatement is therefore a bug: I created issue JDBC-288 in the Jaybird tracker.
Now to the cause of the OutOfMemoryError: when you use addBatch(String) on Jaybird's PreparedStatement implementation (FBPreparedStatement), the statement is added to a list internal to the Statement implementation (FBStatement). In FBStatement, calling executeBatch() executes all statements in this list and then clears it. In FBPreparedStatement, however, executeBatch() is overridden to execute the originally prepared statement with batch parameters (in your example it won't do anything, as you never actually add a PreparedStatement-style batch). It will never execute the statements you added with addBatch(String), but it also never clears the list of statements in FBStatement, and that is most likely the cause of your OutOfMemoryError.
Based on this, the solution should be to create a Statement using cnx.createStatement and use that to execute your queries, or investigate if you could benefit from using one or more PreparedStatement objects with a parameterized query. It looks like you should be able to use two separate PreparedStatements, but I am not 100% sure; the added benefit would be protection against SQL injection and a minor performance improvement.
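For illustration, a rough sketch of the two-PreparedStatement approach, assuming cnx, rs, colEmail and MAXIMO_LIMITE_POR_SEMANA exist as in the question; recalcularFechas and totalUltimosDias are hypothetical helpers standing in for the existing StringBuilder parsing (imports from java.sql assumed):
// Sketch only: not the poster's code, just the shape of a parameterized-batch version.
void processWithParameterizedBatches(Connection cnx, ResultSet rs, String colEmail) throws SQLException {
    PreparedStatement updFechas = cnx.prepareStatement(
            "UPDATE TCOMERCIALMAIL SET FECHAS = ? WHERE MAIL = ?");
    PreparedStatement updLimite = cnx.prepareStatement(
            "UPDATE TEMPMAIL SET DIASLIMITE = 'S' WHERE " + colEmail + " = ?");

    int pending = 0;
    while (rs.next()) {
        String nuevasFechas = recalcularFechas(rs.getString(2)); // hypothetical helper: existing parsing logic
        updFechas.setString(1, nuevasFechas);
        updFechas.setString(2, rs.getString(1));
        updFechas.addBatch();                                    // parameter-style batch, not addBatch(String)

        if (totalUltimosDias(nuevasFechas) > MAXIMO_LIMITE_POR_SEMANA) { // hypothetical helper: sums the day counters
            updLimite.setString(1, rs.getString(1));
            updLimite.addBatch();
        }

        if (++pending % 2000 == 0) {                             // flush periodically so the batches stay small
            updFechas.executeBatch();
            updLimite.executeBatch();
        }
    }
    updFechas.executeBatch();
    updLimite.executeBatch();
    cnx.commit();
}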
Addendum
This issue has been fixed since Jaybird 2.2.2
Full disclosure: I am the developer of the Jaybird / Firebird JDBC driver.
Do not execute batch statements while iterating through the result set. Store the SQL you want to execute in a collection, and once you have finished processing the result set, start executing the new SQL (see the sketch below). Does everything have to happen inside the same transaction?
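A rough sketch of that collect-first approach, assuming cnx and rs as in the question; buildUpdateSql is a hypothetical helper that produces one UPDATE string per row (needs java.util.List/ArrayList and java.sql.Statement imports):
List<String> pendingUpdates = new ArrayList<>();
while (rs.next()) {
    pendingUpdates.add(buildUpdateSql(rs));   // hypothetical helper building the UPDATE string for this row
}
rs.close();

try (Statement stmt = cnx.createStatement()) {
    int count = 0;
    for (String sql : pendingUpdates) {
        stmt.addBatch(sql);
        if (++count % 2000 == 0) {
            stmt.executeBatch();              // flush in chunks so the driver-side batch stays small
        }
    }
    stmt.executeBatch();
}
cnx.commit();
Note that collecting 800,000 statements in a list has its own memory cost, so flushing in chunks still matters.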
Related
I have a table in DynamoDB with my users (partition key = key, sort key = no):
key isActive
user1 true
user2 false
... ...
In my code I need to return the next user with status not active (isActive=false). What is the best way to do this, given that the solution has to cope with:
A huge table
A concurrent environment
I wrote code that works, BUT I am not sure it is a good solution because of the Scan and filter expression:
public String getFreeUser() throws IOException {
    Table table = dynamoDB.getTable("usersTableName");
    ScanSpec spec = new ScanSpec()
            .withFilterExpression("isActive = :is_active")
            .withValueMap(new ValueMap().withBoolean(":is_active", false));
    ItemCollection<ScanOutcome> items = table.scan(spec);
    Iterator<Item> iterator = items.iterator();
    Item item = null;
    while (iterator.hasNext()) {
        item = iterator.next();
        try {
            UpdateItemSpec updateItemSpec = new UpdateItemSpec()
                    .withPrimaryKey(new PrimaryKey("key", item.getString("key")))
                    .withUpdateExpression("set #ian=:is_active_new")
                    .withConditionExpression("isActive = :is_active_old")
                    .withNameMap(new NameMap()
                            .with("#ian", "isActive"))
                    .withValueMap(new ValueMap()
                            .withBoolean(":is_active_old", false)
                            .withBoolean(":is_active_new", true))
                    .withReturnValues(ReturnValue.ALL_OLD);
            UpdateItemOutcome outcome = table.updateItem(updateItemSpec);
            return outcome.getItem().getString("key");
        } catch (Exception e) {
        }
    }
    throw new IOException("No active users were found");
}
GSI + Query == GOOD
userID (PK) | isActive | otherAttribute | ...
user1 | true | foo | ...
user2 | false | bar | ...
user3 | true | baz | ...
user4 | false | 42 | ...
...
GSI:
userID | isActive (GSI-PK)
user1 | true
user2 | false
user3 | true
user4 | false
Add a GSI with a hash key of isActive. This will allow you to query directly the items where isActive == false.
The benefit vs. scan-and-filter is that reads will be much more efficient. The cost is that your GSI requires its own storage, so if your table is huge (as per your assumption) then you might want to consider a sparse index.
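For example, a minimal sketch of querying such a GSI with the same document API used in the question; the index name "isActive-index" is a placeholder, and the flag is stored as the string "false" here because GSI key attributes must be of string, number, or binary type:
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Index;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

public class InactiveUserLookup {
    // Assumes an initialized DynamoDB document client and a GSI named "isActive-index" (hypothetical name).
    public static String findInactiveUser(DynamoDB dynamoDB) {
        Table table = dynamoDB.getTable("usersTableName");
        Index index = table.getIndex("isActive-index");

        // Key attributes must be string/number/binary, so the flag is stored as the string "false" here.
        QuerySpec spec = new QuerySpec()
                .withKeyConditionExpression("isActive = :ia")
                .withValueMap(new ValueMap().withString(":ia", "false"));

        ItemCollection<QueryOutcome> items = index.query(spec);
        for (Item item : items) {
            return item.getString("key");   // first inactive user returned by the index
        }
        return null;                        // no inactive users found
    }
}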
Sparse Index GSI + Query == BETTER
userID (PK) | isNotActive | otherAttribute | ...
user1 | | foo | ...
user2 | true | bar | ...
user3 | | baz | ...
user4 | true | 42 | ...
...
GSI:
userId | isNotActive (GSI-PK)
user2 | true
user4 | true
Consider replacing the attribute isActive with isNotActive and don't give this attribute to the active users at all. That is, the inactive users will have isNotActive = true, but the active users will not have the attribute. You can then create your GSI on this isNotActive attribute. Since it only contains the inactive users, it will be smaller and more efficient to store and query.
Note that when a user becomes active you will need to delete this attribute, and add it back when an active user becomes inactive.
Attribute Projections
Regardless of which GSI you decide is best for you, if you know which attribute(s) you will need when querying these inactive users - even if it's just "all of them" - you can project these to your GSI so that you don't need to do the second lookup by key. This will increase the size of your GSI, but may be a tradeoff worth making depending on your table size, the ratio of active to inactive users, and your expected access patterns.
UPDATE
In response to the first comment, to be clear the GSI key (now labelled "GSI-PK") is not the userID. I could put the isActive or active column on the far left in the GSI tables, but that's not how it appears in the AWS console so I've left it in the original order for consistency with the way AWS display it.
Re the second comment on concurrency, you're right, I didn't address this. My solution will work in a concurrent environment except for one thing: you can only do eventually consistent reads, not strongly consistent reads. What this means is that a user who became inactive very recently (and by recently I mean a fraction of a second in most circumstances) might not have replicated to the GSI yet. Similarly, a user that has recently changed from inactive to active might not have updated the GSI yet. You'll need to consider whether eventually consistent reads are acceptable for your use case.
Another consideration is that if this is going to be a very large table and the query results total more than 1 MB, you're going to get a paginated result anyway, because DynamoDB enforces that limit. Without a global table lock, you're going to get some inconsistency due to updates from other clients between page queries, in which case eventually consistent reads will need to work for you.
I have successfully called a stored procedure ("Users_info") from my Java app, which returns a table that contains two columns ("user" and "status").
CallableStatement proc_stmt = con.prepareCall("{call Users_info (?)}");
proc_stmt.setInt(1, 1);
ResultSet rs = proc_stmt.executeQuery();
I managed to get all the user names and their respective statuses ("0" for inactive and "1" for active) into ArrayLists using ResultSet.getString:
ArrayList<String> userName= new ArrayList<String>();
ArrayList<String> userStatus= new ArrayList<String>();
while (rs.next()) {
    userName.add(rs.getString("user"));
    userStatus.add(rs.getString("CHKStatus_Id"));
}
Database shows:
user | status|
------------|--------
Joshua H. | 1 |
Mark J. | 0 |
Jonathan R. | 1 |
Angie I. | 0 |
Doris S. | 0 |
I want to find the best or most efficient way to separate the active users from the inactive users.
I can do this by comparing values with if statements which, because of my limited knowledge of Java, I suspect is not the most efficient way; maybe there is a better way that you guys can help me understand.
Thanks.
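For reference, a minimal sketch of one straightforward way to split the rows while reading the ResultSet, assuming the status column comes back as the strings "1" and "0" as shown above (needs java.util.List/ArrayList imports):
List<String> activeUsers = new ArrayList<>();
List<String> inactiveUsers = new ArrayList<>();
while (rs.next()) {
    String name = rs.getString("user");
    // "1" = active, "0" = inactive, as in the table above
    if ("1".equals(rs.getString("CHKStatus_Id"))) {
        activeUsers.add(name);
    } else {
        inactiveUsers.add(name);
    }
}
A single pass over the ResultSet like this is about as simple as it gets; the if comparison itself is not the expensive part.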
I am new to Cassandra CQL and I want to measure query execution time. Can I do it in the CQL shell by storing the current time in a variable, executing the query, storing the current time in another variable, and then computing the difference between the two? Can anyone guide me?
From within cqlsh, your best option is probably to use tracing (output shortened for brevity):
aploetz#cqlsh:stackoverflow> tracing on;
Now Tracing is enabled
aploetz#cqlsh:stackoverflow> SELECT * FROM sujata WHERE id=2;
id | roll_number | age
----+-------------+-----
2 | 10 | 26
2 | 20 | 26
(2 rows)
Tracing session: 35072590-99fb-11e5-beaa-8b496c707234
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-12-03 14:19:51.027000 | 127.0.0.1 | 0
Parsing SELECT * FROM sujata WHERE id=2; [SharedPool-Worker-1] | 2015-12-03 14:19:51.034000 | 127.0.0.1 | 12378
Preparing statement [SharedPool-Worker-1] | 2015-12-03 14:19:51.035000 | 127.0.0.1 | 13415
Executing single-partition query on roles [SharedPool-Worker-2] | 2015-12-03 14:19:51.036000 | 127.0.0.1 | 14052
.................................................
Read 2 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-12-03 14:19:51.054001 | 127.0.0.1 | 32768
Request complete | 2015-12-03 14:19:51.063069 | 127.0.0.1 | 36069
Edit:
can I store this tracing log report to some file...?
Yes, you can. If I were to run the above trace from the Linux command line, and output that to a file, I would start by creating a file to hold my cqlsh commands:
aploetz#dockingBay94:~/cql$ cat traceSujata.cql
use stackoverflow;
tracing on;
SELECT * FROM sujata WHERE id=2;
Then, I'd use the -f flag on cqlsh to run commands from that file, and then redirect the output to another text file.
aploetz#dockingBay94:~/cql$ cqlsh -f traceSujata.cql > queryTrace_20151204.txt
Now you can peruse the query trace file at your leisure!
Option A
With datastax devcenter you directly have access to the request used time.
Go in the "query_trace" tab, just next to "Results".
More info : http://docs.datastax.com/en/developer/devcenter/doc/devcenter/dcQueryTrace.html
Option B
tracing on
More info : http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
NB: Option A uses Option B.
(1) If the query is small, just use something like:
use nishi;
tracing on;
select * from family where name='nkantkumar';
(2) If the query set is very big, say 1k or 5k SELECT statements at a time:
cd <CASSANDRA_HOME>/bin
cqlsh -f '/apps/nkantkumar/query.txt' > '/apps/nkantkumar/traceQuery.cql'
Here your query file will look like this:
use nishi;
tracing on;
select * from family where name='nkantkumar';
select * from family where name='nkantkumar1';
select * from family where name='nkantkumar2';
I attempted to use the tracing approach suggested by other answers, but something about the nature of my query meant that tracing was taking a very, very long time to return.
Since it was just a one-off timing for me to compare query performance between two options, I instead wrote a file of CQL commands that selects a timestamp before and after executing my query. I then copied the timestamps into Microsoft Excel and used it to calculate the difference in seconds (after stripping off the +0000 time zone information so Excel could understand it).
timing.sql
SELECT toTimestamp(now()) FROM system.local;
SELECT * from TABLE_1;
SELECT toTimestamp(now()) FROM system.local;
SELECT * from TABLE_2;
SELECT toTimestamp(now()) FROM system.local;
Execute timing.sql
cqlsh example.com -u my_user --file=timing.sql
Current time output
SELECT toTimestamp(now()) FROM system.local;
system.totimestamp(system.now())
----------------------------------
2020-11-18 16:10:35.745000+0000
Excel date difference
=(C1-B1)*60*60*24
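If the timing is needed from application code rather than cqlsh, a minimal sketch with the DataStax Java driver; the contact point, keyspace, and query below are placeholders taken from the examples above:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class QueryTimer {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("stackoverflow")) {
            long start = System.nanoTime();
            ResultSet rs = session.execute("SELECT * FROM sujata WHERE id = 2");
            rs.all();                                             // force the full fetch before stopping the clock
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Query took " + elapsedMs + " ms");
        }
    }
}
This measures client-observed latency (including network round trips), whereas tracing breaks the time down per step on the server.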
The use case that we are working to solve with Cassandra is this: We need to retrieve a list of entity UUIDs that have been updated within a certain time range within the last 90 days. Imagine that we're building a document tracking system, so our relevant entity is a Document, whose key is a UUID.
The query we need to support in this use case is: Find all Document UUIDs that have changed between StartDateTime and EndDateTime.
Question 1: What's the best Cassandra table design to support this query?
I think the answer is as follows:
CREATE TABLE document_change_events (
event_uuid TIMEUUID,
document_uuid uuid,
PRIMARY KEY ((event_uuid), document_uuid)
) WITH default_time_to_live='7776000';
And given that we can't do range queries on partition keys, we'd need to use the token() method. As such the query would then be:
SELECT document_uuid
WHERE token(event_uuid) > token(minTimeuuid(?))
AND token(event_uuid) < token(maxTimeuuid(?))
For example:
SELECT document_uuid
WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00+0000'))
AND token(event_uuid) < token(maxTimeuuid('2015-05-20 00:00+0000'))
Question 2: I can't seem to get the following Java code using DataStax's driver to reliably return the correct results.
If I run the following code 10 times, pausing 30 seconds between runs, I will then have 10 rows in this table:
private void addEvent() {
    String cql = "INSERT INTO document_change_events (event_uuid, document_uuid) VALUES(?,?)";
    PreparedStatement preparedStatement = cassandraSession.prepare(cql);
    BoundStatement boundStatement = new BoundStatement(preparedStatement);
    boundStatement.setConsistencyLevel(ConsistencyLevel.ANY);
    boundStatement.setUUID("event_uuid", UUIDs.timeBased());
    boundStatement.setUUID("document_uuid", UUIDs.random());
    cassandraSession.execute(boundStatement);
}
Here are the results:
cqlsh:> select event_uuid, dateOf(event_uuid), document_uuid from document_change_events;
event_uuid | dateOf(event_uuid) | document_uuid
--------------------------------------+--------------------------+--------------------------------------
414decc0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:51:09-0500 | 92b6fb6a-9ded-47b0-a91c-68c63f45d338
9abb4be0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:53:39-0500 | 548b320a-10f6-409f-a921-d4a1170a576e
6512b960-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:52:09-0500 | 970e5e77-1e07-40ea-870a-84637c9fc280
53307a20-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:51:39-0500 | 11b4a49c-b73d-4c8d-9f88-078a6f303167
ac9e0050-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:54:10-0500 | b29e7915-7c17-4900-b784-8ac24e9e72e2
88d7fb30-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:53:09-0500 | c8188b73-1b97-4b32-a897-7facdeecea35
0ba5cf70-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:49:39-0500 | a079b30f-be80-4a99-ae0e-a784d82f0432
76f56dd0-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:52:39-0500 | 3b593ca6-220c-4a8b-8c16-27dc1fb5adde
1d88f910-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:50:09-0500 | ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
2f6b3850-0014-11e5-93a9-51f9a7931084 | 2015-05-21 18:50:39-0500 | db42271b-04f2-45d1-9ae7-0c8f9371a4db
(10 rows)
But if I then run this code:
private static void retrieveEvents(Instant startInstant, Instant endInstant) {
    String cql = "SELECT document_uuid FROM document_change_events " +
            "WHERE token(event_uuid) > token(?) AND token(event_uuid) < token(?)";
    PreparedStatement preparedStatement = cassandraSession.prepare(cql);
    BoundStatement boundStatement = new BoundStatement(preparedStatement);
    boundStatement.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
    boundStatement.bind(UUIDs.startOf(Date.from(startInstant).getTime()),
            UUIDs.endOf(Date.from(endInstant).getTime()));
    ResultSet resultSet = cassandraSession.execute(boundStatement);
    if (resultSet == null) {
        System.out.println("None found.");
        return;
    }
    while (!resultSet.isExhausted()) {
        System.out.println(resultSet.one().getUUID("document_uuid"));
    }
}
It only retrieves three results:
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
Why didn't it retrieve all 10 results? And what do I need to change to achieve the correct results to support this use case?
For reference, I've tested this against dsc-2.1.1, dse-4.6 and using the DataStax Java Driver v2.1.6.
First of all, please only ask one question at a time. Both of your questions here could easily stand on their own. I know these are related, but it just makes the readers come down with a case of tl;dr.
I'll answer your 2nd question first, because the answer ties into a fundamental understanding that is central to getting the data model correct. When I INSERT your rows and run the following query, this is what I get:
aploetz#cqlsh:stackoverflow2> SELECT document_uuid FROM document_change_events
WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00-0500'))
AND token(event_uuid) < token(maxTimeuuid('2015-05-22 00:00-0500'));
document_uuid
--------------------------------------
a079b30f-be80-4a99-ae0e-a784d82f0432
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
(4 rows)
Which is similar to what you are seeing. Why didn't that return all 10? Well, the answer becomes apparent when I include token(event_uuid) in my SELECT:
aploetz#cqlsh:stackoverflow2> SELECT token(event_uuid),document_uuid FROM document_change_events WHERE token(event_uuid) > token(minTimeuuid('2015-05-10 00:00-0500')) AND token(event_uuid) < token(maxTimeuuid('2015-05-22 00:00-0500'));
token(event_uuid) | document_uuid
----------------------+--------------------------------------
-2112897298583224342 | a079b30f-be80-4a99-ae0e-a784d82f0432
2990331690803078123 | 3b593ca6-220c-4a8b-8c16-27dc1fb5adde
5049638908563824288 | ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
5577339174953240576 | db42271b-04f2-45d1-9ae7-0c8f9371a4db
(4 rows)
Cassandra stores partition keys (event_uuid in your case) in order by their hashed token value. You can see this when using the token function. Cassandra generates partition tokens with a process called consistent hashing to ensure even cluster distribution. In other words, querying by token range doesn't make sense unless the actual (hashed) token values are meaningful to your application.
Getting back to your first question, this means you will have to find a different column to partition on. My suggestion is to use a timeseries mechanism called a "date bucket." Picking the date bucket can be tricky, as it depends on your requirements and query patterns...so that's really up to you to pick a useful one.
For the purposes of this example, I'll pick "month." So I'll re-create your table partitioning on month and clustering by event_uuid:
CREATE TABLE document_change_events2 (
event_uuid TIMEUUID,
document_uuid uuid,
month text,
PRIMARY KEY ((month),event_uuid, document_uuid)
) WITH default_time_to_live='7776000';
Now I can query by a date range, when also filtering by month:
aploetz#cqlsh:stackoverflow2> SELECT document_uuid FROM document_change_events2
WHERE month='201505'
AND event_uuid > minTimeuuid('2015-05-10 00:00-0500')
AND event_uuid < maxTimeuuid('2015-05-22 00:00-0500');
document_uuid
--------------------------------------
a079b30f-be80-4a99-ae0e-a784d82f0432
ec155e0b-39a5-4d2f-98f0-0cd7a5a07ec8
db42271b-04f2-45d1-9ae7-0c8f9371a4db
92b6fb6a-9ded-47b0-a91c-68c63f45d338
11b4a49c-b73d-4c8d-9f88-078a6f303167
970e5e77-1e07-40ea-870a-84637c9fc280
3b593ca6-220c-4a8b-8c16-27dc1fb5adde
c8188b73-1b97-4b32-a897-7facdeecea35
548b320a-10f6-409f-a921-d4a1170a576e
b29e7915-7c17-4900-b784-8ac24e9e72e2
(10 rows)
Again, month may not work for your application. So put some thought behind coming up with an appropriate column to partition on, and then you should be able to solve this.
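As a rough sketch (not part of the answer above), the retrieveEvents method from the question could then bind the month bucket plus client-side timeuuid bounds; the table name document_change_events2 and the "201505"-style month value follow the example above:
import java.time.Instant;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public final class DocumentChangeQueries {
    // month is e.g. "201505"; assumes cassandraSession is an open Session on the right keyspace
    static void retrieveEvents(Session cassandraSession, String month,
                               Instant startInstant, Instant endInstant) {
        PreparedStatement ps = cassandraSession.prepare(
                "SELECT document_uuid FROM document_change_events2 " +
                "WHERE month = ? AND event_uuid > ? AND event_uuid < ?");
        BoundStatement bs = ps.bind(month,
                UUIDs.startOf(startInstant.toEpochMilli()),   // lower timeuuid bound for the range
                UUIDs.endOf(endInstant.toEpochMilli()));      // upper timeuuid bound for the range
        for (Row row : cassandraSession.execute(bs)) {
            System.out.println(row.getUUID("document_uuid"));
        }
    }
}
A range that spans a month boundary would need one query per month bucket, which is the usual trade-off with date bucketing.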
I get an exception (see below) if I try to do
resultset.getString("add_date");
for a JDBC connection to a MySQL database containing a DATETIME value of 0000-00-00 00:00:00 (the quasi-null value for DATETIME), even though I'm just trying to get the value as a string, not as an object.
I got around this by doing
SELECT CAST(add_date AS CHAR) as add_date
which works, but seems silly... is there a better way to do this?
My point is that I just want the raw DATETIME string, so I can parse it myself as is.
Note: here's where the 0000 comes in (from http://dev.mysql.com/doc/refman/5.0/en/datetime.html):
Illegal DATETIME, DATE, or TIMESTAMP values are converted to the “zero” value of the appropriate type ('0000-00-00 00:00:00' or '0000-00-00').
The specific exception is this one:
SQLException: Cannot convert value '0000-00-00 00:00:00' from column 5 to TIMESTAMP.
SQLState: S1009
VendorError: 0
java.sql.SQLException: Cannot convert value '0000-00-00 00:00:00' from column 5 to TIMESTAMP.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
at com.mysql.jdbc.ResultSetImpl.getTimestampFromString(ResultSetImpl.java:6343)
at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5670)
at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5491)
at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5531)
Alternative answer: you can use this JDBC URL directly in your datasource configuration:
jdbc:mysql://yourserver:3306/yourdatabase?zeroDateTimeBehavior=convertToNull
Edit:
Source: MySQL Manual
Datetimes with all-zero components (0000-00-00 ...) — These values can not be represented reliably in Java. Connector/J 3.0.x always converted them to NULL when being read from a ResultSet.
Connector/J 3.1 throws an exception by default when these values are encountered as this is the most correct behavior according to the JDBC and SQL standards. This behavior can be modified using the zeroDateTimeBehavior configuration property. The allowable values are:
exception (the default), which throws an SQLException with an SQLState of S1009.
convertToNull, which returns NULL instead of the date.
round, which rounds the date to the nearest closest value which is 0001-01-01.
Update: Alexander reported a bug affecting mysql-connector-5.1.15 on that feature. See CHANGELOGS on the official website.
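For example, a minimal sketch of reading the column once the parameter is in place (the table name your_table and the credentials are placeholders); with convertToNull the zero date simply comes back as null:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;

public class ZeroDateDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://yourserver:3306/yourdatabase?zeroDateTimeBehavior=convertToNull";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT add_date FROM your_table")) {
            while (rs.next()) {
                Timestamp ts = rs.getTimestamp("add_date"); // null instead of an exception for 0000-00-00
                System.out.println(ts == null ? "no date" : ts.toString());
            }
        }
    }
}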
I stumbled across this attempting to solve the same issue. The installation I am working with uses JBOSS and Hibernate, so I had to do this a different way. For the basic case, you should be able to add zeroDateTimeBehavior=convertToNull to your connection URI as per this configuration properties page.
I found other suggestions across the land referring to putting that parameter in your hibernate config:
In hibernate.cfg.xml:
<property name="hibernate.connection.zeroDateTimeBehavior">convertToNull</property>
In hibernate.properties:
hibernate.connection.zeroDateTimeBehavior=convertToNull
But I had to put it in my mysql-ds.xml file for JBOSS as:
<connection-property name="zeroDateTimeBehavior">convertToNull</connection-property>
Hope this helps someone. :)
My point is that I just want the raw DATETIME string, so I can parse it myself as is.
That makes me think that your "workaround" is not a workaround, but in fact the only way to get the value from the database into your code:
SELECT CAST(add_date AS CHAR) as add_date
By the way, some more notes from the MySQL documentation:
MySQL Constraints on Invalid Data:
Before MySQL 5.0.2, MySQL is forgiving of illegal or improper data values and coerces them to legal values for data entry. In MySQL 5.0.2 and up, that remains the default behavior, but you can change the server SQL mode to select more traditional treatment of bad values such that the server rejects them and aborts the statement in which they occur.
[..]
If you try to store NULL into a column that doesn't take NULL values, an error occurs for single-row INSERT statements. For multiple-row INSERT statements or for INSERT INTO ... SELECT statements, MySQL Server stores the implicit default value for the column data type.
MySQL 5.x Date and Time Types:
MySQL also allows you to store '0000-00-00' as a “dummy date” (if you are not using the NO_ZERO_DATE SQL mode). This is in some cases more convenient (and uses less data and index space) than using NULL values.
[..]
By default, when MySQL encounters a value for a date or time type that is out of range or otherwise illegal for the type (as described at the beginning of this section), it converts the value to the “zero” value for that type.
DATE_FORMAT(column name, '%Y-%m-%d %T') as dtime
Use this to avoid the error. It return the date in string format and then you can get it as a string.
resultset.getString("dtime");
This actually does NOT work. Even though you call getString, internally MySQL still tries to convert it to a date first:
at com.mysql.jdbc.ResultSetImpl.getDateFromString(ResultSetImpl.java:2270)
~[mysql-connector-java-5.1.15.jar:na] at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5743)
~[mysql-connector-java-5.1.15.jar:na] at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5576)
~[mysql-connector-java-5.1.15.jar:na]
If, after adding the lines:
<property name="hibernate.connection.zeroDateTimeBehavior">convertToNull</property>
hibernate.connection.zeroDateTimeBehavior=convertToNull
<connection-property name="zeroDateTimeBehavior">convertToNull</connection-property>
the error persists:
Illegal DATETIME, DATE, or TIMESTAMP values are converted to the “zero” value of the appropriate type ('0000-00-00 00:00:00' or '0000-00-00').
then find the lines:
1) resultSet.getTime("time"); // time = 00:00:00
2) resultSet.getTimestamp("timestamp"); // timestamp = 00000000000000
3) resultSet.getDate("date"); // date = 0000-00-00 00:00:00
replace with the following lines, respectively:
1) Time.valueOf(resultSet.getString("time"));
2) Timestamp.valueOf(resultSet.getString("timestamp"));
3) Date.valueOf(resultSet.getString("date"));
I wrestled with this problem and implemented the 'convertToNull' solution discussed above. It worked in my local MySQL instance. But when I deployed my Play/Scala app to Heroku it no longer worked. Heroku concatenates several args onto the DB URL that they provide to users, and because Heroku already uses "?" before its own set of args, this solution will not work there. However, I found a different solution which seems to work equally well.
SET sql_mode = 'NO_ZERO_DATE';
I put this in my table definitions and it solved the problem of '0000-00-00 00:00:00' not being representable as a java.sql.Timestamp.
I suggest you use null to represent a null value.
What is the exception you get?
BTW:
There is no year called 0 or 0000 (though some date systems allow this year).
And there is no month 0 of the year or day 0 of the month (which may be the cause of your problem).
I solved the problem by considering that '00-00-...' isn't a valid date; I changed my SQL column definition, adding the NULL keyword to permit null values:
SELECT "-- Tabla item_pedido";
CREATE TABLE item_pedido (
    id INTEGER AUTO_INCREMENT PRIMARY KEY,
    id_pedido INTEGER,
    id_item_carta INTEGER,
    observacion VARCHAR(64),
    fecha_estimada TIMESTAMP,
    fecha_entrega TIMESTAMP NULL, -- HERE IT IS: NULL = delivery date not set yet
    CONSTRAINT fk_item_pedido_id_pedido FOREIGN KEY (id_pedido)
        REFERENCES pedido(id),...
Then I'm able to insert NULL values, which means "I didn't register that timestamp yet"...
SELECT "++ INSERT item_pedido";
INSERT INTO item_pedido VALUES
(01, 01, 01, 'Ninguna', ADDDATE(#HOY, INTERVAL 5 MINUTE), NULL),
(02, 01, 02, 'Ninguna', ADDDATE(#HOY, INTERVAL 3 MINUTE), NULL),...
The table looks like this:
mysql> select * from item_pedido;
+----+-----------+---------------+-------------+---------------------+---------------------+
| id | id_pedido | id_item_carta | observacion | fecha_estimada | fecha_entrega |
+----+-----------+---------------+-------------+---------------------+---------------------+
| 1 | 1 | 1 | Ninguna | 2013-05-19 15:09:48 | NULL |
| 2 | 1 | 2 | Ninguna | 2013-05-19 15:07:48 | NULL |
| 3 | 1 | 3 | Ninguna | 2013-05-19 15:24:48 | NULL |
| 4 | 1 | 6 | Ninguna | 2013-05-19 15:06:48 | NULL |
| 5 | 2 | 4 | Suave | 2013-05-19 15:07:48 | 2013-05-19 15:09:48 |
| 6 | 2 | 5 | Seco | 2013-05-19 15:07:48 | 2013-05-19 15:12:48 |
| 7 | 3 | 5 | Con Mayo | 2013-05-19 14:54:48 | NULL |
| 8 | 3 | 6 | Bilz | 2013-05-19 14:57:48 | NULL |
+----+-----------+---------------+-------------+---------------------+---------------------+
8 rows in set (0.00 sec)
Finally: JPA in action:
@Stateless
@LocalBean
public class PedidosServices {

    @PersistenceContext(unitName="vagonpubPU")
    private EntityManager em;

    private Logger log = Logger.getLogger(PedidosServices.class.getName());

    @SuppressWarnings("unchecked")
    public List<ItemPedido> obtenerPedidosRetrasados() {
        log.info("Obteniendo listado de pedidos retrasados");
        Query qry = em.createQuery("SELECT ip FROM ItemPedido ip, Pedido p WHERE" +
                " ip.fechaEntrega IS NULL" +
                " AND ip.idPedido=p.id" +
                " AND ip.fechaEstimada < :arg3" +
                " AND (p.idTipoEstado=:arg0 OR p.idTipoEstado=:arg1 OR p.idTipoEstado=:arg2)");
        qry.setParameter("arg0", Tipo.ESTADO_BOUCHER_ESPERA_PAGO);
        qry.setParameter("arg1", Tipo.ESTADO_BOUCHER_EN_SERVICIO);
        qry.setParameter("arg2", Tipo.ESTADO_BOUCHER_RECIBIDO);
        qry.setParameter("arg3", new Date());
        return qry.getResultList();
    }
}
At last it all works. I hope that helps you.
To add to the other answers: if you want the 0000-00-00 string, you can use noDatetimeStringSync=true (with the caveat of sacrificing time zone conversion).
The official MySQL bug: https://bugs.mysql.com/bug.php?id=47108.
Also, for history: JDBC used to return NULL for 0000-00-00 dates but now throws an exception by default.
Source
You can append the JDBC URL with
?zeroDateTimeBehavior=convertToNull&autoReconnect=true&characterEncoding=UTF-8&characterSetResults=UTF-8
With the help of this, the driver converts '0000-00-00 00:00:00' to a null value.
e.g.:
jdbc:mysql:<host-name>/<db-name>?zeroDateTimeBehavior=convertToNull&autoReconnect=true&characterEncoding=UTF-8&characterSetResults=UTF-8