Timeout Exception In Cassandra - java

I use a Cassandra database to serve some frequent requests. The following is my code:
public Map<String, String> loadObject(ArrayList<Integer> tradigAccountList) {
    com.datastax.driver.core.Session session;
    Map<String, String> orderListMap = new HashMap<>();
    List<ResultSetFuture> futures = new ArrayList<>();
    try {
        session = jdbcUtils.getCassandraSession();
        PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
        // fire all queries asynchronously
        for (Integer tradingAccount : tradigAccountList) {
            futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
        }
        // collect the results
        for (ResultSetFuture future : futures) {
            for (Row row : future.get().all()) {
                orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
            }
        }
    } catch (Exception e) {
        e.printStackTrace(); // exceptions (including the timeout) were previously swallowed here
    }
    return orderListMap;
}
I send approximately 30 requests simultaneously, and my query looks like this:
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?"
Each run of this query fetches at least 30,000 rows, and when I send multiple requests simultaneously it throws a timeout exception.
My Cassandra cluster has 2 nodes, each with 32 concurrent read and write threads. Can anyone provide a solution for this?

CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
This is the table schema.

Related

Improve performance of loading 100,000 records from database

We created a program to make using the database easier in other programs, so the code I'm showing gets used in multiple other programs.
One of those programs gets about 10,000 records from one of our clients and has to check whether these are already in our database. If not, we insert them (they can also change and then have to be updated).
To make this easy we load all the entries from our whole table (currently 120,000), create a class instance for every entry we get, and put all of them into a HashMap.
Loading the whole table this way takes around 5 minutes. We also sometimes have to restart the program because we run into a GC overhead error, since we work on limited hardware. Do you have an idea of how we can improve the performance?
Here is the code to load all entries (we have a global limit of 10,000 entries per query, so we use a loop):
public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
    IQueryExpression qe;
    IQueryParameter qp;
    // our main SDP class
    Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
    SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    // search in standard time range (modification date!)
    Calendar cal = Calendar.getInstance();
    cal.set(2010, Calendar.JANUARY, 1);
    Date startDate = cal.getTime();
    Date endDate = new Date();
    Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
    Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));
    IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);
    // count once before to determine initial capacities for hash map/set
    IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
    qe = SDP_ARCHIVECLASS.getQueryExpression(session);
    qp = session.getDocumentServer().getClassFactory()
            .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
    qp.setExpression(qe);
    qp.setHitLimitThreshold(0);
    qp.setHitLimit(0);
    int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
    int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);
    // MD sets; and objects already done (here: document ID)
    HashSet<String> objDone = new HashSet<>(initialCapacity);
    HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity);
    qp.close();
    // do queries until hit count is smaller than 10.000
    // use modification date
    boolean keepGoing = true;
    while (keepGoing) {
        // construct query expression
        // - basic part: Modification date & class type
        // a. doc. class type
        qe = SDP_ARCHIVECLASS.getQueryExpression(session);
        // b. ID
        qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe,
                new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
        // 2. Query Parameter: set database; set expression
        qp = session.getDocumentServer().getClassFactory()
                .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
        qp.setExpression(qe);
        // order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
        qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
        qp.setHitLimitThreshold(0);
        qp.setHitLimit(0);
        // Do not sort by modification date;
        qp.setHints("+NoDefaultOrderBy");
        keepGoing = false;
        IInformationObject[] hits = null;
        IDocumentHitList hitList = null;
        hitList = session.getDocumentServer().query(qp, session);
        IDocument doc;
        if (hitList.getTotalHitCount() > 0) {
            hits = hitList.getInformationObjects();
            for (IInformationObject hit : hits) {
                String objID = hit.getID();
                if (!objDone.contains(objID)) {
                    // do something with this object and the class
                    // here: construct a new SDP sub class object and give it back via interface
                    doc = (IDocument) hit;
                    IMasterDataSet mdSet;
                    try {
                        mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
                    } catch (Exception e) {
                        // cause for this
                        String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;
                        throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
                    }
                    objRes.put(mdSet.getID(), mdSet);
                    objDone.add(objID);
                }
            }
            doc = (IDocument) hits[hits.length - 1];
            Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
            startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
            keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
        }
        qp.close();
    }
    return objRes;
}
Loading 120,000 rows (and more) each time will not scale very well, and your solution may not work in the future as the number of records grows. Instead, let the database server handle the problem.
Your table needs a primary key or unique key based on the columns of the records. Iterate through the 10,000 incoming records, performing a JDBC SQL UPDATE that modifies all field values with a WHERE clause that exactly matches the primary/unique key.
update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...
This either modifies an existing row or does nothing at all, and JDBC executeUpdate() returns 0 or 1, indicating the number of rows changed. If the number of rows changed was zero, you have detected a new record which does not exist yet, so perform an insert for that new record only.
insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
You can decide whether to run 10,000 updates followed by however many inserts are needed, or to do an update plus an optional insert per record. Also remember that JDBC batch statements and turning auto-commit off may help speed things up.
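As a rough illustration of that update-then-insert pattern with batching (a sketch only; BLAH, COL1, COL2 and PKCOL are the placeholders from above, conn is an open java.sql.Connection, and ClientRecord is a hypothetical holder for one incoming record):
conn.setAutoCommit(false);
try (PreparedStatement update = conn.prepareStatement(
         "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?");
     PreparedStatement insert = conn.prepareStatement(
         "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)")) {
    for (ClientRecord r : clientRecords) {            // the ~10,000 incoming records
        update.setString(1, r.getCol1());
        update.setString(2, r.getCol2());
        update.setString(3, r.getKey());
        if (update.executeUpdate() == 0) {            // nothing updated -> row does not exist yet
            insert.setString(1, r.getCol1());
            insert.setString(2, r.getCol2());
            insert.setString(3, r.getKey());
            insert.addBatch();
        }
    }
    insert.executeBatch();                            // flush the new rows in one round trip
    conn.commit();
}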

Collecting the result of cypher query into a hash map java?

This is a follow-up to the Finding connected nodes question.
The code is
firstNode = graphDb.createNode();//creating nodes
firstNode.setProperty( "person", "Andy " );
Label myLabel = DynamicLabel.label("person");
firstNode.addLabel(myLabel); ...
relationship = firstNode.createRelationshipTo( secondNode, RelTypes.emails );// creating relationships
relationship.setProperty( "relationship", "email " );....
ExecutionEngine engine = new ExecutionEngine(graphDb);
ExecutionResult result = engine.execute("MATCH (sender:person)-[:emails]-(receiver) RETURN sender, count(receiver) as count, collect(receiver) as receivers ORDER BY count DESC");
System.out.println(result.dumpToString());
The result I got was:
sender | count | receivers
Node[2]{person:"Chris"} | 3 | [Node[4]{person:"Elsa "},Node[0]{person:"Andy "},Node[1]{person:"Bobby"}]
Node[4]{person:"Elsa "} | 3 | [Node[5]{person:"Frank"},Node[2]{person:"Chris"},Node[3]{person:"David"}]
Node[1]{person:"Bobby"} | 3 | [Node[2]{person:"Chris"},Node[3]{person:"David"},Node[0]{person:"Andy "}]
Node[5]{person:"Frank"} | 2 | [Node[3]{person:"David"},Node[4]{person:"Elsa "}
How to collect the sender as key and receivers as values?
For example: {Frank=[David, Elsa], Chris=[Elsa, Andy, Bobby], ...
Any idea?
Initially I tried iterating something like this
for (Map<String, Object> row : result) {
    Node x = (Node) row.get("receivers");
    System.out.println(x);
    for (String prop : x.getPropertyKeys()) {
        System.out.println(prop + ": " + x.getProperty(prop));
    }
}
This throws a ClassCastException. It works for the "sender" column but not for "receivers".
I am very new to Cypher. I don't know how to transform the result into a hash map. How is this possible?
You can rewrite the cypher to return a map instead... (split for readability)
MATCH (sender:person)-[:emails]->(receiver)
WITH sender, collect(receiver.person) as receivers
RETURN {sender: sender.person, receivers: receivers}
ORDER BY size(receivers) DESC
The result can be treated as a list of maps, where each map represents the key-value mapping of its record in the result.
Code to achieve this:
private List<Map<String, Object>> buildResultsList(Result result) {
    List<Map<String, Object>> results = new LinkedList<>();
    while (result.hasNext()) {
        Record record = result.next();
        results.add(record.asMap());
    }
    return results;
}
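If you want exactly the shape from the question ({sender=[receivers...]}), one option (a sketch, not tested) is to return two named columns and fold them into a HashMap on the Java side, for example with a query like MATCH (sender:person)-[:emails]->(receiver) WITH sender, collect(receiver.person) as receivers RETURN sender.person as sender, receivers:
// Sketch: fold the list-of-maps produced by buildResultsList() into sender -> receivers.
Map<String, List<String>> senderToReceivers = new HashMap<>();
for (Map<String, Object> row : buildResultsList(result)) {
    String sender = (String) row.get("sender");
    @SuppressWarnings("unchecked")
    List<String> receivers = (List<String>) row.get("receivers");   // Cypher lists come back as java.util.List
    senderToReceivers.put(sender, receivers);
}
System.out.println(senderToReceivers);   // e.g. {Chris=[Elsa, Andy, Bobby], Frank=[David, Elsa], ...}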

Scan count returns a significantly lower number for a DynamoDB table

I'm running a sample Java program to query a DynamoDB table. The table has about 90,000 items, but when I get the scan count from Java it shows only 1994 items:
ScanRequest scanRequest = new ScanRequest().withTableName(tableName);
ScanResult result = client.scan(scanRequest);
System.out.println("#items:" + result.getScannedCount());
The program outputs #items:1994, but the detail from the Amazon AWS console shows:
Item Count*: 89249
Any idea? Thanks.
A DynamoDB Scan or Query returns at most 1 MB of data per request, and the count is the number of returned items that fit in that 1 MB. To cover the whole table you should keep scanning, passing the LastEvaluatedKey of each response as the ExclusiveStartKey of the next request, until LastEvaluatedKey is null.
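A minimal sketch of that pagination applied to counting, reusing the ScanRequest/ScanResult classes from the question (the table name is a placeholder):
// Sketch: total the item count across all scan pages.
int total = 0;
ScanResult page = null;
do {
    ScanRequest req = new ScanRequest().withTableName("table_name");
    if (page != null) {
        req.setExclusiveStartKey(page.getLastEvaluatedKey());
    }
    page = client.scan(req);
    total += page.getCount();                 // items returned in this <= 1 MB page
} while (page.getLastEvaluatedKey() != null);
System.out.println("#items: " + total);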
Set your book object with the correct hash key value, and use DynamoDBMapper to get the count:
DynamoDBQueryExpression<Book> queryExpression = new DynamoDBQueryExpression<Book>()
.withHashKeyValues(book);
dynamoDbMapper.count(Book.class, queryExpression);
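For completeness, a rough sketch of the pieces this snippet assumes (the Book class, its table name and hash key attribute are hypothetical placeholders):
// Hypothetical mapped class assumed by the snippet above.
@DynamoDBTable(tableName = "table_name")
public class Book {
    private String id;

    @DynamoDBHashKey(attributeName = "id")
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}

// Usage: set the hash key you want to count against, then ask the mapper.
DynamoDBMapper dynamoDbMapper = new DynamoDBMapper(client);   // client: an AmazonDynamoDB instance
Book book = new Book();
book.setId("some-hash-key-value");
int count = dynamoDbMapper.count(Book.class,
        new DynamoDBQueryExpression<Book>().withHashKeyValues(book));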
This should help. It worked for me:
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
.withRegion("your region").build();
DynamoDB dynamoDB = new DynamoDB(client);
TableDescription tableDescription = dynamoDB.getTable("table name").describe();
tableDescription.getItemCount();
Based on the answer from nightograph:
private ArrayList<String> fetchItems() {
    ArrayList<String> ids = new ArrayList<>();
    ScanResult result = null;
    do {
        ScanRequest req = new ScanRequest();
        req.setTableName("table_name");
        if (result != null) {
            req.setExclusiveStartKey(result.getLastEvaluatedKey());
        }
        result = amazonDynamoDBClient.scan(req);
        List<Map<String, AttributeValue>> rows = result.getItems();
        for (Map<String, AttributeValue> map : rows) {
            AttributeValue v = map.get("rangeKey");
            String id = v.getS();
            ids.add(id);
        }
    } while (result.getLastEvaluatedKey() != null);
    System.out.println("Result size: " + ids.size());
    return ids;
}
I agree with nightograph. I think this link is useful:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html
I just tested with this example. Anyway, this is the DynamoDB v2 API.
final ScanRequest scanRequest = new ScanRequest()
.withTableName("table_name");
final ScanResult result = dynamoDB.scan(scanRequest);
return result.getCount();

Cassandra Hector: How to retrieve all row keys in a column family?

I am a newbie in Cassandra and want to fetch all row key values from a ColumnFamily.
Suppose I have a User ColumnFamily that looks like this:
list User
RowKey: amit
=> (column=fullname, value=amitdubey, timestamp=1381832571947000)
-------------------
RowKey: jax1
=> (column=fullname, value=jagveer1, timestamp=1381655141564000)
-------------------
RowKey: jax2
=> (column=fullname, value=amitdubey, timestamp=1381832571947000)
-------------------
RowKey: jax3
=> (column=fullname, value=jagveer3, timestamp=1381655141564000)
-------------------
I am looking for example code to retrieve all the row keys of the column family.
Something like this:
amit
jax1
jax2
jax3
My Cassandra version is 1.1.
I appreciate any help.
You can do it simply by using RangeSlicesQuery in the Hector client.
// assumes an initialized me.prettyprint.hector.api.Keyspace named "keyspace"
StringSerializer ss = StringSerializer.get();
int rowCount = 100; // page size

RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, ss, ss, ss)
                .setColumnFamily("User")
                .setRange(null, null, false, rowCount)
                .setRowCount(rowCount)
                .setReturnKeysOnly();
String lastKey = null;
while (true) {
    rangeSlicesQuery.setKeys(lastKey, null);
    QueryResult<OrderedRows<String, String, String>> result = rangeSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();
    Iterator<Row<String, String, String>> rowsIterator = rows.iterator();
    /**
     * we'll skip this first one, since it is the same as the last one from previous time we executed.
     */
    if (lastKey != null && rowsIterator != null) {
        rowsIterator.next();
    }
    while (rowsIterator.hasNext()) {
        Row<String, String, String> row = rowsIterator.next();
        lastKey = row.getKey();
        System.out.println(lastKey);
    }
    if (rows.getCount() < rowCount) {
        break;
    }
}

Hbase read performance varying abnormally

I've installed HBase 0.94.0 and I have to improve my read performance through Scan. I've inserted 100,000 random records.
When I set setCaching(100), my performance is 16 seconds for 100,000 records.
When I set it to setCaching(50), my performance is 90 seconds for 100,000 records.
When I set it to setCaching(10), my performance is 16 seconds for 100,000 records.
public class Test {
    public static void main(String[] args) {
        long start, middle, end;
        HTableDescriptor descriptor = new HTableDescriptor("Student7");
        descriptor.addFamily(new HColumnDescriptor("No"));
        descriptor.addFamily(new HColumnDescriptor("Subject"));
        try {
            HBaseConfiguration config = new HBaseConfiguration();
            HBaseAdmin admin = new HBaseAdmin(config);
            admin.createTable(descriptor);
            HTable table = new HTable(config, "Student7");
            System.out.println("Table created !");
            start = System.currentTimeMillis();
            for (int i = 1; i < 100000; i++) {
                String s = Integer.toString(i);
                Put p = new Put(Bytes.toBytes(s));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("IDCARD"), Bytes.toBytes("i+10"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("PHONE"), Bytes.toBytes("i+20"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("PAN"), Bytes.toBytes("i+30"));
                p.add(Bytes.toBytes("No"), Bytes.toBytes("ACCT"), Bytes.toBytes("i+40"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("English"), Bytes.toBytes("50"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("Science"), Bytes.toBytes("60"));
                p.add(Bytes.toBytes("Subject"), Bytes.toBytes("History"), Bytes.toBytes("70"));
                table.put(p);
            }
            middle = System.currentTimeMillis();
            Scan s = new Scan();
            s.setCaching(100);
            ResultScanner scanner = table.getScanner(s);
            try {
                for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
                    System.out.println("Found row: " + rr);
                }
                end = System.currentTimeMillis();
            } finally {
                scanner.close();
            }
            System.out.println("TableCreation-Time: " + (middle - start));
            System.out.println("Scan-Time: " + (end - middle));
        } catch (IOException e) {
            System.out.println("IOError: cannot create Table.");
            e.printStackTrace();
        }
    }
}
Why is this happening?
Why would you want to return every record in your 100,000-record table? You're doing a full table scan, and just as in any large database this is slow.
Try thinking about a more useful use case in which you would like to return some columns of a record, or a range of records.
HBase has only one index on its table: the row key. Make use of that. Try defining your row key so that you can get the data you need just by specifying the row key.
Let's say you would like to know the value of Subject:History for the rows with a row key between 80000 and 80100. (Note that setCaching(100) means HBase will fetch 100 records per RPC, in this case thus just one. Fetching 100 rows obviously requires more memory than fetching, say, one row. Keep that in mind in a large multi-user environment.)
Long start, end;
start = System.currentTimeMillis();
Scan s = new Scan(String.valueOf(80000).getBytes(), String.valueOf(80100).getBytes());
s.setCaching(100);
s.addColumn("Subject".getBytes(), "History".getBytes());
ResultScanner scanner = table.getScanner(s);
try {
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        System.out.println("Found row: " + new String(rr.getRow(), "UTF-8")
                + " value: " + new String(rr.getValue("Subject".getBytes(), "History".getBytes()), "UTF-8"));
    }
    end = System.currentTimeMillis();
} finally {
    scanner.close();
}
System.out.println("Scan: " + (end - start));
This might look stupid because how would you know which rows you need just by an integer? Well, exactly, but that's why you need to design a row key according to what you're about to query instead of just using an incremental value as you would in a traditional database.
Try this example. It should be fast.
Note: I didn't run the example. I just typed it here. Maybe there are some small syntax errors you should correct but I hope the idea is clear.
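As a small supplement to the row-key point above (a sketch only, using the same 0.94-era client API as the answer): once the row key encodes what you query by, a single Get replaces the scan entirely.
// Sketch: point lookup by row key, no scan required.
Get g = new Get(Bytes.toBytes("80050"));                          // the designed row key
g.addColumn(Bytes.toBytes("Subject"), Bytes.toBytes("History"));
Result r = table.get(g);
byte[] value = r.getValue(Bytes.toBytes("Subject"), Bytes.toBytes("History"));
System.out.println("History = " + (value != null ? Bytes.toString(value) : "<not found>"));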
