Cassandra Hector : How to retrieve all row key values in a column family? - java

I am a newbie in Cassandra and want to fetch all row key values from a column family.
Suppose I have a User column family that looks like this:
list User
RowKey: amit
=> (column=fullname, value=amitdubey, timestamp=1381832571947000)
-------------------
RowKey: jax1
=> (column=fullname, value=jagveer1, timestamp=1381655141564000)
-------------------
RowKey: jax2
=> (column=fullname, value=amitdubey, timestamp=1381832571947000)
-------------------
RowKey: jax3
=> (column=fullname, value=jagveer3, timestamp=1381655141564000)
-------------------
I am looking for example code that retrieves all the row keys of the column family. Something like this:
amit
jax1
jax2
jax3
My Cassandra version is 1.1.
Any help is appreciated.

You can do it simply by using RangeSlicesQuery in the Hector client:
StringSerializer ss = StringSerializer.get();
int rowCount = 500; // page size

RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, ss, ss, ss)
                .setColumnFamily("User")
                .setRange(null, null, false, rowCount)
                .setRowCount(rowCount)
                .setReturnKeysOnly();
String lastKey = null;
while (true) {
    rangeSlicesQuery.setKeys(lastKey, null);
    QueryResult<OrderedRows<String, String, String>> result = rangeSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();
    Iterator<Row<String, String, String>> rowsIterator = rows.iterator();
    // Skip the first row of every page after the first one: it is the same
    // key as the last row of the previous page.
    if (lastKey != null && rowsIterator.hasNext()) {
        rowsIterator.next();
    }
    while (rowsIterator.hasNext()) {
        Row<String, String, String> row = rowsIterator.next();
        lastKey = row.getKey();
        System.out.println(lastKey);
    }
    if (rows.getCount() < rowCount) {
        break;
    }
}
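If your Hector version ships the KeyIterator helper (me.prettyprint.cassandra.service.KeyIterator; I'm treating its exact constructor as an assumption), it wraps this same skip-the-duplicate-key paging internally:
// Hedged sketch: assumes KeyIterator(keyspace, columnFamily, serializer) as in later Hector 1.x releases.
KeyIterator<String> keyIterator = new KeyIterator<String>(keyspace, "User", StringSerializer.get());
for (String key : keyIterator) {
    System.out.println(key);
}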

Related

Get LinkedHashMap value with dynamic key

I want to get the value from a LinkedHashMap with a dynamic key, like below.
def map = [Employee1: [Status: 'Working', Id: 1], Employee2: [Status: 'Resigned', Id: 2]]
def keys = "Employee1.Status"
def keyPath = "";
def keyList = keys.tokenize(".");
keyList.eachWithIndex() { key, i ->
keyPath += "$key"
if(i != keyList.size() - 1){ keyPath += "." }
}
println keyPath //Employee1.Status
println map.keyPath //Always null
println map.'Employee1'.'Status' //Working
println map.Employee1.Status //Working
Here map.keyPath always returns null. How do I get the value with a dynamic key?
I think you can simply do this:
def tmpMap = map
keyList.subList(0, keyList.size - 1).each { key ->
    tmpMap = tmpMap[key] // walk into the nested map; map[key] would only work one level deep
}
println tmpMap[keyList[keyList.size - 1]]
This will extract the sub-maps until the actual value key is reached. To make this more robust, you should add some logic to check that the value associated with the current key is actually a map.
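For reference, here is that guarded traversal sketched in Java (a hypothetical PathLookup helper, not part of the question's code; the nested maps mirror the Groovy example):
import java.util.LinkedHashMap;
import java.util.Map;

public class PathLookup {
    // Walks a nested map along a dotted key path, returning null if any
    // intermediate value is missing or is not itself a map.
    static Object lookup(Map<String, Object> map, String path) {
        Object current = map;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> employee1 = new LinkedHashMap<>();
        employee1.put("Status", "Working");
        employee1.put("Id", 1);
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("Employee1", employee1);
        System.out.println(lookup(map, "Employee1.Status")); // Working
    }
}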
Out of curiosity I tried just your code.
keyPath == 'Employee1.Status', not 'Employee1'.'Status', so the whole dotted string is looked up as one literal key.
So to make it work you can do something like this:
def map = [
Employee1:
[Status: 'Working', Id: 1],
Employee2:
[Status: 'Resigned', Id: 2]
]
def keys = "Employee1.Status"
def keyPath = "";
def keyList = keys.tokenize(".");
keyList.eachWithIndex() { key, i ->
keyPath += "$key"
if(i != keyList.size() - 1){ keyPath += '.' }
}
println keyPath //Employee1.Status
//tokenize it and get elements as a[0] and a[1]
a = keyPath.tokenize(".");
println map.(a[0]).(a[1]) //Working
println map.'Employee1'.'Status' //Working
println map.Employee1.Status //Working

Timeout Exception In Cassandra

I use a Cassandra DB to fetch data for some frequent requests. The following is my code:
public Map<String, String> loadObject(ArrayList<Integer> tradingAccountList) {
    com.datastax.driver.core.Session session;
    Map<String, String> orderListMap = new HashMap<>();
    List<ResultSetFuture> futures = new ArrayList<>();
    try {
        session = jdbcUtils.getCassandraSession();
        PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
        // fire one async query per trading account
        for (Integer tradingAccount : tradingAccountList) {
            futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
        }
        // block on each future in turn and drain all of its rows
        for (ResultSetFuture future : futures) {
            for (Row row : future.get().all()) {
                orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
            }
        }
    } catch (Exception e) {
        // swallowed in the original code; the exception should at least be logged
    }
    return orderListMap;
}
I send approximately 30 requests simultaneously, and my query is something like this:
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?"
Each run of this query fetches at least 30,000 rows, but when I send multiple requests simultaneously it throws a timeout exception.
My Cassandra cluster has 2 nodes, each with 32 concurrent read and write threads. Can anyone provide a solution for this?
CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
This is the table schema.

Spark - Java UDF returning multiple columns

I'm using Spark SQL 1.6.2 (Java API) and I have to process the following DataFrame, which has a list of values in 2 columns:
ID AttributeName AttributeValue
0 [an1,an2,an3] [av1,av2,av3]
1 [bn1,bn2] [bv1,bv2]
The desired table is:
ID AttributeName AttributeValue
0 an1 av1
0 an2 av2
0 an3 av3
1 bn1 bv1
1 bn2 bv2
I think I have to use a combination of the explode function and a custom UDF function.
I found the following resources:
Explode (transpose?) multiple columns in Spark SQL table
How do I call a UDF on a Spark DataFrame using JAVA?
and I can successfully run an example that reads the two columns and returns the concatenation of the first string of each column:
UDF2 combineUDF = new UDF2<Seq<String>, Seq<String>, String>() {
public String call(final Seq<String> col1, final Seq<String> col2) throws Exception {
return col1.apply(0) + col2.apply(0);
}
};
context.udf().register("combineUDF", combineUDF, DataTypes.StringType);
The problem is how to write the signature of a UDF that returns two columns (in Java).
As far as I understand, I must define a new StructType like the one shown below and set it as the return type, but so far I haven't managed to get the final code working:
StructType retSchema = new StructType(new StructField[]{
new StructField("#AttName", DataTypes.StringType, true, Metadata.empty()),
new StructField("#AttValue", DataTypes.StringType, true, Metadata.empty()),
}
);
context.udf().register("combineUDF", combineUDF, retSchema);
Any help will be really appreciated.
UPDATE: I'm trying to implement the zip(AttributeName, AttributeValue) first, so that I will then just need to apply the standard explode function in Spark SQL:
ID AttName_AttValue
0 [[an1,av1],[an2,av2],[an3,av3]]
1 [[bn1,bv1],[bn2,bv2]]
I built the following UDF:
UDF2 combineColumns = new UDF2<Seq<String>, Seq<String>, List<List<String>>>() {
public List<List<String>> call(final Seq<String> col1, final Seq<String> col2) throws Exception {
List<List<String>> zipped = new LinkedList<>();
for (int i = 0, listSize = col1.size(); i < listSize; i++) {
List<String> subRow = Arrays.asList(col1.apply(i), col2.apply(i));
zipped.add(subRow);
}
return zipped;
}
};
But when I run the code
myDF.select(callUDF("combineColumns", col("AttributeName"), col("AttributeValue"))).show(10);
I got the following error message:
scala.MatchError: [[an1,av1],[an1,av2],[an3,av3]] (of class java.util.LinkedList)
It looks like the zipping has been performed correctly, but the return type is not the one Scala expects: a java.util.List is not automatically converted to a Scala Seq.
Any help?
Finally I managed to get the result I was looking for, though probably not in the most efficient way.
Basically there are 2 steps:
Zip the two lists
Explode the zipped list into rows
For the first step I defined the following UDF function:
UDF2 concatItems = new UDF2<Seq<String>, Seq<String>, Seq<String>>() {
public Seq<String> call(final Seq<String> col1, final Seq<String> col2) throws Exception {
ArrayList<String> zipped = new ArrayList<String>();
for (int i = 0, listSize = col1.size(); i < listSize; i++) {
String subRow = col1.apply(i) + ";" + col2.apply(i);
zipped.add(subRow);
}
return scala.collection.JavaConversions.asScalaBuffer(zipped);
}
};
Don't forget to register the function with the SQLContext. Note that the return type must be an array type, since the UDF returns a Seq<String> (registering it as DataTypes.StringType would make the later explode fail):
context.udf().register("concatItems", concatItems, DataTypes.createArrayType(DataTypes.StringType));
and then I called it with the following code:
DataFrame df2 = df.select(col("ID"), callUDF("concatItems", col("AttributeName"), col("AttributeValue")).alias("AttName_AttValue"));
At this stage, df2 looks like this:
ID AttName_AttValue
0 [an1;av1, an2;av2, an3;av3]
1 [bn1;bv1, bn2;bv2]
Then I called the explode function to turn the list into rows:
DataFrame df3 = df2.select(col("ID"),explode(col("AttName_AttValue")).alias("AttName_AttValue_row"));
At this stage, df3 looks like this:
ID AttName_AttValue_row
0 an1;av1
0 an2;av2
0 an3;av3
1 bn1;bv1
1 bn2;bv2
Finally to split the attribute name and value into two different columns, I converted the DataFrame into a JavaRDD in order to use the map function:
JavaRDD df3RDD = df3.toJavaRDD().map(
(Function<Row, Row>) myRow -> {
String[] info = String.valueOf(myRow.get(1)).split(";"); // the UDF joined name and value with ";"
return RowFactory.create(myRow.get(0), info[0], info[1]);
}).cache();
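For what it's worth, the last RDD hop could probably be avoided with DataFrame functions alone. A sketch, assuming the built-in split and col functions from org.apache.spark.sql.functions (available in Spark 1.6) and the column names used above:
// Split the "name;value" strings produced by concatItems into two columns.
DataFrame df4 = df3
    .withColumn("AttributeName", split(col("AttName_AttValue_row"), ";").getItem(0))
    .withColumn("AttributeValue", split(col("AttName_AttValue_row"), ";").getItem(1))
    .drop("AttName_AttValue_row");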
If anybody has a better solution feel free to comment.
I hope it helps.

Get number of objects referenced from ArrayList with size 1 grouped by class

I've got a heap dump from the application and found out that there's a huge number of ArrayLists with only 1 object in them. I know how to get the list of such ArrayLists and also show the class of the contained element:
SELECT list.elementData[0] FROM java.util.ArrayList list WHERE (list.size = 1)
The result looks like this:
java.lang.String [id=0x7f8e44970]
java.lang.String [id=0x7f8e44980]
java.lang.String [id=0x7f8e44572]
java.io.File [id=0x7f8e43572]
...
What I would like is to get something like this:
Class | count
=================================
java.lang.String | 100
java.io.File | 74
...
but I'm not able to aggregate the results or do anything else with them. I've found here how to pass the values to an outer select, but I can't figure out how to use anything other than the * in the first select.
SELECT * from OBJECTS
(SELECT OBJECTS list.elementData[0] FROM java.util.ArrayList list WHERE (list.size = 1))
There is no GROUP BY in VisualVM's OQL, but you can use the built-in functions to write a JavaScript snippet and run it in the OQL Console:
var c = {};
/* Filter to show only the first occurrence (max count) */
filter(
/* Sort by occurences desc */
sort(
/* Count class instances */
map(
heap.objects("java.util.ArrayList"),
function(list) {
var clazz = 'null';
if (list.size == 1 && list.elementData[0] != null) {
clazz = classof(list.elementData[0]).name;
}
c[clazz] = (c[clazz] ? c[clazz] + 1 : 1);
return { cnt:c[clazz], type:clazz };
}
), 'rhs.cnt - lhs.cnt'
),
function (item) {
if (c[item.type]) {
c[item.type] = false;
return true;
} else {
return false;
}
}
);
The output is an array of objects like:
{
cnt = 3854.0,
type = null
}
{
cnt = 501.0,
type = org.apache.tomcat.util.digester.CallMethodRule
}
{
cnt = 256.0,
type = java.lang.ref.WeakReference
}
{
cnt = 176.0,
type = sun.reflect.generics.tree.SimpleClassTypeSignature
}
Finally you can call the map function again to format the output into something else, for example CSV:
map(
filter(...),
'it.type + ", " + it.cnt'
);
output:
null, 3854
org.apache.tomcat.util.digester.CallMethodRule, 501
java.lang.ref.WeakReference, 256
sun.reflect.generics.tree.SimpleClassTypeSignature, 176
org.apache.tomcat.util.digester.CallParamRule, 144
com.sun.tools.javac.file.ZipFileIndex$Entry, 141
org.apache.tomcat.util.digester.ObjectCreateRule, 78

How to read data from Hbase?

Hi there. I'm used to SQL, but I need to read data from an HBase table. Any help on this would be great: a book, or maybe just some sample code to read from a table. Someone said using a scanner would do the trick, but I do not know how to use one.
From the website:
// Sometimes, you won't know the row you're looking for. In this case, you
// use a Scanner. This will give you cursor-like interface to the contents
// of the table. To set up a Scanner, do like you did above making a Put
// and a Get, create a Scan. Adorn it with column names, etc.
Scan s = new Scan();
s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
ResultScanner scanner = table.getScanner(s);
try {
// Scanners return Result instances.
// Now, for the actual iteration. One way is to use a while loop like so:
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// print out the row we found and the columns we were looking for
System.out.println("Found row: " + rr);
}
// The other approach is to use a foreach loop. Scanners are iterable!
// for (Result rr : scanner) {
// System.out.println("Found row: " + rr);
// }
} finally {
// Make sure you close your scanners when you are done!
// Thats why we have it inside a try/finally clause
scanner.close();
}
I would like to offer a solution without deprecated methods:
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
// list the tables
Arrays.stream(admin.listTables()).forEach(System.out::println);
// let's insert some data into the 'test_1' table and read the row back
TableName tableName = TableName.valueOf("test_1");
Table table = connection.getTable(tableName);
//Put
Put thePut = new Put(Bytes.toBytes("rowkey1"));
String columnFamily = "m";
String columnQualifier1 = "col1";
String outValue1 = "value1";
String columnQualifier2 = "col2";
String outValue2 = "value2";
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1), Bytes.toBytes(outValue1));
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier2), Bytes.toBytes(outValue2));
table.put(thePut);
//Get
Get theGet = new Get(Bytes.toBytes("rowkey1"));
Result result = table.get(theGet);
//get value first column
String inValue1 = Bytes.toString(result.value());
//get value by ColumnFamily and ColumnName
byte[] inValueByte = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1));
String inValue2 = Bytes.toString(inValueByte);
//loop for result
for (Cell cell : result.listCells()) {
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
String value = Bytes.toString(CellUtil.cloneValue(cell));
System.out.printf("Qualifier : %s : Value : %s%n", qualifier, value);
}
//create Map by result and print it
Map<String, String> getResult = result.listCells().stream().collect(Collectors.toMap(e -> Bytes.toString(CellUtil.cloneQualifier(e)), e -> Bytes.toString(CellUtil.cloneValue(e))));
getResult.entrySet().stream().forEach(e -> System.out.printf("Qualifier : %s : Value : %s%n", e.getKey(), e.getValue()));
System.out.println("---------Scan---------");
Scan scan = new Scan();
ResultScanner resultScan = table.getScanner(scan);
resultScan.forEach(e -> {
System.out.printf("Row \"%s\"%n", Bytes.toString(e.getRow()));
Map<String, String> getResultScan = e.listCells().stream().collect(Collectors.toMap(d -> Bytes.toString(CellUtil.cloneQualifier(d)), d -> Bytes.toString(CellUtil.cloneValue(d))));
getResultScan.entrySet().stream().forEach(d -> System.out.printf("column \"%s\", value \"%s\"%n", d.getKey(), d.getValue()));
System.out.println();
});
I used that, but to get the String value you must use the getValue method of Result:
byte[] bytes = rr.getValue(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
System.out.println(new String(bytes));
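A small refinement: Bytes.toString decodes the byte array as UTF-8 (the encoding Bytes.toBytes uses), whereas new String(bytes) falls back to the platform default charset:
byte[] bytes = rr.getValue(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
// Bytes.toString always decodes as UTF-8, independent of the JVM's default charset.
System.out.println(Bytes.toString(bytes));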
