Hi there, I'm used to SQL, but I need to read data from an HBase table. Any help on this would be great: a book, or maybe just some sample code that reads from the table. Someone said a scanner would do the trick, but I do not know how to use one.
From the website:
// Sometimes, you won't know the row you're looking for. In this case, you
// use a Scanner. This will give you cursor-like interface to the contents
// of the table. To set up a Scanner, do like you did above making a Put
// and a Get, create a Scan. Adorn it with column names, etc.
Scan s = new Scan();
s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
ResultScanner scanner = table.getScanner(s);
try {
  // Scanners return Result instances.
  // Now, for the actual iteration. One way is to use a for loop like so:
  for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
    // print out the row we found and the columns we were looking for
    System.out.println("Found row: " + rr);
  }
  // The other approach is to use a foreach loop. Scanners are iterable!
  // for (Result rr : scanner) {
  //   System.out.println("Found row: " + rr);
  // }
} finally {
  // Make sure you close your scanners when you are done!
  // That's why we have it inside a try/finally clause
  scanner.close();
}
I would like to offer a solution that avoids deprecated methods:
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
// list the tables
Arrays.stream(admin.listTables()).forEach(System.out::println);
// let's insert some data into 'test_1' and get the row back
TableName tableName = TableName.valueOf("test_1");
Table table = connection.getTable(tableName);
//Put
Put thePut = new Put(Bytes.toBytes("rowkey1"));
String columnFamily = "m";
String columnQualifier1 = "col1";
String outValue1 = "value1";
String columnQualifier2 = "col2";
String outValue2 = "value2";
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1), Bytes.toBytes(outValue1));
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier2), Bytes.toBytes(outValue2));
table.put(thePut);
//Get
Get theGet = new Get(Bytes.toBytes("rowkey1"));
Result result = table.get(theGet);
//get value first column
String inValue1 = Bytes.toString(result.value());
//get value by ColumnFamily and ColumnName
byte[] inValueByte = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1));
String inValue2 = Bytes.toString(inValueByte);
//loop for result
for (Cell cell : result.listCells()) {
String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
String value = Bytes.toString(CellUtil.cloneValue(cell));
System.out.printf("Qualifier : %s : Value : %s%n", qualifier, value);
}
//create Map by result and print it
Map<String, String> getResult = result.listCells().stream().collect(Collectors.toMap(e -> Bytes.toString(CellUtil.cloneQualifier(e)), e -> Bytes.toString(CellUtil.cloneValue(e))));
getResult.entrySet().stream().forEach(e -> System.out.printf("Qualifier : %s : Value : %s%n", e.getKey(), e.getValue()));
System.out.println("---------Scan---------");
Scan scan = new Scan();
ResultScanner resultScan = table.getScanner(scan);
resultScan.forEach(e -> {
System.out.printf("Row \"%s\"%n", Bytes.toString(e.getRow()));
Map<String, String> getResultScan = e.listCells().stream().collect(Collectors.toMap(d -> Bytes.toString(CellUtil.cloneQualifier(d)), d -> Bytes.toString(CellUtil.cloneValue(d))));
getResultScan.entrySet().stream().forEach(d -> System.out.printf("column \"%s\", value \"%s\"%n", d.getKey(), d.getValue()));
System.out.println();
});
I used that, but to get the value as a String you must call getValue on the Result and convert the returned bytes:
byte[] bytes = rr.getValue(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
System.out.println(Bytes.toString(bytes));
I am trying to build a list from HBase in Java. I can get all the values together, but I am confused about how to assign them to variables (String bookName = ..., String bookAuthor = ...).
I need to get all the values in the contribute table and assign them to variables.
The contribute table contains id, author and name.
HTable hTable = new HTable(hConn.config, "contribute");
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("author"));
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("name"));
ResultScanner scanner = hTable.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next())
{
for(KeyValue keyValue : result.list()) {
System.out.println("Qualifier : " + Bytes.toString(keyValue.getKey()) + " : Value : " + Bytes.toString(keyValue.getValue()));
}
}
Qualifier : $dba190f6-ff45-4d5b-bf2f-d2ea75bb528fbookauthorT�� : Value : Frans Hoffman
Qualifier : $dba190f6-ff45-4d5b-bf2f-d2ea75bb528fbooknameT�� : Value : h32
When I check them, keyValue.getKey() and keyValue.getValue() show every value in the table.
Is it possible to get a specific value or qualifier? For example, I only need the values of name.
My understanding:
You want to read the value of only one column, name.
HTable hTable = new HTable(hConn.config, "contribute");
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("author"));
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("name"));
ResultScanner scanner = hTable.getScanner(scan);
// Iterate with the ResultScanner (scanner), not the Scan object.
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
  NavigableMap<byte[], byte[]> familyMap = rr.getFamilyMap(Bytes.toBytes("book"));
  byte[] name = familyMap.get(Bytes.toBytes("name"));
  String bookName = Bytes.toString(name); // the value is now in a variable
  System.out.println(bookName);
}
scanner.close();
In HBase I have a number of columns: name, city, ...
Not all columns have values (some rows have only 'name', for example).
I want to extract all columns of a row, plus the timestamp of each column, in a specific order. If a value is null, I want to return an empty string.
The problem I am facing is that I must access each column in the Result by 'family' and 'qualifier'. I can't access it by index with result.listCells().get(i), because null values are skipped.
scan.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("name"));
scan.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("city"));
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()){
byte[] valCity = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("city")); // can't get the timestamp this way
//check if valCity null write ("") else write the value
//next column...
}
You can try to use a CellScanner for this. See example below:
CellScanner cellScanner = result.cellScanner();
while (cellScanner.advance()) {
Cell cell = cellScanner.current();
byte[] columnName = Bytes.copy(cell.getQualifierArray(),
cell.getQualifierOffset(),
cell.getQualifierLength());
byte[] familyName = Bytes.copy(cell.getFamilyArray(),
cell.getFamilyOffset(),
cell.getFamilyLength());
long timestamp = cell.getTimestamp();
.....
}
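The question also asks for a fixed column order with an empty string for missing columns. A CellScanner only visits cells that exist, so one option is to look each wanted column up by name and substitute a default. The sketch below shows that idea with the JDK only. `VersionedValue`, `project` and the `value@timestamp` output format are hypothetical names invented for illustration, and the plain `Map` stands in for an HBase `Result`. In the real client, a per-column lookup such as `Result.getColumnLatestCell(family, qualifier)` would play the role of `row.get(...)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FixedColumnOrder {
    /** Hypothetical stand-in for an HBase Cell: a value plus its timestamp. */
    static final class VersionedValue {
        final String value;
        final long timestamp;

        VersionedValue(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Emits one "value@timestamp" field per wanted qualifier, in the given order.
     * A qualifier that the row does not contain yields an empty string.
     */
    static List<String> project(Map<String, VersionedValue> row, List<String> wanted) {
        List<String> out = new ArrayList<>();
        for (String qualifier : wanted) {
            VersionedValue cell = row.get(qualifier); // null when the column is absent
            out.add(cell == null ? "" : cell.value + "@" + cell.timestamp);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, VersionedValue> row = new HashMap<>();
        row.put("name", new VersionedValue("Mary", 1339304052748L));
        // "city" is deliberately missing from this row.
        System.out.println(project(row, Arrays.asList("name", "city")));
    }
}
```

Driving the loop from the list of wanted columns, rather than from the cells that happen to be present, is what keeps the output positions stable.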
How do I return all timestamped versions of an HBase cell with the Get.setMaxVersions(10) method where 10 is an arbitrary number (could be something else like 20 or 5)? The following is a console main method that creates a table, inserts 10 random integers, and tries to retrieve all of them to print out.
public static void main(String[] args)
throws ZooKeeperConnectionException, MasterNotRunningException, IOException, InterruptedException {
final String HBASE_ZOOKEEPER_QUORUM_IP = "localhost.localdomain"; //set ip in hosts file
final String HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT = "2181";
final String HBASE_MASTER = HBASE_ZOOKEEPER_QUORUM_IP + ":60010";
//identify a data cell with these properties
String tablename = "characters";
String row = "johnsmith";
String family = "capital";
String qualifier = "A";
//config
Configuration config = HBaseConfiguration.create();
config.clear();
config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_IP);
config.set("hbase.zookeeper.property.clientPort", HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT);
config.set("hbase.master", HBASE_MASTER);
//admin
HBaseAdmin hba = new HBaseAdmin(config);
//create a table
HTableDescriptor descriptor = new HTableDescriptor(tablename);
descriptor.addFamily(new HColumnDescriptor(family));
hba.createTable(descriptor);
hba.close();
//get the table
HTable htable = new HTable(config, tablename);
//insert 10 different timestamps into 1 record
for(int i = 0; i < 10; i++) {
String value = Integer.toString(i);
Put put = new Put(Bytes.toBytes(row));
put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), System.currentTimeMillis(), Bytes.toBytes(value));
htable.put(put);
Thread.sleep(200); //make sure each timestamp is different
}
//get 10 timestamp versions of 1 record
final int MAX_VERSIONS = 10;
Get get = new Get(Bytes.toBytes(row));
get.setMaxVersions(MAX_VERSIONS);
Result result = htable.get(get);
byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)); // returns MAX_VERSIONS quantity of values
String output = Bytes.toString(value);
//show me what you got
System.out.println(output); //prints 9 instead of 0 through 9
}
The output is 9 (because the loop ended at i=9), and I don't see multiple versions in Hue's HBase Browser web UI. What can I do so that I get 10 individual results, 0 through 9, instead of a single result containing only the number 9?
Use getColumnCells on Result to get all versions (up to the maximum you set on the Get with setMaxVersions). getValue returns only the latest value.
Sample Code:
List<Cell> values = result.getColumnCells(Bytes.toBytes(family), Bytes.toBytes(qualifier));
for ( Cell cell : values )
{
System.out.println( Bytes.toString( CellUtil.cloneValue( cell ) ) );
}
This is a deprecated approach which matches the version of HBase I am currently working on.
List<KeyValue> kvpairs = result.getColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
String line = "";
for(KeyValue kv : kvpairs) {
line += Bytes.toString(kv.getValue()) + "\n";
}
System.out.println(line);
Going one step further, note that setMaxVersions must also be called on the column family descriptor at table creation, otherwise a cell keeps no more than the default number of versions (three in this HBase version; releases from 0.96 on default to one). Here's the updated table creation:
//create a table based on variables from question above
HTableDescriptor tableDescriptor = new HTableDescriptor(tablename);
HColumnDescriptor columnDescriptor = new HColumnDescriptor(family);
columnDescriptor.setMaxVersions(MAX_VERSIONS);
tableDescriptor.addFamily(columnDescriptor);
hba.createTable(tableDescriptor);
hba.close();
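To see why both settings matter, here is a deliberately simplified, JDK-only model of one cell. It is not HBase code: `VersionedCellSketch`, `put` and `read` are hypothetical names, and real HBase discards surplus versions during compaction rather than on every write. The point it illustrates is that the column family limit caps what is kept, and the limit on the Get caps what is returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class VersionedCellSketch {
    private final int maxVersions; // models the column family setting
    // Newest timestamp first, like the versions of an HBase cell.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in reverse order = oldest version
        }
    }

    /** Returns at most 'requested' values, newest first (models Get.setMaxVersions). */
    List<String> read(int requested) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == requested) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch narrow = new VersionedCellSketch(3);
        VersionedCellSketch wide = new VersionedCellSketch(10);
        for (int i = 0; i < 10; i++) {
            narrow.put(i, Integer.toString(i));
            wide.put(i, Integer.toString(i));
        }
        System.out.println("family keeps 3:  " + narrow.read(10));
        System.out.println("family keeps 10: " + wide.read(10));
    }
}
```

Asking for 10 versions from the cell that keeps only three still yields three values, which matches the behaviour described above.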
I have the following table structure in Hbase:
Row column+cell
Mary_Ann_05/10/2013 column=cf:verified, timestamp=234454454,value=2,2013-02-12
Mary_Ann_06/10/2013 column=cf:verified, timestamp=2345454454,value=3,2013-02-12
Mary_Ann_07/10/2013 column=cf:verified, timestamp=2345454522454,value=4,2013-02-12
Mary_Ann_08/10/2013 column=cf:verified, timestamp=23433333454,value=1,2013-12-12
I want to retrieve all the records whose row key starts with Mary_Ann, using Java. How do I do that?
You could achieve that using PrefixFilter. Given a prefix, specified when you instantiate the filter instance, all rows that match this prefix are returned to the client. The constructor is : public PrefixFilter(byte[] prefix)
Usage :
Filter filter = new PrefixFilter(Bytes.toBytes("Mary_Ann"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
for (KeyValue kv : result.raw()) {
System.out.println("KV: " + kv + ", Value: " +
Bytes.toString(kv.getValue()));
}
}
scanner.close();
HTH
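Prefix matching works well in HBase because rows are stored sorted by key, so all keys sharing a prefix sit next to each other. The sketch below models that with a JDK `TreeMap`. It is an illustration under assumptions, not HBase code: `PrefixScanSketch` and `rowsWithPrefix` are hypothetical names, and it assumes ASCII keys so that Java's String ordering agrees with HBase's byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixScanSketch {
    /** Returns the keys that start with 'prefix', in sorted order. */
    static List<String> rowsWithPrefix(TreeMap<String, String> table, String prefix) {
        List<String> hits = new ArrayList<>();
        // Seek to the first key >= prefix instead of walking the table from the top.
        for (Map.Entry<String, String> entry : table.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // keys are sorted, so nothing after this point can match
            }
            hits.add(entry.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("Lucy_01/10/2013", "1");
        table.put("Mary_Ann_05/10/2013", "2");
        table.put("Mary_Ann_06/10/2013", "3");
        table.put("Mary_Bob_01/10/2013", "4");
        System.out.println(rowsWithPrefix(table, "Mary_Ann"));
    }
}
```

The same reasoning suggests that, on a large table, giving the Scan a start row equal to the prefix in addition to the PrefixFilter may save reading rows that sort before the first match.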
I have an HBase table that I access from Java, and I want to query it by a list of keys. I did the following, but it's not working.
mFilterFeatureIt = mFeatureSet.iterator();
FilterList filterList=new FilterList(FilterList.Operator.MUST_PASS_ONE);
while (mFilterFeatureIt.hasNext()) {
long myfeatureId = mFilterFeatureIt.next();
System.out.println("FeatureId:"+myfeatureId+" , ");
RowFilter filter = new RowFilter(CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(myfeatureId)) );
filterList.addFilter(filter);
}
outputMap = HbaseUtils.getHbaseData("mytable", filterList);
System.out.println("Size of outputMap map:"+ outputMap.size());
public static Map<String, Map<String, String>> getHbaseData(String table, FilterList filter) {
Map<String, Map<String, String>> data = new HashMap<String, Map<String, String>>();
HTable htable = null;
try {
htable = new HTable(HTableConfiguration.getHTableConfiguration(),table);
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner resultScanner = htable.getScanner(scan);
Iterator<Result> results = resultScanner.iterator();
while (results.hasNext()) {
Result result = results.next();
String rowId = Bytes.toString(result.getRow());
List<KeyValue> columns = result.list();
if (null != columns) {
HashMap<String, String> colData = new HashMap<String, String>();
for (KeyValue column : columns) {
colData.put(Bytes.toString(column.getFamily()) + ":"+ Bytes.toString(column.getQualifier()),Bytes.toString(column.getValue()));
}
data.put(rowId, colData);
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (htable != null)
try {
htable.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return data;
}
FeatureId:80515900 ,
FeatureId:80515901 ,
FeatureId:80515902 ,
Size of outputMap map: 0
I can see that the feature id values are what I want, but I always get the above output even though the keys are present in the HBase table. Can anyone tell me what I am doing wrong?
EDIT:
I have also posted the code of my HBase util method above, so that you can point out any bugs there.
I am trying to do the SQL equivalent of select * FROM mytable where featureId in (80515900, 80515901, 80515902). My idea for achieving the same in HBase was to create a filter list with one filter per featureId. Is that correct?
Here is the content of my table
scan 'mytable', {COLUMNS => ['sample:tag_count'] }
80515900 column=sample:tag_count, timestamp=1339304052748, value=4
80515901 column=sample:tag_count, timestamp=1339304052748, value=0
80515902 column=sample:tag_count, timestamp=1339304052748, value=3
80515903 column=sample:tag_count, timestamp=1339304052748, value=1
80515904 column=sample:tag_count, timestamp=1339304052748, value=2
It's not returning any data because the row keys were inserted into HBase as Strings (see your scan result), while the value passed to RowFilter when fetching is a long. Convert the id to a String first:
RowFilter filter = new RowFilter(CompareOp.EQUAL,
    new BinaryComparator(Bytes.toBytes(Long.toString(myfeatureId))));
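The mismatch is easy to see without a cluster. The sketch below uses only the JDK and assumes, as is the case for HBase's `Bytes` utility, that `Bytes.toBytes(long)` writes the 8-byte big-endian binary form while `Bytes.toBytes(String)` writes the UTF-8 characters. `RowKeyEncodingSketch`, `longKey` and `stringKey` are hypothetical names used only here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowKeyEncodingSketch {
    /** Binary form of a long: 8 bytes, big-endian (what a long-typed key looks like). */
    static byte[] longKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    /** Text form of the same id: the UTF-8 bytes of its decimal digits. */
    static byte[] stringKey(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long id = 80515900L;
        System.out.println("as long:   " + Arrays.toString(longKey(id)));
        System.out.println("as String: " + Arrays.toString(stringKey(id)));
        System.out.println("equal?     " + Arrays.equals(longKey(id), stringKey(id)));
    }
}
```

A BinaryComparator compares raw bytes, so a filter built from one encoding cannot match row keys written with the other, even though both print as the same number.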
The while loop creates a new filter on every iteration and adds it to the filter list, so the list ends up holding a filter for every key.
To match a single row, create only one filter in the while loop, for a known "myfeatureId".
while (mFilterFeatureIt.hasNext()) {
long myfeatureId = mFilterFeatureIt.next();
System.out.println("FeatureId:"+myfeatureId+" , ");
if (myfeatureId == 80515902L) { // compare long with long, not with a String
RowFilter filter = new RowFilter(CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(myfeatureId)) );
filterList.addFilter(filter);
}
}
EDIT
The number of rows returned is determined by the query, not by HBase.
HBase filters
Filters push the row selection criteria out to HBase. Rows can then be filtered remotely and in parallel, which avoids sending rows to the client that are not needed.
To match on part of the key, for example to get all rows from 80515900 to 80515909, try the following.
First, remove this from the loop:
RowFilter filter = new RowFilter(CompareOp.EQUAL,new BinaryComparator(Bytes.toBytes(myfeatureId)) );
filterList.addFilter(filter);
Then add the following above the line outputMap = HbaseUtils.getHbaseData("mytable", filterList);
....
RowFilter filter = new RowFilter(CompareOp.EQUAL,new SubstringComparator("8051590"));
filterList.addFilter(filter);
outputMap = HbaseUtils.getHbaseData("mytable", filterList);