HBase: extracting value and timestamp from a cell - Java

In HBase I have a number of columns: name, city, ...
Not all columns have values (some rows may have only 'name', for example).
I want to extract all columns in a row plus the timestamp of each column (in a specific order); if a value is null, I want to return an empty string.
The problem I am facing is that I must access each column in the Result by family and qualifier (I can't access it by index, as in result.listCells().get(i), because null values are skipped).
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("name"));
scan.addColumn(Bytes.toBytes("personal data"), Bytes.toBytes("city"));
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()) {
    // getValue returns only the value bytes, so the timestamp can't be read this way
    byte[] valCity = result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("city"));
    // if valCity is null write "", else write the value
    // next column...
}

You can try to use a CellScanner for this. See example below:
CellScanner cellScanner = result.cellScanner();
while (cellScanner.advance()) {
    Cell cell = cellScanner.current();
    // copy the qualifier and family bytes out of the cell's backing arrays
    byte[] columnName = Bytes.copy(cell.getQualifierArray(),
            cell.getQualifierOffset(),
            cell.getQualifierLength());
    byte[] familyName = Bytes.copy(cell.getFamilyArray(),
            cell.getFamilyOffset(),
            cell.getFamilyLength());
    long timestamp = cell.getTimestamp();
    // ...
}

Related

while loop with next value

I want to collect the keys and values in the while loop:
while (rs.next()) {
    String simpleData = "<SimpleData name=\"akey\">avalue</SimpleData>\n";
}
I need all the keys and values. If the result set has 10 values, simpleData should contain all of them, as shown below.
Output: my final string should look like this:
String simpleData = "<SimpleData name=\"acolumnname\">acolumnvalue</SimpleData>
<SimpleData name=\"bcolumnname\">bcolumnvalue</SimpleData>
…";
How can I achieve this?
If you want to create an XML structure by hand (that is, without using a suitable library), you can try something like this:
public static void main(String[] args) throws SQLException {
    ResultSet rs = null; // however you get it
    // get the metadata of the result set; it includes the column headers
    ResultSetMetaData resultSetMetaData = rs.getMetaData();
    // and get the first column header
    String columnHeader = resultSetMetaData.getColumnLabel(1);
    // initialize an empty StringBuilder OUTSIDE the loop
    StringBuilder sb = new StringBuilder();
    // then loop through the result set
    while (rs.next()) {
        // appending the results to the StringBuilder
        sb.append("<SimpleData name=\"") // opening tag plus XML attribute name
                .append(columnHeader) // column header as determined before the loop
                .append("\">") // close the attribute value and the opening tag
                .append(rs.getString(1)) // get the value from the result set
                .append("</SimpleData>") // write the closing tag
                .append(System.lineSeparator()); // append a line break
    }
    System.out.println(sb.toString());
}
This should print an XML structure (hopefully the desired one):
<SimpleData name="column header">value</SimpleData>
EDIT
It turned out that you want to create a single XML node for each column value of a result set that has only one row. That's (almost entirely) different...
I would then access the columns by their alias (header / label) instead of their index:
public static void main(String[] args) throws SQLException {
    ResultSet rs = null; // however you get it
    // create a container for the headers
    List<String> columnHeaders = new ArrayList<>();
    // get the metadata of the result set; it includes the column headers
    ResultSetMetaData resultSetMetaData = rs.getMetaData();
    // determine the number of columns
    int columnCount = resultSetMetaData.getColumnCount();
    // iterate over them and store their labels in a list of strings
    for (int i = 1; i <= columnCount; i++) {
        columnHeaders.add(resultSetMetaData.getColumnLabel(i));
    }
    // initialize an empty StringBuilder OUTSIDE the loop
    StringBuilder sb = new StringBuilder();
    // then loop through the result set
    while (rs.next()) {
        // now loop through the columnHeaders
        for (String header : columnHeaders) {
            // append each column result to the StringBuilder as a single XML node
            sb.append("<SimpleData name=\"") // opening tag plus XML attribute name
                    .append(header) // column header as determined before the loop
                    .append("\">") // close the attribute value and the opening tag
                    .append(rs.getString(header)) // get the value from the result set by header, not index
                    .append("</SimpleData>") // write the closing tag
                    .append(System.lineSeparator()); // append a line break
        }
    }
    System.out.println(sb.toString());
}
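One caveat with building XML by hand as above: column values are appended verbatim, so a value containing &, < or a quote would produce malformed XML. A minimal sketch of an escaping step follows. The class SimpleDataXml and its methods escape and node are hypothetical names introduced only for illustration, and rendering a null value as empty text is an assumption, not something the answer specifies.

```java
// Hypothetical helper, not part of the answer above: escapes the five
// XML special characters so values can be embedded in hand-built markup.
final class SimpleDataXml {

    /** Replaces &, <, >, " and ' with their predefined XML entities. */
    static String escape(String raw) {
        if (raw == null) {
            return ""; // assumption: a SQL NULL is rendered as empty text
        }
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds one SimpleData node from a column label and its value. */
    static String node(String name, String value) {
        return "<SimpleData name=\"" + escape(name) + "\">"
                + escape(value) + "</SimpleData>";
    }

    public static void main(String[] args) {
        // prints <SimpleData name="company">Smith &amp; Sons &lt;Ltd&gt;</SimpleData>
        System.out.println(node("company", "Smith & Sons <Ltd>"));
    }
}
```

In the loops above, sb.append(SimpleDataXml.node(header, rs.getString(header))) could stand in for the chain of append calls, if escaping is wanted.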
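Declare simpleData outside the while loop and append to it in every iteration with +=:
String simpleData = "";
ResultSetMetaData meta = rs.getMetaData();
while (rs.next()) {
    // JDBC column indexes start at 1; the column label is used as the key
    for (int i = 1; i <= meta.getColumnCount(); i++) {
        simpleData += "<SimpleData name=\"" + meta.getColumnLabel(i) + "\">" + rs.getString(i) + "</SimpleData>\n";
    }
}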

HBase scan values without their id

I am trying to create a list from HBase in Java. I can get all the values together, but I am confused about how to assign them to variables (String bookName = ..., String bookAuthor = ...).
I need to get all the values in the contribute table and assign them to variables.
The contribute table contains id, author, and name.
HTable hTable = new HTable(hConn.config, "contribute");
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("author"));
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("name"));
ResultScanner scanner = hTable.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()) {
    for (KeyValue keyValue : result.list()) {
        System.out.println("Qualifier : " + Bytes.toString(keyValue.getKey()) + " : Value : " + Bytes.toString(keyValue.getValue()));
    }
}
Qualifier : $dba190f6-ff45-4d5b-bf2f-d2ea75bb528fbookauthorT�� : Value : Frans Hoffman
Qualifier : $dba190f6-ff45-4d5b-bf2f-d2ea75bb528fbooknameT�� : Value : h32
When I check them, keyValue.getKey() and keyValue.getValue() show every value in the table.
Is it possible to get a specific value/qualifier? For example, I just need the values of name.
My understanding: you want to get only one column, name.
HTable hTable = new HTable(hConn.config, "contribute");
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("author"));
scan.addColumn(Bytes.toBytes("book"), Bytes.toBytes("name"));
ResultScanner scanner = hTable.getScanner(scan);
// iterate over the scanner (not the Scan object)
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
    // map of qualifier -> latest value for the "book" family
    NavigableMap<byte[], byte[]> familyMap = rr.getFamilyMap(Bytes.toBytes("book"));
    byte[] name = familyMap.get(Bytes.toBytes("name"));
    System.out.println(Bytes.toString(name)); // this can be assigned to a variable
}

Java - Parse delimited file and find column datatypes

Is it possible to parse a delimited file and determine the column data types? For example:
Delimited file:
Email,FirstName,DOB,Age,CreateDate
test#test1.com,Test User1,20/01/2001,24,23/02/2015 14:06:45
test#test2.com,Test User2,14/02/2001,24,23/02/2015 14:06:45
test#test3.com,Test User3,15/01/2001,24,23/02/2015 14:06:45
test#test4.com,Test User4,23/05/2001,24,23/02/2015 14:06:45
Output:
Email datatype: email
FirstName datatype: Text
DOB datatype: date
Age datatype: int
CreateDate datatype: Timestamp
The purpose of this is to read a delimited file, construct a table creation query on the fly, and insert the data into that table.
I tried using Apache Commons Validator; I believe we need to parse the complete file in order to determine each column's data type.
EDIT: The code that I've tried:
CSVReader csvReader = new CSVReader(new FileReader(fileName), ',');
String[] row = null;
int[] colLength = null;
int colCount = 0;
String[] colDataType = null;
String[] colHeaders = null;
String[] header = csvReader.readNext();
if (header != null) {
    colCount = header.length;
}
colLength = new int[colCount];
colDataType = new String[colCount];
colHeaders = new String[colCount];
for (int i = 0; i < colCount; i++) {
    colHeaders[i] = header[i];
}
int templength = 0;
String tempType = null;
IntegerValidator intValidator = new IntegerValidator();
DateValidator dateValidator = new DateValidator();
TimeValidator timeValidator = new TimeValidator();
while ((row = csvReader.readNext()) != null) {
    for (int i = 0; i < colCount; i++) {
        // track the longest value seen in each column
        templength = row[i].length();
        colLength[i] = templength > colLength[i] ? templength : colLength[i];
        if (colHeaders[i].equalsIgnoreCase("email")) {
            logger.info("Col " + i + " is Email");
        } else if (intValidator.isValid(row[i])) {
            tempType = "Integer";
            logger.info("Col " + i + " is Integer");
        } else if (timeValidator.isValid(row[i])) {
            tempType = "Time";
            logger.info("Col " + i + " is Time");
        } else if (dateValidator.isValid(row[i])) {
            tempType = "Date";
            logger.info("Col " + i + " is Date");
        } else {
            tempType = "Text";
            logger.info("Col " + i + " is Text");
        }
        logger.info(row[i].length() + "");
    }
}
I'm not sure if this is the best way of doing this; any pointers in the right direction would be helpful.
If you wish to write this yourself rather than use a third party library then probably the easiest mechanism is to define a regular expression for each data type and then check if all fields satisfy it. Here's some sample code to get you started (using Java 8).
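import java.util.Arrays;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public enum DataType {
    // order matters: the first type that all values satisfy wins
    DATETIME("\\d{2}/\\d{2}/\\d{4} \\d{2}:\\d{2}:\\d{2}"),
    DATE("\\d{2}/\\d{2}/\\d{4}"),
    EMAIL("\\w+#\\w+"),
    TEXT(".*");

    private final Predicate<String> tester;

    DataType(String regexp) {
        tester = Pattern.compile(regexp).asPredicate();
    }

    // returns the first type whose pattern is found in every value of the column
    public static Optional<DataType> getTypeOfField(String[] fieldValues) {
        return Arrays.stream(values())
                .filter(dt -> Arrays.stream(fieldValues).allMatch(dt.tester))
                .findFirst();
    }
}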
Note that this relies on the order of the enum values (e.g. testing for datetime before date).
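The reason the order matters is that Pattern.asPredicate() tests with Matcher.find(), so an unanchored date pattern is also found inside a timestamp. The sketch below illustrates the difference between find-style and whole-string matching. The class FindVersusMatch, its method names, and the date pattern \d{2}/\d{2}/\d{4} are assumptions chosen for illustration.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts find-style and whole-string matching.
final class FindVersusMatch {

    // Assumed date pattern (dd/MM/yyyy digits only, no range checks).
    static final Pattern DATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    /** True if the date pattern occurs anywhere in the value (Matcher.find()). */
    static boolean containsDate(String value) {
        Predicate<String> finder = DATE.asPredicate();
        return finder.test(value);
    }

    /** True only if the entire value is a date (Matcher.matches()). */
    static boolean isWholeDate(String value) {
        return DATE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String timestamp = "23/02/2015 14:06:45";
        System.out.println(containsDate(timestamp));   // true: a date is found inside
        System.out.println(isWholeDate(timestamp));    // false: trailing time part
        System.out.println(isWholeDate("20/01/2001")); // true
    }
}
```

With whole-string matching, the date and datetime rules would no longer overlap, so their relative order would stop mattering; a catch-all text rule would still have to come last.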
Yes, it is possible, and you do have to parse the entire file first. Define a set of rules for each data type, then iterate over every row of the column. Start with every data type as a candidate for the column, and eliminate a data type as soon as a row violates one of its rules. After iterating over the column, check which data types are left.
For example, say we have two data types, integer and text. The rule for integer is that the value must contain only the digits 0-9 and may begin with '-'. Text can be anything.
Our column:
345
-1ab
123
The integer data type would be eliminated by the second row, so the column would be text. If row two were just -1, you would be left with both integer and text, and the column would be integer: text is never eliminated, because our rule says text can be anything. You don't have to check for text at all; if no other data type is left, the answer is text. Hope this answers your question.
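The elimination idea described above can be sketched in a few lines of Java. This is only an illustration under assumptions: the class ColumnTypeElimination, the enum Candidate, and the two regular expressions are hypothetical, and only the integer and text rules from the example are modelled.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of type inference by elimination.
final class ColumnTypeElimination {

    // Candidate types, most specific first; TEXT accepts anything.
    enum Candidate {
        INTEGER("-?[0-9]+"),
        TEXT("(?s).*");

        private final Pattern rule;

        Candidate(String regex) {
            this.rule = Pattern.compile(regex);
        }

        boolean accepts(String value) {
            return rule.matcher(value).matches(); // whole value must satisfy the rule
        }
    }

    /** Starts with every candidate and removes each one that a value violates. */
    static Candidate infer(List<String> columnValues) {
        EnumSet<Candidate> remaining = EnumSet.allOf(Candidate.class);
        for (String value : columnValues) {
            remaining.removeIf(c -> !c.accepts(value));
        }
        // EnumSet iterates in declaration order, so this is the most specific
        // survivor; TEXT always survives, so the set is never empty.
        return remaining.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(infer(Arrays.asList("345", "-1ab", "123"))); // TEXT
        System.out.println(infer(Arrays.asList("345", "-1", "123")));   // INTEGER
    }
}
```

Further types (date, timestamp, and so on) could be added as extra enum constants placed before TEXT, each with its own rule.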
I needed similar logic for my project. I searched a lot but did not find the right solution. I need to pass a String to a method that returns the data type of the value. Finally I found the post from @sprinter; it is similar to my logic, but I need to pass a single string instead of a string array.
I modified the code for my needs and posted it below.
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

public enum DataType {
    DATE("\\d{2}/\\d{2}/\\d{4}"),
    EMAIL("#gmail"),
    NUMBER("[0-9]+"),
    STRING("^[A-Za-z0-9? ,_-]+$");

    private final String regEx;

    public String getRegEx() {
        return regEx;
    }

    DataType(String regEx) {
        this.regEx = regEx;
    }

    // returns the first type whose pattern matches the whole string
    public static Optional<DataType> getTypeOfField(String str) {
        return Arrays.stream(DataType.values())
                .filter(dt -> Pattern.compile(dt.getRegEx()).matcher(str).matches())
                .findFirst();
    }
}
For example:
Optional<DataType> dataType = DataType.getTypeOfField("Bharathiraja");
System.out.println(dataType);
System.out.println(dataType.get());
Output:
Optional[STRING]
STRING
Please note that the regular expression patterns vary with requirements, so modify them for your needs rather than using them as is.
Happy coding!

How to get timestamped versions of HBase cell

How do I return all timestamped versions of an HBase cell with the Get.setMaxVersions(10) method where 10 is an arbitrary number (could be something else like 20 or 5)? The following is a console main method that creates a table, inserts 10 random integers, and tries to retrieve all of them to print out.
public static void main(String[] args)
        throws ZooKeeperConnectionException, MasterNotRunningException, IOException, InterruptedException {
    final String HBASE_ZOOKEEPER_QUORUM_IP = "localhost.localdomain"; // set IP in hosts file
    final String HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT = "2181";
    final String HBASE_MASTER = HBASE_ZOOKEEPER_QUORUM_IP + ":60010";
    // identify a data cell with these properties
    String tablename = "characters";
    String row = "johnsmith";
    String family = "capital";
    String qualifier = "A";
    // config
    Configuration config = HBaseConfiguration.create();
    config.clear();
    config.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM_IP);
    config.set("hbase.zookeeper.property.clientPort", HBASE_ZOOKEEPER_PROPERTY_CLIENTPORT);
    config.set("hbase.master", HBASE_MASTER);
    // admin
    HBaseAdmin hba = new HBaseAdmin(config);
    // create a table
    HTableDescriptor descriptor = new HTableDescriptor(tablename);
    descriptor.addFamily(new HColumnDescriptor(family));
    hba.createTable(descriptor);
    hba.close();
    // get the table
    HTable htable = new HTable(config, tablename);
    // insert 10 different timestamps into 1 record
    for (int i = 0; i < 10; i++) {
        String value = Integer.toString(i);
        Put put = new Put(Bytes.toBytes(row));
        put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), System.currentTimeMillis(), Bytes.toBytes(value));
        htable.put(put);
        Thread.sleep(200); // make sure each timestamp is different
    }
    // get 10 timestamp versions of 1 record
    final int MAX_VERSIONS = 10;
    Get get = new Get(Bytes.toBytes(row));
    get.setMaxVersions(MAX_VERSIONS);
    Result result = htable.get(get);
    byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)); // returns MAX_VERSIONS quantity of values
    String output = Bytes.toString(value);
    // show me what you got
    System.out.println(output); // prints 9 instead of 0 through 9
}
The output is 9 (because the loop ended at i=9), and I don't see multiple versions in Hue's HBase Browser web UI. What can I do to fix the versions so that I get 10 individual results, 0 through 9, instead of a single result of 9?
You should use getColumnCells on Result to get all versions (up to the maximum number of versions you set on the Get). getValue returns only the latest value.
Sample Code:
List<Cell> values = result.getColumnCells(Bytes.toBytes(family), Bytes.toBytes(qualifier));
for (Cell cell : values) {
    System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
}
This is a deprecated approach which matches the version of HBase I am currently working on.
List<KeyValue> kvpairs = result.getColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
String line = "";
for (KeyValue kv : kvpairs) {
    line += Bytes.toString(kv.getValue()) + "\n";
}
System.out.println(line);
Going one step further, it is important to note that setMaxVersions must also be called on the column family at table creation to allow more than the default three versions to be stored in a cell. Here's the updated table creation:
// create a table based on the variables from the question above
HTableDescriptor tableDescriptor = new HTableDescriptor(tablename);
HColumnDescriptor columnDescriptor = new HColumnDescriptor(family);
columnDescriptor.setMaxVersions(MAX_VERSIONS);
tableDescriptor.addFamily(columnDescriptor);
hba.createTable(tableDescriptor);
hba.close();

How to read data from Hbase?

Hi there, I'm used to SQL, but I need to read data from an HBase table. Any help with this would be great: a book, or maybe just some sample code that reads from the table. Someone said a scanner would do the trick, but I do not know how to use it.
From the website:
// Sometimes, you won't know the row you're looking for. In this case, you
// use a Scanner. This will give you cursor-like interface to the contents
// of the table. To set up a Scanner, do like you did above making a Put
// and a Get, create a Scan. Adorn it with column names, etc.
Scan s = new Scan();
s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
ResultScanner scanner = table.getScanner(s);
try {
    // Scanners return Result instances.
    // Now, for the actual iteration. One way is to use a loop like so:
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        // print out the row we found and the columns we were looking for
        System.out.println("Found row: " + rr);
    }
    // The other approach is to use a foreach loop. Scanners are iterable!
    // for (Result rr : scanner) {
    //     System.out.println("Found row: " + rr);
    // }
} finally {
    // Make sure you close your scanners when you are done!
    // That's why we have it inside a try/finally clause
    scanner.close();
}
I would like to offer a solution without deprecated methods:
Configuration conf = HBaseConfiguration.create();
Connection connection = ConnectionFactory.createConnection(conf);
Admin admin = connection.getAdmin();
// list the tables
Arrays.stream(admin.listTables()).forEach(System.out::println);
// let's insert some data in 'test_1' and get the row
TableName tableName = TableName.valueOf("test_1");
Table table = connection.getTable(tableName);
// Put
Put thePut = new Put(Bytes.toBytes("rowkey1"));
String columnFamily = "m";
String columnQualifier1 = "col1";
String outValue1 = "value1";
String columnQualifier2 = "col2";
String outValue2 = "value2";
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1), Bytes.toBytes(outValue1));
thePut.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier2), Bytes.toBytes(outValue2));
table.put(thePut);
// Get
Get theGet = new Get(Bytes.toBytes("rowkey1"));
Result result = table.get(theGet);
// get the value of the first column
String inValue1 = Bytes.toString(result.value());
// get a value by column family and qualifier
byte[] inValueByte = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQualifier1));
String inValue2 = Bytes.toString(inValueByte);
// loop over the cells of the result
for (Cell cell : result.listCells()) {
    String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
    String value = Bytes.toString(CellUtil.cloneValue(cell));
    System.out.printf("Qualifier : %s : Value : %s%n", qualifier, value);
}
// create a Map (qualifier -> value) from the result and print it
Map<String, String> getResult = result.listCells().stream().collect(Collectors.toMap(
        e -> Bytes.toString(CellUtil.cloneQualifier(e)),
        e -> Bytes.toString(CellUtil.cloneValue(e))));
getResult.entrySet().stream().forEach(e -> System.out.printf("Qualifier : %s : Value : %s%n", e.getKey(), e.getValue()));
System.out.println("---------Scan---------");
Scan scan = new Scan();
ResultScanner resultScan = table.getScanner(scan);
resultScan.forEach(e -> {
    System.out.printf("Row \"%s\"%n", Bytes.toString(e.getRow()));
    Map<String, String> getResultScan = e.listCells().stream().collect(Collectors.toMap(
            d -> Bytes.toString(CellUtil.cloneQualifier(d)),
            d -> Bytes.toString(CellUtil.cloneValue(d))));
    getResultScan.entrySet().stream().forEach(d -> System.out.printf("column \"%s\", value \"%s\"%n", d.getKey(), d.getValue()));
    System.out.println();
});
I used that, but to get the String value you must use the getValue method of Result:
byte[] bytes = rr.getValue(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"));
System.out.println(new String(bytes));
