HBase Filters: SingleColumnValueFilter & ColumnPrefixFilter (Java)

I have an HBase table, Documents.
Suppose I have entries for one user like the below:
ROW_KEY  COLUMN
R1       "A:PRDDOC123" - 012021
         "A:DATADOC123" - Dummy JSON
R2       "A:PRDDOC124" - 022021
         "A:DATADOC124" - Dummy JSON
R3       "A:PRDDOC125" - 012021
         "A:DATADOC125" - Dummy JSON
Each row key has two columns, PRD and DATA, suffixed with the document number, i.e. DOC123, DOC124 and DOC125.
I need to fetch the rows which have a PRD value of 012021, but I don't have access to the document numbers.
If I use ColumnPrefixFilter, it fetches all rows for the user. I am using the code below:
Table table = getConnection().getTable(TableName.valueOf("DOCUMENTS"));
PrefixFilter rowFilter = new PrefixFilter(Bytes.toBytes("USER_ID|FY"));
Scan scan = new Scan();
FilterList filterList = new FilterList();
filterList.addFilter(rowFilter);
ColumnPrefixFilter colPrefixFilter = new ColumnPrefixFilter(Bytes.toBytes("PRD"));
filterList.addFilter(colPrefixFilter);
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
If I use SingleColumnValueFilter, it fetches no rows, since I don't have the full column name. Code below:
Table table = getConnection().getTable(TableName.valueOf("DOCUMENTS"));
PrefixFilter rowFilter = new PrefixFilter(Bytes.toBytes("USER_ID|FY"));
Scan scan = new Scan();
FilterList filterList = new FilterList();
filterList.addFilter(rowFilter);
filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("A"), Bytes.toBytes("PRD"),
        CompareOp.EQUAL, Bytes.toBytes("012021")));
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
Is there any way I can fetch only the rows whose PRD column has the value 012021?
Can we somehow give a column prefix filter inside a column value filter?
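One possible approach (a sketch, assuming the PRD values are stored as UTF-8 strings): instead of SingleColumnValueFilter, which needs the exact qualifier, restrict the scan to PRD* qualifiers with ColumnPrefixFilter and match the cell value with a ValueFilter, combined in a MUST_PASS_ALL FilterList. Note that the returned rows then contain only the matching PRD cell; the DATA cell would need a second Get or Scan.
// Sketch: return rows whose "PRD*" column holds "012021".
// ColumnPrefixFilter narrows the qualifiers, ValueFilter matches the cell value.
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filters.addFilter(new PrefixFilter(Bytes.toBytes("USER_ID|FY")));   // row-key prefix
filters.addFilter(new ColumnPrefixFilter(Bytes.toBytes("PRD")));    // only PRD* qualifiers
filters.addFilter(new ValueFilter(CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("012021"))));            // cell value match
Scan scan = new Scan();
scan.setFilter(filters);
ResultScanner scanner = table.getScanner(scan);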

Related

HBase GREATER_OR_EQUAL operator does not work as expected

I have this row in my HBase table:
{rowkey:1_1611646861574/cf:value/1611646776287/Put/vlen=8/seqid=0}
Now I want to do a simple scan and find the rows whose column value is greater than "1611388300000", but it does not return any record; however, it does return this record when I use LESS_OR_EQUAL with 1611647500000.
The weird thing is that I can get this record with an Apache Phoenix SQL query: select * from my_table where value>=1611388300000.
So the number is obviously bigger than the column value and Apache Phoenix returns it; why does the HBase compare operator not?
Here is my code:
Table table = con.getTable(TableName.valueOf("my_table"));
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"));
FilterList flist = new FilterList(FilterList.Operator.MUST_PASS_ALL);
flist.addFilter(new PrefixFilter(Bytes.toBytes("1")));
flist.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("value"), CompareOperator.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes("1611388300000"))));
scan.setFilter(flist);
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()) {
    System.out.println("Found row : " + result);
}
scanner.close();
I don't know exactly why, but LESS_OR_EQUAL works with a string:
new SingleColumnValueFilter(
Bytes.toBytes("cf"), Bytes.toBytes("value"), CompareOperator.LESS_OR_EQUAL,
new BinaryComparator(Bytes.toBytes("1611647500000")))
but GREATER_OR_EQUAL did not work when I used quotes, so I had to use a number (the L suffix specifies a long):
new SingleColumnValueFilter(
Bytes.toBytes("ui"), Bytes.toBytes("C"), CompareOperator.GREATER_OR_EQUAL,
new BinaryComparator(Bytes.toBytes(1611388300000L)))
The row listing shows vlen=8, so the stored value is an 8-byte serialized long, while Bytes.toBytes("1611388300000") produces the 13 ASCII bytes of the string; BinaryComparator compares raw bytes lexicographically, so the two encodings do not order numerically against each other. Since you're comparing longs, you should use LongComparator instead of BinaryComparator:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/LongComparator.html
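A minimal sketch of the corrected filter, assuming the cell really holds a Bytes.toBytes(long) value:
// LongComparator deserializes the 8-byte value and compares it numerically,
// instead of comparing raw bytes lexicographically.
flist.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("value"),
        CompareOperator.GREATER_OR_EQUAL,
        new LongComparator(1611388300000L)));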

int values don't compare in HBase filter using SingleColumnValueFilter

Comparing int values for a specific qualifier isn't working.
I tried new BinaryComparator, but it doesn't work.
FilterList fl = new FilterList();
SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("dynamic"), Bytes.toBytes("speed"), CompareOp.GREATER_OR_EQUAL,
        new BinaryComparator(Bytes.toBytes(10)));
I tried both "10" and 10, but neither works.
fl.addFilter(filter);
Scan s = new Scan(Bytes.toBytes("235091871:2015-01-01T00:02:58"),
        Bytes.toBytes("235091871:2015-01-01T00:22:19"));
System.out.println("hello scan");
s.addColumn(Bytes.toBytes("dynamic"), Bytes.toBytes("speed"));
s.setFilter(filter);
ResultScanner rs = table.getScanner(s);
for (Result res : rs) {
    for (KeyValue result : res.raw()) {
        System.out.println(Bytes.toString(result.getRow()) + " " + Bytes.toString(result.getValue()));
    }
}
I have speed values like 10.3, 10.4, 11.1, 9.6 and 8.3.
The output should be 10.3, 10.4 and 11.1 when I filter with CompareOp.GREATER_OR_EQUAL.
If I need to write a custom comparator, please suggest how to do it with code.
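A hedged note on why this fails: Bytes.toBytes(10) serializes a 4-byte int, while speeds like 10.3 must have been stored as floats or strings, and BinaryComparator compares raw bytes lexicographically, so none of these encodings order numerically against each other. A server-side fix would mean deploying a custom ByteArrayComparable to every region server; a simpler sketch (assuming the speeds were written as strings) filters on the client:
// Client-side filtering sketch: assumes speed cells hold UTF-8 strings
// such as "10.3". Parses each value and compares it as a float.
float threshold = 10f;
Scan scan = new Scan(Bytes.toBytes("235091871:2015-01-01T00:02:58"),
        Bytes.toBytes("235091871:2015-01-01T00:22:19"));
scan.addColumn(Bytes.toBytes("dynamic"), Bytes.toBytes("speed"));
try (ResultScanner rs = table.getScanner(scan)) {
    for (Result res : rs) {
        byte[] raw = res.getValue(Bytes.toBytes("dynamic"), Bytes.toBytes("speed"));
        if (raw != null && Float.parseFloat(Bytes.toString(raw)) >= threshold) {
            System.out.println(Bytes.toString(res.getRow()) + " " + Bytes.toString(raw));
        }
    }
}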

How to compare a cell value with a string and count the frequency in HBase?

I am very new to HBase.
I have two column families, named "name" and "story type", with values like {Rajib, Clarke} and {photo, status, gif} respectively.
I want to scan the 'story type' column of the rows where the 'name' column has the value "Rajib", and count how often 'gif' occurs for "Rajib".
I have tried the code below, but it is not working. How can I do that?
Thanks in advance.
String columnfamily = "name";
String columnfamily1 = "story type";
int countgif = 0;
for (Result res : scanner) {
    if ("Clarke".equals(Bytes.toString(res.getValue(Bytes.toBytes(columnfamily), null)))) {
        if ("gif".equals(Bytes.toString(res.getValue(Bytes.toBytes(columnfamily1), null)))) {
            countgif = countgif + 1;
        }
    }
}
System.out.printf("%d", countgif);
Can you post some sample data, so your requirement is clearer?
As I understand your question, you need to filter the rows which have name 'clarke' and story type 'gif'.
This can be implemented using the Scan class with additional filters:
SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
        Bytes.toBytes("name"),
        Bytes.toBytes("name"),
        CompareOp.EQUAL,
        Bytes.toBytes("clarke"));
SingleColumnValueFilter filter_by_story = new SingleColumnValueFilter(
        Bytes.toBytes("story_type"),
        Bytes.toBytes("type"),
        CompareOp.EQUAL,
        Bytes.toBytes("gif"));
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filterList.addFilter(filter_by_name);
filterList.addFilter(filter_by_story);
Scan scan = new Scan();
scan.setFilter(filterList);
ResultScanner scanner = table.getScanner(scan);
Result result = scanner.next();
// If you are interested only in the count, just iterate and increment it.
int gifCount = 0;
while (result != null) {
    gifCount++;
    result = scanner.next();
}
System.out.println(gifCount);
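One caveat: by default SingleColumnValueFilter lets rows through when the checked column is missing entirely, so if some rows lack the 'name' or 'type' column, also set:
// Drop rows that do not contain the checked column at all
// (by default such rows pass the filter).
filter_by_name.setFilterIfMissing(true);
filter_by_story.setFilterIfMissing(true);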
Hope this resolves your problem.

Apache Spark: How to join multiple columns (features) from csv with Tokenizer in Java?

I have a CSV file with three columns: Id, Main_user and Users. Id is the label, and the other two values are features. Now I want to load the two features (Main_user and Users) from the CSV, vectorize them, and assemble them into one vector.
After using HashingTF as described in the documentation, how do I add a second feature, "Main_user", in addition to the feature "Users"?
DataFrame df = (new CsvParser()).withUseHeader(true).csvFile(sqlContext, csvFile);
Tokenizer tokenizer = new Tokenizer().setInputCol("Users").setOutputCol("words");
DataFrame wordsData = tokenizer.transform(df);
int numFeatures = 20;
HashingTF hashingTF = new HashingTF().setInputCol("words")
.setOutputCol("rawFeatures").setNumFeatures(numFeatures);
OK, I found a solution: load the columns one after another, tokenize, apply HashingTF, and at the end assemble them. I would appreciate any improvement to this.
DataFrame df = (new CsvParser()).withUseHeader(true).csvFile(sqlContext, csvFile);
Tokenizer tokenizer = new Tokenizer();
HashingTF hashingTF = new HashingTF();
int numFeatures = 35;

tokenizer.setInputCol("Users")
        .setOutputCol("Users_words");
DataFrame df1 = tokenizer.transform(df);
hashingTF.setInputCol("Users_words")
        .setOutputCol("rawUsers").setNumFeatures(numFeatures);
DataFrame featurizedData1 = hashingTF.transform(df1);

tokenizer.setInputCol("Main_user")
        .setOutputCol("Main_user_words");
DataFrame df2 = tokenizer.transform(featurizedData1);
hashingTF.setInputCol("Main_user_words")
        .setOutputCol("rawMain_user").setNumFeatures(numFeatures);
DataFrame featurizedData2 = hashingTF.transform(df2);

// Now assemble the two hashed feature columns into a single vector.
VectorAssembler assembler = new VectorAssembler()
        .setInputCols(new String[]{"rawUsers", "rawMain_user"})
        .setOutputCol("assembledVector");
DataFrame assembledFeatures = assembler.transform(featurizedData2);
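As one possible improvement (a sketch, assuming the same column names and this older DataFrame-based spark.ml API): use separate Tokenizer and HashingTF instances and wire all stages into a single Pipeline, so the column plumbing is declared once and can be refit on new data:
// Pipeline sketch mirroring the stages above; variable names are illustrative.
Tokenizer usersTok = new Tokenizer().setInputCol("Users").setOutputCol("Users_words");
HashingTF usersTF = new HashingTF().setInputCol("Users_words")
        .setOutputCol("rawUsers").setNumFeatures(35);
Tokenizer mainTok = new Tokenizer().setInputCol("Main_user").setOutputCol("Main_user_words");
HashingTF mainTF = new HashingTF().setInputCol("Main_user_words")
        .setOutputCol("rawMain_user").setNumFeatures(35);
VectorAssembler assembler = new VectorAssembler()
        .setInputCols(new String[]{"rawUsers", "rawMain_user"})
        .setOutputCol("assembledVector");

Pipeline pipeline = new Pipeline().setStages(
        new PipelineStage[]{usersTok, usersTF, mainTok, mainTF, assembler});
DataFrame assembled = pipeline.fit(df).transform(df);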

How to get all the rows given a part of the row key in HBase

I have the following table structure in HBase:
Row                  column+cell
Mary_Ann_05/10/2013  column=cf:verified, timestamp=234454454, value=2,2013-02-12
Mary_Ann_06/10/2013  column=cf:verified, timestamp=2345454454, value=3,2013-02-12
Mary_Ann_07/10/2013  column=cf:verified, timestamp=2345454522454, value=4,2013-02-12
Mary_Ann_08/10/2013  column=cf:verified, timestamp=23433333454, value=1,2013-12-12
I want to retrieve all the records whose row key starts with Mary_Ann, using Java. How do I do that?
You could achieve that using PrefixFilter. Given a prefix, specified when you instantiate the filter instance, all rows whose key matches this prefix are returned to the client. The constructor is: public PrefixFilter(byte[] prefix)
Usage:
Filter filter = new PrefixFilter(Bytes.toBytes("Mary_Ann"));
Scan scan = new Scan();
scan.setFilter(filter);
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    for (KeyValue kv : result.raw()) {
        System.out.println("KV: " + kv + ", Value: " +
                Bytes.toString(kv.getValue()));
    }
}
scanner.close();
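A small performance note (hedged, as it depends on the client version): a bare PrefixFilter still starts scanning from the beginning of the table and merely discards non-matching rows, so it helps to bound the scan as well; newer clients offer Scan.setRowPrefixFilter, which derives the start and stop rows from the prefix:
// Bound the scan to the prefix instead of filtering the whole table.
Scan scan = new Scan();
scan.setRowPrefixFilter(Bytes.toBytes("Mary_Ann")); // sets start/stop rows internally
ResultScanner scanner = table.getScanner(scan);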
HTH
