HBASE filter by multiple values - java

I am having problems using filters to search data in HBase.
First I read some data from one table and store it in a Vector or ArrayList:
for (Result r : rs) {
    for (KeyValue kv : r.raw()) {
        if (new String(kv.getFamily()).equals("mpnum")) {
            temp = new String(kv.getValue());
            x.addElement(temp);
        }
    }
}
Then I want to search a different table based on the values in this vector. I used filters to do this (I tried BinaryPrefixComparator and BinaryComparator as well):
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
for (int c = 0; c < x.size(); c++) {
    System.out.println(x.get(c).toString());
    filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("mpnum"), null, CompareOp.EQUAL, new SubstringComparator(x.get(c).toString())));
}
I should get 3 results back, but I only get one: the first entry in the database.
What doesn't make sense is that when I hardcode the value I am looking for, I get all 3 results back.
I thought there might be an issue with converting the bytes to a String and then back to bytes, but that would not explain how it was able to bring back the first result. For some reason, it stops at the first match and doesn't go on to find the other 2 rows that contain matching data. If I hardcode it, I get the results:
x.addElement("abc123");
filterList.addFilter(new SingleColumnValueFilter(Bytes.toBytes("mpnum"), null, CompareOp.EQUAL, new SubstringComparator( x.get(0).toString() )));
Does anyone know what the problem is or what I need to do to resolve my issue? Your help is much appreciated.
Thank You
edit: Here is the contents of the tables:
TABLE1:
ROW COLUMN+CELL
0 column=gpnum:, timestamp=1481300288449, value=def123
0 column=mpnum:, timestamp=1481300273355, value=abc123
0 column=price:, timestamp=1481300255337, value=85.0
1 column=gpnum:, timestamp=1481301599999, value=def2244
1 column=mpnum:, timestamp=1481301582336, value=011511607
1 column=price:, timestamp=1481301673886, value=0.76
TABLE2
ROW COLUMN+CELL
0 column=brand:, timestamp=1481300227283, value=x
0 column=mpnum:, timestamp=1481300212289, value=abc123
0 column=price:, timestamp=1481300110950, value=50.0
1 column=mpnum:, timestamp=1481301806687, value=011511607
1 column=price:, timestamp=1481301777345, value=1.81
13 column=webtype:, timestamp=1483507543878, value=US
3 column=avail:, timestamp=1481306538360, value=avail
3 column=brand:, timestamp=1481306538360, value=brand
3 column=descr:, timestamp=1481306538360, value=description
3 column=dist:, timestamp=1481306538360, value=distributor
3 column=mpnum:, timestamp=1481306538360, value=pnum
3 column=price:, timestamp=1481306538360, value=price
3 column=url:, timestamp=1481306538360, value=url
3 column=webtype:, timestamp=1481306538360, value=webtype
4 column=avail:, timestamp=1481306538374, value=4
4 column=brand:, timestamp=1481306538374, value=x
4 column=descr:, timestamp=1481306538374, value=description
4 column=dist:, timestamp=1481306538374, value=x
4 column=mpnum:, timestamp=1482117383212, value=011511607
4 column=price:, timestamp=1481306538374, value=34.51
4 column=url:, timestamp=1481306538374, value=x
4 column=webtype:, timestamp=1481306538374, value=US
5 column=avail:, timestamp=1481306538378, value=
5 column=brand:, timestamp=1481306538378, value=name
5 column=descr:, timestamp=1481306538378, value=x
5 column=dist:, timestamp=1481306538378, value=x
5 column=mpnum:, timestamp=1482117392043, value=011511607
5 column=price:, timestamp=1481306538378, value=321.412
5 column=url:, timestamp=1481306538378, value=x.com
THIRD TABLE (to store result matches)
0 column=brand:, timestamp=1481301813849, value=name
0 column=cprice:, timestamp=1481301813849, value=1.81
0 column=gpnum:, timestamp=1481301813849, value=def2244
0 column=gprice:, timestamp=1481301813849, value=0.76
0 column=mpnum:, timestamp=1481301813849, value=011511607
There should be three matches (the rows that were in bold above), but only one match is returned.
If anyone is willing to help for a fee, send me an email at tt224416#gmail.com

Related

Regex not working in Java program as expected

I have been working on a program which makes use of Regular Expressions. It searches for some text in the files to give me a database based on the scores of different players.
Here is the sample of the text within which it searches.
ISLAMABAD UNITED 1st innings
Player Status Runs Blls 4s 6s S/R
David Warner lbw b. Hassan 19 16 4 0 118.8%
Joe Burns b. Morkel 73 149 16 0 49.0%
Kane Wiliiamson b. Tahir 135 166 28 2 81.3%
Asad Shafiq c. Rahane b. Morkel 22 38 5 0 57.9%
Kraigg Braithwaite c. Khan b. Boult 24 36 5 0 66.7%
Corey Anderson b. Tahir 18 47 3 0 38.3%
Sarfaraz Ahmed b. Morkel 0 6 0 0 0.0%
Tim Southee c. Hales b. Morkel 0 6 0 0 0.0%
Kyle Abbbott c. Rahane b. Morkel 26 35 4 0 74.3%
Steven Finn c. Hales b. Hassan 10 45 1 0 22.2%
Yasir Shah not out 1 12 0 0 8.3%
Total: 338/10 Overs: 92.1 Run Rate: 3.67 Extras: 10
Day 2 10:11 AM
-X-
I am using the following regex to get the different fields..
((?:\/)?(?:[A-Za-z']+)?\s?(?:[A-Za-z']+)?\s?(?:[A-Za-z']+)?\s?)\s+(?:lbw)?(?:not\sout)?(?:run\sout)?\s?(?:\(((?:[A-Za-z']+)?\s?(?:['A-Za-z]+)?)\))?(?:(?:st\s)?\s?(?:((?:['A-Za-z]+)\s(?:['A-Za-z]+)?)))?(?:c(?:\.)?\s((?:(?:['A-Za-z]+)?\s(?:[A-Za-z']+)?)?(?:&)?))?\s+(?:b\.)?\s+((?:[A-Za-z']+)\s(?:[A-Za-z']+)?)?\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)
Batsman Name - Group 1
Person Affecting Stumping (if any) - Group 2
Person Affecting RunOut (if any) - Group 3
Person Taking Catch (if any) - Group 4
Person Taking the wicket (if any) - Group 5
Runs Scored - Group 6
Balls Faced - Group 7
Fours Hit - Group 8
Sixes Hit - Group 9
Here is an example of the text I need to extract...
Group 0 contains David Warner lbw b. Hassan 19 16 4 0 118.8%
Group 1 contains 'David Warner'
Group 2 does not exist in this example
Group 3 does not exist in this example
Group 4 does not exist in this example
Group 5 contains 'Hassan'
Group 6 contains '19'
Group 7 contains '16'
Group 8 contains '4'
Group 9 contains '0'
When I try this on Regexr or Regex101, Group 1 is David Warner, but in my Java program it is just David. It is the same for all results, and I don't know why.
Here's the code of my program:
Matcher bat = Pattern.compile("((?:\\/)?(?:[A-Za-z']+)?\\s?(?:[A-Za-z']+)?\\s?(?:[A-Za-z']+)?\\s?)\\s+(?:lbw)?(?:not\\sout)?(?:run\\sout)?\\s?(?:\\(((?:[A-Za-z']+)?\\s?(?:['A-Za-z]+)?)\\))?(?:(?:st\\s)?\\s?(?:((?:['A-Za-z]+)\\s(?:['A-Za-z]+)?)))?(?:c(?:\\.)?\\s((?:(?:['A-Za-z]+)?\\s(?:[A-Za-z']+)?)?(?:&)?))?\\s+(?:b\\.)?\\s+((?:[A-Za-z']+)\\s(?:[A-Za-z']+)?)?\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)").matcher(batting.group(1));
while (bat.find()) {
batPos++;
Batsman a = new Batsman(bat.group(1).replace("\n", "").replace("\r", "").replace("S/R", "").replace("/R", "").trim(), batting.group(2));
if (bat.group(0).contains("not out")) {
a.bat(Integer.parseInt(bat.group(6)), Integer.parseInt(bat.group(7)), Integer.parseInt(bat.group(8)), Integer.parseInt(bat.group(9)), batting.group(2), false);
} else {
a.bat(Integer.parseInt(bat.group(6)), Integer.parseInt(bat.group(7)), Integer.parseInt(bat.group(8)), Integer.parseInt(bat.group(9)), batting.group(2), true);
}
if (!teams.contains(batting.group(2))) {
teams.add(batting.group(2));
}
boolean f = true;
Batsman clone = null;
for (Batsman b1 : batted) {
if (b1.eq(a)) {
clone = b1;
f = false;
break;
}
}
if (!f) {
if (bat.group(0).contains("not out")) {
clone.batUpdate(a.getRunScored(), a.getBallFaced(), a.getFour(), a.getSix(), false, true);
} else {
clone.batUpdate(a.getRunScored(), a.getBallFaced(), a.getFour(), a.getSix(), true, true);
}
} else {
batted.add(a);
}
}
Your regex is way too complicated for such a simple task. To make it simpler (or eliminate it altogether), operate on a single line rather than on the whole block of text.
For this, do
String array[] = str.split("\\n");
Then, once you have each individual line, just split it on multiple spaces, like
String parts[] = array[1].split("\\s\\s+");
Then you can access each part separately; for example, Status can be accessed like
System.out.println("Status - " + parts[1]);
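The two-step split can be put together roughly as follows. This is only a sketch: it assumes the columns in the real scorecard are separated by two or more spaces (single spaces occur inside names and statuses), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorecardSplit {
    /** Splits one scorecard line into columns separated by two or more whitespace characters. */
    static String[] splitColumns(String line) {
        return line.trim().split("\\s\\s+");
    }

    /** Returns the column arrays for every non-blank line of the text. */
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                rows.add(splitColumns(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample lines with assumed multi-space column separators.
        String text = "David Warner    lbw b. Hassan    19  16  4  0  118.8%\n"
                    + "Yasir Shah      not out          1   12  0  0  8.3%\n";
        for (String[] parts : parse(text)) {
            System.out.println("Batsman - " + parts[0] + ", Status - " + parts[1]);
        }
    }
}
```

If the real file pads columns with tabs or single spaces, the split pattern would need adjusting; header and total lines would also have to be skipped before reading the numeric columns.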
All commenters are right, of course: this might not be a typical problem to solve with a regex. But to answer your question (why is there a difference between Java and regex101?), let's first pull out some of the problems in your regex that make it too complex. The next step would be to track down whether, and why, it behaves differently in Java.
I tried to understand your regex (and cricket at the same time!) and came up with a proposal that might help you show us what your regex should look like.
The first attempt reads until the number columns are reached. My guess is that you should be looking at alternation instead of introducing a lot of groups. Take a look at this: example 1
Explanation:
( # group 1 start
\/? # not sure why there should be /?
[A-Z][a-z]+ # first name
(?:\s(?:[A-Z]['a-z]+)+) # last name
)
(?:\s+ # spaces
( # group 2 start
lbw # lbw or
|not\sout # not out or
|(c\.|st|run\sout) # group 3: c., st or run out
\s # space
\(? # optional (
(\w+) # group 4: name
\)? # optional )
))? # group 2 end
(?:\s+ # spaces
( # group 5 start
(?:b\.\s)(\w+) # b. name
))? # group 5 end
\s+ # spaces
EDIT 1: Actually, there is a 'stumped' option missing in your regex as well. Added that in mine.
EDIT 2: Stumped doesn't have a dot.
EDIT 3: The complete example can be found at example 2
Some java code to test it:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Foo {
    public static void main(String[] args) {
        String[] examples = {
            "David Warner lbw b. Hassan 19 16 4 0 118.8%",
            "Joe Burns b. Morkel 73 149 16 0 49.0%",
            "Asad Shafiq c. Rahane b. Morkel 22 38 5 0 57.9%",
            "Yasir Shah not out 1 12 0 0 8.3%",
            "Yasir Shah st Rahane 1 12 0 0 8.3%",
            "Morne Morkel run out (Shah) 11 17 1 1 64.7%"
        };
        Pattern pattern = Pattern.compile("(\\/?[A-Z][a-z]+(?:\\s(?:[A-Z]['a-z]+)+))(?:\\s+(lbw|not\\sout|(c\\.|st|run\\sout)\\s\\(?(\\w+)\\)?))?(?:\\s+((?:b\\.\\s)(\\w+)))?\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+\\.\\d%)");
        for (String text : examples) {
            System.out.println("TEXT: " + text);
            Matcher matcher = pattern.matcher(text);
            if (matcher.matches()) {
                System.out.println("batsman: " + matcher.group(1));
                if (matcher.group(2) != null) System.out.println(matcher.group(2));
                if (matcher.group(5) != null && matcher.group(5).matches("^b.*"))
                    System.out.println("bowler: " + matcher.group(6));
                StringBuilder sb = new StringBuilder("numbers are: ");
                int[] groups = {7, 8, 9, 10, 11};
                for (int i : groups) {
                    sb.append(" " + matcher.group(i));
                }
                System.out.println(sb.toString());
                System.out.println();
            }
        }
    }
}

Sqlite ExecuteQuery very slow with java (netbeans)

I use SQLite to store data. I am trying to read data from a SQLite view and fill an array of objects in Java, but query execution takes a very long time.
I only have 32 objects with 22 fields, and the SQLite table has 380 rows.
Yet executing a statement like the following took 17 seconds for the 32 objects.
sql = "SELECT "
+ " field1,"
+ " field2,"
....
+ " field22"
+ " from Rankedview WHERE Ranking = " + Integer.toString(RankingIndex);
try (ResultSet rs = stmt.executeQuery(sql)) {
while (rs.next()) {
a[j].field1= rs.getString("field1");
..........
a[j].field22 = rs.getInt("field22");
}
}
After I updated the sqlite-jdbc driver from 3.7.2 to 3.8.5, the time dropped from 17 seconds to 9 seconds.
How can I improve its performance?
Edit:
view definition (ATP is a table)
CREATE VIEW Ranked AS
SELECT p1.ID,
p1.field2,
...
p1.field21,
(
SELECT count() + (
SELECT count() + 1
FROM Table AS p2
WHERE p2.field21 = p1.field21 AND
p2.id > p1.id
)
FROM ATP AS p2
WHERE p2.field21 > p1.field21
)
AS Ranking
FROM ATP AS p1
ORDER BY Ranking ASC;
EXPLAIN QUERY PLAN output:
selectid order from detail
0 0 0 SCAN TABLE ATP AS p1
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SCAN TABLE ATP AS p2
1 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 2
2 0 0 SEARCH TABLE ATP AS p2 USING INTEGER PRIMARY KEY (rowid>?)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 3
3 0 0 SCAN TABLE ATP AS p2
3 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 4
4 0 0 SEARCH TABLE ATP AS p2 USING INTEGER PRIMARY KEY (rowid>?)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 5
5 0 0 SCAN TABLE ATP AS p2
5 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 6
6 0 0 SEARCH TABLE ATP AS p2 USING INTEGER PRIMARY KEY (rowid>?)
0 0 0 USE TEMP B-TREE FOR ORDER BY
To get a row with a specific rank, you should not compute the rank by hand, but use the LIMIT/OFFSET clauses. Here x is the rank minus one; the view ranks the highest field21 first and breaks ties by the higher id, hence the descending order:
SELECT ...
FROM ATP
ORDER BY field21 DESC, id DESC
LIMIT 1 OFFSET x
This still requires sorting all table rows to determine which is the x-th, but is much more efficient than multiple nested table scans.

PrintWriter creates file but doesn't print to file

This is a college course assignment that consists of the classes TotalSales and TotalSalesTest. In the main program I have created a two-dimensional array to output a columnar layout with cross-totals in 4 rows and 5 columns. The program outputs sales totals by row for each salesperson (1 - 4) and by column for each product (1 - 5). I have created extra elements in the array to store the totals for rows and columns. So far both classes compile. The problem is that although the PrintWriter creates a text file, it doesn't print to it. I could use some help with this problem. Here is the code:
//write program in a two diminsional array to output a columnar layout with cross-totals in 4 rows and 5 columns
//program outputs sales totals by row for each sales person(1 - 4) and output by column for products(1 - 5)
//create extra elements to store total for rows and columns
import java.util.Scanner;
import java.io.*;
public class TotalSales
{
private int salesPerson; //declare class variable
private int productNumber;//declare class variable
private double totalSales;//declare class variable
private double allSales;
//declare input and output variables
Scanner inFile; //declare inFile variable
PrintWriter outFile;//declare outFile variable
double[][]sales = new double[6][7];//declare array sales
public void initializer()
{
try
{
inFile = new Scanner( new File( "assign06.txt" ) );
outFile = new PrintWriter( "MonthlyTotalSales.txt" );
outFile.flush();
}
catch (FileNotFoundException e)
{
System.out.println("The input file could not be found!");
System.exit(1);
}
while(inFile.hasNext()) //while there is data to process…
{
salesPerson = inFile.nextInt();//reads salesPerson
productNumber = inFile.nextInt();//reads productNumber
totalSales = inFile.nextDouble();//reads totalSales
sales[salesPerson][productNumber]+=totalSales;
sales[salesPerson][6]+=totalSales;
sales[5][productNumber]+=totalSales;
allSales += totalSales;
} //end while loop
printDetails(sales);//call method printDetails
finishUp();//call method finishUp
}//end initializer
public void printDetails(double[][] array)
{
outFile.println("\t1\t2\t3\t4\t5");
for (int salesPerson =1; salesPerson <5; salesPerson++)
{
outFile.print(salesPerson+ " ");
for(int productNumber=1; productNumber <=array.length; productNumber++)
outFile.print(array[salesPerson][productNumber]+" ");
//end inside loop
outFile.println();
}//end outside loop
outFile.print("Total: \t ");
for(int salesTotal=1; salesTotal<array.length; salesTotal++)
{
outFile.print(array[5][salesTotal] +" ");
}
outFile.print(allSales);
outFile.println();
outFile.print(" ");
outFile.println();
}//end printDetails
public void finishUp()
{
inFile.close();
outFile.close();
System.out.println("The program has finished.");
}//end finishUp
}//end class TotalSales
Here is the test program:
public class TotalSalesTest
{
public static void main(String[] args)
{
TotalSales ts = new TotalSales();
ts.initializer();
}//end method main
}
Here is the text file for the input:
1 1 37.50
1 2 77.00
1 3 68.75
1 4 61.25
1 5 175.00
2 1 45.00
2 2 66.00
2 3 27.50
2 4 49.00
2 5 250.00
3 1 67.50
3 2 33.00
3 4 73.50
3 5 200.00
4 1 15.00
4 2 99.00
4 3 123.75
4 4 85.75
4 5 125.00
1 1 60.00
1 2 88.00
1 3 41.25
1 4 49.00
1 5 225.00
2 1 67.50
2 2 33.00
2 3 27.50
2 4 122.50
2 5 25.00
3 1 60.00
3 2 44.00
3 3 96.25
3 4 36.75
3 5 50.00
4 1 75.00
4 2 11.00
4 3 41.25
4 4 98.00
4 5 125.00
1 1 45.00
1 2 33.00
1 3 27.50
1 4 61.25
1 5 200.00
2 1 52.50
2 2 22.00
2 3 13.75
2 4 36.75
2 5 50.00
3 1 37.50
3 2 88.00
3 3 96.25
3 4 36.75
4 1 37.50
4 2 77.00
4 3 82.50
4 4 73.50
4 5 25.00
1 1 30.00
1 2 88.00
1 3 41.25
1 4 12.25
1 5 175.00
2 1 45.00
2 2 22.00
2 3 68.75
2 4 98.00
3 2 88.00
3 3 41.25
3 4 24.50
4 1 30.00
4 2 88.00
4 3 82.50
4 4 122.50
4 5 175.00
You call printDetails before initializing the writer. Move the printDetails call after outFile = new PrintWriter( "MonthlyTotalSales.txt" ); and it should be fine. However, you should have included your constructor, because as it stands I would expect your code to throw a NullPointerException, since the writer is never initialized. You also need to close the writer after writing to it, or at least flush it.
The whole structure of your code is bad: you should never have a method such as mainProgram on a class, you should never keep a file as an attribute just to use it in every method, and you should never split the creation and closing of I/O classes across different methods.
It seems you didn't call outFile.close();
Call your finishUp() method once you complete writing in the file.
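A minimal sketch of the same point using try-with-resources, which closes (and therefore flushes) the writer automatically once writing is done. The class name WriterCloseDemo and the temporary file are assumptions made for illustration; they are not part of the assignment code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class WriterCloseDemo {
    /** Writes the given lines to path; the writer is closed when the try block ends. */
    static void writeLines(Path path, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(path.toFile(), "UTF-8")) {
            for (String line : lines) {
                out.println(line);
            }
        } // out.close() runs here, pushing any buffered text to the file
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for MonthlyTotalSales.txt.
        Path path = Files.createTempFile("MonthlyTotalSales", ".txt");
        writeLines(path, Arrays.asList("Total: 37.5", "The program has finished."));
        System.out.println(Files.readAllLines(path, StandardCharsets.UTF_8));
    }
}
```

Creating, using and closing the writer in one method also avoids having to remember a separate finishUp() call.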
I think your program crashes with a NullPointerException on
inFile.hasNext()
because you never actually opened or even instantiated the input file along with the output file in the try/catch block. Whoops!
Change
try
{
outFile = new PrintWriter( "MonthlyTotalSales.txt" );
}
to
try
{
inFile = new Scanner(new File("Input.txt"));
outFile = new PrintWriter( "MonthlyTotalSales.txt" );
}

External file with multiple numbers per line Java

I need help reading an external file that has more than one number per line. Here is the external data file:
1 1
2 3
3 5
4 7
5 2
6 4
1 6
2 8
3 1
4 3
5 5
6 7
1 8
2 1
3 2
4 3
5 4
6 5
I read it in by using
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class Prog435a
{
    public static void main(String[] args) throws IOException
    {
        Scanner kbReader = new Scanner(new File("C:\\Users\\Super Mario\\Documents\\java programs\\Prog435\\Prog435a.in"));
        while(kbReader.hasNext())
        {
            int data = kbReader.nextInt();
            System.out.println(data);
        }
    }
}
However, it prints out the file with each number line by line. So instead of appearing in columns, it appears in a single column. How can I get this to print out in two columns as shown above? Thanks for the help.
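Loop over the input and call nextInt() two times per iteration, printing both values on one line. Testing with hasNextInt() rather than hasNextLine() avoids an exception when the file ends with a trailing newline.
while (kbReader.hasNextInt()) {
    System.out.println(kbReader.nextInt() + " " + kbReader.nextInt());
}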
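A self-contained version of this idea, as a sketch. It reads from an in-memory string instead of the Prog435a.in file (an assumption made so that it runs anywhere) and checks hasNextInt() before each read, so a trailing newline or an odd number of values does not cause an exception. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TwoColumnReader {
    /** Reads integers in pairs and returns each pair formatted as one "a b" row. */
    static List<String> readPairs(Scanner in) {
        List<String> rows = new ArrayList<>();
        while (in.hasNextInt()) {
            int first = in.nextInt();
            if (!in.hasNextInt()) {
                break; // odd number of values: ignore the unpaired last one
            }
            int second = in.nextInt();
            rows.add(first + " " + second);
        }
        return rows;
    }

    public static void main(String[] args) {
        // The string stands in for the contents of the data file.
        Scanner in = new Scanner("1 1\n2 3\n3 5\n");
        for (String row : readPairs(in)) {
            System.out.println(row);
        }
    }
}
```

To read the real file, the Scanner would be built from new File(...) as in the question.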

How to join two tables using Guava

Fact table :
Id Year Month countryId Sales
1 1999 1 1 3000
2 1999 2 1 2300
3 2000 3 2 3999
4 2000 4 3 2939
Dimension table:
Id country province
1 US LA
2 US CA
3 US GA
4 EN LN
and I use Guava table like this :
Table<Integer, String, Object> table = Tables.newCustomTable(
    Maps.<Integer, Map<String, Object>> newLinkedHashMap(),
    new Supplier<Map<String, Object>>() {
        public Map<String, Object> get() {
            return Maps.newLinkedHashMap();
        }
    });
table.put(1, "Year", 1999);
table.put(1, "Month", 1);
table.put(1, "countryId", 1);
table.put(1, "Sales", 3000);
// ...... etc
table1.put(1, "country", "US");
table1.put(1, "province", "LA");
// ......
I want to implement a LEFT JOIN like:
1 1999 1 1 3000 US LA
2 1999 2 1 2300 US LA
3 2000 3 2 3999 US CA
4 2000 4 3 2939 US GA
What should I do?
Guava's Table isn't supposed to be used like an SQL table, as it is a collection. SQL tables are designed to be indexable, sortable, filterable, etc. Guava's Table has only a fraction of those features, and only indirectly, and joins aren't among them (unless you play with transformations).
What you need to do is to have your two tables and loop through the elements of table and find the corresponding mapping in table1.
In your case, I believe you're better off with a List replacing table and a Guava Table for table1. Loop through the list and make your final objects as you get your elements.
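The loop described above might look roughly like this. To keep the sketch runnable without Guava, it swaps in plain java.util collections: a List of row maps for the fact table and a Map keyed by Id for the dimension table. With a Guava Table for the dimension data, table1.row(countryId) would supply the same per-row Map. The names LeftJoinSketch, leftJoin and row are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeftJoinSketch {
    /** Builds an insertion-ordered row from alternating column names and values. */
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            result.put((String) keyValues[i], keyValues[i + 1]);
        }
        return result;
    }

    /** Left join: every fact row is kept; dimension columns are added when countryId matches. */
    static List<Map<String, Object>> leftJoin(
            List<Map<String, Object>> facts,
            Map<Integer, Map<String, Object>> dimensionById) {
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> fact : facts) {
            Map<String, Object> out = new LinkedHashMap<>(fact);
            Map<String, Object> dim = dimensionById.get(fact.get("countryId"));
            if (dim != null) {
                out.putAll(dim); // no match: the row keeps only its fact columns
            }
            joined.add(out);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> facts = new ArrayList<>();
        facts.add(row("Id", 1, "Year", 1999, "Month", 1, "countryId", 1, "Sales", 3000));
        facts.add(row("Id", 3, "Year", 2000, "Month", 3, "countryId", 2, "Sales", 3999));

        Map<Integer, Map<String, Object>> dimension = new LinkedHashMap<>();
        dimension.put(1, row("country", "US", "province", "LA"));
        dimension.put(2, row("country", "US", "province", "CA"));

        for (Map<String, Object> joinedRow : leftJoin(facts, dimension)) {
            System.out.println(joinedRow.values());
        }
    }
}
```

Keying the dimension data by Id makes each lookup a single map access, so the join is one pass over the fact rows.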
