What is the fastest way to check if an HBase table exists? Looking at this API:
http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html
Which of these is the fastest:
tableExists
isTableEnabled
isTableAvailable
listTables
With #4 (listTables) you get a list of all tables, then iterate through it and check whether one of them matches your table name (sketched below).
Or is there another, smarter way?
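A sketch of the listTables-and-compare approach from #4, using the older HBaseAdmin API (the table name "sample" is just a placeholder):
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
boolean found = false;
for (HTableDescriptor desc : admin.listTables()) {
    if (desc.getNameAsString().equals("sample")) {
        found = true;
        break;
    }
}
admin.close();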
Here is my sample code (Scala):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin

val tableName = "sample"
val conf = HBaseConfiguration.create()
val hbaseAdmin = new HBaseAdmin(conf)
if (!hbaseAdmin.tableExists(tableName)) {
  println(tableName + " does not exist")
}
Here, you just need to use tableExists to check whether the table exists.
HBaseAdmin hba = new HBaseAdmin(hbaseTemplate.getConfiguration());
if (!hba.tableExists(tableName)) {
HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
HColumnDescriptor columnDescriptor = new HColumnDescriptor(columnFamilyProfile);
tableDescriptor.addFamily(columnDescriptor);
hba.createTable(tableDescriptor);
}
Using HBaseAdmin.tableExists only takes about 500 ms to check if the table exists. We only have two nodes in our cluster, so it may depend on the size of your cluster, but it doesn't seem unreasonably slow.
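If you want to sanity-check the timing on your own cluster, a trivial measurement sketch (assuming an HBaseAdmin instance named hbaseAdmin and a placeholder table name "sample"):
long start = System.currentTimeMillis();
boolean exists = hbaseAdmin.tableExists("sample");
System.out.println("tableExists took " + (System.currentTimeMillis() - start) + " ms, exists: " + exists);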
You could attempt to open an HTable for the table and (I think) it will throw an exception/error if the table doesn't exist (not at work yet, so I can't do a quick test).
Not 100% sure this will work, just an off-the-top-of-my-head idea. :)
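An untested sketch of that idea against the older HBase client API (the method name is illustrative); constructing an HTable for a missing table should throw TableNotFoundException:
// Returns true if the table could be opened, false if it does not exist.
boolean tableExistsByOpening(org.apache.hadoop.conf.Configuration conf, String name) throws IOException {
    try {
        HTable table = new HTable(conf, name);
        table.close();
        return true;
    } catch (TableNotFoundException e) {
        return false;
    }
}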
I have to check whether the table exists every time I start my app. I do this in a configuration class with Spring Boot.
Here is the code, hope it helps.
@Configuration
public class CustomHbaseConfiguration {
@Bean
public Connection hbaseConnection() throws IOException {
// Create connection
final org.apache.hadoop.conf.Configuration configuration = HBaseConfiguration.create();
// Validate that Hbase is available
HBaseAdmin.available(configuration);
// return the hbaseConnection Bean
return ConnectionFactory.createConnection(configuration);
}
@PostConstruct
public void hbaseTableLogic() throws IOException {
// With the hbaseConnection bean, get the HbaseAdmin instance
Admin admin = hbaseConnection().getAdmin();
// The name of my table
TableName YOUR_TABLE_NAME_HERE = TableName.valueOf("PUT_YOUR_TABLE_NAME_HERE");
// Check if the table already exists; otherwise create the table and column family
if (!admin.tableExists(YOUR_TABLE_NAME_HERE)) {
HTableDescriptor hTableDescriptor = new HTableDescriptor(YOUR_TABLE_NAME_HERE);
hTableDescriptor.addFamily(new HColumnDescriptor("PUT_YOUR_COLUMN_FAMILY_HERE"));
admin.createTable(hTableDescriptor);
}
}
}
I have two databases: one is MySQL and the other is PostgreSQL.
I tried to get PostgreSQL data from within a MySQL transactional method.
@Transactional(value = "pg")
public List<String> getSubordinate(){
Query q1 = JPA.em().createNativeQuery("select vrs.subordinate_number, vrs.superior_number\n" +
"from view_reporting_structure vrs\n" +
"where vrs.superior_number = :personel_number");
q1.setParameter("personel_number","524261");
List<String> me = q1.getResultList();
return me;
}
}
which I call from another method:
@Transactional
public Result getOpenRequestList(){
Subordinate subordinate = new Subordinate();
List<String> subordinateData = subordinate.getSubordinate();
....
}
I get this error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'db_hcm.view_reporting_structure' doesn't exist
So my PostgreSQL method is recognized as a MySQL transaction, and the view does not exist in the MySQL database. How do I get data from a different persistence unit within one method?
I have never done this (different databases), but I guess the following may work.
For example, you have the following data source definition in application.conf:
# MySql
db.mysql.driver=com.mysql.jdbc.Driver
... the rest of the settings for db.mysql
# PostgreSQL
db.postgre.driver=org.postgresql.Driver
... the rest of the settings for db.postgre
Instead of using the @Transactional annotation, manage the transaction explicitly and use the JPA withTransaction API:
private static final String MYSQL_DB = "mysql";
private static final String POSTGRE_DB = "postgre";
public List<String> getSubordinate() {
    return JPA.withTransaction(MYSQL_DB, true /* this is the read-only flag */,
        () -> {
            Query q1 = JPA.em().createNativeQuery("select vrs.subordinate_number, vrs.superior_number\n" +
                "from view_reporting_structure vrs\n" +
                "where vrs.superior_number = :personel_number");
            q1.setParameter("personel_number", "524261");
            List<String> me = q1.getResultList();
            return me;
        });
}
public Result getOpenRequestList() {
    return JPA.withTransaction(POSTGRE_DB, true /* this is the read-only flag */,
        () -> {
            Subordinate subordinate = new Subordinate();
            List<String> subordinateData = subordinate.getSubordinate();
            ....
        });
}
Note: I always prefer withTransaction, since it allows better control of the unhappy flow. You should wrap the call in a try-catch; if JPA throws a runtime exception on commit, you can then do proper error handling. With the @Transactional annotation, the commit takes place after the controller has finished, so you cannot handle the error. A sketch of this wrapping is shown below.
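A minimal sketch of that wrapping, assuming the same withTransaction call shape as above (the method name getSubordinateSafe is just illustrative, and the exact functional interface and checked exceptions depend on your Play/JPA version):
public List<String> getSubordinateSafe() {
    try {
        return JPA.withTransaction(MYSQL_DB, true /* read-only flag */, () -> {
            Query q1 = JPA.em().createNativeQuery(
                    "select vrs.subordinate_number, vrs.superior_number " +
                    "from view_reporting_structure vrs " +
                    "where vrs.superior_number = :personel_number");
            q1.setParameter("personel_number", "524261");
            return q1.getResultList();
        });
    } catch (Throwable t) {
        // The commit (or the query) failed: handle or log it here instead of
        // letting the framework surface it after the controller has returned.
        return java.util.Collections.emptyList();
    }
}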
I'm using the azure-documentdb Java SDK to create and use User Defined Functions (UDFs).
From the official documentation I finally found the way (with the Java client) to create a UDF:
String regexUdfJson = "{"
+ "id:\"REGEX_MATCH\","
+ "body:\"function (input, pattern) { return input.match(pattern) !== null; }\","
+ "}";
UserDefinedFunction udfREGEX = new UserDefinedFunction(regexUdfJson);
getDC().createUserDefinedFunction(
myCollection.getSelfLink(),
udfREGEX,
new RequestOptions());
And here is a sample query:
SELECT * FROM root r WHERE udf.REGEX_MATCH(r.name, "mytest_.*")
I have to create the UDF only once, because I get an exception if I try to recreate an existing UDF:
DocumentClientException: Message: {"Errors":["The input name presented is already taken. Ensure to provide a unique name property for this resource type."]}
How can I tell whether the UDF already exists?
I tried to use the readUserDefinedFunctions function without success. Any example / other ideas?
Maybe, for the long term, we should suggest a createOrReplaceUserDefinedFunction(...) on Azure feedback.
You can check for existing UDFs by running a query using queryUserDefinedFunctions.
Example:
List<UserDefinedFunction> udfs = client.queryUserDefinedFunctions(
myCollection.getSelfLink(),
new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
new SqlParameterCollection(new SqlParameter("@id", myUdfId))),
null).getQueryIterable().toList();
if (udfs.size() > 0) {
// Found UDF.
}
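Building on that, a create-if-missing helper along the lines of the createOrReplaceUserDefinedFunction idea from the question might look roughly like this (a sketch using only the SDK calls already shown above; the method name is illustrative):
void createUdfIfAbsent(DocumentClient client, DocumentCollection myCollection,
                       UserDefinedFunction udf) throws DocumentClientException {
    // Query for a UDF with the same id in the target collection
    List<UserDefinedFunction> existing = client.queryUserDefinedFunctions(
            myCollection.getSelfLink(),
            new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
                    new SqlParameterCollection(new SqlParameter("@id", udf.getId()))),
            null).getQueryIterable().toList();
    // Only create it when nothing was found, avoiding the "name already taken" error
    if (existing.isEmpty()) {
        client.createUserDefinedFunction(myCollection.getSelfLink(), udf, new RequestOptions());
    }
}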
An answer for .NET users:
var collectionAltLink = documentCollections["myCollection"].AltLink; // Target collection's AltLink
var udfLink = $"{collectionAltLink}/udfs/{sampleUdfId}"; // sampleUdfId is your UDF Id
var result = await _client.ReadUserDefinedFunctionAsync(udfLink);
var resource = result.Resource;
if (resource != null)
{
// The UDF with udfId exists
}
Here _client is Azure's DocumentClient and documentCollections is a dictionary of your DocumentDB collections.
If there's no such UDF in the mentioned collection, the _client throws a NotFound exception.
I am trying to fetch the region name for a table using the HBase API.
The setup is mentioned below:
Hbase pseudo-distributed installation (version 0.98.7).
Hadoop 2.5.1 installation.
HBase contains very few tables for testing purposes, and information about the available regions is shown below from the web UI.
The "region name" corresponding to the table "test_table" has been highlighted on purpose.
Now, I have been trying to get this region information from the Java API of HBase using the code below.
void scanTable(String tabName){
org.apache.hadoop.conf.Configuration config = HBaseConfiguration.create();
try{
HTable table = new HTable(config, tabName);
org.apache.hadoop.hbase.TableName tn = table.getName();
HRegionInfo hr = new HRegionInfo(tn);
System.out.println(hr.getRegionNameAsString());
table.close();
}catch(Exception ex){
ex.printStackTrace();
}
}
}
Whenever I pass a table name, say "test_table", the region name returned is different on every run.
RUN 1:
test_table,,1419247657866.77b98d085239ed8668596ea659a7ad7d.
RUN 2:
test_table,,1419247839479.d3097b0f4b407ca827e9fa3773b4d7c7.
RUN 3:
test_table,,1419247859921.e1e39678fa724d7168cd4100289c4234.
I assume that I am using the wrong method to fetch the region name, or that my approach is wrong.
Please help me to get the region information for given table name.
There is a getTableRegions() in HBaseAdmin which returns all the region info for the table name you want.
List<HRegionInfo> getTableRegions(final TableName tableName)
Below is a method that outputs the region names for a given table name.
void getRegionOfTable(String tabName){
org.apache.hadoop.hbase.TableName tn = org.apache.hadoop.hbase.TableName.valueOf(tabName);
org.apache.hadoop.conf.Configuration config = HBaseConfiguration.create();
HRegionInfo ob;
try{
HBaseAdmin hba = new HBaseAdmin(config);
List<HRegionInfo> lr = hba.getTableRegions(tn);
Iterator<HRegionInfo> ir = lr.iterator();
while(ir.hasNext()){
ob = ir.next();
System.out.println(ob.getRegionNameAsString());
}
hba.close();
}catch(Exception ex){
ex.printStackTrace();
}
}
Your code produces a different result every time because you are building a new region (HRegionInfo) with a different timestamp each time. Also, that code assumes that your table has a single region.
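To illustrate the difference (a sketch against the same 0.98-era API, with tn and config as in the snippets above):
// Fabricates a brand-new region descriptor; its name embeds the current timestamp,
// which is why the output changes on every run.
HRegionInfo made = new HRegionInfo(tn);
// Asks the cluster for the regions that actually exist for the table.
List<HRegionInfo> actual = new HBaseAdmin(config).getTableRegions(tn);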
I am trying to use a similar example from the sample code found here.
My sample function is:
void query()
{
String nodeResult = "";
String rows = "";
String resultString;
String columnsString;
System.out.println("In query");
// START SNIPPET: execute
ExecutionEngine engine = new ExecutionEngine( graphDb );
ExecutionResult result;
try ( Transaction ignored = graphDb.beginTx() )
{
result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
// END SNIPPET: execute
// START SNIPPET: items
Iterator<Node> n_column = result.columnAs( "n" );
for ( Node node : IteratorUtil.asIterable( n_column ) )
{
// note: we're grabbing the name property from the node,
// not from the n.name in this case.
nodeResult = node + ": " + node.getProperty( "Name" );
System.out.println("In for loop");
System.out.println(nodeResult);
}
// END SNIPPET: items
// START SNIPPET: columns
List<String> columns = result.columns();
// END SNIPPET: columns
// the result is now empty, get a new one
result = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n, n.Name" );
// START SNIPPET: rows
for ( Map<String, Object> row : result )
{
for ( Entry<String, Object> column : row.entrySet() )
{
rows += column.getKey() + ": " + column.getValue() + "; ";
System.out.println("nested");
}
rows += "\n";
}
// END SNIPPET: rows
resultString = engine.execute( "start n=node(*) where n.Name =~ '.*79.*' return n.Name" ).dumpToString();
columnsString = columns.toString();
System.out.println(rows);
System.out.println(resultString);
System.out.println(columnsString);
System.out.println("leaving");
}
}
When I run this in the web console I get many results (as there are multiple nodes that have a Name attribute containing the pattern 79). Yet running this code returns no results. The debug print statements 'In for loop' and 'nested' never print either, so there must be no results found in the Iterator, yet that doesn't make sense.
And yes, I already checked and made sure that the graphDb variable is the same as the path for the web console. I have other code earlier that uses the same variable to write to the database.
EDIT - More info
If I place the contents of query in the same function that creates my data, I get the correct results. If I run the query by itself, it returns nothing. It's almost as if the query works only in the instance where I add the data, and not if I come back to the database cold in a separate instance.
EDIT2 -
Here is a snippet of code that shows the bigger context of how it is being called, sharing the same DB handle.
package ContextEngine;
import ContextEngine.NeoHandle;
import java.util.LinkedList;
/*
* Class to handle streaming data from any coded source
*/
public class Streamer {
private NeoHandle myHandle;
private String contextType;
Streamer()
{
}
public void openStream(String contextType)
{
myHandle = new NeoHandle();
myHandle.createDb();
}
public void streamInput(String dataLine)
{
Context context = new Context();
/*
* get database instance
* write to database
* check for errors
* report errors & success
*/
System.out.println(dataLine);
//apply rules to data (make ContextRules do this, send type and string of data)
ContextRules contextRules = new ContextRules();
context = contextRules.processContextRules("Calls", dataLine);
//write data (using linked list from contextRules)
NeoProcessor processor = new NeoProcessor(myHandle);
processor.processContextData(context);
}
public void runQuery()
{
NeoProcessor processor = new NeoProcessor(myHandle);
processor.query();
}
public void closeStream()
{
/*
* close database instance
*/
myHandle.shutDown();
}
}
Now, if I call streamInput AND query in the same instance (parent calls), the query returns results. If I only call query and do not enter ANY data in that instance (yet the web console shows data for the same query), I get nothing. Why would I have to create the nodes and enter them into the database at runtime just to get a valid query result? Shouldn't I ALWAYS get the same results with such a query?
You mention that you are using the Neo4j Browser, which comes with Neo4j. However, the example you posted is for Neo4j Embedded, which is the in-process version of Neo4j. Are you sure you are talking to the same database when you try your query in the Browser?
In order to talk to Neo4j Server from Java, I'd recommend looking at the Neo4j JDBC driver, which has good support for connecting to the Neo4j server from Java.
http://www.neo4j.org/develop/tools/jdbc
You can set up a simple connection by adding the Neo4j JDBC jar to your classpath, available here: https://github.com/neo4j-contrib/neo4j-jdbc/releases Then just use Neo4j as any JDBC driver:
Connection conn = DriverManager.getConnection("jdbc:neo4j://localhost:7474/");
ResultSet rs = conn.executeQuery("start n=node({id}) return id(n) as id", map("id", id));
while(rs.next()) {
System.out.println(rs.getLong("id"));
}
Refer to the JDBC documentation for more advanced usage.
To answer your question on why the data is not durably stored, it may be one of many reasons. I would attempt to incrementally scale back the complexity of the code to try and locate the culprit. For instance, until you've found your problem, do these one at a time:
Instead of looping through the result, print it using System.out.println(result.dumpToString());
Instead of the regex query, try just MATCH (n) RETURN n, to return all data in the database
Make sure the data you are seeing in the browser is not "old" data inserted earlier on, but really is an insert from your latest run of the Java program. You can verify this by deleting the data via the browser before running the Java program using MATCH (n) OPTIONAL MATCH (n)-[r]->() DELETE n,r;
Make sure you are actually working against the same database directory. You can verify this by leaving the server running: if you can still start your Java program (and it is not using the Neo4j REST bindings), you are not using the same directory, because two Neo4j instances cannot run against the same database directory simultaneously.
I'm familiar with the java.sql.DatabaseMetaData interface, but I find it quite clunky to use. For example, in order to find out the table names, you have to call getTables and loop through the returned ResultSet, using well-known literals as the column names.
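Roughly, that plain-JDBC approach looks like this (a sketch; the ListTables class name and the H2 in-memory URL are just placeholders):
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListTables {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test")) {
            DatabaseMetaData meta = conn.getMetaData();
            // null catalog/schema, any table name, only plain tables
            try (ResultSet rs = meta.getTables(null, null, "%", new String[] { "TABLE" })) {
                while (rs.next()) {
                    // "TABLE_NAME" is one of the well-known column-name literals
                    System.out.println(rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}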
Is there an easier way to obtain database metadata?
It's easily done using DdlUtils:
import javax.sql.DataSource;
import org.apache.ddlutils.Platform;
import org.apache.ddlutils.PlatformFactory;
import org.apache.ddlutils.model.Database;
public void readMetaData(final DataSource dataSource) {
final Platform platform = PlatformFactory.createNewPlatformInstance(dataSource);
final Database database = platform.readModelFromDatabase("someName");
// Inspect the database as required; has objects like Table/Column/etc.
}
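For example, where the "Inspect the database as required" comment sits, you could list the table names (a sketch assuming DdlUtils' Database.getTables() model accessor):
// Print the name of every table in the model read from the data source
for (final org.apache.ddlutils.model.Table table : database.getTables()) {
    System.out.println(table.getName());
}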
Take a look at SchemaCrawler (free and open source), which is another API designed for this purpose. Some sample SchemaCrawler code:
// Create the options
final SchemaCrawlerOptions options = new SchemaCrawlerOptions();
// Set what details are required in the schema - this affects the
// time taken to crawl the schema
options.setSchemaInfoLevel(SchemaInfoLevel.standard());
options.setShowStoredProcedures(false);
// Sorting options
options.setAlphabeticalSortForTableColumns(true);
// Get the schema definition
// (the database connection is managed outside of this code snippet)
final Database database = SchemaCrawlerUtility.getDatabase(connection, options);
for (final Catalog catalog: database.getCatalogs())
{
for (final Schema schema: catalog.getSchemas())
{
System.out.println(schema);
for (final Table table: schema.getTables())
{
System.out.print("o--> " + table);
if (table instanceof View)
{
System.out.println(" (VIEW)");
}
else
{
System.out.println();
}
for (final Column column: table.getColumns())
{
System.out.println(" o--> " + column + " (" + column.getType()
+ ")");
}
}
}
}
http://schemacrawler.sourceforge.net/