I am trying to update Hive table partitions using the Hive Java APIs. These are the steps I am following to achieve this:
1. Extracting partitions which are not in the metastore.
2. Adding these partitions to the table.
3. Going back to the Hive command line and running show partitions and msck repair table, just to make sure everything is fine.
What I got:
1. Show partitions is working fine (it gives the list of partitions which I have added).
2. MSCK REPAIR is not working (I get: Partitions are not present in metastore.)
Here is the piece of code that I am using:
public class HiveMetastoreChecker {

    public static void main(String[] args) {
        final String dbName = "db_name";
        final String tableName = "db_name.table_name";
        CheckResult result = new CheckResult();
        try {
            Configuration configuration = new Configuration();
            HiveConf conf = new HiveConf();
            conf.addResource(configuration);
            Hive hive = Hive.get(conf, true);
            HiveMetaStoreChecker checker = new HiveMetaStoreChecker(hive);
            Table table = new Table(dbName, tableName);
            table.setDbName(dbName);
            table.setInputFormatClass(TextInputFormat.class);
            table.setOutputFormatClass(HiveIgnoreKeyTextOutputFormat.class);
            table = hive.getTable(dbName, tableName);
            checker.checkMetastore(dbName, tableName, null, result);
            System.out.println(table.getDataLocation());

            List<CheckResult.PartitionResult> partitionNotInMs = result.getPartitionsNotInMs();
            System.out.println("not in ms " + partitionNotInMs.size());
            List<org.apache.hadoop.hive.ql.metadata.Partition> partitions = hive.getPartitions(table);
            System.out.println("partitions size " + partitions.size());

            AddPartitionDesc apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
            List<String> finalListOfPartitionsNotInMs = new ArrayList<String>();
            for (CheckResult.PartitionResult part : partitionNotInMs) {
                if (!finalListOfPartitionsNotInMs.contains(part.getPartitionName().replace("/", ""))) {
                    finalListOfPartitionsNotInMs.add(part.getPartitionName().replace("/", ""));
                }
            }
            for (String partition : finalListOfPartitionsNotInMs) {
                apd.addPartition(Warehouse.makeSpecFromName(partition), table.getDataLocation().toString());
            }
            hive.createPartitions(apd);
        } catch (HiveException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (MetaException e) {
            e.printStackTrace();
        }
    }
}
Any kind of help would be appreciated.
Thanks.
Is MSCK REPAIR failing on Hive? If yes, check whether the partition column name is in CAPITAL letters. I found the same issue, where my partition on AWS S3 was like DCA=1000.
If that is the case, and you don't want to rename the partition to lower case, execute MSCK REPAIR using Spark SQL and it will work.
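For example, a minimal sketch of running the repair through Spark SQL from Java (assuming a Spark build with Hive support on the classpath; the app name and table name below are placeholders):

import org.apache.spark.sql.SparkSession;

public class MsckRepairRunner {
    public static void main(String[] args) {
        // Session with access to the Hive metastore
        SparkSession spark = SparkSession.builder()
                .appName("msck-repair") // placeholder
                .enableHiveSupport()
                .getOrCreate();

        // Per the note above, Spark SQL tolerates the upper-case partition
        // directory names that make the Hive CLI report
        // "Partitions are not present in metastore."
        spark.sql("MSCK REPAIR TABLE db_name.table_name");

        spark.stop();
    }
}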
I got an error in my QuickFIX/J application. First, I got an error like this:
Out of order repeating group members
After that, I added this text into my initiator.config:
ValidateUserDefinedFields=N
ValidateIncomingMessage=N
But now I got another error in my application:
quickfix.FieldNotFound: Field was not found in message, field=55
at quickfix.FieldMap.getField(FieldMap.java:223)
at quickfix.FieldMap.getString(FieldMap.java:237)
at com.dxtr.fastmatch.marketdatarequestapps.TestMarketdataRequest.fromApp(TestMarketdataRequest.java:38)
at quickfix.Session.fromCallback(Session.java:1847)
at quickfix.Session.verify(Session.java:1791)
at quickfix.Session.verify(Session.java:1862)
at quickfix.Session.next(Session.java:1047)
at quickfix.Session.next(Session.java:1204)
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:163)
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:113)
at quickfix.mina.SingleThreadedEventHandlingStrategy.lambda$blockInThread$1(SingleThreadedEventHandlingStrategy.java:145)
at quickfix.mina.SingleThreadedEventHandlingStrategy$ThreadAdapter$RunnableWrapper.run(SingleThreadedEventHandlingStrategy.java:267)
at java.lang.Thread.run(Thread.java:748)
My code for getting the value of Symbol is:
public void fromApp(quickfix.Message message, SessionID sessionID)
        throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {
    try {
        String symbol = message.getString(Symbol.FIELD);
        System.out.println(" FromApp " + message);
        message.getString(TransactTime.FIELD);
        // String seqNo = message.getString(MsgSeqNum.FIELD);
        double bid = message.getDouble(MDEntryPx.FIELD);
        double ask = message.getDouble(MDEntryPx.FIELD);
        // System.out.println(seqNo + " " + message);
    } catch (FieldNotFound fieldNotFound) {
        fieldNotFound.printStackTrace();
    }
}
I have also tried this code:
public void onMessage(MarketDataIncrementalRefresh message, SessionID sessionID) throws FieldNotFound {
    try {
        MDReqID mdreqid = new MDReqID();
        SendingTime sendingtime = new SendingTime();
        NoMDEntries nomdentries = new NoMDEntries();
        quickfix.fix42.MarketDataIncrementalRefresh.NoMDEntries group
                = new quickfix.fix42.MarketDataIncrementalRefresh.NoMDEntries();
        MDUpdateAction mdupdateaction = new MDUpdateAction();
        DeleteReason deletereason = new DeleteReason();
        MDEntryType mdentrytype = new MDEntryType();
        MDEntryID mdentryid = new MDEntryID();
        Symbol symbol = new Symbol();
        MDEntryOriginator mdentryoriginator = new MDEntryOriginator();
        MDEntryPx mdentrypx = new MDEntryPx();
        Currency currency = new Currency();
        MDEntrySize mdentrysize = new MDEntrySize();
        ExpireDate expiredate = new ExpireDate();
        ExpireTime expiretime = new ExpireTime();
        NumberOfOrders numberoforders = new NumberOfOrders();
        MDEntryPositionNo mdentrypositionno = new MDEntryPositionNo();
        message.getField(nomdentries);
        message.getField(sendingtime);
        message.getGroup(1, group);
        int list = nomdentries.getValue();
        for (int i = 0; i < list; i++) {
            message.getGroup(i + 1, group);
            group.get(mdupdateaction);
            if (mdupdateaction.getValue() == '2')
                System.out.println("Enter");
            group.get(deletereason);
            group.get(mdentrytype);
            group.get(mdentryid);
            group.get(symbol);
            group.get(mdentryoriginator);
            if (mdupdateaction.getValue() == '0')
                group.get(mdentrypx);
            group.get(currency);
            if (mdupdateaction.getValue() == '0')
                group.get(mdentrysize);
        }
        // use Java format specifiers, not {0}-style placeholders
        System.out.printf("Got Symbol %s Price %s%n",
                symbol.getValue(), mdentrypx.getValue());
    } catch (Exception ex) {
        System.out.println("error" + ex);
    }
}
but I also got an error like this:
quickfix.FieldNotFound: Field was not found in message, field=55
at quickfix.FieldMap.getField(FieldMap.java:223)
at quickfix.FieldMap.getString(FieldMap.java:237)
at com.dxtr.fastmatch.marketdatarequestapps.TestMarketdataRequest.fromApp(TestMarketdataRequest.java:39)
at quickfix.Session.fromCallback(Session.java:1847)
at quickfix.Session.verify(Session.java:1791)
at quickfix.Session.verify(Session.java:1862)
at quickfix.Session.next(Session.java:1047)
at quickfix.Session.next(Session.java:1204)
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:163)
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:113)
at quickfix.mina.SingleThreadedEventHandlingStrategy.lambda$blockInThread$1(SingleThreadedEventHandlingStrategy.java:145)
at quickfix.mina.SingleThreadedEventHandlingStrategy$ThreadAdapter$RunnableWrapper.run(SingleThreadedEventHandlingStrategy.java:267)
at java.lang.Thread.run(Thread.java:748)
and here is the value I checked in my message.log:
8=FIX.4.2^A9=0217^A35=X^A34=7291^A49=Fastmatch1^A52=20200401-10:47:59.833^A56=MDValueTrade2UAT1^A262=VT_020^A268=02^A279=2^A55=GBP/CHF^A269=0^A278=1140851192^A270=1.19503^A271=02000000^A279=0^A55=GBP/CHF^A269=0^A278=1140851194^A270=1.19502^A271=06000000^A10=114^A
My broker is sending me the prices, etc.
My question is: how can I fix this problem in my code?
First, I got an error like this:
Out of order repeating group members
Your data dictionary doesn't match your counterparty's. Fix that and this will go away.
After that, I added this text into my initiator.config:
ValidateUserDefinedFields=N
ValidateIncomingMessage=N
This did not fix anything -- it HIDES your actual problem and has you looking at a new fake problem.
What you need to do:
Your configuration has this, right?
UseDataDictionary=Y
DataDictionary=path/to/FIXnn.xml
# or if FIX5:
AppDataDictionary=path/to/FIX5n.xml
TransportDataDictionary=path/to/FIXT.xml
Find your counterparty's documentation, and make sure your xml file's messages and fields match what they say they're going to send you. Make sure all repeating groups have the same fields in the same order.
Here is some documentation about how the Data Dictionary xml file is structured. It's pretty easy.
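For example, a repeating-group definition in a FIX 4.2 dictionary is shaped roughly like this (the fields and their order here are only illustrative; your counterparty's spec is the authority):

<message name="MarketDataIncrementalRefresh" msgtype="X" msgcat="app">
  <field name="MDReqID" required="N"/>
  <group name="NoMDEntries" required="Y">
    <field name="MDUpdateAction" required="Y"/>
    <field name="MDEntryType" required="N"/>
    <field name="MDEntryID" required="N"/>
    <field name="Symbol" required="N"/>
    <field name="MDEntryPx" required="N"/>
    <field name="MDEntrySize" required="N"/>
  </group>
</message>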
I am trying to read a database table that has millions of records, and I am using JdbcPagingItemReader for that reason.
However, I am at the testing phase for now, and I am trying to limit the total number of records that I read using JdbcPagingItemReader.
I know this should be simple; it's just hiding somewhere.
This is what my reader looks like:
@Bean(name = "metadataItemReader")
public ItemReader<DocumentMetadata> itemReader(@Value("${count}") int count) {
    JdbcPagingItemReader<DocumentMetadata> reader = new JdbcPagingItemReader<>();
    final SqlPagingQueryProviderFactoryBean sqlPagingQueryProviderFactoryBean = new SqlPagingQueryProviderFactoryBean();
    sqlPagingQueryProviderFactoryBean.setDataSource(dataSource);
    sqlPagingQueryProviderFactoryBean.setSelectClause("select id, file_path, file_name, extension, created_by, TO_CHAR(create_date, 'yyyy-mm-dd hh24:mi:ss') as create_date");
    sqlPagingQueryProviderFactoryBean.setFromClause("from document_metadata");
    sqlPagingQueryProviderFactoryBean.setSortKey("id");
    try {
        reader.setQueryProvider(sqlPagingQueryProviderFactoryBean.getObject());
    } catch (Exception e) {
        log.error(e.getMessage());
    }
    reader.setDataSource(dataSource);
    reader.setPageSize(10);
    reader.setRowMapper(new MetadataRowMapper());
    return reader;
}
The JdbcPagingItemReader is an AbstractItemCountingItemStreamItemReader bean; it has a maxItemCount property which you can set in order to achieve what you want.
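For example, a minimal sketch reusing the reader and the injected count from the bean above, placed just before the return statement:

// maxItemCount is inherited from AbstractItemCountingItemStreamItemReader;
// the reader signals end-of-input once `count` items have been read.
reader.setMaxItemCount(count);
return reader;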
I have the following rows with these keys in HBase table "mytable":
user_1
user_2
user_3
...
user_9999999
I want to use the HBase shell to delete rows from:
user_500 to user_900
I know there is no built-in way to delete such a range, but is there a way I could use the "BulkDeleteProcessor" to do this?
I see here:
https://github.com/apache/hbase/blob/master/hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/TestBulkDeleteProtocol.java
I want to just paste in the imports and then paste this into the shell, but I have no idea how to go about this. Does anyone know how I can use this endpoint from the JRuby HBase shell?
Table ht = TEST_UTIL.getConnection().getTable("my_table");
long noOfDeletedRows = 0L;
Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
        new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
    ServerRpcController controller = new ServerRpcController();
    BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
            new BlockingRpcCallback<BulkDeleteResponse>();

    public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
        Builder builder = BulkDeleteRequest.newBuilder();
        builder.setScan(ProtobufUtil.toScan(scan));
        builder.setDeleteType(deleteType);
        builder.setRowBatchSize(rowBatchSize);
        if (timeStamp != null) {
            builder.setTimestamp(timeStamp);
        }
        service.delete(controller, builder.build(), rpcCallback);
        return rpcCallback.get();
    }
};
Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class,
        scan.getStartRow(), scan.getStopRow(), callable);
for (BulkDeleteResponse response : result.values()) {
    noOfDeletedRows += response.getRowsDeleted();
}
ht.close();
If there is no way to do this through JRuby, a Java or other way to quickly delete multiple rows is fine.
Do you really want to do it in the shell? There are various other, better ways. One way is using the native Java API:
Construct an ArrayList of Delete objects
pass this list to the Table.delete method
Method 1: if you already know the range of keys.
public void massDelete(byte[] tableName) throws IOException {
    HTable table = (HTable) hbasePool.getTable(tableName);
    String tablePrefix = "user_";
    int startRange = 500;
    int endRange = 999;
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    for (int i = startRange; i <= endRange; i++) {
        String key = tablePrefix + i;
        Delete d = new Delete(Bytes.toBytes(key));
        listOfBatchDelete.add(d);
    }
    try {
        table.delete(listOfBatchDelete);
    } finally {
        if (hbasePool != null && table != null) {
            hbasePool.putTable(table);
        }
    }
}
Method 2: If you want to do a batch delete on the basis of a scan result.
public void bulkDelete(final HTable table) throws IOException {
    Scan s = new Scan();
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    // add your filter to the scan (Scan takes a single filter via setFilter)
    // s.setFilter(yourFilter);
    ResultScanner scanner = table.getScanner(s);
    for (Result rr : scanner) {
        Delete d = new Delete(rr.getRow());
        listOfBatchDelete.add(d);
    }
    try {
        table.delete(listOfBatchDelete);
    } catch (Exception e) {
        LOGGER.log(e);
    }
}
Now, coming down to using a coprocessor: just one piece of advice, DON'T USE a coprocessor unless you are an expert in HBase.
Coprocessors have many inbuilt issues; if you need, I can provide a detailed description.
Secondly, when you delete anything from HBase it is never directly deleted; a tombstone marker gets attached to that record, and it is removed later during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive.
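That said, if you want the space reclaimed promptly after a bulk delete, you can trigger a major compaction yourself. A rough sketch with the classic HBaseAdmin API (hConf and tableName as created in the configuration snippet further down):

HBaseAdmin admin = new HBaseAdmin(hConf);
admin.majorCompact(tableName); // physically removes the tombstoned cells
admin.close();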
Modified code to support batch operation.
int batchSize = 50;
int batchCounter = 0;
for (int i = startRange; i <= endRange; i++) {
    String key = tablePrefix + i;
    Delete d = new Delete(Bytes.toBytes(key));
    listOfBatchDelete.add(d);
    batchCounter++;
    if (batchCounter == batchSize) {
        try {
            table.delete(listOfBatchDelete);
        } finally {
            listOfBatchDelete.clear();
            batchCounter = 0;
        }
    }
}
// flush any remaining deletes smaller than a full batch
if (!listOfBatchDelete.isEmpty()) {
    table.delete(listOfBatchDelete);
}
Creating HBase conf and getting table instance.
Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);
HTable hTable = new HTable(hConf, tableName);
If you are already aware of the row keys of the records that you want to delete from the HBase table, then you can use the following approach.
1. First create a List of Delete objects with these row keys:
for (int rowKey = 1; rowKey <= 10; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
}
2. Then get the Table object by using an HBase Connection:
Table table = connection.getTable(TableName.valueOf(tableName));
3. Once you have the Table object, call delete() by passing it the list:
table.delete(deleteList);
The complete code will look like the below:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
String tableName = "users";
Connection connection = ConnectionFactory.createConnection(config);
Table table = connection.getTable(TableName.valueOf(tableName));
List<Delete> deleteList = new ArrayList<Delete>();
for (int rowKey = 500; rowKey <= 900; rowKey++) {
deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
}
table.delete(deleteList);
I think the method starting at line 200 here is relevant (edit: I needed to add a parameter to the line
Insert insertReq = bigquery.jobs().insert(PROJECT_ID, insertJob);
) but it doesn't work. I get "Load configuration must specify at least one source URI".
I have tried the following:
TableSchema schema = new TableSchema();
List<TableFieldSchema> tableFieldSchema = new ArrayList<TableFieldSchema>();
TableFieldSchema schemaEntry = new TableFieldSchema();
schemaEntry.setName(myFirstFieldName);
schemaEntry.setType("STRING");
tableFieldSchema.add(schemaEntry);
schema.setFields(tableFieldSchema);
Table table = new Table();
table.setSchema(schema);
table.setId(tableName);
table.setCreationTime(System.currentTimeMillis());
table.setKind("bigquery#table");
try {
    bigquery.tables().insert(PROJECT_ID, DATASET_ID, table).execute();
} catch (IOException e) {
}
but I get the error "Required parameter is missing".
OK, based on the idea by Jordan Tigani, here is the Java code that works to create a blank table in BigQuery with the Google Java API Client:
TableSchema schema = new TableSchema();
List<TableFieldSchema> tableFieldSchema = new ArrayList<TableFieldSchema>();
TableFieldSchema schemaEntry = new TableFieldSchema();
schemaEntry.setName(myFirstFieldName);
schemaEntry.setType("STRING");
tableFieldSchema.add(schemaEntry);
schema.setFields(tableFieldSchema);
Table table = new Table();
table.setSchema(schema);
TableReference tableRef = new TableReference();
tableRef.setDatasetId(DATASET_ID);
tableRef.setProjectId(PROJECT_ID);
tableRef.setTableId(tableId);
table.setTableReference(tableRef);
try {
    bigquery.tables().insert(PROJECT_ID, DATASET_ID, table).execute();
} catch (IOException e) {
}
To create a dataset (before creating the table):
Dataset dataset = new Dataset();
DatasetReference datasetRef = new DatasetReference();
datasetRef.setProjectId(PROJECT_ID);
datasetRef.setDatasetId(DATASET_ID);
dataset.setDatasetReference(datasetRef);
try {
    bigquery.datasets().insert(PROJECT_ID, dataset).execute();
} catch (IOException e) {
}
Try setting the project id and the dataset id on the table (I realize that it seems redundant because you specify them on the insert() operation, but that is a quirk of REST: the project and dataset are part of the URL, but they are also part of the resource).
From a raw HTTP api level, the following works:
https://www.googleapis.com/bigquery/v2/projects/myproject/datasets/mydataset/tables?alt=json
{"tableReference":
{"tableId": "dfdlkfjx", "projectId": "myproject", "datasetId": "mydataset"},
"schema":
{"fields": [{"name": "a", "type": "STRING"}]}}
Open a mongo shell and create a document with an undefined value:
> mongo
MongoDB shell version: 2.4.0
connecting to: test
> use mydb
switched to db mydb
> db.mycol.insert( {a_number:1, a_string:"hi world", a_null:null, an_undefined:undefined} );
> db.mycol.findOne();
{
    "_id" : ObjectId("51c2f28a7aa5079cf24e3999"),
    "a_number" : 1,
    "a_string" : "hi world",
    "a_null" : null,
    "an_undefined" : null
}
As we can see, JavaScript translates the "undefined" value (stored in the db) to a "null" value when showing it to the user. But in the db the value is still "undefined", as we are going to see with Java.
Let's create a "bug_undefined_java_mongo.java" file, with the following content:
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class bug_undefined_java_mongo
{
    String serv_n = "myserver"; // server name
    String db_n = "mydb";       // database name
    String col_n = "mycol";     // collection name

    public static void main(String[] args)
    {
        new bug_undefined_java_mongo().start();
    }

    public void start()
    {
        pr("Connecting to server ...");
        MongoClient cli = null;
        try
        {
            cli = new MongoClient( serv_n );
        }
        catch (Exception e)
        {
            pr("Can't connect to server: " + e);
            System.exit(1);
        }
        if (cli == null)
        {
            pr("Can't connect to server");
            System.exit(1);
        }

        pr("Selecting db ...");
        DB db_res = cli.getDB( db_n );

        pr("Selecting collection ...");
        DBCollection col = db_res.getCollection( col_n );

        pr("Searching documents ...");
        DBCursor cursor = null;
        try
        {
            cursor = col.find( );
        }
        catch (Exception e)
        {
            pr("Can't search for documents: " + e);
            System.exit(1);
        }

        pr("Printing documents ...");
        try
        {
            while (cursor.hasNext())
            {
                Object doc_obj = cursor.next();
                System.out.println("doc: " + doc_obj);
            }
        }
        catch (Exception e)
        {
            pr("Can't browse documents: " + e);
            return;
        }
        finally
        {
            pr("Closing cursor ...");
            cursor.close();
        }
    }

    public void pr(String cad)
    {
        System.out.println(cad);
    }
}
After compiling and running it, we get this:
Connecting to server ...
Selecting db ...
Selecting collection ...
Searching documents ...
Printing documents ...
doc: { "_id" : { "$oid" : "51c2f0f85353d3425fcb5a14"} , "a_number" : 1.0 , "a_string" : "hi world" , "a_null" : null }
Closing cursor ...
We see that the "a_null:null" pair is shown, but... the "an_undefined:undefined" pair has disappeared! (both the key and the value).
Why? Is it a bug?
Thank you
Currently undefined is not supported by the Java driver, as there is no equivalent mapping in Java.
Other drivers, such as pymongo and the js shell, handle this differently by casting undefined to None when representing the data; however, it is a separate datatype, and it is deprecated in the BSON spec.
If you need it in the Java driver, then you will have to code your own decoder factory and then set it like so:
collection.setDBDecoderFactory(MyDecoder.FACTORY);
A minimal example that has defined handling for undefined, plus a factory, is available on GitHub in the horn of mongo repo.
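For illustration, here is a rough sketch of such a decoder for the 2.x driver (the class and names are mine, not from that repo): it remaps undefined to null instead of silently dropping the key.

import com.mongodb.DBCallback;
import com.mongodb.DBCollection;
import com.mongodb.DBDecoder;
import com.mongodb.DBDecoderFactory;
import com.mongodb.DefaultDBCallback;
import com.mongodb.DefaultDBDecoder;

public class UndefinedToNullDecoder extends DefaultDBDecoder {
    public static final DBDecoderFactory FACTORY = new DBDecoderFactory() {
        public DBDecoder create() {
            return new UndefinedToNullDecoder();
        }
    };

    @Override
    public DBCallback getDBCallback(DBCollection collection) {
        return new DefaultDBCallback(collection) {
            @Override
            public void gotUndefined(String name) {
                gotNull(name); // keep the key; store null instead of dropping it
            }
        };
    }
}

and then:

collection.setDBDecoderFactory(UndefinedToNullDecoder.FACTORY);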
I see, creating a factory could be a solution.
Anyway, many developers would probably find it useful to have the possibility of enabling a mapping in the driver to convert "undefined" values to "null" automatically. For example, by calling a mapUndefToNull() method:
cli = new MongoClient( myserver );
cli.mapUndefToNull(true);
In my case, I'm running a MapReduce (it is JavaScript code) on my collections, and I am having to explicitly convert the undefined values (generated when accessing non-existent keys) to null, in order to keep the Java driver from removing them:
try { value = this[ key ] } catch(e) { value = null }
if (typeof value == "undefined") value = null; // keep the Java driver from removing it
So, as a suggestion, I'd like the mapUndefToNull() method to be added to the Java driver, if possible.
Thank you