cqlsh does not show the frozen collection in the table - java

CREATE TABLE data.banks (
id text,
codes frozen<map<text, text>>,
PRIMARY KEY (id, codes));
I added a corresponding model class with a @Frozen("map<text, text>") annotation on the codes field.
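For reference, the mapped class would look roughly like this with the DataStax object mapper annotations; the class name and accessors are illustrative, not the actual code:

import java.util.Map;

import com.datastax.driver.mapping.annotations.ClusteringColumn;
import com.datastax.driver.mapping.annotations.Frozen;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;

// Illustrative entity for data.banks; the class name is a placeholder.
@Table(keyspace = "data", name = "banks")
public class Bank {

    @PartitionKey
    private String id;

    // codes is part of the primary key, hence a clustering column and frozen.
    @ClusteringColumn
    @Frozen("map<text, text>")
    private Map<String, String> codes;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Map<String, String> getCodes() { return codes; }
    public void setCodes(Map<String, String> codes) { this.codes = codes; }
}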
The insert goes in properly, but when I open cqlsh and run
select * from data.banks
I get the following error:
Traceback (most recent call last):
File "/usr/bin/cqlsh", line 1078, in perform_simple_statement
rows = self.session.execute(statement, trace=self.tracing_enabled)
File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.6.0c2.post.zip/cassandra-driver-2.6.0c2.post/cassandra/cluster.py", line 1594, in execute
result = future.result(timeout)
File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.6.0c2.post.zip/cassandra-driver-2.6.0c2.post/cassandra/cluster.py", line 3296, in result
raise self._final_exception
error: unpack requires a string argument of length 4
One more problem: when I add a row with values ('1', {'code2':'435sdfd','code1':'2132sd'}), it shows one row inserted. But when I add another row with ('1', {'code2':'435sdfe','code1':'2132sd'}), it throws a TimedOut exception.
I am using Cassandra 2.1.8, cassandra-driver-mapping 2.1.8, and kundera-cassandra-pelops 3.0.


Connecting and reading data from Elasticsearch to Hive

I want to connect Hive to Elasticsearch. I followed the instructions from here.
I did the following steps:
1. start-dfs.sh
2. start-yarn.sh
3. launch elasticsearch
4. launch kibana
5. launch hive
Inside Hive:
a- create a database
b- create a table
c- load data into the table (LOAD DATA LOCAL INPATH '/home/myuser/Documents/datacsv/myfile.csv' OVERWRITE INTO TABLE students; )
d- add jar /home/myuser/elasticsearch-hadoop-7.10.1/dist/elasticsearch-hadoop-hive-7.10.1.jar
e- create a table for Elastic.
create table students_es (stt int not null, mahocvien varchar(10), tenho string, ten string, namsinh date, gioitinh string, noisinh string, namvaodang date, trinhdochuyenmon string, hesoluong float, phucaptrachnhiem float, chucvudct string, chucdqh string, dienuutien int, ghichu int) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.nodes' = '127.0.0.1', 'es.port' = '9201', 'es.resource' = 'students/student');
f- insert overwrite table students_es select * from students;
Then the error I got is the following:
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org/apache/commons/httpclient/protocol/ProtocolSocketFactory
I used the following components:
kibana: 7.10.1
hive : 3.1.2
hadoop: 3.1.2
I finally found how to solve it: you need to download the jar file commons-httpclient-3.1.jar and put it into your Hive lib directory.

Push Data to Existing Table in BigQuery

I am using Java and SQL to push data to a Timestamp partitioned table in BigQuery. In my code, I specify the destination table:
.setDestinationTable(TableId.of("MyDataset", "MyTable"))
When I run it, it creates a table perfectly. However, when I attempt to insert new data, it throws a BigQueryException claiming the table already exists:
Exception in thread "main" com.google.cloud.bigquery.BigQueryException:
Already Exists: Table MyProject:MyDataset.MyTable
After some documentation digging, I found a solution that works:
.setWriteDisposition(WriteDisposition.WRITE_APPEND)
Adding the above appends any data (even if it is a duplicate). I am not sure why the default setting for .setDestinationTable() is the equivalent of WRITE_EMPTY, which returns the "Already Exists" error. The Google docs for .setDestinationTable() say:
Describes the table where the query results should be stored. If not
present, a new table will be created to store the results.
The docs should probably clarify the default value.
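For illustration, a minimal sketch of a query job configuration that appends to the existing destination table; the query text, the project name, and the source dataset/table names are placeholders, not from the original code:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.JobInfo.WriteDisposition;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class AppendToTable {
    public static void main(String[] args) throws Exception {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        QueryJobConfiguration config = QueryJobConfiguration
            // Placeholder query and source table.
            .newBuilder("SELECT name, value FROM `MyProject.SourceDataset.SourceTable`")
            .setDestinationTable(TableId.of("MyDataset", "MyTable"))
            // Without this, an existing destination table makes the job fail,
            // as if WRITE_EMPTY were the default.
            .setWriteDisposition(WriteDisposition.WRITE_APPEND)
            .build();

        // Runs the query and appends the result rows to MyDataset.MyTable.
        bigquery.query(config);
    }
}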

Tracing a SQL Server stored procedure that does not appear in Profiler

I am trying to trace the execution of a SQL Server stored procedure that gets executed from within a large Java application. I first changed the stored procedure to complete the task I needed, tested it within SQL Server, and it worked fine, with all triggers working too. However, when I apply the changes and run the application, it seems like the changed stored procedure is not executed, only the original version. I tried to use Profiler to trace the execution, but the weird thing is that the procedure does not appear. Running the inserts manually does show the procedure in Profiler.
I tried to trace the application code that calls this procedure, found it, and added some trace comments to tag the execution, but again the trace output from the class never appears, as if the class does not get called. I have added the procedure below just in case someone sees a fault, but as I said, it works fine when tested in SQL Server.
BEGIN
    DECLARE @KeepStoreLingerTime int
    DECLARE @KeepDeleteLingerTime int
    DECLARE @StoreActionID int
    DECLARE @insertedID int;
    IF @TimeToExecute IS NULL
    BEGIN
        SELECT @KeepStoreLingerTime = [Value] FROM SystemProperty WHERE [Name] = 'KeepStoreLingerTimeInMinutes'
        SELECT @KeepDeleteLingerTime = [Value] FROM SystemProperty WHERE [Name] = 'KeepDeleteLingerTimeInMinutes'
        IF (@KeepDeleteLingerTime >= @KeepStoreLingerTime) SET @KeepStoreLingerTime = @KeepDeleteLingerTime + 1
        SET @TimeToExecute = dateadd(mi, @KeepStoreLingerTime, getutcdate())
    END
    SELECT @StoreActionID = [ID] FROM StoreActionQueue
    WHERE Parameter = @Parameter AND StorageRuleID = @StorageRuleID AND StoreFlags = @StoreFlags
    IF @StoreActionID IS NOT NULL AND @StoreFlags != 11
    BEGIN
        UPDATE StoreActionQueue SET FilterID = @FilterID WHERE [ID] = @StoreActionID
    END
    ELSE
    BEGIN
        INSERT INTO StoreActionQueue (TimeToExecute, Operation, Parameter, StorageRuleID, FilterID, StoreFlags)
        SELECT @TimeToExecute, @Operation, @Parameter, @StorageRuleID, @FilterID, @StoreFlags FROM Call
        WHERE [ID] = @Parameter AND Active = 0
        AND (OnlineCount > 0 OR OnlineScreenCount > 0 OR (Flags & POWER(2,26) = POWER(2,26))) -- bit 26 indicates META-DATA-ONLY
        -- only INSERT below if a row was successfully inserted above
        IF @@ROWCOUNT > 0
            IF @StoreFlags = 11
            BEGIN
                INSERT INTO StoreActionQueue (TimeToExecute, Operation, Parameter,
                    StorageRuleID, FilterID, StoreFlags)
                SELECT DATEADD(mi, 2, GETUTCDATE()), @Operation, @Parameter,
                    @StorageRuleID, @FilterID, 9 FROM Call
                WHERE [ID] = @Parameter AND Active = 0
            END
    END
END
Is there a way to log anything, say from within the procedure, to a file? And in Java (Eclipse), other than looking at the call hierarchy etc., is there a suggested way to find where the class gets called?
public class ProcessingController extends TimeController implements
TimeObserver {
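One low-tech way to confirm whether the Java side ever reaches the procedure is to log immediately around the JDBC call that invokes it. A minimal sketch, assuming the procedure is called through a CallableStatement; the procedure name AddStoreAction, the class name, and the parameter order are placeholders, not taken from the actual application:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;
import java.util.logging.Logger;

public class StoreActionQueueDao {

    private static final Logger LOG = Logger.getLogger(StoreActionQueueDao.class.getName());

    // Hypothetical wrapper around the stored procedure call; "AddStoreAction"
    // and the parameter order are placeholders for the real procedure.
    public void enqueue(Connection connection, int operation, int parameter,
                        int storageRuleId, int filterId, int storeFlags) throws SQLException {
        LOG.info("Calling AddStoreAction, parameter=" + parameter + ", storeFlags=" + storeFlags);
        try (CallableStatement call = connection.prepareCall(
                "{call AddStoreAction(?, ?, ?, ?, ?, ?)}")) {
            call.setNull(1, Types.TIMESTAMP); // @TimeToExecute (NULL -> computed in the procedure)
            call.setInt(2, operation);        // @Operation
            call.setInt(3, parameter);        // @Parameter
            call.setInt(4, storageRuleId);    // @StorageRuleID
            call.setInt(5, filterId);         // @FilterID
            call.setInt(6, storeFlags);       // @StoreFlags
            call.execute();
        }
        LOG.info("AddStoreAction returned, parameter=" + parameter);
    }
}

java.util.logging can be pointed at a file with a FileHandler, which covers the "log to a file" part on the Java side; if the surrounding log lines never show up, the class is not being called at all, which matches what Profiler is telling you.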

Hive throws an error while creating a table: "Cannot validate serde: com.cloudera.hive.serde.JSONSerDe"

I am working on apache-hive-0.13.1.
While creating a table, Hive throws the error below:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: com.cloudera.hive.serde.JSONSerDe
The table structure is:
create external table tweets(id BigInt, created_at String, source String, favorited Boolean, retweet_count int,
retweeted_status Struct <
text:String,user:Struct<
screen_name:String, name:String>>,
entities Struct<
urls:Array<Struct<
expanded_url:String>>,
user_mentions:Array<Struct<
screen_name:String,
name:String>>,
hashtags:Array<Struct<text:String>>>,
text String,
user Struct<
screen_name:String,
name:String,
friends_count:int,
followers_count:int,
statuses_count:int,
verified:boolean,
utc_offset:int,
time_zone:String> ,
in_reply_to_screen_name String)
partitioned by (datehour int)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
location '/home/edureka/sachinG'
I added json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar to the classpath to resolve the issue, but with no success.
Finally, I got a solution for this. The issue is with json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar.
Different distributions (Cloudera, Azure, etc.) need different JSON-SerDe jar files; the SerDe jar has to be compatible with your distribution.
I changed the jar and it worked for me.
I faced a similar issue while working with Hive 1.2.1 and HBase 0.98. I followed the steps below and the issue was resolved.
1) Copied all the hbase-* files from hbase/lib location to hive/lib directory
2) Verified that the hive-hbase-handler-1.2.1.jar was present in hive/lib
3) Verified that hive-serde-1.2.1.jar was present in hive/lib
4) Verified that zookeeper-3.4.6.jar was present in hive/lib (if not, copy it from hbase/lib to hive/lib)
5) In hive-site.xml (if not present, use hive-default.xml.template) located at hive/conf, set the path '/usr/local/hive/lib/' in both
a) the hive.aux.jars.path property and
b) the hive.added.jars.path property.
6) Open the Hive terminal and create the table using the command below:
CREATE TABLE emp_hive (
RowKey_HIVE String,
Employee_No int,
Employee_Name String,
Job String,
Mgr int,
Hiredate String,
Salary int,
Commision int,
Department_No int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =":key,details:Employee_No_hbase,details:Employee_Name_hbase,details:job_hbase,details:Mgr_hbase,details:hiredate_hbase,details:salary_hbase,details:commision_hbase,details:department_no_hbase")
TBLPROPERTIES("hbase.table.name"="emp_hbase");

Error importing a TSV into HBase

I created a table in hbase using:
create 'Province','ProvinceINFO'
Now I want to import my data from a TSV file into it. My TSV file has two columns: ProvinceID (as the primary key) and ProvinceName.
I am using the command below for the import:
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,'
-Dimporttsv.columns= HBASE_ROW_KEY, ProvinceINFO:ProvinceName Province /usr/data
/Province.csv
but it gives me this error:
ERROR: No columns specified. Please specify with -Dimporttsv.columns=...
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
Imports the given input directory of TSV data into the specified table.
The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data. Another special column HBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify at most one column as timestamp key for each imported record.
Record with invalid timestamps (blank, non-numeric) will be treated as bad record.
Note: if you use this option, then 'importtsv.timestamp' option will be ignored.
By default importtsv will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
-Dimporttsv.bulk.output=/path/for/output
Note: if you do not use this option, then the target table must already exist in HBase
Other options that may be specified with -D include:
-Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line
'-Dimporttsv.separator=|' - eg separate on pipes instead of tabs
-Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import
-Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of
org.apache.hadoop.hbase.mapreduce.TsvImporterMapper
-Dmapred.job.name=jobName - use the specified mapreduce job name for the import
For performance consider the following options:
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
Maybe also try wrapping the columns argument in a string, i.e.
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','
-Dimporttsv.columns="HBASE_ROW_KEY, ProvinceINFO:ProvinceName" Province /usr/data
/Province.csv
You should try something like:
bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=','
-Dimporttsv.columns=HBASE_ROW_KEY,ProvinceINFO:ProvinceName Province /usr/data
/Province.csv
Try to remove the spaces in -Dimporttsv.columns=a,b,c.
