I am looking for ways to update the schema of an existing table in BigQuery. I could see how to do this in Python here, which is an API request. I wanted to do the same in Java, and going through the documentation and source code I could find
TableDefinition tableDefinition = StandardTableDefinition.of(schema);
table.toBuilder().setDefinition(tableDefinition)
But it rewrites the whole schema. Other possible ways of updating the schema can be found here.
Can someone guide me on adding new columns to an existing table in BigQuery using Java?
Have a look at this GitHub issue. You need to specify the entire schema again, including your new columns.
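For completeness, here is a minimal sketch of that pattern with the google-cloud-bigquery Java client; the dataset, table, and column names are placeholders, and note that columns appended this way have to be NULLABLE or REPEATED:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.LegacySQLTypeName;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import java.util.ArrayList;
import java.util.List;

public class AddColumnExample {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder dataset and table names
        Table table = bigquery.getTable(TableId.of("my_dataset", "my_table"));

        // Start from the existing schema and append the new column
        Schema existingSchema = table.getDefinition().getSchema();
        List<Field> fields = new ArrayList<>(existingSchema.getFields());
        fields.add(Field.newBuilder("new_column", LegacySQLTypeName.STRING)
                .setMode(Field.Mode.NULLABLE)
                .build());

        // Send the full (old + new) schema back to BigQuery
        table.toBuilder()
                .setDefinition(StandardTableDefinition.of(Schema.of(fields.toArray(new Field[0]))))
                .build()
                .update();
    }
}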
I'm trying to save a Dataset to a Cassandra database using Java Spark.
I'm able to read data into a Dataset successfully using the code below:
Dataset<Row> readdf = sparkSession.read().format("org.apache.spark.sql.cassandra")
        .option("keyspace", "dbname")
        .option("table", "tablename")
        .load();
But when I try to write the Dataset, I'm getting an IOException: Could not load or find table, found similar tables in keyspace
readdf.write().format("org.apache.spark.sql.cassandra")
        .option("keyspace", "dbname")
        .option("table", "tablename")
        .save();
I'm setting the host and port in the SparkSession. The thing is, I'm able to write in overwrite and append modes, but I'm not able to create the table.
The versions I'm using are:
Spark (Java) 2.0
Spark Cassandra Connector 2.3
I tried different jar versions but nothing worked. I have also gone through various Stack Overflow and GitHub links.
Any help is greatly appreciated.
The write operation in Spark doesn't have a mode that will automatically create a table for you; there are multiple reasons for that. One of them is that you need to define a primary key for your table, otherwise you may simply overwrite data if you set an incorrect primary key. Because of this, the Spark Cassandra Connector provides a separate method to create a table based on your DataFrame structure, but you need to provide lists of the partition and clustering key columns. In Java it looks like the following (full code is here):
// Wrap the Dataset to get access to the connector's createCassandraTable helper
DataFrameFunctions dfFunctions = new DataFrameFunctions(dataset);
// Partition key and clustering columns, converted to Scala Option<Seq<String>>
Option<Seq<String>> partitionSeqlist = new Some<>(JavaConversions.asScalaBuffer(
        Arrays.asList("part")).seq());
Option<Seq<String>> clusteringSeqlist = new Some<>(JavaConversions.asScalaBuffer(
        Arrays.asList("clust", "col2")).seq());
// Connector built from the SparkConf of the current session
CassandraConnector connector = new CassandraConnector(
        CassandraConnectorConf.apply(spark.sparkContext().getConf()));
// Create the table "widerows6" in keyspace "test" from the Dataset's schema
dfFunctions.createCassandraTable("test", "widerows6",
        partitionSeqlist, clusteringSeqlist, connector);
and then you can write data as usual:
dataset.write()
.format("org.apache.spark.sql.cassandra")
.options(ImmutableMap.of("table", "widerows6", "keyspace", "test"))
.save();
I am new to HBase. I would like to know how to retrieve all the columns from two column families in a single query using the Java API. Could you also please provide a link that gives a brief overview of HBase's internal architecture?
You can add multiple column families with the addFamily method.
Get get = new Get(rowKey);
get.addFamily(columnFamily1);
get.addFamily(columnFamily2);
If you use a Scan:
Scan scan = new Scan(startRowKey, stopRowKey);
scan.addFamily(columnFamily1);
scan.addFamily(columnFamily2);
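Putting it together, here is a minimal sketch of running such a Get against a table and printing every cell from both families; the table name, row key, and family names are placeholders. A Scan can be executed the same way with table.getScanner(scan).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TwoFamilyGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {

            // Ask for every column of both families in a single Get
            Get get = new Get(Bytes.toBytes("row1"));
            get.addFamily(Bytes.toBytes("cf1"));
            get.addFamily(Bytes.toBytes("cf2"));

            Result result = table.get(get);
            for (Cell cell : result.rawCells()) {
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                        Bytes.toString(CellUtil.cloneQualifier(cell)) + " = " +
                        Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}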
As for documentation, you can find everything in the HBase book on the HBase website.
https://hbase.apache.org/2.0/book.html
We are trying to save a DataFrame to a Hive table using the saveAsTable() method, but we are getting the exception below. We are trying to store the data as TextInputFormat.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Saving data in the Hive serde table `cdx_network`.`inv_devices_incr` is not supported yet. Please use the insertInto() API as an alternative..;
reducedFN.write().mode(SaveMode.Append).saveAsTable("cdx_network.alert_pas_incr");
I tried insertInto() together with enableHiveSupport() and it works, but I want to use saveAsTable().
I want to understand why saveAsTable() does not work. I tried going through the documentation and the code but did not learn much. It is supposed to work. I have seen issues raised by people who are using the Parquet format, but for TextInputFormat I did not see any issues.
Table definition
CREATE TABLE `cdx_network.alert_pas_incr`(
`alertid` string,
`alerttype` string,
`alert_pas_documentid` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'maprfs:/apps/cdx-dev/alert_pas_incr'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1524121971')
Looks like this is a bug. I did a little research and found issue SPARK-19152. The fixed version is 2.2.0. Unfortunately I can't verify it, because my company's cluster uses version 2.1.0.
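Until you can move to a fixed version, here is a minimal sketch of the insertInto() workaround mentioned in the question, assuming the SparkSession is built with enableHiveSupport() and the target table already exists; the source Dataset here is just a placeholder:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HiveInsertExample {
    public static void main(String[] args) {
        // enableHiveSupport() is required for writing to Hive serde tables
        SparkSession spark = SparkSession.builder()
                .appName("hive-insert-example")
                .enableHiveSupport()
                .getOrCreate();

        // Placeholder source; in the question this is the reducedFN Dataset
        Dataset<Row> reducedFN = spark.table("cdx_network.some_source_table");

        // insertInto() matches columns by position, not name, so the Dataset's
        // column order must match the existing table definition
        // (alertid, alerttype, alert_pas_documentid).
        reducedFN.write()
                .mode(SaveMode.Append)
                .insertInto("cdx_network.alert_pas_incr");

        spark.stop();
    }
}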
I'm trying to export data from a BigQuery table to GCS using the Java API.
The BigQuery table is in Project A while the GCS bucket is in Project B, and I have two different accounts (keys) to access them.
It seems there is no way in the JobConfigurationExtract object to specify destination credentials and project details; it only covers the BigQuery object/table.
Is there a way to overcome this limitation? Anyone experiencing similar issues?
Code snippet:
JobConfigurationExtract extract =
new JobConfigurationExtract().setSourceTable(table).setDestinationUri(cloudStoragePath);
return bigquery
.jobs()
.insert(
table.getProjectId(),
new Job().setConfiguration(new JobConfiguration().setExtract(extract)))
.execute();
}
Thanks!
I'm new to Accumulo, and this may sound silly, but I was wondering how to set up a table through the API? The documentation is definitely lacking. I have been able to find
conn.tableOperations().create("myTable");
as well as setting up locality groups:
HashSet<Text> metadataColumns = new HashSet<Text>();
metadataColumns.add(new Text("domain"));
metadataColumns.add(new Text("link"));
HashSet<Text> contentColumns = new HashSet<Text>();
contentColumns.add(new Text("body"));
contentColumns.add(new Text("images"));
localityGroups.put("metadata", metadataColumns);
localityGroups.put("content", contentColumns);
conn.tableOperations().setLocalityGroups("mytable", localityGroups);
Map<String, Set<Text>> groups =
conn.tableOperations().getLocalityGroups("mytable");
from the documentation. But I want to know how to take the first approach: build the table first, then define the columns.
Thanks in advance!
There is no inherent schema for a table to set up. Once it is created using the API you found, you can insert whatever key-value pairs you wish in it.
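For illustration, here is a minimal sketch of that idea, assuming the classic Connector API from the question; the table, row, and family/qualifier names are placeholders. The column families and qualifiers come into existence simply by being part of the keys you write; locality groups, as in your second snippet, are only an optional storage optimization on top of that.

import java.nio.charset.StandardCharsets;
import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.TableExistsException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class AccumuloTableExample {
    // conn is the same Connector used in the question
    static void writeSampleRow(Connector conn) throws AccumuloException,
            AccumuloSecurityException, TableExistsException, TableNotFoundException {
        // Create the table if it does not exist yet; no column schema is declared
        if (!conn.tableOperations().exists("myTable")) {
            conn.tableOperations().create("myTable");
        }

        BatchWriter writer = conn.createBatchWriter("myTable", new BatchWriterConfig());

        // "Columns" are just the family/qualifier parts of each key you write
        Mutation m = new Mutation(new Text("row1"));
        m.put(new Text("metadata"), new Text("domain"),
                new Value("example.com".getBytes(StandardCharsets.UTF_8)));
        m.put(new Text("content"), new Text("body"),
                new Value("hello world".getBytes(StandardCharsets.UTF_8)));

        writer.addMutation(m);
        writer.close();
    }
}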