I am using the new google-cloud-bigquery and google-cloud-storage APIs. I want to query an external table, which I created like this:
ExternalTableDefinition etd = ExternalTableDefinition.newBuilder(bucketPath, schema, FormatOptions.csv()).build();
TableId tableID = TableId.of(dataset, targetTableName);
TableInfo tableInfo = TableInfo.newBuilder(tableID, etd).build();
Now I want to query this table, but as a temporary one, using QueryRequest:
QueryRequest queryRequest = QueryRequest.newBuilder("select * from table limit 10 ").setUseLegacySql(true).setDefaultDataset(dataset).build();
QueryResponse response = client.query(queryRequest);
But it fails because the table does not exist, which makes sense. I am trying to do something similar to this command line:
bq query --project_id=<project ID> --external_table_definition=wikipedia::/tmp/wikipedia 'select name from wikipedia where name contains "Alex";'
but in Java.
To summarize: how do I create and query an external temporary table through the Java client for BigQuery?
Reference from documentation:
https://cloud.google.com/bigquery/external-data-sources#temporary-tables
For reference, here is the way to do it:
public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    ExternalTableDefinition etd = ExternalTableDefinition.newBuilder(
            "gs://", createSchema(), FormatOptions.csv()).build();
    TableId tableId = TableId.of("testdataset", "mytablename");
    bigquery.create(TableInfo.newBuilder(tableId, etd).build());
    // Now query the table project.testdataset.mytablename
}
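To cover the query step that the final comment alludes to, a sketch using the same QueryRequest API as the question could look like the following. The dataset and table names are the example ones created above; note this mirrors the pre-1.0 client the question uses (newer releases replaced QueryRequest with QueryJobConfiguration), and it needs credentials and a real bucket path to actually run:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetId;
import com.google.cloud.bigquery.FieldValue;
import com.google.cloud.bigquery.QueryRequest;
import com.google.cloud.bigquery.QueryResponse;

import java.util.List;

public class QueryExternalTable {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // The external table is addressed through the default dataset,
        // so only its name appears in the SQL.
        QueryRequest request = QueryRequest.newBuilder("SELECT * FROM mytablename LIMIT 10")
                .setDefaultDataset(DatasetId.of("testdataset"))
                .setUseLegacySql(false)
                .build();
        QueryResponse response = bigquery.query(request);

        // Poll until the query job completes, then print each row.
        while (!response.jobCompleted()) {
            Thread.sleep(500);
            response = bigquery.getQueryResults(response.getJobId());
        }
        for (List<FieldValue> row : response.getResult().iterateAll()) {
            System.out.println(row);
        }
    }
}
```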
Related
I am trying to read a Cloud SQL table in Java Beam using JdbcIO.Read. I want to convert each row of the ResultSet into a GenericData.Record using the .withRowMapper(ResultSet resultSet) method. Is there a way I can pass a JSON schema string as input to .withRowMapper, the way a ParDo accepts side inputs as a PCollectionView?
I have tried doing both read operations (reading from information_schema.columns and from my table in the same JdbcIO.Read transform). However, I would like to have the schema PCollection generated first and then read the table using JdbcIO.Read.
I am generating the Avro schema of the table on the fly like this:
PCollection<String> avroSchema = pipeline.apply(JdbcIO.<String>read()
    .withDataSourceConfiguration(config)
    .withCoder(StringUtf8Coder.of())
    .withQuery("SELECT DISTINCT column_name, data_type "
        + "FROM information_schema.columns "
        + "WHERE table_name = '" + tableName + "'")
    .withRowMapper((JdbcIO.RowMapper<String>) resultSet -> {
        // code here to generate avro schema string
        // this works fine for me
    }));
Then I create a PCollectionView which will hold my JSON schema for each table:
PCollectionView<String> s = avroSchema.apply(View.<String>asSingleton());

// I want to access this view as a side input in the next JdbcIO.Read operation,
// something like this:
pipeline.apply(JdbcIO.<String>read()
    .withDataSourceConfiguration(config)
    .withCoder(StringUtf8Coder.of())
    .withQuery(queryString)
    .withRowMapper(new JdbcIO.RowMapper<String>() {
        @Override
        public String mapRow(ResultSet resultSet) throws Exception {
            // access schema here and use it to parse and create a
            // GenericData.Record from the ResultSet fields as per the schema
            return null;
        }
    }))
    .withSideInputs(/* my PCollectionView here */); // this option is not there right now
Is there any better way to approach this problem?
At this point the IO APIs do not accept side inputs.
It should be feasible to add a ParDo right after the read and do the mapping there. That ParDo can accept side inputs.
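A minimal sketch of that suggestion, reusing the names from the question (avroSchema, config, queryString). The JDBC read keeps each row as a plain string (only the first column here, purely for illustration), and the schema-aware mapping moves into a ParDo that consumes the view as a side input:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class SchemaAwareRead {
    static PCollection<String> read(Pipeline pipeline, PCollection<String> avroSchema,
            JdbcIO.DataSourceConfiguration config, String queryString) {
        // Singleton view over the schema string, as in the question.
        PCollectionView<String> schemaView = avroSchema.apply(View.<String>asSingleton());

        // Read raw rows without the schema.
        PCollection<String> rows = pipeline.apply(JdbcIO.<String>read()
            .withDataSourceConfiguration(config)
            .withCoder(StringUtf8Coder.of())
            .withQuery(queryString)
            .withRowMapper((JdbcIO.RowMapper<String>) resultSet -> resultSet.getString(1)));

        // Unlike JdbcIO.Read, a ParDo can take the view as a side input.
        return rows.apply(ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                String schema = c.sideInput(schemaView);
                // parse c.element() into a GenericData.Record using schema here
                c.output(c.element());
            }
        }).withSideInputs(schemaView));
    }
}
```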
I have a query that fetches a single row from the database, and I want to set this data on my model. My model is called Pictures and my method is as follows:
@Override
public Pictures getPictureList() throws Exception {
    JdbcTemplate jdbc = new JdbcTemplate(datasource);
    String sql = "select path from bakery.pictures where id=1";
    Pictures pcList = jdbc.query(sql, new BeanPropertyRowMapper<Pictures>(Pictures.class));
    return pcList;
}
This method fails with an error saying the query result was inferred to a List.
How can I solve it?
Use the JdbcTemplate.queryForObject() method to retrieve a single row by its primary key.
Pictures p = jdbc.queryForObject(sql,
new BeanPropertyRowMapper<Pictures>(Pictures.class));
JdbcTemplate.query() will return multiple rows which makes little sense if you are querying by primary key.
List<Pictures> list = jdbc.query(sql,
new BeanPropertyRowMapper<Pictures>(Pictures.class));
Pictures p = list.get(0);
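As a side note, if the id ever varies, the parameterized queryForObject overload avoids concatenating values into the SQL (same table and model names as above, and the same jdbc template):

```java
// Bind the id as a parameter instead of inlining it in the SQL string.
Pictures p = jdbc.queryForObject(
        "select path from bakery.pictures where id = ?",
        new BeanPropertyRowMapper<Pictures>(Pictures.class),
        1);
```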
I'm trying to use EsperIO to load some information from a database and use it in other queries with different conditions. To do this I'm using the following code:
ConfigurationDBRef dbConfig = new ConfigurationDBRef();
dbConfig.setDriverManagerConnection("org.postgresql.Driver",
"jdbc:postgresql://localhost:5432/myDatabase",
"myUser", "myPassword");
Configuration engineConfig = new Configuration();
engineConfig.addDatabaseReference("myDatabase", dbConfig);
// Custom class
engineConfig.addEventType("UserFromDB", UserDB.class);
EPServiceProvider esperEngine = EPServiceProviderManager.getDefaultProvider(engineConfig);
String statement = "insert into UserFromDB "
+ " select * from sql:myDatabase ['SELECT * from data.user']";
//Install this query in the engine
EPStatement queryEngineObject = esperEngine.getEPAdministrator().createEPL(statement);
// 1. At this point I can iterate over queryEngineObject without problems getting the information sent by database
// This query is only a 'dummy example', the 'final queries' are more complex
statement = "select * from UserFromDB";
EPStatement queryEngineObject2 = esperEngine.getEPAdministrator().createEPL(statement);
// 2. If I try to iterate over queryEngineObject2 I receive no data
How can I reuse UserFromDB stored information in other queries? (in the above example, in queryEngineObject2)
You don't have a stream, since the database doesn't provide one. The database query provides rows only when it is being iterated/pulled.
One option is to loop over each row and send it into the engine using "sendEvent":
// create other EPL statements before iterating
Iterator<EventBean> it = statement.iterator();
while (it.hasNext()) {
    EventBean event = it.next();
    // feed the row's underlying object back into the engine
    epService.getEPRuntime().sendEvent(event.getUnderlying());
}
Let's say I have two tables Task and Company. Company has columns id and name. Task has two columns customerId and providerId which link back to the id column for Company.
Using Querydsl how do I join on the Company table twice so I can get the name for each company specified by the customerId and providerId?
Code that maybe explains better what I'm trying:
Configuration configuration = new Configuration(templates);
JPASQLQuery query = new JPASQLQuery(this.entityManager, configuration);
QTask task = QTask.task;
QCompany customer = QCompany.company;
QCompany provider = QCompany.company;
JPASQLQuery sql = query.from(task).join(customer).on(customer.id.eq(task.customerId))
.join(provider).on(provider.id.eq(task.providerId));
return sql.list(task.id, customer.name.as("customerName"), provider.name.as("providerName"));
Which generates SQL:
select task.id, company.name as customerName, company.name as providerName from task join company on company.id = task.customerId
And I'd really like it to be:
select task.id, customer.name as customerName, provider.name as providerName from task join company as customer on customer.id = task.customerId join company as provider on provider.id = task.providerId
I couldn't figure out how to alias the table I was joining so I could distinguish between the customer and provider names. I tried new QCompany("company as provider"), but that didn't work. Does anyone know how to do this?
If you need two variables, just do the following:
QCompany customer = new QCompany("customer");
QCompany provider = new QCompany("provider");
Reassignment of the default variable QCompany.company doesn't help
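Wired into the query from the question, the two path variables become two distinct table aliases in the generated SQL (a sketch against the question's generated Q-types, not verified against a live schema):

```java
QTask task = QTask.task;
// Distinct variable names turn into distinct aliases of the company table.
QCompany customer = new QCompany("customer");
QCompany provider = new QCompany("provider");

JPASQLQuery sql = query.from(task)
    .join(customer).on(customer.id.eq(task.customerId))
    .join(provider).on(provider.id.eq(task.providerId));

// Roughly: ... join company customer on ... join company provider on ...
return sql.list(task.id,
    customer.name.as("customerName"),
    provider.name.as("providerName"));
```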
I am new to neo4j and graph database, and I have to send a query to get some values.
I have food and category nodes, and the relationship type between the two is specified by another node categorized_as.
What I need to fetch is the pair of food_name and its category_name.
Thanks for your help in advance.
Here's the documentation on how to run Cypher queries from Java. Adapted for your example, it would look like this:
// Create a new graph DB at path DB_PATH
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
// Create a new execution engine for running queries.
ExecutionEngine engine = new ExecutionEngine(db);
ExecutionResult result;
// Queries need to be run inside of transactions...
try (Transaction ignored = db.beginTx()) {
    String query = "MATCH (f:food)-[:categorized_as]->(c:category) "
        + "RETURN f.food_name as foodName, c.category_name as categoryName";
    // Run that query we just defined.
    result = engine.execute(query);
    // Pull out the "foodName" column from the result, as named in the query.
    Iterator<String> foodNames = result.columnAs("foodName");
    // Iterate through foodNames...
    while (foodNames.hasNext()) {
        System.out.println(foodNames.next());
    }
}