Google Cloud BigQuery UDF limitations


I am facing a problem with Google BigQuery. I need to run a complex computation and save the result in BigQuery. We currently do the computation in Java and write the result to BigQuery using Google Cloud Dataflow.
However, the computation takes around 28 minutes to complete in Java, and the customer requires it to finish within 20 seconds.
So we looked at BigQuery UDFs. One option is a BigQuery legacy UDF, but legacy UDFs process rows one at a time, and we need multiple rows to compute the result, so we ruled that option out.
The second option is a scalar UDF. BigQuery scalar UDFs can only be called from the web UI or the command line and cannot be triggered from the Java client.
If anyone has an idea of how to proceed, please point me in the right direction.

You can use scalar UDFs with standard SQL from any client API, as long as the CREATE TEMPORARY FUNCTION statements are passed in the query attribute of the request. For example,
QueryRequest queryRequest =
    QueryRequest
        .newBuilder(
            "CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
                + "SELECT COUNT(DISTINCT corpus) as works_with_fire\n"
                + "FROM `bigquery-public-data.samples.shakespeare`\n"
                + "WHERE word = GetWord();")
        // Use standard SQL syntax for queries.
        // See: https://cloud.google.com/bigquery/sql-reference/
        .setUseLegacySql(false)
        .build();
QueryResponse response = bigquery.query(queryRequest);

"BigQuery scalar UDFs can only be called from the web UI or the command line and cannot be triggered from the Java client."
This is not accurate. Standard SQL supports scalar UDFs through the CREATE TEMPORARY FUNCTION statement, which can be used from any application and any client, because it is simply part of the SQL query:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
To learn how to enable standard SQL, see this documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
In particular, the simplest approach is to add the #standardSQL prefix at the beginning of the SQL query.

Related

Dataflow Pipeline - Using dynamic param or query

I am trying to create a Dataflow pipeline template that reads data from BigQuery. I need to make the query dynamic, for example by using Instant.now(), but it seems the query is locked when the template is created.
Some Code HERE
Some Code HERE
Some Code HERE
pipeline.apply("ReadFromBigQuery",
        BigQueryIO.read(new DataTransformer(MyCustomObject.getQuery()))
            .fromQuery(spec.getQuery())
            .usingStandardSql()
            .withQueryLocation("US")
            .withoutValidation()
    ).apply("do Something 1",
        Combine.globally(new CombineIterableAccumulatorFn<MyCustomObject2>())
    ).apply("do Something 2",
        ParDo.of(new SendToKenshoo(param, param2))
    );
My query looks like this:
SELECT * FROM `my-project-id.my-dataset.my-view` where PARTITIONTIME between TIMESTAMP('#currentDate') and TIMESTAMP('#tomorrowDate')
I need to replace #currentDate and #tomorrowDate using Instant.now() or any other time function.
Please give me an example.
Note: I need to set the dates in the code rather than at the query level, i.e. not like this:
SELECT * FROM `my-project-id.my-dataset.my-view` where PARTITIONTIME between DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY) and CURRENT_DATE()

I'm not sure how you're sending those parameters to the query (via a ValueProvider, etc.). However, I wouldn't recommend using templates for this, because you need dynamic inputs. If you do want a template, I would use Flex Templates: https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates
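For the placeholder substitution itself, a minimal sketch in plain Java might look like the following. It assumes the query is kept as a template string with the #currentDate and #tomorrowDate markers from the question and that the dates should be computed in UTC; the class and method names (PartitionQueryBuilder, build) are hypothetical. Keep in mind that if this runs while a classic template is being created, the dates are fixed at that moment, so the call would have to happen at launch time (for example in a Flex Template or a directly launched pipeline).

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: fills the date placeholders of a query template. */
final class PartitionQueryBuilder {

    // Template taken from the question; the placeholder names are the asker's.
    static final String TEMPLATE =
        "SELECT * FROM `my-project-id.my-dataset.my-view` "
            + "WHERE PARTITIONTIME BETWEEN TIMESTAMP('#currentDate') "
            + "AND TIMESTAMP('#tomorrowDate')";

    /**
     * Replaces both placeholders with ISO dates (yyyy-MM-dd).
     * "Today" is read from the given clock in UTC (an assumption of this sketch).
     */
    static String build(String template, Clock clock) {
        LocalDate today = LocalDate.now(clock.withZone(ZoneOffset.UTC));
        LocalDate tomorrow = today.plusDays(1);
        return template
            .replace("#currentDate", today.toString())
            .replace("#tomorrowDate", tomorrow.toString());
    }

    public static void main(String[] args) {
        // Prints the query with the current and next UTC date filled in.
        System.out.println(build(TEMPLATE, Clock.systemUTC()));
    }
}
```

The resulting string could then be passed to fromQuery(...) in place of the fixed query. Taking a Clock as a parameter is only a convenience so the date logic can be checked with a fixed time.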

Spring R2DBC + SQL Server: procedures query

I need to execute a stored procedure in SQL Server to fetch some data. Since I will later save the data to MongoDB using ReactiveMongoTemplate, I introduced Spring R2DBC.
implementation("org.springframework.data:spring-data-r2dbc:1.0.0.RELEASE")
implementation("io.r2dbc:r2dbc-mssql:0.8.1.RELEASE")
I see that I can run SELECT, INSERT, and so on with R2DBC, but is it possible to run EXEC prod_name? I tried it and it hangs forever, and then the test terminates with neither success nor failure. The last line of the log is:
io.r2dbc.mssql.QUERY - Executing query: EXEC "SCHEMA"."MY_PROCEDURE"
The code looks like this:
public Flux<Coupon> selectWithProcedure() {
    return databaseClient
        .execute("EXEC \"SCHEMA\".\"MY_PROCEDURE\" ")
        .as(Coupon.class)
        .fetch().all()
        .doOnNext(coupon -> {
            coupon.setCouponStatusRefFromId(coupon.getCouponStatusRefId());
        });
}
It seems that no data is retrieved.
Other methods with simple queries such as SELECT work when I test them. The problem is that the DBAs do not allow my app to read table data directly; instead, they created a procedure for me. If this query is not possible, I will have to fall back to the traditional JPA approach, and going reactive on the Mongo side would no longer make sense.

Well, it looks like stored procedures are not supported yet. I just saw this:
https://github.com/r2dbc/r2dbc-mssql, version 0.8.1:
Next steps:
Execution of stored procedures
Add support for TVP and UDTs
And:
https://r2dbc.io/2019/05/13/r2dbc-0-8-milestone-8-released
We already have a few tickets lined up for the next milestone, and we know that they will require further SPI modifications:
Support for Auto-Commit
Connection Validation
Support for Stored Procedures

Ordering BigQuery Results in Java SDK

I am trying to get ordered results from BigQuery with the help of the Google Cloud SDK.
The query looks like this:
SELECT * FROM `table`
WHERE id = 111
ORDER BY time DESC
Then I create and run the Job:
Job job = QueryJobConfiguration.newBuilder(query)
.setUseLegacySql(false)
.build()
The issue is that when I actually fetch the results, I receive them unordered:
TableResult results = job.getQueryResults()
results.iterateAll()
If I run the original query in the BigQuery UI, everything seems to be fine.
Any ideas where and why the results are being shuffled?

The issue was that I had added the ORDER BY clause to the query later.
However, I was still accessing the job with the same jobId.
That made BigQuery return the previous results, which were unsorted.
Using a new jobId fixed it!
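One way to avoid reusing a job ID by accident is to generate a new one for every submission. The sketch below uses only the JDK; the class name JobIds and the prefix are hypothetical, and it assumes the resulting string is then handed to the client when the job is created (in the google-cloud-bigquery library this is typically done with JobId.of(...)).

```java
import java.util.UUID;

/** Hypothetical helper that produces a fresh job identifier on every call. */
final class JobIds {

    /**
     * Returns prefix + "_" + a random UUID. The result contains only
     * letters, digits, underscores, and dashes, provided the prefix does too.
     */
    static String newJobId(String prefix) {
        return prefix + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Two submissions of the same (or an edited) query get different IDs.
        System.out.println(newJobId("ordered_query"));
        System.out.println(newJobId("ordered_query"));
    }
}
```

Because the ID changes on every call, an edited query is never matched against a job that ran the older query text.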

No response with a query by ID on Azure DocumentDB

I'm currently getting a very slow response, or no response at all, when looking up a document in a collection by ID. I have about 2 million documents in a partitioned collection. If I look up the document using both the partitionKey and the id, the response is immediate:
SELECT * FROM c WHERE c.partitionKey=123 AND c.id="20566-2"
If I try using only the id:
SELECT * FROM c WHERE c.id="20566-2"
the response never returns. The Java client appears to freeze, and I see the same behavior in the Data Explorer in the Azure Portal. I also tried looking up by another field that is neither the id nor the partitionKey, and that response always returns. When I run the SELECT from the Java client, I always set the flag to enable cross-partition queries.
The next thing I will try is avoiding the "-" character in the ID, to test whether it blocks the query (I didn't find anything about this in the documentation).

The issue is related to your Java code. The Azure DocumentDB Java SDK wraps the DocumentDB REST APIs, and, as @DanCiborowski-MSFT said, the REST API reference for Query Documents describes the header x-ms-documentdb-query-enablecrosspartition, which explains the behavior:
Header: x-ms-documentdb-query-enablecrosspartition
Required/Type: Optional/Boolean
Description: If the collection is partitioned, this must be set to True to allow execution across multiple partitions. Queries that filter against a single partition key, or against single-partitioned collections do not need to set the header.
So to query across multiple partitions without a partitionKey in the WHERE clause, you need to enable cross-partition queries by passing an instance of the FeedOptions class to the queryDocuments method, as below.
FeedOptions queryOptions = new FeedOptions();
queryOptions.setEnableCrossPartitionQuery(true); // Enable query across multiple partitions
String collectionLink = collection.getSelfLink();
FeedResponse<Document> queryResults = documentClient.queryDocuments(
    collectionLink,
    "SELECT * FROM c WHERE c.id='20566-2'", queryOptions);

Automatically generated database requests

How do you implement automatically generated database requests (say, SQL)?
Suppose we have an offline shop with filters.
The database is standalone and offline.
So if I want to filter items by price, the request would be something like:
select Snowboard.Name
from Snowboard
where Snowboard.Price between 400 and 600;
And if I filter by two characteristics, e.g. price range and camber, it would be:
select s.Name, s.Camber
from Snowboard s
where s.Price between 400 and 600
and s.Camber in ('Rocker', 'Hybrid');
The question is: how could this be implemented in Java so that these requests are generated automatically from any combination of selected filters?

Quick and dirty solution #1
Generate the query at run time and make clever use of a WHERE 1=1 condition, since the number of WHERE conditions is unknown in advance. (This sample is in C#, but it works more or less the same in Java.)
string sql = @"select Snowboard.Name
from Snowboard
where 1=1";
Now you can build your query based on the UI element selections, like this:
string whereClause = "";
if (yourCheckBoxPrice.Checked)
{
    whereClause += " AND Price BETWEEN " + txtPriceFrom.Text + " AND " + txtPriceTo.Text;
}
if (yourCheckBoxCamber.Checked)
{
    // camberValues is a placeholder: your comma-separated list of quoted values goes here
    whereClause += " AND Camber IN (" + camberValues + ")";
}
sql += whereClause;
2nd Solution (Use SQL CASE)
You can use SQL CASE inside your query for each WHERE condition to check for nulls or specific values. But beware: dynamic SQL will make your code pretty messy and hard to read. (This can be done via a stored procedure as well.)
See: SQL CASE Statement
I advise you to use a stored procedure with a mix of options 1 and 2 (see: Implementing Dynamic SQL Where Clause). Keep it simple and you are good to go.
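Since the question asks about Java, here is a rough sketch of the first approach (WHERE 1=1 plus appended conditions) in Java. It differs from the C# snippet in one respect: it emits ? placeholders and collects the values separately, so they can be bound to a PreparedStatement instead of being concatenated into the SQL text. The class name SnowboardQuery and its methods are hypothetical; the table and column names come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical builder: one method per filter, each appends a condition. */
final class SnowboardQuery {
    private final StringBuilder sql =
        new StringBuilder("SELECT s.Name FROM Snowboard s WHERE 1=1");
    private final List<Object> params = new ArrayList<>();

    /** Adds a price range filter with two bound parameters. */
    SnowboardQuery priceBetween(int from, int to) {
        sql.append(" AND s.Price BETWEEN ? AND ?");
        params.add(from);
        params.add(to);
        return this;
    }

    /** Adds an IN filter with one placeholder per value; no-op for an empty list. */
    SnowboardQuery camberIn(List<String> cambers) {
        if (cambers.isEmpty()) {
            return this;
        }
        sql.append(" AND s.Camber IN (")
            .append(String.join(", ", Collections.nCopies(cambers.size(), "?")))
            .append(")");
        params.addAll(cambers);
        return this;
    }

    String sql() {
        return sql.toString();
    }

    /** Values in the same order as the ? placeholders in sql(). */
    List<Object> params() {
        return Collections.unmodifiableList(params);
    }

    public static void main(String[] args) {
        SnowboardQuery q = new SnowboardQuery()
            .priceBetween(400, 600)
            .camberIn(Arrays.asList("Rocker", "Hybrid"));
        System.out.println(q.sql());
        System.out.println(q.params());
    }
}
```

With JDBC, the values would then be bound in order, for example with PreparedStatement.setObject(i + 1, params.get(i)) for each index i. Only the filters the user actually selected need to be called on the builder.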
