I am using the Java BigQuery Storage Write API as documented here: https://cloud.google.com/bigquery/docs/write-api.
I keep the write stream long-lived and refresh it when one of the non-retryable errors occurs, as per https://cloud.google.com/bigquery/docs/write-api#error_handling.
I am sticking with the default stream. I have two tables, and different parts of the code are responsible for writing to each table, each maintaining its own stream writer.
If data is flowing, everything is fine and there are no errors. However, I also want to test that refreshing the stream writers works, so I wait for the default stream inactivity timeout (10 minutes), which closes the stream, and then try writing again. I can create the new stream fine, no error there, but for one of the tables I keep getting a CANCELLED error wrapped in a FAILED_PRECONDITION, which makes my code refresh again and again.
The original error, caused by the stream being closed due to inactivity:
! io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Stream is closed due to com.google.api.gax.rpc.AbortedException: io.grpc.StatusRuntimeException: ABORTED: Closing the stream because it has been inactive for 600 seconds. Entity: projects/<id>/datasets/<id>/tables/<id>/_default
! at com.google.cloud.bigquery.storage.v1beta2.StreamWriterV2.appendInternal(StreamWriterV2.java:263)
! at com.google.cloud.bigquery.storage.v1beta2.StreamWriterV2.append(StreamWriterV2.java:234)
! at com.google.cloud.bigquery.storage.v1beta2.JsonStreamWriter.append(JsonStreamWriter.java:114)
! at com.google.cloud.bigquery.storage.v1beta2.JsonStreamWriter.append(JsonStreamWriter.java:89)
Further repeating errors on the new stream(s):
! io.grpc.StatusRuntimeException: FAILED_PRECONDITION: Stream is closed due to com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: io.grpc.Context was cancelled without error
! at com.google.cloud.bigquery.storage.v1beta2.StreamWriterV2.appendInternal(StreamWriterV2.java:263)
! at com.google.cloud.bigquery.storage.v1beta2.StreamWriterV2.append(StreamWriterV2.java:234)
! at com.google.cloud.bigquery.storage.v1beta2.JsonStreamWriter.append(JsonStreamWriter.java:114)
! at com.google.cloud.bigquery.storage.v1beta2.JsonStreamWriter.append(JsonStreamWriter.java:89)
I am not sure why it's being cancelled without error. Any pointers on how I can debug this, or recommendations on how to maintain and refresh a long-lived stream writer?
Updating the Java client library version should solve this problem, since reconnect support was added to the JsonStreamWriter. Instead of throwing this error, it should handle retries internally.
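Until you can upgrade, the recreate-on-failure pattern described in the question can be sketched roughly like this (a sketch only: the schema handling, the blanket exception catch, and the single retry are simplifying assumptions, not the library's built-in logic):

import com.google.api.core.ApiFuture;
import com.google.cloud.bigquery.storage.v1beta2.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1beta2.JsonStreamWriter;
import com.google.cloud.bigquery.storage.v1beta2.TableSchema;
import org.json.JSONArray;

public class DefaultStreamAppender {
    // e.g. "projects/<project>/datasets/<dataset>/tables/<table>/_default"
    private final String defaultStream;
    private final TableSchema schema; // assumed to be known up front
    private JsonStreamWriter writer;

    public DefaultStreamAppender(String defaultStream, TableSchema schema) throws Exception {
        this.defaultStream = defaultStream;
        this.schema = schema;
        this.writer = JsonStreamWriter.newBuilder(defaultStream, schema).build();
    }

    // Recreate the writer once if the append fails synchronously,
    // as it does in the stack traces above.
    public synchronized ApiFuture<AppendRowsResponse> append(JSONArray rows) throws Exception {
        try {
            return writer.append(rows);
        } catch (Exception e) {
            writer.close();
            writer = JsonStreamWriter.newBuilder(defaultStream, schema).build();
            return writer.append(rows);
        }
    }
}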
Related
I am trying to execute a query that sometimes takes a huge amount of time, which leads to a closed connection, meaning the connection gets closed before the query executes/commits. I want to recover from the error, get a new connection, and then retry.
Caused by: java.sql.SQLRecoverableException: Closed Connection
at oracle.jdbc.driver.OracleStatement.ensureOpen(OracleStatement.java:4051)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1473)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:406)
at com.fimt.sat.testora12date.dao.DateSaverGetterDao.testAbandonedConnectionWithDS(DateSaverGetterDao.java:73)
You can try to catch this particular exception:
import java.sql.SQLRecoverableException;

public void save(MyData data) {
    try {
        // ... execute the statement here ...
    } catch (SQLRecoverableException e) {
        // Better: add a parameter that caps the maximum number of retries,
        // and consider retrying on a secondary thread after a delay of a
        // certain number of seconds instead of recursing immediately.
        save(data);
    }
}
The answer of #Davide Lorenzo MARINO is great, unless you have a query heavy enough that it still fails to execute even after several such recoveries.
I'm not an Oracle professional, but what I've found is that you can tune some kind of RAC failover that will keep your transaction alive even after exceeding the timeout.
But generally my view is that it is better to somehow split the data into multiple queries.
Better to split/bucket the data in the select query and commit at intervals. E.g., if the query covers a duration of 2 months, bucket it into 15-day periods, loop over the buckets, do the necessary statements for each one, and commit it.
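As a rough illustration of that bucketing idea (the table name, columns, and the 15-day window below are placeholders, not taken from the original question):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.time.LocalDate;

public void processInBuckets(Connection conn, LocalDate from, LocalDate to) throws Exception {
    conn.setAutoCommit(false);
    for (LocalDate start = from; start.isBefore(to); start = start.plusDays(15)) {
        LocalDate end = start.plusDays(15).isAfter(to) ? to : start.plusDays(15);
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE my_table SET processed = 1 WHERE event_date >= ? AND event_date < ?")) {
            ps.setDate(1, java.sql.Date.valueOf(start));
            ps.setDate(2, java.sql.Date.valueOf(end));
            ps.executeUpdate();
        }
        // Commit each 15-day bucket so no single statement holds the connection for hours.
        conn.commit();
    }
}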
I got the following error upon connecting to the DB and accessing the tables:
"An error was encountered performing the requested operation"
Closed Connection
Vendor code 17008.
I was able to resolve this after downloading the latest version of SQL Developer, 19.2.1.247. The earlier version was 3.x.
I am using new DefaultConsumer(channel) and overriding the handleDelivery method.
My goal is to use my consumer for work queues, and for that I know I have to call channel.basicQos(1), using 1 as my prefetch count. I have been reading that I also need to call channel.basicAck so that the server knows how many unacknowledged messages may be sent (correct me if I am wrong here); based on this count, channel.basicQos takes effect. Now, I am using the following statements in the handleDelivery method:
channel.basicQos(1);
channel.basicAck(envelope.getDeliveryTag(), false);
The issue is, I keep getting the following error:
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown;
reason: Attempt to use closed channel
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:223)
.................................
..................................
When I remove the channel.basicAck, I don't see that problem.
How can I get channel.basicQos to work (my understanding is that for this to work, I need to provide the basicAck) without getting the AlreadyClosedException error?
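For reference, the overall shape of the consumer setup being described is roughly this (a simplified sketch; the queue name and the autoAck flag passed to basicConsume are placeholders, not taken from my actual code):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import java.io.IOException;

void startConsumer(Connection connection) throws IOException {
    final Channel channel = connection.createChannel();
    channel.basicConsume("my-queue", false /* autoAck: placeholder */, new DefaultConsumer(channel) {
        @Override
        public void handleDelivery(String consumerTag, Envelope envelope,
                                   AMQP.BasicProperties properties, byte[] body) throws IOException {
            channel.basicQos(1);
            // ... process the message ...
            channel.basicAck(envelope.getDeliveryTag(), false);
        }
    });
}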
Thanks for taking the time to read this and for any help you guys can offer!
I have a job that is built of several steps; one of the steps is a tasklet that triggers processing in Pentaho.
I pass Pentaho the parameters it needs in order to connect to the DB on its own, and that works OK.
The issue starts when the processing time in Pentaho is long.
Pentaho completes successfully, and the code in the tasklet that activated it completes OK, but the job mechanism that wraps it throws an error when it tries to update the job execution table in the DB, because the connection it holds has already been closed:
o.s.j.s.SQLErrorCodesFactory: Error while extracting database product name - falling back to empty error codes
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData;
nested exception is
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:296)
at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:320)
at org.springframework.jdbc.support.SQLErrorCodesFactory.getErrorCodes(SQLErrorCodesFactory.java:214)
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.setDataSource(SQLErrorCodeSQLExceptionTranslator.java:141)
at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.<init>(SQLErrorCodeSQLExceptionTranslator.java:104)
at org.springframework.jdbc.support.JdbcAccessor.getExceptionTranslator(JdbcAccessor.java:99)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:603)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:812)
at org.springframework.jdbc.core.JdbcTemplate.update(JdbcTemplate.java:868)
at org.springframework.batch.core.repository.dao.JdbcExecutionContextDao.persistSerializedContext(JdbcExecutionContextDao.java:230)
at org.springframework.batch.core.repository.dao.JdbcExecutionContextDao.updateExecutionContext(JdbcExecutionContextDao.java:159)
at org.springframework.batch.core.repository.support.SimpleJobRepository.updateExecutionContext(SimpleJobRepository.java:203)
...
14:21:37.143 UTC [ERROR] jobScheduler_Worker-2 T:b U: o.s.t.i.TransactionInterceptor: Application exception overridden by rollback exception
org.springframework.dao.RecoverableDataAccessException: PreparedStatementCallback; SQL [UPDATE BAT_STEP_EXECUTION_CONTEXT SET SHORT_CONTEXT = ?, SERIALIZED_CONTEXT = ? WHERE STEP_EXECUTION_ID = ?]; Communications link failure
It looks like the connection that the job repository obtained when the job started was abandoned, and I'm trying to understand whether there is a way to make it get a new connection or give it some keep-alive command.
I tried the following workarounds:
Changing the step status in a job listener so the job will complete: fails with the same DB error.
Marking this exception as one that can be skipped: fails with the same DB error.
<batch:no-rollback-exception-classes>
<batch:include class="com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException" />
<batch:include class="org.springframework.jdbc.support.MetaDataAccessException" />
</batch:no-rollback-exception-classes>
Any ideas how I can work around this?
Can I configure a job listener that will restart the job from the step that follows the Pentaho step?
Additional info
I think that the issue is here -
org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSource)
This
ConnectionHolder conHolder = (ConnectionHolder) TransactionSynchronizationManager.getResource(dataSource);
thinks that the connection is valid, so I guess the solution would be to call org.springframework.transaction.support.TransactionSynchronizationManager.unbindResource(Object), and the question is how I can get the data source object to pass to that method.
I will try querying org.springframework.transaction.support.TransactionSynchronizationManager.getResourceMap() and see where it gets me.
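Something along these lines, just to see what is bound to the current thread (the printout is only illustrative):

import java.util.Map;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Dump whatever resources are bound to the current thread.
Map<Object, Object> resources = TransactionSynchronizationManager.getResourceMap();
for (Map.Entry<Object, Object> entry : resources.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}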
Update
No luck: the resource map gives me just the repositories I'm using, not the data source. Still digging...
Another update
I'm debugging the process, and it seems that the problem is indeed in org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSource): the connection holder is holding a connection that is closed, but the code here doesn't check whether the connection is open; it only checks that the connection isn't null. If it were some weak reference, maybe that would be enough, but in this use case it just proceeds with the closed connection instead of requesting a new one.
Add this to the tasklet definition:
<batch:transaction-attributes propagation="NEVER" />
Since the tasklet is doing external processing and doesn't need a Spring Batch transaction, you need to tell Spring Batch not to open a transaction for this tasklet.
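If the step is defined in Java config rather than XML, a rough equivalent (a sketch only, assuming the standard step builder API; names are placeholders) would be:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.transaction.TransactionDefinition;
import org.springframework.transaction.interceptor.DefaultTransactionAttribute;

// e.g. inside a @Configuration class
public Step pentahoStep(StepBuilderFactory steps, Tasklet pentahoTasklet) {
    // Tell Spring Batch not to open a transaction around this tasklet.
    DefaultTransactionAttribute noTx =
            new DefaultTransactionAttribute(TransactionDefinition.PROPAGATION_NEVER);
    return steps.get("pentahoStep")
            .tasklet(pentahoTasklet)
            .transactionAttribute(noTx)
            .build();
}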
see
http://www.javabeat.net/transaction-management-in-spring-batch-components/
http://forum.spring.io/forum/spring-projects/batch/91158-legacy-integration-tasklet-transaction
I am successfully using ZipSplitter() to process files inside a zip file. I would like to use parallel processing if possible, but calling parallelProcessing() results in the stream being closed prematurely. This results in an IOException when the stream is being cached by DefaultStreamCachingStrategy.
I note that when parallel processing is enabled, ZipIterator#checkNullAnswer(Message) is called, which closes the ZipInputStream. Curiously, everything is dandy if I loiter on this method in my debugger, which suggests that the iterator is being closed before processing has completed. Is this a bug, or have I messed something up?
A simplified version of my route which exhibits this behaviour is:
from("file:myDirectory").
split(new ZipSplitter()).streaming().parallelProcessing().
log("Validating filename ${file:name}").
end();
This is using Camel 2.13.1.
Can you just try applying the CAMEL-7415 patch to the Camel 2.13.1 branch?
I'm not quite sure whether it will fix your issue, but it is worth giving it a shot.
What is the reason for encountering this Exception:
org.apache.commons.fileupload.FileUploadException:
Processing of multipart/form-data request failed. Stream ended unexpectedly
The main reason is that the underlying socket was closed or reset. The most common cause is that the user closed the browser before the file was fully uploaded, or that the Internet connection was interrupted during the upload. In any case, the server-side code should be able to handle this exception gracefully.
It's been about a year since I dealt with that library, but if I remember correctly, if someone starts uploading a file and then changes the browser URL (clicks a link, opens a bookmark, etc.), you could get that exception.
You could possibly get this exception if you're using FileUpload to receive an upload from Flash.
At least as of version 8, Flash contains a known bug: the multipart stream it produces is broken, because the final boundary doesn't contain the suffix "--", which ought to indicate that no more items are following. Consequently, FileUpload waits for the next item (which it doesn't get) and throws an exception.
There is a workaround that suggests using the streaming API and catching the exception:
catch (MalformedStreamException e) {
// Ignore this
}
For more details, please refer to https://commons.apache.org/proper/commons-fileupload/faq.html#missing-boundary-terminator
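For illustration, the streaming-API workaround might look roughly like this (a sketch only; the servlet context, the placement of the catch, and what you do with each item are assumptions):

import java.io.InputStream;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.MultipartStream.MalformedStreamException;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public void handleUpload(HttpServletRequest request) throws Exception {
    ServletFileUpload upload = new ServletFileUpload();
    try {
        FileItemIterator iter = upload.getItemIterator(request);
        while (iter.hasNext()) {
            FileItemStream item = iter.next();
            try (InputStream in = item.openStream()) {
                // ... read the stream and store the uploaded file ...
            }
        }
    } catch (MalformedStreamException e) {
        // Ignore this, per the FAQ entry linked above: the missing "--" terminator
        // from Flash surfaces here once the real items have been read.
    }
}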