Google BigQuery - query ran successfully but results not pushed to destination table - java

We run a nightly query against BigQuery via the Java REST API that specifies a destination table for the results to be pushed to (write disposition=WRITE_TRUNCATE). Today's query appeared to run without errors but the results were not pushed to the destination table.
This query has been running for a few weeks now and we've had no issues. No code changes were made either.
Manually running it a second time after it "failed" worked fine. It was just this one glitch that we spotted and we're concerned it may happen again.
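For context, the job is submitted roughly like this (a simplified sketch using the com.google.api.services.bigquery client; the identifiers are placeholders):
// Simplified sketch of the nightly job submission; bigquery is an authenticated Bigquery client.
TableReference destination = new TableReference()
        .setProjectId("[PROJECT]")
        .setDatasetId("[DATASET]")
        .setTableId("[TABLE]");
JobConfigurationQuery queryConfig = new JobConfigurationQuery()
        .setQuery("[QUERY]")
        .setDestinationTable(destination)
        .setCreateDisposition("CREATE_IF_NEEDED")
        .setWriteDisposition("WRITE_TRUNCATE");
Job job = new Job().setConfiguration(new JobConfiguration().setQuery(queryConfig));
Job submitted = bigquery.jobs().insert("[PROJECT]", job).execute();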
Our logged JSON response from the "failed" query looks fine (I've obfuscated any sensitive data):
INFO: Job finished successfully: {
  "configuration" : {
    "dryRun" : false,
    "query" : {
      "createDisposition" : "CREATE_IF_NEEDED",
      "destinationTable" : {
        "datasetId" : "[REMOVED]",
        "projectId" : "[REMOVED]",
        "tableId" : "[REMOVED]"
      },
      "priority" : "INTERACTIVE",
      "query" : "[REMOVED]",
      "writeDisposition" : "WRITE_TRUNCATE"
    }
  },
  "etag" : "[REMOVED]",
  "id" : "[REMOVED]",
  "jobReference" : {
    "jobId" : "[REMOVED]",
    "projectId" : "[REMOVED]"
  },
  "kind" : "bigquery#job",
  "selfLink" : "[REMOVED]",
  "statistics" : {
    "creationTime" : "1390435780070",
    "endTime" : "1390435780769",
    "query" : {
      "cacheHit" : false,
      "totalBytesProcessed" : "12546"
    },
    "startTime" : "1390435780245",
    "totalBytesProcessed" : "12546"
  },
  "status" : {
    "state" : "DONE"
  }
}
Using the "try it!" for Jobs/GET here and plugging in the job id also shows the job was indeed successful and matches our logged output (pasted above).
Checking the web console shows the destination table has been truncated but not updated. Weirdly, the "Last Modified" has not been updated (I did try refreshing the page numerous times):
http://i.stack.imgur.com/384NL.png
Has anyone experienced this before with BigQuery: a query that appears to run successfully, but where the specified destination table is truncated and the results are never pushed to it?

I am a developer on the BigQuery team. I've looked up the details of your job from the breadcrumbs you left (your query was the only one that started at that start time).
It looks like your destination table was truncated at 4:09 pm today PST, which is the time your job ran, but it was left empty -- the query that truncated it didn't actually fill in any information.
I'm having a little bit of trouble piecing together the details, because one of the source tables appears to have been overwritten (the left table in your left outer join was created at 4:20 PM).
However, there is a clue in the "total bytes processed" field -- it says that the query only processed 12K of data. The internal statistics say that only 384 rows were involved in the query among both tables that were involved.
My guess is that the query legitimately returned 0 rows, so the table was cleared.
There is a bug in that deleting all of the data in a table doesn't update the last modified time. We use last modified to mean either the last time the metadata was updated (description, schema, etc.) or the last time data was added to the table. But if you just truncate the table, that doesn't update the metadata or add data, so we end up with a stale last-modified time.
If this doesn't sound like a reasonable chain of events, we'll need more information from you about how to debug it (especially since it looks like the tables involved have been modified since you ran this query), and a way that we can reproduce it would be great.

So, we figured out what the problem is with this. It failed again a few times over the last few days so we dug in further.
The query being executed depends on another query which is executed immediately before it. Although we do wait for the first query to finish (job status = "DONE"), it appears that behind the scenes it is not actually fully complete and its data is not yet available to be used.
Current process is:
1. Fetch data from another data source and stream the results into table A.
2. When (1) is complete (poll the job id until its status is "DONE"), submit another query which joins on the results in table A to create table B.
3. Table A's data is not yet available, so the query from (2) results in an empty table.
We've noticed it takes about 5-10 seconds for the data to actually appear and be available in BigQuery when using streaming for the first query.
We used a fairly ugly workaround - simply wait a few seconds after the first query before running the next one. Not exactly elegant but it works.
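In code, the workaround is nothing more elaborate than a pause between the two steps, roughly like this (a sketch; firstJobId, pollUntilDone and runDependentQuery are placeholders for our own code):
// Rough shape of the workaround between the two jobs.
void runNightlyPipeline() throws InterruptedException {
    pollUntilDone(firstJobId);   // step 1 reports state "DONE" via jobs().get()
    Thread.sleep(10_000);        // crude pause: streamed rows take ~5-10 seconds to become queryable
    runDependentQuery();         // step 2: the query that joins on table A to build table B
}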

Related

BigQuery: 404 "Table is truncated." when insert right after truncate

I truncate my table by executing a queryJob described here: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries
"truncate table " + PROJECT_ID + "." + datasetName + "." + tableName;
I wait until the job finishes via
queryJob = queryJob.waitFor();
Truncate works fine.
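Put together, the truncate step looks roughly like this (a sketch with the google-cloud-bigquery client; bigquery is the service object and the constants are the same as above):
// Run the truncate statement as a query job and wait for it to finish.
String sql = "truncate table " + PROJECT_ID + "." + datasetName + "." + tableName;
QueryJobConfiguration config = QueryJobConfiguration.newBuilder(sql).build();
Job queryJob = bigquery.create(JobInfo.of(config));
queryJob = queryJob.waitFor();   // blocks until the truncate job is DONE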
However, if I do an insert right after the truncate operation via
InsertAllResponse response = table.insert(rows);
it results in a
com.google.cloud.bigquery.BigQueryException: Table is truncated.
with the following log:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://www.googleapis.com/bigquery/v2/projects/[MYPROJECTID]/datasets/[MYDATASET]/tables/[MYTABLE]/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Table is truncated.",
    "reason" : "notFound"
  } ],
  "message" : "Table is truncated.",
  "status" : "NOT_FOUND"
}
Sometimes I even have to wait more than 5 minutes between the truncate and the insert.
I would like to periodically check whether my table is still in the "Table is truncated." state until that state is gone.
How can I call the BigQuery API to check whether the table is ready for inserts?
How can I call the BigQuery API to get the status of the table?
Edit
An example to reproduce this can be found here.
If a table is truncated while a streaming pipeline is still running, or a streaming insert is performed on a recently truncated table, you can receive errors like the one mentioned in the question (Table is truncated); that is expected behavior. The metadata consistency model for InsertAll (a very high-QPS API) is eventually consistent, meaning the InsertAll API may see delayed table metadata and return failures such as "table truncated". The typical way to resolve this issue is to back off and retry.
Currently, there is no option in the BigQuery API to check if the table is in truncated state or not.
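A back-off-and-retry around the insert from the question could look roughly like this (a sketch; the retry count and delays are arbitrary, and the classes are from the google-cloud-bigquery client used in the question):
// Retry the streaming insert with exponential back-off until the table metadata has caught up.
InsertAllResponse insertWithRetry(Table table, List<InsertAllRequest.RowToInsert> rows)
        throws InterruptedException {
    long backoffMillis = 500;
    for (int attempt = 0; attempt < 6; attempt++) {
        try {
            return table.insert(rows);        // succeeds once the table is ready again
        } catch (BigQueryException e) {
            // "Table is truncated." surfaces as a 404; wait and try again
            Thread.sleep(backoffMillis);
            backoffMillis *= 2;
        }
    }
    throw new IllegalStateException("Insert still failing after retries");
}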
Unfortunately the API does not (yet?) provide an endpoint to check the truncated state of the table.
To avoid this issue, one can use a load job via Google Cloud Storage instead.
It looks like the load job respects this state, as I have no issues with truncate/load multiple times in a row.
public void load(String datasetName, String tableName, String sourceUri) throws InterruptedException {
    Table table = getTable(datasetName, tableName);
    Job job = table.load(FormatOptions.json(), sourceUri);
    // Wait for the job to complete
    Job completedJob = job.waitFor(RetryOption.initialRetryDelay(Duration.ofSeconds(1)),
            RetryOption.totalTimeout(Duration.ofMinutes(3)));
    if (completedJob != null && completedJob.getStatus().getError() == null) {
        // Job completed successfully
    } else {
        // Handle error case
    }
}

How to search in firebase database

I'm trying to filter the data from my database using this code:
fdb.orderByChild("title").startAt(searchquery).endAt(searchquery+"\uf8ff").addValueEventListener(valuelistener2);
My database is like this:
"g12" : {
"Books" : {
"-Mi_He4vHXOuKHNL7yeU" : {
"title" : "Technical Sciences P1"
},
"-Mi_He50tUPTN9XDiVow" : {
"title" : "Life Sciences"
},
"-Mi_He51dhQfl3RAjysQ" : {
"title" : "Technical Sciences P2"
}}
While the code works, it only returns the first value that matches the query and doesn't fetch the rest of the data even though it matches.
If I put a "T" as my search query, I just get the first title, "Technical Sciences P1", and don't get the other one with P2.
(Sorry for the vague and common question title, it's just I've been looking for a solution for so long)
While the code works, it only returns the first value that matches the query
That's the expected behavior since Firebase Realtime Database does not support native indexing or search for text fields in database properties.
When you are using the following query:
fdb.orderByChild("title").startAt(searchquery).endAt(searchquery+"\uf8ff")
It means that you are trying to get all elements that start with searchquery. For example, if you have a title called "Don Quixote" and you search for "Don", your query will return the correct results. However, searching for "Quix" will yield no results.
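For example, a prefix query like yours that iterates over all matching children could look like this (a sketch; fdb is assumed to be a DatabaseReference pointing at the "Books" node):
// Query all books whose title starts with searchquery and log every match.
fdb.orderByChild("title")
   .startAt(searchquery)
   .endAt(searchquery + "\uf8ff")
   .addListenerForSingleValueEvent(new ValueEventListener() {
       @Override
       public void onDataChange(DataSnapshot snapshot) {
           // Every child whose title starts with searchquery is in this snapshot.
           for (DataSnapshot book : snapshot.getChildren()) {
               String title = book.child("title").getValue(String.class);
               Log.d("Search", "match: " + title);
           }
       }

       @Override
       public void onCancelled(DatabaseError error) {
           Log.w("Search", "query cancelled", error.toException());
       }
   });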
You might consider downloading the entire node to search for fields client-side, but this solution isn't practical at all. To enable full-text search of your Firebase Realtime Database data, I recommend using a third-party search service like Algolia or Elasticsearch.
If at some point you consider trying Cloud Firestore, please see the following example:
Is it possible to use Algolia query in FirestoreRecyclerOptions?
It shows how this works with Cloud Firestore, but you can use the same approach with the Firebase Realtime Database.

Ripple XRP Ledger - Can't get transaction validated (Testnet)

I'm using the testnet to validate my transaction. The transaction:
{"transaction":"ECAB482EB34177FA1B1E6C724F038C42308004B1F307A169FAEA88C825E11642","command":"tx","id":0}
Response :
{id=0, status='success', errorMessage='null', result=TxResult{validated=false}}
I'm using a WebSocket with the 'tx' method to check. What is the best course of action to figure out the problem? Is there a way to see the reason this is not validated on some of the testnet validators?
I'm connected to wss://s.altnet.rippletest.net:51233; the address I use is rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF. Can someone help?
Fee is at 1 000 000 drops. This is the transaction blob 1200002200000000240000000061D4838D7EA4C680000000000000000000000000005553440000000000C882FD6AB9862C4F90E57E1BA15C248CABAD5BF96840000000000F42407321033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D15247446304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A498114C882FD6AB9862C4F90E57E1BA15C248CABAD5BF983148DC6B336C7D3BE007297DB086B1D3483DEA24C2A
Is my transaction faulty? Then why was it correctly submitted to the network? It seems to be valid, but why is it not validated and hence finalized in the ledger?
Note: the responses use my internal model to represent some properties, which is why names might be slightly different and some properties are omitted.
Result from 'submit' call :
Result :SubmitResult{engineResult='tefPAST_SEQ', engineResultCode=-190, engineResultMessage='This sequence number has already passed.', txBlob='1200002200000000240000000061D4838D7EA4C680000000000000000000000000005553440000000000C882FD6AB9862C4F90E57E1BA15C248CABAD5BF96840000000000F42407321033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D15247446304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A498114C882FD6AB9862C4F90E57E1BA15C248CABAD5BF983148DC6B336C7D3BE007297DB086B1D3483DEA24C2A', txJson=TxJson{transactionType='Payment', account='rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF', destination='rDveJyEotoUp9jCD1Ghi2ktEBnhHiA6RBB', amount=Amount{currency='USD', value=1, issuer='rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF'}, fee='1000000', flags=0, sequence=0, signingPubKey='033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D1524', txnSignature='304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A49', hash='ECAB482EB34177FA1B1E6C724F038C42308004B1F307A169FAEA88C825E11642'}}
I submitted it a few times, so 'tefPAST_SEQ' is present.
It looks like your transaction object has a sequence field in it.
According to THIS, the sequence can be auto-filled. It can be set manually in case you want to submit multiple transactions at once, incrementing it yourself.
This gives you control over the order in which the transactions are executed.
If that doesn't matter, you can simply go without setting the sequence.
In your case your account looks like this (using account_info):
{
  "result": {
    "account_data": {
      "Account": "rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF",
      "Balance": "10000000000",
      "Flags": 0,
      "LedgerEntryType": "AccountRoot",
      "OwnerCount": 0,
      "PreviousTxnID": "12CA4E5AAF4198155FF3F16E53D35353B051F4AB5E01749833202339B48D187A",
      "PreviousTxnLgrSeq": 11450559,
      "Sequence": 1,
      "index": "169B6BA91A54B2EC86EFB618995A59E76F07853BB88AF231776118339FFD7268"
    },
    "ledger_hash": "449E3420C6B1C6959FA794066264432EF4E98543B0C6582B00D6CD28DE33B8F8",
    "ledger_index": 11523855,
    "status": "success",
    "validated": true
  }
}
See the result.account_data.Sequence being 1?
The reason you're seeing "This sequence number has already passed" is that you've set sequence=0 in your transaction (as shown in the result of your 'submit' call).
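You can check the account's current sequence over the same WebSocket connection before signing, using an account_info request like this (the id value is an arbitrary request identifier):
{"command":"account_info","account":"rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF","ledger_index":"validated","id":1}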
On a side note, I see you've set currency='USD', which means you have to open a trust line first; your account currently has 0 account_lines.
Either way, good luck using XRP ;)

When profiling a Mongo query, what does "millis" mean?

We are working on an application where Java code talks to Mongo and streams the results back with Spring Data. We have been looking at the profiler output, and I am not 100% sure what it means.
https://docs.mongodb.com/manual/reference/database-profiler/
{
  "op" : "query",
  "ns" : "test.c",
  "query" : {
    "find" : "c",
    "filter" : {
      "a" : 1
    }
  },
  "keysExamined" : 2,
  "docsExamined" : 2,
  "cursorExhausted" : true,
  ...
  "responseLength" : 108,
  "millis" : 0,
The documentation's description is:
system.profile.millis
The time in milliseconds from the perspective of the mongod from the beginning of the operation to the end of the operation.
OK, but what is the operation? If I am executing a query and pulling 1000 results back, is the "millis" time just for the query plan? Or does it include the ENTIRE time spent pulling the results back and sending them to the driver?
Will this give different answers when streaming vs non-streaming?
The operation is the query; the query does not return documents, but instead returns a cursor that points to the locations of the documents on disk:
https://docs.mongodb.com/v3.0/core/cursors/
The "millis" result is the time it takes MongoDB to search for the query results (perform index or collection scan, identify all documents that meet the query criteria and perform sorts if necessary) and return the corresponding cursor to the driver.
I'm not certain what you mean by "streaming", but it could be the driver iterating over the cursor to access the results of the query.
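As a rough illustration with the Java driver (a sketch using the com.mongodb.client API; client is an existing MongoClient, with the collection and filter from the profile document above), the find itself only produces a cursor and the documents are fetched in batches as it is iterated:
// find() just sends the query; documents are pulled lazily (via getMore) as the cursor is iterated.
MongoCollection<Document> c = client.getDatabase("test").getCollection("c");
try (MongoCursor<Document> cursor = c.find(Filters.eq("a", 1)).iterator()) {
    while (cursor.hasNext()) {
        Document doc = cursor.next();
        // process doc
    }
}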

Mongo DuplicateKey error despite no overlap

I have a well-logged pool of several java servers behind an F5 load balancer (professionally managed, it's not sending traffic to >1 host) running Tomcat with my application installed, connecting to a sharded mongo cluster. I'm using a base64-encoded SHA-1 hash of the primary natural key as the _id. When a new record is to be created, I do a pretty basic:
BasicDBObject query = new BasicDBObject();
query.put("userId", userId);
query.put("_id", id);
DBObject user = getUsersCollection().findOne(query);
if (user == null) {
    getUsersCollection().insert(new UserObject(userId));
}
This is simplified. In fact there are multiple checks for the pre-existence of this user, including one which should throw a custom exception, and none are triggered. The traffic logs indicate a single incoming create request, and here's an example of what happens:
2014-01-19 20:03:45,167 [http-bio-7950-exec-827]:[...] : ERROR FATAL [...] - Internal server error
[...]: com.mongodb.MongoException$DuplicateKey: { "serverUsed" : "[...]" , "singleShard" : "replicaset_2/host1:27017,host2:27017,host3:27017" , "err" : "E11000 duplicate key error index: Users.$_id_ dup key: { : \"HASH\" }" , "code" : 11000 , "n" : 0 , "lastOp" : { "$ts" : 1390190614 , "$inc" : 1} , "connectionId" : 335764 , "ok" : 1.0}
Yet in my Users collection the record has been created:
db.Users.findOne({_id:"HASH"}):
{
  "_id" : "HASH",
  "createDate" : ISODate("2014-01-20T04:03:45.161Z"),
  ...
}
I'm pasting this because the timestamps matter. We have a timezone issue, but that aside, I interpret the 6 ms difference as clock skew between the mongo cluster and my application servers. There is no other record of this incoming traffic (and it is logged as it bounces from server to server, even - nothing else!), so I am 99.999% confident that my SINGLE LEGITIMATE insert call is both inserting and throwing an error.
Any theories as to how/why this is happening would be greatly appreciated. I'll run tracers and examples if needed to answer questions with more information.
You are searching for the user using both the _id and userId fields. Try commenting out this line: query.put("_id", id);.
It's not clear in your code where the Java variable userId comes from. It's also not clear how UserObject sets an _id, if at all.
Overall it looks like the way you search for the user and the way you create them do not match, i.e. what defines a unique key for that user.
One fix could be to replace these lines:
query.put("userId", userId);
query.put("_id", id);
with:
query.put("_id", userId);
This makes the _id field your userId.
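Put together, the lookup and insert would then use the same key (a sketch; UserObject construction as in the question):
// Look the user up by _id alone, the same key the insert will use.
BasicDBObject query = new BasicDBObject();
query.put("_id", userId);
DBObject user = getUsersCollection().findOne(query);
if (user == null) {
    getUsersCollection().insert(new UserObject(userId));
}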
