Ordering BigQuery Results in Java SDK - java

I am trying to get ordered results from BigQuery with the help of the Google Cloud SDK.
The query looks like:
SELECT * FROM `table`
WHERE id = 111
ORDER BY time DESC
Then I create and run the Job:
QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query)
        .setUseLegacySql(false)
        .build();
Job job = bigquery.create(JobInfo.of(queryConfig));
The issue is that when I actually fetch the results, I receive them unordered:
TableResult results = job.getQueryResults();
results.iterateAll();
If I run the original query in the BigQuery UI, everything seems to be fine.
Any ideas where and why the results are being shuffled?

The issue was that I had added the ORDER BY clause to the query later, but was still accessing the job with the same jobId.
That made BigQuery fetch the previous results, which were unsorted.
Updating the JobId helped!
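A minimal sketch of the fix, assuming a configured `bigquery` client (the class and variable names here are illustrative, not from the original post): generating a fresh JobId for every submission forces BigQuery to run the query again instead of resolving it against the previously submitted job.

```java
import java.util.UUID;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

class OrderedQueryExample {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        QueryJobConfiguration queryConfig = QueryJobConfiguration
                .newBuilder("SELECT * FROM `table` WHERE id = 111 ORDER BY time DESC")
                .setUseLegacySql(false)
                .build();
        // A new, random JobId for every run: reusing an old JobId makes
        // getQueryResults() return the results of the old (unordered) job.
        JobId jobId = JobId.of(UUID.randomUUID().toString());
        Job job = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());
        job.waitFor();
        job.getQueryResults().iterateAll().forEach(System.out::println);
    }
}
```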

Related

Influx db java client batch does not write to DB

I am trying to write points to influxDB using their Java client.
Batch is important to me.
If I use influxDB.enableBatch(...) together with influxDB.write(point), no data is inserted.
If I use BatchPoints and influxDB.write(batchPoints), data is inserted successfully.
Both code samples are taken from: https://github.com/influxdata/influxdb-java/tree/influxdb-java-2.7
InfluxDB influxDB = InfluxDBFactory.connect(influxUrl, influxUser, influxPassword);
influxDB.setDatabase(dbName);
influxDB.setRetentionPolicy("autogen");
// Flush every 2000 Points, at least every 100ms
influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);
influxDB.write(Point.measurement("cpu")
        .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
        .addField("idle", 90L)
        .addField("user", 9L)
        .addField("system", 1L)
        .build());
Query query = new Query("SELECT idle FROM cpu", dbName);
QueryResult result = influxDB.query(query);
This returns nothing.
BatchPoints batchPoints = BatchPoints.database(dbName).tag("async", "true").build();
Point point1 = Point.measurement("cpu")
        .tag("atag", "test")
        .addField("idle", 90L)
        .addField("usertime", 9L)
        .addField("system", 1L)
        .build();
batchPoints.point(point1);
influxDB.write(batchPoints);
Query query = new Query("SELECT * FROM cpu ", dbName);
QueryResult result = influxDB.query(query);
This returns data successfully.
As mentioned, I need the first way to function.
How can I achieve that?
versions:
influxdb-1.3.6
influxdb-java:2.7
Regards, Ido
Maybe it's too late, or you have already resolved your issue, but I will answer your question; it may be useful for others.
I think your first example is not working because you enabled batch functionality, which will "flush every 2000 points, at least every 100ms". So it is basically working, but you are running the select before the actual write has been flushed.
When you use influxDB.enableBatch(...), the influxdb-java client creates an internal thread pool that collects your data and writes it out once enough points have accumulated or the timeout expires; the write does not happen immediately.
In the second example, when you use influxDB.write(batchPoints), the client writes your data to InfluxDB synchronously. That's why your select statement is able to return data immediately.
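The buffering behaviour described above can be sketched in plain Java (a simplified model for illustration, not the actual influxdb-java implementation): writes accumulate in a buffer that is flushed either when the batch size is reached or when a background timer fires, so a read issued in between sees nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Simplified model of a size-or-timeout batch buffer, similar in spirit
// to what enableBatch(2000, 100, TimeUnit.MILLISECONDS) sets up internally.
class BatchBuffer<T> {
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> flushAction;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    BatchBuffer(int batchSize, long flushIntervalMs, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
        // Timer-based flush: runs even if the size threshold is never reached.
        scheduler.scheduleAtFixedRate(this::flush, flushIntervalMs,
                flushIntervalMs, TimeUnit.MILLISECONDS);
    }

    synchronized void write(T point) {
        buffer.add(point);
        if (buffer.size() >= batchSize) {
            flush(); // size-based flush
        }
        // Otherwise the point just sits in the buffer: a query issued
        // right now would not see it yet.
    }

    synchronized void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    void shutdown() {
        scheduler.shutdown();
        flush();
    }
}
```

With a large batch size and a short-lived test, nothing reaches the store until the timer fires or the buffer is flushed, which is exactly why the question's immediate SELECT comes back empty.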

Google cloud Big query UDF limitations

I am facing a problem with Google BigQuery. I have some complex computations to do and need to save the results in BigQuery, so we are doing that complex computation in Java and saving the result in BigQuery with the help of Google Cloud Dataflow.
But this complex calculation takes around 28 minutes to complete in Java, and the customer requirement is to do it within 20 seconds.
So we switched to the BigQuery UDF option. One option is BigQuery legacy UDFs. Legacy UDFs have the limitation that they process rows one by one, so we ruled this option out, as we need multiple rows to compute the results.
The second option is scalar UDFs. BigQuery scalar UDFs can only be called from the web UI or command line and cannot be triggered from the Java client.
If anyone has an idea, please provide direction on how to proceed with this problem.
You can use scalar UDFs with standard SQL from any client API, as long as the CREATE TEMPORARY FUNCTION statements are passed in the query attribute of the request. For example,
QueryRequest queryRequest =
        QueryRequest
                .newBuilder(
                        "CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
                                + "SELECT COUNT(DISTINCT corpus) as works_with_fire\n"
                                + "FROM `bigquery-public-data.samples.shakespeare`\n"
                                + "WHERE word = GetWord();")
                // Use standard SQL syntax for queries.
                // See: https://cloud.google.com/bigquery/sql-reference/
                .setUseLegacySql(false)
                .build();
QueryResponse response = bigquery.query(queryRequest);
BigQuery scalar UDFs can only be called from the web UI or command line and cannot be triggered from the Java client.
This is not accurate. Standard SQL supports scalar UDFs through the CREATE TEMPORARY FUNCTION statement, which can be used from any application and any client; it is simply part of the SQL query:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
To learn how to enable Standard SQL, see this documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
In particular, the simplest thing would be to add the #standardSQL tag at the beginning of the SQL query.
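A sketch of the prefix approach, using the same sample query as the first answer (note the tag must be the very first line of the query text):

```java
class StandardSqlTag {
    static String buildQuery() {
        // The #standardSQL prefix forces standard SQL for this query,
        // enabling CREATE TEMP FUNCTION even where the client defaults
        // to legacy SQL.
        return "#standardSQL\n"
                + "CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
                + "SELECT COUNT(DISTINCT corpus) AS works_with_fire\n"
                + "FROM `bigquery-public-data.samples.shakespeare`\n"
                + "WHERE word = GetWord();";
    }
}
```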

hibernate criteria api how to get Path from fetched records?

I have a query written in criteria api that goes like this:
CriteriaQuery<Application> query = builder.get().createQuery(Application.class).distinct(true);
Root<Application> root = query.from(Application.class);
root.fetch(Application_.answerSets, JoinType.LEFT);
I need to get a Path for the answerSet.createDate field, but have no idea how to achieve it. I need it to build a query for applications whose answer sets are older than X days.
Try query.add(Restrictions.eq("answerSet.createDate", YOURVALUE)). However, I am not sure if createQuery is the same as createCriteria; note that Restrictions belongs to Hibernate's legacy Criteria API rather than to the JPA CriteriaQuery used above.
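With the JPA Criteria API itself, a typed Path comes from a Join rather than a Fetch. A sketch, assuming an AnswerSet entity with a generated AnswerSet_ metamodel, a java.util.Date createDate field, and a precomputed cutoff date for "X days ago" (these names are inferred from the question, not confirmed by it):

```java
// Join the association; a fetch alone does not give you a typed Path in portable JPA.
Join<Application, AnswerSet> answerSets =
        root.join(Application_.answerSets, JoinType.LEFT);

// Navigate to the createDate attribute and use it in a predicate.
Path<Date> createDate = answerSets.get(AnswerSet_.createDate);
query.where(builder.lessThan(createDate, cutoff));
```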

Update all objects in JPA entity

I'm trying to update all 4000 objects in ProfileEntity, but I am getting the following exception:
javax.persistence.QueryTimeoutException: The datastore operation timed out, or the data was temporarily unavailable.
this is my code:
public synchronized static void setX4all()
{
    em = EMF.get().createEntityManager();
    Query query = em.createQuery("SELECT p FROM ProfileEntity p");
    List<ProfileEntity> usersList = query.getResultList();
    int a, b, x;
    for (ProfileEntity profileEntity : usersList)
    {
        a = profileEntity.getA();
        b = profileEntity.getB();
        x = func(a, b);
        profileEntity.setX(x);
        em.getTransaction().begin();
        em.persist(profileEntity);
        em.getTransaction().commit();
    }
    em.close();
}
I'm guessing that it takes too long to query all of the records from ProfileEntity.
How should I do it?
I'm using Google App Engine, so no UPDATE queries are possible.
Edited 18/10
In these 2 days I have tried:
using Backends, as Thanos Makris suggested, but got to a dead end. You can see my question here.
reading the DataNucleus suggestion on Map-Reduce, but I really got lost.
I'm looking for a different direction. Since I'm only going to do this update once, maybe I can update manually every 200 objects or so.
Is it possible to query for the first 200 objects, then the second 200 objects, and so on?
Given your scenario, I would advise running a native update query:
Query query = em.createNativeQuery("update ProfileEntity pe set pe.X = 'x'");
query.executeUpdate();
Please note: here the query string is SQL, i.e. update table_name set ....
This will work better.
Change the update process to use something like Map-Reduce. This means everything is done in the datastore. The only problem is that appengine-mapreduce is not fully released yet (though you can easily build the jar yourself and use it in your GAE app; many others have done so).
If you want to set X for all objects, it is better to use an update statement (i.e. native SQL) via the JPA entity manager, instead of fetching all objects and updating them one by one.
Maybe you should consider using the Task Queue API, which enables you to execute tasks of up to 10 minutes. If the number of entities to update is such that Task Queues do not fit your needs, you could also consider the use of Backends.
Put the transaction outside of the loop:
em.getTransaction().begin();
for (ProfileEntity profileEntity : usersList) {
...
}
em.getTransaction().commit();
Your class does not behave very well: JPA is not suitable for bulk updates this way. You are starting a lot of transactions in rapid sequence and producing a lot of load on the database. A better solution for your use case would be a scalar update query, setting all the objects without loading them into the JVM first (depending on your object structure and laziness, you would otherwise load much more data than you think).
See the Hibernate reference:
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html#batch-direct
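Since the App Engine datastore rules out a native UPDATE, the 200-at-a-time idea from the question can be sketched with JPA pagination (an illustration only; ProfileEntity, func, and em are assumed from the question's code, and each chunk gets its own short transaction to stay under the request timeout):

```java
int chunkSize = 200;
for (int offset = 0; ; offset += chunkSize) {
    List<ProfileEntity> chunk = em
            .createQuery("SELECT p FROM ProfileEntity p", ProfileEntity.class)
            .setFirstResult(offset)   // skip the rows already processed
            .setMaxResults(chunkSize) // fetch at most 200 at a time
            .getResultList();
    if (chunk.isEmpty()) {
        break;
    }
    em.getTransaction().begin();
    for (ProfileEntity p : chunk) {
        p.setX(func(p.getA(), p.getB()));
    }
    em.getTransaction().commit();
    em.clear(); // detach processed entities to keep memory bounded
}
```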

Java SimpleJPA for AWS SimpleDB Select Query

I'm having trouble getting objects back out of SimpleDB using the simpleJPA persistence API. I have successfully installed all the jars and can persist objects no problem. However, I cannot seem to retrieve objects using select queries, though weirdly I can get results using count queries. There are no errors or exceptions; the queries simply don't return any results. When I debug, I can view the actual AWS query that is being generated in the background by simpleJPA, and when I run this query against a domain it returns the expected results no problem.
I've included my Java code below, it should return me a list of all the users in my database.
Query query = em.createQuery("SELECT u FROM User u");
List<User> results = (List<User>)query.getResultList();
As I said, I can persist objects and count them, so there isn't anything wrong with my entity manager or factory; it's just returning empty lists. If you need any more information, just ask.
Thanks in advance!
I never got to the bottom of this problem. In the end I started a new AWS project in Eclipse and re-added the JAR files, solving the issue.
