Alternative BigQuery Writer to the plugin provided by Google? - java

Recently I upgraded my system to Java 8. I am using Groovy. I have data that I need to write to BigQuery.
In short: the data gets extracted from my database and pushed into a queue (RabbitMQ); when it comes out of the queue, it is formatted for BigQuery and sent to be added to the BigQuery table.
I am stuck at the first step: installing a BigQuery plugin so that I can connect to my table and then push the formatted data to it.
I tried adding the recommended BigQuery plugin from Google, but I get the following error:
Resolve error obtaining dependencies: Could not transfer artifact com.google.cloud:libraries-bom:zip:26.3.0 from/to repo_grails_org_ui_native_plugins_org_grails_plugins (https://repo.grails.org/ui/native/plugins/org/grails/plugins): Checksum validation failed, expected <!doctype but is 1b29c3f550acf246bb05b9ba5f82e0adbd0ad383 (Use --stacktrace to see the full trace)
I tried looking for plugins in my IDE (IntelliJ), but the one recommended by Google does not exist there.
What plugin can I use that will work?
Is there an alternative plugin that I can use?
I tried older versions of the plugin, but those failed as well, producing the same checksum validation error. I was hoping to find a compatible version: I tried the first stable version just after the beta releases, and also version 20, and both failed with the same error.

It turns out that the Grails version of my application does not "accommodate" the BigQuery plugins. I will have to upgrade my Grails version from 2.5.6 to a higher one.
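If upgrading Grails is not an immediate option, one possible workaround is to skip the Grails plugin layer entirely and call the plain google-cloud-bigquery Java client from the queue consumer. The following is only a rough sketch of a streaming insert, assuming the google-cloud-bigquery dependency can be resolved from Maven Central; the dataset, table and column names are placeholders:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.HashMap;
import java.util.Map;

public class BigQueryStreamingWriter {
    public static void main(String[] args) {
        // Assumes application default credentials are configured for the project.
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder dataset/table names: replace with your own.
        TableId tableId = TableId.of("my_dataset", "my_table");

        // One row keyed by column name, as it would come out of the queue consumer.
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42L);
        row.put("payload", "formatted message body");

        InsertAllRequest request = InsertAllRequest.newBuilder(tableId).addRow(row).build();
        InsertAllResponse response = bigquery.insertAll(request);
        if (response.hasErrors()) {
            // Per-row errors are keyed by the index of the failed row.
            response.getInsertErrors().forEach((index, errors) ->
                System.err.println("Row " + index + " failed: " + errors));
        }
    }
}

Since Groovy interoperates directly with Java classes, the same calls could be made from a Grails service without going through a plugin at all.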

Related

Does deleting from a table of an H2 database handled by Hibernate corrupt the table?

Here is a quick description of the system:
A Java 7 REST client receives JSONs and writes their parsed content into an H2 database via Hibernate.
Some Pentaho Kettle Spoon 4 ETLs directly connect to the same database to read and delete a lot of entries at once.
This solution worked fine in our test environment, but in production (where the traffic is, of course, much higher) the ETLs often fail with the following error:
Error inserting/updating row
General error: "java.lang.ArrayIndexOutOfBoundsException: -1"; SQL statement:
DELETE FROM TABLE_A
WHERE COLUMN_A < ? [50000-131]
If I browse the database I can indeed see that the table is not readable (apparently because it thinks its length is -1?). The error code 50000 stands for "Generic", so it is of no use.
Apart from the trivial "maybe H2 is not good for an event handler", I have been thinking that the corruption could possibly be caused by a conflict between Kettle and Hibernate, or in other words that no one should delete from a Hibernate-handled database without Hibernate knowing about it.
My questions to those more experienced with Hibernate than me are:
Is my supposition correct?
Should I redesign my solution to perform the deletes through the same RESTful Hibernate layer as well?
Should I give up on using H2 for such a system?
Thanks for the help!
EDIT:
The database is created by a simple sh script that runs the following command, which uses the provided Shell tool to connect to a non-existent database, which by default creates it.
$JAVA_HOME/bin/java -cp *thisIsAPath*/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Shell -user $DB_USER -password $DB_PASSWORD -url jdbc:h2:$DB_FOLDER/Temp_SD_DS_EventAgent<<END
So all its parameters are set to version 1.3.168's defaults. Unfortunately, while I can find the current URL setting, I cannot find where to look for that version's defaults and experimental options.
I also found the following:
According to the tutorial, "When using Hibernate, try to use the H2Dialect if possible", which I didn't.
The tutorial also says "Please note MVCC is enabled in version 1.4.x by default, when using the MVStore." Does that mean concurrency is disabled/unsupported by default in this older version, and that this is the problem?
The database is created with H2 version 1.3.168, but the consumer uses 1.4.197. Is this a big deal?
I cannot comment on the reliability of the H2 database.
From the application's perspective, though, I think you should use a locking mechanism - optimistic or pessimistic locking. This will avoid the conflict situations. I hope this answer helps point you in the right direction.
Article on Optimistic and Pessimistic locking
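For illustration only, here is a minimal sketch of optimistic locking with a Hibernate/JPA @Version field; the entity and field names are made up for the example and stand in for TABLE_A:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Hypothetical entity standing in for TABLE_A; names are illustrative only.
@Entity
public class EventRecord {

    @Id
    private Long id;

    private Long columnA;

    // Hibernate increments this on every update and checks it on commit;
    // a concurrently modified row raises an OptimisticLockException
    // instead of being silently overwritten.
    @Version
    private Integer version;

    // getters and setters omitted
}

Note that this only protects writes that go through Hibernate; deletes issued directly by the Kettle ETLs would still bypass it.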

commitlog analysis in Cassandra

I found the commit log (.log) files in the folder and would like to analyze them. For example, I want to know which queries were executed in the history of the machine. Is there any code to do that?
Commit log files are specific to a version of Cassandra, and you may need to tinker with CommitLogReader, etc. You can find more information in the documentation on Change Data Capture.
But the main issue for you is that the commit log doesn't contain the queries executed; it contains the data that is modified. What you really need is the audit functionality - here you have several choices:
it's built into the upcoming Cassandra 4.0 - see the documentation on how to use it
use the ecAudit plugin open-sourced by Ericsson - it supports Cassandra 2.2, 3.0 & 3.11
if you use DataStax Enterprise (DSE), it has built-in support for audit logging

Lucene Format version is not supported

For our project we are using the Lucene 5.5.0 library to create Lucene shards; however, there is one ETL job for which we need to create Lucene 4.10.3 shards so that we can index them in SolrCloud. I'd like to keep the Lucene version at 5.5.0, so I am trying to set the version through the API. To be more specific, I do this:
val analyzer = new KeywordAnalyzer()
val luceneVersion = Version.parseLeniently(version)
analyzer.setVersion(luceneVersion)
However when I try to index the generated shards into Solr cloud I get the following error message:
Error CREATEing SolrCore 'ac_test2_shard2_replica1': Unable to create core [ac_test2_shard2_replica1] Caused by: Format version is not supported (resource: BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 0 and 3)
Based on this post, this is due to the fact that the Lucene version used to create the shards is not compatible with the SolrCloud version. Can someone help me understand why the created shards are still not compatible, and how I can create shards in a compatible older format?
Simply setting the version on the analyzer isn't going to do anything about the format of the index. All that does is make sure you are using familiar analysis rules; it has nothing to do with this problem.
You need to use the appropriate codec to write an index in an older format, specifically Lucene410Codec. You can set the codec to use in your IndexWriterConfig. Be aware that the backward codecs are primarily intended to read old indexes rather than write them, so I don't know for sure whether using one for your purpose would even work.
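As a rough sketch only (assuming lucene-backward-codecs 5.5.0 is on the classpath, that Lucene410Codec actually accepts writes, which as noted is not guaranteed, and with a placeholder output path), setting the codec would look roughly like this:

import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.codecs.lucene410.Lucene410Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import java.nio.file.Paths;

public class LegacyFormatWriter {
    public static void main(String[] args) throws Exception {
        // Placeholder output path for the shard.
        Directory dir = FSDirectory.open(Paths.get("/tmp/legacy-shard"));

        IndexWriterConfig config = new IndexWriterConfig(new KeywordAnalyzer());
        // Ask the writer to emit the 4.10 index format instead of the default 5.x codec.
        // Writing through a backward codec may still be rejected at runtime.
        config.setCodec(new Lucene410Codec());

        try (IndexWriter writer = new IndexWriter(dir, config)) {
            // addDocument(...) calls go here
            writer.commit();
        }
    }
}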
If possible, I would recommend you just use compatible Lucene versions instead. Either upgrade your Solr instance, or just use Lucene 4.10 for this job.

How can an Elasticsearch client be notified of a new indexed document?

I am using Elasticsearch, and I am building a client (using the Java Client API) to export logs indexed via Logstash.
I would like to be notified (by adding a listener somewhere) when a new document is indexed (= a new log line has been added) instead of querying the last X documents.
Is it possible?
This is what you're looking for: https://github.com/ForgeRock/es-change-feed-plugin
Using this plugin, you can register to a websocket channel to receive indexation/deletion events as they happen. It has some limitations, though.
Back in the day, it was possible to install river plugins to stream documents to ES. The river feature has been removed, but the plugin above is like a "reverse river", where outside clients are notified by ES as documents get indexed.
Very useful and seemingly up-to-date with ES 6.x
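For a rough idea of what a consumer could look like in Java 11+, here is a sketch using the JDK WebSocket client; the host, port and channel path below are placeholders, not the plugin's documented endpoint, so check its README for the real URL:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;

public class ChangeFeedListener {
    public static void main(String[] args) throws InterruptedException {
        // Placeholder URL: the real host, port and path depend on the plugin configuration.
        URI feed = URI.create("ws://localhost:9400/ws/_changes");

        HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .buildAsync(feed, new WebSocket.Listener() {
                @Override
                public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                    // Each message is a JSON event describing an index/update/delete operation.
                    System.out.println("change event: " + data);
                    ws.request(1); // ask for the next message
                    return null;
                }
            })
            .join();

        new CountDownLatch(1).await(); // keep the demo alive; a real client manages its own lifecycle
    }
}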
UPDATE (April 14th, 2019):
According to what was said at Elastic{ON} Zurich 2019, at some point in the 7.x series, there will be a Changes API that will provide index changes notifications (document creation, update, deletion and more).
UPDATE (July 22nd, 2022):
ES 8.x is out and the Changes API is still nowhere in sight... Good to know, though, that it's still open at least.

Datanucleus/JDO Level 2 Cache on Google App Engine

Is it possible (and does it make sense) to use the JDO Level 2 Cache for the Google App Engine Datastore?
First of all, why is there no documentation about this on Google's pages? Are there some problems with it? Do we need to set up limits to protect our memcache quota?
According to DataNucleus on Stack Overflow, you can set the following persistence properties:
datanucleus.cache.level2.type=javax.cache
datanucleus.cache.level2.cacheName={cache name}
Is that all? Can we choose any cache name?
Other sources on the Internet report using different settings.
Also, it seems we need to download the DataNucleus Cache support plugin. Which version would be appropriate? And do we just place it in WEB-INF/lib or does it need more setup to activate it?
Before you can figure this out, you have to answer one question:
Which version of DataNucleus are you using?
Everything on this post has to do with the old version of the plugin -- v1. Only recently has the Google Plugin for Eclipse supported v2 of the DataNucleus plugin for AppEngine (which is basically the conduit between AppEngine and the DataNucleus Core).
I'd recommend upgrading to v2 of the Datanucleus plugin for AppEngine -- if you're using Eclipse, it's easy -- there's a UI for it that allows you to select v1 or v2. Just go to your Project properties and find the App Engine settings and look for "Datanucleus JDO/JPA version".
Plus, you have to make a change to your jdoconfig.xml. Specifically, you have to change just one property:
<property name="javax.jdo.PersistenceManagerFactoryClass" value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
SO -- Once you've upgraded to v2, this is how you specify caching (an addition to jdoconfig.xml):
<property name="datanucleus.cache.level2.type" value="jcache"/>
<property name="datanucleus.cache.level2.cacheName" value="NameItWhateverYouWant"/>
At this point, caching should happen automatically every time you put and get using a PersistenceManager. Hooray!
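As an illustration only, the same properties from the jdoconfig.xml snippet above could also be supplied programmatically when building the factory; the class name, cache name and the "appengine" ConnectionURL value are assumptions based on the v2 App Engine plugin setup:

import java.util.HashMap;
import java.util.Map;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

public class CachedPmf {
    public static PersistenceManagerFactory create() {
        Map<String, String> props = new HashMap<>();
        props.put("javax.jdo.PersistenceManagerFactoryClass",
                "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
        props.put("javax.jdo.option.ConnectionURL", "appengine");
        // Same L2 cache settings as in the jdoconfig.xml snippet above.
        props.put("datanucleus.cache.level2.type", "jcache");
        props.put("datanucleus.cache.level2.cacheName", "NameItWhateverYouWant");
        return JDOHelper.getPersistenceManagerFactory(props);
    }

    public static void main(String[] args) {
        PersistenceManager pm = create().getPersistenceManager();
        try {
            // makePersistent(...) and getObjectById(...) calls should now go through
            // the memcache-backed L2 cache when the datanucleus-cache plugin is present.
        } finally {
            pm.close();
        }
    }
}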
There are no known problems with anything to do with L2 caching and GAE/J. If people have problems, then perhaps they ought to report them to Google. Set the cache name to whatever you wish. Anything put into memcache has to be Serializable, obviously, since that is what memcache requires. Yes, you need the datanucleus-cache plugin (ver 1.x), and put it in the same place as any other DataNucleus jars. One day Google will update to use DN 2.x.
It seems there are problems after all: I tried (with JPA) and I got the error someone else already reported: http://code.google.com/p/datanucleus-appengine/issues/detail?id=163
