For our project we are using the Lucene 5.5.0 library to create Lucene shards; however, there is one ETL job for which we need to create Lucene 4.10.3 shards so that we can index them in SolrCloud. I'd like to keep the Lucene dependency at 5.5.0, so I am trying to set the version through the API. To be more specific, I do this:
val analyzer = new KeywordAnalyzer()
val luceneVersion = Version.parseLeniently(version) // version holds the target version string, e.g. "4.10.3"
analyzer.setVersion(luceneVersion)
However, when I try to index the generated shards into SolrCloud I get the following error message:
Error CREATEing SolrCore 'ac_test2_shard2_replica1': Unable to create core [ac_test2_shard2_replica1] Caused by: Format version is not supported (resource: BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 0 and 3)
Based on this post, this is due to the fact that the Lucene version used to create the shards is not compatible with the SolrCloud version. Can someone help me understand why the created shards are still not compatible, and how I can create shards in a compatible older format?
Simply setting the version on the analyzer isn't going to do anything about the format of the index. All that does is make sure you are using the old analysis rules; it has nothing to do with this problem.
You need to use the appropriate codec to write an index in an older format, in this case Lucene410Codec. You can set the codec to use in your IndexWriterConfig. Be aware that the backward-codecs are primarily intended to read old indexes rather than write them, so I don't know for sure whether using it for your purpose would even work.
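As a rough sketch of what that could look like (assuming the lucene-backward-codecs artifact is on the classpath; the index path is made up, and the write may still be rejected since the old codecs are geared towards reading):
import java.nio.file.Paths
import org.apache.lucene.analysis.core.KeywordAnalyzer
import org.apache.lucene.codecs.lucene410.Lucene410Codec
import org.apache.lucene.index.{IndexWriter, IndexWriterConfig}
import org.apache.lucene.store.FSDirectory

val config = new IndexWriterConfig(new KeywordAnalyzer())
config.setCodec(new Lucene410Codec()) // codec class from the lucene-backward-codecs module
val writer = new IndexWriter(FSDirectory.open(Paths.get("/path/to/shard")), config)
// ...add documents as usual, then...
writer.close()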
If possible, I would recommend you just use compatible Lucene versions instead. Either upgrade your Solr instance, or just use Lucene 4.10 for this job.
Related
Recently I upgraded my system to Java 8. I am using Groovy. I have data that I need to write to BigQuery.
In short: the data gets extracted from my database and pushed into a queue (RabbitMQ); when it comes out of the queue, it is formatted for BigQuery and sent to be added to the BigQuery table.
I am stuck at the first step: installing a BigQuery plugin so that I can connect to my table and then push the formatted data to it.
I tried adding the BigQuery plugin recommended by Google, but I get the following error:
Resolve error obtaining dependencies: Could not transfer artifact com.google.cloud:libraries-bom:zip:26.3.0 from/to repo_grails_org_ui_native_plugins_org_grails_plugins (https://repo.grails.org/ui/native/plugins/org/grails/plugins): Checksum validation failed, expected <!doctype but is 1b29c3f550acf246bb05b9ba5f82e0adbd0ad383 (Use --stacktrace to see the full trace)
I tried looking for plugins in my IDE (IntelliJ) but the one recommended by Google does not exist.
What plugin can I use that will work?
Is there an alternative plugin that I can use?
I tried using older versions of the plugin, but those also failed with a checksum validation error. I was hoping to find a compatible version: I tried the first stable version just after the beta releases, and also version 20, and both produced the same error.
It turns out that the Grails version of my application does not "accommodate" the BigQuery plugins. I will have to upgrade my Grails version from 2.5.6 to a higher one.
I found the commit log (.log) files in the folder and would like to analyze them. For example, I want to know which queries were executed in the history of the machine. Is there any code to do that?
Commit log files are specific to a version of Cassandra, and you may need to tinker with CommitLogReader, etc. You can find more information in the documentation on Change Data Capture.
But the main issue for you is that the commit log doesn't contain the queries that were executed; it contains the data that was modified. What you really need is audit functionality, and here you have several choices:
It's built into the upcoming Cassandra 4.0 - see the documentation on how to use it (a minimal enablement sketch follows this list)
use the ecAudit plugin open sourced by Ericsson - it supports Cassandra 2.2, 3.0 & 3.11
if you use DataStax Enterprise (DSE), it has built-in support for audit logging
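For the built-in option, a minimal sketch of turning it on in Cassandra 4.0 with the default binary audit logger (treat the exact option names as something to verify against the 4.0 documentation):
# cassandra.yaml
audit_logging_options:
    enabled: true
It can also be toggled on a running node with nodetool enableauditlog.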
I have a Liferay cluster (2 servers), and each Liferay bundle has its own Lucene files. I want to move these Lucene files onto a mounted volume, like EFS. Is there any way I can do this? I have tried, but failed; the main reason is that a server locks the Lucene files while indexing, so the other server cannot access them.
When using a clustered environment, it is recommended not to use a plain file-based Lucene search index. Liferay instead recommends (see Liferay Clustering) using a pluggable enterprise search engine such as Solr or Elasticsearch. There is also some helpful advice on that page for setting up such an environment.
As Liferay says:
Sharing a Search Index (not recommended unless you have a file locking-aware SAN)
That's why the best options are:
Use a pluggable engine like Solr or Elasticsearch (Elasticray or others).
Configure the Liferay cluster with one writer node and one reader node via the index.read.only portal property: the writer keeps the default index.read.only=false, while the reader is set to index.read.only=true (see the sketch below).
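A minimal portal-ext.properties sketch of that split (assuming the usual semantics of the property, where true disables index writes on that node):
# portal-ext.properties on the writer node
index.read.only=false
# portal-ext.properties on the reader node
index.read.only=true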
IMHO, I would try to use Elasticsearch for the indexes, because it is the engine used in the latest Liferay versions (7+) and plain Lucene is not as powerful as Elasticsearch, for example in terms of performance.
I would like to migrate documents persisted in Couchbase via SDK 1.4.10 to the new document types provided by SDK 2.0.5, such as JsonDocument. I found that it is possible to add custom transcoders to a Bucket, so when decoding documents I can check the flags and decide which transcoder to use. But this doesn't seem like a very good solution to me. Is there a more proper way to do this? Thanks.
Migration can only be done at runtime upon user request; there are too many documents, so we cannot migrate them all at once in the background.
You don't need to use a custom transcoder to read documents created with the 1.x SDK. Instead, use the LegacyDocument type to read (and write) documents in legacy format.
More importantly, you shouldn't continue running with a mix of legacy and new documents in the database for very long. The LegacyDocument type is provided to facilitate the migration from the old format to the new SDK.
The best practice in this case is to deploy an intermediate version of your application which attempts to read documents in one format, then falls back on trying to read them in the other - legacy to new or vice versa, depending on which type of document is accessed more frequently at first. Once you have the intermediate version deployed, you should run a background task that reads and converts all documents from the old format to the new. This is pretty straightforward: you just try to read each document as a LegacyDocument and, if that succeeds, you store the document right back as a JsonDocument using the CAS value you got earlier. If you can't read the document as legacy, then it's already in the new format. The task should be throttled enough that it doesn't cause a large increase in database load.
After the task finishes, remove the fallback code from the application and just read and write everything as JsonDocument.
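A minimal sketch of that read-as-legacy-then-convert step against the 2.x Java SDK (written in Scala here; the method name is made up, and it assumes the legacy documents were stored as JSON strings and that your SDK version has JsonObject.fromJson):
import com.couchbase.client.java.Bucket
import com.couchbase.client.java.document.{JsonDocument, LegacyDocument}
import com.couchbase.client.java.document.json.JsonObject

// Reads a document, converting it to the new format on the fly if it is still legacy.
def readAndMigrate(bucket: Bucket, id: String): JsonDocument = {
  // A transcoding failure here is taken to mean the document is already in the new format.
  val legacy = try bucket.get(id, classOf[LegacyDocument]) catch { case _: Exception => null }
  if (legacy != null) {
    val content = JsonObject.fromJson(String.valueOf(legacy.content()))
    // Store it back as a JsonDocument, passing the CAS value so a concurrent update is not clobbered.
    bucket.replace(JsonDocument.create(id, content, legacy.cas()))
  } else {
    bucket.get(id) // already in the new format (or missing)
  }
}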
You mention having too many documents - how many is that? We've successfully migrated datasets with multiple billions of documents this way. This, admittedly, took several days to run. If you have a database that's larger than that, or has a very low resident ratio, it might not be practical to attempt to convert all documents.
I have imported nodes using the JDBC importer but am unable to figure out auto_index support. How do I get automatic indexing?
The tool you link to does give instructions for indexing, but I've never used it and it doesn't seem to be up to date. I would recommend you use one of the importing tools listed here. You can convert your comma-separated file to tab-separated and use this batch importer or one of the neo4j-shell tools, both of which support automatic indexing.
If you want to use a JDBC driver, for instance with some data transfer tool like Pentaho Kettle, there are instructions and links on the Neo4j import page, first link above.
I know from another question that you use regular expressions heavily, and it is possible that the 'automatic index', which is a Lucene index, may be very good for that, since you can query it with a regexp directly. But if you want to index your nodes by their labels (the new type of index in 2.0), then you don't need to set up indexing before importing. You can create an index at any time and it is populated in the background. If that's what you want, you can read the documentation about working with indexes from the Java API and Cypher.
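For example, a schema (label) index can be created from the Java API at any point after the import; here is a minimal sketch against Neo4j 2.0 (written in Scala, using a hypothetical Person label and name property):
import org.neo4j.graphdb.{DynamicLabel, GraphDatabaseService}

// Creates a label-based index; Neo4j populates it in the background.
def createNameIndex(graphDb: GraphDatabaseService): Unit = {
  val tx = graphDb.beginTx()
  try {
    graphDb.schema().indexFor(DynamicLabel.label("Person")).on("name").create()
    tx.success()
  } finally {
    tx.close()
  }
}
The Cypher equivalent is CREATE INDEX ON :Person(name).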