Cannot publish JSON message on Kafka topic using ZeroCode - Java

I am trying to create a test framework using ZeroCode for Kafka. The product I am trying to test is based on microservices and Kafka. All I am trying to do at the moment is connect to my topic and publish a message to it. But when I run the test case I get an error saying 'Exception during operation:produce'.
Stacktrace
-------------------------- BDD: Scenario:Produce a message to kafka topic - vanilla -------------------------
27 Mar 2020 10:43:21,531 INFO [main] runner.ZeroCodeMultiStepsScenarioRunnerImpl |
### Executing Scenario -->> Count No: 0
27 Mar 2020 10:43:21,531 INFO [main] runner.ZeroCodeMultiStepsScenarioRunnerImpl |
### Executing Step -->> Count No: 0
---------------------------------------------------------
kafka.bootstrap.servers - <myKafkaBootstrapServer>
---------------------------------------------------------
27 Mar 2020 10:43:21,681 INFO [main] client.BasicKafkaClient | <myKafkaBootstrapServer>, topicName:executions.enriched, operation:produce, requestJson:{"recordType":"JSON","records":[{"value":"EquityExecution"}]}
27 Mar 2020 10:43:21,683 ERROR [main] client.BasicKafkaClient | Exception during operation:produce, topicName:executions.enriched, error:null
java.lang.RuntimeException: java.lang.NullPointerException
at org.jsmart.zerocode.core.kafka.client.BasicKafkaClient.execute(BasicKafkaClient.java:50)
at org.jsmart.zerocode.core.engine.executor.JsonServiceExecutorImpl.executeKafkaService(JsonServiceExecutorImpl.java:102)
at org.jsmart.zerocode.core.runner.ZeroCodeMultiStepsScenarioRunnerImpl.runScenario(ZeroCodeMultiStepsScenarioRunnerImpl.java:190)
at org.jsmart.zerocode.core.runner.ZeroCodeUnitRunner.runLeafJsonTest(ZeroCodeUnitRunner.java:198)
I am using a .properties file to supply the broker and SSL credentials, then sending a test JSON message. If publishing is successful, I plan to consume from a certain topic and assert on the values, thereby performing an integration test on the service.
Please help me resolve this, as I cannot find any meaningful information online on how to fix it. Much appreciated!
My .properties file looks something like this:
security.properties=SSL
ssl.keystore.password=<myPassword>
ssl.keystore.location=<myLocation>
kafka.bootstrap.servers=<myServer>
My JSON file (Test Scenario, null key is a valid input to my topic) looks something like this:
{
    "scenarioName": "Produce a message to kafka topic - vanilla",
    "steps": [
        {
            "name": "produce_step",
            "url": "kafka-topic:my.topic",
            "operation": "produce",
            "request": {
                "records": [
                    {
                        "value": "My test value"
                    }
                ]
            },
            "assertions": {
                "status": "Ok"
            }
        }
    ]
}

Everything looks OK except the points below. Fix these first and it should work.
kafka.bootstrap.servers=<myServer> should go into the Kafka broker properties file that your @TargetEnv("kafka_servers/kafka_test_server.properties") is pointing to.
producer.properties should not have a redundant kafka.bootstrap.servers=... entry.
The broker properties file should look like this:
kafka_test_server.properties
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# kafka bootstrap servers comma separated
# e.g. localhost:9092,host2:9093
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
kafka.bootstrap.servers=localhost:9092
kafka.producer.properties=producer.properties
kafka.consumer.properties=consumer.properties
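For reference, a minimal producer.properties sketch. The client.id and serializer choices here are illustrative assumptions, not taken from your setup; the point is that this file carries producer-only settings and no kafka.bootstrap.servers entry:
# producer-only settings; bootstrap servers live in kafka_test_server.properties
client.id=zerocode-producer
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer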
That's it.
@TargetEnv("kafka_servers/kafka_test_server.properties")
@RunWith(ZeroCodeUnitRunner.class)
public class KafkaProduceTest {

    @Test
    @JsonTestCase("kafka/produce/test_kafka_produce.json")
    public void testProduce() throws Exception {
    }
}
There is a working example, KafkaProduceTest, in the GitHub Kafka HelloWorld project which you can clone and run locally.
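Once the produce step passes, you can add a consume-and-assert step to the same scenario file. A hedged sketch follows; the consumer config keys are the ones in the ZeroCode Kafka docs as I recall them, so verify them against your version before relying on this:
{
    "name": "consume_step",
    "url": "kafka-topic:my.topic",
    "operation": "consume",
    "request": {
        "consumerLocalConfigs": {
            "recordType": "RAW",
            "commitSync": true,
            "maxNoOfRetryPollsOrTimeouts": 3
        }
    },
    "assertions": {
        "size": 1,
        "records": [
            {
                "value": "My test value"
            }
        ]
    }
}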

Related

Need insight with Logstash and ELK

I am trying to add data from Logstash to Elasticsearch. Below is the config file and the result on the terminal:
input {
  file {
    path => "F:/business/data/clinical_trials/ctp.csv"
    start_position => "beginning"
    sincedb_path => "C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/data/plugins/inputs/file/.sincedb_88142d557695dc3df93b28d02940763d"
  }
}
filter {
  csv {
    separator => ","
    columns => ["web-scraper-order", "web-scraper-start-url", "First Submitted Date"]
  }
}
output {
  stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => ["https://idm.es.eastus2.azure.elastic-cloud.com:9243"]
    index => "DB"
    user => "elastic"
    password => "******"
  }
}
When I run it, it gets right down to the pipelines. The next step should be establishing a connection with Elasticsearch, but it hangs on the pipeline forever:
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/vendor/bundle/jruby/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
Sending Logstash logs to C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/logs which is now configured via log4j2.properties
[2021-09-22T22:40:20,327][INFO ][logstash.runner ] Log4j configuration path used is: C:\Users\matth\Downloads\logstash-7.14.2-windows-x86_641\logstash-7.14.2\config\log4j2.properties
[2021-09-22T22:40:20,333][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.14.2", "jruby.version"=>"jruby 9.2.19.0 (2.5.8) 2021-06-15 55810c552b OpenJDK 64-Bit Server VM 11.0.12+7 on 11.0.12+7 +indy +jit [mswin32-x86_64]"}
[2021-09-22T22:40:20,383][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2021-09-22T22:40:21,625][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2021-09-22T22:40:22,007][INFO ][org.reflections.Reflections] Reflections took 46 ms to scan 1 urls, producing 120 keys and 417 values
[2021-09-22T22:40:22,865][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["https://idm.es.eastus2.azure.elastic-cloud.com:9243"]}
[2021-09-22T22:40:23,032][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://elastic:xxxxxx@idm.es.eastus2.azure.elastic-cloud.com:9243/]}}
[2021-09-22T22:40:23,707][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://elastic:xxxxxx@idm.es.eastus2.azure.elastic-cloud.com:9243/"}
[2021-09-22T22:40:24,094][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch version determined (7.14.1) {:es_version=>7}
[2021-09-22T22:40:24,096][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-09-22T22:40:24,277][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>12, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1500, "pipeline.sources"=>["C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/config/logstash-sample.conf"], :thread=>"#<Thread:0x52ff5a7a run>"}
[2021-09-22T22:40:24,346][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2021-09-22T22:40:24,793][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.51}
[2021-09-22T22:40:24,835][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2021-09-22T22:40:24,856][INFO ][filewatch.observingtail ][main][c4df97cc309d579dc89a5115b7fcf43a440ef876f640039f520f6b116b913f49] START, creating Discoverer, Watch with file and sincedb collections
[2021-09-22T22:40:24,876][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
Any help as to why my data isn't getting to Elasticsearch? Please help!

IBM MQ client: preventing garbage in the console output

We are using the IBM MQ client (com.ibm.mq.allclient-9.1.3.0.jar) from a Spring Boot project to send MQ messages. The problem is the huge amount of garbage output to the console:
..... ----+----+- } <init>(JmqiEnvironment)
..... ----+----+- { setUserDataSingle(byte [ ],int) <null> [0(0x0)]
..... ----+----+- } setUserDataSingle(byte [ ],int)
(xxx are replacements for privacy)
How can we control/remove this low-level info from the console?
Thanks in advance,
Csaba
It looks like you have trace enabled.
Check if you have the Java system property com.ibm.msg.client.commonservices.trace.status=ON set. If it is set, remove it and restart.
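If you want to verify this from inside the application, here is a minimal sketch using only standard JDK calls. Note that clearing the property at runtime only helps if it happens before the MQ classes initialise their trace support; removing the -D flag from the launch command is the reliable fix:
public class MqTraceCheck {
    public static void main(String[] args) {
        // Check whether IBM MQ trace is switched on via the JVM system property.
        String status = System.getProperty("com.ibm.msg.client.commonservices.trace.status");
        if ("ON".equalsIgnoreCase(status)) {
            // Prefer removing -Dcom.ibm.msg.client.commonservices.trace.status=ON from
            // the launch command; clearing it here only helps if this runs early enough.
            System.clearProperty("com.ibm.msg.client.commonservices.trace.status");
        }
        System.out.println("trace.status = " + status);
    }
}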

Google Cloud Search Database Connector Issues

I am working with Google Cloud Search (https://developers.google.com/cloud-search/docs/guides/?_ga=2.124920714.-122300216.1578247736) and I am attempting to index a Cloud SQL instance. Presently I am using the guide shown here (https://developers.google.com/cloud-search/docs/guides/database-connector#important-considerations). I have registered the source in G Suite, I have a Cloud Search service account, and I have tested that I can connect to the Cloud SQL instance from my Compute Engine instance.
My config file is as follows with the necessary information replaced with XXXX:
#
# data source access
api.sourceId=xxxxxxxxxxx
api.identitySourceId=xxxxxxxxxxxxxxxx
api.serviceAccountPrivateKeyFile=./private-key.json
#
# database access
db.url=jdbc:mysql:///<database>?cloudSqlInstance=<cloud_sql_instance>&socketFactory=mysql-socket-factory-connector-j-8&useSSL=false&user=xxxxxxxxx&password=xxxxxxxxx
#
# traversal SQL statements
db.allRecordsSql=select field_1, field_2, field_3 from table;
#
# schedule traversals
schedule.traversalIntervalSecs=36000
schedule.performTraversalOnStart=true
schedule.incrementalTraversalIntervalSecs=3600
#
# column definitions
db.allColumns= field1, field2, field3
db.uniqueKeyColumns=field1
url.columns=field1
#
# content fields
contentTemplate.db.title=field1
db.contentColumns=field1, field2, field3
#
# setting ACLs to "entire domain accessible"
defaultAcl.mode=fallback
defaultAcl.public=true
with the JDBC connection string based on https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory. I'm at the stage where I run:
java \
-cp "google-cloudsearch-database-connector-v1-0.0.3.jar:mysql-connector-java-5.1.41-bin.jar" \
com.google.enterprise.cloudsearch.database.DatabaseFullTraversalConnector \
[-Dconfig=mysql.config]
but I get the error Failed to initialize connector. The full stack trace is:
Jan 05, 2020 6:29:38 PM com.google.enterprise.cloudsearch.sdk.indexing.IndexingApplication startUp
SEVERE: Failed to initialize connector
com.google.enterprise.cloudsearch.sdk.StartupException: Failed to initialize connector
at com.google.enterprise.cloudsearch.sdk.Application.startConnector(Application.java:150)
at com.google.enterprise.cloudsearch.sdk.indexing.IndexingApplication.startUp(IndexingApplication.java:96)
at com.google.common.util.concurrent.AbstractIdleService$DelegateService$1.run(AbstractIdleService.java:62)
at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
at com.google.common.io.BaseEncoding$StandardBaseEncoding.trimTrailingPadding(BaseEncoding.java:672)
at com.google.common.io.BaseEncoding.decodeChecked(BaseEncoding.java:226)
at com.google.common.io.BaseEncoding.decode(BaseEncoding.java:212)
at com.google.api.client.util.Base64.decodeBase64(Base64.java:93)
at com.google.api.services.cloudsearch.v1.model.Item.decodeVersion(Item.java:329)
at com.google.enterprise.cloudsearch.sdk.indexing.IndexingServiceImpl.indexItem(IndexingServiceImpl.java:678)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.<init>(DefaultAcl.java:203)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.<init>(DefaultAcl.java:93)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl$Builder.build(DefaultAcl.java:466)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.fromConfiguration(DefaultAcl.java:266)
at com.google.enterprise.cloudsearch.sdk.indexing.template.FullTraversalConnector.init(FullTraversalConnector.java:182)
at com.google.enterprise.cloudsearch.sdk.indexing.template.FullTraversalConnector.init(FullTraversalConnector.java:97)
at com.google.enterprise.cloudsearch.sdk.Application.startConnector(Application.java:142)
... 4 more
Jan 05, 2020 6:29:38 PM com.google.enterprise.cloudsearch.sdk.BatchRequestService shutDown
INFO: Shutting down batching service. flush on shutdown: true
Exception in thread "main" java.lang.IllegalStateException: Expected the service IndexingApplication [FAILED] to be RUNNING, but the service has FAILED
at com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:344)
at com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:280)
at com.google.common.util.concurrent.AbstractIdleService.awaitRunning(AbstractIdleService.java:175)
at com.google.enterprise.cloudsearch.sdk.Application.start(Application.java:122)
at com.google.enterprise.cloudsearch.database.DatabaseFullTraversalConnector.main(DatabaseFullTraversalConnector.java:30)
Caused by: com.google.enterprise.cloudsearch.sdk.StartupException: Failed to initialize connector
at com.google.enterprise.cloudsearch.sdk.Application.startConnector(Application.java:150)
at com.google.enterprise.cloudsearch.sdk.indexing.IndexingApplication.startUp(IndexingApplication.java:96)
at com.google.common.util.concurrent.AbstractIdleService$DelegateService$1.run(AbstractIdleService.java:62)
at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:871)
at com.google.common.io.BaseEncoding$StandardBaseEncoding.trimTrailingPadding(BaseEncoding.java:672)
at com.google.common.io.BaseEncoding.decodeChecked(BaseEncoding.java:226)
at com.google.common.io.BaseEncoding.decode(BaseEncoding.java:212)
at com.google.api.client.util.Base64.decodeBase64(Base64.java:93)
at com.google.api.services.cloudsearch.v1.model.Item.decodeVersion(Item.java:329)
at com.google.enterprise.cloudsearch.sdk.indexing.IndexingServiceImpl.indexItem(IndexingServiceImpl.java:678)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.<init>(DefaultAcl.java:203)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.<init>(DefaultAcl.java:93)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl$Builder.build(DefaultAcl.java:466)
at com.google.enterprise.cloudsearch.sdk.indexing.DefaultAcl.fromConfiguration(DefaultAcl.java:266)
at com.google.enterprise.cloudsearch.sdk.indexing.template.FullTraversalConnector.init(FullTraversalConnector.java:182)
at com.google.enterprise.cloudsearch.sdk.indexing.template.FullTraversalConnector.init(FullTraversalConnector.java:97)
The issue is the "Caused by: java.lang.NullPointerException". Check the line referred to in the stack trace to find the culprit variable; expanding the "... 4 more" frames in the stack should reveal it. There is plenty of information available on this exception and on how to read a Java stack trace.
If I had to guess, you're not establishing a connection to your database. Your configuration file looks good to me except for db.url. From my experience it should look something like jdbc:mysql://localhost:1433;DatabaseName=ExampleDatabase. The errors you typically get for the other properties tell you exactly what the issue is (for example, field_1 does not exist in table), but in this case it's not establishing a connection. Try rewriting db.url.
Failed to initialize connector: this exception occurs when there is something wrong with your config file, but it usually tells you what's wrong along with the message.
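To isolate the connection problem from the connector, here is a minimal smoke-test sketch that tries the same style of URL directly. The placeholders are yours to fill in, and the socket-factory class name (com.google.cloud.sql.mysql.SocketFactory) is the one documented in the cloud-sql-jdbc-socket-factory project, so treat it as an assumption about your classpath:
import java.sql.Connection;
import java.sql.DriverManager;

public class DbUrlSmokeTest {
    public static void main(String[] args) throws Exception {
        // Same style of URL as db.url in the connector config; substitute real values.
        String url = "jdbc:mysql:///<database>?cloudSqlInstance=<cloud_sql_instance>"
                + "&socketFactory=com.google.cloud.sql.mysql.SocketFactory"
                + "&useSSL=false&user=<user>&password=<password>";
        try (Connection conn = DriverManager.getConnection(url)) {
            // If this prints, the database connection itself is fine.
            System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
If this connects, the problem lies elsewhere in the connector config; if it fails, the exception should say why.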
Did you try running Cloud SQL proxy on the Compute Engine VM and connecting through that?
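(If you go that route, a typical v1 proxy invocation looks roughly like the following, with the instance connection name as a placeholder; db.url would then point at 127.0.0.1:3306:)
./cloud_sql_proxy -instances=<cloud_sql_instance>=tcp:3306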

MongoDB MongoException$Network

We seem to be out of ideas on how to continue troubleshooting this issue. Suddenly we see the exception listed below being hit every few minutes.
This is a four-shard MongoDB setup. The four mongos servers are proxied by three HAProxies. We aren't seeing any network issues between our application and the shards. The logs for the shards, config servers, and mongos don't show anything out of the ordinary, and the HAProxies seem to be doing their job just fine. Any thoughts and leads on this would be most appreciated!
{"#timestamp":"2016-02-10T05:10:23.780+00:00","#version":1,"message":"unable to process event [ Request Id: [ eb8c702c-b3e7-4605-99a7-c3dcb5a076a9 ] - Event: id [ 49e0ae8d-f16e-448c-9b28-a4af59aa2eb0 ] messageType [ bulk ] operationType: [ create ] xpoint [ 10254910941235296908401 ] ]","logger_name":"com.xyz.event.RabbitMQMessageProcessor","thread_name":"SimpleAsyncTaskExecutor-1","level":"WARN","level_value":30000,"stack_trace":"java.io.EOFException: null\n\tat org.bson.io.Bits.readFully(Bits.java:50) ~[mongo-java-driver-2.12.5.jar:na]\n\tat org.bson.io.Bits.readFully(Bits.java:35)\n\tat org.bson.io.Bits.readFully(Bits.java:30)\n\tat com.mongodb.Response.(Response.java:42)\n\tat com.mongodb.DBPort$1.execute(DBPort.java:141)\n\tat com.mongodb.DBPort$1.execute(DBPort.java:135)\n\tat com.mongodb.DBPort.doOperation(DBPort.java:164)\n\tat com.mongodb.DBPort.call(DBPort.java:135)\n\tat c.m.DBTCPConnector.innerCall(DBTCPConnector.java:289)\n\t... 56 common frames omitted\nWrapped by: c.m.MongoException$Network: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz\n\tat c.m.DBTCPConnector.innerCall(DBTCPConnector.java:297) ~[mongo-java-driver-2.12.5.jar:na]\n\tat c.m.DBTCPConnector.call(DBTCPConnector.java:268)\n\tat c.m.DBCollectionImpl.find(DBCollectionImpl.java:84)\n\tat c.m.DBCollectionImpl.find(DBCollectionImpl.java:66)\n\tat c.m.DBCollection.findOne(DBCollection.java:869)\n\tat c.m.DBCollection.findOne(DBCollection.java:843)\n\tat c.m.DBCollection.findOne(DBCollection.java:789)\n\tat o.s.d.m.c.MongoTemplate$FindOneCallback.doInCollection(MongoTemplate.java:2013) ~[spring-data-mongodb-1.6.3.RELEASE.jar:na]\n\tat o.s.d.m.c.MongoTemplate$FindOneCallback.doInCollection(MongoTemplate.java:1997)\n\tat o.s.d.m.c.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1772)\n\t... 47 common frames omitted\nWrapped by: o.s.d.DataAccessResourceFailureException: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz; nested exception is com.mongodb.MongoException$Network: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz\n\tat o.s.d.m.c.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:59) ~[spring-data-mongodb-1.6.3.RELEASE.jar:na]\n\tat o.s.d.m.c.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1946)\n\tat o.s.d.m.c.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1776)\n\tat o.s.d.m.c.MongoTemplate.doFindOne(MongoTemplate.j...","HOSTNAME":"prod-node-09","requestId":"eb8c702c-b3e7-4605-99a7-c3dcb5a076a9","WHAT":"ProcessBulkDiscoveredXYZEvent","host":"172.30.31.155:44243","type":"cloud_service","tags":["_grokparsefailure"]}

Elasticsearch CouchDB river bogus chunk size

Sorry, this is kind of related to another post. Once my CouchDB gets a fair number of documents in it, ES starts throwing errors in the log and doesn't index newer files:
[2013-08-19 17:55:08,379][WARN ][river.couchdb ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
java.io.IOException: Bogus chunk size
at sun.net.www.http.ChunkedInputStream.processRaw(ChunkedInputStream.java:319)
at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:572)
at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3052)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.elasticsearch.river.couchdb.CouchdbRiver$Slurper.run(CouchdbRiver.java:477)
at java.lang.Thread.run(Thread.java:724)
[2013-08-19 17:55:13,392][WARN ][river.couchdb ] [Morning Star] [couchdb][portal_production] failed to read from _changes, throttling....
What's the dealio?
Edit - river status
$ curl http://localhost:9200/_river/portal_production/_status?pretty=true
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_status",
  "_version" : 2,
  "exists" : true, "_source" : {"ok":true,"node":{"id":"EVxlLNZ9SrSXYOLS0YBw7w","name":"Shadow Slasher","transport_address":"inet[/192.168.1.106:9300]"}}
}
Edit - River sequence data
Seems pretty low!
curl -X GET http://localhost:9200/_river/portal_production/_seq?pretty=true
{
  "_index" : "_river",
  "_type" : "portal_production",
  "_id" : "_seq",
  "_version" : 1,
  "exists" : true, "_source" : {"couchdb":{"last_seq":"4"}}
}
My _changes is much bigger, by the way:
curl -X GET http://localhost:5984/portal_production/_changes?limit=5
{"results":[
{"seq":4,"id":"Ifilter-1","changes":[{"rev":"4-d9c8e771bc345d1182fbe7c2d63f5d00"}]},
{"seq":7,"id":"Document-2","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
{"seq":10,"id":"Document-4","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
{"seq":13,"id":"Document-6","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
{"seq":16,"id":"Document-8","changes":[{"rev":"1-42f52115c4a5321328be07c490932b61"}]},
...
{"seq":208657,"id":"Document-11295","changes":[{"rev":"8-37cb48660d28bef854b2c31132bc9635"}]},
{"seq":208661,"id":"Document-11297","changes":[{"rev":"6-daf5c5d557d0fa30b2b08be26582a33c"}]},
{"seq":208665,"id":"Document-11299","changes":[{"rev":"6-22e57345c2ee5c7aee8b7d664606b874"}]},
{"seq":208669,"id":"Document-11301","changes":[{"rev":"6-06deee0c3c6705238a8b07e400b2414b"}]},
{"seq":208673,"id":"Document-11303","changes":[{"rev":"6-86fc60dd8c1d415d42a25a23eb975121"}]},
{"seq":208677,"id":"Document-11305","changes":[{"rev":"6-6d51a577fdc9013abf64ec4ffbf9eeee"}]},
{"seq":208683,"id":"Document-11307","changes":[{"rev":"6-726a7835ce390094b9b9e0a91aeb11f0"}]},
{"seq":208684,"id":"Document-11286","changes":[{"rev":"9-747e63e0304a974cc7db7ff84ae80697"}]}
],
"last_seq":208684}
Edit - Couchdb log
This seems bad:
[Thu, 22 Aug 2013 02:49:37 GMT] [info] [<0.340.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 500
[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] 127.0.0.1 - - 'GET' /portal_production/_changes?feed=continuous&include_docs=true&heartbeat=10000&since=4 200
[Thu, 22 Aug 2013 02:49:42 GMT] [error] [<0.348.0>] Uncaught error in HTTP request: {exit,{ucs,{bad_utf8_character_code}}}
[Thu, 22 Aug 2013 02:49:42 GMT] [info] [<0.348.0>] Stacktrace: [{xmerl_ucs,from_utf8,1},
{mochijson2,json_encode_string,2},
{mochijson2,'-json_encode_proplist/2-fun-0-',3},
{lists,foldl,3},
{mochijson2,json_encode_proplist,2},
{mochijson2,'-json_encode_proplist/2-fun-0-',3},
{lists,foldl,3},
{mochijson2,json_encode_proplist,2}]
So I kept deleting documents one by one and retrying the _changes query with include_docs=true, but I never got to the bottom of it. Reading through some other related issues, text imported from Microsoft documents can contain some funky characters. We're doing something like that, so we dumped the database and are filtering non-UTF-8 characters out. A little painful, but we had too many documents to find where the issue was. So far, no errors on the Elasticsearch side (well, some timeouts, but that is probably another thread).
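For anyone facing the same cleanup, one way to do that filtering in Java, sketched under the assumption that the dump is a plain text file (the file names here are hypothetical):
import java.io.*;
import java.nio.charset.*;

public class Utf8Filter {
    public static void main(String[] args) throws IOException {
        // Decode the dump as UTF-8, silently dropping any byte sequences that
        // are not valid UTF-8 (e.g. stray characters from Microsoft documents).
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try (Reader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("dump.json"), decoder));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("dump-clean.json"), StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        }
    }
}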
Are you indexing Office documents?
You could use the attachment plugin.
I have a branch, not yet merged, that indexes CouchDB attachments. If you want to test it, I will be happy to get feedback!
