We are out of ideas on how to continue troubleshooting this issue. The exception listed below suddenly started being hit every few minutes.
This is a four-shard MongoDB setup. The four mongos servers are proxied by three HAProxy instances. We aren't seeing any visible network issues between our application and the shards, and the MongoDB logs for the shards, config servers, and mongos don't show anything out of the ordinary. The HAProxy instances seem to be doing their job just fine. Any thoughts and leads on this would be most appreciated!
{"#timestamp":"2016-02-10T05:10:23.780+00:00","#version":1,"message":"unable to process event [ Request Id: [ eb8c702c-b3e7-4605-99a7-c3dcb5a076a9 ] - Event: id [ 49e0ae8d-f16e-448c-9b28-a4af59aa2eb0 ] messageType [ bulk ] operationType: [ create ] xpoint [ 10254910941235296908401 ] ]","logger_name":"com.xyz.event.RabbitMQMessageProcessor","thread_name":"SimpleAsyncTaskExecutor-1","level":"WARN","level_value":30000,"stack_trace":"java.io.EOFException: null\n\tat org.bson.io.Bits.readFully(Bits.java:50) ~[mongo-java-driver-2.12.5.jar:na]\n\tat org.bson.io.Bits.readFully(Bits.java:35)\n\tat org.bson.io.Bits.readFully(Bits.java:30)\n\tat com.mongodb.Response.(Response.java:42)\n\tat com.mongodb.DBPort$1.execute(DBPort.java:141)\n\tat com.mongodb.DBPort$1.execute(DBPort.java:135)\n\tat com.mongodb.DBPort.doOperation(DBPort.java:164)\n\tat com.mongodb.DBPort.call(DBPort.java:135)\n\tat c.m.DBTCPConnector.innerCall(DBTCPConnector.java:289)\n\t... 56 common frames omitted\nWrapped by: c.m.MongoException$Network: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz\n\tat c.m.DBTCPConnector.innerCall(DBTCPConnector.java:297) ~[mongo-java-driver-2.12.5.jar:na]\n\tat c.m.DBTCPConnector.call(DBTCPConnector.java:268)\n\tat c.m.DBCollectionImpl.find(DBCollectionImpl.java:84)\n\tat c.m.DBCollectionImpl.find(DBCollectionImpl.java:66)\n\tat c.m.DBCollection.findOne(DBCollection.java:869)\n\tat c.m.DBCollection.findOne(DBCollection.java:843)\n\tat c.m.DBCollection.findOne(DBCollection.java:789)\n\tat o.s.d.m.c.MongoTemplate$FindOneCallback.doInCollection(MongoTemplate.java:2013) ~[spring-data-mongodb-1.6.3.RELEASE.jar:na]\n\tat o.s.d.m.c.MongoTemplate$FindOneCallback.doInCollection(MongoTemplate.java:1997)\n\tat o.s.d.m.c.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1772)\n\t... 47 common frames omitted\nWrapped by: o.s.d.DataAccessResourceFailureException: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz; nested exception is com.mongodb.MongoException$Network: Read operation to server prod_mongos.internal.xyz.com:27017 failed on database xyz\n\tat o.s.d.m.c.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:59) ~[spring-data-mongodb-1.6.3.RELEASE.jar:na]\n\tat o.s.d.m.c.MongoTemplate.potentiallyConvertRuntimeException(MongoTemplate.java:1946)\n\tat o.s.d.m.c.MongoTemplate.executeFindOneInternal(MongoTemplate.java:1776)\n\tat o.s.d.m.c.MongoTemplate.doFindOne(MongoTemplate.j...","HOSTNAME":"prod-node-09","requestId":"eb8c702c-b3e7-4605-99a7-c3dcb5a076a9","WHAT":"ProcessBulkDiscoveredXYZEvent","host":"172.30.31.155:44243","type":"cloud_service","tags":["_grokparsefailure"]}
Related
I have a 3-instance MongoDB replica set, including 1 arbiter, across 3 different EC2 instances. From the mongo console I am able to connect to the replica set.
But when I try to build/deploy my Dockerized Spring Boot application on the primary EC2 instance, it throws the exception below:
Caused by: org.springframework.data.mongodb.UncategorizedMongoDbException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='<usrName>', source='<source>', password=<hidden>, mechanismProperties=<hidden>}; nested exception is com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='<usrName>', source='<source>', password=<hidden>, mechanismProperties=<hidden>}
Caused by: com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='<usrName>', source='<source>', password=<hidden>, mechanismProperties=<hidden>}
Caused by: com.mongodb.MongoCommandException: Command failed with error 18 (AuthenticationFailed): 'Authentication failed.' on server <Primary-Host>:27017. The full response is {"operationTime": {"$timestamp": {"t": 1601217500, "i": 1}}, "ok": 0.0, "errmsg": "Authentication failed.", "code": 18, "codeName": "AuthenticationFailed", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1601217500, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "KSwBAZHnhcqmjdsAy9HHVB8+yZQ=", "subType": "00"}}, "keyId": 6876114453302083588}}}
Spring Data MongoDB properties used while connecting to the replica set:
spring.data.mongodb.uri=mongodb://<usrName>:<password>#<host-primary>:27017,<host-secondary>:27017/<dbName>?<replicaset name>
spring.data.mongodb.auto-index-creation = true
Whereas when I try to build/deploy using the properties below, i.e. a single-node connection, it succeeds:
spring.data.mongodb.host=<Primary-Host>
spring.data.mongodb.port=27017
spring.data.mongodb.database=<database name>
spring.data.mongodb.authentication-database=admin
spring.data.mongodb.username=<user name>
spring.data.mongodb.password=<password>
spring.data.mongodb.auto-index-creation = true
Does the username or password contain an at sign @, a colon :, a slash /, or a percent sign % ?
If so, check that you are percent-encoding those characters correctly.
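For illustration, a minimal way to percent-encode a credential before splicing it into the connection string (a sketch; note that URLEncoder encodes a space as '+', which the replace below converts to %20 for URI use):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MongoUriEncoder {
    // Percent-encode a username or password for use inside a MongoDB URI.
    static String encode(String credential) throws UnsupportedEncodingException {
        return URLEncoder.encode(credential, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(encode("p@ss:w/rd%"));  // prints p%40ss%3Aw%2Frd%25
    }
}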
Also try adding authSource to the URI, like so:
?authSource=admin&replicaSet=myRepl
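Putting both suggestions together, the full property would look something like this (hosts, database, and replica set name are placeholders, and note the @ between the credentials and the first host):

spring.data.mongodb.uri=mongodb://<usrName>:<password>@<host-primary>:27017,<host-secondary>:27017/<dbName>?authSource=admin&replicaSet=myRepl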
I've run into this error when Kafka Streams tries to deserialise an Avro message:
[filtering-app-6adef284-11eb-48f8-8ca0-cde7da5224ab-StreamThread-1] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [filtering-app-6adef284-11eb-48f8-8ca0-cde7da5224ab] All stream threads have died. The instance will be in error state and should be closed.
[filtering-app-6adef284-11eb-48f8-8ca0-cde7da5224ab-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [filtering-app-6adef284-11eb-48f8-8ca0-cde7da5224ab-StreamThread-1] Shutdown complete
Exception in thread "filtering-app-6adef284-11eb-48f8-8ca0-cde7da5224ab-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
at org.apache.kafka.streams.processor.internals.RecordQueue.maybeUpdateTimestamp(RecordQueue.java:160)
The root cause exception was:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
Caused by: java.lang.RuntimeException: java.lang.StringIndexOutOfBoundsException: begin 1, end 0, length 1
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1529)
and
Caused by: java.lang.StringIndexOutOfBoundsException: begin 1, end 0, length 1
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
at java.base/java.lang.String.substring(String.java:1874)
The Avro schema is straightforward:
{
  "namespace": "io.confluent.developer.avro",
  "type": "record",
  "name": "Publication",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "title", "type": "string"}
  ]
}
which is from this tutorial: https://kafka-tutorials.confluent.io/filter-a-stream-of-events/kstreams.html. The producer serialises the input string "{"name": "George R. R. Martin", "title": "A Dream of Spring"}" with no problem, but the Kafka Streams application, which basically just filters the events (sketched below), fails to deserialise the object when it tries to apply the Java filtering logic...
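For context, the filtering topology from that tutorial boils down to something like this (a paraphrased sketch, not the exact tutorial code; topic names are placeholders, Publication is the class generated from the Avro schema above, and the Avro serdes are assumed to be configured as default serdes):

import io.confluent.developer.avro.Publication;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class FilterTopology {
    // Keep only George R. R. Martin's publications.
    static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Publication> publications = builder.stream("publications");
        publications
                .filter((key, publication) -> "George R. R. Martin".equals(publication.getName()))
                .to("filtered-publications");
        return builder;
    }
}

The deserialisation failure happens before this filter ever runs, while the consumer is still decoding the Avro record.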
Has anyone encountered this problem before? I'd appreciate any suggestions!
Found the issue: a proxy was getting in the way.
The root cause was that the app couldn't connect to the Schema Registry. Just noting it here in case someone runs into the same problem later.
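In case it saves someone time, the settings involved look roughly like this (a sketch; the registry URL and proxy-bypass hosts are placeholders for your environment):

import java.util.Properties;

public class StreamsProps {
    static Properties streamsProps() {
        Properties props = new Properties();
        // The Avro serdes fetch schemas from here; it must be reachable from the app.
        // If a proxy intercepts this HTTP call, the client receives an error page
        // instead of a schema, which surfaces as the StringIndexOutOfBoundsException above.
        props.put("schema.registry.url", "http://localhost:8081");
        // Exempt the registry host from the proxy at the JVM level if needed.
        System.setProperty("http.nonProxyHosts", "localhost|schema-registry");
        return props;
    }
}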
I am trying to create a test framework for Kafka using ZeroCode. The product I am trying to test is based on microservices and Kafka. At the moment, all I am trying to do is connect to my topic and publish a message to it. But when I run the test case I get an error saying 'Exception during operation:produce'.
Stacktrace
-------------------------- BDD: Scenario:Produce a message to kafka topic - vanilla -------------------------
27 Mar 2020 10:43:21,531 INFO [main] runner.ZeroCodeMultiStepsScenarioRunnerImpl |
### Executing Scenario -->> Count No: 0
27 Mar 2020 10:43:21,531 INFO [main] runner.ZeroCodeMultiStepsScenarioRunnerImpl |
### Executing Step -->> Count No: 0
---------------------------------------------------------
kafka.bootstrap.servers - <myKafkaBootstrapServer>
---------------------------------------------------------
27 Mar 2020 10:43:21,681 INFO [main] client.BasicKafkaClient | <myKafkaBootstrapServer>, topicName:executions.enriched, operation:produce, requestJson:{"recordType":"JSON","records":[{"value":"EquityExecution"}]}
27 Mar 2020 10:43:21,683 ERROR [main] client.BasicKafkaClient | Exception during operation:produce, topicName:executions.enriched, error:null
java.lang.RuntimeException: java.lang.NullPointerException
at org.jsmart.zerocode.core.kafka.client.BasicKafkaClient.execute(BasicKafkaClient.java:50)
at org.jsmart.zerocode.core.engine.executor.JsonServiceExecutorImpl.executeKafkaService(JsonServiceExecutorImpl.java:102)
at org.jsmart.zerocode.core.runner.ZeroCodeMultiStepsScenarioRunnerImpl.runScenario(ZeroCodeMultiStepsScenarioRunnerImpl.java:190)
at org.jsmart.zerocode.core.runner.ZeroCodeUnitRunner.runLeafJsonTest(ZeroCodeUnitRunner.java:198)
I am using a .properties file to supply the broker and SSL credentials, then sending a test JSON message. If publishing succeeds, I plan to consume from a certain topic and assert on the values, thereby performing an integration test on the service.
Please help me resolve this, as I cannot find any meaningful information online about how to fix it. Much appreciated!
My .properties file looks something like this:
security.properties=SSL
ssl.keystore.password=<myPassword>
ssl.keystore.location=<myLocation>
kafka.bootstrap.servers=<myServer>
My JSON file (test scenario; a null key is a valid input to my topic) looks something like this:
{
  "scenarioName": "Produce a message to kafka topic - vanilla",
  "steps": [
    {
      "name": "produce_step",
      "url": "kafka-topic:my.topic",
      "operation": "produce",
      "request": {
        "records": [
          {
            "value": "My test value"
          }
        ]
      },
      "assertions": {
        "status": "Ok"
      }
    }
  ]
}
Everything looks OK except the points below. Fix these first, and then it will work fine.
kafka.bootstrap.servers=<myServer> should go into the Kafka broker properties which your @TargetEnv("kafka_servers/kafka_test_server.properties") is pointing to.
The producer.properties file should not have a redundant kafka.bootstrap.servers=... entry.
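For reference, a minimal producer.properties might then look like this (a sketch; the SSL entries mirror the question's file, and note that the protocol key is spelled security.protocol):

client.id=zerocode-producer
security.protocol=SSL
ssl.keystore.location=<myLocation>
ssl.keystore.password=<myPassword>
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer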
And the server properties file as below:
kafka_test_server.properties
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# kafka bootstrap servers comma separated
# e.g. localhost:9092,host2:9093
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
kafka.bootstrap.servers=localhost:9092
kafka.producer.properties=producer.properties
kafka.consumer.properties=consumer.properties
That's it.
@TargetEnv("kafka_servers/kafka_test_server.properties")
@RunWith(ZeroCodeUnitRunner.class)
public class KafkaProduceTest {

    @Test
    @JsonTestCase("kafka/produce/test_kafka_produce.json")
    public void testProduce() throws Exception {
    }
}
There is a working KafkaProduceTest example in the GitHub Kafka HelloWorld project, which you can clone and run locally.
I keep getting the following exception when trying to restore a snapshot from an S3 repository using the cloud-aws plugin:
[WARN ][cluster.action.shard ] [Landslide] [index_name][0] received shard failed for target shard [[index_name][0], node[U2w_femBQYO3f5TuOI5daw], [P], v[109], restoring[elasticsearch:backup_name], s[INITIALIZING], a[id=gJBMpmcVT6G132-h1ONGgw], unassigned_info[[reason=ALLOCATION_FAILED], at[2016-09-01T13:01:53.147Z], details[failed to create shard, failure ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [index_name][0], timed out after 5000ms]; ]]], indexUUID [7quZdjJqRRmhzr7WBXqlgQ], message [failed to create shard], failure [ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [index_name][0], timed out after 5000ms]; ]
[index_name][[index_name][0]] ElasticsearchException[failed to create shard]; nested: LockObtainFailedException[Can't lock shard [index_name][0], timed out after 5000ms];
at org.elasticsearch.index.IndexService.createShard(IndexService.java:389)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:601)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:501)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:166)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.LockObtainFailedException: Can't lock shard [index_name][0], timed out after 5000ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:609)
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:537)
at org.elasticsearch.index.IndexService.createShard(IndexService.java:306)
... 10 more
ES version 2.4.0.
There's no other operation going on on the Elasticsearch server. Also note that the restore operation does complete in spite of the above exception, but it takes very long to restore a < 1 GB index over a 5-6 Mbps connection. Any suggestions?
EDIT: It seems to be a problem with S3 buckets. Finding no particular solution to this, I have moved on to an FS repository.
I'd still be interested in the solution to this problem.
I am using a 3-node cluster with Elasticsearch 1.3.1. I have 17 indices, each holding between 0.5 M documents (1 GiB) and 1.4 M documents (3 GiB). Now I would like to try the snapshot and restore process in my cluster. I used the following REST calls to do so...
To create a repository:
curl -XPUT 'http://host.name:9200/_snapshot/es_snapshot_repo' -d '{
  "type": "fs",
  "settings": {
    "location": "/data/es_snapshot_bkup_repo/es_snapshot_repo"
  }
}'
Verified the repository:
curl -XGET 'http://host.name:9200/_snapshot/es_snapshot_repo?pretty'
The response is:
{
  "es_snapshot_repo" : {
    "type" : "fs",
    "settings" : {
      "location" : "/data/es_snapshot_bkup_repo/es_snapshot_repo"
    }
  }
}
Took the snapshot using:
curl -XPUT "http://host.name:9200/_snapshot/es_snapshot_repo/snap_001" -d '{
"indices": "index_01",
"ignore_unavailable": "true",
"include_global_state": false,
"wait_for_completion": true
}'
The response is:
{
  "accepted": true
}
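(I notice wait_for_completion did not take effect here; it is a URL parameter rather than a body field, so a blocking variant of the same call would look like this:

curl -XPUT "http://host.name:9200/_snapshot/es_snapshot_repo/snap_001?wait_for_completion=true" -d '{
  "indices": "index_01",
  "ignore_unavailable": "true",
  "include_global_state": false
}'

With the parameter in the URL, the response contains the full snapshot info instead of just {"accepted": true}.)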
Then I am trying to restore the snapshot with the request:
curl -XPOST "http://host.name:9200/_snapshot/es_snapshot_repo/snap_001/_restore" -d '{
  "indices": "index_01",
  "ignore_unavailable": "true",
  "include_global_state": false,
  "rename_pattern": "index_01",
  "rename_replacement": "index_01_bk",
  "include_aliases": false
}'
ISSUE:
As I mentioned, I have 3 nodes. The index I am trying to snapshot and restore has 6 shards and 2 replicas.
Most of the shards and their replicas are restored properly, but sometimes 1, sometimes 2 primary shards and their replicas fail to restore. Those primary shards stay in the INITIALIZING state. I allowed the cluster to relocate them for more than an hour, but the shards do not relocate to the correct node... I got the following exception on my node.
The restore process tries to place the shard on the other 2 nodes... but that isn't possible...
[2014-08-27 07:10:35,492][DEBUG][cluster.service ] [node_01] processing [
shard-failed (
[snap_001][4],
node[r4UoA7vJREmQfh6lz634NA],
[P],
restoring[es_snapshot_repo:snap_001],
s[INITIALIZING]),
reason [Failed to start shard,
message [IndexShardGatewayRecoveryException[[snap_001][4] failed recovery];
nested: IndexShardRestoreFailedException[[snap_001][4] restore failed];
nested: IndexShardRestoreFailedException[[snap_001][4] failed to restore snapshot [snap_001]];
nested: IndexShardRestoreFailedException[[snap_001][4] failed to read shard snapshot file];
nested: FileNotFoundException[/data/es_snapshot_bkup_repo/es_snapshot_repo/indices/index_01/4/snapshot-snap_001 (No such file or directory)]; ]]]:
done applying updated cluster_state (version: 56391)
Could anyone help me overcome this issue, and please correct me if I have made any mistake in this process...
FYI, I am using the master node to issue the curl requests.
You need to provide a shared file system location that can be accessed by all the Elasticsearch nodes with read & write permission.
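A minimal sketch of that setup, assuming an NFS export (server name and paths are placeholders) mounted at an identical path on every node before registering the repository:

# On every node: mount the shared export at the same path
mount -t nfs nfs-server:/exports/es_snapshots /data/es_snapshot_bkup_repo

# Then register the repository once, from any node
curl -XPUT 'http://host.name:9200/_snapshot/es_snapshot_repo' -d '{
  "type": "fs",
  "settings": {
    "location": "/data/es_snapshot_bkup_repo/es_snapshot_repo"
  }
}'

With a per-node (non-shared) path, each node writes its own shards locally, which produces exactly the FileNotFoundException seen above when another node tries to read a shard snapshot file it does not have.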