Can't save kafka offset in zookeeper - java

I've installed Zookeeper and Kafka from Ambari, on CentoS 7.
Ambari version: 2.1.2.1
Zookeeper version: 3.4.6.2.3
Kafka version: 0.8.2.2.3
Java Kafka client:kafka_2.10, 0.8.2.2
I'm trying to save the Kafka offset, using the following code:
SimpleConsumer simpleConsumer = new SimpleConsumer(host, port, soTimeout, bufferSize, clientId);
TopicAndPartition topicAndPartition = new TopicAndPartition(topicName, partitionId);
Map<TopicAndPartition, OffsetAndMetadata> requestInfo = new HashMap<>();
requestInfo.put(topicAndPartition, new OffsetAndMetadata(readOffset, "", ErrorMapping.NoError()));
OffsetCommitRequest offsetCommitRequest = new OffsetCommitRequest(groupName, requestInfo, correlationId, clientName, (short)0);
simpleConsumer.commitOffsets(offsetCommitRequest);
simpleConsumer.close();
But when I run this, I get the following error in my client:
java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Also in the Kafka logs I have the following error:
[2015-11-24 15:38:53,566] ERROR Closing socket for /192.168.186.1 because of error (kafka.network.Processor)
java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:406)
at kafka.api.OffsetCommitRequest$$anonfun$1$$anonfun$apply$1.apply(OffsetCommitRequest.scala:73)
at kafka.api.OffsetCommitRequest$$anonfun$1$$anonfun$apply$1.apply(OffsetCommitRequest.scala:68)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at kafka.api.OffsetCommitRequest$$anonfun$1.apply(OffsetCommitRequest.scala:68)
at kafka.api.OffsetCommitRequest$$anonfun$1.apply(OffsetCommitRequest.scala:65)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at kafka.api.OffsetCommitRequest$.readFrom(OffsetCommitRequest.scala:65)
at kafka.api.RequestKeys$$anonfun$9.apply(RequestKeys.scala:47)
at kafka.api.RequestKeys$$anonfun$9.apply(RequestKeys.scala:47)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:55)
at kafka.network.Processor.read(SocketServer.scala:547)
at kafka.network.Processor.run(SocketServer.scala:405)
at java.lang.Thread.run(Thread.java:745)
Now I've also downloaded and installed the official Kafka 0.8.2.2 version from https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.2.2/kafka_2.10-0.8.2.2.tgz and it works ok; you can save the Kafka offset without any error.
Can anybody give me a some directions, why is the Ambari Kafka failing to save the offset?
P.S: I know that if versionId is 0 (in OffsetCommitRequest), than the offset is actually saved in Zookeeper.

Related

How to manage RecordTooLargeException avoiding Flink job restarting

Is there any way to ignore oversized messages without Flink job restarting?
If I try to produce (using KafkaSink ) a message which is too large (greater than max.message.bytes) then the RecordTooLargeException occurs and the Flink job restarts, then this "exception&restart" cycle is repeating endlessly!
I don't need to increase messages size limits such as max.message.bytes (Kafka Topic Config) and max.request.size (Flink Producer Config), they are good, they are already big. I just want to handle the situation when an unrealistically large message is trying to be produced. In this case, this big message should be ignored, and an error should be logged, and any Runtime Exception should NOT occur, and the endless restarting loop should NOT start.
I tried to use ProducerInterceptor -> it cannot intercept/reject a message, it can just modify it.
I tried to ignore oversized messages in SerializationSchema (implemented a custom wrapper of SerializationSchema) -> it cannot discard message producing too.
I am trying to overwrite KafkaWriter and KafkaSink classes, but it seems to be challenging.
I will be grateful for any advice!
A few quick environment details:
Kafka version is 2.8.1
Flink code is Java code based on the newer KafkaSource/KafkaSink API, not the
older KafkaConsumer/KafkaProduer API.
The flink-clients and flink-connector-kafka version is 1.15.0
Code sample which throws the RecordTooLargeException:
int numberOfRows = 1;
int rowsPerSecond = 1;
DataStream<String> stream = environment.addSource(
new DataGeneratorSource<>(
RandomGenerator.stringGenerator(1050000), // max.message.bytes=1048588
rowsPerSecond,
(long) numberOfRows),
TypeInformation.of(String.class))
.setParallelism(1)
.name("string-generator");
KafkaSinkBuilder<String> builder = KafkaSink.<String>builder()
.setBootstrapServers("localhost:9092")
.setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
.setRecordSerializer(
KafkaRecordSerializationSchema.builder().setTopic("test.output")
.setValueSerializationSchema(new SimpleStringSchema())
.build());
KafkaSink<String> sink = builder.build();
stream.sinkTo(sink).setParallelism(1).name("output-producer");
Exception Stack Trace:
2022-06-02/14:01:45.066/PDT [flink-akka.actor.default-dispatcher-4] INFO output-producer: Writer -> output-producer: Committer (1/1) (a66beca5a05c1c27691f7b94ca6ac025) switched from RUNNING to FAILED on 271b1b90-7d6b-4a34-8116-3de6faa8a9bf # 127.0.0.1 (dataPort=-1). org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka null with FlinkKafkaInternalProducer{transactionalId='null', inTransaction=false, closed=false} at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.throwException(KafkaWriter.java:440) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.lambda$onCompletion$0(KafkaWriter.java:421) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15.0.jar:1.15.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1050088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.

Issue connecting redis cluster using redisson

I am trying to connect redis cluster using redisson 2.3.0 and redis 5.0.7.
Java code for connection:
Config config = new Config();
config.useClusterServers()
.addNodeAddress("redis://127.0.0.1:8000");
RedissonClient redisson = Redisson.create(config);
It is giving following error::
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.(InetSocketAddress.java:224)
at org.redisson.client.RedisClient.(RedisClient.java:93)
at org.redisson.connection.MasterSlaveConnectionManager.createClient(MasterSlaveConnectionManager.java:310)
at org.redisson.cluster.ClusterConnectionManager.connect(ClusterConnectionManager.java:150)
at org.redisson.cluster.ClusterConnectionManager.(ClusterConnectionManager.java:81)
at org.redisson.config.ConfigSupport.createConnectionManager(ConfigSupport.java:172)
at org.redisson.Redisson.(Redisson.java:103)
at org.redisson.Redisson.create(Redisson.java:133)
Cluster config from CLUSTER_NODES in redisson CLusterConnectionManager::
[ClusterNodeInfo [nodeId=4317f285b359ddc3ac08bb85239924509146e475, address=//127.0.0.1:8003#18003, flags=[SLAVE], slaveOf=4118a348827e6107d7e35522a251fd39c5a8f82b, slotRanges=[]],
ClusterNodeInfo [nodeId=2f7b93c80d3721b3fb26fe87bc28ed04a63fe0ec, address=//127.0.0.1:8005#18005, flags=[SLAVE], slaveOf=8b81c3e1acb4e1959a83267540058d1a6bffa12f, slotRanges=[]],
ClusterNodeInfo [nodeId=4118a348827e6107d7e35522a251fd39c5a8f82b, address=//127.0.0.1:8001#18001, flags=[MASTER], slaveOf=null, slotRanges=[[5461-10922]]],
ClusterNodeInfo [nodeId=a0770863d893a5b8106a83e247cea2544f99ef36, address=//127.0.0.1:8004#18004, flags=[SLAVE], slaveOf=6b9da1bbe38b978a3017406e5c1e310f4706cfc8, slotRanges=[]],
ClusterNodeInfo [nodeId=8b81c3e1acb4e1959a83267540058d1a6bffa12f, address=//127.0.0.1:8000#18000, flags=[MYSELF, MASTER], slaveOf=null, slotRanges=[[0-5460]]],
ClusterNodeInfo [nodeId=6b9da1bbe38b978a3017406e5c1e310f4706cfc8, address=//127.0.0.1:8002#18002, flags=[MASTER], slaveOf=null, slotRanges=[[10923-16383]]]]
Reason i found after debugging is :
Not generating ClientPartition.getMasterAddress() properly.
Address is //127.0.0.1:8001#18001 as recorded by CLUSTER_NODES, but it reads it as host: 18001 and port=-1, i.e reading the cluster bus port.
Please suggest where i am doing wrong config.
Thank You in advance.

Kafka streams throwing OutOfMemory exceptions continuously and stops working

I have set up a streams application which consumes messages from one topic, transforms them and put it to the other topic, if any error happens in serialization it puts the records to the error topic.
The load of messages is huge (in millions). The stream app was working perfectly fine until a few days ao, we loaded around 70M data and it was still doing good, then day before yesterday we added another stream to the same application and started streaming the data, now the application crashes with OOM exceptions. Each of the streams have different topics and consumer groups assigned.
The applications runs for an hour or so and then crashes with "java.lang.OutOfMemoryError: Java heap space" errors.
This application is behaving very strangely, we increased the heap size(Xmx) to 2G on each node, our topology is 2 nodes are running the application which are connected to Kafka broker which is running 3 nodes.
There were no network issues but I frequently see "Attempt to heartbeat failed since group is rebalancing" and consumer rebalancing happening in the logs only for the newly added stream.
kafka clients version - 2.3.1
kafka broker - 2.11
Kafka streams configuration:
```props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, SendAndContinueExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, CustomProductionExceptionHandler.class);
props.put(ProducerConfig.RETRIES_CONFIG, "1");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
Streams creation code:
```#Bean
public Set<KafkaStreams> kStreamJson(StreamsBuilder builder) {
Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final KStream<String, JsonNode> infoStream = builder.stream(inputTopic, Consumed.with(Serdes.String(), jsonSerde));
Properties infoProps = kStreamsConfigs().asProperties();
infoProps.put(StreamsConfig.APPLICATION_ID_CONFIG, migrationMOIProfilesGroupId);
infoStream
.map(IProcessX::process)
.through(
outputTopic,
Produced.with(Serdes.String(), new JsonPOJOSerde<>(Message.class)));
return Sets.newHashSet(
new KafkaStreams(builder.build(), infoProps)
);
}
Errors received:
[8/6/20, 22:22:54:070 GST] 00000076 SystemOut O 2020-08-06 22:22:54.070 INFO 83225 --- [s-streams-group] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=streams-group-5c15c2c1-798f-4b4d-91a8-24bd9a093fe6-StreamThread-18-consumer, groupId=streams-group] Discovered group coordinator kafka.broker:9092 (id: 2147483644 rack: null)
[8/6/20, 22:23:22:831 GST] 00000d95 SystemOut O 2020-08-06 21:27:59.979 ERROR 83225 --- [| producer-3356] o.apache.kafka.common.utils.KafkaThread : Uncaught exception in thread 'kafka-producer-network-thread | producer-3356':
java.lang.OutOfMemoryError: Java heap space
I have checked session.timeout.ms, heartbeat.timeout.ms, max.poll.interval.ms, max.poll.records but Im not sure what values to set for them.
Please help me solve the issue.

Old Kafka Offset consuming by Spark Structured Streaming after clearing Checkpointing location

I have built an application using the Apache Kafka and Apache Spark Structured streaming. I am facing the below issue.
Scenario:
I set up a Spark structured stream with a source of Kafka topic and
sink as Kafka topic.
We run the stream and produce a number of messages on the Kafka
topic.
We stopped the stream and restart stream by clearing checkpointing
location of the stream. After running for 5 to 6 hour later stream is
consuming old Kafka messages randomly.
After clearing checkpointing location I was expecting only new messages on stream.
Spark version: 2.4.0,
Kafka-client version: 2.0.0,
Kafka version: 2.0.0,
Cluster Manager: Kubernetes.
I have tried this scenario by changing the checkpointing location but the issue still persists.
{
SparkConf sparkConf = new SparkConf().setAppName("SparkKafkaConsumer");
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
Dataset<Row> stream = spark
.readStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option(subscribeType, "REQUEST_TOPIC")
.option("failOnDataLoss",false)
.option("maxOffsetsPerTrigger","50")
.option("startingOffsets","latest")
.load()
.selectExpr(
"CAST(value AS STRING) as payload",
"CAST(key AS STRING)",
"CAST(topic AS STRING)",
"CAST(partition AS STRING)",
"CAST(offset AS STRING)",
"CAST(timestamp AS STRING)",
"CAST(timestampType AS STRING)");
DataStreamWriter<String> dataWriterStream = stream
.writeStream()
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("kafka.max.request.size", "35000000")
.option("kafka.retries", "5")
.option("kafka.batch.size", "35000000")
.option("kafka.receive.buffer.bytes", "200000000")
.option("kafka.acks","0")
.option("kafka.compression.type", "snappy")
.option("kafka.linger.ms", "0")
.option("kafka.buffer.memory", "50000000")
.option("topic", "RESPONSE_TOPIC")
.outputMode("append")
.option("checkpointLocation", checkPointDirectory);
spark.streams().awaitAnyTermination();
}
check below link,
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-checkpointing.html
You call SparkContext.setCheckpointDir(directory: String) to set the checkpoint directory - the directory where RDDs are checkpointed. The directory must be a HDFS path if running on a cluster. The reason is that the driver may attempt to reconstruct the checkpointed RDD from its own local file system, which is incorrect because the checkpoint files are actually on the executor machines

Cannot programmatically submit Spark application (with Cassandra connector) to cluster from remote client

I'm running a standalone Spark cluster on EC2, and I'm writing a application using Spark-Cassandra connector driver and try to submit job to Spark cluster programmatically.
The job itself is simple:
public static void main(String[] args) {
SparkConf conf;
JavaSparkContext sc;
conf = new SparkConf()
.set("spark.cassandra.connection.host", host);
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
sc = new JavaSparkContext("spark://[spark_master_host]","test",conf);
CassandraJavaRDD<CassandraRow> rdd = javaFunctions(sc).cassandraTable(
"keyspace", "table");
System.out.println(rdd.first().toString());
sc.stop();
}
Which runs fine when I run that in the Spark Master node of my EC2 cluster.
I'm trying to running this in a remote Windows client.
The problem was from these two lines:
conf.set("spark.driver.host", "[my_public_ip]");
conf.set("spark.driver.port", "15000");
First, if i comment out these 2 lines, application would not throw a exception, but the Executor is not running, with following log:
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/3 is now LOADING
14/12/06 22:40:03 INFO client.AppClient$ClientActor: Executor updated: app-20141207033931-0021/0 is now EXITED (Command exited with code 1)
14/12/06 22:40:03 INFO cluster.SparkDeploySchedulerBackend: Executor app-20141207033931-0021/0 removed: Command exited with code 1
Which never ends, when I check the worker node log, I found:
14/12/06 22:40:21 ERROR security.UserGroupInformation: PriviledgedActionException as:[username] cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
... 7 more
I've no idea what that's about, my guess is that probably worker node could not connect to driver, which probably initially set as:
14/12/06 22:39:30 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#[some_host_name]:52660]
14/12/06 22:39:30 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver#[some_host_name]:52660]
Obviously, no DNS is going to resolve my host name...
Since I can't set deploy mode to "client" or "cluster", if not via ./spark-submit script.(Which I think that's absurd...). I try to add a host resolution "XX.XXX.XXX.XX [host-name]" in /etc/hosts of all Spark Master Worker nodes.
No luck of course...
That leads me to the second, un-comment that two line;
Which gives me:
14/12/06 22:59:41 INFO Remoting: Starting remoting
14/12/06 22:59:41 ERROR Remoting: Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
at akka.remote.Remoting.start(Remoting.scala:194)
...
Cause:
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: /[my_public_ip]:15000
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:391)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:388)
I double checked my firewall setting and router setting, confirm that my firewall is diabled; and netstat -an to confirm port 15000 is not in use (in fact I tried to change to several available port, no luck); and I ping my public ip from both other machine and machine from my cluster, no problem.
Now I'm utterly screw up, I just run out of idea try to fix this. Any suggestions? Any help is appreciated!
Please check if 15000 is in your security group.

Categories

Resources