How to manage RecordTooLargeException avoiding Flink job restarting - java

Is there any way to ignore oversized messages without Flink job restarting?
If I try to produce (using KafkaSink ) a message which is too large (greater than max.message.bytes) then the RecordTooLargeException occurs and the Flink job restarts, then this "exception&restart" cycle is repeating endlessly!
I don't need to increase messages size limits such as max.message.bytes (Kafka Topic Config) and max.request.size (Flink Producer Config), they are good, they are already big. I just want to handle the situation when an unrealistically large message is trying to be produced. In this case, this big message should be ignored, and an error should be logged, and any Runtime Exception should NOT occur, and the endless restarting loop should NOT start.
I tried to use ProducerInterceptor -> it cannot intercept/reject a message, it can just modify it.
I tried to ignore oversized messages in SerializationSchema (implemented a custom wrapper of SerializationSchema) -> it cannot discard message producing too.
I am trying to overwrite KafkaWriter and KafkaSink classes, but it seems to be challenging.
I will be grateful for any advice!
A few quick environment details:
Kafka version is 2.8.1
Flink code is Java code based on the newer KafkaSource/KafkaSink API, not the
older KafkaConsumer/KafkaProduer API.
The flink-clients and flink-connector-kafka version is 1.15.0
Code sample which throws the RecordTooLargeException:
int numberOfRows = 1;
int rowsPerSecond = 1;
DataStream<String> stream = environment.addSource(
new DataGeneratorSource<>(
RandomGenerator.stringGenerator(1050000), // max.message.bytes=1048588
rowsPerSecond,
(long) numberOfRows),
TypeInformation.of(String.class))
.setParallelism(1)
.name("string-generator");
KafkaSinkBuilder<String> builder = KafkaSink.<String>builder()
.setBootstrapServers("localhost:9092")
.setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
.setRecordSerializer(
KafkaRecordSerializationSchema.builder().setTopic("test.output")
.setValueSerializationSchema(new SimpleStringSchema())
.build());
KafkaSink<String> sink = builder.build();
stream.sinkTo(sink).setParallelism(1).name("output-producer");
Exception Stack Trace:
2022-06-02/14:01:45.066/PDT [flink-akka.actor.default-dispatcher-4] INFO output-producer: Writer -> output-producer: Committer (1/1) (a66beca5a05c1c27691f7b94ca6ac025) switched from RUNNING to FAILED on 271b1b90-7d6b-4a34-8116-3de6faa8a9bf # 127.0.0.1 (dataPort=-1). org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka null with FlinkKafkaInternalProducer{transactionalId='null', inTransaction=false, closed=false} at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.throwException(KafkaWriter.java:440) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.lambda$onCompletion$0(KafkaWriter.java:421) ~[flink-connector-kafka-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) ~[flink-streaming-java-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) ~[flink-runtime-1.15.0.jar:1.15.0] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) ~[flink-runtime-1.15.0.jar:1.15.0] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 1050088 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.

Related

How to avoid "failed to send operations" errors after switching to the "exactly-once" delivery strategy?

I recently tried to switch my subscriptions in GCP Pub/Sub to the "exactly-once" delivery strategy. However, I started seeing the following warnings ~10 times every 30 minutes in my application logs:
com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:92)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:98)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:67)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1041)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1215)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:574)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:544)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:535)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:563)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:744)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.
at io.grpc.Status.asRuntimeException(Status.java:535)
... 17 more
They're immediately followed by the following INFO log messages in the same thread:
Permanent error invalid ack id message, will not resend
I didn't see any problems caused by these warnings, but it's a bit hard to tell because my application is processing a decent number of messages (~1000/hour). I initially thought that these warnings were just short-term "aftershocks" from switching to the "exactly-once" strategy. However, I waited for about 2 hours and they kept occurring with the same frequency, showing no sign of disappearing. I then disabled the "exactly-once" strategy and they went away immediately after. Can anyone tell me whether these warnings are dangerous, what side effects I can expect, and most importantly how I can get rid of them?
I'm using version 3.4.0 of spring-cloud-gcp-dependencies and spring-cloud-gcp-starter-pubsub. I'm also using Spring Cloud Stream to process the incoming messages and I rely on it to automatically acknowledge the messages.
I have the following configuration set in my application.yaml file:
spring:
cloud:
gcp:
pubsub:
subscriber:
executor-threads: 15
max-ack-extension-period: 23400 # 6 hours and 30 minutes
acknowledgement-deadline: 600 # Maximum value
For context: The messages in my application represent jobs for execution, and they could take quite a while to finish - hence the 6h30m maximum acknowledgement extension period.
I also saw the following StackOverflow question:
How to handle errors during message acknowledgement using google pubsub java library?
From what I understand, the consequence of these warnings is that the messages will be redelivered to my application, but this is exactly what I want to avoid.
Thanks for the question, Alexander.
The errors you are seeing happen when the modifyAckDeadline or Acknowledgement request to the service fail because the acknowledgement id is already expired. In such cases, the service considers the expired acknowledgement id as invalid, since a newer delivery might already be in-flight. This is as-per the guarantees for exactly once delivery. There could be a few reasons for it:
The request was delayed due to network delays and by the time it arrived at the server, the acknowledgement id lease is already expired.
The tasks issuing the modifyAckDeadline or Acknowledgement request is overwhelmed (high CPU/memory/network usage), leading to delay in issuing those requests.
I suggest setting min-duration-per-ack-extension to a high number to reduce issues mentioned above. This will help mitigate the cases where the acknowledgement id lease expired. The highest value you can set for this field is 600 seconds.
Additionally, as mentioned in the other stack overflow question, you should consider checking the response of your acknowledgement operations. This can be used to guide your application if it can expect a redelivery. Sample.

Apache Flink got exception: java.lang.IllegalStateException: Trying to work with offloaded serialized shuffle descriptors

I am using Flink on the cluster. As I submitted the task, I got the following exception:
Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException: Trying to work with offloaded serialized shuffle descriptors.
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:234)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1079)
at akka.dispatch.OnComplete.internal(Future.scala:263)
at akka.dispatch.OnComplete.internal(Future.scala:261)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:101)
at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:999)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:458)
... 9 more
Caused by: java. lang.IllegalStateException: Trying to work with offloaded serialized shuffle descriptors.
at org.apache.flink.runtime.deployment.InputGateDeploymentDescriptor.getShuffleDescriptors(InputGateDeploymentDescriptor.java:150)
at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGateFactory.create(SingleInputGateFactory.java:125)
at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.createInputGates(NettyShuffleEnvironment.java:261)
at org.apache.flink.runtime.taskmanager.Task.<init>(Task.java:420)
at org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:737)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Flink version: 1.13.6;
Scala version: 2.11
Kafka version: 2.2.2
Part of my code:
object batchProcess {
def main(args:Array[String]): Unit = {
val host = "localhost"
val port = 6379
val env = StreamExecutionEnvironment.getExecutionEnvironment
// read from kafka
val source = KafkaSource.builder[String].setBootstrapServers("localhost:9092")
.setTopics("movie_rating_records").setGroupId("my-group").setStartingOffsets(OffsetsInitializer.earliest)
.setValueOnlyDeserializer(new SimpleStringSchema())
.setBounded(OffsetsInitializer.latest).build()
// val inputDataStream = env.readTextFile("a.txt")
val inputDataStream = env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
val dataStream = inputDataStream
.map( data =>{
val arr = data.split(",")
( arr(0),arr(1).toInt,arr(2).toInt,arr(3).toFloat,arr(4).toLong)
})
val (counterUserIdPos,counterUserIdNeg,counterMovieIdPos,counterMovieIdNeg,counterUserId2MovieId) = commonProcess(dataStream)
counterUserIdPos.map(x =>{
val jedisIns = new Jedis(host,port,100000)
jedisIns.set("batch2feature_userId_rating1_"+x._1.toString, x._2.toString)
jedisIns.close()
})
env.execute("test")
}
}
The input stream from Kafka is a string split by a comma, for example: 1542295208rating,556,112852,1.0,1542295208. The above code process the string and puts them into another datastream process function. And finally, it writes the result into Redis.
Any help or hints on resolving the issue would be greatly appreciated!
Here aer a few pointers I can think of
Netty is the internal serialization mechanism of Flink => from the stack trace we know the error is likely occurring in one of the .map or so, not when interacting with Kafka nor Redis.
Serialization issues are sometimes happening in Flink when using Scala. Maybe the second .map is somehow causing connection pools or some other context instance to be serialized into the lambda, so replacing it with a Flink SinkFunction might help (in addition to improving performance since you'd only create one Jedis instance per partition).
Investigate also what serialization is going on in the commonProcess.
Essentially, you should be hunting for a place where the code somehow needs to serialize some instance whose type would confuse the Flink serialization mechanism.

Kafka streams throwing OutOfMemory exceptions continuously and stops working

I have set up a streams application which consumes messages from one topic, transforms them and put it to the other topic, if any error happens in serialization it puts the records to the error topic.
The load of messages is huge (in millions). The stream app was working perfectly fine until a few days ao, we loaded around 70M data and it was still doing good, then day before yesterday we added another stream to the same application and started streaming the data, now the application crashes with OOM exceptions. Each of the streams have different topics and consumer groups assigned.
The applications runs for an hour or so and then crashes with "java.lang.OutOfMemoryError: Java heap space" errors.
This application is behaving very strangely, we increased the heap size(Xmx) to 2G on each node, our topology is 2 nodes are running the application which are connected to Kafka broker which is running 3 nodes.
There were no network issues but I frequently see "Attempt to heartbeat failed since group is rebalancing" and consumer rebalancing happening in the logs only for the newly added stream.
kafka clients version - 2.3.1
kafka broker - 2.11
Kafka streams configuration:
```props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, SendAndContinueExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, CustomProductionExceptionHandler.class);
props.put(ProducerConfig.RETRIES_CONFIG, "1");
props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CustomPartitioner.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
Streams creation code:
```#Bean
public Set<KafkaStreams> kStreamJson(StreamsBuilder builder) {
Serde<JsonNode> jsonSerde = Serdes.serdeFrom(jsonSerializer, jsonDeserializer);
final KStream<String, JsonNode> infoStream = builder.stream(inputTopic, Consumed.with(Serdes.String(), jsonSerde));
Properties infoProps = kStreamsConfigs().asProperties();
infoProps.put(StreamsConfig.APPLICATION_ID_CONFIG, migrationMOIProfilesGroupId);
infoStream
.map(IProcessX::process)
.through(
outputTopic,
Produced.with(Serdes.String(), new JsonPOJOSerde<>(Message.class)));
return Sets.newHashSet(
new KafkaStreams(builder.build(), infoProps)
);
}
Errors received:
[8/6/20, 22:22:54:070 GST] 00000076 SystemOut O 2020-08-06 22:22:54.070 INFO 83225 --- [s-streams-group] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=streams-group-5c15c2c1-798f-4b4d-91a8-24bd9a093fe6-StreamThread-18-consumer, groupId=streams-group] Discovered group coordinator kafka.broker:9092 (id: 2147483644 rack: null)
[8/6/20, 22:23:22:831 GST] 00000d95 SystemOut O 2020-08-06 21:27:59.979 ERROR 83225 --- [| producer-3356] o.apache.kafka.common.utils.KafkaThread : Uncaught exception in thread 'kafka-producer-network-thread | producer-3356':
java.lang.OutOfMemoryError: Java heap space
I have checked session.timeout.ms, heartbeat.timeout.ms, max.poll.interval.ms, max.poll.records but Im not sure what values to set for them.
Please help me solve the issue.

How to solve network and memory issues in Kafka brokers?

When using kafka, I got intermittent two network related errors.
1. Error in fetch kafka.server.replicafetcherthread$fetchrequest connection to broker was disconnected before the reponse was read
2. Error in fetch kafka.server.replicafetcherthread$fetchrequest Connection to broker1 (id: 1 rack: null) failed
[configuration environment]
Brokers: 5 / server.properties: "kafka_manager_heap_s=1g", "kafka_manager_heap_x=1g", "offsets.commit.required.acks=1","offsets.commit.timeout.ms=5000", Most settings are the default.
Zookeepers: 3
Servers: 5
Kafka:0.10.1.2
Zookeeper: 3.4.6
Both of these errors are caused by loss of network communication.
If these errors occur, Kafka will work to expand or shrink the ISR partition several times.
expanding-ex) INFO Partition [my-topic,7] on broker 1: Expanding ISR for partition [my-topic,7] from 1,2 to 1,2,3
shrinking-ex) INFO Partition [my-topic,7] on broker 1: Shrinking ISR for partition [my-topic,7] from 1,2,3 to 1,2
I understand that these errors are caused by network problems, but I'm not sure why the break in the network is occurring.
And if this network disconnection persists, I got the following additional error
: Error when handling request(topics=null} java.lang.OutOfMemoryError: Java heap space
I wonder what causes these and how can I improve this?
The network error tells you that one of the brokers is not running, which means it cannot connect to it. As per experience the minimum heap size you can assign is 2Gb.

DSE: Unable to sstablellaoding data from 4.8.9 to 5.0.2

I have 5GB worth of data in DSE 4.8.9. I am trying to load the same data into DSE 5.0.2. The command I use is following:
root#dse:/mnt/cassandra/data$ sstableloader -d 10.0.2.91 /mnt/cassandra/data/my-keyspace/my-table-0b168ba1637111e6b40131c603254a9b/
This gives me following exception:
DEBUG 15:27:12,850 Using framed transport.
DEBUG 15:27:12,850 Opening framed transport to: 10.0.2.91:9160
DEBUG 15:27:12,850 Using thriftFramedTransportSize size of 16777216
DEBUG 15:27:12,851 Framed transport opened successfully to: 10.0.2.91:9160
Could not retrieve endpoint ranges:
InvalidRequestException(why:unconfigured table schema_columnfamilies)
java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:342)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:109)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50297)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50274)
at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:50189)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1734)
at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1719)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:321)
... 2 more
Thoughts?
For scenarios when you have few nodes and not a lot of data, you can follow these steps for a cluster migration (ensure the clusters are at most 1 major release apart)
1) create the schema in the new cluster
2) move both node's data to each new node (into the new cfid tables)
3) nodetool refresh to pick up the data
4) nodetool cleanup to clear out the extra data
5) If the old cluster was from a previous major version, run sstable upgrade on the new cluster.

Categories

Resources