I am using NiFi to create a data flow pipeline in which I use Infinispan as a cache server. But when I use ExecuteScript with a Groovy script, it goes into an infinite loop and opens many socket connections. I have tried to close them, but it still opens many connections and then throws
java.net.SocketException: No buffer space available (maximum connections reached?): connect
Following the link below, I changed the registry settings:
https://support.pitneybowes.com/VFP06_KnowledgeWithSidebarTroubleshoot?id=kA280000000PEE1CAO&popup=false&lang=en_US
Then I checked the open connections with netstat -n; because of the above setting, it goes up to 65534 open connections.
Below is the Groovy script that reads from the Infinispan cache:
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
import org.apache.commons.io.IOUtils;
import java.nio.charset.StandardCharsets;
def cacheName = "mycache"
def configuration = new ConfigurationBuilder()
.addServer().host("localhost").port(11322).build();
def cacheManager = new RemoteCacheManager(configuration)
RemoteCache cacheA = cacheManager.getCache(cacheName)
flowFile = session.get()
if(!flowFile) return
key = flowFile.getAttribute('key')
id = flowFile.getAttribute('id')
jsonFromCache = cacheA.get(key + "_" + id);
if(cacheA != null) {
cacheA.stop()
}
if(cacheManager != null) {
cacheManager.stop()
}
flowFile = session.write(flowFile, {outputStream ->
outputStream.write(jsonFromCache.getBytes(StandardCharsets.UTF_8))
} as OutputStreamCallback)
session.transfer(flowFile, REL_SUCCESS)
You are opening the connection to the cache before getting the flow file from the session. So you open the connection, and at the following line the script just exits without closing it:
if(!flowFile) return
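A minimal sketch of the safer ordering, written in plain Java against the Infinispan Hot Rod client (the helper class and method names are hypothetical): only open the connection once you know there is work to do, and close it in a finally block so an early return or an exception cannot leak sockets. Better still, create the RemoteCacheManager once and reuse it across flow files instead of opening it on every invocation.
import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

public class CacheLookup {
    // Hypothetical helper: the connection is opened only when there is work to do
    // and is guaranteed to be closed, even on an exception or early return.
    public static String lookup(String key, String id) {
        RemoteCacheManager cacheManager = null;
        try {
            cacheManager = new RemoteCacheManager(
                    new ConfigurationBuilder().addServer().host("localhost").port(11322).build());
            RemoteCache<String, String> cache = cacheManager.getCache("mycache");
            return cache.get(key + "_" + id);
        } finally {
            if (cacheManager != null) {
                cacheManager.stop(); // always runs, so sockets are not leaked
            }
        }
    }
}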
Another point: you can use the ExecuteGroovyScript processor instead. It lets you manage processor start and stop, so a connection can be created once and closed when the processor stops. You can find an example here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-groovyx-nar/1.9.2/org.apache.nifi.processors.groovyx.ExecuteGroovyScript/additionalDetails.html
import org.apache.nifi.processor.ProcessContext
import java.util.concurrent.atomic.AtomicLong
class Const{
static Date startTime = null;
static AtomicLong triggerCount = null;
}
static onStart(ProcessContext context){
Const.startTime = new Date()
Const.triggerCount = new AtomicLong(0)
println "onStart $context ${Const.startTime}"
}
static onStop(ProcessContext context){
def alive = (System.currentTimeMillis() - Const.startTime.getTime()) / 1000
println "onStop $context executed ${ Const.triggerCount } times during ${ alive } seconds"
}
def flowFile = session.get()
if(!flowFile)return
flowFile.'trigger.count' = Const.triggerCount.incrementAndGet()
REL_SUCCESS << flowFile
I am trying to read my application.conf, which is stored in my S3 bucket. I used a BufferedSource to read from S3, but when I try to use ConfigFactory.parseString(source.mkString).getConfig("conf") it does not find the 'conf' key, which is there. Below is my source code:
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.s3.model.S3Object
import com.amazonaws.services.s3.{AmazonS3Client, AmazonS3ClientBuilder, AmazonS3URI}
import scala.collection.JavaConversions._
import scala.io.{BufferedSource, Source}
object Test {
def main(args: Array[String]): Unit = {
import com.amazonaws.auth.BasicAWSCredentials
val credentials = new BasicAWSCredentials("key", "secertkey")
// val credentialsProvider = new DefaultAWSCredentialsProviderChain()
val s3Client = new AmazonS3Client(credentials)
val uri: AmazonS3URI = new AmazonS3URI("s3://test-buck/conf/application.conf")
val s3Object: S3Object = s3Client.getObject(uri.getBucket, uri.getKey)
val source: BufferedSource = Source.fromInputStream(s3Object.getObjectContent)
try {
println(source.mkString)
import com.typesafe.config.{Config, ConfigFactory}
val rawConfig: Config = ConfigFactory.parseString(source.mkString)
val rootConfig = rawConfig.getConfig("conf")
println(rootConfig)
// println(rotConfig)
} finally {
source.close()
}
}
}
My application config looks like below
conf {
source_data_list = ["OL", "SB","1CP"]
//some other value
OL {
filename = "receipts_delta_GBR_14_10_2017.csv"
sftp_conf {
hostname = "endpoint"
port = "22"
username = "ubuntu"
pem = "pemfile"
filetype = "csv"
delimiter = "|"
directory = "/home/ubuntu/data"
}
}
}
Not sure what I am doing wrong here. The same application config, if I put it on the resources path and load it with ConfigFactory.load("application.conf").getConfig("conf"), works as expected.
Any clue on this would help.
The exception I got:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'conf'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:218)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:224)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:33)
at com.dsm.utils.Test$.main(Test.scala:26)
at com.dsm.utils.Test.main(Test.scala)
Actually, you did succeed in reading the configuration.
The issue you're having is because of BufferedSource. A buffered source can only be read once: you read it (in order to debug, I guess) and the source reaches its end, so the second time you read it, in order to populate rawConfig, you get an empty string. I solved it by extracting the configuration string into a variable and then using it:
val config = source.mkString
println(s"config is: $config")
val rawConfig: Config = ConfigFactory.parseString(config)
val rootConfig = rawConfig.getConfig("conf")
println(s"rootConfig is: $rootConfig")
The output is:
rootConfig is: Config(SimpleConfigObject({"OL":{"filename":"receipts_delta_GBR_14_10_2017.csv","sftp_conf":{"delimiter":"|","directory":"/home/ubuntu/data","filetype":"csv","hostname":"endpoint","pem":"pemfile","port":"22","username":"ubuntu"}},"source_data_list":["OL","SB","1CP"]}))
I'm an Akka beginner. (I am using Java)
I'm making a file transfer system using Akka.
Currently, I have completed sending a file from Actor1 (local) to Actor2 (remote).
Now I am thinking about how to handle problems during the transfer, and I have a question:
Suppose I lose my network connection while transferring a file and the transfer fails at 90 percent complete, then the connection recovers a few minutes later.
Is it possible to transfer just the rest of the file data (the remaining 10%)?
If that's possible, please give me some advice.
Here is my simple code.
Thanks :)
Actor1 (Local)
private Behavior<Event> onTick() {
....
String fileName = "test.zip";
Source<ByteString, CompletionStage<IOResult>> logs = FileIO.fromPath(Paths.get(fileName));
logs.runForeach(f -> originalSize += f.size(), mat).thenRun(() -> System.out.println("originalSize : " + originalSize));
SourceRef<ByteString> logsRef = logs.runWith(StreamRefs.sourceRef(), mat);
getContext().ask(
Receiver.FileTransfered.class,
selectedReceiver,
timeout,
responseRef -> new Receiver.TransferFile(logsRef, responseRef, fileName),
(response, failure) -> {
if (response != null) {
return new TransferCompleted(fileName, response.transferedSize);
} else {
return new JobFailed("Processing timed out", fileName);
}
}
);
}
Actor2 (Remote)
public static Behavior<Command> create() {
return Behaviors.setup(context -> {
...
Materializer mat = Materializer.createMaterializer(context);
return Behaviors.receive(Command.class)
.onMessage(TransferFile.class, command -> {
command.sourceRef.getSource().runWith(FileIO.toPath(Paths.get("test.zip")), mat);
command.replyTo.tell(new FileTransfered("filename", 1024));
return Behaviors.same();
}).build();
});
}
You need to think about the following for a proper implementation of file transfer with fault tolerance:
1. How to identify that a transfer has to be resumed for a given file.
2. How to find the point from which to resume the transfer.
The following implementation makes very simple assumptions about 1 and 2:
The file name is unique and thus can be used for identification. Strictly speaking, this is not true; for example, you could transfer files with the same name from different folders, or from different nodes, etc. You will have to adjust this based on your use case.
It is assumed that the last/all writes on the receiver side wrote all bytes correctly, and that the total number of written bytes indicates the point from which to resume the transfer. If this cannot be guaranteed, you need to logically split the original file into chunks and transfer the hash of each chunk, together with its size and position, to the receiver, which has to validate the chunks on its side and find the correct pointer for resuming the transfer.
(That's a bit more than 2 :) ) This implementation ignores detection of transfer problems and focuses on 1 and 2 instead.
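As a rough illustration of the chunk-hashing idea from the previous paragraph (deliberately not part of the implementation below, and written in plain Java for brevity; the chunk size and hash algorithm are arbitrary choices), both sides could hash the file in fixed-size chunks and compare the lists to find the first mismatching chunk, i.e. the point to resume from:
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class ChunkHashes {
    // Hash a file in fixed-size chunks. Comparing the sender's and the receiver's lists
    // gives the index of the first chunk that differs, which is the safe resume point.
    public static List<String> hashChunks(Path file, int chunkSize) throws Exception {
        List<String> hashes = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                MessageDigest digest = MessageDigest.getInstance("SHA-256");
                digest.update(buffer, 0, read);
                hashes.add(Base64.getEncoder().encodeToString(digest.digest()));
            }
        }
        return hashes;
    }
}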
The code:
object Sender {
sealed trait Command
case class Upload(file: String) extends Command
case class StartWithIndex(file: String, index: Long) extends Sender.Command
def behavior(receiver: ActorRef[Receiver.Command]): Behavior[Sender.Command] = Behaviors.setup[Sender.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case Upload(file) =>
receiver.tell(Receiver.InitUpload(file, ctx.self.narrow[StartWithIndex]))
ctx.log.info(s"Initiating upload of $file")
Behaviors.same
case StartWithIndex(file, starWith) =>
val source = FileIO.fromPath(Paths.get(file), chunkSize = 8192, starWith)
val ref = source.runWith(StreamRefs.sourceRef())
ctx.log.info(s"Starting upload of $file")
receiver.tell(Receiver.Upload(file, ref))
Behaviors.same
}
}
}
object Receiver {
sealed trait Command
case class InitUpload(file: String, replyTo: ActorRef[Sender.StartWithIndex]) extends Command
case class Upload(file: String, fileSource: SourceRef[ByteString]) extends Command
val behavior: Behavior[Receiver.Command] = Behaviors.setup[Receiver.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case InitUpload(path, replyTo) =>
val file = fileAtDestination(path)
val index = if (file.exists()) file.length else 0
ctx.log.info(s"Got init command for $file at pointer $index")
replyTo.tell(Sender.StartWithIndex(path, index.toLong))
Behaviors.same
case Upload(path, fileSource) =>
val file = fileAtDestination(path)
val sink = if (file.exists()) {
FileIO.toPath(file.toPath, Set(StandardOpenOption.APPEND, StandardOpenOption.WRITE))
} else {
FileIO.toPath(file.toPath, Set(StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))
}
ctx.log.info(s"Saving file into ${file.toPath}")
fileSource.runWith(sink)
Behaviors.same
}
}
}
Some auxiliary methods
val destination: File = Files.createTempDirectory("destination").toFile
def fileAtDestination(file: String) = {
val name = new File(file).getName
new File(destination, name)
}
def writeRandomToFile(file: File, size: Int): Unit = {
val out = new FileOutputStream(file, true)
(0 until size).foreach { _ =>
out.write(Random.nextPrintableChar())
}
out.close()
}
And finally some test code
// sender and receiver bootstrapping is omitted
//Create some dummy file to upload
val file: Path = Files.createTempFile("test", "test")
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
// Sleep to allow file upload to finish
Thread.sleep(1000)
//Write more data to the file to emulate a failure
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload that will "recover" from the previous upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
Finally, the whole process works as follows: the sender asks the receiver to initialize the upload, the receiver replies with the number of bytes it already has at the destination, and the sender then streams the file from that offset, which the receiver appends to the existing file.
I'm trying to get a simple proof-of-concept multipart upload working in Kotlin using the Amazon S3 client, based on the documentation. The first part uploads successfully and I get a response with an ETag. The second part doesn't upload a single thing and times out. It always fails after the first part. Is there some connection cleanup that I need to do manually somehow?
Credentials and rights are all fine. The magic numbers below are just to get to the minimum part size of 5MB.
What am I doing wrong here?
fun main() {
val amazonS3 =
AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
.build()
val bucket = "io.inbot.sandbox"
val key = "test.txt"
val multipartUpload =
amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
var pn=1
var off=0L
val etags = mutableListOf<PartETag>()
for( i in 0.rangeTo(5)) {
val buf = ByteArrayOutputStream()
val writer = buf.writer().buffered()
for(l in 0.rangeTo(100000)) {
writer.write("part $i - Hello world for the $l'th time this part.\n")
}
writer.flush()
writer.close()
val bytes = buf.toByteArray()
val md = MessageDigest.getInstance("MD5")
md.update(bytes)
val md5 = Base64.encodeBytes(md.digest())
println("going to write ${bytes.size}")
bytes.inputStream()
var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
.withUploadId(multipartUpload.uploadId)
.withFileOffset(off)
.withPartSize(bytes.size.toLong())
.withPartNumber(pn++)
.withMD5Digest(md5)
.withInputStream(bytes.inputStream())
.withGeneralProgressListener<UploadPartRequest> { it ->
println(it.bytesTransferred)
}
if(i == 5) {
partRequest = partRequest.withLastPart(true)
}
off+=bytes.size
val partResponse = amazonS3.uploadPart(partRequest)
etags.add(partResponse.partETag)
println("part ${partResponse.partNumber} ${partResponse.eTag} ${bytes.size}")
}
val completeMultipartUpload =
amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
}
This always fails on the second part with
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: F419872A24BB5526; S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=), S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1630)
Just to preempt some of the answers I'm not looking for, my intention with this is NOT to upload files but to eventually be able to stream arbitrary length streams to s3 by simply uploading parts until done and then combining them. So, I can't really use the TransferManager because that requires me to know the size in advance, which I won't. Also, buffering this as a file is not something I want to do since this will run in a dockerized server application. So I really want to upload an arbitrary number of parts. I'm happy to do it sequentially; though I wouldn't mind parallelism.
I've also used "com.github.alexmojaki:s3-stream-upload:1.0.1", but that seems to keep a lot of state in memory (I've run out of memory a couple of times), so I'd like to replace it with something simpler.
Update: Thanks Ilya in the comments below. Removing the withFileOffset fixes things.
Removing withFileOffset fixes things. Thanks @Ilya for pointing this out.
Here's a simple OutputStream implementation that actually works.
package io.inbot.aws
import com.amazonaws.auth.profile.ProfileCredentialsProvider
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult
import com.amazonaws.services.s3.model.PartETag
import com.amazonaws.services.s3.model.UploadPartRequest
import mu.KotlinLogging
import java.io.ByteArrayOutputStream
import java.io.OutputStream
import java.security.MessageDigest
import java.util.Base64
private val logger = KotlinLogging.logger { }
class S3Writer(
private val amazonS3: AmazonS3,
private val bucket: String,
private val key: String,
private val threshold: Int = 5*1024*1024
) : OutputStream(), AutoCloseable {
private val etags: MutableList<PartETag> = mutableListOf()
private val multipartUpload: InitiateMultipartUploadResult = this.amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
private val currentPart = ByteArrayOutputStream(threshold)
private var partNumber = 1
override fun write(b: Int) {
currentPart.write(b)
if(currentPart.size() > threshold) {
sendPart()
}
}
private fun sendPart(last: Boolean = false) {
logger.info { "sending part $partNumber" }
currentPart.flush()
val bytes = currentPart.toByteArray()
val md = MessageDigest.getInstance("MD5")
md.update(bytes)
val md5 = Base64.getEncoder().encodeToString(md.digest())
var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
.withUploadId(multipartUpload.uploadId)
.withPartSize(currentPart.size().toLong())
.withPartNumber(partNumber++)
.withMD5Digest(md5)
.withInputStream(bytes.inputStream())
if(last) {
logger.info { "final part" }
partRequest = partRequest.withLastPart(true)
}
val partResponse = amazonS3.uploadPart(partRequest)
etags.add(partResponse.partETag)
currentPart.reset()
}
override fun close() {
if(currentPart.size() > 0) {
sendPart(true)
}
logger.info { "completing" }
amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
}
}
fun main() {
val amazonS3 =
AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
.build()
val bucket = "io.inbot.sandbox"
val key = "test.txt"
try {
S3Writer(amazonS3, bucket, key).use {
val w = it.bufferedWriter()
for (i in 0.rangeTo(1000000)) {
w.write("Line $i: hello again ...\n")
}
w.flush() // flush the buffered writer so the tail of the data reaches the S3Writer before it is closed
}
} catch (e: Throwable) {
logger.error(e.message,e)
}
}
This is the basic word count topology I tried to run, but I am receiving the message 'INFO org.apache.storm.zookeeper.server.SessionTrackerImpl - SessionTrackerImpl exited loop!'. Can anyone help me with this?
When I removed cluster.shutdown(), tweets keep coming continuously until I press Ctrl+C, but the word count is still not shown.
import java.util.Arrays;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
public class TwitterHashtagStorm {
public static void main(String[] args) throws Exception {
String consumerKey = "************";
String consumerSecret = "***************";
String accessToken = "**********";
String accessTokenSecret = "***********";
String[] keyWords = {"apple"};
Config config = new Config();
config.setDebug(true);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("twitter-spout", new TwitterSampleSpout(consumerKey,
consumerSecret, accessToken, accessTokenSecret, keyWords));
builder.setBolt("twitter-hashtag-reader-bolt", new HashtagReaderBolt())
.shuffleGrouping("twitter-spout");
builder.setBolt("twitter-hashtag-counter-bolt",
new HashtagCounterBolt()).fieldsGrouping(
"twitter-hashtag-reader-bolt", new Fields("hashtag"));
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("TwitterHashtagStorm", config,
builder.createTopology());
Thread.sleep(10000);
cluster.shutdown();
}
}
10 seconds (10000 ms) is probably not enough time for the Twitter connection to establish and for tweets to come into your topology. You should set the sleep call to something longer (several minutes at a minimum).
As for showing the word count, does your HashtagCounterBolt print the counts to stdout? If so, the output may be lost in the log messages from Storm. Try setting config.setDebug(false) (to cut down the log messages and give you a chance of seeing the counts), or rewrite HashtagCounterBolt to emit the messages to another location (a message broker, a local socket receiver, etc.) separate from the console you are running Storm from.
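A minimal sketch of the question's main() with those two changes applied (the spout and bolt classes are the ones from the question; the class name, placeholder credentials and the 10-minute sleep are arbitrary):
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class TwitterHashtagStormQuiet {
    public static void main(String[] args) throws Exception {
        Config config = new Config();
        config.setDebug(false); // cut Storm's per-tuple logging so the printed counts are visible

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("twitter-spout", new TwitterSampleSpout(
                "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret",
                new String[] { "apple" }));
        builder.setBolt("twitter-hashtag-reader-bolt", new HashtagReaderBolt())
                .shuffleGrouping("twitter-spout");
        builder.setBolt("twitter-hashtag-counter-bolt", new HashtagCounterBolt())
                .fieldsGrouping("twitter-hashtag-reader-bolt", new Fields("hashtag"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("TwitterHashtagStorm", config, builder.createTopology());

        Thread.sleep(10 * 60 * 1000L); // give the Twitter connection several minutes, not 10 seconds
        cluster.shutdown();
    }
}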
I am trying to write a Kafka consumer to get messages which are produced and posted to a topic in Java. My consumer is as follows.
consumer.java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;
public class KafkaConsumer extends Thread {
final static String clientId = "SimpleConsumerDemoClient";
final static String TOPIC = " AATest";
ConsumerConnector consumerConnector;
public static void main(String[] argv) throws UnsupportedEncodingException {
KafkaConsumer KafkaConsumer = new KafkaConsumer();
KafkaConsumer.start();
}
public KafkaConsumer(){
Properties properties = new Properties();
properties.put("zookeeper.connect","10.200.208.59:2181");
properties.put("group.id","test-group");
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
}
@Override
public void run() {
Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
topicCountMap.put(TOPIC, new Integer(1));
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumerConnector.createMessageStreams(topicCountMap);
KafkaStream<byte[], byte[]> stream = consumerMap.get(TOPIC).get(0);
System.out.println(stream);
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while(it.hasNext())
System.out.println("from it");
System.out.println(new String(it.next().message()));
}
private static void printMessages(ByteBufferMessageSet messageSet) throws UnsupportedEncodingException {
for(MessageAndOffset messageAndOffset: messageSet) {
ByteBuffer payload = messageAndOffset.message().payload();
byte[] bytes = new byte[payload.limit()];
payload.get(bytes);
System.out.println(new String(bytes, "UTF-8"));
}
}
}
When I run the above code I get nothing in the console, whereas the Java producer program behind the scenes is posting data continuously under the 'AATest' topic. Also, in the ZooKeeper console I get the following lines when I try running the above consumer:
[2015-04-30 15:57:31,284] INFO Accepted socket connection from /10.200.208.59:51780 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2015-04-30 15:57:31,284] INFO Client attempting to establish new session at /10.200.208.59:51780 (org.apache.zookeeper.server.ZooKeeperServer)
[2015-04-30 15:57:31,315] INFO Established session 0x14d09cebce30007 with negotiated timeout 6000 for client /10.200.208.59:51780 (org.apache.zookeeper.server.ZooKeeperServer)
Also when I run a separate console-consumer pointing to the AATest topic, I am getting all the data produced by the producer to that topic.
Both the consumer and the broker are on the same machine, whereas the producer is on a different machine. This actually resembles this question, but going through it didn't help me. Please help me.
Different answer, but in my case it happened to be the initial offset (auto.offset.reset) for the consumer. Setting auto.offset.reset=earliest fixed the problem in my scenario, because I was publishing the event first and then starting the consumer.
By default, the consumer only consumes events published after it started, because auto.offset.reset=latest by default.
eg. consumer.properties
bootstrap.servers=localhost:9092
enable.auto.commit=true
auto.commit.interval.ms=1000
session.timeout.ms=30000
auto.offset.reset=earliest
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
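For a consumer written in Java like the one in the question, the same setting can be applied programmatically with the newer consumer API. A minimal sketch (the broker address and the single poll are assumptions for the demo; AATest is the topic from the question):
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EarliestConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "test-group");
        props.put("auto.offset.reset", "earliest");       // read events published before this consumer started
        props.put("enable.auto.commit", "true");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("AATest"));
            // A real consumer would poll in a loop; a single poll is enough to demonstrate the setting.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}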
Test
class KafkaEventConsumerSpecs extends FunSuite {
case class TestEvent(eventOffset: Long, hashValue: Long, created: Date, testField: String) extends BaseEvent
test("given an event in the event-store, consumes an event") {
EmbeddedKafka.start()
//PRODUCE
val event = TestEvent(0l, 0l, new Date(), "data")
val config = new Properties() {
{
load(this.getClass.getResourceAsStream("/producer.properties"))
}
}
val producer = new KafkaProducer[String, String](config)
val persistedEvent = producer.send(new ProducerRecord(event.getClass.getSimpleName, event.toString))
assert(persistedEvent.get().offset() == 0)
assert(persistedEvent.get().checksum() != 0)
//CONSUME
val consumerConfig = new Properties() {
{
load(this.getClass.getResourceAsStream("/consumer.properties"))
put("group.id", "consumers_testEventsGroup")
put("client.id", "testEventConsumer")
}
}
assert(consumerConfig.getProperty("group.id") == "consumers_testEventsGroup")
val kafkaConsumer = new KafkaConsumer[String, String](consumerConfig)
assert(kafkaConsumer.listTopics().asScala.map(_._1).toList == List("TestEvent"))
kafkaConsumer.subscribe(Collections.singletonList("TestEvent"))
val events = kafkaConsumer.poll(1000)
assert(events.count() == 1)
EmbeddedKafka.stop()
}
}
But if the consumer is started first and the event is published afterwards, the consumer should be able to consume the event without auto.offset.reset needing to be set to earliest.
Reference for Kafka 0.10:
https://kafka.apache.org/documentation/#consumerconfigs
In our case, we solved our problem with the following steps:
The first thing we found is that there is a config called 'retries' for KafkaProducer, and its default value means 'no retry'. Also, the send method of KafkaProducer is asynchronous, and we were not calling the get method on the result of send. Without retries, there is no guarantee that produced messages are delivered to the corresponding broker. So you have to increase retries a bit, or use the idempotent or transactional mode of KafkaProducer.
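A minimal sketch of a producer configured along those lines (plain Java; the broker address and topic name are assumptions): retries is raised from the default, and get() is called on the result of send so delivery failures surface as exceptions instead of being silently dropped.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ReliableProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("acks", "all");                         // wait for all in-sync replicas to acknowledge
        props.put("retries", 5);                          // older clients default to no retry
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata meta = producer
                    .send(new ProducerRecord<>("some-topic", "key", "value")) // assumed topic name
                    .get(); // block until the broker acknowledges (or an exception is thrown)
            System.out.println("Written to partition " + meta.partition() + " at offset " + meta.offset());
        }
    }
}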
The second point is about the Kafka and ZooKeeper versions. We chose Kafka 1.0.0 with ZooKeeper 3.4.4. Kafka 1.0.0 in particular had an issue with connectivity to ZooKeeper: if Kafka loses its connection to ZooKeeper with an unexpected exception, it loses the leadership of the partitions that have not yet synced. There is a bug ticket about this issue:
https://issues.apache.org/jira/browse/KAFKA-2729
After we found logs on the Kafka broker indicating the same issue as in the ticket above, we upgraded our Kafka broker version to 1.1.0.
It is also important to note that a small number of partitions (like 100 or less) increases the throughput of the producer, so if there are not enough consumers, the available consumers get stuck and messages arrive with delays (we measured delays of approximately 10-15 minutes). So you need to balance and configure the partition count and the thread counts of your application according to your available resources.
There might also be a case where Kafka takes a long time to rebalance consumer groups when a new consumer is added to the same group id.
Check the Kafka logs to see whether the group is rebalanced after starting your consumer.