I am building an Apache Kafka producer whose output is consumed by a Flink Kafka consumer. I need to generate 1 to 10 million messages per second, but right now I am getting a very small number of records per second (up to 2,000 per second per partition). I have a cluster of 3 brokers with 30 GB of memory each, and the topic has 10 partitions. Any recommendations?
Here is my producer code:
public class TempDataGenerator implements Runnable {
private String topic = "try";
private String bootStrap_Servers = "kafka-node-01:9092,kafka-node-02:9092,kafka-node-03:9092";
public static void main(String[] args) {
ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
executor.execute(new TempDataGenerator());
}
public TempDataGenerator() {
}
private Producer<String, String> createProducer() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
bootStrap_Servers);
props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG,"0");
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG,"5000000000");
props.put(ProducerConfig.BATCH_SIZE_CONFIG,"100000");
return new KafkaProducer<>(props);
}
public void run() {
final Producer<String, String> producer = createProducer();
Socket soc = null;
try {
boolean active = true;
int generatedCount = 0,tempUserID=1;//the minimum tuple that any thread can generate
while (active) {
generatedCount = 0;
/**
* generate per second
*/
for (long stop = Instant.now().getMillis()+1000; stop > Instant.now().getMillis(); ) { //generate tps
String msg = "{ID:" + generatedCount + ", msg: "+Instant.now().getMillis()+"}";
final ProducerRecord<String, String> record = new ProducerRecord<>(topic, null, msg);
RecordMetadata metadata = producer.send(record).get();
producer.flush();
generatedCount++;
}
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
}
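For comparison, the usual way to push producer throughput up is to stop calling get() and flush() on every record (each forces a synchronous round trip) and let the client batch asynchronously. A rough sketch reusing the topic, bootStrap_Servers, and imports from the code above; the acks, linger.ms, batch.size, and compression values are placeholders to tune, not settings measured on this cluster:
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrap_Servers);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "1");                  // or "0" if occasional loss is acceptable
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");             // let records accumulate into batches
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "131072");       // 128 KB batches (placeholder value)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");    // fewer bytes per request on the wire

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    for (long i = 0; i < 10_000_000L; i++) {
        String msg = "{ID:" + i + ", msg: " + System.currentTimeMillis() + "}";
        // asynchronous send: no .get(), no flush() inside the loop
        producer.send(new ProducerRecord<>(topic, msg), (metadata, exception) -> {
            if (exception != null) exception.printStackTrace();
        });
    }
    producer.flush(); // flush once at the end instead of per record
}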
I have a simple Kafka producer:
public class JavaKafkaProducerExample {
public static void main(String[] args) throws ExecutionException, InterruptedException {
String server = "localhost:9092";
String topicName = "test.topic";
final Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, server);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
try (final Producer<Long, String> producer = new KafkaProducer<>(props);) {
RecordMetadata recordMetadata = (RecordMetadata) producer.send(new ProducerRecord(topicName, "example message")).get(1000, TimeUnit.MILLISECONDS);
if (recordMetadata.hasOffset()) System.out.println("Message sent successfully");
} catch (Exception e) {
System.out.println(e);
}
}
}
Dependencies:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.1.0</version>
</dependency>
I expected that if Kafka is unavailable, send().get(timeout) would be interrupted by the timeout, but I only get the error java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms. after 60 seconds. Why doesn't get(timeout) work? How can I reduce the time to error? Is it possible to do this programmatically, or only by changing the producer parameters?
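One likely explanation: send() itself blocks while the producer fetches topic metadata, and the Future is only returned after that, so get(1000, TimeUnit.MILLISECONDS) never gets a chance to time out. That initial blocking is bounded by the producer's max.block.ms setting (60000 ms by default). A minimal sketch of lowering it, where 5000 ms is an arbitrary example value added to the same props object before creating the KafkaProducer:
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "5000"); // fail send() after ~5 s if metadata cannot be fetched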
I use an ArrayBlockingQueue to decouple Kafka consumers from sinks:
Multi-threaded consumption of Kafka, one Kafka consumer per thread;
Each Kafka consumer manages its offsets manually;
The Kafka consumer wraps the message content and a callback function containing the offset into a Record object and puts it onto the ArrayBlockingQueue;
The sink takes the Record from the ArrayBlockingQueue and processes it. Only after the sink successfully processes the Record does it invoke the Record's callback function (notifying the Kafka consumer to commitSync).
While running this, I hit an error that has troubled me for several days; I don't understand which part is wrong:
11:44:10.794 [pool-2-thread-1] ERROR com.alibaba.kafka.source.KafkaConsumerRunner - [pool-2-thread-1] ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:1824)
at org.apache.kafka.clients.consumer.KafkaConsumer.acquireAndEnsureOpen(KafkaConsumer.java:1808)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1255)
at com.alibaba.kafka.source.KafkaConsumerRunner$1.call(KafkaConsumerRunner.java:75)
at com.alibaba.kafka.source.KafkaConsumerRunner$1.call(KafkaConsumerRunner.java:71)
at com.alibaba.kafka.sink.Sink.run(Sink.java:25)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Source Code:
Queues.java
public class Queues {
public static volatile BlockingQueue[] queues;
/**
* Create Multiple Queues.
* @param count The number of queues created.
* @param capacity The capacity of each queue.
*/
public static void createQueues(final int count, final int capacity) {
Queues.queues = new BlockingQueue[count];
for (int i=0; i<count; ++i) {
Queues.queues[i] = new ArrayBlockingQueue(capacity, true);
}
}
}
Record
@Builder
@Getter
public class Record {
private final String value;
private final Callable<Boolean> ackCallback;
}
Sink.java
public class Sink implements Runnable {
private final int queueId;
public Sink(int queueId) {
this.queueId = queueId;
}
@Override
public void run() {
while (true) {
try {
Record record = (Record) Queues.queues[this.queueId].take();
// (1) Handler: Write to database
Thread.sleep(10);
// (2) ACK: notify kafka consumer to commit offset manually
record.getAckCallback().call();
} catch (Exception e) {
e.printStackTrace();
System.exit(1);
}
}
}
}
KafkaConsumerRunner
@Slf4j
public class KafkaConsumerRunner implements Runnable {
private final String topic;
private final KafkaConsumer<String, String> consumer;
public KafkaConsumerRunner(String topic, Properties properties) {
this.topic = topic;
this.consumer = new KafkaConsumer<>(properties);
}
@Override
public void run() {
// offsets to commit
Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();
// Subscribe topic
this.consumer.subscribe(Collections.singletonList(this.topic));
// Consume Kafka Message
while (true) {
try {
ConsumerRecords<String, String> consumerRecords = this.consumer.poll(10000L);
for (TopicPartition topicPartition : consumerRecords.partitions()) {
for (ConsumerRecord<String, String> consumerRecord : consumerRecords.records(topicPartition)) {
// (1) Restore [partition -> offset] Map
offsetsToCommit.put(topicPartition, new OffsetAndMetadata(consumerRecord.offset()));
// (2) Put into queue
int queueId = topicPartition.partition() % Queues.queues.length;
Queues.queues[queueId].put(Record.builder()
.value(consumerRecord.value())
.ackCallback(this.getAckCallback(offsetsToCommit))
.build());
}
}
} catch (ConcurrentModificationException | InterruptedException e) {
log.error("[{}] {}", Thread.currentThread().getName(), ExceptionUtils.getMessage(e), e);
System.exit(1);
}
}
}
private Callable<Boolean> getAckCallback(Map<TopicPartition, OffsetAndMetadata> offsets) {
return new AckCallback<Boolean>(this.consumer, new HashMap<>(offsets)) {
@Override
public Boolean call() throws Exception {
try {
this.getConsumer().commitSync(this.getOffsets());
return true;
} catch (Exception e) {
log.error(String.format("[%s] %s", Thread.currentThread().getName(), ExceptionUtils.getMessage(e)), e);
return false;
}
}
};
}
@Getter
@AllArgsConstructor
abstract class AckCallback<T> implements Callable<T> {
private final KafkaConsumer<String, String> consumer;
private final Map<TopicPartition, OffsetAndMetadata> offsets;
}
}
Application.java
public class Application {
private static final String TOPIC = "YEWEI_TOPIC";
private static final int QUEUE_COUNT = 1;
private static final int QUEUE_CAPACITY = 4;
private static void createQueues() {
Queues.createQueues(QUEUE_COUNT, QUEUE_CAPACITY);
}
private static void startupSource() {
if (null == System.getProperty("java.security.auth.login.config")) {
System.setProperty("java.security.auth.login.config", "jaas.conf");
}
Properties properties = new Properties();
properties.put(ConsumerConfig.GROUP_ID_CONFIG, "ConsumerGroup1");
properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "cdh1:9092,cdh2:9092,cdh3:9092");
properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class);
properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringDeserializer.class);
properties.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2);
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
properties.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
properties.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
ExecutorService executorService = Executors.newFixedThreadPool(QUEUE_COUNT);
for (int queueId = 0; queueId < QUEUE_COUNT; ++queueId) {
executorService.execute(new KafkaConsumerRunner(TOPIC, properties));
}
}
private static void startupSinks() {
ExecutorService executorService = Executors.newFixedThreadPool(QUEUE_COUNT);
for (int queueId = 0; queueId < QUEUE_COUNT; ++queueId) {
executorService.execute(new Sink(queueId));
}
}
public static void main(String[] args) {
Application.createQueues();
Application.startupSource();
Application.startupSinks();
}
}
I figured out this problem. The Kafka consumer runs in its own thread, but it was also being invoked from the Sink thread through the callback. KafkaConsumer's poll and commitSync methods may only be called from a single thread; see org.apache.kafka.clients.consumer.KafkaConsumer#acquireAndEnsureOpen.
The change: the Sink callback no longer uses the consumer object directly; instead it puts the ACK message onto a LinkedTransferQueue. KafkaConsumerRunner drains the LinkedTransferQueue on every loop iteration and commits the ACKs in batches:
@Slf4j
public class KafkaConsumerRunner implements Runnable {
private final String topic;
private final BlockingQueue ackQueue;
private final KafkaConsumer<String, String> consumer;
public KafkaConsumerRunner(String topic, Properties properties) {
this.topic = topic;
this.ackQueue = new LinkedTransferQueue<Map<TopicPartition, OffsetAndMetadata>>();
this.consumer = new KafkaConsumer<>(properties);
}
@Override
public void run() {
// Subscribe topic
this.consumer.subscribe(Collections.singletonList(this.topic));
// Consume Kafka Message
while (true) {
while (!this.ackQueue.isEmpty()) {
try {
Map<TopicPartition, OffsetAndMetadata> offsets = (Map<TopicPartition, OffsetAndMetadata>) this.ackQueue.take();
this.consumer.commitSync(offsets);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
...
}
}
private Callable<Boolean> getAckCallback(Map<TopicPartition, OffsetAndMetadata> offsets) {
return new AckCallback<Boolean>(new HashMap<>(offsets)) {
@Override
public Boolean call() throws Exception {
try {
ackQueue.put(offsets);
return true;
} catch (Exception e) {
log.error(String.format("[%s] %s", Thread.currentThread().getName(), ExceptionUtils.getMessage(e)), e);
System.exit(1);
return false;
}
}
};
}
...
}
I have a Java class that, upon a certain action from the GUI, initiates a connection with the RabbitMQ server (using the pub/sub pattern) and listens for new events.
I want to add a new feature where I will allow the user to set an "end time" that will stop my application from listening to new events (stop consuming from the queue without closing it).
I tried to utilise the basicCancel method, but I can't find a way to make it work for a predefined date.
Would it be a good idea to initiate a new thread inside my Subscribe class that will call basicCancel upon reaching the given date, or is there a better way to do that?
Listen to new events
private void listenToEvents(String queueName) {
try {
logger.info(" [*] Waiting for events. Subscribed to : " + queueName);
Consumer consumer = new DefaultConsumer(channel) {
@Override
public void handleDelivery(String consumerTag, Envelope envelope,
AMQP.BasicProperties properties, byte[] body) throws IOException {
TypeOfEvent event = null;
String message = new String(body);
// process the payload
InteractionEventManager eventManager = new InteractionEventManager();
event = eventManager.toCoreMonitorFormatObject(message);
if(event!=null){
String latestEventOpnName = event.getType().getOperationMessage().getOperationName();
if(latestEventOpnName.equals("END_OF_PERIOD"))
event.getMessageArgs().getContext().setTimestamp(++latestEventTimeStamp);
latestEventTimeStamp = event.getMessageArgs().getContext().getTimestamp();
ndaec.receiveTypeOfEventObject(event);
}
}
};
channel.basicConsume(queueName, true, consumer);
//Should I add the basicCancel here?
}
catch (Exception e) {
logger.info("The Monitor could not reach the EventBus. " +e.toString());
}
}
Initiate Connection
public String initiateConnection(Timestamp endTime) {
Properties props = new Properties();
try {
props.load(new FileInputStream(everestHome+ "/monitoring-system/rabbit.properties"));
}catch(IOException e){
e.printStackTrace();
}
RabbitConfigure config = new RabbitConfigure(props,props.getProperty("queuName").trim());
ConnectionFactory factory = new ConnectionFactory();
exchangeTopic = new HashMap<String,String>();
String exchangeMerged = config.getExchange();
logger.info("Exchange=" + exchangeMerged);
String[] couples = exchangeMerged.split(";");
for(String couple : couples)
{
String[] infos = couple.split(":");
if (infos.length == 2)
{
exchangeTopic.put(infos[0], infos[1]);
}
else
{
logger.error("Invalid Exchange Detail: " + couple);
}
}
for(Entry<String, String> entry : exchangeTopic.entrySet()) {
String exchange = entry.getKey();
String topic = entry.getValue();
factory.setHost(config.getHost());
factory.setPort(Integer.parseInt(config.getPort()));
factory.setUsername(config.getUsername());
factory.setPassword(config.getPassword());
try {
connection1= factory.newConnection();
channel = connection1.createChannel();
channel.exchangeDeclare(exchange, EXCHANGE_TYPE);
/*Map<String, Object> args = new HashMap<String, Object>();
args.put("x-expires", endTime.getTime());*/
channel.queueDeclare(config.getQueue(),false,false,false,null);
channel.queueBind(config.getQueue(),exchange,topic);
logger.info("Connected to RabbitMQ.\n Exchange: " + exchange + " Topic: " + topic +"\n Queue Name is: "+ config.getQueue());
return config.getQueue();
} catch (IOException e) {
logger.error(e.getMessage());
e.printStackTrace();
} catch (TimeoutException e) {
logger.error(e.getMessage());
e.printStackTrace();
}
}
return null;
}
You can create a delayed queue, setting the time-to-live (TTL) so the message you push there will be dead-lettered exactly when you want to stop your consumer.
Then you bind the dead-letter exchange to a queue whose consumer stops the other one as soon as it receives the message.
Never use threads when you have RabbitMQ; you can do a lot of interesting things with delayed messages!
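A rough sketch of that setup with the Java client, assuming the names control.dlx, control.stop, and stop.timer are placeholders and that channel, queueName, consumer, and endTime come from the code above: a "timer" message is published to a queue whose TTL equals the time remaining until endTime; when it expires it is dead-lettered to a control exchange, and the consumer of the control queue cancels the original subscription (capture the tag returned by basicConsume in listenToEvents):
// control exchange/queue that will receive the expired "stop" message
channel.exchangeDeclare("control.dlx", "fanout");
channel.queueDeclare("control.stop", false, false, false, null);
channel.queueBind("control.stop", "control.dlx", "");

// delay queue: its single message expires at endTime and is routed to control.dlx
Map<String, Object> args = new HashMap<>();
args.put("x-message-ttl", endTime.getTime() - System.currentTimeMillis());
args.put("x-dead-letter-exchange", "control.dlx");
channel.queueDeclare("stop.timer", false, false, false, args);
channel.basicPublish("", "stop.timer", null, "stop".getBytes());

// keep the tag of the original subscription so it can be cancelled later
String consumerTag = channel.basicConsume(queueName, true, consumer);
channel.basicConsume("control.stop", true, new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String tag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) throws IOException {
        channel.basicCancel(consumerTag); // stop consuming without deleting the queue
    }
});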
Please bear with me, this is the first time I have written a Flink job and I need help. The goal of the job is to calculate the average of different fields of an Avro object. The Avro schema I use to parse the JSON messages coming from an ActiveMQ queue is the following:
[
{
"type":"record",
"name":"SensorDataAnnotation",
"namespace":"zzz",
"fields":[
{"name":"meas","type":["null","string"]},
{"name":"prefix","type":["null","string"]}
]
},
{
"namespace":"zzz",
"name":"SensorDataList",
"type":"record",
"fields":[
{"name":"SensorDataListContainer",
"type":{"name":"SensorDataListContainer","type":"array","namespace":"zzz",
"items":{"type":"record","name":"SensorData","namespace":"zzz",
"fields":[
{"name":"prkey","type":"int"},
{"name":"prkeyannotation","type":["null","SensorDataAnnotation"]},
{"name":"value1","type":["null","double"]},
{"name":"value1annotation","type":["null","SensorDataAnnotation"]},
{"name":"value2","type":["null","double"]},
{"name":"value2annotation","type":["null","SensorDataAnnotation"]},
{"name":"value3","type":["null","int"]},
{"name":"value3annotation","type":["null","SensorDataAnnotation"]}
]}}}]
}
]
This is the Flink job that I tried to write:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<SensorData> messages = env.addSource(new StreamSource());
DataStream<Double> counts = messages
.map(new MapFunction<SensorData, Double>() {
@Override
public Double map(SensorData arg0) throws Exception {
return arg0.getValue1();
}
})
.timeWindowAll(Time.seconds(10), Time.seconds(5))
.apply(new Avg());
counts.print();
env.execute("ActiveMQ Streaming Job");
with the StreamSource and Avg classes:
StreamSource
class StreamSource extends RichSourceFunction<SensorData> {
private static final long serialVersionUID = 1L;
private static final Logger LOG = Logger.getLogger(StreamSource.class);
private transient volatile boolean running;
private transient MessageConsumer consumer;
private transient Connection connection;
private void init() throws JMSException {
// Create a ConnectionFactory
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("tcp://localhost:61616");
// Create a Connection
connection = connectionFactory.createConnection();
connection.start();
// Create a Session
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
// Create the destination (Topic or Queue)
Destination destination = session.createQueue("input");
// Create a MessageConsumer from the Session to the Topic or Queue
consumer = session.createConsumer(destination);
}
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
running = true;
init();
}
@Override
public void run(SourceContext<SensorData> ctx) {
// this source never completes
while (running) {
try {
// Wait for a message
Message message = consumer.receive(1000);
if (message instanceof TextMessage) {
TextMessage textMessage = (TextMessage) message;
String text = textMessage.getText();
try {
byte[] avroDesObj = jsonToAvro(text, SensorData.SCHEMA$.toString());
DatumReader<SensorData> reader = new SpecificDatumReader<SensorData>(SensorData.SCHEMA$);
Decoder decoder = DecoderFactory.get().binaryDecoder(avroDesObj, null);
SensorData data = reader.read(null, decoder);
ctx.collect(data);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} else {
LOG.error("Don't know what to do .. or no message");
}
} catch (JMSException e) {
LOG.error(e.getLocalizedMessage());
running = false;
}
}
try {
close();
} catch (Exception e) {
LOG.error(e.getMessage(), e);
}
}
@Override
public void cancel() {
running = false;
}
@Override
public void close() throws Exception {
LOG.info("Closing");
try {
connection.close();
} catch (JMSException e) {
throw new RuntimeException("Error while closing ActiveMQ connection ", e);
}
}
}
Avg
public static class Avg implements AllWindowFunction<Double,Double, TimeWindow> {
@Override
public void apply(TimeWindow window, Iterable<Double> values, Collector<Double> out) throws Exception {
double sum = 0.0;
int count = 0;
for(Double value : values) {
sum += value.doubleValue();
count++;
}
Double avg = values.iterator().next();
avg = sum / count;
out.collect(avg);
}
}
When I launch the exported JAR of this job in the Flink dashboard, it does not start, and I don't know what I am doing wrong.
Thank you.
I have a job that runs every hour on a 40-core server. Each job can have between 1 and 100 thousand tasks (so it needs a large queue); each task executes an HTTP request when it finishes, and every task is critical, which means it must run and complete.
Tasks can run asynchronously.
How do I configure the number of threads in the pool? How do I configure the queue size?
In this test I'm trying to flood my thread pool and get my tasks rejected, but instead I'm getting SocketTimeoutException:
public static void main(String[] args) throws IOReactorException {
String url = "http://internal.server:8001/get";
int connectionTimeout = 3000;
int soTimeout = 3000;
int maxHttpConnections = 30;
IOReactorConfig customIOReactorConfig = IOReactorConfig.custom()
.setIoThreadCount(Runtime.getRuntime().availableProcessors())
.setConnectTimeout(connectionTimeout)
.setSoTimeout(soTimeout)
.build();
ConnectingIOReactor ioReactor = new DefaultConnectingIOReactor(customIOReactorConfig);
PoolingNHttpClientConnectionManager connManager = new PoolingNHttpClientConnectionManager(ioReactor);
connManager.setDefaultMaxPerRoute(maxHttpConnections);
connManager.setMaxTotal(maxHttpConnections);
CloseableHttpAsyncClient customHttpAsyncClient = HttpAsyncClients.custom()
.setConnectionManager(connManager)
.build();
HttpComponentsAsyncClientHttpRequestFactory asyncRequestFactory = new HttpComponentsAsyncClientHttpRequestFactory(customHttpAsyncClient);
AsyncRestTemplate asyncRestTemplate = new AsyncRestTemplate(asyncRequestFactory);
System.out.println("start");
for (int i = 0; i < 30_000; i++) {
asyncRestTemplate.execute(url, HttpMethod.GET, request -> logger.info("doWithRequest..."), response -> {
logger.info("extractData...");
return response.getStatusText();
}).addCallback(new ListenableFutureCallback<String>() {
@Override
public void onFailure(Throwable ex) {
logger.error("onFailure [{}] [{}]", ex.getMessage(), ex.getStackTrace()[0].toString());
}
@Override
public void onSuccess(String result) {
logger.info("onSuccess");
}
});
}
System.out.println("end loop");
}
You can do something like:
ThreadPoolTaskExecutor poolTaskExecutor = new ThreadPoolTaskExecutor();
poolTaskExecutor.setQueueCapacity(100);
poolTaskExecutor.initialize(); // required before the executor can be used to run tasks
CloseableHttpAsyncClient httpclient = HttpAsyncClients
.custom()
.setThreadFactory(poolTaskExecutor).build();
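If you would rather size a plain java.util.concurrent pool yourself, one common pattern for "every task must complete" is a bounded queue plus CallerRunsPolicy, so nothing is ever rejected or dropped. This is only a sketch: the pool and queue sizes are illustrative, and tasks stands in for the 1k-100k Runnables generated each hour:
int poolSize = Runtime.getRuntime().availableProcessors();        // e.g. 40 on the server described above
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(10_000);  // bounded, so memory use stays predictable
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, queue,
        new ThreadPoolExecutor.CallerRunsPolicy());                // full queue: the submitting thread runs the task itself

for (Runnable task : tasks) {                                      // tasks: placeholder for this hour's work
    executor.execute(task);                                        // submission is throttled when the queue is full
}
executor.shutdown();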