As in the title, I wonder if I can check what an embedded Kafka topic contains, just like inspecting what a local variable contains.
I tried the IntelliJ debug GUI but didn't find anything.
You need to write a KafkaConsumer to check the data that any Kafka topic contains, embedded or not.
Otherwise, you can download the Kafka CLI tools, produce and flush some messages, then use the kafka-dump-log tool on the segment files written to disk while your breakpoint is set.
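For the first approach, here is a minimal sketch of a plain KafkaConsumer that reads a topic from the beginning; the bootstrap address and topic name are placeholders, and with an embedded broker you would use whatever address/port it exposes:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TopicInspector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // embedded broker address (placeholder)
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "topic-inspector");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // read the topic from the start
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic name
            // One poll is enough for a quick inspection; loop if you expect more data
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```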
I'm new to the streaming community.
I'm trying to create a continuous query using Kafka topics and Flink, but I haven't found any examples to give me an idea of how to get started.
Can you help me with some examples?
Thank you.
For your use case, I'm guessing you want to use Kafka as a source of continuous data. In that case you can use the Flink Kafka source connector (linked below), and if you want to slice the stream by time you can use Flink's Window Processing Function. This will group the Kafka messages streamed in a particular timeframe into something like a list/map.
Flink Kafka source connector
Flink Window Processing Function
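As an illustration, here is a hedged sketch that combines the two: a Flink job reading a Kafka topic and collecting everything received in each one-minute window into a list. It assumes the flink-connector-kafka dependency is on the classpath; the topic name, bootstrap address, and window size are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

public class KafkaWindowExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "flink-demo");

        // Read the topic as a plain string stream
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props));

        // Group everything received in each 1-minute processing-time window into a list
        stream.windowAll(TumblingProcessingTimeWindows.of(Time.minutes(1)))
              .apply(new AllWindowFunction<String, List<String>, TimeWindow>() {
                  @Override
                  public void apply(TimeWindow window, Iterable<String> values, Collector<List<String>> out) {
                      List<String> batch = new ArrayList<>();
                      values.forEach(batch::add);
                      out.collect(batch);
                  }
              })
              .print();

        env.execute("kafka-window-example");
    }
}
```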
While creating Kafka Streams applications using the Kafka Streams DSL
https://kafka.apache.org/0110/documentation/streams/developer-guide
we have encountered a scenario where we need to update the Kafka Streams application with a new topology definition.
For example:
When we started, we had a topology defined to read from one topic (source) and write to a destination topic (sink).
However, after a configuration change we now need to read from 2 different topics (2 sources, if you will) and write to a single destination topic.
In what we have built right now, the topology definition is hard coded, along the lines of the processor topology described in the documentation.
Questions:
Is it possible to define topology in a declarative way (say in a Json or something else), which doesn't require a codification of the topology?
Is it possible to reload an existing Kafka Stream to use a new definition of the Kafka Streams Topology?
For #2 mentioned above, does Kafka Streams DSL provide a way to "reload" new topology definitions by way of an external trigger or system call?
We are using JDK 1.8 and the Kafka Streams DSL 2.2.0.
Thanks,
Ayusman
Is it possible to define topology in a declarative way (say in a Json or something else), which doesn't require a codification of the topology?
The KStreams DSL is declarative, but I assume you mean something other than the DSL?
If so, the answer is No. You may want to look at KSQL, however.
Is it possible to reload an existing Kafka Stream to use a new definition of the Kafka Streams Topology?
You mean if an existing Kafka Streams application can reload a new definition of a processing topology? If so, the answer is No. In such cases, you'd deploy a new version of your application.
Depending on how the old/new topologies are defined, a simple rolling upgrade of your application may suffice (roughly: if the topology change was minimal), but probably you will need to deploy the new application separately and then, once the new one is vetted, decommission your old application.
Note: KStreams is a Java library and, by design, does not include functionality to operate/manage the Java applications that use the KStreams library.
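Not a reload mechanism, but as an illustration of what the redeployed version might contain, here is a hedged DSL sketch of the two-source, one-sink topology from the question. The application id, topic names, and broker address are assumptions; this would ship as a new application version rather than being swapped in at runtime.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TwoSourceTopology {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "two-source-app-v2"); // new application version
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> sourceA = builder.stream("source-topic-a"); // placeholder topics
        KStream<String, String> sourceB = builder.stream("source-topic-b");

        // merge() interleaves both source streams; the result goes to the single sink topic
        sourceA.merge(sourceB).to("sink-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```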
For #2 mentioned above, does Kafka Streams DSL provide a way to "reload" new topology definitions by way of an external trigger or system call?
No.
I'm planning to write my own Kafka Connect CSV connector which will read the data from a CSV file and write it to a topic. The data should be written to the topic as JSON.
I also came across the kafka-connect-spooldir plugin from Confluent. I don't want to use it and would rather write my own.
Can anyone advise me on how to go about creating a connector for this?
The official Kafka documentation has a section on Connector development so that is probably the best first stop.
Kafka also ships with File Connectors (both Source and Sink). Have a look at the code: https://github.com/apache/kafka/tree/trunk/connect/file/src/main/java/org/apache/kafka/connect/file
It should not be too hard to modify these for your use case.
Finally, as you mentioned, there are already open-source connectors that can read CSV files, so if you're stuck on something you can check how they did it.
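If you do roll your own, a very rough sketch of the task side might look like the following, loosely modeled on Kafka's FileStreamSourceTask. Everything here (class name, config keys, the naive CSV-to-JSON conversion) is an illustrative assumption rather than a complete connector; the Connector class, config validation, and proper offset tracking are omitted.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public class CsvSourceTask extends SourceTask {
    private BufferedReader reader;
    private String topic;
    private String[] header;

    @Override
    public void start(Map<String, String> config) {
        topic = config.get("topic"); // hypothetical config keys
        try {
            reader = new BufferedReader(new FileReader(config.get("csv.file")));
            header = reader.readLine().split(","); // first row = column names
        } catch (Exception e) {
            throw new RuntimeException("Could not open CSV file", e);
        }
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        List<SourceRecord> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");
                // Naive CSV row -> JSON string; a real connector would use a JSON
                // library or Connect's Schema/Struct support instead.
                StringBuilder json = new StringBuilder("{");
                for (int i = 0; i < header.length && i < cols.length; i++) {
                    if (i > 0) json.append(",");
                    json.append("\"").append(header[i]).append("\":\"").append(cols[i]).append("\"");
                }
                json.append("}");
                records.add(new SourceRecord(
                        Collections.singletonMap("file", "csv"),   // source partition
                        Collections.singletonMap("position", 0L),  // offset (placeholder)
                        topic, Schema.STRING_SCHEMA, json.toString()));
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading CSV file", e);
        }
        return records.isEmpty() ? null : records;
    }

    @Override
    public void stop() {
        try { reader.close(); } catch (Exception ignored) { }
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```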
So I am new to working with Apache Kafka and I am trying to create a simple app so I can try to understand the API better. I know this question has been asked a lot here, but how can I clear out the messages/records that are stored on a topic?
Most of the answers I have seen say to change the message retention time or to delete and recreate the topic. Neither of these is an option for me as I do not have access to the server.properties file. I am not running Kafka locally; it is hosted on a server. Is there a way to do it in Java code, maybe, or something else?
If you are searching for a way to delete messages selectively, the new AdminClient API (usable from Java code) provides the deleteRecords method:
https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/admin/AdminClient.html
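A minimal sketch of calling it, assuming a single-partition topic; the broker address, topic name, and the target offset are placeholders (you would pass the offset up to which records should be removed, e.g. the partition's current end offset to clear it entirely):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class ClearTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            Map<TopicPartition, RecordsToDelete> toDelete = new HashMap<>();
            // Assuming a single-partition topic called "my-topic"; repeat for each partition.
            toDelete.put(new TopicPartition("my-topic", 0),
                         RecordsToDelete.beforeOffset(100L)); // delete everything before offset 100 (placeholder)
            DeleteRecordsResult result = admin.deleteRecords(toDelete);
            result.all().get(); // block until the brokers have applied the deletion
        }
    }
}
```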
I need help deciding what frameworks I can use in this scenario. I'm exploring ZooKeeper, but I'm not completely sure how to design a solution for this use case.
Background:
Say there is an application that connects to a streaming source (Kafka, ActiveMQ, etc.) and writes the messages processed from the stream to a file.
This application is deployed as 4 instances. Each instance creates a file that stores the messages it processed in the last hour; for example, the filename is servername_8.00 for messages processed from 8 to 9.
The requirement is to transfer all the files created in the last hour, but only if every instance created a file in that window, and also to send a single consolidated file that lists all 4 file names and the number of records.
What I'm looking for:
1. How do I make sure the application instances know whether the other instances also created files, and transmit the files only if every instance has created one?
2. Whichever instance sends them, the consolidated file should record what was transmitted.
What frameworks can I use to solve this?
You can definitely use ZooKeeper for this. I would use Apache Curator as well (note: I'm the main author of Curator).
Do all the instances share a file server? i.e. can each instance see all of the created files? If so, you can nominate a leader using ZooKeeper/Curator and only the leader does all of the work. You can see sample leader election code here: https://github.com/apache/curator/tree/master/curator-examples/src/main/java/leader
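For illustration, a minimal leader-election sketch using Curator's LeaderLatch recipe; the ZooKeeper connection string and latch path are assumptions, and the leader-only work is just a comment:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        // ZooKeeper connection string and latch path are placeholders
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderLatch latch = new LeaderLatch(client, "/file-transfer/leader");
        latch.start();

        latch.await(); // blocks until this instance is elected leader

        // Only the leader gets here: check that all 4 hourly files exist, transfer them,
        // and write the consolidated file listing the file names and record counts.

        latch.close();
        client.close();
    }
}
```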
If the instances do not share a file server, you could still use ZooKeeper to coordinate the writing of the shared file. You'd again nominate a leader who exposes an API of some kind that all the instances can write to and the leader creates the shared file.
You also might find the Curator barrier recipes useful: http://curator.apache.org/curator-recipes/double-barrier.html and http://curator.apache.org/curator-recipes/barrier.html
You'd have to give a lot more detail on your use case if you want a more detailed design.