We have an Apache Flink POC application that works fine locally, but after we deploy it to Kinesis Data Analytics (KDA) it does not emit any records to the sink.
Used technologies
Local
Source: Kafka 2.7
1 broker
1 topic with 1 partition and replication factor 1
Processing: Flink 1.12.1
Sink: Managed ElasticSearch Service 7.9.1 (the same instance as in the AWS setup)
AWS
Source: Amazon MSK Kafka 2.8
3 brokers (but we are connecting to one)
1 topic with 1 partition and replication factor 3
Processing: Amazon KDA Flink 1.11.1
Parallelism: 2
Parallelism per KPU: 2
Sink: Managed ElasticSearch Service 7.9.1
Application logic
The FlinkKafkaConsumer reads messages in JSON format from the topic
The JSON messages are mapped to domain objects called Telemetry
private static DataStream<Telemetry> SetupKafkaSource(StreamExecutionEnvironment environment) {
    Properties kafkaProperties = new Properties();
    kafkaProperties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER1_ADDRESS.amazonaws.com:9092");
    kafkaProperties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "flink_consumer");

    FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>("THE_TOPIC", new SimpleStringSchema(), kafkaProperties);
    consumer.setStartFromEarliest(); // Just for repeatable testing

    return environment
            .addSource(consumer)
            .map(new MapJsonToTelemetry());
}
The Telemetry's timestamp is used as the event timestamp.
3.1. With forMonotonousTimestamps
Telemetry's StateIso is used for keyBy.
4.1. The two-letter ISO code of the US state
A 5-second tumbling window strategy is applied
private static SingleOutputStreamOperator<StateAggregatedTelemetry> SetupProcessing(DataStream<Telemetry> telemetries) {
    WatermarkStrategy<Telemetry> wmStrategy =
            WatermarkStrategy
                    .<Telemetry>forMonotonousTimestamps()
                    .withTimestampAssigner((event, timestamp) -> event.TimeStamp);

    return telemetries
            .assignTimestampsAndWatermarks(wmStrategy)
            .keyBy(t -> t.StateIso)
            .window(TumblingEventTimeWindows.of(Time.seconds(5)))
            .process(new WindowCountFunction());
}
A custom ProcessWindowFunction is called to perform some basic aggregation.
6.1. We calculate a single StateAggregatedTelemetry per key and window
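For reference, a minimal sketch of what WindowCountFunction roughly does, assuming StateAggregatedTelemetry has the public StateIso/Flawless/Faulty fields used in the sink snippet below; the healthy/faulty condition here is illustrative, not our real business rule:

import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowCountFunction
        extends ProcessWindowFunction<Telemetry, StateAggregatedTelemetry, String, TimeWindow> {

    @Override
    public void process(String stateIso,
                        Context context,
                        Iterable<Telemetry> telemetries,
                        Collector<StateAggregatedTelemetry> out) {
        // One aggregate per key (state ISO code) and 5-second window
        StateAggregatedTelemetry aggregate = new StateAggregatedTelemetry();
        aggregate.StateIso = stateIso;
        for (Telemetry telemetry : telemetries) {
            // Illustrative check only; the real healthy/faulty rule is application specific
            if (telemetry.Type != null) {
                aggregate.Flawless++;
            } else {
                aggregate.Faulty++;
            }
        }
        out.collect(aggregate);
    }
}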
Elasticsearch is configured as the sink.
7.1. StateAggregatedTelemetry records are mapped into a HashMap and added to the sink's bulk indexer.
7.2. All setBulkFlushXYZ methods are set to low values
private static void SetupElasticSearchSink(SingleOutputStreamOperator<StateAggregatedTelemetry> telemetries) {
    List<HttpHost> httpHosts = new ArrayList<>();
    httpHosts.add(HttpHost.create("https://ELKCLUSTER_ADDRESS.amazonaws.com:443"));

    ElasticsearchSink.Builder<StateAggregatedTelemetry> esSinkBuilder = new ElasticsearchSink.Builder<>(
            httpHosts,
            (ElasticsearchSinkFunction<StateAggregatedTelemetry>) (element, ctx, indexer) -> {
                Map<String, Object> record = new HashMap<>();
                record.put("stateIso", element.StateIso);
                record.put("healthy", element.Flawless);
                record.put("unhealthy", element.Faulty);
                ...
                LOG.info("Telemetry has been added to the buffer");
                indexer.add(Requests.indexRequest()
                        .index("INDEXPREFIX-" + from.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")))
                        .source(record, XContentType.JSON));
            }
    );

    // Using low values to make sure that the flush will happen
    esSinkBuilder.setBulkFlushMaxActions(25);
    esSinkBuilder.setBulkFlushInterval(1000);
    esSinkBuilder.setBulkFlushMaxSizeMb(1);
    esSinkBuilder.setBulkFlushBackoff(true);
    esSinkBuilder.setRestClientFactory(restClientBuilder -> {});

    LOG.info("Sink has been attached to the DataStream");
    telemetries.addSink(esSinkBuilder.build());
}
Excluded things
We managed to put Kafka, KDA, and Elasticsearch into the same VPC and subnets to avoid having to sign each request
From the logs we could see that Flink can reach the ES cluster.
Request
{
"locationInformation": "org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.verifyClientConnection(Elasticsearch7ApiCallBridge.java:135)",
"logger": "org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge",
"message": "Pinging Elasticsearch cluster via hosts [https://...es.amazonaws.com:443] ...",
"threadName": "Window(TumblingEventTimeWindows(5000), EventTimeTrigger, WindowCountFunction) -> (Sink: Print to Std. Out, Sink: Unnamed, Sink: Print to Std. Out) (2/2)",
"applicationARN": "arn:aws:kinesisanalytics:...",
"applicationVersionId": "39",
"messageSchemaVersion": "1",
"messageType": "INFO"
}
Response
{
"locationInformation": "org.elasticsearch.client.RequestLogger.logResponse(RequestLogger.java:59)",
"logger": "org.elasticsearch.client.RestClient",
"message": "request [HEAD https://...es.amazonaws.com:443/] returned [HTTP/1.1 200 OK]",
"threadName": "Window(TumblingEventTimeWindows(5000), EventTimeTrigger, WindowCountFunction) -> (Sink: Print to Std. Out, Sink: Unnamed, Sink: Print to Std. Out) (2/2)",
"applicationARN": "arn:aws:kinesisanalytics:...",
"applicationVersionId": "39",
"messageSchemaVersion": "1",
"messageType": "DEBUG"
}
We could also verify that the messages had been read from the Kafka topic and sent for processing by looking at the Flink Dashboard
What we have tried without luck
We implemented a RichParallelSourceFunction which emits 1,000,000 messages and then exits
This worked well in the local environment
The job finished in the AWS environment, but there was no data on the sink side
We implemented another RichParallelSourceFunction which emits 100 messages per second (a sketch of this test source is shown after this list)
Basically we had two loops: an outer while(true) and an inner for
After the inner loop we called Thread.sleep(1000)
This worked perfectly fine in the local environment
But in AWS we could see that the checkpoints' size grew continuously and no message appeared in ELK
We tried to run the KDA application with different parallelism settings
But there was no difference
We also tried different watermark strategies (forBoundedOutOfOrderness, withIdleness, noWatermarks)
But there was no difference
We added logs to the ProcessWindowFunction and to the ElasticsearchSinkFunction
Whenever we ran the application from IDEA, these logs appeared on the console
Whenever we ran the application in KDA, there were no such logs in CloudWatch
Logs that we added to the main method do appear in the CloudWatch logs
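As referenced in the list above, here is a minimal sketch of the throttled test source; the class name and the emitted payload are illustrative, the real source emitted our domain messages:

import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class ThrottledTestSource extends RichParallelSourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        // Outer while(true)-style loop with an inner for loop, throttled to ~100 messages per second
        while (running) {
            for (int i = 0; i < 100; i++) {
                ctx.collect("{\"stateIso\":\"TX\",\"timestamp\":" + System.currentTimeMillis() + "}");
            }
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}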
We suppose that we don't see data on the sink side because the window processing logic is never triggered. That would also explain why we don't see processing logs in CloudWatch.
Any help would be more than welcome!
Update #1
We tried to downgrade the Flink version from 1.12.1 to 1.11.1
There was no change
We tried a processing-time window instead of event time
It did not even work in the local environment
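A sketch of what that processing-time variant looks like; only the window assigner changes relative to the event-time version above, and the helper method name is illustrative:

import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

private static SingleOutputStreamOperator<StateAggregatedTelemetry> SetupProcessingTimeWindows(DataStream<Telemetry> telemetries) {
    // No watermarks or timestamp assigner are needed for processing-time windows
    return telemetries
            .keyBy(t -> t.StateIso)
            .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
            .process(new WindowCountFunction());
}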
Update #2
The average message size is around 4 KB. Here is an excerpt of a sample message:
{
"affiliateCode": "...",
"appVersion": "1.1.14229",
"clientId": "guid",
"clientIpAddr": "...",
"clientOriginated": true,
"connectionType": "Cable/DSL",
"countryCode": "US",
"design": "...",
"device": "...",
...
"deviceSerialNumber": "...",
"dma": "UNKNOWN",
"eventSource": "...",
"firstRunTimestamp": 1609091112818,
"friendlyDeviceName": "Comcast",
"fullDevice": "Comcast ...",
"geoInfo": {
"continent": {
"code": "NA",
"geoname_id": 120
},
"country": {
"geoname_id": 123,
"iso_code": "US"
},
"location": {
"accuracy_radius": 100,
"latitude": 37.751,
"longitude": -97.822,
"time_zone": "America/Chicago"
},
"registered_country": {
"geoname_id": 123,
"iso_code": "US"
}
},
"height": 720,
"httpUserAgent": "Mozilla/...",
"isLoggedIn": true,
"launchCount": 19,
"model": "...",
"os": "Comcast...",
"osVersion": "...",
...
"platformTenantCode": "...",
"productCode": "...",
"requestOrigin": "https://....com",
"serverTimeUtc": 1617809474787,
"serviceCode": "...",
"serviceOriginated": false,
"sessionId": "guid",
"sessionSequence": 2,
"subtype": "...",
"tEventId": "...",
...
"tRegion": "us-east-1",
"timeZoneOffset": 5,
"timestamp": 1617809473305,
"traits": {
"isp": "Comcast Cable",
"organization": "..."
},
"type": "...",
"userId": "guid",
"version": "v1",
"width": 1280,
"xb3traceId": "guid"
}
We are using ObjectMapper to parse only some of the fields of the JSON. Here is what the Telemetry class looks like:
public class Telemetry {
    public String AppVersion;
    public String CountryCode;
    public String ClientId;
    public String DeviceSerialNumber;
    public String EventSource;
    public String SessionId;
    public TelemetrySubTypes SubType; // enum
    public String TRegion;
    public Long TimeStamp;
    public TelemetryTypes Type; // enum
    public String StateIso;
    ...
}
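For completeness, a minimal sketch of what the MapJsonToTelemetry used in SetupKafkaSource does; this is simplified, the real mapper also handles the fields elided above, and the case-insensitive matching is an assumption to bridge the camelCase JSON keys and the PascalCase fields:

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.MapFunction;

public class MapJsonToTelemetry implements MapFunction<String, Telemetry> {
    private static final ObjectMapper MAPPER = new ObjectMapper()
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false) // only a few fields are mapped
            .configure(MapperFeature.ACCEPT_CASE_INSENSITIVE_PROPERTIES, true);  // "appVersion" -> AppVersion, etc.

    @Override
    public Telemetry map(String json) throws Exception {
        return MAPPER.readValue(json, Telemetry.class);
    }
}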
Update #3
Source, Subtasks tab:

| ID | Bytes received | Records received | Bytes sent | Records sent | Status  |
|----|----------------|------------------|------------|--------------|---------|
| 0  | 0 B            | 0                | 0 B        | 0            | RUNNING |
| 1  | 0 B            | 0                | 2.83 MB    | 15,000       | RUNNING |

Source, Watermarks tab: No Data

Window, Subtasks tab:

| ID | Bytes received | Records received | Bytes sent | Records sent | Status  |
|----|----------------|------------------|------------|--------------|---------|
| 0  | 1.80 MB        | 9,501            | 0 B        | 0            | RUNNING |
| 1  | 1.04 MB        | 5,499            | 0 B        | 0            | RUNNING |

Window, Watermarks tab:

| SubTask | Watermark    |
|---------|--------------|
| 1       | No Watermark |
| 2       | No Watermark |
According to the comments and the additional information you have provided, it seems that the issue is that two parallel Flink Kafka consumer instances can't consume from the same partition. So, in your case only one parallel instance of the source operator will consume from the Kafka partition and the other one will be idle.
In general, a Flink operator will take MIN([all_upstream_parallel_watermarks]), i.e. the minimum of the watermarks arriving from all of its parallel inputs. In your case one Kafka consumer will produce normal watermarks and the other will never produce anything (Flink assumes Long.MIN_VALUE in that case), so the operator stays at the lower one, which is Long.MIN_VALUE. As a result the window will never fire, because even though data is flowing, one of the watermarks is never generated. Good practice is to use the same parallelism as the number of Kafka partitions when working with Kafka.
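If matching the parallelism to the number of partitions is not an option, you can also mark idle sources so they don't hold the watermark back. A minimal sketch using the Telemetry type from the question; the idleness timeout value is illustrative:

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

WatermarkStrategy<Telemetry> wmStrategy =
        WatermarkStrategy
                .<Telemetry>forMonotonousTimestamps()
                .withIdleness(Duration.ofSeconds(30)) // subtasks with no input stop holding back MIN(watermarks)
                .withTimestampAssigner((event, timestamp) -> event.TimeStamp);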
After a support session with the AWS folks it turned out that we had missed setting the time characteristic on the streaming environment.
In 1.11.1 the default value of TimeCharacteristic was ProcessingTime.
Since 1.12 (see the related release notes) the default value is EventTime:
"In Flink 1.12 the default stream time characteristic has been changed to EventTime, thus you don't need to call this method for enabling event-time support anymore."
So, after we set EventTime explicitly it started to generate watermarks like a charm:
streamingEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
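For context, the call sits right after the environment is created, before the source, processing, and sink setup shown earlier; a minimal sketch, with an illustrative job name:

StreamExecutionEnvironment streamingEnv = StreamExecutionEnvironment.getExecutionEnvironment();
streamingEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // required for event-time windows on Flink 1.11.x

DataStream<Telemetry> telemetries = SetupKafkaSource(streamingEnv);
SetupElasticSearchSink(SetupProcessing(telemetries));

streamingEnv.execute("telemetry-aggregation"); // illustrative job name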
I'm developing a microservice with Kotlin and Micronaut which accesses a Firestore database. When I run it locally I can make it work with 128 MB because it's very simple: it just reads and writes small amounts of data to Firestore, really small payloads like this:
{
"project": "DUMMY",
"columns": [
{
"name": "TODO",
"taskStatus": "TODO"
},
{
"name": "IN_PROGRESS",
"taskStatus": "IN_PROGRESS"
},
{
"name": "DONE",
"taskStatus": "DONE"
}
],
"tasks": {}
}
I'm running this in App Engine Standard on an F1 instance (256 MB, 600 MHz) with these properties in my app.yaml:
runtime: java11
instance_class: F1 # 256 MB 600 MHz
entrypoint: java -Xmx200m -jar MY_JAR.jar
service: data-connector
env_variables:
  JAVA_TOOL_OPTIONS: "-Xmx230m"
  GAE_MEMORY_MB: 128M
automatic_scaling:
  max_instances: 1
  max_idle_instances: 1
I know all those memory-related properties are not necessary, but I was desperate to make this work and just tried a lot of solutions, because my first error message was:
Exceeded soft memory limit of 256 MB with 263 MB after servicing 1 requests total. Consider setting a larger instance class in app.yaml.
The error below is not fixed by the properties in app.yaml; every time I make a call that returns that JSON I get this error:
2020-04-10 12:09:15.953 CEST
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
The first request always takes longer, I think due to some Firestore initialization, but the point is that I cannot make this work; I always get the same error.
Do you have any idea what I could be doing wrong or what I need to change to fix this?
TL;DR: The problem was that I tried to use a very small instance for a simple application, but even so I needed more memory.
OK, a friend helped me with this. I was using a very small instance, and even when I didn't get the memory-limit error it was still a memory problem.
Upgrading my instance to an F2 (512 MB, 1.2 GHz) solved the problem, and testing my app with siege showed very nice performance:
Transactions: 5012 hits
Availability: 100.00 %
Elapsed time: 59.47 secs
Data transferred: 0.45 MB
Response time: 0.30 secs
Transaction rate: 84.28 trans/sec
Throughput: 0.01 MB/sec
Concurrency: 24.95
Successful transactions: 3946
Failed transactions: 0
Longest transaction: 1.08
Shortest transaction: 0.09
My sysops friend tells me that these instances are more suited for Python scripts and the like, not JVM REST servers.
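For reference, a minimal sketch of the updated app.yaml after the fix, assuming the rest of the configuration stays as in the question and the extra memory flags are dropped:

runtime: java11
instance_class: F2 # 512 MB 1.2 GHz
service: data-connector
automatic_scaling:
  max_instances: 1
  max_idle_instances: 1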
Hi all, I'm working on Azure Functions and I'm new to this. I have created a local Java Azure Functions project using the archetype below:
mvn archetype:generate -DgroupId=com.mynew.serverlesstest -DartifactId=serverlessexample -DarchetypeGroupId=com.microsoft.azure -DarchetypeArtifactId=azure-functions-archetype -DinteractiveMode=false
The template has a simple Java function.
Upon executing "mvn clean package", function.json gets generated in the target folder for the function. Below is my function.json:
{
"scriptFile" : "..\\serverlessexample-1.0-SNAPSHOT.jar",
"entryPoint" : "com.mynew.serverlesstest.Function.hello",
"bindings" : [ {
"type" : "httpTrigger",
"name" : "req",
"direction" : "in",
"authLevel" : "anonymous",
"methods" : [ "get", "post" ]
}, {
"type" : "http",
"name" : "$return",
"direction" : "out"
} ],
"disabled" : false
}
On running mvn azure-functions:run, the application starts successfully and I get the following in the command prompt:
[06-04-2020 07:26:55] Initializing function HTTP routes
[06-04-2020 07:26:55] Mapped function route 'api/hello' [get,post] to 'hello'
[06-04-2020 07:26:55] Mapped function route 'api/HttpTrigger-Java' [get,post] to 'HttpTrigger-Java'
[06-04-2020 07:26:55]
[06-04-2020 07:26:55] Host initialized (424ms)
[06-04-2020 07:26:55] Host started (433ms)
[06-04-2020 07:26:55] Job host started
Http Functions:
hello: [GET,POST] http://localhost:7071/api/hello
Hosting environment: Production
Content root path: C:\Users\ramaswamys\Development\azure-serverless\serverlessexample\target\azure-functions\serverlessexample-20200403205054646
Now listening on: http://0.0.0.0:7071
Application started. Press Ctrl+C to shut down.
HttpTrigger-Java: [GET,POST] http://localhost:7071/api/HttpTrigger-Java
[06-04-2020 07:27:00] Host lock lease acquired by instance ID '000000000000000000000000852CF5C4'.
But when I try hitting the API (http://localhost:7071/api/hello) from Postman, I don't get any response. I see the following in the command prompt:
[06-04-2020 07:29:04] Executing HTTP request: {
[06-04-2020 07:29:04] "requestId": "af46115f-7a12-49a9-87e0-7fb073a66450",
[06-04-2020 07:29:04] "method": "GET",
[06-04-2020 07:29:04] "uri": "/api/hello"
[06-04-2020 07:29:04] }
[06-04-2020 07:29:05] Executing 'Functions.hello' (Reason='This function was programmatically called via the host APIs.', Id=7c712cdf-332f-413f-bda2-138f9b89025b)
After this nothing happens.
After 30 minutes I get a timeout exception like the one below in the command prompt:
Microsoft.Azure.WebJobs.Host: Timeout value of 00:30:00 was exceeded by function: Functions.hello.
Can someone suggest what might be causing this and why no response is seen in Postman? Am I doing anything wrong here? Am I missing any configuration? Timely help would be appreciated.
Please check your function code.
From the info you offer, the trigger has already fired, which means the code inside your function is not completing. In other words, you successfully triggered the function, but it is stuck in its internal logic. Please check your code.
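For comparison, a handler that builds and returns its response immediately looks roughly like the sketch below; the entry point must match the entryPoint in your function.json (com.mynew.serverlesstest.Function.hello), and the body text is illustrative. If the method blocks or loops forever, the host logs the "Executing ..." lines and then hits the 30-minute timeout, exactly as shown above.

package com.mynew.serverlesstest;

import java.util.Optional;
import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.HttpMethod;
import com.microsoft.azure.functions.HttpRequestMessage;
import com.microsoft.azure.functions.HttpResponseMessage;
import com.microsoft.azure.functions.HttpStatus;
import com.microsoft.azure.functions.annotation.AuthorizationLevel;
import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.azure.functions.annotation.HttpTrigger;

public class Function {

    @FunctionName("hello")
    public HttpResponseMessage hello(
            @HttpTrigger(name = "req",
                         methods = {HttpMethod.GET, HttpMethod.POST},
                         authLevel = AuthorizationLevel.ANONYMOUS)
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {
        // Log, build the response, and return right away so the host can complete the invocation
        context.getLogger().info("Handling " + request.getHttpMethod() + " " + request.getUri());
        return request.createResponseBuilder(HttpStatus.OK)
                      .body("Hello from the hello function")
                      .build();
    }
}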
I'm trying to deploy my program to Heroku. I've followed all the steps, created a git repository, and committed everything, but when I try to push to Heroku I always get this output:
ERROR in Child compilation failed:
undefined
ERROR in budgets, maximum exceeded for /tmp/build_0e8fdf2b85b36f64aa8fc55074a8481a/src/app/app.component.css. Budget 10 kB was exceeded by 129 kB.
I've searched for answers and did this:
"budgets": {
"type": "any",
"baseline": "400kb",
"maximumError": "25%",
"maximumWarning": "12%"
},
I have tried with mb and kb instead of percentages, but nothing worked...
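For reference, a sketch of how the budgets section normally sits in angular.json, as an array of objects under the production configuration; the limits shown are illustrative, and the error above comes from the anyComponentStyle budget:

"configurations": {
  "production": {
    "budgets": [
      {
        "type": "initial",
        "maximumWarning": "2mb",
        "maximumError": "5mb"
      },
      {
        "type": "anyComponentStyle",
        "maximumWarning": "6kb",
        "maximumError": "200kb"
      }
    ]
  }
}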
Can somebody help me with this one?
Thanks
I am trying to create an ingest pipeline using the PUT request below:
{
"description": "ContentExtractor",
"processors": [
{
"extractor": {
"field": "contentData",
"target_field": "content"
}
}
]
}
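The request is sent roughly like this; the pipeline id and the file holding the body above are placeholders, and host and port are those of the cluster:

# Placeholder pipeline id and body file; the JSON above is saved as pipeline.json
curl -XPUT -H "Content-Type: application/json" "http://host:port/_ingest/pipeline/content-extractor" -d @pipeline.json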
But this results in the following error:
{
"error": {
"root_cause": [
{
"type": "not_x_content_exception",
"reason": "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"
}
],
"type": "not_x_content_exception",
"reason": "Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"
},
"status": 500
}
I see the exception below in the ES logs:
org.elasticsearch.common.compress.NotXContentException: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
at org.elasticsearch.common.compress.CompressorFactory.compressor(CompressorFactory.java:57) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:65) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.ingest.PipelineStore.validatePipeline(PipelineStore.java:154) ~[elasticsearch-5.1.2.jar:5.1.2]
at org.elasticsearch.ingest.PipelineStore.put(PipelineStore.java:133) ~[elasticsearch-5.1.2.jar:5.1.2]
This problem happens when Elasticsearch is running on Solaris; the same request works fine on Linux. What am I doing wrong? Can somebody help me fix this issue?
Thanks in advance.
I got the exact same error message, but on a different version of Elasticsearch, and in my case it happened when querying with an erroneous data format (I had misinterpreted the docs at https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html: the "request body" there is expected to be plain JSON; it is not describing the raw HTTP request body), or when using the old syntax within the URL path (the part just after the index name):
curl -XPUT -H "Content-Type: application/json" http://host:port/index/_mapping/_doc -d "mappings=#mymapping.json"
Just remove the "mappings=" and trailing path!
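In other words, something along these lines, assuming mymapping.json contains just the mapping body; exact endpoint details depend on your Elasticsearch version:

# Send the mapping file itself as the JSON body, without the "mappings=" prefix or the trailing "/_doc"
curl -XPUT -H "Content-Type: application/json" "http://host:port/index/_mapping" -d @mymapping.json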