I am using spring data Redis 2.5.7 to consume the stream messages from Redis, this is my consumer code looks like:
package com.dolphin.soa.post.common.mq;
import com.alibaba.fastjson.JSON;
import com.dolphin.soa.post.contract.request.ArticleRequest;
import com.dolphin.soa.post.model.entity.SubRelation;
import com.dolphin.soa.post.service.IArticleService;
import com.dolphin.soa.post.service.ISubRelationService;
import lombok.extern.slf4j.Slf4j;
import misc.enumn.user.SubStatus;
import org.apache.commons.collections4.CollectionUtils;
import org.apache.commons.collections4.MapUtils;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.data.redis.connection.stream.MapRecord;
import org.springframework.data.redis.core.DefaultTypedTuple;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.ZSetOperations;
import org.springframework.data.redis.stream.StreamListener;
import org.springframework.stereotype.Component;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
/**
* #author dolphin
*/
#Component
#Slf4j
public class StreamMessageListener implements StreamListener<String, MapRecord<String, String, String>> {
#Value("${dolphin.redis.stream.group}")
private String groupName;
#Value("${dolphin.redis.user.sub.article.key}")
private String subArticleKey;
private final StringRedisTemplate stringRedisTemplate;
private final RedisTemplate<String, Object> articleRedisTemplate;
private final RedisTemplate<String, Long> redisLongTemplate;
private final ISubRelationService subRelationService;
private final IArticleService articleService;
public StreamMessageListener(StringRedisTemplate stringRedisTemplate,
#Qualifier("redisObjectTemplate") RedisTemplate<String, Object> articleRedisTemplate,
ISubRelationService subRelationService,
#Qualifier("redisLongTemplate") RedisTemplate<String, Long> redisLongTemplate,
IArticleService articleService) {
this.stringRedisTemplate = stringRedisTemplate;
this.articleRedisTemplate = articleRedisTemplate;
this.subRelationService = subRelationService;
this.redisLongTemplate = redisLongTemplate;
this.articleService = articleService;
}
#Override
public void onMessage(MapRecord<String, String, String> message) {
try {
Map<String, String> body = message.getValue();
log.debug("receive message from redis:" + JSON.toJSONString(body));
handleArticle(body);
this.stringRedisTemplate.opsForStream().acknowledge(groupName, message);
} catch (Exception e) {
log.error("handle redis stream message error", e);
}
}
private void handleArticle(Map<String, String> body) {
Long channelId = MapUtils.getLongValue(body, "sub_source_id", -1L);
Long articleId = MapUtils.getLongValue(body, "id", -1L);
if (channelId <= 0L || articleId <= 0L) {
log.error("id incorrect", body);
return;
}
ArticleRequest articleRequest = new ArticleRequest();
articleRequest.setChannelId(channelId);
articleRequest.setSubStatus(SubStatus.SUB);
// TODO: may be page should avoid memory overflow, this is dangerous when the subscribe user increase
List<SubRelation> relations = subRelationService.list(articleRequest);
if (CollectionUtils.isEmpty(relations)) {
return;
}
relations.forEach(item -> {
var articleIdsSet = new HashSet<ZSetOperations.TypedTuple<Long>>();
ZSetOperations.TypedTuple<Long> singleId = new DefaultTypedTuple<>(articleId, Double.valueOf(articleId));
articleIdsSet.add(singleId);
/**
* because now we only have less than 2GB memory
* the redis stream only pass the article id and channel id not the full article
* at this procedure has a extern stop 1 query from database
*/
articleService.getArticleFromCache(Arrays.asList(articleId));
String userSubCacheKey = subArticleKey + item.getUserId();
Boolean isKeyExists = redisLongTemplate.hasKey(userSubCacheKey);
if (isKeyExists) {
/**
* only the user is active recently
* the redis will cache the user subscribe list then we push the newest article
* to the subscribe user one by one(may be we should make the operation less)
* when the channel subscribe user increment
* this may become a performance neck bottle
*/
redisLongTemplate.opsForZSet().add(userSubCacheKey, articleIdsSet);
}
});
}
}
I am facing a problem that the Redis stream have message but the spring data Redis consumer did not consume it. when I am using this command to check the stream message:
> XINFO STREAM pydolphin:stream:article
length
45
radix-tree-keys
1
radix-tree-nodes
2
last-generated-id
1652101310592-0
groups
1
first-entry
1652083221122-0
id
2288687
sub_source_id
4817
last-entry
1652101310592-0
id
2288731
sub_source_id
4792
it shows that there have some message but the consumer did not consume it. why did this happen? what should I do to fix this problem?Sometimes it will consume one or more message but not all the message will be consumed. It always have many message in the queue. I have already tried to use this command to check the pending message:
XPENDING pydolphin:stream:article pydolphin:stream:group:article - + 20
it only return 1 pending message. but the stream queue have 40+ message.
Related
I am trying to create a simple application where the app will consume Kafka message do some cql transform and publish to Kafka and below is the code:
JAVA: 1.8
Flink: 1.13
Scala: 2.11
flink-siddhi: 2.11-0.2.2-SNAPSHOT
I am using library: https://github.com/haoch/flink-siddhi
input json to Kafka:
{
"awsS3":{
"ResourceType":"aws.S3",
"Details":{
"Name":"crossplane-test",
"CreationDate":"2020-08-17T11:28:05+00:00"
},
"AccessBlock":{
"PublicAccessBlockConfiguration":{
"BlockPublicAcls":true,
"IgnorePublicAcls":true,
"BlockPublicPolicy":true,
"RestrictPublicBuckets":true
}
},
"Location":{
"LocationConstraint":"us-west-2"
}
}
}
main class:
public class S3SidhiApp {
public static void main(String[] args) {
internalStreamSiddhiApp.start();
//kafkaStreamApp.start();
}
}
App class:
package flinksidhi.app;
import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;
import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;
public class internalStreamSiddhiApp {
private static final String inputTopic = "EVENT_STREAM_INPUT";
private static final String outputTopic = "EVENT_STREAM_OUTPUT";
private static final String consumerGroup = "EVENT_STREAM1";
private static final String kafkaAddress = "localhost:9092";
private static final String zkAddress = "localhost:2181";
private static final String S3_CQL1 = "from inputStream select * insert into temp";
private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
"from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
"json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
"json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
"from temp2 select affected_resource_name,affected_resource_type, " +
"ifThenElse(encryption == ' ','Fail','Pass') as state," +
"ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";
public static void start(){
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//DataStream<String> inputS = env.addSource(new S3EventSource());
//Flink kafka stream consumer
FlinkKafkaConsumer<String> flinkKafkaConsumer =
createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);
//Add Data stream source -- flink consumer
DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);
cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
cep.registerStream("inputStream", inputS, "awsS3");
inputS.print();
System.out.println(cep.getDataStreamSchemas());
//json needs extension jars to present during runtime.
DataStream<Map<String,Object>> output = cep
.from("inputStream")
.cql(S3_CQL1)
.returnAsMap("temp");
//Flink kafka stream Producer
FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
createMapProducer(env,outputTopic, kafkaAddress);
//Add Data stream sink -- flink producer
output.addSink(flinkKafkaProducer);
output.print();
try {
env.execute();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Consumer class:
package flinksidhi.app.connector;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;
import java.util.Properties;
public class Consumers {
public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", kafkaAddress);
properties.setProperty("zookeeper.connect", zookeeprAddr);
properties.setProperty("group.id",kafkaGroup);
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
topic,new SimpleStringSchema(),properties);
return consumer;
}
}
Producer class:
package flinksidhi.app.connector;
import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;
import java.util.Map;
public class Producer {
public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
}
public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {
return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
#Override
public void open(InitializationContext context) throws Exception {
}
#Override
public byte[] serialize(Map<String, Object> stringObjectMap) {
String json = ConvertJavaMapToJson.convert(stringObjectMap);
return json.getBytes();
}
});
}
}
I have tried many things but the code where the CQL is invoked is never called and doesn't even give any error not sure where is it going wrong.
The same thing if I do creating an internal stream source and use the same input json to return as string it works.
Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:
(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
If this doesn't help, I would suggest to decompose/simplify the job to a bare minimum, for example just a source operator and some naive sink printing/logging elements. And if that works, start adding back operators one by one. You could also start by simplifying your CEP pattern as much as possible.
First of all thanks a lot #Piotr Nowojski , just because of your small pointer which no matter how many times I pondered over about event time , it did not came in my mind. So yes while debugging the two cases:
With internal datasource , where it was processing successfully, while debugging the flow , I identified that it was processing a watermark after it was processing the data, but it did not catch me that it was somehow managing the event time of the data implicitly.
With kafka as a datasource , while I was debugging I could very clearly see that it was not processing any watermark in the flow, but it did not occur to me that , it is happening because of the event time and watermark not handled properly.
Just adding a single line of code in the application code which I understood from below Flink code snippet:
#deprecated In Flink 1.12 the default stream time characteristic has been changed to {#link
* TimeCharacteristic#EventTime}, thus you don't need to call this method for enabling
* event-time support anymore. Explicitly using processing-time windows and timers works in
* event-time mode. If you need to disable watermarks, please use {#link
* ExecutionConfig#setAutoWatermarkInterval(long)}. If you are using {#link
* TimeCharacteristic#IngestionTime}, please manually set an appropriate {#link
* WatermarkStrategy}. If you are using generic "time window" operations (for example {#link
* org.apache.flink.streaming.api.datastream.KeyedStream#timeWindow(org.apache.flink.streaming.api.windowing.time.Time)}
* that change behaviour based on the time characteristic, please use equivalent operations
* that explicitly specify processing time or event time.
*/
I got to know that by default flink considers event time and for that watermark needs to be handled properly which I didn't so I added below link for setting the time characteristics of the flink execution environment:
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
and kaboom ... it started working , while this is deprecated and needs some other configuration, but thanks a lot , it was a great pointer and helped me a lot and I solved the issue..
Thanks again #Piotr Nowojski
I've written a simple Kafka Stream processor code to
Read messages as stream from a topic with <K, V> as <String, String>
Convert the value in the message from String to a Custom Object <String, Object> using mapValues() method
Use Window function to aggregate the statistics of the Objects for a particular time interval
Sample Message
{"coiRequestGuid":"xxxx","accountId":1122132,"companyName":"xxxx","existingPolicyCoverageLimit":1000000,"isChangeRequested":true,"newlyRequestedPolicyCoverageLimit":200000,"isNewRecipient":false,"newRecipientGuid":null,"existingRecipientId":11111,"recipientName":"xxxx","recipientEmail":"xxxxx"}
Here is my code
import com.da.app.data.model.PolicyChangeRequest;
import com.da.app.data.model.PolicyChangeRequestStats;
import com.da.app.data.serde.JsonDeserializer;
import com.da.app.data.serde.JsonSerializer;
import com.da.app.data.serde.WrapperSerde;
import com.da.app.system.util.ConfigUtil;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.log4j.Logger;
import org.json.JSONObject;
import java.time.Duration;
import java.util.Properties;
public class PolicyChangeReqStreamProcessor {
private static final Logger logger = Logger.getLogger(PolicyChangeReqStreamProcessor.class);
private static final String TOPIC_NAME = "stream-window-play1";
private static ObjectMapper mapper = new ObjectMapper();
private static Properties properties = ConfigUtil.loadProperty();
public static void main(String[] args) {
logger.info("Policy Limit Change Stats Generator");
Properties streamProperties = getStreamProperties();
StreamsBuilder streamsBuilder = new StreamsBuilder();
KStream<String, String> source = streamsBuilder.stream(TOPIC_NAME,
Consumed.with(Serdes.String(), Serdes.String()));
source
.filter((key, value) -> isValidEvent(value))
//Converting the request json to PolicyChangeRequest object
.mapValues(PolicyChangeReqStreamProcessor::convertPolicyChangeReqJsonToObj)
//Mapping all events to a single key in order to group all the events
.map((key, value) -> new KeyValue<>("key", value))
// Grouping by key
.groupByKey(Grouped.with(Serdes.String(), new PolicyChangeRequestSerde()))
//Creating a Tumbling window of 5 secs (for Testing)
.windowedBy(TimeWindows.of(Duration.ofSeconds(5)).advanceBy(Duration.ofSeconds(5)))
// Aggregating the PolicyChangeRequest events to a
// PolicyChangeRequestStats object
.<PolicyChangeRequestStats>aggregate(PolicyChangeRequestStats::new,
(k, v, policyStats) -> policyStats.add(v),
Materialized.<String, PolicyChangeRequestStats, WindowStore<Bytes, byte[]>>as
("policy-change-aggregates")
.withValueSerde(new PolicyChangeRequestStatsSerde()))
//Converting KTable to KStream
.toStream()
.foreach((key, value) -> logger.info(key.window().startTime() + "----" + key.window().endTime() + " :: " + value));
KafkaStreams kafkaStreams = new KafkaStreams(streamsBuilder.build(), streamProperties);
logger.info("Started the stream");
kafkaStreams.start();
Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
}
private static PolicyChangeRequest convertPolicyChangeReqJsonToObj(String policyChangeReq) {
JSONObject policyChangeReqJson = new JSONObject(policyChangeReq);
PolicyChangeRequest policyChangeRequest = new PolicyChangeRequest(policyChangeReqJson);
// return mapper.readValue(value, PolicyChangeRequest.class);
return policyChangeRequest;
}
private static boolean isValidEvent(String value) {
//TODO: Message Validation
return true;
}
private static Properties getStreamProperties() {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "policy-change-stats-gen");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, properties.getProperty("kafka.bootstrap.servers"));
props.put(StreamsConfig.CLIENT_ID_CONFIG, "stream-window-play1");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
return props;
}
public static final class PolicyChangeRequestStatsSerde extends WrapperSerde<PolicyChangeRequestStats> {
PolicyChangeRequestStatsSerde() {
super(new JsonSerializer<>(), new JsonDeserializer<>(PolicyChangeRequestStats.class));
}
}
public static final class PolicyChangeRequestSerde extends WrapperSerde<PolicyChangeRequest> {
PolicyChangeRequestSerde() {
super(new JsonSerializer<>(), new JsonDeserializer<>(PolicyChangeRequest.class));
}
}
}
isValidEvent - returns true
.mapValues(PolicyChangeReqStreamProcessor::convertPolicyChangeReqJsonToObj) - This will convert the incoming json string to a PolicyChangeRequest object
Till the operation map((key, value) -> new KeyValue<>("key", value)), the Custom Object - PolicyChangeRequest is fine as per the incoming message (I've tested by printing the stream there).
But after going into the Groupby and aggregate operation, the Custom Object got changed as
PolicyChangeRequest{coiRequestGuid='null', accountId='null', companyName='null', existingPolicyCoverageLimit=null, isChangeRequested=null, newlyRequestedPolicyCoverageLimit=null, isNewRecipient=null, newRecipientGuid='null', existingRecipientId='null', recipientName='null', recipientEmail='null'}
I found the above value by putting a log statement inside the policyStats.add(v) method i've called inside the aggregate method.
The add method is in the PolicyChangeRequestStats class
public PolicyChangeRequestStats add(PolicyChangeRequest policyChangeRequest) {
System.out.println("Incoming req: " + policyChangeRequest);
//Incrementing the Policy limit change request count
this.policyLimitChangeRequests++;
//Adding the Increased policy limit coverage to the existing increasedPolicyLimitCoverage
this.increasedPolicyLimitCoverage +=
(policyChangeRequest.getNewlyRequestedPolicyCoverageLimit() -
policyChangeRequest.getExistingPolicyCoverageLimit());
return this;
}
I'm getting NullPointerException in the line where I'm adding the policyChangeRequest.getNewlyRequestedPolicyCoverageLimit() - policyChangeRequest.getExistingPolicyCoverageLimit() as the values were null in the PolicyChangeRequest object
I've provided the valid Serde classes for the key and Value while doing groupBy .groupByKey(Grouped.with(Serdes.String(), new PolicyChangeRequestSerde())).
For Serialization and Desrialization I used Gson.
But I can't able to get the PolicyChangeRequest object as is before it was sent to the grouping operation.
I'm new to kafka Streams and I'm not sure whether I missed anything or whether the process I'm doing is correct or not.
Can anyone guide me here?
Overview:
I'm totally new to Elastic search testing and I'm gonna add proper unit tests. The project compatibilities are as follow:
Java 8
Elasticsearch 6.2.4
Project uses low level rest client for fetching data from ES
More info about ES configurations is as follow:
import static java.net.InetAddress.getByName;
import static java.util.Arrays.stream;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.Objects;
import javax.inject.Inject;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import au.com.api.util.RestClientUtil;
import lombok.extern.slf4j.Slf4j;
#Slf4j
#Configuration
public class ElasticConfiguration implements InitializingBean{
#Value(value = "${elasticsearch.hosts}")
private String[] hosts;
#Value(value = "${elasticsearch.httpPort}")
private int httpPort;
#Value(value = "${elasticsearch.tcpPort}")
private int tcpPort;
#Value(value = "${elasticsearch.clusterName}")
private String clusterName;
#Inject
private RestClientUtil client;
#Bean
public RestHighLevelClient restHighClient() {
return new RestHighLevelClient(RestClient.builder(httpHosts()));
}
#Bean
#Deprecated
public RestClient restClient() {
return RestClient.builder(httpHosts()).build();
}
/**
* #return TransportClient
* #throws UnknownHostException
*/
#SuppressWarnings("resource")
#Bean
public TransportClient transportClient() throws UnknownHostException{
Settings settings = Settings.builder()
.put("cluster.name", clusterName).build();
return new PreBuiltTransportClient(settings).addTransportAddresses(transportAddresses());
}
#Override
public void afterPropertiesSet() throws Exception {
log.debug("loading search templates...");
try {
for (Map.Entry<String, String> entry : Constants.SEARCH_TEMPLATE_MAP.entrySet()) {
client.putInlineSearchTemplateToElasticsearch(entry.getKey(), entry.getValue());
}
} catch (Exception e) {
log.error("Exception has occurred in putting search templates into ES.", e);
}
}
private HttpHost[] httpHosts() {
return stream(hosts).map(h -> new HttpHost(h, httpPort, "http")).toArray(HttpHost[]::new);
}
private TransportAddress[] transportAddresses() throws UnknownHostException {
TransportAddress[] transportAddresses = stream(hosts).map(h -> {
try {
return new TransportAddress(getByName(h), tcpPort);
} catch (UnknownHostException e) {
log.error("Exception has occurred in creating ES TransportAddress. host: '{}', tcpPort: '{}'", h, tcpPort, e);
}
return null;
}).filter(Objects::nonNull).toArray(TransportAddress[]::new);
if (transportAddresses.length == 0) {
throw new UnknownHostException();
}
return transportAddresses;
}
}
Issue:
I don't know how to Mock ES or how to test ES without running an standalone ES on my machine. Please use the following class as an example and let me know how could I write a testcase (unit test not integration) for getSearchResponse method:
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.NoNodeAvailableException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.script.mustache.SearchTemplateRequestBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.aggregations.Aggregation;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.MessageSource;
import org.springframework.stereotype.Repository;
#Slf4j
#Repository
#NoArgsConstructor
public abstract class NewBaseElasticsearchRepository {
#Autowired
protected NewIndexLocator newIndexLocator;
#Value(value = "${elasticsearch.client.timeout}")
private Long timeout;
#Autowired
protected TransportClient transportClient;
#Autowired
protected ThresholdService thresholdService;
#Autowired
protected MessageSource messageSource;
/**
* #param script the name of the script to be executed
* #param templateParams a map of the parameters to be sent to the script
* #param indexName the index to target (an empty indexName will search all indexes)
*
* #return a Search Response object containing details of the request results from Elasticsearch
*
* #throws NoNodeAvailableException thrown when the transport client cannot connect to any ES Nodes (or Coordinators)
* #throws Exception thrown for all other request errors such as parsing and non-connectivity related issues
*/
protected SearchResponse getSearchResponse(String script, Map<String, Object> templateParams, String... indexName) {
log.debug("transport client >> index name --> {}", Arrays.toString(indexName));
SearchResponse searchResponse;
try {
searchResponse = new SearchTemplateRequestBuilder(transportClient)
.setScript(script)
.setScriptType(ScriptType.STORED)
.setScriptParams(templateParams)
.setRequest(new SearchRequest(indexName))
.execute()
.actionGet(timeout)
.getResponse();
} catch (NoNodeAvailableException e) {
log.error(ELASTIC_SEARCH_EXCEPTION_NOT_FOUND, e.getMessage());
throw new ElasticSearchException(ELASTIC_SEARCH_EXCEPTION_NOT_FOUND);
} catch (Exception e) {
log.error(ELASTIC_SEARCH_EXCEPTION, e.getMessage());
throw new ElasticSearchException(ELASTIC_SEARCH_EXCEPTION);
}
log.debug("searchResponse ==> {}", searchResponse);
return searchResponse;
}
So, I would be grateful if you could have a look on the example class and share your genuine solutions with me here about how could I mock TransportClient and get a proper response from SearchResponse object.
Note:
I tried to use ESTestCase from org.elasticsearch.test:framework:6.2.4 but faced jar hell issue and could't resolve it. In the meantime, I could't find any proper docs related to that or Java ES unit testing, in general.
OK, everyone. Another Dataflow question from a Dataflow newbie. (Just started playing with it this week..)
I'm creating a datapipe to take in a list of product names and generate autocomplete data. The data processing part is all working fine, it seems, but I'm missing something obvious because when I add my last ".apply" to use either DatastoreIO or TextIO to write the data out, I'm getting a syntax error in my IDE that says the following:
"The method apply(DatastoreV1.Write) is undefined for the type ParDo.SingleOutput>,Entity>"
If gives me an option add a cast to the method receiver, but that obviously isn't the answer. Do I need to do some other step before I try to write the data out? My last step before trying to write the data is a call to an Entity helper for Dataflow to change my Pipeline structure from > to , which seems to me like what I'd need to write to Datastore.
I got so frustrated with this thing the last few days, I even decided to write the data to some AVRO files instead so I could just load it in Datastore by hand. Imagine how ticked I was when I got all that done and got the exact same error in the exact same place on my call to TextIO. That is why I think I must be missing something very obvious here.
Here is my code. I included it all for reference, but you probably just need to look at the main[] at the bottom. Any input would be greatly appreciated! Thanks!
MrSimmonsSr
package com.client.autocomplete;
import com.client.autocomplete.AutocompleteOptions;
import com.google.datastore.v1.Entity;
import com.google.datastore.v1.Key;
import com.google.datastore.v1.Value;
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;
import com.google.api.services.bigquery.model.TableRow;
import com.google.common.base.MoreObjects;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.DoFn.ProcessContext;
import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
import org.apache.beam.sdk.extensions.jackson.ParseJsons;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.options.Validation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.List;
import java.util.ArrayList;
/*
* A simple Dataflow pipeline to create autocomplete data from a list of
* product names. It then loads that prefix data into Google Cloud Datastore for consumption by
* a Google Cloud Function. That function will take in a prefix and return a list of 10 product names
*
* Pseudo Code Steps
* 1. Load a list of product names from Cloud Storage
* 2. Generate prefixes for use with autocomplete, based on the product names
* 3. Merge the prefix data together with 10 products per prefix
* 4. Write that prefix data to the Cloud Datastore as a KV with a <String>, List<String> structure
*
*/
public class ClientAutocompletePipeline {
private static final Logger LOG = LoggerFactory.getLogger(ClientAutocompletePipeline.class);
/**
* A DoFn that keys each product name by all of its prefixes.
* This creates one row in the PCollection for each prefix<->product_name pair
*/
private static class AllPrefixes
extends DoFn<String, KV<String, String>> {
private final int minPrefix;
private final int maxPrefix;
public AllPrefixes(int minPrefix) {
this(minPrefix, 10);
}
public AllPrefixes(int minPrefix, int maxPrefix) {
this.minPrefix = minPrefix;
this.maxPrefix = maxPrefix;
}
#ProcessElement
public void processElement(ProcessContext c) {
String productName= c.element().toString();
for (int i = minPrefix; i <= Math.min(productName.length(), maxPrefix); i++) {
c.output(KV.of(productName.substring(0, i), c.element()));
}
}
}
/**
* Takes as input the top product names per prefix, and emits an entity
* suitable for writing to Cloud Datastore.
*
*/
static class FormatForDatastore extends DoFn<KV<String, List<String>>, Entity> {
private String kind;
private String ancestorKey;
public FormatForDatastore(String kind, String ancestorKey) {
this.kind = kind;
this.ancestorKey = ancestorKey;
}
#ProcessElement
public void processElement(ProcessContext c) {
// Initialize an EntityBuilder and get it a valid key
Entity.Builder entityBuilder = Entity.newBuilder();
Key key = makeKey(kind, ancestorKey).build();
entityBuilder.setKey(key);
// New HashMap to hold all the properties of the Entity
Map<String, Value> properties = new HashMap<>();
String prefix = c.element().getKey();
String productsString = "Products[";
// iterate through the product names and add each one to the productsString
for (String productName : c.element().getValue()) {
// products.add(productName);
productsString += productName + ", ";
}
productsString += "]";
properties.put("prefix", makeValue(prefix).build());
properties.put("products", makeValue(productsString).build());
entityBuilder.putAllProperties(properties);
c.output(entityBuilder.build());
}
}
/**
* Options supported by this class.
*
* <p>Inherits standard Beam example configuration options.
*/
public interface Options
extends AutocompleteOptions {
#Description("Input text file")
#Validation.Required
String getInputFile();
void setInputFile(String value);
#Description("Cloud Datastore entity kind")
#Default.String("prefix-product-map")
String getKind();
void setKind(String value);
#Description("Whether output to Cloud Datastore")
#Default.Boolean(true)
Boolean getOutputToDatastore();
void setOutputToDatastore(Boolean value);
#Description("Cloud Datastore ancestor key")
#Default.String("root")
String getDatastoreAncestorKey();
void setDatastoreAncestorKey(String value);
#Description("Cloud Datastore output project ID, defaults to project ID")
String getOutputProject();
void setOutputProject(String value);
}
public static void main(String[] args) throws IOException{
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
// create the pipeline
Pipeline p = Pipeline.create(options);
PCollection<String> toWrite = p
// A step to read in the product names from a text file on GCS
.apply(TextIO.read().from("gs://sample-product-data/clean_product_names.txt"))
// Next expand the product names into KV pairs with prefix as key (<KV<String, String>>)
.apply("Explode Prefixes", ParDo.of(new AllPrefixes(2)))
// Apply a GroupByKey transform to the PCollection "flatCollection" to create "productsGroupedByPrefix".
.apply(GroupByKey.<String, String>create())
// Now format the PCollection for writing into the Google Datastore
.apply("FormatForDatastore", ParDo.of(new FormatForDatastore(options.getKind(),
options.getDatastoreAncestorKey()))
// Write the processed data to the Google Cloud Datastore
// NOTE: This is the line that I'm getting the error on!!
.apply(DatastoreIO.v1().write().withProjectId(MoreObjects.firstNonNull(
options.getOutputProject(), options.getOutputProject()))));
// Run the pipeline.
PipelineResult result = p.run();
}
}
I think you need another closing parenthesis. I've removed some of the extraneous bits and reindent according to the parentheses:
PCollection<String> toWrite = p
.apply(TextIO.read().from("..."))
.apply("Explode Prefixes", ...)
.apply(GroupByKey.<String, String>create())
.apply("FormatForDatastore", ParDo.of(new FormatForDatastore(
options.getKind(), options.getDatastoreAncestorKey()))
.apply(...);
Specifically, you need another parenthesis to close the apply("FormatForDatastore", ...). Right now, it is trying to call ParDo.of(...).apply(...) which doesn't work.
what does this Spring JSON endpoint break in Jboss/Tomcat ? I tried to add this to an existing APPLICATION and It worked until I started refactoring the code and now the errors do not point to anything that is logical to me.
Here is my code a controller and Helper class to keep things clean.
Controller.Java
import java.io.IOException;
import java.util.Properties;
import javax.annotation.Nonnull;
import org.apache.commons.configuration.ConfigurationException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
#Controller
public class PropertiesDisplayController {
#Nonnull
private final PropertiesDisplayHelper propertiesHelper;
/**
* #param propertiesHelper
*/
#Nonnull
#Autowired
public PropertiesDisplayController(#Nonnull final PropertiesDisplayHelper propertiesHelper) {
super();
this.propertiesHelper = propertiesHelper;
}
#Nonnull
#RequestMapping("/localproperties")
public #ResponseBody Properties localProperties() throws ConfigurationException, IOException {
return propertiesHelper.getLocalProperties();
}
#Nonnull
#RequestMapping("/properties")
public #ResponseBody Properties applicationProperties() throws IOException,
ConfigurationException {
return propertiesHelper.getApplicationProperties();
}
}
this would be the Helper.java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import javax.annotation.Nonnull;
import org.apache.commons.configuration.Configuration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.StringUtils;
public class PropertiesDisplayHelper {
/** Static location of properties in for SSO */
private static String LOCAL_PROPERTIES_LOCATION =
"local.properties";
/** Static strings for masking the passwords and blank values */
private static String NOVALUE = "**NO VALUE**";
/** Static strings for masking the passwords and blank values */
private static String MASKED = "**MASKED**";
#Nonnull
public Properties getApplicationProperties() throws ConfigurationException {
final Properties properties = new Properties();
final Configuration configuration = AppConfiguration.Factory.getConfiguration();
// create a map of properties
final Iterator<?> propertyKeys = configuration.getKeys();
final Map<String, String> sortedProperties = new TreeMap<String, String>();
// loops the configurations and builds the properties
while (propertyKeys.hasNext()) {
final String key = propertyKeys.next().toString();
final String value = configuration.getProperty(key).toString();
sortedProperties.put(key, value);
}
properties.putAll(sortedProperties);
// output of the result
formatsPropertiesData(properties);
return properties;
}
#Nonnull
public Properties getLocalProperties() throws ConfigurationException, IOException {
FileInputStream fis = null;
final Properties properties = new Properties();
// imports file local.properties from specified location
// desinated when the update to openAM12
try {
fis = new FileInputStream(LOCAL_PROPERTIES_LOCATION);
properties.load(fis);
} finally {
// closes file input stream
IOUtils.closeQuietly(fis);
}
formatsPropertiesData(properties);
return properties;
}
void formatsPropertiesData(#Nonnull final Properties properties) {
for (final String key : properties.stringPropertyNames()) {
String value = properties.getProperty(key);
if (StringUtils.isEmpty(value)) {
value = NOVALUE;
} else if (key.endsWith("ssword")) {
value = MASKED;
} else {
value = StringEscapeUtils.escapeHtml(value);
}
// places data to k,v paired properties object
properties.put(key, value);
}
}
}
they set up a json display of the properties in application and from a file for logging. Yet now this no intrusive code seems to break my entire application build.
Here is the error from Jboss
20:33:41,559 ERROR [org.jboss.as.server] (DeploymentScanner-threads - 1) JBAS015870: Deploy of deployment "openam.war" was rolled back with the following failure message:
{"JBAS014671: Failed services" => {"jboss.deployment.unit.\"APPLICATION.war\".STRUCTURE" => "org.jboss.msc.service.StartException in service jboss.deployment.unit.\"APPLICATION.war\".STRUCTURE: JBAS018733: Failed to process phase STRUCTURE of deployment \"APPLICATION.war\"
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: JBAS018740: Failed to mount deployment content
Caused by: java.io.FileNotFoundException:APPLICATION.war (Access is denied)"}}
and the Errors from Tomcat
http://pastebin.com/PXdcpqvc
I am at a lost here and think there is something I just do not see.
A simple solution was at hand. The missing component was #Component annotation on the helper class.