Spring Integration: Transformer: file to Object - Java

I am new to Spring Integration and I am trying to read a file and transform it into a custom object, which has to be sent to a JMS queue wrapped in a jms.Message.
It all has to be done using annotations.
I am reading the files from a directory using the code below.
@Bean
@InboundChannelAdapter(value = "filesChannel", poller = @Poller(fixedRate = "5000", maxMessagesPerPoll = "1"))
public MessageSource<File> fileReadingMessageSource() {
    FileReadingMessageSource source = new FileReadingMessageSource();
    source.setDirectory(new File(INBOUND_PATH));
    source.setAutoCreateDirectory(false);
    /*source.setFilter(new AcceptOnceFileListFilter());*/
    source.setFilter(new CompositeFileListFilter<File>(getFileFilters()));
    return source;
}
The next step is transforming the file content into an Invoice object (for example).
I want to know what the incoming message type for my transformer would be and how I should transform it. Could you please help here? I am not sure what the incoming data type would be and what the transformed object type should be (should it be wrapped inside a Message?).
@Transformer(inputChannel = "filesChannel", outputChannel = "jmsOutBoundChannel")
public ? convertFiletoInvoice(? fileMessage) {
}

The payload is a File (java.io.File).
You can read the file and output whatever you want (String, byte[], Invoice, etc.).
Or you can use one of the standard transformers (e.g. FileToStringTransformer, JsonToObjectTransformer, etc.).
The JMS adapter will convert the object to a TextMessage, ObjectMessage, etc.
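For illustration, here is a minimal sketch of such a transformer, assuming the file contains JSON that maps onto your Invoice class (the Invoice type and the use of Jackson are assumptions, not something prescribed by Spring Integration). The framework passes the File payload into the method and wraps the returned Invoice in a Message on the output channel:
import java.io.File;
import java.io.IOException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.integration.annotation.Transformer;

@Transformer(inputChannel = "filesChannel", outputChannel = "jmsOutBoundChannel")
public Invoice convertFileToInvoice(File file) throws IOException {
    // The file is assumed to contain JSON matching the (hypothetical) Invoice class.
    ObjectMapper mapper = new ObjectMapper();
    return mapper.readValue(file, Invoice.class);
}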

Related

Streaming data from Kinesis to S3 fails with Illegal Character that KPL itself writes

I have a relatively straightforward use case:
1. Read Avro data from a Kafka topic
2. Use KPL (v0.14.12) to send this data to Kinesis Data Streams
3. Use Kinesis Firehose to transform this data into Parquet and transfer it to S3.
The Kafka topic was written into by Kafka Streams using the following producer Configuration:
private void addAwsGlueSpecificProperties(Map<String, Object> props) {
    props.put(AWSSchemaRegistryConstants.AWS_REGION, "eu-central-1");
    props.put(AWSSchemaRegistryConstants.DATA_FORMAT, DataFormat.AVRO.name());
    props.put(AWSSchemaRegistryConstants.SCHEMA_AUTO_REGISTRATION_SETTING, true);
    props.put(AWSSchemaRegistryConstants.REGISTRY_NAME, "Kinesis_Schema_Registry");
    props.put(AWSSchemaRegistryConstants.COMPRESSION_TYPE, AWSSchemaRegistryConstants.COMPRESSION.ZLIB.name());
    props.put(DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    props.put(DEFAULT_VALUE_SERDE_CLASS_CONFIG, GlueSchemaRegistryKafkaStreamsSerde.class.getName());
}
Most notably, I've set SCHEMA_AUTO_REGISTRATION_SETTING to true to try and rule out problems with my schema definition. The auto-registration itself worked without any issues.
I have a very simple loop running for test purposes, which does steps 1 and 2 of the above. It looks as follows:
KinesisProducer kinesisProducer = new KinesisProducer(getKinesisConfig());
try (final KafkaConsumer<String, AvroEvent> consumer = new KafkaConsumer<>(properties)) {
    consumer.subscribe(Collections.singletonList(TOPIC));
    while (true) {
        log.info("Polling...");
        final ConsumerRecords<String, AvroEvent> records = consumer.poll(Duration.ofMillis(100));
        for (final ConsumerRecord<String, AvroEvent> record : records) {
            final String key = record.key();
            final AvroEvent value = record.value();
            ListenableFuture<UserRecordResult> request = kinesisProducer.addUserRecord("my-data-stream", key, randomExplicitHashKey(), value.toByteBuffer(), gsrSchema);
            Futures.addCallback(request, CALLBACK, executor);
        }
        Thread.sleep(Duration.ofSeconds(10).toMillis());
    }
}
The callback just does a bit of logging on success/failure.
My Kinesis Config looks as follows:
private static KinesisProducerConfiguration getKinesisConfig() {
    KinesisProducerConfiguration config = new KinesisProducerConfiguration();
    GlueSchemaRegistryConfiguration schemaRegistryConfiguration = getGlueSchemaRegistryConfiguration();
    config.setGlueSchemaRegistryConfiguration(schemaRegistryConfiguration);
    config.setRegion("eu-central-1");
    config.setCredentialsProvider(new DefaultAWSCredentialsProviderChain());
    config.setMaxConnections(2);
    config.setThreadingModel(KinesisProducerConfiguration.ThreadingModel.POOLED);
    config.setThreadPoolSize(2);
    config.setRateLimit(100L);
    return config;
}

private static GlueSchemaRegistryConfiguration getGlueSchemaRegistryConfiguration() {
    GlueSchemaRegistryConfiguration gsrConfig = new GlueSchemaRegistryConfiguration("eu-central-1");
    gsrConfig.setAvroRecordType(AvroRecordType.GENERIC_RECORD); // have also tried SPECIFIC_RECORD
    gsrConfig.setRegistryName("Kinesis_Schema_Registry");
    gsrConfig.setCompressionType(AWSSchemaRegistryConstants.COMPRESSION.ZLIB);
    return gsrConfig;
}
This setup allows me to read Specific Avro records from Kafka and send them to Kinesis. I have also verified that the correct schema version ID is queried from GSR by my code. However, when my data gets to Firehose, I receive only the following error message for all my records (one per record):
{
    "attemptsMade": 1,
    "arrivalTimestamp": 1659622848304,
    "lastErrorCode": "DataFormatConversion.ParseError",
    "lastErrorMessage": "Encountered malformed JSON. Illegal character ((CTRL-CHAR, code 3)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: com.fasterxml.jackson.databind.util.ByteBufferBackedInputStream#6252e7eb; line: 1, column: 2]",
    "attemptEndingTimestamp": 1659623152452,
    "rawData": "<base64EncodedData>",
    "sequenceNumber": "<seqNum>",
    "dataCatalogTable": {
        "databaseName": "<Glue database name>",
        "tableName": "<Glue table name>",
        "region": "eu-central-1",
        "versionId": "LATEST",
        "roleArn": "<arn>"
    }
}
Unfortunately I can't post the entirety of the data as it is sensitive. However, the relevant part is that it always starts with the above control character that is causing the problem:
0x03 0x05 <schemaVersionId> <data>
My original data does not contain these control characters. After some debugging, I've found that KPL explicitly adds these bytes to the beginning of a UserRecord. In com.amazonaws.services.schemaregistry.serializers.SerializationDataEncoder#write:
public byte[] write(final byte[] objectBytes, UUID schemaVersionId) {
    byte[] bytes;
    try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        writeHeaderVersionBytes(out);
        writeCompressionBytes(out);
        writeSchemaVersionId(out, schemaVersionId);
        boolean shouldCompress = this.compressionHandler != null;
        bytes = writeToExistingStream(out, shouldCompress ? compressData(objectBytes) : objectBytes);
    } catch (Exception e) {
        throw new AWSSchemaRegistryException(e.getMessage(), e);
    }
    return bytes;
}
With writeHeaderVersionBytes(out) and writeCompressionBytes(out) writing to the front of the stream, respectively:
// byte HEADER_VERSION_BYTE = (byte) 3;
private void writeHeaderVersionBytes(ByteArrayOutputStream out) {
    out.write(AWSSchemaRegistryConstants.HEADER_VERSION_BYTE);
}

// byte COMPRESSION_BYTE = (byte) 5
// byte COMPRESSION_DEFAULT_BYTE = (byte) 0
private void writeCompressionBytes(ByteArrayOutputStream out) {
    out.write(compressionHandler != null ? AWSSchemaRegistryConstants.COMPRESSION_BYTE
            : AWSSchemaRegistryConstants.COMPRESSION_DEFAULT_BYTE);
}
Why is Kinesis unable to parse a message that is produced by the library that is supposed to be best suited for writing to it? What am I missing?
I've finally figured out the problem and it's quite dumb.
What it boils down to is that the transformer that converts data to Parquet in Firehose expects a pure JSON payload. It expects records in the form:
{"itemId": 1, "itemName": "someItem"}{"itemId": 2, "itemName": "otherItem"}
It seemingly does not accept the same data in a different format.
This means that Avro-compatible JSON (where the above itemId would look like "itemId": {"long": 1}), or e.g. binary Avro data, is not compatible with the Kinesis Firehose Parquet transformer, regardless of the fact that my schema definition in the Glue Schema Registry is explicitly registered as being in Avro format.
In addition, the Firehose Parquet transformer requires the use of a Glue table. Creating this table from an imported Avro schema simply does not work (see this answer), so the table had to be created manually. Luckily, even though Firehose can't use the table that is based on an existing schema, the table definition was the same (with the exception of the Serde it needs to use), so it was relatively easy to fix...
To sum up, to get the above code to work I had to:
Create a Glue table for the schema manually (you can use the first table created from the existing schema as a template for creating this second table, but you can't have Firehose link to the first table)
Change the above code:
kinesisProducer.addUserRecord("my-data-stream", key, randomExplicitHashKey(), value.toByteBuffer(), gsrSchema);
to:
ByteBuffer data = ByteBuffer.wrap(value.toString().getBytes(StandardCharsets.UTF_8));
kinesisProducer.addUserRecord("my-data-stream", key, randomExplicitHashKey(), data);
Note that I am now using the overloaded addUserRecord function that does not include a Schema parameter, which internally invokes the previous function with a null schema parameter. This prevents the KPL from encoding my payload and instead sends the 'plain' JSON over to KDS.
This is contrary to the only AWS Docs example that I could find on the topic, which is likely meant for a Firehose stream that does not convert the data prior to sending it to its destination.
I can't quite understand the reasons for all these undocumented limitations, and it was a pain to debug, seeing how neither the KPL functions nor KDS explicitly mentions anywhere that I could find that this is the expected behaviour. I feel like it's not worth trying to open an issue/PR over at the KPL repo, seeing how Amazon doesn't really seem to care that much about maintaining it...
I'll probably switch over to the plain Kinesis Client + Kinesis Aggregation for a more robust solution in the future, but hey, at least it works.

How to convert Flux of ByteBuffer to Spring BodyInserter

I have a use case to read a file from S3 and publish it to a REST service in Java.
For the implementation, I am using the AWS SDK S3 API to read the file, which returns a Flux<ByteBuffer>, and then publishing to the REST service using the Spring WebClient.
Per my exploration, the Spring WebClient requires a BodyInserter, which can be prepared using BodyInserters.fromDataBuffers. I am unable to figure out how to properly convert Flux<ByteBuffer> to Flux<DataBuffer> and call the WebClient exchange:
Flux<ByteBuffer> byteBufferFlux = getS3File(key);
Flux<DataBuffer> dataBufferFlux = byteBufferFlux.map(byteBuffer -> {
    // ????? convert ByteBuffer to DataBuffer ?????
    return dataBuffer;
});
BodyInserter<Flux<DataBuffer>, ReactiveHttpOutputMessage> inserter = BodyInserters.fromDataBuffers(dataBufferFlux);
Any suggestions how to achieve this?
You can convert using DefaultDataBuffer, which you can create via the DefaultDataBufferFactory:
DataBufferFactory dataBufferFactory = new DefaultDataBufferFactory();
Flux<DataBuffer> buffer = getS3File(key).map(dataBufferFactory::wrap);
BodyInserter<Flux<DataBuffer>, ReactiveHttpOutputMessage> inserter =
BodyInserters.fromDataBuffers(buffer);
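If you go the inserter route, it can then be passed to the WebClient call. A minimal usage sketch, assuming a hypothetical base URL and endpoint, and blocking only to keep the example simple:
WebClient.create("http://someUrl")   // hypothetical base URL
        .post()
        .uri("/someUri")             // hypothetical endpoint
        .body(inserter)
        .retrieve()
        .bodyToMono(Void.class)
        .block();                    // block only for demonstration purposes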
You don't actually need a BodyInserter at all, though. If using WebClient, you can use the following method signature for body():
<T, P extends Publisher<T>> RequestHeadersSpec<?> body(P publisher, Class<T> elementClass);
You can then pass your Flux<ByteBuffer> directly into it, whilst specifying the Class to use:
WebClient.create("http://someUrl")
        .post()
        .uri("/someUri")
        .body(getS3File(key), ByteBuffer.class)
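Note that this snippet only builds the request; to actually execute it you would still terminate the chain, for example with .retrieve().bodyToMono(Void.class) or one of the exchange variants (the exact response handling is an assumption and depends on what your endpoint returns).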
You may not need dataBufferFlux at all and should be able to write the Flux<ByteBuffer> to your REST endpoint directly. Try this:
Flux<ByteBuffer> byteBufferFlux = getS3File(key);
BodyInserter<Flux<ByteBuffer>, ReactiveHttpOutputMessage> inserter =
        BodyInserters.fromPublisher(byteBufferFlux, ByteBuffer.class);

Spring Batch creating multiple files - Gradle-based project

I need to create 3 separate files.
My Batch job should read from Mongo, then parse through the information and find the "business" column (3 types of business: RETAIL, HPP, SAX), then create a file for each respective business. The file name should be RETAIL + formattedDate, HPP + formattedDate, or SAX + formattedDate, and the information found in the DB should go inside a txt file. Also, I need to change .resource(new FileSystemResource("C:\\filewriter\\index.txt")) into something that will send the information to the right location; right now hard coding works but only creates one .txt file.
example:
@Bean
public FlatFileItemWriter<PaymentAudit> writer() {
    LOG.debug("Mongo-writer");
    FlatFileItemWriter<PaymentAudit> flatFile = new FlatFileItemWriterBuilder<PaymentAudit>()
            .name("flatFileItemWriter")
            .resource(new FileSystemResource("C:\\filewriter\\index.txt"))
            // trying to create a path instead of hard coding it
            .lineAggregator(createPaymentPortalLineAggregator())
            .build();
    String exportFileHeader = "CREATE_DTTM";
    StringHeaderWriter headerWriter = new StringHeaderWriter(exportFileHeader);
    flatFile.setHeaderCallback(headerWriter);
    return flatFile;
}
My idea would be something like the following, but I am not sure where to go from here:
public Map<String, List<PaymentAudit>> getPaymentPortalRecords() {
    List<PaymentAudit> recentlyCreated =
            PaymentPortalRepository.findByCreateDttmBetween(yesterdayMidnight, yesterdayEndOfDay);
    List<PaymentAudit> retailList = new ArrayList<>();
    List<PaymentAudit> saxList = new ArrayList<>();
    List<PaymentAudit> hppList = new ArrayList<>();
    // String exportFilePath = "C://filewriter/"; ??????
    recentlyCreated.parallelStream().forEach(paymentAudit -> {
        if (paymentAudit.getBusiness().equalsIgnoreCase(RETAIL)) {
            retailList.add(paymentAudit);
        } else if (paymentAudit.getBusiness().equalsIgnoreCase(SAX)) {
            saxList.add(paymentAudit);
        } else if (paymentAudit.getBusiness().equalsIgnoreCase(HPP)) {
            hppList.add(paymentAudit);
        }
    });
To create a file for each business object type, you can use the ClassifierCompositeItemWriter. In your case, you can create a writer for each type and add them as delegates in the composite item writer.
As for creating the filename dynamically, you need to use a step-scoped writer. There is an example in the Step Scope section of the reference documentation.
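A rough sketch of that approach, reusing the PaymentAudit type and line aggregator from your configuration; the delegate bean wiring, business constants, and date format are assumptions for illustration:
@Bean
@StepScope
public FlatFileItemWriter<PaymentAudit> retailWriter() {
    // One delegate per business; the file name is built from the business and the current date.
    String formattedDate = LocalDate.now().format(DateTimeFormatter.BASIC_ISO_DATE);
    return new FlatFileItemWriterBuilder<PaymentAudit>()
            .name("retailWriter")
            .resource(new FileSystemResource("C:\\filewriter\\RETAIL" + formattedDate + ".txt"))
            .lineAggregator(createPaymentPortalLineAggregator())
            .build();
}

@Bean
public ClassifierCompositeItemWriter<PaymentAudit> classifierWriter(
        FlatFileItemWriter<PaymentAudit> retailWriter,
        FlatFileItemWriter<PaymentAudit> hppWriter,
        FlatFileItemWriter<PaymentAudit> saxWriter) {
    ClassifierCompositeItemWriter<PaymentAudit> writer = new ClassifierCompositeItemWriter<>();
    // Route each PaymentAudit to the delegate matching its business column.
    writer.setClassifier(paymentAudit -> {
        switch (paymentAudit.getBusiness().toUpperCase()) {
            case "RETAIL": return retailWriter;
            case "HPP":    return hppWriter;
            default:       return saxWriter; // SAX
        }
    });
    return writer;
}
Because ClassifierCompositeItemWriter does not implement ItemStream, remember to register the delegate writers as streams on the step so they are opened and closed properly.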
Hope this helps.

Convert JSON message to javax.jms.ObjectMessage in ActiveMq

I have an ActiveMQ consumer which expects a message in javax.jms.ObjectMessage format.
The message POJO has 5 String fields.
Now I am trying to write a message producer for this consumer in NodeJS.
I am using the stompit module.
My current NodeJS code is:
stompit.connect(connectOptions, function(error, client) {
    if (error) {
        console.log('connect error ' + error.message);
        return;
    } else {
        console.log("connected");
    }
    var sendHeaders = {
        'destination': '/queue/test',
        'transformation': 'jms-object-json'
    };
    var msg = new Object();
    msg.val1 = "12";
    msg.val2 = "test";
    msg.val3 = "1";
    msg.val4 = "1";
    msg.val5 = "Y";
    var frame = client.send(sendHeaders);
    frame.write(JSON.stringify(msg));
    frame.end();
});
The Java consumer is able to get the message but throws the exception:
org.apache.activemq.command.ActiveMQTextMessage cannot be cast to javax.jms.ObjectMessage
I have read this page from ActiveMQ, which says that:
Currently, ActiveMQ comes with a transformer that can transform XML/JSON text to Java objects, but you can add your own transformers as well
I didn't quite understand this part about how to convert the data.
I have added xstream-1.4.10.jar and jettison-1.3.8.jar to apache-activemq-5.15.0\lib and restarted the ActiveMQ server.
But I still get the error in the consumer.
Also, in the ActiveMQ console -> Queues -> message properties, it shows transformation-error.
Please let me know how I can convert this ActiveMQTextMessage type to javax.jms.ObjectMessage before it reaches the consumer.
There isn't a transformer in ActiveMQ that will convert any random JSON string into an ObjectMessage; you'd have to write your own to handle whatever format you are sending. The converter in ActiveMQ will convert some basic types that map from the JSON, but it's tricky and not necessarily reliable. You are better off handling the TextMessage and doing something meaningful with the JSON yourself.
ActiveMQTextMessage and ObjectMessage are different; they can't be cast to each other.
From an ActiveMQTextMessage you can get the actual message content as a String; then you have to transform it into a Java object from the JSON yourself.
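A minimal sketch of that consumer-side approach, assuming a hypothetical MyPayload POJO with the five String fields and using Jackson to parse the JSON out of the TextMessage:
import java.io.IOException;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.TextMessage;
import com.fasterxml.jackson.databind.ObjectMapper;

public void onMessage(Message message) {
    try {
        if (message instanceof TextMessage) {
            String json = ((TextMessage) message).getText();
            // MyPayload is a hypothetical POJO with val1..val5 fields matching the producer.
            MyPayload payload = new ObjectMapper().readValue(json, MyPayload.class);
            // ... handle payload ...
        }
    } catch (JMSException | IOException e) {
        e.printStackTrace(); // log/handle the failure as appropriate
    }
}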

Deserialize Avro messages into specific datum using KafkaAvroDecoder

I'm reading from a Kafka topic, which contains Avro messages serialized using the KafkaAvroEncoder (which automatically registers the schemas with the topics). I'm using the maven-avro-plugin to generate plain Java classes, which I'd like to use upon reading.
The KafkaAvroDecoder only supports deserializing into GenericData.Record types, which (in my opinion) misses the whole point of having a statically typed language. My deserialization code currently looks like this:
SpecificDatumReader<event> reader = new SpecificDatumReader<>(
        event.getClassSchema() // event is my class generated from the schema
);
byte[] in = ...; // my input bytes
ByteBuffer stuff = ByteBuffer.wrap(in);
// the KafkaAvroEncoder puts a magic byte and the ID of the schema (as stored
// in the schema-registry) before the serialized message
if (stuff.get() != 0x0) {
    return;
}
int id = stuff.getInt();
// lets just ignore those special bytes
int length = stuff.limit() - 4 - 1;
int start = stuff.position() + stuff.arrayOffset();
Decoder decoder = DecoderFactory.get().binaryDecoder(
        stuff.array(), start, length, null
);
try {
    event ev = reader.read(null, decoder);
} catch (IOException e) {
    e.printStackTrace();
}
I found my solution cumbersome, so I'd like to know if there is a simpler solution to do this.
Thanks to the comment I was able to find the answer. The secret was to instantiate the KafkaAvroDecoder with a Properties object specifying the use of the specific Avro reader, that is:
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroSerializer.class);
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "...");
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);
VerifiableProperties vProps = new VerifiableProperties(props);
KafkaAvroDecoder decoder = new KafkaAvroDecoder(vProps);
MyLittleData data = (MyLittleData) decoder.fromBytes(input);
The same configuration applies when using the KafkaConsumer<K, V> class directly. (I'm consuming from Kafka in Storm using the KafkaSpout from the storm-kafka project, which uses the SimpleConsumer, so I have to deserialize the messages manually. For the courageous, there is the storm-kafka-client project, which does this automatically by using the new-style consumer.)
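For reference, a sketch of that direct KafkaConsumer<K, V> setup, with placeholder bootstrap servers, group id, and topic; the KafkaAvroDeserializer-based configuration is my assumption of the equivalent new-consumer wiring, assuming the Confluent serializer artifacts are on the classpath:
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "...");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group"); // placeholder group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        io.confluent.kafka.serializers.KafkaAvroDeserializer.class);
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "...");
props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);

try (KafkaConsumer<String, MyLittleData> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
    ConsumerRecords<String, MyLittleData> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, MyLittleData> record : records) {
        MyLittleData data = record.value(); // already deserialized into the specific class
    }
}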
