I have a message carrying XML (an order) containing multiple homogeneous nodes (think a list of products) in addition to other info (think address, customer details, etc.). I have to enrich each 'product' with details provided by another external service and return the same complete XML 'order' message with enriched 'products'.
I came up with this sequence of steps:
Split the original XML with XPath into separate messages (also keeping the original message)
Enrich split messages with additional data
Put enriched parts back into the original message by replacing old elements.
I was trying to use a multicast, sending the original message both to an endpoint where the splitting and enriching is done and to an aggregation endpoint where the original message and the split-and-enriched messages are aggregated and then passed to a processor responsible for combining these parts back into a single XML file. But I couldn't get the desired effect...
What would be the correct and nice way to solve this problem?
The Splitter EIP in Camel can aggregate the split messages back (acting as the Composed Message Processor EIP).
http://camel.apache.org/splitter
See this video, which demonstrates such a use case:
http://davsclaus.blogspot.com/2011/09/video-using-splitter-eip-and-aggregate.html
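The split-enrich-recombine steps above can be sketched as a single Camel route. This is only a sketch: the endpoint names ("direct:orders", "bean:productEnricher") and the ProductMergeStrategy class are hypothetical, and the XPath must match your actual schema.

```java
import org.apache.camel.builder.RouteBuilder;

public class OrderEnrichmentRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("direct:orders")
            .split().xpath("/order/products/product")
                // ProductMergeStrategy is a hypothetical custom
                // AggregationStrategy that merges each enriched <product>
                // back into the original order document
                .aggregationStrategy(new ProductMergeStrategy())
                .to("bean:productEnricher")   // enrich via the external service
            .end()
            // after .end(), the exchange holds the recombined, enriched order
            .to("direct:enrichedOrders");
    }
}
```

The key point is attaching the aggregation strategy to the split itself, so no separate multicast/aggregator wiring is needed.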
Related
I have a use case where my messages are being split twice, and I want to aggregate all these messages. How can this best be achieved? Should I aggregate the messages twice by introducing different sequence headers, or is there a way to aggregate the messages in a single aggregating step by overriding the method by which messages are grouped?
That's called nested splitting, and there is a built-in algorithm that pushes the sequence-detail headers onto a stack for each new splitting context. This allows an ascending aggregation at the end: the first aggregator handles the innermost split, pops the sequence-detail headers, and lets the next aggregator deal with its own sequence context.
So, in short: it is better to have as many aggregators as you have splits if you want to send a single message at the start and receive a single message at the end.
Of course, you can use a custom splitting algorithm with applySequence = false (as many as you need) and have only a single aggregator at the end, but then with custom correlation logic.
We have some explanation in the docs: https://docs.spring.io/spring-integration/docs/current/reference/html/message-routing.html#aggregatingmessagehandler
Starting with version 5.3, after processing message group, an AbstractCorrelatingMessageHandler performs a MessageBuilder.popSequenceDetails() message headers modification for the proper splitter-aggregator scenario with several nested levels.
We don't have a sample on the matter, but here is the configuration for a test case: https://github.com/spring-projects/spring-integration/blob/main/spring-integration-core/src/test/java/org/springframework/integration/aggregator/scenarios/NestedAggregationTests-context.xml
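The one-aggregator-per-split shape can be sketched with the Spring Integration Java DSL. The flow name and the identity handler below are illustrative only; the point is the pairing of each split() with an aggregate() in reverse order.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;

@Configuration
public class NestedSplitConfig {

    @Bean
    public IntegrationFlow nestedSplitFlow() {
        return f -> f
                .split()        // outer split: pushes sequence details
                .split()        // inner split: pushes another level
                .handle((payload, headers) -> payload)  // per-item work goes here
                .aggregate()    // inner aggregator: pops sequence details (5.3+)
                .aggregate();   // outer aggregator: correlates the outer split
    }
}
```

With version 5.3+, the inner aggregator's popSequenceDetails() behavior restores the outer sequence headers so the second aggregate() can correlate on them without custom logic.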
I have an AMQ queue I'm connecting to. The publisher is sending JMS TextMessages. The messages are of different types: foo-updated, bar-updated, baz-updated etc., all on the single queue.
The payload/message body is JSON for all types, and their schemas have significant overlap (but no direct type information).
The publishing team said "search the string/text of the message for foo, and if it's there, it's a foo-updated message".
That sounds wrong.
There may be header information in the JMS message I can use (I'm exploring that), but assuming I can influence the publisher (though not necessarily change anything), what's the best way to handle this?
If you have influence over using JMS topics, you should use them. Just like REST URLs, you could use topics to indicate resources and the actions on those: foo/create, foo/update, bar/update
Then the JMS broker can help you route different messages efficiently to different consumers, e.g. one consumer subscribing to foo/* and another to */update.
If you are stuck with a queue, the publisher should add additional information as header properties, for example type=foo and action=update. Then your consumer can specify a JMS selector like "action = 'update'" to receive only some of the messages.
Otherwise you are actually stuck with looking into the content :-(
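Assuming the publisher does add such header properties, the selector approach might look like this. This is a sketch only: the property names (type, action) are whatever you agree on with the publisher, and the Session and Queue come from your broker's client library.

```java
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

public class FooUpdateConsumer {

    public static MessageConsumer createConsumer(Session session, Queue queue)
            throws JMSException {
        // The selector is evaluated by the broker, so non-matching messages
        // are never delivered to this consumer at all.
        String selector = "type = 'foo' AND action = 'update'";
        return session.createConsumer(queue, selector);
    }
}
```

Note that with a queue (unlike a topic), messages not matching any consumer's selector stay on the queue, so make sure every message type has some consumer.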
Use JMS Message Selectors
See: Message Selectors
I am implementing a mail client in Java and I am retrieving the Message-Id using the command: String[] msgIds = msg.getHeader("Message-Id");
Since getHeader() returns an array, I was wondering if there is any scenario where an email might contain multiple IDs.
I tried testing it by sending/replying/forwarding an email back and forth but it only contained one id every time.
The current specification for internet email message format is RFC 5322. That specifies that an email message can have zero or one "message-id" headers, and that one is recommended. (See page 20 in the linked version)
So any email that has more than one "message-id" header is non-conformant.
However, if you are implementing a mail reader or processor, it is advisable to allow for the possibility of a non-conformant email message. At the very least, your processor should cope with such an email so that it doesn't crash or behave in a destructive fashion. (That kind of fragility could allow someone to attack your mail processor, and maybe the system that it runs on.)
An email should not contain multiple Message-Id headers, but other email header fields may appear multiple times. The getHeader function returns an array to take these into account.
For example, the Received header can be set multiple times to provide a full trace of the servers that handled the email.
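The multi-valued behavior is easy to see with plain string handling of raw RFC 5322 headers. This is a simplified sketch (it ignores folded/continued header lines), not how a real mail library parses messages:

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderDemo {

    // Collect every value of the given header from a raw header block.
    // Simplified: does not handle folded (multi-line) header values.
    static List<String> getHeader(String rawHeaders, String name) {
        List<String> values = new ArrayList<>();
        for (String line : rawHeaders.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).equalsIgnoreCase(name)) {
                values.add(line.substring(colon + 1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String raw = "Received: from a.example by b.example\n"
                   + "Received: from b.example by c.example\n"
                   + "Message-Id: <123@example.com>\n";
        System.out.println(getHeader(raw, "Received").size());  // 2
        System.out.println(getHeader(raw, "Message-Id"));       // [<123@example.com>]
    }
}
```

A conformant message yields a one-element array for Message-Id but may legitimately yield several elements for Received, which is why the API is array-valued for all headers.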
Is there any way to order fields in outgoing messages without rebuilding QuickFIX/J? Or any configuration flag available which orders messages according to any validation file that we might set using some path flag?
See the QuickFIX/J User FAQ, topic "I altered my data dictionary. Should I regenerate/rebuild QF/J?". Specifically following excerpts:
If your DD changes aren't very extensive, maybe just a few field changes, then you don't really need to. If you added a whole new custom message type, then you probably should. If you changed field orders inside of repeating groups, then I recommend that you do, especially if those group changes are in outgoing messages.
And
OUTGOING MSGS: The DD xml file is irrelevant when you construct outgoing messages. You can pretty much add whatever fields you want to messages using the generic field setters (setString, setInt, etc) and QF will let you. The only trouble is with repeating groups. QF will write repeating group element ordering according to the DD that was used for code generation. If you altered any groups that are part of outgoing messages, you DEFINITELY need to rebuild.
From what I gather from this FAQ entry, you need not rebuild for outgoing messages unless the reordering is within repeating groups. If you change field order in repeating groups, you should rebuild.
In any case, it's easy to test: shuffle fields around in a message in the dictionary, reference the custom dictionary in your configuration, then log the message generated by the QuickFIX/J engine.
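The "generic field setters" the FAQ mentions can be illustrated as follows. The tag numbers here (9100, 9101) are hypothetical custom fields, and the message variable stands for whatever outgoing message type you are building:

```java
import quickfix.Message;

public class CustomFieldExample {

    public static Message withCustomFields(Message msg) {
        // For plain (non-group) fields, QF/J serializes whatever you set,
        // regardless of the data dictionary XML...
        msg.setString(9100, "custom-value");
        msg.setInt(9101, 42);
        // ...but repeating-group element ordering comes from the dictionary
        // that was used at code-generation time, hence the rebuild advice.
        return msg;
    }
}
```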
I have to set up Camel to process data files where the first line of the file is the metadata, followed by millions of lines of actual data. The metadata dictates how the data is to be processed. What I am looking for is something like this:
1. Read the first line (metadata) and populate a bean with the metadata.
2. Then send the data 1000 lines at a time to the data processor, which will refer to the bean from step 1.
Is it possible in Apache Camel?
Yes.
An example architecture might look something like this:
You could setup a simple queue that could be populated with file names (or whatever identifier you are using to locate each individual file).
From the queue, you could route through a message-translator bean whose sole job is to translate a request for a filename into a POJO containing the metadata from the first line of the file.
(You have a few options here)
Your approach to processing the 1000-line sets will depend on whether the output resulting from those sets needs to be recomposed into a single message and processed again later. If so, you could implement a composed message processor made up of a message producer/consumer, a message aggregator, and a router. The producer/consumer would receive the POJO with the metadata created in step 2 and enqueue as many new requests as are necessary to process all of the lines in the file. The router would route from this queue through your processing pipeline and into the message aggregator. Once aggregated, a single unified message with all of your important data will be available for you to do with as you will.
If instead each 1000-line set can be processed independently and rejoining is not required, then it is not necessary to aggregate the messages. Instead, you can use a router to route from step 2 to a producer/consumer that will, as above, enqueue the necessary number of new requests for each file. Finally, the router will route from this final queue to a consumer that will do the processing.
Since you have a large quantity of data to deal with, it will likely be difficult to pass 1000-line groups of data around in messages, especially if they are being placed in a queue (you don't want to run out of memory). I recommend passing around some type of indicator that can be used to identify which lines of the file a specific request covers, and then parsing the 1000 lines when you need them. You could do this in a number of ways, for example by calculating the byte offset of a specific line in the file and then using a file reader's skip() method to jump to that line when the request hits the bean that will process it.
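The byte-offset-plus-skip() idea can be sketched in plain Java, independent of Camel. A minimal sketch assuming '\n' line endings; a real implementation would read from a file and account for the platform's line separator:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkReader {

    // Compute the byte offset of the given 0-based line number,
    // assuming '\n' line endings.
    static long offsetOfLine(byte[] data, int line) {
        long offset = 0;
        int seen = 0;
        for (byte b : data) {
            if (seen == line) break;
            offset++;
            if (b == '\n') seen++;
        }
        return offset;
    }

    // Jump straight to a line with skip(), without reading earlier lines;
    // the same offset could travel in the message instead of the data itself.
    static String readLineAt(byte[] data, int line) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        in.skip(offsetOfLine(data, line));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') sb.append((char) c);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "meta\nrow1\nrow2\n".getBytes();
        System.out.println(readLineAt(file, 2));  // row2
    }
}
```

In the queue-based design above, the messages would then carry only (filename, start line, line count) triples, keeping memory usage flat regardless of file size.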
Here are some resources provided on the Apache Camel website that describe the enterprise integration patterns that I mentioned above:
http://camel.apache.org/message-translator.html
http://camel.apache.org/composed-message-processor.html
http://camel.apache.org/pipes-and-filters.html
http://camel.apache.org/eip.html