Is it OK to call Kafka Streams StreamsBuilder.build() multiple times? - java

We're using micronaut/kafka-streams. Using this framework, you build a streams application with something like this:
@Factory
public class FooTopologyConfig {

    @Singleton
    @Named
    public KStream<String, FooPojo> configureTopology(ConfiguredStreamBuilder builder) {
        KStream<String, FooPojo> stream = builder.stream("foo-topic-in");
        stream.peek((k, v) -> System.out.println(String.format("key %s, value: %s", k, v)))
              .to("foo-topic-out");
        return stream;
    }
}
This:
Receives a ConfiguredStreamBuilder (a very light wrapper around StreamsBuilder).
Builds and returns the stream (we're not actually sure how important returning the stream is, but that's a different question).
ConfiguredStreamBuilder::build() (which invokes the same on StreamsBuilder) is called later by the framework and the returned Topology is not made available for injection by Micronaut.
We want the Topology bean in order to log a description of the topology (via Topology::describe).
Is it safe to do the following?
Call ConfiguredStreamBuilder::build (and therefore StreamsBuilder::build) and use the returned instance of Topology to print a human readable description.
Allow the framework to call ConfiguredStreamBuilder::build for a second time later, and use the second instance of the returned topology to build the application.

There should be no problem calling build() multiple times. This is common in the internal code of Streams as well as in the tests.
To answer your other question: you only need the KStream returned by builder.stream() if you want to expand on that branch of the topology later.
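For illustration, a minimal sketch of the pattern the question proposes, based on the factory above (it assumes, as the question states, that ConfiguredStreamBuilder::build simply delegates to StreamsBuilder::build):

@Singleton
@Named
public KStream<String, FooPojo> configureTopology(ConfiguredStreamBuilder builder) {
    KStream<String, FooPojo> stream = builder.stream("foo-topic-in");
    stream.to("foo-topic-out");
    // First build() call, used only to log a human-readable description of the topology.
    System.out.println(builder.build().describe());
    // The framework will call builder.build() again later to create the topology that
    // actually backs the running application; per the answer above, that is fine.
    return stream;
}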

Related

Creating a loop using spring-webflux and avoiding memory issues

I am currently working on a project where I need to create a loop using spring-webflux to generate a Flux for downstream processing. The loop should sequentially take batches of elements from a source (in this instance a repository) and pass the elements as signals in a Flux. To acquire the elements, we have a repository method which fetches the next batch. When all elements have been processed, the method yields an empty List.
I have identified that I can use Flux::generate in the following manner:
Flux.<List<Object>>generate(sink -> {
    List<Object> batch = repository.fetch();
    if (batch.isEmpty()) {
        sink.complete();
    } else {
        sink.next(batch);
    }
})
...
However, when I use this, the generator lambda runs continuously, buffering elements until I run out of memory.
I have also tried using Flux::create, but I am struggling to find an appropriate approach. I have found that I can do the following:
Consumer<Integer> sinker;

Flux<?> createFlux() {
    return Flux.<List<Object>>create(sink -> sinker = integer -> {
            List<Object> batch = repository.fetch();
            if (batch.isEmpty()) {
                sink.complete();
            } else {
                sink.next(batch);
            }
        })
        ...
        .doOnNext(x -> sinker.accept(y))
        ...
}
Then I just need to call the Consumer initially to initiate the loop.
However, I feel like I am overly complicating a job which should have a fairly standard implementation. Also, this implementation requires secondary calls to get started, and I haven't found a decent way to initiate it within the pipeline (for instance, using .onSubscribe() doesn't work, as it attempts to call the Consumer before it has been assigned).
So in summary, I am looking for a simple way to create an unbounded loop while controlling the backpressure to avoid OutOfMemory errors.
I believe I have found a simpler solution which serves my need. The method Mono::repeat(BooleanSupplier) allows me to loop until the list is empty, simply by:
Mono.fromCallable(() -> repository.nextBatch())
    .flatMap(/* do some stuff here */)
    .repeat(() -> repository.hasNext())
If other more elegant solutions exist, I am still open for suggestions.
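For completeness, here is a self-contained sketch of that repeat-based loop; the BatchRepository interface and its nextBatch()/hasNext() methods are assumptions standing in for the repository described above:

import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Hypothetical repository contract, mirroring the methods used in the snippet above.
interface BatchRepository {
    List<Object> nextBatch(); // next batch of elements, empty once the source is exhausted
    boolean hasNext();        // true while more batches remain
}

class BatchLoop {
    // Each re-subscription triggered by repeat() fetches exactly one batch,
    // so demand stays bounded and nothing piles up in memory.
    static Flux<Object> processAll(BatchRepository repository) {
        return Mono.fromCallable(repository::nextBatch)
                .flatMapMany(Flux::fromIterable)
                .repeat(repository::hasNext);
    }
}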

"sharing" parts of a reactive stream over multiple rest calls

I have this Spring WebFlux controller:
@RestController
public class Controller
{
    @PostMapping("/doStuff")
    public Mono<Response> doStuff(@RequestBody Mono<Request> request)
    {
        ...
    }
}
Now, say I wanted to relate separate requests coming to this controller from different clients to group processing based on some property of the Request object.
Take 1:
@PostMapping("/doStuff")
public Mono<Response> doStuff(@RequestBody Mono<Request> request)
{
    return request.flux()
        .groupBy(r -> r.someProperty())
        .flatMap(gf -> gf.map(r -> doStuff(r)));
}
This will not work, because every call gets its own instance of the stream. The flux() call doesn't really make sense either: there will only ever be one Request object going through each stream, even if many of those streams are fired at the same time by simultaneous client calls. What I need, I gather, is some part of the stream that is shared between all requests, where I could do my grouping, which led me to this slightly over-engineered code.
Take 2:
private AtomicReference<FluxSink<Request>> sink = new AtomicReference<>();
private Flux<Response> serializingStream;

public Controller()
{
    this.serializingStream =
        Flux.<Request>create(fluxSink -> sink.set(fluxSink), ERROR)
            .groupBy(r -> r.someProperty())
            .flatMap(gf -> gf.map(r -> doStuff(r)))
            .publish()
            .autoConnect();
    this.serializingStream.subscribe().dispose(); // dummy subscription to set the sink
}
@PostMapping("/doStuff")
public Mono<Response> doStuff(@RequestBody Request request)
{
    request.setReqId(UUID.randomUUID().toString());
    return
        serializingStream
            .doOnSubscribe(__ -> sink.get().next(request))
            .filter(resp -> resp.getReqId().equals(request.getReqId()))
            .take(1)
            .single();
}
And this kind of works, though it looks like I am doing things I shouldn't (or at least things that don't feel right), like leaking the FluxSink and then injecting a value through it while subscribing, and adding a request ID so that I can filter out the right response. Also, if an error happens in the serializingStream it breaks everything for everyone, but I guess I could try to isolate the errors to keep things going.
The question is: is there a better way of doing this that doesn't feel like open-heart surgery?
Also, a related question for a similar scenario. I was thinking about using Akka Persistence to implement event sourcing and have it triggered from inside that Reactor stream. I was reading about Akka Streams, which allow wrapping an Actor, and there are ways of converting that into something that can be hooked up with Reactor (i.e. a Publisher or Subscriber). But if every request gets its own stream, I am effectively losing backpressure and am risking OOME by flooding the Persistent Actor's mailbox, so I guess that problem falls into the same category as the one I described above.

Reactor / WebFlux implement a reactive http news ticker

I have a request that is rather simple to formulate, but I cannot pull it off without leaking resources.
I want to return a response of type application/stream+json, featuring news events someone posted. I do not want to use Websockets, not because I don't like them, I just want to know how to do it with a stream.
For this I need to return a Flux<News> from my restcontroller, that is continuously fed with news, once someone posts any.
My attempt for this was creating a Publisher:
public class UpdatePublisher<T> implements Publisher<T> {

    private List<Subscriber<? super T>> subscribers = new ArrayList<>();

    @Override
    public void subscribe(Subscriber<? super T> s) {
        subscribers.add(s);
    }

    public void pushUpdate(T message) {
        subscribers.forEach(s -> s.onNext(message));
    }
}
And a simple News Object:
public class News {
    String message;
    // Constructor, getters, some properties omitted for readability...
}
And endpoints to publish news and to get the stream of news, respectively:
// ...
private UpdatePublisher<String> updatePublisher = new UpdatePublisher<>();

@GetMapping(value = "/news/ticker", produces = "application/stream+json")
public Flux<News> getUpdateStream() {
    return Flux.from(updatePublisher).map(News::new);
}

@PutMapping("/news")
public void putNews(@RequestBody News news) {
    updatePublisher.pushUpdate(news.getMessage());
}
This WORKS, but I cannot unsubscribe or access any given subscription again, so once a client disconnects, the updatePublisher will just continue to push onto a growing number of dead channels, as I have no way to call the onComplete() handler on the subscriptions.
TL;DR:
Can one push messages onto a possibly endless Flux from a different thread and still terminate the Flux on demand, without relying on a "connection reset by peer" exception or something along those lines?
You should never try to implement the Publisher interface yourself, as it boils down to getting the Reactive Streams implementation right. This is exactly the issue you're facing here.
Instead you should use one of the generator operators provided by Reactor itself (this is actually a Reactor question, nothing specific to Spring WebFlux).
In this case, Flux.create or Flux.push are probably the best candidates, given your code uses some type of event listener to push events down the stream. See the reactor project reference documentation on that.
Without more details, it's hard to give you a concrete code sample that solves your problem. Here are a few pointers though, followed by a small sketch:
you might want to .share() the stream of events for all subscribers if you'd like some multicast-like communication pattern
pay attention to the push/pull/push+pull model that you'd like to have here; how is the backpressure supposed to work? What if we produce more events than the subscribers can handle?
this model would only work on a single application instance. If you'd like this to work on multiple application instances, you might want to look into messaging patterns using a broker
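Tying those pointers together, here is a minimal, non-authoritative sketch (the controller and field names are assumptions; it reuses the News class from the question and multicasts via share()):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxSink;

@RestController
public class NewsTickerController {

    private volatile FluxSink<News> sink;

    // One source Flux, multicast to all current subscribers via share();
    // a subscriber that disconnects is simply dropped from the shared stream.
    private final Flux<News> newsStream = Flux.<News>create(emitter -> this.sink = emitter)
            .share();

    @GetMapping(value = "/news/ticker", produces = "application/stream+json")
    public Flux<News> getUpdateStream() {
        return newsStream;
    }

    @PutMapping("/news")
    public void putNews(@RequestBody News news) {
        if (sink != null) {   // the sink only exists once the first client has subscribed
            sink.next(news);
        }
    }
}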

Use placeholders in feature files

I would like to use placeholders in a feature file, like this:
Feature: Talk to two servers

  Scenario: Forward data from Server A to Server B
    Given MongoDB collection "${db1}/foo" contains the following record:
      """
      {"key": "value"}
      """
    When I send GET "${server1}/data"
    When I forward the response to PUT "${server2}/data"
    Then MongoDB collection "${db2}/bar" MUST contain the following record:
      """
      {"key": "value"}
      """
The values of ${server1} etc. would depend on the environment in which the test is to be executed (dev, uat, stage, or prod). Therefore, Scenario Outlines are not applicable in this situation.
Is there any standard way of doing this? Ideally there would be something which maintains a Map<String, String> that can be filled in a @Before hook or similar, and runs automatically between Cucumber and the step definitions, so that no code is needed inside the step definitions.
Given the following step definitions
public class MyStepdefs {

    @When("^I send GET \"(.*)\"$")
    public void performGET(final String url) {
        // …
    }
}
And an appropriate setup, when performGET() is called, the placeholder ${server1} in the String url should already have been replaced with a value looked up in a Map.
Is there a standard way or feature of Cucumber-Java of doing this? I do not mind if this involves dependency injection. If dependency injection is involved, I would prefer Spring, as Spring is already in use for other reasons in my use case.
The simple answer is that you can't.
The solution to your problem is to remove the incidental details from your scenario altogether and access specific server information in the step definitions.
The server and database obviously belong together, so let's describe them as a single entity, a service.
The details about the REST calls don't really help to convey what you're actually doing. Features don't describe implementation details, they describe behavior.
Testing whether records have been inserted into the database is another bad practice and again doesn't describe behavior. You should be able to replace that with another API call that fetches the data, or some other process that proves the other server has received the information. If there are no such means to extract the data, you should create them. If they can't be created, you can wonder whether the information even needs to be stored (your service would then appear to have the same properties as a black hole :) ).
I would resolve this all by rewriting the story such that:
Feature: Talk to two services

  Scenario: Forward foobar data from Service A to Service B
    Given "Service A" has key-value information
    When I forward the foobar data from "Service A" to "Service B"
    Then "Service B" has received the key-value information
Now that we have two entities Service A and Service B you can create a ServiceInformationService to look up information about Service A and B. You can inject this ServiceInformationService into your step definitions.
So whenever you need some information about Service A, you do
Service a = serviceInformationService.lookup("A");
String apiHost = a.getApiHost();
String dbHost = a.getDatabaseHost();
In the implementation of the service you look up the property for that service, e.g. System.getProperty(serviceName + "_" + apiHostKey), and you make sure that your CI sets A_APIHOST, A_DBHOST, B_APIHOST, B_DBHOST, etc.
You can put the name of the collections in a property file that you look up in a similar way as you'd look up the system properties. Though I would avoid direct interaction with the DB if possible.
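As a rough sketch of that lookup (the class, method, and property-key names here are assumptions, not part of Cucumber itself):

public class ServiceInformationService {

    public Service lookup(String serviceName) {
        // Expects the CI environment to set e.g. A_APIHOST and A_DBHOST as system properties.
        return new Service(
                System.getProperty(serviceName + "_APIHOST"),
                System.getProperty(serviceName + "_DBHOST"));
    }

    public static class Service {
        private final String apiHost;
        private final String dbHost;

        Service(String apiHost, String dbHost) {
            this.apiHost = apiHost;
            this.dbHost = dbHost;
        }

        public String getApiHost() { return apiHost; }
        public String getDatabaseHost() { return dbHost; }
    }
}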
The feature you are looking for is supported in Gherkin with QAF. It supports using properties defined in a properties file via ${prop.key}. In addition, it offers strong resource-configuration features to work with different environments. It also supports web services.

How to write unit tests for this simple application

I have an application with a class registered as a message listener that receives messages from a queue, checks it's of the correct class type (in public void onMessage(Message message)) and sends it to another class that converts this class to a string and writes the line to a log file (in public void handleMessage(MessageType m)). How would you write unit tests for this?
If you can use Mockito in combination with JUnit your test could look like this:
@Test
public void onMessage_Success() throws Exception {
    // Arrange
    Message message = aMessage().withContent("...").create();
    File mockLogFile = mock(File.class);
    MessageHandler mockMessageHandler = mock(MessageHandler.class);
    when(mockMessageHandler.handleMessage(any(MessageType.class)))
        .thenReturn("somePredefinedTestOutput");
    when(mockMessageHandler.getLogFile()).thenReturn(mockLogFile);

    MessageListener sut = spy(new MessageListener());
    Whitebox.setInternalState(sut, "messageHandler", mockMessageHandler);
    // or simply sut.setMessageHandler(mockMessageHandler); if a setter exists

    // Act
    sut.onMessage(message);

    // Assert
    assertThat(mockLogFile, contains("your desired content"));
    verify(mockMessageHandler, times(1)).handleMessage(any(MessageType.class));
}
Note that this is just a simple example of how you could test this. There are probably plenty of other ways to test the functionality. The example above showcases a typical builder pattern for the generation of default messages which accepts certain values for testing. Moreover, I have not really specified the Hamcrest matcher behind the contains method on the mockLogFile.
As @Keppil also mentioned in his comment, it makes sense to create multiple test cases which vary slightly in the arrange and assert parts, so that the bad cases are tested as well.
What I probably didn't explain enough is that the getLogFile() method (which with high certainty has another name in your application) of MessageHandler should return the reference to the file used by your MessageHandler instance to store the actual log messages. Therefore, it is probably better to define this mockMessageHandler as spy(new MessageHandler()) instead of mock(MessageHandler.class), although this means that the unit test is actually an integration test, as the interaction of two classes is tested at the same time.
But overall, I hope you got the idea: use mock(Class) to generate default implementations for dependencies your system-under-test (SUT) requires, or spy(Instance) if you want to include a real-world object instead of one that only returns null values. You can influence the return value of mocked objects with when(...).thenReturn(...)/.thenThrow(...), or with doReturn(...)/doThrow(...).when(...) in the case of void operations, for example.
If you have dependency injection into private fields in place, you should use Whitebox.setInternalState(...) to inject the values into the SUT or mock classes when no public or package-private setter methods are available (package-private works if your test classes reuse the package structure of the system-under-test classes).
Further, verify(...) lets you verify that a certain method was invoked while executing the SUT. This is quite handy in this scenario when the actual assertion isn't that trivial.
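As a quick, generic illustration of those stubbing and verification styles (the collaborator interfaces below are hypothetical and not part of the question):

import static org.mockito.Mockito.*;

class StubbingStyles {

    // Hypothetical collaborators, only here to keep the example self-contained.
    interface NameRepository { String findName(int id); }
    interface LineWriter { void write(String line); }

    void example() {
        NameRepository repo = mock(NameRepository.class);
        LineWriter writer = mock(LineWriter.class);

        when(repo.findName(42)).thenReturn("Alice");       // stub a non-void method
        doThrow(new IllegalStateException("boom"))
                .when(writer).write("bad line");           // stub a void method to throw

        writer.write(repo.findName(42));                   // exercise the mocks

        verify(writer, times(1)).write("Alice");           // verify the interaction happened once
    }
}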
