I am trying to use Byte Buddy to generate a class, and methods in it, based on configuration that is only available at runtime. The generated class creates a Hazelcast Jet pipeline that joins multiple IMaps.
Based on the provided configuration, the number of IMaps to join can vary. In the sample below, I am joining three IMaps.
private Pipeline getPipeline(IMap<String, Object1> object1Map, IMap<String, Object2> object2Map,
        IMap<String, Object3> object3Map) {
    Pipeline p = Pipeline.create();
    BatchStage<Entry<String, Object1>> obj1 = p.drawFrom(Sources.map(object1Map));
    BatchStage<Entry<String, Object2>> obj2 = p.drawFrom(Sources.map(object2Map));
    BatchStage<Entry<String, Object3>> obj3 = p.drawFrom(Sources.map(object3Map));

    DistributedFunction<Tuple2<Object1, Object2>, String> obj1Obj2JoinFunc = entry -> entry.f1().getField31();
    DistributedBiFunction<Tuple2<Object1, Object2>, Object3, Tuple2<Tuple2<Object1, Object2>, Object3>> output =
            (in1, in2) -> Tuple2.tuple2(in1, in2);

    BatchStage<Tuple2<Object1, Object2>> obj1_obj2 = obj1.map(entry -> entry.getValue())
            .hashJoin(obj2.map(entry -> entry.getValue()),
                    JoinClause.onKeys(Object1::getField11, Object2::getField21), Tuple2::tuple2)
            .filter(entry -> entry.getValue() != null);

    BatchStage<Tuple2<Tuple2<Object1, Object2>, Object3>> obj1_obj2_obj3 = obj1_obj2
            .hashJoin(obj3.map(entry -> entry.getValue()),
                    JoinClause.onKeys(obj1Obj2JoinFunc, Object3::getField31), output)
            .filter(entry -> entry.getValue() != null);

    // transformResult extracts the required fields from the join above and creates AllObjectJoinClass objects
    BatchStage<Entry<String, AllObjectJoinClass>> result = transformResult(obj1_obj2_obj3);
    result.drainTo(Sinks.map("obj1_obj2_obj3"));
    return p;
}
The problem here is that the number of arguments to my method depends on the runtime configuration, and that in turn determines the method body as well.
I am able to generate the method signature using TypeDescription.Generic.Builder.parameterizedType.
But I am having trouble generating the method body. I tried using MethodDelegation.to so that the method resides in a separate class. The trouble with this approach is that the method in the separate class needs to be very generic: it has to accept an arbitrary number of arguments of different types and also needs to know about the fields of each of the objects in the IMaps.
I wonder if there's an alternate approach to achieving this, maybe with templates of some kind, so that a separate class can be generated for each pipeline with this body. I didn't find any documentation on generating a method with a defined body (maybe I missed something).
-- Anoop
It very much depends on what you are trying to do:
1. With Advice, you can write a template in byte code that is inlined into your method.
2. With StackManipulations, you can compose individual byte code instructions.
It seems to me that option (2) is what you are aiming for. For individually composed code, this is often the easiest option.
Writing individual byte code is of course not the most convenient option, but if you can easily compose the handling of each input, you might be able to compose multiple Advice classes to avoid using byte code instructions directly.
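For illustration, here is a minimal sketch of option (2): the method is declared with defineMethod and its body is assembled from StackManipulations that load each IMap parameter and feed it to a static helper. This is not from the original question: the class PipelineGenerator, the helper addStage and the use of Object in place of the real Pipeline/IMap types are placeholders, and some factory names (loadFrom vs. loadOffset, TypeDescription.ForLoadedType.of) differ slightly between Byte Buddy versions.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import net.bytebuddy.ByteBuddy;
import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.description.modifier.Visibility;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.implementation.Implementation;
import net.bytebuddy.implementation.bytecode.StackManipulation;
import net.bytebuddy.implementation.bytecode.member.MethodInvocation;
import net.bytebuddy.implementation.bytecode.member.MethodReturn;
import net.bytebuddy.implementation.bytecode.member.MethodVariableAccess;
import net.bytebuddy.matcher.ElementMatchers;

public class PipelineGenerator {

    // Hypothetical static helper: folds one more map into whatever has been built so far.
    // In real code this would take and return the Jet types instead of Object.
    public static Object addStage(Object builtSoFar, Object map) {
        return builtSoFar; // placeholder body
    }

    public static Class<?> generate(int mapCount) {
        MethodDescription.InDefinedShape helper = TypeDescription.ForLoadedType.of(PipelineGenerator.class)
                .getDeclaredMethods()
                .filter(ElementMatchers.named("addStage"))
                .getOnly();

        // One helper invocation per configured IMap parameter.
        List<StackManipulation> ops = new ArrayList<>();
        ops.add(MethodVariableAccess.REFERENCE.loadFrom(1)); // slot 0 is 'this', parameters start at 1
        for (int i = 2; i <= mapCount; i++) {
            ops.add(MethodVariableAccess.REFERENCE.loadFrom(i));
            ops.add(MethodInvocation.invoke(helper));
        }
        ops.add(MethodReturn.REFERENCE);

        return new ByteBuddy()
                .subclass(Object.class)
                .defineMethod("getPipeline", Object.class, Visibility.PUBLIC)
                .withParameters(Collections.nCopies(mapCount, Object.class))
                .intercept(new Implementation.Simple(ops.toArray(new StackManipulation[0])))
                .make()
                .load(PipelineGenerator.class.getClassLoader())
                .getLoaded();
    }
}
Whether this is simpler than inlining several Advice templates depends on how regular the per-map handling is; the byte code route gives full control over the composed body.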
Related
Our project uses an external library. It has a method that returns a FluxMap (since FluxMap is not completely public, let's just call it Flux):
Flux<MappedType> aFluxMap = Library.createMappingToMappedType();
I have to emit some objects to aFluxMap to get them converted to MappedType (it has a private constructor and only a few setters), so that I can then do:
aFluxMap.doOnNext(converted -> doJob(converted))
I expected there to be a method on Flux/Mono like:
aFluxMap.emit(myObj);
But I could not find any method like that.
I searched for "how to emit to flux dynamically" and found this solution:
FluxProcessor p = UnicastProcessor.create().serialize();
FluxSink sink = p.sink();
sink.next(mess);
But it seems that this emits to the newly created flux (p), not to my aFluxMap. So my question is: is there any way to emit messages to an existing Flux (or to connect a FluxSink to an existing Flux, so that whenever FluxSink.next(mess) is called, the existing Flux gets the message)? Thank you.
Note: please don't pay much attention to the stupidity of the library. We must use it
==========================================
UPDATE:
Following #lkatiforis's suggestion:
FluxProcessor p = //see above
Flux<MappedType> aFluxMap = Library.createMappingToMappedType();
p.flatMap(raw -> aFluxMap).subscribe();
I ran into another issue. Library.createMappingToMappedType() returns an already-subscribed Flux whose source is a UnicastProcessor (also already subscribed).
When I call p.flatMap(raw -> aFluxMap), aFluxMap gets subscribed again internally, which causes its source to be subscribed again as well, so I get an exception saying "UnicastProcessor can be subscribed once". Any suggestions?
You can create a new stream and then merge the two streams into one by using one of these methods: merge, concat, zip, and their variants.
Here is an example:
Flux<MappedType> yourFlux = //...
Flux<MappedType> aFluxMap = Library.createMappingToMappedType();
Flux.merge(aFluxMap, yourFlux);
The merge operator merges the MappedType objects emitted by the two provided publisher sequences into a single interleaved sequence.
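As a rough sketch of how this fits with the processor from the question (not from the original answer; note that FluxProcessor/UnicastProcessor are deprecated in recent Reactor versions in favour of Sinks), you would subscribe once to the merged sequence and push your own, already-converted elements through the sink:
FluxProcessor<MappedType, MappedType> p = UnicastProcessor.<MappedType>create().serialize();
FluxSink<MappedType> sink = p.sink();

Flux<MappedType> aFluxMap = Library.createMappingToMappedType();

// Subscribe once to the merged sequence instead of re-subscribing aFluxMap.
Flux.merge(aFluxMap, p).subscribe(converted -> doJob(converted));

// Later, whenever there is an element of your own to inject:
sink.next(myMappedObj); // myMappedObj stands in for an already-constructed MappedType value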
I have some records called "IDs" (about 2 billion rows of Strings). I would like to create a BloomFilter from these IDs in a GCP Dataflow job. Then, I would like to iterate over another data set called "newIDs" and check whether each newID is contained in the IDs or not.
My idea was to get a PCollectionView of these IDs as an Iterable. Then, based on this PCollectionView, I would create a BloomFilter, and this BloomFilter would be viewed as a singleton by all the instances. Is this the best way to do it?
If yes, how can I make a global BloomFilter based on a PCollectionView<Iterable<String>> and use it for all the instances? I know about sideInput for a DoFn; however, my DoFn that generates the BloomFilter has no main input. I just want to iterate over the PCollectionView<Iterable<String>> and generate the BloomFilter.
Here is the code:
// Generate the PCollectionView<Iterable<String>> from the IDs file
PCollectionView<Iterable<String>> IDs = p.apply(TextIO.read("IDs file")).apply(View.asIterable());

// I don't know how I should write the code to generate this BloomFilter view. I tried the code below.
PCollectionView<BloomFilter<String>> bloomFilter = ???
PCollectionView<BloomFilter<String>> bloomFilter = p.apply(ParDo.of(new MakeBF()).withSideInput(IDs));

// Like here, I don't know what I should use as the input for the DoFn.
private class MakeBF extends DoFn<???, BloomFilter<String>> {
    Iterable<String> IDView = c.sideInput(IDs);
    BloomFilter<String> bloomFilter =
            BloomFilter.create(Funnels.stringFunnel(Charset.defaultCharset()), (long) 2E9, 0.001);
    IDView.forEach(id -> bloomFilter.put(id));
}
Thanks a lot, guys.
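One way to sketch the approach described above (this is not from the original post; BuildBloomFilterFn, the file path and newIds are hypothetical, the accumulator coder may need to be registered explicitly, and a filter sized for 2 billion IDs at a 0.001 false-positive rate is several gigabytes, so treat this only as an outline): build the filter with a CombineFn and materialize it directly as a singleton side input, instead of iterating an Iterable view inside a DoFn.
// Hypothetical CombineFn that folds every ID into one BloomFilter.
class BuildBloomFilterFn extends Combine.CombineFn<String, BloomFilter<String>, BloomFilter<String>> {
    @Override
    public BloomFilter<String> createAccumulator() {
        return BloomFilter.create(Funnels.stringFunnel(Charset.defaultCharset()), (long) 2E9, 0.001);
    }
    @Override
    public BloomFilter<String> addInput(BloomFilter<String> acc, String id) {
        acc.put(id);
        return acc;
    }
    @Override
    public BloomFilter<String> mergeAccumulators(Iterable<BloomFilter<String>> accs) {
        BloomFilter<String> merged = createAccumulator();
        for (BloomFilter<String> acc : accs) {
            merged.putAll(acc); // Guava merges compatible filters in place
        }
        return merged;
    }
    @Override
    public BloomFilter<String> extractOutput(BloomFilter<String> acc) {
        return acc;
    }
}

// Build the filter once and expose it as a singleton side input.
PCollectionView<BloomFilter<String>> bloomFilterView = p
        .apply(TextIO.read().from("IDs file"))
        .apply(Combine.globally(new BuildBloomFilterFn()).asSingletonView());

// Check each newID against the filter via the side input.
PCollection<String> unseenIds = newIds.apply(ParDo.of(new DoFn<String, String>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        if (!c.sideInput(bloomFilterView).mightContain(c.element())) {
            c.output(c.element());
        }
    }
}).withSideInputs(bloomFilterView));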
I have a list and I'm streaming it to get some filtered data, as follows:
List<Future<Accommodation>> submittedRequestList = list.stream()
        .filter(Objects::nonNull)
        .map(config -> taskExecutorService.submit(() -> requestHandler.handle(jobId, config)))
        .collect(Collectors.toList());
When I wrote tests, I tried to return some data using a when():
List<Future<Accommodation>> submittedRequestList = mock(LinkedList.class);
when(list.stream().filter(Objects::nonNull)
.map(config -> executorService.submit(() -> requestHandler
.handle(JOB_ID, config))).collect(Collectors.toList())).thenReturn(submittedRequestList);
I'm getting org.mockito.exceptions.misusing.WrongTypeOfReturnValue:
LinkedList$$EnhancerByMockitoWithCGLIB$$716dd84d cannot be returned by submit() error. How may I resolve this error by using a correct when()?
You can only mock single method calls, not entire fluent interface cascades.
E.g., you could do:
Stream<Future> fs = mock(Stream.class);
when(requestList.stream()).thenReturn(fs);
Stream<Future> filtered = mock(Stream.class);
when(fs.filter(Objects::nonNull)).thenReturn(filtered);
and so on.
IMO it's really not worth mocking the whole thing, just verify that all filters were called and check the contents of the result list.
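For instance (a hedged sketch, not from the original answer: taskExecutorService is assumed to be a mocked executor injected into the class under test, and service.submitAll is a hypothetical name for the method that runs the stream shown in the question), use a real input list, stub only the executor, and assert on the result:
Future<Accommodation> future = mock(Future.class);
when(taskExecutorService.submit(any(Callable.class))).thenReturn(future);

// run the real stream pipeline via the (hypothetically named) method under test
List<Future<Accommodation>> result = service.submitAll(JOB_ID, configs);

assertEquals(configs.size(), result.size());
verify(taskExecutorService, times(configs.size())).submit(any(Callable.class));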
I have this code which I want to refactor using a functional style in Java 8. I would like to remove the mutable object currentRequest and still return the filtered request.
HttpRequest currentRequest = httpRequest;
for (Filter filter : filters) {
    currentRequest = filter.doFilter(currentRequest);
}
The aim is to pass a request to the filter.doFilter method, and take the output and pass it back into the filter.doFilter method, and continue to do this until all filters are applied.
For example, written out in a more convoluted way than the for loop:
HttpRequest filteredRequest1 = filters.get(0).doFilter(currentRequest);
HttpRequest filteredRequest2 = filters.get(1).doFilter(filteredRequest1);
HttpRequest filteredRequest3 = filters.get(2).doFilter(filteredRequest2);
...
I think this is a case for composing functions, and the doFilter method should be a function like below:
Function<HttpRequest, HttpRequest> applyFilter = request -> filters.get(0).doFilter(request);
But I know this is totally wrong, as I got stuck here.
The other way I was thinking of was to use reduce, but I cannot see a way of using it in this case.
If you could help me out with a way of doing this, or point me to some resource, that would be great.
It looks like you may want to do a reduce with your HttpRequest as its identity. Each step of the reduce will combine the intermediate result with the next filter, like so:
HttpRequest result = filters.stream().reduce(currentRequest,
        (req, filter) -> filter.doFilter(req),
        (req1, req2) -> throwAnExceptionAsWeShouldntBeHere());
Note: the last function is used to merge two HttpRequests together if a parallel stream is used. If that's the route you wish to go down, then proceed with caution.
Here's a way that streams the filters and maps each one of them to a Function<HttpRequest, HttpRequest>. Then, all the functions are reduced via Function::andThen and finally, if the filters collection wasn't empty, the resulting composed function is executed with currentRequest as an argument:
HttpRequest result = filters.stream()
.map(filter -> ((Function<HttpRequest, HttpRequest>) filter::doFilter))
.reduce(Function::andThen)
.map(function -> function.apply(currentRequest))
.orElse(currentRequest);
I have two Kafka Spouts whose values I want to send to the same bolt.
Is it possible ?
Yes it is possible:
TopologyBuilder b = new TopologyBuilder();
b.setSpout("topic_1", new KafkaSpout(...));
b.setSpout("topic_2", new KafkaSpout(...));
b.setBolt("bolt", new MyBolt(...)).shuffleGrouping("topic_1").shuffleGrouping("topic_2");
You can use any other grouping, too.
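For example, a fieldsGrouping would look like this (sketch only; the field name "key" is hypothetical and must be one of the fields the spouts actually declare):
b.setBolt("bolt", new MyBolt(...)).fieldsGrouping("topic_1", new Fields("key"))
                                  .fieldsGrouping("topic_2", new Fields("key"));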
Update:
In order to distinguish tuples (i.e., whether they came from topic_1 or topic_2) in the consumer bolt, there are two possibilities:
1) You can use operator IDs (as suggested by #user-4870385):
if (input.getSourceComponent().equalsIgnoreCase("topic_1")) {
    // do something
} else {
    // do something
}
2) You can use stream names (as suggested by #zenbeni). For this, both spouts need to declare named streams and the bolt needs to connect to the spouts by stream name:
public class MyKafkaSpout extends KafkaSpout {
    final String streamName;

    public MyKafkaSpout(String stream) {
        this.streamName = stream;
    }

    // other stuff omitted

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // compare KafkaSpout.declareOutputFields(...)
        declarer.declareStream(streamName, _spoutConfig.scheme.getOutputFields());
    }
}
Build the topology; the stream names now need to be used:
TopologyBuilder b = new TopologyBuilder();
b.setSpout("topic_1", new MyKafkaSpout("stream_t1"));
b.setSpout("topic_2", new MyKafkaSpout("stream_t2"));
b.setBolt("bolt", new MyBolt(...)).shuffleGrouping("topic_1", "stream_t1").shuffleGrouping("topic_2", "stream_t2");
In MyBolt the stream name can now be used to distinguish input tuples:
// in MyBolt.execute():
if (input.getSourceStreamId().equals("stream_t1")) {
    // do something
} else {
    // do something
}
Discussion:
While the second approach using stream names is more natural (according to #zenbeni), the first is more flexible (IMHO). Stream names are declared by the spout/bolt directly (i.e., at the time the spout/bolt code is written); in contrast, operator IDs are assigned when the topology is put together (i.e., at the time the spout/bolt is used).
Let's assume we get three bolts as class files (no source code). The first two are to be used as producers, and both declare output streams with the same name. If the third, consuming bolt distinguishes input tuples by stream name, this will not work. Even if the two producer bolts declare different output stream names, the expected input stream names might be hard-coded in the consumer bolt and might not match, so it does not work either. However, if the consumer bolt uses component names (even if they are hard-coded) to distinguish incoming tuples, the expected component IDs can be assigned correctly when the topology is built.
Of course, it would be possible to inherit from the given classes (if they are not declared final) and override declareOutputFields(...) in order to assign your own stream names. However, this is additional work.
Yes, it's possible. You can have any spout talking to the same bolt.
Refer to the "Streams" section of https://storm.apache.org/documentation/Tutorial.html.