How to read AvroFile into Tuple Class with Java in Flink

I'm trying to read an Avro file and perform some operations on it. Everything works fine except the aggregation functions; when I use them I get the exception below:
aggregating on field positions is only possible on tuple data types
I then changed my class to extend Tuple4 (as I have 4 fields), but when I try to collect the results I get AvroTypeException Unknown Type : T0
Here are my data and job classes:
public class Nation{
public Integer N_NATIONKEY;
public String N_NAME;
public Integer N_REGIONKEY;
public String N_COMMENT;
public Integer getN_NATIONKEY() {
return N_NATIONKEY;
}
public void setN_NATIONKEY(Integer n_NATIONKEY) {
N_NATIONKEY = n_NATIONKEY;
}
public String getN_NAME() {
return N_NAME;
}
public void setN_NAME(String n_NAME) {
N_NAME = n_NAME;
}
public Integer getN_REGIONKEY() {
return N_REGIONKEY;
}
public void setN_REGIONKEY(Integer n_REGIONKEY) {
N_REGIONKEY = n_REGIONKEY;
}
public String getN_COMMENT() {
return N_COMMENT;
}
public void setN_COMMENT(String n_COMMENT) {
N_COMMENT = n_COMMENT;
}
public Nation() {
}
public static void main(String[] args) throws Exception {
Configuration parameters = new Configuration();
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Path path2 = new Path("/Users/violet/Desktop/nation.avro");
AvroInputFormat<Nation> format = new AvroInputFormat<Nation>(path2,Nation.class);
format.configure(parameters);
DataSet<Nation> nation = env.createInput(format);
nation.aggregate(Aggregations.SUM,0);
JobExecutionResult res = env.execute();
}
}
And here's the tuple class; the job code is the same as above:
public class NationTuple extends Tuple4<Integer,String,Integer,String> {
Integer N_NATIONKEY(){ return this.f0;}
String N_NAME(){return this.f1;}
Integer N_REGIONKEY(){ return this.f2;}
String N_COMMENT(){ return this.f3;}
}
I tried with this class (using NationTuple everywhere instead of Nation) and got the AvroTypeException.

I don't think having your class extend Tuple4 is the right way to go. Instead, you should add a MapFunction to your topology that converts your Nation to a Tuple4.
static Tuple4<Integer, String, Integer, String> toTuple(Nation nation) {
return Tuple4.of(nation.N_NATIONKEY, ...);
}
And then in your topology call:
inputData.map(p -> toTuple(p)).returns(new TypeHint<Tuple4<Integer, String, Integer, String>>(){});
The only subtle part is that you need to provide a type hint so Flink can figure out what kind of tuple your function returns.
Another solution is to use field names instead of tuple field indices when doing your aggregation. For example:
groupBy("N_NATIONKEY", "N_REGIONKEY")
This is all explained here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#specifying-keys

Related

Reactor Mono - execute parallel tasks

I am new to the Reactor framework and am trying to use it in one of our existing implementations. LocationProfileService and InventoryService both return a Mono, are to be executed in parallel, and have no dependency on each other (from the MainService). Within LocationProfileService, 4 queries are issued, and the last 2 queries depend on the first query.
What is a better way to write this? I see the calls being executed sequentially, while some of them should be executed in parallel. What is the right way to do it?
public class LocationProfileService {
static final Cache<String, String> customerIdCache //define Cache
@Override
public Mono<LocationProfileInfo> getProfileInfoByLocationAndCustomer(String customerId, String location) {
//These 2 are not interdependent and can be executed immediately
Mono<String> customerAccountMono = getCustomerArNumber(customerId, location).subscribeOn(Schedulers.parallel()).switchIfEmpty(Mono.error(new CustomerNotFoundException(location, customerId))).log();
Mono<LocationProfile> locationProfileMono = Mono.fromFuture(//location query).subscribeOn(Schedulers.parallel()).log();
//Should block be called, or is there a better way to do ?
String custAccount = customerAccountMono.block(); // This is needed to execute and the value from this is needed for the next 2 calls
Mono<Customer> customerMono = Mono.fromFuture(//query uses custAccount from earlier step).subscribeOn(Schedulers.parallel()).log();
Mono<Result<LocationPricing>> locationPricingMono = Mono.fromFuture(//query uses custAccount from earlier step).subscribeOn(Schedulers.parallel()).log();
return Mono.zip(locationProfileMono,customerMono,locationPricingMono).flatMap(tuple -> {
LocationProfileInfo locationProfileInfo = new LocationProfileInfo();
//populate values from tuple
return Mono.just(locationProfileInfo);
});
}
private Mono<String> getCustomerAccount(String conversationId, String customerId, String location) {
return CacheMono.lookup((Map)customerIdCache.asMap(),customerId).onCacheMissResume(Mono.fromFuture(//query).subscribeOn(Schedulers.parallel()).map(x -> x.getAccountNumber()));
}
}
public class InventoryService {
@Override
public Mono<InventoryInfo> getInventoryInfo(String inventoryId) {
Mono<Inventory> inventoryMono = Mono.fromFuture(//inventory query).subscribeOn(Schedulers.parallel()).log();
Mono<List<InventorySale>> isMono = Mono.fromFuture(//inventory sale query).subscribeOn(Schedulers.parallel()).log();
return Mono.zip(inventoryMono,isMono).flatMap(tuple -> {
InventoryInfo inventoryInfo = new InventoryInfo();
//populate value from tuple
return Mono.just(inventoryInfo);
});
}
}
public class MainService {
@Autowired
LocationProfileService locationProfileService;
@Autowired
InventoryService inventoryService;
public void mainService(String customerId, String location, String inventoryId) {
Mono<LocationProfileInfo> locationProfileMono = locationProfileService.getProfileInfoByLocationAndCustomer(....);
Mono<InventoryInfo> inventoryMono = inventoryService.getInventoryInfo(....);
//is using block fine or is there a better way to do?
Mono.zip(locationProfileMono,inventoryMono).subscribeOn(Schedulers.parallel()).block();
}
}

You don't need to block in order to pass that parameter; your code is very close to the solution. I wrote the code using the class names that you provided. Just replace all the Mono.just(....) calls with the call to the correct service.
public Mono<LocationProfileInfo> getProfileInfoByLocationAndCustomer(String customerId, String location) {
Mono<String> customerAccountMono = Mono.just("customerAccount");
Mono<LocationProfile> locationProfileMono = Mono.just(new LocationProfile());
return Mono.zip(customerAccountMono, locationProfileMono)
.flatMap(tuple -> {
Mono<Customer> customerMono = Mono.just(new Customer(tuple.getT1()));
Mono<Result<LocationPricing>> result = Mono.just(new Result<LocationPricing>());
Mono<LocationProfile> locationProfile = Mono.just(tuple.getT2());
return Mono.zip(customerMono, result, locationProfile);
})
.map(LocationProfileInfo::new)
;
}
public static class LocationProfileInfo {
public LocationProfileInfo(Tuple3<Customer, Result<LocationPricing>, LocationProfile> tuple){
// do whatever
}
}
public static class LocationProfile {}
private static class Customer {
public Customer(String customerAccount) {
}
}
private static class Result<T> {}
private static class LocationPricing {}
Please remember that the first zip is not necessary; I wrote it that way to match your solution. I would solve the problem a little differently, which I think is clearer:
public Mono<LocationProfileInfo> getProfileInfoByLocationAndCustomer(String customerId, String location) {
return Mono.just("customerAccount") //call the service
.flatMap(customerAccount -> {
//declare the call to get the customer
Mono<Customer> customerMono = Mono.just(new Customer(customerAccount));
//declare the call to get the location pricing
Mono<Result<LocationPricing>> result = Mono.just(new Result<LocationPricing>());
//declare the call to get the location profile
Mono<LocationProfile> locationProfileMono = Mono.just(new LocationProfile());
//in the zip call all the services actually are executed
return Mono.zip(customerMono, result, locationProfileMono);
})
.map(LocationProfileInfo::new)
;
}
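For comparison, the same dependency structure can be sketched with the JDK's own CompletableFuture instead of Reactor. This is only an analogy, not the Reactor API: thenCompose plays the role of flatMap and thenCombine the role of zip. The fetch methods, their String results, and the class name are hypothetical stand-ins for the real queries.

```java
import java.util.concurrent.CompletableFuture;

// JDK-only analogy of the flatMap + zip structure; this is not Reactor code.
class ProfileSketch {

    // Hypothetical stand-ins for the real asynchronous queries.
    static CompletableFuture<String> fetchCustomerAccount(String customerId) {
        return CompletableFuture.supplyAsync(() -> "ACC-" + customerId);
    }

    static CompletableFuture<String> fetchCustomer(String account) {
        return CompletableFuture.supplyAsync(() -> "customer(" + account + ")");
    }

    static CompletableFuture<String> fetchPricing(String account) {
        return CompletableFuture.supplyAsync(() -> "pricing(" + account + ")");
    }

    static CompletableFuture<String> fetchLocationProfile(String location) {
        return CompletableFuture.supplyAsync(() -> "profile(" + location + ")");
    }

    static CompletableFuture<String> profileInfo(String customerId, String location) {
        // Independent of the account lookup, so it is started right away.
        CompletableFuture<String> profile = fetchLocationProfile(location);

        // The two dependent queries start only once the account is known.
        CompletableFuture<String> customerAndPricing =
            fetchCustomerAccount(customerId).thenCompose(account ->
                fetchCustomer(account).thenCombine(fetchPricing(account),
                    (customer, pricing) -> customer + ", " + pricing));

        // Combine both branches without blocking.
        return profile.thenCombine(customerAndPricing, (p, cp) -> p + ", " + cp);
    }

    public static void main(String[] args) {
        // join() is used only here, at the edge of the program, to print the result.
        System.out.println(profileInfo("42", "NYC").join());
    }
}
```

The point carried over from the answer is that nothing blocks inside profileInfo: the dependent calls are chained onto the account lookup, and the caller decides when (or whether) to wait.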

Java Generics : cant call a function with said generics even though type matches

I have this code where I have defined two classes using generics.
1. Section which can have a generic type of data.
2. Config which uses kind of builder patterns and stores list of such sections.
Compiling this code gives an error, and I cannot understand why, since I have specified the type.
Error: incompatible types: java.util.List<Main.Section<java.lang.String>> cannot be converted to java.util.List<Main.Section<java.lang.Object>>
public class Main {
public static void main(String[] args) {
Section<String> section = new Section<>("wow");
List<Section<String>> sections = new ArrayList<>();
sections.add(section);
Config<String> config = new Config<>().setSections(sections);
}
public static class Section<T> {
private T data;
public Section(T data) {
this.data = data;
}
public T getData() {
return data;
}
}
public static class Config<T> {
private List<Section<T>> sections;
public Config() {
}
public Config<T> setSections(List<Section<T>> sections) {
this.sections = sections;
return this;
}
}
}

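The problem is on the line that declares config: you create the new Config with the diamond operator and call setSections on it in the same expression. Because new Config<>() is the receiver of a method call rather than the value being assigned, the compiler has no target type for it and infers Config<Object>, which does not accept a List<Section<String>>.
There are two solutions: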
Explicit type:
Config<String> config = new Config<String>().setSections(sections);
Split operations:
Config<String> config = new Config<>();
config.setSections(sections);

It's a peculiarity of the compiler's type inference; you'll have to write
Config<String> config = new Config<String>().setSections(sections);
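Both fixes can be tried side by side in a small self-contained sketch. Section and Config are cut down from the question; the class name GenericsDemo, the size() helper, and the two factory methods are assumptions added only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    static class Section<T> {
        final T data;
        Section(T data) { this.data = data; }
    }

    static class Config<T> {
        private List<Section<T>> sections = new ArrayList<>();

        Config<T> setSections(List<Section<T>> sections) {
            this.sections = sections;
            return this;
        }

        int size() { return sections.size(); }
    }

    // Fix 1: spell out the type argument so nothing has to be inferred.
    static Config<String> explicitType(List<Section<String>> sections) {
        return new Config<String>().setSections(sections);
    }

    // Fix 2: let the assignment supply the target type, then call the setter.
    static Config<String> splitCalls(List<Section<String>> sections) {
        Config<String> config = new Config<>();
        config.setSections(sections);
        return config;
    }

    public static void main(String[] args) {
        List<Section<String>> sections = new ArrayList<>();
        sections.add(new Section<>("wow"));

        // Does not compile: the diamond is the receiver of a method call,
        // so there is no target type and T is inferred as Object.
        // Config<String> broken = new Config<>().setSections(sections);

        System.out.println(explicitType(sections).size());
        System.out.println(splitCalls(sections).size());
    }
}
```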

Function chaining in Java

I need to do a lot of different preprocessing of some text data. The preprocessing consists of several simple regex functions, all written in a class Filters, that each take a String and return the formatted String. Up until now, in the different classes that needed some preprocessing, I created a new function containing a bunch of calls to Filters, which would look something like this:
private static String filter(String text) {
text = Filters.removeURL(text);
text = Filters.removeEmoticons(text);
text = Filters.removeRepeatedWhitespace(text);
....
return text;
}
Since this is very repetitive (I would call about 90% of the same functions, but 2-3 would be different for each class), I wonder if there are better ways of doing this. In Python you can, for example, put functions in a list and iterate over it, calling each function. I realize this is not possible in Java, so what is the best way of doing this in Java?
I was thinking of maybe defining an enum with a value for each function and then call a main function in Filters with array of enums with the functions I want to run, something like this:
enum Filter {
REMOVE_URL, REMOVE_EMOTICONS, REMOVE_REPEATED_WHITESPACE
}
public static String filter(String text, Filter... filters) {
for(Filter filter: filters) {
switch (filter) {
case REMOVE_URL:
text = removeURL(text);
break;
case REMOVE_EMOTICONS:
text = removeEmoticons(text);
break;
}
}
return text;
}
And then instead of defining functions like shown at the top, I could instead simply call:
filter("some text", Filter.REMOVE_URL, Filter.REMOVE_EMOTICONS, Filter.REMOVE_REPEATED_WHITESPACE);
Are there any better ways to go about this?

Given that you already implemented your Filters utility class you can easily define a list of filter functions
List<Function<String,String>> filterList = new ArrayList<>();
filterList.add(Filters::removeURL);
filterList.add(Filters::removeRepeatedWhitespace);
...
and then evaluate:
String text = ...
for (Function<String,String> f : filterList)
text = f.apply(text);
A variation of this, even easier to handle:
Define
public static String filter(String text, Function<String,String>... filters)
{
for (Function<String,String> f : filters)
text = f.apply(text);
return text;
}
and then use
String text = ...
text = filter(text, Filters::removeURL, Filters::removeRepeatedWhitespace);
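If the list of filters is assembled at runtime, the loop can also be folded into a single composed function. A minimal sketch, assuming Java 8 or later; the two filters below are placeholders, since the implementations in Filters are not shown in the question, and the class name FilterPipeline is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FilterPipeline {

    // Folds the filters into one function that applies them left to right.
    static Function<String, String> compose(List<Function<String, String>> filters) {
        return filters.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        // Placeholder filters; real code would use method references into Filters.
        Function<String, String> collapseWhitespace = s -> s.replaceAll("\\s+", " ");
        Function<String, String> trim = String::trim;

        Function<String, String> pipeline = compose(Arrays.asList(collapseWhitespace, trim));
        System.out.println("[" + pipeline.apply("  some   text ") + "]"); // prints "[some text]"
    }
}
```

Each class can then keep its own list (the shared 90% plus its 2-3 specific filters) and build its pipeline once.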

You could do this in Java 8 pretty easily as @tobias_k said, but even without that you could do something like this:
public class FunctionExample {
public interface FilterFunction {
String apply(String text);
}
public static class RemoveSpaces implements FilterFunction {
public String apply(String text) {
return text.replaceAll("\\s+", "");
}
}
public static class LowerCase implements FilterFunction {
public String apply(String text) {
return text.toLowerCase();
}
}
static String filter(String text, FilterFunction...filters) {
for (FilterFunction fn : filters) {
text = fn.apply(text);
}
return text;
}
static FilterFunction LOWERCASE_FILTER = new LowerCase();
static FilterFunction REMOVE_SPACES_FILTER = new RemoveSpaces();
public static void main(String[] args) {
String s = "Some Text";
System.out.println(filter(s, LOWERCASE_FILTER, REMOVE_SPACES_FILTER));
}
}

Another way would be to add a method to your enum Filter and implement that method for each of the enum literals. This will also work with earlier versions of Java. This is closest to your current code, and has the effect that you have a defined number of possible filters.
enum Filter {
TRIM {
public String apply(String s) {
return s.trim();
}
},
UPPERCASE {
public String apply(String s) {
return s.toUpperCase();
}
};
public abstract String apply(String s);
}
public static String applyAll(String s, Filter... filters) {
for (Filter f : filters) {
s = f.apply(s);
}
return s;
}
public static void main(String[] args) {
String s = " Hello World ";
System.out.println(applyAll(s, Filter.TRIM, Filter.UPPERCASE));
}
However, if you are using Java 8 you can make your code much more flexible by just using a list of Function<String, String> instead. If you don't like writing Function<String, String> all the time, you could also define your own interface, extending it:
interface Filter extends Function<String, String> {}
You can then define those functions in different ways: With method references, single- and multi-line lambda expressions, anonymous classes, or construct them from other functions:
Filter TRIM = String::trim; // method reference
Filter UPPERCASE = s -> s.toUpperCase(); // one-line lambda
Filter DO_STUFF = (String s) -> { // multi-line lambda
// do more complex stuff
return s + s;
};
Filter MORE_STUFF = new Filter() { // anonymous inner class
// in case you need internal state
public String apply(String s) {
// even more complex calculations
return s.replace("foo", "bar");
};
};
Function<String, String> TRIM_UPPER = TRIM.andThen(UPPERCASE); // chain functions
You can then pass those to the applyAll function just as the enums and apply them one after the other in a loop.
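One detail to watch: andThen returns a plain Function<String, String>, not a Filter, so an applyAll that should also accept chained functions needs the wider parameter type. A hedged sketch of such a method follows; the class name ApplyAllSketch is made up, and the nested interface mirrors the Filter interface defined above.

```java
import java.util.function.Function;

class ApplyAllSketch {
    interface Filter extends Function<String, String> {}

    static final Filter TRIM = String::trim;
    static final Filter UPPERCASE = s -> s.toUpperCase();

    // Accepts Filter instances as well as plain or chained functions.
    @SafeVarargs
    static String applyAll(String s, Function<String, String>... filters) {
        for (Function<String, String> f : filters) {
            s = f.apply(s);
        }
        return s;
    }

    public static void main(String[] args) {
        Function<String, String> trimUpper = TRIM.andThen(UPPERCASE);
        System.out.println(applyAll(" Hello World ", TRIM, UPPERCASE)); // prints "HELLO WORLD"
        System.out.println(applyAll(" Hello World ", trimUpper));       // prints "HELLO WORLD"
    }
}
```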

For a large validation task is chain of responsibility pattern a good bet?

I need to build a process which will validate a record against ~200 validation rules. A record can be one of ~10 types. There is some segmentation from validation rules to record types but there exists a lot of overlap which prevents me from cleanly binning the validation rules.
During my design I'm considering a chain of responsibility pattern for all of the validation rules. Is this a good idea or is there a better design pattern?

Validation is frequently a Composite pattern. When you break it down, you want to separate what you want to do from how you want to do it, and you get:
If foo is valid
then do something.
Here the abstraction is "is valid". Caveat: this code was lifted from current, similar examples, so you may find missing symbols and such, but it should give you the picture. In addition, the Result object contains messaging about the failure as well as a simple status (true/false). This gives you the option of just asking "did it pass?" versus "if it failed, tell me why".
QuickCollection and QuickMap are convenience classes for taking any class and quickly turning it into the respective type by merely assigning to a delegate. For this example, it means your composite validator is already a collection and can be iterated.
You had a secondary problem in your question: "cleanly binning", as in "Type A" -> rules{a,b,c} and "Type B" -> rules{c,e,z}.
This is easily managed with a Map. It is not entirely a Command pattern, but close:
Map<Type,Validator> typeValidators = new HashMap<>();
Set up the validator for each type, then create a mapping between types and validators. This is really best done as bean config if you're using Java, but definitely use dependency injection.
public interface Validator<T>{
public Result validate(T value);
public static interface Result {
public static final Result OK = new Result() {
@Override
public String getMessage() {
return "OK";
}
@Override
public String toString() {
return "OK";
}
@Override
public boolean isOk() {
return true;
}
};
public boolean isOk();
public String getMessage();
}
}
Now some simple implementations to show the point:
public class MinLengthValidator implements Validator<String> {
private final SimpleResult FAILED;
private Integer minLength;
public MinLengthValidator() {
this(8);
}
public MinLengthValidator(Integer minLength) {
this.minLength = minLength;
FAILED = new SimpleResult("Password must be at least "+minLength+" characters",false);
}
@Override
public Result validate(String newPassword) {
return newPassword.length() >= minLength ? Result.OK : FAILED;
}
@Override
public String toString() {
return this.getClass().getSimpleName();
}
}
Here is another validator that we will combine with it:
public class NotCurrentValidator implements Validator<String> {
@Autowired
@Qualifier("userPasswordEncoder")
private PasswordEncoder encoder;
private static final SimpleResult FAILED = new SimpleResult("Password cannot be your current password",false);
@Override
public Result validate(String newPassword) {
boolean passed = !encoder.matches(newPassword,user.getPassword());
return (passed ? Result.OK : FAILED);
}
@Override
public String toString() {
return this.getClass().getSimpleName();
}
}
Now here is a composite:
public class CompositeValidator extends QuickCollection<Validator> implements Validator<String> {
public CompositeValidator(Collection<Validator> rules) {
super.delegate = rules;
}
public CompositeValidator(Validator<?>... rules) {
super.delegate = Arrays.asList(rules);
}
@Override
public CompositeResult validate(String newPassword) {
CompositeResult result = new CompositeResult(super.delegate.size());
for(Validator rule : super.delegate){
Result temp = rule.validate(newPassword);
if(!temp.isOk())
result.put(rule,temp);
}
return result;
}
public static class CompositeResult extends QuickMap<Validator,Result> implements Result {
private Integer appliedCount;
private CompositeResult(Integer appliedCount) {
super.delegate = VdcCollections.delimitedMap(new HashMap<Validator, Result>(), "-->",", ");
this.appliedCount = appliedCount;
}
@Override
public String getMessage() {
return super.delegate.toString();
}
@Override
public String toString() {
return super.delegate.toString();
}
@Override
public boolean isOk() {
boolean isOk = true;
for (Result r : delegate.values()) {
isOk = r.isOk();
if(!isOk)
break;
}
return isOk;
}
public Integer failCount() {
return this.size();
}
public Integer passCount() {
return appliedCount - this.size();
}
}
}
and now a snippet of use:
private Validator<String> pwRule = new CompositeValidator(new MinLengthValidator(), new NotCurrentValidator());
Validator.Result result = pwRule.validate(newPassword);
if(!result.isOk())
throw new PasswordConstraintException("%s", result.getMessage());
user.obsoleteCurrentPassword();
user.setPassword(passwordEncoder.encode(newPassword));
user.setPwExpDate(DateTime.now().plusDays(passwordDaysToLive).toDate());
userDao.updateUser(user);

Chain of responsibility implies that there is an order in which the validations must take place. I would probably use something similar to the Strategy pattern where you have a Set of validation strategies that are applied to a specific type of record. You could then use a factory to examine the record and apply the correct set of validations.
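A rough sketch of that idea, under the assumption that a record can be reduced to a type tag plus a couple of fields and that a rule is just a named predicate. RecordType, Rec, Rule, and the two sample rules are all hypothetical; rules that overlap between record types are handled by registering the same rule object under each type it applies to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class RuleRegistrySketch {
    enum RecordType { ORDER, INVOICE }

    // Hypothetical record: a type tag plus two fields to validate.
    static class Rec {
        final RecordType type;
        final String id;
        final int amount;
        Rec(RecordType type, String id, int amount) {
            this.type = type;
            this.id = id;
            this.amount = amount;
        }
    }

    // A rule is a named predicate that returns true when the record passes.
    static class Rule {
        final String name;
        final Predicate<Rec> check;
        Rule(String name, Predicate<Rec> check) {
            this.name = name;
            this.check = check;
        }
    }

    static final Rule HAS_ID = new Rule("hasId", r -> r.id != null && !r.id.isEmpty());
    static final Rule POSITIVE_AMOUNT = new Rule("positiveAmount", r -> r.amount > 0);

    // Shared rules are simply listed under every type they apply to.
    static final Map<RecordType, List<Rule>> RULES = new EnumMap<>(RecordType.class);
    static {
        RULES.put(RecordType.ORDER, Arrays.asList(HAS_ID));
        RULES.put(RecordType.INVOICE, Arrays.asList(HAS_ID, POSITIVE_AMOUNT));
    }

    // Runs every rule registered for the record's type and returns the names of the failed rules.
    static List<String> validate(Rec rec) {
        List<String> failures = new ArrayList<>();
        for (Rule rule : RULES.get(rec.type)) {
            if (!rule.check.test(rec)) {
                failures.add(rule.name);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(validate(new Rec(RecordType.ORDER, "o-1", 0)));  // prints "[]"
        System.out.println(validate(new Rec(RecordType.INVOICE, "", 0)));   // prints "[hasId, positiveAmount]"
    }
}
```

Because no rule depends on the outcome of another here, the order of the list does not matter, which is the difference from a chain of responsibility.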

Java Compilation error "method setSchema in class MpsPojo cannot be applied to given types;"

Hi, I saw some related questions but didn't find any to-the-point solution.
I have a POJO class defined as:
MpsPojo.java
public class MpsPojo {
private String mfr;
private String prod;
private String sche;
public String getMfr() {
return mfr;
}
public void setMfr(String mfr) {
this.mfr = mfr;
}
public String getProd() {
return prod;
}
public void setProd() {
this.prod = prod;
}
public String getSchema() {
return sche;
}
public void setSchema() {
this.sche = sche;
}
}
I have a second class with the business logic, MpsLogic.java:
public class MpsLogic {
public void calculateAssert(MpsPojo mpspojo){
String manufacturer;
String product;
String schema;
manufacturer = mpspojo.getMfr();
product = mpspojo.getProd();
schema = mpspojo.getSchema();
String url = "http://localhost:9120/dashboards/all/list/"+manufacturer+"/"+product+"/"+schema;
}
}
And the final class, the test class, is FinalLogic.java:
public class FinalLogic {
MpsPojo mpspojon = new MpsPojo();
MpsLogic mpslogicn = new MpsLogic();
@Test
public void firstTest() {
mpspojon.setMfr("m1");
mpspojon.setProd("p1");
mpspojon.setSchema("sch1");
mpslogicn.calculateAssert(mpspojon);
System.out.println("Printing from Final class");
}
}
In FinalLogic.java, this gives me the compilation error: method setSchema in class MpsPojo cannot be applied to given types;
But when I comment out the lines mpspojon.setProd("p1"); and mpspojon.setSchema("sch1"); it works fine without error.
I debugged a lot but didn't find any clue. Any help would be much appreciated.
Thanks

Add String parameters to setProd and setSchema, as you have already done with setMfr:
public void setProd(String prod) {
and
public void setSchema(String sche) {

setSchema() receives no parameters in your declaration. Change it to:
public void setSchema(String sche) {
this.sche = sche;
}
The same holds true for setProd.
If you use an IDE, I advise you to:
Look into the warnings that you get (in a method with no parameter, the assignment this.sche = sche gives the warning "The assignment to variable sche has no effect").
Generate the setters/getters automatically instead of coding them yourself, thus avoiding typing mistakes. E.g. in Eclipse that is Alt+Shift+S, then R.
