Architecture/Design of a pipeline-based system. How to improve this code? - java

I have a pipeline-based application that analyzes text in different languages (say, English and Chinese). My goal is to have a system that can work in both languages, in a transparent way. NOTE: This question is long because it has many simple code snippets.
The pipeline is composed of three components (let's call them A, B, and C), and I've created them in the following way so that the components are not tightly coupled:
public class Pipeline {
    private A componentA;
    private B componentB;
    private C componentC;
    // I really just need the language attribute of Locale,
    // but I use it because it's useful to load language specific ResourceBundles.
    public Pipeline(Locale locale) {
        componentA = new A();
        componentB = new B();
        componentC = new C();
    }
    public Output runPipeline(Input input) {
        Language lang = LanguageIdentifier.identify(input);
        //
        ResultOfA resultA = componentA.doSomething(input);
        ResultOfB resultB = componentB.doSomethingElse(resultA); // uses result of A
        return componentC.doFinal(resultA, resultB); // uses result of A and B
    }
}
Now, every component of the pipeline has something inside which is language specific. For example, in order to analyze Chinese text, I need one lib, and for analyzing English text, I need another different lib.
Moreover, some tasks can be done in one language and cannot be done in the other. One solution to this problem is to make every pipeline component abstract (to implement some common methods), and then have a concrete language-specific implementation. Exemplifying with component A, I'd have the following:
public abstract class A {
    private CommonClass x; // common to all languages
    private AnotherCommonClass y; // common to all languages
    abstract SomeTemporaryResult getTemp(Input input); // language specific
    abstract AnotherTemporaryResult getAnotherTemp(Input input); // language specific
    public ResultOfA doSomething(Input input) {
        // template method
        SomeTemporaryResult t = getTemp(input); // language specific
        AnotherTemporaryResult tt = getAnotherTemp(input); // language specific
        return new ResultOfA(t, tt, x.get(), y.get());
    }
}
public class EnglishA extends A {
private EnglishSpecificClass something;
// implementation of the abstract methods ...
}
In addition, since each pipeline component is very heavy and I need to reuse them, I thought of creating a factory that caches up the component for further use, using a map that uses the language as the key, like so (the other components would work in the same manner):
public enum AFactory {
    SINGLETON;
    private final Map<String, A> cache = new HashMap<>(); // this map will only have one or two keys, is there anything more efficient that I can use, instead of HashMap?
    public A getA(Locale locale) {
        // lookup by locale.language, and insert if it doesn't exist, et cetera
        return cache.get(locale.getLanguage());
    }
}
So, my question is: What do you think of this design? How can it be improved? I need the "transparency" because the language can be changed dynamically, based on the text that it's being analyzed. As you can see from the runPipeline method, I first identify the language of the Input, and then, based on this, I need to change the pipeline components to the identified language. So, instead of invoking the components directly, maybe I should get them from the factory, like so:
public Output runPipeline(Input input) {
    Language lang = LanguageIdentifier.identify(input);
    ResultOfA resultA = AFactory.SINGLETON.getA(lang).doSomething(input);
    ResultOfB resultB = BFactory.SINGLETON.getB(lang).doSomethingElse(resultA);
    return CFactory.SINGLETON.getC(lang).doFinal(resultA, resultB);
}
Thank you for reading this far. I very much appreciate every suggestion that you can make on this question.

The factory idea is good, as is the idea, if feasible, to encapsulate the A, B, & C components into single classes for each language. One thing that I would urge you to consider is to use Interface inheritance instead of Class inheritance. You could then incorporate an engine that would do the runPipeline process for you. This is similar to the Builder/Director pattern. The steps in this process would be as follows:
get input
use factory method to get correct interface (english/chinese)
pass interface into your engine
runPipeline and get result
On the extends vs implements topic, Allen Holub goes a bit over the top to explain the preference for Interfaces.
Follow-up to your comments:
My interpretation of the application of the Builder pattern here would be that you have a Factory that would return a PipelineBuilder. The PipelineBuilder in my design is one that encompasses A, B, & C, but you could have separate builders for each if you like. This builder is then given to your PipelineEngine, which uses the Builder to generate your results.
As this makes use of a Factory to provide the Builders, your idea above for a Factory remains intact, replete with its caching mechanism.
With regard to your choice of abstract extension, you do have the choice of giving your PipelineEngine ownership of the heavy objects. However, if you do go the abstract way, note that the shared fields that you have declared are private and therefore would not be available to your subclasses.
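The arrangement described above can be sketched roughly as follows. This is only an illustration under assumptions: `PipelineBuilder`, `PipelineBuilderFactory`, `PipelineEngine` and the `String`-based steps are hypothetical stand-ins for the real A, B and C components, and unknown languages simply fall back to English.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: one implementation per language bundles steps A, B and C.
interface PipelineBuilder {
    String stepA(String input);
    String stepB(String resultA);
    String stepC(String resultA, String resultB);
}

class EnglishPipelineBuilder implements PipelineBuilder {
    public String stepA(String input) { return input.toLowerCase(Locale.ROOT); }
    public String stepB(String resultA) { return resultA.trim(); }
    public String stepC(String resultA, String resultB) { return "en[" + resultB + "]"; }
}

class ChinesePipelineBuilder implements PipelineBuilder {
    public String stepA(String input) { return input; }
    public String stepB(String resultA) { return resultA.trim(); }
    public String stepC(String resultA, String resultB) { return "zh[" + resultB + "]"; }
}

// Factory with a cache, so a heavy builder is created only once per language.
class PipelineBuilderFactory {
    private final Map<String, PipelineBuilder> cache = new HashMap<>();

    synchronized PipelineBuilder forLanguage(String language) {
        PipelineBuilder builder = cache.get(language);
        if (builder == null) {
            // Assumption for this sketch: anything that is not "zh" is treated as English.
            builder = "zh".equals(language) ? new ChinesePipelineBuilder() : new EnglishPipelineBuilder();
            cache.put(language, builder);
        }
        return builder;
    }
}

// The engine knows the order of the steps but nothing about languages.
class PipelineEngine {
    String run(PipelineBuilder builder, String input) {
        String resultA = builder.stepA(input);
        String resultB = builder.stepB(resultA);
        return builder.stepC(resultA, resultB);
    }
}

class PipelineEngineDemo {
    public static void main(String[] args) {
        PipelineBuilderFactory factory = new PipelineBuilderFactory();
        PipelineEngine engine = new PipelineEngine();
        System.out.println(engine.run(factory.forLanguage("en"), " Hello "));
    }
}
```

Because the engine depends only on the interface, adding a language means adding one builder class and one branch in the factory.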

I like the basic design. If the classes are simple enough, I might consider consolidating the A/B/C factories into a single class, as it seems there could be some sharing in behavior at that level. I'm assuming that these are really more complex than they appear, though, and that's why that is undesirable.
The basic approach of using Factories to reduce coupling between components is sound, imo.
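A consolidated factory might look something like the sketch below. It assumes a hypothetical `Language` enum as the cache key (which allows an `EnumMap`, a compact choice when there are only one or two keys) and reduces each component to a simple `String` transformation; `Components` and `ComponentFactory` are made-up names.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical language key; the question uses Locale or a Language type instead.
enum Language { ENGLISH, CHINESE }

// Bundle of the three per-language components, each reduced to a String
// transformation so the sketch stays short.
final class Components {
    final UnaryOperator<String> a;
    final UnaryOperator<String> b;
    final UnaryOperator<String> c;

    Components(Language language) {
        String tag = language.name().toLowerCase(Locale.ROOT);
        a = in -> tag + "-A(" + in + ")";
        b = in -> tag + "-B(" + in + ")";
        c = in -> tag + "-C(" + in + ")";
    }
}

// One factory for all three components, with a per-language cache.
final class ComponentFactory {
    private final Map<Language, Components> cache = new EnumMap<>(Language.class);

    synchronized Components get(Language language) {
        // Builds the bundle on first use and returns the cached one afterwards.
        return cache.computeIfAbsent(language, Components::new);
    }
}

class ComponentFactoryDemo {
    public static void main(String[] args) {
        ComponentFactory factory = new ComponentFactory();
        Components english = factory.get(Language.ENGLISH);
        System.out.println(english == factory.get(Language.ENGLISH)); // same cached bundle
        System.out.println(english.a.apply("text"));
    }
}
```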

If I'm not mistaken, what you are calling a factory is actually a very nice form of dependency injection. You are selecting an object instance that is best able to meet the needs of your parameters and returning it.
If I'm right about that, you might want to look into DI platforms. They do what you did (which is pretty simple, right?) then they add a few more abilities that you may not need now but you may find would help you later.
I'm just suggesting you look at what problems are solved now. DI is so easy to do yourself that you hardly need any other tools, but they might have found situations you haven't considered yet. Google finds many great looking links right off the bat.
From what I've seen of DI, it's likely that you'll want to move the entire creation of your "Pipe" into the factory, having it do the linking for you and just handing you what you need to solve a specific problem, but now I'm really reaching--my knowledge of DI is just a little better than my knowledge of your code (in other words, I'm pulling most of this out of my butt).

Related

OOP: Any idiom for easy interface extraction and less verbose auto-forwarding?

EDIT
Even though I use a pseudo-Java syntax below for illustration, this question is NOT limited to any 1 programming language. Please feel free to post an idiom or language-provided mechanism from your favorite programming language.
When attempting to reuse an existing class, Old, via composition instead of inheritance, it is very tedious to first manually create a new interface out of the existing class, and then write forwarding functions in New. The exercise becomes especially wasteful if Old has tons of public methods and you need to override only a handful of them.
IDEs like Eclipse can help with this process, but they cannot reduce the verbosity of the resulting code that one has to read and maintain. It would greatly help to have a couple of language mechanisms to...
automatically extract the public methods of Old, say, via an interfaceOf operator; and
by default forward all automatically generated interface methods of Old, say, via a forwardsTo operator, to a composed instance of Old, with you only providing definitions for the handful of methods you wish to override in New.
An example:
// A hypothetical, Java-like language
class Old {
public void a() { }
public void b() { }
public void c() { }
private void d() { }
protected void e() { }
// ...
}
class New implements interfaceOf Old {
public New() {
// This would auto-forward all Old methods to _composed
// except the ones overridden in New.
Old forwardsTo _composed;
}
// The only method of Old that is being overridden in New.
public void b() {
_composed.b();
}
private Old _composed;
}
My question is:
Is this possible at the code level (say, via some reusable design pattern, or idiom), so that the result is minimal verbosity in New and classes like New?
Are there any other languages where such mechanisms are provided?
EDIT
Now, I don't know these languages in detail but I'm hoping that 'Lispy' languages like Scheme, Lisp, Clojure won't disappoint here... for Lisp after all is a 'programmable programming language' (according to Paul Graham and perhaps others).
EDIT 2
I may not be the author of Old or may not want to change its source code, effectively wanting to use it as a blackbox.
This could be done in languages that allow you to specify a catch-all magic method (e.g. __call() in PHP). You could catch any function call here that you have not specifically overridden, check if it exists in class Old and, if it does, just forward the call.
Something like this:
public function __call($name, $args)
{
    // $this->old is assumed to hold the wrapped Old instance
    if (method_exists($this->old, $name))
    {
        return call_user_func_array([$this->old, $name], $args);
    }
}
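Java has no `__call`, but `java.lang.reflect.Proxy` offers a comparable catch-all when an interface is available. The sketch below assumes such an interface exists (`OldApi` is hypothetical, since the question's Old has none) and that the methods return strings purely so the forwarding is visible; `Forwarding.newWithOverriddenB` is a made-up helper name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Hypothetical interface listing the public methods of Old.
interface OldApi {
    String a();
    String b();
    String c();
}

class Old implements OldApi {
    public String a() { return "Old.a"; }
    public String b() { return "Old.b"; }
    public String c() { return "Old.c"; }
}

final class Forwarding {
    // Returns a proxy that overrides b() and forwards every other call to 'composed'.
    static OldApi newWithOverriddenB(OldApi composed) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("b") && method.getParameterCount() == 0) {
                return "New.b wrapping " + composed.b(); // the one overridden method
            }
            try {
                return method.invoke(composed, args); // catch-all forwarding
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return (OldApi) Proxy.newProxyInstance(
                OldApi.class.getClassLoader(), new Class<?>[] { OldApi.class }, handler);
    }
}

class ForwardingDemo {
    public static void main(String[] args) {
        OldApi wrapped = Forwarding.newWithOverriddenB(new Old());
        System.out.println(wrapped.a());
        System.out.println(wrapped.b());
    }
}
```

The limitation is the same one the question complains about: the interface still has to exist, so this removes the forwarding boilerplate but not the interface extraction.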
First, to answer the design question in the context of "OOP" (class-oriented) languages:
If you really need to replace Old with its complete interface IOld everywhere you use it, just to make New, which implements IOld, behave like you want, then you actually should use inheritance.
If you only need a small part of IOld for New, then you should only put that part into the interface ICommon and let both Old and New implement it. In this case, you would only replace Old by ICommon where both Old and New make sense.
Second, what can Common Lisp do for you in such a case?
Common Lisp is very different from Java and other class-oriented languages.
Just a few pointers: In Common Lisp, objects are primarily used to structure and categorize data, not code. You won't find "one class per file", "one file per class", or "package names completely correspond to directory structure" here. Methods do not "belong" to classes but to generic functions whose sole responsibility it is to dispatch according to the classes of their arguments (which has the nice side effect of enabling a seamless multiple dispatch). There is multiple inheritance. There are no interfaces as such. There is a much stronger tendency to use packages for modularity instead of just organizing classes. Which symbols are exported ("public" in Java parlance) is defined per package, not per class (which would not make sense with the above obviously).
I think that your problem would either completely disappear in a Common Lisp environment because your code is not forced into a class structure, or be quite naturally solved or expressed in terms of multiple dispatch and/or (maybe multiple) inheritance.
One would need at least a complete example and large parts of the surrounding system to even attempt a translation into Common Lisp idioms. You just write code so differently that it would not make any sense to try a one-to-one translation of a few forms.
I think Go has such a mechanism: a struct can embed another struct, and the embedded struct's methods are promoted to the outer struct.
Take a look here. This could be what you are asking as second question.

What are the best practices for Facade pattern?

I have my code working, but I don't know if the way that I implemented it is appropriate. Basically, I want to maintain the pattern without violating it.
The code looks like this:
Package Model (with setters/getters omitted):
public class CA {
private Integer in;
private Integer jn;
}
public class CB {
private Integer kn;
private Integer ln;
}
public class CC {
private static CC instancia;
private CA a;
private CB b;
public static CC getInstancia() {
if(instancia == null) {
instancia = new CC();
}
return instancia;
}
}
Package Business:
class CCBusiness {
    static CC c = CC.getInstancia();
    static void alter(Integer input) {
        c.getCA().setIn(input);
        Integer num = c.getCB().getLn();
    }
}
Package Facade:
class FacadeOne {
    void methodOne() {
        CCBusiness.alter(1);
        // And more xxBusiness.xx()
    }
}
The real code is more complex, but to explain my doubts, I think this should work.
In one facade I call several Business objects, but is it appropriate that one Business (in this case, the one of the CC class) can modify attributes from other classes (in this case, the ones inside CC)? Should I create CABusiness and CBBusiness?
Because, as I understand it, one Business can't call another Business, so the second has to be parametrized to receive the object from FacadeOne (if I create CABusiness and CBBusiness)?
I think some clarifications might help you: The facade pattern helps you to have a single point of access for several classes which are hidden behind the facade and thus hidden to the outside world. Usually those classes form some kind of module or logical unit.
What you are struggling with is the structure behind the facade and its hierarchy. This is hard to analyse without knowing the whole picture, but from the information I have it would be best to have several of your Business classes, which can be individually called from the facade. Creating cross-calls between the Business objects risks turning your code into spaghetti.
As for best practices and techniques, the simplest one is to draw a sketch of your classes, which usually clarifies a lot. And you're already halfway to UML-based documentation. :-)
By the way, avoid giving your classes names like CA, CB... It's like naming variables a001, a002... Descriptive names do a lot for readability!
By having a Facade you can get away with calling multiple CxBusiness objects and integrating their operations into a meaningful result. That is the purpose of a Facade, to simplify the interaction with the Business layer by hiding away interactions of 5 different components behind a concise and clear operation: methodOne.
For the individual CxBusiness however, you want to avoid cross-calling among each other; otherwise, you will end up with a complex dependency structure that could potentially run into circular references. Keep each CxBusiness as the sole wrapper for each Cx model and you will reduce the number of unwanted side-effects when interacting with them. Any interactions among these will take place in the facade.
Furthermore, enforce this pattern by having the facade depend upon interfaces rather than concrete classes: ICABusiness, ICCBusiness, etc. Then, the only way to access any model should be through these interfaces, and obviously, you should not have a concrete CxBusiness with a ICxBusiness member (no cross-dependencies). Once you put these restrictions in place, the implementation itself will flow towards a more modular and less coupled design.
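A small sketch of that restriction, with the facade as the only place where two business objects meet. The interface names follow the answer; the method bodies and the sum computed in `methodOne` are invented for illustration only.

```java
// Hypothetical business interfaces, one per model class.
interface ICABusiness {
    void setIn(int value);
    int getIn();
}

interface ICBBusiness {
    int getLn();
}

// Each business class wraps exactly one model and knows nothing about the others.
class CABusiness implements ICABusiness {
    private int in;
    public void setIn(int value) { in = value; }
    public int getIn() { return in; }
}

class CBBusiness implements ICBBusiness {
    private final int ln;
    CBBusiness(int ln) { this.ln = ln; }
    public int getLn() { return ln; }
}

// The facade depends on interfaces only and coordinates the interaction.
class FacadeOne {
    private final ICABusiness caBusiness;
    private final ICBBusiness cbBusiness;

    FacadeOne(ICABusiness caBusiness, ICBBusiness cbBusiness) {
        this.caBusiness = caBusiness;
        this.cbBusiness = cbBusiness;
    }

    int methodOne(int input) {
        caBusiness.setIn(input);
        return caBusiness.getIn() + cbBusiness.getLn();
    }
}

class FacadeOneDemo {
    public static void main(String[] args) {
        FacadeOne facade = new FacadeOne(new CABusiness(), new CBBusiness(10));
        System.out.println(facade.methodOne(1));
    }
}
```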

Framework to populate common field in unrelated classes

I'm attempting to write a framework to handle an interface with an external library and its API. As part of that, I need to populate a header field that exists with the same name and type in each of many (70ish) possible message classes. Unfortunately, instead of having each message class derive from a common base class that would contain the header field, each one is entirely separate.
As a toy example:
public class A
{
public Header header;
public Integer aData;
}
public class B
{
public Header header;
public Long bData;
}
If they had designed them sanely where A and B derived from some base class containing the header, I could just do:
public boolean sendMessage(BaseType b)
{
    b.header = populateHeader();
    return stuffNecessaryToSendMessage(); // assumed to report success
}
But as it stands, Object is the only common class. The various options I've thought of would be:
A separate method for each type. This would work, and be fast, but the code duplication would be depressingly wasteful.
I could subclass each of the types and have them implement a common Interface. While this would work, creating 70+ subclasses and then modifying the code to use them instead of the original messaging classes is a bridge too far.
Reflection. Workable, but I'd expect it to be too slow (performance is a concern here)
Given these, the separate method for each seems like my best bet, but I'd love to have a better option.
I'd suggest the following. Create a set of interfaces you'd like to have. For example
public interface HeaderHolder {
public void setHeader(Header header);
public Header getHeader();
}
I'd like your classes to implement them, i.e. you'd like your class B to be defined as
class B implements HeaderHolder {...}
Unfortunately it is not. No problem!
Create facade:
public class InterfaceWrapper {
public <T> T wrap(Object obj, Class<T> api) {...}
}
You can implement it at this phase using a dynamic proxy. Yes, a dynamic proxy uses reflection, but forget about this right now.
Once you are done you can use your InterfaceWrapper as follows:
B b = new B();
new InterfaceWrapper().wrap(b, HeaderHolder.class).setHeader(populateHeader());
As you can see, you can now set headers on any class you want (if it has the appropriate property). Once you are done, you can check your performance. If and only if the use of reflection in the dynamic proxy is a bottleneck, change the implementation to code generation (e.g. based on a custom annotation, package name, etc.). There are a lot of tools that can help you do this, or alternatively you can implement such logic yourself. The point is that you can always change the implementation of InterfaceWrapper without changing other code.
But avoid premature optimization. Reflection works very efficiently these days. Sun/Oracle worked hard to achieve this. For example, they create classes on the fly and cache them to make reflection faster. So, taking the full flow into consideration, the reflective call probably does not take too much time.
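One possible way to fill in `wrap` with a dynamic proxy is sketched below. It assumes that every `getXxx()`/`setXxx(value)` method on the interface maps to a public field named `xxx` on the wrapped object; the `Header` and `B` classes are minimal stand-ins for the real message types, and other methods (including `toString`) are simply rejected.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Proxy;

// Minimal stand-in for the real header type.
class Header {
    final String value;
    Header(String value) { this.value = value; }
}

// Stand-in for one of the third-party message classes.
class B {
    public Header header;
    public Long bData;
}

interface HeaderHolder {
    void setHeader(Header header);
    Header getHeader();
}

class InterfaceWrapper {
    // Maps getXxx()/setXxx(v) calls on 'api' to the public field 'xxx' of 'obj'.
    <T> T wrap(Object obj, Class<T> api) {
        Object proxy = Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] { api },
                (p, method, args) -> {
                    String name = method.getName();
                    if (name.length() < 4 || !(name.startsWith("get") || name.startsWith("set"))) {
                        throw new UnsupportedOperationException(name);
                    }
                    String fieldName = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    Field field = obj.getClass().getField(fieldName);
                    if (name.startsWith("get")) {
                        return field.get(obj);
                    }
                    field.set(obj, args[0]);
                    return null;
                });
        return api.cast(proxy);
    }
}

class InterfaceWrapperDemo {
    public static void main(String[] args) {
        B b = new B();
        HeaderHolder holder = new InterfaceWrapper().wrap(b, HeaderHolder.class);
        holder.setHeader(new Header("my header"));
        System.out.println(b.header.value);
    }
}
```

If profiling later shows the field lookup to be a hot spot, caching the `Field` per method would be a first step before moving to code generation.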
How about dynamically generating those 70+ subclasses at build time? That way you won't need to maintain 70+ source files while keeping the benefits of the approach from your second bullet.
The only library I know of that can do this is Dozer. It does use reflection, but the good news is that it'll be easier to test if it's slow than to write your own reflection code to discover that it's slow.
By default, dozer will call the same getter/setters on two objects even if they are completely different. You can configure it in much more complex ways though. For example, you can also tell it to access the fields directly. You can give it a custom converter to convert a Map to a List, things like that.
You can just take one populated instance, or perhaps even your own BaseType and say, dozer.map(baseType, SubType.class);

How to change a method's behavior according to the application which is calling it?

I have a common jar that unmarshals a String object. The method should act differently depending on which application calls it. How can I do that, other than by identifying the application through trying to load some unique class it has (I don't like that)? Is there a design pattern that solves this issue?
As I alluded to in my comment, the best thing to do is to break that uber-method up into different methods that encapsulate the specific behaviors, and likely also another method (used by all of the app-specific ones) that deals with the common behaviors.
The most important thing to remember is that behavior matters. If something is behaving differently in different scenarios, a calling application effectively cannot use that method because it doesn't have any control over what happens.
If you still really want to have a single method that all of your applications call that behaves differently in each one, you can do it, using a certain design pattern, in a way that makes sense and is maintainable. The pattern is called "Template Method".
The general idea of it is that the calling application passes in a chunk of logic that the called method wraps around and calls when it needs to. This is very similar to functional programming or programming using closures, where you are passing around chunks of logic as if they were data. Java gained lambda expressions only in version 8 (before that, anonymous inner classes filled this role), while other JVM-based languages like Groovy, Scala, Clojure, JRuby, etc. also support closures.
This same general idea is very powerful in certain circumstances, and may apply in your case, but such a question requires very intimate knowledge of the application domain and architecture, and there really isn't enough information in your posted question to dig too much deeper.
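With Java 8 or later, passing the app-specific logic into a shared method can be sketched with a lambda. The names `Unmarshaller` and `unmarshal`, the `Function` parameter and the trimming step are all hypothetical; the real common behavior would go where the trimming is.

```java
import java.util.Locale;
import java.util.function.Function;

final class Unmarshaller {
    // Shared skeleton: common pre-processing, then the caller-supplied step.
    static <T> T unmarshal(String raw, Function<String, T> appSpecificStep) {
        if (raw == null) {
            throw new IllegalArgumentException("raw must not be null");
        }
        String cleaned = raw.trim();           // behavior common to all applications
        return appSpecificStep.apply(cleaned); // behavior chosen by the caller
    }
}

class UnmarshallerDemo {
    public static void main(String[] args) {
        // "Application 1" wants a number, "application 2" wants upper-case text.
        Integer number = Unmarshaller.<Integer>unmarshal(" 42 ", Integer::valueOf);
        String text = Unmarshaller.<String>unmarshal(" abc ", s -> s.toUpperCase(Locale.ROOT));
        System.out.println(number + " " + text);
    }
}
```

Each application keeps its own behavior in its own code base, and the common jar never needs to know who is calling.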
Actually, I think a good object-oriented solution is, in the common jar, to have one base class and several derived classes. The base class would contain the common logic for the method being called, and each derived class would contain specific behavior.
So, in your jar, you might have the following:
public abstract class JarClass {
    public void jarMethod() {
        // common code here
    }
}
public class JarClassVersion1 extends JarClass {
    @Override
    public void jarMethod() {
        // initialization code specific to JarClassVersion1
        super.jarMethod();
        // wrap-up code specific to JarClassVersion1
    }
}
public class JarClassVersion2 extends JarClass {
    @Override
    public void jarMethod() {
        // initialization code specific to JarClassVersion2
        super.jarMethod();
        // wrap-up code specific to JarClassVersion2
    }
}
As to how the caller works, if you are willing to design your code so that the knowledge of which derived class to use resides with the caller, then you obviously just have the caller create the appropriate derived class and call jarMethod.
However, I take it from your question, you want the knowledge of which class to use to reside in the jar. In that case, there are several solutions. But a fairly easy one is to define a factory method inside the jar which creates the appropriate derived class. So, inside the abstract JarClass, you might define the following method:
public static JarClass createJarClass(Class<?> callerClass) {
    if (callerClass.equals(CallerClassType1.class)) {
        return new JarClassVersion1();
    } else if (callerClass.equals(CallerClassType2.class)) {
        return new JarClassVersion2();
    }
    // etc. for all derived classes
    throw new IllegalArgumentException("Unknown caller: " + callerClass);
}
And then the caller would simply do the following:
JarClass.createJarClass(this.getClass()).jarMethod();

What is the best way to work with many interfaces?

I have a situation where I have a lot of model classes (~1000) which implement any number of 5 interfaces. So I have classes which implement one and others which implement four or five.
This means I can have any combination of those five interfaces. In the classical model, I would have to implement 2^5 - 5 - 1 = 26 "meta interfaces" (one for every subset of two or more interfaces) which "join" the interfaces in a bundle. Often, this is not a problem because IB usually extends IA, etc. but in my case, the five interfaces are orthogonal/independent.
In my framework code, I have methods which need instances that have any number of these interfaces implemented. So let's assume that we have the class X and the interfaces IA, IB, IC, ID and IE. X implements IA, ID and IE.
The situation gets worse because some of these interfaces have formal type parameters.
I now have three options:
I could define an interface IADE (or rather IPersistable_MasterSlaveCapable_XmlIdentifierProvider; underscores just for your reading pleasure)
I could define a generic type as <T extends IPersistable & IMasterSlaveCapable & IXmlIdentifierProvider> which would give me a handy way to mix & match interfaces as I need them.
I could use code like this: IA a = ...; ID d = (ID)a; IE e = (IE)a; and then use the local variable with the correct type to call methods even though all three work on the same instance. Or use a cast in every second method call.
The first solution means that I get a lot of empty interfaces with very unreadable names.
The second uses a kind of "ad-hoc" typing. And Oracle's javac sometimes stumbles over them while Eclipse gets it right.
The last solution uses casts. Nuff said.
Questions:
Is there a better solution for mixing any number of interfaces?
Are there any reasons to avoid the temporary types which solution #2 offers me (except for shortcomings in Oracle's javac)?
Note: I'm aware that writing code which doesn't compile with Oracle's javac is a risk. We know that we can handle this risk.
[Edit] There seems to be some confusion about what I am attempting here. My model instances can have one of these traits:
They can be "master slave capable" (think cloning)
They can have an XML identifier
They might support tree operations (parent/child)
They might support revisions
etc. (yes, the model is even more complex than that)
Now I have support code which operates on trees. An extension of trees is trees with revisions. But I also have revisions without trees.
When I'm in the code to add a child in the revision tree manager, I know that each instance must implement ITree and IRevisionable but there is no common interface for both because these are completely independent concerns.
But in the implementation, I need to call methods on the nodes of the tree:
public void addChild( T parent, T child ) {
    T newRev = parent.createNewRevision();
    newRev.addChild( child );
    // ... possibly more method calls to other interfaces ...
}
If createNewRevision is in the interface IRevisionable and addChild is in the interface ITree, what are my options to define T?
Note: Assume that I have several other interfaces which work in a similar way: There are many places where they are independent but some code needs to see a mix of them. IRevisionableTree is not a solution but another problem.
I could cast the type for each call but that seems clumsy. Creating all permutations of interfaces would be boring and there seems no reasonable pattern to compress the huge interface names. Generics offer a nice way out:
public
<T extends IRevisionable & ITree>
void addChild( T parent, T child ) { ... }
This doesn't always work with Oracle's javac but it seems compact and useful. Any other options/comments?
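A self-contained sketch of the intersection-type approach follows. The self-type parameter `N` on both interfaces, the `Node` class and `RevisionTreeManager` are assumptions added so that the example type-checks without casts; the real interfaces may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

// Two independent capabilities; neither knows about the other.
interface IRevisionable<N> {
    N createNewRevision();
}

interface ITree<N> {
    void addChild(N child);
    List<N> children();
}

// A model class that happens to implement both.
class Node implements IRevisionable<Node>, ITree<Node> {
    final int revision;
    private final List<Node> children = new ArrayList<>();

    Node(int revision) { this.revision = revision; }

    public Node createNewRevision() { return new Node(revision + 1); }
    public void addChild(Node child) { children.add(child); }
    public List<Node> children() { return children; }
}

class RevisionTreeManager {
    // T must provide both capabilities; no combined interface is declared anywhere.
    <T extends IRevisionable<T> & ITree<T>> T addChild(T parent, T child) {
        T newRev = parent.createNewRevision(); // from IRevisionable
        newRev.addChild(child);                // from ITree
        return newRev;
    }
}

class RevisionTreeDemo {
    public static void main(String[] args) {
        Node newRev = new RevisionTreeManager().addChild(new Node(1), new Node(1));
        System.out.println(newRev.revision + " " + newRev.children().size());
    }
}
```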
Loosely coupled capabilities might be interesting. An example here.
It is an entirely different approach; decoupling things instead of typing.
Basically interfaces are hidden, implemented as delegating field.
IA ia = x.lookupCapability(IA.class);
if (ia != null) {
ia.a();
}
It fits here, as with many interfaces the wish to decouple rises, and you can more easily combine cases of interdepending interfaces (if (ia != null && ib != null) ...).
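A minimal sketch of such a capability lookup, assuming a hypothetical `Model` class that stores its capabilities in a map keyed by interface type (the `register` method is invented for the example):

```java
import java.util.HashMap;
import java.util.Map;

interface IA { String a(); }
interface IB { String b(); }

// Hypothetical model class that exposes capabilities by lookup instead of
// implementing the interfaces directly.
class Model {
    private final Map<Class<?>, Object> capabilities = new HashMap<>();

    <T> Model register(Class<T> type, T implementation) {
        capabilities.put(type, implementation);
        return this;
    }

    // Returns the capability, or null if this instance does not offer it.
    <T> T lookupCapability(Class<T> type) {
        return type.cast(capabilities.get(type));
    }
}

class CapabilityDemo {
    public static void main(String[] args) {
        Model x = new Model().register(IA.class, () -> "a from x");
        IA ia = x.lookupCapability(IA.class);
        if (ia != null) {
            System.out.println(ia.a());
        }
        System.out.println(x.lookupCapability(IB.class) == null);
    }
}
```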
If you have a method (semicode)
void doSomething(IA & ID & IE thing);
then my main concern is: Couldn't doSomething be better tailored? Might it be better to split up the functionality? Or are the interfaces themselves badly tailored?
I have stumbled over similar things several times, and each time it proved better to take a big step backward and rethink the complete partitioning of the logic, not only because of the issues you mentioned but also because of other concerns.
Since you formulated your question very abstractly (i.e. without a sensible example) I cannot tell you if that's advisable in your case also.
I would avoid all "artificial" interfaces/types that attempt to represent combinations. It's just bad design... what happens if you add 5 more interfaces? The number of combinations explodes.
It seems you want to know if some instance implements some interface(s). Reasonable options are:
use instanceof - there is no shame
use reflection to discover the interfaces via object.getClass().getInterfaces() - you may be able to write some general code to process stuff
use reflection to discover the methods via object.getClass().getMethods() and just invoke those that match a known list of methods of your interfaces (this approach means you don't have to care what it implements - sounds simple and therefore sounds like a good idea)
You've given us no context as to exactly why you want to know, so it's hard to say what the "best" approach is.
Edited
OK. Since your extra info was added, it's starting to make sense. The best approach here is to use a callback: instead of passing in a parent object, pass in an interface that accepts a "child".
It's a simplistic version of the visitor pattern. Your calling code knows what it is calling with and how it can handle a child, but the code that navigates around and/or decides to add a child doesn't have context of the caller.
Your code would look something like this (caveat: May not compile; I just typed it in):
public interface Parent<T> {
    void accept(T child);
}
// Central code - I assume the parent is passed in somewhere earlier
public <T> void process(Parent<T> parent) {
    // some logic that decides to add a child
    addChild(parent, child);
}
public <T> void addChild(Parent<T> parent, T child ) {
    parent.accept(child);
}
// Calling code
final IRevisionable revisionable = ...;
someServer.process(new Parent<T>() {
    public void accept(T child) {
        T newRev = revisionable.createNewRevision();
        newRev.addChild(child);
    }
});
You may have to juggle things around, but I hope you understand what I'm trying to say.
Actually solution 1 is a good solution, but you should find a better naming.
What actually would you name a class that implements the IPersistable_MasterSlaveCapable_XmlIdentifierProvider interface? If you follow good naming convention, it should have a meaningful name originating from a model entity. You can give the interface the same name prefixed with I.
I don't find it a disadvantage to have many interfaces, because like that you can write mock implementations for testing purposes.
My situation is the opposite: I know that at certain point in code,
foo must implement IA, ID and IE (otherwise, it couldn't get that
far). Now I need to call methods in all three interfaces. What type
should foo get?
Are you able to bypass the problem entirely by passing (for example) three objects? So instead of:
doSomethingWithFoo(WhatGoesHere foo);
you do:
doSomethingWithFoo(IA a, ID d, IE e); // called as doSomethingWithFoo(foo, foo, foo)
Or, you could create a proxy that implements all interfaces, but allows you to disable certain interfaces (i.e. calling the 'wrong' interface causes an UnsupportedOperationException).
One final wild idea - it might be possible to create Dynamic Proxies for the appropriate interfaces, that delegate to your actual object.
