I would like to create a data-binding code generator for a specified programming language and a specified serialization format: given a specification of the structure of the data to be serialized or deserialized, the generator should produce the classes (in the target language) that represent the given vocabulary, together with methods for serialization and deserialization in the target format. The intended code generator could require the following inputs:
the target programming language, that is, the language in which the code is generated;
the target serialization format, that is, the format used to serialize the data;
the specification of the structure of data to be serialized or deserialized.
Since I would initially like to keep the code generator simple, the first version of this software could require only the specification of the structure of the data to be serialized or deserialized; I have therefore chosen C# as the target programming language and XML as the target serialization format.
Essentially, the intended code generator should be a Java program which reads the specification of the structure of the data to be serialized or deserialized (this specification must be written according to a given grammar) and generates the C# classes that represent the given vocabulary; these classes should include methods for serialization and deserialization in XML format. The purpose of the code generator is to produce one or more classes that can be embedded in a C# project.
Regarding the specification of the structure of data to be serialized or deserialized, it could be defined as in the following example:
simple type Message: int id, string content
Given the specification in the above example, the intended code generator could generate the following C# class:
public class Message
{
    public int Id { get; set; }
    public string Content { get; set; }

    public byte[] Serialize()
    {
        // ...
    }

    public void Deserialize(byte[] data)
    {
        // ...
    }
}
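Before introducing a parser generator, the overall shape of this task can be sketched by hand in Java. Everything below (the class name, the handling of a single `simple type` line, the emitted property style) is my own illustration, not an existing tool:

```java
public class SpecToCSharp {

    // Turns one spec line like "simple type Message: int id, string content"
    // into a C# class with auto-properties. A sketch only: no compound types,
    // no constraints, no error handling.
    public static String emitCSharp(String specLine) {
        String[] headAndFields = specLine.split(":", 2);
        String typeName = headAndFields[0].trim().replaceFirst("^simple type\\s+", "");
        StringBuilder out = new StringBuilder("public class " + typeName + "\n{\n");
        for (String field : headAndFields[1].split(",")) {
            String[] typeAndName = field.trim().split("\\s+");
            // C# property names are conventionally PascalCase.
            String propName = Character.toUpperCase(typeAndName[1].charAt(0))
                    + typeAndName[1].substring(1);
            out.append("    public ").append(typeAndName[0])
               .append(" ").append(propName).append(" { get; set; }\n");
        }
        return out.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(emitCSharp("simple type Message: int id, string content"));
    }
}
```

A real version would replace the string splitting with a proper parse tree, but the split into "recognize the spec" and "emit the target source" stays the same.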
I have read about ANTLR and I believe this tool is perfect for the purpose just explained. As explained in this answer, I should first create a grammar for the specification of the structure of the data to be serialized or deserialized.
The above example is very simple, because it defines only a simple type, but the specification of the structure of the data could be more complex: we could have a compound type which includes one or more simple types, lists, and so on, as in the following example:
simple type LogInfo: DateTime time, String message
simple type LogSource: String class, String version
compound type LogEntry: LogInfo info, LogSource source
Moreover, the specification of the data could also include one or more constraints, as in the following example:
simple type Message: int id (constraint: not negative), string content
In this case, the intended code generator could generate the following C# class:
public class Message
{
    private int _id;
    private string _content;

    public int Id
    {
        get { return _id; }
        set
        {
            if (value < 0)
                throw new ArgumentException("...");
            _id = value;
        }
    }

    public string Content
    {
        get { return _content; }
        set { _content = value; }
    }

    public byte[] Serialize()
    {
        // ...
    }

    public void Deserialize(byte[] data)
    {
        // ...
    }
}
Essentially, the intended code generator should find all user-defined types, any constraints, and so on. Is there a simple example of something like this?
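As one possible starting point for finding constraints, a clause like the one above can be extracted from a field spec with a plain regular expression before any code is emitted. The Java sketch below assumes the exact `(constraint: ...)` syntax shown earlier:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConstraintExtractor {

    // Matches a field like "int id (constraint: not negative)" and captures
    // the type, the name, and the optional constraint text.
    static final Pattern FIELD = Pattern.compile(
            "(\\w+)\\s+(\\w+)(?:\\s*\\(constraint:\\s*([^)]+)\\))?");

    public static String constraintOf(String fieldSpec) {
        Matcher m = FIELD.matcher(fieldSpec.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("bad field spec: " + fieldSpec);
        }
        return m.group(3); // null when the field carries no constraint
    }

    public static void main(String[] args) {
        System.out.println(constraintOf("int id (constraint: not negative)")); // not negative
        System.out.println(constraintOf("string content"));                    // null
    }
}
```

In an ANTLR-based version the constraint would simply be another rule in the grammar, but the idea is the same: recognize the clause first, then decide what validation code to emit for it.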
If you want to look at an open-source data interchange system with roughly the characteristics you are proposing (multi-platform, multi-language, data definition language), you could do worse than Google's Protocol Buffers, more commonly known as protobuf.
The data description language's compiler is not, unfortunately, generated from a grammar; but it is a relatively readable recursive-descent parser written in C++. Code generators for several languages are included, and many more are available.
An interesting feature is that the interchange format can be described in itself. In addition, it is possible to encode and decode data based on the description of the interchange format, so it is also possible to interchange format descriptions and use them ad hoc without the need for code generation. (This is less efficient, obviously, but is nonetheless often useful.)
Always a good starting point is the example grammars in the ANTLR 4 repo. Simple grammars, like abnf, json, and less, might provide relevant starting points for your specification grammar. More complex grammars, like the several sql grammars, can give insight into how to handle more difficult or involved specification constructs -- each line of your specification appears broadly analogous to an sql statement.
Of course, ANTLR 4 -- both its grammar and its implementation -- is the most spot-on example of reading a specification and generating a derived source output.
Whenever I take a look at Axon Bank I start wondering whether I should follow a set of design rules for events and commands.
In Axon Bank both events and commands exclusively consist of primitives. In my applications I tend to avoid primitive usage as much as possible, mainly to build an expressive domain and to have type safety wherever I can get it.
Axon itself comes with some DDD references, but no matter which documents I browse, not a single example makes use of compound objects as part of event/command payloads.
This confuses me, because there is built-in support for full-blown XML and JSON serialization, capable of more than just handling some key-value pairs.
I understand that especially events tend to be small and simple structures since they only reflect incremental state changes but there will always be some kind of gap between a complex domain model and an event (entry).
In my domain I could have a bunch of Classes like OverdraftLimit, CurrentBalance, Deposit and AccountIdentifier.
Now there are two possible ways to design events and commands:
1. Primitives and extensive converting
Treat Events as raw data with a nice label on it
Convert raw data to powerful objects as soon as it "enters" the application
When creating events simply strip them down again.
public class BankAccountCreatedEvent {
    private final String accountIdentifier;
    private final int overdraftLimit;
    // ...
}
And somewhere else:
public void on(BankAccountCreatedEvent event) {
    this.accountIdentifier = AccountIdentifier.fromString(event.getAccountIdentifier());
    this.overdraftLimit = new OverdraftLimit(event.getOverdraftLimit());
}
Pros:
Simple command/event API that does not have any weird dependencies
Makes distribution easier
Upcasters will only be needed if the actual event structure changes and therefore can be anticipated easily.
Cons:
A huge conversion layer needs to be written and maintained
Decoupling events/commands and the rest of the domain model for mainly technical reasons introduces a new, artificial, contextual gap
2. Expressive Payloads
Use sophisticated types directly as attributes
public class BankAccountCreatedEvent {
    private final BankAccountIdentifier bankAccountIdentifier;
    private final OverdraftLimit overdraftLimit;
    // ...
}
Pros:
Less to write, easier to read
Keep together what naturally belongs together
Cons:
Domain logic influences event structure indirectly, upcasting will be needed more frequently and will be less predictable.
I need a second opinion. Is there a recommended way?
The primary thing to keep in mind is that the serialized form of the Event is your formal contract. How you represent that in Java classes is up to each application, in the end. If you configure your serializer to ignore unknown fields, you can leave fields you don't care about out, for example.
Personally, I don't mind primitives in Events. However, I do understand the value of using explicit Value Objects for certain fields, as they allow you to express the "mathematics" involved with each of them. In the case of identifiers, they prevent a "mix-up" where an identifier is used to accidentally attempt to identify another type of object.
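As a sketch of that point, a minimal identifier value object might look like this in Java. The class and method names are illustrative, not Axon API; the type itself, rather than caller discipline, is what prevents an account identifier from being passed where another kind of identifier is expected:

```java
import java.util.Objects;

// A minimal value object wrapping a raw String identifier.
public final class AccountIdentifier {
    private final String value;

    public AccountIdentifier(String value) {
        this.value = Objects.requireNonNull(value);
    }

    public static AccountIdentifier fromString(String raw) {
        return new AccountIdentifier(raw);
    }

    public String asString() {
        return value;
    }

    // Value semantics: two identifiers are equal iff their raw values are equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof AccountIdentifier
                && value.equals(((AccountIdentifier) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    public static void main(String[] args) {
        AccountIdentifier id = AccountIdentifier.fromString("abcdef1234");
        System.out.println(id.asString()); // abcdef1234
    }
}
```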
In the end, it doesn't matter that much. With a few simple Jackson annotations, you can translate these Value Objects to a simple value in JSON. Check out @JsonValue, for example.
public class BankAccountCreatedEvent {
    private final BankAccountIdentifier bankAccountIdentifier;
    private final OverdraftLimit overdraftLimit;
    // ...
}
would map to:
{
    "bankAccountIdentifier": "abcdef1234",
    "overdraftLimit": 1000
}
This works if the BankAccountIdentifier and OverdraftLimit classes both have an @JsonValue-annotated method that returns their 'simple' value.
I have a requirement where I need to transfer information over the wire (binary over TCP) between two applications, one in Java and the other in C++. I need a protocol implementation to transfer objects between these two applications. The object classes are present in both applications (and are mapped accordingly). I just need some encoding scheme on one side which retains the object representation, so that it can be decoded on the other side as a complete object.
For example:
C++ class
class Person
{
public:
    int age;
    std::string name;
};
Java class
class Person
{
    int age;
    String name;
}
C++ encoding
Person p;
p.age = 20;
p.name = "somename";
std::vector<char> arr = SomeProtocolEncoder::encode(p);
socket.send(arr);
Java decoding
byte[] arr = socket.read();
SomeProtocolIntermediateObject object = SomeProtocolDecoder.decode(arr);
Person p = (Person)ReflectionUtil.get(object);
The protocol should provide some intermediate object which maintains the object's representational state, so that using reflection I can get the object back later.
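To make the requirement concrete, here is a hand-rolled Java sketch of one possible wire format; the fixed field order and length-prefixed UTF-8 strings are my own assumptions. A real protocol library generates this kind of codec for both languages from a single shared definition, which is exactly the tedium I would like to avoid writing by hand for every class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PersonCodec {

    public static class Person {
        int age;
        String name;
    }

    // Encode: a 4-byte big-endian age followed by a length-prefixed UTF-8 name.
    public static byte[] encode(Person p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(p.age);
            out.writeUTF(p.name);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode: read the fields back in the same, agreed-upon order.
    public static Person decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Person p = new Person();
            p.age = in.readInt();
            p.name = in.readUTF();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.age = 20;
        p.name = "somename";
        Person q = decode(encode(p));
        System.out.println(q.age + " " + q.name); // 20 somename
    }
}
```

The C++ side would have to mirror the byte order and the string framing exactly, which is why a generated, shared schema is much safer than two hand-maintained codecs.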
Sounds like you want Protobufs: http://code.google.com/apis/protocolbuffers/docs/tutorials.html
Check out Google's protocol buffers.
Thrift is what you're looking for. You just create a definition of the structs and methods you need to call and it does all of the heavy lifting. It's got binary protocols (optionally with zlib compression or ssl). It'll probably do your taxes but you didn't hear that from me.
You might want to check out these projects and choose one:
Protocol Buffers
Thrift
Apache Avro
Here is a Thrift-vs-PB comparison I read recently. You should also refer to this Wiki for performance comparisons between these libraries.
You can check out the AMEF protocol. An example of C++ encoding in AMEF would look like this:
// Create a new AMEF object
AMEFObject *object = new AMEFObject();

// Add a child string object
object->addPacket("This is the Automated Message Exchange Format Object property!!", "adasd");

// Add a child integer object
object->addPacket(21213);

// Add a child boolean object
object->addPacket(true);

AMEFObject *object2 = new AMEFObject();
string j = "This is the property of a nested Automated Message Exchange Format Object";
object2->addPacket(j);
object2->addPacket(134123);
object2->addPacket(false);

// Add a child character object
object2->addPacket('d');

// Add a child AMEF object
object->addPacket(object2);

// Encode the AMEF object
string str = AMEFEncoder().encode(object, false);
Decoding in Java would look like this:
byte[] arr = ...; // the AMEF-encoded byte array
AMEFDecoder decoder = new AMEFDecoder();
AMEFObject object1 = decoder.decode(arr, true);
The protocol implementation has codecs for both C++ and Java; the interesting part is that it can retain the object's class representation in the form of name-value pairs.
I needed a similar protocol in my last project, when I incidentally stumbled upon this protocol; I actually modified the base library according to my requirements. Hope this helps you.
What about plain old ASN.1?
It would have the advantage of being really backed by a standard (and widely used). The problem is finding a compiler/runtime for each language.
This project is the ultimate comparison of Java serialization protocols:
https://github.com/eishay/jvm-serializers/wiki
Some libraries also provide C++ serialization.
I've personally ported Python Construct to Java. If there's some interest I'll be happy to start a conversion project to C++ and/or JavaScript!
http://construct.wikispaces.com/
https://github.com/ZiglioNZ/construct
In Google's Protocol Buffer API for Java, they use these nice Builders that create an object (see here):
Person john =
    Person.newBuilder()
        .setId(1234)
        .setName("John Doe")
        .setEmail("jdoe@example.com")
        .addPhone(
            Person.PhoneNumber.newBuilder()
                .setNumber("555-4321")
                .setType(Person.PhoneType.HOME))
        .build();
But the corresponding C++ API does not use such Builders (see here)
The C++ and the Java API are supposed to be doing the same thing, so I'm wondering why they didn't use builders in C++ as well. Are there language reasons behind that, i.e. it's not idiomatic or it's frowned upon in C++? Or probably just the personal preference of the person who wrote the C++ version of Protocol Buffers?
The proper way to implement something like that in C++ would use setters that return a reference to *this.
class Person {
    std::string name;
public:
    Person &setName(std::string const &s) { name = s; return *this; }
    Person &addPhone(PhoneNumber const &n);
};
The class could be used like this, assuming similarly defined PhoneNumber:
Person p = Person()
    .setName("foo")
    .addPhone(PhoneNumber()
        .setNumber("123-4567"));
If a separate builder class is wanted, then that can be done too. Such builders should be allocated on the stack, of course.
I would go with the "not idiomatic", although I have seen examples of such fluent-interface styles in C++ code.
It may be because there are a number of ways to tackle the same underlying problem. Usually, the problem being solved here is that of named arguments (or rather their lack of). An arguably more C++-like solution to this problem might be Boost's Parameter library.
The difference is partially idiomatic, but is also the result of the C++ library being more heavily optimized.
One thing you failed to note in your question is that the Java classes emitted by protoc are immutable and thus must have constructors with (potentially) very long argument lists and no setter methods. The immutable pattern is used commonly in Java to avoid complexity related to multi-threading (at the expense of performance) and the builder pattern is used to avoid the pain of squinting at large constructor invocations and needing to have all the values available at the same point in the code.
The C++ classes emitted by protoc are not immutable and are designed so that the objects can be reused over multiple message receptions (see the "Optimization Tips" section on the C++ Basics Page); they are thus harder and more dangerous to use, but more efficient.
It is certainly the case that the two implementations could have been written in the same style, but the developers seemed to feel that ease of use was more important for Java and performance was more important for C++, perhaps mirroring the usage patterns for these languages at Google.
Your claim that "the C++ and the Java API are supposed to be doing the same thing" is unfounded. They're not documented to do the same things. Each output language can create a different interpretation of the structure described in the .proto file. The advantage of that is that what you get in each language is idiomatic for that language. It minimizes the feeling that you're, say, "writing Java in C++." That would definitely be how I'd feel if there were a separate builder class for each message class.
For an integer field foo, the C++ output from protoc will include a method void set_foo(int32 value) in the class for the given message.
The Java output will instead generate two classes. One directly represents the message, but only has getters for the field. The other class is the builder class and only has setters for the field.
The Python output is different still. The class generated will include a field that you can manipulate directly. I expect the plug-ins for C, Haskell, and Ruby are also quite different. As long as they can all represent a structure that can be translated to equivalent bits on the wire, they've done their jobs. Remember these are "protocol buffers," not "API buffers."
The source for the C++ plug-in is provided with the protoc distribution. If you want to change the return type for the set_foo function, you're welcome to do so. I normally avoid responses that amount to, "It's open source, so anyone can modify it" because it's not usually helpful to recommend that someone learn an entirely new project well enough to make major changes just to solve a problem. However, I don't expect it would be very hard in this case. The hardest part would be finding the section of code that generates setters for fields. Once you find that, making the change you need will probably be straightforward. Change the return type, and add a return *this statement to the end of the generated code. You should then be able to write code in the style given in Hrnt's answer.
To follow up on my comment...
struct Person
{
    int id;
    std::string name;

    struct Builder
    {
        int id;
        std::string name;

        Builder &setId(int id_)
        {
            id = id_;
            return *this;
        }

        Builder &setName(std::string name_)
        {
            name = name_;
            return *this;
        }
    };

    static Builder build(/* insert mandatory values here */)
    {
        return Builder(/* and then use mandatory values here */)/* or here: .setId(val) */;
    }

    Person(const Builder &builder)
        : id(builder.id), name(builder.name)
    {
    }
};

void Foo()
{
    Person p = Person::build().setId(2).setName("Derek Jeter");
}
This ends up getting compiled into roughly the same assembler as the equivalent code:
struct Person
{
    int id;
    std::string name;
};

Person p;
p.id = 2;
p.name = "Derek Jeter";
In C++ you have to explicitly manage memory, which would probably make the idiom more painful to use - either build() has to call the destructor for the builder, or else you have to keep it around to delete it after constructing the Person object.
Either is a little scary to me.
Are there any tools available for transforming types defined in an XSD schema (which may or may not include other XSD files) into ActionScript value objects? I've been googling this for a while but can't seem to find any tools, and I'm pondering whether writing such a tool would save us more time right now than simply coding our value objects by hand.
Another possibility I've been considering is using a tool such as XMLBeans to transform the types defined by the schema into Java classes and then converting those classes to ActionScript. However, I've come to realize that there are about a gazillion Java -> AS3 converters out there, and the general consensus seems to be that they only sort of work; that is, I have no idea which tool is a good fit.
Any thoughts?
For Java -> AS generation, check out GAS3 from the Granite Data Services project:
http://www.graniteds.org/confluence/display/DOC/2.+Gas3+Code+Generator
This is the kind of thing you can write yourself too, especially if you leverage a tool like Ant and write a custom Task to handle it. In fact, I worked on this last year and open-sourced it:
https://github.com/cliffmeyers/Java2As
I don't have any kind of translator either. What I do is have an XML object wrapped by an ActionScript object. Then you have a getter/setter for each value that converts XML -> whatever and whatever -> XML. You still have to write the getters/setters, but you can have a macro/snippet handle that work for you.
So for XML like:
<person>
    <name>Bob</name>
    ...
</person>
Then we have an XMLObjectWrapper class and extend it:
class XMLObjectWrapper
{
    var _XMLObject:XML;

    function set XMLObject(xml:XML):void
    {
        _XMLObject = xml;
    }

    function get XMLObject():XML
    {
        return _XMLObject;
    }
}

class person extends XMLObjectWrapper
{
    function set name(value:String):void
    {
        _XMLObject.name = value;
    }

    function get name():String
    {
        return _XMLObject.name;
    }
}
Over the years, I think I have seen and tried every conceivable way of generating stub data structures (fake data) for complex object graphs. It always gets hairy in Java.
* * * *
A---B----C----D----E
(Pardon cheap UML)
The key issue is that there are certain relationships between the values, so a certain instance of C may imply given values for E.
Any attempt I have seen at applying a single pattern or group of patterns to solve this problem in Java ultimately ends up being messy.
I am considering whether Groovy or one of the other dynamic VM languages can do a better job. It should be possible to do things significantly more simply with closures.
Does anyone have references/examples of this problem solved nicely with (preferably) Groovy or Scala?
Edit:
I did not know "Object Mother" was the name of the pattern, but it's the one I'm having trouble with: when the object structure to be generated by the Object Mother is sufficiently complex, you always end up with a fairly complex internal structure inside the Object Mother itself (or one composed of multiple Object Mothers). Given a sufficiently large target structure (say, 30 classes), finding structured ways to implement the object mother(s) is really hard. Now that I know the name of the pattern I can google it better though ;)
You might find the Object Mother pattern to be useful. I've used this on my current Groovy/Grails project to help me create example data.
It's not Groovy-specific, but a dynamic language can often make it easier to create something like this using duck typing and closures.
I typically create object mothers using the builder pattern.
public class ItineraryObjectMother
{
    Status status;
    private long departureTime;

    public ItineraryObjectMother()
    {
        status = new Status("BLAH");
        departureTime = 123456L;
    }

    public Itinerary build()
    {
        Itinerary itinerary = new Itinerary(status);
        itinerary.setDepartureTime(departureTime);
        return itinerary;
    }

    public ItineraryObjectMother status(Status status)
    {
        this.status = status;
        return this;
    }

    public ItineraryObjectMother departs(long departureTime)
    {
        this.departureTime = departureTime;
        return this;
    }
}
Then it can be used like this:
Itinerary i1 = new ItineraryObjectMother().departs(1234L).status(someStatus).build();
Itinerary i2 = new ItineraryObjectMother().departs(1234L).build();
As Ted said, this can be improved/simplified with a dynamic language.
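As a sketch of that direction in plain Java, lambdas can carry the per-test customization into the mother, which is roughly what a Groovy closure would do more tersely. The classes below are simplified stand-ins for the Itinerary/Status types used above, assumed here only for illustration:

```java
import java.util.function.Consumer;

public class ItineraryMother {

    static class Status {
        final String code;
        Status(String code) { this.code = code; }
    }

    static class Itinerary {
        final Status status;
        long departureTime;
        Itinerary(Status status) { this.status = status; }
        void setDepartureTime(long t) { departureTime = t; }
    }

    // Sensible defaults; each test overrides only what it cares about.
    Status status = new Status("BLAH");
    long departureTime = 123456L;

    // The lambda mutates the mother's fields before the object is built.
    public static Itinerary itinerary(Consumer<ItineraryMother> tweaks) {
        ItineraryMother m = new ItineraryMother();
        tweaks.accept(m);
        Itinerary i = new Itinerary(m.status);
        i.setDepartureTime(m.departureTime);
        return i;
    }

    public static void main(String[] args) {
        Itinerary i1 = itinerary(m -> m.departureTime = 1234L);
        Itinerary i2 = itinerary(m -> {
            m.status = new Status("OK");
            m.departureTime = 1234L;
        });
        System.out.println(i1.departureTime + " " + i2.status.code); // 1234 OK
    }
}
```

Compared to one fluent mother per class, this keeps the defaults in one place while letting each test express just its deviation; in Groovy the `Consumer` boilerplate disappears entirely.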