What is the best practice for sharing an Avro schema / generated sources?
I have two Java applications that communicate through Kafka.
My thought was to use Avro schemas for the events that flow between the applications.
So extracting the Avro schemas into a shared library seems like a good idea. But what is actually best practice here? Normally generated files are not stored in source control, but is that also the case with Avro-generated Java classes? If not, then each consumer will have to generate its own classes at compile time. (But is that even possible if the schemas are in a Maven, Gradle, etc. dependency?)
Overall, version control is good, but you should ignore generated sources such as those that end up in the target folder in Maven.
The generated Java classes can go into a shared library deployed to Nexus/Artifactory, for example during a mvn deploy, and from there they can be appropriately versioned for consumers to use.
Within the classes generated by avro-maven-plugin, the schema is available as a static field, so you wouldn't need to copy the .avsc resources into the package.
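For example, any Avro-generated SpecificRecord class exposes its schema via the static SCHEMA$ field and the getClassSchema() accessor. A minimal sketch, where OrderCreated is a hypothetical generated class pulled in from the shared library:

import org.apache.avro.Schema;

public class SchemaLookup {
    public static void main(String[] args) {
        // The schema travels with the generated class itself, so consumers of the
        // shared JAR never need the .avsc file at runtime.
        Schema schema = OrderCreated.getClassSchema(); // equivalent to OrderCreated.SCHEMA$
        System.out.println(schema.toString(true));     // pretty-printed schema JSON
    }
}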
Otherwise, assuming you are using the Confluent Schema Registry, you could use the GenericRecord type in your consumers and read fields much as you would from a parsed JSON message, e.g. Object fieldName = message.value().get("fieldName"), while producers could still use a specific Avro class.
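A minimal consumer sketch along those lines, assuming Confluent's KafkaAvroDeserializer is on the classpath; the topic name, URLs, and field name below are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GenericRecordConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "generic-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");
        // Without specific.avro.reader=true, the deserializer yields GenericRecord,
        // so the consumer needs no generated classes at all.

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            for (ConsumerRecord<String, GenericRecord> record : consumer.poll(Duration.ofSeconds(1))) {
                Object fieldName = record.value().get("fieldName"); // field access by name
                System.out.println(fieldName);
            }
        }
    }
}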
Related
I have two .avsc files with a matching namespace and field names. When generating, the classes generated from the first schema are overwritten by the classes from the second schema. Is there a way to generate classes in a specific directory, but only for one of the .avsc files?
If I change the namespace in the Avro schema, everything generates fine, but Kafka messages aren't being read and I get the following error:
Could not find class com.package.TestClaim
Obviously, because after the change Avro's namespace is com.package.test_claim.TestClaim.
This is what's generated when I added *.test_claim to the namespace of one of the schemas.
"...in a specific directory, but only for one of the .avsc files?"
That's what the namespace does. It isn't overridable elsewhere, so yes, two files that share a namespace will conflict if compiled separately.
It's unclear how you are compiling the Java classes, but if you use Avro IDL rather than AVSC, you can declare/combine many record types under a single namespace on the protocol (which will write the classes to a single folder).
And if you need different/nested packaging, that is available too:
@namespace("org.apache.avro.firstNamespace")
protocol MyProto {
  @namespace("org.apache.avro.someOtherNamespace")
  record Foo {}
}
Here the protocol itself compiles into the org.apache.avro.firstNamespace package, while Foo is written to org.apache.avro.someOtherNamespace.
We maintain multiple projects that communicate via XML.
The interfaces are defined in XML schemas (.xsd files).
We use JAXB to generate classes from those schemas that are then used in the projects.
We also use the .xsd files to validate input or output.
Sometimes, we need to update the schemas to create a new version that may or may not be backwards compatible.
How can we effectively manage these schemas? Projects should be able to select which version(s) of the schemas they want to work with. It would be nice if every project's build didn't have to integrate and maintain the class generation step again. Are there any good practices for this?
I'm currently thinking about two options:
Package the generated classes as an artifact and deploy them to a Maven repo from where projects can pull them in. Projects don't have to deal with the class generation but access to the .xsd file itself becomes more complicated.
Pull the schemas into the projects as Git submodules. This gives simple access to the schema file but each project's build has to bother with generating the classes.
Basically, JAXB (and XML data binding generally) is a bad idea unless the schema is very stable. You may be using the wrong technology. Working with multiple versions of the schema means you are working with multiple versions of compiled Java code, and that's always going to be a maintenance nightmare.
It may not be a helpful suggestion, but my advice is, don't start from here. If multiple versions of the schema need to coexist, then you want a technology where the application doesn't need to be recompiled every time there is a schema change; look at either a generic low-level API such as JDOM2 or XOM, or a declarative XML-oriented language such as XSLT, XQuery, or LINQ.
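To illustrate the generic, non-binding style: a minimal sketch using JDOM2, where the file name claim.xml and the element name amount are made up. The point is that this code keeps working when the schema gains elements it doesn't touch, with no regeneration or recompilation step:

import java.io.File;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

public class GenericXmlReader {
    public static void main(String[] args) throws Exception {
        // Parse without any schema-derived classes; unknown elements are simply ignored.
        Document doc = new SAXBuilder().build(new File("claim.xml"));
        Element root = doc.getRootElement();
        // Read only the fields this application actually needs.
        String amount = root.getChildText("amount");
        System.out.println("amount = " + amount);
    }
}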
I'm using JAXB2 for a REST web service.
I need to use two schemas: one is my own schema, stored in the src/main/resources/schema folder, and the other is an online schema, http://mypage.com/1/meta/schema.xsd. The problem is that both schemas have duplicated imports, so when I try to build the package, both executions fail with errors saying that certain classes were already defined.
How can I fix this?
You could use separate schema compilation for that, i.e. each schema file is compiled into its own JAR, as sketched below.
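A hedged sketch of that idea, invoking the JAXB RI schema compiler (xjc) programmatically, once per schema, so each schema lands in its own package and can be jarred independently. The paths and package names are hypothetical; in practice the same effect is usually achieved with two plugin executions in the build:

import com.sun.tools.xjc.Driver;

public class SeparateSchemaCompilation {
    public static void main(String[] args) throws Exception {
        // Compile each schema on its own, into its own package, so the duplicated
        // imports are generated twice in different packages instead of colliding.
        Driver.run(new String[] { "-d", "target/gen-own", "-p", "com.example.own",
                "src/main/resources/schema/schema.xsd" }, System.out, System.err);
        Driver.run(new String[] { "-d", "target/gen-meta", "-p", "com.example.meta",
                "http://mypage.com/1/meta/schema.xsd" }, System.out, System.err);
    }
}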
Say I have two projects, A and B. Java projects, in case that's important.
Project A contains a bunch of XSD files that represent core types and elements. They are all placed in a package called, say, "definition". This gets built into project-a.jar.
Project B represents an extension project, and it's allowed to define its own types and elements. I created a new schema and placed it in the "definition.extension" package. This gets built into project-b.jar.
Now, for the XSDs in Project B, what exactly should I put as the schemaLocation for an include?
schemaLocation="../core-types.xsd" didn't quite work (I know, it's needs a URI), but what exactly is the proper or standard approach to this? Google found me more people asking this question that clear-cut, standard approaches on what really is the correct way to handle this.
It can't be that I have programmatically adjust the schemaLocation during runtime... or that I'd need a build step/script that will dynamically replaced the schemaLocation during compilation... right?
I'm not looking for answers like "put them in a shared location". I'm looking for something more along the lines of a dev environment that uses relative references instead of hardcoded references.
FYI, I'm using IntelliJ IDEA, in case there's an IDE-specific approach.
If you just want IntelliJ to stop showing your includes in red, you can use a custom URI in your include. You then go to Project Settings -> Schemas and DTDs, where you can map this URI onto a local file.
If you need to do schema validation at run time, that's a different story. You probably need to use an XML Catalog. If you're using JAXB, you should have a look at this question: jaxb - how to map xsd files to URL to find them
You should use XML Catalogs. This link gives a thorough introduction to XML catalogs - and how to use them in Java for instance - by XML expert Norman Walsh. Quote:
These catalog files can be used to map public and system identifiers and other URIs to local files (or just other URIs).
The aforementioned identifiers are typically the schema locations or namespaces you use in schema imports.
When using such catalogs, in order to avoid confusion and certain bugs in XJC, I strongly recommend you remove all schemaLocations from the schema imports in your XML schemas and keep only the namespace (if you have a choice, of course). For example:
<import namespace="http://www.w3.org/1999/xlink" />
Then specify the mappings for each namespace to the actual schema location in the catalog. For example, using the OASIS XML catalog format:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <uri name="http://www.w3.org/1999/xlink" uri="w3c/1999/xlink.xsd" />
</catalog>
At compilation time, if you are generating JAXB-annotated classes from the schemas, I recommend you use Episodes to achieve separate schema compilation, a.k.a. modular schema compilation. This is supported by maven-jaxb2-plugin, for instance, which also has advanced support for catalogs.
For runtime, depending on your XML use case, you should try to use a library with native support for XML catalogs, such as most Java web service frameworks (JAX-WS RI/Metro, Apache CXF...) if you are developing web services for example. If you can't or if you want finer control over the XML catalog (e.g. being able to load schemas from the classpath), I invite you to look at the XML Entity and URI Resolvers page mentioned earlier, especially the sections Using Catalogs with Popular Applications and Adding Catalog Support to Your Applications. Basically, you play with org.apache.xml.resolver.tools.CatalogResolver and optionally (for finer control) org.apache.xml.resolver.CatalogManager classes. For concrete examples of custom CatalogResolver/CatalogManager, you may look at code sources from Apache commons-configuration, AuthzForce, CXF, etc.
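As a minimal sketch of the CatalogResolver approach, assuming Apache XML Commons Resolver is on the classpath and catalog.xml is the catalog file shown above:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xml.resolver.CatalogManager;
import org.apache.xml.resolver.tools.CatalogResolver;

public class CatalogAwareParsing {
    public static void main(String[] args) throws Exception {
        CatalogManager manager = new CatalogManager();
        manager.setIgnoreMissingProperties(true); // no CatalogManager.properties needed
        manager.setCatalogFiles("catalog.xml");   // the OASIS catalog with the URI mappings

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // CatalogResolver implements org.xml.sax.EntityResolver, so external
        // identifiers are redirected to the local files mapped in the catalog.
        builder.setEntityResolver(new CatalogResolver(manager));
        System.out.println(builder.parse("input.xml").getDocumentElement().getNodeName());
    }
}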
I have five WSDLs that share namespaces, though not all of them do. I generate client code out of them (databinding with XMLBeans). Separately, they compile fine. I create a JAR file out of each generated client.
Once I try to use all the JAR files within one project, I get naming / compile conflicts.
I want to reuse as much as possible. Is there any smart way to deal with this (rather than giving each client its own node in the package structure)?
The XMLBeans (2.x) FAQ notes the limitations of xsdconfig namespace mapping:
Note: XMLBeans doesn’t support using two or more sets of java classes (in different packages) mapped to schema types/elements that have the same names and target namespaces, using all in the same class loader. Depending on the direction you are using for the java classes to schema types mapping, some features might not work correctly. This is because even though the package names for the java classes are different, the schema location for the schema metadata (.xsb files) is the same and contains the corresponding implementing java class, so the JVM will always pick up the first on the classpath. This can be avoided if multiple class loaders are used.
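A minimal sketch of that "multiple class loaders" workaround; the JAR paths and stub class names are hypothetical. Each generated client JAR gets its own loader, so each client resolves its own .xsb schema metadata instead of whichever copy happens to come first on a shared classpath:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedXmlBeansClients {
    public static void main(String[] args) throws Exception {
        ClassLoader parent = IsolatedXmlBeansClients.class.getClassLoader();
        // One loader per generated client JAR, so their identically-located
        // .xsb files no longer shadow each other.
        URLClassLoader loaderA = new URLClassLoader(
                new URL[] { new File("clients/client-a.jar").toURI().toURL() }, parent);
        URLClassLoader loaderB = new URLClassLoader(
                new URL[] { new File("clients/client-b.jar").toURI().toURL() }, parent);

        // Load the conflicting stubs through their respective loaders and use them
        // reflectively, or behind a shared interface living in the parent loader.
        Class<?> stubA = Class.forName("com.example.a.ServiceStub", true, loaderA);
        Class<?> stubB = Class.forName("com.example.b.ServiceStub", true, loaderB);
        System.out.println(stubA.getClassLoader() + " / " + stubB.getClassLoader());
    }
}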