I'm using JAXB2 for a REST web service.
I need to use two schemas: my own schema, stored in the src/main/resources/schema folder, and an online schema, http://mypage.com/1/meta/schema.xsd. The problem is that both schemas have duplicated imports, so when I try to build the package, both plugin executions fail with errors saying that certain classes were already defined.
How can I fix this?
You could use separate schema compilation for that, i.e. build a separate JAR from each schema file.
I have two .avsc files with matching namespaces and field names. When generating, the classes generated from the first schema are overwritten by the classes from the second schema. Is there a way to generate classes in a specific directory, but only for one of the .avsc files?
If I change the namespace in the Avro schema, everything compiles, but Kafka messages can no longer be read and I get the following error:
Could not find class com.package.TestClaim
Obviously, because after the change the Avro namespace maps the class to com.package.test_claim.TestClaim.
This is what gets generated when I add *.test_claim to the namespace of one of the schemas.
"in a specific directory, but only for one of the .avsc files?"
That's what the namespace controls. It isn't overridable anywhere else, so yes, two files with the same namespace and record names will conflict if compiled separately.
It's unclear how you are compiling the Java classes, but if you use Avro IDL rather than AVSC, you can declare and combine many record types under a single namespace on the protocol (which writes the classes to a single folder).
And if you need different or nested packaging, that is available too:
@namespace("org.apache.avro.firstNamespace")
protocol MyProto {
  // Foo is generated into the org.apache.avro.someOtherNamespace package
  @namespace("org.apache.avro.someOtherNamespace")
  record Foo {}
}
We maintain multiple projects that communicate via XML.
The interfaces are defined in XML schemas (.xsd files).
We use JAXB to generate classes from those schemas that are then used in the projects.
We also use the .xsd files to validate input or output.
Sometimes, we need to update the schemas to create a new version that may or may not be backwards compatible.
How can we effectively manage these schemas? Projects should be able to select which version(s) of the schemas they want to work with. It would be nice if every project's build didn't have to integrate and maintain the class generation step again. Are there any good practices for this?
I'm currently thinking about two options:
Package the generated classes as an artifact and deploy them to a Maven repository from which projects can pull them in. Projects don't have to deal with class generation, but access to the .xsd file itself becomes more complicated.
Pull the schemas into the projects as Git submodules. This gives simple access to the schema files, but each project's build has to handle generating the classes.
Basically, JAXB (and XML data binding generally) is a bad idea unless the schema is very stable. You may be using the wrong technology. Working with multiple versions of the schema means you are working with multiple versions of compiled Java code, and that's always going to be a maintenance nightmare.
It may not be a helpful suggestion, but my advice is, don't start from here. If multiple versions of the schema need to coexist, then you want a technology where the application doesn't need to be recompiled every time there is a schema change; look at either a generic low-level API such as JDOM2 or XOM, or a declarative XML-oriented language such as XSLT, XQuery, or LINQ.
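For instance, here is a minimal JDOM2 sketch (the file and element names are made up for illustration) that reads a document without any generated classes, so it keeps working across schema versions as long as the elements it touches still exist:

import java.io.File;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

public class ReadWithoutBinding {
    public static void main(String[] args) throws Exception {
        // Parse the document generically; no schema-derived classes involved.
        Document doc = new SAXBuilder().build(new File("order.xml")); // hypothetical input file
        Element root = doc.getRootElement();

        // Pull out only the pieces this application actually cares about.
        String id = root.getChildText("orderId"); // hypothetical element name
        System.out.println("orderId = " + id);
    }
}

Because nothing here is compiled against a particular schema version, a new schema revision only breaks this code if the specific elements it reads are renamed or removed.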
What is the best practice for sharing an Avro schema and the generated sources?
I have two Java applications that communicate through Kafka.
My thought was to use Avro schemas for the events that flow between the applications.
So extracting the Avro schemas into a shared library seems like a good idea. But what is actually best practice here? Normally, generated files are not stored in source control, but is that also the case with Avro-generated Java classes? If not, then each consumer will have to generate its own classes at compile time. (But is that even possible if the schemas are in a Maven, Gradle, etc. dependency?)
Overall, version control is good, but you should ignore generated sources such as those that end up in the target folder in Maven.
The generated Java classes can go into a shared library published to Nexus/Artifactory, for example during mvn deploy, and from there they can be appropriately versioned for consumers to use.
Within the classes generated by the avro-maven-plugin, the schema is available as a static field, so you wouldn't need to copy the .avsc resources into the package.
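For example, a minimal sketch (TestClaim is a stand-in for whatever class your shared library generates and is assumed to be on the classpath):

import org.apache.avro.Schema;

public class SchemaFromGeneratedClass {
    public static void main(String[] args) {
        // Every class generated by the avro-maven-plugin embeds its writer schema.
        // "TestClaim" is a placeholder for a generated class from your shared library.
        Schema schema = TestClaim.getClassSchema(); // equivalent to TestClaim.SCHEMA$
        System.out.println(schema.toString(true));  // pretty-printed .avsc JSON
    }
}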
Otherwise, assuming you are using the Confluent Schema Registry, you could use the GenericRecord type in your consumers and read fields by name, much as you would for a plain JSON message, e.g. Object fieldName = message.value().get("fieldName"), while producers can still use a specific Avro class.
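A minimal consumer sketch along those lines, assuming the Confluent Avro deserializer is on the classpath; the topic name, group id, broker, and registry addresses below are made up for illustration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GenericAvroConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker address
        props.put("group.id", "generic-claims-reader");            // made-up group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed registry address
        // specific.avro.reader stays at its default (false), so values arrive as GenericRecord.

        try (KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("claims")); // made-up topic
            // A real consumer would poll in a loop; one poll keeps the sketch short.
            ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, GenericRecord> record : records) {
                // No generated class needed on the consumer side.
                Object fieldName = record.value().get("fieldName");
                System.out.println(fieldName);
            }
        }
    }
}

The producer side can keep using the class generated from the shared schema; only the consumer opts into GenericRecord.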
I am currently working on having the Maven JAXB plugin generate the model for a really complex set of XML schemas. I started with all the schemas in one Maven module, and the resulting module was able to parse any document in that language. Unfortunately, I had to split that one module into several because I needed to import the schema and model in another module, and there seem to be issues with episode files in multi-schema modules.
Now, for example, I have one schema that defines the general structure of the XML format and the basic types (let's call it A). Several other schemas each implement sub-types of those base types (let's call one of them B).
I created the modules so that B extends A. Unfortunately, it now seems that I am unable to parse documents at all. I can see that in module B an ObjectFactory for A has been created, containing only definitions for the types that B extends from A; all the others are no longer present.
As long as this ObjectFactory is present I am not able to parse anything, because the definition of the root element is missing. If I delete it, all the elements defined in B are missing from the resulting object tree.
Having compared the original ObjectFactory in the single module with all the schemas, I could see that the first version contained tons of "alternatives" that essentially told the parser which elements could possibly appear. In the split-up version these are missing, so I guess that if I delete the partial ObjectFactory in B, the one in A is used instead, and that one doesn't know about the elements in B.
So how can I have JAXB parse all elements in A and B? Is there some way to extend ObjectFactories? If so, how is this done?
I guess the next thing that could cause trouble is that I have several schemas extending A, so I have documents containing A, B, C, D and E, where B, C, D and E all extend A but are completely unrelated to each other. I guess extending wouldn't be an option in that case.
I ran into this situation a lot while working on the OGC Schemas and Tools Project, which has tons of schemas importing one another.
Here are a few tips:
Separate individual schemas into individual Maven modules. One schema - one jar.
Generate episode files.
Use these episodes and separate schema compilation to avoid generating classes for the imported schema.
Still, XJC will generate things for the imported schema here and there, even if your episode file says you don't need anything. Mostly these leftovers can simply be removed; I used the Ant plugin to delete these files during the build.
At runtime you just include all the JARs you need and build your JAXBContext for the context path com.acme.foo:com.acme.bar.
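For example, a minimal sketch using the packages from the tip above (the document path is made up):

import java.io.File;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class MultiModuleParsing {
    public static void main(String[] args) throws Exception {
        // The context path lists the packages generated from each schema JAR,
        // so JAXB sees the ObjectFactory of A and of B at the same time.
        JAXBContext context = JAXBContext.newInstance("com.acme.foo:com.acme.bar");
        Unmarshaller unmarshaller = context.createUnmarshaller();

        Object document = unmarshaller.unmarshal(new File("document.xml")); // hypothetical input
        System.out.println(document.getClass());
    }
}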
You might want to check the OGC project I mentioned. This project has a huge number of interrelated schemas, with different coexisting versions.
Say I have two projects, A and B. Java projects, in case that's important.
Project A contains a bunch of XSD files that represent core types and elements. They are all placed in a package called, say, "definition". This gets built into project-a.jar.
Project B is an extension project and is allowed to define its own types and elements. I created a new schema and placed it in the "definition.extension" package. This gets built into project-b.jar.
Now, for the XSDs in Project B, what exactly should I put as the schemaLocation for an include?
schemaLocation="../core-types.xsd" didn't quite work (I know, it needs a URI), but what exactly is the proper or standard approach to this? Google found me more people asking this question than clear-cut, standard approaches for handling it.
It can't be that I have to programmatically adjust the schemaLocation at runtime... or that I'd need a build step/script that dynamically replaces the schemaLocation during compilation... right?
I'm not looking for answers like "put them in a shared location". I'm looking for something more along the lines of a dev environment that uses relative references instead of hardcoded references.
FYI, I'm using IntelliJ IDEA, in case there's an IDE-specific approach.
If you just want IntelliJ to stop showing your includes in red, you can use some custom URI in your include. You then go to Project Settings -> Schemas and DTDs, where you can map this URI onto a local file.
If you need to do schema validation at run time, that's a different story. You probably need to use an XML Catalog. If you're using JAXB, you should have a look at this question: jaxb - how to map xsd files to URL to find them
You should use XML Catalogs. XML expert Norman Walsh's article XML Entity and URI Resolvers gives a thorough introduction to XML catalogs and to using them from Java. Quote:
These catalog files can be used to map public and system identifiers and other URIs to local files (or just other URIs).
The aforementioned identifiers are typically the schemaLocations or namespaces you use in schema imports.
When using such catalogs, in order to avoid confusion and a known XJC bug, I strongly recommend removing all schemaLocations from the schema imports and keeping only the namespace (if you have that choice, of course). For example:
<import namespace="http://www.w3.org/1999/xlink" />
Then specify the mappings for each namespace to the actual schema location in the catalog. For example, using the OASIS XML catalog format:
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <uri name="http://www.w3.org/1999/xlink" uri="w3c/1999/xlink.xsd" />
</catalog>
At compilation time, if you are generating JAXB-annotated classes from the schemas, I recommend using episodes to achieve separate (a.k.a. modular) schema compilation. This is supported by maven-jaxb2-plugin, for instance, which also has advanced support for catalogs.
For runtime, depending on your XML use case, you should try to use a library with native support for XML catalogs, such as most Java web service frameworks (JAX-WS RI/Metro, Apache CXF...) if you are developing web services for example. If you can't or if you want finer control over the XML catalog (e.g. being able to load schemas from the classpath), I invite you to look at the XML Entity and URI Resolvers page mentioned earlier, especially the sections Using Catalogs with Popular Applications and Adding Catalog Support to Your Applications. Basically, you play with org.apache.xml.resolver.tools.CatalogResolver and optionally (for finer control) org.apache.xml.resolver.CatalogManager classes. For concrete examples of custom CatalogResolver/CatalogManager, you may look at code sources from Apache commons-configuration, AuthzForce, CXF, etc.
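As a rough sketch of that last approach, assuming the Apache XML Commons Resolver library (org.apache.xml.resolver) is on the classpath and the catalog from the example above sits at a made-up location src/main/resources/catalog.xml:

import javax.xml.transform.Source;

import org.apache.xml.resolver.CatalogManager;
import org.apache.xml.resolver.tools.CatalogResolver;

public class CatalogLookup {
    public static void main(String[] args) throws Exception {
        CatalogManager manager = new CatalogManager();
        manager.setCatalogFiles("src/main/resources/catalog.xml"); // assumed catalog location
        CatalogResolver resolver = new CatalogResolver(manager);

        // Resolve the xlink namespace URI through the catalog; with the example
        // catalog above this should point at the local w3c/1999/xlink.xsd copy.
        Source resolved = resolver.resolve("http://www.w3.org/1999/xlink", null);
        System.out.println(resolved != null ? resolved.getSystemId() : "no mapping found");

        // The same resolver can then be handed to components that accept an
        // org.xml.sax.EntityResolver or javax.xml.transform.URIResolver.
    }
}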