I successfully read an XSD schema using org.eclipse.xsd.util.XSDResourceImpl and process all contained XSD elements, types, attributes etc.
But when I want to process a reference to an element declared in the imported schema, I get null as its type. It seems the imported schemas are not processed by XSDResourceImpl.
Any idea?
final XSDResourceImpl rsrc = new XSDResourceImpl(URI.createFileURI(xsdFileWithPath));
rsrc.load(new HashMap());
final XSDSchema schema = rsrc.getSchema();
...
if (elem.isElementDeclarationReference()) { // element ref
    elem = elem.getResolvedElementDeclaration();
}
XSDTypeDefinition tdef = elem.getType(); // null for element ref
Update:
I deliberately made the imported XSD invalid, but I get no exception, which confirms it is really not parsed. Is there a way to force the imported XSDs to be loaded together with the main one?
There is one important trick to process imports and includes automatically. You have to use a ResourceSet to read the main XSD file.
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.xsd.util.XSDResourceFactoryImpl;
import org.eclipse.xsd.util.XSDResourceImpl;
import org.eclipse.xsd.XSDSchema;
static ResourceSet resourceSet;
XSDResourceFactoryImpl rf = new XSDResourceFactoryImpl();
Resource.Factory.Registry.INSTANCE.getExtensionToFactoryMap().put("xsd", rf);
resourceSet = new ResourceSetImpl();
resourceSet.getLoadOptions().put(XSDResourceImpl.XSD_TRACK_LOCATION, Boolean.TRUE);
XSDResourceImpl rsrc = (XSDResourceImpl)(resourceSet.getResource(uri, true));
XSDSchema sch = rsrc.getSchema();
Then before processing an element, an attribute or a model group you have to use this:
elem = elem.getResolvedElementDeclaration();
attr = attr.getResolvedAttributeDeclaration();
grpdef = grpdef.getResolvedModelGroupDefinition();
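For illustration, here is a minimal sketch of how the ResourceSet setup and the resolution calls fit together when walking the global element declarations; the main.xsd file name and the traversal itself are assumptions, not part of the original code:
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.xsd.XSDElementDeclaration;
import org.eclipse.xsd.XSDSchema;
import org.eclipse.xsd.XSDTypeDefinition;
import org.eclipse.xsd.util.XSDResourceFactoryImpl;
import org.eclipse.xsd.util.XSDResourceImpl;

public class SchemaWalker {
    public static void main(String[] args) {
        // Register the XSD resource factory and load the main schema through a ResourceSet
        // so that xsd:import and xsd:include are followed automatically.
        Resource.Factory.Registry.INSTANCE.getExtensionToFactoryMap()
                .put("xsd", new XSDResourceFactoryImpl());
        ResourceSet resourceSet = new ResourceSetImpl();
        XSDResourceImpl rsrc = (XSDResourceImpl) resourceSet
                .getResource(URI.createFileURI("main.xsd"), true); // file name is an assumption
        XSDSchema schema = rsrc.getSchema();

        for (XSDElementDeclaration elem : schema.getElementDeclarations()) {
            // Resolve first; for a plain (non-reference) declaration this returns the element itself.
            XSDElementDeclaration resolved = elem.getResolvedElementDeclaration();
            XSDTypeDefinition type = resolved.getTypeDefinition();
            System.out.println(resolved.getName() + " -> "
                    + (type != null ? type.getName() : "<no type>"));
        }
    }
}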
Could you try something like this, resolving the type manually:
final XSDResourceImpl rsrc = new XSDResourceImpl(URI.createFileURI(xsdFileWithPath));
rsrc.load(new HashMap());
final XSDSchema schema = rsrc.getSchema();
for (Object content : schema.getContents()) {
    if (content instanceof XSDImport) {
        XSDImport xsdImport = (XSDImport) content;
        xsdImport.resolveTypeDefinition(xsdImport.getNamespace(), "");
    }
}
You may have a look here, particularly at this method:
private static void forceImport(XSDSchemaImpl schema) {
    if (schema != null) {
        for (XSDSchemaContent content : schema.getContents()) {
            if (content instanceof XSDImportImpl) {
                XSDImportImpl importDirective = (XSDImportImpl) content;
                schema.resolveSchema(importDirective.getNamespace());
            }
        }
    }
}
Related
I'm trying to validate an XML file against a number of different schemas (apologies for the contrived example):
a.xsd
b.xsd
c.xsd
c.xsd in particular imports b.xsd and b.xsd imports a.xsd, using:
<xs:include schemaLocation="b.xsd"/>
I'm trying to do this via Xerces in the following manner:
XMLSchemaFactory xmlSchemaFactory = new XMLSchemaFactory();
Schema schema = xmlSchemaFactory.newSchema(new StreamSource[] {
        new StreamSource(this.getClass().getResourceAsStream("a.xsd"), "a.xsd"),
        new StreamSource(this.getClass().getResourceAsStream("b.xsd"), "b.xsd"),
        new StreamSource(this.getClass().getResourceAsStream("c.xsd"), "c.xsd") });
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new StringReader(xmlContent)));
but this is failing to import all three of the schemas correctly, resulting in: cannot resolve the name 'blah' to a(n) 'group' component.
I've validated this successfully using Python, but having real problems with Java 6.0 and Xerces 2.8.1. Can anybody suggest what's going wrong here, or an easier approach to validate my XML documents?
So just in case anybody else runs into the same issue here, I needed to load a parent schema (and implicit child schemas) from a unit test - as a resource - to validate an XML String. I used the Xerces XMLSchemaFactory to do this along with the Java 6 validator.
In order to load the child schemas correctly via an include, I had to write a custom resource resolver. The code can be found here:
https://code.google.com/p/xmlsanity/source/browse/src/com/arc90/xmlsanity/validation/ResourceResolver.java
To use the resolver specify it on the schema factory:
xmlSchemaFactory.setResourceResolver(new ResourceResolver());
and it will use it to resolve your resources via the classpath (in my case from src/main/resources). Any comments are welcome on this...
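If you cannot use that class directly, a stripped-down classpath resolver might look roughly like the sketch below; resolving the systemId straight against the classpath root is an assumption, and the linked xmlsanity class handles more cases:
import java.io.InputStream;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

public class ClasspathResourceResolver implements LSResourceResolver {
    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        try {
            // Look the referenced schema up on the classpath by its systemId (e.g. "b.xsd").
            InputStream stream = getClass().getClassLoader().getResourceAsStream(systemId);
            if (stream == null) {
                return null; // let the parser fall back to its default resolution
            }
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSInput input = ls.createLSInput();
            input.setPublicId(publicId);
            input.setSystemId(systemId);
            input.setBaseURI(baseURI);
            input.setByteStream(stream);
            return input;
        } catch (Exception e) {
            throw new RuntimeException("Failed to resolve " + systemId, e);
        }
    }
}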
http://www.kdgregory.com/index.php?page=xml.parsing
section 'Multiple schemas for a single document'
My solution based on that document:
URL xsdUrlA = this.getClass().getResource("a.xsd");
URL xsdUrlB = this.getClass().getResource("b.xsd");
URL xsdUrlC = this.getClass().getResource("c.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//---
String W3C_XSD_TOP_ELEMENT =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
+ "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">\n"
+ "<xs:include schemaLocation=\"" +xsdUrlA.getPath() +"\"/>\n"
+ "<xs:include schemaLocation=\"" +xsdUrlB.getPath() +"\"/>\n"
+ "<xs:include schemaLocation=\"" +xsdUrlC.getPath() +"\"/>\n"
+"</xs:schema>";
Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(W3C_XSD_TOP_ELEMENT), "xsdTop"));
The schema stuff in Xerces is (a) very, very pedantic, and (b) gives utterly useless error messages when it doesn't like what it finds. It's a frustrating combination.
The schema stuff in python may be a lot more forgiving, and was letting small errors in the schema go past unreported.
Now if, as you say, c.xsd includes b.xsd, and b.xsd includes a.xsd, then there's no need to load all three into the schema factory. Not only is it unnecessary, it will likely confuse Xerces and result in errors, so this may be your problem. Just pass c.xsd to the factory, and let it resolve b.xsd and a.xsd itself, which it should do relative to c.xsd.
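A minimal sketch of that approach, assuming the three files sit next to each other on the classpath; SchemaFactory.newSchema(URL) keeps the base location, which is what lets Xerces resolve b.xsd and a.xsd relative to c.xsd:
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

Schema loadSchema() throws Exception {
    // Hand the factory only the top-level schema; the URL carries the base
    // location, so the nested includes are resolved relative to c.xsd.
    URL cXsd = getClass().getResource("c.xsd");
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    return factory.newSchema(cXsd);
}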
From the Xerces documentation:
http://xerces.apache.org/xerces2-j/faq-xs.html
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
...
StreamSource[] schemaDocuments = /* created by your application */;
Source instanceDocument = /* created by your application */;
SchemaFactory sf = SchemaFactory.newInstance(
"http://www.w3.org/XML/XMLSchema/v1.1");
Schema s = sf.newSchema(schemaDocuments);
Validator v = s.newValidator();
v.validate(instanceDocument);
I faced the same problem and after investigating found this solution. It works for me.
Enum to setup the different XSDs:
public enum XsdFile {
    // @formatter:off
    A("a.xsd"),
    B("b.xsd"),
    C("c.xsd");
    // @formatter:on

    private final String value;

    private XsdFile(String value) {
        this.value = value;
    }

    public String getValue() {
        return this.value;
    }
}
Method to validate:
public static void validateXmlAgainstManyXsds() {
    final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    String xmlFile;
    xmlFile = "example.xml";

    // Use of Enum class in order to get the different XSDs
    Source[] sources = new Source[XsdFile.class.getEnumConstants().length];
    for (XsdFile xsdFile : XsdFile.class.getEnumConstants()) {
        sources[xsdFile.ordinal()] = new StreamSource(xsdFile.getValue());
    }

    try {
        final Schema schema = schemaFactory.newSchema(sources);
        final Validator validator = schema.newValidator();
        System.out.println("Validating " + xmlFile + " against XSDs " + Arrays.toString(sources));
        validator.validate(new StreamSource(new File(xmlFile)));
    } catch (Exception exception) {
        System.out.println("ERROR: Unable to validate " + xmlFile + " against XSDs " + Arrays.toString(sources)
                + " - " + exception);
    }
    System.out.println("Validation process completed.");
}
I ended up using this:
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import java.io.IOException;
.
.
.
try {
    SAXParser parser = new SAXParser();
    parser.setFeature("http://xml.org/sax/features/validation", true);
    parser.setFeature("http://apache.org/xml/features/validation/schema", true);
    parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
    parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "http://your_url_schema_location");
    Validator handler = new Validator();
    parser.setErrorHandler(handler);
    parser.parse("file:///" + "/home/user/myfile.xml");
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException ex) {
    ex.printStackTrace();
}
class Validator extends DefaultHandler {
    public boolean validationError = false;
    public SAXParseException saxParseException = null;

    public void error(SAXParseException exception) throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void fatalError(SAXParseException exception) throws SAXException {
        validationError = true;
        saxParseException = exception;
    }

    public void warning(SAXParseException exception) throws SAXException {
    }
}
Remember to change:
1) The parameter "http://your_url_schema_location" to your XSD file location.
2) The string "/home/user/myfile.xml" to the one pointing to your XML file.
I didn't have to set the variable: -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=org.apache.xerces.jaxp.validation.XMLSchemaFactory
Just in case anybody still comes here looking for a way to validate XML (or an object) against multiple XSDs, I'll mention it here.
// Using a URL is the important part here. With a URL, the relative paths inside the XSD are resolved for include and import. Just pass the parent-level XSD here (not all the included XSDs).
URL xsdUrl = getClass().getClassLoader().getResource("my/parent/schema.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(xsdUrl);
JAXBContext jaxbContext = JAXBContext.newInstance(MyClass.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(schema);
/* If you need to validate object against xsd, uncomment this
ObjectFactory objectFactory = new ObjectFactory();
JAXBElement<MyClass> wrappedObject = objectFactory.createMyClassObject(myClassObject);
marshaller.marshal(wrappedObject, new DefaultHandler());
*/
unmarshaller.unmarshal(getClass().getClassLoader().getResource("your/xml/file.xml"));
If all the XSDs belong to the same namespace, create a new XSD that includes the others. Then, in Java, create the Schema from that new XSD.
Schema schema = xmlSchemaFactory.newSchema(
        new StreamSource(this.getClass().getResourceAsStream("/path/to/all_in_one.xsd")));
all_in_one.xsd :
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.org/schema/"
           targetNamespace="http://example.org/schema/"
           elementFormDefault="unqualified"
           attributeFormDefault="unqualified">
    <xs:include schemaLocation="relative/path/to/a.xsd"></xs:include>
    <xs:include schemaLocation="relative/path/to/b.xsd"></xs:include>
    <xs:include schemaLocation="relative/path/to/c.xsd"></xs:include>
</xs:schema>
Microsoft Powerpoint has a feature to split the slides by section (a logical grouping).
What's the best way to extract the section name?
Tech Stack -
Apache POI - v5.2.2
Java
I've achieved the same with VBA
sectionName = ActivePresentation.SectionProperties.Name(currentSlide.sectionIndex)
The Office Open XML format that Apache POI uses is the one defined in 2006 and first published with Office 2007. That version of OOXML does not know anything about sections in presentations; sections were introduced later (2010).
Even the ECMA-376 5th edition does not contain anything about sections in presentations, so Microsoft has not publicly published XSDs for this extension yet, and XmlBeans therefore cannot have generated classes for it.
So if one wants to use that feature, one must manipulate the XML directly.
How to get what XML needs to be manipulated?
All Office Open XML files, including PowerPoint *.pptx files, are ZIP archives containing XML files and other files in a special directory structure. One can simply unzip a *.pptx file and have a look inside.
Have a look at /ppt/presentation.xml and you will see the XML.
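Roughly, the section information lives in an extension list inside /ppt/presentation.xml. The snippet below is a hand-written illustration (the section name, the ids and the ext uri are placeholders); it shows the p14:section and p14:sldId elements that the code further down navigates:
<p:extLst>
    <p:ext uri="{...}">
        <p14:sectionLst xmlns:p14="http://schemas.microsoft.com/office/powerpoint/2010/main">
            <p14:section name="Introduction" id="{...}">
                <p14:sldIdLst>
                    <p14:sldId id="256"/>
                    <p14:sldId id="257"/>
                </p14:sldIdLst>
            </p14:section>
        </p14:sectionLst>
    </p:ext>
</p:extLst>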
What to use to manipulate the XML?
One can use the org.openxmlformats.schemas.presentationml.x2006.main.* classes contained in poi-ooxml-full-5.*.jar as far as possible, and otherwise org.apache.xmlbeans.XmlObject and/or org.apache.xmlbeans.XmlCursor contained in xmlbeans-5.*.jar. But using XmlObject directly can be very laborious.
Complete example for how to get the sections and the section names:
import java.io.FileInputStream;
import org.apache.poi.xslf.usermodel.*;
import org.apache.xmlbeans.XmlObject;
import javax.xml.namespace.QName;
public class PowerPointGetSectionProperties {
static Long getSlideId(XSLFSlide slide) {
if (slide == null) return null;
Long slideId = null;
XMLSlideShow presentation = slide.getSlideShow();
String slideRId = presentation.getRelationId(slide);
org.openxmlformats.schemas.presentationml.x2006.main.CTPresentation ctPresentation = presentation.getCTPresentation();
org.openxmlformats.schemas.presentationml.x2006.main.CTSlideIdList sldIdLst = ctPresentation.getSldIdLst();
for (org.openxmlformats.schemas.presentationml.x2006.main.CTSlideIdListEntry sldId : sldIdLst.getSldIdList()) {
if (sldId.getId2().equals(slideRId)) {
slideId = sldId.getId();
break;
}
}
return slideId;
}
static XmlObject[] getSections(org.openxmlformats.schemas.presentationml.x2006.main.CTExtensionList extList) {
if (extList == null) return new XmlObject[0];
XmlObject[] sections = extList.selectPath(
"declare namespace p14='http://schemas.microsoft.com/office/powerpoint/2010/main' "
+".//p14:section");
return sections;
}
static XmlObject[] getSectionSldIds(XmlObject section) {
if (section == null) return new XmlObject[0];
XmlObject[] sldIds = section.selectPath(
"declare namespace p14='http://schemas.microsoft.com/office/powerpoint/2010/main' "
+".//p14:sldId");
return sldIds;
}
static Long getSectionSldId(XmlObject sectionSldId) {
if (sectionSldId == null) return null;
Long sldIdL = null;
XmlObject sldIdO = sectionSldId.selectAttribute(new QName("id"));
if (sldIdO instanceof org.apache.xmlbeans.impl.values.XmlObjectBase) {
String sldIsS = ((org.apache.xmlbeans.impl.values.XmlObjectBase)sldIdO).getStringValue();
try {
sldIdL = Long.valueOf(sldIsS);
} catch (Exception ex) {
// do nothing
}
}
return sldIdL;
}
static XmlObject getSection(XSLFSlide slide) {
Long slideId = getSlideId(slide);
if (slideId != null) {
XMLSlideShow presentation = slide.getSlideShow();
org.openxmlformats.schemas.presentationml.x2006.main.CTPresentation ctPresentation = presentation.getCTPresentation();
org.openxmlformats.schemas.presentationml.x2006.main.CTExtensionList extList = ctPresentation.getExtLst();
XmlObject[] sections = getSections(extList);
for (XmlObject section : sections) {
XmlObject[] sectionSldIds = getSectionSldIds(section);
for (XmlObject sectionSldId : sectionSldIds) {
Long sldIdL = getSectionSldId(sectionSldId);
if (slideId.equals(sldIdL)) {
return section;
}
}
}
}
return null;
}
static String getSectionName(XmlObject section) {
if (section == null) return null;
String sectionName = null;
XmlObject name = section.selectAttribute(new QName("name"));
if (name instanceof org.apache.xmlbeans.impl.values.XmlObjectBase) {
sectionName = ((org.apache.xmlbeans.impl.values.XmlObjectBase)name).getStringValue();
}
return sectionName;
}
public static void main(String args[]) throws Exception {
XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("./PPTXUsingSections.pptx"));
for (XSLFSlide slide : slideShow.getSlides()) {
System.out.println(slide.getSlideName());
XmlObject section = getSection(slide);
String sectionName = getSectionName(section);
System.out.println(sectionName);
}
slideShow.close();
}
}
I'm trying to deserialize the XML below to an object, but one of the values (Required) is returning null.
<?xml version="1.0" encoding="UTF-8"?>
<sy:config xmlns:sy="http://www.example.com/def/sy">
    <sy:configurations>
        <sy:configuration property="isReq" name="ABC">
            Required
            <atom:link title="ABC Uri" xmlns:atom="http://www.w3.org/2005/Atom" rel="http://www.example.com/def//id"
                       href="abc/bc/def/docid"/>
        </sy:configuration>
    </sy:configurations>
</sy:config>
I'm using the code below (EMF utilities) to deserialize it. Could you please let me know why configuration.getValue() is returning null instead of 'Required'?
private static <T extends EObject> T readEObjectFromInputStream(InputStream inputStream, String emfFileExtension, Class<T> expectedResultType) throws IOException {
    org.eclipse.emf.common.util.URI emfResourceUri = org.eclipse.emf.common.util.URI
            .createPlatformResourceURI(FILE_PATH + emfFileExtension, true);
    Resource emfResource = new ResourceSetImpl().createResource(emfResourceUri);
    emfResource.load(inputStream, null);
    EObject eObject = emfResource.getContents().get(0);
    T result = expectedResultType.cast(eObject);
    return result;
}
This post on the Eclipse forums has an example of how to do this, and a discussion of several reasons why it might not be working.
Here's the full example:
try {
    ResourceSet resourceSet = new ResourceSetImpl();
    Resource resource = resourceSet.createResource(URI.createURI("http:///My.chbasev21"));
    DocumentRoot documentRoot = ChbaseV21Factory.eINSTANCE.createDocumentRoot();
    CompanyDetailsType root = ChbaseV21Factory.eINSTANCE.createCompanyDetailsType();
    documentRoot.setCompanyDetails(root);
    resource.getContents().add(documentRoot);
    //resource.save(Collections.EMPTY_MAP);
    resource.save(System.out, null);
    resource.save(new FileOutputStream("C:/test2.xml"), null);
} catch (IOException exception) {
    exception.printStackTrace();
}
I am trying to deserialize some Kafka messages that were serialized by Nifi, using Hortonworks Schema Registry
Processor used on the NiFi side as RecordWriter: AvroRecordSetWriter
Schema write strategy: HWX Content-Encoded Schema Reference
I am able to deserialize these messages in another NiFi Kafka consumer. However, I now need to deserialize them from my Flink application's Kafka consumer code.
I have the following inside the Kafka deserializer Handler of my Flink Application:
final String SCHEMA_REGISTRY_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_URL_KEY = SchemaRegistryClient.Configuration.SCHEMA_REGISTRY_URL.name();
Properties schemaRegistryProperties = new Properties();
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_SIZE_KEY, 10L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY, 5000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY, 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY, 60 * 60 * 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_URL_KEY, "http://schema_registry_server:7788/api/v1");
return (Map<String, Object>) HWXSchemaRegistry.getInstance(schemaRegistryProperties).deserialize(message);
And here is the HWXSchemaRegistry code to deserialize the message:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import com.hortonworks.registries.schemaregistry.avro.AvroSchemaProvider;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;
import com.hortonworks.registries.schemaregistry.errors.SchemaNotFoundException;
import com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer;

public class HWXSchemaRegistry {

    private SchemaRegistryClient client;
    private Map<String, Object> config;
    private AvroSnapshotDeserializer deserializer;
    private static HWXSchemaRegistry hwxSRInstance = null;

    public static HWXSchemaRegistry getInstance(Properties schemaRegistryConfig) {
        if (hwxSRInstance == null)
            hwxSRInstance = new HWXSchemaRegistry(schemaRegistryConfig);
        return hwxSRInstance;
    }

    public Object deserialize(byte[] message) throws IOException {
        Object o = hwxSRInstance.deserializer.deserialize(new ByteArrayInputStream(message), null);
        return o;
    }

    private static Map<String, Object> properties2Map(Properties config) {
        Enumeration<Object> keys = config.keys();
        Map<String, Object> configMap = new HashMap<String, Object>();
        while (keys.hasMoreElements()) {
            Object key = (Object) keys.nextElement();
            configMap.put(key.toString(), config.get(key));
        }
        return configMap;
    }

    private HWXSchemaRegistry(Properties schemaRegistryConfig) {
        // _log is assumed to be the class logger (declaration omitted in the original)
        _log.debug("Init SchemaRegistry Client");
        this.config = HWXSchemaRegistry.properties2Map(schemaRegistryConfig);
        this.client = new SchemaRegistryClient(this.config);
        this.deserializer = this.client.getDefaultDeserializer(AvroSchemaProvider.TYPE);
        this.deserializer.init(this.config);
    }
}
But I am getting an HTTP 404 error code (schema not found). I think this is due to incompatible "protocols" between the NiFi configuration and the HWX Schema Registry client implementation, so the schema identifier bytes that the client is looking for do not exist on the server, or something like that.
Can someone help on this?
Thank you.
Caused by: javax.ws.rs.NotFoundException: HTTP 404 Not Found
at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1069)
at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:866)
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:750)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:748)
at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:404)
at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:300)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1054)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1051)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getEntities(SchemaRegistryClient.java:1051)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:872)
at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:676)
at HWXSchemaRegistry.(HWXSchemaRegistry.java:56)
at HWXSchemaRegistry.getInstance(HWXSchemaRegistry.java:26)
at SchemaService.deserialize(SchemaService.java:70)
at SchemaService.deserialize(SchemaService.java:26)
at org.apache.flink.streaming.connectors.kafka.internals.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:45)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:712)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
I found a workaround. Since I wasn't able to get this working, I take the first bytes of the byte array, make several calls to the schema registry to fetch the Avro schema, and then deserialize the rest of the byte array with it.
First byte (0) is protocol version (I figured out this is a Nifi-specific byte, since I didn't need it).
Next 8 bytes are the schema Id
Next 4 bytes are the schema version
The rest of the bytes are the message itself:
import com.hortonworks.registries.schemaregistry.SchemaMetadataInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionKey;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;
try (SchemaRegistryClient client = new SchemaRegistryClient(this.schemaRegistryConfig)) {
    try {
        Long schemaId = ByteBuffer.wrap(Arrays.copyOfRange(message, 1, 9)).getLong();
        Integer schemaVersion = ByteBuffer.wrap(Arrays.copyOfRange(message, 9, 13)).getInt();
        SchemaMetadataInfo schemaInfo = client.getSchemaMetadataInfo(schemaId);
        String schemaName = schemaInfo.getSchemaMetadata().getName();
        SchemaVersionInfo schemaVersionInfo = client.getSchemaVersionInfo(
                new SchemaVersionKey(schemaName, schemaVersion));
        String avroSchema = schemaVersionInfo.getSchemaText();
        // Use a new variable for the remaining bytes instead of re-declaring "message"
        byte[] payload = Arrays.copyOfRange(message, 13, message.length);
        // Deserialize [...]
    } catch (Exception e) {
        throw new IOException(e.getMessage());
    }
}
I also thought that maybe I had to remove the first byte before calling the hwxSRInstance.deserializer.deserialize in my question code, since this byte seems to be a Nifi specific byte to communicate between Nifi processors, but it didn't work.
The next step is to build a cache of the schema texts to avoid calling the schema registry API multiple times.
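A minimal sketch of such a cache, keyed by the schema id and version extracted above; the class and interface names are made up for illustration:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache so each schemaId/version pair only hits the registry once.
public class SchemaTextCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(long schemaId, int version, SchemaTextLoader loader) throws Exception {
        String key = schemaId + ":" + version;
        String text = cache.get(key);
        if (text == null) {
            text = loader.load(schemaId, version); // delegate to the registry lookup shown above
            cache.put(key, text);
        }
        return text;
    }

    // Functional hook that keeps the cache independent of the SchemaRegistryClient API.
    public interface SchemaTextLoader {
        String load(long schemaId, int version) throws Exception;
    }
}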
New info: I will extend my answer to include the Avro deserialization part, since it took some troubleshooting and I had to inspect the NiFi Avro Reader source code to figure it out (I was getting a "not valid Avro data" exception when trying to use the basic Avro deserialization code):
import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

private static GenericRecord deserializeMessage(byte[] message, String schemaText) throws IOException {
    InputStream in = new SeekableByteArrayInput(message);
    Schema schema = new Schema.Parser().parse(schemaText);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
    GenericRecord genericRecord = null;
    genericRecord = datumReader.read(genericRecord, decoder);
    in.close();
    return genericRecord;
}
If you want to convert the GenericRecord to a map, note that the String values are not java.lang.String objects; you need to convert the keys and any values typed as string:
private static Map<String, Object> avroGenericRecordToMap(GenericRecord record) {
    Map<String, Object> map = new HashMap<>();
    record.getSchema().getFields().forEach(field ->
            map.put(String.valueOf(field.name()), record.get(field.name())));

    // Strings are mapped to the Utf8 class, so they need to be converted
    // (all record keys and any values typed as string).
    if (map.get("value").getClass() == org.apache.avro.util.Utf8.class)
        map.put("value", String.valueOf(map.get("value")));

    return map;
}
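Wiring the two helpers together might look like this; payload is the byte slice after the 13-byte header and avroSchema is the schema text fetched from the registry earlier:
GenericRecord record = deserializeMessage(payload, avroSchema);
Map<String, Object> values = avroGenericRecordToMap(record);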
I have a set of related Java classes, which are able to hold data I need. Below is a simplified class diagram of what I have:
Now I need to import data from XML, and for that I want to generate an XSD schema. The problem is that I want several XSD schemas like this:
One that allows the whole data graph to be imported.
One that allows only RootNode.fieldA and ChildNodeA.
One that allows only RootNode.fieldB and ChildNodeB.
I can easily generate XSD that meets the requirements of nr.1 using JAXB (programmatically). But is there a way to do that for cases nr.2 and nr.3 for the same classes? In other words, it seems I need something like "profiles" in JAXB.
Update:
Here is how I generate XSD schema:
JAXBContext jc = JAXBContext.newInstance(RootNode.class);
final File baseDir = new File(".");

class MySchemaOutputResolver extends SchemaOutputResolver {
    public Result createOutput(String namespaceUri, String suggestedFileName) throws IOException {
        return new StreamResult(new File(baseDir, suggestedFileName));
    }
}

jc.generateSchema(new MySchemaOutputResolver());
This is not a full answer, just an idea.
You probably use the javax.xml.bind.JAXBContext.generateSchema(SchemaOutputResolver) method to generate your schema, so you basically use a specific JAXBContext instance. This instance is built based on the annotations in your classes. When building the context, these annotations are read and organized into a model which is then used for all the operations.
So to generate different schemas you probably need to create different contexts. You can't change the annotations per case, but you can read annotations in different ways.
Take a look at the AnnotationReader. This is what JAXB RI uses behind the scenes to load annotations from Java classes. You can create your own implementation and use it when creating the JAXBContext. Here's an example of something similar:
final AnnotationReader<Type, Class, Field, Method> annotationReader = new AnnoxAnnotationReader();
final Map<String, Object> properties = new HashMap<String, Object>();
properties.put(JAXBRIContext.ANNOTATION_READER, annotationReader);
final JAXBContext context = JAXBContext.newInstance(
"org.jvnet.annox.samples.po",
Thread.currentThread().getContextClassLoader(),
properties);
So how about writing your own annotation reader, which would consider what you call "profiles"? You can invent your own annotation #XmlSchemaProfile(name="foo"). Your annotation reader would then check if this annotation is present with the desired value and then either return it or ignore it. You'll be able to build different contexts from the same Java model - and consequently produce different schemas according to profiles defined by your #XmlSchemaProfile annotations.
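For completeness, the profile annotation itself could be declared roughly like this; the name, targets and retention are assumptions, since no such annotation exists out of the box:
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker read by the custom AnnotationReader to decide whether a
// class, field or getter takes part in a given schema profile.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.FIELD, ElementType.METHOD })
public @interface XmlSchemaProfile {
    String name();
}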
I found a solution that suited me. The idea is to output the result of XSD generation into an XML Document (in-memory DOM). JAXB allows that. After this, you can manipulate the document any way you wish, adding or removing parts.
I wrote some filters that whitelist or blacklist fields (in XSD they are elements) and classes (in XSD they are complex types). While I see a lot of potential problems with this approach, it did the job in my case. Below is the code for case 2 schema:
// This SchemaOutputResolver implementation saves XSD into DOM
static class DOMResultSchemaOutputResolver extends SchemaOutputResolver {
    private List<DOMResult> results = new LinkedList<DOMResult>();

    @Override
    public Result createOutput(String ns, String file) throws IOException {
        DOMResult result = new DOMResult();
        result.setSystemId(file);
        results.add(result);
        return result;
    }

    public Document getDocument() {
        return (Document) results.get(0).getNode();
    }

    public String getFilename() {
        return results.get(0).getSystemId();
    }
}
// This method serializes the DOM into file
protected void serializeXsdToFile(Document xsdDocument, String filename) throws IOException {
OutputFormat format = new OutputFormat(xsdDocument);
format.setIndenting(true);
FileOutputStream os = new FileOutputStream(filename);
XMLSerializer serializer = new XMLSerializer(os, format);
serializer.serialize(xsdDocument);
}
#Test
public void generateSchema2() throws JAXBException, IOException, XPathExpressionException {
JAXBContext context = JAXBContext.newInstance(RootNode.class);
DOMResultSchemaOutputResolver schemaOutputResolver = new DOMResultSchemaOutputResolver();
context.generateSchema(schemaOutputResolver);
// Do your manipulations here as you want. Below is just an example!
filterXsdDocumentComplexTypes(schemaOutputResolver.getDocument(), asList("childNodeA"), true);
filterXsdDocumentElements(schemaOutputResolver.getDocument(), asList("fieldB"));
serializeXsdToFile(schemaOutputResolver.getDocument(), "xf.xsd");
}
private boolean shouldComplexTypeBeDeleted(String complexTypeName, List<String> complexTypes, boolean whitelist) {
return (whitelist && !complexTypes.contains(complexTypeName)) || (!whitelist && complexTypes.contains(complexTypeName));
}
protected void filterXsdDocumentComplexTypes(Document xsdDocument, List<String> complexTypes, boolean whitelist) throws XPathExpressionException {
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList complexTypeNodes = (NodeList)xPath.evaluate("//*[local-name() = 'complexType']", xsdDocument, XPathConstants.NODESET);
for (int i = 0; i < complexTypeNodes.getLength(); i++) {
Node node = complexTypeNodes.item(i);
Node complexTypeNameNode = node.getAttributes().getNamedItem("name");
if (complexTypeNameNode != null) {
if (shouldComplexTypeBeDeleted(complexTypeNameNode.getNodeValue(), complexTypes, whitelist)) {
node.getParentNode().removeChild(node);
}
}
}
NodeList elements = (NodeList)xPath.evaluate("//*[local-name() = 'element']", xsdDocument, XPathConstants.NODESET);
for (int i = 0; i < elements.getLength(); i++) {
Node node = elements.item(i);
Node typeNameNode = node.getAttributes().getNamedItem("type");
if (typeNameNode != null) {
if (shouldComplexTypeBeDeleted(typeNameNode.getNodeValue(), complexTypes, whitelist) && !typeNameNode.getNodeValue().startsWith("xs")) {
node.getParentNode().removeChild(node);
}
}
}
}
protected void filterXsdDocumentElements(Document xsdDocument, List<String> blacklistedElements) throws XPathExpressionException {
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList elements = (NodeList)xPath.evaluate("//*[local-name() = 'element']", xsdDocument, XPathConstants.NODESET);
for (int i = 0; i < elements.getLength(); i++) {
Node node = elements.item(i);
if (blacklistedElements.contains(node.getAttributes().getNamedItem("name").getNodeValue())) {
node.getParentNode().removeChild(node);
}
}
}