Here is the deserializer from tutorialspoint.
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class Deserialize {
    public static void main(String[] args) throws Exception {
        // Parse the schema file with Schema.Parser.
        Schema schema = new Schema.Parser().parse(new File("/home/Hadoop/Avro/schema/emp.avsc"));
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
        DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(
                new File("/home/Hadoop/Avro_Work/without_code_gen/mydata.txt"), datumReader);
        GenericRecord emp = null;
        while (dataFileReader.hasNext()) {
            emp = dataFileReader.next(emp);
            System.out.println(emp);
        }
        System.out.println("hello");
    }
}
My question is: if there is already a schema embedded in the .avro file, why do I have to pass the schema as well? I find it very inconvenient to have to provide the schema just to parse the file.
Avro requires two schemas for schema resolution: a reader schema and a writer schema.
The writer schema is included in the file, and you can parse it out of the file:
String filepath = ...;
DataFileReader<Void> reader = new DataFileReader<>(Util.openSeekableFromFS(filepath),
        new GenericDatumReader<>());
System.out.println(reader.getSchema().toString(true));
This is how java -jar avro-tools.jar getschema works. Note that you may need to copy the Util.openSeekableFromFS method into your own code, since it seems to be package-private.
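If you just want to read the records, you do not need a schema file at all. Here is a minimal sketch (the path mydata.avro is an assumed placeholder) that relies solely on the writer schema embedded in the file:
import java.io.File;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadWithEmbeddedSchema {
    public static void main(String[] args) throws Exception {
        // No explicit schema: the GenericDatumReader picks up the writer
        // schema from the .avro file header when the DataFileReader opens.
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(
                new File("mydata.avro"), new GenericDatumReader<GenericRecord>())) {
            System.out.println(reader.getSchema().toString(true)); // embedded schema
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
Passing an explicit schema to GenericDatumReader is only needed when you want to read with a different (reader) schema than the one the file was written with.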
I have code that edits the changelog JSON at runtime. However, when I run liquibase.migrate() it does not pick up the latest changes.
Changelog file before runtime:
{
  "databaseChangeLog" : [ ]
}
Changelog file at runtime, before liquibase.update() executes:
{
  "databaseChangeLog" : [ {
    "include" : {
      "file" : "changesets/myFolder/changeset-0.sql"
    }
  } ]
}
Method used to add new changeset files to the changelog at runtime:
public void addNewChangeSetToChangeLog(File file) throws IOException {
    // Read the existing changelog into a string.
    StringBuilder jsonString = new StringBuilder();
    try (BufferedReader br = new BufferedReader(
            new FileReader("src/changesets/DbChangelog.json"))) {
        String line;
        while ((line = br.readLine()) != null) {
            jsonString.append(line);
            jsonString.append("\n");
        }
    }
    // Append an "include" entry for the new changeset file.
    JsonNode jsonNode = objectMapper.readTree(jsonString.toString());
    ArrayNode arrayNode = (ArrayNode) jsonNode.get("databaseChangeLog");
    JsonNode newChangeSetNode = objectMapper.readTree(
            "{\"include\":{\"file\":\"" + file.getPath() + "\"}}");
    arrayNode.add(newChangeSetNode);
    // Write the updated changelog back to disk.
    objectMapper.enable(SerializationFeature.INDENT_OUTPUT);
    ObjectWriter writer = objectMapper.writer();
    writer.writeValue(new File("src/changesets/DbChangelog.json"), jsonNode);
}
I have tried:
getting an instance of the Liquibase class at runtime using class.getInstance()
taking the location of the changelog at runtime via user input, so that it is unknown at compile time
The following method is used to call the Liquibase update:
public void execute(Connection connection) throws LiquibaseException, IOException {
    Database database = DatabaseFactory.getInstance()
            .findCorrectDatabaseImplementation(new JdbcConnection(connection));
    LiquibaseUtils liquibaseUtils = new LiquibaseUtils();
    liquibase.Liquibase liquibase = new liquibase.Liquibase(
            "changesets/DbChangelog.json", new ClassLoaderResourceAccessor(), database);
    liquibase.update(new Contexts(), new LabelExpression());
}
If you have edited a query (for example an ALTER TABLE or a data add/update), you must declare it in a new changeset: when Liquibase compares the changelog against the DATABASECHANGELOG table in your database and finds a new changeset, it executes it.
In your case, add a new file, changeset-1.sql, containing the new query, and include it in the changelog.
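A minimal sketch of that suggestion, reusing the two methods from the question (the file name changeset-1.sql is the one proposed above):
// Register the edited query as a *new* changeset file instead of modifying
// changeset-0.sql, then run the update so Liquibase finds an entry it has
// not yet recorded in its DATABASECHANGELOG table.
File newChangeSet = new File("changesets/myFolder/changeset-1.sql");
addNewChangeSetToChangeLog(newChangeSet); // helper from the question
execute(connection);                      // calls liquibase.update(...)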
I want to parse a huge file in RDF4J using the following code, but I get an exception due to a parser limit:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFHandlerException;
import org.eclipse.rdf4j.rio.RDFParseException;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class ConvertOntology {
    public static void main(String[] args) throws RDFParseException, RDFHandlerException, IOException {
        String file = "swetodblp_april_2008.rdf";
        File initialFile = new File(file);
        InputStream input = new FileInputStream(initialFile);
        RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
        parser.setPreserveBNodeIDs(true);
        Model model = new LinkedHashModel();
        parser.setRDFHandler(new StatementCollector(model));
        parser.parse(input, initialFile.getAbsolutePath());
        FileOutputStream out = new FileOutputStream("swetodblp_april_2008.nt");
        RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, out);
        try {
            writer.startRDF();
            for (Statement st : model) {
                writer.handleStatement(st);
            }
            writer.endRDF();
        } catch (RDFHandlerException e) {
            // ignored in the original
        } finally {
            out.close();
        }
    }
}
The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application.
I execute my code as follows, as suggested on the RDF4J web site, to set the two parser limits:
mvn -Djdk.xml.totalEntitySizeLimit=0 -DentityExpansionLimit=0 exec:java
Any help would be appreciated.
The error is due to the Apache Xerces XML parser rather than the default JDK XML parser, so the jdk.xml.* system properties have no effect on it.
So just delete the Xerces folder from your .m2 repository and the code works fine.
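Not part of the original answer, but a quick diagnostic sketch to check which SAX parser implementation the JVM actually loads; if it prints an org.apache.xerces class, the jdk.xml.* limits passed on the mvn command line above will not take effect:
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {
    public static void main(String[] args) {
        // Prints the concrete SAXParserFactory implementation class, e.g.
        // com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl for
        // the built-in JDK parser.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
    }
}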
I am using EMFJson for serializing EMF Ecore models. I am able to create a JSON string from an existing model. However, the way back is not working for me. I tried the following two snippets:
First Attempt:
ObjectMapper objectMapper = EMFModule.setupDefaultMapper();
objectMapper.reader().forType(MyClass.class).readValue(string);
Second Attempt:
ObjectMapper objectMapper = EMFModule.setupDefaultMapper();
ResourceSet resourceSet = new ResourceSetImpl();
resourceSet.getResourceFactoryRegistry()
        .getExtensionToFactoryMap()
        .put("json", new JsonResourceFactory());
try {
    Resource resource = objectMapper
            .reader()
            .withAttribute(EMFContext.Attributes.RESOURCE_SET, resourceSet)
            .withAttribute(EMFContext.Attributes.RESOURCE_URI, null)
            .forType(Resource.class)
            .readValue(string);
} catch (IOException e1) {
    e1.printStackTrace();
}
For both attempts I am getting the following exception: java.lang.RuntimeException: Cannot create resource for uri default
I guess the second approach cannot work at all, as I do not know what to provide as RESOURCE_URI. The example here, which I took as the foundation for the second attempt, reads a file rather than a string. Does somebody have an idea how to make this work? Thanks!
I managed to handle it using the answer given here: Parse XML in string format using EMF
The method with my changes looks like this:
private EObject loadEObjectFromString(String model, EPackage ePackage) throws IOException {
    ResourceSet resourceSet = new ResourceSetImpl();
    resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
            .put(Resource.Factory.Registry.DEFAULT_EXTENSION, new JsonResourceFactory());
    resourceSet.getPackageRegistry().put(ePackage.getNsURI(), ePackage);
    Resource resource = resourceSet.createResource(URI.createURI("*.extension"));
    InputStream stream = new ByteArrayInputStream(model.getBytes(StandardCharsets.UTF_8));
    resource.load(stream, null);
    return resource.getContents().get(0);
}
Now I can call it like this:
EObject test = this.loadEObjectFromString(jsonString, MyPackage.eINSTANCE);
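For completeness, the second attempt from the question might also work if a dummy URI is passed instead of null, mirroring the fake-URI trick above. This is an untested sketch, assuming EMFJson only needs the URI extension to pick the registered resource factory:
ObjectMapper objectMapper = EMFModule.setupDefaultMapper();
ResourceSet resourceSet = new ResourceSetImpl();
resourceSet.getResourceFactoryRegistry()
        .getExtensionToFactoryMap()
        .put("json", new JsonResourceFactory());

// A dummy *.json URI, so the registered JsonResourceFactory is chosen.
Resource resource = objectMapper
        .reader()
        .withAttribute(EMFContext.Attributes.RESOURCE_SET, resourceSet)
        .withAttribute(EMFContext.Attributes.RESOURCE_URI, URI.createURI("dummy.json"))
        .forType(Resource.class)
        .readValue(string);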
Is there a standard way to validate a string against any of the standard xml schema datatypes (see XML Schema Part 2: Datatypes Second Edition or more specifically Built-in-datatypes)?
I don't want to validate a complete XSD; I just want to validate some user input against XML datatypes (e.g. against http://www.w3.org/2001/XMLSchema#date or http://www.w3.org/2001/XMLSchema#boolean). Is there a way to do it using standard APIs? If not, are there other possibilities besides writing it from scratch?
The classes in the package javax.xml.validation seem to be targeted at validating against complete schemas rather than individual datatypes.
Example of what I am trying to do:
String content = "abc";
String datatype = "http://www.w3.org/2001/XMLSchema#long";
boolean isValid = Validator.isValid(content, datatype); // should return false
Not a standard API, but Xerces has an XML Schema API that might be of interest. In Xerces you can also find datatype validators that enable you to do this:
import org.apache.xerces.impl.dv.InvalidDatatypeValueException;
import org.apache.xerces.impl.dv.xs.YearDV;

public class Main {
    public static void main(String[] args) {
        try {
            new YearDV().getActualValue("Notayear", null);
            System.out.println("OK");
        } catch (InvalidDatatypeValueException e) {
            System.out.println(e.getMessage());
        }
    }
}
which would print
cvc-datatype-valid.1.2.1: 'Notayear' is not a valid value for 'gYear'.
Take it from there. Lots of code to read!
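For a generic version, the same internals offer SchemaDVFactory, which can look up any built-in type by name. Here is a sketch along those lines; these are Xerces internal classes, so treat the exact signatures as an assumption that may vary between versions:
import org.apache.xerces.impl.dv.InvalidDatatypeValueException;
import org.apache.xerces.impl.dv.SchemaDVFactory;
import org.apache.xerces.impl.dv.ValidatedInfo;
import org.apache.xerces.impl.dv.XSSimpleType;
import org.apache.xerces.impl.validation.ValidationState;

public class DatatypeCheck {
    // localName is the built-in type name, e.g. "long", "date" or "gYear".
    public static boolean isValid(String content, String localName) {
        XSSimpleType type = SchemaDVFactory.getInstance().getBuiltInType(localName);
        try {
            type.validate(content, new ValidationState(), new ValidatedInfo());
            return true;
        } catch (InvalidDatatypeValueException e) {
            return false;
        }
    }
}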
You can do the following:
public boolean validate(String inputXml, String schemaLocation) throws SAXException, IOException {
    // Build the schema.
    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    File schemaFile = new File(schemaLocation);
    Schema schema = factory.newSchema(schemaFile);
    Validator validator = schema.newValidator();
    // Create a source from a string.
    Source source = new StreamSource(new StringReader(inputXml));
    // Check the input.
    boolean isValid = true;
    try {
        validator.validate(source);
    } catch (SAXException e) {
        System.err.println("Not valid");
        isValid = false;
    }
    return isValid;
}
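To apply this to the single-datatype case from the question, one option is to generate a one-element schema around the built-in type on the fly. Here is a sketch using only standard JAXP APIs; the element name v and the class name are made up for illustration, and content containing markup characters such as < or & would need escaping first:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class DatatypeValidator {
    // Validates a raw value against a built-in XSD type, e.g. "xs:long".
    public static boolean isValid(String content, String xsdType) {
        String schemaDoc =
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
              + "<xs:element name='v' type='" + xsdType + "'/>"
              + "</xs:schema>";
        String instanceDoc = "<v>" + content + "</v>";
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new StreamSource(new StringReader(schemaDoc)))
                    .newValidator()
                    .validate(new StreamSource(new StringReader(instanceDoc)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
With this, isValid("abc", "xs:long") returns false and isValid("42", "xs:long") returns true.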
I have:
1) an XML document,
2) a base XSD file, and
3) an extended XSD file.
Both XSD files have the same namespace.
File 3) includes file 2): <xs:include schemaLocation="someschema.xsd"></xs:include>
XML document (file 1) has following root tag:
<tagDefinedInSchema xmlns="http://myurl.com/myapp/myschema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://myurl.com/myapp/myschema schemaFile2.xsd">
where schemaFile2.xsd is file 3 above.
I need to validate file 1 against both schema documents, without modifying the file itself and without merging the two schemas into one file.
How can I do this in Java?
UPD: Here is the code I'm using.
SchemaFactory schemaFactory = SchemaFactory
        .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
DocumentBuilderFactory documentFactory = DocumentBuilderFactory.newInstance();
documentFactory.setNamespaceAware(namespaceAware);
DocumentBuilder builder = documentFactory.newDocumentBuilder();
Document document = builder.parse(new ByteArrayInputStream(xmlData.getBytes("UTF-8")));
File schemaLocation = new File(schemaFileName);
Schema schema = schemaFactory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
Source source = new DOMSource(document);
validator.validate(source);
UPD 2: This works for me:
public static void validate(final String xmlData,
        final String schemaFileName, final boolean namespaceAware)
        throws SAXException, IOException {
    final SchemaFactory schemaFactory = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    schemaFactory.setResourceResolver(new MySchemaResolver());
    final Schema schema = schemaFactory.newSchema();
    final Validator validator = schema.newValidator();
    validator.setResourceResolver(schemaFactory.getResourceResolver());
    final InputSource is = new InputSource(
            new ByteArrayInputStream(xmlData.getBytes("UTF-8")));
    validator.validate(new SAXSource(is), new SAXResult(new XMLReaderAdapter()));
}
class MySchemaResolver implements LSResourceResolver {
    @Override
    public LSInput resolveResource(final String type,
            final String namespaceURI, final String publicId, String systemId,
            final String baseURI) {
        final LSInput input = new DOMInputImpl();
        try {
            if (systemId == null) {
                systemId = SCHEMA1;
            }
            FileInputStream fis = new FileInputStream(
                    new File("path_to_schema_directory/" + systemId));
            input.setByteStream(fis);
            return input;
        } catch (FileNotFoundException ex) {
            LOGGER.error("File Not found", ex);
            return null;
        }
    }
}
A bit of terminology: you have one schema here, which is built from two schema documents.
If you specify schemaFile2.xsd to the API when building the Schema, it should automatically pull in the other document via the xs:include. If you suspect that this isn't happening, you need to explain what the symptoms are that cause you to believe this.
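A minimal sketch of that point, with the file names from the question (document.xml stands in for file 1, and someschema.xsd is assumed to sit next to schemaFile2.xsd so the include resolves relative to it):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class ValidateAgainstExtendedSchema {
    public static void main(String[] args) throws Exception {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Building the Schema from the extended document alone should pull
        // in someschema.xsd automatically via its xs:include.
        Schema schema = factory.newSchema(new File("schemaFile2.xsd"));
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("document.xml")));
    }
}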
It may seem a bit inefficient, but couldn't you validate against schema A, create a new validator using schema B and validate against that one too?
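A sketch of that sequential approach, with the same assumed file names; each pass gets its own StreamSource, since a stream can only be consumed once:
// Validate the same document once per schema document.
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
for (String schemaPath : new String[] { "someschema.xsd", "schemaFile2.xsd" }) {
    Validator validator = factory.newSchema(new File(schemaPath)).newValidator();
    validator.validate(new StreamSource(new File("document.xml")));
}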