Dumping values with quotes with SnakeYaml

Having a simple yml file test.yml as follows
color: 'red'
I load and dump the file as follows
final DumperOptions yamlOptions = new DumperOptions();
yamlOptions.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
Yaml yaml = new Yaml(yamlOptions);
Object result = yaml.load(new FileInputStream(new File("test.yml")));
System.out.println(yaml.dump(result));
I expect to get
color: 'red'
However, during the dump, the serializer leaves out the quotes and prints
color: red
How can I make the serializer print the original quotes too?

How can I make the serializer to print the original quotes too?
Not with the high-level API. Quoting the spec:
The scalar style is a presentation detail and must not be used to convey content information, with the exception that plain scalars are distinguished for the purpose of tag resolution.
The high-level API implements the whole YAML loading process, giving you only the content of the YAML file, without any information about presentation details, as required by the spec.
That being said, you can use the low level API which preserves presentation details:
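// Yaml.parse takes a java.io.Reader and returns the raw event stream.
final Yaml yaml = new Yaml();
final Iterator<Event> events = yaml.parse(new UnicodeReader(
new FileInputStream(new File("test.yml")))).iterator();
final DumperOptions yamlOptions = new DumperOptions();
final PrintWriter out = new PrintWriter(System.out);
final Emitter emitter = new Emitter(out, yamlOptions);
// Each event carries its presentation details (such as scalar style) to the emitter.
while (events.hasNext()) emitter.emit(events.next());
out.flush(); // PrintWriter buffers its output, so flush it explicitly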
However, be aware that even this will not preserve every presentation detail of your input (for example, indentation and comments are lost). SnakeYaml does not support round-tripping and therefore cannot preserve the exact input layout.

Related

Serializing multiline string from JsonNode to YAML string adds double quotes and "\n"

I have a YAML string where one of the attributes looks like this:
description: |
  this is my description //imagine there's a space after description
  this is my description in the second line
In my Java code I read it into a JsonNode like this:
JsonNode node = new YamlMapper().readTree(yamlString);
I then do some changes to it and write it back to a string like this:
new YamlMapper().writeValueAsString(node)
The new string now looks like this:
"this is my description \nthis is my description in the second line\n"
So now in the YAML file you can see the added quotes + the new line character (\n) and everything is in one line. I expect it to return the original YAML like the one above.
This is how my YAML object mapper is configured:
new ObjectMapper(
new YAMLFactory()
.disable(YAMLGenerator.Feature.MINIMIZE_QUOTES))
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
.setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
If I remove the space after description in the original YAML, it works just fine.

To serialize multiline text with Jackson, enable the flag YAMLGenerator.Feature.LITERAL_BLOCK_STYLE, which is available since version 2.9:
new ObjectMapper(
new YAMLFactory().enable(YAMLGenerator.Feature.LITERAL_BLOCK_STYLE)
).writeValueAsString(new HashMap<String, String>(){{
put("key", "test1\ntest2\ntest3");
}});
The output won't be wrapped in quotes:
---
key: |-
  test1
  test2
  test3
Note that there are a few differences between the "block scalar" indicators (|, |-, >, ...); see https://yaml-multiline.info/ for details.

Jackson's API is too high level to control the output in detail. You can use SnakeYAML directly (which is used by Jackson under the hood), but you need to go down to the node or event level of the API to control node style in the output.
See also: I want to load a YAML file, possibly edit the data, and then dump it again. How can I preserve formatting?
This answer shows general usage of SnakeYAML's event API to keep formatting; of course, it is harder to make changes on a stream of events. You might instead want to work on the node graph; this answer has some example code showing how to load YAML into a node graph, process it, and write it back again.

How to validate values in a YAML configuration file while loading it?

Is there a way to validate values in a YAML file while loading it in code? The requirement is that some elements in the YAML file must have values. If the validation fails, the YAML should not be loaded.
I'm using the SnakeYAML library and have heard there is a way to do this via Representer.
This is the code I'm currently using to load the YAML:
Reader in = new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8);
Yaml yaml = new Yaml();
yaml.setBeanAccess(BeanAccess.FIELD);
return yaml.loadAs(in, School.class);

Since you can have any value in a YAML file, you should load the file in a function, test the values and raise an error if the values are not what you want. Return the loaded data if they are.
This may have side-effects if your YAML has tags that create arbitrary objects, but checking during loading will not prevent that, as such object might have been created before you come to the value you want to check.
If you do have tags in your YAML and that is a real problem, then you would have to make a safe_load-er for the YAML file that can handle the tags (by creating normal mapping objects), then check the values and reload with full tag support.
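The load-then-check approach described above might look roughly like this in Java. This is only a sketch: it assumes the YAML has already been loaded into a plain Map (for example via SnakeYAML's load), and the class name ConfigValidator, the method requireKeys, and the keys "name" and "address" are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: checks loaded data before handing it to the caller.
final class ConfigValidator {

    // Returns the data unchanged if every required key has a non-blank value;
    // otherwise throws, so the caller never receives an invalid configuration.
    static Map<String, Object> requireKeys(Map<String, Object> data, String... required) {
        List<String> missing = new ArrayList<>();
        for (String key : required) {
            Object value = data.get(key);
            if (value == null || value.toString().trim().isEmpty()) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required values: " + missing);
        }
        return data;
    }

    public static void main(String[] args) {
        // Stand-in for the result of loading the YAML file into a Map.
        Map<String, Object> loaded = new LinkedHashMap<>();
        loaded.put("name", "Hilltop School");
        loaded.put("address", ""); // present but empty, so validation should reject it
        try {
            requireKeys(loaded, "name", "address");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the check in one function means the rest of the program only ever sees data that passed validation. As noted above, this does not prevent side effects from objects that tags create during loading.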

How to build an rdf file using an xml file as input

I have an input file in xml format and I need to convert it into a .rdf file that is based on an ontology model created.
Can anyone let me know what the suitable method is to do this using jena api in java?

Is your input file in some arbitrary XML format, or is it already serialized as RDF/XML? (ie: is the root tag of your document <rdf:RDF>?)
If it is in some arbitrary format, then you will need to define some rdf-based schema for representing your data. This is purely project-specific, and will require work on your part to define a way for a graph to apply to your data.
Once you have done that, then basic document construction is a topic for the Jena Tutorials. There is far too much material to cover here, but the basics of creating a statement should suffice:
final Model m = ModelFactory.createDefaultModel();
final Resource s = m.createResource("urn:ex:subject");
final Property p = m.createProperty("urn:ex:predicate");
final Resource o = m.createResource("urn:ex:object");
m.add(s,p,o);
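// Files.createTempFile already creates the file, so open it with the default options.
try( final OutputStream out = Files.newOutputStream(Files.createTempFile("tmp", ".rdf")) ){
m.write(out, "RDF/XML"); // the second argument is the serialization language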
}
Iterating over your XML and constructing the proper set of statements is left as an exercise for the reader.
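As a starting point for that exercise, here is a rough sketch that uses only the JDK's DOM parser and does not touch Jena at all. It assumes a very simple input shape (a root element whose children are records with an id attribute and flat child fields), and the urn:ex: naming scheme, the class XmlToTriples, and the method extract are all hypothetical. Each resulting subject/predicate/value triple could then be turned into a statement with the Jena calls shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: flattens a simple XML document into (subject, predicate, value) triples.
final class XmlToTriples {

    static final String NS = "urn:ex:"; // made-up namespace, replace with your ontology's

    static List<String[]> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String[]> triples = new ArrayList<>();
            NodeList records = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < records.getLength(); i++) {
                if (!(records.item(i) instanceof Element)) continue; // skip whitespace text nodes
                Element record = (Element) records.item(i);
                // Subject is built from the element name and its (assumed) id attribute.
                String subject = NS + record.getTagName() + "/" + record.getAttribute("id");
                NodeList fields = record.getChildNodes();
                for (int j = 0; j < fields.getLength(); j++) {
                    Node field = fields.item(j);
                    if (field instanceof Element) {
                        triples.add(new String[] {
                            subject, NS + field.getNodeName(), field.getTextContent().trim()});
                    }
                }
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

How elements map to subjects and predicates is exactly the project-specific schema decision mentioned earlier; the mapping here is only a placeholder.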
If your data is already in RDF/XML, then you can directly read it in to a model:
// Assume you have an InputStream called 'in' pointing at your input data
final Model m = ModelFactory.createDefaultModel();
m.read(in, null, "RDF/XML"); // Assumed that there is no base

Externalize XML construction from a stream of CSV in Java

I get a stream of values as CSV. Based on some condition, I need to generate an XML document that includes only a subset of the values from the CSV. For example:
Input : a:value1, b:value2, c:value3, d:value4, e:value5.
if (condition1)
XML O/P = <Request><ValueOfA>value1</ValueOfA><ValueOfE>value5</ValueOfE></Request>
else if (condition2)
XML O/P = <Request><ValueOfB>value2</ValueOfB><ValueOfD>value4</ValueOfD></Request>
I want to externalize the process in a way that given a template the output XML is generated accordingly. String manipulation is the easiest way of implementing this but I do not want to mess up the XML if some special characters appear in the input, etc. Please suggest.

Perhaps you could benefit from a templating engine, something like Apache Velocity.

I would suggest creating an xsd and using JAXB to create the Java binding classes that you can use to generate the XML.

I recommend my own templating engine (JATL http://code.google.com/p/jatl/). Although it's geared to (X)HTML, it's also very good at generating XML.
I didn't bother solving the whole problem for you (that is, double splitting the input on "," and then ":"), but this is how you would use JATL.
final String a = "stuff";
HtmlWriter html = new HtmlWriter() {
@Override
protected void build() {
//If condition1
start("Request").start("ValueOfA").text(a).end().end();
}
};
//Now write.
StringWriter writer = new StringWriter();
String results = html.write(writer).getBuffer().toString();
Which would generate
<Request><ValueOfA>stuff</ValueOfA></Request>
All the correct escaping is handled for you.
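The same kind of escaping is also available from the JDK's own XMLStreamWriter, without an extra library. The following is a sketch only: it assumes the input has the a:value1, b:value2 shape from the question and that the "template" is simply an ordered map from input key to element name. The names RequestBuilder, parse, and build are hypothetical.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: builds a <Request> document from key:value input and a template.
final class RequestBuilder {

    // Splits "a:value1, b:value2" into an ordered key -> value map.
    // Note: values that themselves contain commas are not supported by this simple split.
    static Map<String, String> parse(String input) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String pair : input.split(",")) {
            String[] kv = pair.split(":", 2);
            if (kv.length == 2) {
                values.put(kv[0].trim(), kv[1].trim());
            }
        }
        return values;
    }

    // The template maps an input key to the element name to emit for it;
    // keys that are absent from the input are skipped.
    static String build(Map<String, String> values, Map<String, String> template) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("Request");
            for (Map.Entry<String, String> entry : template.entrySet()) {
                String value = values.get(entry.getKey());
                if (value == null) continue;
                xml.writeStartElement(entry.getValue());
                xml.writeCharacters(value); // escapes characters such as < and &
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.flush();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }
}
```

With this shape, each condition from the question selects a different template map (for example a -> ValueOfA and e -> ValueOfE for condition1), and those maps could be loaded from an external file rather than hard-coded.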

Extracting text from documents of unknown content type

Is there a parser for the application/octet-stream type within Apache Tika? I suppose it's a non-parsable stream.
I just need to parse ODS documents, MS documents and PDF files. It seems that new Tika().parseToString(file); is enough. But I can't figure out what happens when the content type is not detected and the default, application/octet-stream, is used. Do I still have a chance to extract text from documents that are of one of those types, even though the content type detector didn't detect their type?
What else should I try, instead of returning the document to the user and telling them that the format is not supported?
Or is a resulting application/octet-stream content type really a signal that we can't read this? Or does it mean "you must figure out your own way to deal with this"?

If the detector doesn't know what the file is, it'll return application/octet-stream.
And if the detector doesn't know what it is, then Tika won't be able to pick a suitable Parser for it. (You'll end up with the EmptyParser, which does nothing.)
If you can, pass in the name of your file when you do the detection and parsing, as that'll help with the detection in some cases:
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
parser.parse(input, textHandler, metadata, context);
Also, it's worth checking the supported formats part of the Tika website to ensure that the documents you have are ones where there's a Parser - http://tika.apache.org/0.9/formats.html
If your documents are in a format that isn't currently supported, then you have two choices (neither is an immediate fix). One is to help write a new parser (this requires finding a suitable Java library for the format). The other is to use a command-line-based parser (this requires finding an executable for your platform that can do the XHTML generation, then wiring that in).
