Checkmarx scan issue - deserialization of unsanitized XML data from the input - Java

I am currently facing an issue during a Checkmarx scan. It is flagging the last line below for deserialization of untrusted data. How can I rectify this issue?
Scan Issue : Deserialization of Untrusted Data
Note: We do not have any XSD.
String message = request.getParameter("param_name"); // Input xml string
XStream parser = new XStream(new StaxDriver());
MyMessage messageObj = (MyMessage) parser.fromXML(message); // This line is flagged by CHECKMARX SCAN

I will assume that you intended to say that you're getting results for Deserialization of Untrusted Data.
The reason you're getting that message is that XStream will happily attempt to create an instance of just about any object specified in the XML by default. The technique is to allow only the types you intend to be deserialized. One would presume you've ensured those types are safe.
I ran this code derived from your example and verified that the two lines I added were detected as sanitization.
import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.io.xml.StaxDriver;
import com.thoughtworks.xstream.security.NoTypePermission;

String message = request.getParameter("param_name");
XStream parser = new XStream(new StaxDriver());
parser.addPermission(NoTypePermission.NONE); // forbid everything by default
parser.allowTypes(new Class[] {MyMessage.class, String.class}); // then allow only the expected types
MyMessage messageObj = (MyMessage) parser.fromXML(message);
I added the String.class type since I'd presume some of your properties on MyMessage are String. String itself, like most primitives, is generally safe for deserialization. While the string itself is safe, you'll want to make sure how you use it is safe. (e.g. if you are deserializing a string and passing it to the OS as part of a shell exec, that could be a different vulnerability.)
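If MyMessage also references other custom classes of yours, you can allow them in one go with a wildcard rather than listing every type; a short sketch, where the package name is only a placeholder for your own model package:

// allow everything under your own model package (pattern shown is an example)
parser.allowTypesByWildcard(new String[] {"com.example.myapp.model.**"});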

Related

How to convert JSON to AVRO GenericRecord in Java

I am building a tool in an Apache Beam pipeline which will ingest lots of different types of data (different schemas, different file types, etc.) and will output the results as Avro files. Because there are many different types of output schemas, I'm using GenericRecords to write the Avro data. These GenericRecords include schemas generated during ingestion for each unique file / schema layout. In general, I have been using the built-in Avro Schema class to handle these.
I tried using DecoderFactory to convert the JSON data to Avro:
DecoderFactory decoderFactory = new DecoderFactory();
Decoder decoder = decoderFactory.jsonDecoder(schema, content);
DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
return reader.read(null, decoder);
This works just fine for the most part, except when a schema has nullable fields: the data is being read in from JSON, which does not carry type information for each field, while the generated Schema knows whether each field is nullable, required, etc. That mismatch produces a problem when the data is written to Avro:
If I have a nullable record that looks like this:
{"someField": "someValue"}
Avro is expecting the JSON data to look like this:
{"someField": {"string": "someValue"}}. This presents a problem anytime this combination appears (which is very frequent).
One possible solution raised was to use an AvroMapper. I laid it out like it shows on that page: created the Schema object as an AvroSchema, then packaged the data into a byte array with the schema using AvroMapper.writer():
static GenericRecord convertJsonToGenericRecord(String content, Schema schema)
    throws IOException {
  JsonNode node = ObjectMappers.defaultObjectMapper().readTree(content);
  AvroSchema avroSchema = new AvroSchema(schema);
  byte[] avroData =
      mapper
          .writer(avroSchema)
          .writeValueAsBytes(node);
  return mapper.readValue(avroData, GenericRecord.class);
}
This may hopefully get around the typing problem with nullable records, but it is still giving me issues: the reader does not recognize the AvroSchema from the byte array that I'm passing in (avroData). Here is the stack trace:
com.fasterxml.jackson.core.JsonParseException: No AvroSchema set, can not parse
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader._checkSchemaSet(MissingReader.java:68)
at com.fasterxml.jackson.dataformat.avro.deser.MissingReader.nextToken(MissingReader.java:41)
at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4762)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4668)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3691)
When I checked the avroData byte array just to see what it looked like, it did not include anything other than the actual value I passed into it. It didn't include the schema, and it didn't even include the header or key. For the test, I'm using a single K/V pair as in the example above, and all I got back was the value.
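(For reference, Avro's binary encoding never embeds the schema in the bytes, so whoever reads them has to be handed the same Schema explicitly. A rough sketch of decoding such a byte array with plain Avro, assuming it really is binary-encoded against schema:)

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

// Avro binary carries no schema or header, so the reader must be given the schema
DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
Decoder decoder = DecoderFactory.get().binaryDecoder(avroData, null);
GenericRecord record = reader.read(null, decoder);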
An alternative route that I may pursue if this doesn't work is to manually format the JSON data as it comes in, but this is messy, and will require lots of recursion. I'm 99% sure that I can get it working that way, but would love to avoid it if at all possible.
To reiterate, what I'm trying to do is package incoming JSON-formatted data (string, byte array, node, whatever) with an Avro Schema to create GenericRecords which will be output to .avro files. I need to find a way to ingest the data and Schema such that it will allow for nullable records to be untyped in the JSON-string.
Thank you for your time, and don't hesitate to ask clarifying questions.

Serializing multiline string from JsonNode to YAML string adds double quotes and "\n"

I have a YAML string where one of the attributes looks like this (note that there is a trailing space at the end of the first line of the block):
description: |
  this is my description
  this is my description in the second line
In my Java code I read it into a JsonNode like this:
JsonNode node = new YamlMapper().readTree(yamlString);
I then do some changes to it and write it back to a string like this:
new YamlMapper().writeValueAsString(node)
The new string now looks like this:
"this is my description \nthis is my description in the second line\n"
So now in the resulting YAML you can see the added quotes plus the newline characters (\n), and everything is on one line. I expect it to return the original YAML, like the one above.
This is how my YAML object mapper is configured:
new ObjectMapper(
        new YAMLFactory()
            .disable(YAMLGenerator.Feature.MINIMIZE_QUOTES))
    .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    .setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
If I remove the trailing space after "description" in the original YAML, it works just fine.
To serialize multiline text with Jackson: since version 2.9, Jackson has the flag YAMLGenerator.Feature.LITERAL_BLOCK_STYLE, which can be turned on as:
new ObjectMapper(
        new YAMLFactory().enable(YAMLGenerator.Feature.LITERAL_BLOCK_STYLE)
).writeValueAsString(new HashMap<String, String>() {{
    put("key", "test1\ntest2\ntest3");
}});
The output won't be wrapped with quotes:
---
key: |-
  test1
  test2
  test3
Note there are a few differences between block scalars (|, |-, >, ...); you can read about them at https://yaml-multiline.info/
Jackson's API is too high level to control the output in detail. You can use SnakeYAML directly (which is used by Jackson under the hood), but you need to go down to the node or event level of the API to control node style in the output.
See also: I want to load a YAML file, possibly edit the data, and then dump it again. How can I preserve formatting?
This answer shows general usage of SnakeYAML's event API to keep formatting; of course it's harder to make changes on a stream of events. You might instead want to work on the node graph; this answer has some example code showing how to load YAML into a node graph, process it, and write it back again.
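As a rough sketch of that node-graph route (written against a recent SnakeYAML; the ScalarNode constructor taking a DumperOptions.ScalarStyle is not available in older releases), you could re-style multiline scalars as literal blocks like this:

import java.io.StringReader;
import org.yaml.snakeyaml.DumperOptions;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.nodes.MappingNode;
import org.yaml.snakeyaml.nodes.NodeTuple;
import org.yaml.snakeyaml.nodes.ScalarNode;
import org.yaml.snakeyaml.nodes.Tag;

Yaml yaml = new Yaml();
MappingNode root = (MappingNode) yaml.compose(new StringReader(yamlString));
for (int i = 0; i < root.getValue().size(); i++) {
    NodeTuple tuple = root.getValue().get(i);
    if (tuple.getValueNode() instanceof ScalarNode) {
        ScalarNode scalar = (ScalarNode) tuple.getValueNode();
        if (scalar.getValue().contains("\n")) {
            // rebuild the scalar with literal block style (|)
            ScalarNode literal = new ScalarNode(Tag.STR, scalar.getValue(),
                    null, null, DumperOptions.ScalarStyle.LITERAL);
            root.getValue().set(i, new NodeTuple(tuple.getKeyNode(), literal));
        }
    }
}
String output = yaml.serialize(root);

Note that a YAML literal block cannot represent trailing spaces on a line, so for values like the one in the question the emitter may still fall back to quoting.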

How to Pre-Process a JSON String in Java :: Convert Capitalised Field Names to lowerCamelCase Names

My current Android project consumes many JSON web services.
I have no control over the JSON content.
I wish to persist this JSON directly into my application's local Realm database.
The issue is that the JSON field names are all capitalised.
I do not wish my Realm DTO objects to have capitalised field names, as that's just wrong.
How can I transform the capitalised field names to an acceptable Java field name format?
Are there any JSON pre-processing libraries that will perform the required transformation of capitalised field names?
I realise I can use Jackson/GSON-type libraries to solve this issue, however that means transforming the JSON to Java POJOs before I can persist the data.
The JSON field names look like "ThisIsAFieldName".
What I want is to transform them to "thisIsAFieldName".
I think you should really consider letting your JSON deserializer handle this, but if that really isn't a possibility you can always use good old string manipulation:
String input; // your JSON input
Pattern p = Pattern.compile("\"([A-Z])([^\"]*\"\\s*:)"); // matches '"Xxxx" :'
Matcher m = p.matcher(input);
StringBuffer output = new StringBuffer();
while (m.find()) {
    m.appendReplacement(output, String.format("\"%s$2", m.group(1).toLowerCase()));
}
m.appendTail(output);
Ideone test.
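If a regex feels too fragile (it can be confused by string values that happen to contain quotes and colons), another option that still avoids binding to POJOs is to rewrite the field names on Jackson's JsonNode tree; a sketch, where the method name is my own:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

static JsonNode lowerCamelCaseFieldNames(JsonNode node) {
    if (node.isObject()) {
        ObjectNode obj = (ObjectNode) node;
        // copy the names first so we can mutate the object while looping
        List<String> names = new ArrayList<>();
        Iterator<String> it = obj.fieldNames();
        while (it.hasNext()) {
            names.add(it.next());
        }
        for (String name : names) {
            JsonNode child = lowerCamelCaseFieldNames(obj.get(name)); // recurse first
            String lower = name.isEmpty() ? name
                    : Character.toLowerCase(name.charAt(0)) + name.substring(1);
            if (!lower.equals(name)) {
                obj.remove(name);
                obj.set(lower, child);
            }
        }
    } else if (node.isArray()) {
        ArrayNode array = (ArrayNode) node;
        for (int i = 0; i < array.size(); i++) {
            array.set(i, lowerCamelCaseFieldNames(array.get(i)));
        }
    }
    return node;
}

Usage would be something like new ObjectMapper().writeValueAsString(lowerCamelCaseFieldNames(new ObjectMapper().readTree(input))), which gives you back a JSON string with lowerCamelCase names to hand to Realm.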

Hadoop + Jackson parsing: ObjectMapper reads Object and then breaks

I am implementing a JSON RecordReader in Hadoop with Jackson.
For now I am testing locally with JUnit + MRUnit.
The JSON files contain one object each; after some headers, that object has a field whose value is an array of entries, each of which I want to be treated as a Record (so I need to skip those headers).
I am able to do this by advancing the FSDataInputStream up to the point of reading.
In my local testing, I do the following:
fs = FileSystem.get(new Configuration());
in = fs.open(new Path(filename));
long offset = getOffset(in, "HEADER_START_HERE");
in.seek(offset);
where getOffset is a function that positions the InputStream at the point where the field value starts - which works OK, judging by the in.getPos() value.
I am reading the first record by:
ObjectMapper mapper = new ObjectMapper();
JsonNode actualObj = mapper.readValue (in, JsonNode.class);
The first record comes back fine. I can use mapper.writeValueAsString(actualObj), and it shows the record was read correctly and is valid.
Fine till here.
So I try to iterate over the objects by doing:
ObjectMapper mapper = new ObjectMapper();
JsonNode actualObj = null;
do {
    actualObj = mapper.readValue(in, JsonNode.class);
    if (actualObj != null) {
        LOG.info("ELEMENT:\n" + mapper.writeValueAsString(actualObj));
    }
} while (actualObj != null);
And it reads the first one, but then it breaks:
java.lang.NullPointerException: null
at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:54)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:57)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:243)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:273)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:225)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:193)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.codehaus.jackson.impl.ByteSourceBootstrapper.ensureLoaded(ByteSourceBootstrapper.java:340)
at org.codehaus.jackson.impl.ByteSourceBootstrapper.detectEncoding(ByteSourceBootstrapper.java:116)
at org.codehaus.jackson.impl.ByteSourceBootstrapper.constructParser(ByteSourceBootstrapper.java:197)
at org.codehaus.jackson.JsonFactory._createJsonParser(JsonFactory.java:503)
at org.codehaus.jackson.JsonFactory.createJsonParser(JsonFactory.java:365)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1158)
Why is this exception happening?
Does it have to do with being reading locally?
Is it needed some kind of reset or something when reusing an ObjectMapper or its underlying stream?
I managed to work around it. In case it helps:
First of all, I'm using Jackson 1.x latest version.
It seems that once JsonParser is instantiated with an InputStream, it takes control over it.
So when using readValue(), once the value is read, the stream gets closed: internally it calls _readMapAndClose(), which automatically closes the stream.
There is a setting you can use to tell the JsonParser not to close the underlying stream. You can pass it to your JsonFactory like this before you create your JsonParser:
JsonFactory f = new MappingJsonFactory();
f.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false);
Beware you are responsible for closing the stream (FSDataInputStream in my case).
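For instance, here is a sketch of reading the records with a single JsonParser over the stream under that setting (same Jackson 1.x packages as above; in and LOG are from the question):

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;
import org.codehaus.jackson.map.MappingJsonFactory;
import org.codehaus.jackson.map.ObjectMapper;

JsonFactory f = new MappingJsonFactory();
f.configure(JsonParser.Feature.AUTO_CLOSE_SOURCE, false); // leave the FSDataInputStream open
ObjectMapper mapper = new ObjectMapper(f);
JsonParser jp = f.createJsonParser(in); // one parser (and one buffer) for the whole stream

while (jp.nextToken() == JsonToken.START_OBJECT) {
    JsonNode record = mapper.readValue(jp, JsonNode.class);
    LOG.info("ELEMENT:\n" + mapper.writeValueAsString(record));
}
// the caller still has to close `in` itself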
So, answers:
Why is this exception happening?
Because the parser manages the stream, and closes it after readValue().
Does it have to do with being reading locally?
No
Is it needed some kind of reset or something when reusing an ObjectMapper or its underlying stream?
No. What you need to be aware of when using the Streaming API mixed with ObjectMapper-like methods is that sometimes the mapper/parser may take control of the underlying stream. Refer to the Javadoc of JsonParser and check the documentation of each of the reading methods to meet your needs.

Invalid character while converting from JSON to XML using jsonlib

I'm trying to convert a JSON string to XML using jsonlib in Java.
JSONObject json = JSONObject.fromObject(jsonString);
XMLSerializer serializer = new XMLSerializer();
String xml = serializer.write( json );
System.out.println(xml);
The error that I get is
nu.xom.IllegalNameException: 0x24 is not a legal NCName character
The problem here is that some properties in my JSON have names that are invalid as XML names, e.g. a property named "$t". The XMLSerializer throws the exception while trying to create an XML tag with this name, because $ is not allowed in XML tag names. Is there any way I can override this XML well-formedness check done by the serializer?
First, I'd suggest adding the language you are using (it is Java, right?).
You could override the method that checks your XML tag name so that it does nothing.
I took a look at the spec for the json-lib XMLSerializer and to my surprise it seems to have no option for serialising a JSON object whose keys are not valid XML names. If that's the case then I think you will need to find a different library.
You could loop over json.keySet (recursively if necessary) and replace invalid keys with valid ones (using remove and add).
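A rough sketch of that idea (the replacement rule below is only an illustration, not a full NCName validator):

import java.util.ArrayList;
import java.util.List;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;

static void sanitizeKeys(Object value) {
    if (value instanceof JSONObject) {
        JSONObject obj = (JSONObject) value;
        // copy the keys first so the object can be mutated while looping
        List<String> keys = new ArrayList<String>(obj.keySet());
        for (String key : keys) {
            Object child = obj.get(key);
            sanitizeKeys(child); // recurse into nested objects/arrays
            String clean = key.replaceAll("[^A-Za-z0-9_.-]", "_"); // e.g. "$t" becomes "_t"
            if (!clean.equals(key)) {
                obj.remove(key);
                obj.put(clean, child);
            }
        }
    } else if (value instanceof JSONArray) {
        for (Object element : (JSONArray) value) {
            sanitizeKeys(element);
        }
    }
}

Calling sanitizeKeys(json) before handing the object to the XMLSerializer should avoid the IllegalNameException, at the cost of renaming the offending properties.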
