I have been wondering whether it is possible to send XML that contains code (bytecode) which the JVM would then execute unintentionally. I am using Java, so I assume uncompiled source would not work and an attacker would have to inject bytecode into the XML to trick the JVM. I want to make sure that the web service I am building is secure. I am using JAXB for XML marshalling and unmarshalling, and Jersey as the web service handler.
Unintentionally? I don't think so.
The JAXB unmarshaller deserializes XML values into the state of a given object, but the class and its behavior are decided by you. I don't see how sending raw bytecode in the XML could do anything harmful.
You could send JavaScript that your Java object executes using Rhino, but that's hardly unintentional.
Your service might have other security issues, but a Java bytecode injection attack isn't one of them.
You should be validating all data sent to you before binding, anyway.
About the only XML-related vulnerability I'm aware of is "external entities" (XXE); you can read up on that here. I'm pretty sure the JDK has external entity handling disabled by default these days, but it is still worth disabling it explicitly rather than relying on defaults.
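As a minimal sketch of switching external entities off explicitly, the following configures a JAXP parser to reject any document that carries a DOCTYPE. It assumes the JDK's bundled parser, which recognises the Xerces feature URI shown; the class and method names are made up for illustration. With JAXB you would hand the unmarshaller a source produced by a parser hardened this way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JAXB or Jersey.
final class HardenedXmlParser {

    // Builds a parser that refuses DOCTYPE declarations, and with them
    // every internal or external entity definition.
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: a Xerces-derived parser (such as the JDK's) is in use.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the document parses, false if the parser rejects it.
    static boolean isAccepted(String xml) {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException rejected) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support the requested features", e);
        }
    }

    public static void main(String[] args) {
        String attack = "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]><foo>&xxe;</foo>";
        System.out.println("attack accepted: " + isAccepted(attack));
        System.out.println("plain document accepted: " + isAccepted("<employee action=\"delete\"/>"));
    }
}
```

Rejecting DOCTYPE outright is the bluntest option; if your documents legitimately need a DTD, you would have to disable external general and parameter entities individually instead.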
XML is data, and it's very unlikely that any recipient is going to try to execute it.
But of course, some XML vocabularies use the data to contain what you can think of as instructions to perform an action, and the recipient might then be fooled into performing inappropriate actions, which you could consider to be a security problem. This vulnerability is not at the level of XML, it is at the level of the application protocol (the vocabulary). The attack would have to use instructions that make sense in the context of this protocol, which is much more likely to be something like <employee action="delete"/> than something at the level of bytecode.
I'm primarily interested in doing this in Java, but seeing a solution in any language would be helpful.
According to various documentation that I'm reading, the default workflow with gRPC is:
1. Write a .proto file
2. Generate client and/or server code from that file
3. Write your program and compile it together with the generated code
What I want to do is programmatically read in a message schema (either from a .proto file or through some other means), and then send some data that's laid out according to that schema to some address.
The only way I can see to do that right now is to shell out, generate code in a temp directory, invoke the compiler, load the compiled code, and use reflection to get at the intended functions.
That sounds like an extreme hack to me. Is there a simpler option available?
In gRPC Java, the generated code and the protos are optional, and you don't actually need them (though they are convenient). To dynamically interpret the message you will need to define your own Marshaller, which works with an InputStream to access the raw message bytes. From here you can buffer them into an array, and decide how to parse them.
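The buffer-then-parse step can be sketched with the JDK alone. The example below is not a gRPC Marshaller itself; it only shows what the body of a custom Marshaller's parse(InputStream) might do once it has the stream. It reads all bytes and walks the top-level fields using the protobuf wire format (each field key is a varint holding field number and wire type). The class and method names are hypothetical, and deprecated group wire types are not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; requires Java 9+ for InputStream.readAllBytes().
final class RawMessageReader {

    // Buffers the whole message so it can be inspected more than once.
    static byte[] buffer(InputStream in) {
        try {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists "fieldNumber:wireType" for every top-level field in the message.
    static List<String> listFields(byte[] message) {
        List<String> fields = new ArrayList<>();
        int[] pos = {0}; // single-element array used as a mutable cursor
        while (pos[0] < message.length) {
            long key = readVarint(message, pos);
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            fields.add(fieldNumber + ":" + wireType);
            switch (wireType) {
                case 0: readVarint(message, pos); break;            // varint
                case 1: pos[0] += 8; break;                         // 64-bit
                case 2: {                                           // length-delimited
                    int length = (int) readVarint(message, pos);
                    pos[0] += length;
                    break;
                }
                case 5: pos[0] += 4; break;                         // 32-bit
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }

    // Decodes one base-128 varint starting at pos[0] and advances the cursor.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    public static void main(String[] args) {
        // Field 1 = varint 150, field 2 = the string "testing".
        byte[] wire = {0x08, (byte) 0x96, 0x01, 0x12, 0x07, 't', 'e', 's', 't', 'i', 'n', 'g'};
        System.out.println(listFields(buffer(new ByteArrayInputStream(wire))));
    }
}
```

Without a schema the bytes only tell you field numbers and wire types; mapping them to names and concrete types still needs the descriptor you read in at runtime.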
As a similar exercise, I wrote a more in depth tutorial on using JSON with gRPC. The principle should be the same for your code.
I'm working on a Play 2 app which is being translated. Play uses Java's MessageFormat behind the scenes so I have a fair number of property values, ala:
my.interface.key={0,choice,0#{0} families|1#1 family|1<{0,number,integer} families}
I just received back a translation of this in the form:
my.interface.key={0,choix,0#{0} familles|1#1 famille|1<{0,nombre,entier} familles}
If it's not obvious, some bits of that should not have been translated, but mistakes will happen from time to time. That's fair enough, but I'm sure there must be a way of validating these strings before my app crashes at runtime with an IllegalArgumentException: unknown format type at ... exception. Preferably with a Git commit hook, or even an SBT build task.
If I were to hack this up myself I would probably make a tool to read these property files and check that, for each value, running MessageFormat.format(value) doesn't blow up.
Ideally I could do this via a Perl (or Python) script. Sadly, the only non-Java library I can find - Text::MessageFormat on CPAN - doesn't seem to support the most error-prone formats, such as pluralisation.
Can anyone suggest a more sensible approach based on existing tooling before I dive in?
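The do-it-yourself check described in the question can be sketched roughly as follows. It assumes the message file can be read with java.util.Properties (Play's own parser may differ in details such as quoting), and the class and method names are made up. Constructing a MessageFormat is enough to trigger pattern parsing, so no arguments need to be supplied.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical validator, suitable for calling from a build task or commit hook.
final class MessagePatternCheck {

    // Returns key -> error message for every value that MessageFormat cannot parse.
    static Map<String, String> findBrokenPatterns(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> broken = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            try {
                // The constructor parses the pattern and throws on unknown format types.
                new MessageFormat(props.getProperty(key));
            } catch (IllegalArgumentException e) {
                broken.put(key, e.getMessage());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String text =
            "en.key={0,choice,0#{0} families|1#1 family|1<{0,number,integer} families}\n"
          + "fr.key={0,choix,0#{0} familles|1#1 famille|1<{0,nombre,entier} familles}\n";
        findBrokenPatterns(text).forEach((key, error) -> System.out.println(key + ": " + error));
    }
}
```

A build task could fail whenever the returned map is non-empty. Note that this only catches patterns that fail to parse; a translation that parses but reorders or drops arguments would still slip through.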
We had a similar problem. Our solution was to create classes that model the structure of the message format, then use XML to define the messages in our message bundle.
If the translator uses an XML editor then there is some hope they won't "break" the structure of the message.
See this answer for details.
Recently we've been tasked with coming up with an XML communication specification for our products. A few of my coworkers have high opinions of JAXB for marshalling and unmarshalling XML docs. I've spent some time playing around with it and I understand where they are coming from. It makes life simple for simple XML docs.
Now to take it up a notch. One of the things that I would like to see in our communication model is "built in" signature validation for people who use it after me. One of the problems I'm running into is that to validate a signature I need to treat the corresponding XML as bytes. So let's take this example...
<topLevel>
  <sensitiveData encoding="UTF8">
    <creditCard>
      <number>1234-1234-1234-1234</number>
      <expDate>Oct 2020</expDate>
    </creditCard>
  </sensitiveData>
  <signatureOfSensitiveData algorithm="SHA1WithRSA">VGhpc0lzQVNpZ25hdHVyZQ==</signatureOfSensitiveData>
</topLevel>
Edit: I am not actually passing credit card data. Just an example here.
What would be great is if I could get the byte[] (determined by the encoding) representation of everything inside of the "sensitiveData" tag. I wouldn't even mind having to call "unmarshall" again on that byte[].
This also opens up other doors for us. We could actually introduce "compression" and "encryption" attributes into elements. If we could treat them as a byte[] we could then inflate and decrypt them and then pass them on to be unmarshalled again.
Side note: I think this works if you base64 encode the XML and then include it in an element. But that then forces us to base64 even simple documents and introduce some unnecessary bloat into our messages.
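To illustrate why the exact bytes matter, here is a small sketch using only java.security. It signs a fragment with SHA1withRSA (the algorithm named in the example document) and shows that the same XML with different whitespace no longer verifies. The class name, method names, and the throwaway key pair are assumptions for illustration; nothing here is JAXB-specific.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

// Hypothetical helper for signing and verifying a raw XML fragment.
final class FragmentSignature {
    private static final String ALGORITHM = "SHA1withRSA";

    // Signs the exact bytes given and returns the signature as Base64 text.
    static String sign(byte[] fragment, PrivateKey key) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(key);
            signer.update(fragment);
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Verifies a Base64 signature against the exact bytes given.
    static boolean verify(byte[] fragment, String base64Signature, PublicKey key) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(key);
            verifier.update(fragment);
            return verifier.verify(Base64.getDecoder().decode(base64Signature));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throwaway key pair for the demo only; real keys would come from a keystore.
    static KeyPair newDemoKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newDemoKeyPair();
        byte[] sent = "<creditCard><number>1234</number></creditCard>".getBytes(StandardCharsets.UTF_8);
        byte[] reserialized = "<creditCard>\n  <number>1234</number>\n</creditCard>".getBytes(StandardCharsets.UTF_8);
        String signature = sign(sent, keys.getPrivate());
        System.out.println("original bytes verify: " + verify(sent, signature, keys.getPublic()));
        System.out.println("re-serialized bytes verify: " + verify(reserialized, signature, keys.getPublic()));
    }
}
```

This is why unmarshalling and then re-marshalling the fragment is fragile: the verifier needs either the original bytes or an agreed canonical form of the XML.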
Any ideas for solutions to this? My hope is that I'm just missing something basic in JAXB and it will be a breeze after I get that.
Thanks!
You can use a JAX-WS framework that supports WS-Security. JAX-WS relies on JAXB but adds the communication part with support for the SOAP protocol, and WS-Security is the standard for XML signature, encryption and other security features in SOAP/XML. WS-Security relies on the XML signature & encryption standard mentioned in a comment.
Examples of such frameworks (non-exhaustive list): Apache CXF, Glassfish Metro, etc. More info.
I am implementing a log server in C++ that accepts log messages from a Java program (via the log4j socket appender). How do I read these Java logging objects in C++?
You should configure the log4j appender to send XML format messages. Then it is simply a matter of reading XML in C++.
Serialized Java objects are a byte stream that needs meta information from the Java runtime in order to reconstruct the objects. Without that meta information available in the system you must add it yourself, which is tedious and error prone. I second the idea of sending XML instead - that is what XML serialization was invented for :)
Another very fast way of language-agnostic serialization is protobuf. proto-files (meta-files that describe your data-structures) are compiled using protoc which writes IO-code for various target languages.
I'm using it in my app and did some benchmarking which might give you a clue if it serves your purpose.
The only downside I'm aware of is that protobuf does not handle references at all. If one of your objects contains the same object twice, it will be written twice instead of just once with a reference to the previous instance (which is what Java serialization does).
Concerning your original question, I agree with Thorbjørn that reading and writing of serialized Java objects will be too hard and error prone.
If you consider going the protobuf way, feel free to use this logging event protobuf file as a starter.
JSON is the best way to go for this kind of problem.
Log4cxx is a Log4j port to C++, perhaps you can glean some ideas from that or even use it directly?
JSON! JSON! JSON! JSON!
I am new to Java and I came across a statement in a Java project which says:
Digester digester = DigesterLoader.createDigester(getClass()
.getClassLoader().getResource("rules.xml"));
The rules.xml file contains various patterns, and every pattern has different attributes such as classname, methodname and some other properties.
I googled Digester but couldn't find anything useful that would help me with the statement above. Can anyone tell me what steps are followed in executing the above statement? And what is the advantage of this XML stuff?
swapnil, as a user of Digester back in my Struts days, I can honestly say it's tricky to learn and debug. It's a tough library to familiarize yourself with. Essentially you are setting up event handlers for certain elements, much like a SAX parser (in fact it uses SAX behind the scenes). You feed a rules engine XPath-like patterns for the nodes you are interested in, and set up rules that instantiate POJOs and set properties on them with data found in the XML file.
Great idea, and once you get used to it it's good, however if you have an xsd for your input xml file I'd sooner recommend you use JAXB.
The one thing that is nice about Digester is it will only do things with elements you are interested in, so memory footprint ends up being nice and low.
This is the method that's getting called here. XML is commonly used in Java for configuration, since XML files do not need to be compiled. Having the same configuration in a Java file would mean you have to recompile whenever it changes.
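I assume you understand how the rules file is being loaded using the class loader? Because the name is passed to ClassLoader.getResource rather than Class.getResource, it is resolved from the root of the classpath (not relative to the package of the class itself), and the result is a URL that gives the file's absolute location.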
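A small sketch of how the two lookup styles resolve names, using a class file that ships with every JDK so that it runs without any project resources. The class and method names are hypothetical, and the system class loader stands in for getClass().getClassLoader().

```java
import java.net.URL;

// Hypothetical demo comparing Class.getResource with ClassLoader.getResource.
final class ResourceLookupDemo {

    // Class.getResource: a name without a leading '/' is resolved
    // relative to the package of the given class.
    static URL relativeToClass(Class<?> owner, String name) {
        return owner.getResource(name);
    }

    // ClassLoader.getResource: the name is always resolved from the classpath root.
    static URL fromClasspathRoot(String name) {
        return ClassLoader.getSystemClassLoader().getResource(name);
    }

    public static void main(String[] args) {
        // Found: "Object.class" is looked up inside the java.lang package.
        System.out.println(relativeToClass(Object.class, "Object.class"));
        // Found: the full path from the root is given.
        System.out.println(fromClasspathRoot("java/lang/Object.class"));
        // Not found: there is no Object.class at the classpath root.
        System.out.println(fromClasspathRoot("Object.class"));
    }
}
```

Applied to the statement in the question, getClassLoader().getResource("rules.xml") expects rules.xml at the top of the classpath (for example directly under the classes directory or at the root of a jar).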
As for the Digester, I've not used it but a quick read of this (http://commons.apache.org/digester/) should explain all.
They used it at my last gig and all I remember is that it was extremely slow.