Preon is a Java library for creating binary codecs: you annotate a class's data members with their correspondence to bit fields (e.g. the number of bits to use for a certain field), and based on such a class the library builds a Codec object that can create instances of the class by reading their data from a binary input stream.
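For illustration, a minimal sketch of what that annotation-driven decoding looks like; the Header layout is made up, and the package and annotation names may differ between Preon versions:

```java
import org.codehaus.preon.Codec;
import org.codehaus.preon.Codecs;
import org.codehaus.preon.annotation.BoundNumber;

// Made-up record layout: a 4-bit version followed by a 12-bit length.
// (Older Preon releases use the nl.flotsam.preon.* packages instead.)
public class Header {
    @BoundNumber(size = "4")
    public int version;

    @BoundNumber(size = "12")
    public int length;

    public static Header read(byte[] bytes) throws Exception {
        Codec<Header> codec = Codecs.create(Header.class);
        return Codecs.decode(codec, bytes);   // builds a Header from the bit stream
    }
}
```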
Due to licensing issues (it is distributed under GPL), I cannot use it.
Are there any libraries with equivalent or similar functionality, either in Java or in C++?
Looking at the license page, the Preon library is licensed under the "GNU General Public License, version 2, with the Classpath Exception", which is important: the Classpath Exception allows you to use the library in binary form without your application also having to be GPL.
Take a look at the Java Binary Block Parser library; it allows bit-field parsing and mapping to class fields.
I inherited a Java system that was written around CoreNLP, meaning that system classes use CoreNLP classes as fields in some places, in addition to using the CoreNLP parser.
I would like to test the system's accuracy with different parsers, and to that end I have refactored the code to use a generalized parser adapter, the subclasses of which should perform the necessary bridging between the CoreNLP API and the specific parser implementation.
Now, the subclass StanfordParserAdapter is trivial. My problem begins with the OpenNlpParserAdapter subclass. Is there an existing bridge between these parsers that I can use? If so, it will save me (and potentially others) a lot of work.
Example: Given a List<HasWord> to parse, CoreNLP produces a Tree. I would like the OpenNLP parser (through bridging code) to produce an equivalent Stanford Tree object when given the same input.
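Absent an existing bridge, the sketch below shows one possible shape for such bridging code: parse with OpenNLP, let its Parse emit a Penn Treebank style bracketing, and rebuild a Stanford Tree from that string. This is only a sketch under those assumptions; tokenization and tag-set mismatches still need handling, and joining tokens with spaces is simplistic:

```java
import java.util.List;
import java.util.stream.Collectors;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.trees.Tree;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;

// Hypothetical adapter: parse with OpenNLP, then rebuild a Stanford Tree
// from the Penn Treebank string that OpenNLP's Parse can emit.
public class OpenNlpParserAdapter {
    private final Parser parser; // an initialized opennlp.tools.parser.Parser

    public OpenNlpParserAdapter(Parser parser) {
        this.parser = parser;
    }

    public Tree parse(List<? extends HasWord> sentence) {
        String text = sentence.stream()
                .map(HasWord::word)
                .collect(Collectors.joining(" "));     // naive detokenization
        Parse[] parses = ParserTool.parseLine(text, parser, 1);
        StringBuffer penn = new StringBuffer();
        parses[0].show(penn);                          // Penn Treebank style bracketing
        return Tree.valueOf(penn.toString());          // Stanford Tree from the bracketing
    }
}
```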
I don't think bridging is the optimal solution here, as it involves a lot of work and takes more time than using OpenNLP directly in your project.
On the other hand, configuring and using OpenNLP is quite easy, and there is a lot of support you can use to parse your input.
Comment if you need any help configuring and using the tools; there are plenty of examples out there to help.
I'm new to Apache Avro (the serialization framework). I know what serialization is, but why are there separate frameworks like Avro, Thrift, and Protocol Buffers?
Why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs?
What is the meaning of the phrase below, in Avro or in any other serialization framework?
"does not require running a code-generation program when a schema changes"
Please help me to understand all this!
Why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs?
I would assume you can use Java Serialization unless you know otherwise.
The main reasons not to use it are
you know there is a performance problem.
you need to exchange data across languages. Java Serialization is only for Java.
does not require running a code-generation program when a schema changes
I am guessing this means it can read data serialized with an older or newer model without having to re-generate and compile the code, i.e. it is tolerant of changes in the model.
BTW: as the data models I work with are usually (a) very simple and (b) require maximum performance, I write my own serialization without using a framework (or write my own framework). This is fine provided your model is very simple and won't change often.
In short, unless you know you can't, try Java Serialization first.
A comparison I did on different Serialization Methods
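To make "try Java Serialization first" concrete, a minimal round trip looks like the sketch below (the Point class is purely illustrative):

```java
import java.io.*;

// Minimal Java Serialization round trip; Point is just an illustrative class.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(1, 2));           // serialize
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Point p = (Point) in.readObject();          // deserialize
            System.out.println(p.x + "," + p.y);        // prints 1,2
        }
    }
}
```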
1.
The problem with Java serialization is that it is not agnostic of your code, meaning it is tightly coupled to the structure of your classes. Other serialization frameworks provide you with flexibility/control that is useful to bypass this kind of situation. Even though the standard Java mechanism does let you control serialization through the writeObject/readObject methods, other frameworks have addressed the problem in a more elegant way (see the sketch after these points).
Second, you cannot exchange the output of Java serialization with other languages and platforms.
Last but not least, Java serialization does not produce the most compact result possible, which might lead to performance degradation if you do things like transfer data over a network. Other protocols (like Oracle's POF or Protocol Buffers) are optimized to produce smaller output.
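For reference, the writeObject/readObject hooks mentioned above look roughly like this (the Money class and its field are purely illustrative):

```java
import java.io.*;

// Illustrative class showing the standard hooks Java serialization offers.
class Money implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient long cents;   // not written by the default mechanism

    Money(long cents) { this.cents = cents; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // write the non-transient fields
        out.writeLong(cents);       // write the transient field by hand
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        cents = in.readLong();
    }
}
```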
2.
Regarding your second question, I guess it means that you don't need to run a precompile job that generates code when the structure of your serialized classes changes. I personally hate frameworks that force some kind of compile-time code generation; I hate the hassle of even having to look at generated code, but that is just me and my OCD.
Two principal things Avro does well: Hadoop's MapReduce and communication protocol structures. I use it for MapReduce, where I put numerous data instances in a single file, all conforming to a particular schema; each record is stored very efficiently, and markers delineate each individual record. Hadoop also uses it to communicate data between the Map and Reduce tasks. This is much better than storing field names alongside the data, and these files are easy to split into multiple parts for processing in a distributed computing environment. Since the schema is embedded in the file, a reader doesn't have to know what the data looks like.
Avro is not tied to any language, and there are several language APIs for reading Avro data. If you want to write out a single complex object, then Java's serialization or Avro will work. If you want more power and efficiency, and are using millions of individual instances, then Avro is a good alternative. I am sure you can do this with the Java API, but why work that hard?
There are mechanisms to evolve schemas through the schema resolution rules. There are also tools that will turn your Java objects into schemas for you.
The best place to start is here: http://avro.apache.org/docs/current/spec.html It may take a couple of reads to get the gist; read it again after trying to use some of the tools that come with the Avro package. Avro takes a while to get the hang of. JSON is only used as a schema definition language; it isn't used to store the data. You can generate schemas using the API or using a JSON file. Lots of flexibility, and enough rope to easily get into trouble with -- well worth it though.
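As a minimal sketch of the schema-driven, no-code-generation (generic) API: the record and field names below are made up, and error handling is omitted:

```java
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroDemo {
    public static void main(String[] args) throws Exception {
        // The schema is plain JSON; no code generation is required to use it.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alice");
        user.put("age", 30);

        File file = new File("users.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);   // the schema is embedded in the file
            writer.append(user);
        }

        // A reader only needs the file; the embedded schema describes the data.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord r : reader) {
                System.out.println(r.get("name") + " " + r.get("age"));
            }
        }
    }
}
```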
Java serializes objects in a well-known and published manner (there's a spec).
What I'm looking for is a library that can parse a binary blob of serialized objects into something like a graph of Apache Commons BeanUtils DynaBean objects.
Such a library would be useful when I want to "read" (and work with) serialized objects without having the classes themselves on the classpath (or, as in my case, because the classes were refactored and renamed, rendering the old data unreadable...).
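For the renamed-classes case specifically (not the DynaBean-style library asked about), one common workaround is to remap old class names while deserializing. The sketch below assumes the renamed class keeps a compatible field layout and serialVersionUID; the old/new names are made up:

```java
import java.io.*;
import java.util.Map;

// Sketch: remap renamed classes while deserializing old data.
// The old/new class names below are hypothetical.
public class RenamingObjectInputStream extends ObjectInputStream {

    private static final Map<String, String> RENAMES = Map.of(
            "com.example.old.Customer", "com.example.model.Customer");

    public RenamingObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        String name = RENAMES.getOrDefault(desc.getName(), desc.getName());
        // Only works if the renamed class declares the same serialVersionUID
        // and a compatible field layout as the old one.
        return Class.forName(name, false, getClass().getClassLoader());
    }
}
```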
What is wrong with XML / JSON / BSON? They are well-defined, widely accepted and language-agnostic formats, and there is a ton of serialisation libraries with different flavours.
I'm looking for a fail-safe way to round-trip between a JVM class file and a text representation and back again.
One strict requirement is that the resulting round-tripped JVM class file is exactly functionally equivalent to the original JVM class file as long as the text representation is left unchanged.
Furthermore, the text representation must be human-readable and editable. It should be possible to make small changes to the text representation (such as changing a text string or a class name, etc.) which are reflected in the resulting class file representation.
The simplest solution would be to use a Java decompiler such as JAD to generate the text representation, which in this case would simply be the re-created Java source code. And then use javac to generate the byte-code. However, given the state of the free Java decompilers this approach does not work under all circumstances. It is rather easy to create obfuscated byte-code that does not survive a full round-trip class-file/java-source/class-file (in part because there simply isn't a 1:1 mapping between JVM byte-code and Java source code).
Is there a fail-safe way to achieve JVM class-file/text-representation/class-file round-tripping given the requirements above?
Update: Before answering - save time and effort by reading all the requirements above, and note specifically:
"Text-representation of JVM bytecode" does not necessarily mean "Java source-code".
The BCEL project provides a JasminVisitor which will convert class files into Jasmin assembly.
This can be modified and then reassembled into class files. If no edits are made and the versions are kept compatible, the round trip should result in identical class files, except that line-number mapping may be lost. If you require a bit-for-bit identical copy for the round-trip case, you will likely need to alter the tool to include aspects of the code which are pure metadata as well.
Jasmin is rather old and is not designed for the ease of actually writing full-blown programs in assembly, but for modifying string constant tables and constants it should be more than adequate.
Jasmin and Kimera?
Looks like ASM does this. (This is the same sort of answer as ShuggyCoUk's, but with a different tool.) Jarjar says it uses ASM for exactly the sort of thing you're talking about.
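As a hedged sketch of one way ASM can be used here: ASMifier prints Java source that rebuilds the class through ASM calls, which you can edit, compile, and run to get a class file back (functionally equivalent, though not necessarily byte-identical). The class being dumped below is just an example:

```java
import java.io.PrintWriter;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.util.ASMifier;
import org.objectweb.asm.util.TraceClassVisitor;

public class DumpWithAsm {
    public static void main(String[] args) throws Exception {
        // Prints Java source that, when compiled and run, regenerates the class
        // via ASM calls. Edit the printed code, recompile, and rerun it to get
        // the modified class file back.
        ClassReader reader = new ClassReader("java.lang.Runnable");
        reader.accept(new TraceClassVisitor(null, new ASMifier(),
                new PrintWriter(System.out)), 0);
    }
}
```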
I've written a tool that's designed for exactly this.
The Krakatau disassembler and assembler is designed to handle any valid classfile, no matter how bizarre. It uses an assembly format based on the Jasmin format, but extended to support all the classfile features that Jasmin can't handle. It even supports some of the obscure or undocumented 'features' of Hotspot, such as pre 45.3 classfiles using smaller widths for the Code attribute fields.
It can roundtrip any classfile I know of. The result won't be identical binary wise, but it will have the same functionality (constant pool entries may be rearranged for instance).
Update: Krakatau now supports exact binary roundtripping of classfiles. Passing the -roundtrip flag will preserve the order of constant pool entries, etc.
No. There exists valid byte-code without a corresponding Java program.
The Soot project has a quite sophisticated decompiler - http://www.sable.mcgill.ca/dava/ - which may be useful for bytecode that comes from a Java compiler. It is, however, not perfect.
Your best bet is still getting the source code for the class files.
Is there any clear documentation on the binary formats used to serialize the various MFC data structures? I've been able to view some of my own classes in a hex editor and use Java's ByteBuffer class to read them in (with automatic endianness conversions, etc).
However, I am currently running into issues while trying to bring over the CObArray data, as there seems to be a rather large header that is opaque to me, and it is unclear how it is persisting object type information.
Is there a set of online documentation that would be helpful for this? Or some sample Java code from someone that has dealt with this in the past?
Since MFC ships with source code I would create a test MFC application that serializes a CObArray and step through the serialization code. This should give you all the information you need.
I agree with jmatthias: use the MFC source code.
There's also this page on MSDN that may be useful.