Loading ruby-serialized data in Java for indexing rubygems - java

Is there a way to read Ruby-Marshalled data in Java?
I need to read the gzipped Marshalled latest_specs.4.8.gz file of rubygems in Java for my use case.
What I have noticed is that Ruby uses Marshal format version 4.8, whereas Java uses version 5 (STREAM_VERSION in ObjectStreamConstants), and I think this is what causes the exception "java.io.StreamCorruptedException: invalid stream header: 04085B02" in my code.
I tried JRuby but got the exception "undefined class/module Gem::Version".
I believe this is what Artifactory does to index gems.
Has anyone ever had a similar issue? Any help is much appreciated.

The Marshal format is documented. If you want to read it in Java, you will have to write a decoder for it. But, the Marshal file may contain Ruby objects which expect their respective classes to be in memory, so you will also have to implement a Java version of those classes.
Or, you could just use a language-independent serialization format such as JSON, YAML, XML, ASN.1, BSON, BER, BERT, ogdl, …
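To give a feel for what "write a decoder" involves, here is a minimal sketch of a Marshal 4.8 reader in plain Java (Java 16+ for records and pattern matching). It is not a complete implementation: it handles only nil, true/false, fixnums, strings, symbols, arrays, the `I` (instance-variable) wrapper, `U` (marshal_dump) objects, and the `;`/`@` back-references; hashes, floats, bignums, plain objects and every other type code throw. It assumes all strings are UTF-8 (the encoding instance variable is skipped), and instead of recreating Ruby classes it returns a generic `UserObject(className, data)`. `MarshalReader`, `RubySymbol`, `UserObject` and `loadGzip` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Minimal Ruby Marshal 4.8 reader (sketch): supports only the type codes handled in readValue(). */
public class MarshalReader {
    public record RubySymbol(String name) {}
    /** An object written via marshal_dump ('U'); only its class name and dumped data are kept. */
    public record UserObject(String className, Object data) {}

    private final DataInputStream in;
    private final List<String> symbols = new ArrayList<>();  // targets of ';' symbol links
    private final List<Object> objects = new ArrayList<>();  // targets of '@' object links

    public MarshalReader(InputStream in) { this.in = new DataInputStream(in); }

    /** Reads a gzipped Marshal file such as latest_specs.4.8.gz. */
    public static Object loadGzip(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            return new MarshalReader(in).read();
        }
    }

    public Object read() throws IOException {
        int major = in.readUnsignedByte(), minor = in.readUnsignedByte();
        if (major != 4 || minor != 8)
            throw new IOException("Unsupported Marshal version " + major + "." + minor);
        return readValue();
    }

    private Object readValue() throws IOException {
        int type = in.readUnsignedByte();
        switch (type) {
            case '0': return null;
            case 'T': return Boolean.TRUE;
            case 'F': return Boolean.FALSE;
            case 'i': return readLong();
            case ';': return new RubySymbol(symbols.get((int) readLong()));
            case '@': return objects.get((int) readLong());
            case ':': {
                String name = new String(readBytes(), StandardCharsets.UTF_8);
                symbols.add(name);
                return new RubySymbol(name);
            }
            case '"': { // assumption: bytes are UTF-8; the encoding ivar is skipped in 'I'
                String s = new String(readBytes(), StandardCharsets.UTF_8);
                objects.add(s);
                return s;
            }
            case '[': {
                List<Object> list = new ArrayList<>();
                objects.add(list); // Ruby registers the array before its elements
                for (long n = readLong(), i = 0; i < n; i++) list.add(readValue());
                return list;
            }
            case 'I': { // wrapped value, then ivar pairs such as :E => true (UTF-8 flag)
                Object value = readValue();
                for (long n = readLong(), i = 0; i < n; i++) { readValue(); readValue(); }
                return value;
            }
            case 'U': { // marshal_dump: class symbol, then the dumped data
                if (!(readValue() instanceof RubySymbol cls)) throw new IOException("Expected class symbol");
                int slot = objects.size();
                objects.add(null); // reserve the slot before reading the data, as Ruby does
                UserObject obj = new UserObject(cls.name(), readValue());
                objects.set(slot, obj);
                return obj;
            }
            default:
                throw new IOException("Unsupported Marshal type '" + (char) type + "'");
        }
    }

    private byte[] readBytes() throws IOException {
        byte[] buf = new byte[(int) readLong()];
        in.readFully(buf);
        return buf;
    }

    /** Marshal's variable-length integer: small values inline, else 1-4 little-endian bytes. */
    private long readLong() throws IOException {
        byte c = in.readByte();
        if (c == 0) return 0;
        if (c > 4) return c - 5;
        if (c < -4) return c + 5;
        long x = c > 0 ? 0 : -1; // negative values start from all one bits
        for (int i = 0; i < Math.abs(c); i++) {
            x &= ~(0xFFL << (8 * i));
            x |= (long) in.readUnsignedByte() << (8 * i);
        }
        return x;
    }
}
```

Assuming each entry in latest_specs.4.8.gz is a [name, Gem::Version, platform] triple, `MarshalReader.loadGzip(Path.of("latest_specs.4.8.gz"))` should return a list of three-element lists whose middle element is a `UserObject` named "Gem::Version". As far as I know, current RubyGems dumps a version as a one-element array holding the version string, so `data` would be something like `["1.0"]`. The "undefined class/module Gem::Version" error from JRuby is the same class-availability problem described in the answer: Marshal.load needs the Ruby class to be loaded.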

Related

Serialize arbitrary complex Java object (with no source code) and load it in Python

I need to serialize a big/complex Java object generated by a library for which I don't have source code.
The reason I need to serialize it is to be able to load it in Python.
So, I don't need to be able to deserialize the generated file back to the original Java object. I just need to generate some reasonably easy-to-parse file (e.g., JSON) so that I can parse it in Python and extract the information I need.
I tried to use the Gson library but I get a recursion error. I suspect the object I am serializing has some circularity.
What I am trying to avoid is to write any custom serialization code in Java, and keep all the "parse and manually de-serialize" code in Python.
I currently have a solution based on Protobuf, but I don't like it, since it requires:
custom serialization Java code
Protobuf specification
custom information extraction in Python.
So, right now, I need to keep 3 pieces of code "in sync".
What I would like is a solution like this:
Java object --> "raw" serialized file --> load in Python --> extract info in Python
Any suggestion?

What is a YAML file in Java, and what is it used for?

I saw some sites like this one, http://jyaml.sourceforge.net/, for YAML in Java,
but I can't figure out how to use it.
How can I use YAML files?
Is it possible to use it in JavaFX 2.0?
Thanks.
What is YAML
You should see the Wikipedia page for YAML at least. The official YAML website defines it as
[...] a human friendly data serialization standard for all programming languages.
Use with Java
It depends on what you want to use it for - the most common use (I'd imagine, since I haven't used it myself) would be for storing application configuration, as an alternative to XML or JSON. Essentially, you'll have a simple text file that contains data in a structured format as defined by the YAML spec. Here is an article that discusses the use of YAML with Java.
To avoid reinventing the wheel, you should make use of a library that performs the serialization and deserialization for you i.e. it can read from and write to the text file and parse the data in it and hand it over to your application in an easier to use object form. The business logic, of course, must be written by you. There are several Java libraries that are available and this question on SO talks about which one to use and why: https://stackoverflow.com/questions/450399/which-java-yaml-library-should-i-use.
Yaml is a file format*, and jYaml is a Java library for working with that file format.
So you may use it to read or write information into this format.
How can I use YAML files?
You write one, and use it with this library.
Is it possible to use it in JavaFX 2.0?
Can you use this library in JavaFX 2.0? If you can then yes. :)
* See comment

How do I read a java object in C++?

I am implementing a log server in C++ that accepts log messages from a Java program (via the log4j socket appender). How do I read these Java logging objects in C++?
You should configure the log4j appender to send XML format messages. Then it is simply a matter of reading XML in C++.
A serialized Java object is a byte stream that needs meta information from the Java runtime to reconstruct the objects. Without that meta information available in your system, you must add it yourself, which is tedious and error prone. I second the idea of sending XML instead; that is what XML serialization was invented for :)
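To make the "meta information" point concrete, the sketch below uses only JDK classes (the class and method names are illustrative). It serializes an ArrayList and checks two things: the stream starts with ObjectStreamConstants.STREAM_MAGIC (0xACED) followed by STREAM_VERSION (5), and the fully qualified class name is embedded in the bytes as part of the class descriptor. A C++ reader would have to parse those descriptors itself. The last check also shows why the Ruby Marshal data from the rubygems question (which starts with 04 08) fails ObjectInputStream's header check: it is a different format altogether, not a different version of the same one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class JavaStreamInspector {
    /** Serializes obj with the standard ObjectOutputStream. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** True if data starts with STREAM_MAGIC (0xACED) followed by STREAM_VERSION (5). */
    static boolean hasJavaHeader(byte[] data) {
        if (data.length < 4) return false;
        short magic = (short) (((data[0] & 0xFF) << 8) | (data[1] & 0xFF));
        short version = (short) (((data[2] & 0xFF) << 8) | (data[3] & 0xFF));
        return magic == ObjectStreamConstants.STREAM_MAGIC
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    /** Crude check: does an ASCII class name appear verbatim in the stream? */
    static boolean containsClassName(byte[] data, String className) {
        // ISO-8859-1 maps every byte to exactly one char, so this is a plain byte search.
        return new String(data, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new ArrayList<>(List.of(1, 2, 3)));
        System.out.println("Java header: " + hasJavaHeader(data));                                   // true
        System.out.println("Embeds class name: " + containsClassName(data, "java.util.ArrayList")); // true
        System.out.println("Marshal accepted: " + hasJavaHeader(new byte[] {0x04, 0x08, 0x5B, 0x02})); // false
    }
}
```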
Another very fast way of language-agnostic serialization is protobuf. proto-files (meta-files that describe your data-structures) are compiled using protoc which writes IO-code for various target languages.
I'm using it in my app and did some benchmarking which might give you a clue if it serves your purpose.
The only downside I'm aware of is that protobuf does not handle references at all. If one of your objects contains the same object twice, it will be written twice instead of once plus a reference to the previous instance (which is what Java serialization does).
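A minimal round trip with the standard ObjectOutputStream/ObjectInputStream illustrates the reference behaviour of Java serialization described above: an object written twice is stored once plus a back-reference, so after deserialization both slots hold the same new instance. StringBuilder is used only because it is a mutable, Serializable JDK class; SharedReferenceDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class SharedReferenceDemo {
    /** Serializes obj to a byte array and reads it back as a deep copy. */
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder shared = new StringBuilder("log message");
        List<Object> events = new ArrayList<>();
        events.add(shared);
        events.add(shared); // the same instance added twice

        List<Object> copy = roundTrip(events);
        // The second occurrence was written as a back-reference to the first,
        // so both slots of the copy refer to one new StringBuilder.
        System.out.println(copy.get(0) == copy.get(1)); // true
        System.out.println(copy.get(0) == shared);      // false: it is a fresh copy
    }
}
```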
Concerning your original question, I agree with Thorbjørn that reading and writing of serialized Java objects will be too hard and error prone.
If you consider going the protobuf way, feel free to use this logging event protobuf file as a starter.
JSON is the best way to go for this kind of problem.
Log4cxx is a Log4j port to C++, perhaps you can glean some ideas from that or even use it directly?
JSON! JSON! JSON! JSON!

How can I use Sesame's RDFXMLParser in JRuby?

I am not very experienced in Java and JRuby but need to parse RDF data using Sesame's RDFXMLParser in JRuby, and my Python-minded brain just does not want to get into it. I have problems translating the Java example into JRuby. First, I don't know how to define the RDFHandler in a way that makes sense. I also don't understand why the parse method needs a Reader and a URI, as I only want to parse a local file.
I would highly appreciate example code in JRuby. Many thanks!
I can't help you with the JRuby-specific part of your question, but as for your confusion over how to invoke the parse() method: if you only want to parse a local file, you would normally do that by simply creating a java.io.FileInputStream object for your file and passing it to the parser.

Saving Java Object Graphs as XML file

What's the simplest-to-use technology available to save an arbitrary Java object graph as an XML file (and to be able to rehydrate the objects later)?
The easiest way here is to serialize the object graph.
Java 1.4 has built-in support for serialization as XML.
A solution I have used successfully is XStream (http://x-stream.github.io/)- it's a small library that will easily allow you to serialize and deserialize to and from XML.
The downside is that you have only very limited control over the resulting XML, though that control might not be necessary in your case.
Apache digester is fairly easy: http://commons.apache.org/digester/
JAXB is newer and comes with annotation goodness: https://jaxb.dev.java.net
XStream by the folks at Thoughtworks has a simple API and even deals with things like duplicate and circular references. It seems to be actively developed and is well documented.
http://x-stream.github.io/
Use java.beans.XMLEncoder. Its API is very simple (actually a little too simple; it'd be nice to wire it to a SAX ContentHandler), but it works on many graphs out of the box, and it's easy to create your own persistence delegate for any odd-ball classes you might encounter.
The syntax used by XMLDecoder allows you to invoke any method, instance or static, including constructors, so it's extremely flexible.
Other encoders name elements and attributes after class and field names, so there's no fixed schema for the result. The XMLEncoder's XML follows a simple DTD and can easily be validated or transformed, even when you've never seen the types it uses.
You can assign objects an identifier, and reference them throughout the graph.
You can refer to constants defined in classes or interfaces.
And, it's built into Java SE, so you don't need to ship an extra library.
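For the java.beans.XMLEncoder approach, a minimal round trip might look like the sketch below. It uses only JDK classes (the java.desktop module); the Person bean and the XmlEncoderDemo class are hypothetical. Because Carol is shared by Alice and Bob, the encoder writes her once with an id attribute and refers back to her with idref, so the decoded graph keeps a single Carol instance.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class XmlEncoderDemo {
    /** A plain JavaBean: public class, public no-arg constructor, getter/setter pairs. */
    public static class Person {
        private String name;
        private Person friend;
        public Person() {}
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Person getFriend() { return friend; }
        public void setFriend(Person friend) { this.friend = friend; }
    }

    static Person person(String name, Person friend) {
        Person p = new Person();
        p.setName(name);
        p.setFriend(friend);
        return p;
    }

    static byte[] toXml(Object graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(graph);
        } // close() writes the closing tags
        return bytes.toByteArray();
    }

    static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Person carol = person("Carol", null);
        List<Person> people = new ArrayList<>(List.of(person("Alice", carol), person("Bob", carol)));
        byte[] xml = toXml(people);
        System.out.println(new String(xml, StandardCharsets.UTF_8));

        @SuppressWarnings("unchecked")
        List<Person> copy = (List<Person>) fromXml(xml);
        // Carol was written once and referenced by idref, so identity survives the round trip.
        System.out.println(copy.get(0).getFriend() == copy.get(1).getFriend()); // true
    }
}
```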
Simple
Although XStream and JAXB can serialize some object graphs successfully, they cannot handle very complex graphs. The most powerful solution for large complex graphs is Simple XML Serialization. It can handle any graph. Also, it’s fast and simple to use without any dependencies.
To quote the Simple project page:
Simple is a high performance XML serialization and configuration framework for Java. Its goal is to provide an XML framework that enables rapid development of XML configuration and communication systems. This framework aids the development of XML systems with minimal effort and reduced errors. It offers full object serialization and deserialization, maintaining each reference encountered. In essence it is similar to C# XML serialization for the Java platform, but offers additional features for interception and manipulation.
The Simple API is, well, simple! It's really good. http://simple.sourceforge.net/
You can also use XStream: http://www.ibm.com/developerworks/library/x-xstream/index.html
JAXB was part of the standard Java SE APIs from Java 6 through Java 10 (since Java 11 it has to be added as a separate dependency) and is really easy to use.
If you need control over the XML that gets generated, I recommend taking a look at Betwixt (http://commons.apache.org/betwixt/) - it adds a lot of functionality to Apache's digester (Digester is good for building object graphs from XML, but is not so good for generating them).
If you really don't care about the XML that gets generated (just that it can be deserialized in the future), then the XMLEncoder/XMLDecoder classes built into Java are good, as long as the objects you are serializing follow the JavaBean specification. The biggest problem I've run into with the XMLEncoder/XMLDecoder solution is a bean that returns an immutable list for one of its properties: the encoder doesn't handle that situation very well.
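When a class does not follow the JavaBean conventions (no no-argument constructor, no setters), XMLEncoder needs a persistence delegate, as the XMLEncoder answer above mentions. A minimal sketch using the JDK's java.beans.DefaultPersistenceDelegate, whose String[] constructor tells the encoder to rebuild the object by calling a constructor with the values of the named properties; Point and PersistenceDelegateDemo are hypothetical names.

```java
import java.beans.DefaultPersistenceDelegate;
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class PersistenceDelegateDemo {
    /** Immutable value class: no no-arg constructor and no setters, so not a JavaBean. */
    public static final class Point {
        private final int x;
        private final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
        public int getX() { return x; }
        public int getY() { return y; }
    }

    static byte[] encode(Object graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            // Rebuild Point as new Point(getX(), getY()) instead of "new Point() + setters".
            encoder.setPersistenceDelegate(Point.class,
                    new DefaultPersistenceDelegate(new String[] {"x", "y"}));
            encoder.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Point copy = (Point) decode(encode(new Point(3, 4)));
        System.out.println(copy.getX() + "," + copy.getY()); // 3,4
    }
}
```

Without the setPersistenceDelegate call, the encoder cannot create a Point through a no-argument constructor and reports an error through its exception listener instead of encoding the object.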
If you need to control the structure of the XML, XStream is a good choice. You can use annotations to define precisely how the XML maps to your objects.
I'd second (or third) XStream. It reads and writes XML without needing any special binding configuration or placing lots of extraneous syntax in the XML.
I put together a list of many XML serialization libraries and their licenses.
XStream is very simple http://x-stream.github.io/
XStream is a simple library to serialize objects to XML and back again.
java.beans.XMLEncoder perhaps?
Jackson
The Jackson Project is a processing and binding library for XML, JSON, and some other formats.
… Jackson is a suite of data-processing tools for Java (and the JVM platform), including the flagship streaming JSON parser / generator library, matching data-binding library (POJOs to and from JSON) and additional data format modules to process data encoded in Avro, BSON, CBOR, CSV, Smile, (Java) Properties, Protobuf, XML or YAML; and even the large set of data format modules to support data types of widely used data types such as Guava, Joda, PCollections and many, many more…
If you are really only interested in serializing your objects to a file and then deserializing them later, then you might check out YAML instead of XML. YAML is much easier to work with than XML and the output files are very human-readable (which may or may not be a requirement). Check out yaml.org for more information. I've used JYAML successfully on a recent project.
