How to get structure of a Google Protobuf message without the definition


I need to get the structure of a protobuf message that is transferred to me without the message's definition. Using UnknownFieldSet methods, I was able to get a string representation of the message as below:
1: "a"
2: {
3:"b"
4:"c"
}
What data structure does field 2 represent? Using UnknownFieldSet.Field.getGroupList, I was able to get the contents of fields 3 and 4. Does that mean field 2 has the "deprecated" group structure?

If you posted the raw binary data, we could tell you, or you could look at the protocol buffer encoding documentation. A field with wire type 3 (start group) indicates a group; its contents end at a matching tag with wire type 4 (end group).
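To see the wire type for yourself, you can walk the raw bytes. The sketch below is a hypothetical, dependency-free scanner (the class WireScanner and its methods are illustrative, not part of the protobuf library) that lists the field number and wire type of each top-level field. It assumes well-formed input and does no bounds checking. Encoding the question's message both ways shows the difference: as a group, field 2 shows up with wire type 3 (SGROUP); as an embedded message, it shows up as length-delimited (LEN). As far as I know, Java's UnknownFieldSet puts the former into getGroupList() and keeps the latter as raw bytes in getLengthDelimitedList().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists "fieldNumber:wireType" for each top-level field of raw protobuf bytes.
public class WireScanner {
    static final String[] WIRE_TYPES = {"VARINT", "I64", "LEN", "SGROUP", "EGROUP", "I32", "?6", "?7"};

    // Reads a base-128 varint at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
    }

    // Skips one value; a group (wire type 3) is skipped up to its END_GROUP tag (wire type 4).
    static void skipValue(byte[] buf, int[] pos, int field, int wireType) {
        switch (wireType) {
            case 0: readVarint(buf, pos); break;
            case 1: pos[0] += 8; break;
            case 2: {
                int len = (int) readVarint(buf, pos);
                pos[0] += len;
                break;
            }
            case 3:
                while (true) {
                    long tag = readVarint(buf, pos);
                    int f = (int) (tag >>> 3), wt = (int) (tag & 7);
                    if (wt == 4) {
                        if (f != field) throw new IllegalArgumentException("mismatched END_GROUP");
                        return;
                    }
                    skipValue(buf, pos, f, wt);
                }
            case 5: pos[0] += 4; break;
            default: throw new IllegalArgumentException("unexpected wire type " + wireType);
        }
    }

    static List<String> describe(byte[] buf) {
        List<String> out = new ArrayList<>();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            long tag = readVarint(buf, pos);
            int field = (int) (tag >>> 3), wireType = (int) (tag & 7);
            out.add(field + ":" + WIRE_TYPES[wireType]);
            skipValue(buf, pos, field, wireType);
        }
        return out;
    }

    public static void main(String[] args) {
        // 1: "a", then field 2 encoded as a group containing 3: "b" and 4: "c".
        byte[] asGroup = {0x0A, 0x01, 'a', 0x13, 0x1A, 0x01, 'b', 0x22, 0x01, 'c', 0x14};
        // Same content, but field 2 encoded as an embedded (length-delimited) message.
        byte[] asMessage = {0x0A, 0x01, 'a', 0x12, 0x06, 0x1A, 0x01, 'b', 0x22, 0x01, 'c'};
        System.out.println(describe(asGroup));   // prints: [1:LEN, 2:SGROUP]
        System.out.println(describe(asMessage)); // prints: [1:LEN, 2:LEN]
    }
}
```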
I'm not as familiar with the UnknownFieldSet API as I probably should be, but it does sound like you're dealing with a group.
On the other hand, I'd expect most of the uses of groups to be internal to Google - where did this data come from? Admittedly there's nothing to stop people from using the deprecated group format instead of embedded messages, but I would hope that few are doing so...
Is there any reason you can't ask for the .proto file involved? While some information can certainly be gleaned from protocol buffers without their definitions, they're really designed to be used in situations where both ends do know the message format - although possibly different versions.

Related

Custom JAXB validation error messages

I am writing a small application which will be used for validating XML files, correcting them (if possible), and then performing tests on the contents. The end users will have very little knowledge of XML and parsing, so I want to catch the validation errors and write my own event and error handler that produces error messages that are hopefully easier for the end users to understand.
I based my initial attempt on the solution detailed in this blog post.
So far, I have classified the errors based on the contents of event.getMessage(). Unfortunately, without knowing all the types of parsing errors that may occur, it is more or less impossible to write good custom error messages. Is there a good way to find out what types of error messages can occur during validation?
I.e. I am looking for a listing of all messages, e.g. The content of element X is not complete one of ..., Invalid content was found starting with element Y..., Value Z is not facet-valid with respect to pattern...
Or is there some better way to do this?

The format and number of messages will depend on which XML Schema validator you've plugged into JAXB. If you're using one based on Xerces-J (like the one included in Oracle's JDK), most of the messages will be prefixed with an identifier corresponding to a validation rule/constraint in the XML Schema specification (such as cvc-maxLength-valid). The list of identifiers for the XML Schema validation rules is available here in the specification. The full list of XML Schema related error messages produced by Xerces can be found in its XMLSchemaMessages.properties message file, but keep in mind that this file has changed over time and its contents depend on which version of Xerces you're using.
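Building on that prefix, one option is to key your friendly messages on the rule identifier rather than on the full message text. The sketch below is self-contained, so it uses the JDK's javax.xml.validation API directly instead of JAXB; the assumption is that a JAXB Unmarshaller given the same Schema reports the same Xerces message strings through event.getMessage(). The class name, the sample schema, and the FRIENDLY texts are hypothetical, and any message whose identifier is not in the map falls back to the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class FriendlyValidation {
    // Hypothetical sample schema: <name> must be a string of at most 3 characters.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='name'><xs:simpleType>"
        + "<xs:restriction base='xs:string'><xs:maxLength value='3'/></xs:restriction>"
        + "</xs:simpleType></xs:element></xs:schema>";

    // Hypothetical end-user texts keyed by XML Schema rule identifiers.
    static final Map<String, String> FRIENDLY = Map.of(
        "cvc-maxLength-valid", "The value is longer than allowed.",
        "cvc-complex-type.2.4.a", "An element appears where it is not expected.");

    // Returns the leading "cvc-..." identifier of a Xerces message, or null if there is none.
    static String ruleId(String message) {
        if (message == null || !message.startsWith("cvc-")) return null;
        int colon = message.indexOf(':');
        return colon > 0 ? message.substring(0, colon) : null;
    }

    // Maps a raw message to a friendly one, falling back to the raw text.
    static String friendly(String message) {
        String id = ruleId(message);
        return id == null ? message : FRIENDLY.getOrDefault(id, message);
    }

    // Validates xml against XSD and collects one (possibly friendly) message per problem.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { problems.add(friendly(e.getMessage())); }
                @Override public void error(SAXParseException e) { problems.add(friendly(e.getMessage())); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("could not validate: " + e.getMessage(), e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<name>abc</name>"));     // prints: []
        System.out.println(validate("<name>toolong</name>")); // one or more messages
    }
}
```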

Read proto partly instead of full parsing in java

I have defined a proto file, for example:
option java_package = "proto.data";
message Data {
repeated string strs = 1;
repeated int32 ints = 2;
}
I receive this object from the network as an InputStream (or as bytes). Normally, I parse it with Data.parseFrom(stream) or Data.parseFrom(bytes) to get the object.
This way, I have to hold the full Data object in memory, while I only need to traverse all the string and integer values in it. That's bad when the object is big.
What should I do for this issue?

Unfortunately, there is no way to parse just part of a protobuf. If you want to be sure that you've seen all of the strs or all of the ints, you have to parse the entire message, since the values could appear in any order or even interleaved.
If you only care about memory usage and not CPU time, then you could, in theory, use a hand-written parser to parse the message and ignore the fields you don't care about. You still have to do the work of parsing, but you can discard values immediately rather than keeping them in memory. However, to do this you'd need to study the Protobuf wire format and write your own parser. You can use Protobuf's CodedInputStream class, but a lot of work still needs to be done manually. The Protobuf library really isn't designed for this.
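A minimal sketch of such a hand-written parser, using only the JDK rather than CodedInputStream so that it stays self-contained. It assumes the Data definition above (strs is field 1, ints is field 2 as int32), accepts ints in both unpacked and packed encoding (parsers are expected to handle both), skips unknown fields, and rejects groups. The class and method names are hypothetical. Each string is materialized one at a time and handed to a callback, so memory is bounded by the largest single field (or packed run) rather than the whole message.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

public class StreamingDataReader {
    // Decodes a base-128 varint whose first byte has already been read.
    static long readVarint(InputStream in, int firstByte) throws IOException {
        long result = 0;
        int b = firstByte;
        for (int shift = 0; ; shift += 7) {
            if (b == -1) throw new EOFException("truncated varint");
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            b = in.read();
        }
    }

    static long readVarint(InputStream in) throws IOException {
        return readVarint(in, in.read());
    }

    // Reads exactly len bytes: one field's payload, never the whole message.
    static byte[] readBytes(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        for (int off = 0; off < len; ) {
            int n = in.read(buf, off, len - off);
            if (n == -1) throw new EOFException("truncated field");
            off += n;
        }
        return buf;
    }

    // Skips n bytes of a field we do not care about.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() == -1) throw new EOFException("truncated field");
                skipped = 1;
            }
            n -= skipped;
        }
    }

    // Streams strs (field 1) and ints (field 2, int32) to callbacks, ignoring other fields.
    static void visit(InputStream in, Consumer<String> onStr, IntConsumer onInt) throws IOException {
        int first;
        while ((first = in.read()) != -1) {
            long tag = readVarint(in, first);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (field == 1 && wireType == 2) {
                onStr.accept(new String(readBytes(in, (int) readVarint(in)), StandardCharsets.UTF_8));
            } else if (field == 2 && wireType == 0) {
                onInt.accept((int) readVarint(in)); // unpacked int32
            } else if (field == 2 && wireType == 2) {
                // packed int32: a length-prefixed run of varints
                InputStream packed = new ByteArrayInputStream(readBytes(in, (int) readVarint(in)));
                int b;
                while ((b = packed.read()) != -1) onInt.accept((int) readVarint(packed, b));
            } else {
                switch (wireType) {
                    case 0: readVarint(in); break;
                    case 1: skipFully(in, 8); break;
                    case 2: skipFully(in, readVarint(in)); break;
                    case 5: skipFully(in, 4); break;
                    default: throw new IOException("unsupported wire type " + wireType); // groups etc.
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // strs: "hi", "x"; ints: 5, 300 (unpacked encoding)
        byte[] data = {0x0A, 0x02, 'h', 'i', 0x10, 0x05, 0x0A, 0x01, 'x', 0x10, (byte) 0xAC, 0x02};
        // prints, one per line: str hi, int 5, str x, int 300
        visit(new ByteArrayInputStream(data), s -> System.out.println("str " + s), i -> System.out.println("int " + i));
    }
}
```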
If you are willing to consider using a different protocol framework, Cap'n Proto is extremely similar in design to Protobufs but features the ability to read only the part of the message you care about. Cap'n Proto incurs no overhead for the fields you don't examine, other than, obviously, the bandwidth and memory to receive the raw message bytes. If you are reading from a file and you use memory mapping (MappedByteBuffer in Java), then only the parts of the message you actually use will be read from disk.
(Disclosure: I am the author of most of Google Protobufs v2 (the version you are probably using) as well as Cap'n Proto.)

Hmm. It appears that it may be already implemented but not adequately documented.
Have you tested it?
See this discussion:
https://groups.google.com/forum/#!topic/protobuf/7vTGDHe0ZyM
See also, sample test code in google's github:
https://github.com/google/protobuf/blob/4644f99d1af4250dec95339be6a13e149787ab33/java/src/test/java/com/google/protobuf/lazy_fields_lite.proto

How to get HL7 2.6 ORC-21 in HAPI 2.1

I am trying to get the value in ORC-21:
//--------------
ORC orcObj = messageObj.getCOMMON_ORDER().getORC();
String result = orcObj.getOrc21_OrderingFacilityName(0).getOrganizationName().getValue();
//--------------
But it turns out that I have to put the ORC field between PID and FT1, as a "global ORC". Otherwise, the return is null.
Does anyone know how to fix this? I am using PipeParser.

The HL7 standard is mainly based on the correct order of segments in a message and of fields in a segment.
What type of message are you trying to parse? Chances are that the HL7 standard specifies the segment order for that message and HAPI's object model follows the standard precisely. Any non-standard segments, or segments in an unexpected order, are ignored by the parser.
If you are dealing with a third-party source of messages and you have no way of making the input messages standards-compliant, then you might have to modify the existing HAPI message types to accept your own order of segments. A simple example is available on the HAPI website; have a look! Based on this example, I added custom Z-segment mappings in an application I developed recently. It might help you too!

Thrift - converting from simple JSON

I created the following Thrift Object:
struct Student{
1: string id;
2: string firstName;
3: string lastName
}
Now I would like to read this object from JSON. According to this post, this is possible.
So I wrote the following code:
String json = "{\"id\":\"aaa\",\"firstName\":\"Danny\",\"lastName\":\"Lesnik\"}";
StudentThriftObject s = new StudentThriftObject();
byte[] jsonAsByte = json.getBytes("UTF-8");
TMemoryBuffer memBuffer = new TMemoryBuffer(jsonAsByte.length);
memBuffer.write(jsonAsByte);
TProtocol proto = new TJSONProtocol(memBuffer);
s.read(proto);
What I'm getting is the following exception:
Exception in thread "main" org.apache.thrift.protocol.TProtocolException: Unexpected character:i
at org.apache.thrift.protocol.TJSONProtocol.readJSONSyntaxChar(TJSONProtocol.java:322)
at org.apache.thrift.protocol.TJSONProtocol.readJSONInteger(TJSONProtocol.java:698)
at org.apache.thrift.protocol.TJSONProtocol.readFieldBegin(TJSONProtocol.java:837)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:486)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:479)
at com.vanilla.thrift.example.entities.StudentThriftObject.read(StudentThriftObject.java:413)
at com.vanilla.thrift.controller.Main.main(Main.java:24)
Am I missing something?

You are missing the fact that Thrift's JSON is different from yours. The field names are not written; instead, the assigned field ID numbers are written (and expected). Here's an example of Thrift's JSON protocol:
[1,"MyService",2,1,{"1":{"rec":{"1":{"str":"Error: Process() failed"}}}}]
In other words, Thrift is not intended to parse just any kind of JSON. It supports one very specific JSON format as one of its possible protocols.
However, depending on what the origin of your JSON data is, Thrift can possibly still help you out, if you are able to use it on both sides. In that case, write an IDL to describe the data structures, feed it to the Thrift compiler, and integrate both the generated code and the necessary parts of the library into your projects.
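For the Student struct in the question, this convention means the plain JSON would need to be rewritten with field IDs as keys and a type tag around each value. The sketch below is a hypothetical helper (not part of the Thrift library) that builds that shape for structs whose fields are all strings, using the IDs 1 to 3 from the IDL. It assumes TJSONProtocol tags string values with "str", as the example above suggests; other field types use other tags, which this sketch does not handle, and escaping covers only quotes and backslashes. The result should then be readable by the TMemoryBuffer/TJSONProtocol code from the question, though that step is untested here.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThriftJsonShape {
    // Hypothetical helper: rewrites name -> value pairs into Thrift-JSON struct form,
    // keyed by field ID, for structs whose fields are all strings.
    static String toThriftJson(Map<String, String> values, Map<String, Integer> fieldIds) {
        TreeMap<Integer, String> byId = new TreeMap<>(); // ordered by field ID
        for (Map.Entry<String, String> e : values.entrySet()) {
            Integer id = fieldIds.get(e.getKey());
            if (id == null) throw new IllegalArgumentException("no field ID for " + e.getKey());
            byId.put(id, e.getValue());
        }
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<Integer, String> e : byId.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":{\"str\":\"")
              .append(escape(e.getValue())).append("\"}");
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping: quotes and backslashes only (control characters not handled).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Integer> studentIds = Map.of("id", 1, "firstName", 2, "lastName", 3);
        Map<String, String> student = Map.of("id", "aaa", "firstName", "Danny", "lastName", "Lesnik");
        System.out.println(toThriftJson(student, studentIds));
        // prints: {"1":{"str":"aaa"},"2":{"str":"Danny"},"3":{"str":"Lesnik"}}
    }
}
```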
If the origin of the JSON lies outside of your reach, or if the JSON format cannot be changed for some reason, you need to find another way.
Format and semantics are different beasts
To some extent, the whole issue can be compared with XML: there is one general XML syntax, which tells us how we have to format things so that any standards-conformant XML processor can read them.
But knowing the rules of XML is only half the answer, if we get a certain XML file from someone. Even if our XML parser can read the file successfully, because it is well-formed XML, we need to know the semantics of the data to really make use of what's within that file: Is it a customer data record? Or is it a SOAP envelope? Maybe a configuration file?
That is where DTDs or XML Schema come into play, they exist to describe the contents of the XML data. Without knowing the logical structure you are lost, because there are myriads of possible ways to express things in XML. And exactly the same is true with JSON, except that JSON schema descriptions are less commonly used.
"So you mean, we need just a way to tell Thrift how the JSON is organized?"
No, because the purpose and idea behind Thrift is to have a framework to de/serialize things and/or implement RPC servers and clients as efficiently as possible. It is not intended to have a general purpose file parser. Instead, Thrift reads and speaks only its own set of formats, which are plugged into the architecture as protocols: Thrift Binary, Thrift JSON, Thrift Compact, and a few more.
What you could do: in addition to what I said in the first section of my answer, you may consider writing your own custom Thrift protocol implementation to support your particular JSON format. It is not that hard, and it is worth a try.

Best file format regarding standard string and integer data?

For my project, I need to store information about protocols (the data sent, most likely integers, and the order in which it is sent) and information that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This information will be read by a Java program and stored in memory for processing, but I don't know what the most sensible format for storing this data would be.
EDIT: Here's some extra information:
1) I will be using this data in a game server.
2) Although it is a game server, speed is not the primary concern, since this data will primarily be read and used during startup, which shouldn't happen very often.
3) However, I would like to keep memory consumption to a minimum.
4) The second data "example" will be used as a "dictionary" to look up the names of specific in-game items, their stats, and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only describe a small piece of a protocol, like a login protocol for instance).
5) And yes, I would like the data to be "human-editable".
EDIT 2: Here are the choices I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries

There are many factors that could come to weigh--here are things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
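For the dictionary files, a delimited format along the lines of point 3 might be read like this. It is a minimal sketch assuming one row per item in the form id,name,stat1,stat2,... (a hypothetical layout based on the 'ID' 'STRING' 'ADDITIONAL INTEGER DATA' example); it supports double-quoted fields with "" as an escaped quote, but not line breaks inside quoted fields. The class and method names are illustrative, and a dedicated CSV library would be the safer choice for anything more elaborate.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvDictionary {
    // One dictionary row: a name plus any number of integer stats (hypothetical layout).
    static final class Item {
        final String name;
        final int[] stats;
        Item(String name, int[] stats) { this.name = name; this.stats = stats; }
    }

    // Splits one CSV line; a field may be double-quoted, with "" meaning a literal quote.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') cur.append(c);
                else if (i + 1 < line.length() && line.charAt(i + 1) == '"') { cur.append('"'); i++; }
                else inQuotes = false;
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Loads "id,name,stat1,stat2,..." rows into a map keyed by id, skipping blank lines.
    static Map<Integer, Item> load(BufferedReader reader) {
        Map<Integer, Item> items = new HashMap<>();
        reader.lines().filter(l -> !l.trim().isEmpty()).forEach(line -> {
            List<String> f = splitCsvLine(line);
            if (f.size() < 2) throw new IllegalArgumentException("expected id,name[,stats...]: " + line);
            int[] stats = new int[f.size() - 2];
            for (int i = 0; i < stats.length; i++) stats[i] = Integer.parseInt(f.get(i + 2).trim());
            items.put(Integer.parseInt(f.get(0).trim()), new Item(f.get(1), stats));
        });
        return items;
    }

    public static void main(String[] args) {
        String csv = "1,Potion,50\n2,\"Sword, rusty\",12,3\n";
        Map<Integer, Item> items = load(new BufferedReader(new StringReader(csv)));
        Item sword = items.get(2);
        System.out.println(sword.name + " " + Arrays.toString(sword.stats)); // prints: Sword, rusty [12, 3]
    }
}
```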
If you could post more info about your situation and how important these factors are, we might be able to guide you further.

How about XML, JSON, or CSV?

I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages/network packages/fields, etc. The order of fields is well defined, and so on.
I even wrote a code generator in XSLT that generated the message sending/receiving classes, with methods for each message type.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.

In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
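A minimal sketch of the properties approach, using a hypothetical key scheme item.<id>.<attribute>. Only java.util.Properties and its load, storeToXML, and loadFromXML methods come from the JDK; the class name and the itemNames and viaXml helpers are illustrative. Non-numeric IDs would make Integer.parseInt throw.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

public class PropertiesDictionary {
    static final String PREFIX = "item.", SUFFIX = ".name";

    // Collects "item.<id>.name=<name>" entries into an id -> name map (hypothetical key scheme).
    static SortedMap<Integer, String> itemNames(Properties props) {
        SortedMap<Integer, String> names = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.length() > PREFIX.length() + SUFFIX.length()
                    && key.startsWith(PREFIX) && key.endsWith(SUFFIX)) {
                String id = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                names.put(Integer.parseInt(id), props.getProperty(key));
            }
        }
        return names;
    }

    // Round-trips the properties through the built-in XML format (storeToXML/loadFromXML).
    static Properties viaXml(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "item dictionary");
        Properties copy = new Properties();
        copy.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("item.7.name=Sword\nitem.7.attack=12\nitem.2.name=Potion\n"));
        System.out.println(itemNames(props));                                    // prints: {2=Potion, 7=Sword}
        System.out.println(Integer.parseInt(props.getProperty("item.7.attack"))); // prints: 12
        System.out.println(itemNames(viaXml(props)));                            // prints: {2=Potion, 7=Sword}
    }
}
```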
