The string at the bottom of this post is the serialization of a java.util.GregorianCalendar object in Java. I am hoping to parse it in Python.
I figured I could approach this problem with a combination of regexps and key=val splitting, i.e. something along the lines of:
import re
text_inside_brackets = re.search(r"\[(.*)\]", text).group(1)
and
from parse import parse
my_dict = {}
for item in text_inside_brackets.split(','):
    result = parse('{key}={value}', item)
    if result is not None:  # skip fragments that are not key=value
        my_dict[result['key']] = result['value']
My question is: What would be a more principled / robust approach to do this? Are there any Python parsers for serialized Java objects that I could use for this problem? (do such things exist?). What other alternatives do I have?
My hope is to ultimately parse this into JSON or nested Python dictionaries, so that I can manipulate it in any way I want.
Note: I would prefer to avoid a solution that relies on Py4J, mostly because it requires setting up a server and a client, and I am hoping to do this within a single Python script.
java.util.GregorianCalendar[time=1413172803113,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/New_York",offset=-18000000,dstSavings=3600000,useDaylight=true,transitions=235,lastRule=java.util.SimpleTimeZone[id=America/New_York,offset=-18000000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=9,WEEK_OF_YEAR=42,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=286,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=3,MILLISECOND=113,ZONE_OFFSET=-18000000,DST_OFFSET=3600000]
The serialized form of a GregorianCalendar object contains quite a lot of redundancy. In fact, there are only two fields that matter, if you want to reconstitute it:
the time
the timezone
There is code for extracting these in "How to convert Gregorian string to Gregorian Calendar?"
If you want a more principled and robust approach, I echo mbatchkarov's suggestion to use JSON.
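For the two fields mentioned above, a minimal Python sketch could look like the following. It assumes the input always has the shape of the string in the question (a top-level `time=` in epoch milliseconds and a `zone=...[id=...]` entry); the function name `parse_calendar_string` and the returned dictionary keys are made up for illustration, and this is pattern matching on `toString()` output, not a general parser for Java objects.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_calendar_string(text):
    """Pull the epoch milliseconds and the zone id out of the calendar string."""
    # The lowercase top-level "time=" comes first; "startTime=" etc. do not match.
    time_match = re.search(r"\btime=(-?\d+)", text)
    # The zone id may or may not be wrapped in double quotes.
    zone_match = re.search(r'\bzone=[\w.$]+\[id="?([^",\]]+)"?', text)
    if time_match is None or zone_match is None:
        raise ValueError("unrecognised GregorianCalendar string")
    millis = int(time_match.group(1))
    return {
        "time_ms": millis,
        "zone_id": zone_match.group(1),
        # Integer arithmetic avoids float rounding of the millisecond part.
        "utc": EPOCH + timedelta(milliseconds=millis),
    }

sample = ('java.util.GregorianCalendar[time=1413172803113,areFieldsSet=true,'
          'zone=sun.util.calendar.ZoneInfo[id="America/New_York",'
          'offset=-18000000],firstDayOfWeek=1]')
info = parse_calendar_string(sample)
print(info["zone_id"], info["utc"].isoformat())
```

The resulting dictionary can be passed to `json.dumps` once the `utc` value is converted to a string.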
Related
I'm building an Android application which interacts with a REST API built on .NET.
If my table in SQL Server has 2 rows with the following datetime values:
2019-01-01 00:00:00.000
2019-01-01 00:00:00.113
Then the returned json will have the following values:
2019-01-01T00:00:00
2019-01-01T00:00:00.113
So I don't know how to provide the pattern for setDateFormat when creating an instance of Gson.
If I use GsonBuilder().setDateFormat("yyyy-MM-dd'T'HH:mm:ss"), then Gson can parse both cases but it loses the millisecond part.
If I use GsonBuilder().setDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS"), then Gson keeps the millisecond part in the second case but throws an exception on the first case.
How can I parse the time in both cases and still keep the milliseconds? Any solution would be appreciated. I don't mind whether the change has to be made on the server side or the client side.
I found a solution myself, thanks to peter.petrov.
My API is configured in WebApiConfig.cs to return data as JSON rather than XML, so I assumed I couldn't control how it formats times. It turns out I can. This is my WebApiConfig.cs file:
var json = config.Formatters.JsonFormatter;
json.SerializerSettings.PreserveReferencesHandling = Newtonsoft.Json.PreserveReferencesHandling.Objects;
json.SerializerSettings.DateFormatString = "yyyy-MM-dd'T'HH:mm:ss.fff";
config.Formatters.Remove(config.Formatters.XmlFormatter);
Now my server always includes the milliseconds, no matter what.
That's it.
I have a big .pm file, which consists only of a very big Perl hash with lots of subhashes. I have to load this hash into a Java program, do some work and make changes to the underlying data, and save it back into a .pm file that should look similar to the one I started with.
So far, I have tried converting it line by line with regexes and string matching into an XML document, and later parsing it element by element back into a Perl hash.
This somehow works, but seems quite dodgy. Is there any more reliable way to parse the perl hash without having a perl runtime installed?
You're quite right, it's utterly filthy. Regex and string for XML in the first place is a horrible idea, and honestly XML is probably not a good fit for this anyway.
I would suggest that you consider JSON. I would be stunned to find Java can't handle JSON, and it's inherently a hash-and-array oriented data structure.
So you can quite literally:
use JSON;
print to_json ( $data_structure, { pretty => 1 } );
Note - it won't work for serialising objects, but for perl hash/array/scalar type structures it'll work just fine.
You can then import it back into perl using:
use Data::Dumper;
my $new_data = from_json $string;
print Dumper $new_data;
You could then Dumper it to a file, but given that your requirement is multi-language going forward, just using native JSON as your 'at rest' data is probably a more sensible choice.
But if you're looking at parsing perl code within java, without a perl interpreter? No, that's just insanity.
Question:
Instead of writing my own serialization algorithm, would it be possible to just use the built-in Java serialization, like I have done below, while still having it work across multiple languages?
Explanation:
How I imagine it working is as follows: I start up a process that will be a language-specific program, written in that language. So I'd have a CppExecutor.exe file, for example. I would write data to a stream to this program. The program would then do what it needs to do, then return a result.
To do this, I would need to serialize the data in some way. The first thing that came to mind was the basic Java Serialization with the use of an ObjectInputStream and ObjectOutputStream. Most of what I have read has only stated that Java serialization is for Java-to-Java applications.
None of the data will ever need to be stored in a file. The method of transferring these packets would be through a java.lang.Process, which I have set up already.
The data will be composed of the following:
String - Mostly containing information that is displayed to the user.
Integer - most likely 32-bit. Won't need to deal with times.
Float - just to handle all floating-point values.
Character - to ensure proper types are used.
Array - Composed of any of the elements in this list.
The best way I have worked out how to do this is as follows: I would start with a 4-byte magic number, just to ensure we are working with the correct data. Next, I would have an integer specifying how many elements there are. After that, for each of the elements I would have a single byte signifying the data type (one of the above), followed by any crucial information, e.g. the length for a String or Array, and then the data itself.
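The layout described above can be sketched in a few lines. This is only an illustration written in Python with the standard `struct` module, under several assumptions that are not in the question: the magic value `PKT1`, the numeric type tags, big-endian byte order (the order Java's `DataOutputStream` uses), 64-bit floats, and UTF-8 strings with a 32-bit length prefix. The Character type is left out because Python has no separate character type.

```python
import struct

MAGIC = b"PKT1"  # hypothetical 4-byte magic number
T_STRING, T_INT, T_FLOAT, T_ARRAY = 1, 2, 3, 4  # hypothetical type tags

def encode_value(value):
    """Encode one element as: type byte, optional length, payload."""
    if isinstance(value, bool):
        raise TypeError("booleans are not part of this format")
    if isinstance(value, int):
        return struct.pack(">Bi", T_INT, value)  # 32-bit signed integer
    if isinstance(value, float):
        return struct.pack(">Bd", T_FLOAT, value)  # 64-bit float (assumption)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">BI", T_STRING, len(data)) + data
    if isinstance(value, list):
        body = b"".join(encode_value(v) for v in value)
        return struct.pack(">BI", T_ARRAY, len(value)) + body
    raise TypeError("unsupported type: %r" % type(value))

def decode_value(buf, pos):
    """Decode one element starting at pos; return (value, next position)."""
    tag = buf[pos]
    pos += 1
    if tag == T_INT:
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if tag == T_FLOAT:
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if tag in (T_STRING, T_ARRAY):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if tag == T_STRING:
            return buf[pos:pos + length].decode("utf-8"), pos + length
        items = []
        for _ in range(length):
            item, pos = decode_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("unknown type tag %d" % tag)

def encode_packet(values):
    """Magic number, element count, then each encoded element."""
    body = b"".join(encode_value(v) for v in values)
    return MAGIC + struct.pack(">I", len(values)) + body

def decode_packet(buf):
    if buf[:4] != MAGIC:
        raise ValueError("bad magic number")
    (count,) = struct.unpack_from(">I", buf, 4)
    pos, values = 8, []
    for _ in range(count):
        value, pos = decode_value(buf, pos)
        values.append(value)
    return values

packet = encode_packet(["hello", 42, 1.5, [1, "x", 2.25]])
assert decode_packet(packet) == ["hello", 42, 1.5, [1, "x", 2.25]]
```

Because every element carries its own tag and length, the reader on the other side of the process pipe never needs to know the packet layout in advance; the cost is one extra byte per element.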
Side-notes:
I would also like to point out that a lot of these calculations will be taking place, where every millisecond could matter. Because of this, a text-based format (such as JSON) may produce far larger operation times. Considering that none of the packets would need to be interpreted by a human, using only bytes wouldn't be an issue.
I'd recommend Google protobuf: it is binary, stable, proven, and has bindings for all languages you've mentioned. Moreover, it also handles structured data nicely.
There is a binary JSON format called BSON.
I would also like to point out that a lot of these calculations will be taking place, so a text-based format (such as JSON) may produce far larger operation times.
Do not optimize before you have measured.
Premature optimization is the root of all evil.
Can you have a try and benchmark the throughput? See if it fits your needs?
Thrift, Protobuf, JSON, MessagePack:
Complexity of installation: Thrift >> Protobuf > BSON > MessagePack > JSON
Serialized data size: JSON > MessagePack > Binary Thrift > Compact Thrift > Protobuf
Time cost: Compact Thrift > Binary Thrift > Protobuf > JSON > MessagePack
How would I parse this JSON array in Java? I'm confused because there is no object. Thanks!
EDIT: I'm an idiot! I should have read the documentation... that's probably what it's there for...
[
{
"id":"63565",
"name":"Buca di Beppo",
"user":null,
"phone":"(408)377-7722",
"address":"1875 S Bascom Ave Campbell, California, United States",
"gps_lat":"37.28967000",
"gps_long":"-121.93179700",
"monhh":"",
"tuehh":"",
"wedhh":"",
"thuhh":"",
"frihh":"",
"sathh":"",
"sunhh":"",
"monhrs":"",
"tuehrs":"",
"wedhrs":"",
"thuhrs":"",
"frihrs":"",
"sathrs":"",
"sunhrs":"",
"monspecials":"",
"tuespecials":"",
"wedspecials":"",
"thuspecials":"",
"frispecials":"",
"satspecials":"",
"sunspecials":"",
"description":"",
"source":"ripper",
"worldsbarsname":"BucadiBeppo31",
"url":"www.bucadebeppo.com",
"maybeDupe":"no",
"coupontext":"",
"couponimage":"0",
"distance":"1.00317",
"images":[
0
]
}
]
It is perfectly valid JSON. It is an array containing one object.
In JSON, arrays and objects don't have names. Only attributes of objects have names.
This is all described clearly by the JSON syntax diagrams at http://json.org. (FWIW, the site has translations in a number of languages ...)
How do you parse it? There are many libraries for parsing JSON. Many of them are linked from the site above. I suggest you use one of those rather than writing your own parsing code.
In response to this comment:
OTOH, writing your own parser is a reasonable project, and a good exercise for both learning JSON and learning Java (or whatever language). A reasonable parser can be written in about 500 lines of text.
In my opinion (having written MANY parsers in my time), writing a parser for a language is a very inefficient way to gain a working understanding of the syntax of a language. And depending on how you implement the parser (and the nature of the language syntax specification) you can easily get an incorrect understanding.
A better approach is to read the language's syntax specification, which the OP has now done, and which you would have to do in order to implement a parser.
Writing a parser can be a good learning exercise, but it is really a learning exercise in writing parsers. Even then, you need to pick an appropriate implementation approach, and an appropriate language to be parsed.
It's an array containing one element. That element is an object. The object (dictionary) contains about 20 name/value pairs.
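The structure is easy to see once a library has parsed it. The question is about Java, but as a short illustration here is the same idea with Python's standard `json` module, using a shortened copy of the data above (only four of the attributes are kept): the top-level value comes back as a list, and its single element is a dictionary.

```python
import json

# Shortened copy of the array from the question.
text = '[{"id": "63565", "name": "Buca di Beppo", "user": null, "images": [0]}]'

records = json.loads(text)  # the top-level JSON array becomes a list
first = records[0]          # its only element is an object (a dict)

print(type(records).__name__, len(records))
print(first["name"], first["user"], first["images"])
```

JSON libraries for Java follow the same pattern: parse the text as an array, then read objects out of it by index.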
Specifically I am converting a python script into a java helper method. Here is a snippet (slightly modified for simplicity).
import sys

# hash of values
vals = {}
vals['a'] = 'a'
vals['b'] = 'b'
vals['1'] = 1
output = sys.stdout
file = open(filename).read()
print >>output, file % vals,
So in the file there are %(a), %(b), %(1) etc that I want substituted with the hash keys. I perused the API but couldn't find anything. Did I miss it or does something like this not exist in the Java API?
You can't do this directly without some additional templating library. I recommend StringTemplate. Very lightweight, easy to use, and very optimized and robust.
I doubt you'll find a pure Java solution that'll do exactly what you want out of the box.
With this in mind, the best answer depends on the complexity and variety of Python formatting strings that appear in your file:
If they're simple and not varied, the easiest way might be to code something up yourself.
If the opposite is true, one way to get the result you want with little work is by embedding Jython into your Java program. This will enable you to use Python's string formatting operator (%) directly. What's more, you'll be able to give it a Java Map as if it were a Python dictionary (vals in your code).
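For the simple case, "coding something up yourself" amounts to one regular-expression substitution. The sketch below shows the logic in Python so it can be checked against the built-in `%` operator; the same approach ports to Java's `java.util.regex` classes. It assumes the file only uses placeholders of the form `%(name)s`; other conversion types and the `%%` escape are not handled, and the name `substitute` is made up for illustration.

```python
import re

# Matches %(name)s and captures the key between the parentheses.
PLACEHOLDER = re.compile(r"%\((\w+)\)s")

def substitute(template, values):
    """Replace each %(key)s with str(values[key]); unknown keys raise KeyError."""
    def replace(match):
        return str(values[match.group(1)])
    return PLACEHOLDER.sub(replace, template)

vals = {"a": "a", "b": "b", "1": 1}
template = "first=%(a)s second=%(b)s third=%(1)s"

# For this restricted placeholder form the result matches Python's own operator.
assert substitute(template, vals) == template % vals
print(substitute(template, vals))
```

Once the file needs width or precision specifiers or other conversion types, this approach stops being simple and a templating library or Jython becomes the better option.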