I want to read a Python dictionary string using Java. Example string:
{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}
This is not valid JSON. I want to convert it into proper JSON using Java code.
Well, the best way would be to pass it through a Python script that reads that data and outputs valid JSON:
>>> import ast, json
>>> json.dumps(ast.literal_eval("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"))
'{"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}'
so you could create a script that only contains:
import sys, json, ast; print(json.dumps(ast.literal_eval(sys.argv[1])))
then you can make it a Python one-liner like so:
python -c "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))" "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"
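that you can run from your shell, meaning you can run it from within Java the same way:
String PythonData = "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}";
String[] cmd = {
    "python", "-c", "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))",
    PythonData
};
// exec() and readLine() can throw IOException; the JSON arrives on the process's standard output
Process process = Runtime.getRuntime().exec(cmd);
BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
String json = reader.readLine();
and the line read from the process's standard output is a proper JSON string.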
This solution is the most reliable way I can think of, as it's going to parse safely any python syntax without issue (as it's using the python parser to do so), without opening a window for code injection.
But I wouldn't recommend using it, because you'd be spawning a python process for each string you parse, which would be a performance killer.
As an improvement on top of that first answer, you could use Jython to run that Python code inside the JVM for a bit more performance.
PythonInterpreter interpreter = new PythonInterpreter();
// exec() runs statements (imports, assignments); eval() only accepts expressions
interpreter.exec("import ast, json");
interpreter.exec("to_json = lambda d: json.dumps(ast.literal_eval(d))");
PyObject toJson = interpreter.get("to_json");
PyObject result = toJson.__call__(new PyString(PythonData));
String realResult = (String) result.__tojava__(String.class);
The above is untested (so it's likely to fail and spawn dragons 👹) and I'm pretty sure you can make it more elegant. It's loosely adapted from this answer. I'll leave up to you as an exercise to see how you can include the jython environment in your Java runtime ☺.
P.S.: Another solution would be to try and fix every pattern you can think of using a gigantic regexp or multiple ones. But even if on simpler cases that might work, I would advise against that, because regex is the wrong tool for the job, as it won't be expressive enough and you'll never be comprehensive. It's only a good way to plant a seed for a bug that'll kill you at some point in the future.
P.S.2: Whenever you need to parse code from an external source, always make sure that data is sanitized and safe. Never forget about little Bobby Tables.
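To illustrate the code injection point made above, here is a minimal Python sketch. The helper name is_plain_literal is made up for this example. It only shows that ast.literal_eval accepts literals (dicts, lists, strings, numbers, booleans, None) and refuses anything that would execute code, such as a function call.

```python
import ast


def is_plain_literal(text):
    """Return True if text is a Python literal that ast.literal_eval accepts."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        # ValueError: well-formed Python that is not a literal (e.g. a call).
        # SyntaxError: not well-formed Python at all.
        return False


if __name__ == "__main__":
    print(is_plain_literal("{'name': u'Shivam', 'isMale': True}"))
    print(is_plain_literal("__import__('os').system('echo pwned')"))
```

The second input is rejected without ever being run, which is why this route is safer than eval.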
In addition to the other answer: it is straightforward to simply invoke that Python one-liner to "translate" a python-dict-string into a standard JSON string.
But starting a new process for each row in your database might quickly turn into a performance killer.
Thus there are two options that you should consider on top of that:
establish some small "python server" that keeps running; its only job is to do that translation for JVMs that can connect to it
you can look into Jython. Meaning: simply enable your JVM to run Python code. In other words: instead of writing your own python-dict-string parser, you add "python powers" to your JVM and rely on existing components to do that translation for you.
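As a rough sketch of the first option, assuming the simplest possible "server": one long-running Python process that reads one dict string per line and writes one JSON line back. The function names to_json and serve are made up for this example, and a real setup might use a socket instead of standard input/output.

```python
import ast
import json
import sys


def to_json(python_literal):
    """Convert one Python literal (e.g. a dict repr) to a JSON string."""
    return json.dumps(ast.literal_eval(python_literal))


def serve(lines=sys.stdin, out=sys.stdout):
    """Translate one literal per input line until the input is closed."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            out.write(to_json(line) + "\n")
        except (ValueError, SyntaxError, TypeError) as exc:
            # Answer with a JSON error object so the caller still gets one line back.
            out.write(json.dumps({"error": str(exc)}) + "\n")
        out.flush()  # the caller is blocked waiting for this line
```

Calling serve() at the bottom of the script turns it into the long-running process. The Java side would then start it once, keep the process's streams open, and write a line and read a line per row instead of spawning a process each time.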
Related
I have a big .pm file which consists only of one very big Perl hash with lots of subhashes. I have to load this hash into a Java program, do some work and make changes to the data, and save it back into a .pm file that looks similar to the one I started with.
So far, I have tried converting it line by line with regexes and string matching into an XML document, and later parsing it element by element back into a Perl hash.
This somehow works, but seems quite dodgy. Is there a more reliable way to parse the Perl hash without having a Perl runtime installed?
You're quite right, it's utterly filthy. Regex and string for XML in the first place is a horrible idea, and honestly XML is probably not a good fit for this anyway.
I would suggest that you consider JSON. I would be stunned to find Java can't handle JSON, and it's inherently a hash-and-array oriented data structure.
So you can quite literally:
use JSON;
print to_json ( $data_structure, { pretty => 1 } );
Note - it won't work for serialising objects, but for perl hash/array/scalar type structures it'll work just fine.
You can then import it back into perl using:
use Data::Dumper;
my $new_data = from_json $string;
print Dumper $new_data;
You could Dumper it back to a file, but given that your requirement is multi-language going forward, using native JSON as your 'at rest' data format is probably the more sensible choice.
But if you're looking at parsing perl code within java, without a perl interpreter? No, that's just insanity.
I've been creating a simple parser combinator library in Java, and for a first attempt I'm using programmatic structures to define both the tokens and the parser grammar, see below:
final Combinator c2 = new CombinatorBuilder()
/*
.addParser("SEXPRESSION", of(Option.of(new Terminal("LPAREN"), zeroOrMore(new ParserPlaceholder("EXPRESSION")), new Terminal("RPAREN"))))
.addParser("QEXPRESSION", of(Option.of(new Terminal("LBRACE"), zeroOrMore(new ParserPlaceholder("EXPRESSION")), new Terminal("RBRACE"))))
*/
.addParser("SEXPRESSION", of(Option.of(new Terminal("LPAREN"), new ParserPlaceholder("EXPRESSION"), new Terminal("RPAREN"))))
.addParser("QEXPRESSION", of(Option.of(new Terminal("LBRACE"), new ParserPlaceholder("EXPRESSION"), new Terminal("RBRACE"))))
.addParser("EXPRESSION", of(
Option.of(new Terminal("NUMBER")),
Option.of(new Terminal("SYMBOL")),
Option.of(new Terminal("STRING")),
Option.of(new Terminal("COMMENT")),
Option.of(new ParserPlaceholder("SEXPRESSION")),
Option.of(new ParserPlaceholder("QEXPRESSION"))
)).build();
If I take the first parser, "SEXPRESSION", defined using the builder, I can explain the structure:
Parameters to addParser:
Name of parser
an ImmutableList of disjunctive Options
Parameters to Option.of:
An array of Elements where each element is either a Terminal, or a ParserPlaceholder which is later substituted for the actual Parser where the names match.
The idea is to be able to reference one Parser from another and thus have more complex grammars expressed.
The problem I'm having is that using the grammar above to parse a string value such as "(+ 1 2)" gets stuck in an infinite recursive call when parsing the RPAREN ')', as the "SEXPRESSION" and "EXPRESSION" parsers have "one or many" cardinality.
I'm sure I could get creative and come up with some way of limiting the depth of the recursive calls, perhaps by ensuring that when the "SEXPRESSION" parser hands off to the "EXPRESSION" parser, which then hands off to the "SEXPRESSION" parser, and no tokens are consumed, the parser drops out? But I don't want a hacky solution if a standard solution exists.
Any ideas?
Thanks
Not to dodge the question, but I don't think there's anything wrong with calling an application using VM arguments to increase stack size.
This can be done in Java by adding the flag -XssNm, where N is the thread stack size in megabytes.
The default Java thread stack size depends on the platform and JVM (commonly somewhere between 512 KB and 1 MB), which, frankly, is hardly any memory at all. Minor optimizations aside, I felt that it was difficult, if not impossible, to implement complex recursive solutions with that little stack, especially because Java isn't the least bit efficient when it comes to recursion.
So, some examples of this flag are as follows:
-Xss4M 4 MB
-Xss2G 2 GB
The flag goes right after java on the command line when you launch the application, or, if you are using an IDE like Eclipse, you can set it manually under VM arguments in the run configuration.
Hope this helps!
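On the question's idea of dropping out when no tokens are consumed: that is a common guard in repetition combinators, so it need not be a hack. Below is a minimal sketch, in Python for brevity rather than Java, and not the asker's library. All names here (terminal, sequence, choice, zero_or_more, placeholder, table) are made up for illustration. Parsers take a token list and a position and return the new position, or None on failure. Whether a missing guard is the actual cause of the recursion in the question cannot be verified from the post.

```python
def terminal(kind):
    """Parser that consumes exactly one token of the given kind."""
    def parse(tokens, pos):
        if pos < len(tokens) and tokens[pos][0] == kind:
            return pos + 1
        return None
    return parse


def sequence(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(tokens, pos):
        for parser in parsers:
            pos = parser(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Return the result of the first alternative that succeeds."""
    def parse(tokens, pos):
        for parser in parsers:
            nxt = parser(tokens, pos)
            if nxt is not None:
                return nxt
        return None
    return parse


def zero_or_more(parser):
    """Repeat parser; stop when it fails OR succeeds without consuming input."""
    def parse(tokens, pos):
        while True:
            nxt = parser(tokens, pos)
            if nxt is None or nxt == pos:  # no progress: stop instead of looping
                return pos
            pos = nxt
    return parse


def placeholder(table, name):
    """Look the named parser up at parse time, so grammars can be recursive."""
    return lambda tokens, pos: table[name](tokens, pos)


table = {}
table["SEXPRESSION"] = sequence(
    terminal("LPAREN"),
    zero_or_more(placeholder(table, "EXPRESSION")),
    terminal("RPAREN"),
)
table["EXPRESSION"] = choice(
    terminal("NUMBER"),
    terminal("SYMBOL"),
    placeholder(table, "SEXPRESSION"),
)
```

With the tokens for "(+ 1 2)", the repetition stops at the RPAREN because EXPRESSION fails there without recursing further, and the enclosing SEXPRESSION then consumes the closing parenthesis.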
The string at the bottom of this post is the serialization of a java.util.GregorianCalendar object in Java. I am hoping to parse it in Python.
I figured I could approach this problem with a combination of regexps and key=val splitting, i.e. something along the lines of:
text_inside_brackets = re.search(r"\[(.*)\]", text).group(1)
and
from parse import parse
my_dict = {}
for pair in text_inside_brackets.split(','):
    result = parse('{key}={value}', pair)
    my_dict[result['key']] = result['value']
My question is: What would be a more principled / robust approach to do this? Are there any Python parsers for serialized Java objects that I could use for this problem? (do such things exist?). What other alternatives do I have?
My hope is to ultimately parse this into JSON or nested Python dictionaries, so that I can manipulate it any way I want.
Note: I would prefer to avoid a solution that relies on Py4J, mostly because it requires setting up a server and a client, and I am hoping to do this within a single Python script.
java.util.GregorianCalendar[time=1413172803113,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/New_York",offset=-18000000,dstSavings=3600000,useDaylight=true,transitions=235,lastRule=java.util.SimpleTimeZone[id=America/New_York,offset=-18000000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2014,MONTH=9,WEEK_OF_YEAR=42,WEEK_OF_MONTH=3,DAY_OF_MONTH=13,DAY_OF_YEAR=286,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=2,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=3,MILLISECOND=113,ZONE_OFFSET=-18000000,DST_OFFSET=3600000]
The serialized form of a GregorianCalendar object contains quite a lot of redundancy. In fact, there are only two fields that matter, if you want to reconstitute it:
the time
the timezone
There is code for extracting this in "How to convert Gregorian string to Gregorian Calendar?"
If you want a more principled and robust approach, I echo mbatchkarov's suggestion to use JSON.
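For the two fields mentioned above, a minimal Python sketch could look like the following. It assumes the input has the same shape as the string in the question, and the function name parse_calendar_string is made up for this example. It only pulls out the epoch milliseconds and the first zone id and deliberately ignores everything else.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_calendar_string(text):
    """Extract epoch milliseconds and zone id from a GregorianCalendar dump."""
    # "time=" is the first field; startTime/endTime do not match the lowercase \btime.
    time_match = re.search(r"\btime=(-?\d+)", text)
    # The first id= belongs to the calendar's zone; the quotes are optional.
    zone_match = re.search(r'\bid="?([^",\]]+)"?', text)
    if time_match is None or zone_match is None:
        raise ValueError("not a recognizable GregorianCalendar string")
    millis = int(time_match.group(1))
    # Build the instant from whole seconds plus milliseconds to avoid float rounding.
    utc = datetime.fromtimestamp(millis // 1000, tz=timezone.utc) + timedelta(
        milliseconds=millis % 1000
    )
    return {"millis": millis, "zone_id": zone_match.group(1), "utc": utc}
```

The result is a plain dictionary, so it can be passed to json.dumps after converting the datetime to a string, for example with isoformat().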
I am working on a large scale project that involves giving a python script a first name and getting back a result as to what kind of gender it belongs to. My current program is written in Java and using Jython to interact with a Python script called "sex machine." It works great in most cases and I've tested it with smaller groups of users. However, when I attempt to test it with a large group of users the program gets about halfway in and then gives me the following error:
"Exception in thread "main" SyntaxError: No viable alternative to input '\\n'", ('<string>', 1, 22, "result = d.get_gender('Christinewazonek'')\n")
I am more accustomed to Java and have limited knowledge of Python so at the moment I don't know how to solve this problem. I tried to trim the string that I'm giving the get_gender method but that didn't help any. I am not sure what the numbers 1, 22 even mean.
Like I said since I'm using Jython my code would be the following:
static PythonInterpreter interp = new PythonInterpreter();
interp.exec("import sys, os.path");
interp.exec("sys.path.append('/Users/myname/Desktop/')");
interp.exec("import sexmachine.detector as gender");
interp.exec("d = gender.Detector()");
interp.exec("result = d.get_gender('"+WordUtils.capitalize(name).trim()
+"')");
PyObject gendAnswer = interp.get("result");
And this is pretty much the extent of Jython/Python interaction in my Java code. If someone sees something that's wrong or not right I would certainly appreciate if you could help me. As this is a large project it takes time to run the whole program again only to run into the same issue, so because of this I really need to fix this problem.
I don't know if it helps but this is what I did and it works for me.
public static void main(String[] args){
PythonInterpreter pI = new PythonInterpreter();
pI.exec("x = 3");
PyObject result = pI.get("x");
System.out.println(result);
}
Not sure if you sorted this out, but you have an extra apostrophe in
d.get_gender('Christinewazonek'')
Just like in Java, every quote you open needs to be closed. In this case the extra apostrophe opened a new string containing )\n") which was never closed.
Depending on the interpreter you are using, this can be flagged easily, so you might try a different interpreter.
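The effect of the extra apostrophe can be reproduced in plain Python. This is a small sketch with made-up helper names, showing why building the statement by string concatenation breaks as soon as a name contains a quote, and that quoting the value with repr() avoids it. The numbers 1 and 22 in the error appear to follow SyntaxError's usual (filename, line, column, text) layout. On the Java side, handing the value over with Jython's PythonInterpreter.set and calling d.get_gender on that variable would sidestep the quoting entirely.

```python
def build_call_by_concatenation(name):
    """Mimics the Java code: the name is pasted between two single quotes."""
    return "result = d.get_gender('" + name + "')"


def build_call_with_repr(name):
    """Let Python produce a correctly quoted and escaped string literal."""
    return "result = d.get_gender(" + repr(name) + ")"


def compiles(source):
    """Return True if source is syntactically valid Python (nothing is executed)."""
    try:
        compile(source, "<string>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    bad_name = "Christinewazonek'"  # trailing apostrophe, as in the error message
    print(compiles(build_call_by_concatenation(bad_name)))
    print(compiles(build_call_with_repr(bad_name)))
```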
Specifically I am converting a python script into a java helper method. Here is a snippet (slightly modified for simplicity).
# hash of values
vals = {}
vals['a'] = 'a'
vals['b'] = 'b'
vals['1'] = 1
output = sys.stdout
file = open(filename).read()
print >>output, file % vals,
So in the file there are %(a), %(b), %(1) etc that I want substituted with the hash keys. I perused the API but couldn't find anything. Did I miss it or does something like this not exist in the Java API?
You can't do this directly without some additional templating library. I recommend StringTemplate. Very lightweight, easy to use, and very optimized and robust.
I doubt you'll find a pure Java solution that'll do exactly what you want out of the box.
With this in mind, the best answer depends on the complexity and variety of Python formatting strings that appear in your file:
If they're simple and not varied, the easiest way might be to code something up yourself.
If the opposite is true, one way to get the result you want with little work is by embedding Jython into your Java program. This will enable you to use Python's string formatting operator (%) directly. What's more, you'll be able to give it a Java Map as if it were a Python dictionary (vals in your code).
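For the "code something up yourself" route, the logic is small enough to sketch. The version below is in Python so it can be checked against the % operator directly. The names PLACEHOLDER and substitute are made up, and it assumes the file only uses the %(key)s and %(key)d forms (no widths, no literal %%). The same regular expression should port to java.util.regex with a Map<String, Object>.

```python
import re

# Matches %(key)s or %(key)d and captures the key name.
PLACEHOLDER = re.compile(r"%\((\w+)\)[sd]")


def substitute(template, values):
    """Replace each %(key)s / %(key)d placeholder with str(values[key])."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


if __name__ == "__main__":
    vals = {"a": "a", "b": "b", "1": 1}
    template = "first=%(a)s second=%(b)s number=%(1)d"
    print(substitute(template, vals))
    print(template % vals)  # Python's built-in formatting, for comparison
```

A missing key raises KeyError here, which matches what the % operator does.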