What is the best way to parse this configuration file? - java

I am working on a personal project that uses a custom config file. The basic format of the file looks like this:
[users]
name: bob
attributes:
	hat: brown
	shirt: black
another_section:
	key: value
	key2: value2

name: sally
sex: female
attributes:
	pants: yellow
	shirt: red
There can be an arbitrary number of users, each user can have different key/value pairs, and keys/values can be nested under a section using tab stops. I know that I could use JSON, YAML, or even XML for this config file; however, I'd like to keep it custom for now.
Parsing shouldn't be difficult at all, as I have already written code to parse it. My question is: what is the best way to go about parsing this with clean, structured code, written in a way that won't make future changes difficult (there might be deeper nesting in the future)? Right now, my code looks utterly disgusting. For example,
private void parseDocument() {
    String current;
    while((current = reader.readLine()) != null) {
        if(current.equals("") || current.startsWith("#")) {
            continue; //comment
        }
        else if(current.startsWith("[users]")) {
            parseUsers();
        }
        else if(current.startsWith("[backgrounds]")) {
            parseBackgrounds();
        }
    }
}

private void parseUsers() {
    String current;
    while((current = reader.readLine()) != null) {
        if(current.startsWith("attributes:")) {
            while((current = reader.readLine()) != null) {
                if(current.startsWith("\t")) {
                    //add user key/values to User object
                }
                else if(current.startsWith("another_section:")) {
                    while((current = reader.readLine()) != null) {
                        if(current.startsWith("\t")) {
                            //add user key/values to new User object
                        }
                        else if (current.equals("")) {
                            //newline means that a new user is up to parse next
                        }
                    }
                }
            }
        }
        else if(!current.isEmpty()) {
            //
        }
    }
}
As you can see, the code is pretty messy, and I have cut it short for presentation here. I feel there are better ways to do this, perhaps not even using a BufferedReader. Can someone please suggest a better approach that is not as convoluted as mine?

I would suggest not creating custom code for config files. What you're proposing isn't too far removed from YAML (getting started). Use that instead.
See Which java YAML library should I use?
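For instance, with SnakeYAML (one of the libraries discussed in that question), loading such a config written as YAML takes only a few lines. A minimal sketch, assuming the config is saved as config.yml (the file name is illustrative):
import org.yaml.snakeyaml.Yaml;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;

public class YamlConfigExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("config.yml")) {
            // SnakeYAML turns YAML mappings into Maps and sequences into Lists by default.
            Map<String, Object> config = (Map<String, Object>) new Yaml().load(in);
            System.out.println(config.get("users"));
        }
    }
}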

Everyone will recommend using XML because it's simply better.
However, in case you're on a quest to prove your programmer's worth to yourself...
...there is nothing really fundamentally wrong with the code you posted: it's clear, it's obvious to potential readers what's going on, and unless I'm totally out of the loop on file operations, it should perform pretty much as well as it could.
The one criticism I could offer is that it's not recursive. Every level of nesting requires a new level of code to support it. I would probably write a recursive function (one that calls itself with the sub-content as a parameter, and again for sub-sub-content, etc.) that reads all of this into a hashtable of hashtables or something, and then use that hashtable as the configuration object.
Then again, at that point I would probably stop seeing the point and use XML. ;)
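For what it's worth, here is a minimal sketch of that recursive idea, assuming nesting depth is indicated by leading tabs; the class and method names are just illustrative, and section headers like [users] would still need their own handling:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class RecursiveConfigParser {

    private final BufferedReader reader;
    private String pushedBack; // one line of lookahead

    public RecursiveConfigParser(BufferedReader reader) {
        this.reader = reader;
    }

    /** Parses lines indented by at least `depth` tabs into a map; deeper blocks become nested maps. */
    public Map<String, Object> parseBlock(int depth) throws IOException {
        Map<String, Object> block = new LinkedHashMap<>();
        String line;
        while ((line = nextLine()) != null) {
            if (line.trim().isEmpty() || line.trim().startsWith("#")) {
                continue; // skip blanks and comments
            }
            if (countLeadingTabs(line) < depth) {
                pushedBack = line; // belongs to an enclosing block
                break;
            }
            String[] parts = line.trim().split(":", 2);
            String key = parts[0].trim();
            String value = parts.length > 1 ? parts[1].trim() : "";
            if (value.isEmpty()) {
                block.put(key, parseBlock(depth + 1)); // section header -> recurse
            } else {
                block.put(key, value);                 // plain key/value pair
            }
        }
        return block;
    }

    private String nextLine() throws IOException {
        if (pushedBack != null) {
            String line = pushedBack;
            pushedBack = null;
            return line;
        }
        return reader.readLine();
    }

    private static int countLeadingTabs(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == '\t') i++;
        return i;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            System.out.println(new RecursiveConfigParser(reader).parseBlock(0));
        }
    }
}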

I'd recommend changing the configuration file's format to JSON and using an existing library such as FlexJSON to parse it.
{
    "users": [
        {
            "name": "bob",
            "hat": "brown",
            "shirt": "black",
            "another_section": {
                "key": "value",
                "key2": "value2"
            }
        },
        {
            "name": "sally",
            "sex": "female",
            "another_section": {
                "pants": "yellow",
                "shirt": "red"
            }
        }
    ]
}
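If you go that route, a minimal sketch of reading this structure with FlexJSON might look like the following (assuming the flexjson jar is on the classpath; with no target class given, it deserializes JSON objects into Maps and arrays into Lists):
import flexjson.JSONDeserializer;

import java.util.List;
import java.util.Map;

public class ConfigExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        String json = "{\"users\": [{\"name\": \"bob\", \"hat\": \"brown\"}]}";
        // Deserialize the whole document into nested Maps/Lists.
        Map<String, Object> config = new JSONDeserializer<Map<String, Object>>().deserialize(json);
        List<Map<String, Object>> users = (List<Map<String, Object>>) config.get("users");
        for (Map<String, Object> user : users) {
            System.out.println(user.get("name"));
        }
    }
}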

It looks simple enough for a state machine.
while((current = reader.readLine()) != null) {
    if(current.startsWith("[users]"))
        state = PARSE_USER;
    else if(current.startsWith("[backgrounds]"))
        state = PARSE_BACKGROUND;
    else if (current.equals("")) {
        // Store the user or background that you've been building up if you have one.
        switch(state) {
        case PARSE_USER:
        case USER_ATTRIBUTES:
        case USER_OTHER_ATTRIBUTES:
            state = PARSE_USER;
            break;
        case PARSE_BACKGROUND:
        case BACKGROUND_ATTRIBUTES:
        case BACKGROUND_OTHER_ATTRIBUTES:
            state = PARSE_BACKGROUND;
            break;
        }
    } else switch(state) {
    case PARSE_USER:
    case USER_ATTRIBUTES:
    case USER_OTHER_ATTRIBUTES:
        if(current.startsWith("attributes:"))
            state = USER_ATTRIBUTES;
        else if(current.startsWith("another_section:"))
            state = USER_OTHER_ATTRIBUTES;
        else {
            // Split the line into key/value and store into user
            // object being built up as appropriate based on state.
        }
        break;
    case PARSE_BACKGROUND:
    case BACKGROUND_ATTRIBUTES:
    case BACKGROUND_OTHER_ATTRIBUTES:
        if(current.startsWith("attributes:"))
            state = BACKGROUND_ATTRIBUTES;
        else if(current.startsWith("another_section:"))
            state = BACKGROUND_OTHER_ATTRIBUTES;
        else {
            // Split the line into key/value and store into background
            // object being built up as appropriate based on state.
        }
        break;
    }
}
// If you have an unstored object, store it.
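The snippet assumes a state field and a set of constants along these lines (a sketch; the names are taken from the code above):
// Hypothetical enum backing the state variable used in the snippet.
enum State {
    PARSE_USER, USER_ATTRIBUTES, USER_OTHER_ATTRIBUTES,
    PARSE_BACKGROUND, BACKGROUND_ATTRIBUTES, BACKGROUND_OTHER_ATTRIBUTES
}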

If you can use XML, JSON, or another well-known data encoding as the format, it will be a lot easier to parse/deserialize the text content and extract the values.
For example.
name: bob
attributes:
	hat: brown
	shirt: black
another_section:
	key: value
	key2: value2
can be expressed as the following XML (there are other ways to express it in XML as well):
<config>
    <User name="bob" hat="brown" shirt="black">
        <another_section>
            <key>value</key>
            <key2>value2</key2>
        </another_section>
    </User>
</config>
Custom (extremely simple)
As I mentioned in the comment below, you can just make them all name and value pairs.
e.g.
name :bob
attributes_hat :brown
attributes_shirt :black
another_section_key :value
another_section_key2 :value2
and then split on '\n' (newline) and ':' to extract each key and value, or build a dictionary/map object.
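A minimal sketch of that split-and-map approach (class and method names are just for illustration):
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatConfig {
    // Splits "key :value" lines into a map.
    public static Map<String, String> parse(String content) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.split(":", 2);
            map.put(parts[0].trim(), parts.length > 1 ? parts[1].trim() : "");
        }
        return map;
    }

    public static void main(String[] args) {
        String content = "name :bob\nattributes_hat :brown";
        System.out.println(parse(content)); // {name=bob, attributes_hat=brown}
    }
}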

A nice way to clean it up would be to use a table, i.e. replace your conditionals with a Map. You can then invoke your parsing methods through reflection (simple) or create a few more classes implementing a common interface (more work, but more robust).
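For example, a rough sketch of the table-driven variant with a common interface (the interface and method names here are made up for illustration):
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class TableDrivenParser {

    // Hypothetical common interface every section parser implements.
    interface SectionParser {
        void parse(BufferedReader reader) throws IOException;
    }

    private final Map<String, SectionParser> parsers = new HashMap<>();

    public TableDrivenParser() {
        // Each section header maps to its handler; no if/else chain needed.
        parsers.put("[users]", reader -> { /* parse users */ });
        parsers.put("[backgrounds]", reader -> { /* parse backgrounds */ });
    }

    public void parseDocument(BufferedReader reader) throws IOException {
        String current;
        while ((current = reader.readLine()) != null) {
            SectionParser parser = parsers.get(current.trim());
            if (parser != null) {
                parser.parse(reader);
            }
        }
    }
}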

Related

modify underlying result/value of async object

I am using Kotlin in a webserver app and I have a line of code as follows:
.onComplete { jsonResult: AsyncResult<JsonObject>? ->
Now what I want to do is change the underlying JsonObject wrapped in the AsyncResult, so that the change is reflected further downstream.
var res: JsonObject? = jsonResult?.result()
if (res != null) {
    if (res.getInteger("files_uploaded") > 0) {
        res.put("URL", "Some URL")
    }
}
I was then imagining to update the underlying JSON object in the result but not sure how to do that.
You should be able to make changes in the conditional statement
if (res !=null) {
res being the JsonObject:
console.log(res);
would show you what's in there. You may need to use
let resXmodifiedX = JSON.parse(res);
One approach is to write a function and pass res to it, which you can do wherever you have the console.log(res).
Some notes on what's below:
- Place the function somewhere consistent, maybe at the bottom of the file.
- Objects often have multiple levels (res.person.name, res.contact.email, or whatever), so you may need nested for loops: let level = res[key]; for (child in level) { ... }. You don't need to do this if you know exactly which object attributes you need to update.
- You can set the value directly, but you always want to test for it before trying to set it, to avoid errors that stop execution.
let toBe = toBe => `${toBe}`;
let update = (res) ? toBe(res) : toBe('not Found');
This option is really only if you know for sure that data will be there and you can't proceed without it. Which is not uncommon but also not how JSON is designed to be used.
The code below is a concise way to make some simple changes, but may not be an ideal solution. To use it, xModify(res) replaces the console.log(res) above.
function xModify(x) {
    let res = JSON.parse(x);
    for (let key in res) {
        res[key] = key == 'name' ? `My change ${res[key]}` : key == 'other' ? `My Change ${res[key]}` : res[key];
    }
    return JSON.stringify(res);
}
That will update res.name and res.other; otherwise res[key] is unchanged. If you do not need to parse res, set let res = x; on the first line and return res instead of JSON.stringify(res):
function xModify(x) {
    let res = x;
    for (let key in res) {
        res[key] = key == 'name' ? `My change ${res[key]}` : key == 'other' ? `My Change ${res[key]}` : res[key];
    }
    return res;
}
If your data is numeric (which is not generally the case in a web server response scenario), this is a terrible approach. Because it is probably a string, I used a template literal as a way to easily build a more complex pattern in place of the string; My change ${res[key]} is not a real-world example. Any valid JS expression can go in the ${ } placeholder. I've been defaulting to this pattern more and more.
let me = (bestCase)?`${'the best version'} of myself`:`${'someone'} I'm ok with`;

How to handle java object heap VM? [duplicate]

I'm trying to parse a huge JSON file (like http://eu.battle.net/auction-data/258993a3c6b974ef3e6f22ea6f822720/auctions.json) using the gson library (http://code.google.com/p/google-gson/) in Java.
I would like to know what the best approach is to parse this kind of big file (about 80k lines), and whether you know of a good API that can help me process it.
Some ideas...
read line by line and get rid of the JSON format: but that's nonsense.
reduce the JSON file by splitting it into many smaller files: but I did not find any good Java API for this.
use this file directly as a NoSQL database: keep the file and use it as my database.
I would really appreciate advice/help :-)
Thanks.
You don't need to switch to Jackson. Gson 2.1 introduced a new TypeAdapter interface that permits mixed tree and streaming serialization and deserialization.
The API is efficient and flexible. See Gson's Streaming doc for an example of combining tree and binding modes. This is strictly better than mixed streaming and tree modes; with binding you don't waste memory building an intermediate representation of your values.
Like Jackson, Gson has APIs to recursively skip an unwanted value; Gson calls this skipValue().
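For illustration, a rough sketch using Gson's streaming JsonReader and binding one array element at a time; the "auctions" field name is just a guess at the file's structure:
import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;

import java.io.FileReader;
import java.util.Map;

public class GsonStreamExample {
    public static void main(String[] args) throws Exception {
        Gson gson = new Gson();
        try (JsonReader reader = new JsonReader(new FileReader(args[0]))) {
            reader.beginObject();                      // root object
            while (reader.hasNext()) {
                String name = reader.nextName();
                if (name.equals("auctions")) {         // hypothetical array field
                    reader.beginArray();
                    while (reader.hasNext()) {
                        // Bind each element without loading the whole file into memory.
                        Map auction = gson.fromJson(reader, Map.class);
                        System.out.println(auction);
                    }
                    reader.endArray();
                } else {
                    reader.skipValue();                // ignore everything else
                }
            }
            reader.endObject();
        }
    }
}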
I suggest having a look at the Jackson API: it is very easy to combine the streaming and tree-model parsing options. You can move through the file as a whole in a streaming way, and then read individual objects into a tree structure.
As an example, let's take the following input:
{
    "records": [
        {"field1": "aaaaa", "bbbb": "ccccc"},
        {"field2": "aaa", "bbb": "ccc"}
    ],
    "special message": "hello, world!"
}
Just imagine the fields being sparse or the records having a more complex structure.
The following snippet illustrates how this file can be read using a combination of stream and tree-model parsing. Each individual record is read in a tree structure, but the file is never read in its entirety into memory, making it possible to process JSON files gigabytes in size while using minimal memory.
import org.codehaus.jackson.map.*;
import org.codehaus.jackson.*;
import java.io.File;

public class ParseJsonSample {
    public static void main(String[] args) throws Exception {
        JsonFactory f = new MappingJsonFactory();
        JsonParser jp = f.createJsonParser(new File(args[0]));
        JsonToken current;
        current = jp.nextToken();
        if (current != JsonToken.START_OBJECT) {
            System.out.println("Error: root should be object: quiting.");
            return;
        }
        while (jp.nextToken() != JsonToken.END_OBJECT) {
            String fieldName = jp.getCurrentName();
            // move from field name to field value
            current = jp.nextToken();
            if (fieldName.equals("records")) {
                if (current == JsonToken.START_ARRAY) {
                    // For each of the records in the array
                    while (jp.nextToken() != JsonToken.END_ARRAY) {
                        // read the record into a tree model,
                        // this moves the parsing position to the end of it
                        JsonNode node = jp.readValueAsTree();
                        // And now we have random access to everything in the object
                        System.out.println("field1: " + node.get("field1").getValueAsText());
                        System.out.println("field2: " + node.get("field2").getValueAsText());
                    }
                } else {
                    System.out.println("Error: records should be an array: skipping.");
                    jp.skipChildren();
                }
            } else {
                System.out.println("Unprocessed property: " + fieldName);
                jp.skipChildren();
            }
        }
    }
}
As you can guess, the nextToken() call each time gives the next parsing event: start object, start field, start array, start object, ..., end object, ..., end array, ...
The jp.readValueAsTree() call allows you to read what is at the current parsing position, a JSON object or array, into Jackson's generic JSON tree model. Once you have this, you can access the data randomly, regardless of the order in which things appear in the file (in the example field1 and field2 are not always in the same order). Jackson supports mapping onto your own Java objects too. The jp.skipChildren() call is convenient: it lets you skip over a complete object tree or an array without having to handle all the events contained in it yourself.
The Declarative Stream Mapping (DSM) library allows you to define mappings between your JSON or XML data and your POJOs, so you don't need to write a custom parser. It has powerful scripting support (JavaScript, Groovy, JEXL). You can filter and transform data while you are reading, and call functions for partial data operations as you read. DSM reads data as a stream, so it uses very little memory.
For example,
{
    "company": {
        ....
        "staff": [
            {
                "firstname": "yong",
                "lastname": "mook kim",
                "nickname": "mkyong",
                "salary": "100000"
            },
            {
                "firstname": "low",
                "lastname": "yin fong",
                "nickname": "fong fong",
                "salary": "200000"
            }
        ]
    }
}
Imagine the above snippet is part of a huge and complex JSON document, and we only want to get staff entries with a salary higher than 10000.
First of all, we must define the mapping as follows. As you can see, it is just a YAML file that contains the mapping between POJO fields and fields of the JSON data.
result:
  type: object  # result is a map or an object
  path: /.+staff  # path is a regex; it matches /company/staff
  function: processStuff  # call the processStuff function when the /company/staff tag is closed
  filter: self.data.salary>10000  # any expression is valid in JavaScript, Groovy or JEXL
  fields:
    name:
      path: firstname
    sureName:
      path: lastname
    userName:
      path: nickname
    salary: long
Create a FunctionExecutor to process the staff entries.
FunctionExecutor processStuff = new FunctionExecutor() {
    @Override
    public void execute(Params params) {
        // directly serialize into a Stuff class
        //Stuff stuff = params.getCurrentNode().toObject(Stuff.class);
        Map<String, Object> stuff = (Map<String, Object>) params.getCurrentNode().toObject();
        System.out.println(stuff);
        // process stuff; save to db, call a service, etc.
    }
};
Use DSM to process JSON
DSMBuilder builder = new DSMBuilder(new File("path/to/mapping.yaml")).setType(DSMBuilder.TYPE.XML);
// register processStuff Function
builder.registerFunction("processStuff",processStuff);
DSM dsm= builder.create();
Object object = dsm.toObject(xmlContent);
Output (only staff with a salary higher than 10000 are included):
{firstName=low, lastName=yin fong, nickName=fong fong, salary=200000}

Updating pre-existing documents in mongoDB java driver when you've changed document structure

I've got a database of player data that has some pre-existing fields from previous versions of the program. Example outdated document:
{
    "playername": "foo"
}
but a player document generated under the new version would look like this:
{
    "playername": "bar",
    "playercurrency": 20
}
The issue is that if I try to query playercurrency on foo, I get a NullPointerException because playercurrency doesn't exist for foo. I want to add the playercurrency field to foo without disturbing any other data that could be stored in foo. I've tried some code using $exists. Example:
players.updateOne(new Document("playername", "foo"), new Document("$exists", new Document("playername", "")));
players.updateOne(new Document("playername", "foo"), new Document("$exists", new Document("playercurrency", 20)));
My thought was that it would update only playercurrency because it doesn't exist, and leave playername alone because it exists. I might be using $exists horribly wrong, and if so please do let me know, because this is one of my first MongoDB projects and I would like to learn as much as I possibly can.
Do you have to do this with java? Whenever I add a new field that I want to be required I just use the command line to migrate all existing documents. This will loop through all players that don't have a playercurrency and set it to 0 (change to whatever default you want):
db.players.find({playercurrency: null}).forEach(function(player) {
    player.playercurrency = 0; // or whatever default value
    db.players.save(player);
});
This will result in you having the following documents:
{
    "playername" : "foo",
    "playercurrency" : 0
}
{
    "playername" : "bar",
    "playercurrency" : 20
}
I know that it is normally frowned upon to answer your own question, but nobody really posted what I ended up doing, so I would like to take this time to thank @Mark Watson for answering and ultimately guiding me to finding my answer.
Since checking whether a certain field is null doesn't work in the MongoDB Java Driver, I needed to find a different way to know when something is primed for an update. After a little bit of research I stumbled upon this question, which helped me come up with this code:
private static void updateValue(final String name, final Object defaultValue, final UUID key) {
    if (!exists(name, key)) {
        FindIterable iterable = players.find(new Document("_id", key));
        iterable.forEach(new Block<Document>() {
            @Override
            public void apply(Document document) {
                players.updateOne(new Document("_id", key), new Document("$set", new Document(name, defaultValue)));
            }
        });
    }
}

private static boolean exists(String name, UUID key) {
    Document query = new Document(name, new Document("$exists", true)).append("_id", key);
    return players.count(query) == 1;
}
Obviously this is a little specialized to what I wanted to do, but with small revisions it can be easily changed to work with anything you might need. Make sure to replace players with your Collection object.
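For what it's worth, with a recent driver the same migration can also be expressed in a single statement using the com.mongodb.client.model helpers. A sketch, assuming players is a MongoCollection<Document>:
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

class PlayerMigration {
    // Gives every document that lacks the field a default playercurrency of 0.
    static void addDefaultCurrency(MongoCollection<Document> players) {
        players.updateMany(
                Filters.exists("playercurrency", false),
                Updates.set("playercurrency", 0));
    }
}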

Obtaining all name-value pairs in a form using Jsoup

I want to automate posting of a number of HTML forms using Jsoup and HttpClient. Most of those forms have hidden fields (with session ids, etc.) or have default values that I'd rather leave alone.
Coding each of the form submissions individually -- extracting each of said hidden or default values from the page -- is extremely tedious, so I thought about writing a generic method to obtain the list of HTTP parameters for a given form.
It is not a trivial piece of code, though, because of the variety of input tags and field types, each of which may need specific handling (e.g. textareas, checkboxes, radio buttons, selects, ...) so I thought I'd first search/ask in case it already exists.
Note: Jsoup and HttpClient are a given; I can't change that -- so please no need to provide answers suggesting other solutions: I have a Jsoup Document object and I need to build an HttpClient HttpRequest.
So I've ended up writing it. I would still prefer to swap for something field-tested (and hopefully maintained elsewhere), but in case it helps anyone landing here...
Not thoroughly tested and without support for multipart/form-data, but it works in the few examples I've tried:
public void submit(String formSelector, List<String> params) {
    if (params.size() % 2 != 0) {
        throw new IllegalArgumentException("There must be an even number of params.");
    }
    Element form = $(formSelector).first();
    Set<String> newParams = Sets.newHashSet();
    for (int i = 0; i < params.size(); i += 2) {
        newParams.add(params.get(i));
    }
    List<String> allParams = Lists.newArrayList(params);
    for (Element field : form.select("input, select, textarea")) {
        String name = field.attr("name");
        if (name.isEmpty() || newParams.contains(name)) continue; // attr() returns "" when the attribute is missing
        String type = field.attr("type").toLowerCase();
        if ("checkbox".equals(type) || "radio".equals(type)) {
            if (field.hasAttr("checked")) {
                allParams.add(field.attr("name"));
                allParams.add(field.attr("value"));
            }
        }
        else if (!fieldTypesToIgnore.contains(type)) {
            allParams.add(field.attr("name"));
            allParams.add(field.val());
        }
    }
    String action = form.attr("abs:action");
    String method = form.attr("method").toLowerCase();
    // String encType = form.attr("enctype"); -- TODO
    if ("post".equals(method)) {
        post(action, allParams);
    }
    else {
        get(action, allParams);
    }
}
($, get, and post are methods I already had lying around... you can easily guess what they do).
Jsoup has a formData method in the FormElement class; it works in simple cases, but it doesn't always do what I need, so I ended up writing some custom code too.
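For reference, a minimal sketch of the formData route (the URL is illustrative):
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.List;

public class FormDataExample {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://example.com/login").get();
        FormElement form = (FormElement) doc.select("form").first();
        // formData() collects name/value pairs from inputs, selects and textareas.
        List<Connection.KeyVal> data = form.formData();
        for (Connection.KeyVal kv : data) {
            System.out.println(kv.key() + " = " + kv.value());
        }
    }
}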

Java Vector Field (private member) accumulator doesn't store my Cows!

Edit: This code is fine. I found a logic bug somewhere that doesn't exist in my pseudo code. I was blaming it on my lack of Java experience.
In the pseudo code below, I'm trying to parse the XML shown. A silly example maybe but my code was too large/specific for anyone to get any real value out of seeing it and learning from answers posted. So, this is more entertaining and hopefully others can learn from the answer as well as me.
I'm new to Java but an experienced C++ programmer which makes me believe my problem lies in my understanding of the Java language.
Problem: When the parser finishes, my Vector is full of uninitialized Cows. I create the Vector of Cows with a default capacity (which shouldn't affect its "size" if it's anything like the C++ STL vector). When I print the contents of the Cow Vector after the parse, it reports the right size, but all the values appear never to have been set.
Info: I have successfully done this with other parsers that don't have Vector fields but in this case, I'd like to use a Vector to accumulate Cow properties.
MoreInfo: I can't use generics (Vector< Cow >) so please don't point me there. :)
Thanks in advance.
<pluralcow>
    <cow>
        <color>black</color>
        <age>1</age>
    </cow>
    <cow>
        <color>brown</color>
        <age>2</age>
    </cow>
    <cow>
        <color>blue</color>
        <age>3</age>
    </cow>
</pluralcow>
public class Handler extends DefaultHandler {
    // vector to store all the cow knowledge
    private Vector m_CowVec;
    // temp variable to store cow knowledge until
    // we're ready to add it to the vector
    private Cow m_WorkingCow;
    // flags to indicate when to look at char data
    private boolean m_bColor;
    private boolean m_bAge;

    public void startElement(...tag...)
    {
        if(tag == pluralcow){ // rule: there is only 1 pluralcow tag in the doc
            // I happen to magically know how many cows there are here.
            m_CowVec = new Vector(numcows);
        }else if(tag == cow ){ // rule: multiple cow tags exist
            m_WorkingCow = new Cow();
        }else if(tag == color){ // rule: single color within cow
            m_bColor = true;
        }else if(tag == age){ // rule: single age within cow
            m_bAge = true;
        }
    }

    public void characters(...chars...)
    {
        if(m_bColor){
            m_WorkingCow.setColor(chars);
        }else if(m_bAge){
            m_WorkingCow.setAge(chars);
        }
    }

    public void endElement(...tag...)
    {
        if(tag == pluralcow){
            // that's all the cows
        }else if(tag == cow ){
            m_CowVec.addElement(m_WorkingCow);
        }else if(tag == color){
            m_bColor = false;
        }else if(tag == age){
            m_bAge = false;
        }
    }
}
When you say that the Cows are uninitialized, are the String properties initialized to null? Or empty Strings?
I know you mentioned that this is pseudo-code, but I just wanted to point out a few potential problems:
public void startElement(...tag...)
{
    if(tag == pluralcow){ // rule: there is only 1 pluralcow tag in the doc
        // I happen to magically know how many cows there are here.
        m_CowVec = new Vector(numcows);
    }else if(tag == cow ){ // rule: multiple cow tags exist
        m_WorkingCow = new Cow();
    }else if(tag == color){ // rule: single color within cow
        m_bColor = true;
    }else if(tag == age){ // rule: single age within cow
        m_bAge = true;
    }
}
You really should be using tag.equals(...) instead of tag == ... here.
public void characters(...chars...)
{
    if(m_bColor){
        m_WorkingCow.setColor(chars);
    }else if(m_bAge){
        m_WorkingCow.setAge(chars);
    }
}
I'm assuming you're aware of this, but this method is actually called with a character buffer plus start and length indexes.
Note also that characters(...) can be called multiple times for a single text block, returning small chunks in each call:
http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)
"...SAX parsers may return all contiguous
character data in a single chunk, or
they may split it into several chunks..."
I doubt you'll run into that problem in the simple example you provided, but you also mentioned that this is a simplified version of a more complex problem. If in your original problem, your XML consists of large text blocks, this is something to consider.
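One common way to cope with that chunking is to accumulate text in a StringBuilder and only commit it when the element ends. A rough sketch against the question's Cow class (names are illustrative, and the throws clauses are omitted):
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ChunkSafeHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private Cow workingCow; // Cow as in the question; assumed to exist

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("cow")) {
            workingCow = new Cow();
        }
        text.setLength(0); // reset the buffer for each new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("color")) {
            workingCow.setColor(text.toString().trim());
        } else if (qName.equals("age")) {
            workingCow.setAge(text.toString().trim());
        }
    }
}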
Finally, as others have mentioned, if you could, it's a good idea to consider an XML marshalling library (e.g., JAXB, Castor, JIBX, XMLBeans, XStream to name a few).
The code looks fine to me. I say set breakpoints at the start of each function and watch it in the debugger or add some print statements. My gut tells me that either characters() is not being called or setColor() and setAge() don't work correctly, but that's just a guess.
I have to say that I'm not a big fan of this design.
However, are you sure that your characters method is ever called? (Maybe a few System.out.printlns would help.) If it's never called, you would end up with uninitialized cows.
Also, I would not try to implement an XML parser myself like this since you need to be more robust against validation issues.
You can use SAX or DOM4J, or even better, use Apache digester.
Also, if I have a schema I will use JaxB, or another code generator to speed up development of XML interface code. The code generators hide a lot of the complexity of working directly with SAX or DOM4J.
