I have written an application which outputs data as XML. However, it would be nice to allow the user to completely customize the output format so they can more easily integrate it into their applications.
What would be the best way to approach this problem? My initial thoughts are to define a grammar and write a parser from the ground up.
Are there any free Java libraries that can assist in parsing custom scripting (formatting?) languages?
Since I already have the XML, would it be a better approach to just 'convert' this with a search & replace algorithm?
I should specify here that 'users' are other programmers so defining a simple language would be fine, and that the output is potentially recursive (imagine outputting the contents of a directory to XML).
Just looking for general advice in this area before I set off down the wrong track.
EDIT: To clarify... My situation is a bit unique. The application outputs coordinates and other data to be loaded into a game engine. Everybody seems to use a different, completely custom format in their own engine. Most people do not want to implement a JSON parser and would rather use what they already have working. In other words, it is in the interests of my users to have full control over the output; asking them to implement a different parser is not an option.
Have you considered just using a templating engine like Velocity or FreeMarker?
I would have created a result bean as a POJO.
Then I would have different classes working on the result bean. That way you can easily extend with new formats if needed.
E.g.
Result result = logic.getResult();
XMLOutputter.output(result, "myXMLFile.xml");
Format1Outputter.output(result, "myFormat1File.fo1");
Format2Outputter.output(result, "myFormat2File.fo2");
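One way the outputter classes above could share a common shape is a small interface with one implementation per format. This is only a minimal sketch: ResultBean, ResultOutputter, and the two implementations are hypothetical names, and the fields (name, x, y) are assumed purely for illustration. No XML or CSV escaping is done here.

```java
// Hypothetical result bean; the fields are assumptions for illustration only.
class ResultBean {
    final String name;
    final double x;
    final double y;

    ResultBean(String name, double x, double y) {
        this.name = name;
        this.x = x;
        this.y = y;
    }
}

// One implementation per output format; adding a format means adding a class.
interface ResultOutputter {
    String format(ResultBean result);
}

class XmlResultOutputter implements ResultOutputter {
    @Override
    public String format(ResultBean r) {
        // Note: attribute values are not escaped in this sketch.
        return "<point name=\"" + r.name + "\" x=\"" + r.x + "\" y=\"" + r.y + "\"/>";
    }
}

class CsvResultOutputter implements ResultOutputter {
    @Override
    public String format(ResultBean r) {
        return r.name + "," + r.x + "," + r.y;
    }
}

class OutputterDemo {
    public static void main(String[] args) {
        ResultBean result = new ResultBean("spawn", 1.5, 2.0);
        ResultOutputter[] outputters = { new XmlResultOutputter(), new CsvResultOutputter() };
        for (ResultOutputter outputter : outputters) {
            System.out.println(outputter.format(result));
        }
    }
}
```

The calling code only depends on the interface, so a new format does not touch the logic that produces the result bean.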
If you are planning to provide this as an API to multiple parties, I would advise against allowing over-customization. It will add unnecessary complexity to your product and provide just one more place for bugs to be introduced.
Second, it will increase the complexity of your documentation and, as a side effect, likely cause your documentation to fall out of sync with the API in general.
The biggest thing I would suggest considering, in terms of making your stream easier to digest, is making the output available in JSON format, which just about every modern language has good support for (I use Gson for Java, myself).
Related
I have about a year of experience in coding in Java. To hone my skills I'm trying to write a Calendar/journal entry desktop app in Java. I've realized that I still have no experience in data persistence and still don't really understand what the data persistence options would be for this program -- So perhaps I'm jumping the gun, and the design choices that I'm hoping to implement aren't even applicable once I get into the nitty gritty.
I mainly want to write a calendar app that allows you to log daily journal entries with associated activity logs for time spent on daily tasks. In terms of adding, editing and viewing the journal entries, using a hash table with the dates of the entries as keys and the entries themselves as the values seems most Big-Oh efficient (O(1) average case for each using a hash table).
However, I'm also hoping to implement a feature that could, given a certain range of dates, provide a simple analysis of average amount of time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that perhaps a sorted array would be more Big-Oh efficient? Especially considering that the data entries are generally expected to already be added date by date.
Or perhaps there's another option I'm unaware of?
The reason I'm asking is because of the answer provided by this following question: Why not use hashing/hash tables for everything?
And the reason I'm unsure if I'm even asking the right question is because of the answer to the following question: Whats the best data structure for a calendar / day planner?
If so, I would really appreciate being directed to other resources on data persistence in Java.
Thank you for the help!
Use a NavigableMap interface (implemented by TreeMap, a red-black tree).
This allows you to easily and efficiently select date ranges and traverse over events in key order.
As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.
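As a rough illustration of the half-open range idea, here is a minimal sketch using TreeMap with LocalDate keys. The class name, the sample data, and the choice of "minutes per day" as the value are all assumptions made for the example; a real journal entry would be a richer object.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

class JournalRangeDemo {
    // Average of the logged minutes over the half-open range [from, to):
    // 'from' is included, 'to' is excluded. Returns 0.0 if no day is logged.
    static double averageMinutes(NavigableMap<LocalDate, Integer> minutesByDate,
                                 LocalDate from, LocalDate to) {
        return minutesByDate.subMap(from, true, to, false)
                .values().stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
    }

    // Made-up sample data: minutes spent on one task per day.
    static NavigableMap<LocalDate, Integer> sample() {
        NavigableMap<LocalDate, Integer> log = new TreeMap<>();
        log.put(LocalDate.of(2024, 1, 1), 30);
        log.put(LocalDate.of(2024, 1, 2), 60);
        log.put(LocalDate.of(2024, 1, 3), 90);
        log.put(LocalDate.of(2024, 1, 4), 120);
        return log;
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, Integer> log = sample();
        // Jan 1 and Jan 2 are included, Jan 3 is excluded: prints 45.0
        System.out.println(averageMinutes(log, LocalDate.of(2024, 1, 1), LocalDate.of(2024, 1, 3)));
        // The next interval starts exactly where the previous one ended: prints 105.0
        System.out.println(averageMinutes(log, LocalDate.of(2024, 1, 3), LocalDate.of(2024, 1, 5)));
    }
}
```

Because the upper bound is excluded, consecutive ranges such as [Jan 1, Jan 3) and [Jan 3, Jan 5) cover every entry exactly once.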
Depends on how serious you want your project to be. In all cases, be careful of premature optimization. This is when you try too hard to make your code "efficient", and sacrifice readability/maintainability in the process. For example, there is likely a way of doing manual memory management with native code to make a more efficient implementation of a data structure for your calendar, but it likely does not outweigh the benefits of using familiar APIs etc. It might do, but you only know when you run your code.
Write readable code
Run it, test for performance issues
Use a profiler (e.g. JProfiler) to identify the code that is responsible for poor performance
Optimise that code
Repeat
For code that will "work", but will not be very scalable, a simple List will usually do fine. You can use JSON to store your objects, and a library such as Jackson Databind to map between List and JSON. You could then simply save it to a file for persistence.
For an application that you want to be more robust and protected against data corruption, a database is probably better. With this, you can guarantee that, for example, data is not partially written, concurrent access to the same data will not result in corruption, and a whole host of other benefits. However, you will need to have a database server running alongside your application. You can use JDBC and suitable drivers for your database vendor (e.g. MySQL) to connect to, read from and write to the database.
For a serious application, you will probably want to create an API for your persistence. A framework like Spring is very helpful for this, as it allows you to declare REST endpoints using annotations, and introduces useful programming concepts, such as containers, IoC/Dependency Injection, Testing (unit tests and integration tests), JPA/ORM systems and more.
Like I say, this is all context dependent, but above all else, avoid premature optimization.
This thread might give you some ideas what data structure to use for Range Queries.
Data structure for range query
It might even be easier to use a database and query for the desired range through its API.
If you are using (or are able to use) Guava, you might consider using RangeMap (*).
This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".
One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).
(*) I work for Google, Guava is Google's open-sourced Java library. This is the library I would use, but others with similar range map offerings are available.
Relatively simple question. I need to translate/localize a legacy Java application.
Our company's newer applications use .properties files in Java for localizing their strings, a concept very similar to .resx files in C# (which some of our products also use).
The problem is this is a legacy product that was around before we started thinking about localization. It is full of hard coded strings and also various forms of hard-coded string concatenation/formatting.
As far as I am aware I have a very daunting task of pulling all our strings and formatting into .properties files in the product and then referencing those in the code.
Personally I have no huge issue doing this work, but I want to make sure I am not missing something.
So I have a couple general questions.
Is there a faster way of converting my product to use the .properties files? Off the top of my head I could write a script that would automate maybe 30-40% of the work...
Are there any "gotchas" I should be worried about specific to converting a legacy product (I am not looking for general localization "gotchas" which I can google for, but anything specific to this scenario)?
Finally, are there any completely different strategies I am overlooking for localization? This is just how we translate our existing products, but because this is a legacy product (and on the agenda to be re-written) this is essentially throw-away code and I could do pretty much whatever I want. Including just finding the cheapest dirtiest fastest way possible, although I am obviously leaning toward doing the job properly.
Any thoughts, people?
As a guideline I would say try to keep answers focused on the questions being asked, but any informational contributions or questions are always welcome in comments.
No, there is no faster way. You have to go through the code line by line.
There are plenty of gotchas, since internationalization is about more than just string constants.
You may already know that number formats and date formats need to be localized, but you'll need to be on the lookout for numbers and dates being embedded into strings via concatenation or StringBuilder.append calls. You'll also need to be on the lookout for implicit toString() calls, such as when a Number or Date is supplied as a Swing model value (for example, returning a Number from the TableModel.getValueAt method), or when a JSP or JSF EL expression refers to such a value directly instead of formatting it.
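To make the concatenation problem concrete, here is a small sketch comparing an implicit toString() with java.text.NumberFormat. The class and method names are made up for the example; only NumberFormat and Locale come from the JDK.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberLocalizationDemo {
    // The kind of legacy code to look for: the double is converted with an
    // implicit toString(), which ignores the user's locale entirely.
    static String concatenated(double value) {
        return "Total: " + value;
    }

    // Locale-aware replacement for the number part.
    static String localized(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(concatenated(1234.5));              // Total: 1234.5
        System.out.println(localized(1234.5, Locale.US));      // 1,234.5
        System.out.println(localized(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```

The same check applies to dates: anything that reaches the user through string concatenation or append() is worth a second look.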
Similarly, keep an eye out for enum constants directly displayed to the user, implicitly invoking their toString() method.
Creating sentences through string concatenation is a problem not only because of the formatting of numbers, dates, and enums, but also because other languages may have different ordering of sentence structure. Such string concatenation should be replaced with localized MessageFormats.
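A minimal sketch of what replacing concatenation with MessageFormat can look like. The two patterns are inlined here only to keep the example self-contained; in a real application each would live in a locale-specific .properties file and be loaded through ResourceBundle. The second pattern is an English stand-in for a translation that needs the arguments in a different order.

```java
import java.text.MessageFormat;

class LocalizedMessageDemo {
    // Stand-ins for entries in locale-specific .properties files (assumed keys/values).
    static final String PATTERN_A = "{0} deleted {1}.";
    static final String PATTERN_B = "File {1} was removed by user {0}.";

    // The calling code passes the same arguments regardless of locale;
    // the pattern alone decides where each one ends up in the sentence.
    static String render(String pattern, String user, String file) {
        return MessageFormat.format(pattern, user, file);
    }

    public static void main(String[] args) {
        System.out.println(render(PATTERN_A, "Alice", "report.txt"));
        // prints: Alice deleted report.txt.
        System.out.println(render(PATTERN_B, "Alice", "report.txt"));
        // prints: File report.txt was removed by user Alice.
    }
}
```

One thing to watch when moving strings into patterns: a single quote is an escape character in MessageFormat, so a literal apostrophe has to be written as two single quotes.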
Keystrokes need to be localized, including all mnemonics (and accelerators if it's a desktop app).
Layouts are an issue. Places where the application assumes left-to-right orientation are something you'll want to address; even if you're only planning to localize for other left-to-right languages, you probably know that putting off good i18n practices is asking for trouble later down the line.
If your app is a Swing application, you'll want to convert LEFT/WEST and RIGHT/EAST layout constraints to LINE_START and LINE_END. If your app is a web application, you'll need to factor out margin-left, margin-right, padding-left, padding-right, border-left, and border-right (and probably many others I'm forgetting) into lang-specific CSS blocks.
Swing apps also need to call applyComponentOrientation after building each window, usually right before calling pack().
Some programmers like to store parts of a UI in a database. I'm not talking about user content (which you shouldn't localize); I'm talking about label text, window titles, layout constraints, and so on. I have a hearty dislike for that practice, personally, but people do it. If your app is doing that, I guess either the database table needs a locale column, or the practice of storing the UI in the database needs to be removed entirely.
To answer your final question, if there are any better strategies than stepping through the code, I've never heard of them. You could just search for double-quote characters in the code, of course. I suppose the choice depends on how professional and polished your superiors want the application to look.
One thing I've learned is that throw-away code often isn't. Don't be surprised if that rewrite ends up trying to salvage large swaths of code from the legacy version.
My question is both a language implementation question and an ANTLR4 API question. Is there a way I can modify a ParseTree and its accompanying TokenStream?
Here is the scenario. I have a simple language that defines a dataflow program. You can see it on github, if you're curious. I lex and parse the language with ANTLR4. I use listeners to walk the parse tree and evaluate the code.
The problem I have most recently run into is I need to be able to modify the code at runtime. I need to be able to define new objects and create instances from them. Note, I'm not referring to having reflection in the language. I'm referring to having a program like an IDE modify the internal representation of the source code.
I have started off down the path of defining a bunch of definition objects to create an AST, but I just realized this approach will require me to come up with my own solutions for walking the AST. Rather than reinvent the wheel, I'd rather use ANTLR's listeners/visitors.
Another problem I face is the need to be able to output the current state of the AST as code at any point in time (the tool I'm embedding the language in needs to be able to save). I am using StringTemplate to generate the code from my definition objects. I think I should be able to make ST render the parse tree.
In general, I need to be able to lex, parse, evaluate, refactor, evaluate, and generate code all from within my runtime.
Rather than create my own definition objects, I'm wondering what the best approach is to modify the ParseTree/TokenStreams?
I checked out your language. It looks pretty simple, and I'm assuming it is.
From your description I'm working on the basis that the IDE will operate directly on the tree. Given that, you need:
A parser for your language, to convert source code into a tree. ANTLR can do this, but you may need to build your own tree rather than rely on what is provided. Writing your own parser is not that hard.
Tree rewriting rules. A series of permitted transformations of the tree that ensure it remains valid. This can be in the form of an API that operates on the tree. This http://www.program-transformation.org/Transform/TreeRewriting may be helpful.
Source code generation. The ability to reconstruct source code from the tree. If you need to preserve comments and formatting then either (a) include them in the grammar or (b) merge generated source code with authored code.
The end result is that you can round-trip any change in either the source code or the generated tree.
If you think this is a lot of work, it is. However, it is the basis for many other tools that aim to provide these facilities. The saving grace is that your language is simple.
We have no means to alter parse tree safely at the moment. It's best to derive a new version from the old. You should also look at the token stream rewrite engine thingie.
Ter
The short answer to your question is yes, but as you surmise, it will be painful and fragile.
A far more workable alternative, unless you have a hard requirement explicitly to modify the 'internal' representation of the source code/parse tree, is to simply re-parse the modified source code between (virtually) every keystroke. Just grab the current contents of the active editor as a String and feed it as a stream to the Lexer.
While this may sound like an expensive operation, ANTLR is actually quite fast. Eclipse in particular works well with this approach - I have used it with a number of DSL editors without any noticeable impact on editor performance. The parse occurs entirely on a background thread. Editor problemMarkers are only updated when there is a sufficient pause in the foreground editing thread. NetBeans should be similar.
Let me describe the problem. A lot of suppliers send us data files in various formats (with various headers). We do not have any control on the data format (what columns the suppliers send us). Then this data needs to be converted to our standard transactions (this standard is constant and defined by us).
The challenge here is that we do not have any control on what columns suppliers send us in their files. The destination standard is constant. Now I have been asked to develop a framework through which the end users can define their own data transformation rules through UI. (say field A in destination transaction is equal to columnX+columnY or first 3 characters of columnZ from input file). There will be many such data transformation rules.
The goal is that the users should be able to add all these supplier files (and convert all their data to my company data from front end UI with minimum code change). Please suggest me some frameworks for this (preferably java based).
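To make the kind of rule described above concrete, here is a minimal sketch of rules held as data rather than hard-coded per supplier. Everything in it is hypothetical: the class name, the field and column names, and the reading of "columnX+columnY" as string concatenation. A real framework would build the rule map from whatever the users define in the UI.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class MappingRulesDemo {
    // Destination field name -> rule that computes its value from one input row.
    static Map<String, Function<Map<String, String>, String>> exampleRules() {
        Map<String, Function<Map<String, String>, String>> rules = new LinkedHashMap<>();
        // fieldA = columnX + columnY (assumed here to mean concatenation)
        rules.put("fieldA", row -> row.get("columnX") + row.get("columnY"));
        // fieldB = first 3 characters of columnZ
        rules.put("fieldB", row -> firstChars(row.get("columnZ"), 3));
        return rules;
    }

    static String firstChars(String value, int count) {
        return value.length() <= count ? value : value.substring(0, count);
    }

    // Applies every rule to a single supplier row and returns the standard transaction.
    static Map<String, String> transform(Map<String, String> row,
                                         Map<String, Function<Map<String, String>, String>> rules) {
        Map<String, String> transaction = new LinkedHashMap<>();
        rules.forEach((field, rule) -> transaction.put(field, rule.apply(row)));
        return transaction;
    }

    // Made-up supplier row for the demo.
    static Map<String, String> sampleRow() {
        Map<String, String> row = new HashMap<>();
        row.put("columnX", "AB");
        row.put("columnY", "12");
        row.put("columnZ", "SUPPLIER");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(transform(sampleRow(), exampleRules()));
        // prints: {fieldA=AB12, fieldB=SUP}
    }
}
```

The sketch does no validation (missing columns, dates, monetary formats), which is where most of the real effort in such a framework tends to go.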
Worked in a similar field before. Not sure if I would trust customers/suppliers to use such a tool correctly and design 100% bulletproof transformations. Mapping columns is one thing, but how about formatting problems in dates, monetary values and the likes? You'd probably need to manually check their creations anyway or you'll end up with some really nasty data consistency issues. Errors caused by faulty data transformation are little beasts hiding in the dark and jumping at you when you need them the least.
If all you need is a relatively simple, graphical way to design data conversions, check out something like Talend Open Studio (just google it). It calls itself an ETL tool, but we used it for all kinds of stuff.
I have a piece of xml that contains optional non-enumerated elements, so schema validation does not catch invalid values. However, this xml is transformed into a different format after validation and is then handed off to a system that tries to store the information in a database. At this point, some of the values that were optional in the previous format are now coded values in the database that will throw a foreign key constraint exception if we try to store them. So, I need to build a process in a J2EE app that will check the values at a set of XPaths against the values that are allowable at those spots and, if they are not valid, either remove them, replace them, or remove them and their parents, depending on schema restrictions.
I have a couple options that will work, but neither of them seem like very elegant/intuitive solutions.
Option #1 would involve doing the work in XSLT 1.0: before sending the xml through the xslt, query the acceptable values and pass the lists in as parameters, then place tests at the appropriate locations that compare the incoming value against the acceptable ones and generate the xml accordingly.
This option doesn't seem very reusable, but it'd be very quick to implement.
Option #2 would involve Java code and an xml config file. The xml config file would lay out the xpaths of the needed tests, the acceptable values, the default values (if applicable) and what to take out of the doc if the tests fail.
This option is much more reusable, but would probably double the time needed to build it.
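For a sense of scale, the core check in Option #2 could look roughly like the sketch below, using only the JDK's javax.xml.xpath and DOM APIs. The class name, the sample document, and the allowed values are hypothetical. It handles only the "remove the element" case and assumes the XPath matches elements (not attributes); reading the XPaths and allowed values from a config file is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CodedValueCleaner {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluates the XPath and copies the matches into a plain list, so the
    // document can be modified safely while iterating over the result.
    static List<Node> select(Document doc, String xpath) {
        try {
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<Node> nodes = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                nodes.add(found.item(i));
            }
            return nodes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }

    // Removes every matched element whose text is not an allowed value.
    // Returns how many elements were removed.
    static int removeInvalid(Document doc, String xpath, Set<String> allowed) {
        int removed = 0;
        for (Node node : select(doc, xpath)) {
            if (!allowed.contains(node.getTextContent().trim())) {
                node.getParentNode().removeChild(node);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<order><status>OPEN</status><status>BOGUS</status></order>");
        Set<String> allowed = new HashSet<>(Arrays.asList("OPEN", "CLOSED"));
        System.out.println(removeInvalid(doc, "/order/status", allowed)); // prints 1
    }
}
```

The replace and remove-parent variants, plus the config parsing, are what would account for most of the extra build time mentioned above.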
So, which one of these would you pick? Or do you have another idea altogether? I'm open to all suggestions and would love to hear how you would handle this.
Sounds to me like option 2 is over-engineering. Do you have a clear idea about when you will want to reuse this functionality? If not, YAGNI, so go for the simpler and easier solution.
Both options are acceptable. Depending on your skills and the complexity of your XML, I would say that it will require about the same amount of time.
Option 1 would, in my opinion, be more flexible and easier to maintain in the long run.
Option 2 could be tricky in some cases: how do you define the config file itself for complex rules, and how do you parse it without having to write complex code? One could say, I'll use a dom4j visitor and I'll be done with it. However, option 2 could become unnecessarily complicated imho if you deal with a complex XML structure.
I agree here. It felt like it was borderline over-engineering, but I was afraid that someone hearing that this was done would assume that it would be reusable and attempt to design something that used it in the future. However, I have since been reassured that this is a one-time deal and thus, will be going with the xslt approach.
Thanks all for your comments/answers!