Removing Optional Elements from XML when invalid


I have a piece of XML that contains optional, non-enumerated elements, so schema validation does not catch invalid values. However, after validation this XML is transformed into a different format and handed off to a system that stores the information in a database. At that point, some of the values that were optional in the previous format become coded values in the database, and trying to store an invalid one throws a foreign key constraint exception. So I need to build a process in a J2EE app that checks the values at a set of XPaths against the values allowed at those spots and, if they are not valid, either removes them, replaces them, or removes them together with their parents, depending on the schema restrictions.
I have a couple of options that will work, but neither of them seems like a very elegant or intuitive solution.
Option #1 would involve doing the work in XSLT 1.0: before sending the XML through the XSLT, query the acceptable values and pass the lists in as parameters. Then place tests at the appropriate locations in the stylesheet that compare the incoming value against the acceptable ones and generate the XML accordingly.
This option doesn't seem very reusable, but it'd be very quick to implement.
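A minimal sketch of option #1, assuming a hypothetical order document with an optional status element (the element names and the allowedStatus parameter are illustrative, not from the question). XSLT 1.0 has no list type, so the acceptable values are passed as one "|"-delimited string parameter and checked with contains(). XSLT 1.0 also forbids variable references in match patterns, so the test lives inside the template rather than in its match attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltFilter {
    // Identity transform plus one template that copies <status> only when
    // its value appears in the delimited allowedStatus parameter.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:param name='allowedStatus' select=\"''\"/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='status'>"
      + "<xsl:if test='contains($allowedStatus, concat(\"|\", normalize-space(.), \"|\"))'>"
      + "<xsl:copy-of select='.'/>"
      + "</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // In the real process, allowedValues would be queried from the database first.
    static String filter(String xml, List<String> allowedValues) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            // "|OPEN|CLOSED|": delimiters on both ends prevent partial matches
            // (assumes no allowed value itself contains '|').
            t.setParameter("allowedStatus", "|" + String.join("|", allowedValues) + "|");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><id>1</id><status>PENDING</status></order>";
        System.out.println(filter(xml, List.of("OPEN", "CLOSED")));  // status dropped
        System.out.println(filter(xml, List.of("OPEN", "PENDING"))); // status kept
    }
}
```

Removing a parent instead works the same way: match the parent element and wrap its xsl:copy in the same kind of test on the child's value.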
Option #2 would involve Java code and an XML config file. The config file would lay out the XPaths of the needed tests, the acceptable values, the default values (if applicable), and what to remove from the document if a test fails.
This option is much more reusable, but would probably double the time needed to build it.
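A minimal sketch of option #2, assuming Java 17 (records, arrow-style switch) and the JDK's built-in DOM and XPath APIs. The Rule record is a hypothetical stand-in for one entry of the config file; parsing that file is left out, and the rule list is built in code instead. Element names and the three actions mirror the question but are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RuleBasedCleaner {
    enum Action { REMOVE, REPLACE_WITH_DEFAULT, REMOVE_WITH_PARENT }

    // Hypothetical stand-in for one <rule> entry of the XML config file.
    record Rule(String xpath, Set<String> allowed, Action action, String defaultValue) {}

    static String clean(String xml, List<Rule> rules) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Rule rule : rules) {
                NodeList found = (NodeList) xpath.evaluate(rule.xpath(), doc, XPathConstants.NODESET);
                // Copy the matches first so removing nodes cannot disturb the iteration.
                List<Node> nodes = new ArrayList<>();
                for (int i = 0; i < found.getLength(); i++) nodes.add(found.item(i));
                for (Node node : nodes) {
                    if (rule.allowed().contains(node.getTextContent().trim())) continue;
                    switch (rule.action()) {
                        case REMOVE -> detach(node);
                        case REPLACE_WITH_DEFAULT -> node.setTextContent(rule.defaultValue());
                        case REMOVE_WITH_PARENT -> detach(node.getParentNode());
                    }
                }
            }
            return serialize(doc);
        } catch (Exception e) {
            throw new IllegalStateException("cleaning failed", e);
        }
    }

    // Removes a node from its parent; a no-op if it is already detached.
    private static void detach(Node node) {
        if (node != null && node.getParentNode() != null) {
            node.getParentNode().removeChild(node);
        }
    }

    private static String serialize(Document doc) throws Exception {
        var t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<order><status>BOGUS</status>"
            + "<item><code>X9</code></item><item><code>A1</code></item></order>";
        List<Rule> rules = List.of(
            new Rule("/order/status", Set.of("OPEN", "CLOSED"), Action.REPLACE_WITH_DEFAULT, "OPEN"),
            new Rule("/order/item/code", Set.of("A1", "B2"), Action.REMOVE_WITH_PARENT, null));
        System.out.println(clean(xml, rules));
    }
}
```

Most of the extra effort of this option sits in what the sketch leaves out: designing and parsing the config format itself.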
So, which one of these would you pick? Or do you have another idea altogether? I'm open to all suggestions and would love to hear how you would handle this.

Sounds to me like option 2 is over-engineering. Do you have a clear idea of when you will want to reuse this functionality? If not, YAGNI (You Aren't Gonna Need It), so go for the simpler and easier solution.

Both options are acceptable. Depending on your skills and the complexity of your XML, I would say that it will require about the same amount of time.
In my opinion, option 1 would be more flexible and easier to maintain in the long run.
Option 2 could be tricky in some cases: how do you define the config file itself for complex rules, and how do you parse it without having to write complex code? One could say, "I'll use a dom4j visitor and be done with it." However, IMHO option 2 could become unnecessarily complicated if you are dealing with a complex XML structure.

I agree here. It felt like borderline over-engineering, but I was afraid that someone hearing this had been done would assume it was reusable and try to design something around it in the future. However, I have since been reassured that this is a one-time deal, so I will be going with the XSLT approach.
Thanks all for your comments/answers!

Related

React to console inputs (commands) - how to handle multiple options most efficient?

Some background (I don't want to confuse you with a lot of messy code):
I've written a fairly large console program (my largest project so far) that helps me a lot with managing accounts, assets and more. I'm constantly adding features, and at the same time I'm refactoring the code to improve my poor coding style.
The console program has a lot of commands the user can type, and each command calls different methods, creates or manipulates objects, and so on.
My keywords are stored in an ArrayList<String>, and my commands have the form: [keyword] [...n more Strings]
DESIGN PROBLEM 1:
I have a method cmdProcessor(String[] arguments) that handles the user's input (command), and the [keyword] is always the first argument, arguments[0]. That means I have a large number of if statements of this form:
if (arguments[0].equalsIgnoreCase("keyword")) callMethod(argmts); where the String[] argmts holds the remaining arguments[1] ... [n].
Is this a good way to handle this or should I go with switch-case?
Or something else (and if so, what)? Would it be better to store the keywords in a HashMap<String, Method>?
DESIGN PROBLEM 2:
The methods triggered by the entered keyword (see callMethod(argmts) above) look even more chaotic. Since the same method can receive different numbers and forms of arguments in the String[] argmts, it is full of if (argmts.length == ...) checks, and each of these if blocks has a bunch of switch-case options that in turn contain a lot of ifs, and so on. I always use the final else and the default case of each switch for error handling (throwing error codes and an explanation of why the pattern doesn't match, and so on).
Is this good or are there better ways?
I thought about using lots of submethods, which would also bloat my program and cost a lot of time, but might improve readability and give a better overview. Is this okay, or what is the best option in such cases (lots of ifs and switch-case)?
Since I want to build more and more around this program maybe I should start now to fix bad design before it's too late. :)

About Design Problem 1:
My go-to would be to register a set of handlers that share a common interface, each implementing its specific behavior individually. This is good because the central method handling your input stays slim, and you only need to register the singleton handlers once, at initialization. Disadvantage: if you forget to register one, that command will not work. So you might register them automatically (via reflection or something similar).
Apart from that, a map is better than a list in this case because (I assume) you don't need any ordering. You need a mapping from key to behavior, so a map fits better (though even with a very large set of keywords, sticking to a list would probably not be very inefficient).
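A minimal sketch of the handler-plus-map approach, assuming a hypothetical CommandHandler interface (not from any library) and made-up commands. Keys are lower-cased on registration and on lookup, which reproduces the case-insensitive equalsIgnoreCase check from the question. Compared with HashMap<String, Method>, handlers behind an interface are type-checked at compile time and need no reflection to invoke.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CommandDispatcher {
    // Common interface every command handler implements.
    interface CommandHandler {
        String handle(String[] args);
    }

    private final Map<String, CommandHandler> handlers = new HashMap<>();

    // Register each handler once at start-up.
    void register(String keyword, CommandHandler handler) {
        handlers.put(keyword.toLowerCase(Locale.ROOT), handler);
    }

    // Replaces the long if/else chain: one map lookup, then the remaining
    // arguments (arguments[1..n]) are handed to the matching handler.
    String process(String[] arguments) {
        if (arguments.length == 0) return "error: empty command";
        CommandHandler h = handlers.get(arguments[0].toLowerCase(Locale.ROOT));
        if (h == null) return "error: unknown command '" + arguments[0] + "'";
        return h.handle(Arrays.copyOfRange(arguments, 1, arguments.length));
    }

    public static void main(String[] args) {
        CommandDispatcher d = new CommandDispatcher();
        // Hypothetical commands, registered as lambdas.
        d.register("echo", a -> String.join(" ", a));
        d.register("count", a -> String.valueOf(a.length));
        System.out.println(d.process("ECHO hello world".split(" "))); // hello world
        System.out.println(d.process(new String[] {"foo"}));          // error: unknown command 'foo'
    }
}
```

Larger commands would get their own class implementing CommandHandler instead of a lambda, which also answers Design Problem 2's question about splitting the code into smaller methods.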
About Design Problem 2:
If I were you, I'd use actual regular expression patterns. Take a look at the java.util.regex.Pattern class. You can isolate groups and validate the values you receive. Though it does not spare you the exception and error handling, it helps a lot with splitting up and interpreting the input.
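A minimal sketch of the regex idea, assuming a hypothetical "transfer <amount> from <account> to <account>" command. The pattern validates the whole shape at once and named groups pull out the values, so one failed match replaces the nested length and switch checks. Since the question's input is already split into a String[], the arguments would be re-joined (for example with String.join(" ", argmts)) before matching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandPatterns {
    // Hypothetical syntax: transfer <amount> from <account> to <account>
    static final Pattern TRANSFER = Pattern.compile(
        "transfer\\s+(?<amount>\\d+(?:\\.\\d{1,2})?)\\s+from\\s+(?<from>\\w+)\\s+to\\s+(?<to>\\w+)",
        Pattern.CASE_INSENSITIVE);

    static String parse(String input) {
        Matcher m = TRANSFER.matcher(input.trim());
        if (!m.matches()) {
            // A single place for the error message instead of many else/default branches.
            return "error: expected 'transfer <amount> from <account> to <account>'";
        }
        return "amount=" + m.group("amount") + ", from=" + m.group("from") + ", to=" + m.group("to");
    }

    public static void main(String[] args) {
        System.out.println(parse("transfer 100.50 from savings to checking"));
        System.out.println(parse("transfer lots from savings"));
    }
}
```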

XML versus JSON for passing complete tree to decision based dynamic form on the UI

Please bear with this naive question.
The objective is to create a UI whose basic workflows can be changed dynamically: adding another option, adding another step, etc.
As an example, one simple workflow could be as follows:
What's the age of 'X'? TEXT_BOX
if(age>18) proceed to step 2.
else, go to next page (let's say, same processing happens again with different value of X).
What does 'X' do?
a. Job
b. Business
Submit (Go to next 'X').
I want to keep this workflow in XML as a complete tree (all branches of if/else-if/else) and pass it to the UI for rendering. Some of the subtrees will be populated according to the action performed in the previous step.
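A minimal sketch of the "complete tree" idea, assuming a hypothetical XML format in which every step carries all of its branches, so the client can choose the next step locally without another round trip. The element and attribute names (step, branch, ifAgeOver, goto) are invented for illustration; only the JDK's DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WorkflowTree {
    // The workflow from the question; the last branch of a step acts as its "else".
    static final String WORKFLOW =
        "<workflow>"
      + "<step id='age' question='What is the age of X?' input='TEXT_BOX'>"
      + "<branch ifAgeOver='18' goto='occupation'/>"
      + "<branch goto='next-person'/>"
      + "</step>"
      + "<step id='occupation' question='What does X do?' input='CHOICE'>"
      + "<option>Job</option><option>Business</option>"
      + "</step>"
      + "</workflow>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid workflow XML", e);
        }
    }

    // Returns the id of the step to show next: the first branch whose condition
    // holds (a branch without a condition always matches).
    static String nextStep(Document doc, String stepId, int age) {
        NodeList steps = doc.getElementsByTagName("step");
        for (int i = 0; i < steps.getLength(); i++) {
            Element step = (Element) steps.item(i);
            if (!step.getAttribute("id").equals(stepId)) continue;
            NodeList branches = step.getElementsByTagName("branch");
            for (int j = 0; j < branches.getLength(); j++) {
                Element branch = (Element) branches.item(j);
                String limit = branch.getAttribute("ifAgeOver"); // "" if absent
                if (limit.isEmpty() || age > Integer.parseInt(limit)) {
                    return branch.getAttribute("goto");
                }
            }
        }
        return null; // unknown step or no matching branch
    }

    public static void main(String[] args) {
        Document doc = parse(WORKFLOW);
        System.out.println(nextStep(doc, "age", 25)); // occupation
        System.out.println(nextStep(doc, "age", 15)); // next-person
    }
}
```

Changing the workflow then means editing the XML, not the code, as long as new steps only use condition types the client already understands.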
This way, small modifications or workflow changes will not require any code changes.
The other option is to use JSON and pass it directly (rather than converting XML to JSON and passing that), but it would be loosely coupled and could be difficult to manage in the future.
Is there any clear-cut benefit I should consider before choosing one of them?
Will any of them provide any extra benefit in the problem I am trying to solve?
Thanks,

As far as I can think, the structure you plan to have in XML can be mirrored as a JSON string without any hurdles. (If you think of any, please point them out and I might be able to help with an idea for handling them.)
JSON will usually save you bandwidth if the data you are planning to send is large, since it has no closing tags. This in turn reduces the round-trip time and makes your application more responsive.
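To make the bandwidth point concrete, a small sketch that encodes one hypothetical workflow step in element-style XML and in JSON and compares their UTF-8 sizes. The savings depend on the structure (attribute-heavy XML comes closer to JSON), and HTTP compression typically narrows the gap further.

```java
import java.nio.charset.StandardCharsets;

public class PayloadSize {
    // The same (hypothetical) workflow step encoded both ways.
    static final String XML =
        "<step><id>age</id><question>What is the age of X?</question>"
      + "<input>TEXT_BOX</input><branch><ifAgeOver>18</ifAgeOver>"
      + "<goto>occupation</goto></branch></step>";

    static final String JSON =
        "{\"id\":\"age\",\"question\":\"What is the age of X?\","
      + "\"input\":\"TEXT_BOX\",\"branch\":{\"ifAgeOver\":18,"
      + "\"goto\":\"occupation\"}}";

    static int bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // XML repeats every element name in its closing tag; JSON does not.
        System.out.println("XML:  " + bytes(XML) + " bytes");
        System.out.println("JSON: " + bytes(JSON) + " bytes");
    }
}
```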

What is a good framework to implement data transformation rules through UI

Let me describe the problem. A lot of suppliers send us data files in various formats (with various headers). We do not have any control over the data format (which columns the suppliers send us). This data then needs to be converted to our standard transactions (this standard is constant and defined by us).
The challenge here is that we have no control over which columns suppliers send in their files, while the destination standard is constant. Now I have been asked to develop a framework through which end users can define their own data transformation rules through a UI (say, field A in the destination transaction equals columnX + columnY, or the first 3 characters of columnZ from the input file). There will be many such data transformation rules.
The goal is that users should be able to add all these supplier files and convert all their data to our company's format from the front-end UI, with minimal code changes. Please suggest some frameworks for this (preferably Java-based).
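One way to model such rules, sketched under assumptions: each destination field gets a function from a supplier row (column name to raw value) to the field's value. In a UI-driven framework those functions would be built from rule definitions stored per supplier rather than hard-coded as lambdas; the field and column names are the hypothetical ones from the question, and "columnX + columnY" is read here as string concatenation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class TransformationRules {
    // Destination field -> rule computing it from one supplier row.
    static Map<String, Function<Map<String, String>, String>> rulesForSupplierA() {
        Map<String, Function<Map<String, String>, String>> rules = new LinkedHashMap<>();
        // "field A = columnX + columnY" (concatenation in this sketch)
        rules.put("fieldA", row -> row.get("columnX") + row.get("columnY"));
        // "field B = first 3 characters of columnZ"
        rules.put("fieldB", row -> {
            String z = row.getOrDefault("columnZ", "");
            return z.substring(0, Math.min(3, z.length()));
        });
        return rules;
    }

    // Applies every rule to one row, producing a destination record.
    static Map<String, String> apply(Map<String, Function<Map<String, String>, String>> rules,
                                     Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        rules.forEach((field, rule) -> out.put(field, rule.apply(row)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = Map.of("columnX", "AB", "columnY", "12", "columnZ", "XYZ789");
        System.out.println(apply(rulesForSupplierA(), row)); // {fieldA=AB12, fieldB=XYZ}
    }
}
```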

I worked in a similar field before. I'm not sure I would trust customers or suppliers to use such a tool correctly and design 100% bulletproof transformations. Mapping columns is one thing, but what about formatting problems in dates, monetary values and the like? You'd probably need to check their creations manually anyway, or you'll end up with some really nasty data consistency issues. Errors caused by faulty data transformation are little beasts hiding in the dark, jumping at you when you need them least.
If all you need is a relatively simple, graphical way to design data conversions, check out something like Talend Open Studio (just google it). It calls itself an ETL tool, but we used it for all kinds of things.
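To illustrate the date-formatting pitfall, a small java.time sketch: the same supplier value parses without error under two plausible patterns but yields two different dates, so a column mapping that looks correct can still silently corrupt data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class AmbiguousDates {
    static LocalDate parse(String raw, String pattern) {
        return LocalDate.parse(raw, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String raw = "03/04/2012"; // a value as it might arrive in a supplier file
        // Both patterns accept the input; neither throws an exception.
        System.out.println(parse(raw, "MM/dd/yyyy")); // 2012-03-04 (March 4)
        System.out.println(parse(raw, "dd/MM/yyyy")); // 2012-04-03 (April 3)
    }
}
```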

Simplest way to allow users to specify output format

I have written an application which outputs data as XML. However, it would be nice to allow the user to completely customize the output format so they can more easily integrate it into their applications.
What would be the best way to approach this problem? My initial thoughts are to define a grammar and write a parser from the ground up.
Are there any free Java libraries that can assist in parsing custom scripting (formatting?) languages?
Since I already have the XML, would it be a better approach to just 'convert' this with a search & replace algorithm?
I should specify here that 'users' are other programmers so defining a simple language would be fine, and that the output is potentially recursive (imagine outputting the contents of a directory to XML).
Just looking for general advice in this area before I set off down the wrong track.
EDIT: To clarify... My situation is a bit unique. The application outputs coordinates and other data to be loaded into a game engine. Everybody seems to use a different, completely custom format in their own engine. Most people do not want to implement a JSON parser and would rather use what they already have working. In other words, it is in the interests of my users to have full control over the output, asking them to implement a different parser is not an option.

Have you considered just using a templating engine like Velocity or FreeMarker?
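Velocity and FreeMarker add loops, conditionals and macros on top of this, but the core idea can be sketched with the JDK alone (Java 9+ for Matcher.replaceAll with a function): the user supplies a header, a per-item template and a footer containing ${name} placeholders, and the application fills them in. The TinyTemplate class and its placeholder syntax are hypothetical; nested (recursive) output like a directory tree is where a real engine's macros pay off.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${key} with its value; unknown keys are left untouched.
    static String fill(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(
            m -> Matcher.quoteReplacement(values.getOrDefault(m.group(1), m.group())));
    }

    // Header once, the item template once per item, footer once.
    static String render(String header, String item, String footer,
                         List<Map<String, String>> items) {
        StringBuilder sb = new StringBuilder(header);
        for (Map<String, String> values : items) sb.append(fill(item, values));
        return sb.append(footer).toString();
    }

    public static void main(String[] args) {
        // A user-defined format for some game engine (illustrative only).
        String out = render("BEGIN\n", "obj ${name} ${x} ${y}\n", "END\n", List.of(
            Map.of("name", "tree", "x", "1", "y", "2"),
            Map.of("name", "rock", "x", "3", "y", "4")));
        System.out.print(out);
    }
}
```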

I would create a result bean as a POJO.
Then I would have different classes working on the result bean. That way you can easily extend it with new formats if needed.
For example:
Result result = logic.getResult();
XMLOutputter.output(result, "myXMLFile.xml");
Format1Outputter.output(result, "myFormat1File.fo1");
Format2Outputter.output(result, "myFormat2File.fo2");
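A self-contained version of that idea, assuming Java 17 records. Result, Point and the outputter classes are hypothetical names (deliberately distinct from JDOM's real XMLOutputter, which has a different API). Each outputter returns a String here; writing it to a file is a single Files.writeString call.

```java
import java.util.List;

public class Outputters {
    // Hypothetical result POJO: computed once, independent of any output format.
    record Point(String name, double x, double y) {}
    record Result(List<Point> points) {}

    // Adding a format means adding a class; the logic producing Result is untouched.
    interface ResultOutputter {
        String format(Result result);
    }

    static class XmlResultOutputter implements ResultOutputter {
        @Override
        public String format(Result result) {
            StringBuilder sb = new StringBuilder("<points>");
            for (Point p : result.points()) {
                sb.append("<point name=\"").append(escape(p.name()))
                  .append("\" x=\"").append(p.x())
                  .append("\" y=\"").append(p.y()).append("\"/>");
            }
            return sb.append("</points>").toString();
        }

        // Minimal attribute escaping so names with markup characters stay well-formed.
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        }
    }

    // Assumes names contain no commas or newlines (no CSV quoting in this sketch).
    static class CsvResultOutputter implements ResultOutputter {
        @Override
        public String format(Result result) {
            StringBuilder sb = new StringBuilder("name,x,y\n");
            for (Point p : result.points()) {
                sb.append(p.name()).append(',').append(p.x()).append(',').append(p.y()).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Result result = new Result(List.of(new Point("tree", 1.0, 2.0)));
        List<ResultOutputter> outputters = List.of(new XmlResultOutputter(), new CsvResultOutputter());
        for (ResultOutputter o : outputters) {
            System.out.println(o.format(result));
        }
    }
}
```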

First, if you are planning to provide this as an API to multiple parties, I would advise against allowing over-customization: it will add unnecessary complexity to your product and provide one more place for bugs to be introduced.
Second, it will increase the complexity of your documentation and, as a side effect, likely cause your documentation to fall out of sync with the API in general.
The biggest thing I would suggest considering, in terms of making your stream easier to digest, is making the output available in JSON format, which just about every modern language has good support for (I use Gson for Java, myself).

Working with large text snippets in Java source

Are there any good ways to work with blocks of text (Strings) within Java source code? Many other languages have heredoc syntax available to them, but Java does not. This makes it pretty inconvenient to work with things like tag libraries which output a lot of static markup, and unit tests where you need to assert comparisons against blocks of XML.
How do other people work around this? Is it even possible? Or do I just have to put up with it?
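For readers on a newer JDK: since Java 15, text blocks (JEP 378) provide heredoc-like multi-line literals. The sketch compares one with the classic pre-15 workaround of joining lines explicitly; both produce the same string, because the text block strips the indentation shared by its lines and closing delimiter.

```java
public class TextBlocks {
    // Java 15+ text block: common leading indentation is removed,
    // and every line (including the last) ends with \n.
    static final String XML_BLOCK = """
        <root>
          <item>a</item>
        </root>
        """;

    // Pre-Java-15 workaround: join the lines explicitly.
    static final String XML_JOINED = String.join("\n",
        "<root>",
        "  <item>a</item>",
        "</root>") + "\n";

    public static void main(String[] args) {
        System.out.println(XML_BLOCK.equals(XML_JOINED)); // true
    }
}
```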

If the text is static, or can be parameterized, a possible solution is to store it in an external file and load it at runtime. However, this adds file I/O, which may be unnecessary or have a performance impact, so this approach should cache the file contents to reduce the number of file reads.
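A minimal sketch of the caching idea, assuming Java 11+ (Files.readString): each path is read from disk once, and later calls are served from a ConcurrentHashMap. In a packaged application you would more likely load from the classpath with getResourceAsStream, but the caching pattern is the same. The cache is never invalidated, which suits static text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TextCache {
    // Path -> file contents; filled lazily, never invalidated.
    private static final Map<Path, String> CACHE = new ConcurrentHashMap<>();

    static String load(Path path) {
        // computeIfAbsent reads the file only on the first request for this path.
        return CACHE.computeIfAbsent(path, p -> {
            try {
                return Files.readString(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("snippet", ".xml");
        Files.writeString(file, "<expected>...</expected>");
        System.out.println(load(file)); // read from disk
        System.out.println(load(file)); // served from the cache
    }
}
```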

The closest option in Java to a heredoc is java.text.MessageFormat.
You cannot embed logic; it's a simple value-substitution utility. There are no named variables: placeholders use zero-based indexes ({0}, {1}, ...). Just follow the Javadoc.
http://download.oracle.com/javase/1,5.0/docs/api/java/text/MessageFormat.html
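A short illustration of that usage. The template and arguments are made up; they are passed as Strings because numeric arguments get locale-specific formatting (for example, grouping separators such as "1,234"), and a single quote is MessageFormat's escape character, so a literal apostrophe must be written as two quotes ('').

```java
import java.text.MessageFormat;

public class MessageFormatDemo {
    // Zero-based placeholders; no named variables and no logic.
    static final String TEMPLATE =
        "<user id=\"{0}\">\n"
      + "  <name>{1}</name>\n"
      + "</user>";

    static String render(String id, String name) {
        return MessageFormat.format(TEMPLATE, id, name);
    }

    public static void main(String[] args) {
        System.out.println(render("42", "Alice"));
    }
}
```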

While you could use certain formatters to convert and embed any text file or long literal as a Java string (e.g., with newline breaks, the necessary escapes, etc.), I can't really think of frequent situations where you would need these capabilities.
The trend in software is generally to separate code from the data it operates on. Large text sections, even if meant just for display or comparison, are data, and are thus typically stored externally. The cost of reading a file (or even caching the result in memory) is fairly low. Internationalization is easier. Changing is easier. Version control is easier. Other tools (e.g., spell checkers) can easily be used.
I agree that in unit tests where you compare things against a mock, you would need large-scale text comparisons. However, when you deal with such large texts, you will typically have tests that work on several different large inputs to produce several large outputs, so why not just have your tests load the appropriate files rather than inline them?
The same goes for XML. In fact, for XML I would argue that in many cases you would want to read the XML and build a DOM tree, which you then compare, rather than do a text comparison that can be affected by whitespace. And manually creating an XML tree in your unit test is ugly.
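A minimal sketch of the DOM-based comparison using only the JDK (Java 11+ for String.isBlank): both documents are parsed, whitespace-only text nodes such as indentation are stripped, and the trees are compared with Node.isEqualNode. Libraries such as XMLUnit offer far more control (ignoring comments, reporting where documents differ); this only shows the core idea.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class XmlCompare {
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            doc.normalize(); // merge adjacent text nodes first
            stripWhitespace(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid XML", e);
        }
    }

    // Removes text nodes that contain only whitespace (indentation, newlines).
    // Note: this would also drop significant whitespace in mixed content.
    private static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().isBlank()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Structural equality: same elements, attributes and non-blank text.
    static boolean sameXml(String expected, String actual) {
        return parse(expected).getDocumentElement()
            .isEqualNode(parse(actual).getDocumentElement());
    }

    public static void main(String[] args) {
        String compact = "<a x=\"1\"><b>t</b></a>";
        String indented = "<a x=\"1\">\n  <b>t</b>\n</a>";
        System.out.println(sameXml(compact, indented)); // true
    }
}
```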
