How to extract one boolean field from XML? - java

I have a model in XML format, shown below, and I need to parse the XML and check whether its internal-flag element is set to true. In my other models, internal-flag might be set to false. Sometimes the element may be missing entirely, in which case my code should treat it as false.
<?xml version="1.0"?>
<ClientMetadata
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com client.xsd"
xmlns="http://www.google.com">
<client id="200" version="13">
<name>hello world</name>
<description>hello hello</description>
<organization>TESTER</organization>
<author>david</author>
<internal-flag>true</internal-flag>
<clock>
<clock>
<for>
<init>val(tmp1) = 1</init>
<clock>
<eval><![CDATA[result("," + $convert(val(tmp1)))]]></eval>
</clock>
</for>
<for>
<incr>val(tmp1) -= 1</incr>
<clock>
<eval><![CDATA[result("," + $convert(val(tmp1)))]]></eval>
</clock>
</for>
</clock>
</clock>
</client>
</ClientMetadata>
I have a POJO in which I store the above model:
public class ModelMetadata {
private int modelId;
private String modelValue; // this string will have my above XML data as string
// setters and getters here
}
Now what is the best way to determine whether my model has internal-flag set as true or not?
// this list will have all my Models stored
List<ModelMetadata> metadata = getModelMetadata();
for (ModelMetadata model : metadata) {
// my model will be stored in below variable in XML format
String modelValue = model.getModelValue();
// now parse modelValue variable and extract `internal-flag` field property
}
Do I need to use XML parsing for this or is there any better way to do this?
Update:
I have started using StAX. This is what I have tried so far, but I'm not sure how to extract that field:
InputStream is = new ByteArrayInputStream(modelValue.getBytes());
XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(is);
while(r.hasNext()) {
// now what should I do here?
}
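One way to fill in that loop is to advance the reader until the first START_ELEMENT whose local name is internal-flag and then read its text. The following is a minimal sketch, assuming the flag's text is true or false and that only the first occurrence matters. The class and method names are illustrative, not from the question. Because it matches on the local name only, the default namespace xmlns="http://www.google.com" does not get in the way.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class InternalFlagReader {

    /**
     * Returns true only if an internal-flag element exists and its text is "true"
     * (case-insensitive, as with Boolean.parseBoolean); returns false if it is missing.
     */
    static boolean hasInternalFlag(String modelValue) {
        try {
            // A StringReader avoids the platform-default charset used by getBytes()
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(modelValue));
            try {
                while (r.hasNext()) {
                    // next() advances to the next parsing event and returns its type
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && "internal-flag".equals(r.getLocalName())) {
                        // getElementText() returns the content of a text-only element
                        return Boolean.parseBoolean(r.getElementText().trim());
                    }
                }
                return false; // element absent: default to false
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("modelValue is not well-formed XML", e);
        }
    }
}
```

Inside the question's loop, this becomes `boolean internal = InternalFlagReader.hasInternalFlag(model.getModelValue());`.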

There is an easy solution using XMLBeam (disclosure: I'm affiliated with that project) in just a few lines:
import java.io.IOException;
import org.xmlbeam.XBProjector;
import org.xmlbeam.annotation.XBRead;
public class ReadBoolean {
    public interface ClientMetaData {
        // xbdefaultns is XMLBeam's prefix for the document's default namespace
        @XBRead("//xbdefaultns:internal-flag")
        boolean hasFlag();
    }
    public static void main(String[] args) throws IOException {
        ClientMetaData clientMetaData = new XBProjector().io().url("res://xmlWithBoolean.xml").read(ClientMetaData.class);
        System.out.println("Has flag:" + clientMetaData.hasFlag());
    }
}
This program prints out
Has flag:true
for your XML.
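If adding a dependency is not an option, the same XPath idea also works with the JDK's built-in javax.xml.xpath API. This is a swapped-in alternative, not XMLBeam, and the class name is hypothetical. Instead of binding a namespace prefix, it matches on local-name(). Because string() of an empty node-set is the empty string, a missing element reads as false.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class InternalFlagXPath {

    /** Evaluates an XPath over the XML text and parses the first internal-flag's text. */
    static boolean hasInternalFlag(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // local-name() ignores the default namespace, so no NamespaceContext is needed;
            // string() of an empty node-set is "", which parses as false
            String value = xpath.evaluate(
                    "string(//*[local-name()='internal-flag'])",
                    new InputSource(new StringReader(xml)));
            return Boolean.parseBoolean(value.trim());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```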

You could also do some simple string parsing, but this will only work for small cases with proper XML and if there's only a single <internal-flag> element.
This is a simple solution to your problem without using any XML parsing utilities. Other solutions may be more robust or powerful.
Find the index of the string literal <internal-flag>. If it doesn't exist, return false.
Go forward "<internal-flag>".length (15) characters. Read up to the next </internal-flag>, which should be the string true or false.
Take that string, use Boolean.parseBoolean(String) to get a boolean value.
If you want me to help you out with the code just drop a comment!
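The three string-parsing steps can be sketched in plain Java as follows. The sketch inherits the limitations stated above: it only looks at the first <internal-flag>, and it does not handle comments, CDATA, or namespace prefixes. The class and constant names are illustrative.

```java
final class InternalFlagString {
    private static final String OPEN = "<internal-flag>";   // 15 characters
    private static final String CLOSE = "</internal-flag>";

    static boolean hasInternalFlag(String xml) {
        // Step 1: find the opening tag; absent means false
        int start = xml.indexOf(OPEN);
        if (start == -1) {
            return false;
        }
        // Step 2: skip past the opening tag and read up to the closing tag
        start += OPEN.length();
        int end = xml.indexOf(CLOSE, start);
        if (end == -1) {
            return false; // no closing tag: treat as not set
        }
        // Step 3: "true" (ignoring case) -> true, anything else -> false
        return Boolean.parseBoolean(xml.substring(start, end).trim());
    }
}
```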

If you are willing to consider adding Groovy to your mix (e.g. see the book Making Java Groovy), then using Groovy's XmlParser and associated classes will make this simple.
If you need to stick to Java, let me put in a shameless plug for my Xen library, which mimics a lot of the "Groovy way". The answer to your question would be:
Xen doc = new XenParser().parseText(YOUR_XML_STRING);
String internalFlag = doc.getText(".client.internal-flag");
boolean isSet = "true".equals(internalFlag);
If the XML comes from a File, Stream, or URI, that can be handled too.
Caveat emptor (even though it is free): this is a fairly new library, written solely by a random person (me), and not thoroughly tested on all the crazy XML out there. If anybody knows of a similar, more "mainstream" library, I'd be very interested in hearing about it.

Related

Parse a single POJO from multiple YAML documents representing different classes

I want to use a single YAML file which contains several different objects - for different applications. I need to fetch one object to get an instance of MyClass1, ignoring the rest of docs for MyClass2, MyClass3, etc. Some sort of selective de-serializing: now this class, then that one... The structure of MyClass2, MyClass3 is totally unknown to the application working with MyClass1. The file is always a valid YAML, of course.
The YAML may be of any structure we need to implement such a multi-class container. The preferred parsing tool is snakeyaml.
Is it sensible? How can I ignore all but one object?
UPD: replaced all "document" with "object". I think we have to speak about a single YAML document containing several objects of different structure. Moreover, the parser knows exactly one structure and wants to ignore the rest.
UPD2: I think it is impossible with snakeyaml. We have to read all objects anyway and select the needed one later. But maybe I'm wrong.
UPD3: sample config file
---
-
  exportConfiguration781:
    attachmentFieldName: "name"
    baseSftpInboxPath: /home/user/somedir/
    somebool: false
    days: 9999
    expected:
      - ABC w/o quotes
      - "Cat ABC"
      - "Some string"
    dateFormat: yyyy-MMdd-HHmm
    user: someuser
-
  anotherConfiguration:
    k1: v1
    k2:
      - v21
      - v22
This is definitely possible with SnakeYAML, albeit not trivial. Here's a general rundown of what you need to do:
First, let's have a look at what loading with SnakeYAML does. Here's the important part of the Yaml class:
private Object loadFromReader(StreamReader sreader, Class<?> type) {
Composer composer = new Composer(new ParserImpl(sreader), resolver, loadingConfig);
constructor.setComposer(composer);
return constructor.getSingleData(type);
}
The composer parses YAML input into Nodes. To do that, it doesn't need any knowledge about the structure of your classes, since every node is either a ScalarNode, a SequenceNode or a MappingNode and they just represent the YAML structure.
The constructor takes a root node generated by the composer and generates native POJOs from it. So what you want to do is to throw away parts of the node graph before they reach the constructor.
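The easiest way to do that is probably to derive from Composer, override its getNode() method, and add a helper like this:
import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.composer.Composer;
import org.yaml.snakeyaml.nodes.Node;
import org.yaml.snakeyaml.nodes.SequenceNode;
import org.yaml.snakeyaml.parser.Parser;
import org.yaml.snakeyaml.resolver.Resolver;
public class MyComposer extends Composer {
    private final int objIndex;
    public MyComposer(Parser parser, Resolver resolver, int objIndex) {
        super(parser, resolver);
        this.objIndex = objIndex;
    }
    public MyComposer(Parser parser, Resolver resolver, LoaderOptions loadingConfig, int objIndex) {
        super(parser, resolver, loadingConfig);
        this.objIndex = objIndex;
    }
    @Override
    public Node getNode() {
        return strip(super.getNode());
    }
    // example strip: assumes the root node is a sequence and keeps the element at objIndex
    private Node strip(Node input) {
        return ((SequenceNode) input).getValue().get(objIndex);
    }
}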
The strip implementation is just an example. In this case, I assumed your YAML looks like this (object content is arbitrary):
- {first: obj}
- {second: obj}
- {third: obj}
And you simply select the object you actually want to deserialize by its index in the sequence. But you can also have something more complex like a searching algorithm.
Now that you have your own composer, you can do:
Constructor constructor = new Constructor();
// sreader is a StreamReader over your YAML input, as in loadFromReader above
// assuming we want to get the object at index 1 (i.e. the second object)
Composer composer = new MyComposer(new ParserImpl(sreader), new Resolver(), 1);
constructor.setComposer(composer);
MyObject result = (MyObject) constructor.getSingleData(MyObject.class);
The answer of @flyx was very helpful for me, opening the way to work around the library's (in our case, snakeyaml's) limitations by overriding some methods. Thanks a lot! It's quite possible there is a final solution in it, but not now. Besides, the simple solution below is robust and should be considered even if we find the complete library-intruding solution.
I've decided to solve the task by double distilling, sorry, processing the configuration file. Imagine the file consists of several parts, each marked by a unique token delimiter. To keep it YAML-like, it may look like this:
---
#this is a unique key for the configuration A
<some YAML document>
---
#this is another key for the configuration B
<some YAML document>
The first pass is pre-processing. Given a String fileString and a String key (and, for example, DELIMITER = "\n---\n"), we select the substring holding the key-defined configuration:
// Prepend a newline so that a "---" on the very first line of the file is found too
fileString = "\n" + fileString;
int begIndex;
do {
    begIndex = fileString.indexOf(DELIMITER);
    if (begIndex == -1) {
        break;
    }
    if (fileString.startsWith(DELIMITER + key, begIndex)) {
        fileString = fileString.substring(begIndex + DELIMITER.length() + key.length());
        break;
    }
    // spoil alien delimiter and repeat search
    fileString = fileString.replaceFirst(DELIMITER, " ");
} while (true);
int endIndex = fileString.indexOf(DELIMITER);
if (endIndex != -1) {
    fileString = fileString.substring(0, endIndex);
}
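The same pre-processing pass can also be packaged as a self-contained method that does not rewrite the string while it searches. This is a sketch under the answer's assumptions: sections are separated by DELIMITER, and each section begins with its key. The names ConfigSectionExtractor and extractSection are hypothetical. It returns null when no section matches. Like the loop above, it matches the key as a prefix, so a key such as #configA would also match a line starting with #configAB.

```java
final class ConfigSectionExtractor {
    // Delimiter line between sections, as in the answer
    static final String DELIMITER = "\n---\n";

    /** Returns the text after the key of the matching section, or null if no section starts with key. */
    static String extractSection(String fileString, String key) {
        // Prepend a newline so a delimiter on the file's first line is also found
        String text = "\n" + fileString;
        int from = 0;
        while (true) {
            int begIndex = text.indexOf(DELIMITER, from);
            if (begIndex == -1) {
                return null; // no section with this key
            }
            int contentStart = begIndex + DELIMITER.length();
            if (text.startsWith(key, contentStart)) {
                int bodyStart = contentStart + key.length();
                // The section ends at the next delimiter, or at the end of the file
                int endIndex = text.indexOf(DELIMITER, bodyStart);
                return endIndex == -1 ? text.substring(bodyStart) : text.substring(bodyStart, endIndex);
            }
            // Not our section: search again starting just past this delimiter's first character
            from = begIndex + 1;
        }
    }
}
```

The returned string can then be passed to loadAs just like fileString.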
Now we feed fileString to ordinary YAML parsing:
ExportConfiguration configuration = new Yaml(new Constructor(ExportConfiguration.class))
.loadAs(fileString, ExportConfiguration.class);
This time we have a single document that must correspond to the ExportConfiguration class.
Note 1: The structure, and even the content, of the rest of the configuration file plays absolutely no role. This was the main idea: to get independent configurations in a single file.
Note 2: the rest of configurations may be JSON or XML or whatever. We have a method-preprocessor that returns a String configuration - and the next processor parses it properly.

Is it possible to serialize multiple Java class objects into a single XML file using multiple "root" elements?

I'm working with a fairly complex and extensive number of user settings for an application. Rather than have one single megalithic class responsible for every single setting, I've judiciously decided to break it down into several smaller, simpler classes that will each be responsible for a certain set of user settings.
My desire is to store each of these settings instances inside a single XML file for the sake of simplicity (I hope) and to keep the number of files from getting stupidly huge.
So we come to my question:
If I have an XML file with a single root element, is it possible to store multiple objects inside that same root element, and extract them later, OR even more ideally (for the sake of making it more readable), is it possible for me to store multiple objects in a single XML file under different root element names? For example: could the following
@XmlRootElement(name = "Settings")
public class EQLGameSettings{
@XmlElement(name="PlayerCount")
final private Integer PlayerCount;
@XmlElement(name="GameMode")
final private EQLGameMode Mode;
@XmlElement(name="WinnerOnNewGame")
final private Boolean WinnerOnNewGame;
.
.
.
}
be stored in the same XML file alongside this under the same root element, and both be deserialized later?
@XmlRootElement(name = "Settings")
public class EQLScoreSettings{
@XmlElement(name="ScoreValue")
final private Integer ScoreValue;
@XmlElement(name="PenaltyImposed")
final private Integer PenaltyImposed;
@XmlElement(name="VictoryScore")
final private Integer VictoryScore;
.
.
.
}
And then be able to deserialize them both into new instances of their respective classes:
JAXBContext EQLGameSettingsContext = JAXBContext.newInstance(EQLGameSettings.class),
    EQLScoreSettingsContext = JAXBContext.newInstance(EQLScoreSettings.class);
Unmarshaller UM = EQLGameSettingsContext.createUnmarshaller();
File settingsXML = new File("path/to/Settings.xml");
EQLGameSettings gameSettings = (EQLGameSettings) UM.unmarshal(settingsXML);
//And then be able to use the same file and get a Score Settings instance like so:
UM = EQLScoreSettingsContext.createUnmarshaller();
EQLScoreSettings scoreSettings = (EQLScoreSettings) UM.unmarshal(settingsXML);
Would that work? Or would even trying to store multiple objects in the same XML file rewrite it every single time? Or would the following work?
@XmlRootElement(name = "GameSettings")
public class EQLGameSettings{
.
.
.
}
@XmlRootElement(name = "ScoreSettings")
public class EQLScoreSettings{
.
.
.
}
JAXBContext EQLGameSettingsContext = JAXBContext.newInstance(EQLGameSettings.class),
    EQLScoreSettingsContext = JAXBContext.newInstance(EQLScoreSettings.class);
Unmarshaller UM = EQLGameSettingsContext.createUnmarshaller();
File settingsXML = new File("path/to/Settings.xml");
EQLGameSettings gameSettings = (EQLGameSettings) UM.unmarshal(settingsXML);
//And then be able to use the same file and get a Score Settings instance like so:
UM = EQLScoreSettingsContext.createUnmarshaller();
EQLScoreSettings scoreSettings = (EQLScoreSettings) UM.unmarshal(settingsXML);
Would any or all of the preceding work at all, or am I just going to be stuck with having to generate a single XML file for each settings instance?
EDIT: Looks like that's not an option. I still really do not want to go through the pain of maintaining an XML file for each settings class. My next thought is to take each setting class and instead make it serializable. So instead of trying to fit each setting into an XML file I could instead do something like so:
@XmlRootElement(name = "Settings")
public class MasterSettings{
    @XmlElement(name = "GameSettings")
    EQLGameSettings GameSettings;
    @XmlElement(name = "ScoreSettings")
    EQLScoreSettings ScoreSettings;
}
JAXBContext SettingsContext = JAXBContext.newInstance(MasterSettings.class);
Unmarshaller UM = SettingsContext.createUnmarshaller();
File SettingsFile = new File("Path/To/Settings.xml");
MasterSettings Settings = (MasterSettings)UM.unmarshal(SettingsFile);
I'm pretty confident that will work but then the next question I need to ask is how do I set up a class so that it can be serialized in this fashion?
Permit me to say that things are much simpler than you appear to think they are :)
Write a Java class that reflects the structure of all you want to store in one XML file. According to your rambling description, it might be something like the class below. (Note that I'm demonstrating how to implement a repeating element - use the other pattern if you need just a single occurrence).
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.*;
@XmlRootElement(name = "Settings")
@XmlAccessorType(XmlAccessType.PROPERTY)
public class Settings {
    private List<GameSetting> gameSetting;
    private ScoreSetting scoreSetting;
    // repeating element: JAXB fills the list through the getter
    @XmlElement
    public List<GameSetting> getGameSetting(){
        if( gameSetting == null ){
            gameSetting = new ArrayList<>();
        }
        return gameSetting;
    }
    // single occurrence: getter plus setter
    @XmlElement
    public ScoreSetting getScoreSetting(){
        return scoreSetting;
    }
    public void setScoreSetting( ScoreSetting value ){
        scoreSetting = value;
    }
}
Then, you have the classes for GameSetting and ScoreSetting, very much like this one (but you can omit @XmlRootElement).
You build the object hierarchy and marshal a single Settings object; that's all. After unmarshalling, you can access the properties and subproperties.
It isn't possible to serialize multiple root elements into one XML document, e.g.:
<parentA>
<child>
Text
</child>
</parentA>
<parentB>
<child>
Text
</child>
</parentB>
You can try it this way instead:
<wrapperElement>
<parentA>
<child>
Text
</child>
</parentA>
<parentB>
<child>
Text
</child>
</parentB>
</wrapperElement>
There must always be exactly one root element.
This is a well-formedness rule of the XML specification itself, not something imposed by an XSD, so what you describe is not possible.
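To see the single-root rule in action, here is a minimal sketch using the JDK's DOM parser. It rejects the two-root example and accepts the wrapped one. The class name is hypothetical. Any conforming XML parser should behave the same way, since this is a well-formedness check, not schema validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SingleRootCheck {

    /** Returns true if the text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. markup after the root element
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```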
For more information see this link
But there is a workaround in this guide

How to decode XHTML and/or HTML5 entities in Java?

I have some strings that contain XHTML character entities:
"They&apos;re quite varied"
"Sometimes the string ∈ XML standard, sometimes ∈ HTML4 standard"
"Therefore -> I need an XHTML entity decoder."
"Sadly, some strings are not valid XML & are not-quite-so-valid HTML <- but I want them to work, too."
Is there any easy way to decode the entities? (I'm using Java)
I'm currently using StringEscapeUtils.unescapeHtml4(myString.replace("&apos;", "\'")) as a temporary hack. Sadly, org.apache.commons.lang3.StringEscapeUtils has unescapeHtml4 and unescapeXml, but no unescapeXhtml.
EDIT: I do want to handle invalid XML, for example I want "&&xyzzy;" to decode to "&&xyzzy;"
EDIT: I think HTML5 has almost the same character entities as XHTML, so I think HTML 5 decoder would be fine too.
This may not be directly relevant, but you may wish to adopt jsoup, which handles things like that, albeit at a higher level. It also includes web-page cleaning routines.
Have you tried implementing an XHTMLStringEscapeUtils based on the facilities provided by org.apache.commons.text.StringEscapeUtils?
import org.apache.commons.text.StringEscapeUtils;
import org.apache.commons.text.translate.*;
public class XHTMLStringEscapeUtils {
public static final CharSequenceTranslator ESCAPE_XHTML =
new AggregateTranslator(
new LookupTranslator(EntityArrays.BASIC_ESCAPE),
new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE)
).with(StringEscapeUtils.ESCAPE_XML11);
public static final CharSequenceTranslator UNESCAPE_XHTML =
new AggregateTranslator(
new LookupTranslator(EntityArrays.BASIC_UNESCAPE),
new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE),
new NumericEntityUnescaper(),
new LookupTranslator(EntityArrays.APOS_UNESCAPE)
);
public static final String escape(final String input) {
return ESCAPE_XHTML.translate(input);
}
public static final String unescape(final String input) {
return UNESCAPE_XHTML.translate(input);
}
}
Thanks to the modular design of Apache commons-text lib, it's easy to create custom escape utils.
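To illustrate how a lookup-based unescaper can meet the question's requirement for invalid input (unknown entities such as &xyzzy; are left untouched), here is a minimal JDK-only sketch (Java 9+). It is not commons-text. The class name and its deliberately tiny entity table are hypothetical, and a real decoder needs the full HTML entity list.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyEntityDecoder {
    // Hypothetical, deliberately tiny table of named entities
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "isin", "\u2208");

    // Matches &#123; &#x1F; or &name;
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    static String unescape(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String replacement;
            if (name.startsWith("#x") || name.startsWith("#X")) {
                replacement = codePoint(name.substring(2), 16, m.group());
            } else if (name.startsWith("#")) {
                replacement = codePoint(name.substring(1), 10, m.group());
            } else {
                // Unknown named entities are kept verbatim, e.g. "&xyzzy;"
                replacement = NAMED.getOrDefault(name, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference, falling back to the original text if it is out of range
    private static String codePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
        } catch (NumberFormatException e) {
            return original; // too large for an int
        }
    }
}
```

A lone & that does not start an entity (as in "XML & are") is not matched and passes through unchanged.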
You can find a full project with tests here: xhtml-string-escape-utils

How to use an array value as field in Java? a1.section[2] = 1;

New to Java, and can't figure out what I hope to be a simple thing.
I keep "sections" in an array:
//Section.java
public static final String[] TOP = {
"Top News",
"http://www.mysite.com/RSS/myfeed.csp",
"top"
};
I'd like to do something like this:
Article a1 = new Article();
a1.["s_" + section[2]] = 1; //should resolve to a1.s_top = 1;
But it won't let me, as it doesn't know what "section" is. (I'm sure seasoned Java people will cringe at this attempt... but my searches have come up empty on how to do this)
Clarification:
My SQLite article table has fields for the "section" of the article:
s_top
s_sports
...etc
When doing my import from an XML file, I'd like to set that field to a 1 if it's in that category. I could have switch statement:
//whatever the Java version of this is
switch(section[2]) {
case "top": a1.s_top = 1; break;
case "sports": a1.s_sports = 1; break;
//...
}
But I thought it'd be a lot easier to just write it as a single line:
a1["s_"+section[2]] = 1;
In Java, it's a pain to do what you want to do in the way that you're trying to do it.
If you don't want to use the switch/case statement, you could use reflection to pull up the member attribute you're trying to set:
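// needs import java.lang.reflect.Field; getField() only finds public fields
Class<?> articleClass = a1.getClass();
Field field = articleClass.getField("s_top"); // throws NoSuchFieldException
field.set(a1, 1); // throws IllegalAccessException; the boxed 1 is unwrapped into an int field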
It'll work, but it may be slow and it's an atypical approach to this problem.
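A runnable sketch of the reflection route, building the field name as "s_" + section the way the question wants. The Article class here is a hypothetical stand-in with two public int fields, and markSection is an illustrative helper name.

```java
import java.lang.reflect.Field;

final class SectionFlags {
    // Hypothetical stand-in for the question's Article; fields must be public for getField()
    static final class Article {
        public int s_top;
        public int s_sports;
    }

    /** Sets the public int field named "s_" + section to 1 via reflection. */
    static void markSection(Article article, String section) {
        try {
            Field field = Article.class.getField("s_" + section);
            field.setInt(article, 1);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("Unknown section: " + section, e);
        }
    }
}
```

For example, `SectionFlags.markSection(a1, TOP[2])` would set s_top to 1.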
Alternatively, you could store either a Set<String> or a Map<String, Boolean> inside your Article class and have a public method within Article called putSection(String section). As you iterate, you would put the various section strings (or string/value mappings) into the collection for each Article. So instead of statically defining which sections may exist and giving each Article a yes or no, you'd allow the list of possible sections to be dynamic and based on your XML import.
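A minimal sketch of the map-based variant. putSection comes from the answer, while isInSection and the choice of LinkedHashMap (which keeps insertion order) are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Article {
    // section name -> flag; the set of sections comes from the import, not from fixed fields
    private final Map<String, Boolean> sections = new LinkedHashMap<>();

    public void putSection(String section) {
        sections.put(section, Boolean.TRUE);
    }

    public boolean isInSection(String section) {
        return sections.getOrDefault(section, Boolean.FALSE);
    }
}
```

During the import, `a1.putSection(section[2]);` replaces the whole switch statement.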
Java variables are not "dynamic", unlike ActionScript, for example. You cannot read or assign a variable without knowing it at compile time (well, with reflection you could, but it's far too complex).
So yes, the solution is to use a switch statement (switching on strings is only possible since Java 1.7), or a HashMap or equivalent.
Or, if it's about importing XML, maybe you should take a look at JAXB.
If you are trying to get an attribute from an object, you need to make sure that you have "getters" and "setters" in your object. You also have to make sure you define Section in your article class.
Something like:
class Article {
    private String section;
    //constructor
    public Article() {
    }
    //set section
    public void setSection(String section) {
        this.section = section;
    }
    //get section
    public String getSection() {
        return this.section;
    }
}

Java REGEX XML parse/cut-down while maintaining structure HowTo

I am writing a RESTful web service in Java.
The idea is to "cut down" an XML document and strip away all the unneeded content (~98%) and leave only the tags we're interested in, while maintaining the document's structure, which is as follows (I cannot provide the actual XML content for confidentiality reasons):
<sear:SEGMENTS xmlns="http://www.exlibrisgroup.com/xsd/primo/primo_nm_bib" xmlns:sear="http://www.exlibrisgroup.com/xsd/jaguar/search">
<sear:JAGROOT>
<sear:RESULT>
<sear:DOCSET IS_LOCAL="true" TOTAL_TIME="176" LASTHIT="9" FIRSTHIT="0" TOTALHITS="262" HIT_TIME="11">
<sear:DOC SEARCH_ENGINE_TYPE="Local Search Engine" SEARCH_ENGINE="Local Search Engine" NO="1" RANK="0.086826384" ID="2347460">
[
<PrimoNMBib>
<record>
<display>
<title></title>
</display>
<sort>
<author></author>
</sort>
</record>
</PrimoNMBib>
]
</sear:DOC>
</sear:DOCSET>
</sear:RESULT>
</sear:JAGROOT>
</sear:SEGMENTS>
Of course, this is the structure of only the tags we are interested in - there are hundreds more tags, but they are irrelevant.
The square brackets ([]) are not part of the XML. They indicate that <PrimoNMBib> is a repeated child element that occurs more than once: one per match of the search from the RESTful service.
I've been trying to parse the document with regular expressions so as to leave only the segments of the structure shown above, along with the values of <title> and <author>, while removing everything else between the tags, including other tags. However, I can't get it to work for the life of me...
Previously I tried it using XSLT, however for unresolved reasons that didn't work either... I'd already asked a question for the XSLT implementation...
Anyway, I would very much appreciate a tip/hint/solution as how to solve this problem using regex and Java...
I wouldn't recommend using regex to manipulate XML.
Alternative Approach
You could use a StAX parser that leverages a StreamFilter to cut down the document and still maintain a valid structure.
How a StreamFilter Works
A StreamFilter receives events from the XMLStreamReader: if you want an event reported, you return true, otherwise false. In the example below, the StreamFilter rejects anything in the "http://www.exlibrisgroup.com/xsd/jaguar/search" namespace. You will need to tweak the logic to match the requirements of your use case.
http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/StreamFilter.html
Demo
package forum10351473;
import java.io.FileReader;
import javax.xml.stream.*;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("src/forum10351473/input.xml"));
xsr = xif.createFilteredReader(xsr, new StreamFilter() {
private boolean reportContent = false;
@Override
public boolean accept(XMLStreamReader reader) {
if(reader.isStartElement() || reader.isEndElement()) {
reportContent = !"http://www.exlibrisgroup.com/xsd/jaguar/search".equals(reader.getNamespaceURI());
}
return reportContent;
}
});
// The XMLStreamReader (xsr) will now only report the events you care about.
// You can process the XMLStreamReader yourself or pass as input to something
// like JAXB.
while(xsr.hasNext()) {
if(xsr.isStartElement()) {
System.out.println(xsr.getLocalName());
}
xsr.next();
}
}
}
Output
PrimoNMBib
record
display
title
sort
author
