I have a question about parsing XML. I use JDOM to parse my XML file, but I have run into a small problem.
A sample of my XML file looks like this:
<IO name="Bus" type="Class">
<ResourceAttribute name="Bandwidth" type="KiloBitPerSecond" value="50" />
</IO>
Bus is an object instance of the class IO. The object has name and type properties. Additionally, it has some attributes; in the sample, that is the attribute Bandwidth with the value 50 and the datatype KiloBitPerSecond.
I loop over the file with:
for(Element packages : listPackages)
{
    Map<String, Values> valueMap = new HashMap<String, Values>();
    List<Element> objectInstanceList = packages.getChildren();
    for(Element objects : objectInstanceList)
    {
        List<Element> listObjectClasses = objects.getChildren();
        for(Element classes : listObjectClasses)
        {
            List<Element> listObjectAttributes = classes.getChildren();
            for(Element objectAttributes : listObjectAttributes)
            {
                List<Attribute> listAttributes = objectAttributes.getAttributes();
                for(Attribute attributes : listAttributes)
                {
                    String name = attributes.getName();
                    String value = attributes.getValue();
                    AttributeType datatype = attributes.getAttributeType();
                    Values v = new Values(name, datatype, value);
                    valueMap.put(classes.getName(), v);
                    System.out.println(name + ":" + value);
                }
            }
        }
    }
    //System.out.println(valueMap);
}
Values is a class that holds an object attribute:
public class Values{
private String name;
//private AttributeType datatype;
private String value;
That's the rest of the code. I have two questions about it; the first one has higher priority at the moment.
How do I get the values of the object (Attribute.Name = Bandwidth; Attribute.Value = 50)? Instead, I get:
name:Bus
type:Class
I thought about an additional for loop, but the JDOM class Attribute doesn't have a method called getAttributes().
The second question has lower priority, because without an answer to question 1 I cannot go further. As you can see in the sample, an attribute has three properties: name, type and value. How can I extract that triple out of the sample? JDOM seems to know only two properties for an Attribute: name and value.
Thanks a lot in advance; hopefully I managed to express myself clearly.
Edit: I added an additional for loop, so the output now is:
name:Bandwidth
type:KiloBitPerSecond
value:50
That means name is the name of the attribute and value is its value; I didn't know that. Question 1 is clearer now, and the new information also makes question 2 clearer to me, so I can start working on it.
In XML, the opening tag of an element is enclosed between < and > (or />). After the < comes the name of the element, followed by a list of attributes in the format name="value". An element can be closed inline with /> or with a closing tag </[element name]>.
It would be preferable to use recursion to parse your XML instead of nested for loops, which are hard to read and maintain.
Here is how it could look:
@Test
public void parseXmlRec() throws JDOMException, IOException {
    String xml = "<root>"
            + "<Package>"
            + "<IO name=\"Bus\" type=\"Class\">\r\n" +
            " <ResourceAttribute name=\"Bandwidth\" type=\"KiloBitPerSecond\" value=\"50\" />\r\n" +
            " </IO>"
            + "</Package>"
            + "</root>";
    InputStream is = new ByteArrayInputStream(xml.getBytes());
    SAXBuilder sb = new SAXBuilder();
    Document document = sb.build(is);
    is.close();
    Element root = document.getRootElement();
    List<Element> children = root.getChildren();
    for(Element element : children) {
        parseelement(element);
    }
}
private void parseelement(Element element) {
    // Print the element name, then whichever of the three known attributes are present
    System.out.println("Element: " + element.getName());
    String name = element.getAttributeValue("name");
    if(name != null) {
        System.out.println("name: " + name);
    }
    String type = element.getAttributeValue("type");
    if(type != null) {
        System.out.println("type: " + type);
    }
    String value = element.getAttributeValue("value");
    if(value != null) {
        System.out.println("value: " + value);
    }
    // Recurse into the child elements
    List<Element> children = element.getChildren();
    if(children != null) {
        for(Element child : children) {
            parseelement(child);
        }
    }
}
This outputs:
Element: Package
Element: IO
name: Bus
type: Class
Element: ResourceAttribute
name: Bandwidth
type: KiloBitPerSecond
value: 50
While parsing, check the name of each element and instantiate the corresponding objects. For that I would suggest writing a separate method to handle each element. For example:
void parsePackage(Element packageElement) { ... }
void parseIO(Element ioElement) { ... }
void parseResourceAttribute(Element resourceAttributeElement) { ... }
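As a rough sketch of that per-element dispatch, the following uses the JDK's built-in org.w3c.dom API instead of JDOM so that it runs without extra dependencies; with JDOM the same shape would use Element.getName(), getAttributeValue() and getChildren(). The names ElementDispatch, ResourceAttribute, parseElement and parseResourceAttribute are illustrative assumptions, not part of either library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ElementDispatch {

    // Hypothetical holder for the name/type/value triple of a ResourceAttribute element.
    static final class ResourceAttribute {
        final String name;
        final String type;
        final String value;

        ResourceAttribute(String name, String type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + " (" + type + ") = " + value;
        }
    }

    // Parses the XML text and collects every ResourceAttribute element in it.
    static List<ResourceAttribute> parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<ResourceAttribute> result = new ArrayList<>();
            parseElement(root, result);
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Dispatches on the element name; other elements are only descended into.
    static void parseElement(Element element, List<ResourceAttribute> result) {
        if ("ResourceAttribute".equals(element.getTagName())) {
            result.add(parseResourceAttribute(element));
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                parseElement((Element) child, result);
            }
        }
    }

    // Reads the three attributes; getAttribute returns "" when one is missing.
    static ResourceAttribute parseResourceAttribute(Element element) {
        return new ResourceAttribute(
                element.getAttribute("name"),
                element.getAttribute("type"),
                element.getAttribute("value"));
    }

    public static void main(String[] args) {
        String xml = "<IO name=\"Bus\" type=\"Class\">"
                + "<ResourceAttribute name=\"Bandwidth\" type=\"KiloBitPerSecond\" value=\"50\"/>"
                + "</IO>";
        for (ResourceAttribute attribute : parse(xml)) {
            System.out.println(attribute);
        }
    }
}
```

Keeping one small method per element name means a new element type only needs one more branch and one more method, rather than another level of nested loops.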
I need your help because I can't find a solution in Java for my problem.
I stored this content in a LinkedHashMap<String, String>:
TAG1.TAG2.TAG11 : value1
TAG1.TAG2.TAG12 : value2
TAG1.TAG2.TAG3.TAG131 : value3
TAG1.TAG2.TAG3.TAG132 : value4
TAG1.TAG2.TAG3.TAG133 : value5
TAG1.TAG2.TAG3.TAG134 : value6
TAG1.TAG4.TAG5.TAG21 : value7
TAG1.TAG4.TAG5.TAG22 : value8
TAG1.TAG4.TAG5.TAG23 : value9
TAG6 : value10
For every tag that has two or more children, I need to display the list of its children.
Here is the expected result:
TAG1.TAG2
    TAG11 : value1
    TAG12 : value2
TAG1.TAG2.TAG3
    TAG131 : value3
    TAG132 : value4
    TAG133 : value5
    TAG134 : value6
TAG1.TAG4.TAG5
    TAG21 : value7
    TAG22 : value8
    TAG23 : value9
TAG6 : value10
EDIT 14/06/2022 :
In fact, my original analysis was wrong, because what I initially have is an XML file:
<TAG1>
<TAG2>
<TAG11>value1</TAG11>
<TAG12>value2</TAG12>
<TAG3>
<TAG131>value3</TAG131>
<TAG132>value4</TAG132>
<TAG133>value5</TAG133>
<TAG134>value6</TAG134>
</TAG3>
</TAG2>
<TAG4>
<TAG5>
<TAG21>value7</TAG21>
<TAG22>value8</TAG22>
<TAG23>value9</TAG23>
</TAG5>
</TAG4>
</TAG1>
<TAG6>value10</TAG6>
And I created a map to store it :
TAG1.TAG2.TAG11 : value1
TAG1.TAG2.TAG12 : value2
TAG1.TAG2.TAG3.TAG131 : value3
TAG1.TAG2.TAG3.TAG132 : value4
TAG1.TAG2.TAG3.TAG133 : value5
TAG1.TAG2.TAG3.TAG134 : value6
TAG1.TAG4.TAG5.TAG21 : value7
TAG1.TAG4.TAG5.TAG22 : value8
TAG1.TAG4.TAG5.TAG23 : value9
TAG6 : value10
But today I have this case:
<TAG1>
<TAG2>
<TAG11>value1</TAG11>
<TAG12>value2</TAG12>
<TAG3>
<TAG131>value3</TAG131>
<TAG132>value4</TAG132>
<TAG133>value5</TAG133>
<TAG134>value6</TAG134>
</TAG3>
<TAG3>
<TAG131>value11</TAG131>
<TAG132>value12</TAG132>
<TAG133>value13</TAG133>
<TAG134>value14</TAG134>
</TAG3>
</TAG2>
<TAG4>
<TAG5>
<TAG21>value7</TAG21>
<TAG22>value8</TAG22>
<TAG23>value9</TAG23>
</TAG5>
</TAG4>
</TAG1>
<TAG6>value10</TAG6>
But a Map cannot store the same key more than once (in the example, TAG3 appears several times). Do you have an idea how I can resolve this problem?
EDIT 15/06/2022 :
In fact the expected result needs to keep the original XML structure.
Here is the result for the last sample:
TAG1.TAG2
    TAG11 : value1
    TAG12 : value2
TAG1.TAG2.TAG3
    TAG131 : value3
    TAG132 : value4
    TAG133 : value5
    TAG134 : value6
TAG1.TAG2.TAG3
    TAG131 : value11
    TAG132 : value12
    TAG133 : value13
    TAG134 : value14
TAG1.TAG4.TAG5
    TAG21 : value7
    TAG22 : value8
    TAG23 : value9
TAG6 : value10
The goal is to display the XML in a more human-readable form.
EDIT 04/07/2022 :
I found an inconsistency problem with "new TreeMap<>(Comparator.comparingInt(MyTag::getAppearanceOrder))". Some MyTag objects have the same appearanceOrder, so the TreeMap treats them as the same key and some values are dropped from the Map.
To resolve this, I used:
map.entrySet().stream().sorted(Map.Entry.comparingByKey(. . .))
Then I store the result in a Map with collect().
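The difference between the two approaches can be sketched with a small stand-alone example. The Key class here is a hypothetical stand-in for MyTag (equality on an id, ordering on an int), and the class and method names are assumptions made for illustration only.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class OrderedGrouping {

    // Hypothetical key: equality is based on id, ordering on order.
    static final class Key {
        final String id;
        final int order;

        Key(String id, int order) {
            this.id = id;
            this.order = order;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }

        @Override
        public String toString() {
            return id;
        }
    }

    // Three distinct keys, two of which share the same order value.
    static Map<Key, String> sample() {
        Map<Key, String> source = new LinkedHashMap<>();
        source.put(new Key("b", 2), "second");
        source.put(new Key("a", 1), "first");
        source.put(new Key("c", 2), "also second");
        return source;
    }

    // A TreeMap treats keys that compare as equal as one key, so an entry is lost.
    static Map<Key, String> viaTreeMap(Map<Key, String> source) {
        Map<Key, String> result = new TreeMap<>(Comparator.comparingInt((Key k) -> k.order));
        result.putAll(source);
        return result;
    }

    // Sorting the entries and collecting into a LinkedHashMap keeps every distinct key.
    static Map<Key, String> viaSortedStream(Map<Key, String> source) {
        return source.entrySet().stream()
                .sorted(Map.Entry.comparingByKey(Comparator.comparingInt((Key k) -> k.order)))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        System.out.println(viaTreeMap(sample()).size());      // 2
        System.out.println(viaSortedStream(sample()).keySet()); // [a, b, c]
    }
}
```

In the sorted variant, uniqueness still comes from equals() and hashCode(), while the comparator is only used to order the entries before they are inserted.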
Below is the working code:
public class Main {
public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
//Accessing the xml file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("data.xml"));
document.getDocumentElement().normalize();
Element root = document.getDocumentElement();
//Retrieving a List of records where each record contains: the original chain of tags, the numbered chain of tags and the value
String tagSep = ".";
List<Record> listRecords = new ArrayList<>();
visitXMLFile(listRecords, root.getChildNodes(), tagSep, "", "", new HashMap<>());
//Queue sorted by the numbered tag's length in descending order (from the longest to the shortest)
PriorityQueue<Record> queue = new PriorityQueue<>(Comparator.comparing(Record::getTagNumberedLen).reversed());
queue.addAll(listRecords);
//Using a set to have unique numbered tags (no duplicates) to group by in the resulting map
Set<MyTag> setMyTags = new HashSet<>();
//Checking for each numbered tag if its largest substring is equal to any other numbered tag's beginning:
// - if it does, then the substring is collected as a key to group by within the final map
//
// - if it doesn't, then another substring is generated from the previous substring until a matching value is found.
// If no value is found, then the numbered tag is collected entirely as a key for the resulting map.
while (!queue.isEmpty()) {
Record rec = queue.poll();
//This loop keeps creating substrings of the current numbered tag until:
// - the substring matches another numbered tag's beginning
// - or no more substrings can be generated
int lastIndexTagNum = rec.getTagNumbered().lastIndexOf(tagSep);
int lastIndexTag = rec.getTag().lastIndexOf(tagSep);
while (lastIndexTagNum > 0) {
//Checking if the substring matches the beginning of any numbered tag except the current one
String subStrTagNum = rec.getTagNumbered().substring(0, lastIndexTagNum);
if (listRecords.stream().anyMatch(r -> !r.getTagNumbered().equals(rec.getTagNumbered()) && r.getTagNumbered().startsWith(subStrTagNum + tagSep))) {
String subStrTag = rec.getTag().substring(0, lastIndexTag);
int appearanceOrder = listRecords.stream().filter(r -> r.getTagNumbered().startsWith(subStrTagNum + tagSep)).map(r -> r.getAppearanceOrder()).min(Comparator.naturalOrder()).orElse(0);
//If a match is found then the current substring is added to the set and the substring iteration is interrupted
setMyTags.add(new MyTag(subStrTag, subStrTagNum + tagSep, appearanceOrder));
break;
}
//Creating a new substring from the previous substring if no match has been found
lastIndexTagNum = rec.getTagNumbered().substring(0, lastIndexTagNum).lastIndexOf(tagSep);
lastIndexTag = rec.getTag().substring(0, lastIndexTag).lastIndexOf(tagSep);
}
//If no substrings of the current numbered tag matches the beginning of any other numbered tag,
//then the current numbered tag is collected as a key for the resulting map
if (lastIndexTagNum < 0) {
int appearanceOrder = listRecords.stream().filter(r -> r.getTagNumbered().startsWith(rec.getTagNumbered())).map(r -> r.getAppearanceOrder()).min(Comparator.naturalOrder()).orElse(0);
setMyTags.add(new MyTag(rec.getTag(), rec.getTagNumbered(), appearanceOrder));
}
}
//Creating a temporary resulting map (not sorted as the input)
Map<MyTag, List<String>> mapTemp = listRecords.stream()
.collect(Collectors.toMap(
rec -> {
//Looking for the longest numbered tag which matches the beginning of the current record's numbered tag.
//The reason why we need the longest match (i.e. the most accurate) is because some elements
//may share the same parents but be on different levels, for example the values 3, 4, 5 and 6
//have a key whose beginning matches both "TAG1.TAG2" and "TAG1.TAG2.TAG3", but only the longest
//match is actually the right one.
return setMyTags.stream().filter(mt -> rec.getTagNumbered().startsWith(mt.getTagNumbered())).max(Comparator.comparingInt(MyTag::getTagNumberedLen)).orElseThrow(() -> new RuntimeException("No key found"));
},
rec -> {
//Retrieving, like above, the numbered tag that will be used to map the current value
MyTag myTag = setMyTags.stream().filter(mt -> rec.getTagNumbered().startsWith(mt.getTagNumbered())).max(Comparator.comparingInt(MyTag::getTagNumberedLen)).orElseThrow(() -> new RuntimeException("No key found"));
//If the new numbered tag and the record's numbered tag are equal then a List with the current value is returned
if (myTag.getTagNumbered().equals(rec.getTagNumbered())) {
return new ArrayList<>(List.of(rec.getValue()));
} else { //If the new numbered tag is a substring of the record's numbered tag then the rest of the current (non-numbered) tag is added to the value
return new ArrayList<>(List.of(rec.getTag().substring(myTag.getTag().length() + 1) + " : " + rec.getValue()));
}
},
//Handling colliding cases by merging the lists together
(list1, list2) -> {
list1.addAll(list2);
return list1;
}
)
);
//Sorting the entries by appearance order and collecting them into a LinkedHashMap, which preserves that order
Map<MyTag, List<String>> mapRes =
mapTemp.entrySet().stream()
.sorted(Map.Entry.comparingByKey(Comparator.comparingInt(MyTag::getAppearanceOrder)))
.collect(Collectors.toMap(
Map.Entry::getKey, Map.Entry::getValue, (e1, e2) -> e1, LinkedHashMap::new));
//Printing the resulting map
for (Map.Entry<MyTag, List<String>> entry : mapRes.entrySet()) {
System.out.println(entry.getKey());
for (String value : entry.getValue()) {
System.out.println("\t" + value);
}
}
}
private static void visitXMLFile(List<Record> listInput, NodeList nodeList, String tagSep, String tag, String tagNumbered, Map<String, Integer> mapTagOccurrence) {
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.hasChildNodes()) {
String newTag = tag.isEmpty() ? node.getNodeName() : tag + tagSep + node.getNodeName();
//Setting or incrementing the number of appearances of a tag chain
//(sometimes a same chain of tags can be repeated, ex: TAG1.TAG2.TAG3)
if (!mapTagOccurrence.containsKey(newTag)) {
mapTagOccurrence.put(newTag, 1);
} else {
mapTagOccurrence.computeIfPresent(newTag, (key, val) -> val + 1);
}
//Creating a numbered version of the tag where its number of appearances is added at the end.
//This is done to uniquely identify different groups of tag chain when these are repeated (ex: TAG1.TAG2.TAG3)
String newTagNum = tagNumbered.isEmpty() ? node.getNodeName() + mapTagOccurrence.get(newTag) : tagNumbered + tagSep + node.getNodeName() + mapTagOccurrence.get(newTag);
visitXMLFile(listInput, node.getChildNodes(), tagSep, newTag, newTagNum, mapTagOccurrence);
} else {
if (!node.getTextContent().trim().equals("")) {
int appearanceOrder = listInput.size() + 1;
listInput.add(new Record(tag, tagNumbered, node.getTextContent().trim(), appearanceOrder));
}
}
}
}
}
class MyTag {
//Tag chain for the user
private String tag;
//Unique tag chain for identification
private String tagNumbered;
private int appearanceOrder;
public MyTag(String tag, String tagNumbered, int appearanceOrder) {
this.tag = tag;
this.tagNumbered = tagNumbered;
this.appearanceOrder = appearanceOrder;
}
public String getTag() {
return tag;
}
public String getTagNumbered() {
return tagNumbered;
}
public int getTagNumberedLen() {
return tagNumbered == null ? 0 : tagNumbered.length();
}
public int getAppearanceOrder() {
return appearanceOrder;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
MyTag tagPair = (MyTag) o;
return Objects.equals(tagNumbered, tagPair.tagNumbered);
}
@Override
public int hashCode() {
return Objects.hash(tagNumbered);
}
@Override
public String toString() {
return tag;
}
}
class Record {
//Tag chain for the user
private String tag;
//Unique tag chain for identification
private String tagNumbered;
private String value;
private int appearanceOrder;
public Record(String tag, String tagNumbered, String value, int appearanceOrder) {
this.tag = tag;
this.tagNumbered = tagNumbered;
this.value = value;
this.appearanceOrder = appearanceOrder;
}
public String getTag() {
return tag;
}
public String getTagNumbered() {
return tagNumbered;
}
public int getTagNumberedLen() {
return tagNumbered == null ? 0 : tagNumbered.length();
}
public String getValue() {
return value;
}
public int getAppearanceOrder() {
return appearanceOrder;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Record record = (Record) o;
return Objects.equals(tagNumbered, record.tagNumbered);
}
@Override
public int hashCode() {
return Objects.hash(tagNumbered);
}
@Override
public String toString() {
return tag + " - " + tagNumbered + " - " + value;
}
}
Edit
At this point to answer your edited question, I had to use a List instead of a Map to store the input since multiple values share the same key and a Map<String, List<String>> wouldn't maintain the insertion order. In fact, the values from 3 to 6 would be alternated with the values from 11 to 14.
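The alternation can be illustrated with a small sketch that uses only two of the repeated tags. The class and method names are hypothetical, and the entries are hard-coded rather than read from the XML file.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateKeyOrder {

    // Part of the two repeated TAG3 blocks, flattened in document order (tag chain, value).
    static List<Map.Entry<String, String>> sampleEntries() {
        List<Map.Entry<String, String>> entries = new ArrayList<>();
        entries.add(new SimpleEntry<>("TAG1.TAG2.TAG3.TAG131", "value3"));
        entries.add(new SimpleEntry<>("TAG1.TAG2.TAG3.TAG132", "value4"));
        entries.add(new SimpleEntry<>("TAG1.TAG2.TAG3.TAG131", "value11"));
        entries.add(new SimpleEntry<>("TAG1.TAG2.TAG3.TAG132", "value12"));
        return entries;
    }

    // Grouping by key: iteration visits all values of one key before the next key.
    static List<String> valuesViaMap(List<Map.Entry<String, String>> entries) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : entries) {
            grouped.computeIfAbsent(entry.getKey(), key -> new ArrayList<>()).add(entry.getValue());
        }
        List<String> values = new ArrayList<>();
        grouped.values().forEach(values::addAll);
        return values;
    }

    // A plain list keeps the values exactly in the order they were read.
    static List<String> valuesViaList(List<Map.Entry<String, String>> entries) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : entries) {
            values.add(entry.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesViaMap(sampleEntries()));  // [value3, value11, value4, value12]
        System.out.println(valuesViaList(sampleEntries())); // [value3, value4, value11, value12]
    }
}
```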
Besides, since the same chain of tags can appear several times (ex: TAG1.TAG2.TAG3), I had to implement two custom classes: MyTag and Record.
The first class represents a custom tag made of two fields: tag and tagNumbered. The first field holds the tag chain that must be shown to the user, while the second is used as the actual identifier to group by in the stream operation. tagNumbered is basically a copy of tag where at the end of each nested tag is added its number of appearances.
Instead, the class Record is used to represent a value accompanied by its tag chain and numbered tag chain.
So, the following XML is represented as follows by the respective classes:
<x>
<y>
<z>value1</z>
</y>
<y>
<z>value2</z>
</y>
</x>
Record:
Record1:
- tag: x.y.z
- tagNumbered: x1.y1.z1
- value: value1
Record2:
- tag: x.y.z
- tagNumbered: x1.y2.z1 //because y appears twice within x
- value: value2
MyTag (MyTag is created from Record):
MyTag1:
- tag: x.y.z
- tagNumbered: x1.y1.z1
MyTag2:
- tag: x.y.z
- tagNumbered: x1.y2.z1 //because y appears twice within x
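A minimal sketch of how such numbered chains could be produced for the small x/y/z sample is shown here. It assumes the number is the element's position among same-named siblings of one parent, which matches the illustration above; the listing in this answer instead keeps one counter per un-numbered chain, so the exact suffixes it generates may differ. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NumberedChains {

    // Returns one line per leaf element: "tag | tagNumbered | value".
    static List<String> describe(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> lines = new ArrayList<>();
            // The root is always the first (and only) element of its name.
            visit(root, root.getTagName(), root.getTagName() + 1, lines);
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    private static void visit(Element element, String tag, String tagNumbered, List<String> lines) {
        // Counts how often each child name has been seen under this element so far.
        Map<String, Integer> seen = new HashMap<>();
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElement = true;
                String name = ((Element) child).getTagName();
                int occurrence = seen.merge(name, 1, Integer::sum);
                visit((Element) child, tag + "." + name, tagNumbered + "." + name + occurrence, lines);
            }
        }
        // Only leaf elements carry a value in this sketch.
        if (!hasChildElement) {
            lines.add(tag + " | " + tagNumbered + " | " + element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        String xml = "<x><y><z>value1</z></y><y><z>value2</z></y></x>";
        describe(xml).forEach(System.out::println);
    }
}
```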
Here is an XML sample based on your question's input, that I've used for the code below.
<root>
<TAG1>
<TAG2>
<TAG11>value1</TAG11>
<TAG12>value2</TAG12>
<TAG3>
<TAG131>value3</TAG131>
<TAG132>value4</TAG132>
<TAG133>value5</TAG133>
<TAG134>value6</TAG134>
</TAG3>
<TAG3>
<TAG131>value11</TAG131>
<TAG132>value12</TAG132>
<TAG133>value13</TAG133>
<TAG134>value14</TAG134>
</TAG3>
</TAG2>
<TAG4>
<TAG5>
<TAG21>value7</TAG21>
<TAG22>value8</TAG22>
<TAG23>value9</TAG23>
</TAG5>
</TAG4>
</TAG1>
<TAG6>value10</TAG6>
</root>
Original Answer Updated
The first part of the problem consists in creating a List<Record> while reading from the XML file which is achieved with the visitXMLFile method.
After reading the records from the file, we need to create a Set of unique numbered tag chains to identify each group of values. This is actually done with a Set<MyTag>; however MyTag's equals() and hashCode() are based exclusively on tagNumbered.
After creating the Set of unique numbered tags, we need to stream the input list of entries with a single operation: collect(Collectors.toMap()). In this operation, each record is mapped to a MyTag (i.e., a numbered tag) of the Set previously created.
Finally, to maintain the original insertion order, the resulting Map has been implemented as a TreeMap initialized with a Comparator defined on the order of the input list's records.
Here is an implementation with detailed comments explaining the whole logic step by step:
public class Main {
public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
//Accessing the xml file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("data.xml"));
document.getDocumentElement().normalize();
Element root = document.getDocumentElement();
//Retrieving a List of records where each record contains: the original chain of tags, the numbered chain of tags and the value
String tagSep = ".";
List<Record> listRecords = new ArrayList<>();
visitXMLFile(listRecords, root.getChildNodes(), tagSep, "", "", new HashMap<>());
//Queue sorted by the numbered tag's length in descending order (from the longest to the shortest)
PriorityQueue<Record> queue = new PriorityQueue<>(Comparator.comparing(Record::getTagNumberedLen).reversed());
queue.addAll(listRecords);
//Using a set to have unique numbered tags (no duplicates) to group by in the resulting map
Set<MyTag> setMyTags = new HashSet<>();
//Checking for each numbered tag if its largest substring is equal to any other numbered tag's beginning:
// - if it does, then the substring is collected as a key to group by within the final map
//
// - if it doesn't, then another substring is generated from the previous substring until a matching value is found.
// If no value is found, then the numbered tag is collected entirely as a key for the resulting map.
while (!queue.isEmpty()) {
Record rec = queue.poll();
//This loop keeps creating substrings of the current numbered tag until:
// - the substring matches another numbered tag's beginning
// - or no more substrings can be generated
int lastIndexTagNum = rec.getTagNumbered().lastIndexOf(tagSep);
int lastIndexTag = rec.getTag().lastIndexOf(tagSep);
while (lastIndexTagNum > 0) {
//Checking if the substring matches the beginning of any numbered tag except the current one
String subStrTagNum = rec.getTagNumbered().substring(0, lastIndexTagNum);
if (listRecords.stream().anyMatch(r -> !r.getTagNumbered().equals(rec.getTagNumbered()) && r.getTagNumbered().startsWith(subStrTagNum + tagSep))) {
String subStrTag = rec.getTag().substring(0, lastIndexTag);
int appearanceOrder = listRecords.stream().filter(r -> r.getTagNumbered().startsWith(subStrTagNum + tagSep)).map(r -> r.getAppearanceOrder()).min(Comparator.naturalOrder()).orElse(0);
//If a match is found then the current substring is added to the set and the substring iteration is interrupted
setMyTags.add(new MyTag(subStrTag, subStrTagNum + tagSep, appearanceOrder));
break;
}
//Creating a new substring from the previous substring if no match has been found
lastIndexTagNum = rec.getTagNumbered().substring(0, lastIndexTagNum).lastIndexOf(tagSep);
lastIndexTag = rec.getTag().substring(0, lastIndexTag).lastIndexOf(tagSep);
}
//If no substrings of the current numbered tag matches the beginning of any other numbered tag,
//then the current numbered tag is collected as a key for the resulting map
if (lastIndexTagNum < 0) {
int appearanceOrder = listRecords.stream().filter(r -> r.getTagNumbered().startsWith(rec.getTagNumbered())).map(r -> r.getAppearanceOrder()).min(Comparator.naturalOrder()).orElse(0);
setMyTags.add(new MyTag(rec.getTag(), rec.getTagNumbered(), appearanceOrder));
}
}
//Creating a temporary resulting map (not sorted as the input)
Map<MyTag, List<String>> mapTemp = listRecords.stream()
.collect(Collectors.toMap(
rec -> {
//Looking for the longest numbered tag which matches the beginning of the current record's numbered tag.
//The reason why we need the longest match (i.e. the most accurate) is because some elements
//may share the same parents but be on different levels, for example the values 3, 4, 5 and 6
//have a key whose beginning matches both "TAG1.TAG2" and "TAG1.TAG2.TAG3", but only the longest
//match is actually the right one.
return setMyTags.stream().filter(mt -> rec.getTagNumbered().startsWith(mt.getTagNumbered())).max(Comparator.comparingInt(MyTag::getTagNumberedLen)).orElseThrow(() -> new RuntimeException("No key found"));
},
rec -> {
//Retrieving, like above, the numbered tag that will be used to map the current value
MyTag myTag = setMyTags.stream().filter(mt -> rec.getTagNumbered().startsWith(mt.getTagNumbered())).max(Comparator.comparingInt(MyTag::getTagNumberedLen)).orElseThrow(() -> new RuntimeException("No key found"));
//If the new numbered tag and the record's numbered tag are equal then a List with the current value is returned
if (myTag.getTagNumbered().equals(rec.getTagNumbered())) {
return new ArrayList<>(List.of(rec.getValue()));
} else { //If the new numbered tag is a substring of the record's numbered tag then the rest of the current (non-numbered) tag is added to the value
return new ArrayList<>(List.of(rec.getTag().substring(myTag.getTag().length() + 1) + " : " + rec.getValue()));
}
},
//Handling colliding cases by merging the lists together
(list1, list2) -> {
list1.addAll(list2);
return list1;
}
)
);
//Creating a TreeMap whose ordering is based on the insertion order of the input
Map<MyTag, List<String>> mapRes = new TreeMap<>(Comparator.comparingInt(MyTag::getAppearanceOrder));
mapRes.putAll(mapTemp);
//Printing the resulting map
for (Map.Entry<MyTag, List<String>> entry : mapRes.entrySet()) {
System.out.println(entry.getKey());
for (String value : entry.getValue()) {
System.out.println("\t" + value);
}
}
}
private static void visitXMLFile(List<Record> listInput, NodeList nodeList, String tagSep, String tag, String tagNumbered, Map<String, Integer> mapTagOccurrence) {
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node.hasChildNodes()) {
String newTag = tag.isEmpty() ? node.getNodeName() : tag + tagSep + node.getNodeName();
//Setting or incrementing the number of appearances of a tag chain
//(sometimes a same chain of tags can be repeated, ex: TAG1.TAG2.TAG3)
if (!mapTagOccurrence.containsKey(newTag)) {
mapTagOccurrence.put(newTag, 1);
} else {
mapTagOccurrence.computeIfPresent(newTag, (key, val) -> val + 1);
}
//Creating a numbered version of the tag where its number of appearances is added at the end.
//This is done to uniquely identify different groups of tag chain when these are repeated (ex: TAG1.TAG2.TAG3)
String newTagNum = tagNumbered.isEmpty() ? node.getNodeName() + mapTagOccurrence.get(newTag) : tagNumbered + tagSep + node.getNodeName() + mapTagOccurrence.get(newTag);
visitXMLFile(listInput, node.getChildNodes(), tagSep, newTag, newTagNum, mapTagOccurrence);
} else {
if (!node.getTextContent().trim().equals("")) {
int appearanceOrder = listInput.size() + 1;
listInput.add(new Record(tag, tagNumbered, node.getTextContent().trim(), appearanceOrder));
}
}
}
}
}
class MyTag {
//Tag chain for the user
private String tag;
//Unique tag chain for identification
private String tagNumbered;
private int appearanceOrder;
public MyTag(String tag, String tagNumbered, int appearanceOrder) {
this.tag = tag;
this.tagNumbered = tagNumbered;
this.appearanceOrder = appearanceOrder;
}
public String getTag() {
return tag;
}
public String getTagNumbered() {
return tagNumbered;
}
public int getTagNumberedLen() {
return tagNumbered == null ? 0 : tagNumbered.length();
}
public int getAppearanceOrder() {
return appearanceOrder;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
MyTag tagPair = (MyTag) o;
return Objects.equals(tagNumbered, tagPair.tagNumbered);
}
@Override
public int hashCode() {
return Objects.hash(tagNumbered);
}
@Override
public String toString() {
return tag;
}
}
class Record {
//Tag chain for the user
private String tag;
//Unique tag chain for identification
private String tagNumbered;
private String value;
private int appearanceOrder;
public Record(String tag, String tagNumbered, String value, int appearanceOrder) {
this.tag = tag;
this.tagNumbered = tagNumbered;
this.value = value;
this.appearanceOrder = appearanceOrder;
}
public String getTag() {
return tag;
}
public String getTagNumbered() {
return tagNumbered;
}
public int getTagNumberedLen() {
return tagNumbered == null ? 0 : tagNumbered.length();
}
public String getValue() {
return value;
}
public int getAppearanceOrder() {
return appearanceOrder;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Record record = (Record) o;
return Objects.equals(tagNumbered, record.tagNumbered);
}
@Override
public int hashCode() {
return Objects.hash(tagNumbered);
}
@Override
public String toString() {
return tag + " - " + tagNumbered + " - " + value;
}
}
Output
TAG1.TAG2
    TAG11 : value1
    TAG12 : value2
TAG1.TAG2.TAG3
    TAG131 : value3
    TAG132 : value4
    TAG133 : value5
    TAG134 : value6
TAG1.TAG2.TAG3
    TAG131 : value11
    TAG132 : value12
    TAG133 : value13
    TAG134 : value14
TAG1.TAG4.TAG5
    TAG21 : value7
    TAG22 : value8
    TAG23 : value9
TAG6
    value10
I have written a recursive function that reads the tags in an XML document. Below is the code:
private void readTag(org.w3c.dom.Node item, String histoTags, String fileName, Hashtable<String, String> tagsInfos) {
    if (item.getNodeType() == Node.ELEMENT_NODE) {
        // Recurse into every child, extending the tag history with this element's name
        NodeList itemChilds = item.getChildNodes();
        for (int i = 0; i < itemChilds.getLength(); i++) {
            org.w3c.dom.Node itemChild = itemChilds.item(i);
            readTag(itemChild, histoTags + "|" + item.getNodeName(), fileName, tagsInfos);
        }
    }
    else if (item.getNodeType() == Node.TEXT_NODE) {
        // Store the text under the accumulated tag history
        tagsInfos.put(histoTags, item.getNodeValue());
    }
}
This function takes some time to execute. The xml the function reads is in this format:
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Mouvement>
<Com>
<IdCom>32R01000000772669473</IdCom>
<RefCde>32R</RefCde>
<Edit>0</Edit>
</Com>
</Mouvement>
</Document>
Is there any way of optimizing this code in Java?
Two optimizations, don't know how much they will help:
Don't use getChildNodes(). Use getFirstChild() and getNextSibling().
Reuse a single StringBuilder instead of creating a new one for every element (implicitly done by histoTags + "|" + item.getNodeName()).
But you should also be aware that the text content of an element node may be split across multiple TEXT and CDATA nodes.
Your code will also work better if it works on elements, not nodes.
private static void readTag(Element elem, StringBuilder histoTags, String fileName, Hashtable<String, String> tagsInfos) {
int histoLen = histoTags.length();
CharSequence textContent = null;
boolean hasChildElement = false;
for (Node child = elem.getFirstChild(); child != null; child = child.getNextSibling()) {
switch (child.getNodeType()) {
case Node.ELEMENT_NODE:
histoTags.append('|').append(child.getNodeName());
readTag((Element)child, histoTags, fileName, tagsInfos);
histoTags.setLength(histoLen);
hasChildElement = true;
break;
case Node.TEXT_NODE:
case Node.CDATA_SECTION_NODE:
//uncomment to test: System.out.println(histoTags + ": \"" + child.getTextContent() + "\"");
if (textContent == null)
// Optimization: Don't copy to a StringBuilder if only one text node will be found
textContent = child.getTextContent();
else if (textContent instanceof StringBuilder)
// Ok, now we need a StringBuilder to collect text from multiple nodes
((StringBuilder)textContent).append(child.getTextContent());
else
// And we keep collecting text from multiple nodes
textContent = new StringBuilder(textContent).append(child.getTextContent());
break;
default:
// ignore all others
}
}
if (textContent != null) {
String text = textContent.toString();
// Suppress pure whitespace content on elements with child elements, i.e. structural whitespace
if (! hasChildElement || ! text.trim().isEmpty())
tagsInfos.put(histoTags.toString(), text);
}
}
Test
String xml = "<root>\n" +
" <tag>hello <![CDATA[world]]> Foo <!-- comment --> Bar</tag>\n" +
"</root>\n";
Element docElem = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.parse(new InputSource(new StringReader(xml)))
.getDocumentElement();
Hashtable<String, String> tagsInfos = new Hashtable<>();
readTag(docElem, new StringBuilder(docElem.getNodeName()), "fileName", tagsInfos);
System.out.println(tagsInfos);
Output (with print uncommented)
root: "
"
root|tag: "hello "
root|tag: "world"
root|tag: " Foo "
root|tag: " Bar"
root: "
"
{root|tag=hello world Foo  Bar}
See how splitting the text inside the <tag> node using CDATA and comments caused the DOM node to contain multiple TEXT/CDATA child nodes.
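For a leaf element, the standard DOM method Node.getTextContent() performs a similar concatenation and skips comments, which gives a quick way to cross-check the result. The following is a small sketch under that assumption; LeafText and its helper methods are hypothetical names, and the XML is the same test string as above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LeafText {

    // The same document as in the test above.
    static String sampleXml() {
        return "<root>\n"
                + "  <tag>hello <![CDATA[world]]> Foo <!-- comment --> Bar</tag>\n"
                + "</root>\n";
    }

    // Parses the XML and returns the first element with the given name.
    private static Element first(String xml, String tagName) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return (Element) root.getElementsByTagName(tagName).item(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Number of direct child nodes (text, CDATA and comment nodes all count).
    static int childNodeCount(String xml, String tagName) {
        return first(xml, tagName).getChildNodes().getLength();
    }

    // Concatenated text of the element; comment nodes are excluded by the DOM.
    static String text(String xml, String tagName) {
        return first(xml, tagName).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(childNodeCount(sampleXml(), "tag"));
        System.out.println("\"" + text(sampleXml(), "tag") + "\"");
    }
}
```

getTextContent() is convenient for leaf elements, but on an element with child elements it also includes the text of all descendants, so it does not replace the manual loop when parent and child text must be kept apart.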
I am stuck on an issue trying to parse some XML documents to obtain the output I require.
Take this sample XML:
<root>
<ZoneRule Name="After" RequiresApproval="false">
<Zone>
<WSAZone ConsecutiveDayNumber="1">
<DaysOfWeek>
<WSADaysOfWeek Saturday="false"/>
</DaysOfWeek>
<SelectedLimits>
</SelectedLimits>
<SelectedHolidays>
</SelectedHolidays>
</WSAZone>
</Zone>
</ZoneRule>
<ZoneRule Name="Before" RequiresApproval="false">
<Zone>
<WSAZone ConsecutiveDayNumber="3">
<DaysOfWeek>
<WSADaysOfWeek Saturday="true"/>
</DaysOfWeek>
<SelectedLimits>
</SelectedLimits>
<SelectedHolidays>
</SelectedHolidays>
</WSAZone>
</Zone>
</ZoneRule>
</root>
What I am attempting to do is ignore the root tag (this is working, so no problems here) and treat each ZoneRule as its own individual block.
Once I have each ZoneRule isolated, I need to extract all of its nodes and attributes so that I can create a string to query a database and check whether it exists (this part is also working).
The issue I am having is that my code cannot separate out each individual ZoneRule block; for some reason they are all processed as one.
My sample code is as follows:
public String testXML = "";
int andCount = 0;
public void printNote(NodeList nodeList) {
for (int count = 0; count < nodeList.getLength(); count++) {
Node tempNode = nodeList.item(count);
// make sure it's element node.
if (tempNode.getNodeType() == Node.ELEMENT_NODE) {
if (tempNode.hasAttributes()) {
// get attributes names and values
NamedNodeMap nodeMap = tempNode.getAttributes();
for (int i = 0; i < nodeMap.getLength(); i++) {
Node node = nodeMap.item(i);
if (andCount == 0) {
testXML = testXML + "XMLDataAsXML.exist('//" + tempNode.getNodeName() + "[./@" + node.getNodeName() + "=\"" + node.getNodeValue() + "\"]')=1 \n";
} else {
testXML = testXML + " and XMLDataAsXML.exist('//" + tempNode.getNodeName() + "[./@" + node.getNodeName() + "=\"" + node.getNodeValue() + "\"]')=1 \n";
}
andCount = andCount + 1;
}
}
if (tempNode.hasChildNodes()) {
// loop again if has child nodes
printNote(tempNode.getChildNodes());
}
}
}
}
private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {
try {
File file = new File("C:\\Test.xml");
DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = dBuilder.parse(file);
//System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
if (doc.hasChildNodes()) {
printNote(doc.getChildNodes());
}
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.out.println(testXML);
}
Which produces this output (both nodes combined).
XMLDataAsXML.exist('//ZoneRule[./@Name="After"]')=1
and XMLDataAsXML.exist('//ZoneRule[./@RequiresApproval="false"]')=1
and XMLDataAsXML.exist('//WSAZone[./@ConsecutiveDayNumber="1"]')=1
and XMLDataAsXML.exist('//WSADaysOfWeek[./@Saturday="false"]')=1
and XMLDataAsXML.exist('//ZoneRule[./@Name="Before"]')=1
and XMLDataAsXML.exist('//ZoneRule[./@RequiresApproval="false"]')=1
and XMLDataAsXML.exist('//WSAZone[./@ConsecutiveDayNumber="3"]')=1
and XMLDataAsXML.exist('//WSADaysOfWeek[./@Saturday="true"]')=1
What I am actually after is this (excuse the incomplete SQL statements):
XMLDataAsXML.exist('//ZoneRule[./@Name="After"]')=1
and XMLDataAsXML.exist('//ZoneRule[./@RequiresApproval="false"]')=1
and XMLDataAsXML.exist('//WSAZone[./@ConsecutiveDayNumber="1"]')=1
and XMLDataAsXML.exist('//WSADaysOfWeek[./@Saturday="false"]')=1
XMLDataAsXML.exist XMLDataAsXML.exist('//ZoneRule[./@Name="Before"]')=1
and XMLDataAsXML.exist('//ZoneRule[./@RequiresApproval="false"]')=1
and XMLDataAsXML.exist('//WSAZone[./@ConsecutiveDayNumber="3"]')=1
and XMLDataAsXML.exist('//WSADaysOfWeek[./@Saturday="true"]')=1
The XML that will be parsed will not always be exactly like the above, so I cannot use hardcoded XPaths etc. I need to loop through the document dynamically, looking for the ZoneRule node as my base (I will generate this value dynamically based on the file received), and then extract all the required info.
I am completely open to better methods than what I have tried above.
Thanks very much.
In your code, testXML and andCount are declared outside the printNote method and are never reset between iterations.
You start with the first ZoneRule and generate the correct text during the first iterations of the for loop (let's forget about the recursion for a moment). When you move on to the next ZoneRule, testXML still contains all the text generated so far and andCount is larger than 0, so the text generated for the next ZoneRule keeps being appended to it.
You could reset andCount and testXML at the beginning of each iteration of the for loop, but then the children visited by the recursive calls would not be rendered correctly.
So either you need two methods, one to deal with the top-level ZoneRule elements and another for their children, or, much better, instead of appending text to a shared variable you should redesign your methods so that they return a String value, which can then be appended correctly (with or without "and", with or without a new line) at the place where the method is called recursively.
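A rough sketch of that second suggestion follows. It is only an illustration under assumptions: the class ZoneRuleQueries and the methods buildQueries and collectConditions are invented names, the block element name is passed in as a parameter, and the conditions use the standard XPath @ attribute axis. All state is local, so each block gets its own string:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ZoneRuleQueries {
    // Builds one query string per element named blockName (e.g. "ZoneRule").
    static List<String> buildQueries(Document doc, String blockName) {
        List<String> queries = new ArrayList<>();
        NodeList blocks = doc.getElementsByTagName(blockName);
        for (int i = 0; i < blocks.getLength(); i++) {
            // A fresh list per block replaces the shared testXML/andCount fields
            List<String> conditions = new ArrayList<>();
            collectConditions((Element) blocks.item(i), conditions);
            queries.add(String.join("\n and ", conditions));
        }
        return queries;
    }

    // Recursively adds one condition per attribute of the element and its descendants.
    private static void collectConditions(Element element, List<String> out) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.add("XMLDataAsXML.exist('//" + element.getNodeName()
                    + "[./@" + attr.getNodeName() + "=\"" + attr.getNodeValue() + "\"]')=1");
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                collectConditions((Element) children.item(i), out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<ZoneRule Name=\"After\"><Zone><WSAZone ConsecutiveDayNumber=\"1\"/></Zone></ZoneRule>"
                + "<ZoneRule Name=\"Before\"><Zone><WSAZone ConsecutiveDayNumber=\"3\"/></Zone></ZoneRule>"
                + "</root>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        for (String query : buildQueries(doc, "ZoneRule")) {
            System.out.println(query);
            System.out.println();
        }
    }
}
```

Joining the collected conditions at the end removes the need for a counter to decide where "and" goes. Note that getElementsByTagName also returns nested matches, so this sketch assumes the block elements are not nested inside each other.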
I am using Jericho HTML Parser to parse some malformed html. In particular I am trying to get all text nodes, process the text and then replace it.
I want to skip specific elements from processing. For example, I want to skip all <a> elements, and any element that has the attribute class="noProcess". So, if a div has class="noProcess", then I want to skip this div and all of its children from processing. However, I do want these skipped elements to be returned in the output after processing.
Jericho provides an Iterator for all nodes but I am not sure how to skip complete elements from the Iterator. Here is my code:
private String doProcessHtml(String html) {
Source source = new Source(html);
OutputDocument outputDocument = new OutputDocument(source);
for (Segment segment : source) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
// DO SOMETHING HERE TO SKIP ENTIRE ELEMENT IF IS <A> OR CLASS="noProcess"
} else if (segment instanceof CharacterReference) {
CharacterReference characterReference = (CharacterReference) segment;
System.out.println("FOUND CHARACTERREFERENCE: " + characterReference.getCharacterReferenceString());
} else {
System.out.println("FOUND PLAIN TEXT: " + segment.toString());
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
return outputDocument.toString();
}
It doesn't look like using the ignoreWhenParsing() method works for me as the parser just treats the "ignored" element as text.
I was thinking that if I could convert the Iterator loop to a for (int i = 0;...) loop I could probably be able to skip the element and all its children by modifying i to point to the EndTag and then continue the loop.... but not sure.
I think you might want to consider a redesign of the way your segments are built. Is there a way to parse the html in such a way that each segment is a parent element that contains a nested list of child elements? That way you could do something like:
for (Segment segment : source) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
// DO SOMETHING HERE TO SKIP ENTIRE ELEMENT IF IS <A> OR CLASS="noProcess"
continue;
} else if (segment instanceof CharacterReference) {
CharacterReference characterReference = (CharacterReference) segment;
System.out.println("FOUND CHARACTERREFERENCE: " + characterReference.getCharacterReferenceString());
for(Segment child : segment.childNodes()) {
//Use recursion to process child elements
//You will want to put your for loop in a separate method so it can be called recursively.
}
} else {
System.out.println("FOUND PLAIN TEXT: " + segment.toString());
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
Without more code to inspect, it's hard to determine whether restructuring the segment element is even possible or worth the effort.
Managed to have a working solution by using the getEnd() method of the Element object of the Tag. The idea is to skip elements if their end position is less than a position you set. So you find the end position of the element you want to exclude and you do not process anything else before that position:
final ArrayList<String> excludeTags = new ArrayList<String>(Arrays.asList(new String[] {"head", "script", "a"}));
final ArrayList<String> excludeClasses = new ArrayList<String>(Arrays.asList(new String[] {"noProcess"}));
Source.LegacyIteratorCompatabilityMode = true;
Source source = new Source(htmlToProcess);
OutputDocument outputDocument = new OutputDocument(source);
int skipToPos = 0;
for (Segment segment : source) {
if (segment.getBegin() >= skipToPos) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
Element element = tag.getElement();
// check excludeTags
if (excludeTags.contains(tag.getName().toLowerCase())) {
skipToPos = element.getEnd();
}
// check excludeClasses
String classes = element.getAttributeValue("class");
if (classes != null) {
for (String theClass : classes.split(" ")) {
if (excludeClasses.contains(theClass.toLowerCase())) {
skipToPos = element.getEnd();
}
}
}
} else if (segment instanceof CharacterReference) { // for future use. Source.LegacyIteratorCompatabilityMode = true;
CharacterReference characterReference = (CharacterReference) segment;
} else {
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
}
return outputDocument.toString();
This should work.
String skipTag = null;
for (Segment segment : source) {
if (skipTag != null) { // is skipping ON?
if (segment instanceof EndTag && // if EndTag found for the
skipTag.equals(((EndTag) segment).getName())) { // tag we're skipping
skipTag = null; // set skipping OFF
}
continue; // continue skipping (or skip the EndTag)
} else if (segment instanceof Tag) { // is tag?
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
if (HTMLElementName.A.equals(tag.getName())) { // if <a> ?
skipTag = tag.getName(); // set
continue; // skipping ON
} else if (tag instanceof StartTag) {
if ("noProcess".equals( // if <tag class="noProcess" ..> ?
((StartTag) tag).getAttributeValue("class"))) {
skipTag = tag.getName(); // set
continue; // skipping ON
}
}
} // ...
}
I'd like to be able to read in an XML schema (i.e. xsd) and from that know what are valid attributes, child elements, values as I walk through it.
For example, let's say I have an xsd that this xml will validate against:
<root>
<element-a type="something">
<element-b>blah</element-b>
<element-c>blahblah</element-c>
</element-a>
</root>
I've tinkered with several libraries and I can confidently get <root> as the root element. Beyond that I'm lost.
Given an element I need to know what child elements are required or allowed, attributes, facets, choices, etc. Using the above example I'd want to know that element-a has an attribute type and may have children element-b and element-c...or must have children element-b and element-c...or must have one of each...you get the picture I hope.
I've looked at numerous libraries such as XSOM, Eclipse XSD, Apache XmlSchema and found they're all short on good sample code. My search of the Internet has also been unsuccessful.
Does anyone know of a good example or even a book that demonstrates how to go through an XML schema and find out what would be valid options at a given point in a validated XML document?
clarification
I'm not looking to validate a document, rather I'd like to know the options at a given point to assist in creating or editing a document. If I know "I am here" in a document, I'd like to determine what I can do at that point. "Insert one of element A, B, or C" or "attach attribute 'description'".
This is a good question. Although it is old, I did not find an acceptable answer. The thing is that the existing libraries I am aware of (XSOM, Apache XmlSchema) are designed as object models. The implementors did not intend to provide any utility methods; you should consider implementing them yourself using the provided object model.
Let's see how querying context-specific elements can be done by the means of Apache XmlSchema.
You can use their tutorial as a starting point. In addition, the Apache CXF framework provides the XmlSchemaUtils class with lots of handy code examples.
First of all, read the XmlSchemaCollection as illustrated by the library's tutorial:
XmlSchemaCollection xmlSchemaCollection = new XmlSchemaCollection();
xmlSchemaCollection.read(inputSource, new ValidationEventHandler());
Now, XML Schema defines two kinds of data types:
Simple types
Complex types
Simple types are represented by the XmlSchemaSimpleType class. Handling them is easy. Read the documentation: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaSimpleType.html. But let's see how to handle complex types. Let's start with a simple method:
@Override
public List<QName> getChildElementNames(QName parentElementName) {
XmlSchemaElement element = xmlSchemaCollection.getElementByQName(parentElementName);
XmlSchemaType type = element != null ? element.getSchemaType() : null;
List<QName> result = new LinkedList<>();
if (type instanceof XmlSchemaComplexType) {
addElementNames(result, (XmlSchemaComplexType) type);
}
return result;
}
XmlSchemaComplexType may stand for both real type and for the extension element. Please see the public static QName getBaseType(XmlSchemaComplexType type) method of the XmlSchemaUtils class.
private void addElementNames(List<QName> result, XmlSchemaComplexType type) {
XmlSchemaComplexType baseType = getBaseType(type);
XmlSchemaParticle particle = baseType != null ? baseType.getParticle() : type.getParticle();
addElementNames(result, particle);
}
When you handle XmlSchemaParticle, consider that it can have multiple implementations. See: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaParticle.html
private void addElementNames(List<QName> result, XmlSchemaParticle particle) {
if (particle instanceof XmlSchemaAny) {
} else if (particle instanceof XmlSchemaElement) {
} else if (particle instanceof XmlSchemaGroupBase) {
} else if (particle instanceof XmlSchemaGroupRef) {
}
}
The other thing to bear in mind is that elements can be either abstract or concrete. Again, the JavaDocs are the best guidance.
Many of the solutions for validating XML in java use the JAXB API. There's an extensive tutorial available here. The basic recipe for doing what you're looking for with JAXB is as follows:
Obtain or create the XML schema to validate against.
Generate Java classes to bind the XML to using xjc, the JAXB compiler.
Write java code to:
Open the XML content as an input stream.
Create a JAXBContext and Unmarshaller
Pass the input stream to the Unmarshaller's unmarshal method.
The parts of the tutorial you can read for this are:
Hello, world
Unmarshalling XML
I see you have tried Eclipse XSD. Have you tried Eclipse Modeling Framework (EMF)? You can:
Generating an EMF Model using XML Schema (XSD)
Create a dynamic instance from your metamodel (3.1 With the dynamic instance creation tool)
This is for exploring the XSD. You can create a dynamic instance of the root element, then right-click the element and create a child element. There you will see what the possible child elements are, and so on.
As for saving the created EMF model to XML that complies with the XSD: I would have to look it up. I think you can use JAXB for that (How to use EMF to read XML file?).
Some refs:
EMF: Eclipse Modeling Framework, 2nd Edition (written by creators)
Eclipse Modeling Framework (EMF)
Discover the Eclipse Modeling Framework (EMF) and Its Dynamic Capabilities
Creating Dynamic EMF Models From XSDs and Loading its Instances From XML as SDOs
This is a fairly complete sample on how to parse an XSD using XSOM:
import java.io.File;
import java.util.Iterator;
import java.util.Vector;
import org.xml.sax.ErrorHandler;
import com.sun.xml.xsom.XSComplexType;
import com.sun.xml.xsom.XSElementDecl;
import com.sun.xml.xsom.XSFacet;
import com.sun.xml.xsom.XSModelGroup;
import com.sun.xml.xsom.XSModelGroupDecl;
import com.sun.xml.xsom.XSParticle;
import com.sun.xml.xsom.XSRestrictionSimpleType;
import com.sun.xml.xsom.XSSchema;
import com.sun.xml.xsom.XSSchemaSet;
import com.sun.xml.xsom.XSSimpleType;
import com.sun.xml.xsom.XSTerm;
import com.sun.xml.xsom.impl.Const;
import com.sun.xml.xsom.parser.XSOMParser;
import com.sun.xml.xsom.util.DomAnnotationParserFactory;
public class XSOMNavigator
{
public static class SimpleTypeRestriction
{
public String[] enumeration = null;
public String maxValue = null;
public String minValue = null;
public String length = null;
public String maxLength = null;
public String minLength = null;
public String[] pattern = null;
public String totalDigits = null;
public String fractionDigits = null;
public String whiteSpace = null;
public String toString()
{
String enumValues = "";
if (enumeration != null)
{
for(String val : enumeration)
{
enumValues += val + ", ";
}
enumValues = enumValues.substring(0, enumValues.lastIndexOf(','));
}
String patternValues = "";
if (pattern != null)
{
for(String val : pattern)
{
patternValues += "(" + val + ")|";
}
patternValues = patternValues.substring(0, patternValues.lastIndexOf('|'));
}
String retval = "";
retval += minValue == null ? "" : "[MinValue = " + minValue + "]\t";
retval += maxValue == null ? "" : "[MaxValue = " + maxValue + "]\t";
retval += minLength == null ? "" : "[MinLength = " + minLength + "]\t";
retval += maxLength == null ? "" : "[MaxLength = " + maxLength + "]\t";
retval += pattern == null ? "" : "[Pattern(s) = " + patternValues + "]\t";
retval += totalDigits == null ? "" : "[TotalDigits = " + totalDigits + "]\t";
retval += fractionDigits == null ? "" : "[FractionDigits = " + fractionDigits + "]\t";
retval += whiteSpace == null ? "" : "[WhiteSpace = " + whiteSpace + "]\t";
retval += length == null ? "" : "[Length = " + length + "]\t";
retval += enumeration == null ? "" : "[Enumeration Values = " + enumValues + "]\t";
return retval;
}
}
private static void initRestrictions(XSSimpleType xsSimpleType, SimpleTypeRestriction simpleTypeRestriction)
{
XSRestrictionSimpleType restriction = xsSimpleType.asRestriction();
if (restriction != null)
{
Vector<String> enumeration = new Vector<String>();
Vector<String> pattern = new Vector<String>();
for (XSFacet facet : restriction.getDeclaredFacets())
{
if (facet.getName().equals(XSFacet.FACET_ENUMERATION))
{
enumeration.add(facet.getValue().value);
}
if (facet.getName().equals(XSFacet.FACET_MAXINCLUSIVE))
{
simpleTypeRestriction.maxValue = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MININCLUSIVE))
{
simpleTypeRestriction.minValue = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MAXEXCLUSIVE))
{
simpleTypeRestriction.maxValue = String.valueOf(Integer.parseInt(facet.getValue().value) - 1);
}
if (facet.getName().equals(XSFacet.FACET_MINEXCLUSIVE))
{
simpleTypeRestriction.minValue = String.valueOf(Integer.parseInt(facet.getValue().value) + 1);
}
if (facet.getName().equals(XSFacet.FACET_LENGTH))
{
simpleTypeRestriction.length = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MAXLENGTH))
{
simpleTypeRestriction.maxLength = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_MINLENGTH))
{
simpleTypeRestriction.minLength = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_PATTERN))
{
pattern.add(facet.getValue().value);
}
if (facet.getName().equals(XSFacet.FACET_TOTALDIGITS))
{
simpleTypeRestriction.totalDigits = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_FRACTIONDIGITS))
{
simpleTypeRestriction.fractionDigits = facet.getValue().value;
}
if (facet.getName().equals(XSFacet.FACET_WHITESPACE))
{
simpleTypeRestriction.whiteSpace = facet.getValue().value;
}
}
if (enumeration.size() > 0)
{
simpleTypeRestriction.enumeration = enumeration.toArray(new String[] {});
}
if (pattern.size() > 0)
{
simpleTypeRestriction.pattern = pattern.toArray(new String[] {});
}
}
}
private static void printParticle(XSParticle particle, String occurs, String absPath, String indent)
{
boolean repeats = particle.isRepeated();
occurs = " MinOccurs = " + particle.getMinOccurs() + ", MaxOccurs = " + particle.getMaxOccurs() + ", Repeats = " + Boolean.toString(repeats);
XSTerm term = particle.getTerm();
if (term.isModelGroup())
{
printGroup(term.asModelGroup(), occurs, absPath, indent);
}
else if(term.isModelGroupDecl())
{
printGroupDecl(term.asModelGroupDecl(), occurs, absPath, indent);
}
else if (term.isElementDecl())
{
printElement(term.asElementDecl(), occurs, absPath, indent);
}
}
private static void printGroup(XSModelGroup modelGroup, String occurs, String absPath, String indent)
{
System.out.println(indent + "[Start of Group " + modelGroup.getCompositor() + occurs + "]" );
for (XSParticle particle : modelGroup.getChildren())
{
printParticle(particle, occurs, absPath, indent + "\t");
}
System.out.println(indent + "[End of Group " + modelGroup.getCompositor() + "]");
}
private static void printGroupDecl(XSModelGroupDecl modelGroupDecl, String occurs, String absPath, String indent)
{
System.out.println(indent + "[GroupDecl " + modelGroupDecl.getName() + occurs + "]");
printGroup(modelGroupDecl.getModelGroup(), occurs, absPath, indent);
}
private static void printComplexType(XSComplexType complexType, String occurs, String absPath, String indent)
{
System.out.println();
XSParticle particle = complexType.getContentType().asParticle();
if (particle != null)
{
printParticle(particle, occurs, absPath, indent);
}
}
private static void printSimpleType(XSSimpleType simpleType, String occurs, String absPath, String indent)
{
SimpleTypeRestriction restriction = new SimpleTypeRestriction();
initRestrictions(simpleType, restriction);
System.out.println(restriction.toString());
}
public static void printElement(XSElementDecl element, String occurs, String absPath, String indent)
{
absPath += "/" + element.getName();
String typeName = element.getType().getBaseType().getName();
if(element.getType().isSimpleType() && element.getType().asSimpleType().isPrimitive())
{
// We have a primitive type - So use that instead
typeName = element.getType().asSimpleType().getPrimitiveType().getName();
}
boolean nillable = element.isNillable();
System.out.print(indent + "[Element " + absPath + " " + occurs + "] of type [" + typeName + "]" + (nillable ? " [nillable] " : ""));
if (element.getType().isComplexType())
{
printComplexType(element.getType().asComplexType(), occurs, absPath, indent);
}
else
{
printSimpleType(element.getType().asSimpleType(), occurs, absPath, indent);
}
}
public static void printNameSpace(XSSchema s, String indent)
{
String nameSpace = s.getTargetNamespace();
// We do not want the default XSD namespaces or a namespace with nothing in it
if(nameSpace == null || Const.schemaNamespace.equals(nameSpace) || s.getElementDecls().isEmpty())
{
return;
}
System.out.println("Target namespace: " + nameSpace);
Iterator<XSElementDecl> jtr = s.iterateElementDecls();
while (jtr.hasNext())
{
XSElementDecl e = (XSElementDecl) jtr.next();
String occurs = "";
String absPath = "";
XSOMNavigator.printElement(e, occurs, absPath,indent);
System.out.println();
}
}
public static void xsomNavigate(File xsdFile)
{
ErrorHandler errorHandler = new ErrorReporter(System.err);
XSSchemaSet schemaSet = null;
XSOMParser parser = new XSOMParser();
try
{
parser.setErrorHandler(errorHandler);
parser.setAnnotationParser(new DomAnnotationParserFactory());
parser.parse(xsdFile);
schemaSet = parser.getResult();
}
catch (Exception exp)
{
exp.printStackTrace(System.out);
}
if(schemaSet != null)
{
// iterate each XSSchema object. XSSchema is a per-namespace schema.
Iterator<XSSchema> itr = schemaSet.iterateSchema();
while (itr.hasNext())
{
XSSchema s = (XSSchema) itr.next();
String indent = "";
printNameSpace(s, indent);
}
}
}
public static void printFile(String fileName)
{
File fileToParse = new File(fileName);
if (fileToParse != null && fileToParse.canRead())
{
xsomNavigate(fileToParse);
}
}
}
And for your Error Reporter use:
import java.io.OutputStream;
import java.io.PrintStream;
import java.text.MessageFormat;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ErrorReporter implements ErrorHandler {
private final PrintStream out;
public ErrorReporter( PrintStream o ) { this.out = o; }
public ErrorReporter( OutputStream o ) { this(new PrintStream(o)); }
public void warning(SAXParseException e) throws SAXException {
print("[Warning]",e);
}
public void error(SAXParseException e) throws SAXException {
print("[Error ]",e);
}
public void fatalError(SAXParseException e) throws SAXException {
print("[Fatal ]",e);
}
private void print( String header, SAXParseException e ) {
out.println(header+' '+e.getMessage());
out.println(MessageFormat.format(" line {0} at {1}",
new Object[]{
Integer.toString(e.getLineNumber()),
e.getSystemId()}));
}
}
For your main use:
public class WDXSOMParser
{
public static void main(String[] args)
{
String fileName = null;
if(args != null && args.length > 0 && args[0] != null)
fileName = args[0];
else
fileName = "C:\\xml\\CollectionComments\\CollectionComment1.07.xsd";
//fileName = "C:\\xml\\PropertyListingContractSaleInfo\\PropertyListingContractSaleInfo.xsd";
//fileName = "C:\\xml\\PropertyPreservation\\PropertyPreservation.xsd";
XSOMNavigator.printFile(fileName);
}
}
It's a good bit of work, depending on how complex your XSD is, but basically:
if you had
<Document>
<Header/>
<Body/>
</Document>
and you wanted to find out what the allowable children of Header were, you would (taking account of namespaces) use an XPath over the schema that looks for
'/element[name="Document"]/element[name="Header"]'
After that it depends on how much you want to do. You might find it easier to write or find something that loads an XSD into a DOM-type structure.
Of course, you may find all sorts of things under that element in the XSD: choice, sequence, any, attributes, complexType, simpleContent, annotation.
Loads of time consuming fun.
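As a rough illustration of the XPath idea, the sketch below loads a small made-up XSD into a DOM and lists the element declarations nested directly under Document/Header. Everything here is an assumption for the example: the class SchemaXPathDemo, the method allowedChildren, the sample schema, and above all the premise that the schema declares its elements inline (no named types, refs, or groups). The expression matches on @name and uses local-name() to avoid setting up a namespace context.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaXPathDemo {
    // Made-up schema with inline (anonymous) complex types only
    static final String SAMPLE_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"Document\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Header\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"Title\" type=\"xs:string\"/>"
        + "<xs:element name=\"Author\" type=\"xs:string\" minOccurs=\"0\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "<xs:element name=\"Body\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Names of element declarations directly inside the compositor of parent/child.
    // The names are pasted into the expression, so they must not contain quotes.
    static List<String> allowedChildren(String xsd, String parent, String child) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for local-name() to drop the xs: prefix
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));

        String decl = "*[local-name()='element']";
        String expr = "/*/" + decl + "[@name='" + parent + "']"
                + "//" + decl + "[@name='" + child + "']"
                + "/*[local-name()='complexType']/*/" + decl;

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(((Element) hits.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(allowedChildren(SAMPLE_XSD, "Document", "Header"));
    }
}
```

A real schema would also need handling for type="..." references, ref="...", groups, and extensions, which is where most of the work mentioned above comes in.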
Have a look at this.
How to parse schema using XOM Parser.
Also, here is the project home for XOM