I have an XML file containing a simple set of data.
The data is displayed in a simple table, and each row is assigned an ID based on its position in the XML ( <xsl:value-of select="position()" /> ).
I can't add an id attribute to the data because it's not my data, but I need to locate elements by this position and remove them.
public class Delete extends HttpServlet {
private final String XML_FILE = "data.xml";
public void init() throws ServletException {
}
public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// Disable browser caching
response.setHeader("Cache-Control", "private, no-store, no-cache, must-revalidate");
response.setHeader("Pragma", "no-cache");
response.setDateHeader("Expires", 0);
String index = request.getParameter("delete");
try {
// Load the current data.xml
SAXBuilder builder = new SAXBuilder();
Document xml_document = builder.build(new File(getServletContext().getRealPath("/") + XML_FILE));
Element root = xml_document.getRootElement();
root.removeChild(index);
XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
outputter.output(xml_document, new FileWriter(getServletContext().getRealPath("/") + XML_FILE));
}
catch(Exception ex) {}
// Once we have processed the input we were given
// redirect the web browser to the main page.
response.sendRedirect("/");
}
public void destroy() {
}
}
This code does not remove the correct data. Does anyone know how to find a child of the root element by its position?
@rolfl
int index = Integer.parseInt(delete);
Element root = xml_document.getRootElement();
root.getChildren().remove(index);
This does not remove any elements.
The problem is that the index arrives as a String, so the code calls the removeChild(String) method, which looks for the first child whose element name matches whatever string value is in index.
What you want to do instead is convert the index to an int and then treat the children of the root as a List, something like:
int index = Integer.parseInt(request.getParameter("delete"));
root.getChildren().remove(index);
See the documentation for getChildren().
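One thing to watch: XSLT's position() counts from 1, while a Java List is indexed from 0, so the parsed parameter has to be shifted by one before calling remove. Below is a minimal sketch of that conversion. It uses a plain List as a stand-in for the live list returned by getChildren(); the class and method names are hypothetical and not part of JDOM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PositionRemoval {
    /**
     * Removes the item at a 1-based position (as produced by XSLT position()).
     * Returns the removed item, or null if the parameter is missing,
     * not a number, or out of range.
     */
    static <T> T removeAtPosition(List<T> items, String positionParam) {
        if (positionParam == null) {
            return null;
        }
        int position;
        try {
            position = Integer.parseInt(positionParam.trim());
        } catch (NumberFormatException e) {
            return null;
        }
        int index = position - 1; // position() is 1-based, List is 0-based
        if (index < 0 || index >= items.size()) {
            return null;
        }
        return items.remove(index);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(Arrays.asList("first", "second", "third"));
        System.out.println(removeAtPosition(rows, "2")); // prints "second"
        System.out.println(rows);                        // prints "[first, third]"
    }
}
```

With JDOM, the same helper could presumably be called as removeAtPosition(root.getChildren(), request.getParameter("delete")), since changes to that list affect the element's content.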
This is how I got it to work. I'm not sure it's a great solution, but it works.
SAXBuilder builder = new SAXBuilder();
Document xml_document = builder.build(new File(getServletContext().getRealPath("/") + XML_FILE));
// Get root element
Element root = xml_document.getRootElement();
// Create a list of the children of the root element
List<Element> kids = root.getChildren();
// Iterate through the list of elements and delete (detach) the element at position index.
int i = 1;
for (Element element : kids)
{
if(i == index)
{
element.detach();
break;
}
else
{
i = i + 1;
}
}
I got the root element with
Element root = xml_document.getRootElement();
Made a list of its child elements with
List<Element> kids = root.getChildren();
Then iterated through this list until I reached the index of the element to delete, and called detach() on that element
int i = 1;
for (Element element : kids)
{
if(i == index)
{
element.detach();
break;
}
else
{
i = i + 1;
}
}
If anyone can update this to show an easier way to remove the element please do so. It feels like there must be an easier way to detach an element without the iteration. Anyway, as I said it works.
I have an HTML string like
<b>test</b><b>er</b>
<span class="ab">continue</span><span> without</span>
I want to collapse tags that are similar and adjacent to each other. In the above sample I want to have
<b>tester</b>
since the tags have the same name without any further attribute or style. The span tags, however, should remain as they are because one of them has a class attribute. I am aware that I can iterate over the tree via Jsoup.
Document doc = Jsoup.parse(input);
for (Element element : doc.select("b")) {
}
But I'm not clear on how to look ahead (I guess something like nextSibling), and then how to collapse the elements.
Or is there a simple regexp-based merge?
I can specify the attributes on my own. A one-size-fits-all tag solution is not required.
My approach would look like this; see the comments in the code.
public class StackOverflow60704600 {
public static void main(final String[] args) throws IOException {
Document doc = Jsoup.parse("<b>test</b><b>er</b><span class=\"ab\">continue</span><span> without</span>");
mergeSiblings(doc, "b");
System.out.println(doc);
}
private static void mergeSiblings(Document doc, String selector) {
Elements elements = doc.select(selector);
for (Element element : elements) {
// get the next sibling
Element nextSibling = element.nextElementSibling();
// merge only if the next sibling has the same tag name and the same set of attributes
if (nextSibling != null && nextSibling.tagName().equals(element.tagName())
&& nextSibling.attributes().equals(element.attributes())) {
// your element has only one child, but let's rewrite all of them if there's more
while (nextSibling.childNodes().size() > 0) {
Node siblingChildNode = nextSibling.childNodes().get(0);
element.appendChild(siblingChildNode);
}
// remove because now it doesn't have any children
nextSibling.remove();
}
}
}
}
output:
<html>
<head></head>
<body>
<b>tester</b>
<span class="ab">continue</span>
<span> without</span>
</body>
</html>
One more note on why I used the loop while (nextSibling.childNodes().size() > 0): a for loop or an iterator can't be used here, because appendChild adds the child but also removes it from the source element, so the remaining children are shifted. It may not be visible here, but the problem appears when you try to merge: <b>test</b><b>er<a>123</a></b>
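The same pitfall can be reproduced outside Jsoup. The sketch below uses the JDK's built-in org.w3c.dom API, an assumption made only to keep the example self-contained; it is not the Jsoup code above. There, appendChild likewise moves a node out of its old parent, so draining the source with a while loop keeps working as the child list shrinks. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MoveChildrenDemo {
    /** Moves every child of source to the end of target, then removes the emptied source. */
    static void mergeInto(Element target, Element source) {
        // appendChild detaches the node from source first,
        // so always take whatever is currently the first remaining child
        while (source.hasChildNodes()) {
            target.appendChild(source.getFirstChild());
        }
        source.getParentNode().removeChild(source);
        // join adjacent text nodes such as "test" and "er"
        target.normalize();
    }

    /** Parses the XML, merges the root's second child element into the first, returns the root. */
    static Element mergeFirstTwo(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // assumes the input has no whitespace between the two elements
            Element first = (Element) root.getFirstChild();
            Element second = (Element) first.getNextSibling();
            mergeInto(first, second);
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or merge XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = mergeFirstTwo("<body><b>test</b><b>er<a>123</a></b></body>");
        System.out.println(root.getTextContent());
    }
}
```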
I tried to update the code from @Krystian G, but my edit was rejected :-/ so I am posting it as a separate answer. The code is an excellent starting point, but it fails if a TextNode appears between the tags, e.g.
<span> no class but further</span> (in)valid <span>spanning</span> would result in
<span> no class but furtherspanning</span> (in)valid
Therefore the corrected code looks like:
public class StackOverflow60704600 {
public static void main(final String[] args) throws IOException {
String test1="<b>test</b><b>er</b><span class=\"ab\">continue</span><span> without</span>";
String test2="<b>test</b><b>er<a>123</a></b>";
String test3="<span> no class but further</span> <span>spanning</span>";
String test4="<span> no class but further</span> (in)valid <span>spanning</span>";
Document doc = Jsoup.parse(test1);
mergeSiblings(doc, "b");
System.out.println(doc);
}
private static void mergeSiblings(Document doc, String selector) {
Elements elements = doc.select(selector);
for (Element element : elements) {
Node nextElement = element.nextSibling();
// if the next Element is a TextNode but has only space ==> we need to preserve the
// spacing
boolean addSpace = false;
if (nextElement != null && nextElement instanceof TextNode) {
String content = nextElement.toString();
if (!content.isBlank()) {
// the next element has some content
continue;
} else {
addSpace = true;
}
}
// get the next sibling
Element nextSibling = element.nextElementSibling();
// merge only if the next sibling has the same tag name and the same set of
// attributes
if (nextSibling != null && nextSibling.tagName().equals(element.tagName())
&& nextSibling.attributes().equals(element.attributes())) {
// your element has only one child, but let's rewrite all of them if there's more
while (nextSibling.childNodes().size() > 0) {
Node siblingChildNode = nextSibling.childNodes().get(0);
if (addSpace) {
// since we have had some space previously ==> preserve it and add it
if (siblingChildNode instanceof TextNode) {
((TextNode) siblingChildNode).text(" " + siblingChildNode.toString());
} else {
element.appendChild(new TextNode(" "));
}
// the preserved space belongs only before the first moved child
addSpace = false;
}
element.appendChild(siblingChildNode);
}
// remove because now it doesn't have any children
nextSibling.remove();
}
}
}
}
So I am trying to get the data from this webpage using Jsoup...
I've tried looking up many different ways of doing it and I've gotten close but I don't know how to find tags for certain stats (Attack, Strength, Defence, etc.)
So let's say, for example's sake, I wanted to print out
'Attack', '15', '99', '200,000,000'
How should I go about doing this?
You can use CSS selectors in Jsoup to easily extract the column data.
// retrieve page source code
Document doc = Jsoup
.connect("http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Lynx%A0Titan")
.get();
// find all of the table rows
Elements rows = doc.select("div#contentHiscores table tr");
ListIterator<Element> itr = rows.listIterator();
// loop over each row
while (itr.hasNext()) {
Element row = itr.next();
// does the second col contain the word attack?
if (row.select("td:nth-child(2) a:contains(attack)").first() != null) {
// if so, assign each sibling col to variable
String rank = row.select("td:nth-child(3)").text();
String level = row.select("td:nth-child(4)").text();
String xp = row.select("td:nth-child(5)").text();
System.out.printf("rank=%s level=%s xp=%s", rank, level, xp);
// stop looping rows, found attack
break;
}
}
A very rough implementation is shown below. It is only a snippet; optimizations and additional checks still need to be added.
public static void main(String[] args) throws Exception {
Document doc = Jsoup
.connect("http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Lynx%A0Titan")
.get();
Element contentHiscoresDiv = doc.getElementById("contentHiscores");
Element table = contentHiscoresDiv.child(0);
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
for (Element column : tds) {
if (column.children() != null && column.children().size() > 0) {
Element anchorTag = column.getElementsByTag("a").first();
if (anchorTag != null && anchorTag.text().contains("Attack")) {
System.out.println(anchorTag.text());
Elements attributeSiblings = column.siblingElements();
for (Element attributeSibling : attributeSiblings) {
System.out.println(attributeSibling.text());
}
}
}
}
}
}
Output:
Attack
15
99
200,000,000
I've got a method which returns a Map from an XML file. I've converted that map into separate lists of keys and values.
However, I'm noticing there are newline characters in the values list. How can I strip out the newlines and replace them with a space, or leave the values blank?
Code:
@Test
public void testGetXMLModelData() throws Exception {
File f = new File("xmlDir/example.xml");
Model m = getXMLModelData(f);
logger.debug("Models Keys: "+m.getInputs());
logger.debug("Models Values: "+m.getValues());
}
public Model getXMLModelData(File f) throws Exception {
Model model = new Model();
Map<String,String> map = p(f);
List<String> listKeys = new ArrayList<String>(map.keySet());
List<String> listValues = new ArrayList<String>(map.values());
model.setInputs(listKeys);
model.setValues(listValues);
return model;
}
public Map<String, String> p(File file) throws Exception {
Map<String, String> map = new HashMap<String,String>();
XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream(file));
while(xr.hasNext()) {
int e = xr.next();
if (e == XMLStreamReader.START_ELEMENT) {
String name = xr.getLocalName();
xr.next();
String value = null;
try {
value = xr.getText();
} catch (IllegalStateException exep) {
exep.printStackTrace();
}
map.put(name, value);
}
}
return map;
}
Output:
2015-08-19 20:13:52,327 : Models Keys: [IRS1095A, MonthlyPlanPremiumAmtPP, WagesSalariesAndTipsAmt, MonthlyAdvancedPTCAmtPP, MonthCdPP, ReturnData, IndividualReturnFilingStatusCd, PrimaryResidentStatesInfoGrpPP, MonthlyPTCInformationGrpPP, IRS1040, ResidentStateInfoPP, SelfSelectPINGrp, MonthlyPremiumSLCSPAmtPP, Filer, ResidentStateAbbreviationCdPP, PrimaryBirthDt, Return, ReturnHeader, TotalExemptionsCnt, AdjustedGrossIncomeAmt, PrimarySSN]
2015-08-19 20:13:52,328 : Models Values: [
, 136, 22000, 125, SEPTEMBER,
, 1,
,
,
,
,
, 250,
, CA, 1970-01-01,
,
, 1, 22000, 555-11-2222]
Any help or assistance would be much appreciated. Thanks in advance
Edit:
XML file
<Return xmlns="http://www.irs.gov/efile">
<ReturnData>
<IRS1095A uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
<MonthlyPTCInformationGrpPP uuid="69dc9dd5-5415-4ee4-a199-19b2dbb701be">
<MonthlyPlanPremiumAmtPP>136</MonthlyPlanPremiumAmtPP>
<MonthlyAdvancedPTCAmtPP>125</MonthlyAdvancedPTCAmtPP>
<MonthCdPP>SEPTEMBER</MonthCdPP>
<MonthlyPremiumSLCSPAmtPP>250</MonthlyPremiumSLCSPAmtPP>
</MonthlyPTCInformationGrpPP>
</IRS1095A>
<IRS1040>
<IndividualReturnFilingStatusCd>1</IndividualReturnFilingStatusCd>
<WagesSalariesAndTipsAmt>22000</WagesSalariesAndTipsAmt>
<TotalExemptionsCnt>1</TotalExemptionsCnt>
<AdjustedGrossIncomeAmt>22000</AdjustedGrossIncomeAmt>
</IRS1040>
</ReturnData>
<ReturnHeader>
<SelfSelectPINGrp>
<PrimaryBirthDt>1970-01-01</PrimaryBirthDt>
</SelfSelectPINGrp>
<Filer>
<PrimarySSN>555-11-2222</PrimarySSN>
<PrimaryResidentStatesInfoGrpPP>
<ResidentStateInfoPP uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
<ResidentStateAbbreviationCdPP>CA</ResidentStateAbbreviationCdPP>
</ResidentStateInfoPP>
</PrimaryResidentStatesInfoGrpPP>
</Filer>
</ReturnHeader>
</Return>
Set value = xr.getText().trim(). That will trim whitespace from the beginning and end of the values.
To then avoid adding empty values, wrap the map.put(name, value) in an if (value != null && !value.isEmpty())
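A minimal sketch of the trim-and-skip idea follows. It is a variation, not the code from the question: it tracks the most recent start element instead of calling next() blindly, reads from a String rather than a File, and uses a LinkedHashMap so the entries stay in document order. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TrimmedValues {
    /** Collects element name -> trimmed text, skipping whitespace-only text. */
    static Map<String, String> parse(String xml) {
        Map<String, String> map = new LinkedHashMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // report adjacent text and CDATA as a single CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader xr = factory.createXMLStreamReader(new StringReader(xml));
            try {
                String current = null; // element whose start tag was seen last
                while (xr.hasNext()) {
                    int event = xr.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        current = xr.getLocalName();
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        current = null; // text after an end tag is layout only
                    } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                        String value = xr.getText().trim();
                        if (!value.isEmpty()) {
                            map.put(current, value);
                        }
                    }
                }
            } finally {
                xr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return map;
    }

    public static void main(String[] args) {
        String xml = "<Return>\n  <MonthCdPP>SEPTEMBER</MonthCdPP>\n</Return>";
        System.out.println(parse(xml));
    }
}
```

As with any Map-based result, a repeated element name would overwrite the earlier value.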
Your code is extracting the element name and the text immediately following the start element, ignoring any text following an end element.
So, it collects:
Return = <newline><space><space>
ReturnData = <newline><space><space><space><space>
IRS1095A = <newline><space><space><space><space><space><space>
MonthlyPTCInformationGrpPP = <newline><space><space><space><space><space><space><space><space>
MonthlyPlanPremiumAmtPP = 136
...
And then you add those to a HashMap, which does not preserve insertion order, so the key/value pairs come out shuffled, making it difficult to see what happened.
Updated
I'm not going to write the code for you, but if you want "value elements" then you need to:
1. Remember start element when seen
2. Collect any text, concatenating with other text already collected, e.g. when you see <text><cdata><text>
3. When seeing a start element and a start element is remembered, verify text is empty or all whitespace, then discard text
4. When seeing an end element:
   1. if start element is remembered, add elementName/text to result, then forget start element and discard text. Note: Don't use a map if the same element name can occur more than once.
   2. if start element is not remembered (was forgotten), verify text is empty or all whitespace, then discard text
This will collect just the leaf elements, ignoring any "layout".
Here is the code, exactly as described above.
Well, I did add the missing resource cleanup.
Map<String, String> map = new HashMap<>();
try (FileInputStream in = new FileInputStream(file)) {
XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(in);
try {
String elementName = null;
StringBuilder textBuf = new StringBuilder();
while (xr.hasNext()) {
switch (xr.next()) {
case XMLStreamConstants.START_ELEMENT:
// 3. When seeing a start element and a start element is remembered
if (elementName != null) {
// verify text is empty or all whitespace
if (! textBuf.toString().trim().isEmpty())
throw new IllegalArgumentException("Found text mixed with elements");
// then discard text
textBuf.setLength(0);
}
// 1. Remember start element when seen
elementName = xr.getLocalName();
break;
case XMLStreamConstants.CHARACTERS:
case XMLStreamConstants.CDATA:
case XMLStreamConstants.SPACE:
// 2. Collect any text
textBuf.append(xr.getText());
break;
case XMLStreamConstants.END_ELEMENT: // 4. When seeing an end element
if (elementName != null) { // 1. if start element is remembered
// add elementName/text to result
map.put(elementName, textBuf.toString());
// then forget start element
elementName = null;
// and discard text
textBuf.setLength(0);
} else { // 2. if start element is not remembered (was forgotten)
// verify text is empty or all whitespace
if (! textBuf.toString().trim().isEmpty())
throw new IllegalArgumentException("Found text mixed with elements");
// then discard text
textBuf.setLength(0);
}
break;
default:
// ignore
}
}
} finally {
xr.close();
}
}
return map;
I have a question about parsing XML. I use JDOM to parse my XML file, but I've run into a small problem.
A sample of my XML-File looks like this:
<IO name="Bus" type="Class">
<ResourceAttribute name="Bandwidth" type="KiloBitPerSecond" value="50" />
</IO>
Bus is an object instance of the class IO. The object has name and type properties. Additionally it has some attributes; in the sample, the attribute Bandwidth has the value 50 and the datatype KiloBitPerSecond.
So when I want to loop over the file with:
for(Element packages : listPackages)
{
Map<String, Values> valueMap = new HashMap<String, Values>();
List<Element> objectInstanceList = packages.getChildren();
for(Element objects : objectInstanceList)
{
List<Element> listObjectClasses = objects.getChildren();
for(Element classes : listObjectClasses)
{
List<Element> listObjectAttributes = classes.getChildren();
for(Element objectAttributes : listObjectAttributes)
{
List<Attribute> listAttributes = objectAttributes.getAttributes();
for(Attribute attributes : listAttributes)
{
String name = attributes.getName();
String value = attributes.getValue();
AttributeType datatype = attributes.getAttributeType();
Values v = new Values(name, datatype, value);
valueMap.put(classes.getName(), v);
System.out.println(name + ":" + value);
}
}
}
}
//System.out.println(valueMap);
}
Values is a class which defines the object attribute:
public class Values{
private String name;
//private AttributeType datatype;
private String value;
That's the rest of the code. I have two questions about it. The first one has higher priority at the moment.
How do I get the values of the object (Attribute.Name = Bandwidth; Attribute.Value = 50)? Instead of that I get
name:Bus
type:Class
I thought about an additional for loop, but the JDOM class Attribute doesn't have a method called getAttributes().
The second question has lower priority, because without question 1 I cannot go further. As you see in the sample, an attribute has 3 properties: name, type and value. How can I extract that triple out of the sample? JDOM seems to know just 2 properties for an Attribute, name and value.
Thanks a lot in advance; hopefully I managed to express myself clearly.
Edit: Added an additional for-loop in it, so the output now is:
name:Bandwidth
type:KiloBitPerSecond
value:50
That means name is the name of that property and value is its value. I didn't know that. At least question one is clearer now and I can try working on 2; the new information makes 2 clearer to me as well.
In XML, the opening tag of an element is enclosed between < and > (or />). After the < comes the name of the element, followed by a list of attributes in the format name="value". An element can be closed inline with /> or with a closing tag </[element name]>.
It would be preferable to use recursion to parse your XML instead of nested for loops, which are hard to read and maintain.
Here is how it could look:
@Test
public void parseXmlRec() throws JDOMException, IOException {
String xml = "<root>"
+ "<Package>"
+ "<IO name=\"Bus\" type=\"Class\">\r\n" +
" <ResourceAttribute name=\"Bandwidth\" type=\"KiloBitPerSecond\" value=\"50\" />\r\n" +
" </IO>"
+ "</Package>"
+ "</root>";
InputStream is = new ByteArrayInputStream(xml.getBytes());
SAXBuilder sb = new SAXBuilder();
Document document = sb.build(is);
is.close();
Element root = document.getRootElement();
List<Element> children = root.getChildren();
for(Element element : children) {
parseelement(element);
}
}
private void parseelement(Element element) {
System.out.println("Element: " + element.getName());
String name = element.getAttributeValue("name");
if(name != null) {
System.out.println("name: " + name);
}
String type = element.getAttributeValue("type");
if(type != null) {
System.out.println("type: " + type);
}
String value = element.getAttributeValue("value");
if(value != null) {
System.out.println("value: " + value);
}
List<Element> children = element.getChildren();
if(children != null) {
for(Element child : children) {
parseelement(child);
}
}
}
This outputs:
Element: Package
Element: IO
name: Bus
type: Class
Element: ResourceAttribute
name: Bandwidth
type: KiloBitPerSecond
value: 50
While parsing, check the name of each element and instantiate the corresponding objects. For that I would suggest writing a separate method to handle each element. For example:
void parsePackage(Element packageElement) { ... }
void parseIO(Element ioElement) { ... }
void parseResourceAttribute(Element resourceAttributeElement) { ... }
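As a rough illustration of one such handler, the sketch below reads the name/type/value triple of every ResourceAttribute element into a small value class. It uses the JDK's built-in org.w3c.dom API instead of JDOM, purely so that it runs without extra libraries; the class names ResourceAttributeReader and ResourceAttribute are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ResourceAttributeReader {
    /** Hypothetical value class holding the name/type/value triple. */
    static final class ResourceAttribute {
        final String name;
        final String type;
        final String value;

        ResourceAttribute(String name, String type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + " (" + type + ") = " + value;
        }
    }

    /** Parses the XML and returns every ResourceAttribute found, in document order. */
    static List<ResourceAttribute> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<ResourceAttribute> result = new ArrayList<>();
            visit(doc.getDocumentElement(), result);
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Recursively walks the tree and dispatches on the element name. */
    private static void visit(Element element, List<ResourceAttribute> out) {
        if ("ResourceAttribute".equals(element.getTagName())) {
            out.add(parseResourceAttribute(element));
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                visit((Element) child, out);
            }
        }
    }

    /** Handler for a single ResourceAttribute element: reads its three XML attributes. */
    private static ResourceAttribute parseResourceAttribute(Element element) {
        return new ResourceAttribute(
                element.getAttribute("name"),
                element.getAttribute("type"),
                element.getAttribute("value"));
    }
}
```

In JDOM the equivalent lookups would be the getAttributeValue("name"), getAttributeValue("type") and getAttributeValue("value") calls already used in parseelement above.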
I am trying to replicate this XML:
<?xml version="1.0"?>
<AccessRequest xml:lang="en-US">
<AccessLicenseNumber>YourLicenseNumber</AccessLicenseNumber>
<UserId>YourUserID</UserId>
<Password>YourPassword</Password>
</AccessRequest>
<?xml version="1.0"?>
<AddressValidationRequest xml:lang="en-US">
<Request>
<TransactionReference>
<CustomerContext>Your Test Case Summary Description</CustomerContext>
<XpciVersion>1.0</XpciVersion>
</TransactionReference>
<RequestAction>XAV</RequestAction>
<RequestOption>3</RequestOption>
</Request>
<AddressKeyFormat>
<AddressLine>AIRWAY ROAD SUITE 7</AddressLine>
<PoliticalDivision2>SAN DIEGO</PoliticalDivision2>
<PoliticalDivision1>CA</PoliticalDivision1>
<PostcodePrimaryLow>92154</PostcodePrimaryLow>
<CountryCode>US</CountryCode>
</AddressKeyFormat>
</AddressValidationRequest>
I am using one class to build the request:
public UpsRequestBuilder()
{
try
{
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
}
catch(Exception e)
{
System.out.println(e.getMessage());
}
}
public void accessRequestBuilder(String accessKey, String username, String password)
{
Element accessRequest = doc.createElement("AccessRequest");
doc.appendChild(accessRequest);
Element license = doc.createElement("AccessLicenseNumber");
accessRequest.appendChild(license);
license.setTextContent(accessKey);
Element userId = doc.createElement("UserId");
accessRequest.appendChild(userId);
userId.setTextContent(username);
Element pass = doc.createElement("Password");
accessRequest.appendChild(pass);
pass.setTextContent(password);
System.out.println("completed Requestbuilder");
}
public void addAddress(Address address)
{
Element addressKeyFormat = doc.createElement("AddressKeyFormat");
doc.appendChild(addressKeyFormat);
Element addressLine = doc.createElement("AddressLine");
addressKeyFormat.appendChild(addressLine);
addressLine.setTextContent(address.getState() + ' ' + address.getStreet2());
Element city = doc.createElement("PoliticalDivision2");
addressKeyFormat.appendChild(city);
city.setTextContent(address.getCity());
Element state = doc.createElement("PoliticalDivision1");
addressKeyFormat.appendChild(state);
state.setTextContent(address.getState());
Element zip = doc.createElement("PostcodePrimaryLow");
addressKeyFormat.appendChild(zip);
zip.setTextContent(address.getZip());
Element country = doc.createElement("CountryCode");
addressKeyFormat.appendChild(country);
country.setTextContent(address.getCountry());
System.out.println("completed addAddress");
}
public void validateAddressRequest(String customerContextString, String action)
{
Element addressValidation = doc.createElement("AddressValidationRequest");
doc.appendChild(addressValidation);
Element transactionReference = doc.createElement("TransactionReference");
addressValidation.appendChild(transactionReference);
Element customerContext = doc.createElement("CustomerContext");
Element version = doc.createElement("XpciVersion");
transactionReference.appendChild(customerContext);
customerContext.setTextContent(customerContextString); //TODO figure out a way to optionally pass context text
transactionReference.appendChild(version);
version.setTextContent("1.0");//change this if the api version changes
Element requestAction = doc.createElement("RequestAction");
addressValidation.appendChild(requestAction);
requestAction.setTextContent(action);
System.out.println("completed validateAddressRequest");
}
And this is the function that uses it:
public void validateAddress(Address address)
{
UpsRequestBuilder request = new UpsRequestBuilder();
request.accessRequestBuilder(accessKey, username, password);
request.validateAddressRequest("", "3");
request.addAddress(address);
System.out.println(request.toString());
}
When I try and print out the XML from this, I get the error "HIERARCHY_REQUEST_ERR: An attempt was made to insert a node where it is not permitted." It happens in the validateAddressRequest function when I try and add the addressValidation element to the document (doc). Here is the exact line:
doc.appendChild(addressValidation);
what is the problem with adding this element to the document?
You're trying to add it at the top level of the document. You can't do that, as the document already has a root element. Any XML document can only have a single root element.
The XML you've shown at the top of your question isn't a single XML document - it's two.
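One way to act on this, sketched under the assumption that the two parts may simply be serialized one after the other, is to build each part as its own Document and concatenate the results. The class and method names below are hypothetical, and only a few of the elements from the question are filled in.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TwoDocuments {
    /** Creates a new document that already contains its single root element. */
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    /** Appends a child element with the given text to parent. */
    static void addTextChild(Element parent, String name, String text) {
        Element child = parent.getOwnerDocument().createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    /** Serializes one document to a string, including its XML declaration. */
    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Shows that a second root element on the same document is rejected. */
    static boolean secondRootRejected() {
        Document doc = newDocument("AccessRequest");
        try {
            doc.appendChild(doc.createElement("AddressValidationRequest"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** Builds both parts as separate documents and concatenates them. */
    static String buildRequest() {
        Document access = newDocument("AccessRequest");
        addTextChild(access.getDocumentElement(), "AccessLicenseNumber", "YourLicenseNumber");

        Document validation = newDocument("AddressValidationRequest");
        addTextChild(validation.getDocumentElement(), "RequestAction", "XAV");

        return serialize(access) + serialize(validation);
    }

    public static void main(String[] args) {
        System.out.println(secondRootRejected());
        System.out.println(buildRequest());
    }
}
```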