org.xml.sax.SAXParseException: Reference is not allowed in prolog - java

I am trying to escape html characters of a string and use this string to build a DOM XML using parseXml method shown below. Next, I am trying to insert this DOM document into database. But, when I do that I am getting the following error:
org.xml.sax.SAXParseException: Reference is not allowed in prolog.
I have three questions:
1) I am not sure how to escape double quotes. I tried replaceAll("\"", "&quot;") and am not sure if this is right.
2) Suppose I want a string starting and ending with double quotes (eg: "sony"), how do I code it? I tried something like:
String sony = "\"sony\""
Is this right? Will the above string contain "sony" along with double quotes or is there another way of doing it?
3)I am not sure what the "org.xml.sax.SAXParseException: Reference is not allowed in prolog." error means. Can someone help me fix this?
Thanks,
Sony
Steps in my code:
Utils.java
public static String escapeHtmlEntities(String s) {
return s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;").replaceAll("\"", "&quot;").
replaceAll(":", "&#58;").replaceAll("/", "&#47;");
}
public static Document parseXml (String xml) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xml)));
doc.setXmlStandalone(false);
return doc;
}
TreeController.java
protected void notifyNewEntryCreated(String entryType) throws Exception {
for (Listener l : treeControlListeners)
l.newEntryCreated();
final DomNodeTreeModel domModel = (DomNodeTreeModel) getModel();
Element parent_item = getSelectedEntry();
String xml = Utils.escapeHtmlEntities("<entry xmlns=" + "\"http://www.w3.org/2005/atom\"" + "xmlns:libx=" +
"\"http://libx.org/xml/libx2\">" + "<title>" + "New" + entryType + "</title>" +
"<updated>2010-71-22T11:08:43z</updated>" + "<author> <name>LibX Team</name>" +
"<uri>http://libx.org</uri>" + "<email>libx.org@gmail.com</email></author>" +
"<libx:" + entryType + "></libx:" + entryType + ">" + "</entry>");
xmlModel.insertNewEntry(xml, getSelectedId());
}
XMLDataModel.java
public void insertNewEntry (String xml, String parent_id) throws Exception {
insertNewEntry(Utils.parseXml(xml).getDocumentElement(), parent_id);
}
public void insertNewEntry (Element elem, String parent_id) throws Exception {
// inserting an entry with no libx: tag will create a storage leak
if (elem.getElementsByTagName("libx:package").getLength() +
elem.getElementsByTagName("libx:libapp").getLength() +
elem.getElementsByTagName("libx:module").getLength() < 1) {
// TODO: throw exception here instead of return
return;
}
XQPreparedExpression xqp = Q.get("insert_new_entry.xq");
xqp.bindNode(new QName("entry"), elem.getOwnerDocument(), null);
xqp.bindString(new QName("parent_id"), parent_id, null);
xqp.executeQuery();
xqp.close();
updateRoots();
}
insert_new_entry.xq
declare namespace libx='http://libx.org/xml/libx2';
declare namespace atom='http://www.w3.org/2005/atom';
declare variable $entry as xs:anyAtomicType external;
declare variable $parent_id as xs:string external;
declare variable $feed as xs:anyAtomicType := doc('libx2_feed')/atom:feed;
declare variable $metadata as xs:anyAtomicType := doc('libx2_meta')/metadata;
let $curid := $metadata/curid
return replace value of node $curid with data($curid) + 1,
let $newid := data($metadata/curid) + 1
return insert node
{$newid}{
$entry//
}
into $feed,
let $newid := data($metadata/curid) + 1
return if ($parent_id = 'root') then ()
else
insert node http://libx.org/xml/libx2' /> into
$feed/atom:entry[atom:id=$parent_id]//(libx:module|libx:libapp|libx:package)

To escape a double quote, use the &quot; entity, which is predefined in XML.
So, your example string, say an attribute value, will look like
<person name="&quot;sony&quot;"/>
There is also &apos; for apostrophe/single quote.
I see you have lots of replaceAll calls, but the replacements seem to be the same? There are some other characters that cannot be used literally, but should be escaped:
& --> &amp;
> --> &gt;
< --> &lt;
" --> &quot;
' --> &apos;
(EDIT: ok, I see this is just formatting - the entities are being turned into their actual values when being presented by SO.)
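The SAX exception is the parser grumbling because of the invalid XML. In your code the whole document, markup included, goes through escapeHtmlEntities, so the string handed to the parser starts with &lt;entry rather than <entry. An entity reference that appears before the root element is exactly what "Reference is not allowed in prolog" means. Escape only the text and attribute values, not the tags.
As well as escaping the text, you will need to ensure it adheres to the well-formedness rules of XML. There's quite a bit to get right, so it's often simpler to use a third-party library to write out the XML. For example, the XMLWriter in dom4j.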
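A minimal sketch of escaping only the inserted value while keeping the markup literal. The class and method names (EscapeDemo, escapeText, buildEntry) are made up for illustration and the entry is cut down to a single title element; it is not the asker's full document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class EscapeDemo {

    // Escapes the characters that are special in XML text and attribute values.
    // The ampersand is replaced first so the other entities are not escaped twice.
    public static String escapeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Builds the markup literally and escapes only the value spliced into it.
    public static String buildEntry(String title) {
        return "<entry xmlns=\"http://www.w3.org/2005/atom\">"
                + "<title>" + escapeText(title) + "</title>"
                + "</entry>";
    }

    public static Document parseXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml(buildEntry("New \"sony\" & <co>"));
        // The parser turns the entities back into the original characters.
        System.out.println(doc.getDocumentElement().getTextContent());

        try {
            // Escaping the markup itself reproduces the failure from the question.
            parseXml(escapeText(buildEntry("x")));
        } catch (SAXParseException e) {
            System.out.println("Parse failed: " + e.getMessage());
        }
    }
}
```

The String "\"sony\"" from the second question does contain the quotes; once it passes through escapeText and the parser, it comes back unchanged.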

You can check out HTML Tidy, a tool that originated at the W3C. Most mainstream languages have their own implementation.
Rather than replacing characters yourself and caring only about <, > and &, configure JTidy (for Java) with the options you need and let it parse the input. This hides most of the complexity of XML escaping.
I have used Python, Java and MarkLogic-based Tidy implementations, and all of them served my purposes.

Related

java xpath list concatenation

I am using java XPathFactory to get values from a simple xml file:
<Obama>
<coolnessId>0</coolnessId>
<cars>0</cars>
<cars>1</cars>
<cars>2</cars>
</Obama>
With the expression //Obama/coolnessId | //Obama/cars the result is:
0
0
1
2
From this result, I cannot distinguish between what is the coolnessId and what is the car id. I would need something like:
CoolnessId: 0
CarId: 0
CarId: 1
CarId: 2
With concat('c_id: ', //Obama/coolnessId,' car_id: ',//Obama/cars) I am close to the solution, but concat cannot be used for a list of values.
Unfortunately, I cannot use string-join, because my XPath library does not seem to support it. And I cannot manipulate the given XML.
What other tricks can I use to get a list of values with something like an alias?
If you select the elements rather than their text content you'll have some context:
public static void main(String[] args) throws Exception {
String xml =
"<Obama>" +
" <coolnessId>0</coolnessId>" +
" <cars>0</cars>" +
" <cars>1</cars>" +
" <cars>2</cars>" +
"</Obama>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//Obama/cars | //Obama/coolnessId");
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < result.getLength(); i++) {
Element item = (Element) result.item(i);
System.out.println(item.getTagName() + ": " + item.getTextContent());
}
}
Assuming you ask for the result of the evaluation as a NODESET, your XPath expression actually returns a sequence of four element nodes, not a sequence of four strings as you suggest. If your input uses the DOM tree model, these will be returned in the form of a DOM NodeList. You can process the Node objects in this NodeList to get the names of the nodes as well as their string values.
If you switch to an XPath 3.1 engine such as Saxon, you can get the result directly as a single string using the XPath expression:
string-join((//Obama/coolnessId | //Obama/cars) ! (name() || ': ' || string()), '\n')
To invoke XPath expressions in Saxon you can use either the JAXP API (javax.xml.xpath) or Saxon's s9api interface: I would recommend s9api because it understands the richer type system of XPath 2.0 and beyond.

How to get specific information from an XML file

I have a large XML file and below is an extract from it:
...
<LexicalEntry id="Ait~ifAq_1">
<Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
<Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
<WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&um__1">
<Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
<Sense id="tawaA&um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>
<LexicalEntry id="tanaAgum_2">
<Lemma partOfSpeech="n" writtenForm="تناغُم"/>
<Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>
<Synset baseConcept="3" id="tawaAfuq_n1AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...
I want to extract specific information from it. For a given writtenForm whether from <Lemma> or <WordForm>, the programme takes the value of synset from <Sense> of that writtenForm (same <LexicalEntry>) and searches for all the value id of <Synset> that have the same value as the synset from <Sense>. After that, the programme gives us all the relations of that Synset, i.e it displays the value of relType and returns to <LexicalEntry> and looks for the value synset of <Sense> who have the same value of targets then displays its writtenForm.
I think it's a little bit complicated but the result should be like this:
اِتِّفاق hyponym تَوَاؤُم, اِنْسِجام
One of the solutions is to use a stream reader because of the memory consumption, but I don't know how to proceed to get what I want. Please help.
The SAX parser is different from the DOM parser. It looks only at the current item; it cannot see later items until they become the current item. It is one of several parsers you can use when an XML file is extremely big. To name a few of the options:
SAX PARSER
DOM PARSER
JDOM PARSER
DOM4J PARSER
STAX PARSER
You can find tutorials for all of them here.
In my opinion, once you have learned the basics, go straight to DOM4J or JDOM for a commercial product.
The logic of the SAX parser is that you have a handler class (UserHandler below) which extends DefaultHandler and overrides some of its methods:
XML FILE:
<?xml version="1.0"?>
<class>
<student rollno="393">
<firstname>dinkar</firstname>
<lastname>kad</lastname>
<nickname>dinkar</nickname>
<marks>85</marks>
</student>
<student rollno="493">
<firstname>Vaneet</firstname>
<lastname>Gupta</lastname>
<nickname>vinni</nickname>
<marks>95</marks>
</student>
<student rollno="593">
<firstname>jasvir</firstname>
<lastname>singn</lastname>
<nickname>jazz</nickname>
<marks>90</marks>
</student>
</class>
Handler class:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class UserHandler extends DefaultHandler {
boolean bFirstName = false;
boolean bLastName = false;
boolean bNickName = false;
boolean bMarks = false;
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes)
throws SAXException {
if (qName.equalsIgnoreCase("student")) {
String rollNo = attributes.getValue("rollno");
System.out.println("Roll No : " + rollNo);
} else if (qName.equalsIgnoreCase("firstname")) {
bFirstName = true;
} else if (qName.equalsIgnoreCase("lastname")) {
bLastName = true;
} else if (qName.equalsIgnoreCase("nickname")) {
bNickName = true;
}
else if (qName.equalsIgnoreCase("marks")) {
bMarks = true;
}
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("student")) {
System.out.println("End Element :" + qName);
}
}
@Override
public void characters(char ch[],
int start, int length) throws SAXException {
if (bFirstName) {
System.out.println("First Name: "
+ new String(ch, start, length));
bFirstName = false;
} else if (bLastName) {
System.out.println("Last Name: "
+ new String(ch, start, length));
bLastName = false;
} else if (bNickName) {
System.out.println("Nick Name: "
+ new String(ch, start, length));
bNickName = false;
} else if (bMarks) {
System.out.println("Marks: "
+ new String(ch, start, length));
bMarks = false;
}
}
}
Main Class :
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAXParserDemo {
public static void main(String[] args){
try {
File inputFile = new File("input.txt");
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
UserHandler userhandler = new UserHandler();
saxParser.parse(inputFile, userhandler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
XPath was designed for exactly this. Java provides support for it in the javax.xml.xpath package.
To do what you want, the code will look something like this:
List<String> findRelations(String word,
Path xmlFile)
throws XPathException {
String xmlLocation = xmlFile.toUri().toASCIIString();
XPath xpath = XPathFactory.newInstance().newXPath();
xpath.setXPathVariableResolver(
name -> (name.getLocalPart().equals("word") ? word : null));
String id = xpath.evaluate(
"//LexicalEntry[WordForm/@writtenForm=$word or Lemma/@writtenForm=$word]/Sense/@synset",
new InputSource(xmlLocation));
xpath.setXPathVariableResolver(
name -> (name.getLocalPart().equals("id") ? id : null));
NodeList matches = (NodeList) xpath.evaluate(
"//Synset[@id=$id]/SynsetRelations/SynsetRelation",
new InputSource(xmlLocation),
XPathConstants.NODESET);
List<String> relations = new ArrayList<>();
int matchCount = matches.getLength();
for (int i = 0; i < matchCount; i++) {
Element match = (Element) matches.item(i);
String relType = match.getAttribute("relType");
String synset = match.getAttribute("targets");
xpath.setXPathVariableResolver(
name -> (name.getLocalPart().equals("synset") ? synset : null));
NodeList formNodes = (NodeList) xpath.evaluate(
"//LexicalEntry[Sense/@synset=$synset]/WordForm/@writtenForm",
new InputSource(xmlLocation),
XPathConstants.NODESET);
int formCount = formNodes.getLength();
StringJoiner forms = new StringJoiner(",");
for (int j = 0; j < formCount; j++) {
forms.add(
formNodes.item(j).getNodeValue());
}
relations.add(
String.format("%s %s %s", word, relType, forms));
}
return relations;
}
Some basic XPath information:
XPath uses a single file-path-like string to match parts of an XML document. The parts can be any structural part of the document: text, elements, attributes, even things like comments.
A Java XPath expression can attempt to match exactly one part, or several parts, or can even concatenate all matched parts as a String.
In an XPath expression, a name by itself represents an element. For example, WordForm in XPath means any <WordForm …> element in the XML document.
A name starting with @ represents an attribute. For example, @writtenForm refers to any writtenForm=… attribute in the XML document.
A slash indicates a parent and child in an XML document. LexicalEntry/Lemma means any <Lemma> element which is a direct child of a <LexicalEntry> element. Synset/@id means the id=… attribute of any <Synset> element.
Just as a path starting with / indicates an absolute (root-relative) path in Unix, an XPath starting with a slash indicates an expression relative to the root of an XML document.
Two slashes means a descendant which may be a direct child, a grandchild, a great-grandchild, etc. Thus, //LexicalEntry means any LexicalEntry in the document; /LexicalEntry only matches a LexicalEntry element which is the root element.
Square brackets indicate match qualifiers. Synset[@baseConcept='3'] matches any <Synset> element with a baseConcept attribute whose value is the string "3".
XPath can refer to variables, which are defined externally, using Unix-shell-like $ substitutions, like $word. How those variables are passed to an XPath expression depends on the engine. Java uses the setXPathVariableResolver method. Variable names are in a completely separate namespace from node names, so it is of no consequence if a variable name is the same as an element name or attribute name in the XML document.
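The rules above can be tried out on a tiny document. This is only a sketch: the XML below is a made-up stand-in that borrows element and attribute names from the question, and XPathBasicsDemo and eval are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathBasicsDemo {

    // A small stand-in document; only the names are taken from the question.
    static final String XML =
          "<LexicalResource>"
        + "<LexicalEntry id='e1'><Lemma writtenForm='alpha'/><Sense synset='s1'/></LexicalEntry>"
        + "<LexicalEntry id='e2'><Lemma writtenForm='beta'/><Sense synset='s2'/></LexicalEntry>"
        + "<Synset baseConcept='3' id='s1'/>"
        + "</LexicalResource>";

    // Evaluates an expression as a string, binding the XPath variable $word to the given value.
    public static String eval(String expression, String word) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(
            name -> name.getLocalPart().equals("word") ? word : null);
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        // Two slashes: LexicalEntry elements at any depth.
        System.out.println(eval("count(//LexicalEntry)", ""));
        // One slash: only a root element named LexicalEntry, and the root here is LexicalResource.
        System.out.println(eval("count(/LexicalEntry)", ""));
        // Square brackets: filter on an attribute value, then select another attribute.
        System.out.println(eval("//Synset[@baseConcept='3']/@id", ""));
        // A variable inside a predicate, resolved by the resolver set in eval.
        System.out.println(eval("//LexicalEntry[Lemma/@writtenForm=$word]/Sense/@synset", "beta"));
    }
}
```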
So, the XPath expressions in the code mean:
//LexicalEntry[WordForm/@writtenForm=$word or Lemma/@writtenForm=$word]/Sense/@synset
Match any <LexicalEntry> element anywhere in the XML document which has either
a WordForm child with a writtenForm attribute whose value is equal to the word variable
a Lemma child with a writtenForm attribute whose value is equal to the word variable
and for every such <LexicalEntry> element, return the value of the synset attribute of any <Sense> element which is a direct child of the <LexicalEntry> element.
The word variable is defined externally, by an xpath.setXPathVariableResolver, right before the XPath expression is evaluated.
//Synset[@id=$id]/SynsetRelations/SynsetRelation
Match any <Synset> element anywhere in the XML document whose id attribute is equal to the id variable. For each such <Synset> element, look for any direct SynsetRelations child element, and return each of its direct SynsetRelation children.
The id variable is defined externally, by an xpath.setXPathVariableResolver, right before the XPath expression is evaluated.
//LexicalEntry[Sense/@synset=$synset]/WordForm/@writtenForm
Match any <LexicalEntry> element anywhere in the XML document which has a <Sense> child element which has a synset attribute whose value is identical to the synset variable. For each matched element, find any <WordForm> child element and return that element’s writtenForm attribute.
The synset variable is defined externally, by an xpath.setXPathVariableResolver, right before the XPath expression is evaluated.
Logically, what the above should amount to is:
Locate the synset value for the requested word.
Use the synset value to locate SynsetRelation elements.
Locate writtenForm values corresponding to the targets value of each matched SynsetRelation.
If this XML file is too large to represent in memory, use SAX.
You will want to write your SAX parser to maintain a location. To do this, I typically use a StringBuffer, but a Stack of Strings would work just as nicely. This portion will be important because it will permit you to keep track of the path back to the root of the document, which will allow you to understand where in the document you are at a given point in time (useful when trying to only extract a little information).
The main logic flow looks like:
1. When entering a node, add the node's name to the stack.
2. When exiting a node, pop the node's name (top element) off the stack.
3. To know your location, read your current branch of the XML from the bottom of the stack to the top of the stack.
4. When entering a region you care about, clear the buffer you will capture the characters into
5. When exiting a region you care about, flush the buffer into the data structure you will return back as your output.
This way you can efficiently skip over all the branches of the XML tree that you don't care about.
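The five steps above could be sketched roughly as follows. This is one possible reading, not the answerer's own code: it uses an ArrayDeque as the stack, and the class name PathTrackingHandler, the extract helper and the slash-separated target path are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PathTrackingHandler extends DefaultHandler {

    private final Deque<String> path = new ArrayDeque<>();    // element names, innermost on top
    private final StringBuilder buffer = new StringBuilder(); // text of the region being captured
    private final List<String> values = new ArrayList<>();    // output
    private final String wantedPath;                          // e.g. "/class/student/firstname"
    private boolean capturing = false;

    public PathTrackingHandler(String wantedPath) {
        this.wantedPath = wantedPath;
    }

    // Step 3: read the stack from bottom to top to get the current location.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = path.descendingIterator(); it.hasNext();) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.push(qName);                                     // step 1
        if (!capturing && currentPath().equals(wantedPath)) {
            buffer.setLength(0);                              // step 4
            capturing = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            buffer.append(ch, start, length);                 // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (capturing && currentPath().equals(wantedPath)) {
            values.add(buffer.toString());                    // step 5
            capturing = false;
        }
        path.pop();                                           // step 2
    }

    // Parses the XML and returns the text of every element found at wantedPath.
    public static List<String> extract(String xml, String wantedPath) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler(wantedPath);
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<class><student><firstname>dinkar</firstname><marks>85</marks></student></class>";
        System.out.println(extract(xml, "/class/student/firstname"));
    }
}
```

Accumulating into the buffer rather than printing inside characters() matters because a parser is free to deliver one element's text in several chunks.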

How to use Selenium get text from an element not including its sub-elements

HTML
<div id='one'>
<button id='two'>I am a button</button>
<button id='three'>I am a button</button>
I am a div
</div>
Code
driver.findElement(By.id("one")).getText();
I've seen this question pop up a few times in the last maybe year or so and I've wanted to try writing this function... so here you go. It takes the parent element and removes each child's textContent until what remains is the textNode. I've tested this on your HTML and it works.
/**
* Takes a parent element and strips out the textContent of all child elements and returns textNode content only
*
* @param e
* the parent element
* @return the text from the child textNodes
*/
public static String getTextNode(WebElement e)
{
String text = e.getText().trim();
List<WebElement> children = e.findElements(By.xpath("./*"));
for (WebElement child : children)
{
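// Pattern.quote (java.util.regex.Pattern) keeps regex metacharacters in the child text literal
text = text.replaceFirst(Pattern.quote(child.getText()), "").trim();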
}
return text;
}
and you call it
System.out.println(getTextNode(driver.findElement(By.id("one"))));
Warning: the initial solution (deep below) won't work. I opened an enhancement request (2840) against Selenium WebDriver and another one against the W3C WebDriver specification; the more votes, the sooner they'll get enough attention (one can hope). Until then, the solution suggested by @shivansh in the other answer (executing JavaScript via Selenium) remains the only alternative. Here's the Java adaptation of that solution (it collects all text nodes, discards those that are whitespace only, and separates the rest with \t):
WebElement e=driver.findElement(By.xpath("//*[@id='one']"));
if(driver instanceof JavascriptExecutor) {
String jswalker=
"var tw = document.createTreeWalker("
+ "arguments[0],"
+ "NodeFilter.SHOW_TEXT,"
+ "{ acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT;} },"
+ "false"
+ ");"
+ "var ret=null;"
+ "while(tw.nextNode()){"
+ "var t=tw.currentNode.wholeText.trim();"
+ "if(t.length>0){" // skip over all-white text values
+ "ret=(ret ? ret+'\t'+t : t);" // if many, tab-separate them
+ "}"
+ "}"
+ "return ret;" // will return null if no non-empty text nodes are found
;
Object val=((JavascriptExecutor) driver).executeScript(jswalker, e);
// ---- Pass the context node here ------------------------------^
String textNodesTabSeparated=(null!=val ? val.toString() : null);
// ----^ --- this is the result you want
}
References:
TreeWalker - supported by all browsers
Selenium Javascript Executor
Initial suggested solution - not working - see enhancement request: 2840
driver.findElement(By.id("one")).findElement(By.xpath("./text()")).getText();
In a single search:
driver.findElement(By.xpath("//*[@id='one']/text()")).getText();
See the child::text() selector under Location Paths in the XPath spec.
I use a function like below:
private static final String ALL_DIRECT_TEXT_CONTENT =
"var element = arguments[0], text = '';\n" +
"for (var i = 0; i < element.childNodes.length; ++i) {\n" +
" var node = element.childNodes[i];\n" +
" if (node.nodeType == Node.TEXT_NODE" +
" && node.textContent.trim() != '')\n" +
" text += node.textContent.trim();\n" +
"}\n" +
"return text;";
public String getText(WebDriver driver, WebElement element) {
return (String) ((JavascriptExecutor) driver).executeScript(ALL_DIRECT_TEXT_CONTENT, element);
}
var outerElement = driver.FindElement(By.XPath("a"));
var outerElementTextWithNoSubText = outerElement.Text.Replace(outerElement.FindElement(By.XPath("./*")).Text, "");
Similar solution to the ones given, but instead of JavaScript or setting text to "", I remove elements in the XML and then get the text.
Problem:
Need text from 'root element without children' where children can be x levels deep and the text in the root can be the same as the text in other elements.
The solution treats the webelement as an XML and replaces the children with voids so only the root remains.
The result is then parsed. In my case this seems to be working.
I only verified this code in an environment with Groovy. No idea if it will work in Java without modifications. Essentially you need to replace the Groovy XML libraries with Java libraries and off you go, I guess.
As for the code itself, I have two parameters:
WebElement el
boolean strict
When strict is true, then really only the root is taken into account. If strict is false, then markup tags will be left. I included in this whitelist p, b, i, strong, em, mark, small, del, ins, sub, sup.
The logic is:
Manage whitelisted tags
Get element as string (XML)
Parse to an XML object
Set all child nodes to void
Parse and get text
Up until now this seems to be working out.
You can find the code here: GitHub Code

Building XML document in lowercase or TitleCase - flag based

In my current project, we are in the process of re-factoring a java class that constructs an XML document. In previous versions of the product delivered to the customer, the XML document is built with lower case elements and attributes:
<rootElement attr = "abc">
<childElement childAttr = "xyz"/>
</rootElement>
But now we have a requirement to build the XML document with TitleCase element and attributes. The user will set a flag in a properties file to indicate whether the document should be built in lower case or title case. If the flag is configured to build the document in TitleCase, the resultant document will look like:
<RootElement Attr = "abc">
<ChildElement ChildAttr = "xyz"/>
</RootElement>
Various approaches to solve the problem:
1. Plugging in a transformer to convert lowercase XML document to TitleCase XML document. But this will impact the overall performance, as we deal with huge XML files spanning more than 10,000 lines.
2. Create two separate maps with the corresponding XML elements and attributes.
For example:
lowercase map: rootelement -> rootElement, attr -> attr ....
TitleCase map: rootelement -> RootElement, attr -> Attr ....
Based on the property set by the user, the corresponding map will be chosen and the XML element/attribute names from this map will be used to build the XML document.
3. Using an enum to define constants and their corresponding values.
public enum XMLConstants {
ROOTELEMENT("rootElement", "RootElement"),
ATTRIBUTE("attr", "Attr");
private String lowerCase;
private String titleCase;
private XMLConstants(String aLowerCase, String aTitleCase){
titleCase = aTitleCase;
lowerCase = aLowerCase;
}
public String getValue(boolean isLowerCase){
if(isLowerCase){
return lowerCase;
} else {
return titleCase;
}
}
}
--------------------------------------------------------------
// XML document builder
if(propertyFlag){
isLowerCase = false;
} else {
isLowerCase = true;
}
....
....
createRootElement(ROOTELEMENT.getValue(isLowerCase));
createAttribute(ATTRIBUTE.getValue(isLowerCase));
Please help me choose the right option keeping in mind the performance aspect of the entire solution. If you have any other suggestions, please let me know.
// set before generating the XML
boolean isUpperCase;
// use this function for each tag/attribute name instead of a string constant,
// e.g. getInCase("rootElement")
String getInCase(String initialName) {
String initialFirstCharacter = initialName.substring(0, 1);
String actualFirstCharacter;
if (isUpperCase) {
actualFirstCharacter = initialFirstCharacter.toUpperCase();
} else {
actualFirstCharacter = initialFirstCharacter.toLowerCase();
}
return actualFirstCharacter + initialName.substring(1);
}

XML Parsing Error: junk after document element - REST

I am working on a RESTful web service, that will return a list of RSS feeds that someone has added to a feed list which I have previously implemented.
Now if I return a TEXT_PLAIN reply, this displays just fine in the browser, although when I attempt to return an APPLICATION_XML reply, then I get the following error:
XML Parsing Error: junk after document element
Location: http://localhost:8080/Assignment1/api/feedlist
Line Number 1, Column 135:SMH Top Headlineshttp://feeds.smh.com.au/rssheadlines/top.xmlUTS Library Newshttp://www.lib.uts.edu.au/news/feed/all
Here is the code - I cannot figure out why it is not returning a well formed XML page (I have also tried formatting the XML reply with new lines and spaces(indents) - and of course this did not work):
package au.com.rest;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.ws.rs.*;
import javax.ws.rs.core.*;
import au.edu.uts.it.wsd.*;
@Path("/feedlist")
public class RESTFeedService {
String feedFile = "/tmp/feeds.txt";
String textReply = "";
String xmlReply = "<?xml version=\"1.0\"?><feeds>";
FeedList feedList = new FeedListImpl();
@GET
@Produces(MediaType.APPLICATION_XML)
public String showXmlFeeds() throws FileNotFoundException, IOException
{
feedList.load(feedFile);
for (Feed f:feedList.list()){
xmlReply += "<feed><name>" + f.getName() + "</name>";
xmlReply += "<uri>" + f.getURI() + "</uri></feed></feeds>";
}
return xmlReply;
}
}
EDIT: I've spotted the immediate problem now. You're closing the feeds element on every input element:
for (Feed f:feedList.list()){
xmlReply += "<feed><name>" + f.getName() + "</name>";
xmlReply += "<uri>" + f.getURI() + "</uri></feed></feeds>";
}
The minimal change would be:
for (Feed f:feedList.list()){
xmlReply += "<feed><name>" + f.getName() + "</name>";
xmlReply += "<uri>" + f.getURI() + "</uri></feed>";
}
xmlReply += "</feeds>";
... but you should still apply the rest of the advice below.
First step - you need to diagnose the problem further. Look at the source in the browser to see exactly what it's complaining about. Can you see the problem in the XML yourself? What does it look like?
Without knowing about the REST framework you're using, this looks like it could be a problem to do with a single instance servicing multiple requests. For some reason you've got an instance variable which you're mutating in your method. Why would you want to do that? If a new instance of your class is created for each request, it shouldn't be a problem - but I don't know if that's the case.
As a first change, try moving this line:
String xmlReply = "<?xml version=\"1.0\"?><feeds>";
into the method as a local variable.
After that though:
Keep all your fields private
Avoid using string concatenation in a loop like this
More importantly, don't build up XML by hand - use an XML API to do it. (The built-in Java APIs aren't nice, but there are plenty of alternatives.)
Consider which of these fields (if any) is really state of the object rather than something which should be a local variable. What state does your object logically have at all?
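As a rough sketch of the "use an XML API" advice, here is the same feed list written with the JDK's XMLStreamWriter, chosen for brevity rather than because the answer names it. FeedXmlDemo and toXml are hypothetical names, and a Map of name to URI stands in for the FeedList type from the question, whose API is not shown in full.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class FeedXmlDemo {

    // Writes name -> URI pairs as <feeds><feed><name/><uri/></feed>...</feeds>.
    public static String toXml(Map<String, String> feeds) throws Exception {
        StringWriter out = new StringWriter();   // local, so no state is shared between requests
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("feeds");
        for (Map.Entry<String, String> feed : feeds.entrySet()) {
            w.writeStartElement("feed");
            w.writeStartElement("name");
            w.writeCharacters(feed.getKey());    // the writer escapes &, < and > in text
            w.writeEndElement();                 // </name>
            w.writeStartElement("uri");
            w.writeCharacters(feed.getValue());
            w.writeEndElement();                 // </uri>
            w.writeEndElement();                 // </feed>
        }
        w.writeEndElement();                     // </feeds>, written once after the loop
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> feeds = new LinkedHashMap<>();
        feeds.put("SMH Top Headlines", "http://feeds.smh.com.au/rssheadlines/top.xml");
        feeds.put("UTS Library News", "http://www.lib.uts.edu.au/news/feed/all");
        System.out.println(toXml(feeds));
    }
}
```

Because the writer tracks open elements itself, closing the root element inside the loop, as in the question, becomes much harder to do by accident.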
