I am working with a server response in my application. I receive the following as a string:
<Body>
<HotelRQ xmlns="urn:Hotel_Search">
<POS>
<Source Username='USERNAME' Password='PASSWORD' PropertyID='PROPERTYID' />
</POS>
<AvailRequests>
<AvailRequest>
<StayDateRange Start='2009-09-05T12:00:00' End='2009-09-06T12:00:00'/>
<RoomStays>
<RoomStay> <!-- for Room 1 -->
<GuestCounts>
<GuestCount Count='2'/>
</GuestCounts>
<ChildCounts>
<ChildAge Age='10'/>
<ChildAge Age='09'/>
</ChildCounts>
</RoomStay>
</RoomStays>
<SearchCriteria>
<Criterion>
<HotelRef HotelCityName='CITY' HotelName='' Area='' Attraction='' Rating=''/>
<Sorting Preference='2'/>
<ResponseType Compact="Y"/>
</Criterion>
</SearchCriteria>
</AvailRequest>
</AvailRequests>
</HotelRQ>
</Body>
Can I use JAXB for this?
This is the document I get back from the server, and I receive it as a String. What do I need to do to parse it with JAXB, and how can I map it directly onto Java class fields or beans?
Thanks.
You can see this question: How do I load an org.w3c.dom.Document from XML in a string?
After loading the XML string inside a DOM document, you can call, among others, the method getElementsByTagName to retrieve specific tags and then iterate over them, using getNodeValue to extract the value inside a tag and getAttribute to extract attributes.
You can't easily map it to a Java class with the desired fields at runtime, but you can wrap the parsing and extraction of elements inside a class with simple methods like getRooms().
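As a rough sketch of that wrapper idea, the class below loads the XML string into a DOM document using only the JDK's built-in parser and exposes a few getters. The class name HotelRequest and the methods getCity(), getRoomCount() and getChildAges() are made up for illustration and are not part of any library; the element and attribute names are the ones from the XML in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper: hides the DOM calls behind simple getters.
final class HotelRequest {
    private final Document doc;

    HotelRequest(String xml) {
        try {
            // Parse the XML held in a String rather than in a file.
            this.doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Value of the HotelCityName attribute on the first HotelRef, or null if absent.
    String getCity() {
        NodeList refs = doc.getElementsByTagName("HotelRef");
        if (refs.getLength() == 0) {
            return null;
        }
        return ((Element) refs.item(0)).getAttribute("HotelCityName");
    }

    // Number of RoomStay elements in the request.
    int getRoomCount() {
        return doc.getElementsByTagName("RoomStay").getLength();
    }

    // Age attribute of every ChildAge element, in document order.
    List<Integer> getChildAges() {
        List<Integer> ages = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("ChildAge");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element child = (Element) nodes.item(i);
            ages.add(Integer.parseInt(child.getAttribute("Age")));
        }
        return ages;
    }
}
```

The parser is left in its default, non-namespace-aware mode here, so getElementsByTagName matches on the tag names exactly as they are written in the document.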
Related
I have the XML structure below and need to unmarshal it using JAXB.
<Rule>
<RuleNumber>5001</RuleNumber>
<RuleA>
<Match> some text-1 </Match>
<Ignore> some text-2 </Ignore>
<MatchWord> some text-3 </MatchWord>
</RuleA>
<RuleB>
<Ignore> some text-4 </Ignore>
<MatchWord> some text-5 </MatchWord>
</RuleB>
</Rule>
In the XML above, level 1 contains the <Rule> tag, which is known before unmarshalling.
Level 2 contains the <RuleNumber>, <RuleA> and <RuleB> tags. Of these, only <RuleNumber> is known before unmarshalling. The names of the other two are not known in advance, and they can be anything: <RuleC>, <RuleZ>, or even <Rule101>, <Rule102>, <RuleA1>, <RuleZ1> and so on.
Level 3 contains the <Match>, <Ignore> and <MatchWord> tags. These are known before unmarshalling.
I want to unmarshal this XML using JAXB. Since the names of tags like <RuleA> and <RuleB> are not known and can be anything, how do I write a class that maps this kind of structure so that I can then unmarshal the data?
Is it possible to handle this scenario using JAXB?
I know how to use @XmlAnyElement for a simple unknown type, i.e. when the unknown tags have no child elements. I don't know how to approach this more complex case, where the unknown tags do have child tags.
I have a list of around 1 million web pages and want to extract just the text from them efficiently. Currently I use the BeautifulSoup library in Python to get the text from the HTML, and an HTTP request to fetch the HTML of each page. This approach extracts some extra information in addition to the text, for example any JavaScript listed in the body.
Could you please suggest a suitable and efficient way to do this? I looked at Scrapy, but it looks like it crawls one specific website. Can we pass it a list of specific web pages to get information from?
Thank you in advance.
Yes, you can use Scrapy to crawl a set of URLs in a generic fashion.
You simply need to set them on the start_urls list attribute of your spider, or reimplement the start_requests spider method to yield requests from any data source, and then implement your parse callback to perform the generic content extraction you want.
You can use html-text to extract text from them, and regular Scrapy selectors to extract additional data like the one you mention.
In Scrapy you can plug in your own parser, e.g. Beautiful Soup, and call it from your parse method.
To extract text from generic pages I traverse the body only, exclude comments etc and some tags like script, style, etc:
snippets = []  # collected text fragments
for snippet in soup.find('body').descendants:
    # keep plain strings only, skipping comments and text inside excluded tags
    if isinstance(snippet, bs4.element.NavigableString) \
            and not isinstance(snippet, EXCLUDED_STRING_TYPES) \
            and snippet.parent.name not in EXCLUDED_TAGS:
        snippet = re.sub(UNICODE_WHITESPACES, ' ', snippet)
        snippet = snippet.strip()
        if snippet != '':
            snippets.append(snippet)
with
import re
import bs4

EXCLUDED_STRING_TYPES = (bs4.Comment, bs4.CData, bs4.ProcessingInstruction, bs4.Declaration)
EXCLUDED_TAGS = ['script', 'noscript', 'style', 'pre', 'code']
UNICODE_WHITESPACES = re.compile(u'[\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004'
                                 u'\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]+')
How do I convert the following XML to Java using JAXB?
<work>
<subwork id="sub">
<ret="it">
</subwork>
<ret id="it">
<time>9</time>
</ret>
</work>
It is a bit tough, since the ret tag is outside the subwork tag.
First, you need to start with valid XML. I've made assumptions in correcting the XML:
<work>
<subwork id="sub">
<ret id="it"/>
</subwork>
<ret id="it">
<time>9</time>
</ret>
</work>
Second (and there are other ways of doing this), you need to create a schema that describes this XML. Without doing it for you, I'll say that the trick is to define an element, ret, and then refer to that element within the work element and again within the subwork element.
Third, you then feed that schema file (.xsd) into a tool that generates the JAXB classes. Typically this is xjc (bundled with the JDK up to Java 10, and available separately with the JAXB reference implementation after that).
I am trying to retrieve the value of an attribute from an XML file using XPath and I am not sure where I am going wrong.
This is the XML File
<soapenv:Envelope>
<soapenv:Header>
<common:TestInfo testID="PI1" />
</soapenv:Header>
</soapenv:Envelope>
And this is the code I am using to get the value. Both of these return nothing.
XPathBuilder getTestID = new XPathBuilder("local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']/*[local-name(.)='TestInfo'])");
XPathBuilder getTestID2 = new XPathBuilder("Envelope/Header/TestInfo/@testID");
Object doc2 = getTestID.evaluate(context, sourceXML);
Object doc3 = getTestID2.evaluate(context, sourceXML);
How can I retrieve the value of testID?
However you're iterating within the Java code, your context node is probably not what you think, so remove the "." specifier in your local-name(.) like so:
/*[local-name()='Header']/*[local-name()='TestInfo']/@testID worked fine for me with your XML, although as akaIDIOT says, there isn't an <Envelope> tag to be seen.
The XML file you provided does not contain an <Envelope> element, so an expression that requires it will never match.
Post-edit edit
As can be seen from your XML snippet, the document uses specific namespaces for the elements you're trying to match. An XPath engine is namespace-aware, meaning you'll have to ask it for exactly what you need. Also keep in mind that a namespace is defined by its URI, not by its prefix (so /namespace:element doesn't do much unless you let the XPath engine know which URI the prefix namespace refers to).
Your first XPath has an extra local-name() wrapped around the whole thing:
local-name(/*[local-name(.)='Envelope']/*[local-name(.)='Header']
/*[local-name(.)='TestInfo'])
The result of this XPath will either be the string value "TestInfo" if the TestInfo node is found, or a blank string if it is not.
If your XML is structured like you say it is, then this should work:
/*[local-name()='Envelope']/*[local-name()='Header']/*[local-name()='TestInfo']/@testID
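To see the difference between the two forms, here is a minimal sketch using the JDK's own javax.xml.xpath API rather than the XPathBuilder from the question. The class name LocalNameXPathDemo is made up, and the namespace declarations are assumptions added so that the snippet parses: the soapenv URI is the standard SOAP 1.1 one, while urn:example:common is only a placeholder, since the question does not show the real declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LocalNameXPathDemo {
    // The question's XML with assumed namespace declarations added.
    // "urn:example:common" is a placeholder, not the real URI.
    static final String XML =
            "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
            + " xmlns:common=\"urn:example:common\">"
            + "<soapenv:Header><common:TestInfo testID=\"PI1\"/></soapenv:Header>"
            + "</soapenv:Envelope>";

    // Evaluates an XPath expression against the XML and returns the result as a string.
    static String evaluate(String expression, String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so prefixes are resolved to local names
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String path = "/*[local-name()='Envelope']/*[local-name()='Header']"
                + "/*[local-name()='TestInfo']";
        // Wrapped in local-name(): yields the element's name, not the attribute.
        System.out.println(evaluate("local-name(" + path + ")", XML));
        // With /@testID appended: yields the attribute value.
        System.out.println(evaluate(path + "/@testID", XML));
    }
}
```

The first call prints TestInfo and the second prints PI1, which matches the explanation above: the outer local-name() turns the whole result into an element name.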
But preferably, you should be working with namespaces properly instead of (ab)using local-name(). I have a post here that shows how to do this in Java.
If you don't care about the namespaces and use an XPath 2.0 compatible engine, use * as a wildcard for the namespace.
//*:Header/*:TestInfo/@testID
will return the desired value.
It will probably be more elegant to register the needed namespaces (not covered here, depends on your XPath engine) and query using these:
//soapenv:Header/common:TestInfo/@testID
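For the JDK's built-in XPath engine (which implements XPath 1.0, so the *: wildcard is not available there), registering namespaces means supplying a NamespaceContext. The following is only a sketch under assumptions: the class name NamespaceXPathDemo is made up, the soapenv URI is the standard SOAP 1.1 one, and urn:example:common is a placeholder for whatever URI the real document binds to the common prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceXPathDemo {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String COMMON_NS = "urn:example:common"; // placeholder URI (assumption)

    static final String XML =
            "<soapenv:Envelope xmlns:soapenv=\"" + SOAP_NS + "\""
            + " xmlns:common=\"" + COMMON_NS + "\">"
            + "<soapenv:Header><common:TestInfo testID=\"PI1\"/></soapenv:Header>"
            + "</soapenv:Envelope>";

    // Evaluates the expression, resolving its prefixes through the given prefix-to-URI map.
    static String evaluate(String expression, String xml, final Map<String, String> prefixes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    String uri = prefixes.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }

                // Reverse lookups are not needed for evaluating expressions.
                @Override
                public String getPrefix(String namespaceURI) {
                    return null;
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return Collections.emptyIterator();
                }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("soapenv", SOAP_NS);
        prefixes.put("common", COMMON_NS);
        System.out.println(evaluate("//soapenv:Header/common:TestInfo/@testID", XML, prefixes));
    }
}
```

Because matching is done on the URI, the prefixes in the expression do not have to be the ones used in the document: mapping s and c to the same two URIs and querying //s:Header/c:TestInfo/@testID selects the same attribute.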
I need some advice on how to parse this XML with the Java Simple XML library.
The problem is that I don't know how many items this XML element will contain.
<Playoffs>
<PlayoffStatus>Potential Brackets</PlayoffStatus>
<LConference>Western Conference</LConference>
<RConference>Eastern Conference</RConference>
<LTeam1>Clippers (1)</LTeam1>
<LTeam2>Nuggets (8)</LTeam2>
<LTeam3>Grizzlies (4)</LTeam3>
<LTeam4>Warriors (5)</LTeam4>
<LTeam5>Spurs (3)</LTeam5>
<RTeam1>Heat (1)</RTeam1>
<RTeam2>Celtics (8)</RTeam2>
<RTeam3>Pacers (4)</RTeam3>
<RTeam4>Bulls (5)</RTeam4>
<RTeam5>Hawks (3)</RTeam5>
.......
You can use a Converter object or the @ElementListUnion annotation. See these tests for examples:
https://simple.svn.sourceforge.net/svnroot/simple/trunk/download/stream/src/test/java/org/simpleframework/xml/core/UnionInlineListWitinPathTest.java
https://simple.svn.sourceforge.net/svnroot/simple/trunk/download/stream/src/test/java/org/simpleframework/xml/convert/HideEnclosingConverterTest.java