I have around 30 well-formed XML files containing a large amount of data. I want to search these files to retrieve specific data. Can you suggest a site or blog that I can use as a guideline to solve my problem?
I need to search inside each tag for a keyword provided by the user. Sometimes the user will supply a specific tag name instead, in which case the search should return the content inside that tag.
example : a.xml, b.xml, c.xml
inside a.xml
<abc>
some content
</abc>
The user may search for the tag abc or for some keyword inside the content. In both cases the search should return the content. If there is more than one match, it should return a link for each match, so that the user can click through and view them one by one.
I'd recommend using XPath, a query language for selecting nodes in XML documents (it plays a role similar to SQL for databases):
http://www.ibm.com/developerworks/library/x-javaxpathapi.html
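As a rough illustration of the XPath suggestion, here is a minimal sketch using the javax.xml.xpath API that ships with the JDK. The class and method names (XmlSearch, byTag, byKeyword) are made up for this example, and the XML is passed in as a string to keep it short; for the 30 files you would loop over them and run the same expressions on each.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class XmlSearch {

    /** Returns the trimmed text content of every node selected by the XPath expression. */
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                results.add(nodes.item(i).getTextContent().trim());
            }
            return results;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or unparsable XML: " + expression, e);
        }
    }

    /** Content of all elements with the given name, at any depth.
     *  Assumes tagName is a plain element name, not arbitrary user input. */
    static List<String> byTag(String xml, String tagName) {
        return select(xml, "//" + tagName);
    }

    /** Content of all elements that have a text node containing the keyword. */
    static List<String> byKeyword(String xml, String keyword) {
        // XPath 1.0 string literals cannot escape quotes, so this sketch simply rejects them.
        if (keyword.contains("'")) {
            throw new IllegalArgumentException("Keyword must not contain a single quote");
        }
        return select(xml, "//*[text()[contains(., '" + keyword + "')]]");
    }

    public static void main(String[] args) {
        String xml = "<root><abc>some content</abc><def>other</def></root>";
        System.out.println(byTag(xml, "abc"));
        System.out.println(byKeyword(xml, "content"));
    }
}
```

Each returned string could then be shown as one result or one link in the UI. Note that this approach parses each document fully, so it may be slow or memory-hungry on very large files.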
Use a SAX parser. You don't need to move back and forth within the documents, and with this much data a DOM parser would have to hold each whole document in memory.
See this link for a tutorial.
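To give an idea of what the SAX approach looks like, here is a small sketch that streams through a document and collects the text of every element with a given name. The class name SaxTagSearch and the method contentOf are invented for this example; the parser classes are the standard ones from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class SaxTagSearch {

    /** Streams through the XML and returns the trimmed text of every element named tagName. */
    static List<String> contentOf(String xml, String tagName) {
        List<String> matches = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // greater than 0 while inside a matching element
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tagName)) {
                    if (depth == 0) {
                        text.setLength(0); // start collecting for a new match
                    }
                    depth++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so append instead of assign.
                if (depth > 0) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && --depth == 0) {
                    matches.add(text.toString().trim());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return matches;
    }

    public static void main(String[] args) {
        String xml = "<root><abc>some content</abc><x>skip</x><abc>more</abc></root>";
        System.out.println(contentOf(xml, "abc"));
    }
}
```

A keyword search would work the same way: collect the text of each element in characters() and test it in endElement(). Only the current element's text is kept in memory, which is the point of using SAX for large files.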
You could store your XML files in an XML database (for example eXist), and then query it using XQuery.
Related
I have a Word template, complete with fonts, colors, etc. I am querying a database and retrieving information into a POJO. I want to extract the relevant info from said POJO and create a Word document as per my template's directives.
The doc will have tables and graphs, so I need to use Content Control Data Binding. As I understand it, I'll have to do the following to achieve this:
Modify the Word template to add content controls
Transform the POJO into an XML object (template?)
Use ContentControlMergeXML to bind the XML data to the Word template
Unfortunately, I can't find a good step-by-step example of this anywhere. Nearly all of the links in the docx4j forum lead to broken GitHub pages.
My questions
How can I use OpenDoPE to add tags to my Word template? I'll need to preserve style, so I want the correct OpenDoPE version
Should the POJO be converted into an XML object or document?
Is there an end to end example of this entire process so I can follow along? (preferably with source code)
Content control data binding essentially injects an XPath value into a content control in the Word document.
That XPath is evaluated against an XML document, so yes, you need to convert your POJO into XML.
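To make the "POJO to XML, then XPath" idea concrete, here is a minimal sketch. It is not docx4j code and does not perform the binding itself. The Customer class and its element names are hypothetical, and it builds the XML with the plain DOM API from the JDK rather than JAXB, only because JAXB needs an extra dependency on recent JDKs. The XPath evaluated at the end is the kind of expression a content control would carry.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class, not part of docx4j or OpenDoPE.
class PojoToXml {

    /** Hypothetical POJO standing in for the data read from the database. */
    static final class Customer {
        final String name;
        final String city;

        Customer(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    /** Builds <customer><name>..</name><city>..</city></customer> from the POJO. */
    static Document toDocument(Customer c) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("customer");
            doc.appendChild(root);
            appendChild(doc, root, "name", c.name);
            appendChild(doc, root, "city", c.city);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create XML document", e);
        }
    }

    private static void appendChild(Document doc, Element parent, String tag, String value) {
        Element child = doc.createElement(tag);
        child.setTextContent(value);
        parent.appendChild(child);
    }

    /** Evaluates an XPath against the document and returns the string value. */
    static String evaluate(Document doc, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
    }

    /** Serializes the document to a string, e.g. to inspect it or hand it to another tool. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = toDocument(new Customer("Ada", "London"));
        System.out.println(serialize(doc));
        System.out.println(evaluate(doc, "/customer/name"));
    }
}
```

In the real setup, a content control bound to /customer/name would receive the value that evaluate() returns here.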
Authoring
Now, there are 3 different OpenDoPE Word AddIns which you can use to add content controls to your Word document. See the links at https://opendope.org/implementations.html
The most recent one assumes a fixed XML format. So to use that, you'd need to transform your POJO to match that format. (ie use the AddIn to author your docx, then inspect the resulting XML (embedded in the docx), then figure out how to transform your POJO to that).
The older AddIns support arbitrary XML, but are cruder. To use one of these, first convert your POJO to XML (eg using JAXB), then feed the AddIn your sample XML.
Runtime
To bind your XML to a docx "template" to create an instance docx, see https://github.com/plutext/docx4j/blob/master/docx4j-samples-docx4j/src/main/java/org/docx4j/samples/ContentControlBindingExtensions.java
You can run that sample code against the sample docx + data; you can take a look at the docx to see what the content controls look like (they bind a custom xml part in the docx, so unzip it to see that).
ps the GitHub links broke as a result of a recent code re-org. GitHub isn't smart enough to dynamically maintain them :-( See https://www.docx4java.org/downloads.html for downloadable sample code.
I am new to this validation process in Java...
-->XML file named Validation Limits
-->Structure of the XML
<parameter> </parameter>
<lowerLimit> </lowerLimit>
<upperLimit> </upperLimit>
<enable> </enable>
-->Depending on the enable status ('true' or 'false'), I must perform the validation process for the respective parameter.
-->What would be the best way to perform this operation?
I have already parsed the XML with DOM (I forgot to mention this earlier) and stored the values in arrays, but this gets complicated because of the amount of cross-referencing from one array to another. Any better method that could replace the arrays would be helpful.
Thank you in advance.
Try using a DOM or SAX parser; they will do the parsing for you. You can find some good, free tutorials on the internet.
The difference between DOM and SAX is as follows: DOM loads the XML into a tree structure which you can browse through (i.e. the whole XML is loaded), whereas SAX parses the document and triggers events (calls methods) in the process. Both have advantages and disadvantages, but personally, for reasonably sized XML files, I would use DOM.
So, in your case: use DOM to get a tree of your XML document, locate the enable element, and then read the other elements that depend on it.
Also, you can achieve this in pure XML, using XML Schema, although this might be too much for simple needs.
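As one possible way to replace the parallel arrays, here is a sketch that reads each entry into a small object and keeps the objects in a map keyed by parameter name. The element names parameter, lowerLimit, upperLimit and enable come from the question; the wrapper elements limits and limit, and the class names, are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class.
class ValidationLimits {

    /** One entry of the limits file; replaces one slot in each of the parallel arrays. */
    static final class Limit {
        final double lower;
        final double upper;
        final boolean enabled;

        Limit(double lower, double upper, boolean enabled) {
            this.lower = lower;
            this.upper = upper;
            this.enabled = enabled;
        }

        /** A disabled limit accepts everything; an enabled one checks the range. */
        boolean accepts(double value) {
            return !enabled || (value >= lower && value <= upper);
        }
    }

    /** Parses the XML with DOM and returns the limits keyed by parameter name. */
    static Map<String, Limit> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, Limit> limits = new LinkedHashMap<>();
            // Assumption: each group of four elements is wrapped in a <limit> element.
            NodeList entries = doc.getElementsByTagName("limit");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                limits.put(text(entry, "parameter"), new Limit(
                        Double.parseDouble(text(entry, "lowerLimit")),
                        Double.parseDouble(text(entry, "upperLimit")),
                        Boolean.parseBoolean(text(entry, "enable"))));
            }
            return limits;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not read validation limits", e);
        }
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<limits><limit><parameter>temperature</parameter>"
                + "<lowerLimit>0</lowerLimit><upperLimit>100</upperLimit>"
                + "<enable>true</enable></limit></limits>";
        Map<String, Limit> limits = load(xml);
        System.out.println(limits.get("temperature").accepts(150));
    }
}
```

With this layout, validating a value is a single lookup such as limits.get(name).accepts(value), and there is no index bookkeeping between arrays.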
What is the best way to detect data types inside an HTML page using Java facilities (DOM API, regular expressions, etc.)?
I'd like to detect types the way the Skype plugin does for phone/Skype numbers, and similarly for addresses, emails, times, etc.
'Types' is an inappropriate term for the kind of information you are referring to. Choice of DOM API or regex depends upon the structure of information within the page.
If you know the structure (for example, tables are used to display the information, and you already know which cell holds the phone number and which cell holds the email address), it makes sense to go with a DOM API.
Otherwise, you should use regex on plain HTML text without parsing it.
I'd use regexes in the following order:
Extract only the BODY content
Remove all tags to leave just plain text
Match relevant patterns in text
Of course, this assumes that markup isn't providing hints, and that you're purely extracting data, not modifying page context.
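The three steps above could be sketched roughly like this with java.util.regex. The class name HtmlDataDetector and the email pattern are illustrative assumptions only; the pattern is deliberately simple and will not cover every valid address, and other kinds of data (phone numbers, times) would each need a pattern of their own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class HtmlDataDetector {

    // Step 1: capture everything between <body ...> and </body>.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 2: anything that looks like a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Step 3: a simplified email pattern, for illustration only.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns the body of the page with tags removed and whitespace collapsed. */
    static String plainText(String html) {
        Matcher m = BODY.matcher(html);
        String body = m.find() ? m.group(1) : html; // fall back to the whole input
        // Note: the contents of <script> and <style> elements are not removed here.
        return TAG.matcher(body).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    /** Returns every substring of the page text that matches the email pattern. */
    static List<String> emails(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(plainText(html));
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Mail <b>bob@example.org</b> now</p></body></html>";
        System.out.println(plainText(html));
        System.out.println(emails(html));
    }
}
```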
Hope this helps,
Phil Lello
I use Jericho HTML Parser 3.1.
I need to extract text from HTML, process it, and then, based on the result, insert tags into the original HTML.
For this I need a mapping between the extracted text and the source HTML.
net.htmlparser.jericho.TextExtractor extracts text pretty well, but I was not able to find out how to get the location of that text in the original file.
Is it possible to do so with Jericho-html?
You can't do this with the TextExtractor as is, but I've needed to do similar things in the past, and the simplest solution is to copy Jericho's TextExtractor implementation and edit it to add your own custom behaviour. It's a pretty simple class, so you'll easily be able to see where to add your own hooks.
I'm building an XML editor using the above technologies. In essence, I want to read a whole XML file into a Java object and use this object to refer to each element in the XML node tree (grouped into entries). The content should be displayed locked, with a separate padlock per entry that the user can click to 'unlock' it, overwrite the data, and submit the entry. 'Add entry', 'Duplicate entry' and 'Delete entry' are also functions I'd like to add.
I already use dom4j and XPath to access areas of the XML file so some of the work in theory is already done. Given the above, I was going to use these two together with inplaceInputs to allow the user to edit the XML and JSF validators to check the data coming in.
Is this the best way to approach this problem, or is there a more straightforward route than XPathing a whole record? I started looking at JAXB, but I'm new to Java and JSF (though I've got the feeling I won't be by the end).
Thanks
You can try the Simple framework API for Java. It is dedicated to XML and should suit your needs. You can access the entire XML by node, tree and child. Moreover, writing and reading XML is equally easy using the Serializer and Persister, which store the values through the respective setters and getters.