Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
I have an XML document in a foreign language and another XML document in English. I am trying to replace some nodes in the foreign document with nodes from the English document and export the document.
I have been working on this for days now and have tried countless things, from reading both documents in as text with a Scanner, BufferedReader, etc., with no good results.
I'm at a loss on what else I can try. I have searched for days and have nothing. Maybe what I'm trying to do cannot be done although it seems simple enough. Any help/direction would be appreciated.
Load both documents into DOM objects, then use XPath to locate and select nodes and copy values between them.
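A minimal sketch of that approach, using only the JDK's built-in DOM, XPath and Transformer classes. The class name `NodeReplacer`, the sample documents and the XPath expression `/book/title` are made up for illustration; the sketch also assumes the same expression selects the matching node in both documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeReplacer {

    /** Parses an XML string into a DOM Document. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Replaces the first node matching xpathExpr in target with a deep copy
     * of the first node matching the same expression in source.
     */
    public static void replaceNode(Document target, Document source, String xpathExpr)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node oldNode = (Node) xpath.evaluate(xpathExpr, target, XPathConstants.NODE);
        Node newNode = (Node) xpath.evaluate(xpathExpr, source, XPathConstants.NODE);
        if (oldNode == null || newNode == null) {
            return; // nothing matched in one of the documents
        }
        // A node is owned by its document, so copy it into the target first.
        Node imported = target.importNode(newNode, true);
        oldNode.getParentNode().replaceChild(imported, oldNode);
    }

    /** Serializes the Document back to text (the "export" step). */
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample documents
        Document foreign = parse("<book><title>Le Petit Prince</title><year>1943</year></book>");
        Document english = parse("<book><title>The Little Prince</title><year>1943</year></book>");
        replaceNode(foreign, english, "/book/title");
        System.out.println(serialize(foreign));
    }
}
```

The key call is `importNode`: a node taken from one document cannot be attached to another directly, so it has to be copied into the target document before `replaceChild`.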
Depending on what you need to replace and what you mean by "export", I would use an XML parser like SAX with the following algorithm:
For each node that you read
Replace attributes or text as necessary
Write it out to a new XML file
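The steps above can be sketched with the JDK's `XMLFilterImpl`, which forwards every SAX event unchanged unless a method is overridden, combined with an identity `Transformer` that writes the filtered events back out as XML. The class name `SaxRewriter`, the tag name `greeting` and the sample input are hypothetical, and the sketch assumes the target element contains only text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxRewriter {

    /** Forwards all events as-is, but swaps the text inside <targetTag>. */
    static class ReplaceTextFilter extends XMLFilterImpl {
        private final String targetTag;
        private final String replacement;
        private boolean inTarget;

        ReplaceTextFilter(XMLReader parent, String targetTag, String replacement) {
            super(parent);
            this.targetTag = targetTag;
            this.replacement = replacement;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, localName, qName, atts);
            if (qName.equals(targetTag)) {
                inTarget = true;
                // Emit the replacement text once, right after the start tag.
                super.characters(replacement.toCharArray(), 0, replacement.length());
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals(targetTag)) {
                inTarget = false;
            }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (!inTarget) {
                super.characters(ch, start, length); // drop the original text of the target
            }
        }
    }

    /** Streams xml through the filter and returns the rewritten document. */
    public static String rewrite(String xml, String targetTag, String replacement)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        XMLReader filter = new ReplaceTextFilter(reader, targetTag, replacement);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String foreign = "<doc><greeting>Bonjour</greeting><other>Bonjour</other></doc>";
        System.out.println(rewrite(foreign, "greeting", "Hello"));
    }
}
```

To write to a file rather than a string, pass a `StreamResult` built from a `File` or `OutputStream` instead of the `StringWriter`.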
There are many tutorials out there on how to use SAX, such as this one: How to parse XML using the SAX parser
If the "replacements" you need to do are very straightforward like "all <tag> objects under <parent-tag>" then maybe building the DOM and using XPath would work, but if your replacements are very arbitrary and unstructured then I'd go with parsers.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
The image below shows what I want to do: I need to add many values to these three tables.
I'm using the docx4j library.
You can use content control databinding for this; docx4j's OpenDoPE convention allows you to repeat table rows. And more recent versions of Word have a concept of repeating content controls; see https://www.docx4java.org/blog/2015/01/word-2013-repeatingsection-content-controls-ready-for-prime-time/
In principle, docx4j supports both, but it'll be easier to get help with the OpenDoPE approach.
To get started, try invoice.docx from https://github.com/plutext/docx4j/tree/master/sample-docs/word/databinding which is an example of repeating table rows.
To merge invoice-data.xml (from the same dir) into it, use https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ContentControlsMergeXML.java
If you like this approach, you'll need to author your own input document; to do this, you can try the "friendly" Word AddIn at https://opendope.org/implementations.html
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I'm trying to add some data to a PDF with iText 7 in a Java application.
I can't manage to open the PDF in append mode. I looked for solutions online, but they all concern iText 5 (and use classes that don't exist any more).
What can I do?
It depends on what you want specifically:
merge two documents:
https://developers.itextpdf.com/content/itext-7-examples/itext-7-merging-pdf-documents
add content at the end of a document:
Similar to before, you could create a new document (to a byte output stream), and merge the two together
add content to an existing page:
Hard to do, since that typically requires re-layout of the document, which no PDF-engine can currently do.
fill in forms in the document:
https://developers.itextpdf.com/content/itext-7-examples/itext-7-form-examples
add an attachment to the document:
https://developers.itextpdf.com/examples/miscellaneous/clone-embedded-files
More on (3), adding content to an existing page:
Adding content to a PDF in the middle of existing content is extremely hard.
To understand why, here is some information on how PDF documents are built internally:
PDF documents contain instructions for a viewer to render, rather than plain text
instructions and their arguments are grouped in 'objects'
objects can be compressed to reduce file size
a PDF document keeps an internal index of all of these objects, called the XREF table
the index inside a PDF document uses byte-offsets to tell a renderer where (in the file) an object can be found
Suppose you want to change (or add) something.
You'd mess up all the byte-offsets in the XREF. No viewer would be able to find any object again.
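To make the offset problem concrete, here is a toy illustration. It is not the real PDF or XREF format: the "file" is a plain string of `;`-terminated records and the index is a list of character offsets, both invented for this example. It only shows why an index of fixed offsets goes stale once something is inserted before the entries it points to.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetIndexDemo {

    /** Records the start offset of every ';'-terminated record (toy stand-in for an XREF). */
    public static List<Integer> buildIndex(String file) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < file.length()) {
            int end = file.indexOf(';', pos);
            if (end < 0) {
                break; // trailing data without a terminator is ignored
            }
            offsets.add(pos);
            pos = end + 1;
        }
        return offsets;
    }

    /** Reads the record starting at offset, the way a viewer follows its index. */
    public static String readAt(String file, int offset) {
        int end = file.indexOf(';', offset);
        return file.substring(offset, end);
    }

    public static void main(String[] args) {
        String original = "1:Hello;2:World;3:End;";
        List<Integer> index = buildIndex(original);

        // With the untouched file, the index points at the start of each record.
        System.out.println(readAt(original, index.get(1)));

        // Insert text into the first record but keep using the old index.
        String edited = original.replace("1:Hello", "1:Hello there");
        System.out.println(readAt(edited, index.get(1)));
    }
}
```

After the edit, the stored offset for the second record lands in the middle of the first one, so the lookup returns a fragment instead of the record it was meant to find.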
Then there is the fact that the PDF does not contain layout information. If you added something new, and existing content would need to move, you need layout information (what objects make a sentence? which sentences make a paragraph?). Only by having layout information can you sensibly re-layout the document.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I need to parse an XML document in Java for a web service I'm making, and save its contents.
I need to save the name of each tag, its attributes if it has any, and the data within the tag. These three items will be inserted into a database table with the three columns tags, attributes, and data.
I'm using the following Java classes:
javax.xml.parsers.DocumentBuilder
javax.xml.parsers.DocumentBuilderFactory
org.w3c.dom.Document, org.w3c.dom.NodeList
org.xml.sax.InputSource.
Any help would be much appreciated.
DISCLAIMER: I don't want to plagiarize so I didn't include code but included links to other tutorials that are VERY helpful to this topic.
First, you should read the Java API for the W3C DOM, because it documents many functions that are directly relevant to your question.
Second, this website contains a useful tutorial that's easy to understand and it contains the necessary information for you to get the attributes of tags.
Third, this website gives you info on how to get tagName when you are looping through elements.
Fourth, you should always read the relevant API and search Google first, and only post a question if you still have no clue after a LONG period of time.
Lastly, you should post a different question, or research databases FIRST, before asking about that part here. This question should only be about XML document parsing in Java.
We are not supposed to help you do anything so the API is the best help for you (and google).
API: https://docs.oracle.com/javase/7/docs/api/org/w3c/dom/package-summary.html
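For reference, a minimal sketch of walking a DOM tree with the classes listed in the question and collecting the three values (tag name, attributes, data) for each element. The class `XmlFlattener`, the `Row` holder and the `name=value` attribute format are hypothetical choices made for this illustration; "data" is taken to mean the text directly inside an element, and the database insert is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlFlattener {

    /** One future database row: tag name, attributes, and direct text content. */
    public static final class Row {
        public final String tag;
        public final String attributes;
        public final String data;

        Row(String tag, String attributes, String data) {
            this.tag = tag;
            this.attributes = attributes;
            this.data = data;
        }
    }

    /** Parses the XML and returns one Row per element, in document order. */
    public static List<Row> flatten(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<Row> rows = new ArrayList<>();
        collect(doc.getDocumentElement(), rows);
        return rows;
    }

    private static void collect(Element element, List<Row> rows) {
        // Attributes joined as "name=value" pairs separated by spaces.
        StringBuilder attrs = new StringBuilder();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Node attr = map.item(i);
            if (attrs.length() > 0) {
                attrs.append(' ');
            }
            attrs.append(attr.getNodeName()).append('=').append(attr.getNodeValue());
        }

        // Only text that sits directly inside this element, not in its children.
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    || child.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(child.getNodeValue());
            }
        }
        rows.add(new Row(element.getTagName(), attrs.toString(), text.toString().trim()));

        // Recurse into child elements so nested tags get their own rows.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, rows);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        for (Row row : flatten("<order id=\"7\"><item>Pen</item><note/></order>")) {
            System.out.println(row.tag + " | " + row.attributes + " | " + row.data);
        }
    }
}
```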
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I have an XML document that I need to parse and fetch values from. However, I am not sure which way of parsing would be best for XML of this type. I have read about different approaches but am not sure which one is the best. Could someone please help me write Java code to parse this XML using the best approach?
Thanks in advance!
Here's the XML:
<managementDomain>
<mtosi:additionalInfo>
<mtosi:nvs>
<stru:attributeName>Managed Device Name</stru:attributeName>
<stru:attributeValue>
<nonc:value>al-dcdc-numr-phe-eu</nonc:value>
</stru:attributeValue>
</mtosi:nvs>
<mtosi:nvs>
<stru:attributeName>NMDBF</stru:attributeName>
<stru:attributeValue>
<nonc:value>Y</nonc:value>
</stru:attributeValue>
</mtosi:nvs>
<mtosi:nvs>
<stru:attributeName>BFGCustrID</stru:attributeName>
<stru:attributeValue>
<nonc:value>3444</nonc:value>
</stru:attributeValue>
</mtosi:nvs>
<mtosi:nvs>
<stru:attributeName>BFGContractID</stru:attributeName>
<stru:attributeValue>
<nonc:value>12331</nonc:value>
</stru:attributeValue>
</mtosi:nvs>
</mtosi:additionalInfo>
<mtosi:mdVendorExtensions>
<mtosi:tmf854Version/>
<mtosi:extVersion/>
<mtosi:extAuthor/>
</mtosi:mdVendorExtensions>
<mtosi:managedElement>
<mtosi:manufacturer>
<nonc:ossValue>CISCO</nonc:ossValue>
</mtosi:manufacturer>
<mtosi:productName>
<nonc:value>CISCO2951</nonc:value>
</mtosi:productName>
<mtosi:meVendorExtensions>
<mtosi:tmf854Version/>
<mtosi:extVersion/>
<mtosi:extAuthor/>
<mtosi:managementIPAddress>
<mtosi:ipValue>
<nonc:value>10.32.22.49</nonc:value>
</mtosi:ipValue>
</mtosi:managementIPAddress>
</mtosi:meVendorExtensions>
</mtosi:managedElement>
</managementDomain>
I need to fetch:
ManagementIpAddress, BFGCustomerId, BFGContractID and Managed Device Name from this XML.
Two possible ways of parsing this XML are DOM4J and SAX. The former is more memory intensive, because it loads the complete document into a Java object structure. With SAX you parse the file by streaming the content and "listening" for the elements you want to extract.
So for your specific case, reading only a few elements, SAX might be the way to go.
The drawback of SAX is that it invites hackish solutions that only work with specific, correct (in the best case pre-validated) XML files. You need to program more carefully when using SAX.
(Of course, with small XML files there is no shame in loading the document completely with DOM4J, if that is more convenient for you ;)
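A minimal sketch of the SAX approach for the XML in the question, using the JDK's `DefaultHandler`. The class name `MtosiExtractor` and the map key `managementIPAddress` are invented for this example. Because the posted snippet does not declare its namespace prefixes, the sketch assumes a parser that is not namespace-aware (the `SAXParserFactory` default) and matches on qualified names such as `mtosi:nvs`.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MtosiExtractor extends DefaultHandler {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentName;      // last <stru:attributeName> seen inside an <mtosi:nvs>
    private boolean inManagementIp;  // true while inside <mtosi:managementIPAddress>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals("mtosi:managementIPAddress")) {
            inManagementIp = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        switch (qName) {
            case "stru:attributeName":
                currentName = text.toString().trim();
                break;
            case "nonc:value":
                if (inManagementIp) {
                    values.put("managementIPAddress", text.toString().trim());
                } else if (currentName != null) {
                    values.put(currentName, text.toString().trim());
                }
                break;
            case "mtosi:nvs":
                currentName = null; // values outside a name/value pair are ignored
                break;
            case "mtosi:managementIPAddress":
                inManagementIp = false;
                break;
            default:
                break;
        }
    }

    /** Streams the XML once and returns the name/value pairs plus the management IP. */
    public static Map<String, String> extract(String xml) throws Exception {
        MtosiExtractor handler = new MtosiExtractor();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the XML from the question
        String xml = "<managementDomain><mtosi:additionalInfo><mtosi:nvs>"
                + "<stru:attributeName>BFGContractID</stru:attributeName>"
                + "<stru:attributeValue><nonc:value>12331</nonc:value></stru:attributeValue>"
                + "</mtosi:nvs></mtosi:additionalInfo><mtosi:managedElement>"
                + "<mtosi:meVendorExtensions><mtosi:managementIPAddress><mtosi:ipValue>"
                + "<nonc:value>10.32.22.49</nonc:value></mtosi:ipValue>"
                + "</mtosi:managementIPAddress></mtosi:meVendorExtensions>"
                + "</mtosi:managedElement></managementDomain>";
        System.out.println(extract(xml));
    }
}
```

The handler keeps only a little state (the current attribute name and whether it is inside the IP address element), which is the kind of careful bookkeeping SAX requires in exchange for not loading the whole document.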
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I'm trying to parse (steal) a lot of information from an HTML page, and much of it comes in blocks, like: username: 1. age 2. gender 3. country etc. Each block is very large, so my regex pattern is huge. All of my regex development tools have a single line for the pattern and a textbox for the text, which makes developing this kind of large pattern impossible. What am I supposed to do to develop large regex patterns, or should I avoid them?
HTML pages are basically a valid DOM structure, so it is better to use a DOM parser instead of regex to get the desired info. You can explore JSoup: Java HTML parser.
use the parsing rules described for HTML to generate the DOM trees from text/html resources. Together, these rules define what is referred to as the HTML parser.