utf-8 using standard openStream and DocumentBuilder - java

I need the output to be treated as UTF-8, because special characters are not being handled correctly.
Does anyone have an idea how this can be done?
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL("http://www.aredacao.com.br/tv-saude");
Document doc = builder.parse(u.openStream());
NodeList nodes = doc.getElementsByTagName("item");

The problem is that the site returns <?xml version='1.0' encoding='iso-8859-1'?> but it should be returning <?xml version='1.0' encoding='UTF-8'?>.
One solution is to translate each element's text yourself:
static void readData()
throws IOException,
ParserConfigurationException,
SAXException {
DocumentBuilder builder =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL("http://www.aredacao.com.br/tv-saude");
Document doc = builder.parse(u.toString());
NodeList nodes = doc.getElementsByTagName("item");
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
Element el = (Element) node;
String title =
el.getElementsByTagName("title").item(0).getTextContent();
title = treatCharsAsUtf8Bytes(title);
String description =
el.getElementsByTagName("description").item(0).getTextContent();
description = treatCharsAsUtf8Bytes(description);
System.out.println("title=" + title);
System.out.println("description=" + description);
System.out.println();
}
}
private static String treatCharsAsUtf8Bytes(String s) {
byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
return new String(bytes, StandardCharsets.UTF_8);
}
Another possibility is to write a subclass of FilterInputStream which replaces the erroneous <?xml prolog's encoding, but that is a lot more work, and I would only consider doing that if the document had a complex structure with many different elements such that translating each would be unwieldy.
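A lighter-weight variant of the same idea, sketched under assumptions: if the parser is given a character stream (a Reader) instead of a byte stream, it reads the already-decoded characters and does not use the encoding named in the XML declaration for decoding. The class name ForceUtf8Parse, the helper parseAsUtf8, and the in-memory sample feed are hypothetical stand-ins for the real site, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ForceUtf8Parse {

    // Decode the bytes as UTF-8 ourselves and hand the parser a character stream,
    // so the encoding named in the XML declaration is not used for decoding.
    static Document parseAsUtf8(InputStream in) {
        try {
            InputSource source =
                    new InputSource(new InputStreamReader(in, StandardCharsets.UTF_8));
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the feed: declared as iso-8859-1,
        // but the bytes are really UTF-8.
        String xml = "<?xml version='1.0' encoding='iso-8859-1'?>"
                + "<rss><item><title>Sa\u00fade</title></item></rss>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));

        Document doc = parseAsUtf8(in);
        System.out.println(doc.getElementsByTagName("title").item(0).getTextContent());
    }
}
```

With the real feed, the call would presumably be parseAsUtf8(u.openStream()). This only helps if the bytes really are UTF-8; if the site ever starts sending genuine ISO-8859-1, the forced decoding would corrupt the text instead.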

Related

I can't remove space when get value from xml using node.getTextContent()

I have an XML configuration file that is parsed with a DOM parser. When I get a value from a child node using node.getTextContent(), I cannot remove the space from the string value. It works when the value has no space, but I have to handle the negative scenario too.
I tried trim(), replaceAll("\\s", "") and replaceAll("\u00A0", ""), but nothing worked for me.
NodeList serviceAddrNodeList=serviceAddressesNode.getChildNodes();
packetSendingIplist =new ArrayList();
for (int l = 0; l < serviceAddrNodeList.getLength(); l++) {
Node serviceAddrNode=serviceAddrNodeList.item(l);
if(serviceAddrNode.getNodeType()==Node.ELEMENT_NODE){
String packetSendingIp = serviceAddrNode.getTextContent();
packetSendingIp.trim(); //replaceAll("\s", "") and replaceAll("\u00A0", "")
if(checkValidIp(packetSendingIp)){
log("invalid service_addr-"+packetSendingIp+"ignoring this listening point ");
}
}
}
xml:
<service_addresses>
<!-- host1 -->
<service_addr>172.17.1.16 </service_addr>
<service_addr>172.17.1.17 </service_addr>
<!-- host12-->
<service_addr>172.17.1.32</service_addr>//works because no space here
<service_addr>172.17.1.33 </service_addr>
</service_addresses>
Try this:
File fXmlFile = new File("your xml file path");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
NodeList nodeList = doc.getElementsByTagName("service_addr");
for(int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getTextContent().trim());
}
Output, without any whitespace:
172.17.1.16
172.17.1.17
172.17.1.32
172.17.1.33
You can remove all the whitespace at the beginning and end of the content with a regex:
public class Main {
public static final String DEST = "html_1.pdf";
private static final String WHITESPACE_REGEX = "(^( )*|( )*$)";
public static void main(String[] args) throws Exception {
Assert.assertEquals(" 192.168.1.1 ".replaceAll(WHITESPACE_REGEX, StringUtils.EMPTY), "192.168.1.1");
Assert.assertEquals(" 192.168.1.1 ".replaceAll(WHITESPACE_REGEX, StringUtils.EMPTY), "192.168.1.1");
Assert.assertEquals("192.168.1.1 ".replaceAll(WHITESPACE_REGEX, StringUtils.EMPTY), "192.168.1.1");
Assert.assertEquals("192.168.1.1".replaceAll(WHITESPACE_REGEX, StringUtils.EMPTY), "192.168.1.1");
}
}
You cannot call trim() like that. Strings are immutable, so trim() returns a new string and leaves packetSendingIp unchanged; you have to assign the result back to a variable. Add trim() to this line and it should work as expected:
String packetSendingIp = serviceAddrNode.getTextContent().trim();
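To make the difference concrete, here is a minimal sketch that needs no XML at all. The class name TrimDemo and the helper cleanAddress are made up for illustration; the sample address is taken from the question.

```java
public class TrimDemo {

    // Returns the address without leading/trailing whitespace.
    // The argument itself is not modified, because String is immutable.
    static String cleanAddress(String raw) {
        return raw.trim();
    }

    public static void main(String[] args) {
        String packetSendingIp = "172.17.1.16 ";

        packetSendingIp.trim();                           // result is thrown away
        System.out.println("[" + packetSendingIp + "]");  // prints [172.17.1.16 ]

        packetSendingIp = cleanAddress(packetSendingIp);  // keep the returned value
        System.out.println("[" + packetSendingIp + "]");  // prints [172.17.1.16]
    }
}
```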

How to get data from XML node?

I am struggling to get the data out of the following XML node. I use DocumentBuilder to parse the XML, and I usually get the value of a node by its tag name, but in this case I am not sure what the node would be.
<Session.openRs status="success" sessionID="19217B84:AA3649FE:B211FF37:E61A78F1:7A35D91D:48E90C41" roleBasedSecurity="1" entityID="1" />
This is how I am getting the values for other tags by the tag name.
public List<NYProgramTO> getNYPPAData() throws Exception {
this.getConfiguration();
List<NYProgramTO> to = dao.getLatestNYData();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document document = null;
// Returns chunkSize
/*List<NYProgramTO> myList = getNextChunk(to);
ExecutorService executor = Executors.newFixedThreadPool(myList.size());
myList.stream().parallel()
.forEach((NYProgramTO nyTo) ->
{
executor.execute(new NYExecutorThread(nyTo, migrationConfig , appContext, dao));
});
executor.shutdown();
executor.awaitTermination(300, TimeUnit.SECONDS);
System.gc();*/
try {
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource source = new InputSource();
for(NYProgramTO nyProgram: to) {
String reqXML = nyProgram.getRequestXML();
String response = RatingRequestProcessor.postRequestToDC(reqXML, URL);
// dao.storeData(nyProgram);
System.out.println(response);
if(response != null) {
source.setCharacterStream(new StringReader(response));
document = builder.parse(source);
NodeList list = document.getElementsByTagName(NYPG3Constants.SERVER);
for(int iterate = 0; iterate < list.getLength(); iterate++){
Node node = list.item(iterate);
if(node.getNodeType() == Node.ELEMENT_NODE) {
Element element = (Element) node;
nyProgram.setResponseXML(response);
nyProgram.setFirstName(element.getElementsByTagName(NYPG3Constants.F_NAME).item(0).getTextContent());
nyProgram.setLastName(element.getElementsByTagName(NYPG3Constants.L_NAME).item(0).getTextContent());
nyProgram.setPolicyNumber(element.getElementsByTagName(NYPG3Constants.P_NUMBER).item(0).getTextContent());
nyProgram.setZipCode(element.getElementsByTagName(NYPG3Constants.Z_CODE).item(0).getTextContent());
nyProgram.setDateOfBirth(element.getElementsByTagName(NYPG3Constants.DOB).item(0).getTextContent());
nyProgram.setAgencyCode(element.getElementsByTagName(NYPG3Constants.AGENCY_CODE).item(0).getTextContent());
nyProgram.setLob(element.getElementsByTagName(NYPG3Constants.LINE_OF_BUSINESS).item(0).getTextContent());
if(element.getElementsByTagName(NYPG3Constants.SUBMISSION_NUMBER).item(0) != null){
nyProgram.setSubmissionNumber(element.getElementsByTagName(NYPG3Constants.SUBMISSION_NUMBER).item(0).getTextContent());
} else {
nyProgram.setSubmissionNumber("null");
}
I need to get the value for sessionId. What I want to know is the node, I am sure it can't be .I am retrieving the values via tag names so what would be the tag name in this case?
Thanks in advance
You should consider using XPath. At least for me, it is much easier to use. In your case, to get sessionID you could try something like this:
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "/Session.openRs/@sessionID";
String sessionID = xPath.evaluate(expression, document);
You can obtain 'document' as you already do in your code:
Document document = builder.parse(source);
Hope this helps!
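A self-contained sketch of both routes, XPath and plain DOM, may make this easier to try out. The class name SessionIdLookup, its helper methods, and the wrapping response element are assumptions for illustration; the question does not show where Session.openRs sits in the real response, so the XPath here uses // to match it at any depth.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SessionIdLookup {

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // XPath route: '@' selects an attribute of the matched element.
    static String viaXPath(String xml) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//Session.openRs/@sessionID", parse(xml));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression", e);
        }
    }

    // DOM route: the tag name is the element name, the value is an attribute of it.
    static String viaDom(String xml) {
        Element session =
                (Element) parse(xml).getElementsByTagName("Session.openRs").item(0);
        return session.getAttribute("sessionID");
    }

    public static void main(String[] args) {
        // The <response> wrapper is hypothetical; the inner element is from the question.
        String xml = "<response><Session.openRs status=\"success\" "
                + "sessionID=\"19217B84:AA3649FE:B211FF37:E61A78F1:7A35D91D:48E90C41\" "
                + "roleBasedSecurity=\"1\" entityID=\"1\" /></response>";
        System.out.println(viaXPath(xml));
        System.out.println(viaDom(xml));
    }
}
```

So the tag name in this case is Session.openRs, and sessionID is read with getAttribute rather than getTextContent, since the element has no text body.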

How to get an XML value from an unzipped file

I need to get values like "Symbol" etc. from an XML file and put them into a list.
For now my code looks like this:
Scanner sc = null;
byte[] buff = new byte[1 << 13];
List<String> question2 = new ArrayList<String>();
question2 = <MetodToGetFile>(sc,fileListQ);
for ( String strLista : question2){
ByteArrayInputStream in = new ByteArrayInputStream(strLista.getBytes());
try(InputStream reader = Base64.getMimeDecoder().wrap(in)){
try (GZIPInputStream gis = new GZIPInputStream(reader)) {
try (ByteArrayOutputStream out = new ByteArrayOutputStream()){
int readGis = 0;
while ((readGis = gis.read(buff)) > 0)
out.write(buff, 0, readGis);
byte[] buffer = out.toByteArray();
String s2 = new String(buffer);
}
}
}
}
}
I want to know how I can continue from here and take the values "xxx" and "zzzz" and put them into another list, because I need to compare some values.
XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Name Name="some value">
<Group Names="some value">
<Package Guid="{7777-7777-7777-7777-7777}">
<Attribute Typ="" Name="Symbol">xxx</Attribute>
<Attribute Type="" Name="Surname">xxx</Attribute>
<Attribute Type="Address" Name="Name">zzzz</Attribute>
<Attribute Type="Address" Name="Country">zzzz</Attribute>
</Package>
EDIT: Hello, I hope that my solution will be useful for someone :)
try{
//Build an InputSource from s2 (the XML string read from the stream)
InputSource is = new InputSource(new StringReader(s2));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
//Get "some value" from the attribute Name
String name= (String) xpath.evaluate("/Name/@Name", doc, XPathConstants.STRING);
//Get the GUID from the attribute Guid
String guid= (String) xpath.evaluate("/Name/Group/Package/@Guid", doc, XPathConstants.STRING);
//Get the text "xxx" of the Attribute element whose Name attribute is "Symbol"
String symbol= xpath.evaluate("/Name/Group/Package/Attribute[@Name=\"Symbol\"]", doc.getDocumentElement());
System.out.println(name);
System.out.println(guid);
System.out.println(symbol);
}catch(Exception e){
e.printStackTrace();
}
I would be happy if my code helps someone :)
Add a method like this to retrieve all of the elements that match a given XPath expression:
public List<Node> getNodes(Node sourceNode, String xpathExpresion) throws XPathExpressionException {
// You could cache/reuse xpath for better performance
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xpath.evaluate(xpathExpresion,sourceNode,XPathConstants.NODESET);
ArrayList<Node> list = new ArrayList<Node>();
for(int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
list.add(node);
}
return list;
}
Add another method to build a Document from an XML input:
public Document buildDoc(InputStream is) throws Exception {
DocumentBuilderFactory fact = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = fact.newDocumentBuilder();
Document newDoc = parser.parse(is);
newDoc.normalize();
is.close();
return newDoc;
}
And then put it all together:
InputStream is = new ByteArrayInputStream("... your XML string here".getBytes(StandardCharsets.UTF_8));
Document doc = buildDoc(is);
List<Node> nodes = getNodes(doc, "/Name/Group/Package/Attribute");
for (Node node: nodes) {
// for the text body of an element, first get its nested Text child
Text text = (Text) node.getChildNodes().item(0);
// Then ask that Text child for its value
String content = text.getNodeValue();
}
I hope I copied and pasted this correctly. I pulled this from a working class in an open source project of mine and cleaned it up a bit to answer your specific question.
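Since the goal in the question is to collect the values for later comparison, one possible shape is a map from each Attribute element's Name to its text. This is only a sketch: the class name AttributeValues and the method byName are invented here, and the sample XML closes the Group and Name elements, which the excerpt in the question leaves open.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AttributeValues {

    // Maps the Name attribute of every <Attribute> element to its text content,
    // keeping document order. A later duplicate Name would overwrite an earlier one.
    static Map<String, String> byName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/Name/Group/Package/Attribute", doc, XPathConstants.NODESET);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element attribute = (Element) nodes.item(i);
                values.put(attribute.getAttribute("Name"), attribute.getTextContent());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not read attributes", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; the closing </Group> and </Name> tags are assumed.
        String xml = "<Name Name=\"some value\"><Group Names=\"some value\">"
                + "<Package Guid=\"{7777-7777-7777-7777-7777}\">"
                + "<Attribute Type=\"\" Name=\"Symbol\">xxx</Attribute>"
                + "<Attribute Type=\"Address\" Name=\"Country\">zzzz</Attribute>"
                + "</Package></Group></Name>";
        System.out.println(byName(xml));
    }
}
```

In the question's code, the string s2 produced by the GZIP loop would be the argument to byName.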

Parsing XML received in String to get lat and long

I need to parse this XML, which I receive as a String; I want to extract lat and lon.
<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="ok">
<cell
lat="13.035037526666665"
lon="77.56784941333333"
mcc="404"
mnc="45"
lac="1020"
cellid="13443"
averageSignalStrength="0"
range="-1"
samples="15"
changeable="1"
radio="GSM" />
</rsp>
Can anyone please help me with this?
I have tried this, but I am not getting any output:
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource src = new InputSource();
src.setCharacterStream(new StringReader(data));
Document doc = builder.parse(src);
String lat = doc.getElementsByTagName("lat").item(0).getTextContent();
String lon = doc.getElementsByTagName("lon").item(0).getTextContent();
lat is an attribute of the element cell. So you should not use getElementsByTagName("lat") but getAttribute("lat"): (code untested)
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource src = new InputSource();
src.setCharacterStream(new StringReader(data));
Document doc = builder.parse(src);
Element docElement = doc.getDocumentElement();
// item(0) returns a Node; cast it to Element to get access to getAttribute
Element cell = (Element) docElement.getElementsByTagName("cell").item(0);
String lat = cell.getAttribute("lat");
String lon = cell.getAttribute("lon");
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new FileInputStream("C:\\dev\\Workspace\\ahportal\\benefits-base-sdk\\portlets\\ah-hm-enrl-charts-portlet\\docroot\\WEB-INF\\src\\com\\aonhewitt\\portal\\base\\charts\\bean\\employee.xml"));
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
System.out.println("lat=> " + node.getAttributes().getNamedItem("lat")
.getNodeValue());
System.out.println("lon=> " + node.getAttributes().getNamedItem("lon")
.getNodeValue());
}
}
}
Please try the above code. It should work. Let me know if it doesn't work for you.
Depending on what you want to do with the coordinates, I would recommend wrapping them in a location object that holds lat and lon.
I would not recommend using the DOM API, but rather a data binding library like JAXB or, for a shorter and more flexible approach, a data projection library (DISCLOSURE: I'm affiliated with that project), and having the values automatically converted to Double.
import org.xmlbeam.XBProjector;
import org.xmlbeam.annotation.XBRead;
public class ReadCoords {
// Object holding coordinates
public interface Location {
// Access methods for coordinates
@XBRead("./@lat")
Double getLat();
@XBRead("./@lon")
Double getLon();
}
public static void main(String[] args) {
Location location = new XBProjector().io().url("res://data.xml").evalXPath("/rsp/cell").as(Location.class);
System.out.println(location.getLat()+"/"+location.getLon());
}
}
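If adding a library is not an option, the idea of a location object can also be sketched with the JDK alone. The class name CellLocation and the factory method fromXml are hypothetical; the sketch assumes the response contains at least one cell element with numeric lat and lon attributes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CellLocation {
    final double lat;
    final double lon;

    CellLocation(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    // Reads lat/lon from the first <cell> element and converts them to double.
    static CellLocation fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element cell = (Element) doc.getElementsByTagName("cell").item(0);
            if (cell == null) {
                throw new IllegalArgumentException("No <cell> element found");
            }
            return new CellLocation(
                    Double.parseDouble(cell.getAttribute("lat")),
                    Double.parseDouble(cell.getAttribute("lon")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String data = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rsp stat=\"ok\"><cell lat=\"13.035037526666665\" "
                + "lon=\"77.56784941333333\" mcc=\"404\" radio=\"GSM\" /></rsp>";
        CellLocation location = fromXml(data);
        System.out.println(location.lat + "/" + location.lon);
    }
}
```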

Write in docx file

I am trying to parse text from an XML file and then write and save it in a docx file with Apache POI's XWPFDocument. It creates the docx file, but it is empty; I can't see the text from the parsed XML in it. Any suggestions would be appreciated.
xml:
<document>
<el id="1">
<text>Rakesh</text>
</el>
<el id="2">
<text>John</text>
</el>
<el id="3">
<text>Rajesh</text>
</el>
</document>
code:
public void dothis() throws ParserConfigurationException, SAXException,
IOException, TransformerFactoryConfigurationError,
TransformerException {
in = new BufferedReader(new FileReader("D:\\Probe.xml"));
XWPFDocument document1 = new XWPFDocument();
XWPFParagraph paragraphOne = document1.createParagraph();
XWPFRun paragraphOneRunOne = paragraphOne.createRun();
paragraphOneRunOne.setText(in);
PrintWriter zzz = new PrintWriter(new FileWriter("D:\\dd3.docx"));
document1.write(zzz);
zzz.close();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Get the DOM Builder
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse("D:\\Probe.xml");
List<Elementt> empList = new ArrayList<>();
// Iterate over the nodes and extract the child nodes
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Elementt emp = new Elementt();
emp.id = node.getAttributes().getNamedItem("id").getNodeValue();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// Identify the XML sub-elements
if (cNode instanceof Element) {
String content = cNode.getLastChild().getTextContent()
.trim();
switch (cNode.getNodeName()) {
case "text":
emp.text = content;
break;
}
}
}
empList.add(emp);
}
Based on my experience with XWPFRun when you do this:
paragraphOneRunOne.setText(in);
The 'in' needs to equal the text you want to input. Your 'in' is equal to the following:
in = new BufferedReader(new FileReader("D:\\Probe.xml"));
Try parsing the text first, then setting that as your character run, something like:
String in = textFromXMLFile; // placeholder: the text you extracted from the XML
paragraphOneRunOne.setText(in);
Or, if I have understood your code correctly (I haven't done any XML yet) and the ArrayList contains your text, something like this, after the parsing loop has filled empList:
for(int i = 0; i < empList.size(); i++){
paragraphOneRunOne.setText(empList.get(i).text);
}
The main point is that the run contains whatever value you pass to setText at that moment, so you need the relevant data ready before setting the run with it.
Good luck!
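The "parse first, write afterwards" order can be sketched with the JDK alone. The class name ElTextReader and the method readTexts are invented for this example, and the Apache POI calls are left out so the sketch runs without extra libraries; the comment in main only marks where they would go.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ElTextReader {

    // Collects the trimmed content of every <text> element, in document order.
    static List<String> readTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList textNodes = doc.getElementsByTagName("text");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < textNodes.getLength(); i++) {
                texts.add(textNodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document>"
                + "<el id=\"1\"><text>Rakesh</text></el>"
                + "<el id=\"2\"><text>John</text></el>"
                + "<el id=\"3\"><text>Rajesh</text></el>"
                + "</document>";
        List<String> texts = readTexts(xml);
        // Only now, with the strings in hand, would each one be passed to
        // XWPFRun.setText(...) and the document written to an OutputStream.
        System.out.println(texts);
    }
}
```

In the question's code the document is written before the XML is parsed, and setText receives the BufferedReader rather than a String, so reordering the steps along these lines is probably the first thing to try.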
