I have an HTML form with a textarea into which I paste some XML, for example:
<network ip_addr="10.0.0.0/8" save_ip="true">
<subnet interf_used="200" name="lan1" />
<subnet interf_used="254" name="lan2" />
</network>
When the user submits the form, the data is sent to a Java server, so the request I receive looks something like this:
GET /?we=%3Cnetwork+ip_addr%3D%2210.0.0.0%2F8%22+save_ip%3D%22true%22%3E%0D%0A%3Csubnet+interf_used%3D%22200%22+name%3D%22lan1%22+%2F%3E%0D%0A%3Csubnet+interf_used%3D%22254%22+name%3D%22lan2%22+%2F%3E%0D%0A%3C%2Fnetwork%3E HTTP/1.1
How can I use that in my Java application? I need to do some calculations on the data and send back newly generated XML.
This answer shows how to use the URLDecoder/URLEncoder classes to decode and encode URL strings. It should work if you pass the query string of the GET request to URLDecoder's decode method.
To answer your follow-up question (from the comment):
First you need to extract the XML from the decoded string. It may be enough to take the substring starting at the first < character.
Then feed that string into an XML parser to create a DOM document. The last, easy task is walking through the document and copying the values into your internal network model.
Do not think about using RegExp to extract the data. Use a parser.
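A minimal sketch of the steps above, assuming the raw query string is available as a String and that the XML is the only (or last) parameter, as in the request from the question. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QueryXmlDecoder {

    // Decodes the percent-encoded query string and returns everything from the first '<'.
    public static String extractXml(String rawQuery) {
        String decoded = URLDecoder.decode(rawQuery, StandardCharsets.UTF_8);
        int start = decoded.indexOf('<');
        if (start < 0) {
            throw new IllegalArgumentException("no XML found in query string");
        }
        return decoded.substring(start);
    }

    // Parses the XML text into a DOM document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String rawQuery = "we=%3Cnetwork+ip_addr%3D%2210.0.0.0%2F8%22+save_ip%3D%22true%22%3E%0D%0A"
                + "%3Csubnet+interf_used%3D%22200%22+name%3D%22lan1%22+%2F%3E%0D%0A"
                + "%3Csubnet+interf_used%3D%22254%22+name%3D%22lan2%22+%2F%3E%0D%0A"
                + "%3C%2Fnetwork%3E";
        Document doc = parse(extractXml(rawQuery));
        Element network = doc.getDocumentElement();
        System.out.println("network " + network.getAttribute("ip_addr"));
        // Walk the <subnet> children and read their attributes.
        NodeList subnets = network.getElementsByTagName("subnet");
        for (int i = 0; i < subnets.getLength(); i++) {
            Element subnet = (Element) subnets.item(i);
            System.out.println(subnet.getAttribute("name") + " uses "
                    + subnet.getAttribute("interf_used") + " interfaces");
        }
    }
}
```

If the server framework already hands you decoded parameters (a servlet's getParameter, for instance), the decoding step is not needed and only the parsing part applies.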
Related
I'm showing a dropdown on a web page, but when an option contains a character such as ○, the dropdown shows a question mark instead.
The dropdown options come from a SQL Server database, where the column holding the value is of type nvarchar.
I then build an XML output string with the values and send it as the response to an AJAX call.
When I call xmlWriter.toString(), where xmlWriter is a StringWriter, I can see the ○ character in Eclipse's debug mode. However, the string has to be written to a ByteArrayOutputStream that is added to the response stream so the client receives the XML, and when I call xmlWriter.toString().getBytes() the ○ character becomes a question mark.
I've tried xmlWriter.toString().getBytes("UTF-8"), but the result is some strange symbols.
What am I missing?
My guess is that you are not telling the browser which encoding the response uses, and it fails to guess the right one. The strange symbols are expected: in UTF-8 the ○ character is encoded as three bytes, which look odd when viewed one byte at a time. Call getBytes("UTF-8") as you did (or better, getBytes(StandardCharsets.UTF_8)) and send the encoding along with the response, either in the HTTP header (Content-Type: application/xml; charset=utf-8), since you are probably using HTTP, or in the XML declaration (<?xml version="1.0" encoding="utf-8"?>). Doing both gives you the best compatibility.
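Here is a small sketch of that advice using only the JDK. The class name, the constant and the sample XML are made up for illustration, and the servlet calls in the comment are an assumption about how the response is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class Utf8XmlResponse {

    // The charset named here must match the one used to encode the bytes.
    public static final String CONTENT_TYPE = "application/xml; charset=utf-8";

    // Prepends an XML declaration that names UTF-8 and encodes the result as UTF-8 bytes.
    public static ByteArrayOutputStream toUtf8Stream(String xmlBody) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + xmlBody;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(xml.getBytes(StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) throws IOException {
        // \u25CB is the white circle character from the question.
        ByteArrayOutputStream out = toUtf8Stream("<options><option>\u25CB</option></options>");
        // Decoding with the same charset restores the circle character.
        String roundTrip = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(roundTrip.contains("\u25CB"));
        // In a servlet one would also call response.setContentType(CONTENT_TYPE)
        // before copying the bytes to response.getOutputStream().
    }
}
```

The point of the round trip is that the bytes are only "strange" until they are decoded with the same charset that produced them, which is exactly what the header tells the browser to do.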
I'm calling a SOAP web service from my Java application.
I get a response and want to parse it to extract data.
The problem is that the <tranData> field contains a structure with &gt; and &lt; instead of > and <. How can I parse this document to get the data from the <tranData> field?
This is the response structure:
<response>
<Portfolio>
<ID>1</ID>
<holder>2</holder>
</Portfolio>
<tranData> &lt;responseOne&gt;&lt;header&gt;&lt;code&gt;1&lt;/code&gt;&lt;/header&gt;&lt;/responseOne&gt;</tranData>
Please remember that this is only an example of the response; the real amount of data will be much larger, so the solution should be fast.
What you show us is the actual document as it is received over the wire, right? So <tranData> contains an XML string that has been escaped to not interfere with the markup of the rest of the containing document.
When you read the content of the <tranData> element, the XML processor will 'unescape' the string and give you the 'original' value:
<responseOne><header><code>1</code></header></responseOne>
What you do with that value is a different story. You can parse it as yet another XML document and retrieve the value of the <code> element, or just pass the string along to some other processing step.
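A rough sketch of the two-step parse, assuming the <tranData> content arrives entity-escaped (&lt; and &gt;) as described above. The class and method names are invented for the example, and the sample response is a closed, well-formed version of the one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NestedXmlExample {

    // Parses XML text into a DOM document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the text of <code> inside the escaped XML carried by <tranData>.
    public static String extractCode(String responseXml) throws Exception {
        Document outer = parse(responseXml);
        // getTextContent() returns the unescaped string, i.e. real '<' and '>' characters.
        String inner = outer.getElementsByTagName("tranData").item(0).getTextContent();
        // The unescaped value is itself an XML document, so parse it a second time.
        Document innerDoc = parse(inner.trim());
        return innerDoc.getElementsByTagName("code").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String response = "<response><Portfolio><ID>1</ID><holder>2</holder></Portfolio>"
                + "<tranData>&lt;responseOne&gt;&lt;header&gt;&lt;code&gt;1&lt;/code&gt;"
                + "&lt;/header&gt;&lt;/responseOne&gt;</tranData></response>";
        System.out.println(extractCode(response)); // prints 1
    }
}
```

DOM keeps the whole document in memory. If the real responses are very large, a streaming parser such as StAX may be the faster choice for the outer document, with the same second parse applied to the <tranData> text.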
I am trying to write an ASP.NET Web API that sends XML from the database, and to receive and read it on Android.
The XML that I send looks something like this:
<ArrayOfMerchant xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/MvcApplication1.Models">
<Merchant>
<Address>ABC</Address>
<City>HHHH</City>
<Country>EEEE</Country>
<Id>1</Id>
<Latitude/>
<Longitude/>
<Name>Some store</Name>
</Merchant>
</ArrayOfMerchant>
Opened in a browser, it looks fine.
On the Android side I am trying to receive and read it with HttpURLConnection.
Everything works, but when I convert the InputStream into a string, the string looks something like this:
String = [{"Id":1,"Name":"Some store","Address":"ABC","City":"EEEE","Country":"Canada","Longitude":"","Latitude":""}]
Question:
1) Why does it arrive in a different markup and with a different ordering of the elements?
2) How can I retrieve it as normal XML so I can parse it?
1) I don't know; it depends on your ASP.NET code and configuration. Try changing the parameters of your HTTP request, such as the Accept or User-Agent header, and see how the app responds. HTTP client tools such as curl let you set these headers by hand.
2) Actually, you don't need XML to parse the data. The string you received is JSON, which Android can parse directly with its built-in org.json classes.
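One way to try the Accept header suggestion from point 1 with HttpURLConnection is sketched below. ASP.NET Web API picks its response format through content negotiation, so a request without an XML Accept header is a plausible reason for the JSON. The endpoint URL and the class name are placeholders, not taken from the question.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class XmlRequestExample {

    // Opens (but does not yet send) a GET request that asks the server for XML.
    public static HttpURLConnection openXmlRequest(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/xml");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; replace with the real Web API URL.
        HttpURLConnection conn = openXmlRequest("http://example.com/api/merchants");
        System.out.println(conn.getRequestProperty("Accept"));
        // Calling conn.getInputStream() would send the request; conn.getContentType()
        // then reports whether the server answered with XML or JSON.
    }
}
```

A browser sends an Accept header that includes XML types, which would explain why the same URL looks different there.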
I am using the Selenium 2 Java API to interact with web pages. My question is: how can I detect the content type of link destinations?
This is the background: before clicking a link, I want to be sure that the response is an HTML file. If it is not, I need to handle it in another way. Say there is a download link for a PDF file. The application should read the contents of that URL directly instead of opening it in the browser.
The goal is an application that automatically knows whether the current location is HTML, PDF, XML or something else, so it can use the appropriate parser to extract useful information from the document.
Update
Added bounty: I will award it to the best solution that lets me get the content type of a given URL.
As Jochen suggests, the way to get the Content-Type without also downloading the content is an HTTP HEAD request, and the Selenium WebDriver API does not seem to offer that functionality. You'll have to use another library to fetch the content type of a URL.
A Java library that can do this is Apache HttpComponents, especially HttpClient.
(The following code is untested)
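import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.DefaultHttpClient;
HttpClient httpclient = new DefaultHttpClient();
HttpHead httphead = new HttpHead("http://foo/bar");
HttpResponse response = httpclient.execute(httphead);  // sends HEAD, so no body is downloaded
Header contenttypeheader = response.getFirstHeader("Content-Type");  // null if the header is missing
System.out.println(contenttypeheader);
The project publishes JavaDoc for HttpClient; the documentation for the HttpClient interface contains a nice example.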
You can figure out the content type while processing the incoming data.
I'm not sure why you need to figure this out first.
If you do, use the HEAD method and look at the Content-Type header.
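For reference, a HEAD request can also be sent with the JDK's own HttpURLConnection, without an extra library. This is only a sketch: the class name, the helper that trims the header value and the example URL are all made up here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

public class ContentTypeProbe {

    // Sends a HEAD request and returns the raw Content-Type header, or null if absent.
    public static String fetchContentType(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    // Strips parameters such as "; charset=utf-8" and lower-cases the media type.
    public static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0 ? contentTypeHeader.substring(0, semicolon) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical link target taken from the page under test.
        String type = mediaType(fetchContentType("http://example.com/"));
        System.out.println("text/html".equals(type) ? "open in browser" : "download and parse: " + type);
    }
}
```

Some servers answer HEAD differently from GET or do not support it at all, so a fallback to a normal GET may be needed in practice.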
You can retrieve all the URLs from the DOM and then parse the last few characters of each URL (using a Java regex) to determine the link type.
You can parse the characters following the last dot. For example, in the URL http://yoursite.com/whatever/test.pdf, extract pdf and apply your test logic accordingly.
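A short sketch of the extension check. It uses java.net.URI and lastIndexOf rather than a regex, so that query strings and dots in directory names do not get in the way. The class and method names are made up for the example.

```java
import java.net.URI;
import java.util.Locale;

public class LinkExtension {

    // Returns the lower-cased text after the last dot of the URL path, or "" if there is none.
    public static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        // Only look at the last path segment, so "/a.b/page" has no extension.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("http://yoursite.com/whatever/test.pdf"));
        System.out.println(extensionOf("http://yoursite.com/download?id=42"));
    }
}
```

The extension is only a hint: a link such as the second one gives no extension at all, and a server is free to send any content type for any URL.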
Am I oversimplifying your problem?
I have a crawler that downloads pages and tries to parse the HTML. One of the issues I've been facing is how to properly determine what mimetype an HTML file is.
Right now I'm using
is = new ByteArrayInputStream( htmlResult.getBytes( "UTF-8" ) );
mimeType = URLConnection.guessContentTypeFromStream(is);
but it misses sites like this: http://www.artdaily.org/index.asp?int_sec%3D11%26int_new%3D39415 because of the extra space between the doc tag and HTML tag in the source.
Does anyone know a good way to determine whether a string is HTML or not? Searching for <html> or some other tag wouldn't necessarily work, because I may come across binary files with text embedded in them.
thanks
Do you have control over the HTTP connection that your crawler uses? If so, how about checking the HTTP response header "Content-Type"? That's one way to determine the content type. I just did a quick test of the artdaily site to see whether the Content-Type header is sent, and it is, with the value text/html.
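A possible shape for that check, assuming the crawler can hand over the header value (for example from URLConnection.getContentType()). The class and method names are hypothetical.

```java
import java.util.Locale;

public class HtmlContentTypeCheck {

    // True when the Content-Type header value names an HTML or XHTML media type.
    public static boolean isHtml(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false;
        }
        // Drop parameters such as "; charset=ISO-8859-1" before comparing.
        String type = contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return type.equals("text/html") || type.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isHtml("text/html; charset=ISO-8859-1"));
        System.out.println(isHtml("application/pdf"));
    }
}
```

Servers sometimes mislabel content, so it may still be worth keeping guessContentTypeFromStream as a fallback when the header is missing.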