Android HttpPut and weird character apparition - java

I am trying to send XML using HttpPut and I always get error 400. I tried several REST clients for Firefox, and both my URI and my XML are fine.
So I checked the packets with Wireshark and something weird happens: there is a '?' character at the beginning of the XML. Of course this '?' is not in my XML file and I can't find where it comes from. When I put the XML in a variable in the code everything works fine, but if I read the XML from the file in the Eclipse project's assets directory the '?' appears...
Here's a sample of my code. I tried everything: with addHeaders, without addHeaders, reading bytes instead of lines... and got error 400 every time. The problem comes from this part of the code, because if I add the first line of the XML file "manually" (as I did in the code shown here), the '?' comes after the first line instead of at the beginning (where it appears if I read the whole XML from the file).
BufferedReader reader = new BufferedReader(new InputStreamReader(c.getAssets().open("data.xml")));
String line;
String f="<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";
while((line=reader.readLine()) != null){
f+=line;
}
reader.close();
StringEntity se = new StringEntity(new String(f));
se.setContentEncoding(new BasicHeader(HTTP.CONTENT_TYPE,"text/xml;charset=UTF-8"));
reqPut.setEntity(se);
httpResp = (BasicHttpResponse) httpCli.execute(reqPut);
So if anyone has a clue about this....
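One possible explanation, which the question itself does not confirm, is that data.xml starts with a UTF-8 byte order mark (bytes EF BB BF). A reader decoding UTF-8 turns those bytes into the character U+FEFF, and if the entity is later encoded with a charset that cannot represent U+FEFF (ISO-8859-1, for example) the encoder substitutes '?'. The sketch below assumes that is what is happening; BomStripper and stripBom are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question's code.
final class BomStripper {
    private static final char BOM = '\uFEFF';

    // Returns the text without a leading U+FEFF, or the text unchanged if there is none.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Simulate a UTF-8 file that starts with the BOM bytes EF BB BF.
        byte[] fileBytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);

        // ISO-8859-1 cannot encode U+FEFF, so the encoder substitutes '?'.
        byte[] latin1 = decoded.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("first byte sent without stripping: " + (char) latin1[0]);
        System.out.println("after stripping: " + stripBom(decoded));
    }
}
```

If this is the cause, calling stripBom on the first line read from the file, and passing an explicit charset when building the entity (for example new StringEntity(f, "UTF-8")), should make the stray character disappear.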

Related

Is it necessarily so that you can POST a byte stream to any API that will accept a file, or does it depend on the API?

I have come to understand that not knowing this indicates a gap in my knowledge of how REST-like APIs work, and if someone can provide me a reference where I can learn the background behind this question, I would appreciate it. In the meantime, though, I would also appreciate help answering this question!
I have a Java application that posts files from the local filesystem to an API. Instead of having millions of files sitting on the volume, each with its own file handle, I want to leave the files in a .tar.gz archive, pull them out of the archive in memory, and POST them without writing them to disk. I know that I can write them to disk, POST them, and then delete them, but I view that option as a last resort.
So here's code that works to POST a file that exists in the file system, not in an archive
public CloseableHttpResponse submit (File file) throws IOException {
CloseableHttpClient client = HttpClients.createDefault();
HttpPost post = new HttpPost(API_LOCATION + API_BASE);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addBinaryBody("files", file, ContentType.APPLICATION_OCTET_STREAM, null);
HttpEntity multipartEntity = builder.build();
post.setEntity(multipartEntity);
CloseableHttpResponse response = client.execute(post);
System.out.println("response: " + IOUtils.toString(response.getEntity().getContent(),"UTF-8"));
client.close();
return response;
}
I get back a JSON response from my particular API that looks like this
response: {"data":[<bunch of json>]}
I've put the same file into a .tar.gz archive and have used apache commons compress to unzip the file and pull out each file as a TarArchiveEntry, and I've tested that it works properly by writing the text file to disk and opening it manually outside of java - I am definitely getting the entry into memory correctly. I tried changing the entity attached to the POST to a ByteArrayEntity and converting the archive entry to a byte stream, but the API insists it will only accept a multipart entity. So looking at the API for MultipartEntityBuilder.addBinaryBody it appears I'm left with two options: I can either post a byte array or an InputStream. I've tried both and I can't get either to work - I'll post my example code for the byte array approach, but I can't figure out how to convert the tar archive to an InputStream - at least not without converting it to a byte array first, which seems sorta silly at that point.
public CloseableHttpResponse submit (byte[] xmlBytes) throws IOException {
CloseableHttpClient client = HttpClients.createDefault();
HttpPost post = new HttpPost(API_LOCATION + API_BASE);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addBinaryBody("files", xmlBytes, ContentType.APPLICATION_OCTET_STREAM, null);
HttpEntity multipartEntity = builder.build();
post.setEntity(multipartEntity);
CloseableHttpResponse response = client.execute(post);
System.out.println("response: " + IOUtils.toString(response.getEntity().getContent(),"UTF-8"));
System.out.println(response.getStatusLine().getStatusCode());
client.close();
return response;
}
I believe the code is identical with the exception of the data type of the input parameter. Here is my empty response, which comes with a status code 207:
response: {"data":[]}
So here is my real question: Can any API that accepts files also accept a file in the form of a byte stream or byte array? Can the API tell the difference, and what is really happening when I POST a file? Does the API have to be specifically configured to accept this file in the form of a byte stream or a byte array? A link to a reference along with a short explanation would be highly appreciated - I really need to learn this stuff and understand it well.
Is there some easy-to-correct mistake that I'm making? Am I using the wrong Content-Type or something? I'm not even sure what the last argument to MultipartEntityBuilder.addBinaryBody means (the one I've left null).
Any help is appreciated, thank you very much!
It appears that an API that accepts a file doesn't care if it comes from a file object or a byte array. Per JB Nizet:
You're passing null as the file name. When passing a File as argument, the actual name of the File is used if you passed null as file name. That obviously doesn't happen if you pass a byte array. So specify a non-null file name as last argument. That can only be found out by reading the javadoc and the source code of MultipartEntityBuilder. It's open source: use that as an advantage.
In this specific case, passing any non-null string as the last argument of the addBinaryBody method fixes the problem, and the API accepts the byte array as a file.
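To see why the file name matters, it helps to look at what a multipart/form-data part looks like on the wire. The sketch below builds the headers of one part by hand using only the standard library; it is an illustration of the format, not what MultipartEntityBuilder does internally. The boundary string, the field name "files" and the file name "entry.xml" are made up for the example.

```java
// Illustrative only: builds the header text of one multipart/form-data part by hand.
final class MultipartPartSketch {
    // fileName may be null, in which case the part carries no filename parameter.
    static String partHeaders(String boundary, String fieldName, String fileName) {
        StringBuilder sb = new StringBuilder();
        sb.append("--").append(boundary).append("\r\n");
        sb.append("Content-Disposition: form-data; name=\"").append(fieldName).append('"');
        if (fileName != null) {
            sb.append("; filename=\"").append(fileName).append('"');
        }
        sb.append("\r\n");
        sb.append("Content-Type: application/octet-stream\r\n\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical boundary and names, chosen only for this demonstration.
        System.out.println(partHeaders("----demo", "files", "entry.xml"));
        System.out.println(partHeaders("----demo", "files", null));
    }
}
```

The bytes of the file follow these headers either way, so the server cannot tell whether they came from a File or a byte array. What it can see is whether the filename parameter is present, and many server frameworks only treat a part as an uploaded file when it is.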

XML info shown in Developer tools but none when printed to console in Java

I am accessing a DICOM metadata file by making a WADO query call. When I go into the Developer tools of my browser I can see all the metadata information, including ones related to RDSR.
However, when I simply print all the contents to the console from my source code, only the RDSR metadata is missing. I used the same URL query as the one I used in the browser, along with this code:
URL url = new URL(urlPath);
String userPassword = username + ":" + password;
String encoding = Base64.getEncoder().encodeToString(userPassword.getBytes());
URLConnection uc = url.openConnection();
uc.setRequestProperty("Authorization", "Basic " + encoding);
uc.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String strInputLine;
while ((strInputLine = br.readLine()) != null){
System.out.println(strInputLine);
}
Is there something in my code that is preventing it from printing everything?
Thanks.
In conclusion, one reason working with XML in code can give missing or odd results is that the XML may not be flat, with one item of data per line. A line may merely be the topmost element, acting as the entry point to a tree of data.
Such XML may not look any different in the browser, but some of the data is nested several layers deeper than the rest, and reaching it in code requires additional recursion.
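A minimal sketch of that kind of recursion, using the DOM parser that ships with the JDK. The element names in main are invented for the example and are not taken from a real DICOM response; XmlTreeWalker is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: prints every element of an XML tree, however deeply nested.
final class XmlTreeWalker {
    // Appends one line per element, indented by depth, then descends into its children.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the XML text and returns an indented outline of its elements.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up structure: the nested item is only reachable by descending.
        System.out.print(outline("<study><series><item>dose</item></series></study>"));
    }
}
```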

Problems parsing Spanish characters (á, é, í, ó, ú) from XML response

I'm developing a Java app that calls a PHP script over the internet, which returns an XML response.
The response contains the word "Próximo", but when I parse the nodes of the XML and store the value in a String variable, I receive the word like this: "Próximo".
How can I solve this?
Try StringEscapeUtils.unescapeHtml() from Apache Commons Lang (unescapeHtml4() in Commons Lang 3).
You are probably using a different encoding in your Java app than the encoding of the PHP script. Try setting the encoding of your stream, for example like this:
URL oracle = new URL("http://www.yourpage.com/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
yc.getInputStream(),"utf-8"));//<-- here you set encoding
//to the same as in your PHP
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
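To illustrate what an encoding mismatch does, here is a small self-contained sketch. It assumes the server sends UTF-8 and the client decodes as ISO-8859-1; the question does not say which charsets are actually involved, so treat this as one possible scenario. CharsetMismatchDemo is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with two different charsets.
final class CharsetMismatchDemo {
    static String decodeAs(byte[] wireBytes, Charset charset) {
        return new String(wireBytes, charset);
    }

    public static void main(String[] args) {
        // "Pr\u00f3ximo" is "Próximo"; in UTF-8 the accented letter takes two bytes (C3 B3).
        byte[] wireBytes = "Pr\u00f3ximo".getBytes(StandardCharsets.UTF_8);

        String wrong = decodeAs(wireBytes, StandardCharsets.ISO_8859_1);
        String right = decodeAs(wireBytes, StandardCharsets.UTF_8);

        // Each of the two bytes becomes its own character when decoded as ISO-8859-1.
        System.out.println("wrong length: " + wrong.length());
        System.out.println("right length: " + right.length());
    }
}
```

Decoded with the wrong charset, the two bytes of the accented letter become two separate characters (U+00C3 and U+00B3), which is the classic sign of this kind of mismatch.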
I found a solution to this problem: while parsing, use the "ISO-8859-1" encoding, and call the Html.fromHtml(string) method when storing your values into the bean, where "string" is the value inside each tag of the XML response.

How to parse an XML file containing BOM?

I want to parse an XML file from URL using JDOM. But when trying this:
SAXBuilder builder = new SAXBuilder();
builder.build(aUrl);
I get this exception:
Invalid byte 1 of 1-byte UTF-8 sequence.
I thought this might be a BOM issue, so I checked the source and saw a BOM at the beginning of the file. I tried reading from the URL using aUrl.openStream() and removing the BOM with the Commons IO BOMInputStream, but to my surprise it didn't detect any BOM.
I tried reading from the stream, writing to a local file, and parsing the local file. I set all the encodings for InputStreamReader and OutputStreamWriter to UTF-8, but when I opened the file it had crazy characters.
I thought the problem was with the encoding of the source URL. But when I open the URL in a browser, save the XML to a file, and read that file through the process I described above, everything works fine.
I appreciate any help on the possible cause of this issue.
That HTTP server is sending the content in GZIPped form (Content-Encoding: gzip; see http://en.wikipedia.org/wiki/HTTP_compression if you don't know what that means), so you need to wrap aUrl.openStream() in a GZIPInputStream that will decompress it for you. For example:
builder.build(new GZIPInputStream(aUrl.openStream()));
Edited to add, based on the follow-up comment: If you don't know in advance whether the URL will be GZIPped, you can write something like this:
private InputStream openStream(final URL url) throws IOException
{
final URLConnection cxn = url.openConnection();
final String contentEncoding = cxn.getContentEncoding();
if(contentEncoding == null)
return cxn.getInputStream();
else if(contentEncoding.equalsIgnoreCase("gzip")
|| contentEncoding.equalsIgnoreCase("x-gzip"))
return new GZIPInputStream(cxn.getInputStream());
else
throw new IOException("Unexpected content-encoding: " + contentEncoding);
}
(warning: not tested) and then use:
builder.build(openStream(aUrl));
This is basically equivalent to the above (aUrl.openStream() is explicitly documented to be shorthand for aUrl.openConnection().getInputStream()), except that it examines the Content-Encoding header before deciding whether to wrap the stream in a GZIPInputStream.
See the documentation for java.net.URLConnection.
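A different technique from the header check above, offered only as a sketch: peek at the first two bytes of the stream and unwrap it only if they are the gzip magic number (1F 8B). This does not rely on the Content-Encoding header at all. GzipSniffer and its methods are hypothetical names; the code uses only the standard library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decides by content, not by the Content-Encoding header.
final class GzipSniffer {
    // Wraps the stream in a GZIPInputStream only if it starts with the gzip magic bytes 1F 8B.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = 0;
        while (n < 2) {                      // a single read() may return fewer than 2 bytes
            int r = in.read(head, n, 2 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n);           // put the peeked bytes back
        }
        boolean gzip = n == 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B;
        return gzip ? new GZIPInputStream(in) : in;
    }

    // Convenience wrapper for in-memory data; returns the decoded body as UTF-8 text.
    static String readAll(byte[] body) {
        try (InputStream in = maybeGunzip(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int r;
            while ((r = in.read(buf)) != -1) {
                out.write(buf, 0, r);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: gzip-compresses a string.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(gzip("<a/>")));                              // compressed input
        System.out.println(readAll("<a/>".getBytes(StandardCharsets.UTF_8)));  // plain input
    }
}
```

With JDOM this would be used as builder.build(GzipSniffer.maybeGunzip(aUrl.openStream())).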
You might find you can avoid handling encoded responses by sending a blank Accept-Encoding header. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html: "If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding.". That seems to be occurring here.

"En dash" being garbled during http response handling or text manipulation

I'm writing code to work with text from Wikipedia and am having issues with en dashes being garbled. I haven't worked with en dashes or other non-standard characters before (non-standard to me being characters that don't appear on my keyboard ;), so I'm not sure where I'm going wrong. Here's what is happening, along with code snippets.....
I send a request to Wikipedia (I'm using the Apache HttpComponents client API for communicating with Wikipedia) for the contents of an article and save it in a String:
DefaultHttpClient client = new DefaultHttpClient();
HttpGet queryRequest = new HttpGet(query); // query is the URL for retrieving the article contents.
ResponseHandler<String> responseHandler = new BasicResponseHandler();
String responseBody = client.execute(queryRequest, responseHandler);
At this point if I were to send "responseBody" to System.out, en dashes are displayed in my Eclipse console as '?'. This might just be an Eclipse console display issue so I'll move on.
I manipulate the text, ignoring the en dashes, and then send the text back to Wikipedia.
List<NameValuePair> postParams = new ArrayList<NameValuePair>();
postParams.add(new BasicNameValuePair("text", content)); // content is a String with the article text
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(postParams, "UTF-8");
HttpPost queryRequest = new HttpPost(url); // url is the basic URL for the Wikipedia api
queryRequest.setEntity(entity);
queryRequest.addHeader("Content-Type", "application/x-www-form-urlencoded");
ResponseHandler<String> responseHandler = new BasicResponseHandler();
String responseBody = client.execute(queryRequest, responseHandler);
When the text, now uploaded to Wikipedia, is displayed in a web browser, what were en dashes before are now displayed as '?' in a box (unknown character?). So somewhere I am inadvertently changing or mis-encoding the en dashes, but I'm not sure exactly where.
Can someone point me in the right direction?
Now for the real answer. The problem with the non-English characters getting mangled had nothing to do with Apache HttpComponents or with any Java string handling/manipulation. The problem was with the Eclipse IDE running on Windows.
By default, an Eclipse run configuration uses the system's default encoding, which is Cp1252 on Windows. Since Cp1252 cannot represent every Unicode character, problems arise. I found the solution here. In Eclipse, go into the Run Configurations. For the project you are attempting to run, go to the 'Common' tab. There is a section for encoding. Change it from "Default" to "Other" and set the encoding to UTF-8.
All is now well.
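The effect of the output encoding can be reproduced without Eclipse. The sketch below writes an en dash through a PrintStream with two different charsets and captures the resulting bytes. US-ASCII is used as a stand-in for a charset that lacks the character, which is an assumption made for the demo; ConsoleEncodingDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo: what a PrintStream writes depends on its charset.
final class ConsoleEncodingDemo {
    // Returns the bytes a PrintStream with the given charset writes for the text.
    static byte[] printedBytes(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String enDash = "\u2013";
        // A charset that cannot encode the character substitutes a single '?' byte.
        System.out.println("US-ASCII bytes: " + printedBytes(enDash, "US-ASCII").length);
        // UTF-8 encodes the en dash as three bytes: E2 80 93.
        System.out.println("UTF-8 bytes: " + printedBytes(enDash, "UTF-8").length);

        // The same idea applied to the real console: force UTF-8 regardless of the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(enDash);
    }
}
```

Wrapping System.out like this only helps if whatever displays the output (the Eclipse console, a terminal) also interprets the bytes as UTF-8, which is what the run configuration setting above takes care of.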
I have yet to figure out why the en dash is getting mangled. I do have a (possibly kludgy) fix in the meantime.
String unknownUTF = String.copyValueOf(Character.toChars(65533));
content = content.replace(unknownUTF, "\u2013");
I'm basically replacing all instances of the Unicode replacement character (U+FFFD, code point 65533) with the en dash character. This works assuming that the original content doesn't contain any other characters that are getting converted into the replacement character.
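For background on where U+FFFD comes from: Java's decoders insert it whenever they meet bytes that are not valid in the charset being used. The sketch below shows one way that can happen, assuming (purely for illustration) that a single byte 0x96, which is the en dash in Cp1252, gets decoded as UTF-8. ReplacementCharDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: invalid input bytes decode to the replacement character U+FFFD.
final class ReplacementCharDemo {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x96 on its own is not a valid UTF-8 sequence, so the decoder substitutes U+FFFD.
        String broken = decodeUtf8(new byte[] {(byte) 0x96});
        System.out.println("code point: " + (int) broken.charAt(0));

        // The proper UTF-8 encoding of the en dash (E2 80 93) survives the round trip.
        String intact = decodeUtf8(new byte[] {(byte) 0xE2, (byte) 0x80, (byte) 0x93});
        System.out.println("round trip ok: " + intact.equals("\u2013"));
    }
}
```

Once a character has been turned into U+FFFD the original byte is gone, which is why the replace-based workaround can only guess that every replacement character used to be an en dash.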
