Java Play Framework, WSRequestHolder, how to specify encoding of query parameters?

I have a Play 2.3 Java application. I am sending a GET request to a server and include special characters in the query parameters, such as Š, which is sent as %C5%A0, but the server only understands Windows-1250 characters. In this case it expects %8A (see the encoding table at https://www.w3schools.com/tags/ref_urlencode.asp).
example:
wsRequestHolder.setQueryParameter("city", "Plavecký Štvrtok");
How can I set the encoding of query parameters sent via WSRequestHolder to something other than UTF-8?

There is no way to implicitly define the encoding of HTTP query parameters for a WSRequestHolder in Play.
RFC 3986 (Uniform Resource Identifier) only defines that characters not available in the ASCII charset must be encoded in a certain way.
So it's up to you to convert the String into the proper encoding supported by the server. Play will then escape it so that the result is a valid URI consisting only of ASCII characters.
wsRequestHolder.setQueryParameter("city", new String("Plavecký Štvrtok".getBytes(), "Cp1250"));
See supported encodings in Java 8 and what their canonical names are.
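If the server really requires the Windows-1250 percent-encoding (%8A for Š), another option is to percent-encode the value yourself with java.net.URLEncoder and the Cp1250 charset and append it to the URL directly. This is only a sketch: the example.org URL is a placeholder, and it assumes that a pre-encoded query string appended to the URL is passed through unchanged, which you should verify against your Play/WS setup (going through setQueryParameter would escape the % signs again).
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class Cp1250QueryExample {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Percent-encode the value with the Windows-1250 charset: Š becomes %8A,
        // ý becomes %FD, and the space becomes '+'.
        String encodedCity = URLEncoder.encode("Plavecký Štvrtok", "Cp1250");

        // Append the pre-encoded value to the URL by hand instead of calling
        // setQueryParameter, so the percent signs are not escaped a second time
        // (assumption to verify against your WS client).
        String url = "http://example.org/lookup?city=" + encodedCity;
        System.out.println(url);
    }
}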

Related

Custom URL scheme as adapter on existing URL schemes

Is there a clean and spec-conformant way to define a custom URL scheme that acts as an adapter on the resource returned by another URL?
I have already defined a custom URL protocol which returns a decrypted representation of a local file. So, for instance, in my code,
decrypted-file:///path/to/file
transparently decrypts the file you would get from file:///path/to/file. However, this only works for local files. No fun! I am hoping that the URL specification allows a clean way that I could generalize this by defining a new URL scheme as a kind of adapter on existing URLs.
For example, could I instead define a custom URL scheme decrypted: that could be used as an adapter that prefixes another absolute URL that retrieved a resource? Then I could just do
decrypted:file:///path/to/file
or decrypted:http://server/path/to/file or decrypted:ftp://server/path/to/file or whatever. This would make my decrypted: protocol composable with all existing URL schemes that do file retrieval.
Java does something similar with the jar: URL scheme but from my reading of RFC 3986 it seems like this Java technology violates the URL spec. The embedded URL is not properly byte-encoded, so any /, ?, or # delimiters in the embedded URL should officially be treated as segment delimiters in the embedding URL (even if that's not what JarURLConnection does). I want to stay within the specs.
Is there a nice and correct way to do this? Or is the only option to byte-encode the entire embedded URL (i.e., decrypted:file%3A%2F%2F%2Fpath%2Fto%2Ffile, which is not so nice)?
Is what I'm suggesting (URL adapters) done anywhere else? Or is there a deeper reason why this is misguided?
There's no built-in adaptor in Cocoa, but writing your own using NSURLProtocol is pretty straightforward for most uses. Given an arbitrary URL, encoding it like so seems simplest:
myscheme:<originalurl>
For example:
myscheme:http://example.com/path
At its simplest, NSURL only actually cares if the string you pass in is a valid URI, which the above is. Yes, there is then extra URL support layered on top, based around RFC 1808 etc. but that's not essential.
All that's required to be a valid URI is a colon to indicate the scheme, and no invalid characters (basically, ASCII without spaces).
You can then use the -resourceSpecifier method to retrieve the original URL and work with that.
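Since the question itself is framed around Java's URL handling, a rough Java analogue of -resourceSpecifier is the scheme-specific part of java.net.URI. A small sketch; the myscheme: wrapping is just the illustrative convention from this answer:
import java.net.URI;

public class SchemeAdapterExample {
    public static void main(String[] args) {
        // Wrap an existing URL in a custom scheme, as suggested above.
        URI wrapped = URI.create("myscheme:http://example.com/path");

        // The scheme-specific part returns the embedded URL, much like
        // -resourceSpecifier does on NSURL.
        System.out.println(wrapped.getSchemeSpecificPart()); // http://example.com/path
    }
}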

UTF-8 encoding of GET parameters in JSF

I have a search form in JSF that is implemented using a RichFaces 4 autocomplete component and the following JSF 2 page and Java bean. I use Tomcat 6 & 7 to run the application.
...
<h:commandButton value="#{msg.search}" styleClass="search-btn" action="#{autoCompletBean.doSearch}" />
...
In the AutoCompleteBean
public String doSearch() {
    //some logic here
    return "/path/to/page/with/multiple_results?query=" + searchQuery + "&faces-redirect=true";
}
This works well as long as everything within the "searchQuery" String is in Latin-1; it does not work if anything is outside of Latin-1.
For instance a search for "bodø" will be automatically encoded as "bod%F8". However a search for "Kra Ðong" will not work since it is unable to encode "Ð".
I have now tried several different approaches to solve this, but none of them works.
I have tried encoding the searchQuery myself using URLEncoder, but this only leads to double encoding since % is encoded as %25.
I have tried using java.net.URI to get the encoding, but it gives the same result as URLEncoder.
I have tried turning on UTF-8 in Tomcat using URIEncoding="UTF-8" in the Connector, but this only worsens the problem, since then non-ASCII characters do not work at all.
So to my questions:
Can I change the way JSF 2 encodes the GET parameters?
If I cannot change the way JSF 2 encodes the GET parameters, can I turn off the encoding and do it manually?
Am I doing something strange here? This seems like something that should be supported out of the box, but I cannot find anyone else with the same problem.
I think you've hit a corner case bug in JSF. The query string is URL-encoded by ExternalContext#encodeRedirectURL(), which uses the response character encoding as obtained by ExternalContext#getResponseCharacterEncoding(). However, while JSF by default uses UTF-8 as the response character encoding, it is only set if the view is actually to be rendered, not when the response is to be redirected. So the response character encoding still returns the platform default of ISO-8859-1, which causes your characters to be URL-encoded using this wrong encoding.
I've reported this as issue 2440. In the meanwhile your best bet is to explicitly set the response character encoding yourself beforehand.
FacesContext.getCurrentInstance().getExternalContext().setResponseCharacterEncoding("UTF-8");
Note that this still requires that the container itself uses the same character encoding to decode the request URL, so you certainly need to set URIEncoding="UTF-8" in Tomcat's configuration. This won't mess up the characters anymore as they will be really UTF-8 now.
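A hedged sketch of where that call could go, reusing the doSearch() action from the question (the bean and field names are taken from the question; treat this as an illustration of "beforehand", not the canonical fix):
public String doSearch() {
    // Force UTF-8 on the response before JSF encodes the redirect URL
    // (requires: import javax.faces.context.FacesContext;).
    FacesContext.getCurrentInstance().getExternalContext()
            .setResponseCharacterEncoding("UTF-8");
    //some logic here
    return "/path/to/page/with/multiple_results?query=" + searchQuery + "&faces-redirect=true";
}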
The only character encoding accepted for HTTP URLs and headers is US-ASCII, so you need to URL-encode these characters to send them back to the application. The simplest way to do this in Java would be:
public String doSearch() throws java.io.UnsupportedEncodingException {
    //some logic here
    String encodedSearchQuery = java.net.URLEncoder.encode(searchQuery, "UTF-8");
    return "/path/to/page/with/multiple_results?query=" + encodedSearchQuery + "&faces-redirect=true";
}
And then it should work for any character that you use.

Does DataMatrix support UTF-8 or ISO-8859-2?

I have a problem with Barcode4J and generating a DataMatrix barcode with ISO-8859-2 characters in the message.
Below is an example use of barcode4j (version 2.1.0) from the command line. As you can see, when I use the message "żaba" I get the error "Message contains characters outside ISO-8859-1 encoding." Does the DataMatrix specification support only ISO-8859-1, or is something missing in Barcode4J?
java -cp build/barcode4j.jar:lib/avalon-framework-4.2.0.jar:lib/commons-cli-1.0.jar org.krysalis.barcode4j.cli.Main -s datamatrix "żaba"
Exception in thread "main" java.lang.IllegalArgumentException: Message contains characters outside ISO-8859-1 encoding.
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder$EncoderContext.<init>(DataMatrixHighLevelEncoder.java:199)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.createEncoderContext(DataMatrixHighLevelEncoder.java:171)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.encodeHighLevel(DataMatrixHighLevelEncoder.java:119)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixLogicImpl.generateBarcodeLogic(DataMatrixLogicImpl.java:50)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixBean.generateBarcode(DataMatrixBean.java:128)
at org.krysalis.barcode4j.impl.ConfigurableBarcodeGenerator.generateBarcode(ConfigurableBarcodeGenerator.java:174)
at org.krysalis.barcode4j.cli.Main.handleCommandLine(Main.java:164)
at org.krysalis.barcode4j.cli.Main.main(Main.java:86)
As is described here, Barcode4J currently only supports the default character set defined by the DataMatrix specification (ISO-8859-1). Support for ECI hasn't been implemented for DataMatrix yet. You can, however, encode binary messages by encoding a byte stream as an RFC 2397 data URL. That byte stream could be a string encoded using UTF-8. The drawback: the reader might not be able to interpret the data correctly.
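A minimal sketch of building such an RFC 2397 data URL from UTF-8 bytes. This only shows how to construct the string in Java; the exact way Barcode4J expects the data URL to be passed is not shown here and should be checked against its documentation:
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DataUrlMessage {
    public static void main(String[] args) {
        // Encode the message as UTF-8 bytes and wrap them in an RFC 2397 data URL.
        byte[] utf8 = "żaba".getBytes(StandardCharsets.UTF_8);
        String dataUrl = "data:;base64," + Base64.getEncoder().encodeToString(utf8);

        // The resulting string contains only ISO-8859-1-safe characters;
        // the reader must know to undo this encoding.
        System.out.println(dataUrl);
    }
}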

How to check encoding in Java?

I am facing a problem with encoding.
For example, I have a message in XML whose declared encoding is "UTF-8".
<message>
<product_name>apple</product_name>
<price>1.3</price>
<product_name>orange</product_name>
<price>1.2</price>
.......
</message>
Now, this message supports multiple languages:
Traditional Chinese (big5),
Simple Chinese (gb),
English (utf-8)
and the encoding changes only in specific fields.
For example (Traditional Chinese),
<message>
<product_name>蘋果</product_name>
<price>1.3</price>
<product_name>橙</product_name>
<price>1.2</price>
.......
</message>
Only "蘋果" and "橙" are using big5, "<product_name>" and "</product_name>" are still using utf-8.
<price>1.3</price> and <price>1.2</price> are using utf-8.
How do I know which word is using different encoding?
It looks like whoever is providing the XML is providing incorrect XML. They should be using a consistent encoding.
http://sourceforge.net/projects/jchardet/files/ is a pretty good heuristic charset detector.
It's a port of the one used in Firefox to detect the encoding of pages that are missing a charset in content-type or a BOM.
You could use that to try and figure out the encoding for substrings in a malformed XML file if you can't get the provider to fix their output.
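If an external library is not an option at all, a cruder, JDK-only heuristic is to try a strict UTF-8 decode and fall back to Big5 when it fails. This is only a sketch that distinguishes the two encodings mentioned in the question; it is not a general detector like jchardet:
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class EncodingGuess {
    // Try strict UTF-8 first; if the bytes are not valid UTF-8, assume Big5.
    static String decode(byte[] bytes) {
        try {
            return Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, Charset.forName("Big5"));
        }
    }
}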
You should use only one encoding in one XML file. The Big5 characters all have counterparts in the UTF-8 encoding.
Because I cannot get the provider to fix the output, I have to handle it myself, and I cannot use an external library in this project.
I can only solve it like this,
String str = new String(big5String.getBytes("UTF-8"));
before displaying the message.

encoding problem in servlet

I have a servlet which receives some parameters from the client and then does some work.
The parameters from the client are Chinese, so I often get invalid characters in the servlet.
For example:
If I enter
http://localhost:8080/Servlet?q=中文&type=test
then in the servlet the parameter 'type' is correct (test); however, the parameter 'q' is not correctly encoded; it becomes invalid characters that cannot be parsed.
However, if I enter the address bar again, the URL is changed to:
http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test
Now my servlet gets the right value for 'q'.
What is the problem?
UPDATE
BTW, it works well when I send the form with POST.
When I send them with Ajax, for example:
url="http://..q='中文',
xmlhttp.open("POST",url,true);
then the server side also gets invalid characters.
It seems that only when the Chinese characters are encoded like %xx can the server side get the right result.
That is to say, http://.../q=中文 does not work, while
http://.../q=%D6%D0%CE%C4 works.
But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?
Ensure that the encoding of the page with the form itself is also UTF-8 and that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:
<%@ page pageEncoding="UTF-8" %>
Then, to process the GET query string as UTF-8, ensure that the servlet container in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.
<Connector URIEncoding="UTF-8">
In case you'd like to use POST, you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.
request.setCharacterEncoding("UTF-8");
Call this before you access the first parameter. A Filter is the best place for this.
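A minimal sketch of such a Filter (the class name and its web.xml mapping are up to you; shown against the javax.servlet API):
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {
    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Instruct the container to parse the request body as UTF-8 before
        // any parameter is read.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
    }
}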
See also:
Unicode - How to get the characters right?
Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
It would probably be safest to switch to POST requests.
I believe that the problem is on the sending side. As I understood from your description, if you type the URL in the browser you get a "correctly" encoded request. This job is done by the browser: it knows how to convert Unicode characters to sequences of codes like %xx.
So, try to check how you send the request. It should be encoded on sending.
Another possibility is to use the POST method instead of GET.
Do read this article on the URL encoding format: www.blooberry.com/indexdot/html/topics/urlencoding.htm
If you want, you could convert the characters to hex or Base64 and put them in the parameters of the URL.
I think it's better to put them in the body (POST) than in the URL (GET).
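A hedged sketch of the Base64 idea above, using the URL-safe alphabet without padding so the value needs no further percent-encoding. The parameter name and URL are just the question's example; both sides have to agree on this convention:
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Param {
    public static void main(String[] args) {
        // Client side of the convention: Base64url-encode the UTF-8 bytes.
        String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString("中文".getBytes(StandardCharsets.UTF_8));
        String url = "http://localhost:8080/Servlet?q=" + encoded + "&type=test";

        // Servlet side of the convention: decode the parameter value back.
        String decoded = new String(Base64.getUrlDecoder().decode(encoded),
                StandardCharsets.UTF_8);
        System.out.println(url + " -> " + decoded);
    }
}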
