Does DataMatrix support UTF-8 or ISO-8859-2? - java

I have a problem with Barcode4J when generating a DataMatrix barcode whose message contains ISO-8859-2 characters.
Below is an example of running barcode4j (version 2.1.0) from the command line. As you can see, when I use the message "żaba" I get the error "Message contains characters outside ISO-8859-1 encoding." Does the DataMatrix specification support ISO-8859-1 only, or is something missing in Barcode4J?
java -cp build/barcode4j.jar:lib/avalon-framework-4.2.0.jar:lib/commons-cli-1.0.jar org.krysalis.barcode4j.cli.Main -s datamatrix "żaba"
Exception in thread "main" java.lang.IllegalArgumentException: Message contains characters outside ISO-8859-1 encoding.
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder$EncoderContext.<init>(DataMatrixHighLevelEncoder.java:199)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.createEncoderContext(DataMatrixHighLevelEncoder.java:171)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixHighLevelEncoder.encodeHighLevel(DataMatrixHighLevelEncoder.java:119)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixLogicImpl.generateBarcodeLogic(DataMatrixLogicImpl.java:50)
at org.krysalis.barcode4j.impl.datamatrix.DataMatrixBean.generateBarcode(DataMatrixBean.java:128)
at org.krysalis.barcode4j.impl.ConfigurableBarcodeGenerator.generateBarcode(ConfigurableBarcodeGenerator.java:174)
at org.krysalis.barcode4j.cli.Main.handleCommandLine(Main.java:164)
at org.krysalis.barcode4j.cli.Main.main(Main.java:86)

As described here, Barcode4J currently supports only the default character set defined by the DataMatrix specification (ISO-8859-1). Support for ECI hasn't been implemented for DataMatrix yet. You can, however, encode binary messages by encoding a byte stream as an RFC 2397 data URL. That byte stream could be a string encoded using UTF-8. The drawback: the reader has no indication of the character set, so it might not interpret the data correctly.
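A minimal sketch of the workaround described above, assuming the message is encoded as UTF-8 and wrapped in a base64 RFC 2397 data URL. The class and method names are illustrative and not part of Barcode4J, and the exact form in which Barcode4J expects the data URL should be checked against its documentation. The sketch also shows how to test up front whether a message fits into ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Barcode4J.
class DataUrlMessage {

    // True if every character of the message exists in ISO-8859-1,
    // the default DataMatrix character set named in the answer above.
    static boolean fitsLatin1(String message) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(message);
    }

    // Encodes the message as UTF-8 and wraps the bytes in a base64 data URL.
    static String toDataUrl(String message) {
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        return "data:;base64," + Base64.getEncoder().encodeToString(utf8);
    }

    public static void main(String[] args) {
        // "żaba", written with an escape so the result does not depend
        // on the encoding of this source file.
        String message = "\u017Caba";
        System.out.println(fitsLatin1(message));
        System.out.println(toDataUrl(message));
    }
}
```

Whoever scans the barcode then has to know that the payload is UTF-8, since nothing in the symbol says so.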

Related

Java encoding for "GB2312": character ® replaced with a question mark (?)

I'm trying to get an encoded value using the GB2312 character set, but I'm getting '?' instead of '®'.
Below is my sample code:
new String("Test ®".getBytes("GB2312"));
but I'm getting Test ? instead of Test ®.
Has anyone faced this issue?
Java version: JDK 6
Platform: Windows 7
I'm not familiar with Chinese character encodings, so I need suggestions.
For better understanding, the statement can be divided into two parts:
byte[] bytes = "Test ®".getBytes("GB2312"); // bytes, encoding the string to GB2312
new String(bytes); // back to string, using default encoding
Probably ® is not a valid GB2312 character, so it is converted to ?. See the result of
Charset.forName("GB2312").newEncoder().canEncode("®")
Based on documentation of getBytes:
The behavior of this method when this string cannot be encoded in the given charset is unspecified. The CharsetEncoder class should be used when more control over the encoding process is required.
which also suggests using CharsetEncoder.
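As a sketch of what using CharsetEncoder could look like: the encoder below is configured to report unmappable characters, so the failure surfaces as an exception instead of a silent '?'. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictEncoding {

    // Encodes text in the given charset, throwing instead of silently
    // substituting '?' when a character cannot be represented.
    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        String text = "Test \u00AE"; // "Test ®"
        try {
            byte[] bytes = encodeStrict(text, "GB2312");
            // Decode with the same charset that was used for encoding,
            // not with the platform default.
            System.out.println(new String(bytes, "GB2312"));
        } catch (CharacterCodingException e) {
            System.out.println("Not encodable in GB2312: " + e);
        }
    }
}
```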

Java Play Framework, WSRequestHolder, how to specify encoding of query parameters?

I am using a Play 2.3 Java application. I am sending a GET request to a server and I include special characters in the query parameters, like Š, which is sent as %C5%A0, but the server understands only Windows-1250 characters. In this case it expects %8A (see encoding https://www.w3schools.com/tags/ref_urlencode.asp).
example:
wsRequestHolder.setQueryParameter("city", "Plavecký Štvrtok");
How can I set the encoding used when sending query parameters via WSRequestHolder to something other than UTF-8?
There is no implicit way to define the encoding of HTTP query parameters for a WSRequestHolder in Play.
RFC 3986 - Uniform Resource Identifier (URI) only defines that characters not available in the ASCII charset must be encoded in a certain way.
So it's up to you to convert the String into the proper encoding that is supported by the server. Play will then escape it to be a valid URI consisting only of ASCII characters.
wsRequestHolder.setQueryParameter("city", new String("Plavecký Štvrtok".getBytes(), "Cp1250"));
See supported encodings in Java 8 and what their canonical names are.
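Another way to get the bytes the server expects is to percent-encode the value yourself with java.net.URLEncoder and the Windows-1250 charset, and to put the result into the URL string. This is only a sketch: it assumes the HTTP client sends an already-encoded query string unchanged, which should be verified for Play's WS client, and the endpoint URL and class name are made up. Note that URLEncoder produces form encoding, so a space becomes +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LegacyQueryEncoding {

    // Percent-encodes a query value using the given charset instead of UTF-8.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Plavecký Štvrtok", written with escapes so the result does not
        // depend on the encoding of this source file.
        String city = "Plaveck\u00FD \u0160tvrtok";
        // Hypothetical endpoint, for illustration only.
        String url = "http://example.com/search?city=" + encode(city, "windows-1250");
        System.out.println(url);
    }
}
```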

How to set encoding for bundle?

I use properties for i18n. If I specify non-English locale:
ResourceBundle messages = ResourceBundle.getBundle("i18n/Messages", new Locale("tr", "TR"));
I see the message with wrong encoding:
ÐÑивеÑ!
How to set the correct encoding?
Problem: By default, the ISO 8859-1 character encoding is used for reading the content of a properties file, so if the file contains any character outside ISO 8859-1, it will not be processed properly.
First solution:
Convert the properties file to ISO 8859-1 content by using native2ascii - Native-to-ASCII Converter
What native2ascii does: It converts every non-ISO 8859-1 character into its \uXXXX equivalent. This is a good tool because you don't need to look up the \uXXXX equivalent of each special character yourself.
Usage for UTF-8: native2ascii -encoding utf8 e:\a.txt e:\b.txt
If you use Eclipse, you will notice that it implicitly converts special characters into their \uXXXX equivalents. Try copying
会意字 / 會意字
into a properties file opened in Eclipse.
Second solution: Create a custom ResourceBundle.Control class which can be used with ResourceBundle to read properties in any given encoding scheme. You may want to use an already created and shared CustomResourceBundleControl.
Use it as below:
ResourceBundle messages = ResourceBundle.getBundle("i18n/Messages", new CustomResourceBundleControl("UTF-8"));
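For readers without access to the shared class, the following is a rough sketch of what such a control could look like. It is an illustrative stand-in under a different, made-up name (CharsetResourceBundleControl); the shared CustomResourceBundleControl may be implemented differently. For brevity it ignores the reload flag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Illustrative stand-in, not the shared CustomResourceBundleControl.
class CharsetResourceBundleControl extends ResourceBundle.Control {

    private final Charset charset;

    CharsetResourceBundleControl(String charsetName) {
        this.charset = Charset.forName(charsetName);
    }

    // Reads a .properties stream using the given charset instead of ISO 8859-1.
    static ResourceBundle read(InputStream stream, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(stream, charset)) {
            return new PropertyResourceBundle(reader);
        }
    }

    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        if (!"java.properties".equals(format)) {
            // Class-based bundles are handled by the default implementation.
            return super.newBundle(baseName, locale, format, loader, reload);
        }
        String resourceName = toResourceName(toBundleName(baseName, locale), "properties");
        InputStream stream = loader.getResourceAsStream(resourceName);
        if (stream == null) {
            return null; // No file for this candidate locale.
        }
        return read(stream, charset);
    }
}
```

It would be passed to getBundle the same way, for example ResourceBundle.getBundle("i18n/Messages", new Locale("tr", "TR"), new CharsetResourceBundleControl("UTF-8")). On Java 9 and later, properties-based bundles are read as UTF-8 by default, so a custom control is mainly relevant for older runtimes or other charsets.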

Converting XML to JSON results in unknown characters when running on Centos instead of Windows

I have a Java servlet which gets RSS feeds and converts them to JSON. It works great on Windows, but it fails on Centos.
The RSS feed contains Arabic and it shows unintelligible characters on Centos. I am using these lines to encode the RSS feed:
byte[] utf8Bytes = Xml.getBytes("Cp1256");
// byte[] defaultBytes = Xml.getBytes();
String roundTrip = new String(utf8Bytes, "UTF-8");
I tried it on Glassfish and Tomcat. Both have the same problem; it works on Windows, but fails on Centos. How is this caused and how can I solve it?
byte[] utf8Bytes = Xml.getBytes("Cp1256");
String roundTrip = new String(utf8Bytes, "UTF-8");
This is an attempt to correct a badly-decoded string. At some point prior to this operation you have read in Xml using the default encoding, which on your Windows box is code page 1256 (Windows Arabic). Here you are encoding that string back to code page 1256 to retrieve its original bytes, then decoding it properly as the encoding you actually wanted, UTF-8.
On your Linux server, it fails, because the default encoding is something other than Cp1256; it would also fail on any Windows server not installed in an Arabic locale.
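The effect can be reproduced in isolation. The sketch below (class and method names are illustrative) applies the same round trip to a string that was misread as windows-1256, where it repairs the text, and to a string that was read correctly, as would happen with a UTF-8 default, where it damages the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeRoundTrip {

    // Re-encodes a string with the charset that was assumed to have misread
    // it, then decodes the resulting bytes as UTF-8.
    static String repair(String text, Charset assumedWrongCharset) {
        return new String(text.getBytes(assumedWrongCharset), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1256 = Charset.forName("windows-1256");

        String original = "\u0645\u0631\u062D\u0628\u0627"; // Arabic sample text
        byte[] feedBytes = original.getBytes(StandardCharsets.UTF_8);

        // Arabic Windows box: the UTF-8 feed was read with the Cp1256 default.
        String misdecoded = new String(feedBytes, cp1256);
        System.out.println(original.equals(repair(misdecoded, cp1256)));

        // Server with a UTF-8 default: the feed was already read correctly,
        // so the same "repair" corrupts it.
        String correctlyRead = new String(feedBytes, StandardCharsets.UTF_8);
        System.out.println(original.equals(repair(correctlyRead, cp1256)));
    }
}
```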
The commented-out line that uses the default encoding instead of explicitly Cp1256 is more likely to work on a Linux server. However, the real fix is to find where Xml is being read, and fix that operation to use the correct encoding(*) instead of the default. Allowing the default encoding to be used is almost always a mistake, as it makes applications dependent on configuration that varies between servers.
(*: for this feed, that's UTF-8, which is the most common encoding, but it may differ for others. Finding out the right encoding for a feed depends on the Content-Type header returned for the resource and the <?xml encoding declaration. By far the best way to cope with this is to fetch and parse the resource using a proper XML library that knows about this, for example with DocumentBuilder.parse(uri).)
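A minimal sketch of letting the XML parser work out the encoding: the feed is handed over as raw bytes, so the parser reads the charset from the XML declaration and no platform default is involved. The sample feed, the class name and the helper method are made up for illustration; a real servlet would pass the response stream or the feed URI to parse instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class FeedParsing {

    // Parses raw XML bytes and returns the text of the first <title> element.
    // The parser picks the charset from the XML declaration (or a BOM).
    static String firstTitle(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new ByteArrayInputStream(xmlBytes));
        return document.getElementsByTagName("title").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; the Arabic text is written with escapes so the
        // example does not depend on the encoding of this source file.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<rss><channel><title>\u0645\u0631\u062D\u0628\u0627</title></channel></rss>";
        System.out.println(firstTitle(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```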
There are many places where the wrong encoding can be used. Here is the complete list: http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

How to check encoding in java?

I am facing an encoding problem.
For example, I have an XML message whose encoding is "UTF-8".
<message>
<product_name>apple</product_name>
<price>1.3</price>
<product_name>orange</product_name>
<price>1.2</price>
.......
</message>
Now, this message supports multiple languages:
Traditional Chinese (big5),
Simplified Chinese (gb),
English (utf-8)
and the encoding changes only in specific fields.
For example (Traditional Chinese):
<message>
<product_name>蘋果</product_name>
<price>1.3</price>
<product_name>橙</product_name>
<price>1.2</price>
.......
</message>
Only "蘋果" and "橙" use big5; "<product_name>" and "</product_name>" still use utf-8.
<price>1.3</price> and <price>1.2</price> use utf-8.
How do I know which words use a different encoding?
It looks like whoever is providing the XML is providing incorrect XML. They should be using a consistent encoding.
http://sourceforge.net/projects/jchardet/files/ is a pretty good heuristic charset detector.
It's a port of the one used in Firefox to detect the encoding of pages that are missing a charset in content-type or a BOM.
You could use that to try and figure out the encoding for substrings in a malformed XML file if you can't get the provider to fix their output.
You should use only one encoding in one XML file. The Big5 characters all have counterparts in UTF-8.
Because I cannot get the provider to fix the output, I have to handle it myself, and I cannot use an external library in this project.
The only way I can solve it is like this,
String str = new String(big5String.getBytes("UTF-8"));
before displaying the message.
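If no external library is allowed, one JDK-only heuristic is to try a strict UTF-8 decode of each field's raw bytes and fall back to Big5 when that fails. This is a sketch, not a reliable detector: it assumes you can get at the undecoded bytes of each field, and some Big5 byte pairs are also valid UTF-8, so short values can be misclassified. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class FallbackDecoder {

    // Tries strict UTF-8 first; if the bytes are not valid UTF-8, decodes
    // them with the fallback charset instead.
    static String decode(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        // "果" as Big5 bytes, standing in for a field value from the message.
        byte[] fieldBytes = "\u679C".getBytes(big5);
        System.out.println(decode(fieldBytes, big5));
        // Plain UTF-8 content passes through the strict decoder unchanged.
        System.out.println(decode("apple".getBytes(StandardCharsets.UTF_8), big5));
    }
}
```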
