java improper cyrillic string

I use the JExcel API to read an Excel file and store the values in an ArrayList, then insert them into a database. The list contains Cyrillic strings, and the problem is that these strings are not inserted into the database properly. As far as I have seen, other people can print Cyrillic strings to standard output correctly, but I am not sure how to store them in a collection. Any suggestions? Thanks

Check that the strings use the correct Unicode characters for the Cyrillic alphabet both before they go into the database and after they come out. If they go in wrong, you can't expect anything after that to work! If they go in right but come out wrong, your interface to the database is wrong and that's what you should fix. If they come out correct individually, then the problem is in how you're passing the string to the presentation layer.
BTW, checking for Cyrillic is just a matter of scanning through the string looking for characters in the range \u0410–\u044f. That should be trivial to code up; make sure it prints something you can examine easily.
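A minimal sketch of that scan, in case it helps. The class and method names (CyrillicCheck, containsCyrillic, dump) are made up for illustration and are not part of JExcel or any database API. Note that the range U+0410 to U+044F covers the basic Russian letters but not Ё/ё (U+0401 and U+0451), so widen it if your data needs those.

```java
class CyrillicCheck {
    // True if c falls in the basic Cyrillic range U+0410..U+044F mentioned above.
    static boolean isBasicCyrillic(char c) {
        return c >= 0x0410 && c <= 0x044F;
    }

    // True if at least one character of s is in that range.
    static boolean containsCyrillic(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isBasicCyrillic(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Renders every character as a U+XXXX code so it can be inspected
    // even when the console itself cannot display Cyrillic.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample value; in practice use a string from the ArrayList,
        // once before the insert and once after reading it back.
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442";
        System.out.println(containsCyrillic(sample) + " " + dump(sample));
    }
}
```

Calling dump() on a value right before the insert and again on the value read back should show at which step the characters stop being Cyrillic (a run of U+003F means they were already replaced by '?').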

Related

Can anyone tell me the type of character encoding used within these Strings? [Decompiled]

I'm currently working on a project for a client that involves remodelling a rather large set of decompiled and obfuscated code from a jar.
A set of strings keeps popping up that is consistently decoded at runtime; however, the methods that decode them have been scrambled to the point of illegibility. When I asked here previously, no one had time to crawl through the method to work out an alternative, so figuring out the character encoding seems the best way to solve the issue.
*Note that the obfuscator used does not have the ability to encrypt hard-coded strings.
I've tried various conversion methods from different libraries and different character sets, but none of them is playing ball. I asked a much more complex question here earlier, but the more effective approach is to start from knowing how to decode the strings. Below are some examples.
String encodedPriceExample = "\0163J\032'\032J\037\"m\007:P$\031";
//from interpretation shows a price of a transaction
String encodedErrorMessageExample ="V5T\016\005\"J:\037$\036w\0062\013!\017w\0238\037%J4\037%\0302\004#J1\0134\036>\0059J5\0171\005%\017w\0238\037wO$D";
//longer one, should show a no join message of some form.
These are only two of the strings; all of them look similar and are decrypted via a scrambled static method, as mentioned above.
Does this look like any known character encoding? It needs to be converted into UTF-8 or Base64.
Because it has gone through a decompiler, the string itself may just be jumbled and converted into raw Unicode of some form; however, I've never seen that happen before, even with obfuscation. Other hard-coded strings in the project are fine; only the strings in those static methods are affected.
Any input or help in sorting this out would be greatly appreciated! This is mostly a check that my angle for fixing it is correct.
Thanks guys

showDocument() with non-standard (Chinese) characters

So, I finally discovered that JavaFX lets you use HostServices.showDocument(uri) to open a browser at a given URL. I have run into a problem, though: I cannot open URLs that contain Chinese characters. It interprets them as '?', which takes you to the wrong URL. AWT's Desktop.browse(uri) handles these characters without a problem, so I know they can technically be communicated to the browser. I'm not sure whether there is anything I can do on my end, though.
My question is: Is there any way to make JavaFX's HostServices.showDocument() correctly read in Chinese characters?
EDIT:
Sample string
http://www.mdbg.net/chindict/chindict.php?page=worddict&wdrst=0&wdqb=%E6%96%87
You can follow the link to see the address's Chinese character (at the very end of the URL). In doing this, I noticed that the browser converts the character into a series of % signs, letters, and numbers. Plugging those into showDocument() in place of the character works fine. So I guess the question is now "How do I convert a character to this format?"

I was able to figure out that converting the string into a URI and then calling its .toASCIIString() method gave me what I needed (converting Chinese characters, and I would assume others, into something readable by showDocument()). Thanks for the help, jewelsea.
If there is a better way to do this, feel free to give me another answer.
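For anyone landing here, a small sketch of that conversion using only java.net.URI. The helper name toAsciiUrl is made up for illustration, and the example URL is the one from the question with the raw character (U+6587) written as a Java escape. The call to showDocument() is left out so the snippet runs without JavaFX.

```java
import java.net.URI;

class AsciiUrl {
    // Parses the URL and returns it with every non-ASCII character replaced
    // by the percent-encoded form of its UTF-8 bytes.
    // URI.create throws IllegalArgumentException if the string is not a valid URI.
    static String toAsciiUrl(String url) {
        return URI.create(url).toASCIIString();
    }

    public static void main(String[] args) {
        String url = "http://www.mdbg.net/chindict/chindict.php"
                + "?page=worddict&wdrst=0&wdqb=\u6587";
        // The result is what would be handed to HostServices.showDocument().
        System.out.println(toAsciiUrl(url));
    }
}
```

Characters that are already ASCII, including existing %XX escapes, pass through unchanged, so it should be safe to run this on URLs that are already encoded.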

How do I handle non-English characters properly?

So I'm working with the last.fm API. Sometimes a query returns tracks whose names contain characters like these:
Æther, é, Hṛṣṭa
or non-English characters like these:
水鏡.
When debugging in Eclipse, I see them just fine (as-is), but printing them to the console shows ???, which is OK for me.
Now, how do I handle these? At first I thought I could remove every song that has any character other than the ones in the English language. I used the regex ^\\w+$ but it didn't work. I also tried \\w+. That didn't work either.
Then I thought further about how to handle these properly. Can anyone help me out? I am perfectly fine with leaving these tracks out of the equation, i.e. I'm fine with having only tracks with English characters.
Another question: what is the best way to display these characters on the console and/or in a Swing GUI?

First, you must ensure that you use the correct encoding when reading your input.
Second, ensure that the font used in Eclipse on the platform you are developing on is able to display all these characters. Swing should display Unicode characters if you read them correctly.
You will likely want to use UTF-8 everywhere.
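To illustrate the first point, here is a rough sketch of decoding bytes with an explicit charset instead of relying on the platform default. It assumes the response really is UTF-8, which I have not checked for last.fm. The class and method names are invented for the example, and the byte array stands in for whatever your HTTP client returns.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class Utf8Everywhere {
    // Decodes raw bytes as UTF-8, independent of the platform default charset.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes as ISO-8859-1, to show what a wrong charset does.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Stand-in for bytes received from the network: "Æther" encoded as UTF-8.
        byte[] raw = "\u00C6ther".getBytes(StandardCharsets.UTF_8);

        // A PrintStream that encodes its output as UTF-8; the console itself
        // must also be configured for UTF-8 for the text to render correctly.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(decodeUtf8(raw));   // decoded with the matching charset
        out.println(decodeLatin1(raw)); // mojibake: 'Æ' becomes two characters
    }
}
```

The same idea applies to readers: pass the charset explicitly, for example to InputStreamReader, rather than letting the JVM pick the platform default.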

How can I output data with special characters visible?

I have a text file that was provided to me and no one knows the encoding on it. Looking at it in a text editor, everything looks fine, aligned properly into neat columns.
However, I'm seeing some anomalies when I read the data. Even though, visually, the field "Foo" appears in the same columns in the text file (for instance, in columns 15-20), when I try to pull it out using substring(15,20) my data varies wildly. Sometimes I'll pull bytes 11-16, sometimes 18-23, sometimes 15-20... there's no consistency between records.
I suspect that there are some special characters, invisible to my text editor, but readable by (and counted in the index of) the String methods. Is there any way in Java to dump the contents of the file with any special characters visible, so I can see which strings I need to replace with a regex?
If not in Java, can anyone recommend a tool that may be able to help me out?

I would start by having a look at the file directly. Any code adds a layer of doubt. Take Total Commander (or an equivalent on your platform), view the file (F3) and switch to hex mode. You suggest that the behavior of the special characters is not even consistent between lines, so you should get some visual clue about the format before you even attempt to fix it algorithmically.

Have you tried printing the contents of the file as individual integers or bytes? That way you can see if there are any hidden characters.
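Something along these lines could do it in Java. This is only a sketch: the class name, the reveal method and the <U+XXXX> notation are made up for the example. Because the file's encoding is unknown, it reads the file as ISO-8859-1, which maps each byte to exactly one character, so nothing is lost or merged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class VisibleChars {
    // Keeps printable ASCII as-is and replaces everything else
    // (tabs, control characters, bytes above 0x7E) with a <U+XXXX> marker.
    static String reveal(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        for (String line : Files.readAllLines(Paths.get(args[0]),
                StandardCharsets.ISO_8859_1)) {
            System.out.println(reveal(line));
        }
    }
}
```

If the markers show up in pairs or triples with values of 0x80 and above, the file is probably in a multi-byte encoding such as UTF-8, which would also explain the shifting substring offsets.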

MediaMetadataRetriever extractMetadata string encoding

When I use the extractMetadata(MediaMetadataRetriever.METADATA_KEY_TITLE) function, some of the strings returned are displayed incorrectly.
For example,
Christina Perri - A Thousand Years
is displayed as
䌀栀爀椀猀琀椀渀愀 倀攀爀爀椀 ⴀ 䄀 吀栀漀甀猀愀渀搀 夀攀愀爀猀
Does anyone have any tips as to how I can get the string to display correctly?

I have no idea about Android, but there are two possibilities:
You are reading it correctly and someone used these characters when storing the data.
You get the wrong characters because the text has been stored in a different encoding than the one you are using to display it. In this case you need to tell Java which encoding the string is in.
A good place to start reading about encodings is this blog.
The Java tutorial on working with text is also worth a look.
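For the second possibility, here is a hedged sketch in plain Java (no Android classes). It rests on a guess from the sample above: the garbled characters look like UTF-16 text whose bytes were read in the wrong order, since 'C' (U+0043) shows up as U+4300. If that guess is wrong for your files, this will not help. The class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class Utf16ByteOrder {
    // Assumption: the title was UTF-16 decoded with the wrong byte order.
    // Re-encoding as big-endian and decoding as little-endian swaps the
    // two bytes of every 16-bit unit back into place.
    static String swapByteOrder(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.UTF_16BE);
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // The first nine garbled characters from the question, as escapes.
        String garbled = "\u4300\u6800\u7200\u6900\u7300\u7400\u6900\u6E00\u6100";
        System.out.println(swapByteOrder(garbled));
    }
}
```

Since only some titles come out wrong, you would need a check before applying the swap, for instance whether the string contains any plain ASCII letters at all.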
