I am trying to encode Arabic text from a Web service. Currently the values come as question marks (???).
I have read many blogs (even Stack Overflow answers and links), but nothing seems to work.
Any idea of how I can resolve this issue?
Thanks
If you paste your Arabic text into Dreamweaver's Design view, Dreamweaver's Code view will show ASCII characters instead, and those will work in any web browser.
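The answer doesn't say exactly what Dreamweaver writes into the code. Assuming those "ASCII characters" are HTML numeric character references (&#NNNN;), the same conversion can be sketched in Java. The class and method names are hypothetical; this is not a Dreamweaver or library API.

```java
// Sketch (assumption): turn every non-ASCII code point into an HTML numeric
// character reference such as &#1587;, so the markup itself is pure ASCII.
// HTML special characters like & and < are deliberately left untouched here.
final class NcrEscaper {
    static String toNumericCharRefs(String text) {
        StringBuilder out = new StringBuilder();
        // codePoints() yields whole code points, so a surrogate pair
        // becomes one reference rather than two.
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp); // plain ASCII: copy as-is
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The Arabic word "salam", written with Unicode escapes.
        String arabic = "\u0633\u0644\u0627\u0645";
        System.out.println(toNumericCharRefs(arabic)); // &#1587;&#1604;&#1575;&#1605;
    }
}
```

A browser resolves a numeric reference to the same Unicode character whatever charset the page declares, which is why this form survives a wrong or missing charset.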
First, an important aside: check that the web service you are consuming actually sends you Arabic characters and not literal question marks. Check a network dump if you are not sure, or use wget/curl to perform a simple request and inspect the result.
If the raw data sent by the web service is already question marks, you have an uphill battle: try again and fiddle with the Accept/Accept-Charset headers. If all else fails, the server itself may not be coded properly, and there isn't much you can do after that.
Also, note that you're actually trying to decode the text, that is, convert it from a byte representation to abstract characters, not encode it.
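To see how the question marks can arise, the sketch below (an illustration, not the asker's code; the web service's real charset is unknown) round-trips Arabic text through two charsets. With UTF-8 the text survives; with ISO-8859-1, which has no Arabic letters, Java's String.getBytes substitutes '?' for each unmappable character, and no later decoding can undo that.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration: encode a string to bytes and decode it back with the same charset.
final class QuestionMarkDemo {
    static String roundTrip(String text, Charset charset) {
        // getBytes() silently replaces characters the charset cannot represent
        // with the charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String arabic = "\u0633\u0644\u0627\u0645"; // the Arabic word "salam"
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // true
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1));           // ????
    }
}
```

So if the raw bytes from the service already contain 0x3F ('?'), the damage happened on the server; if they are valid UTF-8, decode them with an explicit StandardCharsets.UTF_8 rather than the platform default.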
This was the problem when sending UTF-8 data from Android. Your code would work fine, except that you have to encode your String to Base64 first. On the PHP server, you just decode the Base64 string back. It worked for me; I can share the code if you need it.
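A minimal Java sketch of that workaround, assuming java.util.Base64 is available (Java 8+, or Android API level 26+; older Android versions provide android.util.Base64 instead). The class and method names are hypothetical; on the PHP side the counterpart would be base64_decode().

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the Base64 workaround: only ASCII Base64 text travels over the wire.
final class Base64Transport {
    // Sending side: String -> UTF-8 bytes -> Base64 text.
    static String encodeForTransport(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Receiving side, shown in Java for the demo: Base64 text -> UTF-8 bytes -> String.
    static String decodeFromTransport(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0633\u0644\u0627\u0645"; // the Arabic word "salam"
        String wire = encodeForTransport(original);
        System.out.println(wire);
        System.out.println(decodeFromTransport(wire).equals(original)); // true
    }
}
```

Base64 makes the payload about a third larger, but because it contains only ASCII, no intermediate layer can reinterpret it under a different charset.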
I'm handling a web service and need some help. A PDF will be encoded with Base64 and sent to my web service; I will then decode it back into a PDF and place it in the appropriate folder. The issue is that the request needs to contain the actual, giant Base64 string. First question: is this possible? Second, I am using Postman to make the requests and was wondering how to even paste the Base64 string into it, since there seems to be a string length limit. Any help would be greatly appreciated.
I don't know about Postman, but I can suggest using JAX-RS and implementing a ReaderInterceptor and a WriterInterceptor using Base64.Decoder#wrap and Base64.Encoder#wrap, respectively.
Otherwise, maybe Postman has similar features?
Use streams like these as much as possible to reduce memory usage.
Tutorial:
https://jersey.java.net/documentation/latest/filters-and-interceptors.html#d0e9806
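The interceptor wiring depends on the JAX-RS setup, so the sketch below shows only the part the answer names: streaming through Base64.Decoder#wrap and Base64.Encoder#wrap instead of building one giant String. Inside a ReaderInterceptor you would wrap the entity stream the same way (for example via ReaderInterceptorContext#setInputStream). It assumes Java 9+ for InputStream#transferTo; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Sketch: stream Base64 in both directions instead of holding one giant String.
final class Base64Streams {
    // Base64 text in, raw bytes out. The MIME decoder skips line breaks,
    // so text that a client split across lines still decodes.
    static void decode(InputStream base64In, OutputStream rawOut) throws IOException {
        try (InputStream decoded = Base64.getMimeDecoder().wrap(base64In)) {
            decoded.transferTo(rawOut); // Java 9+
        }
    }

    // Raw bytes in, Base64 text out. Closing the wrapper writes the final
    // padding; note that it also closes base64Out.
    static void encode(InputStream rawIn, OutputStream base64Out) throws IOException {
        try (OutputStream encoder = Base64.getEncoder().wrap(base64Out)) {
            rawIn.transferTo(encoder);
        }
    }

    // Helper for the demo: bytes -> Base64 -> bytes, entirely in memory.
    static byte[] roundTrip(byte[] raw) {
        try {
            ByteArrayOutputStream base64 = new ByteArrayOutputStream();
            encode(new ByteArrayInputStream(raw), base64);
            ByteArrayOutputStream back = new ByteArrayOutputStream();
            decode(new ByteArrayInputStream(base64.toByteArray()), back);
            return back.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(roundTrip(raw), raw)); // true
    }
}
```

With a real file, the streams would be a FileInputStream or the request entity stream, so memory use stays bounded by the copy buffer rather than the PDF size.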
Alright, it just seems to be an issue with Postman. When you paste a string of that size, it shows errors and only displays a certain length per line, but the entire string is still sent. I am able to receive it and decode it. Thank you all for your help!
I am having difficulty receiving a PDF file via a REST service. The REST service returns a long string with the data that is supposed to make up the PDF. My overall goal is to save the response as a PDF file for later use.
I call this service url: http://4.hidemyass.com/ip-2/encrypted/dnXuHKAZVZaONS2GNfC9RFn8k8puE2YJx6MjcPDMaKdpMRTBkvNF4CrTg4m7GeKjcLfO1bgYWIwR9bz1ZJP-LTK6Gm8tG_-d4V-oSUMfT-tIJMuZizsz9AeZp5tcZWVcz62A6j7YRWqJRAS_s_cMFLlo&f=norefer
and according to the docs, it should be the string contents that make up a valid PDF.
What am I missing? What do I need to do in order to make it viewable as a PDF?
Thanks!
Chuc
Sorry, the above link failed.
In the end, we found that sending the PDF as binary data wrapped in JSON did not work very well. The creators of the service found that their framework was manipulating the binary data ever so slightly, converting some characters. They ended up switching to Base64 encoding, which worked great.
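The service's framework isn't named, so the sketch below simply assumes it pushed the body through a UTF-8 String, which is one common way binary data gets changed "ever so slightly". It contrasts that with decoding a Base64 field and writing the bytes straight to a .pdf file, which was the original goal. Class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

final class PdfPayload {
    // Assumed failure mode: binary body -> UTF-8 String -> bytes again.
    // Byte sequences that are not valid UTF-8 become U+FFFD, so the data changes.
    static byte[] throughUtf8String(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Base64 route: decode the text field and write the bytes unchanged.
    static Path saveBase64Pdf(String base64, Path target) throws IOException {
        byte[] pdf = Base64.getMimeDecoder().decode(base64); // tolerates line breaks
        return Files.write(target, pdf);
    }

    public static void main(String[] args) throws IOException {
        // A PDF-like header: "%PDF-" followed by non-ASCII bytes like those
        // found in the binary comment line of many real PDFs.
        byte[] raw = {'%', 'P', 'D', 'F', '-', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println(Arrays.equals(throughUtf8String(raw), raw)); // false

        String base64 = Base64.getEncoder().encodeToString(raw);
        Path saved = saveBase64Pdf(base64, Files.createTempFile("response", ".pdf"));
        System.out.println(Arrays.equals(Files.readAllBytes(saved), raw)); // true
    }
}
```

Pure ASCII survives the String detour, which is why such corruption can go unnoticed until a file contains high bytes.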
Let's say someone uses this letter: ë. They enter it in an EditText box, and it is correctly stored in the MySQL database (via a PHP script). But grabbing that database field with the special character produces an output of "null" in Java/Android.
It appears my database is set up and storing correctly, so retrieving is the issue. Do I have to fix this on the PHP side or handle it in Java/Android? EDIT: I don't believe this has anything to do with the PHP side anymore, so I am more interested in the Java side.
Sounds similar to: android, UTF8 - How do I ensure UTF8 is used for a shared preference
I suspect that the problem occurs in the web interface between the web service and the Android app: one side is sending UTF-16 or ISO 8859-1 characters, and the other is interpreting them as UTF-8 (or vice versa). Make sure:
That the web request from Android is using UTF-8
That the web service replies using UTF-8.
As in the other answer, use an HTTP debugging proxy to check that the characters sent between the Android app and the web service are what you expect.
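To show what such a mismatch does to ë, here is a small Java illustration (not the asker's code); the decode helper is hypothetical and stands in for however the app turns response bytes into a String. The point is to name the charset explicitly on both sides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatch {
    // Decode bytes with an explicitly chosen charset, never the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00EB".getBytes(StandardCharsets.UTF_8);        // C3 AB
        byte[] latin1 = "\u00EB".getBytes(StandardCharsets.ISO_8859_1); // EB

        // UTF-8 bytes read as ISO-8859-1: two wrong characters, U+00C3 U+00AB.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
        // An ISO-8859-1 byte read as UTF-8: invalid, replaced by U+FFFD.
        System.out.println(decode(latin1, StandardCharsets.UTF_8));
        // Matching charsets on both sides: the original letter.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
    }
}
```

Neither mismatch yields null by itself, which fits the other answer's point that a null usually signals some additional failure further along.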
I suggest extracting your database access code into a standard Java environment, then compiling and testing it there. This will help you isolate the problem.
Usually you won't get null even if there is an encoding problem. Check for other problems and whether any other exception is thrown.
It's definitely not a PHP problem if you're sure the string is inserted correctly.
This is probably a confusion between UTF-8 and UTF-16, or any other character set you might be using to store these international characters. In UTF-16 (big-endian), the character ë is stored as two bytes, the first being the null byte (0x00). If this two-byte sequence is incorrectly transmitted back as, say, UTF-8, the null byte will be seen as the end-of-string terminator, resulting in a null value instead of the international character.
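The null-byte effect can be checked in Java. The sketch encodes ë as UTF-16BE (the byte order in which the 0x00 comes first; in UTF-16LE it comes second) and feeds it to a hypothetical C-style reader that stops at the first zero byte, standing in for whatever layer treats 0x00 as a terminator.

```java
import java.nio.charset.StandardCharsets;

final class NullByteDemo {
    // Hypothetical C-style reader: everything from the first 0x00 byte on is dropped.
    static String readUntilNul(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        // ISO-8859-1 maps each remaining byte to one char, which keeps the demo simple.
        return new String(bytes, 0, end, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf16be = "\u00EB".getBytes(StandardCharsets.UTF_16BE); // 00 EB
        byte[] utf8 = "\u00EB".getBytes(StandardCharsets.UTF_8);       // C3 AB, no zero byte

        System.out.println(utf16be[0]);                      // 0
        System.out.println(readUntilNul(utf16be).isEmpty()); // true: nothing survives
        System.out.println(readUntilNul(utf8).length());     // 2
    }
}
```

UTF-8 never produces a 0x00 byte for any character other than U+0000 itself, which is one reason it is safer to pass through C-based layers.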
First, you need to be 100% sure that your international characters are stored correctly in the database. Seeing the correct result in a PHP page on a website is no guarantee of that, as two wrongs can make a right. In the past, I have often seen incorrectly stored characters in a database that were still displayed correctly on a web page or some other system. This will look OK until you need to access your database from another system; at that point, everything falls apart, because you cannot reproduce the same errors on the second system.
When I use the extractMetadata(MediaMetadataRetriever.METADATA_KEY_TITLE) function, some of the strings returned are displayed incorrectly.
For example,
Christina Perri - A Thousand Years
is displayed as
䌀栀爀椀猀琀椀渀愀 倀攀爀爀椀 ⴀ 䄀 吀栀漀甀猀愀渀搀 夀攀愀爀猀
Does anyone have any tips as to how I can get the string to display correctly?
I have no idea about Android, but there are two possibilities:
You are reading it correctly, and someone used these characters when storing the data.
You get the wrong characters because the text you get was stored in a different encoding than the one you are using to display it. In this case, you need to tell Java which encoding the string is in.
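The sample in the question points at a specific mix-up: 'C' is U+0043 and the garbled first character is U+4300, the same two bytes (43 00) read in the opposite order. That suggests, as an inference not confirmed for this device, that UTF-16LE data is being decoded as UTF-16BE. The sketch reproduces that and repairs it by re-encoding and decoding with the other byte order; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ByteOrderRepair {
    // Reproduces the suspected bug: UTF-16LE bytes decoded as UTF-16BE.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_16BE);
    }

    // Undoes it: recover the original bytes, then decode them with the right byte order.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.UTF_16BE), StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String title = "Christina Perri - A Thousand Years";
        String garbled = garble(title);
        System.out.println(garbled);         // starts with U+4300 U+6800 U+7200, as in the question
        System.out.println(repair(garbled)); // Christina Perri - A Thousand Years
    }
}
```

The repair is only reliable when nothing was lost along the way, as with plain ASCII titles like this one; decoding the original bytes with the right charset at the source is preferable.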
A good place to start reading about encodings is this blog
The Java tutorial for working with text
Thank you in advance for the help.
I am currently writing a web crawler that parses HTML content, strips HTML tags, and then spell checks the text which is retrieved from the parsing.
Stripping HTML tags and spell checking, using JSoup and the Google Spell Check API, have not caused any problems.
I am able to pull down content from a URL, pass it into a byte[], and then ultimately into a String so that it can be stripped and spell checked. However, I am running into a problem with character encoding.
For example, when parsing http://www.testwareinc.com/...
Original Text: We’ve expanded our Mobile Web and Mobile App testing services.
... the page is using ISO-8859-1, according to its meta tag...
ISO-8859-1 Parse: Weve expanded our Mobile Web and Mobile App testing services.
... then trying using UTF-8...
UTF-8 Parse: We�ve expanded our Mobile Web and Mobile App testing services.
Question
Is it possible that HTML of a webpage can include a mix of encodings? And how can that be detected?
It looks like the apostrophe is encoded as a 0x92 byte, which, according to Wikipedia, is an unassigned/private code point.
From there, it looks like the browser falls back by assuming it's a raw one-byte Unicode code point, U+0092 (PRIVATE USE TWO), which appears to be rendered as an apostrophe. No, wait: if it's one byte, it's more probably CP1252. Browsers must have a fallback strategy based on the advertised code page, such as ISO-8859-1 -> CP1252.
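The three readings of that byte can be reproduced in Java. The sketch decodes "We", 0x92, "ve" with ISO-8859-1, UTF-8, and windows-1252. decodeLikeBrowser is a hypothetical helper that mimics only the one fallback mentioned above (the WHATWG Encoding Standard maps the label iso-8859-1 to windows-1252); real browsers do much more.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Byte92 {
    // Mimics a single browser rule: a declared ISO-8859-1 page is decoded as windows-1252.
    static String decodeLikeBrowser(byte[] bytes, String declaredCharset) {
        String effective = declaredCharset.equalsIgnoreCase("ISO-8859-1") ? "windows-1252" : declaredCharset;
        return new String(bytes, Charset.forName(effective));
    }

    public static void main(String[] args) {
        byte[] text = {'W', 'e', (byte) 0x92, 'v', 'e'};

        // ISO-8859-1: U+0092, an invisible C1 control character ("Weve").
        System.out.println((int) new String(text, StandardCharsets.ISO_8859_1).charAt(2)); // 146
        // UTF-8: a lone 0x92 is malformed and becomes U+FFFD.
        System.out.println((int) new String(text, StandardCharsets.UTF_8).charAt(2));      // 65533
        // Browser-style fallback: windows-1252 maps 0x92 to U+2019 (right single quote).
        System.out.println((int) decodeLikeBrowser(text, "ISO-8859-1").charAt(2));         // 8217
    }
}
```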
So there is no mix of encodings here but, as others said, a broken document, plus a fallback heuristic that sometimes helps and sometimes doesn't.
If you're curious enough, you may want to dive into Firefox's or Chrome's source code to see exactly what they do in such a case.
Having more than one encoding in a document doesn't make it a mixed document; it makes it a broken document.
Unfortunately, there are a lot of web pages that use an encoding that doesn't match the document definition, or that contain some data that is valid in the given encoding and some content that is invalid.
There is no good way to handle this. It is possible to try and guess the encoding of a document, but it is difficult and not 100% reliable. In cases like yours, the simplest solution is just to ignore parts of the document that can't be decoded.
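A minimal Java sketch of both halves of that advice, using the standard CharsetDecoder API: a strict decode that reports whether the bytes are valid in a given charset, and a lenient one that drops whatever can't be decoded. Class and method names are hypothetical. The strict check is only informative for charsets like UTF-8, since every byte sequence is technically valid ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientDecoding {
    // Strict check: does the whole byte sequence decode cleanly in this charset?
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        CharsetDecoder strict = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Lenient decode: silently drop any bytes that can't be decoded.
    static String decodeIgnoringErrors(byte[] bytes, Charset charset) {
        CharsetDecoder lenient = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return lenient.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with IGNORE, but decode() declares the checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] text = {'W', 'e', (byte) 0x92, 'v', 'e'};
        System.out.println(decodesCleanly(text, StandardCharsets.UTF_8));       // false
        System.out.println(decodeIgnoringErrors(text, StandardCharsets.UTF_8)); // Weve
    }
}
```

Using CodingErrorAction.REPLACE instead of IGNORE keeps a visible U+FFFD where data was lost, which can be easier to spot in spell-check output.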
Apache Tika has an encoding detector. There are also commercial alternatives if you need, say, something in C++ and are in a position to spend money.
I can pretty much guarantee that each web page is in one encoding, but it's easy to be mistaken about which one.
This seems like an issue with special characters. Check whether StringEscapeUtils.escapeHtml helps, or any of the other methods in that class.
Edit: added this code, since the asker was not able to get it working.
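import org.apache.commons.lang.StringEscapeUtils; // Apache Commons Lang 2.x

public class EscapeDemo {
    public static void main(String[] args) {
        String asd = "\u2019"; // right single quotation mark
        System.out.println(StringEscapeUtils.escapeXml(asd));  // output: &#8217;
        System.out.println(StringEscapeUtils.escapeHtml(asd)); // output: &rsquo;
    }
}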