I am calling a shared .dll function from Java using JNA. According to the documentation, the function is declared in Visual C++ as follows:
PMSifEncodeKcdLcl(PCHAR ff, PCHAR Dta, BOOL Dbg, PCHAR szOpId, PCHAR szOpFirst, PCHAR szOpLast);
From the doc:
ff - A single ASCII character.
Dta - Points to a null-terminated string.
Dbg - A boolean flag.
szOpId - Points to a null-terminated string.
szOpFirst - Points to a null-terminated string.
szOpLast - Points to a null-terminated string.
The string is built from a number of Data Fields. The format for each Data Field within the string is as follows:
RS FI data
RS = Record Separator.
Indicates the start of the Data Field. A single ASCII Record Separator [RS] character (hex 1E)
FI = Field Identifier - Indicates the type of data in the field. A single ASCII character.
data = the actual data. A number of ASCII characters, dependent on the Field Identifier. Sometimes the data is variable in length. The Record Separator of the following field indicates the end of a Data Field (or for the last field, the NULL character at the end of the string).
An Answer Code is returned in field ff. Answer Data (if any) is returned in field Dta.
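To illustrate the format, here is a minimal sketch of how a two-field string could be assembled (the field identifiers 'R' and 'T' are taken from my example data below; the terminating NUL comes from C's string convention):

final char RS = 0x1E; // ASCII Record Separator (hex 1E)
String dta = "" + RS + "R101"          // FI 'R', data "101"
                + RS + "TSingle Room"; // FI 'T', data "Single Room"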
I have cross-checked the JNA documentation to confirm the type mappings, but still no success. After trying for days, I came up with the code below.
My Java code:
/* JNA interface class */
public class JNALocksInterface {
    public interface LockLibrary extends StdCallLibrary {
        LockLibrary INSTANCE = (LockLibrary) Native.loadLibrary("path_to_dll", LockLibrary.class);

        void PMSifEncodeKcdLcl(byte[] ff, byte[] dta, boolean debug, String szOpId, String szOpFirst, String szOpLast);
    }
}
/* My calling class code */
JNALocksInterface.LockLibrary INSTANCE = JNALocksInterface.LockLibrary.INSTANCE;

String dta = "*R101*L101*TSingle Room*NMatu*FZachary*URegular Guest*D201805021347*O201805030111";
String ff = "A";

// Copy the data string into a NUL-terminated byte array.
byte[] dataBytes = new byte[dta.length() + 1];
System.arraycopy(dta.getBytes("UTF-8"), 0, dataBytes, 0, dta.length());
dataBytes[dta.length()] = 0;

byte[] dtaByteArray = new byte[dta.length() + 1];
byte[] ffByteArray = ff.getBytes("UTF-8");

// Replace each '*' placeholder with the ASCII Record Separator (0x1E);
// every other byte is copied through unchanged.
for (int i = 0; i < dataBytes.length; i++) {
    String s1 = String.format("%8s", Integer.toBinaryString(dataBytes[i] & 0xFF)).replace(' ', '0');
    if ((char) dataBytes[i] == '*') {
        dtaByteArray[i] = 30;
    } else {
        dtaByteArray[i] = (byte) Integer.parseInt(s1, 2);
    }
}

byte[] commandCodeFinal = new byte[1];
for (int i = 0; i < ffByteArray.length; i++) {
    String s2 = String.format("%8s", Integer.toBinaryString(ffByteArray[i] & 0xFF)).replace(' ', '0');
    System.out.println(s2);
    commandCodeFinal[i] = (byte) Integer.parseInt(s2, 2);
}

String userNameBytes = "test";
String userFirstNameBytes = "test";
String userLastNameBytes = "test";

INSTANCE.PMSifEncodeKcdLcl(commandCodeFinal, dtaByteArray, false, userNameBytes, userFirstNameBytes, userLastNameBytes);
I am getting wrong responses in fields ff and dta, as shown below:
FF Response >> :
DTA Response >> 0101IR101L101TSingle RoomNMatuFZacharyURegular GuestD201805021347O2018050
I am replacing "*" with the ASCII Record Separator (hex 1E).
Can someone show me how to correctly call the function using JNA? I've searched all over but still no success.
Solved it! I used the separator character \u001e directly in the Java string and JNA Memory objects, and it worked. I was also running on Windows 10 64-bit; switching to Windows 7 32-bit made it work as well. I replaced the code with the snippet below:
String fieldSeparator = "\u001e";
String dataTest = fieldSeparator + "R101" + fieldSeparator + "TSingle Room" + fieldSeparator + "FShujaa" + fieldSeparator + "NMatoke"
        + fieldSeparator + "URegular Guest" + fieldSeparator + "D201805040842" + fieldSeparator + "O201805051245";
String data = dataTest;
String commandCode = "A";

// Native-side buffers: one byte per character plus the terminating NUL.
Memory commandCodeMemory = new Memory(commandCode.length() + 1);
commandCodeMemory.setString(0, commandCode);
Memory dataMemory = new Memory(data.length() + 1);
dataMemory.setString(0, data);

System.out.println("Registering >> " + INSTANCE.PMSifRegister("42860149", "BatchClient"));
INSTANCE.PMSifEncodeKcdLcl(commandCodeMemory, dataMemory, false, "ZKMATU", "zACHARY", "tESTING");
System.out.println("FF Response >> " + commandCodeMemory.getString(0));
System.out.println("DTA Response >> " + dataMemory.getString(0));
INSTANCE.PMSifUnregister();
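For reference, the interface declaration has to match this Memory-based call. Here is a minimal sketch of how it can be declared; Memory extends Pointer, so the buffers above can be passed directly (the exact native signatures of PMSifRegister/PMSifUnregister are assumptions inferred from the calls above):

public interface LockLibrary extends StdCallLibrary {
    LockLibrary INSTANCE = (LockLibrary) Native.loadLibrary("path_to_dll", LockLibrary.class);

    // PCHAR parameters mapped as Pointer so the native side can write back into the buffers.
    void PMSifEncodeKcdLcl(Pointer ff, Pointer dta, boolean dbg, String szOpId, String szOpFirst, String szOpLast);

    int PMSifRegister(String serialNumber, String clientName); // assumed: returns a status code
    void PMSifUnregister();                                    // assumed signature
}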
I am trying to recreate in Swift the following logic, which I wrote in Java:
public String xorMessage(String message, String key) {
    try {
        if (message == null || key == null) return null;
        char[] keys = key.toCharArray();
        char[] mesg = message.toCharArray();
        int ml = mesg.length;
        int kl = keys.length;
        char[] newmsg = new char[ml];
        for (int i = 0; i < ml; i++) {
            newmsg[i] = (char) (mesg[i] ^ keys[i % kl]);
        }
        return new String(newmsg);
    } catch (Exception e) {
        return null;
    }
}
This is how far I have got while coding in Swift 3:
import UIKit
import Foundation

let t = "22-Jun-2017 12:30 pm"
let m = "message"
print(UInt8(t))
let a: [UInt8] = Array(t.utf8)
let v = m.characters.map { String($0) }
print(v)

func encodeWithXorByte(key: UInt8, Input: String) -> String {
    return String(bytes: Input.utf8.map { $0 ^ key }, encoding: String.Encoding.utf8) ?? ""
}

var ml: Int = Int(m.characters.count)
var kl: Int = Int(t.characters.count)
var f = [String]()
for i in 0..<ml {
    let key = a[i % kl]
    let input = v[i]
    f.append(String(bytes: input.utf8.map { $0 ^ key }, encoding: String.Encoding.utf8)!)
}
I am trying to obtain a string (the message) which has been XOR'ed with the key passed into the above function. But my Swift code is not working: it returns a character array, and I want a string. If I cast the character array to a string, it does not show the Unicode escapes like \u{0001} in the string...
Suppose I get following output :
["_", "W", "^", "9", "\u{14}", "\t", "H"]
and then I try to convert to string, I get this:
_W^9 H
I want :
_W^9\u{14}\tH
Please help.
There are several problems. First, if your intention is to print "unprintable" characters in a string \u{} escaped, then you can use the .debugDescription property. Example:
let s = "a\u{04}\u{08}b"
print(s) // ab
print(s.debugDescription) // "a\u{04}\u{08}b"
Next, your Swift code converts the string to UTF-8, xor's the bytes
and then converts the result back to a String. That can easily fail
if the xor'ed byte sequence is not valid UTF-8.
The Java code operates on UTF-16 code units, so the equivalent Swift
code would be
func xorMessage(message: String, key: String) -> String {
    let keyChars = Array(key.utf16)
    let keyLen = keyChars.count
    let newMsg = message.utf16.enumerated().map { $1 ^ keyChars[$0 % keyLen] }
    return String(utf16CodeUnits: newMsg, count: newMsg.count)
}
Example:
let t = "22-Jun-2017 12:30 pm"
let m = "message"
let encrypted = xorMessage(message: m, key: t)
print(encrypted.debugDescription) // "_W^9\u{14}\tH"
Finally, even that can produce unexpected results unless you restrict
the input (key and message) to ASCII characters. Example:
let m = "😀"
print(Array(m.utf16).map { String($0, radix: 16)} ) // ["d83d", "de00"]
let t = "a€"
print(Array(t.utf16).map { String($0, radix: 16)} ) // ["61", "20ac"]
let e = xorMessage(message: m, key: t)
print(Array(e.utf16).map { String($0, radix: 16)} ) // ["fffd", "feac"]
let d = xorMessage(message: e, key: t)
print(Array(d.utf16).map { String($0, radix: 16)} ) // ["ff9c", "fffd"]
print(d) // ワ�
print(d == m) // false
The problem is that the xor'ing produces an invalid UTF-16 sequence
(an unbalanced surrogate pair), which is then replaced by the
"replacement character" U+FFFD.
I don't know how Java handles this, but Swift strings cannot contain invalid Unicode scalar values, so the only solution would be to represent the result as a [UInt16] array instead of a String.
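For what it's worth, Java tolerates this: a Java String is a sequence of arbitrary UTF-16 code units, so the unpaired surrogate survives the round trip there (it only gets mangled when encoding to UTF-8). A quick check in Java:

String m = "😀";  // U+1F600, stored as the surrogate pair d83d de00
String t = "a€";  // code units 0061, 20ac
char[] enc = new char[m.length()];
for (int i = 0; i < m.length(); i++)
    enc[i] = (char) (m.charAt(i) ^ t.charAt(i % t.length()));
char[] dec = new char[enc.length];
for (int i = 0; i < enc.length; i++)
    dec[i] = (char) (enc[i] ^ t.charAt(i % t.length()));
System.out.println(new String(dec).equals(m)); // true: the lone surrogates survive in memory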
Is there a better way of getting rid of accents and making those letters regular, apart from using the String.replaceAll() method and replacing the letters one by one?
Example:
Input: orčpžsíáýd
Output: orcpzsiayd
It doesn't need to handle all accented letters; alphabets like Russian or Chinese are out of scope.
Start with java.text.Normalizer.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// or Normalizer.Form.NFKD for a more "compatible" deconstruction
This will separate all of the accent marks from most characters. Then, you just need to throw away the characters that aren't ASCII:
string = string.replaceAll("[^\\p{ASCII}]", "");
If your text is in Unicode, you should use this instead:
string = string.replaceAll("\\p{M}", "");
For Unicode, \\P{M} matches the base glyph and \\p{M} (lowercase) matches each accent.
Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.
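Putting the two steps together, a minimal sketch (the method name deAccent is mine; the input/output pair is the one from the question):

import java.text.Normalizer;

public static String deAccent(String str) {
    String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
    return decomposed.replaceAll("\\p{M}", ""); // drop all combining marks
}
// deAccent("orčpžsíáýd") -> "orcpzsiayd"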
It is important to note that Normalizer by itself is insufficient to remove diacritics: it only decomposes accented characters (é becomes e followed by the combining mark U+0301); it does not drop the marks. For example, none of the following will replace the accented é in "Brévis" with an unaccented e:
import static java.text.Normalizer.normalize;
import static java.text.Normalizer.Form.*;
public class T {
    public static void main(final String[] args) {
        final var text = "Brévis";
        System.out.println(
            normalize(text, NFD) + " " +
            normalize(text, NFC) + " " +
            normalize(text, NFKD) + " " +
            normalize(text, NFKC)
        );
    }
}
As of 2011, you can use Apache Commons Lang's StringUtils.stripAccents(input) (available since 3.0):
String input = StringUtils.stripAccents("Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ");
System.out.println(input);
// Prints "This is a funky String"
Note:
The accepted answer (Erick Robertson's) doesn't work for Ø or Ł. Apache Commons 3.5 doesn't work for Ø either, but it does work for Ł. After reading the Wikipedia article for Ø, I'm not sure it should be replaced with "O": it's a separate letter in Norwegian and Danish, alphabetized after "z". It's a good example of the limitations of the "strip accents" approach.
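A quick check of those claims (a sketch assuming Commons Lang 3.5 on the classpath):

// Ł is special-cased by Commons Lang 3.5+, Ø is left untouched:
System.out.println(StringUtils.stripAccents("Łódź Ø")); // "Lodz Ø"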
The solution by @virgo47 is very fast, but approximate. The accepted answer uses Normalizer and a regular expression. I wondered what part of the time was taken by Normalizer versus the regular expression, since removing all the non-ASCII characters can be done without a regex:
import java.text.Normalizer;
public class Strip {
    public static String flattenToAscii(String string) {
        StringBuilder sb = new StringBuilder(string.length());
        string = Normalizer.normalize(string, Normalizer.Form.NFD);
        for (char c : string.toCharArray()) {
            if (c <= '\u007F') sb.append(c);
        }
        return sb.toString();
    }
}
Small additional speed-ups can be obtained by writing into a char[] and not calling toCharArray(), although I'm not sure that the decrease in code clarity merits it:
public static String flattenToAscii(String string) {
    char[] out = new char[string.length()];
    string = Normalizer.normalize(string, Normalizer.Form.NFD);
    int j = 0;
    for (int i = 0, n = string.length(); i < n; ++i) {
        char c = string.charAt(i);
        if (c <= '\u007F') out[j++] = c;
    }
    return new String(out, 0, j); // only the j characters actually written (avoids trailing '\u0000's)
}
This variation has the advantage of the correctness of the one using Normalizer and some of the speed of the one using a table. On my machine, this one is about 4x faster than the accepted answer, and 6.6x to 7x slower than @virgo47's (the accepted answer is about 26x slower than @virgo47's on my machine).
EDIT: If you're not stuck on Java <6, and speed is not critical, and/or the translation table is too limiting, use the answer by David. The point is to use Normalizer (introduced in Java 6) instead of the translation table inside the loop.
While this is not a "perfect" solution, it works well when you know the input range (in our case Latin-1/Latin-2), it worked before Java 6 (not a real issue though), and it is much faster than the most-suggested version (which may or may not be an issue):
/**
* Mirror of the unicode table from 00c0 to 017f without diacritics.
*/
private static final String tab00c0 = "AAAAAAACEEEEIIII" +
"DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
"aaaaaaaceeeeiiii" +
"\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
"AaAaAaCcCcCcCcDd" +
"DdEeEeEeEeEeGgGg" +
"GgGgHhHhIiIiIiIi" +
"IiJjJjKkkLlLlLlL" +
"lLlNnNnNnnNnOoOo" +
"OoOoRrRrRrSsSsSs" +
"SsTtTtTtUuUuUuUu" +
"UuUuWwYyYZzZzZzF";
/**
 * Returns string without diacritics - 7 bit approximation.
 *
 * @param source string to convert
 * @return corresponding string without diacritics
 */
public static String removeDiacritic(String source) {
    char[] vysl = new char[source.length()];
    char one;
    for (int i = 0; i < source.length(); i++) {
        one = source.charAt(i);
        if (one >= '\u00c0' && one <= '\u017f') {
            one = tab00c0.charAt((int) one - '\u00c0');
        }
        vysl[i] = one;
    }
    return new String(vysl);
}
Tests on my HW with a 32-bit JDK show that this performs the conversion from àèéľšťč89FDČ to aeelstc89FDC 1 million times in ~100 ms, while the Normalizer way takes 3.7 s (37x slower). If your needs are around performance and you know the input range, this may be for you.
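Usage, with the input/output pair from the timing test above:

System.out.println(removeDiacritic("àèéľšťč89FDČ")); // aeelstc89FDC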
Enjoy :-)
System.out.println(Normalizer.normalize("àèé", Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", ""));
worked for me. The output of the snippet above gives "aee" which is what I wanted, but
System.out.println(Normalizer.normalize("àèé", Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", ""));
didn't do any substitution.
Depending on the language, those might not be considered accents (which change the sound of the letter), but diacritical marks:
https://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics
"Bosnian and Croatian have the symbols č, ć, đ, š and ž, which are considered separate letters and are listed as such in dictionaries and other contexts in which words are listed according to alphabetical order."
Removing them might be inherently changing the meaning of the word, or changing the letters into completely different ones.
I faced the same issue with a string equality check: one of the compared strings contained characters with codes 128-255, i.e., a non-breaking space [Hex A0] instead of a regular space [Hex 20]. To show the non-breaking space over HTML, I had used the following spacing entities. Their characters and bytes are: &emsp; is a very wide space [ ] {-30, -128, -125}; &ensp; is a somewhat wide space [ ] {-30, -128, -126}; &thinsp; is a narrow space [ ] {32}; and the non-HTML space {}.
String s1 = "My Sample Space Data", s2 = "My Sample Space Data";
System.out.format("S1: %s\n", java.util.Arrays.toString(s1.getBytes()));
System.out.format("S2: %s\n", java.util.Arrays.toString(s2.getBytes()));
Output in Bytes:
S1: [77, 121, 32, 83, 97, 109, 112, 108, 101, 32, 83, 112, 97, 99, 101, 32, 68, 97, 116, 97]
S2: [77, 121, -30, -128, -125, 83, 97, 109, 112, 108, 101, -30, -128, -125, 83, 112, 97, 99, 101, -30, -128, -125, 68, 97, 116, 97]
Use the code below to see different spaces and their byte codes (see the Wikipedia page List_of_Unicode_characters):
String spacing_entities = "very wide space,narrow space,regular space,invisible separator";
System.out.println("Space String :"+ spacing_entities);
byte[] byteArray =
// spacing_entities.getBytes( Charset.forName("UTF-8") );
// Charset.forName("UTF-8").encode( s2 ).array();
{-30, -128, -125, 44, -30, -128, -126, 44, 32, 44, -62, -96};
System.out.println("Bytes:"+ Arrays.toString( byteArray ) );
try {
System.out.format("Bytes to String[%S] \n ", new String(byteArray, "UTF-8"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
➩ ASCII transliteration of a Unicode string for Java: unidecode.
String initials = Unidecode.decode( s2 );
➩ using Guava: Google Core Libraries for Java.
String replaceFrom = CharMatcher.WHITESPACE.replaceFrom( s2, " " );
To URL-encode the space, use the Guava library:
String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);
➩ To overcome this problem, use String.replaceAll() with a regular expression:
// \p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
s2 = s2.replaceAll("\\p{Zs}", " ");
s2 = s2.replaceAll("[^\\p{ASCII}]", " ");
s2 = s2.replaceAll(" ", " ");
➩ Using java.text.Normalizer.Form.
This enum provides constants of the four Unicode normalization forms that are described in Unicode Standard Annex #15 — Unicode Normalization Forms and two methods to access them.
s2 = Normalizer.normalize(s2, Normalizer.Form.NFKC);
Test string and outputs for the different approaches (➩ Unidecode, Normalizer, StringUtils):
String strUni = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ Æ,Ø,Ð,ß";
// This is a funky String AE,O,D,ss
String initials = Unidecode.decode( strUni );
// Both of the following produce this o/p: This is a funky String Æ,Ø,Ð,ß
String temp = Normalizer.normalize(strUni, Normalizer.Form.NFD);
Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
temp = pattern.matcher(temp).replaceAll("");
String input = org.apache.commons.lang3.StringUtils.stripAccents( strUni );
Using Unidecode is the best choice; my final code is shown below:
public static void main(String[] args) {
    String s1 = "My Sample Space Data", s2 = "My Sample Space Data";
    String initials = Unidecode.decode(s2);
    if (s1.equals(s2)) { // [ , ] %A0 - %2C - %20 « http://www.ascii-code.com/
        System.out.println("Equal Unicode Strings");
    } else if (s1.equals(initials)) {
        System.out.println("Equal Non Unicode Strings");
    } else {
        System.out.println("Not Equal");
    }
}
I suggest Junidecode. It will handle not only 'Ł' and 'Ø', but it also works well for transcribing from other alphabets, such as Chinese, into the Latin alphabet.
One of the best ways, using regex and Normalizer, if you have no third-party library:
public String flattenToAscii(String s) {
    if (s == null || s.trim().length() == 0)
        return "";
    return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[\u0300-\u036F]", "");
}
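For the question's example:

System.out.println(flattenToAscii("orčpžsíáýd")); // orcpzsiayd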
This is more efficient than replaceAll("[^\\p{ASCII}]", "") and is enough if you only need to strip diacritics (as in your example). Otherwise, you have to use the \\p{ASCII} pattern.
Regards.
The solution is already available in StringUtils.stripAccents() (in the Maven repository), and it works for Ł, as mentioned by @DavidS. But I needed it to work for both Ø and Ł, so I modified it as below. May be helpful for others too.
Update
This is a modified version of StringUtils.stripAccents(String obj) that keeps the old functionality while also handling both Ø and Ł characters.
public static String stripAccents(final String input) {
    if (input == null) {
        return null;
    }
    final StringBuilder decomposed = new StringBuilder(Normalizer.normalize(input, Normalizer.Form.NFD));
    for (int i = 0; i < decomposed.length(); i++) {
        if (decomposed.charAt(i) == '\u0141') {
            decomposed.setCharAt(i, 'L');
        } else if (decomposed.charAt(i) == '\u0142') {
            decomposed.setCharAt(i, 'l');
        } else if (decomposed.charAt(i) == '\u00D8') {
            decomposed.setCharAt(i, 'O');
        } else if (decomposed.charAt(i) == '\u00F8') {
            decomposed.setCharAt(i, 'o');
        }
    }
    // Note that this doesn't correctly remove ligatures...
    return Pattern.compile("\\p{InCombiningDiacriticalMarks}+").matcher(decomposed).replaceAll("");
}
Input string: Ł Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ Ø ø
Output string: L This is a funky String O o
@David Conrad's solution is the fastest I tried using the Normalizer, but it does have a bug: it strips characters which are not accents, so Chinese characters and letters like æ are all stripped. The characters we want to strip are non-spacing marks: characters which don't take up extra width in the final string. These zero-width characters basically end up combined into some other character. If you can see one of them isolated as a character, for example like this `, my guess is that it's combined with the space character.
public static String flattenToAscii(String string) {
    char[] out = new char[string.length()];
    String norm = Normalizer.normalize(string, Normalizer.Form.NFD);
    int j = 0;
    for (int i = 0, n = norm.length(); i < n; ++i) {
        char c = norm.charAt(i);
        // By Ricardo: modified the character check for accents, ref: http://stackoverflow.com/a/5697575/689223
        if (Character.getType(c) != Character.NON_SPACING_MARK) {
            out[j] = c;
            j++;
        }
    }
    return new String(out, 0, j); // only the characters actually written
}
I think the best solution is converting each char to hex and replacing it with the corresponding hex, because there are two ways of typing Unicode:

Composite Unicode
Precomposed Unicode

For example, "Ồ" written in Composite Unicode is different from "Ồ" written in Precomposed Unicode. You can copy my sample chars and convert them to see the difference.

In Composite Unicode, "Ồ" is combined from 2 chars: Ô (U+00D4) and ̀ (U+0300).
In Precomposed Unicode, "Ồ" is a single char (U+1ED2).

I have developed this feature for some banks to convert the info before sending it to the core bank (which usually doesn't support Unicode), and faced this issue when the end users use multiple Unicode typings to input the data. So I think converting to hex and replacing it is the most reliable way.
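The difference is easy to see in code: the two spellings only compare equal after normalization (a quick check, using the code points named above):

String precomposed = "\u1ED2";      // Ồ as a single code point
String composite  = "\u00D4\u0300"; // Ô followed by a combining grave accent
System.out.println(precomposed.equals(composite)); // false: different code units
System.out.println(java.text.Normalizer.normalize(precomposed, java.text.Normalizer.Form.NFD)
        .equals(java.text.Normalizer.normalize(composite, java.text.Normalizer.Form.NFD))); // true: same decomposed form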
A fast and safer way:

public static String removeDiacritics(String str) {
    if (str == null)
        return null;
    if (str.isEmpty())
        return "";

    int len = str.length();
    StringBuilder sb = new StringBuilder(len);

    // Iterate over the string's code points.
    for (int i = 0; i < len; ) {
        int codePoint = str.codePointAt(i);
        int charCount = Character.charCount(codePoint);
        if (charCount > 1) {
            // Supplementary character (surrogate pair): copy it through unchanged.
            for (int j = 0; j < charCount; j++)
                sb.append(str.charAt(i + j));
            i += charCount;
            continue;
        } else if (codePoint <= 127) {
            // Plain ASCII: copy as-is.
            sb.append((char) codePoint);
            i++;
            continue;
        }
        // NFD-decompose the character and keep only its base character.
        // A lone combining mark (composite input) decomposes to itself and is skipped.
        char base = java.text.Normalizer
                .normalize(Character.toString((char) codePoint), java.text.Normalizer.Form.NFD)
                .charAt(0);
        if (Character.getType(base) != Character.NON_SPACING_MARK)
            sb.append(base);
        i++;
    }
    return sb.toString();
}
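Usage, with the two spellings of Ồ from above:

System.out.println(removeDiacritics("\u1ED2"));       // O (precomposed input)
System.out.println(removeDiacritics("\u00D4\u0300")); // O (composite input)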
Faced the same issue; here's a solution using a Kotlin extension:
val String.stripAccents: String
get() = Regex("\\p{InCombiningDiacriticalMarks}+")
.replace(
Normalizer.normalize(this, Normalizer.Form.NFD),
""
)
Usage:
val textWithoutAccents = "some accented string".stripAccents
In case anyone is struggling to do this in Kotlin, this code works like a charm. To avoid inconsistencies I also use .toUpperCase() and .trim(); then I call this function:
fun stripAccents(s: String): String {
    val chars = s.toCharArray()
    val sb = StringBuilder(s)
    var cont = 0
    while (chars.size > cont) {
        var c2 = chars[cont].toString()
        // These are my needs; in case you need to convert other accents, just add new entries here.
        c2 = c2.replace("Ã", "A")
        c2 = c2.replace("Õ", "O")
        c2 = c2.replace("Ç", "C")
        c2 = c2.replace("Á", "A")
        c2 = c2.replace("Ó", "O")
        c2 = c2.replace("Ê", "E")
        c2 = c2.replace("É", "E")
        c2 = c2.replace("Ú", "U")
        sb.setCharAt(cont, c2.single())
        cont++
    }
    return sb.toString()
}
To use this function, call the code like this:
var str: String
str = editText.text.toString() // get the text from the EditText
str = str.toUpperCase().trim()
str = stripAccents(str) // call the function