Xor a string that is uint16 or uint32 - java

I am trying to recreate the following logic I created in JAVA to swift:
public String xorMessage(String message, String key) {
try {
if (message == null || key == null) return null;
char[] keys = key.toCharArray();
char[] mesg = message.toCharArray();
int ml = mesg.length;
int kl = keys.length;
char[] newmsg = new char[ml];
for (int i = 0; i < ml; i++) {
newmsg[i] = (char)(mesg[i] ^ keys[i % kl]);
}//for i
return new String(newmsg);
} catch (Exception e) {
return null;
}
I have reached till here while coding in swift3:
import UIKit
import Foundation
let t = "22-Jun-2017 12:30 pm"
let m = "message"
print(UInt8(t))
let a :[UInt8] = Array(t.utf8)
let v = m.characters.map{String ($0) }
print(v)
func encodeWithXorByte(key: UInt8 , Input : String) -> String {
return String(bytes: Input.utf8.map{$0 ^ key}, encoding: String.Encoding.utf8) ?? ""
}
var ml :Int = Int( m.characters.count )
var kl :Int = Int (t.characters.count)
var f = [String]()
for i in 0..<ml{
let key = a[i%kl]
let input = v[i]
f.append(String(bytes: input.utf8.map{$0 ^ key} , encoding : String.Encoding.utf8)!)
// f.append(<#T##newElement: Character##Character#>)
//m[i] = input.utf8.map{$0 ^ key}
}
I am trying to obtain a string(message) which has been xor'ed with a key passed into the above function. But my code in swift is not working as it is returning character array and I want a string, if I try to cast the character array to string it does not show the unicode like \u{0001} etc in the string...
Suppose I get following output :
["_", "W", "^", "9", "\u{14}", "\t", "H"]
and then I try to convert to string, I get this:
_W^9 H
I want :
_W^9\u{14}\tH
Please help.

There are different problems. First, if your intention is to print
"unprintable" characters in a string \u{} escaped then you can use
the .debugDescription method. Example:
let s = "a\u{04}\u{08}b"
print(s) // ab
print(s.debugDescription) // "a\u{04}\u{08}b"
Next, your Swift code converts the string to UTF-8, xor's the bytes
and then converts the result back to a String. That can easily fail
if the xor'ed byte sequence is not valid UTF-8.
The Java code operates on UTF-16 code units, so the equivalent Swift
code would be
func xorMessage(message: String, key: String) -> String {
let keyChars = Array(key.utf16)
let keyLen = keyChars.count
let newMsg = message.utf16.enumerated().map { $1 ^ keyChars[$0 % keyLen] }
return String(utf16CodeUnits: newMsg, count: newMsg.count)
}
Example:
let t = "22-Jun-2017 12:30 pm"
let m = "message"
let encrypted = xorMessage(message: m, key: t)
print(encrypted.debugDescription) // "_W^9\u{14}\tH"
Finally, even that can produce unexpected results unless you restrict
the input (key and message) to ASCII characters. Example:
let m = "😀"
print(Array(m.utf16).map { String($0, radix: 16)} ) // ["d83d", "de00"]
let t = "a€"
print(Array(t.utf16).map { String($0, radix: 16)} ) // ["61", "20ac"]
let e = xorMessage(message: m, key: t)
print(Array(e.utf16).map { String($0, radix: 16)} ) // ["fffd", "feac"]
let d = xorMessage(message: e, key: t)
print(Array(d.utf16).map { String($0, radix: 16)} ) // ["ff9c", "fffd"]
print(d) // ワ�
print(d == m) // false
The problem is that the xor'ing produces an invalid UTF-16 sequence
(an unbalanced surrogate pair), which is then replaced by the
"replacement character" U+FFFD.
I don't know how Java handles this, but Swift strings cannot invalid
Unicode scalar values, so the only solution would be to represent
the result as an [UInt16] array instead of a String.

Related

Unable to call a .dll function correclty

I am calling a shared .dll function using JNA Java. From the documentation, the function can be invoked to receive parameters using Visual C++ as below;
PMSifEncodeKcdLcl(PCHAR ff, PCHAR Dta, BOOL Dbg, PCHAR szOpId, PCHAR szOpFirst, PCHAR szOpLast);
From the doc:
ff - A single ASCII character.
Dta - Points to a null-terminated string.
Dbg - a boolean flag
szOpId - points to a null-terminated string
szOpFirst - points to a null-terminated string
szOpLast - points to a null-terminated string
The string is built from a number of Data Fields. The format for each Data Field within the string is as follows:
RS FI data
RS = Record Separator.
Indicates the start of the Data Field. A single ASCII Record Separator [RS] character (hex 1E)
FI = Field Identifier - Indicates the type of data in the field. A single ASCII character.
data = the actual data. A number of ASCII characters, dependent on the Field Identifier. Sometimes the data is variable in length. The Record Separator of the following field indicates the end of a Data Field (or for the last field, the NULL character at the end of the string).
An Answer Code is returned in field ff. Answer Data (if any) is returned in field Dta
I have cross checked the JNA documentation to confirm field mappings but still no success. After trying for days. I came up with the code below;
My Java Code:
/* JNA interface class
*/
public class JNALocksInterface {
public interface LockLibrary extends StdCallLibrary {
LockLibrary INSTANCE = (LockLibrary) Native.loadLibrary("path_to_dll", LockLibrary.class);
public void PMSifEncodeKcdLcl(byte[] ff, byte[] dta, boolean debug, String szOpid, String szOpFirst, String szOpLast);
}
}
/*My Calling Class Code*/
JNALocksInterface.LockLibrary INSTANCE = JNALocksInterface.LockLibrary.INSTANCE;
String dta = "*R101*L101*TSingle Room*NMatu*FZachary*URegular Guest*D201805021347*O201805030111";
String ff = "A";
byte[] dataBytes = new byte[dta.length() + 1];
System.arraycopy(dta.getBytes("UTF-8"), 0, dataBytes, 0, dta.length());
dataBytes[dta.length()] = 0;
byte[] dtaByteArray = new byte[dta.length() + 1];
byte[] ffByteArray = ff.getBytes("UTF-8");
for (int i = 0; i < dataBytes.length; i++) {
String s1 = String.format("%8s", Integer.toBinaryString(dataBytes[i] & 0xFF)).replace(' ', '0');
// System.out.println(s1);
if((char)dataBytes[i] == '*')
{
dtaByteArray[i] = 30;
}
else{
int val = Integer.parseInt(s1, 2);
byte b = (byte) val;
dtaByteArray[i] = b;
}
}
byte[] commandCodeFinal = new byte[1];
for (int i = 0; i < ffByteArray.length; i++) {
String s2 = String.format("%8s", Integer.toBinaryString(ffByteArray[i] & 0xFF)).replace(' ', '0');
System.out.println(s2);
int val = Integer.parseInt(s2, 2);
byte b = (byte) val;
commandCodeFinal[i] = b;
}
String userNameBytes = "test";
String userFirstNameBytes = "test";
String userLastNameBytes = "test";
INSTANCE.PMSifEncodeKcdLcl(commandCodeFinal, dtaByteArray, false, userNameBytes, userFirstNameBytes, userLastNameBytes);
I am getting a wrong response on field ff and dta as shown below.
FF Response >> :
DTA Response >> 0101IR101L101TSingle RoomNMatuFZacharyURegular GuestD201805021347O2018050
I am replacing "*" with the ascii record separator.
Can someone show me how to correctly call the function using JNA? I've searched all over but still no success.
Solved IT. Used Unicode Field separator and used JNA Memory object and it Worked!
Was also using WIndows 10 64 bit. Changed to Windows 7 32 bit and it worked!!
Replaced the code with below snippet;
String fieldSeparator = "\u001e"
String dataTest = fieldSeparator+"R101"+fieldSeparator+"TSingle Room"+fieldSeparator+"FShujaa"+fieldSeparator+"NMatoke"
+ fieldSeparator+"URegular Guest"+fieldSeparator+"D201805040842"+fieldSeparator+"O201805051245";
String dataTestPadded = org.apache.commons.lang.StringUtils.rightPad(dataTest,30,'0');
System.out.println("Padded string >> " + dataTestPadded);
String data = dataTest;
//getPayloadToSend(payLoadSample) + (char)00;
String commandCode = "A";
Memory commandCodeMemory = new Memory(commandCode.length()+1);
commandCodeMemory.setString(0, commandCode);
Memory dataMemory = new Memory(data.length()+1);
dataMemory.setString(0, data);
//dataMemory.setString(1, "0");
System.out.println("Registerring >> " + INSTANCE.PMSifRegister("42860149", "BatchClient")) ;
INSTANCE.PMSifEncodeKcdLcl(commandCodeMemory, dataMemory, false, "ZKMATU", "zACHARY", "tESTING");
System.out.println("FF Response >> " + commandCodeMemory.getString(0));
System.out.println("DTA Response >> " + dataMemory.getString(0));
INSTANCE.PMSifUnregister();

how to convert digits to ascii value or characters and store in a string array

I am trying to generate a prime key from a phone number provided by the users on my application.
for example, user provides following phone number:
Phone number: 033232532523
Now, I want to generate some kind of key like converting those digits to alphabet, special characters or ascii value or kind of that, so that I could get a key something like this (dummy):
ab743kdhad$
e.g replacing 0 with a, getting ascii value of 3 and so on...
The code I am trying to get is something like this:
public class PrimeKeyGenerator {
public static void main( String[] args ) {
String phoneNumber = "123456342";
//could we convert the digits to characters or replace the digits with their ascii value?
String characters = convertNumToCharacters( phoneNumber );
System.out.println( "Generated Prime Key: " + characters );
}
private static String convertNumToCharacters(String phoneNumber) {
return null;
}}
You could convert the digits to a byte[] and then apply a SHA-1 hash and then Base64 encode the result. Something like,
private static String convertNumToCharacters(String phoneNumber) {
byte[] digits = new byte[phoneNumber.length()];
for (int i = 0; i < digits.length; i++) {
digits[i] = (byte) Character.digit(phoneNumber.charAt(i), 10);
}
try {
MessageDigest md = MessageDigest.getInstance("SHA1");
return Base64.getEncoder().encodeToString(md.digest(digits));
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
return null;
}
Which returns (with your input "123456342")
Generated Prime Key: wlwRLSZuhzMBn5Yw6RVfw+dwegM=
and (with my phone #)
Generated Prime Key: botMioqy/9B4tu/KvLv5Cc/Ykak=

how to detect base64 encoded strings? [duplicate]

I want to decode a Base64 encoded string, then store it in my database. If the input is not Base64 encoded, I need to throw an error.
How can I check if a string is Base64 encoded?
You can use the following regular expression to check if a string constitutes a valid base64 encoding:
^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$
In base64 encoding, the character set is [A-Z, a-z, 0-9, and + /]. If the rest length is less than 4, the string is padded with '=' characters.
^([A-Za-z0-9+/]{4})* means the string starts with 0 or more base64 groups.
([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$ means the string ends in one of three forms: [A-Za-z0-9+/]{4}, [A-Za-z0-9+/]{3}= or [A-Za-z0-9+/]{2}==.
If you are using Java, you can actually use commons-codec library
import org.apache.commons.codec.binary.Base64;
String stringToBeChecked = "...";
boolean isBase64 = Base64.isArrayByteBase64(stringToBeChecked.getBytes());
[UPDATE 1] Deprecation Notice
Use instead
Base64.isBase64(value);
/**
* Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the
* method treats whitespace as valid.
*
* #param arrayOctet
* byte array to test
* #return {#code true} if all bytes are valid characters in the Base64 alphabet or if the byte array is empty;
* {#code false}, otherwise
* #deprecated 1.5 Use {#link #isBase64(byte[])}, will be removed in 2.0.
*/
#Deprecated
public static boolean isArrayByteBase64(final byte[] arrayOctet) {
return isBase64(arrayOctet);
}
Well you can:
Check that the length is a multiple of 4 characters
Check that every character is in the set A-Z, a-z, 0-9, +, / except for padding at the end which is 0, 1 or 2 '=' characters
If you're expecting that it will be base64, then you can probably just use whatever library is available on your platform to try to decode it to a byte array, throwing an exception if it's not valid base 64. That depends on your platform, of course.
As of Java 8, you can simply use java.util.Base64 to try and decode the string:
String someString = "...";
Base64.Decoder decoder = Base64.getDecoder();
try {
decoder.decode(someString);
} catch(IllegalArgumentException iae) {
// That string wasn't valid.
}
Try like this for PHP5
//where $json is some data that can be base64 encoded
$json=some_data;
//this will check whether data is base64 encoded or not
if (base64_decode($json, true) == true)
{
echo "base64 encoded";
}
else
{
echo "not base64 encoded";
}
Use this for PHP7
//$string parameter can be base64 encoded or not
function is_base64_encoded($string){
//this will check if $string is base64 encoded and return true, if it is.
if (base64_decode($string, true) !== false){
return true;
}else{
return false;
}
}
var base64Rejex = /^(?:[A-Z0-9+\/]{4})*(?:[A-Z0-9+\/]{2}==|[A-Z0-9+\/]{3}=|[A-Z0-9+\/]{4})$/i;
var isBase64Valid = base64Rejex.test(base64Data); // base64Data is the base64 string
if (isBase64Valid) {
// true if base64 formate
console.log('It is base64');
} else {
// false if not in base64 formate
console.log('it is not in base64');
}
Try this:
public void checkForEncode(String string) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(string);
if (m.find()) {
System.out.println("true");
} else {
System.out.println("false");
}
}
It is impossible to check if a string is base64 encoded or not. It is only possible to validate if that string is of a base64 encoded string format, which would mean that it could be a string produced by base64 encoding (to check that, string could be validated against a regexp or a library could be used, many other answers to this question provide good ways to check this, so I won't go into details).
For example, string flow is a valid base64 encoded string. But it is impossible to know if it is just a simple string, an English word flow, or is it base 64 encoded string ~Z0
There are many variants of Base64, so consider just determining if your string resembles the varient you expect to handle. As such, you may need to adjust the regex below with respect to the index and padding characters (i.e. +, /, =).
class String
def resembles_base64?
self.length % 4 == 0 && self =~ /^[A-Za-z0-9+\/=]+\Z/
end
end
Usage:
raise 'the string does not resemble Base64' unless my_string.resembles_base64?
Check to see IF the string's length is a multiple of 4. Aftwerwards use this regex to make sure all characters in the string are base64 characters.
\A[a-zA-Z\d\/+]+={,2}\z
If the library you use adds a newline as a way of observing the 76 max chars per line rule, replace them with empty strings.
/^([A-Za-z0-9+\/]{4})*([A-Za-z0-9+\/]{4}|[A-Za-z0-9+\/]{3}=|[A-Za-z0-9+\/]{2}==)$/
this regular expression helped me identify the base64 in my application in rails, I only had one problem, it is that it recognizes the string "errorDescripcion", I generate an error, to solve it just validate the length of a string.
For Flutter, I tested couple of the above comments and translated that into dart function as follows
static bool isBase64(dynamic value) {
if (value.runtimeType == String){
final RegExp rx = RegExp(r'^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$',
multiLine: true,
unicode: true,
);
final bool isBase64Valid = rx.hasMatch(value);
if (isBase64Valid == true) {return true;}
else {return false;}
}
else {return false;}
}
In Java below code worked for me:
public static boolean isBase64Encoded(String s) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s);
return m.find();
}
This works in Python:
import base64
def IsBase64(str):
try:
base64.b64decode(str)
return True
except Exception as e:
return False
if IsBase64("ABC"):
print("ABC is Base64-encoded and its result after decoding is: " + str(base64.b64decode("ABC")).replace("b'", "").replace("'", ""))
else:
print("ABC is NOT Base64-encoded.")
if IsBase64("QUJD"):
print("QUJD is Base64-encoded and its result after decoding is: " + str(base64.b64decode("QUJD")).replace("b'", "").replace("'", ""))
else:
print("QUJD is NOT Base64-encoded.")
Summary: IsBase64("string here") returns true if string here is Base64-encoded, and it returns false if string here was NOT Base64-encoded.
C#
This is performing great:
static readonly Regex _base64RegexPattern = new Regex(BASE64_REGEX_STRING, RegexOptions.Compiled);
private const String BASE64_REGEX_STRING = #"^[a-zA-Z0-9\+/]*={0,3}$";
private static bool IsBase64(this String base64String)
{
var rs = (!string.IsNullOrEmpty(base64String) && !string.IsNullOrWhiteSpace(base64String) && base64String.Length != 0 && base64String.Length % 4 == 0 && !base64String.Contains(" ") && !base64String.Contains("\t") && !base64String.Contains("\r") && !base64String.Contains("\n")) && (base64String.Length % 4 == 0 && _base64RegexPattern.Match(base64String, 0).Success);
return rs;
}
There is no way to distinct string and base64 encoded, except the string in your system has some specific limitation or identification.
This snippet may be useful when you know the length of the original content (e.g. a checksum). It checks that encoded form has the correct length.
public static boolean isValidBase64( final int initialLength, final String string ) {
final int padding ;
final String regexEnd ;
switch( ( initialLength ) % 3 ) {
case 1 :
padding = 2 ;
regexEnd = "==" ;
break ;
case 2 :
padding = 1 ;
regexEnd = "=" ;
break ;
default :
padding = 0 ;
regexEnd = "" ;
}
final int encodedLength = ( ( ( initialLength / 3 ) + ( padding > 0 ? 1 : 0 ) ) * 4 ) ;
final String regex = "[a-zA-Z0-9/\\+]{" + ( encodedLength - padding ) + "}" + regexEnd ;
return Pattern.compile( regex ).matcher( string ).matches() ;
}
If the RegEx does not work and you know the format style of the original string, you can reverse the logic, by regexing for this format.
For example I work with base64 encoded xml files and just check if the file contains valid xml markup. If it does not I can assume, that it's base64 decoded. This is not very dynamic but works fine for my small application.
This works in Python:
def is_base64(string):
if len(string) % 4 == 0 and re.test('^[A-Za-z0-9+\/=]+\Z', string):
return(True)
else:
return(False)
Try this using a previously mentioned regex:
String regex = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
if("TXkgdGVzdCBzdHJpbmc/".matches(regex)){
System.out.println("it's a Base64");
}
...We can also make a simple validation like, if it has spaces it cannot be Base64:
String myString = "Hello World";
if(myString.contains(" ")){
System.out.println("Not B64");
}else{
System.out.println("Could be B64 encoded, since it has no spaces");
}
if when decoding we get a string with ASCII characters, then the string was
not encoded
(RoR) ruby solution:
def encoded?(str)
Base64.decode64(str.downcase).scan(/[^[:ascii:]]/).count.zero?
end
def decoded?(str)
Base64.decode64(str.downcase).scan(/[^[:ascii:]]/).count > 0
end
Function Check_If_Base64(ByVal msgFile As String) As Boolean
Dim I As Long
Dim Buffer As String
Dim Car As String
Check_If_Base64 = True
Buffer = Leggi_File(msgFile)
Buffer = Replace(Buffer, vbCrLf, "")
For I = 1 To Len(Buffer)
Car = Mid(Buffer, I, 1)
If (Car < "A" Or Car > "Z") _
And (Car < "a" Or Car > "z") _
And (Car < "0" Or Car > "9") _
And (Car <> "+" And Car <> "/" And Car <> "=") Then
Check_If_Base64 = False
Exit For
End If
Next I
End Function
Function Leggi_File(PathAndFileName As String) As String
Dim FF As Integer
FF = FreeFile()
Open PathAndFileName For Binary As #FF
Leggi_File = Input(LOF(FF), #FF)
Close #FF
End Function
import java.util.Base64;
public static String encodeBase64(String s) {
return Base64.getEncoder().encodeToString(s.getBytes());
}
public static String decodeBase64(String s) {
try {
if (isBase64(s)) {
return new String(Base64.getDecoder().decode(s));
} else {
return s;
}
} catch (Exception e) {
return s;
}
}
public static boolean isBase64(String s) {
String pattern = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s);
return m.find();
}
For Java flavour I actually use the following regex:
"([A-Za-z0-9+]{4})*([A-Za-z0-9+]{3}=|[A-Za-z0-9+]{2}(==){0,2})?"
This also have the == as optional in some cases.
Best!
I try to use this, yes this one it's working
^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$
but I added on the condition to check at least the end of the character is =
string.lastIndexOf("=") >= 0

JIS X 0208 conversion: how to handle unified (merged) codepoints

I'm trying to convert Java characters to JIS X 0208 "x-JIS0208" encoding (or any compatible, like EUC-JP, but not Shift-JIS), but I want unified (merged) codepoints to be handled correctly.
For example, 高 is assigned to row 25 column 66 in this JISX0208 chart, and a look-alike character 髙, while classified as an unassigned codepoint, is merged with the former. I quote from wikipedia: "both the form [ ] (高) and the less common form with a ladder-like construction (髙) are subsumed into the same code point".
I tried this in code the code below, and whatever encoding I try, I always get either an exception or the unassigned-character-placeholder ? (either ASCII or full-width).
Is there a way, perhaps a different endoding or an entirely different way of converting, so both these characters return the same codepoint? Alternatively, is there an API to find such characters so I can merge them before converting?
static Charset charset1 = Charset.forName("x-JIS0208");
static Charset charset2 = Charset.forName("EUC-JP");
static Charset[] charsets = {charset1, charset2};
static CharBuffer in = CharBuffer.allocate(1);
public static void main(String[] args) throws Exception
{
CharsetEncoder[] encoders = new CharsetEncoder[charsets.length];
for (int i = 0; i < charsets.length; i++)
encoders[i] = charsets[i].newEncoder();
char[] testChars = {' ', 'A', '?', '亜', '唖', '蔭', '高', '髙'};
for (char ch : testChars)
{
System.out.print("'" + ch + "'\t(" + Integer.toHexString(ch) + ")\t=");
for (int i = 0; i < charsets.length; i++)
{
System.out.print("\t" + interpret(encode1(encoders[i], ch)));
System.out.print("\t" + interpret(encode2(charsets[i], ch)));
}
System.out.println();
}
}
private static String interpret(int i)
{
if (i == -1)
return "excepti";
if (i < 0x80)
return "'" + (char)i + "'";
return Integer.toHexString(i);
}
private static int encode1(CharsetEncoder encoder, char ch)
{
in.rewind();
in.put(ch);
in.rewind();
try
{
ByteBuffer out = encoder.encode(in);
if (out.limit() == 1)
return out.get(0) & 0xFF;
return out.get(1) & 0xFF | (out.get(0) & 0xFF) << 8;
}
catch (CharacterCodingException e)
{
return -1;
}
}
private static int encode2(Charset charset, char ch)
{
in.rewind();
in.put(ch);
in.rewind();
ByteBuffer out = charset.encode(in);
if (out.limit() == 1)
return out.get(0) & 0xFF;
return out.get(1) & 0xFF | (out.get(0) & 0xFF) << 8;
}
The output:
' ' (3000) = 2121 2121 a1a1 a1a1
'A' (ff21) = 2341 2341 a3c1 a3c1
'?' (ff1f) = 2129 2129 a1a9 a1a9
'亜' (4e9c) = 3021 3021 b0a1 b0a1
'唖' (5516) = 3022 3022 b0a2 b0a2
'蔭' (852d) = 307e 307e b0fe b0fe
'高' (9ad8) = 3962 3962 b9e2 b9e2
'髙' (9ad9) = excepti 2129 excepti '?'
Note: I'm only interested in converting single characters, lots of them, not strings or streams, so I actually prefer a different method (if exists) that doesn't allocate a ByteBuffer every conversion.
髙 is not contained in JIS X 0208, but is containd in Microsoft Windows code page 932 (MS932). This is a variant of Shift JIS encoding, and is a superset of JIS X 0208 charset.
You should use the name "Windows-31j" for MS932, like:
Charset.forName("Windows-31j");
rather than Charset.forName("x-JIS0208");.
EDIT
The mapping table for some characters like 𨦇 and 鋏 (scissors) is distributed from the government of Japan, like National Tax Agency (see JIS縮退マップ(Ver.1.0.0)) .
But these mapping tables don't contain the character 髙. I think this is because 髙 is not contained in JIS X 0208 nor JIS X 0213.
So, I think you will have to replace 髙 with 高 manually (with String#replaceAll()), or make your own custom Charset with CharsetProvider.
I only knew that in the spec "ARIB STD-B24" (for ISDB-T 1seg in JP), this character is coding with DRCS pattern data, from DRCS-1 to DRCS-15, and each set consists of 94
characters.

String to binary output in Java

I want to get binary (011001..) from a String but instead i get [B#addbf1 , there must be an easy transformation to do this but I don't see it.
public static String toBin(String info){
byte[] infoBin = null;
try {
infoBin = info.getBytes( "UTF-8" );
System.out.println("infoBin: "+infoBin);
}
catch (Exception e){
System.out.println(e.toString());
}
return infoBin.toString();
}
Here i get infoBin: [B#addbf1
and I would like infoBin: 01001...
Any help would be appreciated, thanks!
Only Integer has a method to convert to binary string representation check this out:
import java.io.UnsupportedEncodingException;
public class TestBin {
public static void main(String[] args) throws UnsupportedEncodingException {
byte[] infoBin = null;
infoBin = "this is plain text".getBytes("UTF-8");
for (byte b : infoBin) {
System.out.println("c:" + (char) b + "-> "
+ Integer.toBinaryString(b));
}
}
}
would print:
c:t-> 1110100
c:h-> 1101000
c:i-> 1101001
c:s-> 1110011
c: -> 100000
c:i-> 1101001
c:s-> 1110011
c: -> 100000
c:p-> 1110000
c:l-> 1101100
c:a-> 1100001
c:i-> 1101001
c:n-> 1101110
c: -> 100000
c:t-> 1110100
c:e-> 1100101
c:x-> 1111000
c:t-> 1110100
Padding:
String bin = Integer.toBinaryString(b);
if ( bin.length() < 8 )
bin = "0" + bin;
Arrays do not have a sensible toString override, so they use the default object notation.
Change your last line to
return Arrays.toString(infoBin);
and you'll get the expected output.
When you try to use + with an object in a string context the java compiler silently inserts a call to the toString() method.
In other words your statements look like
System.out.println("infobin: " + infoBin.toString())
which in this case is the one inherited from Object.
You will need to use a for-loop to pick out each byte from the byte array.

Categories

Resources