How to convert next character pair to a hex integer - java

I have a string where I first have to reverse each pair of characters and then convert that pair to a hex integer. I am trying to convert it as shown below, but it gives an error. Any idea how to do that? Thanks in advance.
Here is the complete string: http://pastebin.com/1cSCyD78
Sample string
String str = "031890";
Error Message :
java.lang.NumberFormatException: Invalid int: "0x30"
Java Code
for ( int start = 0; start < str.length(); start += 2 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+2)).reverse().toString();
        thisByte = "0x" + thisByte;
        int value = Integer.parseInt(thisByte, 16);
        char c = (char) value;
        System.out.println(c);
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
Update
StringBuilder output = new StringBuilder();
for ( int start = 0; start < str.length(); start += 2 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+2)).reverse().toString();
        output.append((char)Integer.parseInt(thisByte, 16));
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
Yes, I tried without prepending "0x" to the string, and now my output looks weird.

Your string looks like you need 4 digits from your string per character, not 2.
Given that you interpret 2 digits as a character, though, at first glance, the output you showed in the picture does seem to match the string you posted on pastebin. You do get things that look like words in the output, so it's not totally off, and the gaps between the letters come from every second pair of 2 digits being '00'.
Not sure where this string came from, but if it was also generated by converting characters in some String to Bytes, it might make sense that it's 4 digits per character, since, for example, Java's chars are 16 bits (i.e. 2 bytes, i.e. 4 digits in your String) that encode the actual unicode symbol they represent in UTF-16.
If you are working off specs that someone else provided you with, maybe when they said "2 BYTES", they actually meant "two 8-bit numbers", which correspond to 4 digits (four 4-bit nibbles) in your hex string.
But your string looks like it contains binary data as well, not just characters. Do you know what you are actually expecting to see as the answer?
Update (as per comment request):
It's a trivial change to your code, but here it is:
StringBuilder output = new StringBuilder();
for ( int start = 0; start < str.length(); start += 4 ) {
    try {
        String thisByte = new StringBuilder(str.substring(start, start+4)).reverse().toString();
        output.append((char)Integer.parseInt(thisByte, 16));
    } catch(Exception e) {
        Log.e("MainActivity", e.getMessage());
    }
}
All I did was replace "2" with "4". :)
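To see the difference between 2 and 4 digits per character without the full pastebin data, here is a minimal sketch with the group size as a parameter. The class name HexChunkDecoder and the sample input are made up for illustration; the input is not taken from the OP's string.

```java
class HexChunkDecoder {
    // Reverses each group of digitsPerChar hex digits, parses the group as a
    // base-16 number and appends the resulting char. Trailing digits that do
    // not fill a whole group are ignored.
    static String decode(String hex, int digitsPerChar) {
        StringBuilder output = new StringBuilder();
        for (int start = 0; start + digitsPerChar <= hex.length(); start += digitsPerChar) {
            String group = new StringBuilder(hex.substring(start, start + digitsPerChar))
                    .reverse()
                    .toString();
            output.append((char) Integer.parseInt(group, 16));
        }
        return output.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: "8400" reversed is "0048" ('H'),
        // "9600" reversed is "0069" ('i').
        System.out.println(decode("84009600", 4)); // Hi
    }
}
```

With a group size of 2, the same input yields 'H' and 'i' each followed by a NUL character, which is the kind of gap between letters described above.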
Update (as per chat):
The code posted here to convert the hex-string into characters (using 4 digits per character) seems to work fine, but the hex-string does not seem to follow the convention the OP is expecting based on the specifications of the data, which caused part of the confusion.
A side-note:
If this is a public application, it is highly risky to include unencrypted SQL statements in the network traffic. If these statements are part of a request and get executed on the server, a hacker can use this to perform unwanted operations on the underlying data (e.g. stealing all phone numbers in the database). If it is merely some debug-/log-information sent to the client, it's still not a good idea as it may give hints to a hacker about the structure of your database and the way you access it, significantly simplifying a potential SQL injection attack.

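Try it without prepending "0x" to the string. That prefix is only meaningful to the compiler in source-code literals, where it is shorthand for radix 16. Integer.parseInt(thisByte, 16) already receives the radix as its second argument and rejects the prefix.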
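A small sketch of the two options, assuming you either strip the prefix or switch to a method that understands it. The class and method names are only for illustration; Integer.parseInt and Integer.decode are the standard library calls.

```java
class HexPrefixDemo {
    // Parses plain hex digits such as "30"; a "0x" prefix is not allowed here.
    static int parsePlainHex(String digits) {
        return Integer.parseInt(digits, 16);
    }

    // Integer.decode accepts the "0x", "0X" and "#" prefixes for hex input.
    static int parsePrefixedHex(String text) {
        return Integer.decode(text);
    }

    public static void main(String[] args) {
        System.out.println(parsePlainHex("30"));        // 48
        System.out.println(parsePrefixedHex("0x30"));   // 48
        System.out.println((char) parsePlainHex("30")); // 0
    }
}
```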

Related

How to convert special characters in a string to unicode?

I couldn't find an answer to this problem, having tried several answers here combined to find something that works, to no avail.
An application I'm working on uses a user's name to create PDFs with that name in it. However, when someone's name contains a special character, as in "Yağmur", the PDF creator freaks out and omits this special character.
However, when it gets the Unicode equivalent ("Ya&#287;mur"), it prints "Yağmur" in the PDF as it should.
How do I check a name/string for any special character (regex = "[^a-z0-9 ]") and, when one is found, replace that character with its Unicode equivalent, returning the new string?
I will try to give a generic solution, as the framework you are using is not mentioned in your problem statement.
I faced the same kind of issue a long time back. This should be handled by the PDF engine if you set the text/char encoding to UTF-8. Please find out how to set the encoding in your framework for PDF generation and try it out. Hope it helps!
One hackish way to do this would be as follows:
/*
 * TODO: poorly named
 */
public static String convertUnicodePoints(String input) {
    // getting char array from input
    char[] chars = input.toCharArray();
    // initializing output
    StringBuilder sb = new StringBuilder();
    // iterating input chars
    for (int i = 0; i < input.length(); i++) {
        // checking character code point to infer whether "conversion" is required
        // here, picking an arbitrary code point 125 as boundary
        if (Character.codePointAt(input, i) < 125) {
            sb.append(chars[i]);
        }
        // need to "convert", code point >= boundary
        else {
            // for hex representation: prepends as many 0s as required
            // to get a hex string of the char code point, 4 characters long
            // sb.append(String.format("&#x%04X;", (int)chars[i]));
            // for decimal representation, which is what you want here
            sb.append(String.format("&#%d;", (int)chars[i]));
        }
    }
    return sb.toString();
}
If you execute: System.out.println(convertUnicodePoints("Yağmur"));...
... you'll get: Ya&#287;mur.
Of course, you can play with the "conversion" logic and decide which ranges get converted.
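As one way of playing with that logic, here is a sketch that walks code points instead of chars, so characters outside the Basic Multilingual Plane come out as a single reference rather than two surrogate halves. The class name, the method name and the maxPlain parameter are assumptions for illustration, not part of the answer above.

```java
class NumericEntityEncoder {
    // Replaces every code point above maxPlain with a decimal numeric
    // character reference such as &#287; and copies the rest unchanged.
    static String encode(String input, int maxPlain) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(cp -> {
            if (cp <= maxPlain) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The letter g-with-breve is written as a Unicode escape so the
        // example does not depend on the source file encoding.
        System.out.println(encode("Ya\u011Fmur", 127)); // Ya&#287;mur
    }
}
```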

How to build the longest String with different Unicode characters

Thanks in advance for your patience. This is my problem.
I'm writing a program in Java that works best with a big set of different characters.
I have to store all the characters in a String. I started with
private static final String values = "0123456789";
Then I added A-Z, a-z and all the commons symbols.
But they are still too few, so I thought that maybe Unicode could be the solution.
The problem is now: what is the best way to get all the unicode characters that can be displayed in Eclipse (my algorithm will probably fail if there are unrecognized characters - those displayed like little rectangles). Is it possible to build a string (or some strings) with all the characters present here (en.wikipedia.org/wiki/List_of_Unicode_characters) correctly displayed?
I can do a rough copy-paste from http://www.terena.org/activities/multiling/euroml/tests/test-ucspages1ucs.html or http://zenoplex.jp/tools/unicoderange_generator.html, but I would appreciate some cleaner solution.
I don't know if there is a way to extract characters from a font (the Unifont one). Or maybe I should parse this (www. utf8-chartable.de/unicode-utf8-table.pl) webpage.
Moreover, by adding all the characters into a String I will probably get the error:
"The type generates a string that requires more than 65535 bytes to encode in Utf8 format in the constant pool" (discussed in this question on SO: /questions/10798769/how-to-process-a-string-with-823237-characters).
Hybrid solutions can be accepted. I can remove duplicates following this question on SO (questions/4989091/removing-duplicates-from-a-string-in-java).
Finally: every solution to get the longest only-different-characters string is accepted.
Thanks!
You are mixing some things up. The question whether a character can be displayed in Eclipse depends on the font you have chosen; and whether the source file can be processed correctly depends on which character encoding you have set up for the source file. When choosing UTF-8 and a good unicode font you can use and display almost any character, at least more than fit into a single String literal.
But is it really required to show the character in Eclipse? You can use the unicode escapes, e.g. \u20ac to refer to characters, regardless of whether they can be displayed or if the file encoding can handle them.
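A tiny sketch of such an escape in use; the class name is only for illustration. The literal below contains the euro sign even if the editor font or the file encoding cannot represent it.

```java
class UnicodeEscapeDemo {
    // The compiler translates the escape into the single char U+20AC.
    static final String EURO = "\u20ac";

    public static void main(String[] args) {
        System.out.println(EURO.length());                          // 1
        System.out.println(Integer.toHexString(EURO.codePointAt(0))); // 20ac
    }
}
```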
And if it is not a requirement to blow up your source code, it’s easy to create a String containing all existing characters:
// all chars (i.e. UTF-16 values)
StringBuilder sb=new StringBuilder(Character.MAX_VALUE);
for(char c=0; c<Character.MAX_VALUE; c++) sb.append(c);
String s=sb.toString();
// if it should behave like a compile-time constant:
s=s.intern();
or
// all unicode characters (aka code points)
StringBuilder sb=new StringBuilder(2162686);
for(int c=0; c<Character.MAX_CODE_POINT; c++) sb.appendCodePoint(c);
String s=sb.toString();
// if it should behave like a compile-time constant:
s=s.intern();
If you want the String to contain valid Unicode characters only, you can use if(Character.isDefined(c)) … inside the loop. But that's a moving target: newer JREs will most probably know more defined characters.
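A sketch of that filter over a range of code points. The class and method names are made up for illustration, and skipping surrogate code points is my own assumption about what is wanted, since they are not characters on their own. Note that isDefined still lets control and private-use characters through, so this does not guarantee that everything is displayable.

```java
class DefinedCodePoints {
    // Collects every code point in [from, to] that this JRE reports as
    // defined, skipping the surrogate range.
    static String collect(int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int cp = from; cp <= to; cp++) {
            boolean surrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (!surrogate && Character.isDefined(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String all = collect(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT);
        // The count depends on the Unicode version the running JRE supports.
        System.out.println(all.codePointCount(0, all.length()) + " defined code points");
    }
}
```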
Simply use the Apache Commons Lang class org.apache.commons.lang3.RandomStringUtils; it can solve your purpose.
http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/RandomStringUtils.html
Also, please refer to the code below for API usage:
import org.apache.commons.lang3.RandomStringUtils;

public class RandomString {
    public static void main(String[] args) {
        // Random string only with numbers
        String string = RandomStringUtils.random(64, false, true);
        System.out.println("Random 0 = " + string);
        // Random alphabetic string
        string = RandomStringUtils.randomAlphabetic(64);
        System.out.println("Random 1 = " + string);
        // Random ASCII string
        string = RandomStringUtils.randomAscii(32);
        System.out.println("Random 2 = " + string);
        // Create a random string with indexes from the given array of chars
        string = RandomStringUtils.random(32, 0, 20, true, true, "bj81G5RDED3DC6142kasok".toCharArray());
        System.out.println("Random 3 = " + string);
    }
}

How do I recognize a character such as "ç" as a letter?

I have an array of bytes that contains a sentence. I need to convert the lowercase letters on this sentence into uppercase letters. Here is the function that I did:
public void CharUpperBuffAJava(byte[] word) {
    for (int i = 0; i < word.length; i++) {
        if (!Character.isUpperCase(word[i]) && Character.isLetter(word[i])) {
            word[i] -= 32;
        }
    }
}
It will work fine with sentences like: "a glass of water". The problem is it must work with all ANSI characters, which includes "ç,á,é,í,ó,ú" and so on. The method Character.isLetter doesn't work with these letters and, therefore, they are not converted into uppercase.
Do you know how can I identify these ANSI characters as a letter in Java?
EDIT
If someone wants to know, I rewrote the method after the answers and now it looks like this:
public static int CharUpperBuffAJava(byte[] lpsz, int cchLength) {
    String value;
    try {
        value = new String(lpsz, 0, cchLength, "Windows-1252");
        String upperCase = value.toUpperCase();
        // Encode with the same charset that was used for decoding,
        // not the platform default.
        byte[] bytes = upperCase.getBytes("Windows-1252");
        for (int i = 0; i < cchLength; i++) {
            lpsz[i] = bytes[i];
        }
        return cchLength;
    } catch (UnsupportedEncodingException e) {
        return 0;
    }
}
You need to "decode" the byte[] into a character string. There are several APIs to do this, but you must specify the character encoding that is used for the bytes. The overloaded versions that don't take an encoding will give different results on different machines, because they use the platform default.
For example, if you determine that the bytes were encoded with Windows-1252 (sometimes referred to as ANSI):
String s = new String(bytes, "Windows-1252");
String upper = s.toUpperCase();
Convert the byte array into a string, specifying the encoding. Then call toUpperCase(). Then, you can call getBytes() on the string, again with that encoding, if you need it as a byte array after capitalizing.
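The round trip described in these answers could look like the sketch below. The class and method names are hypothetical, and it assumes the JRE ships the windows-1252 charset, which is not among the charsets every Java platform is required to support. Using Locale.ROOT is also my choice, to keep the result independent of the default locale.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class Cp1252Upper {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decodes the bytes as Windows-1252, upper-cases the text and encodes it
    // again with the same charset. The result can be longer than the input,
    // for example the sharp s (0xDF) becomes "SS".
    static byte[] toUpper(byte[] input) {
        String text = new String(input, CP1252);
        return text.toUpperCase(Locale.ROOT).getBytes(CP1252);
    }

    public static void main(String[] args) {
        byte[] in = { (byte) 0xE7, 'a' }; // c-cedilla followed by 'a'
        byte[] out = toUpper(in);
        System.out.printf("%02X %02X%n", out[0], out[1]); // C7 41
    }
}
```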
Can't you simply use:
String s = new String(bytes, "cp1252");
String upper = s.toUpperCase(someLocale);
Wouldn't changing the character set do the trick before conversion? The internal conversion logic of Java might work fine. Something like http://www.exampledepot.com/egs/java.nio.charset/ConvertChar.html, but use ASCII as the target character set.
I am looking at this table:
http://slayeroffice.com/tools/ascii/
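Judging by that table, anything from 224 upward appears to be a letter (except 247, the division sign), and to make it upper case you would subtract 32 from the character's value (255, ÿ, is a further exception).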

JAVA: Space delimiting all non-numerical characters in a String

I am having some trouble with modifying Strings to be space delimited under the special case of adding spaces to all non-numerical characters.
My code must take a string representing a math equation and split it up into its individual parts. It does so using space delimiters between values. This part works great if the string is already delimited.
The problem is that I do not always get a space delimited input. To deal with this, I want to first insert these spaces so that the array is created properly.
What my code must do is take any character that is NOT a number, and add a space before and after it.
Something like this:
3*24+321 becomes 3 * 24 + 321
or
((3.0)*(2.5)) becomes ( ( 3.0 ) * ( 2.5 ) )
Obviously I need to avoid inserting spaces inside the numbers, or 2.5 becomes 2 . 5 and then gets entered into the array as 3 elements, which it is not.
So far, I have tried using
String InputLineDelmit = InputLine.replaceAll("\\B", " ");
which successfully changes a string of all letters "abcd" to "a b c d"
But it makes mistakes when it runs into numbers. Using this method, I have gotten that:
(((1)*(2))) becomes ( ( (1) * (2) ) ) ---- * The numbers must be separate from parens
12.7+3.1 becomes 1 2.7+3.1 ----- * 12.7 is split
51/3 becomes 5 1/3 ----- * same issue
and 5*4-2 does not change at all.
So, I know that \D can be used as a regular expression for all non-digits in Java. However, my attempts to implement this (by replacing \B above with it, or combining the two) have led either to compiler errors or to it REPLACING the char with a space, not adding one.
EDIT:
==== Answered! ====
It won't let me add my own answer because I'm new, but an edit to neo108's code below (which, itself, does not work) did the job. What I did was change it to check isDigit, not isLetter, and then do nothing in that case (or in the special case of a decimal point, for doubles). Otherwise, the character is changed to have spaces on either side.
public static void main(String[] args){
    String formula = "12+((13.0)*(2.5)-17*2)+(100/3)-7";
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < formula.length(); i++){
        char c = formula.charAt(i);
        char cdot = '.';
        if(Character.isDigit(c) || c == cdot) {
            builder.append(c);
        }
        else {
            builder.append(" "+c+" ");
        }
    }
    System.out.println("OUTPUT:" + builder);
}
OUTPUT: 12 + ( ( 13.0 ) * ( 2.5 ) - 17 * 2 ) + ( 100 / 3 ) - 7
However, any ideas on how to do this more succinctly, and also a decent explanation of StringBuilders, would be appreciated. Namely what is with this limit of 16 chars that I read about on javadocs, as the example above shows that you CAN have more output.
Something like this should work...
String formula = "Ab((3.0)*(2.5))";
StringBuilder builder = new StringBuilder();
for (int i = 0; i < formula.length(); i++){
    char c = formula.charAt(i);
    if(Character.isLetter(c)) {
        builder.append(" "+c+" ");
    } else {
        builder.append(c);
    }
}
Define the operations in your math equation + - * / () etc
Convert your equation string to char[]
Traverse through the char[] one char at a time and append the read char to a StringBuilder object.
If you encounter any character matching one of the operations defined, then add a space before and after that character and append this to the StringBuilder object.
Well, this is one of the algorithms you can implement. There might be other ways of doing it as well.
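The steps above can be sketched as follows. The class name, the operator set "+-*/()" and the final whitespace cleanup are my assumptions for illustration. A unary minus is not treated specially, so "-7" would come out as "- 7".

```java
class FormulaTokenizer {
    // The operations of the equation; extend this string as needed.
    private static final String OPERATORS = "+-*/()";

    // Puts a space around every operator character, then collapses runs of
    // whitespace so tokens are separated by exactly one space.
    static String spaceOut(String formula) {
        StringBuilder builder = new StringBuilder();
        for (char c : formula.toCharArray()) {
            if (OPERATORS.indexOf(c) >= 0) {
                builder.append(' ').append(c).append(' ');
            } else {
                builder.append(c);
            }
        }
        return builder.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("3*24+321"));      // 3 * 24 + 321
        System.out.println(spaceOut("((3.0)*(2.5))")); // ( ( 3.0 ) * ( 2.5 ) )
    }
}
```

Calling split(" ") on the result then gives one array element per number or operator.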

Checking if a character is an integer or letter

I am modifying a file using Java. Here's what I want to accomplish:
if an & symbol, along with an integer, is detected while being read, I want to drop the & symbol and translate the integer to binary.
if an & symbol, along with a (random) word, is detected while being read, I want to drop the & symbol and replace the word with the integer 16, and if a different string of characters is being used along with the & symbol, I want to set the number 1 higher than integer 16.
Here's an example of what I mean. If a file is inputted containing these strings:
&myword
&4
&anotherword
&9
&yetanotherword
&10
&myword
The output should be:
&0000000000010000 (which is 16 in decimal)
&0000000000000100 (or the number '4' in decimal)
&0000000000010001 (which is 17 in decimal, since 16 is already used, so 16+1=17)
&0000000000000101 (or the number '9' in decimal)
&0000000000010001 (which is 18 in decimal, or 17+1=18)
&0000000000000110 (or the number '10' in decimal)
&0000000000010000 (which is 16 because value of myword = 16)
Here's what I tried so far, but haven't succeeded yet:
for (i=0; i<anyLines.length; i++) {
char[] charray = anyLines[i].toCharArray();
for (int j=0; j<charray.length; j++)
if (Character.isDigit(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
anyLines[i] = Integer.toBinaryString(Integer.parseInt(anyLines[i]);
}
else {
continue;
}
if (Character.isLetter(charray[j])) {
anyLines[i] = anyLines[i].replace("&","");
for (int k=16; j<charray.length; k++) {
anyLines[i] = Integer.toBinaryString(Integer.parseInt(k);
}
}
}
}
I hope that I am articulate enough. Any suggestions on how to accomplish this task?
Character.isLetter() //tests whether the character is a letter
Character.isDigit() //tests whether the character is a digit
It looks like something you could match against a regex. I don't know Java but you should have at least one regex engine at your disposal. Then the regex would be:
regex1: &(\d+)
and
regex2: &(\w+)
or
regex3: &(\d+|\w+)
In the first case, if regex1 matches, you know you ran into a number, and that number is in the first capturing group (e.g. match.group(1)). If regex2 matches, you know you have a word. You can then look that word up in a dictionary and see what its associated number is, or, if it is not present, add it to the dictionary and associate it with the next free number (16 + the dictionary size before the insertion).
regex3 on the other hand will match both numbers and words, so it's up to you to see what's in the capturing group (it's just a different approach).
If neither of the regex match, then you have an invalid sequence, or you need some other action. Note that \w in a regex only matches word characters (ie: letters, _ and possibly a few other characters), so &çSomeWord or &*SomeWord won't match at all, while the captured group in &Hello.World would be just "Hello".
Regex libs usually provide a length for the matched text, so you can move i forward by that much in order to skip already matched text.
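In Java, the regex3 approach might look like the sketch below, using java.util.regex. The class name, the method name, the -1 return value for invalid input and the constant FIRST_WORD_VALUE are made up for illustration. The map plays the role of the dictionary.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandTokens {
    private static final Pattern TOKEN = Pattern.compile("&(\\d+|\\w+)");
    private static final int FIRST_WORD_VALUE = 16;

    // Returns the number for one "&..." token, or -1 if the token does not
    // match the pattern. New words are registered in the given map.
    static int valueOf(String token, Map<String, Integer> words) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            return -1;
        }
        String body = m.group(1);
        if (body.matches("\\d+")) {
            return Integer.parseInt(body);
        }
        Integer known = words.get(body);
        if (known == null) {
            // Next free number: 16 plus the words seen so far.
            known = FIRST_WORD_VALUE + words.size();
            words.put(body, known);
        }
        return known;
    }

    public static void main(String[] args) {
        Map<String, Integer> words = new HashMap<>();
        for (String token : new String[] { "&myword", "&4", "&anotherword", "&myword" }) {
            System.out.println(token + " -> " + valueOf(token, words));
        }
    }
}
```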
You have to somehow tokenize your input. It seems you are splitting it into lines and then analyzing each line individually. If this is what you want, okay. If not, you could simply search for & (indexOf('&')) and then somehow determine what the next token is (either a number or a "word", however you want to define word).
What do you want to do with input which does not match your pattern? Neither the description of the task nor the example really covers this.
You need to have a dictionary of already read strings. Use a Map<String, Integer>.
I would post this as a comment, but don't have the ability yet. What is the issue you are running into? Error? Incorrect Results? 16's not being correctly incremented? Also, the examples use a '%' but in your description you say it should start with a '&'.
Edit2: Was thinking it was line by line, but re-reading indicates you could be trying to find say "I went to the &store" and want it to say "I went to the &000010000". So you would want to split by whitespace and then iterate through and pass the strings into your 'replace' method, which is similar to below.
Edit1: If I understand what you are trying to do, code like this should work.
Map<String, Integer> usedWords = new HashMap<String, Integer>();
List<String> output = new ArrayList<String>();
int wordIncrementer = 16;
String[] arr = test.split("\n");
for(String s : arr)
{
    if(s.startsWith("&"))
    {
        String line = s.substring(1).trim(); //Removes &
        try
        {
            Integer lineInt = Integer.parseInt(line);
            output.add("&" + Integer.toBinaryString(lineInt));
        }
        catch(Exception e)
        {
            System.out.println("Line was not an integer. Parsing as a String.");
            String outputString = "&";
            if(usedWords.containsKey(line))
            {
                outputString += Integer.toBinaryString(usedWords.get(line));
            }
            else
            {
                outputString += Integer.toBinaryString(wordIncrementer);
                usedWords.put(line, wordIncrementer++);
            }
            output.add(outputString);
        }
    }
    else
    {
        continue; //Nothing indicating that we should parse the line.
    }
}
How about this?
String input = "&myword\n&4\n&anotherword\n&9\n&yetanotherword\n&10\n&myword";
String[] lines = input.split("\n");
int wordValue = 16;
// to keep track of words that are already used
Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
for (String line : lines) {
    // if line doesn't begin with &, then ignore it
    if (!line.startsWith("&")) {
        continue;
    }
    // remove &
    line = line.substring(1);
    Integer binaryValue = null;
    if (line.matches("\\d+")) {
        binaryValue = Integer.parseInt(line);
    }
    else if (line.matches("\\w+")) {
        binaryValue = wordValueMap.get(line);
        // if the map doesn't contain the word value, then assign and store it
        if (binaryValue == null) {
            binaryValue = wordValue;
            wordValueMap.put(line, binaryValue);
            wordValue++;
        }
    }
    else {
        // neither a number nor a word: skip it to avoid a NullPointerException below
        continue;
    }
    // I'm using Commons Lang's StringUtils.leftPad(..) to create the zero padded string
    String out = "&" + StringUtils.leftPad(Integer.toBinaryString(binaryValue), 16, "0");
    System.out.println(out);
}
Here's the printout:
&0000000000010000
&0000000000000100
&0000000000010001
&0000000000001001
&0000000000010010
&0000000000001010
&0000000000010000
Just FYI, the binary value for 10 is "1010", not "110" as stated in your original post.
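If pulling in Commons Lang only for the padding is not wanted, the zero padding can be done with the standard library. This is a minimal sketch; the class and method names are made up for illustration.

```java
class BinaryPad {
    // Left-pads the binary form of value with zeros up to width digits.
    // Values whose binary form is already longer are returned unpadded.
    static String toPaddedBinary(int value, int width) {
        String bits = Integer.toBinaryString(value);
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        System.out.println("&" + toPaddedBinary(16, 16)); // &0000000000010000
        System.out.println("&" + toPaddedBinary(10, 16)); // &0000000000001010
    }
}
```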
