replacing all cases of ISO Control characters in a string with "CTRL" - java

static String clean(String identifier) {
String firstString = "";
for (int i = 0; i < identifier.length(); i++)
if (Character.isISOControl(identifier.charAt(i))){
firstString = identifier.replaceAll(identifier.charAt(i),
"CTRL");
}
return firstString;
}
The logic behind the code above is to replace all instances of ISO Control characters in the string 'identifier' with "CTRL". I'm however faced with this error: "char cannot be converted to java.lang.String"
Can someone help me to solve and improve my code to produce the right output?

String#replaceAll expects a String as parameter, but it has to be a regular expression. Use String#replace instead.
EDIT: I hadn't noticed that you want to replace a character with a string. In that case, you can still use String#replace, but you need to convert the character to a String first, e.g. by using Character.toString.
Update
Example:
String text = "AB\003DE";
text = text.replace(Character.toString('\003'), "CTRL");
System.out.println(text);
// gives: ABCTRLDE
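Note that the replace call above only handles one known character. To cover every ISO control character, as the question asks, a minimal sketch (the class name ControlCharCleaner is just for illustration) could build the result character by character instead of calling replace inside the loop:

```java
class ControlCharCleaner {

    // Replaces every ISO control character in the input with the text "CTRL".
    static String clean(String identifier) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (Character.isISOControl(c)) {
                result.append("CTRL");
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("AB\003DE\tF")); // prints ABCTRLDECTRLF
    }
}
```

This also avoids the original problem of firstString being overwritten on every match.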

Code points, and Control Picture characters
I can add two points:
The char type is essentially broken since Java 2, and legacy since Java 5. Best to use code point integers when working with individual characters.
Unicode defines characters for display as placeholders for control characters. See the Control Pictures section of one Wikipedia page, and another page devoted to Control Pictures.
For example, the NULL character at code point 0 decimal has a matching SYMBOL FOR NULL character at 9,216 decimal: ␀. To see all the Control Picture characters, see the Control Pictures chart (PDF) of the Unicode standard specification.
Get an array of the code point integers representing each of the characters in your string.
int[] codePoints = myString.codePoints().toArray() ;
Loop those code points. Replace those of interest.
Here is some untested code.
int[] replacedCodePoints = new int[ codePoints.length ] ;
int index = 0 ;
for ( int codePoint : codePoints )
{
if( codePoint >= 0 && codePoint <= 32 ) // 32 is SPACE, so you may want to use 31 depending on your context.
{
replacedCodePoints[ index ] = codePoint + 9_216 ; // 9,216 is the offset to the beginning of the Control Picture character range defined in Unicode.
} else if ( codePoint == 127 ) // DEL character.
{
replacedCodePoints[ index ] = 9_249 ;
} else // Any other character, we keep as-is, no replacement.
{
replacedCodePoints[ index ] = codePoint ;
}
index ++ ; // Set up the next loop.
}
Convert code points back into text. Use StringBuilder#appendCodePoint to build up the characters of text. You can use the following stream-based code as boilerplate. For explanation, see this Question.
String result =
Arrays
.stream( replacedCodePoints )
.collect( StringBuilder::new , StringBuilder::appendCodePoint , StringBuilder::append )
.toString();
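The three steps above can probably be folded into one stream pipeline. This is only a sketch: the class and method names are made up, and it uses 31 rather than 32 as the upper bound so that SPACE is left alone.

```java
class ControlPictures {

    // Maps C0 control characters (0-31) and DEL (127) to their Control Picture symbols.
    static String toControlPictures(String input) {
        return input.codePoints()
                .map(cp -> {
                    if (cp >= 0 && cp <= 31) {
                        return cp + 0x2400; // 0x2400 is 9,216 decimal, SYMBOL FOR NULL
                    } else if (cp == 127) {
                        return 0x2421;      // 9,249 decimal, SYMBOL FOR DELETE
                    }
                    return cp;              // any other character is kept as-is
                })
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toControlPictures("AB\u0003DE\u007F"));
    }
}
```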

Related

How can I get all the characters after a certain number of digits in a string in Java?

My question is fairly simple. I want to get all the characters in a string after a certain number of digits. For example, if I entered in 123 InsertIGN, I want to know how I can get all the characters after 123, no matter how many of them are there. I want it to still work if I entered 123 VeryLongWordHereshfusihdisa. I would want to get everything AFTER 123. Is there any way I can go about doing that? Sorry for the dumb question, I stopped coding in Java for a while and recently have come back. Thanks in advance for any answers!
If "123" is always at the start of the string, you can simply use substring:
String testCase = "123ABCD!##$";
String result = testCase.substring("123".length());
Or, if "123" can appear anywhere in the string and you still want all the characters after it, we can adapt it to
String testCase = "ABCD123!##$";
int index123 = testCase.indexOf("123") + "123".length();
String result = testCase.substring(index123);
it will return
result = !##$
indexOf gives the position where the first "123" starts. To get the position just past it, we add the length of "123" and store that in index123.
Let's try using another test case
String testCase = "ABCD123!##$123XYZ";
will return
result = !##$123XYZ
Because we only process the first "123"
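One thing the snippets above don't cover is a string that doesn't contain "123" at all: indexOf then returns -1 and the substring starts in the wrong place. A small sketch that guards against that (afterFirst is a made-up helper, and returning an empty string for a missing marker is my assumption about what you'd want):

```java
class AfterMarker {

    // Returns everything after the first occurrence of marker, or "" when marker is absent.
    static String afterFirst(String text, String marker) {
        int start = text.indexOf(marker);
        if (start < 0) {
            return ""; // assumption: a missing marker means nothing follows it
        }
        return text.substring(start + marker.length());
    }

    public static void main(String[] args) {
        System.out.println(afterFirst("ABCD123!##$123XYZ", "123")); // prints !##$123XYZ
    }
}
```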
Get the Unicode code point integer assigned to each character in your input string.
String input = "123 InsertIGN" ;
List< Integer > codePoints = input.codePoints().boxed().toList() ;
Loop each code point. Ask if that character is a digit. If so, increment a count. Once your desired limit is reached, collect remaining characters.
StringBuilder result = new StringBuilder() ;
int limit = 3 ; // Number of digit occurrences to get past.
int count = 0 ;
for( Integer codePoint : codePoints )
{
if( count < limit )
{
if( Character.isDigit( codePoint ) ) { count ++ ; }
}
else
{
result.appendCodePoint( codePoint ) ; // append( codePoint ) would add the number's digits, not the character.
}
}
There are nifty ways to do that with streams. But as someone returning to Java programming, you may not be familiar with that. So the above is the old-school approach.
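For comparison, here is one compact alternative that uses a regular expression rather than streams. Treat it as a sketch: afterDigits is a made-up name, \d here matches only the ASCII digits 0-9 (Character.isDigit accepts more), and when the input has fewer digits than the limit it returns the input unchanged instead of an empty result.

```java
class AfterDigits {

    // Returns the text that follows the first `limit` digits of the input.
    static String afterDigits(String input, int limit) {
        // (?:\D*\d){n} consumes n digits, together with any non-digits in front of each one.
        return input.replaceFirst("^(?:\\D*\\d){" + limit + "}", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + afterDigits("123 InsertIGN", 3) + "]"); // prints [ InsertIGN]
    }
}
```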

Unable to verify ASCII CHARACTER 29 printing in JAVA [duplicate]

This question already has answers here:
ASCII non readable characters 28, 29 31
(3 answers)
Closed 3 years ago.
I'm using Java and I'm trying to add ASCII character 29 (Group Separator) to an alphanumeric String as part of my algorithm. But I'm unable to verify the output since it doesn't get printed.
If it's a non-printable character, is there any other way I can verify that it does get added?
Tried: 1) Printing it like any other ASCII character 2) Printing its hex value (0x1D)
System.out.println("Test1====="+Character.toString((char)0x1D));
System.out.println("Test3====="+String.valueOf(Character.toChars(29)));
Expected Result:Verify its printed.
Actual Result:Unable to verify.
Maybe write a function that traverses a string and compares every char to
character 29? Something along the lines of:
String str = "Foo Bar" + yourCharacter29ToString;
for (int i = 0; i < str.length(); i++) {
    // Compare char to char; Character.toChars(29) returns a char[], which cannot be compared with ==.
    if (str.charAt(i) == (char) 29) {
        return true;
    }
}
return false;
This could be enough as a proof of concept. (The loop is meant to sit inside a boolean method; I did not run the code above, so read it as pseudo-code please.)
To see which codepoints are in a String, you can use Character.getName(codepoint)
int[] codepoints = ("Test3==🚲==="+String.valueOf(Character.toChars(29)))
.codePoints()
.toArray(); // optionally, set up for traditional for loop
for (int codepoint : codepoints) {
char[] utf16 = Character.toChars(codepoint); // always one or two code units
if (utf16.length == 2) {
System.out.println(
String.format("U+%04X \\u%04X\\u%04X %s",
codepoint, (int)utf16[0], (int)utf16[1], Character.getName(codepoint)));
} else {
System.out.println(
String.format("U+%04X \\u%04X %s",
codepoint, (int)utf16[0], Character.getName(codepoint)));
}
}
The UTF-16 character encoding encodes a codepoint from the Unicode character set with one or two code units (char).
(Not sure how the existence of the ASCII character set is relevant to this project—or most any project. If you have bytes for ASCII-encoded text or need bytes for ASCII-encoded text, that's a different question. But, Java uses UTF-16 for text datatypes.)
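If the goal is just to see the invisible character in console output, another option is to print a visible stand-in for it. A rough sketch (showControls and the <U+...> notation are my own choices, not anything standard):

```java
class VisibleControls {

    // Returns a copy of the text in which each ISO control character is written as <U+XXXX>.
    static String showControls(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp)) {
                out.append(String.format("<U+%04X>", cp));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String withSeparator = "Test1=====" + (char) 0x1D;
        System.out.println(showControls(withSeparator)); // prints Test1=====<U+001D>
    }
}
```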

How do I compare each character of a String while accounting for characters with length > 1?

I have a variable string that might contain any unicode character. One of these unicode characters is the han 𩸽.
The thing is that this "han" character has "𩸽".length() == 2 but is written in the string as a single character.
Considering the code below, how would I iterate over all characters and compare each one while considering the fact it might contain one character with length greater than 1?
for ( int i = 0; i < string.length(); i++ ) {
char character = string.charAt( i );
if ( character == '𩸽' ) {
// Fail, it interprets as 2 chars =/
}
}
EDIT:
This question is not a duplicate. This asks how to iterate for each character of a String while considering characters that contains .length() > 1 (character not as a char type but as the representation of a written symbol). This question does not require previous knowledge of how to iterate over unicode code points of a Java String, although an answer mentioning that may also be correct.
int hanCodePoint = "𩸽".codePointAt(0);
for (int i = 0; i < string.length();) {
int currentCodePoint = string.codePointAt(i);
if (currentCodePoint == hanCodePoint) {
// do something here.
}
i += Character.charCount(currentCodePoint);
}
The String.charAt and String.length methods treat a String as a sequence of UTF-16 code units. You want to treat the string as Unicode code-points.
Look at the "code point" methods in the String API:
codePointAt(int index) returns the (32 bit) code point at a given code-unit index
offsetByCodePoints(int index, int codePointOffset) returns the code-unit index corresponding to codePointOffset code-points from the code-unit at index.
codePointCount(int beginIndex, int endIndex) counts the code-points between two code-unit indexes.
Indexing the string by code point index is a bit tricky, especially if the string is long and you want to do it efficiently. However, it is doable, although the code is rather cumbersome.
@sstan's answer is one solution.
This will be simpler if you treat both the string and the data you're searching for as Strings. If you just need to test for the presence of that character:
if (string.contains("𩸽")) {
// do something here.
}
If you specifically need the index where that character appears:
int i = string.indexOf("𩸽");
if (i >= 0) {
// do something with i here.
}
And if you really need to iterate through every code point, see How can I iterate through the unicode codepoints of a Java String? .
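For what it's worth, on Java 8 or later String.codePoints() makes that iteration fairly short. A sketch that counts occurrences of a symbol (countOccurrences is a made-up name, and I'm assuming the han character from the question is U+29E3D):

```java
class CodePointCount {

    // Counts how often the first code point of symbol occurs in text.
    static long countOccurrences(String text, String symbol) {
        int target = symbol.codePointAt(0);
        return text.codePoints().filter(cp -> cp == target).count();
    }

    public static void main(String[] args) {
        // Assumption: U+29E3D is the han character from the question.
        String han = new String(Character.toChars(0x29E3D));
        String text = "a" + han + "b" + han;
        System.out.println(han.length());                // prints 2
        System.out.println(countOccurrences(text, han)); // prints 2
    }
}
```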
The han character lies outside the Basic Multilingual Plane, so UTF-16 (which Java strings use) has to encode it as a surrogate pair of two char values. That is why its length is 2, even though it displays as a single character.

JAVA: Space delimiting all non-numerical characters in a String

I am having some trouble with modifying Strings to be space delimited under the special case of adding spaces to all non-numerical characters.
My code must take a string representing a math equation and split it up into its individual parts. It does so using space delimiters between values. This part works great if the string is already delimited.
The problem is that I do not always get a space delimited input. To deal with this, I want to first insert these spaces so that the array is created properly.
What my code must do is take any character that is NOT a number, and add a space before and after it.
Something like this:
3*24+321 becomes 3 * 24 + 321
or
((3.0)*(2.5)) becomes ( ( 3.0 ) * ( 2.5 ) )
Obviously I need to avoid inserting space in the numbers, or 2.5 becomes 2 . 5, and then gets entered into the array as 3 elements. which it is not.
So far, I have tried using
String InputLineDelmit = InputLine.replaceAll("\\B", " ");
which successfully changes a string of all letters "abcd" to "a b c d"
But it makes mistakes when it runs into numbers. Using this method, I have gotten that:
(((1)*(2))) becomes ( ( (1) * (2) ) ) ---- * The numbers must be separate from parens
12.7+3.1 becomes 1 2.7+3.1 ----- * 12.7 is split
51/3 becomes 5 1/3 ----- * same issue
and 5*4-2 does not change at all.
So, I know that \D can be used as a regular expression for all non-numbers in java. However, my attempts to implement this (by replacing, or combining it with \B above) have led either to compiler errors or it REPLACING the char with a space, not adding one.
EDIT:
==== Answered! ====
It won't let me add my own answer because I'm new, but an edit to neo108's code below (which, itself, does not work) did the job. What I did was change it to check isDigit, not isLetter, and then do nothing in that case (or in the special case of a decimal point, for doubles). Otherwise, the character gets a space on either side.
public static void main(String[] args){
String formula = "12+((13.0)*(2.5)-17*2)+(100/3)-7";
StringBuilder builder = new StringBuilder();
for (int i = 0; i < formula.length(); i++){
char c = formula.charAt(i);
char cdot = '.';
if(Character.isDigit(c) || c == cdot) {
builder.append(c);
}
else {
builder.append(" "+c+" ");
}
}
System.out.println("OUTPUT:" + builder);
}
OUTPUT: 12 + ( ( 13.0 ) * ( 2.5 ) - 17 * 2 ) + ( 100 / 3 ) - 7
However, any ideas on how to do this more succinctly, and also a decent explanation of StringBuilders, would be appreciated. Namely what is with this limit of 16 chars that I read about on javadocs, as the example above shows that you CAN have more output.
Something like this should work...
String formula = "Ab((3.0)*(2.5))";
StringBuilder builder = new StringBuilder();
for (int i = 0; i < formula.length(); i++){
char c = formula.charAt(i);
if(Character.isLetter(c)) {
builder.append(" "+c+" ");
} else {
builder.append(c);
}
}
Define the operations in your math equation + - * / () etc
Convert your equation string to char[]
Traverse through the char[] one char at a time and append the read char to a StringBuilder object.
If you encounter any character matching one of the operations defined, then add a space before and after that character and then append this to the StringBuilder object.
Well this is one of the algorithm you can implement. There might be other ways of doing it as well.
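As for doing it more succinctly, a regular expression can do the padding in one pass. This is a sketch, not a full tokenizer (spaceOut is a made-up name, and it treats every '.' as part of a number):

```java
class FormulaSpacer {

    // Puts single spaces around every character that is not a digit or a decimal point.
    static String spaceOut(String formula) {
        return formula
                .replaceAll("([^\\d.])", " $1 ") // pad each operator or parenthesis
                .trim()
                .replaceAll("\\s+", " ");         // collapse the doubled spaces between neighbours
    }

    public static void main(String[] args) {
        System.out.println(spaceOut("3*24+321"));      // prints 3 * 24 + 321
        System.out.println(spaceOut("((3.0)*(2.5))")); // prints ( ( 3.0 ) * ( 2.5 ) )
    }
}
```

On the 16 chars: that figure is StringBuilder's initial capacity, not a limit. The internal buffer grows automatically as you append.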

Creating Unicode character from its number

I want to display a Unicode character in Java. If I do this, it works just fine:
String symbol = "\u2202";
symbol is equal to "∂". That's what I want.
The problem is that I know the Unicode number and need to create the Unicode symbol from that. I tried (to me) the obvious thing:
int c = 2202;
String symbol = "\\u" + c;
However, in this case, symbol is equal to "\u2202". That's not what I want.
How can I construct the symbol if I know its Unicode number (but only at run-time---I can't hard-code it in like the first example)?
If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast to it as others have suggested.
If you want to support all code points, use Character.toChars(int). This will handle cases where code points cannot fit in a single char value.
Doc says:
Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
Just cast your int to a char. You can convert that to a String using Character.toString():
String s = Character.toString((char)c);
EDIT:
Just remember that the escape sequences in Java source code (the \u bits) are in HEX, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
The other answers here either only support unicode up to U+FFFF (the answers dealing with just one instance of char) or don't tell how to get to the actual symbol (the answers stopping at Character.toChars() or using incorrect method after that), so adding my answer here, too.
To support supplementary code points also, this is what needs to be done:
// this character:
// http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
// using code points here, not U+n notation
// for equivalence with U+n, below would be 0xnnnn
int codePoint = 128149;
// converting to char[] pair
char[] charPair = Character.toChars(codePoint);
// and to String, containing the character we want
String symbol = new String(charPair);
// we now have symbol with the desired character as the first item
// confirm that we indeed have character with code point 128149
System.out.println("First code point: " + symbol.codePointAt(0));
I also did a quick test as to which conversion methods work and which don't
int codePoint = 128149;
char[] charPair = Character.toChars(codePoint);
System.out.println(new String(charPair, 0, 2).codePointAt(0)); // 128149, worked
System.out.println(charPair.toString().codePointAt(0)); // 91, didn't work
System.out.println(new String(charPair).codePointAt(0)); // 128149, worked
System.out.println(String.valueOf(codePoint).codePointAt(0)); // 49, didn't work
System.out.println(new String(new int[] {codePoint}, 0, 1).codePointAt(0));
// 128149, worked
--
Note: as @Axel mentioned in the comments, with Java 11 there is Character.toString(int codePoint) which would arguably be best suited for the job.
This one worked fine for me.
String cc2 = "2202";
String text2 = String.valueOf(Character.toChars(Integer.parseInt(cc2, 16)));
Now text2 will have ∂.
Remember that char is an integral type, and thus can be given an integer value, as well as a char constant.
char c = 0x2202;//aka 8706 in decimal. \u codepoints are in hex.
String s = String.valueOf(c);
String st="2202";
int cp=Integer.parseInt(st,16);// parses st as a hexadecimal number
char c[]=Character.toChars(cp);
System.out.println(c);// displays the character corresponding to '\u2202'
Although this is an old question, there is a very easy way to do this in Java 11 which was released today: you can use a new overload of Character.toString():
public static String toString(int codePoint)
Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.
Parameters:
codePoint - the codePoint to be converted
Returns:
the string representation of the specified codePoint
Throws:
IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
Since:
11
Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.
The code needed for the example given in the question is simply:
int codePoint = '\u2202';
String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
System.out.println(s); // Prints ∂
This approach offers several advantages:
It works for any Unicode code point rather than just those that can be handled using a char.
It's concise, and it's easy to understand what the code is doing.
It returns the value as a string rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as char[].
This is how you do it:
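int cc = 2202; // the hex digits of the code point, stored as a decimal int as in the question
char ccc = (char) Integer.parseInt(String.valueOf(cc), 16); // reads "2202" as hex, giving 0x2202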
final String text = String.valueOf(ccc);
This solution is by Arne Vajhøj.
The code below will write the 4 unicode chars (represented by decimals) for the word "be" in Japanese. Yes, the verb "be" in Japanese has 4 chars!
The value of characters is in decimal and it has been read into an array of String[] -- using split for instance. If you have Octal or Hex, parseInt take a radix as well.
// pseudo code
// 1. init the String[] containing the 4 unicodes in decima :: intsInStrs
// 2. allocate the proper number of character pairs :: c2s
// 3. Using Integer.parseInt (... with radix or not) get the right int value
// 4. place it in the correct location of in the array of character pairs
// 5. convert c2s[] to String
// 6. print
String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
char[] c2s = new char[intsInStrs.length * 2]; // 2. room for up to two chars per code point
int length = 0; // number of chars actually written so far
for (String intString : intsInStrs) {
    // 3 + 4. toChars returns how many chars it wrote: 1 for a BMP code point, 2 for a supplementary one
    length += Character.toChars(Integer.parseInt(intString), c2s, length);
}
String symbols = new String(c2s, 0, length); // 5. use only the filled part of the array
System.out.println("\nLooooonger code point: " + symbols); // 6.
// I tested it in Eclipse and Java 7 and it works. Enjoy
Here is a block to print out unicode chars between \u00c0 to \u00ff:
char[] ca = {'\u00c0'};
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 16; j++) {
String sc = new String(ca);
System.out.print(sc + " ");
ca[0]++;
}
System.out.println();
}
Unfortunately, removing one backslash, as suggested in the first comment (newbiedoodle), does not lead to a good result. Most (if not all) IDEs report a syntax error. The reason is that the Java escaped Unicode format expects the syntax "\uXXXX", where XXXX are four mandatory hexadecimal digits. Attempts to assemble this string from pieces fail. Of course, "\u" is not the same as "\\u". The first is the start of a Unicode escape; the second is an escaped backslash (which is a backslash) followed by 'u'.
It is strange that the Apache pages present a utility that does exactly this, but in reality it is an escape mimic utility. Apache has some utilities of its own (I didn't test them) which do this work for you (Apache Escape Unicode utilities). Maybe that is still not what you want, but the utility takes a good approach to the solution, in combination with the answer above (MeraNaamJoker).
My solution is to create this escaped mimic string and then convert it back to Unicode (to avoid the restriction of the real escaped Unicode format). I used it for copying text, so it is possible that in the uencodeStr method it would be better to use '\\u' instead of '\\\\u'. Try it.
/**
* Converts character to the mimic unicode format i.e. '\\u0020'.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param ch the character to convert
* @return the character as a mimic escaped Unicode string
*/
public static String unicodeEscaped(char ch) {
String returnStr;
//String uniTemplate = "\u0000";
final String charEsc = "\\u"; // a local variable cannot be declared static
if (ch < 0x10) {
returnStr = "000" + Integer.toHexString(ch);
}
else if (ch < 0x100) {
returnStr = "00" + Integer.toHexString(ch);
}
else if (ch < 0x1000) {
returnStr = "0" + Integer.toHexString(ch);
}
else
returnStr = "" + Integer.toHexString(ch);
return charEsc + returnStr;
}
/**
* Converts the string from UTF8 to mimic unicode format i.e. '\\u0020'.
* Note: I cannot use the real Unicode escape format here, because the compiler
* (and an editor such as NetBeans) immediately translates it into the character.
* So instead of the real format, i.e. '\u0020', I use the mimic format '\\u0020'
* as a string, which of course does not give the same result.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param nationalString the string to convert
* @return the string in Java mimic escaped Unicode format
*/
public String encodeStr(String nationalString) throws UnsupportedEncodingException {
String convertedString = "";
for (int i = 0; i < nationalString.length(); i++) {
Character chs = nationalString.charAt(i);
convertedString += unicodeEscaped(chs);
}
return convertedString;
}
/**
* Converts the string from mimic unicode format i.e. '\\u0020' back to UTF8.
*
* This format is the Java source code format.
*
* CharUtils.unicodeEscaped(' ') = "\\u0020"
* CharUtils.unicodeEscaped('A') = "\\u0041"
*
* @param escapedString the string in Java mimic escaped Unicode format
* @return the decoded string
*/
public String uencodeStr(String escapedString) throws UnsupportedEncodingException {
String convertedString = "";
String[] arrStr = escapedString.split("\\\\u");
String str, istr;
for (int i = 1; i < arrStr.length; i++) {
str = arrStr[i];
if (!str.isEmpty()) {
Integer iI = Integer.parseInt(str, 16);
char[] chaCha = Character.toChars(iI);
convertedString += String.valueOf(chaCha);
}
}
return convertedString;
}
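The same round trip can probably be written more compactly with String.format and a regular expression. This is only a sketch under my own naming (UnicodeEscapes, escape, unescape); unlike the split-based version above, it leaves any text that is not an escape untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {

    // Matches a backslash, the letter u and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Writes every char of the text as a six-character escape (backslash, u, four hex digits).
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) {
            out.append(String.format("\\u%04x", (int) ch));
        }
        return out.toString();
    }

    // Turns each escape back into its char and copies all other text as-is.
    static String unescape(String escaped) {
        Matcher m = ESCAPE.matcher(escaped);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(ch)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("A ");
        System.out.println(escaped);           // prints the two escapes for 'A' and space
        System.out.println(unescape(escaped)); // prints A followed by a space
    }
}
```

Because it works char by char, a supplementary character is written as two escapes (its surrogate pair) and is restored correctly on the way back.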
char c=(char)0x2202;
String s=""+c;
(This answer is for .NET 4.5; there must be a similar approach in Java.)
I am from West Bengal in INDIA.
As I understand your problem is ...
You want to produce similar to ' অ ' (It is a letter in Bengali language)
which has Unicode HEX : 0X0985.
Now if you know this value in respect of your language then how will you produce that language specific Unicode symbol right ?
In Dot Net it is as simple as this :
int c = 0X0985;
string x = Char.ConvertFromUtf32(c);
Now x is your answer.
But this is HEX by HEX convert and sentence to sentence conversion is a work for researchers :P
