So I just started reading "Java In A Nutshell", and on Chapter One it states that:
"To include a character literal in a Java program, simply place it between single quotes"
i.e.
char c = 'A';
What exactly does this do? I thought char only took values from 0 to 65,535. I don't understand how you can assign 'A' to it.
You can also assign 'B' to an int?
int a = 'B';
Printing a gives 66. Where/why would you use the above operation?
I apologise if this is a stupid question.
My whole life has been a lie.
char is actually an integer type. It stores the 16-bit Unicode integer value of the character in question.
You can look at something like http://asciitable.com to see the different values for different characters.
In Java, char literals represent UTF-16 code units (UTF-16 is a character encoding scheme). What UTF-16 gives you is a mapping between integer values (and the way they are stored in memory) and the corresponding characters (the graphical representation of each code unit).
You can enclose a character in single quotes; that way you don't need to remember the UTF-16 values of the characters you use. You can still get the integer value from a char and store it, for example, in an int. Storing it in a short generally needs an explicit cast and may change the value: both types use 16 bits, but short values run from -32768 to 32767 while char values run from 0 to 65535.
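As a rough illustration of the int versus short point above, here is a minimal sketch. The class and method names are made up for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class CharWidthDemo {
    // char -> int is a widening conversion, so no cast is needed.
    static int toInt(char c) {
        return c;
    }

    // char -> short needs an explicit cast and can change the value,
    // because short is signed (-32768..32767) and char is unsigned (0..65535).
    static short toShort(char c) {
        return (short) c;
    }

    public static void main(String[] args) {
        System.out.println(toInt('A'));        // 65
        System.out.println(toShort('A'));      // 65, fits in both ranges
        System.out.println(toShort('\uFFFF')); // -1, 65535 does not fit in a short
    }
}
```

The last call shows why a short is a poor place to keep a char: the bits are preserved, but the number they stand for is not.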
If you look at an ASCII chart, the character "A" has a value of 41 hex or 65 decimal. Using the ' character to bracket a single character makes it a character literal. Using the double-quote (") would make it a String literal.
Assigning char someChar = 'A'; is exactly the same as saying char someChar = 65;.
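A small sketch of that equivalence, assuming nothing beyond the standard language rules (the class and method names are invented for the example):

```java
// Hypothetical demo class; the names are illustrative only.
class CharLiteralDemo {
    // A char literal and its numeric code are the same value.
    static boolean sameValue() {
        char fromLiteral = 'A';
        char fromNumber = 65; // compiles because 65 is a constant that fits in a char
        return fromLiteral == fromNumber;
    }

    // Arithmetic on a char is carried out in int, so cast back to get a char again.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(sameValue());  // true
        System.out.println(next('A'));    // B
        System.out.println((int) 'B');    // 66
    }
}
```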
As to why, consider if you simply want to see if a String contains a decimal number (and you don't have a convenient function to do this). You could use something like:
boolean isDecimal = true;
for (int i = 0; i < decString.length(); i++) {
char theChar = decString.charAt(i);
if (theChar < '0' || theChar > '9') {
isDecimal = false;
break;
}
}
String name = "Jack";
char letter = name.charAt(0);
System.out.println(letter);
You know charAt is a Java method that gives you a character of a String when you pass it an index. I'm asking for a method like this in Dart; does Dart have one?
You can use String.operator[].
String name = "Jack";
String letter = name[0];
print(letter);
Note that this operates on UTF-16 code units, not on Unicode code points nor on grapheme clusters. Also note that Dart does not have a char type, so you'll end up with another String.
If you need to operate on arbitrary Unicode strings, then you should use package:characters and do:
import 'package:characters/characters.dart';
String name = "Jack";
Characters letter = name.characters.characterAt(0);
print(letter);
Dart has two operations that match the Java behavior, because Java prints integers of the type char specially.
Dart has String.codeUnitAt, which does the same as Java's charAt: Returns an integer representing the UTF-16 code unit at that position in the string.
If you print that in Dart, or add it to a StringBuffer, it's just an integer, so print("Jack".codeUnitAt(0)) prints 74.
The other operation is String.operator[], which returns a single-code-unit String. So print("Jack"[0]) prints J.
Both should be used very judiciously, since many Unicode characters are not just a single code unit. You can use String.runes to get code points, or String.characters from package:characters to get grapheme clusters (which is usually what you should be using, unless you happen to know the text is ASCII only).
You can use
String.substring(int startIndex, [ int endIndex ])
Example --
void main(){
String s = "hello";
print(s.substring(1, 2));
}
Output
e
Note that endIndex is one greater than startIndex, and the character returned is the one at startIndex.
I'm trying to compare two char primitives ch1 and ch2. Both are assigned the value 1 as shown below.
But when compared using the "==" operator, the result is false, and I don't understand why or what's happening behind the scenes.
char ch1 = (char)1;
char ch2 = '1';
System.out.println(ch1==ch2); //false
//further comparisons
System.out.println(ch1 == 1); //true
System.out.println(ch1 == '\u0031'); //false
System.out.println(ch2 == 1); //false
System.out.println(ch2 == '\u0031'); //true
'1' has the value 49 (31 hexadecimal).
(char)1 has the value 1.
A char is just a 16-bit integer. The notation 'x' means 'the character code for the character x', where the encoding used in Java is Unicode, specifically UTF-16.
The cast (char) does not change the value of the expression to its right, except that it truncates it from a full-size integer to 16 bits (which is no change for values 0 to 65535).
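To make the truncation concrete, here is a short sketch. The helper castToChar is a made-up name that simply wraps the cast.

```java
// Hypothetical demo class; the names are illustrative only.
class CharCastDemo {
    // The cast keeps only the low 16 bits of the int.
    static char castToChar(int value) {
        return (char) value;
    }

    public static void main(String[] args) {
        System.out.println(castToChar(1) == '1');      // false: 1 is not 49
        System.out.println(castToChar(49) == '1');     // true
        System.out.println((int) castToChar(65537));   // 1, since 65537 is 0x10001
        System.out.println((int) castToChar(-1));      // 65535, the low 16 bits are all ones
    }
}
```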
Basically, what you are doing is casting the number one to a char, so ch1 is now equal to the Unicode character with code 1 (SOH, or Start of Heading).
So when you compare ch1 (SOH) to ch2 ('1'), it's going to return false.
Likewise, comparing ch1 (SOH, \u0001) to '1' (\u0031) is going to return false.
That's the main reason it returns false: the Unicode value of ch1 is different from the one you expected to assign.
Code point
The char type is essentially broken since Java 2, physically incapable of representing most characters.
Instead use code point integer numbers. Every character is permanently assigned a specific number, a code point.
int codePoint = "1".codePointAt( 0 ) ; // Annoying zero-based index counting.
The result is 49 decimal, 31 hexadecimal.
Make a string of that single character per the code point.
String s = Character.toString( codePoint ) ;
Or more specifically:
String latinDigitOneCharacter = Character.toString( 49 ) ;
As others pointed out, your code was mistakenly comparing the character defined as the Latin digit “1” with a code point of 1.
The character assigned to the code point of one is the control code SOH, Start of Heading. This is true in both Unicode and US-ASCII (Unicode is a superset of US-ASCII).
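Note that Character.toString(int) only exists from Java 11 onwards. As a sketch for older versions, assuming you are happy to go through Character.toChars, something like the following should behave the same way. The class and method names are invented, and U+29E3D is just an example of a character outside the 16-bit range.

```java
// Hypothetical demo class; the names are illustrative only.
class CodePointToStringDemo {
    // Character.toChars yields one char for code points up to 0xFFFF
    // and two chars (a surrogate pair) for anything above that.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println("1".codePointAt(0));                // 49
        System.out.println(fromCodePoint(49));                 // 1
        System.out.println(fromCodePoint(0x29E3D).length());   // 2 char values, one character
    }
}
```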
I'm attempting to take in a string from the console of a certain length and set the empty characters in the string to an asterisk.
System.out.println("Enter a string of digits.");
someString = input.next();
if(someString.matches("\\d{0,9}")) {
charArr = someString.toCharArray();
for ( char digit: charArr) {
if(!Character.isDefined(charArr[digit])){
charArr[digit] = '*';
}
}
System.out.printf("Your string is: %s%n", new String(charArr));
This code is throwing an array index out of bounds exception and I'm not sure why.
for ( char digit: charArr) will iterate over each character from charArr.
Thus, digit contains a character value from charArr.
When you access the element from charArr by writing charArr[digit], you are converting digit from datatype char to int value.
For example, you have charArr = new char[]{'a','b','c'}.
charArr['a'] is equivalent to charArr[97], but charArr only has length 3.
Thus, charArr cannot access an element outside of its bounds and throws ArrayIndexOutOfBoundsException.
Solution: loop through the array index wise rather than element wise.
for(int i = 0; i < charArr.length; i++) {
// access using charArr[i] instead of charArr[digit]
...
}
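Here is a fuller sketch of the index-wise loop. It assumes that "empty" means any character that is not a digit, which may not be exactly what the question intends; maskNonDigits is a made-up helper name.

```java
// Hypothetical demo class; the names are illustrative only.
class IndexLoopDemo {
    // Assumption: every char that is not a digit should become '*'.
    static String maskNonDigits(String input) {
        char[] chars = input.toCharArray();
        // i is a position in the array, so chars[i] is always in bounds.
        for (int i = 0; i < chars.length; i++) {
            if (!Character.isDigit(chars[i])) {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(maskNonDigits("12 4")); // 12*4
        System.out.println(maskNonDigits("123"));  // 123
    }
}
```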
Think you could do it in one line with:
newString = someString.replaceAll("\\s", "*");
"\s" is the regex pattern for a whitespace character.
I think you're mixing up your for loops. In your example, you're going over every character in someString.toCharArray(), so !Character.isDefined(charArr[digit]) doesn't do what you want: digit is a char taken from the array, not a position in it. Java will happily promote the char to an int and use it as an index, which is why you go out of bounds.
If you're checking purely if a character is a space, you can simply do one of the following:
if (digit != ' ')
if (!Character.isWhitespace(digit))
if (Character.isDigit(digit))
This loop statement:
for (char digit: charArr) {
iterates the values in the array. The values have type char and can be anything from 0 to 65535. However, this statement
if (!Character.isDefined(charArr[digit])) {
uses digit as an index for the array. For that to "work" (i.e. not throw an exception), the value needs to be in the range 0 to charArr.length - 1. Clearly, for the input string you are using, some of those values are not acceptable as indexes (e.g. value >= charArr.length) and an exception ensues.
But you don't want to fix that by testing value is in the range required. The values of value are not (from a semantic perspective) array indexes anyway. (If you use them as if they are indexes, you will end up missing some positions in the array.)
If you want to index the values in the array, do this:
for (int i = 0; i < charArr.length; i++) {
and then use i as the index.
Even when you have fixed that, there is still a problem with your code ... for some use cases.
If your input is encoded using UTF-8 (for example) it could include Unicode code points (characters) that are greater than 65535, and are encoded in the Java string as two consecutive char values. (A so-called surrogate pair.) If your string contains surrogate pairs, then isDefined(char) is not a valid test. Instead you should be using isDefined(int) and (more importantly) iterating the Unicode code points in the string, not the char values.
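A possible sketch of that code-point-wise test follows. The helper name maskUndefined is made up, and replacing undefined code points with an asterisk is an assumption about what the original code was after.

```java
// Hypothetical demo class; the names are illustrative only.
class DefinedCodePointsDemo {
    // Walks the string one code point at a time and replaces every
    // code point that Character.isDefined(int) rejects with '*'.
    static String maskUndefined(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (Character.isDefined(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append('*');
            }
            // Advance by 1 or 2 char values, depending on the code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+FFFF is a permanently unassigned noncharacter.
        System.out.println(maskUndefined("a\uFFFFb")); // a*b
    }
}
```

A surrogate pair stays intact because both of its char values are consumed in one step.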
I have a variable string that might contain any unicode character. One of these unicode characters is the han 𩸽.
The thing is that this "han" character has "𩸽".length() == 2 but is written in the string as a single character.
Considering the code below, how would I iterate over all characters and compare each one while considering the fact it might contain one character with length greater than 1?
for ( int i = 0; i < string.length(); i++ ) {
char character = string.charAt( i );
if ( character == '𩸽' ) {
// Fail, it interprets as 2 chars =/
}
}
EDIT:
This question is not a duplicate. This asks how to iterate for each character of a String while considering characters that contains .length() > 1 (character not as a char type but as the representation of a written symbol). This question does not require previous knowledge of how to iterate over unicode code points of a Java String, although an answer mentioning that may also be correct.
int hanCodePoint = "𩸽".codePointAt(0);
for (int i = 0; i < string.length();) {
int currentCodePoint = string.codePointAt(i);
if (currentCodePoint == hanCodePoint) {
// do something here.
}
i += Character.charCount(currentCodePoint);
}
The String.charAt and String.length methods treat a String as a sequence of UTF-16 code units. You want to treat the string as Unicode code-points.
Look at the "code point" methods in the String API:
codePointAt(int index) returns the (32 bit) code point at a given code-unit index
offsetByCodePoints(int index, int codePointOffset) returns the code-unit index corresponding to codePointOffset code-points from the code-unit at index.
codePointCount(int beginIndex, int endIndex) counts the code-points between two code-unit indexes.
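A brief sketch of how these three methods fit together, using a three-character string whose middle character is U+29E3D (the class, constant and helper names are invented for the example):

```java
// Hypothetical demo class; the names are illustrative only.
class CodePointIndexDemo {
    // "a", then the han character U+29E3D (two char values), then "b".
    static final String TEXT = "a" + new String(Character.toChars(0x29E3D)) + "b";

    // Number of code points, as opposed to length(), which counts char values.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code point at the n-th code point position (zero-based).
    static int codePointAtPosition(String s, int n) {
        int unitIndex = s.offsetByCodePoints(0, n); // translate to a char index
        return s.codePointAt(unitIndex);
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length());                  // 4
        System.out.println(codePointLength(TEXT));          // 3
        System.out.println(TEXT.offsetByCodePoints(0, 2));  // 3
        System.out.println(Integer.toHexString(codePointAtPosition(TEXT, 1))); // 29e3d
    }
}
```

Note that offsetByCodePoints has to scan the string, so calling it repeatedly on a long string is not cheap.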
Indexing the string by code point index is a bit tricky, especially if the string is long and you want to do it efficiently. However, it is doable, although the code is rather cumbersome.
@sstan's answer is one solution.
This will be simpler if you treat both the string and the data you're searching for as Strings. If you just need to test for the presence of that character:
if (string.contains("𩸽")) {
// do something here.
}
If you specifically need the index where that character appears:
int i = string.indexOf("𩸽");
if (i >= 0) {
// do something with i here.
}
And if you really need to iterate through every code point, see How can I iterate through the unicode codepoints of a Java String? .
The han character is outside the Basic Multilingual Plane (its code point is above 65535), so it doesn't fit in a single 16-bit char. Java stores it as two char values (a surrogate pair), so it's logical that its length is 2, even though it is displayed as a single character.
I'm looking for a straightforward answer and can't seem to find one.
I'm just trying to see if the following is valid. I want to take the integer 7 and turn it into the character '7'. Is this allowed:
int digit = 7;
char code = (char) digit;
Thank you in advance for your help!
This conversion is allowed, but the result won't be what you expect, because char 7 is the bell character whereas '7' is 55 (0x37). Because the numeric characters are in order, starting with '0' at 48 (0x30), just add '0', then cast the result as a char.
char code = (char) (digit + '0');
You may also take a look at the Unicode characters, of which the printable ASCII characters are the same codes.
'7' is Unicode code point U+0037.
Since it is a code point in the Basic Multilingual Plane, and since char is a UTF-16 code unit and there is a one-to-one mapping between Unicode code points in this plane and UTF-16 code units, you can rely on this:
(char) ('0' + digit)
Do NOT think of '7' as ASCII 55 because that prevents a good understanding of char... For more details, see here.
Nope. The char '7' can be retrieved from int 7 in these ways:
int digit = 7;
char code = Integer.toString(digit).charAt(0);
code = Character.forDigit(digit, 10);
If digit is between 0 and 9:
int digit = 7;
char code = (char)(((int)'0')+digit);
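For comparison, here is a sketch that puts the approaches from these answers side by side. The method names are made up, and the range check in viaOffset is an addition that the answers only imply.

```java
// Hypothetical demo class; the names are illustrative only.
class DigitToCharDemo {
    // Offset from '0'; only meaningful for a single decimal digit.
    static char viaOffset(int digit) {
        if (digit < 0 || digit > 9) {
            throw new IllegalArgumentException("not a single decimal digit: " + digit);
        }
        return (char) ('0' + digit);
    }

    // Library route; returns the null character '\0' if the digit is invalid for the radix.
    static char viaForDigit(int digit) {
        return Character.forDigit(digit, 10);
    }

    // The cast from the question: legal, but it yields the control character with code 7.
    static char plainCast(int digit) {
        return (char) digit;
    }

    public static void main(String[] args) {
        System.out.println(viaOffset(7));            // 7
        System.out.println(viaForDigit(7));          // 7
        System.out.println(plainCast(7) == '7');     // false
    }
}
```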