What does Java's offsetByCodePoints really take as an argument?


I'm trying to understand some String class functions in Java. Here is some simple code:
/* different experiments with String class */
public class TestStrings {
public static void main(String[] args) {
String greeting = "Hello\uD835\uDD6b";
System.out.println("Number of code units in greeting is " + greeting.length());
System.out.println("Number of code points " + greeting.codePointCount(0,greeting.length()));
int index = greeting.offsetByCodePoints(0,6);
System.out.println("index = " + index);
int cp = greeting.codePointAt(index);
System.out.println("Code point at index is " + (char) cp);
}
}
\uD835\uDD6b is the 𝕫 symbol (U+1D56B), so it's a valid surrogate pair.
So the string has 6 (six) code points and 7 (seven) code units (2-byte chars). The documentation says:
offsetByCodePoints
public int offsetByCodePoints(int index,
int codePointOffset)
Returns the index within this String that is offset from the given index by codePointOffset code points.
Unpaired surrogates within the text range given by index and codePointOffset count as one code point each.
Parameters:
index - the index to be offset
codePointOffset - the offset in code points
So the second argument is given in code points. Yet with the arguments (0,6) the call still works, without exceptions, while the subsequent codePointAt() call fails, because offsetByCodePoints returned 7, which is out of bounds. So maybe the function takes its arguments in code units? Or have I missed something?

codePointAt takes a char index.
The index refers to char values (Unicode code units) and ranges from 0 to length() - 1.
There are six code points in that string. The offsetByCodePoints call returns the index just past 6 code points, which is char index 7. You then call codePointAt(7), but 7 equals length(), one past the last valid index (6), so it throws.
To see why, note that
"".offsetByCodePoints(0, 0) == 0
because to count past all 0 code-points, you have to count past all 0 chars.
Extrapolating that to your string, to count past all 6 code-points, you have to count past all 7 chars.
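The counting argument above can be checked directly. This is only a small sketch: the class name CodePointOffsets and the helper charIndexOfCodePoint are made up for illustration, while offsetByCodePoints, codePointCount, codePointAt and Character.toChars are the standard library methods.

```java
final class CodePointOffsets {
    /** Hypothetical helper: char index reached after skipping n code points from the start of s. */
    static int charIndexOfCodePoint(String s, int n) {
        return s.offsetByCodePoints(0, n);
    }

    public static void main(String[] args) {
        String greeting = "Hello\uD835\uDD6B";

        System.out.println(greeting.length());                             // number of chars
        System.out.println(greeting.codePointCount(0, greeting.length())); // number of code points

        // Skipping 5 code points lands on the start of the surrogate pair.
        int last = charIndexOfCodePoint(greeting, 5);
        // Skipping all 6 code points lands at greeting.length(), which is a
        // valid offset but not a valid index for codePointAt.
        int end = charIndexOfCodePoint(greeting, 6);
        System.out.println(last + " " + end);

        // Reading at the start of the pair yields the full supplementary code point.
        int cp = greeting.codePointAt(last);
        System.out.println(Integer.toHexString(cp) + " " + new String(Character.toChars(cp)));
    }
}
```

Note that the code point is turned back into text with Character.toChars rather than a (char) cast, because a cast would truncate a supplementary code point to 16 bits.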
Maybe seeing codePointAt in use will make this clear. This is the idiomatic way to iterate over all code-points in a string (or CharSequence):
for (int charIndex = 0, nChars = s.length(), codepoint;
charIndex < nChars;
charIndex += Character.charCount(codepoint)) {
codepoint = s.codePointAt(charIndex);
// Do something with codepoint.
}

Helpful answer, Mike. I personally find the Java documentation ambiguous here, so to make String#offsetByCodePoints easier to understand, I commented its usage and slightly modified the example from the question:
public class TestStrings
{
public static void main(String[] args)
{
String greeting = "Hello\uD835\uDD6b";
// Starting at `char` index 0, advance by 6 code points and return
// the resulting `char` index.
// ---
// The string holds 6 code points in 7 chars (the last code point,
// U+1D56B, is a surrogate pair), so the returned value is 7,
// which equals greeting.length().
int charIndex = greeting.offsetByCodePoints(0,6);
System.out.println("charIndex = " + charIndex);
// Throws StringIndexOutOfBoundsException: valid indices are 0..6.
int cp = greeting.codePointAt(charIndex);
System.out.println("Code point at index is " + (char) cp);
}
}

Related

Checking for consecutively repeated characters in java

I am quite new to java. I am wondering if it is possible to check for a certain number of consecutively repeated characters (the 'certain number' being determined by the user) in a string or an index in a string array. So far I have tried
int multiple_characters = 0;
String array1 [] = {"abc","aabc","xyyyxy"};
for (int index = 0; index < array1.length; index++){
for (int i = 0; i < array1[index].length(); i++){
if (array1[index].charAt(i) == array1[index].charAt(i+1)){
multiple_characters++;
}
}
}
But with this I get a StringIndexOutOfBoundsException. I tried fixing this by adding an extra if statement to make sure i was not equal to array1[index].length(), but this still threw the same error. Other than the manual, cop-out method of:
if ((array1[index].charAt(i) == array1[index].charAt(i+1)) && (array1[index].charAt(i) == array1[index].charAt(i+2)))
and repeating it however many times (which would not be great for quick changes to my code), I can't seem to find a solution.

In the inner for loop (the one with the i variable), you're calling string.charAt(i+1) while i runs from 0 to the last index of that string.
No wonder you get a StringIndexOutOfBoundsException: on the last iteration you're asking for the character AFTER the last one.
I advise that you try to understand the exception, and if you can't, debug your code: step through it one line at a time, and if you don't know how to use a debugger, add println statements to check that the code does what you think it does. The place where your code acts differently from your expectation is where the bug is.
This plan of 'oh, it does not work, I'll just chuck it out entirely and find another way to do it' is suboptimal :) – go back to the first snippet, and just fix this.
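Following that advice, one way to fix the first snippet is to compare each character with the previous one instead of the next one, so the index never runs past the end. This is just a sketch: the class name ConsecutiveRuns and the method hasRun are invented for illustration, and n (the user-chosen run length) is assumed to be at least 1.

```java
final class ConsecutiveRuns {
    /**
     * Returns true if s contains some character repeated at least n times in a row.
     * Assumes n >= 1.
     */
    static boolean hasRun(String s, int n) {
        int run = 0; // length of the run of equal characters ending at index i
        for (int i = 0; i < s.length(); i++) {
            // Look backwards (i - 1), guarded by i > 0, so no index is out of bounds.
            if (i > 0 && s.charAt(i) == s.charAt(i - 1)) {
                run++;
            } else {
                run = 1;
            }
            if (run >= n) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] array1 = {"abc", "aabc", "xyyyxy"};
        int wanted = 3; // would come from the user in the real program
        for (String item : array1) {
            System.out.println(item + ": " + hasRun(item, wanted));
        }
    }
}
```

Changing the required run length is then a matter of passing a different n, rather than chaining more charAt comparisons.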

You are getting StringIndexOutOfBoundsException because you are trying to access string.charAt(i + 1) where i goes up to the highest index (i.e. string.length() - 1) of string.
You can do it as follows:
class Main {
public static void main(String[] args) {
int multiple_characters = 0;
int i;
String array1[] = { "abc", "aabc", "xyyyxy" };
for (int index = 0; index < array1.length; index++) {
System.out.println("String: " + array1[index]);
for (i = 0; i < array1[index].length() - 1; i++) {
multiple_characters = 1;
// Check the bound first so charAt(i + 1) is never called past the end of the string.
while (i < array1[index].length() - 1 && array1[index].charAt(i) == array1[index].charAt(i + 1)) {
multiple_characters++;
i++;
}
System.out.println(array1[index].charAt(i) + " has been repeated consecutively " + multiple_characters
+ " time(s)");
}
if (multiple_characters == 1) {
System.out.println(array1[index].charAt(i) + " has been repeated consecutively 1 time(s)");
}
System.out.println("------------");
}
}
}
Output:
String: abc
a has been repeated consecutively 1 time(s)
b has been repeated consecutively 1 time(s)
c has been repeated consecutively 1 time(s)
------------
String: aabc
a has been repeated consecutively 2 time(s)
b has been repeated consecutively 1 time(s)
c has been repeated consecutively 1 time(s)
------------
String: xyyyxy
x has been repeated consecutively 1 time(s)
y has been repeated consecutively 3 time(s)
x has been repeated consecutively 1 time(s)
y has been repeated consecutively 1 time(s)
------------

If I was to look for repeated characters, I would go the regular expression route. For example to look for repeated a characters (repeated twice in this example), you could have:
import java.util.regex.Pattern;
public class Temp {
public static void main(final String[] args) {
String array1 [] = {"abc","aabc","xyyyxy"};
for (String item : array1){
if (Pattern.compile("[a]{2}").matcher(item).find()) {
System.out.println(item + " matches");
}
}
}
}
In this extract, the reg exp is "[a]{2}" which looks for any sequence of a characters repeated twice.
Of course more complicated regular expressions are required for more complex matches, good resources to explain this may be found here:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
Another point is that, for efficiency's sake, it is common practice to move the:
Pattern.compile(*Pattern*)
outside of the method call, e.g. to a static final field.
This Stack Overflow question:
RegEx No more than 2 identical consecutive characters and a-Z and 0-9
gives quite a detailed description of the regular expression issues involved with this problem.
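The pattern above only looks for the letter a. If any character repeated a user-chosen number of times should match, a back-reference can be used instead. The following is a sketch under that assumption; the class name RepeatRegex and its methods are made up for illustration, and n is assumed to be at least 2.

```java
import java.util.regex.Pattern;

final class RepeatRegex {
    /** Builds a pattern matching any single character followed by n - 1 copies of itself. */
    static Pattern runOf(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        // (.) captures one character; \1{n-1} requires the same character n - 1 more times.
        return Pattern.compile("(.)\\1{" + (n - 1) + "}");
    }

    /** Returns true if s contains some character repeated at least n times in a row. */
    static boolean hasRun(String s, int n) {
        return runOf(n).matcher(s).find();
    }

    public static void main(String[] args) {
        Pattern triple = runOf(3); // compiled once, reused for every item
        for (String item : new String[] {"abc", "aabc", "xyyyxy"}) {
            if (triple.matcher(item).find()) {
                System.out.println(item + " matches");
            }
        }
    }
}
```

Note that . does not match line terminators by default, which is fine for the single-line strings used here.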

As a newbie I can't find the bug in the program

I was practicing a Codewars kata, which reads:
In a factory a printer prints labels for boxes. For one kind of boxes the printer has to use colors which, for the sake of simplicity, are named with letters from a to m.
The colors used by the printer are recorded in a control string. For example a "good" control string would be aaabbbbhaijjjm meaning that the printer used three times color a, four times color b, one time color h then one time color a...
Sometimes there are problems: lack of colors, technical malfunction and a "bad" control string is produced e.g. aaaxbbbbyyhwawiwjjjwwm with letters not from a to m.
You have to write a function printer_error which given a string will output the error rate of the printer as a string representing a rational whose numerator is the number of errors and the denominator the length of the control string. Don't reduce this fraction to a simpler expression.
The string has a length greater or equal to one and contains only letters from a to z.
Examples:
s="aaabbbbhaijjjm"
error_printer(s) => "0/14"
s="aaaxbbbbyyhwawiwjjjwwm"
error_printer(s) => "8/22"
As a newbie, I attempted it. My program looks like this:
public class Printer {
public static String printerError(String s) {
int printErr = 0;
char end = 110;
int i = 0;
while (i < s.length()){
if(s.charAt(i) > end ){
printErr++;
}
i++;
}
String rate = String.format("%d/%d",printErr , s.length());
return rate;
}
}
It passed the sample tests, but when I submitted the kata the counter was off by 1 or 2 in some cases. Can anyone help?

You can actually just use < and > to check if a character is in some range in java. Your logic is sound - but since you are a "newbie", you have re-created the functionality of a for-loop with your while loop. No need to do this - that's why we have for-loops.
See the adjusted method below:
public String printerError(String s) {
int printErr = 0;
for (int i = 0; i < s.length(); i++) {
// assuming the input rules hold true, we really only need the second condition
if (s.charAt(i) < 'a' || s.charAt(i) > 'm') {
printErr++;
}
}
return String.format("%d/%d", printErr, s.length());
}

This is an answer from one newbie to another :p, so it may be a little off. As far as I can tell, you have made a small logical error in the if-condition:
if(s.charAt(i) > end )
You have used ASCII values, which are assigned as follows: a-97, b-98, c-99, ..., m-109.
Note that you count a character as an error only if its ASCII value is greater than 110, which means your code accepts 'n' (whose ASCII value is 110) as valid. Setting end to 109 (or simply 'm') fixes that, and it is likely the only reason your counter comes out wrong.
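To make the off-by-one visible, here is a small sketch that prints the numeric values involved and uses the character literal 'm' as the upper bound instead of a magic number. The class name PrinterErrorDemo is made up; the method follows the kata's stated assumption that the input contains only the letters a to z.

```java
final class PrinterErrorDemo {
    /** Counts characters after 'm'; assumes s contains only lowercase letters a to z. */
    static String printerError(String s) {
        int errors = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 'm') { // 'm' is 109, so 'n' (110) now counts as an error
                errors++;
            }
        }
        return errors + "/" + s.length();
    }

    public static void main(String[] args) {
        System.out.println((int) 'm'); // numeric value of the last valid color
        System.out.println((int) 'n'); // numeric value of the first invalid one
        System.out.println(printerError("aaabbbbhaijjjm"));
        System.out.println(printerError("aaaxbbbbyyhwawiwjjjwwm"));
    }
}
```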

Cut out different elements from a string and put them into a list

Here's updated code. For those following along the question edits contains the original question.
if (0 != searchString.length()) {
for (int index = input.indexOf(searchString, 0);
index != -1;
index = input.indexOf(searchString, eagerMatching ? index + 1 : index + searchString.length())) {
occurences++;
System.out.println(occurences);
indexIN=input.indexOf(ListStringIN, occurences - 1) + ListStringIN.length();
System.out.println(indexIN);
System.out.println(ListStringIN.length());
indexOUT=input.indexOf(ListStringOUT, occurences - 1);
System.out.println(indexOUT);
Lresult.add(input.substring(indexIN, indexOUT));
System.out.println();
}
}
As you can see, I print out the index values.
My code works well with only one element.
But when I enter something like this: %%%%ONE++++ %%%%TWO++++
I get this exception:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 16, end 7, length 23
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3410)
at java.base/java.lang.String.substring(String.java:1883)
at com.DMMS.Main.identify(Main.java:81)
I found out that indexIN moves to the start of the second element, but indexOUT does not.
I couldn't figure out why.

When you look at your code you can notice: in the first loop that counts the number of occurrences, your code "knows" that it has to use the version of indexOf() that takes a start offset into the string being searched.
In other words: you know that you have to search after previous "hits" when walking through your string.
But your second loop, the one that has to extract the actual things, there you are using indexOf() without that extra offset parameter. Therefore you keep "copying out" the same part repeatedly.
Thus: "simply" apply the same logic from loop 1 for loop 2!
Beyond that:
you don't need two loops for that. Counting occurrences and "copying out" the matching code ... can be done in one loop
and honestly: rewrite that first loop. This code is almost incomprehensible for human beings. A reader would have to sit down and read this 10, 20 times, and then run it in a debugger to understand what it is doing
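The single-loop idea from the list above could look roughly like the following sketch. The class name MarkerExtractor and the method extractBetween are invented for illustration and are not the asker's code; the only library calls used are String.indexOf(String, int) and String.substring.

```java
import java.util.ArrayList;
import java.util.List;

final class MarkerExtractor {
    /** Collects every piece of text found between an opening and a closing marker. */
    static List<String> extractBetween(String input, String open, String close) {
        List<String> result = new ArrayList<>();
        if (open.isEmpty() || close.isEmpty()) {
            return result; // empty markers would match everywhere
        }
        int from = 0; // where the next search starts
        while (true) {
            int start = input.indexOf(open, from);
            if (start < 0) {
                break; // no further opening marker
            }
            int contentStart = start + open.length();
            // Search for the closing marker only AFTER the current opening marker.
            int end = input.indexOf(close, contentStart);
            if (end < 0) {
                break; // opening marker without a closing one
            }
            result.add(input.substring(contentStart, end));
            from = end + close.length(); // continue behind this element
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extractBetween("%%%%ONE++++ %%%%TWO++++", "%%%%", "++++"));
    }
}
```

Because both searches pass an explicit start offset, the end index can never lie before the begin index, which is what caused the exception in the question.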

I did it!
Here's the code:
.........................
static String ListStringIN = "%%%%";
static String ListStringOUT = "++++";
........................
else if (input.contains(ListStringIN) && input.contains(ListStringOUT)) {
System.out.println("Identifiziere Liste...");
String searchString = ListStringIN;
int occurences = 0;
boolean eagerMatching = false;
if (0 != searchString.length()) {
for (int index = input.indexOf(searchString, 0); index != -1; index = input
.indexOf(searchString, eagerMatching ? index + 1 : index + searchString.length())) {
occurences++;
System.out.println(occurences);
// index is the start of the current opening marker, so the content begins right after it.
indexIN = index + ListStringIN.length();
System.out.println(indexIN);
//indexOUT=input.indexOf(ListStringOUT, occurences);
//indexOUT=input.indexOf(ListStringOUT, occurences - 1);
indexOUT = input.indexOf(ListStringOUT, eagerMatching ? index + 1 : index + ListStringOUT.length());
System.out.println(indexOUT);
Lresult.add(input.substring(indexIN, indexOUT));
System.out.println();
}
}
//for (int i = 0; i <occurences; i ++) {
// Lresult.add(input.substring(input.indexOf(ListStringIN, 0) + ListStringIN.length(), input.indexOf(ListStringOUT)));
//}
result = Lresult.toString();
return result;
}
I hope this is useful for other people.
@GhostCat, thanks for your advice!

Return the number of times query occurs as a substring of src

/** Return the number of times query occurs as a substring of src
* (different occurrences may overlap).
* Precondition: query is not the empty string "".
* Examples: For src = "ab", query = "b", return 1.
* For src = "Luke Skywalker", query = "ke", return 2.
* For src = "abababab", query = "aba", return 3.
* For src = "aaaa", query = "aa", return 3.*/
public static int numOccurrences(String src, String query) {
/* This should be done with one loop. If at all possible, don't have
* each iteration of the loop process one character of src. Instead,
* see whether some method of class String can be used to jump from one
* occurrence of query to the next. */
int count = 0;
for (int i = 0; i < src.length(); i++) {
int end = i + query.length() - 1;
if (end < src.length()) {
String sub = src.substring(i, end);
if (query.contentEquals(sub))
++count;
}
}
return count;
}
I tested the code. If the src is "cherry" and the query is "err", then the output is expected to be 1 but it turns out to be 0. What's wrong with the code? BTW, I cannot use methods outside the String class.

Check the existence of query in src and loop until it return false. With each occurrence, take the substring, update the count and repeat until query is not found in src.
Pseudo code:
int count = 0;
loop (flag is true)
int index = find start index of query in src;
if (query is found)
src = update the src with substring(index + 1); // advance by one char only, so overlapping occurrences are counted
count++;
else
flag = false;
return count;

What's wrong is that you're comparing err to:
i | sub
--|------
0 | ch
1 | he
2 | er
3 | rr
Notice that these strings you're comparing to look short, and you don't even get to the end of "cherry" before you stop checking for a match. So there are two things you need to fix in your code: the way you calculate end and the comparison between end and src.length().
Hint: the second argument (ending index) to substring is exclusive.
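Applying both fixes to the original loop gives something like the sketch below (the class name SubstringCount is made up). It keeps the character-by-character loop from the question, so it illustrates the two corrections rather than the indexOf-based approach the assignment hints at.

```java
final class SubstringCount {
    /** Counts possibly overlapping occurrences of query in src; query must not be empty. */
    static int numOccurrences(String src, String query) {
        int count = 0;
        // Stop as soon as a window of query.length() chars no longer fits into src.
        for (int i = 0; i + query.length() <= src.length(); i++) {
            // substring's end index is exclusive, so no "- 1" here.
            String sub = src.substring(i, i + query.length());
            if (sub.equals(query)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(numOccurrences("cherry", "err"));
        System.out.println(numOccurrences("aaaa", "aa"));
    }
}
```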

Pseudo code:
init 'count' and 'start' to 0
while true do
find first occurrence of 'query' in 'source', start search at 'start'
if found
set 'start' to found position + 1
set count to count + 1
else
break out of while loop
end while
return count
Tip: Use String#indexOf(String str, int fromIndex) to find the next occurrence of query in source

This does the job:
public static int numOccurrences(String src, String query) {
int count = 0;
for(int i = src.indexOf(query); i > -1;i = src.indexOf(query, i + 1))
count++;
return count;
}
Here, i is the index of query in src, but the increment term makes use of indexOf(String str, int fromIndex), which javadoc says:
Returns the index within this string of the first occurrence of the specified substring, starting at the specified index.
which is passed the index i plus 1 to start searching for another occurrence after the previous hit.
This also addresses the requirement hinted at in the comment:
Instead, see whether some method of class String can be used to jump from one occurrence of query to the next.

Print out Yijing Hexagram Symbols

I encountered a problem while coding and I can't seem to find where I messed up or even why I get a wrong result.
First, let me explain the task.
It's about "Yijing Hexagram Symbols".
The left one is the original and the right one is the result that my code should give me.
Basically, every "hexagram" contains 6 lines, each of which can be either divided or not.
So there are a total of
2^6 = 64 possible "hexagrams"
The task is to write a method that prints all possible combinations.
That's what I have so far:
public class test {
public String toBin (int zahl) {
if(zahl ==0) return "0";
if (zahl ==1 ) return "1";
return ""+(toBin( zahl/2)+(zahl%2));
}
public void show (String s) {
for (char c : s.toCharArray()){
if (c == '1'){
System.out.println("--- ---");
}
if(c=='0'){
System.out.println("-------");
}
}
}
public void ausgeben (){
for(int i = 0 ; i < 64; i++) {
show (toBin(i));
}
}
}
The problem is, when I test the 'show' method with "10" I get 3 lines and not 2 as intended.
public class runner {
public static void main(String[] args){
test a = new test();
a.ausgeben();
a.show("10");
}
}
Another problem I've encountered is that, since I'm converting to binary, I sometimes do not have enough lines: for example, 10 in binary as six digits is 001010, but the leading zeros are missing. How can I add them in an easy way without changing much?
I am fairly new to all this, so if I didn't explain something well enough or made any mistakes, feel free to tell me.

You may find it easier if you use the Integer.toBinaryString method combined with the String.format and String.replace methods.
String binary = String.format("%6s", Integer.toBinaryString(zahl)).replace(' ', '0');
This converts the number to binary, formats it in a field six characters wide (with leading spaces as necessary), and then replaces the spaces with '0'.
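Put together with the drawing logic from the question, the one-liner might be used as in this sketch. The class name HexagramLines and the method toSixBits are made up for illustration, and the input is assumed to lie in the range 0 to 63.

```java
final class HexagramLines {
    /** Six-digit binary representation of n; assumes 0 <= n < 64. */
    static String toSixBits(int n) {
        return String.format("%6s", Integer.toBinaryString(n)).replace(' ', '0');
    }

    /** Prints one hexagram: '1' becomes a divided line, '0' an undivided one (as in the question). */
    static void show(String bits) {
        for (char c : bits.toCharArray()) {
            System.out.println(c == '1' ? "--- ---" : "-------");
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 64; i++) {
            show(toSixBits(i));
            System.out.println(); // blank line between hexagrams
        }
    }
}
```

Since every string now has exactly six digits, every hexagram is printed with exactly six lines.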

Well, there are many ways to pad a string with zeros, or create a binary string that is already padded with zeros.
For example, you could do something like:
public String padToSix( String binStr ) {
return "000000".substring( 0, 6 - binStr.length() ) + binStr;
}
This checks how long your string is and takes as many zeros as are needed from the "000000" string to fill it up to six digits.
Or you could simply replace your conversion method (which is recursive, and that's not really necessary) with one that specializes in six-digit numbers:
public static String toBin (int zahl) {
char[] digits = { '0','0','0','0','0','0' };
int currDigitIndex = 5;
while ( currDigitIndex >= 0 && zahl > 0 ) {
digits[currDigitIndex] += (zahl % 2);
currDigitIndex--;
zahl /= 2;
}
return new String(digits);
}
This one modifies the character array ( which initially has only zeros ) from the right to the left. It adds the value of the current bit to the character at the given place. '0' + 0 is '0', and '0' + 1 is '1'. Because you know in advance that you have six digits, you can start from the right and go to the left. If your number has only four digits, well, the two digits we haven't touched will be '0' because that's how the character array was initialized.
There are really a lot of methods to achieve the same thing.

Your problem reduces to printing all binary strings of length 6. I would go with this code snippet:
String format = "%06d";
for(int i = 0; i < 64; i++)
{
show(String.format(format, Integer.valueOf(Integer.toBinaryString(i))));
System.out.println();
}
If you don't wish to print leading zeros, replace String.format(..) with Integer.toBinaryString(i).
