Remove consecutive duplicate characters from a String

I'm trying to remove duplicate characters from a string recursively.
I don't know how to fix this code so that it keeps the first character when consecutive duplicates differ in case.
/**
* Remove consecutive duplicate characters from a String. <br>
* Case should not matter, if two or more consecutive duplicate <br>
* characters have different cases, then the first letter should be kept.
* @param word A word with possible consecutive duplicate characters.
* @return A word without any consecutive duplicate characters.
*/
public static String dedupeChars(String word){
if ( word.length() <= 1 )
return word;
if( word.substring(0,1).equalsIgnoreCase(word.substring(1,2)) )
return dedupeChars(word.substring(1));
else
return word.substring(0,1) + dedupeChars(word.substring(1));
}

You were on the right track, but your logic was a bit off. Consider this version, with explanation below the code:
public static String dedupeChars(String word) {
if (word.length() <= 1) {
return word;
}
if (word.substring(0,1).equalsIgnoreCase(word.substring(1,2))) {
return dedupeChars(word.substring(0, 1) + word.substring(2));
}
else {
return word.substring(0,1) + dedupeChars(word.substring(1));
}
}
System.out.println(dedupeChars("aaaaBbBBBbCDdefghiIIiJ"));
This prints:
aBCDefghiJ
As for the algorithm: your base case was correct, and for a word of at most one character we just return that word. When the first character is identical to the second one (ignoring case), we splice out the second character and recursively call dedupeChars() again on the result. Otherwise we keep the first character and recurse on the rest of the string. For example, here is what happens with the input string shown above:
aaaaBbBBBbCDdefghiIIiJ
aaaBbBBBbCDdefghiIIiJ
aaBbBBBbCDdefghiIIiJ
aBbBBBbCDdefghiIIiJ
That is, we splice out duplicates, always retaining the first occurrence, until there are no more duplicates.
By the way, in practice you might instead want to use regex here, for a much more concise solution:
String input = "aaaaBbBBBbCDdefghiIIiJ";
input = input.replaceAll("(?i)(.)\\1+", "$1");
System.out.println(input);
This prints:
aBCDefghiJ
Here the regex matches any single character followed by one or more case-insensitive repeats of itself, and replaces the whole run with its first character.

I have a different way to achieve this.
I think your code is too expensive for removing consecutive duplicate characters (ignoring case and keeping just the first one).
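public static String removeDup(String s) {
    if (s.isEmpty()) {
        return s; // nothing to deduplicate
    }
    char[] chars = s.toCharArray();
    StringBuilder sb = new StringBuilder();
    // Walk backwards, skipping any character that matches its predecessor
    for (int i = chars.length - 1; i > 0; i--) {
        // Compare case-insensitively instead of relying on ASCII offsets,
        // so that unrelated pairs such as 'P' and '0' are not treated as equal
        if (Character.toLowerCase(chars[i]) == Character.toLowerCase(chars[i - 1])) {
            continue;
        }
        sb.append(chars[i]);
    }
    // The first character of the string is always kept
    sb.append(chars[0]);
    return sb.reverse().toString();
}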
For the input "aaaaBbBBBbCDdefghiIIiJ" the output will be "aBCDefghiJ"

Related

An underscore, a dot and a dash must always be followed by one or more alphanumeric characters without using Regex

I'm working on a school assignment that validates emails without using Regex. The premise of this exercise is to learn about methods and to practice our critical thinking. I understand that the code can be reduced to fewer lines.
Right now, I need to check all conditions for the prefix of an email (the characters before '@'):
1. It contains at least one character.
2. It contains only alphanumeric characters, underscores (‘_’), periods (‘.’), and dashes (‘-’).
3. An underscore, a period, or a dash must always be followed by one or more alphanumeric characters.
4. The first character must be alphanumeric.
Examples of valid prefixes are: “abc-d”, “abc.def”, “abc”, “abc_def”.
Examples of invalid prefixes are: “abc-”, “abc..d”, “.abc”, “abc#def”.
I'm having a hard time figuring out the third condition. So far, I have these methods that meet the other conditions.
public static boolean isAlphanumeric(char c) {
return Character.isLetterOrDigit(c);
}
public static boolean isValidPrefixChar(char preChar) {
char[] prefixChar = new char[] {'-', '.', '_'};
for (int i = 0; i < prefixChar.length; i++) {
if (prefixChar[i] == preChar) {
return true;
} else if (isAlphanumeric(preChar)) {
return true;
}
}
return false;
}
public static boolean isValidPrefix(String emailPrefix) {
boolean result = false;
// To check if first character is alphanumeric
if (isAlphanumeric(emailPrefix.charAt(0)) && emailPrefix.length() > 1) {
for (int i = 0; i < emailPrefix.length(); i++) {
// If email prefix respects all conditions, change result to true
if (isValidPrefixChar(emailPrefix.charAt(i))) {
result = true;
} else {
result = false;
break;
}
}
}
return result;
}

Let's look at your list:
1. It contains at least one character.
2. It contains only alphanumeric characters, underscores (‘_’), periods (‘.’), and dashes (‘-’).
3. An underscore, a period, or a dash must always be followed by one or more alphanumeric characters.
4. The first character must be alphanumeric.
As you say, 1, 2, and 4 are easy. Here's what I would do. My first line would check the length and return false if it is incorrect. I would then declare boolean lastWasSpecial = false and iterate over the characters. Inside the loop:
- Check that it's a legal character (condition 2).
- If index == 0, check that it's alphanumeric (condition 4).
- If it's one of the specials:
    - If lastWasSpecial is set, return false.
    - Set lastWasSpecial = true.
- Otherwise, set lastWasSpecial = false again.
After the loop, return false if lastWasSpecial is still set, because the prefix then ends in a special character.
Should be about 10 lines of easily readable code.
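The steps above can be sketched roughly as follows. This is only one possible reading of them: the class name PrefixCheck and the helper isSpecial are made up for illustration, and the final check assumes that a prefix ending in a special character violates condition 3.

```java
final class PrefixCheck {
    // Hypothetical helper: true for the three allowed punctuation characters.
    static boolean isSpecial(char c) {
        return c == '-' || c == '.' || c == '_';
    }

    static boolean isValidPrefix(String prefix) {
        // Condition 1: at least one character
        if (prefix == null || prefix.isEmpty()) {
            return false;
        }
        boolean lastWasSpecial = false;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            boolean special = isSpecial(c);
            // Condition 2: only alphanumerics and the three specials are legal
            if (!special && !Character.isLetterOrDigit(c)) {
                return false;
            }
            // Condition 4: the first character must be alphanumeric
            if (i == 0 && special) {
                return false;
            }
            // Condition 3: a special may not directly follow another special
            if (special && lastWasSpecial) {
                return false;
            }
            lastWasSpecial = special;
        }
        // Condition 3 again: a special at the very end has nothing after it
        return !lastWasSpecial;
    }

    public static void main(String[] args) {
        String[] samples = {"abc-d", "abc.def", "abc", "abc_def", "abc-", "abc..d", ".abc", "abc#def"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidPrefix(s));
        }
    }
}
```

Running main prints true for the four valid examples from the question and false for the four invalid ones.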

The algorithm can be optimised, but I tried to change just some lines and to use the same code style. I added the explanation in the comments.
public static boolean isAlphanumeric(char c) {
return Character.isLetterOrDigit(c);
}
public static boolean isValidPrefixChar(char preChar) {
char[] prefixChar = new char[]{'-', '.', '_'};
for (int i = 0; i < prefixChar.length; i++) {
if (prefixChar[i] == preChar) {
return true;
}
}
return false;
}
public static boolean isValidPrefix(String emailPrefix) {
    // Conditions 1 and 4: the prefix is not empty and starts with an alphanumeric character
    if (emailPrefix.isEmpty() || !isAlphanumeric(emailPrefix.charAt(0))) {
        return false;
    }
    // this boolean is set to true when the next char has to be alphanumeric
    boolean nextHasToBeAlphaNumeric = false;
    // the for loop starts from 1 because char 0 has already been checked
    for (int i = 1; i < emailPrefix.length(); i++) {
        char character = emailPrefix.charAt(i);
        if (isValidPrefixChar(character)) {
            // the previous char was '.', '_' or '-': two of them cannot be adjacent
            if (nextHasToBeAlphaNumeric) {
                return false;
            }
            // the next char has to be alphanumeric
            nextHasToBeAlphaNumeric = true;
        } else if (isAlphanumeric(character)) {
            nextHasToBeAlphaNumeric = false;
        } else {
            // any other character is not allowed (condition 2)
            return false;
        }
    }
    // a trailing '.', '_' or '-' is followed by nothing, so it is rejected
    return !nextHasToBeAlphaNumeric;
}

local-part
FYI, the portion of the address before the COMMERCIAL AT sign (@) is called a local-part.
Avoid char
Code that uses char breaks when it encounters characters outside the Basic Multilingual Plane (BMP). As a 16-bit value, a char cannot represent most Unicode characters.
Code points
Use code points instead, when working with individual characters. A code point is the number assigned permanently to each of the over 140,000 characters defined in Unicode.
int[] codePoints = localPart.codePoints().toArray() ;
Define an array, list, or set of your acceptable punctuation characters.
int codePoint = "-".codePointAt( 0 ) ; // Annoying zero-based index counting.
To verify that every punctuation character encountered is followed by a letter/digit, first make sure the punctuation mark is not the last character. If not, then look ahead on the array for the following code point. Test if that code point is a letter or digit.
if( Character.isLetterOrDigit( codePoints[ i + 1 ] ) ) { … }
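Put together, the look-ahead check described above might look like the sketch below. It covers only the "punctuation must be followed by a letter or digit" rule, not the other conditions. The class name LocalPartCheck, the method name, and the PUNCTUATION set are invented for illustration, and Set.of assumes Java 9 or later.

```java
import java.util.Set;

final class LocalPartCheck {
    // Acceptable punctuation characters, stored as code points.
    static final Set<Integer> PUNCTUATION = Set.of(
            "-".codePointAt(0), ".".codePointAt(0), "_".codePointAt(0));

    // True if every punctuation mark is immediately followed by a letter or digit.
    static boolean punctuationFollowedByAlphanumeric(String localPart) {
        int[] codePoints = localPart.codePoints().toArray();
        for (int i = 0; i < codePoints.length; i++) {
            if (!PUNCTUATION.contains(codePoints[i])) {
                continue;
            }
            // A punctuation mark in last position has nothing following it.
            if (i == codePoints.length - 1) {
                return false;
            }
            // Look ahead at the next code point.
            if (!Character.isLetterOrDigit(codePoints[i + 1])) {
                return false;
            }
        }
        return true;
    }
}
```

Because the loop works on an int[] of code points rather than on char values, a character outside the BMP occupies a single array slot instead of two surrogate halves.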

Find the largest ASCII character in a string recursively - java

I am trying to write a method which returns the largest character in a string according to ASCII (a character is greater if it comes later in the ASCII table). This is what I have so far,
public char maxChar (String s) {
char[] characters = s.toCharArray();
char character = characters[0];
return maxCharHelper(characters, character, 0);
}
private static char maxCharHelper(char[] characters, char character, int index) {
if (index >= characters.length - 1) {
return character;
}
if (characters[index] > character) {
character++;
}
return maxCharHelper(characters, character, ++index);
}
I receive three issues which are:
1) when string "helloWORLD" is used it returns 107(k) instead of 111(o)
2) when string "helloworld" is used it returns 110(n) instead of 119(w)
lastly, 3) when string "abbxL ? 12 x5y #" is used it returns 101(e) instead of 121(y)
Not sure why this happens, is there anything wrong with my code? Any help is appreciated.

Replace character++; with character = characters[index]; and it will work.
Also replace index >= characters.length - 1 with index > characters.length - 1, otherwise the last character won't be checked.
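With both changes applied, the method could look like the sketch below. The wrapper class MaxCharDemo is only there to make the snippet self-contained, maxChar is made static for the same reason, and the input is assumed to be non-empty (an empty string would still fail on characters[0]).

```java
final class MaxCharDemo {
    static char maxChar(String s) {
        char[] characters = s.toCharArray();
        // Start with the first character as the current maximum.
        return maxCharHelper(characters, characters[0], 0);
    }

    private static char maxCharHelper(char[] characters, char character, int index) {
        // Stop only once the index has moved past the last element.
        if (index > characters.length - 1) {
            return character;
        }
        // Take over the larger character instead of incrementing the old one.
        if (characters[index] > character) {
            character = characters[index];
        }
        return maxCharHelper(characters, character, index + 1);
    }

    public static void main(String[] args) {
        System.out.println(maxChar("helloWORLD"));       // o
        System.out.println(maxChar("helloworld"));       // w
        System.out.println(maxChar("abbxL ? 12 x5y #")); // y
    }
}
```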

How to access an array when it is within an arraylist?

The overall goal of what I'm trying to do is to compare a string to index 0 of an array (that is held within an arraylist), and if the strings are the same (ignoring case), call a method that matches the case of the string to the translated word (held at index 1 of the array inside an arraylist). When I run this code and I print out the contents of my translated arraylist, I get all "no match" characters. I'm assuming this is because I'm not accessing the index I want in the correct manner. Please help!
public static String translate(String word, ArrayList<String[]> wordList) {
if (word == "." || word == "!" || word == ";" || word == ":") {
return word;
}
for (int i = 0; i < wordList.size(); i++) {
String origWord = wordList.get(i)[0];
String transWord = wordList.get(i)[1];
if (word.equalsIgnoreCase(origWord)) { //FIXME may need to change if you need to switch from translated to original
String translated = matchCase(word, transWord);
return translated;
}
}
String noMatch = Character.toString(Config.LINE_CHAR);
return noMatch;
}
Sample Data and expected result
word = "hello"
wordList.get(i)[0] = "Hello"
wordList.get(i)[1] = "Hola"
(word and wordList.get(i)[0] match, so the next step is executed)
match case method is called and returns the translated word with the same case as the original word ->
translated = "hola"
returns the translated word.
(the for loop iterates through the entire wordList until it finds a match, then it calls the translate method)
Match Case's Code
public static String matchCase(String template, String original) {
String matched = "";
if (template.length() > original.length()) {
for (int i = 1; i <= original.length(); i++) {
if (template.charAt(i-1) >= 'a' && template.charAt(i-1) <= 'z') {
if (i == original.length()) {
matched += original.substring(original.length() - 1).toLowerCase();
}
else {
matched += original.substring((i-1), i).toLowerCase();
}
}
else if (template.charAt(i-1) >= 'A' && template.charAt(i-1) <= 'Z') {
if (i == original.length()) {
matched += original.substring(original.length() - 1).toUpperCase();
}
else {
matched += original.substring((i-1), i).toUpperCase();
}
}
}
return matched;
}
else if (template.length() < original.length()) {
int o;
original.toLowerCase();
for (int i = 1; i <= template.length(); i++) {
if (template.charAt(i-1) >= 'a' && template.charAt(i-1) <= 'z') {
if (i == template.length()) {
matched += original.substring(original.length() - 1).toLowerCase();
}
else {
matched += original.substring((i-1), i).toLowerCase();
}
}
else if (template.charAt(i-1) >= 'A' && template.charAt(i-1) <= 'Z') {
if (i == template.length()) {
matched += original.substring(original.length() - 1).toUpperCase();
}
else {
matched += original.substring((i-1), i).toUpperCase();
}
}
String newMatched = matched + original.substring(i, original.length() - 1);
matched = newMatched;
newMatched = "";
}
return matched;
}
return original;
}

I have tested your code and it works rather well with the example you have provided, so I cannot help with your bug.
There are, however, some bugs to point out and improvements to suggest:
- matchCase fails when template is shorter than the translated word.
- Never compare strings with ==. Use the equals method instead, because == compares object references rather than contents.
- This is not really important, but why is noMatch computed on every call? Why don't you declare it once as a constant?
public static final String NO_MATCH = String.valueOf(Config.LINE_CHAR);
- More importantly, I think matchCase is poorly suited to the task by design and is overcomplicated. You should just determine whether the word to translate is all lower case, all upper case, or capitalized (first letter upper case, the rest lower case). Comparing the case letter by letter does not make sense when the lengths differ.
- When you consider a single character, use charAt instead of substring; it is simpler and faster.
- You also might have a look at regex to analyze your Strings.
- Have you considered Maps for your translation lookup?
...
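As a rough illustration of the suggestions above (equals instead of ==, a constant for the no-match value, a whole-word case check, and a Map lookup), here is a minimal sketch. Everything in it is an assumption made for the example: the class name TranslatorSketch, the "-" placeholder standing in for Config.LINE_CHAR, and the convention that dictionary keys are stored in lower case.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TranslatorSketch {
    // Hypothetical stand-in for String.valueOf(Config.LINE_CHAR).
    static final String NO_MATCH = "-";

    // Applies the overall case pattern of template to translated.
    static String matchCase(String template, String translated) {
        if (template.isEmpty() || translated.isEmpty()) {
            return translated;
        }
        if (template.equals(template.toLowerCase(Locale.ROOT))) {
            return translated.toLowerCase(Locale.ROOT);
        }
        if (template.equals(template.toUpperCase(Locale.ROOT))) {
            return translated.toUpperCase(Locale.ROOT);
        }
        if (Character.isUpperCase(template.charAt(0))) {
            // Capitalized: first letter upper case, the rest lower case.
            return Character.toUpperCase(translated.charAt(0))
                    + translated.substring(1).toLowerCase(Locale.ROOT);
        }
        return translated; // mixed case: leave the translation untouched
    }

    // Assumes the dictionary keys are lower-cased original words.
    static String translate(String word, Map<String, String> dictionary) {
        // equals compares contents; == would compare references.
        if (word.equals(".") || word.equals("!") || word.equals(";") || word.equals(":")) {
            return word;
        }
        String translated = dictionary.get(word.toLowerCase(Locale.ROOT));
        return translated == null ? NO_MATCH : matchCase(word, translated);
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("hello", "Hola");
        System.out.println(translate("hello", dictionary)); // hola
        System.out.println(translate("HELLO", dictionary)); // HOLA
        System.out.println(translate("Hello", dictionary)); // Hola
    }
}
```

The Map replaces the linear scan over the ArrayList<String[]>, at the cost of normalising the keys once when the dictionary is built.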

Removing Consecutive Characters in each Iteration shows Unexpected error

How to remove Consecutive Characters at each Iteration..
Below is the screenshot that explains the question with more details
MySolution
Initially I check whether there are any consecutive characters.
If yes, I remove all the consecutive characters, and when there are none left I add the remaining characters to another String.
If there are no consecutive characters, I simply increment the index.
public static void print(){
String s1="aabcccdee"; // I have taken a sample test case
String s2="";
for(int i=0;i<s1.length();){
if(s1.charAt(i)==s1.charAt(i+1)){
while(s1.charAt(i)==s1.charAt(i+1)){
i++;
}
for(int j=i+1;j<s1.length();j++){
s2=s2+s1.charAt(j);
}
s1=s2;
}
else
i++;
}
System.out.println(s1);
}
Output Shown
An infinite Loop
The expected output for the given sample is
bd
Can anyone guide me on how to correct this?

You can simply use String::replaceFirst with the regex (.)\1+, which matches any character (.) followed by itself (\1) one or more times (+), and replace the match with an empty string.
If you want to remove one run at a time, you have to check after each replacement whether the input still contains consecutive characters. For that you can use Pattern and Matcher like this:
String[] strings = {"aabcccdee", "abbabba", "abbd "};
// Requires java.util.regex.Pattern; compile the pattern once, outside the loop
Pattern pattern = Pattern.compile("(.)\\1+");
for (String str : strings) {
    // While the input still contains a run of identical consecutive chars, remove one run
    while (pattern.matcher(str).find()) {
        // Note: use replaceFirst instead of replaceAll
        str = pattern.matcher(str).replaceFirst("");
    }
    System.out.println(str);
}
Outputs
aabcccdee -> bd
abbabba -> a
abbd -> ad

Update
I had misread the question. The intent is to also remove the consecutive characters after each replacement. The below code does that.
private static String removeDoubles(String str) {
int s = -1;
for (int i = 1; i < str.length(); i++) {
// If the current character is the same as the previous one,
// remember its start position, but only if it is not set yet
// (its value is -1)
if (str.charAt(i) == str.charAt(i - 1)) {
if (s == -1) {
s = i - 1;
}
}
else if (s != -1) {
// If the current char is not equal to the previous one,
// we have found our end position. Cut the characters away
// from the string.
str = str.substring(0, s) + str.substring(i);
// Reset i. Notice that we don't have to loop from 0 on,
// instead we can start from our last replacement position.
// Clamp to 0 so that the loop's i++ never leads to charAt(-1).
i = Math.max(s - 1, 0);
// Finally reset our start position
s = -1;
}
}
if (s != -1) {
// Check the last portion
str = str.substring(0, s);
}
return str;
}
Note that this is almost 10 times faster than YCF_L's answer.
Original post
You are almost there, but you don't have to use multiple for loops. You just need one loop, because whether to remove characters from the string only depends on subsequent characters; we don't need to count anything.
Try this:
private static String removeDoubles(String s) {
boolean rem = false;
String n = "";
for (int i = 0; i < s.length() - 1; i++) {
// First, if the current char equals the next char, don't add the
// character to the new string and set 'rem' to true, which is used
// to remove the last character of the sequence of the same
// characters.
if (s.charAt(i) == s.charAt(i + 1)) {
rem = true;
}
// If this is the last character of a sequence of 'doubles', then
// reset 'rem' to false.
else if (rem) {
rem = false;
}
// Else add the current character to the new string
else {
n += s.charAt(i);
}
}
// We haven't checked the last character yet. Let's add it to the string
// if 'rem' is false.
if (!rem) {
n += s.charAt(s.length() - 1);
}
return n;
}
Note that this code is on average more than three times faster than regular expressions.

Try something like this:
public static void print() {
String s1 = "abcccbd"; // I have taken a sample test case
String s2 = "";
while (!s1.equals(s2)) {
s2 = s1;
s1 = s1.replaceAll("(.)\\1+", "");
}
System.out.println(s1);
}

Consider this easier-to-understand code:
String s1 = "aabcccdee";
rvpoint:
while (true) {
    for (int x = 0; x < s1.length() - 1; x++) {
        char c = s1.charAt(x);
        if (c == s1.charAt(x + 1)) {
            // find the end of the run of identical characters starting at x
            int end = x + 2;
            while (end < s1.length() && s1.charAt(end) == c) {
                end++;
            }
            // cut only this run; String.replace would remove every occurrence of c
            s1 = s1.substring(0, x) + s1.substring(end);
            continue rvpoint; // restart the scan if a removal was made
        }
    }
    break; // break out of the outer loop if no run was found
}
System.out.println(s1);
Note
The label sits on the outer loop, so the scan restarts after every removal and stops once no consecutive characters remain.

Finding the number of words in a string [duplicate]

I can't seem to figure out why this doesn't work, but I may have just missed some simple logic. The method doesn't seem to find the last word when there isn't a space after it, so I'm guessing something is wrong with i == itself.length() - 1, but it seems to me that it would return true: you're on the last character and it isn't whitespace.
public void numWords()
{
int numWords = 0;
for (int i = 1; i <= itself.length()-1; i ++)
{
if (( i == (itself.length() - 1) || itself.charAt (i) <= ' ') && itself.charAt(i-1) > ' ')
numWords ++;
}
System.out.println(numWords);
}
itself is the string. I am comparing the characters the way I am because that's how it is shown in the book, but please let me know if there are better ways.

Naïve approach: treat everything that has a space following it as a word. With that, simply count the number of elements as the result of a String#split operation.
public int numWords(String sentence) {
if(null != sentence) {
return sentence.split("\\s").length;
} else {
return 0;
}
}

Try,
int numWords = (itself==null) ? 0 : itself.split("\\s+").length;

So basically it seems you're trying to count the words by looking at the chunks of whitespace in the string. I'll fix up your code and use my head compiler to help you out with the problems you're experiencing.
public void numWords()
{
    int numWords = 0;
    for (int i = 0; i < itself.length(); i++)
    {
        // A word starts at a non-space char that is either the first
        // char or directly follows a space
        if (itself.charAt(i) != ' ' && (i == 0 || itself.charAt(i - 1) == ' ')) {
            numWords++;
        }
    }
    System.out.println(numWords);
}
This is a very naive algorithm and fails in a few cases: it only recognises the space character as a separator, so a string whose words are separated by a tab, for example, would count as 1 word.
Note that if I was going to implement such a method I would go with a regex approach similar to Makoto's answer in order to simplify the code.

The following code fragment does the job better:
if(sentence == null) {
return 0;
}
sentence = sentence.trim();
if ("".equals(sentence)) {
return 0;
}
return sentence.split("\\s+").length;
The regex \\s+ works correctly when words are separated by several spaces. trim()
removes trailing and leading spaces. The additional empty-string check
prevents a result of 1 for an empty string.
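For reference, here is the same fragment wrapped in a method so that it can be run as is. The class name WordCount and the method name countWords are placeholders chosen for this sketch.

```java
final class WordCount {
    // Counts whitespace-separated words; null, empty, and blank input give 0.
    static int countWords(String sentence) {
        if (sentence == null) {
            return 0;
        }
        // Drop leading and trailing whitespace so split yields no empty first token.
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // One or more whitespace characters count as a single separator.
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hello world"));       // 2
        System.out.println(countWords("  hello   world  ")); // 2
        System.out.println(countWords(""));                  // 0
    }
}
```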
