Generate new word from wildcard [duplicate]


This question already has answers here:
Returning a list of wildcard matches from a HashMap in java
(3 answers)
Closed 7 years ago.
I'm trying to generate words from a pattern with wildcards and check whether each one is stored in the dictionary database. For example, "appl*" should return apply or apple. The problem arises when there are two wildcards: "app**" produces words like appaa, appbb, ..., appzz instead of apple. The second if condition handles a regular string that contains no "*" wildcard.
public static boolean printWords(String s) {
String tempString, tempChar;
if (s.contains("*")) {
for (char c = 'a'; c <= 'z'; c++) {
tempChar = Character.toString(c);
tempString = s.replace("*", tempChar);
if (myDictionary.containsKey(tempString) == true) {
System.out.println(tempString);
}
}
}
if (myDictionary.containsKey(s) == true) {
System.out.println(s);
return true;
} else {
return false;
}
}

You're only using a single for loop over characters, and replacing all instances of * with that character. See the API for String.replace here. So it's no surprise that you're getting strings like Appaa, Appbb, etc.
If you want to actually use regular expressions, then you shouldn't be doing any String.replace or contains, etc. See Anubian's answer for how to handle your problem.
If you're treating this as a String exercise and don't want to use regular expressions, the easiest way to do what you're actually trying to do (try all combinations of letters for each wildcard) is to do it recursively. If there are no wild cards left in the string, check if it is a word and if so print. If there are wild cards, try each replacement of that wildcard with a character, and recursively call the function on the created string.
public static void printWords(String s){
int firstAsterisk = s.indexOf("*");
if(firstAsterisk == -1){ // doesn't contain asterisk
if (myDictionary.containsKey(s))
System.out.println(s);
return;
}
for(char c = 'a'; c <= 'z'; c++){
String s2 = s.substring(0, firstAsterisk) + c + s.substring(firstAsterisk + 1);
printWords(s2);
}
}
The base case relies on the indexOf function: when indexOf returns -1, it means that the given substring (in our case "*") does not occur in the string, so there are no more wildcards to replace.
The substring part basically recreates the original string with the first asterisk replaced with a character. So supposing that s = "abcd**ef" and c='z', we know that firstAsterisk = 4 (Strings are 0-indexed, index 4 has the first "*"). Thus,
String s2 = s.substring(0, firstAsterisk) + c + s.substring(firstAsterisk + 1);
= "abcd" + 'z' + "*ef"
= "abcdz*ef"
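A variant of the recursive approach that returns the matches instead of printing them can be easier to test. The sketch below is only an illustration under assumptions: the class name WildcardExpander and the use of a Set<String> as the dictionary are hypothetical (the original code calls myDictionary.containsKey). Each wildcard multiplies the number of candidates by 26, so this is practical only for a handful of wildcards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class WildcardExpander {
    // Returns every dictionary word produced by replacing each '*'
    // with one lowercase letter a-z.
    static List<String> expand(String s, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        expand(s, dictionary, found);
        return found;
    }

    private static void expand(String s, Set<String> dictionary, List<String> found) {
        int star = s.indexOf('*');
        if (star == -1) {
            // Base case: no wildcard left, so s is a concrete candidate.
            if (dictionary.contains(s)) {
                found.add(s);
            }
            return;
        }
        for (char c = 'a'; c <= 'z'; c++) {
            // Replace only the first '*' and recurse on the rest.
            expand(s.substring(0, star) + c + s.substring(star + 1), dictionary, found);
        }
    }
}
```

With a dictionary containing apple, apply and apron, expand("app**", dictionary) would return apple and apply.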

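The * character is a regex quantifier, not a wildcard for a single character, so replace each * with . (which matches any one character) and then treat the input string as a regular expression. Since myDictionary is used with containsKey, it is presumably a Map, so loop over its keys:
for (String word : myDictionary.keySet()) {
if (word.matches(s.replace('*', '.'))) {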
System.out.println(word);
}
}
Let the libraries do the heavy lifting for you ;)

With your approach you have to check all possible combinations.
The better way would be to make a regex out of your input string, i.e. replace every * with a dot (.).
Then you can loop over your myDictionary and check for every entry whether it matches the regex.
Something like this:
Set<String> dict = new HashSet<String>();
dict.add("apple");
String word = "app**";
Pattern pattern = Pattern.compile(word.replace('*', '.'));
for (String entry : dict) {
if (pattern.matcher(entry).matches()) {
System.out.println("matches: " + entry);
}
}
Take care if your input string already contains a .: you then have to escape it with a \. (The same goes for other special regex characters.)
See also
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html and
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
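One way to handle the escaping mentioned above is to quote the literal parts of the input with Pattern.quote and only turn each * into a dot. The following is a minimal sketch, assuming that * always stands for exactly one character; the class and method names (WildcardLookup, toPattern, matches) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

class WildcardLookup {
    // Builds a regex in which each '*' matches exactly one character
    // and every other character is matched literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append('.');
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the words that match the whole wildcard pattern.
    static List<String> matches(String wildcard, Collection<String> words) {
        Pattern pattern = toPattern(wildcard);
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                result.add(word);
            }
        }
        return result;
    }
}
```

Because the literal parts are quoted, an input such as "a.*" matches "a.b" but not "axb".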

Related

Capitalize first letters in words in the string with different separators using java 8 stream

I need to capitalize first letter in every word in the string, BUT it's not so easy as it seems to be as the word is considered to be any sequence of letters, digits, "_" , "-", "`" while all other chars are considered to be separators, i.e. after them the next letter must be capitalized.
Example what program should do:
For input: "#he&llo wo!r^ld"
Output should be: "#He&Llo Wo!R^Ld"
There are questions that sound similar here, but their solutions really don't help.
This one for example:
String output = Arrays.stream(input.split("[\\s&]+"))
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
.collect(Collectors.joining(" "));
As in my task there can be various separators, this solution doesn't work.

It is possible to split a string and keep the delimiters, so taking into account the requirement for delimiters:
word is considered to be any sequence of letters, digits, "_" , "-", "`" while all other chars are considered to be separators
the pattern which keeps the delimiters in the result array would be: "((?<=[^-`\\w])|(?=[^-`\\w]))":
[^-`\\w]: all characters except -, backtick and word characters \w: [A-Za-z0-9_]
Then, the "words" are capitalized, and delimiters are kept as is:
static String capitalize(String input) {
if (null == input || 0 == input.length()) {
return input;
}
return Arrays.stream(input.split("((?<=[^-`\\w])|(?=[^-`\\w]))"))
.map(s -> s.matches("[-`\\w]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
.collect(Collectors.joining(""));
}
Tests:
System.out.println(capitalize("#he&l_lo-wo!r^ld"));
System.out.println(capitalize("#`he`&l+lo wo!r^ld"));
Output:
#He&L_lo-wo!R^Ld
#`he`&L+Lo Wo!R^Ld
Update
If it is necessary to process not only the ASCII character set but also other alphabets (e.g. Cyrillic, Greek, etc.), the class \\p{IsWord} may be used, and matching of Unicode characters needs to be enabled with the pattern flag (?U):
static String capitalizeUnicode(String input) {
if (null == input || 0 == input.length()) {
return input;
}
return Arrays.stream(input.split("(?U)((?<=[^-`\\p{IsWord}])|(?=[^-`\\p{IsWord}]))"))
.map(s -> s.matches("(?U)[-`\\p{IsWord}]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
.collect(Collectors.joining(""));
}
Test:
System.out.println(capitalizeUnicode("#he&l_lo-wo!r^ld"));
System.out.println(capitalizeUnicode("#привет&`ёж`+дос^βιδ/ως"));
Output:
#He&L_lo-wo!R^Ld
#Привет&`ёж`+Дос^Βιδ/Ως

You can't use split that easily - split will eliminate the separators and give you only the things in between. As you need the separators, no can do.
One real dirty trick is to use something called 'lookahead'. That argument you pass to split is a regular expression. Most 'characters' in a regexp have the property that they consume the matching input. If you do input.split("\\s+") then that doesn't 'just' split on whitespace, it also consumes them: The whitespace is no longer part of the individual entries in your string array.
However, consider ^ and $. or \\b. These still match things but don't consume anything. You don't consume 'end of string'. In fact, ^^^hello$$$ matches the string "hello" just as well. You can do this yourself, using lookahead: It matches when the lookahead is there but does not consume it:
String[] args = "Hello World$Huh Weird".split("(?=[\\s_$-]+)");
for (String arg : args) System.out.println("*" + arg + "*");
Unfortunately, this 'works', in that it saves your separators, but isn't getting you all that much closer to a solution:
*Hello*
* World*
*$Huh*
* *
* *
* Weird*
You can go with lookbehind as well, but it's limited; they don't do variable length, for example.
The conclusion should rapidly become: Actually, doing this with split is a mistake.
Then, once split is off the table, you should no longer use streams, either: Streams don't do well once you need to know stuff about the previous element in a stream to do the job: A stream of characters doesn't work, as you need to know if the previous character was a non-letter or not.
In general, "I want to do X, and use Y" is a mistake. Keep an open mind. It's akin to asking: "I want to butter my toast, and use a hammer to do it". Oookaaaaayyyy, you can probably do that, but, eh, why? There are butter knives right there in the drawer, just.. put down the hammer, that's toast. Not a nail.
Same here.
A simple loop can take care of this, no problem:
private static final String BREAK_CHARS = "&-_`";
public String toTitleCase(String input) {
StringBuilder out = new StringBuilder();
boolean atBreak = true;
for (char c : input.toCharArray()) {
out.append(atBreak ? Character.toUpperCase(c) : c);
atBreak = Character.isWhitespace(c) || (BREAK_CHARS.indexOf(c) > -1);
}
return out.toString();
}
Simple. Efficient. Easy to read. Easy to modify. For example, if you want to go with 'any non-letter counts', trivial: atBreak = !Character.isLetter(c);.
Contrast to the stream solution which is fragile, weird, far less efficient, and requires a regexp that needs half a page's worth of comment for anybody to understand it.
Can you do this with streams? Yes. You can butter toast with a hammer, too. Doesn't make it a good idea though. Put down the hammer!

You can use a simple FSM as you iterate over the characters in the string, with two states, either in a word, or not in a word. If you are not in a word and the next character is a letter, convert it to upper case, otherwise, if it is not a letter or if you are already in a word, simply copy it unmodified.
boolean isWord(int c) {
return c == '`' || c == '_' || c == '-' || Character.isLetter(c) || Character.isDigit(c);
}
String capitalize(String s) {
StringBuilder sb = new StringBuilder();
boolean inWord = false;
for (int c : s.codePoints().toArray()) {
if (!inWord && Character.isLetter(c)) {
sb.appendCodePoint(Character.toUpperCase(c));
} else {
sb.appendCodePoint(c);
}
inWord = isWord(c);
}
return sb.toString();
}
Note: I have used codePoints(), appendCodePoint(int), and int so that characters outside the basic multilingual plane (with code points greater than 64k) are handled correctly.

I need to capitalize first letter in every word
Here is one way to do it. Admittedly this is a mite longer, but your requirement to change the first letter to upper case (not the first digit or first non-letter) required a helper method. Otherwise it would have been easier. Some others seem to have missed this point.
Establish word pattern, and test data.
String wordPattern = "[\\w_-`]+";
Pattern p = Pattern.compile(wordPattern);
String[] inputData = { "#he&llo wo!r^ld", "0hel`lo-w0rld" };
Now this simply finds each successive word in the string based on the established regular expression. As each word is found, it changes the first letter in the word to upper case and then puts it in a string buffer in the correct position where the match was found.
for (String input : inputData) {
StringBuilder sb = new StringBuilder(input);
Matcher m = p.matcher(input);
while (m.find()) {
sb.replace(m.start(), m.end(),
upperFirstLetter(m.group()));
}
System.out.println(input + " -> " + sb);
}
prints
#he&llo wo!r^ld -> #He&Llo Wo!R^Ld
0hel`lo-w0rld -> 0Hel`lo-W0rld
Since words may start with digits, and the requirement was to convert the first letter (not the first character) to upper case, this method finds the first letter, converts it to upper case and returns the new string. So 01_hello would become 01_Hello.
public static String upperFirstLetter(String word) {
char[] chs = word.toCharArray();
for (int i = 0; i < chs.length; i++) {
if (Character.isLetter(chs[i])) {
chs[i] = Character.toUpperCase(chs[i]);
break;
}
}
return String.valueOf(chs);
}
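The find-and-replace loop above can also be written with Matcher.replaceAll and a replacement function, which is available from Java 9 onward. This is a hedged sketch rather than the answer's own method: the class name WordCapitalizer is hypothetical, and the pattern [-`\\w]+ follows the question's definition of a word (so, unlike the pattern above, a hyphen does not start a new word).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCapitalizer {
    // A word is any run of letters, digits, '_', '-' or '`'.
    private static final Pattern WORD = Pattern.compile("[-`\\w]+");

    static String capitalize(String input) {
        // quoteReplacement guards against '$' or '\' in the replacement text.
        return WORD.matcher(input)
                .replaceAll(mr -> Matcher.quoteReplacement(upperFirstLetter(mr.group())));
    }

    // Upper-cases the first letter of the word, skipping leading non-letters.
    static String upperFirstLetter(String word) {
        char[] chs = word.toCharArray();
        for (int i = 0; i < chs.length; i++) {
            if (Character.isLetter(chs[i])) {
                chs[i] = Character.toUpperCase(chs[i]);
                break;
            }
        }
        return String.valueOf(chs);
    }
}
```

With this pattern, "#he&llo wo!r^ld" becomes "#He&Llo Wo!R^Ld", while "0hel`lo-w0rld" is treated as a single word and becomes "0Hel`lo-w0rld".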

Java regex: Replace all characters with `+` except instances of a given string

I have the following problem which states
Replace all characters in a string with + symbol except instances of the given string in the method
so for example if the string given was abc123efg and they want me to replace every character except every instance of 123 then it would become +++123+++.
I figured a regular expression is probably the best for this and I came up with this.
str.replaceAll("[^str]","+")
where str is a variable, but it's not letting me use the method without putting it in quotation marks. If I just want to use the variable string str, how can I do that? I ran it with the string typed in manually and it worked, but can I just pass in a variable?
As of right now I believe it's looking for the string "str" and not the variable string.
Here is the output; it's right for most cases, except for two :(
List of open test cases:
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
plusOut("abXYabcXYZ", "ab") → "ab++ab++++"
plusOut("abXYabcXYZ", "abc") → "++++abc+++"
plusOut("abXYabcXYZ", "XY") → "++XY+++XY+"
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
plusOut("--++ab", "++") → "++++++"
plusOut("aaxxxxbb", "xx") → "++xxxx++"
plusOut("123123", "3") → "++3++3"

Looks like this is the plusOut problem on CodingBat.
I had 3 solutions to this problem, and wrote a new streaming solution just for fun.
Solution 1: Loop and check
Create a StringBuilder out of the input string, and check for the word at every position. Replace the character if it doesn't match, and skip the length of the word if found.
public String plusOut(String str, String word) {
StringBuilder out = new StringBuilder(str);
for (int i = 0; i < out.length(); ) {
if (!str.startsWith(word, i))
out.setCharAt(i++, '+');
else
i += word.length();
}
return out.toString();
}
This is probably the expected answer for a beginner programmer, though there is an assumption that the string doesn't contain any astral plane character, which would be represented by 2 char instead of 1.
Solution 2: Replace the word with a marker, replace the rest, then restore the word
public String plusOut(String str, String word) {
return str.replaceAll(java.util.regex.Pattern.quote(word), "#").replaceAll("[^#]", "+").replaceAll("#", word);
}
Not a proper solution since it assumes that a certain character or sequence of character doesn't appear in the string.
Note the use of Pattern.quote to prevent the word being interpreted as regex syntax by replaceAll method.
Solution 3: Regex with \G
public String plusOut(String str, String word) {
word = java.util.regex.Pattern.quote(word);
return str.replaceAll("\\G((?:" + word + ")*+).", "$1+");
}
Construct regex \G((?:word)*+)., which does more or less what solution 1 is doing:
\G makes sure the match starts from where the previous match leaves off
((?:word)*+) picks out 0 or more instance of word - if any, so that we can keep them in the replacement with $1. The key here is the possessive quantifier *+, which forces the regex to keep any instance of the word it finds. Otherwise, the regex will not work correctly when the word appear at the end of the string, as the regex backtracks to match .
. will not be part of any word, since the previous part already picks out all consecutive appearances of word and disallow backtrack. We will replace this with +
Solution 4: Streaming
public String plusOut(String str, String word) {
return String.join(word,
Arrays.stream(str.split(java.util.regex.Pattern.quote(word), -1))
.map((String s) -> s.replaceAll("(?s:.)", "+"))
.collect(Collectors.toList()));
}
The idea is to split the string by word, do the replacement on the rest, and join them back with word using String.join method.
Same as above, we need Pattern.quote to avoid split interpreting the word as regex. Since split by default removes empty string at the end of the array, we need to use -1 in the second parameter to make split leave those empty strings alone.
Then we create a stream out of the array and replace the rest as strings of +. In Java 11, we can use s -> "+".repeat(s.length()) instead.
The rest is just converting the Stream to an Iterable (List in this case) and joining them for the result
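For reference, a Java 11 variant of the streaming idea might look like the sketch below. It assumes word is non-empty and uses a hypothetical class name, PlusOutRepeat; Collectors.joining(word) takes the place of String.join.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PlusOutRepeat {
    // Assumes word is non-empty.
    static String plusOut(String str, String word) {
        // The limit -1 keeps trailing empty strings, so a word at the end survives.
        return Arrays.stream(str.split(Pattern.quote(word), -1))
                .map(part -> "+".repeat(part.length())) // String.repeat needs Java 11+
                .collect(Collectors.joining(word));
    }
}
```

For example, plusOut("aaxxxxbb", "xx") splits into "aa", "" and "bb", which are rejoined as "++xxxx++".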

This is a bit trickier than you might initially think because you don't just need to match characters, but the absence of specific phrase - a negated character set is not enough. If the string is 123, you would need:
(?<=^|123)(?!123).*?(?=123|$)
https://regex101.com/r/EZWMqM/1/
That is - lookbehind for the start of the string or "123", make sure the current position is not followed by 123, then lazy-repeat any character until lookahead matches "123" or the end of the string. This will match all characters which are not in a "123" substring. Then, you need to replace each character with a +, after which you can use appendReplacement and a StringBuffer to create the result string:
String inputPhrase = "123";
String inputStr = "abc123efg123123hij";
StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("(?<=^|" + inputPhrase + ")(?!" + inputPhrase + ").*?(?=" + inputPhrase + "|$)");
Matcher m = regex.matcher(inputStr);
while (m.find()) {
String replacement = m.group(0).replaceAll(".", "+");
m.appendReplacement(resultString, replacement);
}
m.appendTail(resultString);
System.out.println(resultString.toString());
Output:
+++123+++123123+++
Note that if the inputPhrase can contain character with a special meaning in a regular expression, you'll have to escape them first before concatenating into the pattern.

You can do it in one line:
input = input.replaceAll("((?:" + str + ")+)?(?!" + str + ").((?:" + str + ")+)?", "$1+$2");
This optionally captures "123" on either side of each character and puts them back (a blank if there's no "123").

So instead of coming up with a regular expression that matches the absence of a string, we might as well just match the selected phrase and append a + for each skipped character.
StringBuilder sb = new StringBuilder();
Matcher m = Pattern.compile(Pattern.quote(str)).matcher(input);
while (m.find()) {
while (sb.length() < m.start()) sb.append('+'); // fill only the gap since the previous match
sb.append(str);
}
int remaining = input.length() - sb.length();
for (int i = 0; i < remaining; i++) {
sb.append('+');
}

Absolutely just for the fun of it, a solution using CharBuffer (unexpectedly it took a lot more than I initially hoped for):
private static String plusOutCharBuffer(String input, String match) {
int size = match.length();
CharBuffer cb = CharBuffer.wrap(input.toCharArray());
CharBuffer word = CharBuffer.wrap(match);
int x = 0;
for (; cb.remaining() > 0;) {
if (!cb.subSequence(0, size < cb.remaining() ? size : cb.remaining()).equals(word)) {
cb.put(x, '+');
cb.clear().position(++x);
} else {
cb.clear().position(x = x + size);
}
}
return cb.clear().toString();
}

To make this work you need a beast of a pattern. Let's say you are operating on the following test case as an example:
plusOut("abXYxyzXYZ", "XYZ") → "+++++++XYZ"
What you need to do is build a series of clauses in your pattern to match a single character at a time:
Any character that is NOT "X", "Y" or "Z" -- [^XYZ]
Any "X" not followed by "YZ" -- X(?!YZ)
Any "Y" not preceded by "X" -- (?<!X)Y
Any "Y" not followed by "Z" -- Y(?!Z)
Any "Z" not preceded by "XY" -- (?<!XY)Z
An example of this replacement can be found here: https://regex101.com/r/jK5wU3/4
Here is an example of how this might work (most certainly not optimized, but it works):
import java.util.regex.Pattern;
public class Test {
public static void plusOut(String text, String exclude) {
StringBuilder pattern = new StringBuilder("");
for (int i=0; i<exclude.length(); i++) {
Character target = exclude.charAt(i);
String prefix = (i > 0) ? exclude.substring(0, i) : "";
String postfix = (i < exclude.length() - 1) ? exclude.substring(i+1) : "";
// add the look-behind (?<!X)Y
if (!prefix.isEmpty()) {
pattern.append("(?<!").append(Pattern.quote(prefix)).append(")")
.append(Pattern.quote(target.toString())).append("|");
}
// add the look-ahead X(?!YZ)
if (!postfix.isEmpty()) {
pattern.append(Pattern.quote(target.toString()))
.append("(?!").append(Pattern.quote(postfix)).append(")|");
}
}
// add in the other character exclusion
pattern.append("[^" + Pattern.quote(exclude) + "]");
System.out.println(text.replaceAll(pattern.toString(), "+"));
}
public static void main(String [] args) {
plusOut("12xy34", "xy");
plusOut("12xy34", "1");
plusOut("12xy34xyabcxy", "xy");
plusOut("abXYabcXYZ", "ab");
plusOut("abXYabcXYZ", "abc");
plusOut("abXYabcXYZ", "XY");
plusOut("abXYxyzXYZ", "XYZ");
plusOut("--++ab", "++");
plusOut("aaxxxxbb", "xx");
plusOut("123123", "3");
}
}
UPDATE: Even this doesn't quite work because it can't deal with exclusions that are just repeated characters, like "xx". Regular expressions are most definitely not the right tool for this, but I thought it might be possible. After poking around, I'm not so sure a pattern even exists that might make this work.

The problem with your solution is that str.replaceAll("[^str]","+") uses a character class: it excludes every individual character listed inside the brackets, not the string as a whole, so it will not solve your problem.
Example: str.replaceAll("[^XYZ]","+") keeps every X, Y and Z wherever they occur, so for "abXYxyzXYZ" you get "++XY+++XYZ".
What you actually need to exclude in str.replaceAll is a sequence of characters.
You can do that with a group such as (XYZ) and a negative lookahead that matches a string not containing the sequence: ^((?!XYZ).)*$
Check this solution for more info about this problem, but be aware that it may be complicated to find a regular expression that does this directly.
I have found two simple solutions for this problem :
Solution 1:
You can implement a method to replace all characters with '+' except the instance of given string:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
for(int i = 0; i < str.length(); i++){
// exclude any instance string of exWord from replacing process in str
if(str.substring(i, str.length()).indexOf(exWord) + i == i){
i = i + exWord.length()-1;
}
else{
str = str.substring(0,i) + "+" + str.substring(i+1);//replace each character with '+' symbol
}
}
Note: the condition str.substring(i, str.length()).indexOf(exWord) + i == i is true exactly when an instance of exWord starts at index i, so that instance is skipped rather than replaced.
Output:
+++++++XYZ
Solution 2:
You can try this Approach using ReplaceAll method and it doesn't need any complex regular expression:
String exWord = "XYZ";
String str = "abXYxyzXYZ";
str = str.replaceAll(exWord,"*"); // replace instance string with * symbol
str = str.replaceAll("[^*]","+"); // replace all characters with + symbol except *
str = str.replaceAll("\\*",exWord); // replace * symbol with instance string
Note : This solution will work only if your input string str doesn't contain any * symbol.
Also you should escape any character with a special meaning in a regular expression in phrase instance string exWord like : exWord = "++".
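The escaping concern can be sidestepped by using String.replace(CharSequence, CharSequence), which replaces literally and does not interpret its arguments as regex. The sketch below keeps the marker idea of Solution 2 under the same kind of assumption: the marker character '\uFFFF' is assumed never to occur in the input, exWord is assumed non-empty, and the class name PlusOutLiteral is hypothetical.

```java
class PlusOutLiteral {
    // Assumption: this character never appears in the input string.
    private static final char MARKER = '\uFFFF';

    // Assumes exWord is non-empty.
    static String plusOut(String str, String exWord) {
        // Literal replacement: "++" or "." in exWord needs no escaping here.
        String marked = str.replace(exWord, String.valueOf(MARKER));
        StringBuilder out = new StringBuilder();
        for (char c : marked.toCharArray()) {
            if (c == MARKER) {
                out.append(exWord); // restore the protected word
            } else {
                out.append('+');    // everything else becomes '+'
            }
        }
        return out.toString();
    }
}
```

With exWord = "++" and str = "--++ab", the result is "++++++", as the test case expects.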

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get First Character values like
AS, MFD
The prefix length is not fixed at two or three characters, because the input can vary. I need to get the leading characters before the first number.
Thank you.
Edit : This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}

Here is a nice one line solution. It uses a regex to match the first non numeric characters in the string, and then replaces the input string with this match.
public String getFirstLetters(String input) {
return new String("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
.substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)
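The same idea can be expressed without the "A" prefix trick by asking a Matcher for the leading run of non-digits directly. This is a small sketch with a hypothetical class name, PrefixExtractor; the pattern ^\\D* can match an empty prefix, so an input that starts with a digit yields an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixExtractor {
    // Zero or more non-digit characters at the start of the input.
    private static final Pattern LEADING_NON_DIGITS = Pattern.compile("^\\D*");

    static String prefix(String input) {
        Matcher m = LEADING_NON_DIGITS.matcher(input);
        // The pattern can match the empty string, so find() succeeds for any input.
        return m.find() ? m.group() : "";
    }
}
```

Here prefix("AS23456SDE") gives "AS" and prefix("1AS123") gives an empty string.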

A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}

Use the following function to get required output
public String getFirstChars(String str){
int zeroAscii = '0'; int nineAscii = '9';
String result = "";
for (int i=0; i< str.length(); i++){
int ascii = str.charAt(i);
if(ascii >= zeroAscii && ascii <= nineAscii){
return result; // first digit reached: stop and return the prefix
}else{
result = result + str.charAt(i); // not a digit: keep collecting
}
}
return str;
}
pass your string as argument

I think this can be done by a simple regex which matches digits and java's string split function. This Regex based approach will be more efficient than the methods using more complicated regexs.
Something as below will work
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used "[\\d]+" is escaped for java already.
What it does?
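It matches one or more digits (\d). In Java, \d matches only the ASCII digits 0-9 by default; with the UNICODE_CHARACTER_CLASS flag, (?U), it matches digits of any script in Unicode (so Arabic-Indic digits, for example, match as well).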
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and separates the string into string arrays where ever a match is found. The [0] at the end selects the first result from that array, since you wanted the starting chars.
What is the second parameter to .split(regex,int) which I supplied as 2?
This is the Limit parameter. This means that the regex will be applied on the string till 1 match is found. Once 1 match is found the string is not processed anymore.
From the Strings javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will be efficient if your string is huge.
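The effect of the limit parameter is easiest to see side by side. The sketch below is illustrative only; the class and method names (SplitLimitDemo, splitOnDigits, prefixBeforeDigits) are made up, and it uses the explicit [0-9]+ pattern.

```java
class SplitLimitDemo {
    // Splits on runs of ASCII digits with the given limit.
    static String[] splitOnDigits(String input, int limit) {
        return input.split("[0-9]+", limit);
    }

    // Limit 2: the pattern is applied at most once, so the first
    // element is everything before the first run of digits.
    static String prefixBeforeDigits(String input) {
        return input.split("[0-9]+", 2)[0];
    }
}
```

For "ABC345DEF67GH", a limit of 2 yields ["ABC", "DEF67GH"], while a limit of 0 yields ["ABC", "DEF", "GH"]. For "AB12", a limit of 0 drops the trailing empty string and yields ["AB"], whereas -1 keeps it and yields ["AB", ""].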
Possible other regex if you want to split only on english numerals
"[0-9]+"

public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (Character i : testString.toCharArray()) {
if (Character.isDigit(i))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
this prints the first 'n' characters before it encounters a digit (i.e. integer).

String manipulation of function names

For this Kata, I am given random function names in the PEP8 format and I am to convert them to camelCase.
(input)get_speed == (output)getSpeed ....
(input)set_distance == (output)setDistance
I have a understanding on one way of doing this written in pseudo-code:
loop through the word,
if the letter is an underscore
then delete the underscore
then get the next letter and change to a uppercase
endIf
endLoop
return the resultant word
But I'm unsure of the best way of doing this. Would it be more efficient to create a char array and loop through the elements, and then, on finding an underscore, delete that element, get the next index and change it to uppercase?
Or would it be better to use recursion:
function camelCase takes a string
if the length of the string is 0,
then return the string
endIf
if the character is a underscore
then change to nothing,
then find next character and change to uppercase
return the string taking away the character
endIf
finally return the function taking the first character away
Any thoughts please, looking for a good efficient way of handling this problem. Thanks :)

I would go with this:
divide given String by underscore to array
from second word until end take first letter and convert it to uppercase
join to one word
This will work in O(n) (it goes through the whole name three times). For the first step, use this function:
str.split("_");
for uppercase use this:
String newName = str.substring(0, 1).toUpperCase() + str.substring(1);
But make sure you check size of the string first...
Edited - added implementation
It would look like this:
public String camelCase(String str) {
if (str == null ||str.trim().length() == 0) return str;
String[] split = str.split("_");
String newStr = split[0];
for (int i = 1; i < split.length; i++) {
newStr += split[i].substring(0, 1).toUpperCase() + split[i].substring(1);
}
return newStr;
}
for inputs:
"test"
"test_me"
"test_me_twice"
it returns:
"test"
"testMe"
"testMeTwice"

It would be simpler to iterate over the string instead of recursing.
String pep8 = "do_it_again";
StringBuilder camelCase = new StringBuilder();
for(int i = 0, l = pep8.length(); i < l; ++i) {
if(pep8.charAt(i) == '_' && (i + 1) < l) {
camelCase.append(Character.toUpperCase(pep8.charAt(++i)));
} else {
camelCase.append(pep8.charAt(i));
}
}
System.out.println(camelCase.toString()); // prints doItAgain

The question you pose is whether to use an iterative or a recursive approach. For this case I'd go for the iterative approach because it's straightforward, easy to understand and doesn't require many resources (only one array, no new stack frames, etc.), though that doesn't really matter for this example.
Recursion is good for divide-and-conquer problems, but I don't see that fitting the case well, although it's possible.
An iterative implementation of the algorithm you described could look like the following:
StringBuilder buf = new StringBuilder(input);
for(int i = 0; i < buf.length(); i++){
if(buf.charAt(i) == '_'){
buf.deleteCharAt(i);
if(i != buf.length()){ // check for end of string
buf.setCharAt(i, Character.toUpperCase(buf.charAt(i)));
}
}
}
return buf.toString();
The check for the end of the string is not part of the given algorithm and could be omitted if the input string never ends with '_'.
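For comparison, the recursive pseudo-code from the question could be sketched as follows. It is only an illustration (the class name CamelCaseRecursive is hypothetical): each call creates new substrings and a new stack frame, which is the overhead the iterative version avoids.

```java
class CamelCaseRecursive {
    static String camelCase(String s) {
        if (s.isEmpty()) {
            return s; // base case: nothing left to convert
        }
        if (s.charAt(0) == '_') {
            if (s.length() == 1) {
                return ""; // trailing underscore: just drop it
            }
            // Drop the underscore and upper-case the character after it.
            return Character.toUpperCase(s.charAt(1)) + camelCase(s.substring(2));
        }
        // Ordinary character: keep it and recurse on the remainder.
        return s.charAt(0) + camelCase(s.substring(1));
    }
}
```

camelCase("get_speed") returns "getSpeed" and camelCase("set_distance") returns "setDistance".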

Removing duplicate same characters in a row

I am trying to create a method which will either remove all duplicates from a string or only keep the same 2 characters in a row based on a parameter.
For example:
helllllllo -> helo
or
helllllllo -> hello - This keeps double letters
Currently I remove duplicates by doing:
private String removeDuplicates(String word) {
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < word.length(); i++) {
char letter = word.charAt(i);
if (buffer.length() == 0 || letter != buffer.charAt(buffer.length() - 1)) {
buffer.append(letter);
}
}
return buffer.toString();
}
If I want to keep double letters I was thinking of having a method like private String removeDuplicates(String word, boolean doubleLetter)
When doubleLetter is true it will return hello not helo
I'm not sure of the most efficient way to do this without duplicating a lot of code.

why not just use a regex?
public class RemoveDuplicates {
public static void main(String[] args) {
System.out.println(new RemoveDuplicates().result("hellllo", false)); //helo
System.out.println(new RemoveDuplicates().result("hellllo", true)); //hello
}
public String result(String input, boolean doubleLetter){
String pattern = null;
if(doubleLetter) pattern = "(.)(?=\\1{2})";
else pattern = "(.)(?=\\1)";
return input.replaceAll(pattern, "");
}
}
(.) --> matches any character and puts in group 1.
?= --> this is called a positive lookahead.
?=\\1 --> positive lookahead for the first group
So overall, this regex looks for any character that is followed (positive lookahead) by itself. For example aa or bb, etc. It is important to note that only the first character is part of the match actually, so in the word 'hello', only the first l is matched (the part (?=\1) is NOT PART of the match). So the first l is replaced by an empty String and we are left with helo, which does not match the regex
The second pattern is the same thing, but this time we look ahead for TWO occurrences of the first group, for example helllo. On the other hand 'hello' will not be matched.
Look here for a lot more: Regex
P.S. Feel free to accept the answer if it helped.

try
String s = "helllllllo";
System.out.println(s.replaceAll("(\\w)\\1+", "$1"));
output
helo

Taking this previous SO example as a starting point, I came up with this:
String str1= "Heelllllllllllooooooooooo";
String removedRepeated = str1.replaceAll("(\\w)\\1+", "$1");
System.out.println(removedRepeated);
String keepDouble = str1.replaceAll("(\\w)\\1{2,}", "$1");
System.out.println(keepDouble);
It yields:
Helo
Heelo
What it does:
(\\w)\\1+ will match any letter and place it in a regex capture group. This group is later accessed through the \\1+. Meaning that it will match one or more repetitions of the previous letter.
(\\w)\\1{2,} is the same as above the only difference being that it looks after only characters which are repeated more than 2 times. This leaves the double characters untouched.
EDIT:
Re-read the question and it seems that you want to replace multiple characters by doubles. To do that, simply use this line:
String keepDouble = str1.replaceAll("(\\w)\\1+", "$1$1");
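If a single method without regular expressions is preferred, the boolean from the question can be generalized to a maximum run length. The following is a minimal sketch with hypothetical names (RunLimiter, limitRuns, maxRun): a maxRun of 1 removes all consecutive duplicates and a maxRun of 2 keeps double letters.

```java
class RunLimiter {
    // Keeps at most maxRun consecutive copies of each character.
    static String limitRuns(String word, int maxRun) {
        StringBuilder out = new StringBuilder();
        int run = 0; // length of the current run of identical characters
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            run = (i > 0 && c == word.charAt(i - 1)) ? run + 1 : 1;
            if (run <= maxRun) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

limitRuns("helllllllo", 1) returns "helo" and limitRuns("helllllllo", 2) returns "hello". Unlike the \\w-based patterns, this treats every character the same way, not just word characters.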

Try this, this will be most efficient way[Edited after comment]:
public static String removeDuplicates(String str) {
int checker = 0;
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < str.length(); ++i) {
int val = str.charAt(i) - 'a';
if ((checker & (1 << val)) == 0)
buffer.append(str.charAt(i));
checker |= (1 << val);
}
return buffer.toString();
}
I am using bits to identify uniqueness.
EDIT:
The whole logic is that once a character has been seen, its corresponding bit is set; the next time that character comes up, it is not added to the StringBuffer because the corresponding bit is already set.
