Camel-Case to Sentence-Case in Java - java

I have the following code to convert a camel-case phrase to sentence-case. It works fine for almost all cases, but it can't handle acronyms. How can this code be corrected to work with acronyms?
private static final Pattern UPPERCASE_LETTER = Pattern.compile("([A-Z]|[0-9]+)");
static String toSentenceCase(String camelCaseString) {
return camelCaseString.substring(0, 1).toUpperCase()
+ UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
.replaceAll(matchResult -> " " + (matchResult.group(1).toLowerCase()));
}
JUnit5 test:
@ParameterizedTest(name = "#{index}: Convert {0} to sentence case")
@CsvSource(value = {"testOfAcronymUSA:Test of acronym USA"}, delimiter = ':')
void shouldSentenceCaseAcronym(String input, String expected) {
//TODO: currently fails
assertEquals(expected, toSentenceCase(input));
}
Output:
org.opentest4j.AssertionFailedError:
Expected :Test of acronym USA
Actual :Test of acronym u s a
I thought of adding (?=[a-z]) to the end of the regex, but then it doesn't handle the spacing correctly.
I'm on Java 14.

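Change the regex to (?<=[a-z])[A-Z]+|[A-Z](?=[a-z])|[0-9]+ where
(?<=[a-z])[A-Z]+ matches one or more uppercase letters preceded by a lowercase letter (positive lookbehind for [a-z])
[A-Z](?=[a-z]) matches a single uppercase letter followed by a lowercase letter (positive lookahead for [a-z])
[0-9]+ matches one or more digits
Note that you do not need any capturing group.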
Demo:
import java.util.regex.Pattern;
public class Main {
private static final Pattern UPPERCASE_LETTER = Pattern.compile("(?<=[a-z])[A-Z]+|[A-Z](?=[a-z])|[0-9]+");
static String toSentenceCase(String camelCaseString) {
return camelCaseString.substring(0, 1).toUpperCase() + UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
.replaceAll(matchResult -> !matchResult.group().matches("[A-Z]{2,}")
? " " + matchResult.group().toLowerCase()
: " " + matchResult.group());
}
public static void main(String[] args) {
System.out.println(toSentenceCase("camelCaseString"));
System.out.println(toSentenceCase("USA"));
System.out.println(toSentenceCase("camelCaseStringUSA"));
}
}
Output:
Camel case string
USA
Camel case string USA

To fix your immediate issue you may use
private static final Pattern UPPERCASE_LETTER = Pattern.compile("([A-Z]{2,})|([A-Z]|[0-9]+)");
static String toSentenceCase(String camelCaseString) {
return camelCaseString.substring(0, 1).toUpperCase()
+ UPPERCASE_LETTER.matcher(camelCaseString.substring(1))
.replaceAll(m -> m.group(1) != null ? " " + m.group(1) : " " + m.group(2).toLowerCase() );
}
See the Java demo.
Details
([A-Z]{2,})|([A-Z]|[0-9]+) regex matches and captures into Group 1 two or more uppercase letters, or captures into Group 2 a single uppercase letter or 1+ digits
.replaceAll(m -> m.group(1) != null ? " " + m.group(1) : " " + m.group(2).toLowerCase() ) replaces with space + Group 1 if Group 1 matched, else with a space and Group 2 turned to lower case.
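Both patterns above treat any run of capitals as one acronym, so an acronym directly followed by a capitalized word (for example parseHTTPResponse) comes out as a single token. A minimal sketch of a different technique, splitting on zero-width word boundaries instead of replacing matches, could look like this. The class name SentenceCaseSketch, the WORD_BOUNDARY pattern and the sample inputs are assumptions for illustration, not part of either answer.

```java
import java.util.regex.Pattern;

class SentenceCaseSketch {
    // Zero-width boundaries between words (illustrative pattern, not from the answers):
    //   lowercase or digit followed by uppercase      (camel|Case, 2|Update)
    //   uppercase followed by uppercase + lowercase   (HTTP|Response)
    //   letter followed by digit                      (version|2)
    private static final Pattern WORD_BOUNDARY = Pattern.compile(
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|(?<=[A-Za-z])(?=[0-9])");

    static String toSentenceCase(String camelCase) {
        if (camelCase.isEmpty()) {
            return camelCase;
        }
        String[] words = WORD_BOUNDARY.split(camelCase);
        StringBuilder sentence = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            // Runs of two or more capitals are treated as acronyms and kept as-is
            if (!word.matches("[A-Z]{2,}")) {
                word = word.toLowerCase();
            }
            if (i == 0) {
                // Capitalize the first letter of the sentence
                word = word.substring(0, 1).toUpperCase() + word.substring(1);
            } else {
                sentence.append(' ');
            }
            sentence.append(word);
        }
        return sentence.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSentenceCase("testOfAcronymUSA"));   // Test of acronym USA
        System.out.println(toSentenceCase("parseHTTPResponse"));  // Parse HTTP response
        System.out.println(toSentenceCase("USA"));                // USA
    }
}
```

The trade-off is that a single capital letter is never treated as an acronym, so thisIsATest becomes "This is a test".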

Related

regex expression in java using wildcards

Is there a way to use a regex expression with wild cards? Specifically, I have a String phrase and another String target. I would like to use the match method to find the first occurrence of the target in the phrase where the character before and after the target is anything other than a-z.
Updated:
Is there a way to use the String method matches() with the following regex:
"(?<![a-z])" + "hello" + "(?![a-z])";
You can use the regex, "(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z])"
Demo at regex101 with phrase = "hello".
(?<![a-z]): Negative lookbehind for [a-z]
(?![a-z]): Negative lookahead for [a-z]
Java Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class Main {
public static void main(String[] args) {
// Test
String phrase = "hello";
String regex = "(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z])";
Pattern pattern = Pattern.compile(regex);
Stream.of(
"hi hello world",
"hihelloworld"
).forEach(s -> {
Matcher matcher = pattern.matcher(s);
System.out.print(s + " => ");
if(matcher.find()) {
System.out.println("Match found");
}else {
System.out.println("No match found");
}
});
}
}
Output:
hi hello world => Match found
hihelloworld => No match found
In case you want a full match, as required by String#matches(), use the regex ".*(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z]).*", as demonstrated at regex101.com. The pattern .* matches any character (except line terminators) any number of times. The rest of the pattern is explained above. The .* before and after the target lets the regex cover the whole string.
Java Demo:
import java.util.regex.Pattern;
import java.util.stream.Stream;
public class Main {
public static void main(String[] args) {
// Test
String phrase = "hello";
String regex = ".*(?<![a-z])" + Pattern.quote(phrase) + "(?![a-z]).*";
Stream.of(
"hi hello world",
"hihelloworld"
).forEach(s -> System.out.println(s + " => " + (s.matches(regex) ? "Match found" : "No match found")));
}
}
Output:
hi hello world => Match found
hihelloworld => No match found
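The question asks for the first occurrence of the target, so a small variation on the find() demo is to report where the match starts. The sketch below uses Matcher.start() with the same lookaround pattern; the names WholeWordFinder and indexOfWholeWord, and the convention of returning -1 when nothing matches, are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordFinder {
    // Returns the index of the first occurrence of target in phrase that is not
    // directly preceded or followed by a lowercase letter a-z, or -1 if there is none.
    static int indexOfWholeWord(String phrase, String target) {
        Pattern pattern = Pattern.compile(
                "(?<![a-z])" + Pattern.quote(target) + "(?![a-z])");
        Matcher matcher = pattern.matcher(phrase);
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        // The "hello" inside "hihello" is skipped because 'i' precedes it
        System.out.println(indexOfWholeWord("hihello hello world", "hello")); // 8
        System.out.println(indexOfWholeWord("hihelloworld", "hello"));        // -1
    }
}
```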

Regex to capture the string starting with a specific word or character and ending with either one of the words

I want to capture the string after the last slash and before either a (; sid=) word or a (?) character.
sample data:
sessionId=30a793b1-ed7e-464a-a630; Url=https://www.example.com/mybook/order/newbooking/itemSummary; sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;
sessionId=sfdsdfsd-ba57-4e21-a39f-34; Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW targetUrl=https://www.example.com/ mybook/order/newbooking/page1?id=123;
sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; Url=https://www.example.com/mybook/order/newbooking/; sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;
Expecting the below output:
1. itemSummary
2. itemList
3. ''(empty string)
I have built the regex below to capture it, but it is not 100% accurate. It captures some additional text.
Regex
Url=.*\/(.*)(; sid|\?)
Could you please help me to improve the regex to get desired output?
Thanks in advance!
You may use this regex in Java with a greedy match after Url=:
\bUrl=\S+/([^?;/]+)(?=; sid|\?)
RegEx Demo
RegEx Details:
\b: Word boundary
Url=: Match text Url=
\S+/: Match 1+ non-whitespace characters followed by a /
([^?;/]+): Match 1+ characters that are not ?, ; or /
(?=; sid|\?): Lookahead to assert that we have ; sid or ? ahead
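A minimal sketch of applying this regex from Java follows. The class name LastPathSegment, the method extract and the shortened sample lines are assumptions for illustration. Because the capture group requires at least one character, a URL that ends in / produces no match at all, so the sketch maps "no match" to the empty string expected for the third sample.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastPathSegment {
    private static final Pattern LAST_SEGMENT =
            Pattern.compile("\\bUrl=\\S+/([^?;/]+)(?=; sid|\\?)");

    // Returns the captured segment, or "" when the pattern does not match
    static String extract(String line) {
        Matcher m = LAST_SEGMENT.matcher(line);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // Shortened, hypothetical versions of the sample data
        String[] lines = {
            "sessionId=a1; Url=https://www.example.com/order/itemSummary; sid=XYZ; targetUrl=https://www.example.com/order/page1?id=122;",
            "sessionId=b2; Url=https://www.example.com/order/itemList?id=7&type=new; sid=ABC targetUrl=https://www.example.com/order/page1?id=123;",
            "sessionId=c3; Url=https://www.example.com/order/; sid=DEF; targetUrl=https://www.example.com/order/page1?id=343;"
        };
        for (String line : lines) {
            System.out.println("'" + extract(line) + "'");
        }
        // Prints 'itemSummary', 'itemList' and '' on separate lines
    }
}
```

The \b before Url= is what keeps targetUrl= from matching, since there is no word boundary between "t" and "U".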
Alternative solution:
Used regex:
"^Url=.*/(\\w+|)$"
Regex in test bench and context:
public static void main(String[] args) {
String input1 = "sessionId=30a793b1-ed7e-464a-a630; "
+ "Url=https://www.example.com/mybook/order/newbooking/itemSummary; "
+ "sid=KJ4dgQGdhg7dDn1h0TLsqhsdfhsfhjhsdjfhjshdjfhjsfddscg139bjXZQdkbHpzf9l6wy1GdK5XZp; "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=122;";
String input2 = "sessionId=sfdsdfsd-ba57-4e21-a39f-34; "
+ "Url=https://www.example.com/mybook/order/newbooking/itemList?id=76734&para=jhjdfhj&type=new&ordertype=kjkf&memberid=273647632&iSearch=true; "
+ "sid=Q4hWgR1GpQb8xWTLpQB2yyyzmYRgXgFlJLGTc0QJyZbW "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=123;";
String input3 = "sessionId=0e1acab1-45b8-sdf3454fds-afc1-sdf435sdfds; "
+ "Url=https://www.example.com/mybook/order/newbooking/; "
+ "sid=hkm2gRSL2t5ScKSJKSJn3vg2sfdsfdsfdsfdsfdfdsfdsfdsfvJZkDD3ng0kYTjhNQw8mFZMn; "
+ "targetUrl=https://www.example.com/mybook/order/newbooking/page1?id=343;";
List<String> inputList = Arrays.asList(input1, input2, input3);
// Compile the patterns once, outside the loop, instead of recompiling them on every iteration
Pattern replaceWithNewLinePattern = Pattern.compile(";?\\s|\\?");
Pattern extractWordFromUrlPattern = Pattern.compile("^Url=.*/(\\w+|)$", Pattern.MULTILINE);
int count = 0;
for(String input : inputList) {
String inputWithNewLines = replaceWithNewLinePattern.matcher(input).replaceAll("\n");
// System.out.println(inputWithNewLines); // Check the change...
Matcher matcher = extractWordFromUrlPattern.matcher(inputWithNewLines);
while (matcher.find()) {
System.out.printf( "%d. '%s'%n", ++count, matcher.group(1));
}
}
}
Output:
1. 'itemSummary'
2. 'itemList'
3. ''

Need help in regex matching

It may be very simple, but I am extremely new to regex and have a requirement where I need to do some regex matches in a string and extract the number in it. Below is my code with sample i/p and required o/p. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but my regex match itself is returning false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str2).matches());
All of the above SOPs return false. I am using Java 8; is there any way to match the pattern and then extract the digits from the string in a single statement?
It would be great if somebody could point me to how to debug/develop the regex. Please feel free to let me know if something is not clear in my question.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
See the regex demo
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
See another Java demo.
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
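As for why the pattern in the question returns false: its last group, (#[0-9]?), makes the # mandatory, and matches() has to consume the entire input. One way to debug such a pattern is to feed it small variations of the input, as in this sketch. The class and method names, and the extra inputs ending in # and #2, are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternCheck {
    // The pattern from the question, unchanged
    private static final Pattern ORIGINAL =
            Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    // Returns what group 2 holds after a full match, or null if there is no match
    static String digitGroup(String s) {
        Matcher m = ORIGINAL.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(matchesOriginal("foo/bar/Samsung-Galaxy/a-b/1"));    // false: no '#'
        System.out.println(matchesOriginal("foo/bar/Samsung-Galaxy/a-b/1#"));   // true
        System.out.println(matchesOriginal("foo/bar/Samsung-Galaxy/a-b/1#2"));  // true
        System.out.println(matchesOriginal("foo/bar/Samsung-Galaxy/c-d/1#P2")); // false: 'P' is not allowed
        // ([0-9])+ repeats the group, so only the last digit is kept
        System.out.println(digitGroup("foo/c-d/69#"));                          // 9
    }
}
```

The last line also shows why the quantifier belongs inside the group, as in ([0-9]+): a repeated capturing group only retains its final iteration.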
Because you always match the last number in the string, I would just use replaceAll with this regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(!strResult1.isEmpty() ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(!strResult2.isEmpty() ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(!strResult3.isEmpty() ? "result " + strResult3 : "no result");
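Note that replaceAll returns the input unchanged when the regex does not match, so for a non-empty string without a trailing number the result is the original string, not an empty one; the isEmpty() check only catches empty input.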
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.

Matcher.group throws IndexOutOfBoundsException Exception

I have the code below, in which I am trying to print all the matches in a String using Matcher.group().
public static void main(String[] args) {
String s = "foo\r\nbar\r\nfoo"
+ "foo, bar\r\nak = "
+ "foo, bar\r\nak = "
+ "bar, bar\r\nak = "
+ "blr05\r\nsdfsdkfhsklfh";
//System.out.println(s);
Matcher matcher = Pattern.compile("^ak\\s*=\\s*(\\w+)", Pattern.MULTILINE)
.matcher(s);
matcher.find();
// This one works
System.out.println("first match " + matcher.group(1));
// The 2 lines below throw IndexOutOfBoundsException
System.out.println("second match " + matcher.group(2));
System.out.println("third match " + matcher.group(3));
}
The code above throws: Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 2
So my question is: how does Matcher.group() work? As you can see, there are 3 matching strings; how can I print all of them using group()?
It is clear that you have only one group :
^ak\\s*=\\s*(\\w+)
// ^----^----------this is the only group
Instead, you have to use a loop, for example:
while(matcher.find()){
System.out.println("match = " + matcher.group(1));
}
Outputs
match = foo
match = bar
match = blr05
Read about groups:
Capturing group
Parentheses group the regex between them. They capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference. They allow you to apply regex operators to the entire grouped regex.
You seem to be confusing capture groups with the number of matches found in your string for the given pattern. In the pattern you used, you only have one capture group:
^ak\\s*=\\s*(\\w+)
A capture group is marked using parentheses in the pattern.
If you want to retrieve every match of your pattern against the input string, then you should use a while loop:
while (matcher.find()) {
System.out.println("entire pattern: " + matcher.group(0));
System.out.println("first capture group: " + matcher.group(1));
}
Each call to Matcher#find() looks for the next match, starting where the previous match ended, and returns false once there are none left. After a successful call, group() exposes that match and its capture groups.
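To keep the values instead of printing them, the same find() loop can collect group 1 into a list. This is a minimal sketch; the names AllMatches and valuesOf and the shortened input string are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    private static final Pattern AK_VALUE =
            Pattern.compile("^ak\\s*=\\s*(\\w+)", Pattern.MULTILINE);

    // Collects capture group 1 of every match, in order of appearance
    static List<String> valuesOf(String input) {
        List<String> values = new ArrayList<>();
        Matcher matcher = AK_VALUE.matcher(input);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the string from the question
        String s = "foo\r\nak = foo, bar\r\nak = bar, bar\r\nak = blr05\r\nend";
        System.out.println(valuesOf(s)); // [foo, bar, blr05]
    }
}
```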

Regex to remove stop-words between two words

I have a set of words named "stopwords". Now I need to match two words, but between these words there can be a space or words from the "stopwords" set, e.g. "power energy", "power of energy", "power for energy", "power of the energy".
The stopwords set also contains "for, of, the, ..."
I want to obtain "power energy" without the stopwords. Is it possible?
Finding the substrings will work. This will reduce any phrase of the form
Word (stopwords)+ Endword to Word Endword
String power = "power of energy";
String[] toks = power.split("[\\s]+"); // in case of extra space between words.
String removed =
power.substring(power.indexOf(toks[0]), power.indexOf(toks[0])
+ toks[0].length())
+ " " + power.substring(power.indexOf(toks[toks.length - 1]), power.indexOf(toks[toks.length - 1 ])
+ toks[toks.length - 1].length());
System.out.println(removed);
Output: power energy
Method
public static String removeStopWord(String phrase){
String[] toks = phrase.split("[\\s]+");
String removed =
phrase.substring(phrase.indexOf(toks[0]), phrase.indexOf(toks[0])
+ toks[0].length())
+ " " + phrase.substring(phrase.indexOf(toks[toks.length - 1]), phrase.indexOf(toks[toks.length - 1])
+ toks[toks.length - 1].length());
return removed;
}
Simple replaceAll() of java would do the trick :)
public class Replace {
public static void main(String[] args) {
String s="power of the world";
s=s.replaceAll("\\b(?:of|the)\\b", ""); // word boundaries keep "of" inside e.g. "offer" intact
s=s.replaceAll("( )+", " ");
System.out.println(s);
}
}
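A regex built from the stop-word set itself is another option. The sketch below is a hedged illustration, not a definitive implementation: the names StopWordRemover and removeStopWords are hypothetical, and it assumes stop words are plain words separated by whitespace and matched case-sensitively.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StopWordRemover {
    static String removeStopWords(String phrase, List<String> stopWords) {
        if (stopWords.isEmpty()) {
            return phrase;
        }
        // Quote each stop word so it is matched literally, then join with |
        String alternation = stopWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // \b stops the pattern from matching inside longer words such as "offer";
        // \s* also removes the whitespace that follows a stop word
        Pattern stopWordPattern = Pattern.compile("\\b(?:" + alternation + ")\\b\\s*");
        return stopWordPattern.matcher(phrase).replaceAll("").trim();
    }

    public static void main(String[] args) {
        List<String> stopWords = Arrays.asList("for", "of", "the");
        System.out.println(removeStopWords("power of the energy", stopWords)); // power energy
        System.out.println(removeStopWords("power for energy", stopWords));    // power energy
        System.out.println(removeStopWords("offer the proof", stopWords));     // offer proof
    }
}
```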
