Java Regular expression - find strings with no vowels


I have a list of words and I have to output the number of words with no vowels in them. I have this so far
String matchString = "[^aeiou]";
for (String s : list) {
    if (s.matches(matchString.toLowerCase())) {
        System.out.println(s);
        numMatches++;
    }
}
I'm mostly worried that the regular expression is wrong.

Change the regex to
[^aeiou]+
The trailing + is the important part: it tests whether the string is built from one or more non-vowel characters. Currently you are only checking whether the string consists of exactly one character that is not a, e, i, o, or u.
You can also make your regex case-insensitive by adding the (?i) flag at the start. That way each letter in the regex stands for both its lowercase and uppercase form:
(?i)[^aeiou]+
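To tie this back to the original task of counting words, here is a minimal sketch. The class name NoVowelWords, the method countNoVowelWords, and the sample words are made up for illustration; only the pattern comes from the answer above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class NoVowelWords {
    // (?i) makes the character class case-insensitive; + requires at least one character.
    private static final Pattern NO_VOWELS = Pattern.compile("(?i)[^aeiou]+");

    // Hypothetical helper: counts the words that contain no vowels at all.
    static int countNoVowelWords(List<String> words) {
        int count = 0;
        for (String word : words) {
            // matches() succeeds only if the whole word fits the pattern.
            if (NO_VOWELS.matcher(word).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("rhythm", "hello", "Sky", "APPLE", "tsk");
        // rhythm, Sky and tsk have no vowels (y is not in the class)
        System.out.println(countNoVowelWords(words));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every word, which is what String.matches() would do.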

This one does the trick for me:
[^aeiou]+$
Also you should lowercase the string, not the expression:
if (s.toLowerCase().matches(matchString))

Looks like you put the regex into lowercase rather than the String you're actually testing. You want
if (s.toLowerCase().matches(matchString)){...}
Also, I believe your regex should be "[^aeiou]*", so you can match all the characters in the word.
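One practical difference between the * and + variants suggested in these answers is the empty string: * accepts it, + does not. A small sketch to check that (the class and method names are hypothetical):

```java
class StarVersusPlus {
    // Hypothetical helpers; both lowercase the input, as the answers suggest.
    static boolean noVowelsStar(String s) {
        return s.toLowerCase().matches("[^aeiou]*");
    }

    static boolean noVowelsPlus(String s) {
        return s.toLowerCase().matches("[^aeiou]+");
    }

    public static void main(String[] args) {
        System.out.println(noVowelsStar(""));     // true: * allows zero characters
        System.out.println(noVowelsPlus(""));     // false: + needs at least one character
        System.out.println(noVowelsPlus("Myth")); // true
        System.out.println(noVowelsPlus("Math")); // false: contains 'a'
    }
}
```

Which one is right depends on whether an empty entry in the list should count as a word without vowels.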

import java.util.regex.*;
public class Vowels {
public static void main(String[] args) {
String s = "hello";
int count = 0;
Pattern p = Pattern.compile("[aeiou]");
Matcher m = p.matcher(s);
while (m.find()) {
count++;
System.out.println(m.start() + "...." + m.group());
}
System.out.println("number of vowels found in the string ="+count);
}
}

For me only /^[^aeiou]+$/ did the job.

Related

Matching three or more identical characters - Java program [duplicate]

I have this small piece of code
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]"))
{
System.out.println(s);
}
}
Supposed to print
dkoe
but it prints nothing!!

Welcome to Java's misnamed .matches() method... It tries and matches ALL the input. Unfortunately, other languages have followed suit :(
If you want to see if the regex matches an input text, use a Pattern, a Matcher and the .find() method of the matcher:
Pattern p = Pattern.compile("[a-z]");
Matcher m = p.matcher(inputstring);
if (m.find())
// match
If what you want is indeed to see if an input only has lowercase letters, you can use .matches(), but you need to match one or more characters: append a + to your character class, as in [a-z]+. Or use ^[a-z]+$ and .find().
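The difference between the two methods can be checked with a short sketch like this one. The class and helper names are made up for illustration; Pattern, Matcher, matches() and find() are the standard java.util.regex API.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // True only if the regex matches the entire input.
    static boolean wholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere inside the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("[a-z]", "dkoe"));   // false: one letter is not the whole word
        System.out.println(anywhere("[a-z]", "dkoe"));     // true: some letter is found
        System.out.println(wholeInput("[a-z]+", "dkoe"));  // true
        System.out.println(anywhere("^[a-z]+$", "hum_"));  // false: anchors force a whole-input match
    }
}
```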

[a-z] matches a single char between a and z. So, if your string was just "d", for example, then it would have matched and been printed out.
You need to change your regex to [a-z]+ to match one or more chars.

String.matches returns whether the whole string matches the regex, not just any substring.

Java's String.matches() (like Matcher.matches()) tries to match the whole string.
That's different from a Perl regex match, which looks for a matching part.
If you want to test for a string with nothing but lowercase characters, use the pattern [a-z]+
If you want to test for a string containing at least one lowercase character, use the pattern .*[a-z].*

Used
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("[a-z]+"))
{
System.out.println(s);
}
}

I have faced the same problem once:
Pattern ptr = Pattern.compile("^[a-zA-Z][\\']?[a-zA-Z\\s]+$");
The above failed!
Pattern ptr = Pattern.compile("(^[a-zA-Z][\\']?[a-zA-Z\\s]+$)");
The above worked with pattern within ( and ).

Your regular expression [a-z] doesn't match dkoe since it only matches Strings of length 1. Use something like [a-z]+.

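You need a pattern that matches one or more characters. With matches(), the capturing group and the ^ and $ anchors are optional, because the whole input has to match anyway. A corrected pattern looks like this: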
String[] words = {"{apf","hum_","dkoe","12f"};
for(String s:words)
{
if(s.matches("(^[a-z]+$)"))
{
System.out.println(s);
}
}

You can make your pattern case insensitive by doing:
Pattern p = Pattern.compile("[a-z]+", Pattern.CASE_INSENSITIVE);

Java extract only first letters/characters from String

Hello guys I want to extract only first letters from this String:
String str = "使 徒 行 傳 16:31 ERV-ZH";
I only want to get these characters:
使 徒 行 傳
and not include
ERV-ZH
Only the letters or characters before the numbers plus the colon.
Note that Chinese letters can also be English and other letters.
this is what I've tried:
str.split(" ")[0];
But I'm only getting the first letter. Do you have an idea how to achieve my requirement? Any help will be appreciated. Thanks.
NOTE:
Also, strings are dynamic so I only presented sample characters.

This should give you the desired output
String str = "使 徒 行 傳 16:31 ERV-ZH";
String[] test = str.split("\\d\\d:\\d\\d");
for (String s : test) {
System.out.println(s);
}
The first element will be the part before the time and so on
Edit: if you are in need to be more dynamic for times like 6:31 or 16:6 then you could use this regex "\\d{1,2}:\\d{1,2}"
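A minimal sketch of this approach wrapped in a helper. The class name, the method name textBefore, and the English sample strings are assumptions for illustration; the regex is the one from the edit above. The limit of 2 stops splitting after the first reference, and trim() drops the space left before it.

```java
class TextBeforeReference {
    // Hypothetical helper: returns the text in front of the first "d:d" style reference.
    static String textBefore(String s) {
        // Limit 2: split only at the first occurrence of the pattern.
        String[] parts = s.split("\\d{1,2}:\\d{1,2}", 2);
        return parts[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(textBefore("Acts 16:31 ERV-ZH"));  // Acts
        System.out.println(textBefore("1 John 4:8 KJV"));     // 1 John
        System.out.println(textBefore("no reference here"));  // unchanged when nothing matches
    }
}
```

Because the delimiter requires the colon, a leading number such as the one in "1 John" is kept.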

You can use the following regex ^([\\D\\s]+), this is what you need:
String str = "使 徒 行 傳 16:31 ERV-ZH";
String pattern = "^([\\D\\s]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(str);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH");
}
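In the regex ^([\\D\\s]+):
^ matches only at the beginning of the input.
\\D matches any character that is not a digit, so the match stops at the first number.
\\s is redundant here, because whitespace is already covered by \\D.
Note that the match includes the trailing space before the number, and that this works for any string whose leading text contains no digits.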

If you don't always have a date pattern that can be used as a delimiter in the middle, and are looking for a more generic solution, you could go with this: str.replaceAll("[^\\p{L}\\s]+.*", "")
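Here is a small sketch of that replaceAll idea, with one caveat worth knowing. The class name, the helper leadingLetters, and the sample strings are hypothetical; the trim() call is an addition to drop the space left in front of the removed part.

```java
class LeadingLetters {
    // Hypothetical helper: removes everything from the first character that is
    // neither a letter nor whitespace, then trims the leftover space.
    static String leadingLetters(String s) {
        return s.replaceAll("[^\\p{L}\\s]+.*", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(leadingLetters("Acts 16:31 ERV-ZH"));         // Acts
        // Caveat: a leading digit makes the pattern match from index 0,
        // so nothing is left over.
        System.out.println(leadingLetters("1 John 4:8 KJV").isEmpty());  // true
    }
}
```

So this variant is more generic about what follows the text, but it assumes the text itself starts with letters.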

Extracting both matching and not matching regex

I have a String like this one abc3a de'f gHi?jk I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other terms, I want to return Strings that match the regular expression [a-zA-Z0-9'] and the Strings that do not match this regular expression. If there is a way to tell whether each resulting substring is a match or not, this will be a plus.
Thanks!

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want white space.
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk

You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
OUTPUT
abc3a
de'f
gHi
?
jk
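The question also asked for a way to tell whether each piece is a match or not. One possible way, sketched here under the assumption that a capturing group around the first alternative is acceptable, is to check whether group 1 took part in the match. The class name, method name, and the "match:"/"miss:" labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenClassifier {
    // Group 1 captures runs of allowed characters; the second alternative
    // matches runs of everything else except whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("([a-zA-Z0-9']+)|[^a-zA-Z0-9'\\s]+");

    // Hypothetical helper: labels each token as "match" or "miss".
    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the second alternative matched.
            String label = (m.group(1) != null) ? "match" : "miss";
            out.add(label + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // [match:abc3a, match:de'f, match:gHi, miss:?, match:jk]
        System.out.println(classify("abc3a de'f gHi?jk"));
    }
}
```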

myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at all the boundaries between runs of characters in that charset.
The lookbehind (?<=...) matches after a character in a run, while the lookahead (?=...) matches before a character in a run of characters outside the set.
The \\s+ is not a boundary match, and matches a run of whitespace characters. This has the effect of removing white-space from the result entirely.
The | allows splitting to happen at either a boundary or a run of white-space.
Since the lookbehind and lookahead are both positive, the boundaries will not match at the start or end of the string, so there's no need to ignore empty strings in the output unless there is white-space there.
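A runnable sketch of this split, including the white-space caveat from the last paragraph. The class and method names are hypothetical; the regex is copied from the answer.

```java
import java.util.Arrays;

class BoundarySplit {
    // Splits at runs of whitespace and at boundaries between allowed and other characters.
    static String[] split(String s) {
        return s.split("\\s+"
                + "|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])"
                + "|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])");
    }

    public static void main(String[] args) {
        // [abc3a, de'f, gHi, ?, jk]
        System.out.println(Arrays.toString(split("abc3a de'f gHi?jk")));
        // Leading whitespace yields an empty first element.
        System.out.println(split(" abc").length); // 2
    }
}
```

Calling trim() on the input before splitting is one way to avoid the empty first element.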

You can use lookarounds to split (this needs import java.util.ArrayList):
private static String[] splitString(final String s) {
final String [] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final ArrayList<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means xxx follows at this position, and (?<=xxx) means xxx precedes this position.
Since you did not want all-whitespace pieces in the result, you need to filter the array returned by split.

Regular Expression for UpperCase Letters In A String

For the life of me, I can't figure out why this regular expression is not working. It should find upper case letters in the given string and give me the count. Any ideas are welcome.
Here is the unit test code:
public class RegEx {
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "^[A-Z]+$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
System.out.printf("Found %d, of capital letters in %s%n", matcher.groupCount(), testStr);
}
}

It doesn't work because you have 2 problems:
The regex is incorrect: it should be "[A-Z]" for ASCII letters or \p{Lu} for Unicode uppercase letters.
You're not calling matcher.find() in a loop, and matcher.groupCount() returns the number of capturing groups in the pattern, not the number of matches.
Correct code:
public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "\\p{Lu}";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    int count = 0;
    // Each successful find() is one uppercase letter.
    while (matcher.find()) {
        count++;
    }
    System.out.printf("Found %d capital letters in %s%n", count, testStr);
}
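UPDATE: This one-liner also counts the Unicode uppercase letters in a string, provided the string does not start with an uppercase letter. On Java 8 and later, a zero-width match at index 0 produces no leading piece, so a leading uppercase letter would not be counted:
int countuc = testStr.split("(?=\\p{Lu})").length - 1;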

You didn't call matches or find on the matcher. It hasn't done any work.
groupCount is the wrong method to call. Your regex has no capture groups, and even if it did, groupCount wouldn't give you the character count.
You should be using find, but with a different regex, one without anchors. I would also advise using the proper Unicode character class: "\\p{Lu}+". Use this in a while (m.find()) loop, and accumulate the total number of characters obtained from m.group(0).length() at each step.

This should do what you're after,
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "[A-Z]+";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
int count = 0;
while (matcher.find()) {
count+=matcher.group(0).length();
}
System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}

It should find upper case letters in the given string and give me the count.
No, it shouldn't: the ^ and $ anchors prevent it from doing so, forcing it to look for a non-empty string composed entirely of uppercase characters.
Moreover, groupCount() reports the number of capturing groups defined in the pattern, not the number of matches, so for an expression that defines no groups it is always zero.
If you insist on using a regex, use a simple [A-Z] expression with no anchors, and call matcher.find() in a loop. A better approach, however, would be calling Character.isUpperCase on the characters of your string, and counting the hits:
int count = 0;
for (char c : str.toCharArray()) {
if (Character.isUpperCase(c)) {
count++;
}
}
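A variation on the loop above, sketched with streams over code points. This is an alternative to the char-based loop rather than part of the original answer, and the class and method names are made up. Iterating code points instead of chars also handles uppercase letters outside the Basic Multilingual Plane.

```java
class UpperCaseCounter {
    // Hypothetical helper: counts uppercase letters by code point.
    static long countUpperCase(String s) {
        // The method reference resolves to Character.isUpperCase(int codePoint).
        return s.codePoints().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // 4
    }
}
```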

Your pattern as you've written it looks for 1 or more capital letters between the beginning and the end of the line...if there are any lowercase characters in the line it won't match.

In this example I'm using a regex (regular expression) to count the number of uppercase and lowercase letters in a given string using Java.
import java.util.Scanner;
public class CandidateCode {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        // Reads the line of text entered by the user
        String str = sc.nextLine();
        // Each uppercase letter acts as one delimiter; the limit -1 keeps
        // trailing empty strings, so the piece count is delimiters + 1
        int countuc = str.split("[A-Z]", -1).length - 1;
        // Same idea for lowercase letters
        int countlc = str.split("[a-z]", -1).length - 1;
        System.out.println("UpperCase count: " + countuc);
        System.out.println("LowerCase count: " + countlc);
    }
}

Change the regular expression to [A-Z], which matches every occurrence of a capital letter.
The example below counts the number of capital letters in a string using a Pattern:
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
    Pattern ptrn = Pattern.compile("[A-Z]");
    Matcher matcher = ptrn.matcher("ivekKVVV");
    int count = 0;
    // find() resumes after the previous match, so no start index is needed
    while (matcher.find()) {
        count++;
    }
    System.out.println(count);
}

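You can also use Java's own character properties in a regex, for example:
.+[\p{javaUpperCase}].+
Note that this pattern only tests whether the string contains an uppercase letter with at least one character on each side of it; it does not count the letters.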

Here's a solution for Java 9 and later that makes use of the results() method of Matcher, which returns a stream of the results, out of which the entries can be counted. The suggestion from @Sergey Kalinichenko to remove the ^ and $ anchors has also been incorporated into the regex string.
public class RegEx {
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "\\p{Lu}";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
long count = matcher.results().count();
System.out.printf("Found %d of capital letters in %s%n", count, testStr);
}
}

Need help in Regex to exclude splitting string within "

I need to split a String using a comma as the separator, but if part of the string is enclosed in double quotes ("), the splitting has to be suspended for that portion, from the opening " to the closing one, even if it contains commas in between.
Can anyone please help me solve this using a regex with lookaround?

Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless...".
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code:
import java.util.regex.*;
import java.util.List;
class Program {
public static void main (String[] args) {
String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b = new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "SplitHere");
else m.appendReplacement(b, m.group(0));
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("SplitHere");
for (String split : splits)
System.out.println(split);
} // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3
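The same regex can also be used without a placeholder string such as "SplitHere", which would break if the input happened to contain that text. The following is a sketch of that variation, not the answer's own code: it cuts the input at each comma captured in Group 1. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Left side consumes quoted strings; right side captures commas outside quotes.
    private static final Pattern QUOTED_OR_COMMA = Pattern.compile("\"[^\"]*\"|(,)");

    // Hypothetical helper: splits on commas that are not inside double quotes.
    static List<String> splitOutsideQuotes(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED_OR_COMMA.matcher(input);
        int fieldStart = 0;
        while (m.find()) {
            if (m.group(1) != null) {
                // A comma outside quotes ends the current field.
                fields.add(input.substring(fieldStart, m.start()));
                fieldStart = m.end();
            }
        }
        // Whatever follows the last separator is the final field.
        fields.add(input.substring(fieldStart));
        return fields;
    }

    public static void main(String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        for (String field : splitOutsideQuotes(subject)) {
            System.out.println(field);
        }
    }
}
```

Like the original, this sketch does not handle escaped quotes inside a quoted section.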

Please try this:
(?<!\G\s*"[^"]*),
If you put this regex in your program, it should be:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But 2 things are not clear:
Does the " only start next to a ,, or can it start in the middle of the content, as in AAA, BB"CC,DD" ? The regex above only deals with a " that starts near a , .
If the content contains a " itself, how is it escaped? With "" or \"? The regex above does not handle any escaped " format.
