For the life of me, I can't figure out why this regular expression is not working. It should find upper case letters in the given string and give me the count. Any ideas are welcome.
Here is the unit test code:
public class RegEx {
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "^[A-Z]+$";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
System.out.printf("Found %d, of capital letters in %s%n", matcher.groupCount(), testStr);
}
}
It doesn't work because you have 2 problems:
Regex is incorrect, it should be "[A-Z]" for ASCII letter or \p{Lu} for Unicode uppercase letters
You're not calling matcher.find() in a loop, and matcher.groupCount() returns the number of capture groups in the pattern, not the number of matches
Correct code:
public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "\\p{Lu}";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    // groupCount() only reports capture groups, so count the matches ourselves
    int count = 0;
    while (matcher.find())
        count++;
    System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}
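UPDATE: This simpler one-liner also counts the Unicode upper case letters in a string, but only if the string does not start with one. Since Java 8, split never produces an empty leading substring for a zero-width match at the beginning, so a leading capital goes uncounted: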
int countuc = testStr.split("(?=\\p{Lu})").length - 1;
You didn't call matches or find on the matcher. It hasn't done any work.
groupCount is the wrong method to call. Your regex has no capture groups, and even if it did, it wouldn't give you the character count.
You should be using find, but with a different regex, one without anchors. I would also advise using the proper Unicode character class: "\\p{Lu}+". Use this in a while (m.find()) loop, and accumulate the total number of characters obtained from m.group(0).length() at each step.
This should do what you're after,
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "[A-Z]+";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
int count = 0;
while (matcher.find()) {
count+=matcher.group(0).length();
}
System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}
It should find upper case letters in the given string and give me the count.
No, it shouldn't: the ^ and $ anchors prevent it from doing so, forcing it to look for a non-empty string composed entirely of uppercase characters.
Moreover, groupCount() reports the number of capturing groups in the pattern, not the number of matches, so for an expression that does not define any groups it is always zero.
If you insist on using a regex, use a simple [A-Z] expression with no anchors, and call matcher.find() in a loop. A better approach, however, would be calling Character.isUpperCase on the characters of your string, and counting the hits:
int count = 0;
for (char c : str.toCharArray()) {
if (Character.isUpperCase(c)) {
count++;
}
}
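On Java 8 or later the same loop can be written as a stream. This is only a sketch: the class and method names (UpperCaseCounter, countUpperCase) are made up for illustration. Iterating over code points rather than chars also covers uppercase letters outside the Basic Multilingual Plane.

```java
class UpperCaseCounter {
    // Counts the code points for which Character.isUpperCase(int) is true.
    static long countUpperCase(String s) {
        return s.codePoints().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // prints 4
    }
}
```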
Your pattern as you've written it looks for 1 or more capital letters between the beginning and the end of the line. If there are any lowercase characters in the line, it won't match.
In this example I'm using a regex (regular expression) to count the number of uppercase and lowercase letters in the given string using Java.
import java.util.Scanner;
public class CandidateCode {
    public static void main(String[] args) throws Exception {
        Scanner sc = new Scanner(System.in);
        // Reads the line of text entered by the user
        String str = sc.nextLine();
        // split(...).length - 1 miscounts when a match sits at the start or end of
        // the input, so delete every other character and measure what is left.
        // Counts uppercase letters in the given String
        int countuc = str.replaceAll("[^A-Z]", "").length();
        // Counts lowercase letters in the given String
        int countlc = str.replaceAll("[^a-z]", "").length();
        System.out.println("UpperCase count: " + countuc);
        System.out.println("LowerCase count: " + countlc);
    }
}
Change the regular expression to
[A-Z] which checks all occurrences of capital letters
Please refer to the example below, which counts the number of capital letters in a string using a pattern
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
Pattern ptrn = Pattern.compile("[A-Z]");
Matcher matcher = ptrn.matcher("ivekKVVV");
int from = 0;
int count = 0;
while(matcher.find(from)) {
count++;
from = matcher.start() + 1;
}
System.out.println(count);
}
You can also use Java Regex, for example:
.+[\p{javaUpperCase}].+
An example from my work project:
Here's a solution for Java 9 and later that makes use of the results() method of Matcher, which returns a stream of the results, out of which the entries can be counted. The suggestion from @Sergey Kalinichenko to remove the ^ and $ anchors has also been incorporated into the regex string.
public class RegEx {
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
String testStr = "abcdefghijkTYYtyyQ";
String regEx = "\\p{Lu}";
Pattern pattern = Pattern.compile(regEx);
Matcher matcher = pattern.matcher(testStr);
long count = matcher.results().count();
System.out.printf("Found %d of capital letters in %s%n", count, testStr);
}
}
If I have a String that consists of letters and numbers, how can I get rid of everything after the last number in the String?
Example:
banana_orange_62_34_wednesday would become banana_orange_62_34
1234_4564_www_6_j_1_rrrr would become 1234_4564_www_6_j_1
I tried this so far:
int endIndex = inputXMLFilename.lastIndexOf("\\d+");
inputXMLFilename = inputXMLFilename.substring(0, endIndex);
Use regex replace:
str = str.replaceAll("\\D+$", "");
What the regex means:
\D means “non-digit”
+ means “one or more of the previous term, greedy (as much of the input as possible)”
$ means “end of input”
The $ anchors the match to the end, without which this would match (and delete) all non-digits.
lastIndexOf() only works with plain text, not regex.
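To make the difference concrete, here is a small sketch with both variants: the regex replace from above and a plain backwards scan, which is roughly what the lastIndexOf attempt was aiming for. The class and method names are made up for illustration.

```java
class TrailingText {
    // Regex variant: deletes the run of non-digits at the end of the input.
    // If the input contains no digit at all, the result is the empty string.
    static String trimWithRegex(String s) {
        return s.replaceAll("\\D+$", "");
    }

    // Index variant: walks back from the end to the last digit.
    // Note: Character.isDigit also accepts non-ASCII digits, while \D
    // (without the UNICODE_CHARACTER_CLASS flag) only treats 0-9 as digits.
    static String trimWithScan(String s) {
        int i = s.length() - 1;
        while (i >= 0 && !Character.isDigit(s.charAt(i))) {
            i--;
        }
        return s.substring(0, i + 1);
    }

    public static void main(String[] args) {
        System.out.println(trimWithRegex("banana_orange_62_34_wednesday")); // banana_orange_62_34
        System.out.println(trimWithScan("1234_4564_www_6_j_1_rrrr"));       // 1234_4564_www_6_j_1
    }
}
```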
@Test
public void cutAfterLastDigit() {
String s = "banana_orange_62_34_wednesday";
Pattern pattern = Pattern.compile("^(.*\\d)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
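and m.group(1) will return you the first number. Note that signed numbers can contain a minus sign. A greedy \D+ would swallow that sign before the group gets a chance to match it, so make the prefix lazy:
^\D*?(-?\d+).*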
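As a rough sketch, the first-number idea can be wrapped in a helper like the one below. The names (FirstNumber, firstInt) are invented for this example, and the prefix is written as a lazy \D*? so that a minus sign directly in front of the digits lands inside the capture group instead of being consumed as a non-digit.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Lazy non-digit prefix, then an optional minus sign and one or more digits.
    private static final Pattern FIRST_INT = Pattern.compile("^\\D*?(-?\\d+)");

    // Returns the first integer in the input, or an empty OptionalInt if there
    // is none. Digit runs too long for an int make Integer.parseInt throw.
    static OptionalInt firstInt(String s) {
        Matcher m = FIRST_INT.matcher(s);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstInt("Testing123Testing").getAsInt()); // 123
        System.out.println(firstInt("temp -42 C").getAsInt());        // -42
    }
}
```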
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be offset by spaces, it is even more distinct to specify
"\\s+(\\d+)\\s+"
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
Try doing something like this:
// .+? is lazy, so the group captures all of 123 rather than only the last digit
Pattern p = Pattern.compile("^.+?(\\d+).+");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
You can also do it using StringTokenizer:
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we are taking these numeric data into three different variables we can use this data anywhere in the code (for further use)
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
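A sketch of how that pattern might be used, under the assumption that the captured text is then cleaned up (whitespace removed, comma turned into a dot) before parsing. The class and method names are invented for the example.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseDecimal {
    // Digits, an optional '.' or ',' surrounded by optional whitespace, optional digits.
    private static final Pattern NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the first number found, or an empty OptionalDouble if there is none.
    static OptionalDouble firstDecimal(String s) {
        Matcher m = NUMBER.matcher(s);
        if (!m.find()) {
            return OptionalDouble.empty();
        }
        // Normalise "12 , 5" to "12.5" before handing it to the parser.
        String raw = m.group(1).replaceAll("\\s", "").replace(',', '.');
        return OptionalDouble.of(Double.parseDouble(raw));
    }

    public static void main(String[] args) {
        System.out.println(firstDecimal("price: 12 , 5 EUR").getAsDouble()); // 12.5
    }
}
```

One thing to watch: because every piece between the two digit runs is optional, the pattern as written also joins "12 34" into 1234.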
Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file then this can help you:
try{
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
//Ref:03
while ((line = br.readLine()) != null) {
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else{
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException){
logger.logDebug("Exception " + ioException.getMessage());
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}
first;snd;3rd;4th;5th;6th;...
How can I split the above after the third occurrence of the ; separator? Especially without having to value.split(";") the whole string as an array, as I won't need the values separated. Just the first part of the string up until the nth occurrence.
Desired output would be:
first;snd;3rd.
I just need that as a string substring, not as split separated values.
Use StringUtils.ordinalIndexOf() from Apache Commons Lang
Finds the n-th index within a String, handling null. This method uses String.indexOf(String).
Parameters:
str - the String to check, may be null
searchStr - the String to find, may be null
ordinal - the n-th searchStr to find
Returns:
the n-th index of the search String, -1 (INDEX_NOT_FOUND) if no match or null string input
Or this way, no libraries required:
public static int ordinalIndexOf(String str, String substr, int n) {
int pos = str.indexOf(substr);
while (--n > 0 && pos != -1)
pos = str.indexOf(substr, pos + 1);
return pos;
}
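To get the actual substring the question asks for, the index still has to be fed into substring. A minimal sketch, repeating the loop above so the example is self-contained; prefixBeforeNth and the class name are made-up names, and returning the whole string when there are fewer than n separators is just one possible choice.

```java
class NthOccurrence {
    // Index of the n-th occurrence of substr in str, or -1 if there are fewer.
    static int ordinalIndexOf(String str, String substr, int n) {
        int pos = str.indexOf(substr);
        while (--n > 0 && pos != -1) {
            pos = str.indexOf(substr, pos + 1);
        }
        return pos;
    }

    // Everything before the n-th separator, or the whole string when the
    // input contains fewer than n separators.
    static String prefixBeforeNth(String str, String sep, int n) {
        int pos = ordinalIndexOf(str, sep, n);
        return pos == -1 ? str : str.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeNth("first;snd;3rd;4th;5th;6th;...", ";", 3)); // first;snd;3rd
    }
}
```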
I would go with this, easy and basic:
String test = "first;snd;3rd;4th;5th;6th;";
int result = 0;
for (int i = 0; i < 3; i++) {
result = test.indexOf(";", result) +1;
}
System.out.println(test.substring(0, result-1));
Output:
first;snd;3rd
You can of course replace the 3 in the loop with the number of fields you need.
If you want to use regular expressions, it is pretty straightforward:
import re
value = "first;snd;3rd;4th;5th;6th;"
reg = r'^([\w]+;[\w]+;[\w]+)'
re.match(reg, value).group()
Outputs:
"first;snd;3rd"
You could use a regex that uses a negated character class to match from the start of the string not a semicolon.
Then repeat a grouping structure 2 times that matches a semicolon followed by not a semicolon 1+ times.
^[^;]+(?:;[^;]+){2}
Explanation
^ Assert the start of the string
[^;]+ Negated character class to match not a semicolon 1+ times
(?: Start non capturing group
;[^;]+ Match a semicolon and 1+ times not a semi colon
){2} Close non capturing group and repeat 2 times
For example:
String regex = "^[^;]+(?:;[^;]+){2}";
String string = "first;snd;3rd;4th;5th;6th;...";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println(matcher.group(0)); // first;snd;3rd
}
If you don't want to use split, just use indexOf in a for loop to find the index of the 3rd ";", then take the substring up to that index.
Also you can do a split with a regex that match the 3rd ; but it's probably not the best solution.
If you need to do this frequently it is best to compile the regex upfront in a static Pattern instance:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class NthOccurance {
static Pattern pattern=Pattern.compile("^(([^;]*;){3}).*");
public static void main(String[] args) {
String in="first;snd;3rd;4th;5th;6th;";
Matcher m=pattern.matcher(in);
if (m.matches())
System.out.println(m.group(1));
}
}
Replace the '3' by the number of elements you want.
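If the number of elements is only known at run time, the pattern can be assembled from it. The sketch below is a variant rather than the exact pattern above: it leaves out the trailing separator so the result matches the "first;snd;3rd" form from the question. All names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFields {
    // Builds ^[^;]*(?:;[^;]*){n-1}: one field, then n-1 times "separator plus field".
    static Pattern firstFieldsPattern(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return Pattern.compile("^[^;]*(?:;[^;]*){" + (n - 1) + "}");
    }

    // Returns the first n fields, or the whole input when it has fewer than n.
    static String firstFields(String input, int n) {
        Matcher m = firstFieldsPattern(n).matcher(input);
        return m.find() ? m.group() : input;
    }

    public static void main(String[] args) {
        System.out.println(firstFields("first;snd;3rd;4th;5th;6th;", 3)); // first;snd;3rd
    }
}
```

When the same n is used repeatedly, the compiled Pattern should be kept in a field instead of being rebuilt on every call.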
The code below finds the index of the 3rd occurrence of the ';' character and takes the substring up to it.
String s = "first;snd;3rd;4th;5th;6th;";
String splitted = s.substring(0, s.indexOf(";", s.indexOf(";", s.indexOf(";") + 1) + 1));
I have a list of words and I have to output the number of words with no vowels in them. I have this so far
String matchString = "[^aeiou]"
for(String s: list) if(s.matches(matchString.toLowerCase())) {
System.out.println(s);
numMatches++;
}
I'm more worried that the reg expression is wrong.
Change regex to
[^aeiou]+
        ^-- important part
to test if the string is built from one or more non-vowel characters. Currently you are just checking if the string is built from one character which is not a, e, i, o or u.
You can also make your regex case-insensitive by adding the (?i) flag at the start. This way the characters used in the regex represent both their lower- and upper-case forms:
(?i)[^aeiou]+
This one does the trick for me:
[^aeiou]+$
Also you should lowercase the string, not the expression:
if (s.toLowerCase().matches(matchString))
Looks like you put the regex into lowercase rather than the String you're actually testing. You want
if (s.toLowerCase().matches(matchString)){...}
Also, I believe your regex should be "[^aeiou]*", so you can match all the characters in the word.
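Putting the two points together (lowercase the word, not the pattern, and allow more than one character), a counting helper could look roughly like this. The names are made up, and the sketch uses + rather than * so that empty strings are not counted as words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class NoVowels {
    // A word counts when it is non-empty and none of its characters is a, e, i, o or u.
    // Note that [^aeiou] also accepts digits and punctuation, not only consonants.
    static long countWordsWithoutVowels(List<String> words) {
        return words.stream()
                .filter(w -> w.toLowerCase(Locale.ROOT).matches("[^aeiou]+"))
                .count();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("rhythm", "apple", "Sky", "TV", "Echo");
        System.out.println(countWordsWithoutVowels(list)); // prints 3
    }
}
```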
import java.util.regex.*;
public class Vowels {
public static void main(String[] args) {
String s = "hello";
int count = 0;
Pattern p = Pattern.compile("[aeiou]");
Matcher m = p.matcher(s);
while (m.find()) {
count++;
System.out.println(m.start() + "...." + m.group());
}
System.out.println("number of vowels found in the string ="+count);
}
}
For me only /^[^aeiou]+$/ did the job.
I have a few strings which are like this:
text (255)
varchar (64)
...
I want to find out the number between ( and ) and store that in a string. That is, obviously, store these lengths in strings.
I have the rest of it figured out except for the regex parsing part.
I'm having trouble figuring out the regex pattern.
How do I do this?
The sample code is going to look like this:
Matcher m = Pattern.compile("<I CANT FIGURE OUT WHAT COMES HERE>").matcher("text (255)");
Also, I'd like to know if there's a cheat sheet for regex parsing, from where one can directly pick up the regex patterns
I would use a plain string match
String s = "text (255)";
int start = s.indexOf('(')+1;
int end = s.indexOf(')', start);
if (end < 0) {
// not found
} else {
int num = Integer.parseInt(s.substring(start, end));
}
You can use regex as sometimes this makes your code simpler, but that doesn't mean you should in all cases. I suspect this is one where a simple string indexOf and substring will not only be faster, and shorter but more importantly, easier to understand.
You can use this pattern to match any text between parentheses:
\(([^)]*)\)
Or this to match just numbers (with possible whitespace padding):
\(\s*(\d+)\s*\)
Of course, to use this in a string literal, you have to escape the \ characters:
Matcher m = Pattern.compile("\\(\\s*(\\d+)\\s*\\)")...
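For completeness, a small sketch of the digits-only pattern in use. The helper name parseLength and the class name are made up for this example; it returns an empty OptionalInt when the input has no parenthesised number.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColumnLength {
    // "(" optional whitespace, captured digits, optional whitespace, ")"
    private static final Pattern IN_PARENS = Pattern.compile("\\(\\s*(\\d+)\\s*\\)");

    // Extracts the first number in parentheses from a definition such as "text (255)".
    static OptionalInt parseLength(String definition) {
        Matcher m = IN_PARENS.matcher(definition);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseLength("text (255)").getAsInt());     // 255
        System.out.println(parseLength("varchar ( 64 )").getAsInt()); // 64
    }
}
```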
Here is some example code:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="varchar (64)";
String re1=".*?"; // Non-greedy match on filler
String re2="\\((\\d+)\\)"; // Round Braces 1
Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String rbraces1=m.group(1);
System.out.print("("+rbraces1.toString()+")"+"\n");
}
}
}
This will print out any (int) it finds in the input string, txt.
The regex is \((\d+)\) to match any numbers between ()
int index1 = string.indexOf("(");
int index2 = string.indexOf(")");
// substring's end index is exclusive, so index2 itself is the right bound
String intValue = string.substring(index1 + 1, index2);
Matcher m = Pattern.compile("\\((\\d+)\\)").matcher("text (255)");
if (m.find()) {
int len = Integer.parseInt (m.group(1));
System.out.println (len);
}