I'm currently working on something where the code reads thousands of lines of strings. Each line must follow a specific format, like the following:
"Name,#,#,#,#,#,#"
Where 'Name' is the name of a movie (we can assume the name won't contain any numbers) and each # is a number from 0 to 10. Each value MUST be separated by a comma.
My code is the following:
if (line.matches(".*[a-zA-Z].*,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)")) {
System.out.println("no");
}
else {
System.out.println(line);
}
The issue is that a film title can't contain commas; if it does, the line needs to be printed. However, my matches() call doesn't seem to catch lines that have a comma in the title. As I understand it, my regex says that if the next entry (after a comma) is not an integer, the line does not match and should therefore be printed.
Can anyone see where I'm going wrong in this?
You are saying that the rules are:
Lines must be 7 comma-separated values: a name and 6 numbers in range 0-10.
The name must not contain a comma.
We can assume the name won't have any numbers, but it is not a requirement that it cannot.
Since the only invalid character in a name is a comma, the regex would be:
[^,]*,(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10),(?:[0-9]|10)
If you want to capture the fields, you would use this code:
Pattern p = Pattern.compile("([^,]*),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
for (String line : lines) {
Matcher m = p.matcher(line);
if (! m.matches()) {
System.out.println("Invalid line: " + line);
} else {
System.out.println("Name: " + m.group(1));
System.out.println(" Values: " + m.group(2)
+ " " + m.group(3)
+ " " + m.group(4)
+ " " + m.group(5)
+ " " + m.group(6)
+ " " + m.group(7));
}
}
Test
String[] lines = { "Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5",
"Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb,6,7,8,9,10,0",
"300,1,2,3,4,5,6"};
Output
Invalid line: Buffalo Bill and the Indians, or Sitting Bull's History Lesson,0,1,2,3,4,5
Name: Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb
Values: 6 7 8 9 10 0
Name: 300
Values: 1 2 3 4 5 6
First movie name has a comma, so it doesn't match.
Second movie name has special characters (. and :), but no comma, so it matches.
Third movie name is "300", which is an actual movie, so it matches.
The problem lies with the .*: it can also match the comma.
Fri,dayaervsere,6,4,78,7
<--><--------->^
 .*  [a-zA-Z]  ,( [...]
So basically you only need to get rid of the .* and apply a quantifier to the character class instead:
[a-zA-Z]* // to match any number of characters
or
[a-zA-Z]+ // to match at least one character
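To see the difference, you can run both patterns against a title that contains a comma. The sketch below is illustrative only: the class name, method names and sample lines are assumptions, and it uses [a-zA-Z] rather than the question's [a-zA-z]. The sample titles are single words because [a-zA-Z]+ also rejects spaces.

```java
import java.util.regex.Pattern;

class CommaCheck {
    // The question's pattern: the leading .* can swallow a comma, so "Kill,er" passes as a title.
    static final Pattern LOOSE = Pattern.compile(
            ".*[a-zA-Z].*,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
    // A quantified character class instead of .*: no comma can appear in the title.
    // Note that [a-zA-Z]+ also rejects spaces, so multi-word titles would fail here too.
    static final Pattern STRICT = Pattern.compile(
            "[a-zA-Z]+,([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");

    static boolean looseMatches(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean strictMatches(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(looseMatches("Kill,er,6,7,3,6,8,1"));  // true: the comma is absorbed by .*
        System.out.println(strictMatches("Kill,er,6,7,3,6,8,1")); // false
        System.out.println(strictMatches("Killer,6,7,3,6,8,1"));  // true
    }
}
```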
If you do use a regex to solve this, I'd recommend allowing commas in the 'Name' part of the pattern and focusing on making sure there are 6 numbers, each preceded by a comma. You can check whether the name meets the appropriate criteria afterwards.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
// before your for-loop, create a pattern (Assuming no digits in title)
Pattern p = Pattern.compile("([^0-9]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)");
// ...
// later on in your actual for-loop for each line.
Matcher m = p.matcher(line);
if (m.matches())
{
String title = m.group(1);
// do extra checking for the title if needed
}
else
{
// print no
}
The following regex should solve your problem:
^([a-zA-Z ]+),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10),([0-9]|10)
Or the shorter version of it, with no code duplication:
^([a-zA-Z ]+)(,([0-9]|10)){6}
Testing
"The Killer,6,7,3,6,8,1" matches the pattern.
"The Kill,er,6,7,3,6,8,1" doesn't match the pattern, as you wanted.
Also, spaces in the title are supported.
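Here is a quick check of the shorter pattern using matches() on the two test lines above. The class and method names are illustrative. The last print shows a point worth knowing: a repeated group such as (,([0-9]|10)){6} keeps only the value from its final repetition. That makes this form fine for validation, but it cannot extract all six numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShortPatternCheck {
    // Title of letters and spaces, followed by exactly six ",N" fields with N in 0..10.
    static final Pattern P = Pattern.compile("^([a-zA-Z ]+)(,([0-9]|10)){6}");

    static boolean isValid(String line) {
        return P.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("The Killer,6,7,3,6,8,1"));  // true
        System.out.println(isValid("The Kill,er,6,7,3,6,8,1")); // false

        Matcher m = P.matcher("The Killer,6,7,3,6,8,1");
        if (m.matches()) {
            System.out.println(m.group(1)); // The Killer
            System.out.println(m.group(3)); // 1 (only the last repetition is kept)
        }
    }
}
```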
Related
I have a free-flowing string that contains some random text, like below:
"Some random text 080 2668215901"
"Some random text 040-1234567890"
"Some random text 0216789101112"
I need to capture the 3-digit numbers and the following 10-digit numbers:
with a space in between
with a hyphen in between
without any space/hyphen
I am using Java.
This is what I tried to get the numbers from the free flowing text:
"\\w+([0-9]+)\\w+([0-9]+)"
I could do a string-length check to see whether there is a 3-digit number that precedes a hyphen or a space and is followed by a 10-digit number, but I would really like to explore whether a regex can give me a better solution.
Also, if there are multiple occurrences within the string, I'd need to capture them all. I would also need to capture any standalone 10-digit string, one that need not be preceded by a 3-digit number and a hyphen or space.
It is usually (\d{3})[ -]?(\d{10})
With boundary conditions maybe (?<!\d)(\d{3})[ -]?(\d{10})(?!\d)
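To collect every occurrence with the boundary-checked pattern, you can loop over Matcher.find(). This is a minimal sketch: the class name, the method name and the "prefix|number" output format are illustrative choices. In a Java string literal, \d is written as \\d. The pattern only finds 3+10 digit pairs. It does not pick up standalone 10-digit numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFinder {
    // (?<!\d) and (?!\d) stop a match from starting or ending inside a longer run of digits.
    static final Pattern P = Pattern.compile("(?<!\\d)(\\d{3})[ -]?(\\d{10})(?!\\d)");

    // Returns every match in the text as "prefix|number".
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Some random text 080 2668215901")); // [080|2668215901]
        System.out.println(findAll("Some random text 040-1234567890")); // [040|1234567890]
        System.out.println(findAll("Some random text 0216789101112"));  // [021|6789101112]
    }
}
```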
Assuming you'll run this regex on individual lines, and ignoring some of the... more expressive regex implementations, this is perhaps the simplest way:
/([0-9]{3})[ -]?([0-9]{10})/
If your text might end in numbers, you'll need to anchor the result to the end of the line like this:
/([0-9]{3})[ -]?([0-9]{10})$/
If you are guaranteed literal double quote characters around your inputs, you could instead use:
/([0-9]{3})[ -]?([0-9]{10})"$/
And if you needed to match the entire line for some input error testing, you could use:
/^"(.+)([0-9]{3})[ -]?([0-9]{10})"$/
Here is a longer demo. From your responses above, you're also looking for matches that have trailing characters after them.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Class {
private static final Pattern p = Pattern.compile("" +
"((?<threeDigits>\\d{3})[- ]?)?" +
"(?<tenDigits>\\d{10})");
public static void main(String... args) {
final String input =
"Here is some text to match: Some random text 080 2668215901. " +
"We're now matching stray sets of ten digit as well: 1234567890. " +
"Notice how you get the first ten and the second ten, with the preceding three:1234123412-040-1234567890" +
"A stranger case:111222333444555666777888. Where should matches here begin and end?";
printAllMatches(p.matcher(input));
}
private static void printAllMatches(final Matcher m) {
while (m.find()) {
System.out.println("three digits: " + m.group("threeDigits"));
System.out.println("ten digits: " + m.group("tenDigits"));
}
}
}
I'm having trouble figuring out how to grab a certain part of a string using regular expressions in Java. Here's my input string:
application.APPLICATION NAME.123456789.status
I need to grab the portion of the string called "APPLICATION NAME". I can't simply split on the period character because APPLICATION NAME may itself include a period. The first word, "application", will always remain the same, and the characters after "APPLICATION NAME" will always be numbers.
I've been able to split on the period and grab index 1 but, as I mentioned, APPLICATION NAME may itself include periods, so this is no good. I've also been able to find the first and the second-to-last index of a period, but that seems inefficient, and I would like to future-proof this by using a regex.
I've googled around for hours and haven't been able to find much guidance. Thanks!
You can use ^application\.(.*)\.\d with find(), or application\.(.*)\.\d.* with matches().
Sample code using find():
private static void test(String input) {
String regex = "^application\\.(.*)\\.\\d";
Matcher m = Pattern.compile(regex).matcher(input);
if (m.find())
System.out.println(input + ": Found \"" + m.group(1) + "\"");
else
System.out.println(input + ": **NOT FOUND**");
}
public static void main(String[] args) {
test("application.APPLICATION NAME.123456789.status");
test("application.Other.App.Name.123456789.status");
test("application.App 55 name.123456789.status");
test("application.App.55.name.123456789.status");
test("bad input");
}
Output
application.APPLICATION NAME.123456789.status: Found "APPLICATION NAME"
application.Other.App.Name.123456789.status: Found "Other.App.Name"
application.App 55 name.123456789.status: Found "App 55 name"
application.App.55.name.123456789.status: Found "App.55.name"
bad input: **NOT FOUND**
The above will work as long as "status" doesn't start with a digit.
With split(), you could save key.split("\\.") in a String[] s and then join the elements s[1] through s[s.length-3] back together with ".".
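The split-and-join idea can be sketched as follows. The class and method names are illustrative, and the sketch assumes well-formed input: with fewer than three dot-separated segments, Arrays.copyOfRange throws an IllegalArgumentException.

```java
import java.util.Arrays;

class SplitJoin {
    // Split on every dot, then re-join the pieces between "application" and the last two
    // segments (the digits and the status), restoring any dots inside the name.
    static String appName(String key) {
        String[] s = key.split("\\.");
        // copyOfRange's end index is exclusive, so this covers s[1] .. s[s.length - 3].
        return String.join(".", Arrays.copyOfRange(s, 1, s.length - 2));
    }

    public static void main(String[] args) {
        System.out.println(appName("application.APPLICATION NAME.123456789.status")); // APPLICATION NAME
        System.out.println(appName("application.Other.App.Name.123456789.status"));  // Other.App.Name
    }
}
```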
With regexes you can do:
String appName = key.replaceAll("application\\.(.*)\\.\\d+\\.\\w+", "$1");
Why split? Just:
String appName = input.replaceAll(".*?\\.(.*)\\.\\d+\\..*", "$1");
This also correctly handles a dot then digits within the application name, but only works correctly if you know the input is in the expected format.
To handle "bad" input by returning a blank string when the pattern is not matched, be stricter and add an alternative that always matches (and therefore replaces) the entire input:
String appName = input.replaceAll("^application\\.(.*)\\.\\d+\\.\\w+$|.*", "$1");
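Here is a small sketch of how the strict version behaves on good and bad input (the class and method names are illustrative). It relies on one detail of Java's replacement handling: when group 1 took no part in a match, "$1" expands to an empty string.

```java
class StrictReplace {
    // First alternative: well-formed input, with the name captured in group 1.
    // Second alternative (.*): consumes anything else; group 1 does not participate there,
    // so "$1" expands to "" and bad input is replaced with an empty string.
    static String appName(String input) {
        return input.replaceAll("^application\\.(.*)\\.\\d+\\.\\w+$|.*", "$1");
    }

    public static void main(String[] args) {
        System.out.println("[" + appName("application.APPLICATION NAME.123456789.status") + "]"); // [APPLICATION NAME]
        System.out.println("[" + appName("bad input") + "]"); // []
    }
}
```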
I have some strings which are indexed and are dynamic.
For example:
name01,
name02,
name[n]
Now I need to separate the name from the index.
I've come up with this regex which works OK to extract index.
([0-9]+(?!.*[0-9]))
But there are some exceptions among these names. Some of them may have a number appended that is not the index. (These strings are limited and I know them, meaning I can add them as "exceptions" in the regex.)
For example,
panLast4[01]
Here the last '4' is not part of the index, so I need to distinguish.
So I tried:
[^panLast4]([0-9]+(?!.*[0-9]))
Which works for panLast4[123] but not panLast4[43]
Note: the "[" and "]" are for explanation purposes only; they are not present in the strings.
What is wrong?
Thanks
You can use the split method with this pattern:
(?<!^panLast(?=4)|^nm(?=14)|^nm1(?=4))(?=[0-9]+$)
The idea is to find the position where there are digits until the end of the string (?=[0-9]+$). But the match will succeed if the negative lookbehind allows it (to exclude particular names (panLast4 and nm14 here) that end with digits). When one of these particular names is found, the regex engine must go to the next position to obtain a match.
Example:
String s ="panLast412345";
String[] res = s.split("(?<!^panLast(?=4)|^nm(?=14)|^nm1(?=4))(?=[0-9]+$)", 2);
if ( res.length==2 ) {
System.out.println("name: " + res[0]);
System.out.println("ID: " + res[1]);
}
Another method uses matches() with a lazy quantifier as the last alternative:
Pattern p = Pattern.compile("(panLast4|nm14|.*?)([0-9]+)");
String s = "panLast42356";
Matcher m = p.matcher(s);
if ( m.matches() && m.group(1).length()>0 ) {
System.out.println("name: "+ m.group(1));
System.out.println("ID: "+ m.group(2));
}
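The matches() approach can be tried on several inputs at once. This is a sketch: the helper name split, its {name, index} return shape and the extra sample strings are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameIndex {
    // Known exceptions are tried first. Otherwise the lazy .*? stops at the first position
    // from which [0-9]+ can reach the end of the string, so the index is the whole trailing digit run.
    static final Pattern P = Pattern.compile("(panLast4|nm14|.*?)([0-9]+)");

    // Returns {name, index}, or null when there is no trailing index or the name is empty.
    static String[] split(String s) {
        Matcher m = P.matcher(s);
        if (m.matches() && m.group(1).length() > 0) {
            return new String[] { m.group(1), m.group(2) };
        }
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "name01", "panLast443", "nm1405", "noIndex" }) {
            String[] r = split(s);
            System.out.println(s + " -> " + (r == null ? "no match" : r[0] + " / " + r[1]));
        }
    }
}
```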
I think what I am asking is either very trivial or already asked, but I have had a hard time finding answers.
We need to capture the inner number characters between brackets within a given string.
so given the string
StringWithMultiArrayAccess[0][9][4][45][1]
and the regex
^\w*?(\[(\d+)\])+?
I would expect 6 capture groups and access to the inner data.
However, I end up only capturing the last "1" character in capture group 2.
In case it matters, here's my Java JUnit test:
@Test
public void ensureThatJsonHandlerCanHandleNestedArrays(){
String stringWithArr = "StringWithMultiArray[0][0][4][45][1]";
Pattern pattern = Pattern.compile("^\\w*?(\\[(\\d+)\\])+?");
Matcher matcher = pattern.matcher(stringWithArr);
matcher.find();
assertTrue(matcher.matches()); //passes
System.out.println(matcher.group(2)); //prints 1 (matched from last array symbols)
assertEquals("0", matcher.group(2)); //expected but its 1 not zero
assertEquals("45", matcher.group(5)); //only 2 capture groups exist, the whole string and the 1 from the last array brackets
}
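In order to capture each number, you need to change your regex so that it (a) captures a single number and (b) is not anchored to, and therefore limited by, any other part of the string ("^\w*?" anchors it to the start of the string). A repeated capturing group such as (\[(\d+)\])+ only keeps the text from its last repetition, which is why group 2 ends up holding "1". Then you can loop through the matches with find():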
Matcher mtchr = Pattern.compile("\\[(\\d+)\\]").matcher(arrayAsStr);
while(mtchr.find()) {
System.out.print(mtchr.group(1) + " ");
}
Output:
0 9 4 45 1
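To compare the two approaches side by side, here is a self-contained sketch. The class and method names are illustrative. The first method reproduces the question's repeated group and returns only the last captured number. The second collects one number per find() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketNumbers {
    // The question's pattern: group 2 is overwritten on every repetition of (\[(\d+)\]).
    static String lastCapture(String s) {
        Matcher m = Pattern.compile("^\\w*?(\\[(\\d+)\\])+?").matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    // One match per bracketed number, so each find() yields a fresh group 1.
    static List<String> allNumbers(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\[(\\d+)\\]").matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "StringWithMultiArrayAccess[0][9][4][45][1]";
        System.out.println(lastCapture(s)); // 1
        System.out.println(allNumbers(s));  // [0, 9, 4, 45, 1]
    }
}
```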
I have an input string in the following format
String input = "00IG356001110002005064007000000";
Characters 3-7 are the code.
Characters 8-12 are the amount.
Based on the code in the input string (IG356 in the sample input string), I need to capture the amount (00111 in the sample).
The value in the amount (characters 8-12) should be picked up only for specific codes and the logic is detailed below.
The code must not be SG356. If it is SG356, it is not a match and we exit.
a. If the code is not SG356, check whether it is IG902 or SG350; in that case, capture the amount (00111).
else
b. Check the 3 digits in the code (characters 5-7, 356 in this sample). If they are 200, 201, 356 or 370, go ahead and capture the amount.
I am using the regular expression shown below, which uses a positive lookahead and an if-then-else construct:
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
The regular expression works fine if the code in the input string is IG902 or SG350 (when the 'if' part of the regex matches), but if the 'else' part matches, I am unable to capture the amount.
This regular expression works fine when I am just checking for a match:
.{2}(?!SG356)((?=IG902|SG350).+|.{2}(?=200|201|356|370).+)
The problem only appears when capturing the group.
I am running this in Java. Any help would be greatly appreciated.
The Java code I am using is:
public String getTsqlSum(String input, String regex){
String value = null;
Matcher m = Pattern.compile(regex).matcher(input);
System.out.println("Group Count: " + m.groupCount());
if (m.matches()) {
for (int i=0;i<m.groupCount();i++){
System.out.println("For i: " + i +" Value: " + m.group(i));
}
}
return value;
}
public void forumTest(){
//String input = "00IG902001110002005064007000000";
String input = "00IG356001110002005064007000000";
String regex= ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+";
System.out.println(match(input, regex));
String match = getTsqlSum(input, regex);
System.out.println("Match: " + match);
}
You are able to capture the amount; the expression works fine. But if the match goes through the second part of the alternation (this is not a regex if-then-else), the result lands in a different capturing group: group 3 instead of group 2, which is where it ends up when the first part of the alternation matches.
String regex= ".{2}(?!SG356)((?=IG902|SG350).{5}(.{5}).+|.{2}(?=200|201|356|370).{3}(.{5}).+)";
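Capturing groups: 1 is the outer ( right after (?!SG356), 2 is the (.{5}) in the first alternative, and 3 is the (.{5}) in the second alternative.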
In a regular expression, capturing groups are numbered by their opening parentheses, and the numbering simply continues across the branches of an alternation. Perl has a branch-reset construct, (?|...), that gives the groups in each alternative the same numbers (PCRE supports it as well), but Java does not.
In Java, after a successful match you need to check which group holds the result.
Alternatively, you can change your regex so that the alternation comes before the capturing group. Try this:
.{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+
In both cases the result is in group 1. (I made the group containing the alternation non-capturing using ?:.)
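Here is a quick check of the restructured pattern against both branches and the excluded code. This is a sketch. The class name, the amount helper and the extra SG356/IG999 sample inputs are illustrative and were not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmountExtractor {
    // The alternation sits inside a non-capturing group, so the amount is always group 1.
    static final Pattern P = Pattern.compile(
            ".{2}(?!SG356)(?:(?=IG902|SG350).{5}|.{2}(?=200|201|356|370).{3})(.{5}).+");

    // Returns the amount (characters 8-12), or null if the code is not accepted.
    static String amount(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(amount("00IG356001110002005064007000000")); // 00111 (second branch)
        System.out.println(amount("00IG902001110002005064007000000")); // 00111 (first branch)
        System.out.println(amount("00SG356001110002005064007000000")); // null (excluded code)
    }
}
```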
Update after the source was added
Your loop is wrong: capturing groups are numbered from 1, so to get the content of group one you have to use m.group(1). m.group(0) returns the whole matched string, and the loop bound has to be i <= m.groupCount() so that the last group is included.
Try this
for (int i=1;i<=m.groupCount();i++){
System.out.println("For i: " + i +" Value: " + m.group(i));
}