I am trying to write a regex which would match a (not necessarily repeating) sequence of text blocks, e.g.:
foo,bar,foo,bar
My initial thought was to use backreferences, something like
(foo|bar)(,\1)*
But it turns out that this regex only matches foo,foo or bar,bar but not foo,bar or bar,foo (and so on).
Is there any other way to refer to a part of a pattern?
In the real world, foo and bar are 50+ character long regexes and I simply want to avoid copy pasting them to define a sequence.
With a decent regex flavor you could use (foo|bar)(?:,(?-1))* or the like.
But Java does not support subpattern calls.
So you end up having a choice of doing String replace/format like in ajx's answer, or you could condition the comma if you know when it should be present and when not. For example:
(?:(?:foo|bar)(?:,(?!$|\s)|))+
Perhaps you could build your regex bit by bit in Java, as in:
String subRegex = "foo|bar";
String fullRegex = String.format("(%1$s)(,(%1$s))*", subRegex);
The second line could be factored out into a function. The function would take a subexpression and return a full regex that would match a comma-separated list of subexpressions.
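A minimal sketch of such a function, assuming the subexpression is passed in as a string. The class and method names are made up for illustration, and the groups are non-capturing so that an alternation like foo|bar stays contained:

```java
import java.util.regex.Pattern;

class ListRegex {
    // Hypothetical helper (not from the original answer): builds a regex that
    // matches one or more comma-separated occurrences of elementRegex.
    static String commaSeparatedList(String elementRegex) {
        return String.format("(?:%1$s)(?:,(?:%1$s))*", elementRegex);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(commaSeparatedList("foo|bar"));
        System.out.println(p.matcher("foo,bar,foo,bar").matches()); // true
        System.out.println(p.matcher("foo,baz").matches());         // false
    }
}
```

Because the 50+ character subexpressions are only written once, any later change to them is picked up by the whole list regex.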
The point of the back reference is to match the actual text that matches, not the pattern, so I'm not sure you could use that.
Can you use quantifiers like:
String s = "foo,bar,foo,bar";
String externalPattern = "(foo|bar)"; // comes from somewhere else
// Group the comma together with the repeated pattern so that every repetition needs its own comma
Pattern p = Pattern.compile(externalPattern + "(?:," + externalPattern + ")+");
Matcher m = p.matcher(s);
boolean b = m.find();
which would match two or more comma-separated instances of foo or bar.
I'm trying to compile one Java Regex pattern but have that pattern look for 3 different matches. I've learned that I can do that using the pipe (|) but I'm having trouble with the actual syntax of the regex.
I'm looking through XML data and trying to pull out 3 matches. The XML will look something like this:
<Element createdOn="1405358703367" updatedOn="1405358718804" url="http://www.someurl.com" />
The regex I'm trying looks like this so far:
((?<="url": ").*(?=")) | (createdOn="(\d)") | (updatedOn="(\d)")
In the end I need to get everything between the quotes in the XML (i.e. 1405358703367, 1405358718804, and http://www.someurl.com).
I had the URL regex working on its own earlier, but now no matches are being made.
Thanks.
Get the matched group from index 2.
(url|createdOn|updatedOn)="([^"]*)"
Here is sample code:
String string = "<Element createdOn=\"1405358703367\" updatedOn=\"1405358718804\" url=\"http://www.someurl.com\" />";
String patternString = "(url|createdOn|updatedOn)=\"([^\"]*)\"";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(2));
}
output:
1405358703367
1405358718804
http://www.someurl.com
Java doesn't have a library method that extracts all matches, but you only need one line:
String[] matches = input.replaceAll("^[^\"]*\"|\"[^\"]*$", "").split("\"[^\"]*\"");
This works by stripping off the leading input up to the first quote and the trailing input from the last quote, then splitting on quote-nonquote-quote sequences, which leaves the target matches as an array.
In action:
String input = "<Element createdOn=\"1405358703367\" updatedOn=\"1405358718804\" url=\"http://www.someurl.com\" />";
String[] matches = input.replaceAll("^[^\"]*\"|\"[^\"]*$", "").split("\"[^\"]*\"");
System.out.println(Arrays.toString(matches));
Output:
[1405358703367, 1405358718804, http://www.someurl.com]
The pipe (|) is used to find a match that could be some-pattern OR some-other-pattern OR yet-another-pattern. It's not good at finding all occurrences of several patterns. To do that, if the patterns you're looking for aren't necessarily in a fixed order, you'll need to use a loop.
Here's a code example that starts with the pattern you tried, fixes some problems, and uses a loop to find the patterns:
Pattern p = Pattern.compile("((?<=url=\").*(?=\"))|(createdOn=\"(\\d+)\")|(updatedOn=\"(\\d+)\")");
Matcher m = p.matcher(source);
while (m.find()) {
System.out.println("Found: "+m.group());
System.out.println("Group 1: "+m.group(1));
System.out.println("Group 3: "+m.group(3));
System.out.println("Group 5: "+m.group(5));
}
(Some problems with your original pattern: You put space characters before and after each |, which are treated literally and mean the pattern has to match spaces that aren't there. I added + after \\d because you want to match more than one digit. There were some mistakes, like putting : after url instead of =.)
Now the code uses a loop to find each successive pattern that matches one of the patterns you're looking for. It matches either url=... or createdOn=... or updatedOn=..., but by using a loop we will find all of them. (Note that it doesn't care if it sees a url or a createdOn attribute twice in the source. You'll have to check that yourself.)
The group() method with no parameters will return whatever was matched by the pattern. group(1), group(3), and group(5) return certain subsections of the pattern; the numbers are determined by counting wherever you use ( in the pattern except for (?. So group 1 matches something using url as a lookbehind; group 2 starts with createdOn; group 3 is the sequence of digits following createdOn; group 4 starts with updatedOn, etc. The way the pattern is set up, not all of these will have values, since only one of the three alternatives separated by | will match. The rest will be null. As a result, the output of the above code will display null for two of the groups, and a useful value for the other. If you do things this way, you'll need to test for null to see which value actually got returned.
This would also be a case where named capturing groups could be useful. See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html. Matcher has a group(name) function that takes a group name as a parameter.
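As a rough sketch of the named-group variant: the group names url, created and updated are my own choice, and I have swapped the lookbehind for a plain [^"]* capture to keep the pattern short, so treat this as one possible shape rather than the pattern from above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // One named group per alternative; only the group belonging to the
    // alternative that matched is non-null after each find().
    static final Pattern ATTR = Pattern.compile(
        "url=\"(?<url>[^\"]*)\"|createdOn=\"(?<created>\\d+)\"|updatedOn=\"(?<updated>\\d+)\"");

    // Reports which alternative matched, using group(name) instead of group numbers.
    static String describe(Matcher m) {
        if (m.group("url") != null) return "url=" + m.group("url");
        if (m.group("created") != null) return "created=" + m.group("created");
        return "updated=" + m.group("updated");
    }

    public static void main(String[] args) {
        String source = "<Element createdOn=\"1405358703367\" updatedOn=\"1405358718804\" url=\"http://www.someurl.com\" />";
        Matcher m = ATTR.matcher(source);
        while (m.find()) {
            System.out.println(describe(m));
        }
    }
}
```

The null checks are still needed, but the names make it harder to mix up which group number belongs to which attribute.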
This is one approach, but there are always multiple approaches to string parsing, and the other answers posted here are valid also. Plus there are already XML parsers to take care of things like this for you.
NOTE: This answer was meant to point out how | works. I don't recommend actually doing things this way, since it's overly complicated. If you're going to look separately for each attribute, it would be simpler just to set up three patterns and look for each one, one at a time. Or use Braj's suggestion in a loop, and perform a later check to make sure the createdOn and updatedOn values are numeric.
I want to replace a part of a string with another string. For example, a string comes in, and it contains meat, but I want it to change that meat to vegetable. I know I can do this with the following statement.
str = str.replaceAll("meat", "vegetable");
However, I need to have the exact capitalization. I'm looking for a way to do this no matter what letters are upper or lowercase in the example meat.
My application is a filter for Minecraft, used either to censor swear words or just to customize the menu. For example, I'd like to replace every instance of the word minecraft, no matter the capitalization, with Best game Ever!!.
I have modded it so that it can recognize and replace specific capitalizations, but that is very limiting.
I hope I have supplied enough info so that someone can help me with this.
You can make the regex case-insensitive by adding the (?i) flag at the start:
str = str.replaceAll("(?i)meat", "vegetable");
example
System.out.println("aBc def".replaceAll("(?i)abc","X")); // out: "X def"
The first argument to replaceAll is a regular expression, and you can embed a flag in the expression to make it case-insensitive:
str = str.replaceAll("(?i)meat", "vegetable");
Another way: Here the flag is not explicitly in the regex, but passed as a separate parameter.
String input = "abc MeAt def";
Pattern pattern = Pattern.compile("meat", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
String output = matcher.replaceAll("vegetable");
System.out.println(output);
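For a word filter, where the words and replacements are plain text rather than regexes, something along these lines might be safer. It is only a sketch: the class and method names are invented here, and it assumes the word should be matched literally, so it quotes both the word and the replacement to keep characters such as . or $ from being interpreted:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFilter {
    // Hypothetical helper: replaces every occurrence of 'word', ignoring case.
    // Pattern.quote treats the word literally; Matcher.quoteReplacement does
    // the same for '$' and '\' in the replacement text.
    static String replaceIgnoreCase(String text, String word, String replacement) {
        Pattern p = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceIgnoreCase("I love MineCraft and MINECRAFT", "minecraft", "Best game Ever!!"));
        // prints: I love Best game Ever!! and Best game Ever!!
    }
}
```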
I want to find all the negative floating point numbers in a string and store them in an array. I think my regex is correct, but something is wrong with my method.
Pattern pattern = Pattern.compile("[-]?[0-9]*[.][0-9]+$");
String[] results = pattern.split("|AAA--A A05_#A| |-999.999| |-55.7|");
Your regex anchors the match to the end of the string, which isn't what you want.
Likewise, Pattern.split doesn't do what you want. Here's some sample code to get you going:
Pattern pattern = Pattern.compile("[-]?[0-9]*[.][0-9]+");
String text = "|AAA--A A05_#A| |-999.999| |-55.7|";
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
This prints:
-999.999
-55.7
Obviously you could add to a list or something similar within the while loop. I don't know of any method which returns a collection of all the matches, but you could easily write a utility method to do that yourself, based on code like the above.
EDIT: As noted in comments, if you only want to find negative values, the - shouldn't be optional (and it doesn't need to be in a set, either - just -? would have been fine).
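A utility method along those lines could look like the following sketch. The names MatchCollector and findAll are made up for the example, and the pattern uses a mandatory - as suggested in the edit above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    // Collects every match of the pattern in the text, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern negativeFloat = Pattern.compile("-[0-9]*[.][0-9]+");
        String text = "|AAA--A A05_#A| |-999.999| |-55.7|";
        String[] results = findAll(negativeFloat, text).toArray(new String[0]);
        System.out.println(String.join(", ", results)); // -999.999, -55.7
    }
}
```

On Java 9 and later, Matcher.results() gives a stream of the matches, which may make a helper like this unnecessary.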
I'm a total beginner in Java.
In JavaScript I have this regex:
/[^0-9.,\-\ ]/gi
How can I do the same in Java?
Have a look at this: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
There's quite a lot you can do in Java with regex.
If you want to match repeatedly against that regex, you would do:
Pattern p = Pattern.compile("[^0-9.,\\- ]");
Matcher m = p.matcher(targetString);
Then call m.find() in a loop to get each match. The "i" in the JavaScript version is the case-insensitivity flag, which you don't actually need here because the class contains no letters; in Java it would be written as a leading (?i). Java has no direct equivalent of the "g" flag: calling find() repeatedly (or using replaceAll) applies the pattern to every occurrence in the target string, rather than trying to match the whole string.
Also, the pattern above matches only one character at a time. You may in fact want [^0-9.,\- ]*, which greedily matches zero or more such characters (or use + for one or more, which avoids empty matches). I would read the docs on the Pattern class if I were you.
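To make that concrete, here is a small sketch. It assumes the JavaScript regex was being used to find or strip characters that are not digits, dots, commas, hyphens or spaces; the class name and the sample input are invented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DisallowedChars {
    // Java version of /[^0-9.,\-\ ]/ ; the backslash is doubled because this is a Java string literal.
    static final Pattern DISALLOWED = Pattern.compile("[^0-9.,\\- ]");

    // replaceAll is global, like JavaScript's replace with the g flag.
    static String strip(String input) {
        return DISALLOWED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "abc 1,234.5-x";
        Matcher m = DISALLOWED.matcher(input);
        while (m.find()) { // each find() call moves on to the next match
            System.out.println("disallowed '" + m.group() + "' at index " + m.start());
        }
        System.out.println(strip(input));
    }
}
```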
I'm not very good at RegEx, can someone give me a regex (to use in Java) that will select all whitespace that isn't between two quotes? I am trying to remove all such whitespace from a string, so any solution to do so will work.
For example:
(this is a test "sentence for the regex")
should become
(thisisatest"sentence for the regex")
Here's a single regex-replace that works:
\s+(?=([^"]*"[^"]*")*[^"]*$)
which will replace:
(this is a test "sentence for the regex" foo bar)
with:
(thisisatest"sentence for the regex"foobar)
Note that if the quotes can be escaped, the even more verbose regex will do the trick:
\s+(?=((\\[\\"]|[^\\"])*"(\\[\\"]|[^\\"])*")*(\\[\\"]|[^\\"])*$)
which replaces the input:
(this is a test "sentence \"for the regex" foo bar)
with:
(thisisatest"sentence \"for the regex"foobar)
(note that it also works with escaped backslashes: (thisisatest"sentence \\\"for the regex"foobar))
Needless to say (?), this really shouldn't be used to perform such a task: it makes one's eyes bleed, and it performs its task in quadratic time, while a simple linear solution exists.
EDIT
A quick demo:
String text = "(this is a test \"sentence \\\"for the regex\" foo bar)";
String regex = "\\s+(?=((\\\\[\\\\\"]|[^\\\\\"])*\"(\\\\[\\\\\"]|[^\\\\\"])*\")*(\\\\[\\\\\"]|[^\\\\\"])*$)";
System.out.println(text.replaceAll(regex, ""));
// output: (thisisatest"sentence \"for the regex"foobar)
Here is the regex which works for both single & double quotes (assuming that all strings are delimited properly)
\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)
It won't work with strings that have quotes inside them.
This just isn't something regexes are good at. Search-and-replace functions with regexes are always a bit limited, and any sort of nesting/containment at all becomes difficult and/or impossible.
I'd suggest an alternate approach: Split your string on quote characters. Go through the resulting array of strings, and strip the spaces from every other substring (whether you start with the first or second depends on whether your string started with a quote or not). Then join them back together, using quotes as separators. That should produce the results you're looking for.
Hope that helps!
PS: Note that this won't handle nested strings, but since you can't make nested strings with the ASCII double-quote character, I'm gonna assume you don't need that behaviour.
PPS: Once you're dealing with your substrings, then it's a good time to use regexes to kill those spaces - no containing quotes to worry about. Just remember to make it a global replacement and not just the first match (the /.../g modifier in languages that have it; in Java, replaceAll is already global).
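The split-and-rejoin idea could be sketched in Java roughly like this. The class and method names are made up, and it assumes quotes are balanced and never escaped. Splitting with a limit of -1 keeps empty leading and trailing pieces, so the pieces at even indexes are always the ones outside quotes:

```java
class QuoteSplitter {
    static String stripOutsideQuotes(String input) {
        // -1 keeps trailing empty strings, so a quote at the very end is not lost
        String[] parts = input.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('"'); // put back the separator that split() removed
            }
            // Even indexes are outside quotes, odd indexes are inside
            out.append(i % 2 == 0 ? parts[i].replaceAll("\\s+", "") : parts[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripOutsideQuotes("(this is a test \"sentence for the regex\")"));
        // prints: (thisisatest"sentence for the regex")
    }
}
```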
Groups of whitespace outside of quotes are separated by stuff that's a) not whitespace, or b) inside quotes.
Perhaps something like:
(\s+)([^ "]+|"[^"]*")*
The first part matches a sequence of spaces; the second part matches non-spaces (and non-quotes), or some stuff in quotes, either repeated any number of times. The second part is the separator.
This will give you two groups for each item in the result; just ignore the second element. (We need the parentheses for precedence rather than match grouping there.) Or, you could, say, concatenate all the second elements -- though you need to match the first non-space word too, or, as in this example, make the spaces optional:
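StringBuilder b = new StringBuilder();
// Group 2 wraps the whole repetition so that every token is captured, not just the last one
Pattern p = Pattern.compile("(\\s+)?((?:[^\\s\"]+|\"[^\"]*\")*)");
Matcher m = p.matcher("this is \"a test\"");
while (m.find()) {
    b.append(m.group(2)); // group 1 (the whitespace) is simply dropped
}
System.out.println(b.toString());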
(I haven't done much regex in Java so expect bugs.)
Finally, this is how I'd do it if regexes were compulsory. ;-)
As well as Xavier's technique, you could simply do it the way you'd do it in C: just iterate over the input characters, and copy each to the new string if either it's non-space, or you've counted an odd number of quotes up to that point.
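A sketch of that loop in Java, for what it's worth. The names are mine, and it assumes quotes are never escaped; instead of counting quotes it flips a boolean on each one, which comes to the same thing as checking for an odd count:

```java
class QuoteAwareStripper {
    static String stripOutsideQuotes(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean insideQuotes = false; // true after an odd number of quotes
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                insideQuotes = !insideQuotes;
            }
            // Copy everything inside quotes, and only non-whitespace outside them
            if (insideQuotes || !Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripOutsideQuotes("(this is a test \"sentence for the regex\" foo bar)"));
        // prints: (thisisatest"sentence for the regex"foobar)
    }
}
```

This makes a single pass over the input, so it runs in linear time.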
If there is only one set of quotes, you can do this:
String s = "(this is a test \"sentence for the regex\") a b c";
Matcher matcher = Pattern.compile("^[^\"]+|[^\"]+$").matcher(s);
while (matcher.find())
{
String group = matcher.group();
s = s.replace(group, group.replaceAll("\\s", ""));
}
System.out.println(s); // (thisisatest"sentence for the regex")abc
This isn't an exact solution, but you can accomplish your goal by doing the following:
STEP 1: Match the two segments
\(([a-zA-Z ]*)"([a-zA-Z ]*)"\)
STEP 2: remove spaces
temp = $1 replace " " with ""
STEP 3: rebuild your string
(temp"$2")
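In Java the three steps might look something like this sketch. The class and method names are invented, and returning the input unchanged when it doesn't have the expected shape is my own assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoSegmentRewrite {
    // STEP 1: group 1 is the text before the quotes, group 2 the quoted text
    static final Pattern SEGMENTS = Pattern.compile("\\(([a-zA-Z ]*)\"([a-zA-Z ]*)\"\\)");

    static String rewrite(String input) {
        Matcher m = SEGMENTS.matcher(input);
        if (!m.matches()) {
            return input; // assumption: leave input of any other shape untouched
        }
        String temp = m.group(1).replace(" ", "");    // STEP 2: remove spaces
        return "(" + temp + "\"" + m.group(2) + "\")"; // STEP 3: rebuild the string
    }

    public static void main(String[] args) {
        System.out.println(rewrite("(this is a test \"sentence for the regex\")"));
        // prints: (thisisatest"sentence for the regex")
    }
}
```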