Regex Expressions in Java

This is my first time using regex, and I'm having difficulty validating a string against a regular expression of the form (x,y,z)(y,z)(x,w) etc. This is the pattern I have been trying to match the string against:
String expressionmatcher = "[\\([[wxyz][,]]*[wxyz]{1,1}\\)]*";
boolean checker = expression.matches(expressionmatcher);
if (checker == true) {
    System.out.println("Expression Valid");
}
else
{
    System.out.println("Expression is not valid");
}
Although my pattern is accepted, the matcher also accepts everything that is included in the string regardless of the sequence. For example if I input 'x' or a '(' or a ',', it is accepted as a valid expression.
What should I do to fix this? Thank you

That's because the square brackets surrounding the entire pattern define a character class, which means "any one of these characters", so a single character from anywhere inside the class is enough to match.
The easy fix is to replace the outer brackets with parentheses. Do the same for the brackets surrounding [wxyz][,]: if you replace only the outer pair, (,x) will still match. You probably also want to wrap the whole thing in another pair of parentheses followed by a +, so that the pattern matches only when there is at least one parenthesized group.
A few other improvements:
You don't need to put the escaped parentheses inside a pair of brackets
You don't need {1,1}; {1} means the same thing, and a single [wxyz] with no quantifier already matches exactly once
I'd recommend putting \\s* after the comma so you can have spaces (or any other whitespace) after the comma
This is likely not the most efficient regex possible, as I'm not too experienced with them. It works, though!
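Putting those fixes together, a minimal sketch of the corrected check might look like the following. The class and method names are made up for illustration, and the pattern assumes that only the letters w, x, y and z are allowed, as in the question.

```java
import java.util.regex.Pattern;

class TupleListValidator {
    // One group is "(" + zero or more "letter, " + a final letter + ")".
    // The trailing + requires at least one such group.
    private static final Pattern GROUPS =
            Pattern.compile("(\\(([wxyz],\\s*)*[wxyz]\\))+");

    static boolean isValid(String expression) {
        // matches() succeeds only if the whole string fits the pattern
        return GROUPS.matcher(expression).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("(x,y,z)(y,z)(x,w)")); // true
        System.out.println(isValid("(x, y)"));            // true
        System.out.println(isValid("x"));                 // false
        System.out.println(isValid("(,x)"));              // false
    }
}
```

Because the outer square brackets are gone, a lone 'x', '(' or ',' no longer counts as a valid expression.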

Related

Find and replace characters in brackets

I have a string kind of:
String text = "(plum) some other words, [apple], another words {pear}.";
I have to find and replace the words in brackets without replacing the brackets themselves.
If I write:
text = text.replaceAll("(\\(|\\[|\\{).*?(\\)|\\]|\\})", "fruit");
I get:
fruit some other words, fruit, another words fruit.
So the brackets went away with the fruits, but I need to keep them.
Desired output:
(fruit) some other words, [fruit], another words {fruit}.
Here is your regex:
(?<=[({\[\(])[A-Za-z]*(?=[}\]\)])
Test it here:
https://regex101.com/
To use it in Java, remember to double the backslashes:
(?<=[({\\[\\(])[A-Za-z]*(?=[}\\]\\)])
It matches 0 or more letters (uppercase or lowercase) preceded by any one of [, {, ( and followed by any one of ], }, ).
If you want at least 1 letter between the brackets, just replace '*' with '+', like this:
(?<=[({\[\(])[A-Za-z]+(?=[}\]\)])
GCP showed how to use lookaheads and lookbehinds to exclude the brackets from the matched part. But you can also match them and refer to them in your replacement string with capturing groups:
text = text.replaceAll("([\\(\\[\\{]).*?([\\)\\]\\}])", "$1fruit$2");
Also note that you can replace the | alternations with a character class [].
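For comparison, here is a small sketch that runs both approaches from the answers above on the sample string. The class and method names are hypothetical. The character classes are written without the redundant escapes, since inside a Java character class only [ and ] need a backslash here.

```java
class BracketContentReplacer {
    // Lookbehind/lookahead assert the brackets without consuming them,
    // so only the letters between them are replaced.
    static String withLookarounds(String text) {
        return text.replaceAll("(?<=[(\\[{])[A-Za-z]+(?=[)\\]}])", "fruit");
    }

    // Capturing groups consume the brackets and put them back via $1 and $2.
    static String withGroups(String text) {
        return text.replaceAll("([(\\[{]).*?([)\\]}])", "$1fruit$2");
    }

    public static void main(String[] args) {
        String text = "(plum) some other words, [apple], another words {pear}.";
        System.out.println(withLookarounds(text));
        System.out.println(withGroups(text));
        // Both print: (fruit) some other words, [fruit], another words {fruit}.
    }
}
```

Neither version checks that the closing bracket is the same kind as the opening one, so "(plum]" would also be rewritten.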

Java regular expression matching two consecutive consonants

I'm trying to match only strings with two consecutive consonants, but no matter what input I give to myString, this never evaluates to true, so I have to assume something is wrong with the syntax of my regex. Any ideas?
if (Pattern.matches("([^aeiou]&&[^AEIOU]){2}", myString)) {...}
Additional info:
myString is a substring of at most two characters
There is no whitespace, as this string is the output of a .split with a whitespace delimiter
I'm not worried about special characters, as the program just concatenates and prints the result, though if you'd like to show me how to include something like [b-z]&&[^eiou] in your answer I would appreciate it.
Edit:
After going through these answers and testing a little more, the code I finally used was
if (myString.matches("(?i)[b-z&&[^eiou]]{2}")) {...}
[^aeiou] matches non-letter characters as well, so you should use a different pattern:
Pattern rx = Pattern.compile("[bcdfghjklmnpqrstuvwxyz]{2}", Pattern.CASE_INSENSITIVE);
if (rx.matcher(myString).matches()) {...}
If you would like to use && for an intersection, you can do it like this:
"[a-z&&[^aeiou]]{2}"
To use character class intersection, you need to wrap your syntax inside of a bracketed expression. The below matches characters that are both lowercase letters and not vowels.
[a-z&&[^aeiou]]{2}
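As a rough illustration of the intersection syntax, the sketch below wraps the pattern from the question's edit in a helper. The class and method names are invented for the example. The original attempt failed because && only means intersection inside square brackets; outside a character class it is just two literal ampersands.

```java
import java.util.regex.Pattern;

class ConsonantPairCheck {
    // [b-z&&[^eiou]] is "b through z" intersected with "not e, i, o, u";
    // (?i) makes the match case-insensitive.
    private static final Pattern TWO_CONSONANTS =
            Pattern.compile("(?i)[b-z&&[^eiou]]{2}");

    static boolean isConsonantPair(String s) {
        return TWO_CONSONANTS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsonantPair("st")); // true
        System.out.println(isConsonantPair("ST")); // true
        System.out.println(isConsonantPair("sa")); // false
        System.out.println(isConsonantPair("s1")); // false
    }
}
```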

How to split a string in java using (,) with certain conditions [duplicate]

I would like to find a regex that will pick out all commas that fall outside quote sets.
For example:
'foo' => 'bar',
'foofoo' => 'bar,bar'
This would pick out the single comma on line 1, after 'bar',
I don't really care about single vs double quotes.
Has anyone got any thoughts? I feel like this should be possible with readaheads, but my regex fu is too weak.
This will match any string up to and including the first non-quoted ",". Is that what you are wanting?
/^([^"]|"[^"]*")*?(,)/
If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:
/(,)(?=(?:[^"]|"[^"]*")*$)/
which will match all of them. Thus
'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')
replaces all the commas not inside quotes with semicolons, and produces:
'test; a "comma,"; bob; ",sam,";here'
If you need it to work across line breaks just add the m (multiline) flag.
The regexes below match all the commas that are outside double quotes:
,(?=(?:[^"]*"[^"]*")*[^"]*$)
Or (PCRE only):
"[^"]*"(*SKIP)(*F)|,
"[^"]*" matches an entire double-quoted block. For the input buz,"bar,foo", this part of the regex matches only "bar,foo". The (*SKIP)(*F) that follows then forces that match to fail and stops the engine from retrying inside the skipped text. What is left is the alternative after the | symbol, so the bare , can only match commas outside quoted blocks, such as the one just after buz. The comma inside the double quotes is never matched, because the quoted part has already been skipped.
The regex below matches all the commas that are inside double quotes:
,(?!(?:[^"]*"[^"]*")*[^"]*$)
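Since the question is tagged for Java, here is a minimal sketch of the lookahead version used with String.split. The class and method names are hypothetical, and it assumes the input only uses double quotes and that every quote is closed.

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // Split on a comma only when an even number of double quotes follows it,
    // i.e. when the comma is not inside a quoted block.
    static String[] splitOutsideQuotes(String line) {
        // limit -1 keeps trailing empty fields
        return line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
    }

    public static void main(String[] args) {
        String line = "test, a \"comma,\", bob, \",sam,\",here";
        System.out.println(Arrays.toString(splitOutsideQuotes(line)));
        // [test,  a "comma,",  bob,  ",sam,", here]
    }
}
```

The lookahead rescans the rest of the line for every comma, so this is convenient for short lines rather than fast for long ones.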
While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes, and escaped quotes.
This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:
function splitNotStrings(str){
  var parse=[], inString=false, escape=0, end=0
  for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
    if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
    if(c===','){
      if(!inString){
        parse.push(str.slice(end, i))
        end=i+1
      }
    }
    else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
      if(c===inString) inString=false
      else if(!inString) inString=c
    }
    escape=0
  }
  // now we finished parsing, strings should be closed
  if(inString) throw SyntaxError('expected matching '+inString)
  if(end<i) parse.push(str.slice(end, i))
  return parse
}
splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here
Try this regular expression:
(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,
This does also allow strings like “'foo\'bar' => 'bar\\',”.
MarkusQ's answer worked great for me for about a year, until it didn't. I just got a stack overflow error on a line with about 120 commas and 3682 characters total. In Java, like this:
String[] cells = line.split("[\t,](?=(?:[^\"]|\"[^\"]*\")*$)", -1);
Here's my extremely inelegant replacement that doesn't stack overflow:
// Needs: import java.util.ArrayList; import java.util.List;
private String[] extractCellsFromLine(String line) {
    List<String> cellList = new ArrayList<String>();
    while (true) {
        String[] firstCellAndRest;
        if (line.startsWith("\"")) {
            // Quoted cell: split at the first delimiter that is outside quotes
            firstCellAndRest = line.split("([\t,])(?=(?:[^\"]|\"[^\"]*\")*$)", 2);
        }
        else {
            // Unquoted cell: a plain split on the first delimiter is enough
            firstCellAndRest = line.split("[\t,]", 2);
        }
        cellList.add(firstCellAndRest[0]);
        if (firstCellAndRest.length == 1) {
            break;
        }
        line = firstCellAndRest[1];
    }
    return cellList.toArray(new String[cellList.size()]);
}
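Another way to sidestep the stack overflow is to drop the regex and scan the line once, in the spirit of the character-by-character function shown earlier in this thread. The following is only a sketch under simplifying assumptions: the class and method names are made up, only double quotes are recognized, escaped quotes are not handled, and quotes are kept in the cell text.

```java
import java.util.ArrayList;
import java.util.List;

class CellScanner {
    // Splits on commas and tabs that are outside double-quoted sections.
    static List<String> splitCells(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;      // toggle state; the quote stays in the cell
                current.append(c);
            } else if ((c == ',' || c == '\t') && !inQuotes) {
                cells.add(current.toString());
                current.setLength(0);      // start the next cell
            } else {
                current.append(c);
            }
        }
        cells.add(current.toString());     // last cell, possibly empty
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(splitCells("a,\"b,c\"\td,"));
        // [a, "b,c", d, ]
    }
}
```

The loop uses constant stack space regardless of line length, which is the property the regex version lacked.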
@SocialCensus, the example you gave in the comment to MarkusQ, where you throw in ' alongside the ", doesn't work with the example MarkusQ gave right above it if we change sam to sam's: (test, a "comma,", bob, ",sam's,",here) has no match against (,)(?=(?:[^"']|["|'][^"']*")*$). In fact, the problem itself, "I don't really care about single vs double quotes", is ambiguous. You have to be clear about what you mean by quoting with either " or '. For example, is nesting allowed or not? If so, to how many levels? If only one nested level, what happens to a comma outside the inner quotation but inside the outer one? You should also consider that single quotes occur by themselves as apostrophes (as in the counter-example above with sam's). Finally, the regex you made doesn't really treat single quotes on par with double quotes, since it assumes the last quotation mark is necessarily a double quote. Replacing that last double quote with ['|"] also has a problem if the text doesn't come with correct quoting (or if apostrophes are used), though I suppose we could probably assume all quotes are correctly delimited.
MarkusQ's regexp answers the question: it finds all commas that have an even number of double quotes after them (i.e., are outside double quotes) and disregards all commas that have an odd number of double quotes after them (i.e., are inside double quotes). This is generally the solution you probably want, but let's look at a few anomalies. First, if someone leaves off a quotation mark at the end, then this regexp finds all the wrong commas rather than finding the desired ones or failing to match any. Of course, if a double quote is missing, all bets are off, since it might not be clear whether the missing one belongs at the end or at the beginning. However, there is a legitimate case where the regex could conceivably fail (this is the second "anomaly"). If you adjust the regexp to work across text lines, be aware that quoting multiple consecutive paragraphs requires a single double quote at the beginning of each paragraph and no closing quote at the end of any paragraph except the very last one. This means that over the span of those paragraphs, the regex will fail in some places and succeed in others.
Examples and brief discussions of paragraph quoting and of nested quoting can be found here http://en.wikipedia.org/wiki/Quotation_mark .

How to remove a specific special character pattern from a string

I have a string named s:
String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";
I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove tags,
s.replaceAll("[<NOUN>,</NOUN>]","");
Yes, it removes the tags, but it also removes the letters 'U' and 'O' from the string, which gives me the following output:
Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel
Can anyone please tell me how to do this correctly?
Try:
s.replaceAll("<NOUN>|</NOUN>", "");
In regex, the syntax [...] is a character class: it matches any single character listed inside the brackets, regardless of the order in which the characters appear. Therefore, in your example, every occurrence of "<", "N", "O" etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".
The following should also work (and could be considered more DRY and elegant) since it will match the tag both with and without the forward slash:
s.replaceAll("</?NOUN>", "");
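A short sketch of that last pattern in use follows; the class and method names are hypothetical. One detail the snippets above leave implicit: Java strings are immutable, so replaceAll returns a new string and the result has to be assigned or returned.

```java
class NounTagStripper {
    // "</?NOUN>" matches "<NOUN>" and "</NOUN>": the ? makes the slash optional.
    static String stripNounTags(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        s = stripNounTags(s); // reassign: the original string is not modified
        System.out.println(s); // Sam , a student of the University of oxford
    }
}
```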
String.replaceAll() takes a regular expression as its first argument. The regexp:
"[<NOUN>,</NOUN>]"
defines within the brackets the set of characters to be identified and thus removed. Thus you're asking to remove the characters <,>,/,N,O,U and comma.
Perhaps the simplest method to do what you want is to do:
s.replaceAll("<NOUN>","").replaceAll("</NOUN>","");
which is explicit in what it's removing. More complex regular expressions are obviously possible.
You can use one regular expression for this: "<[/]*NOUN>"
so
s.replaceAll("<[/]*NOUN>","");
should do the trick. The "[/]*" matches zero or more "/" after the "<".
Try this:
String result = originValue.replaceAll("\\<.*?>", "");
Note that this pattern removes every tag in the string, not just <NOUN> and </NOUN>.

Java Quotation Matching

I'm not sure if this is a regex question, but I need to be able to grab what's inside single quotes and surround it with something. For example:
this is a 'test' of 'something' and 'I' 'really' want 'to' 'solve this'
would turn into
this is a ('test') of ('something') and ('I') ('really') want ('to') ('solve this')
Any help you could provide would be great!
Thanks!
String str = "this is a 'test' of 'something'";
String rep = str.replaceAll("'[^']*'", "($0)"); // stand back, I know regex
What I did here is use the replaceAll() method, which searches for all matches of the regex "'[^']*'" and replaces them with the replacement string "($0)".
The pattern "'[^']*'" matches all substrings that start and end with a single quote ('), with any characters between them except another single quote ([^']), repeated any number of times (*). Replacing those with "($0)" means taking every match ($0) and wrapping it in parentheses.
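Applied to the full sentence from the question, the same replacement can be sketched like this (the class and method names are only for illustration):

```java
class QuoteWrapper {
    // $0 in the replacement string stands for the entire match,
    // including the surrounding single quotes.
    static String wrapQuoted(String s) {
        return s.replaceAll("'[^']*'", "($0)");
    }

    public static void main(String[] args) {
        String input = "this is a 'test' of 'something' and 'I' 'really' want 'to' 'solve this'";
        System.out.println(wrapQuoted(input));
        // this is a ('test') of ('something') and ('I') ('really') want ('to') ('solve this')
    }
}
```

A stray apostrophe, as in "don't", would pair up with the next quote and shift every later match, so this assumes quotes always come in pairs.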
One easy way (though not always valid) is the following.
If every opening apostrophe is preceded by a space and every closing apostrophe is followed by a space, you can do this:
myString = myString.replace(" '", " ('"); // replaces every <space_apostrophe> with <space_bracket_apostrophe>
Do the same for the closing bracket :)
One more thing: why do you have apostrophe-surrounded words in the first place? Do they have to be like that? If you produced them that way yourself, why do that and then look for another approach?
If you can ignore single apostrophes, you could do it like this (C# code, should be easy to translate):
string input = "this is a 'test' of 'something' and ...";
Console.WriteLine(Regex.Replace(input, "'([^']*)'", "('$1')"));
