Java regex help

I can have one string in either of the following two formats:
"[HardCOdeText1 (HardCodeText2)].[HardCodeText3].[MatchString] between changeValue1 and changeValue2";
"[MatchString] between changeValue1 and changeValue2";
I would like to match if the string contains the "[MatchString] between" expression.
Depending on which format I match, the changed string should be one of the following:
"[HardCOdeText1 (HardCodeText2)].[HardCodeText3].[MatchString] between changed1 and changed2"; or
"[MatchString] between changed1 and changed2";
I started by trying to match the "[MatchString] between" expression and got stuck there.
Any help is appreciated.

[ and ] are reserved chars in regular expressions and you need to escape them.
Also, here is a really nice online regular expression tester that uses java regexp:
http://www.regexplanet.com/simple/index.html

Are you using "[MatchString] between" as your pattern without escaping the [] brackets? The characters [, ], ., (, and ) are all special characters in RegEx. If you want to refer to those as literal characters, you need to escape them in the pattern.
Your question is a little unclear. Maybe if you provided specific examples of a real input string you're using, and what you want the matches to look like?
I'd also check out regular-expressions.info for more information and tutorials, including info about and syntax for Java's implementation of RegEx.

This will match both kinds of line examples that you give.
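String inputLine;
String outputLine;
// regex1 matches the literal value changeValue1; regex2 matches any run of non-space characters.
String regex1 = "(\\[MatchString\\] between )changeValue1 and";
String regex2 = "(\\[MatchString\\] between )[^ ]+ and";
do {
    inputLine = readTheInput();   // placeholder for however you read a line
    // $1 re-inserts the captured "[MatchString] between " prefix
    outputLine = inputLine.replaceFirst(regex1, "$1changed1 and");
    writeTheOutput(outputLine);   // placeholder for however you write a line
} while (thereIsStillInput());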
The first regex1 looks specifically for changeValue1 while the second regex2 looks for anything following "between" and preceding "and".
This should get you started.
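As a rough sketch of how both values could be replaced in one pass, the following assumes that each value is a single token without whitespace. The class name BetweenRewriter and the method rewrite are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenRewriter {
    // Group 1 captures the literal "[MatchString] between " prefix.
    // Assumption: both values are whitespace-free tokens (\S+).
    private static final Pattern BETWEEN =
            Pattern.compile("(\\[MatchString\\] between )\\S+ and \\S+");

    // Returns the line with both values replaced, or the line unchanged
    // if it has no "[MatchString] between" expression.
    static String rewrite(String line, String first, String second) {
        Matcher m = BETWEEN.matcher(line);
        if (!m.find()) {
            return line;
        }
        // Keep everything up to the end of group 1, then append the new values
        // and whatever followed the matched expression.
        return line.substring(0, m.end(1)) + first + " and " + second
                + line.substring(m.end());
    }

    public static void main(String[] args) {
        String longForm = "[HardCOdeText1 (HardCodeText2)].[HardCodeText3]."
                + "[MatchString] between changeValue1 and changeValue2";
        String shortForm = "[MatchString] between changeValue1 and changeValue2";
        System.out.println(rewrite(longForm, "changed1", "changed2"));
        System.out.println(rewrite(shortForm, "changed1", "changed2"));
    }
}
```

Because the prefix before "[MatchString]" is never part of the match, the same pattern handles both input formats.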

Related

Give look behind the priority over the actual regular expression

I am looking for a regular expression that can strip all 'a' characters from the beginning of an input word (consisting only of letters of the English alphabet).
How would I do this using a regular expression?
The following look behind based regex fails to do the job:
(?<=a*?)(\w)+
as for input abc the above regular expression would return abc.
Is there a clean way to do this using lookbehinds?
A (brute force-ish) regular expression that does work is using negation:
(?<=a*)([[^a]&&\w])*
which returns the correct answer of bc for an input word abc.
But I was wondering if there could be a more elegant regular expression, say, using the correct quantifier?
Pattern removeWords = Pattern.compile("\\b(?:a)\\b\\s*", Pattern.CASE_INSENSITIVE);
Matcher fix = removeWords.matcher(YourWord);
String fixedString = fix.replaceAll("");
This removes every standalone word a (together with any trailing whitespace) from the string. If you want to remove other letters as well, you can do it this way:
Pattern removeWords = Pattern.compile("\\b(?:a|b|c)\\b\\s*", Pattern.CASE_INSENSITIVE);
I think that a regex for this problem is overkill.
You could instead do:
str = str.startsWith("a") ? str.substring(1) : str;
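The one-liner above drops only a single leading a. If all leading a characters should go, one possible sketch without lookbehind is to anchor a+ at the start of the word. The class and method names here are hypothetical.

```java
class LeadingAStripper {
    // ^a+ matches one or more 'a' characters only at the very start of the input.
    static String stripLeadingA(String word) {
        return word.replaceFirst("^a+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingA("abc"));    // prints "bc"
        System.out.println(stripLeadingA("aaabcd")); // prints "bcd"
        System.out.println(stripLeadingA("bca"));    // prints "bca" (no leading a)
    }
}
```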
Try with:
(?i)\\ba?(\\w+)\\b
and replace a word with captured group 1.
Code example:
String word = "aWord Another";
word = word.replaceAll("(?i)\\ba?(\\w+)\\b", "$1");
System.out.println(word);
with output:
Word nother
There are much simpler ways to do this, but as you insist on using lookbehinds, here is one. The regex is
(?<=\b)a+(\w*)
Regex Breakdown
(?<=\b) #Find all word boundaries
a+ #Match the character a literally at least once. We have already ensured using word boundary to find those a's only which are starting of word
(\w*) #Find remaining characters
Java Code
String str = "abc cdavbvhsza aaabcd";
System.out.println(str.replaceAll("(?<=\\b)a+(\\w*)", "$1"));

Regular expression for year with optional closing parenthesis

I am struggling to get the following regular expression (in Java) to work nicely. I want to see if a string has a year, and the strings can be
Mar 3, 2014
or sometimes with a closing parenthesis such as
Mar 3, 2014)
I am using
text.matches("\\b((19|20)\\d{2})(\\)?)\\b")
which works in most cases, but does not match if string ends at the parenthesis
If I use
text.matches("\\b((19|20)\\d{2})(\\)?)$")
it matches text that ends after the parenthesis but not a string that has another space
I thought that \b would include end of string, but cannot get it to work.
I know I can use two regex's but that seems really ugly.
Your main problem is that matches checks whether the entire string matches the regex. What you want is to test whether the string contains a substring that the regex can match. To do so, use
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(stringYouWantToTest);
if (m.find()){
//tested string contains part which can be matched by regex
}else{
//part which could be matched by regex couldn't be found
}
You can also surround your regex with .* so that it matches the characters around the part you want to find, and keep using matches as you do now,
if(yourString.matches(".*"+yourRegex+".*"))
but this has to iterate over the entire string.
In other words, you can try to find \\b(19|20)\\d{2}\\b using Pattern/Matcher, or use something like matches(".*\\b(19|20)\\d{2}\\b.*").
BTW, the parenthesis ) is not in the \w class, so \b treats the position between a \w character and ) as a word boundary; for instance, "9)" matches the regex \d\b\).
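To make the difference between matches and find concrete, here is a small sketch. The class name YearFinder and the method containsYear are invented for this example.

```java
import java.util.regex.Pattern;

class YearFinder {
    // A four-digit year starting with 19 or 20, delimited by word boundaries.
    private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

    // find() looks for the pattern anywhere in the text,
    // whereas String.matches() requires the whole text to match.
    static boolean containsYear(String text) {
        return YEAR.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsYear("Mar 3, 2014"));   // true
        System.out.println(containsYear("Mar 3, 2014)"));  // true
        System.out.println(containsYear("Mar 3, 2014) ")); // true
        // matches() fails because the regex does not cover the entire string
        System.out.println("Mar 3, 2014".matches("\\b(19|20)\\d{2}\\b")); // false
    }
}
```

No optional \)? is needed in the pattern: the closing parenthesis is a non-word character, so the trailing \b already succeeds in front of it.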
Your question isn't very clear, but from what I understand, this should work for you:
text.matches("((?:19|20)(?:\\d){2})\\)?");
Demo: http://regex101.com/r/lO0aH4/3
You could try something like:
".*(19|20)[0-9]{2}\\)?$"
I'm not sure this will help; it would be better to give us a complete example of a string to match. Must the string end with a year (with an optional parenthesis), or may something else come after it?

Eliminating spaces and words starting with particular chars from JAVA string

With the following code spaces between string words are eliminated:
String str1= "This is symbel for snow and silk. Grapes are very dear";
String str2=str1.replaceAll(" ","");
System.out.println(str2);
It gives this output:-
output:
Thisissymbelforsnowandsilk.Grapesareverydear
But I want to eliminate all the words in str1 starting with char 's' (symbel snow silk) and char 'd' (dear) to get the following output:-
output:
Thisisforand.Grapesarevery
How it can be achieved by amending this code?
The best solution is to use a Regular Expression also known as a Regex.
These are designed specifically for complex search and replace functionality in strings.
This one:
"([sd]\\w+)|\\s+"
matches a word group indicated by the parentheses () starting with 's' or 'd' followed by one or more "word" characters (\\w = any alpha numeric or underscore) OR one or more whitespace characters (\\s = whitespace). When used as an argument to the String replaceAll function like so:
s.replaceAll("([sd]\\w+)|\\s+", "");
every occurrence that matches either of these two patterns is replaced with the empty string.
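One caveat: the pattern above is not anchored to the start of a word, so an s or d in the middle of a word can also start a match. A possible variant, sketched here with a hypothetical WordFilter class, adds a \b word boundary in front of [sd].

```java
class WordFilter {
    // \b restricts [sd] to the first letter of a word; \s+ removes whitespace.
    static String strip(String s) {
        return s.replaceAll("\\b[sd]\\w+|\\s+", "");
    }

    public static void main(String[] args) {
        String str1 = "This is symbel for snow and silk. Grapes are very dear";
        System.out.println(strip(str1)); // prints "Thisisforand.Grapesarevery"

        // Without \b, the "sed" inside "based" would be removed too.
        System.out.println("based on".replaceAll("([sd]\\w+)|\\s+", "")); // prints "baon"
        System.out.println(strip("based on"));                            // prints "basedon"
    }
}
```

Both patterns are case-sensitive, so a word starting with an uppercase S or D is kept.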
There is comprehensive information on regexes in Oracle's java documentation here:
http://docs.oracle.com/javase/tutorial/essential/regex/
Although they seem cryptic at first, learning them can greatly simplify your code. Regexes are available in almost all modern languages so any knowledge you gain about regexes is useful and transferable.
Furthermore, the web is littered with handy sites where you can test your regexes out before committing them to code.
Do it like this:
String str1= "This is symbel for snow and silk. Grapes are very dear";
System.out.print(str1.replaceAll("[sd][a-z]+|[ ]+",""));
Try this:
s = s.replaceAll("([sd]\\w+)|\\s+", "");

Java String Split on any character (including regex special characters)

I'm sure I'm just overlooking something here...
Is there a simple way to split a String on an explicit character without applying RegEx rules?
For instance, I receive a string with a dynamic delimiter, I know the 5th character defines the delimiter.
String s = "This,is,a,sample";
For this, it's simple to do
String delimiter = String.valueOf(s.charAt(4));
String[] result = s.split(delimiter);
However, when I have a delimiter that's a special RegEx character, this doesn't work:
String s = "This*is*a*sample";
So... is there a way to split the string on an explicit character without trying to apply extra RegEx rules? I feel like I must be missing something pretty simple.
split takes a regular expression as its argument. * is a metacharacter meaning "zero or more of the preceding element", so on its own it is not a valid pattern. You can use Pattern#quote so that the character is not interpreted as regex syntax:
String[] result = s.split(Pattern.quote(delimiter));
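A small self-contained sketch of this approach, using the question's convention that the fifth character is the delimiter. The class name LiteralSplit and the method splitOnFifthChar are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Uses the 5th character (index 4) as the delimiter and quotes it,
    // so regex metacharacters such as * . | are taken literally.
    static String[] splitOnFifthChar(String s) {
        String delimiter = String.valueOf(s.charAt(4));
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnFifthChar("This,is,a,sample")));
        // prints [This, is, a, sample]
        System.out.println(Arrays.toString(splitOnFifthChar("This*is*a*sample")));
        // prints [This, is, a, sample]
    }
}
```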
You need not worry about which character it is if you quote it before compiling a Pattern:
Pattern regex = Pattern.compile(Pattern.quote(String.valueOf(s.charAt(4))));
Matcher matcher = regex.matcher(yourString);
if (matcher.find()) {
    // do something
}
You can run Pattern.quote on the delimiter before feeding it to String.split. This wraps the delimiter so that any regex-specific characters are treated literally:
delimiter = Pattern.quote(delimiter);
String[] result = s.split(delimiter);
Alternatively, StringUtils.split(s, delimiter) treats the delimiter as plain separator characters rather than as a regex, so it must be given the original, unquoted delimiter.
StringUtils is part of the Apache Commons Lang library, which has many useful methods. It is worth a look and could save you some time in the future.
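Simply put your delimiter between [ and ]:
String delimiter = "["+s.charAt(4)+"]";
String[] result = s.split(delimiter);
A character class [ ] matches any one of the characters listed inside it, and most metacharacters, including *, are literal there. You can also specify a list of delimiters, like [*,.+-]. Note that a few characters, such as \ and ^, keep a special meaning inside a character class, so this does not work for every possible delimiter.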

Java Regex replaceAll() with lookahead

I am fairly new to using regex with java. My motive is to escape all occurrences of '*' with a back slash.
This was the statement that I tried:
String replacementStr= str.replaceAll("(?=\\[*])", "\\\\");
This does not seem to work though. After some amount of tinkering, found out that this works though.
String replacementStr= str.replaceAll("(?=[]\\[*])", "\\\\");
Based on what I know of regular expressions, I thought '[]' represents an empty character class. Am I missing something here? Can someone please help me understand this?
Note: The motive of my trial was to learn to use the lookahead feature of regex. While the purpose stated in the question does not warrant the use of lookahead, am just trying to use it for educational purposes. Sorry for not making that clear!
Some metacharacters do not need to be escaped when they are placed inside brackets.
Alternatively, if you simply mean to escape * as \*, try the following:
String newStr = str.replace("*", "\\*");
EDIT: There is something curious in your regular expressions.
(?=\[*]) Look ahead for the character [ (0 or more times), followed by ]
(?=[]\[*]) Look ahead for one of the next characters: [, ], *
Perhaps the regex that you are looking for is the following:
(?=\*)
In Java, "(?=\\*)"
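As an illustration, the sketch below inserts a backslash in front of every * using that lookahead and compares the result with the plain String.replace version. The class and method names are hypothetical.

```java
class StarEscaper {
    // The lookahead (?=\*) matches the empty position in front of each '*',
    // so replaceAll inserts a backslash there without consuming the '*'.
    // In the replacement string, "\\\\" denotes one literal backslash.
    static String withLookahead(String s) {
        return s.replaceAll("(?=\\*)", "\\\\");
    }

    // The same result without any regex: replace is purely literal.
    static String withReplace(String s) {
        return s.replace("*", "\\*");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("abc*123*")); // prints abc\*123\*
        System.out.println(withReplace("abc*123*"));   // prints abc\*123\*
    }
}
```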
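Instead of replaceAll("(?=\\[*])", "\\\\"), you can simply use
String newStr = str.replace("*", "\\");
There is no need for a regex here.
For example,
String str = "abc*123*";
String newStr = str.replace("*", "\\");
System.out.println(newStr);
prints
abc\123\
Note that this replaces each * with a backslash; to keep the asterisk and put a backslash in front of it, use "\\*" as the replacement.
See the documentation for String.replace, which treats both of its arguments as literal text.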
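The following code also replaces each * with a backslash, either with replaceAll or with the commented-out replace call:
Code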
String strTest = "jhgfg*gfb*gfhh";
strTest = strTest.replaceAll("\\*", "\\\\"); // strTest = strTest.replace("*", "\\");
System.out.println("String is : "+strTest);
OUTPUT
String is : jhgfg\gfb\gfhh
If the regex engine finds [], it treats the ] as a literal ]. This is never a problem because an empty character class is useless anyway, and it means you can avoid some character escaping.
There are a few rules for characters you don't have to escape in character classes:
in [] (or [^]), the ] is literal
in [-.....] or [^-.....] or [.....-] or [^.....-], the - is literal
^ is literal unless it is at the start of the character class
So you'll never need to escape ], - or ^ if you don't want to.
This is down to the Perl origins of the regex syntax. It's a very Perl-style way of doing things.
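These rules can be checked with a few one-character matches. This is only a sketch against java.util.regex; the class name CharClassLiterals and the helper matchesOne are invented for the example.

```java
import java.util.regex.Pattern;

class CharClassLiterals {
    // True if the whole input matches the given regex.
    static boolean matchesOne(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // []\[*] : a leading ] is literal, so the class contains ], [ and *
        System.out.println(matchesOne("[]\\[*]", "]")); // true
        System.out.println(matchesOne("[]\\[*]", "*")); // true
        System.out.println(matchesOne("[]\\[*]", "a")); // false

        // [-a] : a leading - is literal
        System.out.println(matchesOne("[-a]", "-"));    // true

        // [a^] : ^ is literal when it is not the first character
        System.out.println(matchesOne("[a^]", "^"));    // true
    }
}
```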
