Find and replace characters in brackets - java

I have a string like this:
String text = "(plum) some other words, [apple], another words {pear}.";
I have to find and replace the words in brackets without replacing the brackets themselves.
If I write:
text = text.replaceAll("(\\(|\\[|\\{).*?(\\)|\\]|\\})", "fruit");
I get:
fruit some other words, fruit, another words fruit.
So the brackets went away with the fruits, but I need to keep them.
Desired output:
(fruit) some other words, [fruit], another words {fruit}.

Here is your regex:
(?<=[({\[\(])[A-Za-z]*(?=[}\]\)])
Test it here:
https://regex101.com/
To use it in Java, remember to double the backslashes:
(?<=[({\\[\\(])[A-Za-z]*(?=[}\\]\\)])
It matches 0 or more letters (uppercase or lowercase) preceded by any of [, {, ( and followed by any of ], }, ).
If you want to have at least 1 letter between brackets just replace '*' with '+' like this:
(?<=[({\[\(])[A-Za-z]+(?=[}\]\)])

GCP showed how to use lookaheads and lookbehinds to exclude the brackets from the matched part. But you can also match the brackets and refer to them in your replacement string with capturing groups:
text.replaceAll("([\\(\\[\\{]).*?([\\)\\]\\}])", "$1fruit$2");
Also note that you can replace the | alternations with a character class [].
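To make the two approaches above easier to compare, here is a minimal sketch that runs both on the string from the question. The class and method names are made up for illustration, and the lookaround pattern is written with + (at least one letter) and without the duplicated ( in the character class.

```java
class BracketReplaceDemo {
    // Lookaround version: only the letters are matched, so the brackets are never consumed.
    static String withLookarounds(String text) {
        return text.replaceAll("(?<=[(\\[{])[A-Za-z]+(?=[)\\]}])", "fruit");
    }

    // Capturing-group version: the brackets are consumed, then written back via $1 and $2.
    static String withGroups(String text) {
        return text.replaceAll("([(\\[{]).*?([)\\]}])", "$1fruit$2");
    }

    public static void main(String[] args) {
        String text = "(plum) some other words, [apple], another words {pear}.";
        // Both calls print: (fruit) some other words, [fruit], another words {fruit}.
        System.out.println(withLookarounds(text));
        System.out.println(withGroups(text));
    }
}
```

Note that neither pattern checks that the closing bracket is of the same type as the opening one, and that the lookaround version only matches runs of letters, so something like (plum pie) would be left alone by it but replaced by the capturing-group version.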

Related

Regex to remove initials from full name

I have names like "D John Livingston" and "S. Jennifer Adstan", and I want only the initials to be removed from the names: "D" in the first name and "S." in the second name. How can I do it using Java regex?
The following code snippet seems to be working well:
String input = "John O'Connel";
input = input.replaceAll("\\b[A-Z]+(?:\\.|\\s+|$)", "").trim();
System.out.println(input);
John O'Connel
Your question is chock full of edge cases, since an initial could be, for example, more than one letter, and could appear at the start, middle, or end of the name. I replaced using the pattern \b[A-Z]+(?:\.|\s+|$), which seems to at least cover your examples. I also call String#trim() to clean up whitespace left behind by initials at the very beginning or end.
For this I would consider using String replaceAll().
So how do we design the regex?
Basically there are three cases you need to consider:
A. a single letter at the beginning of the name (optional period), followed by one space
B. a single letter at the end of the name (optional period), preceded by one space
C. a single letter in the middle of the name (optional period), surrounded by two spaces
For the first two cases, you need to leave no spaces. So you would match one space and replace it with zero spaces.
For the last case, you need to leave one space. However, rather than handling this case explicitly, you may treat it as either A or B, since those will replace only one of the two spaces, leaving you with the desired number of spaces: 1.
So how do we combine case A and case B together? With the alternation operator |.
To prevent grabbing a single letter from a larger chain of letters, you can use the word boundary marker \b on the side which is not demarcated by a space character. (Normally for cases A and B, I would have used ^ and $ to explicitly match the beginning and end of the string for this purpose. However, since we also need to handle case C in the middle of the string, we should use the word boundary marker instead.)
And how do we represent the optional period? Since the period is a special character, it must be escaped: \. It is then marked as optional with a question mark: \.? However, there is still a problem: the A. in the middle of a name might be matched as just A, because the position between a letter and a period also counts as a word boundary, so the regex can backtrack and give the period up. To prevent this, we make the optional period possessive: \.?+
Putting all of this together, our regex would be: (\b[A-Z]\.?+ )|( [A-Z]\.?+\b)
However, in the final Java string the backslash must itself be escaped, so each \ appears as \\
Example code:
String pattern = "(\\b[A-Z]\\.?+ )|( [A-Z]\\.?+\\b)";
String input1 = "MC Hammer I Smash U";
String input2 = "S. Jennifer A. Adstan JR.";
System.out.println(input1.replaceAll(pattern, ""));
System.out.println(input2.replaceAll(pattern, ""));
Output:
MC Hammer Smash
Jennifer Adstan JR.
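As a quick check, the sketch below applies the same pattern to the two names from the question, and also runs a variant with a plain greedy \.? to show what the possessive quantifier prevents. The class name, constant names and the GREEDY variant are illustrative additions, not part of the answer above.

```java
class InitialsDemo {
    // Pattern from the answer above, with the possessive optional period (\.?+).
    static final String POSSESSIVE = "(\\b[A-Z]\\.?+ )|( [A-Z]\\.?+\\b)";
    // Same pattern with a plain greedy optional period, for comparison only.
    static final String GREEDY = "(\\b[A-Z]\\.? )|( [A-Z]\\.?\\b)";

    static String strip(String name, String pattern) {
        return name.replaceAll(pattern, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("D John Livingston", POSSESSIVE));   // John Livingston
        System.out.println(strip("S. Jennifer Adstan", POSSESSIVE));  // Jennifer Adstan
        // Without the possessive quantifier the second alternative backtracks,
        // matches " A" and leaves the period behind:
        System.out.println(strip("S. Jennifer A. Adstan JR.", GREEDY)); // Jennifer. Adstan JR.
    }
}
```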

Replace white spaces only in part of the string

I have a String like
"This is apple tree"
I want to remove the white spaces up to the word apple. After the change it should look like
"Thisisapple tree"
I need to achieve this in single replace command combined with regular expressions.
For now it looks like you may be looking for
String s = "This is apple tree";
System.out.println(s.replaceAll("\\G(\\S+)(?<!(?<!\\S)apple)\\s", "$1"));
Output: Thisisapple tree.
Explanation:
\G represents either end of previous match or start of input (^) if there was no previous match yet (when we are attempting to find first match)
\S+ represents one or more non-whitespace characters (to match words, including non-alphabetic characters like ' or punctuation)
(?<!(?<!\S)apple)\s uses a negative lookbehind to reject a whitespace character that has apple right before it (the inner negative lookbehind before apple makes sure that apple is not preceded by a non-whitespace character, so it is not just the tail of some other word)
$1 in replacement represents match from group 1 (the one from (\S+)) which represents word. So we are replacing word and spaces with only word (effectively removing spaces)
WARNING: This solution assumes that
sentence doesn't start with space,
words can be separated with only one space.
If we want to get rid of these assumptions we would need something like:
System.out.println(s.replaceAll("^\\s+|\\G(\\S+)(?<!(?<!\\S)apple)\\s+", "$1"));
^\s+ will allow us to match spaces at beginning of string (and replace them with content of group 1 (word) which in this case will be empty, so we will simply remove these whitespaces)
\s+ at the end allows us to match word and one or more spaces after it (to remove them)
A single replace() is unlikely to solve your problem. You could do something like this:
String[] s = "This is an apple tree, not an orange tree".split("apple", 2); // limit 2: split only at the first "apple"
System.out.println(new StringBuilder(s[0].replace(" ","")).append("apple").append(s[1]));
This can be achieved with a lookahead assertion, like this:
String str = "This is an apple tree";
System.out.println(str.replaceAll(" (?=.*apple)", ""));
It means: remove every space that is followed, anywhere later in the string, by the word apple.
If you want to use a regular expression you could try:
Matcher matcher = Pattern.compile("^(.*?\\bapple\\b)(.*)$").matcher("This is an apple but this apple is an orange");
System.out.println((!matcher.matches()) ? "No match" : matcher.group(1).replaceAll(" ", "") + matcher.group(2));
This checks that "apple" is an individual word and not just part of another word such as "snapple". It also splits at the first use of "apple".
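The answers above behave differently when "apple" occurs more than once. The following sketch, with made-up class and method names, runs the lookahead pattern and the \G pattern on the same two-apple input so the difference is visible.

```java
class AppleSpacesDemo {
    // Lookahead: removes every space that still has "apple" somewhere after it,
    // so it effectively works up to the LAST "apple" in the string.
    static String viaLookahead(String s) {
        return s.replaceAll(" (?=.*apple)", "");
    }

    // \G chain: consumes word + space pairs from the start and stops
    // at the FIRST standalone "apple".
    static String viaChain(String s) {
        return s.replaceAll("\\G(\\S+)(?<!(?<!\\S)apple)\\s", "$1");
    }

    public static void main(String[] args) {
        String s = "This is an apple but this apple is an orange";
        System.out.println(viaLookahead(s)); // Thisisanapplebutthisapple is an orange
        System.out.println(viaChain(s));     // Thisisanapple but this apple is an orange
    }
}
```

Which behaviour is wanted depends on the requirement; the question only shows an input with a single "apple", where both give "Thisisapple tree".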

Split on non arabic characters

I have a String like this
أصبح::ينال::أخذ::حصل (على)::أحضر
And I want to split it on non Arabic characters using java
And here's my code
String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
String[] arr = s.split("^\\p{InArabic}+");
System.out.println(Arrays.toString(arr));
And the output was
[, ::ينال::أخذ::حصل (على)::أحضر]
But I expect the output to be
[ينال,أخذ,حصل,على,أحضر]
So I don't know what's wrong with this?
You need a negated class, and to do that, you need square brackets [ ... ]. Try to split with this:
"[^\\p{InArabic}]+"
If \\p{InArabic} matches any arabic character, then [^\\p{InArabic}] will match any non-arabic character.
Another option you can consider is an equivalent syntax, using P instead of p to indicate the opposite of the \\p{InArabic} character class, as @Pshemo mentioned:
"\\P{InArabic}+"
This works the same way that \\W is the opposite of \\w.
The only possible advantage of the first syntax over the second (again, as @Pshemo mentioned) is flexibility: if you want to add other characters to the set that should not be matched, for example to match every non-\\p{InArabic} character except periods, the first form extends easily:
"[^\\p{InArabic}.]+"
                ^
Otherwise, if you really want to use \\P{InArabic}, you'll need subtraction within classes:
"[\\P{InArabic}&&[^.]]+"
The expression you want is "\\P{InArabic}+"
This means match any (non-zero) number of characters that are not Arabic.
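A minimal sketch of the split applied to the string from the question follows; the class and method names are illustrative. Note that the result also contains the first word, so it has six elements rather than the five listed in the question's expected output.

```java
import java.util.Arrays;

class ArabicSplitDemo {
    static String[] splitOnNonArabic(String s) {
        // \P{InArabic} (capital P) matches any character outside the Arabic Unicode block.
        return s.split("\\P{InArabic}+");
    }

    public static void main(String[] args) {
        String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
        String[] parts = splitOnNonArabic(s);
        System.out.println(parts.length);            // 6
        System.out.println(Arrays.toString(parts));
        // The negated-class spelling splits at exactly the same places.
        System.out.println(Arrays.equals(parts, s.split("[^\\p{InArabic}]+"))); // true
    }
}
```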

Regex for text in brackets which contains no specific words

I found some expressions like
\((.*?)\)
which works well to find any text in brackets, together with the brackets, like (text)
and
^((?!.*(word1|word2|word3)).*)+
to find text which contains none of the specified words, like word4abcd, and not, for example, word1 test.
How can I merge them to find text in brackets which does not contain these words, like (example), and not like (example word2)?
Thanks in advance
The first regular expression uses a reluctant quantifier *? to make sure that the first available closing bracket is matched after the opening bracket. The second regular expression uses a zero-width negative look-ahead group (that's the (?!...) construction) to prevent matching certain words. To combine these tricks we're looking at something like this:
\(((?!something).*?)\)
The question is what goes in the place of the something. Simply putting .*(word1|word2) there will not work: this will also forbid word1 or word2 outside the brackets. Replacing .* by .*? does not change that. What does work is [^)]*(word1|word2) which will match any sequence of characters unequal to ) followed by word1 or word2.
The resultant expression then is
\(((?![^)]*(word1|word2)).*?)\)
which will match a bracketed expression that does not contain word1 or word2.
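Here is a small sketch of how the combined expression could be used from Java to collect the contents of every bracketed group that passes the filter. The class name, method name and sample text are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketFilterDemo {
    // The combined expression from above, with backslashes doubled for a Java string.
    private static final Pattern ALLOWED =
            Pattern.compile("\\(((?![^)]*(word1|word2)).*?)\\)");

    // Returns the text inside each bracket pair that contains neither word1 nor word2.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = ALLOWED.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("(example) and (example word2) and (word4abcd)"));
        // prints [example, word4abcd]
    }
}
```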
Question isn't very clear but I am giving an example that might help you.
Look at this regex:
/\((?!.*?\bexample\s+word\b[^)]*\)).*?\bexample\b.*?\)/
This matches text in parentheses that contains the word example. The match fails if example is directly followed by word (as in example word) before a closing parenthesis.
\b has been used for word boundaries so that examples is not matched instead.
Maybe this is what you want?
^\(((?!.*word1)(\S)*)|(\3)\)
Match: (wordwordexample)
Not Match: (word wordexample)
(word1wordexample)

Regex excluding square brackets

I am new to regex. I have this regex:
\[(.*[^(\]|\[)].*)\]
Basically it should take this:
[[a][b][[c]]]
And be able to replace with:
[dd[d]]
a, b, c and d are unrelated. Needless to say, the regex isn't working: it replaces the entire string with "d" in this case.
Any explanation or aid would be great!
EDIT:
I tried another regex,
\[([^\]]{0})\]
This one worked for the case where brackets contain no inner brackets and nothing else inside. But it doesn't work for the described case.
You need to know that the dot . is a special character which represents "any character except a line terminator", and that * is greedy, so it will try to find the longest possible match.
In your regex \[(.*[^(\]|\[)].*)\] the first .* grabs the longest possible run of characters after the opening [ that still leaves room for the rest of the pattern, [^(\]|\[)].*\], which can be read as: one character that is not [ or ] (nor (, ) or |, since those are literal inside a character class), optional other characters (.*), and finally ]. So this regex will match your entire input.
To get rid of that problem remove both .* from your regex. Also you don't need to use | or ( ) inside [^...].
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^\\]\\[]\\]", "d"));
Output: [dd[d]]
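To see the effect of the greedy .* side by side with the fix, the following sketch runs both patterns on the input from the question. The class and method names are invented for the example.

```java
class InnerBracketsDemo {
    // Original pattern from the question: the greedy .* swallows the whole input.
    static String greedy(String s) {
        return s.replaceAll("\\[(.*[^(\\]|\\[)].*)\\]", "d");
    }

    // Fixed pattern: exactly one character that is neither [ nor ] between brackets.
    static String fixed(String s) {
        return s.replaceAll("\\[[^\\]\\[]\\]", "d");
    }

    public static void main(String[] args) {
        String input = "[[a][b][[c]]]";
        System.out.println(greedy(input)); // d
        System.out.println(fixed(input));  // [dd[d]]
    }
}
```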
\[(\[a\])(\[b\])\[(\[c\])\]\]
If you need to double the backslashes in the current context (for example, when placing the regex in a ""-style string literal):
\\[(\\[a\\])(\\[b\\])\\[(\\[c\\])\\]\\]
An example replacement for a, b and c is [^\]]*, or if you need to escape backslashes [^\\]]*.
Now you can replace capture one, capture two and capture three each with d.
If the string you are replacing in is not exactly of that format, then you want to do a global replacement with
(\[a\])
which, after substituting the pattern for a, becomes
(\[[^\]]*\])
or, with doubled backslashes,
(\\[[^\\]]*\\])
Try this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]]", "d"));
If a, b and c are, in the real world, more than one character long, use this:
System.out.println("[[a][b][[c]]]".replaceAll("\\[[^]\\[]++]", "d"));
The idea is to use a character class that contains all characters but [ and ]. The class is: [^]\\[] and other square brackets in the pattern are literals.
Note that a literal closing square bracket doesn't need to be escaped when it is in the first position in a character class or outside a character class.
