Remove repeating set of characters in a string - java

I want to remove the sequesnce "-~-~-" if it repeats in a string, but only if they are together.
I have tried to create a regex based on the removing of multiple white spaces regex:
test.replaceAll("\\s+", " ");
Unfortunately I was unsuccessful. Can someone please help me write the correct regex? thanks.
Example:
string test = "hello-~-~--~-~--~-~-"
output:
hello-~-~-
Another example
string test = "-~-~--~-~--~-~-hello-~-~--~-~--~-~-"
output:
-~-~-hello-~-~-

The regex is:
test.replaceAll("(-~-~-){2,}", "-~-~-")
replaceAll replaces all occurrences matched by the regex (the first parameter) with the second parameter.
the () groups the expression -~-~- together, {2,} means two or more occurrences.
EDIT
Like #anubhava said, instead of using -~-~- for the replacement string, you could also use $1 which backreferences the first capturing group (i.e. the expression in the regex surrounded by ()).

test.replaceAll("(-~-~-)+", "-~-~-");

This is the regex you need:
(-~-~-){2}

Related

Java String Regex replacement

Sample Input:
a:b
a.in:b
asds.sdsd:b
a:b___a.sds:bc___ab:bd
Sample Output:
a:replaced
a.in:replaced
asds.sdsd:replaced
a:replaced___a.sds:replaced___ab:replaced
String which comes after : should be replaced with custom function.
I have done the same without Regex. I feel it can be replaced with regex as we are trying to extract string out of specific pattern.
For first three cases, it's simple enough to extract String after :, but I couldn't find a way to deal with third case, unless I split the string ___ and apply the approach for first type of pattern and again concatenate them.
Just replace only the letters with exists next to : with the string replaced.
string.replaceAll("(?<=:)[A-Za-z]+", "replaced");
DEMO
or
If you also want to deal with digits, then add \d inside the char class.
string.replaceAll("(?<=:)[A-Za-z\\d]+", "replaced");
(:)[a-zA-Z]+
You can simply do this with string.replaceAll.Replace by $1replaced.See demo.
https://regex101.com/r/fX3oF6/18

Java, how to replace a sequence of numbers in a string

I am trying to replace any sequence of numbers in a string with the number itself within brackets.
So the input:
"i ee44 a1 1222"
Should have as an output:
"i ee(44) a(1) (1222)"
I am trying to implement it using String.replace(a,b) but with no success.
"i ee44 a1 1222".replaceAll("\\d+", "($0)");
Try this and see if it works.
Since you need to work with regular expressions, you may consider using replaceAll instead of replace.
You should use replaceAll. This method uses two arguments
regex for substrings we want to find
replacement for what should be used to replace matched substring.
In replacement part you can use groups matched by regex via $x where x is group index. For example
"ab cdef".replaceAll("[a-z]([a-z])","-$1")
will produce new string with replaced every two lower case letters with - and second currently matched letter (notice that second letter is placed parenthesis so it means that it is in group 1 so I can use it in replacement part with $1) so result will be -b -d-f.
Now try to use this to solve your problem.
You can use String.replaceAll with regular expressions:
"i ee44 a1 1222".replaceAll("(\\d+)", "($1)");

What is this Java regex code doing?

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
String stripped = "";
stripped = toChar.replaceAll(ptn, "$1");
return stripped.trim();
}
Thanks in advance!
It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()
It seems to replace everything in "toChar" which matches the regular expression "ptn" with the first group to match in "toChar"
Regular expressions have a concept of groups, for example matching "year 2012" and replacing it with "year 1012", or "year 2006" with "year 1007" (changing the first 20 to 10) can be accomplished by replacing
"year 20([0-9][9-9])" with "year 20$1" -- That is, match the entire string, and then replace it "year 20" followed by the first group ($1). The group is the first thing in parenthesis.
Anyway, your method then replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][9-9]"); You would receive back only "12" because the entire text would match and be replaced by only the first group.
It then trips any leading or trailing whitespace.
The pattern string that is passed as argument method seems to contain a matching group and the call to replace all is going to replace the entire match to the paatern with the portion that matched the first group. You should look for the call hierarchy of this method to find some regexes passed to the method along with the strings that are being worked upon,
It's just replacing a string with its own subset of matched characters and then trimming the spaces from both end.
Fo example
So if you want a word to be replaced by a series of digits of that word
Use the regex \b.*?(\d*).*?\b
and then boom,your replaceAll method will give these results
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string i.e hey123wow,what666,how888 in this example
$1 refers to the group.i.e.(\d*) in this example i.e.123,666,888
$2 would refer to the second group which does not exist in this example.
toChar.replaceAll(ptn, "$1");
Its replacing all the occurences of ptn in toChar with the captured group $1 which we don't know where it is.
Capture groups are patterns inside brackets (): -
For E.G in the below Regex : -
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1);
For learning more about Regular Expression, you can refer to the following links: -
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/

Extract words (multiple whitespace) starting with # by regular expression

I have a problem with my regular expression:
String regex = "(?<=[\\s])#\\w+\\s";
I want a regex that formats a string like this:
"This is a Text #tag1 #tag2 #tag3"
With the regular expression, I get the last two values as result but not tag1 - because there is more than one whitespace. But i want all 3 of them!
I tried some variations, but nothing worked.
Use this regular expression:
(?<=(^|\\S)\\s)#\\w+(?=\\s|$)
Here's a demo.
It's a bit unclear from your question what you're really after, so I've put up some simple alternatives:
To capture all the tags in the string, we can use a lookbehind:
((?<=\\s|^)#\\w+)
To capture all the tags at the end of the string, we can use a lookahead:
(#\\w+(?=\\s#)|#\\w+$)
If there's always three tags at the end, there's no need for a lookaround:
(#\\w+)\s(#\\w+)\s(#\\w+)$

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?
Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");
Working link
IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is the result of ^(\d+)(1+1).* replacement, as you miss some backslash in the string (e.g. "^(\\d+)(1\+1).*").
Your regex looks fine to me - I don't have access to java but in JavaScript the code..
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.
Personally, I wouldn't use a Regex at all (it'd be like using a hammer on a thumbtack), I'd just create a substring from (Pseudocode)
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most Regex Syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. (Thus: ^(\d+?)(1+1) matches any number of digits until it finds "1+1" and then, NOT INCLUDING the "1+1" it continues matching, whereas your original would see the 1 and match it as well).

Categories

Resources