Replacing first occurrence of two asterisks in a String in Java

In Java, I need to replace a double asterisk, only the first occurrence. How?
I want that:
the first "**" --> "<u>"
and the second "**" --> "<\u>"
Example:
String a = "John **Doe** is a bad boy"
should become:
String a = "John <u>Doe<\u> is a bad boy"
using something like:
a = a.replaceFirst("**","<u>").replaceFirst("**","<\u>")
How?

You need to escape the asterisks to avoid them being interpreted as part of a regular expression:
a = a.replaceFirst(Pattern.quote("**"), "<u>"); // needs import java.util.regex.Pattern;
Or:
a = a.replaceFirst("\\Q**\\E", "<u>");
Or:
a = a.replaceFirst("\\*\\*", "<u>");
To perform your translation you could do this:
a = a.replaceAll("\\*\\*(.*?)\\*\\*", "<u>$1</u>");
The advantage of a single replaceAll over a pair of replaceFirst calls is that replaceAll would work for strings containing multiple asterisked words, e.g. "John **Doe** is a **bad** boy".
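A minimal sketch comparing the two approaches on that multi-word example, assuming plain single-line input; the class and method names (BoldToUnderline, withReplaceFirst, withReplaceAll) are hypothetical:

```java
public class BoldToUnderline {
    // Two replaceFirst calls: only the first "**...**" pair gets converted.
    static String withReplaceFirst(String s) {
        return s.replaceFirst("\\*\\*", "<u>").replaceFirst("\\*\\*", "</u>");
    }

    // One replaceAll with a lazy capturing group: every "**...**" pair gets converted.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\*\\*(.*?)\\*\\*", "<u>$1</u>");
    }

    public static void main(String[] args) {
        String a = "John **Doe** is a **bad** boy";
        System.out.println(withReplaceFirst(a)); // John <u>Doe</u> is a **bad** boy
        System.out.println(withReplaceAll(a));   // John <u>Doe</u> is a <u>bad</u> boy
    }
}
```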
Essentially the matching expression means:
\\*\\* -- literal "**"
( -- start a capturing group
. -- match any character (except line terminators such as LF and CR)
* -- zero or more of them
? -- not greedily (i.e. find the shortest match possible)
) -- end the group
\\*\\* -- literal "**"
The replacement:
<u> -- literal <u>
$1 -- the contents of the captured group (i.e. text inside the asterisks)
</u> -- literal </u>
By the way, I've changed your end tag to </u> instead of <\u> :-)
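To see what the lazy ? changes, here is a small comparison with a greedy (.*) group on the two-word input; the class and method names are hypothetical:

```java
public class GreedyVsLazy {
    // Greedy: (.*) runs to the LAST "**" on the line, swallowing the markers in between.
    static String greedy(String s) {
        return s.replaceAll("\\*\\*(.*)\\*\\*", "<u>$1</u>");
    }

    // Lazy: (.*?) stops at the NEXT "**", so each pair is handled separately.
    static String lazy(String s) {
        return s.replaceAll("\\*\\*(.*?)\\*\\*", "<u>$1</u>");
    }

    public static void main(String[] args) {
        String a = "John **Doe** is a **bad** boy";
        System.out.println(greedy(a)); // John <u>Doe** is a **bad</u> boy
        System.out.println(lazy(a));   // John <u>Doe</u> is a <u>bad</u> boy
    }
}
```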
Depending on your requirements, you might be able to use a Markdown parser, e.g. Txtmark, and save yourself from reinventing the wheel.

You can use:
String a = "John **Doe** is a bad boy";
a = a.replaceFirst("\\Q**\\E", "<u>").replaceFirst("\\Q**\\E", "</u>");
//=> John <u>Doe</u> is a bad boy

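As mentioned above by aetheria, and sticking with what you are already trying:
a = a.replaceFirst("\\*\\*", "<u>").replaceFirst("\\*\\*", "</u>");
Use "</u>" rather than "<\u>": in Java source, \u starts a Unicode escape, so "<\u>" does not compile.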
When you want to try something else, I recommend the online regex tester below, which shows the results of different patterns using replaceFirst, replaceAll, etc., on different input strings. It also provides, in the top left, the correctly escaped string to use in your Java code.
http://www.regexplanet.com/advanced/java/index.html

I would do this:
String a = "John **Doe** is a bad boy";
String b = a.replaceAll("\\*\\*(.*?)\\*\\*", "<u>$1</u>");
//John <u>Doe</u> is a bad boy
REGEX EXPLANATION
\*\*(.*?)\*\*
Match the character “*” literally «\*»
Match the character “*” literally «\*»
Match the regex below and capture its match into backreference number 1 «(.*?)»
Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “*” literally «\*»
Match the character “*” literally «\*»
<u>$1</u>
Insert the character string “<u>” literally «<u>»
Insert the text that was last matched by capturing group number 1 «$1»
Insert the character string “</u>” literally «</u>»

Related

Regex pattern matching with multiple strings

Forgive me, I am not very familiar with regex patterns.
I have created a regex pattern as below.
String regex = Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
+ Pattern.quote(value);
This regex pattern gives different results for two different sets of strings.
value = "207e/160";
Use Case 1 -
When channelStr = "207e/160, 149/80"
Then channelStr.matches(regex) returns "true".
Use Case 2 -
When channelStr = "207e/160, 149/80, 11"
Then channelStr.matches(regex) returns "false".
I can't figure out why. As far as I can understand, it may be because of the multiple spaces involved when more than two strings are present, separated by commas.
I'm not sure what the correct pattern is for more than two strings.
Any help will be appreciated.
If you print your pattern, it is:
\Q207e/160\E, [NnoneOoff0-9\-\+\/]+|[NnoneOoff0-9\-\+\/]+, \Q207e/160\E
It consists of an alternation | where each side matches exactly one mandatory comma.
matches() must match the whole string, and that is the case for 207e/160, 149/80, so that is a match.
The string 207e/160, 149/80, 11 has 2 commas, so the first part of the string is a partial match, but the whole string does not match, so matches() returns false.
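A minimal sketch of that difference, rebuilding the question's regex and running both matches() and Matcher.find() on the failing input; the class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question: exactly two comma-separated items, one of which is `value`.
    static String buildRegex(String value) {
        return Pattern.quote(value) + ", [NnoneOoff0-9\\-\\+\\/]+|[NnoneOoff0-9\\-\\+\\/]+, "
                + Pattern.quote(value);
    }

    // matches(): the WHOLE string must match the pattern.
    static boolean wholeMatch(String regex, String s) {
        return s.matches(regex);
    }

    // find(): returns the first partial match, or null if there is none.
    static String firstPartialMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = buildRegex("207e/160");
        String channelStr = "207e/160, 149/80, 11";
        System.out.println(wholeMatch(regex, channelStr));        // false
        System.out.println(firstPartialMatch(regex, channelStr)); // 207e/160, 149/80
    }
}
```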
To match all the values, you can use a repeating pattern:
^[NnoeOf0-9+/-]+(?:,\h*[NnoeOf0-9+/-]+)*$
^ Start of string
[NnoeOf0-9+/-]+ Match 1+ of the chars listed in the character class
(?: Non capture group
,\h* Match a comma and optional horizontal whitespace chars
[NnoeOf0-9+/-]+ Match 1+ of the chars listed in the character class
)* Close the non capture group and optionally repeat it (if there should be at least 1 comma, then the quantifier can be + instead of *)
$ End of string
Example using matches():
String channelStr1 = "207e/160, 149/80";
String channelStr2 = "207e/160, 149/80, 11";
String regex = "^[NnoeOf0-9+/-]+(?:,\\h*[NnoeOf0-9+/-]+)*$";
System.out.println(channelStr1.matches(regex));
System.out.println(channelStr2.matches(regex));
Output
true
true
Note that in a character class you can put - at the end without escaping it, and + and / do not have to be escaped either.
You can use regex101 to test your regex. It describes everything that's going on, which helps with debugging, and it has a quick-reference section at the bottom right with examples of what you can do.
A few things: you can escape a character with \ to match it literally, e.g. \" for a literal double quote.
If you want the pattern to be one or more of something, you would use +. These are called quantifiers and can be applied to groups, tokens, etc. The token for a whitespace character is \s. So, one or more whitespace characters would be \s+.
It's difficult to tell exactly what you're trying to do, but hopefully pointing you to regex101 will help. If you want to provide examples of the current RegEx you have, what you want to match and then the strings you're using to test it I'll be happy to provide you with an example.
^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$
^ Start
(?: ... ) Non-capturing group that defines an item and its separator. After each item, except the last, the separator (,) must appear. Spaces (one, several, or none) can appear before and after the comma, which is specified with *. This group can appear one or more times to the end of the string, as specified by the + quantifier after the group's closing parenthesis.
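A minimal sketch applying this pattern with String.matches(); the class name, method name and the two extra malformed inputs are hypothetical:

```java
public class CommaListCheck {
    // Items from the character class, separated by commas; spaces are allowed around
    // each comma, and the lookahead (?!$) forbids a comma at the very end.
    private static final String LIST_REGEX =
            "^(?:[NnoneOoff0-9\\-\\+\\/]+ *(?:, *(?!$)|$))+$";

    static boolean isValidList(String s) {
        return s.matches(LIST_REGEX);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "207e/160, 149/80",
            "207e/160, 149/80, 11",
            "207e/160, 149/80,",  // trailing comma
            "207e/160,,149/80"    // empty item
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + isValidList(in));
        }
    }
}
```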

java regular expression and replace all occurrences

I want to replace one string in a big string, but I guess my regular expression is not correct, so it's not working.
The main string (part of some SQL which is to be modified) is:
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'
The string to find and replace (the SQL part to be replaced, based on some condition) is:
hemp.EMPLOYEE_NAME = 'xxx'
I have tried this with the Pattern and Matcher classes:
Pattern pat1 = Pattern.compile("/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);
Matcher mat = pat1.matcher(cond);
while (mat.find()) {
System.out.println("Match: " + mat.group());
cond = mat.replaceFirst("xx "+mat.group()+"x");
mat = pat1.matcher(cond);
}
It's not working, not entering the loop at all. Any help is appreciated.
Obviously not - your regexp pattern doesn't make any sense.
The opening /: In some languages, regexps aren't strings and start with an opening slash. Java is not one of those languages, and the slash is not part of regexp syntax itself. So, this looks for a literal slash in that SQL, which isn't there, thus, failure.
^ is regexpese for 'start of string'. Your string does not start with hemp.EMPLOYEE_NAME, so that also doesn't work. Get rid of both / and ^ here.
\\s is exactly one whitespace character (there are many whitespace characters; this matches any one of them, but exactly one). That happens to fit the single spaces around = in your string, but surely your intent was \\s*, which matches 0 to many of them, i.e. \\s* means "whitespace is allowed here", while \\s means "there must be exactly one whitespace character here". Make all the \\s in your regexp an \\s*.
\\w is exactly one 'word' character (more or less a letter or digit); you obviously wanted \\w* (or \\w+, since 'xxx' is not empty).
[and|or] this is regexpese for: "An a, or an n, or a d, or an o, or an r, or a pipe symbol". Clearly you were looking for (and|or) which is regexpese for: Either the sequence "and", or the sequence "or".
The * after it: so you want 0 to many of those single characters, which makes no sense.
closing slash: You don't want this.
closing $: You don't want this - it means 'end of string'. Your string didn't end here.
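A small sketch that checks a few of these points directly: it compiles the question's pattern unchanged, runs find() on the SQL, and contrasts a character class with an alternation. The class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class PatternPitfalls {
    // The pattern from the question, compiled as-is.
    static final Pattern ORIGINAL = Pattern.compile(
            "/^hemp.EMPLOYEE_NAME\\s=\\s\'\\w\'\\s[and|or]*/$", Pattern.CASE_INSENSITIVE);

    static boolean originalFinds(String cond) {
        return ORIGINAL.matcher(cond).find();
    }

    // [and|or] is a character class: exactly ONE of a, n, d, |, o, r.
    static boolean charClassMatches(String s) {
        return Pattern.matches("[and|or]", s);
    }

    // (?:and|or) is an alternation: the sequence "and" or the sequence "or".
    static boolean alternationMatches(String s) {
        return Pattern.matches("(?:and|or)", s);
    }

    public static void main(String[] args) {
        String cond = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
                + "emp.PERMANENT_ADDR LIKE('%98n%') \n"
                + "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
        System.out.println(originalFinds(cond));       // false
        System.out.println(charClassMatches("|"));     // true
        System.out.println(charClassMatches("and"));   // false
        System.out.println(alternationMatches("and")); // true
    }
}
```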
The code itself:
replaceFirst itself also runs the regexp search, so you don't want to apply it on top of your own find() loop. That's not how you replace a found result.
This is what you wanted:
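Matcher mat = pat1.matcher(cond);
cond = mat.replaceFirst("replacement goes here"); // returns a new string; cond itself is not modified in place
where the replacement can include references to groups in the match (such as $0 for the whole match or $1 for the first group) if you want to reuse parts of what you matched (i.e. don't use mat.group(), use those references).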
More generally, did you read any regexp tutorial, do any testing, or read the javadoc of Pattern and Matcher?
I've been developing for a few years. It's just personal experience, perhaps, but, reading is pretty fundamental.
Instead of the anchors ^ and $, you can use word boundaries \b to prevent a partial match.
If you want to match spaces on the same line, you can use \h to match horizontal whitespace char, as \s can also match a newline.
You can use replaceFirst on the string using $0 to get the full match, and an inline modifier (?i) for a case insensitive match.
Note that [and|or] is a character class matching one of the listed chars. Also escape the dot to match it literally; otherwise . matches any char except a newline.
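A minimal sketch of the \h versus \s and word-boundary points, using fragments of the SQL from the question; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class WhitespaceAndBoundaries {
    // \s also matches line terminators such as \n.
    static boolean withS(String s) {
        return Pattern.compile("AND\\s+hemp").matcher(s).find();
    }

    // \h matches horizontal whitespace only (space, tab, ...), never \n.
    static boolean withH(String s) {
        return Pattern.compile("AND\\h+hemp").matcher(s).find();
    }

    // Without \b, "emp." is also found inside "hemp.".
    static boolean noBoundary(String s) {
        return Pattern.compile("emp\\.").matcher(s).find();
    }

    // With \b, "emp" must start a word, so "hemp." does not count.
    static boolean withBoundary(String s) {
        return Pattern.compile("\\bemp\\.").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(withS("AND\nhemp"));                 // true
        System.out.println(withH("AND\nhemp"));                 // false
        System.out.println(noBoundary("hemp.EMPLOYEE_NAME"));   // true
        System.out.println(withBoundary("hemp.EMPLOYEE_NAME")); // false
    }
}
```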
(?i)\bhemp\.EMPLOYEE_NAME\h*=\h*'\w+'\h+(?:and|or)\b
For example
String regex = "(?i)\\bhemp\\.EMPLOYEE_NAME\\h*=\\h*'\\w+'\\h+(?:and|or)\\b";
String string = "cond = emp.EMAIL_ID = 'xx#xx.com' AND\n"
+ "emp.PERMANENT_ADDR LIKE('%98n%') \n"
+ "AND hemp.EMPLOYEE_NAME = 'xxx' and is_active='Y'";
System.out.println(string.replaceFirst(regex, "xx$0x"));
Output
cond = emp.EMAIL_ID = 'xx#xx.com' AND
emp.PERMANENT_ADDR LIKE('%98n%')
AND xxhemp.EMPLOYEE_NAME = 'xxx' andx is_active='Y'

Regular expression to remove unwanted characters from the String

I have a requirement where I need to remove unwanted characters from a String in Java.
For example,
Input String is
Income ......................4,456
liability........................56,445.99
I want the output as
Income 4,456
liability 56,445.99
What is the best approach to write this in Java? I am parsing large documents
for this, hence it should be optimized for performance.
You can do this replacement with one line of code:
System.out.println("asdfadf ..........34,4234.34".replaceAll("[ ]*\\.{2,}"," "));
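Since the question mentions large documents: String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so calling it on every line compiles the regex every time. A sketch that compiles the same pattern once and reuses it (the class and method names are hypothetical, and whether this matters for your workload is worth measuring):

```java
import java.util.regex.Pattern;

public class DotLeaderCleaner {
    // Compiled once and shared: optional spaces followed by two or more dots.
    private static final Pattern DOT_LEADER = Pattern.compile("[ ]*\\.{2,}");

    static String clean(String line) {
        return DOT_LEADER.matcher(line).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(clean("Income ......................4,456"));         // Income 4,456
        System.out.println(clean("liability........................56,445.99")); // liability 56,445.99
    }
}
```

The single decimal point in 56,445.99 is kept because the pattern requires at least two consecutive dots.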
For this particular example, I might use the following replacement:
String input = "Income ......................4,456";
input = input.replaceAll("(\\w+)\\s*\\.+(.*)", "$1 $2");
System.out.println(input);
Here is an explanation of the pattern being used:
(\\w+) match AND capture one or more word characters
\\s* match zero or more whitespace characters
\\.+ match one or more literal dots
(.*) match AND capture the rest of the line
The two quantities in parentheses are known as capture groups. The regex engine remembers what these were while matching, and makes them available, in order, as $1 and $2 to use in the replacement string.
Output:
Income 4,456
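One way to do that is:
String result = yourString.replaceAll("[-+.^:,]","");
That will replace these special characters with nothing. Note that it also removes the commas and decimal points inside the numbers (e.g. 56,445.99 becomes 5644599), so it only fits input where those characters are never wanted.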

How do I check if a string contains a char sequence and a backslash "\"?

I'm trying to get true in the following test. I have a string with a backslash that, for some reason, isn't recognized.
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\.");
System.out.println(test);
I've tried a lot of variants, but only (.*)news(.*) works. But that matches any characters after news; I need it to match only when news is followed by \.
How can i do that?
Group the elements at the end: the regex (.*)news\\(.*), written as "(.*)news\\\\(.*)" in a Java string literal.
You can use this instead:
Boolean test = s.matches("(.*)news\\\\(.*)");
Try something like:
Boolean test = s.matches(".*news\\\\.*");
Here .* means any number of characters, followed by news, followed by a single literal backslash (written as four backslashes in the Java string literal), and then any number of characters after that (can be zero as well).
Your regex means:
.* Any number of characters
news matches "news"
\\. (i.e. \. in the regex) a literal dot, which must be the last character, since matches() needs the whole string to match
which doesn't match the String in your program, "Good news\ everyone!"
You are testing for an escaped occurrence of a literal dot: ".".
Refactor your pattern as follows (inferring the last part as you need it for a full match):
String s = "Good news\\ everyone!";
System.out.println(s.matches("(.*)news\\\\.*"));
Output
true
Explanation
The back-slash is used to escape characters, and the back-slash itself, in Java Strings.
In Java Pattern representations, you need to double-escape your back-slashes to represent a literal back-slash ("\\\\"), as double back-slashes are already used to represent special constructs (e.g. \\p{Punct}) or to escape characters (e.g. the literal dot \\.).
String.matches will attempt to match the whole String against your pattern, so you need the terminal part of the pattern I've added.
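For comparison, a sketch of three ways to test for the text news followed by a backslash; the class and method names are hypothetical. The last one uses String.contains and avoids regex escaping entirely:

```java
import java.util.regex.Pattern;

public class BackslashCheck {
    // "\\\\" in Java source is \\ in the regex, which matches ONE literal backslash.
    static boolean viaMatches(String s) {
        return s.matches(".*news\\\\.*");
    }

    // Pattern.quote turns the text into a literal pattern; find() allows a partial match.
    static boolean viaQuote(String s) {
        return Pattern.compile(Pattern.quote("news\\")).matcher(s).find();
    }

    // No regex at all: plain substring search.
    static boolean viaContains(String s) {
        return s.contains("news\\");
    }

    public static void main(String[] args) {
        String good = "Good news\\ everyone!"; // contains news followed by one backslash
        String dot = "Good news. everyone!";
        System.out.println(viaMatches(good) + " " + viaQuote(good) + " " + viaContains(good)); // true true true
        System.out.println(viaMatches(dot) + " " + viaQuote(dot) + " " + viaContains(dot));    // false false false
    }
}
```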
You can try this:
String s = "Good news\\ everyone!";
Boolean test = s.matches("(.*)news\\\\(.*)");
System.out.println(test);

What is this Java regex code doing?

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
String stripped = "";
stripped = toChar.replaceAll(ptn, "$1");
return stripped.trim();
}
Thanks in advance!
It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()
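A minimal sketch of what the alias does, using the method body from the question with a hypothetical pattern that keeps only the last two digits of a 20xx year (the class name and inputs are also hypothetical). Note that if ptn matches but contains no capturing group, the "$1" reference makes replaceAll throw IndexOutOfBoundsException.

```java
public class StripCharsDemo {
    // Method body from the question: replace each match of ptn with its first group, then trim.
    public static String stripChars(String toChar, String ptn) {
        String stripped = toChar.replaceAll(ptn, "$1");
        return stripped.trim();
    }

    public static void main(String[] args) {
        // Hypothetical pattern: group 1 captures the last two digits of the year.
        String ptn = "year 20([0-9][0-9])";
        System.out.println(stripChars("year 2012", ptn));         // 12
        System.out.println(stripChars("  year 2006  ", ptn));     // 06 (surrounding spaces trimmed)
        System.out.println(stripChars("born in year 1999", ptn)); // born in year 1999 (no match)
    }
}
```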
It seems to replace everything in "toChar" which matches the regular expression "ptn" with the first capturing group of that match.
Regular expressions have a concept of groups. For example, matching "year 2012" and replacing it with "year 1012", or "year 2006" with "year 1006" (changing the leading 20 to 10), can be accomplished by replacing
"year 20([0-9][0-9])" with "year 10$1". That is, match the entire string, then replace it with "year 10" followed by the first group ($1). The group is the first thing in parentheses.
Anyway, your method replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][0-9])"); you would receive back only "12", because the entire text matches and is replaced by only the first group.
It then strips any leading or trailing whitespace.
The pattern string that is passed as an argument to the method seems to contain a capturing group, and the call to replaceAll replaces each entire match of the pattern with the portion that matched the first group. You should look at the call hierarchy of this method to find the regexes passed to it, along with the strings being worked on.
It's just replacing a string with its own subset of matched characters and then trimming the spaces from both ends.
For example,
if you want a word to be replaced by the first run of digits in that word,
use the regex \b\w*?(\d+)\w*\b
and then your replaceAll call (with "$1" as the replacement) will give these results:
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string, i.e. hey123wow, what666, how888 in this example.
$1 refers to the first group, i.e. (\d+) in this example: 123, 666, 888.
$2 would refer to the second group which does not exist in this example.
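A sketch of that idea using the pattern \b\w*?(\d+)\w*\b, which keeps the first run of digits in each word that contains one; the class and method names are hypothetical. A pattern such as \b.*?(\d*).*?\b would not work here, because its first match is the empty string at the start of the word, so the text comes back unchanged.

```java
public class DigitsFromWord {
    // A whole word, lazily skipping to its first run of digits (group 1), then the rest of the word.
    static String keepDigits(String text) {
        return text.replaceAll("\\b\\w*?(\\d+)\\w*\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("hey123wow"));               // 123
        System.out.println(keepDigits("what666"));                 // 666
        System.out.println(keepDigits("how888"));                  // 888
        System.out.println(keepDigits("hey123wow what666 hello")); // 123 666 hello
    }
}
```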
toChar.replaceAll(ptn, "$1");
It's replacing all the occurrences of ptn in toChar with the captured group $1, and we can't tell what that group is without seeing the pattern.
Capture groups are patterns inside parentheses ():
For example, in the regex below:
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1);
For learning more about regular expressions, you can refer to the following links:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/
