replace regex string with parameter / token - java

I have this code snippet of a code base I am supposed to maintain.
String number = "1";
String value = "test";
String output = "";
output = value.replaceAll("\\Q{#}", number);
The value of output stays as "test" and I can only guess what this code is supposed to do: the value of number should be appended to whatever is in value. Maybe something like this: test1, or replace the value with the number entirely.
I found out that \\Q is the regex option to quote everything until \\E, but there is no \\E. Anyway, it is not doing anything at all and I am wondering if I am overlooking something?

Your regex just matches a literal {#}. After \Q, every character in the pattern is treated as a literal (it gets "quoted" or "escaped") until \E ends the quoting; if \E is missing, the rest of the pattern is quoted.
If your value variable holds test{#}, the {#} will be replaced with the number.
See this demo:
String number = "1";
String value = "test{#}";
String output = "";
output = value.replaceAll("\\Q{#}", number);
System.out.println(output); // => test1
Note that without \Q, your regex ({#}) would throw a java.util.regex.PatternSyntaxException: Illegal repetition error because Java regex engine is not "smart" enough to disambiguate the braces (PCRE, JS, .NET can easily guess that since there is no number inside, it is not a limiting/bound quantifier).
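The behaviour described above can be checked with a minimal sketch. The class and method names are made up for illustration; Pattern.quote is shown as an alternative that wraps the token in \Q...\E for you.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
class QuoteDemo {

    // Replaces the literal token {#} using the original \Q-prefixed pattern.
    static String withBackslashQ(String value, String number) {
        return value.replaceAll("\\Q{#}", number);
    }

    // Same replacement, but Pattern.quote builds the quoted pattern.
    static String withPatternQuote(String value, String number) {
        return value.replaceAll(Pattern.quote("{#}"), number);
    }

    // Returns true if the given regex fails to compile.
    static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(withBackslashQ("test{#}", "1"));   // test1
        System.out.println(withBackslashQ("test", "1"));      // test (no token, nothing replaced)
        System.out.println(withPatternQuote("test{#}", "1")); // test1
        System.out.println(isRejected("{#}"));                // true
    }
}
```

The second call reproduces the situation in the question: with no {#} in value, nothing is replaced.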

Related

What is the Regex for decimal numbers in Java?

I am not quite sure of what is the correct regex for the period in Java. Here are some of my attempts. Sadly, they all meant any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
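To match a non-negative decimal number that contains a decimal point, you need this regex:
^\d*\.\d+|\d+\.\d*$
or in Java syntax: "^\\d*\\.\\d+|\\d+\\.\\d*$" (note that this requires the dot, so a plain integer such as 19 does not match)
String regex = "^\\d*\\.\\d+|\\d+\\.\\d*$";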
String string = "123.43253";
if(string.matches(regex))
System.out.println("true");
else
System.out.println("false");
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
with java escape it becomes :
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
but this will accept just a dot without any number as well. So, if you want the validation to treat the digits as mandatory, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown off because matcher.find() finds the first valid match anywhere within the string, and all of your patterns can match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
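A minimal sketch comparing the asker's pattern with the anchored one suggested above, both driven through find(). The class, field and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class DecimalCheck {

    // The asker's pattern: every part is optional, so it also matches the empty string.
    static final Pattern LOOSE = Pattern.compile("[0-9]*[.]?[0-9]*");

    // The anchored pattern suggested above.
    static final Pattern STRICT = Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    static boolean looseFind(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strictFind(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12.2", "3.7", "2.", "0.3", ".89", "19", "5p4", "."}) {
            System.out.println(s + " loose=" + looseFind(s) + " strict=" + strictFind(s));
        }
    }
}
```

With these patterns, 5p4 is accepted by the loose find() (it matches the leading 5) and rejected by the anchored one.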
There are actually two ways to match a literal dot (.). One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class, i.e. square brackets, like [.]. Most special characters, including the dot, become literal inside square brackets. If all you want is to match a literal dot, \\. shows your intention more clearly than [.]. Use [] when you need to match one of several things; for example, the regex [\\d.] means "match a single digit or a literal dot".
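A small sketch of the two spellings next to an unescaped dot. The class and method names are hypothetical; String.matches is used so that the whole input has to match.

```java
// Hypothetical demo class; the names are illustrative only.
class DotDemo {

    // True if the whole input matches the regex (String.matches anchors both ends).
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\.", "."));    // true: escaped dot is literal
        System.out.println(fullMatch("\\.", "x"));    // false
        System.out.println(fullMatch("[.]", "x"));    // false: dot inside a class is literal
        System.out.println(fullMatch(".", "x"));      // true: bare dot means any character
        System.out.println(fullMatch("[\\d.]", "7")); // true: digit or literal dot
    }
}
```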
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

how can I replace substring that occurs only once while excluding substring two or more times in a row?

Is there any way that I can do the following in java ?
String s = "acdaaefacaa";
String b = s.replaceLikeMethod("a", "");
and b becomes "cdaaefcaa". Basically replace any occurrence of the first string "a" with the other one "" unless "a" appears two or more times in a row.
You can use regex to achieve this. The features you want are
Negative LookBehind (?<!foo) Match pattern unless foo occurs right before.
Negative LookAhead. (?!foo) Match pattern unless foo occurs right afterwards
You basically need to use both at the same time, with the string you want to match also serving as the pattern inside both lookarounds. E.g.
String pattern = "(?<!foo)foo(?!foo)";
Or to easily replace with a string known at runtime like "a"
String pattern = "(?<!foo)foo(?!foo)".replace("foo", "a");
Finally, to replace just do :
String b = s.replaceAll(pattern, "");
Use this regex: ((?<!a)a(?!a)). It uses negative lookahead and lookbehind. It matches every a that is neither preceded nor followed by another a.
Test:
String input = "acdaaefacaa";
String output = input.replaceAll("((?<!a)a(?!a))", "");
System.out.println(output);
Outputs:
cdaaefcaa
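For a token that is only known at runtime, the same lookaround pattern can be assembled with Pattern.quote so that regex metacharacters in the token are taken literally. This is a sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class IsolatedRemover {

    // Removes every occurrence of token that is neither directly preceded
    // nor directly followed by another copy of token.
    static String removeIsolated(String input, String token) {
        String q = Pattern.quote(token); // treat the token as a literal
        String regex = "(?<!" + q + ")" + q + "(?!" + q + ")";
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removeIsolated("acdaaefacaa", "a")); // cdaaefcaa
        System.out.println(removeIsolated("a.b..c", "."));      // ab..c
    }
}
```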

Regex: extract String from String

I need a regex that makes it possible to extract a part of a String. I get this String by parsing an XML document with DOM. Then I look for the "§regex" part in this String and try to extract its value, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.
The problem is, I don't know how to make sure the extracted part ends with a ")".
This regex needs to work for every value given. The goal is to write only the value in brackets after the "§regex=", including the brackets, into a String.
<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>
private List<String> findRegex() {
List<String> forReturn = new ArrayList<String>();
for (String str : attDescription) {
if (str.contains("§regex=")) {
String s = str.replaceAll(regex);
forReturn.add(s);
}
}
return forReturn;
}
attDescription is a list which contains all Attributes found in the XML-Document parsed.
So far I tried this regex: ".*(§regex=)(.*)[)$].*", "$2" but this cuts off the ")" and does not delete the text in front of the searched part. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I really don't understand how to get what I need.
It seems to work for me (with this example anyway) if I use this in place of String s = str.replaceAll(regex);
String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );
It's just looking for a substring enclosed by parentheses following §regex=.
This seems to work:
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
Note:
Escape the leading bracket
The $ inside a character class is a literal $ - ignore it, because your regex should always end with a bracket
No need to capture the fixed text
Test code, noting that this works with brackets in/around the regex:
String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);
Output:
([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
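An alternative to replaceAll is to search with a Matcher and read the capture group directly, which also makes the "no §regex= present" case explicit. The sketch below assumes the description is a single line and that no closing bracket appears in the text after the embedded regex (the greedy .* runs to the last ")" on the line). The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class RegexTagExtractor {

    // The escape at the start is the section sign, written this way to avoid
    // source-encoding issues. Group 1 keeps the surrounding brackets.
    private static final Pattern TAG = Pattern.compile("\u00A7regex=(\\(.*\\))");

    // Returns the bracketed value after the tag, or empty if the tag is absent.
    static Optional<String> extract(String description) {
        Matcher m = TAG.matcher(description);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = " random Text \u00A7regex=([A-Z]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text";
        System.out.println(extract(sample).orElse("<none>"));
        // ([A-Z]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
    }
}
```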

Matching everything after the first comma in a string

I am using java to do a regular expression match. I am using rubular to verify the match and ideone to test my code.
I got a regex from this SO solution, and it matches the group as I want it to in rubular, but my implementation in Java is not matching. When it prints 'value', it prints the value of commaSeparatedString and not matcher.group(1). I want the captured group (the output of println) to be "v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
//match everything after first comma
String myRegex = ",(.*)";
Pattern pattern = Pattern.compile(myRegex);
Matcher matcher = pattern.matcher(commaSeparatedString);
String value = "";
if (matcher.matches())
value = matcher.group(1);
else
value = commaSeparatedString;
System.out.println(value);
(edit: I left out that commaSeparatedString will not always contain 2 commas. Rather, it will always contain 0 or more commas)
If you don't have to solve it with regex, you can try this:
int size = commaSeparatedString.length();
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1,size);
Namely, the code above returns the substring that starts right after the first comma.
EDIT:
Sorry, I've omitted the simpler version. Thanks to one of the commenters, you can use this single line as well:
value = commaSeparatedString.substring(commaSeparatedString.indexOf(",") + 1);
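Since the question notes that the string may contain zero commas, the substring approach can be wrapped in a small helper that makes that case explicit. This is a sketch; the class and method names are hypothetical.

```java
// Hypothetical helper; the names are illustrative only.
class CommaSplit {

    // Returns everything after the first comma, or the whole string
    // unchanged when it contains no comma at all.
    static String afterFirstComma(String s) {
        int i = s.indexOf(',');
        return i < 0 ? s : s.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(afterFirstComma("Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso"));
        // v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
        System.out.println(afterFirstComma("Vtest7")); // Vtest7
    }
}
```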
The definition of the regex is wrong. It should be:
String myRegex = "[^,]*,(.*)";
You are yet another victim of Java's misguided regex method naming.
.matches() automatically anchors the regex at the beginning and end (which is in total contradiction with the very definition of "regex matching"). The method you are looking for is .find().
However, for such a simple problem, it is better to go with @DelShekasteh's solution.
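The difference between the two methods can be seen with the pattern from the question. This is a minimal sketch; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchesVsFind {

    private static final Pattern AFTER_COMMA = Pattern.compile(",(.*)");

    // Mirrors the question's code: matches() must consume the entire input,
    // so it fails whenever the input does not start with a comma.
    static String viaMatches(String s) {
        Matcher m = AFTER_COMMA.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    // find() looks for the pattern anywhere in the input.
    static String viaFind(String s) {
        Matcher m = AFTER_COMMA.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        String input = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
        System.out.println(viaMatches(input)); // Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
        System.out.println(viaFind(input));    // v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
    }
}
```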
I would do this like
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.substring(commaSeparatedString.indexOf(",")+1));
Here is another approach with limited split
String[] spl = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso".split(",", 2);
if (spl.length == 2)
System.out.println(spl[1]);
But IMHO Del's answer is best for your case.
I would use replaceFirst
String commaSeparatedString = "Vtest7,v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso";
System.out.println(commaSeparatedString.replaceFirst(".*?,", ""));
prints
v123_gpbpvl-testpv1,v223_gpbpvl-testpv1-iso
or you could use the shorter but obtuse version (which throws an ArrayIndexOutOfBoundsException if the string contains no comma)
System.out.println(commaSeparatedString.split(",", 2)[1]);

How to remove special characters from a string?

I want to remove special characters like:
- + ^ . : ,
from a String using Java.
That depends on what you define as special characters, but try replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since you'd then either have to escape it or it would mean "any but these characters".
Another note: the - character needs to be the first or last one in the list, otherwise you'd have to escape it or it would define a range (e.g. :-, would be read as "the range from : to ,", which Java rejects as an illegal range because : comes after , in code-point order).
So, in order to keep consistency and not depend on character positioning, you might want to escape all those characters that have a special meaning in regular expressions (the following list is not complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex: [\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
This means: replace everything that is not a word character (a-z in any case, 0-9 or _) or whitespace.
Edit: please note that there are a couple of other patterns that might prove helpful. However, I can't explain them all, so have a look at the reference section of regular-expressions.info.
Here's less restrictive alternative to the "define allowed characters" approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and not a separator (whitespace, linebreak etc.). Note that you can't use [\P{L}\P{Z}] (upper case P means not having that property), since that would mean "everything that is not a letter or not whitespace", which almost matches everything, since letters are not whitespace and vice versa.
Additional information on Unicode
Some unicode characters seem to cause problems due to different possible ways to encode them (as a single code point or a combination of code points). Please refer to regular-expressions.info for more information.
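The three approaches from this answer can be compared side by side. This is a sketch with hypothetical class and method names; the punctuation/symbol pattern is written as a character class so that each single character is tested on its own.

```java
// Hypothetical helper; the names are illustrative only.
class SpecialCharStripper {

    // Removes only the characters listed in the question.
    static String stripListed(String s) {
        return s.replaceAll("[-+.^:,]", "");
    }

    // Removes every character in the Unicode punctuation (P) or symbol (S) categories.
    static String stripPunctuationAndSymbols(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    // Keeps only letters (L) and separators (Z); everything else is removed.
    static String keepLettersAndSeparators(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripListed("a-b+c^d.e:f,g"));                 // abcdefg
        System.out.println(stripPunctuationAndSymbols("a-b+c^d.e:f,g!")); // abcdefg
        System.out.println(keepLettersAndSeparators("Hi, there 42!"));    // "Hi there " with a trailing space
    }
}
```

Note that the last variant also drops digits, since they are neither letters nor separators.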
This will replace all the characters except alphanumeric
replaceAll("[^A-Za-z0-9]","");
As described here
http://developer.android.com/reference/java/util/regex/Pattern.html
Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, also offer more functionality than the small amount exposed by String.
public class RegularExpressionTest {
public static void main(String[] args) {
System.out.println("String is = "+getOnlyStrings("!&(*^*(^(+one(&(^()(*)(*&^%$##!#$%^&*()("));
System.out.println("Number is = "+getOnlyDigits("&(*^*(^(+91-&*9hi-639-0097(&(^("));
}
public static String getOnlyDigits(String s) {
Pattern pattern = Pattern.compile("[^0-9]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
public static String getOnlyStrings(String s) {
Pattern pattern = Pattern.compile("[^a-z A-Z]");
Matcher matcher = pattern.matcher(s);
String number = matcher.replaceAll("");
return number;
}
}
Result
String is = one
Number is = 9196390097
Try replaceAll() method of the String class.
BTW here is the method, return type and parameters.
public String replaceAll(String regex,
String replacement)
Example:
String str = "Hello +-^ my + - friends ^ ^^-- ^^^ +!";
str = str.replaceAll("[-+^]+", "");
It should remove all the {'^', '+', '-'} chars that you wanted to remove!
To Remove Special character
String t2 = "!##$%^&*()-';,./?><+abdd";
t2 = t2.replaceAll("\\W+","");
Output will be : abdd.
This works perfectly.
Use the String.replaceAll() method in Java.
replaceAll should be good enough for your problem.
You can remove single char as follows:
String str="+919595354336";
String result = str.replaceAll("\\+","");
System.out.println(result);
OUTPUT:
919595354336
If you just want to do a literal replace in java, use Pattern.quote(string) to escape any string to a literal.
myString.replaceAll(Pattern.quote(matchingStr), replacementStr)
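A sketch of a fully literal replace: Pattern.quote protects the search string, and Matcher.quoteReplacement does the same for the replacement, where $ and \ otherwise have special meaning. The class and method names are hypothetical. For plain literal replacement, String.replace(CharSequence, CharSequence) gives the same result without any regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class LiteralReplace {

    // Replaces every literal occurrence of target with replacement,
    // with no regex interpretation on either side.
    static String replaceLiteral(String input, String target, String replacement) {
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("1+1=2", "+", "$")); // 1$1=2
        System.out.println("1+1=2".replace("+", "$"));         // 1$1=2
    }
}
```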
