How to understand code like pKataLengkap.replaceAll("(.)\\1+", "$1") - Java

Can anyone explain what the code below means?
pKataLengkap.replaceAll("(.)\\1+", "$1")
I don't understand it. I got it from a link on CodeFights.
Thanks!

replaceAll replaces every match of a regular expression (regex). If you don't understand anything about regexes, you should read this tutorial. However, this particular regex is a bit on the tricky side, so I'll explain it. The regex is (.)\1+ (the backslash has to be doubled in a string literal, but the regex only has one backslash).
The first . matches any single character. Since it's in parentheses, the matcher treats this as a "capturing group"; since it's the first group in the regex, it's "capturing group 1". When a match is found (i.e. when the matcher finds any single character), the text of that match will be the capturing group. Thus, "capturing group 1" is that one character.
The next part is \1+. + is a quantifier meaning "one or more of whatever the + follows". \1 is a special pattern that means "whatever is in capturing group 1". So what this all means is that the pattern will match any single character followed by one or more occurrences of that same character. That is, it matches patterns with two or more occurrences of the same character.
Now each such pattern is replaced by "$1". The $1 is special in replaceAll, and it means "the contents of the capturing group 1", which is the single character that got matched.
So basically, any time the matcher sees two or more consecutive occurrences of the same character, it will replace them with one occurrence of that character. That is, it will transform "xxxyyyyyyzzz" to "xyz".
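As a quick check of the explanation above, here is a minimal sketch. The class and method names are made up for illustration; only replaceAll and the regex come from the question.

```java
class CollapseRepeats {
    // Hypothetical helper: collapses each run of a repeated character into one character.
    static String collapse(String s) {
        // "(.)\\1+" in Java source is the regex (.)\1+ ; "$1" is the captured character.
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("xxxyyyyyyzzz")); // xyz
        System.out.println(collapse("bookkeeper"));   // bokeper
        System.out.println(collapse("abc"));          // abc (no repeats, nothing to replace)
    }
}
```

Note that . does not match line terminators by default, so repeated newlines would be left alone.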

Related

Reg Ex strictly match word start with a pattern

I'm trying to extract the text after a sequence, but I have multiple sequences. The regex should ideally match the first occurrence of any of these sequences.
My sequences are
PIN, PIN :, PIN IN, PIN IN:, PIN OUT, PIN OUT :
So I came up with the regex below:
(PIN)(\sOUT|\sIN)?\:?\s*
It does the job, except that it also matches strings like
lupin in, pippin, etc.
My question is: how can I match the pattern only when it is a whole word?
Note: I tried ^(PIN)(\sOUT|\sON)?\:?\s* but to no avail.
I'm new to Java; any help is appreciated.
It’s always recommended to have the documentation at hand when using regular expressions.
There, under Boundary matchers we find:
\b          A word boundary
So you may use the pattern \bPIN(\sOUT|\sIN)?:?\s* to enforce that PIN matches at the beginning of a word only, i.e. stands at the beginning of a string/line or is preceded by non-word characters like space or punctuation. A boundary only matches a position, rather than characters, so if a preceding non-word character makes this a word boundary, the character still is not part of the match.
Note that the first (…) grouping was unnecessary for the literal match PIN, further the colon : has no special meaning and doesn’t need to be escaped.
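A minimal sketch of how the \b version could be used to pull out the text after the marker. The class and method names are hypothetical, and the CASE_INSENSITIVE flag is an assumption (the question's false positives such as "lupin in" are lowercase, which suggests case-insensitive matching was in use).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PinExtractor {
    // Assumption: case-insensitive matching, as the question's samples suggest.
    private static final Pattern PIN =
            Pattern.compile("\\bPIN(\\sOUT|\\sIN)?:?\\s*", Pattern.CASE_INSENSITIVE);

    // Returns the text after the first PIN marker, or null if no marker is found.
    static String textAfterPin(String input) {
        Matcher m = PIN.matcher(input);
        return m.find() ? input.substring(m.end()) : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterPin("PIN OUT: 1234")); // 1234
        System.out.println(textAfterPin("Enter PIN: 42")); // 42
        System.out.println(textAfterPin("lupin in 1234")); // null
        System.out.println(textAfterPin("pippin"));        // null
    }
}
```

Without the \b, the last two inputs would match at the "pin" inside "lupin" and "pippin".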

Java regular expressions for specific name\value format

I'm not familiar yet with java regular expressions. I want to validate a string that has the following format:
String INPUT = "[name1 value1];[name2 value2];[name3 value3];";
namei and valuei are strings that may contain any characters except whitespace.
I tried with this expression:
String REGEX = "([\\S*\\s\\S*];)*";
But when I call matches() I always get false, even for a valid String.
what's the best regular expression for it?
This does the trick:
(?:\[\w.*?\s\w.*?\];)*
If you want to only match three of these, replace the * at the end with {3}.
Explanation:
(?:: Start of non-capturing group
\[: Escapes the [ sign, which is a meta-character in regex. This allows it to be matched literally.
\w.*?: Matches one word character ([a-zA-Z0-9_]), then lazily matches any further characters. Lazy matching means it matches as few characters as possible, in this case stopping as soon as the following \s can match.
\s: Matches one whitespace
\]: See \[
;: Matches one semicolon
): End of non-capturing group
*: Matches any number of what is contained in the preceding non-capturing group.
See this link for demonstration
You should escape square brackets. Also, if your aim is to match only three, replace * with {3}
(\[\\S*\\s\\S*\];){3}
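A small sketch for trying this with matches(). LOOSE is the pattern above with * instead of {3}. STRICT is not from either answer; it is an assumed variant that also forbids brackets inside names and values, since \S happily matches [ and ]. The class and constant names are made up.

```java
class NameValueValidator {
    // Pattern from the answer above, allowing any number of pairs.
    static final String LOOSE = "(\\[\\S*\\s\\S*\\];)*";

    // Assumed stricter variant: non-empty name and value, no brackets or whitespace inside.
    static final String STRICT = "(?:\\[[^\\s\\[\\]]+\\s[^\\s\\[\\]]+\\];)*";

    public static void main(String[] args) {
        String input = "[name1 value1];[name2 value2];[name3 value3];";
        System.out.println(input.matches(LOOSE));             // true
        System.out.println(input.matches(STRICT));            // true
        System.out.println("[name1value1];".matches(LOOSE));  // false (no whitespace separator)
        System.out.println("[name1value1];".matches(STRICT)); // false
    }
}
```

Because of the trailing *, both patterns also accept the empty string; use + or {3} if at least one pair is required.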

capture all characters between match character (single or repeated) on string

I'm trying to extract the string preceding a specific character, even when the character is repeated (e.g. underscore '_'):
this_is_my_example_line_0
this_is_my_example_line_1_
this_is_my_example_line_2___
_this_is_my_ _example_line_3_
__this_is_my___example_line_4__
and after running my regex I should get this (the regex should ignore any instances of the matching character in the middle of the string):
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4
In other words I'm trying to 'trim' the matched character(s) at the beginning and end of string.
I'm trying to use a Regex in Java to accomplish this, my idea is to capture the group of characters between the special character(s) at the end or beginning of the line.
So far I can only do this successfully for example 3 with this regexp:
/[^_]+|_+(.*)[_$]+|_$+/
[^_]+ not 'underscore' once or more
| OR
_+ underscore once or more
(.*) capture all characters
[_$]+ not 'underscore' once or more followed by end of line
|_$+ OR 'underscore' once or more followed by end of line
I just realized that this excludes the first word of the message in examples 0, 1 and 2, since the string doesn't start with an underscore and it only starts matching after finding an underscore.
Is there an easier way not involving regex?
I don't really care about the first character (although it would be nice); I only need to ignore the repeating character at the end. It looks like (according to this regex tester) just using /()_+$/ would work? The empty parentheses match anything before a single or repeated match at the end of the line. Would that be correct?
Thank you!
There are a couple of options here: you could either replace matches of ^_+|_+$ with an empty string, or extract the contents of the first capture group from the match of ^_*(.*?)_*$. Note that if your strings may span multiple lines and you want to perform the replacement on each line, you will need to use the Pattern.MULTILINE flag for either approach. If your strings may span multiple lines and you only want the replacement to occur at the very beginning and end, don't use Pattern.MULTILINE but use Pattern.DOTALL for the second approach.
For example: http://regexr.com?355ff
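Both options can be sketched in Java as follows, for single-line input. The class and method names are hypothetical; the two regexes are the ones from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnderscoreTrimmer {
    // Option 1: delete the leading and trailing runs of underscores.
    static String trimByReplace(String s) {
        return s.replaceAll("^_+|_+$", "");
    }

    private static final Pattern CORE = Pattern.compile("^_*(.*?)_*$");

    // Option 2: capture whatever lies between the optional underscore runs.
    // If the input does not match (e.g. it contains line breaks), it is returned unchanged.
    static String trimByCapture(String s) {
        Matcher m = CORE.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        System.out.println(trimByReplace("this_is_my_example_line_2___"));    // this_is_my_example_line_2
        System.out.println(trimByReplace("_this_is_my_ _example_line_3_"));   // this_is_my_ _example_line_3
        System.out.println(trimByCapture("__this_is_my___example_line_4__")); // this_is_my___example_line_4
    }
}
```

Underscores in the middle of the string are untouched in both cases, because only the runs anchored at ^ and $ are removed or skipped.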
How about [^_\n\r](.*[^_\n\r])??
Demo
String data =
    "this_is_my_example_line_0\n" +
    "this_is_my_example_line_1_\n" +
    "this_is_my_example_line_2___\n" +
    "_this_is_my_ _example_line_3_\n" +
    "__this_is_my___example_line_4__";
// Needs java.util.regex.Pattern and java.util.regex.Matcher.
Pattern p = Pattern.compile("[^_\n\r](.*[^_\n\r])?");
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group());
}
output:
this_is_my_example_line_0
this_is_my_example_line_1
this_is_my_example_line_2
this_is_my_ _example_line_3
this_is_my___example_line_4

What is the responsibility of (.*) in the Java String?

What is the responsibility of (.*) in the third line and how it works?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));
.matches() is a call to parse Str using the regex provided.
Regex, or Regular Expressions, are a way of parsing strings into groups. In the example provided, this matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).
Your expression matches any string in which the word Tutorials is preceded and followed by any characters. .* means any character occurring any number of times, including zero times.
The . represents regular expression meta-character which means any character.
The * is a regular expression quantifier, which means 0 or more occurrences of the expression character it was associated with.
matches takes regular expression string as parameter and (.*) means capture any character zero or more times greedily
.* means a group of zero or more of any character
In Regex:
.
Wildcard: Matches any single character except \n
for example pattern a.e matches ave in nave and ate in water
*
Matches the previous element zero or more times
for example pattern \d*\.\d matches .0, 19.9, 219.9
There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily match any characters from the start to the end of the String. Then it backtracks until it finds "Tutorials", after which it again matches any characters (except newlines).
It's better and more clear to use the find method. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());
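To make the difference between matches() and find() concrete, here is a small sketch; the class and helper names are made up for illustration. matches() must cover the whole string, which is why the original needed (.*) on both sides, while find() looks for the pattern anywhere.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVersusFind {
    // True only if the regex matches the entire string.
    static boolean wholeStringMatches(String s, String regex) {
        return s.matches(regex);
    }

    // True if the regex matches somewhere inside the string.
    static boolean containsMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "Welcome to Tutorialspoint.com";
        System.out.println(wholeStringMatches(s, "Tutorials"));         // false
        System.out.println(wholeStringMatches(s, "(.*)Tutorials(.*)")); // true
        System.out.println(containsMatch(s, "Tutorials"));              // true

        // The parentheses only matter if you want to read what .* matched.
        Matcher m = Pattern.compile("(.*)Tutorials(.*)").matcher(s);
        if (m.matches()) {
            System.out.println("[" + m.group(1) + "]"); // [Welcome to ]
            System.out.println("[" + m.group(2) + "]"); // [point.com]
        }
    }
}
```

For a fixed word with no regex syntax, s.contains("Tutorials") gives the same answer without any regex at all.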

Multiple Regular Expressions

I'm not used to regular expressions and am having trouble with Java's "matches" syntax.
I have two files: one is 111.123.399.555.xml, the other is Conf.xml.
I only want to match the first file with a regular expression.
string.matches("[1-9[xml[.]]]");
doesnt work.
How to do this?
string.matches("[1-9[xml[.]]]"); will not work because [] creates a character class, not a capturing group.
What this means is that, to Java, your expression says "match one character that is any of: 1 to 9, or x, or m, or l, or a dot" (nested brackets inside a character class form a union, and inside a character class the . is a literal dot rather than "any character"). Since matches() must match the entire string, only a one-character string can ever match.
Important:
"\" is recognized by java as a literal escape character, and for it to be sent to the matcher as an actual matcher's escape character (also "\", but in string form), it itself needs to be escaped, thus, when you mean to use "\" on the matcher, you must actually use "\\".
This is a bit confusing when you are not used to it, but to sum it up, to match an actual "\" character, you have to write "\\\\"! The first "\\" becomes "\" for the matcher, acting as an escape character, and the second "\\", escaped by the first, becomes the literal "\" to match!
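The two levels of escaping can be checked with a short sketch like the one below. The class name, helper and sample strings are invented for illustration.

```java
class BackslashEscaping {
    // Regex source "dir\\\\file" -> regex dir\\file -> matches the text dir\file
    static boolean isDirFile(String s) {
        return s.matches("dir\\\\file");
    }

    public static void main(String[] args) {
        String path = "dir\\file";           // the string itself holds ONE backslash: dir\file
        System.out.println(path.length());   // 8
        System.out.println(isDirFile(path)); // true

        // Same idea with a dot: "a\\.b" is the regex a\.b (a literal dot).
        System.out.println("a.b".matches("a\\.b")); // true
        System.out.println("axb".matches("a\\.b")); // false
        System.out.println("axb".matches("a.b"));   // true (unescaped dot matches any character)
    }
}
```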
The correct pattern-string to match for a ###.###.###.###.xml pattern where the "#" are always numbers, is string.matches("(\\d{3}\\.){4}xml"), and how it works is as follows:
\\d matches a single digit character. It is the same as [0-9], just shorter.
{3} means "exactly 3 times" and applies to the preceding \\d, thus matching ###.
\\. matches a single literal dot.
The () enclosing the previous parts forms a capturing group. The following {4} applies to the whole group, meaning "match this ###. group exactly 4 times", i.e. ###.###.###.###.
Finally, the xml at the end of the pattern matches exactly "xml", which, together with the previous items, gives an exact match for ###.###.###.###.xml.
For further learning, read Java's Pattern docs.
string.matches("[1-9.]+\\.xml")
should do it.
[1-9.]+ matches one or more digits between 1 and 9 and/or periods. (+ means "one or more", * means "zero or more", ? means "zero or one").
\.xml matches .xml. Since . means "any character" in a regex, you need to escape it if you want it to mean a literal period: \. (and since this is in a Java string, the backslash itself needs to be escaped by doubling).
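A sketch comparing the two suggested patterns on the file names from the question. The class and constant names are hypothetical, and the third file name is an invented sample added to show where the patterns differ.

```java
class XmlFileNameFilter {
    // From the first answer: exactly four groups of three digits plus a dot, then "xml".
    static final String STRICT = "(\\d{3}\\.){4}xml";
    // From the second answer: any mix of the digits 1-9 and dots, then ".xml".
    static final String LOOSE = "[1-9.]+\\.xml";

    public static void main(String[] args) {
        String numbered = "111.123.399.555.xml";
        String named = "Conf.xml";
        String withZeros = "100.200.300.400.xml"; // invented sample

        System.out.println(numbered.matches(STRICT));  // true
        System.out.println(numbered.matches(LOOSE));   // true
        System.out.println(named.matches(STRICT));     // false
        System.out.println(named.matches(LOOSE));      // false
        System.out.println(withZeros.matches(STRICT)); // true
        System.out.println(withZeros.matches(LOOSE));  // false, because 0 is not in 1-9

        // The original attempt is a single character class, so it only matches one character.
        System.out.println("1".matches("[1-9[xml[.]]]"));      // true
        System.out.println(numbered.matches("[1-9[xml[.]]]")); // false
    }
}
```

If file names may contain zeros, [0-9.]+ or \\d would be the safer choice in the second pattern.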
