What is the responsibility of (.*) in the Java String? - java

What is the responsibility of (.*) in the third line, and how does it work?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));

.matches() tests whether the whole of Str matches the regex provided.
Regular expressions (regexes) are a way of describing patterns in strings and, optionally, capturing parts of a match as groups. In the example provided, the pattern matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).
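To make the role of the two groups concrete, here is a minimal sketch that prints what each (.*) captures for the string from the question. The class name GroupDemo and the helper split are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns {text before "Tutorials", text after it},
    // or null when the whole input does not match the pattern.
    static String[] split(String input) {
        Matcher m = Pattern.compile("(.*)Tutorials(.*)").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Welcome to Tutorialspoint.com");
        // prints: [Welcome to ] [point.com]
        System.out.println("[" + parts[0] + "] [" + parts[1] + "]");
    }
}
```

If you only need the true/false answer, the groups are never read, which is why other answers call the parentheses unnecessary.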

Your expression matches any string in which the word Tutorials is preceded and followed by any characters. .* means any character occurring any number of times, including zero.
The . is a regular expression metacharacter that matches any single character (by default, any character except a line terminator).
The * is a regular expression quantifier that means zero or more occurrences of the expression it follows.

matches takes a regular expression string as its parameter, and (.*) means "capture any character, zero or more times, greedily".

.* means zero or more of any character; the parentheses around it make it a group.

In Regex:
.
Wildcard: Matches any single character except \n
for example pattern a.e matches ave in nave and ate in water
*
Matches the previous element zero or more times
for example pattern \d*\.\d matches .0, 19.9, 219.9
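The examples above can be checked with a short sketch. The helper findAll is a made-up name; it simply collects every substring that Matcher.find() reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardDemo {
    // Collects every non-overlapping substring of text that the regex matches.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a.e", "nave"));                  // [ave]
        System.out.println(findAll("a.e", "water"));                 // [ate]
        System.out.println(findAll("\\d*\\.\\d", ".0 19.9 219.9")); // [.0, 19.9, 219.9]
    }
}
```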

There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily match every character from the start to the end of the String (newlines excepted). Then it backtracks until it finds "Tutorials", after which it again matches any characters (except newlines).
It's better and clearer to use the find method. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());
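The two approaches are not fully interchangeable, because . does not match line terminators by default. A small sketch of the difference, with made-up helper names viaMatches and viaFind:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern WORD = Pattern.compile("Tutorials");

    // Whole-string match, as in the question.
    static boolean viaMatches(String s) {
        return s.matches("(.*)Tutorials(.*)");
    }

    // Substring search, as suggested in this answer.
    static boolean viaFind(String s) {
        return WORD.matcher(s).find();
    }

    public static void main(String[] args) {
        String oneLine = "Welcome to Tutorialspoint.com";
        String twoLines = "Welcome to\nTutorialspoint.com";
        System.out.println(viaMatches(oneLine) + " " + viaFind(oneLine));   // true true
        System.out.println(viaMatches(twoLines) + " " + viaFind(twoLines)); // false true
    }
}
```

For a fixed word with no regex syntax in it, String.contains("Tutorials") gives the same answer as find without any regex at all.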

Related

Java regular expressions for specific name\value format

I'm not yet familiar with Java regular expressions. I want to validate a string that has the following format:
String INPUT = "[name1 value1];[name2 value2];[name3 value3];";
namei and valuei are Strings that may contain any characters except whitespace.
I tried with this expression:
String REGEX = "([\\S*\\s\\S*];)*";
But when I call matches() I always get false, even for a valid String.
What's the best regular expression for it?
This does the trick:
(?:\[\w.*?\s\w.*?\];)*
If you want to only match three of these, replace the * at the end with {3}.
Explanation:
(?:: Start of non-capturing group
\[: Escapes the [ sign, which is a metacharacter in regex. This allows it to be matched literally.
\w.*?: Matches one word character ([a-zA-Z0-9_]), then lazily matches any further characters. Lazy matching means it matches as few characters as possible, so in this case it stops as soon as the following \s can match.
\s: Matches one whitespace
\]: See \[
;: Matches one semicolon
): End of non-capturing group
*: Matches any number of what is contained in the preceding non-capturing group.
See this link for demonstration
You should escape the square brackets; unescaped, they start a character class. Also, if your aim is to match exactly three entries, replace * with {3}. As a Java string literal:
"(\\[\\S*\\s\\S*\\];){3}"
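A minimal sketch of a validator along these lines. It assumes names and values must be non-empty (hence \S+ rather than \S*) and that any number of entries is acceptable; NameValueValidator and isValid are made-up names.

```java
import java.util.regex.Pattern;

class NameValueValidator {
    // \\[ and \\] match literal brackets; \\S+ is one or more non-whitespace
    // characters; \\s is the single whitespace between name and value.
    private static final Pattern FORMAT = Pattern.compile("(?:\\[\\S+\\s\\S+\\];)*");

    static boolean isValid(String input) {
        // matches() requires the whole input to fit the pattern.
        return FORMAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[name1 value1];[name2 value2];[name3 value3];")); // true
        System.out.println(isValid("[name1value1];"));  // false, no whitespace separator
        System.out.println(isValid("[name1 value1]"));  // false, missing semicolon
    }
}
```

Because of the trailing *, the empty string is also accepted; use + or {3} instead if at least one entry, or exactly three, is required.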

how to understand code like this pKataLengkap.replaceAll("(.)\\1+", "$1")

Can anyone explain what the code below means?
pKataLengkap.replaceAll("(.)\\1+", "$1")
I don't understand it. I got it from a link on code fight.
Thanks!
replaceAll replaces regular expressions (regexes). If you don't understand anything about regexes, you should read this tutorial. However, this particular regex is a bit on the tricky side, so I'll explain it. The regex is (.)\1+ (the backslash has to be doubled in a string literal, but the regex only has one backslash).
The first . matches any single character. Since it's in parentheses, the matcher treats this as a "capturing group"; since it's the first group in the regex, it's "capturing group 1". When a match is found (i.e. when the matcher finds any single character), the text of that match will be the capturing group. Thus, "capturing group 1" is that one character.
The next part is \1+. + is a quantifier meaning "one or more of whatever the + follows". \1 is a special pattern that means "whatever is in capturing group 1". So what this all means is that the pattern will match any single character followed by one or more occurrences of that same character. That is, it matches patterns with two or more occurrences of the same character.
Now each such pattern is replaced by "$1". The $1 is special in replaceAll, and it means "the contents of the capturing group 1", which is the single character that got matched.
So basically, any time the matcher sees two or more consecutive occurrences of the same character, it will replace them with one occurrence of that character. That is, it will transform "xxxyyyyyyzzz" to "xyz".
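A short sketch of that behaviour, wrapping the call from the question in a made-up helper named collapse:

```java
class CollapseRepeats {
    // Replaces each run of two or more identical characters with a single one.
    static String collapse(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("xxxyyyyyyzzz")); // xyz
        System.out.println(collapse("hello"));        // helo
        System.out.println(collapse("abc"));          // abc (nothing repeats)
    }
}
```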

How to use Java Regular Expressions to extract the following data?

How can I obtain the first long number from the whole line given below using a regular expression:
396124450036269056,"#Anyi1987 asi fue,bano total para mi.,:D",MiriamBustam
I want the result as: 396124450036269056.
So how do I represent the number in this whole sentence using regular expressions?
I am using Apache Pig scripting language which makes use of Java regular expressions.
So in Apache Pig:
REGEX_EXTRACT_ALL:
Syntax:
REGEX_EXTRACT_ALL (string, regex)
Use the REGEX_EXTRACT_ALL function to perform regular expression matching and to extract all matched groups.
This example will return the tuple (192.168.1.5,8020).
REGEX_EXTRACT_ALL('192.168.1.5:8020', '(.*)\:(.*)');
REGEX_EXTRACT:
Syntax:
REGEX_EXTRACT (string, regex, index).
Use the REGEX_EXTRACT function to perform regular expression matching and to extract the matched group defined by the index parameter (where the index is a 1-based parameter.)
This example will return the string '192.168.1.5'.
REGEX_EXTRACT('192.168.1.5:8020', '(.*):(.*)', 1);
\d+
Matches one or more consecutive digit characters.
So it matches 396124450036269056 in this case.
You don't need a regex here. You could use a substring().
s.substring(0, s.indexOf(","))
I don't think there is a regular expression that matches the longest number in a text.
Expressions like \d+ or \d* will match only the first number, no matter how many digits it has. So if you have "55 msadmmsada 8882138213821321382183", those expressions will match 55 only.
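If the longest number is what you want, one option is to loop over every \d+ match and keep the longest. This is a plain Java sketch rather than Pig, and the names LongestNumber, first and longest are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // The first run of digits, or null if there is none.
    static String first(String text) {
        Matcher m = DIGITS.matcher(text);
        return m.find() ? m.group() : null;
    }

    // The longest run of digits (the earliest one on ties), or null if there is none.
    static String longest(String text) {
        String best = null;
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            if (best == null || m.group().length() > best.length()) {
                best = m.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "55 msadmmsada 8882138213821321382183";
        System.out.println(first(text));   // 55
        System.out.println(longest(text)); // 8882138213821321382183
    }
}
```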
If your string always starts with a number, simply use (\d+) (see this at regex101).
This will extract all digits at the start of the input into a capturing group. So, if I understand your examples correctly,
REGEX_EXTRACT(you, '(\d+).*', 1);
would do the trick. You only have to append the .* if this function has to match the whole text to extract something; otherwise you can omit it.
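Since Pig uses Java regular expressions, the same idea can be tried in plain Java. The sketch below is not Pig code; it compares an anchored (\d+) with the substring approach mentioned earlier, using made-up helper names viaRegex and viaSubstring.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumber {
    private static final Pattern LEADING_DIGITS = Pattern.compile("^(\\d+)");

    // Digits at the very start of the line, or null if the line starts with something else.
    static String viaRegex(String line) {
        Matcher m = LEADING_DIGITS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Everything before the first comma; the whole line if there is no comma.
    static String viaSubstring(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    public static void main(String[] args) {
        String line = "396124450036269056,\"#Anyi1987 asi fue,bano total para mi.,:D\",MiriamBustam";
        System.out.println(viaRegex(line));     // 396124450036269056
        System.out.println(viaSubstring(line)); // 396124450036269056
    }
}
```

The substring version does not check that the first field is numeric, so the two differ on input whose first field contains other characters.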
You could use:
\d*
and it will match 396124450036269056
Explanation:
\d* match a digit [0-9]
Quantifier: * Between zero and unlimited times
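One caveat with \d*: because it allows zero digits, it can succeed with an empty match when the text does not start with a digit. A small sketch of the difference from \d+, using a made-up helper firstMatch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVsPlus {
    // The first substring the regex finds in text, or null if it finds nothing.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d*", "396124450036269056,rest")); // 396124450036269056
        System.out.println("[" + firstMatch("\\d*", "id 42") + "]");       // [] (empty match at index 0)
        System.out.println(firstMatch("\\d+", "id 42"));                   // 42
    }
}
```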

simple java regular expression not working

I have this simple example of a regular expression. But it is not working. I don't know what I am doing wrong:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]"));
It returns false, but it should be true.
Use:
name.matches("[a-zA-Z]+") // matches one or more letters
or name.matches("\\w+") // matches one or more word characters (letters, digits, underscore)
name.matches("[a-zA-Z]") // matches exactly one letter
Add + to your regex to match one or more letters:
String name = "abc";
System.out.println(name.matches("[a-zA-Z]+"));
Your regex [a-zA-Z] matches exactly one letter, not more than one.
[a-zA-Z] matches one lowercase letter from a-z or one uppercase letter from A-Z.
The reason this evaluates to false is that it tries to match the entire string (see the documentation of String.matches()) against the pattern [A-Za-z], which only matches a single character. Either use
Pattern.compile("[A-Za-z]").matcher(str).find() to see whether a substring matches (this returns true in this case), or alter the regex to account for multiple characters. The cleanest way of doing so is
Pattern.compile("^[A-Za-z]+$");
The ^ marks "start of string" and $ marks "end of string". + means "previous token at least once".
If you want to allow the empty String as well, use
Pattern.compile("^[A-Za-z]*$");
instead (* means "match the previous token 0 or more times")
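The three behaviours discussed in this thread can be compared side by side. This is a minimal sketch with made-up method names.

```java
import java.util.regex.Pattern;

class LettersOnly {
    private static final Pattern ONE_LETTER = Pattern.compile("[A-Za-z]");

    // True only for a string that is exactly one letter.
    static boolean wholeStringIsOneLetter(String s) {
        return s.matches("[a-zA-Z]");
    }

    // True for a non-empty string made up of letters only.
    static boolean wholeStringIsLetters(String s) {
        return s.matches("[a-zA-Z]+");
    }

    // True if a letter occurs anywhere in the string.
    static boolean containsLetter(String s) {
        return ONE_LETTER.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringIsOneLetter("abc")); // false
        System.out.println(wholeStringIsLetters("abc"));   // true
        System.out.println(containsLetter("abc"));         // true
        System.out.println(wholeStringIsLetters("ab1"));   // false
    }
}
```

String.matches() already anchors at both ends, so ^ and $ are only needed when the pattern is used with find().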
Try with [a-zA-Z]+
[a-zA-Z] indicates:

What is this Java regex code doing?

I just found this method inside a "Utils"-type class in our codebase. It was written a long time ago by a developer who no longer works for us. What in tarnation is it doing? What is it returning?!? Of course, there's no JavaDocs or comments.
public static String stripChars(String toChar, String ptn){
    String stripped = "";
    stripped = toChar.replaceAll(ptn, "$1");
    return stripped.trim();
}
Thanks in advance!
It's a very short alias, essentially. This:
stripChars(a, b)
Is equivalent to:
a.replaceAll(b, "$1").trim()
It seems to replace everything in "toChar" which matches the regular expression "ptn" with the text captured by the first group of that match.
Regular expressions have a concept of groups. For example, changing "year 2012" to "year 1012", or "year 2006" to "year 1006" (changing the leading 20 to 10), can be accomplished by replacing
"year 20([0-9][0-9])" with "year 10$1". That is, match the entire string, and then replace it with "year 10" followed by the first group ($1). The group is the first thing in parentheses.
Anyway, your method replaces everything that matches "ptn" in "toChar" with the first group in the regular expression "ptn". So given
stripChars("year 2012", "year 20([0-9][0-9])"); you would receive back only "12", because the entire text would match and be replaced by only the first group.
It then strips any leading or trailing whitespace.
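To see the method in action, here is a sketch that copies stripChars from the question into a made-up class and calls it with two illustrative patterns. The patterns are assumptions; the real ones used in the codebase are unknown.

```java
class StripCharsDemo {
    // Copied from the question: replace each match of ptn with its first group, then trim.
    public static String stripChars(String toChar, String ptn) {
        String stripped = "";
        stripped = toChar.replaceAll(ptn, "$1");
        return stripped.trim();
    }

    public static void main(String[] args) {
        // The whole input matches, so only the captured "12" remains.
        System.out.println(stripChars("year 2012", "year 20([0-9][0-9])")); // 12
        // Only "id=42" matches; the surrounding spaces are removed by trim().
        System.out.println(stripChars("  id=42  ", "id=(\\d+)"));           // 42
    }
}
```

If ptn contains no capturing group, replaceAll throws an IndexOutOfBoundsException as soon as a match is found, so every caller must pass a pattern with at least one group.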
The pattern string passed as an argument to the method seems to contain a capturing group, and the call to replaceAll is going to replace each entire match of the pattern with the portion that matched the first group. You should look at the call hierarchy of this method to find some of the regexes passed to it, along with the strings they are applied to.
It's just replacing a string with a subset of its own matched characters and then trimming the whitespace from both ends.
For example, if you want a word to be replaced by the series of digits inside that word,
use the regex \b[a-zA-Z]*(\d+)[a-zA-Z]*\b
and your replaceAll call will give these results:
hey123wow->123
what666->666
how888->888
$0 refers to the whole matched string, i.e. hey123wow, what666, how888 in this example.
$1 refers to the first group, i.e. (\d+) in this example, which holds 123, 666, 888.
$2 would refer to the second group, which does not exist in this example.
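One pattern that produces the results listed above is sketched below. The pattern and the names DigitsOfWord and digitsOnly are chosen here for illustration, assuming words consist of ASCII letters around a single run of digits.

```java
class DigitsOfWord {
    // Letters, then a captured run of digits, then letters, all within one word.
    private static final String WORD_WITH_DIGITS = "\\b[a-zA-Z]*(\\d+)[a-zA-Z]*\\b";

    // Replaces each word that contains digits with just those digits.
    static String digitsOnly(String text) {
        return text.replaceAll(WORD_WITH_DIGITS, "$1");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("hey123wow")); // 123
        System.out.println(digitsOnly("hey123wow what666 how888")); // 123 666 888
        System.out.println(digitsOnly("hello")); // hello (no digits, so no match)
    }
}
```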
toChar.replaceAll(ptn, "$1");
It's replacing all the occurrences of ptn in toChar with the captured group $1; we can't tell what that group is without seeing the pattern.
Capture groups are the parts of a pattern inside parentheses ().
For example, in the regex below:
"(\\d+)(cd)"
$0 denotes the complete match
$1 denotes the first capture group (\\d+)
$2 denotes the second capture group (cd)
String str1 = "xyz12cd";
// This will replace `12cd` with the first capture group `12`
str1 = str1.replaceAll("(\\d+)(cd)", "$1");
System.out.println(str1);
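The same pattern can be used to try the other group references. A small sketch, with rewrite as a made-up helper:

```java
class GroupReferences {
    // Applies the two-group pattern from above with the given replacement string.
    static String rewrite(String input, String replacement) {
        return input.replaceAll("(\\d+)(cd)", replacement);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("xyz12cd", "$1"));   // xyz12
        System.out.println(rewrite("xyz12cd", "$2"));   // xyzcd
        System.out.println(rewrite("xyz12cd", "$2$1")); // xyzcd12
        System.out.println(rewrite("xyz12cd", "[$0]")); // xyz[12cd]
    }
}
```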
To learn more about regular expressions, you can refer to the following links:
http://www.vogella.com/articles/JavaRegularExpressions/article.html
http://docs.oracle.com/javase/tutorial/essential/regex/
