Multiple Regular Expressions - java

I'm not used to regular expressions and I'm having trouble with Java's String.matches syntax.
I have two files: one is 111.123.399.555.xml, the other is Conf.xml.
I want to match only the first file with a regular expression.
string.matches("[1-9[xml[.]]]");
doesn't work.
How can I do this?

string.matches("[1-9[xml[.]]]") will not work because [] creates a character class, not a capturing group.
To Java, your expression says "match one character that is any of: [1 to 9 [or x, or m, or l [or a literal .]]]". Inside a character class the . is an ordinary dot, not "any character", and the nested brackets are simply merged into one class (a union). Because matches() requires the whole string to match, this pattern only accepts strings that are exactly one such character.
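To see what the pattern from the question actually accepts, here is a minimal sketch. The class and method names (CharClassUnionDemo, matchesQuestionPattern) are made up for illustration; only String.matches comes from the JDK.

```java
class CharClassUnionDemo {
    // The pattern from the question: one character class built from nested classes.
    static final String QUESTION_PATTERN = "[1-9[xml[.]]]";

    // Hypothetical helper: String.matches requires the WHOLE input to match.
    static boolean matchesQuestionPattern(String input) {
        return input.matches(QUESTION_PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {"5", "x", ".", "0", "xml", "111.123.399.555.xml"};
        for (String s : samples) {
            System.out.println(s + " -> " + matchesQuestionPattern(s));
        }
    }
}
```

Only the single-character inputs "5", "x" and "." should come back true; anything longer than one character fails, which is why the file name is rejected.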
Important:
"\" is treated by Java as the escape character in string literals, so for the matcher to receive its own escape character (also "\"), the backslash itself has to be escaped. When you mean "\" in the regex, you must write "\\" in the Java source.
This is a bit confusing when you are not used to it. To sum up: to match an actual "\" character in the input, you have to write "\\\\"! The first "\\" becomes "\" for the matcher, i.e. an escape character, and the second "\\", escaped by the first, becomes the literal "\" to be matched.
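The doubling rule is easier to see side by side. This is only a sketch with made-up names (BackslashEscapeDemo and its three helpers); the comments show the regex the matcher receives after the Java compiler has processed the string literal.

```java
class BackslashEscapeDemo {
    // Java literal "1\\.5" is the regex 1\.5 : '1', a literal dot, '5'.
    static boolean literalDot(String input) {
        return input.matches("1\\.5");
    }

    // Java literal "1.5" is the regex 1.5 : the unescaped dot matches any single character.
    static boolean anyChar(String input) {
        return input.matches("1.5");
    }

    // Java literal "a\\\\b" is the regex a\\b : 'a', one literal backslash, 'b'.
    static boolean literalBackslash(String input) {
        return input.matches("a\\\\b");
    }

    public static void main(String[] args) {
        System.out.println(literalDot("1.5"));         // true
        System.out.println(literalDot("1x5"));         // false
        System.out.println(anyChar("1x5"));            // true
        System.out.println(literalBackslash("a\\b"));  // true: the input is a, backslash, b
    }
}
```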
The correct pattern string for a ###.###.###.###.xml name, where each "#" is always a digit, is string.matches("(\\d{3}\\.){4}xml"). It works as follows:
\\d matches a single digit character. It is the same as using [0-9], just simpler.
{3} specifies "exactly 3 times" for the preceding \\d, thus matching ###
\\. matches a single dot character.
The () enclosing the previous parts tells the matcher "this is a capturing group". It is used by the {4} that follows, which means "match this whole ###. group exactly 4 times", thus matching "###.###.###.###.".
Finally, the xml at the end of the pattern string matches exactly "xml", which, together with the previous items, gives an exact match for the pattern "###.###.###.###.xml".
For further learning, read Java's Pattern docs.
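A quick way to check the pattern against the two file names from the question is sketched below. FileNamePatternDemo and isNumericXml are illustrative names only; the pattern is precompiled so it can be reused across many file names.

```java
import java.util.regex.Pattern;

class FileNamePatternDemo {
    // Four groups of exactly three digits, each followed by a dot, then "xml".
    private static final Pattern NUMERIC_XML = Pattern.compile("(\\d{3}\\.){4}xml");

    // Hypothetical helper: true only if the whole file name fits the pattern.
    static boolean isNumericXml(String fileName) {
        return NUMERIC_XML.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNumericXml("111.123.399.555.xml")); // true
        System.out.println(isNumericXml("Conf.xml"));            // false
        System.out.println(isNumericXml("11.123.399.555.xml"));  // false: first group has two digits
    }
}
```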

string.matches("[1-9.]+\\.xml")
should do it.
[1-9.]+ matches one or more digits between 1 and 9 and/or periods. (+ means "one or more", * means "zero or more", ? means "zero or one").
\.xml matches .xml. Since . means "any character" in a regex, you need to escape it if you want it to mean a literal period: \. (and since this is in a Java string, the backslash itself needs to be escaped by doubling).
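For comparison, here is a small sketch of this looser pattern. The names (LooseXmlPatternDemo, looksNumeric) are invented for the example. Note two consequences of the character class [1-9.]: a name containing the digit 0 is rejected, and a name made only of dots is accepted.

```java
class LooseXmlPatternDemo {
    // Hypothetical helper wrapping the pattern from this answer.
    static boolean looksNumeric(String fileName) {
        return fileName.matches("[1-9.]+\\.xml");
    }

    public static void main(String[] args) {
        System.out.println(looksNumeric("111.123.399.555.xml")); // true
        System.out.println(looksNumeric("Conf.xml"));            // false
        System.out.println(looksNumeric("100.200.xml"));         // false: 0 is not in [1-9.]
        System.out.println(looksNumeric("....xml"));             // true: dots alone satisfy [1-9.]+
    }
}
```

If zeros can occur in the file names, [0-9.]+ (or the stricter pattern from the other answer) is probably closer to what is wanted.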

Related

Regex to find decimal or non decimal number on same or next line in java

I have the following text
My thing 0.02
My thing 100.2
My thing 65
My thing
0.03
My thing
13
My thing
45.67 stuff
I want to extract the 'My thing' label and the number associated with it, so that I can split them and put them into a map. (I know the keys will overwrite each other in this example; it's just the example I'm using here. 'My thing' will actually be incorporated into its own map, so that isn't an issue.)
Mything=0.02,Mything=100.2,Mything=65,Mything=0.03,Mything=13,Mything=45.67
I tried
Pattern match_pattern = Pattern.compile(start.trim()+"\\n.*?\\d*\\.\\d*\\s",Pattern.DOTALL);
but this doesn't quite do what I want
The pattern for an integer or decimal might be \d+(\.\d+)?, so if you want to look for start followed by that number, with optional whitespace in between, you might try the pattern start + "\\s*\\d+(\\.\\d+)?" (line breaks are whitespace as well) and apply it to the whole multi-line text (i.e. don't apply it to individual lines). If there can be anything in between (not just whitespace), you'll want to use .* along with the Pattern.DOTALL flag instead of \s*.
Breakdown of the expression start + "\\s*\\d+(\\.\\d+)?"
start contains a subexpression which is provided from elsewhere. If you want to make sure it is treated as a literal (i.e. special characters like * are not interpreted), wrap it in \Q and \E, i.e. "\\Q" + start + "\\E".
\s* (or \\s* in a Java string literal) means "zero or more whitespace characters", which also includes line breaks.
\d+(\.\d+)? (or \\d+(\\.\\d+)? in a Java string literal) means "one or more digits followed by zero or one group consisting of a dot and one or more digits" - this means the "dot and one or more digits" part is optional but if there is a dot it must be followed by at least one digit.
Additional note: if you want to access the capturing groups e.g. to extract the number you'll want to use a non-capturing group for the optional part and wrap the entire (sub-)expression in a capturing group, e.g. (\d+(?:\.\d+)?). In that case, if you'd use Pattern and Matcher, you could access the number using group(1) - or if you wrap start in a group as well (like "(\\Q" + start + "\\E)\\s*(\\d+(?:\\.\\d+)?)") you'd get the first part as group(1) and the second part as group(2).
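Putting the pieces of the breakdown together might look like the sketch below. LabelNumberDemo and numbersAfter are hypothetical names, and the sample text is the one from the question. It assumes start never contains the sequence \E; if it might, Pattern.quote(start) is the safer way to build the literal part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelNumberDemo {
    // Hypothetical helper: collects every number that follows the literal label `start`.
    static List<String> numbersAfter(String start, String text) {
        // Group 1 is the quoted label, group 2 the integer or decimal number.
        Pattern p = Pattern.compile("(\\Q" + start + "\\E)\\s*(\\d+(?:\\.\\d+)?)");
        Matcher m = p.matcher(text);
        List<String> numbers = new ArrayList<>();
        while (m.find()) {
            numbers.add(m.group(2));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String text = "My thing 0.02\nMy thing 100.2\nMy thing 65\nMy thing\n"
                + "0.03\nMy thing\n13\nMy thing\n45.67 stuff\n";
        // Prints [0.02, 100.2, 65, 0.03, 13, 45.67]
        System.out.println(numbersAfter("My thing", text));
    }
}
```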
If you simply want to extract the records you could do it like
String s = "My thing 0.02\nMy thing 100.2\nMy thing 65\nMy thing\n"+
"0.03\nMy thing\n13\nMy thing\n 45.67 stuff\n";
Matcher m = Pattern.compile("(My thing)\\s*(\\d+(?:\\.\\d+)?)").matcher(s);
Then loop through the matches and add to the dictionary, or what ever... ;)
while (m.find()) {
// Add to dictionary, group 1 is key, 2 is value
System.out.println("Found: " + m.group(0)+ ":" + m.group(1)+":" + m.group(2));
}
See it here at ideone.

how to understand code like this pKataLengkap.replaceAll("(.)\\1+", "$1")

Can anyone explain what the code below means?
pKataLengkap.replaceAll("(.)\\1+", "$1")
I don't understand it. I got it as a reference from a link on code fight.
Thanks!
replaceAll replaces every match of a regular expression (regex) with a replacement string. If you don't understand anything about regexes, you should read this tutorial. However, this particular regex is a bit on the tricky side, so I'll explain it. The regex is (.)\1+ (the backslash has to be doubled in a string literal, but the regex only has one backslash).
The first . matches any single character. Since it's in parentheses, the matcher treats this as a "capturing group"; since it's the first group in the regex, it's "capturing group 1". When a match is found (i.e. when the matcher finds any single character), the text of that match will be the capturing group. Thus, "capturing group 1" is that one character.
The next part is \1+. + is a quantifier meaning "one or more of whatever the + follows". \1 is a special pattern that means "whatever is in capturing group 1". So what this all means is that the pattern will match any single character followed by one or more occurrences of that same character. That is, it matches patterns with two or more occurrences of the same character.
Now each such pattern is replaced by "$1". The $1 is special in replaceAll, and it means "the contents of the capturing group 1", which is the single character that got matched.
So basically, any time the matcher sees two or more consecutive occurrences of the same character, it will replace them with one occurrence of that character. That is, it will transform "xxxyyyyyyzzz" to "xyz".
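The behaviour described above can be tried with a few inputs. This is just a sketch; CollapseRepeatsDemo and collapseRepeats are names invented for the example, wrapping the exact call from the question.

```java
class CollapseRepeatsDemo {
    // Hypothetical helper: replaces each run of a repeated character with one copy of it.
    static String collapseRepeats(String input) {
        // (.) captures one character; \1+ requires one or more repeats of that same character.
        return input.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("xxxyyyyyyzzz")); // xyz
        System.out.println(collapseRepeats("hello"));        // helo
        System.out.println(collapseRepeats("abc"));          // abc (no runs, nothing replaced)
    }
}
```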

How to replace strings using java String.replaceAll() excluding some patterns?

I am using String.replaceAll to replace a forward slash /, optionally followed or preceded by a space, with a comma followed by a space ", ", EXCEPT in some patterns (for example, n/v and n/d should not be affected).
ALL the following inputs
"nausea/vomiting"
"nausea /vomiting"
"nausea/ vomiting"
"nausea / vomiting"
Should be outputted as
nausea, vomiting
HOWEVER ALL the following inputs
"user have n/v but not other/ complications"
"user have n/d but not other / complications"
Should be outputted as follows
"user have n/v but not other, complications"
"user have n/d but not other, complications"
I have tried
String source = "nausea/vomiting";
String regex = "([^n/v])(\\s*/\\s*)";
String result = source.replaceAll(regex, ", ");
But it cuts the a before / and gives me nause , vomiting
Does anybody know a solution?
Your first capturing group, ([^n/v]), captures any single character that is not the letter n, the letter v, or a slash (/). In this case, it's matching the a at the end of nausea and capturing it to be replaced.
You need to be a bit clearer about what you are and are not replacing here. Do you just want a comma wherever the slash is not part of something like n/v or n/d? You can use lookahead assertions to express this:
(?=asdf) consumes no characters, but when placed at the end of a pattern it ensures that the text right after the match is asdf; (?!asdf) ensures that it is not. In both cases, the text the lookahead examines is not part of the match, so it will not be returned or replaced when the match is found.
Also, do not forget that in Java source you must always double up any backslashes you put in string literals.
[^n/v] is a character class, and means any single character except an n, a / or a v.
You are probably looking for something like a negative lookbehind:
String regex= "(?<!\\bn)(\\s*/\\s*)";
This will match any of your slash and space combinations that are not preceded by just an n, and works for all your examples. You can read more on lookaround here.
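Here is a small sketch that runs the lookbehind regex over the inputs from the question. SlashToCommaDemo and slashesToCommas are illustrative names, and the capturing parentheses are dropped because the replacement does not use the group.

```java
class SlashToCommaDemo {
    // Hypothetical helper: a slash with optional surrounding whitespace becomes ", ",
    // unless the match is immediately preceded by a standalone letter n (as in n/v, n/d).
    static String slashesToCommas(String source) {
        return source.replaceAll("(?<!\\bn)\\s*/\\s*", ", ");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "nausea/vomiting",
            "nausea /vomiting",
            "nausea/ vomiting",
            "nausea / vomiting",
            "user have n/v but not other/ complications",
            "user have n/d but not other / complications"
        };
        for (String input : inputs) {
            System.out.println(slashesToCommas(input));
        }
    }
}
```

Keep in mind that the lookbehind only protects abbreviations that start with a lone n. Something like w/o would still be turned into "w, o", so other exceptions would need to be added to the lookbehind.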

Java regex "[.]" vs "."

I'm trying to use some regex in Java and I came across this when debugging my code.
What's the difference between [.] and .?
I was surprised that .at would match "cat" but [.]at wouldn't.
[.] matches a dot (.) literally, while . matches any character except line terminators such as \n (unless you use DOTALL mode).
You can also use \. ("\\." if you use java string literal) to literally match dot.
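The difference is easy to confirm with the strings from the question. A minimal sketch follows; the class and method names are made up for illustration.

```java
class DotVersusBracketDemo {
    // Regex .at : any one character (except a line terminator) followed by "at".
    static boolean anyCharAt(String s) {
        return s.matches(".at");
    }

    // Regex [.]at : a literal dot followed by "at".
    static boolean bracketDotAt(String s) {
        return s.matches("[.]at");
    }

    // Regex \.at : the same literal dot, written with an escape.
    static boolean escapedDotAt(String s) {
        return s.matches("\\.at");
    }

    public static void main(String[] args) {
        System.out.println(anyCharAt("cat"));     // true
        System.out.println(bracketDotAt("cat"));  // false
        System.out.println(bracketDotAt(".at"));  // true
        System.out.println(escapedDotAt(".at"));  // true
        System.out.println(anyCharAt("\nat"));    // false: . does not match \n by default
    }
}
```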
The [ and ] are metacharacters that let you define a character class. Most characters enclosed in square brackets, including the dot, are interpreted literally. You can include multiple characters as well:
[.=*&^$] // Matches any single character from the list '.','=','*','&','^','$'
There are two specific things you need to know about the [...] syntax:
The ^ symbol at the beginning of the class has a special meaning: it inverts what's matched by the class. For example, [^.] matches any character except a dot .
Dash - in between two characters means any code point between the two. For example, [A-Z] matches any single uppercase letter. You can use dash multiple times - for example, [A-Za-z0-9] means "any single upper- or lower-case letter or a digit".
The two constructs above (^ and -) are common to nearly all regex engines; some engines (such as Java's) define additional syntax specific only to these engines.
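The three kinds of character class mentioned above can be tried out as follows. This is only a sketch; CharClassSyntaxDemo and its helpers are hypothetical names, and the + after the range class is added here so that whole words can be tested.

```java
class CharClassSyntaxDemo {
    // [.=*&^$] : exactly one of the listed symbols, all taken literally.
    static boolean isListedSymbol(String s) {
        return s.matches("[.=*&^$]");
    }

    // [^.] : exactly one character that is not a dot.
    static boolean isSingleNonDot(String s) {
        return s.matches("[^.]");
    }

    // [A-Za-z0-9]+ : one or more ASCII letters or digits.
    static boolean isAlphanumeric(String s) {
        return s.matches("[A-Za-z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isListedSymbol("$"));      // true
        System.out.println(isListedSymbol("a"));      // false
        System.out.println(isSingleNonDot("a"));      // true
        System.out.println(isSingleNonDot("."));      // false
        System.out.println(isAlphanumeric("abc123")); // true
        System.out.println(isAlphanumeric("abc-123")); // false: '-' is not in the class
    }
}
```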
regular-expression constructs
. => Any character (may or may not match line terminators)
and to match the dot . use the following
[.] => matches a literal dot
\\. => matches a literal dot (as written in a Java string literal)
NOTE: Character classes in Java regular expressions are defined using square brackets "[ ]"; such a subexpression matches a single character from the specified set of possible characters.
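Example: replace every "." in the string address with "[.]"
public class ReplaceDots {
    public static void main(String[] args) {
        String address = "1.1.1.1";
        // The first argument is a regex (a literal dot); the second is plain replacement text.
        // Prints 1[.]1[.]1[.]1
        System.out.println(address.replaceAll("[.]", "[.]"));
    }
}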
if anything is missed please add :)

What is the responsibility of (.*) in the Java String?

What is the responsibility of (.*) in the third line and how it works?
String Str = new String("Welcome to Tutorialspoint.com");
System.out.print("Return Value :" );
System.out.println(Str.matches("(.*)Tutorials(.*)"));
.matches() tests whether the whole of Str matches the regex provided.
Regexes, or regular expressions, are a way of describing patterns in strings. In the example provided, the pattern matches any string which contains the word "Tutorials". (.*) simply means "a group of zero or more of any character".
This page is a good regex reference (for very basic syntax and examples).
Your expression matches any string in which the word Tutorials is preceded and followed by any characters. .* means any character occurring any number of times, including zero times.
The . represents regular expression meta-character which means any character.
The * is a regular expression quantifier, which means 0 or more occurrences of the expression character it was associated with.
matches takes regular expression string as parameter and (.*) means capture any character zero or more times greedily
.* means a group of zero or more of any character
In regex:
. is a wildcard: it matches any single character except \n. For example, the pattern a.e matches ave in nave and ate in water.
* matches the previous element zero or more times. For example, the pattern \d*\.\d matches .0, 19.9 and 219.9.
There is no reason to put parentheses around the .*, nor is there a reason to instantiate a String if you've already got a literal String. But worse is the fact that the matches() method is out of place here.
What it does is greedily match any character from the start to the end of the String. It then backtracks until it finds "Tutorials", after which it again matches any characters (except newlines).
It's better and more clear to use the find method. The find method simply finds the first "Tutorials" within the String, and you can remove the "(.*)" parts from the pattern.
As a one liner for convenience:
System.out.printf("Return value : %b%n", Pattern.compile("Tutorials").matcher("Welcome to Tutorialspoint.com").find());
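To make the contrast between matches() and find() concrete, here is a short sketch using the string from the question. The class and method names are invented for the example; it also shows what the two (.*) groups capture when the full pattern is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVersusFindDemo {
    static final String TEXT = "Welcome to Tutorialspoint.com";

    // String.matches succeeds only if the regex covers the entire string.
    static boolean wholeStringMatches(String regex) {
        return TEXT.matches(regex);
    }

    // Matcher.find succeeds if the regex matches anywhere inside the string.
    static boolean containsMatch(String regex) {
        return Pattern.compile(regex).matcher(TEXT).find();
    }

    // Returns the text captured by the two (.*) groups, or an empty array if there is no match.
    static String[] surroundingText() {
        Matcher m = Pattern.compile("(.*)Tutorials(.*)").matcher(TEXT);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("Tutorials"));          // false
        System.out.println(wholeStringMatches("(.*)Tutorials(.*)"));  // true
        System.out.println(containsMatch("Tutorials"));               // true
        String[] parts = surroundingText();
        System.out.println("[" + parts[0] + "] [" + parts[1] + "]");  // [Welcome to ] [point.com]
    }
}
```

So the parentheses only matter if you actually read the groups afterwards; for a plain "does it contain this word" check, find() with the bare word is enough.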
