I'm trying to parse a simple DDL statement. First I'm trying to pull the table name out.
The syntax will be something like 'CREATE TABLE DB_NAME.TABLE_NAME'
So far I've got this:
String line = "CREATE TABLE DB_NAME.T_NAME";
String pattern = ".*?\\bTABLE\\s+(\\w+)\\b.*";
System.out.println(line.replaceFirst(pattern, "$1"));
That gives me back "DB_NAME". How can I get it to give me back "T_NAME"?
I tried following the update in this answer, but I couldn't get it to work, probably due to my very limited regex skills.
What about something like this:
.*?\\bTABLE\\s+\\w+\\.(\\w+)\\b.*
It first matches the TABLE keyword with .*?\\bTABLE\\s+, then it matches DB_NAME. with \\w+\\., and finally it matches and captures T_NAME with (\\w+).
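A minimal runnable sketch of that pattern, assuming you would rather get null back than the unchanged line when there is no "db.table" part (replaceFirst returns its input as-is when the pattern does not match). The class and method names are hypothetical, and it uses a Matcher with matches() and group(1) instead of replaceFirst so that a mismatch can be detected:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DdlTableName {
    // Skips up to "TABLE", consumes "DB_NAME." and captures the name after the dot
    static final Pattern TABLE_AFTER_DB =
            Pattern.compile(".*?\\bTABLE\\s+\\w+\\.(\\w+)\\b.*");

    // Hypothetical helper: returns the table name, or null if there is no "TABLE db.table"
    static String tableName(String line) {
        Matcher matcher = TABLE_AFTER_DB.matcher(line);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(tableName("CREATE TABLE DB_NAME.T_NAME"));          // T_NAME
        System.out.println(tableName("CREATE TABLE DB_NAME.T_NAME (id INT)")); // T_NAME
        System.out.println(tableName("CREATE TABLE T_NAME"));                  // null
    }
}
```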
Here's a small piece of code that will do it (using named capturing groups, available since Java 7):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "CREATE TABLE DB_NAME.T_NAME";
Pattern pattern = Pattern.compile("CREATE TABLE (?<database>\\w+)\\.(?<table>\\w+)");
Matcher matcher = pattern.matcher(line);
if (matcher.matches()) {
    String database = matcher.group("database"); // DB_NAME
    String table = matcher.group("table"); // T_NAME
}
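If the database prefix is sometimes missing, one option (an assumption about your input, not something the question requires) is to make the "database." part optional with (?:...)?. A named group that did not take part in the match then returns null. The class and helper names below are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QualifiedTableName {
    // "(?:(?<database>\w+)\.)?" makes the "database." prefix optional
    static final Pattern CREATE_TABLE =
            Pattern.compile("CREATE TABLE (?:(?<database>\\w+)\\.)?(?<table>\\w+)");

    // Hypothetical helper: returns {database, table}; database is null when absent,
    // and the whole result is null when the line is not exactly "CREATE TABLE [db.]name"
    static String[] parse(String line) {
        Matcher matcher = CREATE_TABLE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        // group(name) returns null for a group that did not participate in the match
        return new String[] {matcher.group("database"), matcher.group("table")};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("CREATE TABLE DB_NAME.T_NAME"))); // [DB_NAME, T_NAME]
        System.out.println(Arrays.toString(parse("CREATE TABLE T_NAME")));         // [null, T_NAME]
    }
}
```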
You may extract everything after TABLE into a group and then split it on the dot to get the individual values:
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "CREATE TABLE DB_NAME.T_NAME";
String pattern = "\\bTABLE\\s+(\\w+(?:\\.\\w+)*)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(line);
if (m.find()) {
    System.out.println(Arrays.toString(m.group(1).split("\\.")));
    // => [DB_NAME, T_NAME]
}
If you are sure of the incoming format of the string, you might even use
"\\bTABLE\\s+(\\S+)"
While \w+(?:\.\w+)* matches one or more word characters followed by zero or more repetitions of a dot plus one or more word characters, \S+ simply matches one or more non-whitespace characters.
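To see why \S+ is only safe when you are sure of the format, here is a small sketch that runs both patterns on a hypothetical statement with no space before the column list (the input and class name are illustrative assumptions):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DottedNamePatterns {
    // Word-character runs joined by dots: stops at the first other character
    static final Pattern WORD_PARTS = Pattern.compile("\\bTABLE\\s+(\\w+(?:\\.\\w+)*)");
    // Everything up to the next whitespace, whatever it is
    static final Pattern NON_SPACE = Pattern.compile("\\bTABLE\\s+(\\S+)");

    // Returns group 1 of the first match, or null if the pattern does not occur
    static String firstGroup(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // No space before the column list, so the non-whitespace pattern runs into "(id"
        String ddl = "CREATE TABLE DB_NAME.T_NAME(id INT)";
        System.out.println(firstGroup(WORD_PARTS, ddl)); // DB_NAME.T_NAME
        System.out.println(firstGroup(NON_SPACE, ddl));  // DB_NAME.T_NAME(id
    }
}
```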
I have a string email = John.Mcgee.r2d2#hitachi.com
How can I write Java code using a regex to extract just the r2d2?
I used this but got an error in Eclipse:
String email = John.Mcgee.r2d2#hitachi.com
Pattern pattern = Pattern.compile(".(.*)\#");
Matcher matcher = patter.matcher
for (Strimatcher.find()){
System.out.println(matcher.group(1));
}
To match after the last dot in a possible sequence of several dots, require that the sequence you capture does not contain a dot:
(?<=[.])([^.]*)(?=#)
(?<=[.]) means "preceded by a single dot"
(?=#) means "followed by # sign"
Note that since the dot . is a metacharacter, it needs to be escaped either with \ (doubled in a Java string literal) or by putting it inside square brackets.
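A small runnable sketch of the lookaround pattern above. The class and helper names are hypothetical, and returning null when no dot-delimited segment precedes the # is a design choice of the sketch:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastSegmentBeforeHash {
    // (?<=[.]) preceded by a dot; ([^.]*) no dots inside; (?=#) followed by '#'
    static final Pattern SEGMENT = Pattern.compile("(?<=[.])([^.]*)(?=#)");

    // Returns the text between the last dot before '#' and the '#', or null if there is none
    static String extract(String email) {
        Matcher matcher = SEGMENT.matcher(email);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("John.Mcgee.r2d2#hitachi.com")); // r2d2
        System.out.println(extract("r2d2#hitachi.com"));            // null
    }
}
```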
I'm not sure you're posting the right code, but here is roughly what it should look like:
String email = "John.Mcgee.r2d2#hitachi.com";
// .*\\. greedily runs up to the last dot that is still followed by some text and a #
Pattern pattern = Pattern.compile(".*\\.(.*)#");
Matcher matcher = pattern.matcher(email);
while (matcher.find()) {
    // group 1 is always the same capturing group; its number does not grow with each match
    System.out.println(matcher.group(1));
}
But since there is only one address, I think you just want something like this:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile(".*\\.(.*)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // r2d2
}
There's no need for Pattern here; you just need replaceAll with the regex .*\.([^\.]+)#.*, where the group ([^\.]+) (one or more characters other than a dot) captures the text between a dot \. and #:
email = email.replaceAll(".*\\.([^\\.]+)#.*", "$1");
Output
r2d2
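One thing to be aware of with the replaceAll version: when the regex does not match (for example, an address with no dot before the #), replaceAll silently returns the input unchanged. A small sketch with a hypothetical localPartSuffix helper that checks String.matches first:

```java
final class ReplaceAllCaveat {
    static final String REGEX = ".*\\.([^\\.]+)#.*";

    // Hypothetical helper: returns the segment between the last dot and '#', or null
    static String localPartSuffix(String email) {
        // Without this check a non-matching input would come back unchanged
        if (!email.matches(REGEX)) {
            return null;
        }
        return email.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        System.out.println("r2d2#hitachi.com".replaceAll(REGEX, "$1")); // r2d2#hitachi.com (unchanged)
        System.out.println(localPartSuffix("John.Mcgee.r2d2#hitachi.com")); // r2d2
        System.out.println(localPartSuffix("r2d2#hitachi.com"));            // null
    }
}
```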
If you want to use Pattern, use the regex \\.([^\\.]+)# instead:
String email = "John.Mcgee.r2d2#hitachi.com";
Pattern pattern = Pattern.compile("\\.([^\\.]+)#");
Matcher matcher = pattern.matcher(email);
if (matcher.find()) {
System.out.println(matcher.group(1));// Output : r2d2
}
Another solution is to use split:
String[] split = email.replaceAll("#.*", "").split("\\.");
email = split[split.length - 1];// Output : r2d2
Note:
Strings in Java must be enclosed in double quotes: "John.Mcgee.r2d2#hitachi.com"
You don't need to escape # in Java (in fact \# is not a valid escape sequence in a Java string literal, so it causes a compile error), but you do have to escape the dot with a double backslash: \\.
There is no for-loop syntax like for (Strimatcher.find()){; you probably meant while (matcher.find()) {
Given the following string
Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305
I want to be able to extract the following using a regular expression
i-b9b4ffaa
ami-dbcf88b1
vol-e97db305
This is the regular expression I came up with, which currently doesn't do what I need:
Pattern p = Pattern.compile("Created by CreateImage([a-z]+[0.9]+)([a-z]+[0.9]+)([a-z]+[0.9]+)",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305");
System.out.println(m.matches()); --> false
You can match all words that start with letters, followed by a hyphen and then alphanumeric characters:
String s = "Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305";
Pattern pattern = Pattern.compile("(?i)\\b[a-z]+-[a-z0-9]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
// => i-b9b4ffaa, ami-dbcf88b1, vol-e97db305
Pattern details:
(?i) - a case insensitive modifier (embedded flag option)
\\b - a word boundary
[a-z]+ - 1 or more ASCII letters
- - a hyphen
[a-z0-9]+ - 1 or more alphanumerics.
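If you need the IDs as data rather than printed, a minimal sketch (hypothetical class and method names) collects the same matches into a list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResourceIds {
    // Word boundary, letters, a hyphen, then letters/digits (case-insensitive)
    static final Pattern ID = Pattern.compile("(?i)\\b[a-z]+-[a-z0-9]+");

    // Collects every match in order of appearance
    static List<String> findIds(String s) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID.matcher(s);
        while (matcher.find()) {
            ids.add(matcher.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds("Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305"));
        // [i-b9b4ffaa, ami-dbcf88b1, vol-e97db305]
    }
}
```

The pattern is not tied to the Created by CreateImage text, so any other letters-hyphen-alphanumerics token in the string (for example re-run) would be collected too.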
To make sure these values appear on the same line after Created by CreateImage, use a \G-based regex:
String s = "Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305";
Pattern pattern = Pattern.compile("(?i)(?:Created by CreateImage|(?!\\A)\\G)(?:(?!\\b[a-z]+-[a-z0-9]+).)*\\b([a-z]+-[a-z0-9]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
Note that the pattern above relies on the \G anchor, which matches at the end of the previous successful match (so we only match right after a previous match or after Created...), and on a tempered greedy token, (?:(?!\\b[a-z]+-[a-z0-9]+).)*, which matches any character other than a newline that does not start a word boundary + letters + hyphen + letters/digits sequence. The tempered greedy token is quite resource-consuming.
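A small sketch (hypothetical class name and inputs) that wraps the \G pattern in a method, to check that an ID appearing before Created by CreateImage is ignored and that a string without that prefix yields nothing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredIds {
    // Either start at "Created by CreateImage" or continue exactly where the last match ended
    static final Pattern ID_AFTER_PREFIX = Pattern.compile(
            "(?i)(?:Created by CreateImage|(?!\\A)\\G)(?:(?!\\b[a-z]+-[a-z0-9]+).)*\\b([a-z]+-[a-z0-9]+)");

    // Collects group 1 (the ID itself) of every match
    static List<String> findIds(String s) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID_AFTER_PREFIX.matcher(s);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds("x-1 Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1")); // [i-b9b4ffaa, ami-dbcf88b1]
        System.out.println(findIds("i-b9b4ffaa for ami-dbcf88b1"));                            // []
    }
}
```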
Consider a two-step approach instead: first check whether the string starts with Created..., and then process it:
String s = "Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305";
if (s.startsWith("Created by CreateImage")) {
Matcher n = Pattern.compile("(?i)\\b[a-z]+-[a-z0-9]+").matcher(s);
while(n.find()) {
System.out.println(n.group(0));
}
}
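The same two-step idea as a reusable method, with one design change that is an assumption on my part: the Pattern is compiled once into a static field instead of on every call. Class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoStepIds {
    static final String PREFIX = "Created by CreateImage";
    // Compiled once and reused for every call
    static final Pattern ID = Pattern.compile("(?i)\\b[a-z]+-[a-z0-9]+");

    static List<String> findIds(String s) {
        List<String> ids = new ArrayList<>();
        if (!s.startsWith(PREFIX)) {
            return ids; // step 1 failed: not a CreateImage description
        }
        Matcher matcher = ID.matcher(s); // step 2: collect the IDs
        while (matcher.find()) {
            ids.add(matcher.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds("Created by CreateImage(i-b9b4ffaa) for ami-dbcf88b1 from vol-e97db305"));
    }
}
```

Unlike the (?i) regex, String.startsWith is case-sensitive, so a string beginning with "created by createimage" would be skipped.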
Can you suggest an approach for splitting a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
So I tried to parse that string with
:[A-za-z]|\\d:
This kind of regular expression, but it is not working. Please suggest a regular expression with which I can split that string into 20, 31C, 31D, etc. as keys and 150318, 150425 IN BANGLADESH, etc. as values.
If I use string.split(":"), it won't serve my purpose.
If a string is like:
:20: MY VALUES : ARE HERE
then it will be split at every colon, so key 20 will be associated with "MY VALUES", and "ARE HERE" will not be associated with key 20.
You can use matching instead of splitting, since you need to match one specific colon in the string.
A regex that captures the text between the first and second colons into Group 1, and everything after the second colon into Group 2, looks like this:
^:([^:]*):(.*)$
The ^ asserts the beginning of the string, ([^:]*) matches and captures into Group 1 zero or more characters other than :, and (.*) matches and captures into Group 2 the rest of the string. $ asserts the position at the end of a single-line string (since . matches any character except a newline unless the Pattern.DOTALL flag is set).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE
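To run the same idea over the whole multi-line block at once, a sketch could add the MULTILINE flag (?m) so ^ and $ apply to every line, and collect the pairs into a map. This assumes each key appears only once (a repeated key would overwrite the earlier value) and that values should be trimmed; the class name is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagParser {
    // (?m): ^ and $ match at every line; [^:\r\n]* keeps the key on its own line;
    // (.*) takes the rest of the line, including any further colons
    static final Pattern TAG_LINE = Pattern.compile("(?m)^:([^:\\r\\n]*):(.*)$");

    static Map<String, String> parse(String text) {
        Map<String, String> tags = new LinkedHashMap<>(); // keeps input order
        Matcher matcher = TAG_LINE.matcher(text);
        while (matcher.find()) {
            tags.put(matcher.group(1), matcher.group(2).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = ":31C:150318\n:31D:150425 IN BANGLADESH\n:20: MY VALUES : ARE HERE";
        System.out.println(parse(text));
        // {31C=150318, 31D=150425 IN BANGLADESH, 20=MY VALUES : ARE HERE}
    }
}
```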
You can use the following to split:
^[:]+([^:]+):
Try the split method of the String class:
String[] splited = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa
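String.split also has an overload with a limit argument, split(regex, limit), which stops splitting after limit - 1 matches. With a limit of 3, ":20: MY VALUES : ARE HERE" becomes an empty string, the key, and everything else, so colons inside the value survive. A sketch with a hypothetical keyValue helper:

```java
import java.util.Arrays;

final class SplitWithLimit {
    // Hypothetical helper: splits ":KEY:VALUE" into {KEY, VALUE}, keeping colons inside VALUE
    static String[] keyValue(String line) {
        // At most 3 pieces: "" (before the leading colon), the key, and the rest
        String[] parts = line.split(":", 3);
        if (parts.length < 3 || !parts[0].isEmpty()) {
            return null; // not in ":KEY:VALUE" form
        }
        return new String[] {parts[1], parts[2]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(keyValue(":31D:150425 IN BANGLADESH"))); // [31D, 150425 IN BANGLADESH]
        System.out.println(Arrays.toString(keyValue(":20: MY VALUES : ARE HERE")));
    }
}
```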
I have some input data such as
some string with 'hello' inside 'and inside'
How can I write a regex so that the quoted text (no matter how many times it is repeated) is returned (all of the occurrences).
I have code that returns a single quoted string, but I want it to return multiple occurrences:
String mydata = "some string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'(.*?)+'");
Matcher matcher = pattern.matcher(mydata);
while (matcher.find())
{
System.out.println(matcher.group());
}
This finds all occurrences for me:
String mydata = "some '' string with 'hello' inside 'and inside'";
Pattern pattern = Pattern.compile("'[^']*'");
Matcher matcher = pattern.matcher(mydata);
while(matcher.find())
{
System.out.println(matcher.group());
}
Output:
''
'hello'
'and inside'
Pattern description:
' // opening quote
[^'] // any character except a single quote
* // zero or more of those non-quote characters
' // closing quote
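If you want the text inside the quotes without the quote characters themselves, a small variation (hypothetical class and method names) puts [^']* in a capturing group and reads group(1):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedText {
    // Same idea as '[^']*', but the contents are captured in group 1
    static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    static List<String> contents(String s) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(s);
        while (matcher.find()) {
            result.add(matcher.group(1)); // text between the quotes, possibly empty
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(contents("some '' string with 'hello' inside 'and inside'"));
        // [, hello, and inside]
    }
}
```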
I believe this should fit your requirements:
\'\w+\'
\'.*?' is the regex you are looking for.
I was trying to write a regex to detect email addresses of the type 'abc#xyz.com' in Java. I came up with a simple pattern.
String line = // my line containing email address
Pattern myPattern = Pattern.compile("()(\\w+)( *)#( *)(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
This will however also detect email addresses of the type 'abcd.efgh#xyz.com'.
I went through http://www.regular-expressions.info/ and links on this site like
How to match only strings that do not contain a dot (using regular expressions)
Java RegEx meta character (.) and ordinary dot?
So I changed my pattern to the following to avoid detecting 'efgh#xyz.com'
Pattern myPattern = Pattern.compile("([^\\.])(\\w+)( *)#( *)(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
String mailid = myMatcher.group(2) + "#" + myMatcher.group(5) + ".com";
If String 'line' contained the address 'abcd.efgh#xyz.com', my String mailid will come back with 'fgh#xyz.com'. Why does this happen? How do I write the regex to detect only 'abc#xyz.com' and not 'abcd.efgh#xyz.com'?
Also, how do I write a single regex to detect email addresses like 'abc#xyz.com', 'efg at xyz.com' and 'abc (at) xyz (dot) com' in strings? Basically, how would I implement OR logic in a regex, for example to check for # OR at OR (at)?
After some comments below I tried the following expression to get the part before the # squared away.
Pattern myPattern = Pattern.compile("((([\\w]+\\.)+[\\w]+)|([\\w]+))#(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
What will the myMatcher groups be? How are these groups numbered when we have nested brackets?
System.out.println(myMatcher.group(1));
System.out.println(myMatcher.group(2));
System.out.println(myMatcher.group(3));
System.out.println(myMatcher.group(4));
System.out.println(myMatcher.group(5));
The output was:
abcd.efgh
abcd.efgh
abcd.
null
xyz
for abcd.efgh#xyz.com
abc
null
null
abc
xyz
for abc#xyz.com
Thanks.
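The output above follows Java's numbering rule: capturing groups are numbered by the position of their opening parenthesis, counting from the left, so nesting does not change the order. A group in the alternative that was not used is null, and a group inside a repetition such as (...)+ keeps only its last iteration. A small sketch (hypothetical class and method names) that prints every group:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumbering {
    // Numbered by opening parenthesis, left to right:
    // 1 = ((...)|(...))  2 = (([\w]+\.)+[\w]+)  3 = ([\w]+\.)  4 = ([\w]+)  5 = (\w+)
    static final Pattern EMAIL = Pattern.compile("((([\\w]+\\.)+[\\w]+)|([\\w]+))#(\\w+)\\.com");

    // Returns groups 1..groupCount() of the first match, or null if nothing matches
    static String[] groups(String line) {
        Matcher matcher = EMAIL.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        String[] result = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            result[i - 1] = matcher.group(i); // null if group i did not participate
        }
        return result;
    }

    public static void main(String[] args) {
        // Group 3 repeats twice here ("a." then "b."); only the last iteration is kept
        System.out.println(Arrays.toString(groups("a.b.c#x.com"))); // [a.b.c, a.b.c, b., null, x]
    }
}
```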
You can use the | operator in your regex to detect # OR at OR (at): #|at|\(at\) (the parentheses have to be escaped to be matched literally).
You can avoid matching addresses that contain a dot by anchoring the pattern with ^ at the beginning:
Try this:
Pattern myPattern = Pattern.compile("^(\\w+)\\s*(#|at|\\(at\\))\\s*(\\w+)\\.(\\w+)");
Matcher myMatcher = myPattern.matcher(line);
if (myMatcher.matches())
{
String mail = myMatcher.group(1) + "#" + myMatcher.group(3) + "." +myMatcher.group(4);
System.out.println(mail);
}
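A sketch that wraps that pattern in a hypothetical normalise helper and tries it on the formats from the question. Note that it does not handle the (dot) form, which would need its own alternation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlexibleEmail {
    // Group 2 is the separator: "#", "at", or a literal "(at)"
    static final Pattern MAIL = Pattern.compile("^(\\w+)\\s*(#|at|\\(at\\))\\s*(\\w+)\\.(\\w+)");

    // Rewrites the address as name#domain.tld, or returns null if the whole line does not match
    static String normalise(String line) {
        Matcher matcher = MAIL.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        return matcher.group(1) + "#" + matcher.group(3) + "." + matcher.group(4);
    }

    public static void main(String[] args) {
        System.out.println(normalise("efg at xyz.com"));    // efg#xyz.com
        System.out.println(normalise("abc (at) xyz.com"));  // abc#xyz.com
        System.out.println(normalise("abcd.efgh#xyz.com")); // null
    }
}
```

Because \s* also allows zero spaces, the bare at alternative can fire inside a word: "chat x.com" is read as ch#x.com. Requiring \s+ around at would avoid that.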
Your first pattern needs to combine the two conditions, word character and not a dot, into a single character class; you currently have them separately. It should be:
[^\\.\\W]+
This means 'not a dot' and 'not a non-word character'.
So you have:
Pattern myPattern = Pattern.compile("([^\\.\\W]+)( *)#( *)(\\w+)\\.com");
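One caveat: \w never matches a dot, so the class [^.\W] describes the same characters as \w, and with find() a pattern built on it still finds efgh#xyz.com inside abcd.efgh#xyz.com. One way to reject the dotted address entirely, swapping in a technique not used in this answer, is a negative lookbehind (?<![\w.]) that forbids a word character or a dot right before the local part. A sketch with hypothetical names, assuming only plain name#domain.com addresses are wanted:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StandaloneEmail {
    // (?<![\w.]) : the local part must not be preceded by a word character or a dot
    // \b after "com" : do not accept e.g. "xyz.company"
    static final Pattern PLAIN_EMAIL = Pattern.compile("(?<![\\w.])(\\w+)#(\\w+)\\.com\\b");

    // Returns the first plain address in the line, or null if there is none
    static String findPlainAddress(String line) {
        Matcher matcher = PLAIN_EMAIL.matcher(line);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findPlainAddress("mail abc#xyz.com today"));       // abc#xyz.com
        System.out.println(findPlainAddress("mail abcd.efgh#xyz.com today")); // null
    }
}
```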
To answer your second question, you can express OR in a regex with the | character:
(#|at)