android : extract uk postcode - java

Hello, I am trying to extract a UK postcode from a string, e.g. from "the person's house is at SS9 8ID we'll be there at 8pm" I want to extract the "SS9 8ID" bit. I've tried the following code but it's not working for some reason. Any ideas?
String pc1="^([A-PR-UWYZ](([0-9](([0-9]|[A-HJKSTUW])?)?)|([A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?)) [0-9][ABD-HJLNP-UW-Z]{2})|GIR 0AA$";
String test="the person's house is at SS9 8ID we'll be there at 8pm";
Pattern pattern = Pattern.compile(pc1);
Matcher matcher = pattern.matcher(test.toUpperCase());
if (matcher.matches()) {
    //Log.d("pccode:::", matcher.group(1) );
    Log.d("pccode:::", matcher.group());
} else {
    Log.d("NO", "NO PCODE");
}

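The matches() method only succeeds if the whole string matches the pattern; use find() instead to locate a match inside a longer string. Also remove ^ and $ from the expression, since they anchor the match to the start and end of the input.
In addition, SS9 8ID doesn't match the regexp, because the class [ABD-HJLNP-UW-Z] doesn't include the letter I, which appears in that postcode.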
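A minimal sketch of that advice, assuming the question's pattern with the anchors removed and \b word boundaries added so the postcode must stand on its own. The class and method names (PostcodeFinder, findPostcode) are hypothetical, and "SS9 8HD" is just a sample value that fits the pattern:

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PostcodeFinder {
    // The question's pattern without ^ and $, wrapped in one non-capturing group
    // so that \b applies to both alternatives (the general form and "GIR 0AA").
    private static final Pattern POSTCODE = Pattern.compile(
            "\\b(?:[A-PR-UWYZ](?:[0-9](?:[0-9]|[A-HJKSTUW])?"
            + "|[A-HK-Y][0-9](?:[0-9]|[ABEHMNPRVWXY])?)"
            + " [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)\\b");

    // Returns the first postcode found anywhere in the text, if any.
    static Optional<String> findPostcode(String text) {
        Matcher m = POSTCODE.matcher(text.toUpperCase(Locale.ROOT));
        // find() scans for a match inside the string; matches() would need the whole string to match
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample value that fits the pattern
        System.out.println(findPostcode("the person's house is at SS9 8HD we'll be there at 8pm"));
        // The question's value: no match, because I is not in the final character class
        System.out.println(findPostcode("the person's house is at SS9 8ID we'll be there at 8pm"));
    }
}
```

On Android, Log.d can replace System.out.println as in the question.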

Related

How to parse a string to get array of #tags out of the string?

So I have a string like
"#tag1 #tag2 #tag3 not_tag1 not_tag2 #tag4" (there can be any number of spaces between tags). From this string I want to parse just tag1, tag2 and so on. They are similar to the #tags we see on LinkedIn or other social media. Is there an easy way to do this using a regex or another function in Java, or should I do it the hard way (i.e. using loops and conditions)?
The tag format is "#" (to indicate the start of a tag) and a space " " (to indicate the end of the tag). In between there can be letters or numbers, but the first character must be a letter.
example,
input : "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4"
output : ["tag1", "tag2", "tag3", "tag4"]
Split by regex: "#\w+"
EDIT: this is the right regex, but split is not the right method.
Same solution as javadev suggested, but use a Matcher instead:
String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
Matcher matcher = Pattern.compile("#\\w+").matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
The output includes the leading #, as expected. Note that it also includes #12tag, because \w matches digits as well as letters.
Maybe something like:
public static void main(String[] args) {
    String input = "#tag1 #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
    // [A-Za-z] rather than [A-z]: the range A-z also covers [, \, ], ^, _ and `
    Pattern pattern = Pattern.compile("#([A-Za-z][A-Za-z0-9]*)");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}
Worked for me :)
Output:
tag1
tag2
tag3
tag4
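Since the question asks for the tags as a collection rather than printed lines, here is a small sketch that gathers them into a List. The names TagExtractor and extractTags are made up for illustration, and the lookahead (?=\s|$) is one way to read the requirement that a space (or the end of the input) ends a tag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // '#', then a letter, then letters or digits; the tag must end at whitespace or end of input
    private static final Pattern TAG = Pattern.compile("#([A-Za-z][A-Za-z0-9]*)(?=\\s|$)");

    static List<String> extractTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1)); // group 1 is the tag without the leading '#'
        }
        return tags;
    }

    public static void main(String[] args) {
        String input = "#tag1   #tag2 #tag3 not_tag1 not_tag2 #12tag #tag4";
        System.out.println(extractTags(input)); // prints [tag1, tag2, tag3, tag4]
    }
}
```

If an array is needed, extractTags(input).toArray(new String[0]) converts the list.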

Using Regular Expression in Java to extract information from a String

I have one input String like this:
"I am Duc/N Ta/N Van/N"
The "/N" suffix indicates that the word is part of a person's name.
The expected output is:
Name: Duc Ta Van
How can I do it by using regular expression?
You can use Pattern and Matcher like this :
String input = "I am Duc/N Ta/N Van/N";
Pattern pattern = Pattern.compile("([^\\s]+)/N");
Matcher matcher = pattern.matcher(input);
String result = "";
while (matcher.find()) {
    result += matcher.group(1) + " ";
}
System.out.println("Name: " + result.trim());
Output
Name: Duc Ta Van
Another Solution using Java 9+
From Java9+ you can use Matcher::results like this :
String input = "I am Duc/N Ta/N Van/N";
String regex = "([^\\s]+)/N";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
String result = matcher.results().map(s -> s.group(1)).collect(Collectors.joining(" "));
System.out.println("Name: " + result); // Name: Duc Ta Van
Here is the regex to use to capture every "name" followed by /N
(\w+)\/N
Validate with Regex101
Now, you just need to loop over every match in that String and concatenate them to get the result:
String pattern = "(\\w+)\\/N";
String test = "I am Duc/N Ta/N Van/N";
Matcher m = Pattern.compile(pattern).matcher(test);
StringBuilder sbNames = new StringBuilder();
while (m.find()) {
    sbNames.append(m.group(1)).append(" ");
}
System.out.println(sbNames.toString());
Duc Ta Van
That covers the hardest part; I'll let you adapt it to your needs.
Note:
In Java, it is not required to escape a forward slash. To keep the same regex throughout the answer I use "(\\w+)\\/N", but "(\\w+)/N" will work as well.
I've used "[/N]+" as the regular expression.
Regex101
[] = Matches any single character inside the set (here / or N)
/ = Matches the character / literally
N = Matches the character N literally (case sensitive)
+ = Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
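The answer above does not say how the pattern was applied. Assuming it was used with String.replaceAll to strip the markers, the sketch below (hypothetical class and method names) shows how the character class [/N]+ differs from the literal sequence /N: the class also removes any N that is part of a name.

```java
class NameTagDemo {
    // Character class: removes every run of '/' and 'N' characters, wherever they occur
    static String stripWithClass(String s) {
        return s.replaceAll("[/N]+", "");
    }

    // Literal sequence: removes only the two-character marker "/N"
    static String stripLiteral(String s) {
        return s.replaceAll("/N", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWithClass("Duc/N Ta/N Van/N")); // prints Duc Ta Van
        System.out.println(stripWithClass("Nam/N"));            // prints am
        System.out.println(stripLiteral("Nam/N"));              // prints Nam
    }
}
```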

Using Regular Expressions to Extract specific Values in Java

I have several strings in the rough form:
String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
I want to extract the values for websiteName, userAgentNameWithSpaces, username and someTime.
I have tried the following code.
private static final Pattern USER_NAME_PATTERN = Pattern.compile("for user.*;");
final Matcher matcher = USER_NAME_PATTERN.matcher(line);
matcher.find() ? Optional.of(matcher.group(group)) : Optional.empty();
It returns the whole string " for user username", so afterwards I have to replace "for user" with an empty string to get the user name.
Is there a regex that returns just the username directly?
You can use regex groups:
Pattern pattern = Pattern.compile("for user (\\w+)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
The pair of parentheses ( and ) forms a group whose content can be obtained from the matcher with the group method (as it's the first pair of parentheses, it's group 1).
\w means a "word character" (letters, digits and _) and + means "one or more occurrences". So \w+ basically means "a word" (assuming your username has only these characters). PS: note that I had to escape \ in the Java string, so the resulting expression is \\w+.
The output of this code is:
username
If you want to match all the values (websiteName, userAgentNameWithSpaces and so on), you could do the following:
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent (.*) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}
The output will be:
websiteNAme
userAgentNameWithSpaces
username
someTime
Note that if userAgentNameWithSpaces contains spaces, \w+ won't work (because \w doesn't match spaces), so .* is used in this case.
But you can also use [\w ]+. The brackets [] mean "any of the characters inside me", so [\w ] means "a word character or a space" (note that there's a space between w and ]). So the code would be (testing with a user agent name with spaces):
String s = "Rendering content from websiteNAme using user agent userAgent Name WithSpaces ; for user username ; at time someTime";
Pattern pattern = Pattern.compile("Rendering content from (.*) using user agent ([\\w ]+) ; for user (.*) ; at time (.*)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
    System.out.println(matcher.group(4));
}
And the output will be:
websiteNAme
userAgent Name WithSpaces
username
someTime
Note: matcher.groupCount() returns the number of capturing groups in the pattern, not how many of them took part in the match. Calling matcher.group(n) with n greater than groupCount() throws an IndexOutOfBoundsException, while a group that exists in the pattern but did not participate in the match returns null, so check for null before using the result.
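To illustrate how groupCount() and group(n) behave, here is a small sketch with an optional group. The pattern and the names GroupCheckDemo and describe are invented for this example and are not taken from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // Three capturing groups; groups 2 and 3 sit inside an optional part
    private static final Pattern P = Pattern.compile("for user (\\w+)( ; at time (\\w+))?");

    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        // groupCount() depends only on the pattern, so it is 3 for every input
        int count = m.groupCount();
        // group(3) is null when the optional part did not take part in the match
        String time = m.group(3);
        return count + ":" + m.group(1) + ":" + time;
    }

    public static void main(String[] args) {
        System.out.println(describe("for user bob"));                // prints 3:bob:null
        System.out.println(describe("for user bob ; at time noon")); // prints 3:bob:noon
    }
}
```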
I think you want to use lookaheads and lookbehinds:
String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
Pattern USER_NAME_PATTERN = Pattern.compile("(?<=for user).*?(?=;)");
final Matcher matcher = USER_NAME_PATTERN.matcher(s);
matcher.find();
System.out.println(matcher.group(0).trim());
Output:
username
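A possible variation on the lookaround approach, assuming the separators are always "for user " before and " ;" after the value: putting the spaces inside the lookbehind and lookahead means the match needs no trim(). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UserNameExtractor {
    // Lookbehind and lookahead include the surrounding spaces, so only the value is matched
    private static final Pattern USER = Pattern.compile("(?<=for user ).+?(?= ;)");

    // Returns the username, or null if the line does not contain one
    static String extractUser(String line) {
        Matcher m = USER.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "Rendering content from websiteNAme using user agent userAgentNameWithSpaces ; for user username ; at time someTime";
        System.out.println(extractUser(s)); // prints username
    }
}
```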

How to return everything before X characters?

Say user enters a string, like "My name is Oz, the great and powerful". I have cut off "My name is" using substring, but I want to cut off ", the great and powerful" and keep only Oz.
Is there any method of doing this ?
Keeping in mind that the user entered String is unknown to us.
You can use a regex. You have to identify the pattern to match, which is easy in your case: the text before the name is always "My name is " and there is a comma after the name. So the pattern will be (My name is )(.*)(,)(.*), and the name is captured in group 2 by the following code:
// String to be scanned to find the pattern.
String line = "My name is Oz, the great and powerful";
String pattern = "(My name is )(.*)(,)(.*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
    String name = m.group(2);
}
If there is no comma, the pattern has to be:
String pattern = "(My name is )([A-Z]*[a-z]*)(\\s)(.*)";
You can have a look here, it's a great tutorial for it.
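One thing to watch with the (.*) group above is that it is greedy, so with more than one comma in the input it runs up to the last comma. A sketch of an alternative, assuming the name never contains a comma, uses the negated class [^,]+ to stop at the first one. The names NameExtractor, greedyName and firstCommaName are made up for the comparison:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameExtractor {
    // Greedy: .* backtracks only as far as the LAST comma
    private static final Pattern GREEDY = Pattern.compile("My name is (.*),");
    // Negated class: stops at the FIRST comma (or at the end if there is none)
    private static final Pattern UP_TO_FIRST_COMMA = Pattern.compile("My name is ([^,]+)");

    static String greedyName(String line) {
        Matcher m = GREEDY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static String firstCommaName(String line) {
        Matcher m = UP_TO_FIRST_COMMA.matcher(line);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String line = "My name is Oz, the great, and powerful";
        System.out.println(greedyName(line));     // prints Oz, the great
        System.out.println(firstCommaName(line)); // prints Oz
    }
}
```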

How to remove dot (.) character using a regex for email addresses of type "abcd.efgh#xyz.com" in java?

I was trying to write a regex to detect email addresses of the type 'abc#xyz.com' in java. I came up with a simple pattern.
String line = // my line containing email address
Pattern myPattern = Pattern.compile("()(\\w+)( *)#( *)(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
This will however also detect email addresses of the type 'abcd.efgh#xyz.com'.
I went through http://www.regular-expressions.info/ and links on this site like
How to match only strings that do not contain a dot (using regular expressions)
Java RegEx meta character (.) and ordinary dot?
So I changed my pattern to the following to avoid detecting 'efgh#xyz.com'
Pattern myPattern = Pattern.compile("([^\\.])(\\w+)( *)#( *)(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
String mailid = myMatcher.group(2) + "#" + myMatcher.group(5) + ".com";
If the String 'line' contained the address 'abcd.efgh#xyz.com', my String mailid comes back as 'fgh#xyz.com'. Why does this happen? How do I write the regex to detect only 'abc#xyz.com' and not 'abcd.efgh#xyz.com'?
Also how do I write a single regex to detect email addresses like 'abc#xyz.com' and 'efg at xyz.com' and 'abc (at) xyz (dot) com' from strings. Basically how would I implement OR logic in regex for doing something like check for # OR at OR (at)?
After some comments below I tried the following expression to get the part before the # squared away.
Pattern myPattern = Pattern.compile("((([\\w]+\\.)+[\\w]+)|([\\w]+))#(\\w+)\\.com");
Matcher myMatcher = myPattern.matcher(line);
What will the groups of myMatcher be? How are groups numbered when we have nested brackets?
System.out.println(myMatcher.group(1));
System.out.println(myMatcher.group(2));
System.out.println(myMatcher.group(3));
System.out.println(myMatcher.group(4));
System.out.println(myMatcher.group(5));
the output was like
abcd.efgh
abcd.efgh
abcd.
null
xyz
for abcd.efgh#xyz.com
abc
null
null
abc
xyz
for abc#xyz.com
Thanks.
You can use the | operator in your regexps to match # OR at OR (at): #|at|\(at\).
You can avoid a dot in the part before the # by anchoring the pattern with ^ at the beginning:
Try this:
Pattern myPattern = Pattern.compile("^(\\w+)\\s*(#|at|\\(at\\))\\s*(\\w+)\\.(\\w+)");
Matcher myMatcher = myPattern.matcher(line);
if (myMatcher.matches()) {
    String mail = myMatcher.group(1) + "#" + myMatcher.group(3) + "." + myMatcher.group(4);
    System.out.println(mail);
}
Your first pattern needs to combine the two requirements, word characters and no dots, into a single character class; you currently have them as separate groups. It should be:
[^\\.\\W]+
This is 'not dots' and 'not non-word characters'.
So you have:
Pattern myPattern = Pattern.compile("([^\\.\\W]+)( *)#( *)(\\w+)\\.com");
To answer your second question, you can use OR in a regex with the | character:
(#|at)
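A character class on its own cannot see what comes before the match, so with find() the tail 'efgh#xyz.com' of a dotted address can still be picked up. One possible way around that, sketched here as an assumption rather than a definitive solution, is a negative lookbehind that rejects a local part preceded by a word character or a dot, combined with alternation for the separators. The class name EmailFinder, the method findEmail and the exact separator rules are illustrative choices:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    private static final Pattern EMAIL = Pattern.compile(
            "(?<![\\w.])(\\w+)"                            // local part, not preceded by a word char or a dot
            + "(?:\\s*#\\s*|\\s*\\(at\\)\\s*|\\s+at\\s+)"  // "#", "(at)" or " at " (OR via |)
            + "(\\w+)"                                     // domain name
            + "(?:\\.|\\s*\\(dot\\)\\s*)com\\b");          // ".com" or "(dot) com"

    // Returns the first address found, normalized to the name#domain.com form
    static Optional<String> findEmail(String line) {
        Matcher m = EMAIL.matcher(line);
        return m.find()
                ? Optional.of(m.group(1) + "#" + m.group(2) + ".com")
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findEmail("mail abc#xyz.com now"));   // prints Optional[abc#xyz.com]
        System.out.println(findEmail("efg at xyz.com"));         // prints Optional[efg#xyz.com]
        System.out.println(findEmail("abc (at) xyz (dot) com")); // prints Optional[abc#xyz.com]
        System.out.println(findEmail("abcd.efgh#xyz.com"));      // prints Optional.empty
    }
}
```

Requiring whitespace around the bare word "at" keeps strings such as "abcatxyz.com" from being read as an address.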
