Java regex matcher not working

I'm trying to get the hang of Pattern and Matcher. This method should use the regex pattern to iterate over an array of state capitals and return the state or states whose capital corresponds to the pattern. The method works fine when I check for whole strings like "tallahassee" or "salt lake city", but not for something like "^t". What is it that I'm not getting?
This is the method and main that calls it:
public ArrayList<String> getState(String s) throws RemoteException
{
    Pattern pattern = Pattern.compile(s);
    Matcher matcher;
    int i = 0;
    System.out.println(s);
    for (String ct : capitalValues)
    {
        matcher = pattern.matcher(ct);
        if (ct.toLowerCase().matches(s))
            states.add(stateValues[i]);
        i++;
    }
    return states;
}
public static void main (String[] args) throws RemoteException
{
    ArrayList<String> result = new ArrayList<String>();
    hashTester ht = new hashTester();
    result = ht.getState(("^t").toLowerCase());
    System.out.println("result: ");
    for (String s : result)
        System.out.println(s);
}
Thanks for your help.

You're not even using your matcher for matching. You're using the String#matches() method. Both that method and Matcher#matches() match the regex against the complete string, not against a part of it, so your regex has to cover the entire string. If you just want to match part of the string, use the Matcher#find() method.
You should use it like this:
matcher = pattern.matcher(ct.toLowerCase());
if (matcher.find()) {
    // Found regex pattern
}
BTW, if you only want to see whether a string starts with t, you can use the String#startsWith() method directly; no regex is needed in that case. But I guess this is meant as a general case.
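A minimal, self-contained sketch of a find()-based getState(). The capitals and states below are hypothetical sample data standing in for the question's capitalValues and stateValues fields, and the result list is created inside the method (an assumption, since the question's states field is not shown) so that repeated calls don't accumulate results. Pattern.CASE_INSENSITIVE stands in for the question's toLowerCase() calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalFinder {
    // Hypothetical sample data; parallel arrays like the question's.
    static final String[] CAPITALS = {"Tallahassee", "Salt Lake City", "Topeka", "Boise"};
    static final String[] STATES = {"Florida", "Utah", "Kansas", "Idaho"};

    // Returns the states whose capital contains a match for the regex.
    static List<String> getState(String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> states = new ArrayList<>();   // fresh list per call
        for (int i = 0; i < CAPITALS.length; i++) {
            Matcher matcher = pattern.matcher(CAPITALS[i]);
            if (matcher.find()) {                  // partial match, not whole-string
                states.add(STATES[i]);
            }
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(getState("^t"));             // [Florida, Kansas]
        System.out.println(getState("salt lake city")); // [Utah]
    }
}
```

Whole-string queries keep working because a regex that covers the entire capital is also "found" in it.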

^ is an anchor character in regex: ^t means a t at the beginning of the string. If you do not want anchoring, you have to escape it, which in a Java string literal is written as \\^t.
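A small sketch contrasting the anchored and the escaped forms under find(); the sample strings are made up for illustration. Escaping only helps when a literal caret is wanted. To find capitals that start with t, the unescaped ^t combined with find() is the form that matches.

```java
import java.util.regex.Pattern;

public class EscapeCaret {
    // True if the regex can be found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^t", "tallahassee"));   // true: anchored, t at the start
        System.out.println(found("^t", "salt"));          // false: t is not at the start
        System.out.println(found("\\^t", "tallahassee")); // false: needs a literal "^t"
        System.out.println(found("\\^t", "a^t"));         // true: literal caret followed by t
    }
}
```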

Related

How to make an "or" statement in a java Pattern?

So, basically, I am currently working on a program that extracts answers from HTML code and stores them in an array. The problem is that when I try to make a pattern to separate the answers, I can't figure out how to make an 'or' statement.
The answers are stored like this on the html code:
['h','e','l','l','o',' ','w','o','r','l','d']
My problem is that when I write it into a String, the space (' ') is not recognized by the pattern, so when I write it to a file what shows up is helloworld, with no space. What I want is a pattern that detects both the letters AND the space, but I have no idea how to make an 'or' statement in the middle of a pattern.
This is my pattern right now, which only detects the letters:
Pattern p= Pattern.compile("'[A-Z]+'");
EDIT: It still doesn't work... Do you think it might be something else?
Here's part of my code (sorry, I know it's a mess):
// creates a String containing the letters separated by ' '
public static String createString(BufferedReader in, BufferedWriter out, String texto) throws IOException {
    StringBuilder sb = new StringBuilder();
    Pattern p = Pattern.compile("'[A-Z '']'");
    Matcher m = p.matcher(texto);
    while (m.find()) {
        sb.append(m.group());
    }
    return sb.toString();
}
//splits the String in order to create an array with nothing but letters
public static void toArray(String s, String[] lista, BufferedWriter out) throws IOException {
    lista = s.split("[']");
    for (String a : lista) {
        out.write(a);
        System.out.print(a); // to check output
    }
}
Just add a space to the character class:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorldRegex {
    public static void main(final String... args) {
        // One quoted character, either a letter or a space, captured in group 1
        final String regex = "'([A-Z ])'";
        final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        final String input = "['h','e','l','l','o',' ','w','o','r','l','d']";
        final Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.print(matcher.group(1));
        }
    }
}
Output: hello world
Test the regex online: https://regex101.com/r/eL8uT9/3
What you have now only says you're expecting one or more letters. You need to say you're expecting some letters or a space.
Pattern p = Pattern.compile("'[A-Z]+'|' '");
You need to use the or operator (|). Don't put spaces around it, because spaces in a regex are matched literally. This way you're saying you're expecting one or more letters or a space!
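A runnable sketch of the alternation approach on the question's sample input, with CASE_INSENSITIVE added (an assumption, since the input letters are lowercase). Each match still includes its surrounding quotes, so this sketch strips the first and last character before appending; that stripping step is an illustrative choice, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Collects quoted letters or a quoted space, dropping the quotes.
    static String extract(String input) {
        // No spaces around '|'; they would otherwise be matched literally.
        Pattern p = Pattern.compile("'[A-Z]+'|' '", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String token = m.group();                 // e.g. "'h'" or "' '"
            sb.append(token, 1, token.length() - 1);  // keep only what is between the quotes
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("['h','e','l','l','o',' ','w','o','r','l','d']")); // hello world
    }
}
```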

Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of:
This, the, Thats
So consider:
public class HelloWorld {
    public static void main(String[] args) {
        String term = "t";
        String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
        String str = "This is the difficult one Thats it";
        System.out.println(str.matches(regex));
    }
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld {
    public static void main(String[] args) {
        String term = "t";
        String regex = "\\b"+term+"gi";
        //String regex = ".";
        System.out.println(regex);
        String str = "This is the difficult one Thats it";
        System.out.println(str.matches(regex));
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        System.out.println(m.find());
    }
}
Example:
{ This , one, Two, Those, Thanks }
For the words This, Two, Those and Thanks, the result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags), e.g. (?i), at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can do it with the (?flags) syntax, placed in the regex at the position from which the flags should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this:
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again not enclosed in /) and flags is an int built from constants in Pattern, such as Pattern.DOTALL; when you need several flags, combine them with |, as in Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
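A short sketch of both ways of setting flags described above; the sample strings are made up. It checks the a(?i)a example and shows CASE_INSENSITIVE combined with MULTILINE via bitwise OR, where MULTILINE lets ^ also match right after a line break.

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    static boolean find(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // Inline flag: only the second 'a' is case-insensitive.
        Pattern inline = Pattern.compile("a(?i)a");
        System.out.println(find(inline, "aa")); // true
        System.out.println(find(inline, "aA")); // true
        System.out.println(find(inline, "Aa")); // false

        // Flags passed to compile(), combined with |.
        String text = "first line\nThird line";
        Pattern multi = Pattern.compile("^t", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Pattern single = Pattern.compile("^t", Pattern.CASE_INSENSITIVE);
        System.out.println(find(multi, text));  // true: ^ also matches after the newline
        System.out.println(find(single, text)); // false: ^ matches only at the very start
    }
}
```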
The next thing that may confuse you is the matches method. Many people are misled by its name: they assume it checks whether the string contains something the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a way to test whether some regex can be found at least once in a string. In that case you can either
add .* at the start and end of your regex so that the characters which are not part of the element you want to find are matched as well (this way, matches must iterate over the entire string), or
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which iterates until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to iterate over the entire string; it stops as soon as a match is found.
So your code could look like this:
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
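To see the two options from the list side by side, here is a small sketch on the question's sentence: a bare \b t fails under matches(), padding it with .* makes matches() succeed, and find() succeeds without any padding.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    static final String STR = "This is the difficult one Thats it";

    // String.matches(): the regex has to cover the entire string.
    static boolean wholeString(String regex) {
        return STR.matches(regex);
    }

    // Matcher.find(): succeeds as soon as one occurrence is found.
    static boolean anywhere(String regex) {
        return Pattern.compile(regex).matcher(STR).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeString("(?i)\\bt"));     // false: "t" alone is not the whole string
        System.out.println(wholeString("(?i).*\\bt.*")); // true: .* consumes the rest
        System.out.println(anywhere("(?i)\\bt"));        // true
    }
}
```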
In case your term could contain regex special characters that you want the regex engine to treat as normal characters, you need to make sure they are escaped. The Pattern.quote method adds all the necessary escapes for you, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
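A sketch of why quoting matters, using a made-up term "a.b" and a made-up sentence: unquoted, the dot is a wildcard and the pattern wrongly finds "axb"; quoted, only the literal text "a.b" would match.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Does some word in text start with term? The term is used raw or quoted.
    static boolean startsWord(String text, String term, boolean quote) {
        String body = quote ? Pattern.quote(term) : term;
        return Pattern.compile("\\b" + body, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "axb is not what we want";
        System.out.println(startsWord(text, "a.b", false)); // true: '.' matched the x
        System.out.println(startsWord(text, "a.b", true));  // false: no literal "a.b"
    }
}
```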
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
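A sketch of calling find() repeatedly, as suggested above, to list every word that starts with the term together with the offset reported by start(). Quoting the term is an added safety measure taken from the Pattern.quote advice, not part of this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsStartingWith {
    // Returns "word@offset" for every word that starts with term (case-insensitive).
    static List<String> find(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\w*", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {                        // each call resumes after the previous match
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("This is the difficult one Thats it", "t"));
        // [This@0, the@8, Thats@26]
    }
}
```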
String str = "This is the difficult one Thats it";
String term = "t";
// Pattern.quote makes the term match literally, even if it contains regex metacharacters
Pattern pattern = Pattern.compile("^" + Pattern.quote(term) + ".*", Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
    if (pattern.matcher(s).matches()) {
        System.out.println(s + "-->" + true);
    } else {
        System.out.println(s + "-->" + false);
    }
}

java regex/arraylist issue, unable to get matches/arraylist store new values

I'm just getting into Java arrays and regex and am trying to build a program I previously built in PHP. I have an ArrayList from Twitter, and I would like to use regular expressions to find texts containing links. If a text contains a link, I want to add it to a new array, which I would then display. However, the final array comes back empty, meaning that at some point either the regular expression in my code isn't matching properly or the values aren't transferring over to the new array. As I am new to this in Java, I can't spot where it goes wrong. Any help would be massive; thanks in advance.
protected void onPostExecute(ResponseList<twitter4j.Status> results) {
    // TODO Auto-generated method stub
    super.onPostExecute(results);
    ArrayList<twitter4j.Status> al = new ArrayList<twitter4j.Status>();
    for (twitter4j.Status statii : results) {
        String patternStr = "^(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(statii.getText());
        if (matcher.find() == true) {
            al.add(statii);
        }
    }
    StatusListAdapter adapter = new StatusListAdapter(
            TweepicsappActivity.this, al);
    setListAdapter(adapter);
}
Your regex has a bunch of unescaped special characters in it (e.g. +, ., : and |). Outside a character class, several of these have meaning to the regex parser and do not match literal text unless you escape them.
Personally, I always escape all special characters in a regex, even ones that have no special meaning to the regex parser in that position. In my opinion, the issues caused by forgetting to escape one can be too confusing to debug for it to be worth the risk.
So I would write patternStr like this:
String patternStr = "(https?|ftp|file)\\://[\\-a-zA-Z0-9\\+\\&\\#\\#/\\%\\?\\=\\~\\_\\|\\!\\:\\,\\.\\;]*[\\-a-zA-Z0-9\\+\\&\\#\\#/\\%\\=\\~\\_\\|]";
Not very pretty, but it gets the job done.
Here's an example: http://ideone.com/W8s3p
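One difference worth checking between the question's pattern and the escaped one is the leading ^, which the escaped version drops: with ^, find() only succeeds when the tweet begins with the URL. The sketch below runs the question's pattern verbatim, with and without that anchor, against a hypothetical tweet text.

```java
import java.util.regex.Pattern;

public class UrlAnchorDemo {
    // The question's pattern, kept verbatim, including the leading ^ anchor.
    static final String ANCHORED =
        "^(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
    // The same pattern with the ^ removed.
    static final String UNANCHORED = ANCHORED.substring(1);

    static boolean hasLink(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String tweet = "New photos up: http://example.com/pics?id=42"; // hypothetical tweet
        System.out.println(hasLink(ANCHORED, tweet));   // false: the URL is not at the start
        System.out.println(hasLink(UNANCHORED, tweet)); // true
    }
}
```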
First, I would double-check your regex and step through the code in the debugger. Second, I would use matcher.matches() rather than find(). Lastly, if performance is important, I would reuse the Matcher (and with it the compiled Pattern), initializing it as a static member in a static initializer.
private static final Matcher matcher;
static {
    String patternStr = "^(https?|ftp|file)://[-a-zA-Z0-9+&##/%?=~_|!:,.;]*[-a-zA-Z0-9+&##/%=~_|]";
    Pattern pattern = Pattern.compile(patternStr);
    matcher = pattern.matcher("");
}
protected void onPostExecute(ResponseList<twitter4j.Status> results) {
    // TODO Auto-generated method stub
    super.onPostExecute(results);
    ArrayList<twitter4j.Status> al = new ArrayList<twitter4j.Status>();
    for (twitter4j.Status statii : results) {
        matcher.reset(statii.getText()); // reuse the same Matcher on new input
        if (matcher.matches()) {
            al.add(statii);
        }
    }
    StatusListAdapter adapter = new StatusListAdapter(TweepicsappActivity.this, al);
    setListAdapter(adapter);
}
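A sketch of the reuse-via-reset() idea, comparing matches() with find() on the same reused Matcher. The URL regex here is a simplified stand-in chosen for brevity (an assumption, not the question's pattern), and the texts are made up. A shared static Matcher is not thread-safe, so this only suits code that uses it from a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedMatcherDemo {
    // Simplified URL pattern (illustrative); one Matcher reused via reset().
    private static final Matcher MATCHER =
        Pattern.compile("(https?|ftp|file)://\\S+").matcher("");

    // Keeps texts where the pattern is found (useFind) or matches the whole text.
    static List<String> filter(List<String> texts, boolean useFind) {
        List<String> kept = new ArrayList<>();
        for (String text : texts) {
            MATCHER.reset(text);                 // point the same Matcher at new input
            boolean ok = useFind ? MATCHER.find() : MATCHER.matches();
            if (ok) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("http://example.com", "look: http://example.com", "no link here");
        System.out.println(filter(texts, false)); // [http://example.com]
        System.out.println(filter(texts, true));  // [http://example.com, look: http://example.com]
    }
}
```

With matches(), only texts that consist entirely of a URL are kept, which is stricter than "text containing links".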

java strings with numbers

I have a group of strings in an ArrayList.
1) I want to remove all the strings that contain only numbers,
and also strings like these: (0.75%), $1.5 ... basically everything that does not contain characters.
2) I want to remove all special characters in each string before I write it to the console.
"God should be printed God.
"Including should be printed: quoteIncluding
'find should be find
Java boasts a very nice Pattern class that makes use of regular expressions. You should definitely read up on that. A good reference guide is here.
I was going to post a coding solution for you, but styfle beat me to it! The only thing I would have done differently is, within the for loop, use the Pattern and Matcher classes, like this:
Pattern p = Pattern.compile("[a-zA-Z]+"); // compile once, outside the loop
for (int i = 0; i < myArray.size(); i++) {
    Matcher m = p.matcher(myArray.get(i));
    boolean match = m.matches(); // true if the entry consists only of letters
    //more code to get the string you want
}
But that's too bulky. styfle's solution is succinct and easy.
When you say "characters," I'm assuming you mean only "a through z" and "A through Z." You probably want to use Regular Expressions (Regex) as D1e mentioned in a comment. Here is an example using the replaceAll method.
import java.util.ArrayList;

public class Test {
    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<String>(5);
        list.add("\"God");
        list.add("&quot;Including");
        list.add("'find");
        list.add("24No3Numbers97");
        list.add("w0or5*d;");
        for (String s : list) {
            s = s.replaceAll("[^a-zA-Z]", ""); //use whatever regex you wish
            System.out.println(s);
        }
    }
}
The output of this code is as follows:
God
quotIncluding
find
NoNumbers
word
The replaceAll method uses a regex pattern and replaces all the matches with the second parameter (in this case, the empty string).
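The replaceAll example covers part 2 of the question. A sketch that also handles part 1, dropping entries that contain no ASCII letters at all before stripping, might look like this; the sample values come from the question, and "letters" means a through z and A through Z, as assumed in the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CleanStrings {
    // Drops entries with no ASCII letters, then strips every non-letter from the rest.
    static List<String> clean(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String s : input) {
            if (!s.matches(".*[a-zA-Z].*")) {
                continue;                          // e.g. "(0.75%)", "$1.5", "42"
            }
            out.add(s.replaceAll("[^a-zA-Z]", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("\"God", "(0.75%)", "$1.5", "'find", "42");
        System.out.println(clean(list)); // [God, find]
    }
}
```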

What's up with this regular expression not matching?

public class PatternTest {
    public static void main(String[] args) {
        System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.)"));
    }
}
This program prints "false". What?!
I am expecting to match the prefix of the string: "117_117_0009v0_1"
I know this stuff, really I do... but for the life of me, I've been staring at this for 20 minutes and have tried every variation I can think of and I'm obviously missing something simple and obvious here.
Hoping the many eyes of SO can pick it out for me before I lose my mind over this.
Thanks!
The final working version ended up as:
String text = "117_117_0009v0_172_5738_5740";
String regex = "[0-9_]+v._.";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
if (m.lookingAt()) {
    System.out.println(m.group());
}
One non-obvious discovery/reminder for me was that before accessing matcher groups, one of matches(), lookingAt() or find() must be called. If not, an IllegalStateException is thrown with the unhelpful message "No match found". Despite this, groupCount() can still return a non-zero value, because it counts the capturing groups in the pattern rather than reporting a match, so don't take it as evidence that anything matched.
I forgot how ugly this API is. Argh...
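A sketch reproducing the reminder above: groupCount() reports the number of capturing groups in the pattern whether or not anything has matched yet, while group() throws IllegalStateException until a match attempt has succeeded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBeforeMatch {
    // Calls group() before any match attempt and reports what happens.
    static String groupWithoutMatching(Matcher m) {
        try {
            return m.group();
        } catch (IllegalStateException e) {
            return "IllegalStateException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("([0-9_]+v._.)").matcher("117_117_0009v0_172_5738_5740");
        System.out.println(m.groupCount());          // 1: groups in the pattern, match or not
        System.out.println(groupWithoutMatching(m)); // exception: no match attempted yet
        if (m.lookingAt()) {
            System.out.println(m.group(1));          // 117_117_0009v0_1
        }
    }
}
```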
By default, String.matches() behaves as if the pattern were surrounded by ^ and $ (the whole string has to match), so something like this should work:
public class PatternTest {
    public static void main(String[] args) {
        System.out.println("117_117_0009v0_172_5738_5740".matches("^([0-9_]+v._.).*$"));
    }
}
returns:
true
Match content:
117_117_0009v0_1
This is the code I used to extract the match:
Pattern p = Pattern.compile("^([0-9_]+v._.).*$");
String str = "117_117_0009v0_172_5738_5740";
Matcher m = p.matcher(str);
if (m.matches()) {
    System.out.println(m.group(1));
}
If you want to check whether a string starts with a certain pattern, you should use the Matcher.lookingAt() method:
Pattern pattern = Pattern.compile("([0-9_]+v._.)");
Matcher matcher = pattern.matcher("117_117_0009v0_172_5738_5740");
if (matcher.lookingAt()) {
    int groupCount = matcher.groupCount();
    for (int i = 0; i <= groupCount; i++) {
        System.out.println(i + " : " + matcher.group(i));
    }
}
Javadoc:
boolean java.util.regex.Matcher.lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern. Like the matches method, this method always starts at the beginning of the region; unlike that method, it does not require that the entire region be matched. If the match succeeds then more information can be obtained via the start, end, and group methods.
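A sketch putting the three Matcher methods side by side on the question's input; the "id=" prefix is a made-up variation that shows where lookingAt() and find() differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeWays {
    static final Pattern P = Pattern.compile("[0-9_]+v._.");

    // matches(): the whole input must match.
    static boolean whole(String text) {
        return P.matcher(text).matches();
    }

    // lookingAt(): must match from the start, but may stop before the end.
    static String prefix(String text) {
        Matcher m = P.matcher(text);
        return m.lookingAt() ? m.group() : null;
    }

    // find(): the match may start anywhere.
    static String anywhere(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "117_117_0009v0_172_5738_5740";
        System.out.println(whole(text));            // false
        System.out.println(prefix(text));           // 117_117_0009v0_1
        System.out.println(prefix("id=" + text));   // null: input does not start with a match
        System.out.println(anywhere("id=" + text)); // 117_117_0009v0_1
    }
}
```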
I don't know the Java flavor of regular expressions, but this PCRE regular expression should work:
^([\d_]+v\d_\d).+
I don't know why you are using ._. instead of \d_\d.
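In Java the same expression works once its backslashes are doubled inside the string literal. A sketch using matches() and group(1) follows. Note two side effects of this form: \d only accepts digits around the underscore, and the trailing .+ requires at least one more character after the prefix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitPrefix {
    // ^([\d_]+v\d_\d).+ written as a Java string literal (backslashes doubled).
    static final Pattern P = Pattern.compile("^([\\d_]+v\\d_\\d).+");

    // Returns the captured prefix, or null if the whole string does not match.
    static String prefix(String text) {
        Matcher m = P.matcher(text);
        return m.matches() ? m.group(1) : null;  // .+ lets matches() cover the rest
    }

    public static void main(String[] args) {
        System.out.println(prefix("117_117_0009v0_172_5738_5740")); // 117_117_0009v0_1
    }
}
```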
