I am currently working on a program that extracts answers from HTML code and stores them in an array. The problem is that when I try to write a pattern to separate the answers, I can't seem to make an 'or' statement.
The answers are stored like this in the HTML:
['h','e','l','l','o',' ','w','o','r','l','d']
My problem is that when I write it into a String, the entry with a space (' ') is not recognized by the pattern, so what shows up in the output file is helloworld, with no spaces. What I want is a pattern that detects both the letters AND the spaces, but I have no idea how to make an 'or' statement in the middle of a pattern.
This is my pattern right now, which only detects the letters:
Pattern p= Pattern.compile("'[A-Z]+'");
EDIT: Still doesn't work... Do you think it might be something else?
Here's part of my code (sorry, I know it's a mess):
// creates a String containing the letters separated by ' '
public static String createString(BufferedReader in,BufferedWriter out, String texto) throws IOException{
StringBuilder sb= new StringBuilder();
Pattern p = Pattern.compile("'[A-Z '']'");
Matcher m= p.matcher(texto);
while(m.find()){
sb.append(m.group());
}
return sb.toString();
}
//splits the String in order to create an array with nothing but letters
public static void toArray(String s, String[] lista, BufferedWriter out) throws IOException{
lista=s.split("[']");
for(String a:lista){
out.write(a);
System.out.print(a); // to check output
}
}
Just add a space to the character class:
public class HelloWorldRegex {
public static void main(final String... args) {
final String regex = "'([A-Z ])'";
final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
final String input = "['h','e','l','l','o',' ','w','o','r','l','d']";
final Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.print(matcher.group(1));
}
}
}
Output: hello world
Test the regex online: https://regex101.com/r/eL8uT9/3
What you have now only says you're expecting one or more letters. You need to say you're expecting some letters or a space.
Pattern p = Pattern.compile("'[A-Z]+'|' '");
You need to use the or operator (|). This way you're saying you're expecting one or more letters or a space. Don't put spaces around the |, because whitespace inside a pattern is matched literally.
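As a minimal sketch of the two options discussed in this thread (the class and method names are made up for illustration), alternation with | and a character class that includes the space give the same result for the sample input. It assumes lowercase letters should match too, so it uses [A-Za-z] rather than [A-Z].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedCharDemo {
    // Alternation: a quoted run of letters OR a quoted single space.
    static final Pattern WITH_OR = Pattern.compile("'([A-Za-z]+| )'");
    // Character class: a quoted single character that is a letter or a space.
    static final Pattern WITH_CLASS = Pattern.compile("'([A-Za-z ])'");

    // Concatenates the content of every quoted item, without the quotes.
    static String extract(Pattern pattern, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            sb.append(m.group(1)); // group 1 excludes the surrounding quotes
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "['h','e','l','l','o',' ','w','o','r','l','d']";
        System.out.println(extract(WITH_OR, input));    // hello world
        System.out.println(extract(WITH_CLASS, input)); // hello world
    }
}
```

Capturing group 1 also removes the need to split on the quote character afterwards.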
Related
I am trying to return a range of words from a string using regular expressions, but I am lost on how to do it. This is my attempt:
private static final String REGEX = "\\hello";
private static final String INPUT = "I love holidays hello how are you today during summer";
public static void main( String args[] ) {
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
int count = 0;
while(m.find()) {
count++;
System.out.println("Match number "+count);
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
System.out.println("value of m >> "+m);//to print >>> hello how are you
Does anyone know how I can return the words hello how are you from the string using regex?
Since the requirement is to start at hello and end at the first you, you could use the following expression: hello.*?you
Here .*? means you want to match anything but as little as possible. This is meant to stop at the first you rather than the last.
If you want to prevent matches on input where hello and you are only parts of other words, surround them with the word boundary \b: \bhello\b.*?\byou\b
If you want to match inside sentences only, i.e. the match should not include ., ! or ?, you could use a negative character class like [^.!?], i.e. replace .*? with [^.!?]*?. Note that inside the character class . has a literal meaning, i.e. it is the dot and not the "any character" wildcard.
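The two variants above could be tried out roughly like this. It is only a sketch: the class name, the helper method and the second test sentence are assumptions added for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeMatchDemo {
    // From the word "hello" to the first following word "you".
    static final Pattern ANYWHERE = Pattern.compile("\\bhello\\b.*?\\byou\\b");
    // Same, but the match may not cross '.', '!' or '?'.
    static final Pattern SAME_SENTENCE = Pattern.compile("\\bhello\\b[^.!?]*?\\byou\\b");

    // Returns the first match of the pattern, if there is one.
    static Optional<String> firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String input = "I love holidays hello how are you today during summer";
        System.out.println(firstMatch(ANYWHERE, input).orElse("no match")); // hello how are you

        String twoSentences = "hello there. Are you ok?";
        System.out.println(firstMatch(ANYWHERE, twoSentences).orElse("no match"));      // hello there. Are you
        System.out.println(firstMatch(SAME_SENTENCE, twoSentences).orElse("no match")); // no match
    }
}
```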
I am trying to find the text between two commas if it contains a specific string.
so far i have ,(.*?myString.?*),
Obviously this finds all the input between the first comma in the entire input and the first comma after the string I want. How do I reference the comma immediately before the string that I want?
Edit: I also want to find the match that occurs after a specific set of characters,
i.e. one that occurs after fooo:
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
returns gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it :
[^,]+theMatch.*?(?=,)
see regex demo / explanation
Java ( demo )
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class RegEx {
public static void main(String[] args) {
String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
String r = "[^,]+theMatch.*?(?=,)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
}
}
}
Edit
Use this regex: fooo.*?([^,]+theMatch.*?)(?=,)
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(0));
// prints out ",myinput,"
System.out.println(m.group(1));
// prints out "myinput"
}
}
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma preceding your input string, use the following code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(str.indexOf(m.group(0)));
// prints out "16"
}
}
}
By feeding the match of the regular expression into the String method indexOf(), you are able to locate the position of the start of your string.
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match. Because it is greedy, the last qualifying field after fooo is returned if there are several.
Example code:
import java.util.regex.*;
public class Main {
public static void main(String[] args) {
String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1));
// prints out: gfhhhgdtheMatchfhhhfd
}
}
}
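To see how the quantifier after fooo decides which field is returned when several fields after the marker contain theMatch, here is a small sketch. The short sample input, the class name and the helper are invented for illustration; the reluctant variant .*? is an alternative, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixQuantifierDemo {
    // Greedy prefix: backtracks from the end, so the LAST qualifying field wins.
    static final Pattern LAST_AFTER = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
    // Reluctant prefix: expands from the marker, so the FIRST qualifying field wins.
    static final Pattern FIRST_AFTER = Pattern.compile("fooo.*?,([^,]*theMatch[^,]*),");

    // Returns the captured field, or null when nothing matches.
    static String fieldAfterMarker(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "xfooo,a1theMatch,b,c2theMatch,d";
        System.out.println(fieldAfterMarker(LAST_AFTER, input));  // c2theMatch
        System.out.println(fieldAfterMarker(FIRST_AFTER, input)); // a1theMatch
    }
}
```

With the sample string from the question both variants return the same field, because only one field after fooo contains theMatch.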
The usual approach is to use a pattern that cannot match your delimiter in place of the . wildcard. In this case, you need that only at the front of the pattern; you can use a reluctant quantifier at the back as you already do (though you've misspelled it: it is .*?, not .?*). For example:
,([^,]*myString.*?),
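A variation on the same idea, sketched under the assumption that you want the whole field and do not need the commas themselves: use the negated class on both sides and leave the commas out of the pattern. That way the first and last fields, and two adjacent matching fields, are found as well. The class and method names are hypothetical, and the from parameter relies on Matcher.find(int), which resets the matcher and starts searching at the given index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaFieldDemo {
    // A whole comma-free field that contains the literal text "theMatch".
    static final Pattern FIELD = Pattern.compile("[^,]*theMatch[^,]*");

    // Collects every matching field, searching from index 'from' onwards.
    static List<String> fieldsContaining(String input, int from) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        boolean found = m.find(from); // first search starts at 'from'
        while (found) {
            fields.add(m.group());
            found = m.find();         // later searches continue after the last match
        }
        return fields;
    }

    public static void main(String[] args) {
        String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
        System.out.println(fieldsContaining(s, 0));
        // [gfdsgdtheMatchfdsgfd, gfhhhgdtheMatchfhhhfd]
        int afterMarker = s.indexOf("fooo") + "fooo".length();
        System.out.println(fieldsContaining(s, afterMarker));
        // [gfhhhgdtheMatchfhhhfd]
    }
}
```

One caveat: if theMatch sits in the same field as fooo, the search starts in the middle of that field and returns only the part after the marker.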
I asked this question a while ago, but did not get a proper answer, so giving it another shot.
class Test {
public static void main (String[] args) throws java.lang.Exception
{
String file_name = "C:\\Temp\\Test.txt";
String string = FileUtils.readFileToString(new File(file_name), "UTF-8");
String regex = "^(ipv6 pim(?: vrf .*?)? rp-address .*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
} else {
System.out.println("No Matches");
}
}
}
The file contains a lot of lines (more than 750, I guess), and I want to extract all the lines that match the regex. The problem is that the code as written does not return any matches unless the first line of the file matches the regex; if the matching line is somewhere in the middle, no luck. I thought the line breaks were causing the problem, but even after converting the string into a single line, nothing is returned unless the match is at the beginning.
A sample matching string: ipv6 pim rp-address 20:20:20::F
Try giving the MULTILINE modifier:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Instead of using an if condition, switch it to a while loop.
while (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
}
find() searches for one matching value. To get the next one, you must invoke find() again, hence the loop.
Additionally, without the MULTILINE flag the ^ matches only at the very start of the input, so at most one match is possible. You could drop the ^, although the pattern could then also match in the middle of a line.
Alternatively, as Rambler suggested, use the Pattern.MULTILINE flag. This makes ^ match at the beginning of every line instead of only once at the beginning of the whole string.
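Putting both suggestions together (the MULTILINE flag plus a while loop) might look like the sketch below. The file reading is left out, and every config line except the sample one from the question is invented for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // MULTILINE makes ^ match at the start of every line, not just the input.
    static final Pattern RP_ADDRESS = Pattern.compile(
            "^(ipv6 pim(?: vrf .*?)? rp-address .*)", Pattern.MULTILINE);

    // Returns every line of the text that matches the pattern.
    static List<String> matchingLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = RP_ADDRESS.matcher(text);
        while (m.find()) {          // each find() call moves on to the next match
            lines.add(m.group(1));  // '.' stops at the line break, so this is one line
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "hostname r1\n"
                + "ipv6 pim rp-address 20:20:20::F\n"
                + "interface Gi0/0\n"
                + "ipv6 pim vrf RED rp-address 10::1\n";
        for (String line : matchingLines(text)) {
            System.out.println("Matcher: " + line);
        }
    }
}
```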
I'm trying to get the hang of Pattern and Matcher. This method should use the regex pattern to iterate over an array of state capitals and return the state or states that correspond to the pattern. The method works fine when I check for whole strings like "tallahassee" or "salt lake city", but not for something like "^t". What is it that I'm not getting?
This is the method and main that calls it:
public ArrayList<String> getState(String s) throws RemoteException
{
Pattern pattern = Pattern.compile(s);
Matcher matcher;
int i=0;
System.out.println(s);
for(String ct:capitalValues)
{
matcher = pattern.matcher(ct);
if(ct.toLowerCase().matches(s))
states.add(stateValues[i]);
i++;
}
return states;
}
public static void main (String[] args) throws RemoteException
{
ArrayList<String> result = new ArrayList<String>();
hashTester ht = new hashTester();
result = ht.getState(("^t").toLowerCase());
System.out.println("result: ");
for(String s:result)
System.out.println(s);
}
thanks for your help
You're not even using your matcher for matching. You're using the String#matches() method. Both that method and Matcher#matches() match the regex against the complete string, not a part of it, so your regex would have to cover the entire string. If you just want to match a part of the string, use the Matcher#find() method.
You should use it like this (find() takes no String argument, so the matcher has to be created on the lower-cased string):
if (pattern.matcher(ct.toLowerCase()).find()) {
// Found regex pattern
}
BTW, if you only want to see if a string starts with t, you can directly use String#startsWith() method. No need of regex for that case. But I guess it's a general case here.
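The difference between matches() and find() can be checked with a small sketch like this one. It is simplified compared to the question: it returns the matching capitals themselves rather than looking up states, and the class name, method name and the CASE_INSENSITIVE flag (used instead of toLowerCase()) are my own choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FindVsMatchesDemo {
    // Keeps every capital in which the regex matches somewhere (find semantics).
    static List<String> capitalsMatching(List<String> capitals, String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String capital : capitals) {
            if (pattern.matcher(capital).find()) {
                hits.add(capital);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // matches() must consume the whole string, find() only a part of it.
        System.out.println("tallahassee".matches("^t"));    // false
        System.out.println("tallahassee".matches("^t.*"));  // true
        System.out.println(Pattern.compile("^t").matcher("tallahassee").find()); // true

        List<String> capitals = Arrays.asList("Tallahassee", "Salt Lake City", "Topeka", "Austin");
        System.out.println(capitalsMatching(capitals, "^t")); // [Tallahassee, Topeka]
    }
}
```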
^ is an anchor character in regex. You have to escape it if you do not want anchoring; otherwise ^t means a t at the beginning of the string. Escape it as \\^t
I'm trying to make a regex that grabs all words (let's just say chicken) that are not in brackets. So
chicken
Would be selected but
[chicken]
Would not. Does anyone know how to do this?
String template = "[chicken]";
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while (m.find())
{
System.out.println(m.group());
}
It uses a combination of negative look-behind and negative look-aheads and boundary matchers.
(?<!\\[) //negative look behind
(?!\\]) //negative look ahead
(\\w+) //capture group for the word
\\G //is a boundary matcher for marking the end of the previous match
(please read the following edits for clarification)
EDIT 1:
If one needs to account for situations like:
"chicken [chicken] chicken [chicken]"
We can replace the regex with:
String regex = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
EDIT 2:
If one also needs to account for situations like:
"[chicken"
"chicken]"
As in one still wants the "chicken", then you could use:
String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";
Which essentially accounts for the two cases of having only one bracket on either side. It accomplishes this through the | which acts as an or, and by using ? after the look-ahead/behinds, where ? means 0 or 1 of the previous expression.
I guess you want something like:
final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");
private List<String> findAllUnbracketedWords(final String s) {
final List<String> ret = new ArrayList<String>();
final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
while (m.find()) {
ret.add(m.group());
}
return Collections.unmodifiableList(ret);
}
Use this:
/(?<![\[\w])\w+(?![\w\]])/
i.e., consecutive word characters with no square bracket or word character before or after.
This needs to check both left and right for both a square bracket and a word character, else for your input of [chicken] it would simply return
hicke
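In Java, the same expression has to be written as a string literal with doubled backslashes. A minimal sketch, with a class name and helper method that are made up here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnbracketedWordsDemo {
    // Word characters with no '[' or word character before,
    // and no word character or ']' after.
    static final Pattern WORD = Pattern.compile("(?<![\\[\\w])\\w+(?![\\w\\]])");

    // Returns every word that does not touch a square bracket.
    static List<String> find(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("[chicken]"));                           // []
        System.out.println(find("chicken [chicken] chicken [chicken]")); // [chicken, chicken]
        System.out.println(find("pig [cow] chicken bull] [grain"));      // [pig, chicken]
    }
}
```

Note that this version also rejects words with a bracket on only one side, such as bull] and [grain.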
Without look around:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchingTest
{
private static String x = "pig [cow] chicken bull] [grain";
public static void main(String[] args)
{
Pattern p = Pattern.compile("(\\[?)(\\w+)(\\]?)");
Matcher m = p.matcher(x);
while(m.find())
{
String firstBracket = m.group(1);
String word = m.group(2);
String lastBracket = m.group(3);
if ("".equals(firstBracket) && "".equals(lastBracket))
{
System.out.println(word);
}
}
}
}
Output:
pig
chicken
A bit more verbose, sure, but I find it more readable and easier to understand. Certainly simpler than a huge regular expression trying to handle all possible combinations of brackets.
Note that this won't filter out input like [fence tree grass]; it will indicate that tree is a match. You cannot skip tree in that without a parser. Hopefully, this is not a case you need to handle.
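If multi-word brackets like [fence tree grass] do matter, a tiny hand-written scanner that tracks bracket depth is one way to go. The following is only a sketch under the assumption that brackets are balanced; the class and method names are hypothetical, and a stray ] outside any bracket is simply ignored, so bull] would still yield bull, unlike the regex approaches above.

```java
import java.util.ArrayList;
import java.util.List;

class BracketDepthScanner {
    // Collects words (letters, digits, underscore) that appear outside all brackets.
    static List<String> wordsOutsideBrackets(String s) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of '[' currently open
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean wordChar = Character.isLetterOrDigit(c) || c == '_';
            if (wordChar && depth == 0) {
                current.append(c);
                continue;
            }
            // Anything else ends the word collected so far.
            if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsOutsideBrackets("pig [fence tree grass] chicken")); // [pig, chicken]
    }
}
```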