I have a sort of a problem with this code:
String[] paragraph;
if(paragraph[searchKeyword_counter].matches("(.*)(\\b)"+"is"+"(\\b)(.*)")){
if i am not mistaken to use .matches() and search a particular character in a string i need a .* but what i want to happen is to search a character without matching it to another word.
For example is the keyword i am going to search I do not want it to match with words that contain is character like ship, his, this. so i used \b for boundary but the code above is not working for me.
Example:
String[] Content= {"is,","his","fish","ish","its","is"};
String keyword = "is";
for(int i=0;i<Content.length;i++){
if(content[i].matches("(.*)(\\b)"+keyword+"(\\b)(.*)")){
System.out.println("There are "+i+" is.");
}
}
What i want to happen here is that it will only match with is is, but not with his fish. So is should match with is, and is meaning I want it to match even the character is beside a non-alphanumerical character and spaces.
What is the problem with the code above?
what if one of the content has a uppercase character example IS and it is compared with is, it will be unmatched. Correct my if i am wrong. How to match a lower cased character to a upper cased character without changing the content of the source?
String string = "...";
String word = "is";
Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
Matcher m = p.matcher(string);
if (m.find()) {
...
}
just add spaces like this:
suppose message equal your content string and pattern is your keyword
if ((message).matches(".* " + pattern + " .*")||(message).matches("^" + pattern + " .*")
||(message).matches(".* " + pattern + "$")) {
Related
i have a full string like this - "Hello all you guys"
and i have a bad word like "all"
now i managed to find the second string in the first that's easy,
but let's say my first string is "Hello a.l.l you guys"
or "Hello a,l,l you guys"
or even "Hello a l l you guys"
is there a regex way to find it ?
what i've got so far is
String wordtocheck =pair.getKey().toString();
String newerstr = "";
for(int i=0;i<wordtocheck.length();i++)
newerstr+=wordtocheck.charAt(i)+"\\.";
Pattern.compile("(?i)\\b(newerstr)(?=\\W)").matcher(currentText.toString());
but it doesn't do the trick
thanks to all helpers
You may build the pattern dynamically by inserting \W* (=zero or more non-word chars, that is, chars that are not letters, digits or underscore) in between the characters of a keyword to search for:
String s = "Hello a l l you guys";
String key = "all";
String pat = "(?i)\\b" + TextUtils.join("\\W*", key.split("")) + "\\b";
System.out.println("Pattern: " + pat);
Matcher m = Pattern.compile(pat).matcher(s);
if (m.find())
{
System.out.println("Found: " + m.group());
}
See the online demo (String.join is used instead of TextUtils.join since this is a Java demo)
If there can be non-word chars in the search words, you need to replace \b word boundaries with (?<!\\S) (the initial \b) and (?!\\S) (instead of the trailing \b), or remove altogether.
Try this
String str="Hello .a-l l? guys";
str=str.replaceAll("\\W",""); //replaces all non-words chars with empty string.
str is now "Helloallguys"
I have some strings which are indexed and are dynamic.
For example:
name01,
name02,
name[n]
now I need to separate name from index.
I've come up with this regex which works OK to extract index.
([0-9]+(?!.*[0-9]))
But, there are some exceptions of these names. Some of them may have a number appended which is not the index.(These strings are limited and I know them, meaning I can add them as "exceptions" in the regex)
For example,
panLast4[01]
Here the last '4' is not part of the index, so I need to distinguish.
So I tried:
[^panLast4]([0-9]+(?!.*[0-9]))
Which works for panLast4[123] but not panLast4[43]
Note: the "[" and "]" is for explanation purposes only, it's not present in the strings
What is wrong?
Thanks
You can use the split method with this pattern:
(?<!^panLast(?=4)|^nm(?=14)|^nm1(?=4))(?=[0-9]+$)
The idea is to find the position where there are digits until the end of the string (?=[0-9]+$). But the match will succeed if the negative lookbehind allows it (to exclude particular names (panLast4 and nm14 here) that end with digits). When one of these particular names is found, the regex engine must go to the next position to obtain a match.
Example:
String s ="panLast412345";
String[] res = s.split("(?<!^panLast(?=4)|^nm(?=14)|^nm1(?=4))(?=[0-9]+$)", 2);
if ( res.length==2 ) {
System.out.println("name: " + res[0]);
System.out.println("ID: " + res[1]);
}
An other method with matches() that simply uses a lazy quantifier as last alternative:
Pattern p = Pattern.compile("(panLast4|nm14|.*?)([0-9]+)");
String s = "panLast42356";
Matcher m = p.matcher(s);
if ( m.matches() && m.group(1).length()>0 ) {
System.out.println("name: "+ m.group(1));
System.out.println("ID: "+ m.group(2));
}
Can you help me in finding a regex that take list of phrases and check if one of these phrases exist in the given text, please?
Example:
If I have in the hashSet the following words:
كيف الحال
إلى أين
أين يوجد
هل من أحد هنا
And the given text is: كيف الحال أتمنى أن تكون بخير
I want to get after performing regex: كيف الحال
My initial code:
HashSet<String> QWWords = new HashSet<String>();
QWWords.add("كيف الحال");
QWWords.add("إلى أين");
QWWords.add("أين يوجد");
QWWords.add("هل من أحد هنا");
String s1 = "كيف الحال أتمنى أن تكون بخير";
for (String qp : QWWords) {
Pattern p = Pattern.compile("[\\s" + qp + "\\s]");
Matcher m = p.matcher(s1);
String found = "";
while (m.find()) {
found = m.group();
System.out.println(found);
}
}
[...] is character class and character class can match only one character it specifies. For instance character class like [abc] can match only a OR b OR c. So if you want to find only word abc don't surround it with [...].
Another problem is that you are using \\s as word separator, so in following String
String data = "foo foo foo foo";
regex \\sfoo\\s will not be able to match first foo because there is no space before.
So first match it will find will be
String data = "foo foo foo foo";
// this one--^^^^^
Now, since regex consumed space after second foo it can't reuse it in next match so third foo will also be skipped because there is no space available to match before it.
You will also not match forth foo because this time there is no space after it.
To solve this problem you can use \\b - word boundary which checks if place it represents is between alphanumeric and non-alphanumeric characters (or start/end of string).
So instead of
Pattern p = Pattern.compile("[\\s" + qp + "\\s]");
use
Pattern p = Pattern.compile("\\b" + qp + "\\b");
or maybe better as Tim mentioned
Pattern p = Pattern.compile("\\b" + qp + "\\b",Pattern.UNICODE_CHARACTER_CLASS);
to make sure that \\b will include Arabic characters in predefined alphanumeric class.
UPDATE:
I am not sure if your words can contain regex metacharacters like { [ + * and so on, so just in case you can also add escaping mechanism to change such characters into literals.
So
"\\b" + qp + "\\b"
can become
"\\b" + Pattern.quote(qp) + "\\b"
I am trying to write a regex that will match urls inside strings of text that may be html-encoded. I am having a considerable amount of trouble with lookaround though. I need something that would correctly match both links in the string below:
some text "http://www.notarealwebsite.com/?q=asdf&searchOrder=1" "http://www.notarealwebsite.com" some other text
A verbose description of what I want would be: "http://" followed by any number of characters that are not spaces, quotes, or the string ""[semicolon]" (I don't care about accepting other non-url-safe characters as delimiters)
I have tried a few regexes using lookahead to check for &'s followed by q's followed by u's and so on, but as soon as I put one into the [^...] negation it just completely breaks down and evaluates more like: "http:// followed by any number of characters that are not spaces, quotes, ampersands, q's, u's, o's, t's, or semicolons" which is obviously not what I am looking for.
This will correctly match the &'s at the beginning of the "[semicolon]:
&(?=q(?=u(?=o(?=t(?=;)))))
But this does not work:
http://[^ "&(?=q(?=u(?=o(?=t(?=;)))))]*
I know just enough about regexes to get into trouble, and that includes not knowing why this won't work the way I want it to. I understand to some extent positive and negative lookaround, but I don't understand why it breaks down inside the [^...]. Is it possible to do this with regexes? Or am I wasting my time trying to make it work?
If your regex implementation supports it, use a positive look ahead and a backreference with a non-greedy expression in the body.
Here is one with your conditions: (["\s]|")(http://.*?)(?=\1)
For example, in Python:
import re
p = re.compile(r'(["\s]|")(https?://.*?)(?=\1)', re.IGNORECASE)
url = "http://test.url/here.php?var1=val&var2=val2"
formatstr = 'text "{0}" more text {0} and more "{0}" test greed"'
data = formatstr.format(url)
for m in p.finditer(data):
print "Found:", m.group(2)
Produces:
Found: http://test.url/here.php?var1=val&var2=val2
Found: http://test.url/here.php?var1=val&var2=val2
Found: http://test.url/here.php?var1=val&var2=val2
Or in Java:
#Test
public void testRegex() {
Pattern p = Pattern.compile("([\"\\s]|")(https?://.*?)(?=\\1)",
Pattern.CASE_INSENSITIVE);
final String URL = "http://test.url/here.php?var1=val&var2=val2";
final String INPUT = "some text " + URL + " more text + \"" + URL +
"\" more then "" + URL + "" testing greed "";
Matcher m = p.matcher(INPUT);
while( m.find() ) {
System.out.println("Found: " + m.group(2));
}
}
Produces the same output.
I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.
This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't require iteration, only your question title assumes it's necessary, which it isn't.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij
Use positive lookbehind and postive lookahead which are zero-width assertions
(?<=(.)|^)1(?=(.)|$)
^ ^ ^-looks for a character after 1 and captures it in group2
| |->matches 1..you can replace it with any word
|
|->looks for a character just before 1 and captures it in group 1..this is zero width assertion that doesn't move forward to match.it is just a test and thus allow us to capture the values
$1 and $2 contains your value..Go on finding till the end
So this should be like
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
while(m.find()) s3 += m.group(1)+m.group(2);
//s3 now contains c13iij
works here
Use regex as follows:
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
for (int i = 1; m.find(); c += m.group(1) + m.group(2), i++);
Check this demo.