Trying to find a word with separators in a string - java

I have a full string such as "Hello all you guys"
and a bad word such as "all".
Finding the second string inside the first is easy,
but suppose the first string is "Hello a.l.l you guys",
"Hello a,l,l you guys",
or even "Hello a l l you guys".
Is there a regex way to find it?
What I have so far is:
String wordtocheck =pair.getKey().toString();
String newerstr = "";
for(int i=0;i<wordtocheck.length();i++)
newerstr+=wordtocheck.charAt(i)+"\\.";
Pattern.compile("(?i)\\b(newerstr)(?=\\W)").matcher(currentText.toString());
but it doesn't do the trick.
Thanks to all helpers.

You can build the pattern dynamically by inserting \W* (zero or more non-word characters, that is, characters other than letters, digits, or underscores) between the characters of the keyword to search for:
String s = "Hello a l l you guys";
String key = "all";
String pat = "(?i)\\b" + TextUtils.join("\\W*", key.split("")) + "\\b";
System.out.println("Pattern: " + pat);
Matcher m = Pattern.compile(pat).matcher(s);
if (m.find())
{
System.out.println("Found: " + m.group());
}
In plain Java (outside Android), String.join can be used in place of TextUtils.join.
If the search words can contain non-word characters, replace the \b word boundaries with (?<!\\S) (instead of the leading \b) and (?!\\S) (instead of the trailing \b), or remove them altogether.
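The approach above can be sketched in plain Java as follows. This is a minimal sketch, not the answerer's exact code: the class and method names are made up for illustration, and each character is passed through Pattern.quote on the assumption that keywords might contain regex metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SeparatedWordFinder {

    // Builds a case-insensitive pattern that allows any run of non-word
    // characters (\W*) between consecutive characters of the keyword.
    static Pattern buildPattern(String key) {
        String body = key.codePoints()
                .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining("\\W*"));
        return Pattern.compile("(?i)\\b" + body + "\\b");
    }

    // Returns the first (possibly separated) occurrence of key in text,
    // or null when there is none.
    static String findFirst(String text, String key) {
        Matcher m = buildPattern(key).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Hello all you guys",
            "Hello a.l.l you guys",
            "Hello a,l,l you guys",
            "Hello a l l you guys"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + findFirst(s, "all"));
        }
    }
}
```

Because of the \b anchors, a keyword embedded in a longer word (for example "all" inside "ball") is not reported.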

Try this:
String str = "Hello .a-l l? guys";
str = str.replaceAll("\\W", ""); // removes every non-word character
// str is now "Helloallguys"

Related

Java Regex Return Last Word

String regex = "(some|text|)";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
while (matcher.find())
{
int start = matcher.start();
int end = matcher.end();
System.out.print("Start index: " + start);
System.out.print(" End index: " + end + " ");
System.out.println(matcher.group());
}
Hi, I would like to return the full substring up to the end of the word. For example,
if the input is:
String input = "I am a texte";
I would expect it to return 7 11. Basically, I want to return the text up to the final "e" rather than the "t". Is this possible? If so, how can it be implemented?

Why use a regexp for this? String has lastIndexOf to find the last index of a delimiter (like space), and it looks like you're not trying to find a "word" but "the substring after the last space" (which are not the same thing in many, many languages), so given that, just use:
String last = input.substring(input.lastIndexOf(' ') + 1);
(optionally as two lines, with a check that lastIndexOf returned a sensible position), and done?
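A minimal sketch of the substring idea, with the "no space found" case spelled out. The class and method names are illustrative only.

```java
class LastToken {

    // Returns the text after the last space, or the whole input when the
    // input contains no space at all.
    static String lastToken(String input) {
        int idx = input.lastIndexOf(' ');
        // lastIndexOf returns -1 when nothing is found, so idx + 1 is 0
        // and substring(0) yields the entire input.
        return input.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastToken("I am a texte"));
    }
}
```

Note that an input ending in a space yields an empty string, so callers may want to trim first.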

You can use the following regex instead:
String regex = "\\b(?:some|text)\\w*";
This matches every word that starts with some or text, for example someone.
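One way to extend a match to the end of the word is to follow the keyword alternatives with \w*. The sketch below assumes that a "word" means a run of \w characters and that the keyword must sit at the start of the word; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixWordMatcher {

    // \b anchors at the start of a word, \w* extends the match to its end.
    private static final Pattern WORDS =
            Pattern.compile("\\b(?:some|text)\\w*", Pattern.CASE_INSENSITIVE);

    // Returns one "start end word" entry per match; end is exclusive,
    // as reported by Matcher.end().
    static List<String> find(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORDS.matcher(input);
        while (m.find()) {
            hits.add(m.start() + " " + m.end() + " " + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String hit : find("I am a texte")) {
            System.out.println(hit);
        }
    }
}
```

Keep in mind that Matcher.end() is the index just past the last matched character, so the end value is one larger than the index of the final letter.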

Extract Arabic phrases from a given text in java

Can you help me find a regex that takes a list of phrases and checks whether any of these phrases exists in a given text, please?
Example:
If I have in the hashSet the following words:
كيف الحال
إلى أين
أين يوجد
هل من أحد هنا
And the given text is: كيف الحال أتمنى أن تكون بخير
I want to get after performing regex: كيف الحال
My initial code:
HashSet<String> QWWords = new HashSet<String>();
QWWords.add("كيف الحال");
QWWords.add("إلى أين");
QWWords.add("أين يوجد");
QWWords.add("هل من أحد هنا");
String s1 = "كيف الحال أتمنى أن تكون بخير";
for (String qp : QWWords) {
Pattern p = Pattern.compile("[\\s" + qp + "\\s]");
Matcher m = p.matcher(s1);
String found = "";
while (m.find()) {
found = m.group();
System.out.println(found);
}
}

[...] is a character class, and a character class matches exactly one of the characters it lists. For instance, [abc] matches only a OR b OR c. So if you want to find the whole word abc, don't surround it with [...].
Another problem is that you are using \\s as the word separator, so in the following String
String data = "foo foo foo foo";
the regex \\sfoo\\s cannot match the first foo because there is no space before it.
So the first match it finds will be
String data = "foo foo foo foo";
     // this one--^^^^^
Because the regex consumed the space after the second foo, it can't reuse that space in the next match, so the third foo is also skipped: there is no space left to match before it.
The fourth foo is not matched either, because there is no space after it.
To solve this problem you can use \\b, a word boundary, which checks whether the position it represents lies between an alphanumeric and a non-alphanumeric character (or at the start/end of the string).
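The difference between consuming the separators and merely asserting a boundary can be checked by counting matches. A small sketch (class and method names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {

    // Counts how many non-overlapping matches of regex occur in data.
    static int count(String regex, String data) {
        Matcher m = Pattern.compile(regex).matcher(data);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String data = "foo foo foo foo";
        // \s consumes the surrounding spaces, so neighbouring words are skipped.
        System.out.println(count("\\sfoo\\s", data));
        // \b is zero-width, so every occurrence is found.
        System.out.println(count("\\bfoo\\b", data));
    }
}
```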
So instead of
Pattern p = Pattern.compile("[\\s" + qp + "\\s]");
use
Pattern p = Pattern.compile("\\b" + qp + "\\b");
or maybe better as Tim mentioned
Pattern p = Pattern.compile("\\b" + qp + "\\b",Pattern.UNICODE_CHARACTER_CLASS);
to make sure that \\b treats Arabic characters as part of the predefined alphanumeric class.
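Put together, the phrase check might look like the sketch below. It is only an illustration: the class and method names are invented, a List is used instead of the HashSet so that the result order is predictable, and Pattern.quote is applied on the assumption that a phrase could contain regex metacharacters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

class PhraseFinder {

    // Returns every phrase that occurs in text as a whole phrase.
    static List<String> findPhrases(Collection<String> phrases, String text) {
        List<String> found = new ArrayList<>();
        for (String phrase : phrases) {
            // UNICODE_CHARACTER_CLASS makes \b treat Arabic letters as word characters.
            Pattern p = Pattern.compile(
                    "\\b" + Pattern.quote(phrase) + "\\b",
                    Pattern.UNICODE_CHARACTER_CLASS);
            if (p.matcher(text).find()) {
                found.add(phrase);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList(
                "كيف الحال", "إلى أين", "أين يوجد", "هل من أحد هنا");
        String text = "كيف الحال أتمنى أن تكون بخير";
        System.out.println(findPhrases(phrases, text));
    }
}
```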
UPDATE:
I am not sure whether your phrases can contain regex metacharacters such as {, [, +, or *, so just in case you can also escape them so that they are treated as literals.
So
"\\b" + qp + "\\b"
can become
"\\b" + Pattern.quote(qp) + "\\b"

Iterating through String with .find() in Java regex

I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This handles your edge case with a lookahead inside a non-capturing group, and otherwise matches the usual (consuming) case.
Note that your requirements don't call for iteration; only your question title assumes it is necessary.
Note also that, to be absolutely safe, you should escape word in case it contains special regex characters. If you can't guarantee that it doesn't, use Pattern.quote(word) instead of word.
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij

Use a positive lookbehind and a positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) looks for a character just before 1 and captures it in group 1. This is a zero-width assertion: it is only a test and does not move the match position forward, which lets us capture the value without consuming it.
1 matches the literal 1; you can replace it with any word.
(?=(.)|$) looks for a character just after 1 and captures it in group 2.
Groups 1 and 2 then contain your values. Keep calling find() until the end of the input.
In code:
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
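while (m.find()) {
// A group is null when the word sits at the very start or end of the input.
if (m.group(1) != null) s3 += m.group(1);
if (m.group(2) != null) s3 += m.group(2);
}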
//s3 now contains c13iij

Use a regex as follows, where a is the input string, b is the word, and c collects the result:
String c = "";
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
while (m.find()) c += m.group(1) + m.group(2);
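A variant that sidesteps capturing groups altogether is to match only the word itself and read its neighbours from the match indices. Since the pattern never consumes the surrounding characters, a character between two occurrences is naturally reported twice. This is a sketch of that alternative, not any answerer's code; the class name is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordEnds {

    // Collects the character just before and just after every occurrence
    // of word in str, skipping neighbours that do not exist.
    static String wordEnds(String str, String word) {
        StringBuilder out = new StringBuilder();
        // Pattern.quote makes the word match literally.
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(str);
        while (m.find()) {
            if (m.start() > 0) {
                out.append(str.charAt(m.start() - 1)); // char before the word
            }
            if (m.end() < str.length()) {
                out.append(str.charAt(m.end()));       // char after the word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY"));
        System.out.println(wordEnds("abc1xyz1i1j", "1"));
    }
}
```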

String Matches, Java

I have a sort of a problem with this code:
String[] paragraph;
if(paragraph[searchKeyword_counter].matches("(.*)(\\b)"+"is"+"(\\b)(.*)")){
If I am not mistaken, to use .matches() to search for a particular word in a string I need .* on both sides. What I want, however, is to find the keyword without matching it inside another word.
For example, if the keyword is "is", I do not want it to match words that merely contain "is", such as ship, his, or this. So I used \b for word boundaries, but the code above is not working for me.
Example:
String[] Content= {"is,","his","fish","ish","its","is"};
String keyword = "is";
for(int i=0;i<Content.length;i++){
if(content[i].matches("(.*)(\\b)"+keyword+"(\\b)(.*)")){
System.out.println("There are "+i+" is.");
}
}
What I want is for it to match only "is" and "is,", but not "his" or "fish". In other words, the keyword should match even when it sits next to a non-alphanumeric character or a space.
What is the problem with the code above?
Also, what if one of the content strings contains uppercase characters, for example IS, and it is compared with is? It will not match. Correct me if I am wrong. How can I match regardless of case without changing the content of the source?

String string = "...";
String word = "is";
Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
Matcher m = p.matcher(string);
if (m.find()) {
...
}
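To also cover the case-insensitivity part of the question, the same pattern can be compiled with Pattern.CASE_INSENSITIVE, which leaves the source strings untouched. A minimal sketch with illustrative names:

```java
import java.util.regex.Pattern;

class WholeWordMatcher {

    // True when word occurs in text as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // find() looks for the pattern anywhere, so no .* padding is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] content = {"is,", "his", "fish", "ish", "its", "is", "IS"};
        for (String item : content) {
            System.out.println(item + " -> " + containsWord(item, "is"));
        }
    }
}
```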

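Just add spaces, like this.
Suppose message holds your content string and pattern holds your keyword:
if (message.matches(".* " + pattern + " .*") || message.matches("^" + pattern + " .*")
|| message.matches(".* " + pattern + "$")) {
Note that this treats only spaces as separators and requires at least one, so neither "is," nor a string that is exactly "is" will match.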

Regarding String manipulation

I have a String str which can hold values like the ones below. I want the first letter of the string to be uppercase; wherever an underscore appears, I need to remove it and make the letter after it uppercase. All remaining letters should be lowercase.
""
"abc"
"abc_def"
"Abc_def_Ghi12_abd"
"abc__de"
"_"
Output:
""
"Abc"
"AbcDef"
"AbcDefGhi12Abd"
"AbcDe"
""

Well, without showing us that you put any effort into this problem this is going to be kinda vague.
I see two possibilities here:
1. Split the string at underscores, apply the answer from this question to each part and re-combine them.
2. Create a StringBuilder, walk through the string and keep track of whether you are
   - at the start of the string,
   - after an underscore, or
   - somewhere else,
   and act appropriately on the current character before appending it to the StringBuilder instance.
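The StringBuilder option could be sketched as below. The "start of string" and "after an underscore" states are folded into a single flag, since both mean "uppercase the next character"; the class and method names are made up for illustration.

```java
class Camelizer {

    // Removes underscores, uppercases the first character and every
    // character that follows an underscore, and lowercases the rest.
    static String camelize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean upperNext = true; // true at the start and after an underscore
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '_') {
                upperNext = true;  // drop the underscore, remember the state
            } else if (upperNext) {
                out.append(Character.toUpperCase(ch));
                upperNext = false;
            } else {
                out.append(Character.toLowerCase(ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"", "abc", "abc_def", "Abc_def_Ghi12_abd", "abc__de", "_"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> \"" + camelize(s) + "\"");
        }
    }
}
```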

1. Replace _ with a space: str = str.replace("_", " ")
2. Capitalize each word: str = WordUtils.capitalizeFully(str) (from commons-lang)
3. Remove the spaces: str = str.replace(" ", "")

You can use the following regexp-based code:
public static String camelize(String input) {
String lower = input.toLowerCase();
char[] c = lower.toCharArray();
// Match a letter at the start of the string or directly after an underscore.
Pattern pattern = Pattern.compile("(?:^|_)([a-z])");
Matcher m = pattern.matcher(lower);
while ( m.find() ) {
int index = m.start(1);
c[index] = Character.toUpperCase(c[index]);
}
return String.valueOf(c).replace("_", "");
}

Use Pattern/Matcher from the java.util.regex package.
For each string in your array, do the following:
StringBuffer output = new StringBuffer();
Matcher match = Pattern.compile("(?:^|_)(\\p{Alnum})").matcher(inStr.toLowerCase());
while (match.find()) {
// Replace the optional underscore plus character with the upper-cased character.
match.appendReplacement(output, match.group(1).toUpperCase());
}
match.appendTail(output);
// Drop any underscores that were not followed by a letter or digit.
String capitalized = output.toString().replace("_", "");
The regular expression looks for either the start of the string or an underscore: "(?:^|_)". Note that "[^|_]" would not do this; inside square brackets it means "any character other than | or _".
It then puts the following character into a group: "(\\p{Alnum})".
The code goes through each match in the lower-cased input, replacing it with the upper-cased group.
