How would I find out whether a whole word, e.g. "EU", exists within the String "I am in the EU.", while not also matching cases like "I am in Europe."?
Basically, I'd like some sort of regex for a word such as "EU" with non-alphabetical characters on either side.
.*\bEU\b.*
public static void main(String[] args) {
String regex = ".*\\bEU\\b.*";
String text = "EU is an acronym for EUROPE";
//String text = "EULA should not match";
if(text.matches(regex)) {
System.out.println("It matches");
} else {
System.out.println("Doesn't match");
}
}
You could do something like
String str = "I am in the EU.";
Matcher matcher = Pattern.compile("\\bEU\\b").matcher(str);
if (matcher.find()) {
System.out.println("Found word EU");
}
Use a pattern with word boundaries:
String str = "I am in the EU.";
if (str.matches(".*\\bEU\\b.*"))
doSomething();
Take a look at the docs for Pattern.
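The answers above hard-code the word "EU". A minimal sketch of the same word-boundary idea for an arbitrary word follows; the class and method names are hypothetical, and it assumes the word starts and ends with a word character (letter, digit or underscore), since \b only has a useful meaning there.

```java
import java.util.regex.Pattern;

public class WholeWordSearch {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word.
    public static boolean containsWholeWord(String text, String word) {
        // Pattern.quote escapes any regex metacharacters contained in the word.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() looks for the pattern anywhere in the text, so no ".*" is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWord("I am in the EU.", "EU"));
        System.out.println(containsWholeWord("EULA should not match", "EU"));
    }
}
```

Using find() instead of matches() avoids wrapping the pattern in .* on both sides.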
Related
I need to write a method that retrieves the words from a text, lowercased and stripped of everything else (punctuation etc.).
I've struggled with the regex pattern for 2 hours and ran into the following problem.
There are words like "50-year" in the text.
With my regex, the output is:
-year
instead of the expected
year
But I cannot simply remove the dash symbol "-", because there are other hyphenated words that should be left intact.
Here is the code:
public List<String> retrieveWordsFromFile() {
List<String> wordsFromText = new ArrayList<>();
scanner.useDelimiter("\\n+|\\s+|'");
while (scanner.hasNext()) {
wordsFromText.add(scanner.next()
.toLowerCase()
.replaceAll("^s$", "is")
.replaceAll("[^\\p{Lower}\\-]", "")
);
}
wordsFromText.removeIf(word -> word.equals(""));
return wordsFromText;
}
So how can I express that everything should be removed except plain words and hyphenated words that start with a letter? Should these rules be merged into a single regex?
Use the regex, \\b[\\p{Lower}]+\\-[\\p{Lower}]+\\b|\\b[\\p{Lower}]+\\b
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "Hello world", "Hello world 123", "HELLO world", "50-year", "stack-overflow" };
// Define regex pattern
Pattern pattern = Pattern.compile("\\b[\\p{Lower}]+\\-[\\p{Lower}]+\\b|\\b[\\p{Lower}]+\\b");
for (String s : arr) {
// The string to be matched
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
// Matched string
String matchedStr = matcher.group();
// Display the matched string
System.out.println(matchedStr);
}
}
}
}
Output:
world
world
world
year
stack-overflow
Explanation of regex:
\b specifies a word boundary.
+ specifies one or more occurrences of the preceding element.
| specifies OR (alternation).
This is how you can discard the non-matching text:
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "Hello world", "Hello world 123", "HELLO world", "50-year", "stack-overflow", "HELLO",
"HELLO WORLD", "&^*%", "hello", "123", "1w23" };
// Regex pattern
String regex = ".*?(\\b[\\p{Lower}]+\\-[\\p{Lower}]+\\b|\\b[\\p{Lower}]+\\b).*";
for (String s : arr) {
// Replace the string with group(1)
String str = s.replaceAll(regex, "$1");
// If the replaced string does not match the regex pattern, replace it with
// empty string
s = !str.matches(regex) ? "" : str;
// Display the replaced string if it is not empty
if (!s.isEmpty()) {
System.out.println(s);
}
}
}
}
Output:
world
world
world
year
stack-overflow
hello
Explanation of replacement:
.*? matches any characters reluctantly, i.e. as few as possible before yielding to the next part of the pattern.
s.replaceAll(regex, "$1") replaces the matched text (here, the whole string) with the content of group(1).
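The original question wanted a List<String> of words rather than printed output. A rough sketch of how the find() loop from the first demo could fill such a list follows. The class and method names are made up for illustration, the input is lowercased first as in the question's code, and the pattern is the same alternation written with an optional hyphenated tail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LowercaseWords {
    // A lowercase word, optionally followed by a hyphen and another lowercase word.
    private static final Pattern WORD =
            Pattern.compile("\\b\\p{Lower}+(?:-\\p{Lower}+)?\\b");

    // Hypothetical helper: collects every match found in the lowercased text.
    public static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase());
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("The 50-year plan, well-known!"));
    }
}
```

Because only the matches are collected, nothing has to be removed afterwards with replaceAll or removeIf.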
I have this regex:
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
I need a regex to accept a minimum word length of 8, letters(uppercase & lowercase), numbers and these characters:
!#$%&'*+-/=?^_`{|}~"(),:;<>#[]
It works when I tested it here.
This is how I used it in Java Android.
public static final String regex = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
This is the error that I received.
java.util.regex.PatternSyntaxException: Missing closing bracket in character class near index 49
^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\]]{8,}$
If you just want to test if a given input string matches your pattern, you may use String#matches directly, e.g.
String regex = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";
String input = "Jon#Skeet#123";
if (input.matches(regex)) {
System.out.println("Found a match");
}
else {
System.out.println("No match");
}
If you wanted to parse a larger input text and identify such matching words, then you would want to use a formal Pattern and Matcher. But, I don't see the need for this just based on your question.
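As for why the original pattern fails: Java's regex engine treats an unescaped [ inside a character class as the start of a nested class (a union), so the outer class is never closed. The sketch below, with a hypothetical class name, compiles both variants to show the difference. Note also that +-/ inside the class is a range from + to /, which happens to cover , - and . as well.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BracketEscapeDemo {
    // The pattern from the question: '[' is not escaped inside the class.
    static final String UNESCAPED = "^[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>[-\\]]{8,}$";
    // The pattern from the answer: '[' and ']' are escaped, '-' is placed last.
    static final String ESCAPED = "[a-zA-Z0-9_#.#$%&'*+-/=?^`{|}~!(),:;<>\\[\\]-]{8,}";

    // Returns true if the regex compiles, false on a PatternSyntaxException.
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unescaped compiles: " + compiles(UNESCAPED));
        System.out.println("escaped compiles:   " + compiles(ESCAPED));
        System.out.println("Jon#Skeet#123 matches: " + "Jon#Skeet#123".matches(ESCAPED));
    }
}
```

The exact exception message differs between Android and desktop JVMs, so the sketch only checks whether compilation succeeds.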
You have to use the Pattern and Matcher classes. This may help you.
Follow this tutorial: https://www.mkyong.com/regular-expressions/how-to-validate-password-with-regular-expression/
Here is an example.
try {
Pattern pattern;
Matcher matcher;
final String PASSWORD_PATTERN = "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[##$%]).{6,20})";
pattern = Pattern.compile(PASSWORD_PATTERN);
matcher = pattern.matcher(password_string);
if (matcher.matches()) {
    Log.e("TAG", "TRUE");
} else {
    Log.e("TAG", "FALSE");
}
} catch (RuntimeException e) {
return false;
}
Can someone help me with this code?
How can I search for a word in a String when the word may end with "." or "," in Java?
I don't want to search for it like this:
String word = "test.";
String wordSearch = "I trying to tasting the Artestem test.";
String word1 = "test,"; // here with ","
String word2 = "test."; // here with "."
String word3 = "test"; // here without
// after that I make a string array etc...
if ((wordSearch.equalsIgnoreCase(word1)) ||
    (wordSearch.equalsIgnoreCase(word2)) ||
    (wordSearch.equalsIgnoreCase(word3))) {
}
if (wordSearch.contains(word3))
// it's not working because the word Artestem will contain test too, and I don't need it
You can use the matches(regex) method of String:
String word = "test.";
boolean check = false;
// One or more word characters followed by a single "." or ","
if (word.matches("\\w*[.,]")) {
    check = true;
}
You can use regex for this
Matcher matcher = Pattern.compile("\\btest\\b").matcher(wordSearch);
if (matcher.find()) {
}
The \\b on either side of test matches a word boundary, so only the whole word matches. "Artestem" will not match in this case.
matcher.find() returns true if the word test occurs in your sentence and false otherwise.
String stringToSearch = "I trying to tasting the Artestem test. test,";
Pattern p1 = Pattern.compile("test[.,]");
Matcher m = p1.matcher(stringToSearch);
while (m.find())
{
System.out.println(m.group());
}
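The two answers above can be combined: a word boundary in front keeps "Artestem" out, and an optional [.,] picks up the trailing punctuation. This is only a sketch. The class and method names are invented, and case-insensitive matching is an assumption based on the question's use of equalsIgnoreCase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordWithPunctuation {
    // Hypothetical helper: returns every whole-word occurrence of `word`,
    // including one directly following "." or "," if present.
    public static List<String> findAll(String text, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b[.,]?",
                Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I trying to tasting the Artestem test. test, Test";
        System.out.println(findAll(text, "test"));
    }
}
```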
You can split your String into an array of words (with split) and search that array, comparing the last character of each word (charAt) with the character you want to find.
String stringtoSearch = "This is a test.";
String whatIwantToFind = ",";
String[] words = stringtoSearch.split("\\s+");
for (String word : words) {
    // Compare the last character of the word with the character being searched for
    if (whatIwantToFind.equalsIgnoreCase(String.valueOf(word.charAt(word.length() - 1)))) {
        System.out.println("FIND");
    }
}
What is a word? E.g.:
Is '5' a word?
Is '漢語' a word, or two words?
Is 'New York' a word, or two words?
Is 'Kraftfahrzeughaftpflichtversicherung' (meaning "automobile liability insurance") a word, or 3 words?
For some languages you can use Pattern.compile("[^\\p{Alnum}\u0301-]+") to split text into words. Use Pattern#split for this.
I think you can find a word with this pattern:
String notWord = "[^\\p{Alnum}\u0301-]{0,}";
Pattern.compile(notWord + "test" + notWord);
See also: https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
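A small sketch of the Pattern#split suggestion, with hypothetical class and method names: the text is split on the separator pattern and the resulting tokens are compared with the word. It writes the accent as \\u0301 so that the regex engine interprets the escape, which should be equivalent to the literal character in the pattern above. Keep in mind that \p{Alnum} covers only ASCII letters and digits by default, so this fits the "some languages" caveat.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitWords {
    // Separator: one or more characters that are not alphanumeric,
    // not a combining acute accent (U+0301) and not a hyphen.
    private static final Pattern NOT_WORD = Pattern.compile("[^\\p{Alnum}\\u0301-]+");

    // Splits the text into tokens on the separator pattern.
    public static List<String> words(String text) {
        return Arrays.asList(NOT_WORD.split(text));
    }

    // True if one of the tokens is exactly the given word.
    public static boolean containsWord(String text, String word) {
        return words(text).contains(word);
    }

    public static void main(String[] args) {
        String text = "I trying to tasting the Artestem test. test,";
        System.out.println(words(text));
        System.out.println(containsWord(text, "test"));
    }
}
```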
I am missing something basic here. I have this regex (.*)=\1 and I am using it to match 100=100, and it's failing. When I remove the back reference from the regex and keep the capturing group, it shows that the captured group is '100'. Why does it not work when I try to use the back reference?
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
String eqPattern = "(.*)=\1";
String input[] = {"100=100"};
testAndPrint(eqPattern, input); // this does not work
eqPattern = "(.*)=";
input = new String[]{"100=100"};
testAndPrint(eqPattern, input); // this works when the backreference is removed from the expr
}
static void testAndPrint(String regexPattern, String[] input) {
System.out.println("\n Regex pattern is "+regexPattern);
Pattern p = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
boolean found = false;
for (String str : input) {
System.out.println("Testing "+str);
Matcher matcher = p.matcher(str);
while (matcher.find()) {
System.out.println("I found the text "+ matcher.group() +" starting at " + "index "+ matcher.start()+" and ending at index "+matcher.end());
found = true;
System.out.println("Group captured "+matcher.group(1));
}
if (!found) {
System.out.println("No match found");
}
}
}
}
When I run this, I get the following output
Regex pattern is (.*)=\1
Testing 100=100
No match found
Regex pattern is (.*)=
Testing 100=100
I found the text 100= starting at index 0 and ending at index 4
Group captured 100 --> If the group contains 100, why doesn't it match when I add \1 above?
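You have to escape the backslash in the pattern string. In a Java string literal, "\1" is an octal escape for the character U+0001, so the regex engine never sees a back reference.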
String eqPattern = "(.*)=\\1";
I think you need to escape the backslash.
String eqPattern = "(.*)=\\1";
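To make the difference visible, here is a short sketch (the class name is hypothetical) that compares the two string literals. The first contains the single control character U+0001 after the equals sign, the second contains a backslash followed by 1, which the regex engine reads as a back reference to group 1.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // Returns true if the regex is found anywhere in the input.
    public static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String wrong = "(.*)=\1";   // "\1" is an octal escape: the character U+0001
        String right = "(.*)=\\1";  // backslash + '1': back reference to group 1

        // The lengths differ by one because "\1" is a single character.
        System.out.println("wrong length: " + wrong.length());
        System.out.println("right length: " + right.length());

        System.out.println("wrong finds 100=100: " + finds(wrong, "100=100"));
        System.out.println("right finds 100=100: " + finds(right, "100=100"));
    }
}
```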