Regex for Words with Apostrophes (Java)

I am trying to figure out the regex to match strings that contain only letters and apostrophes. If a string contains an apostrophe, I only want to match it if there is a letter on both sides of it.
What I have so far is [a-zA-Z]+('[a-zA-Z])?
I want to match strings like:
a'a
aa'a
a'aaa
But not:
bb'
'bb

You're almost there; you just need to add + after the character class inside the optional group.
^[a-zA-Z]+('[a-zA-Z]+)?$
OR
Use this if you want to deal with more than one apostrophe.
^[a-zA-Z]+(?:'[a-zA-Z]+)*$
DEMO
String s = "a'a'a'a a' a'a-'bb";
String parts[] = s.split("[ -]");
for(String i:parts) {
if(!i.isEmpty())
{
System.out.println(i + " => " + i.matches("[a-zA-Z]+(?:'[a-zA-Z]+)*"));
}
}
Output:
a'a'a'a => true
a' => false
a'a => true
'bb => false
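As a quick check, here is a minimal sketch that runs the question's sample strings through the multi-apostrophe pattern. The class and method names (ApostropheWords, isWord) are made up for illustration. Matcher.matches() already requires the whole input to match, so the ^ and $ anchors are left out here.

```java
import java.util.regex.Pattern;

class ApostropheWords {
    // Letters, optionally followed by repeated "apostrophe + letters" groups.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+(?:'[a-zA-Z]+)*");

    // Hypothetical helper: true only if the entire string fits the pattern.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"a'a", "aa'a", "a'aaa", "bb'", "'bb", "a''a"};
        for (String s : samples) {
            System.out.println(s + " => " + isWord(s));
        }
    }
}
```

The first three samples should be accepted and the last three rejected, because every apostrophe needs at least one letter on each side.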

public static void main(String[] args) {
String s = "a'a'a";
Pattern pattern = Pattern.compile("^[a-zA-Z]+(?:'[a-zA-Z]+)*$");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("true");
} else {
System.out.println("false");
}
}
Output:
true

Related

Why Regular expression matches one character less at the end?

The problem is that the last character never gets matched.
When I display the match using group(), it shows everything except the last character.
It is the same in all cases.
Below is the code and its output.
package mon;
import java.util.*;
import java.util.regex.*;
class HackerRank {
static void Pattern(String text) {
String p="\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
String pattern="(("+p+")\\.){3}"+p;
Pattern pi=Pattern.compile(pattern);
Matcher m=pi.matcher(text);
// System.out.println(m.group());
if(m.find() && m.group().equals(text))
System.out.println(m.group()+"true");
else
System.out.println(m.group()+" false");
}
public static void main(String[] args) {
Scanner sc=new Scanner(System.in);
while(sc.hasNext()) {
Pattern(sc.next());
}
sc.close();
}
}
Input: 000.12.12.034
Output: 000.12.12.03 false
You should properly group the alternatives inside the octet pattern:
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
// ^^^ ^
Then build the pattern like this:
String pattern = p + "(?:\\." + p + "){3}";
It will become a bit more efficient. Then, use matches to require a full string match:
if(m.matches()) {...
See a Java demo:
String p="(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";
String pattern = p + "(?:\\." + p + "){3}";
String text = "192.156.34.56";
// System.out.println(pattern); => (?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])(?:\.(?:\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])){3}
Pattern pi=Pattern.compile(pattern);
Matcher m=pi.matcher(text);
if(m.matches())
System.out.println(m.group()+" => true");
else
System.out.println("False");
// Output: 192.156.34.56 => true
And here is the resulting regex demo.
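To see why the grouping matters, here is a small sketch comparing the question's pattern with the grouped one. The class and helper names (OctetGrouping, firstMatch, isIp) are invented for this example. In the question's pattern the last octet is not grouped, so its | splits the whole regex into top-level alternatives, and the first alternative ends in a bare \d{1,2}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OctetGrouping {
    // Octet alternatives without an enclosing group, as in the question.
    static final String LOOSE = "\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5]";
    // The same alternatives wrapped in a non-capturing group, as in the answer.
    static final String GROUPED = "(?:\\d{1,2}|[01]\\d{2}|2[0-4]\\d|25[0-5])";

    static final String QUESTION_PATTERN = "((" + LOOSE + ")\\.){3}" + LOOSE;
    static final String FIXED_PATTERN = GROUPED + "(?:\\." + GROUPED + "){3}";

    // Hypothetical helper: first substring found by the regex, or null.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: full-string match against the grouped pattern.
    static boolean isIp(String text) {
        return Pattern.matches(FIXED_PATTERN, text);
    }

    public static void main(String[] args) {
        String text = "000.12.12.034";
        // The trailing \d{1,2} of the first alternative stops after two digits.
        System.out.println(firstMatch(QUESTION_PATTERN, text)); // 000.12.12.03
        System.out.println(isIp(text));                         // true
    }
}
```

With the grouped pattern and matches(), the engine has to backtrack into the [01]\d{2} alternative for the last octet, so the whole string is consumed.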

Remove pattern from string in Java

I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how could I remove the three sequences so that s contains just "Q" at the end of the calculation, without an endless loop?
You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of several patterns, use the | alternation operator, and capture each pattern in its own group.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);
You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically collapses each run of a character into a single occurrence.
If you want to remove the runs completely, replace ch with "" in the replaceAll call inside the if body.
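If only the remainder matters, the removal can also be done in a single pass over both alternatives. This is a minimal sketch, assuming the two sequence shapes from the question (A+B+ and C+D+); the names SequenceRemover, findSequences and strip are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequenceRemover {
    // Either a run of A's followed by B's, or a run of C's followed by D's.
    private static final Pattern SEQ = Pattern.compile("A+B+|C+D+");

    // Collects every matched sequence, in order, so each one can be processed.
    static List<String> findSequences(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = SEQ.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Removes all matched sequences in one left-to-right pass.
    static String strip(String s) {
        return SEQ.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "AAAAAAABBCCCDDABQ";
        System.out.println(findSequences(s)); // [AAAAAAABB, CCCDD, AB]
        System.out.println(strip(s));         // Q
    }
}
```

Note that a single pass does not rescan the result: strip("ACCDDB") leaves "AB", even though "AB" would itself match A+B+. If that matters, repeat the call until the string stops changing.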

How to search for a word in a String when the word ends with "." or "," in Java

Can someone help me with this code?
How can I search a String for a word that may end with "." or ","?
I don't want to search for it like this:
String word = "test.";
String wordSearch = "I trying to tasting the Artestem test.";
String word1 = "test,"; // here with ","
String word2 = "test."; // here with "."
String word3 = "test"; //here without
//after i make string array and etc...
if((wordSearch.equalsIgnoreCase(word1))||
(wordSearch.equalsIgnoreCase(word2))||
(wordSearch.equalsIgnoreCase(word3))) {
}
if (wordSearch.contains(gramer))
//it's not working because the word Artestem will contain test too, and I don't need it
You can use the matches(regex) method of String:
String word = "test.";
boolean check = false;
if (word.matches("\\w*[.,]")) {
check = true;
}
You can use regex for this
Matcher matcher = Pattern.compile("\\btest\\b").matcher(wordSearch);
if (matcher.find()) {
}
The \\b on each side is a word boundary, so only the whole word test matches; "Artestem" will not match in this case.
matcher.find() will return true if there is a word test in your sentence and false otherwise.
String stringToSearch = "I trying to tasting the Artestem test. test,";
Pattern p1 = Pattern.compile("test[.,]");
Matcher m = p1.matcher(stringToSearch);
while (m.find())
{
System.out.println(m.group());
}
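The two ideas above (word boundaries and an optional trailing "." or ",") can be combined. This is only a sketch: the names WordFinder and findWord are made up, and Pattern.quote is used on the assumption that the search word should be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // Finds the whole word, case-insensitively, with an optional trailing '.' or ','.
    static List<String> findWord(String word, String text) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b[.,]?",
                Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I trying to tasting the Artestem test. Test, retest";
        // "Artestem" and "retest" are skipped: there is no word boundary before "test".
        System.out.println(findWord("test", text)); // [test., Test,]
    }
}
```

A plain test[.,] would also hit the tail of a longer word such as "retest.", which is what the leading \\b prevents.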
You can split your String into an array of words (with split) and search that array, comparing the last character of each word (charAt) with the character you want to find.
String stringtoSearch = "This is a test.";
String whatIwantToFind = ",";
String[] words = stringtoSearch.split("\\s+");
for (String word : words) {
if (whatIwantToFind.equals(String.valueOf(word.charAt(word.length() - 1)))) {
System.out.println("FIND");
}
}
What is a word? E.g.:
Is '5' a word?
Is '漢語' a word, or two words?
Is 'New York' a word, or two words?
Is 'Kraftfahrzeughaftpflichtversicherung' (meaning "automobile liability insurance") a word, or 3 words?
For some languages you can use Pattern.compile("[^\\p{Alnum}\u0301-]+") to split text into words. Use Pattern#split for this.
I think you can find the word with this pattern:
String notWord = "[^\\p{Alnum}\u0301-]{0,}";
Pattern.compile(notWord + "test" + notWord);
See also: https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

Regex back reference to match a number (or any char sequence) with itself

I am missing something basic here. I have this regex (.*)=\1 and I am using it to match 100=100, and it is failing. When I remove the back reference from the regex and keep the capturing group, it shows that the captured group is '100'. Why does it not work when I try to use the back reference?
package test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args) {
String eqPattern = "(.*)=\1";
String input[] = {"1=1"};
testAndPrint(eqPattern, input); // this does not work
eqPattern = "(.*)=";
input = new String[]{"1=1"};
testAndPrint(eqPattern, input); // this works when the backreference is removed from the expr
}
static void testAndPrint(String regexPattern, String[] input) {
System.out.println("\n Regex pattern is "+regexPattern);
Pattern p = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
boolean found = false;
for (String str : input) {
System.out.println("Testing "+str);
Matcher matcher = p.matcher(str);
while (matcher.find()) {
System.out.println("I found the text "+ matcher.group() +" starting at " + "index "+ matcher.start()+" and ending at index "+matcher.end());
found = true;
System.out.println("Group captured "+matcher.group(1));
}
if (!found) {
System.out.println("No match found");
}
}
}
}
When I run this, I get the following output
Regex pattern is (.*)=\1
Testing 100=100
No match found
Regex pattern is (.*)=
Testing 100=100
I found the text 100= starting at index 0 and ending at index 4
Group captured 100 --> If the group contains 100, why doesn't it match when I add \1 above?
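You have to escape the backslash in the Java string literal. In "(.*)=\1" the compiler treats \1 as an octal escape for the character U+0001, so the regex engine never sees a backreference.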
String eqPattern = "(.*)=\\1";
I think you need to escape the backslash.
String eqPattern = "(.*)=\\1";
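A short sketch of the difference between the two literals follows. The class and method names (BackrefEscape, sameOnBothSides) are invented for the example.

```java
import java.util.regex.Pattern;

class BackrefEscape {
    // Hypothetical helper: true if the text left and right of '=' is identical.
    static boolean sameOnBothSides(String s) {
        // "\\1" in source becomes the two characters \1 that the regex engine needs.
        return Pattern.matches("(.*)=\\1", s);
    }

    public static void main(String[] args) {
        String unescaped = "(.*)=\1";  // \1 is an octal escape: one control character
        String escaped = "(.*)=\\1";   // backslash followed by the digit 1

        System.out.println(unescaped.length());                      // 6
        System.out.println(escaped.length());                        // 7
        System.out.println((int) unescaped.charAt(5));               // 1
        System.out.println(Pattern.matches(unescaped, "100=100"));   // false
        System.out.println(sameOnBothSides("100=100"));              // true
    }
}
```

Printing the pattern string, as the question's code does, can hide this, because the control character is usually invisible in console output.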

UNICODE Regex in java

The combined regex seems to work, but it fails when I split it into separate patterns. Please help.
^(?=.*\\p{Nd})(?=.*\\p{L})(?!.*(.{2,})\\1).{5,12}$
This works, but when I use the following part on its own it fails:
^(?=.*\\p{Nd})(?=.*\\p{L})
I am also looking for Unicode validation that rejects any special character and accepts only a mixture of letters and digits (at least one letter and one digit).
public void setValidations(){
validation1 = "^(?=.*\\p{Nd})(?=.*\\p{L})"; //this is failing
validation2 = "^.{5,12}$";
validation3 = "(\\S+?)\\1";
p1 = Pattern.compile(validation1);
p3 = Pattern.compile(validation3);
}
public boolean validateString(String str){
matcher1 = p1.matcher(str);
matcher3 = p3.matcher(str);
if(matcher1.find()){ //Expecting string passed "invalid" to fail (no numeric in it)
System.out.println(str + " String must have letters & number at least one");
return false;
}
if (!str.matches(validation2)){
System.out.println(str + " String must be between 5 and 12 chars in length");
return false;
}
if (matcher3.find()){
System.out.println(str + " got repeated: " + matcher3.group(1) + " String must not contain any immediate repeated sequence of characters");
return false;
}
return true;
}
public static void main(String[] args) {
StringValidation sv = new StringValidation();
String s2[] = {"1newAb", "A1DOALDO", "1234567AaAaAaAa", "123456ab3434", "$1214134abA", "invalid"};
boolean b3;
for(int i=0; i<s2.length; i++){
b3 = s2[i].matches("^(?=.*\\p{Nd})(?=.*\\p{L})(?!.*(.{2,})\\1).{5,12}$");
System.out.println(s2[i] + " "+ b3); // string "invalid" returning false (expected)
}
for (String str : s2) {
if(sv.validateString(str))
System.out.println(str + "String is Valid");
}
}
I also want the string "$1214134abA" to fail, since it contains $.
Pattern.compile("^(?=.*\\p{Nd})(?=.*\\p{L})").matcher("invalid").find() returns false as "invalid" does not contain a digit. Thus the if condition is evaluated to false and that block is skipped.
Use ^(?=[\\p{Nd}\\p{L}]*\\p{Nd})(?=[\\p{Nd}\\p{L}]*\\p{L}) to avoid characters other than letters and digits.
It will not accept $1214134abA as it contains $.
It seems that you forgot to use negation in
if(matcher1.find()){ //Expecting
...
return false;
}
It should return false when no match is found. Try:
if(!matcher1.find()){ //Expecting...
Also, since you want to check that the entire string is built from letters and digits, try [\\p{L}\\p{Nd}]{5,12} at the end instead of .{5,12}.
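Putting both fixes together, the split validation could look roughly like this. It is a sketch under a few assumptions: the class name AlnumValidator and the method isValid are made up, and the repeat check uses (.{2,})\\1 from the combined regex rather than the question's validation3.

```java
import java.util.regex.Pattern;

class AlnumValidator {
    // At least one Unicode decimal digit and at least one Unicode letter.
    private static final Pattern DIGIT_AND_LETTER =
            Pattern.compile("^(?=.*\\p{Nd})(?=.*\\p{L})");
    // Only letters and digits, 5 to 12 characters in total.
    private static final Pattern ALNUM_5_TO_12 =
            Pattern.compile("[\\p{L}\\p{Nd}]{5,12}");
    // A sequence of two or more characters immediately repeated.
    private static final Pattern REPEATED = Pattern.compile("(.{2,})\\1");

    static boolean isValid(String s) {
        if (!DIGIT_AND_LETTER.matcher(s).find()) {
            return false; // note the negation: fail when the lookaheads do NOT match
        }
        if (!ALNUM_5_TO_12.matcher(s).matches()) {
            return false; // wrong length or a character such as '$'
        }
        return !REPEATED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"1newAb", "A1DOALDO", "1234567AaAaAaAa",
                            "123456ab3434", "$1214134abA", "invalid"};
        for (String s : samples) {
            System.out.println(s + " " + isValid(s));
        }
    }
}
```

With the question's sample strings, only "1newAb" and "A1DOALDO" should pass: the others fail on length, on the repeated "34", on the $ sign, or on the missing digit.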
