UNICODE Regex in java

UNICODE Regex in java - java

In a combined regex it looks like working and it is failing when am using it in pattern. please help.
^(?=.*\\p{Nd})(?=.*\\p{L})(?!.*(.{2,})\\1).{5,12}$
this seems to be working but when I split it's failing.
^(?=.*\\p{Nd})(?=.*\\p{L})
Also I am looking for UNICODE validation to ignore any special character and just accept mixture of letters/Alpha & digits (atleast one alpha and one digit)
public void setValidations(){
validation1 = "^(?=.*\\p{Nd})(?=.*\\p{L})"; //this is failing
validation2 = "^.{5,12}$";
validation3 = "(\\S+?)\\1";
p1 = Pattern.compile(validation1);
p3 = Pattern.compile(validation3);
}
public boolean validateString(String str){
matcher1 = p1.matcher(str);
matcher3 = p3.matcher(str);
if(matcher1.find()){ //Expecting string passed "invalid" to fail (no numeric in it)
System.out.println(str + " String must have letters & number at least one");
return false;
}
if (!str.matches(validation2)){
System.out.println(str + " String must be between 5 and 12 chars in length");
return false;
}
if (matcher3.find()){
System.out.println(str + " got repeated: " + matcher3.group(1) + " String must not contain any immediate repeated sequence of characters");
return false;
}
return true;
}
public static void main(String[] args) {
StringValidation sv = new StringValidation();
String s2[] = {"1newAb", "A1DOALDO", "1234567AaAaAaAa", "123456ab3434", "$1214134abA", "invalid"};
boolean b3;
for(int i=0; i<s2.length; i++){
b3 = s2[i].matches("^(?=.*\\p{Nd})(?=.*\\p{L})(?!.*(.{2,})\\1).{5,12}$");
System.out.println(s2[i] + " "+ b3); // string "invalid" returning false (expected)
}
for (String str : s2) {
if(sv.validateString(str))
System.out.println(str + "String is Valid");
}
}
Also I want "$1214134abA" this string to fail since it has $

Pattern.compile("^(?=.*\\p{Nd})(?=.*\\p{L})").matcher("invalid").find() returns false as "invalid" does not contain a digit. Thus the if condition is evaluated to false and that block is skipped.
Use ^(?=[\\p{Nd}\\p{L}]*\\p{Nd})(?=[\\p{Nd}\\p{L}]*\\p{L}) to avoid characters other than letters and digits.
It will not accept $1214134abA as it contains $.

It seems that you forgot to use negation in
if(matcher1.find()){ //Expecting
...
return false;
}
It should return false if it will not find match. Try with
if(!matcher1.find()){ //Expecting...
Also since you want to check if your entire string is build on letters and digits instead of .{5,12} at the end try [\\p{L}\\p{Nd}]{5,12} .

Related

Remove pattern from string in Java

I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how I could remove the three sequences, so that s just contains "Q" at the end of the calculation, without and endless loop?

You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of many patterns, use the | regex matcher, and capture each pattern separately.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q

This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);

You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically replace sequence of each character for single one.
If you want to remove the sequence completely, just replace ch to "" in replaceAll method parameters inside if body.

Regex for Words with Apostrophes (Java)

I am trying to figure out the regex to match strings that contain only letters and apostrophes. If a string contains an apostrophe, I only want to match it if there is a letter on both sides of it.
What I have so far is [a-zA-Z]+('[a-zA-Z])?
I want to match strings like:
a'a
aa'a
a'aaa
But not:
bb'
'bb

You're almost there, just you need to add + after the char class present inside the optional group.
^[a-zA-Z]+('[a-zA-Z]+)?$
OR
Use this if you want to deal with more than one apostrophe.
^[a-zA-Z]+(?:'[a-zA-Z]+)*$
DEMO
String s = "a'a'a'a a' a'a-'bb";
String parts[] = s.split("[ -]");
for(String i:parts) {
if(!i.isEmpty())
{
System.out.println(i + " => " + i.matches("[a-zA-Z]+(?:'[a-zA-Z]+)*"));
}
}
Output:
a'a'a'a => true
a' => false
a'a => true
'bb => false

public static void main(String[] args) {
String s = "a'a'a";
Pattern pattern = Pattern.compile("^[a-zA-Z]+(?:'[a-zA-Z]+)*$");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("true");
} else {
System.out.println("false");
}
}
output
false

password validation and UNICODE

How to validate regex for condition:
Password must not contain any sequence of characters immediately followed by the same sequence of characters. I am having other conditions as well and am using
(?=.*(..+)\\1)
to validate for immediate sequence repeat. And it is failing. This piece of code returns "true" for 3rd and 4th strings passed; I need it to return false. Please help.
String s2[] = {"1newAb", "newAB1", "1234567AaAa", "123456ab3434", "love", "love1"};
boolean b3;
for(int i=0; i<s2.length; i++){
b3 = s2[i].matches("^(?=.*[0-9])(?=.*[a-zA-Z])(?=.*(..+)\\1).{5,12}$");
System.out.println("value" + b3);
}

You can try with negative look-ahead (?!.*(.{2,})\\1).
For those who are wondering what \\1 is: it represents match from group 1, which in our case is match from (.{2,})

With Ron's suggestion I found which methods in java helps; matches(), find() work differently. find() helped me.
Guido's suggestion am breaking up code for different rules. Here's my code; yet to refine it: For checking repeat of any sequence using (\S+?)\1
String regex = "(\\S+?)\\1";
String regex2 = "^(?=.*[0-9])(?=.*[a-zA-Z]).{5,12}$";
p = Pattern.compile(regex);
for (String str : s2) {
matcher = p.matcher(str);
if (matcher.find())
System.out.println(str + " got repeated: " + matcher.group(1));
else if(str.matches(regex2))
System.out.println(str + " Password correct");
else
System.out.println(str + " Password incorrect");
}

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

Looking for quick, simple way in Java to change this string
" hello there "
to something that looks like this
"hello there"
where I replace all those multiple spaces with a single space, except I also want the one or more spaces at the beginning of string to be gone.
Something like this gets me partly there
String mytext = " hello there ";
mytext = mytext.replaceAll("( )+", " ");
but not quite.

Try this:
String after = before.trim().replaceAll(" +", " ");
See also
String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
regular-expressions.info/Repetition
No trim() regex
It's also possible to do this with just one replaceAll, but this is much less readable than the trim() solution. Nonetheless, it's provided here just to show what regex can do:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$|( )+", "$1")
);
}
There are 3 alternates:
^_+ : any sequence of spaces at the beginning of the string
Match and replace with $1, which captures the empty string
_+$ : any sequence of spaces at the end of the string
Match and replace with $1, which captures the empty string
(_)+ : any sequence of spaces that matches none of the above, meaning it's in the middle
Match and replace with $1, which captures a single space
See also
regular-expressions.info/Anchors

You just need a:
replaceAll("\\s{2,}", " ").trim();
where you match one or more spaces and replace them with a single space and then trim whitespaces at the beginning and end (you could actually invert by first trimming and then matching to make the regex quicker as someone pointed out).
To test this out quickly try:
System.out.println(new String(" hello there ").trim().replaceAll("\\s{2,}", " "));
and it will return:
"hello there"

Use the Apache commons StringUtils.normalizeSpace(String str) method. See docs here

This worked perfectly for me : sValue = sValue.trim().replaceAll("\\s+", " ");

trim() method removes the leading and trailing spaces and using replaceAll("regex", "string to replace") method with regex "\s+" matches more than one space and will replace it with a single space
myText = myText.trim().replaceAll("\\s+"," ");

The following code will compact any whitespace between words and remove any at the string's beginning and end
String input = "\n\n\n a string with many spaces, \n"+
" a \t tab and a newline\n\n";
String output = input.trim().replaceAll("\\s+", " ");
System.out.println(output);
This will output a string with many spaces, a tab and a newline
Note that any non-printable characters including spaces, tabs and newlines will be compacted or removed
For more information see the respective documentation:
String#trim() method
String#replaceAll(String regex, String replacement) method
For information about Java's regular expression implementation see the documentation of the Pattern class

"[ ]{2,}"
This will match more than one space.
String mytext = " hello there ";
//without trim -> " hello there"
//with trim -> "hello there"
mytext = mytext.trim().replaceAll("[ ]{2,}", " ");
System.out.println(mytext);
OUTPUT:
hello there

To eliminate spaces at the beginning and at the end of the String, use String#trim() method. And then use your mytext.replaceAll("( )+", " ").

You can first use String.trim(), and then apply the regex replace command on the result.

Try this one.
Sample Code
String str = " hello there ";
System.out.println(str.replaceAll("( +)"," ").trim());
OUTPUT
hello there
First it will replace all the spaces with single space. Than we have to supposed to do trim String because Starting of the String and End of the String it will replace the all space with single space if String has spaces at Starting of the String and End of the String So we need to trim them. Than you get your desired String.

String blogName = "how to do in java . com";
String nameWithProperSpacing = blogName.replaceAll("\\\s+", " ");

trim()
Removes only the leading & trailing spaces.
From Java Doc,
"Returns a string whose value is this string, with any leading and trailing whitespace removed."
System.out.println(" D ev Dum my ".trim());
"D ev Dum my"
replace(), replaceAll()
Replaces all the empty strings in the word,
System.out.println(" D ev Dum my ".replace(" ",""));
System.out.println(" D ev Dum my ".replaceAll(" ",""));
System.out.println(" D ev Dum my ".replaceAll("\\s+",""));
Output:
"DevDummy"
"DevDummy"
"DevDummy"
Note: "\s+" is the regular expression similar to the empty space character.
Reference : https://www.codedjava.com/2018/06/replace-all-spaces-in-string-trim.html

In Kotlin it would look like this
val input = "\n\n\n a string with many spaces, \n"
val cleanedInput = input.trim().replace(Regex("(\\s)+"), " ")

A lot of correct answers been provided so far and I see lot of upvotes. However, the mentioned ways will work but not really optimized or not really readable.
I recently came across the solution which every developer will like.
String nameWithProperSpacing = StringUtils.normalizeSpace( stringWithLotOfSpaces );
You are done.
This is readable solution.

You could use lookarounds also.
test.replaceAll("^ +| +$|(?<= ) ", "");
OR
test.replaceAll("^ +| +$| (?= )", "")
<space>(?= ) matches a space character which is followed by another space character. So in consecutive spaces, it would match all the spaces except the last because it isn't followed by a space character. This leaving you a single space for consecutive spaces after the removal operation.
Example:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$| (?= )", "")
);
}

See String.replaceAll.
Use the regex "\s" and replace with " ".
Then use String.trim.

String str = " hello world"
reduce spaces first
str = str.trim().replaceAll(" +", " ");
capitalize the first letter and lowercase everything else
str = str.substring(0,1).toUpperCase() +str.substring(1,str.length()).toLowerCase();

you should do it like this
String mytext = " hello there ";
mytext = mytext.replaceAll("( +)", " ");
put + inside round brackets.

String str = " this is string ";
str = str.replaceAll("\\s+", " ").trim();

This worked for me
scan= filter(scan, " [\\s]+", " ");
scan= sac.trim();
where filter is following function and scan is the input string:
public String filter(String scan, String regex, String replace) {
StringBuffer sb = new StringBuffer();
Pattern pt = Pattern.compile(regex);
Matcher m = pt.matcher(scan);
while (m.find()) {
m.appendReplacement(sb, replace);
}
m.appendTail(sb);
return sb.toString();
}

The simplest method for removing white space anywhere in the string.
public String removeWhiteSpaces(String returnString){
returnString = returnString.trim().replaceAll("^ +| +$|( )+", " ");
return returnString;
}

check this...
public static void main(String[] args) {
String s = "A B C D E F G\tH I\rJ\nK\tL";
System.out.println("Current : "+s);
System.out.println("Single Space : "+singleSpace(s));
System.out.println("Space count : "+spaceCount(s));
System.out.format("Replace all = %s", s.replaceAll("\\s+", ""));
// Example where it uses the most.
String s = "My name is yashwanth . M";
String s2 = "My nameis yashwanth.M";
System.out.println("Normal : "+s.equals(s2));
System.out.println("Replace : "+s.replaceAll("\\s+", "").equals(s2.replaceAll("\\s+", "")));
}
If String contains only single-space then replace() will not-replace,
If spaces are more than one, Then replace() action performs and removes spacess.
public static String singleSpace(String str){
return str.replaceAll(" +| +|\t|\r|\n","");
}
To count the number of spaces in a String.
public static String spaceCount(String str){
int i = 0;
while(str.indexOf(" ") > -1){
//str = str.replaceFirst(" ", ""+(i++));
str = str.replaceFirst(Pattern.quote(" "), ""+(i++));
}
return str;
}
Pattern.quote("?") returns literal pattern String.

My method before I found the second answer using regex as a better solution. Maybe someone needs this code.
private String replaceMultipleSpacesFromString(String s){
if(s.length() == 0 ) return "";
int timesSpace = 0;
String res = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c == ' '){
timesSpace++;
if(timesSpace < 2)
res += c;
}else{
res += c;
timesSpace = 0;
}
}
return res.trim();
}

Stream version, filters spaces and tabs.
Stream.of(str.split("[ \\t]")).filter(s -> s.length() > 0).collect(Collectors.joining(" "))

I know replaceAll method is much easier but I wanted to post this as well.
public static String removeExtraSpace(String input) {
input= input.trim();
ArrayList <String> x= new ArrayList<>(Arrays.asList(input.split("")));
for(int i=0; i<x.size()-1;i++) {
if(x.get(i).equals(" ") && x.get(i+1).equals(" ")) {
x.remove(i);
i--;
}
}
String word="";
for(String each: x)
word+=each;
return word;
}

String myText = " Hello World ";
myText = myText.trim().replace(/ +(?= )/g,'');
// Output: "Hello World"

string.replaceAll("\s+", " ");

If you already use Guava (v. 19+) in your project you may want to use this:
CharMatcher.whitespace().trimAndCollapseFrom(input, ' ');
or, if you need to remove exactly SPACE symbol ( or U+0020, see more whitespaces) use:
CharMatcher.anyOf(" ").trimAndCollapseFrom(input, ' ');

public class RemoveExtraSpacesEfficient {
public static void main(String[] args) {
String s = "my name is mr space ";
char[] charArray = s.toCharArray();
char prev = s.charAt(0);
for (int i = 0; i < charArray.length; i++) {
char cur = charArray[i];
if (cur == ' ' && prev == ' ') {
} else {
System.out.print(cur);
}
prev = cur;
}
}
}
The above solution is the algorithm with the complexity of O(n) without using any java function.

Please use below code
package com.myjava.string;
import java.util.StringTokenizer;
public class MyStrRemoveMultSpaces {
public static void main(String a[]){
String str = "String With Multiple Spaces";
StringTokenizer st = new StringTokenizer(str, " ");
StringBuffer sb = new StringBuffer();
while(st.hasMoreElements()){
sb.append(st.nextElement()).append(" ");
}
System.out.println(sb.toString().trim());
}
}

pattern match java: does not work

i am trying to find a certain tag in a html-page with java. all i know is what kind of tag (div, span ...) and the id ... i dunno how it looks, how many whitespaces are where or what else is in the tag ... so i thought about using pattern matching and i have the following code:
// <tag[any character may be there or not]id="myid"[any character may be there or not]>
String str1 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]>";
// <tag[any character may be there or not]id="myid"[any character may be there or not]/>
String str2 = "<" + Tag + "[.*]" + "id=\"" + search + "\"[.*]/>";
Pattern p1 = Pattern.compile( str1 );
Pattern p2 = Pattern.compile( str2 );
Matcher m1 = p1.matcher( content );
Matcher m2 = p2.matcher( content );
int start = -1;
int stop = -1;
String Anfangsmarkierung = null;
int whichMatch = -1;
while( m1.find() == true || m2.find() == true ){
if( m1.find() ){
//System.out.println( " ... " + m1.group() );
start = m1.start();
//ende = m1.end();
stop = content.indexOf( "<", start );
whichMatch = 1;
}
else{
//System.out.println( " ... " + m2.group() );
start = m2.start();
stop = m2.end();
whichMatch = 2;
}
}
but i get an exception with m1(m2).start(), when i enter the actual tag without the [.*] and i dun get anything when i enter the regular expression :( ... i really havent found an explanation for this ... i havent worked with pattern or match at all yet, so i am a little lost and havent found anything so far. would be awesome if anyone could explain me what i am doing wrong or how i can do it better ...
thnx in advance :)
... dg

I know that I am broadening your question, but I think that using a dedicated library for parsing HTML documents (such as: http://htmlparser.sourceforge.net/) will be much more easier and accurate than regexps.

Here is an example for what you're trying to do adapted from one of my notes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String tag = "thetag";
String id = "foo";
String content = "<tag1>\n"+
"<thetag name=\"Tag Name\" id=\"foo\">Some text</thetag>\n" +
"<thetag name=\"AnotherTag\" id=\"foo\">Some more text</thetag>\n" +
"</tag1>";
String patternString = "<" + tag + ".*?name=\"(.*?)\".*?id=\"" + id + "\".*?>";
System.out.println("Content:\n" + content);
System.out.println("Pattern: " + patternString);
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(content);
boolean found = false;
while (matcher.find()) {
System.out.format("I found the text \"%s\" starting at " +
"index %d and ending at index %d.%n",
matcher.group(), matcher.start(), matcher.end());
System.out.println("Name: " + matcher.group(1));
found = true;
}
if (!found) {
System.out.println("No match found.");
}
}
}
You'll notice that the pattern string becomes something like <thetag.*?name="(.*?)".*?id="foo".*?> which will search for tags named thetag where the id attribute is set to "foo".
Note the following:
It uses .*? to weakly match zero or more of anything (if you don't understand, try removing the ? to see what I mean).
It uses a submatch expression between parenthesis (the name="(.*?)" part) to extract the contents of the name attribute (as an example).

I think each call to find is advancing through your match. Calling m1.find() inside your condition is moving your matcher to a place where there is no longer a valid match, which causes m1.start() to throw (I'm guessing) an IllegalStateException Ensuring you call find once per iteration and referencing that result from some flag avoids this problem.
boolean m1Matched = m1.find()
boolean m2Matched = m2.find()
while( m1Matched || m2Matched ) {
if( m1Matched ){
...
}
m1Matched = m1.find();
m2Matched = m2.find();
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

UNICODE Regex in java - java

Related

Remove pattern from string in Java

Regex for Words with Apostrophes (Java)

password validation and UNICODE

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

pattern match java: does not work

Categories

Resources