Regular expression string search in Java

I know this can be done in many ways, but I'm curious what the regex would be to pick out all strings not containing a particular substring, say GDA, from
strings like GADSA, GDSARTCC, , THGDAERY.

You can use a negative lookahead:
"^((?!GDA).)*$"

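A minimal sketch of the lookahead approach applied to the question's sample input, assuming the strings arrive as one comma-separated line. The class and helper names (`LookaheadFilter`, `withoutSubstring`) are made up for illustration and are not from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LookaheadFilter {
    // Matches a whole string only if "GDA" does not start at any position in it.
    private static final Pattern NO_GDA = Pattern.compile("^((?!GDA).)*$");

    // Hypothetical helper: splits on commas and keeps non-empty tokens without "GDA".
    static List<String> withoutSubstring(String csv) {
        List<String> result = new ArrayList<>();
        for (String token : csv.split(",")) {
            String s = token.trim();
            if (!s.isEmpty() && NO_GDA.matcher(s).matches()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [GADSA, GDSARTCC]
        System.out.println(withoutSubstring("GADSA, GDSARTCC, , THGDAERY"));
    }
}
```

Because `.` does not match line terminators by default, this pattern only suits single-line strings.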
You don't need a regex. Just use string.contains("GDA") to see if a string contains a particular substring. It will return false if it doesn't.

If your input is one long string then you have to decide how you define a substring. If it's separated by spaces then:
String[] split = mylongstr.split(" ");
for (String s : split) {
if (!s.contains("GDA")) {
// do whatever
}
}

String regex = ".*GDA.*";
List<String> testStrings = populateStrings();
for (String s : testStrings)
{
if (!s.matches(regex))
System.out.println("String " + s + " does not match " + regex);
}

Give this a shot:
java.util.regex.Pattern p = java.util.regex.Pattern.compile("(?!\\w*GDA\\w*)\\b\\w+\\b");
java.util.regex.Matcher m = p.matcher("GADSA, GDSARTCC, , THGDAERY");
while (m.find()) {
System.out.println("Found: " + m.group());
}

Related

Need help in regex matching

It may be very simple, but I am extremely new to regex and have a requirement where I need to do some regex matches in a string and extract the number in it. Below is my code with sample i/p and required o/p. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but my regex match itself is returning false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str2).matches());
All of the above print statements return false. I am using Java 8; is there any way to match the pattern and extract the digits from the string in a single statement?
It would be great if somebody could point me to how to debug/develop the regex. Please feel free to let me know if something is not clear in my question.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because your regex always matches the last number, I would just use replaceAll with the regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(!strResult1.isEmpty() ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(!strResult2.isEmpty() ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(!strResult3.isEmpty() ? "result " + strResult3 : "no result");
Note that if the input does not end in digits, replaceAll returns it unchanged, so the result is empty only for an empty input.
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.

How to search for a word in a String when the word ends with "." or "," in Java

Can someone help me with the code?
How do I search for a word in a String when the word may end with "." or ","?
I don't want to search for it like this:
String word = "test.";
String wordSearch = "I trying to tasting the Artestem test.";
String word1 = "test,"; // here with ","
String word2 = "test."; // here with "."
String word3 = "test"; // here without
// after that I make a string array, etc.
if ((wordSearch.equalsIgnoreCase(word1)) ||
(wordSearch.equalsIgnoreCase(word2)) ||
(wordSearch.equalsIgnoreCase(word3))) {
}
if (wordSearch.contains(gramer))
// it's not working because the word Artestem contains test too, and I don't need it
You can use the matches(regex) method of String:
String word = "test.";
boolean check = false;
// backslashes must be doubled in a Java string literal; '.' and ',' need no escaping inside [...]
if (word.matches("\\w*[.,]")) {
check = true;
}
You can use regex for this
Matcher matcher = Pattern.compile("\\btest\\b").matcher(wordSearch);
if (matcher.find()) {
}
\\b is a word boundary, so \\btest\\b matches test only as a whole word. "Artestem" will not match in this case.
matcher.find() returns true if the word test occurs in your sentence and false otherwise.
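As a small sketch of the word-boundary idea, the hypothetical helper below (`findWord` is not from the original answers) looks for a whole word that may be followed by "." or "," and returns what it matched. Case-insensitive matching is an assumption carried over from the question's use of equalsIgnoreCase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns the matched word (with a trailing '.' or ',' if present),
    // or null when the word does not occur as a whole word.
    static String findWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b[.,]?",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findWord("I trying to tasting the Artestem test.", "test")); // test.
        System.out.println(findWord("A test, then more", "test"));                      // test,
        System.out.println(findWord("Artestem", "test"));                               // null
    }
}
```

Pattern.quote keeps any regex metacharacters in the search word from being interpreted as part of the pattern.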
String stringToSearch = "I trying to tasting the Artestem test. test,";
Pattern p1 = Pattern.compile("test[.,]");
Matcher m = p1.matcher(stringToSearch);
while (m.find())
{
System.out.println(m.group());
}
You can split your String into an array of words (with split) and check the last character of each word (with charAt) against the character you want to find.
String stringtoSearch = "This is a test.";
String whatIwantToFind = ",";
String[] words = stringtoSearch.split("\\s+");
for (String word : words) {
// compare the word's last character with the one we are looking for
if (!word.isEmpty() && whatIwantToFind.equals(String.valueOf(word.charAt(word.length() - 1)))) {
System.out.println("FIND");
}
}
What is a word? E.g.:
Is '5' a word?
Is '漢語' a word, or two words?
Is 'New York' a word, or two words?
Is 'Kraftfahrzeughaftpflichtversicherung' (meaning "automobile liability insurance") a word, or 3 words?
For some languages you can use Pattern.compile("[^\\p{Alnum}\u0301-]+") to split text into words, via Pattern#split.
I think you can find the word with this pattern:
String notWord = "[^\\p{Alnum}\u0301-]{0,}";
Pattern.compile(notWord + "test" + notWord);
See also: https://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

Java multiple regular expression search

I have a string some thing like this:
If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda}
My pattern should look for the particular words Password or tmpPwd or TEMP_PASSWORD.
How can I create a pattern for this kind of search?
I think you are looking for the values after these words. You need to set capturing groups to extract those values, e.g.
String content = "If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda} ";
Pattern p = Pattern.compile("\\{Password\\s*:\\s*([^,]+)\\s*,\\s*tmpPwd\\s*:\\s*([^,]+)\\s*,\\s*TEMP_PASSWORD:\\s*([^,]+)\\s*\\}");
Matcher m = p.matcher(content);
while (m.find()) {
System.out.println(m.group(1) + ", " + m.group(2) + ", " + m.group(3));
}
This will output 123456, tesgjadgj, kfnda.
To just find out if there are any of the substrings, use contains method:
System.out.println(content.contains("Password") ||
content.contains("tmpPwd") ||
content.contains("TEMP_PASSWORD"));
And if you want a regex-solution for the keywords, here it is:
String str = "If message contains sensitive info like: {Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda} ";
Pattern ptrn = Pattern.compile("Password|tmpPwd|TEMP_PASSWORD");
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println("Match found: " + m.group(0));
}
In the end, this is what I am using for my requirement:
private final static String censoredWords =
"(?i)PASSWORD|pwd";
The (?i) makes it case-insensitive
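To illustrate what the (?i) flag changes, here is a minimal sketch that collects every keyword hit in the sample message. The class and method names (`CensoredWordsDemo`, `findKeywords`) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CensoredWordsDemo {
    // (?i) at the start applies case-insensitive matching to both alternatives.
    private static final Pattern CENSORED = Pattern.compile("(?i)PASSWORD|pwd");

    // Hypothetical helper: returns each keyword occurrence exactly as it appears in the message.
    static List<String> findKeywords(String message) {
        List<String> hits = new ArrayList<>();
        Matcher m = CENSORED.matcher(message);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [Password, Pwd, PASSWORD]
        System.out.println(findKeywords("{Password:123456, tmpPwd : tesgjadgj, TEMP_PASSWORD: kfnda}"));
    }
}
```

Note that "pwd" also matches inside longer identifiers such as tmpPwd, which is what lets two alternatives cover all three keys.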

Regex not matching words delimited by whitespace

I have an input string that will follow the pattern /user/<id>?name=<name>, where <id> is alphanumeric but must start with a letter, and <name> is a letter-only string that can have multiple spaces. Some examples of matches would be:
/user/ad?name=a a
/user/one111?name=one ONE oNe
/user/hello?name=world
I came up with the following regex:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
All of the above examples match the regex, but it only looks at the first word in <name>. Shouldn't the sequence \s allow me to have white spaces?
The code that I made to test what it is doing is:
String regex = "/user/[a-zA-Z]+\\w*\\?name=[a-zA-Z\\s]+";
// Check to see that input matches pattern
if(Pattern.matches(regex, str) == true){
str = str.replaceFirst("/user/", "");
str = str.replaceFirst("name=", "");
String[] tokens = str.split("\\?");
System.out.println("size = " + tokens.length);
System.out.println("tokens[0] = " + tokens[0]);
System.out.println("tokens[1] = " + tokens[1]);
} else
System.out.println("Didn't match.");
So for example, one test might look like:
/user/myID123?name=firstName LastName
size = 2
tokens[0] = myID123
tokens[1] = firstName
whereas the desired output would be
tokens[1] = firstName LastName
How can I change my regex to do this?
Not sure what you think is the problem in your code. tokens[1] will indeed contain firstName LastName in your example.
Here's an ideone.com demo showing this.
However, have you considered using capturing groups for the id and the name?
If you write it like
String regex = "/user/(\\w+)\\?name=([a-zA-Z\\s]+)";
Matcher m = Pattern.compile(regex).matcher(input);
you can get hold of myID123 and firstName LastName through m.group(1) and m.group(2)
I don't find any fault in your code but you may capture group like this:
String str = "/user/myID123?name=firstName LastName ";
String regex = "/user/([a-zA-Z]+\\w*)\\?name=([a-zA-Z\\s]+)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
if(m.find()) {
System.out.println(m.group(1) + ", " + m.group(2));
}
Note that * is greedy by default, and you can make it reluctant by adding a ?. In this pattern \w* cannot match the literal ?, so with matches() both variants capture the same groups:
List<String> str = Arrays.asList("/user/ad?name=a a", "/user/one111?name=one ONE oNe", "/user/hello?name=world");
String regex = "/user/([a-zA-Z]+\\w*?)\\?name=([a-zA-Z\\s]+)";
for (String s : str) {
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.matches()) {
System.out.println("user: " + matcher.group(1));
System.out.println("name: " + matcher.group(2));
}
}
Output:
user: ad
name: a a
user: one111
name: one ONE oNe
user: hello
name: world

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

Looking for quick, simple way in Java to change this string
" hello there "
to something that looks like this
"hello there"
where I replace all those multiple spaces with a single space, except I also want the one or more spaces at the beginning of string to be gone.
Something like this gets me partly there
String mytext = " hello there ";
mytext = mytext.replaceAll("( )+", " ");
but not quite.
Try this:
String after = before.trim().replaceAll(" +", " ");
See also
String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
regular-expressions.info/Repetition
No trim() regex
It's also possible to do this with just one replaceAll, but this is much less readable than the trim() solution. Nonetheless, it's provided here just to show what regex can do:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$|( )+", "$1")
);
}
There are 3 alternates:
^_+ : any sequence of spaces at the beginning of the string
Match and replace with $1, which captures the empty string
_+$ : any sequence of spaces at the end of the string
Match and replace with $1, which captures the empty string
(_)+ : any sequence of spaces that matches none of the above, meaning it's in the middle
Match and replace with $1, which captures a single space
See also
regular-expressions.info/Anchors
You just need a:
replaceAll("\\s{2,}", " ").trim();
where you match two or more whitespace characters and replace them with a single space, then trim whitespace at the beginning and end (you could also trim first and then replace, which gives the regex less to scan, as someone pointed out).
To test this out quickly try:
System.out.println(new String(" hello there ").trim().replaceAll("\\s{2,}", " "));
and it will return:
"hello there"
Use the Apache Commons Lang StringUtils.normalizeSpace(String str) method.
This worked perfectly for me: sValue = sValue.trim().replaceAll("\\s+", " ");
The trim() method removes the leading and trailing whitespace, and replaceAll("regex", "string to replace") with the regex "\\s+" matches one or more whitespace characters and replaces each run with a single space.
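The answers above use two slightly different patterns, \\s{2,} and \\s+. The sketch below shows where they differ; the method names are hypothetical. A lone tab or newline between words is left alone by the first pattern and turned into a space by the second.

```java
class CollapseDemo {
    // Collapses only runs of two or more whitespace characters.
    static String collapseRuns(String s) {
        return s.trim().replaceAll("\\s{2,}", " ");
    }

    // Replaces every run of one or more whitespace characters with a single space.
    static String collapseAll(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + collapseRuns("  hello    there  ") + "]"); // [hello there]
        System.out.println("[" + collapseAll("  hello    there  ") + "]");  // [hello there]
        // A single tab is kept by collapseRuns but normalized by collapseAll.
        System.out.println(collapseRuns("a\tb").equals("a\tb")); // true
        System.out.println(collapseAll("a\tb"));                 // a b
    }
}
```

Which one fits depends on whether single tabs and newlines should be preserved or normalized to spaces.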
myText = myText.trim().replaceAll("\\s+"," ");
The following code will compact any whitespace between words and remove any at the string's beginning and end
String input = "\n\n\n a string with many spaces, \n"+
" a \t tab and a newline\n\n";
String output = input.trim().replaceAll("\\s+", " ");
System.out.println(output);
This will output: a string with many spaces, a tab and a newline
Note that all whitespace characters, including spaces, tabs and newlines, will be compacted or removed.
For more information see the respective documentation:
String#trim() method
String#replaceAll(String regex, String replacement) method
For information about Java's regular expression implementation see the documentation of the Pattern class
"[ ]{2,}"
This will match two or more consecutive spaces.
String mytext = " hello there ";
//without trim -> " hello there"
//with trim -> "hello there"
mytext = mytext.trim().replaceAll("[ ]{2,}", " ");
System.out.println(mytext);
OUTPUT:
hello there
To eliminate spaces at the beginning and at the end of the String, use String#trim() method. And then use your mytext.replaceAll("( )+", " ").
You can first use String.trim(), and then apply the regex replace command on the result.
Try this one.
Sample Code
String str = " hello there ";
System.out.println(str.replaceAll("( +)"," ").trim());
OUTPUT
hello there
First it replaces every run of spaces with a single space. A run at the start or end of the String is also reduced to a single space, so we then need to trim the String to get the desired result.
String blogName = "how to do in java . com";
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ");
trim()
Removes only the leading & trailing spaces.
From Java Doc,
"Returns a string whose value is this string, with any leading and trailing whitespace removed."
System.out.println(" D ev Dum my ".trim());
"D ev Dum my"
replace(), replaceAll()
Removes all the spaces in the string:
System.out.println(" D ev Dum my ".replace(" ",""));
System.out.println(" D ev Dum my ".replaceAll(" ",""));
System.out.println(" D ev Dum my ".replaceAll("\\s+",""));
Output:
"DevDummy"
"DevDummy"
"DevDummy"
Note: "\\s+" is a regular expression that matches one or more whitespace characters.
Reference : https://www.codedjava.com/2018/06/replace-all-spaces-in-string-trim.html
In Kotlin it would look like this
val input = "\n\n\n a string with many spaces, \n"
val cleanedInput = input.trim().replace(Regex("(\\s)+"), " ")
A lot of correct answers have been provided so far, and I see a lot of upvotes. The approaches mentioned work, but they are not really optimized or readable.
I recently came across a solution which every developer will like.
String nameWithProperSpacing = StringUtils.normalizeSpace( stringWithLotOfSpaces );
You are done.
This is a readable solution.
You could use lookarounds also.
test.replaceAll("^ +| +$|(?<= ) ", "");
OR
test.replaceAll("^ +| +$| (?= )", "")
<space>(?= ) matches a space character which is followed by another space character. So in a run of consecutive spaces, it matches every space except the last, because the last one isn't followed by a space character. This leaves you with a single space for each run of consecutive spaces after the removal.
Example:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$| (?= )", "")
);
}
See String.replaceAll.
Use the regex "\\s+" and replace with " ".
Then use String.trim.
String str = " hello world";
// reduce spaces first
str = str.trim().replaceAll(" +", " ");
// capitalize the first letter and lowercase everything else
str = str.substring(0, 1).toUpperCase() + str.substring(1).toLowerCase();
You should do it like this:
String mytext = " hello there ";
mytext = mytext.replaceAll("( +)", " ");
Put the + inside the round brackets.
String str = " this is string ";
str = str.replaceAll("\\s+", " ").trim();
This worked for me
scan = filter(scan, " [\\s]+", " ");
scan = scan.trim();
where filter is following function and scan is the input string:
public String filter(String scan, String regex, String replace) {
StringBuffer sb = new StringBuffer();
Pattern pt = Pattern.compile(regex);
Matcher m = pt.matcher(scan);
while (m.find()) {
m.appendReplacement(sb, replace);
}
m.appendTail(sb);
return sb.toString();
}
A simple method for collapsing extra white space anywhere in the string:
public String removeWhiteSpaces(String returnString){
returnString = returnString.trim().replaceAll("^ +| +$|( )+", " ");
return returnString;
}
check this...
public static void main(String[] args) {
String s = "A B C D E F G\tH I\rJ\nK\tL";
System.out.println("Current : "+s);
System.out.println("Single Space : "+singleSpace(s));
System.out.println("Space count : "+spaceCount(s));
System.out.format("Replace all = %s", s.replaceAll("\\s+", ""));
// Example where it is used the most.
String s1 = "My name is yashwanth . M";
String s2 = "My nameis yashwanth.M";
System.out.println("Normal : "+s1.equals(s2));
System.out.println("Replace : "+s1.replaceAll("\\s+", "").equals(s2.replaceAll("\\s+", "")));
}
If the String contains only single spaces, replace() does not replace anything.
If there is more than one consecutive space, the replace() action is performed and removes the spaces.
public static String singleSpace(String str){
return str.replaceAll(" +| +|\t|\r|\n","");
}
To count the number of spaces in a String.
public static String spaceCount(String str){
int i = 0;
while(str.indexOf(" ") > -1){
//str = str.replaceFirst(" ", ""+(i++));
str = str.replaceFirst(Pattern.quote(" "), ""+(i++));
}
return str;
}
Pattern.quote("?") returns literal pattern String.
My method before I found the second answer using regex as a better solution. Maybe someone needs this code.
private String replaceMultipleSpacesFromString(String s){
if(s.length() == 0 ) return "";
int timesSpace = 0;
String res = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c == ' '){
timesSpace++;
if(timesSpace < 2)
res += c;
}else{
res += c;
timesSpace = 0;
}
}
return res.trim();
}
Stream version, filters spaces and tabs.
Stream.of(str.split("[ \\t]")).filter(s -> s.length() > 0).collect(Collectors.joining(" "))
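A self-contained version of the stream one-liner, as a sketch; the class and method names are made up for illustration. It behaves like the expression above but can be compiled and run on its own.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class StreamCollapse {
    // Splits on every single space or tab, drops the empty pieces that
    // consecutive separators produce, and joins the rest with one space.
    static String collapse(String str) {
        return Arrays.stream(str.split("[ \\t]"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  a \t b  c ") + "]"); // [a b c]
    }
}
```

Since empty pieces are filtered out, leading and trailing spaces disappear as well, so no separate trim() is needed.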
I know replaceAll method is much easier but I wanted to post this as well.
public static String removeExtraSpace(String input) {
input= input.trim();
ArrayList <String> x= new ArrayList<>(Arrays.asList(input.split("")));
for(int i=0; i<x.size()-1;i++) {
if(x.get(i).equals(" ") && x.get(i+1).equals(" ")) {
x.remove(i);
i--;
}
}
String word="";
for(String each: x)
word+=each;
return word;
}
String myText = " Hello World ";
myText = myText.trim().replaceAll(" +(?= )", "");
// Output: "Hello World"
string.replaceAll("\\s+", " ");
If you already use Guava (v. 19+) in your project you may want to use this:
CharMatcher.whitespace().trimAndCollapseFrom(input, ' ');
or, if you need to remove only the SPACE character (U+0020), use:
CharMatcher.anyOf(" ").trimAndCollapseFrom(input, ' ');
public class RemoveExtraSpacesEfficient {
public static void main(String[] args) {
String s = "my name is mr space ";
char[] charArray = s.toCharArray();
char prev = s.charAt(0);
for (int i = 0; i < charArray.length; i++) {
char cur = charArray[i];
if (cur == ' ' && prev == ' ') {
} else {
System.out.print(cur);
}
prev = cur;
}
}
}
The above solution is an O(n) algorithm that does not use regular expressions.
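A variation on the same single-pass idea, sketched under a few assumptions: it returns the result instead of printing it, also drops leading and trailing spaces, and tolerates an empty string (the version above calls charAt(0) without a length check). The names are hypothetical and this is not the original author's code.

```java
class SinglePassCollapse {
    // One pass over the characters: a space is only written once a following
    // non-space character is seen, so runs collapse and the edges are trimmed.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // Ignore spaces before the first word; otherwise remember one.
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                }
                sb.append(c);
                pendingSpace = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  my  name is   mr space  ") + "]"); // [my name is mr space]
    }
}
```

Only the space character is handled here; tabs and newlines pass through unchanged.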
Please use below code
package com.myjava.string;
import java.util.StringTokenizer;
public class MyStrRemoveMultSpaces {
public static void main(String a[]){
String str = "String With Multiple Spaces";
StringTokenizer st = new StringTokenizer(str, " ");
StringBuffer sb = new StringBuffer();
while(st.hasMoreElements()){
sb.append(st.nextElement()).append(" ");
}
System.out.println(sb.toString().trim());
}
}
