Java Split a String with Regex expression

I don't know much about regex, so can you please tell me how to split the string below to get the desired output?
String ruleString = "/Rule/Account/Attribute[N='accountCategory' and V>=1]"
        + " and /Rule/Account/Attribute[N='accountType' and V>=34]"
        + " and /Rule/Account/Attribute[N='acctSegId' and V>=341]"
        + " and /Rule/Account/Attribute[N='is1sa' and V>=1]"
        + " and /Rule/Account/Attribute[N='isActivated' and V>=0]"
        + " and /Rule/Account/Attribute[N='mogId' and V>=3]"
        + " and /Rule/Account/Attribute[N='regulatoryId' and V>=4]"
        + " and /Rule/Account/Attribute[N='vipCode' and V>=5]"
        + " and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
Desired output:
a[0] = /Rule/Account/Attribute[N='accountCategory' and V>=1]
a[1] = /Rule/Account/Attribute[N='accountType' and V>=34]
.
.
.
a[n] = /Rule/Subscriber/Attribute[N='agentId' and V='346']
We cannot simply split the string on " and ", because it contains two kinds of " and ": the one inside each [...] must be kept, and the one between expressions is the delimiter.
I want to split it something like this:
String[] splitArray = ruleString.split("] and ");
But this won't work, because it removes the closing bracket ] from every piece except the last.

Split your input according to the below regex.
String[] splitArray= ruleString.split("\\s+and\\s+(?=/)");
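This splits the input on each and (with its surrounding whitespace) that is immediately followed by a forward slash. The lookahead (?=/) checks for the slash without consuming it, and the and inside the brackets is followed by V, so it is left alone.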
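A minimal runnable sketch of the lookahead split above, using a shortened input (three of the nine expressions) to keep it small; the class and method names are hypothetical:

```java
public class SplitLookahead {
    // Split on an "and" surrounded by whitespace and followed by '/'.
    // The lookahead (?=/) checks for the '/' without consuming it.
    static String[] splitRules(String ruleString) {
        return ruleString.split("\\s+and\\s+(?=/)");
    }

    public static void main(String[] args) {
        String ruleString = "/Rule/Account/Attribute[N='accountCategory' and V>=1]"
                + " and /Rule/Account/Attribute[N='accountType' and V>=34]"
                + " and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
        String[] a = splitRules(ruleString);
        for (int i = 0; i < a.length; i++) {
            System.out.println("a[" + i + "] = " + a[i]);
        }
    }
}
```

The " and V" inside each bracket is not split because the character after its trailing whitespace is V, not /.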

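You can also use a look-behind here; (?<=\]) requires a ] right before the and without consuming it, so the bracket stays at the end of each piece: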
String[] splitArray= ruleString.split("(?<=\\])\\s*and\\s*");
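For comparison, a sketch that runs both the look-behind split and the question's split("] and ") on a shortened two-expression input (class and method names are made up for illustration). The look-behind version keeps every ]; the plain version loses it from every piece except the last:

```java
public class SplitLookbehind {
    // (?<=\]) asserts that the previous character is ']' without consuming it,
    // so the bracket stays at the end of each piece.
    static String[] splitKeepingBracket(String ruleString) {
        return ruleString.split("(?<=\\])\\s*and\\s*");
    }

    // The question's attempt: ']' is part of the delimiter, so it is consumed.
    static String[] splitDroppingBracket(String ruleString) {
        return ruleString.split("] and ");
    }

    public static void main(String[] args) {
        String ruleString = "/Rule/Account/Attribute[N='accountCategory' and V>=1]"
                + " and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
        for (String part : splitKeepingBracket(ruleString)) {
            System.out.println("kept:    " + part);
        }
        for (String part : splitDroppingBracket(ruleString)) {
            System.out.println("dropped: " + part);
        }
    }
}
```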

Related

How to remove dash (-) from string with regex except between numbers & words in java?

I am trying to remove dashes, underscores & extra spaces from a string.
Below is the code I tried:
public class Hello {
    public static void main(String[] args) {
        String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
        String nameWithProperSpacing = blogName.replaceAll("\\s+", " ").replaceAll("-", "").replaceAll("_", "");
        System.out.println(nameWithProperSpacing);
    }
}
I don't want it to remove them from between words and numbers, though. For example, my current code changes 8956-7896 to 89567896.
The output I want :
ABCD Size:1234-5678 BAR 8956-7896 CAR 8756-2365
You will need to search with a lookahead and a lookbehind:
String nameWithProperSpacing = blogName
        .replaceAll("(?<![a-zA-Z0-9])[-_](?![a-zA-Z0-9])", "") // remove - and _ that are not next to a letter or digit
        .replaceAll("\\s+", " "); // leave only one space between words
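A small self-contained check of the lookaround approach on the question's input; the class and method names are hypothetical. Note that it keeps the space after Size:, which the question's desired output omits (possibly a typo there):

```java
public class CleanDashes {
    static String clean(String blogName) {
        return blogName
                // remove '-' or '_' only when neither neighbour is a letter or digit
                .replaceAll("(?<![a-zA-Z0-9])[-_](?![a-zA-Z0-9])", "")
                // then collapse each whitespace run to a single space
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
        System.out.println(clean(blogName));
        // ABCD Size: 1234-5678 BAR 8956-7896 CAR 8756-2365
    }
}
```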
If you only want to keep single dashes and remove runs of two or more, this is enough (it does not touch underscores or extra spaces):
blogName.replaceAll("\\-(\\-)+","");
Try this; it's not pretty, but it works with your example:
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ")
.replaceAll("--", "").replaceAll("__", "").replaceAll(" _", "")
.replaceAll("_ ", "").replaceAll(" -", "").replaceAll("- ", "")
.replaceAll("  ", " ").trim();
Another solution:
String nameWithProperSpacing = blogName.replaceAll("([^A-Za-z0-9])(\\-|\\s|_)+", "$1 ");
You can use the non-word boundary \B to remove them, but for that you have to remove the underscores first, because _ counts as a word character:
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ").replaceAll("_+", "").replaceAll("\\B-+|-+\\B", "");
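A sketch of the same \B idea with the calls reordered (the reordering is my own change, and the class name is hypothetical). Collapsing whitespace last means the gaps left by the removed underscore and dash runs end up as single spaces; with the order above, ABCD and Size: end up separated by two spaces:

```java
public class CleanWithNonBoundary {
    static String clean(String blogName) {
        return blogName
                // '_' is a word character, so remove underscores before relying on \B
                .replaceAll("_+", "")
                // remove dashes that touch a non-word boundary; a single dash
                // between two word characters (as in 1234-5678) is kept
                .replaceAll("\\B-+|-+\\B", "")
                // collapse whitespace last so the leftover gaps become one space
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
        System.out.println(clean(blogName));
    }
}
```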

Split String To Get Word Separators

I want to find and store all the separators between the words in a sentence, which could be spaces or newlines.
Say I have the following String:
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
String[] separators = text.split("\\S+");
Output: [, , , , ,
, , , , ,
,
]
So I split on runs of non-whitespace characters. It returns an empty separator first, and the rest are good. Why is there an empty string at the start, though?
Also, I would like periods and commas to count as separators too, so that ".\n" is one separator, but I don't know how to do that.
Wanted Output for the above String:
separators = {", ", " ", " ", " ", ".\n", " ", " ", " ", " ", "\r\n", "\n "}
or
separators = {",", " ", " ", " ", " ", ".", "\n", " ", " ", " ", " ", "\r\n", "\n "}
Try this:
String[] separators = text.split("[\\w']+");
This defines non-separators as "word chars" and/or apostrophes.
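This does leave a leading blank in the result array: split() returns a leading empty string whenever the input starts with a delimiter match, and here the first "delimiter" is the word hello. You can avoid it by removing the leading word first: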
String[] separators = text.replaceAll("^[\\w']+", "").split("[\\w']+");
If you treat hyphenated words (such as "non-separators" in the previous sentence) as one word, you may want to add the hyphen to the character class, i.e.:
String[] separators = text.split("[\\w'-]+");
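A sketch that runs the leading-word-removal variant on the question's text and prints each separator with \r and \n made visible; the class and method names are illustrative. On this input it yields the eleven separators listed in the first wanted output:

```java
public class SeparatorSplit {
    // Remove the leading word so split() does not produce a leading "",
    // then split on runs of word characters and apostrophes.
    static String[] separators(String text) {
        return text.replaceAll("^[\\w']+", "").split("[\\w']+");
    }

    public static void main(String[] args) {
        String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
        for (String sep : separators(text)) {
            // show \r and \n as escapes so each separator prints on one line
            System.out.println("\"" + sep.replace("\r", "\\r").replace("\n", "\\n") + "\"");
        }
    }
}
```

The trailing word 123123 produces a trailing empty string, which split(String) discards, so no cleanup is needed at the end.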
This can also work, but note that \w does not include the apostrophe, so the ' in "I've" is returned as a separator:
String[] separators = text.split("\\w+");
I think it's easier to use the Matcher.find() method to obtain the desired result:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Separators {
    public static void main(String[] args) {
        String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
        String pat = "[\\s,.]+"; // add all that you need to the character class
        Matcher m = Pattern.compile(pat).matcher(text);
        List<String> list = new ArrayList<>();
        while (m.find()) {
            list.add(m.group()); // each match is one separator
        }
        // the result is already stored in "list", but if you
        // absolutely want to store the result in an array, just do:
        String[] result = list.toArray(new String[0]);
    }
}
This way you avoid the empty string problem at the beginning.

Digits are getting deleted when splitting a string

I have a string from which I need to remove all mentioned punctuations and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
But I am getting some blank elements. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: normal
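- is a special character in regex character classes. For instance, [a-z] matches all characters from a to z inclusive. If the dash in )-_ is not escaped, it is the range from ) to _, which includes all the digits and uppercase letters, so those are treated as delimiters too.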
- defines a range in regular expressions, including the one passed to String.split, so it needs to be escaped:
String[] part = line.toLowerCase().split("[,/?:;\"{}()\\-_+*=|<>!`~##$%^&]");
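Put the + outside the character class, so that a run of consecutive delimiters (such as "] ") counts as one separator instead of producing an empty element:
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");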
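A quick way to see both problems, sketched in a hypothetical demo class: the unescaped range )-_ swallows digits and uppercase letters, and moving the + outside the class removes the empty elements:

```java
import java.util.Arrays;

public class DelimiterClassDemo {
    // + outside the class: a run of delimiters counts as one separator
    static String[] splitWords(String s) {
        return s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
    }

    public static void main(String[] args) {
        // Unescaped, )-_ inside [...] is the range ')' (0x29) to '_' (0x5F),
        // which includes the digits 0-9 and the uppercase letters A-Z.
        System.out.println("5".matches("[)-_]"));   // true
        System.out.println("Q".matches("[)-_]"));   // true
        // Escaped, the class holds only the three characters ')', '-' and '_'.
        System.out.println("5".matches("[)\\-_]")); // false

        System.out.println(Arrays.toString(splitWords("s[film] fever(normal) curse;")));
        // [s, film, fever, normal, curse]
    }
}
```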

How to remove spaces in between the String

I have the following String:
string = "Book Your Domain And Get\n \n\n \n \n \n Online Today."
string = str.replace("\\s","").trim();
which returns
str = "Book Your Domain And Get Online Today."
But what I want is
str = "Book Your Domain And Get Online Today."
I have tried many regular expressions and searched online, but had no luck and didn't find a related question. Please help; many thanks in advance.
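Use \\s+ instead of \\s, as there are two or more consecutive whitespace characters in your input, and use replaceAll instead of replace, because replace treats its argument as a literal string rather than a regex:
string = str.replaceAll("\\s+", " ");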
You can use replaceAll which takes a regex as parameter. And it seems like you want to replace multiple spaces with a single space. You can do it like this:
string = str.replaceAll("\\s{2,}"," ");
It will replace two or more consecutive whitespace characters with a single space.
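A minimal sketch contrasting the two calls on the question's string (the class name is hypothetical): replace looks for the literal text \s and changes nothing, while replaceAll applies the regex:

```java
public class CollapseWhitespace {
    static String collapse(String str) {
        // replaceAll treats its first argument as a regex; \s+ matches a run of
        // spaces, tabs and line breaks, and each run becomes one space
        return str.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String str = "Book Your Domain And Get\n \n\n \n \n \n Online Today.";
        // replace() is literal: it searches for a backslash followed by 's',
        // finds none, and returns an equal string
        System.out.println(str.replace("\\s", "").equals(str)); // true
        System.out.println(collapse(str)); // Book Your Domain And Get Online Today.
    }
}
```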
First get rid of multiple spaces (note that " +" matches only the space character, not newlines or tabs):
String after = before.trim().replaceAll(" +", " ");
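If you want to remove only the whitespace between two words or characters, and not at the ends of the string, here is the regex I have used:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveSpacesBetween {
    public static void main(String[] args) {
        String s = " N OR 15 2 ";
        Pattern pattern = Pattern.compile("[a-zA-Z0-9]\\s+[a-zA-Z0-9]", Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(s);
        while (m.find()) {
            // strip the whitespace from the matched "char, spaces, char" run
            String replacestr = m.group().replaceAll("\\s+", "");
            s = s.substring(0, m.start()) + replacestr + s.substring(m.end());
            // rescan the updated string from the start
            m = pattern.matcher(s);
        }
        System.out.println(s);
    }
}
It only removes the whitespace between characters or words, not the spaces at the ends, so the output is (with the leading and trailing space kept):
NOR152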
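A one-line alternative (my own suggestion, not from the answer above) uses a lookbehind and a lookahead so that only whitespace with a letter or digit on both sides is removed; the class name is hypothetical:

```java
public class RemoveInnerSpaces {
    // Remove whitespace only when it sits between two letters/digits.
    // Leading and trailing whitespace lacks a neighbour on one side, so it stays.
    static String removeInnerSpaces(String s) {
        return s.replaceAll("(?<=[a-zA-Z0-9])\\s+(?=[a-zA-Z0-9])", "");
    }

    public static void main(String[] args) {
        // brackets make the kept leading/trailing spaces visible
        System.out.println("[" + removeInnerSpaces(" N OR 15 2 ") + "]"); // [ NOR152 ]
    }
}
```

Because the lookarounds do not consume the neighbouring characters, a single replaceAll pass handles every gap, with no need to rescan.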
For example, to remove the spaces between words in a string:
String example = "Interactive Resource";
System.out.println("Without space string: "+ example.replaceAll("\\s",""));
Output:
Without space string: InteractiveResource
If you are printing from Python rather than Java, print() takes a sep argument whose default value is " "; passing sep='' prints the arguments without a space between them. Java's System.out.println has no such argument.
// use this to remove all the whitespace from a given string, for example a = " 1 2 3 4"
// output: 1234
a = a.replaceAll("\\s", ""); // Strings are immutable, so assign the result
String s2=" 1 2 3 4 5 ";
String after=s2.replace(" ", "");
This works for me.
String string_a = "AAAA BBB";
String actualTooltip_3 = string_a.replaceAll("\\s{2,}", " ");
System.out.println(actualTooltip_3);
OUTPUT will be: AAAA BBB

Using a JTextField to get a regular expression from a user. How do I make it see \t as a tab instead of a \ followed by a t

JTextField reSource; //contains the regex expression the user wants to search for
String re=reSource.getText();
Pattern p=Pattern.compile(re,myflags); //myflags defined elsewhere in code
Matcher m=p.matcher(src); //src is the text to search and comes from a JTextArea
while (m.find()==true) {
If the user enters \t, it finds \t, not a tab.
If the user enters \\\t, it finds \\\t, not a tab.
If the user enters [\t] or [\\\t], it finds t, not a tab.
I want it so that if the user enters \t, it finds a tab. Of course, it also needs to work with \n, \r, etc.
If re="\t"; is used instead of re=reSource.getText(); with \t in the JTextField then it finds tabs. How do I get it to work with the contents of the JTextField?
Example:
String src = "This\tis\ta\ttest";
System.out.println("src=\"" + src + '"'); // --> prints "This is a test"
String re="\\t";
System.out.println("re=\"" + re + '"'); // --> prints "\t" - as when you use reSource.getText();
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(src);
while (m.find()) {
System.out.println('"' + m.group() + '"');
}
Output:
src="This is a test"
re="\t"
" "
" "
" "
Try this:
re=re.replace("\\t", "\t");
OR
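re=re.replace("\\t", "\\\\t");
(Note that this second form yields the regex \\t, which matches a backslash followed by t rather than a tab.)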
I think the problem is in understanding that when you type:
String str = "\t";
the string actually contains a single tab character, not a backslash and a t. But if you type:
String str = "\\t";
then System.out.print(str) prints the two characters \t.
Matching \t should work; however, your flags might be causing a problem.
Here's what works for me:
String src = "A\tBC\tD";
Pattern p=Pattern.compile("\\w\\t\\w"); //simulates the user entering \w\t\w
Matcher m=p.matcher(src);
while (m.find())
{
System.out.println("Match: \"" + m.group(0) + "\"");
}
Output is:
Match: "A B"
Match: "C D"
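The answer doesn't say which flag is at fault. As an assumption, one flag that reproduces the "finds \t, not a tab" symptom is Pattern.LITERAL, which makes the whole pattern text match literally. A sketch (hypothetical class and method names):

```java
import java.util.regex.Pattern;

public class LiteralFlagDemo {
    // Returns true if the user-supplied pattern text finds something in src.
    static boolean finds(String userText, int flags, String src) {
        return Pattern.compile(userText, flags).matcher(src).find();
    }

    public static void main(String[] args) {
        String typed = "\\t";            // the two characters '\' and 't', as typed
        String withTab = "A\tB";         // contains a real tab
        String withBackslashT = "A\\tB"; // contains a backslash followed by 't'

        // Without flags, the regex escape \t matches a tab character.
        System.out.println(finds(typed, 0, withTab));                      // true
        // With Pattern.LITERAL, the same text matches only a literal backslash-t.
        System.out.println(finds(typed, Pattern.LITERAL, withTab));        // false
        System.out.println(finds(typed, Pattern.LITERAL, withBackslashT)); // true
    }
}
```

If myflags turns out to include Pattern.LITERAL, dropping it should let the typed \t match tabs.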
My experience is that Java Swing JTextField and JTable GUI controls escape user-entered backslashes by prefixing a backslash.
The user types the two-character sequence "backslash t", and the control's getText() method returns a String containing the three-character sequence "backslash backslash t". The SO formatter does its own thing with backslashes in text, so here it is as code:
Single backslash: input is 2 char sequence \t and return value is 3 char \\t
For three-character input sequence "backsl backsl t", getText() returns the five-character sequence "backsl backsl backsl backsl t". As code:
Double backslash: input is 3 char sequence \\t and return value is 5 char \\\\t
This basically prevents the backslash from modifying the t to yield a character sequence that becomes a tab when interpreted by something like System.out.println.
Conveniently, and surprisingly to me, the regex processor accepts it either way: a pattern containing an actual tab character matches a tab, and so does a pattern containing the two characters backslash and t. Please see the demo code below. The System.out calls show which sequences and patterns contain tabs, and in JDK 1.7 both matches yield true.
package my.text;
/**
* Demonstrate use of tab character in regexes
*/
public class RegexForSo {
public static void main(String [] argv) {
final String sequenceTab="x\ty\tz";
final String patternBsTab = "x\t.*";
final String patternBsBsTab = "x\\t.*";
System.out.println("sequence is >" + sequenceTab + "<");
System.out.println("pattern BsTab is >" + patternBsTab + "<");
System.out.println("pattern BsBsTab is >" + patternBsBsTab + "<");
System.out.println("matched BsTab = " + sequenceTab.matches(patternBsTab));
System.out.println("matched BsBsTab = " + sequenceTab.matches(patternBsBsTab));
}
}
Output on my JDK1.7 system is below, tabs in output might not survive SO formatter :)
sequence is >x y z<
pattern BsTab is >x .*<
pattern BsBsTab is >x\t.*<
matched BsTab = true
matched BsBsTab = true
HTH
