I have a string like this:
"This is the code: cd001, cd002, cd003 "
(I already capture cd001, cd002, cd003 here.)
But cd001, cd002, cd003 must be ignored in this string:
"This is the code: cd001,cd002, cd003,xxxx "
I have this regex: [^|\\s|>]*([a-z]{2}[0-9]+\\.?)\\b
(A code starts at the beginning of the string or after a space, has two lowercase letters followed by digits, and is followed by ".", ",", "#" or a space.)
// parse inputString into String[] of codes
// if there are no codes in the string, codes[0] is ""
String[] codes = inputString
// delete the beginning of the line up to and including the last ": "
.replaceFirst("^.*: ", "")
// delete two codes joined by "," with no space between them,
// optionally followed by "," and then exactly one " "
.replaceAll("[a-z0-9]+,[a-z0-9]+,? ", "")
// delete trailing spaces
.replaceFirst(" +$", "")
// split codes
.split(", ");
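Here is a minimal runnable sketch of the chain above, wrapped in a hypothetical parseCodes helper so both sample strings can be tried. Like the chain itself, it assumes the codes follow the last ": " on the line.

```java
import java.util.Arrays;

class CodeParser {
    // Hypothetical helper: the same replace/split chain as above.
    static String[] parseCodes(String inputString) {
        return inputString
                // delete the beginning of the line up to and including the last ": "
                .replaceFirst("^.*: ", "")
                // delete two codes joined by "," with no space between them,
                // optionally followed by "," and then exactly one space
                .replaceAll("[a-z0-9]+,[a-z0-9]+,? ", "")
                // delete trailing spaces
                .replaceFirst(" +$", "")
                // split the remaining codes
                .split(", ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseCodes("This is the code: cd001, cd002, cd003 ")));
        System.out.println(Arrays.toString(parseCodes("This is the code: cd001,cd002, cd003,xxxx ")));
    }
}
```

For the second input everything after ": " is deleted, so splitting the empty string returns a one-element array holding "", which matches the "codes[0] is """ comment.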
I am trying to remove dashes, underscores & extra spaces from a string.
Below is the code I tried:
public class Hello {
public static void main(String args[]) {
String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ").replaceAll("-", "").replaceAll("_", "");
System.out.println( nameWithProperSpacing );
}
}
I don't want it to remove them from between words & numbers, though. For example my current code changes 8956-7896 to 89567896.
The output I want:
ABCD Size:1234-5678 BAR 8956-7896 CAR 8756-2365
You will need to search with lookahead and lookbehind:
String nameWithProperSpacing = blogName
.replaceAll("(?<![a-zA-Z0-9])[-_](?![a-zA-Z0-9])", "") // remove - or _ that has no letter or digit on either side
.replaceAll("\\s+", " "); // collapse whitespace to a single space between words
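The lookaround version can be checked on the question's input with a small self-contained class (the class and method names are made up). Note that it keeps the space after "Size:", so the result is "ABCD Size: 1234-5678 BAR 8956-7896 CAR 8756-2365" rather than the "Size:1234-5678" typed in the question.

```java
class LookaroundClean {
    // Drop "-" or "_" only when neither the character before it nor the
    // character after it is a letter or digit, then collapse whitespace.
    static String clean(String blogName) {
        return blogName
                .replaceAll("(?<![a-zA-Z0-9])[-_](?![a-zA-Z0-9])", "")
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
        System.out.println(clean(blogName));
    }
}
```

Because lookarounds inspect the original input rather than the partly rewritten string, every character in a run of dashes or underscores is tested against its original neighbours, so whole runs disappear while "8956-7896" survives.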
If you only need to keep single dashes and remove runs of two or more, this is enough (it does not remove the underscores or collapse the spaces):
blogName.replaceAll("\\-(\\-)+","");
Try this; it's not pretty, but it works with your example:
String nameWithProperSpacing = blogName
.replaceAll("\\s+", " ")
.replaceAll("--", "").replaceAll("__", "").replaceAll(" _", "")
.replaceAll("_ ", "").replaceAll(" -", "").replaceAll("- ", "")
.replaceAll("  ", " ").trim(); // collapse the double spaces left behind
Another solution:
String nameWithProperSpacing = blogName.replaceAll("([^A-Za-z0-9])(\\-|\\s|_)+", "$1");
You can use non-word boundaries (\B) to remove them, but you have to remove the underscores first (they are word characters) and collapse the whitespace last:
String nameWithProperSpacing = blogName.replaceAll("_+", "").replaceAll("\\B-+|-+\\B", "").replaceAll("\\s+", " ");
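The \B idea can be sketched as a self-contained class (names are illustrative). In this sketch the whitespace is collapsed as the last step, since deleting the underscore and dash runs leaves pairs of spaces behind. Be aware that "_+" also deletes underscores inside words such as foo_bar.

```java
class NonBoundaryClean {
    static String clean(String blogName) {
        return blogName
                // "_" is a word character, so \B cannot isolate it; drop every run of it
                .replaceAll("_+", "")
                // drop dash runs with no word character before them, or none after them
                .replaceAll("\\B-+|-+\\B", "")
                // collapse any whitespace run to a single space
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String blogName = "ABCD __________________________________________________ Size: 1234-5678 BAR 8956-7896 ---------- CAR 8756-2365";
        System.out.println(clean(blogName));
    }
}
```

A dash between two digits, as in "1234-5678", sits on a word boundary on both sides, so neither alternative of "\\B-+|-+\\B" matches it.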
I want to find and store all the separators between the words in a sentence, which could be spaces or newlines.
Say I have the following String:
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
String[] separators = text.split("\\S+");
Output: [, , , , ,
, , , , ,
,
]
So I split on anything but whitespace. It returns an empty separator first, and the rest are correct. Why is there an empty string at the start?
Also, I would like periods and commas to count as separators too, so that ".\n" is one separator, but I don't know how to do that.
Wanted Output for the above String:
separators = {", ", " ", " ", " ", ".\n", " ", " ", " ", " ", "\r\n", "\n "}
or
separators = {",", " ", " ", " ", " ", ".", "\n", " ", " ", " ", " ", "\r\n", "\n "}
Try this:
String[] separators = text.split("[\\w']+");
This defines non-separators as "word chars" and/or apostrophes.
This does leave a leading blank in the result array, which is not possible to avoid, except by removing the leading word first:
String[] separators = text.replaceAll("^[\\w']+", "").split("[\\w']+");
You may consider adding the hyphen to the character class if you treat hyphenated words (such as "non-separators" above) as one word, i.e.:
String[] separators = text.split("[\\w'-]+");
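The leading blank is documented behaviour: per the String.split Javadoc, a positive-width match at the beginning of the string puts an empty string at the start of the result, and here the first word "hello" is such a match. A small sketch comparing both variants (class and method names are illustrative):

```java
import java.util.Arrays;

class SeparatorSplit {
    static final String TEXT = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";

    // "hello" is matched at index 0, so split() returns "" as the first element.
    static String[] withLeadingBlank(String text) {
        return text.split("[\\w']+");
    }

    // Removing the leading word first means no match starts at index 0.
    static String[] separators(String text) {
        return text.replaceAll("^[\\w']+", "").split("[\\w']+");
    }

    public static void main(String[] args) {
        System.out.println(withLeadingBlank(TEXT).length);
        // Escape \r and \n so the separators are visible when printed.
        String shown = Arrays.toString(separators(TEXT))
                .replace("\r", "\\r")
                .replace("\n", "\\n");
        System.out.println(shown);
    }
}
```

The second variant returns the eleven separators from the wanted output, starting with ", " and ending with "\r\n" and "\n ".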
I think this can also work, although \w does not match the apostrophe, so "I've" would produce an extra "'" separator:
String[] separators = text.split("\\w+");
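I think it's easier to use the find() method to obtain the desired result:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";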
String pat = "[\\s,.]+"; // add all that you need to the character class
Matcher m = Pattern.compile(pat).matcher(text);
List<String> list = new ArrayList<String>();
while( m.find() ) {
list.add(m.group());
}
// the result is already stored in "list" but if you
// absolutely want to store the result in an array, just do:
String[] result = list.toArray(new String[0]);
This way you avoid the empty string problem at the beginning.
I have a string from which I need to remove all the listed punctuation characters and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
But I am getting some elements that are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: normal
- is a special character in regex character classes. For instance, [a-z] matches all characters from a to z inclusive. Note that you've got )-_ in your regex.
- defines a range in the regular expression passed to String.split, so it needs to be escaped:
String[] part = line.toLowerCase().split("[,/?:;\"{}()\\-_+*=|<>!`~##$%^&]");
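The blank elements come from two delimiters in a row, such as "] " between film and fever: inside the brackets, + is just a literal plus sign. Move the + outside the character class so that a run of delimiters counts as one separator:
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");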
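To confirm the fix, this sketch (class name made up) runs the pattern with + outside the brackets on the question's input. The trailing ";" would produce a trailing empty string, but split() with no limit drops trailing empty strings, so only the five words remain.

```java
import java.util.Arrays;

class PunctuationSplit {
    // Same character class as the question; the + after ] makes a run of
    // delimiters such as "] " or ") " act as a single separator.
    static String[] split(String s) {
        return s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("s[film] fever(normal) curse;")));
    }
}
```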
I don't know much about regex, so can you please tell me how to split the string below to get the desired output?
String ruleString= "/Rule/Account/Attribute[N='accountCategory' and V>=1]"+
" and /Rule/Account/Attribute[N='accountType' and V>=34]"+
" and /Rule/Account/Attribute[N='acctSegId' and V>=341]"+
" and /Rule/Account/Attribute[N='is1sa' and V>=1]"+
" and /Rule/Account/Attribute[N='isActivated' and V>=0]"+
" and /Rule/Account/Attribute[N='mogId' and V>=3]"+
" and /Rule/Account/Attribute[N='regulatoryId' and V>=4]"+
" and /Rule/Account/Attribute[N='vipCode' and V>=5]"+
" and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
Desired output:
a[0] = /Rule/Account/Attribute[N='accountCategory' and V>=1]
a[1] = /Rule/Account/Attribute[N='accountType' and V>=34]
...
a[n] = /Rule/Subscriber/Attribute[N='agentId' and V='346']
We cannot simply split the string on " and ", because it contains two kinds of " and " (one separates the rules, the other is part of the condition inside the brackets).
I want to split it something like this:
String[] splitArray= ruleString.split("] and ");
But this won't work, as it will remove the end bracket ] from each of the splits.
Split your input using the regex below:
String[] splitArray= ruleString.split("\\s+and\\s+(?=/)");
This splits the input on every "and" that comes just before a forward slash.
You can also use a lookbehind here:
String[] splitArray= ruleString.split("(?<=\\])\\s*and\\s*");
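Both answers can be checked side by side with a self-contained sketch (class and method names are illustrative). The "and" inside each bracket is followed by "V" and preceded by "'", so neither the lookahead nor the lookbehind version splits there, and the closing "]" stays on each part.

```java
class RuleSplit {
    static final String RULE_STRING = "/Rule/Account/Attribute[N='accountCategory' and V>=1]"
            + " and /Rule/Account/Attribute[N='accountType' and V>=34]"
            + " and /Rule/Account/Attribute[N='acctSegId' and V>=341]"
            + " and /Rule/Account/Attribute[N='is1sa' and V>=1]"
            + " and /Rule/Account/Attribute[N='isActivated' and V>=0]"
            + " and /Rule/Account/Attribute[N='mogId' and V>=3]"
            + " and /Rule/Account/Attribute[N='regulatoryId' and V>=4]"
            + " and /Rule/Account/Attribute[N='vipCode' and V>=5]"
            + " and /Rule/Subscriber/Attribute[N='agentId' and V='346']";

    // Split on "and" only when the next character is "/" (lookahead).
    static String[] splitWithLookahead(String s) {
        return s.split("\\s+and\\s+(?=/)");
    }

    // Split on "and" only when the previous character is "]" (lookbehind).
    static String[] splitWithLookbehind(String s) {
        return s.split("(?<=\\])\\s*and\\s*");
    }

    public static void main(String[] args) {
        String[] a = splitWithLookahead(RULE_STRING);
        for (int i = 0; i < a.length; i++) {
            System.out.println("a[" + i + "] = " + a[i]);
        }
    }
}
```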
I am splitting a string by word boundary.
What I am expecting is:
TOKEN 0
TOKEN 1 0
TOKEN 2
TOKEN 3 +Ve
and what I am getting is:
TOKEN 0
TOKEN 1 0
TOKEN 2 +
TOKEN 3 Ve
public void StringExample(){
String str = " 0 +Ve";
String[] token = str.split("\\b");
System.out.println("TOKEN 0 " + token[0]);
System.out.println("TOKEN 1 " + token[1]);
System.out.println("TOKEN 2 " + token[2]);
System.out.println("TOKEN 3 " + token[3]);
}
Can someone explain where it's going wrong, and suggest corrections if any?
Both @pb2q and @Hovercraft have already explained why a word boundary doesn't work in your situation. An alternative is to use a Pattern and capture each group, which gives you what you want:
String str = " 0 +Ve";
Pattern p = Pattern.compile("( |[^ ]+)");
Matcher m = p.matcher(str);
List<String> tokens = new ArrayList<String>();
while (m.find()) {
tokens.add(m.group(1));
}
System.out.println("TOKEN 0 " + tokens.get(0));
System.out.println("TOKEN 1 " + tokens.get(1));
System.out.println("TOKEN 2 " + tokens.get(2));
System.out.println("TOKEN 3 " + tokens.get(3));
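The same idea as a complete class with its imports (the class and method names are made up for illustration). Each match is either a single space or a run of non-space characters, so every space comes back as a token of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceTokenizer {
    // Alternative 1: exactly one space; alternative 2: one or more non-spaces.
    private static final Pattern TOKEN = Pattern.compile("( |[^ ]+)");

    static List<String> tokenize(String str) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(str);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize(" 0 +Ve");
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println("TOKEN " + i + " \"" + tokens.get(i) + "\"");
        }
    }
}
```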
Nothing is going wrong; the results are what should be expected. A word boundary matches before the first character of a string if that character is a word character, after the last character if that character is a word character, and between two adjacent characters where one is a word character and the other is not. That last rule produces a match between '+' and 'V', so your results make perfect sense.
Perhaps you want to use lookahead and lookbehind to split at every position next to a space. For example:
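A small sketch that prints what split("\\b") returns for the question's input (names are illustrative); the comments list where the boundaries fall.

```java
class WordBoundarySplit {
    // For " 0 +Ve" the \b positions are: between " " and "0", between "0"
    // and " ", between "+" and "V", and at the end of the string. There is
    // none between " " and "+", because neither is a word character.
    // The empty string after the final boundary is dropped by split().
    static String[] splitOnBoundaries(String str) {
        return str.split("\\b");
    }

    public static void main(String[] args) {
        String[] token = splitOnBoundaries(" 0 +Ve");
        for (int i = 0; i < token.length; i++) {
            System.out.println("TOKEN " + i + " \"" + token[i] + "\"");
        }
    }
}
```

This is why TOKEN 2 comes out as " +" and TOKEN 3 as "Ve".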
public class Foo001 {
// private static final String REGEX1 = "\\b";
private static final String REGEX2 = "(?= )|(?<= )";
public static void main(String[] args) {
String str = " 0 +Ve";
String[] tokens = str.split(REGEX2);
for (int i = 0; i < tokens.length; i++) {
System.out.printf("token %d: \"%s\"%n", i, tokens[i]);
}
}
}
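On Java 7 and earlier, this also matches to the left of the first space, giving an extra empty token (since Java 8, a zero-width match at the start of the string no longer produces an empty leading substring, so token 0 is " "):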
token 0: ""
token 1: " "
token 2: "0"
token 3: " "
token 4: "+Ve"
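A minimal sketch for running the REGEX2 split on a current JDK (class and method names are made up). On Java 8 or later it returns the four tokens " ", "0", " ", "+Ve", which are exactly the tokens the question expects.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Split at every position directly before a space (?= )
    // or directly after one (?<= ).
    static String[] splitAroundSpaces(String str) {
        return str.split("(?= )|(?<= )");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAroundSpaces(" 0 +Ve")));
    }
}
```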
+ is not a word character, so a word boundary falls between + and V. Word characters are [a-zA-Z_0-9], that is, ASCII letters, digits, and the underscore.
Unless your strings get more complex than your example, this is another instance where you can just split around the space:
" 0 +Ve".split(" ");
This yields the array ["", "0", "+Ve"]: because the string starts with a space, the first element is an empty string.
That doesn't quite match the token list you expect, but it may suit your purposes. The empty first token tells you there is a leading space, and you can infer the space between "0" and "+Ve".
A problem with splitting this way is that consecutive space characters yield additional empty-string tokens in the resulting array.
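A quick check of both points (class and method names are illustrative): a leading space gives an empty first element, and two adjacent spaces leave an empty string between them.

```java
import java.util.Arrays;

class SpaceSplit {
    static String[] splitOnSpace(String s) {
        return s.split(" ");
    }

    public static void main(String[] args) {
        // The leading space is a positive-width match at index 0,
        // so the first element is an empty string.
        System.out.println(Arrays.toString(splitOnSpace(" 0 +Ve")));
        // Two adjacent spaces leave an empty string between them.
        System.out.println(Arrays.toString(splitOnSpace("0  +Ve")));
    }
}
```

Splitting on " +" instead of " " would merge runs of spaces, but the leading empty element would still be there.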