Digits are getting deleted when splitting a string - java

I have a string from which I need to remove all mentioned punctuations and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
But some of the resulting elements are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: normal

- is a special character in regex character classes. For instance, [a-z] matches all chars from a to z inclusive. Note that you've got )-_ in your regex, which is read as the range from ) to _ and therefore also matches the digits 0-9.

- defines a range in regular expressions, including the one passed to String.split, so it needs to be escaped:
String[] part = line.toLowerCase().split("[,/?:;\"{}()\\-_+*=|<>!`~##$%^&]");

String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
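To see why an unescaped hyphen deletes digits, and why moving the + outside the brackets gets rid of the blank elements, here is a minimal sketch. The class name SplitRangeDemo, the shortened patterns and the sample string "ab12cd" are made up for illustration; they are not from the question.

```java
import java.util.Arrays;

class SplitRangeDemo {
    // Unescaped hyphen: ")-_" is the range from ')' (41) to '_' (95), which covers 0-9 and A-Z.
    static final String RANGE_BUG = "[()-_]";
    // Escaped hyphen: only the four literal characters ( ) - _ are delimiters.
    static final String ESCAPED = "[()\\-_]";
    // Quantifier outside the class: a run of delimiters counts as a single separator.
    static final String RUNS = "[\\[\\]();\\s]+";

    public static void main(String[] args) {
        System.out.println(Arrays.toString("ab12cd".split(RANGE_BUG))); // digits are treated as delimiters
        System.out.println(Arrays.toString("ab12cd".split(ESCAPED)));   // string is left intact
        System.out.println(Arrays.toString("s[film] fever(normal) curse;".split(RUNS)));
    }
}
```

Inside the brackets a + is just a literal plus sign, so "\\s+]" does not mean "one or more"; the quantifier only applies when it follows the closing bracket.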

Related

Android toLowerCase() issue with accented characters

My app has a feature to filter content based on some keywords.
This is case insensitive so in order to work I first call String.toLowerCase() on the source content.
The issue I have is when the source is in upper case and contains accented characters, as in the French word "INVITÉ".
This word when set to lowercase using the device default locale returns "invité"
The problem is that the last character is not the same as the lowercase character "é"
Instead it's the combination of 2 chars:
"e" (code point 101) and
the combining acute accent (code point 769)
Because of this "invité" does not match "invité"
How can I solve this? I would prefer not to remove accented characters altogether.
You should normalize the string like this.
// Requires: import java.text.Normalizer;
String upper = "INVITÉ";
System.out.println(upper + " length=" + upper.length());
String lower = upper.toLowerCase();
System.out.println(lower + " length=" + lower.length());
String normalized = Normalizer.normalize(lower, Normalizer.Form.NFC);
System.out.println(normalized + " length=" + normalized.length());
output:
INVITÉ length=7
invité length=7
invité length=6
It also works for Japanese.
String japanese = "が";
System.out.println(japanese + " length=" + japanese.length());
String normalized = Normalizer.normalize(japanese, Normalizer.Form.NFC);
System.out.println(normalized + " length=" + normalized.length());
output:
が length=2
が length=1
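For the keyword filter in the question, the normalization has to be applied to both sides of the comparison. A minimal sketch, assuming a hypothetical helper canonicalLower and using Locale.ROOT for lowercasing (the question uses the device default locale; that choice is mine):

```java
import java.text.Normalizer;
import java.util.Locale;

class AccentFilterDemo {
    // Hypothetical helper: lowercase first, then compose to NFC so "e" + U+0301 becomes U+00E9.
    static String canonicalLower(String s) {
        return Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String source = "INVITE\u0301"; // "E" followed by the combining acute accent
        String keyword = "invit\u00e9"; // precomposed lowercase e-acute
        // Plain lowercasing keeps the two-character form, so the strings differ.
        System.out.println(source.toLowerCase(Locale.ROOT).equals(keyword));
        // After normalizing both sides, they compare equal.
        System.out.println(canonicalLower(source).equals(canonicalLower(keyword)));
    }
}
```

Normalizing the keyword as well keeps the comparison correct if the keyword itself arrives in decomposed form.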

Split String To Get Word Separators

I want to find and store all the separators between the words in a sentence, which could be spaces or newlines.
Say I have the following String:
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
String[] separators = text.split("\\S+");
Output: [, , , , ,
, , , , ,
,
]
So I split on anything but whitespace. It returns an empty separator first, and the rest are good. Why is the first element an empty string?
Also, I would like periods and commas to count as separators, so that ".\n" is a single separator, but I don't know how to do that.
Wanted Output for the above String:
separators = {", ", " ", " ", " ", ".\n", " ", " ", " ", " ", "\r\n", "\n "}
or
separators = {",", " ", " ", " ", " ", ".", "\n", " ", " ", " ", " ", "\r\n", "\n "}
Try this:
String[] separators = text.split("[\\w']+");
This defines non-separators as "word chars" and/or apostrophes.
This does leave a leading blank in the result array, which is not possible to avoid, except by removing the leading word first:
String[] separators = text.replaceAll("^[\\w']+", "").split("[\\w']+");
You may consider adding the hyphen to the character class if you consider hyphenated words as one word, i.e.:
String[] separators = text.split("[\\w'-]+");
I think this can also work correctly:
String[] separators = text.split("\\w+");
I think it's easier to use the find() method to obtain the desired result:
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
String pat = "[\\s,.]+"; // add all that you need to the character class
Matcher m = Pattern.compile(pat).matcher(text);
List<String> list = new ArrayList<String>();
while( m.find() ) {
list.add(m.group());
}
// the result is already stored in "list" but if you
// absolutely want to store the result in an array, just do:
String[] result = list.toArray(new String[0]);
This way you avoid the empty string problem at the beginning.
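The two approaches can be compared side by side. This is only a sketch: the class name SeparatorDemo and the negated class [^\w']+ (everything that is not a word character or apostrophe) are my assumptions about what counts as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorDemo {
    // Assumption: a "word" is a run of word characters or apostrophes; anything else separates words.
    private static final Pattern SEPARATOR = Pattern.compile("[^\\w']+");

    // Collects every separator run with find(), so no empty leading element can appear.
    static List<String> separators(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
        // split() treats the first word as a delimiter at index 0, so element 0 is "".
        System.out.println(text.split("[\\w']+").length); // one more than the separator count
        System.out.println(separators(text).size());
    }
}
```

The leading empty string from split() appears because the text starts with a delimiter match (the first word); trailing empty strings are dropped by split(), leading ones are not.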

Java Split a String with Regex expression

I don't know much about regex. So can you please tell me how to split the below string to get the desired output?
String ruleString= "/Rule/Account/Attribute[N='accountCategory' and V>=1]"+
" and /Rule/Account/Attribute[N='accountType' and V>=34]"+
" and /Rule/Account/Attribute[N='acctSegId' and V>=341]"+
" and /Rule/Account/Attribute[N='is1sa' and V>=1]"+
" and /Rule/Account/Attribute[N='isActivated' and V>=0]"+
" and /Rule/Account/Attribute[N='mogId' and V>=3]"+
" and /Rule/Account/Attribute[N='regulatoryId' and V>=4]"+
" and /Rule/Account/Attribute[N='vipCode' and V>=5]"+
" and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
Desired output:
a[0] = /Rule/Account/Attribute[N='accountCategory' and V>=1]
a[1] = /Rule/Account/Attribute[N='accountType' and V>=34]
.
.
.
a[n] = /Rule/Subscriber/Attribute[N='agentId' and V='346']
We cannot simply split the string on " and ", because it occurs in two places: between the rules (where the split is wanted) and inside each pair of brackets (where it is not).
I want to split it something like this
String[] splitArray= ruleString.split("] and ");
But this won't work, as it will remove the end bracket ] from each of the splits.
Split your input using the regex below.
String[] splitArray= ruleString.split("\\s+and\\s+(?=/)");
This splits the input on each "and" that is immediately followed by a forward slash.
You have to use look-behind here:
String[] splitArray= ruleString.split("(?<=\\])\\s*and\\s*");
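Both suggestions rely on lookaround, which checks the neighbouring text without consuming it, so the ] and the / stay in the results. A minimal sketch on a shortened version of the rule string (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class RuleSplitDemo {
    // Lookahead: split on "and" only when the next character after the whitespace is "/".
    private static final Pattern BEFORE_SLASH = Pattern.compile("\\s+and\\s+(?=/)");
    // Lookbehind: split on "and" (with optional whitespace) only when the match starts right after "]".
    private static final Pattern AFTER_BRACKET = Pattern.compile("(?<=\\])\\s*and\\s*");

    static String[] splitLookahead(String rules) {
        return BEFORE_SLASH.split(rules);
    }

    static String[] splitLookbehind(String rules) {
        return AFTER_BRACKET.split(rules);
    }

    public static void main(String[] args) {
        String rules = "/Rule/Account/Attribute[N='accountCategory' and V>=1]"
                + " and /Rule/Subscriber/Attribute[N='agentId' and V='346']";
        for (String part : splitLookahead(rules)) {
            System.out.println(part);
        }
        for (String part : splitLookbehind(rules)) {
            System.out.println(part);
        }
    }
}
```

The "and" inside the brackets is preceded by a quote and followed by "V", so neither pattern matches it.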

How to remove special characters in String

I want to remove "[", "]", "," from my string
for example,
[569.24, 569.24, 568.10, 566.00, 566.01, 566.00, 567.98, 565.14]
to
569.24 569.24 568.10 566.00 566.01 566.00 567.98 565.14
However, I can remove "," but not "[" and "]".
My code is as follows.
String content = price_result.toString();
//remove special characters
String content_modified = content.replaceAll("[ \t\"',;]+", " ");
System.out.println(content_modified);
The above results in [569.24, 569.24, 568.10, 566.00, 566.01, 566.00, 567.98, 565.14].
How can I remove "[" and "]"?
Just use this:
String content = price_result.toString();
//remove special characters
String content_modified = content.replace("[","").replace("]","").replace(",","");
System.out.println(content_modified);
You can try the following:
// Characters you want to remove
String unwanted = "[],";
// It will be used frequently? Use a constant.
Pattern pattern = Pattern.compile("[" + Pattern.quote(unwanted) + "]");
String content = price_result.toString();
String content_modified = pattern.matcher(content).replaceAll("");
System.out.println(content_modified);
Put them in a character class [] and escape the brackets with \:
String content_modified = content.replaceAll("[\\[\t\"',;\\]]+", " ");
Or list them one by one as alternatives separated by | (add any other characters yourself):
String content_modified = content.replaceAll("\\[|\\]|,|;", " ");
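The original pattern fails on the brackets simply because [ and ] are not listed inside its character class. A minimal sketch of the escaped-class variant, assuming only "[", "]" and "," need to go (the class name and the shortened sample input are made up):

```java
class BracketStripDemo {
    // Removes "[", "]" and ","; the spaces that followed the commas remain as separators.
    static String strip(String s) {
        return s.replaceAll("[\\[\\],]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("[569.24, 569.24, 568.10]")); // 569.24 569.24 568.10
    }
}
```

Replacing with an empty string rather than " " avoids leading and doubled spaces in the result.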

Using a JTextField to get a regular expression from a user. How do I make it see \t as a tab instead of a \ followed by a t

JTextField reSource; //contains the regex expression the user wants to search for
String re=reSource.getText();
Pattern p=Pattern.compile(re,myflags); //myflags defined elsewhere in code
Matcher m=p.matcher(src); //src is the text to search and comes from a JTextArea
while (m.find()==true) {
If the user enters \t it finds \t not tab.
If the user enters \\\t it finds \\\t not tab.
If the user enters [\t] or [\\\t] it finds t not tab.
I want it such that if the user enters \t it finds tab. Of course it also needs to work with \n, \r etc...
If re="\t"; is used instead of re=reSource.getText(); with \t in the JTextField then it finds tabs. How do I get it to work with the contents of the JTextField?
Example:
String src = "This\tis\ta\ttest";
System.out.println("src=\"" + src + '"'); // --> prints "This is a test"
String re="\\t";
System.out.println("re=\"" + re + '"'); // --> prints "\t" - as when you use reSource.getText();
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(src);
while (m.find()) {
System.out.println('"' + m.group() + '"');
}
Output:
src="This is a test"
re="\t"
" "
" "
" "
Try this:
re=re.replace("\\t", "\t");
OR
re=re.replace("\\t", "\\\\t");
I think the problem is in understanding that when you type:
String str = "\t";
Then it is actually the same as:
String str = " ";
But if you type:
String str = "\\t";
Then System.out.print(str) will print \t.
Matching \t should work; however, your flags might be the problem.
Here's what works for me:
String src = "A\tBC\tD";
Pattern p=Pattern.compile("\\w\\t\\w"); //simulates the user entering \w\t\w
Matcher m=p.matcher(src);
while (m.find())
{
System.out.println("Match: \"" + m.group(0) + "\"");
}
Output is:
Match: "A B"
Match: "C D"
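As an illustration of how a flag can change the result: Pattern.LITERAL makes the regex engine treat the whole input as plain text, so backslash-t would only match a literal backslash followed by t. Whether the asker's myflags includes it is not stated, so this is only a guess at one possible cause. The class and method names are made up.

```java
import java.util.regex.Pattern;

class UserRegexDemo {
    // Compiles the text exactly as the user typed it and reports whether it occurs in src.
    static boolean finds(String userRegex, int flags, String src) {
        return Pattern.compile(userRegex, flags).matcher(src).find();
    }

    public static void main(String[] args) {
        String typed = "\\t"; // two characters, backslash and t, as typed into a text field
        String src = "This\tis\ta\ttest";
        System.out.println(finds(typed, 0, src));               // regex escape: matches the tab
        System.out.println(finds(typed, Pattern.LITERAL, src)); // literal backslash-t: no match
        System.out.println(finds("[" + typed + "]", 0, src));   // escape inside a class: matches the tab
    }
}
```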
My experience is that Java Swing JTextField and JTable GUI controls escape user-entered backslashes by prefixing a backslash.
User types two-character sequence "backslash t", control's getText() method returns a String containing the three-character sequence "backslash backslash t". The SO formatter does its own thing with backslashes in text so here it is as code:
Single backslash: input is 2 char sequence \t and return value is 3 char \\t
For three-character input sequence "backsl backsl t", getText() returns the five-character sequence "backsl backsl backsl backsl t". As code:
Double backslash: input is 3 char sequence \\t and return value is 5 char \\\\t
This basically prevents the backslash from modifying the t to yield a character sequence that becomes a tab when interpreted by something like System.out.println.
Conveniently, and surprisingly to me, the regex processor accepts it either way. A pattern containing an actual tab character (the Java literal "\t") matches a tab, as does the two-character sequence backslash t (the Java literal "\\t"). Please see demo code below. The System.out calls show which sequences and patterns contain tabs, and in JDK 1.7 both matches yield true.
package my.text;
/**
* Demonstrate use of tab character in regexes
*/
public class RegexForSo {
public static void main(String [] argv) {
final String sequenceTab="x\ty\tz";
final String patternBsTab = "x\t.*";
final String patternBsBsTab = "x\\t.*";
System.out.println("sequence is >" + sequenceTab + "<");
System.out.println("pattern BsTab is >" + patternBsTab + "<");
System.out.println("pattern BsBsTab is >" + patternBsBsTab + "<");
System.out.println("matched BsTab = " + sequenceTab.matches(patternBsTab));
System.out.println("matched BsBsTab = " + sequenceTab.matches(patternBsBsTab));
}
}
Output on my JDK1.7 system is below, tabs in output might not survive SO formatter :)
sequence is >x y z<
pattern BsTab is >x .*<
pattern BsBsTab is >x\t.*<
matched BsTab = true
matched BsBsTab = true
HTH
