I have a special kind of string in Java that consists of runs of zeros with short sequences of other characters between them, like these:
"0000000000TT0000TU0000U0"
"0000000000TL"
"0000000000TL0000TM"
I want to count the number of sequences of non-zero characters.
For example:
"0000000000TT0000TU0000U0" will return 3
"0000000000TL" will return 1
"0000000000TL0000TM" will return 2
"000000" will return 0.
Is there a short and easy way to do it (maybe some built-in Java String option or a regex of some kind)?
Thanks
Use a negated character class to match each run of characters other than 0.
Matcher m = Pattern.compile("[^0]+").matcher(s);
int i = 0;
while(m.find()) {
i = i + 1;
}
System.out.println("Total count " + i);
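For reference, the same loop can be wrapped in a small helper and checked against the inputs from the question. This is only a sketch: the class name NonZeroRuns and the method name count are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
final class NonZeroRuns {
    // Compiled once; each match is one maximal run of characters other than '0'.
    private static final Pattern NON_ZERO_RUN = Pattern.compile("[^0]+");

    // Counts the runs of non-zero characters in s.
    static int count(String s) {
        Matcher m = NON_ZERO_RUN.matcher(s);
        int runs = 0;
        while (m.find()) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(count("0000000000TT0000TU0000U0")); // 3
        System.out.println(count("0000000000TL"));             // 1
        System.out.println(count("0000000000TL0000TM"));       // 2
        System.out.println(count("000000"));                   // 0
    }
}
```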
This question is similar to my previous question, Split a string contain dash and minus sign. I asked that one badly, so it ended up with slightly different semantics and people answered it from that perspective. Rather than modifying that question, I thought it better to ask a new one.
I have to split a string that contains both the hyphen-minus character and a minus sign. I tried splitting on the Unicode hyphen character (https://en.wikipedia.org/wiki/Hyphen#Unicode), but the minus sign is still treated the same as the hyphen-minus character. Is there a way I can solve this?
Expected output
(coun)
(US)
-1
Actual output
(coun)
(US)
// actually blank line will print here but SO editor squeezing the blank line
1
public static void main(String[] args) {
char dash = '-';
int i = -1;
String a = "(country)" + dash + "(US)" + dash + i;
Pattern p = Pattern.compile("-", Pattern.LITERAL);
String[] m = p.split(a);
for (String s : m) {
System.out.println(s);
}
}
char dash = '\u2010'; // 2010 is hyphen, 002D is hyphen-minus
int i = -1;
String a = "(country)" + dash + "(US)" + dash + i;
Pattern p = Pattern.compile("\u2010", Pattern.LITERAL);
String[] m = p.split(a);
for (String s : m) {
System.out.println(s);
}
The string representation of an integer always uses the hyphen-minus as the negative sign:
From Integer.toString:
If the first argument is negative, the first element of the result is the ASCII minus character '-' ('\u002D'). If the first argument is not negative, no sign character appears in the result.
So in the end your string has 3 hyphen-minus characters. That's why split can't distinguish between them.
Since you can't change the string representation of an integer, you need to change the dash variable to store a hyphen instead of hyphen-minus. Now there are 2 hyphens and 1 hyphen-minus in your string, making split able to distinguish between them.
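If the delimiter has to stay a hyphen-minus, a different technique from the one described above is to split only on a hyphen-minus that is not followed by a digit, using a negative lookahead. This is a rough sketch under a loud assumption: no field ever starts with a digit, so the only hyphen-minus followed by a digit is a sign. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original answer.
final class DashSplit {
    // ASSUMPTION: a delimiter is never directly followed by a digit,
    // so "-" followed by a digit is always a negative sign.
    private static final Pattern DELIMITER = Pattern.compile("-(?!\\d)");

    static String[] split(String s) {
        return DELIMITER.split(s);
    }

    public static void main(String[] args) {
        char dash = '-';
        int i = -1;
        String a = "(country)" + dash + "(US)" + dash + i; // "(country)-(US)--1"
        System.out.println(Arrays.toString(split(a)));     // [(country), (US), -1]
    }
}
```

Note that a positive number right after a delimiter (for example "(US)-5") would not be split under this rule, which is why switching the delimiter to a real hyphen is the more robust choice.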
I need to mask a phone number. It may consist of digits, + (for the country code) and dashes. The country code may consist of 1 or more digits. I have created this regular expression to mask all the digits except the last 4:
inputPhoneNum.replaceAll("\\d(?=\\d{4})", "*");
For such input: +13334445678
I get this result: +*******5678
However, it doesn't work for such input: +1-333-444-5678
In particular, it returns just the same number without any change. While the desired output is masking all the digits except for the last 4, plus sign and dashes.
That is why I was wondering how I can change my regular expression to include dashes? I would be grateful for any help!
Use this regex for searching:
.(?=.{4})
Difference is that . will match any character not just a digit as in your regex.
Java code:
inputPhoneNum = inputPhoneNum.replaceAll(".(?=.{4})", "*");
However, if your intent is to mask all digits before the last 4 digits, then use:
\d(?=(?:\D*\d){4})
Or in Java:
inputPhoneNum = inputPhoneNum.replaceAll("\\d(?=(?:\\D*\\d){4})", "*");
(?=(?:\\D*\\d){4}) is a positive lookahead that asserts presence of at least 4 digits ahead that may be separated by 0 or more non-digits.
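As a quick check of that lookahead, here is a minimal sketch run against both inputs from the question. The class name PhoneMask and the method name are invented for the example.

```java
// Hypothetical wrapper around the regex from the answer above.
final class PhoneMask {
    // Replaces every digit that still has at least 4 digits after it
    // (possibly separated by non-digits) with '*'.
    static String maskAllButLastFourDigits(String phone) {
        return phone.replaceAll("\\d(?=(?:\\D*\\d){4})", "*");
    }

    public static void main(String[] args) {
        System.out.println(maskAllButLastFourDigits("+13334445678"));    // +*******5678
        System.out.println(maskAllButLastFourDigits("+1-333-444-5678")); // +*-***-***-5678
    }
}
```

The plus sign and the dashes are left alone because only \d is consumed; the non-digits are matched solely inside the lookahead.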
I'm not good at regex, but I think you should normalize the phone number by getting rid of the - occurrences:
inputPhoneNum = inputPhoneNum.replace("-","").replaceAll("\\d(?=\\d{4})", "*");
Try using two replacements: first replace everything that is not a digit or + with an empty string, then apply your regex:
"+1-333-444-5678".replaceAll("[^\\d\\+]", "").replaceAll("\\d(?=\\d{4})", "*");
Output
+*******5678
I think this should work
".*\\d(?=\\d{4})","*"
You can try building it by trial and error using an online regex tester.
If you don't want to use regex, an alternative is to loop through the String from end to start with a StringBuilder, appending the last 4 digits unchanged and then * for every digit after that (any non-digit characters are appended as they are), and finally reversing the result.
public static String lastFour(String s) {
StringBuilder lastFour = new StringBuilder();
int check = 0;
for (int i = s.length() - 1; i >= 0; i--) {
if (Character.isDigit(s.charAt(i))) {
check++;
}
if (check <= 4) {
lastFour.append(s.charAt(i));
} else {
lastFour.append(Character.isDigit(s.charAt(i)) ? "*" : s.charAt(i));
}
}
return lastFour.reverse().toString();
}
This is what I used; it may be useful. It just masks some of the digits in the provided number.
/*
* mask mobile number .
*/
public String maskMobileNumber(String mobile) {
final String mask = "*******";
mobile = mobile == null ? mask : mobile;
final int lengthOfMobileNumber = mobile.length();
if (lengthOfMobileNumber > 2) {
final int maskLen = Math.min(Math.max(lengthOfMobileNumber / 2, 2), 6);
final int start = (lengthOfMobileNumber - maskLen) / 2;
return mobile.substring(0, start) + mask.substring(0, maskLen) + mobile.substring(start + maskLen);
}
return mobile;
}
Is there a one liner to replace the while loop?
String formatSpecifier = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";
Pattern pattern = Pattern.compile(formatSpecifier);
Matcher matcher = pattern.matcher("hello %s my name is %s");
// CAN BELOW BE REPLACED WITH A ONE LINER?
int counter = 0;
while (matcher.find()) {
counter++;
}
Personally I don't see any reason to aim for a one-liner, given that the original code is already easy to understand. Anyway, here are several ways if you insist:
1. Make a helper method
Write something like this:
static int countMatches(Matcher matcher) {
int counter = 0;
while (matcher.find())
counter++;
return counter;
}
so your code becomes
int counter = countMatches(matcher);
2. Java 9
Matcher in Java 9 provides results(), which returns a Stream<MatchResult>. Since count() on a stream returns a long, your code becomes
long counter = matcher.results().count();
3. String Replace
Similar to what the other answer suggests.
Here I replace each match with the null character (which almost never appears in a normal string) and do the counting with split.
Your code becomes:
int counter = matcher.replaceAll("\0").split("\0", -1).length - 1;
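The three options above can be compared side by side. This is only a sketch: it uses a simplified stand-in pattern (%[a-zA-Z%]) rather than the full format-specifier regex from the question, the class and method names are hypothetical, and withStream needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class; names are for illustration only.
final class MatchCount {
    // ASSUMPTION: simplified stand-in for the question's format-specifier regex.
    private static final Pattern SPEC = Pattern.compile("%[a-zA-Z%]");

    // Option 1: the plain loop, wrapped in a helper.
    static int withLoop(String s) {
        Matcher m = SPEC.matcher(s);
        int counter = 0;
        while (m.find()) {
            counter++;
        }
        return counter;
    }

    // Option 2: Java 9+ stream of match results; count() returns a long.
    static long withStream(String s) {
        return SPEC.matcher(s).results().count();
    }

    // Option 3: replace each match with NUL, then count the pieces.
    // Gives a wrong answer if the input itself already contains NUL.
    static int withSplit(String s) {
        return SPEC.matcher(s).replaceAll("\0").split("\0", -1).length - 1;
    }

    public static void main(String[] args) {
        String input = "hello %s my name is %s";
        System.out.println(withLoop(input));   // 2
        System.out.println(withStream(input)); // 2
        System.out.println(withSplit(input));  // 2
    }
}
```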
Yes: replace any occurrence by a char that can be neither in the pattern nor in the string to match, then count the number of occurrences of this char.
Here I choose X, for the example to be simple. You should use a char not so often used (see UTF-8 special chars for instance).
final int counter = pattern.matcher("hello %s my name is %s").replaceAll("X").replaceAll("[^X]", "").length();
Value computed for counter is 2 with your example.
So I have something like this
System.out.println(some_string.indexOf("\\s+"));
this gives me -1
but when I do with specific value like \t or space
System.out.println(some_string.indexOf("\t"));
I get the correct index.
Is there any way I can get the index of the first occurrence of whitespace without using split, as my string is very long.
PS - if it helps, here is my requirement: I want the first number in the string, which is separated from the rest of the string by a tab or space, and I am trying to avoid split("\\s+")[0]. The string starts with that number and has a space or tab right after the number ends.
The point is: indexOf() takes a char, or a string; but not a regular expression.
Thus:
String input = "a\tb";
System.out.println(input);
System.out.println(input.indexOf('\t'));
prints 1 because there is a TAB char at index 1.
System.out.println(input.indexOf("\\s+"));
prints -1 because there is no substring \\s+ in your input value.
In other words: if you want to use the power of regular expressions, you can't use indexOf(). You would rather be looking at String.matches(), for example. But of course, that gives a boolean result, not an index.
If you intend to find the index of the first whitespace, you have to iterate the chars manually, like:
for (int index = 0; index < input.length(); index++) {
if (Character.isWhitespace(input.charAt(index))) {
return index;
}
}
return -1;
Something of this sort might help? Though there are better ways to do this.
class Sample{
public static void main(String[] args) {
String s = "1110 001";
int index = -1;
for(int i = 0; i < s.length(); i++ ){
if(Character.isWhitespace(s.charAt(i))){
index = i;
break;
}
}
System.out.println("Required Index : " + index);
}
}
Well, to find with a regular expression, you'll need to use the regular expression classes.
Pattern pat = Pattern.compile("\\s");
Matcher m = pat.matcher(s);
if ( m.find() ) {
System.out.println( "Found \\s at " + m.start());
}
The find method of the Matcher class locates the pattern in the string for which the matcher was created. If it succeeds, the start() method gives you the index of the first character of the match.
Note that you can compile the pattern only once (even create a constant). You just have to create a Matcher for every string.
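A minimal sketch of that last point, with the pattern held in a constant and a fresh Matcher per call. The class name and both method names are hypothetical; leadingToken is only there to show how the index covers the requirement from the question (the first number before a tab or space).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative, not from the answers above.
final class FirstWhitespace {
    // Compiled once and reused for every input string.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Returns the index of the first whitespace character, or -1 if there is none.
    static int indexOfFirstWhitespace(String s) {
        Matcher m = WHITESPACE.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // Returns everything before the first whitespace, without splitting the whole string.
    static String leadingToken(String s) {
        int end = indexOfFirstWhitespace(s);
        return end < 0 ? s : s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(indexOfFirstWhitespace("a\tb"));     // 1
        System.out.println(indexOfFirstWhitespace("1110 001")); // 4
        System.out.println(leadingToken("42\trest of line"));   // 42
    }
}
```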
I have inputs like
AS23456SDE
MFD324FR
I need to get the leading characters, like
AS, MFD
The prefix is not always the first two or first three characters, since the input can change. I need to get the characters that come before the first number.
Thank you.
Edit : This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}
Here is a nice one line solution. It uses a regex to match the first non numeric characters in the string, and then replaces the input string with this match.
public String getFirstLetters(String input) {
return new String("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
.substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)
A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}
Use the following function to get required output
public String getFirstChars(String str) {
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        // Stop at the first ASCII digit; everything before it is the prefix.
        if (c >= '0' && c <= '9') {
            break;
        }
        result.append(c);
    }
    return result.toString();
}
Pass your string as the argument.
I think this can be done with a simple regex that matches digits, combined with Java's String.split. This approach should be more efficient than the ones using more complicated regexes.
Something as below will work
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used "[\\d]+" is escaped for java already.
What it does?
It matches one or more digits (\d). By default, Java's \d matches only the ASCII digits 0-9; it matches digits from other scripts only when the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and separates the string into string arrays where ever a match is found. The [0] at the end selects the first result from that array, since you wanted the starting chars.
What is the second parameter to .split(regex,int) which I supplied as 2?
This is the limit parameter. With a limit of 2, the regex is applied until 1 match is found; after that, the rest of the string is not split any further.
From the String Javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This will be efficient if your string is huge.
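To see what the limit does in practice, here is a small sketch. The class name SplitLimit and the method prefixBeforeDigits are made up for the example, and it uses [0-9]+ so that only ASCII digits count.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original answer.
final class SplitLimit {
    // Returns the text before the first run of ASCII digits.
    // With limit 2 the pattern is applied at most once.
    static String prefixBeforeDigits(String s) {
        return s.split("[0-9]+", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeDigits("ABC345."));    // ABC
        System.out.println(prefixBeforeDigits("AS23456SDE")); // AS
        System.out.println(prefixBeforeDigits("MFD324FR"));   // MFD

        // Same input, different limits:
        System.out.println(Arrays.toString("a1b2c3".split("[0-9]+", 2)));  // [a, b2c3]
        System.out.println(Arrays.toString("a1b2c3".split("[0-9]+")));     // [a, b, c]
        System.out.println(Arrays.toString("a1b2c3".split("[0-9]+", -1))); // [a, b, c, ]
    }
}
```

If the input starts with a digit, the first array element is an empty string, so the prefix comes back empty.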
An explicit alternative that matches only the ASCII digits 0-9, regardless of any flags:
"[0-9]+"
public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (Character i : testString.toCharArray()) {
if (Character.isDigit(i))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
This prints the characters that come before the first digit.