I have a string which looks like this:
"m 535.71429,742.3622 55.71428,157.14286 c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z"
and I want the Java Scanner to output the following strings: "m", "535.71429", "742.3622", "55.71428", "157.14286", "c", ...
so everything separated by a comma or a space, but I am having trouble getting it to work.
This is what my code looks like:
Scanner scanner = new Scanner(path_string);
scanner.useDelimiter(",||//s");
String s = scanner.next();
if (s.equals("m")){
s = scanner.next();
point[0] = Float.parseFloat(s);
s = scanner.next();
point[1] = Float.parseFloat(s);
....
but the strings that come out are: "m", " ", "5", "3", ...
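I think the trouble is with the pattern itself: the empty alternative between the two | matches between every pair of characters, and //s is just two slashes followed by an s, not the whitespace class \s. You have to use this pattern: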
scanner.useDelimiter("(,|\\s)");
Regex patterns:
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
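The backslash is doubled because \ is the escape character in a Java string literal, so "\\s" produces the regex \s; | is not special in a string literal and needs no escaping.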
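As a minimal sketch of the fix, the snippet below tokenizes the path string from the question. The class name PathTokens and the helper tokenize are made up for illustration, and the delimiter [,\\s]+ is a small variation on the pattern above (an assumption, not part of the original answer) so that a run of several commas or spaces still counts as one separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PathTokens {
    // Splits the input on runs of commas and/or whitespace.
    static List<String> tokenize(String path) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            scanner.useDelimiter("[,\\s]+");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String path = "m 535.71429,742.3622 55.71428,157.14286 "
                + "c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z";
        System.out.println(tokenize(path));
    }
}
```

Each command letter and each coordinate comes back as its own token, so the caller can decide how to treat "m", "c" and "z" before parsing the numbers.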
If you want the output to be strings, Float.parseFloat(s) is of no use for your problem. Is your array a float array?
Because if it is, you should not get any output but a NumberFormatException, because the string "m" cannot be parsed into a float.
Furthermore, to solve the problem of the single values, you could use a StringBuilder which constructs your numbers and ignores the letters and commas. The letters need special handling of their own.
Finally, if it is not absolutely necessary, use double instead of float. It's just so much safer and might save you from some more problems within your program!
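One way to act on this advice, sketched under my own assumptions: split the string first, skip the single-letter commands, and parse everything else with Double.parseDouble. The class PathNumbers and the method numbers are hypothetical names, and this uses String.split rather than the StringBuilder approach mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class PathNumbers {
    // Collects every numeric token as a double and skips single-letter commands.
    static List<Double> numbers(String path) {
        List<Double> values = new ArrayList<>();
        for (String token : path.trim().split("[,\\s]+")) {
            if (token.isEmpty()) {
                continue; // happens only for empty input
            }
            boolean isCommand = token.length() == 1 && Character.isLetter(token.charAt(0));
            if (!isCommand) {
                values.add(Double.parseDouble(token));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(numbers("m 1.5,2 c -3,4 z"));
    }
}
```

A token that is neither a single letter nor a valid number still throws NumberFormatException, which is probably what you want for malformed input.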
I am trying to mask the CC number, in a way that third character and last three characters are unmasked.
For example, 7108898787654351 becomes **0**********351.
I have tried (?<=.{3}).(?=.*...). It leaves the last three characters unmasked, but it also leaves the first three unmasked.
Can you give me some pointers on how to unmask the 3rd character alone?
You can use this regex with a lookahead and lookbehind:
str = str.replaceAll("(?<!^..).(?=.{3})", "*");
//=> **0**********351
RegEx Details:
(?<!^..): Negative lookbehind to assert that we don't have exactly 2 characters between the start of the string and the current position (to exclude the 3rd character from matching)
.: Match a character
(?=.{3}): Positive lookahead to assert that we have at least 3 characters ahead
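To see the lookbehind and lookahead working together, here is a small runnable sketch. The class CardMask and the method mask are illustrative names only; the regex is the one from this answer.

```java
class CardMask {
    // Masks every character except the 3rd one and the last three.
    static String mask(String cc) {
        return cc.replaceAll("(?<!^..).(?=.{3})", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351")); // **0**********351
    }
}
```

Because the pattern counts characters rather than digits, any separators in the input are masked as well.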
I would suggest that regex isn't the only way to do this.
char[] m = new char[16]; // Or whatever length.
Arrays.fill(m, '*');
m[2] = cc.charAt(2);
m[13] = cc.charAt(13);
m[14] = cc.charAt(14);
m[15] = cc.charAt(15);
String masked = new String(m);
It might be more verbose, but it's a heck of a lot more readable (and debuggable) than a regex.
Here is another regular expression:
(?!(?:\D*\d){14}$|(?:\D*\d){1,3}$)\d
It may seem a bit unwieldy but since a credit card should have 16 digits I opted to use negative lookaheads to look for an x amount of non-digits followed by a digit.
(?! - Negative lookahead
(?: - Open 1st non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){14} - Close 1st non capture group and match it 14 times.
$ - End string anchor.
| - Alternation/OR.
(?: - Open 2nd non capture group.
\D*\d - Match zero or more non-digits and a single digit.
){1,3} - Close 2nd non capture group and match it 1 to 3 times.
$ - End string anchor.
) - Close negative lookahead.
\d - Match a single digit.
This would now mask any digit other than the third and last three regardless of their position (due to delimiters) in the formatted CC-number.
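A short sketch of how this pattern could be applied in Java, assuming a 16-digit number as stated above. FormattedCardMask and mask are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

class FormattedCardMask {
    // A digit is left alone when exactly 14 digits (itself included) remain
    // up to the end of the string, or when only 1 to 3 digits remain.
    private static final Pattern MASKED_DIGIT =
            Pattern.compile("(?!(?:\\D*\\d){14}$|(?:\\D*\\d){1,3}$)\\d");

    static String mask(String cc) {
        return MASKED_DIGIT.matcher(cc).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(mask("7108898787654351"));    // **0**********351
        System.out.println(mask("7108-8987-8765-4351")); // **0*-****-****-*351
    }
}
```

Only digits are replaced, so the dashes of a formatted number survive.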
Allowing for dashes anywhere after the first 3 digits, leave the 3rd digit unmatched and make sure that there are always 3 digits at the end of the string:
(?<!^\d{2})\d(?=[\d-]*\d-?\d-?\d$)
Explanation
(?<! Negative lookbehind, assert what is on the left is not
^\d{2} Match 2 digits from the start of the string
) Close lookbehind
\d Match a digit
(?= Positive lookahead, assert what is on the right is
[\d-]* 0+ occurrences of either - or a digit
\d-?\d-?\d Match 3 digits with optional hyphens
$ End of string
) Close lookahead
Example code
String regex = "(?<!^\\d{2})\\d(?=[\\d-]*\\d-?\\d-?\\d$)";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
String strings[] = { "7108898787654351", "7108-8987-8765-4351"};
for (String s : strings) {
Matcher matcher = pattern.matcher(s);
System.out.println(matcher.replaceAll("*"));
}
Output
**0**********351
**0*-****-****-*351
I don't think you need a regex to do what you want. You could use a StringBuilder to create the required string:
String str = "7108-8987-8765-4351";
StringBuilder sb = new StringBuilder("*".repeat(str.length()));
for (int i = 0; i < str.length(); i++) {
if (i == 2 || i >= str.length() - 3) {
sb.replace(i, i + 1, String.valueOf(str.charAt(i)));
}
}
System.out.print(sb.toString()); // output: **0*************351
You may add a ^.{0,1} alternative to allow matching . when it is the first or second char in the string:
String s = "7108898787654351"; // **0**********351
System.out.println(s.replaceAll("(?<=.{3}|^.{0,1}).(?=.*...)", "*"));
// => **0**********351
The regex can be written as a PCRE compliant pattern, too: (?<=.{3}|^|^.).(?=.*...).
It is equal to
System.out.println(s.replaceAll("(?<!^..).(?=.*...)", "*"));
Regex details
(?<=.{3}|^.{0,1}) - there must be any three chars other than line break chars immediately to the left of the current location, or start of string, or a single char at the start of the string
(?<!^..) - a negative lookbehind that fails the match if the current location is immediately preceded by exactly two chars (other than line break chars) counted from the start of the string
. - any char but a line break char
(?=.*...) - there must be at least three chars other than line break chars to the right of the current location.
If the CC number always has 16 digits, as it does in the example, and as do Visa and MasterCard CC's, matches of the following regular expression can be replaced with an asterisk.
\d(?!\d{0,2}$|\d{13}$)
Given a String containing numbers (possibly with decimals), parentheses and any amount of whitespace, I need to iterate through the String and handle each number and parenthesis.
The code below works for the String "1 ( 2 3 ) 4", but does not work if I remove the whitespace between the parentheses and the numbers, as in "1 (2 3) 4".
Scanner scanner = new Scanner(expression);
while (scanner.hasNext()) {
String token = scanner.next();
// handle token ...
System.out.println(token);
}
Scanner uses whitespace as its default delimiter. You can change this to use a different regex pattern, for example:
(?:\\s+)|(?<=[()])|(?=[()])
This pattern splits on one or more whitespace characters, and also at the zero-width positions immediately before and after a left or right parenthesis. As a result the parentheses are kept as tokens of their own (as I think you want to include those in your parsing?), while the whitespace is discarded.
Here is an example of using this:
String test = "123(3 4)56(7)";
Scanner scanner = new Scanner(test);
scanner.useDelimiter("(?:\\s+)|(?<=[()])|(?=[()])");
while(scanner.hasNext()) {
System.out.println(scanner.next());
}
Output:
123
(
3
4
)
56
(
7
)
Detailed Regex Explanation:
(?:\\s+)|(?<=[()])|(?=[()])
1st Alternative: (?:\\s+)
(?:\\s+) Non-capturing group
\\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
2nd Alternative: (?<=[()])
(?<=[()]) Positive Lookbehind - Assert that the regex below can be matched
[()] match a single character present in the list below
() a single character in the list () literally
3rd Alternative: (?=[()])
(?=[()]) Positive Lookahead - Assert that the regex below can be matched
[()] match a single character present in the list below
() a single character in the list () literally
Scanner's .next() method uses whitespace as its delimiter. Luckily, we can change the delimiter!
For example, if you need the scanner to treat both whitespace and parentheses as delimiters, you could run this code immediately after constructing your Scanner (note that the parentheses are then discarded rather than returned as tokens):
scanner.useDelimiter("[\\s()]+");
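For comparison, the sketch below runs the same input through two delimiters: a plain character class [\\s()]+ (my own illustrative choice), which drops the parentheses, and the lookaround pattern from the earlier answer, which keeps them. DelimiterComparison and tokens are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterComparison {
    // Reads all tokens from the input using the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String expression = "1 (2 3) 4";
        // Parentheses are consumed as part of the delimiter.
        System.out.println(tokens(expression, "[\\s()]+"));
        // Parentheses survive as separate tokens.
        System.out.println(tokens(expression, "(?:\\s+)|(?<=[()])|(?=[()])"));
    }
}
```

Which one to use depends on whether the parser needs to see the parentheses; for evaluating a nested expression it usually does.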
I am very new to regular expressions. I need to construct a regular expression which if used in the code below would produce a newLine that had only letters (upper and lowercase), numbers, #, -, _ and ..
The following expression does not work: ([^\\w][^#][^-][^_][^\\.]). It replaces some of the letters and not all of the unwanted characters. Why doesn't it work?
String line = in.nextLine();
String newLine = line.replaceAll( "([^\\w][^#][^-][^_][^\\.])", " ");
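Your pattern is a sequence of five separate negated classes, so it only matches five consecutive characters where the first is not a word character, the second is not #, the third is not -, and so on. Put every allowed character into a single negated class instead (\w already covers letters, digits and _, and a - placed last in the class is a literal hyphen):
String newLine = line.replaceAll("[^\\w#.-]", " ");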
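A runnable sketch of the single negated class idea, keeping the question's replacement string of one space. The class KeepAllowedChars and the method clean are names invented for this example.

```java
class KeepAllowedChars {
    // Replaces every character that is not a letter, digit, _, #, . or - with a space.
    static String clean(String line) {
        return line.replaceAll("[^\\w#.-]", " ");
    }

    public static void main(String[] args) {
        System.out.println(clean("a*b c#d-e_f.g!"));
    }
}
```

Note that \w in Java matches only ASCII letters and digits by default, so accented letters would also be replaced.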
I have the following string:
String str = "Klaße, STRAßE, FUß";
Using a chain of regex replacements, I want to replace the German letter ß with ss or SS, depending on the case of the surrounding word. To do this I have:
String replaceUml = str
.replaceAll("ß", "ss")
.replaceAll("A-Z|ss$", "SS")
.replaceAll("^(?=^A-Z)(?=.*A-Z$)(?=.*ss).*$", "SS");
Expected result:
Klasse, STRASSE, FUSS
Actual result:
Klasse, STRAssE, FUSS
Where am I going wrong?
First of all, if you're trying to match some character in the range A-Z, you need to put it in square brackets. This
.replaceAll("A-Z|ss$", "SS")
will look for the three characters A-Z in the source, which isn't what you want. Second, I think you're confused about what | means. If you say this:
.replaceAll("[A-Z]|ss$", "SS")
it will replace every upper-case letter, as well as any ss at the end of the string, with SS, because | means look for this or that.
A third problem with your approach is that the second and third replaceAll's will look for any ss that was in the original string, even if it didn't come from a ß. This may or may not be what you want.
Here's what I'd do:
String replaceUml = str
.replaceAll("(?<=[A-Z])ß", "SS")
.replaceAll("ß", "ss");
This will first replace all ß by SS if the character before the ß is an upper-case letter; then if there are any ß's left over, they get replaced by ss. Actually, this won't work if the character before ß is an umlaut like Ä, so you probably should change this to
String replaceUml = str
.replaceAll("(?<=[A-ZÄÖÜ])ß", "SS")
.replaceAll("ß", "ss");
(There may be a better way to specify an "upper-case Unicode letter"; I'll look for it.)
EDIT:
String replaceUml = str
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replaceAll("ß", "ss");
A problem is that it won't work if ß is the second character in the text, and the first letter of the word is upper-cased but the rest of the word isn't. In that case you probably want lower-case "ss".
String replaceUml = str
.replaceAll("(?<=\\b\\p{Lu})ß(?=\\P{Lu})", "ss")
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replaceAll("ß", "ss");
Now the first one will replace ß by ss if it's preceded by an upper-case letter that is the first letter of the word but followed by a character that isn't an upper-case letter. \P{Lu} with an upper-case P will match any character other than an upper-case letter (it's the negative of \p{Lu} with a lower-case p). I also included \b to test for the first character of a word.
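Put together as a runnable sketch, the three-step chain looks like this. EszettReplacer and replaceEszett are hypothetical names, and ß is written as the escape \u00DF only so that the example does not depend on the source file encoding.

```java
class EszettReplacer {
    static String replaceEszett(String text) {
        return text
                // ß right after a word-initial capital and before a non-capital: lower-case ss
                .replaceAll("(?<=\\b\\p{Lu})\u00DF(?=\\P{Lu})", "ss")
                // any remaining ß after an upper-case letter: SS
                .replaceAll("(?<=\\p{Lu})\u00DF", "SS")
                // everything else: ss
                .replace("\u00DF", "ss");
    }

    public static void main(String[] args) {
        // Input is "Klaße, STRAßE, FUß"
        System.out.println(replaceEszett("Kla\u00DFe, STRA\u00DFE, FU\u00DF")); // Klasse, STRASSE, FUSS
    }
}
```

The order of the calls matters: the most specific case has to run first, because each later step consumes whatever ß is left.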
String replaceUml = str
.replaceAll("(?<=\\p{Lu})ß", "SS")
.replace("ß", "ss");
This uses a regex lookbehind for a preceding Unicode upper-case letter (as in "SÜß") to produce capital "SS".
The (?<= ... ) is a look-behind, a kind of context matching. You could also do
.replaceAll("(\\p{Lu})ß", "$1SS")
as ß will not occur at the beginning of a word.
Your main trouble was not using brackets [A-Z].
Breaking your regex into parts:
Regex
/ß/g
Description
ß Literal ß
g modifier: global. All matches (don't return on first match)
Regex
/([A-Z])ss$/g
Description
1st Capturing group ([A-Z])
Char class [A-Z] matches:
A-Z A character range between Literal A and Literal Z
ss Literal ss
$ End of string
g modifier: global. All matches (don't return on first match)
Regex
/([A-Z]+)ss([A-Z]+)/g
Description
1st Capturing group ([A-Z]+)
Char class [A-Z] 1 to infinite times [greedy] matches:
A-Z A character range between Literal A and Literal Z
ss Literal ss
2nd Capturing group ([A-Z]+)
Char class [A-Z] 1 to infinite times [greedy] matches:
A-Z A character range between Literal A and Literal Z
g modifier: global. All matches (don't return on first match)
Specifically for you
String replaceUml = str
.replaceAll("ß", "ss")
.replaceAll("([A-Z])ss$", "$1SS")
.replaceAll("([A-Z]+)ss([A-Z]+)", "$1SS$2");
Use String.replaceFirst() instead of String.replaceAll().
replaceAll("ß", "ss")
This will replace all the occurrences of "ß". Hence the output after this statement becomes something like this :
Klasse, STRAssE, FUss
Now replaceAll("A-Z|ss$", "SS") replaces the last occurrence of "ss" with "SS", hence your final result looks like this :
Klasse, STRAssE, FUSS
To get your expected result for this particular input, try this (it only works because the first ß is the one that needs lower-case ss):
String replaceUml = str.replaceFirst("ß", "ss").replaceAll("ß", "SS");
I need to split a text and get only words, numbers and hyphenated compound words. I also need to get Latin words with accented letters, so I used \p{L}, which gives me é, ú, ü, ã, and so forth. The example is:
String myText = "Some latin text with symbols, ? 987 (A la pointe sud-est de l'île se dresse la cathédrale Notre-Dame qui fut lors de son achèvement en 1330 l'une des plus grandes cathédrales d'occident) : ! # # $ % ^& * ( ) + - _ #$% \" ' : ; > < / \\ | , here some is wrong… * + () e -";
Pattern pattern = Pattern.compile("[^\\p{L}+(\\-\\p{L}+)*\\d]+");
String words[] = pattern.split( myText );
What is wrong with this regex? Why does it match symbols like "(", "+", "-", "*" and "|"?
Some of the results are:
dresse // OK
sud-est // OK
occident) // WRONG
987 // OK
() // WRONG
(a // WRONG
* // WRONG
- // WRONG
+ // WRONG
( // WRONG
| // WRONG
The regex explanation is:
[^\p{L}+(\-\p{L}+)*\d]+
* Word separator will be:
* [^ ... ] No sequence in:
* \p{L}+ Any latin letter
* (\-\p{L}+)* Optionally hyphenated
* \d or numbers
* [ ... ]+ once or more.
If my understanding of your requirement is correct, this regex will match what you want:
"\\p{IsLatin}+(?:-\\p{IsLatin}+)*|\\d+"
It will match:
A contiguous sequence of Unicode Latin script characters. I restrict it to Latin script, since \p{L} will match letters in any script. Change \\p{IsLatin} to \\pL if your version of Java doesn't support the syntax.
Or several such sequences, hyphenated
Or a contiguous sequence of decimal digits (0-9)
The regex above is meant to be passed to Pattern.compile; then call matcher(String input) to obtain a Matcher object and loop over find() to collect the matches.
Pattern pattern = Pattern.compile("\\p{IsLatin}+(?:-\\p{IsLatin}+)*|\\d+");
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
System.out.println(matcher.group());
}
If you want to allow words with apostrophe ':
"\\p{IsLatin}+(?:['\\-]\\p{IsLatin}+)*|\\d+"
I also escape - in the character class ['\\-] just in case you want to add more. Actually - doesn't need escaping if it is the first or last in the character class, but I escape it anyway just to be safe.
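Here is the apostrophe variant as a small self-contained sketch. WordExtractor and words are illustrative names, and the sample text is a shortened piece of the question's input with î written as \u00EE to stay independent of the file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Latin-script words (optionally joined by - or ') or runs of digits.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{IsLatin}+(?:['\\-]\\p{IsLatin}+)*|\\d+");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("sud-est de l'\u00EEle 1330 ( ) + -"));
    }
}
```

Matching the tokens you want is usually easier to get right than describing every possible separator for split.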
If the opening bracket of a character class is followed by a ^, then the characters listed inside the class are not allowed. So your regex matches one or more consecutive characters that are anything except a Unicode letter, +, (, -, ), * or a digit.
Note that characters like +,(,),* etc. don't have any special meaning inside a character class.
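To make this concrete, the sketch below splits a short sample with the pattern from the question. CharClassDemo and split are hypothetical names; the sample string is my own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CharClassDemo {
    // The question's pattern: +, (, ), * and - are literal members of the negated class.
    private static final Pattern SEPARATOR =
            Pattern.compile("[^\\p{L}+(\\-\\p{L}+)*\\d]+");

    static List<String> split(String text) {
        return Arrays.asList(SEPARATOR.split(text));
    }

    public static void main(String[] args) {
        // Only the spaces act as separators; the symbols stay in the tokens.
        System.out.println(split("foo (bar) + baz")); // [foo, (bar), +, baz]
    }
}
```

The parentheses and the plus sign come back as part of the tokens because the negated class explicitly excludes them from the set of separator characters.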
What pattern.split does is split the string wherever the regex matches. Your regex matches whitespace (among other characters not listed in the class), so a split occurs at each run of whitespace, while symbols such as +, (, ) and * stay inside the resulting tokens. That is why you get the result you posted.
For example consider this
Pattern pattern = Pattern.compile("a");
for (String s : pattern.split("sda a f g")) {
System.out.println("==>"+s);
}
Output will be
==>sd
==>
==> f g
Inside a character class [ ], only literal characters, classes (\p{...}), ranges (e.g. a-z) and the complement symbol (^) have a meaning; quantifiers and groups are treated as plain characters there. You have to place the other metacharacters you are using (+, *, (, )) outside the [ ] block.