How do I use a delimiter with Scanner.useDelimiter in Java?


sc = new Scanner(new File(dataFile));
sc.useDelimiter(",|\r\n");
I don't understand how the delimiter works. Can someone explain this in layman's terms?

The scanner can also use delimiters other than whitespace.
Easy example from Scanner API:
String input = "1 fish 2 fish red fish blue fish";
// \\s* means 0 or more repetitions of any whitespace character
// fish is the pattern to find
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt()); // prints: 1
System.out.println(s.nextInt()); // prints: 2
System.out.println(s.next()); // prints: red
System.out.println(s.next()); // prints: blue
// don't forget to close the scanner!!
s.close();
The point is to understand the regular expression (regex) passed to Scanner::useDelimiter: the scanner treats every match of that pattern as a separator between tokens.
If you are new to regular expressions, the notes below summarize the most common building blocks.
Notes
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
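Applied to the pattern from the question, ",|\r\n" reads as "a comma, or a carriage return followed by a line feed", so every comma and every Windows line ending separates two tokens. Here is a minimal sketch of that, assuming a small in-memory string instead of the file from the question (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommaOrCrlfDemo {
    // Collects all tokens produced by the delimiter pattern from the question.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(",|\r\n");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two comma-separated "lines" joined by a Windows line ending.
        System.out.println(tokens("a,b,c\r\n1,2,3")); // prints: [a, b, c, 1, 2, 3]
    }
}
```

Note that a bare \n (Unix line ending) is not part of this pattern, so with such input the last value of one line and the first value of the next would end up in the same token.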

With Scanner the default delimiters are the whitespace characters.
But Scanner can define where a token starts and ends based on a delimiter pattern, which can be specified in two ways:
Using the Scanner method useDelimiter(String pattern), where the string is compiled as a regular expression
Using the Scanner method useDelimiter(Pattern pattern), where Pattern is a compiled regular expression that specifies the delimiter set
So the useDelimiter() methods control how the Scanner input is tokenized, similar to what the StringTokenizer class does with its delimiter characters. Take a look at these tutorials for further information:
Setting Delimiters for Scanner
Java.util.Scanner.useDelimiter() Method
And here is an example:
public static void main(String[] args) {
    // Initialize Scanner object
    Scanner scan = new Scanner("Anna Mills/Female/18");
    // initialize the string delimiter
    scan.useDelimiter("/");
    // Printing the tokenized Strings
    while (scan.hasNext()) {
        System.out.println(scan.next());
    }
    // closing the scanner stream
    scan.close();
}
Prints this output:
Anna Mills
Female
18
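The Pattern overload works the same way. A small sketch, assuming a variant of the record above with optional spaces around the slashes (the class name, method name and the \s*/\s* pattern are my own choices for illustration, not taken from the tutorials):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class PatternDelimiterDemo {
    // A slash with optional whitespace on either side, compiled once and reused.
    private static final Pattern SLASH = Pattern.compile("\\s*/\\s*");

    // Splits one record into its fields using the precompiled delimiter.
    static List<String> fields(String record) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(record)) {
            scan.useDelimiter(SLASH);
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("Anna Mills / Female / 18")); // prints: [Anna Mills, Female, 18]
    }
}
```

Compiling the Pattern once is mainly useful when the same delimiter is applied to many scanners.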

For example:
String myInput = null;
Scanner myscan = new Scanner(System.in).useDelimiter("\\n");
System.out.println("Enter your input: ");
myInput = myscan.next();
System.out.println(myInput);
This will let you use Enter as a delimiter.
Thus, if you input:
Hello world (ENTER)
it will print 'Hello world'.

Related

What does scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?"); do? [duplicate]

On HackerRank this line of code occurs very frequently. I think it is there to skip whitespace, but what does the "\r\u2028\u2029\u0085" part mean?
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");

Scanner.skip skips input that matches the pattern. Here the pattern is:
(\r\n|[\n\r\u2028\u2029\u0085])?
? matches zero or one occurrence of the preceding element, here the whole parenthesized group
| separates alternatives
[] matches a single character from the listed set
\r matches a carriage return (ASCII 13)
\n matches a line feed (newline) character (ASCII 10)
\u2028 matches the character with code point 2028 in base 16 (8232 in base 10, 20050 in base 8): LINE SEPARATOR
\u2029 matches the character with code point 2029 in base 16 (8233 in base 10, 20051 in base 8): PARAGRAPH SEPARATOR
\u0085 matches the character with code point 85 in base 16 (133 in base 10, 205 in base 8): NEXT LINE
1st alternative, \r\n: a carriage return immediately followed by a line feed
2nd alternative, [\n\r\u2028\u2029\u0085]: any single one of the five characters listed above

Skipping \r\n handles Windows line endings.
The rest is standard: \r = CR, \n = LF (see "\r\n, \r, \n: what is the difference between them?")
Then some Unicode special characters:
\u2028 = LINE SEPARATOR (https://www.fileformat.info/info/unicode/char/2028/index.htm)
\u2029 = PARAGRAPH SEPARATOR (http://www.fileformat.info/info/unicode/char/2029/index.htm)
\u0085 = NEXT LINE (https://www.fileformat.info/info/unicode/char/0085/index.htm)

OpenJDK's source code shows that nextLine() uses this regex for line separators:
private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
\r\n is a Windows line ending.
\n is a UNIX line ending.
\r is a Macintosh (pre-OSX) line ending.
\u2028 is LINE SEPARATOR.
\u2029 is PARAGRAPH SEPARATOR.
\u0085 is NEXT LINE (NEL).

The whole thing is a regular expression, so you could simply drop it into https://regexr.com or https://regex101.com/ and it will provide you with a full description of what each part of the regex means.
Here it is for you, though:
(\r\n|[\n\r\u2028\u2029\u0085])? / gm
1st Capturing Group (\r\n|[\n\r\u2028\u2029\u0085])?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
1st Alternative \r\n
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)
2nd Alternative [\n\r\u2028\u2029\u0085]
Match a single character present in the list below
[\n\r\u2028\u2029\u0085]
\n matches a line-feed (newline) character (ASCII 10)
\r matches a carriage return (ASCII 13)
\u2028 matches the character with code point 2028 in base 16 (8232 in base 10, 20050 in base 8) literally (case sensitive)
\u2029 matches the character with code point 2029 in base 16 (8233 in base 10, 20051 in base 8) literally (case sensitive)
\u0085 matches the character with code point 85 in base 16 (133 in base 10, 205 in base 8) literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
As for what scanner.skip does (Scanner Pattern Tutorial):
The java.util.Scanner.skip(Pattern pattern) method skips input that matches the specified pattern, ignoring delimiters. This method will skip input if an anchored match of the specified pattern succeeds. If a match to the specified pattern is not found at the current position, then no input is skipped and a NoSuchElementException is thrown.
Because the pattern here ends in ?, it can also match the empty string, so when no line break is present the call simply skips nothing instead of throwing.
I would also recommend reading Alan Moore's answer to "RegEx in Java: how to deal with newline"; he talks about the new options in Java 1.8.
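To make the behaviour of skip with this pattern concrete, here is a small sketch that reads from a String rather than System.in (the class and method names are invented for the example). It skips one optional line break and then returns the next line:

```java
import java.util.Scanner;

class SkipLineBreakDemo {
    static final String OPTIONAL_BREAK = "(\r\n|[\n\r\u2028\u2029\u0085])?";

    // Skips at most one line terminator, then reads the rest of the current line.
    static String lineAfterSkip(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.skip(OPTIONAL_BREAK);
            return sc.nextLine();
        }
    }

    public static void main(String[] args) {
        // A leading Windows line ending is consumed as one unit.
        System.out.println("[" + lineAfterSkip("\r\nabc") + "]"); // prints: [abc]
        // No line break at the current position: nothing is skipped, nothing is thrown.
        System.out.println("[" + lineAfterSkip("abc") + "]");     // prints: [abc]
        // Only ONE line break is skipped, so the second one ends an empty line.
        System.out.println("[" + lineAfterSkip("\n\nabc") + "]"); // prints: []
    }
}
```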

scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
In Unix and all Unix-like systems, \n is the code for end-of-line, and \r means nothing special.
As a consequence, in C and most languages that somehow copy it (even remotely), \n is the standard escape sequence for end of line (translated to/from OS-specific sequences as needed).
In old Mac systems (pre-OS X), \r was the code for end-of-line instead.
In Windows (and many old OSs), the code for end of line is 2 characters, \r\n, in this order.
As a (surprising ;-) consequence (harking back to OSs much older than Windows), \r\n is the standard line termination for text formats on the Internet.
\u0085 NEXT LINE (NEL)
\u2029 PARAGRAPH SEPARATOR
\u2028 LINE SEPARATOR
The whole logic behind this is to consume the single leftover line break, if there is one, that a previous call such as nextInt() left in the scanner's input. It does not remove spaces.

There's already a similar question here: scanner.skip. It won't skip spaces, since the Unicode character for a space (\u0020) is not in the pattern.
\r = CR (Carriage Return) // Used as a new line character in Mac OS before X
\n = LF (Line Feed) // Used as a new line character in Unix/Mac OS X
\r\n = CR + LF // Used as a new line character in Windows
\u2028 = line separator
\u2029 = paragraph separator
\u0085 = next line

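This skips at most one line break; see \R in the Pattern documentation.
Practically the same could have been done with \R (Java 8 and later), which additionally accepts vertical tab and form feed - sigh.
scanner.skip("\\R?");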
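A quick way to check how close the two patterns are is to run both against each line terminator. This is only a sketch with made-up names; it uses Pattern directly instead of a Scanner, and includes a form feed as one input where the two patterns differ:

```java
import java.util.regex.Pattern;

class LinebreakPatternDemo {
    // The explicit pattern from the question and the shorter linebreak-matcher form.
    static final Pattern EXPLICIT = Pattern.compile("(\r\n|[\n\r\u2028\u2029\u0085])?");
    static final Pattern LINEBREAK = Pattern.compile("\\R?");

    // True when both patterns give the same answer for a full match of s.
    static boolean agree(String s) {
        return EXPLICIT.matcher(s).matches() == LINEBREAK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "\n", "\r", "\r\n", "\u2028", "\u2029", "\u0085", "\f"};
        for (String s : samples) {
            // Print the length instead of the raw control characters.
            System.out.println(s.length() + " char(s), patterns agree: " + agree(s));
        }
    }
}
```

Only the last sample (form feed) should report a disagreement, because the linebreak matcher accepts it and the explicit pattern does not.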

I have a much simpler exercise to explain this
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        int i = 4;
        double d = 4.0;
        String s = "HackerRank ";
        Scanner scan = new Scanner(System.in);
        int a = scan.nextInt();
        double b = scan.nextDouble();
        // nextDouble() stops before the line break, so this reads the (empty) rest of that line
        String c = scan.nextLine();
        System.out.println(c);
        scan.close();
        System.out.println(a + i);
        System.out.println(b + d);
        System.out.println(s.concat(c));
    }
}
Try running this first and look at the output.
After that, try this version:
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        int i = 4;
        double d = 4.0;
        String s = "HackerRank ";
        Scanner scan = new Scanner(System.in);
        int a = scan.nextInt();
        double b = scan.nextDouble();
        // Consume the line break left behind by nextDouble(), if there is one
        scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
        // Now nextLine() returns the line the user typed after the double
        String c = scan.nextLine();
        System.out.println(c);
        scan.close();
        System.out.println(a + i);
        System.out.println(b + d);
        System.out.println(s.concat(c));
    }
}
Now run it again and compare.
This can be a very tricky interview question.
I was cursing myself before I realised the issue.
Just ask any programmer to read an integer, a double and a string, all from user input.
If they don't know this, they will most likely get it wrong.
You can find a much simpler explanation of the behaviour of nextInt() and nextDouble() in their Javadocs.

It has to do with how the Scanner class works.
Let's suppose you have this input from the system console:
4
This is next line
int a =scanner.nextInt();
String s = scanner.nextLine();
The value of a will be read as 4,
and the value of s will be an empty string, because nextLine() just reads what is left on the current line and then moves on to the next line.
To read it correctly, you should call nextLine() one more time, like below:
int a =scanner.nextInt();
scanner.nextLine();
String s = scanner.nextLine();
This ensures that the scanner reaches the next line and skips anything unexpected that is left on the current one.
scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
The line above skips the bare line break for the line endings of every OS and environment.
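The difference can be reproduced without a console. The following sketch feeds the two-line input from above to a Scanner built on a String (class and method names are invented for this example):

```java
import java.util.Scanner;

class LeftoverNewlineDemo {
    // nextInt() followed directly by nextLine(): returns the rest of the first line.
    static String readWithoutSkip(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.nextInt();
            return sc.nextLine();
        }
    }

    // Same, but one optional line break is skipped before nextLine().
    static String readWithSkip(String input) {
        try (Scanner sc = new Scanner(input)) {
            sc.nextInt();
            sc.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
            return sc.nextLine();
        }
    }

    public static void main(String[] args) {
        String input = "4\nThis is next line\n";
        System.out.println("[" + readWithoutSkip(input) + "]"); // prints: []
        System.out.println("[" + readWithSkip(input) + "]");    // prints: [This is next line]
    }
}
```

One caveat: if the number is followed by a space before the line break, the skip pattern matches nothing and nextLine() returns that space, whereas an extra nextLine() call discards the whole remainder of the line.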

Scanning letters and floats using the java scanner

I have a string which looks like this:
"m 535.71429,742.3622 55.71428,157.14286 c 0,0 165.71429,-117.14286 -55.71428,-157.14286 z"
and I want the Java scanner to output the following strings: "m", "535.71429", "742.3622", "55.71428", "157.14286", "c", ...
so everything separated by a comma or a space, but I am having trouble getting it to work.
This is what my code looks like:
Scanner scanner = new Scanner(path_string);
scanner.useDelimiter(",||//s");
String s = scanner.next();
if (s.equals("m")){
s = scanner.next();
point[0] = Float.parseFloat(s);
s = scanner.next();
point[1] = Float.parseFloat(s);
....
but the strings that come out are: "m", " ", "5", "3", ...

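I think the trouble is with //s, which is two literal slashes followed by s instead of the whitespace class \s. The empty alternative in ,|| also matches between every pair of characters, which is why you get single characters. You have to use this pattern: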
scanner.useDelimiter("(,|\\s)");
Regex patterns:
abc… Letters
123… Digits
\d Any Digit
\D Any Non-digit character
. Any Character
\. Period
[abc] Only a, b, or c
[^abc] Not a, b, nor c
[a-z] Characters a to z
[0-9] Numbers 0 to 9
\w Any Alphanumeric character
\W Any Non-alphanumeric character
{m} m Repetitions
{m,n} m to n Repetitions
* Zero or more repetitions
+ One or more repetitions
? Optional character
\s Any Whitespace
\S Any Non-whitespace character
^…$ Starts and ends
(…) Capture Group
(a(bc)) Capture Sub-group
(.*) Capture all
(ab|cd) Matches ab or cd
We use a double \ because the backslash is a special character inside a Java string literal and must itself be escaped, while | needs no escaping there.
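As a sketch of the whole thing on the path string from the question: the version below uses the character class [,\\s]+ instead of (,|\\s), which is my own variation so that a comma followed by a space does not yield an empty token. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PathTokenDemo {
    // Splits an SVG-like path string at runs of commas and whitespace.
    static List<String> tokens(String path) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            scanner.useDelimiter("[,\\s]+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String path = "m 535.71429,742.3622 55.71428,157.14286 c 0,0 "
                + "165.71429,-117.14286 -55.71428,-157.14286 z";
        // Letters and numbers come out as separate string tokens.
        for (String token : tokens(path)) {
            System.out.println(token);
        }
    }
}
```

The numeric tokens can then be passed to Float.parseFloat or Double.parseDouble as in the question.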

If you want the output to be strings, Float.parseFloat(s) is of no use for your problem. Is your array a float array?
Because if it is, you should not get any output but a NumberFormatException, because the string "m" cannot be parsed into a float.
Furthermore, to solve the problem of the single characters, you could use a StringBuilder which assembles your numbers and ignores the letters and commas. The letters would need special handling.
Finally, if it is not absolutely necessary, use double instead of float. It's just so much safer and might save you from some more problems within your program!

How to insert space before capital letter for a String using JAVA?

I have a String "nameOfThe_String". Here 1st letter of the string should be capital. So I have used
String strJobname="nameOfThe_String";
strJobname=strJobname.substring(0,1).toUpperCase()+strJobname.substring(1);
Now, I need to insert the space before uppercase letters. So, I used
strJobname=strJobname.replaceAll("(.)([A-Z])", "$1 $2");
But I need the output to be "Name Of The_String". After '_' I don't need a space, even though S is a capital letter.
How can I do that? Please help me with this.

strJobname=strJobname.replaceAll("([^_])([A-Z])", "$1 $2");
The ^ character as the first character in square brackets means: Not this character. So, with the first bracket group you say: Any character that is not a _.
However, note that your regex might also insert spaces between consecutive capitals.
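To illustrate that caveat, here is a sketch that compares the pattern above with one possible alternative, a lookaround that only inserts a space between a lowercase letter and an uppercase letter. The alternative, the class name and the second sample string are my own assumptions, not part of the question:

```java
class CamelCaseSpacingDemo {
    // The pattern from this answer: any non-underscore character followed by a capital.
    static String withNegatedClass(String s) {
        return s.replaceAll("([^_])([A-Z])", "$1 $2");
    }

    // Alternative sketch: only the position between a lowercase and an uppercase letter.
    static String betweenLowerAndUpper(String s) {
        return s.replaceAll("(?<=[a-z])(?=[A-Z])", " ");
    }

    public static void main(String[] args) {
        System.out.println(withNegatedClass("NameOfThe_String"));      // prints: Name Of The_String
        System.out.println(betweenLowerAndUpper("NameOfThe_String"));  // prints: Name Of The_String
        // Consecutive capitals are where the two differ.
        System.out.println(withNegatedClass("parseHTTPResponse"));     // prints: parse HT TP Response
        System.out.println(betweenLowerAndUpper("parseHTTPResponse")); // prints: parse HTTPResponse
    }
}
```

Neither result is ideal for acronyms, so which one fits depends on the strings you expect.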

With look-arounds you can use:
String strJobname="nameOfThe_String";
strJobname = Character.toUpperCase(strJobname.charAt(0)) +
strJobname.substring(1).replaceAll("(?<!_)(?=[A-Z])", " ");
//=> Name Of The_String

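Here's a different way which may fulfill your requirement.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String input = sc.next();
        StringBuilder text = new StringBuilder(input);
        // A non-underscore character followed by an uppercase letter
        Pattern word = Pattern.compile("([^_])([A-Z])");
        // Match against the unchanged input and shift by the spaces already inserted
        Matcher matcher = word.matcher(input);
        int inserted = 0;
        while (matcher.find()) {
            text.insert(matcher.end() - 1 + inserted, " ");
            inserted++;
        }
        System.out.println(text);
    }
}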

Parse numbers and parentheses from a String?

Given a String containing numbers (possibly with decimals), parentheses and any amount of whitespace, I need to iterate through the String and handle each number and parenthesis.
The below works for the String "1 ( 2 3 ) 4", but does not work if I remove whitespaces between the parentheses and the numbers "1 (2 3) 4)".
Scanner scanner = new Scanner(expression);
while (scanner.hasNext()) {
String token = scanner.next();
// handle token ...
System.out.println(token);
}

Scanner uses whitespace as its default delimiter. You can change this to use a different regex pattern, for example:
(?:\\s+)|(?<=[()])|(?=[()])
This pattern splits the input at runs of whitespace and at the zero-width positions directly before and after each parenthesis. Because the lookarounds consume no characters, the parentheses are kept as tokens of their own (as I think you want to include those in your parsing?), while the whitespace is discarded.
Here is an example of using this:
String test = "123(3 4)56(7)";
Scanner scanner = new Scanner(test);
scanner.useDelimiter("(?:\\s+)|(?<=[()])|(?=[()])");
while(scanner.hasNext()) {
System.out.println(scanner.next());
}
Output:
123
(
3
4
)
56
(
7
)
Detailed Regex Explanation:
(?:\\s+)|(?<=[()])|(?=[()])
1st Alternative: (?:\\s+)
(?:\\s+) Non-capturing group
\\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
2nd Alternative: (?<=[()])
(?<=[()]) Positive Lookbehind - Assert that the regex below can be matched
[()] match a single character present in the list below
() a single character in the list () literally
3rd Alternative: (?=[()])
(?=[()]) Positive Lookahead - Assert that the regex below can be matched
[()] match a single character present in the list below
() a single character in the list () literally

Scanner's .next() method uses whitespace as its delimiter. Luckily, we can change the delimiter!
For example, if you need the scanner to split on both whitespace and parentheses, you could run this code immediately after constructing your Scanner. Note that the parentheses are then treated as separators and are not returned as tokens:
scanner.useDelimiter("[\\s()]+");
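The two kinds of delimiter suggested in these answers behave differently, which a short side-by-side sketch can show. It assumes the input "1 (2.5 3) 4" and uses a character-class delimiter and a lookaround delimiter; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ParenTokenDemo {
    // Runs a Scanner over the expression with the given delimiter regex.
    static List<String> tokens(String expression, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(expression)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String expression = "1 (2.5 3) 4";
        // Parentheses are part of the delimiter, so they are discarded.
        System.out.println(tokens(expression, "[\\s()]+"));
        // prints: [1, 2.5, 3, 4]
        // Zero-width lookarounds split around parentheses and keep them as tokens.
        System.out.println(tokens(expression, "\\s+|(?<=[()])|(?=[()])"));
        // prints: [1, (, 2.5, 3, ), 4]
    }
}
```

If each parenthesis has to be handled, as the question asks, the lookaround form is the one to use.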
