String contains() function with Scanner in Java - java

System.out.println("Enter a string:");
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
if (str.contains("\n")) {
System.out.println("yes");
}
With the code above, typing one\ntwo at the prompt does not print "yes",
but the code below does print "yes":
String str = "one\ntwo";
if (str.contains("\n")) {
System.out.println("yes");
}
Could anyone suggest the reason for such a result?

When you type one\ntwo at the console, \n is read as two characters, \ and n. When you write "\n" in a String literal in source code, it represents a single line feed character.
To check whether your input contains a \ character followed by n, use contains("\\n"). To write a literal \ in a String literal you have to escape it as "\\", because \ is a special character there (it is used, for instance, to write \n, \r, \t, or \").
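A minimal sketch of the difference described above. It feeds the Scanner from a String instead of System.in so it can run without a console; the class and method names (TypedBackslashDemo, hasNewline, hasBackslashN) are made up for illustration.

```java
import java.util.Scanner;

class TypedBackslashDemo {
    // True when the text holds a real line feed character (one char).
    static boolean hasNewline(String s) {
        return s.contains("\n");
    }

    // True when the text holds a backslash followed by the letter n (two chars).
    static boolean hasBackslashN(String s) {
        return s.contains("\\n");
    }

    public static void main(String[] args) {
        // Stand-in for console input: the user types o n e \ n t w o and presses Enter.
        Scanner sc = new Scanner("one\\ntwo\n");
        String typed = sc.nextLine();
        String literal = "one\ntwo";

        System.out.println(typed.length());        // 8
        System.out.println(literal.length());      // 7
        System.out.println(hasNewline(typed));     // false
        System.out.println(hasBackslashN(typed));  // true
        System.out.println(hasNewline(literal));   // true
    }
}
```

The length difference (8 versus 7) is the quickest way to see that the typed text carries two characters where the literal carries one.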

In Java, the \ is the 'escape' character. If you use \ in a String declaration, it is never literally put into the String, but used to escape the character right after it. For instance, you can use it to escape the double quote:
String str = "A double quote: \"";
You can also escape the escape character:
String str = "A backslash : \\";
The escape character is also used in meta-characters like \n. If you want to literally use those in a string, you have to escape them as well:
String str = "A newline character: \\n";
And that last example is effectively what you get when you read the input from System.in: the string holds the literal characters \ and n, not the newline meta-character.
So to summarize: typing \n into System.in is equivalent to assigning the literal "\\n" to a String in source code.

Related

Regex to split a string based on \r characters not a carriage return or a new line

I want a regular expression that splits a string on the literal characters \r, not on a carriage return or a newline.
Below is the sample string I have.
MSH|^~\&|1100|CB|CERASP|TESTSB8F|202008041554||ORU|1361|P|2.2\rPID|1|833944|21796920320|8276975
I want this to be split into
MSH|^~\&|1100|CB|CERASP|TESTSB8F|202008041554||ORU|1361|P|2.2
PID|1|833944|21796920320|8276975
Currently I have something like this
StringUtils.split(testStr, "\\r");
but it is splitting into
MSH|^~
&|1100|CB|CERASP|TESTSB8F|202008041554||ORU|1361|P|2.2
PID|1|833944|21796920320|8276975
You can just use String#split:
final String str = "MSH|^~\\&|1100|CB|CERASP|TESTSB8F|202008041554||ORU|1361|P|2.2\\rPID|1|833944|21796920320|8276975";
final String[] substrs = str.split("\\\\r");
System.out.println(Arrays.toString(substrs));
// Outputs [MSH|^~\&|1100|CB|CERASP|TESTSB8F|202008041554||ORU|1361|P|2.2, PID|1|833944|21796920320|8276975]
You can use
import java.util.regex.*;
//...
String[] results = text.split(Pattern.quote("\\r"));
Pattern.quote lets you pass any plain text to String.split, which otherwise expects a regular expression. Here, \ is a special character that would need to be escaped for both the Java string literal and the regex engine.
The method being called matches any one of the contents in the delimiter string as a delimiter, not the entire sequence. Here is the code from SeparatorUtils that executes the delimiter (str is the input string being split) check:
if (separatorChars.indexOf(str.charAt(i)) >= 0) {
As @enzo mentioned, java.lang.String.split() will do the job - just make sure to quote the separator. Pattern.quote() can help.
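To make the quoting point concrete, here is a small sketch using a shortened version of the sample message (the shortening and the names SplitOnLiteralBackslashR and splitSegments are mine, not from the question). It also shows why an unquoted "\\r" does nothing here: as a regex it means a real carriage return, which the text does not contain.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnLiteralBackslashR {
    // Splits on the two-character sequence backslash + r, treated as plain text.
    static String[] splitSegments(String message) {
        return message.split(Pattern.quote("\\r"));
    }

    public static void main(String[] args) {
        // In source code each literal backslash is written twice.
        String message = "MSH|^~\\&|1100|CB\\rPID|1|833944";
        System.out.println(Arrays.toString(splitSegments(message)));
        // prints [MSH|^~\&|1100|CB, PID|1|833944]

        // Without quoting, the regex \r means a carriage return character,
        // which this message does not contain, so nothing is split.
        System.out.println(message.split("\\r").length); // 1
    }
}
```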

string.replaceAll("\\n","") not working if string is taken as input from console and contains newline character

Case 1: Taking string input from scanner and replacing \n with -- (Not working)
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
str = str.replaceAll("\n", "--");
System.out.println(str);
input: "UY9Q3HGjqYE1aHNIG+Rju2hS3WAAEFlakOSGZWffabFpWkeQ\nz4g6mfKoGVR2\nF1QkiHRMZfL4mCvChAuL7gCT3d3SrmxD6lBnOiWiFTPUz4Q=\n"
Case2: Same thing works if I directly assign string with same value as above.
String str = "UY9Q3HGjqYE1aHNIG+Rju2hS3WAAEFlakOSGZWffabFpWkeQ\nz4g6mfKoGVR2\nF1QkiHRMZfL4mCvChAuL7gCT3d3SrmxD6lBnOiWiFTPUz4Q=\n";
str = str.replaceAll("\n", "--");
PS: I have already tried using \n and line.separator.
String str = "Input\\nwith backslash and n";
str = str.replaceAll("\\\\n", "--");
System.out.println(str);
Output:
Input--with backslash and n
We need to escape the backslash twice: To tell the regular expression that a literal backslash is intended we need to put two backslashes. And to tell the Java compiler that we intend literal backslashes in the string, each of those two needs to be entered as two backslashes. So we end up typing four of them.
nextLine() reads one line, so the line cannot contain a newline character. So I have assumed that you were entering a backslash and an n as part of your input.
Less confusing solution
We don’t need to use any regular expression here, and doing that complicates the escaping business. So don’t.
String str = "Input\\nwith backslash\\nand n\\n";
str = str.replace("\\n", "--");
System.out.println(str);
Input--with backslash--and n--
The replace method replaces all occurrences of the literal string given (in spite of not having All in the method name). So now we only need one escape, the one for the Java compiler.
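A side-by-side sketch of the two approaches, assuming the input really contains a backslash followed by n. The class and method names are only for illustration.

```java
class ReplaceVersusReplaceAll {
    // Plain-text replacement: only the Java compiler's escape is needed.
    static String withReplace(String s) {
        return s.replace("\\n", "--");
    }

    // Regex replacement: the backslash is escaped for the regex and again for Java.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\n", "--");
    }

    public static void main(String[] args) {
        String typed = "a\\nb\\nc"; // the 7 characters a \ n b \ n c
        System.out.println(withReplace(typed));    // a--b--c
        System.out.println(withReplaceAll(typed)); // a--b--c
        // The regex \n (a real line feed) finds nothing in this text.
        System.out.println(typed.replaceAll("\\n", "--").equals(typed)); // true
    }
}
```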
In a regular expression, a single backslash "\" on its own is an error because it is the escape character. In Java source, the string "\\" holds one backslash, so passing it as a regex throws "java.util.regex.PatternSyntaxException: Unexpected internal error near index".
A double backslash "\\" in a regular expression stands for a single literal backslash. So four backslashes "\\\\" are needed in a Java string literal to match a single backslash in a String.
Please try replacing "\n" with "\\\\n":
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
str = str.replaceAll("\\\\n", "--");
System.out.println(str);

What does scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?"); do? [duplicate]

On HackerRank this line of code occurs very frequently. I think it is meant to skip whitespace, but what does the "\r\u2028\u2029\u0085" part mean?
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
Scanner.skip skips input that matches the pattern. Here the pattern is:
(\r\n|[\n\r\u2028\u2029\u0085])?
? matches zero or one occurrence of the preceding element (here, the whole group)
| alternation
[] matches a single character from the set inside the brackets
\r matches a carriage return
\n matches a newline
\u2028 matches the character with index 2028 base 16 (8232 base 10 or 20050 base 8)
\u2029 matches the character with index 2029 base 16 (8233 base 10 or 20051 base 8)
\u0085 matches the character with index 85 base 16 (133 base 10 or 205 base 8)
1st Alternative \r\n
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)
2nd Alternative [\n\r\u2028\u2029\u0085]
Match a single character present in the list below [\n\r\u2028\u2029\u0085]
\n matches a line-feed (newline) character (ASCII 10)
\r matches a carriage return (ASCII 13)
\u2028 matches the character with index 2028 in hex (8232 in decimal, 20050 in octal) literally (case sensitive): LINE SEPARATOR
\u2029 matches the character with index 2029 in hex (8233 in decimal, 20051 in octal) literally (case sensitive): PARAGRAPH SEPARATOR
\u0085 matches the character with index 85 in hex (133 in decimal, 205 in octal) literally (case sensitive): NEXT LINE
The \r\n alternative is for Windows line endings.
The rest is standard \r=CR, \n=LF (see \r\n , \r , \n what is the difference between them?)
Then some Unicode special characters:
u2028 = LINE SEPARATOR (https://www.fileformat.info/info/unicode/char/2028/index.htm)
u2029 = PARAGRAPH SEPARATOR
(http://www.fileformat.info/info/unicode/char/2029/index.htm)
u0085 = NEXT LINE (https://www.fileformat.info/info/unicode/char/0085/index.htm)
OpenJDK's source code shows that nextLine() uses this regex for line separators:
private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
\r\n is a Windows line ending.
\n is a UNIX line ending.
\r is a Macintosh (pre-OSX) line ending.
\u2028 is LINE SEPARATOR.
\u2029 is PARAGRAPH SEPARATOR.
\u0085 is NEXT LINE (NEL).
The whole thing is a regular expression, so you could simply drop it into https://regexr.com or https://regex101.com/ and it will provide you with a full description of what each part of the regex means.
Here it is for you though:
(\r\n|[\n\r\u2028\u2029\u0085])? / gm
1st Capturing Group (\r\n|[\n\r\u2028\u2029\u0085])?
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
1st Alternative \r\n
\r matches a carriage return (ASCII 13)
\n matches a line-feed (newline) character (ASCII 10)
2nd Alternative [\n\r\u2028\u2029\u0085]
Match a single character present in the list below
[\n\r\u2028\u2029\u0085]
\n matches a line-feed (newline) character (ASCII 10)
\r matches a carriage return (ASCII 13)
\u2028 matches the character with index 2028 in hex (8232 in decimal, 20050 in octal) literally (case sensitive)
\u2029 matches the character with index 2029 in hex (8233 in decimal, 20051 in octal) literally (case sensitive)
\u0085 matches the character with index 85 in hex (133 in decimal, 205 in octal) literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
As for scanner.skip this does (Scanner Pattern Tutorial):
The java.util.Scanner.skip(Pattern pattern) method skips input that matches the specified pattern, ignoring delimiters. This method will skip input if an anchored match of the specified pattern succeeds. If a match to the specified pattern is not found at the current position, then no input is skipped and a NoSuchElementException is thrown.
I would also recommend reading Alan Moore's answer to "RegEx in Java: how to deal with newline", where he talks about the newer options available since Java 1.8.
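The role of the trailing ? can be sketched like this: without it, skip throws when there is no line break at the current position; with it, the pattern can match the empty string and skip simply consumes nothing. The Scanner reads from a String here, and the names SkipOptionalDemo, LINE_BREAK and skipRequired are hypothetical.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class SkipOptionalDemo {
    // The HackerRank pattern without its trailing question mark.
    static final String LINE_BREAK = "(\r\n|[\n\r\u2028\u2029\u0085])";

    // Returns true if a mandatory line break could be skipped at the current position.
    static boolean skipRequired(Scanner sc) {
        try {
            sc.skip(LINE_BREAK);
            return true;
        } catch (NoSuchElementException e) {
            return false; // nothing matched, nothing was consumed
        }
    }

    public static void main(String[] args) {
        // No line break at the start: the mandatory pattern fails.
        System.out.println(skipRequired(new Scanner("abc")));   // false
        // A line break at the start: it is consumed.
        System.out.println(skipRequired(new Scanner("\nabc"))); // true

        // With the trailing ? the pattern may match the empty string, so skip does not throw here.
        Scanner sc = new Scanner("abc");
        sc.skip(LINE_BREAK + "?");
        System.out.println(sc.nextLine()); // abc
    }
}
```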
scanner.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
In Unix and all Unix-like systems, \n is the code for end-of-line; \r means nothing special.
As a consequence, in C and most languages that somehow copy it (even remotely), \n is the standard escape sequence for end of line (translated to/from OS-specific sequences as needed).
In old Mac systems (pre-OS X), \r was the code for end-of-line instead.
In Windows (and many old OSs), the code for end of line is 2 characters, \r\n, in this order.
As a (surprising ;-) consequence (harking back to OSs much older than Windows), \r\n is the standard line termination for text formats on the Internet.
\u0085 NEXT LINE (NEL)
\u2029 PARAGRAPH SEPARATOR
\u2028 LINE SEPARATOR
The whole point of this is to consume the leftover line break when input is read with a Scanner.
There's already a similar question here: scanner.skip. It won't skip whitespace, since the Unicode character for a space (\u0020) is not in the pattern.
\r = CR (Carriage Return) // Used as a new line character in Mac OS before X
\n = LF (Line Feed) // Used as a new line character in Unix/Mac OS X
\r\n = CR + LF // Used as a new line character in Windows
u2028 = line separator
u2029 = paragraph separator
u0085 = next line
This skips one optional line break; see \R.
Nearly the same could have been done with \R (which additionally matches vertical tab and form feed) - sigh.
scanner.skip("\\R?");
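A short sketch of the \R variant, assuming Java 8 or later (where \R is available as a linebreak matcher). It reads from a String rather than System.in, and the helper name readIntThenLine is made up for this example.

```java
import java.util.Scanner;

class SkipWithLinebreakMatcher {
    // Reads an int, skips at most one line break, then reads the rest of the next line.
    static String readIntThenLine(String input) {
        Scanner sc = new Scanner(input);
        sc.nextInt();
        sc.skip("\\R?");
        return sc.nextLine();
    }

    public static void main(String[] args) {
        System.out.println(readIntThenLine("4\nhello\n"));   // hello
        System.out.println(readIntThenLine("4\r\nhello\n")); // hello
        System.out.println(readIntThenLine("4\rhello\n"));   // hello
    }
}
```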
I have a much simpler exercise to explain this
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        int i = 4;
        double d = 4.0;
        String s = "HackerRank ";
        Scanner scan = new Scanner(System.in);
        int a;
        double b;
        String c = null;
        a = scan.nextInt();
        b = scan.nextDouble();
        // nextDouble() leaves the line break unread, so this returns the empty rest of that line.
        c = scan.nextLine();
        System.out.println(c);
        scan.close();
        System.out.println(a + i);
        System.out.println(b + d);
        System.out.println(s.concat(c));
    }
}
Try running this first and look at the output.
After that:
import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        int i = 4;
        double d = 4.0;
        String s = "HackerRank ";
        Scanner scan = new Scanner(System.in);
        int a;
        double b;
        String c = null;
        a = scan.nextInt();
        b = scan.nextDouble();
        // Consume the line break left behind by nextDouble(), if there is one.
        scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
        c = scan.nextLine();
        System.out.println(c);
        scan.close();
        System.out.println(a + i);
        System.out.println(b + d);
        System.out.println(s.concat(c));
    }
}
Try it again.
This can be a very tricky interview question.
I was cursing myself before I realised the issue.
Just ask any programmer to read
an integer,
a double,
and a string,
all from user input.
If they don't know about this, they will most likely fail.
You can find a much simpler explanation of the behavior of the integer and the double in their javadocs.
This is related to the Scanner class.
Let's suppose you have this input from the system console:
4
This is next line
int a =scanner.nextInt();
String s = scanner.nextLine();
The value of a will be read as 4,
and the value of s will be an empty string, because nextLine() just reads what is left on the current line and then moves to the next line.
To read it correctly, you should call nextLine() one more time, like below:
int a =scanner.nextInt();
scanner.nextLine();
String s = scanner.nextLine();
To make sure the scanner reaches the next line even if the line break in the input is an unusual one, use:
scan.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
The line above does the job on every OS and environment.
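The three variants discussed in this answer can be compared in one runnable sketch. The input is supplied as a String constant instead of the console, and the class and method names are invented for the example.

```java
import java.util.Scanner;

class NextIntThenNextLine {
    static final String INPUT = "4\nThis is next line\n";

    // No cleanup: nextLine() returns the empty remainder of the line holding 4.
    static String naive() {
        Scanner sc = new Scanner(INPUT);
        sc.nextInt();
        return sc.nextLine();
    }

    // Cleanup with an extra nextLine() that discards the rest of the first line.
    static String withExtraNextLine() {
        Scanner sc = new Scanner(INPUT);
        sc.nextInt();
        sc.nextLine();
        return sc.nextLine();
    }

    // Cleanup with skip(), which consumes at most one line break.
    static String withSkip() {
        Scanner sc = new Scanner(INPUT);
        sc.nextInt();
        sc.skip("(\r\n|[\n\r\u2028\u2029\u0085])?");
        return sc.nextLine();
    }

    public static void main(String[] args) {
        System.out.println("[" + naive() + "]");             // []
        System.out.println("[" + withExtraNextLine() + "]"); // [This is next line]
        System.out.println("[" + withSkip() + "]");          // [This is next line]
    }
}
```

One difference worth keeping in mind: the extra nextLine() throws away anything else left on the first line, while skip() only removes the line break itself.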

Java pattern regex with escape characters

I want to replace ";" with "\n" except when it's escaped with a leading '\'. I haven't figured out the correct regex.
Here is what I have:
String s = "abc;efg\\;hij;pqr;xyz\\;123";
s.replaceAll("\\[^\\\\];", "\\\\n");
I'd expect the above string to be replaced with "abc\nefg\;hij\npqr\nxyz\;123"
Use a negative look behind:
s = s.replaceAll("(?<!\\\\);", "\n");
The expression (?<!\\) (coded as a java string literal "(?<!\\\\)") means "the previous character should not be a backslash"
Test code:
String s = "abc;efg\\;hij;pqr;xyz\\;123";
s = s.replaceAll("(?<!\\\\);", "\n");
System.out.println(s);
Output:
abc
efg\;hij
pqr
xyz\;123
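The same lookbehind should also work with split, if you want the pieces as an array instead of a newline-joined string. This is a sketch under that assumption; the names are hypothetical, and it does not handle the case where the backslash before a semicolon is itself escaped.

```java
import java.util.Arrays;

class UnescapedSemicolons {
    // Splits on each ';' that is not directly preceded by a backslash.
    static String[] splitOnUnescaped(String s) {
        return s.split("(?<!\\\\);");
    }

    public static void main(String[] args) {
        String s = "abc;efg\\;hij;pqr;xyz\\;123";
        System.out.println(Arrays.toString(splitOnUnescaped(s)));
        // prints [abc, efg\;hij, pqr, xyz\;123]
    }
}
```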

How to split a string according to "\\" or "\"?

I want to split the string "ABC\DEF" on the backslash.
I have tried
String str = "ABC\DEF";
String[] values1 = str.split("\\");
String[] values2 = str.split("\");
But none seems to be working. Please help.
String.split() expects a regular expression. You need to escape each \ because it is in a Java string literal (by the way, you should also escape it in String str = "ABC\DEF";), and you need to escape it again for the regex. In the end, you will end up with this line:
String[] values = str.split("\\\\");
The "\\\\" will be the \\ string, which the regex will interpret as \.
Note that String.split splits a string by regex.
One correct way [1] to specify \ as the delimiter, in raw regex, is:
\\
Since \ is special character in regex, you need to escape it to specify the literal \.
Putting the regex in string literal, you need to escape again, since \ is also escape character in string literal. Therefore, you end up with:
"\\\\"
So your code should be:
str.split("\\\\")
Note that this splits on every single instance of \ in the string.
Footnote
[1] Other ways (in raw regex) are:
\x5C
\0134
\u005C
In string literal (even worse than the quadruple escaping):
"\\x5C"
"\\0134"
"\\u005C"
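As a quick check that these spellings behave the same, here is a small sketch that splits the same text with each of them. The class name and the SPELLINGS array are only for illustration.

```java
import java.util.Arrays;

class BackslashRegexSpellings {
    // Each regex below denotes one literal backslash.
    static final String[] SPELLINGS = { "\\\\", "\\x5C", "\\0134", "\\u005C" };

    static String[] splitOnBackslash(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        String str = "ABC\\DEF"; // the 7 characters ABC\DEF
        for (String regex : SPELLINGS) {
            System.out.println(Arrays.toString(splitOnBackslash(str, regex))); // [ABC, DEF]
        }
    }
}
```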
Use it:
String str = "ABC\\DEF";
String[] values1 = str.split("\\\\");
final String HAY = "_0_";
String str = "ABC\\DEF".replace("\\", HAY);
System.out.println(Arrays.asList(str.split(HAY)));
