Splitting strings with spaces - java

I read a line from a .txt file into a String. I would like to split the string on |, but the line also contains spaces that are breaking my code. Here is what I have so far:
File file = new File(fileLocation);
Scanner sc = new Scanner(file);
String line;
String[] words;
while(sc.hasNext()){
line = sc.next();
words = line.split("\\|");
this.german.add(words[0]);
this.english.add(words[1]);
}
An example line would be something like: in blue|in blau
I would also like to keep the spaces.
The .txt file would be:
in Rot|in red
in Blau|in blue
in Grun|in green
in Gelb|in Yellow
It would add all the items on the left of the | to the german list, and all of the ones on the right to the english list.
Ah, figured it out: sc.next() returns the next whitespace-delimited token, not the next line. I replaced it with sc.nextLine() and everything worked, thanks.
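The difference between next() and nextLine() can be seen in a small sketch. It reads from an in-memory string rather than a file, and the class and method names (TokenVsLine, tokens, lines) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenVsLine {
    // Collects what Scanner.next() returns: whitespace-delimited tokens.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    // Collects what Scanner.nextLine() returns: whole lines, inner spaces included.
    static List<String> lines(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                out.add(sc.nextLine());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "in Rot|in red\nin Blau|in blue\n";
        System.out.println(tokens(text)); // [in, Rot|in, red, in, Blau|in, blue]
        System.out.println(lines(text));  // [in Rot|in red, in Blau|in blue]
    }
}
```

With next(), the first token is just "in", so split("\\|") yields a one-element array and words[1] throws an ArrayIndexOutOfBoundsException.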

Call
line = line.replaceAll(" ", "");
beforehand; this removes all the spaces. Strings are immutable, so the result has to be assigned back to line. If you only want to remove leading and trailing spaces from the split strings, use
words[i].trim()
instead.

Use the following pattern, which splits on | and also consumes any whitespace directly around it:
words = line.split("\\s*\\|\\s*");
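As a minimal sketch of splitting on the pipe while keeping the spaces inside each phrase, the following uses a pattern with optional whitespace on both sides of the |. The class and method names (PipeSplit, splitOnPipe) are hypothetical.

```java
import java.util.Arrays;

public class PipeSplit {
    // Splits on '|' and drops whitespace directly around it;
    // spaces inside each part are left alone.
    static String[] splitOnPipe(String line) {
        return line.trim().split("\\s*\\|\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipe("in Blau | in blue"))); // [in Blau, in blue]
        System.out.println(Arrays.toString(splitOnPipe("in Rot|in red")));     // [in Rot, in red]
    }
}
```

Using \\s* rather than \\s+ means the pattern still matches when there are no spaces around the pipe, as in the sample file.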

Related

Split String between lines with regex

I want to split the file shown in the picture, but I can't because of the line breaks: the output I get is all run together. What is the right regex to split even when there is a new line?
You are reading the file with a Scanner to get each line.
while(sc.hasNextLine())
You append each line to the StringBuffer (without a separator).
sb.append(sc.nextLine())
So at that point, a file containing
foo
bar
would give a StringBuffer with
sb.toString(); //foobar
So there is no line separator left. Instead, handle each line inside the Scanner loop:
while(sc.hasNextLine()){
String s = sc.nextLine();
System.out.println(s);
sb.append(s);
}
One last thing: please close every Scanner you open with sc.close() when you are done with it (that's good practice).
EDIT:
If you want to keep that logic, simply append a line separator after each line:
while(sc.hasNextLine()){
sb.append(sc.nextLine()).append("\n");
}
That way, every line ends with a line separator and the split will work.
A quick example of what this looks like:
StringBuffer sb = new StringBuffer();
sb.append("foo").append("\n")
.append("bar").append("\n"); //fake "file reading"
String[] result = sb.toString().split("\n");
for(String s : result){
System.out.println(s);
}
foo
bar
This should do:
String[] protasi = string.split("\\r?\\n");
Both split("\n") and split("\\n") split on a line feed. The optional \\r in front also handles Windows line endings (\r\n), where each line feed is preceded by a carriage return that would otherwise stay at the end of every line.
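A short sketch of the \\r?\\n split on text that mixes both line-ending styles. The class and method names (LineSplit, splitLines) are invented for the example.

```java
import java.util.Arrays;

public class LineSplit {
    // Splits text into lines, accepting both "\n" and "\r\n" endings.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLines("foo\nbar\r\nbaz"))); // [foo, bar, baz]
    }
}
```

Splitting the same input on "\n" alone would leave a trailing carriage return on "bar".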

Replacing substrings in a string Java

I'm trying to replace multiple substrings in a string. For example, I have the following string wordlist:
one two three
I want to replace the \t tab characters with \r\n newline characters.
I define the separator variable as \n and replacement variable as \r\n.
Then I use wordlist = wordlist.replaceAll(separator, replacement); to replace all the characters, but when I display the wordlist again, I get the following result:
onerntwornthree
I also tried splitting the wordlist on the separator into an array and then joining it again word by word into a new string separated by the replacement, but that just gave me:
one\r\ntwo\r\nthree
Does anybody know how to solve this problem? In case you need it, here's the relevant code:
System.out.print("Separator to replace: ");
separator = scanner.next( );
System.out.print("Replacement for separator: ");
replacement = scanner.next( );
wordlist = wordlist.replaceAll(separator, replacement);
Your input character for tab seems to be incorrect.
This code
String wordlist="one\ttwo\tthree";
wordlist = wordlist.replaceAll("\t", "\r\n");
System.out.println(wordlist);
gives this output:
one
two
three
What you probably want to do is split the string and then write the lines one at a time to a PrintStream. That way you can use println.
Java is a platform independent language, and new lines are platform dependent. Making use of PrintStream.println will make sure your code is portable.
Why do you set the separator to \n? It should be \t, I assume.
The following code works fine on JDoodle:
String s = "one\ttwo\tthree";
s = s.replaceAll("\t","\r\n");
System.out.println(s);
EDIT
This doesn't work because you ask the user for the separator, and when they type \t, you get a two-character string (a backslash followed by t), not the tab character that the escape sequence stands for in Java source code.
You should call StringEscapeUtils.unescapeJava on the input first.
Thus:
Scanner sc = new Scanner(System.in);
String separator = sc.nextLine();
separator = StringEscapeUtils.unescapeJava(separator);
String s = "one\ttwo\tthree";
s = s.replaceAll(separator,"\r\n");
System.out.println(s);
If org.apache.commons.lang.StringEscapeUtils is not available, you can do this explicitly:
Scanner sc = new Scanner(System.in);
String separator = sc.nextLine();
separator = separator.replace("\\t","\t"); // literal replace: backslash + t becomes a tab
String s = "one\ttwo\tthree";
s = s.replaceAll(separator,"\r\n");
System.out.println(s);
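As a rough sketch of the explicit approach without Commons Lang, the helper below converts the typed two-character sequences for tab, line feed, and carriage return into the real control characters. The class and method names (SeparatorUnescape, unescape, replaceSeparator) are hypothetical, and only these three escapes are handled. It uses String.replace, which treats both arguments literally. By contrast, replaceAll treats a backslash in the replacement string as an escape for the next character, which likely explains the onerntwornthree output in the question.

```java
public class SeparatorUnescape {
    // Turns the typed sequences backslash+t, backslash+n, backslash+r
    // into the corresponding control characters. Other escapes are left as typed.
    static String unescape(String typed) {
        return typed.replace("\\t", "\t")
                    .replace("\\n", "\n")
                    .replace("\\r", "\r");
    }

    // Replaces every occurrence of the (unescaped) separator with the (unescaped) replacement.
    static String replaceSeparator(String text, String typedSeparator, String typedReplacement) {
        // String.replace is literal, so no regex escaping is needed.
        return text.replace(unescape(typedSeparator), unescape(typedReplacement));
    }

    public static void main(String[] args) {
        String typedSeparator = "\\t";      // what the user types: backslash, t
        String typedReplacement = "\\r\\n"; // backslash, r, backslash, n
        String result = replaceSeparator("one\ttwo\tthree", typedSeparator, typedReplacement);
        System.out.println(result.equals("one\r\ntwo\r\nthree")); // true
    }
}
```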

java string split based on new line

I have the following string:
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
I want to break it on \n so that I end up with two strings, aaaaaaaaa and bbbbbbbbbbb. I don't want the last piece, as it only contains whitespace. So if I split on the newline character using str.split(), the final array should have only two entries.
I tried the following:
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n".replaceAll("\\s+", " ");
String[] split = str.split("\n+");
It ignores all the \n characters and gives a single string, aaaaaaaaa bbbbbbbbbbb.
Delete the call to replaceAll(), which is removing the newlines too. Just this will do:
String[] split = str.split("\n\\s*");
This will not split on just spaces - the split must start at a newline (followed by optional further whitespace).
Here's some test code using your sample input, extended with an edge case (a space inside a line):
String str = "aaaaaaaaa\nbbbbbb bbbbb\n \n";
String[] split = str.split("\n\\s*");
System.out.println(Arrays.toString(split));
Output:
[aaaaaaaaa, bbbbbb bbbbb]
This should do the trick:
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
String[] lines = str.split("\\s*\n\\s*");
It will also remove all trailing and leading whitespace from all lines.
The \n characters are removed by your first statement, because \s also matches \n.
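Both points can be checked with a small sketch on the string from the question: \\s+ swallows the newlines, while splitting on a newline with optional surrounding whitespace leaves just the two non-blank lines. The class and method names (NewlineSplit, nonBlankLines) are made up for illustration.

```java
import java.util.Arrays;

public class NewlineSplit {
    // Splits on a newline plus any whitespace around it.
    // String.split drops trailing empty strings, so the blank tail disappears.
    static String[] nonBlankLines(String text) {
        return text.split("\\s*\n\\s*");
    }

    public static void main(String[] args) {
        String str = "aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
        // \s+ also matches newlines, so every whitespace run collapses into one space.
        System.out.println(str.replaceAll("\\s+", " "));
        System.out.println(Arrays.toString(nonBlankLines(str))); // [aaaaaaaaa, bbbbbbbbbbb]
    }
}
```

Note that input starting with a newline would still produce a leading empty string, since split only discards trailing empties.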

Recognizing empty lines in a text?

I'm taking input from a separate file which currently has one paragraph. I'm storing every word in the paragraph in a list and then iterating over the words like this:
for (String word: words)
However, this iterates over each WORD. If I have two paragraphs in my input file separated by an empty line, how do I recognize that empty line inside this for loop? My thinking is that iterating over words is obviously different from iterating over lines, so I'm not sure.
An empty line follows the pattern:
\n\n
\r\n\r\n
\n -whitespace- \n
etc
A word follows the pattern
-whitespace-nonwhitespace-whitespace-
These are very different patterns, so looping over something using the definition of a word will never work.
You can use a java.util.Scanner to go through a file line by line.
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineScanner {
    // Returns the lines of the input that contain something other than whitespace.
    public List<String> eliminateEmptyLines(String input) {
        Scanner scanner = new Scanner(input);
        List<String> output = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            boolean isEmpty = line.matches("^\\s*$");
            if (!isEmpty) {
                output.add(line);
            }
        }
        scanner.close();
        return output;
    }
}
Here's how the regex in String.matches works: How to check if a line is blank using regex
Here's the javadoc on Scanner: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
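To actually recognize the paragraph boundary rather than just discard it, one option is to group lines into paragraphs whenever a blank line is seen. This is only a sketch under that assumption; the class and method names (ParagraphReader, paragraphs) are invented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ParagraphReader {
    // Groups the lines of the input into paragraphs, using blank lines as boundaries.
    static List<List<String>> paragraphs(String input) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.trim().isEmpty()) {
                    // A blank line closes the paragraph collected so far.
                    if (!current.isEmpty()) {
                        result.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(line);
                }
            }
        }
        // Keep the last paragraph if the input does not end with a blank line.
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\n\n  \nthird line\n";
        System.out.println(paragraphs(text)); // [[first line, second line], [third line]]
    }
}
```

Each inner list can then be split into words, so the word loop runs once per paragraph.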

reading line in bufferedReader

From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have the following kind of text:
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while(buffer.readLine() != null){
}
But the problem is that it treats everything up to a newline as one line. I would like a line to end at a period (.) instead. How would I do that?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the delimiter is a regex, so you will need to provide the regex \. (the backslash escapes the special meaning of the . character in a regex).
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while((ch = reader.read()) >= 0) {
if(ch == '.') break;
sb.append((char) ch);
}
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
or read the raw string and split it on each .
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS.: Perhaps you have to remove the new line characters first with something like:
text = text.replaceAll("\n", "");
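The Scanner suggestion can be sketched as follows, assuming every sentence ends with a period and that line breaks inside a sentence should become single spaces. The class and method names (SentenceReader, sentences) are hypothetical, and abbreviations such as "e.g." are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceReader {
    // Reads period-terminated sentences; line breaks inside a sentence become spaces.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("\\."); // the delimiter is a regex, so the dot is escaped
            while (sc.hasNext()) {
                String s = sc.next().replaceAll("\\s+", " ").trim();
                if (!s.isEmpty()) {
                    out.add(s + "."); // the delimiter is consumed, so add it back
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep. God's Spirit was hovering over the surface\n"
                + "of the waters.";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```

To read from a file instead, the same Scanner setup works with new Scanner(file) in place of new Scanner(text).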
