Split String between lines with regex - java

I want to split the file shown in the picture, but I can't because of the line breaks: the output I get is all joined together. What is the right regex to split on, even when there is a new line?

You are reading the file using a Scanner to get each line.
while(sc.hasNextLine())
You append each line to the StringBuffer (without a separator).
sb.append(sc.nextLine())
So at that point, a file with :
foo
bar
Would give a StringBuffer with
sb.toString(); //foobar
So there is no line separator left to split on. Instead, process each line as the Scanner returns it:
while(sc.hasNextLine()){
    String s = sc.nextLine();
    System.out.println(s);
    sb.append(s);
}
Last thing: please close every Scanner you open with sc.close() when you are done with it (that's a good practice to have).
EDIT:
If you want to keep that logic, simply append a line separator after each line:
while(sc.hasNextLine()){
sb.append(sc.nextLine()).append("\n");
}
That way, you will always have a line separator and the split will work.
Quick example of what this would look like:
StringBuffer sb = new StringBuffer();
sb.append("foo").append("\n")
.append("bar").append("\n"); //fake "file reading"
String[] result = sb.toString().split("\n");
for(String s : result){
System.out.println(s);
}
foo
bar
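Putting the pieces of this answer together, a minimal sketch might look like the following. It builds the Scanner over a literal string so that it runs without a file, uses StringBuilder in place of StringBuffer (no synchronization is needed here), and the class and method names are made up for illustration.

```java
import java.util.Scanner;

class SplitLinesDemo {
    // Reads every line and re-adds the separator that nextLine() strips off.
    static String readAll(Scanner sc) {
        StringBuilder sb = new StringBuilder();
        while (sc.hasNextLine()) {
            sb.append(sc.nextLine()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // try-with-resources closes the Scanner automatically.
        try (Scanner sc = new Scanner("foo\nbar")) {
            String[] lines = readAll(sc).split("\n");
            for (String line : lines) {
                System.out.println(line);
            }
        }
    }
}
```

For a real file, the same method should work with a Scanner built from a java.io.File.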

This should do:
String[] protasi = string.split("\\r?\\n");
split("\n") does compile and splits on line feeds, but on a file with Windows line endings (\r\n) every piece keeps a trailing \r. Adding the optional \\r? consumes the carriage return as well, so the pattern handles both conventions.
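The difference between the two patterns is easiest to see on a string with Windows line endings. The sketch below is only an illustration; the class and method names are invented.

```java
import java.util.Arrays;

class LineEndingSplitDemo {
    static String[] splitLines(String text) {
        // \r? makes the carriage return optional, so "\n" and "\r\n" are both consumed.
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String windowsText = "foo\r\nbar";
        // Splitting on "\n" alone leaves the "\r" attached to the first piece.
        System.out.println(windowsText.split("\n")[0].length());     // 4
        System.out.println(splitLines(windowsText)[0].length());     // 3
        System.out.println(Arrays.toString(splitLines("foo\nbar"))); // [foo, bar]
    }
}
```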

Related

Finding the first number of a .txt file - Java

I have several .txt files that should be read by Java. Java should then give me the first number on the first line or, put differently, the first number of the fourth word.
For instance:
"AAAAAA B Version 5.0.1" or "AAAAAA B Version 6.0.2"
5 or 6 should be the resulting number.
I've tried some methods such as BufferedReader with line.charAt, but I don't know how to make it work for my problem. What would you suggest?
BufferedReader br = null;
br = new BufferedReader(new FileReader(new File("C:\\Users\\Desktop\\new2.txt")));
String line = null;
while((line = br.readLine()) != null) {
String[] parts = line.startsWith(...?.);
System.out.println("");
First of all, read just the first line of the file.
File f=new File("your_file.txt");
FileReader fr=new FileReader(f);
BufferedReader br=new BufferedReader(fr);
String firstLine=br.readLine();
Then split the line on whitespace to get a String array of words:
String[] words=firstLine.split("\\s+");
Split the fourth word on ".":
String[] nos=words[3].split("\\.");
System.out.println(nos[0]); // expected output: first number of the fourth
                            // word in the first line of the given file
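The steps above can be wrapped into one small method. This is a sketch that assumes the line really has at least four words and that the fourth one starts with an integer; the class and method names are hypothetical.

```java
class VersionNumberDemo {
    // Returns the first number of the fourth word, e.g. 5 for "AAAAAA B Version 5.0.1".
    static int firstVersionNumber(String firstLine) {
        String[] words = firstLine.trim().split("\\s+");
        if (words.length < 4) {
            throw new IllegalArgumentException("Expected at least four words: " + firstLine);
        }
        // Split the fourth word on the literal dot and keep the first part.
        String[] numbers = words[3].split("\\.");
        return Integer.parseInt(numbers[0]);
    }

    public static void main(String[] args) {
        System.out.println(firstVersionNumber("AAAAAA B Version 5.0.1")); // 5
        System.out.println(firstVersionNumber("AAAAAA B Version 6.0.2")); // 6
    }
}
```

Parsing the whole first part, rather than taking a single character, also copes with versions such as 12.3.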
If you know the fourth word in each line is the version number, just grab that word and take its first character. If it's not necessarily the fourth word, try using indexOf() on the line to find the first decimal point and take the character before it, provided you know for sure that each line is set up this way.
To put it another way: if each line is known to have the format you showed above, you should be able to use indexOf() on the decimal point.
AAAAAA B Version 5(.)0.1
Or use split() to split white space and get each word
parts = line.split(" ");
Pattern p = Pattern.compile("-?\\d+");
Matcher m = p.matcher(line); // line is the string to search
if(m.find()) {
System.out.println(m.group());
}
It tries to find a negative or positive integer and prints it if it can find any.
Well, first of all, make sure that you post your full code. It's much easier to diagnose the problem when you actually have the information along-side it.
Secondly, what you're going to want is a scanner that's declared to read this .txt file.
A regular Scanner is declared like this:
Scanner sc = new Scanner(System.in);
This reads from standard input rather than from your file. However, you can replace System.in with your file if you declare the file as a variable (for example, as a java.io.File).
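As a rough sketch of how a file-backed Scanner and the regex approach could be combined: the example below writes a temporary file to stand in for the real .txt file (an assumption made only so that it runs anywhere), and the class and method names are invented.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumberDemo {
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // Returns the first integer on the first line, or null if there is none.
    static Integer firstNumber(Scanner sc) {
        if (!sc.hasNextLine()) {
            return null;
        }
        Matcher m = INTEGER.matcher(sc.nextLine());
        return m.find() ? Integer.valueOf(m.group()) : null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: a temporary file stands in for the real .txt file.
        File file = File.createTempFile("version", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "AAAAAA B Version 5.0.1\n".getBytes());

        // Passing a File instead of System.in makes the Scanner read the file.
        try (Scanner sc = new Scanner(file)) {
            System.out.println(firstNumber(sc)); // 5
        }
    }
}
```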

Recognizing empty lines in a text?

I'm taking input from a separate file which currently has one paragraph. I'm storing every word in the paragraph in a list and then iterating over each of them using this:
for (String word: words)
However, this loop goes over each WORD. If I have two paragraphs in my input file, separated by an empty line, how do I recognize that empty line inside this for loop? My thinking is that iterating over words is obviously different from going over lines, so I'm not sure.
An empty line follows the pattern:
\n\n
\r\n\r\n
\n -whitespace- \n
etc
A word follows the pattern:
-whitespace-nonwhitespace-whitespace-
Very different patterns. So looping over something using the definition of a word will never work.
You can use a Java Scanner to look at a file line by line.
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineScanner {
    public List<String> eliminateEmptyLines(String input) {
        Scanner scanner = new Scanner(input);
        List<String> output = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            // A line is blank if it contains nothing but whitespace.
            boolean isEmpty = line.matches("^\\s*$");
            if (!isEmpty) {
                output.add(line);
            }
        }
        scanner.close();
        return output;
    }
}
Here's how the regex in String.matches works: How to check if a line is blank using regex
Here's the javadoc on Scanner: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
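If the goal is to keep working with words while still knowing where one paragraph ends, one option is to read line by line and start a new word list at each blank line. The following is only a sketch of that idea; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ParagraphDemo {
    // Groups the words of the input by paragraph; a blank line starts a new paragraph.
    static List<List<String>> wordsByParagraph(String input) {
        List<List<String>> paragraphs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.isEmpty()) {
                    // Blank line: close the current paragraph if it has any words.
                    if (!current.isEmpty()) {
                        paragraphs.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    for (String word : line.split("\\s+")) {
                        current.add(word);
                    }
                }
            }
        }
        if (!current.isEmpty()) {
            paragraphs.add(current);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "one two\nthree\n\nfour five";
        System.out.println(wordsByParagraph(text)); // [[one, two, three], [four, five]]
    }
}
```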

Splitting strings with spaces

So I take in a line from a .txt file and turn it into a string. I would like to split the string on |, but there are also spaces before and after it that are messing with the code. Here is what I have so far:
File file = new File(fileLocation);
Scanner sc = new Scanner(file);
String line;
String[] words;
while(sc.hasNext()){
line = sc.next();
words = line.split("\\|");
this.german.add(words[0]);
this.english.add(words[1]);
}
An example line would be something like: in blue|in blau
I would also like to keep the spaces.
The .txt file would be:
in Rot|in red
in Blau|in blue
in Grun|in green
in Gelb|in Yellow
It would add all the items on the left of the | to the german list, and all of the ones on the right to the english list.
Ah, figured it out: sc.next() returns the next token, not the next line. I replaced it with sc.nextLine() and everything worked, thanks.
Call
line = line.replaceAll(" ", "");
beforehand; this will get rid of all the spaces (the result has to be assigned back, because Strings are immutable). If you only want leading and trailing spaces removed from the split strings, use
words[i].trim()
instead.
Use the following pattern, which also consumes any whitespace around the |:
words = line.split("\\s*\\|\\s*");
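A minimal sketch of the whole loop, assuming each line holds exactly one German phrase and one English phrase separated by a pipe. The class name and the choice to skip malformed lines are assumptions, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class VocabularyDemo {
    final List<String> german = new ArrayList<>();
    final List<String> english = new ArrayList<>();

    void load(Scanner sc) {
        while (sc.hasNextLine()) {           // nextLine(), not next(): keep the whole line
            String line = sc.nextLine().trim();
            if (line.isEmpty()) {
                continue;
            }
            // \s* swallows optional whitespace around the pipe; spaces inside a phrase stay.
            String[] words = line.split("\\s*\\|\\s*");
            if (words.length < 2) {
                continue;                    // skip malformed lines
            }
            german.add(words[0]);
            english.add(words[1]);
        }
    }

    public static void main(String[] args) {
        VocabularyDemo demo = new VocabularyDemo();
        demo.load(new Scanner("in Rot|in red\nin Blau | in blue"));
        System.out.println(demo.german);  // [in Rot, in Blau]
        System.out.println(demo.english); // [in red, in blue]
    }
}
```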

reading line in bufferedReader

From the javadoc
public String readLine()
throws IOException
Read a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
I have following kind of text :
Now the earth was formless and empty. Darkness was on the surface
of the deep. God's Spirit was hovering over the surface
of the waters.
I am reading lines as:
while(buffer.readLine() != null){
}
But the problem is that it treats everything up to the newline as a line. I would like a line to end where the string ends with a period (.). How would I do it?
You can use a Scanner and set your own delimiter using useDelimiter(Pattern).
Note that the input delimiter is a regex, so you will need to provide the regex \. (you need to break the special meaning of the character . in regex)
You can read a character at a time, and copy the data to a StringBuilder
Reader reader = ...;
StringBuilder sb = new StringBuilder();
int ch;
while((ch = reader.read()) >= 0) {
if(ch == '.') break;
sb.append((char) ch);
}
Use a java.util.Scanner instead of a buffered reader, and set the delimiter to "\\." with Scanner.useDelimiter().
(but be aware that the delimiter is consumed, so you'll have to add it again!)
or read the raw string and split it on each .
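A sketch of the Scanner approach might look like this. It reads from a string so that it is self-contained, collapses the line breaks inside each sentence, and adds the consumed period back; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceDemo {
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter("\\.");  // the delimiter is a regex, so the dot is escaped
            while (sc.hasNext()) {
                // Collapse line breaks and runs of whitespace inside the sentence.
                String sentence = sc.next().replaceAll("\\s+", " ").trim();
                if (!sentence.isEmpty()) {
                    result.add(sentence + "."); // the delimiter was consumed, so add it back
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Now the earth was formless and empty. Darkness was on the surface\n"
                + "of the deep.";
        for (String s : sentences(text)) {
            System.out.println(s);
        }
    }
}
```

Note that a trailing fragment without a closing period would also get a period appended by this sketch.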
You could split the whole text by every .:
String text = "Your test.";
String[] lines = text.split("\\.");
After you split the text you get an array of lines. You could also use a regex if you want more control, e.g. to split the text also by : or ;. Just google it.
PS.: Perhaps you have to remove the new line characters first with something like:
text = text.replaceAll("\n", "");

How to remove line breaks from a file in Java?

How can I replace all line breaks from a string in Java in such a way that will work on Windows and Linux (ie no OS specific problems of carriage return/line feed/new line etc.)?
I've tried (note readFileAsString is a function that reads a text file into a String):
String text = readFileAsString("textfile.txt");
text.replace("\n", "");
but this doesn't seem to work.
How can this be done?
You need to set text to the results of text.replace():
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
This is necessary because Strings are immutable -- calling replace doesn't change the original String, it returns a new one that's been changed. If you don't assign the result to text, then that new String is lost and garbage collected.
As for getting the newline String for any environment -- that is available by calling System.getProperty("line.separator").
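The effect of forgetting the assignment can be seen in a few lines. This is just an illustrative sketch with an invented class name.

```java
class ImmutableStringDemo {
    static String stripLineBreaks(String text) {
        // Each replace() returns a new String; the calls are chained on the results.
        return text.replace("\n", "").replace("\r", "");
    }

    public static void main(String[] args) {
        String text = "a\r\nb";
        text.replace("\n", "");             // result discarded, text is unchanged
        System.out.println(text.length());  // 4
        text = stripLineBreaks(text);       // result assigned back
        System.out.println(text);           // ab
    }
}
```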
As noted in other answers, your code is not working primarily because String.replace(...) does not change the target String. (It can't - Java strings are immutable!) What replace actually does is to create and return a new String object with the characters changed as required. But your code then throws away that String ...
Here are some possible solutions. Which one is most correct depends on what exactly you are trying to do.
// #1
text = text.replace("\n", "");
Simply removes all the newline characters. This does not cope with Windows or Mac line terminations.
// #2
text = text.replace(System.getProperty("line.separator"), "");
Removes all line terminators for the current platform. This does not cope with the case where you are trying to process (for example) a UNIX file on Windows, or vice versa.
// #3
text = text.replaceAll("\\r|\\n", "");
Removes all Windows, UNIX or Mac line terminators. However, if the input file is text, this will concatenate words; e.g.
Goodbye cruel
world.
becomes
Goodbye cruelworld.
So you might actually want to do this:
// #4
text = text.replaceAll("\\r\\n|\\r|\\n", " ");
which replaces each line terminator with a space [1]. Since Java 8 you can also do this:
// #5
text = text.replaceAll("\\R", " ");
And if you want to replace multiple line terminators with one space:
// #6
text = text.replaceAll("\\R+", " ");
[1] Note there is a subtle difference between #3 and #4. The sequence \r\n represents a single (Windows) line terminator, so we need to be careful not to replace it with two spaces.
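To compare the variants side by side, here is a small sketch that runs #3, #4 and #6 on the same input. The class and method names are made up, and \R requires Java 8 or later.

```java
class LineBreakVariantsDemo {
    // #3: removes every \r and \n, which glues neighbouring words together.
    static String removeAll(String text) {
        return text.replaceAll("\\r|\\n", "");
    }

    // #4: one space per line terminator; \r\n is listed first so it counts as one.
    static String toSpaces(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", " ");
    }

    // #6: any run of line terminators becomes a single space.
    static String collapse(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        String text = "Goodbye cruel\r\nworld.";
        System.out.println(removeAll(text));                          // Goodbye cruelworld.
        System.out.println(toSpaces(text));                           // Goodbye cruel world.
        System.out.println(collapse("Goodbye cruel\r\n\r\nworld.")); // Goodbye cruel world.
    }
}
```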
This function normalizes down all whitespace, including line breaks, to single spaces. Not exactly what the original question asked for, but likely to do exactly what is needed in many cases:
import org.apache.commons.lang3.StringUtils;
final String cleansedString = StringUtils.normalizeSpace(rawString);
If you want to remove only line terminators that are valid on the current OS, you could do this:
text = text.replaceAll(System.getProperty("line.separator"), "");
If you want to make sure you remove any line separators, you can do it like this:
text = text.replaceAll("\\r|\\n", "");
Or, slightly more verbose, but less regexy:
text = text.replaceAll("\\r", "").replaceAll("\\n", "");
str = str.replaceAll("\\r\\n|\\r|\\n", " ");
Worked perfectly for me after searching a lot, having failed with every other line.
This should be efficient, I guess:
String s;
s = "try this\n try me.";
s = s.replaceAll("[\\r\\n]+", "");
Line breaks are not the same under Windows/Linux/Mac. You should use System.getProperty with the key line.separator.
String text = readFileAsString("textfile.txt").replaceAll("\n", "");
The definition of trim() on the Oracle website is
"Returns a copy of the string, with leading and trailing whitespace omitted."
The documentation does not spell out that leading and trailing newline characters are removed as well.
In short,
String text = readFileAsString("textfile.txt").trim(); will also work for you if the only line breaks are at the start or end of the text; line breaks inside the text are left in place.
(Checked with Java 6)
In Kotlin, and also since Java 11, String has a lines() method, which returns the lines of a multi-line string (a List in Kotlin, a Stream<String> in Java).
You can get all the lines and then merge them into a single string.
With Kotlin it will be as simple as
str.lines().joinToString("")
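A rough Java counterpart of the Kotlin one-liner, assuming Java 11 or later (the class and method names are invented):

```java
import java.util.stream.Collectors;

class JoinLinesDemo {
    static String joinLines(String str) {
        // String.lines() returns a Stream<String> with the line terminators stripped.
        return str.lines().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(joinLines("a\r\nb\nc")); // abc
    }
}
```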
String text = readFileAsString("textfile.txt").replace("\n","");
.replace returns a new string, strings in Java are Immutable.
You may want to read your file with a BufferedReader. This class can break input into individual lines, which you can assemble at will. The way BufferedReader operates recognizes line ending conventions of the Linux, Windows and MacOS worlds automatically, regardless of the current platform.
Hence:
BufferedReader br = new BufferedReader(
new FileReader("textfile.txt"));
StringBuilder sb = new StringBuilder();
for (;;) {
String line = br.readLine();
if (line == null)
break;
sb.append(line);
sb.append(' '); // SEE BELOW
}
String text = sb.toString();
Note that readLine() does not include the line terminator in the returned string. The code above appends a space to avoid gluing together the last word of a line and the first word of the next line.
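On Java 8 or later, the same idea can be written more compactly with BufferedReader.lines(). The sketch below takes any Reader so that it can be tried with a StringReader; unlike the loop above, it puts spaces only between lines and not after the last one. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

class JoinWithSpacesDemo {
    static String joinLines(Reader source) {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(source)) {
            // lines() strips the terminators; joining(" ") puts one space between lines.
            return br.lines().collect(Collectors.joining(" "));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinLines(new StringReader("Goodbye cruel\r\nworld.")));
        // prints: Goodbye cruel world.
    }
}
```

For a file, a java.io.FileReader could be passed in place of the StringReader.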
I find it odd that (Apache) StringUtils wasn't covered here yet.
You can remove all newlines (or any other occurrences of a substring, for that matter) from a string using the .replace method:
StringUtils.replace(myString, "\n", "");
This line will replace all newlines with the empty string.
Because a newline is technically a character, you can optionally use the .replaceChars method, which replaces characters. Java has no empty char literal, so use the String overload, which deletes the listed characters when the replacement is empty:
StringUtils.replaceChars(myString, "\n", "");
FYI, if you want to replace consecutive line breaks with a single line break, you can use
myString.trim().replaceAll("[\n]{2,}", "\n")
Or replace with a single space
myString.trim().replaceAll("[\n]{2,}", " ")
You can use Apache Commons IOUtils to iterate over the lines and append each one to a StringBuilder. And don't forget to close the InputStream.
StringBuilder sb = new StringBuilder();
FileInputStream fin=new FileInputStream("textfile.txt");
LineIterator lt=IOUtils.lineIterator(fin, "utf-8");
while(lt.hasNext())
{
sb.append(lt.nextLine());
}
String text = sb.toString();
IOUtils.closeQuietly(fin);
You can use a generic method to replace any char with any other char. It has to return the result, because the caller's String is immutable:
public static String removeWithAnyChar(String str, char replaceChar,
        char replaceWith) {
    char[] chrs = str.toCharArray();
    int i = 0;
    while (i < chrs.length) {
        if (chrs[i] == replaceChar) {
            chrs[i] = replaceWith;
        }
        i++;
    }
    return new String(chrs);
}
org.apache.commons.lang.StringUtils#chopNewline
Try doing this:
textValue= textValue.replaceAll("\n", "");
textValue= textValue.replaceAll("\t", "");
textValue= textValue.replaceAll("\\n", "");
textValue= textValue.replaceAll("\\t", "");
textValue= textValue.replaceAll("\r", "");
textValue= textValue.replaceAll("\\r", "");
textValue= textValue.replaceAll("\r\n", "");
textValue= textValue.replaceAll("\\r\\n", "");
