Read String line by line - java

Given a string that isn't too long, what is the best way to read it line by line?
I know you can do:
BufferedReader reader = new BufferedReader(new StringReader(<string>));
reader.readLine();
Another way would be to take the substring on the eol:
final String eol = System.getProperty("line.separator");
output = output.substring(output.indexOf(eol) + eol.length());
Any other maybe simpler ways of doing it? I have no problems with the above approaches, just interested to know if any of you know something that may look simpler and more efficient?

There is also Scanner. You can use it just like the BufferedReader:
Scanner scanner = new Scanner(myString);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
// process the line
}
scanner.close();
I think this is a bit cleaner than both of the suggested approaches.

You can also use the split method of String:
String[] lines = myString.split(System.getProperty("line.separator"));
This gives you all lines in a handy array.
I don't know about the performance of split. It uses regular expressions.
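If the regular expression machinery is a concern, a plain indexOf loop avoids it altogether. The following is only a sketch: the class name IndexOfLineSplitter and the helper stripCarriageReturn are made up for illustration, and it assumes lines end in "\n" or "\r\n" (a lone "\r" is not treated as a line break).

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfLineSplitter {

    /** Splits text at "\n", dropping an optional preceding "\r"; no regex involved. */
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = text.indexOf('\n', start)) >= 0) {
            lines.add(stripCarriageReturn(text.substring(start, end)));
            start = end + 1;
        }
        // Text after the last "\n", if any; a trailing newline adds no empty line.
        if (start < text.length()) {
            lines.add(stripCarriageReturn(text.substring(start)));
        }
        return lines;
    }

    // Hypothetical helper: removes one trailing "\r" left over from "\r\n".
    private static String stripCarriageReturn(String line) {
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    public static void main(String[] args) {
        // prints [first, second, , fourth]
        System.out.println(splitLines("first\r\nsecond\n\nfourth"));
    }
}
```

Whether this is actually faster than split for your input is something you would have to measure.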

Since I was especially interested in the efficiency angle, I created a little test class (below). Outcome for 5,000,000 lines:
Comparing line breaking performance of different solutions
Testing 5000000 lines
Split (all): 14665 ms
Split (CR only): 3752 ms
Scanner: 10005
Reader: 2060
As usual, exact times may vary, but the ratio holds true however often I've run it.
Conclusion: the "simpler" and "more efficient" requirements of the OP can't be satisfied simultaneously. The split solution (in either incarnation) is simpler, but the Reader implementation beats the others hands down.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
/**
* Test class for splitting a string into lines at linebreaks
*/
public class LineBreakTest {
/** Main method: pass in desired line count as first parameter (default = 10000). */
public static void main(String[] args) {
int lineCount = args.length == 0 ? 10000 : Integer.parseInt(args[0]);
System.out.println("Comparing line breaking performance of different solutions");
System.out.printf("Testing %d lines%n", lineCount);
String text = createText(lineCount);
testSplitAllPlatforms(text);
testSplitWindowsOnly(text);
testScanner(text);
testReader(text);
}
private static void testSplitAllPlatforms(String text) {
long start = System.currentTimeMillis();
text.split("\n\r|\r");
System.out.printf("Split (regexp): %d%n", System.currentTimeMillis() - start);
}
private static void testSplitWindowsOnly(String text) {
long start = System.currentTimeMillis();
text.split("\n");
System.out.printf("Split (CR only): %d%n", System.currentTimeMillis() - start);
}
private static void testScanner(String text) {
long start = System.currentTimeMillis();
List<String> result = new ArrayList<>();
try (Scanner scanner = new Scanner(text)) {
while (scanner.hasNextLine()) {
result.add(scanner.nextLine());
}
}
System.out.printf("Scanner: %d%n", System.currentTimeMillis() - start);
}
private static void testReader(String text) {
long start = System.currentTimeMillis();
List<String> result = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
String line = reader.readLine();
while (line != null) {
result.add(line);
line = reader.readLine();
}
} catch (IOException exc) {
// quit
}
System.out.printf("Reader: %d%n", System.currentTimeMillis() - start);
}
private static String createText(int lineCount) {
StringBuilder result = new StringBuilder();
StringBuilder lineBuilder = new StringBuilder();
for (int i = 0; i < 20; i++) {
lineBuilder.append("word ");
}
String line = lineBuilder.toString();
for (int i = 0; i < lineCount; i++) {
result.append(line);
result.append("\n");
}
return result.toString();
}
}

Using Apache Commons IOUtils you can do this nicely via
List<String> lines = IOUtils.readLines(new StringReader(string));
It's not doing anything clever, but it's nice and compact. It'll handle streams as well, and you can get a LineIterator too if you prefer.

Since Java 11, there is a new method String.lines:
/**
* Returns a stream of lines extracted from this string,
* separated by line terminators.
* ...
*/
public Stream<String> lines() { ... }
Usage:
"line1\nline2\nlines3"
.lines()
.forEach(System.out::println);
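Because lines() returns a Stream, the usual stream operations can be chained onto it. A small sketch, assuming Java 11 or later; the class and method names (StringLinesExample, nonBlankLines) are invented for this example:

```java
import java.util.List;
import java.util.stream.Collectors;

class StringLinesExample {

    /** Collects the lines of the text, stripped of surrounding whitespace, skipping blank ones. */
    static List<String> nonBlankLines(String text) {
        return text.lines()                        // Java 11+: splits on \n, \r and \r\n
                .map(String::strip)                // Java 11+: trims leading/trailing whitespace
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(nonBlankLines(" a \n\nb\r\nc"));
    }
}
```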

Solution using Java 8 features such as Stream API and Method references
new BufferedReader(new StringReader(myString))
.lines().forEach(System.out::println);
or
public void someMethod(String myLongString) {
new BufferedReader(new StringReader(myLongString))
.lines().forEach(this::parseString);
}
private void parseString(String data) {
//do something
}

You can also use:
String[] lines = someString.split("\n");
If that doesn't work try replacing \n with \r\n.

You can use the Stream API with a StringReader wrapped in a BufferedReader, which has had a lines() method returning a stream since Java 8:
import java.util.stream.*;
import java.io.*;
class test {
public static void main(String... a) {
String s = "this is a \nmultiline\rstring\r\nusing different newline styles";
new BufferedReader(new StringReader(s)).lines().forEach(
(line) -> System.out.println("one line of the string: " + line)
);
}
}
Gives
one line of the string: this is a
one line of the string: multiline
one line of the string: string
one line of the string: using different newline styles
Just like in BufferedReader's readLine, the newline character(s) themselves are not included. All kinds of newline separators are supported (in the same string even).

Or use the try-with-resources statement combined with Scanner:
try (Scanner scanner = new Scanner(value)) {
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
// process the line
}
}

You can try the following regular expression:
\r?\n
Code:
String input = "\nab\n\n \n\ncd\nef\n\n\n\n\n";
String[] lines = input.split("\\r?\\n", -1);
int n = 1;
for(String line : lines) {
System.out.printf("\tLine %02d \"%s\"%n", n++, line);
}
Output:
Line 01 ""
Line 02 "ab"
Line 03 ""
Line 04 " "
Line 05 ""
Line 06 "cd"
Line 07 "ef"
Line 08 ""
Line 09 ""
Line 10 ""
Line 11 ""
Line 12 ""
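The -1 limit is what keeps the trailing empty lines in the output above. A short sketch of the difference (the class and method names are only for illustration):

```java
class SplitLimitExample {

    /** Default split: trailing empty strings are removed from the result. */
    static String[] splitDroppingTrailing(String input) {
        return input.split("\\r?\\n");
    }

    /** A negative limit keeps trailing empty strings. */
    static String[] splitKeepingTrailing(String input) {
        return input.split("\\r?\\n", -1);
    }

    public static void main(String[] args) {
        String input = "ab\n\ncd\n\n";
        System.out.println(splitDroppingTrailing(input).length); // prints 3
        System.out.println(splitKeepingTrailing(input).length);  // prints 5
    }
}
```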

The easiest and most universal approach would be to just use the regex Linebreak matcher \R which matches Any Unicode linebreak sequence:
Pattern NEWLINE = Pattern.compile("\\R");
String[] lines = NEWLINE.split(input);
See https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
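As a quick check that \R copes with mixed line terminators in one string, here is a minimal sketch (Java 8 or later; the class name LinebreakMatcherExample is made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LinebreakMatcherExample {

    // Compiled once and reused; \R treats "\r\n" as a single line break.
    private static final Pattern NEWLINE = Pattern.compile("\\R");

    static List<String> splitLines(String input) {
        // Like String.split, Pattern.split drops trailing empty strings.
        return Arrays.asList(NEWLINE.split(input));
    }

    public static void main(String[] args) {
        // prints [a, b, c, d]
        System.out.println(splitLines("a\r\nb\rc\nd"));
    }
}
```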

Related

Split a string with a space in Java, when a space occurs?

I'm trying to extract data from a CSV file, in which I have the following example CSV
timestamp, Column1,column2,column3
2019-05-07 19:17:23,x,y,z
2019-03-30 19:41:33,a,b,c
etc.
currently, my code is as follows:
public static void main(String[]args){
String blah = "file.csv";
File file = new File(blah);
try{
Scanner iterate = new Scanner(file);
iterate.next(); //skips the first line
while(iterate.hasNext()){
String data = iterate.next();
String[] values = data.split(",");
Float nbr = Float.parseFloat(values[2]);
System.out.println(nbr);
}
iterate.close();
}catch (FileNotFoundException e){
e.printStackTrace();
}
}
However, my code is giving me an error
java.lang.ArrayIndexOutOfBoundsException: Index 3 is out of bounds for length 3
My theory here is the split is the problem here. As there is no comma, my program thinks that the array ends with only the first element since there's no comma on the first element (I've tested it with the timestamp column and it seems to work, however, I want to print the values in column 3)
How do I use the split function to get the column1, column2, and column3 values?
import java.io.*;
public class Sample
{
public static void main(String[] args)
{
String line;
String splitBy = ",";
String file = "blah.csv";
// try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader(file)))
{
br.readLine(); // skips the first (header) line
while ((line = br.readLine()) != null) // readLine returns null at the end of the file
{
String[] stu = line.split(splitBy);
String column3 = stu[3]; // index 0 is the timestamp, so column3 is at index 3
System.out.println(column3);
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
Try this way by using BufferedReader
Input:
timestamp, Column1,column2,column3
2019-05-07 19:17:23,x,y,z
2019-03-30 19:41:33,a,b,c
2019-05-07 19:17:23,x,y,a
2019-03-30 19:41:33,a,b,f
2019-05-07 19:17:23,x,y,x
2019-03-30 19:41:33,a,b,y
Output for this above code is:
z
c
a
f
x
y
A few suggestions:
Use Scanner#nextLine and Scanner#hasNextLine.
Use try-with-resources statement.
Since lines have either whitespace or a comma as the delimiter, use the regex pattern \s+|, (one or more whitespace characters, or a comma) as the parameter to the split method. Alternatively, you can use the character class [\s+,]; note that inside a character class + is a literal plus sign, so this matches a single whitespace character, a plus sign, or a comma.
Demo:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
String blah = "file.csv";
File file = new File(blah);
try (Scanner iterate = new Scanner(file)) {
iterate.nextLine(); // skips the first line
while (iterate.hasNextLine()) {
String line = iterate.nextLine();
String[] values = line.split("[\\s+,]");
System.out.println(Arrays.toString(values));
}
}
}
}
Output:
[2019-05-07, 19:17:23, x, y, z]
[2019-03-30, 19:41:33, a, b, c]
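If you would rather keep the timestamp as one field and get column1 to column3 at indexes 1 to 3, splitting on the comma alone (with optional spaces around it) is enough. A rough sketch; the names are invented here, and it assumes plain CSV without quoted fields that contain commas:

```java
class CsvLineSplitExample {

    /** Splits one CSV line on commas, ignoring spaces around each comma. */
    static String[] splitCsvLine(String line) {
        // The -1 limit keeps empty trailing columns such as "a,b,,".
        return line.trim().split("\\s*,\\s*", -1);
    }

    public static void main(String[] args) {
        String[] values = splitCsvLine("2019-05-07 19:17:23,x,y,z");
        System.out.println(values[0]); // prints 2019-05-07 19:17:23
        System.out.println(values[3]); // prints z
    }
}
```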

How to print specific numbers from txt file?

I have a text file written in the following texts:
18275440:Annette Nguyen:98
93840989:Mary Rochetta:87
23958632:Antoine Yung:79
23658231:Claire Coin:78
23967548:Emma Chung:69
23921664:Jung Kim:98
23793215:Harry Chiu:98
I want to extract last two digit numbers from each line. This is my written code:
for (int i = 3; i < 25; i++) {
line = inFile.nextLine();
String[] split = line.split(":");
System.out.println(split[2]);
}
And I am getting a runtime error.
Update the reading method, if you are using Scanner you can check if there are more lines left or not.
while(inFile.hasNextLine()) {
line = inFile.nextLine();
String[] split = line.split(":");
System.out.println(split[2]);
}
Why the complexity of the for loop specification? You don't use i, so why bother with all that. Don't you just want to read lines until there aren't any more? If you do that, assuming that inFile will let you read lines from it, your code to actually parse each line and extract the number at the end seems right. Here's a complete (minus the class definition) example that uses your parsing logic:
public static void main(String[] args) throws IOException {
// Open the input data file
BufferedReader inFile = new BufferedReader(new FileReader("/tmp/data.txt"));
while(true) {
// Read the next line
String line = inFile.readLine();
// Break out of our loop if we've run out of lines
if (line == null)
break;
// Strip off any whitespace on the beginning and end of the line
line = line.strip();
// If the line is empty, skip it
if (line.isEmpty())
continue;
// Parse the line, and print out the third component, the two digit number at the end of the line
String[] split = line.strip().split(":");
System.out.println(split[2]);
}
}
If there's a file named /tmp/data.txt with the contents you provide in your question, this is the output you get from this code:
98
87
79
78
69
98
98
Don't be so explicit with your loop criteria. Use a counter to acquire the data you want from the file, for example:
int lineCounter = 0;
String line;
while (inFile.hasNextLine()) {
line = inFile.nextLine();
lineCounter++;
if (lineCounter >=3 && lineCounter <= 24) {
String[] split = line.trim().split(":");
System.out.println(split[2]);
}
}
I don't know why your code gives an error. If you have any unwanted lines at the beginning (I see you have 3 such lines in your code), just run an empty loop over them with the scanner.
Scanner scanner = new Scanner(new File("E:\\file.txt"));
String[] split;
// run an empty scanner
for (int i = 1; i <= 3; i++) scanner.nextLine();
while (scanner.hasNextLine()) {
split = scanner.nextLine().split(":");
System.out.println(split[2]);
}
In case you don't know about such lines in advance and they do not follow the format of the other lines, you could use try...catch to skip them. I'm using a simple exception here, but you could also throw an exception of your own when your conditions aren't met.
Suppose your file looks like this:
1
2
3
18275440:Annette Nguyen:98
93840989:Mary Rochetta:87
23958632:Antoine Yung:79
bleh bleh bleh
23658231:Claire Coin:78
23967548:Emma Chung:69
23921664:Jung Kim:98
23793215:Harry Chiu:98
Then your code would be
Scanner scanner = new Scanner(new File("E:\\file.txt"));
String[] split;
// run an empty scanner
// for (int i = 1; i <= 3; i++) scanner.nextLine();
while (scanner.hasNextLine()) {
split = scanner.nextLine().split(":");
try {
System.out.println(split[2]);
} catch (ArrayIndexOutOfBoundsException e) {
// line has fewer than three fields: skip it
}
}
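An alternative to catching ArrayIndexOutOfBoundsException is to check the array length before indexing. A minimal sketch, reading from a String instead of a file so it runs on its own; ScoreExtractor and extractScores are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScoreExtractor {

    /** Returns the third ":"-separated field of every line that has exactly three fields. */
    static List<String> extractScores(String text) {
        List<String> scores = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                String[] split = scanner.nextLine().trim().split(":");
                if (split.length == 3) { // silently skips lines in another format
                    scores.add(split[2]);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        String text = "1\n2\n18275440:Annette Nguyen:98\nbleh bleh bleh\n23658231:Claire Coin:78";
        System.out.println(extractScores(text)); // prints [98, 78]
    }
}
```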
Assuming you're using Java 8, you can take a simpler, less imperative approach by using BufferedReader's lines method, which returns a Stream:
BufferedReader reader = new BufferedReader(new FileReader("/tmp/data.txt"));
reader.lines()
.map(line -> line.split(":")[2])
.forEach(System.out::println);
But, come to think of it, you could avoid BufferedReader by using Files from Java's NIO API:
Files.lines(Paths.get("/tmp/data.txt"))
.map(line -> line.split(":")[2])
.forEach(System.out::println);
You can split on \d+:[\p{L}\s]+: and take the second element from the resulting array. The regex pattern, \d+:[\p{L}\s]+: means a string of digits (\d+) followed by a : which in turn is followed by a string of any combinations of letters and space which in turn is followed by a :
public class Main {
public static void main(String[] args) {
String line = "18275440:Annette Nguyen:98";
String[] split = line.split("\\d+:[\\p{L}\\s]+:");
String n = "";
if (split.length == 2) {
n = split[1].trim();
}
System.out.println(n);
}
}
Output:
98
Note that \p{L} specifies a letter.

Searching in file - java

I will write data into the text file in format like this:
Jonathan 09.5.2015 1
John 10.5.2015 4
Jonathan 11.5.2015 14
Jonathan 12.5.2015 15
Jonathan 13.5.2015 7
Tobias 14.5.2015 9
Jonathan 15.5.2015 6
The last number is hours. I need to make something where I can enter a name and two dates, for example Jonathan, 11.5.2015 and 15.5.2015. All I want to do is count the hours between these dates. The output should look like Jonathan 11.5.2015 - 15.5.2015 - 42 hours. I don't have a problem with the GUI, but I don't know the right way to compute my result.
Assuming that you have to write a method that, given a text file in the above format, a name and two dates, returns the total hours attributed to that person between the two dates, your code can be made very simple:
public int totalHours(Iterable<String> input, String person, String date1, String date2) throws ParseException {
SimpleDateFormat sdf = new SimpleDateFormat("dd.MM.yyyy"); // day.month.year, as in the file
Date start = sdf.parse(date1);
Date end = sdf.parse(date2);
int total = 0;
for (String line : input) { // assuming each string in input is a line
String parts[] = line.split(" ");
if ( ! parts[0].equals(person)) continue; // ignore - not him
Date d = sdf.parse(parts[1]);
if (d.compareTo(end) > 0) break; // if dates are sorted, we're finished
if (d.compareTo(start) >= 0) total += Integer.parseInt(parts[2]); // on or after the start date
}
return total;
}
This code assumes that your input is already split into lines. You write that you already know how to read from files, so this should not be an obstacle. The function would run a lot faster (for repeated queries) if you store all lines in a TreeMap, indexed by their dates. And even more efficient if you built a HashMap<String, TreeMap<Date, Integer> > from the file, where the strings would be people's names and the integers would be the hours on those dates.
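The map-of-TreeMaps idea could be sketched roughly as follows. Note the assumptions: it uses java.time.LocalDate instead of Date (my substitution, not what the code above uses), the names HoursIndex and totalHours are just for illustration, and every line is expected to be in the "name date hours" format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HoursIndex {

    // day.month.year; accepts both "09.5.2015" and "9.5.2015"
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("d.M.yyyy");

    // person -> (date -> hours), dates kept sorted by the TreeMap
    private final Map<String, TreeMap<LocalDate, Integer>> hoursByPerson = new HashMap<>();

    HoursIndex(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            LocalDate date = LocalDate.parse(parts[1], FORMAT);
            hoursByPerson.computeIfAbsent(parts[0], name -> new TreeMap<>())
                    .merge(date, Integer.parseInt(parts[2]), Integer::sum); // sum duplicate dates
        }
    }

    /** Total hours for the person between the two dates, both inclusive (from must not be after to). */
    int totalHours(String person, String from, String to) {
        TreeMap<LocalDate, Integer> byDate = hoursByPerson.get(person);
        if (byDate == null) {
            return 0;
        }
        return byDate.subMap(LocalDate.parse(from, FORMAT), true, LocalDate.parse(to, FORMAT), true)
                .values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        HoursIndex index = new HoursIndex(Arrays.asList(
                "Jonathan 09.5.2015 1", "John 10.5.2015 4", "Jonathan 11.5.2015 14",
                "Jonathan 12.5.2015 15", "Jonathan 13.5.2015 7", "Tobias 14.5.2015 9",
                "Jonathan 15.5.2015 6"));
        System.out.println(index.totalHours("Jonathan", "11.5.2015", "15.5.2015")); // prints 42
    }
}
```

The index is built once, so repeated queries only touch the entries inside the requested date range.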
Edit: one way of doing the file-reading part
There are many ways of reading files. The most standard is the one you describe in your comment. This is a modified version that makes minimal changes to the above totalHours (argument input is now an Iterable<String> instead of String[]). The code has been adapted from
Iterating over the content of a text file line by line - is there a best practice? (vs. PMD's AssignmentInOperand):
public class IterableReader implements Iterable<String> {
private BufferedReader r;
public IterableReader(String fileName) throws IOException {
r = new BufferedReader(new FileReader(fileName));
}
public Iterator<String> iterator() {
return new Iterator<String>() {
String nextLine = readAndIfNullClose();
private String readAndIfNullClose() {
try {
String line = r.readLine();
if (line == null) r.close();
return line;
} catch (IOException e) {
return null;
}
}
@Override
public boolean hasNext() {
return nextLine != null;
}
@Override
public String next() {
String line = nextLine;
nextLine = readAndIfNullClose();
return line;
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
};
}
}
And you should now be able to call it as follows:
System.out.println("Hours worked by Bob from 11.5.2015 to 15.5.2015: "
+ totalHours(new IterableReader("inputfile.txt"),
"Bob", "11.5.2015", "15.5.2015"));
import java.io.*;
class Test{
public static void main(String[] args){
// try-with-resources closes the reader when done
try(BufferedReader f = new BufferedReader(new FileReader("youFile.txt"))){
String something=null;
while((something=f.readLine())!=null){
String[] part= something.split(" ");
}
}catch(IOException e){ // also covers FileNotFoundException
System.err.println(e.getMessage());
}
}
}
After the split, you will get an array "part" with 3 elements
per line, which you can convert to int or keep as String depending on what you
want to do. 0 = name 1 = date 2 = hours (this forever alone number :D)

converting one line string into individual integers

if i have this line in a file: 2 18 4 3
and i want to read it as individual integers, how could i?
i'm using BufferedReader:
BufferedReader(new FileReader("mp1.data.txt"));
i have tried to use:
BufferedReader(new RandomAccessFile("mp1.data.txt"));
so i can use the method
.readChar();
but i got an error
if i use
int w = in.read();
it will read the ASCII, and i want it as it is(in dec.)
i was thinking to read it as a string first, but then could i separate each number?
also i was thinking to let each number in a line, but the file i have is long with numbers
Consider using a Scanner:
Scanner scan = new Scanner(new File("mp1.data.txt"));
You can then use scan.nextInt() (which returns an int, not a String) so long as scan.hasNextInt().
No need for that ugly splitting and parsing :)
However, note that this approach will continue reading integers past the first line (if that's not what you want, you should probably follow the suggestions outlined in the other answers for reading and handling only a single line).
Furthermore, hasNextInt() will return false as soon as a non-integer is encountered in the file. If you require a way to detect and handle invalid data, you should again consider the other answers.
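To make the behaviour described above concrete, here is a small sketch that scans a String rather than the file (Scanner accepts both); the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerIntExample {

    /** Reads integers until the input ends or a non-integer token is encountered. */
    static List<Integer> readInts(String text) {
        List<Integer> numbers = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextInt()) {
                numbers.add(scan.nextInt());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(readInts("2 18 4 3"));  // prints [2, 18, 4, 3]
        System.out.println(readInts("2 18 x 3"));  // prints [2, 18]: stops at "x"
    }
}
```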
It's important to approach larger problems in software engineering by breaking them into smaller ones. In this case, you've got three tasks:
Read a line from the file
Break it into individual parts (still strings)
Convert each part into an integer
Java makes each of these simple:
Use BufferedReader.readLine() to read the line as a string first
It looks like the splitting is as simple as splitting by a space with String.split():
String[] bits = line.split(" ");
If that's not good enough, you can use a more complicated regular expression in the split call.
Parse each part using Integer.parseInt().
Another option for the splitting part is to use the Splitter class from Guava. Personally I prefer that, but it's a matter of taste.
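Put together, the three steps might look something like this. It is only a sketch: the names are made up, a StringReader stands in for the FileReader so the example runs without a file, and it splits on any run of whitespace rather than a single space.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

class LineToIntsExample {

    /** Steps 2 and 3: break the line into parts, then convert each part to an int. */
    static int[] parseLine(String line) {
        // Throws NumberFormatException for blank lines or non-numeric parts.
        return Arrays.stream(line.trim().split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader("2 18 4 3\n"))) {
            String line = reader.readLine();                       // step 1: read one line
            System.out.println(Arrays.toString(parseLine(line)));  // prints [2, 18, 4, 3]
        }
    }
}
```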
You can split() the String and then use the Integer.parseInt() method in order to convert all the elements to Integer objects.
try {
BufferedReader br = new BufferedReader(new FileReader("mp1.data.txt"));
String line = null;
while ((line = br.readLine()) != null) {
String[] split = line.split("\\s");
for (String element : split) {
Integer parsedInteger = Integer.parseInt(element);
System.out.println(parsedInteger);
}
}
}
catch (IOException e) {
System.err.println("Error: " + e);
}
Once you read the line using BufferedReader, you can use String.split(regex) method to split the string by space ("\\s").
for(String s : "2 18 4 3".split("\\s")) {
int i = Integer.parseInt(s);
System.out.println(i);
}
If you use Java 7+, you can use this utility method:
List<String> lines = Files.readAllLines(file, Charset.forName("UTF-8"));
for (String line: lines) {
String[] numbers = line.split("\\s+");
int firstNumber = Integer.parseInt(numbers[0]);
//etc
}
Try this;
try{
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
//split line by whitespace
String[] ints = strLine.split(" ");
int[] integers = new int[ints.length];
// to convert from string to integers - Integer.parseInt ("123")
for ( int i = 0; i < ints.length; i++) {
integers[i] = Integer.parseInt(ints[i]);
}
// now do what you want with your integer
// ...
}
br.close();
} catch (Exception e) {//Catch exception if any
System.err.println("Error: " + e.getMessage());
}

Preserving line breaks and spacing in file IO

I am working on a pretty neat problem challenge that involves reading words from a .txt file. The program must allow for ANY .txt file to be read, ergo the program cannot predict what words it will be dealing with.
Then, it takes the words and makes them their "Pig Latin" counterpart, and writes them into a new file. There are a lot more requirements to this problem but suffice to say, I have every part solved save one... when printing to the new file I am unable to preserve the line spacing. That is to say, if line 1 has 5 words and then there is a break and line 2 has 3 words and a break... the same must be true for the new file. As it stands now, it all works but all the converted words are listed one after the other.
I am interested in learning this so I am OK if you all wish to play coy in your answers. Although I have been at this for 9 hours so "semi-coy" will be appreciated as well :) Please pay close attention to the "while" statements in the code; that is where the file IO action is happening. I am wondering if I need to utilize the nextLine() commands from the scanner and then make a string off that... then make substrings off the nextLine() string to convert the words one at a time. The substrings could be splits or tokens, or something else - I am unclear on this part, and token attempts are giving me compiler errors and exceptions ("java.util.NoSuchElementException") - I do not seem to understand the correct call for a split command. I tried something like String a = scan.nextLine() where "scan" is my scanner var. Then tried String b = a.split(), no go. Anyway here is my code, see if you can figure out what I am missing.
Here is code and thank you very much in advance Java gods....
import java.util.*;
import javax.swing.*;
import java.io.*;
import java.text.*;
public class PigLatinTranslator
{
static final String ay = "ay"; // "ay" is added to the end of every word in pig latin
public static void main(String [] args) throws IOException
{
File nonPiggedFile = new File(...);
String nonPiggedFileName = nonPiggedFile.getName();
Scanner scan = new Scanner(nonPiggedFile);
nonPiggedFileName = ...;
File pigLatinFile = new File(nonPiggedFileName + "-pigLatin.txt"); //references a file that may or may not exist yet
pigLatinFile.createNewFile();
FileWriter newPigLatinFile = new FileWriter(nonPiggedFileName + "-pigLatin.txt", true);
PrintWriter PrintToPLF = new PrintWriter(newPigLatinFile);
while (scan.hasNext())
{
boolean next;
while (next = scan.hasNext())
{
String nonPig = scan.next();
nonPig = nonPig.toLowerCase();
StringBuilder PigLatWord = new StringBuilder(nonPig);
PigLatWord.insert(nonPig.length(), nonPig.charAt(0) );
PigLatWord.insert(nonPig.length() + 1, ay);
PigLatWord.deleteCharAt(0);
String plw = PigLatWord.toString();
if (plw.contains("!") )
{
plw = plw.replace("!", "") + "!";
}
if (plw.contains(".") )
{
plw = plw.replace(".", "") + ".";
}
if (plw.contains("?") )
{
plw = plw.replace("?", "") + "?";
}
PrintToPLF.print(plw + " ");
}
PrintToPLF.close();
}
}
}
Use BufferedReader, not Scanner. http://docs.oracle.com/javase/6/docs/api/java/io/BufferedReader.html
I leave that part of it as an exercise for the original poster, it's easy once you know the right class to use! (And hopefully you learn something instead of copy-pasting my code).
Then pass the entire line into functions like this: (note this does not correctly handle quotes as it puts all non-apostrophe punctuation at the end of the word). Also it assumes that punctuation is supposed to go at the end of the word.
private static final String vowels = "AEIOUaeiou";
private static final String punct = ".,!?";
public static String pigifyLine(String oneLine) {
StringBuilder pigified = new StringBuilder();
boolean first = true;
for (String word : oneLine.split(" ")) {
if (!first) pigified.append(" ");
pigified.append(pigify(word));
first = false;
}
return pigified.toString();
}
public static String pigify(String oneWord) {
char[] chars = oneWord.toCharArray();
StringBuilder consonants = new StringBuilder();
StringBuilder newWord = new StringBuilder();
StringBuilder punctuation = new StringBuilder();
boolean consDone = false; // set to true when the first consonant group is done
for (int i = 0; i < chars.length; i++) {
// consonant
if (vowels.indexOf(chars[i]) == -1) {
// punctuation
if (punct.indexOf(chars[i]) > -1) {
punctuation.append(chars[i]);
consDone = true;
} else {
if (!consDone) { // we haven't found the consonants
consonants.append(chars[i]);
} else {
newWord.append(chars[i]);
}
}
} else {
consDone = true;
// vowel
newWord.append(chars[i]);
}
}
if (consonants.length() == 0) {
// vowel words are "about" -> "aboutway"
consonants.append("w");
}
consonants.append("ay");
return newWord.append(consonants).append(punctuation).toString();
}
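For the line-by-line part, one possible shape is sketched below. It is not tied to the pig latin code: the per-line conversion is passed in as a function, and the names LinePreservingCopy and transformLines are hypothetical. The point is only that reading with readLine() and writing with println() keeps one output line per input line, including empty ones.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.UnaryOperator;

class LinePreservingCopy {

    /** Applies lineTransform to every input line and writes each result on its own line. */
    static void transformLines(Reader in, Writer out, UnaryOperator<String> lineTransform) {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // println ends the line with the platform line separator
                writer.println(lineTransform.apply(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.flush();
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        // String::toUpperCase stands in for a real per-line conversion
        transformLines(new StringReader("five words on line one\n\nthree words here"), out, String::toUpperCase);
        System.out.print(out);
    }
}
```

With files, the Reader and Writer would be a FileReader and FileWriter, closed by the caller.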
You could try to store the count of words per line in a separate data structure, and use that as a guide for when to move on to the next line when writing the file.
I purposely made this semi-vague for you, but can elaborate on request.
