I'm trying to read text inside a .txt document using console command java program < doc.txt. The program should look for words inside a file, and the file CAN contain empty new lines, so I've tried changing the while condition from:
while((s = in.nextLine()) != null)
to:
while((s = in.nextLine()) != "-1")
making it stop when it would have found -1 (I've also tried with .equals()), but it does not work. How can I tell my program to stop searching for words when there's no more text to examine? Otherwise it keeps stopping when it finds an empty string (newline alone or sequence of new lines).
I've only found solutions using BufferedReader, but I don't know how to use it in this situation where the file is being read by the console command java program < doc.txt.
Here is the code inside the while loop, in case it is needed:
while((s = in.nextLine()) != null) {
s = s.toLowerCase();
Scanner line = new Scanner(s);
a = line.next();
if(a.equals("word")) {
k++;
}
}
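The proper way to detect that a Scanner has run out of input is to check hasNextLine(). nextLine() never returns null; at the end of input it throws NoSuchElementException, so comparing its result with null or "-1" cannot end the loop. Use this loop to read a sequence of strings that includes empty lines: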
Scanner in = new Scanner(System.in);
while(in.hasNextLine()) {
String s = in.nextLine();
System.out.println(s);
}
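Applied to the loop from the question, a minimal sketch might look like the following. The class name WordCounter and the method countWord are made up for illustration, and it assumes that every token on a line should be compared (the original only looked at the first one). The hasNext() check on the per-line scanner is what keeps an empty line from throwing NoSuchElementException.

```java
import java.util.Scanner;

class WordCounter {
    // Counts tokens equal to target, ignoring case; target is expected in lower case.
    static int countWord(Scanner in, String target) {
        int k = 0;
        while (in.hasNextLine()) {                 // false only at end of input
            String s = in.nextLine().toLowerCase();
            Scanner line = new Scanner(s);
            while (line.hasNext()) {               // an empty line simply has no tokens
                if (line.next().equals(target)) {
                    k++;
                }
            }
        }
        return k;
    }

    public static void main(String[] args) {
        // Run as: java WordCounter < doc.txt
        System.out.println(countWord(new Scanner(System.in), "word"));
    }
}
```

Because the method takes a Scanner, the same code works for redirected standard input and for a Scanner built over a String or a File.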
Can someone tell me how to read every second line from a file in java?
BufferedReader br = new BufferedReader(new FileReader(file));
String line = br.readLine();
while(line != null){
//Do something ..
line = br.readLine();
}
br.close();
One simple way would be to just maintain a counter of number of lines read:
int count = 0;
String line;
while ((line = br.readLine()) != null) {
if (count % 2 == 0) {
// do something with this line
}
++count;
}
But this still technically reads every line in the file, only choosing to process every other line. If you really only want to read every second line, then something like RandomAccessFile might be necessary.
You can do it in Java 8 fashion with very few lines:
static final int FIRST_LINE = 1;
Stream<String> lines = Files.lines(path);
String secondLine = lines.limit(2).skip(FIRST_LINE).collect(Collectors.joining("\n"));
First you stream your file lines
You keep only the two first lines
Skip the first line
Note: In Java 8, when using Files.lines(), you are supposed to close the stream afterwards or use it in a try-with-resources block.
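The note about closing the stream could be sketched as follows. The class SecondLine and the method secondLine are hypothetical names; skip(1).findFirst() is used here in place of limit/skip/joining, which is a small substitution that returns an empty Optional when the file has fewer than two lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

class SecondLine {
    // Returns the second line of the file, if there is one.
    static Optional<String> secondLine(Path path) throws IOException {
        // try-with-resources closes the stream (and the underlying file) on exit
        try (Stream<String> lines = Files.lines(path)) {
            return lines.skip(1).findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(secondLine(Paths.get(args[0])).orElse("(no second line)"));
    }
}
```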
This is similar to @Tim Biegeleisen's approach, but I thought I would show an alternative to get every other line using a boolean instead of a counter:
boolean skipOddLine = true;
String line;
while ((line = br.readLine()) != null) {
if (skipOddLine = !skipOddLine) {
//Use the String line here
}
}
This will toggle the boolean value every loop iteration, skipping every odd line. If you want to skip every even line instead you just need to change the initial condition to boolean skipOddLine = false;.
Note: This approach only works if you do not need to extend functionality to skip every 3rd line for example, where an approach like Tim's would be easier to modify. It also has the downside of being harder to read than the modulo approach.
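To illustrate why the modulo approach is easier to extend, here is a small sketch that keeps every n-th line. The class EveryNthLine and the method everyNth are invented for this example; the reader can wrap a file or any other source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class EveryNthLine {
    // Returns lines number n, 2n, 3n, ... (counting from 1) read from br.
    static List<String> everyNth(BufferedReader br, int n) throws IOException {
        List<String> kept = new ArrayList<>();
        int count = 0;
        String line;
        while ((line = br.readLine()) != null) {
            count++;
            if (count % n == 0) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Calling everyNth(br, 2) gives every second line, and changing the argument to 3 gives every third line without touching the loop.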
This should help you:
You can use try-with-resources
You can use the Java 8 Stream API
You can use a Supplier to obtain a Stream from the reader more than once
I have added comments to the code to explain it
try (BufferedReader reader =
        new BufferedReader(
            new InputStreamReader(
                new ByteArrayInputStream(x.getBytes()),
                "UTF-8"))) { // an explicit charset helps when reading files in various languages
    // each get() returns a new Stream over the lines the reader has not consumed yet
    Supplier<Stream<String>> fileContentStream = reader::lines;
    if (FilenameUtils.getExtension(x.getOriginalFilename()).equals("txt")) { // filter by file extension
        String secondLine =
            fileContentStream
                .get()
                .limit(2)
                .skip(1) // change the skip count to pick a different line
                .collect(Collectors.joining("\n"));
    } else if (FilenameUtils.getExtension(x.getOriginalFilename()).equals("pdf")) {
        // handle PDF files here
    }
} catch (Exception ex) {
    // handle or log the exception
}
I am trying to go over a bunch of files, read each of them, and remove all stopwords from a specified list with such words. The result is a disaster - the content of the whole file copied over and over again.
What I tried:
- Saving the file as String and trying to look with regex
- Saving the file as String and going over line by line and comparing tokens to the stopwords that are stored in a LinkedHashSet, I can also store them in a file
- tried to twist the logic below in multiple ways, getting more and more ridiculous output.
- tried looking into text / line with the .contains() method, but no luck
My general logic is as follows:
for every word in the stopwords set:
while(file has more lines):
save current line into String
while (current line has more tokens):
assign current token into String
compare token with current stopword:
if(token equals stopword):
write in the output file "" + " "
else: write in the output file the token as is
Tried what's in this question and many other SO questions, but just can't achieve what I need.
Real code below:
private static void removeStopWords(File fileIn) throws IOException {
File stopWordsTXT = new File("stopwords.txt");
System.out.println("[Removing StopWords...] FILE: " + fileIn.getName() + "\n");
// create file reader and go over it to save the stopwords into the Set data structure
BufferedReader readerSW = new BufferedReader(new FileReader(stopWordsTXT));
Set<String> stopWords = new LinkedHashSet<String>();
for (String line; (line = readerSW.readLine()) != null; readerSW.readLine()) {
// trim() eliminates leading and trailing spaces
stopWords.add(line.trim());
}
File outp = new File(fileIn.getPath().substring(0, fileIn.getPath().lastIndexOf('.')) + "_NoStopWords.txt");
FileWriter fOut = new FileWriter(outp);
Scanner readerTxt = new Scanner(new FileInputStream(fileIn), "UTF-8");
while(readerTxt.hasNextLine()) {
String line = readerTxt.nextLine();
System.out.println(line);
Scanner lineReader = new Scanner(line);
for (String curSW : stopWords) {
while(lineReader.hasNext()) {
String token = lineReader.next();
if(token.equals(curSW)) {
System.out.println("---> Removing SW: " + curSW);
fOut.write("" + " ");
} else {
fOut.write(token + " ");
}
}
}
fOut.write("\n");
}
fOut.close();
}
What happens most often is that it looks for the first word from the stopWords set and that's it. The output contains all the other words even if I manage to remove the first one. And the first will be there in the next appended output in the end.
Part of my stopword list
about
above
after
again
against
all
am
and
any
are
as
at
With tokens I mean words, i.e. getting every word from the line and comparing it to the current stopword
After a while of debugging, I believe I have found the solution. This problem is tricky because it involves several different scanners and file readers. Here is what I did:
I changed how you added to your StopWords set, as it wasn't adding them correctly. I used a buffered reader to read each line, then a scanner to read each word, then added it to the set.
Then when you compared them I got rid of one of your loops as you can easily use the .contains() method to check if the word was a stopWord.
I left you to do the part of writing to the file to take out the stop words, as I'm sure you can figure that out now that everything else is working.
-My sample stop words txt file:
Stop words
Words
-My sample input file was exactly the same, so it should catch all three words.
The code:
// Create a file reader and go over it to save the stop words into the Set
BufferedReader readerSW = new BufferedReader(new FileReader("stopWords.txt"));
Set<String> stopWords = new LinkedHashSet<String>();
String stopWordsLine = readerSW.readLine();
while (stopWordsLine != null) {
    Scanner words = new Scanner(stopWordsLine);
    // hasNext() is false for an empty line, so blank lines are skipped safely
    while (words.hasNext()) {
        stopWords.add(words.next()); // add each stop word on this line to the set
    }
    words.close();
    stopWordsLine = readerSW.readLine();
}
readerSW.close();
BufferedReader outp = new BufferedReader(new FileReader("Words.txt"));
String line = outp.readLine();
while (line != null) {
    Scanner lineReader = new Scanner(line);
    while (lineReader.hasNext()) { // loop over each word on the line
        String token = lineReader.next();
        if (stopWords.contains(token)) {
            System.out.println("removing " + token);
        }
    }
    lineReader.close();
    line = outp.readLine();
}
outp.close();
Output:
removing Stop
removing words
removing Words
Let me know if I can elaborate any more on my code or why I did something!
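For the part that was left open (actually dropping the stop words from a line before writing it out), a minimal sketch might look like this. The class StopWordFilter and the method removeStopWords(String, Set) are hypothetical, and matching is case-sensitive, as with contains() above. Two details in the question's code are worth noting: its for loop calls readerSW.readLine() both in the condition and in the update clause, so every other stop word is skipped, and its inner while loop exhausts lineReader during the first stop word, which is why only the first stop word was ever checked.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class StopWordFilter {
    // Returns the line with every token found in stopWords removed.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : line.trim().split("\\s+")) {
            // split() yields one empty token for a blank line; skip it
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("about", "above", "after", "all", "and"));
        System.out.println(removeStopWords("all about cats and dogs", stopWords)); // prints "cats dogs"
    }
}
```

Each filtered line can then be written to the output file followed by a newline, one set lookup per token instead of one pass per stop word.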
I'm trying to read a file line by line, but every time I run my program I get a NullPointerException at the line spaceIndex = line.indexOf(" "); which obviously means that line is null. However, I know for a fact that the file I'm using has exactly 7 lines (even if I print the value of numLines, I get the value 7). And yet I still get a NullPointerException when I try to read a line into my string.
// File file = some File I take in after clicking a JButton
Charset charset = Charset.forName("US-ASCII");
try (BufferedReader reader = Files.newBufferedReader(file.toPath(), charset)) {
String line = "";
int spaceIndex;
int numLines = 0;
while(reader.readLine()!=null) numLines++;
for(int i = 0; i<numLines; i++) {
line = reader.readLine();
spaceIndex = line.indexOf(" ");
System.out.println(spaceIndex);
}
PS: (I'm not actually using this code to print the index of the space, I replaced the code in my loop since there's a lot of it and it would make it longer to read)
If I'm going about reading the lines the wrong way, it would be great if someone could suggest another way, since so far every way I've tried gives me the same exception. Thanks.
By the time you start your for loop, the reader is already at the end of the file (from the while loop). Therefore, readLine() will always return null.
You should get rid of the for loop and do all of your work in the while loop as you first read the file.
You have two options.
First, you could read the number of lines in a file this way:
LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")));
lnr.skip(Long.MAX_VALUE);
System.out.println(lnr.getLineNumber());
Then read the file right after:
while((line = reader.readLine())!=null)
{
spaceIndex = line.indexOf(" ");
System.out.println(spaceIndex);
}
This first option is an alternative (and, in my opinion, cooler) way of doing this.
Second option (and probably the more sensible) is to do it all at once in the while loop:
while((line = reader.readLine())!=null)
{
numLines++;
spaceIndex = line.indexOf(" ");
System.out.println(spaceIndex);
}
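If the line count is needed before the lines are processed, another possibility (not taken from the answers above) is to load all lines into a list first, assuming the file is small enough to fit in memory. SpaceIndexes and spaceIndexes are names made up for this sketch; Files.readAllLines is the standard java.nio.file call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class SpaceIndexes {
    // Returns the index of the first space in each line (-1 if a line has none).
    static int[] spaceIndexes(List<String> lines) {
        int[] result = new int[lines.size()];
        for (int i = 0; i < lines.size(); i++) {
            result[i] = lines.get(i).indexOf(" ");
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The list holds every line, so its size is the line count
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.US_ASCII);
        System.out.println("numLines = " + lines.size());
        for (int index : spaceIndexes(lines)) {
            System.out.println(index);
        }
    }
}
```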
I am writing a class that will read lines from a log file when it is updated.
I am using Apache VFS2 to get a method called when a file is updated. My main issue is that I don't want to read a line from the file if the line is not complete yet, as in it does not have a "\n" or "\r" line separator character at the end. I think I have looked at all the Java libraries I can for reading lines, but they all discard the EOF and line termination information, so I don't think I can use them.
Instead I am looking at reading the file in byte by byte and then checking the result to discard everything that comes after the last line separator. I was wondering what you folks think is the best method for doing this.
So for example:
2013-Jul-01_14:07:17.875 - Connection to Message Bus is reestablished<LF>
2013-Jul-01_14:07:17.875 - Connection to Message Bus is reestablished<LF>
2013-Jul-01_14:15:08.205 - No connection to Message Bus - reestablish before we can publish<LF>
2013-Jul-01_14:15:08.205 - NOT A REAL LINE PLEASE DONT READ
I want to read in the first 3 but not the fourth, as it doesn't have a line feed or carriage return character.
I have looked at the Apache commons-io Tailer stuff but I can't tell if that will give me "incomplete" lines (and I realize I will have to ditch the VFS2 stuff to use it).
So, pseudo-code:
private void ingestFileObject(FileObject file) {
BufferedInputStream bs = new BufferedInputStream(file.getContent().getInputStream());
StringBuilder result = new StringBuilder();
while (bs.available() > 0) {
result.append((char) bs.read());
}
bs.close();
String resultString = result.toString();
//determine what part of resultString is after last carriage return/line separator (using regex [\\r\\n]+?)
//remove the offending part of String.
}
Or any other solutions completely ignoring my pseudo-code are welcome at this point too...
Thanks
Would using Scanner help you?
Scanner scanner = new Scanner(file);
//blocks until there is something with a new line
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
//do processing.
}
This is what I ended up doing:
BufferedReader bufReader = new BufferedReader(new InputStreamReader(file.getContent().getInputStream()));
StringBuilder result = new StringBuilder();
int readInInt = -1;
String charsSinceLastLineSep = "";
if (bufReader.ready()) {
while (-1 != (readInInt = bufReader.read())) {
char readInChar = (char) readInInt;
// if new line reset line buffer, otherwise add to buffer
if (readInChar == '\n' || readInChar == '\r') {
charsSinceLastLineSep = "";
} else {
charsSinceLastLineSep += readInChar;
}
result.append(readInChar);
}
}
bufReader.close();
// remove all characters added since last Carriage Return or NewLine was found indicating
// that line was not a complete log line
String resultString = (result.subSequence(0, (result.length() - charsSinceLastLineSep.length())).toString());
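The trimming step could also be written with lastIndexOf, without keeping a separate buffer of the characters seen since the last separator. This is only a sketch of the same idea; CompleteLines and completeLinesOnly are hypothetical names.

```java
class CompleteLines {
    // Returns the prefix of text that ends at the last '\n' or '\r',
    // or an empty string if text contains no line separator at all.
    static String completeLinesOnly(String text) {
        int lastSep = Math.max(text.lastIndexOf('\n'), text.lastIndexOf('\r'));
        // lastSep is -1 when no separator exists, which makes this substring(0, 0)
        return text.substring(0, lastSep + 1);
    }

    public static void main(String[] args) {
        String log = "line one\nline two\nNOT A REAL LINE";
        System.out.print(completeLinesOnly(log)); // prints the first two lines only
    }
}
```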
So I'm writing a java program that is supposed to keep analyzing strings from the user until the end of standard input (until they press CTRL+D or the end of an input file). The program works as intended, however when I press CTRL+D, there is a null pointer exception. Here is the code in question:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
String line = " ";
while (line != null) {
line = in.readLine();
String[] tokens = line.split(" ");
System.out.println(line); ......
The null pointer is aimed at String[] tokens = line.split(" ");
It looks like the code is trying to tokenize a line that is null. But I thought I wrote it in a way to never attempt to tokenize a null line. Can anyone help me out?
Change your while loop to:
while ((line = in.readLine())!= null) {
And remove the first line inside it.
Note that you were reading the line inside the loop and only testing it at the start of the next iteration, so you would get an NPE at the end of the input.
Also, if you are tokenizing the file after reading, I would prefer to use Scanner class indeed.
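A Scanner-based version of the same loop might look like the sketch below. TokenizeInput and tokenizeLines are names invented here; hasNextLine() returns false at end of input (CTRL+D or the end of a redirected file), so no null check is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenizeInput {
    // Reads lines until end of input and splits each one on single spaces.
    static List<String[]> tokenizeLines(Scanner in) {
        List<String[]> all = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine();   // never null inside this loop
            all.add(line.split(" "));
        }
        return all;
    }

    public static void main(String[] args) {
        for (String[] tokens : tokenizeLines(new Scanner(System.in))) {
            System.out.println(String.join("|", tokens));
        }
    }
}
```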
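The problem is that while checks its condition before each iteration. At the beginning of the last iteration, line isn't null yet; it only becomes null inside the body, after the check. A do-while would not help either, because the null line would still be used before the condition is tested, so the read and the null test need to happen together in the loop condition.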
while (line != null) {
line = in.readLine(); // in the loop, but in.readLine() returns null
String[] tokens = line.split(" "); // line is null here, so this throws the NPE
System.out.println(line);
}