I am trying to read the words of a file into a stream and then count the number of times the word "the" appears in the file. I cannot seem to figure out an efficient way of doing this with only streams.
Example: If the file contained a sentence such as "The boy jumped over the river.", the output would be 2.
This is what I've tried so far:
public static void main(String[] args){
String filename = "input1";
try (Stream<String> words = Files.lines(Paths.get(filename))){
long count = words.filter( w -> w.equalsIgnoreCase("the"))
.count();
System.out.println(count);
} catch (IOException e){
}
}
As the name suggests, Files.lines returns a stream of lines, not words. If you want to iterate over words, you can use a Scanner like this:
Scanner sc = new Scanner(new File(fileLocation));
while(sc.hasNext()){
String word = sc.next();
//handle word
}
If you really want to use streams, you can split each line and then map your stream to those words:
try (Stream<String> lines = Files.lines(Paths.get(filename))){
long count = lines
.flatMap(line->Arrays.stream(line.split("\\s+"))) //add this
.filter( w -> w.equalsIgnoreCase("the"))
.count();
System.out.println(count);
} catch (IOException e){
e.printStackTrace(); // at least print the exception so you know what went wrong
}
BTW, you shouldn't leave catch blocks empty; at least print the exception that was thrown so you have more information about the problem.
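One caveat with splitting on whitespace alone: punctuation stays attached to the words, so a sentence ending in "the." would not be counted. Below is a minimal sketch of a variant that splits on runs of non-word characters instead. The names WordCounter and countWord are made up for illustration, and the lines are passed in as a stream so the same method should also work with Files.lines(...).

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original answer.
class WordCounter {

    // Counts case-insensitive occurrences of target in a stream of lines.
    // "\\W+" splits on runs of non-word characters (ASCII-only by default).
    static long countWord(Stream<String> lines, String target) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                .filter(word -> word.equalsIgnoreCase(target))
                .count();
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of("The boy jumped over the river.");
        System.out.println(countWord(lines, "the")); // prints 2
    }
}
```

For a real file, the call would look like countWord(Files.lines(Paths.get(filename)), "the") inside the same try-with-resources block as above.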
You could use Java's StreamTokenizer for this purpose.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StreamTokenizer;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
public class Main {
public static void main(String[] args) throws IOException {
long theWordCount = 0;
String input = "The boy jumped over the river.";
try (InputStream stream = new ByteArrayInputStream(
input.getBytes(StandardCharsets.UTF_8))) {
StreamTokenizer tokenizer =
new StreamTokenizer(new InputStreamReader(stream, StandardCharsets.UTF_8));
int tokenType = 0;
while ( (tokenType = tokenizer.nextToken())
!= StreamTokenizer.TT_EOF) {
if (tokenType == StreamTokenizer.TT_WORD) {
String word = tokenizer.sval;
if ("the".equalsIgnoreCase(word)) {
theWordCount++;
}
}
}
}
System.out.println("The word 'the' count is: " + theWordCount);
}
}
Use the stream reader to calculate the number of words.
Related
I'm trying to extract data from a CSV file, in which I have the following example CSV
timestamp, Column1,column2,column3
2019-05-07 19:17:23,x,y,z
2019-03-30 19:41:33,a,b,c
etc.
Currently, my code is as follows:
public static void main(String[]args){
String blah = "file.csv";
File file = new File(blah);
try{
Scanner iterate = new Scanner(file);
iterate.next(); //skips the first line
while(iterate.hasNext()){
String data = iterate.next();
String[] values = data.split(",");
Float nbr = Float.parseFloat(values[2]);
System.out.println(nbr);
}
iterate.close();
}catch (FileNotFoundException e){
e.printStackTrace();
}
}
However, my code is giving me an error
java.lang.ArrayIndexOutOfBoundsException: Index 3 is out of bounds for length 3
My theory is that the split is the problem: since there is no comma, my program thinks the array ends after the first element. (I've tested it with the timestamp column and it seems to work; however, I want to print the values in column 3.)
How do I use the split function to get the column1, column2, and column3 values?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class Sample {
    public static void main(String[] args) {
        String splitBy = ",";
        String file = "blah.csv";
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            br.readLine(); // skips the header line
            String line;
            while ((line = br.readLine()) != null) { // readLine() returns null at end of file
                String[] stu = line.split(splitBy);
                String column3 = stu[3]; // fourth field, i.e. column3
                System.out.println(column3);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Try this way by using BufferedReader
Input:
timestamp, Column1,column2,column3
2019-05-07 19:17:23,x,y,z
2019-03-30 19:41:33,a,b,c
2019-05-07 19:17:23,x,y,a
2019-03-30 19:41:33,a,b,f
2019-05-07 19:17:23,x,y,x
2019-03-30 19:41:33,a,b,y
Output for this above code is:
z
c
a
f
x
y
A few suggestions:
Use Scanner#nextLine and Scanner#hasNextLine.
Use try-with-resources statement.
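Since lines have either whitespace or a comma as the delimiter, use the regex pattern \s+|, as the parameter to the split method. The pattern \s+|, means one or more whitespace characters, or a comma. Alternatively, you can use the character class [\s+,], which matches a single whitespace character, a literal + or a comma (inside a character class, + is not a quantifier, so [\s,] behaves the same for this data).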
Demo:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
String blah = "file.csv";
File file = new File(blah);
try (Scanner iterate = new Scanner(file)) {
iterate.nextLine(); // skips the first line
while (iterate.hasNextLine()) {
String line = iterate.nextLine();
String[] values = line.split("[\\s+,]");
System.out.println(Arrays.toString(values));
}
}
}
}
Output:
[2019-05-07, 19:17:23, x, y, z]
[2019-03-30, 19:41:33, a, b, c]
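If the timestamp should stay in one piece, splitting on the comma alone (with optional whitespace around it) is an alternative. This is a sketch under that assumption; CsvColumns and parseLine are illustrative names, and it does not handle quoted fields that contain commas.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
class CsvColumns {

    // Splits one CSV line on commas, dropping whitespace around each comma.
    // Assumes there are no quoted fields containing commas.
    static String[] parseLine(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] values = parseLine("2019-05-07 19:17:23,x,y,z");
        System.out.println(Arrays.toString(values)); // [2019-05-07 19:17:23, x, y, z]
        System.out.println(values[3]);               // z
    }
}
```

With this split, index 0 is the whole timestamp and indexes 1 to 3 are Column1, column2 and column3.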
So, I have a separate program that requires a string of numbers in the format: final static private String INITIAL = "281043765"; (no spaces). This program works perfectly so far with the hard-coded assignment of the numbers. Instead of typing the numbers in the code, I need the program to read a txt file and assign the numbers in the file to the INITIAL variable.
To do this, I'm attempting to use StringTokenizer. My implementation outputs [7, 2, 4, 5, 0, 6, 8, 1, 3]. I need it to output the numbers without the "[]" or the "," or any spaces between each number. Making it look exactly like INITIAL. I'm aware I probably need to put the [] and , as delimiters but the original txt file is purely numbers and I don't believe that will help. Here is my code. (ignore all the comments please)
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;
public class test {
//public static String num;
static ArrayList<String> num;
//public static TextFileInput myFile;
public static StringTokenizer myTokens;
//public static String name;
//public static String[] names;
public static String line;
public static void main(String[] args) {
BufferedReader reader = null;
try {
File file = new File("test3_14.txt");
reader = new BufferedReader(new FileReader(file));
num = new ArrayList<String>();
while ((line = reader.readLine())!= null) {
myTokens = new StringTokenizer(line, " ,");
//num = new String[myTokens.countTokens()];
while (myTokens.hasMoreTokens()){
num.add(myTokens.nextToken());
}
}
System.out.println(num);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
You are currently printing the default .toString() implementation of ArrayList. Try this instead:
for (String nbr : num) {
System.out.print(nbr);
}
To get rid of the brackets, you have to actually call each item in the ArrayList. Try this just below your System.out.println
for(String number : num)
System.out.print(number);
System.out.println("");
Can you also provide sample input data?
Try replacing the spaces and commas with an empty string:
line = StringUtils.replaceAll(line, ",", "");
line = StringUtils.replaceAll(line, " ", "");
System.out.println(line);
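Another option that avoids the print loop is to join the tokens into a single string, which can then be used wherever INITIAL was hard-coded. A sketch, assuming the digits in the file are separated by spaces and/or commas as in the question's tokenizer; DigitJoiner and joinTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration only.
class DigitJoiner {

    // Collects the tokens of a line and joins them without any separator.
    static String joinTokens(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, " ,");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return String.join("", tokens);
    }

    public static void main(String[] args) {
        System.out.println(joinTokens("7 2 4 5 0 6 8 1 3")); // prints 724506813
    }
}
```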
I am trying to search for a string in a file in Java, and this is what I tried. With the program below I get "No data found" as output, although I am sure the file contains the word I am searching for.
import java.io.*;
import java.util.Scanner;
public class readfile {
static String[] list;
static String sear = "CREATE";
public void search() {
Scanner scannedFile = new Scanner("file.txt");
while (scannedFile.hasNext()) {
String search = scannedFile.next();
System.out.println("SEARCH CONTENT:"+search);
if (search.equalsIgnoreCase(sear)) {
System.out.println("Found: " +search);
}
else {
System.out.println("No data found.");
}
}
}
public static void main (String [] args) throws IOException {
readfile read = new readfile();
read.search();
}
}
Don't do:
search.equalsIgnoreCase(sear)
Try:
search.toUpperCase().contains(sear)
I think search holds the whole content of the file, so equals would never return true.
Use nextLine() instead of next(), and then use split.
What's the difference between next() and nextLine() methods from Scanner class?
Difference:
next() reads the input only up to the next space, so it cannot read two words separated by a space. Also, next() leaves the cursor on the same line after reading the input.
nextLine() reads the input including the spaces between words (that is, it reads until the end of the line, \n). Once the input is read, nextLine() positions the cursor at the start of the next line.
Use the following code:
String search = scannedFile.nextLine();
String[] pieces = search.split("\\s+");
for(int i=0; i<pieces.length; i++)
{
if(pieces[i].equalsIgnoreCase(sear))
{
System.out.println("Found: " +search);
}
else
{
System.out.println("No data found.");
}
}
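The difference between the two methods can be seen in isolation with a Scanner over an in-memory string. This is only a small sketch; NextVsNextLine and its two methods are made-up names.

```java
import java.util.Scanner;

// Hypothetical demo class for illustration only.
class NextVsNextLine {

    // next() stops at the first whitespace after the token.
    static String firstToken(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.next();
        }
    }

    // nextLine() returns everything up to the line separator.
    static String firstLine(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.nextLine();
        }
    }

    public static void main(String[] args) {
        String input = "hello world\nsecond line";
        System.out.println(firstToken(input)); // prints hello
        System.out.println(firstLine(input));  // prints hello world
    }
}
```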
Ok, here is my understanding of your program.
You search in the file file.txt the word CREATE.
To do so, you read each word in the file and if it is CREATE you print Found create.
The issue here is that for every word in the file, if it isn't CREATE you print No data found.
Instead you should wait for the end of the file and then if you haven't found it you will print the error message.
Try this:
import java.io.*;
import java.util.ArrayList;
import java.util.List;
public class readfile {
static String[] list;
static String sear = "CREATE";
public void search() throws IOException {
List<String> saveAllLinesForRewriting = new ArrayList<String>();
// Need to read file line by line
BufferedReader bufferedReader = new BufferedReader(new FileReader("file.txt"));
String saveLine;
while ((saveLine = bufferedReader.readLine()) != null) {
saveAllLinesForRewriting.add(saveLine);
}
bufferedReader.close();
// Check if your word exists
if (saveAllLinesForRewriting.toString().contains(sear)) {
System.out.println("Found: " + sear);
} else {
System.out.println("No data found.");
}
}
public static void main(String[] args) throws IOException {
readfile read = new readfile();
read.search();
}
}
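The "decide only at the end" idea can also be kept with a word-by-word Scanner loop by returning a single result for the whole input. A sketch: it takes the text as a String so it runs without a file (for a real file the source would be new Scanner(new File(...))), and WordSearch and containsWord are illustrative names.

```java
import java.util.Scanner;

// Hypothetical helper for illustration only.
class WordSearch {

    // Returns true if any whitespace-separated token equals word, ignoring case.
    static boolean containsWord(String text, String word) {
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                if (scanner.next().equalsIgnoreCase(word)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean found = containsWord("drop table t; CREATE index i;", "CREATE");
        // One message for the whole input instead of one per word.
        System.out.println(found ? "Found: CREATE" : "No data found.");
    }
}
```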
Instead of having the Scanner read the string, first create a File object by adding the line below
File file = new File("Full Path of file location");
before
Scanner scannedFile = new Scanner("file.txt");
and change that line to
Scanner scannedFile = new Scanner(file);
The rest of your code works fine.
The problem is that the scanner is scanning the String "file.txt" and not the file.
To fix this you have to do what amit28 says. Your final code is as follows:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
public class readfile {
static String[] list;
static String sear = "CREATE";
public void search() {
File f = new File("file.txt");
Scanner scannedFile;
try {
scannedFile = new Scanner(f);
while (scannedFile.hasNext()) {
String search = scannedFile.next();
System.out.println("SEARCH CONTENT:"+search);
if (search.equalsIgnoreCase(sear)) {
System.out.println("Found: " +search);
}
else {
System.out.println("No data found.");
}
}
} catch (FileNotFoundException e) {
// FIXME Auto-generated catch block
e.printStackTrace();
}
}
public static void main (String [] args) throws IOException {
readfile read = new readfile();
read.search();
}
}
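To see the original problem in isolation: a Scanner built from a String tokenizes that string itself, so the only token it yields is the file name. A small sketch (ScannerSourceDemo and firstToken are made-up names):

```java
import java.util.Scanner;

// Hypothetical demo class for illustration only.
class ScannerSourceDemo {

    // Returns the first token of a Scanner constructed from a String source,
    // or null if the source contains no tokens.
    static String firstToken(String source) {
        try (Scanner scanner = new Scanner(source)) {
            return scanner.hasNext() ? scanner.next() : null;
        }
    }

    public static void main(String[] args) {
        // The string is the input, not a path: prints file.txt
        System.out.println(firstToken("file.txt"));
    }
}
```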
I am trying to count the lines, words, and characters in a file whose name the user enters.
After showing the counts, the program should keep asking for another file.
If the file doesn't exist, it should print the totals counted during the run.
Code:
public class KeepAskingApp {
private static int lines;
private static int words;
private static int chars;
public static void main(String[] args) {
boolean done = false;
//counters
int charsCount = 0, wordsCount = 0, linesCount = 0;
Scanner in = null;
Scanner scanner = null;
while (!done) {
try {
in = new Scanner(System.in);
System.out.print("Enter a (next) file name: ");
String input = in.nextLine();
scanner = new Scanner(new File(input));
while(scanner.hasNextLine()) {
lines += linesCount++;
Scanner lineScanner = new Scanner(scanner.nextLine());
lineScanner.useDelimiter(" ");
while(lineScanner.hasNext()) {
words += wordsCount++;
chars += charsCount += lineScanner.next().length();
}
System.out.printf("# of chars: %d\n# of words: %d\n# of lines: ",
charsCount, wordsCount, charsCount);
lineScanner.close();
}
scanner.close();
in.close();
} catch (FileNotFoundException e) {
System.out.printf("All lines: %d\nAll words: %d\nAll chars: %d\n",
lines, words, chars);
System.out.println("The end");
done = true;
}
}
}
}
But I can't understand why it always shows the output with zero values:
All lines: 0
All words: 0
All chars: 0
The end
Why does it skip the whole inner part?
It may be because I'm using several scanners, but they all look OK.
Any suggestions?
UPDATE:
Thanks to all who gave hints. I rethought the whole design and rewrote the code using the new info.
To avoid the tricky scanner input line, I used JFileChooser:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import javax.swing.JFileChooser;
public class KeepAskingApp {
private static int lines;
private static int words;
private static int chars;
public static void main(String[] args) {
boolean done = false;
// counters
int charsCount = 0, wordsCount = 0, linesCount = 0;
Scanner in = null;
Scanner lineScanner = null;
File selectedFile = null;
while (!done) {
try {
try {
JFileChooser chooser = new JFileChooser();
if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
selectedFile = chooser.getSelectedFile();
in = new Scanner(selectedFile);
}
while (in.hasNextLine()) {
linesCount++;
lineScanner = new Scanner(in.nextLine());
lineScanner.useDelimiter(" ");
while (lineScanner.hasNext()) {
wordsCount++;
charsCount += lineScanner.next().length();
}
}
System.out.printf(
"# of chars: %d\n# of words: %d\n# of lines: %d\n",
charsCount, wordsCount, linesCount);
lineScanner.close();
lines += linesCount;
words += wordsCount;
chars += charsCount;
in.close();
} finally {
System.out.printf(
"\nAll lines: %d\nAll words: %d\nAll chars: %d\n",
lines, words, chars);
System.out.println("The end");
done = true;
}
} catch (FileNotFoundException e) {
System.out.println("Error! File not found.");
}
}
}
}
Couple of issues (actually there are many issues with your code, but I will address the ones directly related to the output you have posted):
First of all, the stuff in the catch block only happens if you get a FileNotFoundException; that's there to handle and recover from errors. I suspect you meant to put a finally block there, or you meant to do that after the catch. I suggest reading this tutorial on catching and handling exceptions, which straightforwardly describes try, catch, and finally.
Once you read that tutorial, come back to your code; you may find that you have a little bit of reorganizing to do.
Second, with the above in mind, it's obvious by the output you are seeing that you are executing the code in that catch block, which means you are getting a FileNotFoundException. This would be caused by one of two (possibly obvious) things:
1. The file you entered, well, wasn't found. It may not exist or it may not be where you expect. Check to make sure you are entering the correct filename and that the file actually exists.
2. The input string is not what you expect. Perhaps you read a blank line from previous input, etc.
Addressing reason 2: If there is already a newline on the input buffer for whatever reason, you will read a blank line with Scanner. You might want to print the value of input just before opening the file to make sure it's what you expect.
If you're seeing blank lines, just skip them. So, instead of this:
String input = in.nextLine();
scanner = new Scanner(new File(input));
Something like this instead would be immune to blank lines:
String input;
do {
input = in.nextLine().trim(); // remove stray leading/trailing whitespace
} while (input.isEmpty()); // keep asking for input if a blank line is read
scanner = new Scanner(new File(input));
And, finally, I think you can work out the reason that you're seeing 0's in your output. When you attempt to open the file with new Scanner(new File(input)); and it fails because it can't find the file, it throws an exception and the program immediately jumps to the code in your catch block. That means lines, words, and chars still have their initial value of zero (all code that modifies them was skipped).
Hope that helps.
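For what it's worth, the zeros are easier to avoid when the counting is separated from the prompting and the totals are updated right after each successful count. A rough sketch under that assumption; TextCounter and count are hypothetical names, and the input is an in-memory string so the example runs without any files.

```java
import java.util.Scanner;

// Hypothetical helper for illustration only.
class TextCounter {

    // Returns {lines, words, chars}. chars counts only the characters inside
    // words, mirroring the word-length sum used in the question.
    static int[] count(String text) {
        int lines = 0, words = 0, chars = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                lines++;
                if (line.isEmpty()) {
                    continue; // a blank line has no words
                }
                for (String word : line.split("\\s+")) {
                    words++;
                    chars += word.length();
                }
            }
        }
        return new int[] { lines, words, chars };
    }

    public static void main(String[] args) {
        int[] totals = new int[3];
        for (String text : new String[] { "a bc\ndef", "one two three" }) {
            int[] counts = count(text);
            for (int i = 0; i < totals.length; i++) {
                totals[i] += counts[i]; // add this input's counts to the running totals
            }
        }
        System.out.printf("All lines: %d%nAll words: %d%nAll chars: %d%n",
                totals[0], totals[1], totals[2]);
    }
}
```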
Your println()s are in a catch block:
} catch (FileNotFoundException e) {
System.out.printf("All lines: %d\nAll words: %d\nAll chars: %d\n",
lines, words, chars);
System.out.println("The end");
done = true;
}
That means you caught a FileNotFoundException. I think you can figure it out from here.
I have the following code but I don't understand how I can reset the pointer to the starter position:
BufferedReader inp=new BufferedReader(new FileReader(file));
Scanner leggi=new Scanner(inp);
for(int i=0;i<nwords;i++){
while(leggi.hasNext())
if(leggi.next().equals(args[i+2]))
occorrenze[i]=occorrenze[i]+1;
}
inp.close();
I tried
inp.mark(0);
inp.reset();
with no results.
Paul,
I suggest you read through this old thread: Java BufferedReader back to the top of a text file?.
Personally I prefer Jon Skeet's response, which boils down to "Don't bother [unless you MUST]."
Cheers. Keith.
EDIT: Also, you should ALWAYS close that input file, even if you hit an Exception. The finally block is perfect for this.
EDIT2:
Hope you're still with us.
Here's my attempt, and FWIW, you DON'T need to reset the input-file, you just need to transpose your while and for loops.
package forums;
import java.util.Scanner;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class WordOccurrenceCount
{
public static void main(String[] args) {
try {
String[] words = { "and", "is", "a", "the", "of", "as" };
int[] occurrences = readOccurrences("C:/tmp/prose.txt", words);
for ( int i=0; i<words.length; i++ ) {
System.out.println(words[i] + " " + occurrences[i]);
}
} catch (Exception e) {
e.printStackTrace();
}
}
private static final int[] readOccurrences(String filename, String... words)
throws IOException
{
int[] occurrences = new int[words.length];
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader(filename));
Scanner scanner = new Scanner(reader);
while ( scanner.hasNext() ) {
String word = scanner.next();
for ( int i=0; i<words.length; i++ ) {
if ( words[i].equals(word) ) {
occurrences[i]++;
}
}
}
} finally {
if(reader!=null) reader.close();
}
return occurrences;
}
}
And BTW, java.util.Map is perfect for building a frequency table... Parallel arrays are just SOOOOO 90's. The "default" implementation of Map is the HashMap class... By default I mean use HashMap whenever you need a Map, unless you've got a good reason to use something else, which won't be often. HashMap is generally the best all-round performer.
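A minimal sketch of such a frequency table, assuming whitespace-separated words and an in-memory string as input (WordFrequency and frequencies are made-up names; for a file, the Scanner would wrap the reader as in the code above):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper for illustration only.
class WordFrequency {

    // Builds a word -> count table from whitespace-separated tokens.
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // Stores 1 for a new word, otherwise adds 1 to the existing count.
                counts.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = frequencies("the cat and the hat");
        System.out.println(counts.getOrDefault("the", 0)); // prints 2
        System.out.println(counts.getOrDefault("of", 0));  // prints 0
    }
}
```

Unlike the parallel arrays, the map counts every word in one pass, so the list of words of interest does not have to be known up front.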
You have two options:
Reopen the file: create a new FileReader and wrap it in a new BufferedReader (FileReader itself does not support mark/reset).
Use the mark functionality of BufferedReader (check BufferedReader.markSupported).
I think what you tried is to call mark after you read the input. Instead do the following:
inp.mark(readAheadLimit);
// .... all your code processing input
inp.reset();
mark(int readAheadLimit) takes a readAheadLimit parameter.
You should not set the readAheadLimit to 0. Use a meaningful number that is at least as large as the number of characters you read between the call to mark and the call to reset.
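A small sketch of the mark-before-reading order, using a StringReader so it runs without a file. MarkResetDemo, readFirstLineTwice and the limit of 1024 characters are illustrative choices, not values from the question. Note also that a Scanner keeps its own internal buffer, so after reset() it would likely be necessary to create a new Scanner on the reader rather than reuse the old one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class for illustration only.
class MarkResetDemo {

    // Reads the first line, rewinds to the mark, and reads the same line again.
    static String[] readFirstLineTwice(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            reader.mark(1024);                 // mark BEFORE reading; the limit is in characters
            String first = reader.readLine();
            reader.reset();                    // back to the marked position
            String again = reader.readLine();
            return new String[] { first, again };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] lines = readFirstLineTwice("alpha\nbeta");
        System.out.println(lines[0] + " / " + lines[1]); // prints alpha / alpha
    }
}
```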