BufferedReader - Search for string inside .txt file - java

I am trying to use a BufferedReader to count the appearances of a string inside a .txt file. I am using:
File file = new File(path);
int appearances = 0; // declared before the try block so it is still in scope below
try {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    while ((line = br.readLine()) != null) {
        if (line.contains("Hello")) {
            appearances++;
        }
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println("Found " + appearances);
But the problem is that if my .txt file contains, for example, the string "Hello, world\nHello, Hello, world!" and I search for "Hello", then appearances becomes two instead of three, because each line is checked for only one appearance of the string. How could I fix this? Thanks a lot.

The simplest solution is to do
while ((line = br.readLine()) != null)
appearances += line.split("Hello", -1).length-1;
Note that, if instead of "Hello", you search for anything with regex-reserved characters, you should escape the string before splitting:
String escaped = Pattern.quote("Hello."); // avoid '.' special meaning in regex
while ((line = br.readLine()) != null)
appearances += line.split(escaped, -1).length-1;
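As a self-contained sketch of the split approach, the loop can be wrapped in a method that reads from any Reader. The class and method names here are illustrative and not part of the original answer. The limit of -1 matters: it keeps trailing empty pieces, so the number of pieces minus one equals the number of matches on the line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class SplitCounter {
    // Counts non-overlapping occurrences of a literal string, line by line.
    static int count(Reader source, String literal) {
        String escaped = Pattern.quote(literal); // treat the search text literally
        int appearances = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // limit -1 keeps trailing empty pieces, so pieces - 1 == matches
                appearances += line.split(escaped, -1).length - 1;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // rethrow unchecked to keep callers simple
        }
        return appearances;
    }

    public static void main(String[] args) {
        String text = "Hello, world\nHello, Hello, world!";
        System.out.println("Found " + count(new StringReader(text), "Hello"));
    }
}
```

With a real file, `new FileReader(file)` could be passed in place of the StringReader.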

This is an efficient and correct solution:
String line;
int count = 0;
while ((line = br.readLine()) != null) {
    int index = -1;
    while ((index = line.indexOf("Hello", index + 1)) != -1) {
        count++;
    }
}
return count;
It walks through the line and looks for the next index, starting from the previous index+1.
The problem with Peter's solution is that it is wrong (see my comment). The problem with TheLostMind's solution is that it creates a lot of new strings by replacement which is an unnecessary performance drawback.
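One detail of the indexOf walk may be worth spelling out: restarting at index+1 also counts overlapping matches (for example "aa" inside "aaa"), whereas restarting after the whole match counts only non-overlapping ones. The sketch below makes that choice explicit; the class name and the `overlapping` flag are assumptions added for illustration, not part of the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class; names are illustrative only.
class IndexCounter {
    // overlapping = true restarts one character after each hit (as in the answer);
    // overlapping = false restarts after the end of each hit.
    static int count(Reader source, String needle, boolean overlapping) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int step = overlapping ? 1 : needle.length();
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                int index = line.indexOf(needle);
                while (index != -1) {
                    count++;
                    index = line.indexOf(needle, index + step);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello, world\nHello, Hello, world!";
        System.out.println(count(new StringReader(text), "Hello", true));
    }
}
```

For a search string such as "Hello", which cannot overlap with itself, both settings give the same count.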

A regex-driven version:
String line;
Pattern p = Pattern.compile(Pattern.quote("Hello")); // quotes in case you need 'Hello.'
int count = 0;
while ((line = br.readLine()) != null) {
    for (Matcher m = p.matcher(line); m.find(); count++) { }
}
return count;
I am now curious as to performance between this and gexicide's version - will edit when I have results.
EDIT: benchmarked by running 100 times on a ~800k log file, looking for strings that were found once at the start, once around middle-ish, once at the end, and several times throughout. Results:
IndexFinder: 1579ms, 2407200hits. // gexicide's code
RegexFinder: 2907ms, 2407200hits. // this code
SplitFinder: 5198ms, 2407200hits. // Peter Lawrey's code, after quoting regexes
Conclusion: for non-regex strings, the repeated-indexOf approach is fastest by a nice margin.
Essential benchmark code (log file from vanilla Ubuntu 12.04 installation):
public static void main(String ... args) throws Exception {
Finder[] fs = new Finder[] {
new SplitFinder(), new IndexFinder(), new RegexFinder()};
File log = new File("/var/log/dpkg.log.1"); // around 800k in size
Find test = new Find();
for (int i=0; i<100; i++) {
for (Finder f : fs) {
test.test(f, log, "2014"); // start
test.test(f, log, "gnome"); // mid
test.test(f, log, "ubuntu1"); // end
test.test(f, log, ".1"); // multiple; not at start
}
}
test.printResults();
}
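The Finder and Find types used by the benchmark are not shown in the answer. The following is only a guess at what such a strategy interface might look like, with hypothetical names (LineFinder, LineFinders) and a per-line signature chosen for brevity. The timing loop in main is deliberately naive and ignores JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed shape of a counting strategy; the original Finder interface is not shown.
interface LineFinder {
    int count(String line);
}

// Hypothetical factories for the three strategies compared in the answer.
class LineFinders {
    static LineFinder split(String needle) {
        String escaped = Pattern.quote(needle); // escape once, reuse per line
        return line -> line.split(escaped, -1).length - 1;
    }

    static LineFinder index(String needle) {
        return line -> {
            int count = 0;
            for (int i = line.indexOf(needle); i != -1; i = line.indexOf(needle, i + 1)) {
                count++;
            }
            return count;
        };
    }

    static LineFinder regex(String needle) {
        Pattern p = Pattern.compile(Pattern.quote(needle)); // compile once
        return line -> {
            int count = 0;
            for (Matcher m = p.matcher(line); m.find(); ) {
                count++;
            }
            return count;
        };
    }

    public static void main(String[] args) {
        // Made-up sample line, loosely modelled on a dpkg log entry.
        String line = "2014-01-02 status installed gnome-shell 3.4.1-0ubuntu1";
        LineFinder[] finders = { split(".1"), index(".1"), regex(".1") };
        for (LineFinder f : finders) {
            long start = System.nanoTime();
            int hits = 0;
            for (int i = 0; i < 100_000; i++) {
                hits += f.count(line);
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(hits + " hits in " + ms + " ms");
        }
    }
}
```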

while (line.contains("Hello")) { // loop while the line still contains "Hello"
    appearances++;
    line = line.replaceFirst("Hello", ""); // replace the first occurrence of "Hello" with an empty String
}

Related

How do I count the occurrence of a word in an array in Java?

I am working on a project where I have to read the data from a file into my code. The txt file has columns of data, and I have managed to extract a column of data into a list with this code.
public static void main(String[] args) {
    String line = "";
    ArrayList<String> date = new ArrayList<String>();
    try {
        FileReader fr = new FileReader("list.txt");
        BufferedReader br = new BufferedReader(fr);
        while ((line = br.readLine()) != null) {
            String firstColumn = line.split("\\s+")[0]; // first whitespace-separated column
            date.add(firstColumn);
            System.out.println(firstColumn);
        }
    } catch (IOException e) {
        System.out.println("File not found!");
    }
}
This will output the first column of data from the "list.txt" file which is...
30-Nov-2016
06-Oct-2016
05-Feb-2016
04-Sep-2016
18-Apr-2016
09-Feb-2016
22-Oct-2016
20-Aug-2016
17-Dec-2016
25-Dec-2016
However, I want to count the occurrence of the word "Feb" so for example it will come up...
"The month February occurs: 2 times"
But I'm struggling to find the right code. Could somebody please help me with this? I've been trying for over 24 hours and can't find any other questions that help. Any help will be greatly appreciated.
Another solution could be using split
String month = "Feb";
int count = 0;
while ((line = br.readLine()) != null)
{
String strDate = line.split("\\s+")[0]; // get first column, which has date
String temp = strDate.split("\\-")[1]; // get Month from extracted date.
if (month.equalsIgnoreCase(temp))
{
count++;
// or store strDate into List for further process.
}
}
System.out.println (count);// should print total occurrence of date with Feb month
==Edited==
Since you extract the date from each line with line.split("\\s+")[0], the extracted string contains only the date.
For simplicity, you could simply use a regular expression, something like...
Pattern p = Pattern.compile("Feb", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("30-Nov-2016, 06-Oct-2016, 05-Feb-2016, 04-Sep-2016, 18-Apr-2016, 09-Feb-2016, 22-Oct-2016, 20-Aug-2016, 17-Dec-2016, 25-Dec-2016");
int count = 0;
while (m.find()) {
count++;
}
System.out.println("Count = " + count);
Which, based on the input, would be 2.
Now, obviously, if you're reading each value from a file one line at a time, this is not that efficient, and simply using something like...
if (line.toLowerCase().contains("feb")) {
    count++;
}
would be simpler and quicker
Updated...
So, based on the provided input data and the following code...
Pattern p = Pattern.compile("Feb", Pattern.CASE_INSENSITIVE);
int count = 0;
try (BufferedReader br = new BufferedReader(new InputStreamReader(Test.class.getResourceAsStream("Data.txt")))) {
String text = null;
while ((text = br.readLine()) != null) {
Matcher m = p.matcher(text);
if (m.find()) {
count++;
}
}
System.out.println(count);
} catch (IOException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
It prints 67.
Now, this is a brute-force method, because I'm checking the whole line. To avoid possible mismatches in the text, you should split the line by the common delimiter (i.e. the tab character) and check the first element, for example...
String[] parts = text.split("\t");
Matcher m = p.matcher(parts[0]);
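Putting the "check only the first column" idea together, a minimal sketch might look like the following. It assumes the first whitespace-separated column holds a date in dd-MMM-yyyy form, as in the sample data; the class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper; assumes the first column is a dd-MMM-yyyy date.
class MonthCounter {
    // Counts lines whose first column has the given month abbreviation.
    static int countMonth(Reader source, String month) {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String firstColumn = line.trim().split("\\s+")[0];
                String[] dateParts = firstColumn.split("-");
                // Only the month part of the date is compared, so "Feb"
                // appearing in any other column is ignored.
                if (dateParts.length == 3 && dateParts[1].equalsIgnoreCase(month)) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "30-Nov-2016 a\n05-Feb-2016 b\n09-Feb-2016 c\n";
        System.out.println("The month February occurs: "
                + countMonth(new StringReader(data), "Feb") + " times");
    }
}
```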

Java compare strings from two places and exclude any matches

I'm trying to end up with a results.txt minus any matching items, having successfully compared some string inputs against another .txt file. Been staring at this code for way too long and I can't figure out why it isn't working. New to coding so would appreciate it if I could be steered in the right direction! Maybe I need a different approach? Apologies in advance for any loud tutting noises you may make. Using Java8.
//Sending a String[] into 'searchFile', contains around 8 small strings.
//Example of input: String[]{"name1","name2","name 3", "name 4.zip"}
^ This is my exclusions list.
public static void searchFile(String[] arr, String separator)
{
StringBuilder b = new StringBuilder();
for(int i = 0; i < arr.length; i++)
{
if(i != 0) b.append(separator);
b.append(arr[i]);
String findME = arr[i];
searchInfo(MyApp.getOptionsDir()+File.separator+"file-to-search.txt",findME);
}
}
^This works fine. I'm then sending the results to 'searchInfo' and trying to match and remove any duplicate (complete, not part) strings. This is where I am currently failing. Code runs but doesn't produce my desired output. It often finds part strings rather than complete ones. I think the 'results.txt' file is being overwritten each time...but I'm not sure tbh!
file-to-search.txt contains: "name2","name.zip","name 3.zip","name 4.zip" (text file is just a single line)
public static String searchInfo(String fileName, String findME)
{
StringBuffer sb = new StringBuffer();
try {
BufferedReader br = new BufferedReader(new FileReader(fileName));
String line = null;
while((line = br.readLine()) != null)
{
if(line.startsWith("\""+findME+"\""))
{
sb.append(line);
//tried various replace options with no joy
line = line.replaceFirst(findME+"?,", "");
//then goes off with results to create a txt file
FileHandling.createFile("results.txt",line);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return sb.toString();
}
What i'm trying to end up with is a result file MINUS any matching complete strings (not part strings):
e.g. results.txt to end up with: "name.zip","name 3.zip"
OK, with the information I have, what you can do is this:
List<String> result = new ArrayList<>();
String content = FileUtils.readFileToString(file, "UTF-8");
for (String s : content.split(", ")) {
if (!s.equals(findME)) { // assuming both have string quotes added already
result.add(s);
}
}
FileUtils.write(newFile, String.join(", ", result), "UTF-8");
This uses FileUtils from Apache Commons IO for convenience. You may add or remove the space after the comma as needed.
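For comparison, here is a sketch of the same filtering idea using only the JDK. It assumes the file is a single comma-separated line of double-quoted items, that items never contain commas, and that the exclusion list holds unquoted names; ExclusionFilter and removeExact are hypothetical names. Comparing whole items against a Set avoids the partial matches mentioned in the question, and producing the filtered line once means the result only has to be written once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; assumes quoted items that contain no commas.
class ExclusionFilter {
    // Returns the line with every item that exactly matches an exclusion removed.
    static String removeExact(String line, String[] exclusions) {
        Set<String> excluded = new HashSet<>(Arrays.asList(exclusions));
        List<String> kept = new ArrayList<>();
        for (String item : line.split(",")) {
            String trimmed = item.trim();
            // Strip one leading and one trailing double quote before comparing.
            String unquoted = trimmed.replaceAll("^\"|\"$", "");
            if (!excluded.contains(unquoted)) {
                kept.add(trimmed); // keep the item in its original quoted form
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String line = "\"name2\",\"name.zip\",\"name 3.zip\",\"name 4.zip\"";
        String[] exclusions = { "name1", "name2", "name 3", "name 4.zip" };
        System.out.println(removeExact(line, exclusions));
    }
}
```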

How can I read a specific column from a text file and calculate the average of this column?

I am a little stuck with a java exercise I am currently working on. I have a text file in this format:
Quio Kla,2221,3.6
Wow Pow,3332,9.3
Zou Tou,5556,9.7
Flo Po,8766,8.1
Andy Candy,3339,6.8
I now want to calculate the average of the whole third column, but I believe I have to extract the data first and store it in an array. I was able to read all the data with a buffered reader and print the entire file to the console, but that did not get me closer to getting it into an array. Any suggestions on how I can read a specific column of a text file into an array with a buffered reader would be highly appreciated.
Thank you very much in advance.
You can split your text file by using this portion of code:
BufferedReader in = null;
try {
in = new BufferedReader(new FileReader("textfile.txt"));
String read = null;
while ((read = in.readLine()) != null) {
String[] splited = read.split(",");
for (String part : splited) {
System.out.println(part);
}
}
} catch (IOException e) {
System.out.println("There was a problem: " + e);
e.printStackTrace();
} finally {
try {
in.close();
} catch (Exception e) {
}
}
And then you'll have all the columns of each line in the array splited.
It's definitely not the best solution, but it should be sufficient for you:
BufferedReader input = new BufferedReader(new FileReader("/file"));
int numOfColumn = 2;
String line = "";
ArrayList<Integer>lines = new ArrayList<>();
while ((line = input.readLine()) != null) {
lines.add(Integer.valueOf(line.split(",")[numOfColumn-1]));
}
long sum =0L;
for(int j:lines){
sum+=j;
}
int avg = (int)sum/lines.size();
I'm going to assume each data set is separated by newline characters in your text file.
ArrayList<Double> thirdColumn = new ArrayList<>();
BufferedReader in = null;
String line=null;
//initialize your reader here
while ((line = in.readLine())!=null){
String[] split = line.split(",");
if (split.length>2)
thirdColumn.add(Double.parseDouble(split[2]));
}
By the end of the while loop, you should have the thirdColumn ArrayList ready and populated with the required data.
The assumption is made that your data set has the following standard format.
String,Integer,Double
So naturally a split by a comma should give a String array of length 3, Where the String at index 2 contains your third column data.
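To go from the populated list to the average the question asks for, a minimal sketch could look like this. The class name, the zero-based columnIndex parameter, and the choice to return NaN for empty input are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; columnIndex is zero-based (2 = third column).
class ColumnAverage {
    static double average(Reader source, int columnIndex) {
        List<Double> values = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] split = line.split(",");
                if (split.length > columnIndex) { // skip short or empty lines
                    values.add(Double.parseDouble(split[columnIndex].trim()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (values.isEmpty()) {
            return Double.NaN; // no data: avoid dividing by zero
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        String data = "Quio Kla,2221,3.6\nWow Pow,3332,9.3\nZou Tou,5556,9.7\n"
                + "Flo Po,8766,8.1\nAndy Candy,3339,6.8\n";
        System.out.println(average(new StringReader(data), 2));
    }
}
```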

Problems trying to retrieve information from txt file

I'm stuck on an issue in my application. I have a text file that contains a piece of code that I need to retrieve and store in a string variable. What is the best way to do this? I ran the samples below, but they are logically incorrect or incomplete. Take a look:
Reading through line:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
String line = null;
try {
while( (line = bfr.readLine()) != null ){
line.contentEquals("d.href");
System.out.println(line);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Reading through character:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int i = 0;
try {
while ((i = bfr.read()) != -1) {
char ch = (char) i;
System.out.println(Character.toString(ch));
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
};
Reading through Scanner:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int wordCount = 0, totalcount = 0;
Scanner s = new Scanner(googleNode);
while (s.hasNext()) {
totalcount++;
if (s.next().contains("(?=d.href).*?(}=?)")) wordCount++;
}
System.out.println(wordCount+" "+totalcount);
With the first sample, I'm having difficulty finding d.href, which marks the start of the code piece.
With the second, I can't think of or find a way to store d.href as a string and retrieve the rest of the information.
With the third, I can correctly find d.href but I can't retrieve the pieces of the text.
Could anyone help me please?
To answer my own question: I used a Scanner to read the text file word by word. .contains("window.maybeRedirectForGBV") returns a boolean value, and next() returns a string. I stopped scanning at the word just before the one I wanted, then called next() once more to store the following word in a string variable. From that point you only need to process the string the way you want. Hope this helps.
String stringSplit = null;
Scanner s = new Scanner(Node);
while (s.hasNext()) {
if (s.next().contains("window.maybeRedirectForGBV")){
stringSplit = s.next();
break;
}
}
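A slightly more defensive version of this loop might check hasNext() again before taking the following token, since the marker could be the last word in the file. This is only a sketch; TokenAfter and find are hypothetical names, and the Scanner splits on whitespace by default.

```java
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical helper built around the Scanner loop from the answer.
class TokenAfter {
    // Returns the whitespace-delimited token that follows the first token
    // containing marker, or null if there is no such token.
    static String find(Readable source, String marker) {
        try (Scanner s = new Scanner(source)) {
            while (s.hasNext()) {
                // Guard the second next() so a marker at the very end
                // does not throw NoSuchElementException.
                if (s.next().contains(marker) && s.hasNext()) {
                    return s.next();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "if (window.maybeRedirectForGBV) d.href='x';";
        System.out.println(find(new StringReader(text), "window.maybeRedirectForGBV"));
    }
}
```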
You can use regular expressions like this:
Pattern pattern = Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");
// Groups: 1-----1 2--2
// Possibly spaces, "d.href", any characters not '=', the '=', any chars.
....
Matcher m = pattern.matcher(line);
if (m.matches()) {
String dHrefSuffix = m.group(1);
String value = m.group(2);
System.out.println(value);
break;
}
BufferedReader will do.
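Combined with a BufferedReader loop, the pattern above could be used roughly as follows. The class and method names are invented for this sketch, the sample input in main is made up, and the value is trimmed as a convenience that the answer itself does not mention.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper wrapping the regular expression from the answer.
class HrefExtractor {
    private static final Pattern D_HREF =
            Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");

    // Returns the text after '=' on the first line that starts with d.href,
    // or null if no line matches.
    static String firstValue(Reader source) {
        try (BufferedReader bfr = new BufferedReader(source)) {
            String line;
            while ((line = bfr.readLine()) != null) {
                Matcher m = D_HREF.matcher(line);
                if (m.matches()) {
                    return m.group(2).trim(); // group 2 is everything after '='
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        // Placeholder input; the real file contents are not shown in the question.
        String text = "var d = document;\n  d.href = \"http://example.com\";\n";
        System.out.println(firstValue(new StringReader(text)));
    }
}
```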

How can I get the number of empty lines using the LineNumberReader?

I am trying to use the LineNumberReader to get the number of empty lines in a file. However, I cannot manage to get this information. The following is the code that I am using:
LineNumberReader reader = new LineNumberReader(new FileReader(this.file));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {
if(lineRead.length == 0){
cnt++;
}
}
reader.close();
System.out.println(cnt);
Does anyone know how to get this information?
Try with
if(lineRead.isEmpty()){
or
if(lineRead.trim().isEmpty()){
if you consider empty a line that contains only spaces or tabs
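Both variants can be folded into one small method, sketched below. The class name and the treatBlankAsEmpty flag are assumptions added for the example; the loop itself follows the code from the question.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper combining the two suggested checks.
class EmptyLineCounter {
    // treatBlankAsEmpty = true also counts lines holding only spaces or tabs.
    static int count(Reader source, boolean treatBlankAsEmpty) {
        int cnt = 0;
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String lineRead;
            while ((lineRead = reader.readLine()) != null) {
                String candidate = treatBlankAsEmpty ? lineRead.trim() : lineRead;
                if (candidate.isEmpty()) {
                    cnt++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cnt;
    }

    public static void main(String[] args) {
        String text = "a\n\n \t\nb\n";
        System.out.println(count(new StringReader(text), false));
        System.out.println(count(new StringReader(text), true));
    }
}
```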
