Searching a text file in Java and listing the results

I've really searched around for ideas on how to go about this, and so far nothing's turned up.
I need to search a text file via keywords entered in a JTextField and present the search results to a user in rows, like Google does. The text file has a lot of content, about 22,000 lines of text. I want to filter out lines that do not contain any of the words specified in the JTextField and present only the lines containing at least one of those words, each row of results being one line from the text file.
Does anyone have any ideas on how to go about this? I would really appreciate any kind of help. Thank you in advance.

You can read the file line by line and search in every line for your keywords. If you find one, store the line in an array.
But first, split your text box String on whitespace and create the array:
String[] keyWords = yourTextBoxString.split(" ");
ArrayList<String> results = new ArrayList<String>();
Reading the file line by line:
void readFileLineByLine(File file) throws IOException {
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            processOneLine(line);
        }
    }
}
Processing the line:
void processOneLine(String line) {
    for (String currentKey : keyWords) {
        if (line.contains(currentKey)) {
            results.add(line);
            break; // one match is enough; avoids adding the same line twice
        }
    }
}
I have not tested this, but it should give you an overview of how you can do this.
If you need more speed, you can also use a regular expression to search for the keywords, so you don't need this for loop.
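The regular-expression variant mentioned above could be sketched roughly as follows. This is only an illustration: the class and method names are made up, the keywords are assumed to be literal words (so each one is escaped with Pattern.quote), and matching is made case-insensitive as a convenience. Whether it is actually faster than the loop depends on the input, so it is worth measuring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class KeywordRegexSearch {

    // Builds one pattern that matches any of the whitespace-separated keywords.
    // Note: an empty input produces an empty pattern, which matches every line.
    static Pattern compileKeywords(String textBoxInput) {
        String alternation = Arrays.stream(textBoxInput.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .map(Pattern::quote)               // treat each keyword literally
                .collect(Collectors.joining("|")); // keyword1|keyword2|...
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    // Keeps only the lines that contain at least one keyword.
    static List<String> filterLines(List<String> lines, Pattern keywords) {
        List<String> results = new ArrayList<>();
        for (String line : lines) {
            if (keywords.matcher(line).find()) {
                results.add(line);
            }
        }
        return results;
    }
}
```

The pattern is compiled once per search and reused for every line, which is the main point of this variant.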

Read in the file, as described in the Oracle tutorial (http://docs.oracle.com/javase/tutorial/essential/io/file.html#textfiles). Iterate through each line and search for your keyword(s) using String's contains method. If a line contains the search phrase, add the line and its line number to a results List. When you have finished, display the results list to the user.
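A minimal sketch of that approach, keeping the line number with each hit. The class name, the Hit type and the assumption that the file is UTF-8 are all mine, not part of the answer above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class LineNumberSearch {

    // One search hit: the 1-based line number plus the line text.
    static final class Hit {
        final int lineNumber;
        final String text;

        Hit(int lineNumber, String text) {
            this.lineNumber = lineNumber;
            this.text = text;
        }

        @Override
        public String toString() {
            return lineNumber + ": " + text;
        }
    }

    // Collects every line that contains the phrase (case-sensitive).
    static List<Hit> search(List<String> lines, String phrase) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(phrase)) {
                hits.add(new Hit(i + 1, lines.get(i)));
            }
        }
        return hits;
    }

    // Reads the whole file into memory first; assumes UTF-8 text.
    static List<Hit> searchFile(Path file, String phrase) throws IOException {
        return search(Files.readAllLines(file, StandardCharsets.UTF_8), phrase);
    }
}
```

For a file of about 22,000 lines, reading everything into memory with Files.readAllLines should normally be fine; for much larger files a line-by-line reader would be the safer choice.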

You need a method as follows:
List<String> searchFile(String path, String match) {
    List<String> linesToPresent = new ArrayList<String>();
    // Compile the pattern once instead of once per line
    Pattern p = Pattern.compile(match);
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(new File(path)))) {
        String line;
        // readLine() returns null at end of file, so test before matching
        while ((line = br.readLine()) != null) {
            if (p.matcher(line).find()) {
                linesToPresent.add(line);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return linesToPresent;
}
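It searches a file line by line and uses a regex to check whether each line contains the "match" String. Note that match is interpreted as a regular expression; use Pattern.quote(match) if you want to search for literal text. If you have many Strings to check, you can change the second parameter to String[] match and check each String in a for-each loop.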

You can use FileUtils (the readLines method in Apache Commons IO fits this description).
This will read each line and return a List<String>.
You can iterate over this List<String> and check whether each String contains the word entered by the user; if it does, add it to another List<String>. At the end, that second List<String> contains all the lines that contain the word entered by the user. You can iterate over it and display the results to the user.
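For the display step, one option in Swing is to put the matching lines into a table model, one line per row. The sketch below is an assumption about how that could look: the class name and the two column headings are invented for illustration.

```java
import java.util.List;
import javax.swing.table.DefaultTableModel;

// Hypothetical helper for illustration only.
class ResultTableModels {

    // Builds a two-column model: a running result number and the matching line.
    static DefaultTableModel toTableModel(List<String> matchingLines) {
        DefaultTableModel model = new DefaultTableModel(new Object[] {"#", "Line"}, 0);
        int row = 1;
        for (String line : matchingLines) {
            model.addRow(new Object[] {row++, line});
        }
        return model;
    }
}
```

Passing the model to `new JTable(model)` and wrapping the table in a JScrollPane would then show one search result per row.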

Related

split method to output values under each other when reading from a file

My code works fine however it prints the values side by side instead of under each other line by line. Like this:
iatadult,DDD,
iatfirst,AAA,BBB,CCC
I have done a diligent search on Stack Overflow and none of the solutions I found seem to work. I know that I have to make the change inside the loop. However, none of the examples I have seen have worked. Any further understanding or techniques to achieve my goal would be helpful. Whatever I am missing is probably very small. Please help.
String folderPath1 = "C:\\PayrollSync\\client\\client_orginal.txt";
File file = new File (folderPath1);
ArrayList<String> fileContents = new ArrayList<>(); // holds all matching client names in array
try {
BufferedReader reader = new BufferedReader(new FileReader(file));// reads entire file
String line;
while (( line = reader.readLine()) != null) {
if(line.contains("fooa")||line.contains("foob")){
fileContents.add(line);
}
//---------------------------------------
}
reader.close();// close reader
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.out.println(fileContents);
Add a Line Feed before you add to fileContents.
fileContents.add(line+"\n");
By printing the list directly, as you are doing, you invoke the list's toString() method, which prints the contents like this:
[obj1.toString(), obj2.toString(), ..., objN.toString()]
In your case the objects are of type String, and String's toString() returns the string itself. That's why you are seeing all the strings separated by commas.
To do something different, i.e. print each object on a separate line, you have to implement it yourself. You can simply append the newline character ('\n') after each string.
Possible solution in Java 8:
String result = fileContents.stream().collect(Collectors.joining("\n"));
System.out.println(result);
A platform-independent way to add a new line:
fileContents.add(line + System.lineSeparator());
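As an alternative that leaves the stored lines untouched, the separator can be added only when printing. This is a small sketch using String.join; the class and method names are made up, and the sample data is taken from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class for illustration only.
class PrintLinesDemo {

    // Joins the lines with the platform line separator, one element per line.
    static String onePerLine(List<String> lines) {
        return String.join(System.lineSeparator(), lines);
    }

    public static void main(String[] args) {
        List<String> fileContents = Arrays.asList("iatadult,DDD,", "iatfirst,AAA,BBB,CCC");
        // List.toString(): everything on one line, wrapped in brackets
        System.out.println(fileContents);
        // One element per line
        System.out.println(onePerLine(fileContents));
    }
}
```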
Below is my full answer. Thanks for your help stackoverflow. It took me all day but I have a full solution.
File file = new File (folderPath1);
ArrayList<String> fileContents = new ArrayList<>(); // holds all matching client names in array
try {
BufferedReader reader = new BufferedReader(new FileReader(file));// reads entire file
String line;
while (( line = reader.readLine()) != null) {
String [] names ={"iatdaily","iatrapala","iatfirst","wpolkrate","iatjohnson","iatvaleant"};
if (Stream.of(names).anyMatch(line.trim()::contains)) {
System.out.println(line);
fileContents.add(line + "\n");
}
}
System.out.println("---------------");
reader.close();// close reader
} catch (Exception e) {
System.out.println(e.getMessage());
}

Remove stop words from file - going over it multiple times causes content duplication and does not remove the words

I am trying to go over a bunch of files, read each of them, and remove all stopwords from a specified list with such words. The result is a disaster - the content of the whole file copied over and over again.
What I tried:
- Saving the file as String and trying to look with regex
- Saving the file as String and going over line by line and comparing tokens to the stopwords that are stored in a LinkedHashSet, I can also store them in a file
- tried to twist the logic below in multiple ways, getting more and more ridiculous output.
- tried looking into text / line with the .contains() method, but no luck
My general logic is as follows:
for every word in the stopwords set:
while(file has more lines):
save current line into String
while (current line has more tokens):
assign current token into String
compare token with current stopword:
if(token equals stopword):
write in the output file "" + " "
else: write in the output file the token as is
Tried what's in this question and many other SO questions, but just can't achieve what I need.
Real code below:
private static void removeStopWords(File fileIn) throws IOException {
File stopWordsTXT = new File("stopwords.txt");
System.out.println("[Removing StopWords...] FILE: " + fileIn.getName() + "\n");
// create file reader and go over it to save the stopwords into the Set data structure
BufferedReader readerSW = new BufferedReader(new FileReader(stopWordsTXT));
Set<String> stopWords = new LinkedHashSet<String>();
for (String line; (line = readerSW.readLine()) != null; readerSW.readLine()) {
// trim() eliminates leading and trailing spaces
stopWords.add(line.trim());
}
File outp = new File(fileIn.getPath().substring(0, fileIn.getPath().lastIndexOf('.')) + "_NoStopWords.txt");
FileWriter fOut = new FileWriter(outp);
Scanner readerTxt = new Scanner(new FileInputStream(fileIn), "UTF-8");
while(readerTxt.hasNextLine()) {
String line = readerTxt.nextLine();
System.out.println(line);
Scanner lineReader = new Scanner(line);
for (String curSW : stopWords) {
while(lineReader.hasNext()) {
String token = lineReader.next();
if(token.equals(curSW)) {
System.out.println("---> Removing SW: " + curSW);
fOut.write("" + " ");
} else {
fOut.write(token + " ");
}
}
}
fOut.write("\n");
}
fOut.close();
}
What happens most often is that it looks for the first word from the stopWords set and that's it. The output contains all the other words even if I manage to remove the first one. And the first will be there in the next appended output in the end.
Part of my stopword list
about
above
after
again
against
all
am
and
any
are
as
at
With tokens I mean words, i.e. getting every word from the line and comparing it to the current stopword
After a while of debugging I believe I have found the solution. This problem is tricky because you have to use several different scanners and file readers. Here is what I did:
I changed how you add to your stopWords set, as it wasn't adding the words correctly. I used a BufferedReader to read each line, then a Scanner to read each word, then added each word to the set.
Then, for the comparison, I got rid of one of your loops, since you can use the set's .contains() method to check whether a word is a stop word.
I left the part of writing to the file without the stop words for you to do, as I'm sure you can figure that out now that everything else is working.
My sample stop words txt file:
Stop words
Words
My sample input file was exactly the same, so it should catch all three words.
The code:
// Create a file reader and save the stop words into a Set
BufferedReader readerSW = new BufferedReader(new FileReader("stopWords.txt"));
Set<String> stopWords = new LinkedHashSet<String>();
String stopWordsLine = readerSW.readLine();
while (stopWordsLine != null) {
    // Scanner splits the line on whitespace, so no trim() is needed
    Scanner words = new Scanner(stopWordsLine);
    // hasNext() guards against blank lines, where next() would throw
    while (words.hasNext()) {
        stopWords.add(words.next()); // add each stop word to the set
    }
    words.close();
    stopWordsLine = readerSW.readLine();
}
readerSW.close();
BufferedReader outp = new BufferedReader(new FileReader("Words.txt"));
String line = outp.readLine();
while (line != null) {
    Scanner lineReader = new Scanner(line);
    // Check every word on the line against the whole set
    while (lineReader.hasNext()) {
        String word = lineReader.next();
        if (stopWords.contains(word)) {
            System.out.println("removing " + word);
        }
    }
    lineReader.close();
    line = outp.readLine();
}
outp.close();
Output:
removing Stop
removing words
removing Words
Let me know if I can elaborate any more on my code or why I did something!
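The remaining step, writing each line without its stop words, could be sketched along these lines. This is an assumption about one way to do it, not the answerer's code: the class and method names are invented, tokens are split on whitespace, and the comparison is exact and case-sensitive. The important point is that each token is checked once against the whole set, rather than looping over the stop words on the outside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
class StopWordFilter {

    // Returns the line with every token that is a stop word removed.
    static String removeStopWords(String line, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // Skip the empty token produced by a blank line, and skip stop words
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }
}
```

Each filtered line can then be written to the output file followed by a newline.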

Java - Read and storing in an array

I want to read the contents of a text file, split each line on a delimiter and then store each part in a separate array.
For example, the-file-name.txt contains different strings, each on its own line:
football/ronaldo
f1/lewis
wwe/cena
So I want to read the contents of the text file, split on the delimiter "/" and store the first part of the string before the delimiter in one array, and the second half after the delimiter in another array. This is what I have tried to do so far:
try {
File f = new File("the-file-name.txt");
BufferedReader b = new BufferedReader(new FileReader(f));
String readLine = "";
System.out.println("Reading file using Buffered Reader");
while ((readLine = b.readLine()) != null) {
String[] parts = readLine.split("/");
}
} catch (IOException e) {
e.printStackTrace();
}
This is what I have achieved so far but I am not sure how to go on from here, any help in completing the program will be appreciated.
You can create two Lists, one for the first part and one for the second part:
List<String> part1 = new ArrayList<>(); // holds the text before the delimiter
List<String> part2 = new ArrayList<>(); // holds the text after the delimiter
while ((readLine = b.readLine()) != null) {
    String[] parts = readLine.split("/"); // split on '/'
    part1.add(parts[0]); // put the first part in the list part1
    part2.add(parts[1]); // put the second part in the list part2
}
Outputs
[football, f1, wwe]
[ronaldo, lewis, cena]
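If the file might contain lines without a "/" (a blank line, for instance), parts[1] would throw an ArrayIndexOutOfBoundsException. A defensive variant could look like the sketch below; the class and method names are hypothetical, and skipping malformed lines is just one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DelimitedColumns {
    final List<String> first = new ArrayList<>();
    final List<String> second = new ArrayList<>();

    // Splits on the first '/' only; lines without a '/' are skipped.
    void addLine(String line) {
        String[] parts = line.split("/", 2);
        if (parts.length < 2) {
            return; // no delimiter found
        }
        first.add(parts[0]);
        second.add(parts[1]);
    }

    // Converts a list to a plain array, if arrays are really required.
    static String[] asArray(List<String> list) {
        return list.toArray(new String[0]);
    }
}
```

With the limit of 2, a line such as f1/lewis/hamilton keeps everything after the first delimiter together in the second part.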

complete indexing of text file java

I'm trying to read a text file, sort the words in it alphabetically and display the line numbers those words appear on.
I'm new to Java, so I'm not sure what the most efficient approach is.
My plan so far is to:
-use a scanner to parse file into one string
-string.split
-lineCount++
-(somehow sort those split strings alphabetically)
-print sorted words with line number next to them
Is that the best way of going about this? I'm not sure whether Java has some sort of ordered dictionary I could use.
A Scanner is fine, as you could scan per word, not even needing a split.
A BufferedReader would be for line-wise reading, and there exists a LineNumberReader for your goal: counting lines.
I would specify the encoding of the file explicitly.
SortedMap<String, SortedSet<Integer>> linenosPerWord = new TreeMap<>();
// LineNumberReader is a BufferedReader with a line number counter:
try (LineNumberReader in = new LineNumberReader(new InputStreamReader(
        new FileInputStream(file), StandardCharsets.UTF_8))) {
    for (;;) {
        String line = in.readLine();
        if (line == null) {
            break;
        }
        int lineno = in.getLineNumber(); // 1 after the first line has been read
        // Split on runs of characters that are neither letters nor accents (marks)
        String[] words = line.split("[^\\p{L}\\p{M}]+");
        for (String word : words) {
            word = word.toLowerCase(); // Possible with Locale
            SortedSet<Integer> linenos = linenosPerWord.get(word);
            if (linenos == null) {
                linenos = new TreeSet<>();
                linenosPerWord.put(word, linenos);
            }
            linenos.add(lineno);
        }
    }
}
linenosPerWord.remove(""); // Remove a possibly found empty word, like in "-Hello"
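To show the idea end to end, here is a self-contained sketch that builds the same kind of index from lines already held in memory and formats it for printing. The class and method names are invented, and Locale.ROOT for lower-casing is my choice rather than something the answer specifies.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
class WordIndex {

    // Maps each lower-cased word to the sorted set of 1-based line numbers it occurs on.
    static SortedMap<String, SortedSet<Integer>> build(List<String> lines) {
        SortedMap<String, SortedSet<Integer>> index = new TreeMap<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            for (String word : line.split("[^\\p{L}\\p{M}]+")) {
                if (word.isEmpty()) {
                    continue; // produced when the line starts with a non-letter
                }
                index.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new TreeSet<>())
                     .add(lineNo);
            }
        }
        return index;
    }

    // One "word [line numbers]" entry per line, in alphabetical order.
    static String format(SortedMap<String, SortedSet<Integer>> index) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, SortedSet<Integer>> e : index.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Because TreeMap keeps its keys sorted, no separate sorting step is needed before printing.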

Read text containing multiple line using bufferedreader

I would like to know how to read a text file containing multiple lines in Java using BufferedReader.
Every line has two words separated by a semicolon (;) and I want to use the split() String operation to separate the two words. I also need to compare each word to a word in a master ArrayList.
I'm having trouble continuing from here.
Here's my code:
{
    FileInputStream f = new FileInputStream("C://Desktop/test.txt");
    InputStreamReader reader = new InputStreamReader(f);
    BufferedReader Buff = new BufferedReader(reader);
    String Line = Buff.readLine();
    String t[] = Line.split(";");
}
Replace
String Line = Buff.readLine();
with
// buffer for storing file contents in memory
StringBuffer stringBuffer = new StringBuffer("");
// for reading one line
String line = null;
// keep reading till readLine returns null
while ((line = Buff.readLine()) != null) {
    // readLine() strips the line terminator, so add one back;
    // otherwise the last word of a line runs into the first word of the next
    stringBuffer.append(line).append('\n');
}
Now that you have read the complete file into the StringBuffer, you can do whatever you want with it.
Hope this helps.
Try
while((line=buff.readLine())!=null){
System.out.println(line);
}
You need a while loop to read all the lines.
Here is an example http://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
You can use BufferedReader to loop through each line in the specified file. To split each line on ";", you can use .split and store the resulting array in a list.
Finally, combine all the lists into a single list, which will in turn hold all the words present in your file.
List<String> words = Arrays.asList(line.split(";"));
list.addAll(words);
Now you would want to compare the retrieved list against a Master list containing all your records.
// Compare the 2 lists, assuming your file list has less number of
// records
masterList.removeAll(list);
The above statement can be used in reverse too, in case the file holds the master list of words. Alternatively, you can copy the two lists into temporary lists and compare them in whatever way you require.
Here is the complete code:
public static void main(String[] args) {
String line;
// List of all the words read from the file
List<String> list = new ArrayList<String>();
// Your original master list of words against which you want to compare
List<String> masterList = new ArrayList<String>(Arrays.asList("cleaner",
"code", "java", "read", "write", "market", "python", "snake",
"stack", "overflow"));
BufferedReader reader;
try {
reader = new BufferedReader(new FileReader("testing.txt"));
while ((line = reader.readLine()) != null) {
// Add all the words split by a ; to the list
List<String> words = Arrays.asList(line.split(";"));
list.addAll(words);
}
// Compare the 2 lists, assuming your file list has less number of
// records
masterList.removeAll(list);
System.out.println(masterList);
reader.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
File which I have created looks like:
cleaner;code
java;read
write;market
python;snake
The output of the above code:
[stack, overflow]
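Since removeAll modifies the master list in place, it can be convenient to work on copies and to expose both directions of the comparison. The sketch below is one hedged way to do that; the class and method names are made up, and retainAll is used for the "words present in both" case.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper for illustration only.
class ListComparison {

    // Words of the master list that do not occur in the file (master is not modified).
    static List<String> missingFromFile(List<String> master, Collection<String> fileWords) {
        List<String> result = new ArrayList<>(master);
        result.removeAll(new HashSet<>(fileWords)); // a set makes each lookup cheap
        return result;
    }

    // Words of the master list that also occur in the file, in master order.
    static List<String> inBoth(List<String> master, Collection<String> fileWords) {
        List<String> result = new ArrayList<>(master);
        result.retainAll(new HashSet<>(fileWords));
        return result;
    }
}
```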
