Read each string from a file in Java - java

I am new to Java. I just want to read each string from a file and print it to the console.
Code:
public static void main(String[] args) throws Exception {
    File file = new File("/Users/OntologyFile.txt");
    try {
        FileInputStream fstream = new FileInputStream(file);
        BufferedReader infile = new BufferedReader(new InputStreamReader(fstream));
        String data = new String();
        while ((data = infile.readLine()) != null) { // use if for reading just 1 line
            System.out.println("" + data);
        }
    } catch (IOException e) {
        // Error
    }
}
If the file contains:
Add label abc to xyz
Add instance cdd to pqr
I want to read each word from the file and print each one on a new line, e.g.
Add
label
abc
...
And afterwards, I want to extract the index of a specific string, for instance get the index of abc.
Can anyone please help me?

It sounds like you want to be able to do two things:
Print all words inside the file
Search the index of a specific word
In that case, I would suggest scanning all lines, splitting on any whitespace character (space, tab, etc.), and storing the words in a collection so you can search it later on. Now the question is: can you have repeats, and in that case which index would you like to print? The first? The last? All of them?
Assuming words are unique, you can simply do:
public static void main(String[] args) throws Exception {
    File file = new File("/Users/OntologyFile.txt");
    ArrayList<String> words = new ArrayList<String>();
    try {
        FileInputStream fstream = new FileInputStream(file);
        BufferedReader infile = new BufferedReader(new InputStreamReader(fstream));
        String data = null;
        while ((data = infile.readLine()) != null) {
            for (String word : data.split("\\s+")) {
                words.add(word);
                System.out.println(word);
            }
        }
    } catch (IOException e) {
        // Error
    }
    // search for the index of abc:
    for (int i = 0; i < words.size(); i++) {
        if (words.get(i).equals("abc")) {
            System.out.println("abc index is " + i);
            break;
        }
    }
}
If you don't break, it will print every index of abc (in case words are not unique). You could of course optimize this further if the set of words is very large, but for a small amount of data this should suffice.
Of course, if you know in advance which words' indices you'd like to print, you could forgo the extra data structure (the ArrayList) and simply print them as you scan the file, unless you want the output of the words and of the specific indices to be separate.

Split the String you read on any whitespace with the regex \\s+ and print out the resulting data with a for loop.
public static void main(String[] args) { // Don't make main throw an exception
    File file = new File("/Users/OntologyFile.txt");
    try {
        FileInputStream fstream = new FileInputStream(file);
        BufferedReader infile = new BufferedReader(new InputStreamReader(fstream));
        String data;
        while ((data = infile.readLine()) != null) {
            String[] words = data.split("\\s+"); // Split on whitespace
            for (String word : words) { // Iterate through the words
                System.out.println(word); // Print it
            }
        }
    } catch (IOException e) {
        // Probably best to actually have this on there
        System.err.println("Error found.");
        e.printStackTrace();
    }
}

Just add a for-each loop before printing the output :-
while ((data = infile.readLine()) != null) { // use if for reading just 1 line
    for (String temp : data.split(" "))
        System.out.println(temp); // no need to concatenate the empty string
}
This will automatically print the individual strings, obtained from each line read from the file, each on a new line.
And afterwards, I want to extract the index of a specific string, for
instance get the index of abc.
I don't know what index you are actually talking about. But if you want the index of the word within the individual lines being read, then add a temporary count variable initialised to 0 and increment it until temp equals abc, like this:
int count = 0;
for (String temp : data.split(" ")) {
    count++;
    if ("abc".equals(temp))
        System.out.println("Index of abc is : " + count);
    System.out.println(temp);
}

Use the split() method available in the String class; you can adapt the result to your needs.
Or use the string's length to iterate over the complete line character by character, and whenever you hit a non-alphabetic character, take the substring() up to that point and write it to a new line.
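A minimal sketch of that second idea (my own, not from the original answer), assuming the goal is simply to print each run of letters on its own line; the sample line is made up for illustration:
String data = "Add label abc to xyz"; // a line read from the file
int start = 0;
for (int i = 0; i <= data.length(); i++) {
    // the end of the line or any non-letter character marks a word boundary
    if (i == data.length() || !Character.isLetter(data.charAt(i))) {
        if (i > start) {
            System.out.println(data.substring(start, i)); // write the word to a new line
        }
        start = i + 1;
    }
}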

List<String> words = new ArrayList<String>();
while ((data = infile.readLine()) != null) {
    for (String d : data.split(" ")) {
        System.out.println(d);
    }
    words.addAll(Arrays.asList(data.split(" "))); // add the individual words, not the whole line
}
// words now holds all the words. Use words.indexOf("abc") to get the index
if (words.indexOf("abc") < 0) {
    System.out.println("word not present");
} else {
    System.out.println("word present at index " + words.indexOf("abc"));
}

Related

Reading a .txt file and excluding certain elements

In my journey to complete this program I've run into a little hitch with one of my methods. The method I am writing reads a certain .txt file and creates a HashMap, setting every word found as a key and the number of times it appears as its value. I have managed to figure this out for another method, but this time the .txt file the method is reading is in a weird format. Specifically:
more 2
morning's 1
most 3
mostly 1
mythology. 1
native 1
nearly 2
northern 1
occupying 1
of 29
off 1
And so on.
Right now, the method only processes one line of the file.
Here is my code for the method:
public static HashMap<String, Integer> readVocabulary(String fileName) {
    // Declare the HashMap to be returned
    HashMap<String, Integer> wordCount = new HashMap();
    String toRead = fileName;
    try {
        FileReader reader = new FileReader(toRead);
        BufferedReader br = new BufferedReader(reader);
        // The BufferedReader reads the lines
        String line = br.readLine();
        // Split the line into a String array to loop through
        String[] words = line.split(" ");
        // for loop goes through every word
        for (int i = 0; i < words.length; i++) {
            // Case if the HashMap already contains the key.
            // If so, just increments the value.
            if (wordCount.containsKey(words[i])) {
                int n = wordCount.get(words[i]);
                wordCount.put(words[i], ++n);
            }
            // Otherwise, puts the word into the HashMap
            else {
                wordCount.put(words[i], 1);
            }
        }
        br.close();
    }
    // Catching the file not found error
    // and any other errors
    catch (FileNotFoundException fnfe) {
        System.err.println("File not found.");
    }
    catch (Exception e) {
        System.err.print(e);
    }
    return wordCount;
}
The issue is that I'm not sure how to get the method to ignore the 2's and 1's and 29's of the .txt file. I attempted making an 'else if' statement to catch all of these cases, but there are too many. Is there a way for me to catch all the ints from, say, 1-100, and exclude them from being keys in the HashMap? I've searched online but have turned up nothing.
Thank you for any help you can give!
How about just doing wordCount.put(words[0], 1) for every line, after you've done the split? If the pattern is always "word number", you only need the first item from the split array.
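A minimal sketch of that idea (mine, not the code posted below; it assumes br is an already open BufferedReader over the file and that every line really is a single word followed by a number):
HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
String line;
while ((line = br.readLine()) != null) {
    String[] words = line.split(" ");
    if (words.length > 0 && !words[0].isEmpty()) {
        wordCount.put(words[0], 1); // keep only the word, ignore the trailing number
    }
}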
Update after some back and forth
public static HashMap<String, Integer> readVocabulary(String toRead)
{
    // Declare the HashMap to be returned
    HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
    String line = null;
    String[] words = null;
    int lineNumber = 0;
    FileReader reader = null;
    BufferedReader br = null;
    try {
        reader = new FileReader(toRead);
        br = new BufferedReader(reader);
        // Split each line into a String array to loop through
        while ((line = br.readLine()) != null) {
            lineNumber++;
            words = line.split(" ");
            if (words.length == 2) {
                if (wordCount.containsKey(words[0]))
                {
                    int n = wordCount.get(words[0]);
                    wordCount.put(words[0], ++n);
                }
                // Otherwise, puts the word into the HashMap
                else
                {
                    boolean word2IsInteger = true;
                    try
                    {
                        Integer.parseInt(words[1]);
                    }
                    catch (NumberFormatException nfe)
                    {
                        word2IsInteger = false;
                    }
                    if (word2IsInteger) {
                        wordCount.put(words[0], Integer.parseInt(words[1]));
                    }
                }
            }
        }
        br.close();
        br = null;
        reader.close();
        reader = null;
    }
    // Catching the file not found error
    // and any other errors
    catch (FileNotFoundException fnfe) {
        System.err.println("File not found.");
    }
    catch (Exception e) {
        System.err.print(e);
    }
    return wordCount;
}
To check if a String contains only digits, use String's matches() method, e.g.
if (!words[i].matches("^\\d+$")) {
    // NOT a String containing only digits
}
This won't require catching exceptions, and it doesn't matter if the number wouldn't fit inside an Integer.
Option 1: Ignore numbers separated by whitespace
Use Integer.parseInt() or Double.parseDouble() and catch the exception.
// for loop goes through every word
for (int i = 0; i < words.length; i++) {
    try {
        int wordAsInt = Integer.parseInt(words[i]);
    } catch (NumberFormatException e) {
        // Case if the HashMap already contains the key.
        // If so, just increments the value.
        if (wordCount.containsKey(words[i])) {
            int n = wordCount.get(words[i]);
            wordCount.put(words[i], ++n);
        }
        // Otherwise, puts the word into the HashMap
        else {
            wordCount.put(words[i], 1);
        }
    }
}
There is a Double.parseDouble(String) method, which you could use in place of Integer.parseInt(String) above if you wanted to eliminate all numbers, not just integers.
Option 2: Ignore numbers everywhere
Another option is to parse your input one character at a time and ignore any character that isn't a letter. When you scan whitespace, you can then add the word built from the characters just scanned to your HashMap. Unlike the methods mentioned above, scanning by character allows you to ignore numbers even if they appear immediately next to other characters.
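A minimal sketch of that character-by-character idea (my own, assuming line is one line of input and wordCount is the HashMap being built):
StringBuilder current = new StringBuilder();
for (int i = 0; i <= line.length(); i++) {
    char c = i < line.length() ? line.charAt(i) : ' '; // treat the end of the line as whitespace
    if (Character.isLetter(c)) {
        current.append(c); // keep letters, silently drop digits and punctuation
    } else if (Character.isWhitespace(c) && current.length() > 0) {
        String word = current.toString();
        wordCount.put(word, wordCount.containsKey(word) ? wordCount.get(word) + 1 : 1);
        current.setLength(0);
    }
}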

OpenNLP - Tokenize an Array of Strings

I am trying to tokenize a text file using the OpenNLP tokenizer.
What I do is read in a .txt file and store it in a list; then I want to iterate over every line, tokenize the line, and write the tokenized line to a new file.
In the line:
tokens[i] = tokenizer.tokenize(output[i]);
I get:
Type mismatch: cannot convert from String[] to String
This is my code:
public class Tokenizer {
    public static void main(String[] args) throws Exception {
        InputStream modelIn = new FileInputStream("en-token-max.bin");
        try {
            TokenizerModel model = new TokenizerModel(modelIn);
            Tokenizer tokenizer = new TokenizerME(model);
            CSVReader reader = new CSVReader(new FileReader("ParsedRawText1.txt"), ',', '"', 1);
            String csv = "ParsedRawText2.txt";
            CSVWriter writer = new CSVWriter(new FileWriter(csv), CSVWriter.NO_ESCAPE_CHARACTER, CSVWriter.NO_QUOTE_CHARACTER);
            // Read all rows at once
            List<String[]> allRows = reader.readAll();
            for (String[] output : allRows) {
                // get current row
                String[] tokens = new String[output.length];
                for (int i = 0; i < output.length; i++) {
                    tokens[i] = tokenizer.tokenize(output[i]);
                    System.out.println(tokens[i]);
                }
                // write line
                writer.writeNext(tokens);
            }
            writer.close();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            if (modelIn != null) {
                try {
                    modelIn.close();
                }
                catch (IOException e) {
                }
            }
        }
    }
}
Does anyone have any idea how to complete this task?
As the compiler says, you are trying to assign an array of Strings (the result of tokenize()) to a String (tokens[i] is a String). So you should declare and use tokens inside the inner loop and write tokens there, too:
for (String[] output : allRows) {
    // get current row
    for (int i = 0; i < output.length; i++) {
        String[] tokens = tokenizer.tokenize(output[i]);
        System.out.println(Arrays.toString(tokens)); // print the tokens, not the array reference (needs import java.util.Arrays)
        // write line
        writer.writeNext(tokens);
    }
}
writer.close();
By the way, are you sure that your source file is a CSV? If it is actually a plain text file, then you are splitting the text by commas and handing such chunks to OpenNLP, and it can perform worse, because its model was trained on normal sentences, not fragments split like yours.
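If the file really is plain text, a minimal sketch (my own, reusing the tokenizer from the question and assuming Java 8's String.join is available) would skip the CSV machinery and tokenize whole lines instead:
BufferedReader in = new BufferedReader(new FileReader("ParsedRawText1.txt"));
BufferedWriter out = new BufferedWriter(new FileWriter("ParsedRawText2.txt"));
String line;
while ((line = in.readLine()) != null) {
    String[] tokens = tokenizer.tokenize(line); // tokenize the full line
    out.write(String.join(" ", tokens));        // write one tokenized line per output line
    out.newLine();
}
in.close();
out.close();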

Java Matcher: How to match multiple lines with one regex

My method takes a file and tries to extract the text between the header ###Title### and the closing ###---###. I need it to extract multiple lines and put each line into an array. But since readAllLines() splits the file into separate lines, I don't know how to compare and match the pattern against them.
public static ArrayList<String> getData(File f, String title) throws IOException {
    ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
    ArrayList<String> output = new ArrayList<String>();
    //String? readLines = somehow make it possible to match
    System.out.println("Checking entry.");
    Pattern p = Pattern.compile("###" + title + "###(.*)###---###", Pattern.DOTALL);
    Matcher m = p.matcher(readLines);
    if (m.matches()) {
        m.matches();
        String matched = m.group(1);
        System.out.println("Contents: " + matched);
        String[] array = matched.split("\n");
        ArrayList<String> array2 = new ArrayList<String>();
        for (String j : array) {
            array2.add(j);
        }
        output = array2;
    } else {
        System.out.println("No matches.");
    }
    return output;
}
Here is my file, and I'm 100% sure that the program is reading the correct one.
###Test File###
Entry 1
Entry 2
Data 1
Data 2
Test 1
Test 2
###---###
The output says "No matches." instead of the entries.
You don't need regex for that. It's enough to loop through the array and compare items line by line, taking those between the start and end tags.
ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
ArrayList<String> output = new ArrayList<String>();
boolean matched = false;
for (String line : input) {
    if (line.equals("###---###") && matched) matched = false; // needed parentheses
    if (matched) output.add(line);
    if (line.equals("###Test File###") && !matched) matched = true;
}
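If you do want to stick with the regex approach from the question, a minimal sketch (my own, reusing input and output from the snippet above, title from the question's getData method, and assuming Java 8's String.join) is to join the lines back into one string before matching, since the pattern already uses Pattern.DOTALL:
String readLines = String.join("\n", input); // glue the lines back together
Pattern p = Pattern.compile("###" + title + "###(.*)###---###", Pattern.DOTALL);
Matcher m = p.matcher(readLines);
if (m.find()) {
    for (String entry : m.group(1).split("\n")) {
        if (!entry.isEmpty()) output.add(entry);
    }
}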
As per your comment, if the files are always going to look like the one posted, then I don't think regex is needed for this requirement. You can read line by line and check whether the line contains '###'.
public static void main(String args[])
{
    ArrayList<String> dataList = new ArrayList<String>();
    try {
        // Open the file that is the first
        // command line parameter
        FileInputStream fstream = new FileInputStream("textfile.txt");
        // Get the object of DataInputStream
        DataInputStream in = new DataInputStream(fstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;
        // Read File Line By Line
        while ((strLine = br.readLine()) != null) {
            // this will skip the header and footer lines containing '###'
            if (!strLine.contains("###"))
                dataList.add(strLine);
        }
        // Close the input stream
        in.close();
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
    // Now dataList has all the data between ###Test File### and ###---###
}
You can also change the argument passed to contains() according to which lines you want to ignore!

Print data from file to array

I need to have this file read into an array, not printed to the screen. And yes, I MUST use an array (school project). I'm very new to Java, so any help is appreciated. Any ideas? Thanks.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

public class HangmanProject
{
    public static void main(String[] args) throws FileNotFoundException
    {
        String scoreKeeper;  // to keep track of score
        int guessesLeft;     // to keep track of guesses remaining
        String wordList[];   // array to store words
        Scanner keyboard = new Scanner(System.in); // to read user's input
        System.out.println("Welcome to Hangman Project!");
        // Create a scanner to read the secret words file
        Scanner wordScan = null;
        try {
            wordScan = new Scanner(new BufferedReader(new FileReader("words.txt")));
            while (wordScan.hasNext()) {
                System.out.println(wordScan.next());
            }
        } finally {
            if (wordScan != null) {
                wordScan.close();
            }
        }
    }
}
Nick, you just gave us the final piece of the puzzle. If you know the number of lines you will be reading, you can simply define an array of that length before you read the file.
Something like...
String[] wordArray = new String[10];
int index = 0;
String word = null; // word to be read from file...
// Use buffered reader to read each line...
wordArray[index] = word;
index++;
Now, that example isn't going to mean much on its own, to be honest, so I did these two examples.
The first one uses the concept suggested by Alex, which allows you to read an unknown number of lines from the file.
The only trip-up is if the lines are separated by more than one line feed (i.e. there is an extra line between words).
public static void readUnknownWords() {
    // Reference to the words file
    File words = new File("Words.txt");
    // Use a StringBuilder to buffer the content as it's read from the file
    StringBuilder sb = new StringBuilder(128);
    BufferedReader reader = null;
    try {
        // Create the reader. A FileReader would be just as fine in this
        // example, but hey ;)
        reader = new BufferedReader(new FileReader(words));
        // The read buffer to use to read data into
        char[] buffer = new char[1024];
        int bytesRead = -1;
        // Read the file till we get to the end
        while ((bytesRead = reader.read(buffer)) != -1) {
            // Append the results to the string builder
            sb.append(buffer, 0, bytesRead);
        }
        // Split the string builder into individual words by the line break
        String[] wordArray = sb.toString().split("\n");
        System.out.println("Read " + wordArray.length + " words");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            reader.close();
        } catch (Exception e) {
        }
    }
}
The second demonstrates how to read the words into an array of known length. This is probably closer to what you actually want.
public static void readKnownWords() {
    // This is just the same as the previous example, except we
    // know in advance the number of lines we will be reading
    File words = new File("Words.txt");
    BufferedReader reader = null;
    try {
        // Create the word array of a known quantity
        // The quantity value could be defined as a constant
        // ie public static final int WORD_COUNT = 10;
        String[] wordArray = new String[10];
        reader = new BufferedReader(new FileReader(words));
        // Instead of reading into a char buffer, we are
        // going to take the easy route and read each line
        // straight into a String
        String text = null;
        // The current array index
        int index = 0;
        // Read the file till we reach the end
        // ps - my file had lots more words, so I put a limit
        // in the loop to prevent index out of bounds exceptions
        while ((text = reader.readLine()) != null && index < 10) {
            wordArray[index] = text;
            index++;
        }
        System.out.println("Read " + wordArray.length + " words");
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            reader.close();
        } catch (Exception e) {
        }
    }
}
If you find either of these useful, I would appreciate it if you would give me a small up-vote and mark Alex's answer as correct, as it's his idea that I've adapted.
Now, if you're really paranoid about which line break to use, you can find the value used by the system via the System.getProperties().getProperty("line.separator") property.
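For example, a minimal sketch (my own, reusing the sb StringBuilder from the first example) that splits on the platform separator, or on the regex \r?\n if the file might have been written on another system:
String separator = System.getProperty("line.separator"); // "\r\n" on Windows, "\n" on Unix
String[] wordArray = sb.toString().split(java.util.regex.Pattern.quote(separator));
// or, to cope with files written on either platform:
String[] portable = sb.toString().split("\\r?\\n");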
Do you need more help with reading the file, or with getting the String into a parsed array? If you can read the file into a String, simply do:
String[] words = readString.split("\n");
That will split the string at each line break, so assuming this is your text file:
Word1
Word2
Word3
words will be: {Word1, Word2, Word3}
If the words you are reading are stored one per line in the file, you can use hasNextLine() and nextLine() to read the text one line at a time. Using next() will also work, since you just need to put one word into the array, but nextLine() is usually preferred.
As for only using an array, you have two options:
You either declare a large array whose size you are sure will never be less than the total number of words;
Or you go through the file twice: the first time you count the number of elements, then you initialize the array with that count and go through the file a second time, adding the strings as you go.
It is usually recommended to use a dynamic collection such as an ArrayList. You can then use the toArray() method to turn the list into an array, as in the sketch below.
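A minimal sketch of that ArrayList-then-toArray() approach (my own, reusing the words.txt file name from the question):
ArrayList<String> wordList = new ArrayList<String>();
Scanner wordScan = new Scanner(new BufferedReader(new FileReader("words.txt")));
while (wordScan.hasNextLine()) {
    wordList.add(wordScan.nextLine()); // one word per line
}
wordScan.close();
String[] wordArray = wordList.toArray(new String[wordList.size()]); // the array the assignment requires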

Java: Search a word in multiple files

Basically, I need to check for a word's occurrences within multiple files.
Also, a word might exist in a single text file multiple times.
I want to save the positions of the word for each file, so I wrote the code below:
public static void findWord(String word, File file) {
    try {
        BufferedReader input = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(file)));
        String line;
        ArrayList<Integer> list = new ArrayList<Integer>();
        while ((line = input.readLine()) != null) {
            if (line.indexOf(word) > -1) {
                list.add(line.indexOf(word));
            }
        }
        System.out.println(file + ": " + list);
        input.close();
    }
    catch (Exception ex) {
        ex.printStackTrace();
    }
}
My code fails to add to the list after the first successful occurrence, so I have only one element in every list.
How do I fix it?
P.S. My text files consist of one line.
Here goes the fix (replace your while loop with this):
while ((line = input.readLine()) != null)
{
    int index = -1;
    while ((index = line.indexOf(word, index + 1)) > -1)
    {
        list.add(index);
    }
}
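And since the question mentions multiple files, a minimal sketch (my own, with made-up file names) simply calls findWord once per file:
String word = "abc"; // the word to search for
File[] files = { new File("first.txt"), new File("second.txt"), new File("third.txt") };
for (File f : files) {
    findWord(word, f); // prints each file name followed by the list of positions found in it
}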
