I'm creating an inverted index for an information retrieval course and can't figure out how to see if a word is in my nested hashmap.
"inner" contains a word & its frequency while the "invertedIndex" contains the name of the document it occurs in.
When processing a search, I'm trying to see if the user input defined as "query" is in the inner hashmap. I'm pretty sure the error is arising from the nested for loop at the bottom of my code...
My code is below.
public class PositionalIndex extends Stemmer{
// no more than this many input files needs to be processed
final static int MAX_NUMBER_OF_INPUT_FILES = 100;
// an array to hold Gutenberg corpus file names
static String[] inputFileNames = new String[MAX_NUMBER_OF_INPUT_FILES];
static int fileCount = 0;
// loads all files names in the directory subtree into an array
// violates good programming practice by accessing a global variable (inputFileNames)
public static void listFilesInPath(final File path) {
for (final File fileEntry : path.listFiles()) {
if (fileEntry.isDirectory()) {
listFilesInPath(fileEntry);
}
else if (fileEntry.getName().endsWith((".txt"))) {
inputFileNames[fileCount++] = fileEntry.getPath();
}
}
System.out.println("File count: " + fileCount);
}
public static void main(String[] args){
// did the user provide correct number of command line arguments?
// if not, print message and exit
if (args.length != 1){
System.err.println("Number of command line arguments must be 1");
System.err.println("You have given " + args.length + " command line arguments");
System.err.println("Incorrect usage. Program terminated");
System.err.println("Correct usage: java Ngrams <path-to-input-files>");
System.exit(1);
}
// extract input file name from command line arguments
// this is the name of the file from the Gutenberg corpus
String inputFileDirName = args[0];
System.out.println("Input files directory path name is: " + inputFileDirName);
// collect file names and write them to the inputFileNames array
listFilesInPath(new File (inputFileDirName));
// wordPattern specifies pattern for words using a regular expression
Pattern wordPattern = Pattern.compile("[a-zA-Z]+");
// wordMatcher finds words by matching the word pattern against the input
Matcher wordMatcher;
// a line read from file
String line;
// br for efficiently reading characters from an input stream
BufferedReader br = null;
// an extracted word from a line
String word;
// simplified version of porterStemmer
Stemmer porterStemmer = new Stemmer();
System.out.println("Processing files...");
// create an instance of the Stemmer class
Stemmer stemmer = new Stemmer();
Map<String, Map<String, Integer>> invertedIndex = new HashMap<String, Map<String, Integer>>();
Map<String, Integer> inner = new HashMap<String, Integer>();
// process one file at a time
for (int index = 0; index < fileCount; index++){
// open the input file, read one line at a time, extract words
// in the line, extract characters in a word, write words and
// character counts to disk files
try {
// get a BufferedReader object, which encapsulates
// access to a (disk) file
br = new BufferedReader(new FileReader(inputFileNames[index]));
// as long as we have more lines to process, read a line
// the following line is doing two things: makes an assignment
// and serves as a boolean expression for while test
while ((line = br.readLine()) != null) {
// process the line by extracting words using the wordPattern
wordMatcher = wordPattern.matcher(line);
// process one word at a time
while ( wordMatcher.find() ) {
// extract the word
word = line.substring(wordMatcher.start(), wordMatcher.end());
word = word.toLowerCase();
//use Stemmer class to stem word & convert to lowercase
porterStemmer.stemWord(word);
if (!inner.containsKey(word)) {
inner.put(word, 1);
}
else
{
inner.put(word, inner.get(word) + 1);
}
} // end one word at a time while
} // end outer while
invertedIndex.put(inputFileNames[index], inner);
/*for(String x : inner.keySet()) {
System.out.println(x);
}*/
inner.clear();
} // end try
catch (IOException ex) {
System.err.println("File " + inputFileNames[index] + " not found. Program terminated.\n");
System.exit(1);
}
} // end for
System.out.print("Enter a query: ");
Scanner kbd = new Scanner(System.in);
String query = kbd.next();
for(String fileName : invertedIndex.keySet()) {
for(String wordInFile : invertedIndex.get(fileName).keySet())
{
if(wordInFile.equals(query))
{
System.out.println(query + " was found in document " + fileName);
}
}
}
}
}
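Why are you calling
inner.clear()
after each file? invertedIndex stores a reference to the same inner map for every document, so clearing it wipes the counts you just added, and every document ends up sharing one empty map. Create a new inner map for each file and put that into invertedIndex instead of clearing and reusing a single one.
Once that is fixed, you can look the query up directly with containsKey: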
for(String w : invertedIndex.keySet()) {
Map<String, Integer> fileWordMap = invertedIndex.get(w);
if(fileWordMap.containsKey(query))
{
System.out.println(query + " was found in document " + w);
}
}
or as per your original code
for(String fileName : invertedIndex.keySet()) {
for(String wordInFile : invertedIndex.get(fileName).keySet())
{
if(wordInFile.equals(query))
{
System.out.println(query + " was found in document " + fileName);
}
}
}
As a tip, try using variable names that tell you what the code is doing :) It's very easy to get confused when the names are arbitrary.
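To make the "new inner map per file" advice concrete, here is a minimal sketch. It assumes in-memory documents instead of files and leaves out the Stemmer; the class name InvertedIndexSketch and the helper names buildIndex and search are made up for this illustration and are not from the original code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InvertedIndexSketch {

    // Builds document name -> (word -> frequency).
    // A NEW inner map is created for every document, so nothing is ever
    // cleared or shared between documents.
    static Map<String, Map<String, Integer>> buildIndex(Map<String, String> documents) {
        Map<String, Map<String, Integer>> invertedIndex = new HashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            Map<String, Integer> wordCounts = new HashMap<>();
            for (String word : doc.getValue().toLowerCase().split("[^a-z]+")) {
                if (!word.isEmpty()) {
                    wordCounts.merge(word, 1, Integer::sum);
                }
            }
            invertedIndex.put(doc.getKey(), wordCounts);
        }
        return invertedIndex;
    }

    // Returns the names of all documents whose inner map contains the query.
    static List<String> search(Map<String, Map<String, Integer>> invertedIndex, String query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> entry : invertedIndex.entrySet()) {
            if (entry.getValue().containsKey(query.toLowerCase())) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents standing in for the corpus files.
        Map<String, String> documents = new HashMap<>();
        documents.put("a.txt", "The quick brown fox");
        documents.put("b.txt", "The lazy dog");
        Map<String, Map<String, Integer>> index = buildIndex(documents);
        System.out.println("fox: " + search(index, "fox"));
        System.out.println("the: " + search(index, "the"));
    }
}
```

If you stem the words while indexing, remember to stem the query the same way before looking it up, otherwise the keys will not match.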
I have a .csv file that is formated like this:
ID,date,itemName
456,1-4-2020,Lemon
345,1-3-2020,Bacon
345,1-4-2020,Sausage
123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-4-2020,Cheese
I have already implemented the algorithm to go through the file, scan for the first number, sort by it in ascending order, and produce a new output:
123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-3-2020,Bacon
345,1-4-2020,Cheese
345,1-4-2020,Sausage
456,1-4-2020,Lemon
My question is, how do I implement my algorithm to make an output that counts the duplicate first number entries and reformat it to make it look like this...
123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon
...so that it counts the number of occurrences for each ID, writes that count on the line, and, if the date for that ID is also the same, combines the item names on the same line. Below is my source code (each line in the .csv is made into an object named 'receipt' that has ID, date, and name with their respective get() methods):
public class ReadFile {
private static List<Receipt> readFile() {
List<Receipt> receipts = new ArrayList<>();
try {
BufferedReader reader = new BufferedReader(new FileReader("dataset.csv"));
// Move past the first title line
reader.readLine();
String line = reader.readLine();
// Start reading from second line till EOF, split each string at ","
while (line != null) {
String[] attributes = line.split(",");
Receipt attribute = getAttributes(attributes);
receipts.add(attribute);
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
return receipts;
}
private static Receipt getAttributes(String[] attributes) {
// Get ID located before the first ","
long memberNumber = Long.parseLong(attributes[0]);
// Get date located after the first ","
String date = attributes[1];
// Get name located after the second ","
String name = attributes[2];
return new Receipt(memberNumber, date, name);
}
// Parse the data into new file after sorting
private static void parse(List<Receipt> receipts) {
PrintWriter output = null;
try {
output = new PrintWriter("output.txt");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
// For each receipts, assert the text output stream is not null, print line.
for (Receipt p : receipts) {
assert output != null;
output.println(p.getMemberNumber() + "," + p.getDate() + "," + p.getName());
}
assert output != null;
output.close();
}
// Main method, accept input file, sort and parse
public static void main(String[] args) {
List<Receipt> receipts = readFile();
QuickSort q = new QuickSort();
q.quickSort(receipts);
parse(receipts);
}
}
The easiest way is to use a map.
Sample data from your file.
String[] lines = {
"123,1-1-2020,Apple",
"123,1-2-2020,Pineapple",
"234,1-2-2020,Beer",
"345,1-3-2020,Bacon",
"345,1-4-2020,Cheese",
"345,1-4-2020,Sausage",
"456,1-4-2020,Lemon"};
Create a map.
As you read the lines, split them and add them to the map using the compute method. This puts the line in if the key (number and date) doesn't exist yet; otherwise it appends the last item to the existing entry.
The file does not have to be sorted, but the values are appended in the order they are encountered.
Map<String, String> map = new LinkedHashMap<>();
for (String line : lines) {
String[] vals = line.split(",");
// if v is null, add the line
// if v exists, take the existing line and append the last value
map.compute(vals[0]+vals[1], (k,v)->v == null ? line : v +","+vals[2]);
}
for (String line : map.values()) {
String[] fields = line.split(",",3);
int count = fields[2].split(",").length;
System.out.printf("%s,%s,%s,%s%n", fields[0],fields[1],count,fields[2]);
}
For this sample, the run prints:
123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon
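As a variation on the same idea, the items can be kept in a list per key so the count is simply the list size and the line never has to be split a second time. This is only a sketch; the class name GroupReceiptsSketch and the method name group are invented for the example, and the key uses a comma between ID and date so that two different ID/date pairs cannot collide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupReceiptsSketch {

    // Groups item names by "ID,date" and formats each group as
    // ID,date,count,item1,item2,...
    static List<String> group(List<String> lines) {
        // LinkedHashMap keeps the keys in first-seen order.
        Map<String, List<String>> itemsByKey = new LinkedHashMap<>();
        for (String line : lines) {
            String[] vals = line.split(",", 3);
            String key = vals[0] + "," + vals[1];
            itemsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(vals[2]);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : itemsByKey.entrySet()) {
            List<String> items = e.getValue();
            result.add(e.getKey() + "," + items.size() + "," + String.join(",", items));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("345,1-3-2020,Bacon");
        lines.add("345,1-4-2020,Cheese");
        lines.add("345,1-4-2020,Sausage");
        lines.add("456,1-4-2020,Lemon");
        for (String out : group(lines)) {
            System.out.println(out);
        }
    }
}
```

For the four sample lines in main, this prints 345,1-3-2020,1,Bacon, then 345,1-4-2020,2,Cheese,Sausage, then 456,1-4-2020,1,Lemon.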
I am trying to find the String "5464" in a csv document then have it return all of the values under that String (same number of Delimiters from the start of the line), until reaching the end of the list (no more values in the column). Any help would be sincerely appreciated.
import javax.swing.JOptionPane;
public class SearchNdestroyV2 {
private static Scanner x;
public static void main(String[] args) {
String filepath = "tutorial.txt";
String searchTerm = "5464"
readRecord(searchTerm,filepath);
}
public void readRecord(String searchTerm, String filepath)
{
boolean found = false;
String ID = ""; String ID2 = ""; String ID3 = "";
}
try
{
x = new Scanner(new File(filepath));
x.useDelimeter("[,\n]");
while(x.hasNext() && !found )
{
ID = x.next();
ID2 = x.nextLine();
ID3 = x.nextLine();
if(ID.equals(searchTerm))
{
found = true;
}
}
if (found)
{
JOptionPane.showMessageDialog(null,"ID: " + ID + "ID2: " + ID2 + "ID3: "+ID3);
}
}
else
{
JOptionPane.showMessageDialog(null, "Error:");
}
catch(Exception e)
{
}
{
}
I'm not exactly sure of what you mean. The way I read your question:
You want to locate a specific String ("5464") that is contained within a specific column within a comma (,) delimited CSV file. If this specific string (search term) is found then retrieve all other values contained within the same column for the rest of the CSV file records from the point of location. Here is how:
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;
import javax.swing.JOptionPane;
public class SearchNDestroyV2 {
private Scanner fileInput;
public static void main(String[] args) {
// Do this if you don't want to deal with statics
new SearchNDestroyV2().startApp(args);
}
private void startApp(String[] args) {
String filepath = "tutorial.txt";
String searchTerm = "5464";
readRecord(searchTerm, filepath);
}
public void readRecord(String searchTerm, String filepath) {
try {
fileInput = new Scanner(new File(filepath));
// Variable to hold each file line data read.
String line;
// Used to hold the column index value to
// where the found search term is located.
int foundColumn = -1;
// An ArrayList to hold the column values retrieved from file.
ArrayList<String> columnList = new ArrayList<>();
// Read file to the end...
while(fileInput.hasNextLine()) {
// Read in file - 1 trimmed line per iteration
line = fileInput.nextLine().trim();
//Skip blank lines (if any).
if (line.equals("")) {
continue;
}
// Split the currently read line into a String Array
// based on the comma (,) delimiter
String[] lineParts = line.split("\\s{0,},\\s{0,}"); // Split on any comma/space situation.
// Iterate through the lineParts array to see if any
// delimited portion equals the search term.
for (int i = 0; i < lineParts.length; i++) {
/* This IF statement will always accept the column data and
store it if the foundColumn variable equals i OR the current
column data being checked is equal to the search term.
Initially when declared, foundColumn equals -1 and will
never equal i unless the search term is indeed found. */
if (foundColumn == i || lineParts[i].equals(searchTerm)) {
// Found a match
foundColumn = i; // Hold the Column index number of the found item.
columnList.add(lineParts[i]); // Add the found item to the List.
break; // Get out of this loop. Don't need it anymore for this line.
}
}
}
if (foundColumn != -1) {
System.out.println("Items Found:" + System.lineSeparator() +
"============");
for (String str : columnList) {
System.out.println(str);
}
}
else {
JOptionPane.showMessageDialog(null, "Can't find the Search Term: " + searchTerm);
}
}
catch(Exception ex) {
System.out.println(ex.getMessage());
}
}
}
If however, what you want is to search through the CSV file and as soon as any particular column equals the Search Term ("5464") then simply store the CSV line (all its data columns) which contains that Search Term. Here is how:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
import javax.swing.JFrame;
import javax.swing.JOptionPane;
public class SearchNDestroyV2 {
/* A JFrame used as Parent for displaying JOptionPane dialogs.
Using 'null' can allow the dialog to open behind other open
applications (like the IDE). This ensures that it will be
displayed above all other applications at center screen. */
JFrame iFRAME = new JFrame();
{
iFRAME.setAlwaysOnTop(true);
iFRAME.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iFRAME.setLocationRelativeTo(null);
}
public static void main(String[] args) {
// Do this if you don't want to deal with statics
new SearchNDestroyV2().startApp(args);
}
private void startApp(String[] args) {
String filepath = "tutorial.txt";
String searchTerm = "5464";
ArrayList<String> recordsFound = readRecord(searchTerm, filepath);
/* Display any records found where a particular column
matches the Search Term. */
if (!recordsFound.isEmpty()) {
System.out.println("Records Found:" + System.lineSeparator()
+ "==============");
for (String str : recordsFound) {
System.out.println(str);
}
}
else {
JOptionPane.showMessageDialog(iFRAME, "Can't find the Search Term: " + searchTerm);
iFRAME.dispose();
}
}
/**
* Returns an ArrayList (of String) of any comma delimited CSV file line
* records which contain any column matching the supplied Search Term.<br>
*
* @param searchTerm (String) The String to search for in all Record
* columns.<br>
*
* @param filepath (String) The CSV (or text) file that contains the data
* records.<br>
*
* @return ({@code ArrayList<String>}) An ArrayList of String Type which
* contains the file line records where any particular column
* matches the supplied Search Term.
*/
public ArrayList<String> readRecord(String searchTerm, String filepath) {
// An ArrayList to hold the line(s) retrieved from file
// that match the search term.
ArrayList<String> linesList = new ArrayList<>();
// Try With Resources used here to auto-close the Scanner reader.
try (Scanner fileInput = new Scanner(new File(filepath))) {
// Variable to hold each file line data read.
String line;
// Read file to the end...
while (fileInput.hasNextLine()) {
// Read in file - 1 trimmed line per iteration
line = fileInput.nextLine().trim();
//Skip blank lines (if any).
if (line.equals("")) {
continue;
}
// Split the currently read line into a String Array
// based on the comma (,) delimiter
String[] lineParts = line.split("\\s{0,},\\s{0,}"); // Split on any comma/space situation.
// Iterate through the lineParts array to see if any
// delimited portion equals the search term.
for (int i = 0; i < lineParts.length; i++) {
if (lineParts[i].equals(searchTerm)) {
// Found a match
linesList.add(line); // Add the found line to the List.
break; // Get out of this loop. Don't need it anymore for this line.
}
}
}
}
catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
}
return linesList; // Return the ArrayList
}
}
Please try to note the differences between the two code examples. In particular how the file reader (Scanner object) is closed, etc.
I get multiple errors when writing the header of a method that takes an array list and an integer as input.
I have tried several different ways of writing the header for the method. The body is good and gives me what I want but I can't get the header/call name (I don't know what you call the first line of a method) to not throw errors
/**
* Creates Arraylist "list" using prompt user for the input and output file path and sets the file name for the output file to
* p01-runs.txt
*
*/
Scanner scan = new Scanner(System.in);
System.out.println("Please enter the path to your source file: ");
String inPath = scan.nextLine(); // sets inPath to user supplied path
System.out.println("Please enter the path for your source file: ");
String outPath = scan.nextLine() + "p01-runs.txt"; // sets outPath to user supplied input path
ArrayList<Integer> listRunCount = new ArrayList<Integer>();
ArrayList<Integer> list = new ArrayList<Integer>();
/**
* Reads data from input file and populates array with integers.
*/
FileReader fileReader = new FileReader(inPath);
BufferedReader bufferedReader = new BufferedReader(fileReader);
// file writing buffer
PrintWriter printWriter = new PrintWriter(outPath);
System.out.println("Reading file...");
/**
* Reads lines from the file, removes spaces in the line converts the string to
* an integer and adds the integer to the array
*/
File file = new File(inPath);
Scanner in = new Scanner(file);
String temp=null;
while (in.hasNextLine()) {
temp = in.nextLine();
temp = temp.replaceAll("\\s","");
int num = Integer.parseInt(temp);
list.add(num);
}
listRunCount.findRuns(list, RUN_UP);
//********************************************************************************************************
public ArrayList<Integer> findRuns(ArrayList<Integer> list, int RUN_UP){
returns listRunCount;
}
error messages
Multiple markers at this line
- Syntax error on token "int", delete this token
- Syntax error, insert ";" to complete LocalVariableDeclarationStatement
- Integer cannot be resolved to a variable
- ArrayList cannot be resolved to a variable
- Syntax error, insert ";" to complete LocalVariableDeclarationStatement
- Illegal modifier for parameter findRuns; only final is permitted
- Syntax error, insert ") Expression" to complete CastExpression
- Syntax error on token "findRuns", = expected after this token
- Syntax error, insert "VariableDeclarators" to complete
LocalVariableDeclaration
- Syntax error, insert ";" to complete Statement
This sort of thing removes the need for statics. If you run your code directly from within the static method main(), then every class method and member variable that main() calls or references without an object instance must also be declared static. Creating an instance and calling an instance method, like this:
public class Main {
public static void main(String[] args) {
new Main().run();
}
}
eliminates the need for statics. In my opinion to properly do this the run() method within the class should also be passed the args[] parameter:
public class Main {
public static void main(String[] args) {
new Main().run(args);
}
private void run(String[] args) {
// Your project code here
}
}
That way any Command Line arguments passed to the application can also be processed from within the run() method. You will find that most people won't use the method name run for this sort of thing since run() is a method name more related to the running of a Thread. A name like startApp() is more appropriate.
public class Main {
public static void main(String[] args) {
new Main().startApp(args);
}
private void startApp(String[] args) {
// Your project code here
}
}
With all this in mind your code might look something like this:
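import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.JFrame;
import javax.swing.JOptionPane;
public class Main {
public static void main(String[] args) {
new Main().run(args);
}
private void run(String[] args) {
String runCountFileCreated = createListRunCount();
if (!runCountFileCreated.equals("")) {
System.out.println("The count file created was: " + runCountFileCreated);
}
else {
System.out.println("A count file was NOT created!");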
}
}
/**
* Creates an ArrayList "list" using prompts for the input and output file
* paths and sets the file name for the output (destination) file to an
* incremental format of p01-runs.txt, p02-runs.txt, p03-runs.txt, etc. If
* p01 exists then the file name is incremented to p02, etc. The file name
* is incremented until it is determined that the file name does not exist.
*
* @return (String) The path and file name of the generated destination
* file.
*/
public String createListRunCount() {
String ls = System.lineSeparator();
File file = null;
Scanner scan = new Scanner(System.in);
// Get the source file path from User...
String sourceFile = "";
while (sourceFile.equals("")) {
System.out.print("Please enter the path to your source file." + ls
+ "Enter nothing to cancel this process:" + ls
+ "Source File Path: --> ");
sourceFile = scan.nextLine().trim(); // User Input
/* If nothing was entered (just the enter key was hit)
then exit this method. */
if (sourceFile.equals("")) {
System.out.println("Process CANCELED!");
return "";
}
// See if the supplied file exists...
file = new File(sourceFile);
if (!file.exists()) {
System.out.println("The supplied file Path/Name can not be found!." + ls
+ "[" + sourceFile + "]" + ls + "Please try again...");
sourceFile = "";
}
}
String destinationFile = "";
while (destinationFile.equals("")) {
System.out.print(ls + "Please enter the path to folder where data will be saved." + ls
+ "If the supplied folder path does not exist then an attempt" + ls
+ "will be made to automatically create it. DO NOT supply a" + ls
+ "file name. Enter nothing to cancel this process:" + ls
+ "Destination Folder Path: --> ");
String destinationPath = scan.nextLine();
if (destinationPath.equals("")) {
System.out.println("Process CANCELED!");
return "";
}
// Does supplied path exist. If not then create it...
File fldr = new File(destinationPath);
if (fldr.exists() && fldr.isDirectory()) {
/* Supplied folder exists. Now establish a new incremental file name.
Get the list of files already contained within this folder that
start with p and a number (ex: p01-..., p02--..., p03--..., etc)
*/
String[] files = fldr.list(); // Get a list of files in the supplied folder.
// Are there any files in the supplied folder?
if (files.length > 0) {
//Yes, so process them...
List<String> pFiles = new ArrayList<>();
for (String fileNameString : files) {
if (fileNameString.matches("^p\\d+\\-runs\\.txt$")) {
pFiles.add(fileNameString);
}
}
// Get the largest p file increment number
int largestPnumber = 0;
for (int i = 0; i < pFiles.size(); i++) {
int fileNumber = Integer.parseInt(pFiles.get(i).split("-")[0].replace("p", ""));
if (fileNumber > largestPnumber) {
largestPnumber = fileNumber;
}
}
largestPnumber++; // Increment the largest p file number by 1
// Create the new file name...
String fileName = String.format("p%02d-runs.txt", largestPnumber);
//Create the new destination File path and name string
destinationFile = fldr.getAbsolutePath() + "\\" + fileName;
}
else {
// No, so let's start with p01-runs.txt
destinationFile = fldr.getAbsolutePath() + "\\p01-runs.txt";
}
}
else {
// Supplied folder does not exist so create it.
// User Confirmation of folder creation...
JFrame iFrame = new JFrame();
iFrame.setAlwaysOnTop(true);
iFrame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iFrame.setLocationRelativeTo(null);
int res = JOptionPane.showConfirmDialog(iFrame, "The supplied storage folder does not exist!"
+ ls + "Do you want to create it?", "Create Folder?", JOptionPane.YES_NO_OPTION);
iFrame.dispose();
if (res != 0) {
destinationFile = "";
continue;
}
try {
fldr.mkdirs();
}
catch (Exception ex) {
// Error in folder creation...
System.out.println(ls + "createListRunCount() Method Error! Unable to create path!" + ls
+ "[" + fldr.getAbsolutePath() + "]" + ls + "Please try again..." + ls);
destinationFile = "";
continue;
}
destinationFile = fldr.getAbsolutePath() + "\\p01-runs.txt";
}
}
ArrayList<Integer> list = new ArrayList<>();
/* Prepare for writing to the destination file.
Try With Resources is used here to auto-close
the writer. */
try (PrintWriter printWriter = new PrintWriter(destinationFile)) {
System.out.println(ls + "Reading file...");
/**
* Reads lines from the file, removes spaces in the line converts
* the string to an integer and adds the integer to the List.
*/
String temp = null;
/* Prepare for reading from the source file.
Try With Resources is used here to auto-close
the reader. */
try (Scanner reader = new Scanner(file)) {
while (reader.hasNextLine()) {
temp = reader.nextLine().replaceAll("\\s+", "");
/* Make sure the line isn't blank and that the
line actually contains no alpha characters.
The regular expression: "\\d+" is used for
this with the String#matches() method. */
if (temp.equals("") || !temp.matches("\\d+")) {
continue;
}
int num = Integer.parseInt(temp);
list.add(num);
}
// PLACE YOUR WRITER PROCESSING CODE HERE
}
catch (FileNotFoundException ex) {
Logger.getLogger("createListRunCount() Method Error!").log(Level.SEVERE, null, ex);
}
}
catch (FileNotFoundException ex) {
Logger.getLogger("createListRunCount() Method Error!").log(Level.SEVERE, null, ex);
}
/* return the path and file name of the
destination file auto-created. */
return destinationFile;
}
}
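The compiler errors quoted in the question are consistent with findRuns being declared inside another method's body and being called on the ArrayList itself (plus the "returns" typo). Below is a minimal sketch of the alternative layout. The body of findRuns is an assumption, because the question does not show it: here it is assumed to return the length of each maximal ascending (or descending) run, and RUN_UP is assumed to be an int constant. The class name RunsSketch is likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunsSketch {

    // Hypothetical direction flag; the question's RUN_UP constant is not shown.
    static final int RUN_UP = 1;

    public static void main(String[] args) {
        new RunsSketch().startApp();
    }

    private void startApp() {
        List<Integer> list = Arrays.asList(1, 2, 3, 2, 5, 6, 1);
        // Call the method on this object, not on the list.
        List<Integer> runLengths = findRuns(list, RUN_UP);
        System.out.println(runLengths); // prints [3, 3, 1]
    }

    // Declared at class level, next to startApp(), never inside another method.
    // Assumed behavior: returns the length of each maximal run in the given direction.
    public List<Integer> findRuns(List<Integer> list, int direction) {
        List<Integer> runLengths = new ArrayList<>();
        if (list.isEmpty()) {
            return runLengths;
        }
        int current = 1;
        for (int i = 1; i < list.size(); i++) {
            boolean continues = (direction == RUN_UP)
                    ? list.get(i) > list.get(i - 1)
                    : list.get(i) < list.get(i - 1);
            if (continues) {
                current++;
            } else {
                runLengths.add(current);
                current = 1;
            }
        }
        runLengths.add(current);
        return runLengths; // "return", not "returns"
    }
}
```

Whatever findRuns is really meant to compute, the structural points stay the same: the method header sits directly inside the class, and the call is findRuns(list, RUN_UP) rather than listRunCount.findRuns(...).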
I am new to Java. I just want to read each string in a file and print it on the console.
Code:
public static void main(String[] args) throws Exception {
File file = new File("/Users/OntologyFile.txt");
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(
fstream));
String data = new String();
while ((data = infile.readLine()) != null) { // use if for reading just 1 line
System.out.println(""+data);
}
} catch (IOException e) {
// Error
}
}
If file contains:
Add label abc to xyz
Add instance cdd to pqr
I want to read each word from file and print it to a new line, e.g.
Add
label
abc
...
And afterwards, I want to extract the index of a specific string, for instance get the index of abc.
Can anyone please help me?
It sounds like you want to be able to do two things:
1. Print all words inside the file
2. Find the index of a specific word
In that case, I would suggest scanning all lines, splitting by any whitespace character (space, tab, etc.) and storing the words in a collection so you can search it later on. Now the question is: can you have repeats, and in that case which index would you like to print? The first? The last? All of them?
Assuming words are unique, you can simply do:
public static void main(String[] args) throws Exception {
File file = new File("/Users/OntologyFile.txt");
ArrayList<String> words = new ArrayList<String>();
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(
fstream));
String data = null;
while ((data = infile.readLine()) != null) {
for (String word : data.split("\\s+")) {
words.add(word);
System.out.println(word);
}
}
} catch (IOException e) {
// Error
}
// search for the index of abc:
for (int i = 0; i < words.size(); i++) {
if (words.get(i).equals("abc")) {
System.out.println("abc index is " + i);
break;
}
}
}
If you don't break, it'll print every index of abc (if words are not unique). You could of course optimize it more if the set of words is very large, but for a small amount of data, this should suffice.
Of course, if you know in advance which words' indices you'd like to print, you could forego the extra data structure (the ArrayList) and simply print that as you scan the file, unless you want the printings (of words and specific indices) to be separate in output.
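If words can repeat and you want all of their positions, one option is to record every index per word while scanning. The sketch below assumes the lines are already in memory and uses zero-based positions counted across the whole file; the class name WordPositionsSketch and the method name positions are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordPositionsSketch {

    // Maps each word to every zero-based position at which it occurs,
    // counting words across all lines.
    static Map<String, List<Integer>> positions(List<String> lines) {
        Map<String, List<Integer>> byWord = new LinkedHashMap<>();
        int index = 0;
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                byWord.computeIfAbsent(word, k -> new ArrayList<>()).add(index);
                index++;
            }
        }
        return byWord;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Add label abc to xyz");
        lines.add("Add instance cdd to pqr");
        Map<String, List<Integer>> byWord = positions(lines);
        System.out.println("abc: " + byWord.get("abc")); // prints abc: [2]
        System.out.println("Add: " + byWord.get("Add")); // prints Add: [0, 5]
    }
}
```

A lookup for a word that never occurs returns null from get, so check for that before using the list.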
Split the String received for any whitespace with the regex \\s+ and print out the resultant data with a for loop.
public static void main(String[] args) { // Don't make main throw an exception
File file = new File("/Users/OntologyFile.txt");
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(fstream));
String data;
while ((data = infile.readLine()) != null) {
String[] words = data.split("\\s+"); // Split on whitespace
for (String word : words) { // Iterate through info
System.out.println(word); // Print it
}
}
} catch (IOException e) {
// Probably best to actually have this on there
System.err.println("Error found.");
e.printStackTrace();
}
}
Just add a for-each loop before printing the output :-
while ((data = infile.readLine()) != null) { // use if for reading just 1 line
for(String temp : data.split(" "))
System.out.println(temp); // no need to concatenate the empty string.
}
This will automatically print the individual strings, obtained from each String line read from the file, in a new line.
And afterwards, I want to extract the index of a specific string, for
instance get the index of abc.
I don't know which index you are actually talking about. But if you want the position of the word within each line being read, add a temporary counter initialised to 0.
Increment it for every word until temp equals abc. Because the counter is incremented before the comparison, the reported position is 1-based. Like,
int count = 0;
for(String temp : data.split(" ")){
count++;
if("abc".equals(temp))
System.out.println("Index of abc is : "+count);
System.out.println(temp);
}
Use the split() method available in the String class. You may adapt it according to your need.
or
use length() to iterate through the complete line character by character,
and whenever you hit a non-alphabetic character, take the substring() up to it and write it on a new line.
List<String> words = new ArrayList<String>();
while ((data = infile.readLine()) != null) {
String[] lineWords = data.split(" ");
for(String d : lineWords) {
System.out.println(d);
}
// add the individual words, not the whole line as a single element
words.addAll(Arrays.asList(lineWords));
}
//words List will hold all the words. Do words.indexOf("abc") to get index
if(words.indexOf("abc") < 0) {
System.out.println("word not present");
} else {
System.out.println("word present at index " + words.indexOf("abc"));
}
My method takes a file, and tries to extract the text between the header ###Title### and closing ###---###. I need it to extract multiple lines and put each line into an array. But since readAllLines() converts all lines into an array, I don't know how to compare and match it.
public static ArrayList<String> getData(File f, String title) throws IOException {
ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
ArrayList<String> output = new ArrayList<String>();
//String? readLines = somehow make it possible to match
System.out.println("Checking entry.");
Pattern p = Pattern.compile("###" + title + "###(.*)###---###", Pattern.DOTALL);
Matcher m = p.matcher(readLines);
if (m.matches()) {
m.matches();
String matched = m.group(1);
System.out.println("Contents: " + matched);
String[] array = matched.split("\n");
ArrayList<String> array2 = new ArrayList<String>();
for (String j:array) {
array2.add(j);
}
output = array2;
} else {
System.out.println("No matches.");
}
return output;
}
Here is my file, and I'm 100% sure that the compiler is reading the correct one.
###Test File###
Entry 1
Entry 2
Data 1
Data 2
Test 1
Test 2
###---###
The output says "No matches." instead of the entries.
You don't need regex for that. It's enough to loop through the array and compare items line by line, taking those between the start and end tags.
ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
ArrayList<String> output = new ArrayList<String>();
boolean matched = false;
for (String line : input) {
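if (line.equals("###---###") && matched) matched = false; // closing marker reached: stop collecting
if (matched) output.add(line); // line lies between the two markers
if (line.equals("###Test File###") && !matched) matched = true; // header found: collect from the next line on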
}
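The same loop can be wrapped in a method that takes the title as a parameter, as getData in the question does. This is a sketch under a few assumptions: the markers sit on lines of their own, only the first matching section is wanted, and the names SectionSketch and extractSection are invented here. It works on List<String>, which is what Files.readAllLines is declared to return, so no cast to ArrayList is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SectionSketch {

    static final String END_MARKER = "###---###";

    // Returns the lines strictly between "###<title>###" and the next "###---###".
    // Returns an empty list if the header is never found.
    static List<String> extractSection(List<String> lines, String title) {
        String startMarker = "###" + title + "###";
        List<String> output = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!inside) {
                if (trimmed.equals(startMarker)) {
                    inside = true; // start collecting from the next line
                }
            } else if (trimmed.equals(END_MARKER)) {
                break; // closing marker: the section is complete
            } else {
                output.add(line);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // In real use the lines would come from Files.readAllLines(f.toPath(), charset).
        List<String> lines = Arrays.asList(
                "###Test File###", "Entry 1", "Entry 2", "###---###", "ignored");
        System.out.println(extractSection(lines, "Test File")); // prints [Entry 1, Entry 2]
    }
}
```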
As per your comment, if the files are always going to be laid out as posted, then I don't think regex is needed for this requirement. You can read line by line and skip every line that contains '###'.
public static void main(String args[])
{
ArrayList<String> dataList = new ArrayList<String>();
try{
// Open the input file (hard-coded here;
// pass args[0] instead to use a command line parameter)
FileInputStream fstream = new FileInputStream("textfile.txt");
// Get the object of DataInputStream
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// this line will skip the header and footer with '###'
if(!strLine.contains("###"))
dataList.add(strLine);
}
//Close the input stream
in.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
//Now dataList has all the data between ###Test File### and ###---###
}
You can also change the contains method parameter according to your requirement to ignore lines!