How do I count word occurrences in a csv file? - java

I have a CSV file that I need to read, and I want to display the number of occurrences of each word. The application should only count words that have more than one letter and are not alphanumeric, and all words should be converted to lowercase.
This is what I have right now; I'm stuck and have no idea where to go from here.
public static void countWordNumber() throws IOException, CsvException {
String pathFile1 = "src/main/resources/Documents/Example.csv";
CSVReader reader = new CSVReaderBuilder(new FileReader(pathFile1)).withSkipLines(1).build();
Map<String, Integer> frequency = new HashMap<>();
String[] line;
while ((line = reader.readNext()) != null) {
String words = line[1];
words = words.replaceAll("\\p{Punct}", " ").trim();
words = words.replaceAll("\\s{2}", " ");
words = words.toLowerCase();
if (frequency.containsKey(words)) {
frequency.put(words, frequency.get(words) + 1);
} else {
frequency.put(words, 0);
}
}
}
I am trying to read the second element of the array returned for each CSV row, which is line[1]. This is where the text of the document is located.
I have replaced all punctuation with spaces and trimmed the result. Where there are multiple consecutive spaces I have replaced them with one, and I have converted the text to lowercase.
The output I am trying to achieve is:
Title of document: XXXX
Word: is, Value: 3
EDIT: This is an example of my input file.
title,text,date
exampleTitle,This is is is an example example, April 2022

Your solution does not look that bad. For the initialization, though, I would replace
frequency.put(words, 0);
with
frequency.put(words, 1);
Since I am missing your input file, I created a dummy that works fine.
Map<String, Integer> frequency = new HashMap<>();
List<String> csvSimulation = new ArrayList<String>();
csvSimulation.add("test");
csvSimulation.add( "marvin");
csvSimulation.add("aaaaa");
csvSimulation.add("nothing");
csvSimulation.add("test");
csvSimulation.add("test");
csvSimulation.add("aaaaa");
csvSimulation.add("stackoverflow");
csvSimulation.add("test");
csvSimulation.add("bread");
Iterator<String> iterator = csvSimulation.iterator();
while(iterator.hasNext()){
String words = iterator.next();
words = words.toLowerCase();
if (frequency.containsKey(words)) {
frequency.put(words, frequency.get(words) + 1);
} else {
frequency.put(words, 1);
}
}
System.out.println(frequency);
Are you sure that accessing line[1] in each iteration of the while loop is correct? Reading the input correctly seems to be the problem to me. Without seeing your CSV file, however, I can't help you any further.
EDIT:
With the provided CSV data, an adjustment to your code like this would solve your problem:
.....
.....
while ((line = reader.readNext()) != null) {
String words = line[1];
words = words.replaceAll("\\p{Punct}", " ").trim();
words = words.replaceAll("\\s+", " "); // collapse any run of whitespace so split(" ") yields no empty tokens
words = words.toLowerCase();
String[] singleWords = words.split(" ");
for(int i = 0 ; i < singleWords.length; i++) {
String currentWord = singleWords[i];
if (frequency.containsKey(currentWord)) {
frequency.put(currentWord, frequency.get(currentWord) + 1);
} else {
frequency.put(currentWord, 1);
}
}
}
System.out.println("Word: is, Value: " + frequency.get("is"));
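As a side note, the containsKey/put pattern used above can be written more compactly with Map.merge. The following is only a minimal sketch of that idea: the class name MergeCountSketch and the method countWords are made up for illustration, and the input is assumed to be already cleaned and lowercased.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of the original code.
class MergeCountSketch {

    // Counts every whitespace-separated token in an already cleaned, lowercase string.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count.
            frequency.merge(word, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // prints {an=1, example=2, is=3, this=1}
        System.out.println(countWords("this is is is an example example"));
    }
}
```

A TreeMap is used here only so that the printed order is predictable; a HashMap works the same way for counting.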

You can use a regex match to verify that words fits your criteria before adding it to your HashMap, like so:
if (words.matches("[a-z]{2,}"))
[a-z] specifies only lowercase alpha chars
{2,} specifies "Minimum of 2 occurrences, maximum of <undefined>"
Though, given you're converting punctuation to spaces, it sounds like you could have multiple words in line[1]. If you want to gather counts of multiple words across multiple lines, then you may want to split words on the space char, like so:
for (String word : words.split(" ")) {
if (word.matches("[a-z]{2,}")) {
// Then use your code for checking if frequency contains the term,
// but use `word` instead of `words`
}
}
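Put together, the filter and the counting could look roughly like the sketch below. It assumes the text of one CSV column is passed in as a plain String; the class name FilteredWordCount and the method count are hypothetical names chosen for this example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class used only for this illustration.
class FilteredWordCount {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> frequency = new TreeMap<>();
        // Same clean-up as in the question: punctuation becomes a space, then lowercase.
        String cleaned = text.replaceAll("\\p{Punct}", " ").toLowerCase(Locale.ROOT);
        for (String word : cleaned.split("\\s+")) {
            // Keep only purely alphabetic words with at least two letters.
            if (word.matches("[a-z]{2,}")) {
                if (frequency.containsKey(word)) {
                    frequency.put(word, frequency.get(word) + 1);
                } else {
                    frequency.put(word, 1);
                }
            }
        }
        return frequency;
    }

    public static void main(String[] args) {
        // "a", "b2" and "x" are filtered out by the regex.
        System.out.println(count("This is is is an example example, a b2 x"));
    }
}
```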

Just another twist on things:
Since it's been established (by OP) that the CSV file consists of Title, Text, Date data, it can be assumed that every data line of that file is delimited with the typical comma (,) and, that each line of that CSV data file (other than the Header line) can potentially contain a different Title.
Yet, the established desired output (by OP) is:
Title of document: exampleTitle
Word: is, Value: 3
Word: example, Value: 2
Let's change this output so that things are wee more pleasing to the eye:
-------------------------------
Title of document: exampleTitle
-------------------------------
Word Value
=====================
an 1
is 3
example 2
this 1
=====================
Based on this information, it seems logical that, because each data line contains a Title, we need to process and store word occurrences from column 2 for that data line only. With each word being processed we need to keep track of its origin, so that we know which Title it came from, rather than just carrying out a simple occurrence count of all column 2 words across all rows (file lines). This of course means that the KEY for the Map, which would be a word, must also hold the origin of that word. This isn't a big deal, but a little more thought is needed when it comes time to pull the relevant data from the Map in order to display it properly in the Console Window or to use it for other purposes within the application. What we could do is use a List of Maps, for example:
List<Map<String, Integer>> mapsList = new ArrayList<>();
By doing this we can place each file line processed into a Map and then add that map to a List named mapsList.
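For comparison only, a different layout is also possible: a nested Map keyed by Title, so that the origin does not need to be encoded into each word key. This is not what the code in this answer does; it is a small sketch under the assumption that duplicate Titles should simply be merged, and the class name TitleWordCounts and its methods are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical alternative structure: Title -> (word -> count).
class TitleWordCounts {

    // LinkedHashMap keeps the Titles in the order they were first seen.
    private final Map<String, Map<String, Integer>> countsByTitle = new LinkedHashMap<>();

    // Adds the words of one text column to the counts of the given Title.
    void add(String title, String text) {
        Map<String, Integer> counts = countsByTitle.computeIfAbsent(title, t -> new TreeMap<>());
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    Map<String, Map<String, Integer>> counts() {
        return countsByTitle;
    }

    public static void main(String[] args) {
        TitleWordCounts demo = new TitleWordCounts();
        demo.add("exampleTitle", "This is is is an example example");
        // prints {exampleTitle={an=1, example=2, is=3, this=1}}
        System.out.println(demo.counts());
    }
}
```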
The provided file contents example leaves a lot to be desired. Nevertheless, it does help to some extent: it confirms that there is a Header line in the CSV data file and that the typical comma is used as the delimiter in the file... and that's all. So more questions come to mind:
Can more than one file data line contain the same Title (Origin)?
    If "yes", then what do you want to do with the word occurrences for the other line(s)?
        Do you want to add them to the first established Title?
        Or do you want to start a new modified Title? (must be a different title name in this case)
Will the second column (the text column) ever possibly contain a comma delimiter? Commas are very common in text of reasonable length.
    If "yes", is the text in column 2 enclosed in quotation marks?
How long can the text in column 2 possibly get to? (just curious - it's actually irrelevant).
Will there ever be different CSV files to get Word Occurrences from which contain more than three columns?
Will there ever be a time where Word Occurrences will need to be derived from more than one column on any CSV file data line?
The method provided below (in the runnable application) named getWordOccurrencesFromCSV() is flexible enough to basically cover all the questions above. This method also makes use of two other helper methods named getWordOccurrences() and combineSameOrigins() to get the job done. These helper methods can be used on their own for other situations if so desired although the combineSameOrigins() method is uniquely designed for the getWordOccurrencesFromCSV() method. The startApp() method gets the ball rolling and displays the generated List of Maps into the console Window.
Here is the runnable code (be sure to read ALL the comments in code):
package so_demo_hybridize;
public class SO_Demo_hybridize {
private final java.util.Scanner userInput = new java.util.Scanner(System.in);
public static void main(String[] args) {
// Started this way to avoid the need for statics.
new SO_Demo_hybridize().startApp(args);
}
private void startApp(String[] args) {
String ls = System.lineSeparator();
String filePath = "DemoCSV.csv"; //"DemoCSV.csv";
/* Retrieve the necessary data from the supplied CSV
file and place it into a List of Maps:
getWordOccurrencesFromCSV() Parameters Info:
--------------------------------------------
filePath: Path and file name of the CSV file.
"," : The delimiter used in the CSV file.
1 : The literal data column which will hold the Origin String.
2 : The literal data column which will hold text of words to get occurrences from.
1 : The minimum number of occurrences needed to save. */
java.util.List<java.util.Map<String, Integer>> mapsList
= getWordOccurrencesFromCSV(filePath, ",", 1, new int[]{2}, 1);
/* Display what is desired from the gathered file data now
contained within the Maps held within the 'mapsList' List.
Now that you have this List of Maps, you can do whatever
and display whatever you like with the data. */
System.out.println("Word Occurrences In CSV File" + ls
+ "============================" + ls);
for (java.util.Map<String, Integer> maps : mapsList) {
String mapTitle = "";
int cnt = 0;
for (java.util.Map.Entry<String, Integer> entry : maps.entrySet()) {
/* Because the Origin is attached to the Map Key (a word)
we need to split it off. Note the special delimiter. */
String[] keyParts = entry.getKey().split(":\\|:");
String wordOrigin = keyParts[0];
String word = keyParts[1];
if (mapTitle.isEmpty()) {
mapTitle = "Title of document: " + wordOrigin;
// The Title underline...
String underLine = String.join("", java.util.Collections.nCopies(mapTitle.length(), "-"));
System.out.println(underLine + ls + mapTitle + ls + underLine);
// Display a Header and underline
String mapHeader = "Words Values" + ls
+ "=====================";
System.out.println(mapHeader);
cnt++;
}
System.out.println(String.format("%-15s%-6s", word, entry.getValue()));
}
if (cnt > 0) {
// The underline for the Word Occurrences table displayed.
System.out.println("=====================");
System.out.println();
}
}
}
/**
* Retrieves and returns a List of Word Occurrences Maps ({@code Map<String,
* Integer>}) from each CSV data line from the specified column. Each
* word is considered the KEY and the number of Occurrences of that word
* would be VALUE. Each KEY in each Map is also prefixed with the Origin
* String of that word delimited with ":|:". READ THIS DOCUMENT IN FULL!
*
* @param csvFilePath (String) The full path and file name of the
* CSV file to process.<br>
*
* @param csvDelimiter (String) The delimiter used within the CSV
* File. This can be any single character
* string including the whitespace. Although
* not mandatory, adding whitespaces to your
* CSV Delimiter argument should be
* discouraged.<br>
*
* @param originFromColumn (Integer) The CSV File line data column
* literal number where the Origin for the
* evaluated occurrences will be related to. By
* <b><i>literal</i></b> we mean the actual
* column number, <u>not</u> the column Index
* Value. Whatever literal column number is
* supplied, the data within that column should
* be Unique to all other lines within the CSV
* data file. In most CSV files, records in that
* file (each line) usually contains one column
* that contains a Unique ID of some sort which
* identifies that line as a separate record. It
* would be this column which would make a good
* candidate to use as the <b><i>origin</i></b>
* for the <i>Word Occurrences</i> about to be
* generated from the line (record) column
* specified from the argument supplied to the
* 'occurrencesFromColumn' parameter.<br><br>
*
* If null is supplied to this parameter then
* Column one (1) (index 0) of every data line
* will be assumed to contain the Origin data
* string.<br>
*
* @param occurrencesFromColumn (int[] Array) Because this method can gather
* Word Occurrences from 1 <b><i>or
* more</i></b> columns within any given CSV
* file data line, the literal column number(s)
* must be supplied within an <b>int[]</b> array.
* The objective of this method is to obviously
* collect Word Occurrences contained within
* text that is mounted within at least one
* column of any CSV file data line, therefore,
* the literal column number(s) from where the
* Words to process are located need to be supplied.
* By <b><i>literal</i></b> we mean the actual
* column number, <u>not</u> the column Index
* Value. The desired column number (or column
* numbers) can be supplied to this parameter
* in this fashion: <b><i>new int[]{3}</i></b>
* OR <b><i>new int[]{3,5,6}</i></b>.<br><br>
* All words processed, regardless of what
* columns they come from, will all fall under
* the same Origin String.<br><br>
*
* Null <b><u>can not</u></b> be supplied as an
* argument to this parameter.<br>
*
* @param minWordCountRequired (Integer) If any integer value less than 2
* is supplied then all words within the
* supplied Input String will be placed into
* the map regardless of how many there are. If
* however you only want words where there are
* two (2) or more within the Input String then
* supply 2 as an argument to this parameter.
* If you only want words where there are three
* (3) or more within the Input String then
* supply 3 as an argument to this parameter...
* and so on.<br><br>
*
* If null is supplied to this parameter then a
* default of one (1) will be assumed.<br>
*
* @param options (Optional - Two parameters both boolean):<pre>
*
* noDuplicateOrigins - Optional - Default is true. By default, duplicate
* Origins are not permitted. This means that no two
* Maps within the List can contain the same Origin
* for word occurrences. Obviously, it is possible
* for a CSV file to contain data lines which holds
* duplicate Origins but in this case the words in
* the duplicate Origin are added to the Map within
* the List which already contains that Origin and
* if any word in the new duplicate Origin Map is
* found to be in the already stored Map with the
* original Origin then the occurrences count for
* the word in the new Map is added to the same word
* occurrences count of the already stored Map.
*
* If boolean <b>false</b> is optionally supplied to this
* parameter then a duplicate Map with the duplicate
* Origin is added to the List of Maps.
*
* Null can be supplied to this optional parameter.
* You can not just supply a blank comma.
*
* noNumerics - Optional - Default is false. By default, this
* method considers numbers (either integer or
* floating point) as words therefore this
* parameter would be considered as always
* false. If however you don't want the
* occurrences of numbers to be placed into the
* returned Map then you can optionally supply
* an argument of boolean true here.
*
* If null or a boolean value is supplied to this
* optional parameter then null or a boolean value
* <u>must</u> be supplied to the noDuplicateOrigins
* parameter.</pre>
*
* @return ({@code List<Map<String, Integer>>})
*/
@SuppressWarnings("CallToPrintStackTrace")
public java.util.List<java.util.Map<String, Integer>> getWordOccurrencesFromCSV(
String csvFilePath, String csvDelimiter, Integer originFromColumn,
int[] occurrencesFromColumn, Integer minWordCountRequired, Boolean... options) {
String ls = System.lineSeparator();
// Handle invalid arguments to this method...
// Check for null or empty first; new File(null) would throw a NullPointerException.
if (csvFilePath == null || csvFilePath.isEmpty()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The csvFilePath parameter can not be supplied "
+ "null or a null string!" + ls);
}
else if (!new java.io.File(csvFilePath).exists()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The file indicated below can not be found!" + ls
+ csvFilePath + ls);
}
else if (csvDelimiter == null || csvDelimiter.isEmpty()) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The csvDelimiter parameter can not be supplied "
+ "null or a null string!" + ls);
}
if (originFromColumn == null || originFromColumn < 1) {
originFromColumn = 1;
}
for (int i = 0; i < occurrencesFromColumn.length; i++) {
if (occurrencesFromColumn[i] == originFromColumn) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The 'occurrencesFromColumn' argument ("
+ occurrencesFromColumn[i] + ")" + ls + "can not be the same column "
+ "as the 'originFromColumn' argument (" + originFromColumn
+ ")!" + ls);
}
else if (occurrencesFromColumn[i] < 1) {
throw new IllegalArgumentException(ls + "getWordOccurrencesFromCSV() "
+ "Method Error! The argument for the occurrencesFromColumn "
+ "parameter can not be less than 1!" + ls);
}
}
if (minWordCountRequired == null || minWordCountRequired < 2) {
minWordCountRequired = 1;
}
final int minWrdCnt = minWordCountRequired;
// Take care of the Optional Parameters
boolean noDuplicateOrigins = true;
boolean noNumerics = false;
if (options != null && options.length > 0) {
if (options[0] != null) {
noDuplicateOrigins = options[0];
}
// Check the length before reading options[1] to avoid an ArrayIndexOutOfBoundsException.
if (options.length >= 2 && options[1] != null) {
noNumerics = options[1];
}
}
java.util.List<java.util.Map<String, Integer>> mapsList = new java.util.ArrayList<>();
// 'Try With Resources' is used here to auto-close file and free resources.
try (java.util.Scanner reader = new java.util.Scanner(new java.io.FileReader(csvFilePath))) {
String line = reader.nextLine(); // Skip the Header line (first line in file).
String origin = "", date = "";
java.util.Map<String, Integer> map;
while (reader.hasNextLine()) {
line = reader.nextLine().trim();
// Skip blank lines (if any)
if (line.isEmpty()) {
continue;
}
// Get columnar data from data line
// If there are no quotation marks in data line.
String regex = "\\s*\\" + csvDelimiter + "\\s*";
/* If there are quotation marks in data line and they are
actually balanced. If they're not balanced then we obviously
use the regular expression above. The regex below ignores
the supplied delimiter contained between quotation marks. */
if (line.contains("\"") && line.replaceAll("[^\"]", "").length() % 2 == 0) {
regex = "\\s*\\" + csvDelimiter + "\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
}
String[] csvColumnStrings = line.split(regex);
// Acquire the Origin String
origin = csvColumnStrings[originFromColumn - 1];
// Get the Word Occurrences from the provided column number(s)...
for (int i = 0; i < occurrencesFromColumn.length; i++) {
/* Acquire the String to get Word Occurrences from
and remove any punctuation characters from it. */
line = csvColumnStrings[occurrencesFromColumn[i] - 1];
line = line.replaceAll("\\p{Punct}", "").replaceAll("\\s+", " ").toLowerCase();
// Get Word Occurrences...
map = getWordOccurrences(origin, line, noNumerics);
/* Has same Origin been processed before?
If so, do we add this one to the original? */
if (noDuplicateOrigins || i > 0) {
if (combineSameOrigins(mapsList, map) > 0) {
continue;
}
}
mapsList.add(map);
}
}
}
catch (java.io.FileNotFoundException ex) {
ex.printStackTrace();
}
/* Remove any words from all the Maps within the List that
does not meet our occurrences minimum count argument
supplied to the 'minWordCountRequired' parameter.
(Java8+ needed) */
for (java.util.Map<String, Integer> mapInList : mapsList) {
mapInList.entrySet().removeIf(e -> e.getValue() < minWrdCnt);
}
return mapsList; // Return the generated List of Maps
}
/**
* This method will go through all Maps contained within the supplied List of
* Maps and see if the Origin String within the supplied Map already exists
* within a Listed Map. If it does then those words within the Supplied Map
* are added to the Map within the List if they don't already exist there.
* If any words from the Supplied Map does exist within the Listed Map then
* only the count values from those words are summed to the words within the
* Listed Map.<br>
*
* @param list ({@code List of Map<String, Integer>}) The List Interface which
* contains all the Maps of Word Occurrences of different Origins.<br>
*
* @param suppliedMap ({@code Map<String, Integer> Map}) The Map to check
* against all Maps contained within the List of Maps for duplicate Origins.
*
* @return (int) The number of words added to any Map contained within the
* List which contains the same Origin.
*/
public int combineSameOrigins(java.util.List<java.util.Map<String, Integer>> list,
java.util.Map<String, Integer> suppliedMap) {
int wrdCnt = 0;
String newOrigin = suppliedMap.keySet().stream().findFirst().get().split(":\\|:")[0].trim();
String originInListedMap;
for (java.util.Map<String, Integer> mapInList : list) {
originInListedMap = mapInList.keySet().stream().findFirst().get().split(":\\|:")[0].trim();
if (originInListedMap.equals(newOrigin)) {
wrdCnt++;
for (java.util.Map.Entry<String, Integer> suppliedMapEntry : suppliedMap.entrySet()) {
String key = suppliedMapEntry.getKey();
int value = suppliedMapEntry.getValue();
boolean haveIt = false;
for (java.util.Map.Entry<String, Integer> mapInListEntry : mapInList.entrySet()) {
if (mapInListEntry.getKey().equals(key)) {
haveIt = true;
mapInListEntry.setValue(mapInListEntry.getValue() + value);
break;
}
}
if (!haveIt) {
mapInList.put(key, value);
}
}
}
}
return wrdCnt;
}
/**
* Find the Duplicate Words In a String And Count the Number Of Occurrences
* for each of those words. This method will fill and return a Map of all
* the words within the supplied string (as Key) and the number of
* occurrences for each word (as Value).<br><br>
* <p>
* <b>Example to read the returned Map and display in console:</b><pre>
* {@code for (java.util.Map.Entry<String, Integer> entry : map.entrySet()) {
* System.out.println(String.format("%-12s%-4s", entry.getKey(), entry.getValue()));
* } }</pre>
*
* @param origin (String) The UNIQUE origin String of what the word is
* related to. This can be anything as long as this same
* origin string is applied to all the related words of
* the same Input String. A unique ID of some sort or a
* title string of some kind would work fine for
* this.<br>
*
* @param inputString (String) The string to process for word
* occurrences.<br>
*
* @param noNumerics (Optional - Default - false) By default, this method
* considers numbers (either integer or floating point)
* as words therefore this parameter would be considered
* as always false. If however you don't want the
* occurrences of numbers to be placed into the returned
* Map then you can optionally supply an argument of
* boolean true here.<br>
*
* @return ({@code java.util.Map<String, Integer>}) Consisting of individual
* words found within the supplied String as KEY and the number of
* occurrences as VALUE.
*/
public static java.util.Map<String, Integer> getWordOccurrences(String origin,
String inputString, Boolean... noNumerics) {
boolean allowNumbersAsWords = true;
if (noNumerics.length > 0) {
if (noNumerics[0] != null) {
allowNumbersAsWords = !noNumerics[0];
}
}
// Use this to have Words in Ascending order.
java.util.TreeMap<String, Integer> map = new java.util.TreeMap<>();
// Use this to have Words in the order of Insertion (when they're added).
//java.util.LinkedHashMap<String, Integer> map = new java.util.LinkedHashMap<>();
// Use this to have Words in a 'who cares' order.
//java.util.Map<String, Integer> map = new java.util.HashMap<>();
String[] words = inputString.replaceAll("[^A-Za-z0-9' ]", "")
.replaceAll("\\s+", " ").trim().split("\\s+");
for (int i = 0; i < words.length; i++) {
String w = words[i];
if (!allowNumbersAsWords && w.matches("-?\\d+(\\.\\d+)?")) {
continue;
}
if (map.containsKey(origin + ":|:" + w)) {
int cnt = map.get(origin + ":|:" + w);
map.put(origin + ":|:" + w, ++cnt);
}
else {
map.put(origin + ":|:" + w, 1);
}
}
return map;
}
}
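To see what the quote-aware split regex used in getWordOccurrencesFromCSV() does on its own, here is a small stand-alone sketch. The class name QuotedSplitDemo and the method splitCsvLine are hypothetical, the delimiter is fixed to a comma for simplicity, and the sample line is made up.

```java
import java.util.Arrays;

// Hypothetical demo class; it only isolates the split regex from the method above.
class QuotedSplitDemo {

    // Splits on commas that are followed by an even number of quotation marks,
    // that is, commas which are not inside a quoted field.
    static String[] splitCsvLine(String line) {
        String regex = "\\s*,\\s*(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";
        return line.split(regex);
    }

    public static void main(String[] args) {
        String line = "exampleTitle, \"This is, is an example\", April 2022";
        // prints [exampleTitle, "This is, is an example", April 2022]
        System.out.println(Arrays.toString(splitCsvLine(line)));
    }
}
```

Note that the quotation marks stay in the field after the split; in the method above they are removed later, when punctuation is stripped from the text column.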

Related

Be able to enter a value, and a string in a java scanner to output a ranking

I'm trying to create a ranking that displays this:
int(value) - String(username)
(In total ten times even if I enter 30 values and 30 nicknames)
Here is my working code:
public class Methods {
private static final ArrayList<Double> nbAll = new ArrayList<>();
private static final ArrayList<String> pseudoAll = new ArrayList<>();
public static void test() {
try (Scanner scanner = new Scanner(System.in)) {
System.out.print(ANSI_RED + "Please enter the number of notes you want to calculate : ");
double nb = scanner.nextInt();
String pseudo = scanner.next();
for (int i = 0; i < nb; i++) {
double temp = scanner.nextDouble();
nbAll.add(temp);
}
System.out.println("------------");
System.out.println("Ranking: ");
nbAll.stream().sorted(Comparator.reverseOrder()).forEach(System.out::println);
retry();
}
}
I tried adding a second for loop to force the user to enter the username as a string, but it didn't work, and I haven't managed to produce the ranking yet.
Screen for Desired operation: https://i.imgur.com/0QlGHd8.png
In this particular case I personally think it may be a little better to use a Map (for example a HashMap) to store the required data. It's rather ideal for this sort of thing, since the User Name should be unique and can be used as the Key, with the Rank as the Value, since several Users could potentially have the same rank value:
Map<String, Integer> map = new HashMap<>();
Another thing which may make life a little easier is for the User to enter the Rank AND the User Name related to that rank on a single line separated with a whitespace or a tab or whatever, for example:
Ranking #1:
Enter a Rank value followed by a User Name separated with space,
for example: 250 John Doe. Enter 'd' when done.
Your entry: --> |
Of course validation would need to be carried out so to ensure proper entry is done but this isn't overly difficult using the String#matches() method and a small Regular Expression (regex), for example:
if (!myString.matches("^\\d+\\s+.{1,}$")) {
System.err.println("Invalid Entry! Try again...");
System.err.println();
myString = "";
continue;
}
The regular expression "^\\d+\\s+.{1,}$" passed to the String#matches() method validates that the first component of the User's entry is a string representation of an Integer value consisting of one or more digits. It then checks that at least one whitespace follows that numerical value, and that at least one character of any kind follows the whitespace(s); this is essentially the User Name. If the User enters the data incorrectly, an Invalid Entry warning is issued and the User is given the opportunity to attempt the entry again.
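A quick way to get a feel for this regular expression is to run a few sample entries through it. The sketch below does only that; the class name EntryValidation and the method isValid are hypothetical names used for the illustration.

```java
// Hypothetical helper that wraps the validation regex discussed above.
class EntryValidation {

    static boolean isValid(String entry) {
        return entry.matches("^\\d+\\s+.{1,}$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("250 John Doe")); // true: digits, whitespace, name
        System.out.println(isValid("John 250"));     // false: does not start with digits
        System.out.println(isValid("250"));          // false: no name after the rank
        System.out.println(isValid("250  "));        // true: the second space satisfies .{1,}
    }
}
```

The last case shows that an entry whose "name" consists only of whitespace would still pass, so it may be worth trimming the input first if that matters for your application.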
Once valid input has been acquired, the data needs to be split into its respective parts before it can be added to the Map object. This is done with the String#split() method:
String[] stringParts = myString.split("\\s+");
This will create a String[] Array named stringParts. The \\s+ regular expression tells the split() method to split the string on one or more whitespaces ' ' (or Tabs \t, newlines \n, Carriage Returns \r, form-feeds \f, and vertical tabulations \x0B). This would cover pretty much all the cases for the Users required entry.
Now that we have the array we know that the first element of that array will be the supplied Ranking value. We want to convert this into an Integer data type before adding to our Map, like this:
int rank = Integer.parseInt(stringParts[0]);
Now we want the User Name. Because in this example we also allow for multiple names like First and Last names, a little more is involved to add the names together so to make a single User Name string from it all. Remember we split the data entry on whitespaces so if there are multiple names we could potentially have more than just two elements within the stringParts[] array. We'll need to build the userName string. We use a for loop and the StringBuilder class to do this, for example:
String[] stringParts = myString.split("\\s+");
int rank = Integer.parseInt(stringParts[0]);
StringBuilder sb = new StringBuilder("");
for (int i = 1; i < stringParts.length; i++) {
if (!sb.toString().isEmpty()) {
sb.append(" ");
}
sb.append(stringParts[i]);
}
String userName = sb.toString();
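As a possible shortcut, String#split() also accepts a limit argument, which lets the rank and the whole User Name be separated in a single call instead of rebuilding the name in a loop. This is just a sketch of that variant; the class name RankEntrySplit and the method splitEntry are made up, and the entry is assumed to contain a rank followed by a non-blank name.

```java
// Hypothetical helper; assumes the entry holds a rank followed by a non-blank name.
class RankEntrySplit {

    // Returns a two-element array: {rank text, user name}.
    static String[] splitEntry(String entry) {
        // A limit of 2 splits only at the first run of whitespace.
        String[] parts = entry.trim().split("\\s+", 2);
        // Collapse repeated whitespace inside the name to single spaces.
        parts[1] = parts[1].replaceAll("\\s+", " ");
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitEntry("250 John   Doe");
        int rank = Integer.parseInt(parts[0]);
        String userName = parts[1];
        System.out.println(rank + " - " + userName); // prints 250 - John Doe
    }
}
```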
Okay...now we have the User Name so let's make sure a ranking with that User Name isn't already contained within the Map:
if (map.containsKey(userName)) {
System.err.println("A ranking for '" + userName
+ "' has already been supplied! Try again...");
System.err.println();
myString = "";
continue;
}
If we pass to this point then all is good and we can add the data to the Map:
map.put(userName, rank);
This may seem a little long-winded but, in my opinion, it's not. Below is a working example of all the above in use:
Scanner userInput = new Scanner(System.in);
Map<String, Integer> map = new HashMap<>();
int count = 0;
String tmp = "";
while (tmp.isEmpty()) {
System.out.println("Ranking #" + (count+1) + ":");
System.out.print("Enter a Rank value followed by a User Name separated "
+ "with space,\nfor example: 250 John Doe. Enter 'd' when done.\n"
+ "Your entry: --> ");
tmp = userInput.nextLine();
if (tmp.equalsIgnoreCase("d")) {
break;
}
if (!tmp.matches("^\\d+\\s+.{1,}$")) {
System.err.println("Invalid Entry! Try again...");
System.err.println();
tmp = "";
continue;
}
String[] parts = tmp.split("\\s+");
int rank = Integer.parseInt(parts[0]);
StringBuilder sb = new StringBuilder("");
for (int i = 1; i < parts.length; i++) {
if (!sb.toString().isEmpty()) {
sb.append(" ");
}
sb.append(parts[i]);
}
String userName = sb.toString();
if (map.containsKey(userName)) {
System.err.println("A ranking for '" + userName
+ "' has already been supplied! Try again...");
System.err.println();
tmp = "";
continue;
}
count++;
map.put(userName, rank);
tmp = "";
System.out.println();
}
// Sort the map by RANK value in 'descending' order:
Map<String, Integer> sortedMap = map.entrySet().stream()
.sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
.collect(java.util.stream.Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
(e1, e2) -> e1, java.util.LinkedHashMap::new));
// If you want the Rank values sorted in 'Ascending' order then use below instead:
/* Map<String, Integer> sortedMap2= map.entrySet().stream().sorted(Map.Entry.<String,
Integer>comparingByValue()).collect(java.util.stream.Collectors.toMap(Map.Entry::
getKey, Map.Entry::getValue,(e1, e2) -> e1, java.util.LinkedHashMap::new)); */
// Display the rankings in Console Window:
System.out.println();
System.out.println("You entered " + count + " rankings and they are as follows:");
System.out.println();
// Table header
String header = String.format("%-4s %-15s %-6s",
"No.", "User Name", "Rank");
System.out.println(header);
// The header underline
System.out.println(String.join("", java.util.Collections.nCopies(header.length(), "=")));
// The rankings in spaced format...
count = 1;
for (Map.Entry<String,Integer> entries : sortedMap.entrySet()) {
System.out.printf("%-4s %-15s %-6d %n",
String.valueOf(count) + ")",
entries.getKey(),
entries.getValue());
count++;
}
When the above code is run, the User is asked to supply a Rank value and a User Name related to that rank. The User is then asked to enter another, and another, until the User enters 'd' (for done). All entries provided are then displayed in the Console Window in a table format. The rankings within the Map are sorted in descending order (highest rank first) before they are displayed. If you prefer ascending order, that code is also provided but is currently commented out.
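Since the question mentions that only ten rankings should be shown even when more are entered, the sorted stream can be cut off with limit(). The following is a minimal sketch of that idea, assuming the rankings are held in a Map of User Name to Rank as above; the class name TopRankings and the method topN are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper showing how to keep only the highest n rankings.
class TopRankings {

    // Returns at most n entries, highest rank value first.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> ranks, int n) {
        return ranks.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> ranks = new HashMap<>();
        ranks.put("John Doe", 250);
        ranks.put("Jane Roe", 900);
        ranks.put("Sam Poe", 10);
        // Show only the two highest rankings; use 10 for a top-ten list.
        for (Map.Entry<String, Integer> entry : topN(ranks, 2)) {
            System.out.println(entry.getValue() + " - " + entry.getKey());
        }
    }
}
```

Users with equal rank values may appear in any order relative to each other, because a HashMap does not define an iteration order.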

Replace all black-labeled items if not part of the white-labeled items in text

I need to replace with * all occurrences in a text that are:
available in the black-labeled list, and
bordered by spaces, ",", ".", or the start or end of the text, and
not part of any white-labeled item.
Example:
["is", "panter"] // black-labeled
["pink panter", "blue panter"] // white-labeled
"This is pink panter." -> "This * pink panter."
"This is black panter." -> "This * black *."
String input = ...;
List<String> whiteLabelded = ...;
List<String> blackLabelded = ...;
String enhencedText = enhenceText(input, whiteLabelded, blackLabelded);
My try:
Map<String, String> whiteLabeldedMap = assignUniquePlaceHolder(whiteLabelded); // like ####1, ####2, ####3
String output = input;
whiteLabeldedMap.forEach((key, value) -> output = output.replace(key, value)); // replace white labeled strings
blackLabelded.forEach(key -> output = output.replace(key, "*")); // replace black labeled strings
whiteLabeldedMap.forEach((key, value) -> output = output.replace(value, key)); // return white labeled strings
Is it possible to implement better?
A first version that roughly does what you want, very inefficiently, but hey, it needs to be correct before it's fast. Adapt to your needs and improve performance if needed.
import java.util.Arrays;
import java.util.List;
public class BlackWhiteLists {
/**
* Replace words in black lists by * unless they are in whitelist
* @param input the input
* @param white the white list
* @param black the black list
* @return the replaced string
*/
public static String enhanceText(String input, List<String> white, List<String> black) {
String result=input;
for (String bs:black) {
// find text to remove
int ix=result.indexOf(bs);
while (ix>-1) {
boolean ok=true;
// find if text we want to keep overlaps
for (String ws:white) {
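// search in result, not input: ix is an index into result, and
// earlier replacements may have shifted the positions
int ix2=result.indexOf(ws);
while (ok && ix2>-1) {
// overlaps
if (ix>=ix2 && ix<ix2+ws.length() ) {
ok=false;
}
// search for next instance of white
ix2=result.indexOf(ws,ix2+ws.length());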
}
}
// no overlap, we replace
if (ok && isWord(result,ix,bs.length())) {
result=result.substring(0,ix)+"*"+result.substring(ix+bs.length());
// search for next instance of black
ix=result.indexOf(bs,ix+1);
} else {
// search for next instance of black
ix=result.indexOf(bs,ix+bs.length());
}
}
}
return result;
}
/**
* Is the text at the specific place in the input a word
* @param input the input string
* @param ix the start index
* @param length the text length
* @return true if the text is not preceded or followed by another letter
*/
public static boolean isWord(String input, int ix, int length) {
if (ix>0) {
if (Character.isLetter(input.charAt(ix-1))){
return false;
}
}
if (ix+length<input.length()) {
if (Character.isLetter(input.charAt(ix+length))){
return false;
}
}
return true;
}
public static void main(String[] args) {
List<String> white=Arrays.asList("pink panter", "blue panter");
List<String> black=Arrays.asList("is", "panter");
System.out.println(enhanceText("This is pink panter.",white,black));
System.out.println(enhanceText("This is black panter.",white,black));
}
}
I am not sure if this is of help. Anyway, I assume you want to solve this task yourself with some hints, so here are some:
A text can be considered an array of characters (assuming it is one long string). First letter has index 0 and the last character has index "length-1".
                    1                   2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
T h i s   i s   b l a c k   p a n t e r .
Knowing this you can start the comparisons:
1. For each black-labeled string (here: panter), check whether it occurs in the text.
2. If it is not found, there is nothing to do. If it is found, write down the index where it starts (String.indexOf gives you this) and how long it is (e.g. start index 14, length 6 for "panter" in the example).
3. Now go through all white-labeled items.
4. For each one, check whether it occurs in the text, and if it does, find its start index and the length of the match (e.g. a match at position 8 with length 12 for "black panter" in the example above).
5. If that location overlaps the location from step 2, the string is white-labeled and remains unchanged. Otherwise it has to be replaced (again, the String class has methods for this).
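The overlap bookkeeping described above can also be left to the regex engine. What follows is a minimal sketch, not the method of either answer: it assumes that "bordered" means "not directly next to a letter" (the same rule as isWord() in the first answer), and the class name LabelMasker and method name mask are hypothetical. White-labeled items are listed first in one alternation, so the matcher consumes them whole before a black-labeled item inside them can match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the answers above.
final class LabelMasker {

    /** Replaces black-labeled words with "*" unless they sit inside a white-labeled item. */
    static String mask(String input, List<String> white, List<String> black) {
        if (black.isEmpty()) {
            return input; // nothing to replace; also avoids building an empty group
        }
        // Pattern.quote makes every item match literally.
        String whites = white.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        String blacks = black.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        // Only black-labeled items are captured in group 1.
        String alternation = whites.isEmpty() ? "(" + blacks + ")" : whites + "|(" + blacks + ")";
        // Assumption: a match must not be directly preceded or followed by a letter.
        Pattern pattern = Pattern.compile("(?<!\\p{L})(?:" + alternation + ")(?!\\p{L})");

        Matcher matcher = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Group 1 is set only when a black-labeled item matched on its own.
            String replacement = (matcher.group(1) != null) ? "*" : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With the lists from the question, LabelMasker.mask("This is pink panter.", white, black) keeps "pink panter" and only replaces the standalone "is".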

Put multiple values in arrays from CSV file

Values are separated with comma, following format:
Country,Timescale,Vendor,Units
Africa,2010 Q3,Fujitsu Siemens,2924.742632
I want to make array for every value. How can I do it?
I tried many things, code below:
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";
try {
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] country = line.split(cvsSplitBy);
country[0] +=",";
String[] Krajina = country[0].split(",");
What you appear to be describing is known as parallel arrays, which is generally a bad idea in this particular use case since it is prone to ArrayIndexOutOfBoundsException later on down the road. A better solution would be a two-dimensional (2D) array or an ArrayList. Nevertheless, parallel arrays it is:
You say an array size of 30; well, maybe today, but tomorrow it might be 25 or 40. So in order to size your arrays to hold the file data you need to know how many lines of actual raw data the CSV file contains (excluding the header, possible comments, and possible blank lines). The easiest way would be to just dump everything into separate ArrayLists and then convert them to their respective arrays later on, be it String, int, long, double, whatever.
Counting file lines first so as to initialize Arrays:
One line of code can give you the number of lines contained within a supplied text file:
long count = Files.lines(Paths.get("C:\\MyDataFiles\\DataFile.csv")).count();
In reality, however, the line above needs to be enclosed in a try/catch block in case of an IOException, so there is a wee bit more code than a single line. Files.lines() also keeps the file open until the returned Stream is closed, so the Stream is best opened in a try-with-resources block. For a simple use case where the CSV file contains a header line and no comment or blank lines, this could be all you need: just subtract one from the overall count to exclude the header line before initializing your arrays. Another minor issue with the one-liner is that it provides the count as a long. This is no good since Java arrays only accept int values for their size, so the value obtained has to be cast to int, for example:
String[] countries = new String[(int) count];
and this is only good if count does not exceed Integer.MAX_VALUE - 2 (2147483645). That's a lot of array elements, so in general you wouldn't have a problem with this, but if you are dealing with extremely large arrays you will also need to consider JVM memory and running out of it.
Sometimes it's just nice to have a method that can be used in a multitude of different situations when getting the total number of raw data lines from a CSV (or other) text file. The method provided below is obviously more than a single line of code, but it gives a little more flexibility over what to count in a file. As mentioned earlier, there is the possibility of a header line. A header line is very common in CSV files and it is usually the first line within the file, but this may not always be the case. The header line could be preceded by a comment line or even a blank line. The header line should, however, always be the first line before the raw data lines. Here is an example of a possible CSV file:
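The counting step and the cast can be combined so that the file is always closed and an oversized count fails loudly instead of silently overflowing. This is a minimal sketch; the class name LineCounter and method name countLines are hypothetical, and it counts every line (header, comments and blanks included).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of the answer above.
final class LineCounter {

    /** Counts all lines in the file and returns the count as an int. */
    static int countLines(Path file) throws IOException {
        // try-with-resources closes the Stream and with it the underlying file.
        try (Stream<String> lines = Files.lines(file)) {
            // Math.toIntExact throws ArithmeticException if the count does not fit in an int.
            return Math.toIntExact(lines.count());
        }
    }
}
```

An array for the data lines of a file with one header line could then be sized with new String[LineCounter.countLines(path) - 1].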
Example CSV file contents:
# Units Summary Report
# End Date: May 27, 2019

Country,TimeScale,Vendor,Units
Czech Republic,2010 Q3,Fujitsu Siemens,2924.742032
Slovakia,2010 Q4,Dell,2525r.011404
Slovakia,2010 Q4,Lenovo,2648.973238
Czech Republic,2010 Q3,ASUS,1323.507139
Czech Republic,2010 Q4,Apple,266.7584542
The first two lines are Comment Lines and Comment Lines always begin with either a Hash (#) character or a Semicolon (;). These lines are to be ignored when read.
The third line is a Blank Line and serves absolutely no purpose other than aesthetics (easier on the eyes I suppose). These lines are also to be ignored.
The fourth line which is directly above the raw data lines is the Header Line. This line may or may not be contained within a CSV file. Its purpose is to provide the Column Names for the data records contained on each raw data line. This line can be read (if it exists) to acquire record field (column) names.
The remaining lines within the CSV file are Raw Data Lines otherwise considered data records. Each line is a complete record and each delimited element of that record is considered a data field value. These are the lines you want to count so as to initialize your different Arrays. Here is a method that allows you to do that:
The fileLinesCount() Method:
/**
* Counts the number of lines within the supplied Text file. Which lines are
* counted depends upon the optional arguments supplied. By default, all
* file lines are counted.<br><br>
*
* @param filePath (String) The file path and name of file (with
* extension) to count lines in.<br>
*
* @param countOptions (Optional - Boolean) Three Optional Parameters. If an
* optional argument is provided then the preceding
* optional argument MUST also be provided (be it true
* or false):<pre>
*
* ignoreHeader - Default is false. If true is passed then a value of
* one (1) is subtracted from the sum of lines detected.
* You must know for a fact that a header exists before
* passing <b>true</b> to this optional parameter.
*
* ignoreComments - Default is false. If true is passed then comment lines
* are ignored from the count. Only file lines (after being
* trimmed) which <b>start with</b> either a semicolon (;) or a
* hash (#) character are considered a comment line. These
* characters are typical for comment lines in CSV files and
* many other text file formats.
*
* ignoreBlanks - Default is false. If true is passed then file lines
* which contain nothing after they are trimmed is ignored
* in the count.
*
* <u>When a line is Trimmed:</u>
* If the String_Object represents an empty character
* sequence then reference to this String_Object is
* returned. If both the first & last character of the
* String_Object have codes greater than unicode ‘\u0020’
* (the space character) then reference to this String_Object
* is returned. When there is no character with a code
* greater than unicode ‘\u0020’ (the space character)
* then an empty string is created and returned.
*
* As an example, a trimmed line removes leading and
* trailing whitespaces, tabs, Carriage Returns, and
* Line Feeds.</pre>
*
* @return (Long) The number of lines contained within the supplied text
* file.
*/
public long fileLinesCount(final String filePath, final boolean... countOptions) {
// Defaults for optional parameters.
final boolean ignoreHeader = (countOptions.length >= 1 ? countOptions[0] : false);
// Only lines that start with ';' or '#' (after trimming) are considered comments.
final boolean ignoreComments = (countOptions.length >= 2 ? countOptions[1] : false);
// Lines that contain nothing once trimmed (empty strings).
final boolean ignoreBlanks = (countOptions.length >= 3 ? countOptions[2] : false);
long count = 0; // Holds the number of lines counted.
// try-with-resources closes the Stream (and the file behind it) when done.
// Needs: import java.util.stream.Stream;
try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
count = lines.map(String::trim)
.filter(line -> !(ignoreBlanks && line.isEmpty()))
.filter(line -> !(ignoreComments && (line.startsWith(";") || line.startsWith("#"))))
.count();
// Never let the header subtraction push the count below zero.
if (ignoreHeader && count > 0) {
count--;
}
}
catch (IOException ex) {
Logger.getLogger("fileLinesCount() Method Error!").log(Level.SEVERE, null, ex);
}
return count;
}
Filling the Parallel Arrays:
Now it is time to create a method to fill the desired arrays. Looking at the data file, it looks like you need three String arrays and one double array. You may want to make these instance (class member) variables:
// Instance (Class Member) variables:
String[] country;
String[] timeScale;
String[] vendor;
double[] units;
Then, to fill these arrays, we would use a method like this:
/**
* Fills the 4 class member array variables country[], timeScale[], vendor[],
* and units[] with data obtained from the supplied CSV data file.<br><br>
*
* @param filePath (String) Full Path and file name of the CSV data file.<br>
*
* @param fileHasHeader (Boolean) Either true or false. Supply true if the CSV
* file does contain a Header and false if it does not.
*/
public void fillDataArrays(String filePath, boolean fileHasHeader) {
long dataCount = fileLinesCount(filePath, fileHasHeader, true, true);
/* Java Arrays will not accept the long data type for sizing
therefore we cast to int. */
country = new String[(int) dataCount];
timeScale = new String[(int) dataCount];
vendor = new String[(int) dataCount];
units = new double[(int) dataCount];
int lineCounter = 0; // counts all lines contained within the supplied text file
try (Scanner reader = new Scanner(new File(filePath))) { // read the file that was counted above
int indexCounter = 0;
while (reader.hasNextLine()) {
lineCounter++;
String line = reader.nextLine().trim();
// Skip comment and blank file lines.
if (line.startsWith(";") || line.startsWith("#") || line.equals("")) {
continue;
}
if (indexCounter == 0 && fileHasHeader) {
/* Since we are skipping the header right away we
now no longer need the fileHasHeader flag. */
fileHasHeader = false;
continue; // Skip the first line of data since it's a header
}
/* Split the raw data line based on a comma (,) delimiter.
The Regular Expression ("\\s{0,},\\s{0,}") ensures that
it doesn't matter how many spaces (if any at all) are
before OR after the comma, the split removes those
unwanted spaces, even tabs are removed if any.
*/
String[] splitLine = line.split("\\s{0,},\\s{0,}");
country[indexCounter] = splitLine[0];
timeScale[indexCounter] = splitLine[1];
vendor[indexCounter] = splitLine[2];
/* The Regular Expression ("-?\\d+(\\.\\d+)?") below ensures
that the value contained within what is to be the Units
element of the split array is actually a string representation
of a signed or unsigned integer or double/float numerical value.
*/
if (splitLine[3].matches("-?\\d+(\\.\\d+)?")) {
units[indexCounter] = Double.parseDouble(splitLine[3]);
}
else {
JOptionPane.showMessageDialog(this, "<html>An invalid Units value (<b><font color=blue>" +
splitLine[3] + "</font></b>) has been detected<br>in data file line number <b><font " +
"color=red>" + lineCounter + "</font></b>. A value of <b>0.0</b> has been applied<br>to " +
"the Units Array to replace the data provided on the data<br>line which consists of: " +
"<br><br><b><center>" + line + "</center></b>.", "Invalid Units Value Detected!",
JOptionPane.WARNING_MESSAGE);
units[indexCounter] = 0.0d;
}
indexCounter++;
}
}
catch (IOException ex) {
Logger.getLogger("fillDataArrays() Method Error!").log(Level.SEVERE, null, ex);
}
}
To get the ball rolling just run the following code:
// Fill the Arrays with data.
fillDataArrays("DataFile.txt", true);
// Display the filled Arrays.
System.out.println(Arrays.toString(country));
System.out.println(Arrays.toString(timeScale));
System.out.println(Arrays.toString(vendor));
System.out.println(Arrays.toString(units));
You have to define your arrays before processing your file:
String[] country = new String[30];
String[] timescale = new String[30];
String[] vendor = new String[30];
String[] units = new String[30];
While reading the lines, put each value into its array at the same index. Keep that index in a separate variable and increase it on every iteration. It should look like this:
int index = 0;
while ((line = br.readLine()) != null) {
String[] splitted = line.split(",");
country[index] = splitted[0];
timescale[index] = splitted[1];
vendor[index] = splitted[2];
units[index] = splitted[3];
index++;
}
Since your CSV probably includes a header, you may also want to skip the first line.
Always try to use try-with-resources when doing I/O.
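Skipping the header and guarding against more rows than the arrays can hold could look roughly like this. It is only a sketch under assumptions: the capacity of 30 is taken from the arrays above, the class name FixedArrayLoader and method name load are hypothetical, and lines with fewer than four fields are simply skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper, not part of the answer above.
final class FixedArrayLoader {
    static final int MAX_ROWS = 30; // capacity assumed from the answer above

    final String[] country = new String[MAX_ROWS];
    final String[] timescale = new String[MAX_ROWS];
    final String[] vendor = new String[MAX_ROWS];
    final String[] units = new String[MAX_ROWS];

    /** Fills the arrays from the reader and returns the number of rows stored. */
    int load(BufferedReader br) throws IOException {
        br.readLine(); // assumption: the first line is the header, so discard it
        int index = 0;
        String line;
        // Stop at end of input or when the arrays are full, whichever comes first.
        while (index < MAX_ROWS && (line = br.readLine()) != null) {
            String[] parts = line.split(",");
            if (parts.length < 4) {
                continue; // skip blank or malformed lines instead of failing
            }
            country[index] = parts[0];
            timescale[index] = parts[1];
            vendor[index] = parts[2];
            units[index] = parts[3];
            index++;
        }
        return index;
    }
}
```

The returned row count tells the caller how many leading elements of each array are actually filled.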
The following code should help you out:
String line = "";
String cvsSplitBy = ",";
List<String> countries = new ArrayList<>();
List<String> timeScales = new ArrayList<>();
List<String> vendors = new ArrayList<>();
List<String> units = new ArrayList<>();
//use try-with resources
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
String[] parts = line.split(cvsSplitBy);
countries.add(parts[0]);
timeScales.add(parts[1]);
vendors.add(parts[2]);
units.add(parts[3]);
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
for (String country: countries) {
System.out.println(country);
}
for (String scale: timeScales) {
System.out.println(scale);
}
for (String vendor: vendors) {
System.out.println(vendor);
}
for (String unit: units) {
System.out.println(unit);
}
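As an alternative to four parallel lists, each data line can be kept together in one small object, so the fields of a row can never drift out of step. This is a minimal sketch, assuming the four-column layout from the question; the class SalesRow and its methods are hypothetical names, not taken from any answer above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type for the Country,Timescale,Vendor,Units layout.
final class SalesRow {
    final String country;
    final String timeScale;
    final String vendor;
    final double units;

    SalesRow(String country, String timeScale, String vendor, double units) {
        this.country = country;
        this.timeScale = timeScale;
        this.vendor = vendor;
        this.units = units;
    }

    /** Parses one data line; throws if the line does not have exactly four fields. */
    static SalesRow parse(String line) {
        // Split on commas and swallow any whitespace around them.
        String[] parts = line.trim().split("\\s*,\\s*");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + parts.length + ": " + line);
        }
        // Double.parseDouble throws NumberFormatException for a non-numeric Units value.
        return new SalesRow(parts[0], parts[1], parts[2], Double.parseDouble(parts[3]));
    }

    /** Parses every line in the list; the caller is expected to have removed the header. */
    static List<SalesRow> parseAll(List<String> dataLines) {
        List<SalesRow> rows = new ArrayList<>();
        for (String line : dataLines) {
            rows.add(parse(line));
        }
        return rows;
    }
}
```

A single List<SalesRow> then replaces the four lists, and rows.get(i).vendor reads one field of row i.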

Java JOptionPane from a created thread - window not showing components

I'm working on an application that reads strings from a document and replaces every occurrence of a given word with another word (supplied by the user).
The program runs with three separate threads: one for reading data from the file into the buffer, one for modifying the strings, and one for writing the output.
However, if a checkbox is marked as notify-user, I need to ask the user whether he wants to replace the substring at a given 'hit'. Here is the problem: when I try to use JOptionPane.showConfirmDialog(...) from the modify thread, the window doesn't contain any content (blank white box).
I also tried SwingUtilities.invokeLater(new Runnable(){ ...logic...}), which did work for showing the confirm box, but the other threads continued running in parallel (I need them to stop and wait for user input).
/**
* Checks the status of the string at each position in the buffer. If the status = Status.New and the String-object
* matches to the string to replace then it will be replaced with the String-object replaceString.
* <p>
* If the Status of the object is anything other than Status.New then the thread will be blocked.
* <p>
* When done, the status of the modified object is changed to Status.Checked.
*/
public synchronized void modify() {
try {
while (status[findPos] != Status.New) {
wait();
}
String oldString = buffer[findPos];
if (buffer[findPos].equals(findString)) {
buffer[findPos] = replace(findString, replaceString, start, findString.length());
}
start += oldString.length() + 1;
status[findPos] = Status.Checked;
findPos = (findPos + 1) % maxSize;
} catch (InterruptedException e) {
e.printStackTrace();
}
notify();
}
/**
* Replaces the strSource with strReplace and marks the word in the source-tab JTextPane. The start argument
* represents the index at position to replace the substring, the size argument represents the substring's
* length.
*
* TODO : if notifyUser -> ask for user prompt before replacing.
*
* @param strSource : String
* @param strReplace : String
* @param start : int
* @param size : int
* @return s : String
*/
public String replace(String strSource, String strReplace, int start, int size) {
String s = strSource;
DefaultHighlighter.DefaultHighlightPainter highlightPainter =
new DefaultHighlighter.DefaultHighlightPainter(Color.YELLOW);
//Ask user if he wants to replace the substring at position 'start'.
if (notifyUser) {
int x= JOptionPane.showConfirmDialog(null, "TEST", "TEST", JOptionPane.YES_NO_OPTION);
} else {
try {
textPaneSource.getHighlighter().addHighlight(start, start + size,
highlightPainter);
} catch (BadLocationException e) {
e.printStackTrace();
}
s = strReplace;
nbrReplacement++;
}
return s;
}
I think you need a variable that is shared between the threads (note that a ThreadLocal is per-thread by design, so a volatile field or an AtomicBoolean fits better). You must then check it in every thread you want to suspend.
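Another option, different from the shared-flag idea above, is SwingUtilities.invokeAndWait: it runs the dialog on the event dispatch thread and blocks the calling worker thread until the dialog has been answered. The sketch below is not the asker's code; the class EdtPrompt and its methods callOnEdt and confirmReplace are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;
import javax.swing.JOptionPane;
import javax.swing.SwingUtilities;

// Hypothetical helper, not part of the question's code.
final class EdtPrompt {

    /** Runs the task on the event dispatch thread and blocks the caller until it has finished. */
    static <T> T callOnEdt(Callable<T> task) throws Exception {
        if (SwingUtilities.isEventDispatchThread()) {
            return task.call(); // invokeAndWait must not be called from the EDT itself
        }
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();
        SwingUtilities.invokeAndWait(() -> {
            try {
                result.set(task.call());
            } catch (Exception e) {
                failure.set(e); // hand the exception back to the calling thread
            }
        });
        if (failure.get() != null) {
            throw failure.get();
        }
        return result.get();
    }

    /** Asks the user on the EDT; the calling (worker) thread waits for the answer. */
    static boolean confirmReplace(String word, int position) throws Exception {
        Integer answer = EdtPrompt.<Integer>callOnEdt(() -> JOptionPane.showConfirmDialog(
                null,
                "Replace \"" + word + "\" at position " + position + "?",
                "Confirm replacement",
                JOptionPane.YES_NO_OPTION));
        return answer == JOptionPane.YES_OPTION;
    }
}
```

In replace(), the notifyUser branch could call EdtPrompt.confirmReplace(strSource, start) and only perform the replacement when it returns true. Note that modify() in the question is synchronized, so the worker holds that monitor while the dialog is open; this is only safe as long as nothing running on the event dispatch thread needs the same lock.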

Java spell checker using hash tables

I don't want any code. I really want to learn the logic myself, but I need pointing in the right direction. Pseudocode is fine. I basically need to create a spell checker using hash tables as my primary data structure. I know it may not be the best data structure for the job, but that is what I was tasked to do. The words with correct spellings will come from a text file. Please guide me on how to approach the problem.
The way I'm thinking of doing it:
I'm guessing I need to create an ADT class that takes the string words.
I need a main class that reads the dictionary text file and takes a sentence inputted by a user. This class then scans that string of words and places each word into an ArrayList by noting the spaces in between the words. A boolean method will then pass each word in the ArrayList to the class that handles misspellings and return whether the word is valid.
I believe I need to create a class that generates the misspellings from the word list and stores them into the hash table? There will be a boolean method that takes a string parameter that checks in the table if the word is valid and return true or false.
In generating the misspellings, the key concepts I will have to look out for will be:
(Take for example the word: "Hello")
Missing characters. E.g. "Ello", "Helo"
Jumbled version of the word. E.g. "ehllo", "helol"
Phonetic misspelling. E.g. "fello" ('f' for 'h')
How can I improve on this thinking?
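One common way to organise the ideas above is to go the other way round: instead of storing misspellings of every dictionary word, generate every string that is one edit away from the typed word and keep those that are in the dictionary. This is only a sketch of that idea, assuming lowercase a-z words; the class name Suggestions and method name suggest are hypothetical. Phonetic mistakes such as 'f' for 'h' are covered here only as plain single-letter replacements.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper illustrating "one edit away" candidate generation.
final class Suggestions {

    /** Returns every dictionary word that is exactly one edit away from the given word. */
    static Set<String> suggest(String word, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i <= word.length(); i++) {
            String head = word.substring(0, i);
            String tail = word.substring(i);
            if (!tail.isEmpty()) {
                // Deletion: the user typed one character too many.
                keepIfKnown(head + tail.substring(1), dictionary, found);
            }
            if (tail.length() > 1) {
                // Transposition: two neighbouring characters were swapped.
                keepIfKnown(head + tail.charAt(1) + tail.charAt(0) + tail.substring(2), dictionary, found);
            }
            for (char c = 'a'; c <= 'z'; c++) {
                if (!tail.isEmpty()) {
                    // Replacement: one character is wrong.
                    keepIfKnown(head + c + tail.substring(1), dictionary, found);
                }
                // Insertion: the user left one character out.
                keepIfKnown(head + c + tail, dictionary, found);
            }
        }
        found.remove(word); // the word itself is not a suggestion
        return found;
    }

    private static void keepIfKnown(String candidate, Set<String> dictionary, Set<String> found) {
        if (dictionary.contains(candidate)) { // O(1) on average for a HashSet
            found.add(candidate);
        }
    }
}
```

Each candidate costs one hash lookup, so the work per word depends on the word's length rather than on the size of the dictionary.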
EDIT! This is what I came up with using HashSet
/**
* The main program that loads the correct dictionary spellings
* and takes input to be analyzed from user.
* @author Catherine Austria
*/
public class SpellChecker {
private static String stringInput; // input to check;
private static String[] checkThis; // the stringInput turned array of words to check.
public static HashSet<String> dictionary; // the dictionary used
/**
* Main method.
* @param args Argh!
*/
public static void main(String[] args) {
setup();
}//end of main
/**
* This method loads the dictionary and initiates the checks for errors in a scanned input.
*/
public static void setup(){
int tableSIZE=59000;
dictionary = new HashSet<>(tableSIZE);
try {
//System.out.print(System.getProperty("user.dir"));//just to find user's working directory;
// I combined FileReader into the BufferReader statement
//the file is located in edu.frostburg.cosc310
BufferedReader bufferedReader = new BufferedReader(new FileReader("./dictionary.txt"));
String line = null; // notes one line at a time
while((line = bufferedReader.readLine()) != null) {
dictionary.add(line);//add dictionary word in
}
prompt();
bufferedReader.close(); //close file
}
catch(FileNotFoundException ex) {
ex.printStackTrace();//print error
}
catch(IOException ex) {
ex.printStackTrace();//print error
}
}//end of setUp
/**
* Just a prompt for auto generated tests or manual input test.
*/
public static void prompt(){
System.out.println("Type a number from below: ");
System.out.println("1. Auto Generate Test\t2.Manual Input\t3.Exit");
Scanner theLine = new Scanner(System.in);
int choice = theLine.nextInt(); // for manual input
if(choice==1) autoTest();
else if(choice==2) startwInput();
else if (choice==3) System.exit(0);
else System.out.println("Invalid Input. Exiting.");
}
/**
* Manual input of sentence or words.
*/
public static void startwInput(){
//printDictionary(bufferedReader); // print dictionary
System.out.println("Spell Checker by C. Austria\nPlease enter text to check: ");
Scanner theLine = new Scanner(System.in);
stringInput = theLine.nextLine(); // for manual input
System.out.print("\nYou have entered this text: "+stringInput+"\nInitiating Check...");
/*------------------------------------------------------------------------------------------------------------*/
//final long startTime = System.currentTimeMillis(); //speed test
WordFinder grammarNazi = new WordFinder(); //instance of MisSpell
splitString(removePunctuation(stringInput));//turn String line to String[]
grammarNazi.initialCheck(checkThis);
//final long endTime = System.currentTimeMillis();
//System.out.println("Total execution time: " + (endTime - startTime) );
}//end of startwInput
/**
* Generates a testing case.
*/
public static void autoTest(){
System.out.println("Spell Checker by C. Austria\nThis sentence is being tested:\nThe dog foud my hom. And m ct hisse xdgfchv!## ");
WordFinder grammarNazi = new WordFinder(); //instance of MisSpell
splitString(removePunctuation("The dog foud my hom. And m ct hisse xdgfchv!## "));//turn String line to String[]
grammarNazi.initialCheck(checkThis);
}//end of autoTest
/**
* This method prints the entire dictionary.
* Was used in testing.
* @param bufferedReader the dictionary file
*/
public static void printDictionary(BufferedReader bufferedReader){
String line = null; // notes one line at a time
try{
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}catch(FileNotFoundException ex) {
ex.printStackTrace();//print error
}
catch(IOException ex) {
ex.printStackTrace();//print error
}
}//end of printDictionary
/**
* This methods splits the passed String and puts them into a String[]
* @param sentence The sentence that needs editing.
*/
public static void splitString(String sentence){
// split the sentence in between " " aka spaces
checkThis = sentence.split(" ");
}//end of splitString
/**
* This method removes the punctuation and capitalization from a string.
* @param sentence The sentence that needs editing.
* @return the edited sentence.
*/
public static String removePunctuation(String sentence){
String newSentence; // the new sentence
//remove evil punctuation and convert the whole line to lowercase
newSentence = sentence.toLowerCase().replaceAll("[^a-zA-Z\\s]", "").replaceAll("\\s+", " ");
return newSentence;
}//end of removePunctuation
}
This class checks for misspellings
public class WordFinder extends SpellChecker{
private int wordsLength;//length of String[] to check
private List<String> wrongWords = new ArrayList<String>();//stores incorrect words
/**
* This methods checks the String[] for spelling errors.
* Hashes each index in the String[] to see if it is in the dictionary HashSet
* @param words String list of misspelled words to check
*/
public void initialCheck(String[] words){
wordsLength=words.length;
System.out.println();
for(int i=0;i<wordsLength;i++){
//System.out.println("What I'm checking: "+words[i]); //test only
if(!dictionary.contains(words[i])) wrongWords.add(words[i]);
} //end for
//manualWordLookup(); //for testing dictionary only
if (!wrongWords.isEmpty()) {
System.out.println("Mistakes have been made!");
printIncorrect();
} //end if
if (wrongWords.isEmpty()) {
System.out.println("\n\nMove along. End of Program.");
} //end if
}//end of initialCheck
/**
* This method prints the incorrect words in a String[] being checked and generates suggestions.
*/
public void printIncorrect(){//delete this guy
System.out.print("These words [ ");
for (String wrongWord : wrongWords) {
System.out.print(wrongWord + " ");
}//end of for
System.out.println("]seems incorrect.\n");
suggest();
}//end of printIncorrect
/**
* This method gives suggestions to the user based on the wrong words she/he misspelled.
*/
public void suggest(){
MisSpell test = new MisSpell();
while(!wrongWords.isEmpty()&&test.possibilities.size()<=5){
String wordCheck=wrongWords.remove(0);
test.generateMispellings(wordCheck);
//if the possibilities size is greater than 0 then print suggestions
if(test.possibilities.size()>0) test.print(test.possibilities);
}//end of while
}//end of suggest
/*ENTERING TEST ZONE*/
/**
* This allows a tester to look through the dictionary to see whether words are valid; for testing only.
*/
public void manualWordLookup(){
System.out.print("Enter 'ext' to exit.\n\n");
Scanner line = new Scanner(System.in);
String look=line.nextLine();
do{
if(dictionary.contains(look)) System.out.print(look+" is valid\n");
else System.out.print(look+" is invalid\n");
look=line.nextLine();
}while (!look.equals("ext"));
}//end of manualWordLookup
}
/**
* This is the main class responsible for generating misspellings.
* @author Catherine Austria
*/
public class MisSpell extends SpellChecker{
public List<String> possibilities = new ArrayList<String>();//stores possible suggestions
private List<String> tempHolder = new ArrayList<String>(); //helps the transposition method
private int Ldistance=0; // the distance related to the two words
private String wrongWord;// the original wrong word.
/**
* Execute methods that make misspellings.
* @param wordCheck the word being checked.
*/
public void generateMispellings(String wordCheck){
wrongWord=wordCheck;
try{
concatFL(wordCheck);
concatLL(wordCheck);
replaceFL(wordCheck);
replaceLL(wordCheck);
deleteFL(wordCheck);
deleteLL(wordCheck);
pluralize(wordCheck);
transposition(wordCheck);
}catch(StringIndexOutOfBoundsException e){
System.out.println();
}catch(ArrayIndexOutOfBoundsException e){
System.out.println();
}
}
/**
* This method prepends each letter of the alphabet to the word and checks if the result is in the dictionary.
* FL for first letter
* @param word the word being manipulated.
*/
public void concatFL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord+=cur;
//if the word is in the dictionary then add it to the possibilities list
tempWord=tempWord.concat(word); //add passed String to end of tempWord
checkDict(tempWord); //check to see if in dictionary
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of concatFL
/**
* This method appends each letter of the alphabet to the end of the word and checks if the result is in the dictionary. LL for last letter.
* @param word the word being manipulated.
*/
public void concatLL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){ // 'a' (97) through 'z' (122)
cur=(char)i;//assign ASCII from index i value
tempWord=tempWord.concat(word); //add passed String to end of tempWord
tempWord+=cur;
//if the word is in the dictionary then add it to the possibilities list
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of concatLL
/**
* This method replaces the first letter (FL) of a word with alphabet letters.
* @param word the word being manipulated.
*/
public void replaceFL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord=cur+word.substring(1,word.length()); //the letter with ASCII code i, followed by the word from index 1 to its end
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of replaceFL
/**
* This method replaces the last letter (LL) of a word with alphabet letters
* @param word the word being manipulated.
*/
public void replaceLL(String word){
char cur; // current character
String tempWord=""; // stores temp made up word
for(int i=97;i<123;i++){
cur=(char)i;//assign ASCII from index i value
tempWord=word.substring(0,word.length()-1)+cur; //the word without its last letter, followed by the letter with ASCII code i
checkDict(tempWord);
tempWord="";//reset temp word to contain nothing
}//end of for
}//end of replaceLL
/**
* This deletes the first letter and sees if the result is in the dictionary
* @param word the word being manipulated.
*/
public void deleteFL(String word){
String tempWord=word.substring(1); // the word without its first letter
checkDict(tempWord);
//print(possibilities);
}//end of deleteFL
/**
* This deletes the last letter and sees if the result is in the dictionary
* @param word the word being manipulated.
*/
public void deleteLL(String word){
String tempWord=word.substring(0,word.length()-1); // stores temp made up word
checkDict(tempWord);
//print(possibilities);
}//end of deleteLL
/**
* This method pluralizes a word input
* @param word the word being manipulated.
*/
public void pluralize(String word){
String tempWord=word+"s";
checkDict(tempWord);
}//end of pluralize
/**
* Its purpose is to check whether a word is in the dictionary.
* If it is, then add it to the possibilities list.
* @param word the word being checked.
*/
public void checkDict(String word){
if(dictionary.contains(word)){//check to see if tempWord is in dictionary
//if the tempWord IS in the dictionary, then check if it is in the possibilities list
//then if tempWord IS NOT in the list, then add tempWord to list
if(!possibilities.contains(word)) possibilities.add(word);
}
}//end of checkDict
/**
* This method transposes letters of a word into different places.
* Not the best implementation. This guy was my last minute addition.
* @param word the word being manipulated.
*/
public void transposition(String word){
wrongWord=word;
int wordLen=word.length();
String[] mixer = new String[wordLen]; //String[] length of the passed word
//make word into String[]
for(int i=0;i<wordLen;i++){
mixer [i]=word.substring(i,i+1);
}
shift(mixer);
}//end of transposition
/**
* This method takes a string[] list then shifts the value in between
* the elements in the list and checks if in dictionary, adds if so.
* I agree that this is probably the brute force implementation.
* @param mixer the String array being shifted around.
*/
public void shift(String[] mixer){
System.out.println();
String wordValue="";
for(int i=0;i<=tempHolder.size();i++){
resetHelper(tempHolder);//reset the helper
transposeHelper(mixer);//fill tempHolder
String wordFirstValue=tempHolder.remove(i);//remove value at index in tempHolder
for(int j=0;j<tempHolder.size();j++){
int inttemp=0;
String temp;
while(inttemp<j){
temp=tempHolder.remove(inttemp);
tempHolder.add(temp);
wordValue+=wordFirstValue+printWord(tempHolder);
inttemp++;
if(dictionary.contains(wordValue)) if(!possibilities.contains(wordValue)) possibilities.add(wordValue);
wordValue="";
}//end of while
}//end of for
}//end for
}//end of shift
/**
* This method fills a list tempHolder with contents from String[]
* @param wordMix the String array being shifted around.
*/
public void transposeHelper(String[] wordMix){
for(int i=0;i<wordMix.length;i++){
tempHolder.add(wordMix[i]);
}
}//end of transposeHelper
/**
* This resets a list
* @param thisList the list whose contents are removed
*/
public void resetHelper(List<String> thisList){
while(!thisList.isEmpty()) thisList.remove(0); //while list is not empty, remove first value
}//end of resetHelper
/**
* This method prints out a list
* @param listPrint the list to print out.
*/
public void print(List<String> listPrint){
if (possibilities.isEmpty()) {
System.out.print("Can't seem to find any related words for "+wrongWord);
return;
}
System.out.println("Maybe you meant these for "+wrongWord+": ");
System.out.printf("%s", listPrint);
resetHelper(possibilities);
}//end of print
/**
* This returns a String word version of a list
* @param listPrint the list to make into a word.
* @return the generated word version of a list.
*/
public String printWord(List<String> listPrint){
Object[] suggests = listPrint.toArray();
String theWord="";
for(Object word: suggests){//form listPrint elements into a word
theWord+=word;
}
return theWord;
}//end of printWord
}
Instead of generating all possible misspellings of the words in your dictionary and adding them to the hash table, consider performing all possible changes (that you already suggested) to the user-entered words, and checking to see if those words are in the dictionary file.
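A minimal sketch of that idea, assuming the dictionary is already loaded into a Set<String>. The class name CandidateLookup and the methods suggest and keep are made up for illustration; the edits mirror the replace/delete/pluralize methods from the question. It uses Set.of, so it needs Java 9 or later.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class CandidateLookup {
    // Hypothetical helper: applies single-letter edits to the typed word
    // and keeps only the variants that are found in the dictionary.
    static Set<String> suggest(String word, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        if (word.isEmpty()) {
            return found; // nothing to edit
        }
        String tail = word.substring(1);                    // word without first letter
        String head = word.substring(0, word.length() - 1); // word without last letter
        for (char c = 'a'; c <= 'z'; c++) {
            keep(c + tail, dictionary, found); // replace first letter
            keep(head + c, dictionary, found); // replace last letter
        }
        keep(tail, dictionary, found);         // delete first letter
        keep(head, dictionary, found);         // delete last letter
        keep(word + "s", dictionary, found);   // pluralize
        return found;
    }

    private static void keep(String candidate, Set<String> dictionary, Set<String> found) {
        if (dictionary.contains(candidate)) {
            found.add(candidate);
        }
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("cat", "cats", "car", "at");
        System.out.println(suggest("bat", dictionary)); // prints [cat, at]
    }
}
```

Only about 55 candidates are generated per typed word, no matter how large the dictionary is.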
It sounds like what you want is a quick way to verify that a word is spelled correctly, or to find the correct spelling. If this is what you're trying to do, you can use a HashMap<String,String> (i.e. a hash table with String keys and String values). Whenever you find a word in your dictionary, you enter a key for it with a null value, indicating that the word is not to be changed (i.e. it is a correct spelling). You can then compute and add keys for possible misspellings, with the correct word as the value.
You'd have to devise a way to do this very carefully, because if your dictionary has two similar words such as "clot" and "colt", the computed misspellings of one may replace the correct spelling (or misspellings) of the other. Once you're done, you can look up a word to see whether it is in the dictionary, whether it is a misspelling of a dictionary word (and which word), or whether it is not found at all.
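Here is a rough sketch of such a table, only to show where the "clot"/"colt" problem bites. The class MisspellingTable and its build method are made-up names, and swapping adjacent letters stands in for whatever misspelling rules you actually use. Note the containsKey check: HashMap.putIfAbsent treats a key mapped to null as absent, so it would overwrite the "correct spelling" marker.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MisspellingTable {
    // Hypothetical builder: correct words map to null,
    // misspellings map to the word they came from.
    static Map<String, String> build(List<String> dictionary) {
        Map<String, String> table = new HashMap<>();
        // Pass 1: register every correct word first.
        for (String word : dictionary) {
            table.put(word, null);
        }
        // Pass 2: add misspellings without overwriting any existing key.
        for (String word : dictionary) {
            char[] letters = word.toCharArray();
            for (int i = 0; i + 1 < letters.length; i++) {
                char[] copy = letters.clone();
                copy[i] = letters[i + 1];     // swap two adjacent letters
                copy[i + 1] = letters[i];
                String variant = new String(copy);
                // containsKey, not putIfAbsent: a null value counts as absent there
                if (!table.containsKey(variant)) {
                    table.put(variant, word);
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = build(List.of("clot", "colt"));
        System.out.println(table.get("lcot"));        // prints clot
        System.out.println(table.get("colt"));        // prints null (correct word kept)
        System.out.println(table.containsKey("xyz")); // prints false
    }
}
```

Doing it in two passes is one way to guarantee that a correct word is never replaced by a misspelling of its neighbour. When two words produce the same misspelling, the first one processed wins.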
I believe this is a bad design, though, because your table has to be many times larger than your (I assume, already quite large) dictionary, and because you spend a lot of time calculating many misspellings for every word in the dictionary (a very big overhead if you only check a few lines that contain a few of these words). Given only a little liberty, I would opt for a HashSet<String> (which is a hash table without values) filled only with dictionary words. This allows you to check quickly whether a word is in the dictionary or not.
You can dynamically compute other ways to spell words when you encounter ones not in your dictionary. If you're doing this for only a line or two, it should not be slow at all (certainly faster than computing alternatives for everything in your dictionary). But if you want to do this for a whole file, you may want to keep a much smaller HashMap<String,String>, separate from your dictionary, to store any corrections you find, since the author may misspell the word the same way again later. Checking this map before computing alternatives keeps you from duplicating your efforts several times over.
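Something along these lines, as a sketch only. CachingChecker, check, firstAlternative and the computed counter are names I made up, and swapping adjacent letters is just a placeholder for your own set of edits.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CachingChecker {
    private final Set<String> dictionary;                              // correct words only
    private final Map<String, String> corrections = new HashMap<>();   // misspelling -> fix
    int computed = 0; // how many times alternatives were actually computed

    CachingChecker(Set<String> words) {
        this.dictionary = new HashSet<>(words);
    }

    // Returns the word itself if it is correct, a correction if one is
    // found, or null if nothing in the dictionary matches.
    String check(String word) {
        if (dictionary.contains(word)) {
            return word;
        }
        if (corrections.containsKey(word)) {
            return corrections.get(word); // seen before, skip the recomputation
        }
        String fix = firstAlternative(word);
        corrections.put(word, fix);       // remember failures (null) as well
        return fix;
    }

    // Placeholder edit: swap each pair of adjacent letters, return the first hit.
    private String firstAlternative(String word) {
        computed++;
        char[] letters = word.toCharArray();
        for (int i = 0; i + 1 < letters.length; i++) {
            char[] copy = letters.clone();
            copy[i] = letters[i + 1];
            copy[i + 1] = letters[i];
            String candidate = new String(copy);
            if (dictionary.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CachingChecker checker = new CachingChecker(Set.of("colt", "horse"));
        System.out.println(checker.check("cotl")); // prints colt
        System.out.println(checker.check("cotl")); // prints colt, served from the cache
        System.out.println(checker.computed);      // prints 1
    }
}
```

The cache only ever holds the misspellings that actually appear in the text, so it stays far smaller than a table precomputed from the whole dictionary.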
