Checking for and counting punctuation in a file - java

I'm currently doing an assignment that requires the program to count words and punctuation in a text file. The word counting program is done and working, but my professor provided an additional method for counting punctuation that must be combined with it, and I cannot get that method to work. Here is the working program:
import java.util.*;
import java.io.*;
public class SnippetWeek11 {
public static void main(String[] args) throws Exception {
Scanner input = new Scanner(System.in);
System.out.print("Enter a filename of a text file to process: ");
String filename = input.nextLine();
File file = new File(filename);
if (file.exists()) {
processFile(file);
}
else {
System.out.println("File " + filename + " does not exist");
}
}
private static void processFile(File theFile) throws Exception {
int wordIndex;
// Create a TreeMap to hold words as key and count as value
Map<String, Integer> map = new TreeMap<>();
Scanner input = new Scanner(theFile);
String line, keyText;
String[] words;
while (input.hasNextLine()) {
line = input.nextLine();
words = line.split("[\\s+\\p{P}]");
for (wordIndex = 0; wordIndex < words.length; wordIndex++) {
keyText = words[wordIndex].toLowerCase();
updateMap(map, keyText);
}
}
// Display key and value for each entry
map.forEach((key, value) -> System.out.println(key + "\t" + value));
}
private static void updateMap(Map<String, Integer> theMap,
String theText) {
int value;
String key = theText.toLowerCase();
if (key.length() > 0) {
if (!theMap.containsKey(key)) {
// The key does not exist in the Map object (theMap), so add key and
// the value (which is a count in this case) to a new theMap element.
theMap.put(key, 1);
}
else {
// The key already exists, so obtain the value (count in this case)
// from theMap element that contains the key and update the element
// with an increased count.
value = theMap.get(key);
value++;
theMap.put(key, value);
}
}
}
And here is the method that must be combined with the word count program. I would appreciate any help you could give. Thanks.
public static int countPunctuation(File theFile) throws Exception {
String[] punctuationString = {"[","]",".",";",",",":","!","?","(",")","{","}","'"};
Set<String> punctuationSet =
new HashSet<>(Arrays.asList(punctuationString));
int count = 0;
Scanner input = new Scanner(theFile);
while (input.hasNext()) {
String character = input.next();
if (punctuationSet.contains(character))
count++;
}
return count;
}
}

If you can use the Pattern class, you can do this:
import java.util.regex.*;
class PunctuationMatch
{
    public static void main(String[] args) {
        // Character class listing every punctuation mark to count
        final Pattern p = Pattern.compile("[,.?!:;]");
        final Matcher m = p.matcher("Hello, World! How are you?");
        int count = 0;
        while (m.find()) {
            count++;
        }
        System.out.println(count); // prints 3
    }
}
List every punctuation character you want to detect inside the character class passed to the compile method.
Pass either your entire text or each line of the file to the matcher method, and add up the counts.
See the Java documentation for java.util.regex.Pattern for reference.
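The professor's method can also be made to work by checking single characters instead of the whitespace-delimited tokens that Scanner.next() returns. The following is only a sketch under that assumption: it reuses the punctuation set from the question, while the class name PunctuationCounter and the String-based helper are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

class PunctuationCounter {
    // Same symbols as the punctuationString array in the question.
    private static final Set<Character> PUNCTUATION = new HashSet<>(Arrays.asList(
            '[', ']', '.', ';', ',', ':', '!', '?', '(', ')', '{', '}', '\''));

    // Counts the punctuation characters in one piece of text.
    static int countPunctuation(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (PUNCTUATION.contains(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Counts the punctuation characters in a whole file, line by line.
    // UTF-8 is an assumption; use the charset your file is actually written in.
    static int countPunctuationInFile(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.mapToInt(line -> countPunctuation(line)).sum();
        }
    }

    public static void main(String[] args) {
        System.out.println(countPunctuation("Hello, World! How are you?")); // prints 3
    }
}
```

In the word count program, processFile could call countPunctuation(line) for every line it already reads and keep a running total, so the file is only read once.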

Related

How do I Take certain elements of a String array and create a new array java

I am currently writing a program that reads its input from a file. (I am a beginner in Java and don't understand a lot yet, so it would be great if you could bear with me being slow.)
The file consists of a whole bunch of information regarding country data based on sales of products. The two pieces of the file that I care about are the Country names and the profit numbers. What I'm stuck with is how do I take specific portions of the file and read them into an array and then tally up the total profits? Currently I have read in the header of the file, found the indexes of the Country and profit of the header ( I assumed that finding the index of the headers will translate to finding numbers and names for profit and country later on). The file for example has multiple countries and they repeat multiple times through the file in a random order. Ex
Any help will be useful thanks!
my code right now is:
public static void main(String[]args)throws IOException{
Scanner in = new Scanner(new File("sample-csv-file-for-testing-fixed.csv"));
PrintWriter pw = new PrintWriter(new File("Output.csv"));
// gets first line of file
String firstline = in.nextLine();
firstline.trim();
String data = firstline.replaceAll(" ","");
String[] header = data.split(",") ;
// find index of Country and Profit and store them into variables
String country = "Country";
String profit = "Profit";
int index1 =0 , index2=0;
for(int i = 0;i<header.length;i++){
if(header[i].equals(country)){
index1 = i;
}
}
for(int i = 0;i<header.length;i++){
if(header[i].equals(profit)){
index2 = i;
}
}
System.out.println(index1+" "+index2);
while( in.hasNextLine()){
String line = in.nextLine();
String nextline = line.replaceAll(" ","");
String[] values = nextline.split(",");
for(int i = 0;i< values.length;i++){
System.out.print(values[i]+ " ");
}
}
// Read in line of file into string, separate the string into an array
// keep track of country names
// find a way to get rid of all other numbers except profit
// sum the total profit for each line for each country
// create a output file and print out the table
}
If I understand correctly, you want something like this:
public static final String COUNTRY_HEADER = "country";
public static final String PROFIT_HEADER = "profit";
public static void main(String[] args) throws URISyntaxException, IOException {
final Scanner in = new Scanner(new File("src/main/resources/group-by.txt"));
final String firstLine = in.nextLine();
final String[] headers = firstLine.split(" ");
int countryIndex = -1;
int profitIndex = -1;
for (int i = 0; i < headers.length; i++) {
if (headers[i].equalsIgnoreCase(COUNTRY_HEADER)) {
countryIndex = i;
} else if (headers[i].equalsIgnoreCase(PROFIT_HEADER)) {
profitIndex = i;
}
}
final Map<String, Long> profitsByCountry = new HashMap<>();
while (in.hasNextLine()) {
final String line = in.nextLine();
final String[] values = line.split(" ");
profitsByCountry.merge(values[countryIndex], Long.valueOf(values[profitIndex]), Long::sum);
}
profitsByCountry.forEach((key, value) -> System.out.printf("Country: %s, Profit: %d%n", key, value));
// Do more stuff
}
Basically, once you have located the indexes of the columns you are looking for, you just need to go through the rest of the lines in the file and accumulate their values.
Note: The data example you have offered has one mistake; there is an extra 'blah' in the last line for 'USA'.
A file stream based solution. Finding the index of the header uses the same logic as @Dave and @fjvierm.
public class FileStreaming {
public static void main(String[] args) {
try (BufferedReader br = Files.newBufferedReader(Paths.get("filestreamdata.txt"))) {
int[] idx = getIndex(br.readLine());
Map<String, Integer> result = br.lines()
.map(l -> l.split(" +"))
.map(ss -> new AbstractMap.SimpleEntry<>(ss[idx[0]], Integer.parseInt(ss[idx[1]])))
.collect(Collectors.toMap(AbstractMap.SimpleEntry::getKey,
AbstractMap.SimpleEntry::getValue,
Integer::sum));
result.forEach((key, value) -> System.out.printf("%s %d\n", key, value));
} catch (IOException e) {
e.printStackTrace();
}
}
private static int[] getIndex(String line) {
String[] splits = line.split(" +");
int[] result = new int[2];
for (int i = 0; i < splits.length; i++) {
if (splits[i].equals("country")) {
result[0] = i;
}
if (splits[i].equals("profit")) {
result[1] = i;
}
}
return result;
}
}
The below code might help.
It assumes: the header is always the first line; the header record begins with "Segment"; and the profit values are always in the same field position as the “Profit” header.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.io.IOException;
import java.util.*;
import java.util.stream.*;
public class SumTheProfit {
public static void main(String[] args) throws IOException {
String fileName = "test.csv";
String firstColumnHeader = "Segment";
String profitColumnHeader = "Profit";
// put header record into an array
Path filePath = Paths.get(fileName);
String[] firstLine = Files.lines(filePath)
.map(s -> s.replaceAll(" ", ""))
.map(s -> s.split(","))
.findFirst()
.get();
// get the index of "Profit" from the header
int profitIndex = java.util.Arrays.asList(firstLine).indexOf(profitColumnHeader);
List<String> list = new ArrayList<>();
// filter out header record & collect each profit (index 5) into a list
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
list = stream
.filter(line -> !line.startsWith(firstColumnHeader))
.map(line -> line.split("\\s*(,|\\s)\\s*")[profitIndex])
.collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
}
//sum each profit value in the list
Integer sum = list.stream().mapToInt(num -> Integer.parseInt(num)).sum();
System.out.println(sum);
}
}
This takes a declarative approach using the Java Streams API, as that's easier to read in comparison to an imperative for loop approach that hides application logic inside boilerplate code.
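The snippet above adds every profit value into a single total. To get one total per country, which is what the question asks for, the rows can be grouped by the country column first. This is a sketch under several assumptions: rows are comma-separated, profit values are whole numbers, and the header contains columns named Country and Profit. The class name ProfitByCountry and the sample rows are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ProfitByCountry {
    // Sums the Profit column per country; the first element of lines is the header row.
    static Map<String, Long> sumProfits(List<String> lines) {
        List<String> header = Arrays.asList(lines.get(0).trim().split("\\s*,\\s*"));
        int countryIndex = header.indexOf("Country");
        int profitIndex = header.indexOf("Profit");
        if (countryIndex < 0 || profitIndex < 0) {
            throw new IllegalArgumentException("Header must contain Country and Profit");
        }
        return lines.stream()
                .skip(1)                                    // skip the header row
                .filter(line -> !line.trim().isEmpty())     // ignore blank lines
                .map(line -> line.trim().split("\\s*,\\s*"))
                .collect(Collectors.groupingBy(
                        values -> values[countryIndex],
                        TreeMap::new,                       // keeps countries sorted by name
                        Collectors.summingLong(values -> Long.parseLong(values[profitIndex]))));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the asker's file.
        List<String> lines = Arrays.asList(
                "Segment, Country, Product, Profit",
                "Government, Canada, A, 100",
                "Government, USA, B, 50",
                "Midmarket, Canada, C, 25");
        System.out.println(sumProfits(lines)); // prints {Canada=125, USA=50}
    }
}
```

For a real file, the list could come from Files.readAllLines(Paths.get(fileName)).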

How to scan a file for a specific letter in Java

I need to take a file that a user chooses and scan that file for a letter that a user chooses, and then output how many times the user's letter appeared in the file.
I know how to get the user input and get the user to select a file, as well as scanning the file, but I cannot figure out a way to check each character within a file for a specific letter. The closest I have been able to come is this:
public class FileLetterCounter
{
public static void main(String[] args) throws IOException
{
int count = 0, stringLength;
String file, a = "a";
Scanner fileScanner, letterScan;
ArrayList<String> line = new ArrayList<String>();
fileScanner = new Scanner(new File("lab6.txt"));
while (fileScanner.hasNext())
{
line.add(fileScanner.next());
for (int index = 0; index < line.length(); index ++)
{
if (line.get(index).contains(a));
{
count++;
}
}
}
}
}
This doesn't work because the length() method does not work on an ArrayList, and I am unsure of how to approach the problem. I am asking this question because I found a similar one, but the recommended solution was to use what I have right now in my for loop (line.length()), but this won't work.
Instead of adding it to the list, just scan the text into a string, iterate each character of the string to check if the character matches with the search character, and increase the value of count for each match.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
int count = 0;
Scanner keyboard = new Scanner(System.in);
System.out.println("Enter the char to search: ");
char searchChar = keyboard.next().charAt(0);
Scanner fileScanner = new Scanner(new File("lab6.txt"));
while (fileScanner.hasNext()) {
String text = fileScanner.next();
for (int index = 0; index < text.length(); index++) {
if (text.charAt(index) == searchChar) {
count++;
}
}
}
System.out.println("The character " + searchChar + " appears " + count + " times in the file.");
fileScanner.close();
}
}
Look at this implementation with Streams; it looks pretty clean to me. Also, do not forget to specify the Charset, otherwise you could get unexpected results.
public static long countCharacterInFile(Path file, char ch, Charset charset) throws IOException {
try (Stream<String> stream = Files.lines(file, charset)) {
return stream.map(String::codePoints)
.flatMap(IntStream::boxed)
.filter(c -> c == ch)
.count();
}
}
Usage:
Path file = Paths.get("lab6.txt");
System.out.println(countCharacterInFile(file, 'e', StandardCharsets.UTF_8)); // 666
I am assuming you are trying to search for a character in the whole file. I modified the code by removing the unnecessary variables. I also don't see any use in adding each line to a list of strings.
The idea is to scan through each line and increment count whenever the current character matches your character.
public class FileLetterCounter
{
public static void main(String[] args) throws IOException
{
int count = 0;
char targetLetter = 'a'; //define whatever you want or take it from user input
Scanner fileScanner = new Scanner(new File("lab6.txt"));
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
for(int i=0; i<line.length(); i++) {
if(line.charAt(i) == targetLetter) {
count++;
}
}
}
System.out.println(count);
}
}

Java - Hashmapping a text file

Please excuse my ignorance; I have been puzzling over this for a while.
I have a huge .txt file containing mostly letters. I need to create HashMaps to store word length, word characters and word count. I have to print out the longest word that occurred more than three times and show how many times it occurred.
I'm thinking of something like this:
private void readWords(){
BufferedReader in = new BufferedReader(new FileReader("text.txt"));
Map<Integer, Map<String, Integer>>
}
The problem is that I don't quite know how to save to a HashMap. Can anybody help, please?
Thank you!
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class HashMapExample {
static String fileName = "text.txt";
private static Scanner input;
public static void main(String[] args) throws FileNotFoundException {
input = new Scanner(new File(fileName));
Map<String, Integer> map = new HashMap<String, Integer>();
while (input.hasNext()) {
String word = input.next();
if (map.containsKey(word)) {
int temp = map.get(word) + 1;
map.put(word, temp);
} else {
map.put(word, 1);
}
}
System.out.println("printing longest word(s) with word count < 3");
System.out.println("");
// iterate through the key set and display word, word length and values
System.out.printf("%-25s\t%-25s\t%s\n", "Word", "Word Length", "Count");
String longest = getLongest(map);
int valueOfLongest = 0;
if (!longest.equals("")) {
valueOfLongest = longest.length();
System.out.printf("%-25s\t%-25s\t%s\n", longest, longest.length(), map.get(longest));
map.remove(longest);
}
boolean isAllRemoved = false;
while (!isAllRemoved) {
isAllRemoved = false;
longest = getLongest(map);
if (!longest.equals("") && longest.length() == valueOfLongest){
System.out.printf("%-25s\t%-25s\t%s\n", longest, longest.length(), map.get(longest));
map.remove(longest);
} else
isAllRemoved = true;
}
System.out.println("");
System.out.println("printing next longest word(s) with word count > = 3");
System.out.println("");
// iterate through the key set and display word, word length and values
System.out.printf("%-25s\t%-25s\t%s\n", "Word", "Word Length", "Count");
String nextLongest = getNextLongest(map, valueOfLongest);
int valueOfNextLongest = 0;
if (!nextLongest.equals("")) {
valueOfNextLongest = nextLongest.length();
System.out.printf("%-25s\t%-25s\t%s\n", nextLongest, nextLongest.length(), map.get(nextLongest));
map.remove(nextLongest);
}
boolean isNextLongest = false;
while (!isNextLongest) {
isNextLongest = true;
nextLongest = getNextLongest(map, valueOfLongest);
if (!(nextLongest.equals("")) && nextLongest.length() == valueOfNextLongest) {
System.out.printf("%-25s\t%-25s\t%s\n", nextLongest, nextLongest.length(), map.get(nextLongest));
map.remove(nextLongest);
isNextLongest = false;
}
}
}
public static String getLongest(Map<String, Integer> map) {
String longest = "";
for (Map.Entry<String, Integer> entry : map.entrySet()) {
String key = (String) entry.getKey();
if (longest.length() < key.length() && map.get(key) < 3) {
longest = key;
}
}
return longest;
}
public static String getNextLongest(Map<String, Integer> map,
int valueOfLongest) {
String nextLongest = "";
for (Map.Entry<String, Integer> entry : map.entrySet()) {
String key = (String) entry.getKey();
if (valueOfLongest > key.length() && nextLongest.length() < key.length() && map.get(key) >= 3) {
nextLongest = key;
}
}
return nextLongest;
}
}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
public class CountWord {
public static void main(String args[]) throws IOException {
FileReader fr = new FileReader("c:/a.txt");
BufferedReader br = new BufferedReader(fr);
// init the longest size 0
int longestSize = 0;
String s = null;
// may be some word have the same length
Set<String> finalAnswerSet = new HashSet<String>();
Multiset<String> everyWordSet = HashMultiset.create();
while (br != null && (s = br.readLine()) != null) {
// put every word into the everyWordSet
everyWordSet.add(s);
// we only care about words that appear more than 3 times
if (everyWordSet.count(s) > 3) {
if (s.length() > longestSize) {
//if s'length is the longest,clear the finalAnswerSet and put s into it
longestSize = s.length();
finalAnswerSet.clear();
finalAnswerSet.add(s);
} else if (s.length() == longestSize) {
// finalAnswerSet may contains multi values
finalAnswerSet.add(s);
}
}
}
// and now we have the longestSize,and finalAnswerSet contains the answers,let's check it
System.out.println("The longest size is:" + longestSize);
for (String answer : finalAnswerSet) {
System.out.println("The word is :" + answer);
System.out.println("The word appears time is:" + everyWordSet.count(answer));
}
//don't forget to close the resource
br.close();
fr.close();
}
}
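For comparison, the same task can be done with a plain HashMap and no extra library. This is a rough sketch, not the approach of either answer above: it assumes words are runs of letters, ignores case, and the names LongestFrequentWord, countWords and longestAbove are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class LongestFrequentWord {
    // Counts how often each word occurs; anything that is not a letter separates words.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the old count
            }
        }
        return counts;
    }

    // Returns the longest word whose count is greater than minCount, or "" if there is none.
    // If several words share the longest length, which one is returned is unspecified.
    static String longestAbove(Map<String, Integer> counts, int minCount) {
        String longest = "";
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > minCount && entry.getKey().length() > longest.length()) {
                longest = entry.getKey();
            }
        }
        return longest;
    }

    public static void main(String[] args) {
        String text = "banana apple banana kiwi banana apple banana "
                + "extraordinary apple apple kiwi kiwi kiwi";
        Map<String, Integer> counts = countWords(text);
        String longest = longestAbove(counts, 3);
        System.out.println(longest + " " + counts.get(longest)); // prints banana 4
    }
}
```

For a large file, countWords could be fed one line at a time from a BufferedReader, merging into a single map, instead of loading the whole text into one String.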

How to count words in array of strings in java?

I am learning about arrays and I want to write a program that counts words. Given: String myWords = {"soon; hi; also; soon; job; also"};
I have to create a method like countWords(myWords);
The printed result should be the words in alphabetical order, the number of unique words, and the total number of words.
Here is my code:
public class Words {
public static void main(String[] args){
String[] myWords = {"soon; hi; also; soon; job; mother; job; also; soon; later"};
Words myW= new Words();
myW.countWords();
System.out.println("\tWords \tFreq");
}
public static String[] countWords(myWords){
for (int i=0; i<myWords.length; i++){
String temp = myWords[i];
//System.out.println(temp + " ");
for(int j=i+1; j<myWords.length; j++){
String temp2= myWords[j];
System.out.println("No. of unique words: " );
}
}
}
}
What should I do next?
import java.io.*;
import java.util.*;
public class Count_Words_Scan
{
public static void main(String[] args) throws IOException
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("ENTER A STRING ");
String str = br.readLine();
str= str.toLowerCase();
int c=0;
Scanner sc = new Scanner(str);
while(sc.hasNext())
{
sc.next();
c++;
}
System.out.println("NO.OF WORDS = "+c);
}
}
Input: the word counter
Output: NO.OF WORDS = 3
I would suggest you take a look at split, trim and HashSet.
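As a rough illustration of that suggestion, the sketch below splits the text on ";", trims each piece, and collects the pieces into a set. It swaps HashSet for TreeSet so the unique words come out in alphabetical order, as the question requires. The class and method names are made up for this example.

```java
import java.util.Set;
import java.util.TreeSet;

class UniqueWords {
    // Returns the distinct words in alphabetical order (TreeSet keeps its elements sorted).
    static Set<String> uniqueWords(String text) {
        Set<String> unique = new TreeSet<>();
        for (String part : text.split(";")) {
            String word = part.trim();
            if (!word.isEmpty()) {
                unique.add(word);
            }
        }
        return unique;
    }

    // Returns the total number of words, duplicates included.
    static int totalWords(String text) {
        int total = 0;
        for (String part : text.split(";")) {
            if (!part.trim().isEmpty()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String myWords = "soon; hi; also; soon; job; mother; job; also; soon; later";
        Set<String> unique = uniqueWords(myWords);
        System.out.println(unique);                // prints [also, hi, job, later, mother, soon]
        System.out.println(unique.size());         // prints 6
        System.out.println(totalWords(myWords));   // prints 10
    }
}
```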
I am assuming you want to count the words in a string.
String: "soon hi also soon job mother job also soon later"
public class Words {
Map<String , Integer> dictionary=new HashMap<String,Integer>();
public static void main(String[] args) {
String myWords = "soon hi also soon job mother job also soon later";
Words myW = new Words();
String[] array=myWords.split("\\s+");
myW.countWords(array);
System.out.println(myW.dictionary);
}
private void countWords(String[] myWords) {
for(String s:myWords){
if(dictionary.containsKey(s))
dictionary.put(s, dictionary.get(s)+1);
else
dictionary.put(s, 1);
}
}
}
O/P : {mother=1, later=1, job=2, hi=1, also=2, soon=3}
First you need to split your String, presumably on ";". Then you can put the pieces into a TreeSet to sort them and make the words unique. Add a counter to count the total words. You could also use a TreeMap to keep a count of each word, overriding the put method on the map to aggregate as you go:
final String myString = "soon; hi; also; soon; job; mother; job; also; soon; later";
final String[] myStrings = myString.split(";");
final Map<String, Integer> myStringMap = new TreeMap<String, Integer>() {
    @Override
    public Integer put(final String key, final Integer value) {
        // Delegate to super.put so this override does not call itself
        if (containsKey(key)) {
            return super.put(key, get(key) + 1);
        } else {
            return super.put(key, 1);
        }
    }
};
for (final String string : myStrings) {
    myStringMap.put(string.trim(), 1);
}
Now myStringMap.size() is the number of unique words, myStringMap.keySet() is an alphabetically sorted Set of all unique words, and if you want the total you just need to add up the values:
int totalWords = 0;
for (final Integer count : myStringMap.values()) {
    totalWords += count;
}

How to disregard numbers when reading from a text file?

Right now I want to store a text file that goes like this:
1 apple
2 banana
3 orange
4 lynx
5 cappuccino
and so on into a data structure. Would the best way of doing this be mapping the int to the string somehow, or should I make an arraylist? I'm supposed to, when I store the words themselves, disregard the int and any whitespace, and keep only the word itself. How do I disregard the int when reading in lines? Here is my hacked together code right now:
public Dictionary(String filename) throws IOException {
if (filename==null)
throw new IllegalArgumentException("Null filename");
else{
try {
BufferedReader in = new BufferedReader(new FileReader(filename));
String str;
int numLines=0;
while ((str = in.readLine()) != null) {
numLines++;
}
String[] words=new String[numLines];
for (int i=0; i<words.length;i++){
words[i]=in.readLine();
}
in.close();
} catch (IOException e) {
}
}
}
Thank you in advance for the help!!
Just use the power of regular expressions:
List<String> texts = new ArrayList<String>();
Pattern pattern = Pattern.compile("[^0-9\\s]+");
String text = "1 apple 2 oranges 3 carrots";
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
texts.add(matcher.group(0));
}
Regular expressions are very popular these days. The compile method compiles your search pattern; the 0-9 and \s inside the negated character class keep digits and whitespace out of the matches. You can use Apache Commons IO to read a text file into a String.
This won't work because you are already at the end of the file, so in.readLine() will return null.
I would use a Map to store the name and the amount, something like this:
HashMap<String, Integer> map = new HashMap<String, Integer>();
while ((line = br.readLine()) != null) {
//also check if the array is null and the right size, trim, etc.
String[] tmp = line.split(" ");
map.put(tmp[1], Integer.parseInt(tmp[0]) );
}
Otherwise you can try it with the Scanner class. Good luck.
You can give regular expressions a try.
Pattern p = Pattern.compile("[^0-9\\s]+");
String s = "1 apple 2 oranges";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
Output =
apple
oranges
To get an idea of how regular expressions work, see a Java regex tutorial.
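A regular expression can also be applied line by line to strip only the leading number, which keeps words that happen to contain digits intact. This is a minimal sketch assuming each line has the form "number word"; the class name WordLines and the method wordsOnly are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordLines {
    // Removes a leading integer and the surrounding whitespace from every line.
    static List<String> wordsOnly(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            // ^\s*\d+\s* matches optional spaces, the number, and the spaces after it
            String word = line.replaceFirst("^\\s*\\d+\\s*", "").trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 apple", "2 banana", "3 orange");
        System.out.println(wordsOnly(lines)); // prints [apple, banana, orange]
    }
}
```

The lines themselves could come from a single pass over the file, for example with Files.readAllLines, which also avoids reading the file twice just to size an array.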
I suggest you use a List of items to store the results parsed from the file. One way to parse every text line is to use the String.split(String) method. Also note that you should handle exceptions in the code properly and do not forget to close the Reader when you are done (no matter whether flawlessly or with an exception => use a finally block). The following example should put you on track... Hope this helps.
package test;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) throws IOException {
Main m = new Main();
m.start("test.txt");
}
private void start(String filename) throws IOException {
System.out.println(readFromFile(filename));
}
private final class Item {
private String name;
private int id;
public Item(String name, int id) {
this.name = name;
this.id = id;
}
public int getId() {
return id;
}
public String getName() {
return name;
}
@Override
public String toString() {
return "Item [name=" + name + ", id=" + id + "]";
}
}
private List<Item> readFromFile(String filename) throws IOException {
List<Item> items = new ArrayList<Item>();
Reader r = null;
try {
r = new FileReader(filename);
BufferedReader br = new BufferedReader(r);
String line = null;
while ((line = br.readLine()) != null) {
String[] lineItems = line.split(" ");
if (lineItems.length != 2) {
throw new IOException("Incorrect input file data format! Two space separated items expected on every line!");
}
try {
int id = Integer.parseInt(lineItems[0]);
Item i = new Item(lineItems[1], id);
items.add(i);
} catch (NumberFormatException ex) {
throw new IOException("Incorrect input file data format!", ex); // JDK6+
}
}
} finally {
if (r != null) {
r.close();
}
}
return items;
}
}
If your words don't contain spaces, you could use String.split( " " ) to split up the String into an array of Strings delimited by spaces.
Then just take the second element of the array (the first will be the number).
Also, the String.trim( ) method will remove any whitespace before or after the String.
Note: there's probably some error checking that you'd want to perform (what if the String isn't formatted as you expect). But this code snippet gives the basic idea:
...
String s = in.readLine( );
String[] tokens = s.split( " " );
words[i] = tokens[1].trim( );
...
If you want to do something easy, just take a substring of the original word after counting the leading digits:
int t = 0;
while (word.charAt(t) >= '0' && word.charAt(t) <= '9')
++t;
word = word.substring(t);
If words NEVER contain spaces you can also use word.split(" ")[1]
Instead of using a BufferedReader, use the Scanner class, and instead of an array use an ArrayList, like so:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.ArrayList;
public class Dictionary {
    private ArrayList<String> strings = new ArrayList<>();
    code...
    public Dictionary(String fileName) throws IOException {
        code...
        // try-with-resources closes the file automatically
        try (Scanner inFile = new Scanner(new File(fileName))) {
            strings.add("Dummy"); // Dummy value to make the index start at 1
            while (inFile.hasNext()) {
                int n = inFile.nextInt(); // reads the int from the file; it is not used further
                String s = inFile.nextLine().trim();
                strings.add(s);
            }
        }
and then since your data goes 1, 2, 3, 4, 5, you can just use the index to retrieve each item's number.
By doing this:
for(int i = 1; i < strings.size(); i++) {
int n = i;
String s = n + " " + strings.get(i);
System.out.println(s);
}
