Character occurrence in a txt file java


I'm writing a character occurrence counter in a txt file. I keep getting a result of 0 for my count when I run this:
public double charPercent(String letter) {
Scanner inputFile = new Scanner(theText);
int charInText = 0;
int count = 0;
// counts all of the user identified character
while(inputFile.hasNext()) {
if (inputFile.next() == letter) {
count += count;
}
}
return count;
}
Anyone see where I am going wrong?

This is because Scanner.next() returns entire words rather than characters. This means that the string returned by next() will rarely be the same as the single-letter parameter (except in cases where the word is a single letter, such as 'I' or 'A'). I also don't see the need for this line:
int charInText = 0;
as the variable is not being used.
Instead, you could try something like this:
public double charPercent(String letter) {
    Scanner inputFile = new Scanner(theText);
    int totalCount = 0;
    while (inputFile.hasNext()) {
        // Read the word once: calling next() twice would consume two different words
        String word = inputFile.next();
        // Difference in length of the word with and without the given letter
        int occurrencesInWord = word.length() - word.replace(letter, "").length();
        totalCount += occurrencesInWord;
    }
    inputFile.close();
    return totalCount;
}
By using the difference between the length of the word with and without the letter, you know the number of times the letter occurs in that specific word. This is added to the total count and repeated for all words in the text file.

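Use inputFile.next().equals(letter) instead of inputFile.next() == letter, because == compares references, not contents. You need to compare the contents of the String objects, so use String's equals() method.
Also, as mentioned in the comments, change count += count to count += 1 or count++. Since count starts at 0, count += count just adds 0 to 0 and never changes it, which is why you always get 0.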
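Putting both fixes into the question's loop gives roughly the following sketch. It reads from a String instead of the question's theText (whose type is not shown). As the other answers point out, it still only counts tokens that are exactly the letter, not letters inside words.

```java
import java.util.Scanner;

class TokenCounter {
    // Counts whitespace-separated tokens that are exactly equal to letter.
    static int countMatchingTokens(String text, String letter) {
        int count = 0;
        try (Scanner inputFile = new Scanner(text)) {
            while (inputFile.hasNext()) {
                // equals() compares contents; == would compare object references
                if (inputFile.next().equals(letter)) {
                    count++; // "count += count" would keep count at 0 forever
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatchingTokens("I saw a cat and a dog", "a")); // 2
        System.out.println(countMatchingTokens("banana", "a"));                // 0
    }
}
```

The second call shows the remaining limitation: the three a's inside "banana" are not counted, because the token "banana" is never equal to "a".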

Do you mean to compare the entire next word to your desired letter?
inputFile.next() returns the next String, delimited by whitespace (tabs, newlines, spaces). Unless your file contains only single letters, all separated by whitespace, your code won't be able to find the occurrences of letters inside words.
You might want to try calling inputFile.next() to get the next String, and then breaking that String down into a char array with toCharArray(). From there, you can iterate through the array (think for loops) to find the desired character. As a commenter mentioned, you don't want to use == to compare two Strings, but you can use it to compare two chars. If a character from the array matches your desired character, use count++ to increment your counter by 1.
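Here is a minimal sketch of the char-array approach described above. It assumes the text is available as a String. To read the actual file, you could construct the Scanner from a java.io.File instead, for example new Scanner(new File("input.txt")), where the file name is hypothetical. That constructor throws FileNotFoundException.

```java
import java.util.Scanner;

class CharCounter {
    // Counts every occurrence of target inside all whitespace-separated words of text.
    static int countChar(String text, char target) {
        int count = 0;
        try (Scanner inputFile = new Scanner(text)) {
            while (inputFile.hasNext()) {
                // Break the next word into a char array and compare each element;
                // == is fine here because char is a primitive type
                for (char c : inputFile.next().toCharArray()) {
                    if (c == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChar("banana bread", 'a')); // 4
    }
}
```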

Related

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get the leading characters, like
AS, MFD
It is not always the first two or first three characters; the input can vary. I need to get the characters that come before the first number.
Thank you.
Edit: This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}

Here is a nice one-line solution. It uses a regex to match the leading non-numeric characters in the string and then replaces the input string with that match. A dummy "A" is prepended so the regex matches even when the input starts with a digit, and substring(1) strips it off again.
public String getFirstLetters(String input) {
    return ("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
            .substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)

A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}

Use the following function to get the required output. It collects characters until it reaches the first digit:
public String getFirstChars(String str) {
    int zeroAscii = '0';
    int nineAscii = '9';
    String result = "";
    for (int i = 0; i < str.length(); i++) {
        int ascii = str.charAt(i);
        if (ascii >= zeroAscii && ascii <= nineAscii) {
            // First digit reached: everything collected so far is the prefix
            return result;
        }
        result = result + str.charAt(i);
    }
    // No digit at all: the whole string is the prefix
    return str;
}
Pass your string as the argument.

I think this can be done with a simple regex that matches digits, combined with Java's String.split method. This regex-based approach should be more efficient than the methods that use more complicated regexes.
Something like the following will work:
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
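To try the snippet on the question's inputs, here is a small self-contained sketch (the class and method names are illustrative). It also contrasts the default \d with the (?U) flag, which makes \d match digits from other scripts as well.

```java
class LeadingChars {
    // Characters before the first digit (the whole string if there is no digit).
    // A limit of 2 applies the pattern at most once, so the remainder is not split further.
    static String prefix(String s) {
        return s.split("[\\d]+", 2)[0];
    }

    // Same idea, but (?U) (UNICODE_CHARACTER_CLASS) lets \d match digits from any script.
    static String unicodePrefix(String s) {
        return s.split("(?U)\\d+", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(prefix("AS23456SDE")); // AS
        System.out.println(prefix("MFD324FR"));   // MFD
        System.out.println(prefix("ABC345."));    // ABC
        // The next input contains ARABIC-INDIC DIGIT THREE, which plain \d does not match
        System.out.println(prefix("AB\u0663CD").length()); // 5: no split happened
        System.out.println(unicodePrefix("AB\u0663CD"));    // AB
    }
}
```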
The regex I used, "[\\d]+", is already escaped for a Java string literal.
What does it do?
It matches one or more digits (\d). By default, Java's \d matches only the ASCII digits 0-9. It matches digits from other scripts (such as Arabic-Indic or Devanagari digits) only when the UNICODE_CHARACTER_CLASS flag, written (?U), is enabled.
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and splits the string into an array of strings wherever a match is found. The [0] at the end selects the first element of that array, since you want the leading characters.
What is the second parameter to .split(regex, int), which I supplied as 2?
This is the limit parameter. With a limit of 2, the regex is applied at most once: once the first match is found, the rest of the string is not processed any further.
From the String javadoc:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This keeps the approach efficient even if your string is huge.
If you want to make the ASCII-only intent explicit, you can use "[0-9]+" instead. To split on digits from any script, use "(?U)\\d+".

public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (char c : testString.toCharArray()) {
if (Character.isDigit(c))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
This prints the characters that come before the first digit.

Counting the occurrences of string in Java using string.split()

I'm new to Java. I thought I would write a program to count the occurrences of a character or a sequence of characters in a sentence. I wrote the following code. But I then saw there are some ready-made options available in Apache Commons.
Anyway, can you look at my code and tell me if there are any rookie mistakes? I tested it on a couple of cases and it worked fine. One case I can think of is when the input is a big text file instead of a small sentence or paragraph: split() may then become problematic, since it has to handle a large string. However, this is just my guess, and I would love to hear your opinions.
private static void countCharInString() {
//Get the sentence and the search keyword
System.out.println("Enter a sentence\n");
Scanner in = new Scanner(System.in);
String inputSentence = in.nextLine();
System.out.println("\nEnter the character to search for\n");
String checkChar = in.nextLine();
in.close();
//Count the number of occurrences
String[] splitSentence = inputSentence.split(checkChar);
int countChar = splitSentence.length - 1;
System.out.println("\nThe character/sequence of characters '" + checkChar + "' appear(s) '" + countChar + "' time(s).");
}
Thank you :)

Because of edge cases, split() is the wrong approach.
Instead, use replaceAll() to remove all other characters then use the length() of what's left to calculate the count:
int count = input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
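FYI, the regex created (for example when check = "xyz") looks like ".*?(xyz|$)", which means "everything up to and including 'xyz', or up to the end of input". It is replaced by the captured text (either 'xyz', or nothing at the end of input). This leaves just a string of 0 to n copies of the check string, and dividing its length by the length of check gives you the total. Note that check is inserted into the regex as-is, so if it can contain regex metacharacters (such as . or *), wrap it in Pattern.quote(check). Also, because . does not match line breaks by default, prefix the pattern with (?s) if the input may span several lines.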
To protect against the check being null or zero-length (causing a divide-by-zero error), code defensively like this:
int count = check == null || check.isEmpty() ? 0 : input.replaceAll(".*?(" + check + "|$)", "$1").length() / check.length();
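Here is a sketch of the same replaceAll trick with two extra safeguards that are not part of the original answer. Pattern.quote makes check literal, and the (?s) flag lets . match line breaks, so multi-line text is handled correctly.

```java
import java.util.regex.Pattern;

class ReplaceAllCount {
    // Counts non-overlapping occurrences of check in input using the replaceAll trick.
    static int count(String input, String check) {
        if (check == null || check.isEmpty()) {
            return 0; // avoids dividing by zero
        }
        // (?s): '.' also matches line breaks; Pattern.quote: check is matched literally
        String kept = input.replaceAll("(?s).*?(" + Pattern.quote(check) + "|$)", "$1");
        // kept now holds only the copies of check, back to back
        return kept.length() / check.length();
    }

    public static void main(String[] args) {
        System.out.println(count("a.b.c", "."));           // 2
        System.out.println(count("abcd\nxyz xyz", "xyz")); // 2
        System.out.println(count("onlyme", "onlyme"));     // 1
    }
}
```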

One flaw I can immediately think of is when inputSentence consists only of a single occurrence of checkChar. In this case, split() returns an empty array (it discards trailing empty strings), so your count will be -1 instead of 1.
An example interaction:
Enter a sentence
onlyme
Enter the character to search for
onlyme
The character/sequence of characters 'onlyme' appear(s) '-1' time(s).
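A better way would be to use the indexOf() method of String to count the occurrences, like this (assuming checkChar is not empty):
int count = 0;
int i = 0;
while ((i = inputSentence.indexOf(checkChar, i)) != -1) {
    count++;
    // Skip past this match so it is not found again
    i = i + checkChar.length();
}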

split is the wrong approach for a number of reasons:
String.split takes a regular expression. Regular expressions have characters with special meanings, so you cannot use it for every character without escaping those characters first, which requires an escaping function such as Pattern.quote.
Performance: String.split has a fast path for single characters that are not regex metacharacters; without it, you would be creating and compiling a regular expression on every call. Still, String.split creates one object for the String[] and one object for each String in it, every time you call it. You have no use for these objects; all you want to know is the count. Although a future all-knowing HotSpot compiler might be able to optimize that away, the current one does not: it is roughly 10 times as slow as simply counting characters as below.
It will not count correctly when checkChar occurs at the end of the string, once or repeatedly, because split discards trailing empty strings: "baa".split("a") returns just ["b"].
A better approach is much simpler: just count the characters in the string that match your checkChar. If you think about the steps you need to take to count characters, that's what you'd end up with yourself:
public static int occurrences(String str, char checkChar) {
int count = 0;
for (int i = 0, l = str.length(); i < l; i++) {
if (str.charAt(i) == checkChar)
count++;
}
return count;
}
If you want to count occurrences of a sequence of several characters, it becomes slightly trickier to write efficiently, because you don't want to create a new substring every time.
public static int occurrences(String str, String checkChars) {
int count = 0;
int offset = 0;
while ((offset = str.indexOf(checkChars, offset)) != -1) {
offset += checkChars.length();
count++;
}
return count;
}
For a two-character string, that is still 10 to 12 times as fast as String.split().
Warning: Performance timings are ballpark figures that depend on many circumstances. Since the difference is an order of magnitude, it's safe to say that String.split is slower in general. (Tests were performed on JDK 1.8.0-b28 64-bit, using 10 million iterations; the results were verified to be stable and the same with and without -Xcomp, after running the tests 10 times in the same JVM instance.)
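The edge cases listed above can be seen side by side in the following sketch. countWithSplit is a hypothetical wrapper around the question's split-based logic, and occurrences is the character-counting method from this answer.

```java
class SplitVsCount {
    // The question's approach: number of pieces minus one.
    static int countWithSplit(String sentence, String check) {
        return sentence.split(check).length - 1;
    }

    // Direct character count, as in the answer above.
    static int occurrences(String str, char checkChar) {
        int count = 0;
        for (int i = 0, l = str.length(); i < l; i++) {
            if (str.charAt(i) == checkChar) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Trailing empty strings are dropped, so matches at the end are lost
        System.out.println(countWithSplit("baa", "a")); // 0
        System.out.println(occurrences("baa", 'a'));    // 2
        // "." is a regex metacharacter that matches any character
        System.out.println(countWithSplit("a.b.c", ".")); // -1
        System.out.println(occurrences("a.b.c", '.'));    // 2
    }
}
```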

How can I compare 2 strings character by character?

For a college project, I am writing a spelling test for children, and I need to give 1 mark for a minor spelling error. I plan to treat a spelling with 2 wrong characters as a minor error. How can I compare the saved word to the inputted word?
char wLetter1 = word1.charAt(0);
char iLetter1 = input1.charAt(0);
char wLetter2 = word1.charAt(1);
char iLetter2 = input1.charAt(1);
I have started out with this where word1 is the saved word and input1 is the user input word.
However, if I add lots of these and the word is only 3 characters long, won't I get an error when I try to compare the 4th character? Is there a way of knowing how many characters are in the string, so that I only compare that many?

Just use a for loop. In Java, calling charAt() with an out-of-bounds index throws a StringIndexOutOfBoundsException, so you have to iterate only up to the shorter of the two lengths:
int errs = Math.abs(word1.length() - input1.length());
int len = Math.min(word1.length(), input1.length());
for (int i = 0; i < len; i++) {
    if (word1.charAt(i) != input1.charAt(i)) errs++;
}
// errs now holds the number of character mismatches (extra characters in the longer word count too)
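The sketch below wraps that loop in a self-contained method. It adds a hypothetical isMinorError helper that applies the question's "at most 2 wrong characters" rule. Treating 0 errors as a correct answer rather than a minor error is an assumption.

```java
class SpellingCheck {
    // Counts position-by-position mismatches; extra characters in the longer word count as errors.
    static int countMismatches(String word, String input) {
        int errs = Math.abs(word.length() - input.length());
        int len = Math.min(word.length(), input.length());
        for (int i = 0; i < len; i++) {
            if (word.charAt(i) != input.charAt(i)) {
                errs++;
            }
        }
        return errs;
    }

    // Hypothetical rule from the question: 1 or 2 wrong characters is a minor error.
    static boolean isMinorError(String word, String input) {
        int errs = countMismatches(word, input);
        return errs > 0 && errs <= 2;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches("cat", "cut"));  // 1
        System.out.println(countMismatches("cat", "cats")); // 1
        System.out.println(isMinorError("house", "hoose")); // true
    }
}
```

Because this compares characters by position, one missing letter shifts everything after it: countMismatches("house", "huse") returns 4. An edit distance such as the Levenshtein distance would be more forgiving, at the cost of a slightly more involved algorithm.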

Add brackets to sequence of chars in string

I need to put a sequence of characters in a String in brackets, in such a way that the longest matching substring is the one chosen. It is too complicated to explain in words, so here is an example:
If my input is:
'these are some chars *£&$'
'these are some chars *£&$^%(((£'
the output in both inputs respectively should be:
'these are some chars (*£&$)'
'these are some chars (*£&$^%)(((£'
So I would like to put the sequence *£&$^% in brackets if it exists; otherwise, put just *£&$ in brackets.
I hope it makes sense!

This method works in the general case. It finds the longest substring of keyword that occurs in the given String and surrounds its first occurrence with brackets:
// chars: the input (such as "these are some chars *£&$")
// keyword: the sequence to look for (such as "*£&$^%")
public String bracketize(String chars, String keyword) {
    String longest = "";
    // Check every substring keyword[i, j) and keep the longest one found in chars
    for (int i = 0; i < keyword.length(); i++) {
        for (int j = keyword.length(); j > i; j--) {
            String tempString = keyword.substring(i, j);
            if (chars.indexOf(tempString) != -1 && tempString.length() > longest.length()) {
                longest = tempString;
            }
        }
    }
    if (longest.length() == 0)
        return chars; // no possible substring of keyword exists in chars, so just return chars
    int start = chars.indexOf(longest);
    String bracketized = chars.substring(0, start) + "(" + longest + ")" + chars.substring(start + longest.length());
    return bracketized;
}
The nested for loops check every possible substring of keyword and select the longest one that is contained in the bigger String, chars. For example, if the keyword is Dog, it will check the substrings "Dog", "Do", "D", "og", "o", and "g". It stores this longest possible substring in longest (which is initialized to the empty String). If the length of longest is still 0 after checking every substring, then no such substring of keyword can be found in chars, so the original String, chars, is returned. Otherwise, a new string is returned which is chars with the substring longest surrounded by brackets (parentheses).
Hope this helps, let me know if it works.

Try something like this (assuming target string only occurs once).
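import java.util.regex.Pattern;
String input = "these are some chars *£&$";
String output = "";
String[] split;
if (input.indexOf("*£&$^%") != -1) {
    // split() takes a regex and * $ ^ are metacharacters, so quote the target;
    // a limit of 2 splits only at the first occurrence
    split = input.split(Pattern.quote("*£&$^%"), 2);
    output = split[0] + "(*£&$^%)";
    if (split.length > 1) {
        output = output + split[1];
    }
} else if (input.indexOf("*£&$") != -1) {
    split = input.split(Pattern.quote("*£&$"), 2);
    output = split[0] + "(*£&$)";
    if (split.length > 1) {
        output = output + split[1];
    }
} else {
    System.out.println("does not contain either string");
}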
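If regex escaping feels error-prone, the same idea can be written without split by trying each candidate with a literal indexOf, longest first. bracketFirst is a hypothetical helper name. The £ sign is written as \u00A3 so that the source compiles under any file encoding.

```java
class Bracketer {
    // Wraps the first occurrence of the first candidate found in input in brackets.
    // Candidates are tried in the order given, so list the longest one first.
    static String bracketFirst(String input, String... candidates) {
        for (String candidate : candidates) {
            int start = input.indexOf(candidate); // literal search, no regex involved
            if (start != -1) {
                int end = start + candidate.length();
                return input.substring(0, start) + "(" + candidate + ")" + input.substring(end);
            }
        }
        return input; // none of the candidates occurs
    }

    public static void main(String[] args) {
        String longSeq = "*\u00A3&$^%";
        String shortSeq = "*\u00A3&$";
        System.out.println(bracketFirst("these are some chars *\u00A3&$", longSeq, shortSeq));
        System.out.println(bracketFirst("these are some chars *\u00A3&$^%(((\u00A3", longSeq, shortSeq));
    }
}
```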

Word Count no duplicates

Here is my word count program in Java. I need to change it so that something, something; something? something! and something all count as one word. That means it should not count the same word twice, regardless of case and punctuation.
import java.util.Scanner;
public class WordCount1
{
public static void main(String[]args)
{
final int Lines=6;
Scanner in=new Scanner (System.in);
String paragraph = "";
System.out.println( "Please input "+ Lines + " lines of text.");
for (int i=0; i < Lines; i+=1)
{
paragraph=paragraph+" "+in.nextLine();
}
System.out.println(paragraph);
String word="";
int WordCount=0;
for (int i=0; i<paragraph.length()-1; i+=1)
{
if (paragraph.charAt(i) != ' ' || paragraph.charAt(i) !=',' || paragraph.charAt(i) !=';' || paragraph.charAt(i) !=':' )
{
word= word + paragraph.charAt(i);
if(paragraph.charAt(i+1)==' ' || paragraph.charAt(i) ==','|| paragraph.charAt(i) ==';' || paragraph.charAt(i) ==':')
{
WordCount +=1;
word="";
}
}
}
System.out.println("There are "+WordCount +" words ");
}
}

Since this is homework, here are some hints and advice.
There is a clever little method called String.split that splits a string into parts, using a separator specified as a regular expression. If you use it the right way, this will give you a one line solution to the "word count" problem. (If you've been told not to use split, you can ignore that ... though it is the simple solution that a seasoned Java developer would consider first.)
Format / indent your code properly ... before you show it to other people. If your instructor doesn't deduct marks for this, they aren't doing their job properly.
Use standard Java naming conventions. The capitalization of Lines is incorrect. It could be LINES for a manifest constant or lines for a variable, but a mixed-case name starting with a capital letter should always be a class name.
Be consistent in your use of white space characters around operators (including the assignment operator).
It is a bad idea (and completely unnecessary) to hard-wire the number of lines of input that the user must supply. And you are not dealing with the case where the user supplies fewer than 6 lines.
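Here is a sketch of the split-based word count hinted at above. It assumes words are separated by whitespace and the punctuation marks from the question. It counts every word, duplicates included; getting the distinct count needs a further step.

```java
class SplitWordCount {
    // Separators assumed here: whitespace plus the punctuation marks from the question
    private static final String SEPARATORS = "[\\s.,;:?!]+";

    static int countWords(String paragraph) {
        // Drop leading separators first; otherwise split() would yield an empty first word
        String cleaned = paragraph.replaceFirst("^" + SEPARATORS, "");
        // split() already discards trailing empty strings, so trailing punctuation is harmless
        return cleaned.isEmpty() ? 0 : cleaned.split(SEPARATORS).length;
    }

    public static void main(String[] args) {
        System.out.println(countWords(" something, something; something? something! and something")); // 6
    }
}
```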

You should just remove punctuation and change to a single case before doing further processing. (Be careful with locales and Unicode.)
Once you have broken the input into words, you can count the number of unique words by passing them into a Set and checking the size of the set.
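Here is a minimal sketch of this normalize-then-Set approach. It assumes a "word" is a run of letters, digits, or apostrophes. It uses Locale.ROOT so that lower-casing does not depend on the default locale.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DistinctWordCount {
    static int countDistinct(String text) {
        Set<String> words = new HashSet<>();
        // Lower-case with a fixed locale, then split on anything that is not a letter, digit or apostrophe
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!w.isEmpty()) {
                words.add(w); // a Set silently ignores duplicates
            }
        }
        return words.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct("Something, something; something? Something! and something")); // 2
    }
}
```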

Here you go. This works. Just read the comments and you should be able to follow.
import java.util.Arrays;
import java.util.HashSet;
import javax.swing.JOptionPane;
// Program Counts Words In A Sentence. Duplicates Are Not Counted.
public class WordCount
{
public static void main(String[]args)
{
// Initialize Variables
String sentence = "";
int wordCount = 1, startingPoint = 0;
// Prompt User For Sentence
sentence = JOptionPane.showInputDialog(null, "Please input a sentence.", "Input Information Below", 2);
// Remove All Punctuations. To Check For More Punctuations Just Add Another Replace Statement.
sentence = sentence.replace(",", "").replace(".", "").replace("?", "").replace("!", "").replace(";", "").replace(":", "");
// Convert All Characters To Lowercase - Must Be Done To Compare Upper And Lower Case Words.
sentence = sentence.toLowerCase();
// Count The Number Of Words
for (int i = 0; i < sentence.length(); i++)
if (sentence.charAt(i) == ' ')
wordCount++;
// Initialize Array And A Count That Will Be Used As An Index
String[] words = new String[wordCount];
int count = 0;
// Put Each Word In An Array
for (int i = 0; i < sentence.length(); i++)
{
if (sentence.charAt(i) == ' ')
{
words[count] = sentence.substring(startingPoint,i);
startingPoint = i + 1;
count++;
}
}
// Put Last Word In Sentence In Array
words[wordCount - 1] = sentence.substring(startingPoint, sentence.length());
// Put Array Elements Into A Set. This Will Remove Duplicates
HashSet<String> wordsInSet = new HashSet<String>(Arrays.asList(words));
// Format Words In Hash Set To Remove Brackets, And Commas, And Convert To String
String wordsString = wordsInSet.toString().replace(",", "").replace("[", "").replace("]", "");
// Print Out Non-Duplicate Words In Set And Word Count
JOptionPane.showMessageDialog(null, "Words In Sentence:\n" + wordsString + " \n\n" +
"Word Count: " + wordsInSet.size(), "Sentence Information", 2);
}
}

If you know the punctuation marks you want to ignore (;, ?, !), you could do a simple String.replace to remove those characters from the word. You may also find String.startsWith and String.endsWith helpful.
Convert your values to lower case for easier matching (String.toLowerCase).
The use of a Set is an excellent idea. If you want to know how many times a particular word appears, you could also take advantage of a Map of some kind.

You'll need to strip out the punctuation; here's one approach: Translating strings character by character
The above can also be used to normalize the case, although there are probably other utilities for doing so.
Now all of the variations you describe will be converted to the same string, and thus be recognized as the same word. As pretty much everyone else has suggested, a set would be a good tool for counting the number of distinct words.

Your real problem is that you want a distinct word count, so you should either keep track of which words you have already encountered, or delete them from the text entirely.
Let's say you choose the first option and store the words you have already encountered in a List; then you can check against that list whether you have already seen that word.
List<String> encounteredWords = new ArrayList<String>();
// continue after you have found out what the word was
if (!encounteredWords.contains(word.toLowerCase())) {
encounteredWords.add(word.toLowerCase());
wordCount++;
}
But Antimony made an interesting remark as well: he uses the property of a Set to get the distinct word count. By definition, a set can never contain duplicates, so if you add more of the same word, the set won't grow in size.
Set<String> wordSet = new HashSet<String>();
// continue after you have found out what the word was
wordSet.add(word.toLowerCase());
// continue after you have scanned through all words
return wordSet.size();

remove all punctuations
convert all strings to lowercase OR uppercase
put those strings in a set
get the size of the set

As you parse your input string, store it word by word in a map data structure. Just ensure that "word", "word?" and "word!" are all stored under the key "word" in the map, and increment the word's count whenever you add to the map.
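Here is a sketch of the map-based approach. It assumes that a word is a run of letters, digits, or apostrophes. Map.merge adds 1 for a new word or increments an existing count, and the map's size is the number of distinct words.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class WordFrequency {
    static Map<String, Integer> countEach(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for readable output
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (!w.isEmpty()) {
                // "word", "word?" and "word!" all end up under the key "word"
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countEach("word word? Word! other");
        System.out.println(counts);        // {other=1, word=3}
        System.out.println(counts.size()); // 2 distinct words
    }
}
```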
