I have a really long text that looks like "123testes1233iambeginnerplshelp123 .." and I need to start a new line each time the program reads a number.
So the output should be like:
123testes
1233iambeginnerplshelp
123 ..
You can solve it using a regex. Look for every pattern where a number is followed by letters and, each time one is found, print it:
String text = "123testes1233stackoverflowwillsaveyou123dontworry";
String wordToFind = "\\d+[a-z]+";
Pattern word = Pattern.compile(wordToFind);
Matcher match = word.matcher(text);
while (match.find()) {
System.out.println(match.group());
}
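The pattern above only matches lowercase letters after the digits, so a chunk such as "123 .." from the question would be cut short. Here is a minimal sketch of a variation, assuming every output line should start at a run of digits and continue up to the next run. The class name NumberChunks and the helper chunks() are made up for illustration, and any text before the first digit is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberChunks {
    // One or more digits followed by any run of non-digit characters.
    private static final Pattern CHUNK = Pattern.compile("\\d+\\D*");

    // Hypothetical helper: returns each "number + trailing text" chunk in order.
    static List<String> chunks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : chunks("123testes1233iambeginnerplshelp123 ..")) {
            System.out.println(line);
        }
    }
}
```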
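One way to do it would be to use StringTokenizer. If you make the assumption that every output line must start with 123, even if the input doesn't start with it, it could be as follows. Note that StringTokenizer treats the delimiter string as a set of individual characters, so any run of the characters 1, 2 and 3 is consumed as a delimiter (1233 in the input is printed back as 123):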
String input = "123testes1233iambeginnerplshelp123 ..";
String delimiter = "123";
StringTokenizer tokenizer = new StringTokenizer(input, delimiter);
while (tokenizer.hasMoreTokens()) {
String line = delimiter + tokenizer.nextToken();
System.out.println(line);
}
A simple approach (without any dependencies) would look something like this,
class Test {
public static void main (String[] args) throws java.lang.Exception
{
String a = "123testes1233iambeginnerplshelp123";
StringBuffer sb = new StringBuffer();
for (int i=0; i<a.length()-1; i++) {
while (i<a.length()-1 && !(!isNumber(a.charAt(i)) && isNumber(a.charAt(i+1)))) {
sb.append(a.substring(i,i+1));
i++;
}
sb.append(a.substring(i,i+1));
System.out.println(sb.toString());
sb.setLength(0);
}
}
private static boolean isNumber (char c) {
return ((int)c >=48) && ((int)c <= 57);
}
}
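My solution:
public StringSplitNum(){
String someString = "123testes1233iambeginnerplshelp123abc";
// split at every boundary between a letter and a digit
String regex = "((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))";
List<String> arr = Arrays.asList(someString.split(regex));
// the parts alternate number, text; print each pair on one line
for(int i=0; i< arr.size();i+=2){
// guard against a trailing number that has no text after it
String text = (i+1 < arr.size()) ? arr.get(i+1) : "";
System.out.println(arr.get(i)+ " " + text);
}
}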
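If the goal is one output line per number, exactly as in the question, a split on only the letter-to-digit boundary may be enough. This is a sketch under that assumption; SplitBeforeNumbers and its split() helper are illustrative names, not part of the answer above.

```java
import java.util.Arrays;
import java.util.List;

class SplitBeforeNumbers {
    // Zero-width boundary: the previous character is a non-digit
    // and the next character is a digit.
    static List<String> split(String s) {
        return Arrays.asList(s.split("(?<=\\D)(?=\\d)"));
    }

    public static void main(String[] args) {
        for (String line : split("123testes1233iambeginnerplshelp123abc")) {
            System.out.println(line);
        }
    }
}
```

Because the split is zero-width, no characters are lost: joining the pieces gives back the original string.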
I'm trying to write a program that will allow a user to input a phrase (for example: "I like cats") and print each word on a separate line. I have already written the part that starts a new line at every space, but I don't want blank lines between the words because of excess spaces. I can't use regular-expression methods such as String.split() or replaceAll(), and I can't use trim() either.
I tried a few different approaches, but I don't know how to skip spaces when I don't know how many there could be, and nothing seems to work.
Is there a way I could implement it into the code I've already written?
for (i=0; i<length-1;) {
j = text.indexOf(" ", i);
if (j==-1) {
j = text.length();
}
System.out.print("\n"+text.substring(i,j));
i = j+1;
}
Or how can I write a new expression for it? Any suggestions would really be appreciated.
I have already written the part to allow a new line at every space but
I don't want to have blank lines between the words because of excess
spaces.
If you can't use trim() or replaceAll(), you can use java.util.Scanner to read each word as a token. By default Scanner uses white space pattern as a delimiter for finding tokens. Similarly, you can also use StringTokenizer to print each word on new line.
String str = "I like cats";
Scanner scanner = new Scanner(str);
while (scanner.hasNext()) {
System.out.println(scanner.next());
}
OUTPUT
I
like
cats
Here is a simple solution using substring() and indexOf()
public static void main(String[] args) {
List<String> split = split("I like cats");
split.forEach(System.out::println);
}
public static List<String> split(String s){
List<String> list = new ArrayList<>();
while(s.contains(" ")){
int pos = s.indexOf(' ');
if (pos > 0) { // skip the empty piece produced by consecutive spaces
list.add(s.substring(0, pos));
}
s = s.substring(pos + 1);
}
if (!s.isEmpty()) {
list.add(s);
}
return list;
}
Edit:
If you only want to print the text without splitting or making lists, you can use this:
public static void main(String[] args) {
newLine("I like cats");
}
public static void newLine(String s){
while(s.contains(" ")){
int pos = s.indexOf(' ');
if (pos > 0) { // skip the empty piece produced by consecutive spaces
System.out.println(s.substring(0, pos));
}
s = s.substring(pos + 1);
}
if (!s.isEmpty()) {
System.out.println(s);
}
}
I think this will solve your problem.
public static List<String> getWords(String text) {
List<String> words = new ArrayList<>();
BreakIterator breakIterator = BreakIterator.getWordInstance();
breakIterator.setText(text);
int lastIndex = breakIterator.first();
while (BreakIterator.DONE != lastIndex) {
int firstIndex = lastIndex;
lastIndex = breakIterator.next();
if (lastIndex != BreakIterator.DONE && Character.isLetterOrDigit(text.charAt(firstIndex))) {
words.add(text.substring(firstIndex, lastIndex));
}
}
return words;
}
public static void main(String[] args) {
String text = "I like cats";
List<String> words = getWords(text);
for (String word : words) {
System.out.println(word);
}
}
Output :
I
like
cats
What about something like this? It has O(N) time complexity.
Just use a StringBuilder to build the result as you iterate through your string, and add "\n" the first time you find a space after a word:
String word = "I like cats";
StringBuilder sb = new StringBuilder();
boolean newLine = true;
for(int i = 0; i < word.length(); i++) {
if (word.charAt(i) == ' ') {
if (newLine) {
sb.append("\n");
newLine = false;
}
} else {
newLine = true;
sb.append(word.charAt(i));
}
}
String result = sb.toString();
EDIT: Fixed the problem mentioned in the comments (new line on multiple spaces)
Sorry, I did not notice that you cannot use replaceAll().
This is my other solution:
String s = "I like cats";
Pattern p = Pattern.compile("([\\S])+");
Matcher m = p.matcher(s);
while (m.find( )) {
System.out.println(m.group());
}
Old solution:
String s = "I like cats";
System.out.println(s.replaceAll("( )+","\n"));
You have almost done the whole job. Just make a small addition, and your code will work as you wish:
for (int i = 0; i < length - 1;) {
j = text.indexOf(" ", i);
if (i == j) { //if next space after space, skip it
i = j + 1;
continue;
}
if (j == -1) {
j = text.length();
}
System.out.print("\n" + text.substring(i, j));
i = j + 1;
}
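The snippet above relies on variables declared elsewhere in the question (text, length, i, j). As a self-contained sketch of the same indexOf()/substring() idea, assuming words are separated only by the space character: WordsNoRegex and words() are hypothetical names, and the loop runs while i < text.length() so that a one-letter word at the very end is not skipped.

```java
import java.util.ArrayList;
import java.util.List;

class WordsNoRegex {
    // Collects words using only indexOf() and substring(); runs of spaces are skipped.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int j = text.indexOf(' ', i);
            if (j == -1) {
                j = text.length();                 // no more spaces: the word runs to the end
            }
            if (j > i) {
                result.add(text.substring(i, j));  // non-empty slice between two spaces
            }
            i = j + 1;                             // continue after the space
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("I   like  cats ")) {
            System.out.println(word);
        }
    }
}
```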
I am writing a spell checker that takes a text file as input and outputs the file with spelling corrected.
The program should preserve formatting and punctuation.
I want to split the input text into a list of string tokens such that each token is either 1 or more: word, punctuation, whitespace, or digit characters.
For example:
Input:
words.txt:
asdf don't ]'.'..;'' as12....asdf.
asdf
Input as list:
["asdf" , " " , "don't" , " " , "]'.'..;''" , " " , "as" , "12" ,
"...." , "asdf" , "." , "\n" , "asdf"]
Words like won't and i'll should be treated as a single token.
Having the data in this format would allow me to process the tokens like so:
String output = "";
for(String token : tokens) {
if(isWord(token)) {
if(!inDictionary(token)) {
token = correctSpelling(token);
}
}
output += token;
}
So my main question is: how can I split a string of text into a list of substrings as described above? Thank you.
The main difficulty here would be to find the regex that matches what you consider to be a "word". For my example I consider ' to be part of a word if it's preceded by a letter or if the following character is a letter:
public static void main(String[] args) {
String in = "asdf don't ]'.'..;'' as12....asdf.\nasdf";
//The pattern:
Pattern p = Pattern.compile("[\\p{Alpha}][\\p{Alpha}']*|'[\\p{Alpha}]+");
Matcher m = p.matcher(in);
//If you want to collect the words
List<String> words = new ArrayList<String>();
StringBuilder result = new StringBuilder();
//Now find something from the start
int pos = 0;
while(m.find(pos)) {
//Add everything from starting position to beginning of word
result.append(in.substring(pos, m.start()));
//Handle dictionary logic
String token = m.group();
words.add(token); //not used actually
if(!inDictionary(token)) {
token = correctSpelling(token);
}
//Add to result
result.append(token);
//Repeat from end position
pos = m.end();
}
//Append remainder of input
result.append(in.substring(pos));
System.out.println("Result: " + result.toString());
}
Because I like solving puzzles, I tried the following and I think it works fine:
public class MyTokenizer {
private final String str;
private int pos = 0;
public MyTokenizer(String str) {
this.str = str;
}
public boolean hasNext() {
return pos < str.length();
}
public String next() {
int type = getType(str.charAt(pos));
StringBuilder sb = new StringBuilder();
while(hasNext() && (str.charAt(pos) == '\'' || type == getType(str.charAt(pos)))) {
sb.append(str.charAt(pos));
pos++;
}
return sb.toString();
}
private int getType(char c) {
String sc = Character.toString(c);
if (sc.matches("\\d")) {
return 0;
}
else if (sc.matches("\\w")) {
return 1;
}
else if (sc.matches("\\s")) {
return 2;
}
else if (sc.matches("\\p{Punct}")) {
return 3;
}
else {
return 4;
}
}
public static void main(String... args) {
MyTokenizer mt = new MyTokenizer("asdf don't ]'.'..;'' as12....asdf.\nasdf");
while(mt.hasNext()) {
System.out.println(mt.next());
}
}
}
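For comparison, the token list asked for in the question can also be produced with a single regex alternation. This is only a sketch: it assumes a "word" is a run of ASCII letters that may contain apostrophes between letters, and SpellTokens and tokenize() are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpellTokens {
    // Alternatives, tried in order: word (with inner apostrophes),
    // digits, whitespace, then any run of other characters.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z]+(?:'[A-Za-z]+)*|\\d+|\\s+|[^A-Za-z\\d\\s]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String in = "asdf don't ]'.'..;'' as12....asdf.\nasdf";
        for (String token : tokenize(in)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Every character belongs to one of the four classes, so concatenating the tokens reproduces the input, which is what preserves formatting and punctuation.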
I'm trying to build a program with BufferedReader that reads a file and keeps track of vowels, words, and can calculate avg # of words per line. I have the skeleton in place to read the file, but I really don't know where to take it from here. Any help would be appreciated. Thanks.
import java.io.*;
public class JavaReader
{
public static void main(String[] args) throws IOException
{
String line;
BufferedReader in;
in = new BufferedReader(new FileReader("message.txt"));
line = in.readLine();
while(line != null)
{
System.out.println(line);
line = in.readLine();
}
}
}
Here's what I got. The word counting is questionable, but works for an example that I will give. Changes can be made (I accept criticism).
import java.io.*;
public class JavaReader
{
public static void main(String[] args) throws IOException
{
BufferedReader in = new BufferedReader(new FileReader("message.txt"));
String line = in.readLine();
// for keeping track of the file content
StringBuffer fileText = new StringBuffer();
while(line != null) {
fileText.append(line + "\n");
line = in.readLine();
}
// put file content to a string, display it for a test
String fileContent = fileText.toString();
System.out.println(fileContent + "--------------------------------");
int vowelCount = 0, lineCount = 0;
// for every char in the file
for (char ch : fileContent.toCharArray())
{
// if this char is a vowel
if ("aeiou".indexOf(ch) > -1) {
vowelCount++;
}
// if this char is a new line
if (ch == '\n') {
lineCount++;
}
}
double wordCount = checkWordCount(fileContent);
double avgWordCountPerLine = wordCount / lineCount;
System.out.println("Vowel count: " + vowelCount);
System.out.println("Line count: " + lineCount);
System.out.println("Word count: " + wordCount);
System.out.print("Average word count per line: "+avgWordCountPerLine);
}
public static int checkWordCount(String fileContent) {
// split words by punctuation and whitespace
String words[] = fileContent.split("[\\n .,;:&?]"); // array of words
String punctutations = ".,:;";
boolean isPunctuation = false;
int wordCount = 0;
// for every word in the word array
for (String word : words) {
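// only check if it's a word if the word isn't whitespace
if (!word.trim().isEmpty()) {
// reset the flag for each word so one punctuation token does not suppress later words
isPunctuation = false;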
// for every punctuation
for (char punctuation : punctutations.toCharArray()) {
// if the trimmed word is just a punctuation
if (word.trim().equals(String.valueOf(punctuation)))
{
isPunctuation = true;
}
}
// only add one to wordCount if the word wasn't punctuation
if (!isPunctuation) {
wordCount++;
}
}
}
return wordCount;
}
}
Sample input/output:
File:
This is a test. How do you do?
This is still a test.Let's go,,count.
Output:
This is a test. How do you do?
This is still a test.Let's go,,count.
--------------------------------
Vowel count: 18
Line count: 4
Word count: 16
Average word count per line: 4.0
You can use a Scanner to pass over the line and retrieve every token of the string line.
line = line.replaceAll("[^a-zA-Z\\s]", ""); //remove all punctuation but keep whitespace so the words stay separate
line = line.toLowerCase(); //make line lower case
Scanner scan = new Scanner(line);
String word = scan.next();
Then you could loop through each token to calculate the vowels in each word.
for(int i = 0; i < word.length(); i++){
//get char
char c = word.charAt(i);
//check if the char is a vowel here
if("aeiou".indexOf(c) > -1){
//c is vowel
}
}
All you need to do is set a couple of counter ints to keep track of these and you're laughing.
Ahh, if you want to make sure that there are no non-words such as " - " counting as a word, the easiest way would probably be to strip everything except letters and whitespace out of the text.
I also added it above.
line = line.replaceAll("[^a-zA-Z\\s]", "");
line = line.toLowerCase();
Oh, and since you are new to Java, don't forget the import:
import java.util.Scanner;
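Putting the pieces of this thread together, here is a minimal sketch that reads line by line with BufferedReader and tracks vowels, words and lines. It assumes a word is any run of non-whitespace characters (so "test." counts as one word) and that only a, e, i, o, u are vowels. TextStats and its members are hypothetical names, and a StringReader stands in for the FileReader so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TextStats {
    int vowels;
    int words;
    int lines;

    // Reads the source line by line and accumulates the three counters.
    static TextStats read(Reader source) {
        TextStats stats = new TextStats();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                stats.lines++;
                for (char ch : line.toLowerCase().toCharArray()) {
                    if ("aeiou".indexOf(ch) >= 0) {
                        stats.vowels++;
                    }
                }
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {          // a blank line contributes no words
                    stats.words += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    double averageWordsPerLine() {
        return lines == 0 ? 0.0 : (double) words / lines;
    }

    public static void main(String[] args) {
        // Swap the StringReader for new FileReader("message.txt") to read a real file.
        TextStats s = read(new StringReader("This is a test. How do you do?\nThis is still a test."));
        System.out.println("Vowels: " + s.vowels + ", words: " + s.words
                + ", lines: " + s.lines + ", average: " + s.averageWordsPerLine());
    }
}
```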
I am trying to write a word count program. It is partially working and gives the correct result, but the moment the string contains a leading space or more than one space in a row, the word count is wrong, because I am counting words on the basis of the spaces. Is there a way to get the correct result no matter how many spaces there are? My code is below.
public class CountWords
{
public static void main (String[] args)
{
System.out.println("Simple Java Word Count Program");
String str1 = "Today is Holdiay Day";
int wordCount = 1;
for (int i = 0; i < str1.length(); i++)
{
if (str1.charAt(i) == ' ')
{
wordCount++;
}
}
System.out.println("Word count is = " + wordCount);
}
}
public static void main (String[] args) {
System.out.println("Simple Java Word Count Program");
String str1 = "Today is Holdiay Day";
String[] wordArray = str1.trim().split("\\s+");
int wordCount = wordArray.length;
System.out.println("Word count is = " + wordCount);
}
The idea is to split the string into words on any whitespace character occurring any number of times.
The split method of the String class returns an array containing the words as its elements.
The length of that array is the number of words in the string.
Two routes for this. One way would be to use regular expressions. You can find out more about regular expressions here. A good regular expression for this would be something like "\w+" Then count the number of matches.
If you don't want to go that route, you could have a boolean flag that remembers if the last character you've seen is a space. If it is, don't count it. So the center of the loop looks like this:
boolean prevCharWasSpace=true;
for (int i = 0; i < str1.length(); i++)
{
if (str1.charAt(i) == ' ') {
prevCharWasSpace=true;
}
else{
if(prevCharWasSpace) wordCount++;
prevCharWasSpace = false;
}
}
Update
Using the split technique is equivalent to what's happening here, but it doesn't really explain why it works. If we go back to our CS theory, we want to construct a Finite State Automaton (FSA) that counts words. That FSA may appear as:
If you look at the code, it implements this FSA exactly. The prevCharWasSpace flag keeps track of which state we're in, and str1.charAt(i) decides which edge (or arrow) is being followed. If you use the split method, a regular expression equivalent of this FSA is constructed internally and is used to split the string into an array.
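To make the two states explicit, the same loop can be written with a named state instead of a boolean. This is only a sketch of the idea described above; WordFsa, State and countWords() are illustrative names, and only the space character is treated as a separator.

```java
class WordFsa {
    // The two states of the automaton: between words, or inside a word.
    enum State { IN_SPACE, IN_WORD }

    static int countWords(String s) {
        State state = State.IN_SPACE;    // start as if a space had just been seen
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                state = State.IN_SPACE;
            } else {
                if (state == State.IN_SPACE) {
                    count++;             // edge IN_SPACE -> IN_WORD: a new word starts
                }
                state = State.IN_WORD;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  Today is   Holdiay Day "));
    }
}
```

A word is counted only on the transition from IN_SPACE to IN_WORD, so leading, trailing and repeated spaces never add to the count.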
Java does have StringTokenizer API and can be used for this purpose as below.
String test = "This is a test app";
int countOfTokens = new StringTokenizer(test).countTokens();
System.out.println(countOfTokens);
OR
in a single line as below
System.out.println(new StringTokenizer("This is a test app").countTokens());
StringTokenizer supports multiple spaces in the input string, counting only the words trimming unnecessary spaces.
System.out.println(new StringTokenizer("This   is  a test    app").countTokens());
Above line also prints 5
You can use String.split instead of charAt and you will get good results.
If you want to use charAt for some reason, then try trimming the string before you count the words; that way you won't have the extra space and an extra word.
My implementation, not using StringTokenizer:
// assumes: import static java.util.stream.Collectors.groupingBy;
//          import static java.util.stream.Collectors.counting;
Map<String, Long> getWordCounts(List<String> sentences, int maxLength) {
return sentences
.parallelStream()
.map(sentence -> sentence.replace(".", ""))
.map(string -> string.split(" "))
.flatMap(Arrays::stream)
.map(s -> s.toLowerCase())
.filter(word -> word.length() >= 2 && word.length() <= maxLength)
.collect(groupingBy(Function.identity(), counting()));
}
Then, you could call it like this, as an example:
getWordCounts(list, 9).entrySet().stream()
.filter(pair -> pair.getValue() <= 3 && pair.getValue() >= 1)
.findFirst()
.orElseThrow(() ->
new RuntimeException("No matching word found.")).getKey();
Perhaps flipping the method to return Map<Long, String> might be better.
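For anyone who wants to try the stream approach on its own, here is a self-contained sketch with the imports spelled out. It splits on runs of whitespace rather than a single space, which is a deliberate change from the snippet above, and WordFrequency and wordCounts() are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency {
    // Counts how often each lowercase word of length 2..maxLength occurs.
    static Map<String, Long> wordCounts(List<String> sentences, int maxLength) {
        return sentences.stream()
                .map(sentence -> sentence.replace(".", ""))
                .flatMap(sentence -> Arrays.stream(sentence.trim().split("\\s+")))
                .map(String::toLowerCase)
                .filter(word -> word.length() >= 2 && word.length() <= maxLength)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("India is my country.", "I love   India");
        System.out.println(wordCounts(sentences, 9));
    }
}
```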
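Use the split(regex) method. The result is an array of the strings that were separated by the regex. Splitting on "\\s+" after trim() keeps repeated spaces from producing empty entries.
String s = "Today is Holdiay Day";
System.out.println("Word count is = " + s.trim().split("\\s+").length);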
You need to read the file line by line, reduce each run of multiple whitespace characters in the line to a single one, and then count the words. Following is a sample:
public static void main(String... args) throws IOException {
FileInputStream fstream = new FileInputStream("c:\\test.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
int wordcount = 0;
while ((strLine = br.readLine()) != null) {
strLine = strLine.replaceAll("[\t\b]", "");
strLine = strLine.replaceAll(" {2,}", " ");
if (!strLine.isEmpty()){
wordcount = wordcount + strLine.split(" ").length;
}
}
System.out.println(wordcount);
in.close();
}
public class wordCOunt
{
public static void main(String ar[])
{
System.out.println("Simple Java Word Count Program");
String str1 = "Today is Holdiay Day";
int wordCount = 1;
for (int i = 0; i < str1.length(); i++)
{
if (str1.charAt(i) == ' ' && i + 1 < str1.length() && str1.charAt(i+1)!=' ')
{
wordCount++;
}
}
System.out.println("Word count is = " + wordCount);
}
}
public class wordCount
{
public static void main(String ar[]) throws Exception
{
System.out.println("Simple Java Word Count Program");
int wordCount = 1,count=1;
BufferedReader br = new BufferedReader(new FileReader("C:/file.txt"));
String str2 = "", str1 = "";
while ((str1 = br.readLine()) != null) {
str2 += str1 + " "; // keep a separator so words on adjacent lines do not merge
}
for (int i = 0; i < str2.length(); i++)
{
if (str2.charAt(i) == ' ' && i + 1 < str2.length() && str2.charAt(i+1)!=' ')
{
wordCount++;
}
}
System.out.println("Word count is = " +(wordCount));
}
}
You should make your code more generic by considering other word separators as well, such as "," and ";".
public class WordCounter{
public int count(String input){
int count =0;
boolean incrementCounter = false;
for (int i=0; i<input.length(); i++){
if (isValidWordCharacter(input.charAt(i))){
incrementCounter = true;
}else if (incrementCounter){
count++;
incrementCounter = false;
}
}
if (incrementCounter) count ++;//if string ends with a valid word
return count;
}
private boolean isValidWordCharacter(char c){
//any logic that will help you identify a valid character in a word
// you could also have a method which identifies word separators instead of this
return (c >= 'A' && c<='Z') || (c >= 'a' && c<='z');
}
}
import com.google.common.base.Splitter;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import java.util.Set;
String str="Simple Java Word Count count Count Program";
// omitEmptyStrings() drops the empty pieces produced by repeated spaces
Iterable<String> words = Splitter.on(" ").trimResults().omitEmptyStrings().split(str);
//google word counter
Multiset<String> wordsMultiset = HashMultiset.create();
for (String string : words) {
wordsMultiset.add(string.toLowerCase());
}
Set<String> result = wordsMultiset.elementSet();
for (String string : result) {
System.out.println(string+" X "+wordsMultiset.count(string));
}
public static int CountWords(String str){
if(str.length() == 0)
return 0;
int count =0;
for(int i=0;i< str.length();i++){
if(str.charAt(i) == ' ')
continue;
if(i > 0 && str.charAt(i-1) == ' '){
count++;
}
else if(i==0 && str.charAt(i) != ' '){
count++;
}
}
return count;
}
public class CountWords
{
public static void main (String[] args)
{
System.out.println("Simple Java Word Count Program");
String str1 = "Today is Holdiay Day";
int wordCount = 1;
for (int i = 0; i < str1.length(); i++)
{
if (str1.charAt(i) == ' ' && i + 1 < str1.length() && str1.charAt(i+1)!=' ')
{
wordCount++;
}
}
System.out.println("Word count is = " + wordCount);
}
}
This gives the correct result because a run of two or more spaces increases wordCount only once. Enjoy.
Try this:
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class wordcount {
public static void main(String[] args) {
String s = "India is my country. I love India";
List<String> qw = new ArrayList<String>();
Map<String, Integer> mmm = new HashMap<String, Integer>();
for (String sp : s.split(" ")) {
qw.add(sp);
}
for (String num : qw) {
mmm.put(num, Collections.frequency(qw, num));
}
System.out.println(mmm);
}
}
To count the total number of words, or the number of distinct words:
public static void main(String[] args) {
// TODO Auto-generated method stub
String test = "I am trying to make make make";
Pattern p = Pattern.compile("\\w+");
Matcher m = p.matcher(test);
HashSet<String> hs = new HashSet<>();
int i=0;
while (m.find()) {
i++;
hs.add(m.group());
}
System.out.println("Total words Count==" + i);
System.out.println("Count without Repetition ==" + hs.size());
}
}
Output :
Total words Count==7
Count without Repetition ==5
Not sure if there is a drawback, but this worked for me...
Scanner input = new Scanner(System.in);
String userInput = input.nextLine();
String trimmed = userInput.trim();
int count = 1;
for (int i = 0; i < trimmed.length(); i++) {
if ((trimmed.charAt(i) == ' ') && (trimmed.charAt(i-1) != ' ')) {
count++;
}
}
You can use this code. It may help you:
public static void main (String[] args)
{
System.out.println("Simple Java Word Count Program");
String str1 = "Today is Holdiay Day";
int count=0;
String[] wCount=str1.split(" ");
for(int i=0;i<wCount.length;i++){
if(!wCount[i].isEmpty())
{
count++;
}
}
System.out.println(count);
}
String data = "This world is mine";
System.out.print(data.split("\\s+").length);
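One caveat with the one-liner: split("\\s+") keeps an empty first element when the string starts with whitespace (" a".split("\\s+").length is 2), and an empty string still yields an array of length 1. A small sketch that guards against both cases, with SplitCount and countWords() as illustrative names:

```java
class SplitCount {
    static int countWords(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return 0;                          // "".split(...) would yield one empty element
        }
        return trimmed.split("\\s+").length;   // runs of whitespace count as one separator
    }

    public static void main(String[] args) {
        System.out.println(countWords("  This world   is mine "));
    }
}
```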
This could be as simple as using split and count variable.
public class SplitString {
public static void main(String[] args) {
int count=0;
String s1="Hi i love to code";
for(String s:s1.split(" "))
{
count++;
}
System.out.println(count);
}
}
public class TotalWordsInSentence {
public static void main(String[] args) {
String str = "This is sample sentence";
int NoOfWOrds = 1;
for (int i = 0; i<str.length();i++){
if ((str.charAt(i) == ' ') && (i!=0) && (str.charAt(i-1) != ' ')){
NoOfWOrds++;
}
}
System.out.println("Number of Words in Sentence: " + NoOfWOrds);
}
}
In this code, there won't be any problem with repeated white-space.
It is just a simple for loop. Hope this helps...
To count only specific kinds of words, such as John, John99, John_John and John's, adjust the regex to your needs and count only the matches.
public static int wordCount(String content) {
int count = 0;
String regex = "([a-zA-Z_’][0-9]*)+[\\s]*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);
while(matcher.find()) {
count++;
System.out.println(matcher.group().trim()); //If want to display the matched words
}
return count;
}
class HelloWorld {
public static void main(String[] args) {
String str = "User is in for an interview";
int counter=0;
String arrStr[] = str.split(" ");
for (int i = 0; i< arrStr.length; i++){
String charStr = arrStr[i];
for(int j=0; j<charStr.length(); j++) {
if(charStr.charAt(j) =='i') {
counter++;
}
}
}
System.out.println("i " + counter);
}
}
public class CountWords {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter the string :");
String str = sc.nextLine();
System.out.println("length of string is :"+str.length());
int worldCount = 1;
for(int i=0; i<str.length(); i++){
if(str.charAt(i) == ' '){
worldCount++;
}
}
System.out.println(worldCount);
}
}
The full working program is:
public class main {
public static void main(String[] args) {
logicCounter counter1 = new logicCounter();
counter1.counter("I am trying to make a program on word count which I have partially made and it is giving the correct result but the moment I enter space or more than one space in the string, the result of word count show wrong results because I am counting words on the basis of spaces used. I need help if there is a solution in a way that no matter how many spaces are I still get the correct result. I am mentioning the code below.");
}
}
public class logicCounter {
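public void counter (String str) {
String str1 = str;
boolean space = true; // true while the previous character was a space
int words = 0;
for (int i = 0; i < str1.length(); i++) {
if (str1.charAt(i) == ' ') {
space = true;
} else {
// the first non-space character after a space starts a new word
if (space) {
words++;
}
space = false;
}
}
System.out.println("there are " + words + " words");
}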
}