java regex: capitalize words with certain number of characters

I am trying to capitalize the words in a string with more than 5 characters.
I was able to retrieve the number of words that are longer than 5 characters using .length, and I could exclude the words that were longer than 5 characters, but I couldn't capitalize them.
Ex. input: "i love eating pie"
Ex. output: "i love Eating pie"
Here's my code:
public static void main(String[] args) {
String sentence = "";
Scanner input = new Scanner(System.in);
System.out.println("Enter a sentence: ");
sentence = input.nextLine();
String[] myString = sentence.split("\\s\\w{6,}".toUpperCase());
for (String myStrings : myString) {
System.out.println(sentence);
System.out.println(myStrings);
}

String sentence = "";
StringBuilder sb = new StringBuilder(sentence.length());
Scanner input = new Scanner(System.in);
System.out.println("Enter a sentence: ");
sentence = input.nextLine();
/*
* \\s (match a whitespace character)
* (?<g1> (named group with name g1)
* \\w{6,}) (match a word of 6 or more characters) (end of g1)
* | (or)
* (?<g2> (named group with name g2)
* \\S+) (match one or more non-whitespace characters) (end of g2)
*/
Pattern pattern = Pattern.compile("\\s(?<g1>\\w{6,})|(?<g2>\\S+)");
Matcher matcher = pattern.matcher(sentence);
//check if the matcher found a match
while (matcher.find())
{
//get value from g1 group (null if not found)
String g1 = matcher.group("g1");
//get value from g2 group (null if not found)
String g2 = matcher.group("g2");
//if g1 is not null and is not an empty string
if (g1 != null && g1.length() > 0)
{
//get the first character of this word and uppercase it, then append it to the StringBuilder
sb.append(Character.toUpperCase(g1.charAt(0)));
//sanity check to stop us from getting IndexOutOfBoundsException
//check if g1 length is more than 1 and append the rest of the word to the StringBuilder
if(g1.length() > 1) sb.append(g1.substring(1, g1.length()));
//append a space
sb.append(" ");
}
//we only need to check if g2 is not null here
if (g2 != null)
{
//g2 is any other token (for example a word of 5 or fewer characters), so append it unchanged
sb.append(g2);
//append a space
sb.append(" ");
}
}
System.out.println("Original Sentence: " + sentence);
System.out.println("Modified Sentence: " + sb.toString());

Split the input sentence using a space as the delimiter and apply the intiCap method to every word longer than 5 characters:
PS: System.out.print can be replaced with a StringBuilder.
String delim = " ";
String[] myString = sentence.split(delim);
for (int i = 0; i < myString.length; i++) {
if (i != 0) System.out.print(delim);
if (myString[i].length() > 5)
System.out.print(intiCap(myString[i]));
else
System.out.print(myString[i]);
}
private static String intiCap(String string) {
return Character.toUpperCase(string.charAt(0)) + string.substring(1);
}
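The PS above suggests collecting the result in a StringBuilder instead of printing piece by piece. A minimal sketch of that variant follows; the class name CapitalizeLongWords and the method name capitalizeLongWords are made up for illustration, and only intiCap comes from the answer.

```java
public class CapitalizeLongWords {

    // Upper-cases the first character of a non-empty word (helper from the answer).
    private static String intiCap(String word) {
        return Character.toUpperCase(word.charAt(0)) + word.substring(1);
    }

    // Hypothetical helper: capitalizes every space-separated word longer than 5 characters.
    public static String capitalizeLongWords(String sentence) {
        String delim = " ";
        StringBuilder sb = new StringBuilder(sentence.length());
        String[] words = sentence.split(delim);
        for (int i = 0; i < words.length; i++) {
            if (i != 0) sb.append(delim);   // put the delimiter back between words
            sb.append(words[i].length() > 5 ? intiCap(words[i]) : words[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "i love Eating pie"
        System.out.println(capitalizeLongWords("i love eating pie"));
    }
}
```

Returning the string instead of printing it also makes the logic easy to check in a unit test.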

You can use the following (short and sweet :P):
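Pattern p = Pattern.compile("(?=\\b\\w{6,})([a-z])\\w+");
Matcher m = p.matcher(sentence);
StringBuffer s = new StringBuffer();
while (m.find()){
// replace each word of 6 or more characters with its capitalized form
m.appendReplacement(s, m.group(1).toUpperCase() + m.group(0).substring(1));
}
// copy the text after the last match; without this the rest of the sentence is dropped
m.appendTail(s);
System.out.println(s.toString());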

Related

java regex parse

Thanks for checking out my question.
Here the user enters the string in the format: "xD xS xP xH". The program takes the string, splits it on spaces, then uses a regex to parse the string. There is an issue with my "final String regex" and I am not sure where.
final String regex = "([0-9]+)[D|d]| ([0-9]+)[S|s]| ([0-9]+)[P|p]| ([0-9]+)[H|h]";
Lastly, the loop prints out only the value for D so I suspect it reaches an error moving to match S or s.
public class parseStack
{
public parseStack()
{
System.out.print('\u000c');
String CurrencyFormat = "xD xS xP xH";
System.out.println("Please enter currency in the following format: \""+CurrencyFormat+"\" where x is any integer");
Scanner scan = new Scanner(System.in);
String currencyIn = scan.nextLine();
String currencyFinal = currencyIn.toUpperCase();
System.out.println("This is the currency you entered: "+currencyFinal);
String[] tokens = currencyFinal.split(" ");
final String input = tokens[0];
final String regex = "([0-9]+)[D|d]| ([0-9]+)[S|s]| ([0-9]+)[P|p]| ([0-9]+)[H|h]";
if (input.matches(regex) == false) {
throw new IllegalArgumentException("Input is malformed.");
}
long[] values = Arrays.stream(input.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.mapToLong(Long::parseLong)
.toArray();
for (int i=0; i<values.length; i++)
{
System.out.println("value of i: "+i+ " |" +values[i]+ "|");
}
//pause to print
System.out.println("Please press enter to continue . . . ");
Scanner itScan = new Scanner(System.in);
String nextIt = itScan.nextLine();
}
}

Your regular expression should be [\d]+[DdSsPpHh].
The problem you are having is you split the string into chunks, then you match chunks with a RegEx that matches the original string that you have split.
HOWEVER, this answer only addresses one problem in your code. Your routine doesn't seem to meet your expectation, and your expectation is not clear at all.
EDIT
Added the multidigit requirement.

Your regex can be simplified somewhat.
"(?i)(\\d+d) (\\d+s) (\\d+p) (\\d+h)"
will do a case-insensitive match against multiple digits ( \\d+ , with the backslash doubled because this is a Java string literal).
This can be further simplified into
"(?i)(\\d+[dsph])"
which, applied repeatedly with Matcher.find(), will iteratively match the various groups in your currency string.
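To show what matching iteratively could look like, here is a minimal sketch that walks over the input with Matcher.find(). It is a variation on the pattern above, not the answer's own code: the number and the unit letter are captured in two separate groups so the amount can be parsed directly. The names CurrencyParser and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencyParser {

    // One or more digits followed by one unit letter, case-insensitive.
    private static final Pattern TOKEN = Pattern.compile("(?i)(\\d+)([dsph])");

    // Hypothetical helper: returns entries such as "12S" in the order they appear.
    public static List<String> parse(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {                          // each call advances to the next token
            long amount = Long.parseLong(m.group(1));
            String unit = m.group(2).toUpperCase();
            result.add(amount + unit);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [3D, 12S, 7P, 40H]
        System.out.println(parse("3d 12S 7p 40H"));
    }
}
```

Because find() scans the whole input, no prior split on spaces is needed. Note that this sketch does not reject malformed input; it simply skips anything that does not look like a token.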

First of all, your regex looks a bit too complex. Your input format is "xD xS xP xH", and you are also converting the input to uppercase with currencyIn = currencyIn.toUpperCase(); but this isn't the problem.
The problem is
String[] tokens = currencyIn.split(" ");
final String input = tokens[0];
You are splitting the input and using only the first part, which would be "xD".
The fixed code would look like:
String currencyIn = scan.nextLine();
currencyIn = currencyIn.toUpperCase();
System.out.println("This is the currency you entered: "+currencyIn);
final String regex = "([0-9]+)D ([0-9]+)S ([0-9]+)P ([0-9]+)H";
if (!currencyIn.matches(regex)) {
throw new IllegalArgumentException("Input is malformed.");
}
long[] values = Arrays.stream(currencyIn.replaceAll(regex, "$1 $2 $3 $4").split(" "))
.mapToLong(Long::parseLong)
.toArray();
for (int i=0; i<values.length; i++) {
System.out.println("value of i: "+i+ " |" +values[i]+ "|");
}

How can I move the punctuation from the end of a string to the beginning?

I am attempting to write a program that reverses a string's order, even the punctuation. But when my backwards string prints, the punctuation mark at the end of the last word stays at the end of the word instead of being treated as an individual character.
How can I split the end punctuation mark from the last word so I can move it around?
For example:
When I type in : Hello my name is jason!
I want: !jason is name my Hello
instead I get: jason! is name my Hello
import java.util.*;
class Ideone
{
public static void main(String[] args) {
Scanner userInput = new Scanner(System.in);
System.out.print("Enter a sentence: ");
String input = userInput.nextLine();
String[] sentence= input.split(" ");
String backwards = "";
for (int i = sentence.length - 1; i >= 0; i--) {
backwards += sentence[i] + " ";
}
System.out.print(input + "\n");
System.out.print(backwards);
}
}

Manually rearranging Strings tends to become complicated in no time. It's usually better (if possible) to code what you want to do, not how you want to do it.
String input = "Hello my name is jason! Nice to meet you. What's your name?";
// this is *what* you want to do, part 1:
// split the input at each ' ', '.', '?' and '!', keep delimiter tokens
StringTokenizer st = new StringTokenizer(input, " .?!", true);
StringBuilder sb = new StringBuilder();
while(st.hasMoreTokens()) {
String token = st.nextToken();
// *what* you want to do, part 2:
// add each token to the start of the string
sb.insert(0, token);
}
String backwards = sb.toString();
System.out.print(input + "\n");
System.out.print(backwards);
Output:
Hello my name is jason! Nice to meet you. What's your name?
?name your What's .you meet to Nice !jason is name my Hello
This will be a lot easier to understand for the next person working on that piece of code, or your future self.
This assumes that you want to move every punctuation char. If you only want the one at the end of the input string, you'd have to cut it off the input, do the reordering, and finally place it at the start of the string:
String punctuation = "";
String input = "Hello my name is jason! Nice to meet you. What's your name?";
System.out.print(input + "\n");
if(input.substring(input.length() -1).matches("[.!?]")) {
punctuation = input.substring(input.length() -1);
input = input.substring(0, input.length() -1);
}
StringTokenizer st = new StringTokenizer(input, " ", true);
StringBuilder sb = new StringBuilder();
while(st.hasMoreTokens()) {
sb.insert(0, st.nextToken());
}
sb.insert(0, punctuation);
System.out.print(sb);
Output:
Hello my name is jason! Nice to meet you. What's your name?
?name your What's you. meet to Nice jason! is name my Hello

As in the other answers, you need to separate out the punctuation first, then reorder the words, and finally place the punctuation at the beginning.
You could take advantage of String.join() and Collections.reverse(), String.endsWith() for a simpler answer...
String input = "Hello my name is jason!";
String punctuation = "";
if (input.endsWith("?") || input.endsWith("!")) {
punctuation = input.substring(input.length() - 1, input.length());
input = input.substring(0, input.length() - 1);
}
List<String> words = Arrays.asList(input.split(" "));
Collections.reverse(words);
String reordered = punctuation + String.join(" ", words);
System.out.println(reordered);

The below code should work for you
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ReplaceSample {
public static void main(String[] args) {
String originalString = "TestStr?";
String updatedString = "";
String regex = "end\\p{Punct}+|\\p{Punct}+$";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(originalString);
while (matcher.find()) {
int start = matcher.start();
updatedString = matcher.group() + originalString.substring(0, start);
}
System.out.println("Original -->" + originalString + "\nReplaced -->" + updatedString);
}
}

You need to follow the below steps:
(1) Check for the ! character in the input
(2) If input contains ! then prefix it to the empty output string variable
(3) If input does not contain ! then create empty output string variable
(4) Split the input string and iterate in reverse order (you are already doing this)
You can refer to the code below:
public static void main(String[] args) {
Scanner userInput = new Scanner(System.in);
System.out.print("Enter a sentence: ");
String originalInput = userInput.nextLine();
String input = originalInput;
//Define your punctuation chars into an array
char[] punctuationChars = {'!', '?' , '.'};
String backwards = "";
//Remove the trailing punctuation mark from the input and put it at the start of the output
for(int i=0;i<punctuationChars.length;i++) {
if(input.charAt(input.length()-1) == punctuationChars[i]) {
input = input.substring(0, input.length()-1);
backwards = punctuationChars[i]+"";
break;
}
}
String[] sentence= input.split(" ");
for (int i = sentence.length - 1; i >= 0; i--) {
backwards += sentence[i] + " ";
}
System.out.print(originalInput + "\n");
System.out.print(input + "\n");
System.out.print(backwards);
}

Don't split by spaces; split by word boundaries. Then you don't need to care about punctuation or even putting spaces back, because you just reverse them too!
And it's only 1 line:
Arrays.stream(input.split("\\b"))
.reduce((a, b) -> b + a)
.ifPresent(System.out::println);

counting character occurrence in input file

My program prompts the user to enter a specific letter and filename and then prints out the number of occurrences of the parameter letter in the input file.
Code I wrote:
public class CharCount {
public static void main(String[] args) {
Scanner inp= new Scanner(System.in);
String str;
char ch;
int count=0;
System.out.println("Enter a letter: ");
str=inp.nextLine();
while(str.length()>0)
{
ch=str.charAt(0);
int i=0;
while (i < str.length() && str.charAt(i) == ch)
{
count++;
i++;
}
str = str.substring(count);
System.out.println(ch + " appears " + count + " in" );
}
}
}
I get this output
Enter a letter:
e appears 1 in
But I should be getting this output
Enter a letter: Enter a filename: e appears 58 times in input.txt
Any help/advice would be great :)

With Java 8, you can rely on streams to do the work for you.
String sampleText = "Lorem ipsum";
Character letter = 'e';
long count = sampleText.chars().filter(c -> c == letter).count();
System.out.println(count);

Here is something to get you started:
// Ask letter:
System.out.println("Enter a letter: ");
String str = inp.nextLine();
while (str.isEmpty()) {
System.out.println("Enter a letter:");
str = inp.nextLine();
}
char letter = str.charAt(0);
// Ask file name:
System.out.println("Enter file name:");
String fileName = inp.nextLine();
while (fileName.isEmpty()) {
System.out.println("Enter file name:");
fileName = inp.nextLine();
}
// Process file:
//Scanner textInp = new Scanner(new File(fileName)); // Either old style
Scanner textInp = new Scanner(Paths.get(fileName)); // Or new style
while (textInp.hasNextLine()) {
String line = textInp.nextLine();
...
}
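The processing step is left open above. One possible way to do the counting is sketched below, under some assumptions: it reads the file with Files.readAllLines rather than the Scanner shown in the answer, it counts case-sensitively, and the names LetterCounter, countInLine and countInFile are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class LetterCounter {

    // Counts how often 'letter' occurs in a single line (case-sensitive).
    public static int countInLine(String line, char letter) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == letter) count++;
        }
        return count;
    }

    // Sums the per-line counts over the whole file (read as UTF-8).
    public static int countInFile(Path file, char letter) throws IOException {
        int total = 0;
        for (String line : Files.readAllLines(file)) {
            total += countInLine(line, letter);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file so the sketch runs without any existing input.txt.
        Path tmp = Files.createTempFile("input", ".txt");
        Files.write(tmp, Arrays.asList("three eels", "see"));
        System.out.println("e appears " + countInFile(tmp, 'e') + " times in " + tmp.getFileName());
        Files.delete(tmp);
    }
}
```

In the real program, the letter and the file name would come from the two prompts instead of the hard-coded demo values.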

You could use regex.
Imports:
import java.util.regex.*;
Ex Usage:
String input = "abcaa a";
String letter = "a";
Pattern p = Pattern.compile(letter, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher m = p.matcher(input);
int i = 0;
while(m.find()){
i++;
}
System.out.println(i); // 4 = # of occurrences of "a".

Counting the number of times the articles "a", "an" are used in a text file

I'm trying to make a program that counts the number of words, lines, and sentences, and also the number of articles 'a', 'and', 'the'.
So far I have the words, lines, and sentences. But I have no idea how I am going to count the articles. How can a program make the difference between 'a' and 'and'.
This is my code so far.
public static void main(String[]args) throws FileNotFoundException, IOException
{
FileInputStream file= new FileInputStream("C:\\Users\\nlstudent\\Downloads\\text.txt");
Scanner sfile = new Scanner(new File("C:\\Users\\nlstudent\\Downloads\\text.txt"));
int ch,sentence=0,words = 0,chars = 0,lines = 0;
while((ch=file.read())!=-1)
{
if(ch=='?'||ch=='!'|| ch=='.')
sentence++;
}
while(sfile.hasNextLine()) {
lines++;
String line = sfile.nextLine();
chars += line.length();
words += new StringTokenizer(line, " ,").countTokens();
}
System.out.println("Number of words: " + words);
System.out.println("Number of sentence: " + sentence);
System.out.println("Number of lines: " + lines);
System.out.println("Number of characters: " + chars);
}
}

How can a program make the difference between 'a' and 'and'.
You can use regex for this:
String input = "A and Andy then the are a";
Matcher m = Pattern.compile("(?i)\\b((a)|(an)|(and)|(the))\\b").matcher(input);
int count = 0;
while(m.find()){
count++;
}
//count == 4
'\b' is a word boundary, '|' is OR, and '(?i)' is the ignore-case flag. The full list of pattern constructs is in the java.util.regex.Pattern documentation, and it is probably worth learning about regex.
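If a separate count per word is wanted rather than one total, the same word-boundary idea can feed a map. This is only a sketch: the class name ArticleCounter and the method count are hypothetical, and the word list is the one used in the regex above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArticleCounter {

    // \b on both sides ensures "a" does not match inside "and" or "Andy".
    private static final Pattern ARTICLE = Pattern.compile("(?i)\\b(a|an|and|the)\\b");

    // Hypothetical helper: maps each matched word (lower-cased) to its number of occurrences.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = ARTICLE.matcher(text);
        while (m.find()) {
            counts.merge(m.group(1).toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=2, and=1, the=1}
        System.out.println(count("A and Andy then the are a"));
    }
}
```

Words that never occur are simply absent from the map, so a caller may want getOrDefault(word, 0) when reporting.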

The tokenizer will split each line into tokens. You can evaluate each token (a whole word) to see if it matches a string you expect. Here is an example that counts a, and, for, and the.
int a = 0, and = 0, the = 0, forCount = 0;
while (sfile.hasNextLine()) {
lines++;
String line = sfile.nextLine();
chars += line.length();
StringTokenizer tokenizer = new StringTokenizer(line, " ,");
words += tokenizer.countTokens();
while (tokenizer.hasMoreTokens()) {
String element = (String) tokenizer.nextElement();
if ("a".equals(element)) {
a++;
} else if ("and".equals(element)) {
and++;
} else if ("for".equals(element)) {
forCount++;
} else if ("the".equals(element)) {
the++;
}
}
}

How to pull out substrings (words) from string?

I'm trying to input a four word sentence, and then be able to print out each word individually using indexOf and substrings. Any ideas what I'm doing wrong?
Edited
So is this what it should look like? I've run it twice and received two different answers, so I'm not sure whether the environment running my program is faulty or my program itself is faulty.
import java.util.Scanner;
public class arithmetic {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String sentence;
String word1, word2, word3, word4;
int w1, w2, w3, w4;
int p, p2, p3, p4;
System.out.print("Enter a sentence with 4 words: ");
sentence = in.nextLine();
p = sentence.indexOf(" ");
word1 = sentence.substring(0,p)+" ";
w1 = 1 + word1.length();
p2 = word1.indexOf(" ");
word2 = sentence.substring(w1,p2);
w2 = w1+1+word2.length();
p3 = word2.indexOf(" ");
word3 = sentence.substring(w2,p3);
w3 = w1+w2+1+word3.length();
p4 = word3.indexOf(" ");
word4 = sentence.substring(w3,p4);
w4 = w1+w2+w3+1+word4.length();

I see at least two things:
You're not computing the indices correctly. The starting index for the third word should be something like length of first word + 1 + length of second word + 1, but it looks like you're leaving out the length of the first word. Similarly, when you're getting the fourth word, you're leaving out the lengths of the first two words.
indexOf(" ") will only get you the index of the first occurrence of a space. After you get the first space, you're reusing that index instead of using the indices of the other spaces.
Lastly, after you fix those two, if you know that the words are going to be delimited by spaces, then you might want to look at the String.split function. Using that, you could split your sentence without having to do all of the space-finding yourself.
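Both points can be illustrated with a short sketch. It assumes words are separated by single spaces; the names FourWords, wordsByIndexOf and wordsBySplit are made up for illustration. The first method uses the two-argument indexOf(String, int) so that each search starts after the previous space, and the second uses String.split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FourWords {

    // Walks the sentence with indexOf(" ", fromIndex), searching in the original
    // sentence each time and starting just after the previous space.
    public static String[] wordsByIndexOf(String sentence) {
        List<String> words = new ArrayList<>();
        int start = 0;
        int space = sentence.indexOf(" ", start);
        while (space != -1) {
            words.add(sentence.substring(start, space));
            start = space + 1;                       // next word begins after the space
            space = sentence.indexOf(" ", start);
        }
        words.add(sentence.substring(start));        // last word has no trailing space
        return words.toArray(new String[0]);
    }

    // The same result with String.split, which does the space-finding itself.
    public static String[] wordsBySplit(String sentence) {
        return sentence.split(" ");
    }

    public static void main(String[] args) {
        // both lines print [a, four, word, sentence]
        System.out.println(Arrays.toString(wordsByIndexOf("a four word sentence")));
        System.out.println(Arrays.toString(wordsBySplit("a four word sentence")));
    }
}
```

The key difference from the code in the question is that every indexOf call runs on the full sentence with a moving start position, never on a previously extracted word.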

I strongly suggest not using substring and indexOf, for reasons of performance, readability, and correctness. Consider any of the following (all of these treat words as runs of non-whitespace characters):
public static void main (String[] args) throws java.lang.Exception
{
int wordNo = 0;
System.out.println("using a Scanner (exactly 4 words):");
InputStream in0 = new ByteArrayInputStream("a four word sentence".getBytes("UTF-8"));
Scanner scanner = new Scanner(/*System.*/in0);
try {
String word1 = scanner.next();
String word2 = scanner.next();
String word3 = scanner.next();
String word4 = scanner.next();
System.out.printf("1: %s, 2: %s, 3: %s, 4: %s\n", word1, word2, word3, word4);
} catch(NoSuchElementException ex) {
System.err.println("The sentence is shorter than 4 words");
}
System.out.println("\nusing a Scanner (general):");
InputStream in1 = new ByteArrayInputStream("this is a sentence".getBytes("UTF-8"));
for(Scanner scanner1 = new Scanner(/*System.*/in1); scanner1.hasNext(); ) {
String word = scanner1.next();
System.out.printf("%d: %s\n", ++wordNo, word);
}
System.out.println("\nUsing BufferedReader and split:");
InputStream in2 = new ByteArrayInputStream("this is another sentence".getBytes("UTF-8"));
BufferedReader reader = new BufferedReader(new InputStreamReader(/*System.*/in2));
String line = null;
while((line = reader.readLine()) != null) {
for(String word : line.split("\\s+")) {
System.out.printf("%d: %s\n", ++wordNo, word);
}
}
}
