I want to write a regex in Java. The strings that should match are:
yyyyyy$
<t>yy<\t>$
<t><t>yyyyy<\t><\t>$
<t><t>y<\t>y<\t><t>yyyyy<\t>yy$
And the strings that must NOT match are:
<t><\t>$ (no “y” in the string)
<t>yy<t><\t>$ (one extra <t> ).
Some Specifications are:
There is exactly one $ in any correct string, and
this is always the last symbol in the string. The
string before the $ must be non-empty, and we call
it an expression. An expression is defined recursively
as:
the letter ‘y’
an expression bracketed by <t> and <\t>
a sequence of expressions.
The regex I have built is : y+|y*(<t>y+(<t>y*<\t>)*<\t>)
Now I am coding this regex in java as: "d+|(d*(<s>d+(<s>d*<\\s>)*<\\s>))$"
Code:
private static void checkForPattern(String input) {
Pattern p = Pattern.compile(" d+ | (d*(<s>d+(<s>d*<\\s>)*<\\s>)) $");
//Pattern p= Pattern.compile("d+|d*<s>dd<\\s>$");
Matcher m = p.matcher(input);
if (m.matches()) {
System.out.println("Correct string");
} else {
System.out.println("Wrong string");
}
}
What is wrong with the syntax? It prints "Wrong string" for every string that I pass in.
I would suggest not using regex for this, since Java's regex engine cannot balance the number of <t> and <\t> occurrences the way some other engines can (e.g. .NET). Even in those engines this is fairly complex, and there are likely better ways to approach your problem. The code below takes a simpler route: it counts the occurrences of <t> and checks that <\t> occurs the same number of times. Similarly, it counts the occurrences of y and checks that there is at least one. The logic of the countOccurrences method was adapted from an answer to the question "Occurrences of substring in a string".
class Main {
public static void main(String[] args) {
String[] strings = {
"yyyyyy$",
"<t>yy<\\t>$",
"<t><t>yyyyy<\\t><\\t>$",
"<t><t>y<\\t>y<\\t><t>yyyyy<\\t>yy$",
"<t><\\t>$",
"<t>yy<t><\\t>$"
};
for(String s : strings) {
if (countOccurrences("<t>", s) == countOccurrences("<\\t>", s) && countOccurrences("y", s) > 0) {
System.out.println("Good: " + s);
} else {
System.out.println("Bad: " + s);
}
}
}
private static int countOccurrences(String needle, String haystack) {
int lastIndex = 0;
int count = 0;
while (lastIndex != -1) {
lastIndex = haystack.indexOf(needle, lastIndex);
if (lastIndex != -1) {
count++;
lastIndex += needle.length();
}
}
return count;
}
}
Result:
Good: yyyyyy$
Good: <t>yy<\t>$
Good: <t><t>yyyyy<\t><\t>$
Good: <t><t>y<\t>y<\t><t>yyyyy<\t>yy$
Bad: <t><\t>$
Bad: <t>yy<t><\t>$
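Counting tags does not check their order or nesting, so an input such as <\t>y<t>$ would also be reported as good. Below is a minimal sketch of a stricter check, assuming the grammar from the question. The class and method names are illustrative and not part of the original answer. It keeps a stack with one flag per open <t>, recording whether that bracket already contains an expression.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ExpressionValidator {
    private static final String OPEN = "<t>";
    private static final String CLOSE = "<\\t>";

    /** Returns true if input is a non-empty expression followed by a single '$'. */
    static boolean isValid(String input) {
        if (!input.endsWith("$")) {
            return false;
        }
        String body = input.substring(0, input.length() - 1);

        // One flag per nesting level: has this level seen an expression yet?
        Deque<Boolean> hasContent = new ArrayDeque<>();
        hasContent.push(false); // top level

        int i = 0;
        while (i < body.length()) {
            if (body.startsWith(OPEN, i)) {
                hasContent.push(false);
                i += OPEN.length();
            } else if (body.startsWith(CLOSE, i)) {
                // Reject a close with no matching open, or an empty <t><\t> pair.
                if (hasContent.size() == 1 || !hasContent.pop()) {
                    return false;
                }
                markContent(hasContent); // the bracketed expression counts for its parent
                i += CLOSE.length();
            } else if (body.charAt(i) == 'y') {
                markContent(hasContent);
                i++;
            } else {
                return false; // any other character, including a second '$'
            }
        }
        // Every <t> must be closed and the top level must not be empty.
        return hasContent.size() == 1 && hasContent.peek();
    }

    private static void markContent(Deque<Boolean> levels) {
        levels.pop();
        levels.push(true);
    }
}
```

Calling ExpressionValidator.isValid(s) on the six sample strings accepts the first four and rejects the last two, and it also rejects inputs where a <\t> comes before its <t>.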
After thorough research and reading, I have concluded that a regex cannot be created for this kind of language. The language is not regular: matching nested <t> and <\t> pairs to arbitrary depth needs unbounded memory, which a finite automaton does not have. So to solve this problem we have to write the context-free grammar (CFG) directly.
The CFG for the problem above is:
R --> <t>S<\t>$ (production 1.1)
R --> SS$ (production 1.2)
R --> y$ (production 1.3)
S --> <t>S<\t> (production 2.1)
S --> SS (production 2.2)
S --> y (production 2.3)
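A grammar like this can be turned into a small recursive-descent recognizer. The following is a sketch under one assumption: the ambiguous rule S --> SS is rewritten as "one or more items", where an item is either y or <t>S<\t>, which accepts the same strings. GrammarParser and its method names are illustrative.

```java
class GrammarParser {
    private final String s;
    private int pos = 0;

    private GrammarParser(String s) {
        this.s = s;
    }

    /** R: a sequence S followed by '$' as the very last character. */
    static boolean accepts(String input) {
        GrammarParser p = new GrammarParser(input);
        return p.parseSequence()
                && p.pos == input.length() - 1
                && input.charAt(p.pos) == '$';
    }

    /** S: one or more items (this replaces the ambiguous rule S --> SS). */
    private boolean parseSequence() {
        if (!parseItem()) {
            return false;
        }
        while (parseItem()) {
            // keep consuming items
        }
        return true;
    }

    /** item: 'y' or "<t>" S "<\t>". On failure the position is restored. */
    private boolean parseItem() {
        if (s.startsWith("y", pos)) {
            pos++;
            return true;
        }
        if (s.startsWith("<t>", pos)) {
            int saved = pos;
            pos += 3;
            if (parseSequence() && s.startsWith("<\\t>", pos)) {
                pos += 4;
                return true;
            }
            pos = saved;
        }
        return false;
    }
}
```

GrammarParser.accepts("<t>yy<\\t>$") returns true, while an empty bracket pair or an unclosed <t> makes it return false.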
I have come across regular expressions for different problems, but I could not find a regex that checks whether the characters in a string are balanced.
I came across a problem that asks whether a string is balanced.
For example, aabbccdd is balanced, because each character occurs an even number of times,
but aabbccddd is not balanced, since d occurs an odd number of times. This applies to all characters in the input, not just a, b, c and d. If I give 12344321 or 123454321 as input, it should report balanced and unbalanced respectively.
How can I check the balance using regex? What kind of regular expression should be used to find out whether a string is balanced?
Edit:
I tried to find a solution using regex only because the problem demands an answer as a regex pattern. I would have implemented it some other way if regex had not been required explicitly.
I don't think you can do it with regex. Why do you need to use them?
I tried this: it works and it's pretty simple
static boolean isBalanced(String str) {
ArrayList<Character> odds = new ArrayList<>(); //Will contain the characters read until now an odd number of times
for (char x : str.toCharArray()) { //Reads each char of the string
if (odds.contains(x)) { //If x was in the arraylist we found x an even number of times so let's remove it
odds.remove(odds.indexOf(x));
}
else {
odds.add(x);
}
}
return odds.isEmpty();
}
A regular expression for this problem exists, but it does not speed anything up and it would be a total mess. It is easier to build the NFA first and then convert it to a regex. Even so, regex is not the proper tool.
public static void main(String args[]) {
String s = args[0];
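int[] counter = new int[Character.MAX_VALUE + 1]; // one slot per possible char value, so charAt(i) can never index out of bounds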
for (int i = 0; i < s.length(); i++) {
counter[s.charAt(i)]++;
}
if (validate(counter)) {
System.out.println("valid");
} else {
System.out.println("invalid");
}
}
public static boolean validate(int[] tab) {
for (int i : tab) {
if (i%2 == 1) {
return false;
}
}
return true;
}
Edit: to back up the claim that such a regex exists
For reference, consider a finite automaton for just two characters. Start at the leftmost state; the accepting state is the one drawn with a double circle. Each state is named by the set of characters that have an odd count so far.
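For the two-letter alphabet {a, b}, that automaton has four states, and it can be written out as a regex. The sketch below is an illustration only: the class name is made up, and it deliberately rejects any character other than a and b. It pairs the regex with a direct simulation of the automaton, where bit 0 of the state means "odd number of a" and bit 1 means "odd number of b".

```java
import java.util.regex.Pattern;

class TwoLetterParity {
    // Regex for "even number of a AND even number of b" over the alphabet {a, b}.
    private static final Pattern EVEN_AB =
            Pattern.compile("(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*");

    static boolean byRegex(String s) {
        return EVEN_AB.matcher(s).matches();
    }

    /** Simulates the four-state automaton; state 0 (nothing odd) is accepting. */
    static boolean byAutomaton(String s) {
        int state = 0;
        for (char c : s.toCharArray()) {
            if (c == 'a') {
                state ^= 1; // flip the parity of a
            } else if (c == 'b') {
                state ^= 2; // flip the parity of b
            } else {
                return false; // outside the two-letter alphabet
            }
        }
        return state == 0;
    }
}
```

Each additional character doubles the number of states, which is why the equivalent regex for a realistic alphabet becomes unmanageable while the parity loop stays the same size.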
I'm writing a simple program that counts how many times a sequence appears in a string.
Case 1:
Given string: EATNEATMMMMEAT
Given sequence: EAT
The program should return a value of 3.
Case 2:
Given string: EATEAT
Given sequence: EAT
The program should return 2.
import java.util.*;
public class FrequencyOfSequence { //Finds the frequency of a sequence in a string s
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String s = in.nextLine();
String sequence = in.nextLine();
String[] cArr = s.split(sequence);
System.out.println(cArr.length);
}
}
My program works in case 1. It fails on the 2nd case because s.split(sequence) removes both 'EAT', leaving an array of size 0.
Is there a way around this?
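Use a regex for this (Pattern.quote makes the sequence match literally even if it contains regex metacharacters):
Pattern pattern = Pattern.compile(Pattern.quote(sequence));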
Matcher matcher = pattern.matcher(s);
int count = 0;
while (matcher.find())
count++;
System.out.println(count);
One option is to use replace() to remove the matches and calculate the difference in size:
int count = (s.length() - s.replace(sequence, "").length()) / sequence.length();
If you want to use the split() method, it should work if you use it like this:
int count = s.split(sequence, -1).length - 1;
The -1 argument tells the method not to discard trailing empty strings. Then we subtract 1 from the length to avoid the fencepost problem.
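As a minimal sketch, the split-based count can be wrapped in a helper. Two choices here are assumptions that go beyond the answer: the sequence is passed through Pattern.quote so that characters such as "." are matched literally (split() treats its argument as a regex), and an empty sequence is defined to occur zero times. The class name OccurrenceCounter is illustrative.

```java
import java.util.regex.Pattern;

class OccurrenceCounter {
    /** Counts non-overlapping occurrences of sequence in s. */
    static int count(String s, String sequence) {
        if (sequence.isEmpty()) {
            return 0; // assumption: an empty sequence never matches
        }
        // Limit -1 keeps trailing empty strings, so n matches give n + 1 pieces.
        return s.split(Pattern.quote(sequence), -1).length - 1;
    }

    public static void main(String[] args) {
        System.out.println(count("EATNEATMMMMEAT", "EAT"));
        System.out.println(count("EATEAT", "EAT"));
    }
}
```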
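Here is one more option you might try. Be aware that it miscounts when the string ends with two or more consecutive matches (EATEAT gives 0), because split() without a limit drops every trailing empty string: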
int count=0;
if(s.endsWith(sequence)){
count=s.split(sequence).length;}
else{
count=s.split(sequence).length-1;
}
I have text as a String and need to calculate the number of syllables in each word. I tried splitting the text into an array of words and then processing each word separately, using regular expressions. But my pattern for syllables doesn't work as it should. Please advise how to change it so that it counts syllables correctly. My initial code:
public int getNumSyllables()
{
String[] words = getText().toLowerCase().split("[a-zA-Z]+");
int count=0;
List <String> tokens = new ArrayList<String>();
for(String word: words){
tokens = Arrays.asList(word.split("[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*"));
count+= tokens.size();
}
return count;
}
This question is from a UCSD Java course, am I right?
I think you should provide enough information in the question so that it doesn't confuse people who want to help. Here is my own solution, which has been tested against the local test cases and the UCSD online judge (OJ).
You left out some important information about how a syllable is defined in this question. I think the key point of the problem is how to deal with the letter e. For example, take the combination te. If te is in the middle of a word, it should of course be counted as a syllable. However, if it is at the end of a word, the e should be treated as a silent e, as in English, so it should not be counted as a syllable.
That's it. Here is my idea in pseudocode:
if(last character is e) {
    if(it is a silent e at the end of this word) {
        remove the silent e;
        count the rest as regular;
    } else {
        count++;
    }
} else {
    count it as regular;
}
You may notice that I am not using only regex to solve this problem. I did think about it: can this really be done using regex alone? My answer is no, I don't think so. At least with what UCSD has taught us so far, it is too difficult. Regex is a powerful tool that can match the desired characters very quickly, but it lacks some functionality. Take te again: a simple pattern will not think twice when it meets a word like teate (I made this word up as an example). If our pattern counts the first te as a syllable, then why not the last te?
UCSD also addresses this in the assignment handout:
If you find yourself doing mental gymnastics to come up with a single regex to count syllables directly, that's usually an indication that there's a simpler solution (hint: consider a loop over characters--see the next hint below). Just because a piece of code (e.g. a regex) is shorter does not mean it is always better.
The hint is that you should solve this problem with a loop, combined with regex.
OK, here is my code:
protected int countSyllables(String word)
{
// TODO: Implement this method so that you can call it from the
// getNumSyllables method in BasicDocument (module 1) and
// EfficientDocument (module 2).
int count = 0;
word = word.toLowerCase();
if (word.charAt(word.length()-1) == 'e') {
if (silente(word)){
String newword = word.substring(0, word.length()-1);
count = count + countit(newword);
} else {
count++;
}
} else {
count = count + countit(word);
}
return count;
}
private int countit(String word) {
int count = 0;
Pattern splitter = Pattern.compile("[^aeiouy]*[aeiouy]+");
Matcher m = splitter.matcher(word);
while (m.find()) {
count++;
}
return count;
}
private boolean silente(String word) {
word = word.substring(0, word.length()-1);
Pattern yup = Pattern.compile("[aeiouy]");
Matcher m = yup.matcher(word);
if (m.find()) {
return true;
} else
return false;
}
You may notice that besides the given method countSyllables, I also created two helper methods, countit and silente. countit counts the syllables inside the word; silente tries to work out whether the word ends with a silent e. Also note the definition of a non-silent e: for example, the e in "the" is not silent, while the e in "ate" is silent.
My code passed both the local test cases and the UCSD online judge (OJ).
P.S.: It should be fine to use something like [^aeiouy] directly, because the word has already been parsed before this method is called. Converting to lowercase is also necessary; it saves a lot of work dealing with uppercase letters. All we want is the number of syllables.
Speaking of the count, an alternative is to make count a static field so that the private methods can use count++ directly. But it's fine as it is.
Feel free to contact me if you still don't understand the approach :)
Building on user5500105's idea, I developed the following method to count the syllables in a word. The rules are:
Consecutive vowels count as one syllable, e.g. "ae" and "ou" are one syllable each.
Y is treated as a vowel.
An e at the end counts as a syllable only if it is the only vowel: "the" is one syllable because the final e is its only vowel, while "there" is also one syllable because its final e is not counted when the word has another vowel.
public int countSyllables(String word) {
ArrayList<String> tokens = new ArrayList<String>();
String regexp = "[bcdfghjklmnpqrstvwxz]*[aeiouy]+[bcdfghjklmnpqrstvwxz]*";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(word.toLowerCase());
while (m.find()) {
tokens.add(m.group());
}
// If the last token is a lone "e" and the word has another vowel group, do not count it
if( tokens.size() > 1 && tokens.get(tokens.size()-1).equals("e") )
return tokens.size()-1; // final e is not the only vowel, so subtract one syllable
return tokens.size();
}
This gives you the number of vowel groups in a word:
public int getNumVowels(String word) {
String regexp = "[bcdfghjklmnpqrstvwxz]*[aeiouy]+[bcdfghjklmnpqrstvwxz]*";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(word.toLowerCase());
int count = 0;
while (m.find()) {
count++;
}
return count;
}
You can call it on every word in your string array:
String[] words = getText().split("\\s+");
for (String word : words ) {
System.out.println("Word: " + word + ", vowels: " + getNumVowels(word));
}
Update: as freerunner noted, calculating the number of syllables is more complicated than just counting vowels. One needs to take into account combinations like ou, ui and oo, the final silent e, and possibly other cases. As I am not a native English speaker, I am not sure what the correct algorithm would be.
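Here is a minimal sketch of a loop over characters that applies two of those adjustments. The rule set is an assumption pieced together from the answers in this thread, not a complete model of English: a run of consecutive vowels (with y as a vowel) counts once, and a lone e at the very end of the word is skipped when the word already has another vowel group. SyllableLoop is an illustrative name.

```java
class SyllableLoop {
    private static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    static int countSyllables(String word) {
        String w = word.toLowerCase();
        int count = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = isVowel(w.charAt(i));
            if (vowel && !previousWasVowel) {
                // A new vowel run starts here. Skip it only if it is a lone
                // final 'e' and the word already has a counted vowel group.
                boolean loneFinalE = w.charAt(i) == 'e' && i == w.length() - 1;
                if (!(loneFinalE && count > 0)) {
                    count++;
                }
            }
            previousWasVowel = vowel;
        }
        return count;
    }
}
```

Under these rules "the", "ate" and "there" each come out as one syllable. Words such as "table" are undercounted, which shows the limits of so small a rule set.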
This is how I do it. This is about as simple an algorithm as I could come up with.
public static int syllables(String s) {
final Pattern p = Pattern.compile("([ayeiou]+)");
final String lowerCase = s.toLowerCase();
final Matcher m = p.matcher(lowerCase);
int count = 0;
while (m.find())
count++;
if (lowerCase.endsWith("e"))
count--;
return count <= 0 ? 1 : count; // a word such as "the" would otherwise score 0
}
I use this in combination with a soundex function to determine if words sound alike. The syllable count improves accuracy of my soundex function.
Note: This is strictly for counting the syllables in a word. I assume you can parse your input for words using something like java.util.StringTokenizer.
Your line
String[] words = getText().toLowerCase().split("[a-zA-Z]+");
is splitting ON words, and returning only the space between words! You want to split on the space between words, as follows:
String[] words = getText().toLowerCase().split("\\s+");
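To see the difference, here is a small sketch that runs both patterns on the same text. SplitDemo and its method names are illustrative. Splitting on letters leaves only what sits between the words, while splitting on whitespace returns the words themselves.

```java
import java.util.Arrays;

class SplitDemo {
    /** Uses the words as separators, so only the gaps between them remain. */
    static String[] splitOnLetters(String text) {
        return text.split("[a-zA-Z]+");
    }

    /** Uses runs of whitespace as separators, so the words remain. */
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "hello world";
        System.out.println(Arrays.toString(splitOnLetters(text)));
        System.out.println(Arrays.toString(splitOnWhitespace(text)));
    }
}
```

Note that splitting on whitespace keeps any punctuation attached to the words, so text with commas or full stops may need a Matcher with [a-zA-Z]+ instead.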
You can do it as follows:
public int getNumSyllables()
{
return getSyllables(getTokens("[a-zA-Z]+"));
}
protected List<String> getWordTokens(String word,String pattern)
{
ArrayList<String> tokens = new ArrayList<String>();
Pattern tokSplitter = Pattern.compile(pattern);
Matcher m = tokSplitter.matcher(word);
while (m.find()) {
tokens.add(m.group());
}
return tokens;
}
private int getSyllables(List<String> tokens)
{
int count=0;
for(String word : tokens)
if(word.toLowerCase().endsWith("e") && getWordTokens(word.toLowerCase().substring(0, word.length()-1), "[aeiouy]+").size() > 0)
count+=getWordTokens(word.toLowerCase().substring(0, word.length()-1), "[aeiouy]+").size();
else
count+=getWordTokens(word.toLowerCase(), "[aeiouy]+").size();
return count;
}
I count "the" separately, then split the text on words that end with e.
Then I count the syllables. Here is my implementation:
int syllables = 0;
word = word.toLowerCase();
if(word.contains("the ")){
syllables ++;
}
String[] split = word.split("e!$|e[?]$|e,|e |e[),]|e$");
ArrayList<String> tokens = new ArrayList<String>();
Pattern tokSplitter = Pattern.compile("[aeiouy]+");
for (int i = 0; i < split.length; i++) {
String s = split[i];
Matcher m = tokSplitter.matcher(s);
while (m.find()) {
tokens.add(m.group());
}
}
syllables += tokens.size();
I've tested it and all test cases pass.
You are using the split method incorrectly. It takes a separator as its argument. You need to write something like this:
String[] words = getText().toLowerCase().split(" ");
But if you only need a rough syllable count, it is enough to count the vowels:
String input = "text";
Set<Character> vowel = new HashSet<>();
vowel.add('a');
vowel.add('e');
vowel.add('i');
vowel.add('o');
vowel.add('u');
int count = 0;
for (char c : input.toLowerCase().toCharArray()) {
if (vowel.contains(c)){
count++;
}
}
System.out.println("count = " + count);
I need to extract only the number from this text. I use substring to extract it, but sometimes the number gets shorter, so I end up with a wrong value...
example(16656);
Use Pattern to compile your regular expression and Matcher to get a particular captured group. The regex I'm using is:
example\((\d+)\)
which captures the digits (\d+) within the parentheses. So:
Pattern p = Pattern.compile("example\\((\\d+)\\)");
Matcher m = p.matcher(text);
if (m.find()) {
int i = Integer.valueOf(m.group(1));
...
}
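A self-contained sketch of the same idea follows. The class name NumberExtractor and the choice to return an OptionalInt are assumptions made for illustration; the regex is the one from this answer. Because the group captures however many digits are present, the result does not depend on the length of the number.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberExtractor {
    private static final Pattern EXAMPLE_CALL =
            Pattern.compile("example\\((\\d+)\\)");

    /** Returns the number inside example(...), or empty if there is none. */
    static OptionalInt extract(String text) {
        Matcher m = EXAMPLE_CALL.matcher(text);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // digits do not fit into an int
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("example(16656);"));
        System.out.println(extract("example(7);"));
    }
}
```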
Look at the Java regular expression examples here:
http://java.sun.com/developer/technicalArticles/releases/1.4regex/
Focus especially on the find method.
String yourString = "example(16656);";
Pattern pattern = Pattern.compile("\\w+\\((\\d+)\\);");
Matcher matcher = pattern.matcher(yourString);
if (matcher.matches())
{
int value = Integer.parseInt(matcher.group(1));
System.out.println("Your number: " + value);
}
I would suggest writing your own logic for this. Using Pattern and Matcher is good practice, but they are general-purpose solutions and are not always the most efficient fit. cletus provided a very neat solution, but behind the scenes it runs a pattern-matching algorithm to locate the digits. I don't think you need pattern matching here; you just need to extract the digits from a string (like 123 from "a1b2c3"). The following code does that cleanly in O(n) without the extra work that the Pattern and Matcher classes do (just copy, paste, and run :) ):
public class DigitExtractor {
/**
* @param args
*/
public static void main(String[] args) {
String sample = "sdhj12jhj345jhh6mk7mkl8mlkmlk9knkn0";
String digits = getDigits(sample);
System.out.println(digits);
}
private static String getDigits(String sample) {
StringBuilder out = new StringBuilder(10);
int stringLength = sample.length();
for(int i = 0; i <stringLength ; i++)
{
char currentChar = sample.charAt(i);
int charDiff = currentChar -'0';
boolean isDigit = ((9-charDiff)>=0&& (9-charDiff <=9));
if(isDigit)
out.append(currentChar);
}
return out.toString();
}
}