How to construct regular expression to balance characters in a string?

I have come across regular expressions for many different problems, but I could not find a regex that checks whether the characters in a string are balanced.
The problem is to determine whether a string is balanced.
Example: aabbccdd is balanced, because every character occurs an even number of times,
but aabbccddd is not balanced, because d occurs an odd number of times. This applies to every character in the input, not just a, b, c and d. For the inputs 12344321 and 123454321, the results should be balanced and unbalanced respectively.
How can I check this with a regex? What kind of regular expression should be used to decide whether the string is balanced?
Edit:
I am looking for a regex-only solution because the problem explicitly asks for a regex pattern. I would have implemented it some other way if regex had not been required.

I don't think you can do it with regex. Why do you need to use them?
I tried this: it works and it's pretty simple
static boolean isBalanced(String str) {
    ArrayList<Character> odds = new ArrayList<>(); // Will contain the characters read so far an odd number of times
    for (char x : str.toCharArray()) { // Reads each char of the string
        if (odds.contains(x)) { // If x was in the list, we have now seen x an even number of times, so remove it
            odds.remove(odds.indexOf(x));
        }
        else {
            odds.add(x);
        }
    }
    return odds.isEmpty();
}

A regular expression for this problem exists (for a fixed alphabet), but it doesn't speed anything up and would be a total mess. It is easier to build an NFA first and then convert it to a regex. Still, it is not the proper tool.
public static void main(String args[]) {
    String s = args[0];
    // One slot per possible char value, so characters above 255 do not overflow the array
    int[] counter = new int[Character.MAX_VALUE + 1];
    for (int i = 0; i < s.length(); i++) {
        counter[s.charAt(i)]++;
    }
    if (validate(counter)) {
        System.out.println("valid");
    } else {
        System.out.println("invalid");
    }
}
public static boolean validate(int[] tab) {
    for (int i : tab) {
        // An odd count for any character means the string is not balanced
        if (i % 2 == 1) {
            return false;
        }
    }
    return true;
}
Edit, to show that such a regex exists:
Reference: a finite automaton for just two characters. Start at the leftmost state; the accepting state is the one with the double circle. Each state is named by the set of characters that have an odd count so far.
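As an illustration of what converting that two-character automaton to a regex can look like, here is a minimal sketch. It assumes the alphabet is exactly {a, b}; the class name EvenAbRegex and the pattern itself are not from the answer above. The pattern consumes the input two characters at a time, so after each pair the automaton is either back in the start state or in the state where both counts are odd.

```java
import java.util.regex.Pattern;

final class EvenAbRegex {
    // Assumption: the input only uses the letters a and b.
    //   aa | bb                 two steps that end in the start state again
    //   (ab|ba)(aa|bb)*(ab|ba)  move to "both counts odd", stay there, come back
    static final Pattern EVEN_A_EVEN_B =
            Pattern.compile("(?:aa|bb|(?:ab|ba)(?:aa|bb)*(?:ab|ba))*");

    static boolean isBalanced(String s) {
        // matches() requires the whole input to be consumed
        return EVEN_A_EVEN_B.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("abba"));
        System.out.println(isBalanced("aab"));
    }
}
```

Even for two letters the pattern is hard to read, and the number of automaton states doubles with every additional letter, which is probably why both answers prefer plain counting.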

Related

How to check whether certain characters are present in another string which characters are unordered? Using RegEx Java

How can I check whether the characters of one string are present in another string in Java, where the conditions are as follows:
For example:
String 1: Panda
String 2: "a1d22n333a4444p"
Here String 2 needs to contain the letters 'p', 'n' and 'd' at least once and 'a' at least twice. The pattern should match under these conditions.
I tried a regular expression, but I am not getting the right result.
public static boolean isContainsAnimal(String message,String animal) {
String animalPattern=generatePattern("panda");
Pattern pattern = Pattern.compile(animalPattern);
Matcher matcher = pattern.matcher(message);
int count = 0;
while (matcher.find()) {
count++;
}
if(count>=1){
return true;
}else
{
return false;
}
}
public static String generatePattern(String animal){
String result="";
for(int i=0;i<animal.length();i++){
result+="[^"+animal.charAt(i)+"]*"+animal.charAt(i);
}
return result;
}
Please suggest a solution for this problem.

Your attempt does not take account of the different possible orderings of the characters in the animal string. In fact, for a string of 5 distinct characters, there are 5 factorial (120) different orders.
It is possible to generate a regex with all of the orderings as alternates, but the result is ... horrible and inefficient.
A better idea is to work out whether any letters (like 'a') are repeated. Then generate a regex for each letter, apply each one to the input, and AND the results.
An even better idea is to not use regexes at all. They are the wrong tool for this job.
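The per-letter idea can be sketched roughly as follows. This is only one possible reading of it: the class and method names are made up for illustration, and the choice to ignore case (so that "Panda" matches a lowercase 'p') is an assumption based on the example in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class LetterCountRegex {
    static boolean containsLetters(String message, String word) {
        // Assumption: matching is case-insensitive, so both sides are lower-cased.
        String haystack = message.toLowerCase();
        Map<Character, Integer> required = new LinkedHashMap<>();
        for (char c : word.toLowerCase().toCharArray()) {
            required.merge(c, 1, Integer::sum); // how often each letter is needed
        }
        for (Map.Entry<Character, Integer> entry : required.entrySet()) {
            String letter = Pattern.quote(String.valueOf(entry.getKey()));
            // "(?:.*?x){n}" succeeds if x occurs at least n times, in any positions.
            Pattern p = Pattern.compile(
                    "(?:.*?" + letter + "){" + entry.getValue() + "}", Pattern.DOTALL);
            if (!p.matcher(haystack).find()) {
                return false; // AND of all per-letter results
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsLetters("a1d22n333a4444p", "Panda"));
    }
}
```

The map of required counts already does most of the work; counting the letters of the message into a second map and comparing the two would avoid regexes entirely, which is what the last sentence of the answer recommends.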

Java Get first character values for a string

I have inputs like
AS23456SDE
MFD324FR
I need to get First Character values like
AS, MFD
The prefix is not always the first two or three characters, since the input can vary. I need the characters that come before the first digit.
Thank you.
Edit : This is what I have tried.
public static String getPrefix(String serial) {
StringBuilder prefix = new StringBuilder();
for(char c : serial.toCharArray()){
if(Character.isDigit(c)){
break;
}
else{
prefix.append(c);
}
}
return prefix.toString();
}

Here is a nice one line solution. It uses a regex to match the first non numeric characters in the string, and then replaces the input string with this match.
public String getFirstLetters(String input) {
return new String("A" + input).replaceAll("^([^\\d]+)(.*)$", "$1")
.substring(1);
}
System.out.println(getFirstLetters("AS23456SDE"));
System.out.println(getFirstLetters("1AS123"));
Output:
AS
(empty)

A simple solution could be like this:
public static void main (String[]args) {
String str = "MFD324FR";
char[] characters = str.toCharArray();
for(char c : characters){
if(Character.isDigit(c))
break;
else
System.out.print(c);
}
}

Use the following function to get the required output:
public String getFirstChars(String str) {
    String result = "";
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (c >= '0' && c <= '9') {
            // Stop at the first digit and return what was collected so far
            return result;
        }
        result = result + c;
    }
    // No digit found: the whole string is the prefix
    return str;
}
Pass your string as the argument.

I think this can be done with a simple regex that matches digits, together with Java's String.split function. This regex-based approach will be more efficient than the methods that use more complicated regexes.
Something like the following will work:
String inp = "ABC345.";
String beginningChars = inp.split("[\\d]+",2)[0];
System.out.println(beginningChars); // only if you want to print.
The regex I used, "[\\d]+", is already escaped for Java.
What does it do?
It matches one or more digits (\d). In Java, \d matches only the ASCII digits 0-9 by default; it matches digits of other scripts (for example Arabic-Indic digits) only when the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.
What does String beginningChars = inp.split("[\\d]+",2)[0] do?
It applies this regex and splits the string into an array of strings wherever a match is found. The [0] at the end selects the first element of that array, since you wanted the starting characters.
What is the second parameter to .split(regex,int), which I supplied as 2?
This is the limit parameter. With a limit of 2 the regex is applied at most once: as soon as the first match is found, the rest of the string is returned unprocessed as the second array element.
From the Strings javadoc page:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
This keeps the call efficient even if your string is huge, because scanning stops after the first match.
A regex that is explicitly limited to the digits 0-9:
"[0-9]+"
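To make the effect of the limit parameter concrete, here is a small sketch. The class name, the helper leadingNonDigits, and the sample string "AB12CD34EF" are made up for illustration; only String.split and Arrays.toString from the standard library are used.

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Returns everything before the first run of digits.
    static String leadingNonDigits(String input) {
        // limit = 2: the pattern is applied at most once, so at most two pieces come back
        return input.split("[0-9]+", 2)[0];
    }

    public static void main(String[] args) {
        // With limit 2 the remainder after the first match stays in one piece
        System.out.println(Arrays.toString("AB12CD34EF".split("[0-9]+", 2))); // [AB, CD34EF]
        // Without a limit every run of digits is a split point
        System.out.println(Arrays.toString("AB12CD34EF".split("[0-9]+")));    // [AB, CD, EF]
        System.out.println(leadingNonDigits("AS23456SDE"));                   // AS
    }
}
```

If the input starts with a digit, the first array element is the empty string, so the helper returns "" in that case.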

public static void main(String[] args) {
String testString = "MFD324FR";
int index = 0;
for (Character i : testString.toCharArray()) {
if (Character.isDigit(i))
break;
index++;
}
System.out.println(testString.substring(0, index));
}
This prints the characters that come before the first digit.

Splitting input string for a calculator

I'm trying to split the input given by the user for my calculator.
For example,
if the user inputs "23+45*(1+1)" I want this to be split into [23,+,45,*,(,1,+,1,)].

What you're looking for is called a lexer. A lexer splits up input into chunks (called tokens) that you can read.
Fortunately, your lexer is pretty simple and can be written by hand. For more complicated lexers, you can use flex (as in "The Fast Lexical Analyzer"--not Adobe Flex), or (since you're using Java) ANTLR (note, ANTLR is much more than just a lexer).
Simply come up with a list of regular expressions, one for each token to match (note that since your input is so simple, you can probably do away with this list and merge them all into one single regex. However, for more advanced lexers, it helps to do one regex for each token) e.g.
\d+
\+
-
\*
/
\(
\)
Then start a loop: while there are more characters to be parsed, go through each of your regular expressions and attempt to match them against the beginning of the string. If they match, add the first matched group to your list of input. Otherwise, continue matching (if none of them match, tell the user they have a syntax error).
Pseudocode:
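List<String> input = new LinkedList<String>();
while (userInputString.length() > 0) {
    boolean matched = false;
    for (final Pattern p : myRegexes) {
        final Matcher m = p.matcher(userInputString);
        if (m.find()) {
            input.add(m.group());
            // Remove the token we found from the user's input string so that we
            // can match the rest of the string against our regular expressions.
            userInputString = userInputString.substring(m.group().length());
            matched = true;
            break;
        }
    }
    if (!matched) {
        // None of the token patterns matched: report a syntax error instead of looping forever
        throw new IllegalArgumentException("Syntax error near: " + userInputString);
    }
}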
Implementation notes:
You may want to prepend the ^ character to all of your regular expressions. This makes sure you anchor your matches against the beginning of the string. My pseudocode assumes you have done this.
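The answer mentions that for input this simple the token patterns can be merged into one regex. A minimal sketch of that variant follows; the class name CalcTokenizer and the merged pattern are assumptions, not part of the answer. It uses Matcher.lookingAt() on a moving region instead of ^ anchors and substring calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CalcTokenizer {
    // One alternative per token kind: an integer literal or a single operator/parenthesis
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/()]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            // lookingAt() only succeeds if a token starts exactly at pos
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("23+45*(1+1)")); // [23, +, 45, *, (, 1, +, 1, )]
    }
}
```

Whitespace is treated as a syntax error here; a real calculator would probably want to skip it, for example by stripping spaces before tokenizing.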

I think using stacks to split the operand and operator and evaluate the expression would be more appropriate. In the calculator we generally use Infix notation to define the arithmetic expression.
Operand1 op Operand2
Check the Shunting-yard algorithm used in many such cases to parse the mathematical expression. This is also a good read.
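For readers who have not seen it, a compact sketch of the shunting-yard idea is given below. It is a simplified illustration, not the answer's code: it assumes the input is already split into tokens, handles only non-negative integers, the four left-associative operators + - * / and parentheses, and all names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ShuntingYardSketch {
    private static int precedence(String op) {
        return (op.equals("*") || op.equals("/")) ? 2 : 1;
    }

    private static boolean isNumber(String token) {
        return !token.isEmpty() && Character.isDigit(token.charAt(0));
    }

    /** Reorders infix tokens into postfix (reverse Polish) order. */
    static List<String> toPostfix(List<String> tokens) {
        List<String> output = new ArrayList<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String token : tokens) {
            if (isNumber(token)) {
                output.add(token);
            } else if (token.equals("(")) {
                operators.push(token);
            } else if (token.equals(")")) {
                // Flush operators back to the matching "("
                while (!operators.isEmpty() && !operators.peek().equals("(")) {
                    output.add(operators.pop());
                }
                if (operators.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')'");
                }
                operators.pop(); // discard the "("
            } else {
                // Left-associative: first emit operators of higher or equal precedence
                while (!operators.isEmpty() && !operators.peek().equals("(")
                        && precedence(operators.peek()) >= precedence(token)) {
                    output.add(operators.pop());
                }
                operators.push(token);
            }
        }
        while (!operators.isEmpty()) {
            String op = operators.pop();
            if (op.equals("(")) {
                throw new IllegalArgumentException("Unbalanced '('");
            }
            output.add(op);
        }
        return output;
    }

    /** Evaluates a postfix token list with an operand stack. */
    static long evaluatePostfix(List<String> postfix) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String token : postfix) {
            if (isNumber(token)) {
                stack.push(Long.parseLong(token));
                continue;
            }
            long right = stack.pop();
            long left = stack.pop();
            switch (token) {
                case "+": stack.push(left + right); break;
                case "-": stack.push(left - right); break;
                case "*": stack.push(left * right); break;
                case "/": stack.push(left / right); break;
                default: throw new IllegalArgumentException("Unknown operator: " + token);
            }
        }
        return stack.pop();
    }
}
```

For the tokens of 23+45*(1+1) the postfix order is 23 45 1 1 + * +, which evaluates to 113. Division here is integer division, and unary minus is not handled.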

This might be a little sloppy, because I am learning still, but it does split them into strings.
import java.util.ArrayList;
import java.util.Scanner;

public class TestClass {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        ArrayList<String> separatedInput = new ArrayList<String>();
        System.out.print("Values: ");
        String input = sc.next();
        String numbers = ""; // digits of the number currently being read
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (Character.isDigit(ch)) {
                numbers = numbers + ch;
            } else {
                // A non-digit ends the current number (if any) and is a token itself
                if (!numbers.isEmpty()) {
                    separatedInput.add(numbers);
                    numbers = "";
                }
                separatedInput.add(String.valueOf(ch));
            }
        }
        // Flush a number that runs to the end of the input
        if (!numbers.isEmpty()) {
            separatedInput.add(numbers);
        }
        System.out.println(separatedInput);
    }
}

Java regex: Repeating capturing groups

An item is a comma delimited list of one or more strings of numbers or characters e.g.
"12"
"abc"
"12,abc,3"
I'm trying to match a bracketed list of zero or more items in Java e.g.
""
"(12)"
"(abc,12)"
"(abc,12),(30,asdf)"
"(qqq,pp),(abc,12),(30,asdf,2),"
which should return the following matching groups respectively for the last example
qqq,pp
abc,12
30,asdf,2
I've come up with the following (incorrect) pattern
\((.+?)\)(?:,\((.+?)\))*
which matches only the following for the last example
qqq,pp
30,asdf,2
Tips? Thanks

That's right. You can't have a "variable" number of capturing groups in a Java regular expression. Your Pattern has two groups:
\((.+?)\)(?:,\((.+?)\))*
  |___|        |___|
 group 1      group 2
Each group will contain the content of the last match for that group. I.e., abc,12 will get overridden by 30,asdf,2.
Related question:
Regular expression with variable number of groups?
The solution is to use one expression (something like \((.+?)\)) and use matcher.find to iterate over the matches.
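A minimal sketch of that find loop, using the single-group pattern suggested above and the last example from the question. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketGroups {
    // One capturing group; find() is called repeatedly instead of repeating the group
    private static final Pattern ITEM = Pattern.compile("\\((.+?)\\)");

    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            groups.add(m.group(1)); // content between the parentheses
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String item : extract("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(item);
        }
    }
}
```

Note that this only extracts the bracketed items; it does not check that the overall string is a well-formed comma-separated list. If that matters, the whole input can be validated with a separate pattern first.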

You can use regular expression like ([^,]+) in loop or just str.split(",") to get all elements at once. This version: str.split("\\s*,\\s*") even allows spaces.

(^|\s+)(\S*)(($|\s+)\2)+ with ignore case option /i
She left LEft leFT now
example here - https://regex101.com/r/FEmXui/2
Match 1
Full match 3-23 ` left LEft leFT LEFT`
Group 1. 3-4 ` `
Group 2. 4-8 `left`
Group 3. 18-23 ` LEFT`
Group 4. 18-19 ` `

Using an ANTLR grammar can solve this problem. This is really beyond the reasonable capabilities of RegExp, although I believe some newer versions of Microsoft's implementation in .Net support this behavior. See this other SO question. If you're stuck with everything but .Net your best option is going to be a parser-generator (you don't have to use ANTLR, that's just my personal preference). Going through the ANTLR4 GitHub page can help get someone started on matching on more complex expressions with things like repeating match groups. Another option that doesn't require a whole lot of new learning is to tokenize the string input that you're wanting to match on and pull out the pieces that you want, but this can prove to be extremely messy and create nightmarish chunks of parsing code that are better-suited to a generated parser.

This may be the solution :
package com.drl.fw.sch;
import java.util.regex.Pattern;
public class AngularJSMatcher extends SimpleStringMatcher {
Matcher delegate;
public AngularJSMatcher(String lookFor){
super(lookFor);
// ng-repeat
int ind = lookFor.indexOf('-');
if(ind >= 0 ){
StringBuilder sb = new StringBuilder();
boolean first = true;
for (String s : lookFor.split("-")){
if(first){
sb.append(s);
first = false;
}else{
if(s.length() >1){
sb.append(s.substring(0,1).toUpperCase());
sb.append(s.substring(1));
}else{
sb.append(s.toUpperCase());
}
}
}
delegate = new SimpleStringMatcher(sb.toString());
}else {
String words[] = lookFor.split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");
if(words.length > 1 ){
StringBuilder sb = new StringBuilder();
for (int i=0;i < words.length;i++) {
sb.append(words[i].toLowerCase());
if(i < words.length-1) sb.append("-");
}
delegate = new SimpleStringMatcher(sb.toString());
}
}
}
@Override
public boolean match(String in) {
if(super.match(in)) return true;
if(delegate != null && delegate.match(in)) return true;
return false;
}
public static void main(String[] args){
String lookfor="ngRepeatStart";
Matcher matcher = new AngularJSMatcher(lookfor);
System.out.println(matcher.match( "<header ng-repeat-start=\"item in items\">"));
System.out.println(matcher.match( "var ngRepeatStart=\"item in items\">"));
}
}

How to validate phone number(US format) in Java?

I just want to know where I am going wrong here:
import java.io.*;
class Tokens{
public static void main(String[] args)
{
//String[] result = "this is a test".split("");
String[] result = "4543 6546 6556".split("");
boolean flag= true;
String num[] = {"0","1","2","3","4","5","6","7","8","9"};
String specialChars[] = {"-","#","#","*"," "};
for (int x=1; x<result.length; x++)
{
for (int y=0; y<num.length; y++)
{
if ((result[x].equals(num[y])))
{
flag = false;
continue;
}
else
{
flag = true;
}
if (flag == true)
break;
}
if (flag == false)
break;
}
System.out.println(flag);
}
}

If this is not homework, is there a reason you're avoiding regular expressions?
Here are some useful ones: http://regexlib.com/DisplayPatterns.aspx?cattabindex=6&categoryId=7
More generally, your code doesn't seem to validate that you have a phone number; it seems merely to validate that your string consists only of digits. You're also not allowing any special characters right now.

Aside from the regex suggestion (which is a good one), it would seem to make more sense to deal with arrays of characters rather than single-char Strings.
In particular, the split("") call (shudder) could/should be replaced by toCharArray(). This lets you iterate over each individual character, which more clearly indicates your intent, is less prone to bugs as you know you're treating each character at once, and is more efficient*. Likewise your valid character sets should also be characters.
Your logic is pretty strangely expressed; you're not even referencing the specialChars set at all, and the looping logic once you've found a match seems odd. I think this is your bug; the matching seems to be the wrong way round in that if the character matches the first valid char, you set flag to false and continue round the current loop; so it will definitely not match the next valid char and hence you break out of the loop with a true flag. Always.
I would have thought something like this would be more intuitive:
private static final Set<Character> VALID_CHARS = ...;
public boolean isValidPhoneNumber(String number)
{
for (char c : number.toCharArray())
{
if (!VALID_CHARS.contains(c))
{
return false;
}
}
// All characters were valid
return true;
}
This doesn't take sequences into account (e.g. the strings "--------** " and "1" would be valid because all individual characters are valid) but then neither does your original code. A regex is better because it lets you specify the pattern; I supply the above snippet as an example of a clearer way of iterating through the characters.
*Yes, premature optimization is the root of all evil, but when better, cleaner code also happens to be faster that's an extra win for free.

Maybe this is overkill, but with a grammar similar to:
<phone_number> := <area_code><space>*<local_code><space>*<number> |
<area_code><space>*"-"<space>*<local_code><space>*"-"<space>*<number>
<area_code> := <digit><digit><digit> |
"("<digit><digit><digit>")"
<local_code> := <digit><digit><digit>
<number> := <digit><digit><digit><digit>
you can write a recursive descent parser. See this page for an example.
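Because this particular grammar has no recursion, it can also be transcribed directly into a regular expression. The following is a sketch under that reading of the grammar, with <space> taken to mean a literal space character; the class name and constants are made up, and it is not a complete validator for real US numbers.

```java
import java.util.regex.Pattern;

final class UsPhoneGrammarRegex {
    // <area_code> := three digits, optionally wrapped in parentheses
    private static final String AREA = "(?:\\d{3}|\\(\\d{3}\\))";

    // First alternative: parts separated by optional spaces only.
    // Second alternative: parts separated by dashes, with optional spaces around them.
    private static final Pattern PHONE = Pattern.compile(
            AREA + "(?: *\\d{3} *| *- *\\d{3} *- *)\\d{4}");

    static boolean isValid(String number) {
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("555-123-4567"));
        System.out.println(isValid("555 123-4567"));
    }
}
```

As in the grammar, the two separator styles cannot be mixed, so "555 123-4567" is rejected while "555-123-4567" and "(555) 123 4567" are accepted.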

You can check out the Pattern class in Java; it makes working with regular expressions really easy:
https://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html.
