Java: Stringtokenizer To Array - java

Given a polynomial, I'm attempting to write code that orders its terms by degree and adds like terms together. For instance, given
String term = "323x^3+2x+x-5x+5x^2" //Given
What I'd like = "323x^3+5x^2-2x" //result
So far I've tokenized the given polynomial by this...
term = term.replace("+" , "~+");
term = term.replace("-", "~-");
System.out.println(term);
StringTokenizer multiTokenizer = new StringTokenizer(term, "~");
int numberofTokens = multiTokenizer.countTokens();
String[] tokensArray = new String[numberofTokens];
int x=0;
while (multiTokenizer.hasMoreTokens())
{
System.out.println(multiTokenizer.nextToken());
}
Resulting in
323x^3~+2x~+x~-5x~+5x^2
323x^3
+2x
+x
-5x
+5x^2
How would I go about splitting the coefficient from the x value, saving each coefficient in an array, and then putting the degrees in a different array with the same index as its coefficient? I will then use this algorithm to add like terms:
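for (i = 0; i <= biggest_Root; i++) {
    total = 0; // reset the running sum for each degree
    for (j = 0; j < items_in_list; j++)
        if (degree_array[j] == i) // comparison (==), not assignment (=)
            total += b1[j];
    array_of_totals[i] = total;
}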
Any and all help is much appreciated!

You can also update the terms so they all have coefficients:
s/([+-])x/${1}1x/g
So +x^2 becomes +1x^2.
Your individual coefficients can be pulled out with simple regular expressions.
Something like this should suffice:
/([+-]?\d+)x(?!\^)/ // match for x (the lookahead keeps it from also matching x^2, x^3, ...)
/([+-]?\d+)x\^2/ // match for x^2
/([+-]?\d+)x\^3/ // match for x^3
/([+-]?\d+)x\^4/ // match for x^4
Then
sum_of_coefficient[degree] += match
where "match" is the parseInt of the regex match (with a special case where the coefficient is 1 and has no number, e.g. +x)
sum_of_coefficient[3] = 323
sum_of_coefficient[1] = +2+1-5 = -2
sum_of_coefficient[2] = 5
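The per-degree summation above can be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the class name PolySum, the method simplify, and the single combined regex are made up for illustration, coefficients and degrees are integers, the variable is always x, and constant terms (terms without x) are simply ignored.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolySum {
    // One term: optional sign, optional digits, the variable x, optional ^degree.
    private static final Pattern TERM = Pattern.compile("([+-]?)(\\d*)x(?:\\^(\\d+))?");

    public static String simplify(String poly) {
        // Sorted map from degree to summed coefficient, highest degree first.
        Map<Integer, Integer> sums = new TreeMap<>(Comparator.reverseOrder());
        Matcher m = TERM.matcher(poly);
        while (m.find()) {
            int sign = m.group(1).equals("-") ? -1 : 1;
            // A missing number (as in "+x") means a coefficient of 1.
            int coeff = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            // A missing exponent (as in "2x") means degree 1.
            int degree = m.group(3) == null ? 1 : Integer.parseInt(m.group(3));
            sums.merge(degree, sign * coeff, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : sums.entrySet()) {
            int c = e.getValue();
            if (c == 0) continue; // like terms cancelled out
            if (c > 0 && out.length() > 0) out.append('+');
            out.append(c).append('x');
            if (e.getKey() != 1) out.append('^').append(e.getKey());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(simplify("323x^3+2x+x-5x+5x^2")); // 323x^3+5x^2-2x
    }
}
```

Using a map keyed by degree avoids having to know the largest degree up front, which the two-array approach in the question would need.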

Using a "Regular Expression" Pattern to Simplify the Parsing
(and make the code cooler and more concise)
Here is a working example that parses the coefficient, variable and degree for each term, based on the terms you've parsed so far. It just inserts the terms shown in your example into a list of Strings and then processes each string the same way.
This program runs and produces output, and if you like it you can splice it into your program. To try it:
$ javac parse.java
$ java parse
Limitations and Potential Improvements:
Technically speaking the coefficient and degrees could be fractional, so the regular expression could easily be changed to handle those kinds of numbers. And then instead of Integer.parseInt() you could use Float.parseFloat() instead to convert the matched value to a variable you can use.
import java.util.*;
import java.util.regex.*;
public class parse {
public static void main(String args[]) {
/*
* Substitute this List with your own list or
* array from the code you've written already...
*
* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */
List<String>terms = new ArrayList<String>();
terms.add("323x^3");
terms.add("+2x");
terms.add("+x");
terms.add("-5x");
terms.add("+5x^2");
/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */
for (String term : terms) {
System.out.print("Term: " + term + ": \n");
Pattern pattern = Pattern.compile("([+-]*\\d*)([A-Za-z]*)\\^*(\\d*)");
Matcher matcher = pattern.matcher(term);
if (matcher.find()) {
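int coefficient = 1;
String rawCoefficient = matcher.group(1);
if (rawCoefficient.equals("-")) {
    coefficient = -1; // a bare "-" (as in "-x") means a coefficient of -1
} else {
    try {
        coefficient = Integer.parseInt(rawCoefficient);
    } catch (NumberFormatException e) {
        // no digits (e.g. "+x" or "x"): keep the implied coefficient of 1
    }
}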
String variable = matcher.group(2);
int degree = 1;
try {
degree = Integer.parseInt(matcher.group(3));
} catch (Exception e) {}
System.out.println(" coefficient = " + coefficient);
System.out.println(" variable = " + variable);
System.out.println(" degree = " + degree);
/*
* Here, do what you need to do with
* variable, coefficient and degree
*/
}
}
}
}
Explanation of the Regular Expression in the Example Code:
This is the regular expression used:
([+-]*\\d*)([A-Za-z]*)\\^*(\\d*)
Each parenthesized section represents part of the term I want to match and extract into my result. It puts whatever is matched in a group corresponding to the set of parentheses. The first set of parentheses goes into group 1, the second into group 2, etc.
The first matcher (grouped by ( )), is ([+-]*\\d*)
That is designed to match (i.e. extract) the coefficient (if any) and put it into group 1. It expects zero or more occurrences of '+' or '-' characters, followed by zero or more digits. I probably should have written it as [+-]?\\d*, which would match zero or one + or - character.
The next grouped matcher is ([A-Za-z]*) That says match zero or more capital or lowercase letters.
That is trying to extract the variable name, if any and put it into group 2.
Following that, there is an ungrouped \\^*, which matches 0 or more ^ characters. It's not grouped in parenthesis, because we want to account for the ^ character in the text, but not stash it anywhere. We're really interested in the exponent number following it. Note: Two backslashes are how you make one backslash in a Java string. The real world regular expression we're trying to represent is \^*. The reason it's escaped here is because ^ unescaped has special meaning in regular expressions, but we just want to match/allow for the possibility of an actual caret ^ at that position in the algebraic term we're parsing.
The final pattern group is (\\d*). Outside of a string literal, as most regexes in the wild are, that would simply be \d*. It's escaped because, by default, in a regex, an unescaped d means match a literal d at the current position in the text, whereas the escaped \d is a special regex pattern that matches any digit [0-9] (as the Pattern javadoc explains). * means expect (match) zero or more digits at that point. Alternatively, + would mean expect 1 or more digits in the text at the current position, and ? would mean 0 or 1 digits are expected in the text at the current position. So, essentially, the last group is designed to match and extract the exponent (if any) after the optional caret, putting that number into group 3.
Remember the ( ) (parenthesized) groupings are just so that we can extract those areas parsed into separate groups.
If this doesn't all make perfect sense, study regular expressions in general and read the Java Pattern class javadoc online. They are NOT as scary as they first look, and they are an extremely worthwhile study for any programmer, since they carry over to most popular scripting languages and tools. Learn them once and you have an extremely powerful tool for life.

This looks like a homework question, so I won't divulge the entire answer here but here's how I'd get started
public class Polynomial {
private String rawPolynomial;
private int lastTermIndex = 0;
private Map<Integer, Integer> terms = new HashMap<>();
public Polynomial(String poly) {
this.rawPolynomial = poly;
}
public void simplify() {
while(true){
String term = getNextTerm(rawPolynomial);
if ("".equalsIgnoreCase(term)) {
return;
}
Integer degree = getDegree(term);
Integer coeff = getCoefficient(term);
System.out.println(String.format("%dx^%d", coeff, degree));
terms.merge(degree, coeff, Integer::sum);
}
}
private String getNextTerm(String poly) {
...
}
private Integer getDegree(String poly) {
...
}
private Integer getCoefficient(String poly) {
...
}
@Override public String toString() {
return terms.toString();
}
}
and some tests to get you started -
public class PolynomialTest {
@Test public void oneTermPolynomialRemainsUnchanged() {
Polynomial poly = new Polynomial("3x^2");
poly.simplify();
assertTrue("3x^2".equalsIgnoreCase(poly.toString()));
}
}
You should be able to fill in the blanks, hope this helps. I'll be happy to help you further if you're stuck somewhere.

Related

How to pad Strings with Unicode characters in Java

I add right padding to a String to output it in a table format.
for (String[] tuple : testData) {
System.out.format("%-32s -> %s\n", tuple[0], tuple[1]);
}
The result looks like this (random test data):
znZfmOEQ0Gb68taaNU6HY21lvo -> Xq2aGqLedQnTSXg6wmBNDVb
frKweMCH8Kvgyk0J -> lHJ5r7YDV0jTL
NxtHP -> odvPJklwIzZZ
NX2scXjl5dxWmer -> wPDlKCKllVKk
x2HKsSHCqDQ -> RMuWLZ2vaP9sOF0yHmjVysJ
b0hryXKd6b80xAI -> 05MHjvTOxlxq1bvQ8RGe
This approach does not work when there are multi-byte unicode characters:
0OZot🇨🇳ivbyG🧷hZM1FI👡wNhn6r6cC -> OKDxDV1o2NMqXH3VvE7q3uONwEcY5V
fBHRCjU4K8OCdzACmQZSn6WO -> gvGBtUO5a4gPMKj9BKqBHFKx1iO7
cDUh🇲🇺b0cXkLWkS -> SZX
WtP9t -> Q0wWOeY3W66mM5rcQQYKpG
va4d🍷u8SS -> KI
a71?⚖TZ💣🧜‍♀🕓ws5J -> b8A
As you can see, the alignment is off.
My idea was to calculate the difference between the length of the String and the number of bytes used and use that to offset the padding, something like this:
int correction = tuple[0].getBytes().length - tuple[0].length();
And then instead of padding to 32 chars, I would pad to 32 + correction. However, this didn't work either.
Here is my test code (using emoji-java, but the behaviour should be reproducible with any unicode characters):
import java.util.Collection;
import org.apache.commons.lang3.RandomStringUtils;
import com.vdurmont.emoji.Emoji;
import com.vdurmont.emoji.EmojiManager;
public class Test {
public static void main(String[] args) {
// create random test data
String[][] testData = new String[15][2];
for (String[] tuple : testData) {
tuple[0] = RandomStringUtils.randomAlphanumeric(2, 32);
tuple[1] = RandomStringUtils.randomAlphanumeric(2, 32);
}
// add some emojis
Collection<Emoji> all = EmojiManager.getAll();
for (String[] tuple : testData) {
for (int i = 1; i < tuple[0].length(); i++) {
if (Math.random() > 0.90) {
Emoji emoji = all.stream().skip((int) (all.size() * Math.random())).findFirst().get();
tuple[0] = tuple[0].substring(0, i - 1) + emoji.getUnicode() + tuple[0].substring(i + 1);
}
}
}
// output
for (String[] tuple : testData) {
System.out.format("%-32s -> %s\n", tuple[0], tuple[1]);
}
}
}
There are actually a few issues here, other than that some fonts display the flag wider than the other characters. I assume that you want to count the Chinese flag as a single character (as it is drawn as a single element on the screen).
The String class reports an incorrect length
The String class works with chars, which are 16-bit UTF-16 code units. The problem is that not all code points fit in 16 bits: only code points from the Basic Multilingual Plane (BMP) fit in a single char, and all others take two chars (a surrogate pair). String's length() method returns the number of chars, not the number of code points.
Now String's codePointCount method may help in this case: it counts the number of code points in the given index range. So calling it with 0 as the first argument and string.length() as the second returns the total count of code points.
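As a rough illustration of the difference between length() and codePointCount, here is a small sketch that pads by code points instead of chars. The class name PadDemo and the helper padRight are hypothetical names chosen for this example. It assumes every code point occupies one column, which is not true for multi-code-point sequences such as flags, nor for terminals that render emoji double-width.

```java
public class PadDemo {
    // Pads s on the right with spaces until it spans `width` code points.
    public static String padRight(String s, int width) {
        int codePoints = s.codePointCount(0, s.length());
        StringBuilder sb = new StringBuilder(s);
        for (int i = codePoints; i < width; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uD83C\uDF77" is U+1F377 (wine glass): one code point, two chars.
        String wine = "va4d\uD83C\uDF77u8SS";
        System.out.println(wine.length());                         // 10
        System.out.println(wine.codePointCount(0, wine.length())); // 9
        System.out.println(padRight(wine, 32) + " -> KI");
    }
}
```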
Combining characters
However, there's another problem. The 🇨🇳 Chinese flag, for example, consists of two Unicode code points: the Regional Indicator Symbol Letters C (🇨, U+1F1E8) and N (🇳, U+1F1F3). Those two code points are combined into a flag of China. This is a problem you are not going to solve with the codePointCount method.
The Regional Indicator Symbol Letters seem to be a special case. Two of those characters can be combined into a national flag. I am not aware of a standard way to achieve what you want. You may have to take that into account manually.
I've written a small program to get the length of a string.
static int length(String str) {
String a = "\uD83C\uDDE6";
String z = "\uD83C\uDDFF";
Pattern p = Pattern.compile("[" + a + "-" + z + "]{2}");
Matcher m = p.matcher(str);
int count = 0;
while (m.find()) {
count++;
}
return str.codePointCount(0, str.length()) - count;
}
As discussed in the comments on the question linked to by @Xehpuk, in this discussion on kotlinlang.org, and in this blog post by Daniel Lemire, the following seems to be correct:
The problem is that the java String class represents characters as
UTF-16 characters. This means any unicode character that is
represented by more than 16 bits is saved as 2 separate Char values.
This fact is ignored by many of the functions within String, eg.
String.length does not return the number of unicode characters, it
returns the number of 16bit characters within the String, some emoji
counting for 2 characters.
The behaviour, however, seems to be implementation-specific.
As David mentions in his post, you could try the following to get the correct length:
tuple[0].codePointCount(0, tuple[0].length())
See code point methods from Java SE docs

I need a regex that matches numbers depending on a variable

I'm having some problems trying to find a regex for my code. Here it is:
Scanner key = new Scanner(System.in);
//this is the variable
int s = 4;
String input = "";
String bregex = "[1-9][0-9]{1," + (s*s) + "}";
boolean cfgmatch = false;
while(cfgmatch == false){
input = key.next();
Pattern cfgbp = Pattern.compile(bregex);
Matcher bm = cfgbp.matcher(input);
if(bm.matches()){
System.out.println("working");
}
else{
System.out.println("not working");
}
}
I'm trying to make a regex to restrict the cell number on a board. The cell number can't be higher than the board's space, which is "s*s".
Example: If board's size is 4, the input can be from 1 to 16, if it's 5, from 1 to 25, etc...
Board size can only be from 1 to 9.
I've written that while to ask for another number in case of failing the input.
Be Careful with Regular Expressions
While a regular expression could potentially work for this, regular expressions are designed for pattern matching rather than arithmetic comparisons. Your current regular expression limits the number of digits to s*s, not the value, so it doesn't define the range you are looking for:
// If s = 4, this regular expression matches a digit from 1-9 followed by 1 to 16 more digits,
// so it allows any value from 10 to 99999999999999999 as opposed to the 1-16 you are expecting
String bregex = "[1-9][0-9]{1,16}";
Consider a Simpler Approach
You may be better off avoiding a regex if you are going to compare your input numerically to another value (i.e. is this number at most x):
// Is your number between 1 and the largest possible cell number?
int value = Integer.parseInt(input); // throws NumberFormatException for non-numeric input
if (value >= 1 && value <= s*s) {
// Valid
}
else {
// Invalid
}
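Put together, the numeric check could look something like the sketch below. The class name CellInput and the method isValidCell are hypothetical; the sketch assumes the input is a single token such as the one returned by Scanner.next() and treats anything that is not an integer as invalid.

```java
public class CellInput {
    // Returns true if input is an integer between 1 and s*s inclusive.
    public static boolean isValidCell(String input, int s) {
        try {
            int value = Integer.parseInt(input.trim());
            return value >= 1 && value <= s * s;
        } catch (NumberFormatException e) {
            // Not a number at all, or too large to fit in an int.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidCell("16", 4));  // true
        System.out.println(isValidCell("17", 4));  // false
        System.out.println(isValidCell("abc", 4)); // false
    }
}
```

In the question's loop, this would replace the Pattern/Matcher check, and the loop would end once the method returns true.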

Java regex patterns conjunction pattern

Is there any way to get a Pattern that is the conjunction of two others, such that a String matches it only if it matches both of the other two?
Some math:
S — set of strings
P — set of patterns (where each pattern has one or more string representation (e.g. “[0-9]” and “\d” are the same pattern))
Sᵢ — subset of strings (Sᵢ ⊂ S) that match pᵢ pattern (where instead of i could be any index).
In equation form: “Sᵢ = {s | s ∈ S, s matches pᵢ, pᵢ ∈ P}”, which means: “Sᵢ is a set of elements that are strings and match pᵢ pattern”.
Or another notation: “Sᵢ ⊂ S, ∀pᵢ ∈ P ∀s ∈ S (s matches pᵢ ≡ s ∈ Sᵢ)”, which means: “Sᵢ is subset of strings and any string is element of Sᵢ if it matches pᵢ pattern”.
Let's define conjunction of patterns: “p₁ ∧ p₂ = p₃ ≡ S₁ ∩ S₂ = S₃”, which means: “Set of strings that match conjunction of patterns p₁ and p₂ is intersection of sets of strings that match p₁ pattern and that match p₂ pattern”.
Assuming you want exact matches (that is, tomato does not match omat), then you need to wrap each p_i between (?=^(?: and )$), then join them.
If you want inexact matches (tomato does match omat), then you need to wrap each p_i between (?=.*?(?: and )), then join them. Note that in this case, there is the potential for catastrophic backtracking.
In both cases, you can add .* after joining if you want to eat the word (remember, lookaheads match the empty string).
Explanation
In the exact case, the outside is wrapped in a lookahead so that no characters are eaten. Inside are anchors ^ and $ (this provides the exactness) surrounding a non-capturing group. The non-capturing group is so if you have an or expression at the upper level in one of p_i, the anchors apply to the entire group rather than to the first expression.
The inexact case is exactly the same, except instead of anchoring, we eat characters until we get to the match position.
You can see a detailed example on www.debuggex.com.
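The exact-match wrapping described above can be sketched in Java like this. The class name PatternAnd and the method allOf are hypothetical, and the sketch assumes the input patterns contain no numbered backreferences (those would need renumbering once the patterns are joined) and no inline flags.

```java
import java.util.regex.Pattern;

public class PatternAnd {
    // Builds (?=^(?:p1)$)(?=^(?:p2)$)....* so that a string must match
    // every given pattern in full.
    public static Pattern allOf(String... patterns) {
        StringBuilder sb = new StringBuilder();
        for (String p : patterns) {
            sb.append("(?=^(?:").append(p).append(")$)");
        }
        sb.append(".*"); // lookaheads consume nothing, so eat the input here
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        // Lowercase letters only AND exactly three characters long.
        Pattern p = allOf("[a-z]+", ".{3}");
        System.out.println(p.matcher("abc").matches()); // true
        System.out.println(p.matcher("ab").matches());  // false
        System.out.println(p.matcher("ab1").matches()); // false
    }
}
```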
private static String combineRE(String p1, String p2){
int groups1 = 0, groups2=0;
StringBuilder newP = new StringBuilder("(?=");
newP.append(p1);
newP.append("$)(?=");
Pattern capturingGroup = Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\((?!\\?)");
Matcher m = capturingGroup.matcher(p1);
while(m.find()) groups1 ++;
m = capturingGroup.matcher(p2);
while(m.find()) groups2 ++;
String new2 = p2;
for(int i=1; i<=groups2; i++)
new2 = new2.replaceAll("(?<!\\\\)\\\\"+i, "\\\\" + (i+groups1));
newP.append(new2);
newP.append("$).*");
return newP.toString();
}
This function uses the basic structure (?=p1$)(?=p2$).*, while renumbering the numbered backreferences in the second pattern. It uses a regular expression to count the number of capturing group openers (unescaped (s not followed by a ?) in each pattern, then updates the backreferences in the second pattern before placing it in the resultant pattern. I've set up a testing environment with ideone. Please add all the test cases you can think of, but I think this answers your question.
http://ideone.com/Wm8cRc
Round 2:
There's no good way to generate a Pattern that will find() a substring to match two patterns. I toyed briefly with (?=p1(?<p2)).*(?<(?=p1)p2) and other such nonsense before giving up and writing, instead, an algorithm. First, I slightly modified my CombineRE from before:
private static String combineRE(String p1, String p2, boolean anchors){
int groups1 = 0, groups2=0;
StringBuilder newP = new StringBuilder((anchors)?"^(?=":"(?=");
newP.append(p1);
if (anchors) newP.append('$');
newP.append(")(?=");
Pattern capturingGroup = Pattern.compile("(?<!\\\\)(\\\\\\\\)*\\((?!\\?)");
Matcher m = capturingGroup.matcher(p1);
while(m.find()) groups1 ++;
m = capturingGroup.matcher(p2);
while(m.find()) groups2 ++;
String new2 = p2;
for(int i=1; i<=groups2; i++)
new2 = new2.replaceAll("(?<!\\\\)\\\\"+i, "\\\\" + (i+groups1));
newP.append(new2);
if (anchors) newP.append('$');
newP.append(')');
if (anchors) newP.append(".*");
return newP.toString();
}
You'll see that it now supports optional anchors. I used this functionality in my new function:
private static String[] findAllCombinedRE(String p1, String p2, String haystack, boolean overlap){
ArrayList<String> toReturn = new ArrayList<String>();
Pattern pCombo = Pattern.compile(combineRE(p1,p2, false));
String pComboMatch = combineRE(p1,p2, true);
Matcher m = pCombo.matcher(haystack);
int s = 0;
while (m.find(s)){
String match = haystack.substring(m.start());
s = m.start()+1;
for (int i=match.length(); i>0; i--){
String sMatch = match.substring(0,i);
if (Pattern.matches(pComboMatch, sMatch)){
toReturn.add(sMatch);
/**
* Note that at this point we can calculate match-object-like
* information:
*
* group() = sMatch;
* start() = m.start();
* end() = m.start() + i;
*
* If it so suited us, we could pass this information
* back in a wrapped object.
*/
if (!overlap){
s = m.start()+i;
break;
}
}
}
}
return toReturn.toArray(new String[]{});
}
It uses an anchor free version of the two Regular Expressions to find all strings that might match, then chops off one letter at a time until the string matches the anchored version. It also includes a boolean to control for overlapping matches.
http://ideone.com/CBoBN5 Works pretty well.

Java regex and pattern matching: finding "blanks" in pattern which do not include them?

So, I need to write a compiler scanner for homework, and thought it'd be "elegant" to use regex. The fact is, I have seldom used them before, and it was a long time ago, so I forgot most of the stuff about them and needed to have a look around. I used them successfully for the identifiers (or at least I think so, I still need to do some further tests but for now they all look ok), but I have a problem with number recognition.
The function nextCh() reads the next character on the input (lookahead char). What I'd like to do here is to check if this char matches the regex [0-9]*. I append every matching char in the str field of my current token, then I read the int value of this field. It recognizes a single number input such as "123", but the problem I have is that for the input "123 456", the final str will be "123 456" while I should get 2 separate tokens with fields "123" and "456". Why is the " " being matched?
private void readNumber(Token t) {
t.str = "" + ch; // force conversion char --> String
final Pattern pattern = Pattern.compile("[0-9]*");
nextCh(); // get next char and check if it is a digit
Matcher match = pattern.matcher("" + ch);
while (match.find() && ch != EOF) {
t.str += ch;
nextCh();
match = pattern.matcher("" + ch);
}
t.kind = Kind.number;
try {
int value = Integer.parseInt(t.str);
t.val = value;
} catch(NumberFormatException e) {
error(t, Message.BIG_NUM, t.str);
}
}
Thank you!
PS: I did solve my problem using the code below. Nevertheless, I'd like to understand where the flaw is in my regex expression.
t.str = "" + ch;
nextCh(); // get next char and check if it is a number
while (ch>='0' && ch<='9') {
t.str += ch;
nextCh();
}
t.kind = Kind.number;
try {
int value = Integer.parseInt(t.str);
t.val = value;
} catch(NumberFormatException e) {
error(t, Message.BIG_NUM, t.str);
}
EDIT: turns out my regex also doesn't work for the identifiers recognition (again, includes blanks), so I had to switch to a system similar to my "solution" (while with a lot of conditions). Guess I'll need to study the regex again :O
I'm not 100% sure whether this is relevant in your case, but this:
Pattern.compile("[0-9]*");
matches zero or more digits anywhere in the string, because of the asterisk. I think the space gets matched because it is a match for 'zero digits'. If you wanted to make sure the char was a digit, you would have to match one or more, using the plus sign:
Pattern.compile("[0-9]+");
or, since you are only comparing a single char at a time, just match one digit:
Pattern.compile("^[0-9]$");
You should be using the matches method rather than the find method. From the documentation:
The matches method attempts to match the entire input sequence against the pattern
The find method scans the input sequence looking for the next subsequence that matches the pattern.
So in other words, find reports a match if any part of the input matches the pattern, and because [0-9]* can match the empty string, find succeeds on every input, including a single space. If you use matches, the entire string must match the pattern.
For example, try this:
Pattern p = Pattern.compile("[0-9]*");
Matcher m123abc = p.matcher("123 abc");
System.out.println(m123abc.matches()); // prints false
System.out.println(m123abc.find()); // prints true
Use a simpler regex like
/\d+/
Where
\d means a digit
+ means one or more
In code:
final Pattern pattern = Pattern.compile("\\d+");
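To make the difference concrete, here is a small sketch that pulls the numbers out of a whole input string with \d+ and find(), and also shows why [0-9]* reports a match on a space. The class name NumberScan and the method numbers are hypothetical, and the sketch works on a complete String rather than on the character-at-a-time nextCh() scanner from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberScan {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Returns every maximal run of digits in the input, in order.
    public static List<String> numbers(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("123 456")); // [123, 456]

        // [0-9]* can match the empty string, so find() succeeds on a space.
        Pattern star = Pattern.compile("[0-9]*");
        System.out.println(star.matcher(" ").find());    // true
        System.out.println(star.matcher(" ").matches()); // false
    }
}
```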

Splitting input string for a calculator

I'm trying to split the input given by the user for my calculator.
For example,
if the user inputs "23+45*(1+1)" I want to this to be split into [23,+,45,*,(,1,+,1,)].
What you're looking for is called a lexer. A lexer splits up input into chunks (called tokens) that you can read.
Fortunately, your lexer is pretty simple and can be written by hand. For more complicated lexers, you can use flex (as in "The Fast Lexical Analyzer"--not Adobe Flex), or (since you're using Java) ANTLR (note, ANTLR is much more than just a lexer).
Simply come up with a list of regular expressions, one for each token to match (note that since your input is so simple, you can probably do away with this list and merge them all into one single regex. However, for more advanced lexers, it helps to do one regex for each token) e.g.
\d+
\+
-
\*
/
\(
\)
Then start a loop: while there are more characters to be parsed, go through each of your regular expressions and attempt to match them against the beginning of the string. If they match, add the first matched group to your list of input. Otherwise, continue matching (if none of them match, tell the user they have a syntax error).
Pseudocode:
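List<String> input = new LinkedList<String>();
while (userInputString.length() > 0) {
    boolean matched = false;
    for (final Pattern p : myRegexes) {
        final Matcher m = p.matcher(userInputString);
        if (m.find()) {
            input.add(m.group());
            //Remove the token we found from the user's input string so that we
            //can match the rest of the string against our regular expressions.
            userInputString = userInputString.substring(m.group().length());
            matched = true;
            break;
        }
    }
    if (!matched) {
        //None of the regexes matched: report a syntax error instead of looping forever.
        throw new IllegalArgumentException("Syntax error near: " + userInputString);
    }
}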
Implementation notes:
You may want to prepend the ^ character to all of your regular expressions. This makes sure you anchor your matches against the beginning of the string. My pseudocode assumes you have done this.
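For input this simple, the per-token regexes can be merged into a single one, as mentioned earlier. A minimal sketch follows, assuming only non-negative integers, the four operators and parentheses, and no whitespace in the input. The class name CalcLexer and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CalcLexer {
    // A run of digits, or a single operator or parenthesis character.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/()]");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict matching to the unread part; lookingAt() anchors
            // the match at the start of that region.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Syntax error at index " + pos);
            }
            tokens.add(m.group());
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("23+45*(1+1)")); // [23, +, 45, *, (, 1, +, 1, )]
    }
}
```

Moving the matcher's region forward avoids creating a new substring for every token, which the substring-based loop above does.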
I think using stacks to split the operand and operator and evaluate the expression would be more appropriate. In the calculator we generally use Infix notation to define the arithmetic expression.
Operand1 op Operand2
Check the Shunting-yard algorithm used in many such cases to parse the mathematical expression. This is also a good read.
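As a rough sketch of the idea, the version below uses two stacks to evaluate an already tokenized infix expression directly, which is a common variant of shunting-yard (the classic form emits postfix output first). The class name ShuntingYard and its methods are hypothetical, and the sketch assumes well-formed input with integer operands, the four binary operators, and balanced parentheses.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ShuntingYard {
    private static int precedence(String op) {
        return (op.equals("*") || op.equals("/")) ? 2 : 1;
    }

    private static boolean isOperator(String t) {
        return t.length() == 1 && "+-*/".contains(t);
    }

    // Pops two operands, applies op, and pushes the result.
    private static void apply(Deque<Long> values, String op) {
        long right = values.pop();
        long left = values.pop();
        switch (op) {
            case "+": values.push(left + right); break;
            case "-": values.push(left - right); break;
            case "*": values.push(left * right); break;
            default:  values.push(left / right); break; // integer division
        }
    }

    public static long evaluate(List<String> tokens) {
        Deque<Long> values = new ArrayDeque<>();
        Deque<String> ops = new ArrayDeque<>();
        for (String t : tokens) {
            if (isOperator(t)) {
                // Left-associative: first apply pending operators of
                // equal or higher precedence.
                while (!ops.isEmpty() && isOperator(ops.peek())
                        && precedence(ops.peek()) >= precedence(t)) {
                    apply(values, ops.pop());
                }
                ops.push(t);
            } else if (t.equals("(")) {
                ops.push(t);
            } else if (t.equals(")")) {
                while (!ops.peek().equals("(")) {
                    apply(values, ops.pop());
                }
                ops.pop(); // discard the "("
            } else {
                values.push(Long.parseLong(t));
            }
        }
        while (!ops.isEmpty()) {
            apply(values, ops.pop());
        }
        return values.pop();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("23", "+", "45", "*", "(", "1", "+", "1", ")");
        System.out.println(evaluate(tokens)); // 113
    }
}
```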
This might be a little sloppy, because I am learning still, but it does split them into strings.
import java.util.ArrayList;
import java.util.Scanner;
public class TestClass {
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
ArrayList<String> separatedInput = new ArrayList<String>();
String input = "";
System.out.print("Values: ");
input = sc.next();
if (input.length() != 0)
{
String numbers = "";
for (int i = 0; i < input.length(); i++)
{
    char ch = input.charAt(i);
    if (Character.isDigit(ch))
    { numbers = numbers + ch; } // keep collecting the digits of the current number
    else
    {
        // an operator or parenthesis ends the current number, if there is one
        if (!numbers.isEmpty())
        { separatedInput.add(numbers); numbers = ""; }
        separatedInput.add(String.valueOf(ch));
    }
}
// flush a number that ends the input
if (!numbers.isEmpty())
{ separatedInput.add(numbers); }
}
System.out.println(separatedInput);
}
}
