Splitting and reading from a string in Java

I've written code that works like a calculator, except that it solves cryptarithmetic equations. It works fine with the basic operations + - * /.
Now I have added power and root operations, and it doesn't work when I use either of them. The problem seems to be the way I split the input string: it doesn't split the string on the "^" operator. Here is the code where the problem occurs:
private void findOperator() {
// TODO Auto-generated method stub
String[] tempString = this.rawInputString.split("");
for(String s : tempString){
if(s.equals("+")){
this.operator = "[+]";
break;
}
else if(s.equals("*")){
this.operator = "[*]";
break;
}
else if(s.equals("-")){
this.operator = s;
break;
}
else if(s.equals("/")){
this.operator = s;
break;
}
else if(s.equals("^")){
this.operator = s;
break;
}
else if(s.equals("sqrt")){
this.operator = s;
break;
}
}
}
public void parseInput(){
String[] tempString = rawInputString.split(this.operator);
this.firstString = tempString[0].split("");
this.firstLetterFirstNumber = this.firstString[0];
String temporarySecondPart = tempString[1];//This is where it says I
//have the problem, but it works fine
//with other operators
this.operator = rawInputString.substring(this.firstString.length,this.firstString.length+1);
tempString = temporarySecondPart.split("=");
this.secondString = tempString[0].split("");
this.firstLetterSecondNUmber = this.secondString[0];
this.result = tempString[1].split("");
this.firstLetterResult = this.result[0];
}

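split takes a regular expression (regex) as its argument. Some characters have a special meaning in regex (we call them metacharacters), and ^ is one of them. It usually represents the start of the string, or, as the first character of a character class, negates the class: [^a-z] matches any character that is not in the range a to z. Because "^" on its own only matches the empty position at the start of the input, split("^") never splits anything, so you get a one-element array and tempString[1] throws an ArrayIndexOutOfBoundsException.
If you want ^ to be treated as a simple literal, you need to escape it:
split("\\^")
but a safer way is to let the regex engine do the escaping for you. To do so, use
split(Pattern.quote("^"))
or, in your case,
split(Pattern.quote(operator)).
Note that this assumes operator holds the plain symbol, such as "+" or "*". findOperator() currently stores "[+]" and "[*]", and Pattern.quote("[+]") would search for the literal text [+].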
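A minimal sketch of the three variants side by side (the class name CaretSplit and the sample input AB^C=DEF are made up for illustration). The unescaped pattern returns the whole input as a single element; the hand-escaped and quoted patterns split on a literal caret:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CaretSplit {
    // Unescaped: the regex "^" is the start-of-input anchor, a zero-width match
    // before the first character, so nothing is actually split.
    static String[] splitRaw(String s) {
        return s.split("^");
    }

    // Escaped by hand: the Java literal "\\^" is the regex \^, a literal caret.
    static String[] splitEscaped(String s) {
        return s.split("\\^");
    }

    // Pattern.quote turns any operator string into a literal pattern.
    static String[] splitQuoted(String s, String operator) {
        return s.split(Pattern.quote(operator));
    }

    public static void main(String[] args) {
        String input = "AB^C=DEF";
        System.out.println(Arrays.toString(splitRaw(input)));              // [AB^C=DEF]
        System.out.println(Arrays.toString(splitEscaped(input)));          // [AB, C=DEF]
        System.out.println(Arrays.toString(splitQuoted(input, "^")));      // [AB, C=DEF]
        System.out.println(Arrays.toString(splitQuoted("AB*C=DEF", "*"))); // [AB, C=DEF]
    }
}
```

The quoted form is the one to prefer when the operator comes from a variable, since it works the same for +, *, ^ or any other symbol.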

You are doing some weird jumping through hoops in that code.
findOperator() splits rawInputString into 1-character strings, then searches for the first +, *, -, /, or ^ and assigns it to this.operator as a regex. (The sqrt branch can never match, because split("") only produces 1-character strings.)
You then split rawInputString using that regex. Why?
You just found it in findOperator(), so you know exactly where it is.
Then you begin splitting, and splitting, and splitting...
All that, when all you want to do is parse a string a op b = c?
And you seem to want to save it all in fields:
firstString: a, as a String[] of 1-character strings
operator: op
secondString: b, as a String[] of 1-character strings
result: c, as a String[] of 1-character strings
firstLetterFirstNumber: the first 1-character string in firstString
firstLetterSecondNUmber: the first 1-character string in secondString
firstLetterResult: the first 1-character string in result
And there is no error handling whatsoever, so you get an ArrayIndexOutOfBoundsException instead of a meaningful error.
Just use one regular expression, and all your values are ready for you.
Using toCharArray() then gives you the 1-character values as a char[].
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String rawInputString = "3√343=7";
String regex = "(.+?)([-+*/^√])(.+?)=(.+)";
Matcher m = Pattern.compile(regex).matcher(rawInputString);
if (! m.matches())
throw new IllegalArgumentException("Bad input: " + rawInputString);
char[] firstString = m.group(1).toCharArray();
String operator = m.group(2);
char[] secondString = m.group(3).toCharArray();
char[] result = m.group(4).toCharArray();
char firstLetterFirstNumber = firstString[0];
char firstLetterSecondNUmber = secondString[0];
char firstLetterResult = result[0];
System.out.println("firstString = " + Arrays.toString(firstString));
System.out.println("operator = " + operator);
System.out.println("secondString = " + Arrays.toString(secondString));
System.out.println("result = " + Arrays.toString(result));
OUTPUT
firstString = [3]
operator = √
secondString = [3, 4, 3]
result = [7]
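If you want to keep the question's spelled-out sqrt operator as well as √, one option is to put sqrt in an alternation in front of the character class. This is a sketch that assumes operands are uppercase letters or digits, so they can never contain the text "sqrt"; EquationParser and parse are illustrative names, not part of the original code:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EquationParser {
    // "sqrt" is tried before the one-character operators, so the word form is
    // captured as a single operator token. \u221A is the root sign.
    static final Pattern EQUATION =
            Pattern.compile("(.+?)(sqrt|[-+*/^\u221A])(.+?)=(.+)");

    // Returns {first operand, operator, second operand, result}.
    static String[] parse(String input) {
        Matcher m = EQUATION.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad input: " + input);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("AB^C=DEF")));   // [AB, ^, C, DEF]
        System.out.println(Arrays.toString(parse("CsqrtABC=D"))); // [C, sqrt, ABC, D]
    }
}
```

Because the first group is lazy, the match stops at the first operator it finds, which is the same behavior as the single-character version above.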

Try this regex:
String abc = "a+b-c*d^f";
String reg = "((?<=[<=|>=|==|\\+|\\*|\\-|<|>|/|=|\\^])|(?=[<=|>=|==|\\+|\\*|\\-|<|>|/|=|\\^]))";
String [] arr = abc.split(reg); //split your String according to Expression
for(String obj : arr)
System.out.println(obj);
Your output will be:
a
+
b
-
c
*
d
^
f
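Note: your complete mathematical expression is split into an array of strings at every operator character. Be aware that inside [...] the | and two-character sequences such as <= are not alternatives: a character class matches single characters, so <= is split into < and =, and a literal | in the input would also be treated as a delimiter.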
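A character class can only ever match one character, so the lookaround split cannot keep <= together. A hedged alternative, assuming operands are simply runs of any non-operator characters, is to match tokens with an alternation instead of splitting. OperatorTokens, LOOKAROUND and tokenize are hypothetical names for this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperatorTokens {
    // The lookaround regex from the answer above, copied verbatim.
    static final String LOOKAROUND =
            "((?<=[<=|>=|==|\\+|\\*|\\-|<|>|/|=|\\^])|(?=[<=|>=|==|\\+|\\*|\\-|<|>|/|=|\\^]))";

    // Alternation (not a character class): two-character operators come first so
    // they win over their one-character prefixes; the last branch is an operand.
    static final Pattern TOKEN =
            Pattern.compile("<=|>=|==|[-+*/^<>=]|[^-+*/^<>=]+");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expr);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a<=b".split(LOOKAROUND))); // [a, <, =, b]
        System.out.println(tokenize("a<=b+c^d"));                       // [a, <=, b, +, c, ^, d]
    }
}
```

Listing the longer operators first matters: alternation tries branches left to right, so "<" alone would otherwise match before "<=" gets a chance.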


Someone please explain this to me

Can someone please help? How exactly can I take a string and break it up into its numbers and operators?
For example, with (41-25), how can I pull out 41 or 25 instead of getting a separate 4 and 1?
Whenever I enter a double, it registers each single digit, including the period, rather than the whole number.
static double evaluate(String expr){
//Add code below
Stack<String> operators = new Stack<String>();
Stack<Double> values = new Stack<Double>();
String[] temp = expr.split("");
Double value = 0.0;
for(int i=0; i< temp.length;i++){
if(temp[i].equals("(")) continue;
else if(temp[i].equals("+")) operators.push(temp[i]);
else if(temp[i].equals("-")) operators.push(temp[i]);
else if(temp[i].equals("*")) operators.push(temp[i]);
else if(temp[i].equals("/")) operators.push(temp[i]);
else if(temp[i].equals(")")) {
String ops = operators.pop();
value = values.pop();
value = operate(values.pop(), value, ops);
System.out.println(value);
values.push(value);
}
else{
System.out.println(temp[i]);
Double current = Double.parseDouble(temp[i]);
values.push(current);
}
}
return 0;
}
I would split the string before and after any operator rather than splitting every character:
static double evaluate(String expr){
//Add code below
...
String[] temp = expr.split("((?<=[\\+\\-\\*\\/])|(?=[\\+\\-\\*\\/]))"); // Changes "41-25" to ["41", "-", "25"]
This uses a regex to split the string with a positive lookbehind (?<=) and a positive lookahead (?=), each containing a character set with the four operators you need, [\+\-\*\/] (the operators are escaped with a backslash, and inside a Java string literal each backslash must itself be doubled).
The string is split before and after every operator. If you need more operators, they can be added to the character set.
With Java you could even make your character set a String to remove duplicate code by putting:
String operators = "\\+\\-\\*/"; // "\\+-\\*" would form an illegal character range from + to *
String[] temp = expr.split("((?<=[" + operators + "])|(?=[" + operators + "]))");
This method enables you to change what operators to split on easily.
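The operator set above does not include parentheses, so "(41-25)" would come out as "(41", "-", "25)". A minimal sketch that also treats ( and ) as tokens and feeds them into the same two-stack loop as the question's evaluate() is below. It assumes Java 8 or later (no empty leading token when the input starts with "("), fully parenthesized input without unary minus, and a hypothetical operate helper standing in for the one the question does not show. Unlike the question's version, it returns the final value instead of 0:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class ParenEvaluator {
    // Operators and parentheses; each one becomes its own token.
    static final String DELIMS = "\\+\\-\\*/()";
    // Split just after or just before any delimiter (zero-width lookarounds).
    static final String SPLIT_REGEX = "((?<=[" + DELIMS + "])|(?=[" + DELIMS + "]))";

    static String[] tokenize(String expr) {
        // Remove whitespace first so "( 8 / 2 )" tokenizes like "(8/2)".
        return expr.replaceAll("\\s+", "").split(SPLIT_REGEX);
    }

    // Hypothetical stand-in for the question's operate(...) helper.
    static double operate(double left, double right, String op) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            case "/": return left / right;
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    // Two-stack evaluation of a fully parenthesized expression such as "((1+2)*3)".
    static double evaluate(String expr) {
        Deque<String> operators = new ArrayDeque<>();
        Deque<Double> values = new ArrayDeque<>();
        for (String token : tokenize(expr)) {
            if (token.equals("(")) {
                continue;
            } else if (token.matches("[-+*/]")) {
                operators.push(token);
            } else if (token.equals(")")) {
                double right = values.pop();
                double left = values.pop();
                values.push(operate(left, right, operators.pop()));
            } else {
                values.push(Double.parseDouble(token)); // whole numbers such as 41 or 2.5
            }
        }
        return values.pop();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("(41-25)"))); // [(, 41, -, 25, )]
        System.out.println(evaluate("(41-25)"));                  // 16.0
        System.out.println(evaluate("((2.5*4)+1)"));              // 11.0
    }
}
```

ArrayDeque is used instead of Stack because its push/pop behave as a stack and it is the collection the JDK documentation recommends for that purpose.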

Java regex - how to chop String into parts [duplicate]

I have a multiline string which is delimited by a set of different delimiters:
(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
I can split this string into its parts, using String.split, but it seems that I can't get the actual string, which matched the delimiter regex.
In other words, this is what I get:
Text1
Text2
Text3
Text4
This is what I want
Text1
DelimiterA
Text2
DelimiterC
Text3
DelimiterB
Text4
Is there any JDK way to split the string using a delimiter regex but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
The last one is what you want.
(?<=;) matches the empty position just after each ; and (?=;) the empty position just before it, so ((?<=;)|(?=;)) splits on both sides of every ;.
EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to create a variable whose name describes what the regular expression does. You can even put in placeholders (e.g. %1$s) and use Java's String.format to replace them with the actual string you need; for example:
public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public void someMethod() {
final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
...
}
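One caveat with the WITH_DELIMITER template: the delimiter is pasted into the regex as-is, so a metacharacter such as |, + or ^ changes its meaning (an unescaped | even makes both lookarounds match everywhere). A small sketch, assuming you want literal delimiters, wraps the delimiter in Pattern.quote first; splitKeeping is just an illustrative name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WithDelimiter {
    // Same template as above: split just after (lookbehind) or just before (lookahead) the delimiter.
    public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Quote the delimiter first so metacharacters such as | + . ^ are matched literally.
    static String[] splitKeeping(String text, String literalDelimiter) {
        return text.split(String.format(WITH_DELIMITER, Pattern.quote(literalDelimiter)));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a;b;c;d", ";"))); // [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitKeeping("a|b|c", "|")));   // [a, |, b, |, c]
        System.out.println(Arrays.toString(splitKeeping("2^3^4", "^")));   // [2, ^, 3, ^, 4]
    }
}
```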
You want to use lookarounds, and split on zero-width matches. Here are some examples:
public class SplitNDump {
static void dump(String[] arr) {
for (String s : arr) {
System.out.format("[%s]", s);
}
System.out.println();
}
public static void main(String[] args) {
dump("1,234,567,890".split(","));
// "[1][234][567][890]"
dump("1,234,567,890".split("(?=,)"));
// "[1][,234][,567][,890]"
dump("1,234,567,890".split("(?<=,)"));
// "[1,][234,][567,][890]"
dump("1,234,567,890".split("(?<=,)|(?=,)"));
// "[1][,][234][,][567][,][890]"
dump(":a:bb::c:".split("(?=:)|(?<=:)"));
// "[][:][a][:][bb][:][:][c][:]" (Java 7 and earlier; since Java 8 the leading [] is dropped)
dump(":a:bb::c:".split("(?=(?!^):)|(?<=:)"));
// "[:][a][:][bb][:][:][c][:]"
dump(":::a::::b b::c:".split("(?=(?!^):)(?<!:)|(?!:)(?<=:)"));
// "[:::][a][::::][b b][::][c][:]"
dump("a,bb:::c d..e".split("(?!^)\\b"));
// "[a][,][bb][:::][c][ ][d][..][e]"
dump("ArrayIndexOutOfBoundsException".split("(?<=[a-z])(?=[A-Z])"));
// "[Array][Index][Out][Of][Bounds][Exception]"
dump("1234567890".split("(?<=\\G.{4})"));
// "[1234][5678][90]"
// Split at the end of each run of letter
dump("Boooyaaaah! Yippieeee!!".split("(?<=(?=(.)\\1(?!\\1))..)"));
// "[Booo][yaaaa][h! Yipp][ieeee][!!]"
}
}
And yes, that is a triply nested assertion in the last pattern.
Related questions
Java split is eating my characters.
Can you use zero-width matching regex in String split?
How do I convert CamelCase into human-readable names in Java?
Backreferences in lookbehind
See also
regular-expressions.info/Lookarounds
A very naive solution that doesn't involve regex would be to do a string replace on your delimiter along these lines (assuming a comma delimiter):
fullString.replace(",", "~,~")
where you can replace the tilde (~) with an appropriate unique marker.
If you then split on the new marker, I believe you will get the desired result.
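A minimal sketch of this replace-then-split idea in actual Java, assuming ~ never occurs in the input (the class and method names are made up for the example). A delimiter at the very start yields a leading empty string, because the marker there is a real one-character match:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ReplaceThenSplit {
    // Assumption: MARKER never occurs in the input text; otherwise the result is wrong.
    static final String MARKER = "~";

    static String[] splitKeeping(String text, String delimiter) {
        // Surround every delimiter with the marker (plain text replace, no regex).
        String marked = text.replace(delimiter, MARKER + delimiter + MARKER);
        // Pattern.quote keeps the marker literal even if it is a regex metacharacter.
        return marked.split(Pattern.quote(MARKER));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a,b,c", ","))); // [a, ,, b, ,, c]
        System.out.println(Arrays.toString(splitKeeping(",a", ",")));    // [, ,, a]
    }
}
```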
import java.util.regex.*;
import java.util.LinkedList;
public class Splitter {
private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");
private Pattern pattern;
private boolean keep_delimiters;
public Splitter(Pattern pattern, boolean keep_delimiters) {
this.pattern = pattern;
this.keep_delimiters = keep_delimiters;
}
public Splitter(String pattern, boolean keep_delimiters) {
this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
}
public Splitter(Pattern pattern) { this(pattern, true); }
public Splitter(String pattern) { this(pattern, true); }
public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
public Splitter() { this(DEFAULT_PATTERN); }
public String[] split(String text) {
if (text == null) {
text = "";
}
int last_match = 0;
LinkedList<String> splitted = new LinkedList<String>();
Matcher m = this.pattern.matcher(text);
while (m.find()) {
splitted.add(text.substring(last_match,m.start()));
if (this.keep_delimiters) {
splitted.add(m.group());
}
last_match = m.end();
}
splitted.add(text.substring(last_match));
return splitted.toArray(new String[splitted.size()]);
}
public static void main(String[] argv) {
if (argv.length != 2) {
System.err.println("Syntax: java Splitter <pattern> <text>");
return;
}
Pattern pattern = null;
try {
pattern = Pattern.compile(argv[0]);
}
catch (PatternSyntaxException e) {
System.err.println(e);
return;
}
Splitter splitter = new Splitter(pattern);
String text = argv[1];
int counter = 1;
for (String part : splitter.split(text)) {
System.out.printf("Part %d: \"%s\"\n", counter++, part);
}
}
}
/*
Example:
> java Splitter "\W+" "Hello World!"
Part 1: "Hello"
Part 2: " "
Part 3: "World"
Part 4: "!"
Part 5: ""
*/
I don't really like the other way, where you get an empty element in front and back. A delimiter is usually not at the beginning or at the end of the string, thus you most often end up wasting two good array slots.
Edit: Fixed limit cases. Commented source with test cases can be found here: http://snippets.dzone.com/posts/show/6453
Pass true as the third argument. The delimiters will then be returned as tokens too.
new StringTokenizer(str, delimiters, true);
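For example (a sketch; the tokens helper is only for illustration), each character of the delimiter string acts as a separate one-character delimiter, and with returnDelims set to true each delimiter comes back as its own token:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerWithDelims {
    // Each character of 'delims' is a separate one-character delimiter (not a regex).
    static List<String> tokens(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims, true); // true: return delimiters as tokens
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a+b-c", "+-"));             // [a, +, b, -, c]
        System.out.println(tokens("Hello, world. Hi!", ",.!")); // [Hello, ,,  world, .,  Hi, !]
    }
}
```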
I know this is a very old question and an answer has already been accepted, but I would still like to submit a very simple answer to the original question. Consider this code:
String str = "Hello-World:How\nAre You&doing";
String[] inputs = str.split("(?!^)\\b");
for (int i=0; i<inputs.length; i++) {
System.out.println("a[" + i + "] = \"" + inputs[i] + '"');
}
OUTPUT:
a[0] = "Hello"
a[1] = "-"
a[2] = "World"
a[3] = ":"
a[4] = "How"
a[5] = "
"
a[6] = "Are"
a[7] = " "
a[8] = "You"
a[9] = "&"
a[10] = "doing"
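I am just using the word boundary \b as the split point, except at the start of the text. (Since Java 8 the (?!^) guard is no longer strictly needed, because a zero-width match at the start never produces a leading empty string.)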
I got here late, but returning to the original question, why not just use lookarounds?
Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
System.out.println(Arrays.toString(p.split("'ab','cd','eg'")));
System.out.println(Arrays.toString(p.split("boo:and:foo")));
output:
[', ab, ',', cd, ',', eg, ']
[boo, :, and, :, foo]
EDIT: What you see above is what appears on the command line when I run that code, but I now see that it's a bit confusing. It's difficult to keep track of which commas are part of the result and which were added by Arrays.toString(). SO's syntax highlighting isn't helping either. In the hope of getting the highlighting to work with me instead of against me, here's how those arrays would look if I were declaring them in source code:
{ "'", "ab", "','", "cd", "','", "eg", "'" }
{ "boo", ":", "and", ":", "foo" }
I hope that's easier to read. Thanks for the heads-up, @finnw.
I had a look at the above answers and, honestly, I find none of them satisfactory. What you want to do is essentially mimic Perl's split functionality. Why Java doesn't allow this, and didn't have a join() method somewhere (Java 8 later added String.join), is beyond me, but I digress. You don't even really need a class for this; it's just a function. Run this sample program:
Some of the earlier answers have excessive null-checking, which I recently wrote about in response to a question here:
https://stackoverflow.com/users/18393/cletus
Anyway, the code:
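import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Split {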
public static List<String> split(String s, String pattern) {
assert s != null;
assert pattern != null;
return split(s, Pattern.compile(pattern));
}
public static List<String> split(String s, Pattern pattern) {
assert s != null;
assert pattern != null;
Matcher m = pattern.matcher(s);
List<String> ret = new ArrayList<String>();
int start = 0;
while (m.find()) {
ret.add(s.substring(start, m.start()));
ret.add(m.group());
start = m.end();
}
ret.add(start >= s.length() ? "" : s.substring(start));
return ret;
}
private static void testSplit(String s, String pattern) {
System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
List<String> tokens = split(s, pattern);
System.out.printf("Found %d matches%n", tokens.size());
int i = 0;
for (String token : tokens) {
System.out.printf(" %d/%d: '%s'%n", ++i, tokens.size(), token);
}
System.out.println();
}
public static void main(String args[]) {
testSplit("abcdefghij", "z"); // "abcdefghij"
testSplit("abcdefghij", "f"); // "abcde", "f", "ghij"
testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
}
}
I like the idea of StringTokenizer because it is Enumerable.
But it is also a legacy class, superseded by String.split, which returns a plain String[] (and does not include the delimiters).
So I implemented a StringTokenizerEx, which is an Iterable and takes a true regexp to split a string.
A true regexp means it is not a 'character sequence' repeated to form the delimiter:
'o' will only match 'o', and splits 'ooo' into three delimiters, with two empty strings between them:
[o], '', [o], '', [o]
But the regexp o+ will return the expected result when splitting "aooob"
[], 'a', [ooo], 'b', []
To use this StringTokenizerEx:
final StringTokenizerEx aStringTokenizerEx = new StringTokenizerEx("boo:and:foo", "o+");
final String firstDelimiter = aStringTokenizerEx.getDelimiter();
for(String aString: aStringTokenizerEx )
{
// uses the split String detected and memorized in 'aString'
final String nextDelimiter = aStringTokenizerEx.getDelimiter();
}
The code of this class is available at DZone Snippets.
As usual for a code-challenge response (one self-contained class with test cases included), copy-paste it (in a 'src/test' directory) and run it. Its main() method illustrates the different usages.
Note: (late 2009 edit)
The article Final Thoughts: Java Puzzler: Splitting Hairs does a good job explaining the bizarre behavior of String.split().
Josh Bloch even commented in response to that article:
Yes, this is a pain. FWIW, it was done for a very good reason: compatibility with Perl.
The guy who did it is Mike "madbot" McCloskey, who now works with us at Google. Mike made sure that Java's regular expressions passed virtually every one of the 30K Perl regular expression tests (and ran faster).
Google's Guava library also contains a Splitter, which is:
simpler to use
maintained by Google (and not by you)
So it may be worth checking out. From their initial rough documentation (pdf):
JDK has this:
String[] pieces = "foo.bar".split("\\.");
It's fine to use this if you want exactly what it does:
- regular expression
- result as an array
- its way of handling empty pieces
Mini-puzzler: ",a,,b,".split(",") returns...
(a) "", "a", "", "b", ""
(b) null, "a", null, "b", null
(c) "a", null, "b"
(d) "a", "b"
(e) None of the above
Answer: (e) None of the above.
",a,,b,".split(",")
returns
"", "a", "", "b"
Only trailing empties are skipped! (Who knows the workaround to prevent the skipping? It's a fun one...)
In any case, our Splitter is simply more flexible: The default behavior is simplistic:
Splitter.on(',').split(" foo, ,bar, quux,")
--> [" foo", " ", "bar", " quux", ""]
If you want extra features, ask for them!
Splitter.on(',')
.trimResults()
.omitEmptyStrings()
.split(" foo, ,bar, quux,")
--> ["foo", "bar", "quux"]
Order of config methods doesn't matter -- during splitting, trimming happens before checking for empties.
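The workaround the puzzler alludes to is the two-argument String.split(regex, limit): a negative limit keeps trailing empty strings. A small sketch (class and method names are illustrative):

```java
import java.util.Arrays;

public class KeepTrailingEmpties {
    // split(regex) uses limit 0: trailing empty strings are discarded.
    static String[] defaultSplit(String s, String regex) {
        return s.split(regex);
    }

    // A negative limit applies the pattern as often as possible and keeps
    // every piece, including trailing empty strings.
    static String[] keepAll(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit(",a,,b,", ","))); // [, a, , b]
        System.out.println(Arrays.toString(keepAll(",a,,b,", ",")));      // [, a, , b, ]
    }
}
```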
Here is a simple, clean implementation that is consistent with Pattern#split and works with variable-length patterns, which lookbehind cannot support; it is also easier to use. It is similar to the solution provided by @cletus.
public static String[] split(CharSequence input, String pattern) {
return split(input, Pattern.compile(pattern));
}
public static String[] split(CharSequence input, Pattern pattern) {
Matcher matcher = pattern.matcher(input);
int start = 0;
List<String> result = new ArrayList<>();
while (matcher.find()) {
result.add(input.subSequence(start, matcher.start()).toString());
result.add(matcher.group());
start = matcher.end();
}
if (start != input.length()) result.add(input.subSequence(start, input.length()).toString());
return result.toArray(new String[0]);
}
I don't do null checks here: Pattern#split doesn't, so why should I? I don't like the if at the end, but it is required for consistency with Pattern#split. Otherwise I would append unconditionally, which would leave an empty string as the last element of the result whenever the input ends with the pattern.
I convert to String[] for consistency with Pattern#split, I use new String[0] rather than new String[result.size()], see here for why.
Here are my tests:
@Test
public void splitsVariableLengthPattern() {
String[] result = Split.split("/foo/$bar/bas", "\\$\\w+");
Assert.assertArrayEquals(new String[] { "/foo/", "$bar", "/bas" }, result);
}
@Test
public void splitsEndingWithPattern() {
String[] result = Split.split("/foo/$bar", "\\$\\w+");
Assert.assertArrayEquals(new String[] { "/foo/", "$bar" }, result);
}
@Test
public void splitsStartingWithPattern() {
String[] result = Split.split("$foo/bar", "\\$\\w+");
Assert.assertArrayEquals(new String[] { "", "$foo", "/bar" }, result);
}
@Test
public void splitsNoMatchesPattern() {
String[] result = Split.split("/foo/bar", "\\$\\w+");
Assert.assertArrayEquals(new String[] { "/foo/bar" }, result);
}
I will post my working versions too (the first is really similar to Markus's).
public static String[] splitIncludeDelimeter(String regex, String text){
List<String> list = new LinkedList<>();
Matcher matcher = Pattern.compile(regex).matcher(text);
int now, old = 0;
while(matcher.find()){
now = matcher.end();
list.add(text.substring(old, now));
old = now;
}
if(list.size() == 0)
return new String[]{text};
//adding rest of a text as last element
String finalElement = text.substring(old);
list.add(finalElement);
return list.toArray(new String[list.size()]);
}
And here is a second solution, which is around 50% faster than the first one:
public static String[] splitIncludeDelimeter2(String regex, String text){
List<String> list = new LinkedList<>();
Matcher matcher = Pattern.compile(regex).matcher(text);
StringBuffer stringBuffer = new StringBuffer();
while(matcher.find()){
matcher.appendReplacement(stringBuffer, Matcher.quoteReplacement(matcher.group())); // quote so $ and \ in the match are copied literally
list.add(stringBuffer.toString());
stringBuffer.setLength(0); //clear buffer
}
matcher.appendTail(stringBuffer); // add the rest of the string
list.add(stringBuffer.toString());
return list.toArray(new String[list.size()]);
}
Another candidate solution using a regex. Retains token order, correctly matches multiple tokens of the same type in a row. The downside is that the regex is kind of nasty.
package javaapplication2;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaApplication2 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
String num = "58.5+variable-+98*78/96+a/78.7-3443*12-3";
// Terrifying regex:
// (a)|(b)|(c) match a or b or c
// where
// (a) is one or more digits optionally followed by a decimal point
// followed by one or more digits: (\d+(\.\d+)?)
// (b) is one of the set + * / - occurring once: ([+*/-])
// (c) is a sequence of one or more lowercase Latin letters: ([a-z]+)
Pattern tokenPattern = Pattern.compile("(\\d+(\\.\\d+)?)|([+*/-])|([a-z]+)");
Matcher tokenMatcher = tokenPattern.matcher(num);
List<String> tokens = new ArrayList<>();
while (!tokenMatcher.hitEnd()) {
if (tokenMatcher.find()) {
tokens.add(tokenMatcher.group());
} else {
// report error
break;
}
}
System.out.println(tokens);
}
}
Sample output:
[58.5, +, variable, -, +, 98, *, 78, /, 96, +, a, /, 78.7, -, 3443, *, 12, -, 3]
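One thing to be aware of with this loop: find() scans forward, so a character that matches none of the branches (a space, a #) is silently skipped rather than reaching the "report error" branch. A stricter sketch anchors each match at the current position with region() and lookingAt() and reports the first bad index. The whitespace branch is an added assumption about what should be ignored, and StrictTokenizer is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictTokenizer {
    // Same token kinds as above, plus a whitespace branch that is matched but dropped.
    static final Pattern TOKEN = Pattern.compile("\\s+|\\d+(\\.\\d+)?|[+*/-]|[a-z]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Restrict the matcher to the unread tail and require a token right at its start.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            if (!Character.isWhitespace(m.group().charAt(0))) {
                tokens.add(m.group());
            }
            pos = m.end(); // end() is an index into the whole input, not the region
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("58.5+variable-+98*78")); // [58.5, +, variable, -, +, 98, *, 78]
        System.out.println(tokenize("a / 78.7 - 3"));         // [a, /, 78.7, -, 3]
        try {
            tokenize("2#3");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unexpected character '#' at index 1
        }
    }
}
```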
I don't know of an existing function in the Java API that does this (which is not to say it doesn't exist), but here's my own implementation (one or more delimiters will be returned as a single token; if you want each delimiter to be returned as a separate token, it will need a bit of adaptation):
static String[] splitWithDelimiters(String s) {
if (s == null || s.length() == 0) {
return new String[0];
}
LinkedList<String> result = new LinkedList<String>();
StringBuilder sb = null;
boolean wasLetterOrDigit = !Character.isLetterOrDigit(s.charAt(0));
for (char c : s.toCharArray()) {
if (Character.isLetterOrDigit(c) ^ wasLetterOrDigit) {
if (sb != null) {
result.add(sb.toString());
}
sb = new StringBuilder();
wasLetterOrDigit = !wasLetterOrDigit;
}
sb.append(c);
}
result.add(sb.toString());
return result.toArray(new String[0]);
}
I suggest using Pattern and Matcher, which will almost certainly achieve what you want. Your regular expression will need to be somewhat more complicated than what you are using in String.split.
I don't think it is possible with String#split, but you can use a StringTokenizer, though that won't let you define your delimiter as a regex, only as a set of single characters:
new StringTokenizer("Hello, world. Hi!", ",.!", true); // true for returnDelims
If you can afford to, use Java's String.replace(CharSequence target, CharSequence replacement) to insert a different delimiter, and split on that.
Example:
I want to split the string "boo:and:foo" and keep each ':' at the start of the piece to its right.
String str = "boo:and:foo";
str = str.replace(":","newdelimiter:");
String[] tokens = str.split("newdelimiter");
Important note: This only works if you have no further "newdelimiter" in your String! Thus, it is not a general solution.
But if you know a CharSequence of which you can be sure that it will never appear in the String, this is a very simple solution.
Quick answer: split on zero-width boundaries such as \b. I will experiment to see whether it works (I have used it in PHP and JS).
It is possible, and sort of works, but it may split too much. It really depends on the string you want to split and the result you need. Give more details and we can help you better.
Another way is to do your own split, capturing the delimiter (supposing it is variable) and adding it afterward to the result.
My quick test:
String str = "'ab','cd','eg'";
String[] stra = str.split("\\b");
for (String s : stra) System.out.print(s + "|");
System.out.println();
Result:
'|ab|','|cd|','|eg|'|
A bit too much... :-)
I tweaked Pattern.split() so that it also adds each matched delimiter to the list.
Added
// add match to the list
matchList.add(input.subSequence(start, end).toString());
Full source
public static String[] inclusiveSplit(String input, String re, int limit) {
int index = 0;
boolean matchLimited = limit > 0;
// Count only the text segments, so the kept delimiters don't use up the limit
int segments = 0;
ArrayList<String> matchList = new ArrayList<String>();
Pattern pattern = Pattern.compile(re);
Matcher m = pattern.matcher(input);
// Add segments before each match found
while (m.find()) {
int end = m.end();
if (!matchLimited || segments < limit - 1) {
int start = m.start();
String match = input.subSequence(index, start).toString();
matchList.add(match);
segments++;
// add match to the list
matchList.add(input.subSequence(start, end).toString());
index = end;
} else if (segments == limit - 1) { // last one
String match = input.subSequence(index, input.length())
.toString();
matchList.add(match);
segments++;
index = end;
}
}
// If no match was found, return this
if (index == 0)
return new String[] { input.toString() };
// Add remaining segment
if (!matchLimited || segments < limit)
matchList.add(input.subSequence(index, input.length()).toString());
// Construct result
int resultSize = matchList.size();
if (limit == 0)
while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
resultSize--;
String[] result = new String[resultSize];
return matchList.subList(0, resultSize).toArray(result);
}
Here's a Groovy version based on some of the code above, in case it helps. It's short, anyway. It conditionally includes the head and tail (if they are not empty). The last part is a demo/test case.
List splitWithTokens(str, pat) {
def tokens=[]
def lastMatch=0
def m = str=~pat
while (m.find()) {
if (m.start() > 0) tokens << str[lastMatch..<m.start()]
tokens << m.group()
lastMatch=m.end()
}
if (lastMatch < str.length()) tokens << str[lastMatch..<str.length()]
tokens
}
[['<html><head><title>this is the title</title></head>',/<[^>]+>/],
['before<html><head><title>this is the title</title></head>after',/<[^>]+>/]
].each {
println splitWithTokens(*it)
}
An extremely naive and inefficient solution, which works nevertheless: use split twice on the string and then interleave the two arrays.
String temp[]=str.split("\\W");
String temp2[]=str.split("\\w||\\s");
int i=0;
for(String string:temp)
System.out.println(string);
String temp3[]=new String[temp.length-1];
for(String string:temp2)
{
System.out.println(string);
if((string.equals("")!=true)&&(string.equals("\\s")!=true))
{
temp3[i]=string;
i++;
}
// System.out.println(temp.length);
// System.out.println(temp2.length);
}
System.out.println(temp3.length);
String[] temp4=new String[temp.length+temp3.length];
int j=0;
for(i=0;i<temp.length;i++)
{
temp4[j]=temp[i];
j=j+2;
}
j=1;
for(i=0;i<temp3.length;i++)
{
temp4[j]=temp3[i];
j+=2;
}
for(String s:temp4)
System.out.println(s);
Another option is to surround every operator and parenthesis with a marker character (~), collapse doubled markers, and then split on the marker:
String expression = "((A+B)*C-D)*E";
expression = expression.replaceAll("\\+", "~+~");
expression = expression.replaceAll("\\*", "~*~");
expression = expression.replaceAll("-", "~-~");
expression = expression.replaceAll("/", "~/~");
expression = expression.replaceAll("\\(", "~(~"); //also you can use [(] instead of \\(
expression = expression.replaceAll("\\)", "~)~"); //also you can use [)] instead of \\)
expression = expression.replaceAll("~~", "~");
if(expression.startsWith("~")) {
expression = expression.substring(1);
}
String[] expressionArray = expression.split("~");
System.out.println(Arrays.toString(expressionArray));
One of the subtleties in this question involves the "leading delimiter" question: if you are going to have a combined array of tokens and delimiters you have to know whether it starts with a token or a delimiter. You could of course just assume that a leading delim should be discarded but this seems an unjustified assumption. You might also want to know whether you have a trailing delim or not. This sets two boolean flags accordingly.
Written in Groovy but a Java version should be fairly obvious:
String tokenRegex = /[\p{L}\p{N}]+/ // a String in Groovy, Unicode alphanumeric
def finder = phraseForTokenising =~ tokenRegex
// NB in Groovy the variable 'finder' is then of class java.util.regex.Matcher
def finderIt = finder.iterator() // extra method added to Matcher by Groovy magic
int start = 0
boolean leadingDelim, trailingDelim
def combinedTokensAndDelims = [] // create an array in Groovy
while( finderIt.hasNext() )
{
def token = finderIt.next()
int finderStart = finder.start()
String delim = phraseForTokenising[ start .. finderStart - 1 ]
// Groovy: above gets slice of String/array
if( start == 0 ) leadingDelim = finderStart != 0
if( start > 0 || leadingDelim ) combinedTokensAndDelims << delim
combinedTokensAndDelims << token // add element to end of array
start = finder.end()
}
// start == 0 indicates no tokens found
if( start > 0 ) {
// finish by seeing whether there is a trailing delim
trailingDelim = start < phraseForTokenising.length()
if( trailingDelim ) combinedTokensAndDelims << phraseForTokenising[ start .. -1 ]
println( "leading delim? $leadingDelim, trailing delim? $trailingDelim, combined array:\n $combinedTokensAndDelims" )
}
If you want to keep the delimiter character, you can use a loophole in the .split() method: pass a limit and re-append the character yourself.
See this example:
public class SplitExample {
public static void main(String[] args) {
String str = "Javathomettt";
System.out.println("method 1");
System.out.println("Returning words:");
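String[] arr = str.split("t", 40); // a positive limit keeps the trailing empty strings
for (int i = 0; i < arr.length; i++) {
// re-attach the "t" that split removed, except after the last piece
System.out.println(i < arr.length - 1 ? arr[i] + "t" : arr[i]);
}
System.out.println("Split array length: "+arr.length);
System.out.println("method 2");
System.out.println(str.replaceAll("t", "\n"+"t"));
}
}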
I don't know Java too well, but if you can't find a Split method that does that, I suggest you just make your own.
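static String[] mySplit(String s, String delimiter)
{
// Pattern.quote makes the delimiter literal; limit -1 keeps a trailing empty piece,
// so a delimiter at the very end of s is not lost
String[] result = s.split(Pattern.quote(delimiter), -1);
for(int i=0;i<result.length-1;i++)
{
result[i] += delimiter; //this adds the delimiter to the end of each item except the last one,
//you can modify it however you want
}
return result;
}
String[] res = mySplit(myString,myDelimiter);
It's not too elegant, but it'll do.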

How to split a string by every other separator

There's a string
String str = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
How do I split it into strings like these?
"ggg;ggg;"
"nnn;nnn;"
"aaa;aaa;"
"xxx;xxx;"
Using Regex
String input = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
Pattern p = Pattern.compile("([a-z]{3});\\1;");
Matcher m = p.matcher(input);
while (m.find())
// m.group(0) is the result
System.out.println(m.group(0));
Will output
ggg;ggg;
nnn;nnn;
aaa;aaa;
xxx;xxx;
I assume that you only want to check whether the last segment is similar, not every segment that has been read.
If that is not the case then you would probably have to use an ArrayList instead of a Stack.
I also assumed that each segment has the format /([a-z])\1\1/.
If that is not the case either then you should change the if statement with:
(stack.peek().substring(0,index).equals(temp))
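import java.util.Stack;
// Splits text on 'split', merging consecutive segments that start with the same character
public static Stack<String> splitString(String text, char split) {
    Stack<String> stack = new Stack<String>();
    int index = text.indexOf(split);
    while (index != -1) {
        String temp = text.substring(0, index);
        // merge with the previous segment if both start with the same letter
        // (empty segments, e.g. from ";;", are never merged)
        if (!stack.isEmpty() && !stack.peek().isEmpty() && !temp.isEmpty()
                && stack.peek().charAt(0) == temp.charAt(0)) {
            temp = stack.pop() + split + temp;
        }
        stack.push(temp);
        text = text.substring(index + 1);
        index = text.indexOf(split);
    }
    return stack;
}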
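Split and join them (using Guava's Splitter, Iterables.partition and Joiner):
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.List;
public class SplitAndJoin {
    public static void main(String[] args) {
        String data = "ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
        String del = ";";
        int splitSize = 2;
        List<String> parts = new ArrayList<>();
        // omitEmptyStrings() drops the empty token after the trailing ';'
        for (List<String> group : Iterables.partition(Splitter.on(del).omitEmptyStrings().split(data), splitSize)) {
            parts.add(Joiner.on(del).join(group) + del); // e.g. "ggg;ggg;"
        }
        System.out.println(parts); // [ggg;ggg;, nnn;nnn;, aaa;aaa;, xxx;xxx;]
    }
}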
Ref: Split a String at every 3rd comma in Java
Use split with a regex:
String data="ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;";
String [] array=data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S);");
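\S: a non-whitespace character.
\G: the end of the previous match (or the start of the input). Anchoring the look-behind to it means a ";" is split on only when exactly one "xxx;xxx" pair lies between it and the previous split. It does not compare the two halves with each other.
?<=: positive look-behind, so the ";" is matched only when that pair precedes it.
The ";" that is split on is consumed, so the pieces are "ggg;ggg", "nnn;nnn", and so on, without the trailing semicolon.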
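The split above consumes the semicolon it splits on, so the pieces lose their trailing ";". If you want the pieces exactly as in the question ("ggg;ggg;"), a zero-width variant of the same \G idiom should work. This is a sketch that, like the original, assumes every segment is three non-whitespace characters; the class and method names are illustrative.

```java
import java.util.Arrays;

public class EveryOtherSeparator {
    // Zero-width split: a split point is taken only where the text since \G
    // (the end of the previous match, or the start of the input) is exactly one
    // "xxx;xxx;" pair. Assumes every segment is three non-whitespace characters.
    static String[] pairs(String data) {
        return data.split("(?<=\\G\\S\\S\\S;\\S\\S\\S;)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pairs("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;")));
        // [ggg;ggg;, nnn;nnn;, aaa;aaa;, xxx;xxx;]
    }
}
```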
Here is another answer, which only works for your specific example input.
In your example, there are two regularities:
All patterns seem to have exactly three characters
All patterns occur exactly twice
In other words: if those two properties really hold for all your input, you could avoid splitting altogether, because you know exactly what to find at each position of your string.
Of course, the other answers, which do real splitting, are more flexible; but in theory you could just go ahead and use a series of substring calls to access all elements directly.
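Under exactly those two assumptions (three-character segments, each occurring twice, so every group is 8 characters long), the substring approach can be sketched like this; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class FixedWidthSplit {
    // Assumes the input is made of equal-width chunks (here 8 chars: "ggg;ggg;").
    // A shorter leftover at the end, if any, is ignored.
    static List<String> chunks(String s, int width) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start + width <= s.length(); start += width) {
            parts.add(s.substring(start, start + width));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunks("ggg;ggg;nnn;nnn;aaa;aaa;xxx;xxx;", 8));
        // [ggg;ggg;, nnn;nnn;, aaa;aaa;, xxx;xxx;]
    }
}
```

If a segment is ever longer or shorter than three characters, every chunk after it is misaligned, which is why the splitting answers are the safer choice.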

Java, calculating difference between unique characters in strings

Let's say I have 2 strings and I need to calculate the difference between their unique characters. It's simple:
String s1 = "abcd";
String s2 = "aaaacccbbf";
//answer: 1
The answer is 1, because there is no "f" in s1 variable.
But what about characters like மா or 漢字, or any other non-ASCII character? If I loop through those strings, one character like கு is counted as 2-3 separate characters, giving me the wrong answer:
String s1 = "ab";
String s2 = "aaaகுb";
//answer: 2 (wrong!)
The code I tried:
import java.util.Scanner;
class a {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String s1 = sc.nextLine();
        String s2 = sc.nextLine();
        sc.close();
        // collect each char of s2 that does not occur in s1, once
        String missingCharacters = "";
        for (char c : s2.toCharArray()) {
            if (!missingCharacters.contains(c + "") && !s1.contains(c + ""))
                missingCharacters += c;
        }
        System.out.println(missingCharacters.length());
    }
}
Your symbol கு is a compound form in Tamil script, written as two Unicode chars: the consonant க (U+0B95) and the vowel sign ு (U+0BC1), which together represent க் + உ. If you plan to work with Tamil script, you have to match such combined characters with a pattern:
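import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s1 = "ab";
String s2 = "aaaகுb";
// \p{L}\p{M}* = one letter plus any combining marks, so கு is matched as one unit
Pattern pattern = Pattern.compile("\\p{L}\\p{M}*");
Matcher matcher = pattern.matcher(s2);
Set<String> missingCharacters = new TreeSet<>();
while (matcher.find()) {
    missingCharacters.add(matcher.group());
}
// remove every unit that also occurs in s1
matcher = pattern.matcher(s1);
while (matcher.find()) {
    missingCharacters.remove(matcher.group());
}
System.out.println(missingCharacters.size()); // 1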
Regex source:
How to Match a Single Unicode Grapheme
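import java.util.HashSet;
import java.util.Set;
// Walk s2 by code point (not char) so characters outside the BMP, which take
// two chars, are handled as one; note that கு is still two code points
Set<Integer> missing = new HashSet<>();
for (int i = 0; i < s2.length();) {
    int codePoint = s2.codePointAt(i);
    if (s1.indexOf(codePoint) == -1) {
        missing.add(codePoint); // the Set takes care of duplicates
    }
    i += Character.charCount(codePoint);
}
System.out.println(missing.size()); // 1 for s1 = "abcd", s2 = "aaaacccbbf"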
கு is a special character: it is formed by combining க and ு, so it is made of 2 different chars and has no single char value. You are looping over the chars in s2, so you will never see that character as a whole.
String.substring() and String.charAt() both work on chars (UTF-16 code units), so they can't get around this by themselves.
Conclusion: char-based methods alone can't do this; you need to group each letter with its combining marks, for example with a \p{L}\p{M}* regex.
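To see what the chars actually are, here is a small sketch that writes கு with \u escapes (to avoid source-encoding issues), prints its char count, code-point count and code-point names, and then groups it with the same \p{L}\p{M}* idea. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TamilUnits {
    // A letter followed by any number of combining marks
    private static final Pattern LETTER_WITH_MARKS = Pattern.compile("\\p{L}\\p{M}*");

    static List<String> letterUnits(String s) {
        List<String> units = new ArrayList<>();
        Matcher m = LETTER_WITH_MARKS.matcher(s);
        while (m.find()) {
            units.add(m.group());
        }
        return units;
    }

    public static void main(String[] args) {
        String ku = "\u0B95\u0BC1"; // TAMIL LETTER KA + TAMIL VOWEL SIGN U
        System.out.println(ku.length());                       // 2 chars
        System.out.println(ku.codePointCount(0, ku.length())); // 2 code points
        ku.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp, Character.getName(cp)));
        System.out.println(letterUnits(ku).size());            // 1 letter-with-marks unit
    }
}
```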

Split a mathematical expression while handling negative numbers with Java

I'm working on an expression calculator in Java. I decided to first write code for conversion to postfix and then write a reverse Polish notation calculator. So far my calculator works great and can handle any expression including the operators + - * / %.
The problem I'm having, however, is that it splits the expression on spaces with input.split(" "), which means the expression must be entered as ( 4 + ( 2 * ( -2 - 1 ) ) ) * 1.5 when it should be possible to enter it as (4+(2*(-2-1)))*1.5.
After hours of tinkering, I now know it can't work with regex alone, but would it be possible to write a for loop that looks at two tokens of the string at a time, and if both are operators, assumes the second must be a negative value? Or, if the equation starts with an operator, that it must be a negative value? And iterate through the string like this until the second operator reaches the end of the expression?
Here is some code I have been playing with trying to make a start at this but since I'm still quite new to programming I can't seem to get it to work.
String expression = "(4+(2*(-2--16)))*-1.5";
ArrayList<String> tokens = new ArrayList<String>();
String orig = null;
String regex = "[-+/*()]+";
String first = Character.toString(expression.charAt(0));
tokens.add(first);
for (int i = 0; i < expression.length(); i++) {
char x = expression.charAt(i);
String a = Character.toString(x);
if (i >= 1){ //Check i is greater than or equal to 1
char y = expression.charAt(i-1);
String b = Character.toString(y);
if(b.matches(regex) && x == '-'){
orig = a;
}else if(orig != null && orig.equals("-")){
System.out.println(orig + a);
tokens.add(orig + a);
orig = null;
}else{
tokens.add(a);
}
}
}
for(String t:tokens){
System.out.print(t+" ");
}
Thanks for any help, Ciaran.
Edit:
My question is: how can I write a method that splits a mathematical expression and, while splitting, tells the difference between '-' as a binary operator and '-' as a unary operator? Am I on the right track with the idea of iterating through the string and comparing two tokens at a time? – Ciaran Ashton
What I am trying to achieve
I want to turn String expression = (4+(2*(-2-1))) into String[] expression = (, 4, +, (, 2, *, (, -2, -, 1, ), ), )
This is a job for a proper parser generator. The best known ones in the Java world are JavaCC and Antlr. I like to use JFlex paired with JavaCC.
What's nice about them is that you can give tokens a different meaning based on the context. So a minus can mean one thing in one place and something different in another.
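If you'd rather not bring in a parser generator, the same idea (the meaning of '-' depends on what came before it) can be sketched by hand. This is a minimal hand-written scanner, not JavaCC or Antlr; the class and method names are made up for illustration, and it only treats '-' as a sign when an operand is expected and a digit follows.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextTokenizer {
    // Hypothetical helper: splits an arithmetic expression into number, operator
    // and parenthesis tokens, treating '-' as a sign when an operand is expected.
    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            String prev = tokens.isEmpty() ? null : tokens.get(tokens.size() - 1);
            // An operand is expected at the start, after an operator, or after '('
            boolean operandExpected = prev == null || prev.matches("[-+*/%^(]");
            boolean signedNumber = c == '-' && operandExpected
                    && i + 1 < expr.length() && Character.isDigit(expr.charAt(i + 1));
            if (Character.isDigit(c) || c == '.' || signedNumber) {
                int start = i++; // consume the sign or the first digit
                while (i < expr.length()
                        && (Character.isDigit(expr.charAt(i)) || expr.charAt(i) == '.')) {
                    i++;
                }
                tokens.add(expr.substring(start, i));
            } else {
                tokens.add(String.valueOf(c)); // operator or parenthesis
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(4+(2*(-2--16)))*-1.5"));
        // [(, 4, +, (, 2, *, (, -2, -, -16, ), ), ), *, -1.5]
    }
}
```

It does not handle a unary minus in front of a parenthesis, such as -(3+1): that '-' is emitted as an operator, and a later parsing stage would have to interpret it.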
Using a parser is the better solution, but to answer your question as you asked it, you can use this regex, which will pretty much do what you want (not 100% but comes close):
(?<=[\(\)\+\-*\/\^A-Za-z])|(?=[\(\)\+\-*\/\^A-Za-z])
So, you will have to escape it and use it like this:
String input = ...;
String temp[] = input.split("(?<=[\\(\\)\\+\\-*\\/\\^A-Za-z])|(?=[\\(\\)\\+\\-*\\/\\^A-Za-z])");
System.out.println(Arrays.toString(temp));
Input:
7+4-(18/3)/2a^222+1ab
Output:
[7, +, 4, -, (, 18, /, 3, ), /, 2, a, ^, 222, +, 1, a, b]
See it in action here:
http://rubular.com/r/uHAObPwaln
http://ideone.com/GLFmo4
This may solve your problem and similar ones, although I have not tested it thoroughly on a variety of data. The approach is that whenever a unary minus appears in a fully parenthesized expression, it is preceded by '(' and followed by a number.
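import java.util.ArrayList;
import java.util.List;
String expression = "(4+(2*(-2-1)))*1.5";
List<String> tokens = new ArrayList<String>();
String prev = null; // holds a '-' that directly follows '(' until the next char arrives
for (int i = 0; i < expression.length(); i++) {
    char x = expression.charAt(i);
    String a = Character.toString(x);
    if (i >= 1 && expression.charAt(i - 1) == '(' && x == '-') {
        prev = a; // unary minus: glue it to the next character
    } else if (prev != null) {
        tokens.add(prev + a);
        prev = null;
    } else {
        tokens.add(a);
    }
}
// every other character is its own token, so multi-digit numbers such as 1.5 come out as 1, ., 5
System.out.println(tokens);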
This is a version that only uses regular expressions. It matches your sample input but it won't handle situations where unary operators are placed in front of parentheses or if multiple unary operations are nested (e.g. "--1"):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// expression that matches standard Java number formats
// such as 1234, 12.5, and 1.3E-19
String number = "\\d+(?:\\.\\d+(?:(?:E|e)-?\\d+)?)?";
// expression that matches:
//   number
//   number with unary operator (deemed unary if preceded by (,-,+,/, or *)
//   (,),-,+,/, or *
// '-' comes first in [-(+/*] so it is a literal '-', not part of a range
String token = number + "|(?<=[-(+/*])-" + number + "|[-+/*()]";
Pattern p = Pattern.compile(token);
Matcher m = p.matcher("(4+(2*(-2-1)))*1.5");
while (m.find()) {
    System.out.println(m.group(0));
}
Okay, so after all the great advice from everyone, I created a method that takes an input such as -1+2(4+(2*(-2--1)))*-1.5 and splits it into an array, such as [-1, +, 2, (, 4, +, (, 2, *, (, -2, -, -1, ), ), ), *, -1.5].
The method first splits the input string with a regex, which separates all the numbers and operators. While this is great, it can't handle negative values: the regex always sees - as a binary operator, and I needed it to see it as a unary operator so that it understands a negative value. So I compared each operator with the token next to it: if both were operators, I knew the second one was a unary operator. I also had to add a check for whether the first token was a -, because in that case it is always a unary operator.
Here's the code so far. I'm sure there is an easier way to do this using a parser, I just couldn't wrap my head around it.
import java.util.ArrayList;
import java.util.List;
public class expSplit {
    public String[] splitExp(String theexp) {
        List<String> tokens = new ArrayList<String>();
        // strip whitespace so "( 4 + 2 )" and "(4+2)" are tokenized the same way
        String expression = theexp.replaceAll("\\s+", "");
        // split before and after every operator, parenthesis or comma; numbers stay whole
        String[] temp = expression.split("(?<=[-+*/%(),])(?=.)|(?<=.)(?=[-+*/%(),])");
        String orig = null; // a unary '-' or '+' waiting to be joined to the next token
        for (int i = 0; i < temp.length; i++) {
            String a = temp[i];
            // a '-' or '+' is unary at the start, or right after an operator or '('
            boolean unaryPosition = i == 0 || temp[i - 1].matches("[-+*/%(,]");
            if (orig == null && unaryPosition && a.matches("[-+]")) {
                orig = a;
            } else if (orig != null) {
                // "-x" becomes one negative token; a unary "+" is simply dropped
                tokens.add(orig.equals("-") ? orig + a : a);
                orig = null;
            } else {
                tokens.add(a);
            }
        }
        return tokens.toArray(new String[tokens.size()]);
    }
}
Thanks for the help, Ciaran
