Confused on how exactly to use Regex? - java

I've read a lot of search results about regular expressions, but I'm still pretty confused about how to set them up. I'm trying to convert this text, read from an input file:
(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)
so that I can put it into a String array that looks like this:
[42, 10, d, 23, 1, 123, 4, 32, 10, d, 12, 9]
Any tips?
I tried using a delimiter at first to get rid of the parentheses and commas. However, the delimiter puts each value on its own line, which sadly isn't what I'm aiming for. I'm essentially trying to ignore those special characters so I can assign, for example, 42 to an int a and 10 to an int b.

What language are you using? EDIT: nvm I see you're using Java. Premise is still there on how to do it, I'll get back to you in a bit with the Java version.
In perl this would be pretty simple.
use Data::Dumper;
my $var = "(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)";
$var =~ s/\)/,/g;
$var =~ s/\(//g;
$var =~ s/d/d,/g;
$var =~ s/\s*//g;
my @arr = split /,/, $var;
print Dumper \@arr;
Java Version:
String content = "(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)";
String[] split = null;
split = content.replace(")",",").replace("(","").replace("d","d,").replace(" ","").split(",");
for (String a : split)
{
System.out.println(a);
}
Although I guess this doesn't strictly answer your question, since it doesn't use regex. It just uses replace and split.
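For a regex-based variant of the same idea, one option is to turn the punctuation into whitespace and then split on runs of whitespace. This is only a sketch: the class name `ParenSplit` and the helper `tokens` are made up for illustration, and it assumes the input contains no punctuation other than parentheses and commas.

```java
import java.util.Arrays;

class ParenSplit {
    // Replaces '(', ')' and ',' with spaces, then splits on whitespace runs.
    static String[] tokens(String content) {
        return content.replaceAll("[(),]", " ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String content = "(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)";
        System.out.println(Arrays.toString(tokens(content)));
        // prints [42, 10, d, 23, 1, 123, 4, 32, 10, d, 12, 9]
    }
}
```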

If you really want regex, then this works better when there is more complexity in your data.
Note: I just do alphanumeric data, since the d didn't seem to mean anything special
Pattern p = Pattern.compile("[A-Za-z0-9]+");
Matcher m = p.matcher("(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)");
String delim = ",";
StringBuffer sb = new StringBuffer("[");
while (m.find()) {
sb.append(m.group()).append(delim);
}
sb.setLength(sb.length() - delim.length());
System.out.println(sb.append("]").toString());
Output
[42,10,d,23,1,123,4,32,10,d,12,9]
Use a List<String> if you do want to keep that data around.
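A sketch of that List<String> variant, assuming the same alphanumeric pattern: it collects each match instead of building a display string, then parses the first pair into ints as the question describes. The class name `TokenList` and the method `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenList {
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9]+");

    // Collects every alphanumeric run, in order of appearance.
    static List<String> extract(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = extract("(42, 10) d (23, 1) (123, 4) (32, 10) d (12, 9)");
        int a = Integer.parseInt(tokens.get(0)); // 42
        int b = Integer.parseInt(tokens.get(1)); // 10
        String[] asArray = tokens.toArray(new String[0]); // if an array is really needed
        System.out.println(tokens + " a=" + a + " b=" + b + " size=" + asArray.length);
    }
}
```

Note that Integer.parseInt would throw on the "d" tokens, so those need to be checked for before parsing.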

Related

Java pad string with specific characters

I have a string, apple, and I want it to be at most 8 characters long.
So I have apple. It's 5 chars long. I pad it with 3 * to make it ***apple
So I have bat. It's 3 chars long. I pad it with 5 * to make it *****bat
Is there any way to do it? I can't find any padding library out there.
Other way how you can do it:
String.format("%8s", "helllo").replace(' ', '*');
In this case you do not need to add library.
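One caveat with the String.format approach: replace(' ', '*') also rewrites any spaces inside the input itself. A minimal sketch that avoids this by building the padding separately is below. The names `PadDemo` and `padLeft` are illustrative, not from any library, and the code does not rely on Java 11 features.

```java
class PadDemo {
    // Left-pads `word` with `fill` up to `width`; longer input is returned unchanged.
    static String padLeft(String word, int width, char fill) {
        StringBuilder sb = new StringBuilder();
        for (int i = word.length(); i < width; i++) {
            sb.append(fill);
        }
        return sb.append(word).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("apple", 8, '*')); // ***apple
        System.out.println(padLeft("bat", 8, '*'));   // *****bat
        System.out.println(padLeft("a b", 8, '*'));   // *****a b
        // The format-and-replace version also replaces the inner space:
        System.out.println(String.format("%8s", "a b").replace(' ', '*')); // *****a*b
    }
}
```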
tl;dr
Built into Java 11+
"*" // A single-character string.
.repeat( 8 - "apple".length() ) // `String::repeat` multiplies text. Here we return a `String` of a certain number of asterisks.
.concat( "apple" ) ; // Appends to those asterisks our original input.
***apple
String::repeat
Java 11 brought a new method, repeat, to the String class. This method multiplies a piece of text a certain number of times. Multiplying by zero is valid, resulting in an empty string.
In our case, we want to multiply a single character string, *.
int goal = 8 ;
String input = "apple" ;
int length = input.length() ;
String output = "*".repeat( goal - length ).concat( input ) ;
See this code run live at IdeOne.com.
***apple
No library needed. Uses a StringBuilder and reverse() for speed (insert is slower than append):
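public String padWord(String word)
{
    // Declared outside the if so it is still in scope for the return statement.
    StringBuilder foo = new StringBuilder(word);
    if(word.length() < 8)
    {
        foo.reverse();
        for(int x = word.length(); x < 8; x++)
        {
            foo.append('*');
        }
        // Reverse back so the appended asterisks end up in front of the word.
        foo.reverse();
    }
    return foo.toString();
}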
You can use StringUtils.leftPad from Apache Commons Lang library
StringUtils.leftPad(null, *, *) = null
StringUtils.leftPad("", 3, 'z') = "zzz"
StringUtils.leftPad("bat", 3, 'z') = "bat"
StringUtils.leftPad("bat", 5, 'z') = "zzbat"
StringUtils.leftPad("bat", 1, 'z') = "bat"
StringUtils.leftPad("bat", -1, 'z') = "bat"

Splitting String with wildcard

I have a String variable which contains the values I need, along with splitters. The problem is that the length of the string varies, and so does the type of splitter. They arrive through an XML file.
A string will look like this:
1+"."+20+"."+51+"."+2+"name.jpg"
but can also be:
1+"*"+20+"*"+51+"name.jpg"
The solid factors are:
the digits are id's which I need to retrieve.
the splitter values will be between "quotes".
the amount of id's is unknown, can be one, can be 200
the value used to split can be everything, but will always be between two quotes.
I was looking for a way to split the string on the "." but instead of the dot (.) give a wildcard, which can be 1 character or multiple.
Note: The value between the quotes can be anything! Doesn't even have to be a single character
Try to split by regular expression, i.e. like this:
String regex = "\\+?\"[^\"]*\"\\+?";
System.out.println(Arrays.toString( "1+\".\"+20+\".\"+51+\".\"+2+\"name.jpg\"".split( regex ) ));
System.out.println(Arrays.toString( "1+\"*\"+20+\"*\"+51+\"name.jpg\"".split( regex ) ));
Output:
[1, 20, 51, 2]
[1, 20, 51]
The regex would match any 2 double quotes with non-double quote characters in between and preceded/followed by optional pluses. You could expand that to allow whitespace as well, e.g. "\\s*\\+?\\s*\"[^\"]*\"\\s*\\+?\\s*". The only thing that's not allowed in a splitter would be double quotes.
If you need the name as well, you might try and define the potential splitters in the regex,
e.g. "(\\+?\"[\\.\\*]*\"\\+?)|\\+?\""
Note that in that case you'd have to account for the quotes around the name, i.e. to split 2+"name.jpg" you have to add the alternative \+?" (double quotes preceded by an optional plus).
Update:
Additional examples (input -> output)
5+".."+272+"..."+21+"splitter"+2+"name.jpg" --> [5, 272, 21, 2]
444+"()"+0+"abc"+51+"__"+2+"name.jpg" --> [444, 0, 51, 2]
1+"."+20+"."+51+"."+2+"name.jpg" --> [1, 20, 51, 2]
1+"*"+20+"*"+51+"name.jpg" --> [1, 20, 51]
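If both the ids and the file name are needed, a Matcher loop can be an alternative to split. The sketch below assumes, as the question states, that every splitter is enclosed in double quotes, and additionally assumes that the file name is the last quoted value. `IdParser` and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdParser {
    // Either a quoted value (group 1) or a run of digits outside quotes (group 2).
    private static final Pattern PART = Pattern.compile("\"([^\"]*)\"|(\\d+)");

    final List<Integer> ids = new ArrayList<>();
    String name; // last quoted value seen, assumed to be the file name

    IdParser(String input) {
        Matcher m = PART.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                ids.add(Integer.parseInt(m.group(2)));
            } else {
                name = m.group(1);
            }
        }
    }

    public static void main(String[] args) {
        IdParser p = new IdParser("444+\"()\"+0+\"abc\"+51+\"__\"+2+\"name.jpg\"");
        System.out.println(p.ids + " " + p.name); // [444, 0, 51, 2] name.jpg
    }
}
```

Because the quoted alternative is tried first, digits that appear inside a splitter are not mistaken for ids.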
hmm can't you try something like this:
String oldStr=1+"."+20+"."+51+"."+2+"name.jpg";
String newStr= oldStr.replace("name.jpg",""); // or you can use regex such as : oldStr.replaceAll("(\w+.\w+)","");
String[] array;
array=newStr.split("\\."); // split takes a regex, so the dot must be escaped
if(array.length<=1){ // no dot found: fall back to the asterisk splitter
array=newStr.split("\\*");
}
So, just that I get it right, possible filenames / string values are:
1.20.51.2name.jpg
1*20*51*name.jpg
Right?
So, more generally, you could say: an unknown number of digit groups, separated by non-digit characters?
You could run a regex on each String: \d+
If executed globally, you will get a list of each number. So for
1.20.51.2name.jpg
I got
1, 20, 51, 2
Using this :
String x = 1+"."+20+"."+51+"."+2+"name.jpg";
String y = 1+"*"+20+"*"+51+"name.jpg";
System.out.println(Arrays.toString(x.split("\\.|\\*")));
System.out.println(Arrays.toString(y.split("\\.|\\*")));
Will give you the following output:
[1, 20, 51, 2name, jpg]
[1, 20, 51name, jpg]

How to split string at operators

I'm creating a calculator in Java.
If I have the user enter a string such as:
7+4-(18/3)/2
So far I have had to have the user enter a space between each number or operator.
How would I create an array from the given string, where the string is split at each number or operator, so that in this case the array would be:
[7, +, 4, -, (, 18, /, 3, ), /, 2]
(The array is of type String)
Any help would be really appreciated
Thanks :)
try this:
String[] temp = expression.split("[\\s+\\-\\\\()]+");
will split on:
white spaces
+ operator
- operator
\ character
( character
) character
You haven't specified what you want to do with the array. If you really want to evaluate the expression, there are already libraries available for that, and you can use one of them. But if you only want an array like the one you have shown, I still wouldn't suggest using regex. You can write your own parser method like the one below:
public static String[] parseExpression(String str) {
List<String> list = new ArrayList<String>();
StringBuilder currentDigits = new StringBuilder();
for (char ch: str.toCharArray()) {
if (Character.isDigit(ch)) {
currentDigits.append(ch);
} else {
if (currentDigits.length() > 0) {
list.add(currentDigits.toString());
currentDigits = new StringBuilder();
}
list.add(String.valueOf(ch));
}
}
if (currentDigits.length() > 0)
list.add(currentDigits.toString());
return list.toArray(new String[list.size()]);
}
Now call it like:
String str = "7+4-(18/3)/2";
System.out.println(Arrays.toString(parseExpression(str)));
and you will get your result.
The way I would do this is just scan the string myself to be honest. You will want to build an operation from the results anyway so you don't really gain anything by using an automated parser/splitter/etc.
Here is a rough sketch of the code:
List<Operation> ops = new ArrayList<>();
for (int i=0;i<str.length();i++) {
char c = str.charAt(i);
if (c == '.' || (c >= '0' && c <= '9')) {
// extract a number, moving i onwards as I do
// insert number into ops (new Operation(num))
} else if (c!= ' ') {
Operator operator = operators.get(c);
if (operator == null) {
// Handle invalid input - could just skip it
} else {
// Add operator to ops
}
}
}
You would need to define operators for each of the various symbols.
Once you have done that you have parsed the string out to hold only the important data and compiled a list of what operations they are.
Now you need to work out how to process that list of operations applying correct precedence rules etc :) The simplest way may just be to repeatedly loop through the list each time performing each calculation that is valid that time around.
i.e.
1+2*(3+4)-(4+2)
First pass:
1+2*7-6
Second pass:
1+14-6
Result:
9
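As an alternative to repeatedly looping over the list, precedence can also be handled with a small recursive-descent parser. This is a different technique from the multi-pass approach described above, and only a sketch: the class `ExprEval` and its method names are made up, and it assumes the input uses only numbers, + - * /, and parentheses (no unary minus).

```java
class ExprEval {
    private final String src;
    private int pos;

    private ExprEval(String src) {
        this.src = src.replace(" ", ""); // spaces carry no meaning here
    }

    static double eval(String expression) {
        ExprEval p = new ExprEval(expression);
        double value = p.parseSum();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return value;
    }

    // sum := product (('+' | '-') product)*
    private double parseSum() {
        double value = parseProduct();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '+') { pos++; value += parseProduct(); }
            else if (c == '-') { pos++; value -= parseProduct(); }
            else break;
        }
        return value;
    }

    // product := atom (('*' | '/') atom)*
    private double parseProduct() {
        double value = parseAtom();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '*') { pos++; value *= parseAtom(); }
            else if (c == '/') { pos++; value /= parseAtom(); }
            else break;
        }
        return value;
    }

    // atom := number | '(' sum ')'
    private double parseAtom() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            double value = parseSum();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')'");
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at index " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*(3+4)-(4+2)")); // 9.0
        System.out.println(eval("7+4-(18/3)/2"));    // 8.0
    }
}
```

Each grammar level calls the next tighter-binding one, which is what gives * and / priority over + and - without any explicit precedence table.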
My first attempt was to use "\b", but that didn't split -(. After some searching, I came up with this:
(?<=[\(\)\+\-*\/\^A-Za-z])|(?=[\(\)\+\-*\/\^A-Za-z])
So, you will have to escape it and use it like this:
String input = ...;
String temp[] = input.split("(?<=[\\(\\)\\+\\-*\\/\\^A-Za-z])|(?=[\\(\\)\\+\\-*\\/\\^A-Za-z])");
System.out.println(Arrays.toString(temp));
Input:
7+4-(18/3)/2a^222+1ab
Output:
[7, +, 4, -, (, 18, /, 3, ), /, 2, a, ^, 222, +, 1, a, b]
See it in action here:
http://rubular.com/r/uHAObPwaln
http://ideone.com/GLFmo4
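The same token list can also be produced by matching the tokens rather than splitting between them, which some find easier to read than the lookaround pair. A sketch, with `ExprTokenizer` and `tokenize` as hypothetical names; it assumes tokens are numbers, single operators or parentheses, and single letters, and it silently skips anything else (including whitespace).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExprTokenizer {
    // A number (optionally with a fractional part), one operator or parenthesis, or one letter.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?|[-+*/^()]|[A-Za-z]");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("7+4-(18/3)/2a^222+1ab"));
        // prints [7, +, 4, -, (, 18, /, 3, ), /, 2, a, ^, 222, +, 1, a, b]
    }
}
```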

I'm looking to extract values from matched pattern in Java

I'm using this regex:
([\w\s]+)(=|!=)([\w\s]+)( (or|and) ([\w\s]+)(=|!=)([\w\s]+))*
to match a string such as this: i= 2 or i =3 and k!=4
When I try to extract values using m.group(index), I get:
(i, =, 2, **and k!=4**, and, k, ,!=, 4).
Expected output: (i, =, 2, or, i, =, 3, and, k , !=, 4)
How do i extract the values correctly?
P.S. m.matches() returns true.
You are trying to match an expression with a regexp. You might want to use a parser instead, because this regexp (once you have it) can't be extended further, whereas a parser can be extended at any time.
For example, consider using ANTLR (see: ANTLR: Is there a simple example?)
This is because your fourth set of parens (the one that you use for repeating expressions) is what's confusing you. Try using non-capturing parens:
([\w\s]+)(=|!=)([\w\s]+)(?: (or|and) ([\w\s]+)(=|!=)([\w\s]+))*
Description
Why not simplify your expression to match exactly what you're looking for?
!?=|(?:or|and)|\b(?:(?!or|and)[\w\s])+\b
Example
Sample Text
i= 2 or i =1234 and k!=4
Matches Found
[0][0] = i
[1][0] = =
[2][0] = 2
[3][0] = or
[4][0] = i
[5][0] = =
[6][0] = 1234
[7][0] = and
[8][0] = k
[9][0] = !=
[10][0] = 4
Everything in brackets makes a capturing group which you can later access via index. But you can make the group which you do not need non-capturing: (?: ... ), then it will not be considered at Matcher.group(int).
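One more thing worth knowing: a repeated capturing group only retains its last match, which is why the middle comparison goes missing in the question. One way to get every token is to drop the repetition and call find() in a loop. The pattern below is an assumption made for illustration (operands are single words, and or/and are matched as ordinary words); `ConditionTokens` is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConditionTokens {
    // "!=" must come before "=" so the two-character operator wins.
    private static final Pattern TOKEN = Pattern.compile("!=|=|\\w+");

    static List<String> tokenize(String condition) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(condition);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("i= 2 or i =3 and k!=4"));
        // prints [i, =, 2, or, i, =, 3, and, k, !=, 4]
    }
}
```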

problem with java split()

I have a string:
strArray= "-------9---------------";
I want to find 9 from the string. The string may be like this:
strArray= "---4-5-5-7-9---------------";
Now I want to extract only the digits from the string. I need values such as 9 or 4, ignoring the '-'. I tried the following:
strArray= strignId.split("-");
but I get an error, since there are multiple '-', and I don't get my output. So which Java function should be used?
My input and output should be as follows:
input="-------9---------------";
output="9";
input="---4-5-5-7-9---------------";
output="45579";
What should I do?
The + is a regex metacharacter of "one-or-more" repetition, so the pattern -+ is "one or more dash". This would allow you to use str.split("-+") instead, but you may get an empty string as first element.
If you just want to remove all -, then you can do str = str.replace("-", ""). This uses replace(CharSequence, CharSequence) method, which performs literal String replacement, i.e. not regex patterns.
If you want a String[] with each digit in its own element, then it's easiest to do in two steps: first remove all non-digits, then use zero-length assertion to split everywhere that's not the beginning of the string (?!^) (to prevent getting an empty string as a first element). If you want a char[], then you can just call String.toCharArray()
Lastly, if the string can be very long, it's better to use a java.util.regex.Matcher in a find() loop looking for a digit \d, or a java.util.Scanner with a delimiter \D*, i.e. a sequence (possibly empty) of non-digits. This will not give you an array, but you can use the loop to populate a List (see Effective Java 2nd Edition, Item 25: Prefer lists to arrays).
References
regular-expressions.info/Repetition with Star and Plus, Character Class, Lookaround
Snippets
Here are some examples to illustrate the above ideas:
System.out.println(java.util.Arrays.toString(
"---4--5-67--8-9---".split("-+")
));
// [, 4, 5, 67, 8, 9]
// note the empty string as first element
System.out.println(
"---4--5-67--8-9---".replace("-", "")
);
// 456789
System.out.println(java.util.Arrays.toString(
"abcdefg".toCharArray()
));
// [a, b, c, d, e, f, g]
The next example first deletes all non-digit \D, then splitting everywhere except the beginning of the string (?!^), to get a String[] each containing a digit:
System.out.println(java.util.Arrays.toString(
"#*#^$4#!#5ajs67>?<{8_(9SKJDH"
.replaceAll("\\D", "")
.split("(?!^)")
));
// [4, 5, 6, 7, 8, 9]
This uses a Scanner, with \D* as delimiter, to get each digit as its own token, using it to populate a List<String>:
List<String> digits = new ArrayList<String>();
String text = "(&*!##123ask45{P:L6";
Scanner sc = new Scanner(text).useDelimiter("\\D*");
while (sc.hasNext()) {
digits.add(sc.next());
}
System.out.println(digits);
// [1, 2, 3, 4, 5, 6]
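The Matcher-based variant mentioned earlier could look roughly like this. It is a sketch only; `DigitFinder` and `digits` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitFinder {
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns each digit of the text as its own one-character string.
    static List<String> digits(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGIT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = digits("---4-5-5-7-9---------------");
        System.out.println(found);                  // [4, 5, 5, 7, 9]
        System.out.println(String.join("", found)); // 45579
    }
}
```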
Common problems with split()
Here are some common beginner problems when dealing with String.split:
Lesson #1: split takes a regular expression pattern
This is probably the most common beginner mistake:
System.out.println(java.util.Arrays.toString(
"one|two|three".split("|")
));
// [, o, n, e, |, t, w, o, |, t, h, r, e, e]
// (Java 8 and later omit the leading empty string)
System.out.println(java.util.Arrays.toString(
"not.like.this".split(".")
));
// []
The problem here is that | and . are regex metacharacters, and since they are intended to be matched literally, they need to be escaped by preceding with a backslash, which as a Java string literal is "\\".
System.out.println(java.util.Arrays.toString(
"one|two|three".split("\\|")
));
// [one, two, three]
System.out.println(java.util.Arrays.toString(
"not.like.this".split("\\.")
));
// [not, like, this]
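When the delimiter is not known in advance, escaping by hand gets error-prone. Pattern.quote turns any string into a literal pattern, so a small helper along these lines may be handy (the names `QuoteDemo` and `splitLiteral` are made up for this sketch):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {
    // Splits on the delimiter taken literally, whatever metacharacters it contains.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("one|two|three", "|")));
        // [one, two, three]
        System.out.println(Arrays.toString(splitLiteral("not.like.this", ".")));
        // [not, like, this]
    }
}
```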
Lesson #2: split discards trailing empty strings by default
Sometimes it's desired to keep trailing empty strings (which are discarded by default split):
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";")
));
// [a, b, , d, , , g]
Note that there are slots for the "missing" values for c, e, f, but not for h and i. To fix this, you can use a negative limit argument to String.split(String regex, int limit).
System.out.println(java.util.Arrays.toString(
"a;b;;d;;;g;;".split(";", -1)
));
// [a, b, , d, , , g, , ]
You can also use a positive limit of n to apply the pattern at most n - 1 times (i.e. resulting in no more than n elements in the array).
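A short sketch of the three kinds of limit on the same input, using a hypothetical `LimitDemo.show` helper just to format the result:

```java
import java.util.Arrays;

class LimitDemo {
    // Formats the result of split with the given limit.
    static String show(String text, int limit) {
        return Arrays.toString(text.split(";", limit));
    }

    public static void main(String[] args) {
        String text = "a;b;;d;;;g;;";
        System.out.println(show(text, 3));  // [a, b, ;d;;;g;;]       pattern applied at most twice
        System.out.println(show(text, 0));  // [a, b, , d, , , g]     same as split(";")
        System.out.println(show(text, -1)); // [a, b, , d, , , g, , ] trailing empties kept
    }
}
```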
Zero-width matching split examples
Here are more examples of splitting on zero-width matching constructs; this can be used to split a string but also keep "delimiters".
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
Split before capital letters (except the first!)
System.out.println(java.util.Arrays.toString(
"OhMyGod".split("(?=(?!^)[A-Z])")
)); // prints "[Oh, My, God]"
A variety of examples is provided in related questions below.
References
regular-expressions.info/Lookarounds
Related questions
Can you use zero-width matching regex in String split?
"abc<def>ghi<x><x>" -> "abc", "<def>", "ghi", "<x>", "<x>"
How do I convert CamelCase into human-readable names in Java?
"AnXMLAndXSLT2.0Tool" -> "An XML And XSLT 2.0 Tool"
C# version: is there a elegant way to parse a word and add spaces before capital letters
Java split is eating my characters
Is there a way to split strings with String.split() and include the delimiters?
Regex split string but keep separators
You don't use split!
Split is to get the things BETWEEN the separator.
For this you want to eliminate the unwanted chars; '-'
The solution is simple
out=in.replaceAll("-","");
Use something like this to get the individual values split out. I'd rather eliminate the unwanted chars first to avoid getting empty Strings in the result array.
// Method signature reconstructed for illustration; the original snippet began mid-method.
static String[] split(String original, String separator) {
    final Vector<String> nodes = new Vector<String>();
    int index = original.indexOf(separator);
    while (index >= 0) {
        // Everything before the separator is one value.
        nodes.addElement(original.substring(0, index));
        original = original.substring(index + separator.length());
        index = original.indexOf(separator);
    }
    // Whatever remains after the last separator is the final value.
    nodes.addElement(original);
    final String[] result = new String[nodes.size()];
    for (int loop = 0; loop < nodes.size(); loop++) {
        result[loop] = nodes.elementAt(loop);
    }
    return result;
}
