Java Regex to remove start/end single quotes but leave inside quotes

I have data from a CSV file that is enclosed in single quotes, like:
'Company name'
'Price: $43.50'
'New York, New York'
I want to be able to replace the single quotes at the start/end of the value but leave quotes in the data, like:
'Joe's Diner' should become Joe's Diner
I can do
updateString = theString.replace("^'", "").replace("'$", "");
but I wanted to know if I could combine it to only do one replace.

You could use the or operator.
updateString = theString.replaceAll("(^')|('$)","");
See if that works for you :)

updateString = theString.replaceFirst("^'(.*)'$", "$1");
Note that the form you have now won't work because replace uses literal strings, not regexes.
This works by using a capturing group (.*), which is referred to with $1 in the replacement text. You could also do something like:
Pattern patt = Pattern.compile("^'(.*)'$"); // could be stored in a static final field.
Matcher matcher = patt.matcher(theString);
// group(1) throws IllegalStateException unless the match succeeded, so check first.
updateString = matcher.matches() ? matcher.group(1) : theString;
Of course, if you're certain there's a single quote at the beginning and end, the simplest solution is:
updateString = theString.substring(1, theString.length() - 1);
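As a minimal sketch of the single-replace idea, the helper below wraps the replaceFirst call. The class and method names are made up for illustration. It assumes each value is a single line, since . does not match line terminators by default, and it leaves values without both outer quotes unchanged.

```java
class QuoteStripper {
    // Hypothetical helper: removes one leading and one trailing single quote,
    // but only when both are present; quotes inside the value are kept.
    static String stripOuterQuotes(String value) {
        return value.replaceFirst("^'(.*)'$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("'Joe's Diner'"));   // Joe's Diner
        System.out.println(stripOuterQuotes("'Price: $43.50'")); // Price: $43.50
        System.out.println(stripOuterQuotes("no quotes"));       // no quotes
    }
}
```

Unlike the substring version, this does not throw or cut off characters when a value happens to be unquoted.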

You can use regex to remove double quotes around digits/numbers.
jsonString.replaceAll("\"(\\d+)\"","$1");
The above will not work if negative or decimal numbers are present.
For those, the regex needs an optional minus sign and an optional fractional part, so it will be a little more complex, like below. But I haven't tried it.
"(-?[0-9]+\.{0,1}[0-9]*)"

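A hedged sketch of the number-unquoting idea with signed and decimal values follows. The class name, method name, and sample input are assumptions for illustration. The regex is not JSON-aware: it would also unquote numeric-looking keys or string values such as "007", so treat it as a quick text transformation rather than a JSON tool.

```java
class NumberUnquoter {
    // Hypothetical helper: drops the double quotes around integers and
    // decimals, with an optional leading minus sign.
    static String unquoteNumbers(String json) {
        return json.replaceAll("\"(-?\\d+(?:\\.\\d+)?)\"", "$1");
    }

    public static void main(String[] args) {
        String in = "{\"a\":\"-12\",\"b\":\"3.5\",\"c\":\"x1\"}";
        System.out.println(unquoteNumbers(in)); // {"a":-12,"b":3.5,"c":"x1"}
    }
}
```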
Related

Java regex, replace certain characters except

I have this string "u2x4m5x7" and I want replace all the characters but a number followed by an x with "".
The output should be:
"2x5x"
Just the number followed by the x.
But I am getting this:
"2x45x7"
I'm doing this:
String string = "u2x4m5x7";
String s = string.replaceAll("[^0-9+x]","");
Please help!!!

Here is a one-liner using String#replaceAll with two replacements:
System.out.println(string.replaceAll("\\d+(?!x)", "").replaceAll("[^x\\d]", ""));
Here is another working solution. We can iterate the input string using a formal pattern matcher with the pattern \d+x. This is the whitelist approach, of trying to match the variable combinations we want to keep.
String input = "u2x4m5x7";
Pattern pattern = Pattern.compile("\\d+x");
Matcher m = pattern.matcher(input);
StringBuilder b = new StringBuilder();
while(m.find()) {
b.append(m.group(0));
}
System.out.println(b);
This prints:
2x5x

It looks like this would be much simpler by searching to get the match rather than replacing all non matches, but here is a possible solution, though it may be missing a few cases:
\d(?!x)|[^0-9x]|(?<!\d)x
https://regex101.com/r/v6udph/1
Basically it will:
\d(?!x) -- remove any digit not followed by an x
[^0-9x] -- remove all non-x/digit characters
(?<!\d)x -- remove all x's not preceded by a digit
But then again, grabbing from \dx would be much simpler

Capture what you need in group 1, or else match any single character, and replace each match with $1 (which is empty whenever the |. alternative matched).
String s = string.replaceAll("(\\d+x)|.", "$1");
See this demo at regex101 or a Java demo at tio.run
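To make the capture-or-discard trick concrete, here is a small sketch. The class and method names are invented for the example. It relies on Java substituting an empty string for a group that did not take part in the match, and it also keeps multi-digit runs such as 12x intact.

```java
class KeepDigitX {
    // Hypothetical helper: keeps every "digits followed by x" run and
    // deletes every other character (line terminators are not matched by '.').
    static String keepDigitX(String s) {
        return s.replaceAll("(\\d+x)|.", "$1");
    }

    public static void main(String[] args) {
        System.out.println(keepDigitX("u2x4m5x7")); // 2x5x
        System.out.println(keepDigitX("a12xb3"));   // 12x
    }
}
```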

Java regex replace the group value in original string

I have a specific requirement to find a pattern and replace the value of matching group(2) in the original string while retaining the pattern (delimiter). I am using the pattern
:(\w+)[:\|]+(.*)
With this pattern the values are parsed correctly, but I am not able to replace the value of group(2). For example, I have a multi-line input string
:20:9405601140
:2D::11298666
:28C:20/1
I want to replace the value (9405601140) of tag 20 with a new value (1234), so the output I am expecting is
:20:1234
:2D::11298666
:28C:20/1
Thanks

Use this one:
input = input.replaceAll("(:20):(\\d+)(?!\\d)", "$1:1234");
Here (\\d+)(?!\\d) makes sure the whole run of digits after :20: is matched, i.e. that it is not followed by another digit.
However, if you want to replace only :20:9405601140, then it is much simpler:
input = input.replaceAll(":20:9405601140(?!\\d)", ":20:1234");

You can do this by capturing what you want to keep, instead of what you want to replace, and then using a backreference ($1, for the first capturing group) in the replacement string to include it in the final result.
Something like:
string.replaceAll("(:\\w+[:\\|]+).*", "$11234")
To perform the replacement on all the given lines, or just:
string.replaceAll("(:20[:\\|]+).*", "$11234")
To perform the replacement only on the line beginning with ":20".
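The keep-the-prefix approach can be sketched as a reusable method. Everything named here (class, method, the prefix group name) is an assumption for illustration. A named group is used in the replacement so that a new value starting with a digit cannot be misread as part of the group number, and Matcher.quoteReplacement protects new values containing $ or \.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagValueReplacer {
    // Hypothetical helper: replaces the value of one tag, keeping the
    // ":tag:" prefix. MULTILINE makes ^ and $ match at every line.
    static String replaceTagValue(String input, String tag, String newValue) {
        Pattern p = Pattern.compile(
                "^(?<prefix>:" + Pattern.quote(tag) + "[:|]+).*$",
                Pattern.MULTILINE);
        return p.matcher(input)
                .replaceAll("${prefix}" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        String input = ":20:9405601140\n:2D::11298666\n:28C:20/1";
        System.out.println(replaceTagValue(input, "20", "1234"));
    }
}
```

With the sample input from the question, only the first line changes to :20:1234; the :2D: and :28C: lines are left alone.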

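Try this; the lookbehind keeps the :20: prefix at the start of the input, so only the value after it is replaced:
s = s.replaceAll("(?<=\\A:20:)\\w+", "1234");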

How about doing it the other way around.
Create a pattern like this (:(\w+)[:\|]+)(.*) then for each row output the first group and your replacement (instead of group 2).
Here is a working example: http://ideone.com/9TkGx6

Java regex to obtain value between parenthesis

I need a regular expression to obtain the value from [[text]]. I have tried the regex
"((?<=[[)*(?=]])*)+" but the value between [[ ]] is not obtained.
For example, from the string [[text]], we should obtain text.
Pattern pat = Pattern.compile("((?<=\\[[)*(?=\\]])*)");
Matcher matcher = pat.matcher("[[text]]");
String next ="";
while(matcher.find()) {
next = matcher.group(0);
break;
}
System.out.println(next); //next should be text

You need to escape brackets [] when using them as actual characters in a regular expression. And you also need to add something to actually capture what is between the brackets. You can use .* for that or use my approach, if you are sure the value cannot contain a ].
((?<=\[\[)([^\]]*)(?=\]\]))+
There is not even really a need to use lookbehinds and lookaheads unless you explicitly need to exclude those delimiters from the match. This will work just as well, with the value in group 1:
\[\[([^\]]*)\]\]
And obviously when you put these into a String, you need to add an additional \ to escape each \ for the String; they are just more readable this way.

If you don't want to get into regex, String.replace, which takes literal strings, can also help you.
String s2 = s.replace("[", "").replace("]", "");

"(?<=\\[\\[)[^\\]]*"
this should work for you
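Putting the escaping advice together, a minimal sketch with a capturing group might look like this. The class and method names are illustrative, and it assumes the value itself never contains a ] character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtractor {
    // \[\[ and \]\] are the literal delimiters; group 1 is the text between.
    private static final Pattern DOUBLE_BRACKETS =
            Pattern.compile("\\[\\[([^\\]]*)\\]\\]");

    // Hypothetical helper: returns every value found between [[ and ]].
    static List<String> extractAll(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = DOUBLE_BRACKETS.matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("[[text]]"));            // [text]
        System.out.println(extractAll("a [[one]] b [[two]]")); // [one, two]
    }
}
```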

Can I use regex to match every third occurrence of a specific character?

I have a string containing some delimited values:
1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2
What I'd like to achieve is a split by every third occurrence of a semicolon, so my resulting String[] should contain this:
result[0] = "1.95;1.99;1.78";
result[1] = "10.9;11.45;10.5";
result[2] = "25.95;26;45";
result[3] = "21.2";
So far I've tried several regex solutions, but all I could get to was finding any patterns that are between the semi colons. For example:
(?<=^|;)[^;]*;?[^;]*;?[^;]*
Which matches the values I want, so that makes it impossible to use split() or am I missing something?
Unfortunately I can only supply the pattern used and have no possibility to add some looping through results of the above pattern.

String re = "(?<=\\G[^;]*;[^;]*;[^;]*);";
String text = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
String[] result = Pattern.compile(re).split(text);
Now the result is what you want
Hint: \G in java's regex is a boundary matcher like ^, it means 'end of previous match'

You can try something like this instead:
String s = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
Pattern p = Pattern.compile(".*?;.*?;.*?;");
Matcher m = p.matcher(s);
int lastEnd = 0; // start at 0 (not -1) so substring() below is safe when nothing matches
while(m.find()){
System.out.println(m.group());
lastEnd = m.end();
}
System.out.println(s.substring(lastEnd));
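A variant of this find-based approach that yields the groups without trailing semicolons is sketched below. The class and method names are made up, and it assumes there are no empty fields (no ;; in the input), because each value must contain at least one character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripleSplitter {
    // One value, then up to two more ";value" pairs (greedy).
    private static final Pattern UP_TO_THREE =
            Pattern.compile("[^;]+(?:;[^;]+){0,2}");

    // Hypothetical helper: groups semicolon-separated values three at a time.
    static List<String> groupsOfThree(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = UP_TO_THREE.matcher(input);
        while (m.find()) {
            groups.add(m.group());
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "1.95;1.99;1.78;10.9;11.45;10.5;25.95;26;45;21.2";
        for (String group : groupsOfThree(s)) {
            System.out.println(group);
        }
    }
}
```

The semicolon between two groups is simply skipped by find(), since no match can start on it.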

You are correct. Since Java doesn't support indefinite-length lookbehind assertions (which you need if you want to check whether there are 3, 6, 9 or 3*n values before the current semicolon), you can't use split() for this. Your regex works perfectly with a "find all" approach, but if you can't apply that in your situation, you're out of luck.
In other languages (.NET-based ones, for example), the following regex would work:
;(?<=^(?:[^;]*;[^;]*;[^;]*;)*)

Would something like:
([0-9.]*;){3}
not work for your needs? The caveat is that there will be a trailing ; at the end of the group. You might be able to tweak the expression to trim that off however.
I just reread your question, and although this simple expression will work for matching groups, if you need to supply it to the split() method, it will unfortunately not do the job.

Regular expression to select all whitespace that isn't in quotes?

I'm not very good at RegEx, can someone give me a regex (to use in Java) that will select all whitespace that isn't between two quotes? I am trying to remove all such whitespace from a string, so any solution to do so will work.
For example:
(this is a test "sentence for the regex")
should become
(thisisatest"sentence for the regex")

Here's a single regex-replace that works:
\s+(?=([^"]*"[^"]*")*[^"]*$)
which will replace:
(this is a test "sentence for the regex" foo bar)
with:
(thisisatest"sentence for the regex"foobar)
Note that if the quotes can be escaped, the even more verbose regex will do the trick:
\s+(?=((\\[\\"]|[^\\"])*"(\\[\\"]|[^\\"])*")*(\\[\\"]|[^\\"])*$)
which replaces the input:
(this is a test "sentence \"for the regex" foo bar)
with:
(thisisatest"sentence \"for the regex"foobar)
(note that it also works with escaped backslashes: (thisisatest"sentence \\\"for the regex"foobar))
Needless to say (?), this really shouldn't be used to perform such a task: it makes one's eyes bleed, and it performs its task in quadratic time, while a simple linear solution exists.
EDIT
A quick demo:
String text = "(this is a test \"sentence \\\"for the regex\" foo bar)";
String regex = "\\s+(?=((\\\\[\\\\\"]|[^\\\\\"])*\"(\\\\[\\\\\"]|[^\\\\\"])*\")*(\\\\[\\\\\"]|[^\\\\\"])*$)";
System.out.println(text.replaceAll(regex, ""));
// output: (thisisatest"sentence \"for the regex"foobar)

Here is the regex which works for both single & double quotes (assuming that all strings are delimited properly)
\s+(?=(?:[^\'"]*[\'"][^\'"]*[\'"])*[^\'"]*$)
It won't work with strings that have quotes inside.

This just isn't something regexes are good at. Search-and-replace functions with regexes are always a bit limited, and any sort of nesting/containment at all becomes difficult and/or impossible.
I'd suggest an alternate approach: Split your string on quote characters. Go through the resulting array of strings, and strip the spaces from every other substring (whether you start with the first or second depends on whether your string started with a quote or not). Then join them back together, using quotes as separators. That should produce the results you're looking for.
Hope that helps!
PS: Note that this won't handle nested strings, but since you can't make nested strings with the ASCII double-quote character, I'm gonna assume you don't need that behaviour.
PPS: Once you're dealing with your substrings, then it's a good time to use regexes to kill those spaces - no containing quotes to worry about. Just remember to make it a global replacement and not just the first match (the /.../g modifier in some languages; in Java, String.replaceAll is already global).
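The split-strip-join idea above can be sketched in Java as follows. The class and method names are invented, and the sketch assumes quotes are balanced and never escaped. Passing -1 as the split limit keeps empty leading and trailing pieces, so the even-indexed pieces are always the ones outside quotes, even when the string starts with a quote.

```java
class SplitOnQuotes {
    // Hypothetical helper: removes whitespace outside double-quoted sections.
    static String stripOutsideQuotes(String s) {
        String[] parts = s.split("\"", -1);
        // parts[0], parts[2], ... lie outside quotes; odd indexes are inside.
        for (int i = 0; i < parts.length; i += 2) {
            parts[i] = parts[i].replaceAll("\\s+", "");
        }
        return String.join("\"", parts);
    }

    public static void main(String[] args) {
        System.out.println(
                stripOutsideQuotes("(this is a test \"sentence for the regex\")"));
    }
}
```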

Groups of whitespace outside of quotes are separated by stuff that's a) not whitespace, or b) inside quotes.
Perhaps something like:
(\s+)([^ "]+|"[^"]*")*
The first part matches a sequence of spaces; the second part matches non-spaces (and non-quotes), or some stuff in quotes, either repeated any number of times. The second part is the separator.
This will give you two groups for each item in the result; just ignore the second element. (We need the parentheses for precedence rather than match grouping there.) Or, you could say, concatenate all the second elements -- though you need to match the first non-space word too, or in this example, make the spaces optional:
StringBuffer b = new StringBuffer();
Pattern p = Pattern.compile("(\\s+)?([^ \"]+|\"[^\"]*\")*");
Matcher m = p.matcher("this is \"a test\"");
while (m.find()) {
if (m.group(2) != null)
b.append(m.group(2));
}
System.out.println(b.toString());
(I haven't done much regex in Java so expect bugs.)
Finally This is how I'd do it if regexes were compulsory. ;-)
As well as Xavier's technique, you could simply do it the way you'd do it in C: just iterate over the input characters, and copy each to the new string if either it's non-space, or you've counted an odd number of quotes up to that point.
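That character-by-character scan might look like the following sketch. The names are illustrative, and it assumes double quotes are never escaped: a boolean flag tracks whether an odd number of quotes has been seen so far.

```java
class LinearScan {
    // Hypothetical helper: single pass, linear time, no regex.
    static String stripUnquotedWhitespace(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean inQuotes = false; // true after an odd number of quotes
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            // Keep everything inside quotes, and all non-whitespace outside.
            if (inQuotes || !Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripUnquotedWhitespace(
                "(this is a test \"sentence for the regex\" foo bar)"));
    }
}
```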

If there is only one set of quotes, you can do this:
String s = "(this is a test \"sentence for the regex\") a b c";
Matcher matcher = Pattern.compile("^[^\"]+|[^\"]+$").matcher(s);
while (matcher.find())
{
String group = matcher.group();
s = s.replace(group, group.replaceAll("\\s", ""));
}
System.out.println(s); // (thisisatest"sentence for the regex")abc

This isn't an exact solution, but you can accomplish your goal by doing the following:
STEP 1: Match the two segments
\(([a-zA-Z ]*)"([a-zA-Z ]*)"\)
STEP 2: remove spaces
temp = $1 replace " " with ""
STEP 3: rebuild your string
(temp"$2")
