Replace whole tokens that may contain regular expression

I want to do a startStr.replaceAll(searchStr, replaceStr) and I have two requirements.
The searchStr must be a whole word, meaning it must have a space, beginning of string or end of string character around it.
e.g.
startStr = "ON cONfirmation, put ON your hat"
searchStr = "ON"
replaceStr = ""
expected = " cONfirmation, put your hat"
The searchStr may contain a regex pattern
e.g.
startStr = "remove this * thing"
searchStr = "*"
replaceStr = ""
expected = "remove this thing"
For requirement 1, I've found that this works:
startStr.replaceAll("\\b"+searchStr+"\\b",replaceStr)
For requirement 2, I've found that this works:
startStr.replaceAll(Pattern.quote(searchStr), replaceStr)
But I can't get them to work together:
startStr.replaceAll("\\b"+Pattern.quote(searchStr)+"\\b", replaceStr)
Here is the simple test case that's failing
startStr = "remove this * thing but not this*"
searchStr = "*"
replaceStr = ""
expected = "remove this thing but not this*"
actual = "remove this * thing but not this*"
What am I missing?
Thanks in advance

First off, \b (the word boundary) is not going to work for you with the asterisk. The reason is that \b only matches between a word character and a non-word character. The regex engine does not treat * as a word character, so a search string that begins or ends with * is not surrounded by word boundaries when it sits between spaces.
Reference pages:
http://www.regular-expressions.info/wordboundaries.html
http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html
An option you might like is to supply wildcard permutations in a regex:
(?<=\s|^)(ON|\*N|O\*|\*)(?=\s|$)
Here's a Java example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class RegExTest
{
    public static void main(String[] args){
        String sourcestring = "ON cONfirmation, put * your hat";
        // Remove the standalone tokens, then collapse the doubled spaces left behind
        sourcestring = sourcestring.replaceAll("(?<=\\s|^)(ON|\\*N|O\\*|\\*)(?=\\s|$)","").replaceAll(" {2,}"," ").trim();
        System.out.println("sourcestring=["+sourcestring+"]");
    }
}
You can write a little function to generate the wildcard permutations automatically. I admit I cheated a little with the spaces, but I don't think that was a requirement anyway.
Play with it online here: http://ideone.com/7uGfIS

The pattern "\\b" matches a word boundary, with a word character on one side and a non-word character on the other. * is not a word character, so \\b\\*\\b won't work. Look-behind and look-ahead check the surrounding text without consuming it. You can specify that the beginning of the string or whitespace must come before your pattern and that whitespace or the end of the string must follow:
startStr.replaceAll("(?<=^|\\s)"+Pattern.quote(searchStr)+"(?=\\s|$)", replaceStr)
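A minimal sketch of how the lookaround version might be wrapped in a helper. The class name TokenReplace and the method replaceToken are illustrative names, not part of the original answer, and Matcher.quoteReplacement is added on the assumption that the replacement text should also be taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplace {
    // Replaces searchStr only where it stands alone between whitespace
    // (or at the start/end of the string). searchStr is matched literally.
    static String replaceToken(String startStr, String searchStr, String replaceStr) {
        String regex = "(?<=^|\\s)" + Pattern.quote(searchStr) + "(?=\\s|$)";
        return startStr.replaceAll(regex, Matcher.quoteReplacement(replaceStr));
    }

    public static void main(String[] args) {
        System.out.println("[" + replaceToken("remove this * thing but not this*", "*", "") + "]");
        System.out.println("[" + replaceToken("ON cONfirmation, put ON your hat", "ON", "") + "]");
    }
}
```

The whitespace around a removed token is left in place, so removing a token from the middle of the string leaves two adjacent spaces. Collapse them afterwards if that matters.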

Try this,
For removing "ON"
StringBuilder stringBuilder = new StringBuilder();
String[] splittedValue = startStr.split(" ");
for (String value : splittedValue)
{
    if (!value.equalsIgnoreCase("ON"))
    {
        stringBuilder.append(value);
        stringBuilder.append(" ");
    }
}
System.out.println(stringBuilder.toString().trim());
For removing "*"
String startStr1 = "remove this * thing";
System.out.println(startStr1.replaceAll("\\*[\\s]", ""));

You can use (^| )\*( |$) instead of \\b.
Try this: startStr.replaceAll("(^| )yourSearchString( |$)", replaceStr);
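One thing to watch with the group-based pattern is that the surrounding spaces become part of the match. The sketch below compares it with a lookaround pattern on made-up inputs; the class and method names are hypothetical, and Pattern.quote is added so the token is taken literally.

```java
import java.util.regex.Pattern;

class ConsumedSpaces {
    // Group-based version: the surrounding spaces are part of the match.
    static String withGroups(String s, String token) {
        return s.replaceAll("(^| )" + Pattern.quote(token) + "( |$)", "");
    }

    // Lookaround version: the surrounding whitespace is inspected but not consumed.
    static String withLookarounds(String s, String token) {
        return s.replaceAll("(?<=^|\\s)" + Pattern.quote(token) + "(?=\\s|$)", "");
    }

    public static void main(String[] args) {
        // The first match swallows the space that the second ON would need in front of it.
        System.out.println("[" + withGroups("ON ON hat", "ON") + "]");       // prints [ON hat]
        System.out.println("[" + withLookarounds("ON ON hat", "ON") + "]");  // prints [  hat]
        // With an empty replacement the neighbouring words are glued together.
        System.out.println("[" + withGroups("put ON your hat", "ON") + "]"); // prints [putyour hat]
    }
}
```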

Related

String split regex [duplicate]

I'm new to regular expressions and would appreciate your help. I'm trying to put together an expression that will split the example string using all spaces that are not surrounded by single or double quotes. My last attempt looks like this: (?!") and isn't quite working. It's splitting on the space before the quote.
Example input:
This is a string that "will be" highlighted when your 'regular expression' matches something.
Desired output:
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something.
Note that "will be" and 'regular expression' retain the space between the words.

I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:
[^\s"']+|"([^"]*)"|'([^']*)'
I added the capturing groups because you don't want the quotes in the list.
This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}
If you don't mind having the quotes in the returned list, you can use much simpler code:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}
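For reference, here is a self-contained sketch of the first variant (quotes stripped) with the imports it needs. The class name QuoteAwareTokenizer and the method tokenize are illustrative names, not from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // Unquoted word | double-quoted string (group 1) | single-quoted string (group 2)
    private static final Pattern TOKEN =
            Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

    static List<String> tokenize(String subject) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(subject);
        while (m.find()) {
            if (m.group(1) != null) {
                tokens.add(m.group(1));      // double-quoted, quotes removed
            } else if (m.group(2) != null) {
                tokens.add(m.group(2));      // single-quoted, quotes removed
            } else {
                tokens.add(m.group());       // plain word
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "This is a string that \"will be\" highlighted when your "
                + "'regular expression' matches something.";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```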

There are several questions on StackOverflow that cover this same question in various contexts using regular expressions. For instance:
parsings strings: extracting words and phrases
Best way to parse Space Separated Text
UPDATE: Sample regex to handle single and double quoted strings. Ref: How can I split on a string except when inside quotes?
m/('.*?'|".*?"|\S+)/g
Tested this with a quick Perl snippet and the output was as reproduced below. Also works for empty strings or whitespace-only strings if they are between quotes (not sure if that's desired or not).
This
is
a
string
that
"will be"
highlighted
when
your
'regular expression'
matches
something.
Note that this does include the quote characters themselves in the matched values, though you can remove that with a string replace, or modify the regex to not include them. I'll leave that as an exercise for the reader or another poster for now, as 2am is way too late to be messing with regular expressions anymore ;)

If you want to allow escaped quotes inside the string, you can use something like this:
(?:(['"])(.*?)(?<!\\)(?>\\\\)*\1|([^\s]+))
Quoted strings will be group 2, single unquoted words will be group 3.
You can try it on various strings here: http://www.fileformat.info/tool/regex.htm or http://gskinner.com/RegExr/

The regex from Jan Goyvaerts is the best solution I have found so far, but it also produces empty (null) groups, which he filters out in his program. These empty groups also show up in regex testers (e.g. rubular.com).
If you turn the search around (first look for the quoted parts and then for the space-separated words), you might be able to do it in one pass with:
("[^"]*"|'[^']*'|[\S]+)+

(?<!\G".{0,99999})\s|(?<=\G".{0,99999}")\s
This will match the spaces not surrounded by double quotes.
I have to use min,max {0,99999} because Java doesn't support * and + in lookbehind.

It'll probably be easier to search the string, grabbing each part, than to split it.
The reason is that you can split at the spaces before and after "will be", but I can't think of any way to tell a split to ignore the space inside the quotes.
(not actual Java)
string = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
regex = "\"(\\\"|(?!\\\").)+\"|[^ ]+"; // search for a quoted or non-spaced group
final = new Array();
while (string.length > 0) {
string = string.trim();
if (Regex(regex).test(string)) {
final.push(Regex(regex).match(string)[0]);
string = string.replace(regex, ""); // progress to next "word"
}
}
Also, capturing single quotes could lead to issues:
"Foo's Bar 'n Grill"
//=>
"Foo"
"s Bar "
"n"
"Grill"

String.split() is not helpful here because there is no way to distinguish between spaces within quotes (don't split) and those outside (split). Matcher.lookingAt() is probably what you need:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|('[^']+?')|([^\\s]+?))\\s++").matcher(str);
for (int i = 0; i < len; i++)
{
    m.region(i, len);
    if (m.lookingAt())
    {
        String s = m.group(1);
        if ((s.startsWith("\"") && s.endsWith("\"")) ||
            (s.startsWith("'") && s.endsWith("'")))
        {
            s = s.substring(1, s.length() - 1);
        }
        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}
which produces the following output:
0: "This"
5: "is"
8: "a"
10: "string"
17: "that"
22: "will be"
32: "highlighted"
44: "when"
49: "your"
54: "regular expression"
75: "matches"
83: "something."

I liked Marcus's approach; however, I modified it so that I could allow text next to the quotes and support both " and ' quote characters. For example, I needed a="some value" not to be split into [a=, "some value"].
"(?<!\\G\\S{0,99999}[\"'].{0,99999})\\s|(?<=\\G\\S{0,99999}\".{0,99999}\"\\S{0,99999})\\s|(?<=\\G\\S{0,99999}'.{0,99999}'\\S{0,99999})\\s"

Jan's approach is great but here's another one for the record.
If you actually wanted to split as mentioned in the title, keeping the quotes in "will be" and 'regular expression', then you could use this method which is straight out of Match (or replace) a pattern except in situations s1, s2, s3 etc
The regex:
'[^']*'|\"[^\"]*\"|( )
The two left alternations match complete 'quoted strings' and "double-quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expressions on the left. We replace those with SplitHere then split on SplitHere. Again, this is for a true split case where you want "will be", not will be.
Here is a full working implementation (see the results on the online demo).
import java.util.regex.*;
class Program {
    public static void main (String[] args) throws java.lang.Exception {
        String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        Pattern regex = Pattern.compile("'[^']*'|\"[^\"]*\"|( )");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // A captured space is a real separator; quoted strings are copied through unchanged
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            // quoteReplacement keeps any $ or \ inside the quoted text literal
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits) System.out.println(split);
    } // end main
} // end Program

If you are using c#, you can use
string input= "This is a string that \"will be\" highlighted when your 'regular expression' matches <something random>";
List<string> list1 =
Regex.Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""|'(?<match>[\w\s]*)'|<(?<match>[\w\s]*)>").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();
foreach(var v in list1)
Console.WriteLine(v);
I have specifically added "|<(?<match>[\w\s]*)>" to highlight that you can specify any character to group phrases. (In this case I am using < > to group.)
Output is :
This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something random

1st one-liner using String.split()
String s = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
String[] split = s.split( "(?<!(\"|').{0,255}) | (?!.*\\1.*)" );
[This, is, a, string, that, "will be", highlighted, when, your, 'regular expression', matches, something.]
don't split at the blank, if the blank is surrounded by single or double quotes
split at the blank when the 255 characters to the left and all characters to the right of the blank are neither single nor double quotes
adapted from original post (handles only double quotes)

I'm reasonably certain this is not possible using regular expressions alone. Checking whether something is contained inside some other tag is a parsing operation. This seems like the same problem as trying to parse XML with a regex -- it can't be done correctly. You may be able to get your desired outcome by repeatedly applying a non-greedy, non-global regex that matches the quoted strings, then once you can't find anything else, split it at the spaces... that has a number of problems, including keeping track of the original order of all the substrings. Your best bet is to just write a really simple function that iterates over the string and pulls out the tokens you want.
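A rough sketch of such a function, under some assumptions the answer does not spell out: quotes are stripped from the output, there is no escape character, an unterminated quote simply runs to the end of the input, and an apostrophe inside a word (as in Foo's) is treated as an opening quote. The class name ManualTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ManualTokenizer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;           // the open quote character, or 0 when outside quotes
        boolean inToken = false;  // true once the current token has started

        for (char c : input.toCharArray()) {
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;              // closing quote: leave quoted mode
                } else {
                    current.append(c);      // everything inside quotes is kept, spaces included
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                  // opening quote
                inToken = true;             // so that "" still yields an (empty) token
            } else if (Character.isWhitespace(c)) {
                if (inToken) {              // whitespace outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a string that \"will be\" highlighted"));
    }
}
```

Because the scanner keeps its own state, rules such as escape characters or nested delimiters can be added as extra branches without touching a regex.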

A couple hopefully helpful tweaks on Jan's accepted answer:
(['"])((?:\\\1|.)+?)\1|([^\s"']+)
Allows escaped quotes within quoted strings
Avoids repeating the pattern for the single and double quote; this also simplifies adding more quoting symbols if needed (at the expense of one more capturing group)
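Here is how that pattern might be used from Java, as a sketch rather than a tested library routine. The class name EscapedQuoteTokenizer is made up for the example. Note that the backslashes of escaped quotes are kept in the returned token, and that an empty quoted string is not matched because the quoted body uses +?.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteTokenizer {
    // Group 1: the opening quote, group 2: the quoted body, group 3: an unquoted word.
    // As a regex: (['"])((?:\\\1|.)+?)\1|([^\s"']+)
    private static final Pattern TOKEN =
            Pattern.compile("(['\"])((?:\\\\\\1|.)+?)\\1|([^\\s\"']+)");

    static List<String> tokenize(String subject) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(subject);
        while (m.find()) {
            // Exactly one of group 2 and group 3 is set for each match
            tokens.add(m.group(2) != null ? m.group(2) : m.group(3));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The input is: say "he said \"hi\"" ok
        for (String token : tokenize("say \"he said \\\"hi\\\"\" ok")) {
            System.out.println(token);
        }
    }
}
```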

You can also try this:
String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something";
String[] ss = str.split("\"|'");
for (int i = 0; i < ss.length; i++) {
    if ((i % 2) == 0) { // even index: text outside quotes, split it on whitespace
        for (String word : ss[i].trim().split("\\s+")) {
            if (!word.isEmpty()) { // skip the empty piece left by leading whitespace
                System.out.println(word);
            }
        }
    } else { // odd index: text that was inside quotes, keep it whole
        System.out.println(ss[i]);
    }
}

The following returns an array of arguments. Arguments are the variable 'command' split on spaces, unless included in single or double quotes. The matches are then modified to remove the single and double quotes.
using System.Text.RegularExpressions;
var args = Regex.Matches(command, "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'").Cast<Match>
().Select(iMatch => iMatch.Value.Replace("\"", "").Replace("'", "")).ToArray();

When you come across a string like this:
String str = "2022-11-10 08:35:00,470 RAV=REQ YIP=02.8.5.1 CMID=caonaustr CMN=\"Some Value Pyt Ltd\"";
//this helped
String[] str1= str.split("\\s(?=(([^\"]*\"){2})*[^\"]*$)\\s*");
System.out.println("Value of split string is "+ Arrays.toString(str1));
This results in: [2022-11-10, 08:35:00,470, RAV=REQ, YIP=02.8.5.1, CMID=caonaustr, CMN="Some Value Pyt Ltd"]
This regex matches a space only if it is followed by an even number of double quotes, that is, only when the space lies outside a quoted value.

Need help in Regex to exclude splitting string within "

I need to split a String based on comma as seperator, but if the part of string is enclosed with " the splitting has to stop for that portion from starting of " to ending of it even it contains commas in between.
Can anyone please help me to solve this using regex with look around.

Resurrecting this question because it had a simple regex solution that wasn't mentioned. This situation sounds very similar to "regex-match a pattern unless...".
\"[^\"]*\"|(,)
The left side of the alternation matches complete double-quoted strings. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left.
Here is working code (see online demo):
import java.util.regex.*;
class Program {
    public static void main (String[] args) {
        String subject = "\"Messages,Hello\",World,Hobbies,Java\",Programming\"";
        Pattern regex = Pattern.compile("\"[^\"]*\"|(,)");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            // A captured comma is a real separator; quoted strings are copied through unchanged
            if (m.group(1) != null) m.appendReplacement(b, "SplitHere");
            // quoteReplacement keeps any $ or \ inside the quoted text literal
            else m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits)
            System.out.println(split);
    } // end main
} // end Program
Reference
How to match pattern except in situations s1, s2, s3

Please try this:
(?<!\G\s*"[^"]*),
If you put this regex in your program, it should be:
String regex = "(?<!\\G\\s*\"[^\"]*),";
But 2 things are not clear:
Does the " only start next to the ,, or can it start in the middle of the content, as in AAA, BB"CC,DD"? The regex above only deals with a " that starts next to a ,.
If the content itself contains ", how is it escaped? With "" or \"? The regex above does not handle any escaped " format.

Excluding markup on lowercased parentheses letters

A string can contain one or more lowercase letters in parentheses, as in String content = "This is (a) nightmare"; I want to transform the string to "<centamp>This is </centamp>(a) <centamp>nightmare</centamp>"; So basically I want to add centamp markup around the string, but any lowercase letter in parentheses should be excluded from the markup.
This is what I have tried so far, but it doesn't achieve the desired result. There could be none to many parentheses in a string and excluding it from the markup should happen for every parentheses.
Pattern pattern = Pattern.compile("^(.*)?(\\([a-z]*\\))?(.*)?$", Pattern.MULTILINE);
String content = "This is (a) nightmare";
System.out.println(content.matches("^(.*)?(\\([a-z]*\\))?(.*)?$"));
System.out.println(pattern.matcher(content).replaceAll("<centamp>$1$3</centamp>$2"));

This can be done in one replaceAll:
String outputString =
inputString.replaceAll("(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)",
"$1<centamp>$2</centamp>");
It allows a non-empty sequence of lower case English alphabet character inside bracket \\([a-z]+\\).
Features:
Whitespace only sequences are tagged.
There will be no tag surrounding empty string.
Explanation:
\G asserts the match boundary, i.e. the next match can only start from the end of last match. It can also match the beginning of the string (when we have yet to find any match).
Each match of the regex will contain a sequence of: 0 or more consecutive \\([a-z]+\\) (no space between allowed), and followed by at least 1 character that does not form \\([a-z]+\\) sequence.
0 or more consecutive \\([a-z]+\\) to cover the case where the string does not start with \\([a-z]+\\), and the case where the string does not contain \\([a-z]+\\).
In the pattern for this portion (?:\\([a-z]+\\))*+ - note that the + after * makes the quantifier possessive, in other words, it disallows backtracking. Simply put, an optimization.
One character restriction is necessary to prevent adding tag that encloses empty string.
In the pattern for this portion (?:(?!\\([a-z]+\\)).)+ - note that for every character, I check whether it is part of the pattern \\([a-z]+\\) before matching it (?!\\([a-z]+\\))..
(?s) flag will cause . to match any character including new line. This will allow a tag to enclose text that spans multiple lines.
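To see the behaviour described above on concrete inputs, here is a small sketch that wraps the same replaceAll call in a method. The class name CentampTagger and the sample strings other than the one from the question are made up for the example.

```java
class CentampTagger {
    // Wraps every run of text that is not a "(lowercase letters)" group in <centamp> tags.
    static String tag(String input) {
        return input.replaceAll(
                "(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)",
                "$1<centamp>$2</centamp>");
    }

    public static void main(String[] args) {
        System.out.println(tag("This is (a) nightmare"));
        // prints <centamp>This is </centamp>(a)<centamp> nightmare</centamp>
        System.out.println(tag("(a)(b)"));
        // prints (a)(b) because there is no text left to tag
    }
}
```

Note that the space after (a) ends up inside the tag, which matches the "whitespace only sequences are tagged" feature listed above.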

You just replace every occurrence of a parenthesized lowercase letter, such as (a), with </centamp>$1<centamp>, and then prepend <centamp> and append </centamp>.
String content = "Test (a) test (b) (c)";
Pattern pattern = Pattern.compile("(\\([a-z]\\))");
Matcher matcher = pattern.matcher(content);
String result = "<centamp>" + matcher.replaceAll("</centamp>$1<centamp>") + "</centamp>";
note I wrote the above in the browser so there may be syntax errors.
EDIT Here's a full example with the simplest RegEx possible.
import java.util.*;
import java.lang.*;
import java.util.regex.*;
class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String content = "test (a) (b) and (c)";
        String result = "<centamp>" +
            content.replaceAll("(\\([a-z]\\))", "</centamp>$1<centamp>") +
            "</centamp>";
        result = result.replaceAll("<centamp></centamp>", "");
        System.out.print(result);
    }
}

This is another solution which uses cleaner regex. The solution is longer, but it allows more flexibility in adjusting the condition to add tag.
The idea here is to match the parenthesis containing lower case characters (the part we don't want to tag), then use the indices from the matches to identify the portion we want to enclose in tag.
// Regex for the parenthesis containing only lowercase English
// alphabet characters
static Pattern REGEX_IN_PARENTHESIS = Pattern.compile("\\([a-z]+\\)");
private static String addTag(String str) {
Matcher matcher = REGEX_IN_PARENTHESIS.matcher(str);
StringBuilder sb = new StringBuilder();
// Index that we have processed up to last append into StringBuilder
int lastAppend = 0;
while (matcher.find()) {
String bracket = matcher.group();
// The string from lastAppend to start of a match is the part
// we want to tag
// If you want to, you can easily add extra logic to process
// the string
if (lastAppend < matcher.start()) { // will not tag if empty string
sb.append("<centamp>")
.append(str, lastAppend, matcher.start())
.append("</centamp>");
}
// Append the parenthesis with lowercase English alphabet as it is
sb.append(bracket);
lastAppend = matcher.end();
}
// The string from lastAppend to end of string (no more match)
// is the part we want to tag
if (lastAppend < str.length()) {
sb.append("<centamp>")
.append(str, lastAppend, str.length())
.append("</centamp>");
}
return sb.toString();
}

Regular expression to match unescaped special characters only

I'm trying to come up with a regular expression that can match only characters not preceded by a special escape sequence in a string.
For instance, in the string Is ? stranded//? , I want to be able to replace the ? which hasn't been escaped with another string, so I can have this result: Is Dave stranded?
But for the life of me I have not been able to figure out a way. I have only come up with regular expressions that eat all the replaceable characters.
How do you construct a regular expression that matches only characters not preceded by an escape sequence?

Use a negative lookbehind, it's what they were designed to do!
(?<!//)[?]
To break it down:
(
?<! #The negative look behind. It will check that the following slashes do not exist.
// #The slashes you are trying to avoid.
)
[\?] #Your special character list.
Only if the // is not found does the match proceed with the rest of the pattern.
I think in Java it will need to be escaped again as a string something like:
Pattern p = Pattern.compile("(?<!//)[\\?]");

Try this Java code:
String str = "Is ? stranded//?";
Pattern p = Pattern.compile("(?<!//)([?])");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Replace each unescaped ? with the new text
    m.appendReplacement(sb, m.group(1).replace("?", "Dave"));
}
m.appendTail(sb);
// Finally drop the // escape markers
String s = sb.toString().replace("//", "");
System.out.println("Output: " + s);
OUTPUT
Output: Is Dave stranded?

I was thinking about this and have a second, simpler solution that avoids regexes. The other answers are probably better, but I thought I might post it anyway.
String input = "Is ? stranded//?";
String output = input
.replace("//?", "a717efbc-84a9-46bf-b1be-8a9fb714fce8")
.replace("?", "Dave")
.replace("a717efbc-84a9-46bf-b1be-8a9fb714fce8", "?");
Just protect the "//?" by replacing it with something unique (like a guid). Then you know any remaining question marks are fair game.

Use grouping. Here's one example:
import java.util.regex.*;
class Test {
    public static void main(String[] args) {
        // Two non-slash characters followed by a literal ?
        Pattern p = Pattern.compile("([^/][^/])(\\?)");
        String s = "Is ? stranded//?";
        String result = s;
        Matcher m = p.matcher(s);
        // find() looks for a match anywhere; matches() would require the whole string to match
        if (m.find())
            result = m.replaceAll("$1XXX").replace("//", "");
        System.out.println(s + " -> " + result);
    }
}
Output:
$ java Test
Is ? stranded//? -> Is XXX stranded?
In this example, I'm:
first replacing any non-escaped ? with "XXX",
then, removing the "//" escape sequences.
EDIT Use if (m.find()) to ensure that you handle non-matching strings properly.
This is just a quick-and-dirty example. You need to flesh it out, obviously, to make it more robust. But it gets the general idea across.

Match on a set of characters OTHER than an escape sequence, then a regex special character. You could use an inverted character class ([^/]) for the first bit. Special case an unescaped regex character at the front of the string.

String aString = "Is ? stranded//?";
String regex = "(?<!//)[^a-z^A-Z^\\s^/]";
System.out.println(aString.replaceAll(regex, "Dave"));
The part of the regular expression [^a-z^A-Z^\\s^/] matches any character that is not a letter, whitespace, a caret (^), or a forward slash.
The (?<!//) part is a negative lookbehind: it rejects a match that is immediately preceded by //.
This gives the output Is Dave stranded//?

try matching:
(^|(^.)|(.[^/])|([^/].))[special characters list]

I used this one:
((?:^|[^\\])(?:\\\\)*[ESCAPABLE CHARACTERS HERE])
Demo: https://regex101.com/r/zH1zO3/4

how to read string part in java

I have this string :
<meis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" uri="localhost/naro-nei" onded="flpSW531213" identi="lemenia" id="75" lastStop="bendi" xsi:noNamespaceSchemaLocation="http://localhost/xsd/postat.xsd xsd/postat.xsd">
How can I get lastStop property value in JAVA?
This regex worked when tested on http://www.myregexp.com/
But when I try it in java I don't see the matched text, here is how I tried :
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SimpleRegexTest {
public static void main(String[] args) {
String sampleText = "<meis xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" uri=\"localhost/naro-nei\" onded=\"flpSW531213\" identi=\"lemenia\" id=\"75\" lastStop=\"bendi\" xsi:noNamespaceSchemaLocation=\"http://localhost/xsd/postat.xsd xsd/postat.xsd\">";
String sampleRegex = "(?<=lastStop=[\"']?)[^\"']*";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group();
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
}
}
Maybe the problem is that I use escape characters in my test, while the real string doesn't contain any escapes?
UPDATE
Does anyone know why this doesn't work when used in java ? or how to make it work?

(?<=lastStop=[\"']?)[^\"]+

The reason it doesn't work as you expect is because of the * in [^\"']*. The lookbehind is matching at the position before the " in lastStop=", which is permitted because the quote is optional: [\"']?. The next part is supposed to match zero or more non-quote characters, but because the next character is a quote, it matches zero characters.
If you change that * to a +, the second part will fail to match at that position, forcing the regex engine to bump ahead one more position. The lookbehind will match the quote, and [^\"']+ will match what follows. However, you really shouldn't be using a lookbehind for this in the first place. It's much easier to just match the whole sequence in the normal way and extract the part you want to keep via a capturing group:
String sampleRegex = "lastStop=[\"']?([^\"']*)";
Pattern p = Pattern.compile(sampleRegex);
Matcher m = p.matcher(sampleText);
if (m.find()) {
String matchedText = m.group(1);
System.out.println("matched [" + matchedText + "]");
} else {
System.out.println("didn’t match");
}
It will also make it easier to deal with the problem @Kobi mentioned. You're trying to allow for values contained in double-quotes, single-quotes or no quotes, but your regex is too simplistic. For one thing, a quoted value can contain whitespace, but an unquoted one can't. To deal with all three possibilities, you'll need two or three capturing groups, not just one.
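One way the three-group idea could look, as a sketch only: the class name AttributeValue, the method lastStop, and the choice to end an unquoted value at whitespace, a quote, or > are assumptions made for this example and are not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeValue {
    // Group 1: double-quoted value, group 2: single-quoted value, group 3: unquoted value.
    private static final Pattern LAST_STOP = Pattern.compile(
            "lastStop=(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    // Returns the value of the lastStop attribute, or null if it is not present.
    static String lastStop(String text) {
        Matcher m = LAST_STOP.matcher(text);
        if (!m.find()) {
            return null;
        }
        for (int g = 1; g <= 3; g++) {
            if (m.group(g) != null) {
                return m.group(g);   // whichever alternative matched
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lastStop("<meis id=\"75\" lastStop=\"bendi\" uri=\"localhost/naro-nei\">"));
        System.out.println(lastStop("<meis lastStop='two words'>"));
        System.out.println(lastStop("<meis lastStop=bendi>"));
    }
}
```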
