String#replaceAll() to replace *anything but a =* group - java

I have a parameter of key-value like this:
sign="aaaabbbb="
And I want to get the parameter name sign and the value "aaaabbbb=" (with the quotation marks).
I thought I could split the string on = and take the first element of the array as the parameter name, then use String.replaceAll() to remove the sign= prefix and get the value. Here is my sample code:
public class TestStringReplace {
public static void main(String[] argvs){
String s = "sign=\"aaaabbbb=\"";
String[] ss = s.split("=");
String value = s.replaceAll("\\[^=]+=","");
//EDIT: s.replaceAll("[^=]+=","") will not do the job either.
System.out.println(ss[0]);
System.out.println(value);
}
}
but the output shows this:
sign
sign="aaaabbbb="
Why does \\[^=]+= not match sign= and replace it with an empty string here? I'm quite new to Java regex and need some help.
Thanks in advance.

In Java you can use the following:
String str = "sign=\"aaaabbbb=\"";
String var1 = str.substring(0, str.indexOf('='));
String var2 = str.substring(str.indexOf('=')+1);
System.out.println("var1="+var1+", var2="+var2);
The above would have the following output:
var1=sign, var2="aaaabbbb="
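If split() is preferred over substring(), its two-argument overload can cap the result at two pieces, so the trailing = inside the value is left alone. A minimal sketch; the class name SplitOnce and the helper keyAndValue are made up for illustration:

```java
public class SplitOnce {
    // Hypothetical helper: splits "key=value" at the first '=' only.
    static String[] keyAndValue(String s) {
        // A limit of 2 applies the pattern at most once,
        // so any later '=' stays inside the second element.
        return s.split("=", 2);
    }

    public static void main(String[] args) {
        String[] kv = keyAndValue("sign=\"aaaabbbb=\"");
        System.out.println(kv[0]); // sign
        System.out.println(kv[1]); // "aaaabbbb="
    }
}
```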

Try the following regex ^\\w+= with replaceAll() instead of your regex:
public class TestStringReplace {
public static void main(String[] argvs){
String s = "sign=\"aaaabbbb=\"";
String[] ss = s.split("=");
String value = s.replaceAll("^\\w+=","");
System.out.println(ss[0]);
System.out.println(value);
}
}
This will remove the sign=.
Note that with your "\\[^=]+=" regex you were trying to match the character [ literally at the beginning of the regex.
That explains why replaceAll() gave you sign="aaaabbbb=" back unchanged: there was no match, so nothing was replaced.
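The three variants discussed in this thread can be compared side by side. This is only a sketch; the class and method names are invented for the comparison. It also shows why the unescaped [^=]+= from the question's EDIT fails with replaceAll(): it matches twice, once for sign= and once for "aaaabbbb=.

```java
public class ReplaceCheck {
    // Escaped bracket: the regex looks for a literal "[", so nothing is replaced.
    static String escapedBracket(String s) {
        return s.replaceAll("\\[^=]+=", "");
    }

    // Real character class with replaceAll: every "non-'=' run plus '='" is removed.
    static String classReplaceAll(String s) {
        return s.replaceAll("[^=]+=", "");
    }

    // Real character class with replaceFirst: only the leading key and '=' are removed.
    static String classReplaceFirst(String s) {
        return s.replaceFirst("[^=]+=", "");
    }

    public static void main(String[] args) {
        String s = "sign=\"aaaabbbb=\"";
        System.out.println(escapedBracket(s));    // sign="aaaabbbb="
        System.out.println(classReplaceAll(s));   // "
        System.out.println(classReplaceFirst(s)); // "aaaabbbb="
    }
}
```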

You're probably better off with an actual Pattern and back-references here.
For instance:
String[] test = {
"sign=\"aaaabbbb=\"",
// assuming a HTTP GET-styled parameter list
"blah?sign=\"aaaabbbb=\"",
"foo?sign=\"aaaabbbb=\"&blah=\"hodor\""
};
// (sign)    group 1: literal "sign"
// =\"       literal key-value delimiter and double quote
// (.+?)     group 2: any character, reluctantly quantified
// \"        literal ending double quote
// (?=$|&)   look-ahead for either "&" or end of input
Pattern p = Pattern.compile("(sign)=\"(.+?)\"(?=$|&)");
Matcher m = null;
for (String s: test) {
m = p.matcher(s);
while (m.find()) {
System.out.printf(
"Found key: \"%s\" and value: \"%s\"%n", m.group(1), m.group(2)
);
}
}
Output
Found key: "sign" and value: "aaaabbbb="
Found key: "sign" and value: "aaaabbbb="
Found key: "sign" and value: "aaaabbbb="
Notes
I'm assuming an HTTP GET-style parameter list, but maybe you don't need to check for the next key-value pair delimiter (i.e. &), in which case you can remove the & part.
I'm also assuming you want the "s left out of your value back-reference, which kind of makes the following & check useless.
Your current pattern for the replaceAll invocation will match as follows:
// \\[   literal "[" (escaped, so it no longer opens a character class)
// ^     start-of-input anchor (it can never match right after a "[")
// =     literal "="
// ]+    literal "]", greedily quantified (1+ occurrences)
// =     literal "="
"\\[^=]+="
Finally, if you really, really want to use String#replaceAll for this, here's a slightly different pattern than the one above:
for (String s: test) {
System.out.println(
s.replaceAll(
".*(sign)=\"(.+?)\"(?=$|&).*",
"Found key: \"$1\" and value: \"$2\""
)
);
}
It still uses back-references and produces the same result, albeit in an uglier way: you can't reuse the $1 and $2 group values afterwards, since all you get back is a new String that replaces the original one.
Last possible solution, using String#split. This is the ugliest, as it won't work well with a list of parameters:
for (String s: test) {
System.out.println(
// (?<!^)   negative look-behind for start of input
// =        literal "="
// \"       literal double quote
Arrays.toString(s.split("(?<!^)=\""))
);
}
Output
[sign, aaaabbbb]
[blah?sign, aaaabbbb] --> yuck
[foo?sign, aaaabbbb, &blah, hodor"] --> yuck again

The double backslash is a mistake, because it escapes the [ into a literal [, which never occurs in your input.
Instead, do this:
String name = s.replaceAll("=.*", "");
// replaceFirst, so the = inside the value is not treated as a second delimiter
String value = s.replaceFirst(".*?=", "");

Related

Java Regular Expression not worked

I have the following string source:
String source= "$This-is-(…-“demo”";
I need the words separated by exactly one dash (-), like this:
This-is-demo
I replace each special character with "-":
String result = source.replaceAll("[^\\p{L}\\p{Z}]" + "\\s*", "-");
Running this gives result = "-This-is-----demo-".
I then use the following call, expecting it to collapse every run of two or more "-" characters into a single one:
result.replaceAll("(--|---|----|-----|------|-------|--------|---------|----------)", "-")
My result is -This-is---demo-, which is incorrect.
FULL CODE
public static void main(String[] args) {
String source = "$This-is-(…-“demo”";
String result = source.replaceAll("[^\\p{L}\\p{Z}]" + "\\s*", "-").trim().replaceAll("(--|---|----|-----|------|-------|--------|---------|----------)", "-");
System.out.println(result);
}
Replace every punctuation mark and symbol with a space, trim the result, then turn each run of spaces into a single dash:
source.replaceAll("[\\p{P}\\p{S}]", " ").trim().replaceAll(" +", "-");
This gives the desired result, This-is-demo.
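The enumerated alternation in the question fails because alternatives are tried left to right, so "--" always wins and longer runs are only partly collapsed. A + quantifier avoids that. A minimal sketch, assuming only letters should survive and everything else (spaces and digits included) counts as a separator; Slugify and slug are illustrative names:

```java
public class Slugify {
    // Hypothetical helper: joins the letter runs of the input with single dashes.
    static String slug(String source) {
        return source
                .replaceAll("[^\\p{L}]+", "-")  // each run of non-letters becomes one dash
                .replaceAll("^-|-$", "");       // drop a leading or trailing dash
    }

    public static void main(String[] args) {
        // The Unicode escapes stand for the ellipsis and curly quotes from the question.
        System.out.println(slug("$This-is-(\u2026-\u201Cdemo\u201D")); // This-is-demo
    }
}
```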

Java Regex file extension

I have to check if a file name ends with a gzip extension. In particular I'm looking for two extensions: ".tar.gz" and ".gz". I would like to capture the file name (and path) as a group using a single regular expression excluding the gzip extension if any.
I tested the following regular expressions on this example path
String path = "/path/to/file.txt.tar.gz";
Expression 1:
String rgx = "(.+)(?=([\\.tar]?\\.gz)$)";
Expression 2:
String rgx = "^(.+)[\\.tar]?\\.gz$";
Extracting group 1 in this way:
Matcher m = Pattern.compile(rgx).matcher(path);
if(m.find()){
System.out.println(m.group(1));
}
Both regular expressions give me the same result: /path/to/file.txt.tar and not /path/to/file.txt.
Any help will be appreciated.
Thanks in advance
You can use the following idiom to match both your path + file name and the gzip extension in one go:
String[] inputs = {
"/path/to/foo.txt.tar.gz",
"/path/to/bar.txt.gz",
"/path/to/nope.txt"
};
// (.+?)       group 1: any character, reluctantly quantified
// ( ... )     group 2: the whole extension
// (\\.tar)?   optional ".tar"
// \\.gz       compulsory ".gz"
// $           end of input
Pattern p = Pattern.compile("(.+?)((\\.tar)?\\.gz)$");
for (String s: inputs) {
Matcher m = p.matcher(s);
if (m.find()) {
System.out.printf("Found: %s --> %s %n", m.group(1), m.group(2));
}
}
Output
Found: /path/to/foo.txt --> .tar.gz
Found: /path/to/bar.txt --> .gz
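You need to make the part that matches the file name reluctant, i.e. change (.+) to (.+?), and make .tar an optional group, (\\.tar)?, rather than the character class [\\.tar]?, which matches a single character out of ., t, a and r:
String rgx = "^(.+?)(\\.tar)?\\.gz$";
//             ^^^^^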
Now you get:
Matcher m = Pattern.compile(rgx).matcher(path);
if(m.find()){
System.out.println(m.group(1)); // /path/to/file.txt
}
Use a regex based on capturing groups, with a reluctant quantifier for the file name so that it does not swallow an optional .tar:
^(.+)/(.+?)(?:\\.tar)?\\.gz$
Then:
Get the path from group 1.
Get the file name from group 2.
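The reluctant-group idea from the answers above can be wrapped in a small helper. This is a sketch rather than a definitive implementation; GzipName and stripGzip are hypothetical names, and returning the path unchanged when there is no gzip extension is an assumption about the desired behaviour:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GzipName {
    // Reluctant group 1, then an optional ".tar" group, then a mandatory ".gz" at the end.
    private static final Pattern GZ = Pattern.compile("^(.+?)(\\.tar)?\\.gz$");

    // Hypothetical helper: returns the path without the gzip extension,
    // or the path unchanged when it does not end in ".gz".
    static String stripGzip(String path) {
        Matcher m = GZ.matcher(path);
        return m.matches() ? m.group(1) : path;
    }

    public static void main(String[] args) {
        System.out.println(stripGzip("/path/to/file.txt.tar.gz")); // /path/to/file.txt
        System.out.println(stripGzip("/path/to/file.txt.gz"));     // /path/to/file.txt
        System.out.println(stripGzip("/path/to/file.txt"));        // /path/to/file.txt
    }
}
```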

Java Regex for custom function

I'm looking for a Regex pattern that matches the following, but I'm kind of stumped so far. I'm not sure how to grab the results of the two groups I want, marked by id, and attr.
Should match:
account[id].attr
account[anotherid].anotherattr
These should respectively return id, attr,
and anotherid, anotherattr
Any tips?
Here's a complete solution mapping your id -> attributes:
String[] input = {
"account[id].attr",
"account[anotherid].anotherattr"
};
// account   literal "account"
// \\[       escaped "["
// (.+)      group 1: any characters
// \\]       escaped "]"
// \\.       escaped "."
// (.+)      group 2: any characters
Pattern p = Pattern.compile("account\\[(.+)\\]\\.(.+)");
Map<String, String> output = new LinkedHashMap<String, String>();
// iterating over input Strings
for (String s: input) {
// matching
Matcher m = p.matcher(s);
// finding only once per input String. Change to a while-loop if multiple instances
// within single input
if (m.find()) {
// back-referencing group 1 and 2 as key -> value
output.put(m.group(1), m.group(2));
}
}
System.out.println(output);
Output
{id=attr, anotherid=anotherattr}
Note
In this implementation, "incomplete" inputs such as "account[anotherid]." will not be put in the Map as they don't match the Pattern at all.
In order to have these cases put as id -> null, you only need to add a ? at the end of the Pattern.
That will make the last group optional.
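The optional-group variant from the note can be sketched as follows. Two things here are assumptions layered on top of the answer: the class and method names, and the use of [^\\]]+ instead of .+ for group 1, which keeps the id from running past the first closing bracket:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccountParser {
    // [^\]]+ keeps group 1 inside the brackets; the trailing ? makes group 2 optional.
    private static final Pattern P = Pattern.compile("account\\[([^\\]]+)\\]\\.(.+)?");

    // Hypothetical helper: maps each id to its attribute, or to null if none is given.
    static Map<String, String> parse(String... inputs) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String s : inputs) {
            Matcher m = P.matcher(s);
            if (m.find()) {
                // group(2) is null when the optional group did not take part in the match
                out.put(m.group(1), m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("account[id].attr", "account[anotherid]."));
        // {id=attr, anotherid=null}
    }
}
```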

How to replace multiple words with space in a string using Java

I tried to replace a list of words in a given string with the following code.
String Sample = " he saw a cat running of that pat's mat ";
String regex = "'s | he | of | to | a | and | in | that";
Sample = Sample.replaceAll(regex, " ");
The output is
[ saw cat running that pat mat ]
// minus the []
It still has the last word "that". Is there any way to modify the regex so that it also handles the last word?
Try:
String Sample = " he saw a cat running of that pat's mat remove 's";
String resultString = Sample.replaceAll("\\b( ?'s|he|of|to|a|and|in|that)\\b", "");
System.out.print(resultString);
saw cat running pat mat remove
DEMO
http://ideone.com/Yitobz
The problem is that you have consecutive words that you are trying to replace.
For example, consider the substring
[ of that ]
While replaceAll is running, [ of ] matches
[ of that ]
 ^  ^
and is replaced with a single space. The next character to match is then t, not the space expected by
... | that | ...
What I think you can do to fix this is add word boundaries instead of spaces.
String regex = "'s\\b|\\bhe\\b|\\bof\\b|\\bto\\b|\\ba\\b|\\band\\b|\\bin\\b|\\bthat\\b";
or the shorter version as shown in Tuga's answer.
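Put together, the word-boundary approach could look like the sketch below. The class and method names are illustrative, and the final whitespace clean-up is an extra step that is not part of the answers above:

```java
public class StopWords {
    // Hypothetical helper: removes the listed words, then tidies the whitespace.
    static String strip(String sample) {
        return sample
                // \b needs no surrounding spaces, so adjacent words can both match
                .replaceAll("'s\\b|\\b(?:he|of|to|a|and|in|that)\\b", "")
                .trim()
                .replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(strip(" he saw a cat running of that pat's mat "));
        // saw cat running pat mat
    }
}
```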
It doesn't work because the " of " match is replaced first, which consumes the space before "that", so " that" no longer matches.
You can change it in one of these ways:
String regex = "'s | he | of| to | a | and | in | that";
or
String regex = "'s | he | of | to | a | and | in |that ";
or just call Sample = Sample.replaceAll(regex, " "); a second time.

Java regExp get substring before last quote

I have a string:
String text = "\"Alaska \"adaa\" asdas\" at [2013-10-298 13:36.062];";
I need to get the substring
//"Alaska "adaa" asdas"
String text = "\"Alaska \"adaa\" asdas\"";
How can I do this?
Why not just use lastIndexOf?
text = text.substring(0, text.lastIndexOf("\"") + 1);
One way would be replacing everything after the last quote with an empty string:
text = text.replaceAll("(?<=\")[^\"]*$", "");
// (?<=\")   preceded by a quote
// [^\"]*    does not contain a quote
// $         goes all the way to the end
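Try this:
text = text.replaceAll("\"[^\"]*$", "\"");
Note that this has to be replaceAll() (or replaceFirst()): String#replace() treats its first argument as literal text, not as a regex.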
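For comparison, here are the index-based and the look-behind variants side by side. This is a sketch; the class and method names are made up, and leaving the input unchanged when it contains no quote at all is an assumption:

```java
public class LastQuote {
    // Plain string search: keep everything up to and including the last quote.
    static String viaIndex(String text) {
        int last = text.lastIndexOf('"');
        return last < 0 ? text : text.substring(0, last + 1);
    }

    // Regex variant: delete the quote-free tail that follows the last quote.
    static String viaRegex(String text) {
        return text.replaceAll("(?<=\")[^\"]*$", "");
    }

    public static void main(String[] args) {
        String text = "\"Alaska \"adaa\" asdas\" at [2013-10-298 13:36.062];";
        System.out.println(viaIndex(text)); // "Alaska "adaa" asdas"
        System.out.println(viaRegex(text)); // "Alaska "adaa" asdas"
    }
}
```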
