Java Regex for custom function - java

I'm looking for a Regex pattern that matches the following, but I'm kind of stumped so far. I'm not sure how to grab the results of the two groups I want, marked by id, and attr.
Should match:
account[id].attr
account[anotherid].anotherattr
These should respectively return id, attr,
and anotherid, anotherattr
Any tips?

Here's a complete solution mapping your id -> attributes:
String[] input = {
"account[id].attr",
"account[anotherid].anotherattr"
};
// | literal for "account"
// | | escaped "["
// | | | group 1: any character
// | | | | escaped "]"
// | | | | | escaped "."
// | | | | | | group 2: any character
Pattern p = Pattern.compile("account\\[(.+)\\]\\.(.+)");
Map<String, String> output = new LinkedHashMap<String, String>();
// iterating over input Strings
for (String s: input) {
// matching
Matcher m = p.matcher(s);
// finding only once per input String. Change to a while-loop if multiple instances
// within single input
if (m.find()) {
// back-referencing group 1 and 2 as key -> value
output.put(m.group(1), m.group(2));
}
}
System.out.println(output);
Output
{id=attr, anotherid=anotherattr}
Note
In this implementation, "incomplete" inputs such as "account[anotherid]." will not be put in the Map as they don't match the Pattern at all.
In order to have these cases put as id -> null, you only need to add a ? at the end of the Pattern.
That will make the last group optional.
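As a rough sketch of the note above, the following variant appends the ? to the last group and shows that an incomplete input then ends up in the Map with a null value. The class name OptionalGroupDemo and the helper parse are made up for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // Same pattern as above, with "?" after the last group to make it optional.
    private static final Pattern P = Pattern.compile("account\\[(.+)\\]\\.(.+)?");

    // Hypothetical helper: maps each id to its attribute, or to null if the attribute is missing.
    static Map<String, String> parse(String... inputs) {
        Map<String, String> output = new LinkedHashMap<>();
        for (String s : inputs) {
            Matcher m = P.matcher(s);
            if (m.find()) {
                // group(2) is null when the optional group did not take part in the match
                output.put(m.group(1), m.group(2));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // prints {id=attr, anotherid=null}
        System.out.println(parse("account[id].attr", "account[anotherid]."));
    }
}
```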

Related

How to extract members with regex

I have this string to parse and extract all elements between <>:
String text = "test user #myhashtag <#C5712|user_name_toto> <#U433|user_hola>";
I tried with this pattern, but it doesn't work (no result):
String pattern = "<#[C,U][0-9]+\\|[.]+>";
So in this example I want to extract:
<#C5712|user_name_toto>
<#U433|user_hola>
Then for each, I want to extract:
C or U element
ID (ie: 5712 or 433)
user name (ie: user_name_toto)
Thank you very much guys
The main problem I can see with your pattern is that it doesn't contain groups, hence retrieving parts of it will be impossible without further parsing.
You define numbered groups within parentheses: (partOfThePattern).
From Java 7 onwards, you can also define named groups as follows: (?<theName>partOfThePattern).
The second problem is that [.] corresponds to a literal dot, not an "any character" wildcard.
The third problem is your last quantifier: once [.] is replaced with a real wildcard, a greedy .+ would run from the first username all the way to the last > in the string.
Here's a self-contained example fixing all that:
String text = "test user #myhashtag <#C5712|user_name_toto> <#U433|user_hola>";
// | starting <#
// | | group 1: any 1 char
// | | | group 2: 1+ digits
// | | | | escaped "|"
// | | | | | group 3: 1+ non-">" chars, greedy
// | | | | | | closing >
// | | | | | |
Pattern p = Pattern.compile("<#(.)(\\d+)\\|([^>]+)>");
Matcher m = p.matcher(text);
while (m.find()) {
System.out.printf(
"C or U? %s%nUser ID: %s%nUsername: %s%n",
m.group(1), m.group(2), m.group(3)
);
}
Output
C or U? C
User ID: 5712
Username: user_name_toto
C or U? U
User ID: 433
Username: user_hola
Note
I'm not validating C vs U here (gives you another . example).
You can easily replace the initial (.) with (C|U) if you only have either. You can also have the same with ([CU]).
<#([CU])(\d+)\|(\w+)>
Where:
$1 --> C/U
$2 --> 5712/433
$3 --> user_name_toto/user_hola
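For what it's worth, here is a minimal sketch of the named-group syntax mentioned earlier, applied to the same input. The group names type, id and name, the class NamedGroupDemo and the helper describe are my own choices, not something from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Named groups (Java 7+); the names "type", "id" and "name" are illustrative.
    private static final Pattern P =
        Pattern.compile("<#(?<type>[CU])(?<id>\\d+)\\|(?<name>[^>]+)>");

    // Hypothetical helper: formats every match as "type:id:name", joined by ";".
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = P.matcher(text);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(m.group("type")).append(':')
              .append(m.group("id")).append(':')
              .append(m.group("name"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "test user #myhashtag <#C5712|user_name_toto> <#U433|user_hola>";
        // prints C:5712:user_name_toto;U:433:user_hola
        System.out.println(describe(text));
    }
}
```

Named groups can still be read by index (m.group(1) and so on), so switching to them does not break existing code.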

String#replaceAll() to replace *anything but a =* group

I have a parameter of key-value like this:
sign="aaaabbbb="
And I want to get the parameter name sign and the value "aaaabbbb=" (with the quote signs).
I thought I could split the string with = to get the first elem of the array which is the parameter name and do a String.replaceAll() to remove the sign= to get the value. Anyway here is my sample code:
public class TestStringReplace {
public static void main(String[] argvs){
String s = "sign=\"aaaabbbb=\"";
String[] ss = s.split("=");
String value = s.replaceAll("\\[^=]+=","");
//EDIT: s.replaceAll("[^=]+=","") will not do the job either.
System.out.println(ss[0]);
System.out.println(value);
}
}
but the output shows this:
sign
sign="aaaabbbb="
Why is \\[^=]+= not matching sign= and replacing it with an empty string here? I'm quite new to Java regex and need some help.
Thanks in advance.
In Java you can use the following:
String str = "sign=\"aaaabbbb=\"";
String var1 = str.substring(0, str.indexOf('='));
String var2 = str.substring(str.indexOf('=')+1);
System.out.println("var1="+var1+", var2="+var2);
The above would have the following output:
var1=sign, var2="aaaabbbb="
Try the following regex ^\\w+= with replaceAll() instead of your regex:
public class TestStringReplace {
public static void main(String[] argvs){
String s = "sign=\"aaaabbbb=\"";
String[] ss = s.split("=");
String value = s.replaceAll("^\\w+=","");
System.out.println(ss[0]);
System.out.println(value);
}
}
This will remove the sign=.
Note that with your "\\[^=]+=" regex you were trying to match the character [ literally in the beginning of your regex.
And it explains why you got sign="aaaabbbb=" as a result with replaceAll() which didn't replace anything because there's no match.
You're probably better off with an actual Pattern and back-references here.
For instance:
String[] test = {
"sign=\"aaaabbbb=\"",
// assuming a HTTP GET-styled parameter list
"blah?sign=\"aaaabbbb=\"",
"foo?sign=\"aaaabbbb=\"&blah=\"hodor\""
};
// | group 1: literal "sign"
// | | literal key-value delimiter and double quote
// | | | group 2: any character reluctantly quantified
// | | | | literal ending double quote
// | | | | | look-ahead for either "&" or end
// | | | | |
Pattern p = Pattern.compile("(sign)=\"(.+?)\"(?=$|&)");
Matcher m = null;
for (String s: test) {
m = p.matcher(s);
while (m.find()) {
System.out.printf(
"Found key: \"%s\" and value: \"%s\"%n", m.group(1), m.group(2)
);
}
}
Output
Found key: "sign" and value: "aaaabbbb="
Found key: "sign" and value: "aaaabbbb="
Found key: "sign" and value: "aaaabbbb="
Notes
I'm assuming a HTTP GET styled parameter list, but maybe you don't need to actually check for a next parameter key-value pair delimiter (i.e. &) - in which case you can remove the & part
I'm also assuming you want the "s out of your value back-reference, which kind of makes the following & check useless
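If you do need every parameter rather than just sign, the same idea can be generalised. This is only a sketch under the same assumption as above (HTTP GET-styled list, values always wrapped in double quotes); the key sub-pattern [^?&=]+ and the names ParamDemo and parse are mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamDemo {
    // Group 1: key, 1+ chars other than "?", "&" and "=".
    // Group 2: value between double quotes, reluctantly quantified,
    // followed (look-ahead) by either "&" or the end of input.
    private static final Pattern P = Pattern.compile("([^?&=]+)=\"(.*?)\"(?=$|&)");

    // Hypothetical helper: collects every key -> value pair found in the input.
    static Map<String, String> parse(String s) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        // prints {sign=aaaabbbb=, blah=hodor}
        System.out.println(parse("foo?sign=\"aaaabbbb=\"&blah=\"hodor\""));
    }
}
```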
Your current pattern for the replaceAll invocation will match as follows:
// "\\[" : literal "[" (escaped, so no character class is opened)
// "^"   : start-of-input anchor, which can never match after a consumed "["
// "="   : literal "="
// "]+"  : literal "]", greedily quantified (1+ occurrences)
// "="   : literal "="
"\\[^=]+="
Finally, if you really, really want to use String#replaceAll for this, here's a slightly different pattern than the one above:
for (String s: test) {
System.out.println(
s.replaceAll(
".*(sign)=\"(.+?)\"(?=$|&).*",
"Found key: \"$1\" and value: \"$2\""
)
);
}
It still uses back-references and will produce the same result, albeit in an uglier way: you can't reuse the $1 and $2 group values, since you're creating a new String replacing the original one.
Last possible solution, using String#split. This is the ugliest as it won't work well with a list of parameters:
for (String s: test) {
System.out.println(
// | negative look-behind for start of input
// | | literal "="
// | | | literal "
// | | |
Arrays.toString(s.split("(?<!^)=\""))
);
}
Output
[sign, aaaabbbb]
[blah?sign, aaaabbbb] --> yuck
[foo?sign, aaaabbbb, &blah, hodor"] --> yuck again
The double backslash is a mistake, because it escapes the [ to a literal [, which will never match.
Instead, do this:
String name = s.replaceAll("=.*", "");
String value = s.replaceFirst(".*?=", "");
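To make the difference concrete, here is a small sketch comparing the variants on the input from the question. Note that an unanchored "up to the first =" pattern matches twice on this input when used with replaceAll, which is why the value is taken with replaceFirst here. The class and method names are hypothetical.

```java
class ReplaceDemo {
    // Everything from the first "=" onwards is dropped, leaving the name.
    static String name(String s) {
        return s.replaceAll("=.*", "");
    }

    // Only the first "...=" prefix is dropped, leaving the quoted value intact.
    static String value(String s) {
        return s.replaceFirst("[^=]+=", "");
    }

    public static void main(String[] args) {
        String s = "sign=\"aaaabbbb=\"";
        // Escaped "[": the pattern never matches, so the input comes back unchanged.
        System.out.println(s.replaceAll("\\[^=]+=", ""));
        // Unescaped, with replaceAll: both sign= and "aaaabbbb= match, only a lone quote is left.
        System.out.println(s.replaceAll("[^=]+=", ""));
        System.out.println(name(s));   // sign
        System.out.println(value(s));  // "aaaabbbb="
    }
}
```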

Matching ${123...456} and extracting 2 numbers in Java?

What is the simplest, most succinct way to extract 2 integers from a String when I know the format will always be ${INT1...INT2}? E.g. "Hello ${123...456}" would extract 123, 456.
I would go with a Pattern with groups and back-references.
Here's an example:
String input = "Hello ${123...456}, bye ${789...101112}";
// | escaped "$"
// | | escaped "{"
// | | | first group (any number of digits)
// | | | | 3 escaped dots
// | | | | | second group (same as 1st)
// | | | | | | escaped "}"
Pattern p = Pattern.compile("\\$\\{(\\d+)\\.{3}(\\d+)\\}");
Matcher m = p.matcher(input);
// iterating over matcher's find for multiple matches
while (m.find()) {
System.out.println("Found...");
System.out.println("\t" + m.group(1));
System.out.println("\t" + m.group(2));
}
Output
Found...
123
456
Found...
789
101112
final String string = "${123...456}";
final String firstPart = string.substring(string.indexOf("${") + "${".length(), string.indexOf("..."));
final String secondPart = string.substring(string.indexOf("...") + "...".length(), string.indexOf("}"));
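// parse each part on its own; concatenating them first would yield the single number 123456
final int first = Integer.parseInt(firstPart);
final int second = Integer.parseInt(secondPart);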
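If the two values are needed as numbers rather than Strings, the regex approach can be wrapped up roughly like this. It's a sketch only: the helper parseRange and the choice to return null on no match are my assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeDemo {
    // "${", 1+ digits, three literal dots, 1+ digits, "}"
    private static final Pattern P = Pattern.compile("\\$\\{(\\d+)\\.{3}(\\d+)\\}");

    // Hypothetical helper: returns {first, second} for the first ${A...B} found,
    // or null if the input contains no such token.
    // Integer.parseInt throws NumberFormatException if a value does not fit in an int.
    static int[] parseRange(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] range = parseRange("Hello ${123...456}");
        System.out.println(range[0] + "," + range[1]); // prints 123,456
    }
}
```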

How to replace multiple words with space in a string using Java

I tried to replace a list of words from a given string with the following code.
String Sample = " he saw a cat running of that pat's mat ";
String regex = "'s | he | of | to | a | and | in | that";
Sample = Sample.replaceAll(regex, " ");
The output is
[ saw cat running that pat mat ]
// minus the []
It still has the last word "that". Is there any way to modify the regex so that it also handles the last word?
Try:
String Sample = " he saw a cat running of that pat's mat remove 's";
String resultString = Sample.replaceAll("\\b( ?'s|he|of|to|a|and|in|that)\\b", "");
System.out.print(resultString);
saw cat running pat mat remove
DEMO
http://ideone.com/Yitobz
The problem is that you have consecutive words that you are trying to replace.
For example, consider the substring
[ of that ]
while the replaceAll is running, the [ of ] matches
[ of that ]
^ ^
and that will be replaced with a (space). The next character to match is t, not a space expected by
... | that | ...
What I think you can do to fix this is add word boundaries instead of spaces.
String regex = "'s\\b|\\bhe\\b|\\bof\\b|\\bto\\b|\\ba\\b|\\band\\b|\\bin\\b|\\bthat\\b";
or the shorter version as shown in Tuga's answer.
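Since word boundaries don't consume the surrounding spaces, the removed words leave runs of spaces behind. A possible way to tidy that up, sketched with a hypothetical helper strip (the whitespace collapsing and trim are my addition, not part of the question):

```java
class StopWordDemo {
    // Hypothetical helper: removes the listed words and the possessive 's,
    // then collapses the leftover runs of whitespace into single spaces.
    static String strip(String s) {
        return s.replaceAll("'s\\b|\\b(?:he|of|to|a|and|in|that)\\b", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String sample = " he saw a cat running of that pat's mat ";
        // prints [saw cat running pat mat]
        System.out.println("[" + strip(sample) + "]");
    }
}
```

Because \b is zero-width, " of that " no longer has the problem of one match eating the space the next one needs.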
It doesn't work because the " of " match is replaced first, and that match consumes the space that " that" needs in front of it.
You can change the regex in two ways:
String regex = "'s | he | of| to | a | and | in | that";
or
String regex = "'s | he | of | to | a | and | in |that ";
Or you can simply call Sample = Sample.replaceAll(regex, " "); a second time.

Java regExp get sub-string before last quote

I have a string:
String text = "\"Alaska \"adaa\" asdas\" at [2013-10-298 13:36.062];";
I need to get the substring
//"Alaska "adaa" asdas"
String text = "\"Alaska \"adaa\" asdas\"";
How to?
Why not just use lastIndexOf?
text = text.substring(0, text.lastIndexOf("\"") + 1);
One way would be replacing everything after the last quote with an empty string:
text = text.replaceAll("(?<=\")[^\"]*$", "");
// ^^^^^^^ ^^^ ^
// | | |
// Preceded by a quote ----+ | |
// Does not contain a quote -----+ |
// Goes all the way to the end ------+
Try this:
text = text.replaceAll("\"[^\"]*$", "\"");
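For comparison, here is a small sketch putting the index-based and the regex-based approaches side by side. The regex variant relies on replaceAll, since String#replace would treat the pattern as literal text. Returning the input unchanged when it contains no quote at all is my assumption, and the class and method names are made up.

```java
class LastQuoteDemo {
    // Index-based: keep everything up to and including the last double quote.
    // Assumption: an input without any quote is returned unchanged.
    static String viaIndex(String text) {
        int i = text.lastIndexOf('"');
        return i < 0 ? text : text.substring(0, i + 1);
    }

    // Regex-based: replace the last quote plus the quote-free tail with a single quote.
    static String viaRegex(String text) {
        return text.replaceAll("\"[^\"]*$", "\"");
    }

    public static void main(String[] args) {
        String text = "\"Alaska \"adaa\" asdas\" at [2013-10-298 13:36.062];";
        System.out.println(viaIndex(text)); // prints "Alaska "adaa" asdas"
        System.out.println(viaRegex(text)); // prints "Alaska "adaa" asdas"
    }
}
```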
