Replace a string using a regular expression - java

I have a string that I would like to replace using a regular expression in java but I am not quite sure how to do this.
Let's say I have the code below:
String globalID="60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr="^([A-Z0-9]{3})[A-Z0-9]*|-([A-Z0-9]{3})[A-Z0-9]*$|-([A-Z0-9]{2})[A-Z0-9]*";
I would like to apply regExpr to globalID so that the new string is 60D1E4CAE043. I already did it with str.substring(0,3)+..., but I was wondering whether it can be done with the regular expression in Java. I tried replaceAll, but the output was not the one described above.
To be more specific, I would like to turn globalID into a newGlobalID using the regular expression above. The newGlobalID should be 60D1E4CAE043.
Thanks

This is definitely not the best code ever, but you could do something like this:
String globalID = "60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr = "^([A-Z0-9]{3})[A-Z0-9]*|-([A-Z0-9]{3})[A-Z0-9]*$|-([A-Z0-9]{2})[A-Z0-9]*";
Pattern pattern = Pattern.compile(regExpr);
Matcher matcher = pattern.matcher(globalID);
String newGlobalID = "";
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
newGlobalID += matcher.group(i) != null ? matcher.group(i) : "";
}
}
System.out.println(newGlobalID);
You need a Matcher to iterate over all matches in your input, because your regular expression only matches subsequences of the input string. Depending on which alternative matched, a different capturing group will be non-null. You could also use named capturing groups or keep track of where you currently are in the input, but the code above should work as an example.
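The same loop can be written with a StringBuilder, which avoids repeated string concatenation. This is only a sketch under the question's assumptions (upper-case letters and digits, blocks separated by -); the class name GlobalIdShortener and the method shorten are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobalIdShortener {
    // Same alternation as in the question: first block, last block, middle blocks.
    private static final Pattern PART = Pattern.compile(
            "^([A-Z0-9]{3})[A-Z0-9]*|-([A-Z0-9]{3})[A-Z0-9]*$|-([A-Z0-9]{2})[A-Z0-9]*");

    // Hypothetical helper: collects whichever group took part in each match.
    static String shorten(String globalId) {
        StringBuilder out = new StringBuilder();
        Matcher m = PART.matcher(globalId);
        while (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                if (m.group(i) != null) {
                    out.append(m.group(i));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints 60D1E4CAE043
        System.out.println(shorten("60DC6285-1E71-4C30-AE36-043B3F7A4CA6"));
    }
}
```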

Your regexp must match the whole string. Your version tries to match the parts as alternatives, which does not work.
Try this:
String regExpr="^([A-Z0-9]{3})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{2})[^-]*"+
"-([A-Z0-9]{3}).*";

The complete code would look like this:
String globalID = "60DC6285-1E71-4C30-AE36-043B3F7A4CA6";
String regExpr = "^(\\w{3}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{2}).*?-"
+ "(\\w{3}).*";
System.out.println(globalID.replaceAll(regExpr, "$1$2$3$4$5"));
The output of the println call is
60D1E4CAE043

Related

Replace content of matching XML tags

I have an XML string. I want to replace all the text contents enclosed with certain tags.
For Example:
Sample Case 1:
Input:
<newPassword>SOME_NEW_PASSWORD</newPassword>
<oldPasscode>SOME_OLD_PASSWORD</oldPasscode>
Output:
<newPassword>**HIDDEN**</newPassword>
<oldPasscode>**HIDDEN**</oldPasscode>
Sample Case 2:
Input:
<password>SOME_NEW_CODE</password>
<passcode>SOME_OLD_CODE</passcode>
Output:
<password>**HIDDEN**</password>
<passcode>**HIDDEN**</passcode>
So if the XML tag contains password or passcode, then replace its content with **HIDDEN**.
I have tried using regex and replace all occurrences like below:
String MASK = "**HIDDEN**";
Pattern pattern = Pattern.compile("(?i)([^>])(?=<\\/\\wpass[word|code]\\w*>)", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher("<password>SOME_NEW_CODE</password><passcode>SOME_OLD_CODE</passcode>");
boolean matchFound = matcher.find();
if (matchFound) {
System.out.println(matcher.replaceAll(MASK));
}
The output is coming as : <newPassword>**HIDDEN****HIDDEN**</newPassword>
It replaces twice.
It's possible to create a generic method which can do this replacement in case you need to use some other tag. It could be implemented something like this.
public String replaceTextBetweenTag(
final String input,
final String tag,
final String mask
) {
final String openingTag = String.format("<%s>", tag);
final String endTag = String.format("</%s>", tag);
return input.replaceAll(String.format("(%s)[^&]*(%s)",openingTag,endTag), openingTag + mask + endTag);
}
If there are multiple tags whose text needs to be replaced, it could be used like this:
public String replaceTextBetweenTags(
final String input,
final List<String> tags,
final String mask
) {
String newOutput = input;
for(final String tag : tags){
newOutput = replaceTextBetweenTag(newOutput, tag, mask); // feed the previous result back in so every tag is masked
}
return newOutput;
}
So in your case a simple solution would be
replaceTextBetweenTags("<newPassword>SOME_NEW_PASSWORD</newPassword>\n" +
"<oldPasscode>SOME_OLD_PASSWORD</oldPasscode>",
List.of("newPassword", "oldPasscode"),
"**HIDDEN**"
);
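A slightly more defensive variant of the helper, as a sketch only: it escapes the tag with Pattern.quote and the replacement with Matcher.quoteReplacement, and it assumes the masked text contains no < character (hence [^<]*). The class name TagMasker and the method names are made up for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagMasker {
    // Masks the text between <tag> and </tag>; assumes the text contains no '<'.
    static String maskTag(String input, String tag, String mask) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        String regex = Pattern.quote(open) + "[^<]*" + Pattern.quote(close);
        // quoteReplacement keeps '$' and '\' in the mask from being interpreted.
        return input.replaceAll(regex, Matcher.quoteReplacement(open + mask + close));
    }

    // Applies maskTag for every tag, feeding each result into the next call.
    static String maskTags(String input, List<String> tags, String mask) {
        String out = input;
        for (String tag : tags) {
            out = maskTag(out, tag, mask);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<newPassword>SOME_NEW_PASSWORD</newPassword>\n"
                + "<oldPasscode>SOME_OLD_PASSWORD</oldPasscode>";
        System.out.println(maskTags(xml, List.of("newPassword", "oldPasscode"), "**HIDDEN**"));
    }
}
```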
In your pattern [word|code] is a character class matching one of the listed characters. If you want to match either word or code you can use an alternation (?:word|code)
A pattern to match pass followed by word or code (assuming that < and > are valid characters for a password)
<([^<>\s]*pass(?:word|code)[^<>\s]*)>.*?</\1>
< Match <
([^<>\s]*pass(?:word|code)[^<>\s]*) Match password or passcode between optional chars other than < > or whitespace chars
> Match >
.*? Match as few characters as possible
</\1> Match </ followed by a backreference to group 1 (matching up the element name) and >
In the replacement use **HIDDEN** between the value of capture group 1 in brackets <$1>**HIDDEN**</$1>
Note that this matches xml elements for this specific format. It does not take the validity of any structure into account.
String regex = "<([^<>\\s]*pass(?:word|code)[^<>\\s]*)>.*?</\\1>";
String string = "<newPassword>SOME_NEW_PASSWORD</newPassword>\n"
+ "<oldPasscode>SOME_OLD_PASSWORD</oldPasscode>\n"
+ "<password>SOME_NEW_CODE</password>\n"
+ "<passcode>SOME_OLD_CODE</passcode>";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(string);
System.out.println(matcher.replaceAll("<$1>**HIDDEN**</$1>"));
Output
<newPassword>**HIDDEN**</newPassword>
<oldPasscode>**HIDDEN**</oldPasscode>
<password>**HIDDEN**</password>
<passcode>**HIDDEN**</passcode>
If there can only be word chars in the tag, and no < or > in the password:
<(\w*pass(?:word|code)\w*)>[^<>]*</\1>
First of all, you don't want to use Regex for anything based on XML tags (including HTML of course).
The simplest Regex to satisfy the behavior might look like this (demo at Regex101 including substitution):
<(?<tagName>[a-z]*[Pp]ass(?:code|word))>([^<]*)<\/\k<tagName>>
So, in Java, a minimal and reproducible sample might look like this:
String xml = "<newPassword>SOME_NEW_PASSWORD</newPassword>" +
"<oldPasscode>SOME_OLD_PASSWORD</oldPasscode>" +
"<password>SOME_NEW_CODE</password>" +
"<passcode>SOME_OLD_CODE</passcode>" +
"<nope>VALUE</nope>";
String hidden = xml.replaceAll(
"<(?<tagName>[a-z]*[Pp]ass(?:code|word))>[^<]*<\\/\\k<tagName>>",
"<${tagName}>**HIDDEN**<\\/${tagName}>");
Alternatively (assuming the xml variable exists):
String xml = // your xml
String regex = "<(?<tagName>[a-z]*[Pp]ass(?:code|word))>[^<]*<\\/\\k<tagName>>";
Pattern pattern = Pattern.compile(regex);
String hidden = pattern
.matcher(xml)
.replaceAll("<${tagName}>**HIDDEN**<\\/${tagName}>");
In Java Regex (?<tagName>MATCH_ME) defines a named group matching MATCH_ME
In Java Regex \\k<tagName> refers to the already named group (here MATCH_ME)
In Java replacement ${tagName} refers to the named group captured from the Regex.
If you pretty-print the output, the result looks like (I used the nope element to demonstrate keeping the original value):
<newPassword>**HIDDEN**</newPassword>
<oldPasscode>**HIDDEN**</oldPasscode>
<password>**HIDDEN**</password>
<passcode>**HIDDEN**</passcode>
<nope>VALUE</nope>
Note: Remember you might need some modifications depending on how strict the Regex pattern should be. If possible, use a proper XML parser.
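For the parser route, a minimal sketch with the JDK's built-in DOM API might look like the following. It assumes the fragments are wrapped in a single root element (a DOM parser needs one), and the class name XmlMasker and the method mask are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlMasker {
    // Replaces the text of every element whose name contains "password" or "passcode".
    static String mask(String xml, String mask) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList all = doc.getElementsByTagName("*"); // "*" selects every element
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                String name = e.getTagName().toLowerCase(Locale.ROOT);
                if (name.contains("password") || name.contains("passcode")) {
                    e.setTextContent(mask);
                }
            }

            // Serialize the modified tree back to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(mask(
                "<root><newPassword>SOME_NEW_PASSWORD</newPassword><nope>VALUE</nope></root>",
                "**HIDDEN**"));
    }
}
```

Unlike the regular expressions, this does not care about attributes, whitespace inside tags, or < characters escaped in the value.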
(?i)([^>]+)(?=<\/\w*pass(?:code|word)\w*>)
Thanks @oliver_t and @Nikolas. I took your regexes as a reference, and the regex above works as expected.
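Your RegEx is just missing several quantifiers and a proper group:
([^>])+(?=<\/\w*pass(word|code)\w*)
Compared with your pattern, it adds + after ([^>]), adds * after the first \w, and replaces the character class [word|code] with the group (word|code).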

Java - extract JSON values from string using multi regex

I am trying to use this multi regex java library to extract JSON field value from a string.
My JSON look like this:
{
"field1": "something",
"field2": 13,
"field3": "some"
}
I have created a regex pattern to fit each field, and it is working with Java Regex Pattern by simply doing something like this for each pattern:
Matcher matcher = patternToSearch.matcher(receiveData);
if (matcher.find()) {
return matcher.group(1);
}
I decided to try and improve the code and use multi regex so instead of scanning the string 3 times, it will scan it only one time and extract all needed values.
So I came up with something like this:
String[] patterns = new String[]{
"\"field1\":\\s*\"(.*?)\"",
"\"field2\":\\s*(\\d+)(\\.\\d)?",
"\"field3\":\\s*\"(.*?)\"",
};
this.matcher = MultiPattern.of(patterns).matcher();
the matcher has only one method - match - used like this:
int[] match = this.matcher.match(jsonStringToScan);
so I ended up with a list of integers, but I have no idea how to get the json values from these strings and how those integers are helping me. The multi regex matcher does not support the group method I used before to get the value.
Any idea of how I can extract multiple json values from string using multi regex? (Scanning string only once)
As mentioned on the GitHub page from your link, match returns the indexes of the patterns that matched. Another point from that page:
The library does not handle groups.
Consider capturing the key as a group too. Look at this simple example:
final Pattern p = Pattern.compile("\"(field.)\":((?:\".*?\")|(?:\\d+(?:\\.\\d+)?))");
final Matcher m = p.matcher("{\"field3\":\"hi\",\"field2\":100.0,\"field1\":\"hi\"}");
while (m.find()) {
for (int i = 1; i <= m.groupCount(); i++) {
System.out.print(m.group(i) + " ");
}
System.out.println();
}
It prints:
field3 "hi"
field2 100.0
field1 "hi"
If you want to avoid quotes in value group, you need more complicated logic. I've stopped at:
final Pattern p = Pattern.compile("\"(field.)\":(?:(?:\"(.*?(?=\"))\")|(\\d+(?:\\.\\d+)?))");
resulting in
field3 hi null
field2 null 100.0
field1 hi null
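To get rid of the null column, the two value groups can be merged while collecting the results into a map. This is a sketch only: the class name JsonFieldScanner and the method extract are made up for illustration, and the pattern still assumes flat JSON with field names of the form fieldN and string values without escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonFieldScanner {
    // Group 1: key, group 2: quoted string value, group 3: numeric value.
    private static final Pattern FIELD = Pattern.compile(
            "\"(field.)\":\\s*(?:\"(.*?)\"|(\\d+(?:\\.\\d+)?))");

    // Scans the input once and keeps whichever value group took part in the match.
    static Map<String, String> extract(String json) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(json);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            values.put(m.group(1), value);
        }
        return values;
    }

    public static void main(String[] args) {
        String json = "{\"field1\": \"something\", \"field2\": 13, \"field3\": \"some\"}";
        // prints {field1=something, field2=13, field3=some}
        System.out.println(extract(json));
    }
}
```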

Parse out specific characters from java string

I have been trying to drop specific values from a String holding JDBC query results and column metadata. The format of the output is:
[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]
I am trying to get it into the following format:
I_Col1=someValue1, I_Col2=someVal2, I_Col3=someVal3
I have tried just dropping everything before the "=", but some of the "someVal" data has "=" in them. Is there any efficient way to solve this issue?
below is the code I used:
for(int i = 0; i < finalResult.size(); i+=modval) {
String resulttemp = finalResult.get(i).toString();
String [] parts = resulttemp.split(",");
//below is only for
for(int z = 0; z < columnHeaders.size(); z++) {
String replaced ="";
replaced = parts[z].replace("*=", "");
System.out.println("Replaced: " + replaced);
}
}
You don't need any splitting here!
You can use replaceAll() and the power of regular expressions to simply replace all occurrences of those unwanted characters, like in:
someString.replaceAll("[\\[\\]\\{\\}]", "")
When you apply that to your strings, the result should look exactly as required.
You could use a regular expression to replace the square and curly brackets like this [\[\]{}]
For example:
String s = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
System.out.println(s.replaceAll("[\\[\\]{}]", ""));
That would produce the following output:
I_Col1=someValue1, I_Col2=someVal2, I_Col3=someVal3
which is what you expect in your post.
A better approach, however, might be to match instead of replace if you know the character set that will be in the position of 'someValue'. Then you can design a regex that matches this particular string in such a way that, no matter what separates I_Col1=someValue1 from the rest of the String, you will be able to extract it :-)
EDIT:
With regard to the matching approach: given that the value following I_Col1= consists of word characters (letters, digits and _), you could use this pattern: (I_Col\d=\w+),?
For example:
String s = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
Matcher m = Pattern.compile("(I_Col\\d=\\w+),?").matcher(s);
while (m.find())
System.out.println(m.group(1));
This will produce:
I_Col1=someValue1
I_Col2=someVal2
I_Col3=someVal3
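If the pairs are then needed as key and value, splitting each pair on the first = only keeps values that themselves contain = intact. A sketch, assuming values contain no commas or brackets; the class name ResultFlattener and the method toMap are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResultFlattener {
    // Strips brackets and braces, then splits each pair on its first '=' only.
    static Map<String, String> toMap(String raw) {
        String flat = raw.replaceAll("[\\[\\]{}]", "");
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : flat.split(",\\s*")) {
            String[] keyValue = pair.split("=", 2); // limit 2: later '=' stay in the value
            if (keyValue.length == 2) {
                result.put(keyValue[0].trim(), keyValue[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {I_Col1=a=b, I_Col2=someVal2, I_Col3=someVal3}
        System.out.println(toMap("[{I_Col1=a=b, I_Col2=someVal2}, {I_Col3=someVal3}]"));
    }
}
```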
You could do four calls to replaceAll on the string.
String query = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
String queryWithoutBracesAndBrackets = query.replaceAll("\\{", "").replaceAll("\\}", "").replaceAll("\\]", "").replaceAll("\\[", "");
Or you could use a single regexp if you want the code to be more understandable.
String query = "[{I_Col1=someValue1, I_Col2=someVal2}, {I_Col3=someVal3}]";
String queryWithoutBracesAndBrackets = query.replaceAll("\\[|\\]|\\{|\\}", "");

Parsing text using Regex

So I am trying to parse a String that contains two key components. One tells me the timing options, and the other is position.
Here is what the text looks like
KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif
The {iiii} is the position and the {ttt} is the timing options.
I need to separate the {ttt} and {iiii} out so I can get a full file name: example, position 1 and time slice 1 = KB_H9Oct4GFP_20130305_p0000001t000000001z001c02.tif
So far here is how I am parsing them:
int startTimeSlice = 1;
int startTile = 1;
String regexTime = "([^{]*)\\{([t]+)\\}(.*)";
Pattern patternTime = Pattern.compile(regexTime);
Matcher matcherTime = patternTime.matcher(filePattern);
if (!matcherTime.find() || matcherTime.groupCount() != 3)
{
throw new IllegalArgumentException("Incorect filePattern: " + filePattern);
}
String timePrefix = matcherTime.group(1);
int tCount = matcherTime.group(2).length();
String timeSuffix = matcherTime.group(3);
String timeMatcher = timePrefix + "%0" + tCount + "d" + timeSuffix;
String timeFileName = String.format(timeMatcher, startTimeSlice);
String regex = "([^{]*)\\{([i]+)\\}(.*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(timeFileName);
if (!matcher.find() || matcher.groupCount() != 3)
{
throw new IllegalArgumentException("Incorect filePattern: " + filePattern);
}
String prefix = matcher.group(1);
int iCount = matcher.group(2).length();
String suffix = matcher.group(3);
String nameMatcher = prefix + "%0" + iCount + "d" + suffix;
String fileName = String.format(nameMatcher, startTile);
Unfortunately my code is not working: it fails when checking whether the second matcher finds anything in timeFileName.
After the first regex check, timeFileName is 000000001z001c02.tif, so it is cutting off the beginning portion, including the {iiii}.
Unfortunately I cannot assume which group comes first ({iiii} or {ttt}), so I am trying to devise a solution that just handles {ttt} first and then processes {iiii}.
Also, here is another example of valid text that I am also trying to parse: F_{iii}_{ttt}.tif
Steps to follow:
Find the string {ttt...} in the file name
Build a number format based on the number of "t" characters in the string
Find the string {iiii...} in the file name
Build a number format based on the number of "i" characters in the string
Use the String.replace() method to replace time and position
Here is the code:
String filePattern = "KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif";
int startTimeSlice = 1;
int startTile = 1;
Pattern patternTime = Pattern.compile("(\\{[t]*\\})");
Matcher matcherTime = patternTime.matcher(filePattern);
if (matcherTime.find()) {
String timePattern = matcherTime.group(0);// {ttt}
NumberFormat timingFormat = new DecimalFormat(timePattern.replaceAll("t", "0")
.substring(1, timePattern.length() - 1));// 000
Pattern patternPosition = Pattern.compile("(\\{[i]*\\})");
Matcher matcherPosition = patternPosition.matcher(filePattern);
if (matcherPosition.find()) {
String positionPattern = matcherPosition.group(0);// {iiii}
NumberFormat positionFormat = new DecimalFormat(positionPattern
.replaceAll("i", "0").substring(1, positionPattern.length() - 1));// 0000
System.out.println(filePattern.replace(timePattern,
timingFormat.format(startTimeSlice)).replace(positionPattern,
positionFormat.format(startTile)));
}
}
Okay, so after a bit of testing I found a way to handle the case:
For parsing the {ttt} I can use the regex: (.*)\\{t([t]+)\\}(.*)
Now this means I have to increment tCount by one to account for the t I grab from \\{t
Same goes for {iii}: (.*)\\{i([i]+)\\}(.*)
Your first pattern looks like this:
String regexTime = "([^{]*)\\{([t]+)\\}(.*)";
This finds a string consisting of a sequence of zero or more non-{ characters, followed by {t...t}, followed by other characters.
When your input is
KB_H9Oct4GFP_20130305_p00{iiii}t00000{ttt}z001c02.tif
the first substring that matches is
iiii}t00000{ttt}z001c02.tif
The { before the i's can't match, because you told it only to match non-{ characters. The result is that when you re-form the string to do the second match, it will start with iiii} and therefore won't match {iiii} like you're trying to do.
When you're looking for {ttt...}, I don't see any reason to exclude { or any other character from the first part of the string. So changing the regex to
"^(.*)\\{(t+)\\}(.*)$"
may be a simple way to fix this. Note that if you want to make sure you include the entire beginning of the string and the entire end of the string in your groups, you should include ^ and $ to match the beginning and end of the string, respectively; otherwise the matcher engine may decide not to include everything. In this case, it won't, but it's a good habit to get into anyway, because that makes things explicit and doesn't require anyone to know the difference between "greedy" and "reluctant" matching. Or use matches() instead of find(), since matches() automatically tries to match the entire string.
Perhaps an easier way to do this (as confirmed by http://regex101.com/r/vG7kY7) is
(\{i+\}).*(\{t+\})
You don't need the [] around a single character you are matching. Keep it simple. i+ means "one or more i's", and as long as the placeholders appear in the order given, this expression will work (with the first group capturing {iiii} and the second {ttt}).
You may need to escape the backslashes when writing it as a Java string literal, i.e. \\{ and \\}.
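Since the question says the order of {iiii} and {ttt} is not known in advance, one option is a single pattern that accepts either placeholder and replaces each where it is found. A sketch, with the class name FilePatternFormatter and the method format made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilePatternFormatter {
    // Matches {iii...} or {ttt...}; group 1 holds the run of letters.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(i+|t+)\\}");

    // Replaces every placeholder with a zero-padded number of the same width.
    static String format(String filePattern, int tile, int timeSlice) {
        Matcher m = PLACEHOLDER.matcher(filePattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String letters = m.group(1);
            int value = letters.charAt(0) == 'i' ? tile : timeSlice;
            String padded = String.format("%0" + letters.length() + "d", value);
            m.appendReplacement(out, Matcher.quoteReplacement(padded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints F_002_003.tif
        System.out.println(format("F_{iii}_{ttt}.tif", 2, 3));
    }
}
```

Because every placeholder is handled in one pass, the result does not depend on which of the two appears first.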

How to find expression, evaluate and replace in Java?

I have the following expressions inside a String (that comes from a text file):
{gender=male#his#her}
{new=true#newer#older}
And I would like to:
Find the occurrences of that pattern {variable=value#if_true#if_false}
Temporarily store those variables in fields such as variableName, variableValue, ifTrue, ifFalse as Strings.
Evaluate an expression based on variableName and variableValue according to local variables (like String gender = "male" and String new = "true").
And finally replace the pattern with ifTrue or ifFalse according to (3).
Should I use String.replaceAll() in some way, or how do I look for this expression and save the strings that are inside? Thanks for your help
UPDATE
It would be something like PHP's preg_match_all.
UPDATE 2
I solved this by using Pattern and Matcher as I post as an answer below.
If the strings always take this format, then string.split("#") is probably the way to go. This returns an array of the strings between the '#' separators (e.g. "{gender=male#his#her}".split("#") gives {"{gender=male", "his", "her}"}); use substring to remove the first and last character to get rid of the braces.
After struggling for a while I managed to get this working using Pattern and Matcher as follows:
// {variable=value#if_true#if_false}
// Pattern.quote already escapes the braces, so they are passed without backslashes
Pattern pattern = Pattern.compile(Pattern.quote("{") + "([\\w\\s]+)=([\\w\\s]+)#([\\w\\s]+)#([\\w\\s]+)" + Pattern.quote("}"));
Matcher matcher = pattern.matcher(doc);
// if we'll make multiple replacements we should keep an offset
int offset = 0;
// perform the search
while (matcher.find()) {
// by default, replacement is the same expression
String replacement = matcher.group(0);
String field = matcher.group(1);
String value = matcher.group(2);
String ifTrue = matcher.group(3);
String ifFalse = matcher.group(4);
// verify if field is gender
if (field.equalsIgnoreCase("Gender")) {
replacement = value.equalsIgnoreCase("Female")?ifTrue:ifFalse;
}
// replace the string
doc = doc.substring(0, matcher.start() + offset) + replacement + doc.substring(matcher.end() + offset);
// adjust the offset
offset += replacement.length() - matcher.group(0).length();
}
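The manual offset bookkeeping can be avoided with Matcher.appendReplacement and appendTail, which build the result while iterating. The following is a sketch under assumptions: the local variables are passed in as a Map, an expression whose variable is unknown is left untouched, and the class name TemplateEvaluator and the method evaluate are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateEvaluator {
    // {variable=value#if_true#if_false}
    private static final Pattern EXPR = Pattern.compile(
            "\\{([\\w\\s]+)=([\\w\\s]+)#([\\w\\s]+)#([\\w\\s]+)\\}");

    // Replaces each expression with if_true when vars[variable] equals value, else if_false.
    static String evaluate(String doc, Map<String, String> vars) {
        Matcher m = EXPR.matcher(doc);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String actual = vars.get(m.group(1));
            String replacement;
            if (actual == null) {
                replacement = m.group(0); // unknown variable: keep the expression as is
            } else {
                replacement = actual.equalsIgnoreCase(m.group(2)) ? m.group(3) : m.group(4);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("gender", "male");
        vars.put("new", "false");
        // prints Give him his {x=1#a#b} older book
        System.out.println(evaluate(
                "Give him {gender=male#his#her} {x=1#a#b} {new=true#newer#older} book", vars));
    }
}
```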
