I want to replace all occurrences of +- with - from a string called myStr in java.
I also want to replace all occurrences of -- with + from myStr.
The following two lines of code are not accomplishing this in java:
myStr.replaceAll("\\+-", "-");
myStr.replaceAll("\\--", "+");
Can anyone show me how to alter these two lines of code to accomplish the desired replacements?
I usually try to avoid regular expressions, but am not sure how to do this operation without them.
You're throwing out the return value of the function. You probably want to use:
myStr = myStr.replaceAll("\\+-", "-").replaceAll("--", "+");
update with additional info from comment:
Be sure to keep the return value of replaceAll.
myStr = myStr.replaceAll("\\+-", "-");
and later
myStr = myStr.replaceAll("--", "+");
public static String escapePlusMinus(String myStr) {
Pattern pattern = Pattern.compile("[+-]-");
Matcher matcher = pattern.matcher(myStr);
StringBuffer result = new StringBuffer();
while (matcher.find()) {
if (matcher.group(0).equals("+-")) {
matcher.appendReplacement(result, "-");
}
else {
matcher.appendReplacement(result, "+");
}
}
matcher.appendTail(result);
return result.toString();
}
escapePlusMinus("+-, --, +-, ---+---") ⇒ "-, +, -, +--+"
The last token is matched as:
"--" ⇒ "+"
"-" skipped, since there are not another "-" following it.
"+-" ⇒ "-"
"--" ⇒ "+"
Related
It may be very simple, but I am extremely new to regex and have a requirement where I need to do some regex matches in a string and extract the number in it. Below is my code with sample i/p and required o/p. I tried to construct the Pattern by referring to https://www.freeformatter.com/java-regex-tester.html, but my regex match itself is returning false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str1).matches());
All of above SOPs are returning false. I am using java 8, is there is any way by which in a single statement I can match the pattern and then extract the digit from the string.
I would be great if somebody can point me on how to debug/develop the regex.Please feel free to let me know if something is not clear in my question.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
See the regex demo
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
See another Java demo.
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because you match always the last number in your regex, I would Like to just use replaceAll with this regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(!strResult1.isEmpty() ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(!strResult2.isEmpty() ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(!strResult3.isEmpty() ? "result " + strResult3 : "no result");
If the result is empty then you don't have any number.
Outputs
result 1
result 2
result 69
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.
I am currently working on a tool, which helps me to analyze a constantly growing String, that can look like this: String s = "AAAAAAABBCCCDDABQ". What I want to do is to find a sequence of A's and B's, do something and then remove that sequence from the original String.
My code looks like this:
while (someBoolean){
if(Pattern.matches("A+B+", s)) {
//Do stuff
//Remove the found pattern
}
if(Pattern.matches("C+D+", s)) {
//Do other stuff
//Remove the found pattern
}
}
return s;
Also, how I could remove the three sequences, so that s just contains "Q" at the end of the calculation, without and endless loop?
You should use a regex replacement loop, i.e. the methods appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb).
To find one of many patterns, use the | regex matcher, and capture each pattern separately.
You can then use group(int group) to get the matched string for each capture group (first group is group 1), which returns null if that group didn't match. For better performance, to simply check whether the group matched, use start(int group), which returns -1 if that group didn't match.
Example:
String s = "AAAAAAABBCCCDDABQ";
StringBuffer buf = new StringBuffer();
Pattern p = Pattern.compile("(A+B+)|(C+D+)");
Matcher m = p.matcher(s);
while (m.find()) {
if (m.start(1) != -1) { // Group 1 found
System.out.println("Found AB: " + m.group(1));
m.appendReplacement(buf, ""); // Replace matched substring with ""
} else if (m.start(2) != -1) { // Group 2 found
System.out.println("Found CD: " + m.group(2));
m.appendReplacement(buf, ""); // Replace matched substring with ""
}
}
m.appendTail(buf);
String remain = buf.toString();
System.out.println("Remain: " + remain);
Output
Found AB: AAAAAAABB
Found CD: CCCDD
Found AB: AB
Remain: Q
This solution assumes that the string always ends in Q.
String s="AAAAAAABBCCCDDABQ";
Pattern abPattern = Pattern.compile("A+B+");
Pattern cdPattern = Pattern.compile("C+D+");
while (s.length() > 1){
Matcher abMatcher = abPattern.matcher(s);
if (abMatcher.find()) {
s = abMatcher.replaceFirst("");
//Do other stuff
}
Matcher cdMatcher = cdPattern.matcher(s);
if (cdMatcher.find()) {
s = cdMatcher.replaceFirst("");
//Do other stuff
}
}
System.out.println(s);
You are probably looking for something like this:
String input = "AAAAAAABBCCCDDABQ";
String result = input;
String[] chars = {"A", "B", "C", "D"}; // chars to replace
for (String ch : chars) {
if (result.contains(ch)) {
String pattern = "[" + ch + "]+";
result = result.replaceAll(pattern, ch);
}
}
System.out.println(input); //"AAAAAAABBCCCDDABQ"
System.out.println(result); //"ABCDABQ"
This basically replace sequence of each character for single one.
If you want to remove the sequence completely, just replace ch to "" in replaceAll method parameters inside if body.
i have this problem:
i have to make a regular expression which take this urls:
http://www.amazon.it/TP-LINK-TL-WR841N-Wireless-300Mbps-Ethernet/dp/B001FWYGJS?ie=UTF8&redirect=true&ref_=s9_simh_gw_p147_d0_i2
http://www.amazon.it/gp/product/B014KMQWU0/
http://www.amazon.it/gp/product/glance/B014KMQWU0/
I need a regular expression which matches the full url until the ASIN of the product (ASIN is a word of 10 capital letters)
I have write this regex but not make what i want:
String regex="http:\\/\\/(?:www\\.|)amazon\\.com\\/(?:gp\\ product|| gp\\ product\\ glance || [^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
while (urlAmazonMatcher.find()) {
System.out.println("PROVA "+urlAmazonMatcher.group(0));
}
This is my solution. Finally it works :D
String regex="(http|www\\.)amazon\\.(com|it|uk|fr|de)\\/(?:gp\\/product|gp\\/product\\/glance|[^\\/]+\\/dp|dp)\\/([^\\/]{10})";
Pattern pattern=Pattern.compile(regex);
Matcher urlAmazonMatcher = pattern.matcher(url);
String toReturn = null;
while (urlAmazonMatcher.find()) {
toReturn=urlAmazonMatcher.group(0);
}
How about
/[^/?]{10}(/$|\?)
This matches 10 characters that are neither / nor ? following a slash if those characters are followed by a final slash or a question mark.
You can get the part that precedes or follows the ASIN using one of the various Matcher functions.
Here is my work from a previous project that was to extract URLs from text:
private Pattern getUriPattern() {
if(uriPattern == null) {
// taken from http://labs.apache.org/webarch/uri/rfc/rfc3986.html
//TODO implement the full URI syntax
String genDelims = "\\:\\/\\?\\#\\[\\]\\#";
String subDelims = "\\!\\$\\&\\'\\*\\+\\,\\;\\=";
String reserved = genDelims + subDelims;
String unreserved = "\\w\\-\\.\\~"; // i.e. ALPHA / DIGIT / "-" / "." / "_" / "~"
String allowed = reserved + unreserved;
// ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
uriPattern = Pattern.compile("((?:[^\\:/\\?\\#]+:)?//[" + allowed + "&&[^\\?\\#]]*(?:\\?([" + allowed + "&&[^\\#]]*))?(?:\\#[" + allowed + "]*)?).*");
}
return uriPattern;
}
You can use the above method as follows:
Matcher uriMatcher =
getUriPattern().matcher(text);
if(uriMatcher.matches()) {
String candidateUriString = uriMatcher.group(1);
try {
new URI(candidateUriString); // check once again if you matched a URL
// your code here
} catch (Exception e) {
// error handling
}
}
This will catch the whole URL, including params. You can then split it up to the first occurence of '?' (if any) and take the first part. Of course, you can rework the regex too.
I want to parse a line from a CSV(comma separated) file, something like this:
Bosh,Mark,mark#gmail.com,"3, Institute","83, 1, 2",1,21
I have to parse the file, and instead of the commas between the apostrophes I wanna have ';', like this:
Bosh,Mark,mark#gmail.com,"3; Institute","83; 1; 2",1,21
I use the following Java code but it doesn't parse it well:
Pattern regex = Pattern.compile("(\"[^\\]]*\")");
Matcher matcher = regex.matcher(line);
if (matcher.find()) {
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
the output is:
Bosh,Mark,mark#gmail.com,"3; Institute";"83; 1; 2",1,21
anyone have any idea how to fix this?
This is my solution to replace , inside quote to ;. It assumes that if " were to appear in a quoted string, then it is escaped by another ". This property ensures that counting from start to the current character, if the number of quotes " is odd, then that character is inside a quoted string.
// Test string, with the tricky case """", which resolves to
// a length 1 string of single quote "
String line = "Bosh,\"\"\"\",mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(line);
int start = 0;
StringBuilder output = new StringBuilder();
while (matcher.find()) {
// System.out.println(m.group() + "\n " + m.start() + " " + m.end());
output
.append(line.substring(start, matcher.start())) // Append unrelated contents
.append(matcher.group().replaceAll(",", ";")); // Append replaced string
start = matcher.end();
}
output.append(line.substring(start)); // Append the rest of unrelated contents
// System.out.println(output);
Although I cannot find any case that will fail the method of replace the matched group like you did in line = line.replace(matcher.group(), replacedMatch);, I feel safer to rebuild the string from scratch.
Here's a way:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
String in = "Bosh,Mark,mark#gmail.com,\"3, \"\" Institute\",\"83, 1, 2\",1,21";
String regex = "[^,\"\r\n]+|\"(\"\"|[^\"])*\"";
Matcher matcher = Pattern.compile(regex).matcher(in);
StringBuilder out = new StringBuilder();
while(matcher.find()) {
out.append(matcher.group().replace(',', ';')).append(',');
}
out.deleteCharAt(out.length() - 1);
System.out.println(in + "\n" + out);
}
}
which will print:
Bosh,Mark,mark#gmail.com,"3, "" Institute","83, 1, 2",1,21
Bosh,Mark,mark#gmail.com,"3; "" Institute","83; 1; 2",1,21
Tested on Ideone: http://ideone.com/fCgh7
Here is the what you need
String line = "Bosh,Mark,mark#gmail.com,\"3, Institute\",\"83, 1, 2\",1,21";
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Matcher matcher = regex.matcher(line);
while(matcher.find()){
String replacedMatch = matcher.group();
String gr1 = matcher.group(1);
gr1.trim();
replacedMatch = replacedMatch.replace(",", ";");
line = line.replace(matcher.group(), replacedMatch);
}
line will have value you needed.
Have you tried to make the RegExp lazy?
Another idea: inside the [] you should use a " too. If you do that, you should have the expected output with global flag set.
Your regex is faulty. Why would you want to make sure there are no ] within the "..." expression? You'd rather make the regex reluctant (default is eager, which means it catches as much as it can).
"(\"[^\\]]*\")"
should be
"(\"[^\"]*\")"
But nhadtdh is right, you should use a proper CSV library to parse it and replace , to ; in the values the parser returns.
I'm sure you'll find a parser when googling "Java CSV parser".
Shouldn't your regex be ("[^"]*") instead? In other words, your first line should be:
Pattern regex = Pattern.compile("(\"[^\"]*\")");
Of course, this is assuming you can't have quotes in the quoted values of your input line.
I'm looking for a built-in Java functions which for example can convert "\\n" into "\n".
Something like this:
assert parseFunc("\\n") = "\n"
Or do I have to manually search-and-replace all the escaped characters?
You can use StringEscapeUtils.unescapeJava(s) from Apache Commons Lang. It works for all escape sequences, including Unicode characters (i.e. \u1234).
https://commons.apache.org/lang/apidocs/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava-java.lang.String-
Anthony is 99% right -- since backslash is also a reserved character in regular expressions, it needs to be escaped a second time:
result = myString.replaceAll("\\\\n", "\n");
Just use the strings own replaceAll method.
result = myString.replaceAll("\\n", "\n");
However if you want match all escape sequences then you could use a Matcher. See http://www.regular-expressions.info/java.html for a very basic example of using Matcher.
Pattern p = Pattern.compile("\\(.)");
Matcher m = p.matcher("This is tab \\t and \\n this is on a new line");
StringBuffer sb = new StringBuffer();
while (m.find()) {
String s = m.group(1);
if (s == "n") {s = "\n"; }
else if (s == "t") {s = "\t"; }
m.appendReplacement(sb, s);
}
m.appendTail(sb);
System.out.println(sb.toString());
You just need to make the assignment to s more sophisticated depending on the number and type of escapes you want to handle. (Warning this is air code, I'm not Java developer)
If you don't want to list all possible escaped characters you can delegate this to Properties behaviour
String escapedText="This is tab \\t and \\rthis is on a new line";
Properties prop = new Properties();
prop.load(new StringReader("x=" + escapedText + "\n"));
String decoded = prop.getProperty("x");
System.out.println(decoded);
This handle all possible characters