Using replace() or replaceAll() - Java

I know of using this:
public String RemoveTag(String html){
html = html.replaceAll("\\<.*?>","");
html = html.replaceAll(" ","");
html = html.replaceAll("&","");
return html;
}
This removes all tags from an HTML string. My question is how the wildcard part inside <.*?> works. Could someone give a more detailed explanation of how to match wildcard characters in a String?
The main reason I ask is that I still have text that starts with "#" and ends with "}", and I want to get rid of everything between the "#" and the "}".

The first parameter to replaceAll(...) is a regex string. The .*? in your example is the part that matches anything. So, if you want a regular expression that will get rid of everything between "#" and "}" you would use something like:
String exampleText = "Start #some text} finish.";
// Strings are immutable, so the result has to be assigned back.
exampleText = exampleText.replaceAll("#(.*?)\\}", "#}");
System.out.println(exampleText); // prints "Start #} finish."
Notice the same pattern: .*?. The parentheses, which are optional here, are just used for grouping. Also notice the } is escaped with backslashes since it can have special meaning within regular expressions.
For more info on Java's regex support see the Pattern class.
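As a minimal sketch of the same idea, the class below removes each "#...}" run entirely, including the two delimiters. The class and method names (HashBraceStripper, strip) are made up for this illustration and are not part of the original code.

```java
class HashBraceStripper {
    // Removes every run that starts at '#' and ends at the nearest '}'.
    // replaceAll returns a new String; the argument itself is never modified.
    static String strip(String text) {
        return text.replaceAll("#.*?\\}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a#x}b#y}c")); // prints "abc"
    }
}
```

Note that . does not match line breaks by default, so a "#...}" run that spans several lines is left untouched unless the pattern starts with (?s).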

Regular expressions can be implemented by building a finite automaton, since every regular expression has an equivalent deterministic finite automaton and vice versa.
The regex you are looking for is #.*?}. If you want to keep these characters, replace the match with "#}" instead of "". It will be something like s.replaceAll("#.*?}", "#}"), where s is your String.
It seems you might need the regex #.*?\}, though a lone } is treated as a literal by the pattern compiler when there is no preceding {. To be on the safe side, the Java string "#.*?\\}" should work either way, as @WayneBaylor posted.
You might want to read more on regular expressions
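The ? after .* matters once a string contains more than one "#...}" run. The sketch below compares the lazy and greedy forms. The class and method names are hypothetical and only serve the comparison.

```java
class GreedyVsLazy {
    // Lazy: .*? stops at the first '}' after each '#'.
    static String lazy(String s) {
        return s.replaceAll("#.*?\\}", "#}");
    }

    // Greedy: .* runs to the last '}' on the line, swallowing everything between.
    static String greedy(String s) {
        return s.replaceAll("#.*\\}", "#}");
    }

    public static void main(String[] args) {
        String input = "a #one} b #two} c";
        System.out.println(lazy(input));   // prints "a #} b #} c"
        System.out.println(greedy(input)); // prints "a #} c"
    }
}
```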

Related

What all characters can be used as String Delimiters in Java?

I am trying to break a String into various pieces using a delimiter (":").
String sepIds[]=ids.split(":");
It is working fine. But when I replace ":" with " * " and use " * " as delimiter, it doesn't work.
String sepIds[]=ids.split("*"); //doesn't work
It just hangs there and doesn't execute any further.
What mistake am I making here?

String#split takes a regular expression as parameter. In regex some chars have special meanings so they need to be escaped, for example:
"foo*bar".split("\\*")
the result will be as you expect:
[foo, bar]
You could also use the method Pattern#quote to simplify the task.
"foo*bar".split(Pattern.quote("*"))

String.split expects a regular expression argument, and * has a special meaning in regex. So if you want to use it as a literal delimiter, you need to escape it like this:
String sepIds[]=ids.split("\\*");
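The answers above can be checked with a small sketch. It splits on a literal asterisk both ways (manual escaping and Pattern.quote) and shows that the unescaped "*" is rejected as a pattern. The class and method names are invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitOnStar {
    // Escape the asterisk by hand: the regex is \* and the Java literal is "\\*".
    static String[] splitEscaped(String s) {
        return s.split("\\*");
    }

    // Let Pattern.quote build a literal pattern for the delimiter.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("*"));
    }

    // A bare "*" is not a valid regex, so split throws instead of splitting.
    static boolean bareStarIsRejected() {
        try {
            "foo*bar".split("*");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEscaped("foo*bar"))); // prints "[foo, bar]"
        System.out.println(Arrays.toString(splitQuoted("foo*bar")));  // prints "[foo, bar]"
        System.out.println(bareStarIsRejected());                     // prints "true"
    }
}
```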

The argument of .split() is a regular expression, not a string literal. Therefore you need to escape * since it is a special regex character. Write:
ids.split("\\*");
This is how you would split against one or more whitespace characters:
ids.split("\\s+");
Note that Guava has Splitter which is very, very fast and can split against literals:
Splitter.on('*').split(ids);

'*' and '.' are special characters; you have to escape them with a backslash.
String sepIds[]=ids.split("\\*");
To read more about java patterns please visit that page.

That is expected behaviour. The documentation for the String split function says that the input string is treated as a regular expression (with a link explaining how that works). As Germann points out, '*' is a special character in regular expressions.

Java's String.split() uses regular expressions to split up the string (unlike similar functions in C# or python). * is a special character in regular expressions and you need to escape it with a \ (backslash). So you should use instead:
String sepIds[]=ids.split("\\*");
You can find more information on regular expressions anywhere on the internet. A fairly complete list of the special characters supported by Java is here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Java replaceAll to javascript regex

I want to move a user input check from Java to JavaScript. The code is supposed to remove wildcard characters from the user input string, at any position. I'm attempting to convert the following Java notation to JavaScript, but keep getting the error
"Invalid regular expression: /(?<!\")~[\\d\\.]*|\\?|\\*/: Invalid group".
I have almost no experience with regular expressions. Any help will be much appreciated:
JAVA:
str = str.replaceAll("(?<!\")~[\\d\\.]*|\\?|\\*","");
My failing javascript version:
input = input.replace( /(?<!\")~[\\d\\.]*|\\?|\\*/g, '');

The problem, as anubhava points out, is that JavaScript doesn't support lookbehind assertions. Sad but true. (Lookbehind was only added to the language in ES2018, so this applies to older engines.) The lookbehind assertion in your original regex is (?<!\"). Specifically, it only lets the tilde match when it is not immediately preceded by a double quotation mark.
However, all is not lost. There are some tricks you can use to achieve the same result as a lookbehind. In this case, the lookbehind is there only to prevent the character prior to the tilde from being replaced as well. We can accomplish this in JavaScript by matching the character anyway, but then including it in the replacement:
input = input.replace( /([^"])~[\d.]*|\?|\*/g, '$1' );
Note that for the alternations \? and \*, there will be no groups, so $1 will evaluate to the empty string, so it doesn't hurt to include it in the replacement.
NOTE: this is not 100% equivalent to the original regular expression. In particular, lookaround assertions (like the lookbehind above) also prevent the input stream from being consumed, which can sometimes be very helpful when matching things that are right next to each other. However, in this case, I can't think of a way that that would be a problem. To make a completely equivalent regex would be more difficult, but I believe this meets the need of the original regex.
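To make the caveat concrete, here is a sketch in Java that runs the original lookbehind pattern and the capture-group workaround side by side. The capture-group trick behaves the same way in Java's replaceAll, which also substitutes an empty string for a group that did not take part in the match. The class and method names are hypothetical.

```java
class WildcardStripper {
    // Original pattern: a '~' (plus trailing digits/dots) is removed unless a
    // double quote comes directly before it; '?' and '*' are always removed.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<!\")~[\\d.]*|\\?|\\*", "");
    }

    // Workaround: consume the character before the '~' and put it back via $1.
    static String withCaptureGroup(String s) {
        return s.replaceAll("([^\"])~[\\d.]*|\\?|\\*", "$1");
    }

    public static void main(String[] args) {
        String typical = "foo~0.8 \"bar\"~2 baz? qu*x";
        System.out.println(withLookbehind(typical));   // prints: foo "bar"~2 baz qux
        System.out.println(withCaptureGroup(typical)); // prints: foo "bar"~2 baz qux

        // The two differ when a '~' has no preceding character to consume.
        System.out.println(withLookbehind("~5 x"));    // prints " x"
        System.out.println(withCaptureGroup("~5 x"));  // prints "~5 x"
    }
}
```

So the workaround agrees with the lookbehind on typical input, but it leaves a tilde at the very start of the string in place, and it can also miss a tilde that directly follows another match.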

Necessary to escape a java regular expression in matches()?

I'm currently doing a test on an HTTP Origin to determine if it came from SSL:
(HttpHeaders.Names.ORIGIN).matches("/^https:\\/\\//")
But I'm finding it's not working. Do I need to escape matches() strings like a regular expression or can I leave it like https://? Is there any way to do a simple string match?
Seems like it would be a simple question, but surprisingly I'm not getting anywhere even after using a RegEx tester http://www.regexplanet.com/advanced/java/index.html. Thanks.

Java's regex doesn't need delimiters. Simply do:
.matches("https://.*")
Note that matches validates the entire input string, hence the .* at the end. And if the input contains line break chars (which . will not match), enable DOT-ALL:
.matches("(?s)https://.*")
Of course, you could also simply do:
.startsWith("https://")
which takes a plain string (no regex pattern).
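A short sketch comparing the three variants discussed here, assuming the origin value is already available as a String. The class and method names are invented for the illustration.

```java
class OriginCheck {
    // matches() must cover the whole input, hence the trailing .*;
    // (?s) lets . match line breaks too.
    static boolean viaMatches(String origin) {
        return origin.matches("(?s)https://.*");
    }

    // Plain prefix test, no regex involved.
    static boolean viaStartsWith(String origin) {
        return origin.startsWith("https://");
    }

    // The slash-delimited form from the question: in Java the slashes are
    // literal characters, so the pattern demands a '/' before the ^ anchor
    // and can never match.
    static boolean viaSlashDelimited(String origin) {
        return origin.matches("/^https:\\/\\//");
    }

    public static void main(String[] args) {
        String origin = "https://example.com";
        System.out.println(viaMatches(origin));        // prints "true"
        System.out.println(viaStartsWith(origin));     // prints "true"
        System.out.println(viaSlashDelimited(origin)); // prints "false"
    }
}
```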

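How about this regex:
^(https:)\/\/.*
It works in your tester. In Java source the forward slashes need no escaping, so the string literal would be "^(https:)//.*".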

How do I write a regular expression to find the following pattern?

I am trying to write a regular expression to do a find and replace operation. Assume Java regex syntax. Below are examples of what I am trying to find:
12341+1
12241+1R1
100001+1R2
So, I am searching for a string beginning with one or more digits, followed by a "1+1" substring, followed by 0 or more characters. I have the following regex:
^(\d+)(1\\+1).*
This regex will successfully find the examples above, however, my goal is to replace the strings with everything before "1+1". So, 12341+1 would become 1234, and 12241+1R1 would become 1224. If I use the first grouped expression $1 to replace the pattern, I get the wrong result as follows:
12341+1 becomes 12341
12241+1R1 becomes 12241
100001+1R2 becomes 100001
Any ideas?

Your existing regex works fine, just that you are missing a \ before \d
String str = "100001+1R2";
str = str.replaceAll("^(\\d+)(1\\+1).*","$1");

IMHO, the regex is correct.
Perhaps you wrote it wrong in the code. If you want to code the regex ^(\d+)(1\+1).* in a string, you have to write something like String regex = "^(\\d+)(1\\+1).*".
Your output is probably the result of a different pattern than the one you intended, because a backslash went missing in the string literal.
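As a quick check of the correctly escaped string literal, the sketch below applies it to the three sample inputs from the question. The class and method names are made up for this example.

```java
class PrefixBeforeOnePlusOne {
    // Regex: ^(\d+)(1\+1).*  written as a Java string literal.
    // The greedy \d+ backtracks by one digit so that "1+1" can still match.
    static String trim(String s) {
        return s.replaceAll("^(\\d+)(1\\+1).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(trim("12341+1"));    // prints "1234"
        System.out.println(trim("12241+1R1"));  // prints "1224"
        System.out.println(trim("100001+1R2")); // prints "10000"
    }
}
```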

Your regex looks fine to me - I don't have access to java but in JavaScript the code..
"12341+1".replace(/(\d+)(1\+1)/g, "$1");
Returns 1234 as you'd expect. This works on a string with many 'codes' in too e.g.
"12341+1 54321+1".replace(/(\d+)(1\+1)/g, "$1");
gives 1234 5432.

Personally, I wouldn't use a regex at all (it'd be like using a hammer on a thumbtack). I'd just take a substring:
stringName.substring(0, stringName.indexOf("1+1"))
But it looks like other posters have already mentioned the non-greedy operator.
In most regex syntaxes you can add a '?' after a '+' or '*' to indicate that you want it to match as little as possible before moving on in the pattern. Thus ^(\d+?)(1\+1) matches as few digits as it can before finding "1+1", so the first group does not include the leading 1 of "1+1". A greedy \d+ reaches the same split here by backtracking, so the lazy quantifier is optional in this pattern.
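The substring approach can be sketched as follows, with a guard for inputs that do not contain the marker (indexOf returns -1 in that case, and substring(0, -1) would throw). The helper name and its fallback of returning the input unchanged are assumptions made for this example.

```java
class SubstringBeforeMarker {
    // Returns the part of s before the first occurrence of marker,
    // or s unchanged when the marker is absent.
    static String before(String s, String marker) {
        int idx = s.indexOf(marker);
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(before("12341+1", "1+1"));    // prints "1234"
        System.out.println(before("100001+1R2", "1+1")); // prints "10000"
        System.out.println(before("12345", "1+1"));      // prints "12345"
    }
}
```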

Escaping a String from getting regex parsed in Java

In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".

String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.

As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by @diastrophism):
Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?

Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
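A minimal sketch of Pattern.quote in use, assuming the goal is to test whether a literal string occurs anywhere inside a larger one. It shows both Matcher.find() and the matches() form with .* on either side. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LiteralSearch {
    // find() looks for the quoted literal anywhere in the text.
    static boolean containsViaFind(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    // matches() must cover the whole text, so only the literal is quoted
    // and the surrounding .* stays a real regex wildcard.
    static boolean containsViaMatches(String text, String literal) {
        return text.matches("(?s).*" + Pattern.quote(literal) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(containsViaFind("say [hi there", "[hi"));    // prints "true"
        System.out.println(containsViaMatches("say [hi there", "[hi")); // prints "true"
        System.out.println(containsViaFind("say hi there", "[hi"));     // prints "false"
    }
}
```

For a plain containment test like this, String.contains or String.indexOf gives the same answer without any regex.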

Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.

Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character in string literals, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* found */ }

T.contains() (according to javadoc : http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method?
