Regular expressions in java


String s= "(See <a href=\"/wiki/Grass_fed_beef\" title=\"Grass fed beef\" " +
"class=\"mw-redirect\">grass fed beef.) They have been used for " +
"<a href=\"/wiki/Paper\" title=\"Paper\">paper-making since " +
"2400 BC or before.";
In the string above, HTML is intermixed with text.
The requirement is that the output looks like this:
They have been used for paper-making since 2400 BC or before.
Could someone help me with a generic regular expression that would produce the desired output from the given input?
Thanks in advance!

The following expression:
\([^)]*?\)|<[a-zA-Z/][^>]*?>
will match anything that looks like an HTML tag and any parenthesized text. Replace the matched text with "" and you are done.
Note: If the input contains script tags, "HTML" where the author didn't bother to escape < and > when they weren't used as tag delimiters, or a ( without a ), things will probably not work as you'd hoped.

https://stackoverflow.com/questions/1732348#1732454
You have been warned.
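As a minimal sketch of how the expression above could be applied to the string from the question (the class and method names are made up for illustration, and the trailing trim() is an addition to drop the space left behind after the removed parenthesis):

```java
import java.util.regex.Pattern;

final class StripMarkup {
    // The expression from the answer: parenthesized text OR anything that looks like a tag.
    private static final Pattern MARKUP =
            Pattern.compile("\\([^)]*?\\)|<[a-zA-Z/][^>]*?>");

    // Removes every match and trims the leftover whitespace at both ends.
    static String strip(String input) {
        return MARKUP.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String s = "(See <a href=\"/wiki/Grass_fed_beef\" title=\"Grass fed beef\" "
                + "class=\"mw-redirect\">grass fed beef.) They have been used for "
                + "<a href=\"/wiki/Paper\" title=\"Paper\">paper-making since "
                + "2400 BC or before.";
        System.out.println(strip(s));
    }
}
```

The caveats from the note still apply: this is only suitable for input as simple as the sample.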

Related

How would I ignore a single quotation mark within a string that interacts with my embedded quotation marks?

My problem is in the loop that writes the results of my SQL query from Java into HTML.
I need to add an "add to cart" link with <a href='?id=id&name=name etc.>, but one of the results has a possessive in the name, as in "John's Smith".
The single quotation mark ends my href link early, so the rest of the name never makes it into the link.
Any suggestions?
do {
out.println("<tr><td class='col-md-1'> <a href='addcart.jsp?id="+rst.getString(1)+"&name="+rst.getString(2)+"&price="+rst.getString(4)+"'>Add to cart</a></td><td>"+rst.getString(2)+"</td><td>"+rst.getString(3)+"</td><td>"+rst.getString(4)+"</td></tr>");
} while (rst.next());
Thanks in advance!

The best way to handle arbitrary and unknown characters in your href is by using URL Encoding, then it won't matter which quotes you use (single or double) or how your quotes might be nested.
Java has a URLEncoder class specifically to do this.
Typically you'd use it on the query string values, since the rest of the URL is known and fixed. Encode each parameter value individually (encoding the whole query string at once would also escape the = and & separators), then add the result to your output, something like:
final String query =
"id=" + URLEncoder.encode(rst.getString(1), "UTF-8") +
"&name=" + URLEncoder.encode(rst.getString(2), "UTF-8") +
"&price=" + URLEncoder.encode(rst.getString(4), "UTF-8");
out.println("<tr><td class='col-md-1'> <a href='addcart.jsp?" +
query +
"etc. the rest");
This Guide to Java URL Encoding/Decoding may be helpful.
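A small self-contained sketch of the idea, encoding each value separately so that the apostrophe in a name can no longer terminate the href attribute. The class name, method name, and sample values are hypothetical; only URLEncoder.encode(String, String) comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CartLink {
    // Builds the link target with every parameter value URL-encoded.
    static String cartHref(String id, String name, String price) {
        try {
            return "addcart.jsp?id=" + URLEncoder.encode(id, "UTF-8")
                    + "&name=" + URLEncoder.encode(name, "UTF-8")
                    + "&price=" + URLEncoder.encode(price, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints addcart.jsp?id=7&name=John%27s+Smith&price=9.99
        System.out.println(cartHref("7", "John's Smith", "9.99"));
    }
}
```

The apostrophe becomes %27 and the space becomes +, so the resulting value is safe inside a single-quoted attribute.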

RegEx for matching between any two HTML tags

I have the following content :
<div class="TEST-TEXT">hi</span>
first young CEO's TEST-TEXT
<span class="test">hello</span>
I am trying to match the TEST-TEXT string in order to replace its value, but only when it appears as text and not within an attribute value.
I have looked into look-ahead and look-behind in regex, but look-behind requires a fixed-width match. The question regex-match-all-characters-between-two-html-tags shows a very similar case, except that there the match is anchored on a span with a class.
I also checked regex-match-attribute-in-a-html-code.
Here are the two regular expressions I am trying:
\"([^"]*)\"
(?s)(?<=<([^{]*)>)(.+?)(?=</.>)
Neither works for me; you can try them at [https://regex101.com/r/ApbUEW/2].
Expected behavior: it matches the string only when it is text.
Current behavior: it matches in both cases.
Edit: I want the text to be dynamic and not specific to TEST-TEXT.

Something like this should help:
\>([^"<]*)\<
EDIT:
Without open and close tags included:
(?<=\>)([^"<]*)(?=\<)

Try TEST-TEXT(?=<\/a>)
TEST-TEXT matches the literal TEST-TEXT
(?=<\/a>) is a look-ahead that checks for the closing tag </a>
See it on regex101.

Here, we might just add a soft boundary on the right of the desired output, as you have already been doing, then a character class for the desired output, and capture both. After that we can make the replacement using the capturing groups (). Maybe similar to this:
([A-Z-]+)(<\/)
This snippet is just to show that the expression might be valid:
const regex = /([A-Z-]+)(<\/)/gm;
const str = `<div class="TEST-TEXT">hi</span><a href=\\"https://en.wikipedia.org/wiki/TEST-TEXT\\">first young CEO's
TEST-TEXT</a><span class="test">hello</span><div class="TEST-TEXT">hi</span><a href=\\"https://en.wikipedia.org/wiki/TEST-TEXT\\">first young CEO's
TEST-TEXT</a><span class="test">hello</span>`;
const subst = `NEW-TEXT$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
If this expression isn't what you wanted, it can be modified on regex101.com. jex.im also helps to visualize expressions.

Maybe this will help?
String html = "<div class=\"TEST-TEXT\">hi</span>\n" +
"first young CEO's TEST-TEXT\n" +
"<span class=\"test\">hello</span>";
Pattern pattern = Pattern.compile("(<)(.*)(>)(.*)(TEST-TEXT)(.*)</.*>");
Matcher matcher = pattern.matcher(html);
while (matcher.find()){
System.out.println(matcher.group(5));
}

A regex that matches the string only when it sits between HTML tags, not inside one:
(?![^<>]*>)(TEST\-TEXT)
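One way this look-ahead might be used from Java, sketched under the assumption that the text to replace is supplied at run time (as the question's edit asks). The class and method names are illustrative; Pattern.quote and Matcher.quoteReplacement keep the dynamic word and replacement from being interpreted as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextOutsideTags {
    // Replaces 'word' only where it is NOT followed by "...>" without an
    // intervening '<' or '>', i.e. only where it is not inside a tag.
    static String replaceOutsideTags(String html, String word, String replacement) {
        Pattern p = Pattern.compile("(?![^<>]*>)" + Pattern.quote(word));
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "<div class=\"TEST-TEXT\">hi</span>\n"
                + "first young CEO's TEST-TEXT\n"
                + "<span class=\"test\">hello</span>";
        // The attribute value stays as it is; only the text occurrence changes.
        System.out.println(replaceOutsideTags(html, "TEST-TEXT", "NEW-TEXT"));
    }
}
```

This relies on < and > appearing only as tag delimiters, so it is a workaround for simple fragments rather than a general HTML parser.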

Escape special characters using Regex in java [duplicate]

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.

Since Java 1.5, yes:
Pattern.quote("$5");
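To illustrate the difference, here is a short sketch comparing the quoted and unquoted forms of the "$5" example from the question. The class and method names are invented for the demonstration.

```java
import java.util.regex.Pattern;

final class QuoteDemo {
    // Treats userText as a literal string, whatever characters it contains.
    static boolean containsLiteral(String haystack, String userText) {
        return Pattern.compile(Pattern.quote(userText)).matcher(haystack).find();
    }

    // Treats the text as a regular expression, metacharacters included.
    static boolean containsRegex(String haystack, String regex) {
        return Pattern.compile(regex).matcher(haystack).find();
    }

    public static void main(String[] args) {
        String text = "That costs $5 today";
        System.out.println(containsLiteral(text, "$5")); // true
        // Unquoted, "$5" means "end of input followed by 5", which can never match.
        System.out.println(containsRegex(text, "$5"));   // false
    }
}
```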

The difference between Pattern.quote and Matcher.quoteReplacement was not clear to me until I saw the following example:
s.replaceFirst(Pattern.quote("text to replace"),
Matcher.quoteReplacement("replacement text"));

It may be too late to respond, but you can also use Pattern.LITERAL, which makes the pattern treat all special characters as literal text when it is compiled:
Pattern.compile(textToFormat, Pattern.LITERAL);
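One case where the flag may be more convenient than Pattern.quote is when it is combined with other flags such as CASE_INSENSITIVE. A brief sketch, with made-up class and method names:

```java
import java.util.regex.Pattern;

final class LiteralFlagDemo {
    // Matches the whole candidate against 'text' taken literally, ignoring case.
    static boolean equalsLiteralIgnoreCase(String text, String candidate) {
        Pattern p = Pattern.compile(text, Pattern.LITERAL | Pattern.CASE_INSENSITIVE);
        return p.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(equalsLiteralIgnoreCase("a.b", "A.B")); // true
        // The dot is a literal dot here, not "any character".
        System.out.println(equalsLiteralIgnoreCase("a.b", "axb")); // false
    }
}
```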

I think what you're after is \Q$5\E. Also see Pattern.quote(s), introduced in Java 5.
See Pattern javadoc for details.

First off, if you use replaceAll(), you DON'T use Matcher.quoteReplacement(), and the text to be substituted in includes a $1, then it won't put a literal 1 at the end. It will look at the search regex for the first matching group and sub THAT in. That's what $1, $2 or $3 means in the replacement text: matching groups from the search pattern.
I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.
I ran into an issue where a user input a dollars-and-cents figure, with a dollar sign. replaceAll() choked on it, with the following showing up in a stack trace:
java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)
In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.
Given:
// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input
replacing
msg = msg.replaceAll("<userInput \\/>", userInput);
with
msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));
solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
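The failure and the fix can be reproduced in a few lines. This is only a sketch: the message text and the class and method names are invented, while the <userInput /> placeholder follows the description above.

```java
import java.util.regex.Matcher;

final class ReplacementDemo {
    // Safe version: the user's text is inserted verbatim, "$" and "\" included.
    static String fill(String msg, String userInput) {
        return msg.replaceAll("<userInput />", Matcher.quoteReplacement(userInput));
    }

    // Unsafe version: reports whether the raw replacement blows up.
    static boolean rawReplacementFails(String msg, String userInput) {
        try {
            msg.replaceAll("<userInput />", userInput);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // "$3" was read as a reference to a capturing group that does not exist.
            return true;
        }
    }

    public static void main(String[] args) {
        String msg = "You entered: <userInput />";
        System.out.println(rawReplacementFails(msg, "$3.50")); // true
        System.out.println(fill(msg, "$3.50"));                // You entered: $3.50
    }
}
```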

To build a protected pattern, you can escape every character that is not a letter or digit by prefixing it with a backslash (the replacement string "\\\\$1" below). You can then substitute your own special symbols into the protected pattern, so that it behaves like a real pattern of your own design rather than plain quoted text, while none of the user's special symbols take effect.
public class Test {
public static void main(String[] args) {
String str = "y z (111)";
String p1 = "x x (111)";
String p2 = ".* .* \\(111\\)";
p1 = escapeRE(p1);
p1 = p1.replace("x", ".*");
System.out.println( p1 + "-->" + str.matches(p1) );
//.*\ .*\ \(111\)-->true
System.out.println( p2 + "-->" + str.matches(p2) );
//.* .* \(111\)-->true
}
public static String escapeRE(String str) {
//Pattern escaper = Pattern.compile("([^a-zA-Z0-9])");
//return escaper.matcher(str).replaceAll("\\\\$1");
return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
}
}

Pattern.quote("blabla") works nicely. It encloses the text in "\Q" and "\E", and it also handles any "\E" already present in the text.
However, if you need real regular expression escaping (or custom escaping), you can use this code:
String someText = "Some/s/wText*/,**";
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));
This method returns: Some/s/wText\*/\,\*\*
Code for example and tests:
String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));

At the start of a character class, the ^ (negation) symbol matches any character that is not in the class.

Correct. escape sequence for over bar?

I am using a string as my source for an equation, and whenever I try to add something like an overbar tag which is:
\ov5\ - creates a bar over the 5
However, when I add this into a Java string, for it to compile I am required to write it like this:
String x = "\\ov5\\";
It would appear that this way breaks JQMath and doesn't work, resulting in a broken equation. Here is the code in case I did something terribly wrong:
WebView webView;
String functext = "$$\\ov55\\$$";
js = "<html><head>"
+ "<link rel='stylesheet' href='file:///android_asset/mathscribe/jqmath-0.4.3.css'>"
+ "<script src='file:///android_asset/mathscribe/jquery-1.4.3.min.js'></script>"
+ "<script src='file:///android_asset/mathscribe/jqmath-etc-0.4.3.min.js'></script>"
+ "</head><body>"
+ functext + "</body></html>";
webView.loadDataWithBaseURL("", js, "text/html", "UTF-8", "");
EDIT: For clarification, the end result oddly reads "$$\ov55$$".
Please note that when I try the same string on JQMath's website page here, it works as intended.
EDIT2: Here are some debug values for a breakpoint placed at webView.loadDataWithBaseURL:
actual string: String functext = "$$\\\\ov55\\\\$$";
actual displayed result: $$\ov55\$$
debug results:
functext = $$\\ov55\\$$
js = <html><head><link rel='stylesheet' href='file:///android_asset/mathscribe/jqmath-0.4.3.css'><script src='file:///android_asset/mathscribe/jquery-1.4.3.min.js'></script><script src='file:///android_asset/mathscribe/jqmath-etc-0.4.3.min.js'></script></head><body>$$\\ov55\\$$</body></html>
Any help with loading it in some way other than a string would be greatly appreciated.

I think you want this:
String functext = "$$\\ov55\\$$";
(The first \ needs to be before the ov operator.)
EDIT: Another possibility (since the above was evidently just a typo in your post, not in your code) is that somewhere in the pipeline the string is being interpolated a second time. In that case, you would need to double-escape the backslashes:
String functext = "$$\\\\ov55\\\\$$";
P.S. If the end result reads "$$\ov55$$" then the problem seems to be before jqmath sees anything. The code you posted definitely does not produce that result for me.
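To see what each literal actually contains at run time, a quick check like the following may help. It is a sketch with made-up names; the two literals are the ones discussed above.

```java
final class BackslashDemo {
    // In Java source, "\\" is ONE backslash character in the resulting string.
    static final String SINGLE = "$$\\ov55\\$$";
    // Four backslashes in source become TWO backslash characters.
    static final String DOUBLED = "$$\\\\ov55\\\\$$";

    public static void main(String[] args) {
        System.out.println(SINGLE + "  length=" + SINGLE.length());   // $$\ov55\$$  length=10
        System.out.println(DOUBLED + "  length=" + DOUBLED.length()); // $$\\ov55\\$$  length=12
    }
}
```

If the debugger shows two backslashes but the page receives one, some later step is unescaping the string again, which is the second-interpolation case described above.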

Also jqMath accepts ` (backquote) in place of \ if that makes things easier. Finally, I'd put a space between the ov and the 5 to clarify that it's not a macro named ov5. (Plus see my comment above to remove the final \.)

Need java Regex to remove/replace the XML elements from specific string

I am having trouble getting the correct regular expression. I have the XML below as a string:
<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam#testmail.com>"
</user_input>
Now I need to remove the XML characters from the Answer attribute only.
So I need the following:
<user_input>
<UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
I have tried the regex below, but it did not work:
str1.replaceAll("Answer=.*?<([^<]*)>", "$1");
It removes all the text before the match.
Can anyone help, please?

You need to put ? within the first group to make it non-greedy; also, you don't need Answer=.*?:
str1.replaceAll("<([^<]*?)>", "$1")


Although unbounded look-behinds are not supported in Java, bounded ones are, so you can work around the limitation with .{0,1000}, which should suffice.
Please check out this approach using 2 regexes, or 1 regex and 1 replace. Choose the one that suits best (I removed the \n line break from the first input string to show the flaw with using simple replace):
String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam#testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);
Output of a sample program:
<user_input UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>
<user_input><UserInput Question="test Q?" Answer=value0value sam#testmail.com"
</user_input>

First off, in your sample above there is a trailing " after the email address and the >; I do not know whether it was placed there by mistake.
However, I will keep it, since your expected result still contains it.
This is my hack: match
(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)
and replace it with
$1$3$5$8 $10
The explanation:
(Answer=)(<)(value)(>) matches from Answer= up to the start of the value 0
(.+?([^<]*)) matches the value itself, up to the < that starts the closing value tag
(</) is matched separately, since the previous expression stopped before it
(value) is the name of the closing tag, which is kept
(><) will later be replaced with a space
(.+?([^>]*)) matches from the start of the email address and excludes the > after the .com
(>) matches the last >, which is dropped in the replacement
The trailing " is not matched, so it is left untouched as requested.
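Applied from Java, the expression and replacement above might look like this. It is a sketch only: the class and method names are illustrative, and it assumes the input has exactly the shape shown in the question.

```java
final class AnswerCleanup {
    private static final String REGEX =
            "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";

    // Keeps groups 1, 3, 5, 8 and 10; the angle brackets and slash are dropped
    // and the "><" between the two elements becomes a single space.
    static String clean(String xml) {
        return xml.replaceAll(REGEX, "$1$3$5$8 $10");
    }

    public static void main(String[] args) {
        String xml = "<user_input>\n"
                + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam#testmail.com>\"\n"
                + "</user_input>";
        System.out.println(clean(xml));
    }
}
```

Because the pattern has twelve groups, Java reads $10 as group 10 rather than group 1 followed by a literal 0.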
