I'm using this regex to match and replace a part of a string:
(.*)<a href=\\"(.*)\\" class=\\"PopupItemLink\\">(.*)<\\\/a>(.*)
This string is an example: (https://regexr.com/3n1f1)
\n&7This is the alert body\n\nYour name: HAlexTM\nYour ID: 1\nHere the link: <a href=\"test.com\" class=\"PopupItemLink\">Hey<\/a>\n\nThis is a html test: <p>Hey<\/p>\n&8Thu Jun 09 18:07:30 CEST 2016
This part of the string (matched by the regex) should be replaced with just Hey:
<a href=\"test.com\" class=\"PopupItemLink\">Hey<\/a>
So in Java I use this code
if (asit.matches("(.*)<a href=\\\\\"(.*)\\\\\" class=\\\\\"PopupItemLink\\\\\">(.*)<\\\\\\/a>(.*)")) {
asit.replaceAll("<a href=\\\\\"(.*)\\\\\" class=\\\\\"PopupItemLink\\\\\"", "$1");
asit.replaceAll(">(.*)<\\\\\\/a>", "$1");
return asit;
}
But it doesn't return anything, what's the problem?
I resolved it by removing the if block and calling just replaceAll(), reassigning the result to the variable since Strings are immutable (thanks to #PM77-1):
output = output.replaceAll("<a href=\"(.*)\" class=\"PopupItemLink\"", "$1");
output = output.replaceAll(">(.*)<\\/a>", " ($1)");
return output;
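The fix works because replaceAll() returns a new String and leaves the original untouched. The sketch below shows the difference between discarding and keeping the result. It assumes the markup contains plain double quotes with no backslash escapes, and the names AnchorText and stripAnchor are made up for illustration.

```java
class AnchorText {
    // Assumed pattern: a PopupItemLink anchor with unescaped quotes; group 1 is the link text.
    private static final String ANCHOR =
        "<a href=\"[^\"]*\" class=\"PopupItemLink\">(.*?)</a>";

    // Returns a copy of the input with each matching anchor replaced by its link text.
    static String stripAnchor(String input) {
        return input.replaceAll(ANCHOR, "$1");
    }

    public static void main(String[] args) {
        String body = "Here the link: <a href=\"test.com\" class=\"PopupItemLink\">Hey</a>";
        body.replaceAll(ANCHOR, "$1");   // result discarded, so body is unchanged
        System.out.println(body);
        body = stripAnchor(body);        // result kept by reassigning
        System.out.println(body);
    }
}
```

Doing the whole replacement with one pattern and one capturing group also avoids the second replaceAll() call, which would otherwise touch any other closing tag that happens to match.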
I have a requirement where, depending on the filename, I need to call different methods.
Example filenames are as below:
Abc_def_20180719_ghi.txt
Pqr_xy_gh_20180730.txt
Here I want to remove everything from the date pattern onwards.
So the output should be:
"Abc_def"
"Pqr_xy_gh"
Please suggest suitable string operations using regex.
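To strip all digits you can use: yourText.replaceAll("[0-9]", ""). Note that this removes only the digits themselves: Abc_def_20180719_ghi.txt becomes Abc_def__ghi.txt, so anything after the date is kept.
To drop the .txt extension, use: yourTextAfterReplacingAll.split("\\.")
The text you want is then in yourTextAfterSplit[0].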
You can use the following regex to match the required portion of the file name:
/.+(?=_\d{8})/
It matches any characters except line breaks that come before an underscore followed by 8 consecutive digits, which is the pattern of the date.
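In Java, the lookahead pattern above could be applied roughly as follows. This is a sketch only: the class and method names are invented, and returning the unchanged filename when no date is found is an assumption, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilePrefix {
    // Everything before an underscore that is followed by eight digits.
    private static final Pattern BEFORE_DATE = Pattern.compile(".+(?=_\\d{8})");

    // Returns the part before the date, or the unchanged name when no date is found
    // (this fallback is an assumption made for the example).
    static String prefixOf(String filename) {
        Matcher matcher = BEFORE_DATE.matcher(filename);
        return matcher.find() ? matcher.group() : filename;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("Abc_def_20180719_ghi.txt")); // Abc_def
        System.out.println(prefixOf("Pqr_xy_gh_20180730.txt"));   // Pqr_xy_gh
    }
}
```

Because the date is matched inside a lookahead, it is not part of the match itself, so matcher.group() is already the prefix and no capturing group is needed.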
It may be overthinking it a little to validate that the date at least superficially looks like a good date. This regex could be simplified if you don't care about invalid dates like 10664964.
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatePrefix {
    // no digits before date; year must be between 2000 and 9999
    // month from 01 to 12, day from 01 to 31
    private static Pattern beforeDate = Pattern.compile(
        "([^0-9]+)_[2-9]\\d{3}(?:0[1-9]|1[0-2])(?:0[1-9]|[1-2]\\d|3[01])");

    public static void main(String[] args) {
        for (String filename : args) {
            getPrefixBeforeDate(filename)
                .ifPresentOrElse(
                    prefix -> System.out.format("Found %s%n", prefix),
                    () -> System.out.format("Bad date: %s%n", filename));
        }
    }

    public static Optional<String> getPrefixBeforeDate(String filename) {
        Matcher matcher = beforeDate.matcher(filename);
        if (matcher.find()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }
}
When called with:
java DatePrefix Pq_xy_20180229.txt Abc_def_ghi_20380323_foo_1200.xml \
Hey_its_20182395.gif Foo_bar.txt
It prints:
Found Pq_xy
Found Abc_def_ghi
Bad date: Hey_its_20182395.gif
Bad date: Foo_bar.txt
The pattern could simply be the following if you don't care whether the date looks at all valid:
private static Pattern beforeDate = Pattern.compile("([^0-9]+)_\\d{8}");
Try this pattern:
[\w\d]+[A-Z-a-z][_]
You can test it online.
I have the following code
String timeStampSentence = lastedEditedElement.getText();
LOG.info(timeStampSentence);
Pattern timestampPattern =
Pattern.compile("Last edited by [a-zA-Z]* on ([a-zA-Z]* [0-9]*), ([0-9]*) at ([0-9:]*) ([amp]*)");
Matcher matcher = timestampPattern.matcher(timeStampSentence);
String day = matcher.group(1);
The string I'm trying to match (as output from the LOG) is Last edited by admin on January 27, 2017 at 8:12 pm which, when tested online, matches.
The exception thrown (on Windows; on Mac it looks like it's working) is:
java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:536)
at com.xxx.integration.test.notification.steps.wordpress.editor.WordPressEditorSteps.iShouldBeAbleToSeeTheLastEditedTimestampOnTheEditorPage(WordPressEditorSteps.java:249)
at ✽.And I should be able to see the Last edited timestamp on the editor page(F:/content-stack-integration-tests/src/test/resources/features/wordpress/stories-dashboard.feature:24)
The JDK is 1.8.0_111 on Windows 7.
You should call Matcher#matches() prior to retrieving the first group:
Matcher matcher = timestampPattern.matcher(timeStampSentence);
if (matcher.matches()) {
    String day = matcher.group(1);
    ...
}
The documentation of group() mentions:
Returns the input subsequence captured by the given group during the previous match operation.
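Put together, a self-contained version might look like the sketch below. It reuses the pattern from the question; the class name LastEdited and the choice to return an Optional when nothing matches are assumptions made for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastEdited {
    // Pattern taken from the question: group 1 is "Month day", group 2 the year,
    // group 3 the time, group 4 am/pm.
    private static final Pattern TIMESTAMP = Pattern.compile(
        "Last edited by [a-zA-Z]* on ([a-zA-Z]* [0-9]*), ([0-9]*) at ([0-9:]*) ([amp]*)");

    // Returns group 1 only after matches() has succeeded; calling group()
    // without a successful match throws IllegalStateException.
    static Optional<String> dayOf(String sentence) {
        Matcher matcher = TIMESTAMP.matcher(sentence);
        if (matcher.matches()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(dayOf("Last edited by admin on January 27, 2017 at 8:12 pm"));
        System.out.println(dayOf("something else entirely"));
    }
}
```

Returning an empty Optional makes the "no match" case explicit to the caller instead of surfacing as an exception from group().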
Can you suggest an approach for splitting a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
So I tried to parse that string with
:[A-za-z]|\\d:
This kind of regular expression is not working. Please suggest a regular expression with which I can split that string, with 20, 31C, 31D etc. as keys and 150318, 150425 IN BANGLADESH etc. as values.
If I use string.split(":") then it would not serve my purpose.
If a string is like:
:20: MY VALUES : ARE HERE
then it will be split into 3 strings: key 20 will be associated with "MY VALUES", and "ARE HERE" will not be associated with key 20.
You can use matching instead of splitting, since you need to match a specific colon in the string.
A regex that captures the text between the first and second colons in Group 1, and everything after the second colon in Group 2, looks like this:
^:([^:]*):(.*)$
The ^ asserts the beginning of the string, ([^:]*) matches and captures into Group 1 zero or more characters other than :, and (.*) matches and captures into Group 2 the rest of the string. $ asserts the position at the end of a single-line string (as . matches any symbol but a newline without the Pattern.DOTALL modifier).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE
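If the whole multi-line input arrives as one String, the same idea can be extended with Pattern.MULTILINE so that ^ and $ apply to every line. The sketch below is one way to do that; the class name TagParser, the use of a LinkedHashMap, and the extra exclusion of line breaks from the key are assumptions added for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // Same shape as the regex above, applied per line; the key additionally
    // excludes line breaks so that it cannot run on into the next line.
    private static final Pattern LINE =
        Pattern.compile("^:([^:\\r\\n]*):(.*)$", Pattern.MULTILINE);

    // Collects key/value pairs in input order; lines that do not match are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) {
            result.put(matcher.group(1), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = ":31C:150318\n:31D:150425 IN BANGLADESH\n:20: MY VALUES : ARE HERE";
        parse(text).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Values are stored exactly as matched, so a value such as " MY VALUES : ARE HERE" keeps its leading space and its inner colon.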
You can use the following to split:
^[:]+([^:]+):
Try with split function of String class
String[] splited = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa
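The same result can also be reached with the two-argument form of split(), which caps the number of pieces so that colons inside the value are left alone. A small sketch, assuming every line starts with a colon and contains at least two colons; the class and method names are hypothetical.

```java
class KeyValueSplit {
    // Splits ":KEY:VALUE" into {KEY, VALUE}. With a limit of 3 the pieces are
    // the empty string before the first colon, the key, and the untouched rest.
    // Assumes the line starts with ':' and has a second ':'; other input is not handled.
    static String[] keyAndValue(String line) {
        String[] parts = line.split(":", 3);
        return new String[] { parts[1], parts[2] };
    }

    public static void main(String[] args) {
        String[] pair = keyAndValue(":31D:150425 IN BANGLADESH:todasdsa");
        System.out.println("key=" + pair[0] + " value=" + pair[1]);
    }
}
```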
I am trying to extract movie names from a list that looks like this:
The Maze Runner 2014 DVDRip XviD MP3-RARBG
Fury 2014 DVDSCR x264 AC3-Blackjesus
Dracula's Untold Story (WebRip / 2014)
I need to extract the words up to the year or a special character like ( or [ but not '
The Maze Runner 2014 DVDRip XviD MP3-RARBG ==> The Maze Runner
Fury 2014 DVDSCR x264 AC3-Blackjesus ==> Fury
Dracula's Untold Story (WebRip / 2014) ==> Dracula's Untold Story
Dracula's Untold Story [WebRip / 2014] ==> Dracula's Untold Story
I have no idea how to go about writing a complex regex like this. Any ideas?
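The code snippet below may help meet your requirements:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String extractMovieName(String movieNameString) {
    // Group 1: the title (word characters, apostrophes, spaces);
    // group 2: a '[', a '(' or a four-digit run that follows it.
    Pattern pattern = Pattern.compile("([\\w' ]+)([\\[]|[\\(]|[\\d]{4})");
    Matcher matcher = pattern.matcher(movieNameString);
    String extractedName = "";
    if (matcher.find()) {
        // The title group keeps the space before the delimiter, so trim it.
        extractedName = matcher.group(1).trim();
    }
    return extractedName;
}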
^[a-zA-Z0-9\ '-]+(?=\b\d{4}\b|\()
Try this. See the demo:
http://regex101.com/r/yR3mM3/4
Try the code below, for example:
System.out.println("Fury 2014 DVDSCR x264 AC3-Blackjesus".replaceAll("\\s(\\d|\\(|\\[).*", ""));
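To check an approach against all four sample lines from the question, a small harness such as the one below can help. The pattern here is a variation written for this sketch (a lazy title group followed by a standalone four-digit number, ( or [), not the exact regex from any of the answers; the names MovieTitle and titleOf are likewise made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieTitle {
    // Lazy title, optional whitespace, then a standalone 4-digit number, '(' or '['.
    private static final Pattern TITLE =
        Pattern.compile("^(.+?)\\s*(?:\\b\\d{4}\\b|[(\\[])");

    // Returns the title, or the trimmed input when no delimiter is found
    // (the fallback is an assumption made for this example).
    static String titleOf(String line) {
        Matcher matcher = TITLE.matcher(line);
        return matcher.find() ? matcher.group(1) : line.trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "The Maze Runner 2014 DVDRip XviD MP3-RARBG",
            "Fury 2014 DVDSCR x264 AC3-Blackjesus",
            "Dracula's Untold Story (WebRip / 2014)",
            "Dracula's Untold Story [WebRip / 2014]"
        };
        for (String sample : samples) {
            System.out.println(sample + " ==> " + titleOf(sample));
        }
    }
}
```

Making the title group lazy and consuming the whitespace outside it means the captured title carries no trailing space, so no extra trim is needed on a match.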
I have to remove the data between two strings, as below:
<PACKET>752</PACKET>
<TIME>23-Oct-2013 12:05:46 GMT Standard Time</TIME>
<INTERVAL>2</INTERVAL>
<HEADER>hi this should not be printed only</HEADER>
<DATA></DATA>
Here I have to remove the data between <HEADER> and </HEADER>. Can anybody give me a regex for this?
I think this can do the job with RegEx:
String str="b1<HEADER>aaaaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints b1b2
In the case that you have other tags inside <HEADER> the above will fail. Consider the below example :
String str = "b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints: b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2
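To overcome this and also remove any nested tags, match lazily up to the first closing tag instead (the earlier <HEADER.+?> form requires at least one extra > before </HEADER>, so it fails when the header holds plain text only):
newstring = str.replaceAll("(?s)<HEADER[^>]*>.*?</HEADER>", "");
This will print b1b2.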
Maroun's right that it's not a good idea, but if you have to do it then this might work:
(?ms)(.*<HEADER>).*(<\/HEADER>.*)
This captures everything up to and including <HEADER> in group 1, and everything from </HEADER> onwards in group 2. You can then concatenate the two to remove the bit in the middle.
See here: http://regex101.com/r/bC2eQ7
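In Java the two groups can be concatenated directly in the replacement string. The sketch below uses only the (?s) flag, since the pattern has no ^ or $ for the m flag to affect; the class and method names are hypothetical.

```java
class HeaderContent {
    // Group 1: everything up to and including <HEADER>;
    // group 2: everything from </HEADER> onwards. The text in between is dropped.
    static String clearHeader(String xml) {
        return xml.replaceAll("(?s)(.*<HEADER>).*(</HEADER>.*)", "$1$2");
    }

    public static void main(String[] args) {
        String xml = "<PACKET>752</PACKET>\n"
            + "<HEADER>hi this should not be printed only</HEADER>\n"
            + "<DATA></DATA>";
        System.out.println(clearHeader(xml));
    }
}
```

Input without a HEADER element does not match the pattern and is returned unchanged.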
This regex replaces everything inside the tag with an empty String:
String input = "<PACKET>752</PACKET>...<HEADER>hi this should be printed only</HEADER><DATA></DATA>";
String output = input.replaceAll("(?<=<HEADER>).*?(?=</HEADER>)", "");
Result:
<PACKET>752</PACKET>...<HEADER></HEADER><DATA></DATA>