Regex matches but doesn't work on Java

I'm using this regex to match and replace a part of a string:
(.*)<a href=\\"(.*)\\" class=\\"PopupItemLink\\">(.*)<\\\/a>(.*)
This string is an example: (https://regexr.com/3n1f1)
\n&7This is the alert body\n\nYour name: HAlexTM\nYour ID: 1\nHere the link: <a href=\"test.com\" class=\"PopupItemLink\">Hey<\/a>\n\nThis is a html test: <p>Hey<\/p>\n&8Thu Jun 09 18:07:30 CEST 2016
This part of the string (matched by the RegEX) should be replaced with Hey
<a href=\"test.com\" class=\"PopupItemLink\">Hey<\/a>
So in Java I use this code
if (asit.matches("(.*)<a href=\\\\\"(.*)\\\\\" class=\\\\\"PopupItemLink\\\\\">(.*)<\\\\\\/a>(.*)")) {
asit.replaceAll("<a href=\\\\\"(.*)\\\\\" class=\\\\\"PopupItemLink\\\\\"", "$1");
asit.replaceAll(">(.*)<\\\\\\/a>", "$1");
return asit;
}
But it doesn't return anything, what's the problem?

I resolved it by removing the if block and calling replaceAll() on its own, reassigning the result to the variable since Strings are immutable (thanks to #PM77-1)
output = output.replaceAll("<a href=\"(.*)\" class=\"PopupItemLink\"", "$1");
output = output.replaceAll(">(.*)<\\/a>", " ($1)");
return output;
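The point of the fix is that replaceAll() returns a new String and leaves the original untouched, so the result has to be assigned. Here is a minimal sketch, assuming the input contains plain double quotes rather than backslash-escaped ones. The class name ReplaceAllDemo, the helper stripLink and the reluctant (.*?) quantifiers are illustrative choices and are not part of the original code:

```java
class ReplaceAllDemo {
    // Hypothetical helper: turns <a href="URL" class="PopupItemLink">TEXT</a> into "URL (TEXT)".
    static String stripLink(String input) {
        // replaceAll() never modifies 'input'; the returned String must be kept.
        String output = input.replaceAll("<a href=\"(.*?)\" class=\"PopupItemLink\"", "$1");
        output = output.replaceAll(">(.*?)</a>", " ($1)");
        return output;
    }

    public static void main(String[] args) {
        String alert = "Here the link: <a href=\"test.com\" class=\"PopupItemLink\">Hey</a>";
        alert.replaceAll("Hey", "Bye");        // result discarded, so 'alert' is unchanged
        System.out.println(stripLink(alert));  // Here the link: test.com (Hey)
    }
}
```

The reluctant quantifiers keep each match from running past the first closing quote or the first </a> when the text contains more than one link.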

Related

Extracting string from filename after removing Date from the string

I have a requirement where I need to call different methods depending on the filename.
Example filenames are as below:
Abc_def_20180719_ghi.txt
Pqr_xy_gh_20180730.txt
Here I want to remove everything from the date pattern onwards.
So the output should be like:
"Abc_def"
"Pqr_xy_gh"
Please suggest suitable string operations with regex

For filtering all numbers you can use: yourText.replaceAll("[0-9]","") .
But if you want to drop the .txt use: yourTextAfterReplacingAll.split("\\.")
The text you want is in yourTextAfterSplit[0]

You can use following regex to detect required portion of file name
/.+(?=_\d{8})/
It matches any characters (except line breaks) that come before an underscore followed by 8 consecutive digits, which is the pattern of the date.
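In Java the surrounding slashes are dropped and the backslash is doubled. This is a rough sketch of how the lookahead could be applied. The class FilenamePrefix and the method prefixBeforeDate are hypothetical names, and returning the filename unchanged when no date is found is an assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilenamePrefix {
    // Everything that is followed by an underscore and 8 digits (the lookahead is not consumed).
    private static final Pattern BEFORE_DATE = Pattern.compile(".+(?=_\\d{8})");

    static String prefixBeforeDate(String filename) {
        Matcher m = BEFORE_DATE.matcher(filename);
        // Assumption: fall back to the full filename when it contains no date.
        return m.find() ? m.group() : filename;
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeDate("Abc_def_20180719_ghi.txt")); // Abc_def
        System.out.println(prefixBeforeDate("Pqr_xy_gh_20180730.txt"));   // Pqr_xy_gh
    }
}
```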

It may be overthinking it a little to validate that the date at least superficially looks like a good date. This regex could be simplified if you don't care about invalid dates like 10664964.
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DatePrefix {
// no digits before date; year must be between 2000 and 9999
// month from 01 to 12, day from 01 to 31
private static Pattern beforeDate = Pattern.compile(
"([^0-9]+)_[2-9]\\d{3}(?:0[1-9]|1[0-2])(?:0[1-9]|[1-2]\\d|3[01])");
public static void main(String[] args) {
for (String filename : args) {
getPrefixBeforeDate(filename)
.ifPresentOrElse(
prefix -> System.out.format("Found %s%n", prefix),
() -> System.out.format("Bad date: %s%n", filename));
}
}
public static Optional<String> getPrefixBeforeDate(String filename) {
Matcher matcher = beforeDate.matcher(filename);
if (matcher.find()) {
return Optional.of(matcher.group(1));
}
return Optional.empty();
}
}
When called with:
java DatePrefix Pq_xy_20180229.txt Abc_def_ghi_20380323_foo_1200.xml \
Hey_its_20182395.gif Foo_bar.txt
It prints:
Found Pq_xy
Found Abc_def_ghi
Bad date: Hey_its_20182395.gif
Bad date: Foo_bar.txt
The pattern could simply be the following if you don't care whether the date looks at all valid:
private static Pattern beforeDate = Pattern.compile("([^0-9]+)_\\d{8}");

Try this pattern:
[\w\d]+[A-Z-a-z][_]
You can test it online

Regexp not matching text on windows, working as expected on a Mac

I have the following code
String timeStampSentence = lastedEditedElement.getText();
LOG.info(timeStampSentence);
Pattern timestampPattern =
Pattern.compile("Last edited by [a-zA-Z]* on ([a-zA-Z]* [0-9]*), ([0-9]*) at ([0-9:]*) ([amp]*)");
Matcher matcher = timestampPattern.matcher(timeStampSentence);
String day = matcher.group(1);
The string I'm trying to match (as output from the LOG) is Last edited by admin on January 27, 2017 at 8:12 pm which, when tested online, is matched.
The exception thrown (on Windows; on Mac it looks like it's working) is
java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:536)
at com.xxx.integration.test.notification.steps.wordpress.editor.WordPressEditorSteps.iShouldBeAbleToSeeTheLastEditedTimestampOnTheEditorPage(WordPressEditorSteps.java:249)
at ✽.And I should be able to see the Last edited timestamp on the editor page(F:/content-stack-integration-tests/src/test/resources/features/wordpress/stories-dashboard.feature:24)
jdk is 1.8.0_111 on Windows 7

You should call Matcher#matches() prior to retrieving the first group:
Matcher matcher = timestampPattern.matcher(timeStampSentence);
if(matcher.matches()) {
String day = matcher.group(1);
...
}
The documentation of group() mentions:
Returns the input subsequence captured by the given group during the previous match operation.
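Put together with the pattern from the question, a self-contained version might look like the sketch below. The class TimestampParser, the method extractDay and the use of Optional for the no-match case are illustrative assumptions, not part of the original test code:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimestampParser {
    private static final Pattern TIMESTAMP = Pattern.compile(
            "Last edited by [a-zA-Z]* on ([a-zA-Z]* [0-9]*), ([0-9]*) at ([0-9:]*) ([amp]*)");

    // Returns the "Month day" part, or an empty Optional when the sentence does not match.
    static Optional<String> extractDay(String sentence) {
        Matcher matcher = TIMESTAMP.matcher(sentence);
        // group() is only valid after a successful matches(), find() or lookingAt().
        if (matcher.matches()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractDay("Last edited by admin on January 27, 2017 at 8:12 pm"));
        System.out.println(extractDay("something else entirely"));
    }
}
```

Returning an empty Optional instead of throwing makes it easier to log the actual text when the match fails on one platform but not on another.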

Java Split String by colon on both side

Can you suggest an approach to split a String like this:
:31C:150318
:31D:150425 IN BANGLADESH
:20:314015040086
So I tried to parse that string with
:[A-za-z]|\\d:
but this kind of regular expression is not working. Please suggest a regular expression that splits the string with 20, 31C, 31D etc. as keys and 150318, 150425 IN BANGLADESH etc. as values.
Using string.split(":") would not serve my purpose.
If a string is like:
:20: MY VALUES : ARE HERE
then it will be split into 3 strings: key 20 will be associated with "MY VALUES", and "ARE HERE" will not be associated with key 20.

You may use matching mechanism instead of splitting since you need to match a specific colon in the string.
The regex to get 2 groups between the first and second colon and also capture everything after the second colon will look like
^:([^:]*):(.*)$
See demo. The ^ will assert the beginning of the string, ([^:]*) will match and capture into Group 1 zero or more characters other than :, and (.*) will match and capture into Group 2 the rest of the string. $ will assert the position at the end of a single line string (as . matches any symbol but a newline without Pattern.DOTALL modifier).
String s = ":20:AND:HERE";
Pattern pattern = Pattern.compile("^:([^:]*):(.*)$");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println("Key: " + matcher.group(1) + ", Value: " + matcher.group(2) + "\n");
}
Result for this demo: Key: 20, Value: AND:HERE

You can use the following to split:
^[:]+([^:]+):

Try with split function of String class
String[] splited = string.split(":");
For your requirements:
String c = ":31D:150425 IN BANGLADESH:todasdsa";
c=c.substring(1);
System.out.println("C="+c);
String key= c.substring(0,c.indexOf(":"));
String value = c.substring(c.indexOf(":")+1);
System.out.println("key="+key+" value="+value);
Result:
C=31D:150425 IN BANGLADESH:todasdsa
key=31D value=150425 IN BANGLADESH:todasdsa

Regex: Get all words until a number or a special character is found

I am trying to extract movie names from a list that looks like this:
The Maze Runner 2014 DVDRip XviD MP3-RARBG
Fury 2014 DVDSCR x264 AC3-Blackjesus
Dracula's Untold Story (WebRip / 2014)
I need to extract the words up to the year or a special character like ( or [ but not '
The Maze Runner 2014 DVDRip XviD MP3-RARBG ==> The Maze Runner
Fury 2014 DVDSCR x264 AC3-Blackjesus ==> Fury
Dracula's Untold Story (WebRip / 2014) ==> Dracula's Untold Story
Dracula's Untold Story [WebRip / 2014] ==> Dracula's Untold Story
I have no idea how to go on about writing a complex regex like this. Any ideas?

The below code snippet can be helpful to meet your requirements
public static String extractMovieName(String movieNameString){
Pattern pattern = Pattern.compile("([\\w' ]+)([\\[]|[\\(]|[\\d]{4})");
Matcher matcher = pattern.matcher(movieNameString);
String extractedName = "";
if(matcher.find()){
extractedName = matcher.group(1);
}
return extractedName;
}

^[a-zA-Z0-9\ '-]+(?=\b\d{4}\b|\()
Try this. See demo.
http://regex101.com/r/yR3mM3/4
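As written, the lookahead only covers a year or an opening parenthesis, so the [WebRip / 2014] sample would not match. The sketch below uses a slightly extended variant that also accepts [ and trims the trailing space left by the match. The class MovieTitle and the method extract are hypothetical names, and returning the input unchanged on no match is an assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieTitle {
    // Title characters, then a lookahead for a standalone 4-digit year, '(' or '['.
    private static final Pattern TITLE =
            Pattern.compile("^[a-zA-Z0-9 '-]+(?=\\b\\d{4}\\b|[(\\[])");

    static String extract(String line) {
        Matcher m = TITLE.matcher(line);
        // The match ends with the space before the year or bracket, so trim it.
        return m.find() ? m.group().trim() : line;
    }

    public static void main(String[] args) {
        System.out.println(extract("The Maze Runner 2014 DVDRip XviD MP3-RARBG")); // The Maze Runner
        System.out.println(extract("Dracula's Untold Story [WebRip / 2014]"));     // Dracula's Untold Story
    }
}
```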

Try below code:
Example:
System.out.println("Fury 2014 DVDSCR x264 AC3-Blackjesus".replaceAll("\\s(\\d|\\(|\\[).*", ""));

Remove data between two strings

I have to remove data between two string as below
<PACKET>752</PACKET>
<TIME>23-Oct-2013 12:05:46 GMT Standard Time</TIME>
<INTERVAL>2</INTERVAL>
<HEADER>hi this should not be printed only</HEADER>
<DATA></DATA>
In this I have to remove data between <HEADER> and </HEADER> . Can any body give me regex for this?

I think this can do the job with RegEx:
String str="b1<HEADER>aaaaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints b1b2
In the case that you have other tags inside <HEADER> the above will fail. Consider the below example :
String str = "b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2";
String newstring = str.replaceAll("<HEADER[^>]*>([^<]*)<\\/HEADER>", "");
System.out.println(newstring);
This prints: b1<HEADER>aa<xxx>xx</xxx>aaa</HEADER>b2
To overcome this and also remove the nested tags, use this:
newstring = str.replaceAll("<HEADER.+?>([^<]*)</HEADER>", "");
This will print b1b2.

Maroun's right that it's not a good idea, but if you have to do it then this might work:
(?ms)(.*<HEADER>).*(<\/HEADER>.*)
This captures everything up to and including <HEADER> in group 1, and everything from </HEADER> onwards in group 2. You can then concatenate the two to remove the bit in the middle.
See here: http://regex101.com/r/bC2eQ7
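The capture-and-concatenate idea could be sketched in Java as follows. The class HeaderStripper and the method clearHeader are hypothetical names. Only the (?s) flag is kept because the pattern has no ^ or $ anchors for (?m) to affect, and returning the input unchanged when there is no <HEADER> element is an assumption:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderStripper {
    // (?s) lets '.' match line breaks so the pattern can span the whole document.
    private static final Pattern AROUND_HEADER =
            Pattern.compile("(?s)(.*<HEADER>).*(</HEADER>.*)");

    static String clearHeader(String xml) {
        Matcher m = AROUND_HEADER.matcher(xml);
        // Group 1 ends with <HEADER>, group 2 starts with </HEADER>; the text between is dropped.
        return m.matches() ? m.group(1) + m.group(2) : xml;
    }

    public static void main(String[] args) {
        String xml = "<PACKET>752</PACKET>\n"
                + "<HEADER>hi this should not be printed only</HEADER>\n"
                + "<DATA></DATA>";
        System.out.println(clearHeader(xml));
    }
}
```

Because both .* parts are greedy, this is only suitable for input with a single <HEADER> element.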

This RegEx replaces everything inside the tag with an empty String:
String input = "<PACKET>752</PACKET>...<HEADER>hi this should be printed only</HEADER><DATA></DATA>";
String output = input.replaceAll("(?<=<HEADER>).*?(?=</HEADER>)", "");
Result:
<PACKET>752</PACKET>...<HEADER></HEADER><DATA></DATA>
