REGEX: GETTING VALUE OF href="" EXCEPT FOR PARTICULAR STRING - java

Here's my regex code:
\\s*(?i)href\\s*=\\s*(\"(([^\"]*\")|'[^']*'|([^'\">\\s]+)))
Actually, the real problem is this: I want to change the value of every matching href except for these two: <link href="foo.css"> and <link href="boo.ico">. I want to keep the values of those two unchanged.
Pattern p = Pattern.compile(HTML_A_HREF);
Matcher m = p.matcher(getLine());
setNewLine(m.replaceAll((String.format("%-1s", sp))+"href=\"javascript:history.go(0)\"" + (String.format("%-1s", sp))));
getLine() is the html file itself.
String sp = "";

Your regex is off. To show you, let me explode it:
\\s*(?i)href\\s*=\\s*
(\"
(
([^\"]*\")
|
'[^']*'
|
([^'\">\\s]+)
)
)
The leading double-quote is outside the multi-choice block. It needs to be in the first choice section.
Also:
You should put (?i) first.
With \" inside the first choice, one set of parentheses goes away.
You don't need parentheses in the choice sections.
The parentheses around the choice block should be non-capturing.
So, that means:
(?i)\\s*href\\s*=\\s*
(?:
\"[^\"]*\"
|
'[^']*'
|
[^'\">\\s]+
)
Which is (?i)\\s*href\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^'\">\\s]+).
As for the replacement code:
String sp = "";
m.replaceAll((String.format("%-1s", sp))
+
"href=\"javascript:history.go(0)\""
+
(String.format("%-1s", sp))
)
What is the purpose of (String.format("%-1s", sp)) when sp = ""? It formats an empty string left-aligned in a field at least 1 character wide, which is just a single space, " ". So why all that overhead?
m.replaceAll(" href=\"javascript:history.go(0)\" ")
Finally, you want to exclude foo.css and boo.ico.
One way to do that is with a negative lookahead. Since you have 3 choices, you need to repeat it 3 times:
(?i)\\s*href\\s*=\\s*
(?:
\"(?!foo\\.css|boo\\.ico)[^\"]*\"
|
'(?!foo\\.css|boo\\.ico)[^']*'
|
(?!foo\\.css|boo\\.ico)[^'\">\\s]+
)
I'll let you collapse that back to one line.
UPDATE
If you want to exclude all .css and .ico files, use a negative lookbehind instead.
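Also, I forgot to escape the . before, sorry. Fixed that.
In the unquoted choice, make the quantifier possessive (++). With a plain +, the engine can backtrack to a shorter value, e.g. foo.cs out of href=foo.css, which then slips past the lookbehind.
(?i)\\s*href\\s*=\\s*
(?:
\"[^\"]*(?<!\\.css|\\.ico)\"
|
'[^']*(?<!\\.css|\\.ico)'
|
[^'\">\\s]++(?<!\\.css|\\.ico)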
)
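Putting the pieces together, here is a minimal sketch of the lookbehind regex collapsed into one Java string and used with the simplified replacement. One assumption beyond the exploded regex: the unquoted branch uses the possessive ++ so that href=foo.css cannot backtrack to foo.cs and escape the lookbehind. The class name HrefRewrite, the rewrite method and the sample HTML are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRewrite {
    // Lookbehind version collapsed to one literal; (?i) also makes .CSS/.ICO excluded.
    // Assumption: "++" (possessive) in the unquoted branch prevents backtracking.
    static final String HTML_A_HREF =
            "(?i)\\s*href\\s*=\\s*"
            + "(?:\"[^\"]*(?<!\\.css|\\.ico)\""   // double-quoted value
            + "|'[^']*(?<!\\.css|\\.ico)'"         // single-quoted value
            + "|[^'\">\\s]++(?<!\\.css|\\.ico))";  // unquoted value

    private static final Pattern HREF = Pattern.compile(HTML_A_HREF);

    // Replaces every href except those whose value ends in .css or .ico
    static String rewrite(String html) {
        Matcher m = HREF.matcher(html);
        return m.replaceAll(" href=\"javascript:history.go(0)\" ");
    }

    public static void main(String[] args) {
        String html = "<link href=\"foo.css\"><a href=\"page.html\">x</a><link href=boo.ico>";
        // prints <link href="foo.css"><a href="javascript:history.go(0)" >x</a><link href=boo.ico>
        System.out.println(rewrite(html));
    }
}
```

Because the leading \\s* is part of the match, the space before href is consumed and re-emitted by the replacement string, which is why the output keeps one space before href.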

Related

How to remove everything after specific character in string using Java

I have a string that looks like this:
analitics#gmail.com#5
And it represents my userId.
I need to pass that userId as a parameter to a function, but with the number after the second # (here 5) removed and a new number appended.
I started with something like this:
userService.getUser(user.userId.substring(0, userAfterMigration.userId.indexOf("#") + 1) + 3);
What is the best way of removing everything that comes after the second # character in string above using Java?
Here is a splitting option:
String input = "analitics#gmail.com#5";
String output = String.join("#", input.split("#")[0], input.split("#")[1]) + "#";
System.out.println(output); // analitics#gmail.com#
Assuming your input only has two # symbols, you could use a regex replacement here (it replaces everything from the last # to the end with a single #):
String input = "analitics#gmail.com#5";
String output = input.replaceAll("#[^#]*$", "#");
System.out.println(output); // analitics#gmail.com#
You can capture in group 1 what you want to keep, and match what comes after it so that it is removed.
In the replacement, use capture group 1, denoted by $1.
^((?:[^#\s]+#){2}).+
^ Start of string
( Capture group 1
(?:[^#\s]+#){2} Repeat 2 times: match 1+ chars other than # or whitespace, then match the #
) Close group 1
.+ Match 1 or more characters that you want to remove
String s = "analitics#gmail.com#5";
System.out.println(s.replaceAll("^((?:[^#\\s]+#){2}).+", "$1"));
Output
analitics#gmail.com#
If the string can also start with ##1 and you want to keep ## then you might also use:
^((?:[^#]*#){2}).+
The simplest way that would seem to work for you:
str = str.replaceAll("#[^.]*$", "");
This matches a # followed by any non-dot characters up to the end of the string and replaces it with nothing, so the trailing #5 is deleted, including the # itself.
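If you would rather avoid regex entirely, a minimal sketch of the same "keep everything up to and including the second #" idea locates the second # with String.indexOf(int, int) and then appends the new number. The class name, the withNewSuffix method and its exception message are hypothetical, chosen for illustration.

```java
public class ReplaceAfterSecondHash {
    // Keeps everything up to and including the second '#', drops the rest,
    // then appends newNumber (hypothetical helper, not from the question).
    static String withNewSuffix(String userId, int newNumber) {
        int first = userId.indexOf('#');
        int second = first < 0 ? -1 : userId.indexOf('#', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("expected two '#' in: " + userId);
        }
        return userId.substring(0, second + 1) + newNumber;
    }

    public static void main(String[] args) {
        System.out.println(withNewSuffix("analitics#gmail.com#5", 3)); // analitics#gmail.com#3
    }
}
```

Unlike the "#[^.]*$" replacement, this does not depend on whether the part after the second # contains a dot.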

String replacement when a regex capture group is null in Java

I want to convert a software version number into a github tag name by regular expression.
For example, OGNL versions usually look like 3.2.1, and the tag name I want is OGNL_3_2_1.
So we can use the String::replaceAll(String regex, String replacement) method like this:
"3.2.1".replaceAll("(\\d+).(\\d+).(\\d+)", "OGNL_$1_$2_$3")
And we can get the tag name OGNL_3_2_1 easily.
But I also want the regex to work for 3.2, so I changed it to (\d+).(\d+)(?:.(\d+))?.
When I run the code again, I get OGNL_3_2_ rather than OGNL_3_2. The trailing underscore is not what I want; it comes from the null group referenced by $3.
So how can I write a suitable replacement for this case?
When the group for $3 is null, the underscore _ should disappear.
Thanks for your help!
You can make the last . + digits part optional by enclosing it in an optional non-capturing group, and use a lambda as the replacement argument with Matcher.replaceAll, which accepts a Function<MatchResult, String> since Java 9:
String regex = "(\\d+)\\.(\\d+)(?:\\.(\\d+))?";
Pattern p = Pattern.compile(regex);
String s="3.2.1";
Matcher m = p.matcher(s);
String result = m.replaceAll(x ->
x.group(3) != null ? "OGNL_" + x.group(1) + "_" + x.group(2) + "_" + x.group(3) :
"OGNL_" + x.group(1) + "_" + x.group(2) );
System.out.println(result);
The (\d+)\.(\d+)(?:\.(\d+))? pattern (note that the literal dots are escaped) captures one or more digits into Group 1, matches a dot, captures one or more digits into Group 2, and then optionally matches a dot and digits (captured into Group 3). If Group 3 is not null, the _ and the Group 3 value are appended; otherwise that part is omitted when building the final replacement value.
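The lambda overload of Matcher.replaceAll needs Java 9 or later. On Java 8, a minimal sketch of the same conditional replacement uses the classic find/appendReplacement/appendTail loop with a StringBuffer. The class name OgnlTag and the toTag method are illustrative; Matcher.quoteReplacement is used so the built tag is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OgnlTag {
    // Same pattern as the answer: the third ".digits" part is optional
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    // Java 8-compatible loop: build each tag by hand and only append
    // "_" + group 3 when group 3 actually participated in the match
    static String toTag(String s) {
        Matcher m = VERSION.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String tag = "OGNL_" + m.group(1) + "_" + m.group(2);
            if (m.group(3) != null) {
                tag += "_" + m.group(3);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(tag));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTag("3.2.1")); // OGNL_3_2_1
        System.out.println(toTag("3.2"));   // OGNL_3_2
    }
}
```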

How to match java regexp between some '#'?

I am facing an issue with the String.replaceFirst method.
I have the following String:
String content = "select * from queries "
+ "where update_date >= to_timestamp('#date|Date debut|dd/MM/yyyy# 00:00:00','DD/MM/YYYY HH24:MI:SS') "
+ "and update_date <= to_timestamp('#date|Date fin|dd/MM/yyyy# 23:59:59','DD/MM/YYYY HH24:MI:SS')";
(The two expressions between '#' are dynamically defined).
I also have two dates:
String begin = "28/05/2018";
String end = "29/05/2018";
I would like to replace the first expression with begin, and the second with end.
I use:
content = content.replaceFirst("#(date)\\|(.*)\\|(.*)#", begin);
content = content.replaceFirst("#(date)\\|(.*)\\|(.*)#", end);
However, replaceFirst matches up to the last '#' of the entire String, and I get:
select * from queries where update_date >= to_timestamp('28/05/2018 23:59:59','DD/MM/YYYY HH24:MI:SS');
I understand the error, but I need help finding a solution.
Thanks a lot! Axel.
If you are looking for one generic regex for both replacements, as your question's code seems to want, this is how to make it work:
The .* in your regex is greedy by default, which means it tries to match as many characters as it can. This is why your first replacement consumes everything up to the last # in the string.
You can add the lazy quantifier ? to specify that you want to match as few characters as possible instead of as many.
Try:
#(date)\|(.*?)\|(.*?)#
(or escaped version for your code: "#(date)\\|(.*?)\\|(.*?)#")
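To see the greedy/lazy difference in isolation, here is a small sketch using a shortened, made-up version of the question's string (SAMPLE) and the two regexes discussed above; GreedyVsLazy and its constants are illustrative names.

```java
public class GreedyVsLazy {
    static final String GREEDY = "#(date)\\|(.*)\\|(.*)#";   // question's regex
    static final String LAZY = "#(date)\\|(.*?)\\|(.*?)#";   // lazy quantifiers
    // Shortened stand-in for the SQL string in the question
    static final String SAMPLE =
            "a('#date|Date debut|dd/MM/yyyy# 00:00:00') b('#date|Date fin|dd/MM/yyyy# 23:59:59')";

    public static void main(String[] args) {
        // Greedy: the match runs from the first '#' to the LAST '#', swallowing the second placeholder
        System.out.println(SAMPLE.replaceFirst(GREEDY, "28/05/2018")); // a('28/05/2018 23:59:59')
        // Lazy: the match stops at the first closing '#'
        System.out.println(SAMPLE.replaceFirst(LAZY, "28/05/2018"));
    }
}
```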
When reading your question, I was not sure whether the text between the #s (here I mean "date|Date debut|dd/MM/yyyy" and "date|Date fin|dd/MM/yyyy") was dynamically defined, or whether you were just explaining that you wanted to replace the fixed contents above with your dynamically defined dates.
So I will give you two answers (and both should work).
If the text is fixed:
#date\|Date debut\|dd/MM/yyyy# - for the first range
#date\|Date fin\|dd/MM/yyyy# - for the second range
If the text between # is not fixed:
#[^#]*#
The regex above means: find a range of characters that starts with a #, then contains any characters except # (that is what [^#] means) 0 or more times (the *), and ends with a #.
I hope it helps!
Try this:
String content = "select * from queries " +
"where update_date >= to_timestamp('#date|Date debut|dd/MM/yyyy# 00:00:00','DD/MM/YYYY HH24:MI:SS') " +
"and update_date <= to_timestamp('#date|Date fin|dd/MM/yyyy# 23:59:59','DD/MM/YYYY HH24:MI:SS') ;";
String begin = "28/05/2018";
String end = "29/05/2018";
content = content.replaceFirst( "#date\\|[^\\|]*\\|[^#]*#", begin );
content = content.replaceFirst( "#date\\|[^\\|]*\\|[^#]*#", end );
System.out.println( content );
Here we don't need capturing groups (); each part simply matches up to the next delimiter, | or #.
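If the number of placeholders can vary, a minimal sketch (assuming the values should be used in order of appearance) walks the matches with Matcher.find and appendReplacement, using the negated class [^#]* from the answers above so each match stops at the closing #. The class FillDatePlaceholders and its fill method are hypothetical; Matcher.quoteReplacement guards against $ or \ in the values, and placeholders beyond the supplied values are left untouched.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FillDatePlaceholders {
    // One "#date|...|...#" placeholder; [^#]* cannot run past the closing '#'
    private static final Pattern PLACEHOLDER = Pattern.compile("#date\\|[^#]*#");

    // Replaces placeholders in order with the given values (hypothetical helper)
    static String fill(String sql, List<String> values) {
        Matcher m = PLACEHOLDER.matcher(sql);
        Iterator<String> it = values.iterator();
        StringBuffer sb = new StringBuffer();
        while (m.find() && it.hasNext()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(it.next()));
        }
        m.appendTail(sb); // copies the rest, including any unfilled placeholders
        return sb.toString();
    }

    public static void main(String[] args) {
        String content = "select * from queries "
                + "where update_date >= to_timestamp('#date|Date debut|dd/MM/yyyy# 00:00:00','DD/MM/YYYY HH24:MI:SS') "
                + "and update_date <= to_timestamp('#date|Date fin|dd/MM/yyyy# 23:59:59','DD/MM/YYYY HH24:MI:SS')";
        System.out.println(fill(content, Arrays.asList("28/05/2018", "29/05/2018")));
    }
}
```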

How to trim/cut string in java by symbol?

I'm working on a project where my API returns a URL with an ID at the end, and I want to extract that ID to use in another function. Here's an example URL:
String advertiserUrl = "http://../../.../uuid/advertisers/4"; // <<< 4 is the ID I want to extract.
At the moment I'm using Java's substring() method, but this is not the best approach: IDs could become 3-digit numbers and I would only get part of them. Here's my current approach:
String id = advertiserUrl.substring(advertiserUrl.length()-1,advertiserUrl.length());
System.out.println(id); // 4
It works in this case, but if the id were e.g. "123" I would only get "3" from substring. So my question is: is there a way to cut/trim the string using the slashes "/"? Let's say there are 5 / in my current URL; could the string be cut off after the fifth slash? Any other sensible approach would be helpful too. Thanks.
P.S. The uuid in the URL may vary in length too.
You don't need to use regular expressions for this.
Use String#lastIndexOf along with substring instead:
String advertiserUrl = "http://../../.../uuid/advertisers/4";// <<< this is the ID i want to extract.
// this implies your URLs always end with "/[some value of undefined length]".
// Other formats might throw exception or yield unexpected results
System.out.println(advertiserUrl.substring(advertiserUrl.lastIndexOf("/") + 1));
Output
4
Update
To find the uuid value, you can use regular expressions:
String advertiserUrl = "http://111.111.11.111:1111/api/ppppp/2f5d1a31-878a-438b-a03b-e9f51076074a/advertisers/9";
// | preceded by "/"
// | | any non-"/" character, reluctantly quantified
// | | | followed by "/advertisers"
Pattern p = Pattern.compile("(?<=/)[^/]+?(?=/advertisers)");
Matcher m = p.matcher(advertiserUrl);
if (m.find()) {
System.out.println(m.group());
}
Output
2f5d1a31-878a-438b-a03b-e9f51076074a
You can either split the string on slashes and take the last position of the array returned, or use the lastIndexOf("/") to get the index of the last slash and then substring the rest of the string.
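The split-based option can be sketched like this; LastPathSegment and lastSegment are illustrative names. One behavioral difference worth knowing: String.split drops trailing empty strings, so a URL ending in "/" yields the segment before that slash, whereas the lastIndexOf approach returns an empty string. The sketch assumes the URL has at least one non-empty segment.

```java
public class LastPathSegment {
    // Split on "/" and take the last array element
    static String lastSegment(String url) {
        String[] parts = url.split("/"); // trailing empty strings are removed
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://../../.../uuid/advertisers/123")); // 123
        System.out.println(lastSegment("http://../../.../uuid/advertisers/4/"));  // 4
    }
}
```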
Use the lastIndexOf() method, which returns the index of the last occurrence of the specified character.
String id = advertiserUrl.substring(advertiserUrl.lastIndexOf('/') + 1, advertiserUrl.length());

Regular Expression for string in java

I am trying to write a regular expression for these kinds of strings:
05 IMA-POLICY-ID PIC X(15). 00020068
05 (AMENT)-GROUPCD PIC X(10).
I want to parse anything between 05 and the first tab.
The line might start with tabs or spaces, followed by a digit.
The initial number can be anything: 05, 10, 15, ...
So in the first line I need to parse IMA-POLICY-ID, and in the second line (AMENT)-GROUPCD.
This is the code I have written, and it's not finding the pattern. Where am I going wrong?
Pattern p1 = Pattern.compile("^[0-9]+\\s\\S+\t$");
Matcher m1 = p1.matcher(line);
System.out.println("m1 =="+m1.group());
Pattern p1 = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t");
will match anything from 05, 10 or 15 until the nearest \t.
Explanation:
\b # Start of number/word
(?:05|1[05]) # Match 05, 10 or 15
\b # End of number/word
[^\t]* # Match any number of characters except tab
\t # Match a tab
^\d+\s+([^\s]+)
This will match your requirement; the value you want is in group 1.
Demo here: http://regex101.com/r/rQ7fT3
Your regex is almost correct. Just remove the \t$ at the end of your regex and capture the \\S+ as a group.
Pattern p1 = Pattern.compile("^[0-9]+\\s(\\S+)");
Now print it as:
if (m1.find()) {
System.out.println(m1.group(1));
}
Your pattern expects the line to end after IMA-POLICY-ID etc, because of the $ at the end.
If there is no whitespace in the string you want to match (I assume there isn't, because of your use of \S+), I'd change the pattern to ^\d+\s+(\S+), which should be sufficient to match any number at the start of a line, followed by whitespace and then the group of non-whitespace characters you want to match (note that a tab is whitespace as well).
If you need to match until the first tab or the end of the input and include other whitespace, replace (\S+) with ([^\t]+).
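Combining these suggestions (allowing leading spaces or tabs, capturing into group 1, and stopping at the first tab), a minimal sketch over a multi-line input could look like this. FieldNames and fieldNames are illustrative names, and the sample input with a leading tab on the second line is an assumption based on the question's description. [ \t] is used instead of \s around the number so a match can never run across a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNames {
    // (?m): ^ matches at every line start; group 1 = text up to the first tab or line end
    private static final Pattern FIELD =
            Pattern.compile("(?m)^[ \\t]*\\d+[ \\t]+([^\\t\\r\\n]+)");

    static List<String> fieldNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {          // find(), not group() on an unmatched Matcher
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "05 IMA-POLICY-ID\tPIC X(15).\t00020068\n"
                + "\t10 (AMENT)-GROUPCD\tPIC X(10).";
        System.out.println(fieldNames(input)); // [IMA-POLICY-ID, (AMENT)-GROUPCD]
    }
}
```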
I can see two things that might prevent your Pattern from working.
Firstly, your input Strings contain multiple tab-separated values, so the $ "end-of-input" anchor at the end of your Pattern will fail to match the String.
Secondly, you want to find what's between 05 (etc.) and the first tab. Therefore you need to wrap your desired expression in parentheses (e.g. (\\S+)) and refer to it by its group number (in this case, group 1).
Here's an example:
String input = "05 IMA-POLICY-ID\tPIC X(15).\t00020068" +
"\r\n05 (AMENT)-GROUPCD\tPIC X(10).";
// | 0, 1, or 5 twice (refine here if needed)
// | | 1 whitespace
// | | | your queried expression (here I use a
// | | | reluctant dot search
// | | | | tab
// | | | | | anything after, reluctant
Pattern p = Pattern.compile("[015]{2}\\s(.+?)\t.+?");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println("Found: " + m.group(1));
}
Output
Found: IMA-POLICY-ID
Found: (AMENT)-GROUPCD
This is what I came up with, and it worked:
String re = "^\\s+\\d+\\s+([^\\s]+)";
Pattern p1 = Pattern.compile(re, Pattern.MULTILINE);
Matcher m1 = p1.matcher(line);
