Java - Capture optional field with regexp? - java

I have a regex that correctly captures values from a result string.
The regex looks like this:
intGetHatSaatRenk_v22=anyType{SiraNo=(.*?); HatKodu=(.*?) ; GunTipi=(.*?); Gidis=(.*?); ? };
The problem is that the source looks like this:
intGetHatSaatRenk_v22=anyType{SiraNo=54; HatKodu=502 ; GunTipi=C; Gidis=12:00; RenkGidis=0000FF; };
intGetHatSaatRenk_v22=anyType{SiraNo=55; HatKodu=502 ; GunTipi=C; Gidis=12:07; }; intGetHatSaatRenk_v22=anyType{SiraNo=56; HatKodu=502 ; GunTipi=C; Gidis=12:14; };
As you can see, there is an optional field named RenkGidis. How can I get the value of RenkGidis when it is present?
With the regex above, when RenkGidis exists, group(4) comes back as 12:00; RenkGidis=0000FF, but group(4) should be only 12:00.
I hope I have explained my problem clearly.

You might want to make the last group optional:
intGetHatSaatRenk_v22=anyType\{SiraNo=([^;\s]*);\s+HatKodu=([^;\s]*)\s*;\s+GunTipi=([^;\s]*);\s+Gidis=([^;\s]*);(?:\s+RenkGidis=([^;\s]*);)?
As a Java string:
"intGetHatSaatRenk_v22=anyType\\{SiraNo=([^;\\s]*);\\s+HatKodu=([^;\\s]*)\\s*;\\s+GunTipi=([^;\\s]*);\\s+Gidis=([^;\\s]*);(?:\\s+RenkGidis=([^;\\s]*);)?"
In the last group, (?: ... ) makes the outer group non-capturing, so it does not appear in the output. The ( ... ) inside it is captured as usual.
I also changed .*? to [^;\s]* (the negation of [;\s], i.e. any characters that are neither whitespace nor ;).
As Alan mentioned in the comments, if you don't want a null for the optional part when it is missing, you can make only the RenkGidis= prefix optional and wrap the value in an alternation with nothing, ([^;\s]*|), so that group 5 is an empty string instead:
intGetHatSaatRenk_v22=anyType\{SiraNo=([^;\s]*);\s+HatKodu=([^;\s]*)\s*;\s+GunTipi=([^;\s]*);\s+Gidis=([^;\s]*);(?:\s+RenkGidis=)?([^;\s]*|)
As a Java string:
"intGetHatSaatRenk_v22=anyType\\{SiraNo=([^;\\s]*);\\s+HatKodu=([^;\\s]*)\\s*;\\s+GunTipi=([^;\\s]*);\\s+Gidis=([^;\\s]*);(?:\\s+RenkGidis=)?([^;\\s]*|)"
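For illustration, here is a minimal sketch of how the first pattern (the one with the optional non-capturing group) might be used with Matcher.find(). The class name HatSaatParser and the parse helper are hypothetical names, not part of the question. When RenkGidis is absent, group(5) is null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HatSaatParser {
    // Pattern from the answer; group 5 (the RenkGidis value) is optional.
    private static final Pattern RECORD = Pattern.compile(
        "intGetHatSaatRenk_v22=anyType\\{SiraNo=([^;\\s]*);\\s+HatKodu=([^;\\s]*)\\s*;"
        + "\\s+GunTipi=([^;\\s]*);\\s+Gidis=([^;\\s]*);(?:\\s+RenkGidis=([^;\\s]*);)?");

    // Returns one String[5] per record; index 4 is null when RenkGidis is missing.
    static List<String[]> parse(String source) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(source);
        while (m.find()) {
            rows.add(new String[] {
                m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        String source =
            "intGetHatSaatRenk_v22=anyType{SiraNo=54; HatKodu=502 ; GunTipi=C; Gidis=12:00; RenkGidis=0000FF; }; "
            + "intGetHatSaatRenk_v22=anyType{SiraNo=55; HatKodu=502 ; GunTipi=C; Gidis=12:07; };";
        for (String[] row : parse(source)) {
            String color = row[4] == null ? "(none)" : row[4];
            System.out.println(row[0] + " " + row[3] + " " + color);
        }
    }
}
```

Checking group(5) for null is what distinguishes a record without RenkGidis from one that has it.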

The regex could look like this
intGetHatSaatRenk_v22=anyType\{SiraNo=(.*?); HatKodu=(.*?) ; GunTipi=(.*?); Gidis=(.*?);( RenkGidis=.*?;\s*|\s*)\};
Group 5 will then be either " RenkGidis=0000FF; " (including the surrounding whitespace) or " ". You can then use a second regex to get 0000FF.
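A rough sketch of that two-step idea, assuming the second regex is simply RenkGidis=(.*?); applied to group 5. The class name RenkGidisTwoStep and the firstColor helper are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenkGidisTwoStep {
    // First step: the regex from this answer; group 5 holds the optional part.
    private static final Pattern RECORD = Pattern.compile(
        "intGetHatSaatRenk_v22=anyType\\{SiraNo=(.*?); HatKodu=(.*?) ; GunTipi=(.*?); "
        + "Gidis=(.*?);( RenkGidis=.*?;\\s*|\\s*)\\};");

    // Second step (an assumption): pull the value out of group 5.
    private static final Pattern COLOR = Pattern.compile("RenkGidis=(.*?);");

    // Returns the RenkGidis value of the first record, or null if it has none.
    static String firstColor(String source) {
        Matcher record = RECORD.matcher(source);
        if (!record.find()) {
            return null;
        }
        Matcher color = COLOR.matcher(record.group(5));
        return color.find() ? color.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstColor(
            "intGetHatSaatRenk_v22=anyType{SiraNo=54; HatKodu=502 ; GunTipi=C; Gidis=12:00; RenkGidis=0000FF; };"));
    }
}
```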

Related

How to remove everything after specific character in string using Java

I have a string that looks like this:
analitics#gmail.com#5
And it represents my userId.
I have to pass that userId as a parameter to a function, but first I need to remove the number 5 after the second # and append a new number.
I started with something like this:
userService.getUser(user.userId.substring(0, userAfterMigration.userId.indexOf("#") + 1) + 3);
What is the best way of removing everything that comes after the second # character in string above using Java?
Here is a splitting option:
String input = "analitics#gmail.com#5";
String output = String.join("#", input.split("#")[0], input.split("#")[1]) + "#";
System.out.println(output); // analitics#gmail.com#
Assuming your input would only have two # symbols, you could use a regex replacement here:
String input = "analitics#gmail.com#5";
String output = input.replaceAll("#[^#]*$", "#");
System.out.println(output); // analitics#gmail.com#
You can capture in group 1 what you want to keep, and match what comes after it to be removed.
In the replacement use capture group 1 denoted by $1
^((?:[^#\s]+#){2}).+
^ Start of string
( Capture group 1
(?:[^#\s]+#){2} Repeat 2 times matching 1+ chars other than # or whitespace, and then match the #
) Close group 1
.+ Match 1 or more characters that you want to remove
String s = "analitics#gmail.com#5";
System.out.println(s.replaceAll("^((?:[^#\\s]+#){2}).+", "$1"));
Output
analitics#gmail.com#
If the string can also start with ##1 and you want to keep ## then you might also use:
^((?:[^#]*#){2}).+
The simplest way that would seem to work for you:
str = str.replaceAll("#[^.]*$", "");
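This matches (and replaces with blank to delete) a # followed by any non-dot chars up to the end. Because of the dot in the domain, only the last # can match, so the trailing # itself is removed as well.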
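If a regex is not strictly required, a plain indexOf approach may be easier to read. The following is only a sketch: the class name UserIdSuffix and the replaceSuffix helper are hypothetical, and it assumes the new suffix is an int, as in the question's "+ 3".

```java
public class UserIdSuffix {
    // Keeps everything up to and including the second '#', then appends newSuffix.
    // Returns the input unchanged if it contains fewer than two '#' characters.
    static String replaceSuffix(String userId, int newSuffix) {
        int first = userId.indexOf('#');
        int second = first < 0 ? -1 : userId.indexOf('#', first + 1);
        if (second < 0) {
            return userId;
        }
        return userId.substring(0, second + 1) + newSuffix;
    }

    public static void main(String[] args) {
        System.out.println(replaceSuffix("analitics#gmail.com#5", 3)); // analitics#gmail.com#3
    }
}
```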

How to match java regexp between some '#'?

I am facing an issue with the String.replaceFirst method.
I have the following String :
String content = "select * from queries "
+ "where update_date >= to_timestamp('#date|Date debut|dd/MM/yyyy# 00:00:00','DD/MM/YYYY HH24:MI:SS') "
+ "and update_date <= to_timestamp('#date|Date fin|dd/MM/yyyy# 23:59:59','DD/MM/YYYY HH24:MI:SS')";
(The two expressions between '#' are dynamically defined).
And I have 2 dates too :
String begin = "28/05/2018";
String end = "29/05/2018";
I would then like to replace the first expression with begin, and the second with end.
I use :
content = content.replaceFirst("#(date)\\|(.*)\\|(.*)#", begin);
content = content.replaceFirst("#(date)\\|(.*)\\|(.*)#", end);
However, replaceFirst matches up to the last '#' of the entire String, and I get:
select * from queries where update_date >= to_timestamp('28/05/2018 23:59:59','DD/MM/YYYY HH24:MI:SS');
I understand the error, but I need help finding a solution.
Thanks a lot! Axel.
If you are looking for a generic regex for both replacements, as your question's code seems to want, this is how to make it work:
The .* quantifier is greedy by default, which means it tries to match as many characters as it can. This is why your first replacement consumes everything up to the last #.
You can add the lazy modifier ? to specify that you want to match as few characters as possible instead of as many.
Try:
#(date)\|(.*?)\|(.*?)#
(or escaped version for your code: "#(date)\\|(.*?)\\|(.*?)#")
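To see the difference, here is a small sketch that runs both the greedy and the lazy pattern on a shortened, single-line version of the query. The class name GreedyVsLazy, the helper names, and the shortened CONTENT string are assumptions for illustration only.

```java
public class GreedyVsLazy {
    // Shortened stand-in for the SQL string in the question.
    static final String CONTENT =
        "from '#date|Date debut|dd/MM/yyyy# 00:00:00' to '#date|Date fin|dd/MM/yyyy# 23:59:59'";

    // Greedy: (.*) runs to the end and backtracks only as far as the LAST '#'.
    static String greedy(String s, String value) {
        return s.replaceFirst("#(date)\\|(.*)\\|(.*)#", value);
    }

    // Lazy: (.*?) stops at the first '|' and the first closing '#'.
    static String lazy(String s, String value) {
        return s.replaceFirst("#(date)\\|(.*?)\\|(.*?)#", value);
    }

    public static void main(String[] args) {
        System.out.println(greedy(CONTENT, "28/05/2018"));
        // from '28/05/2018 23:59:59'
        System.out.println(lazy(lazy(CONTENT, "28/05/2018"), "29/05/2018"));
        // from '28/05/2018 00:00:00' to '29/05/2018 23:59:59'
    }
}
```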
When reading your question, I was not sure whether the text between the #s (here I mean "date|Date debut|dd/MM/yyyy" and "date|Date fin|dd/MM/yyyy") was dynamically defined, or whether you were just explaining that you wanted to replace the fixed contents above with your dynamically defined dates.
So I will give you two answers (and both should work).
If the text is fixed:
#date\|Date debut\|dd/MM/yyyy# - for the first range
#date\|Date fin\|dd/MM/yyyy# - for the second range
If the text between the #s is not fixed:
#[^#]*#
The regex above means: find a range of chars that starts with a #, then contains any chars except a # (this is what [^#] means) zero or more times (the *), and ends with a #.
I hope it helps!
Try this:
String content = "select * from queries " +
"where update_date >= to_timestamp('#date|Date debut|dd/MM/yyyy# 00:00:00','DD/MM/YYYY HH24:MI:SS') " +
"and update_date <= to_timestamp('#date|Date fin|dd/MM/yyyy# 23:59:59','DD/MM/YYYY HH24:MI:SS') ;";
String begin = "28/05/2018";
String end = "29/05/2018";
content = content.replaceFirst( "#date\\|[^\\|]*\\|[^#]*#", begin );
content = content.replaceFirst( "#date\\|[^\\|]*\\|[^#]*#", end );
System.out.println( content );
Here we don't need capturing groups (); each negated character class simply matches up to the next delimiter, | or #.
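As a variation, if there can be more than two placeholders, a single pass with Matcher.appendReplacement may be more convenient than repeated replaceFirst calls. Note that this is a different technique from the answers above. The class name PlaceholderFiller and the fill helper are hypothetical; Matcher.quoteReplacement guards against $ or \ appearing in the values.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderFiller {
    // Any run of non-'#' characters enclosed in a pair of '#'.
    private static final Pattern PLACEHOLDER = Pattern.compile("#[^#]*#");

    // Replaces the n-th placeholder with the n-th value. Placeholders beyond
    // the supplied values are left untouched.
    static String fill(String template, List<String> values) {
        Iterator<String> it = values.iterator();
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (it.hasNext() && m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(it.next()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "from '#date|Date debut|dd/MM/yyyy# 00:00:00' to '#date|Date fin|dd/MM/yyyy# 23:59:59'";
        System.out.println(fill(content, Arrays.asList("28/05/2018", "29/05/2018")));
    }
}
```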

How to trim/cut string in java by symbol?

I'm working on a project where my API returns a URL with an ID at the end of it, and I want to extract that ID for use in another function. Here's an example URL:
String advertiserUrl = "http://../../.../uuid/advertisers/4"; // <<< this is the ID I want to extract
At the moment I'm using Java's substring() method, but this is not the best approach, as IDs could become 3-digit numbers and I would only get part of them. Here's my current approach:
String id = advertiserUrl.substring(advertiserUrl.length()-1,advertiserUrl.length());
System.out.println(id); // 4
It works in this case, but if the ID were e.g. "123", I would only get "3" after using substring. So my question is: is there a way to cut/trim a string on slashes "/"? Let's say there are 5 slashes in my current URL, so the string would get cut off after the fifth slash. Any other sensible approach would be helpful too. Thanks.
P.S. The uuid in the URL may vary in length too.
You don't need to use regular expressions for this.
Use String#lastIndexOf along with substring instead:
String advertiserUrl = "http://../../.../uuid/advertisers/4";// <<< this is the ID i want to extract.
// this implies your URLs always end with "/[some value of undefined length]".
// Other formats might throw exception or yield unexpected results
System.out.println(advertiserUrl.substring(advertiserUrl.lastIndexOf("/") + 1));
Output
4
Update
To find the uuid value, you can use regular expressions:
String advertiserUrl = "http://111.111.11.111:1111/api/ppppp/2f5d1a31-878a-438b-a03b-e9f51076074a/advertisers/9";
// (?<=/)            preceded by "/"
// [^/]+?            any non-"/" characters, reluctantly quantified
// (?=/advertisers)  followed by "/advertisers"
Pattern p = Pattern.compile("(?<=/)[^/]+?(?=/advertisers)");
Matcher m = p.matcher(advertiserUrl);
if (m.find()) {
System.out.println(m.group());
}
Output
2f5d1a31-878a-438b-a03b-e9f51076074a
You can either split the string on slashes and take the last position of the array returned, or use the lastIndexOf("/") to get the index of the last slash and then substring the rest of the string.
Use the lastIndexOf() method, which returns the index of the last occurrence of the specified character.
String id = advertiserUrl.substring(advertiserUrl.lastIndexOf('/') + 1, advertiserUrl.length());
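A small sketch comparing the two suggestions on a multi-digit ID. The class name LastSegment, the helper names, and the example.invalid URL are placeholders, not taken from the question.

```java
public class LastSegment {
    // Returns the text after the final '/', or the whole input if there is no '/'
    // (lastIndexOf returns -1 in that case, so substring starts at 0).
    static String viaLastIndexOf(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    // Splits on '/' and returns the last element of the resulting array.
    static String viaSplit(String url) {
        String[] parts = url.split("/");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String url = "http://example.invalid/api/some-uuid/advertisers/123";
        System.out.println(viaLastIndexOf(url)); // 123
        System.out.println(viaSplit(url));       // 123
    }
}
```

The two differ when the URL ends with a slash: split drops trailing empty strings, so viaSplit returns the segment before the slash, while viaLastIndexOf returns an empty string.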

Replace querystring attribute value in java using regex

I want to replace querystring value but it's creating some problems:
Problem 1: It removes the "&" symbol after replacing
String queryString = "?pid=1&name=Dell&cid=25";
String nQueryString=queryString.replaceAll("(?<=[?&;])pid=.*?($|[&;])","pid=23");
System.out.println(nQueryString);
output of above example
?pid=23name=Dell&cid=25
You can see it has removed the "&" after pid.
Problem 2: It does not work if I remove the "?" symbol from the queryString variable.
String queryString = "pid=1&name=Dell&cid=25";
String nQueryString=queryString.replaceAll("(?<=[?&;])pid=.*?($|[&;])","pid=23");
System.out.println(nQueryString);
output of above example
pid=1&name=Dell&cid=25
So the regex is not working in this case. Can anyone suggest a better regex that fully meets my requirements?
queryString.replaceAll("(?<=[?&;])pid=.*?(?=[&;])", "pid=23")
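The difference is that I'm using a positive lookahead, (?=[&;]), which is a zero-width assertion: it checks for the delimiter without consuming it, so the delimiter is not part of the text replaced by replaceAll(), just as the text matched by your original positive lookbehind is not replaced. Note that this version requires a & or ; after the value, so it will not match when pid is the last parameter.
Alternatively, we can match until a & or ; is found, without including it in the replacement (this also works when pid is the last parameter), i.e.: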
queryString.replaceAll("(?<=[?&;])pid=[^&;]*", "pid=23")
[^&;] : ^ negates the following: &;, so [^&;]* will match until a ; or & is encountered (or the end of the string is reached).
Yours does not work because ($|[&;]) is a capturing group that consumes the delimiter, so the delimiter becomes part of the match and is replaced. NB: a non-capturing group (?:$|[&;]) would also fail here, for the same reason.
As for your final note, you're using a positive lookbehind for ?, &, or ;, so once the leading ? is removed nothing precedes pid and the pattern no longer matches, which makes sense.
Use this regex instead:
String nQueryString = queryString.replaceAll("(?<=[?&;])pid=[^&]*", "pid=23");
//=> ?pid=23&name=Dell&cid=25
Here [^&]* is a negated character class that matches the query string value until a & or the end of the string is found, leaving the rest of the query string unaffected.
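To also cover Problem 2 (no leading "?"), one option is to drop the lookbehind and capture either the start of the string or the delimiter, then put it back with $1. This is only a sketch of that idea, not something the answers above propose; the class name QueryParam and the setPid helper are made up for the example.

```java
import java.util.regex.Matcher;

public class QueryParam {
    // Replaces the value of "pid" whether it is first, in the middle, or last,
    // and whether or not the query string starts with '?'.
    // Group 1 is either the start of the string (empty) or the delimiter before pid.
    static String setPid(String queryString, String newValue) {
        return queryString.replaceAll(
            "(^|[?&;])pid=[^&;]*",
            "$1pid=" + Matcher.quoteReplacement(newValue));
    }

    public static void main(String[] args) {
        System.out.println(setPid("?pid=1&name=Dell&cid=25", "23")); // ?pid=23&name=Dell&cid=25
        System.out.println(setPid("pid=1&name=Dell&cid=25", "23"));  // pid=23&name=Dell&cid=25
    }
}
```

Because the delimiter (or start of string) must come directly before pid, a parameter such as xpid is left alone.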

Java String Replace Regex

I am doing some string replace in SQL on the fly.
MySQLString = " a.account=b.account ";
MySQLString = " a.accountnum=b.accountnum ";
Now if I do this
MySQLString.replaceAll("account", "account_enc");
the result will be
a.account_enc=b.account_enc
(This is good)
But look at the 2nd result
a.account_encnum=b.account_encnum
(This is not good; it should be a.accountnum_enc=b.accountnum_enc)
Please advise how I can achieve what I want with Java String replace.
Many Thanks.
From your comment:
Is there anyway to tell in Regex only replace a.account=b.account or a.accountnum=b.accountnum. I do not want accountname to be replace with _enc
If I understand correctly you want to add _enc part only to account or accountnum. To do this you can use
MySQLString = MySQLString.replaceAll("\\baccount(num)?\\b", "$0_enc");
(num)? means that num is optional, so the regex will accept account or accountnum.
\\b at the start means that there can be no letters, digits or "_" before account, so it won't accept (affect) something like myaccount or my_account.
\\b at the end prevents other letters, digits or "_" from following account or accountnum.
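A quick sketch to check the word-boundary behaviour on the cases mentioned in the question and comment. The class name AccountSuffix and the addEnc helper are hypothetical.

```java
public class AccountSuffix {
    // Appends "_enc" to the whole words "account" and "accountnum" only.
    // $0 in the replacement refers to the entire match.
    static String addEnc(String sql) {
        return sql.replaceAll("\\baccount(num)?\\b", "$0_enc");
    }

    public static void main(String[] args) {
        System.out.println(addEnc(" a.account=b.account "));         // a.account_enc=b.account_enc
        System.out.println(addEnc(" a.accountnum=b.accountnum "));   // a.accountnum_enc=b.accountnum_enc
        System.out.println(addEnc(" a.accountname=b.accountname ")); // unchanged
    }
}
```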
It's hard to extrapolate from so few examples, but maybe what you want is:
MySQLString = MySQLString.replaceAll("account\\w*", "$0_enc");
which will append _enc to any sequence of letters, digits, and underscores that starts with account.
try
String s = " a.accountnum=b.accountnum ".replaceAll("(account[^ =]*)", "$1_enc");
It means: replace any sequence of characters that starts with the word "account" and continues with characters other than ' ' or '=' with the sequence found + "_enc".
$1 is a reference to group 1 in the regex; group 1 is the expression in parentheses, (account[^ =]*), i.e. our sequence.
See http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html for details
