How to extract the session id from an RTSP message's content? - java

I have a string like this:
RTSP/1.0 200 OK
CSeq: 3
Server: Ants Rtsp Server/1.0
Date: 21 Oct 2016 15:55:30 GMT
Session: 980603187; timeout=60
Transport: RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=F006B800
I want to extract the session number (980603187).
Could someone please provide some help?

Simply use a regular expression with a capturing group, then read the value of that group as follows:
String content ="RTSP/1.0 200 OK\n" +
"CSeq: 3\n" +
"Server: Ants Rtsp Server/1.0\n" +
"Date: 21 Oct 2016 15:55:30 GMT\n" +
"Session: 980603187; timeout=60\n" +
"Transport: RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=F006B800\n";
Pattern pattern = Pattern.compile("Session: ([a-zA-Z0-9$\\-_.+]+)");
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
980603187
Explanation:
Session: ([a-zA-Z0-9$\\-_.+]+)
Session: matches the characters Session: literally (case sensitive)
([a-zA-Z0-9$\\-_.+]+): Capturing group that matches with several consecutive ALPHA, DIGIT or SAFE characters (at least one) (cf RFC 2326 chapter 3.4 Session Identifiers)

Use a regex! Given String str = .., extract the number you need with a regex that captures anything between Session: and ;:
Session: (.+);
Feel free to restrict the group to word characters with \\w+ or to digits with \\d+. Mind the double escaping in Java. The first group, m.group(1), is your result:
Pattern p = Pattern.compile("Session: (.+);");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
Outputs 980603187. Check out the Regex101 for the explanation.
In some cases the ; timeout part is optional, so you need to amend the regex:
Session: (.+?)[\n;]
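A minimal sketch of the amended pattern in Java, for responses with and without the timeout part. The class and method names are made up for illustration, and adding \r to the character class is my own assumption so that CR/LF line endings do not end up in the captured group:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SessionIdLazyDemo {
    // The lazy group stops at the first ';', '\r' or '\n' after the id.
    // '\r' is an assumption added here to cope with CR/LF line endings.
    private static final Pattern SESSION = Pattern.compile("Session: (.+?)[\\r\\n;]");

    // Returns the session id, or null when no Session header is found.
    static String sessionId(String response) {
        Matcher m = SESSION.matcher(response);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(sessionId("CSeq: 3\r\nSession: 980603187; timeout=60\r\n"));
        System.out.println(sessionId("CSeq: 3\r\nSession: 980603187\r\n"));
    }
}
```

Note that this still expects a terminator after the id, so a Session header that is the very last thing in the string, with no line break after it, would not match.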

Once you have each header you can look it up in RFC 2326, which specifies the RTSP protocol.
First of all, you should split your string into lines. The lines end with CR/LF according to the specification. The first line indicates the response; the others should be header fields.
The definition is:
Session = "Session" ":" session-id [ ";" "timeout" "=" delta-seconds ]
where session-id is specified as:
session-id = 1*( ALPHA | DIGIT | safe )
which means you should not confuse it with a number. The definition of safe is
safe = "\$" | "-" | "_" | "." | "+"
and alpha means all upper- and lowercase letters. This means it is possible to put in a base 64 url encoded binary session-id, by the way.
OK, now it becomes a question of looking for the session ID. You step through all lines (except the first one) and then look for the line that matches:
^Session[ \t]*:[ \t]*([a-zA-Z0-9\$\-_.+]+).*$
This will match only valid session headers with valid session identifiers. Note that the standard is vague about whitespace, so I skip over space and tab characters before and after the colon ':'. The session identifier is then in group 1 of the regular expression.
You can of course easily extend this by including the timeout in the regular expression, once you need it.
Note that you will have to double escape the backslash characters before using the regular expression in Java. It's also possible to use the POSIX character classes defined in the Pattern class to make the regular expression more readable.
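The steps above can be sketched in Java roughly like this. The class and method names are hypothetical, and accepting a bare LF as a line ending is an extra tolerance I added, not something the specification asks for:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtspSessionHeader {
    // session-id = 1*( ALPHA | DIGIT | safe ), with optional space/tab around the colon.
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern SESSION_LINE =
            Pattern.compile("^Session[ \\t]*:[ \\t]*([a-zA-Z0-9$\\-_.+]+).*$");

    static Optional<String> findSessionId(String response) {
        // Split into lines; CR/LF per the spec, bare LF tolerated as an assumption.
        String[] lines = response.split("\r\n|\n");
        // Line 0 is the status line, so start at the first header line.
        for (int i = 1; i < lines.length; i++) {
            Matcher m = SESSION_LINE.matcher(lines[i]);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String response = "RTSP/1.0 200 OK\r\n"
                + "CSeq: 3\r\n"
                + "Session: 980603187; timeout=60\r\n"
                + "Transport: RTP/AVP/TCP;unicast;interleaved=0-1;ssrc=F006B800\r\n";
        System.out.println(findSessionId(response).orElse("no session header"));
    }
}
```

Returning an Optional keeps the "no Session header" case explicit instead of handing back null.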

If you have Apache Commons Lang among your dependencies, you can do it in one line:
StringUtils.substringBetween(string, "Session: ", ";");

Related

String replacement when regex reverse group is null in java

I want to convert a software version number into a github tag name by regular expression.
For example, the version of ognl is usually 3.2.1. What I want is the tag name OGNL_3_2_1
So we can use the String::replaceAll(String regex, String replacement) method like this:
"3.2.1".replaceAll("(\\d+).(\\d+).(\\d+)", "OGNL_$1_$2_$3")
And we can get the tag name OGNL_3_2_1 easily.
But when it comes to 3.2, I want the regex to still work, so I changed it to (\d+).(\d+)(?:.(\d+))?.
When I run the code again, I get OGNL_3_2_ rather than OGNL_3_2. The trailing underscore _ is not what I want; it is caused by the null group for $3.
So how can I write a suitable replacement to solve this case?
When the group for $3 is null, the underline _ should disappear
Thanks for your help !!!
You can make the last . + digits part optional by enclosing it in an optional non-capturing group, and pass a lambda as the replacement argument to Matcher.replaceAll (available since Java 9):
String regex = "(\\d+)\\.(\\d+)(?:\\.(\\d+))?";
Pattern p = Pattern.compile(regex);
String s="3.2.1";
Matcher m = p.matcher(s);
String result = m.replaceAll(x ->
x.group(3) != null ? "OGNL_" + x.group(1) + "_" + x.group(2) + "_" + x.group(3) :
"OGNL_" + x.group(1) + "_" + x.group(2) );
System.out.println(result);
See the Java demo.
The (\d+)\.(\d+)(?:\.(\d+))? pattern (note that literal . are escaped) matches and captures into Group 1 any one or more digits, then matches a dot, then captures one or more digits into Group 2 and then optionally matches a dot and digits (captured into Group 3). If Group 3 is not null, add the _ and Group 3 value, else, omit this part when building the final replacement value.
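If you are stuck on Java 8, where Matcher.replaceAll has no lambda overload, the same null check on Group 3 can be done with appendReplacement. This is only a sketch of that variation, with a made-up class and method name, and it exercises both the three-part and the two-part version:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionTag {
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    static String toTag(String text) {
        Matcher m = VERSION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = "OGNL_" + m.group(1) + "_" + m.group(2);
            // Group 3 is null when the optional ".digits" part did not match.
            if (m.group(3) != null) {
                tag += "_" + m.group(3);
            }
            // quoteReplacement keeps the tag from being read as a $n reference.
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTag("3.2.1")); // OGNL_3_2_1
        System.out.println(toTag("3.2"));   // OGNL_3_2
    }
}
```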

Want to extract values from text file using regex

"00.00.00.00" 00.00.00.00 - - [07/Jun/2016:00:00:00 -0700] "Hey /acd?bg=1 HTTP/1.1" 200 2 "-" "00.00.00.00:0000" "Java/1.8.0_66" - - 2000
I have records like the one above and want to extract the values of all the fields. The fields are separated by spaces. Please help.
I am using as below:
String p;
Pattern pattern = Pattern.compile(p);
Matcher matcher = pattern.matcher(str);
if (matcher.find()){
System.out.println(matcher.group(1));
}
But I am not getting the correct output. I am new to regex.
The desired output is:
00.00.00.00
00.00.00.00
-
-
07/Jun/2016:00:00:01 -0700
Hey /acd?bg=1 HTTP/1.1
200
I've got a pattern that does what you want, but it isn't pretty:
^"((?:\d\d?\d?\.){3}\d\d?\d?)" ((?:\d\d?\d?\.){3}\d\d?\d?) (-) (-) (\[\d\d\/\w+\/\d{4}(?::\d\d){3} -\d{4}\]) "(.*?)" (\d{3})
To break it down a bit (because it's nasty):
^ makes it start at the beginning of the string.
((?:\d\d?\d?\.){3}\d\d?\d?) will match and capture the first IP address, with each element being composed of between 1 and 3 digits. The same pattern is then used to match the second IP address as well.
(-) will capture the hyphens - not sure why you want them, but they're in your desired output.
(\[\d\d\/\w+\/\d{4}(?::\d\d){3} -\d{4}\]) captures the timestamp, including its square brackets; move the parentheses inside the \[ and \] if you want the timestamp without them.
"(.*?)" will match and capture the text string.
Finally, (\d{3}) will capture the HTTP status code.
Taken together, this pattern will match the stuff you want from the string you provided.
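For what it's worth, here is roughly how that pattern could be plugged into Java, with every backslash and double quote escaped for the string literal. The class and method names are just placeholders for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogLine {
    // Same pattern as above, escaped for a Java string literal.
    private static final Pattern LOG = Pattern.compile(
            "^\"((?:\\d\\d?\\d?\\.){3}\\d\\d?\\d?)\" ((?:\\d\\d?\\d?\\.){3}\\d\\d?\\d?) (-) (-) "
            + "(\\[\\d\\d\\/\\w+\\/\\d{4}(?::\\d\\d){3} -\\d{4}\\]) \"(.*?)\" (\\d{3})");

    // Returns the captured groups in order, or an empty array if the line does not match.
    static String[] fields(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "\"00.00.00.00\" 00.00.00.00 - - [07/Jun/2016:00:00:00 -0700] "
                + "\"Hey /acd?bg=1 HTTP/1.1\" 200 2 \"-\" \"00.00.00.00:0000\" \"Java/1.8.0_66\" - - 2000";
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

Keep in mind that the timestamp group still carries its square brackets here, because the brackets sit inside the capturing parentheses.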

Making a Regex More Dynamic

I posted this question a couple weeks ago pertaining to extracting a capture group using regex in Java, Extracting Capture Group Using Regex, and I received a working answer. I also posted this question a couple weeks ago pertaining to character replacement in Java using regex, Replace Character in Matching Regex, and received an even better answer that was more dynamic than the one I got from my first post. I'll quickly illustrate by example. I have a string like this that I want to extract the "ID" from:
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
And in this case I want the output to just be:
a0 12 b5
But it turns out the ID could be any number of octets (just has to be 1 or more octets), and I want my regex to be able to basically account for an ID of 1 octet then any number of subsequent octets (from 0 to however many). The person I received an answer from in my Replace Character in Matching Regex post did this for a similar but different use case of mine, but I'm having trouble porting this "more dynamic" regex over to the first use case.
Currently, I have ...
Pattern p = Pattern.compile("(?s)?:Here is the id\n\n\\?([a-z0-9]{2})|(?<!^)\\G:?([a-z0-9]{2})|.*?(?=Here is the id\n\n\\?)|.+");
Matcher m = p.matcher(certSerialNum);
String idNum = m.group(1);
System.out.println(idNum);
But it's throwing an exception. In addition, I would actually like it to use all known adjacent text in the pattern including "Here is the id\n\n\?" and "\n&Edit Properties...". What corrections do I need to get this working?
Seems like you want something like this,
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
Pattern regex = Pattern.compile("Here is the id\\n+\\?([a-z0-9]{2}(?:\\s[a-z0-9]{2})*)(?=\\n&Edit Properties)");
Matcher matcher = regex.matcher(idInfo);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
a0 12 b5

Regular Expression for string in java

I am trying to write a regular expression for these kinds of strings:
05 IMA-POLICY-ID PIC X(15). 00020068
05 (AMENT)-GROUPCD PIC X(10).
I want to parse anything between the 05 and the first tab.
The line might start with tabs or spaces, followed by the digits.
The initial number can be anything: 05, 10, 15.
So in the first line I need to parse IMA-POLICY-ID and in the second line (AMENT)-GROUPCD.
This is the code I have written, and it's not finding the pattern. Where am I going wrong?
Pattern p1 = Pattern.compile("^[0-9]+\\s\\S+\t$");
Matcher m1 = p1.matcher(line);
System.out.println("m1 =="+m1.group());
Pattern p1 = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t");
will match anything from 05, 10 or 15 until the nearest \t.
Explanation:
\b # Start of number/word
(?:05|1[05]) # Match 05, 10 or 15
\b # End of number/word
[^\t]* # Match any number of characters except tab
\t # Match a tab
^\d+\s+([^\s]+)
This will match your requirement.
demo here : http://regex101.com/r/rQ7fT3
Your regex is almost correct. Just remove the \t$ at the end of your regex and capture the \\S+ as a group.
Pattern p1 = Pattern.compile("^[0-9]+\\s(\\S+)");
Now print it as:
Matcher m = p1.matcher(line);
if (m.find()) {
System.out.println(m.group(1));
}
Your pattern expects the line to end after IMA-POLICY-ID etc, because of the $ at the end.
If there is no whitespace in the string you want to match (I assume there isn't because of your use of \S+), I'd change the pattern to ^\d+\s+(\S+), which should be sufficient to match any number at the start of a line, followed by whitespace and then the group of non-whitespace characters you want to match (note that a tab is whitespace as well).
If you need to match until the first tab or the end of the input and include other whitespace, replace (\S+) with ([^\t]+).
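A small sketch of the ([^\t]+) variant follows. The leading \s* is my own addition, since the question says lines may start with tabs or spaces, and the class and method names are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CopybookField {
    // Optional leading whitespace, the level number, whitespace,
    // then everything up to the first tab (or the end of the line).
    private static final Pattern FIELD = Pattern.compile("^\\s*\\d+\\s+([^\\t]+)");

    // Returns the field name, or null if the line does not start with a number.
    static String fieldName(String line) {
        Matcher m = FIELD.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fieldName("05 IMA-POLICY-ID\tPIC X(15).\t00020068"));
        System.out.println(fieldName("\t  10 (AMENT)-GROUPCD\tPIC X(10)."));
    }
}
```

Because the group is [^\t]+ rather than \S+, a name containing spaces would be captured whole, up to the tab.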
I can see two things that might prevent your Pattern from working.
Firstly, your input Strings contain multiple tab-separated values, so the $ "end-of-input" anchor at the end of your Pattern will fail to match the String.
Secondly, you want to find what's in between 05 (etc.) and the first tab. Therefore you need to wrap your desired expression in parentheses (e.g. (\\S+)) and refer to it by its group number (in this case, group 1).
Here's an example:
String input = "05 IMA-POLICY-ID\tPIC X(15).\t00020068" +
"\r\n05 (AMENT)-GROUPCD\tPIC X(10).";
// [015]{2}  0, 1, or 5 twice (refine here if needed)
// \\s       1 whitespace
// (.+?)     your queried expression (here I use a reluctant dot search)
// \t        tab
// .+?       anything after, reluctant
Pattern p = Pattern.compile("[015]{2}\\s(.+?)\t.+?");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println("Found: " + m.group(1));
}
Output
Found: IMA-POLICY-ID
Found: (AMENT)-GROUPCD
This is what I came up with, and it worked:
String re = "^\\s+\\d+\\s+([^\\s]+)";
Pattern p1 = Pattern.compile(re, Pattern.MULTILINE);
Matcher m1 = p1.matcher(line);

regular expressions in java

How to validate an expression for a single dot character?
For example if I have an expression "trjb....fsf..ib.bi." then it should return only dots at index 15 and 18. If I use Pattern p=Pattern.compile("(\\.)+"); I get
4 ....
11 ..
15 .
18 .
This seems to do the trick:
String input = "trjb....fsf..ib.bi.";
Pattern pattern = Pattern.compile("[^\\.]\\.([^\\.]|$)");
Matcher matcher = pattern.matcher(" " + input);
while (matcher.find()) {
System.out.println(matcher.start());
}
The extra space in front of the input does two things:
Allows for a . to be detected as the first character of the input string
Offsets the matcher.start() by one to account for the character in front of the matched .
Result is:
15
18
Add a blank at the beginning and at the end of the string and then use the pattern
"[^\\.]\\.[^\\.]"
You need to use negative lookarounds.
Something like Pattern.compile("(?<!\\.)\\.(?!\\.)");
Try
Pattern.compile("(?<=[^\\.])\\.(?=[^\\.])")
or even better...
Pattern.compile("(?<![\\.])\\.(?![\\.])")
This uses negative lookaround.
(?<![\\.]) => not preceded by a .
\\. => a .
(?![\\.]) => not followed by a .
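To tie it together, here is a short sketch that runs the negative lookaround pattern over the string from the question and collects the indexes. The class and method names are placeholders of mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleDots {
    // A dot that is neither preceded nor followed by another dot.
    private static final Pattern SINGLE_DOT = Pattern.compile("(?<!\\.)\\.(?!\\.)");

    static List<Integer> indexes(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = SINGLE_DOT.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(indexes("trjb....fsf..ib.bi.")); // [15, 18]
    }
}
```

Since lookarounds do not consume characters, no padding is needed, and single dots that sit close together, as in a.b.c, are each still found.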
