Using Java, validate that a string is printed in a particular format - java

I have a response string like the following:
21.03.2019_15:06.26 [SELOGER]:: [Seloger value]-[PROGRESS]: marminto=true, france24=true,
Using Java, I have to validate that the above response is printed in the following format:
<date+time> [SELOGER]:: [Seloger value]-[<PROGRESS|STOP|START>]: <value1>=<true|false>, <value2>=<true|false>........
That is, first <date+time>, then [SELOGER]:: [Seloger value]-, then [PROGRESS or STOP or START]:, then the values marminto=true, france24=true,.....
How can I do this with a regex? Or is there any Java API available to check that a string is printed in a particular format?

Try this pattern:
\d{2}\.\d{2}\.\d{4}\_\d{2}:\d{2}\.\d{2} \[SELOGER\]:: \[Seloger value\]-\[(?:PROGRESS|STOP|START)\]: *(?:[a-zA-Z0-9]+=(?:true|false), ?)*
Explanation:
\d{2}\.\d{2}\.\d{4}\_\d{2}:\d{2}\.\d{2} matches the date and time in the specified format
(?:PROGRESS|STOP|START) - alternation, matches any one of PROGRESS, STOP or START
(?:[a-zA-Z0-9]+=(?:true|false), ?)* - matches zero or more value=true/value=false pairs, each followed by a comma and optionally a space
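A minimal sketch of how this pattern could be applied in Java, assuming the whole response must match from start to end. The class name ResponseFormatValidator and the method isValid are illustrative names, not part of any API:

```java
import java.util.regex.Pattern;

final class ResponseFormatValidator {
    // Same pattern as above, split across lines for readability.
    private static final Pattern RESPONSE_FORMAT = Pattern.compile(
            "\\d{2}\\.\\d{2}\\.\\d{4}_\\d{2}:\\d{2}\\.\\d{2} "
            + "\\[SELOGER\\]:: \\[Seloger value\\]-"
            + "\\[(?:PROGRESS|STOP|START)\\]: *"
            + "(?:[a-zA-Z0-9]+=(?:true|false), ?)*");

    // Returns true only if the entire response fits the expected format.
    static boolean isValid(String response) {
        return RESPONSE_FORMAT.matcher(response).matches();
    }
}
```

Note that matches() requires the entire string to fit the pattern, and that, as written, every name=value pair must end with a comma, so a response whose last pair has no trailing comma would be rejected.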

Related

how to substring and extract a dynamic content

I have a string that loads on the page depending on whether the search succeeds or fails. If the search succeeds, my output table will have a string like
Search Results - 31 Items Found (Debug Code: 4b50016efc3a1ad93502)
or, if the search fails, the output table will display one of these strings:
Search Results - No data found (Debug Code: 4b50016efc3a1ad93502)
Search Results - 0 Items Found (Debug Code: 4b50016efc3a1ad93502)
Search Results - An Exception Occurred (Debug Code: 4b50016efc3a1ad93502)
depending on the input conditions.
I want to extract the Debug Code value and pass it on to another scenario for further validation. I know I can use substring() to extract the Debug Code, but its position in the string is not constant; it varies based on the input conditions. However, the Debug Code will always be at the end.
How can I extract the Debug Code value (eg 4b50016efc3a1ad93502) for all scenarios?
You can use regex to capture and return the target code:
String debugCode = output.replaceAll(".*Debug Code: (\\w+).*|.*", "$1");
What's happening here?
The regex .*Debug Code: (\\w+).* matches the entire string and captures (with parentheses) the target text using \\w+, which means "one or more word characters" (letters, digits and underscore). The replacement string $1 means "group 1", the first captured group. Because the entire input is matched, this operation effectively replaces the whole input with the captured group, so it returns just the target text.
So what happens if the input doesn't have a debug code?
The extra |.* at the end of the regex means "or anything", so the regex will match the entire input even if it doesn't contain a debug code. In that case group 1 captures nothing, and Java substitutes an empty string for it, so the operation returns the blank string.
Examples:
String output1 = "Results - 0 Items Found (Debug Code: 4b50016efc3a1ad93502)";
String output2 = "something else";
String code1 = output1.replaceAll(".*Debug Code: (\\w+).*|.*", "$1"); // "4b50016efc3a1ad93502"
String code2 = output2.replaceAll(".*Debug Code: (\\w+).*|.*", "$1"); // ""
You don't have to use "|.*", but if you leave it out, the entire string will be returned unchanged when the input doesn't have the 'debug code' format.
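As an alternative to the replaceAll trick, the same extraction can be sketched with Pattern and Matcher, which makes the "no debug code" case explicit. This is only one possible approach; DebugCodeExtractor and extract are made-up names for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DebugCodeExtractor {
    // Group 1 captures the run of word characters after "Debug Code: ".
    private static final Pattern DEBUG_CODE = Pattern.compile("Debug Code: (\\w+)");

    // Returns the debug code if present, otherwise an empty Optional.
    static Optional<String> extract(String output) {
        Matcher m = DEBUG_CODE.matcher(output);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Because find() searches anywhere in the input, the position of the debug code in the string does not matter.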
A Java String is immutable. No matter how you get a String, its contents will not change.
For example:
String s = "the way we were";
String t = s.substring(4, 7); // t = "way"
s = "abcdefghijk";
s now refers to a new String, but t is unchanged.

Java String replacement with custom regex

I have a Java application which streams Twitter data.
Assuming that I have a String text = tweet.getText() variable.
In a text we can have one or more #MentionedUser. I'd like to delete not just the # but the username too.
How can I do this with replaceAll and without touching the rest of the string?
Thank you.
I would use (^|\s)#\w+($|\s), because your input can contain email-like text such as:
a #twitter username and a simple#email.com another #twitterUserName
So you can use:
String text = "a #twitter username and a simple#email.com another #twitterUserName";
text = text.replaceAll("(^|\\s)#\\w+($|\\s)", "$1$2");
// Output: "a  username and a simple#email.com another " (the whitespace on both sides of a removed tag is kept)
Details:
(^|\s) matches the start of the string (^) or a whitespace character (\s)
#\w+ matches # followed by one or more word characters; \w is equivalent to [A-Za-z0-9_]
($|\s) matches the end of the string ($) or a whitespace character (\s)
If you want to go further and follow the actual syntax of Twitter usernames, I read an article that gives some helpful information:
Your username cannot be longer than 15 characters. Your name can be longer (50 characters), but usernames are kept shorter for the sake of ease.
A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. ...
From these rules, you can use this regex as well:
(?i)(^|\s)#[a-z0-9_]{1,15}($|\s)
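A short sketch of how the length-limited regex might be used, assuming the same "$1$2" replacement as in the earlier snippet. MentionStripper and strip are illustrative names only:

```java
import java.util.regex.Pattern;

final class MentionStripper {
    // '#' followed by 1 to 15 letters, digits or underscores,
    // delimited by start/end of input or whitespace on each side.
    private static final Pattern MENTION =
            Pattern.compile("(?i)(^|\\s)#[a-z0-9_]{1,15}($|\\s)");

    // Removes each matching tag but keeps the delimiters around it.
    static String strip(String text) {
        return MENTION.matcher(text).replaceAll("$1$2");
    }
}
```

With this limit, a tag longer than 15 characters is left untouched, since the character after the fifteenth is neither whitespace nor the end of the input.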
Here is an alternative which does not produce doubled whitespaces and also does not capture emails:
String str = "a #twitter #user username and a john.doe#gmail.com another #twitterUserName #test jane#doe.com";
System.out.println(str.replaceAll("(?<=[^\\w])#[^#\\s]+(\\s+|$)", ""));
Output:
a username and a john.doe#gmail.com another jane#doe.com
Explanation of the parts of the regex (?<=[^\w])#[^#\s]+(\s+|$):
(?<=[^\w])# - Find a '#' character and look back to check that the character immediately before it is not a word character (uses a zero-width positive lookbehind).
[^#\s]+ - Match one or more characters that are neither '#' nor whitespace
(\s+|$) - Match one or more whitespace characters or the end of the input
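One caveat with the lookbehind version: (?<=[^\w]) needs an actual character before the '#', so a tag at the very start of the input is left in place. A minimal sketch of a variant, assuming a negative lookbehind (?<!\w) is acceptable instead; the names HashTagRemover and remove are illustrative and not from the answer:

```java
import java.util.regex.Pattern;

final class HashTagRemover {
    // (?<!\w) succeeds when there is no word character before '#',
    // which also covers a '#' at position 0 of the input.
    private static final Pattern TAG =
            Pattern.compile("(?<!\\w)#[^#\\s]+(\\s+|$)");

    // Removes each tag together with the whitespace that follows it.
    static String remove(String text) {
        return TAG.matcher(text).replaceAll("");
    }
}
```

For the sample string from the answer above, this variant produces the same output as the original regex; it differs only when the input begins with a tag.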

How to remove \u200B (Zero Length Whitespace Unicode Character) from String in Java?

My application uses Spring Integration to poll email from an Outlook mailbox.
Since it receives the String (email body) from an external system (Outlook), I have no control over it.
For Example,
String emailBodyStr= "rejected by sundar14-\u200B.";
Now I am trying to remove the unicode character \u200B from this String.
What I tried already.
Try#1:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "");
Try#2:
emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();
Try#3 (using Apache Commons):
StringEscapeUtils.unescapeJava(emailBodyStr);
Try#4:
StringEscapeUtils.unescapeJava(emailBodyStr).trim();
Nothing has worked so far.
When I tried to print this String using the code below:
logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);
In the Eclipse console, it does NOT print the Unicode character:
Comment BEFORE:rejected by sundar14-​.
But the same code prints the Unicode character in the Linux console, as below:
Comment BEFORE:rejected by sundar14-\u200B.
I read some examples where str.replace() is recommended, but please note that those examples use JavaScript and PHP, not Java.
Finally, I was able to remove the zero-width space character by using a Unicode regex:
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
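A small self-contained sketch of this approach. \p{Cf} is the Unicode "format" category, which includes U+200B; note that it also removes other invisible format characters such as the soft hyphen (U+00AD) and the byte order mark (U+FEFF), which is assumed to be acceptable here. The names FormatCharStripper and strip are illustrative:

```java
final class FormatCharStripper {
    // Removes every character in the Unicode "format" (Cf) category,
    // which includes the zero-width space U+200B.
    static String strip(String input) {
        return input.replaceAll("\\p{Cf}", "");
    }

    // Helper to check a single character's category.
    static boolean isFormatChar(char c) {
        return Character.getType(c) == Character.FORMAT;
    }
}
```

If only the zero-width space should be removed and other format characters kept, a narrower pattern would be needed.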
References for finding the category of Unicode characters:
The Java Character class lists all of these Unicode categories.
Website: http://www.fileformat.info/
Website: http://www.regular-expressions.info/ => Unicode Regular Expressions
Note 1: Because I received this string from an Outlook email body, none of the approaches listed in my question worked.
My application is receiving a String from an external system (Outlook), so I have no control over it.
Note 2: This SO answer helped me learn about Unicode regular expressions.

Using regex to find chars in a string and replace

When returning a string value from an incoming request in my network based app, I have a string like this
'post http://a.com\r\nHost: a.com\r\n'
The issue is that the host is always changing, so I need to replace it with my own defined host. To accomplish that I tried using a regex, but I am stuck trying to find the 'host:a.com' part of the string and replace it with a defined value.
I tried using this example www.javamex.com/tutorials/regular_expressions/search_replace_loop.shtml#.VUWvt541jqB changing the pattern compile to :([\\d]+) but it still remains unchanged.
My goal is to replace given chars in a string with a defined value and returning the new string with the defined value.
Any pointers?
EDIT:
Sample of a typical incoming request:
Post http://example.com\r\nHost: example.com\r\nConnection: close\r\n
Another incoming request might take this form:
GET http://example2.net\r\nContent-Length: 2\r\nConnection: close\r\nHost: example2.net\r\n
I want to rewrite them into these forms:
Post http://example.com\r\nHost: mycustomhostvalue.com\r\nConnection: close\r\n
GET http://example2.net\r\nContent-Length: 2\r\nConnection: close\r\nHost: mycustomhostvalue.com\r\n
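Use a regex to replace it, like this:
content = content.replaceAll("Host:\\s*(\\w)*\\.\\w*", "Host: newhost.com");
This will replace a host of the form name.tld after Host: with newhost.com.
Note: as per the comment by cfqueryparam, you may want to use a regex like this to cover .co.uk and such. Be aware that, written as a Java string literal, \\\\r\\\\n matches the literal backslash-r backslash-n text; if the request contains real CR LF characters, the lookahead should be (?=\\r\\n) instead: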
Host:\\s*.*?(?=\\\\r\\\\n)
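A hedged sketch of the replacement, assuming the request contains real CR LF characters (not the literal text \r\n) and that the Host header starts at the beginning of a line. HostHeaderRewriter and rewriteHost are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostHeaderRewriter {
    // (?i) ignores case, (?m) lets ^ match at the start of each line.
    // [^\r\n]* consumes the rest of the header line, whatever the host looks like.
    private static final Pattern HOST_LINE =
            Pattern.compile("(?im)^Host:[ \\t]*[^\\r\\n]*");

    // Replaces the value of every Host header line with newHost.
    static String rewriteHost(String request, String newHost) {
        // quoteReplacement guards against '$' or '\' in the new host value.
        String replacement = Matcher.quoteReplacement("Host: " + newHost);
        return HOST_LINE.matcher(request).replaceAll(replacement);
    }
}
```

Matching everything up to the end of the line avoids having to describe the host name itself, so hosts such as example.co.uk or names containing hyphens are covered as well.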

java regex matcher results != to notepad++ regex find result

I am trying to extract data out of a website access log as part of a java program. Every entry in the log has a url. I have successfully extracted the url out of each record.
Within the url, there is a parameter that I want to capture so that I can use it to query a database. Unfortunately, it doesn't seem that the web developers used any one standard to write the parameter's name.
The parameter is usually called "course_id", but I have also seen "courseId", "course%3DId", "course%253Did", etc. The format for the parameter name and value is usually course_id=_22222_1, where the number I want is between the "_" and "_1". (The value is always the same, even if the parameter name varies.)
So, my idea was to use the regex /^.*course_id[^_]*_(\d*)_1.*$/i to find and extract the number.
In java, my code is
java.util.regex.Pattern courseIDPattern = java.util.regex.Pattern.compile(".*course[^i]*id[^_]*_(\\d*)_1.*", java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher courseIDMatcher = courseIDPattern.matcher(_url);
_courseID = "";
if (courseIDMatcher.matches())
{
    _courseID = retrieveCourseID(courseIDMatcher.group(1));
    return;
}
This works for a lot of the records. However, for some records the course ID is not extracted, even though the parameter is in the url. One such example is the record:
/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2
However, I used notepad++ to do a regex replace on this (in fact, every) url using the regex above, and the url was successfully replaced by the course ID, implying that the regex is not incorrect.
Am I doing something wrong in the java code, or is the java matcher broken?
