Regex to extract data located within square brackets [...] - java

I need a regex that extracts the date string [07/Mar/2014:22:12:28 -0800] from the following line:
64.242.88.10 - - [07/Mar/2014:22:12:28 -0800] "GET /twiki/bin/attach/TWiki/WebSearch HTTP/1.1" 401 12846

If your string has no other content in square brackets besides this, then:
\[.*?]
Details
\[ - opening bracket (escaped because [ is a meta-character)
.*? - non-greedy match-all
] - closing bracket (doesn't need escaping)
When adapting this for use in a Java program, you'll need to escape the backslash too:
Pattern.compile("\\[.*?]");
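To get only the text between the brackets, one option is to add a capturing group and read group(1). This is a minimal sketch, not the answerer's code; the class name BracketExtract and the helper firstBracketed are made up for illustration, and the log line is the one from the question with plain ASCII quotes and hyphens.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExtract {
    // Group 1 captures the text between the first "[" and the next "]".
    private static final Pattern BRACKETED = Pattern.compile("\\[(.*?)]");

    // Returns the first bracketed value, or null if the line has none.
    static String firstBracketed(String line) {
        Matcher m = BRACKETED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "64.242.88.10 - - [07/Mar/2014:22:12:28 -0800] "
                + "\"GET /twiki/bin/attach/TWiki/WebSearch HTTP/1.1\" 401 12846";
        System.out.println(firstBracketed(line)); // 07/Mar/2014:22:12:28 -0800
    }
}
```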

Try this:
\[[0-9]{1,2}\/[a-zA-Z]+\/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} -[0-9]{4}]
Shorter, greedy version. It matches from the first [ to the last ] on the line, so it is only safe when the line contains a single bracketed section:
\[.*]
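The difference between the greedy, non-greedy and strict patterns shows up once a line contains more than one bracketed section. A small sketch, assuming a hypothetical line with a second pair of brackets (the "[id=42]" part is invented for the demonstration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns the first substring matched by the regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with two bracketed sections.
        String line = "[07/Mar/2014:22:12:28 -0800] request [id=42]";

        // Greedy: runs to the LAST "]" on the line.
        System.out.println(firstMatch("\\[.*]", line));
        // Non-greedy: stops at the FIRST "]".
        System.out.println(firstMatch("\\[.*?]", line));
        // Strict date shape; "/" needs no escaping in a Java regex.
        System.out.println(firstMatch(
                "\\[[0-9]{1,2}/[a-zA-Z]+/[0-9]{4}:[0-9]{2}:[0-9]{2}:[0-9]{2} -[0-9]{4}]",
                line));
    }
}
```

Note that the strict pattern, as written in the answer, only accepts a negative UTC offset such as -0800.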


expression between curly bracket with inside expression

I have this example expression:
firstName =:'Mon';lastName =:'Arthur';:or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}};lastName =:'aa';:and{length >:'33';:or{color =:'red'};width <:'2'};date <:'2012';:!{source =:'dictionary,locale'}
and the regex must match:
:or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}}
:and{length >:'33';:or{color =:'red'};width <:'2'}
:!{source =:'dictionary, locale'}
So the regex must match any expression that starts with ':[anycharacters]{' and ends with '}', and the expression between the curly braces may itself contain inner expressions of the same form.
I tried to write something:
https://regex101.com/r/gM3dR9/13
and the result is:
:or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'} - OK
:and{length >:'33';:or{color =:'red'} -MISSING ;width <:'2'}
:!{source =:'dictionary, locale'} -OK
I tried to work out a solution that fits your example and the requirements you wrote, but I'm not sure I understood them entirely:
(?:;:)(\S+(?:{.*?}(?=[^}]*$|;[^}]*;:)))
This uses a positive lookahead to ensure that the last closing brace is caught correctly (it has to be followed either by the end of the string or by another ;:).
If your match can be at the beginning of the string, and therefore not preceded by ;:, you could change the part (?:;:) to (?:^|;:).
Here is the link for Regex101: https://regex101.com/r/dV8uI4/1
Try this regex:
(:or{.*?\};{1,})|(:and{.*\};)|(:!{.*?\};{0,})
I can't guarantee it works for more complex cases, but it produces the output you mentioned, except for an extra ';':
"firstName =:'Mon';lastName =:'Arthur';:or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}};lastName =:'aa';:and{length >:'33';:or{color =:'red'};:width <:'2'};date <:'2012';:!{source =:'dictionary,locale'}".match(/(:or{.*?\};{1,})|(:and{.*\};)|(:!{.*?\};{0,})/g)
Output
[":or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}};", ":and{length >:'33';:or{color =:'red'};:width <:'2'};", ":!{source =:'dictionary,locale'}"]
Formatted output
[
":or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}};",
":and{length >:'33';:or{color =:'red'};:width <:'2'};",
":!{source =:'dictionary,locale'}"
]
Tested with the Java RegEx Tester.
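Java's regex engine has no recursion construct, so arbitrarily deep nesting is hard to cover with a single pattern. An alternative that is not taken from either answer is to scan the string and count brace depth. The sketch below assumes that braces never appear inside the quoted values and that every top-level group is introduced by a ':' directly before its operator name; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelGroups {
    // Collects every top-level ":name{...}" group, including nested content.
    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        int depth = 0;   // current brace nesting depth
        int start = -1;  // index of the ':' that opened the current group
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    // The group starts at the nearest ':' before this brace.
                    start = input.lastIndexOf(':', i);
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0 && start >= 0) {
                    groups.add(input.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String expr = "firstName =:'Mon';lastName =:'Arthur';"
                + ":or{size >:'20';lastName ^:'H';:and{company |:'lon';:or{company |:'we'}}};"
                + "lastName =:'aa';:and{length >:'33';:or{color =:'red'};width <:'2'};"
                + "date <:'2012';:!{source =:'dictionary,locale'}";
        extract(expr).forEach(System.out::println);
    }
}
```

On the expression from the question this yields the three groups listed there, without the trailing ';'.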

Complex Java Regular Expression with Nested Groupings

I am trying to get a regular expression written that will capture what I'm trying to match in Java, but can't seem to get it.
This is my latest attempt:
Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
This is what I want to match:
hello
hello/world
hello/big/world
hello/big/world/
This is what I don't want matched:
/
/hello
hello//world
hello/big//world
I'd appreciate any insight into what I am doing wrong :)
Try this regex:
Pattern.compile( "^[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?$" );
Doesn't your regex require a question mark at the end?
I always write unit tests for my regexes so I can fiddle with them until they pass.
// your exact regex:
final Pattern regex = Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
// your exact examples:
final String[]
    good = { "hello", "hello/world", "hello/big/world", "hello/big/world/" },
    bad = { "/", "/hello", "hello//world", "hello/big//world" };
for (String goodOne : good) System.out.println(regex.matcher(goodOne).matches());
for (String badOne : bad) System.out.println(!regex.matcher(badOne).matches());
prints a solid column of true values.
Put another way: your regex is perfectly fine just as it is.
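The reason the unanchored pattern passes those tests is that Matcher.matches() requires the whole input to match, whereas Matcher.find() accepts a match anywhere in the input. That is also why the ^...$ version only makes a difference with find(). A short sketch with made-up class and method names:

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // The question's pattern, without anchors.
    static final Pattern PATH = Pattern.compile("[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?");

    // True only if the ENTIRE input matches the pattern.
    static boolean wholeMatch(String s) {
        return PATH.matcher(s).matches();
    }

    // True if ANY substring of the input matches the pattern.
    static boolean partialMatch(String s) {
        return PATH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("hello/big/world/")); // true
        System.out.println(wholeMatch("hello//world"));     // false
        System.out.println(partialMatch("hello//world"));   // true, finds "hello/"
    }
}
```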
It looks like what you're trying to capture is being overwritten on each quantified iteration. Just change the arrangement of the parentheses.
# "[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?"
[A-Za-z0-9]+
( # (1 start)
(?: / [A-Za-z0-9]+ )*
) # (1 end)
/?
Or, with no captures at all:
# "[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*/?"
[A-Za-z0-9]+
(?: / [A-Za-z0-9]+ )*
/?
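The overwriting effect can be seen by reading group(1) from both variants. This is only a sketch; the class name and the helper groupOne are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Capturing group inside the quantifier: only the last repetition is kept.
    static final Pattern LAST_ONLY =
            Pattern.compile("[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?");
    // Capturing group around the quantified non-capturing group: keeps them all.
    static final Pattern ALL_SEGMENTS =
            Pattern.compile("[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?");

    // Returns group 1 if the whole input matches, otherwise null.
    static String groupOne(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupOne(LAST_ONLY, "hello/big/world"));    // /world
        System.out.println(groupOne(ALL_SEGMENTS, "hello/big/world")); // /big/world
    }
}
```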

Use java regex to find all strings that start with '#' and end with ' ' , and not include ' ' and '#'

I need to get all non-empty strings that start with # and end with ' ' (space) in the String below:
String s = "#test1 #test2 #test3 #test4 ## #test5";
I hope to get the strings "test1", "test2", "test3", "test4" and "test5".
How can I do this with a Java regex? Thanks a lot!
You can use the following regex:
#\w+
\w is equivalent to [a-zA-Z0-9_] (Java's default, without Unicode flags)
\w+ matches one or more characters from [a-zA-Z0-9_]
The Java regex (?<=#)[^# ]+(?= ) should do the trick. According to Regex Planet's Java regex page that regex matches test1, test2, test3 and test4. (#test5 does not end with a space, so test5 is not matched.)
If you're OK with matching the leading # as well, you can get away with the simpler Java regex #[^# ]+, which also matches #test5 because it does not require a trailing space.
Finally I solved it with the code below:
Pattern pattern = Pattern.compile("#\\p{L}+");
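One caveat about the pattern above: \p{L} matches letters only, so on the sample string #\p{L}+ stops before the digit and yields "#test" rather than "#test1". A sketch that returns the names without the leading # by using a capturing group, assuming tags are delimited by whitespace or another # (class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTags {
    // "#" followed by one or more characters that are neither "#" nor whitespace.
    private static final Pattern TAG = Pattern.compile("#([^#\\s]+)");

    // Returns every tag name, without the leading "#".
    static List<String> extract(String s) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(s);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String s = "#test1 #test2 #test3 #test4 ## #test5";
        System.out.println(extract(s)); // [test1, test2, test3, test4, test5]
    }
}
```

Unlike the lookahead version, this does not require a trailing space, so test5 is included and the empty "##" is skipped.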

Youtube complete Java Regex

I need to parse several pages to get all of their Youtube IDs.
I found many regular expressions on the web, but the Java ones are incomplete (they either give me garbage in addition to the IDs, or they miss some IDs).
The one I found that seems to be complete is hosted here, but it is written in JavaScript and PHP. Unfortunately I couldn't translate it into Java.
Can somebody help me rewrite this PHP regex, or the JavaScript one that follows it, in Java?
'~
https?:// # Required scheme. Either http or https.
(?:[0-9A-Z-]+\.)? # Optional subdomain.
(?: # Group host alternatives.
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com followed by
\S* # Allow anything up to VIDEO_ID,
[^\w\-\s] # but char before ID is non-ID char.
) # End host alternatives.
([\w\-]{11}) # $1: VIDEO_ID is exactly 11 chars.
(?=[^\w\-]|$) # Assert next char is non-ID or EOS.
(?! # Assert URL is not pre-linked.
[?=&+%\w]* # Allow URL (query) remainder.
(?: # Group pre-linked alternatives.
[\'"][^<>]*> # Either inside a start tag,
| </a> # or inside <a> element text contents.
) # End recognized pre-linked alts.
) # End negative lookahead assertion.
[?=&+%\w]* # Consume any URL (query) remainder.
~ix'
/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*/ig;
First of all, you need to insert an extra backslash \ for each backslash in the old regex; otherwise Java treats the backslash as the start of an escape sequence in the string literal, which is not what you want.
https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
Next when you compile your pattern you need to add the CASE_INSENSITIVE flag. Here's an example:
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(link);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Marcus above has a good regex, but I found that it doesn't recognize YouTube links that have "www" but no "http(s)" in them,
for example www.youtube....
Here is an update:
^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
It's the same except for the start.
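Since the question asks for the IDs rather than the whole URLs, group(1) is the part to read. The sketch below makes the scheme optional as in the update, but leaves out the leading ^ so that find() can locate links anywhere in a page; that omission is my assumption, not part of either answer. The sample text and both video IDs are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YoutubeIds {
    // Same structure as the regex above; "/" needs no escaping in Java.
    private static final Pattern LINK = Pattern.compile(
            "(?:https?://)?(?:[0-9A-Z-]+\\.)?"
            + "(?:youtu\\.be/|youtube\\.com\\S*[^\\w\\-\\s])"
            + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
            + "(?![?=&+%\\w]*(?:['\"][^<>]*>|</a>))[?=&+%\\w]*",
            Pattern.CASE_INSENSITIVE);

    // Returns the 11-character video ID of every link found in the text.
    static List<String> extractIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical page text with made-up IDs.
        String text = "see www.youtube.com/watch?v=abcdefghijk "
                + "and https://youtu.be/ABCDEFGHIJ1 done";
        System.out.println(extractIds(text)); // [abcdefghijk, ABCDEFGHIJ1]
    }
}
```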

Need regex to format file in php

I have a java file that I want to post online. I am using php to format the file.
Does anyone know the regex to turn the comments blue?
INPUT:
/*****
*This is the part
*I want to turn blue
*for my class
*******************/
class MyClass{
String s;
}
Thanks.
Naive version (the s modifier lets . match newlines, so comments spanning several lines are matched):
$formatted = preg_replace('|(/\*.*?\*/)|s', '<span class="blue">$1</span>', $java_code_here);
... not tested, YMMV, etc...
In general, you won't be able to parse specific parts of a Java file using only regular expressions - Java is not a regular language. If your file has additional structure (such as "it always begins with a comment followed by a newline, followed by a class definition"), you can generate a regular expression for such a case. For instance, you'd match /\*+(.*?)\*+/$, where . is assumed to match multiple lines, and $ matches the end of a line.
In general, to make a regex work, you first define what patterns you want to find (rigorously, but in spoken language), and then translate that to standard regular expression notation.
Good luck.
A regex that can parse simple quotes should be able to find comments in C/C++ style languages.
I assume Java is of that type.
This is a Perl FAQ sample by someone else, although I added the part about // style comments (with or without line continuation) and reformatted it.
It basically does a global search and replace. Text that is not a comment is copied through verbatim; a comment is replaced with itself wrapped in your color formatting tags.
You should be able to adapt this to PHP; it is expanded for clarity (maybe too much clarity, though).
s{
## Comments, group 1:
(
/\* ## Start of /* ... */ comment
[^*]*\*+ ## Non-* followed by 1-or-more *'s
(?:
[^/*][^*]*\*+
)* ## 0-or-more things which don't start with /
## but do end with '*'
/ ## End of /* ... */ comment
|
// ## Start of // ... comment
(?:
[^\\] ## Any Non-Continuation character ^\
| ## OR
\\\n? ## Any Continuation character followed by 0-1 newline \n
)*? ## To be done 0-many times, stopping at the first end of comment
\n ## End of // comment
)
| ## OR, various things which aren't comments, group 2:
(
" (?: \\. | [^"\\] )* " ## Double quoted text
|
' (?: \\. | [^'\\] )* ' ## Single quoted text
|
. ## Any other char
[^/"'\\]* ## Chars which don't start a comment, string, escape
) ## or continuation (escape + newline)
}
{defined $2 ? $2 : "<some color>$1</some color>"}gxse;
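The same alternation idea (match string literals so that comment markers inside them are skipped, and wrap only real comments) can be sketched in Java as well. This is a simplified adaptation, not a port of the Perl above: it ignores line continuations in // comments, does not HTML-escape the source, and reuses the "blue" class from the PHP answer. The class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentHighlighter {
    private static final Pattern TOKEN = Pattern.compile(
            // group 1: /* ... */ block comment or // line comment
            "(/\\*.*?\\*/|//[^\\n]*)"
            // group 2: double- or single-quoted literal, honoring backslash escapes
            + "|(\"(?:\\\\.|[^\"\\\\])*\"|'(?:\\\\.|[^'\\\\])*')",
            Pattern.DOTALL);

    // Wraps comments in a span; string and char literals pass through unchanged.
    static String highlight(String source) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) != null
                    ? "<span class=\"blue\">" + m.group(1) + "</span>"
                    : m.group(2);
            // quoteReplacement keeps "$" and "\" in the source literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String code = "/* a */ String s = \"/* not */\"; // tail";
        System.out.println(highlight(code));
    }
}
```

Because the string literal is consumed as group 2, the "/* not */" inside it is left alone, while the leading block comment and the trailing line comment are wrapped.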
