Need regex to format file in php - java

I have a java file that I want to post online. I am using php to format the file.
Does anyone know the regex to turn the comments blue?
INPUT:
/*****
*This is the part
*I want to turn blue
*for my class
*******************/
class MyClass{
String s;
}
Thanks.

Naiive version:
$formatted = preg_replace('|(/\*.*?\*/)|m', '<span class="blue">$1</span>', $java_code_here);
... not tested, YMMV, etc...

In general, you won't be able to parse specific parts of a Java file using only regular expressions - Java is not a regular language. If your file has additional structure (such as "it always begins with a comment followed by a newline, followed by a class definition"), you can generate a regular expression for such a case. For instance, you'd match /\*+(.*?)\*+/$, where . is assumed to match multiple lines, and $ matches the end of a line.
In general, to make a regex work, you first define what patterns you want to find (rigorously, but in spoken language), and then translate that to standard regular expression notation.
Good luck.

A regex that can parse simple quotes should be able to find comments in C/C++ style languages.
I assume Java is of that type.
This is a Perl faq sample by someone else, although I added the part about // style comments (with or without line continuation) and reformated.
It basically does a global search and replace. Data is replaced verbatim if non a comment, otherwise replace the comment with your color formatting tags.
You should be able to adapt this to php, and it is expanded for clarity (maybe too much clarity though).
s{
## Comments, group 1:
(
/\* ## Start of /* ... */ comment
[^*]*\*+ ## Non-* followed by 1-or-more *'s
(?:
[^/*][^*]*\*+
)* ## 0-or-more things which don't start with /
## but do end with '*'
/ ## End of /* ... */ comment
|
// ## Start of // ... comment
(?:
[^\\] ## Any Non-Continuation character ^\
| ## OR
\\\n? ## Any Continuation character followed by 0-1 newline \n
)*? ## To be done 0-many times, stopping at the first end of comment
\n ## End of // comment
)
| ## OR, various things which aren't comments, group 2:
(
" (?: \\. | [^"\\] )* " ## Double quoted text
|
' (?: \\. | [^'\\] )* ' ## Single quoted text
|
. ## Any other char
[^/"'\\]* ## Chars which doesn't start a comment, string, escape
) ## or continuation (escape + newline)
}
{defined $2 ? $2 : "<some color>$1</some color>"}gxse;

Related

How to validate number.number.number;number this format in Reg ex

My string should contains in this format number.number.number;number ex:15.2.63;4
How to validate this format in Reg ex. I have done in normal way used contains, spilt etc. But lines of code increased. May I know how do it in reg ex?
You can go with this:
^\d+[.]\d+[.]\d+;\d+$
With a liveDemo
Many ways to do it, here using PCRE:
laptop:~$ echo "12.34.56;7" | perl -ne 'print $_ if (/^\d+\.\d+\.\d+;\d+$/);'
12.34.56;7
laptop:~$ echo "12a.34b.56c;7" | perl -ne 'print $_ if (/\d+\.\d+\.\d+;\d+/);'
laptop:~$ echo "12.34.56;7" | perl -ne 'print $_ if (/^(\d+\.){2}\d+;\d+$/);'
12.34.56;7
If you know the exact length of each part, you can also fix it.
For example \d{2}. will match 11. but won't match 123.
The above answer group dot into bracket ([.]) this is useless for a single character.
But if you delimiter may vary, you can use, for example [.;-] to allow . ; and - as a delimiter.
Try this..
^[1-9][\.\d]*[1-9][\.\d]*[1-9][\.\d]*[1-9][\;\d]?$
Hope this helps...

Java regex: Multiple delimiters preceded by "-"

Remove suffix from network name of hosts at specific delimiters preceded by "-"hyphen, so that if there are other combinations with "-", it should be taken as part of the network name.
Few examples:
abcd-new --> abcd-new ## Stays same ##
efgh-nic --> efgh ## delimiter is '-nic' ##
mnop-ilo-a --> mnop-ilo ## delimiter is '-a' ##
xyz-a01 --> xyz-a01 ## Stays same ##
vm-1-ad-nic --> vm-1-ad ## delimiter is '-nic' ##
vm-lab-nic1 --> vm-lab-nic1 ## Stays same ##
The delimiting characters are 'nic', 'a' only. Other combinations of "-" & characters should be kept intact.
How do I achieve the above using java regex ?
If possible, please suggest a single liner...
You can do this with a String#replaceAll method:
str = str.replaceAll("-(nic|a)\\b", "");
Regex -(nic|adm) matches a hyphen followed by nic or adm.
\\b is for word boundary to make sure we don't match unwanted text like abc.
You can add more suffixes in this group that you want to be removed.

Velocity parser crashes when parsing java code template

When trying to use a java source code as template for Velocity, it crashes at this line of the template:
/* #see panama.form.Validator#validate(java.lang.Object) */
with this Exception:
Exception in thread "main" org.apache.velocity.exception.ParseErrorException: Lexical error, Encountered: "l" (108), after : "." at *unset*[line 23, column 53]
at org.apache.velocity.runtime.RuntimeInstance.evaluate(RuntimeInstance.java:1301)
at org.apache.velocity.runtime.RuntimeInstance.evaluate(RuntimeInstance.java:1265)
at org.apache.velocity.app.VelocityEngine.evaluate(VelocityEngine.java:199)
Apparently it takes the #validate for a macro and crashes when it tries to parse the arguments for the macro. Is there anything one could do about this?
I'm using Velocity 1.7.
Edit
I know I could escape the # characters in the template files, but there are quite a number of them which also might change now and then, so I would prefer a way that would not require manual changes on the files.
First option
Try this solution from here: Escaping VTL Directives
VTL directives can be escaped with the backslash character ("\") in a manner similar to valid VTL references.
## #include( "a.txt" ) renders as <contents of a.txt>
#include( "a.txt" )
## \#include( "a.txt" ) renders as #include( "a.txt" )
\#include( "a.txt" )
## \\#include ( "a.txt" ) renders as \<contents of a.txt>
\\#include ( "a.txt" )
Second option
You have this tool [EscapeTool][2].
Tool for working with escaping in Velocity templates.
It provides methods to escape outputs for Java, JavaScript, HTML, XML and SQL. Also provides methods to render VTL characters that otherwise needs escaping.
Third option:
You may also try this workaround, I didn't use it but it should work:
You can at the beginning read your template as a String and then pre-parse it. For example replace all # with \#, or add to the beginning of file
#set( $H = '#' )
$H$H
see this answer: How to escape a # in velocity And then from that pre-parsed String create Template by using this answer: How to use String as Velocity Template?

Complex Java Regular Expression with Nested Groupings

I am trying to get a regular expression written that will capture what I'm trying to match in Java, but can't seem to get it.
This is my latest attempt:
Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
This is what I want to match:
hello
hello/world
hello/big/world
hello/big/world/
This what I don't want matched:
/
/hello
hello//world
hello/big//world
I'd appreciate any insight into what I am doing wrong :)
Try this regex:
Pattern.compile( "^[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?$" );
Doesn't your regex require question mark at the end?
I always write unit tests for my regexes so I can fiddle with them until they pass.
// your exact regex:
final Pattern regex = Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
// your exact examples:
final String[]
good = { "hello", "hello/world", "hello/big/world", "hello/big/world/" },
bad = { "/", "/hello", "hello//world", "hello/big//world"};
for (String goodOne : good) System.out.println(regex.matcher(goodOne).matches());
for (String badOne : bad) System.out.println(!regex.matcher(badOne).matches());
prints a solid column of true values.
Put another way: your regex is perfectly fine just as it is.
It looks like what you're trying to 'Capture' is being overwritten each quantified itteration. Just change parenthesis arangement.
# "[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?"
[A-Za-z0-9]+
( # (1 start)
(?: / [A-Za-z0-9]+ )*
) # (1 end)
/?
Or, with no capture's at all -
# "[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*/?"
[A-Za-z0-9]+
(?: / [A-Za-z0-9]+ )*
/?

Youtube complete Java Regex

I need to parse several pages to get all of their Youtube IDs.
I found many regular expressions on the web, but : the Java ones are not complete (they either give me garbage in addition to the IDs, or they miss some IDs).
The one that I found that seems to be complete is hosted here. But it is written in JavaScript and PHP. Unfortunately I couldn't translate them into JAVA.
Can somebody help me rewrite this PHP regex or the following JavaScript one in Java?
'~
https?:// # Required scheme. Either http or https.
(?:[0-9A-Z-]+\.)? # Optional subdomain.
(?: # Group host alternatives.
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com followed by
\S* # Allow anything up to VIDEO_ID,
[^\w\-\s] # but char before ID is non-ID char.
) # End host alternatives.
([\w\-]{11}) # $1: VIDEO_ID is exactly 11 chars.
(?=[^\w\-]|$) # Assert next char is non-ID or EOS.
(?! # Assert URL is not pre-linked.
[?=&+%\w]* # Allow URL (query) remainder.
(?: # Group pre-linked alternatives.
[\'"][^<>]*> # Either inside a start tag,
| </a> # or inside <a> element text contents.
) # End recognized pre-linked alts.
) # End negative lookahead assertion.
[?=&+%\w]* # Consume any URL (query) remainder.
~ix'
/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*/ig;
First of all you need to insert and extra backslash \ foreach backslash in the old regex, else java thinks you escapes some other special characters in the string, which you are not doing.
https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
Next when you compile your pattern you need to add the CASE_INSENSITIVE flag. Here's an example:
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(link);
while(matcher.find()) {
System.out.println(matcher.group());
}
Marcus above has a good regex, but i found that it doesn't recognize youtube links that have "www" but not "http(s)" in them
for example www.youtube....
i have an update:
^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
it's the same except for the start

Categories

Resources