Regex: match JUnit assertEquals? - java

I'm migrating quite a few tests from JUnit to Spock:
// before
assertEquals("John Doe", userDTO.getFirstName());
// after
userDTO.getFirstName() == "John Doe"
To speed things up, I want to replace (most of) JUnit's assert expressions with Spock's via a regular expression, supervised and file-by-file. assertFalse, assertTrue and assertNotNull are easy, but assertEquals is not, since it takes two parameters.
My current attempt is: assertEquals\(([^;]+),([^;]+)\);. But this doesn't work well because it can't tell whether a , separates the two assertEquals parameters or belongs to a nested call. How do I solve this?
My test cases are:
assertEquals(az, bz);
assertEquals(az(), bz);
assertEquals(az, bz());
assertEquals(az(), bz));
assertEquals(az, bz(cz, dz));
assertEquals(bz(cz, dz), az);
PS: Nested method calls are out of scope here.
Online: https://www.debuggex.com/r/aESv3YmNWsakNgI6/1

In general, matching arbitrarily nested structures with regexes is not something you should be doing. If we, however, limit your needs to the test cases you've listed here (removing the 4th, which is an error), then we can do something. You can also construct regexes for a variety of additional limited cases without making the thing too difficult.
I'll illustrate with Python, but the same patterns will probably work in your IDE.
>>> import re
>>> import pprint
>>> t = ["assertEquals(az, bz);", \
... "assertEquals(az(), bz);", \
... "assertEquals(az, bz());", \
... "assertEquals(az, bz(dz));", \
... "assertEquals(bz(dz), az);", \
... "assertEquals(az, bz(cz, dz));", \
... "assertEquals(bz(cz, dz), az);"]
>>> var = r'([a-z]+(\(([a-z]+(\s*,\s*[a-z]+)*)?\))?)'
>>> res = [ \
... re.sub( \
... r'assertEquals\(\s*' + var + r'\s*,\s*' + var + r'\s*\)', \
... r'\1 == \5', s \
... ) \
... for s in t]
>>> pprint.pprint(res)
['az == bz;',
'az() == bz;',
'az == bz();',
'az == bz(dz);',
'bz(dz) == az;',
'az == bz(cz, dz);',
'bz(cz, dz) == az;']
The important part is var:
( # group the entire var before the comma
[a-z]+ # acceptable variable name
( # followed by an optional group
\( # containing a pair of matching parens
( # which contain, optionally
[a-z]+ # an acceptable variable name
( # followed by any number (0 or more)
\s*,\s*[a-z]+ # of commas followed by acceptable variable names
)*
)?
\)
)?
)
To get this to work on your actual code, you'll have to change [a-z] to something more reasonable, like [a-zA-Z0-9_].
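As a rough sketch of how the pattern might be widened for real code, the Python below swaps [a-z]+ for an operand that is either a double-quoted string literal or a dotted name, and writes the actual value first, as in the Spock example from the question. The names OPERAND, VAR and to_spock are made up for illustration, and the pattern still handles only one level of call with simple arguments.

```python
import re

# Assumption: an operand is a double-quoted string literal or a dotted name.
OPERAND = r'(?:"[^"\\]*(?:\\.[^"\\]*)*"|[\w.]+)'
# An operand, optionally followed by one call with simple operands as arguments.
VAR = rf'({OPERAND}(?:\((?:{OPERAND}(?:\s*,\s*{OPERAND})*)?\))?)'
ASSERT_EQUALS = re.compile(r'assertEquals\(\s*' + VAR + r'\s*,\s*' + VAR + r'\s*\);')


def to_spock(line: str) -> str:
    """Rewrite assertEquals(expected, actual); as `actual == expected`."""
    return ASSERT_EQUALS.sub(r'\2 == \1', line)


if __name__ == '__main__':
    print(to_spock('assertEquals("John Doe", userDTO.getFirstName());'))
    print(to_spock('assertEquals(az, bz(cz, dz));'))
    # The malformed fourth test case does not match and is left untouched.
    print(to_spock('assertEquals(az(), bz));'))
```

Lines that fall outside these assumptions (nested calls, arithmetic, a third message argument) are simply not rewritten, which suits a supervised, file-by-file migration.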

Related

Why can't Nextflow handle this awk phrase?

Background:
Using a csv as input, I want to combine the first two columns into a new one (separated by an underscore) and add that new column to the end of a new csv.
Input:
column1,column2,column3
1,2,3
a,b,c
Desired output:
column1,column2,column3,column1_column2
1,2,3,1_2
a,b,c,a_b
The below awk phrase works from the command line:
awk 'BEGIN{FS=OFS=","} {print $0, (NR>1 ? $1"_"$2 : "column1_column2")}' file.csv > full_template.csv
However, when placed within a nextflow script (below) it gives an error.
#!/usr/bin/env nextflow
params.input = '/file/location/here/file.csv'
process unique {
input:
path input from params.input
output:
path 'full_template.csv' into template
"""
awk 'BEGIN{FS=OFS=","} {print \$0, (NR>1 ? \$1"_"\$2 : "combined_header")}' $input > full_template.csv
"""
}
Here is the error:
N E X T F L O W ~ version 21.10.0
Launching `file.nf` [awesome_pike] - revision: 1b63d4b438
class groovyx.gpars.dataflow.expression.DataflowInvocationExpression cannot be cast to class java.nio.file.FileSystem (groovyx.gpars.dataflow.expression.DataflowInvocationExpression is in unnamed module of loader 'app'; java.nio.file.FileSystem is in module java.base of loader 'bootstrap')
I'm not sure what is causing this, and any help would be appreciated.
Thanks!
Edit:
Yes it seems this was not the source of the error (sorry!). I'm trying to use splitCsv on the resulting csv and this appears to be what's causing the error. Like so:
Channel
.fromPath(template)
.splitCsv(header:true, sep:',')
.map{ row -> tuple(row.column1, file(row.column2), file(row.column3)) }
.set { split }
I suspect my issue is that .fromPath can't be used on a channel, but I can't figure out how else to do it.
Edit 2:
So this was a stupid mistake. I simply needed to apply .splitCsv directly on the input line where I invoked the channel. Hardly elegant, but it appears to be working great now.
process blah {
input:
what_you_want from template.splitCsv(header:true, sep:',').map{ row -> tuple(row.column1, file(row.column2), file(row.column3)) }
I was unable to reproduce the error you're seeing with your example code and Nextflow version. In fact, I get the expected output. This shouldn't be much of a surprise though, because you have correctly escaped the special dollar variables in your AWK command. The cause of the error is likely somewhere else in your code.
If escaping the special characters gets tedious, another way is to use a shell block instead:
It is an alternative to the Script definition with an important
difference, it uses the exclamation mark ! character as the variable
placeholder for Nextflow variables in place of the usual dollar
character.
The example becomes:
params.input_csv = '/file/location/here/file.csv'
input_csv = file( params.input_csv)
process unique {
input:
path input_csv
output:
path 'full_template.csv' into template
shell:
'''
awk 'BEGIN { FS=OFS="," } { print $0, (NR>1 ? $1 "_" $2 : "combined_header") }' \\
"!{input_csv}" > "full_template.csv"
'''
}
template.view { it.text }
Results:
$ nextflow run file.nf
N E X T F L O W ~ version 20.10.0
Launching `file.nf` [wise_hamilton] - revision: b71ff1eb03
executor > local (1)
[76/ddbb87] process > unique [100%] 1 of 1 ✔
column1,column2,column3,combined_header
1,2,3,1_2
a,b,c,a_b

What is the simplest way to parse logical expressions from a string in java? [duplicate]

I'm looking for some advice on my school project. I am supposed to create a program that takes a logical expression and outputs a truth table for it. Actually creating the truth table is not difficult for me, and I've already written the methods in Java for it. I would like to know if there are any classes in Java that I could use to parse the expression for me and put it into a stack. If not, I'm looking for help on parsing the expression. It's the parentheses that get me whenever I try to think it through. Also, if this would be easier in another language, I would be open to doing it in that. Perl is probably my next best language.
Some examples
(P && Q) -> R
(P || Q || R) && ((P -> R) -> Q)
If you're allowed to use a parser generator tool like ANTLR, here's how you could get started. The grammar for a simple logic-language could look like this:
grammar Logic;
parse
: expression EOF
;
expression
: implication
;
implication
: or ('->' or)*
;
or
: and ('||' and)*
;
and
: not ('&&' not)*
;
not
: '~' atom
| atom
;
atom
: ID
| '(' expression ')'
;
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
However, if you parse input like (P || Q || R) && ((P -> R) -> Q) with a parser generated from the grammar above, the parse tree will contain the parentheses (something you're not interested in after parsing the expression), and the operators will not be the root of each subtree, which doesn't make your life any easier if you're interested in evaluating the expression.
You'll need to tell ANTLR to omit certain tokens from the AST (this can be done by placing a ! after the token/rule) and make certain tokens/rules the root of their (sub) tree (this can be done by placing a ^ after it). Finally, you need to indicate in the options section of your grammar that you want a proper AST to be created instead of a simple parse tree.
So, the grammar above would look like this:
// save it in a file called Logic.g
grammar Logic;
options {
output=AST;
}
// parser/production rules start with a lower case letter
parse
: expression EOF! // omit the EOF token
;
expression
: implication
;
implication
: or ('->'^ or)* // make `->` the root
;
or
: and ('||'^ and)* // make `||` the root
;
and
: not ('&&'^ not)* // make `&&` the root
;
not
: '~'^ atom // make `~` the root
| atom
;
atom
: ID
| '('! expression ')'! // omit both `(` and `)`
;
// lexer/terminal rules start with an upper case letter
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
You can test the parser with the following class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
// the expression
String src = "(P || Q || R) && ((P -> R) -> Q)";
// create a lexer & parser
LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
LogicParser parser = new LogicParser(new CommonTokenStream(lexer));
// invoke the entry point of the parser (the parse() method) and get the AST
CommonTree tree = (CommonTree)parser.parse().getTree();
// print the DOT representation of the AST
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
Now to run the Main class, do:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool Logic.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Windows
java -cp antlr-3.3.jar org.antlr.Tool Logic.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which will print a DOT source of the following AST:
(image produced with graphviz-dev.appspot.com)
Now all you need to do is evaluate this AST! :)
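To give an idea of what evaluating such an AST could look like, here is a small Python sketch. It does not use ANTLR's CommonTree: the tree is assumed to be nested tuples, with a variable as a plain string and an operator node as (op, child) or (op, left, right). The function names are made up for this example.

```python
from itertools import product


def evaluate(node, env):
    """Evaluate a tuple-based AST under a {variable: bool} assignment."""
    if isinstance(node, str):
        return env[node]
    op, *kids = node
    if op == '~':
        return not evaluate(kids[0], env)
    left, right = (evaluate(kid, env) for kid in kids)
    if op == '&&':
        return left and right
    if op == '||':
        return left or right
    if op == '->':
        return (not left) or right
    raise ValueError(f'unknown operator: {op}')


def variables(node):
    """Collect the variable names that occur in the tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(variables(kid) for kid in node[1:]))


def truth_table(node):
    """Return the sorted variable names and one (values, result) row per assignment."""
    names = sorted(variables(node))
    rows = []
    for values in product([False, True], repeat=len(names)):
        rows.append((values, evaluate(node, dict(zip(names, values)))))
    return names, rows


# (P || Q || R) && ((P -> R) -> Q), with each operator as the root of its subtree
TREE = ('&&',
        ('||', ('||', 'P', 'Q'), 'R'),
        ('->', ('->', 'P', 'R'), 'Q'))

if __name__ == '__main__':
    names, rows = truth_table(TREE)
    print(names)
    for values, result in rows:
        print(values, result)
```

The same recursive walk applies to a real AST: the operator at the root of each subtree decides how the results of its children are combined.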
In Perl you can use Regexp::Grammars to do the parsing. It may be a little on the "grenade to kill an ant" side, but it should work.
Edit: Here is a (very quick) example which might get you going.
#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Grammars;
use Data::Dumper;
my $parser = qr/
<nocontext:>
<Logic>
<rule: Logic> <[Element]>*
<rule: Element> <Group> | <Operator> | <Item>
<rule: Group> \( <[Element]>* \)
<rule: Operator> (?:&&) | (?:\|\|) | (?:\-\>)
<rule: Item> \w+
/xms; #/ #Fix Syntax Highlight
my $text = '(P && Q) -> R';
print Dumper \%/ if $text =~ $parser; #/ #Fix Syntax Highlight
Look into JavaCC or ANTLR.
Regexps won't work.
You can probably also run your own parser using StreamTokenizer.
Building an expression parser is easy. Attaching actions to compute a value as you parse it is easy, too.
I assume you can write a BNF for your expression language.
This answer shows you how to build a parser easily, if you have a BNF.
Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
If you want to write your own parser, use the Shunting-yard algorithm to get rid of parentheses by converting the expression from infix into postfix notation or directly into a tree.
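A minimal Python sketch of that infix-to-postfix conversion follows. It assumes the operators from the question (~, &&, ||, ->), alphabetic variable names, and that every binary operator is left-associative with -> binding loosest, as in the ANTLR grammar in the other answer. The names tokenize and to_postfix are illustrative only.

```python
import re

# Higher number binds tighter. Assumption: all binary operators are left-associative.
PRECEDENCE = {'~': 4, '&&': 3, '||': 2, '->': 1}
TOKEN = re.compile(r'\s*(->|&&|\|\||[~()]|[A-Za-z]+)')


def tokenize(text):
    """Split the expression into operators, parentheses and variable names."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f'unexpected input at position {pos}')
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(text):
    """Shunting-yard: convert an infix expression to a postfix token list."""
    output, stack = [], []
    for tok in tokenize(text):
        if tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            if not stack:
                raise ValueError('unbalanced parentheses')
            stack.pop()  # discard the matching '('
        elif tok in PRECEDENCE:
            # The prefix operator '~' never pops what is already on the stack.
            if tok != '~':
                while (stack and stack[-1] != '('
                       and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                    output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a variable name
    while stack:
        op = stack.pop()
        if op == '(':
            raise ValueError('unbalanced parentheses')
        output.append(op)
    return output


if __name__ == '__main__':
    print(to_postfix('(P && Q) -> R'))
    print(to_postfix('(P || Q || R) && ((P -> R) -> Q)'))
```

The postfix list contains no parentheses, so it can be evaluated with a single stack: push variable values, and on each operator pop its operands and push the result.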
Another parser generator for Java is CUP.

Complex Java Regular Expression with Nested Groupings

I am trying to get a regular expression written that will capture what I'm trying to match in Java, but can't seem to get it.
This is my latest attempt:
Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
This is what I want to match:
hello
hello/world
hello/big/world
hello/big/world/
This what I don't want matched:
/
/hello
hello//world
hello/big//world
I'd appreciate any insight into what I am doing wrong :)
Try this regex:
Pattern.compile( "^[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?$" );
Doesn't your regex require question mark at the end?
I always write unit tests for my regexes so I can fiddle with them until they pass.
// your exact regex:
final Pattern regex = Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
// your exact examples:
final String[]
good = { "hello", "hello/world", "hello/big/world", "hello/big/world/" },
bad = { "/", "/hello", "hello//world", "hello/big//world"};
for (String goodOne : good) System.out.println(regex.matcher(goodOne).matches());
for (String badOne : bad) System.out.println(!regex.matcher(badOne).matches());
prints a solid column of true values.
Put another way: your regex is perfectly fine just as it is.
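The reason the unanchored pattern passes here is that Java's matches() requires the whole input to match, whereas find() accepts a match anywhere in the input. The Python sketch below mirrors that distinction with fullmatch and search; it is only an analogy to the Java API, and the names PATH and is_valid are made up.

```python
import re

# The pattern from the question, with a non-capturing group.
PATH = re.compile(r'[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*/?')

GOOD = ['hello', 'hello/world', 'hello/big/world', 'hello/big/world/']
BAD = ['/', '/hello', 'hello//world', 'hello/big//world']


def is_valid(text):
    """Whole-string match, comparable to Matcher.matches() in Java."""
    return PATH.fullmatch(text) is not None


if __name__ == '__main__':
    for text in GOOD + BAD:
        print(text, is_valid(text))
    # A substring search (comparable to Matcher.find()) accepts part of a bad input.
    print(PATH.search('/hello').group())
```

If the pattern is used with a substring search, the ^ and $ anchors from the first answer are what rule out the bad inputs.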
It looks like what you're trying to capture is being overwritten on each quantified iteration. Just change the arrangement of the parentheses.
# "[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?"
[A-Za-z0-9]+
( # (1 start)
(?: / [A-Za-z0-9]+ )*
) # (1 end)
/?
Or, with no captures at all -
# "[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*/?"
[A-Za-z0-9]+
(?: / [A-Za-z0-9]+ )*
/?

Youtube complete Java Regex

I need to parse several pages to get all of their Youtube IDs.
I found many regular expressions on the web, but the Java ones are not complete (they either give me garbage in addition to the IDs, or they miss some IDs).
The one that seems to be complete is hosted here. But it is written in JavaScript and PHP, and unfortunately I couldn't translate it into Java.
Can somebody help me rewrite this PHP regex or the following JavaScript one in Java?
'~
https?:// # Required scheme. Either http or https.
(?:[0-9A-Z-]+\.)? # Optional subdomain.
(?: # Group host alternatives.
youtu\.be/ # Either youtu.be,
| youtube\.com # or youtube.com followed by
\S* # Allow anything up to VIDEO_ID,
[^\w\-\s] # but char before ID is non-ID char.
) # End host alternatives.
([\w\-]{11}) # $1: VIDEO_ID is exactly 11 chars.
(?=[^\w\-]|$) # Assert next char is non-ID or EOS.
(?! # Assert URL is not pre-linked.
[?=&+%\w]* # Allow URL (query) remainder.
(?: # Group pre-linked alternatives.
[\'"][^<>]*> # Either inside a start tag,
| </a> # or inside <a> element text contents.
) # End recognized pre-linked alts.
) # End negative lookahead assertion.
[?=&+%\w]* # Consume any URL (query) remainder.
~ix'
/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*/ig;
First of all, you need to insert an extra backslash \ for each backslash in the old regex; otherwise Java treats the backslash as the start of an escape sequence in the string literal, which is not what you want.
https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
Next when you compile your pattern you need to add the CASE_INSENSITIVE flag. Here's an example:
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(link);
while(matcher.find()) {
System.out.println(matcher.group());
}
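For comparison, here is a hedged Python port of the same pattern, written in verbose mode so the comments from the original PHP version can stay next to each part. It collects only group 1 (the ID) instead of the whole match. The sample URLs and IDs in it are made up for illustration.

```python
import re

YOUTUBE_ID = re.compile(r"""
    https?://                # Required scheme: http or https.
    (?:[0-9A-Z-]+\.)?        # Optional subdomain.
    (?:
        youtu\.be/           # Either youtu.be,
      | youtube\.com\S*      # or youtube.com followed by anything
        [^\w\-\s]            # up to a non-ID char before the ID.
    )
    ([\w\-]{11})             # Group 1: the 11-character video ID.
    (?=[^\w\-]|$)            # Next char is non-ID or end of string.
    (?!                      # Reject URLs that are already linked:
        [?=&+%\w]*
        (?:['"][^<>]*>|</a>) # inside a start tag or <a> element text.
    )
    [?=&+%\w]*               # Consume any URL (query) remainder.
""", re.VERBOSE | re.IGNORECASE)


def extract_ids(text):
    """Return every video ID found in the text, in order of appearance."""
    return YOUTUBE_ID.findall(text)


if __name__ == '__main__':
    # Made-up sample input.
    sample = ('a https://www.youtube.com/watch?v=ABCDEFGH_-1&t=10 '
              'b http://youtu.be/abcdefghijk c')
    print(extract_ids(sample))
```

In the Java version above, matcher.group(1) gives the ID in the same way, while matcher.group() returns the whole matched URL.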
Marcus above has a good regex, but I found that it doesn't recognize YouTube links that have "www" but no "http(s)" in them,
for example www.youtube....
Here is an update:
^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
It's the same except for the start.

Need regex to format file in php

I have a java file that I want to post online. I am using php to format the file.
Does anyone know the regex to turn the comments blue?
INPUT:
/*****
*This is the part
*I want to turn blue
*for my class
*******************/
class MyClass{
String s;
}
Thanks.
Naive version:
$formatted = preg_replace('|(/\*.*?\*/)|s', '<span class="blue">$1</span>', $java_code_here);
... not tested, YMMV, etc...
In general, you won't be able to parse specific parts of a Java file using only regular expressions - Java is not a regular language. If your file has additional structure (such as "it always begins with a comment followed by a newline, followed by a class definition"), you can generate a regular expression for such a case. For instance, you'd match /\*+(.*?)\*+/$, where . is assumed to match multiple lines, and $ matches the end of a line.
In general, to make a regex work, you first define what patterns you want to find (rigorously, but in spoken language), and then translate that to standard regular expression notation.
Good luck.
A regex that can parse simple quotes should be able to find comments in C/C++ style languages.
I assume Java is of that type.
This is a Perl FAQ sample by someone else, although I added the part about // style comments (with or without line continuation) and reformatted it.
It basically does a global search and replace. Data is replaced verbatim if it is not a comment; otherwise the comment is replaced with itself wrapped in your color formatting tags.
You should be able to adapt this to PHP, and it is expanded for clarity (maybe too much clarity though).
s{
## Comments, group 1:
(
/\* ## Start of /* ... */ comment
[^*]*\*+ ## Non-* followed by 1-or-more *'s
(?:
[^/*][^*]*\*+
)* ## 0-or-more things which don't start with /
## but do end with '*'
/ ## End of /* ... */ comment
|
// ## Start of // ... comment
(?:
[^\\] ## Any Non-Continuation character ^\
| ## OR
\\\n? ## Any Continuation character followed by 0-1 newline \n
)*? ## To be done 0-many times, stopping at the first end of comment
\n ## End of // comment
)
| ## OR, various things which aren't comments, group 2:
(
" (?: \\. | [^"\\] )* " ## Double quoted text
|
' (?: \\. | [^'\\] )* ' ## Single quoted text
|
. ## Any other char
[^/"'\\]* ## Chars which don't start a comment, string, escape
) ## or continuation (escape + newline)
}
{defined $2 ? $2 : "<some color>$1</some color>"}gxse;
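The same idea can be sketched in Python: match either a comment or something that is not a comment (string literals included), and wrap only the comments. This is a simplified adaptation rather than a line-for-line port. It ignores line continuation in // comments and does no HTML escaping, and the span markup and the names TOKEN and colorize are assumptions for the example.

```python
import re

TOKEN = re.compile(r'''
    (?P<comment>
        /\*.*?\*/            # block comment, may span several lines
      | //[^\n]*             # line comment, up to the end of the line
    )
  | (?P<other>
        "(?:\\.|[^"\\])*"    # double-quoted string literal
      | '(?:\\.|[^'\\])*'    # single-quoted (char) literal
      | .[^/"'\\]*           # any other char, then a run of harmless chars
    )
''', re.VERBOSE | re.DOTALL)


def colorize(code):
    """Wrap comments in a span; copy everything else through unchanged."""
    def replace(match):
        comment = match.group('comment')
        if comment is not None:
            return '<span class="blue">' + comment + '</span>'
        return match.group('other')
    return TOKEN.sub(replace, code)


if __name__ == '__main__':
    source = 'String s = "/* not a comment */"; /* real */ int x; // tail\n'
    print(colorize(source))
```

Matching string literals as non-comments is what keeps a /* that appears inside a quoted string from being colored.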
