Difficult Regex in Java, need advice

I'm trying to write a regex in Java that can parse the following strings:
g1(g2,g2),g1(g2)
g1(g2(g3,g3),g2),g1(g2)
g1(g2)
I have been trying for hours, but I can't come up with one that splits each example into the following classes:
public class G1{
List<G2> list;
}
public class G2{
String g2;
Set<String> g3;
}
Where I need one instance of g1 for each of the groups.
Thanks
EDIT
Fixed the classes.

It looks like you have a grammar to deal with. Regular expressions are not really the appropriate tool for that; you're better off building a simple finite state machine to do the parsing.
Another option, which I don't recommend for something this simple, is ANTLR, a tool designed for exactly this sort of parsing. It would be overkill for the job.
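A hand-written parser for the three sample strings could look roughly like the sketch below. It assumes the grammar is "a comma-separated list of outer groups, each holding a comma-separated list of inner groups, each with an optional parenthesized list of names", and that names consist only of letters and digits. The class GroupParser and its helper methods are made up for illustration; G1 and G2 mirror the classes from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical hand-written parser for inputs such as "g1(g2(g3,g3),g2),g1(g2)".
class GroupParser {
    // Mirrors the question's G2: a name plus an optional set of nested names.
    static class G2 {
        String g2;
        Set<String> g3 = new LinkedHashSet<>();
    }

    // Mirrors the question's G1: a list of G2 entries.
    static class G1 {
        List<G2> list = new ArrayList<>();
    }

    private final String input;
    private int pos = 0;

    private GroupParser(String input) {
        this.input = input;
    }

    // Parses the whole input into one G1 instance per outer group.
    static List<G1> parse(String input) {
        GroupParser parser = new GroupParser(input);
        List<G1> result = new ArrayList<>();
        do {
            result.add(parser.parseG1());
        } while (parser.accept(','));
        if (parser.pos != input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return result;
    }

    private G1 parseG1() {
        parseName(); // the outer group's name is not stored in G1
        expect('(');
        G1 g1 = new G1();
        do {
            g1.list.add(parseG2());
        } while (accept(','));
        expect(')');
        return g1;
    }

    private G2 parseG2() {
        G2 g2 = new G2();
        g2.g2 = parseName();
        if (accept('(')) { // the nested list is optional
            do {
                g2.g3.add(parseName());
            } while (accept(','));
            expect(')');
        }
        return g2;
    }

    private String parseName() {
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a name at position " + pos);
        }
        return input.substring(start, pos);
    }

    // Consumes the character if it is next in the input.
    private boolean accept(char c) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
    }
}
```

Calling GroupParser.parse("g1(g2(g3,g3),g2),g1(g2)") should give two G1 instances, the first holding two G2 entries. Because g3 is a Set, the repeated name in g2(g3,g3) is stored only once.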

Regex is not a tool that handles recursion well.
For instance, it can't easily tell which closing parenthesis ends the outer group in this line:
g1(g2(g3,g3),g2),g1(g2)
If you use a greedy regex, it collects the whole line g1(g2(g3,g3),g2),g1(g2). If you use a non-greedy one, it collects g1(g2(g3,g3). Regexes that do capture the right span are pretty shaky and break easily.
If the outer group is always called g1 and g1 is never nested within another group, you might be able to use something like this:
g1\(.*?\)(?=,g1|$)
Really though, regex is not a tool for this task.
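For what it's worth, here is a small sketch of how that lookahead pattern behaves with java.util.regex. The class and method names are invented for the example, and it only holds under the stated assumptions (every outer group is named g1 and g1 never appears nested).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that splits the input into its outer g1(...) groups.
class OuterGroupRegex {
    // Lazy match up to a ')' that is followed by ",g1" or by the end of the input.
    private static final Pattern OUTER = Pattern.compile("g1\\(.*?\\)(?=,g1|$)");

    static List<String> split(String input) {
        Matcher matcher = OUTER.matcher(input);
        List<String> groups = new ArrayList<>();
        while (matcher.find()) {
            groups.add(matcher.group());
        }
        return groups;
    }
}
```

For g1(g2(g3,g3),g2),g1(g2) this yields the two strings g1(g2(g3,g3),g2) and g1(g2). It only separates the outer groups; the contents of each group still have to be parsed some other way.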

Related

Javaslang - how do I join two successful Trys?

I'm still learning Javaslang/Vavr, so excuse the ignorance. The project I'm working on is stuck on Javaslang 2.1.0.
My question: is there a more "Functional" way (as opposed to imperative style) to structure my code to join multiple Trys only once they are successful?
I want to Try each input independently, the idea being to collect as much error information as possible. I do not want to stop on the first error encountered (so orElse() etc. won't do the trick). But once there are no more errors, I want to do something further involving all of the inputs.
My current code looks like this (suitably anonymized):
Try<BigDecimal> amountTry = Try.of(this::readNumber)
.map(BigDecimal::valueOf)
.onFailure(this::collectError);
Try<Currency> currencyTry = Try.of(this::readString)
.map(currency -> currencyLookup(Currency.class, currency))
.onFailure(this::collectError);
if (amountTry.isSuccess() && currencyTry.isSuccess()) {
sale.setAmount(Amount.of(amountTry.get(), currencyTry.get()));
}
Can you suggest a pattern to replace the if() with something more in the functional style of programming?

The Javaslang/Vavr construct that you are looking for is the for comprehension construct, which is accessible through the API.For methods.
import javaslang.control.Try;
import static javaslang.API.For;
...
For(amountTry, currencyTry)
.yield(Amount::of)
.forEach(sale::setAmount);
That is, if both amountTry and currencyTry are non-empty (that is, successful), it creates an Iterable by yielding a result value on the cross-product of the two iterables, and performs an action on each of the resulting elements by invoking a Consumer. Here is the same in lambda form with explicit input types, if it helps you better understand it:
For(amountTry, currencyTry)
.yield((BigDecimal amount, Currency currency) -> Amount.of(amount, currency))
.forEach((Amount amount) -> sale.setAmount(amount));
Later versions of the library have overloads of the for comprehension for Try which will return a Try instance instead of Iterable, which makes the API a little bit nicer if you want to stay in Try domain.

AssertJ testing on a collection: what is better: 'extracting' or Java8 forEach

I'm new to AssertJ and using it to unit-test my written code and was thinking how to assert a list.
Let's assume we have a list of ConsumerEntity objects. Each entity has its own Phone and its own ServiceProvider, which has its own Name and EntityName.
Now we want to assert that each entity from a repository gets the right data, so we want to test that each item in the list has the same Phone.
ConsumerEntity savedConsumer1 = Consumer(phone, name, serviceProvider);
List<ConsumerEntity> consumerListFromRepository = repository.findAllByPhone(phone);
Now I want to test that the data given from Repository is correct,
I can use this:
assertThat(consumerListFromRepository)
.extracting(ConsumerEntity::getPhone)
.containsOnly(savedConsumer1.getPhone());
Or I can do this with forEach (java 8):
consumerListFromRepository.forEach(consumerEntity ->
assertThat(consumerEntity.getPhone()).isEqualTo(savedConsumer1.getPhone()));
1. Which one is faster/simpler/more readable? I would go for the forEach for fewer lines of code, but it is less readable as well.
2. Is there any other way to do it as a one-liner like the forEach but with assertThat, so that it is readable and simple, without the need to use isEqualTo each time? Something like:
assertThat(list).forEach........
3. Which one is faster: extracting or forEach?
Thanks!

I'm not sure that "faster" is a primary concern here. It's likely that any performance difference is immaterial; either the underlying implementations are ~equivalent in terms of non-functionals or - since the context here is a unit test - the consumerListFromRepository is trivially small thereby limiting the scope for any material performance differences.
I think your main concerns here should be
Making it as easy as possible for other developers to:
Understand/reason about your test case
Edit/refactor your test case
Ensuring that your approach to asserting is consistent with other test cases in your code base
Judging which of your two approaches best ticks this box is somewhat subjective but I think the following considerations are relevant:
The Java 8 forEach construct is well understood and the isEqualTo matcher is explicit and easily understood
The AssertJ extracting helper paired with containsOnly is less common than Java 8's forEach construct, but this pairing reads logically and is easily understood
So, IMHO both approaches are valid. If your code base consistently uses AssertJ then I'd suggest using the extracting helper paired with the containsOnly matcher for consistency. Otherwise, use whichever of them reads best to you :)

parsing a Python expression from Java

I've got a bit of an interesting challenge
To the point:
I want to allow a user to enter an expression in a text field, and have that string treated as a python expression. There are a number of local variables I would like to make available to this expression.
I do have a solution, though it will be cumbersome to implement. I was thinking of keeping a Python class source file, with a function that has a single %s in it. When the user enters his expression, we simply do a string format, and then call Jython's interpreter to spit out something we can execute. There would have to be a number of variable declaration statements in front of that expression to make sure the variables we want to expose to the user are available to his expression.
So the user would be presented with a text field, he would enter
x1 + (3.5*x2) ** x3
and we would do our interpreting process to come up with an open delegate object. We then punch the values into this object from a map, and call execute, to get the result of the expression.
Any objections to using Jython, or should I be doing something other than modifying source code? I would like to think that some kind of mutable object akin to C#'s Expression object, where we could do something like
PythonExpression expr = new PythonExpression(userSuppliedText);
expr.setDefaultNamespace();
expr.loadLibraries("numPy", /*other libraries?*/);
//comes from somewhere else in the flow, but effectively we get
Map<String, Double> symbolValuesByName = new HashMap<>(){{
put("x1", 3.0);
put("x2", 20.0);
put("x3", 2.0);
}};
expr.loadSymbols(symbolValuesByName);
Runnable exprDelegate = expr.compile();
//sometime later
exprDelegate.run();
but, I'm hoping for a lot, and it looks like Jython is as good as it gets. Still, modifying source files and then passing them to an interpreter seems really heavy-handed.
Does that sound like a good approach? Do you guys have any other libraries you'd suggest?
Update: NumPy does not work with Jython
I should've discovered this one on my own.
So now my question shifts: Is there any way that from a single JVM process instance (meaning, without ever having to fork) I can compile and run some Python code?

If you simply want to parse the expressions, you ought to be able to put something together with a Java parser generator.
If you want to parse, error check and evaluate the expressions, then you will need a substantial subset of the functionality of a full Python interpreter.
I'm not aware of a subset implementation.
If such a subset implementation exists, it is unclear that it would be any easier to embed / call than to use a full Python interpreter ... like Jython.
If the powers that be dictate that "thou shalt use python", then they need to pay for the extra work it is going to cause you ... and the next guy who is going to need to maintain a hybrid system across changes in requirements, and updates to the Java and Python / Jython ecosystems. Factor it into the project estimates.
The other approach would be to parse the full Python expression grammar, but limit what your evaluator can handle, based on what is actually required and what is implementable in your project's time-frame. Limit the types supported and the operations on the types. Limit the built-in functions supported. Et cetera.
Assuming that you go down the Java calling Jython route, there is a lot of material on how to implement it here: http://www.jython.org/jythonbook/en/1.0/JythonAndJavaIntegration.html

What is "string bashing" and why is it bad?

My boss keeps using the term "string bashing" (we're a Java shop) and usually makes an example out of me whenever I ask him anything (as if I'm supposed to know it already). I Googled the term only to find results pertaining to theoretical physics and string theory.
I am guessing it has something to do with using String/StringBuilders incorrectly or not in keeping with best practices, but for the life of me, I can't figure out what it is.

"String bashing" is a slang term for cutting up strings and manipulating them: splitting, joining, inserting, tokenizing, parsing, etc.
It's not inherently bad (despite the connotation of "bashing"), but as you point out, in Java, one needs to be careful not to use String when StringBuilder would be more efficient.

Why don't you ask your boss for an example of string bashing?
Don't forget to ask him for the correct way of refactoring the examples he gives you.

Out of context, "string bashing" doesn't really have any meaning in itself. It's not a buzz word for any good or bad behaviour. It would just mean "bashing strings", as in using string operations.
Whether that is good or bad depends on what you are doing, and the role of the strings would not really be important. There are good and bad ways of handling any kind of data.
Sometimes "bashing strings" is actually the best solution. Consider for example that you want to pick out the first three characters of a string. You could create a regular expression that isolates the characters, but that would certainly be overkill as there is a simple string operation that can do the same, which is a lot faster and easier to maintain.
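To make that comparison concrete, here is a small sketch that picks out the first three characters both ways. The class and method names are invented for the example, and the handling of strings shorter than three characters is an assumption about what the caller would want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of a plain string operation with the regex equivalent.
class FirstThreeChars {
    // Matches up to three characters at the start; DOTALL so line breaks count too.
    private static final Pattern FIRST_THREE = Pattern.compile("^.{0,3}", Pattern.DOTALL);

    // The simple string operation: take at most the first three characters.
    static String viaSubstring(String s) {
        return s.substring(0, Math.min(3, s.length()));
    }

    // The regex version of the same thing, shown only for comparison.
    static String viaRegex(String s) {
        Matcher matcher = FIRST_THREE.matcher(s);
        return matcher.find() ? matcher.group() : "";
    }
}
```

Both methods return the same result; the substring version simply has less machinery to read and maintain.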

Effective Java has an item about using strings: "Item 50: Avoid strings where other types are more appropriate". Also on stackoverflow: "Stringly typed".

A guess: It might imply something related to creation of unnecessary temporary objects, and in this particular case Strings. For example, if you're constructing a String token by token then it's usually a good idea to use a StringBuilder. If the String is not built using a builder, each concatenation will cause another temporary object to be created (and later garbage collected).
In modern VMs (I'm thinking HotSpot 1.5 or 1.6) this is rarely a problem unless you're in performance critical code or you're building long strings, e.g. in for loops.
Only a guess; might be better to ask what he or she means? I've never heard the term before.
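If that guess is right, the difference would look something like the following sketch. The class and method names are made up for illustration; both methods produce the same string.

```java
// Hypothetical illustration of building a string token by token.
class TokenJoiner {
    // Each += produces a new String object, leaving the previous one for the garbage collector.
    static String withConcatenation(String[] tokens) {
        String result = "";
        for (String token : tokens) {
            result += token;
        }
        return result;
    }

    // A single StringBuilder accumulates the characters and creates one String at the end.
    static String withBuilder(String[] tokens) {
        StringBuilder builder = new StringBuilder();
        for (String token : tokens) {
            builder.append(token);
        }
        return builder.toString();
    }
}
```

For a handful of tokens the difference is unlikely to be measurable; it matters mainly in long loops or performance-critical code, as noted above.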

There are a few results on Google which refer to string bashing in this context. They don't appear to refer to the concern about inefficient temporaries and using StringBuilder.
Instead, it appears to refer to simplistic string parsing. I.e. doing stuff like checking for substrings, slicing the string, etc. In particular, it appears to have the implication of it being a hacky solution to the problem.
It might be seen badly because you should either use real parsing or obtain the data in a non-string format.

Java source refactoring of 7000 references

I need to change the signature of a method used all over the codebase.
Specifically, the method void log(String) will take two additional arguments (Class c, String methodName), which need to be provided by the caller, depending on the method where it is called. I can't simply pass null or similar.
To give an idea of the scope, Eclipse found 7000 references to that method, so if I change it the whole project will go down. It will take weeks for me to fix it manually.
As far as I can tell, Eclipse's refactoring plugin is not up to the task, but I really want to automate it.
So, how can I get the job done?

Great, I can copy a previous answer of mine and I just need to edit a tiny little bit:
I think what you need to do is use a source code parser like javaparser to do this.
For every java source file, parse it to a CompilationUnit, create a Visitor, probably using ModifierVisitor as base class, and override (at least) visit(MethodCallExpr, arg). Then write the changed CompilationUnit to a new File and do a diff afterwards.
I would advise against changing the original source file, but creating a shadow file tree may be a good idea (e.g. old file: src/main/java/com/mycompany/MyClass.java, new file: src/main/refactored/com/mycompany/MyClass.java; that way you can diff the entire directories).

Eclipse is able to do that using Refactor -> Change Method Signature, which lets you provide default values for the new parameters.
For the class parameter the default value should be this.getClass(), but you are right in your comment: I don't know how to handle the method name parameter.

IntelliJ IDEA shouldn't have any trouble with this.
I'm not a Java expert, but something like this could work. It's not a perfect solution (it may even be a very bad solution), but it could get you started:
Change the method signature with IntelliJ's refactoring tools, and specify default values for the 2 new parameters:
c: this.getClass()
methodName: Thread.currentThread().getStackTrace()[1].getMethodName()
or better yet, simply specify null as the default values.

I think that there are several steps to dealing with this, as it is not just a technical issue but a 'situation':
Decline to do it in short order due to the risk.
Point out the issues caused by not using standard frameworks but reinventing the wheel (as Paul says).
Insist on using Log4j or equivalent if making the change.
Use Eclipse refactoring in sensible chunks to make the changes and deal with the varying defaults.
I have used Eclipse refactoring on quite large changes for fixing old smelly code - nowadays it is fairly robust.

Maybe I'm being naive, but why can't you just overload the method name?
void thing(paramA) {
thing(paramA, THE_DEFAULT_B, THE_DEFAULT_C);
}
void thing(paramA, paramB, paramC) {
// new method
}

Do you really need to change the calling code and the method signature? What I'm getting at is it looks like the added parameters are meant to give you the calling class and method to add to your log data. If the only requirement is just adding the calling class/method to the log data then Thread.currentThread().getStackTrace() should work. Once you have the StackTraceElement[] you can get the class name and method name for the caller.
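A minimal sketch of that idea follows. The classes CallerAwareLog and OrderService are invented for the example, and the method returns the formatted line instead of writing it anywhere so that the result is easy to inspect. Rather than relying on a fixed array index, it looks for its own frame and takes the one after it, since the Javadoc allows a VM to omit frames.

```java
// Hypothetical logger that works out the calling class and method from the stack trace.
class CallerAwareLog {
    static String format(String message) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        for (int i = 0; i < stack.length - 1; i++) {
            boolean isThisMethod = stack[i].getClassName().equals(CallerAwareLog.class.getName())
                    && stack[i].getMethodName().equals("format");
            if (isThisMethod) {
                // The frame right after this method's own frame belongs to the caller.
                StackTraceElement caller = stack[i + 1];
                return caller.getClassName() + "." + caller.getMethodName() + ": " + message;
            }
        }
        return "unknown: " + message; // fallback if the VM omitted the frames
    }
}

// Hypothetical caller, used only to show what ends up in the log line.
class OrderService {
    static String placeOrder() {
        return CallerAwareLog.format("order placed");
    }
}
```

With this approach the 7000 call sites keep calling the one-argument method unchanged. The cost is that a stack trace is captured on every log call, which may matter in hot code paths.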

If the lines you need replaced fall into a small number of categories, then what you need is Perl:
find . -name '*.java' | xargs perl -pi -e 's/log\(([^,)]*?)\)/log($1, "foo", "bar")/g'
I'm guessing that it wouldn't be too hard to hack together a script which would put the classname (derived from the filename) in as the second argument. Getting the method name in as the third argument is left as an exercise to the reader.
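As a rough illustration of the "class name from the file name" idea, here is the same kind of substitution sketched in Java. Everything in it is an assumption made for the example: the LogCallRewriter class, the bare file name as input, and the TODO_METHOD placeholder for the method name, which this sketch does not attempt to determine. Like the Perl one-liner, it is purely textual, so it would also rewrite a declaration such as void log(String text) and would skip calls whose argument contains a comma or parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-based rewriter for single-argument log(...) calls.
class LogCallRewriter {
    // A call to log with one argument that contains no commas or parentheses.
    private static final Pattern SINGLE_ARG_LOG = Pattern.compile("\\blog\\(([^,()]*)\\)");

    // fileName is assumed to be a bare name such as "MyClass.java".
    static String rewrite(String fileName, String source) {
        String className = fileName.replaceFirst("\\.java$", "");
        // $1 refers to the original argument; the class name is quoted so it is taken literally.
        String replacement = "log($1, " + Matcher.quoteReplacement(className)
                + ".class, \"TODO_METHOD\")";
        return SINGLE_ARG_LOG.matcher(source).replaceAll(replacement);
    }
}
```

Writing the result to a separate file tree and diffing it against the original, as suggested in another answer, would be a sensible way to review what such a script changed.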

Try refactoring with IntelliJ. It has a feature called SSR (Structural Search and Replace). You can refer to classes, method names, etc. for context. (seanizer's answer is more promising; I upvoted it.)

I agree with Seanizer's answer that you want a tool that can parse Java. That's necessary but not sufficient; what you really want is a tool that can carry out a reliable mass-change.
To do this, you want a tool that can parse Java, can pattern match against the parsed code, install the replacement call, and spit out the answer without destroying the rest of the source code.
Our DMS Software Reengineering Toolkit can do all of this for a variety of languages, including Java. It parses complete Java source systems and builds abstract syntax trees for the entire set of code.
DMS can apply pattern-directed, source-to-source transformations to achieve the desired change.
To achieve the OP's effect, he would apply the following program transformation:
rule replace_legacy_log(s:STRING): expression -> expression
" log(\s) " -> " log( \s, \class\(\), \method\(\) ) "
What this rule says is, find a call to log which has a single string argument, and replace it with a call to log with two more arguments determined by auxiliary functions class and method.
These functions determine the containing method name and containing class name for the AST node root where the rule finds a match.
The rule is written in "source form", but actually matches against the AST and replaces found ASTs with the modified AST.
To get back the modified source, you ask DMS to simply prettyprint (to make a nice layout) or fidelity print (if you want the layout of the old code preserved). DMS preserves comments, number radixes, etc.
If the existing application has more than one definition of the "log" function, you'll need to add a qualifier:
... if IsDesiredLog().
where IsDesiredLog uses DMS's symbol table and inheritance information to determine if the specific log refers to the definition of interest.

In fact, your problem is not about finding a click'n'play engine that will let you replace all occurrences of
log("some weird message");
with
log("some weird message", this.getClass(), new Exception().getStackTrace()[0].getMethodName());
since that has little chance of working in every case (static methods, for example, have no this).
I would suggest taking a look at Spoon. This tool allows source code parsing and transformation, letting you carry out the change in a slow (since it is obviously code-based) but controlled way.
However, you could also consider replacing your current method with one that inspects the stack trace to get the information or, even better, using log4j internally with a log formatter that displays the correct information.

I would search and replace log( with log(#class, #method,
Then write a little script in any language (even Java) to find the class name and the method names and to replace the #class and #method tokens...
Good luck

If the class and method name are required for "where did this log come from?" type data, then another option is to print out a stack trace in your log method. E.g.
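import java.io.PrintWriter;
import java.io.StringWriter;
public void log(String text)
{
// Capture the current stack trace as text; it shows where log() was called from
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw, true);
new Throwable().printStackTrace(pw);
pw.flush();
sw.flush();
String stackTraceAsLog = sw.toString();
//do something with text and stackTraceAsLog
}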
