Java console input handling - java

This is my first question here, I hope it's not too based on opinions. I've searched on the internet for quite a while now, but couldn't find a similar question.
I need to write a Java program that reads commands from the console, validates the input, gets the parameters and passes them on to a different class.
There are some restrictions on what I can do and use (university).
Only the packages java.util, java.lang and java.io are allowed
Each method can only be 80 lines long
Each line can only be 120 characters long
I am not allowed to use System.exit / Runtime.exit
The Terminal class is used to handle user input. Terminal.readLine() will read a line from the console, like Scanner.nextLine()
I have a fully working program - however my solution will not be accepted because of the way I handle console inputs (runInteractionLoop() method too long). I'm doing it like this:
The main class has the main method and an "interaction loop" where console inputs are handled. The main method calls the interaction loop in a while loop, with a boolean "quit" as a guardian.
private static boolean quit = false;
...
public static void main(String[] args) {
...
while (quit == false) {
runInteractionLoop();
}
}
The interaction loop handles console input. I need to check for 16 different commands - each with their own types of parameters. I chose to work with Patterns and Matchers, because I can use the groups for convenience. Now the problems start - I have never learned how to correctly handle user inputs. What I have done here is, for each possible command, create a new Matcher, see if the input matches, if it does then do whatever needs to be done for this input.
private static void runInteractionLoop() {
Matcher m;
String query = Terminal.readLine();
m = Pattern.compile("sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSth(Integer.parseInt(m.group(1)), ......);
...
return;
}
m = Pattern.compile("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
return;
}
...
if (query.equals("quit")) {
quit = true;
return;
}
Terminal.printError("invalid input");
}
As you can see, doing this 16 times stretches out the method to more than 80 lines (5 lines per input max). It's also obviously very inefficient and to be honest, I'm quite ashamed to be posting this here (crap code). I just don't know how to do this correctly, using only java.util and having some way to quickly get the parameters (e.g. the Matcher groups here).
Any ideas? I would be very grateful for suggestions. Thanks.
EDIT/UPDATE:
I have made the decision to split the verification into two methods - one for each half of the commands. Looks ugly, but passes the Uni's checkstyle requirements. However, I'd still be more than happy if someone shows me a better solution to my problem - for the future (because I obviously have no idea how to make this prettier, shorter and/or more efficient).

I guess you could try something painful like this where you separate everything into a chain of method calls:
private static void runInteractionLoop() {
Matcher m;
String query = Terminal.readLine();
m = Pattern.compile("sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSth(Integer.parseInt(m.group(1)), ......);
...
return;
} else {
tryDouble(query, m);
}
}
private static void tryDouble(String query, Matcher m) {
m = Pattern.compile("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)").matcher(query);
if (m.matches()) {
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
return;
} else {
trySomethingElse(query, m);
}
}
private static void trySomethingElse(String query, Matcher m) {
...
if (query.equals("quit")) {
quit = true;
return;
}
Terminal.printError("invalid input");
}

I would solve this with an abstract class CommandValidator:
public abstract class CommandValidator {
/* getter and setter */
public Matcher resolveMatcher(String query) {
return Pattern.compile(getCommand()).matcher(query);
}
public abstract String getCommand();
public abstract void doSth();
}
and would implement 16 different CommandValidators for each handler and implement the abstract methods differently:
public class IntegerCommandValidator extends CommandValidator {
@Override
public String getCommand() {
return "sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)";
}
@Override
public void doSth() {
/* magic here, parameter input the matcher and xyz, or have it defined as field at the class */
// xyz.doSth(Integer.parseInt(m.group(1)), ......);
}
}
Since the action needs the matcher, you can either store it in a field of the CommandValidator or pass it into doSth() as a parameter.
Then you can instantiate each concrete Validator in a list and iterate through every validator, resolve the matcher and look if it matches:
private static final List<CommandValidator> allConcreteValidators = new ArrayList<>();
public static void main(String[] args) {
/* */
allConcreteValidators.add(new IntegerCommandValidator());
/* */
while (quit == false) {
runInteractionLoop();
}
}
private static void runInteractionLoop() {
String query = Terminal.readLine();
for (CommandValidator validator : allConcreteValidators) {
if (validator.resolveMatcher(query).matches()) {
validator.doSth();
}
}
}
Of course, you could first look up whether any validator fits at all, and handle the case where none does.
This might be a bit over-engineered for your exercise. If several validators share the same doSth logic, you could pass the command into the constructor of the concrete validator instead.
You should also find better names for the classes, because they do more than validate.
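A minimal sketch of the idea above, assuming the successful matcher is handed to the action so that its groups can be read directly. The names Command, CommandLoop, tryRun, dispatch and log are hypothetical; log merely stands in for calls such as xyz.doSthElse(...). It uses only java.util and java.util.regex, which the question already relies on:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One console command: a precompiled pattern plus the action to run on a match. */
abstract class Command {
    private final Pattern pattern;

    Command(String regex) {
        this.pattern = Pattern.compile(regex); // compiled once, not on every input line
    }

    /** Runs the action and returns true if the whole line matches this command. */
    final boolean tryRun(String line) {
        Matcher m = pattern.matcher(line);
        if (!m.matches()) {
            return false;
        }
        run(m);
        return true;
    }

    /** Receives the successful matcher so the groups can be read directly. */
    protected abstract void run(Matcher m);
}

class CommandLoop {
    /** Hypothetical log standing in for calls such as xyz.doSthElse(...). */
    static final List<String> log = new ArrayList<>();
    private static final List<Command> COMMANDS = new ArrayList<>();
    private static boolean quit = false;

    static {
        COMMANDS.add(new Command("record ([a-z]+) (-?\\d+(?:\\.\\d+)?)") {
            @Override
            protected void run(Matcher m) {
                log.add("record " + m.group(1) + " " + Double.parseDouble(m.group(2)));
            }
        });
        COMMANDS.add(new Command("quit") {
            @Override
            protected void run(Matcher m) {
                quit = true;
            }
        });
    }

    /** Dispatches one input line; returns false if no command matched. */
    static boolean dispatch(String line) {
        for (Command c : COMMANDS) {
            if (c.tryRun(line)) {
                return true; // stop at the first matching command
            }
        }
        return false; // the caller reports "invalid input"
    }

    static boolean isQuit() {
        return quit;
    }
}
```

The interaction loop then shrinks to reading a line, calling dispatch, and printing the error when it returns false. Each additional command costs one registration block rather than more lines in the loop method.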

You can boil each possibility down to two lines (or three if the closing brace must be on a separate line) by delegating the match work to a helper method:
Matcher m;
if ((m = matches(query, "sliding-window (\\d+) (-?\\d+(?:\\.\\d+)?;)*(-?\\d+(?:\\.\\d+)?)")) != null)
xyz.doSth(Integer.parseInt(m.group(1)), ......);
else if ((m = matches(query, "record ([a-z]+) (-?\\d+(?:\\.\\d+)?)")) != null)
xyz.doSthElse(m.group(1), Double.parseDouble(m.group(2)));
...
else
Terminal.printError("invalid input");
// Returns the matcher if the whole input matches the regexp, otherwise null.
private static Matcher matches(String input, String regexp)
{
Matcher result = Pattern.compile(regexp).matcher(input);
if (result.matches())
return result;
else
return null;
}

Related

How to restrict Matcher or Pattern in java?

How to restrict Matcher in java to match only desired String ? Following is the code I have tried, however the expected match should be like "Invoice Received" but it is printing only "Invoice" on console.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
List<String> actionList = new ArrayList<String>();
actionList.add("Invoice");
actionList.add("Invoice Received");
List<String> notes = new ArrayList<String>();
notes.add("Invoice Received123");
for (String note : notes) {
for (String action : actionList) {
Pattern pattern = Pattern.compile(action);
Matcher matcher = pattern.matcher(note);
if(matcher.find()) {
System.out.println("Update History As : "+action);
}
}
}
}
}
if(matcher.find()) {
System.out.println("Update History As : "+action);
break;
}
This is breaking your code. Literally. The break statement exits the inner for loop as soon as a pattern matches. As a result, "Invoice Received" never has a chance to be matched.
Originally this was the interpreted issue, but the question has since become about flow control for this particular problem. As a suggested solution, here is an example of the Note object without polymorphism, but rather a control code.
public class Note {
public static final int INVOICE = 1;
public static final int INVOICE_RECEIVED = 2;
public int noteType;
public String userText;
public Note(int noteType, String userText) {
this.noteType = noteType;
this.userText = userText;
}
public void doSomething() {
switch(noteType) {
case INVOICE:
// do something with the INVOICE type
break;
case INVOICE_RECEIVED:
// do something with the INVOICE_RECEIVED type
break;
}
}
}
You can then create an Invoice Received note with Note newNote = new Note(Note.INVOICE_RECEIVED, "this is some user text"); and add such objects to a list, similar to what you are doing, and handle them accordingly. Depending on the number of note types you have, a polymorphic design might be better, or at least cleaner. But this is the way to do it with control codes.
You'll need to order the patterns you are looking for so that a pattern which is a prefix of another always comes after the longer pattern. Concretely:
List<String> actionList = new ArrayList<String>();
actionList.add("Invoice Received"); /* Make this take precedence... */
actionList.add("Invoice"); /* ... over this. */
Then put the break back into your match case, or every "Invoice Received" note will also be handled as an "Invoice" note:
if(matcher.find()) {
System.out.println("Update History As : "+action);
break;
}
In general this sort of system will be very susceptible to bugs. If you have any control over the input to this process, modify it so that the note type is explicit, instead of guessed from its content.
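The ordering rule can also be enforced in code rather than by hand. The following is only a sketch under a few assumptions: the actions are literal strings (so String.contains is swapped in for Pattern.compile(action) and find()), and the names ActionMatcher and classify are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ActionMatcher {
    /** Returns the longest action contained in the note, or null if none is. */
    static String classify(String note, List<String> actions) {
        List<String> ordered = new ArrayList<>(actions);
        // Longest first, so "Invoice Received" is tried before its prefix "Invoice".
        ordered.sort(Comparator.comparingInt(String::length).reversed());
        for (String action : ordered) {
            if (note.contains(action)) { // literal containment; no regex needed here
                return action;           // equivalent to the break in the loop above
            }
        }
        return null;
    }
}
```

Sorting by descending length guarantees that a prefix is never tried before the longer action that contains it, whatever order the list was filled in.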

Modelling a regular expression parser with polymorphism

So, I'm doing a regular expression parser for school that creates a hierarchy of objects in charge of the matching. I decided to do it object oriented because it's easier for me to imagine an implementation of the grammar that way. So, these are my classes making up the regular expressions. It's all in Java, but I think you can follow along if you're proficient in any object oriented language.
The only operators we're required to implement are Union (+), Kleene-Star (*), Concatenation of expressions (ab or maybe (a+b)c) and of course the Parenthesis as illustrated in the example of Concatenation. This is what I've implemented right now and I've got it to work like a charm with a bit of overhead in the main.
The parent class, Regexp.java
public abstract class Regexp {
//Print out the regular expression it's holding
//Used for debugging purposes
abstract public void print();
//Checks if the string matches the expression it's holding
abstract public Boolean match(String text);
//Adds a regular expression to be operated upon by the operators
abstract public void add(Regexp regexp);
/*
*To help the main with the overhead to help it decide which regexp will
*hold the other
*/
abstract public Boolean isEmpty();
}
There's the most simple regexp, Base.java, which holds a char and returns true if the string matches the char.
public class Base extends Regexp{
Character c; // boxed, so that "no character" can be represented as null
public Base(char c){
this.c = c;
}
public Base(){
c = null;
}
@Override
public void print() {
System.out.println(c);
}
//If the string is the char, return true
@Override
public Boolean match(String text) {
if(text.length() > 1) return false;
return text.startsWith(""+c);
}
//Not utilized, since base is only contained and cannot contain
@Override
public void add(Regexp regexp) {
}
@Override
public Boolean isEmpty() {
return c == null;
}
}
A parenthesis, Paren.java, to hold a regexp inside it. Nothing really fancy here, but illustrates how matching works.
public class Paren extends Regexp{
//Member variables: What it's holding and if it's holding something
private Regexp regexp;
Boolean empty;
//Parenthesis starts out empty
public Paren(){
empty = true;
}
//Unless you create it with something to hold
public Paren(Regexp regexp){
this.regexp = regexp;
empty = false;
}
//Print out what it's holding
@Override
public void print() {
regexp.print();
}
//Real simple; either what you're holding matches the string or it doesn't
@Override
public Boolean match(String text) {
return regexp.match(text);
}
//Pass something for it to hold, then it's not empty
@Override
public void add(Regexp regexp) {
this.regexp = regexp;
empty = false;
}
//Return if it's holding something
@Override
public Boolean isEmpty() {
return empty;
}
}
A Union.java, which is two regexps that can be matched. If one of them is matched, the whole Union is a match.
public class Union extends Regexp{
//Members
Regexp lhs;
Regexp rhs;
//Indicating if there's room to push more stuff in
private Boolean lhsEmpty;
private Boolean rhsEmpty;
public Union(){
lhsEmpty = true;
rhsEmpty = true;
}
//Can start out with something on the left side
public Union(Regexp lhs){
this.lhs = lhs;
lhsEmpty = false;
rhsEmpty = true;
}
//Or with both members set
public Union(Regexp lhs, Regexp rhs) {
this.lhs = lhs;
this.rhs = rhs;
lhsEmpty = false;
rhsEmpty = false;
}
//Some stuff to help me see the unions format when I'm debugging
@Override
public void print() {
System.out.println("(");
lhs.print();
System.out.println("union");
rhs.print();
System.out.println(")");
}
//If the string matches the left side or right side, it's a match
@Override
public Boolean match(String text) {
if(lhs.match(text) || rhs.match(text)) return true;
return false;
}
/*
*If the left side is not set, add the member there first
*If not, and right side is empty, add the member there
*If they're both full, merge it with the right side
*(This is a consequence of left-to-right parsing)
*/
@Override
public void add(Regexp regexp) {
if(lhsEmpty){
lhs = regexp;
lhsEmpty = false;
}else if(rhsEmpty){
rhs = regexp;
rhsEmpty = false;
}else{
rhs.add(regexp);
}
}
//If it's not full, it's empty
@Override
public Boolean isEmpty() {
return (lhsEmpty || rhsEmpty);
}
}
A concatenation, Concat.java, which is basically a list of regexps chained together. This one is complicated.
public class Concat extends Regexp{
/*
*The list of regexps is called product and the
*regexps inside called factors
*/
List<Regexp> product;
public Concat(){
product = new ArrayList<Regexp>();
}
public Concat(Regexp regexp){
product = new ArrayList<Regexp>();
pushRegexp(regexp);
}
public Concat(List<Regexp> product) {
this.product = product;
}
//Adding a new regexp pushes it into the list
public void pushRegexp(Regexp regexp){
product.add(regexp);
}
//Loops over and prints them
@Override
public void print() {
for(Regexp factor: product){
factor.print();
}
}
/*
*Builds up a substring approaching the input string.
*When it matches, it builds another substring from where it
*stopped. If the entire string has been pushed, it checks if
*there's an equal amount of matches and factors.
*/
@Override
public Boolean match(String text) {
ArrayList<Boolean> bools = new ArrayList<Boolean>();
int start = 0;
ListIterator<Regexp> itr = product.listIterator();
Regexp factor = itr.next();
for(int i = 0; i <= text.length(); i++){
String test = text.substring(start, i);
if(factor.match(test)){
start = i;
bools.add(true);
if(itr.hasNext())
factor = itr.next();
}
}
return (allTrue(bools) && (start == text.length()));
}
private Boolean allTrue(List<Boolean> bools){
return product.size() == bools.size();
}
@Override
public void add(Regexp regexp) {
pushRegexp(regexp);
}
@Override
public Boolean isEmpty() {
return product.isEmpty();
}
}
Again, I've gotten these to work to my satisfaction with my overhead, tokenization and all that good stuff. Now I want to introduce the Kleene-star operation. It matches on any number, even 0, of occurrences in the text. So, ba* would match b, ba, baa, baaa and so on while (ba)* would match on ba, baba, bababa and so on. Does it even look possible to extend my Regexp to this or do you see another way of solving this?
PS: There are getters, setters and all kinds of other support functions that I didn't write out; this is mainly for you to quickly get the point of how these classes work.
You seem to be trying to use a fallback algorithm to do the parsing. That can work -- although it is easier to do with higher-order functions -- but it is far from the best way to parse regular expressions (by which I mean the things which are mathematically regular expressions, as opposed to the panoply of parsing languages implemented by "regular expression" libraries in various languages).
It's not the best way because the parsing time is not linear in the size of the string to be matched; in fact, it can be exponential. But to understand that, it's important to understand why your current implementation has a problem.
Consider the fairly simple regular expression (ab+a)(bb+a). That can match exactly four strings: abbb, aba, abb, aa. All of those strings start with a, so your concatenation algorithm will match the first concatenand ((ab+a)) at position 1, and proceed to try the second concatenand (bb+a). That will successfully match abb and aa, but it will fail on aba and abbb.
Now, suppose you modified the concatenation function to select the longest matching substring rather than the shortest one. In that case, the first subexpression would match ab in three of the possible strings (all but aa), and the match would fail in the case of abb.
In short, when you are matching a concatenation R·S, you need to do something like this:
1. Find some initial string which matches R.
2. See if S matches the rest of the text.
3. If not, repeat with another initial string which matches R.
In the case of full regular expression matches, it doesn't matter which order we list matches for R, but usually we're trying to find the longest substring which matches a regular expression, so it is convenient to enumerate the possible matches from longest to shortest.
Doing that means that we need to be able to restart a match after a downstream failure, to find the "next match". That's not terribly complicated, but it definitely complicates the interface, because all of the compound regular expression operators need to "pass through" the failure to their children in order to find the next alternative. That is, the operator R+S might first find something which matches R. If asked for the next possibility, it first has to ask R if there is another string which it could match, before moving on to S. (And that's passing over the question of how to get + to list the matches in order by length.)
With such an implementation, it's easy to see how to implement the Kleene star (R*), and it is also easy to see why it can take exponential time. One possible implementation:
1. First, match as many R as possible.
2. If asked for another match, ask the last R for another match.
3. If there are no more possibilities, drop the last R from the list, and ask what is now the last R for another match.
4. If none of that worked, propose the empty string as a match.
5. Fail.
(This can be simplified with recursion: Match an R, then match an R*. For the next match, first try the next R*; failing that try the next R and the first following R*; when all else fails, try the empty string.)
Implementing that is an interesting programming exercise, so I encourage you to continue. But be aware that there are better algorithms. You might want to read Russ Cox's interesting essays on regular expression matching.
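One way to get the "next match" behaviour without an explicit restart interface is the higher-order-function style mentioned at the start of this answer. The sketch below is an assumption about how that could look, not the asker's design: each expression receives a continuation and offers it every end position it can reach, so a downstream failure automatically makes it try its next alternative. The names Rx, ch, union, concat and star are hypothetical, and the interface deliberately differs from the question's Regexp.match(String).

```java
import java.util.function.IntPredicate;

/** A regular expression that matches from a position and hands each end position to a continuation. */
interface Rx {
    /** Returns true if some match starting at pos ends at a position the continuation k accepts. */
    boolean match(String text, int pos, IntPredicate k);

    /** Full-string match: the continuation only accepts the end of the text. */
    default boolean matches(String text) {
        return match(text, 0, end -> end == text.length());
    }

    static Rx ch(char c) {
        return (t, p, k) -> p < t.length() && t.charAt(p) == c && k.test(p + 1);
    }

    static Rx union(Rx l, Rx r) {
        // Try the left alternative; if the continuation rejects all of its matches, try the right.
        return (t, p, k) -> l.match(t, p, k) || r.match(t, p, k);
    }

    static Rx concat(Rx l, Rx r) {
        // Every end position of l becomes a candidate start position for r.
        return (t, p, k) -> l.match(t, p, mid -> r.match(t, mid, k));
    }

    static Rx star(Rx inner) {
        return new Rx() {
            @Override
            public boolean match(String t, int p, IntPredicate k) {
                // Greedy: try one more repetition first, requiring progress (mid > p)
                // so that an inner expression matching "" cannot loop forever.
                return inner.match(t, p, mid -> mid > p && this.match(t, mid, k))
                        || k.test(p); // finally, propose zero further repetitions
            }
        };
    }
}
```

With this, the expression (ab+a)(bb+a) from above is concat(union(concat(ch('a'), ch('b')), ch('a')), union(concat(ch('b'), ch('b')), ch('a'))), and ba* is concat(ch('b'), star(ch('a'))). Like the procedure described above, this backtracks and can take exponential time on some inputs.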

How to iterate over regexp compliant strings

What is the easiest way to implement a class (in Java) that would serve as an iterator over the set of all values which conform to a given regexp?
Let's say I have a class like this:
public class RegexpIterator
{
private String regexp;
public RegexpIterator(String regexp) {
this.regexp = regexp;
}
public boolean hasNext() {
...
}
public String next() {
...
}
}
How do I implement it? The class assumes some linear ordering on the set of all conforming values and the next() method should return the i-th value when called for the i-th time.
Ideally the solution should support full regexp syntax (as supported by the Java SDK).
To avoid confusion, please note that the class is not supposed to iterate over matches of the given regexp over a given string. Rather it should (eventually) enumerate all string values that conform to the regexp (i.e. would be accepted by the matches() method of a matcher), without any other input string given as argument.
To further clarify the question, let's show a simple example.
RegexpIterator it = new RegexpIterator("ab?cd?e");
while (it.hasNext()) {
System.out.println(it.next());
}
This code snippet should have the following output (the order of lines is not relevant, even though a solution which would list shorter strings first would be preferred).
ace
abce
acde
abcde
Note that with some regexps, such as ab[A-Z]*cd, the set of values over which the class is to iterate is infinite. The preceding code snippet would run forever in these cases.
Do you need to implement a class? This pattern works well:
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher("123, sdfr 123kjkh 543lkj ioj345ljoij123oij");
while (m.find()) {
System.out.println(m.group());
}
output:
123
123
543
345
123
for a more generalized solution:
public static List<String> getMatches(String input, String regex) {
List<String> retval = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
while (m.find()) {
retval.add(m.group());
}
return retval;
}
which then can be used like this:
public static void main(String[] args) {
List<String> matches = getMatches("this matches _all words that _start _with an _underscore", "_[a-z]*");
for (String s : matches) { // List implements the 'iterable' interface
System.out.println(s);
}
}
which produces this:
_all
_start
_with
_underscore
more information about the Matcher class can be found here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html
Here is another working example. It might be helpful :
public class RegxIterator<E> implements Iterator<String> {
private Iterator<E> itr = null;
public RegxIterator(Iterator<E> itr, String regex) {
ArrayList<E> list = new ArrayList<E>();
while (itr.hasNext()) {
E e = itr.next();
if (Pattern.matches(regex, e.toString()))
list.add(e);
}
this.itr = list.iterator();
}
@Override
public boolean hasNext() {
return this.itr.hasNext();
}
@Override
public String next() {
return this.itr.next().toString();
}
}
If you want to use it for other data types (Integer, Float etc., or other classes where toString() is meaningful), declare next() to return Object instead of String. You may then be able to cast the return value back to the actual type.
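Note that the question asks for the strings that conform to the regexp, not for matches inside a given input. A brute-force sketch of that, under loud assumptions: candidates are drawn from a caller-supplied finite alphabet, enumeration stops at a caller-supplied maximum length, and every candidate is simply tested against the pattern. The class name BruteForceRegexIterator and both extra parameters are hypothetical, and the cost grows exponentially with the maximum length.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

/** Enumerates, shortest first, all strings over a fixed alphabet that fully match a regex. */
class BruteForceRegexIterator implements Iterator<String> {
    private final Pattern pattern;
    private final char[] alphabet;
    private final int maxLength;
    private int[] digits = new int[0]; // current candidate as indices into alphabet; null when exhausted
    private String next;

    BruteForceRegexIterator(String regex, String alphabet, int maxLength) {
        if (alphabet.isEmpty()) {
            throw new IllegalArgumentException("alphabet must not be empty");
        }
        this.pattern = Pattern.compile(regex);
        this.alphabet = alphabet.toCharArray();
        this.maxLength = maxLength;
        advance();
    }

    /** Moves to the next candidate that matches, or leaves next as null when exhausted. */
    private void advance() {
        next = null;
        while (digits != null && next == null) {
            StringBuilder sb = new StringBuilder();
            for (int d : digits) {
                sb.append(alphabet[d]);
            }
            if (pattern.matcher(sb).matches()) {
                next = sb.toString();
            }
            increment();
        }
    }

    /** Odometer-style increment; grows the candidate by one character on overflow. */
    private void increment() {
        int i = digits.length - 1;
        while (i >= 0 && digits[i] == alphabet.length - 1) {
            digits[i--] = 0;
        }
        if (i >= 0) {
            digits[i]++;
        } else {
            digits = digits.length < maxLength ? new int[digits.length + 1] : null;
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String result = next;
        advance();
        return result;
    }
}
```

For the example in the question, new BruteForceRegexIterator("ab?cd?e", "abcde", 5) yields ace, abce, acde and abcde, in that order. A solution that supports the full syntax efficiently would have to walk the structure of the regexp itself, which this sketch does not attempt.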

java regex multiple patterns sequential matching

I have a specific question, to which I couldn't find any answer online. Basically, I would like to run a pattern-matching operation on a text, with multiple patterns. However, I do not wish that the matcher gets me the result all at once, but instead that each pattern is called at different stages of the loop, at the same time that specific operations are performed on each of these stages. So for instance, imagining I have Pattern1, Pattern2, and Pattern3, I would like something like:
if (Pattern 1 = true) {
delete Pattern1;
} else if (Pattern 2 = true) {
delete Pattern2;
} else if (Pattern 3 = true) {
replace with 'something';
} ... and so on
(This is just an illustration of the loop, so the syntax is probably not correct.)
My question is then: how can I compile different patterns, while calling them separately?
(I've only seen multiple patterns compiled together and searched together with the help of AND/OR and so on..that's not what I'm looking for unfortunately) Could I save the patterns in an array and call each of them on my loop?
Prepare your Pattern objects pattern1, pattern2, pattern3 and store them in any container (an array or a list). Then loop over this container, switching the Matcher to the current pattern with its usePattern(Pattern newPattern) method at each iteration.
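A minimal sketch of that loop, assuming each stage just replaces the matches of its pattern with a fixed string. The class SequentialPatterns, the method applyAll and the parallel replacements array are hypothetical names for illustration; usePattern, reset, find and replaceAll are the actual Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialPatterns {
    /** Applies each pattern in turn; replacements[i] is what the matches of patterns[i] become. */
    static String applyAll(String text, Pattern[] patterns, String[] replacements) {
        if (patterns.length == 0) {
            return text;
        }
        Matcher m = patterns[0].matcher(text);
        for (int i = 0; i < patterns.length; i++) {
            m.usePattern(patterns[i]); // swap the pattern, keep the same Matcher object
            m.reset(text);             // point it at the current text, from the start
            if (m.find()) {
                // A stage-specific operation could go here; this sketch just replaces.
                text = m.replaceAll(replacements[i]);
            }
        }
        return text;
    }
}
```

Because each stage sees the text produced by the previous one, the order of the patterns in the array decides the result.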
You can make a common interface, and make anonymous implementations that use patterns or whatever else you may want to transform your strings:
interface StringProcessor {
String process(String source);
}
StringProcessor[] processors = new StringProcessor[] {
new StringProcessor() {
private final Pattern p = Pattern.compile("[0-9]+");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // delete
}
return res;
}
}
, new StringProcessor() {
private final Pattern p = Pattern.compile("[a-z]+");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // replace
}
return res;
}
}
, new StringProcessor() {
private final Pattern p = Pattern.compile("[%^#@]{2,5}");
public String process(String source) {
String res = source;
if (p.matcher(source).find()) {
res = ... // do whatever else
}
return res;
}
}
};
String res = "My starting string 123 and more 456";
for (StringProcessor p : processors) {
res = p.process(res);
}
Note that implementations of StringProcessor.process do not need to use regular expressions at all. The loop at the bottom has no idea the regexp is involved in obtaining the results.

avoid code duplication

consider the following code:
if (matcher1.find()) {
String str = line.substring(matcher1.start()+7,matcher1.end()-1);
/*+7 and -1 indicate the prefix and suffix of the matcher... */
method1(str);
}
if (matcher2.find()) {
String str = line.substring(matcher2.start()+8,matcher2.end()-1);
method2(str);
}
...
I have n matchers, and all of them are independent (if one matches, that says nothing about the others). For each matcher that matches, I invoke a different method on the content it matched.
Question: I like neither the code duplication nor the "magic numbers" here, and I'm wondering whether there is a better way to do it (maybe the Visitor pattern?). Any suggestions?
Create an abstract class, and supply the offsets in each subclass (along with the string processing, depending on your requirements).
Then put the subclasses in a list and process the list.
Here is a sample abstract processor:
public abstract class AbstractProcessor {
public void find(Pattern pattern, String line) {
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
process(line.substring(matcher.start() + getStartOffset(), matcher.end() - getEndOffset()));
}
}
protected abstract int getStartOffset();
protected abstract int getEndOffset();
protected abstract void process(String str);
}
Simply mark the part of the regex that you want to pass to the method with a capturing group.
For example if your regex is foo.*bar and you are not interested in foo or bar, make the regex foo(.*)bar. Then always grab the group 1 from the Matcher.
Your code would then look like this:
method1(matcher1.group(1));
method2(matcher2.group(1));
...
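To illustrate why the capturing group makes the magic numbers unnecessary, here is a small sketch using the foo(.*)bar example from above. The class GroupVsOffsets and its two methods are hypothetical; the offsets 3 and 3 are simply the lengths of "foo" and "bar".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupVsOffsets {
    private static final Pattern P = Pattern.compile("foo(.*)bar");

    /** The original style: cut the prefix and suffix off by hand with magic numbers. */
    static String viaOffsets(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        return line.substring(m.start() + 3, m.end() - 3);
    }

    /** The capturing-group style: the regex itself marks the interesting part. */
    static String viaGroup(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

Both methods return the same text for a matching line, but only the second keeps working unchanged if the prefix or suffix in the regex is edited.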
One further step would be to replace your methods with classes implementing an interface like this:
public interface MatchingMethod {
String getRegex();
void apply(String result);
}
Then you can easily automate the task:
for (MatchingMethod mm : getAllMatchingMethods()) {
Pattern p = Pattern.compile(mm.getRegex());
Matcher m = p.matcher(input);
while (m.find()) {
mm.apply(m.group(1));
}
}
Note that if performance is important, then pre-compiling the Pattern can improve runtime if you apply this to many inputs.
You could make it a little bit shorter, but the question is whether this is really worth the effort:
private String getStringFromMatcher(Matcher matcher, int magicNumber) {
return line.substring(matcher.start() + magicNumber, matcher.end() - 1);
}
if (matcher1.find()) {
method1(getStringFromMatcher(matcher1, 7));
}
if (matcher2.find()) {
method2(getStringFromMatcher(matcher2, 8));
}
Use Cochard's solution combined with a factory (a switch statement) covering all the methodX methods, so that you can call it like this:
Factory.CallMethodX(myEnum.MethodX, str)
You can assign myEnum.MethodX in the population step of Cochard's solution.
