Java code regex not working

I am trying to add a validation utility to a program I have already created. This utility will analyze a file to ensure it meets proper parameters. I have finished the analysis for the first line and am running into a problem regarding whitespaces. The file that I am using for testing includes this line:
1459875655257 05112345678945612345678941EMMAM BANK OF AMERICA, NA BAC
The block of code that tests this line and is generating the problem is as follows:
if(immediateDestName.isEmpty()){
    JOptionPane.showMessageDialog(null,
        "ERROR: File header immediate destination name is missing!");
} else {
    if((sCurrentLine.substring(40,63)).matches("A-Za-z0-9 ]+")){
    }else{
        JOptionPane.showMessageDialog(null,
            "ERROR: File header immediate destination name is invalid: "
            +sCurrentLine.substring(40,63));
    }
}
When I run the validation utility, it pops up that JOptionPane, reporting that the name is invalid. FYI, the actual substring is EMMAM followed by a run of spaces (hence the problem).
Can someone please help me figure out why my program raises this alert even though, as far as I can see, the substring matches the regex?

Use
if (sCurrentLine.substring(40,63).matches("[A-Za-z0-9 ]+")) { .. }
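I believe you missed the opening bracket. Without it, A-Za-z0-9 is read as literal text rather than as a character class, so the padded name cannot match. With the bracket, the pattern matches one or more characters that are each A-Z, a-z, 0-9, or a space.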
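As a quick check of the difference, here is a minimal sketch. The class name NameFieldCheck and the helper isValidName are made up for illustration, and the 23-character padded field stands in for sCurrentLine.substring(40,63).

```java
final class NameFieldCheck {
    // Hypothetical helper: true when the field holds only letters, digits, or spaces.
    static boolean isValidName(String field) {
        return field.matches("[A-Za-z0-9 ]+");
    }

    public static void main(String[] args) {
        // "EMMAM" padded with spaces to 23 characters, like substring(40, 63).
        String field = String.format("%-23s", "EMMAM");

        // Without '[' the pattern means: the literal text "A-Za-z0-9 " followed by one or more ']'.
        System.out.println(field.matches("A-Za-z0-9 ]+")); // false
        System.out.println(isValidName(field));            // true
    }
}
```

Note that String.matches always tests the whole string, so no anchors are needed here.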

BYACCJ: How do I include line number in my error message?

This is my current error handling function:
public void yyerror(String error) {
System.err.println("Error: "+ error);
}
This is the default error function I found on the BYACC/J homepage. I can't find any way to add the line number. My question is similar to this question, but its solution doesn't work here.
For my lexer I am using a JFlex file.
It's not that different from the bison/flex solution proposed in the question you link. At least, the principle is the same. Only the details differ.
The key fact is that it is the scanner, not the parser, which needs to count lines, because it is the scanner which converts the input text into tokens. The parser knows nothing about the original text; it just receives a sequence of nicely-processed tokens.
So we have to scour the documentation for JFlex to figure out how to get it to track line numbers, and then we find the following in the section on options and declarations:
%line
Turns line counting on. The int member variable yyline contains the number of lines (starting with 0) from the beginning of input to the beginning of the current token.
The JFlex manual doesn't mention that yyline is a private member variable, so in order to get at it from the parser you need to add something like the following to your JFlex file:
%line
%{
    public int GetLine() { return yyline + 1; }
    // ...
%}
You can then add a call to GetLine in the error function:
public void yyerror (String error) {
System.err.println ("Error at line " + lexer.GetLine() + ": " + error);
}
That will sometimes produce confusing error messages, because by the time yyerror is called, the parser has already requested the lookahead token, which may be on the line following the error or even separated from the error by several lines of comments. (This problem often shows up when the error is a missing statement terminator.) But it's a good start.

java.util.regex.Pattern doesn't agree with online regex debugger

I'm working on a regex for a program that should detect a certain exe, called gruell[something].exe.
So I ended up with the following regex:
gruell.*\.exe[^\.]
After testing on both these sites my test cases are detected properly
https://regex101.com/
https://regexr.com/
My test set: (and what should fail and pass)
gruell-Core.exe [PASS]
Gruell.exe [PASS]
gruell_x64.exe [PASS]
Gruell_x64-core.exe [PASS]
grull.exe [FAIL]
gruell_____.exe [PASS]
gruell_installer.msi [FAIL]
gruell.html [FAIL]
.gruell.exe.398sn [FAIL]
gru-ell.exe [FAIL]
When I run this on my machine using java.util.regex.Pattern, it does not find anything, even though the folder I told it to scan contains both:
gruell.exe
.gruell.exe.398sn
Now the interesting part is when I remove [^.] it will detect, however, it detects the .gruell.exe.398sn as well, which is what I don't want.
Code in question:
File f = new File("G:\\dev\\gruell");
recursive_scan(f);
The function:
static void recursive_scan(File location)
{
    for (File file : location.listFiles())
    {
        if (file.isDirectory())
        {
            recursive_scan(file);
        }
        else
        {
            Pattern pattern = Pattern.compile("gruell.*\\.exe[^\\.]", Pattern.CASE_INSENSITIVE);
            if (pattern.matcher(file.getName()).find())
            {
                System.out.println("FOUND: " + file.getName());
            }
        }
    }
}
"After testing on both [regex101 and RegExr] my test cases are detected properly"
That seems unlikely, since your pattern is indeed faulty, not only in Java's Regex dialect but also in the ones tested by those sites. The only plausible explanation I see is that you were not actually testing the cases you think you were. For example, your test inputs may have had trailing spaces or newlines.
Which brings me to the problem with your pattern. As you already observe,
"Now the interesting part is when I remove [^.] it will detect,"
That's because that sub-expression must match one character (any character other than .). Your overall pattern therefore does not match "gruell-Core.exe", because there is no character after the .exe. Try matching "gruell-Core.exee" instead.
If you want your matches to end with .exe, then anchor your pattern instead: gruell.*\.exe$
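To see the effect of the trailing [^.] next to the anchored version, here is a small sketch using find(), as in the question. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class ExeSuffixDemo {
    // The pattern from the question: needs one more non-dot character after ".exe".
    static final Pattern WITH_TRAILING_CLASS =
            Pattern.compile("gruell.*\\.exe[^\\.]", Pattern.CASE_INSENSITIVE);

    // The anchored pattern: ".exe" has to sit at the end of the input.
    static final Pattern ANCHORED =
            Pattern.compile("gruell.*\\.exe$", Pattern.CASE_INSENSITIVE);

    // find() looks for the pattern anywhere in the name.
    static boolean found(Pattern pattern, String name) {
        return pattern.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"gruell-Core.exe", "gruell-Core.exee", ".gruell.exe.398sn"};
        for (String name : names) {
            System.out.println(name
                    + " trailing-class=" + found(WITH_TRAILING_CLASS, name)
                    + " anchored=" + found(ANCHORED, name));
        }
    }
}
```

Only "gruell-Core.exee" satisfies the original pattern, while only "gruell-Core.exe" satisfies the anchored one.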
Alright, thanks to the site provided by John Bollinger (https://www.regexplanet.com/advanced/java/index.html) I was able to find two things that were wrong here.
First, I had to use:
pattern.matcher(file.getName()).matches()
Instead of what I had:
pattern.matcher(file.getName()).find()
Second, I had to remove [^.] from the end of the pattern, since matches() already requires the whole file name to match.
From:
"gruell.*\\.exe[^.]"
To:
"gruell.*\\.exe"
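For reference, a short sketch that runs the final pattern with matches() over the test set from the question. The class name GruellMatcher and the helper isGruellExe are made up for illustration.

```java
import java.util.regex.Pattern;

final class GruellMatcher {
    static final Pattern EXE = Pattern.compile("gruell.*\\.exe", Pattern.CASE_INSENSITIVE);

    // matches() requires the whole name to fit the pattern; find() accepts any substring.
    static boolean isGruellExe(String fileName) {
        return EXE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "gruell-Core.exe", "Gruell.exe", "gruell_x64.exe", "Gruell_x64-core.exe",
            "grull.exe", "gruell_____.exe", "gruell_installer.msi", "gruell.html",
            ".gruell.exe.398sn", "gru-ell.exe"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isGruellExe(name) ? "PASS" : "FAIL"));
        }
    }
}
```

With find() instead of matches(), the same pattern would still accept ".gruell.exe.398sn", because "gruell.exe" occurs inside that name.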

Parse file and delete bracket content using Java

I have a configuration file which has multiple records that follow the syntax below:
# some comment
Job {
Name = "Job1"
Include {
Where = /etc
}
}
I'm writing a Java program to parse this file. If the user chooses to delete "Job1", the program should delete the entire "Job1" block, brackets included.
The hard part is finding the closing bracket that matches the first opening one, since, as shown, other brackets are nested inside it. Sometimes records also do not follow the syntax exactly, like:
# some comment
Job {
Include {
Where = /etc
}
Name = "Job1"
}
So it makes the parsing even harder. Could anyone give me some ideas? Thanks.
You cannot do that reliably with regular expressions, because the brackets nest. You need a grammar-based parser, such as one generated by ANTLR.
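If pulling in a parser generator is too heavy, a hand-written bracket counter may be enough for this file format. The sketch below is not ANTLR and not a full parser. It assumes that { and } never appear inside quoted values or comments, and that the Name entry sits on its own line directly inside the top-level block. The class name JobBlockRemover and the method removeJob are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class JobBlockRemover {
    // Returns the lines with every top-level block removed whose own Name entry equals jobName.
    static List<String> removeJob(List<String> lines, String jobName) {
        List<String> kept = new ArrayList<>();
        List<String> block = new ArrayList<>(); // lines of the top-level block being read
        String nameLine = "Name = \"" + jobName + "\"";
        int depth = 0;           // number of brackets currently open
        boolean matched = false; // true once the current block is known to be the target

        for (String line : lines) {
            boolean inBlock = depth > 0 || line.indexOf('{') >= 0;
            if (!inBlock) {
                kept.add(line); // comments and anything else outside a block
                continue;
            }
            if (depth == 1 && line.trim().equals(nameLine)) {
                matched = true; // Name entry directly inside the top-level block
            }
            block.add(line);
            for (char c : line.toCharArray()) {
                if (c == '{') {
                    depth++;
                } else if (c == '}' && depth > 0) {
                    depth--;
                }
            }
            if (depth == 0) { // the bracket that opened the block has been closed
                if (!matched) {
                    kept.addAll(block);
                }
                block.clear();
                matched = false;
            }
        }
        kept.addAll(block); // unbalanced input: keep whatever was left over
        return kept;
    }

    public static void main(String[] args) {
        List<String> config = Arrays.asList(
                "# some comment", "Job {", "Include {", "Where = /etc", "}",
                "Name = \"Job1\"", "}");
        removeJob(config, "Job1").forEach(System.out::println);
    }
}
```

Because the whole block is buffered until its closing bracket, it does not matter whether Name comes before or after the nested Include section.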

java: count opening and closing tag pair

I have below text
`h1` text `/h1` `i` text `/i` `u` text `/u`
Here the pairs h1 /h1, i /i, and u /u are all complete, so this text should pass. Now take this text:
`h1` text `/h1` `i` text `/i` `u` text `/u
Here the closing /u tag is incomplete, so the u /u pair is missing and this text should fail.
I tried this
String startTags[] = {"`b`","`h1`","`h2`","`h3`","`h4`","`h5`","`h6`","`ul`","`li`","`i`","`u`"};
String endTags[] = {"`/b`","`/h1`","`/h2`","`/h3`","`/h4`","`/h5`","`/h6`","`/ul`","`/li`","`/i`","`/u`"};
for(int i=0;i<startTags.length;i++){
    if(str.indexOf(startTags[i])!=-1){
        System.out.println(">>>>"+startTags[i]);
        startTagCount++;
    }
    if(str.indexOf(endTags[i])!=-1){
        System.out.println("+++"+endTags[i]);
        endTagCount++;
    }
}
if(startTagCount==endTagCount){
    //TEXT IS OK
}else{
    // TEXT FAILED
}
It passes the text below instead of failing it:
`h5`Is your question about programming? `/h5`
`b` bbbbbbbbbbbbbb`/b`
`b` bbbbbbbbbbbbbb`/b
Is there a better solution, or a regex, in Java?
I'm afraid this problem cannot be solved by (strict) regular expressions, because the language you describe is not a regular language. It extends the language {a^n b^n}, which is a well-known non-regular language.
If all you care about is making sure all opening tags have matching closing tags, then you can use regular expressions.
Your code has a logic problem, in that you count all opening tags and all closing tags, but don't check if the opening tags and closing tags actually match. The startTagCount and endTagCount variables are not sufficient. I would suggest using a map, using the tag type as a key and the value as the count. Increment count on open tag, decrement count on close tag. Check for non-zero after scanning is complete.
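The map idea could look roughly like the sketch below. It counts every occurrence of each tag rather than testing only whether a tag appears at least once. The class name TagBalance is made up, and the tag pattern is an assumption: any lowercase name between backticks is treated as a tag, which is broader than the fixed tag list in the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagBalance {
    // A tag is a name between backticks, with an optional leading '/' for a closing tag.
    private static final Pattern TAG = Pattern.compile("`(/?)([a-z][a-z0-9]*)`");

    // True when every tag type has as many closing tags as opening tags.
    static boolean isBalanced(String text) {
        Map<String, Integer> open = new HashMap<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            int delta = m.group(1).isEmpty() ? 1 : -1; // +1 for open, -1 for close
            open.merge(m.group(2), delta, Integer::sum);
        }
        for (int count : open.values()) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("`h1` text `/h1` `i` text `/i` `u` text `/u`"));
        System.out.println(isBalanced("`h1` text `/h1` `i` text `/i` `u` text `/u"));
    }
}
```

A closing tag with its final backtick missing is not recognized as a tag at all, so its opening tag stays unmatched and the text fails.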
What is the grammar of this "language"? Your approach might not be proper validation. For example, this HTML has matching tag counts but is invalid:
<b><i>Invalid</b></i>
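If nesting order matters, counting is not enough and a stack is the usual tool. The following is only a sketch under assumptions: it recognizes simple angle-bracket tags without attributes, as in the example above, and the class name NestingCheck is made up. The same idea would apply to the backtick notation by changing the pattern.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestingCheck {
    // Simple tags only: <name> or </name>, no attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-z][a-z0-9]*)>");

    // True when every closing tag closes the most recently opened tag
    // and nothing is left open at the end.
    static boolean isProperlyNested(String text) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                open.push(name); // opening tag
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;    // closing tag does not match the innermost open tag
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isProperlyNested("<b><i>Invalid</b></i>"));
        System.out.println(isProperlyNested("<b><i>Valid</i></b>"));
    }
}
```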

ANTLR4 Lexer error reporting (length of offending characters)

I'm developing a small IDE for some language using ANTLR4 and need to underline erroneous characters when the lexer fails to match them. The built in org.antlr.v4.runtime.ANTLRErrorListener implementation outputs a message to stderr in such cases, similar to this:
line 35:25 token recognition error at: 'foo\n'
I have no problem understanding how information about line and column of the error is obtained (passed as arguments to syntaxError callback), but how do I get the 'foo\n' string inside the callback?
When a parser is the source of the error, it passes the offending token as the second argument of syntaxError callback, so it becomes trivial to extract information about the start and stop offsets of the erroneous input and this is also explained in the reference book. But what about the case when the source is a lexer? The second argument in the callback is null in this case, presumably since the lexer failed to form a token.
I need the length of unmatched characters to know how much to underline, but while debugging my listener implementation I could not find this information anywhere in the supplied callback arguments (other than extracting it from the supplied error message through string manipulation, which would be just wrong). The 'foo\n' string may clearly be obtained somehow, so what am I missing?
I suspect that I might be looking in the wrong place and that I should be looking at extending DefaultErrorStrategy where error messages get formed.
You should write your lexer such that a syntax error is impossible. In ANTLR 4, it is easy to do this by simply adding the following as the last rule of your lexer:
ErrorChar : . ;
By doing this, your errors are moved from the lexer to the parser.
In some cases, you can take additional steps to help users while they edit code in your IDE. For example, suppose your language supports double-quoted strings of the following form, which cannot span multiple lines:
StringLiteral : '"' ~[\r\n"]* '"';
You can improve error reporting in your IDE by using the following pair of rules:
StringLiteral : '"' ~[\r\n"]* '"';
UnterminatedStringLiteral : '"' ~[\r\n"]*;
You can then override the emit() method to treat the UnterminatedStringLiteral in a special way. As a result, the user sees a great error message and the parser sees a single StringLiteral token that it can generally handle well.
@Override
public Token emit() {
    switch (getType()) {
    case UnterminatedStringLiteral:
        setType(StringLiteral);
        Token result = super.emit();
        // you'll need to define this method
        reportError(result, "Unterminated string literal");
        return result;
    default:
        return super.emit();
    }
}
