Java Regex - Match a word on a line only if it is not inside a string

For my project, I want to match a class declaration using a regex, but not when it appears inside a string literal.
For example:
class fo{
void foo{
System.out.println("example writing of class");
System.out.println("class cls{");
}
}
So I expect a result like this:
class fo{
I have tried to create a pattern, but it is not working:
Pattern.compile("\\b(?!(\"))class\\s+\\w+\\{\\b(?!(\"))")

Try this regex:
(?:^|\\n)\\s*class.*
Explanation:
(?:^|\\n) # start of input or a newline
\\s* # any amount of leading whitespace
class # the literal text 'class'
.* # all characters up to the end of the line
Hope it helps.
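A minimal sketch of the line-anchored idea in Java, assuming that every real class declaration starts its own line. The names ClassLineFinder and findClassLines are illustrative and not from the question. It uses Pattern.MULTILINE so that ^ matches at the start of each line instead of spelling out (?:^|\n), and it requires a name and an opening brace after the keyword:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClassLineFinder {
    // ^ matches at every line start because of MULTILINE.
    // [ \t]* allows indentation without crossing into the previous line.
    private static final Pattern CLASS_LINE =
            Pattern.compile("^[ \\t]*class\\s+\\w+\\s*\\{", Pattern.MULTILINE);

    // Returns each matched declaration with its indentation removed.
    static List<String> findClassLines(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = CLASS_LINE.matcher(source);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "class fo{\n"
                + "void foo{\n"
                + "System.out.println(\"example writing of class\");\n"
                + "System.out.println(\"class cls{\");\n"
                + "}\n"
                + "}";
        System.out.println(findClassLines(source));
    }
}
```

This only works because the string literals in the example never begin a line with the word class. A string that spans lines, or a declaration that shares a line with other code, would need a real tokenizer.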

Related

Scala RegEx String extractors behaving inconsistently

I have two regular expression extractors.
One for .java files and the other is for .scala files
val JavaFileRegEx =
"""\S*
\s+
//
\s{1}
([^\.java]+)
\.java
""".replaceAll("(\\s)", "").r
val ScalaFileRegEx =
"""\S*
\s+
//
\s{1}
([^\.scala]+)
\.scala
""".replaceAll("(\\s)", "").r
I want to use these extractors above to extract a java file name and a scala file name from the example code below.
val string1 = " // Tester.java"
val string2 = " // Hello.scala"
string1 match {
  case JavaFileRegEx(fileName1) => println(" Java file: " + fileName1)
  case other => println(other + "--NO_MATCH")
}
string2 match {
  case ScalaFileRegEx(fileName2) => println(" Scala file: " + fileName2)
  case other => println(other + "--NO_MATCH")
}
I get this output indicating that the .java file matched but the .scala file did not.
Java file: Tester
// Hello.scala--NO_MATCH
How is it that the Java file matched but the .scala file did not?
NOTE
[] denotes a character class. It matches exactly one character from the listed set.
[^] denotes a negated character class. It matches any single character except those listed.
In your first regex
\S*\s+//\s{1}([^\.java]+)\.java
\S* matches nothing because the string starts with a space
\s+ matches the leading space
// matches // literally
\s{1} matches the next space
You are using [^\.java], which says: match anything except . or j or a or v (the second a is redundant), so it can be written as [^.jav].
So, the remaining string to be tested is
Tester.java
(Un)luckily, no character in Tester is ., j, a, or v, so the class keeps matching until it reaches the dot. Tester is captured and then \.java matches.
In your second regex
\S*\s+//\s{1}([^\.scala]+)\.scala
\S* matches nothing because the string starts with a space
\s+ matches the leading space
// matches // literally
\s{1} matches the next space
Now you are using [^\.scala], which says: match anything except . or s or c or a or l (the second a is redundant), so it can be written as [^.scla].
The remaining string is
Hello.scala
but (un)luckily Hello contains l, which the character class excludes, so the class stops after He, \.scala cannot match at that point, and the regex fails.
How to correct it?
I will modify your regex only slightly:
\S*\s+//\s{1}([^.]*)\.java
The group ([^.]*) now says: match anything except a dot.
You can also use \w here instead of [^.].
\S*\s+//\s{1}([^.]*)\.scala
There is no need for {1} in \s{1}. You can simply write \s, which matches exactly one whitespace character:
\S*\s+//\s([^.]*)\.java
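The difference between the two character classes can be checked with a small Java sketch (Java rather than Scala, to keep the examples on this page in one language; the class and method names are illustrative). Matcher.matches() requires the whole input to match, which is also how a Scala regex extractor behaves in a pattern match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameExtractor {
    // Original pattern: the class excludes '.', 's', 'c', 'a' and 'l'.
    static final Pattern BROKEN_SCALA =
            Pattern.compile("\\S*\\s+//\\s([^\\.scala]+)\\.scala");

    // Corrected pattern: the class excludes only the dot.
    static final Pattern FIXED_SCALA =
            Pattern.compile("\\S*\\s+//\\s([^.]*)\\.scala");

    // Returns the captured file name, or null if the line does not match.
    static String extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = " // Hello.scala";
        System.out.println(extract(BROKEN_SCALA, line)); // no match: 'l' is excluded
        System.out.println(extract(FIXED_SCALA, line));  // captures the name
    }
}
```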

Need help understanding unexpected output from Java regular expression

package com.j;
public class Program {
    public static void main(String[] args) {
        System.out.println(Puzzel.class.getName().replaceAll(".", "/")
                + ".class");
        System.out.println(Program.class.getName());
    }
}
In the above program I was expecting the output com/j/Program.class,
but it prints //////.class. Why?
The first argument of replaceAll is treated as a regular expression, where . means "any character", so every character of the class name is replaced with / and the output becomes
////////////.class
To get the expected answer, escape the . in the expression:
Puzzel.class.getName().replaceAll("\\.", "/") + ".class"
Then the output will be what you expected:
com/j/Puzzel.class
Because . is a special character in a regex, you should escape it with a backslash.
replaceAll() takes a regular expression as its first argument. Your code says to replace every character (.) with a /. You need replaceAll("\\.", "/"): the Java string literal "\\." is the regex \., which matches a literal dot.
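A short sketch comparing the options, using the class name from the answers as plain input text (DotReplaceDemo and its method names are illustrative). String.replace takes its arguments literally, so it avoids the escaping question entirely:

```java
class DotReplaceDemo {
    // Regex version: "\\." in Java source is the regex \. (a literal dot).
    static String withRegex(String className) {
        return className.replaceAll("\\.", "/") + ".class";
    }

    // Non-regex version: String.replace does no pattern matching at all.
    static String withLiteral(String className) {
        return className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String name = "com.j.Puzzel";
        // Unescaped dot: every one of the 12 characters becomes a slash.
        System.out.println(name.replaceAll(".", "/") + ".class");
        System.out.println(withRegex(name));
        System.out.println(withLiteral(name));
    }
}
```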

A custom tokenizer for Java

I am developing an application in which I need to process text files containing emails. I need all the tokens from the text and the following is the definition of token:
Alphanumeric
Case-sensitive (case to be preserved)
'!' and '$' are to be considered as constituent characters. Ex: FREE!!, $50 are tokens
'.' (dot) and ',' comma are to be considered as constituent characters if they occur between numbers. For ex:
192.168.1.1, $24,500
are tokens.
and so on..
Please suggest some open-source tokenizers for Java that are easy to customize to suit my needs. Will simply using StringTokenizer and a regex be enough? I also have to perform stopping, which is why I was looking for an open-source tokenizer that also handles extra steps like stopping and stemming.
A few comments up front:
From StringTokenizer javadoc:
StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split method of String
or the java.util.regex package instead.
Always search first: the top Google result as of now is JTopas. I have not used it, but it looks like it could work for this.
As for regex, it really depends on your requirements. Given the above, this might work:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Mkt {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("([$\\d.,]+)|([\\w\\d!$]+)");
        String str = "--- FREE!! $50 192.168.1.1 $24,500";
        System.out.println("input: " + str);
        Matcher m = p.matcher(str);
        while(m.find()) {
            System.out.println("token: " + m.group());
        }
    }
}
Here's a sample run:
$ javac Mkt.java && java Mkt
input: --- FREE!! $50 192.168.1.1 $24,500
token: FREE!!
token: $50
token: 192.168.1.1
token: $24,500
Now, you might need to tweak the regex, for example:
You gave $24,500 as an example. Should this work for $24,500abc or $24,500EUR?
You mentioned 192.168.1.1 should be included. Should it also include 192,168.1,1 (given . and , are to be included)?
and I guess there are other things to consider.
Hope this helps to get you started.
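One possible tweak, sketched under the assumption that "alphanumeric" means ASCII letters and digits: the pattern in Mkt would also emit a lone "," or "." as a token, because [$\d.,]+ accepts them anywhere. The variant below accepts '.' and ',' only when a digit sits on both sides, using a lookbehind and a lookahead. The class name EmailTokenizer and the extra "Hello, world." input are illustrative and not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailTokenizer {
    // A token is a run of units, where each unit is either
    //   - a letter, a digit, '!' or '$', or
    //   - a '.' or ',' that has a digit directly before and after it.
    private static final Pattern TOKEN =
            Pattern.compile("(?:[A-Za-z0-9!$]|(?<=\\d)[.,](?=\\d))+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "--- FREE!! $50 192.168.1.1 $24,500 Hello, world.";
        for (String token : tokenize(text)) {
            System.out.println("token: " + token);
        }
    }
}
```

With this pattern the trailing comma in "Hello," and the full stop in "world." are dropped, while 192.168.1.1 and $24,500 stay whole. Stop-word removal and stemming would still be separate steps applied to the resulting list.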

Use java regex to find all strings that start with '#' and end with ' ' , and not include ' ' and '#'

I need to get all non-empty strings that start with # and end with ' ' (space) in the String below:
String s = "#test1 #test2 #test3 #test4 ## #test5";
I hope to get the strings "test1", "test2", "test3", "test4", "test5".
How can I do it with a Java regex? Thanks a lot!
You can use the following regex
#\w+
\w is equivalent to [a-zA-Z\d_] by default
\w+ matches one or more characters from [a-zA-Z\d_]
The Java regex (?<=#)[^# ]+(?= ) should do the trick. According to Regex Planet's Java regex page that regex matches test1, test2, test3 and test4. (#test5 does not end with a space, so test5 is not matched.)
If you're OK with the match including the leading # and not requiring the trailing space, you can get away with the simpler Java regex #[^# ]+.
Finally I solved it with the code below:
Pattern pattern = Pattern.compile("#\\p{L}+");
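Note that \p{L} matches letters only, so on the sample string that pattern stops before the digits and yields #test each time. A hedged alternative sketch that returns the names without the leading #, using a capturing group (HashTagExtractor and extractTags are illustrative names, and treating end of input like a trailing space is an assumption based on the expected "test5"):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTagExtractor {
    // '#' followed by one or more characters that are neither '#' nor whitespace.
    // Group 1 holds the text after the '#'.
    private static final Pattern TAG = Pattern.compile("#([^#\\s]+)");

    static List<String> extractTags(String s) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(s);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        String s = "#test1 #test2 #test3 #test4 ## #test5";
        System.out.println(extractTags(s));
    }
}
```

The empty "##" entry produces no match because each # there is followed by another # or a space.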

Need regex to format file in php

I have a java file that I want to post online. I am using php to format the file.
Does anyone know the regex to turn the comments blue?
INPUT:
/*****
*This is the part
*I want to turn blue
*for my class
*******************/
class MyClass{
String s;
}
Thanks.
Naive version:
$formatted = preg_replace('|(/\*.*?\*/)|s', '<span class="blue">$1</span>', $java_code_here);
... not tested, YMMV, etc...
In general, you won't be able to parse specific parts of a Java file using only regular expressions - Java is not a regular language. If your file has additional structure (such as "it always begins with a comment followed by a newline, followed by a class definition"), you can generate a regular expression for such a case. For instance, you'd match /\*+(.*?)\*+/$, where . is assumed to match multiple lines, and $ matches the end of a line.
In general, to make a regex work, you first define what patterns you want to find (rigorously, but in spoken language), and then translate that to standard regular expression notation.
Good luck.
A regex that can parse simple quotes should be able to find comments in C/C++ style languages.
I assume Java is of that type.
This is a Perl FAQ sample by someone else, although I added the part about // style comments (with or without line continuation) and reformatted it.
It basically does a global search and replace. Data is replaced verbatim if it is not a comment; otherwise the comment is wrapped in your color formatting tags.
You should be able to adapt this to PHP, and it is expanded for clarity (maybe too much clarity though).
s{
    ## Comments, group 1:
    (
        /\*                 ## Start of /* ... */ comment
        [^*]*\*+            ## Non-* followed by 1-or-more *'s
        (?:
            [^/*][^*]*\*+
        )*                  ## 0-or-more things which don't start with /
                            ## but do end with '*'
        /                   ## End of /* ... */ comment
      |
        //                  ## Start of // ... comment
        (?:
            [^\\]           ## Any Non-Continuation character ^\
          |                 ## OR
            \\\n?           ## Any Continuation character followed by 0-1 newline \n
        )*?                 ## To be done 0-many times, stopping at the first end of comment
        \n                  ## End of // comment
    )
  |                         ## OR, various things which aren't comments, group 2:
    (
        " (?: \\. | [^"\\] )* "   ## Double quoted text
      |
        ' (?: \\. | [^'\\] )* '   ## Single quoted text
      |
        .                   ## Any other char
        [^/"'\\]*           ## Chars which don't start a comment, string, escape
    )                       ## or continuation (escape + newline)
}
{defined $2 ? $2 : "<some color>$1</some color>"}gxse;
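The same idea can be sketched in Java (used here instead of PHP only to keep the examples on this page in one language; the class name CommentHighlighter and the blue span markup are illustrative). Matching string and character literals as their own alternative keeps comment markers inside strings from being colored. This is a simplified sketch: it does not handle line continuations in // comments and does not HTML-escape the source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentHighlighter {
    private static final Pattern TOKEN = Pattern.compile(
            // group 1: a /* ... */ block comment or a // line comment
            "(/\\*.*?\\*/|//[^\\n]*)"
            // group 2: a double-quoted or single-quoted literal with escapes
            + "|(\"(?:\\\\.|[^\"\\\\])*\"|'(?:\\\\.|[^'\\\\])*')",
            Pattern.DOTALL); // lets .*? run across line breaks

    static String highlight(String source) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Comments are wrapped; literals are copied back unchanged.
            String replacement = m.group(1) != null
                    ? "<span class=\"blue\">" + m.group(1) + "</span>"
                    : m.group(2);
            // quoteReplacement keeps '$' and '\' in the source from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "/*****\n*This is the part\n*******************/\n"
                + "class MyClass{\nString s = \"/* not a comment */\";\n}";
        System.out.println(highlight(source));
    }
}
```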
