Regular expression in Java and in Eclipse? - java

I want to remove all empty lines in Java. In Eclipse I would use:
\n( *)\n (or "\r\n( *)\r\n" on Windows)
But in Java it doesn't work. I used:
str=str.replaceAll("\n( *)\n")
How can I do this in Java using replaceAll? Sample:
package example
○○○○
public ... (where ○ is space)

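I would do it like this:
// Match every line that holds only spaces or tabs, together with its line break.
java.util.regex.Pattern ws = java.util.regex.Pattern.compile("(?m)^[ \\t]*\\r?\\n");
java.util.regex.Matcher matcher = ws.matcher(str);
// Replace with the empty string so the blank lines disappear and the others stay on their own lines.
str = matcher.replaceAll("");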
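A minimal sketch of the same task with String.replaceAll directly, assuming "empty" means a line that contains nothing but spaces or tabs. The class and method names are made up for illustration.

```java
public class BlankLines {
    // Removes lines that are empty or contain only spaces/tabs.
    public static String removeBlankLines(String text) {
        // (?m) lets ^ match at the start of every line;
        // \r?\n covers both Unix and Windows line endings.
        return text.replaceAll("(?m)^[ \\t]*\\r?\\n", "");
    }

    public static void main(String[] args) {
        String src = "package example\n    \npublic class A {}\n";
        System.out.print(removeBlankLines(src));
    }
}
```

Note that replaceAll takes two arguments (the regex and the replacement), and that a blank last line without a trailing line break is left alone by this pattern.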

Related

Replace quote (‘NOA’) using groovy

Can anyone guide me on how to replace these characters (‘ ’) using Groovy or Java?
When I try the code below (I assume this is a single quote), it does not work.
def a = "‘NOA’,’CTF’,’CLM’"
def rep = a.replaceAll("\'","")
My expected Output : NOA,CTF,CLM
Those are curly quotes in your source text. Your replaceAll is replacing straight quotes.
You should have copy-pasted the characters from your source.
System.out.println(
"‘NOA’,’CTF’,’CLM’"
.replaceAll( "‘" , "" )
.replaceAll( "’" , "" )
);
See this code run live at OneCompiler.
NOA,CTF,CLM
I would suggest this:
a.replaceAll("[‘’]", "")
or, even better, escape the Unicode characters in the source code:
a.replaceAll("[\u2018\u2019]", "")
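For reference, the escaped form can be sketched as a complete Java program. The class and method names are hypothetical; U+2018 and U+2019 are the left and right single curly quotes.

```java
public class CurlyQuotes {
    // Removes left (U+2018) and right (U+2019) single curly quotes.
    public static String stripCurlyQuotes(String s) {
        return s.replaceAll("[\u2018\u2019]", "");
    }

    public static void main(String[] args) {
        // Same input as in the question, written with Unicode escapes.
        String a = "\u2018NOA\u2019,\u2019CTF\u2019,\u2019CLM\u2019";
        System.out.println(stripCurlyQuotes(a)); // NOA,CTF,CLM
    }
}
```

Writing the escapes keeps the source file ASCII-only, so the result does not depend on the encoding the compiler assumes.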

Match path string using glob in Java

I have following string as a glob rule:
**/*.txt
And test data:
/foo/bar.txt
/foo/buz.jpg
/foo/oof/text.txt
Is it possible to use the glob rule (without converting the glob to a regex) to match the test data and return the valid entries?
One requirement: Java 1.6
If you have Java 7, you can use FileSystem.getPathMatcher on a FileSystem instance such as the default one:
final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**/*.txt");
This will require converting your strings into instances of Path:
final Path myPath = Paths.get("/foo/bar.txt");
For earlier versions of Java you might get some mileage out of Apache Commons' WildcardFileFilter. You could also try and steal some code from Spring's AntPathMatcher - that's very close to the glob-to-regex approach though.
FileSystem#getPathMatcher(String) is an abstract instance method, so you cannot call it statically on the class. You need to get a FileSystem instance first, e.g. the default one:
PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:**/*.txt");
Some examples:
// file path
PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:**/*.txt");
m.matches(Paths.get("/foo/bar.txt")); // true
m.matches(Paths.get("/foo/bar.txt").getFileName()); // false
// file name only
PathMatcher n = FileSystems.getDefault().getPathMatcher("glob:*.txt");
n.matches(Paths.get("/foo/bar.txt")); // false
n.matches(Paths.get("/foo/bar.txt").getFileName()); // true
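Applied to the test data from the question, the matcher can be used as a filter. This is only a sketch: it needs Java 7 or later (so it does not meet the Java 1.6 requirement), and the class and method names are invented for the example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlobFilter {
    // Returns the entries of 'paths' that match the given glob rule.
    public static List<String> filter(String glob, List<String> paths) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String p : paths) {
            if (matcher.matches(Paths.get(p))) {
                matched.add(p);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("/foo/bar.txt", "/foo/buz.jpg", "/foo/oof/text.txt");
        System.out.println(filter("**/*.txt", data));
    }
}
```

The strings are only parsed into Path objects; nothing has to exist on disk for the match to work.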
To add to the previous answer: org.apache.commons.io.FilenameUtils.wildcardMatch(filename, wildcardMatcher)
from the Apache Commons IO library.

A custom tokenizer for Java

I am developing an application in which I need to process text files containing emails. I need all the tokens from the text and the following is the definition of token:
Alphanumeric
Case-sensitive (case to be preserved)
'!' and '$' are to be considered as constituent characters. Ex: FREE!!, $50 are tokens
'.' (dot) and ',' comma are to be considered as constituent characters if they occur between numbers. For ex:
192.168.1.1, $24,500
are tokens.
and so on..
Please suggest some open-source tokenizers for Java that are easy to customize to suit my needs. Will simply using StringTokenizer and regex be enough? I also have to perform stopping (stop-word removal), which is why I am looking for an open-source tokenizer that also handles extras such as stopping and stemming.
A few comments up front:
From StringTokenizer javadoc:
StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split method of String
or the java.util.regex package instead.
Always use Google first; the first result as of now is JTopas. I have not used it, but it looks like it could work for this.
As for regex, it really depends on your requirements. Given the above, this might work:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Mkt {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("([$\\d.,]+)|([\\w\\d!$]+)");
        String str = "--- FREE!! $50 192.168.1.1 $24,500";
        System.out.println("input: " + str);
        Matcher m = p.matcher(str);
        while(m.find()) {
            System.out.println("token: " + m.group());
        }
    }
}
Here's a sample run:
$ javac Mkt.java && java Mkt
input: --- FREE!! $50 192.168.1.1 $24,500
token: FREE!!
token: $50
token: 192.168.1.1
token: $24,500
Now, you might need to tweak the regex, for example:
You gave $24,500 as an example. Should this work for $24,500abc or $24,500EUR?
You mentioned 192.168.1.1 should be included. Should it also include 192,168.1,1 (given . and , are to be included)?
and I guess there are other things to consider.
Hope this helps to get you started.
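One possible tweak, sketched under the assumption that '.' and ',' belong to a token only when a digit sits directly on both sides. The class name, method name and pattern below are illustrative and not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailTokenizer {
    // A run of letters, digits, '!' or '$', optionally continued by
    // '.' or ',' when that separator has a digit on each side.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\p{Alnum}!$]+(?:(?<=\\d)[.,](?=\\d)[\\p{Alnum}!$]+)*");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group()); // case is preserved as-is
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world. FREE!! $50 192.168.1.1 $24,500."));
    }
}
```

With this pattern, sentence punctuation such as the comma after Hello is dropped, while 192,168.1,1 and $24,500EUR would each still come out as a single token, so the open questions above remain a matter of requirements.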

Java regex to extract version

I am trying to write a Java regex to extract the Java version from the string I get from the java -version command.
If the string is java version 1.7.0_17, I need to extract 1.7 and 17 separately. If the string is java version 1.06_18, I need to extract 1.06 and 18. The first value should contain only the first decimal point (1.7, not 1.7.0). I tried the regex below:
Pattern p = Pattern.compile(".*\"(.+)_(.+)\"");
But it extracts 1.7.0 and 17, and I am not sure how to stop after the first decimal point.
For the test cases you give, you could use this: (\d+\.\d+).*_(\d+)
I re-read your question; my old answer didn't cover stopping at the first decimal point.
This should solve your problem:
Pattern p = Pattern.compile(".*?(\\d+[.]\\d+).*?_(\\d+)");
Tested:
String input = "java version 1.7.0_17";
Pattern pattern = Pattern.compile(".*?(\\d+[.]\\d+).*?_(\\d+)");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String version = matcher.group(1); // 1.7
String update = matcher.group(2); // 17
}
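The same pattern wrapped in a small helper and run against both strings from the question. The class and method names are hypothetical, and returning null for a non-matching line is just one possible choice.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaVersionParser {
    // Group 1: digits, one dot, digits. Group 2: digits after the underscore.
    private static final Pattern VERSION =
            Pattern.compile(".*?(\\d+[.]\\d+).*?_(\\d+)");

    // Returns {version, update}, or null if the line does not match.
    public static String[] parse(String line) {
        Matcher m = VERSION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("java version 1.7.0_17"))); // [1.7, 17]
        System.out.println(Arrays.toString(parse("java version 1.06_18")));  // [1.06, 18]
    }
}
```

A line without an underscore-separated update number does not match and yields null here.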
Old version:
Instead of finding the numbers, I'd get rid of the rest:
String string = "java version 1.7.0_17";
String[] parts = string.split("[\\p{Alpha}\\s_]+", -1);
String version = parts[1]; // 1.7.0
String update = parts[2]; // 17

simple regex in java split

I have a string blah-*-bleh-*-bloh
and I want to split it by -*- so I tried (amongst other things):
res.split("/-\\*-/g");
But it's not working. Does anyone have an idea?
In Java, the regex is a plain string, so there is no need for / before and /g after:
String[] splittedArray = res.split("-\\*-");
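An alternative sketch that avoids hand-escaping: Pattern.quote turns the delimiter into a literal pattern. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnLiteral {
    // Splits on the literal text "-*-"; Pattern.quote escapes the '*'.
    public static String[] splitOnDashStarDash(String s) {
        return s.split(Pattern.quote("-*-"));
    }

    public static void main(String[] args) {
        String res = "blah-*-bleh-*-bloh";
        System.out.println(Arrays.toString(splitOnDashStarDash(res))); // [blah, bleh, bloh]
    }
}
```

This is handy when the delimiter comes from a variable and may contain regex metacharacters.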
