How do I make the method indexOfVowel to return 1 for strings such as translate? I need it to find all the consonant that is before a vowel to be moved to the end. If I changed "notVowel" to 2 the word "translate" should be "anslatetray" but it returns ranslatetay. It only checks the first letter but not the rest of the word.
private String translateWord(String word) {
String translated = "";
int vowel = 0;
int notVowel = 1;
if (vowel == indexOfVowel(word)) {
return (word + "way ");
}
if (notVowel == indexOfVowel(word)) {
return (word.substring(1) + word.substring(vowel, 1) + "ay ");
}
return translated;
}
private static int indexOfVowel(String word) {
int index = 0;
for (int i = 0; i < word.length(); i++) {
if (isVowel(word.charAt(i))) {
return i;
}
}
return word.length();
}
private static boolean isVowel(char ch) {
switch (ch) {
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true;
default:
return false;
}
}
}
I don't think you want indexOfVowel to return 1 specifically under any case. You want it to do what it sounds like it will do...return the index of the first vowel in the word passed to it. If the first vowel happens to be at the beginning of the word, it will return 0, otherwise it will return a non-0 value indicating the location of the first vowel. You need that location to do the right thing elsewhere. So indexOfVowel is correct as you have it.
Your problem is your use of the substring method...understanding how to use it to pick out the right portions of the target word.
Here's a modified version of your code that does the right thing, with comments to explain the two uses of substring:
public class Test {
private static String translateWord(String word) {
int i = indexOfVowel(word);
if (i == 0) { // if word starts with vowel
return (word + "way ");
}
else { // word doesn't start with a vowel
// 'i' is the position of the first vowel
// word.substring(i) = from position of first vowel to end of word
// word.substring(0, i) = from start of word to char before first vowel
return (word.substring(i) + word.substring(0, i) + "ay ");
}
}
private static int indexOfVowel(String word) {
for (int i = 0; i < word.length(); i++) {
if (isVowel(word.charAt(i))) {
return i;
}
}
return word.length();
}
private static boolean isVowel(char ch) {
switch (ch) {
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true;
default:
return false;
}
}
public static void main(String[] args) {
System.out.println(translateWord("action"));
System.out.println(translateWord("translate"));
System.out.println(translateWord("parachute"));
System.out.println(translateWord("scrap"));
}
}
Result:
actionway
anslatetray
arachutepay
apscray
For an assignment we are creating a java program that accepts a java file, fixes messy code and outputs to a new file.
We are to assume there is only one bracket { } per line and that each bracket occurs at the end of the line. If/else statements also use brackets.
I am currently having trouble finding a way to indent every line after an opening bracket until next closing bracket, then decreasing indent after closing bracket until the next opening bracket. We are also required to use the methods below:
Updated code a bit:
public static void processJavaFile() {
}
}
This algorithm should get you started. I left a few glitches that you'll have to fix.
(For example it doesn't indent your { brackets } as currently written, and it adds an extra newline for every semicolon)
The indentation is handled by a 'depth' counter which keeps track of how many 'tabs' to add.
Consider using a conditional for loop instead of a foreach if you want more control over each iteration. (I wrote this quick n' dirty just to give you an idea of how it might be done)
public String parse(String input) {
StringBuilder output = new StringBuilder();
int depth = 0;
boolean isNewLine = false;
boolean wasSpaced = false;
boolean isQuotes = false;
String tab = " ";
for (char c : input.toCharArray()) {
switch (c) {
case '{':
output.append(c + "\n");
depth++;
isNewLine = true;
break;
case '}':
output.append("\n" + c);
depth--;
isNewLine = true;
break;
case '\n':
isNewLine = true;
break;
case ';':
output.append(c);
isNewLine = true;
break;
case '\'':
case '"':
if (!isQuotes) {
isQuotes = true;
} else {
isQuotes = false;
}
output.append(c);
break;
default:
if (c == ' ') {
if (!isQuotes) {
if (!wasSpaced) {
wasSpaced = true;
output.append(c);
}
} else {
output.append(c);
}
} else {
wasSpaced = false;
output.append(c);
}
break;
}
if (isNewLine) {
output.append('\n');
for (int i = 0; i < depth; i++) {
output.append(tab);
}
isNewLine = false;
}
}
return output.toString();
}
This question already has answers here:
How to convert a char array back to a string?
(14 answers)
Closed 9 years ago.
i am doing a porter stemmer.....the code gives me output in char array....but i need to convert that into string to proceed with futher work.....in the code i have given 2 words "looking" and "walks"....that is returned as look and walk(but in char array)...the output is printed in stem() function
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package file;
import java.util.Vector;
/**
*
* #author sky
*/
public class stemmer {
public static String line1;
private char[] b;
private int i, /* offset into b */
i_end, /* offset to end of stemmed word */
j, k;
private static final int INC = 50;
/* unit of size whereby b is increased */
public stemmer()
{
//b = new char[INC];
i = 0;
i_end = 0;
}
/**
* Add a character to the word being stemmed. When you are finished
* adding characters, you can call stem(void) to stem the word.
*/
public void add(char ch)
{
System.out.println("in add() function");
if (i == b.length)
{
char[] new_b = new char[i+INC];
for (int c = 0; c < i; c++)
new_b[c] = b[c];
b = new_b;
}
b[i++] = ch;
}
/** Adds wLen characters to the word being stemmed contained in a portion
* of a char[] array. This is like repeated calls of add(char ch), but
* faster.
*/
public void add(char[] w, int wLen)
{ if (i+wLen >= b.length)
{
char[] new_b = new char[i+wLen+INC];
for (int c = 0; c < i; c++)
new_b[c] = b[c];
b = new_b;
}
for (int c = 0; c < wLen; c++)
b[i++] = w[c];
}
public void addstring(String s1)
{
b=new char[s1.length()];
for(int k=0;k<s1.length();k++)
{
b[k] = s1.charAt(k);
System.out.println(b[k]);
}
i=s1.length();
}
/**
* After a word has been stemmed, it can be retrieved by toString(),
* or a reference to the internal buffer can be retrieved by getResultBuffer
* and getResultLength (which is generally more efficient.)
*/
public String toString() { return new String(b,0,i_end); }
/**
* Returns the length of the word resulting from the stemming process.
*/
public int getResultLength() { return i_end; }
/**
* Returns a reference to a character buffer containing the results of
* the stemming process. You also need to consult getResultLength()
* to determine the length of the result.
*/
public char[] getResultBuffer() { return b; }
/* cons(i) is true <=> b[i] is a consonant. */
private final boolean cons(int i)
{ switch (b[i])
{ case 'a': case 'e': case 'i': case 'o': case 'u': return false;
case 'y': return (i==0) ? true : !cons(i-1);
default: return true;
}
}
/* m() measures the number of consonant sequences between 0 and j. if c is
a consonant sequence and v a vowel sequence, and <..> indicates arbitrary
presence,
<c><v> gives 0
<c>vc<v> gives 1
<c>vcvc<v> gives 2
<c>vcvcvc<v> gives 3
....
*/
private final int m()
{ int n = 0;
int i = 0;
while(true)
{ if (i > j) return n;
if (! cons(i)) break; i++;
}
i++;
while(true)
{ while(true)
{ if (i > j) return n;
if (cons(i)) break;
i++;
}
i++;
n++;
while(true)
{ if (i > j) return n;
if (! cons(i)) break;
i++;
}
i++;
}
}
/* vowelinstem() is true <=> 0,...j contains a vowel */
private final boolean vowelinstem()
{ int i; for (i = 0; i <= j; i++) if (! cons(i)) return true;
return false;
}
/* doublec(j) is true <=> j,(j-1) contain a double consonant. */
private final boolean doublec(int j)
{ if (j < 1) return false;
if (b[j] != b[j-1]) return false;
return cons(j);
}
/* cvc(i) is true <=> i-2,i-1,i has the form consonant - vowel - consonant
and also if the second c is not w,x or y. this is used when trying to
restore an e at the end of a short word. e.g.
cav(e), lov(e), hop(e), crim(e), but
snow, box, tray.
*/
private final boolean cvc(int i)
{ if (i < 2 || !cons(i) || cons(i-1) || !cons(i-2)) return false;
{ int ch = b[i];
if (ch == 'w' || ch == 'x' || ch == 'y') return false;
}
return true;
}
private final boolean ends(String s)
{
int l = s.length();
int o = k-l+1;
if (o < 0)
return false;
for (int i = 0; i < l; i++)
if (b[o+i] != s.charAt(i))
return false;
j = k-l;
return true;
}
/* setto(s) sets (j+1),...k to the characters in the string s, readjusting
k. */
private final void setto(String s)
{ int l = s.length();
int o = j+1;
for (int i = 0; i < l; i++)
b[o+i] = s.charAt(i);
k = j+l;
}
/* r(s) is used further down. */
private final void r(String s) { if (m() > 0) setto(s); }
/* step1() gets rid of plurals and -ed or -ing. e.g.
caresses -> caress
ponies -> poni
ties -> ti
caress -> caress
cats -> cat
feed -> feed
agreed -> agree
disabled -> disable
matting -> mat
mating -> mate
meeting -> meet
milling -> mill
messing -> mess
meetings -> meet
*/
private final void step1()
{
if (b[k] == 's')
{ if (ends("sses")) k -= 2; else
if (ends("ies")) setto("i"); else
if (b[k-1] != 's') k--;
}
if (ends("eed")) { if (m() > 0) k--; } else
if ((ends("ed") || ends("ing")) && vowelinstem())
{ k = j;
if (ends("at")) setto("ate"); else
if (ends("bl")) setto("ble"); else
if (ends("iz")) setto("ize"); else
if (doublec(k))
{ k--;
{ int ch = b[k];
if (ch == 'l' || ch == 's' || ch == 'z') k++;
}
}
else if (m() == 1 && cvc(k)) setto("e");
}
}
/* step2() turns terminal y to i when there is another vowel in the stem. */
private final void step2() { if (ends("y") && vowelinstem()) b[k] = 'i'; }
/* step3() maps double suffices to single ones. so -ization ( = -ize plus
-ation) maps to -ize etc. note that the string before the suffix must give
m() > 0. */
private final void step3() { if (k == 0) return; /* For Bug 1 */ switch (b[k-1])
{
case 'a': if (ends("ational")) { r("ate"); break; }
if (ends("tional")) { r("tion"); break; }
break;
case 'c': if (ends("enci")) { r("ence"); break; }
if (ends("anci")) { r("ance"); break; }
break;
case 'e': if (ends("izer")) { r("ize"); break; }
break;
case 'l': if (ends("bli")) { r("ble"); break; }
if (ends("alli")) { r("al"); break; }
if (ends("entli")) { r("ent"); break; }
if (ends("eli")) { r("e"); break; }
if (ends("ousli")) { r("ous"); break; }
break;
case 'o': if (ends("ization")) { r("ize"); break; }
if (ends("ation")) { r("ate"); break; }
if (ends("ator")) { r("ate"); break; }
break;
case 's': if (ends("alism")) { r("al"); break; }
if (ends("iveness")) { r("ive"); break; }
if (ends("fulness")) { r("ful"); break; }
if (ends("ousness")) { r("ous"); break; }
break;
case 't': if (ends("aliti")) { r("al"); break; }
if (ends("iviti")) { r("ive"); break; }
if (ends("biliti")) { r("ble"); break; }
break;
case 'g': if (ends("logi")) { r("log"); break; }
} }
/* step4() deals with -ic-, -full, -ness etc. similar strategy to step3. */
private final void step4() { switch (b[k])
{
case 'e': if (ends("icate")) { r("ic"); break; }
if (ends("ative")) { r(""); break; }
if (ends("alize")) { r("al"); break; }
break;
case 'i': if (ends("iciti")) { r("ic"); break; }
break;
case 'l': if (ends("ical")) { r("ic"); break; }
if (ends("ful")) { r(""); break; }
break;
case 's': if (ends("ness")) { r(""); break; }
break;
} }
/* step5() takes off -ant, -ence etc., in context <c>vcvc<v>. */
private final void step5()
{ if (k == 0) return; /* for Bug 1 */ switch (b[k-1])
{ case 'a': if (ends("al")) break; return;
case 'c': if (ends("ance")) break;
if (ends("ence")) break; return;
case 'e': if (ends("er")) break; return;
case 'i': if (ends("ic")) break; return;
case 'l': if (ends("able")) break;
if (ends("ible")) break; return;
case 'n': if (ends("ant")) break;
if (ends("ement")) break;
if (ends("ment")) break;
/* element etc. not stripped before the m */
if (ends("ent")) break; return;
case 'o': if (ends("ion") && j >= 0 && (b[j] == 's' || b[j] == 't')) break;
/* j >= 0 fixes Bug 2 */
if (ends("ou")) break; return;
/* takes care of -ous */
case 's': if (ends("ism")) break; return;
case 't': if (ends("ate")) break;
if (ends("iti")) break; return;
case 'u': if (ends("ous")) break; return;
case 'v': if (ends("ive")) break; return;
case 'z': if (ends("ize")) break; return;
default: return;
}
if (m() > 1) k = j;
}
/* step6() removes a final -e if m() > 1. */
private final void step6()
{ j = k;
if (b[k] == 'e')
{ int a = m();
if (a > 1 || a == 1 && !cvc(k-1)) k--;
}
if (b[k] == 'l' && doublec(k) && m() > 1) k--;
}
/** Stem the word placed into the Stemmer buffer through calls to add().
* Returns true if the stemming process resulted in a word different
* from the input. You can retrieve the result with
* getResultLength()/getResultBuffer() or toString().
*/
public void stem()
{
// step1();
// System.out.println("hello in stem");
// step2();
// step3();
// step4();
// step5();
// step6();
//
// i_end = k+1;
// i = 0;
System.out.println(i);
k = i - 1;
if (k > 1)
{
step1();
step2();
step3();
step4();
step5();
step6();
}
for(int c=0;c<=k;c++)
System.out.println(b[c]);
i_end = k+1; i = 0;
}
public static void main(String[] args)
{
stemmer s = new stemmer();
s.addstring("looking");
s.stem();
s.addstring("walks");
s.stem();
//System.out.println("Output " +s.b);
}
}
- Use Character class method toString();
Eg:
class Test
{
public static void main (String[] args) throws java.lang.Exception
{
char c = 'a';
String s = Character.toString(c);
System.out.println(s);
}
}
- Now use this above explained method to convert all the character array items into String.
char[] data = new char[10];
String text = String.valueOf(data);
to convert a char[] to string use this way
String x=new String(char[])
example
char x[]={'a','m'};
String z=new String(x);
System.out.println(z);
output
am
char[] a = new char[10];
for(int i=0;i<10;i++)
{
a[i] = 's';
}
System.out.println(new String(a));
or
System.out.println(String.copyValueOf(a));
I have java program, which will receive plain text from server. The plain text may contain URLs. Is there any Class in Java library to convert plain text to HTML text? Or any other library? If there are not then what is the solution?
You should do some replacements on the text programmatically. Here are some clues:
All Newlines should be converted to "<br>\n" (The \n for better readability of the output).
All CRs should be dropped (who uses DOS encoding anyway).
All pairs of spaces should be replaced with " "
Replace "<" with "<"
Replace "&" with "&"
All other characters < 128 should be left as they are.
All other characters >= 128 should be written as "&#"+((int)myChar)+";", to make them readable in every encoding.
To autodetect your links, you could either use a regex like "http://[^ ]+", or "www.[^ ]" and convert them like JB Nizet said. to ""+url+"", but only after having done all the other replacements.
The code to do this looks something like this:
public static String escape(String s) {
StringBuilder builder = new StringBuilder();
boolean previousWasASpace = false;
for( char c : s.toCharArray() ) {
if( c == ' ' ) {
if( previousWasASpace ) {
builder.append(" ");
previousWasASpace = false;
continue;
}
previousWasASpace = true;
} else {
previousWasASpace = false;
}
switch(c) {
case '<': builder.append("<"); break;
case '>': builder.append(">"); break;
case '&': builder.append("&"); break;
case '"': builder.append("""); break;
case '\n': builder.append("<br>"); break;
// We need Tab support here, because we print StackTraces as HTML
case '\t': builder.append(" "); break;
default:
if( c < 128 ) {
builder.append(c);
} else {
builder.append("&#").append((int)c).append(";");
}
}
}
return builder.toString();
}
However, the link conversion has yet to be added. If someone does it, please update the code.
I found a solution using pattern matching. Here is my code -
String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
Pattern patt = Pattern.compile(str);
Matcher matcher = patt.matcher(plain);
plain = matcher.replaceAll("$1");
And Here are the input and output -
Input text is variable plain:
some text and then the URL http://www.google.com and then some other text.
Output :
some text and then the URL http://www.google.com and then some other text.
Just joined the coded from all answers:
private static String txtToHtml(String s) {
StringBuilder builder = new StringBuilder();
boolean previousWasASpace = false;
for (char c : s.toCharArray()) {
if (c == ' ') {
if (previousWasASpace) {
builder.append(" ");
previousWasASpace = false;
continue;
}
previousWasASpace = true;
} else {
previousWasASpace = false;
}
switch (c) {
case '<':
builder.append("<");
break;
case '>':
builder.append(">");
break;
case '&':
builder.append("&");
break;
case '"':
builder.append(""");
break;
case '\n':
builder.append("<br>");
break;
// We need Tab support here, because we print StackTraces as HTML
case '\t':
builder.append(" ");
break;
default:
builder.append(c);
}
}
String converted = builder.toString();
String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
Pattern patt = Pattern.compile(str);
Matcher matcher = patt.matcher(converted);
converted = matcher.replaceAll("$1");
return converted;
}
Use this
public static String stringToHTMLString(String string) {
StringBuffer sb = new StringBuffer(string.length());
// true if last char was blank
boolean lastWasBlankChar = false;
int len = string.length();
char c;
for (int i = 0; i < len; i++) {
c = string.charAt(i);
if (c == ' ') {
// blank gets extra work,
// this solves the problem you get if you replace all
// blanks with , if you do that you loss
// word breaking
if (lastWasBlankChar) {
lastWasBlankChar = false;
sb.append(" ");
} else {
lastWasBlankChar = true;
sb.append(' ');
}
} else {
lastWasBlankChar = false;
//
// HTML Special Chars
if (c == '"')
sb.append(""");
else if (c == '&')
sb.append("&");
else if (c == '<')
sb.append("<");
else if (c == '>')
sb.append(">");
else if (c == '\n')
// Handle Newline
sb.append("<br/>");
else {
int ci = 0xffff & c;
if (ci < 160)
// nothing special only 7 Bit
sb.append(c);
else {
// Not 7 Bit use the unicode system
sb.append("&#");
sb.append(new Integer(ci).toString());
sb.append(';');
}
}
}
}
return sb.toString();
}
If your plain text is a URL (which is different from containing a hyperlink, as you wrote in your question), then transforming it into a hyperlink in HTML is simply done by
String hyperlink = "<a href='" + url + "'>" + url + "</a>";
In Android application I just implemented HTMLifying of a content ( see https://github.com/andstatus/andstatus/issues/375 ). Actual transformation was done in literary 3 lines of code using Android system libraries. This gives an advantage of using better implementation at each subsequent version of Android libraries.
private static String htmlifyPlain(String textIn) {
SpannableString spannable = SpannableString.valueOf(textIn);
Linkify.addLinks(spannable, Linkify.WEB_URLS);
return Html.toHtml(spannable);
}
Is there a standard (preferably Apache Commons or similarly non-viral) library for doing "glob" type matches in Java? When I had to do similar in Perl once, I just changed all the "." to "\.", the "*" to ".*" and the "?" to "." and that sort of thing, but I'm wondering if somebody has done the work for me.
Similar question: Create regex from glob expression
Globbing is also planned for implemented in Java 7.
See FileSystem.getPathMatcher(String) and the "Finding Files" tutorial.
There's nothing built-in, but it's pretty simple to convert something glob-like to a regex:
public static String createRegexFromGlob(String glob)
{
String out = "^";
for(int i = 0; i < glob.length(); ++i)
{
final char c = glob.charAt(i);
switch(c)
{
case '*': out += ".*"; break;
case '?': out += '.'; break;
case '.': out += "\\."; break;
case '\\': out += "\\\\"; break;
default: out += c;
}
}
out += '$';
return out;
}
this works for me, but I'm not sure if it covers the glob "standard", if there is one :)
Update by Paul Tomblin: I found a perl program that does glob conversion, and adapting it to Java I end up with:
private String convertGlobToRegEx(String line)
{
LOG.info("got line [" + line + "]");
line = line.trim();
int strLen = line.length();
StringBuilder sb = new StringBuilder(strLen);
// Remove beginning and ending * globs because they're useless
if (line.startsWith("*"))
{
line = line.substring(1);
strLen--;
}
if (line.endsWith("*"))
{
line = line.substring(0, strLen-1);
strLen--;
}
boolean escaping = false;
int inCurlies = 0;
for (char currentChar : line.toCharArray())
{
switch (currentChar)
{
case '*':
if (escaping)
sb.append("\\*");
else
sb.append(".*");
escaping = false;
break;
case '?':
if (escaping)
sb.append("\\?");
else
sb.append('.');
escaping = false;
break;
case '.':
case '(':
case ')':
case '+':
case '|':
case '^':
case '$':
case '#':
case '%':
sb.append('\\');
sb.append(currentChar);
escaping = false;
break;
case '\\':
if (escaping)
{
sb.append("\\\\");
escaping = false;
}
else
escaping = true;
break;
case '{':
if (escaping)
{
sb.append("\\{");
}
else
{
sb.append('(');
inCurlies++;
}
escaping = false;
break;
case '}':
if (inCurlies > 0 && !escaping)
{
sb.append(')');
inCurlies--;
}
else if (escaping)
sb.append("\\}");
else
sb.append("}");
escaping = false;
break;
case ',':
if (inCurlies > 0 && !escaping)
{
sb.append('|');
}
else if (escaping)
sb.append("\\,");
else
sb.append(",");
break;
default:
escaping = false;
sb.append(currentChar);
}
}
return sb.toString();
}
I'm editing into this answer rather than making my own because this answer put me on the right track.
Thanks to everyone here for their contributions. I wrote a more comprehensive conversion than any of the previous answers:
/**
* Converts a standard POSIX Shell globbing pattern into a regular expression
* pattern. The result can be used with the standard {#link java.util.regex} API to
* recognize strings which match the glob pattern.
* <p/>
* See also, the POSIX Shell language:
* http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01
*
* #param pattern A glob pattern.
* #return A regex pattern to recognize the given glob pattern.
*/
public static final String convertGlobToRegex(String pattern) {
StringBuilder sb = new StringBuilder(pattern.length());
int inGroup = 0;
int inClass = 0;
int firstIndexInClass = -1;
char[] arr = pattern.toCharArray();
for (int i = 0; i < arr.length; i++) {
char ch = arr[i];
switch (ch) {
case '\\':
if (++i >= arr.length) {
sb.append('\\');
} else {
char next = arr[i];
switch (next) {
case ',':
// escape not needed
break;
case 'Q':
case 'E':
// extra escape needed
sb.append('\\');
default:
sb.append('\\');
}
sb.append(next);
}
break;
case '*':
if (inClass == 0)
sb.append(".*");
else
sb.append('*');
break;
case '?':
if (inClass == 0)
sb.append('.');
else
sb.append('?');
break;
case '[':
inClass++;
firstIndexInClass = i+1;
sb.append('[');
break;
case ']':
inClass--;
sb.append(']');
break;
case '.':
case '(':
case ')':
case '+':
case '|':
case '^':
case '$':
case '#':
case '%':
if (inClass == 0 || (firstIndexInClass == i && ch == '^'))
sb.append('\\');
sb.append(ch);
break;
case '!':
if (firstIndexInClass == i)
sb.append('^');
else
sb.append('!');
break;
case '{':
inGroup++;
sb.append('(');
break;
case '}':
inGroup--;
sb.append(')');
break;
case ',':
if (inGroup > 0)
sb.append('|');
else
sb.append(',');
break;
default:
sb.append(ch);
}
}
return sb.toString();
}
And the unit tests to prove it works:
/**
* #author Neil Traft
*/
public class StringUtils_ConvertGlobToRegex_Test {
#Test
public void star_becomes_dot_star() throws Exception {
assertEquals("gl.*b", StringUtils.convertGlobToRegex("gl*b"));
}
#Test
public void escaped_star_is_unchanged() throws Exception {
assertEquals("gl\\*b", StringUtils.convertGlobToRegex("gl\\*b"));
}
#Test
public void question_mark_becomes_dot() throws Exception {
assertEquals("gl.b", StringUtils.convertGlobToRegex("gl?b"));
}
#Test
public void escaped_question_mark_is_unchanged() throws Exception {
assertEquals("gl\\?b", StringUtils.convertGlobToRegex("gl\\?b"));
}
#Test
public void character_classes_dont_need_conversion() throws Exception {
assertEquals("gl[-o]b", StringUtils.convertGlobToRegex("gl[-o]b"));
}
#Test
public void escaped_classes_are_unchanged() throws Exception {
assertEquals("gl\\[-o\\]b", StringUtils.convertGlobToRegex("gl\\[-o\\]b"));
}
#Test
public void negation_in_character_classes() throws Exception {
assertEquals("gl[^a-n!p-z]b", StringUtils.convertGlobToRegex("gl[!a-n!p-z]b"));
}
#Test
public void nested_negation_in_character_classes() throws Exception {
assertEquals("gl[[^a-n]!p-z]b", StringUtils.convertGlobToRegex("gl[[!a-n]!p-z]b"));
}
#Test
public void escape_carat_if_it_is_the_first_char_in_a_character_class() throws Exception {
assertEquals("gl[\\^o]b", StringUtils.convertGlobToRegex("gl[^o]b"));
}
#Test
public void metachars_are_escaped() throws Exception {
assertEquals("gl..*\\.\\(\\)\\+\\|\\^\\$\\#\\%b", StringUtils.convertGlobToRegex("gl?*.()+|^$#%b"));
}
#Test
public void metachars_in_character_classes_dont_need_escaping() throws Exception {
assertEquals("gl[?*.()+|^$#%]b", StringUtils.convertGlobToRegex("gl[?*.()+|^$#%]b"));
}
#Test
public void escaped_backslash_is_unchanged() throws Exception {
assertEquals("gl\\\\b", StringUtils.convertGlobToRegex("gl\\\\b"));
}
#Test
public void slashQ_and_slashE_are_escaped() throws Exception {
assertEquals("\\\\Qglob\\\\E", StringUtils.convertGlobToRegex("\\Qglob\\E"));
}
#Test
public void braces_are_turned_into_groups() throws Exception {
assertEquals("(glob|regex)", StringUtils.convertGlobToRegex("{glob,regex}"));
}
#Test
public void escaped_braces_are_unchanged() throws Exception {
assertEquals("\\{glob\\}", StringUtils.convertGlobToRegex("\\{glob\\}"));
}
#Test
public void commas_dont_need_escaping() throws Exception {
assertEquals("(glob,regex),", StringUtils.convertGlobToRegex("{glob\\,regex},"));
}
}
There are couple of libraries that do Glob-like pattern matching that are more modern than the ones listed:
Theres Ants Directory Scanner
And
Springs AntPathMatcher
I recommend both over the other solutions since Ant Style Globbing has pretty much become the standard glob syntax in the Java world (Hudson, Spring, Ant and I think Maven).
I recently had to do it and used \Q and \E to escape the glob pattern:
private static Pattern getPatternFromGlob(String glob) {
return Pattern.compile(
"^" + Pattern.quote(glob)
.replace("*", "\\E.*\\Q")
.replace("?", "\\E.\\Q")
+ "$");
}
This is a simple Glob implementation which handles * and ? in the pattern
public class GlobMatch {
private String text;
private String pattern;
public boolean match(String text, String pattern) {
this.text = text;
this.pattern = pattern;
return matchCharacter(0, 0);
}
private boolean matchCharacter(int patternIndex, int textIndex) {
if (patternIndex >= pattern.length()) {
return false;
}
switch(pattern.charAt(patternIndex)) {
case '?':
// Match any character
if (textIndex >= text.length()) {
return false;
}
break;
case '*':
// * at the end of the pattern will match anything
if (patternIndex + 1 >= pattern.length() || textIndex >= text.length()) {
return true;
}
// Probe forward to see if we can get a match
while (textIndex < text.length()) {
if (matchCharacter(patternIndex + 1, textIndex)) {
return true;
}
textIndex++;
}
return false;
default:
if (textIndex >= text.length()) {
return false;
}
String textChar = text.substring(textIndex, textIndex + 1);
String patternChar = pattern.substring(patternIndex, patternIndex + 1);
// Note the match is case insensitive
if (textChar.compareToIgnoreCase(patternChar) != 0) {
return false;
}
}
// End of pattern and text?
if (patternIndex + 1 >= pattern.length() && textIndex + 1 >= text.length()) {
return true;
}
// Go on to match the next character in the pattern
return matchCharacter(patternIndex + 1, textIndex + 1);
}
}
Similar to Tony Edgecombe's answer, here is a short and simple globber that supports * and ? without using regex, if anybody needs one.
public static boolean matches(String text, String glob) {
String rest = null;
int pos = glob.indexOf('*');
if (pos != -1) {
rest = glob.substring(pos + 1);
glob = glob.substring(0, pos);
}
if (glob.length() > text.length())
return false;
// handle the part up to the first *
for (int i = 0; i < glob.length(); i++)
if (glob.charAt(i) != '?'
&& !glob.substring(i, i + 1).equalsIgnoreCase(text.substring(i, i + 1)))
return false;
// recurse for the part after the first *, if any
if (rest == null) {
return glob.length() == text.length();
} else {
for (int i = glob.length(); i <= text.length(); i++) {
if (matches(text.substring(i), rest))
return true;
}
return false;
}
}
It may be a slightly hacky approach. I've figured it out from NIO2's Files.newDirectoryStream(Path dir, String glob) code. Pay attention that every match new Path object is created. So far I was able to test this only on Windows FS, however, I believe it should work on Unix as well.
// a file system hack to get a glob matching
PathMatcher matcher = ("*".equals(glob)) ? null
: FileSystems.getDefault().getPathMatcher("glob:" + glob);
if ("*".equals(glob) || matcher.matches(Paths.get(someName))) {
// do you stuff here
}
UPDATE
Works on both - Mac and Linux.
The previous solution by Vincent Robert/dimo414 relies on Pattern.quote() being implemented in terms of \Q...\E, which is not documented in the API and therefore may not be the case for other/future Java implementations. The following solution removes that implementation dependency by escaping all occurrences of \E instead of using quote(). It also activates DOTALL mode ((?s)) in case the string to be matched contains newlines.
public static Pattern globToRegex(String glob)
{
return Pattern.compile(
"(?s)^\\Q" +
glob.replace("\\E", "\\E\\\\E\\Q")
.replace("*", "\\E.*\\Q")
.replace("?", "\\E.\\Q") +
"\\E$"
);
}
I don't know about a "standard" implementation, but I know of a sourceforge project released under the BSD license that implemented glob matching for files. It's implemented in one file, maybe you can adapt it for your requirements.
There is sun.nio.fs.Globs but it is not part of the public API.
You can use it indirectly via:
FileSystems.getDefault().getPathMatcher("glob:<myPattern>")
But it returns PathMatcher, which is inconvenient to work with. Since it can accept only Path as parameter (not File).
One possible option is to convert the PathMatcher to regex pattern (just call its 'toString()' method).
Another option is to use dedicated Glob library like glob-library-java.
Long ago I was doing a massive glob-driven text filtering so I've written a small piece of code (15 lines of code, no dependencies beyond JDK).
It handles only '*' (was sufficient for me), but can be easily extended for '?'.
It is several times faster than pre-compiled regexp, does not require any pre-compilation (essentially it is a string-vs-string comparison every time the pattern is matched).
Code:
public static boolean miniglob(String[] pattern, String line) {
if (pattern.length == 0) return line.isEmpty();
else if (pattern.length == 1) return line.equals(pattern[0]);
else {
if (!line.startsWith(pattern[0])) return false;
int idx = pattern[0].length();
for (int i = 1; i < pattern.length - 1; ++i) {
String patternTok = pattern[i];
int nextIdx = line.indexOf(patternTok, idx);
if (nextIdx < 0) return false;
else idx = nextIdx + patternTok.length();
}
if (!line.endsWith(pattern[pattern.length - 1])) return false;
return true;
}
}
Usage:
public static void main(String[] args) {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
try {
// read from stdin space separated text and pattern
for (String input = in.readLine(); input != null; input = in.readLine()) {
String[] tokens = input.split(" ");
String line = tokens[0];
String[] pattern = tokens[1].split("\\*+", -1 /* want empty trailing token if any */);
// check matcher performance
long tm0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; ++i) {
miniglob(pattern, line);
}
long tm1 = System.currentTimeMillis();
System.out.println("miniglob took " + (tm1-tm0) + " ms");
// check regexp performance
Pattern reptn = Pattern.compile(tokens[1].replace("*", ".*"));
Matcher mtchr = reptn.matcher(line);
tm0 = System.currentTimeMillis();
for (int i = 0; i < 1000000; ++i) {
mtchr.matches();
}
tm1 = System.currentTimeMillis();
System.out.println("regexp took " + (tm1-tm0) + " ms");
// check if miniglob worked correctly
if (miniglob(pattern, line)) {
System.out.println("+ >" + line);
}
else {
System.out.println("- >" + line);
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
Copy/paste from here
By the way, it seems as if you did it the hard way in Perl
This does the trick in Perl:
my #files = glob("*.html")
# Or, if you prefer:
my #files = <*.html>