Replace quote (‘NOA’) using groovy - java

Can anyone guide me on how to replace these characters (‘ ’) using Groovy or Java?
When I try the code below (I assume this is a single quote), it doesn't work.
def a = "‘NOA’,’CTF’,’CLM’"
def rep = a.replaceAll("\'","")
My expected output: NOA,CTF,CLM

Those are curly quotes (U+2018 LEFT SINGLE QUOTATION MARK and U+2019 RIGHT SINGLE QUOTATION MARK) in your source text. Your replaceAll is replacing straight quotes (U+0027).
You should have copy-pasted the characters from your source.
System.out.println(
"‘NOA’,’CTF’,’CLM’"
.replaceAll( "‘" , "" )
.replaceAll( "’" , "" )
);
See this code run live at OneCompiler.
NOA,CTF,CLM

I would suggest this:
a.replaceAll("[‘’]", "")
or, even better, escape the Unicode characters in the source code:
a.replaceAll("[\u2018\u2019]", "")
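For reference, a minimal Java sketch of the same character-class approach. The class and method names are illustrative only. The regex escapes are written with doubled backslashes so the regex engine decodes them itself, and the input is built from escapes so the source file stays ASCII-only, which sidesteps the copy-paste problem entirely.

```java
import java.util.regex.Pattern;

class CurlyQuoteStripper {
    // Character class for U+2018 and U+2019. The doubled backslashes pass
    // the escape text through to the regex engine, which decodes it.
    private static final Pattern CURLY_QUOTES = Pattern.compile("[\\u2018\\u2019]");

    static String stripCurlyQuotes(String input) {
        return CURLY_QUOTES.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes.
        String a = "\u2018NOA\u2019,\u2019CTF\u2019,\u2019CLM\u2019";
        System.out.println(stripCurlyQuotes(a)); // NOA,CTF,CLM
    }
}
```

Straight apostrophes (U+0027) are left untouched by this pattern, so it only removes the curly variants.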

Related

Groovy remove beginning of path

I'm trying to delete the beginning of a path that contains '\' and ' ' characters. I keep getting an error about an escape character issue at character 3.
Example:
SomePath: C:\Users\ADMINISTRATOR\App Play\blah\blah
SomePath.replaceFirst('C:\\Users\\ADMINISTRATOR\\App Play\\', '');
The path should then be blah\blah
I've tried:
SomePath.replaceFirst("C:\Users\ADMINISTRATOR\App Play\", "");
SomePath.replaceFirst("C:\\Users\\ADMINISTRATOR\\App Play\\", "");
SomePath.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
SomePath.replaceAll("C:\Users\ADMINISTRATOR\App Play\", "");
SomePath.replaceAll("C:\\Users\\ADMINISTRATOR\\App Play\\", "");
SomePath.replaceAll("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
Just gave it a try... the examples with four backslashes work for me:
def somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah"
println somePath
println somePath.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "")  // prints blah\blah
The problem is that the string literal needs one backslash to escape each \, and since replaceFirst takes a regular expression, the regex engine needs another \ to escape that \. The result is four backslashes.
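A small Java sketch comparing the four-backslash regex with two alternatives that avoid the double escaping: Pattern.quote, which makes the regex engine treat the prefix literally, and a plain startsWith/substring check that uses no regex at all. The class and method names are hypothetical, and the hard-coded prefix is the one from the question.

```java
import java.util.regex.Pattern;

class PathPrefixDemo {
    // In Java source each "\\\\" is two backslash characters, which the
    // regex engine reads as one escaped, literal backslash.
    static String stripWithEscapedRegex(String path) {
        return path.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
    }

    // Pattern.quote wraps the text in \Q...\E, so the regex engine treats
    // every character literally; only Java's own escaping is left.
    static String stripWithQuotedRegex(String path, String prefix) {
        return path.replaceFirst(Pattern.quote(prefix), "");
    }

    // No regex at all: remove the prefix only if the path starts with it.
    static String stripLiteralPrefix(String path, String prefix) {
        return path.startsWith(prefix) ? path.substring(prefix.length()) : path;
    }

    public static void main(String[] args) {
        String somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah";
        String prefix = "C:\\Users\\ADMINISTRATOR\\App Play\\";
        System.out.println(stripWithEscapedRegex(somePath));        // blah\blah
        System.out.println(stripWithQuotedRegex(somePath, prefix)); // blah\blah
        System.out.println(stripLiteralPrefix(somePath, prefix));   // blah\blah
    }
}
```

The startsWith variant also makes the intent explicit: it only strips the prefix when it is actually at the start of the path, whereas replaceFirst would remove the first occurrence anywhere in the string.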
Btw: you can use string operations to get your path, but you could also try file operations like this:
def root= new File("C:\\Users\\ADMINISTRATOR\\App Play\\")
def full= new File("C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah")
def relPath = root.toPath().relativize( full.toPath() ).toFile()
println relPath
(taken from https://gist.github.com/ysb33r/5804364)
You can tackle this problem differently. You could tokenize your input path using \ as a delimiter and then either pick the last 2 elements (blah and blah) or skip the first 4 elements (C:, Users, ADMINISTRATOR, App Play). It depends on which assumption is easier for you to make. Consider the following example:
def somePath = 'C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah'
// Build a new path by accepting the last 2 parts of the initial path
assert 'blah\\blah' == somePath.tokenize('\\')[-2..-1].join('\\')
// Build a new path by skipping the first 4 parts from initial path
assert 'blah\\blah' == somePath.tokenize('\\').drop(4).join('\\')
The first option works better if you want only the last two parts of the initial path. The second option works better if you can expect a final path like blah\blah\blahhhh, because you don't know how many nested children the initial path contains and you want to start building the new path right after \App Play\.
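A rough Java equivalent of the two Groovy one-liners, as a sketch with hypothetical helper names. One difference to be aware of: Groovy's tokenize drops empty tokens, whereas String.split keeps leading and interior empty strings (it only removes trailing ones), so a path with doubled backslashes would behave differently. The helpers also do no bounds checking.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PathSegments {
    // Split on a literal backslash; Pattern.quote spares us the "\\\\" regex.
    private static String[] segments(String path) {
        return path.split(Pattern.quote("\\"));
    }

    // Keep the last n segments, like tokenize('\\')[-2..-1] for n = 2.
    static String lastSegments(String path, int n) {
        String[] parts = segments(path);
        return String.join("\\", Arrays.copyOfRange(parts, parts.length - n, parts.length));
    }

    // Skip the first n segments, like tokenize('\\').drop(4) for n = 4.
    static String dropSegments(String path, int n) {
        String[] parts = segments(path);
        return String.join("\\", Arrays.copyOfRange(parts, n, parts.length));
    }

    public static void main(String[] args) {
        String somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah";
        System.out.println(lastSegments(somePath, 2)); // blah\blah
        System.out.println(dropSegments(somePath, 4)); // blah\blah
    }
}
```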

Java replace/replaceAll strange behavior

I can't figure out what I'm missing here. Both replace and replaceAll from java.lang.String are generating a question mark (?) after each occurrence:
String str = "ABCD DKABCED DLS ABC";
System.out.println("str='"+str+"'");
System.out.println("str.replaceAll(\"ABC\", \"A\\\\${BC}​\" ) => " + str.replaceAll("ABC", "A\\${BC}​" ));
System.out.println("str.replace(\"ABC\", \"A${BC}​\" ) => " + str.replace("ABC", "A${BC}​" ));
Generates the following output:
str='ABCD DKABCED DLS ABC'
str.replaceAll("ABC", "A\\${BC}?" ) => A${BC}?D DKA${BC}?ED DLS A${BC}?
str.replace("ABC", "A${BC}?" ) => A${BC}?D DKA${BC}?ED DLS A${BC}?
Does anybody know why?
EDIT:
Just for the record: the problem is that there really WAS a character after the braces.
After copying and pasting into Notepad++ I could see }?" in the text, but not in NetBeans.
So it was purely an encoding misunderstanding.
I suspect this is a character encoding problem. When I pasted your code into Eclipse (on Windows), it could not save the code and complained about the character set:
Some characters cannot be mapped using "Cp1252" character encoding.
When I retyped it in from scratch, the problem went away:
String str = "ABCD DKABCED DLS ABC";
System.out.println("str='" + str + "'");
System.out.println(str.replace("ABC", "A${BC}"));
produces the following (without extra ? marks):
str='ABCD DKABCED DLS ABC'
A${BC}D DKA${BC}ED DLS A${BC}
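If you take the hexdump of a normal } you get 7d.
But for the } character in your code, I get 7d e2 80 8b: the } followed by e2 80 8b, which is the UTF-8 encoding of U+200B ZERO WIDTH SPACE, an invisible character that your console prints as ?.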
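To spot an invisible character like this without a hex editor, one option is to print every code point of the suspect string. A minimal sketch (the class and method names are illustrative); the string literal uses an explicit escape for U+200B so the character is visible in the source:

```java
class CodePointDump {
    // Returns each code point as "U+XXXX", separated by spaces,
    // so invisible characters become visible.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Replacement text from the question, with the hidden U+200B made explicit.
        String suspicious = "A${BC}\u200B";
        System.out.println(suspicious.length()); // 7, not the 6 you would expect
        System.out.println(describe(suspicious));
        // U+0041 U+0024 U+007B U+0042 U+0043 U+007D U+200B
    }
}
```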
That would be because your replacement string contains an extra character after the } (displayed as a question mark). Thus replace and replaceAll are simply doing exactly what you are telling them to do.

Complex Java Regular Expression with Nested Groupings

I am trying to write a regular expression in Java that matches the strings below, but I can't seem to get it right.
This is my latest attempt:
Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
This is what I want to match:
hello
hello/world
hello/big/world
hello/big/world/
This is what I don't want matched:
/
/hello
hello//world
hello/big//world
I'd appreciate any insight into what I am doing wrong :)
Try anchoring the regex with ^ and $, so that it has to match the whole input:
Pattern.compile( "^[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?$" );
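The anchors only make a difference when the pattern is used with Matcher.find(), which looks for a match anywhere in the input; Matcher.matches() already requires the whole input to match. A small sketch using the question's regex (the class and field names are hypothetical):

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?");
    static final Pattern ANCHORED = Pattern.compile("^[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?$");

    public static void main(String[] args) {
        String input = "/hello";
        // find() accepts a partial match, so it finds "hello" inside "/hello".
        System.out.println(UNANCHORED.matcher(input).find());    // true
        // matches() must consume the whole input, so it rejects "/hello".
        System.out.println(UNANCHORED.matcher(input).matches()); // false
        // With ^ and $, find() also has to cover the whole input.
        System.out.println(ANCHORED.matcher(input).find());      // false
    }
}
```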
Doesn't your regex require a question mark at the end?
I always write unit tests for my regexes so I can fiddle with them until they pass.
// your exact regex:
final Pattern regex = Pattern.compile( "[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?" );
// your exact examples:
final String[]
good = { "hello", "hello/world", "hello/big/world", "hello/big/world/" },
bad = { "/", "/hello", "hello//world", "hello/big//world"};
for (String goodOne : good) System.out.println(regex.matcher(goodOne).matches());
for (String badOne : bad) System.out.println(!regex.matcher(badOne).matches());
prints a solid column of true values.
Put another way: your regex is perfectly fine just as it is.
It looks like what you're trying to capture is being overwritten on each quantified iteration. Just rearrange the parentheses:
# "[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?"
[A-Za-z0-9]+
( # (1 start)
(?: / [A-Za-z0-9]+ )*
) # (1 end)
/?
Or, with no capturing groups at all:
# "[A-Za-z0-9]+(?:/[A-Za-z0-9]+)*/?"
[A-Za-z0-9]+
(?: / [A-Za-z0-9]+ )*
/?
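A sketch of the difference in what group 1 holds, using the question's regex and the rearranged one (class and method names are hypothetical). With the quantifier outside the group, group 1 keeps only the last iteration; wrapping the whole repetition in one group keeps all of it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Quantifier outside group 1: each iteration overwrites the capture.
    static final Pattern REPEATED_GROUP = Pattern.compile("[A-Za-z0-9]+(/[A-Za-z0-9]+)*/?");
    // Group 1 wraps the whole repetition via a non-capturing inner group.
    static final Pattern WRAPPED_GROUP = Pattern.compile("[A-Za-z0-9]+((?:/[A-Za-z0-9]+)*)/?");

    // Group 1 after a full match; null if the input does not match
    // or the group did not participate.
    static String group1(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(group1(REPEATED_GROUP, "hello/big/world")); // /world
        System.out.println(group1(WRAPPED_GROUP, "hello/big/world"));  // /big/world
    }
}
```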

Split on Regular Expression per Path

If I have this:
thisisgibberish 1234 /hello/world/
more gibberish 43/7 /good/timing/
just onemore 8888 /thanks/mate
What would the regular expression inside the Java String.split() method be to obtain the paths, one per line?
i.e.:
[0]: /hello/world/
[1]: /good/timing/
[2]: /thanks/mate
Doing
myString.split("/[a-zA-Z]")
causes a split at every /h, /w, /g, /t, and /m.
How would I go about writing a regular expression to split it only once per line while only capturing the paths?
Thanks in advance.
Why split? I think running a match here is better. Try the following expression:
(?<=\s)/[a-zA-Z/]+
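A sketch of the match-based approach in Java, collecting every Matcher.find() hit into a list (the class and method names are hypothetical). It puts the + on the character class, /[a-zA-Z/]+; a repeated group such as (/[a-zA-Z/])+ would need a slash before every single letter and would match only /h in /hello/world/. The lookbehind requires whitespace before the path, which is what skips the 43/7 on the second line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathFinder {
    // A slash preceded by whitespace, then a run of letters and slashes.
    private static final Pattern PATH = Pattern.compile("(?<=\\s)/[a-zA-Z/]+");

    static List<String> findPaths(String text) {
        List<String> paths = new ArrayList<>();
        Matcher m = PATH.matcher(text);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) {
        String text = "thisisgibberish 1234 /hello/world/\n"
                + "more gibberish 43/7 /good/timing/\n"
                + "just onemore 8888 /thanks/mate";
        System.out.println(findPaths(text)); // [/hello/world/, /good/timing/, /thanks/mate]
    }
}
```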
This uses split():
String[] split = myString.split(myString.substring(0, myString.lastIndexOf(" ")));
OR
myString.split(myString.substring(0, myString.lastIndexOf(" ")))[1]; //works for current inputs
You must first remove the leading junk, then split on the intervening junk:
String[] paths = str.replaceAll("^.*? (?=/[a-zA-Z])", "")
.split("(?m)((?<=[a-zA-Z]/|[a-zA-Z])\\s|^).*? (?=/[a-zA-Z])");
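One important point here is the use of (?m), the embedded flag for MULTILINE mode, which makes ^ match at the start of every line instead of only at the start of the input, so the ^ alternative can apply across the newlines. (It does not make the dot match newlines; that is the (?s) flag.)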
Here's some test code:
String str = "thisisgibberish 1234 /hello/world/\nmore gibberish 43/7 /good/timing/\njust onemore 8888 /thanks/mate";
String[] paths = str.replaceAll("^.*? (?=/[a-zA-Z])", "")
.split("(?m)((?<=[a-zA-Z]/|[a-zA-Z])\\s|^).*? (?=/[a-zA-Z])");
System.out.println( Arrays.toString( paths));
Output (achieving requirements):
[/hello/world/, /good/timing/, /thanks/mate]

Use Java regex to find all strings that start with '#' and end with ' ', and do not include ' ' and '#'

I need to get all non-empty strings that start with # and end with ' ' (a space) in the string below:
String s = "#test1 #test2 #test3 #test4 ## #test5";
I want to get the strings "test1", "test2", "test3", "test4", and "test5".
How can I do this with a Java regex? Thanks a lot!
You can use the following regex:
#\w+
\w is equivalent to [a-zA-Z\d_] (in Java, unless the UNICODE_CHARACTER_CLASS flag is set)
\w+ matches one or more characters from [a-zA-Z\d_]
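Note that #\w+ includes the # in each match. A minimal sketch (hypothetical class and method names) that puts \w+ in a capturing group and reads group 1 to get the text without the #. Because this pattern does not require a trailing space, test5 is included:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashTags {
    // '#' followed by one or more word characters; group 1 excludes the '#'.
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    static List<String> tags(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "#test1 #test2 #test3 #test4 ## #test5";
        System.out.println(tags(s)); // [test1, test2, test3, test4, test5]
    }
}
```

The "##" in the input produces no match, because neither # is followed by a word character.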
The Java regex (?<=#)[^# ]+(?= ) should do the trick. According to Regex Planet's Java regex page, that regex matches test1, test2, test3 and test4. (#test5 does not end with a space, so test5 is not matched.)
If you're OK with including the leading # in each match (and with test5 matching even though no space follows it), you can get away with the simpler Java regex #[^# ]+.
Finally, I solved it with the code below:
Pattern pattern = Pattern.compile("#\\p{L}+");
