Java replace/replaceAll strange behavior


I can't figure out what I'm missing here. Both replace and replaceAll from java.lang.String are generating a question mark (?) after each occurrence:
String str = "ABCD DKABCED DLS ABC";
System.out.println("str='"+str+"'");
System.out.println("str.replaceAll(\"ABC\", \"A\\\\${BC}\" ) => " + str.replaceAll("ABC", "A\\${BC}" ));
System.out.println("str.replace(\"ABC\", \"A${BC}\" ) => " + str.replace("ABC", "A${BC}" ));
Generates the following output:
str='ABCD DKABCED DLS ABC'
str.replaceAll("ABC", "A\\${BC}?" ) => A${BC}?D DKA${BC}?ED DLS A${BC}?
str.replace("ABC", "A${BC}?" ) => A${BC}?D DKA${BC}?ED DLS A${BC}?
Does anybody know why?
EDITED:
Just for the record: the problem is that there really WAS a character after the braces.
After copying and pasting into Notepad++ I could see the }?" text. NetBeans did not show it.
So it was purely an encoding misunderstanding.

I suspect this is a character encoding problem. When I pasted your code into Eclipse (on Windows) it could not save the code, complaining about the character set:
Some characters cannot be mapped using "Cp1252" character encoding.
When I retyped it in from scratch, the problem went away:
String str = "ABCD DKABCED DLS ABC";
System.out.println("str='" + str + "'");
System.out.println(str.replace("ABC", "A${BC}"));
produces the following (without extra ? marks):
str='ABCD DKABCED DLS ABC'
A${BC}D DKA${BC}ED DLS A${BC}
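If you take the hexdump of a normal }, you get 7d.
But for the } character in your code, I get 7d e2 80 8b. The extra bytes e2 80 8b are the UTF-8 encoding of U+200B (ZERO WIDTH SPACE), an invisible character, which the console most likely printed as ?.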
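A small sketch for spotting such invisible characters in Java: utf8Hex prints a string's UTF-8 bytes, and stripZeroWidth removes U+200B before the string is used. The pasted replacement is simulated with a \u200B escape, on the assumption (based on the dump above) that the stray character is a zero-width space; the helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class ZeroWidthCheck {
    // Hex dump of a string's UTF-8 bytes, separated by spaces, e.g. "}" -> "7d"
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Removes every U+200B ZERO WIDTH SPACE character
    static String stripZeroWidth(String s) {
        return s.replace("\u200B", "");
    }

    public static void main(String[] args) {
        // Assumed reconstruction of the pasted string: "}" followed by an invisible U+200B
        String pasted = "A${BC}\u200B";
        System.out.println(utf8Hex(pasted.substring(pasted.length() - 2))); // 7d e2 80 8b
        System.out.println("ABCD DKABCED DLS ABC".replace("ABC", stripZeroWidth(pasted)));
        // A${BC}D DKA${BC}ED DLS A${BC}
    }
}
```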

That would be because you have question marks in your replacement string. Thus replace and replaceAll are simply doing exactly what you are telling them to do.
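Independently of the invisible character, the question's code escapes the $ for replaceAll ("A\\${BC}") but not for replace. That difference is expected: replace treats both arguments literally, while replaceAll interprets $ in the replacement as a group reference, so an unescaped ${BC} refers to a named capturing group BC that the pattern "ABC" does not have. A minimal sketch with the question's strings; Matcher.quoteReplacement is a standard JDK method that adds the escaping, and the helper names are illustrative.

```java
import java.util.regex.Matcher;

public class DollarReplacement {
    // String.replace treats the target and the replacement as plain text
    static String literalReplace(String s) {
        return s.replace("ABC", "A${BC}");
    }

    // String.replaceAll parses $ in the replacement as a group reference,
    // so Matcher.quoteReplacement escapes $ (and any backslashes) first
    static String regexReplace(String s) {
        return s.replaceAll("ABC", Matcher.quoteReplacement("A${BC}"));
    }

    public static void main(String[] args) {
        String str = "ABCD DKABCED DLS ABC";
        System.out.println(literalReplace(str)); // A${BC}D DKA${BC}ED DLS A${BC}
        System.out.println(regexReplace(str));   // A${BC}D DKA${BC}ED DLS A${BC}
        try {
            // Unescaped: ${BC} names a group "BC" that does not exist in the pattern
            str.replaceAll("ABC", "A${BC}");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```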

Related

Replace quote (‘NOA’) using groovy

Can anyone guide me on how to replace this char (‘ ’) using Groovy or Java?
When I try the code below (I assume this is a single quote), it doesn't work.
def a = "‘NOA’,’CTF’,’CLM’"
def rep = a.replaceAll("\'","")
My expected output: NOA,CTF,CLM

Those are curly quotes in your source text. Your replaceAll is replacing straight quotes.
You should have copy-pasted the characters from your source.
System.out.println(
"‘NOA’,’CTF’,’CLM’"
.replaceAll( "‘" , "" )
.replaceAll( "’" , "" )
);
NOA,CTF,CLM

I would suggest this:
a.replaceAll("[‘’]", "")
or, even better, escape the Unicode characters in the source code:
a.replaceAll("[\u2018\u2019]", "")
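The same fix in plain Java, as a minimal sketch: a single character class written with Unicode escapes removes both U+2018 and U+2019 (the curly quotes from the question), while straight quotes are left alone. The method name stripCurlyQuotes is just for illustration.

```java
public class CurlyQuotes {
    // Removes left (U+2018) and right (U+2019) single curly quotes in one pass;
    // the Unicode escapes keep the source file plain ASCII
    static String stripCurlyQuotes(String s) {
        return s.replaceAll("[\u2018\u2019]", "");
    }

    public static void main(String[] args) {
        String a = "\u2018NOA\u2019,\u2019CTF\u2019,\u2019CLM\u2019";
        System.out.println(stripCurlyQuotes(a)); // NOA,CTF,CLM
    }
}
```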

Groovy remove beginning of path

I'm trying to delete the beginning of a path that has '\' and ' ' in it. I keep getting errors about an escape character issue at character 3.
Example:
SomePath: C:\Users\ADMINISTRATOR\App Play\blah\blah
SomePath.replaceFirst('C:\\Users\\ADMINISTRATOR\\App Play\\', '');
The resulting path should be blah\blah
I've tried:
SomePath.replaceFirst("C:\Users\ADMINISTRATOR\App Play\", "");
SomePath.replaceFirst("C:\\Users\\ADMINISTRATOR\\App Play\\", "");
SomePath.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
SomePath.replaceAll("C:\Users\ADMINISTRATOR\App Play\", "");
SomePath.replaceAll("C:\\Users\\ADMINISTRATOR\\App Play\\", "");
SomePath.replaceAll("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");

Just gave it a try... the examples with four backslashes work for me:
def somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah"
println somePath
somePath.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
The problem is that the string literal needs one escaping \ for each \, and because replaceFirst takes a regular expression, the regex engine needs another \ to escape each \. The result is four backslashes.
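For comparison, a Java sketch of three ways to cut the prefix from the question's example path: the four-backslash regex described above, Pattern.quote (a JDK method that turns a literal string into a regex matching it exactly), and plain startsWith/substring with no regex at all. The class, constant and method names are illustrative.

```java
import java.util.regex.Pattern;

public class StripPrefix {
    static final String PREFIX = "C:\\Users\\ADMINISTRATOR\\App Play\\";

    // Four backslashes in source -> two in the regex -> one literal backslash matched
    static String stripWithEscapedRegex(String path) {
        return path.replaceFirst("C:\\\\Users\\\\ADMINISTRATOR\\\\App Play\\\\", "");
    }

    // Pattern.quote makes the prefix a literal pattern, so no doubling is needed
    static String stripWithQuote(String path) {
        return path.replaceFirst(Pattern.quote(PREFIX), "");
    }

    // No regex at all: check for the prefix and cut it off
    static String stripWithStartsWith(String path) {
        return path.startsWith(PREFIX) ? path.substring(PREFIX.length()) : path;
    }

    public static void main(String[] args) {
        String somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah";
        System.out.println(stripWithEscapedRegex(somePath)); // blah\blah
        System.out.println(stripWithQuote(somePath));        // blah\blah
        System.out.println(stripWithStartsWith(somePath));   // blah\blah
    }
}
```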
Btw: you can use string operations to get your path, but you could also try file operations like this:
def root= new File("C:\\Users\\ADMINISTRATOR\\App Play\\")
def full= new File("C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah")
def relPath = root.toPath().relativize( full.toPath() ).toFile()
println relPath
(taken from https://gist.github.com/ysb33r/5804364)

You can tackle this problem differently. You could tokenize your input path using \ as a delimiter and then either pick the last 2 elements (blah and blah) or skip the first 4 elements (C:, Users, ADMINISTRATOR, App Play). Which one to use depends on which assumption is easier to make in your case. Consider the following example:
def somePath = 'C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah'
// Build a new path by accepting the last 2 parts of the initial path
assert 'blah\\blah' == somePath.tokenize('\\')[-2..-1].join('\\')
// Build a new path by skipping the first 4 parts from initial path
assert 'blah\\blah' == somePath.tokenize('\\').drop(4).join('\\')
The first option works better if you only want the last two parts of the initial path. The second option works better if the final path may look like blah\blah\blahhhh, because you don't know how many nested children the initial path contains and you want to start building the new path right after \App Play\ .
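A rough Java equivalent of the two Groovy options, as a sketch: String.split with the Java string "\\\\" (the regex \\, one literal backslash) stands in for tokenize('\\'), and Arrays.copyOfRange plus String.join rebuild the path. One difference to keep in mind: unlike tokenize, split does not skip empty segments, so a doubled backslash in the input would yield an empty element. The helpers are illustrative and assume the path has at least n (or k) segments.

```java
import java.util.Arrays;

public class PathSegments {
    // split() takes a regex, so "\\\\" (regex \\) splits on one literal backslash
    static String[] segments(String path) {
        return path.split("\\\\");
    }

    // Keep the last n segments (like tokenize('\\')[-2..-1] when n = 2)
    static String lastSegments(String path, int n) {
        String[] parts = segments(path);
        return String.join("\\", Arrays.copyOfRange(parts, parts.length - n, parts.length));
    }

    // Skip the first k segments (like tokenize('\\').drop(4) when k = 4)
    static String dropSegments(String path, int k) {
        String[] parts = segments(path);
        return String.join("\\", Arrays.copyOfRange(parts, k, parts.length));
    }

    public static void main(String[] args) {
        String somePath = "C:\\Users\\ADMINISTRATOR\\App Play\\blah\\blah";
        System.out.println(lastSegments(somePath, 2)); // blah\blah
        System.out.println(dropSegments(somePath, 4)); // blah\blah
    }
}
```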

Scala RegEx String extractors behaving inconsistently

I have two regular expression extractors:
one for .java files and the other for .scala files.
val JavaFileRegEx =
"""\S*
\s+
//
\s{1}
([^\.java]+)
\.java
""".replaceAll("(\\s)", "").r
val ScalaFileRegEx =
"""\S*
\s+
//
\s{1}
([^\.scala]+)
\.scala
""".replaceAll("(\\s)", "").r
I want to use these extractors above to extract a java file name and a scala file name from the example code below.
val string1 = " // Tester.java"
val string2 = " // Hello.scala"
string1 match {
case JavaFileRegEx(fileName1) => println(" Java file: " + fileName1)
case other => println(other + "--NO_MATCH")
}
string2 match {
case ScalaFileRegEx(fileName2) => println(" Scala file: " + fileName2)
case other => println(other + "--NO_MATCH")
}
I get this output indicating that the .java file matched but the .scala file did not.
Java file: Tester
// Hello.scala--NO_MATCH
How is it that the Java file matched but the .scala file did not?

NOTE
[] denotes a character class. It matches exactly one character.
[^] denotes a negated class: it matches any single character except the ones listed inside it.
In your first regex
\S*\s+//\s{1}([^\.java]+)\.java
\S* matches nothing, because the string starts with a space
\s+ matches the leading space
// matches // literally
\s{1} matches next space
You are using [^\.java], which matches any character except ., j, a, v or a; it can be written as [^.jav].
So the remaining string to be tested is
Tester.java
(Un)luckily, none of the characters in Tester is ., j, a or v, so the class consumes Tester up to the ., and then \.java matches the rest.
In your second regex
\S*\s+//\s{1}([^\.scala]+)\.scala
\S* matches nothing, because the string starts with a space
\s+ matches the leading space
// matches // literally
\s{1} matches next space
Now you are using [^\.scala], which matches any character except ., s, c, a, l or a; it can be written as [^.scla].
What remains is
Hello.scala
But (un)luckily Hello contains l, which the character class excludes, so the regex fails.
How to correct it?
I will modify your regex only slightly:
\S*\s+//\s{1}([^.]*)\.java
Here ([^.]*) matches any run of characters except the dot.
You can also use \w instead of [^.] here.
\S*\s+//\s{1}([^.]*)\.scala
There is no need for {1} in \s{1}. You can simply write \s, and it will still match exactly one whitespace character:
\S*\s+//\s([^.]*)\.java
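To check the explanation above without Scala, here is a Java sketch using the same patterns (the corrected ones use \s instead of \s{1}). Matcher.matches() is used because, like a Scala regex extractor, it requires the whole input to match. The constant and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameRegex {
    // Original Scala pattern: the class [^\.scala] also excludes s, c, a and l
    static final Pattern ORIGINAL_SCALA = Pattern.compile("\\S*\\s+//\\s{1}([^\\.scala]+)\\.scala");
    // Corrected patterns: [^.]* means "anything except a dot"
    static final Pattern JAVA_FILE = Pattern.compile("\\S*\\s+//\\s([^.]*)\\.java");
    static final Pattern SCALA_FILE = Pattern.compile("\\S*\\s+//\\s([^.]*)\\.scala");

    // matches() must cover the whole input, as a Scala regex extractor does
    static String fileName(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fileName(ORIGINAL_SCALA, " // Hello.scala")); // null
        System.out.println(fileName(JAVA_FILE, " // Tester.java"));      // Tester
        System.out.println(fileName(SCALA_FILE, " // Hello.scala"));     // Hello
    }
}
```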

A custom tokenizer for Java

I am developing an application in which I need to process text files containing emails. I need all the tokens from the text and the following is the definition of token:
Alphanumeric
Case-sensitive (case to be preserved)
'!' and '$' are to be considered constituent characters. For example, FREE!! and $50 are tokens.
'.' (dot) and ',' (comma) are to be considered constituent characters if they occur between numbers. For example, 192.168.1.1 and $24,500 are tokens.
and so on.
Please suggest some open-source tokenizers for Java that are easy to customize to suit my needs. Will simply using StringTokenizer and regex be enough? I also have to perform stopping (stop-word removal), which is why I was looking for an open-source tokenizer that also handles extra steps such as stopping and stemming.

A few comments up front:
From StringTokenizer javadoc:
StringTokenizer is a legacy class that is retained for compatibility
reasons although its use is discouraged in new code. It is recommended
that anyone seeking this functionality use the split method of String
or the java.util.regex package instead.
Always search Google first: the first result as of now is JTopas. I have not used it, but it looks like it could work for this.
As for regex, it really depends on your requirements. Given the above, this might work:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Mkt {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("([$\\d.,]+)|([\\w\\d!$]+)");
        String str = "--- FREE!! $50 192.168.1.1 $24,500";
        System.out.println("input: " + str);
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println("token: " + m.group());
        }
    }
}
Here's a sample run:
$ javac Mkt.java && java Mkt
input: --- FREE!! $50 192.168.1.1 $24,500
token: FREE!!
token: $50
token: 192.168.1.1
token: $24,500
Now, you might need to tweak the regex, for example:
You gave $24,500 as an example. Should this work for $24,500abc or $24,500EUR?
You mentioned 192.168.1.1 should be included. Should it also include 192,168.1,1 (given . and , are to be included)?
and I guess there are other things to consider.
Hope this helps to get you started.
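As one possible starting point, here is a sketch that tightens the regex so . and , are only kept between digits, and adds the stopping step from the question with a small stop-word list. The stop-word list, the class name and the exact pattern are assumptions to adapt; stemming is not covered. Set.of and List.of need Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StopTokenizer {
    // Numbers with internal . or , (optionally $-prefixed), else runs of word chars, ! and $
    static final Pattern TOKEN = Pattern.compile("\\$?\\d+(?:[.,]\\d+)+|[\\w!$]+");
    // Hypothetical stop-word list; a real application would load a fuller one
    static final Set<String> STOP_WORDS = Set.of("a", "and", "at", "is", "the");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String tok = m.group();
            // Tokens keep their case; only the stop-word lookup is lower-cased
            if (!STOP_WORDS.contains(tok.toLowerCase(Locale.ROOT))) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The price is $24,500 and FREE!! at 192.168.1.1."));
        // [price, $24,500, FREE!!, 192.168.1.1]
    }
}
```

Note that a sentence-final dot (as after 192.168.1.1 above) is dropped, because . and , are only accepted when a digit follows.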

How to append a backslash in a string in java

I want to add a '\' character to every string in a list of strings... I'm doing something like this, but it adds 2 backslashes instead.
feedbackMsgs.add(behaviorName+"\\"+fbCode);
result is like: "abc\\def"
How do I make sure a single backslash is added?

I've just run a program with the following -
String s = "test" + "\\" + "test2";
System.out.println(s);
And it prints out the following -
test\test2
Are you sure there is no \ in the behaviorName or fbCode variables?
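A quick way to convince yourself that "\\" in Java source is a single character: the sketch below prints the lengths, and shows that a doubled backslash in the output only appears when an input already ends or starts with one. The helper feedbackEntry is a hypothetical stand-in for the question's behaviorName + "\\" + fbCode.

```java
public class Backslash {
    // Hypothetical helper: same concatenation as in the question
    static String feedbackEntry(String behaviorName, String fbCode) {
        return behaviorName + "\\" + fbCode; // "\\" in source is ONE backslash character
    }

    public static void main(String[] args) {
        String s = feedbackEntry("abc", "def");
        System.out.println(s);             // abc\def
        System.out.println(s.length());    // 7
        System.out.println("\\".length()); // 1
        // Two backslashes show up only if an input already carries one
        System.out.println(feedbackEntry("abc\\", "def")); // abc\\def
    }
}
```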

Looks like either your behaviorName ends with a \ or fbCode starts with one.

Try logging or printing behaviorName and fbCode to find out yourself:
System.out.println(behaviorName);
System.out.println(fbCode);
