Replacing special characters from a string - java

I'd just like to know if there is a more elegant and maintainable approach to this:
private String replaceSpecialChars(String fileName) {
if (fileName.length() < 1) return null;
if (fileName.contains("Ü")) {
fileName = fileName.replace("Ü", "Ue");
}
if (fileName.contains("Ä")) {
fileName = fileName.replace("Ä", "Ae");
}
if (fileName.contains("Ö")) {
fileName = fileName.replace("Ö", "Oe");
}
if (fileName.contains("ü")) {
fileName = fileName.replace("ü", "ue");
}
...
return fileName;
}
I'm restricted to Java 6.

Before you go any further on this, note that what you're doing is effectively impossible. For example, the 'ascii-fication' of 'Ö' in Swedish is 'O', not 'Oe'. There is no way to know whether a word is Swedish or German; after all, Swedes sometimes move to Germany. If you open a German phonebook and see a Mrs. Sjögren, and you asciify that to Sjoegren, you've messed it up.
If you want to run 'case and asciification insensitive comparisons', well, first you have to answer a few questions. Is muller equal to mueller equal to müller? That rabbit hole goes quite deep.
The general solution is trigrams or other generalized text search tools, such as those provided by Postgres. Alternatively, opt out of this mechanism, store this stuff in Unicode, and be clear that to find Ms. Sjögren, you're going to have to search for "Sjögren", for the same reason that you won't find Mr. Johnson if you search for Jahnson.
Note that most filesystems allow Unicode filenames; there is no need to try to replace a Ü.
This also goes some way toward explaining why there are no ready-made libraries for this seemingly common job: the job is, in fact, impossible.
You can simplify this code by using a Map<String, String> of replacements if you must, though I advise against it for the reasons above. Or just keep it as is, but ditch the contains calls. This code is needlessly slow and lengthy.
There is no difference between:
if (fileName.contains("x")) fileName = fileName.replace("x", "y");
and just fileName = fileName.replace("x", "y"); except that the former is strictly slower. If you ask replace to replace a string that the original does not contain, it doesn't make a new string; it just returns the original. The former will search twice, the latter only once, and neither will make a new string unless an actual replacement needs to be done.
You can then chain it:
if (fileName.isEmpty()) return null;
return fileName
.replace("Ü", "Ue")
.replace("Ä", "Ae")
...
;
But, as I said, you probably don't want to do that, unless you want an aggravated person on the line at some point in the future complaining that you bungled up the asciification of their surname.
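If you do go with the Map<String, String> approach despite the caveats above, a Java 6 compatible sketch might look like the following. The class name SpecialCharReplacer is hypothetical, the umlauts are written as Unicode escapes so the source compiles regardless of file encoding, and only the four pairs visible in the question are included; the rest of its table is elided, so it is left as a placeholder comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpecialCharReplacer {
    // Insertion-ordered, so the replacements always run in the order listed.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<String, String>();
    static {
        REPLACEMENTS.put("\u00DC", "Ue"); // capital U with umlaut
        REPLACEMENTS.put("\u00C4", "Ae"); // capital A with umlaut
        REPLACEMENTS.put("\u00D6", "Oe"); // capital O with umlaut
        REPLACEMENTS.put("\u00FC", "ue"); // small u with umlaut
        // ...add the remaining pairs here
    }

    public static String replaceSpecialChars(String fileName) {
        // Like the question's code, empty input yields null; null input does too.
        if (fileName == null || fileName.length() == 0) {
            return null;
        }
        String result = fileName;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // String.replace does a literal (non-regex) replacement of every occurrence.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

A LinkedHashMap is used rather than a HashMap so that the replacements always run in the same order, which matters if one replacement's output could ever contain another key.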

You can remove the unnecessary if statements and use a chain of String.replace calls. Your code might look something like this:
private static String replaceSpecialChars(String fileName) {
if (fileName == null)
return null;
else
return fileName
.replace("Ü", "Ue")
.replace("Ä", "Ae")
.replace("Ö", "Oe")
.replace("ü", "ue");
}
public static void main(String[] args) {
System.out.println(replaceSpecialChars("ABc")); // ABc
System.out.println(replaceSpecialChars("ÜÄÖü")); // UeAeOeue
System.out.println(replaceSpecialChars("").length()); // 0
System.out.println(replaceSpecialChars(null)); // null
}

Related

Simplifying/optimizing massive if...else if...else statement(s)

Okay so essentially, I have some code that uses the contains() method to detect the presence of specific characters in two strings. For extra context, this question is a good resource as to what kind of problem I'm having (and the third solution is also something I've looked into for this). Regardless, here is some of my code:
// code up here basically just concatenates different
// characters to Strings: stringX and stringY
if (stringX.contains("!\"#")) {
} else if (stringX.contains("$%&")) {
} else if (stringX.contains("\'()")) {
} else if (stringX.contains("!$\'")) {
} else if (stringX.contains("\"%(")) {
// literally 70+ more else-if statements
}
if (stringY.contains("!\"#")) {
} else if (stringY.contains("$%&")) {
} else if (stringY.contains("\'()")) {
} else if (stringY.contains("!$\'")) {
} else if (stringY.contains("\"%(")) {
// literally 70+ more else-if statements, all of which are
// exactly the same as those working with stringX
}
I'm still pretty new to Java programming, so I'm not sure how I should go about this. Maybe it is a non-issue? Also, if I can remedy this without using regex, that would be preferable; I am not very knowledgeable about it at this point in time. But if the only rational solution would be to utilize it, I will obviously do so.
Edit: The code within all of these else-if statements will not be very different from each other at all; basically just a System.out.println() with some information about what characters stringX/stringY contains.
Writing the same code more than once should immediately set off alarm bells in your head to move that code into a function so it can be reused.
As for simplifying the expression, the best approach is probably storing the patterns you're looking for as an array and iterating over the array with your condition.
private static final String[] patterns = new String[] {"!\"#", "$%&", "\'()", "!$\'", "\"%(", ...};
private static void findPatterns(String input) {
    for (String pattern : patterns) {
        if (input.contains(pattern)) {
            System.out.println("Found pattern: " + pattern);
        }
    }
}
// Elsewhere...
findPatterns(stringX);
findPatterns(stringY);
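A self-contained version of the loop above, written without streams so it also works on older Java versions, is sketched below. The class name PatternFinder and the method firstPattern are illustrative additions: findPatterns returns every match, while firstPattern stops at the first one, which is what the question's else-if chain actually does. Only the five patterns shown in the question are included.

```java
import java.util.ArrayList;
import java.util.List;

public class PatternFinder {
    // The patterns shown in the question; the rest of its list is elided there.
    private static final String[] PATTERNS = {"!\"#", "$%&", "'()", "!$'", "\"%("};

    // Every pattern the input contains, in array order.
    public static List<String> findPatterns(String input) {
        List<String> found = new ArrayList<String>();
        for (String pattern : PATTERNS) {
            if (input.contains(pattern)) {
                found.add(pattern);
            }
        }
        return found;
    }

    // Mirrors the else-if chain: only the first matching pattern counts; null if none.
    public static String firstPattern(String input) {
        for (String pattern : PATTERNS) {
            if (input.contains(pattern)) {
                return pattern;
            }
        }
        return null;
    }
}
```

Returning the matches instead of printing them keeps the check testable; the caller can still print the results for stringX and stringY.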
This pattern is especially common in functional and functional-style languages. Java 8 streams are a good example, so you could equivalently do
List<String> patterns = Arrays.asList("!\"#", "$%&", "\'()", "!$\'", "\"%(", ...);
patterns.stream()
.filter(pattern -> stringX.contains(pattern))
.forEach(pattern -> System.out.println("Found pattern: " + pattern));
You can simply make a list of your cases, then use a Java 8 stream filter:
List<String> patterns = Arrays.asList("!\"#", "$%&", ...);
Optional<String> matched = patterns.stream().filter(p -> stringX.contains(p)).findFirst();
if (matched.isPresent()) {
    System.out.println(matched.get());
}
Like the original else-if chain, findFirst() only reports the first matching pattern. Java streams can make your performance slightly slower, but not by much.

One line check if String contains bannedSubstrings

I have a String title and a List<String> bannedSubstrings. Now I want to perform a one line check if title is free of those bannedSubstrings.
My approach:
if(bannedSubstrings.stream().filter(bannedSubstring -> title.contains(bannedSubstring)).isEmpty()){
...
}
Unfortunately, there is no isEmpty() method for streams. So how would you solve the problem? Is there a one line solution?
Sounds like you want to read up on anyMatch:
if (bannedSubstrings.stream().anyMatch(title::contains)) {
// bad words!
}
Inversely, there's also noneMatch:
if (bannedSubstrings.stream().noneMatch(title::contains)) {
// no bad words :D
}
This isn't very efficient if title is a long string (but titles usually aren't supposed to be long, I suppose).
If you want an efficient solution and you have many bannedSubstrings, I guess it may be faster to join them into a single regex like this:
Pattern badWords = Pattern.compile(bannedSubstrings.stream().map(Pattern::quote)
.collect(Collectors.joining("|")));
Then use it like this:
if (badWords.matcher(title).find()) {
...
}
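In principle this lets a regex engine scan the title once for all substrings. Note, however, that java.util.regex does not build a prefix tree from an alternation (it tries each alternative in turn at each position), so a significant speedup is not guaranteed and is worth measuring. If performance is not a concern in your case, use one of the other answers.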
I suppose you are looking for something like this:
if(bannedSubstrings.stream().anyMatch(title::contains)){
}
The answer you've selected is pretty good, but for real performance you'd probably be better off pre-compiling the list of bad words into a regex.
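import java.util.Collection;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BannedWordChecker {
    private final Pattern bannedWords;
    public BannedWordChecker(Collection<String> bannedWords) {
        // Quote each word so regex metacharacters match literally, then join with '|'.
        // An empty collection would yield the empty pattern, which matches everything.
        this.bannedWords = bannedWords.isEmpty()
                ? null
                : Pattern.compile(
                        bannedWords.stream()
                                .map(Pattern::quote)
                                .collect(Collectors.joining("|")));
    }
    public boolean containsBannedWords(String string) {
        return bannedWords != null && bannedWords.matcher(string).find();
    }
}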
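To compare the two approaches side by side, here is a small sketch (the class and method names are hypothetical). It also shows two details that are easy to miss with the regex version: Pattern.quote is what keeps a banned entry such as "a.b" from matching "axb", and an empty list of banned substrings has to be special-cased, because joining nothing yields the empty pattern, which matches every title.

```java
import java.util.Collection;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BannedSubstrings {
    // Straightforward check from the anyMatch answer.
    public static boolean containsAnyStream(String title, Collection<String> banned) {
        return banned.stream().anyMatch(title::contains);
    }

    // Precompile once; returns null when there is nothing to ban.
    public static Pattern compile(Collection<String> banned) {
        if (banned.isEmpty()) {
            // Joining an empty collection gives "", and the empty pattern
            // matches every string, so every title would be rejected.
            return null;
        }
        // Pattern.quote makes '.', '|', '(' and so on match literally.
        return Pattern.compile(banned.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|")));
    }

    public static boolean containsAnyRegex(String title, Pattern compiled) {
        return compiled != null && compiled.matcher(title).find();
    }
}
```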

How to refactor to avoid passing "special values" into a Java method?

I'm sure there must be a standard way to do this, but my attempts to search Stack Overflow have failed.
I have a method like:
public void processSomeWidgetsForUser(int userItemId) {
    Iterator<Widget> iter = allWidgets.values().iterator();
    while (iter.hasNext()) {
        Widget thisWidget = iter.next();
        if (userItemId == -1 || thisWidget.getUsersItemId() == userItemId) {
            thisWidget.process();
        }
    }
}
As you can see -1 is a "special value" meaning process all. Doing this saves repeating the loop code in another method called processSomeWidgetsForAllUsers.
But I dislike special values like this because they are easy to misuse or misunderstand, which is exactly the situation I'm having to fix now (where someone thought -1 meant something else).
I can only think of two ways to improve this:
1. Have a constant containing -1, called something like Widget.ALLWIDGETS, which at least is self-documenting, but doesn't stop code from using a -1 (if someone integrates old code, for example).
2. Change the method to take a list of all user ids to process, which can be empty, but that doesn't seem great performance-wise (I would need to retrieve all user ids first and then loop through, removing them). Also, what happens if the number of widgets in the list changes between retrieving the ids and removing them?
Is there a better way? I'm sure I'm missing something obvious.
The above code has been changed slightly, so may not compile, but you should get the gist.
Although somewhat redundant, a fairly neat self-documenting approach could be to have 3 methods rather than one.
Make your original method private and make one small change: add your static final int EXECUTE_ALL = -1 and use that in your original method. Then add the two new methods:
public void processWidget(int wID) throws IllegalArgumentException {
if(wID == EXECUTE_ALL) throw new IllegalArgumentException();
originalMethod(wID);
}
public void processAllWidgets() {
originalMethod(EXECUTE_ALL);
}
It makes your class a little more cluttered, but as far as the exposed methods go, it is clearer and hopefully foolproof. You could alter it not to throw an exception and just ignore any invalid ids, that just depends on your situation.
This approach of course has the major downside that it changes how the class appears to other classes, breaking everything that currently uses the now-private originalMethod().
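A self-contained sketch of this three-method approach follows. The Widget class here is a minimal hypothetical stand-in (the real one isn't shown in the question), and the method names processWidgetsForUser and processAllWidgets are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class WidgetProcessor {
    // Hypothetical minimal Widget, standing in for the question's class.
    public static class Widget {
        private final int usersItemId;
        private boolean processed;

        public Widget(int usersItemId) { this.usersItemId = usersItemId; }
        public int getUsersItemId() { return usersItemId; }
        public void process() { processed = true; }
        public boolean isProcessed() { return processed; }
    }

    // The special value is private, so it never appears in the public API.
    private static final int EXECUTE_ALL = -1;

    private final List<Widget> allWidgets = new ArrayList<Widget>();

    public void add(Widget widget) { allWidgets.add(widget); }

    // The original loop, now private.
    private void processWidgets(int userItemId) {
        for (Widget widget : allWidgets) {
            if (userItemId == EXECUTE_ALL || widget.getUsersItemId() == userItemId) {
                widget.process();
            }
        }
    }

    public void processWidgetsForUser(int userItemId) {
        if (userItemId == EXECUTE_ALL) {
            throw new IllegalArgumentException("Use processAllWidgets() to process every widget");
        }
        processWidgets(userItemId);
    }

    public void processAllWidgets() {
        processWidgets(EXECUTE_ALL);
    }
}
```

Old code that still passes -1 now fails loudly instead of silently processing everything.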
Number 1 would work very nicely. Be sure to document what the variable is though, so future coders (possibly yourself) know what it means.
/**This is the explanation for the below variable*/
public static final int ALL_WIDGETS = -1;
Have an external method like so:
static boolean idRepresentsAll(int id) {
return id == -1;
}
In this case, if you decide to replace it with a different mechanism, you only replace your magic number one place in your code.
At the very least, you would want to do something like this:
public static final int ID_REPRESENTING_ALL = -1;
You can change the method signature to accept a boolean for when you want to process them all.
public void processSomeWidgets(boolean doAll, int userItemId) {
    Iterator<Widget> iter = allWidgets.values().iterator();
    while (iter.hasNext()) {
        Widget thisWidget = iter.next();
        if (doAll || thisWidget.getUsersItemId() == userItemId) {
            thisWidget.process();
        }
    }
}
This makes it more explicit, and easier to read in my opinion as there are no special values.

Switch or if statements in writing an interpreter in java

My current assignment requires me to write a program that reads a file containing instructions in a very tiny, basic programming language (it behaves a little like FORTRAN) and executes those instructions. Basically, it's a simple interpreter for the language, I guess. It's completely linear, with all statements defined in sequence, and it only has String and integer variables. There are 8 keywords and 4 arithmetic operators I would need to find and define if they exist within the source file, and each line must start with one of the reserved words.
A program in this language might look something like this:
#COMMENTS
LET.... (declares variables with values)
INTEGER myINT
STRING myString
CALCULATE...
PRINT
PRINTLN
END
Can I use a switch block instead of if statements to find and then execute all of these? My concern is that switch doesn't work with Strings in Java 6, which is what I'm supposed to be using, and I don't see how to easily assign various int values so that the switch block would work. Thanks in advance for any suggestions and advice!
If your language is so simple that every statement begins in its own line and is identified by one word only, then (as Gray pointed out in another comment) you can split the words in each line, then compare the first word against a map. However, I would suggest, instead of mapping the words to ints and then doing one big switch, to map them into objects instead, like this (suggested by Dave Newton):
interface Directive {
public void execute(String line);
}
class LetDirective implements Directive {
public void execute(String line) { ...handle LET directive here... }
}
...define other directives in the same way...
Then define the map:
private Map<String, Directive> directives = new HashMap<String, Directive>();
directives.put("LET", new LetDirective());
...
Then in your parsing method:
int firstSpace = line.indexOf(' ');
String command = line;
if (firstSpace > 0)
command = line.substring(0, firstSpace);
Directive directive = directives.get(command.toUpperCase());
if (directive != null)
directive.execute(line);
else
...show some error...
Each directive would have to parse the rest of the line on its own and handle it correctly inside its execute() method.
The benefit of this over a switch is that you can handle a larger number of commands without ending up with one gigantic method; instead, you get one smaller method per command.
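Here is a minimal, runnable sketch of the map-of-directives idea that stays within Java 6 (anonymous classes instead of lambdas). Only PRINT and PRINTLN are implemented, and their syntax (print everything after the first space) is an assumption, since the question doesn't define it; output is collected in a StringBuilder rather than printed so that it is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

public class TinyInterpreter {
    interface Directive {
        void execute(String line);
    }

    private final StringBuilder output = new StringBuilder();
    private final Map<String, Directive> directives = new HashMap<String, Directive>();

    public TinyInterpreter() {
        // One small object per command instead of one big switch.
        directives.put("PRINT", new Directive() {
            public void execute(String line) {
                output.append(argumentOf(line));
            }
        });
        directives.put("PRINTLN", new Directive() {
            public void execute(String line) {
                output.append(argumentOf(line)).append('\n');
            }
        });
    }

    // Everything after the first space (hypothetical syntax for PRINT/PRINTLN).
    private static String argumentOf(String line) {
        int firstSpace = line.indexOf(' ');
        return firstSpace > 0 ? line.substring(firstSpace + 1) : "";
    }

    public void executeLine(String line) {
        if (line.length() == 0 || line.startsWith("#")) {
            return; // skip blank lines and comments
        }
        int firstSpace = line.indexOf(' ');
        String command = firstSpace > 0 ? line.substring(0, firstSpace) : line;
        Directive directive = directives.get(command.toUpperCase());
        if (directive == null) {
            throw new IllegalArgumentException("Unknown directive: " + command);
        }
        directive.execute(line);
    }

    public String getOutput() {
        return output.toString();
    }
}
```

Adding LET, CALCULATE and the rest would mean registering one more Directive per keyword, without touching executeLine.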
If you are talking about converting strings to integers, then you could do it with a Java enum type:
private enum ReservedWord {
LET,
...
}
// skip blank lines and comments
String[] tokens = codeLine.split(" ");
ReservedWord keyword;
try {
keyword = ReservedWord.valueOf(tokens[0]);
} catch (IllegalArgumentException e) {
// spit out nice syntax error message
}
You could also put the processing of the line inside of the enum as a method if you'd like. You could also do it with a Map:
private final Map<String, Integer> reservedWords = new HashMap<String, Integer>();
private static final int RESERVED_WORD_LET = 1;
...
{
    reservedWords.put("LET", RESERVED_WORD_LET);
    ...
}
// skip blank lines and comments
String[] tokens = codeLine.split(" ");
Integer value = reservedWords.get(tokens[0]);
if (value == null) {
    // handle error...
} else {
    switch (value) {
    case RESERVED_WORD_LET:
        // LET
        ...
    }
}
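Note that while Java 6 cannot switch on a String, it can switch on an enum (enum switches have been allowed since Java 5), so the enum from the first snippet can drive the switch directly without any int mapping. A sketch, with a hypothetical dispatch method that returns a description instead of executing anything and using the keywords visible in the example program:

```java
public class KeywordDispatcher {
    // Keywords taken from the example program in the question.
    enum ReservedWord { LET, INTEGER, STRING, CALCULATE, PRINT, PRINTLN, END }

    // Returns a description of the action; a real interpreter would execute it.
    public static String dispatch(String codeLine) {
        String[] tokens = codeLine.trim().split("\\s+");
        ReservedWord keyword;
        try {
            keyword = ReservedWord.valueOf(tokens[0].toUpperCase());
        } catch (IllegalArgumentException e) {
            return "syntax error: unknown keyword '" + tokens[0] + "'";
        }
        // Switching on an enum compiles in Java 5/6, unlike switching on a String.
        switch (keyword) {
            case LET:
                return "declare variable";
            case PRINT:
            case PRINTLN:
                return "print";
            case END:
                return "stop";
            default:
                return "not implemented: " + keyword;
        }
    }
}
```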

How to replace tokens in a string without StringTokenizer

Given a string like so:
Hello {FIRST_NAME}, this is a personalized message for you.
Where FIRST_NAME is an arbitrary token (a key in a map passed to the method), to write a routine which would turn that string into:
Hello Jim, this is a personalized message for you.
given a map with an entry FIRST_NAME -> Jim.
It would seem that StringTokenizer is the most straightforward approach, but the Javadocs really say you should prefer the regex approach. How would you do that in a regex-based solution?
Thanks everyone for the answers!
Gizmo's answer was definitely out of the box, and a great solution, but unfortunately not appropriate as the format can't be limited to what the Formatter class does in this case.
Adam Paynter really got to the heart of the matter, with the right pattern.
Peter Nix and Sean Bright had a great workaround to avoid all of the complexities of the regex, but I needed to raise some errors if there were bad tokens, which that didn't do.
But in terms of both doing a regex and a reasonable replace loop, this is the answer I came up with (with a little help from Google and the existing answer, including Sean Bright's comment about how to use group(1) vs group()):
private static Pattern tokenPattern = Pattern.compile("\\{([^}]*)\\}");
public static String process(String template, Map<String, Object> params) {
StringBuffer sb = new StringBuffer();
Matcher myMatcher = tokenPattern.matcher(template);
while (myMatcher.find()) {
String field = myMatcher.group(1);
myMatcher.appendReplacement(sb, "");
sb.append(doParameter(field, params));
}
myMatcher.appendTail(sb);
return sb.toString();
}
Where doParameter gets the value out of the map and converts it to a string and throws an exception if it isn't there.
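For reference, here is the solution above as a self-contained class. The author's doParameter isn't shown, so the version here is a hypothetical stand-in that rejects empty tokens and missing keys with an IllegalArgumentException and otherwise converts the value with String.valueOf.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateProcessor {
    // {...} with any (possibly empty) content between the braces.
    private static final Pattern TOKEN_PATTERN = Pattern.compile("\\{([^}]*)\\}");

    public static String process(String template, Map<String, Object> params) {
        StringBuffer sb = new StringBuffer();
        Matcher matcher = TOKEN_PATTERN.matcher(template);
        while (matcher.find()) {
            String field = matcher.group(1);
            // Appending "" copies the text before the token; the value is then
            // appended verbatim, so $ and \ in it are never interpreted.
            matcher.appendReplacement(sb, "");
            sb.append(doParameter(field, params));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    // Hypothetical stand-in for the author's doParameter helper.
    private static String doParameter(String field, Map<String, Object> params) {
        if (field.length() == 0) {
            throw new IllegalArgumentException("Empty token {} in template");
        }
        if (!params.containsKey(field)) {
            throw new IllegalArgumentException("No value for token {" + field + "}");
        }
        return String.valueOf(params.get(field));
    }
}
```

Because the value bypasses appendReplacement, a value such as "$5" is copied as-is.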
Note also I changed the pattern to find empty braces (i.e. {}), as that is an error condition explicitly checked for.
EDIT: Note that appendReplacement is not agnostic about the content of the replacement string. Per the Javadocs, it treats $ and backslash as special characters, so I originally added some escaping to the sample above to handle that. It wasn't done in the most performance-conscious way, but in my case it isn't a big enough deal to be worth micro-optimizing the string creation.
Thanks to the comment from Alan M, this can be made even simpler to avoid the special character issues of appendReplacement.
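Well, I would rather use String.format(), or better, MessageFormat. For a single fixed token, a plain literal replacement also works (replaceAll would treat the braces as regex syntax and throw a PatternSyntaxException):
String result = template.replace("{FIRST_NAME}", actualName);
Check out the Javadocs for details.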
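For comparison, here is what the String.format() and MessageFormat routes look like. Neither supports named tokens, so the template has to use %s or numbered placeholders such as {0} instead of {FIRST_NAME}; the class and method names are illustrative.

```java
import java.text.MessageFormat;

public class GreetingFormats {
    public static String withStringFormat(String firstName) {
        // %s is replaced by the argument's string value.
        return String.format("Hello %s, this is a personalized message for you.", firstName);
    }

    public static String withMessageFormat(String firstName) {
        // MessageFormat uses numbered placeholders ({0}, {1}, ...), not named ones.
        return MessageFormat.format("Hello {0}, this is a personalized message for you.", firstName);
    }
}
```

One MessageFormat quirk worth knowing: a single quote starts a quoted section in the pattern, so a literal apostrophe must be written as two single quotes, e.g. "{0}''s message".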
Try this:
Note: The author's final solution builds upon this sample and is much more concise.
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenReplacer {
private Pattern tokenPattern;
public TokenReplacer() {
tokenPattern = Pattern.compile("\\{([^}]+)\\}");
}
public String replaceTokens(String text, Map<String, String> valuesByKey) {
StringBuilder output = new StringBuilder();
Matcher tokenMatcher = tokenPattern.matcher(text);
int cursor = 0;
while (tokenMatcher.find()) {
// A token is defined as a sequence of the format "{...}".
// A key is defined as the content between the brackets.
int tokenStart = tokenMatcher.start();
int tokenEnd = tokenMatcher.end();
int keyStart = tokenMatcher.start(1);
int keyEnd = tokenMatcher.end(1);
output.append(text.substring(cursor, tokenStart));
String token = text.substring(tokenStart, tokenEnd);
String key = text.substring(keyStart, keyEnd);
if (valuesByKey.containsKey(key)) {
String value = valuesByKey.get(key);
output.append(value);
} else {
output.append(token);
}
cursor = tokenEnd;
}
output.append(text.substring(cursor));
return output.toString();
}
}
With import java.util.regex.*:
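Pattern p = Pattern.compile("\\{([^{}]*)\\}"); // braces must be escaped in a regex
Matcher m = p.matcher(line); // line being "Hello, {FIRST_NAME}..."
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String key = m.group(1);
    // Unknown keys are left as-is; quoteReplacement stops $ and \ in values being interpreted.
    String value = map.containsKey(key) ? map.get(key) : m.group();
    m.appendReplacement(sb, Matcher.quoteReplacement(value));
}
m.appendTail(sb);
String result = sb.toString();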
So, the regex is recommended because it can easily identify the places in the string that require substitution, as well as extract the name of the key to substitute. It's much more efficient than splitting up the whole string.
You'll probably want to loop with the Matcher line inside and the Pattern line outside, so you can replace all lines. The pattern never needs to be recompiled, and it's more efficient to avoid doing so unnecessarily.
The most straightforward approach would seem to be something along these lines:
public static void main(String[] args) {
String tokenString = "Hello {FIRST_NAME}, this is a personalized message for you.";
Map<String, String> tokenMap = new HashMap<String, String>();
tokenMap.put("{FIRST_NAME}", "Jim");
String transformedString = tokenString;
for (String token : tokenMap.keySet()) {
transformedString = transformedString.replace(token, tokenMap.get(token));
}
System.out.println("New String: " + transformedString);
}
It loops through all your tokens, replaces every token with what you need, and uses the standard (non-regex) String.replace method, thus avoiding regex frustrations altogether.
Depending on how ridiculously complex your string is, you could try using a more serious string templating language, like Velocity. In Velocity's case, you'd do something like this:
Velocity.init();
VelocityContext context = new VelocityContext();
context.put( "name", "Bob" );
StringWriter output = new StringWriter();
Velocity.evaluate( context, output, "",
"Hello, $name, this is a personalized message for you.");
System.out.println(output.toString());
But that is likely overkill if you only want to replace one or two values.
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceTest {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("FIRST_NAME", "Jim");
        map.put("LAST_NAME", "Johnson");
        map.put("PHONE", "410-555-1212");
        String s = "Hello {FIRST_NAME} {LAST_NAME}, this is a personalized message for you.";
        for (String key : map.keySet()) {
            // Quote key and value so regex metacharacters, $ and \ are treated literally.
            s = s.replaceAll("\\{" + Pattern.quote(key) + "\\}", Matcher.quoteReplacement(map.get(key)));
        }
        System.out.println(s);
    }
}
The docs mean that you should prefer writing a regex-based tokenizer, IIRC. What might work better for you is a standard regex search-replace.
Generally we'd use MessageFormat in a case like this, coupled with loading the actual message text from a ResourceBundle. This gives you the added benefit of being globalization (G11N) friendly.
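A minimal sketch of the MessageFormat plus ResourceBundle combination follows. To keep it self-contained, the bundle is a ListResourceBundle subclass defined in code and instantiated directly; in practice the text would more likely live in properties files (for example a hypothetical Messages.properties with locale-specific variants) loaded through ResourceBundle.getBundle.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class BundleGreeting {
    // Hypothetical in-code bundle standing in for a properties file.
    public static class Messages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Hello {0}, this is a personalized message for you."}
            };
        }
    }

    // Looks up the localized pattern, then fills in the numbered placeholders.
    public static String greet(ResourceBundle bundle, String firstName) {
        return MessageFormat.format(bundle.getString("greeting"), firstName);
    }
}
```

Translating the application then means supplying another bundle with the same keys, without touching the code that formats the message.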
