Replacing german umlauts generated by latex or bibtex tool in java? - java

I want to replace the German umlauts generated by a Citavi-Bibtex-Export-Tool. For example, one reference string input is J{\"o}rg and I want Jörg as a result. After inspecting my JUnit test, the result of my method was J{"o}rg - what went wrong?
public String replaceBibtexMutatedVowels(String str){
CharSequence target = "{\\\"o}";
CharSequence replacement = "ö";
str.replace(target, replacement);
return str;
}
UPDATE: Thanks guys - I was able to master German umlauts. Unfortunately, Bibtex escapes quotation marks with {\dg}, and I was not able to write the corresponding Java code.
String afterDg = "";
CharSequence targetDg = "{\\dg}";
CharSequence replacementDg = "\"";
afterDg = afterAe.replace(targetDg, replacementDg);
newStringInstance = afterDg;
return newStringInstance;

Basically, you are doing all right, but:
str.replace(target, replacement);
must be replaced with
str = str.replace(target, replacement);
because Java strings are immutable: replace doesn't change the string itself, but returns a new string with the replacements applied.
P.S.: German has more special characters than "ö"; you're missing "ä", "ü" (and their corresponding capital letters), "ß" etc.
And here's my test code:
package test;
public class Test {
public static void main(String[] args) throws Exception {
String latexText = "J{\\\"o}rg";
String normalText = replaceBibtexMutatedVowels(latexText);
System.out.println(latexText);
System.out.println(normalText);
}
public static String replaceBibtexMutatedVowels(String str) {
CharSequence target = "{\\\"o}";
CharSequence replacement = "ö";
str = str.replace(target, replacement);
return str;
}
}
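The P.S. above suggests covering the remaining umlauts as well. A minimal sketch of that idea, assuming the export uses the usual LaTeX forms {\"a}, {\"o}, {\"u} (plus capitals) and {\ss} for ß. The class name BibtexUmlauts and the table contents are illustrative, and the {\dg} entry is taken from the question's update, not from any BibTeX documentation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BibtexUmlauts {
    // Hypothetical lookup table: escaped form -> plain character.
    // Unicode escapes keep the source file ASCII-only.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("{\\\"a}", "\u00E4"); // a-umlaut
        REPLACEMENTS.put("{\\\"o}", "\u00F6"); // o-umlaut
        REPLACEMENTS.put("{\\\"u}", "\u00FC"); // u-umlaut
        REPLACEMENTS.put("{\\\"A}", "\u00C4"); // capital A-umlaut
        REPLACEMENTS.put("{\\\"O}", "\u00D6"); // capital O-umlaut
        REPLACEMENTS.put("{\\\"U}", "\u00DC"); // capital U-umlaut
        REPLACEMENTS.put("{\\ss}", "\u00DF");  // sharp s
        REPLACEMENTS.put("{\\dg}", "\"");      // quotation mark, per the question's update
    }

    static String replaceBibtexMutatedVowels(String str) {
        String result = str;
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // replace returns a new string, so the result must be reassigned
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceBibtexMutatedVowels("J{\\\"o}rg M{\\\"u}ller"));
    }
}
```

Keeping the pairs in one table means a new escape sequence only needs one more put call.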

Related

How do I remove all whitespaces from a string?

I'm trying to remove all whitespace from a string. I've googled a lot and only found the replaceAll() method being used for this. However, an assignment I'm doing for an online course says to use the replace() method to remove all whitespace, and to use \n for the newline character and \t for the tab character. I tried it; here's my code:
public static String removeWhitespace(String s) {
String gtg = s.replace(' ', '');
gtg = s.replace('\t', '');
gtg = s.replace('\n', '');
return gtg;
}
After compiling, I get the error message:
Error:(12, 37) java: empty character literal
Error:(13, 37) java: empty character literal
Error:(14, 37) java: empty character literal
All 3 refer to the above replace() code in public static String removeWhitespace(String s).
I'd be grateful if someone pointed out what I'm doing wrong.
There are two flavors of replace() - one that takes chars and one that takes Strings. You are using the char type, and that's why you can't specify a "nothing" char.
Use the String version:
gtg = gtg.replace("\t", "");
Notice also the bug I corrected there: your code calls replace on the original string s every time, so only the last replacement takes effect.
You could just code this instead:
public static String removeWhitespace(String s) {
return s.replaceAll("\\s", ""); // use regex
}
Try this code,
public class Main {
public static void main(String[] args) throws Exception {
String s = " Test example hello string replace enjoy hh ";
System.out.println("Original String : "+s);
s = s.replace(" ", "");
System.out.println("Final String Without Spaces : "+s);
}
}
Output :
Original String : Test example hello string replace enjoy hh
Final String Without Spaces : Testexamplehellostringreplaceenjoyhh
Another way by using char array :
public class Main {
public static void main(String[] args) throws Exception {
String s = " Test example hello string replace enjoy hh ";
System.out.println("Original String : "+s);
String ss = removeWhitespace(s);
System.out.println("Final String Without Spaces : "+ss);
}
public static String removeWhitespace(String s) {
char[] charArray = s.toCharArray();
String gtg = "";
for(int i =0; i<charArray.length; i++){
if ((charArray[i] != ' ') && (charArray[i] != '\t') &&(charArray[i] != '\n')) {
gtg = gtg + charArray[i];
}
}
return gtg;
}
}
Output :
Original String : Test example hello string replace enjoy hh
Final String Without Spaces : Testexamplehellostringreplaceenjoyhh
If you want to specify an empty character for the replace(char,char) method, you should do it like this:
public static String removeWhitespace(String s) {
// decimal format, or hexadecimal format
return s.replace(' ', (char) 0)
.replace('\f', (char) 0)
.replace('\n', (char) 0)
.replace('\r', '\u0000')
.replace('\t', '\u0000');
}
But an empty character is still a character, therefore it is better to specify an empty string for the replace(CharSequence,CharSequence) method to remove those characters:
public static String removeWhitespace(String s) {
return s.replace(" ", "")
.replace("\f", "")
.replace("\n", "")
.replace("\r", "")
.replace("\t", "");
}
To simplify this code, you can specify a regular expression for the replaceAll(String,String) method to remove all whitespace characters:
public static String removeWhitespace(String s) {
return s.replaceAll("\\s", "");
}
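The point that an empty character is still a character can be checked by comparing lengths. A small sketch (the class and method names are illustrative only):

```java
final class NulVersusEmpty {
    // Swaps each space for the NUL character: the string keeps its length.
    static String withNulChars(String s) {
        return s.replace(' ', '\u0000');
    }

    // Swaps each space for the empty string: the spaces are really removed.
    static String withEmptyString(String s) {
        return s.replace(" ", "");
    }

    public static void main(String[] args) {
        String s = "a b c";
        System.out.println(withNulChars(s).length());    // prints 5
        System.out.println(withEmptyString(s).length()); // prints 3
    }
}
```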

Splitting string on spaces unless in double quotes but double quotes can have a preceding string attached

I need to split a string in Java (first remove whitespaces between quotes and then split at whitespaces.)
"abc test=\"x y z\" magic=\" hello \" hola"
becomes:
firstly:
"abc test=\"xyz\" magic=\"hello\" hola"
and then:
abc
test="xyz"
magic="hello"
hola
Scenario :
I am getting a string like the one above as input and I want to break it into parts as shown. One approach is to first remove the spaces between quotes and then split at spaces; the string attached before the quotes complicates this. A second approach is to split at spaces, but not inside quotes, and then remove the spaces from each part. I tried capturing quotes with "\"([^\"]+)\"" but I'm not able to capture just the spaces inside quotes. I tried some more but no luck.
We can do this using a formal pattern matcher. The secret sauce of the answer below is the rarely used Matcher#appendReplacement method. We pause at each match, and then append a custom replacement for whatever appears inside a pair of quotes. The custom method removeSpaces() strips all whitespace from each quoted term.
public static String removeSpaces(String input) {
return input.replaceAll("\\s+", "");
}
String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);
String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
System.out.println(part);
}
abc
test="xyz"
magic="hello"
hola
The big caveat here, as the above comments hinted at, is that we are really using a regex engine as a rudimentary parser. To see where my solution would fail fast, just remove one of the quotes by accident from a quoted term. But if you are sure your input is well formed, as you have shown us, this answer might work for you.
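One cheap guard against the failure mode described above is to reject input with an odd number of double quotes before running the regex. A minimal sketch, assuming quotes are never escaped inside the input; the class and method names are made up for illustration:

```java
final class QuoteCheck {
    // Returns true when the input contains an even number of double quotes,
    // i.e. every opening quote has a closing partner.
    static boolean hasBalancedQuotes(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '"') {
                count++;
            }
        }
        return count % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasBalancedQuotes("abc test=\"x y z\" hola")); // prints true
        System.out.println(hasBalancedQuotes("abc test=\"x y z hola"));   // prints false
    }
}
```

This does not make the regex a real parser; it only catches the simplest kind of malformed input early.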
I wanted to mention Java 9's Matcher.replaceAll overload that accepts a lambda:
// Find quoted strings and remove their whitespace:
s = Pattern.compile("\"[^\"]*\"").matcher(s)
.replaceAll(mr -> mr.group().replaceAll("\\s", ""));
// Turn the remaining whitespace into commas and wrap everything in braces.
s = '{' + s.trim().replaceAll("\\s+", ", ") + '}';
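For the output format asked for in the question (one part per line), the same Java 9 overload can be combined with split. A sketch under the assumption that quotes are balanced and never escaped; the class name QuotedSplit and its method are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedSplit {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String[] splitOutsideQuotes(String input) {
        // Step 1: strip whitespace inside each quoted run.
        // quoteReplacement keeps '$' and '\' in the match from being
        // interpreted as group references in the replacement.
        String compact = QUOTED.matcher(input).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s", "")));
        // Step 2: the remaining whitespace separates the parts.
        return compact.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "abc test=\"x y z\" magic=\" hello \" hola";
        for (String part : splitOutsideQuotes(input)) {
            System.out.println(part);
        }
    }
}
```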
Probably the other answer is better but still I have written it so I will post it here ;) It takes a different approach
public static void main(String[] args) {
String test="abc test=\"x y z\" magic=\" hello \" hola";
Pattern pattern = Pattern.compile("([^\\\"]+=\\\"[^\\\"]+\\\" )");
Matcher matcher = pattern.matcher(test);
int lastIndex=0;
while(matcher.find()) {
String[] parts=matcher.group(0).trim().split("=");
boolean newLine=false;
for (String string : parts[0].split("\\s+")) {
if(newLine)
System.out.println();
newLine=true;
System.out.print(string);
}
System.out.println("="+parts[1].replaceAll("\\s",""));
lastIndex=matcher.end();
}
System.out.println(test.substring(lastIndex).trim());
}
Result is
abc
test="xyz"
magic="hello"
hola
It sounds like you want to write a basic parser/tokenizer. My bet is that after you make something that can deal with pretty-printing in this structure, you will soon want to start validating that there aren't any mismatched double quotes.
But in essence, you have a few stages for this particular problem, and Java has a built in tokenizer that can prove useful.
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
public class Q50151376{
private static class Whitespace{
Whitespace(){ }
@Override
public String toString() {
return "\n";
}
}
private static class QuotedString {
public final String string;
QuotedString(String string) {
this.string = "\"" + string.trim() + "\"";
}
@Override
public String toString() {
return string;
}
}
public static void main(String[] args) {
String test = "abc test=\"x y z\" magic=\" hello \" hola";
StringTokenizer tokenizer = new StringTokenizer(test, "\"");
boolean inQuotes = false;
List<Object> out = new LinkedList<>();
while (tokenizer.hasMoreTokens()) {
final String token = tokenizer.nextToken();
if (inQuotes) {
out.add(new QuotedString(token));
} else {
out.addAll(TokenizeWhitespace(token));
}
inQuotes = !inQuotes;
}
System.out.println(joinAsStrings(out));
}
private static String joinAsStrings(List<Object> out) {
return out.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
public static List<Object> TokenizeWhitespace(String in){
List<Object> out = new LinkedList<>();
StringTokenizer tokenizer = new StringTokenizer(in, " ", true);
boolean ignoreWhitespace = false;
while (tokenizer.hasMoreTokens()){
String token = tokenizer.nextToken();
boolean whitespace = token.equals(" ");
if(!whitespace){
out.add(token);
ignoreWhitespace = false;
} else if(!ignoreWhitespace) {
out.add(new Whitespace());
ignoreWhitespace = true;
}
}
return out;
}
}

How to replace special character In Android?

I have to create a file with a user-defined name. If the user enters a special character, I want to replace that character with my own string. I found a method like this:
String replaceString(String string) {
return string.replaceAll("special_char","");
}
But how do I use this method?
The replaceAll method takes a regular expression and a replacement string.
string.replaceAll("regularExpression","replaceString");
You can use this regular expression :
"[;\\/:*?\"<>|&']"
e.g.
String replaceString(String string) {
return string.replaceAll("[;\\/:*?\"<>|&']","replaceString");
}
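Since the question is about file names, the character class above can be wrapped in a small helper. A sketch only: which characters count as special is an assumption carried over from the answer, and the class and method names are invented for this example.

```java
import java.util.regex.Matcher;

final class FileNames {
    // Character class from the answer above.
    private static final String SPECIAL = "[;\\/:*?\"<>|&']";

    // Replaces every special character with the given replacement text.
    // quoteReplacement makes replacements containing '$' or '\' safe to use.
    static String sanitize(String name, String replacement) {
        return name.replaceAll(SPECIAL, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(sanitize("re:port/2020?.txt", "_")); // prints re_port_2020_.txt
    }
}
```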
Try a regular expression:
static String replaceString(String string) {
return string.replaceAll("[^A-Za-z0-9 ]","");// removing all special character.
}
call
public static void main(String[] args) {
String str=replaceString("Hello\t\t\t.. how\t\t are\t you."); // call to replace special character.
System.out.println(str);
}
output:
Hello how are you
Use the function below to replace your string:
public static String getString(String p_value)
{
String p_value1 = p_value;
if(p_value1 != null && ! p_value1.isEmpty())
{
p_value1 = p_value1.replace("Replace string from", "Replace String to");
return p_value1;
}
else
{
return "";
}
}
example
Replace string from = "\n";
Replace String to = "\r\n";
After using the above function, \n is replaced with \r\n.
This method splits your string data into two lines after a specific word:
public static String makeTwoPart(String data, String cutAfterThisWord){
String result = "";
String val1 = data.substring(0, data.indexOf(cutAfterThisWord));
String va12 = data.substring(val1.length(), data.length());
String secondWord = va12.replace(cutAfterThisWord, "");
Log.d("VAL_2", secondWord);
String firstWord = data.replace(secondWord, "");
Log.d("VAL_1", firstWord);
result = firstWord + "\n" + secondWord;
return result;
}

Encode only specific characters in String

I have to encode only some special characters in a string to numeric value.
Say,
String name = "test $#";
I want to encode only characters $ and # in the above string. I tried using below code but it did not work out.
String encode = URLEncoder.encode(StringEscapeUtils.escapeJava(name), "UTF-8");
The encoded value will be like, for white space the encoded value is &#160
What about splitting that String with the String#split method, using a space as the regex? The last item of the array it returns contains the symbols you need :)
String name = "test $#";
String[] nameSplittedArr = name.split(" ");
String yourChars = nameSplittedArr[nameSplittedArr.length-1]; // indexes start from zero
That should work :)
As per the comments, I think you are after a customized encoding function. Something like:
public static String EncodeString(String text) {
StringBuffer sb = new StringBuffer();
for (char c : text.toCharArray()) {
if (Character.isLetterOrDigit(c)) {
sb.append(c);
} else {
sb.append("&#" + (int)c + ";");
}
}
return sb.toString();
}
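The function above encodes everything that is not a letter or digit, including the space. If only a chosen set such as $ and # should be encoded, as the question asks, a variant could take that set as a parameter. A minimal sketch; the class name and the charsToEncode parameter are assumptions made for illustration:

```java
final class SelectiveEncoder {
    // Encodes only the characters listed in charsToEncode as numeric
    // character references (&#NN;) and copies everything else unchanged.
    static String encodeOnly(String text, String charsToEncode) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (charsToEncode.indexOf(c) >= 0) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeOnly("test $#", "$#")); // prints test &#36;&#35;
    }
}
```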

How do you print out a string exactly as it is?

I had an issue with my code because my file path somehow ended up with a "\n" at the end of the path which caused issues when trying to use the file, as it would not be able to find that file.
For debugging purposes, how can I print out a string INCLUDING things like \b \n \r etc.?
E.g.
System.out.println(file.getAbsolutePath).withSpecials()
which will print to console:
C:/folder/filename.extension\n
You could try using this code, which escapes a string. This takes care of all escapes except \u, which should display fine anyway.
public static String escape(String str) {
// Escape backslashes first, so the backslashes added below are not doubled.
str = str.replace("\\", "\\\\");
str = str.replace("\b", "\\b");
str = str.replace("\t", "\\t");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\r");
str = str.replace("\f", "\\f");
str = str.replace("\'", "\\'");
return str;
}
This function can be used as follows:
System.out.println(escape("123\n\rabc"));
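Chained replace calls are sensitive to ordering: the backslash has to be escaped before any replacement that introduces a backslash. A single pass over the characters sidesteps that. The sketch below is one possible variant, not the answer's code; the class name DebugEscape is hypothetical, and showing other control characters as four-digit hex codes is an added assumption:

```java
final class DebugEscape {
    // Builds a printable form of the string in one pass, so no
    // replacement can interfere with another.
    static String escape(String str) {
        StringBuilder sb = new StringBuilder();
        for (char c : str.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\f': sb.append("\\f"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters: backslash, 'u', four hex digits.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C:/folder/filename.extension\n"));
    }
}
```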
public class Main {
public static void main(String arg[]) {
String str = "bla\r\n";
System.out.print(str); // prints "bla" and breaks line
System.out.print(Main.withEndings(str)); // prints "bla\r\n"
// Breaks a line
System.out.println();
// Every char is a number, Java uses by default UTF-16 char encoding
char end = '\n';
System.out.println("Char code: " + (int)end); // prints "Char code: 10"
}
public static String withEndings(String str) {
// Replace the character '\n' to a string with 2 characters the '\' and the 'n',
// the same to '\r'.
return str.replace("\n", "\\n").replace("\r", "\\r");
}
}
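You can print the \n by doing string.replace("\n", "\\n"); (replace treats the replacement literally, so one escaped backslash is enough; the doubled form "\\\\n" is only needed with replaceAll).
So to print it out do: System.out.println(file.getAbsolutePath().replace("\n", "\\n"));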
