Remove junk characters from a string in Java

I have the string like:
TEST
FURNITURE-34_TEST>
My requirement is to remove all those junk characters from the above string.
so my expected output will be:
TEST FURNITURE-34_TEST
I have tried the below code
public static String removeUnPrintableChars(String str) {
if (str != null) {
str = str.replaceAll("[^\\x00-\\x7F]", "");
str = str.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
str = str.replaceAll("\\p{C}", "");
str = str.replaceAll("\\P{Print}", "");
str = str.substring(0, Math.min(256, str.length()));
str = str.trim();
if (str.isEmpty()) {
str = null;
}
}
return str;
}
But it does nothing. Instead of finding and replacing each character as empty, can anyone please help me with the generic solution to replace those kinds of characters from the string?

Simple way to split a string :
public class Trim {
public static void main(String[] args) {
String myString = "TEST FURNITURE-34_TEST&"
+ "amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;#38;amp;amp;"
+ "#38;amp;#38;gt;";
String[] parts = myString.split("&");
String part1 = parts[0];
System.out.println(parts[0]);
}
}
Link to original thread :
How to split a string in Java

The sample strings you are presenting (within your post and in comments) are rather ridiculous and in my opinion, whatever is generating them should be burned....twice.
Try the following method on your string(s). Add whatever you like to have removed from the input string by adding it to the 2D removableItems String Array. This 2D array contains preparation strings for the String#replaceAll() method. The first element of each row contains a Regular Expression (regex) of a particular string item to replace and the second element of each row contains the string item to replace the found items with.
public static String cleanString(String inputString) {
String[][] removableItems = {
{"(&?amp;){1,}", " "},
{"(#38);?", ""},
{"gt;", ""}, {"lt;", ""}
};
String desiredString = inputString;
for (int i = 0; i < removableItems.length; i++) {
desiredString = desiredString.replaceAll(removableItems[i][0],
removableItems[i][1]).trim();
}
return desiredString;
}

You can use this method. It works by matching at word boundaries.
public static String removeUnPrintableChars(String str) {
if(str != null){
str = str.replaceAll("(\\b&?\\w+;#?)", "");
}
return str;
}
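The junk in the sample looks like repeatedly escaped HTML entities (fragments such as amp;, #38; and gt;). Here is a minimal sketch that removes whole runs of such fragments in one pass, assuming those fragments plus lt; are the only junk. The class name EntityJunkCleaner, the method clean and the fragment list are illustrative and not taken from any answer above.

```java
import java.util.regex.Pattern;

class EntityJunkCleaner {
    // Assumption: the only junk is runs of these entity fragments,
    // optionally introduced by a single '&'.
    private static final Pattern JUNK =
            Pattern.compile("&?(?:amp;|#38;|gt;|lt;)+");

    static String clean(String input) {
        if (input == null) {
            return null;
        }
        // Replace each run with one space, then collapse and trim whitespace.
        return JUNK.matcher(input).replaceAll(" ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        String s = "TEST&amp;amp;#38;amp;FURNITURE-34_TEST&amp;#38;gt;";
        System.out.println(clean(s)); // TEST FURNITURE-34_TEST
    }
}
```

Because the pattern is purely textual, a legitimate word that happens to end in one of the fragments (for example "lamp;") would also be clipped, so the fragment list should be adapted to the real data.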

Related

How to remove multiple words from a string Java

I'm new to java and currently, I'm learning strings.
How to remove multiple words from a string?
I would be glad for any hint.
class WordDeleterTest {
public static void main(String[] args) {
WordDeleter wordDeleter = new WordDeleter();
// Hello
System.out.println(wordDeleter.remove("Hello Java", new String[] { "Java" }));
// The Athens in
System.out.println(wordDeleter.remove("The Athens is in Greece", new String[] { "is", "Greece" }));
}
}
class WordDeleter {
public String remove(String phrase, String[] words) {
String[] array = phrase.split(" ");
String word = "";
String result = "";
for (int i = 0; i < words.length; i++) {
word += words[i];
}
for (String newWords : array) {
if (!newWords.equals(word)) {
result += newWords + " ";
}
}
return result.trim();
}
}
Output:
Hello
The Athens is in Greece
I've already tried to use replace here, but it didn't work.
You can do it using streams:
String phrase = ...;
List<String> wordsToRemove = ...;
String result = Arrays.stream(phrase.split("\\s+"))
.filter(w -> !wordsToRemove.contains(w))
.collect(Collectors.joining(" "));
Programmers often do this:
String sentence = "Hello Java World!";
sentence.replace("Java", "");
System.out.println(sentence);
=> Hello Java World!
Strings are immutable, and the replace function returns a new string object. So instead write
String sentence = "Hello Java World!";
sentence = sentence.replace("Java", "");
System.out.println(sentence);
=> Hello  World!
(the whitespace on both sides of the removed word still exists)
With that, your replace function could look like
public String remove(String phrase, String[] words) {
String result = phrase;
for (String word: words) {
result = result.replace(word, "").replace("  ", " ");
}
return result.trim();
}
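One thing to watch with substring replacement is that it also cuts into longer words: removing "is" from "this is it" leaves "th" behind. A minimal sketch of whole-word removal, combining the split-and-filter idea from the stream answer with a set lookup, could look like this. The class name WholeWordRemover and the varargs signature are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class WholeWordRemover {
    static String remove(String phrase, String... words) {
        // A set gives constant-time lookups and tolerates duplicate entries.
        Set<String> toRemove = new HashSet<>(Arrays.asList(words));
        // Split on runs of whitespace, drop unwanted words, re-join with single spaces.
        return Arrays.stream(phrase.trim().split("\\s+"))
                .filter(w -> !toRemove.contains(w))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(remove("Hello Java", "Java"));                      // Hello
        System.out.println(remove("The Athens is in Greece", "is", "Greece")); // The Athens in
        System.out.println(remove("this is it", "is"));                        // this it
    }
}
```

The comparison is case-sensitive and treats punctuation as part of a word, so "Java!" would not be removed by "Java".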

How to separate a String line with a paragraph to make text as a list

I have a really long text that looks like "123testes1233iambeginnerplshelp123 .." and I need to start a new line each time the program reads a number.
So the output should look like:
123testes
1233iambeginnerplshelp
123 ...
You can solve it using a regex. Look for every pattern where a number is followed by lowercase letters and print each match as it is found:
String text = "123testes1233stackoverflowwillsaveyou123dontworry";
String wordToFind = "\\d+[a-z]+";
Pattern word = Pattern.compile(wordToFind);
Matcher match = word.matcher(text);
while (match.find()) {
System.out.println(match.group());
}
One way to do it would be to use StringTokenizer. If you make the assumption that every output line must start with 123, even if the input doesn't start with it, it could be:
String input = "123testes1233iambeginnerplshelp123 ..";
String delimiter = "123";
StringTokenizer tokenizer = new StringTokenizer(input, delimiter);
while (tokenizer.hasMoreTokens()) {
String line = delimiter + tokenizer.nextToken();
System.out.println(line);
}
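A caveat with this approach: StringTokenizer treats every character of the delimiter string as a separate delimiter, so "123" means "any of 1, 2 or 3" rather than the literal sequence 123. The sketch below makes that visible. The class name TokenizerDelimiterDemo and the helper tokens are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDelimiterDemo {
    // Collects all tokens produced for the given set of delimiter characters.
    static List<String> tokens(String input, String delimiterChars) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiterChars);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // "1233" is consumed entirely, so the trailing 3 is lost.
        System.out.println(tokens("123testes1233iam", "123")); // [testes, iam]
        // The digits need not appear in the order 1-2-3 to act as delimiters.
        System.out.println(tokens("abc321def", "123"));        // [abc, def]
    }
}
```

So prefixing each token with "123" reproduces the expected output only when every number in the input is exactly 123.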
A simple approach (without any dependencies) would look something like this,
class Test {
public static void main (String[] args) throws java.lang.Exception
{
String a = "123testes1233iambeginnerplshelp123";
StringBuffer sb = new StringBuffer();
for (int i=0; i<a.length()-1; i++) {
while (i<a.length()-1 && !(!isNumber(a.charAt(i)) && isNumber(a.charAt(i+1)))) {
sb.append(a.substring(i,i+1));
i++;
}
sb.append(a.substring(i,i+1));
System.out.println(sb.toString());
sb.setLength(0);
}
}
private static boolean isNumber (char c) {
return ((int)c >=48) && ((int)c <= 57);
}
}
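My solution:
public StringSplitNum() {
String someString = "123testes1233iambeginnerplshelp123abc";
// Split at every boundary between a letter and a digit (in either direction)
String regex = "((?<=[a-zA-Z])(?=[0-9]))|((?<=[0-9])(?=[a-zA-Z]))";
List<String> arr = Arrays.asList(someString.split(regex));
// The pieces alternate between number and text; print them in pairs
for (int i = 0; i + 1 < arr.size(); i += 2) {
System.out.println(arr.get(i) + " " + arr.get(i + 1));
}
}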
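If the number and the text that follows it should stay together on one line, a single one-directional lookaround is enough. This is a minimal sketch, assuming a new line starts wherever a non-digit is immediately followed by a digit; the class and method names are illustrative.

```java
import java.util.Arrays;

class SplitBeforeDigits {
    static String[] splitBeforeNumbers(String text) {
        // Zero-width split point: a non-digit on the left, a digit on the right.
        return text.split("(?<=\\D)(?=\\d)");
    }

    public static void main(String[] args) {
        for (String line : splitBeforeNumbers("123testes1233iambeginnerplshelp123abc")) {
            System.out.println(line);
        }
        // 123testes
        // 1233iambeginnerplshelp
        // 123abc
    }
}
```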

Split a string with multiple delimiters using only String methods

I want to split a string into tokens.
I lifted the code from another Stack Overflow question - Equivalent to StringTokenizer with multiple characters delimiters, but I want to know if this can be done with only String methods (.equals(), .startsWith(), etc.). I don't want to use regexes, the StringTokenizer class, Patterns, Matchers or anything other than String for that matter.
For example, this is how I want to call the method
String[] delimiters = {" ", "==", "=", "+", "+=", "++", "-", "-=", "--", "/", "/=", "*", "*=", "(", ")", ";", "/**", "*/", "\t", "\n"};
String splitString[] = tokenizer(contents, delimiters);
And this is the code I lifted from the other question (I don't want to do it this way).
private String[] tokenizer(String string, String[] delimiters) {
// First, create a regular expression that matches the union of the
// delimiters
// Be aware that, in case of delimiters containing others (example &&
// and &),
// the longer may be before the shorter (&& should be before &) or the
// regexpr
// parser will recognize && as two &.
Arrays.sort(delimiters, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return -o1.compareTo(o2);
}
});
// Build a string that will contain the regular expression
StringBuilder regexpr = new StringBuilder();
regexpr.append('(');
for (String delim : delimiters) { // For each delimiter
if (regexpr.length() != 1)
regexpr.append('|'); // Add union separator if needed
for (int i = 0; i < delim.length(); i++) {
// Add an escape character if the character is a regexp reserved
// char
regexpr.append('\\');
regexpr.append(delim.charAt(i));
}
}
regexpr.append(')'); // Close the union
Pattern p = Pattern.compile(regexpr.toString());
// Now, search for the tokens
List<String> res = new ArrayList<String>();
Matcher m = p.matcher(string);
int pos = 0;
while (m.find()) { // While there's a delimiter in the string
if (pos != m.start()) {
// If there's something between the current and the previous
// delimiter
// Add it to the tokens list
res.add(string.substring(pos, m.start()));
}
res.add(m.group()); // add the delimiter
pos = m.end(); // Remember end of delimiter
}
if (pos != string.length()) {
// If it remains some characters in the string after last delimiter
// Add this to the token list
res.add(string.substring(pos));
}
// Return the result
return res.toArray(new String[res.size()]);
}
public static String[] clean(final String[] v) {
List<String> list = new ArrayList<String>(Arrays.asList(v));
list.removeAll(Collections.singleton(" "));
return list.toArray(new String[list.size()]);
}
Edit: I ONLY want to use string methods charAt, equals, equalsIgnoreCase, indexOf, length, and substring
EDIT:
My original answer did not quite do the trick, it did not include the delimiters in the resultant array, and used the String.split() method, which was not allowed.
Here's my new solution, which is split into 2 methods:
/**
* Splits the string at all specified literal delimiters, and includes the delimiters in the resulting array
*/
private static String[] tokenizer(String subject, String[] delimiters) {
//Sort delimiters into length order, starting with longest
Arrays.sort(delimiters, new Comparator<String>() {
@Override
public int compare(String s1, String s2) {
return s2.length()-s1.length();
}
});
//start with a list with only one string - the whole thing
List<String> tokens = new ArrayList<String>();
tokens.add(subject);
//loop through the delimiters, splitting on each one
for (int i=0; i<delimiters.length; i++) {
tokens = splitStrings(tokens, delimiters, i);
}
return tokens.toArray(new String[] {});
}
/**
* Splits each String in the subject at the delimiter
*/
private static List<String> splitStrings(List<String> subject, String[] delimiters, int delimiterIndex) {
List<String> result = new ArrayList<String>();
String delimiter = delimiters[delimiterIndex];
//for each input string
for (String part : subject) {
int start = 0;
//if this part equals one of the delimiters, don't split it up any more
boolean alreadySplit = false;
for (String testDelimiter : delimiters) {
if (testDelimiter.equals(part)) {
alreadySplit = true;
break;
}
}
if (!alreadySplit) {
for (int index=0; index<part.length(); index++) {
String subPart = part.substring(index);
if (subPart.indexOf(delimiter)==0) {
result.add(part.substring(start, index)); // part before delimiter
result.add(delimiter); // delimiter
start = index+delimiter.length(); // next parts starts after delimiter
}
}
}
result.add(part.substring(start)); // rest of string after last delimiter
}
return result;
}
Original Answer
I notice you are using Pattern when you said you only wanted to use String methods.
The approach I would take would be to think of the simplest way possible. I think that is to first replace all the possible delimiters with just one delimiter, and then do the split.
Here's the code:
private String[] tokenizer(String string, String[] delimiters) {
//replace all specified delimiters with one
for (String delimiter : delimiters) {
while (string.indexOf(delimiter)!=-1) {
string = string.replace(delimiter, "{split}");
}
}
//now split at the new delimiter
return string.split("\\{split\\}");
}
I use String.replace() and not String.replaceAll() because replace() takes literal text and replaceAll() takes a regex argument, and the delimiters supplied are literal text.
Note that replace() already substitutes every occurrence of the delimiter, so the while loop is not strictly required.
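The difference between the two methods can be checked with a short sketch. Both String.replace(CharSequence, CharSequence) and String.replaceAll(String, String) replace every occurrence; only the interpretation of the arguments differs. The class and helper names below are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceVsReplaceAll {
    // Literal replacement: target and replacement are taken as plain text.
    static String literalReplace(String s, String target, String with) {
        return s.replace(target, with);
    }

    // Regex replacement made literal by quoting both arguments.
    static String quotedReplaceAll(String s, String target, String with) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(with));
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a+b+c", "+", "-"));   // a-b-c
        System.out.println(quotedReplaceAll("a+b+c", "+", "-")); // a-b-c
    }
}
```

Passing an unquoted "+" to replaceAll would instead throw a PatternSyntaxException, because a bare + is a dangling quantifier in a regex.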
Using only non-regex String methods...
I used the startsWith(...) method, which wasn't in the list of methods that you allowed, because it does a simple string comparison rather than a regex comparison.
The following impl:
public static void main(String ... params) {
String haystack = "abcdefghijklmnopqrstuvwxyz";
String [] needles = new String [] { "def", "tuv" };
String [] tokens = splitIntoTokensUsingNeedlesFoundInHaystack(haystack, needles);
for (String string : tokens) {
System.out.println(string);
}
}
private static String[] splitIntoTokensUsingNeedlesFoundInHaystack(String haystack, String[] needles) {
List<String> list = new LinkedList<String>();
StringBuilder builder = new StringBuilder();
for(int haystackIndex = 0; haystackIndex < haystack.length(); haystackIndex++) {
boolean foundAnyNeedle = false;
String substring = haystack.substring(haystackIndex);
for(int needleIndex = 0; (!foundAnyNeedle) && needleIndex < needles.length; needleIndex ++) {
String needle = needles[needleIndex];
if(substring.startsWith(needle)) {
if(builder.length() > 0) {
list.add(builder.toString());
builder = new StringBuilder();
}
foundAnyNeedle = true;
list.add(needle);
haystackIndex += (needle.length() - 1);
}
}
if( ! foundAnyNeedle) {
builder.append(substring.charAt(0));
}
}
if(builder.length() > 0) {
list.add(builder.toString());
}
return list.toArray(new String[]{});
}
outputs
abc
def
ghijklmnopqrs
tuv
wxyz
Note...
This code is demo-only. In the event that one of the delimiters is an empty String, it will behave poorly and eventually crash with OutOfMemoryError: Java heap space after consuming a lot of CPU.
As far as I understand your problem, you can do something like this:
public Object[] tokenizer(String value, String[] delimeters){
List<String> list= new ArrayList<String>();
for(String s:delimeters){
if(value.contains(s)){
String[] strArr=value.split("\\"+s);
for(String str:strArr){
list.add(str);
if(!list.contains(s)){
list.add(s);
}
}
}
}
Object[] newValues=list.toArray();
return newValues;
}
Now in the main method call this function -
String[] delimeters = {" ", "{", "==", "=", "+", "+=", "++", "-", "-=", "--", "/", "/=", "*", "*=", "(", ")", ";", "/**", "*/", "\t", "\n"};
Object[] obj=st.tokenizer("ge{ab", delimeters); //st is the reference of the other class. Edit this of your own.
for(Object o:obj){
System.out.println(o.toString());
}
Suggestion:
private static int INIT_INDEX_MAX_INT = Integer.MAX_VALUE;
private static String[] tokenizer(final String string, final String[] delimiters) {
final List<String> result = new ArrayList<>();
int currentPosition = 0;
while (currentPosition < string.length()) {
// plan: search for the nearest delimiter and its position
String nextDelimiter = "";
int positionIndex = INIT_INDEX_MAX_INT;
for (final String currentDelimiter : delimiters) {
final int currentPositionIndex = string.indexOf(currentDelimiter, currentPosition);
if (currentPositionIndex < 0) { // current delimiter not found, go to the next
continue;
}
if (currentPositionIndex < positionIndex) { // we found a better one, update
positionIndex = currentPositionIndex;
nextDelimiter = currentDelimiter;
}
}
if (positionIndex == INIT_INDEX_MAX_INT) { // we found nothing, finish up
final String finalPart = string.substring(currentPosition, string.length());
result.add(finalPart);
break;
}
// we have one, add substring + delimiter to result and update current position
// System.out.println(positionIndex + ":[" + nextDelimiter + "]"); // to follow the internals
final String stringBeforeNextDelimiter = string.substring(currentPosition, positionIndex);
result.add(stringBeforeNextDelimiter);
result.add(nextDelimiter);
currentPosition += stringBeforeNextDelimiter.length() + nextDelimiter.length();
}
return result.toArray(new String[] {});
}
Notes:
I have added more comments than necessary. I guess it would help in this case.
The performance of this is quite bad (could be improved with tree structures and hashes). It was not part of the specification.
Operator precedence is not specified (see my comment to the question). It was not part of the specification.
I ONLY want to use string methods charAt, equals, equalsIgnoreCase, indexOf, length, and substring
Check. The function uses only indexOf(), length() and substring()
No, I mean in the returned results. For example, If my delimiter was {, and a string was ge{ab, I would like an array with ge, { and ab
Check:
private static void test() {
final String[] delimiters = { "{" };
final String contents = "ge{ab";
final String splitString[] = tokenizer(contents, delimiters);
final String joined = String.join("", splitString);
System.out.println(Arrays.toString(splitString));
System.out.println(contents.equals(joined) ? "ok" : "wrong: [" + contents + "]#[" + joined + "]");
}
// [ge, {, ab]
// ok
One final remark: I would advise reading about compiler construction, in particular the compiler front end, if one wants best practices for this kind of question.
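Since precedence between overlapping delimiters (for example + and +=) is left open, one possible rule is "nearest delimiter wins, and at the same position the longest one wins". The following is a minimal sketch of that rule using only indexOf, length and substring on the strings. The class name LongestMatchTokenizer and the tie-breaking rule itself are assumptions rather than part of the question, and unlike the suggestion above it drops empty strings between adjacent delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LongestMatchTokenizer {
    static String[] tokenize(String text, String[] delimiters) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestIndex = -1;
            String best = null;
            for (String d : delimiters) {
                if (d.length() == 0) {
                    continue; // an empty delimiter would never advance the position
                }
                int idx = text.indexOf(d, pos);
                if (idx < 0) {
                    continue; // this delimiter does not occur any more
                }
                boolean earlier = bestIndex < 0 || idx < bestIndex;
                boolean sameButLonger = idx == bestIndex && d.length() > best.length();
                if (earlier || sameButLonger) {
                    bestIndex = idx;
                    best = d;
                }
            }
            if (best == null) { // no delimiter left: the rest is one token
                result.add(text.substring(pos));
                break;
            }
            if (bestIndex > pos) { // text before the delimiter, if any
                result.add(text.substring(pos, bestIndex));
            }
            result.add(best);
            pos = bestIndex + best.length();
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("a+=b", new String[] {"+", "=", "+="}))); // [a, +=, b]
        System.out.println(Arrays.toString(tokenize("ge{ab", new String[] {"{"})));           // [ge, {, ab]
    }
}
```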
Maybe I haven't fully understood the question, but I have the impression that you want to rewrite the Java String method split(). I would advise you to have a look at this function, see how it's done and start from there.
Honestly, you could use Apache Commons Lang. If you check the source code of the library you will notice that it doesn't use regex. The method [StringUtils.split](http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#split(java.lang.String, java.lang.String)) uses only String and a lot of flags.
Anyway, take a look at this code using Apache Commons Lang.
import org.apache.commons.lang.StringUtils;
import org.junit.Assert;
import org.junit.Test;
public class SimpleTest {
@Test
public void testSplitWithoutRegex() {
String[] delimiters = {"==", "+=", "++", "-=", "--", "/=", "*=", "/**", "*/",
" ", "=", "+", "-", "/", "*", "(", ")", ";", "\t", "\n"};
String finalDelimiter = "#";
//check if delimiter can be used
boolean canBeUsed = true;
for (String delimiter : delimiters) {
if (finalDelimiter.equals(delimiter)) {
canBeUsed = false;
break;
}
}
if (!canBeUsed) {
Assert.fail("The selected delimiter can't be used.");
}
String s = "Assuming that we have /** or /* all these signals like == and; / or * will be replaced.";
System.out.println(s);
for (String delimiter : delimiters) {
while (s.indexOf(delimiter) != -1) {
s = s.replace(delimiter, finalDelimiter);
}
}
String[] splitted = StringUtils.split(s, "#");
for (String s1 : splitted) {
System.out.println(s1);
}
}
}
I hope it helps.
As simple as I could get it...
public class StringTokenizer {
public static String[] split(String s, String[] tokens) {
Arrays.sort(tokens, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.length()-o1.length();
}
});
LinkedList<String> result = new LinkedList<>();
int j=0;
for (int i=0; i<s.length(); i++) {
String ss = s.substring(i);
for (String token : tokens) {
if (ss.startsWith(token)) {
if (i>j) {
result.add(s.substring(j, i));
}
result.add(token);
j = i+token.length();
i = j-1;
break;
}
}
}
result.add(s.substring(j));
return result.toArray(new String[result.size()]);
}
}
It creates a lot of new objects, and could be optimized by writing a custom startsWith() implementation that compares the string char by char.
@Test
public void test() {
String[] split = StringTokenizer.split("this==is the most>complext<=string<<ever", new String[] {"=", "<", ">", "==", ">=", "<="});
assertArrayEquals(new String[] {"this", "==", "is the most", ">", "complext", "<=", "string", "<", "<", "ever"}, split);
}
passes fine :)
You can use recursion (a hallmark of functional programming) to make it less verbose.
public static String[] tokenizer(String text, String[] delims) {
for(String delim : delims) {
int i = text.indexOf(delim);
if(i >= 0) {
// recursive call
String[] tail = tokenizer(text.substring(i + delim.length()), delims);
// return [ head, middle, tail.. ]
String[] list = new String[tail.length + 2];
list[0] = text.substring(0,i);
list[1] = delim;
System.arraycopy(tail, 0, list, 2, tail.length);
return list;
}
}
return new String[] { text };
}
Tested it using the same unit-test from the other answer
public static void main(String ... params) {
String haystack = "abcdefghijklmnopqrstuvwxyz";
String [] needles = new String [] { "def", "tuv" };
String [] tokens = tokenizer(haystack, needles);
for (String string : tokens) {
System.out.println(string);
}
}
Output
abc
def
ghijklmnopqrs
tuv
wxyz
It would be a little more elegant if Java had better native array support.

How do I remove the whitespace?

I'm trying to create a palindrome tester program for my AP Java class and I need to remove the white spaces in my code completely but it's not letting me do so.
import java.util.Scanner;
public class Palin{
public static boolean isPalindrome(String stringToTest) {
String workingCopy = removeJunk(stringToTest);
String reversedCopy = reverse(workingCopy);
return reversedCopy.equalsIgnoreCase(workingCopy);
}
public static String removeJunk(String string) {
int i, len = string.length();
StringBuffer dest = new StringBuffer(len);
char c;
for (i = (len - 1); i >= 0; i-=1) {
c = string.charAt(i);
if (Character.isLetterOrDigit(c))
{
dest.append(c);
}
}
return dest.toString();
}
public static String reverse(String string) {
StringBuffer sb = new StringBuffer(string);
return sb.reverse().toString();
}
public static void main(String[] args) {
System.out.print("Enter Palindrome: ");
Scanner sc = new Scanner(System.in);
String string = sc.next();
String str = string;
String space = "";
String result = str.replaceAll("\\W", space);
System.out.println(result);
System.out.println();
System.out.println("Testing palindrome:");
System.out.println(" " + string);
System.out.println();
if (isPalindrome(result)) {
System.out.println("It's a palindrome!");
} else {
System.out.println("Not a palindrome!");
}
System.out.println();
}
}
Any help would be greatly appreciated.
Seems like your code is fine except for the following. You are using
String string = sc.next();
which will not read the whole line of input, hence you will lose part of the text. I think you should use the following instead of that line.
String string = sc.nextLine();
If you just want to remove the beginning and ending whitespace, you can use the built-in function trim(), e.g. " abcd ".trim() is "abcd".
If you want to remove it everywhere, you can use the replaceAll() method with the whitespace class as the parameter, e.g. " abcd ".replaceAll("\\s","").
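A minimal sketch of the two options side by side; the class and method names are illustrative.

```java
class WhitespaceDemo {
    // Removes whitespace at the beginning and end only.
    static String stripEnds(String s) {
        return s.trim();
    }

    // Removes every whitespace character (spaces, tabs, newlines) anywhere in the string.
    static String stripAll(String s) {
        return s.replaceAll("\\s", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + stripEnds("  ab cd  ") + "]"); // [ab cd]
        System.out.println("[" + stripAll("  ab cd  ") + "]");  // [abcd]
        System.out.println("[" + stripAll("a\tb\n") + "]");     // [ab]
    }
}
```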
Use a StringTokenizer to remove " "
StringTokenizer st = new StringTokenizer(string," ",false);
String t="";
while (st.hasMoreElements()) t += st.nextElement();
String result = t;
System.out.println(result);
I haven't actually tested this, but have you considered the String.replaceAll(String regex, String replacement) method?
public static String removeJunk (String string) {
return string.replaceAll (" ", "");
}
Another thing to look out for is that while removing all non-digit/alpha characters removeJunk also reverses the string (it starts from the end and then appends one character at a time).
So after reversing it again (in reverse) you are left with the original and it will always claim that the given string is a palindrome.
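A minimal sketch of a check without that double reversal: the filter walks the string front to back, so only the explicit reverse step flips it. The class and method names are made up for the example, and the rule "ignore everything except letters and digits, ignore case" is taken from the question's code.

```java
class PalindromeSketch {
    // Keeps letters and digits in their original order.
    static String keepLettersAndDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static boolean isPalindrome(String s) {
        String cleaned = keepLettersAndDigits(s);
        String reversed = new StringBuilder(cleaned).reverse().toString();
        return cleaned.equalsIgnoreCase(reversed);
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("A man, a plan, a canal: Panama")); // true
        System.out.println(isPalindrome("Hello"));                          // false
    }
}
```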
You should use the String replace(CharSequence target, CharSequence replacement) method.
Although the name suggests that only the first occurrence will be replaced, in fact all occurrences are replaced. The advantage of this method is that the target is treated as literal text rather than as a regular expression.
So give string.replace(" ", "") a try. The char overload, replace(char oldChar, char newChar), cannot remove characters because Java has no empty char literal.

How to capitalize the first character of each word in a string

Is there a function built into Java that capitalizes the first character of each word in a String, and does not affect the others?
Examples:
jon skeet -> Jon Skeet
miles o'Brien -> Miles O'Brien (B remains capital, this rules out Title Case)
old mcdonald -> Old Mcdonald*
*(Old McDonald would be fine too, but I don't expect it to be THAT smart.)
A quick look at the Java String Documentation reveals only toUpperCase() and toLowerCase(), which of course do not provide the desired behavior. Naturally, Google results are dominated by those two functions. It seems like a wheel that must have been invented already, so it couldn't hurt to ask so I can use it in the future.
WordUtils.capitalize(str) (from apache commons-text)
(Note: if you need "fOO BAr" to become "Foo Bar", then use capitalizeFully(..) instead)
If you're only worried about the first letter of the first word being capitalized:
private String capitalize(final String line) {
return Character.toUpperCase(line.charAt(0)) + line.substring(1);
}
The following method converts all the letters into upper/lower case, depending on their position near a space or other special chars.
public static String capitalizeString(String string) {
char[] chars = string.toLowerCase().toCharArray();
boolean found = false;
for (int i = 0; i < chars.length; i++) {
if (!found && Character.isLetter(chars[i])) {
chars[i] = Character.toUpperCase(chars[i]);
found = true;
} else if (Character.isWhitespace(chars[i]) || chars[i]=='.' || chars[i]=='\'') { // You can add other chars here
found = false;
}
}
return String.valueOf(chars);
}
Try this very simple way
example givenString="ram is good boy"
public static String toTitleCase(String givenString) {
String[] arr = givenString.split(" ");
StringBuffer sb = new StringBuffer();
for (int i = 0; i < arr.length; i++) {
sb.append(Character.toUpperCase(arr[i].charAt(0)))
.append(arr[i].substring(1)).append(" ");
}
return sb.toString().trim();
}
Output will be: Ram Is Good Boy
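Splitting on a single space yields empty tokens when the input contains consecutive spaces, and charAt(0) on an empty token throws StringIndexOutOfBoundsException. A minimal sketch that avoids splitting altogether and leaves the spacing and all other characters untouched is shown below. The class name CapitalizeWords is an assumption for the example, and the code works on char values, so it does not handle supplementary characters.

```java
class CapitalizeWords {
    static String capitalizeWords(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        boolean startOfWord = true; // true at the start and after any whitespace
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                startOfWord = true;
                sb.append(c);
            } else {
                // Only the first character of a word is changed.
                sb.append(startOfWord ? Character.toUpperCase(c) : c);
                startOfWord = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("jon skeet"));     // Jon Skeet
        System.out.println(capitalizeWords("miles o'Brien")); // Miles O'Brien
        System.out.println(capitalizeWords("old  mcdonald")); // Old  Mcdonald
    }
}
```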
I made a solution in Java 8 that is IMHO more readable.
public String firstLetterCapitalWithSingleSpace(final String words) {
return Stream.of(words.trim().split("\\s"))
.filter(word -> word.length() > 0)
.map(word -> word.substring(0, 1).toUpperCase() + word.substring(1))
.collect(Collectors.joining(" "));
}
The Gist for this solution can be found here: https://gist.github.com/Hylke1982/166a792313c5e2df9d31
String toBeCapped = "i want this sentence capitalized";
String[] tokens = toBeCapped.split("\\s");
toBeCapped = "";
for(int i = 0; i < tokens.length; i++){
char capLetter = Character.toUpperCase(tokens[i].charAt(0));
toBeCapped += " " + capLetter + tokens[i].substring(1);
}
toBeCapped = toBeCapped.trim();
I've written a small class to capitalize all the words in a String. It offers:
Optional multiple delimiters, each one with its own behavior (capitalize before, after, or both, to handle cases like O'Brian);
An optional Locale;
It doesn't break on surrogate pairs.
Output:
====================================
SIMPLE USAGE
====================================
Source: cApItAlIzE this string after WHITE SPACES
Output: Capitalize This String After White Spaces
====================================
SINGLE CUSTOM-DELIMITER USAGE
====================================
Source: capitalize this string ONLY before'and''after'''APEX
Output: Capitalize this string only beforE'AnD''AfteR'''Apex
====================================
MULTIPLE CUSTOM-DELIMITER USAGE
====================================
Source: capitalize this string AFTER SPACES, BEFORE'APEX, and #AFTER AND BEFORE# NUMBER SIGN (#)
Output: Capitalize This String After Spaces, BeforE'apex, And #After And BeforE# Number Sign (#)
====================================
SIMPLE USAGE WITH CUSTOM LOCALE
====================================
Source: Uniforming the first and last vowels (different kind of 'i's) of the Turkish word D[İ]YARBAK[I]R (DİYARBAKIR)
Output: Uniforming The First And Last Vowels (different Kind Of 'i's) Of The Turkish Word D[i]yarbak[i]r (diyarbakir)
====================================
SIMPLE USAGE WITH A SURROGATE PAIR
====================================
Source: ab 𐐂c de à
Output: Ab 𐐪c De À
Note: the first letter will always be capitalized (edit the source if you don't want that).
Please share your comments and help me find bugs or improve the code...
Code:
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
public class WordsCapitalizer {
public static String capitalizeEveryWord(String source) {
return capitalizeEveryWord(source,null,null);
}
public static String capitalizeEveryWord(String source, Locale locale) {
return capitalizeEveryWord(source,null,locale);
}
public static String capitalizeEveryWord(String source, List<Delimiter> delimiters, Locale locale) {
char[] chars;
if (delimiters == null || delimiters.size() == 0)
delimiters = getDefaultDelimiters();
// If Locale specified, i18n toLowerCase is executed, to handle specific behaviors (eg. Turkish dotted and dotless 'i')
if (locale!=null)
chars = source.toLowerCase(locale).toCharArray();
else
chars = source.toLowerCase().toCharArray();
// First character ALWAYS capitalized, if it is a Letter.
if (chars.length>0 && Character.isLetter(chars[0]) && !isSurrogate(chars[0])){
chars[0] = Character.toUpperCase(chars[0]);
}
for (int i = 0; i < chars.length; i++) {
if (!isSurrogate(chars[i]) && !Character.isLetter(chars[i])) {
// Current char is not a Letter; check whether it is a delimiter.
for (Delimiter delimiter : delimiters){
if (delimiter.getDelimiter()==chars[i]){
// Delimiter found, applying rules...
if (delimiter.capitalizeBefore() && i>0
&& Character.isLetter(chars[i-1]) && !isSurrogate(chars[i-1]))
{ // previous character is a Letter and I have to capitalize it
chars[i-1] = Character.toUpperCase(chars[i-1]);
}
if (delimiter.capitalizeAfter() && i<chars.length-1
&& Character.isLetter(chars[i+1]) && !isSurrogate(chars[i+1]))
{ // next character is a Letter and I have to capitalize it
chars[i+1] = Character.toUpperCase(chars[i+1]);
}
break;
}
}
}
}
return String.valueOf(chars);
}
private static boolean isSurrogate(char chr){
// Check if the current character is part of an UTF-16 Surrogate Pair.
// Note: not validating the pair, just used to bypass (any found part of) it.
return (Character.isHighSurrogate(chr) || Character.isLowSurrogate(chr));
}
private static List<Delimiter> getDefaultDelimiters(){
// If no delimiter specified, "Capitalize after space" rule is set by default.
List<Delimiter> delimiters = new ArrayList<Delimiter>();
delimiters.add(new Delimiter(Behavior.CAPITALIZE_AFTER_MARKER, ' '));
return delimiters;
}
public static class Delimiter {
private Behavior behavior;
private char delimiter;
public Delimiter(Behavior behavior, char delimiter) {
super();
this.behavior = behavior;
this.delimiter = delimiter;
}
public boolean capitalizeBefore(){
return (behavior.equals(Behavior.CAPITALIZE_BEFORE_MARKER)
|| behavior.equals(Behavior.CAPITALIZE_BEFORE_AND_AFTER_MARKER));
}
public boolean capitalizeAfter(){
return (behavior.equals(Behavior.CAPITALIZE_AFTER_MARKER)
|| behavior.equals(Behavior.CAPITALIZE_BEFORE_AND_AFTER_MARKER));
}
public char getDelimiter() {
return delimiter;
}
}
public static enum Behavior {
CAPITALIZE_AFTER_MARKER(0),
CAPITALIZE_BEFORE_MARKER(1),
CAPITALIZE_BEFORE_AND_AFTER_MARKER(2);
private int value;
private Behavior(int value) {
this.value = value;
}
public int getValue() {
return value;
}
}
}
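Using org.apache.commons.lang.StringUtils makes it very simple, although capitalize() only changes the first character of the whole string, not of every word.
String capitalizeStr = StringUtils.capitalize(str);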
From Java 9 onward
you can use Matcher::replaceAll with a replacement function, like this:
public static void upperCaseAllFirstCharacter(String text) {
String regex = "\\b(.)(.*?)\\b";
String result = Pattern.compile(regex).matcher(text).replaceAll(
matche -> matche.group(1).toUpperCase() + matche.group(2)
);
System.out.println(result);
}
Example :
upperCaseAllFirstCharacter("hello this is Just a test");
Outputs
Hello This Is Just A Test
With this simple code:
String example="hello";
example=example.substring(0,1).toUpperCase()+example.substring(1, example.length());
System.out.println(example);
Result: Hello
I'm using the following function. I think it is faster in performance.
public static String capitalize(String text){
String c = (text != null)? text.trim() : "";
String[] words = c.split(" ");
String result = "";
for(String w : words){
result += (w.length() > 1? w.substring(0, 1).toUpperCase(Locale.US) + w.substring(1, w.length()).toLowerCase(Locale.US) : w) + " ";
}
return result.trim();
}
Use the split method to split your string into words, then use the built-in String methods to capitalize each word, then append them together.
Pseudo-code (ish)
String string = "the sentence you want to apply caps to";
String[] words = string.split(" ");
string = "";
for (String w : words) {
// Upper-case the first letter and lower-case the rest of the word
String word = w.substring(0, 1).toUpperCase() + w.substring(1).toLowerCase();
string += word + " ";
}
string = string.trim();
In the end string looks something like
"The Sentence You Want To Apply Caps To"
This might be useful if you need to capitalize titles. It capitalizes each substring delimited by " ", except for specified strings such as "a" or "the". I haven't run it yet because it's late, but it should be fine. It uses Apache Commons StringUtils.join() at one point; you can substitute a simple loop if you wish.
private static String capitalize(String string) {
if (string == null) return null;
String[] wordArray = string.split(" "); // Split string to analyze word by word.
int i = 0;
lowercase:
for (String word : wordArray) {
if (i != 0) { // The first word is always capitalized (compare by index, not by reference)
String [] lowercaseWords = {"a", "an", "as", "and", "although", "at", "because", "but", "by", "for", "in", "nor", "of", "on", "or", "so", "the", "to", "up", "yet"};
for (String word2 : lowercaseWords) {
if (word.equals(word2)) {
wordArray[i] = word;
i++;
continue lowercase;
}
}
}
char[] characterArray = word.toCharArray();
characterArray[0] = Character.toTitleCase(characterArray[0]);
wordArray[i] = new String(characterArray);
i++;
}
return StringUtils.join(wordArray, " "); // Re-join string
}
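The same title-casing idea can be sketched without the labeled loop and without Apache Commons, assuming a Set for the small-word lookup and String.join (Java 8+) for re-joining. The names TitleCase, SMALL_WORDS and titleCase are illustrative, and the word list is only a subset of the one above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TitleCase {
    // Illustrative subset of words kept in lower case unless they come first.
    private static final Set<String> SMALL_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "as", "at", "but", "by", "for", "in",
            "nor", "of", "on", "or", "so", "the", "to", "up", "yet"));

    static String titleCase(String input) {
        if (input == null) {
            return null;
        }
        String[] words = input.split(" ");
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            // Leave empty tokens (from repeated spaces) and small words alone,
            // except that the first word is always capitalized.
            if (word.isEmpty() || (i > 0 && SMALL_WORDS.contains(word))) {
                continue;
            }
            words[i] = Character.toTitleCase(word.charAt(0)) + word.substring(1);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(titleCase("the lord of the rings")); // The Lord of the Rings
    }
}
```

Skipping empty tokens avoids the ArrayIndexOutOfBoundsException that characterArray[0] would raise on input containing consecutive spaces.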
public static String toTitleCase(String word){
return Character.toUpperCase(word.charAt(0)) + word.substring(1);
}
public static void main(String[] args){
String phrase = "this is to be title cased";
String[] splitPhrase = phrase.split(" ");
String result = "";
for(String word: splitPhrase){
result += toTitleCase(word) + " ";
}
System.out.println(result.trim());
}
1. Java 8 Streams
public static String capitalizeAll(String str) {
if (str == null || str.isEmpty()) {
return str;
}
return Arrays.stream(str.split("\\s+"))
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
.collect(Collectors.joining(" "));
}
Examples:
System.out.println(capitalizeAll("jon skeet")); // Jon Skeet
System.out.println(capitalizeAll("miles o'Brien")); // Miles O'Brien
System.out.println(capitalizeAll("old mcdonald")); // Old Mcdonald
System.out.println(capitalizeAll(null)); // null
For foo bAR to Foo Bar, replace the map() method with the following:
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1).toLowerCase())
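One edge case to be aware of: split("\\s+") returns a leading empty token when the input starts with whitespace, so substring(0, 1) throws a StringIndexOutOfBoundsException for input such as " jon skeet". A sketch that trims first and filters empty tokens is shown below; the class name StreamCaps is illustrative.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class StreamCaps {
    static String capitalizeAll(String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        return Arrays.stream(str.trim().split("\\s+"))
                .filter(t -> !t.isEmpty()) // whitespace-only input yields one empty token
                .map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(capitalizeAll(" jon  skeet ")); // Jon Skeet
    }
}
```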
2. String.replaceAll() (Java 9+)
public static String capitalizeAll(String str) {
if (str == null || str.isEmpty()) {
return str;
}
return Pattern.compile("\\b(.)(.*?)\\b")
.matcher(str)
.replaceAll(match -> match.group(1).toUpperCase() + match.group(2));
}
Examples:
System.out.println(capitalizeAll("12 ways to learn java")); // 12 Ways To Learn Java
System.out.println(capitalizeAll("i am atta")); // I Am Atta
System.out.println(capitalizeAll(null)); // null
3. Apache Commons Text
System.out.println(WordUtils.capitalize("love is everywhere")); // Love Is Everywhere
System.out.println(WordUtils.capitalize("sky, sky, blue sky!")); // Sky, Sky, Blue Sky!
System.out.println(WordUtils.capitalize(null)); // null
For titlecase:
System.out.println(WordUtils.capitalizeFully("fOO bAR")); // Foo Bar
System.out.println(WordUtils.capitalizeFully("sKy is BLUE!")); // Sky Is Blue!
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter the sentence : ");
try
{
String str = br.readLine();
char[] str1 = new char[str.length()];
for(int i=0; i<str.length(); i++)
{
str1[i] = Character.toLowerCase(str.charAt(i));
}
str1[0] = Character.toUpperCase(str1[0]);
for(int i=0;i<str.length();i++)
{
if(str1[i] == ' ' && i + 1 < str1.length) // guard against a trailing space
{
str1[i+1] = Character.toUpperCase(str1[i+1]);
}
System.out.print(str1[i]);
}
}
catch(Exception e)
{
System.err.println("Error: " + e.getMessage());
}
I decided to add one more solution for capitalizing words in a string:
words are defined here as adjacent letter-or-digit characters;
surrogate pairs are supported as well;
the code has been optimized for performance; and
it is still compact.
Function:
public static String capitalize(String string) {
final int sl = string.length();
final StringBuilder sb = new StringBuilder(sl);
boolean lod = false;
for(int s = 0; s < sl; s++) {
final int cp = string.codePointAt(s);
sb.appendCodePoint(lod ? Character.toLowerCase(cp) : Character.toUpperCase(cp));
lod = Character.isLetterOrDigit(cp);
if(!Character.isBmpCodePoint(cp)) s++;
}
return sb.toString();
}
Example call:
System.out.println(capitalize("An à la carte StRiNg. Surrogate pairs: 𐐪𐐪."));
Result:
An À La Carte String. Surrogate Pairs: 𐐂𐐪.
Use:
String text = "jon skeet, miles o'brien, old mcdonald";
Pattern pattern = Pattern.compile("\\b([a-z])([\\w]*)");
Matcher matcher = pattern.matcher(text);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, matcher.group(1).toUpperCase() + matcher.group(2));
}
String capitalized = matcher.appendTail(buffer).toString();
System.out.println(capitalized);
There are many ways to capitalize the first letter of every word. Here is a very simple idea:
public String capitalize(String str){
/* First collapse each run of whitespace into a single space and trim the ends */
String c = str.replaceAll("\\s+", " ");
String s = c.trim();
String l = "";
for(int i = 0; i < s.length(); i++){
if(i == 0){ /* Uppercase the first letter of the string */
l += s.toUpperCase().charAt(i);
continue; /* The first character is already in l, so move on to the next one */
}
l += s.charAt(i);
if(s.charAt(i) == ' '){ /* A space marks the start of the next word */
l += s.toUpperCase().charAt(i+1); /* Uppercase the letter after the space */
i++; /* Skip that letter, because it has already been added to l */
}
}
}
return l;
}
package com.test;
/**
* @author Prasanth Pillai
* @date 01-Feb-2012
* @description : Below is the test class details
*
* inputs a String from a user. Expect the String to contain spaces and alphanumeric characters only.
* capitalizes all first letters of the words in the given String.
* preserves all other characters (including spaces) in the String.
* displays the result to the user.
*
* Approach : I have followed a simple approach. However there are many string utilities available
* for the same purpose. Example : WordUtils.capitalize(str) (from apache commons-lang)
*
*/
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class Test {
public static void main(String[] args) throws IOException{
System.out.println("Input String :\n");
InputStreamReader converter = new InputStreamReader(System.in);
BufferedReader in = new BufferedReader(converter);
String inputString = in.readLine();
int length = inputString.length();
StringBuffer newStr = new StringBuffer(0);
int i = 0;
int k = 0;
/* This is a simple approach
* step 1: scan through the input string
* step 2: capitalize the first letter of each word in string
* The integer k, is used as a value to determine whether the
* letter is the first letter in each word in the string.
*/
while( i < length){
if (Character.isLetter(inputString.charAt(i))){
if ( k == 0){
newStr = newStr.append(Character.toUpperCase(inputString.charAt(i)));
k = 2;
}// the remaining letters of the word are appended unchanged
else {
newStr = newStr.append(inputString.charAt(i));
}
} // non-letter characters are appended as-is and reset k, so the next letter starts a new word
else {
newStr = newStr.append(inputString.charAt(i));
k=0;
}
i+=1;
}
System.out.println("new String ->"+newStr);
}
}
Here is a simple function
public static String capEachWord(String source){
String result = "";
String[] splitString = source.split(" ");
for(String target : splitString){
result += Character.toUpperCase(target.charAt(0))
+ target.substring(1) + " ";
}
return result.trim();
}
This is just another way of doing it:
private String capitalize(String line)
{
StringTokenizer token =new StringTokenizer(line);
String CapLine="";
while(token.hasMoreTokens())
{
String tok = token.nextToken().toString();
CapLine += Character.toUpperCase(tok.charAt(0))+ tok.substring(1)+" ";
}
return CapLine.substring(0,CapLine.length()-1);
}
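The StringTokenizer version above ends with CapLine.substring(0, CapLine.length()-1), which throws a StringIndexOutOfBoundsException when the input is empty or contains only whitespace, because no token is ever appended. A sketch that sidesteps the trailing-space trimming with java.util.StringJoiner (Java 8+) is shown below; the class name TokenCaps is illustrative.

```java
import java.util.StringJoiner;
import java.util.StringTokenizer;

final class TokenCaps {
    static String capitalize(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        // StringJoiner inserts the separator only between elements,
        // so there is no trailing space to cut off afterwards.
        StringJoiner joined = new StringJoiner(" ");
        while (tokens.hasMoreTokens()) {
            String tok = tokens.nextToken();
            joined.add(Character.toUpperCase(tok.charAt(0)) + tok.substring(1));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("hello   world")); // Hello World
    }
}
```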
Reusable method for initial capitalization (initcap):
public class YarlagaddaSireeshTest{
public static void main(String[] args) {
String FinalStringIs = "";
String testNames = "sireesh yarlagadda test";
String[] name = testNames.split("\\s");
for(String nameIs :name){
FinalStringIs += getIntiCapString(nameIs) + ",";
}
System.out.println("Final Result "+ FinalStringIs);
}
public static String getIntiCapString(String param) {
if(param != null && param.length()>0){
char[] charArray = param.toCharArray();
charArray[0] = Character.toUpperCase(charArray[0]);
return new String(charArray);
}
else {
return "";
}
}
}
Here is my solution.
I ran across this problem tonight and decided to search for it. I found an answer by Neelam Singh that was almost there, but it broke on empty strings and caused a system crash, so I decided to fix that issue.
The method you are looking for is named capString(String s) below.
It turns "It's only 5am here" into "It's Only 5am Here".
The code is pretty well commented, so enjoy.
package com.lincolnwdaniel.interactivestory.model;
public class StringS {
/**
* @param s is a string of any length, ideally only one word
* @return a capitalized string.
* only the first letter of the string is made to uppercase
*/
public static String capSingleWord(String s) {
if(s.isEmpty()) {
return s; // nothing to capitalize, and charAt(0) would throw on an empty string
}
return Character.toUpperCase(s.charAt(0)) + s.substring(1);
}
/**
*
* @param s is a string of any length
* @return a title cased string.
* The first letter of each word is made uppercase
*/
public static String capString(String s) {
// Check if the string is empty, if it is, return it immediately
if(s.isEmpty()){
return s;
}
// Split string on space and create array of words
String[] arr = s.split(" ");
// Create a string buffer to hold the new capitalized string
StringBuffer sb = new StringBuffer();
// Check if the array is empty (caused by passing s as a whitespace-only string, e.g. " ").
// If it is, return the original string immediately
if( arr.length < 1 ){
return s;
}
for (int i = 0; i < arr.length; i++) {
if (arr[i].isEmpty()) continue; // consecutive spaces produce empty words; charAt(0) would throw on them
sb.append(Character.toUpperCase(arr[i].charAt(0)))
.append(arr[i].substring(1)).append(" ");
}
return sb.toString().trim();
}
}
Here is a simple way to capitalize the first character of each word:
public static void main(String[] args) {
String input ="my name is ranjan";
String[] inputArr = input.split(" ");
for(String word : inputArr) {
// print (not println) keeps the words on one line
System.out.print(word.substring(0, 1).toUpperCase()+word.substring(1)+" ");
}
}
//Output : My Name Is Ranjan
For those of you using Velocity in your MVC, you can use the capitalizeFirstLetter() method from the StringUtils class.
String s="hi dude i want apple";
s = s.replaceAll("\\s+"," ");
String[] split = s.split(" ");
s="";
for (int i = 0; i < split.length; i++) {
split[i]=Character.toUpperCase(split[i].charAt(0))+split[i].substring(1);
s+=split[i]+" ";
System.out.println(split[i]);
}
System.out.println(s);
package corejava.string.intern;
import java.io.DataInputStream;
import java.util.ArrayList;
/*
* Write a program to accept only 3 sentences and convert the first character of each word to upper case
*/
public class Accept3Lines_FirstCharUppercase {
static String line;
static String words[];
static ArrayList<String> list=new ArrayList<String>();
/**
* @param args
*/
public static void main(String[] args) throws java.lang.Exception{
DataInputStream read=new DataInputStream(System.in);
System.out.println("Enter only three sentences");
int i=0;
while((line=read.readLine())!=null){
method(line); //main logic of the code
if((i++)==2){
break;
}
}
display();
System.out.println("\n End of the program");
}
/*
* this will display all the elements in an array
*/
public static void display(){
for(String display:list){
System.out.println(display);
}
}
/*
* this divides the line into words,
* converts the first char of each word to upper case
* and adds the result to the array list
*/
public static void method(String lineParam){
words=lineParam.split("\\s"); // use the parameter rather than the static field
for(String s:words){
String result=s.substring(0,1).toUpperCase()+s.substring(1);
list.add(result);
}
}
}
If you prefer Guava...
String myString = ...;
String capWords = Joiner.on(' ').join(Iterables.transform(Splitter.on(' ').omitEmptyStrings().split(myString), new Function<String, String>() {
public String apply(String input) {
return Character.toUpperCase(input.charAt(0)) + input.substring(1);
}
}));
String toUpperCaseFirstLetterOnly(String str) {
String[] words = str.split(" ");
StringBuilder ret = new StringBuilder();
for(int i = 0; i < words.length; i++) {
ret.append(Character.toUpperCase(words[i].charAt(0)));
ret.append(words[i].substring(1));
if(i < words.length - 1) {
ret.append(' ');
}
}
return ret.toString();
}
