Check if letter is emoji

Check if letter is emoji - java

I want to check if a letter is an emoji. I've found some similar questions on SO and found this regex:
private final String emo_regex = "([\\u20a0-\\u32ff\\ud83c\\udc00-\\ud83d\\udeff\\udbb9\\udce5-\\udbb9\\udcee])";
However, when I do the following in a sentence like:
for (int k=0; k<letters.length;k++) {
if (letters[k].matches(emo_regex)) {
emoticon.add(letters[k]);
}
}
It doesn't add any letters with any emoji. I've also tried with a Matcher and a Pattern, but that didn't work either. Is there something wrong with the regex or am I missing something obvious in my code?
This is how I get the letter:
String sentence = "Jij staat op 10 😂";
String[] letters = sentence.split("");
The last 😂 should be recognized and added to emoticon

You could use the emoji4j library. The following should solve the issue.
String htmlifiedText = EmojiUtils.htmlify(text);
// regex to identify HTML entities in htmlified text
Matcher matcher = htmlEntityPattern.matcher(htmlifiedText);
while (matcher.find()) {
String emojiCode = matcher.group();
if (isEmoji(emojiCode)) {
emojis.add(EmojiUtils.getEmoji(emojiCode).getEmoji());
}
}

This function I created checks whether the given String consists only of emojis.
In other words, if the String contains any character not covered by the regex, it returns false.
private static boolean isEmoji(String message){
return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
"[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
"[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
"[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
"[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
"[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
"[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
"[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
"[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
"[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
"[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}
Example of implementation:
public static int detectEmojis(String message){
int len = message.length(), NumEmoji = 0;
// if the given String is only emojis.
if(isEmoji(message)){
for (int i = 0; i < len; i++) {
// if the charAt(i) is an emoji by it self -> ++NumEmoji
if (isEmoji(message.charAt(i)+"")) {
NumEmoji++;
} else {
// maybe the emoji is of size 2 - so lets check.
if (i < (len - 1)) { // some Emojis are two characters long in java, e.g. a rocket emoji is "\uD83D\uDE80";
if (Character.isSurrogatePair(message.charAt(i), message.charAt(i + 1))) {
i += 1; //also skip the second character of the emoji
NumEmoji++;
}
}
}
}
return NumEmoji;
}
return 0;
}
The function above runs on a string consisting only of emojis and returns the number of emojis in it (written with the help of other answers I found here on Stack Overflow).

It seems like those emojis are two characters long, but with split("") you are splitting between each single character, thus none of those letters can be the emoji you are looking for.
Instead, you could try splitting between words:
for (String word : sentence.split(" ")) {
if (word.matches(emo_regex)) {
System.out.println(word);
}
}
But of course this will miss emojis that are joined to a word, or punctuation.
Alternatively, you could just use a Matcher to find any group in the sentence that matches the regex.
Matcher matcher = Pattern.compile(emo_regex).matcher(sentence);
while (matcher.find()) {
System.out.println(matcher.group());
}
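A minimal sketch of the Matcher approach, assuming a pattern written with code point escapes (\x{...}) instead of surrogate escapes, so that the regex engine treats each emoji as one unit. The class name EmojiFinder and the chosen ranges are illustrative assumptions; they cover only a few common emoji blocks, not every emoji.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answers.
final class EmojiFinder {
    // Illustrative ranges only: a few common emoji blocks, written as code points.
    private static final Pattern EMOJI = Pattern.compile(
            "[\\x{1F300}-\\x{1F5FF}\\x{1F600}-\\x{1F64F}"
            + "\\x{1F680}-\\x{1F6FF}\\x{1F900}-\\x{1F9FF}\\x{2600}-\\x{27BF}]");

    // Returns every substring of the sentence that matches the pattern, in order.
    static List<String> find(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMOJI.matcher(sentence);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "\uD83D\uDE02" is U+1F602, the emoji from the question.
        System.out.println(find("Jij staat op 10 \uD83D\uDE02"));
    }
}
```

Because the whole sentence is scanned, this also finds emoji that are attached to a word or to punctuation.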

You can use the Character class to determine whether a letter is part of a surrogate pair. It has some helpful methods for dealing with emoji that are encoded as surrogate pairs, for example:
String text = "💩";
if (text.length() > 1 && Character.isSurrogatePair(text.charAt(0), text.charAt(1))) {
int codePoint = Character.toCodePoint(text.charAt(0), text.charAt(1));
char[] c = Character.toChars(codePoint);
}
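Building on those Character methods, here is a sketch of a code-point-aware replacement for split(""), so that a surrogate pair stays together as one element. CodePointSplitter is a hypothetical name, and the sketch only groups code points; it does not join multi-code-point emoji sequences.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating a code-point-aware alternative to split("").
final class CodePointSplitter {
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            // charCount is 2 for a supplementary code point (surrogate pair), otherwise 1.
            int width = Character.charCount(codePoint);
            parts.add(text.substring(i, i + width));
            i += width;
        }
        return parts;
    }

    public static void main(String[] args) {
        // "\uD83D\uDCA9" is U+1F4A9, the character used in the answer above.
        System.out.println(split("a\uD83D\uDCA9b"));
    }
}
```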

Try this project simple-emoji-4j
Compatible with Emoji 12.0 (2018.10.15)
Simple with:
EmojiUtils.containsEmoji(str)

It's worth bearing in mind that Java code can be written in Unicode. So you can just do:
@Test
public void containsEmoji_detects_smileys() {
assertTrue(containsEmoji("This 😂 is a smiley "));
assertTrue(containsEmoji("This 😄 is a different smiley"));
assertFalse(containsEmoji("No smiley here"));
}
private boolean containsEmoji(String s) {
String pattern = ".*[😂😄].*";
return s.matches(pattern);
}
Although see "Should source code be saved in UTF-8 format" for a discussion of whether that's a good idea.
You can split a String into Unicode codepoints in Java 8 using String.codePoints(), which returns an IntStream. That means you can do something like:
Set<Integer> emojis = new HashSet<>();
emojis.add("😂".codePointAt(0));
emojis.add("😄".codePointAt(0));
String s = "1😂34😄5";
s.codePoints().forEach( codepoint -> {
System.out.println(
new String(Character.toChars(codepoint))
+ " "
+ emojis.contains(codepoint));
});
... prints ...
1 false
😂 true
3 false
4 false
😄 true
5 false
Of course if you prefer not to have literal unicode chars in your code you can just put numbers in your set:
emojis.add(0x1F601);

Here you go -
for (String word : sentence.split("")) {
if (word.matches(emo_regex)) {
System.out.println(word);
}
}

This is how Telegram does it:
private static boolean isEmoji(String message){
return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
"[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
"[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
"[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
"[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
"[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
"[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
"[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
"[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
"[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
"[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}
It is Line 21,026 in ChatActivity.

Unicode has an entire document on this. Emojis and emoji sequences are a lot more complicated than just a few character ranges. There are emoji modifiers (for example, skin tones), regional indicator pairs (country flags), and some special sequences like the pirate flag.
You can use Unicode’s emoji data files to reliably find emoji characters and emoji sequences. This will work even as new complex emojis are added:
import java.net.URL;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class EmojiCollector {
private static String emojiSequencesBaseURI;
private final Pattern emojiPattern;
public EmojiCollector()
throws IOException {
StringBuilder sequences = new StringBuilder();
appendSequencesFrom(
uriOfEmojiSequencesFile("emoji-sequences.txt"),
sequences);
appendSequencesFrom(
uriOfEmojiSequencesFile("emoji-zwj-sequences.txt"),
sequences);
emojiPattern = Pattern.compile(sequences.toString());
}
private void appendSequencesFrom(String sequencesFileURI,
StringBuilder sequences)
throws IOException {
Path sequencesFile = download(sequencesFileURI);
Pattern range =
Pattern.compile("^(\\p{XDigit}{4,6})\\.\\.(\\p{XDigit}{4,6})");
Matcher rangeMatcher = range.matcher("");
try (BufferedReader sequencesReader =
Files.newBufferedReader(sequencesFile)) {
String line;
while ((line = sequencesReader.readLine()) != null) {
if (line.trim().isEmpty() || line.startsWith("#")) {
continue;
}
int semicolon = line.indexOf(';');
if (semicolon < 0) {
continue;
}
String codepoints = line.substring(0, semicolon);
if (sequences.length() > 0) {
sequences.append("|");
}
if (rangeMatcher.reset(codepoints).find()) {
String start = rangeMatcher.group(1);
String end = rangeMatcher.group(2);
sequences.append("[\\x{").append(start).append("}");
sequences.append("-\\x{").append(end).append("}]");
} else {
Scanner scanner = new Scanner(codepoints);
while (scanner.hasNext()) {
String codepoint = scanner.next();
sequences.append("\\x{").append(codepoint).append("}");
}
}
}
}
}
private static String uriOfEmojiSequencesFile(String baseName)
throws IOException {
if (emojiSequencesBaseURI == null) {
URL readme = new URL(
"https://www.unicode.org/Public/UCD/latest/ReadMe.txt");
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(readme.openStream(), "UTF-8"))) {
String line;
while ((line = reader.readLine()) != null) {
if (line.startsWith("Public/emoji/")) {
emojiSequencesBaseURI =
"https://www.unicode.org/" + line.trim();
if (!emojiSequencesBaseURI.endsWith("/")) {
emojiSequencesBaseURI += "/";
}
break;
}
}
}
if (emojiSequencesBaseURI == null) {
// Where else can we get this reliably?
String version = "15.0";
emojiSequencesBaseURI =
"https://www.unicode.org/Public/emoji/" + version + "/";
}
}
return emojiSequencesBaseURI + baseName;
}
private static Path download(String uri)
throws IOException {
Path cacheDir;
String os = System.getProperty("os.name");
String home = System.getProperty("user.home");
if (os.contains("Windows")) {
Path appDataDir;
String appData = System.getenv("APPDATA");
if (appData != null) {
appDataDir = Paths.get(appData);
} else {
appDataDir = Paths.get(home, "AppData");
}
cacheDir = appDataDir.resolve("Local");
} else if (os.contains("Mac")) {
cacheDir = Paths.get(home, "Library", "Application Support");
} else {
cacheDir = Paths.get(home, ".cache");
String cacheHome = System.getenv("XDG_CACHE_HOME");
if (cacheHome != null) {
Path dir = Paths.get(cacheHome);
if (dir.isAbsolute()) {
cacheDir = dir;
}
}
}
String baseName = uri.substring(uri.lastIndexOf('/') + 1);
Path dataDir = cacheDir.resolve(EmojiCollector.class.getName());
Path dataFile = dataDir.resolve(baseName);
if (!Files.isReadable(dataFile)) {
Files.createDirectories(dataDir);
URL dataURL = new URL(uri);
try (InputStream data = dataURL.openStream()) {
Files.copy(data, dataFile);
}
}
return dataFile;
}
public Collection<String> getEmojisIn(String letters) {
Collection<String> emoticons = new ArrayList<>();
Matcher emojiMatcher = emojiPattern.matcher(letters);
while (emojiMatcher.find()) {
emoticons.add(emojiMatcher.group());
}
return emoticons;
}
public static void main(String[] args)
throws IOException {
EmojiCollector collector = new EmojiCollector();
for (String arg : args) {
Collection<String> emojis = collector.getEmojisIn(arg);
System.out.println(arg + " => " + String.join("", emojis));
}
}
}
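To illustrate why single-character checks fall short for sequences, here is a small sketch that recognizes a flag as a pair of regional indicator code points (U+1F1E6 to U+1F1FF). FlagCheck is a hypothetical name and the check is deliberately narrow: it does not verify that the pair corresponds to an actual country code.

```java
// Hypothetical helper; shows that one visible flag is two code points and four chars.
final class FlagCheck {
    private static boolean isRegionalIndicator(int codePoint) {
        return codePoint >= 0x1F1E6 && codePoint <= 0x1F1FF;
    }

    // True when the text is exactly two regional indicator code points.
    static boolean isFlagSequence(String text) {
        int[] codePoints = text.codePoints().toArray();
        return codePoints.length == 2
                && isRegionalIndicator(codePoints[0])
                && isRegionalIndicator(codePoints[1]);
    }

    public static void main(String[] args) {
        String nl = "\uD83C\uDDF3\uD83C\uDDF1"; // regional indicators N and L
        System.out.println(nl.length());          // UTF-16 chars in the string
        System.out.println(isFlagSequence(nl));
    }
}
```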

Here's some Java logic that relies on the java.lang.Character API, which I have found tells an emoji apart from mere 'special characters' and non-Latin alphabets pretty reliably. Give it a try.
import static java.lang.Character.UnicodeBlock.MISCELLANEOUS_SYMBOLS;
import static java.lang.Character.UnicodeBlock.MISCELLANEOUS_TECHNICAL;
import static java.lang.Character.UnicodeBlock.VARIATION_SELECTORS;
import static java.lang.Character.codePointAt;
import static java.lang.Character.codePointBefore;
import static java.lang.Character.isSupplementaryCodePoint;
import static java.lang.Character.isValidCodePoint;
public boolean checkStringEmoji(String someString) {
if(!someString.isEmpty() && someString.length() < 5) {
int firstCodePoint = codePointAt(someString, 0);
int lastCodePoint = codePointBefore(someString, someString.length());
if (isValidCodePoint(firstCodePoint) && isValidCodePoint(lastCodePoint)) {
if (isSupplementaryCodePoint(firstCodePoint) ||
isSupplementaryCodePoint(lastCodePoint) ||
Character.UnicodeBlock.of(firstCodePoint) == MISCELLANEOUS_SYMBOLS ||
Character.UnicodeBlock.of(firstCodePoint) == MISCELLANEOUS_TECHNICAL ||
Character.UnicodeBlock.of(lastCodePoint) == VARIATION_SELECTORS
) {
// string is emoji
return true;
}
}
}
return false;
}

Related

Trying to delete all non letter parts of a word but this line deletes the whole word '"Have' from the tokenizer

while(tokenizer.hasMoreTokens()){
currentWord = tokenizer.nextToken();
String[] parts = currentWord.split(Pattern.quote("."));
String[] parts2 = parts[0].split(Pattern.quote(","));
String[] parts3 = parts2[0].split(Pattern.quote("?"));
String[] parts4 = parts3[0].split(Pattern.quote("\\.| "));
String[] parts5 = parts4[0].split("\"");
String[] parts6 = parts5[0].split(Pattern.quote(":"));
System.out.println(Arrays.toString(parts6));
I'm just trying to get this text to split properly, only issue right now is the word:
"Have
Also, if someone could provide a solution that combines all this into one line, that would be nice, but I couldn't get that to work. Thanks.

Try this.
The \ is to escape the ", and the "\\" are to escape the regex special characters "." & "?". We are replacing any of these .,":? with an empty string.
while(tokenizer.hasMoreTokens()){
currentWord = tokenizer.nextToken();
final String cleanWord = currentWord.replaceAll("[\\.,\":\\?]", "");
System.out.println(cleanWord);
}
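As a variation on the character class above, here is a sketch that removes every non-letter instead of listing the punctuation explicitly. It assumes a word should keep only letters, so apostrophes and digits are dropped as well; WordCleaner is a hypothetical name.

```java
// Hypothetical helper, shown as an alternative to enumerating punctuation marks.
final class WordCleaner {
    // \P{L} matches any code point that is not a Unicode letter.
    static String lettersOnly(String word) {
        return word.replaceAll("\\P{L}+", "");
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("\"Have"));
        System.out.println(lettersOnly("thanks?"));
    }
}
```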

There are specialized classes in the API to parse words out of text. Here is one such:
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.nio.file.Files;
import java.nio.file.Paths;
public class WordCollector {
public static void main(String[] args) {
try {
List<String> words = WordCollector.getWords(Files.lines(Paths.get(args[0])));
System.out.println(words);
} catch (Throwable t) {
t.printStackTrace();
}
}
public static List<String> getWords(Stream<String> lines) {
List<String> result = new ArrayList<>();
BreakIterator boundary = BreakIterator.getWordInstance();
lines.forEach(line -> {
boundary.setText(line);
int start = boundary.first();
for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
String candidate = line.substring(start, end).replaceAll("\\p{Punct}", "").trim();
if (candidate.length() > 0) {
result.add(candidate);
}
}
});
return result;
}
}

Here is one way if you want to split the line on non-letters.
[^A-Za-z]+ split on one or more of non-letters
String line = "wordA, wordB; wordC;;; wordD, wordE!?+- !wordF??, !wordG!, wordH, wordI";
String[] words = line.split("[^A-Za-z]+");
for (String word : words) {
System.out.println(word);
}
prints
wordA
wordB
wordC
wordD
wordE
wordF
wordG
wordH
wordI
On the other hand, if you want to remove those characters from a word, use a similar pattern. No need to specify the non-letter characters separately.
String word = "C:om!>{}.p*u**te,;rs";
word = word.replaceAll("[^A-Za-z]","");
System.out.println(word);
prints
Computers

The code below shows how you can ignore all non-alphabetic characters.
import java.io.*;
public class Main{
public static void main(String[] args) throws IOException {
int c = 0;
while((c=System.in.read())!=-1)
if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'))
System.out.print((char)c);
}
}

How to split a string in format AB123 --> AB 123? Java

I have a string in format AB123. I want to split it between the AB and 123 so AB123 becomes AB 123. The contents of the string can differ but the format stays the same. Is there a way to do this?

Following up on the latest information you provided (2 letters, then 3 digits):
myString.substring(0, 2) + " " + myString.substring(2)
What this does: it splits the input string myString after the 2nd character and inserts a space at that position.

Explanation: in a regular expression, \D represents a non-digit and \d represents a digit. I used a lookbehind and a lookahead in the regex to split at the position between a non-digit and a digit.
String string = "AB123";
String[] split = string.split("(?<=\\D)(?=\\d)");
System.out.println(split[0]+" "+split[1]);
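The same lookaround can also insert the space directly. A minimal sketch, assuming the input is a run of non-digits followed by a run of digits; LetterDigitSplitter is a hypothetical name.

```java
// Hypothetical helper built on the lookbehind/lookahead pattern from the answer above.
final class LetterDigitSplitter {
    // (?<=\D) requires a non-digit just before the position and
    // (?=\d) requires a digit just after it; the match itself is empty,
    // so replaceAll inserts a space without consuming any character.
    static String insertSpace(String input) {
        return input.replaceAll("(?<=\\D)(?=\\d)", " ");
    }

    public static void main(String[] args) {
        System.out.println(insertSpace("AB123"));
        System.out.println(insertSpace("ABCD12"));
    }
}
```

Unlike a fixed substring index, this keeps working when the number of letters or digits changes.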

Try
String a = "abcd1234";
int i;
for(i = 0; i < a.length(); i++){
char c = a.charAt(i);
if( '0' <= c && c <= '9' )
break;
}
String alphaPart = a.substring(0, i);
String numberPart = a.substring(i);
Hope this helps

Although I would personally use the method provided in @RakeshMothukur's answer, since it also works when the letter or digit counts increase/decrease later on, I wanted to provide an additional method to insert the space between the two letters and three digits:
String str = "AB123";
StringBuilder sb = new StringBuilder(str);
sb.insert(2, " "); // Insert a space at 0-based index 2; a.k.a. after the first 2 characters
String result = sb.toString(); // Convert the StringBuilder back to a String

Here you go. I wrote it in a very simple way to make things clear.
What it does: after taking user input, it converts the string into a char array and checks whether each character is an INT (digit) or non INT.
In each iteration it compares the type with that of the previous character and prints accordingly.
Alternate Solutions
1) Using ASCII range (difficulty = easy)
2) Override a method and check 2 variables at a time. (difficulty = Intermediate)
import java.io.InputStreamReader;
import java.util.*;
import java.io.BufferedReader;
public class Main {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
char[] s = br.readLine().toCharArray();
int prevflag, flag = 0;
for (int i = 0; i < s.length; i++) {
int a = Character.getNumericValue(s[i]);
String b = String.valueOf(s[i]);
prevflag = flag;
flag = checktype(a, b);
if ((prevflag == flag) || (i == 0))
System.out.print(s[i]);
else
System.out.print(" " + s[i]);
}
}
public static int checktype(int x, String y) {
int flag = 0;
if (String.valueOf(x).equals(y))
flag = 1; // INT
else
flag = 2; // non INT
return flag;
}
}

I was waiting for a compile to finish before heading out, so threw together a slightly over-engineered example with basic error checking and a test.
import java.text.ParseException;
import java.util.LinkedList;
public class Main {
static public class ParsedData {
public final String prefix;
public final Integer number;
public ParsedData(String _prefix, Integer _number) {
prefix = _prefix;
number = _number;
}
@Override
public String toString() {
return prefix + "\t" + number.toString();
}
}
static final String TEST_DATA[] = {"AB123", "JX7272", "FX402", "ADF123", "JD3Q2", "QB778"};
public static void main(String[] args) {
parseDataArray(TEST_DATA);
}
public static ParsedData[] parseDataArray(String[] inputs) {
LinkedList<ParsedData> results = new LinkedList<ParsedData>();
for (String s : inputs) {
try {
System.out.println("Parsing: " + s);
if (s.length() != 5) throw new ParseException("Input Length incorrect: " + s.length(), 0);
String _prefix = s.substring(0, 2);
Integer _num = Integer.parseInt(s.substring(2));
results.add(new ParsedData(_prefix, _num));
} catch (ParseException | NumberFormatException e) {
System.out.printf("\"%s\", %s\n", s, e.toString());
}
}
return results.toArray(new ParsedData[results.size()]);
}
}

regex email validation

This is a continuation of my earlier post. My code:
public class Main {
static String theFile = "C:\\Users\\Pc\\Desktop\\textfile.txt";
public static boolean validate(String input) {
boolean status = false;
String REGEX = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
status = true;
} else {
status = false;
}
return status;
}
public static void main(String[] args) {
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(theFile));
String line;
int count = 0;
while ((line = br.readLine()) != null) {
String[] arr = line.split("#");
for (int x = 0; x < arr.length; x++) {
if (arr[x].equals(validate(theFile))) {
count++;
}
System.out.println("no of matches " + count);
}
}
} catch (IOException e) {
e.printStackTrace();
}
Main.validate(theFile);
}
}
It shows result :
no of matches 0
no of matches 0
no of matches 0
no of matches 0
and this is the text in my input file:
sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com
My output should be three emails, because the last string is not in a standard email format.
I know I'm not supposed to pass (arr[x].validate(theFile)))

I have always used this:
public static bool Validate(string email)
{
string expressions = @"^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";
return Regex.IsMatch(email, expressions);
}
Note: I also have a function that "cleans" the string should there be multiple @ symbols.
Edit: Here is how I clean out extra @ symbols. Note this will keep the FIRST @ it finds and just remove the rest. This function should be used BEFORE you run the validation function on it.
public static string CleanEmail(string input)
{
string output = "";
try
{
if (input.Length > 0)
{
string first_pass = Regex.Replace(input, @"[^\w\.@-]", "");
List<string> second_pass = new List<string>();
string third_pass = first_pass;
string final_pass = "";
if (first_pass.Contains("@"))
{
second_pass = first_pass.Split('@').ToList();
if (second_pass.Count >= 2)
{
string second_pass_0 = second_pass[0];
string second_pass_1 = second_pass[1];
third_pass = second_pass_0 + "@" + second_pass_1;
second_pass.Remove(second_pass_0);
second_pass.Remove(second_pass_1);
}
}
if (second_pass.Count > 0)
{
final_pass = third_pass + string.Join("", second_pass.ToArray());
}
else
{
final_pass = third_pass;
}
output = final_pass;
}
}
catch (Exception Ex)
{
}
return output;
}

There are several errors in your code:
if (arr[x].equals(validate(theFile))) checks whether the mail address string equals the boolean value you get from the validate() method. This will never be the case.
In the validate() method, if you only want to check if the string matches a regex, you can simply do that with string.matches(pattern) - so you only need one line of code (not really in error, but it's more elegant this way)
After splitting your input string (the line), there are whitespaces left, because you only split at the # symbol. You can either trim() each string afterwards to remove those (see the code below) or split() at \\s*#\\s* instead of just #
Here is an example with all the fixes (I left out the part where you read the file and used a string with your mail addresses instead!):
public class Main {
private static final String PATTERN_MAIL
= "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@" + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
public static boolean validate(String input) {
return input.matches(PATTERN_MAIL);
}
public static void main(String[] args) {
String line = "sjbfbhbs@yahoo.com # fgfgfgf@yahoo.com # ghghgh@gamil.com #fhfbs@y.com";
String[] arr = line.split("#");
int count = 0;
for (int x = 0; x < arr.length; x++) {
if (validate(arr[x].trim())) {
count++;
}
System.out.println("no of matches " + count);
}
}
}
It prints:
no of matches 1
no of matches 2
no of matches 3
no of matches 4
EDIT: If the pattern is not supposed to match the last mail address, you'll have to change the pattern. Right now it matches all of them.
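The whitespace-aware split mentioned in this answer can be sketched as follows. It assumes addresses of the usual local@domain form separated by #, so that no trim() call is needed; MailCounter is a hypothetical name and the sample addresses are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; the pattern is the one used in this thread, with @ as the separator.
final class MailCounter {
    private static final String PATTERN_MAIL =
            "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";

    static long countValid(String line) {
        // The delimiter itself consumes the whitespace around each '#'.
        return Arrays.stream(line.trim().split("\\s*#\\s*"))
                .filter(address -> address.matches(PATTERN_MAIL))
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countValid("a@example.com # b@example.org #not-an-address"));
    }
}
```

Counting once after the loop, instead of printing inside it, also avoids the repeated "no of matches" lines.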

How to disregard numbers when reading from a text file?

Right now I want to store a text file that goes like this:
1 apple
2 banana
3 orange
4 lynx
5 cappuccino
and so on into a data structure. Would the best way of doing this be mapping the int to the string somehow, or should I make an arraylist? I'm supposed to, when I store the words themselves, disregard the int and any whitespace, and keep only the word itself. How do I disregard the int when reading in lines? Here is my hacked together code right now:
public Dictionary(String filename) throws IOException {
if (filename==null)
throw new IllegalArgumentException("Null filename");
else{
try {
BufferedReader in = new BufferedReader(new FileReader(filename));
String str;
int numLines=0;
while ((str = in.readLine()) != null) {
numLines++;
}
String[] words=new String[numLines];
for (int i=0; i<words.length;i++){
words[i]=in.readLine();
}
in.close();
} catch (IOException e) {
}
}
}
Thank you in advance for the help!!

Just use the power of regular expressions:
List<String> texts = new ArrayList<String>();
Pattern pattern = Pattern.compile("[^0-9\\s]+");
String text = "1 apple 2 oranges 3 carrots";
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
texts.add(matcher.group(0));
}
Regular expressions are very popular these days. The compile method compiles your search pattern; the character class [^0-9\s]+ matches runs of characters that are neither digits nor whitespace, so the numbers are left out of the results. You can use IOUtils from Apache Commons IO to read a text file into a String.

This won't work because you are already at the end of the file, so the in.readLine() method will return null.
I would use a Map to store the name and the amount...something like this:
HashMap<String, Integer> map = new HashMap<String, Integer>();
while ((line = br.readLine()) != null) {
//also check if the array is null and the right size, trim, etc.
String[] tmp = line.split(" ");
map.put(tmp[1], Integer.parseInt(tmp[0]) );
}
Otherwise you can try it with the Scanner class. Good luck.
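A minimal sketch of the parsing step, assuming each line consists of a number, whitespace, and then the word. It takes the lines as a list so that it stays self-contained (reading the file is left out), and the leading token is discarded without being checked. WordLineParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; keeps only the word from lines such as "1 apple".
final class WordLineParser {
    static List<String> words(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Limit 2: the leading number, then the rest of the line.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                result.add(parts[1].trim());
            }
            // Lines without a second part (for example blank lines) are skipped.
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("1 apple", "2 banana", "3 orange")));
    }
}
```

Collecting into a list in a single pass avoids reading the file twice just to size an array.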

You can give regular expressions a try.
Pattern p = Pattern.compile("[^0-9\\s]+");
String s = "1 apple 2 oranges";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
Output =
apple
oranges
To get an idea about regular expressions, see the Java regex tutorial.

I suggest you use a List of items to store the results parsed from the file. One way to parse every text line is to use the String.split(String) method. Also note that you should handle exceptions in the code properly and do not forget to close the Reader when you are done (no matter whether flawlessly or with an exception => use a finally block). The following example should put you on track... Hope this helps.
package test;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) throws IOException {
Main m = new Main();
m.start("test.txt");
}
private void start(String filename) throws IOException {
System.out.println(readFromFile(filename));
}
private final class Item {
private String name;
private int id;
public Item(String name, int id) {
this.name = name;
this.id = id;
}
public int getId() {
return id;
}
public String getName() {
return name;
}
@Override
public String toString() {
return "Item [name=" + name + ", id=" + id + "]";
}
}
private List<Item> readFromFile(String filename) throws IOException {
List<Item> items = new ArrayList<Item>();
Reader r = null;
try {
r = new FileReader(filename);
BufferedReader br = new BufferedReader(r);
String line = null;
while ((line = br.readLine()) != null) {
String[] lineItems = line.split(" ");
if (lineItems.length != 2) {
throw new IOException("Incorrect input file data format! Two space separated items expected on every line!");
}
try {
int id = Integer.parseInt(lineItems[0]);
Item i = new Item(lineItems[1], id);
items.add(i);
} catch (NumberFormatException ex) {
throw new IOException("Incorrect input file data format!", ex); // JDK6+
}
}
} finally {
if (r != null) {
r.close();
}
}
return items;
}
}

If your words don't contain spaces, you could use String.split( " " ) to split up the String into an array of Strings delimited by spaces.
Then just take the second element of the array (the first will be the number).
Also, the String.trim( ) method will remove any whitespace before or after the String.
Note: there's probably some error checking that you'd want to perform (what if the String isn't formatted as you expect). But this code snippet gives the basic idea:
...
String s = in.readLine( );
String[] tokens = s.split( " " );
words[i] = tokens[1].trim( );
...

If you want to do something easy, just substring the original word by counting digits:
int t = 0;
while (t < word.length() && word.charAt(t) >= '0' && word.charAt(t) <= '9')
++t;
word = word.substring(t);
If words NEVER contain spaces you can also use word.split(" ")[1]

Instead of using a BufferedReader, use the Scanner class, and instead of an array use an ArrayList, like so:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.ArrayList;
public class Dictionary {
private ArrayList<String> strings = new ArrayList<>();
code...
public Dictionary(String fileName) throws IOException {
code...
try {
Scanner inFile = new Scanner(new File(fileName));
strings.add("Dummy"); // Dummy value to make the index start at 1
while(inFile.hasNext()) {
int n = inFile.nextInt(); // this line just reads in the int from the file and
// doesn't do anything with it
String s = inFile.nextLine().trim();
strings.add(s);
}
inFile.close(); // don't forget to close the file
}
and then since your data goes 1, 2, 3, 4, 5, you can just use the index to retrieve each item's number.
By doing this:
for(int i = 1; i < strings.size(); i++) {
int n = i;
String s = n + " " + strings.get(i);
System.out.println(s);
}

How to capitalize the first character of each word in a string

Is there a function built into Java that capitalizes the first character of each word in a String, and does not affect the others?
Examples:
jon skeet -> Jon Skeet
miles o'Brien -> Miles O'Brien (B remains capital, this rules out Title Case)
old mcdonald -> Old Mcdonald*
*(Old McDonald would be fine too, but I don't expect it to be THAT smart.)
A quick look at the Java String Documentation reveals only toUpperCase() and toLowerCase(), which of course do not provide the desired behavior. Naturally, Google results are dominated by those two functions. It seems like a wheel that must have been invented already, so it couldn't hurt to ask so I can use it in the future.

WordUtils.capitalize(str) (from apache commons-text)
(Note: if you need "fOO BAr" to become "Foo Bar", then use capitalizeFully(..) instead)

If you're only worried about the first letter of the first word being capitalized:
private String capitalize(final String line) {
return Character.toUpperCase(line.charAt(0)) + line.substring(1);
}

The following method converts all the letters into upper/lower case, depending on their position near a space or other special chars.
public static String capitalizeString(String string) {
char[] chars = string.toLowerCase().toCharArray();
boolean found = false;
for (int i = 0; i < chars.length; i++) {
if (!found && Character.isLetter(chars[i])) {
chars[i] = Character.toUpperCase(chars[i]);
found = true;
} else if (Character.isWhitespace(chars[i]) || chars[i]=='.' || chars[i]=='\'') { // You can add other chars here
found = false;
}
}
return String.valueOf(chars);
}

Try this very simple way
example givenString="ram is good boy"
public static String toTitleCase(String givenString) {
String[] arr = givenString.split(" ");
StringBuffer sb = new StringBuffer();
for (int i = 0; i < arr.length; i++) {
sb.append(Character.toUpperCase(arr[i].charAt(0)))
.append(arr[i].substring(1)).append(" ");
}
return sb.toString().trim();
}
Output will be: Ram Is Good Boy
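The split-based version above throws a StringIndexOutOfBoundsException on input with consecutive or leading spaces, because an empty token has no charAt(0). Here is a sketch that avoids splitting altogether, assuming a word starts at the beginning of the text or right after whitespace. Capitalizer is a hypothetical name, and the loop works on chars, so it does not handle letters outside the Basic Multilingual Plane.

```java
// Hypothetical helper; upper-cases the first character of each word and leaves the rest untouched.
final class Capitalizer {
    static String capitalizeWords(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean atWordStart = true;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Only the first character of a word changes; everything else is copied as-is.
            out.append(atWordStart ? Character.toUpperCase(c) : c);
            atWordStart = Character.isWhitespace(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("jon skeet"));
        System.out.println(capitalizeWords("miles o'Brien"));
    }
}
```

Because whitespace is copied through unchanged, the original spacing of the input is preserved.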

I made a solution in Java 8 that is IMHO more readable.
public String firstLetterCapitalWithSingleSpace(final String words) {
return Stream.of(words.trim().split("\\s"))
.filter(word -> word.length() > 0)
.map(word -> word.substring(0, 1).toUpperCase() + word.substring(1))
.collect(Collectors.joining(" "));
}
The Gist for this solution can be found here: https://gist.github.com/Hylke1982/166a792313c5e2df9d31

String toBeCapped = "i want this sentence capitalized";
String[] tokens = toBeCapped.split("\\s");
toBeCapped = "";
for(int i = 0; i < tokens.length; i++){
char capLetter = Character.toUpperCase(tokens[i].charAt(0));
toBeCapped += " " + capLetter + tokens[i].substring(1);
}
toBeCapped = toBeCapped.trim();

I've written a small class to capitalize all the words in a String. Features:
Optional multiple delimiters, each one with its own behavior (capitalize before, after, or both, to handle cases like O'Brian);
Optional Locale;
Doesn't break with surrogate pairs.
Output:
====================================
SIMPLE USAGE
====================================
Source: cApItAlIzE this string after WHITE SPACES
Output: Capitalize This String After White Spaces
====================================
SINGLE CUSTOM-DELIMITER USAGE
====================================
Source: capitalize this string ONLY before'and''after'''APEX
Output: Capitalize this string only beforE'AnD''AfteR'''Apex
====================================
MULTIPLE CUSTOM-DELIMITER USAGE
====================================
Source: capitalize this string AFTER SPACES, BEFORE'APEX, and #AFTER AND BEFORE# NUMBER SIGN (#)
Output: Capitalize This String After Spaces, BeforE'apex, And #After And BeforE# Number Sign (#)
====================================
SIMPLE USAGE WITH CUSTOM LOCALE
====================================
Source: Uniforming the first and last vowels (different kind of 'i's) of the Turkish word D[İ]YARBAK[I]R (DİYARBAKIR)
Output: Uniforming The First And Last Vowels (different Kind Of 'i's) Of The Turkish Word D[i]yarbak[i]r (diyarbakir)
====================================
SIMPLE USAGE WITH A SURROGATE PAIR
====================================
Source: ab 𐐂c de à
Output: Ab 𐐪c De À
Note: first letter will always be capitalized (edit the source if you don't want that).
Please share your comments and help me find bugs or improve the code...
Code:
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
public class WordsCapitalizer {
public static String capitalizeEveryWord(String source) {
return capitalizeEveryWord(source,null,null);
}
public static String capitalizeEveryWord(String source, Locale locale) {
return capitalizeEveryWord(source,null,locale);
}
public static String capitalizeEveryWord(String source, List<Delimiter> delimiters, Locale locale) {
char[] chars;
if (delimiters == null || delimiters.size() == 0)
delimiters = getDefaultDelimiters();
// If Locale specified, i18n toLowerCase is executed, to handle specific behaviors (eg. Turkish dotted and dotless 'i')
if (locale!=null)
chars = source.toLowerCase(locale).toCharArray();
else
chars = source.toLowerCase().toCharArray();
// First character ALWAYS capitalized, if it is a Letter.
if (chars.length>0 && Character.isLetter(chars[0]) && !isSurrogate(chars[0])){
chars[0] = Character.toUpperCase(chars[0]);
}
for (int i = 0; i < chars.length; i++) {
if (!isSurrogate(chars[i]) && !Character.isLetter(chars[i])) {
// Current char is not a Letter; check whether it is a delimiter.
for (Delimiter delimiter : delimiters){
if (delimiter.getDelimiter()==chars[i]){
// Delimiter found, applying rules...
if (delimiter.capitalizeBefore() && i>0
&& Character.isLetter(chars[i-1]) && !isSurrogate(chars[i-1]))
{ // previous character is a Letter and I have to capitalize it
chars[i-1] = Character.toUpperCase(chars[i-1]);
}
if (delimiter.capitalizeAfter() && i<chars.length-1
&& Character.isLetter(chars[i+1]) && !isSurrogate(chars[i+1]))
{ // next character is a Letter and I have to capitalize it
chars[i+1] = Character.toUpperCase(chars[i+1]);
}
break;
}
}
}
}
return String.valueOf(chars);
}
private static boolean isSurrogate(char chr){
// Check if the current character is part of an UTF-16 Surrogate Pair.
// Note: not validating the pair, just used to bypass (any found part of) it.
return (Character.isHighSurrogate(chr) || Character.isLowSurrogate(chr));
}
private static List<Delimiter> getDefaultDelimiters(){
// If no delimiter specified, "Capitalize after space" rule is set by default.
List<Delimiter> delimiters = new ArrayList<Delimiter>();
delimiters.add(new Delimiter(Behavior.CAPITALIZE_AFTER_MARKER, ' '));
return delimiters;
}
public static class Delimiter {
private Behavior behavior;
private char delimiter;
public Delimiter(Behavior behavior, char delimiter) {
super();
this.behavior = behavior;
this.delimiter = delimiter;
}
public boolean capitalizeBefore(){
return (behavior.equals(Behavior.CAPITALIZE_BEFORE_MARKER)
|| behavior.equals(Behavior.CAPITALIZE_BEFORE_AND_AFTER_MARKER));
}
public boolean capitalizeAfter(){
return (behavior.equals(Behavior.CAPITALIZE_AFTER_MARKER)
|| behavior.equals(Behavior.CAPITALIZE_BEFORE_AND_AFTER_MARKER));
}
public char getDelimiter() {
return delimiter;
}
}
public static enum Behavior {
CAPITALIZE_AFTER_MARKER(0),
CAPITALIZE_BEFORE_MARKER(1),
CAPITALIZE_BEFORE_AND_AFTER_MARKER(2);
private int value;
private Behavior(int value) {
this.value = value;
}
public int getValue() {
return value;
}
}
}

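Using org.apache.commons.lang.StringUtils makes it very simple. Note that StringUtils.capitalize changes only the first character of the string, not the first letter of every word.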
capitalizeStr = StringUtils.capitalize(str);

From Java 9 onward, you can use Matcher::replaceAll with a replacement function, like this:
public static void upperCaseAllFirstCharacter(String text) {
String regex = "\\b(.)(.*?)\\b";
String result = Pattern.compile(regex).matcher(text).replaceAll(
matche -> matche.group(1).toUpperCase() + matche.group(2)
);
System.out.println(result);
}
Example :
upperCaseAllFirstCharacter("hello this is Just a test");
Outputs
Hello This Is Just A Test
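One caveat with \b: as far as I can tell, an apostrophe also counts as a word boundary, so "don't" comes out as "Don'T". If you only want to capitalize letters that start the text or follow whitespace, a lookbehind can be used instead. This is a sketch under that assumption, not a drop-in replacement; the class and method names are hypothetical, and it still needs Java 9+ for Matcher.replaceAll(Function):

```java
import java.util.regex.Pattern;

final class WhitespaceCapitalizeSketch {
    // A letter at the start of the text or directly preceded by whitespace.
    private static final Pattern WORD_START = Pattern.compile("(?<!\\S)\\p{L}");

    // Hypothetical helper: upper-cases each matched letter, leaves the rest untouched.
    static String capitalizeWords(String text) {
        return WORD_START.matcher(text).replaceAll(m -> m.group().toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("don't stop believing")); // Don't Stop Believing
    }
}
```

The trade-off is that names like "o'brien" become "O'brien" rather than "O'Brien", so pick whichever rule matches your data.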

With this simple code:
String example="hello";
example=example.substring(0,1).toUpperCase()+example.substring(1, example.length());
System.out.println(example);
Result: Hello

I'm using the following function. I think it is faster in performance.
public static String capitalize(String text){
String c = (text != null)? text.trim() : "";
String[] words = c.split(" ");
String result = "";
for(String w : words){
result += (w.length() > 1? w.substring(0, 1).toUpperCase(Locale.US) + w.substring(1, w.length()).toLowerCase(Locale.US) : w) + " ";
}
return result.trim();
}
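On the performance point: using += on a String inside a loop copies the accumulated text on each pass, so a StringBuilder is usually the safer choice for long inputs. Here is a sketch of the same idea with a StringBuilder. The class name is made up, and unlike the function above it also capitalizes one-letter words and collapses repeated whitespace, which may or may not be what you want:

```java
import java.util.Locale;

final class BuilderCapitalizeSketch {
    // Hypothetical variant: capitalizes the first character of each word
    // and lower-cases the rest, joining words with single spaces.
    static String capitalize(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder result = new StringBuilder(text.length());
        for (String w : text.trim().split("\\s+")) {
            if (w.isEmpty()) {
                continue; // only happens for blank input
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(w.substring(0, 1).toUpperCase(Locale.US))
                  .append(w.substring(1).toLowerCase(Locale.US));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("hELLO   a wORLD")); // Hello A World
    }
}
```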

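Use the split method to split your string into words, then use the built-in string functions to capitalize each word, then append them back together.
Pseudo-code (ish), made runnable:
String string = "the sentence you want to apply caps to";
String[] words = string.split(" ");
StringBuilder sb = new StringBuilder();
for (String w : words) {
if (w.isEmpty()) continue; // skip empty tokens produced by repeated spaces
// Upper-case the first character and lower-case the rest of the word
String word = w.substring(0, 1).toUpperCase() + w.substring(1).toLowerCase();
if (sb.length() > 0) sb.append(' ');
sb.append(word);
}
string = sb.toString();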
In the end string looks something like
"The Sentence You Want To Apply Caps To"

This might be useful if you need to capitalize titles. It capitalizes each substring delimited by " ", except for specified strings such as "a" or "the". I haven't run it yet because it's late, but it should be fine. It uses Apache Commons StringUtils.join() at one point; you can substitute a simple loop if you wish.
private static String capitalize(String string) {
if (string == null) return null;
String[] wordArray = string.split(" "); // Split string to analyze word by word.
int i = 0;
lowercase:
for (String word : wordArray) {
if (i != 0) { // First word always in capital
String [] lowercaseWords = {"a", "an", "as", "and", "although", "at", "because", "but", "by", "for", "in", "nor", "of", "on", "or", "so", "the", "to", "up", "yet"};
for (String word2 : lowercaseWords) {
if (word.equals(word2)) {
wordArray[i] = word;
i++;
continue lowercase;
}
}
}
char[] characterArray = word.toCharArray();
characterArray[0] = Character.toTitleCase(characterArray[0]);
wordArray[i] = new String(characterArray);
i++;
}
return StringUtils.join(wordArray, " "); // Re-join string
}
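If you would rather avoid the labeled continue and the Commons dependency, the same rule (first word always capitalized, minor words left alone) can be sketched with a Set lookup. Everything here is an assumption for illustration: the class name, the method name, and the shortened word list. Set.of needs Java 9+:

```java
import java.util.Set;
import java.util.StringJoiner;

final class TitleCaseSketch {
    // Shortened, illustrative list of words that stay lower-case in titles.
    private static final Set<String> MINOR_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "on", "or", "for");

    static String titleCase(String text) {
        if (text == null) {
            return null;
        }
        StringJoiner out = new StringJoiner(" ");
        boolean first = true;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            if (!first && MINOR_WORDS.contains(word)) {
                out.add(word); // minor word after the first position: keep as-is
            } else {
                out.add(Character.toTitleCase(word.charAt(0)) + word.substring(1));
            }
            first = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCase("the lord of the rings")); // The Lord of the Rings
    }
}
```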

public static String toTitleCase(String word){
return Character.toUpperCase(word.charAt(0)) + word.substring(1);
}
public static void main(String[] args){
String phrase = "this is to be title cased";
String[] splitPhrase = phrase.split(" ");
String result = "";
for(String word: splitPhrase){
result += toTitleCase(word) + " ";
}
System.out.println(result.trim());
}

1. Java 8 Streams
public static String capitalizeAll(String str) {
if (str == null || str.isEmpty()) {
return str;
}
return Arrays.stream(str.split("\\s+"))
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
.collect(Collectors.joining(" "));
}
Examples:
System.out.println(capitalizeAll("jon skeet")); // Jon Skeet
System.out.println(capitalizeAll("miles o'Brien")); // Miles O'Brien
System.out.println(capitalizeAll("old mcdonald")); // Old Mcdonald
System.out.println(capitalizeAll(null)); // null
For foo bAR to Foo Bar, replace the map() method with the following:
.map(t -> t.substring(0, 1).toUpperCase() + t.substring(1).toLowerCase())
2. Matcher.replaceAll() (Java 9+)
public static String capitalizeAll(String str) {
if (str == null || str.isEmpty()) {
return str;
}
return Pattern.compile("\\b(.)(.*?)\\b")
.matcher(str)
.replaceAll(match -> match.group(1).toUpperCase() + match.group(2));
}
Examples:
System.out.println(capitalizeAll("12 ways to learn java")); // 12 Ways To Learn Java
System.out.println(capitalizeAll("i am atta")); // I Am Atta
System.out.println(capitalizeAll(null)); // null
3. Apache Commons Text
System.out.println(WordUtils.capitalize("love is everywhere")); // Love Is Everywhere
System.out.println(WordUtils.capitalize("sky, sky, blue sky!")); // Sky, Sky, Blue Sky!
System.out.println(WordUtils.capitalize(null)); // null
For titlecase:
System.out.println(WordUtils.capitalizeFully("fOO bAR")); // Foo Bar
System.out.println(WordUtils.capitalizeFully("sKy is BLUE!")); // Sky Is Blue!

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter the sentence : ");
try
{
String str = br.readLine();
char[] str1 = new char[str.length()];
for(int i=0; i<str.length(); i++)
{
str1[i] = Character.toLowerCase(str.charAt(i));
}
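if (str1.length > 0) // guard against an empty line
{
str1[0] = Character.toUpperCase(str1[0]);
}
for(int i=0;i<str.length();i++)
{
if(str1[i] == ' ' && i + 1 < str1.length) // do not read past a trailing space
{
str1[i+1] = Character.toUpperCase(str1[i+1]);
}
System.out.print(str1[i]);
}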
}
catch(Exception e)
{
System.err.println("Error: " + e.getMessage());
}

I decided to add one more solution for capitalizing words in a string:
words are defined here as adjacent letter-or-digit characters;
surrogate pairs are supported as well;
the code has been optimized for performance; and
it is still compact.
Function:
public static String capitalize(String string) {
final int sl = string.length();
final StringBuilder sb = new StringBuilder(sl);
boolean lod = false;
for(int s = 0; s < sl; s++) {
final int cp = string.codePointAt(s);
sb.appendCodePoint(lod ? Character.toLowerCase(cp) : Character.toUpperCase(cp));
lod = Character.isLetterOrDigit(cp);
if(!Character.isBmpCodePoint(cp)) s++;
}
return sb.toString();
}
Example call:
System.out.println(capitalize("An à la carte StRiNg. Surrogate pairs: 𐐪𐐪."));
Result:
An À La Carte String. Surrogate Pairs: 𐐂𐐪.

Use:
String text = "jon skeet, miles o'brien, old mcdonald";
Pattern pattern = Pattern.compile("\\b([a-z])([\\w]*)");
Matcher matcher = pattern.matcher(text);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, matcher.group(1).toUpperCase() + matcher.group(2));
}
String capitalized = matcher.appendTail(buffer).toString();
System.out.println(capitalized);

There are many ways to capitalize the first letter of each word. Here is my idea. It's very simple:
public String capitalize(String str){
/* First collapse runs of whitespace into single spaces and trim the ends */
String c = str.replaceAll("\\s+", " ");
String s = c.trim();
String l = "";
for(int i = 0; i < s.length(); i++){
if(i == 0 || s.charAt(i - 1) == ' '){ /* Uppercase the first letter of the string and every letter that follows a space */
l += Character.toUpperCase(s.charAt(i));
} else { /* Copy every other character unchanged */
l += s.charAt(i);
}
}
return l;
}

package com.test;
/**
* @author Prasanth Pillai
* @date 01-Feb-2012
* @description : Below is the test class details
*
* inputs a String from a user. Expect the String to contain spaces and alphanumeric characters only.
* capitalizes all first letters of the words in the given String.
* preserves all other characters (including spaces) in the String.
* displays the result to the user.
*
* Approach : I have followed a simple approach. However there are many string utilities available
* for the same purpose. Example : WordUtils.capitalize(str) (from apache commons-lang)
*
*/
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class Test {
public static void main(String[] args) throws IOException{
System.out.println("Input String :\n");
InputStreamReader converter = new InputStreamReader(System.in);
BufferedReader in = new BufferedReader(converter);
String inputString = in.readLine();
int length = inputString.length();
StringBuffer newStr = new StringBuffer(0);
int i = 0;
int k = 0;
/* This is a simple approach
* step 1: scan through the input string
* step 2: capitalize the first letter of each word in string
* The integer k, is used as a value to determine whether the
* letter is the first letter in each word in the string.
*/
while( i < length){
if (Character.isLetter(inputString.charAt(i))){
if ( k == 0){
newStr = newStr.append(Character.toUpperCase(inputString.charAt(i)));
k = 2;
}// this else branch appends the remaining letters of the word unchanged
else {
newStr = newStr.append(inputString.charAt(i));
}
} // non-letter characters are appended unchanged and mark the start of the next word
else {
newStr = newStr.append(inputString.charAt(i));
k=0;
}
i+=1;
}
System.out.println("new String ->"+newStr);
}
}

Here is a simple function
public static String capEachWord(String source){
String result = "";
String[] splitString = source.split(" ");
for(String target : splitString){
result += Character.toUpperCase(target.charAt(0))
+ target.substring(1) + " ";
}
return result.trim();
}

This is just another way of doing it:
private String capitalize(String line)
{
StringTokenizer token =new StringTokenizer(line);
String CapLine="";
while(token.hasMoreTokens())
{
String tok = token.nextToken().toString();
CapLine += Character.toUpperCase(tok.charAt(0))+ tok.substring(1)+" ";
}
return CapLine.trim(); // also safe when the line has no tokens
}

Reusable method for initial capitalization (initcap):
public class YarlagaddaSireeshTest{
public static void main(String[] args) {
String FinalStringIs = "";
String testNames = "sireesh yarlagadda test";
String[] name = testNames.split("\\s");
for(String nameIs :name){
FinalStringIs += getIntiCapString(nameIs) + ",";
}
System.out.println("Final Result "+ FinalStringIs);
}
public static String getIntiCapString(String param) {
if(param != null && param.length()>0){
char[] charArray = param.toCharArray();
charArray[0] = Character.toUpperCase(charArray[0]);
return new String(charArray);
}
else {
return "";
}
}
}

Here is my solution.
I ran across this problem tonight and decided to search for it. I found an answer by Neelam Singh that was almost there, but it broke on empty strings and caused a crash, so I decided to fix that issue.
The method you are looking for is named capString(String s) below.
It turns "It's only 5am here" into "It's Only 5am Here".
The code is pretty well commented, so enjoy.
package com.lincolnwdaniel.interactivestory.model;
public class StringS {
/**
* @param s is a string of any length, ideally only one word
* @return a capitalized string.
* only the first letter of the string is made to uppercase
*/
public static String capSingleWord(String s) {
if(s.isEmpty()) {
return s; // nothing to capitalize; charAt(0) would throw here
}
else {
return Character.toUpperCase(s.charAt(0)) + s.substring(1);
}
}
/**
*
* @param s is a string of any length
* @return a title cased string.
* The first letter of each word is made uppercase
*/
public static String capString(String s) {
// Check if the string is empty, if it is, return it immediately
if(s.isEmpty()){
return s;
}
// Split string on space and create array of words
String[] arr = s.split(" ");
// Create a string buffer to hold the new capitalized string
StringBuffer sb = new StringBuffer();
// Check if the array is empty (this happens when s contains only spaces, e.g. " ").
// If it is, return the original string immediately
if( arr.length < 1 ){
return s;
}
for (int i = 0; i < arr.length; i++) {
if (arr[i].isEmpty()) continue; // skip empty tokens from consecutive spaces
sb.append(Character.toUpperCase(arr[i].charAt(0)))
.append(arr[i].substring(1)).append(" ");
}
return sb.toString().trim();
}
}

Here is a simple way to capitalize the first character of each word:
public static void main(String[] args) {
String input ="my name is ranjan";
String[] inputArr = input.split(" ");
for(String word : inputArr) {
// print on one line, separated by spaces
System.out.print(word.substring(0, 1).toUpperCase()+word.substring(1)+" ");
}
}
//Output : My Name Is Ranjan

For those of you using Velocity in your MVC, you can use the capitalizeFirstLetter() method from the StringUtils class.

String s="hi dude i want apple";
s = s.replaceAll("\\s+"," ");
String[] split = s.split(" ");
s="";
for (int i = 0; i < split.length; i++) {
split[i]=Character.toUpperCase(split[i].charAt(0))+split[i].substring(1);
s+=split[i]+" ";
System.out.println(split[i]);
}
System.out.println(s);

package corejava.string.intern;
import java.io.DataInputStream;
import java.util.ArrayList;
/*
* Write a program to accept only 3 sentences and convert the first character of each word to upper case
*/
public class Accept3Lines_FirstCharUppercase {
static String line;
static String words[];
static ArrayList<String> list=new ArrayList<String>();
/**
* @param args
*/
public static void main(String[] args) throws java.lang.Exception{
DataInputStream read=new DataInputStream(System.in);
System.out.println("Enter only three sentences");
int i=0;
while((line=read.readLine())!=null){
method(line); //main logic of the code
if((i++)==2){
break;
}
}
display();
System.out.println("\n End of the program");
}
/*
* this will display all the elements in an array
*/
public static void display(){
for(String display:list){
System.out.println(display);
}
}
/*
* this divide the line of string into words
* and first char of the each word is converted to upper case
* and to an array list
*/
public static void method(String lineParam){
words=lineParam.split("\\s");
for(String s:words){
if(s.isEmpty()) continue; // skip empty tokens from consecutive whitespace
String result=s.substring(0,1).toUpperCase()+s.substring(1);
list.add(result);
}
}
}

If you prefer Guava...
String myString = ...;
String capWords = Joiner.on(' ').join(Iterables.transform(Splitter.on(' ').omitEmptyStrings().split(myString), new Function<String, String>() {
public String apply(String input) {
return Character.toUpperCase(input.charAt(0)) + input.substring(1);
}
}));

String toUpperCaseFirstLetterOnly(String str) {
String[] words = str.split(" ");
StringBuilder ret = new StringBuilder();
for(int i = 0; i < words.length; i++) {
ret.append(Character.toUpperCase(words[i].charAt(0)));
ret.append(words[i].substring(1));
if(i < words.length - 1) {
ret.append(' ');
}
}
return ret.toString();
}


Check if letter is emoji - java

Try the simple-emoji-4j project. It is compatible with Emoji 12.0 (2018.10.15) and simple to use: EmojiUtils.containsEmoji(str)

Here you go:
for (String word : sentence.split("")) {
if (word.matches(emo_regex)) {
System.out.println(word);
}
}
