I am attempting to split a word from its punctuation.
For example, if the word is "Hello?", I want to store "Hello" in one variable and "?" in another variable.
I tried using the .split() method, but it deletes the delimiter (the punctuation), which means the punctuation character is not preserved.
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.contains(","+"?"+"."+"!"+";")) {
String parts[] = inWord.split("\\," + "\\?" + "\\." + "\\!" + "\\;");
word = parts[0];
punctuation = parts[1];
} else {
word = inWord;
}
System.out.println(word);
System.out.println(punctuation);
I am stuck; I can't see another way of doing it.
Thanks in advance
You could use a positive lookahead to split, so you don't actually split on the punctuation but on the position right before it:
inWord.split("(?=[,?.!;])");
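A self-contained version of the lookahead split, for trying it locally. The class and method names (SplitKeepPunctuation, splitWord) are made up for this sketch, and it assumes the punctuation comes from the set [,?.!;].

```java
class SplitKeepPunctuation {
    // The lookahead (?=[,?.!;]) matches the empty position just before a
    // punctuation character, so split() consumes nothing and the "?" survives
    // as its own element.
    static String[] splitWord(String inWord) {
        return inWord.split("(?=[,?.!;])");
    }

    public static void main(String[] args) {
        String[] parts = splitWord("hello?");
        String word = parts[0];
        // null when the word contains no punctuation at all
        String punctuation = parts.length > 1 ? parts[1] : null;
        System.out.println(word);        // hello
        System.out.println(punctuation); // ?
    }
}
```

If punctuation can also appear in the middle of the input, you get more than two parts: "Hello, world!" splits into "Hello", ", world" and "!".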
Further to the other suggestions, you can also use the 'word boundary' matcher '\b'. This may not always match what you are looking for, because it detects the boundary between a word character and a non-word character, as documented here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
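In your example it works, although on Java 7 the first element in the array will be an empty string (since Java 8, split() no longer produces an empty leading element for a zero-width match at the start of the input).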
Here is some working code:
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.matches(".*[,?.!;].*")) {
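String parts[] = inWord.split("\\b");
// Java 7 returns ["", "hello", "?"]; Java 8+ drops the leading empty string
// that the zero-width match at index 0 would produce, giving ["hello", "?"]
int first = parts[0].isEmpty() ? 1 : 0;
word = parts[first];
punctuation = parts[first + 1];
System.out.println(parts.length);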
} else {
word = inWord;
}
System.out.println(word);
System.out.println(punctuation);
You can see it running here: http://ideone.com/3GmgqD
I've also replaced your .contains() with .matches(), because contains() looks for the literal text ",?.!;" rather than for any one of those characters.
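A small check that shows the difference between the two calls on "hello?". The class and method names are made up for illustration.

```java
class PunctuationCheck {
    // contains() takes a literal CharSequence, so this is only true if the
    // word contains the exact five-character text ",?.!;"
    static boolean containsLiteral(String inWord) {
        return inWord.contains("," + "?" + "." + "!" + ";");
    }

    // matches() takes a regex that must cover the whole string; [,?.!;] is a
    // character class meaning "any one of these characters"
    static boolean hasPunctuation(String inWord) {
        return inWord.matches(".*[,?.!;].*");
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("hello?")); // false
        System.out.println(hasPunctuation("hello?"));  // true
    }
}
```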
I think you can use the regex below, though I haven't tried it. Give it a try:
input.split("[\\p{P}]")
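For what it's worth, here is a quick check of what that split does. \p{P} matches any Unicode punctuation character, but splitting on it removes the punctuation, just like the original attempt in the question. Putting \p{P} inside the lookahead from the first answer keeps it. Class and method names are made up for this sketch.

```java
import java.util.Arrays;

class UnicodePunctuationSplit {
    // Splitting ON \p{P} consumes the delimiter, and split() drops the
    // trailing empty string that is left over, so "hello?" -> [hello]
    static String[] splitOn(String input) {
        return input.split("[\\p{P}]");
    }

    // Wrapping \p{P} in a lookahead splits just BEFORE it: "hello?" -> [hello, ?]
    static String[] splitBefore(String input) {
        return input.split("(?=\\p{P})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOn("hello?")));     // [hello]
        System.out.println(Arrays.toString(splitBefore("hello?"))); // [hello, ?]
    }
}
```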
You could use substring() here, something like this:
String inWord = "hello?";
// assumes the punctuation is exactly one character at the end of the word
int cut = inWord.length() - 1;
String word = inWord.substring(0, cut);
String punctuation = inWord.substring(cut);
System.out.println(word);
System.out.println(punctuation);
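If the word can end in more than one punctuation character, or none, the cut index can be found instead of hard-coded. This is a sketch with a made-up helper (splitTrailing) that treats every trailing character that is not a letter or digit as punctuation.

```java
class TrailingPunctuation {
    // Walks back from the end while the character is not a letter or digit,
    // then cuts the string at that position.
    static String[] splitTrailing(String inWord) {
        int end = inWord.length();
        while (end > 0 && !Character.isLetterOrDigit(inWord.charAt(end - 1))) {
            end--;
        }
        return new String[] { inWord.substring(0, end), inWord.substring(end) };
    }

    public static void main(String[] args) {
        String[] parts = splitTrailing("hello?!");
        System.out.println(parts[0]); // hello
        System.out.println(parts[1]); // ?!
    }
}
```

A word with no trailing punctuation gives an empty string as the second part.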
Related
I want to replace words in a string, but I am having a little difficulty. Here is what I want to do. I have this string:
String a = "I want to replace some words in this string";
It should work like a kind of translator. I am doing this with String.replaceAll(), but it doesn't work completely, as the example below shows. Let's say I am translating from English to German; then this should be the output ("Ich" means "I" in German).
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
Now the output of String a will be this:
"ich want to replace some words ichn thichs strichng"
How to replace just the words, not the subwords in the words?
replaceAll uses a regex, so you can add word boundaries or look-around assertions to check that there are no non-space characters surrounding the word you want to replace.
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll("(?<!\\S)"+toTranslate.toLowerCase()+"(?!\\S)", translated.toLowerCase());
You can also quote the word with Pattern.quote() to escape any regex metacharacters such as +, * or ( inside the word you want to replace. By the way, you don't need to convert your string to lower case; simply add the case-insensitive flag (?i) to the regex.
a = a.replaceAll("(?i)(?<!\\S)"+Pattern.quote(toTranslate)+"(?!\\S)", translated.toLowerCase());
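A runnable sketch of the same look-around approach wrapped in a method (WholeWordReplace and replaceWord are made-up names). It keeps the replacement as given instead of lower-casing it, and additionally uses Matcher.quoteReplacement() so that a $ or \ in the replacement is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordReplace {
    // (?i)       case-insensitive
    // (?<!\S)    not preceded by a non-space character (or at the start)
    // (?!\S)     not followed by a non-space character (or at the end)
    static String replaceWord(String text, String word, String replacement) {
        String regex = "(?i)(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String a = "I want to replace some words in this string";
        System.out.println(replaceWord(a, "I", "Ich"));
        // Ich want to replace some words in this string
    }
}
```

Because only spaces count as separators here, a word directly followed by punctuation, such as "I," is not replaced; \b (see the answers below) treats punctuation as a boundary.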
Use split(" ") to get each word in the sentence, and then compare each word with the one you want to translate.
String a = "I want to replace some words in this string";
String toTranslate = "I";
String translated = "Ich";
String[] words = a.split(" ");
for (int i = 0; i < words.length; i++) {
// compare whole words (ignoring case), so "in" or "this" are never touched
if (words[i].equalsIgnoreCase(toTranslate)) {
words[i] = translated.toLowerCase();
}
}
System.out.println("after translation = " + String.join(" ", words));
String toTranslate = "I ";
String translated = "Ich ";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
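If you add a space after the "I", it no longer matches the "i" inside words like "in" or "this", but a word that ends in "i" is still a problem. Note also that toLowerCase() turns the pattern into "i ", so the capital "I " at the start of the sentence is not replaced.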
If you assume that "I" will always be capitalized in English, as it should be, then
a = a.replaceAll(toTranslate, translated);
will work; otherwise you need to replace both cases:
a = a.replaceAll(toTranslate, translated);
a = a.replaceAll("([^a-zA-Z])("+toTranslate.toLowerCase()+")([^a-zA-Z])", "$1"+translated.toLowerCase()+"$3");
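A sketch of the two-step replacement from this answer. The regex ([^a-zA-Z])(i)([^a-zA-Z]) needs a non-letter on both sides, so it misses an "i" at the very start or end of the string, and its third group consumes the following character. This adaptation (not the answer's original regex) uses (^|[^a-zA-Z]) and a lookahead instead. The class and method names are made up, and toTranslate is assumed to contain only letters.

```java
class TwoCaseTranslate {
    static String translate(String a, String toTranslate, String translated) {
        // Step 1: the capitalised form, e.g. "I" -> "Ich"
        a = a.replaceAll(toTranslate, translated);
        // Step 2: the lower-case form, only when it is not part of a longer word.
        // (^|[^a-zA-Z]) accepts the start of the string; the lookahead
        // (?=[^a-zA-Z]|$) checks the next character without consuming it.
        return a.replaceAll("(^|[^a-zA-Z])" + toTranslate.toLowerCase() + "(?=[^a-zA-Z]|$)",
                "$1" + translated.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(translate("i think i can, I said", "I", "Ich"));
        // ich think ich can, Ich said
    }
}
```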
Yes, the word boundaries are the solution. I just did this in the regex:
text = text.replaceAll("\\b" + parts1[i] + "\\b", map.get(parts1[i])); // replaceAll returns a new String, so assign it back
Don't be confused by the second argument: it's just the replacement String, looked up in a hash table.
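A runnable sketch of the same \b idea applied to a whole dictionary of translations. The names (MapTranslate, translate) are made up, and the keys are assumed to be plain words; Pattern.quote() and Matcher.quoteReplacement() guard against regex metacharacters in the keys and values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapTranslate {
    // Applies every entry of the map, matching whole words only via \b.
    static String translate(String text, Map<String, String> dictionary) {
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            String regex = "\\b" + Pattern.quote(e.getKey()) + "\\b";
            text = text.replaceAll(regex, Matcher.quoteReplacement(e.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("I", "Ich");
        dictionary.put("want", "will");
        System.out.println(translate("I want to replace some words in this string", dictionary));
        // Ich will to replace some words in this string
    }
}
```

Since the entries are applied one after another, a translated word can itself be matched by a later entry; if that matters, build one alternation regex and replace in a single pass.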
You can use the regex word boundary, which is \b:
String toTranslate = "\\bI\\b"; // \b marks a word boundary, so "I" must stand on its own
String translated = "Ich";
// don't lower-case the pattern: "\\bi\\b" would no longer match the capital "I"
a = a.replaceAll(toTranslate, translated.toLowerCase());
This ensures that "I" is only replaced when it is a word of its own.
Edit: I initially misread the question; the answer above now matches whole words only.
I need to use a regular expression to get some values from a String. The thing is, it is quite complicated for me.
For example, I have a string like this:
oneWord [first, second, third]
My output should be:
first
second
third
So I need the words which are between [ and ]. Plus, there can be a different number of words between the brackets.
I tried using a regex generator, but the result wasn't very accurate:
String re1=".*?"; // Non-greedy match on filler
String re2="(?:[a-z][a-z]+)"; // Uninteresting: word
String re3=".*?"; // Non-greedy match on filler
String re4="((?:[a-z][a-z]+))"; // Word 1
String re5=".*?"; // Non-greedy match on filler
String re6="((?:[a-z][a-z]+))"; // Word 2
String re7=".*?"; // Non-greedy match on filler
String re8="((?:[a-z][a-z]+))"; // Word 3
I would do it like this, in just one line:
String[] words = str.replaceAll(".*\\[|\\].*", "").split(", ");
The first replaceAll() call strips off the leading and trailing wrapper, and the split() breaks up what's left into separate words.
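The one-liner wrapped in a small runnable class (BracketWords and wordsInBrackets are made-up names), assuming one [...] group per line and ", " as the separator.

```java
class BracketWords {
    static String[] wordsInBrackets(String str) {
        // ".*\\[" removes everything up to and including "[",
        // "\\].*" removes "]" and everything after it; then split on ", "
        return str.replaceAll(".*\\[|\\].*", "").split(", ");
    }

    public static void main(String[] args) {
        for (String w : wordsInBrackets("oneWord [first, second, third]")) {
            System.out.println(w); // first, second, third on separate lines
        }
    }
}
```

Because ".*\\[" is greedy, a line with several [...] groups keeps only the contents of the last one.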
You could try the below regex and get the words you want from group index 1.
(?:\[|(?<!^)\G),? *(\w+)(?=[^\[\]]*\])
As a Java string literal, the regex would be:
(?:\\[|(?<!^)\\G),? *(\\w+)(?=[^\\[\\]]*\\])
Example:
String s = "oneWord [first, second, third] foo bar [foobar]";
Pattern regex = Pattern.compile("(?:\\[|(?<!^)\\G),? *(\\w+)(?=[^\\[\\]]*\\])");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
first
second
third
foobar
You could use this:
String[] words = str.replaceAll(".*\\[|\\].*", "").split(", ");
Hope it helps.
You can do it easily with the split method:
String string = "first, second, third"; // the text between [ and ]
String[] parts = string.split(", ");
String part1 = parts[0]; // first
String part2 = parts[1]; // second
String part3 = parts[2]; // third
If it doesn't work for you, let me know and I will debug your regular expression.
I'm able to separate the words in the sentence, but I do not know how to check whether a word contains a character other than a letter. You don't have to post an answer, just some material I could read to help me.
public static void main(String args [])
{
String sentance;
String word;
int index = 1;
System.out.println("Enter sentance please");
sentance = EasyIn.getString();
String[] words = sentance.split(" ");
for ( String ss : words )
{
System.out.println("Word " + index + " is " + ss);
index++;
}
}
What I would do is use String#matches and use the regex [a-zA-Z]+.
String hello = "Hello!";
String hello1 = "Hello";
System.out.println(hello.matches("[a-zA-Z]+")); // false
System.out.println(hello1.matches("[a-zA-Z]+")); // true
Another solution is to call Character.isLetter(str.charAt(i)) inside a loop.
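A minimal sketch of that loop (LetterCheck and containsNonLetter are made-up names). Unlike [a-zA-Z], Character.isLetter() also accepts non-ASCII letters such as 'é'.

```java
class LetterCheck {
    // Returns true as soon as one character is not a letter.
    static boolean containsNonLetter(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (!Character.isLetter(word.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNonLetter("Hello!")); // true
        System.out.println(containsNonLetter("Hello"));  // false
    }
}
```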
Another solution is something like this:
String set = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
String word = "Hello!";
boolean notLetterFound = false; // must be initialised before it is read below
for (char c : word.toCharArray()){ // loop through string as character array
if (set.indexOf(c) == -1){ // if a character is not found in the set (String.contains() does not accept a char)
notLetterFound = true; // make notLetterFound true and break the loop
break;
}
}
if (notLetterFound){ // notLetterFound is true, do something
// do something
}
I prefer the first answer though, using String#matches
For more, see the question "How to determine if a String has non-alphanumeric characters?"
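Use a pattern such as "[^a-zA-Z]", which matches any character that is not a letter; if it is found anywhere in the word, the word contains a non-letter.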
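A sketch of using that pattern with Matcher.find(), which succeeds if the character class matches anywhere in the word (matches() would instead require the whole word to be a single non-letter). NonLetterPattern and containsNonLetter are made-up names.

```java
import java.util.regex.Pattern;

class NonLetterPattern {
    // [^a-zA-Z] matches one character that is NOT an ASCII letter
    private static final Pattern NON_LETTER = Pattern.compile("[^a-zA-Z]");

    static boolean containsNonLetter(String word) {
        return NON_LETTER.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonLetter("Hello!")); // true
        System.out.println(containsNonLetter("Hello"));  // false
    }
}
```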
Not sure if I understand your question, but there is
Character.isLetter(c);
You would iterate over all characters in your string and check whether they are alphabetic (there are other "isXxxxx" methods in the Character class).
You could loop through the characters in the word calling Character.isLetter(), or check whether the word matches a regular expression, e.g. [\w]* (note that \w matches letters, digits and the underscore, so use [a-zA-Z]* if only letters are allowed).
You can use a character array to do this:
char[] a = ss.toCharArray();
Now you can get the character at a particular index:
System.out.println("word " + index + " is " + a[index]);
What is the simplest way to get the last word of a string in Java? You can assume no punctuation (just alphabetic characters and whitespace).
String test = "This is a sentence";
String lastWord = test.substring(test.lastIndexOf(" ")+1);
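The same one-liner as a small method (LastWord and lastWord are made-up names). When there is no space, lastIndexOf() returns -1, so the +1 gives 0 and the whole string comes back; with a trailing space the result is an empty string, so trim() first if that can happen.

```java
class LastWord {
    static String lastWord(String s) {
        // everything after the last space (the whole string if there is none)
        return s.substring(s.lastIndexOf(" ") + 1);
    }

    public static void main(String[] args) {
        System.out.println(lastWord("This is a sentence")); // sentence
        System.out.println(lastWord("word"));               // word
    }
}
```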
String testString = "This is a sentence";
String[] parts = testString.split(" ");
String lastWord = parts[parts.length - 1];
System.out.println(lastWord); // "sentence"
Here is a way to do it using String's built-in regex capabilities:
String lastWord = sentence.replaceAll("^.*?(\\w+)\\W*$", "$1");
The idea is to match the whole string from ^ to $, capture the last sequence of \w+ in a capturing group 1, and replace the whole sentence with it using $1.
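A runnable check of that regex (LastWordRegex is a made-up name). The lazy ^.*? skips as little as possible, so the first place where (\w+)\W*$ can match is the start of the last word; \W*$ also lets trailing spaces or a full stop through.

```java
class LastWordRegex {
    static String lastWord(String sentence) {
        return sentence.replaceAll("^.*?(\\w+)\\W*$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(lastWord("This is a sentence"));    // sentence
        System.out.println(lastWord("This is a sentence.  ")); // sentence
    }
}
```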
If other whitespace characters are possible, then you'd want:
testString.split("\\s+");
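Combined with the split answer above, a sketch could look like this (the class and method names are made up). The trim() matters: a string consisting only of whitespace splits into an empty array, and parts[parts.length - 1] would then throw, whereas "".split(...) returns a single empty string.

```java
class LastWordWhitespace {
    static String lastWord(String testString) {
        // \s+ splits on any run of spaces, tabs or newlines
        String[] parts = testString.trim().split("\\s+");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastWord("This is\ta\nsentence ")); // sentence
    }
}
```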
You can do that with StringUtils (from Apache Commons Lang). It avoids index magic, so it's easier to understand. Unfortunately, substringAfterLast returns an empty string when there is no separator in the input string, so we need the if statement for that case.
public static String getLastWord(String input) {
String wordSeparator = " ";
boolean inputIsOnlyOneWord = !StringUtils.contains(input, wordSeparator);
if (inputIsOnlyOneWord) {
return input;
}
return StringUtils.substringAfterLast(input, wordSeparator);
}
Getting the last word in Kotlin:
str.substringAfterLast(" ") // returns the whole string when there is no space
String s="print last word";
x:for(int i=s.length()-1;i>=0;i--) {
if(s.charAt(i)==' ') {
for(int j=i+1;j<s.length();j++) {
System.out.print(s.charAt(j));
}
break x;
}
}
I have a String str which can have values like those below. I want the first letter of the string to be upper case, and if an underscore appears in the string, I need to remove it and make the letter after it upper case. I want all the remaining letters to be lower case.
""
"abc"
"abc_def"
"Abc_def_Ghi12_abd"
"abc__de"
"_"
Output:
""
"Abc"
"AbcDef"
"AbcDefGhi12Abd"
"AbcDe"
""
Well, without showing us that you put any effort into this problem, this is going to be kinda vague.
I see two possibilities here:
Split the string at underscores, apply the answer from this question to each part and re-combine them.
Create a StringBuilder, walk through the string and keep track of whether you are
at the start of the string
after an underscore or
somewhere else
and act appropriately on the current character before appending it to the StringBuilder instance.
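A minimal sketch of the second (StringBuilder) approach, assuming the rules from the question: the first character and any character after an underscore are upper-cased, underscores are dropped, and everything else is lower-cased. CamelCase and camelize are made-up names.

```java
class CamelCase {
    static String camelize(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        boolean upperNext = true; // the start of the string behaves like an underscore
        for (char c : str.toCharArray()) {
            if (c == '_') {
                upperNext = true; // drop the underscore, capitalise what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(Character.toLowerCase(c)); // "somewhere else": lower case
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {"", "abc", "abc_def", "Abc_def_Ghi12_abd", "abc__de", "_"};
        for (String in : inputs) {
            System.out.println("\"" + in + "\" -> \"" + camelize(in) + "\"");
        }
    }
}
```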
replace _ with space (str.replace("_", " "))
use WordUtils.capitalizeFully(str); (from commons-lang)
replace space with nothing (str.replace(" ", ""))
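You can use the following regex-based code (it needs java.util.regex.Matcher and java.util.regex.Pattern):
public static String camelize(String input) {
// lower-case everything first, since all letters not capitalised below must end up lower case
char[] c = input.toLowerCase().toCharArray();
// "_([a-z])" matches each underscore followed by a letter, so find() visits every one
Matcher m = Pattern.compile("_([a-z])").matcher(String.valueOf(c));
while ( m.find() ) {
int index = m.start(1);
c[index] = Character.toUpperCase(c[index]);
}
if (c.length > 0) {
c[0] = Character.toUpperCase(c[0]); // capitalise the first letter
}
return String.valueOf(c).replace("_", "");
}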
Use Pattern/Matcher in the java.util.regex package.
For each string in your array, do the following:
StringBuffer output = new StringBuffer();
Matcher match = Pattern.compile("(^|_)([a-z])").matcher(inStr.toLowerCase());
while(match.find()) {
match.appendReplacement(output, match.group(2).toUpperCase());
}
match.appendTail(output);
// Will have the properly capitalized string (leftover underscores, e.g. from "__", are removed).
String capitalized = output.toString().replace("_", "");
The regular expression looks for either the start of the string or an underscore, "(^|_)".
Then it puts the following letter into a second group, "([a-z])"; the input is lower-cased first so that every other letter ends up lower case.
The code then goes through each match in the input string and replaces it with the upper-cased letter from the second group, which also drops the underscore.