Java regex: replace any double letter in a word with a single letter

I've been searching for hours but can't find an answer; I apologize if this has been answered before.
I'm trying to check each word in a message for double letters and remove the extra letter, so that words like wall or doll become wal or dol. The purpose is a fake language translation for a game. So far I've gotten as far as identifying the double letters, but I don't know how to replace them.
Here's my code so far:
public String[] removeDouble(String[] words){
Pattern pattern = Pattern.compile("(\\w)\\1+");
for (int i = 0; i < words.length; i++){
Matcher matcher = pattern.matcher(words[i]);
if (matcher.find()){
words[i].replaceAll("what to replace with?");
}
}
return words;
}

You can do the whole replacement operation in one statement if you use back references:
for (int i = 0; i < words.length; i++)
words[i] = words[i].replaceAll("(.)\\1", "$1");
Note that you must assign the value returned from string methods that (appear to) change strings, because they return new strings rather than mutate the string.
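For reference, here is a minimal sketch of the asker's method with the back-reference replacement filled in. It assumes the question's own pattern, (\w)\1+, which collapses a run of any length to one character; the shorter (.)\1 above only halves each pair, so "aaa" would become "aa". The class name DoubleLetters is just a placeholder.

```java
import java.util.Arrays;

final class DoubleLetters {
    // Collapses each run of a repeated word character to a single character,
    // e.g. "wall" -> "wal". The array is modified in place and returned.
    static String[] removeDouble(String[] words) {
        for (int i = 0; i < words.length; i++) {
            // (\w)\1+ matches a word character followed by one or more copies of itself;
            // $1 in the replacement stands for the captured character.
            words[i] = words[i].replaceAll("(\\w)\\1+", "$1");
        }
        return words;
    }

    public static void main(String[] args) {
        String[] words = {"wall", "doll", "cat", "aaah"};
        System.out.println(Arrays.toString(removeDouble(words))); // prints [wal, dol, cat, ah]
    }
}
```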

String.replaceAll does not modify the string in place (a Java String is immutable). You need to assign the returned value back.
Also, String.replaceAll takes two parameters: the regex and the replacement.
Replace the following line:
words[i].replaceAll("what to replace with?");
with:
words[i] = words[i].replaceAll("(\\w)\\1+", "$1");

Related

How to count a word having an apostrophe as two separate words using Java regular expressions

I have a string that contains a word with an apostrophe.
Example: He is a very very good boy, isn't he?
public class Solution {
public static void main(String[] args) {
String s = "He is a very very good boy, isn't he?";
String[] words = s.split("\\s+");
int itemCount = words.length;
System.out.println(itemCount);
for (int i = 0; i < itemCount; i++) {
String word = words[i];
System.out.println(word);
}
}
}
The output I'm getting is 9 words, but I want the count to be 10, with isn't separated into 2 words. How can I do that using the regular expression above?
It would be more reliable to use the \w construct:
Pattern p = Pattern.compile("(\\w)+");
Matcher m = p.matcher("He is a very very good boy, isn't he?");
while (m.find()) {
System.out.println(m.group(0));
}
Otherwise, you need to handle too many situations manually, for instance: "He's a very good boy.Isn't he?".
You can try using \p{Punct}, which also treats punctuation characters like ? and ! as delimiters:
String s = "He is a very very good boy, isn't he?";
String[] words = s.split("[\\p{Punct}\\s]+");
int itemCount = words.length;
System.out.println(itemCount);
for (int i = 0; i < itemCount; i++) {
String word = words[i];
System.out.println(word);
}
Split on non-word chars:
String[] words = s.split("\\W+");
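As a quick check of the non-word split, the sketch below counts the tokens for the sentence from the question. The class name WordCount is a placeholder. One caveat worth knowing: if the input starts with a non-word character, the first element of the result is an empty string.

```java
final class WordCount {
    // Splits on runs of non-word characters, so the apostrophe in "isn't"
    // separates "isn" and "t", and the trailing "?" produces no extra token.
    static String[] words(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        String[] words = words("He is a very very good boy, isn't he?");
        System.out.println(words.length); // prints 10
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```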
I think you want isn't to be treated as is not, and so counted as two separate words rather than one.
You can use alternation (|) in the split regular expression:
\\s+|'t
This splits only on 't, so a phrase like my friend's birthday is unaffected; there the apostrophe should not start another word.
But that is not the end of the story. There are many other contractions that such an expression should consider, for example:
't : isn't, aren't, wasn't, weren't, wouldn't, didn't etc.
's : it's, that's, etc. (This is the difficult one)
'd : I'd, you'd etc.
'll : I'll, they'll etc.
...
So the following regular expression will handle most of the word-counting problem:
\\s+|'t|'d|'ll
The problem with 's (apostrophe s) is that it also follows nouns, as in Dog's or Cat's, where it shows possession and should not be counted as a separate word. On the other hand, 's is sometimes a contraction of is, as in That's or It's. You can extend the regular expression above to differentiate between contractions and possessive apostrophes.
Note: This is only for counting words. It splits isn't into isn and an empty string, and the 't itself is removed.
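A small sketch of the counting idea follows. It relies on the empty token that appears between a contraction suffix and the following space, which is what raises the count. Two details are my own assumptions rather than part of the answer: the input is trimmed first, and split is called with a limit of -1 so that a contraction at the very end of the text is still counted. The class name ContractionCount is a placeholder.

```java
final class ContractionCount {
    // Counts words, treating the contraction suffixes 't, 'd and 'll as extra words.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Limit -1 keeps trailing empty tokens, so "I wouldn't" still counts as 3.
        return trimmed.split("\\s+|'t|'d|'ll", -1).length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("He is a very very good boy, isn't he?")); // prints 10
        System.out.println(countWords("my friend's birthday"));                  // prints 3
    }
}
```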

Formatting String Array efficiently in Java

I was working on some string formatting, and I was curious if I was doing it the most efficient way.
Assume I have a String Array:
String ArrayOne[] = {"/test/", "/this/is/test", "/that/is/", "/random/words"};
I want the result array to be:
String resultArray[] = {"test", "this_is_test", "that_is", "random_words"};
My current approach is quite messy and brute-force-like:
for(char c : ArrayOne[i].toCharArray()) {
if(c == '/'){
occurances[i]++;
}
}
First I count the number of "/" in each String like above and then using these counts, I find the indexOf("/") for each string and add "_" accordingly.
As you can see though, it gets very messy.
Is there a more efficient way to do this besides the brute-force way I'm doing?
Thanks!
You could use replaceAll and replace, as follows:
String resultArray[] = new String[ArrayOne.length];
for (int i = 0; i < ArrayOne.length; ++i) {
resultArray[i] = ArrayOne[i].replaceAll("^/|/$", "").replace('/', '_');
}
The replaceAll method searches the string for a match to the regex given in the first argument, and replaces each match with the text in the second argument.
Here, we use it first to remove leading and trailing slashes. We search for slashes at the start of the string (^/) or the end of the string (/$), and replace them with nothing.
Then, we replace all remaining slashes with underscores using replace.
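Wrapped in a method, the same two-step replacement could look like the sketch below. The class and method names (PathFormatter, format) are placeholders; the input is the array from the question.

```java
import java.util.Arrays;

final class PathFormatter {
    // Strips one leading and one trailing slash, then turns the remaining slashes into underscores.
    static String[] format(String[] paths) {
        String[] result = new String[paths.length];
        for (int i = 0; i < paths.length; i++) {
            result[i] = paths[i].replaceAll("^/|/$", "").replace('/', '_');
        }
        return result;
    }

    public static void main(String[] args) {
        String[] input = {"/test/", "/this/is/test", "/that/is/", "/random/words"};
        System.out.println(Arrays.toString(format(input)));
        // prints [test, this_is_test, that_is, random_words]
    }
}
```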

Is there a way to remove characters from a string? Java

I am having trouble removing letters from a string. String ALPHABET = "abcdefghjklmnopqrstuvwxyz"; The user enters a string, e.g. "klmn". How would I remove klmn from the alphabet? Is there a way other than putting it into an array?
This is what I started with. It only removes the last letter in the string. What's my problem here?
for(int i = 0; i < message.length(); i++){
for(int j = 0; j < ALPHABET.length(); j++){
letter = message.charAt(i);
if(ALPHABET.charAt(j) == message.charAt(i)){
newALPHABET = ALPHABET.replace(letter, ' ');
}
}
}
I'm not sure exactly what you want to do, but you can use String#replace:
String alphabet = "abcdefghjklmnopqrstuvwxyz";
alphabet = alphabet.replace("klmn","");
Write a method to delete it. The logic is to overwrite the character you want to delete with the next character, put the third character in place of the second, and so on, shifting the rest of the string left by one.
If you want to delete a longer substring, use the replace method instead.
You can do that with regular expressions. Try the following:
static String ALPHABET = "abcdefghjklmnopqrstuvwxyz";
public static void main(String[] args) {
String input = JOptionPane.showInputDialog("Letters: ");
Pattern p = Pattern.compile("[" + Pattern.quote(input) +"]");
Matcher m = p.matcher(ALPHABET);
String result = m.replaceAll("");
System.out.println(result);
}
If you simply want to replace a character or a simple substring, then String.replace is the solution.
If you want to replace matches of a regex, then String.replaceAll is the solution.
The reason your code is not working is that there are a couple of bugs in it:
You appear to be under the impression that String.replace(char, char) replaces a single character instance. In fact, it replaces all instances of the first character in the String.
Each loop iteration creates a new String and assigns it to newALPHABET. But then you start again with ALPHABET on the next iteration.
If the aim is to produce an "alphabet" in which the letters in message are blanked out (replaced with spaces, as in your code), then the correct solution is something like this:
for (int i = 0; i < message.length(); i++) {
ALPHABET = ALPHABET.replace(message.charAt(i), ' ');
}
... except that you should NOT use ALPHABET as the name of a variable that gets reassigned; by Java convention, all-caps names are for constants. It should be alphabet.
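If the letters should disappear entirely rather than turn into spaces, one option is the String-based overload of replace with an empty replacement. The sketch below assumes that reading of the question; the class and method names (AlphabetFilter, without) are placeholders.

```java
final class AlphabetFilter {
    // Returns alphabet with every character that occurs in message removed.
    static String without(String alphabet, String message) {
        String result = alphabet;
        for (int i = 0; i < message.length(); i++) {
            // replace(CharSequence, CharSequence) removes all occurrences of the letter.
            result = result.replace(String.valueOf(message.charAt(i)), "");
        }
        return result;
    }

    public static void main(String[] args) {
        String alphabet = "abcdefghjklmnopqrstuvwxyz";
        System.out.println(without(alphabet, "klmn")); // prints abcdefghjopqrstuvwxyz
    }
}
```

Unlike alphabet.replace("klmn", ""), this does not require the letters to be adjacent or in order.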

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
and so on. Essentially, numbers that are separated by pipes.
When I use Java's String.split with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a regexp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); by default it discards trailing empty strings. If you get rid of the non-numeric characters in the string and keep only pipes and numbers, you can easily use split() to get what you want. Or you can pass multiple delimiters to split (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
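Or match the numbers directly with a regex. A lookahead such as (?=[\\|\\s]) would skip the last number when nothing follows it, so plain \\d+ is enough:
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+");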
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), so you need to escape it. Depending on the Java version you are using, this could well explain your unpredictable results.
class t {
public static void main(String[] args)
{
String temp = "0|0";
String[] splitString = temp.split("\\|");
for (int i=0; i<splitString.length; i++)
System.out.println("splitString["+i+"] is " + splitString[i]);
}
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
You can replace the white space with pipes and then split on the (escaped) pipe:
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|");
Hope this helps.
You can use StringTokenizer.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|"); // "|" as the delimiter; the default is whitespace
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course, you can always put this inside a while loop if you have multiple strings.
Also, you need to import java.util.StringTokenizer for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character. Unfortunately, '\' is also a special character in Java string literals, so you need a kind of double escape, e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
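If the double backslash is easy to get wrong, Pattern.quote can build the escaped delimiter instead. The sketch below splits one pipe-separated line and parses both sides as integers; the class and method names (PipeSplit, parse) are placeholders, and it assumes every part is a valid integer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PipeSplit {
    // Splits a line such as "10|15" on the literal pipe and parses each part as an int.
    static int[] parse(String line) {
        // Pattern.quote("|") yields a regex that matches the pipe literally.
        String[] parts = line.split(Pattern.quote("|"));
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("10|15"))); // prints [10, 15]
    }
}
```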
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example, you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
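To show how that pattern might be applied, here is a minimal sketch that collects every run of digits into a list. The class and method names (NumberExtractor, extract) are placeholders, and it assumes each number fits in an int, since Integer.parseInt throws on longer digit runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberExtractor {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns every run of digits in the input, in order, ignoring pipes and whitespace.
    static List<Integer> extract(String input) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            numbers.add(Integer.parseInt(matcher.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("0|0 10|15 30|55")); // prints [0, 0, 10, 15, 30, 55]
    }
}
```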
Consider a scenario where we have read a line from a CSV or XLS file as a string and need to separate the columns into a string array based on delimiters.
The code snippet below does this:
{ ...
....
BufferedReader reader = new BufferedReader(new FileReader("your file"));
String stringLine = reader.readLine();
String[] splittedString = StringSplitToArray(stringLine, "\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
StringBuffer token = new StringBuffer();
Vector tokens = new Vector();
char[] chars = stringToSplit.toCharArray();
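for (int i = 0; i < chars.length; i++) {
if (delimiter.indexOf(chars[i]) != -1) {
// Delimiter reached: store the token collected so far, if any.
if (token.length() > 0) {
tokens.addElement(token.toString());
token.setLength(0);
// Skip the character right after the delimiter (e.g. the comma after a closing quote).
i++;
}
} else {
token.append(chars[i]);
}
}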
if (token.length() > 0) {
tokens.addElement(token.toString());
}
// convert the vector into an array
String[] preparedArray = new String[tokens.size()];
for (int i=0; i < preparedArray.length; i++) {
preparedArray[i] = (String)tokens.elementAt(i);
}
return preparedArray;
}
The snippet above calls StringSplitToArray, which converts the line into a string array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in

Java: Finding the number of word matches in a given string

I am trying to find the number of word matches for a given string and keyword combination, like this:
public int matches(String keyword, String text){
// ...
}
Example:
Given the following calls:
System.out.println(matches("t", "Today is really great, isn't that GREAT?"));
System.out.println(matches("great", "Today is really great, isn't that GREAT?"));
The result should be:
0
2
So far I found this: Find a complete word in a string java
This only returns if the given keyword exists but not how many occurrences. Also, I am not sure if it ignores case sensitivity (which is important for me).
Remember that substrings should be ignored! I only want full words to be found.
UPDATE
I forgot to mention that I also want keywords that are separated via whitespace to match.
E.g.
matches("today is", "Today is really great, isn't that GREAT?")
should return 1
Use a regular expression with word boundaries. It's by far the easiest choice.
int matches = 0;
Matcher matcher = Pattern.compile("\\bgreat\\b", Pattern.CASE_INSENSITIVE).matcher(text);
while (matcher.find()) matches++;
Your mileage may vary with some non-English languages, though.
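A sketch of the full matches method along these lines is shown below. Two choices here are my own assumptions, not part of the answer above: the keyword is passed through Pattern.quote so it is matched literally, and the plain \b boundaries are swapped for lookarounds that treat the apostrophe as part of a word. With \b alone, the keyword "t" would match the t in isn't, while the question expects 0. The class name KeywordCounter is a placeholder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordCounter {
    // Counts case-insensitive whole-word occurrences of keyword in text.
    static int matches(String keyword, String text) {
        // The keyword must not be preceded or followed by a word character or an apostrophe.
        String regex = "(?<![\\w'])" + Pattern.quote(keyword) + "(?![\\w'])";
        Matcher matcher = Pattern
                .compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Today is really great, isn't that GREAT?";
        System.out.println(matches("t", text));        // prints 0
        System.out.println(matches("great", text));    // prints 2
        System.out.println(matches("today is", text)); // prints 1
    }
}
```

A multi-word keyword matches only when the words are separated exactly as written in the keyword, here by a single space.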
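How about taking advantage of indexOf? Note that this counts substring occurrences, not whole words.
// s1 is the text, s2 is the keyword
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
int count = 0;
int x;
int y = s2.length();
while ((x = s1.indexOf(s2)) != -1) {
count++;
s1 = s1.substring(x + y); // continue searching after this match
}
return count;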
Efficient version (no intermediate substrings):
int count = 0;
int y = s2.length();
for (int i = 0; i <= s1.length() - y; i++) {
int j = 0;
// Compare s2 against s1 starting at position i.
while (j < y && s1.charAt(i + j) == s2.charAt(j)) {
j++;
}
if (j == y) count++;
}
return count;
For a more efficient solution, you will have to adapt the KMP (Knuth-Morris-Pratt) algorithm a little. It is easy to look up.
Well, you can use split to separate the words and then check whether any word matches the keyword exactly.
Hope that helps!
One option would be a regex. Basically, it sounds like you want to match a word with any punctuation on its left or right, so:
" great."
" great!"
" great "
" great,"
"Great"
would all match, but
"greatest"
wouldn't
