Java Regular expression to replace a string in text


I have a large string that contains sequences of digits, and I have to put a character in front of each number. Let's take an example. My string is:
String s= "Microsoft Ventures' start up \"98756\" accelerator wrong launched in apple in \"2012\" has been one of the most \"4241\" prestigious such programs in the country.";
I am looking for a way in Java to add a character in front of each number, so I expect the modified string to look like this:
String modified= "Microsoft Ventures' start up \"x98756\" accelerator wrong launched in apple in \"x2012\" has been one of the most \"x4241\" prestigious such programs in the country.";
How do I do that in Java?

The regex to find the numerical part is "\"[0-9]+\"". My approach is to loop through the original string word by word and, if a word matches the pattern, replace it.
// requires: import java.util.regex.Pattern;
String[] tokens = s.split(" ");
String modified = "";
for (int i = 0; i < tokens.length; i++) {
    // the token is a quoted run of digits, e.g. "98756"
    if (Pattern.matches("\"[0-9]+\"", tokens[i])) {
        // insert the x right after the opening quote
        tokens[i] = "\"x" + tokens[i].substring(1);
    }
    modified = modified + tokens[i] + " ";
}
modified = modified.trim(); // drop the trailing space
The code is only meant to give you the idea; please optimize it yourself (for example, by using a StringBuilder to concatenate the strings).
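For comparison, the same replacement can probably be done in a single call with String.replaceAll and a capturing group. This is a minimal sketch, assuming every number to be prefixed is a run of digits wrapped in double quotes, as in the question; the class name PrefixDigits and the method name prefixQuotedNumbers are made up for illustration.

```java
class PrefixDigits {
    // Matches a quoted run of digits, captures the digits in group 1,
    // and writes them back with an x right after the opening quote.
    static String prefixQuotedNumbers(String s) {
        return s.replaceAll("\"(\\d+)\"", "\"x$1\"");
    }

    public static void main(String[] args) {
        String s = "start up \"98756\" launched in \"2012\"";
        System.out.println(prefixQuotedNumbers(s));
        // prints: start up "x98756" launched in "x2012"
    }
}
```

Digits that are not wrapped in quotes are left alone by this pattern; dropping the quotes from both the pattern and the replacement would prefix every run of digits instead.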

The best way I could see to do this would be to walk through the string, copy it piece by piece, and insert the extra character whenever a run of digits begins. Something like the following:
String s = "foo \"67\" blah \"89\"";
StringBuilder modified = new StringBuilder();
int i = 0;
while (i < s.length()) {
    char c = s.charAt(i);
    if (Character.isDigit(c)) {
        // start of a number: emit the prefix, then copy the whole run of digits
        modified.append('x');
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            modified.append(s.charAt(i));
            i++;
        }
    } else {
        // any other character is copied unchanged
        modified.append(c);
        i++;
    }
}
// modified.toString() is now: foo "x67" blah "x89"

Related

Pig it method: trouble checking punctuation at the end (Java)

I am trying to answer this question.
Move the first letter of each word to the end of it, then add "ay" to the end of the word. Leave punctuation marks untouched.
This is what I did so far:
public static String pigIt(String str) {
//Populating the String argument into the String Array after splitting them by spaces
String[] strArray = str.split(" ");
System.out.println("\nPrinting strArray: " + Arrays.toString(strArray));
String toReturn = "";
for (int i = 0; i < strArray.length; i++) {
String word = strArray[i];
for (int j = 1; j < word.length(); j++) {
toReturn += Character.toString(word.charAt(j));
}
//Outside of inner for loop
if (!(word.contains("',.!?:;")) && (i != strArray.length - 1)) {
toReturn += Character.toString(word.charAt(0)) + "ay" + " ";
} else if (word.contains("',.!?:;")) {
toReturn += Character.toString(word.charAt(0)) + "ay" + " " + strArray[strArray.length - 1];
}
}
return toReturn;
}
It is supposed to return the punctuation mark without adding "ay" to it. I think I am overthinking this, but please help.

One of the problems here is that your else if branch is never reached. The .contains method does not treat its argument as a set of characters; it looks for the whole argument as one substring. Your conditions are essentially asking whether the word contains the entire string "',.!?:;". If you keep only the exclamation point in there, the branch will be invoked for words that contain "!". I don't know how else you can use contains besides making a condition for each character, like word.contains("!") || word.contains(",") || word.contains("'"), etc. You can also use a regex for this problem.
Alternatively, you can use something like
if (!Character.isAlphabetic(yourString.charAt(i))) {
    // the character is a symbol, a digit, or a punctuation mark
}
to determine whether a character is not an alphabetical one.
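The per-character check described above can be wrapped in a small helper so that the condition does not have to be repeated for every punctuation mark. This is only a sketch: the names PunctuationCheck, containsAny and PUNCTUATION are assumptions and do not appear in the original code.

```java
class PunctuationCheck {
    // The characters the question treats as punctuation.
    static final String PUNCTUATION = "',.!?:;";

    // Returns true if word contains at least one of the characters in chars.
    static boolean containsAny(String word, String chars) {
        for (int i = 0; i < word.length(); i++) {
            if (chars.indexOf(word.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAny("world!", PUNCTUATION)); // true
        System.out.println(containsAny("world", PUNCTUATION));  // false
        // contains() looks for the whole string as a substring, so this is false
        System.out.println("world!".contains(PUNCTUATION));     // false
    }
}
```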

I think the best way is not to rely on str.split("\\s++"), because you could have punctuation in any place. It is better to scan the string and find every character that is not a letter or a digit. Those characters give you the word boundaries, and each word between them can then be translated.
public static String pigIt(String str) {
StringBuilder buf = new StringBuilder();
for (int i = 0, j = 0; j <= str.length(); j++) {
char ch = j < str.length() ? str.charAt(j) : '\0';
if (Character.isLetterOrDigit(ch))
continue;
if (i < j) {
buf.append(str.substring(i + 1, j));
buf.append(str.charAt(i));
buf.append("ay");
}
if (ch != '\0')
buf.append(ch);
i = j + 1;
}
return buf.toString();
}
Output:
System.out.println(pigIt(",Hello, !World")); // ,elloHay, !orldWay

Regex may be difficult to start with but is very powerful:
public static String pigIt(String str) {
return str.replaceAll("([a-zA-Z])([a-zA-Z]*)", "$2$1ay");
}
The parentheses define groups: the first group captures the first alphabetic character and the second group captures the remaining alphabetic characters.
In the replacement parameter you can refer to these groups with $1 and $2.
String.replaceAll finds every matching part of the string and applies the replacement. Non-matching characters, such as punctuation, are left untouched.
public static void main(String[] args) {
System.out.println("Hello, World, ! -->"+ pigIt("Hello, World, !"));
System.out.println("Hello?, Wo$, F, ! -->"+ pigIt("Hello?, Wo$, F, !"));
}
The output of this method is:
Hello, World, ! -->elloHay, orldWay, !
Hello?, Wo$, F, ! -->elloHay?, oWay$, Fay, !

split a string when there is a change in character without a regular expression

There is a way to split a string into runs of repeating characters using a regex, but I want to do it without one.
For example, given a string like "EE B", my output should be an array of strings, e.g.
{"EE", " ", "B"}
My approach is:
Given a string, I first find the number of unique characters in it so that I know the size of the array. Then I convert the string to an array of characters and check whether the next character is the same as the current one. If it is, I append them together; if not, I begin a new string.
My code so far:
String myinput = "EE B";
char[] cinput = new char[myinput.length()];
cinput = myinput.toCharArray(); //turn string to array of characters
int uniquecha = myinput.length();
for (int i = 0; i < cinput.length; i++) {
if (i != myinput.indexOf(cinput[i])) {
uniquecha--;
} //this should give me the number of unique characters
String[] returninput = new String[uniquecha];
Arrays.fill(returninput, "");
for (int i = 0; i < uniquecha; i++) {
returninput[i] = "" + myinput.charAt(i);
for (int j = 0; j < myinput.length - 1; j++) {
if (myinput.charAt(j) == myinput.charAt(j + 1)) {
returninput[j] += myinput.charAt(j + 1);
} else {
break;
}
}
} return returninput;
But there is something wrong with the second part, and I can't figure out why it does not begin a new string when the character changes.

Your question says that you don't want to use a regex, but I see no reason for that requirement, other than that this is maybe homework. If you are open to using a regex here, then there is a one-line solution which splits your input string on the following pattern:
(?<=\S)(?=\s)|(?<=\s)(?=\S)
This pattern uses lookarounds to split wherever what precedes is a non-whitespace character and what follows is a whitespace character, or vice versa.
String input = "EE B";
String[] parts = input.split("(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S)");
System.out.println(Arrays.toString(parts));
[EE, , B]
^^ a single space character in the middle
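Note that the pattern above splits on whitespace boundaries, which is enough for "EE B" but would keep a run such as "BCD" in one piece. If the goal is to split on every change of character, a pattern with a backreference might work instead. The following is a sketch that relies on Java's regex engine accepting a capturing group inside a lookbehind; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitOnChange {
    static String[] splitOnChange(String input) {
        // (?<=(.)) captures the previous character in group 1;
        // (?!\1) requires that the next character differs from it
        // (or that the end of the input has been reached).
        return input.split("(?<=(.))(?!\\1)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChange("EE B")));
        // prints: [EE,  , B]
    }
}
```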

If I understood correctly, you want to split a string so that identical consecutive characters stay together. If that's the case, here is how I would do it:
public static ArrayList<String> splitString(String str)
{
ArrayList<String> output = new ArrayList<>();
String combo = "";
//iterates through all the characters in the input
for(char c: str.toCharArray()) {
//check if the current char is equal to the last added char
if(combo.length() > 0 && c != combo.charAt(combo.length() - 1)) {
output.add(combo);
combo = "";
}
combo += c;
}
output.add(combo); //adds the last group of characters
return output;
}
Note that instead of using an array (has a fixed size) to store the output, I used an ArrayList, which has a variable size. Also, instead of checking the next character for equality with the current one, I preferred to use the last character for that. The variable combo is used to temporarily store the characters before they go to output.
Now, here is one way to print the result following your guidelines:
public static void main(String[] args)
{
String input = "EEEE BCD DdA";
ArrayList<String> output = splitString(input);
System.out.print("[");
for(int i = 0; i < output.size(); i++) {
System.out.print("\"" + output.get(i) + "\"");
if(i != output.size()-1)
System.out.print(", ");
}
System.out.println("]");
}
The output when running the above code will be:
["EEEE", " ", "B", "C", "D", " ", "D", "d", "A"]
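Since the question asked for an array of strings, here is a possible variant that tracks the start index of the current run and cuts substrings, returning a String[] at the end. It is a sketch under the same reading of the problem (identical consecutive characters stay together); the names RunSplitter and splitRuns are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunSplitter {
    static String[] splitRuns(String str) {
        List<String> runs = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 1; i <= str.length(); i++) {
            // close the run at the end of the input or when the character changes
            if (i == str.length() || str.charAt(i) != str.charAt(start)) {
                runs.add(str.substring(start, i));
                start = i;
            }
        }
        return runs.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRuns("EE B")));
        // prints: [EE,  , B]
    }
}
```

An empty input yields an empty array here, because the loop body never runs.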

How to replace the nth occurrence of a character in a String?

I need to remove all commas after the 5th one. So if a String contains 10 commas, I want to keep only the first 5 and remove all subsequent commas.
How can I do this?
String sentence = "Test,test,test,test,test,test,test,test";
String newSentence = sentence.replaceAll(",[6]","");

Capture everything from the start up to and including the 5th comma, and match all the remaining commas using the alternation operator |. So the , after | matches each of the remaining commas. Replacing every match with $1 gives you the desired output: the captured prefix is put back unchanged, and each later comma is replaced by an empty $1.
sentence.replaceAll("^((?:[^,]*,){5})|,", "$1");
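To make the behaviour of that one-liner easier to see, here is a small runnable sketch around it. The class name CommaTrimmer and the method name keepFirstFiveCommas are assumptions for illustration; the pattern itself is the one from the answer.

```java
class CommaTrimmer {
    static String keepFirstFiveCommas(String sentence) {
        // First alternative: the prefix up to and including the 5th comma,
        // captured in group 1 and written back by $1.
        // Second alternative: any later comma; group 1 did not take part in
        // that match, so $1 expands to nothing and the comma disappears.
        return sentence.replaceAll("^((?:[^,]*,){5})|,", "$1");
    }

    public static void main(String[] args) {
        String sentence = "Test,test,test,test,test,test,test,test";
        System.out.println(keepFirstFiveCommas(sentence));
        // prints: Test,test,test,test,test,testtesttest
    }
}
```

One caveat worth checking for your data: if the input has fewer than five commas, the first alternative never matches, so the second alternative removes every comma.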

In case you were wondering how to solve this problem without using regular expressions... There are libraries that could make your life easier but here is the first thought that came to mind.
public String replaceSpecificCharAfter( String input, char find, int deleteAfter){
char[] inputArray = input.toCharArray();
String output = "";
int count = 0;
for(int i=0; i <inputArray.length; i++){
char letter = inputArray[i];
if(letter == find){
count++;
if (count <= deleteAfter){
output += letter;
}
}
else{
output += letter;
}
}
return output;
}
Then you would invoke the function like so:
String sentence = "Test,test,test,test,test,test,test,test";
String newSentence = replaceSpecificCharAfter(sentence, ',', 5); // keep only the first 5 commas

Copy the first N words in a string in java

I want to select the first N words of a text string.
I have tried split() and substring() to no avail.
What I want is to select the first 3 words of the following sentence and copy them to another variable.
For example, if I have a string:
String greeting = "Hello this is just an example";
I want to get the first 3 words into the variable Z, so that
Z = "Hello this is"

String myString = "Copying first N numbers of words to a string";
// split on whitespace: arr[0] -> "Copying", arr[1] -> "first", ...
String[] arr = myString.split("\\s+");
int N = 3; // number of words that you need
String nWords = "";
// concatenate the first N words (or all of them if there are fewer)
for (int i = 0; i < N && i < arr.length; i++) {
    if (i > 0) {
        nWords = nWords + " "; // separator only between words
    }
    nWords = nWords + arr[i];
}
System.out.println(nWords); // Copying first N
NOTE: the .split() function returns an array of strings computed by splitting the given string around matches of the given regular expression.
So if I write the code as follows:
String myString = "1234M567M98723651";
String[] arr = myString.split("M"); // idea: split the string wherever 'M' appears
then the results 1234, 567 and 98723651 are stored in the array: the first split value goes to arr[0], the second to arr[1], and so on.
The later part of the code concatenates the required number of split words.
Hope that you can get an idea from this!
Thank you!

public String getFirstNStrings(String str, int n) {
String[] sArr = str.split(" ");
String firstStrs = "";
for(int i = 0; i < n; i++)
firstStrs += sArr[i] + " ";
return firstStrs.trim();
}
Now getFirstNStrings("Hello this is just an example", 3); will output:
Hello this is

You could try something like:
String greeting = "Hello this is just an example";
int end = 0;
for (int i=0; i<3; i++) {
end = greeting.indexOf(' ', end) + 1;
}
String Z = greeting.substring(0, end - 1);
N.B. This assumes there are at least three space characters in your source string. With any fewer, this code will either throw a StringIndexOutOfBoundsException or return the wrong substring.
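The same indexOf idea can be made tolerant of short inputs by stopping as soon as no further space is found. This is a sketch, still assuming that words are separated by exactly one regular space; the names FirstWords and firstWords are hypothetical.

```java
class FirstWords {
    static String firstWords(String text, int n) {
        if (n <= 0) {
            return "";
        }
        int end = -1; // index of the most recently found space
        for (int i = 0; i < n; i++) {
            end = text.indexOf(' ', end + 1);
            if (end < 0) {
                // fewer than n spaces: the text has at most n words
                return text;
            }
        }
        // end is the index of the nth space, which is excluded
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(firstWords("Hello this is just an example", 3));
        // prints: Hello this is
    }
}
```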

Add this in a utility class, such as Util.java
public static String getFirstNWords(String s, int n) {
if (s == null) return null;
String [] sArr = s.split("\\s+");
if (n >= sArr.length)
return s;
String firstN = "";
for (int i=0; i<n-1; i++) {
firstN += sArr[i] + " ";
}
firstN += sArr[n-1];
return firstN;
}
Usage:
Util.getFirstNWords("This will give you the first N words", 3);
---->
"This will give"

If you use Apache Commons Lang3, you can make it a little shorter like this:
public String firstNWords(String input, int numOfWords) {
String[] tokens = input.split(" ");
tokens = ArrayUtils.subarray(tokens, 0, numOfWords);
return StringUtils.join(tokens, ' ');
}

Most of the answers already posted use regular expressions, which can become an overhead if we have to process a large number of strings. Even str.split(" ") takes a regular expression as its argument (although the JDK has a fast path for plain single-character separators like this one). dave's answer is perhaps the most efficient, but it does not correctly handle strings that have multiple spaces occurring together, besides assuming that a regular space is the only word separator and that the input string has 3 or more words (an assumption he has already called out). If using Apache Commons is an option, then I would use the following code, as it is not only concise and avoids regular expressions even internally, but also gracefully handles input strings that have fewer than 3 words:
/* Splits by whitespace characters. All characters after the 3rd whitespace,
* if present in the input string, go into the 4th "word", which could really
* be a concatenation of multiple words. For the example in the question, the
* 4th "word" in the result array would be "just an example". Invoking the
* utility method with max-splits specified is slightly more efficient as it
* avoids the need to look for and split by space after the first 3 words have
* been extracted.
*/
String[] words = StringUtils.split(greeting, null, 4);
String Z = StringUtils.join((String[]) ArrayUtils.subarray(words, 0, 3), ' ');

Delete next two characters in string with indexOf and map

I have a problem with an algorithm.
I have a Map (each int key is the hex code of a Unicode character) and a String made of Unicode characters.
I want to delete the next character in the string whenever I find a character that exists as a key in my map.
For example, my map contains these keys: 0x111, 0x333, 0x444, 0x555, 0x666 and my string is:
0x111+0xffff+0x444+0xEEEEE+0x666
I want to convert it to:
0x111+0x444+0x666
I have this, but it doesn't work:
private String cleanFlags(String text) {
int textLong = text.length();
for (int i = 0; i < textLong; i++) {
if (flagCountryEmojis.containsKey(text.codePointAt(text.charAt(i)))) {
text = text.replace(text.substring(i + 1, i + 2), "");
textLong-=2;
}
}
return text;
}
How can I do this?

Since you didn't mention anything about space complexity, I took the liberty of using an array to solve the question:
// requires: import java.util.ArrayList; import java.util.List;
public String cleanFlags(String text) {
    // "+" is a regex metacharacter, so it must be escaped for split()
    String[] arr = text.split("\\+");
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < arr.length; i++) {
        kept.add(arr[i]);
        // assumes the map keys are the same string tokens, e.g. "0x111"
        if (flagCountryEmojis.containsKey(arr[i])) {
            i++; // skips the next token
        }
    }
    return String.join("+", kept);
}
Not sure whether this solves your problem. Since strings are immutable anyway, and calling "replace" simply creates a new string in the background, I went ahead and built a new string and returned it once it is populated correctly.
Let me know if there is something I am missing or if there are other restrictions that were not mentioned.
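The question describes int keys that are Unicode code points, whereas the code above works on "+"-separated text tokens. If the string really contains the characters themselves, walking it by code point may be closer to what was asked. The following is a sketch under that assumption: the class name FlagCleaner and the FLAGS map are stand-ins for the asker's flagCountryEmojis, and only the keys matter.

```java
import java.util.HashMap;
import java.util.Map;

class FlagCleaner {
    // Hypothetical stand-in for the asker's map, keyed by code point.
    static final Map<Integer, String> FLAGS = new HashMap<>();
    static {
        for (int key : new int[] {0x111, 0x333, 0x444, 0x555, 0x666}) {
            FLAGS.put(key, "flag");
        }
    }

    static String cleanFlags(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);   // read one full code point
            out.appendCodePoint(cp);
            i += Character.charCount(cp);   // 1 or 2 chars, depending on the code point
            if (FLAGS.containsKey(cp) && i < text.length()) {
                // skip the code point that follows a key
                i += Character.charCount(text.codePointAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = new StringBuilder()
                .appendCodePoint(0x111).appendCodePoint(0xFFFF)
                .appendCodePoint(0x444).appendCodePoint(0xEEEEE)
                .appendCodePoint(0x666).toString();
        cleanFlags(input).codePoints()
                .forEach(cp -> System.out.println("0x" + Integer.toHexString(cp)));
        // prints 0x111, 0x444 and 0x666 on separate lines
    }
}
```

Building a new string avoids the index drift that comes from calling replace on the text while iterating over it, and Character.charCount keeps supplementary characters (two chars each) intact.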
