Detect and replace unpaired markdown symbols, not changing paired symbols - java

I have this
Hello \**how are you\**? I'm fine*
need to get this
Hello *how are you*? I'm fine\*
I can get
Hello *how are you*? I'm fine*
but then I'm lost, since s.replace("*", "\\*") is not an option.
Basically, the problem is that the matching (paired) \** need to be replaced with *, while an unpaired * needs to be escaped.

The basic idea is to split the string into words, then find out which words have an unpaired '*'.
String text1 = "*This* is **my** text*";
String[] words = text1.split(" ");
for(int i=0; i<words.length; i++){
int count = words[i].length() - words[i].replace("*", "").length(); // count the number of '*'
if(count%2 != 0){ // if it's not paired we can replace with '\*'
words[i] = words[i].replace("*", "\\*");
}
}
System.out.println(String.join(" ", words));
Which prints out: *This* is **my** text\*

Someone helped me out with this:
String text1 = "*This is **my** text* ";
System.out.println(text1.replaceAll("(?<=[^*\\\\]|\\A)[*](?=[^*\\\\]|\\z)", "\\\\*"));
which prints \*This is **my** text\*
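Both answers above decide per word or per character. As a rough alternative sketch, one could scan the whole string once, count the asterisks that are not already escaped, and escape only the last one when the count is odd. This assumes pairs are formed left to right, so the leftover asterisk is always the last one; the class and method names are made up for illustration.

```java
public class UnpairedAsterisk {
    // Escapes the last unescaped '*' when the total number of unescaped '*' is odd.
    // Assumption: asterisks pair up left to right, so the leftover one is the last.
    static String escapeLastUnpaired(String s) {
        int count = 0;
        int last = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the character that is already escaped
                continue;
            }
            if (c == '*') {
                count++;
                last = i;
            }
        }
        if (count % 2 == 0) {
            return s; // every asterisk has a partner
        }
        return s.substring(0, last) + "\\" + s.substring(last);
    }

    public static void main(String[] args) {
        System.out.println(escapeLastUnpaired("Hello *how are you*? I'm fine*"));
        // prints: Hello *how are you*? I'm fine\*
    }
}
```

This does not try to understand markdown nesting; it only balances the count.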

Related

How to replace a specific string in Java?

I usually don't ask for help but here I really need it.
I have the following code example:
String text = "aa aab aa aab";
text = text.replace("aa", "--");
System.out.println(text);
Console output: -- --b -- --b
My question is: how do I replace only the standalone aa parts of the string, leaving aab untouched?
So the console output is:
-- aab -- aab
I have another example:
String text = "111111111 1";
text = text.replace("1", "-");
System.out.println(text);
Console output: --------- -
I only want to replace a standalone character, not the identical ones that are grouped together.
So the console output is:
111111111 -
Are there any Java shortcuts for situations like these? I can't figure it out, how to only replace specific part of the string. Any help would be appreciated :)
You could use a regular expression with String.replaceAll(String, String). By using word boundaries (\b), something like
String[] texts = { "aa aab aa aab", "111111111 1" };
String[] toReplace = { "aa", "1" };
String[] toReplaceWith = { "--", "-" };
for (int i = 0; i < texts.length; i++) {
String text = texts[i];
text = text.replaceAll("\\b" + toReplace[i] + "\\b", toReplaceWith[i]);
System.out.println(text);
}
Outputs (as requested)
-- aab -- aab
111111111 -
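Because the search term is concatenated straight into the regex, any regex metacharacter in it would be interpreted. A minimal sketch of the same word-boundary idea with the term and replacement treated literally, using Pattern.quote and Matcher.quoteReplacement (the helper name replaceWholeWord is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplace {
    // Replaces only occurrences of 'word' that are delimited by word boundaries.
    static String replaceWholeWord(String text, String word, String replacement) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWholeWord("aa aab aa aab", "aa", "--")); // -- aab -- aab
        System.out.println(replaceWholeWord("111111111 1", "1", "-"));     // 111111111 -
    }
}
```

Note that \b only helps when the term starts and ends with a word character.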
You can use a regex
String text = "111111111 1";
text = text.replaceAll("1(?=[^1]*$)", "-");
System.out.println(text);
Explanation:
String.replaceAll takes a regex, unlike String.replace, which takes a literal to replace
(?=reg) is a lookahead: the part of the regex to its left must be followed by a string matching the regex reg, but only the left part is consumed by the match
[^1]* means a sequence of zero or more characters different from '1'
$ means the end of the string is reached
In plain English, this means: replace with '-' every occurrence of the '1' character that is followed only by characters different from '1' until the end of the string, i.e. the last '1'.
We can use the StringTokenizer class from java.util to achieve this for space-separated input. Below is a sample solution:
import java.util.StringTokenizer;

public class StringTokenizerExample {
    /**
     * @param args unused
     */
    public static void main(String[] args) {
        String input = "aa aab aa aab";
        String replaceWord = "aa";
        String replaceWith = "--";
        StringBuilder output = new StringBuilder();
        StringTokenizer st = new StringTokenizer(input, " ");
        System.out.println("Before Replace: " + input);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            // replace only tokens that equal the search word exactly
            if (word.equals(replaceWord)) {
                word = replaceWith;
            }
            output.append(word);
            // put a single space back between tokens
            if (st.hasMoreTokens()) {
                output.append(" ");
            }
        }
        System.out.println("After Replace: " + output);
    }
}

Java Regex : How to search a text or a phrase in a large text

I have a large text file and I need to search a word or a phrase in the file line by line and output the line with the text found in it.
For example, the sample text is
And the earth was without form,
Where [art] thou?
If the user searches for the word thou, the only line displayed should be
Where [art] thou?
and if the user searches for the earth, the first line should be displayed.
I tried using the contains function, but it also displays the line containing without when searching only for thou.
This is my sample code :
String[] verseList = TextIO.readFile("pentateuch.txt");
Scanner kbd = new Scanner(System.in);
int counter = 0;
for (int i = 0; i < verseList.length; i++) {
String[] data = verseList[i].split("\t");
String[] info3 = data[3].split(" ");
System.out.print("Search for: ");
String txtSearch = kbd.nextLine();
LinkedList<String> searchedList = new LinkedList<String>();
for (String bible : verseList){
if (bible.contains(txtSearch)){
searchedList.add(bible);
counter++;
}
}
if (searchedList.size() > 0){
for (String s : searchedList){
String[] searchedData = s.split("\t");
System.out.printf("%s - %s - %s - %s \n",searchedData[0], searchedData[1], searchedData[2], searchedData[3]);
}
}
System.out.print("Total: " + counter);
So I am thinking of using regex but I don't know how.
Can anyone help? Thank you.
Since sometimes variables have non-word characters at boundary positions, you cannot rely on \b word boundary.
In such cases, it is safer to use look-arounds (?<!\w) and (?!\w), i.e. in Java, something like:
"(?<!\\w)" + searchedData[n] + "(?!\\w)"
To match a String that contains a word, use this code:
String txtSearch; // eg "thou"
if (str.matches(".*?\\b" + txtSearch + "\\b.*"))
// it matches
This code builds a regex that only matches if both ends of txtSearch fall at the start/end of a word in the string, by using \b, which means "word boundary".
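Putting the two answers together, a small sketch of a line filter might look like the following. It uses the look-arounds (?<!\w) and (?!\w) plus Pattern.quote so that user input is treated literally, and Matcher.find() so no leading or trailing .* is needed. The class and method names are invented for the example, and the lines are passed in as a list rather than read from pentateuch.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WholeWordSearch {
    // Returns the lines that contain 'phrase' as a whole word or phrase.
    static List<String> linesContaining(List<String> lines, String phrase) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(phrase) + "(?!\\w)");
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> verses = Arrays.asList(
                "And the earth was without form,",
                "Where [art] thou?");
        System.out.println(linesContaining(verses, "thou"));      // [Where [art] thou?]
        System.out.println(linesContaining(verses, "the earth")); // [And the earth was without form,]
    }
}
```

Compiling the pattern once outside the loop avoids rebuilding it for every line of a large file.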

StringIndexOutOfBoundsException when using delimiter

I want to split a string into multiple parts based on parentheses. So if I have the following string:
In fair (*NAME OF A CITY), where we lay our (*NOUN),
The string should be split as:
In fair
*NAME OF A CITY
, where we lay our
*NOUN
I set up a delimiter like so:
String delim = "[()]";
String [] inputWords = line.split (delim);
Because the strings in all caps with an * at the beginning are going to be replaced with user input, I set up a loop like so:
while (input.hasNextLine())
{
line = input.nextLine();
String [] inputWords = line.split (delim);
for (int i = 0; i < inputWords.length; i++)
{
if (inputWords[i].charAt(0) != '*')
{
newLine.append (inputWords[i]);
}
else
{
String userWord = JOptionPane.showInputDialog (null, inputWords[i].substring (1, inputWords[i].length()));
newLine.append (userWord);
}
}
output.println (newLine.toString());
output.flush();
newLine.delete (0, line.length());
}
Looks like I'm getting an error with this if statement:
if (inputWords[i].charAt(0) != '*')
When I run it, I get a StringIndexOutOfBoundsException: String index out of range: 0. Not sure why that's happening. Any advice? Thank you!
Apparently line = input.nextLine(); gives you a blank string, as @Marco already mentioned.
Handle empty line(s) before processing further.
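One way to handle this is to skip empty pieces before calling charAt(0). Besides blank lines, split("[()]") also yields an empty first element when a line starts with a parenthesis. The sketch below is only an illustration: the class name is made up, and a UnaryOperator stands in for the JOptionPane dialog so the logic can run without a GUI.

```java
import java.util.function.UnaryOperator;

public class PlaceholderFill {
    // Replaces each "(*PROMPT)" section with whatever 'ask' returns for PROMPT.
    static String fill(String line, UnaryOperator<String> ask) {
        StringBuilder out = new StringBuilder();
        for (String part : line.split("[()]")) {
            if (part.isEmpty()) {
                continue; // blank line, or a line that starts with '('
            }
            if (part.charAt(0) == '*') {
                out.append(ask.apply(part.substring(1)));
            } else {
                out.append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("In fair (*NAME OF A CITY), where", prompt -> "Verona"));
        // prints: In fair Verona, where
    }
}
```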

Inserting Newline character before every number occurring in a string?

I have String of format something like this
String VIA = "1.NEW DELHI 2. Lucknow 3. Agra";
I want to insert a newline character before every digit that is followed by a dot, so that the final string looks like this
String VIA = "1.NEW DELHI " +"\n"+"2. Lucknow " +"\n"+"3. Agra";
How can I do it? I read about StringBuilder and String.split, but now I am confused.
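Something like:
StringBuilder builder = new StringBuilder();
// split before each number followed by a dot; the lookahead keeps the number in its piece
String[] splits = VIA.split("(?<!\\d)(?=\\d+\\.)");
for(String split : splits){
builder.append(split).append("\n");
}
String output = builder.toString().trim();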
The safest way to do that is to loop over the string, check each char with Character.isDigit(), and add a '\n' before appending a digit to the result String. Please note, I am not sure if you want to put a '\n' before the first digit.
String temp = "";
for(int i=0; i<VIA.length(); i++) {
if(Character.isDigit(VIA.charAt(i))) {
temp += "\n" + VIA.charAt(i);
} else {
temp += VIA.charAt(i);
}
}
VIA = temp;
// just use i=1 here if you want to skip the first character, or better, do a boolean check for the first digit.
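For comparison, the same effect can probably be had with a single replaceAll and zero-width look-arounds. In this sketch (class and method names are hypothetical) a newline is inserted at every position that is preceded by a non-digit and followed by digits and a dot, which leaves the first number alone and keeps multi-digit numbers together.

```java
public class NewlineBeforeNumbers {
    // Inserts '\n' before each "<digits>." that is not at the very start of the string.
    static String breakBeforeNumbers(String s) {
        // (?<=\D) : the previous character exists and is not a digit
        // (?=\d+\.) : what follows is one or more digits and a dot
        return s.replaceAll("(?<=\\D)(?=\\d+\\.)", "\n");
    }

    public static void main(String[] args) {
        String via = "1.NEW DELHI 2. Lucknow 3. Agra";
        System.out.println(breakBeforeNumbers(via));
    }
}
```

The spaces before each number are kept; add a trim per line if they are unwanted.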

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

Looking for a quick, simple way in Java to change this string
"  hello     there   "
to something that looks like this
"hello there"
where I replace all those multiple spaces with a single space, except I also want the one or more spaces at the beginning of string to be gone.
Something like this gets me partly there
String mytext = "  hello     there   ";
mytext = mytext.replaceAll("( )+", " ");
but not quite.
Try this:
String after = before.trim().replaceAll(" +", " ");
See also
String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
regular-expressions.info/Repetition
No trim() regex
It's also possible to do this with just one replaceAll, but this is much less readable than the trim() solution. Nonetheless, it's provided here just to show what regex can do:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$|( )+", "$1")
);
}
There are 3 alternates:
^_+ : any sequence of spaces at the beginning of the string
Match and replace with $1, which captures the empty string
_+$ : any sequence of spaces at the end of the string
Match and replace with $1, which captures the empty string
(_)+ : any sequence of spaces that matches none of the above, meaning it's in the middle
Match and replace with $1, which captures a single space
See also
regular-expressions.info/Anchors
You just need a:
replaceAll("\\s{2,}", " ").trim();
where you match two or more whitespace characters and replace them with a single space, and then trim whitespace at the beginning and end (you could actually invert the order by trimming first and then matching, to make the regex quicker, as someone pointed out).
To test this out quickly try:
System.out.println(new String(" hello there ").trim().replaceAll("\\s{2,}", " "));
and it will return:
"hello there"
Use the Apache commons StringUtils.normalizeSpace(String str) method. See docs here
This worked perfectly for me : sValue = sValue.trim().replaceAll("\\s+", " ");
The trim() method removes the leading and trailing spaces, and the replaceAll("regex", "string to replace") method with the regex "\s+" matches one or more whitespace characters and replaces them with a single space
myText = myText.trim().replaceAll("\\s+"," ");
The following code will compact any whitespace between words and remove any at the string's beginning and end
String input = "\n\n\n a string with many spaces, \n"+
" a \t tab and a newline\n\n";
String output = input.trim().replaceAll("\\s+", " ");
System.out.println(output);
This will output a string with many spaces, a tab and a newline
Note that any non-printable characters including spaces, tabs and newlines will be compacted or removed
For more information see the respective documentation:
String#trim() method
String#replaceAll(String regex, String replacement) method
For information about Java's regular expression implementation see the documentation of the Pattern class
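The answers in this thread alternate between " +" and "\\s+", and the two are not equivalent. A short sketch of the difference (the class and method names are invented for illustration): " +" collapses only the space character, while "\\s+" also collapses tabs and newlines.

```java
public class SpaceVsWhitespace {
    // Collapses runs of the space character only.
    static String collapseSpaces(String s) {
        return s.trim().replaceAll(" +", " ");
    }

    // Collapses runs of any whitespace (spaces, tabs, newlines, ...).
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "  a \t b\n\nc  ";
        System.out.println(collapseSpaces(input).equals("a \t b\n\nc")); // true: tab and newlines survive
        System.out.println(collapseWhitespace(input));                   // a b c
    }
}
```

Which one is right depends on whether line breaks in the input are meaningful.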
"[ ]{2,}"
This will match more than one space.
String mytext = " hello there ";
//without trim -> " hello there"
//with trim -> "hello there"
mytext = mytext.trim().replaceAll("[ ]{2,}", " ");
System.out.println(mytext);
OUTPUT:
hello there
To eliminate spaces at the beginning and at the end of the String, use String#trim() method. And then use your mytext.replaceAll("( )+", " ").
You can first use String.trim(), and then apply the regex replace command on the result.
Try this one.
Sample Code
String str = " hello there ";
System.out.println(str.replaceAll("( +)"," ").trim());
OUTPUT
hello there
First it replaces every run of spaces with a single space. Because runs of spaces at the start and end of the String are also reduced to a single space, we then need to trim the String. Then you get your desired String.
String blogName = "how to do in java . com";
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ");
trim()
Removes only the leading & trailing spaces.
From Java Doc,
"Returns a string whose value is this string, with any leading and trailing whitespace removed."
System.out.println(" D ev Dum my ".trim());
"D ev Dum my"
replace(), replaceAll()
Removes all the spaces in the string,
System.out.println(" D ev Dum my ".replace(" ",""));
System.out.println(" D ev Dum my ".replaceAll(" ",""));
System.out.println(" D ev Dum my ".replaceAll("\\s+",""));
Output:
"DevDummy"
"DevDummy"
"DevDummy"
Note: "\s+" is a regular expression that matches one or more whitespace characters.
Reference : https://www.codedjava.com/2018/06/replace-all-spaces-in-string-trim.html
In Kotlin it would look like this
val input = "\n\n\n a string with many spaces, \n"
val cleanedInput = input.trim().replace(Regex("(\\s)+"), " ")
A lot of correct answers have been provided so far, and I see a lot of upvotes. However, the approaches mentioned work but are not really optimized or readable.
I recently came across a solution that every developer will like (StringUtils comes from Apache Commons Lang).
String nameWithProperSpacing = StringUtils.normalizeSpace( stringWithLotOfSpaces );
You are done.
This is a readable solution.
You could use lookarounds also.
test.replaceAll("^ +| +$|(?<= ) ", "");
OR
test.replaceAll("^ +| +$| (?= )", "")
<space>(?= ) matches a space character which is followed by another space character. So in consecutive spaces, it would match all the spaces except the last, because the last one isn't followed by a space character. This leaves you with a single space in place of the consecutive spaces after the removal operation.
Example:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$| (?= )", "")
);
}
See String.replaceAll.
Use the regex "\\s+" and replace with " ".
Then use String.trim.
String str = " hello world";
reduce spaces first
str = str.trim().replaceAll(" +", " ");
capitalize the first letter and lowercase everything else
str = str.substring(0,1).toUpperCase() +str.substring(1,str.length()).toLowerCase();
you should do it like this
String mytext = " hello there ";
mytext = mytext.replaceAll("( +)", " ");
put + inside round brackets.
String str = " this is string ";
str = str.replaceAll("\\s+", " ").trim();
This worked for me
scan= filter(scan, " [\\s]+", " ");
scan= scan.trim();
where filter is following function and scan is the input string:
public String filter(String scan, String regex, String replace) {
StringBuffer sb = new StringBuffer();
Pattern pt = Pattern.compile(regex);
Matcher m = pt.matcher(scan);
while (m.find()) {
m.appendReplacement(sb, replace);
}
m.appendTail(sb);
return sb.toString();
}
The simplest method for removing white space anywhere in the string.
public String removeWhiteSpaces(String returnString){
returnString = returnString.trim().replaceAll("^ +| +$|( )+", " ");
return returnString;
}
check this...
public static void main(String[] args) {
String s = "A B C D E F G\tH I\rJ\nK\tL";
System.out.println("Current : "+s);
System.out.println("Single Space : "+singleSpace(s));
System.out.println("Space count : "+spaceCount(s));
System.out.format("Replace all = %s", s.replaceAll("\\s+", ""));
// Example where it is used the most.
String s1 = "My name is yashwanth . M";
String s2 = "My nameis yashwanth.M";
System.out.println("Normal : "+s1.equals(s2));
System.out.println("Replace : "+s1.replaceAll("\\s+", "").equals(s2.replaceAll("\\s+", "")));
}
If the String contains only single spaces, then the replacement will not change them.
If there is more than one space in a row, the replacement is performed and the spaces are removed.
public static String singleSpace(String str){
return str.replaceAll(" +| +|\t|\r|\n","");
}
To count the number of spaces in a String.
public static String spaceCount(String str){
int i = 0;
while(str.indexOf(" ") > -1){
//str = str.replaceFirst(" ", ""+(i++));
str = str.replaceFirst(Pattern.quote(" "), ""+(i++));
}
return str;
}
Pattern.quote("?") returns literal pattern String.
My method before I found the second answer using regex as a better solution. Maybe someone needs this code.
private String replaceMultipleSpacesFromString(String s){
if(s.length() == 0 ) return "";
int timesSpace = 0;
String res = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c == ' '){
timesSpace++;
if(timesSpace < 2)
res += c;
}else{
res += c;
timesSpace = 0;
}
}
return res.trim();
}
Stream version, filters spaces and tabs.
Stream.of(str.split("[ \\t]")).filter(s -> s.length() > 0).collect(Collectors.joining(" "))
I know replaceAll method is much easier but I wanted to post this as well.
public static String removeExtraSpace(String input) {
input= input.trim();
ArrayList <String> x= new ArrayList<>(Arrays.asList(input.split("")));
for(int i=0; i<x.size()-1;i++) {
if(x.get(i).equals(" ") && x.get(i+1).equals(" ")) {
x.remove(i);
i--;
}
}
String word="";
for(String each: x)
word+=each;
return word;
}
String myText = " Hello World ";
myText = myText.trim().replaceAll(" +(?= )", "");
// Output: "Hello World"
string.replaceAll("\\s+", " ");
If you already use Guava (v. 19+) in your project you may want to use this:
CharMatcher.whitespace().trimAndCollapseFrom(input, ' ');
or, if you need to remove exactly the SPACE character (U+0020), use:
CharMatcher.anyOf(" ").trimAndCollapseFrom(input, ' ');
public class RemoveExtraSpacesEfficient {
public static void main(String[] args) {
String s = "my name is mr space ";
char[] charArray = s.toCharArray();
char prev = s.charAt(0);
for (int i = 0; i < charArray.length; i++) {
char cur = charArray[i];
if (cur == ' ' && prev == ' ') {
} else {
System.out.print(cur);
}
prev = cur;
}
}
}
The above solution is the algorithm with the complexity of O(n) without using any java function.
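The class above prints as it goes and keeps a leading or trailing space. If the result is needed as a String with the ends trimmed too, a single-pass variant with a StringBuilder could look like this sketch (the names are hypothetical, and only the space character is handled, not tabs or newlines):

```java
public class CollapseSpaces {
    // One pass over the input: drops leading and trailing spaces and
    // reduces every inner run of spaces to a single space.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // remember the space only if something was already written
                pendingSpace = sb.length() > 0;
            } else {
                if (pendingSpace) {
                    sb.append(' ');
                    pendingSpace = false;
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  my   name is  mr space  ") + "]");
        // prints: [my name is mr space]
    }
}
```

Deferring the space until the next non-space character is what removes the trailing run without a separate trim.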
Please use below code
package com.myjava.string;
import java.util.StringTokenizer;
public class MyStrRemoveMultSpaces {
public static void main(String a[]){
String str = "String With Multiple Spaces";
StringTokenizer st = new StringTokenizer(str, " ");
StringBuffer sb = new StringBuffer();
while(st.hasMoreElements()){
sb.append(st.nextElement()).append(" ");
}
System.out.println(sb.toString().trim());
}
}
