How to prevent Html.fromHtml from trimming whitespace - java

In an Android Java project, I have a string like this one (although with varying amounts of whitespace on either side):
String foo = " foo bar ";
The whitespace on the two sides of the string is important, as the actual string contains indented code with HTML syntax highlighting.
When I pass the string through Html.fromHtml, this start and end whitespace is removed, but I need to keep the whitespace there:
Html.fromHtml(foo).toString() // "foo bar" - I want " foo bar "
How can I preserve the whitespace on either side of the string through the Html.fromHtml call?

Try using TextUtils.htmlEncode(str).
This method escapes the HTML special characters in the string.
https://developer.android.com/reference/android/text/TextUtils.html#htmlEncode(java.lang.String)

Yazan's suggestion was sufficient, but since you said that the string is generated dynamically, you can always take the newly generated string s, for example, and use the concat() method. For example: s = s.concat("&nbsp;"); (concat() returns a new string, so assign the result).

As Html.fromHtml() parses the string as HTML tags and content, you may want to use the encoded character for a space, which is &nbsp;
Try this in your code:
String foo = "&nbsp;foo bar&nbsp;";
Note: repeat &nbsp; once for each space you need to show.
Edit:
If you are getting your string from somewhere else, you can replace the spaces with &nbsp; before passing it:
String foo = getMyFoo();
foo = foo.replaceAll(" ", "&nbsp;");

For preserving leading whitespace, this Kotlin code appears to work, and it probably wouldn't be difficult to adapt it for trailing whitespace as well:
fun replaceWithNonBreakingAtStart(str: String) = "&nbsp;".repeat(str.takeWhile { it == ' ' }.length) + str.trimStart(' ')
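Replacing every space, as in the replaceAll() answer, would also rewrite spaces inside tags such as <font color=...>, which the syntax-highlighted code in the question presumably contains. Here is a minimal plain-Java sketch of a narrower variant, assuming only the leading and trailing runs of ordinary spaces need protecting (tabs are left alone); padWithNbsp and nbsp are hypothetical helper names. Html.fromHtml() itself is Android-only, so it appears only as a comment. Per the answers above, it should turn each &nbsp; into a non-breaking space (U+00A0) instead of trimming it, which also means toString() would contain U+00A0 characters rather than plain spaces.

```java
class NbspPadding {
    // Builds n copies of the HTML non-breaking-space entity.
    static String nbsp(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("&nbsp;");
        }
        return sb.toString();
    }

    // Hypothetical helper: converts only the leading and trailing runs of ' '
    // into "&nbsp;" and leaves everything else, including spaces inside
    // HTML tags in the middle of the string, untouched.
    static String padWithNbsp(String s) {
        int start = 0;
        while (start < s.length() && s.charAt(start) == ' ') {
            start++;
        }
        if (start == s.length()) {
            return nbsp(s.length()); // the string is nothing but spaces (or empty)
        }
        int end = s.length();
        while (s.charAt(end - 1) == ' ') { // stops at the non-space found above
            end--;
        }
        return nbsp(start) + s.substring(start, end) + nbsp(s.length() - end);
    }

    public static void main(String[] args) {
        String foo = "  <font color=\"red\">foo</font> bar ";
        // On Android this would then be passed on: Html.fromHtml(padWithNbsp(foo))
        System.out.println(padWithNbsp(foo));
    }
}
```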

Related

Recognizing a pattern for regular expressions [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Please help me use a Java regular expression for the scenario described below, specifically how to recognize a particular pattern to match:
I have an input string that may look something like this:
something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount >
{SomeProductSet3}.count + {SomeUSOC4}.amount
I need to replace everything in the {} with something like this
something + [abc].count+[xyz].count+[something].count + [xom].amount+
[ytkd].amount > [d].count
Basically, everything between "{..}" has its equivalent as a list of things that I put in later using "[..]".
I have the list of things for [..], but how can I "recognize" the '{...}' part? It is of variable length and uses variable character sets.
What would I use as a pattern if using regular expressions?
Thank you! Much appreciated.
In Java there are lots of ways to get the contents between brackets (or between any two specific characters, for that matter), but you want to use a regular expression, which can be rather slow for this sort of thing, especially when dealing with multiple bracket pairs within the supplied string.
The basic code to gather up the contents between curly brackets could be something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String myString = "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > \n" +
" {SomeProductSet3}.count + {SomeUSOC4}.amount";
Matcher match = Pattern.compile("\\{([^}]+)\\}").matcher(myString);
while(match.find()) {
System.out.println(match.group(1));
}
What the above regular expression means:
\\{   literal opening curly bracket {
(     start capture group
[     start a character class (match one of these characters)
^     negate the class: any character except the ones that follow
}     with the preceding ^, this means "every character except the closing curly bracket }"
]     end the character class
+     one or more characters from the [] class
)     end capture group
\\}   literal closing curly bracket }
If it were me, I would create a method to house this code so that it can also be used for other bracket types, such as parentheses (), square brackets [], curly brackets {} (as shown in the code), chevron brackets <>, or even any two supplied characters, like /.../, %...% or even A...A. See the example method below, which demonstrates this.
In the example code above, each substring found between a set of brackets would be handled inside the while loop. You will of course need a mechanism to determine which replacement goes with each detected substring: perhaps a multidimensional array, or even a custom dialog that displays each found substring and lets the user select its replacement from a combo box, with a "Do All" check box. There are several options for how to handle each substring found between brackets.
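One possible mechanism, sketched here under the assumption that the replacements live in a Map<String, String> keyed by the text between the braces (BraceReplacer and replaceBraced are hypothetical names): Matcher.appendReplacement()/appendTail() rebuild the string in a single pass, so there is no repeated String.replace() over the whole input, and names with no entry keep their original {...} text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceReplacer {
    // Same pattern as above: group 1 is the text between { and }.
    private static final Pattern BRACED = Pattern.compile("\\{([^}]+)\\}");

    static String replaceBraced(String input, Map<String, String> replacements) {
        Matcher m = BRACED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown names keep their original "{...}" text.
            String replacement = replacements.getOrDefault(m.group(1), m.group());
            // quoteReplacement() stops '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("SomeProductSet1", "[abc]");
        map.put("SomeOtherProductSet2", "[xyz]");
        System.out.println(replaceBraced(
                "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > {Other}.count", map));
    }
}
```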
Here is a method example which demonstrates what we've discussed here. It is well commented:
public String replaceBetween(String inputString, String openChar,
        String closeChar, String[][] replacements) {
    // If the supplied input String is null or empty
    // then return an empty String ("").
    if (inputString == null || inputString.isEmpty()) { return ""; }
    // Declare a string to hold the input string; this way
    // we can freely manipulate it without jeopardizing the
    // original input string.
    String inString = inputString;
    // Escape (with \) any RegEx special characters in both
    // the openChar and closeChar parameters, in case a
    // special RegEx character was supplied. We'll use
    // RegEx to do this.
    Pattern regExChars = Pattern.compile("[{}()\\[\\].+*?^$\\\\|]");
    String opnChar = regExChars.matcher(openChar).replaceAll("\\\\$0");
    String clsChar = regExChars.matcher(closeChar).replaceAll("\\\\$0");
    // Create our Pattern to find the items contained between
    // the characters that were supplied in the openChar and
    // closeChar parameters. The escaped closing character is
    // used inside the [^...] class too, so that "]" also works.
    Matcher m = Pattern.compile(opnChar + "([^" + clsChar + "]+)" + clsChar).matcher(inString);
    // Iterate through the located items...
    while (m.find()) {
        String found = m.group(1);
        // Let's see if the found item is contained within
        // our supplied 2D replacement items array...
        for (int i = 0; i < replacements.length; i++) {
            // Is an item found in the array?
            if (replacements[i][0].equals(found)) {
                // Yup... so let's replace the found item in our
                // input string with the related element in our
                // replacement array.
                inString = inString.replace(openChar + found + closeChar, replacements[i][1]);
                break; // no need to check the remaining entries
            }
        }
    }
    // Return the modified input string.
    return inString;
}
To use this method you might do this:
// Our 2D replacement array. In the first column we place
// the substrings we would like to find within our input
// string and in the second column we place what we want
// to replace the item in the first column with if it's
// found.
String[][] replacements = {{"SomeProductSet1", "[abc]"},
{"SomeOtherProductSet2", "[xyz]"},
{"SomeProductSet3", "[xom]"},
{"SomeUSOC4", "[ytkd]"}};
// The string we want to modify (the input string):
String example = "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > \n" +
" {SomeProductSet3}.count + {SomeUSOC4}.amount";
// Let's make the call and place the returned result
// into a new variable...
String newString = replaceBetween(example, "{", "}", replacements);
// Display the new string variable contents
// in Console.
System.out.println(newString);
The Console should display:
something + [abc].count + [xyz].amount >
[xom].count + [ytkd].amount
Notice how it also replaces the curly brackets? This appears to be one of your requirements, but the method can easily be modified to replace just the substring between the brackets. Perhaps you can modify it (if you like) to do this optionally and, as yet another optional feature, allow it to ignore letter case.
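The target output in the question goes one step further: each {Name}.property becomes several [..] items joined with +, each carrying the same .count or .amount suffix. A sketch of that expansion, assuming every {...} is immediately followed by .word and that the sets are held in a Map<String, List<String>>; SetExpander and expand are hypothetical names, and the mapping in main() is made up to mirror the example output in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SetExpander {
    // Group 1: the set name inside {}; group 2: the property after the dot.
    private static final Pattern TERM = Pattern.compile("\\{([^}]+)\\}\\.(\\w+)");

    static String expand(String input, Map<String, List<String>> sets) {
        Matcher m = TERM.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> items = sets.get(m.group(1));
            String property = m.group(2);
            String replacement = (items == null)
                    ? m.group() // unknown set: leave "{Name}.property" as it was
                    : items.stream()
                           .map(item -> item + "." + property)
                           .collect(Collectors.joining("+"));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> sets = new HashMap<>();
        sets.put("SomeProductSet1", Arrays.asList("[abc]", "[xyz]", "[something]"));
        sets.put("SomeOtherProductSet2", Arrays.asList("[xom]", "[ytkd]"));
        sets.put("SomeProductSet3", Arrays.asList("[d]"));
        System.out.println(expand(
                "something + {SomeProductSet1}.count + {SomeOtherProductSet2}.amount > {SomeProductSet3}.count",
                sets));
    }
}
```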

Regex: extract String from String

I need a regex that makes it possible to extract part of a String. I get this String by parsing an XML document with DOM. I then look for the "§regex" part in this String and try to extract its value, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.
The problem is, I don't know how to make sure the extracted part ends with a ")".
This regex needs to work for every value given. The goal is to write only the value in brackets after "§regex=", including the brackets, into a String.
<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>
private List<String> findRegex() {
List<String> forReturn = new ArrayList<String>();
for (String str : attDescription) {
if (str.contains("§regex=")) {
String s = str.replaceAll(regex);
forReturn.add(s);
}
}
return forReturn;
}
attDescription is a list that contains all attributes found in the parsed XML document.
So far I tried this regex: ".*(§regex=)(.*)[)$].*", "$2", but this cuts off the ")" and does not delete the text in front of the searched part. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html, I really don't understand how to get what I need.
It seems to work for me (with this example anyway) if I use this in place of String s = str.replaceAll(regex);
String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );
It's just looking for a substring enclosed by parentheses following §regex=.
This seems to work:
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
Note:
Escape the leading bracket
The $ inside a character class is a literal $ - ignore it, because your regex should always end with a bracket
No need to capture the fixed text
Test code, noting that this works with brackets in/around the regex:
String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);
Output:
([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
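Both answers rely on a greedy .* that runs to the last ")" on the line, so any trailing text that itself contains ")" would be swallowed into the result. A sketch that counts parentheses instead, assuming the embedded regex has balanced parentheses; it treats a backslash as escaping the next character but does not understand character classes such as [(]. RegexValueExtractor and extract are hypothetical names; extract(str) could be called inside findRegex in place of str.replaceAll(...).

```java
class RegexValueExtractor {
    // "\u00A7" is the section sign; the escape keeps the source file ASCII-only.
    private static final String MARKER = "\u00A7regex=";

    // Returns the parenthesised value that follows the marker, brackets included,
    // or null if there is no marker, no "(" right after it, or no balancing ")".
    static String extract(String text) {
        int start = text.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        start += MARKER.length();
        if (start >= text.length() || text.charAt(start) != '(') {
            return null;
        }
        int depth = 0;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip an escaped character such as \( or \)
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return text.substring(start, i + 1);
            }
        }
        return null; // parentheses never balanced
    }

    public static void main(String[] args) {
        String attr = " random Text \u00A7regex=([A-Z]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random (text)";
        System.out.println(extract(attr));
    }
}
```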

Error when splitting a string in java

I am trying to split a string according to a certain set of delimiters.
My delimiters are: , " ( ) : ; . ! ? and single or multiple spaces.
This is the code I'm currently using:
String[] arrayOfWords= inputString.split("[\\s{2,}\\,\"\\(\\)\\:\\;\\.\\!\\?-]+");
which works fine in most cases, but I'm having a problem when the first word is surrounded by quotation marks. For example,
String inputString = "\"Word\" some more text.";
gives me this output:
arrayOfWords[0] = ""
arrayOfWords[1] = "Word"
arrayOfWords[2] = "some"
arrayOfWords[3] = "more"
arrayOfWords[4] = "text"
I want the output to give me an array with
arrayOfWords[0] = "Word"
arrayOfWords[1] = "some"
arrayOfWords[2] = "more"
arrayOfWords[3] = "text"
This code has been working fine when quotation marks are used in the middle of the sentence; I'm not sure what the trouble is when they are at the beginning.
EDIT: I just realized I have the same problem when any of the delimiters is the first character of the string.
Unfortunately, you won't be able to remove this empty first element using only split. You should first remove any delimiters at the start of your string and split afterwards. Also, your regex seems to be incorrect because
by adding {2,} inside [...] you are making {, 2, the comma and } delimiters,
you don't need to escape the rest of your delimiters (note that you don't have to escape - only because it is at the end of the character class [], so it can't be used as a range operator).
Maybe try it this way:
String regexDelimiters = "[\\s,\"():;.!?\\-]+";
String inputString = "\"Word\" some more text.";
String[] arrayOfWords = inputString.replaceAll(
"^" + regexDelimiters,"").split(regexDelimiters);
for (String s : arrayOfWords)
System.out.println("'" + s + "'");
output:
'Word'
'some'
'more'
'text'
A delimiter is interpreted as separating the strings on either side of it, thus the empty string on its left is added to the result as well as the string to its right ("Word"). To prevent this, you should first strip any leading delimiters, as described here:
How to prevent java.lang.String.split() from creating a leading empty string?
So in short form you would have:
String delim = "[\\s,\"():;.!?\\-]+";
String[] arrayOfWords = inputString.replaceFirst("^" + delim, "").split(delim);
Edit: Looking at Pshemo's answer, I realize he is correct regarding your regex. Inside the brackets it's unnecessary to specify the number of space characters, as they will be caught by the + operator.
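Another option is to match the words instead of the delimiters, using the negated form of the same character class; a leading delimiter then simply produces no match, so no empty first element appears. This is a sketch, not the approach of either answer above; WordExtractor and words are hypothetical names. One difference worth knowing: for an input made only of delimiters, the replace-then-split approach returns a single empty string (because "".split(...) yields [""]), while this returns an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Negated version of the delimiter class from the answers:
    // a word is a run of characters that are none of the delimiters.
    private static final Pattern WORD = Pattern.compile("[^\\s,\"():;.!?\\-]+");

    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String w : words("\"Word\" some more text.")) {
            System.out.println("'" + w + "'");
        }
    }
}
```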

Remove curly brace in Java

I have a text file in which each line begins and ends with a curly brace:
{aaa,":"bbb,ID":"ccc,}
{zzz,":"sss,ID":"fff,}
{ggg,":"hhh,ID":"kkk,} ...
Between the characters there are no spaces. I'm trying to remove the curly braces and replace them with white space as follows:
String s = "{aaa,":"bbb,ID":"ccc,}";
String n = s.replaceAll("{", " ");
I've tried escaping the curly brace using:
String n = s.replaceAll("/{", " ");
String n = s.replaceAll("'{'", " ");
None of this works, as it comes up with an error. Does anyone know a solution?
You cannot define a String like this:
String s = "{aaa,":"bbb,ID":"ccc,}";
The error is here: you have to escape the double quotes inside the string, like this:
String s = "{aaa,\":\"bbb,ID\":\"ccc,}";
Now there will be no error if you call
s.replaceAll("\\{", " ");
If you have an IDE (that is, a program like Eclipse), you will notice that a string is colored differently from the standard black text (for example, the color of a method name or a semicolon [;]). If the whole string is the same color (usually brown, sometimes blue), you should be OK; if you notice some black inside it, you are doing something wrong. Usually the only thing you would put after a closing double quote ["] is a plus [+] followed by something to be added to the string. For example:
String firstPiece = "This is a ";
// this is ok:
String all = firstPiece + "String";
// if you call:
System.out.println(all);
// the output will be: This is a String
// this is not ok (commented out because it does not compile):
// String allWrong = firstPiece "String";
// Even though the two strings are placed one after the other, this is not allowed: it is a syntax error.
String.replaceAll() takes a regex, and regex requires escaping of the '{' character. So, replace:
s.replaceAll("{", " ");
with:
s.replaceAll("\\{", " ");
Note the double-escapes - one for the Java string, and one for the regex.
However, you don't really need a regex here since you're just matching a single character. So you could use the replace method instead:
s.replace("{", " "); // Replace string occurrences
s.replace('{', ' '); // Replace character occurrences
Or, use the regex version to replace both braces in one fell swoop:
s.replaceAll("[{}]", " ");
No escaping is needed here since the braces are inside a character class ([]).
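Just adding to the answer above:
String.contains() takes a plain string, not a regex, so "\\{" makes it look for a backslash followed by a brace. If somebody tries it like below, it won't work: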
if(values.contains("\\{")){
values = values.replaceAll("\\{", "");
}
if(values.contains("\\}")){
values = values.replaceAll("\\}", "");
}
Use below code if you are using contains():
if(values.contains("{")){
values = values.replaceAll("\\{", "");
}
if(values.contains("}")){
values = values.replaceAll("\\}", "");
}
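A quick check of the points made in these answers on the question's sample line (with its inner quotes escaped): the character-class regex and the char-based replace() give the same result, while the unescaped "{" is rejected when compiled as a regex. BraceRemoval and its method names are hypothetical; the exception is caught as PatternSyntaxException without relying on its exact message.

```java
import java.util.regex.PatternSyntaxException;

class BraceRemoval {
    // Regex route: both braces sit inside a character class, so no escaping is needed.
    static String viaRegex(String s) {
        return s.replaceAll("[{}]", " ");
    }

    // Literal route: replace(char, char) never interprets its arguments as a regex.
    static String viaLiteral(String s) {
        return s.replace('{', ' ').replace('}', ' ');
    }

    // True if an unescaped "{" is rejected when replaceAll() compiles it as a regex.
    static boolean unescapedBraceFails(String s) {
        try {
            s.replaceAll("{", " ");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "{aaa,\":\"bbb,ID\":\"ccc,}";
        System.out.println(viaRegex(s));
        System.out.println(viaRegex(s).equals(viaLiteral(s)));
        System.out.println(unescapedBraceFails(s));
    }
}
```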

Help building a regex

I need to build a regular expression that finds the word "int" only if it's not part of some string.
I want to find whether int is used in the code. (not in some string, only in regular code)
Example:
int i; // the regex should find this one.
String example = "int i"; // the regex should ignore this line.
logger.i("int"); // the regex should ignore this line.
logger.i("int") + int.toString(); // the regex should find this one (because of the second int)
thanks!
It's not going to be bullet-proof, but this works for all your test cases:
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
It uses a lookbehind and a lookahead to assert that there are either zero or two double quotes (") before and after the match.
Here's the code in java with the output:
String regex = "(?<=^([^\"]*|[^\"]*\"[^\"]*\"[^\"]*))\\bint\\b(?=([^\"]*|[^\"]*\"[^\"]*\"[^\"]*)$)";
System.out.println(regex);
String[] tests = new String[] {
"int i;",
"String example = \"int i\";",
"logger.i(\"int\");",
"logger.i(\"int\") + int.toString();" };
for (String test : tests) {
System.out.println(test.matches("^.*" + regex + ".*$") + ": " + test);
}
Output (included regex so you can read it without all those \ escapes):
(?<=^([^"]*|[^"]*"[^"]*"[^"]*))\bint\b(?=([^"]*|[^"]*"[^"]*"[^"]*)$)
true: int i;
false: String example = "int i";
false: logger.i("int");
true: logger.i("int") + int.toString();
Using a regex is never going to be 100% accurate - you need a language parser. Consider escaped quotes in Strings "foo\"bar", in-line comments /* foo " bar */, etc.
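To illustrate the point about escaped quotes, a small hand-written scanner can track whether it is inside a string literal and skip backslash escapes, which the lookaround regex cannot. This is a sketch assuming one line at a time and only double-quoted literals; it does not handle comments, char literals such as '"', or text blocks. IntKeywordFinder and usesIntOutsideStrings are hypothetical names.

```java
class IntKeywordFinder {
    // Returns true if the line contains the word "int" outside double-quoted
    // string literals. Backslash escapes inside strings (such as \") are skipped.
    // Not handled: comments, char literals, and text blocks.
    static boolean usesIntOutsideStrings(String line) {
        boolean inString = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (line.startsWith("int", i)
                    && (i == 0 || !Character.isJavaIdentifierPart(line.charAt(i - 1)))
                    && (i + 3 == line.length() || !Character.isJavaIdentifierPart(line.charAt(i + 3)))) {
                return true; // whole-word "int" outside any string literal
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tests = {
            "int i;",
            "String example = \"int i\";",
            "logger.i(\"int\");",
            "logger.i(\"int\") + int.toString();" };
        for (String test : tests) {
            System.out.println(usesIntOutsideStrings(test) + ": " + test);
        }
    }
}
```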
Not exactly sure what your complete requirements are, but
^\s*\bint\b
perhaps.
Assuming the input is one line at a time,
^int\s[\$_a-zA-Z\;]*$
It follows basic variable naming rules :)
If you want to parse code and search for an isolated int word, this works:
(^int|[\(\ \;,]int)
You can use it to find an int that, in code, can only be preceded by a space, a comma, ";" or a left parenthesis, or be the first word of the line.
You can try it out and refine it here: http://www.regextester.com/
PS: this works for all your test cases.
^[^"]*\bint\b
should work. I can't think of a situation where you can use a valid int identifier after the character '"'.
Of course this only applies if the code is limited to one statement per line.
