Reliable approach to unescape HTML characters in Java (Android)


I haven't found a reliable way to unescape arbitrary HTML character entities, other than doing the replacements myself.
StringEscapeUtils currently seems the best option for me, but it doesn't work in every case:
My code:
@Test
public void escapeHTMLChars2() {
    String sample1 = "&lt;";
    Assert.assertEquals("<", org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(sample1));
    String sample2 = "&ndash;";
    Assert.assertEquals("–", org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(sample2));
}
Gradle dependency:
implementation group: 'org.apache.commons', name: 'commons-lang3', version: '3.11'
The test fails with:
org.junit.ComparisonFailure:
Expected :-
Actual :?
Is there a way to unescape all kinds of HTML characters without resorting to hacks?
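
A "?" in place of an en dash often points to a character-encoding problem rather than an unescaping problem, although that is only a guess based on the failure message above. The sketch below (the class and method names are made up for illustration) shows how an en dash degrades to "?" when it passes through a charset that cannot represent it. It writes the character as a Unicode escape, so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only meant to illustrate the encoding effect.
final class EnDashEncodingCheck {
    // En dash written as a Unicode escape, so it survives any source-file encoding.
    static final String EN_DASH = "\u2013";

    // Encodes the text to bytes and decodes it again with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }
}
```

Round-tripping EN_DASH through StandardCharsets.US_ASCII yields "?", while StandardCharsets.UTF_8 returns it unchanged. If the test behaves the same way, the source-file or console encoding is worth checking before blaming unescapeHtml4.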

Related

stripAccents on Thai language

I am trying to strip accents from a Thai word using the stripAccents function in Scala, but it does not seem to strip them.
import org.apache.commons.lang3.StringUtils.stripAccents
println("stripped string " + stripAccents("CLEกอ่ตัRงขึนในปีR"))
stripped string CLEกอ่ตัRงขึนในปีR
I am running this in IntelliJ on Windows. It strips accents correctly for many other languages, such as German and Dutch.
Has anyone faced a similar issue, and how did you resolve it?

You can use Java's Normalizer:
import java.text.Normalizer
val thaiString = "CLEกอ่ตัRงขึนในปีR"
val strippedString = Normalizer.normalize(thaiString, Normalizer.Form.NFD)
  .replaceAll("[\\p{InCombiningDiacriticalMarks}\\p{IsM}]+", "")
println(strippedString)
// CLEกอตRงขนในปR
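
The same idea in plain Java, as a minimal sketch (the class and method names are made up for illustration). It relies only on \p{M}, which matches every Unicode mark and therefore also covers the Combining Diacritical Marks block. As the output above shows, Thai vowel signs written above or below a consonant are marks too, so they are removed along with the tone marks.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating mark removal via Unicode normalization.
final class MarkStripper {
    // Decomposes the text (NFD) so accents become separate code points,
    // then removes every code point in a Unicode "Mark" category.
    static String stripMarks(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

For example, stripMarks applied to "café" gives "cafe", and a Thai consonant followed by a tone mark comes back as the bare consonant.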

Evaluate string expression in kotlin without using script engine

I want to evaluate a string expression without using a script engine.
The expression can look like this:
"((true&&(!false)||true)&&true)&&true"
Does anybody have an idea how this can be done on Android using Kotlin?
Thanks

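Kotlin on Android: add a new dependency to the app's build.gradle file:
dependencies {
    implementation 'net.objecthunter:exp4j:0.4.8'
}
Then, in main.kt:
import net.objecthunter.exp4j.ExpressionBuilder

// textView is assumed to be a TextView instance holding the expression text
val value = textView.text.toString()
// exp4j evaluates arithmetic expressions; evaluate() returns a Double
val result = ExpressionBuilder(value).build().evaluate()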

I know my answer will not help in your case, where the string expression contains true, false, &&, ||, ! and (). But for those who want to evaluate mathematical expressions, I found this library, which handles almost all mathematical operators.
Add it to your project and use it this way:
// In the root build.gradle
repositories {
    maven {
        url "https://dl.bintray.com/kaendagger/KParser"
    }
}
// Add to the dependencies
dependencies {
    implementation 'io.kaen.dagger:KParser-jvm:0.1.1'
}
And then in your code you can do something like this:
val parser = ExpressionParser()
val result = parser.evaluate("5+1+cos(PI)-2*2/4")
println(result)
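
Since the expression in the question only contains true, false, &&, ||, ! and parentheses, a small recursive-descent parser may be enough, with no library at all. Below is a minimal sketch in Java, which can be called from Kotlin unchanged. The class name BoolExpr is made up for illustration, and the precedence is assumed to match Java and Kotlin: ! binds tightest, then &&, then ||.

```java
// Hypothetical recursive-descent evaluator for boolean expressions.
// Grammar (assumed): or  := and ("||" and)*
//                    and := not ("&&" not)*
//                    not := "!" not | "(" or ")" | "true" | "false"
final class BoolExpr {
    private final String src;
    private int pos = 0;

    private BoolExpr(String src) {
        // Whitespace carries no meaning in this grammar, so drop it up front.
        this.src = src.replaceAll("\\s+", "");
    }

    static boolean evaluate(String expression) {
        BoolExpr parser = new BoolExpr(expression);
        boolean value = parser.parseOr();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + parser.pos);
        }
        return value;
    }

    private boolean parseOr() {
        boolean value = parseAnd();
        while (src.startsWith("||", pos)) {
            pos += 2;
            boolean rhs = parseAnd(); // always parse, so pos advances
            value = value || rhs;
        }
        return value;
    }

    private boolean parseAnd() {
        boolean value = parseNot();
        while (src.startsWith("&&", pos)) {
            pos += 2;
            boolean rhs = parseNot(); // always parse, so pos advances
            value = value && rhs;
        }
        return value;
    }

    private boolean parseNot() {
        if (src.startsWith("!", pos)) {
            pos++;
            return !parseNot();
        }
        if (src.startsWith("(", pos)) {
            pos++;
            boolean value = parseOr();
            if (!src.startsWith(")", pos)) {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            pos++;
            return value;
        }
        if (src.startsWith("true", pos)) {
            pos += 4;
            return true;
        }
        if (src.startsWith("false", pos)) {
            pos += 5;
            return false;
        }
        throw new IllegalArgumentException("Unexpected input at index " + pos);
    }
}
```

With this sketch, BoolExpr.evaluate("((true&&(!false)||true)&&true)&&true") returns true. Malformed input raises an IllegalArgumentException instead of returning a guess.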

How to read properties with special characters from application.yml in springboot

application.yml
mobile-type:
  mobile-codes:
    BlackBerry: BBSS
    Samsung: SAMS
    Samsung+Vodafone: SAMSVV
When we read the Samsung+Vodafone key from the application.yml file, we get the concatenated string 'SamsungVodafone' instead.
We have also tried "Samsung'/+'Vodafone": SAMSVV, but the result was the same. Other symbols such as '-' work fine.
We read the keys and values from application.yml with the code below.
import java.util.Map;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@ConfigurationProperties(prefix = "mobile-type")
@Component
public class mobileTypeConfig {
    // Bound from mobile-type.mobile-codes in application.yml
    private Map<String, String> mobileCodes;

    public Map<String, String> getMobileCodes() {
        return mobileCodes;
    }

    public void setMobileCodes(Map<String, String> mobileCodes) {
        this.mobileCodes = mobileCodes;
    }
}
Note: Spring Boot version 2.0.6.RELEASE

Wrap the key in square brackets so that no characters are stripped from it, and enclose the whole key in double quotes:
mobile-type:
  mobile-codes:
    BlackBerry: BBSS
    Samsung: SAMS
    "[Samsung+Vodafone]": SAMSVV
Output
{BlackBerry=BBSS, Samsung=SAMS, Samsung+Vodafone=SAMSVV}
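
To make the effect of the brackets concrete, here is a simplified model of the rule described in the Spring Boot documentation quoted below. This is not Spring's actual implementation; the class and method names are made up, and the model only covers the two cases seen in the question.

```java
// Hypothetical model of how a map key ends up after binding.
final class MapKeyModel {
    // Bracketed keys keep their original text (without the brackets).
    // Unbracketed keys lose every character that is not alphanumeric or '-'.
    static String boundKey(String yamlKey) {
        if (yamlKey.length() >= 2 && yamlKey.startsWith("[") && yamlKey.endsWith("]")) {
            return yamlKey.substring(1, yamlKey.length() - 1);
        }
        return yamlKey.replaceAll("[^A-Za-z0-9-]", "");
    }
}
```

Under this model, the key Samsung+Vodafone becomes SamsungVodafone, which matches what the question observed, while [Samsung+Vodafone] keeps its plus sign.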
Binding
When binding to Map properties, if the key contains anything other than lowercase alpha-numeric characters or -, you need to use the bracket notation so that the original value is preserved. If the key is not surrounded by [], any characters that are not alpha-numeric or - are removed. For example, consider binding the following properties to a Map:
acme:
  map:
    "[/key1]": value1
    "[/key2]": value2

Please keep in mind that the left side is a YAML key, not an arbitrary string. My suggestion for your use case would be to put both the name and the code on the right side, such as:
foo:
  - name: "Samsung+Vodafone"
    code: "SAMSVV"
  - name: "BlackBerry"
    code: "BBMS"
  - name: "Samsung"
    codes:
      - "SAMS"
      - "SMG"
You will have to change your class structure slightly, but you can reconstruct your initial map from that.
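
A rough sketch of what that reconstruction could look like in plain Java. The names MobileEntry, MobileEntries and toCodeMap are made up for illustration, and the sketch assumes one code per entry (the name/code form above, not the codes list). In a Spring Boot application, the list itself would be bound by a @ConfigurationProperties class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element type for the YAML list: one name and one code per entry.
final class MobileEntry {
    private String name;
    private String code;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getCode() { return code; }
    public void setCode(String code) { this.code = code; }
}

// Hypothetical helper that rebuilds the original name-to-code map.
final class MobileEntries {
    static Map<String, String> toCodeMap(List<MobileEntry> entries) {
        // LinkedHashMap keeps the order in which entries appear in the YAML list.
        Map<String, String> codes = new LinkedHashMap<>();
        for (MobileEntry entry : entries) {
            codes.put(entry.getName(), entry.getCode());
        }
        return codes;
    }
}
```

Because the name is now a value rather than a key, characters such as + pass through untouched.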

Convert string with Unicode to show unicode in java?

I am trying to convert strings that contain Unicode escape sequences into the actual characters, but everything I have found so far either works only when the string consists of nothing but the escape sequence, or converts the symbol into its code instead.
This is the string I am using as an example right now:
Rebroadcast of Shows from the past Week! RPGs, Talk shows, Science, Wisdom, Vampires and more - Good stuff! \\u003c3 - !rbschedule for more info
I am getting this from an API call, so I can't just write it as \ instead of \\.

The \\ is called escaping, and it is what currently prevents you from seeing the < character.
Un-escaping is not something you want to do manually, as there are many caveats.
You might want to use StringEscapeUtils#unescapeJava from Apache Commons Text:
import org.apache.commons.text.StringEscapeUtils;

final String result = StringEscapeUtils.unescapeJava(yourString);
That will output "...Good stuff! <3 - !rbschedule for more info..."
The Maven dependency:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.6</version>
</dependency>
Or for Gradle:
compile group: 'org.apache.commons', name: 'commons-text', version: '1.6'
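
If adding a dependency is not an option, the narrow case from the question (a backslash, the letter u, and four hex digits) can be handled with the JDK alone. This is a minimal sketch, not a replacement for unescapeJava: the class name is made up, and it deliberately ignores other escapes such as newline or tab sequences and does not treat a doubled backslash specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that decodes only four-digit hex Unicode escapes.
final class UnicodeEscapes {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher matcher = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Turn the four hex digits into the UTF-16 code unit they name.
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement guards against '$' and '\' in the decoded text.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling UnicodeEscapes.decode on the example string turns the escaped sequence before the 3 into <, giving "Good stuff! <3".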

What if you replace all "\\" with "\" dynamically?

Apache Solr 6.6.1 number mapping in Urdu language

I have configured Apache Solr 6.6.2 to index documents and search them later, and I am facing a problem. If a document contains a number like 1234, I want it to be mapped (copied) to the corresponding Urdu numerals, like ۱۲۳۴. This would make it possible to retrieve the document whether the user enters 1234 or ۱۲۳۴.
Is there a built-in solution in Solr, or how else can I achieve this?

If you are using a Java/SolrJ client for indexing, add the junidecode dependency to your project.
For Gradle:
compile group: 'junidecode', name: 'junidecode', version: '0.1.1'
For Maven:
<dependency>
    <groupId>junidecode</groupId>
    <artifactId>junidecode</artifactId>
    <version>0.1.1</version>
</dependency>
While indexing, index an additional field with the converted value:
import net.sf.junidecode.Junidecode;

String converted = Junidecode.unidecode("۱۲۳۴");
// converted == 1234
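
The digit mapping can also be done with the JDK alone, in both directions. This is a minimal sketch: the class and method names are made up, and it assumes the Urdu digits in the documents are the Extended Arabic-Indic digits U+06F0 to U+06F9. The converted string could then be indexed in an additional field, as described above.

```java
// Hypothetical helper for converting between ASCII and Urdu digits.
final class DigitMapper {
    // First code point of the Extended Arabic-Indic digit range (assumed).
    private static final char EXTENDED_ARABIC_INDIC_ZERO = '\u06F0';

    // Replaces any Unicode decimal digit with its ASCII counterpart.
    static String toAsciiDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int digit = Character.digit(c, 10); // -1 when c is not a decimal digit
            out.append(digit >= 0 ? (char) ('0' + digit) : c);
        }
        return out.toString();
    }

    // Replaces ASCII digits 0-9 with Extended Arabic-Indic digits.
    static String toExtendedArabicIndicDigits(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                out.append((char) (EXTENDED_ARABIC_INDIC_ZERO + (c - '0')));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Non-digit characters pass through both methods unchanged, so whole field values can be converted without splitting out the numbers first.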
