Converting HTML to String without TextView - java

I am having problems filling my TextView.
I have an HTML String that needs to be converted from HTML to a plain String, and then I need to replace some characters in it.
The problem is that I can convert it directly with:
TextView.setText(Html.fromHtml(sampleText));
But I need to alter the converted sampleText before giving it to the TextView.
E.g.:
String sampleText = "<b>Some Text</b>";
newSampleText = Html.fromHtml(sampleText);
newSampleText = newSampleText.replace(char1, char2);
TextView.setText(newSampleText);
Does anyone know how to convert the HTML saved inside the String?

If you don't need formatting, use Html.fromHtml(sampleText).toString().
Otherwise, you need to extract the text from the HTML with jsoup so that you can find and change it.

Please try this:
You need to use Html.fromHtml() to use HTML in your XML Strings. Simply referencing a String with HTML in your layout XML will not work.
Try using the setText overload that takes a buffer type, and pass the SPANNABLE buffer type.

Related

Remove a given tag from a html string without replace

I'd like to filter an HTML String before loading it in a WebView.
I'd like to remove all the img tags with the param:
data-custom:'delete'
For example:
<img src="https://..." data-custom:'delete'/>
How can I do this in Android in an elegant way (without external libraries if possible)?
I'm going to go for a nice and simple:
String element = "<img src='https://...' data-custom:'delete'/>";
String attributeRemoved = element.replaceAll("data-custom:['\"][^'\"]*['\"]", "");
Updated based on comment
If you want to remove the whole tag you can do this:
String elementRemoved = element.replaceAll("<[^>]*data-custom:['\"][^'\"]*['\"][^>]*>", "");
If you only want to do it for <img> tags you can do:
String imgElementRemoved = element.replaceAll("<img[^>]*data-custom:['\"][^'\"]*['\"][^>]*>", "");
A much more reliable way would be to parse the HTML as an XML document and use XPath to find all elements with a data-custom attribute and remove them from the document, then save the updated document. While you can do this stuff with regex, it's not normally a good idea...
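The parse-and-XPath idea could look roughly like the sketch below. It rests on several assumptions: the input is well-formed XML (XHTML), the marker is written as a regular attribute data-custom="delete" rather than the data-custom:'delete' form from the question (which an XML parser would reject), and the names ImgFilter and removeFlaggedImages are made up for illustration. Real-world HTML is often not well-formed, in which case a proper HTML parser would be needed instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class ImgFilter {

    // Removes every <img> element whose data-custom attribute equals "delete".
    // Assumes the input is well-formed XML with a single root element.
    static String removeFlaggedImages(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Select the flagged images anywhere in the document.
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//img[@data-custom='delete']", doc, XPathConstants.NODESET);

            // Detach each match from its parent, walking backwards through the list.
            for (int i = hits.getLength() - 1; i >= 0; i--) {
                Node node = hits.item(i);
                node.getParentNode().removeChild(node);
            }

            // Serialize the modified document back to a String without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter the document", e);
        }
    }

    public static void main(String[] args) {
        String input = "<div><p>hi</p><img src=\"a.png\"/>"
                + "<img src=\"b.png\" data-custom=\"delete\"/></div>";
        System.out.println(removeFlaggedImages(input));
    }
}
```

Compared with the regex variants, this only touches elements that the parser actually recognizes as img elements, so attribute order, quoting style, and line breaks inside a tag do not matter.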

How to get url from JSON text

I am wondering whether it is possible to get URLs from text that I parse from JSON. For example, I have a JSON object inside a JSON array with a field called "text". This "text" contains strings and images (URLs) like img src=\u0022http:\/\/www.dostup1.ru\/netcat_files\/Image\/2014\/08\/04\/dzd-2.jpg
but "img" and "src" are not in quotation marks, so how can I get them and then parse these images to show them?
item.setContent(post.getString("text"));
This is how I parse the text. When it loads, it shows the text with strange green rectangles instead of images (because the method doesn't know how to load an image from a URL that is just part of the text).
Let's assume that you have this piece of String from the result of your JSON.
<img src="http:\/\/www.dostup1.ru\/netcat_files\/Image\/2014\/08\/04\/dzd-2.jpg">
Then you just need to use the split method of the String class to extract the URL between the quotes.
Sample:
This is just a sample Java program that splits the URL out of the JSON response you got:
String s = "<img src=\"http://www.dostup1.ru/netcat_files/Image/2014/08/04/dzd-2.jpg\">";
String [] result = s.split("\"");
System.out.println(result[1]);
result:
http://www.dostup1.ru/netcat_files/Image/2014/08/04/dzd-2.jpg
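Splitting on quotes only works when the string holds exactly one tag and src is its first quoted attribute. As an alternative, here is a minimal sketch that collects every src value with a regular expression. The names ImageUrlExtractor and extractImageUrls are hypothetical, and the sketch assumes the text has already gone through the JSON parser (as with getString above), so \u0022 and \/ have been decoded into plain quotes and slashes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ImageUrlExtractor {

    // Finds a quoted src attribute inside an <img ...> tag.
    // Group 1 is the quote character, group 2 is the URL.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the src values of all img tags, in order of appearance.
    static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "Intro <img src=\"http://www.dostup1.ru/netcat_files/Image/2014/08/04/dzd-2.jpg\"> outro";
        System.out.println(extractImageUrls(text));
    }
}
```

The returned URLs can then be handed to whatever image loading code the app uses; the text view itself will not fetch them.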

How to convert html text to Plain Text in report

I am passing HTML text as a parameter value in Java, like:
parameter.put("TEMP_TEXT","<p>test</P>");
I have bound this parameter in iReport to a static text field as $P{TEMP_TEXT}, and it should be rendered as plain text.
The output ought to be 'test'.
Which setting needs to be added on the staticText in iReport to achieve this?
Add a Text Field to your report.
Text Field Expression : $P{TEMP_TEXT}
Markup : html

Is there a function that converts HTML to plaintext?

Is there a "hocus-pocus" function, suitable for Android, that converts HTML to plaintext?
I am referring to a function like the clipboard conversion operation found in browsers like Internet Explorer, Firefox, etc: If you select all rendered HTML inside the browser and copy/paste it to a text editor, you will receive (most of) the text, without any HTML tags or headers.
In a similar thread, I saw a reference to html2text but it's in Python. I am looking for an Android/Java function.
Is there something like this available or must I do this myself, using Jsoup or Jtidy?
I'd try something like:
String html = "<b>hola</b>";
String plain = Html.fromHtml(html).toString();
Using Jsoup:
String plain = new HtmlToPlainText().getPlainText(Jsoup.parse(html));
Without Jsoup:
String html = "htmltext";
String newHtml = html.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ").trim();

how to convert HTML text to plain text? [duplicate]

This question already has answers here:
Remove HTML tags from a String
(35 answers)
Closed 1 year ago.
Friends,
I have to parse a description from a URL, and the parsed content contains a few HTML tags. How can I convert it to plain text?
Yes, Jsoup will be the better option. Just do the following to convert the whole HTML text to plain text.
String plainText = Jsoup.parse(your_html_text).text();
Just getting rid of HTML tags is simple:
// replace all occurrences of one or more HTML tags, with optional
// whitespace in between, by a single space character
String strippedText = htmlText.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ");
But unfortunately the requirements are never that simple:
Usually, <p> and <div> elements need separate handling, and there may be CDATA blocks with > characters (e.g. JavaScript) that mess up the regex, etc.
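To make those caveats concrete, here is a rough sketch of a regex-only stripper that treats block elements and script/style content separately. It is an illustration under assumptions, not a full HTML-to-text converter: the class name HtmlStripper, the method toPlainText, the choice of block tags, and the short entity list are all made up here, and input with unescaped < characters in the text, comments, or CDATA sections can still confuse it.

```java
// Hypothetical helper, not part of any library.
final class HtmlStripper {

    static String toPlainText(String html) {
        String text = html
                // Remove script and style elements together with their content.
                .replaceAll("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>", "")
                // Turn <br> and the end of common block elements into line breaks.
                .replaceAll("(?i)<br\\s*/?>|</(p|div|li|h[1-6]|tr)\\s*>", "\n")
                // Drop every remaining tag.
                .replaceAll("(?s)<[^>]*>", "")
                // Decode a handful of common entities; &amp; goes last so that
                // "&amp;lt;" ends up as "&lt;" and not as "<".
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");

        return text
                .replaceAll("[ \\t]+", " ")   // collapse runs of spaces and tabs
                .replaceAll(" ?\\n ?", "\n")  // drop spaces around line breaks
                .replaceAll("\\n{2,}", "\n")  // collapse consecutive line breaks
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<div><p>Hello <b>world</b></p><p>a &lt; b</p></div>"));
    }
}
```

Entities are decoded only after the tags are gone, so escaped markup such as &lt;b&gt; survives as literal text instead of being stripped as a tag.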
You can use this single line to remove the html tags and display it as plain text.
htmlString=htmlString.replaceAll("\\<.*?\\>", "");
Use Jsoup.
Add the dependency
<dependency>
    <!-- jsoup HTML parser library # https://jsoup.org/ -->
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
Now in your java code:
import org.jsoup.Jsoup;
public static String html2text(String html) {
    // wholeText() keeps the whitespace and line breaks of the original text nodes
    return Jsoup.parse(html).wholeText();
}
Just call html2text with the HTML text and it will return plain text.
Use an HTML parser like HtmlCleaner.
For a detailed answer, see: How to remove HTML tag in Java
I'd recommend parsing the raw HTML through jTidy which should give you output which you can write xpath expressions against. This is the most robust way I've found of scraping HTML.
If you want the text as a browser would display it, use:
import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;
public class RenderToText {
    public static void main(String[] args) throws Exception {
        String sourceUrlString="data/test.html";
        if (args.length==0)
            System.err.println("Using default argument of \""+sourceUrlString+'"');
        else
            sourceUrlString=args[0];
        if (sourceUrlString.indexOf(':')==-1) sourceUrlString="file:"+sourceUrlString;
        Source source=new Source(new URL(sourceUrlString));
        String renderedText=source.getRenderer().toString();
        System.out.println("\nSimple rendering of the HTML document:\n");
        System.out.println(renderedText);
    }
}
I hope this also helps with parsing tables into the format a browser shows.
Thanks,
Ganesh
I needed a plain text representation of some HTML which included FreeMarker tags. The problem was handed to me with a JSoup solution, but JSoup was escaping the FreeMarker tags, thus breaking the functionality. I also tried htmlCleaner (sourceforge), but that left the HTML header and style content (tags removed).
http://stackoverflow.com/questions/1518675/open-source-java-library-for-html-to-text-conversion/1519726#1519726
My code:
return new net.htmlparser.jericho.Source(html).getRenderer().setMaxLineLength(Integer.MAX_VALUE).setNewLine(null).toString();
The maxLineLength ensures lines are not artificially wrapped at 80 characters.
The setNewLine(null) uses the same new line character(s) as the source.
I use HTMLUtil.textFromHTML(value)
from
<dependency>
    <groupId>org.clapper</groupId>
    <artifactId>javautil</artifactId>
    <version>3.2.0</version>
</dependency>
Using Jsoup, I got all the text on the same line.
So I used the following block of code to parse the HTML and keep new lines:
private String parseHTMLContent(String html) {
    // Replace every tag with a line break.
    String result = html.replaceAll("\\<.*?\\>", "\n");
    // Collapse consecutive line breaks until nothing changes.
    String previousResult = "";
    while (!previousResult.equals(result)) {
        previousResult = result;
        result = result.replaceAll("\n\n", "\n");
    }
    return result;
}
Not the best solution but solved my problem :)
