HTML.fromHtml adds space at end of text?


In my app I use Html.fromHtml(string).toString() to remove some <p> tags that I receive when parsing some JSON.
If I leave the <p> tags in, the text fits the background perfectly (the background is a RelativeLayout with wrap_content for both height and width). However, if I use fromHtml to remove the <p> tags, suddenly there is a huge space below the text, which I believe is fromHtml adding space at the end.
Any ideas?
EDIT:
Here are screenshots:
http://imgur.com/a/zIZNo
The one with <p> tags is the one that doesn't use fromHtml, obviously! :)
EDIT 2: A solution has been found, see my answer below. Thank you to Andro Selva for helping me by telling me about the hidden \n that was being added!

Solution was found:
fromHtml returns the type Spanned. So I assigned what was being returned to a variable, converted it to a string and then used the .trim() method on it.
It removed all white space at the end.
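Converting to a String works when only plain text is needed, but it also discards any styling. A minimal sketch, assuming you would rather keep the result as a CharSequence: trimTrailing is a hypothetical helper (not part of Android) that cuts trailing whitespace with subSequence. On Android, a Spanned returned by fromHtml is expected to keep the spans that fall inside the retained range, but that is worth verifying on your API level.

```java
// Sketch only: trims trailing whitespace from any CharSequence.
class TrailingWhitespace {

    /** Hypothetical helper: returns the input without trailing whitespace. */
    static CharSequence trimTrailing(CharSequence text) {
        int end = text.length();
        // Walk backwards over the "\n\n" (or any other whitespace) at the end.
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.subSequence(0, end);
    }

    public static void main(String[] args) {
        CharSequence trimmed = trimTrailing("Some paragraph text\n\n");
        System.out.println("[" + trimmed + "]");
    }
}
```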

Yes, what you suspected is correct: it adds space at the bottom. But first let me explain how this works.
You have to look at the Html class to see what it does.
Put simply: whenever the Html class encounters a <p> tag, it makes sure the text ends with two "\n" characters, appending whatever is missing.
In this case the empty space you see at the bottom is caused by the two \n characters appended to the end of the paragraph.
Here is the actual method of the Html class that is responsible for this behaviour:
private static void handleP(SpannableStringBuilder text) {
    int len = text.length();
    if (len >= 1 && text.charAt(len - 1) == '\n') {
        if (len >= 2 && text.charAt(len - 2) == '\n') {
            return;
        }
        text.append("\n");
        return;
    }
    if (len != 0) {
        text.append("\n\n");
    }
}
If you want to change this behaviour, you would have to provide your own version of the Html class, which is a bit tricky and can't be covered here.
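To see the effect in isolation, the same branching can be run against a plain StringBuilder. This is only a sketch: HandlePDemo is a made-up class name, and it assumes handleP is invoked once when the paragraph closes.

```java
class HandlePDemo {

    // Same branching as the handleP method quoted above, on a StringBuilder.
    static void handleP(StringBuilder text) {
        int len = text.length();
        if (len >= 1 && text.charAt(len - 1) == '\n') {
            if (len >= 2 && text.charAt(len - 2) == '\n') {
                return; // already ends with two newlines
            }
            text.append("\n"); // one newline present, add the second
            return;
        }
        if (len != 0) {
            text.append("\n\n"); // no trailing newline yet, add both
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("Hello");
        handleP(text);
        // Make the newlines visible when printing.
        System.out.println(text.toString().replace("\n", "\\n"));
    }
}
```

Whatever the text ended with before, it ends with exactly two newline characters afterwards (unless it was empty), which is the blank space seen below the TextView.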
EDIT
here is the link to the Html class,
Html class

If you are trying to use the text in an object or fit it into a specific place, try using an <a> tag instead of a <p> tag. <p> adds carriage returns at the end while <a> adds none, but you have to remember to write the \n yourself with <b>, and you get to keep the style.

The explanation by @Andro Selva is correct and there is not much to be done about it. Frustratingly, things get better for API 24 and later with the inclusion of flags in the call
Spanned fromHtml (String source,
int flags,
Html.ImageGetter imageGetter,
Html.TagHandler tagHandler);
and I suspect the FROM_HTML_SEPARATOR_LINE_BREAK_PARAGRAPH flag will reduce the double "\n\n" of the standard paragraph termination to the single "\n" of a line break.
Given the range of Android versions out there, I can't afford to write software for Android API 24+ exclusively! So I found a kludge involving 2 extra custom tags.
1. <scale factor="x.xx">... </scale>
2. <default>... </default>
both invoke the RelativeSizeSpan class through this method
private void ProcessRelativeSizeTag(float scalefactor, boolean opening, Editable output) {
    int len = output.length();
    if (opening) {
        System.out.println("scalefactor open: " + scalefactor);
        output.setSpan(new RelativeSizeSpan(scalefactor), len, len,
                Spannable.SPAN_MARK_MARK);
    } else {
        Object obj = getLast(output, RelativeSizeSpan.class);
        int where = output.getSpanStart(obj);
        scalefactor = ((RelativeSizeSpan)obj).getSizeChange();
        output.removeSpan(obj);
        System.out.println("scalefactor close: " + scalefactor);
        if (where != len) {
            output.setSpan(new RelativeSizeSpan(scalefactor), where, len,
                    Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
        }
    }
}
which is called from the custom TagHandler supplied to the Html.fromHtml method, viz:
private static class CustomTagHandler implements Html.TagHandler {
    private void ProcessRelativeSizeTag(float scalefactor, boolean opening, Editable output) {
        int len = output.length();
        if (opening) {
            //mSizeStack.push(scalefactor);
            System.out.println("scalefactor open: " + scalefactor);
            output.setSpan(new RelativeSizeSpan(scalefactor), len, len,
                    Spannable.SPAN_MARK_MARK);
        } else {
            Object obj = getLast(output, RelativeSizeSpan.class);
            int where = output.getSpanStart(obj);
            scalefactor = ((RelativeSizeSpan)obj).getSizeChange();
            output.removeSpan(obj);
            //scalefactor = (float)mSizeStack.pop();
            System.out.println("scalefactor close: " + scalefactor);
            if (where != len) {
                output.setSpan(new RelativeSizeSpan(scalefactor), where, len,
                        Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
            }
        }
    }
    ...
    final HashMap<String, String> mAttributes = new HashMap<>();
    @Override
    public void handleTag(boolean opening, String tag, Editable output, XMLReader xmlReader) {
        String Attr;
        processAttributes(xmlReader);
        if ("default".equalsIgnoreCase(tag)) {
            ProcessRelativeSizeTag(mDefaultTextSize, opening, output);
            return;
        }
        if ("scale".equalsIgnoreCase(tag)) {
            Attr = mAttributes.get("factor");
            if (Attr != null && !Attr.isEmpty()) {
                float factor = parseFloat(Attr);
                if (factor > 0)
                    ProcessRelativeSizeTag(factor, opening, output);
            }
            return;
        ...
    }
}
To use it, I set the text size of the TextView object to 1. That is, 1 pixel! I then store the true text size I want in the variable mDefaultTextSize. I have all the Html functionality inside an htmlTextView which extends TextView:
public class htmlTextView extends AppCompatTextView {
static Typeface mLogo;
static Typeface mGAMZ;
static Typeface mBrush;
static Typeface mStandard;
int GS_PAINTFLAGS = FILTER_BITMAP_FLAG | ANTI_ALIAS_FLAG | SUBPIXEL_TEXT_FLAG | HINTING_ON;
static float mDefaultTextSize;
static Typeface mDefaultTypeface;
etc
}
which includes the public method
public void setDefaultTextMetrics(String face, float defaultTextSize) {
mDefaultTypeface = mStandard;
if (face != null) {
if ("gamz".equalsIgnoreCase(face)) {
mDefaultTypeface = mGAMZ;
} else {
if ("brush".equalsIgnoreCase(face)) {
mDefaultTypeface = mBrush;
}
}
}
setTypeface(mDefaultTypeface);
setTextSize(1);
mDefaultTextSize = defaultTextSize;
}
A simple ((htmlTextView)tv).setDefaultTextMetrics(null, 30); call sets my htmlTextView to use my standard typeface as default with a text size of 30.
Then when I give it this example to use in fromHtml:
<string name="htmlqwert">
<![CDATA[
<p><default><scale factor="1.5"><box> qwertQWERT </box></scale></default></p>
<p><default><scale factor="1.5"><box> qwertQWERT </box></scale></default></p>
<p><default><scale factor="1.5"><box> qwertQWERT </box></scale></default></p>
<p><default><scale factor="1.5"><box> qwertQWERT </box></scale></default></p>
]]>
</string>
My custom tag <box> just lets me highlight the background of the text. See the attached picture, showing one result using the <default> tag with the TextView text size set to 1 and the <default> tag invoking a RelativeSizeSpan with a factor of 30, and one with:
<string name="htmlqwert">
<![CDATA[
<p><scale factor="1.5"><box> qwertQWERT </box></scale></p>
<p><scale factor="1.5"><box>qwertQWERT</box></scale></p>
<p><scale factor="1.5"><box>qwertQWERT</box></scale></p>
<p><scale factor="1.5"><box>qwertQWERT</box></scale></p>
]]>
</string>
using no <default> tag but setting the TextView text size to 30 instead. In the first case the extra new line is still there but it is only 1 pixel high!
NB There is no real point to the <scale factor="1.5">...</scale> tags. They are just left over artefacts from other tests.
Results: Both examples below have 2 newlines between paragraphs but, in the one on the left, one of those lines is only 1 pixel high. I will leave it to the reader to figure out how to reduce it to zero, but do not use a text size of 0

This solution works for me.
Create a helper method that removes all opening and closing paragraph tags by replacing them with empty strings.
@Nullable
public static String removeParagraphTags(@Nullable String input) {
    if (input == null) {
        return null;
    }
    return input.replaceAll("<p>", "").replaceAll("</p>", "");
}
Usage:
String input = "<p>This is some text in a paragraph.</p>";
HtmlCompat.fromHtml(StringUtils.removeParagraphTags(input), HtmlCompat.FROM_HTML_MODE_COMPACT);
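The helper above only matches the literal strings <p> and </p>. If the incoming HTML may contain attributes or upper-case tags, a single regular expression can cover those cases. This is a hedged variant, not the answer's original code; the class name ParagraphTags is made up for the example.

```java
class ParagraphTags {

    /**
     * Removes opening and closing paragraph tags, including tags with
     * attributes (<p class="x">) and upper-case tags (<P>).
     * Other tags that merely start with "p", such as <pre>, are left alone.
     */
    static String removeParagraphTags(String input) {
        if (input == null) {
            return null;
        }
        // (?i) = case-insensitive; after "p" allow either ">" or whitespace plus attributes.
        return input.replaceAll("(?i)</?p(\\s[^>]*)?>", "");
    }

    public static void main(String[] args) {
        System.out.println(removeParagraphTags("<p class=\"intro\">This is some text.</p>"));
    }
}
```

Note that stripping the tags also removes the separation between consecutive paragraphs, so this fits best when each string holds a single paragraph.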

Related

How to extract data from PDF and split into particular categories using java

I am trying to extract data from a PDF and split it into certain categories. I am able to extract the data and split it into categories on the basis of font size. For example, let's say there are 3 categories: a country category, a capital category and a city category. I am able to put all countries, capitals and cities into their respective categories, but I am not able to map which capital belongs to which city and country, or which country belongs to which city and capital.
It is reading the data randomly. How can I read the data from bottom to top without breaking the sequence, so I can put the first word in the first category, the 2nd into the second, and so on?
Or does anyone know a more efficient way, so that I can put the text into the respective categories and map it as well?
I am using Java.
Here is my code:
public class readPdfText {
public static void main(String[] args) {
try{
PdfReader reader = null;
String src = "pdffile.pdf";
try {
reader = new PdfReader("pdfile.pdf");
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
SemTextExtractionStrategy smt = new SemTextExtractionStrategy();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
PdfTextExtractor.getTextFromPage(reader, i, smt);
}
}catch(Exception e){
}
}
}
SemTextExtractionStrategy class:
public class SemTextExtractionStrategy implements TextExtractionStrategy {
private String text;
StringBuffer str = new StringBuffer();
StringBuffer item = new StringBuffer();
StringBuffer cat = new StringBuffer();
StringBuffer desc = new StringBuffer();
float temp = 0;
@Override
public void beginTextBlock() {
}
@Override
public void renderText(TextRenderInfo renderInfo) {
text = renderInfo.getText();
Vector curBaseline = renderInfo.getBaseline().getStartPoint();
Vector topRight = renderInfo.getAscentLine().getEndPoint();
Rectangle rect = new Rectangle(curBaseline.get(0), curBaseline.get(1),
topRight.get(0), topRight.get(1));
float curFontSize = rect.getHeight();
compare(text, curFontSize);
}
private void add(String text2, float curFontSize) {
str.append(text2);
System.out.println("str: " + str);
}
public void compare(String text2, float curFontSize) {
// text2.getFont().getBaseFont().Contains("bold");
// temp = curFontSize;
boolean flag = check(text);
if (temp == curFontSize) {
str.append(text);
/*
* if (curFontSize == 11.222168){ item.append(str);
* System.out.println(item); }else if (curFontSize == 10.420532){
* desc.append(str); }
*/
// str.append(text);
} else {
if (temp>9.8 && temp<10){
String Contry= str.toString();
System.out.println("Contry: "+Contry);
}else if(temp>8 && temp <9){
String itemPrice= str.toString();
System.out.println("itemPrice: "+itemPrice);
}else if(temp >7 && temp< 7.2){
String captial= str.toString();
System.out.println("captial: "+captial);
}else if(temp >7.2 && temp <8){
String city= str.toString();
System.out.println("city: "+city);
}else{
System.out.println("size: "+temp+" "+"str: "+str);
}
temp = curFontSize;
// System.out.println(temp);
str.delete(0, str.length());
str.append(text);
}
}
private boolean check(String text2) {
return true;
}
@Override
public void endTextBlock() {
}
@Override
public void renderImage(ImageRenderInfo renderInfo) {
}
@Override
public String getResultantText() {
return text;
}
}

It is reading data randomly, How I can Read data from bottom to Top without breaking the sequence, so I can Put first word in first category, 2nd into second and so on.
No, not randomly but instead in the order of the corresponding drawing operations in the content stream.
Your TextExtractionStrategy implementation SemTextExtractionStrategy simply uses the text in the order in which it is forwarded to it which is the order in which it is drawn. The order of the drawing operations does not need to be the reading order, though, as each drawing operation may start at a custom position on the page; if multiple fonts are used on one page, e.g., the text may be drawn grouped by font.
If you want to analyze the text from such a document, you first have to collect and sort the text fragments you get, and only when all text from the page is parsed, you can start analyzing it.
The LocationTextExtractionStrategy (included in the iText distribution) can be taken as an example of a strategy doing just that. It uses its inner class TextChunk for collecting the fragments, though, and this class does not carry the text ascent information you use in your code.
A SemLocationTextExtractionStrategy, therefore, would have to use an extended TextChunk class to also keep that information (or some information derived from it, e.g. a text category).
Furthermore the LocationTextExtractionStrategy only sorts top to bottom, left to right. If your PDF has a different design, e.g. if it is multi-columnar, either your sorting has to be adapted or you have to use filters and analyze the page column by column.
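The collect-then-sort idea can be sketched without iText at all. Everything here is an assumption made for illustration: Chunk is a hypothetical stand-in for an extended TextChunk (text, start position, ascent), and the ordering is a simple top to bottom, left to right sort in PDF coordinates, where a larger y is higher on the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ChunkSorting {

    /** Hypothetical stand-in for an extended TextChunk. */
    static final class Chunk {
        final String text;
        final float x;      // start of the baseline, horizontal
        final float y;      // start of the baseline, vertical
        final float ascent; // kept so categories can be derived after sorting

        Chunk(String text, float x, float y, float ascent) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.ascent = ascent;
        }
    }

    /** Returns a copy sorted top to bottom (descending y), then left to right (ascending x). */
    static List<Chunk> inReadingOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));
        return sorted;
    }

    static String joinTexts(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Chunks in drawing order, grouped by font as a content stream might do it.
        List<Chunk> drawn = Arrays.asList(
                new Chunk("Paris", 100f, 700f, 7.1f),
                new Chunk("Berlin", 100f, 680f, 7.1f),
                new Chunk("France", 10f, 700f, 9.9f),
                new Chunk("Germany", 10f, 680f, 9.9f));
        System.out.println(joinTexts(inReadingOrder(drawn)));
    }
}
```

Real pages rarely place chunks of one line at exactly the same y, so a production strategy would compare line positions with some tolerance instead of exact equality.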
BTW, your code to determine the font size
Vector curBaseline = renderInfo.getBaseline().getStartPoint();
Vector topRight = renderInfo.getAscentLine().getEndPoint();
Rectangle rect = new Rectangle(curBaseline.get(0), curBaseline.get(1),
topRight.get(0), topRight.get(1));
float curFontSize = rect.getHeight();
does not return the actual font size but only the ascent above the base line. And even that only for unrotated text; as soon as rotation is part of the game, your code only returns the height of the rectangle enveloping the line from the start of the base line to the end of the ascent line. The length of the line from base line start to ascent line start would at least be independent from rotation.
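The rotation-independent measure mentioned above is just the distance between two points. A small sketch, assuming the start points of the base line and the ascent line have already been read into {x, y} arrays; AscentLength is a made-up helper, not an iText class.

```java
class AscentLength {

    /** Distance from the base line start to the ascent line start; points are {x, y}. */
    static double ascent(float[] baselineStart, float[] ascentStart) {
        return Math.hypot(ascentStart[0] - baselineStart[0],
                ascentStart[1] - baselineStart[1]);
    }

    public static void main(String[] args) {
        // Unrotated text: the ascent line lies 7 units above the base line.
        System.out.println(ascent(new float[] {10f, 100f}, new float[] {10f, 107f}));
        // Same text rotated by 90 degrees: the offset is now horizontal.
        System.out.println(ascent(new float[] {10f, 100f}, new float[] {3f, 100f}));
    }
}
```

Both calls yield the same length, whereas the height of an enclosing rectangle would change with the rotation.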
Or anyone know some more efficient way?
Your task seems to depend very much on the PDF you are trying to extract information from. Without that PDF, therefore, tips for more efficient ways will remain vague.

Remove Highlight matching String content

OK, a few days ago I made a post about removing highlighted text in a JTextArea:
Removing Highlight from specific word - Java
The thing is, at that time I wrote code that removes highlights by matching their size, but now I have a lot of words with the same size in my app and obviously the application isn't working correctly.
So I ask: does anyone know a library or a way to do this removal by matching the content of each highlighted string?

You could write a method to get the text covered by a given highlight:
private static String highlightedText(Highlighter.Highlight h, Document d) {
    int start = h.getStartOffset();
    int end = h.getEndOffset();
    try {
        return d.getText(start, end - start);
    } catch (BadLocationException e) {
        // The highlight no longer maps to valid document offsets.
        return "";
    }
}
Then your removeHighlights method would look like this:
public void removeHighlights(JTextComponent c, String toBlackOut) {
    Highlighter highlighter = c.getHighlighter();
    Highlighter.Highlight[] highlights = highlighter.getHighlights();
    Document d = c.getDocument();
    for (Highlighter.Highlight h : highlights) {
        if (highlightedText(h, d).equals(toBlackOut) && h.getPainter() instanceof TextHighLighter) {
            highlighter.removeHighlight(h);
        }
    }
}

Need Help to optimize the following java code. It is the code for setting color of specific keywords in a textview.(Android Application)

I'm developing an app to view C programs. I wanted to give a simple color scheme to the text, which is stored in a database, retrieved as a string and then passed on to the TextView.
The code I have written assigns green to header file declarations and brackets, blue to numbers, printf, scanf and so on, and red to data types such as int, char, float.
It is, however, very inefficient. Before applying this color scheme, my app was displaying the TextView activity instantly. Now, depending on the length of the program, it takes up to 4 to 5 seconds, which is really poor performance.
What it does is take one keyword at a time, iterate over the complete text of the TextView looking for that particular keyword only, change its color, and set the text again.
Thus, it traverses the entire text of the TextView 29 times, as I have defined 29 keywords in String arrays (namely keywordsgreen, keywordsblue, keywordsred).
The activity's onCreate function contains the following code:
textView = (TextView) findViewById(R.id.textView1);
textView.setText(programtext);
textView.setBackgroundColor(0xFFE6E6E6);
//The problem starts here
String [] keywordsgreen={"#define","#include","stdio.h","conio.h","stdlib.h","math.h","graphics.h","string.h","malloc.h","time.h","{","}","(",")","<",">","&","while ","for "};
for(String y:keywordsgreen)
{
fontcolor(y,0xff21610B);
}
String [] keywordsred={"%d","%f","%c","%s","int ","char ","float","typedef","struct ","void "};
for(String y:keywordsred)
{
fontcolor(y,0xFFB40404);
}
String [] keywordsblue={"printf","scanf","\n","getch","0","1","2","3","4","5","6","7","8","9"};
for(String y:keywordsblue)
{
fontcolor(y,0xFF00056f);
}
The fontcolor function is as follows :
private void fontcolor(String text,int color)
{
Spannable raw=new SpannableString(textView.getText());
int index=TextUtils.indexOf(raw, text);
while (index >= 0)
{
raw.setSpan(new ForegroundColorSpan(color), index, index + text.length(), Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
index=TextUtils.indexOf(raw, text, index + text.length());
}
textView.setText(raw);
}
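One way to avoid the repeated traversals is to search for all keywords in a single pass and collect the spans first. The sketch below is plain Java and only computes the offsets; KeywordSpans and ColorSpan are hypothetical names. On Android, the resulting spans could then be applied to one SpannableString with setSpan, followed by a single setText call, instead of creating a new SpannableString and calling setText once per keyword.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordSpans {

    /** Hypothetical value type: a text range and the color it should get. */
    static final class ColorSpan {
        final int start;
        final int end;
        final int color;

        ColorSpan(int start, int end, int color) {
            this.start = start;
            this.end = end;
            this.color = color;
        }
    }

    /** Finds every keyword occurrence in one scan of the text. */
    static List<ColorSpan> findSpans(String text, Map<String, Integer> keywordColors) {
        List<ColorSpan> spans = new ArrayList<>();
        StringBuilder alternation = new StringBuilder();
        for (String keyword : keywordColors.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(keyword)); // treat "(", "{" etc. literally
        }
        if (alternation.length() == 0) {
            return spans;
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        while (m.find()) {
            spans.add(new ColorSpan(m.start(), m.end(), keywordColors.get(m.group())));
        }
        return spans;
    }

    public static void main(String[] args) {
        Map<String, Integer> colors = new LinkedHashMap<>();
        colors.put("#include", 0xff21610B); // green
        colors.put("int ", 0xFFB40404);     // red
        colors.put("printf", 0xFF00056f);   // blue
        for (ColorSpan s : findSpans("#include <stdio.h>\nint main() { printf(\"hi\"); }", colors)) {
            System.out.println(s.start + "-" + s.end + " #" + Integer.toHexString(s.color));
        }
    }
}
```

Unlike the original loops, matches found this way never overlap, and where two keywords could start at the same position the one listed first in the map wins.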

Limit EditText to text (A-z) only

I want an editText that only allows text input from A to z, no numbers or other characters. I've found out I have to use InputFilter but I don't understand how this code works.
InputFilter filter = new InputFilter() {
public CharSequence filter(CharSequence source, int start, int end,
Spanned dest, int dstart, int dend) {
for (int i = start; i < end; i++) {
if (!Character.isLetterOrDigit(source.charAt(i))) {
return "";
}
}
return null;
}
};
edit.setFilters(new InputFilter[]{filter});

The code you posted adds a custom filter to the EditText field. It checks whether the character entered is not a letter or digit and, if so, returns an empty string "", which rejects the input. That code is here:
if (!Character.isLetterOrDigit(source.charAt(i))) {
return "";
}
For your needs, you want to change the code slightly to check if the character is NOT a letter. So, just change the call on the Character class to use the static isLetter() method. That will look like this:
if (!Character.isLetter(source.charAt(i))) {
return "";
}
Now, anything that is not a letter will return an empty string.
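One caveat: Character.isLetter() also accepts letters outside the English alphabet, such as 'é'. If "A to z" is meant strictly as ASCII letters, an explicit range check is needed. The sketch below mirrors the filter logic in plain Java so it can be run anywhere; LettersOnlyFilter and its three-argument filter method are made up for illustration and are not the Android InputFilter interface itself.

```java
class LettersOnlyFilter {

    /** True only for the 52 ASCII letters A-Z and a-z. */
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /**
     * Follows the convention of the filter in the question:
     * "" rejects the typed text, null accepts it unchanged.
     */
    static CharSequence filter(CharSequence source, int start, int end) {
        for (int i = start; i < end; i++) {
            if (!isAsciiLetter(source.charAt(i))) {
                return "";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(filter("abc", 0, 3) == null); // letters only: accepted
        System.out.println(filter("ab1", 0, 3) == null); // contains a digit: rejected
    }
}
```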

I haven't actually done it, but check Android's NumberKeyListener. You can find the source code for it here:
http://www.java2s.com/Open-Source/Android/android-core/platform-frameworks-base/android/text/method/NumberKeyListener.java.htm
it does exactly the opposite of what you need, but that should be a good enough starting point.

DocumentListener slows down Document.setCharacterAttributes method?

This is my first question on this site, though it is not the first time I have come here to clear up my doubts. Awesome webpage. :)
I'm writing a java program that highlights code in a JTextPane and I'm changing the way highlights are done. I'm using a JTabbedPane to let the user edit more than one file at the same time and I used to perform document highlights using a Timer, now I've built a highlight queue that runs in a separate thread and implemented a DocumentListener that queues the documents as changes take place.
But I have a really big problem, if I add the document via DocumentListener, the Highlight process takes a really long time while if I add it in the main class by getting the document directly from the JTextPane, it takes just a few milliseconds.
I've performed multiple benchmarks in my code and found out that what takes so much time to be performed when the document is added from the DocumentListener is the method Document.setCharacterAttributes().
Here is the method that adds documents via DocumentListener:
// eventType: 0 - insertUpdate / 1- removeUpdate
private void queueChange(javax.swing.event.DocumentEvent e, int eventType){
StyledDocument doc = (StyledDocument) e.getDocument();
int changeLength = e.getLength();
int changeOffset = e.getOffset();
int length = doc.getLength();
String title = (String) doc.getProperty("title");
String text;
try {
text = doc.getText(0, length);
if (changeLength != 1) {
Element element = doc.getDefaultRootElement();
int startLn = element.getElement(element.getElementIndex(changeOffset)).getStartOffset();
int endLn = element.getElement(element.getElementIndex(changeOffset + changeLength)).getEndOffset() - 1;
Engine.addDocument(doc, startLn, endLn, title, text);
} else {
if(eventType == 1){
changeOffset = changeOffset - changeLength;
}
int startLn = text.lastIndexOf("\n", changeOffset) + 1;
int endLn = text.indexOf("\n", changeOffset);
if (endLn < 0) {
if (length != startLn) {
endLn = length;
Engine.addDocument(doc, startLn, endLn, title, text);
}
} else if (startLn != endLn && startLn < endLn) {
Engine.addDocument(doc, startLn, endLn, title, text);
}
}
} catch (BadLocationException ex) {
Engine.crashEngine();
}
}
If I add a document with 2k lines with this method, it takes ~1900 ms to highlight the whole document, while if I add the document to the highlight queue by using a caret listening method it takes ~500 ms.
Here's a part of the caret listening method that is used to highlight whole documents when they're loaded:
if (loadFile == true) {
isKey = false;
doc = edit[currentTab].Editor.getStyledDocument();
try {
Highlight.addDocument(doc, 0, doc.getLength(),
Scripts.getTitleAt(currentTab), doc.getText(0, doc.getLength()));
} catch (BadLocationException ex) {
ex.printStackTrace();
}
loadFile = false;
}
Note: the Highlight/Engine.addDocument() method has five parameters: (StyledDocument doc,int start, int end, String tabTitle, String docText). Start and end both indicate the region where highlighting is needed.
I will appreciate any help related to this problem cause I've been trying to solve it for a few days and I can't find anything similar on the Internet. :(
Btw, does anyone know the actual difference between Document.setCharacterAttributes and Document.setParagraphAttributes? :P

Maybe you have some kind of recursion in your code that is causing the problem. With the DocumentEvent you should only worry about additions and removals. You don't need to worry about changes since those are attribute changes.
Maybe you add some text which schedules the highlighting, but then when you change the attributes of the text you schedule another highlighting task.

You can try to set a flag indicating whether it's user changes or your API changes. In the beginning of the Engine.addDocument() set the flag to API state and reset it back after changes are done.
In your listener check the flag and skip changes from API.
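A minimal sketch of that flag, using only javax.swing.text classes. The names GuardedListenerDemo, GuardedListener, apiChange and queued are assumptions for the example; the counter stands in for the call to Engine.addDocument().

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

class GuardedListenerDemo {

    static final class GuardedListener implements DocumentListener {
        // True while the highlighter itself is changing the document.
        volatile boolean apiChange = false;
        // Stands in for Engine.addDocument(...): counts queued highlight requests.
        int queued = 0;

        private void queue() {
            if (!apiChange) {
                queued++;
            }
        }

        @Override public void insertUpdate(DocumentEvent e) { queue(); }
        @Override public void removeUpdate(DocumentEvent e) { queue(); }
        @Override public void changedUpdate(DocumentEvent e) { queue(); }
    }

    /** Returns how many events were queued after one user edit and one highlighter edit. */
    static int run() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        GuardedListener listener = new GuardedListener();
        doc.addDocumentListener(listener);
        try {
            doc.insertString(0, "int x = 5;", null); // user edit: queued
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        listener.apiChange = true;
        try {
            doc.setCharacterAttributes(0, 3, bold, false); // highlighter edit: skipped
        } finally {
            listener.apiChange = false; // always reset, even if styling fails
        }
        return listener.queued;
    }

    public static void main(String[] args) {
        System.out.println("queued highlight requests: " + run());
    }
}
```

With a highlight queue running on its own thread, a single shared flag is only safe if user edits cannot interleave with the styling; otherwise the guard would need to be tied to the thread that performs the attribute change.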
You wrote " I use highlights the text by setting the character attributes of a portion of the Document, so the method is not inserting more text". I'm not sure it doesn't insert text. E.g. you have "it's a bold text piece" then you select the "bold" and change attributes to bold. Original element is separated and 3 new elements appear. I didn't test it but it might call insertUpdate() and removeUpdate()
does anyone know the actual difference between Document.setCharacterAttributes and Document.setParagraphAttributes?
There are paragraph and char attributes. Char attributes are font size, family, style, colors. Paragraph attributes are alignment, indentation, line spacing.
Actually paragraphs are char elements' parents.
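The difference can be seen on a small document. In this hedged sketch (AttributeKindsDemo is a made-up class), bold is applied as a character attribute to the first word only, while centered alignment is applied as a paragraph attribute and therefore covers the whole first line.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

class AttributeKindsDemo {

    /** Returns {firstWordBold, restOfLineBold, firstParagraphCentered, secondParagraphCentered}. */
    static boolean[] run() {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, "first line\nsecond line", null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }

        // Character attribute: affects exactly the given range ("first").
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        doc.setCharacterAttributes(0, 5, bold, false);

        // Paragraph attribute: affects every paragraph the range touches.
        SimpleAttributeSet centered = new SimpleAttributeSet();
        StyleConstants.setAlignment(centered, StyleConstants.ALIGN_CENTER);
        doc.setParagraphAttributes(0, 1, centered, false);

        return new boolean[] {
            StyleConstants.isBold(doc.getCharacterElement(2).getAttributes()),
            StyleConstants.isBold(doc.getCharacterElement(8).getAttributes()),
            StyleConstants.getAlignment(doc.getParagraphElement(8).getAttributes())
                    == StyleConstants.ALIGN_CENTER,
            StyleConstants.getAlignment(doc.getParagraphElement(15).getAttributes())
                    == StyleConstants.ALIGN_CENTER
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```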
