I have a couple of HTML pages in my assets folder. I am able to open them and read them into a string. My problem comes after that: I just want to extract the text between certain tags. For example, suppose a line in my HTML page is <h3>Hello have a nice day</h3>.
I just want to get "Hello have a nice day". So far I have tried String functions, but with no success. How can I achieve this?
UPDATE
I got the solution from a link:
Use Html.fromHtml() and pass it the HTML source. It returns a Spanned, and calling toString() on that gives only the text.
See http://developer.android.com/reference/android/text/Html.html
If you are able to read the HTML files, the rest should be easy. If it's a simple HTML page, you can use XPath to parse it and retrieve whatever you want, or you can use a library such as jsoup to parse the HTML.
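As a rough illustration of the XPath route, here is a minimal sketch that uses only the JDK's built-in XML parser. It assumes the page is well-formed XML (XHTML); real-world HTML often is not, in which case a lenient parser such as jsoup is the safer choice. The class name TagTextExtractor and the method textOf are hypothetical names made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: only works when the input is well-formed XML (XHTML).
final class TagTextExtractor {

    // Returns the trimmed text content of every element with the given tag name.
    static List<String> textOf(String xhtml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // "//h3" selects every h3 element anywhere in the document.
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//" + tagName, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            // Parsing fails here for tag soup that is not valid XML.
            throw new IllegalArgumentException("Could not parse or query the input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><h3>Hello have a nice day</h3><p>Other</p></body></html>";
        System.out.println(textOf(page, "h3"));
    }
}
```

The list form is deliberate: a page may contain several h3 elements, and the caller can pick the one it needs.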
My website is similar to Stack Overflow and is under development. I am using a plain textarea to take text input, as I do not have a WMD editor like Stack Overflow's.
When I take HTML code as input and store it in a database table in a text or nvarchar(max) column, it is stored successfully. But when I fetch that data for display, the browser renders the corresponding HTML page instead of showing the HTML code on screen. I am not able to resolve it. For better understanding, I'm including images of the input page and output page of my website.
This is image of input page:
This is the image of output page:
What is going wrong here?
One easy way is to replace
< with &lt; and > with &gt;
in the HTML string that you retrieved, and then display it on the page.
Have you tried that?
You need to escape the HTML so it's not interpreted by the browser. How to do that depends on the view technology you're using.
With JSP and JSTL the escaping is automatically done with <c:out value="${myString}"/>. If you're not using JSTL yet, now's the time to start (there's a lot of other helpful things in there too).
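If no view-layer helper is available, the escaping can be done by hand. The following is a minimal sketch in plain Java, not a replacement for a vetted library; HtmlEscaper and escape are hypothetical names chosen for this example. It handles the ampersand as well as the angle brackets, so text that already contains something like &lt; is still shown literally.

```java
// Hypothetical helper: escapes the characters that a browser would
// otherwise interpret as markup.
final class HtmlEscaper {

    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped string is what should be written into the page.
        System.out.println(escape("<h3>Tom & Jerry</h3>"));
    }
}
```

Escaping at display time rather than before storage keeps the original input intact in the database, which is one reason to prefer it.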
You can save the HTML code just like text. You can use a varchar(max) column to store the HTML code in the table. How the code is displayed depends on the browser. But if you use the nvarchar type, that will cause problems in display.
Another possible solution is to replace the HTML tags before storing the text in the database. What I did is:
text = text.replaceAll("<", "&lt;");
text = text.replaceAll(">", "&gt;");
and then I stored the text in the database, and it is working. Thanks to Bibin Mathew.
community!
My project is simple: I have a link to a website that has information on different chemical substances, and I want to extract some data and put it into a PDF. The thing is that I want to keep the formatting of the original HTML (using its CSS, of course).
Example of substance: http://www.molbase.com/en/msds_1659-31-0-moldata-2.html#tabs
I used jsoup to read the HTML of the table at the bottom of the page, the MSDS one, which contains multiple sections with different information about the substance, but I really don't know how to save the exact HTML format into my PDF file. I have tried iText too, but it gives me a "missing ending tag" error, and even if it worked, it would print the full page, not only that MSDS table.
Here is what I have tried, but it isn't effective:
Document docu = Jsoup.connect(urlbun).get();
Element tableHeader = docu.select("div[class=\"msds\"]")
.first();
String[] finSyn = tableHeader.text().split(" ");
String moreText =" ";
I tried to split the text that the webpage has under that div (class="msds"), but I cannot find a good way to split it.
Could you please give me a hint on what to do? Even if the formatting is not the same, I would like to be able to display the information in the same way, with indentation and such.
Thank you!
You can put the content that you want to convert to PDF inside an element with a CSS ID (such as a DIV) and then use the PDFmyURL API to convert only that section to PDF.
Please refer to this on our website about how to select pieces from a page to convert to PDF
Disclosure: I work for the company that owns this site
I want the part of the HTML file that gets highlighted when you inspect an element. I just need it as a string to save, so that a user chooses which element's code to save just by clicking. Is this possible? Thanks
Edit: This is also not for a website I control, I need it for any website the user goes to.
The data URI will serve the purpose.
test
When it is clicked, a new page will come up with the content sdf.
You may use window.btoa(content) to generate the URI dynamically.
Yes, you will have to use ajax and json or jquery for saving a part of the html code to the data.
Using Jsoup, I want to extract all paragraphs from an HTML page, i.e. whatever is between <p> and </p>.
How do I accomplish this?
Can't you just do:
myDocument.getElementsByTag("p")
JSoup getElementsByTag
You can then iterate over the returned elements and get their data/text/ownText / whatever you think is most relevant for what you want to do.
JSoup Element.text()
I'm trying to display HTML in a JEditorPane. Initially the content type is set to "text/html".
When I use setPage(URL) it works fine and the resulting output is displayed, but if I have a String that contains HTML code and I use setText(String) to display it in the JEditorPane, nothing is displayed; I see only white space.
Of course, if I copy what's in the string, paste it into Notepad, save it as .html, and open the resulting file in a browser, it displays correctly. The real problem is in how the JEditorPane processes the string in order to display what's inside it. The JEditorPane is inside a JScrollPane, which is inside a JFrame, and I only used the setContentType("text/html") and setText(String) methods for HTML display.
Is there any way to get around this other than writing the resulting HTML code to a file and using setPage(URL)? I can post the HTML code if you need it (but it's quite large). Thanks for your help.
I don't know why setText does not work, but here is a workaround.
Try this URL (the whole file is in the URL). (This is what Android's WebView calls when you setText in it)
data:text/html;charset=utf-8,%3C%21DOCTYPE%20html%3E%0D%0A%3Chtml%20lang%3D%22en%22%3E%0D%0A%3Chead%3E%3Ctitle%3EEmbedded%20Window%3C%2Ftitle%3E%3C%2Fhead%3E%0D%0A%3Cbody%3E%3Ch1%3E42%3C%2Fh1%3E%3C%2Fbody%3E%0A%3C%2Fhtml%3E%0A%0D%0A
It starts with data:text/html;charset=utf-8, and is followed by your HTML.
However, you do have to encode it. At the very least you have to replace % with %25. The rest might just work without encoding, though.
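To make the encoding step concrete, here is a minimal Java sketch that percent-encodes a whole HTML string into a data URI of the form shown above. HtmlDataUri and its method of are hypothetical names. The sketch assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. It only builds the string; whether a given component such as JEditorPane.setPage accepts a data: URL is not something it verifies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a percent-encoded text/html data URI.
final class HtmlDataUri {

    static String of(String html) {
        // URLEncoder targets form encoding, so it writes spaces as '+'.
        // A data URI needs %20 instead. A literal '+' in the input has
        // already become %2B, so the replace cannot corrupt it.
        String encoded = URLEncoder.encode(html, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return "data:text/html;charset=utf-8," + encoded;
    }

    public static void main(String[] args) {
        System.out.println(of("<h1>42</h1>"));
    }
}
```

This encodes more characters than strictly necessary, which keeps the result safe at the cost of a longer string.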
You can also use a data URI to embed images without referencing a separate file:
<img src="" />
You just have to base64 encode your image and then you can paste it right in.
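The base64 step can be sketched in Java as follows. ImageDataUri, of, and imgTag are hypothetical names for this example, and the MIME type is supplied by the caller because the sketch does not try to detect the image format.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper: turns raw image bytes into a base64 data URI.
final class ImageDataUri {

    // mimeType is an assumption supplied by the caller, e.g. "image/png".
    static String of(byte[] imageBytes, String mimeType) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + base64;
    }

    // Wraps the data URI in an img tag that can be pasted into the HTML.
    static String imgTag(byte[] imageBytes, String mimeType) {
        return "<img src=\"" + of(imageBytes, mimeType) + "\" />";
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            // Usage: pass the path of a PNG file as the only argument.
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(imgTag(bytes, "image/png"));
        }
    }
}
```

Base64 output is roughly a third larger than the original bytes, so this approach suits small images better than large ones.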