Generate HTML from plain text using Java - java

I have to convert a .log file into a nice and pretty HTML file with tables. Right now I just want to get the HTML header down. My current method is to println every single line of the HTML file to the output file. For example:
p.println("<html>");
p.println("<script>");
and so on. There has to be a simpler way, right?

How about using a JSP scriptlet and JSTL? You could create a custom object that holds all the important information and display it formatted using the Expression Language.

Printing raw HTML text as strings is probably the "easiest" (most straightforward) way to do what you're asking but it has its drawbacks (e.g. properly escaping the content text).
You could use the DOM (e.g. Document et al) interface provided by Java but that would hardly be "easy". Perhaps there are "DOM builder" type tools/libraries for Java that would simplify this task for you; I suggest looking at dom4j.
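To illustrate the escaping drawback mentioned above, here is a minimal sketch of the plain-string approach using only the standard library. The class and method names (LogTableSketch, escapeHtml, toHtmlTable) and the one-column table layout are assumptions made for illustration, not something from the question.

```java
import java.util.Arrays;
import java.util.List;

class LogTableSketch {

    // Escape the characters that have special meaning in HTML text.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Assumed layout: a one-column table with one row per log line.
    static String toHtmlTable(List<String> logLines) {
        StringBuilder html = new StringBuilder("<table>\n");
        for (String line : logLines) {
            html.append("  <tr><td>").append(escapeHtml(line)).append("</td></tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical log lines; a real program would read them from the .log file.
        System.out.print(toHtmlTable(Arrays.asList("INFO started", "WARN value <null> & retry")));
    }
}
```

Building the markup in a StringBuilder and writing it out once keeps the escaping in one place, which is the part that is easy to forget with one println per line.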

Look at this Java HTML Generator library (easy to use). It should make generating the actual HTML muuuch clearer. There are complications when creating HTML with Java Strings (what happens if you want to change something like a rowspan?) that can be avoided with this library. Especially when dealing with tables.

There are many templating engines available. Have a look at https://stackoverflow.com/questions/174204/suggestions-for-a-java-based-templating-engine
This way you can define a template in a text file and have the Java code fill in the variables.

Related

Java add attribute to HTML tags without changing formatting

I have a task to make a Maven plugin which takes HTML files in a certain location and adds a service attribute to each tag that doesn't have it. This is done on the source code, which means my colleagues and I will have to edit those files further.
As a first solution I turned to Jsoup which seems to be doing the job but has one small yet annoying problem: if we have a tag with multiple long attributes (we often do as this HTML code is a source for further processing) we wrap the lines like this:
<ui:grid id="category_search" title="${handler.getMessage( 'title' )}"
class="is-small is-outlined is-hoverable is-foldable"
filterListener="onApplyFilter" paginationListener="onPagination" ds="${handler.ds}"
filterFragment="grid_filter" contentFragment="grid_contents"/>
However, Jsoup turns this into one very long line:
<ui:grid id="category_search" title="${handler.getMessage( 'title' )}" class="is-small is-outlined is-hoverable is-foldable" filterListener="onApplyFilter" paginationListener="onPagination" ds="${handler.ds}" filterFragment="grid_filter" contentFragment="grid_contents"/>
This is bad practice and a real pain to read and edit.
So is there any other not very convoluted way to add this attribute without parsing and recomposing HTML code or maybe somehow preserve line breaks inside the tag?
Unfortunately JSoup's main use case is not to create HTML that is read or edited by humans. Specifically JSoup's API is very closely modeled after DOM which has no way to store or model line breaks inside tags, so it has no way to preserve them.
I can think of only two solutions:
Find (or write) an alternative HTML parser library, that has an API that preserves formatting inside tags. I'd be surprised if such a thing already exists.
Run the generated code through a formatter that supports wrapping inside tags. This won't preserve the original line breaks, but at least the attributes won't be all on one line. I wasn't able to find a Java library that does that, so you may need to consider using an external program.
It seems there is no good way to preserve breaks inside tags while parsing them into POJOs (or I haven't found one), so I wrote a simple tokenizer which splits incoming HTML string into parts sort of like this:
String[] parts = html.split( "((?=<)|(?<=>))" );
This uses regex lookarounds (a lookahead and a lookbehind) to split before < and after >. Then just iterate over the parts and decide for each one whether to insert the attribute or not.
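The iteration step could look roughly like the sketch below. Only the split expression comes from the answer; the helper names (isOpeningTag, hasAttribute, insertBeforeClose) and the attribute value are hypothetical. It assumes attribute values never contain < or >, since the split would cut such a tag in two.

```java
import java.util.regex.Pattern;

class AttributeInserter {

    // Adds attr="value" to every opening tag that lacks it; all other text is copied unchanged.
    static String addAttribute(String html, String attr, String value) {
        // Split before every '<' and after every '>' so tags and text become separate parts.
        String[] parts = html.split("((?=<)|(?<=>))");
        StringBuilder out = new StringBuilder(html.length());
        for (String part : parts) {
            boolean needsAttr = isOpeningTag(part) && !hasAttribute(part, attr);
            out.append(needsAttr ? insertBeforeClose(part, attr, value) : part);
        }
        return out.toString();
    }

    // Opening or self-closing tag: starts with '<' plus a letter and ends with '>'.
    // Closing tags, comments and doctypes start with "</" or "<!" and are skipped.
    static boolean isOpeningTag(String part) {
        return part.length() > 2
                && part.charAt(0) == '<'
                && Character.isLetter(part.charAt(1))
                && part.charAt(part.length() - 1) == '>';
    }

    // Simplification: looks for whitespace, the attribute name, then '='.
    // The same text inside another attribute's value would also match.
    static boolean hasAttribute(String tag, String attr) {
        return Pattern.compile("\\s" + Pattern.quote(attr) + "\\s*=").matcher(tag).find();
    }

    // Inserts the attribute just before the closing ">" or "/>".
    static String insertBeforeClose(String tag, String attr, String value) {
        int end = tag.endsWith("/>") ? tag.length() - 2 : tag.length() - 1;
        return tag.substring(0, end) + " " + attr + "=\"" + value + "\"" + tag.substring(end);
    }
}
```

Because every part is appended back verbatim except for the inserted attribute, line breaks inside a wrapped tag survive untouched.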

Writing in HTML using Java

I have an HTML file already, and I have a Java file that contains methods returning integers.
I want to print those integers in the HTML. Is it possible to transfer the data from Java to HTML, and what is the simplest way?
You can write HTML templates and fill them in with Java.
For example:
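String html = "<div>{{key_name}}</div>";
String value = String.valueOf(42);
String key = "key_name";
// replace() treats the placeholder as literal text; replaceAll() would parse "{{" as regex syntax and throw
html = html.replace("{{" + key + "}}", value);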
You could just write it in JavaScript. Link your HTML file to a JavaScript file, or write the code inside script tags. It's not the same language, obviously, but basic syntax such as returning numbers is very similar, so there is less chance of error and mistakes are easier to fix. It's hard to say without seeing how much Java code you have written, though. If you are interested in web development, JavaScript is extremely important to learn, and if you know Java it should not be too difficult to pick up the basics. That said, JavaScript is very finicky.

Parsing html text to obtain input fields

So I currently have a big blob of html text, and I want to generate an input form based on what is contained in that text. For example, if the text contains '[%Name%]', I want to be able to read that in and recognize 'Name' is there, and so in turn enable a form field for name. There will be multiple tags ([%age%], [%height%], etc.)
I was thinking about using Regex, but after doing some research it seems that Regex is a horrible idea to parse html with. I came across parsing html pages with groovy, but it is not strictly applicable to my implementation. I am storing the html formatted text (which I am creating using ckeditor) in a database.
Is there an efficient way to do this in Java/Groovy? Or should I just create an algorithm similar to the examples shown here? I'm not too sure how effective the given algorithms would be, as they seem to be built around relatively small strings, whereas my string to parse may end up being quite large (a 15-20 page document).
Thanks in advance
Instead of reinventing the wheel, I think it's better to use jsoup. It is an excellent tool for your task, and it is easy to obtain anything in an HTML page using its selector syntax. Check out the usage examples in their cookbook.
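As an alternative to jsoup (so this is not the approach recommended above): the [%Name%] markers are plain text tokens rather than HTML structure, so a regex scan over the raw string may be enough to list them. A minimal sketch, assuming the names consist of word characters and that the editor never splits a marker with inline tags; the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderScanner {

    // Matches [%Name%]; the capture group holds the name between the markers.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[%\\s*(\\w+)\\s*%\\]");

    // Returns each distinct placeholder name in order of first appearance.
    static Set<String> findPlaceholders(String html) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PLACEHOLDER.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

The scan is a single pass over the string, so a 15-20 page document should not be a concern for this part.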

HTML to ODT – XSLT?

I'm trying to convert single pieces of HTML code to the XML Format the *.odt format (Open Office) is using. For example, <p>This is some text</p> should be translated to <text:p>This is some text</text:p>. Of course, this should also work with lists etc.
I'm not sure whether the best way to go would be to use an XSLT processor (and if so, which one for Java?) and create the stylesheet myself. Isn't there a Java library out there that can already do this?
I'm using jodconverter to go from ODT->PDF, but even though OpenOffice Writer can handle copy&pasting the content and display it in the desired way, jodconverter doesn't seem to be able to "translate" single pieces of HTML (or am I wrong about that?).
Any ideas and suggestions would be very welcome. I should add that I'm absolutely new to Java. Thanks in advance
Ingo
XSLT is the best way to do it. The OpenDocument group is working on an HTML-to-ODT XSL template. Sadly, it is not ready yet.
You can check their website to stay up to date (and maybe get access to beta work).
Otherwise, there are unofficial projects, also based on XSLT, like this one.
It would be easy to apply a small transformation to your HTML to get valid XHTML before processing it to ODT.
Or just check this other example.
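On the "which processor for Java" part of the question: the JDK ships an XSLT 1.0 processor behind javax.xml.transform, so no extra library is needed to try this out. Below is a minimal sketch that maps <p> to <text:p> as in the question's example. The one-rule stylesheet is a hypothetical illustration, not the OpenDocument group's template, and the input must already be well-formed XHTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class HtmlToOdtSketch {

    // Hypothetical stylesheet with a single rule; lists, spans, etc. would need their own templates.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\"\n"
        + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n"
        + "    xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">\n"
        + "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>\n"
        + "  <xsl:template match=\"p\">\n"
        + "    <text:p><xsl:apply-templates/></text:p>\n"
        + "  </xsl:template>\n"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a well-formed XHTML fragment and returns the result as a string.
    static String transform(String xhtmlFragment) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtmlFragment)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<p>This is some text</p>"));
    }
}
```

The output element carries the text namespace declaration on it, because the fragment is serialized on its own rather than inside a full ODT content.xml.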

How to detect different data types inside HTML page?

What is the best way to detect data types inside an HTML page using Java facilities (DOM API, regexp, etc.)?
I'd like to detect types the way the Skype plugin does for phone/Skype numbers, and similarly for addresses, emails, times, etc.
'Types' is an inappropriate term for the kind of information you are referring to. Choice of DOM API or regex depends upon the structure of information within the page.
If you know the structure (for example, tables are used to display the information and you already know which cell holds the phone number and which the email address), it makes sense to go with a DOM API.
Otherwise, you should use regex on plain HTML text without parsing it.
I'd use regexes in the following order:
Extract only the BODY content
Remove all tags to leave just plain text
Match relevant patterns in text
Of course, this assumes that markup isn't providing hints, and that you're purely extracting data, not modifying page context.
Hope this helps,
Phil Lello
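The three regex steps listed above can be sketched with java.util.regex as follows. The email pattern is a deliberately simple assumption for illustration, the class and method names are hypothetical, and the tag-stripping step does not remove the contents of script or style elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageDataScanner {

    // Step 1: capture everything between <body ...> and </body>.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Step 2: anything that looks like a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Step 3: simplified email pattern (an assumption, not a full address grammar).
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    static List<String> findEmails(String html) {
        Matcher body = BODY.matcher(html);
        // Fall back to the whole input if there is no body element.
        String content = body.find() ? body.group(1) : html;
        // Replace tags with a space so words on either side of a tag do not merge.
        String text = TAG.matcher(content).replaceAll(" ");
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

Phone numbers, times and addresses would each need their own pattern in step 3; the first two steps stay the same.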
