I need to format Java code to put into a Word document. Are there any programs that will do this with keyword highlighting, etc.?
When I copy/paste from my IDE (Eclipse), the formatting comes along for the ride.
You'll probably want to turn off "Mark Occurrences" first.
This is a late reply but since it's quite a specific requirement I'll post my comment anyway.
You can do this programmatically with Docmosis, assuming you want the program to run in Java (not just show Java in documents) and can install OpenOffice where the program runs. The process would be:
Create a doc or odt file that will act as a template (setting fonts, position, tables etc) and will have a placeholder for where you want to insert the code sample
Add Docmosis to your Java project and write the code to initialise Docmosis, register the template, then render the document with your selected Java code.
Currently, Docmosis FieldRenderers can underline or italicize your data as it goes, but the rendering is applied to the entire field. So this wouldn't let you have a single field for all your Java text and individually highlight words, but there are a few other tricks that you could employ to get useful/interesting results (such as splitting your data into separate fields and letting Docmosis render the fields differently).
The "java code" text that you specify as data will be inserted into your template using the font and layout properties in the template. The renderer will have a chance to override specific formatting.
You can just copy it and paste it into the Word document. I am using OS X as well, and it just works fine. I am uploading a screenshot of how it looks in Word.
I'm using Easy Code Formatter as called out here: How do you display code snippets in MS Word preserving format and syntax highlighting?
It's an Office add-in. You can select from multiple themes, enable or disable line numbering, and highlight lines in rectangles. It lets you select the coding style and has a quick formatting button. Pretty neat.
Requires you to have Office 2013 or beyond.
I am trying to turn some existing PDFs into templates.
Because these documents hold real data, I am replacing that data, such as names and addresses, with dummy placeholders.
Examples
[[Name]]
[[Address1]]
When I replace the text programmatically with the iText version 5 library, I can use the template.
To speed things up I tried to use Adobe DC.
When using this method the template stops working.
Any ideas?
From what I understand of your question, you want to:
have (or create) a template document
fill in the template with data from a program
turn the result back into a PDF
You can easily achieve some of your goals with iText.
I suggest you look into http://developers.itextpdf.com/examples/form-examples/clone-filling-out-forms
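Independent of any PDF library, the "fill in the template" step for [[Name]]-style placeholders can be sketched in plain Java. This is only an illustration of the substitution logic on a string: PlaceholderFiller and fill are made-up names, it is not iText code, and it assumes the placeholder text is available as one contiguous string, which may not hold for text stored inside a PDF.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces [[Key]] placeholders in a plain string.
final class PlaceholderFiller {
    // Matches [[Name]], [[Address1]], ... and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[\\[(\\w+)\\]\\]");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown keys are left untouched so missing data stays visible.
            String replacement = values.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling PlaceholderFiller.fill("Dear [[Name]]", values) with a map containing Name gives back the text with the real name in place; for actual PDFs, form fields as in the linked iText example are probably the more robust route.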
Within a Java application, I need to convert Markdown text into simple plain text instead of HTML (for example, dropping all link addresses and the bold and italic markers).
What is the best way to do this? I was thinking of using a Markdown library like flexmark, but I can't find this feature at first sight. Is it there? Are there better alternatives?
Edit
Commonmark supports rendering to text, by using org.commonmark.renderer.text.TextContentRenderer instead of the default HTML renderer. Not sure what it does with newlines, but worth a try.
Original answer, using flexmark HTML + JSoup
The ideal solution would be to implement a custom Renderer for flexmark, but this would force you to write a model-to-string conversion for every language feature in Markdown. flexmark may support this out of the box, but I'm not aware of such a feature...
A simpler solution may be to use flexmark (or any other lightweight markdown renderer) and let it create the HTML. After that, just run the generated HTML through https://jsoup.org/ and let it extract the text:
String text = Jsoup.parse(html).text(); // html: the HTML string produced by the Markdown renderer
String org.jsoup.nodes.Element.text()
Gets the combined text of this element and all its children. Whitespace is normalized and trimmed.
For example, given HTML <p>Hello <b>there</b> now! </p>, p.text() returns Hello there now!
We use this approach to get a "preview" of the text entered in a rich content editor (summernote), after being sanitized with org.owasp.html.HtmlSanitizer.
flexmark also has a Markdown-to-text feature.
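If pulling in a library is not an option and the input is simple, a regex-based sketch can drop link addresses and bold/italic markers directly. This is not how commonmark or flexmark work, just a rough stand-in: MarkdownStripper is a made-up name, and it will mangle things a real parser handles correctly (code spans, nested emphasis, identifiers with underscores).

```java
import java.util.regex.Pattern;

// Hypothetical helper: crude Markdown-to-text conversion with regular expressions.
final class MarkdownStripper {
    // ![alt](url) -> alt
    private static final Pattern IMAGE = Pattern.compile("!\\[([^\\]]*)\\]\\([^)]*\\)");
    // [text](url) -> text
    private static final Pattern LINK = Pattern.compile("\\[([^\\]]*)\\]\\([^)]*\\)");
    // **text** or __text__ -> text
    private static final Pattern BOLD = Pattern.compile("(\\*\\*|__)(.+?)\\1");
    // *text* or _text_ -> text
    private static final Pattern ITALIC = Pattern.compile("(\\*|_)(.+?)\\1");
    // Leading "# ", "## ", ... at the start of a line
    private static final Pattern HEADING = Pattern.compile("(?m)^#{1,6}\\s+");

    static String toPlainText(String markdown) {
        String s = IMAGE.matcher(markdown).replaceAll("$1");
        s = LINK.matcher(s).replaceAll("$1");
        s = BOLD.matcher(s).replaceAll("$2");   // bold before italic, so ** is not read as two *
        s = ITALIC.matcher(s).replaceAll("$2");
        s = HEADING.matcher(s).replaceAll("");
        return s;
    }
}
```

For anything beyond short, well-behaved snippets, a parser-based renderer is the safer choice.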
Is it possible to get the styles of a paragraph in a particular language? For example, my personal computer happens to have a Dutch installation of Microsoft Windows. As a result, the paragraph.getStyles() method returns the Dutch style names: instead of the usual values "heading1", "heading2", etc., I receive values such as "Kop1", "kop2".
I am creating a parser for Word documents which selects certain parts based on style. Does anyone have any experience with this?
I would take a look at the data in the .docx file (it's a zip-file) to verify if the data is written this way by Word already or "transposed" by POI or some local functionality.
If the data is already written by Word you will need to check how you can create the document in a different language in Word.
If not, then if you are using POI 3.13 or newer, you can try to set a different locale via LocaleUtil.setUserLocale() and see if that affects the results.
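To check what is actually stored in the file, you can read word/styles.xml straight out of the zip with the JDK alone. The sketch below lists each style's id next to its name. As far as I know, Word keeps the built-in name (such as "heading 1") in w:name while the w:styleId can be localized (such as "Kop1"), but treat that as an assumption to verify on your own files. StyleNameInspector and its methods are made-up names, and no POI calls are involved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps w:styleId to w:name for every style in styles.xml.
final class StyleNameInspector {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static Map<String, String> styleIdToName(InputStream stylesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(stylesXml);
            Map<String, String> result = new LinkedHashMap<>();
            NodeList styles = doc.getElementsByTagNameNS(W_NS, "style");
            for (int i = 0; i < styles.getLength(); i++) {
                Element style = (Element) styles.item(i);
                NodeList names = style.getElementsByTagNameNS(W_NS, "name");
                if (names.getLength() > 0) {
                    String name = ((Element) names.item(0)).getAttributeNS(W_NS, "val");
                    result.put(style.getAttributeNS(W_NS, "styleId"), name);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse styles.xml", e);
        }
    }

    // Opens the .docx as a zip file and reads its word/styles.xml entry.
    static Map<String, String> readFromDocx(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/styles.xml");
            if (entry == null) {
                throw new IllegalStateException("No word/styles.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return styleIdToName(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the names turn out to be language-neutral in your documents, selecting on the name instead of the id would sidestep the locale issue.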
I have a use case in which I need to programmatically render unformatted text in the format of a given web page, in Java. That is, the text should automatically be formatted like the web page, with styles, paragraphs, bullet points etc.
As I see it, I will first have to analyze the unformatted text to find the candidates for paragraphs, bullet points, headings etc. I intend to use Lucene analyzers/tokenizers for this task. Are there any alternatives?
The second problem is to convert the formatted web page into some kind of template (e.g. velocity template) with place holders for various entities like titles, bullet points etc.
Is there any text analysis/templating library in Java that can help me do this? Preferably open source.
Are there any other suggestions for doing this sort of task in a better way in Java?
Thanks for your help.
There are a lot of hard parts to what you're doing.
The user input
If you don't ask your users to provide any context, you're never going to guess the structure of the text. At the very least, you should ask them to provide a title and a series of paragraphs in your GUI.
Ideally, you could ask them to follow a well-known markup language (Markdown, Textile, etc.) and use an open source parser to extract the structure.
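To give an idea of how little structure is needed, here is a sketch of a parser for a tiny made-up convention: a line starting with "# " is the title, lines starting with "- " are bullet points, and blank lines separate paragraphs. The convention and the SimpleMarkupParser name are assumptions for illustration; a real Markdown or Textile parser would replace this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts title, paragraphs and bullets from lightly marked-up text.
final class SimpleMarkupParser {
    static final class Parsed {
        String title = "";
        final List<String> paragraphs = new ArrayList<>();
        final List<String> bullets = new ArrayList<>();
    }

    static Parsed parse(String text) {
        Parsed result = new Parsed();
        StringBuilder paragraph = new StringBuilder();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("# ")) {
                result.title = line.substring(2).trim();
            } else if (line.startsWith("- ")) {
                result.bullets.add(line.substring(2).trim());
            } else if (line.isEmpty()) {
                // A blank line ends the current paragraph.
                flush(paragraph, result.paragraphs);
            } else {
                // Consecutive text lines are joined into one paragraph.
                if (paragraph.length() > 0) {
                    paragraph.append(' ');
                }
                paragraph.append(line);
            }
        }
        flush(paragraph, result.paragraphs);
        return result;
    }

    private static void flush(StringBuilder paragraph, List<String> target) {
        if (paragraph.length() > 0) {
            target.add(paragraph.toString());
            paragraph.setLength(0);
        }
    }
}
```

The resulting title, paragraphs and bullets are exactly the kind of entities a template can then take as input.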
The external page
If an arbitrary page can be used, the only thing you can rely on is the "structural markup". So, assuming you know the title of the page should be "Hello World", and there is an "h1" element somewhere in the page, you can maybe assume that this is where the header should go.
But if the page is a div tag soup, and only CSS is used to differentiate the rendering of the header from the bulk of the text, you're going to have to guess how the styling is done: that's plain impossible if you don't know how the page is made.
I don't think Lucene would help with this (as far as I know, Lucene is made to create an index of the words used in a body of text; I don't think it can help you guess which part of the text is meant to be a title, a subtitle, etc.).
Generating templates from external page
Assuming you have "guessed" right, you could generate the content by
copy pasting the page
replacing the parts to change with tags of your template language of choice
storing the template somewhere the templating system can access it
configuring your template / view system (viewResolver for Velocity) to use the right template for the right person
That would of course raise serious legal questions, since your templates would incorporate work by the original website author (most probably copyrighted material).
A more realistic solution
I would suggest you constrain your problem to :
using input that has some structure information available (use a GUI to enter it, use a markup language, whatever)
using templates that you provide, know the structure of (and can reuse very easily)
Note that none of those points are related to the template system.
Otherwise, I'm afraid you're heading for an unreasonable amount of work...
I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.
The second part is getting a "reduced" resume back out of the database. In other words, I want to search the uploaded content I now have and only print out new resumes for people who have, let's say, Java programming experience. So I can go into my database, find the people who originally listed Java as a skill, and output a new set of resumes that are still in a nice templated format but contain only the relevant info instead of ALL the content.
I have been writing some software to do this in Java that basically uses a docx template, overwriting the items in the customXML that are bound to the content controls in the doc, so the new data shows up and can be saved as a new docx with that custom data.
This seems really cumbersome to me, and has some limitations. For one, let's say my template has a place for 3 skills, and a particular person has 8 skills. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I don't want to have to go back into my software and edit source code to change that additional data input XML tag to bold instead of italic.
I was reading up on using InfoPath to create a form that I could use to get the input, connecting to some SharePoint data source or something to store the stripped-out data. However, I can't seem to find out whether it is possible, using SharePoint, to get the data back out in a nicely formatted way. What would the general steps for this be? I couldn't find very much about this topic with some quick googling.
Thanks
You could set up the skills:
<skills>
<skill>..</skill>
<skill>..</skill>
</skills>
and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.
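On the Java side, producing the <skills> container for any number of entries could look roughly like the sketch below, using only the JDK's DOM and Transformer classes. SkillsXmlBuilder is a made-up name, and how the resulting XML gets written into the docx custom XML part (for example via docx4j) is left out here.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <skills><skill>..</skill>...</skills> for any number of skills.
final class SkillsXmlBuilder {
    static String build(List<String> skills) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("skills");
            doc.appendChild(root);
            for (String skill : skills) {
                Element element = doc.createElement("skill");
                element.setTextContent(skill); // special characters are escaped on output
                root.appendChild(element);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build skills XML", e);
        }
    }
}
```

Because the formatting lives in the repeated content control, the code only supplies data; 3 skills or 8 skills go through the same loop, and a change from italic to bold stays a template change.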