Generate images for formulas in Java

I'd like to generate an image file showing a mathematical expression, taking a String like "(x+a)^n=∑_(k=0)^n" as input and getting a more (human) readable image file as output. AFAIK something like that is used on Wikipedia, for example. Are there any Java libraries that do that?
Or maybe I'm using the wrong approach. What would you do if the requirement was to enable pasting of formulas from MS Word into an HTML document? I'd ask the users to just take a screenshot themselves, but that would be the lazy way^^
Edit: Thanks for the answers so far, but I really do not control the input. What I get is some messy Word-style formula, not a clean LaTeX-formatted one.
Edit2: http://www.panschk.de/text.tex
Looks a bit like LaTeX, doesn't it? That's what I get when I call
clipboard.getContents(RTFTransfer.getInstance()) after having copied a formula from Word 2007.

First and foremost you should familiarize yourself with TeX (and LaTeX) - a famous typesetting system created by Donald Knuth. Typesetting mathematical formulae is an advanced topic with many opinions and much attention to detail - therefore use something that builds upon TeX. That way you are sure to get it right ;-)
Edit: Take a look at texvc
It can output to PNG, HTML, MathML. Check out the README
Edit #2 Convert that messy Word-stuff to TeX or MathML?

My colleague found a surprisingly simple solution for this very specific problem: When you copy formulas from Word 2007, they are also stored as "HTML" in the clipboard. As representing formulas in HTML isn't easy either, Word just creates a temporary image file on the fly and embeds it into the HTML code. You can then simply take the temporary formula image and copy it somewhere else. Problem solved ;)
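A minimal sketch of the extraction step, assuming the clipboard HTML has already been read into a String and that Word references the temporary image through an ordinary img tag. The class name, method name, and sample path are made up for illustration; the real fragment Word produces may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClipboardFormulaImages {

    // Matches the src attribute of an <img> tag, quoted with " or '.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every img src value found in the given HTML fragment, in document order. */
    static List<String> extractImageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }

    public static void main(String[] args) {
        // Hypothetical fragment; the actual clipboard HTML and temp path will differ.
        String html = "<p><img width=93 height=24 "
                + "src=\"file:///C:/Temp/msohtmlclip1/01/clip_image002.gif\"></p>";
        for (String src : extractImageSources(html)) {
            System.out.println(src);
        }
    }
}
```

If the src turns out to be a file: URI, the file it points to can then be copied elsewhere, for example with java.nio.file.Files.copy.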

What you're looking for is LaTeX.
MiKTeX is a nice little application for churning out images using LaTeX.
I'd like to look into creating them on-the-fly though...

Steer clear of LaTeX. Seriously.
Check out JEuclid. It can convert MathML expressions into images.

Related

How to detect mistakes in IRIs in a RDF file?

I am trying to make an RDF corrector. One of the things I specifically want to correct is IRIs. My question: irrespective of the RDF format, is there anything I can do to correct mistakes in an IRI? I understand there can be many kinds of mistakes, but what are the most common ones that I can fix?
I am using ANTLR to make the corrector. I have extended the BaseErrorListener so that it gives out the errors made in the IRI in particular.
In my experience, the errors made in the real world depend on the source. A source may be systematically creating IRIs that contain spaces, or the data may have been copied byte for byte between ISO-8859-1 ("latin") and UTF-8 (the correct format), which corrupts the UTF-8. These low-level errors are best fixed with a text editor on the input file (and by correcting the code that generates them).
Try a few sample IRIs at http://www.sparql.org/iri-validator.html, which prints out warnings and errors, and is the same code as the parsers.
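As a rough illustration of the low-level problems mentioned above, the following sketch flags whitespace, the characters that the Turtle and SPARQL IRIREF grammar does not allow unescaped, and a common sign of UTF-8 bytes having been decoded as ISO-8859-1. The class and method names are hypothetical, the encoding test is only a heuristic, and this is not the code behind the validator linked above.

```java
import java.util.ArrayList;
import java.util.List;

class IriLint {

    // Characters the Turtle/SPARQL IRIREF grammar excludes, besides U+0000..U+0020.
    private static final String FORBIDDEN = "<>\"{}|^`\\";

    /** Returns a human-readable description of each suspicious character in the IRI. */
    static List<String> findProblems(String iri) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c <= 0x20) {
                problems.add("space or control character at index " + i);
            } else if (FORBIDDEN.indexOf(c) >= 0) {
                problems.add("forbidden character '" + c + "' at index " + i);
            } else if (c == 0xFFFD) {
                // The Unicode replacement character usually means the bytes were not valid UTF-8.
                problems.add("replacement character at index " + i);
            } else if ((c == 0xC2 || c == 0xC3) && i + 1 < iri.length()
                    && iri.charAt(i + 1) >= 0x80 && iri.charAt(i + 1) <= 0xBF) {
                // Heuristic: a UTF-8 lead byte followed by a continuation byte,
                // each decoded as a separate ISO-8859-1 character.
                problems.add("possible UTF-8 read as ISO-8859-1 at index " + i);
            }
        }
        return problems;
    }

    /** One possible automatic fix: percent-encode literal spaces. */
    static String encodeSpaces(String iri) {
        return iri.replace(" ", "%20");
    }

    public static void main(String[] args) {
        String iri = "http://example.org/my file";
        System.out.println(findProblems(iri));
        System.out.println(encodeSpaces(iri));
    }
}
```

Whether a problem should be fixed automatically or only reported is a judgment call; percent-encoding spaces is fairly safe, while guessing at the intended characters behind an encoding mix-up is not.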

Mahout: converting one large text file to SequenceFile format

I have done a lot of searching on the web for this, but I've found nothing, even though I feel like it has to be somewhat common. I have used Mahout's seqdirectory command to convert a folder containing text files (each file is a separate document) in the past. But in this case there are so many documents (in the 100,000s) that I have one very large text file in which each line is a document. How can I convert this large file to SequenceFile format so that Mahout understands that each line should be considered a separate document? Thank you very much for any help.
Yeah, it is not very apparent or intuitive how to do this, although (lucky for you :P) I have answered that exact question several times here on Stack Overflow, for instance here. Have a look ;)

Create a pdf with text at given coordinates (PDFBox?)

My Situation:
I'm programming in Java
Using a library written by someone at my university, I'm able to read PDFs and create an XML document from them
This XML document contains additional information, e.g. the coordinates of the text in the original document
My Problem:
I would like to recreate the PDF I read, with the content placed at its original coordinates (Again: I have the coordinates)
My Question:
-> Do you know a way to create a PDF and place text in it at given coordinates? <-
I've been doing a lot of research on this lately, but maybe I tried the wrong Google search terms, since I can't find many helpful results. So I thought I might ask here, in the forum where I have found the most help so far in my young "programmer's life" :)
Most of the results I get, even here, are about people trying to get the coordinates, but I already have them.
I heard during a discussion that PDFBox might be able to do this, but I'm also happy to work with any other framework or library that can solve my problem.
Thanks for any help and thoughts you share with me.
Thanks a lot for your comments. In the end I decided on iText, which allowed me to do all my tasks (placing text at absolute coordinates, giving it a background color based on certain criteria) in a quite easy and efficient way.
If someone here is looking for inspiration and has a similar task, check my related post here on Stack Overflow for some code snippets: "How can I add a background color to my (pdf-) text using iText to create it with Java"

Rapid Miner 101

I'm back with a question. I'm playing with RapidMiner for automatic text classification and can't get it to work. I'm getting an error that says, "no example set in the example, offending operator Performance ". Any idea what that is referring to?
In RapidMiner you have to use the converter operators before the data can be used as an example set. So, if you have an output of type 'doc', for example, you have to use the 'Documents to Data' operator in order to link it to the next input 'exa'. That's all!
Could you provide more details about your RapidMiner text mining process?
Without more context, your question is difficult to answer.
For more help with RapidMiner, you may want to check out the RapidMiner user forum: http://forum.rapid-i.com/
At RapidMiner Resources, you can find RapidMiner tutorial videos about how to do text mining with RapidMiner:
http://rapidminerresources.com/index.php?page=text-mining-3
Rapid-I also offers a 90-minute text mining webinar. You can find it on the Rapid-I web page under "services" and "training" or in the web shop.
I hope these links help you to get started with automatic text classification with RapidMiner. If you provide more details about your RapidMiner text mining process, I may also be able to directly answer your question.
If it says that there is no Example Set, then the issue is probably with your original data. Can you post an image of your process?
For instance, make sure that you have connected the initial input to your operator. Between which two operators does the error occur?
One thought: the example set in text mining is usually your document collection, but if you are really using documents (PDF, Word) then your format will be Documents (Doc), and you may need to transform your documents to data (Documents to Data operator). Then you should have an Example Set that you can feed into your Performance operator.
Hope this helps - as the earlier comment said, without knowing the process, it is hard to tell exactly where the error is.

Java implementation for LDPC codes

Is there any open source Java implementation of LDPC (Low Density Parity Check) codes? I found only MATLAB code.
My scenario: I will take a text file, divide it into blocks, and delete some of the data. Using LDPC codes, I then need to recover the deleted data.
Thanks.
I haven't tried this but the code here should get you started
http://www.cs.utoronto.ca/~radford/ftp/LDPC-2006-02-08/install.html
http://www.cs.utoronto.ca/~radford/ftp/LDPC-2006-02-08/examples.html
It's in C though. Might be easy to port. Or not.
I'd suggest looking into ways of calling MATLAB functions from Java. I know there are a couple. Also, why LDPC? While it's one of the best FEC (forward error correction) schemes, it involves lots of matrix manipulation if I recall correctly. This is stuff much better suited for mat[rix]lab. The right tool for the right job...
There are also these two pure Java implementations:
https://github.com/a4a881d4/ldpc-java
https://github.com/pierroweb/LDPC-correcting-codes
I haven't tested them and would appreciate feedback from anyone else that has.
There's also a Java wrapper around a C++ library: http://cpham.perso.univ-pau.fr/MULTICAST/Java_wrapper_for_LDPC.html
Not the most promising results, but something to start from, at the very least.
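To give a feel for the recovery scenario in the question (some data is deleted, and parity checks restore it), here is a small sketch of iterative erasure decoding, often called peeling. It is not taken from any of the libraries linked above, and the parity-check matrix is the small Hamming (7,4) matrix used purely as a stand-in; a real LDPC code uses a much larger, sparse matrix. All names are illustrative.

```java
import java.util.Arrays;

class ErasurePeelingDecoder {

    static final int ERASED = -1;

    /**
     * Fills in erased bits using parity checks. Each row of h is one check:
     * the bits at the positions where the row holds 1 must XOR to 0, so a check
     * with exactly one erased bit determines that bit. Returns a copy of the
     * input; bits that cannot be resolved remain ERASED.
     */
    static int[] decode(int[][] h, int[] received) {
        int[] bits = Arrays.copyOf(received, received.length);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int[] check : h) {
                int erasedIndex = -1;
                int erasedCount = 0;
                int parity = 0;
                for (int j = 0; j < check.length; j++) {
                    if (check[j] == 0) {
                        continue;
                    }
                    if (bits[j] == ERASED) {
                        erasedIndex = j;
                        erasedCount++;
                    } else {
                        parity ^= bits[j];
                    }
                }
                if (erasedCount == 1) {
                    // The missing bit must make the whole check XOR to 0.
                    bits[erasedIndex] = parity;
                    progress = true;
                }
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Stand-in parity-check matrix (Hamming (7,4)), not an actual LDPC matrix.
        int[][] h = {
            {1, 0, 1, 0, 1, 0, 1},
            {0, 1, 1, 0, 0, 1, 1},
            {0, 0, 0, 1, 1, 1, 1},
        };
        // Codeword 1110000 with bits 0 and 3 erased.
        int[] received = {ERASED, 1, 1, ERASED, 0, 0, 0};
        System.out.println(Arrays.toString(decode(h, received))); // [1, 1, 1, 0, 0, 0, 0]
    }
}
```

The decoder stops when no check has exactly one erased bit left, so some erasure patterns stay unresolved; with this toy matrix, erasing bits 2, 4 and 5 is one such pattern.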
