Create a pdf with text at given coordinates (PDFBox?) - java

My Situation:
I'm programming in Java.
Using a library written by someone at my university, I'm able to read PDFs and create an XML document from them.
This XML document contains additional information, e.g. the coordinates of the text in the original document.
My Problem:
I would like to recreate the PDF I read, with the content placed at its original coordinates (again: I already have the coordinates).
My Question:
-> Do you know a way to create a PDF and place its text at given coordinates? <-
I've been doing a lot of research on this lately, but maybe I'm using the wrong Google search terms, since I can't find many helpful results. So I thought I might ask here, in the forum where I've found the most help so far in my young "programmer's life" :)
Most of the results I get, even here, are about people trying to get the coordinates, but I already have them.
I heard during a discussion that PDFBox might be able to do this, but I'm also happy to work with any other framework or library that can handle my problem.
Thanks for any help and thoughts you share with me.

Thanks a lot for your comments. In the end I decided on iText, which let me do all my tasks (placing text at absolute coordinates, giving it a background color according to certain criteria) quite easily and efficiently.
If someone here is looking for inspiration for a similar task, check my related post here on Stack Overflow for some code snippets: How can I add a background color to my (pdf-) text using iText to create it with Java
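For reference, iText and PDFBox both end up writing the same low-level PDF operators: inside a text object (`BT` to `ET`), `Td` moves to an (x, y) position and `Tj` paints a string, with coordinates in points (1/72 inch) measured from the bottom-left corner of the page. The sketch below writes a one-page PDF directly, without any library, only to show that mechanism; it is not iText or PDFBox code. It assumes ASCII-only text and the built-in Helvetica font at 12 pt, and the `flipY` helper is hypothetical: it only matters if your XML measures y from the top of the page. For real documents a library remains the better choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal one-page PDF writer: each string is drawn at an absolute (x, y)
// position in PDF user space (points, origin at the bottom-left corner).
final class RawPdfText {
    record Placed(double x, double y, String text) {}

    // Assumption: the XML stores y measured from the top of the page,
    // so it must be flipped into PDF's bottom-left coordinate system.
    static double flipY(double yFromTop, double pageHeight) {
        return pageHeight - yFromTop;
    }

    // Backslash and parentheses are special inside a PDF literal string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // PDF numbers must not use scientific notation or locale-specific commas.
    static String num(double v) {
        return String.format(Locale.ROOT, "%.2f", v);
    }

    // Builds the whole file in memory; only ASCII text is supported here.
    static byte[] build(List<Placed> items, double width, double height) {
        StringBuilder content = new StringBuilder();
        for (Placed p : items) {
            // BT/ET = text object, Tf = font and size, Td = move to (x, y), Tj = show string
            content.append("BT /F1 12 Tf ").append(num(p.x())).append(' ').append(num(p.y()))
                   .append(" Td (").append(escape(p.text())).append(") Tj ET\n");
        }
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 " + num(width) + " " + num(height) + "]"
                + " /Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
        };
        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.length; i++) {
            offsets.add(pdf.length()); // ASCII only, so char count == byte offset
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }
        // Cross-reference table: one fixed-width 20-byte entry per object.
        int xrefOffset = pdf.length();
        pdf.append("xref\n0 ").append(objects.length + 1).append('\n')
           .append("0000000000 65535 f \n");
        for (int off : offsets) {
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", off));
        }
        pdf.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n")
           .append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }
}
```

Usage: `Files.write(Path.of("out.pdf"), RawPdfText.build(List.of(new RawPdfText.Placed(72, RawPdfText.flipY(100, 842), "Hello")), 595, 842));` writes a roughly A4-sized page (595 x 842 pt) with "Hello" 72 pt from the left edge and 100 pt below the top edge.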

Related

poi PropertyTemplate.Extent vs BorderExtent

I'm just working with Borders using the POI library (Thank you for the amazing work!) and I've just discovered the PropertyTemplate. While going through the Quick Guide (https://poi.apache.org/spreadsheet/quick-guide.html#DrawingBorders) I wrote up the following:
propertyTemplate.drawBorders(range, borderType, color, extent);
While trying to fill in the extent following the Quick Guide, I see it uses "PropertyTemplate.Extent.ALL" as a constant, for example. When I try to use that, PropertyTemplate.Extent does not exist.
However, I tried "BorderExtent.ALL" and that works. Is it just a typo in the Quick Guide?
I looked for a way to contact them directly about the Quick Guide, but I didn't want to go through the whole mailing list or bug contribution process just for a website update.
Does anyone know:
Is BorderExtent.ALL (or any of the other constants) correct, or should it be PropertyTemplate.Extent.ALL and I'm doing something wrong?
Is there a way to notify the POI team to update the Quick Guide without disturbing too many people?
Thank you!
Alex

Country border coordinate data

I'm writing a geography game in Java, and I'd like some data on the locations of country borders, but all I can find are shapefiles, which I can't get latitude/longitude data out of, or else data with only a single coordinate for each country.
Where can I find
a way to extract the longitude/latitude data into usable data in Java or in a text file?
a website with free country border data that can be used in a Java program?
Edit:
It doesn't need to be exact; for pretty much anything except Russia, China, the U.S., and Brazil, 10 coordinates is probably enough. Islands don't really matter either. I just want to be able to calculate the shortest distance between two countries relatively accurately.
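For the distance itself, one simple approach is to compare every border vertex of one country with every vertex of the other using the haversine (great-circle) formula and keep the minimum. This is a sketch under stated assumptions: points are `{latitude, longitude}` in degrees, the Earth is treated as a sphere with mean radius 6371 km, and only vertex-to-vertex distances are measured, so with as few as 10 points per country the result is a rough upper bound rather than the true closest approach.

```java
import java.util.List;

// Approximate shortest distance between two countries, each given as a
// list of border vertices {latitude, longitude} in degrees.
final class BorderDistance {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (spherical model)

    // Great-circle distance between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dPhi / 2), 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.pow(Math.sin(dLambda / 2), 2);
        // min() guards against h creeping past 1 through rounding
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    // Brute force over all vertex pairs: O(n * m), fine for a few thousand points.
    // Vertex-to-vertex only, so it can overestimate when the closest points
    // lie in the middle of border segments.
    static double minDistanceKm(List<double[]> borderA, List<double[]> borderB) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : borderA) {
            for (double[] q : borderB) {
                best = Math.min(best, haversineKm(p[0], p[1], q[0], q[1]));
            }
        }
        return best;
    }
}
```

One degree of longitude along the equator comes out at about 111.19 km with this radius.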
Download the generalized country borders from here: http://www.baruch.cuny.edu/geoportal/data/esri/esri_intl.htm. These are probably more detailed than you want (Canada has the most vertices, at 3316), but it is the only free, rough border data set I could find online.
To get the coordinates from a shapefile as text:
1. Go to MyGeodata Converter.
2. Run Vector Converter.
3. Upload the zip file you just downloaded.
4. Check available operations.
5. Export to GeoJSON.
6. Download the zip file from MyGeodata Converter.
7. Unzip the file.
Now you have the boundaries in GeoJSON format and can use a GeoJSON parser or a simpler text parser to get the coordinate data.
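If you go the simple-text-parser route, a regular expression can pull the positions out of the GeoJSON without a JSON library. This is a crude sketch, not a GeoJSON parser: it assumes the text holds one country's geometry, and it will also pick up any other array of two or three numbers (for example inside `properties`). GeoJSON stores each position as `[longitude, latitude]`; the sketch swaps them to `{latitude, longitude}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude text-level extraction of GeoJSON positions, without a JSON library.
final class GeoJsonPoints {
    private static final String NUM = "(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)";
    // An innermost bracket holding two numbers, plus an optional third (altitude).
    private static final Pattern POSITION = Pattern.compile(
        "\\[\\s*" + NUM + "\\s*,\\s*" + NUM + "\\s*(?:,\\s*-?[\\d.eE+-]+\\s*)?\\]");

    // Returns {latitude, longitude} pairs; GeoJSON itself stores longitude first.
    static List<double[]> latLonPairs(String geoJson) {
        List<double[]> points = new ArrayList<>();
        Matcher m = POSITION.matcher(geoJson);
        while (m.find()) {
            double lon = Double.parseDouble(m.group(1));
            double lat = Double.parseDouble(m.group(2));
            points.add(new double[] {lat, lon});
        }
        return points;
    }
}
```

Outer brackets such as `[[[` never match because a number must follow the opening bracket, and a four-number `bbox` array is rejected because the pattern expects `]` after at most three numbers.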
If that's too much work, you can also parse shapefiles with one of the various Java shapefile frameworks out there. See Does anyone know of a library in Java that can parse ESRI Shapefiles? for some options.
Download the OSM file from http://planet.openstreetmap.org/ and import it into PostgreSQL + PostGIS (the osm2pgsql tool will do this).
If you need a different database, migrate from PostgreSQL to your desired database.
The MyGeodata converter only throws a 500 error at the moment, so I had to look further and finally stumbled across KML files that roughly describe the outline of each country. I posted the link on this related question, so someone might want to look there.

PDF Parsing tables in java with Pdfbox

I've been looking for an answer for quite a long time, but I haven't found anything.
My problem is parsing a PDF: I have a page made up of some kind of tables.
I've already written some code that extracts information from a specified rectangle, but I declare those rectangle values in the code, so it isn't as dynamic as it should be. I want to find information about the cells, and with that information I'll be able to get the strings I need. I haven't found anything useful in the PDFBox API.
I would be grateful for any tips.
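One common heuristic for finding cells dynamically is to collect every text fragment with its position and then cluster by coordinates: fragments whose y values lie within a small tolerance form a row, and within a row a horizontal gap wider than some threshold starts a new cell. With PDFBox the fragments could be built by overriding `PDFTextStripper.writeString(String, List<TextPosition>)` and reading `getXDirAdj()`, `getYDirAdj()` and `getWidthDirAdj()` from the text positions; the sketch below leaves that part out and works on a hypothetical `Fragment` record. Both tolerances are assumptions you would tune for your documents, and ruling lines and merged cells are ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Groups positioned text fragments into rows and cells by their coordinates.
final class TableCells {
    // Hypothetical input: one text chunk with its left x, its y and its width.
    record Fragment(double x, double y, double width, String text) {}

    static List<List<String>> toCells(List<Fragment> fragments, double rowTolerance, double cellGap) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble(Fragment::y).thenComparingDouble(Fragment::x));

        // 1. Rows: start a new row when y moves beyond the tolerance.
        List<List<Fragment>> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            List<Fragment> current = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            if (current != null && Math.abs(current.get(0).y() - f.y()) <= rowTolerance) {
                current.add(f);
            } else {
                List<Fragment> row = new ArrayList<>();
                row.add(f);
                rows.add(row);
            }
        }

        // 2. Cells: within a row, a gap wider than cellGap starts a new cell.
        List<List<String>> table = new ArrayList<>();
        for (List<Fragment> row : rows) {
            row.sort(Comparator.comparingDouble(Fragment::x));
            List<String> cells = new ArrayList<>();
            StringBuilder cell = new StringBuilder(row.get(0).text());
            double end = row.get(0).x() + row.get(0).width();
            for (Fragment f : row.subList(1, row.size())) {
                if (f.x() - end > cellGap) {
                    cells.add(cell.toString());
                    cell = new StringBuilder(f.text());
                } else {
                    cell.append(' ').append(f.text());
                }
                end = Math.max(end, f.x() + f.width());
            }
            cells.add(cell.toString());
            table.add(cells);
        }
        return table;
    }
}
```

Rows come out in ascending y order, and each row is a list of cell strings, so the rectangles no longer have to be hard-coded.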

Rapid Miner 101

I'm back with a question. I'm playing with RapidMiner for automatic text classification and can't get it to work. I'm getting an error that says "no example set in the example, offending operator Performance". Any idea what that refers to?
In RapidMiner you have to use the converter operators before you can use the data as an example set. So, if you have an output of type 'doc', for example, you have to use the 'Documents to Data' operator to link it to the next input 'exa'. That's all!
Could you provide more details about your RapidMiner text mining process?
Without more context, your question is difficult to answer.
For more help with RapidMiner, you may want to check out the RapidMiner user forum: http://forum.rapid-i.com/
At RapidMiner Resources, you can find RapidMiner tutorial videos about how to do text mining with RapidMiner:
http://rapidminerresources.com/index.php?page=text-mining-3
Rapid-I also offers a 90-minute text mining webinar. You can find it on the Rapid-I web page under "services" and "training", or in the web shop.
I hope these links help you to get started with automatic text classification with RapidMiner. If you provide more details about your RapidMiner text mining process, I may also be able to directly answer your question.
If it says that there is no Example Set, then the issue is probably with your original data. Can you post an image of your process?
For instance, make sure that you have connected the initial input to your operator. Between which two operators does the error occur?
One thought: the example set in text mining is usually your document collection, but if you are really using documents (PDF, Word) then your format will be Documents (Doc), and you may need to transform your documents to data (Documents to Data operator). Then you should have an Example Set that you can feed into your Performance operator.
Hope this helps - as the earlier comment said, without knowing the process, it is hard to tell exactly where the error is.

Generate Images for formulas in Java

I'd like to generate an image file showing a mathematical expression, taking a String like "(x+a)^n=∑_(k=0)^n" as input and producing a more (human-)readable image file as output. AFAIK this kind of thing is used in Wikipedia, for example. Are there any Java libraries that do that?
Or maybe I'm using the wrong approach. What would you do if the requirement was to enable pasting formulas from MS Word into an HTML document? I'd ask the user to just take a screenshot himself, but that would be the lazy way ^^
Edit: Thanks for the answers so far, but I really don't control the input. What I get is some messy Word-style formula, not a clean LaTeX-formatted one.
Edit2: http://www.panschk.de/text.tex
Looks a bit like LaTeX, doesn't it? That's what I get when I call clipboard.getContents(RTFTransfer.getInstance()) after pasting a formula from Word 2007.
First and foremost you should familiarize yourself with TeX (and LaTeX) - a famous typesetting system created by Donald Knuth. Typesetting mathematical formulae is an advanced topic with many opinions and much attention to detail - therefore use something that builds upon TeX. That way you are sure to get it right ;-)
Edit: Take a look at texvc
It can output to PNG, HTML, MathML. Check out the README
Edit #2: Convert that messy Word stuff to TeX or MathML?
My colleague found a surprisingly simple solution for this very specific problem: when you copy formulas from Word 2007, they are also stored as HTML in the clipboard. Since representing formulas in HTML isn't easy either, Word just creates a temporary image file on the fly and embeds it in the HTML code. You can then simply take the temporary formula image and copy it somewhere else. Problem solved ;)
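The colleague's trick can be sketched like this: take the HTML flavor of the clipboard as a string (for example via SWT's `HTMLTransfer` or AWT's `DataFlavor.allHtmlFlavor`), find the `<img>` tags, and copy the referenced files. The regex approach and the assumption that each `src` is a well-formed `file:` URI (or a plain path) to a local temporary file are assumptions of this sketch, not details from the answer above; `targetDir` is a hypothetical destination you choose.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Finds the image(s) referenced in clipboard HTML and copies them elsewhere.
final class ClipboardFormulaImages {
    // Case-insensitive: Word-generated HTML may use <IMG SRC=...>.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the src attribute of every <img> tag, in document order.
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    // Assumption: src is a well-formed file: URI or a plain local path.
    static Path copyImage(String src, Path targetDir) throws IOException {
        Path source = src.startsWith("file:") ? Path.of(URI.create(src)) : Path.of(src);
        Path target = targetDir.resolve(source.getFileName());
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

`URI.create` throws for URIs with unencoded characters such as spaces, so a real implementation would need to handle that case.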
What you're looking for is LaTeX.
MiKTeX is a nice little application for churning out images using LaTeX.
I'd like to look into creating them on-the-fly though...
Steer clear of LaTeX. Seriously.
Check out JEuclid. It can convert MathML expressions into images.
