I'm writing a geography game in Java, and I'd like to have some data on the locations of the borders of countries, but all I can find is shapefiles, and I can't get latitude/longitude data out of them, or else I can only find a single coordinate for each country.
Where can I find
a way to extract the latitude/longitude data into a form I can use in Java or save to a text file?
a website with free data on country borders that can be used in a Java program?
Edit:
It doesn't need to be exact; for pretty much anything except Russia, China, the U.S., and Brazil, 10 coordinates is probably enough. Islands don't really matter either. I just want to be able to calculate the shortest distance between two countries relatively accurately.
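As a rough illustration of the distance calculation asked about here, this minimal Java sketch takes the smallest great-circle (haversine) distance between any vertex of one border and any vertex of the other. It assumes each border is a list of {lat, lon} pairs in degrees and a spherical Earth with a mean radius of 6371 km; BorderDistance, haversineKm and minDistanceKm are hypothetical names. Because it only compares vertices, the result is an upper bound on the true gap, which is usually acceptable for coarse 10-point outlines, and neighbouring countries that share a vertex come out as 0.

```java
import java.util.List;

/** Approximate shortest distance between two country outlines (sketch). */
public class BorderDistance {

    /** Mean Earth radius in kilometres (spherical-Earth assumption). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two lat/lon points (degrees) via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // min() guards against rounding pushing a slightly above 1
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Smallest vertex-to-vertex distance between two borders given as {lat, lon} pairs.
     * Brute force, O(n * m): fine for a few thousand vertices per country.
     * Points lying between vertices are ignored, so this is an upper bound.
     */
    static double minDistanceKm(List<double[]> borderA, List<double[]> borderB) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : borderA) {
            for (double[] q : borderB) {
                best = Math.min(best, haversineKm(p[0], p[1], q[0], q[1]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical, very coarse outlines: a couple of {lat, lon} vertices each.
        List<double[]> countryA = List.of(new double[]{0, 0}, new double[]{0, 1});
        List<double[]> countryB = List.of(new double[]{0, 3}, new double[]{1, 3});
        System.out.printf("%.1f km%n", minDistanceKm(countryA, countryB));
    }
}
```

If the brute-force loop becomes slow with detailed borders, comparing bounding boxes first to skip far-apart country pairs is a cheap improvement.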
Download the generalized country borders from here: http://www.baruch.cuny.edu/geoportal/data/esri/esri_intl.htm. These are probably more detailed than you want (Canada has the most vertices, at 3316), but it is the only free rough border data set I could find online.
To get the coordinates from a shapefile as text, go to MyGeodata Converter
Run Vector Converter
Upload the zip file you just downloaded.
Check available operations
Export to GeoJSON
Download the zip file from MyGeodata Converter
Unzip the file.
Now you have the boundaries in GeoJSON format and can use a GeoJSON parser, or a simpler text parser, to get the coordinate data.
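For the simpler text-parser route, a regular expression can pull the positions straight out of the GeoJSON text. This sketch relies on GeoJSON storing each position as an innermost array of two or three numbers with longitude first (as RFC 7946 specifies), and it flips each pair to {lat, lon}. It is not a JSON parser: it ignores which feature a point belongs to, and any other two- or three-number array in the file (for example inside feature properties) would be picked up too. GeoJsonCoords and extractLatLon are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Crude extraction of coordinate pairs from GeoJSON text (sketch, not a JSON parser). */
public class GeoJsonCoords {

    private static final String NUM = "(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)";

    // A position is an innermost array [lon, lat] or [lon, lat, alt]. Rings and polygons
    // are arrays of arrays, and a 4-number "bbox" has too many entries, so neither matches.
    private static final Pattern POSITION = Pattern.compile(
            "\\[\\s*" + NUM + "\\s*,\\s*" + NUM + "(?:\\s*,\\s*" + NUM + ")?\\s*\\]");

    /** Returns every position as {lat, lon}; GeoJSON itself stores longitude first. */
    static List<double[]> extractLatLon(String geoJson) {
        List<double[]> points = new ArrayList<>();
        Matcher m = POSITION.matcher(geoJson);
        while (m.find()) {
            double lon = Double.parseDouble(m.group(1));
            double lat = Double.parseDouble(m.group(2));
            points.add(new double[]{lat, lon});
        }
        return points;
    }

    public static void main(String[] args) {
        String sample = "{\"type\":\"Polygon\",\"coordinates\":"
                + "[[[10.5, 45.0], [11.0, 45.0], [11.0, 46.2], [10.5, 45.0]]]}";
        for (double[] p : extractLatLon(sample)) {
            System.out.println("lat " + p[0] + ", lon " + p[1]);
        }
    }
}
```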
If that's too much work, you can also parse shapefiles with one of the various Java shapefile libraries out there. See "Does anyone know of a library in Java that can parse ESRI Shapefiles?" for some options.
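If you'd rather not add a dependency at all, the .shp file inside the zip can be read with plain JDK classes. This sketch follows my reading of the ESRI Shapefile Technical Description: a 100-byte header whose file code (9994) and file length (counted in 16-bit words) are big-endian, followed by records with a big-endian record header and little-endian content. It handles only Polygon records (shape type 5), skips every other record type, and ignores the .dbf file where the country names live, so treat it as a starting point. ShpPolygonReader and readRings are hypothetical names; for geographic data, x is longitude and y is latitude.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for Polygon (type 5) records in a .shp file, using only the JDK (sketch). */
public class ShpPolygonReader {

    static final int POLYGON = 5;

    /** Returns one list of {x, y} points per part (ring), across all polygon records. */
    static List<List<double[]>> readRings(ByteBuffer buf) {
        buf.order(ByteOrder.BIG_ENDIAN);
        if (buf.getInt(0) != 9994) throw new IllegalArgumentException("not a .shp file");
        int fileLengthBytes = buf.getInt(24) * 2;      // stored in 16-bit words
        List<List<double[]>> rings = new ArrayList<>();
        int pos = 100;                                 // records start after the header
        while (pos + 8 <= fileLengthBytes) {
            buf.order(ByteOrder.BIG_ENDIAN);           // record header is big-endian
            int contentBytes = buf.getInt(pos + 4) * 2;
            int content = pos + 8;
            buf.order(ByteOrder.LITTLE_ENDIAN);        // record content is little-endian
            if (buf.getInt(content) == POLYGON) {
                // content: type(4) + bounding box(32) + NumParts(4) + NumPoints(4) + Parts + Points
                int numParts = buf.getInt(content + 36);
                int numPoints = buf.getInt(content + 40);
                int partsAt = content + 44;
                int pointsAt = partsAt + 4 * numParts;
                for (int p = 0; p < numParts; p++) {
                    int start = buf.getInt(partsAt + 4 * p);
                    int end = (p + 1 < numParts) ? buf.getInt(partsAt + 4 * (p + 1)) : numPoints;
                    List<double[]> ring = new ArrayList<>();
                    for (int i = start; i < end; i++) {
                        int at = pointsAt + 16 * i;    // each point is two doubles: x, y
                        ring.add(new double[]{buf.getDouble(at), buf.getDouble(at + 8)});
                    }
                    rings.add(ring);
                }
            }
            pos = content + contentBytes;              // jump to the next record
        }
        return rings;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ShpPolygonReader countries.shp   (hypothetical file name)
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(Path.of(args[0])));
        for (List<double[]> ring : readRings(buf)) {
            if (!ring.isEmpty()) {
                System.out.println(ring.size() + " vertices, first at lon=" + ring.get(0)[0]
                        + ", lat=" + ring.get(0)[1]);
            }
        }
    }
}
```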
Download the OSM file and import it into PostgreSQL + PostGIS (the osm2pgsql tool will do this).
If you require a different database, migrate from PostgreSQL to your desired database.
http://planet.openstreetmap.org/
The MyGeodata converter only throws a 500 error at the moment, so I had to look further and finally stumbled across KML files that roughly describe the outline of each country. I posted the link on this related question, so someone might want to look there.
Related
I've been looking for an answer for quite a long time, but I haven't found anything.
My problem is parsing a PDF: I have a page made up of some kind of tables.
I've already written some code with which I can extract information from a specified rectangle, but I am hard-coding those rectangle values, so it is not as dynamic as it should be. I want to find information about the cells, and with that information I will be able to get the strings I need. I haven't found anything useful in the PDFBox API.
I would be grateful for any tips.
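One way to make the rectangles dynamic rather than hard-coded is to derive them from the table's ruling lines. This sketch assumes you have already collected the x positions of the vertical lines and the y positions of the horizontal lines (getting them out of the page's drawing operations depends on the library and is not shown). It sorts and merges near-duplicate positions, then turns each pair of neighbouring lines into one cell rectangle. Merged cells are not handled, and TableCells, normalize and cells are hypothetical names. Each resulting rectangle could then be passed to an area-based extractor such as PDFBox's PDFTextStripperByArea.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Builds table-cell rectangles from the positions of a table's ruling lines (sketch). */
public class TableCells {

    /** Sorts coordinates and merges any closer than tolerance (e.g. lines drawn twice). */
    static double[] normalize(double[] coords, double tolerance) {
        double[] sorted = coords.clone();
        Arrays.sort(sorted);
        List<Double> kept = new ArrayList<>();
        for (double v : sorted) {
            if (kept.isEmpty() || v - kept.get(kept.size() - 1) > tolerance) {
                kept.add(v);
            }
        }
        return kept.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /**
     * xs: sorted x positions of the vertical lines; ys: sorted y positions of the
     * horizontal lines. Cell (r, c) spans xs[c]..xs[c + 1] and ys[r]..ys[r + 1].
     * Assumes a plain grid with no merged cells.
     */
    static List<List<Rectangle2D>> cells(double[] xs, double[] ys) {
        List<List<Rectangle2D>> rows = new ArrayList<>();
        for (int r = 0; r + 1 < ys.length; r++) {
            List<Rectangle2D> row = new ArrayList<>();
            for (int c = 0; c + 1 < xs.length; c++) {
                row.add(new Rectangle2D.Double(xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r]));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical line positions, in points, for a 2 x 2 table.
        double[] xs = normalize(new double[]{300, 50, 150, 150.3}, 1.0);
        double[] ys = normalize(new double[]{100, 140, 120}, 1.0);
        List<List<Rectangle2D>> grid = cells(xs, ys);
        for (int r = 0; r < grid.size(); r++) {
            for (int c = 0; c < grid.get(r).size(); c++) {
                System.out.println("row " + r + ", col " + c + ": " + grid.get(r).get(c));
            }
        }
    }
}
```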
My Situation:
I'm programming in java
Using a library written by someone at my university, I'm able to read PDFs and create an XML document from them
This XML document contains additional information, e.g. the coordinates of the text in the original document
My Problem
I would like to recreate the PDF I read, with the content placed at its original coordinates (again: I have the coordinates)
My Question:
-> Do you know a way to create a PDF and place text at given coordinates? <-
I've been doing a lot of research on this lately, but maybe I've been using the wrong Google search terms, since I can't find many helpful results. So I thought I might ask here, in the forum where I've found the most help so far in my young "programmer's life" :)
Most of the results I get, even here, are about people trying to get the coordinates, but I already have them.
I heard during a discussion that PDFBox might be able to do this, but I'm also happy to work with any other framework or library that can solve my problem.
Thanks for any help and thoughts you share with me.
Thanks a lot for your comments. In the end I decided on iText, which let me do all my tasks (placing text at absolute coordinates, giving it a background color based on certain criteria) quite easily and efficiently.
If someone here is looking for inspiration and has a similar task, check my related post here on Stack Overflow for some code snippets: "How can I add a background color to my (pdf-) text using iText to create it with Java".
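One detail that often trips people up when placing text at absolute coordinates: PDF's default user space puts the origin at the bottom-left of the page, with y growing upwards, while extraction tools often report positions from the top-left. Whether the XML from the university library does so is an assumption here. If it does, the y value needs flipping before it is handed to iText; x stays the same. PdfCoords and toPdfBaselineY are hypothetical names, and subtracting the font size to reach a baseline is only an approximation of the font's real ascent.

```java
/** Converts a top-left-origin y coordinate into PDF user space (origin at bottom-left). */
public class PdfCoords {

    /**
     * topFromTop: distance in points from the top of the page down to the top of the text.
     * Returns an approximate baseline y for drawing; subtracting the full font size is a
     * rough stand-in for the font's real ascent.
     */
    static float toPdfBaselineY(float pageHeight, float topFromTop, float fontSize) {
        return pageHeight - topFromTop - fontSize;
    }

    public static void main(String[] args) {
        float pageHeight = 842f;  // A4 portrait is about 595 x 842 points
        float x = 72f;            // x needs no conversion: both systems measure from the left
        float y = toPdfBaselineY(pageHeight, 100f, 12f);
        System.out.println("draw at x = " + x + ", y = " + y);
        // With iText 5, for example, this pair could then go to
        // PdfContentByte.setTextMatrix(x, y) between beginText() and endText().
    }
}
```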
Good morning, fellas. I have been assigned a task in which I am supposed to extract text from a PDF file (a bank invoice) according to a given specification of fields and sections. This specification is given in a YAML file. Each field is expressed as a name plus two coordinates: the top-left and bottom-right corners of the rectangle in which the text resides. I am using SnakeYAML to load this info into objects, and I have been successful up to this point. For the next part, where I have to extract text from PDFs using this data, well... I am kind of stuck. For one, I have yet to decide which PDF parsing library to use. Can you please suggest a PDF parsing library suited to my task, and how I should go about accomplishing it? Thanks!
PDFBox is able to extract text from a given area. Have a look at PDFTextStripperByArea!
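To connect the YAML spec to PDFTextStripperByArea, each field's two corners have to become the Rectangle2D that its addRegion(name, rect) method takes. This sketch normalizes the corners into x, y, width and height. It assumes the YAML coordinates are in PDF points measured from the top-left of the page, which is how I understand PDFBox's area regions to work, so verify that against a known field first. FieldRegion, toRegion and the "accountNumber" field are hypothetical, and the PDFBox 2.x calls appear only as comments so the snippet compiles without the library.

```java
import java.awt.geom.Rectangle2D;

/** Turns a field's two corners into the Rectangle2D used as a PDFBox region (sketch). */
public class FieldRegion {

    /** Accepts the corners in either order and returns x, y, width, height. */
    static Rectangle2D toRegion(double x1, double y1, double x2, double y2) {
        double left = Math.min(x1, x2);
        double top = Math.min(y1, y2);
        return new Rectangle2D.Double(left, top, Math.abs(x2 - x1), Math.abs(y2 - y1));
    }

    public static void main(String[] args) {
        // Hypothetical "accountNumber" field: top-left (50, 100), bottom-right (250, 130).
        Rectangle2D r = toRegion(50, 100, 250, 130);
        System.out.println(r);
        // With PDFBox 2.x on the classpath, the region would then be used roughly like:
        //   PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        //   stripper.addRegion("accountNumber", r);
        //   stripper.extractRegions(document.getPage(0));
        //   String text = stripper.getTextForRegion("accountNumber");
    }
}
```

Registering every field from the YAML with addRegion before a single extractRegions call lets one pass over the page fill all fields at once.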
I am working on a project here that ingests internal resumes from people at my company, strips out the skills and relevant content from them and stores it in a database. This was all done using docx4j and Grails. This required the resumes to first be submitted via a template that formatted everything just right so that the ingest tool knew what to look for to strip the data.
The second part of this is: what if we want to get a "reduced" resume out of the database? In other words, I want to search the content I've uploaded and print new resumes only for people who have, let's say, Java programming experience. So I can go into my database, find the people who originally listed Java as a skill, and output a new set of resumes that are still in a nice templated format and contain only the relevant info, instead of ALL the content.
I have been writing some software in Java to do this. It basically uses a docx template and overwrites the items in the customXML part that are bound to the content controls in the document, so the new data shows up and can be saved as a new docx with that custom data.
This seems really cumbersome to me and has some limitations. For one, let's say my template has a place for 3 skills, and a particular person has 8. There seems to be no good way to add those 5 additional skills to the docx other than painstakingly inserting the data with all of the formatting XML tags and such. This is a real pain, because if the template changes, I don't want to have to go back into my software and edit the source code to change the XML tags for that additional data to bold instead of italic.
I was doing some reading on using InfoPath to create a form that I could use to get the input, connecting to a SharePoint data source or similar to store the stripped-out data. However, I can't find out whether SharePoint makes it possible to get the data back out in a nicely formatted way. What would the general steps for this be? I couldn't find much about this topic with some quick googling.
Thanks
You could set up the skills:
<skills>
<skill>..</skill>
<skill>..</skill>
</skills>
and use a "repeat" content control pointing to the container. This would handle any number of <skill> entries.
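To show that the repeat copes with any number of entries, here is a plain-JDK sketch that builds the <skills> container with one <skill> child per item, eight in the example rather than the template's three. Writing this XML into the docx's custom XML part (for example with docx4j) is not shown, and SkillsXml/skillsXml are hypothetical names. Building it with a DOM rather than string concatenation means characters such as & are escaped automatically.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds a <skills> container holding one <skill> element per entry (sketch). */
public class SkillsXml {

    static String skillsXml(List<String> skills) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element container = doc.createElement("skills");
            doc.appendChild(container);
            for (String s : skills) {
                Element skill = doc.createElement("skill");
                skill.setTextContent(s);            // escaped automatically on output
                container.appendChild(skill);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Eight skills where the template only had room for three.
        System.out.println(skillsXml(List.of("Java", "Groovy", "Grails", "SQL",
                "docx4j", "XML", "Git", "Linux")));
    }
}
```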
What's the best way to do spreadsheet-like calculations in a programming language? Example: a multi-user web application that crunches columns and cells of numbers, like a spreadsheet, based on user submissions. What are the best data structures, database models, or patterns for this type of work, so that the different columns can be handled efficiently and easily in PHP, Java, or even .NET? Is it better to use data structures within the language, or is it better to use a database? If a database is the way to go, how does one go about it?
To do the actual calculation, look at graph theory. Basically you want to represent each cell as a node in a graph and each dependency as a directed edge. Next, do a topological sort to calculate the value of each cell in the right order.
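A minimal Java sketch of that approach, using Kahn's algorithm for the topological sort: each cell lists the cells it reads, cells with no unresolved inputs are evaluated first, and each evaluation releases the cells that depend on it. Formulas are plain Java lambdas standing in for a parsed formula language, and SheetCalc, Cell, evaluate and the cell names are hypothetical. If some cells are never released, the sheet contains a circular reference (or a reference to a missing cell), which is reported as an error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Spreadsheet recalculation via a dependency graph and Kahn's topological sort (sketch). */
public class SheetCalc {

    /** A cell: the cells it reads, and how to compute its value from their values. */
    static final class Cell {
        final List<String> deps;
        final ToDoubleFunction<Map<String, Double>> formula;

        Cell(List<String> deps, ToDoubleFunction<Map<String, Double>> formula) {
            this.deps = deps;
            this.formula = formula;
        }
    }

    /** Evaluates every cell in dependency order; throws on a circular reference. */
    static Map<String, Double> evaluate(Map<String, Cell> sheet) {
        Map<String, Integer> pending = new HashMap<>();        // unresolved inputs per cell
        Map<String, List<String>> dependents = new HashMap<>(); // edge: dep -> cell reading it
        for (Map.Entry<String, Cell> e : sheet.entrySet()) {
            pending.put(e.getKey(), e.getValue().deps.size());
            for (String dep : e.getValue().deps) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((name, n) -> { if (n == 0) ready.add(name); });

        Map<String, Double> values = new LinkedHashMap<>();    // insertion order = eval order
        while (!ready.isEmpty()) {
            String name = ready.poll();
            values.put(name, sheet.get(name).formula.applyAsDouble(values));
            for (String next : dependents.getOrDefault(name, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (values.size() != sheet.size()) {
            throw new IllegalStateException("circular reference (or missing cell) in sheet");
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Cell> sheet = new HashMap<>();
        sheet.put("A1", new Cell(List.of(), v -> 2));
        sheet.put("A2", new Cell(List.of(), v -> 3));
        sheet.put("A3", new Cell(List.of("A1", "A2"), v -> v.get("A1") + v.get("A2")));
        sheet.put("A4", new Cell(List.of("A3", "A1"), v -> v.get("A3") * v.get("A1")));
        System.out.println(evaluate(sheet));
    }
}
```

When a single cell changes, the same graph can be walked from that cell along the dependents edges so that only the affected cells are recalculated.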
Aspose.Cells (formerly Aspose.Excel.Web) is a good way to get the functionality you are looking for.
That is, unless you are asking more "How is it done?" than "I need to do it", in which case I would look at the other answers given.
Along the lines of "I need to do it":
Microsoft has Excel Services which does just what you want.
It performs spreadsheet operations on the server and is available via a web services interface, so you can connect and drive calculations from Java, PHP, .NET, or whatever.
Excel Services is part of SharePoint 2007.
Resolver One is a Spreadsheet app made in IronPython.
There is an explanation [pythonology.org] of the overall calculation mechanism it uses for user-generated equations.
The relevant image showing Resolver One's overall algorithm.
Note that users can write Python code to be interpreted both in the cells and in a special 'outside of sheet' area.
See another question here on SO, from which I reused my answer.
I can't tell you how to do it, but I would recommend looking at the code of PHPExcel, a library that lets you create Excel files from PHP.
Simplified, the PHPExcel workflow looks like this:
1. Create an empty Excel file object
2. Add cells (with either data or formulas) to the "Excel file"
3. Call the create function, which generates the file itself
In your case you would have to replace step 3 with something like "Create web interface".
So I would recommend looking at the code of this open-source project to see how it is structured overall. This should help you solve your problem.
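To make that structure concrete, here is a tiny Java sketch of the same three-step workflow, with step 3 replaced by rendering an HTML table for a web page instead of writing an Excel file. It is not PHPExcel's API: MiniSheet and its methods are invented for illustration, formulas are Java lambdas, empty cells read as 0, and cells are evaluated on demand with no caching or cycle detection (a dependency graph with a topological sort would be needed for that).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** The create / add cells / output workflow in miniature, with HTML as the output (sketch). */
public class MiniSheet {

    // Every cell is a formula over the sheet; a plain number is just a constant formula.
    private final Map<String, ToDoubleFunction<MiniSheet>> cells = new HashMap<>();

    MiniSheet setValue(String ref, double value) {
        cells.put(ref, s -> value);
        return this;
    }

    MiniSheet setFormula(String ref, ToDoubleFunction<MiniSheet> formula) {
        cells.put(ref, formula);
        return this;
    }

    /** Evaluates a cell on demand; empty cells read as 0. */
    double get(String ref) {
        ToDoubleFunction<MiniSheet> f = cells.get(ref);
        return f == null ? 0.0 : f.applyAsDouble(this);
    }

    /** Step 3 replaced: render columns A..lastColumn of rows 1..rows as an HTML table. */
    String toHtml(int rows, char lastColumn) {
        StringBuilder html = new StringBuilder("<table>");
        for (int r = 1; r <= rows; r++) {
            html.append("<tr>");
            for (char c = 'A'; c <= lastColumn; c++) {
                html.append("<td>").append(get("" + c + r)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        MiniSheet sheet = new MiniSheet()                          // 1. create an empty sheet
                .setValue("A1", 2).setValue("B1", 3)               // 2. add data...
                .setFormula("C1", s -> s.get("A1") + s.get("B1")); //    ...and formulas
        System.out.println(sheet.toHtml(1, 'C'));                  // 3. render for the web
    }
}
```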
I once used a binary tree to store the output of parsing a string using BODMAS. Each node was an operation between two other nodes, which could be a number, a variable or another operation.
So y = x * x + 2
became:
    +
   / \
  *   2
 / \
x   x
Sadly this was at school in Pascal and is stored on a 5 1/4" disk, so you don't want it :)
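Since the original Pascal is out of reach, here is a small Java sketch of the same idea: a recursive-descent parser that turns a string into a binary tree of numbers, variables and operations, giving * and / higher precedence than + and - as BODMAS requires. It supports parentheses, alphabetic variable names and the four basic operators only (no unary minus or powers), and ExprTree, Node, Num, Var and Op are hypothetical names. Parsing "x * x + 2" produces the tree drawn above, which toString prints in prefix form as (+ (* x x) 2.0).

```java
import java.util.Map;

/** Parses + - * / expressions into a binary tree, respecting BODMAS precedence (sketch). */
public class ExprTree {

    /** A tree node: a number, a variable, or an operation joining two other nodes. */
    interface Node {
        double eval(Map<String, Double> vars);
    }

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
        public double eval(Map<String, Double> vars) { return value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public double eval(Map<String, Double> vars) { return vars.get(name); }
        @Override public String toString() { return name; }
    }

    static final class Op implements Node {
        final char op;
        final Node left, right;
        Op(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public double eval(Map<String, Double> vars) {
            double a = left.eval(vars), b = right.eval(vars);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b;
            }
        }
        @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
    }

    // Grammar: expr := term (('+' | '-') term)* ; term := factor (('*' | '/') factor)*
    //          factor := number | variable | '(' expr ')'
    private final String src;
    private int pos;

    private ExprTree(String text) { this.src = text.replace(" ", ""); }

    static Node parse(String text) {
        ExprTree p = new ExprTree(text);
        Node tree = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private Node expr() {
        Node n = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            n = new Op(op, n, term());     // left-associative: a - b - c = (a - b) - c
        }
        return n;
    }

    private Node term() {
        Node n = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            n = new Op(op, n, factor());
        }
        return n;
    }

    private Node factor() {
        if (peek() == '(') {
            pos++;                                          // consume '('
            Node inner = expr();
            if (peek() != ')') throw new IllegalArgumentException("missing ) at " + pos);
            pos++;                                          // consume ')'
            return inner;
        }
        int start = pos;
        if (Character.isLetter(peek())) {
            while (Character.isLetter(peek())) pos++;
            return new Var(src.substring(start, pos));
        }
        while (Character.isDigit(peek()) || peek() == '.') pos++;
        return new Num(Double.parseDouble(src.substring(start, pos)));
    }

    public static void main(String[] args) {
        Node tree = parse("x * x + 2");
        System.out.println(tree + " = " + tree.eval(Map.of("x", 3.0)));
    }
}
```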
SpreadsheetGear for .NET will let you load Excel workbooks, plug in values, calculate and then get the results.
You can see a few simple ASP.NET calculation samples here, other ASP.NET samples here and download a free trial here.
Disclaimer: I own SpreadsheetGear LLC
I should point out that Google Spreadsheets already does this kind of thing.