Extracting data from Wikipedia [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am creating a Spring application and I have the need to integrate with Wikipedia. In particular, I would like to extract data on a given (large) set of Cities, e.g. country, website and coordinates.
I am trying to understand which libraries or frameworks can be useful, but the big issue I am dealing with is that there is no reference structure for the pages I would like to extract information from. The main problem is not that some information is missing, which would be totally acceptable, but rather the city representation changes from city to city. E.g. the DBPedia ontology (e.g. http://dbpedia.org/ontology/City) does not reflect what I can extract via SPARQL query from dbpedia.org/sparql. This way, I don't know how to extract the data I need systematically (i.e. for my entire set).
Is there any technology that can support my task of extracting data on a predefined set of cities?
One solution could be to apply some Natural Language Processing to search for the required info across the entire Wikipedia page, but that requires a lot of effort if I have to write it on my own.
Another solution could be to leverage a source of structured data that has pre-processed Wikipedia for me and given some structure to the information it contains, but I could not find one.
A third solution could be to make different queries to Wikipedia, but I cannot figure out a way to extract the information I need via the Wikipedia APIs.

Data from Wikipedia is being transferred to Wikidata. Using their API you could get what you want. If you want a shortcut you could use the Wikidata query tool: http://wdq.wmflabs.org/api_documentation.html
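As a rough sketch of what a batch lookup against Wikidata could look like, the class below builds a single SPARQL query for a set of cities and the matching request URL. It assumes the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) rather than the tool linked above, and it assumes you already know the Wikidata item ID of each city (Q64 and Q90 are Berlin and Paris). The class and method names are made up for illustration. The properties used are P17 (country), P856 (official website) and P625 (coordinate location); each sits in its own OPTIONAL so that a city with missing data still comes back.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Wikidata client library.
class WikidataCityQuery {

    // Assumption: the public SPARQL endpoint of the Wikidata Query Service.
    static final String ENDPOINT = "https://query.wikidata.org/sparql";

    /** Builds one SPARQL query for a batch of cities given as Wikidata item IDs (e.g. "Q64"). */
    static String buildQuery(List<String> cityIds) {
        String values = cityIds.stream()
                .map(id -> "wd:" + id)
                .collect(Collectors.joining(" "));
        return "SELECT ?city ?cityLabel ?countryLabel ?website ?coord WHERE {\n"
                + "  VALUES ?city { " + values + " }\n"
                + "  OPTIONAL { ?city wdt:P17 ?country. }\n"   // P17 = country
                + "  OPTIONAL { ?city wdt:P856 ?website. }\n"  // P856 = official website
                + "  OPTIONAL { ?city wdt:P625 ?coord. }\n"    // P625 = coordinate location
                + "  SERVICE wikibase:label { bd:serviceParam wikibase:language \"en\". }\n"
                + "}";
    }

    /** URL-encodes the query into a GET request URI that asks for JSON results. */
    static URI buildRequestUri(String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?format=json&query=" + encoded);
    }

    public static void main(String[] args) {
        String query = buildQuery(List.of("Q64", "Q90"));
        System.out.println(query);
        // The URI can be fetched with any HTTP client; the response is a JSON result set.
        System.out.println(buildRequestUri(query));
    }
}
```

Because every city is described with the same properties, the same query shape works for the whole set; cities that lack a website or coordinates simply return empty bindings for those columns.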

I'm not a Java guy, but I did something like this in .NET.
You need some kind of web scraping framework.
In .NET there is HtmlAgilityPack. You fetch the page and then walk through its elements with, for example, XPath. Of course, you need to know where on the page the information is. That could be the tags around the heading, the text and so on.
For Java, the frameworks I just found were:
Tag Soup
HtmlUnit
Web-Harvest
jARVEST
jsoup
Jericho HTML Parser

Related

Library for analyzing xbrl files in java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed yesterday.
I'm trying to figure out how to: read xbrl files, analyze the files and make use of the data e.g. for calculating key figures, in Java.
I know how to read xbrl files as xml and structure them as json nodes, but I have concluded that it's much more complicated to actually analyze them and use the data. I figured out that tags and attributes like "context id", "period" and "dimension" etc. determine how the data is wired together.
Now, I'm not going to implement my own xbrl processor from scratch, because I simply don't have the time and knowledge to do that.
I'm looking for a Java library, including documentation and/or guides on how to use it, that processes xbrl files and that can be used to analyze and extract data.
I searched the web and read a few articles about how to get started, but I didn't quite find something that seemed very useful.
Any suggestions? I would really appreciate if someone could point me in the right direction.

Using an existing XBRL processor is a good idea as it saves you the (considerable) efforts of interpreting the XBRL semantics at a raw syntactic level.
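To give a feel for what the "raw syntactic level" involves, here is a minimal sketch that uses only the JDK's XML parser to join each fact to its context through the contextRef attribute. The sample instance, the ex: taxonomy namespace and the class name are invented for illustration. It handles only instant and start/end periods and ignores units, dimensions, footnotes and taxonomy validation, which is exactly the work a real XBRL processor takes off your hands.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not a substitute for an XBRL processor.
class XbrlContextJoin {

    static final String XBRLI = "http://www.xbrl.org/2003/instance";

    // Invented sample instance: one context, one unit, two facts.
    static final String SAMPLE =
            "<xbrl xmlns='http://www.xbrl.org/2003/instance'"
            + " xmlns:iso4217='http://www.xbrl.org/2003/iso4217'"
            + " xmlns:ex='http://example.com/taxonomy'>"
            + "<context id='FY2023'>"
            + "<entity><identifier scheme='http://example.com/ids'>ACME</identifier></entity>"
            + "<period><startDate>2023-01-01</startDate><endDate>2023-12-31</endDate></period>"
            + "</context>"
            + "<unit id='USD'><measure>iso4217:USD</measure></unit>"
            + "<ex:Revenue contextRef='FY2023' unitRef='USD' decimals='0'>1000</ex:Revenue>"
            + "<ex:NetIncome contextRef='FY2023' unitRef='USD' decimals='0'>150</ex:NetIncome>"
            + "</xbrl>";

    /** Returns one line per fact in the form "Name [period] = value". */
    static List<String> readFacts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Step 1: index every context by its id, keeping a description of its period.
            Map<String, String> periodByContext = new HashMap<>();
            NodeList contexts = doc.getElementsByTagNameNS(XBRLI, "context");
            for (int i = 0; i < contexts.getLength(); i++) {
                Element context = (Element) contexts.item(i);
                Element period = (Element) context.getElementsByTagNameNS(XBRLI, "period").item(0);
                periodByContext.put(context.getAttribute("id"), describe(period));
            }

            // Step 2: facts are the top-level elements that carry a contextRef attribute.
            List<String> facts = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && ((Element) n).hasAttribute("contextRef")) {
                    Element fact = (Element) n;
                    String period = periodByContext.getOrDefault(fact.getAttribute("contextRef"), "unknown context");
                    facts.add(fact.getLocalName() + " [" + period + "] = " + fact.getTextContent().trim());
                }
            }
            return facts;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XBRL instance", e);
        }
    }

    /** Describes a period as either its instant or "start/end"; other period types are not handled. */
    static String describe(Element period) {
        NodeList instant = period.getElementsByTagNameNS(XBRLI, "instant");
        if (instant.getLength() > 0) {
            return instant.item(0).getTextContent().trim();
        }
        NodeList start = period.getElementsByTagNameNS(XBRLI, "startDate");
        NodeList end = period.getElementsByTagNameNS(XBRLI, "endDate");
        if (start.getLength() == 0 || end.getLength() == 0) {
            return "unsupported period";
        }
        return start.item(0).getTextContent().trim() + "/" + end.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        readFacts(SAMPLE).forEach(System.out::println);
    }
}
```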
Off the top of my head, I know of at least the following products that offer a Java API, in no particular order. I have no affiliation with any of them and abstain from commenting further so as not to land in a taste/preference discussion.
Reporting Standard: http://www.reportingstandard.com/index.php/en/
CoreFiling: https://www.corefiling.com/
There are probably many more, possibly also open source. XBRL.org has a much more comprehensive list of vendors here as well as a getting started guide for developers.

I was able to parse XBRL files using the XbrlParser project here.
Credits: https://github.com/marcioalexandre

What is the best way for a Java developer to generate Javascript without writing Javascript [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am an experienced Java programmer, and I'm trying to create a website with much of its content based on dynamic data from a database. The scope of the website is quite small, with only about 5 webpage designs required (although the user will see thousands of different pages generated from the data), but each page is quite complex.
I decided to go with plain old Java and Servlets as I understand this well. I also understand html and CSS, so I have no real difficulty generating the basic html pages from the data.
My problem lies with the addition of Javascript to improve the user interface. I've tried using Javascript a few times over the years and always make very slow progress. If there is an off-the-shelf, well-documented solution such as a jQuery widget then I'm okay, but if I need to modify it or create custom Javascript I always get stuck.
I'm looking for any alternative to writing pure Javascript. I'm not looking to learn a new framework for the complete site, or for a way to abstract the html, because I understand that and I don't really like deploying generated code that I didn't write.
But in the case of Javascript I would consider generated code. Is there a tool that I could use to generate Javascript without writing Javascript, which I could then reference from my webpages, or is it impossible to consider Javascript and Html in isolation from each other?

Jeremy Ashkenas's public List of languages that compile to JS lists quite a few (around a hundred) options.
The section for Java/JVM to JavaScript alone lists 15 choices.

CoffeeScript is a language that compiles to Javascript. I haven't used it, but friends who develop in Javascript have told me that CoffeeScript is a nice tool.

text mining and advanced search solution for sharepoint [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Within my organization, we maintain a sharepoint site that stores a large number of files related to previous and ongoing projects. These files can be word, pdf and ppt files. We are interested in building a solution that has the following functionality:
1) Advanced search that returns the set of files matching the keyword entered by the user. Ideally, the content in the returned files that is directly related to the search keyword would be marked in some way (for example with color).
2) The ability for users to perform some types of analysis on the sharepoint site, such as social network analysis of the people who are authors of the sharepoint files.
Is there any commercial software or open source library that fulfills these types of tasks?

This response is assuming you are using SharePoint 2010 or 2013.
Consider using faceted search. If you have an Enterprise CAL you can easily set this up. The trick is making sure the metadata for the facets is available. This would give you the search behavior you're looking for, but not the interaction and tagging.
For that it would be best to create a custom solution and leverage term sets in managed metadata. In SharePoint 2010 there is conditional formatting that you could use for color coding; however, this is deprecated in 2013.
Hope those directions are helpful, but ultimately you are likely going to need to do a combination with custom code and event handlers.

How to have access to a Bible Document in Java/Android [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
This may be too broad of a question, but how does one access a Bible document in an Android application? Is it a text file that has an index to find specific verses or is it something much more complicated? I hope that is enough to answer from.

The first step would be to actually find a structured bible dataset from somewhere.
You can search and try to see if there's an xml version of your favourite translation somewhere, and maybe download that.
Once you've got it (either as xml, json, or whatever format), you can write some Java code to parse the file, and load it into some appropriate data structure that allows you to do what you want with it efficiently (eg. search it by verse).
You could even put it into a database (eg. MySQL or MongoDB), which would allow you to search it efficiently.
But really, how you want to structure the data depends on how you're going to use it, and what formats it's already available in (as it could be a pain to clean up the XML).
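As a minimal sketch of the "parse the file and load it into a data structure" step, the class below indexes verses in a hash map keyed by reference. It assumes a made-up plain-text format with one verse per line ("Book Chapter:Verse text"); a real dataset in xml or json would need its own parser, and the class and method names are only illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory index; the input format is an assumption, not a standard.
class VerseIndex {

    // Book name (may contain spaces or a leading number), then chapter:verse, then the text.
    private static final Pattern LINE = Pattern.compile("^(.+?) (\\d+):(\\d+) (.+)$");

    private final Map<String, String> textByReference = new HashMap<>();

    /** Parses each line and stores the verse text under a normalized reference key. */
    void load(Iterable<String> lines) {
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                int chapter = Integer.parseInt(m.group(2));
                int verse = Integer.parseInt(m.group(3));
                textByReference.put(key(m.group(1), chapter, verse), m.group(4));
            }
            // Lines that do not match the assumed format are skipped silently.
        }
    }

    /** Looks up a single verse; the book name is matched case-insensitively. */
    Optional<String> lookup(String book, int chapter, int verse) {
        return Optional.ofNullable(textByReference.get(key(book, chapter, verse)));
    }

    private static String key(String book, int chapter, int verse) {
        return book.toLowerCase(Locale.ROOT) + " " + chapter + ":" + verse;
    }

    public static void main(String[] args) {
        VerseIndex index = new VerseIndex();
        index.load(List.of(
                "John 11:35 Jesus wept.",
                "1 John 4:8 He that loveth not knoweth not God; for God is love."));
        System.out.println(index.lookup("John", 11, 35).orElse("not found"));
    }
}
```

A hash map gives constant-time lookup by reference, which suits "find this verse". Full-text search over the verse contents would call for a different structure, or the database mentioned above.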
You might find the following resources useful:
Web Service APIs to directly get verses: http://www.4-14.org.uk/xml-bible-web-service-api
These would mean avoiding a lot of the headaches of dealing with file formats, indexing, and all kinds of other stuff.
Web service APIs generally work by your program submitting a query to a website (eg. including the biblical reference), and you get back some structured data (eg. xml/json) containing the verse(s) you requested.
Download a structured offline copy: http://www.bibletechnologies.net./osistext/
This would mean you have to find, download, parse, and index your own data structure for dealing with the text, but it would be much faster (if done right) than using a web service to do it.
The link I posted here has only some example books from the bible, but if you look you'll find more around the web.

It completely depends on the format of the file.
Any book or text document can be stored and distributed in multiple ways. It could simply be a .pdf file, or it could be stored as XML or as an .epub.
It is beyond broad, because there are so many ways to do it, it's impossible to guess without more information.
This link has some information about the e-book formats:
http://en.wikipedia.org/wiki/Comparison_of_e-book_formats
And that's just one small subsection of ways text can be stored.

Integrating a map into a java application [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
As part of my 4th year software engineering degree I'm doing a project in which I visualize several path finding algorithms (for multiple agents).
The first part of my project was building a re-sizable grid environment and implementing 2-3 different path finding algorithms.
The second part involves geographical maps. I want to be able to show the user a real geographical map, for example a road map, and give the algorithms the road data as input (I believe it's called a layer in a vector map), so that they work on this data and produce a path as output.
So eventually I will be able to show the movements of the agents on the map according to the calculated path.
The algorithms we implemented are fairly generic in the states and data they can use, so my biggest issue is figuring out how to display the map file as part of the application and where to get the input data for my algorithms.
At the beginning I thought of something like the Google Maps API, but I'm not sure it's what I'm looking for, as Google Maps is for the web and I'm not sure that they give access to the roads layer.
So I think that what I need is some sort of open source GIS that I can easily integrate into a Java application. I will also need sample data: the background image (a raster map, I think) and the road layer, which will be used as input for my algorithms (A* for example).
I've never worked with such systems before, so it would really help me if someone could give me some directions and recommend a good GIS library that I can use in my project (it has to be open source).

Check out NASA World Wind. It's similar to Google Earth in a lot of ways, and it has a Java API:
http://worldwind.arc.nasa.gov/java/

To get hands on this you may visit OpenStreetMap and you can download some "raw" data as XML.
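As a rough sketch of how that raw XML could feed a path finding algorithm, the class below reads an OSM XML snippet with the JDK's DOM parser and builds an adjacency map from every way tagged as a highway. The sample data and the class name are invented. It treats every road as two-way and ignores tags such as oneway, so it is a starting point under those assumptions rather than a faithful road model. The lat/lon attributes on the node elements (not read here) are what an A* distance heuristic would use.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns OSM XML into an undirected road graph keyed by node id.
class OsmRoadGraph {

    // Invented sample: one residential road (nodes 1-2-3) and one building outline (nodes 3-4).
    static final String SAMPLE =
            "<osm version='0.6'>"
            + "<node id='1' lat='52.5200' lon='13.4050'/>"
            + "<node id='2' lat='52.5205' lon='13.4060'/>"
            + "<node id='3' lat='52.5210' lon='13.4070'/>"
            + "<node id='4' lat='52.5215' lon='13.4080'/>"
            + "<way id='10'><nd ref='1'/><nd ref='2'/><nd ref='3'/>"
            + "<tag k='highway' v='residential'/></way>"
            + "<way id='11'><nd ref='3'/><nd ref='4'/>"
            + "<tag k='building' v='yes'/></way>"
            + "</osm>";

    /** Returns node id -> neighbouring node ids, using only ways that carry a highway tag. */
    static Map<Long, Set<Long>> parse(String osmXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(osmXml)));
            Map<Long, Set<Long>> adjacency = new HashMap<>();
            NodeList ways = doc.getElementsByTagName("way");
            for (int i = 0; i < ways.getLength(); i++) {
                Element way = (Element) ways.item(i);
                if (!isRoad(way)) {
                    continue; // skip buildings, rivers, boundaries, ...
                }
                // Consecutive <nd> references along a way form the edges of the road.
                NodeList refs = way.getElementsByTagName("nd");
                for (int j = 1; j < refs.getLength(); j++) {
                    long a = Long.parseLong(((Element) refs.item(j - 1)).getAttribute("ref"));
                    long b = Long.parseLong(((Element) refs.item(j)).getAttribute("ref"));
                    adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
                    adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
                }
            }
            return adjacency;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OSM XML", e);
        }
    }

    /** A way counts as a road here if it has any tag with key "highway". */
    static boolean isRoad(Element way) {
        NodeList tags = way.getElementsByTagName("tag");
        for (int i = 0; i < tags.getLength(); i++) {
            if ("highway".equals(((Element) tags.item(i)).getAttribute("k"))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```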

You can use the Google Static Maps API to display your map as images. It requires you to make an HTTP request (with additional parameters), and it returns an image that you can display.
Alternatively, you can use OpenJUMP (which is completely written in Java).
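The Static Maps request mentioned above boils down to building a URL with a few query parameters. A small sketch, assuming the standard staticmap endpoint with its center, zoom, size and key parameters; the class name is made up and YOUR_API_KEY is a placeholder for a real key:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper that only builds the request URL; it does not call the service.
class StaticMapUrl {

    static final String BASE = "https://maps.googleapis.com/maps/api/staticmap";

    /** Builds the image URL for a map centered on the given coordinates. */
    static String build(double lat, double lon, int zoom, int width, int height, String apiKey) {
        // Locale.ROOT keeps the decimal separator a dot regardless of the system locale.
        String center = String.format(Locale.ROOT, "%.6f,%.6f", lat, lon);
        return BASE
                + "?center=" + URLEncoder.encode(center, StandardCharsets.UTF_8)
                + "&zoom=" + zoom
                + "&size=" + width + "x" + height
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a Swing application the resulting URL could be loaded with
        // javax.imageio.ImageIO.read(new java.net.URL(url)) and shown in a JLabel.
        System.out.println(build(52.52, 13.405, 12, 640, 480, "YOUR_API_KEY"));
    }
}
```

Note that this only gives you a picture of the map. The road data for the algorithms would still have to come from another source.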

OpenMap is an open source mapping project, written in Java, that supports a variety of mapping formats.
