Update: The link in the answer is both interesting and useful, but unfortunately it does not address the need for a Java API, so I am still looking for input.
I'm building a database of chemical compounds. I need all the synonyms (IUPAC and common names) as well as safety data for each.
I'll be using the freely available data at PubChem (http://pubchem.ncbi.nlm.nih.gov/).
There's an easy way of querying each compound with simple HTTP GETs. For example, to obtain glycerol data, the URL is:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=753
And the following URL would return an easy-to-parse format:
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=753&disopt=DisplaySDF
but it returns only very basic info, lacking safety data and including only a few common names.
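Fetching these URLs from Java is not the problem, by the way; here is a minimal sketch using nothing but the JDK (the class name is mine, just for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PubChemFetch {
    public static void main(String[] args) throws Exception {
        // Plain HTTP GET of the SDF record for glycerol (CID 753)
        URL url = new URL("http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi"
                + "?cid=753&disopt=DisplaySDF");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw SDF output
            }
        }
    }
}

So plain downloads are easy; the missing piece is an API that exposes the full synonym and safety data.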
There is one public-domain API for Java that seems very complete, developed by a group at Scripps (citation). The code is here.
Unfortunately, this API is not very well documented and it's quite difficult to follow due to the complexity of the data involved.
From what I gathered, pubchemdb uses the PubChem Power User Gateway (PUG) XML API.
Has anyone used this API (or any other one available)? I would appreciate a short description or tutorial on how to start with it.
The Cactvs Chemoinformatics toolkit (free for academic/educational use) has full PubChem integration. Using the scripting environment, you can easily do something like
cactvs>ens create 753
ens0
cactvs>ens get ens0 E_NAMESET
PROPANE-1,2,3-TRIOL GLYCEROL 8043-29-6 29796-42-7 30049-52-6 37228-54-9 75398-78-6 78630-16-7 8013-25-0 175385-78-1 25618-55-7 64333-26-2 56-81-5 {Tegin M} LS-1377 G8773_SIGMA 15523_RIEDEL {Glycerin, natural} NCGC00090950-03 191612_ALDRICH 15524_RIEDEL {Glycerol solution} L-glycerol 49767_FLUKA {Biodiesel impurity} 49770_FLUKA 49771_FLUKA NCGC00090950-01 49927_FLUKA Glycerol-Gelatine G7757_SIAL GOL D-glycerol G9012_SIAL {Polyhydric alcohols} c0066 MOON {NSC 9230} G2025_SIGMA ZINC00895048 49781_FLUKA {Concentrated glycerin} {Concentrated glycerin (JP15)} D00028 {Glycerin (JP15/USP)} 44892U_SUPELCO {Glycerin, concentrated (JAN)} CRY 49782_FLUKA NCGC00090950-02 G6279_SIAL W252506_ALDRICH G7893_SIAL {Glycerin, concentrated} 33224_RIEDEL Bulbold Cristal Glyceol G9281_SIGMA Glycerol-1,2,3-3H G1901_SIGMA G7043_SIGMA 1,2,3-trihydroxypropane 1,2,3-trihydroxypropanol glycerin G2289_SIAL G9406_SIGMA {Glycerol-[2-3H]} CHEBI:17754 Glyzerin Oelsuess InChI=1/C3H8O3/c4-1-3(6)2-5/h3-6H,1-2H {90 Technical glycerine} Dagralax {Glycerin, anhydrous} {Glycerin, synthetic} Glycerine Glyceritol {Glycyl alcohol} Glyrol Glysanin NSC9230 Ophthalgan Osmoglyn Propanetriol {Synthetic glycerin} {Synthetic glycerine} Trihydroxypropane Vitrosupos {WLN: Q1YQ1Q} Glycerol-1,3-14C {4-01-00-02751 (Beilstein Handbook Reference)} AI3-00091 {BRN 0635685} {CCRIS 2295} {Caswell No. 469} {Citifluor AF 2} {Clyzerin, wasserfrei [German]} {EINECS 200-289-5} {EPA Pesticide Chemical Code 063507} {FEMA No. 2525} {Glicerina [DCIT]} {Glicerol [INN-Spanish]} {Glycerin (mist)} {Glycerin [JAN]} {Glycerin mist} {Glycerine mist} Glycerinum {Glycerolum [INN-Latin]} Grocolene {HSDB 492} IFP {Incorporation factor} 1,2,3-Propanetriol C00116 Optim {Propanetriol (VAN)} {1,2,3-PROPANETRIOL, HOMOPOLYMER} {Glycerol polymer} {Glycerol, polymers} {HL 80} {PGL 300} {PGL 500} {PGL 700} Polyglycerin Polyglycerine Polyglycerol {Unigly G 2} {Unigly G 6} G5516_SIGMA MolMap_000024
cactvs>
This hides all the PUG ugliness, but in any case I dare say that PUG is well documented. The toolkit goes well beyond simple data downloads: you can even open and query PubChem as if it were a local SD file if you want to.
PubChem does not contain safety data, though. And safety data is country/region-dependent, strictly regulated, and you should be really careful not to be hit with liabilities. Have your approach checked by legal personnel!
First of all, I want to clarify that my experience working with Wikidata is very limited, so feel free to correct me if any of my terminology is wrong.
I've been playing with Wikidata Toolkit, more specifically its wdtk-wikibaseapi module. This allows you to get an entity's information and its different properties, like so:
WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher();
EntityDocument q42 = wbdf.getEntityDocument("Q42");
List<StatementGroup> groups = ((ItemDocument) q42).getStatementGroups();
for (StatementGroup g : groups) {
    List<Statement> statements = g.getStatements();
    for (Statement s : statements) {
        System.out.println(s.getMainSnak().getPropertyId().getId());
        System.out.println(s.getValue());
    }
}
The above gets me the entity Douglas Adams and all the properties on his page: https://www.wikidata.org/wiki/Q42
Now, Wikidata Toolkit also has the ability to load and process dump files, meaning you can download a dump to your local machine and process it using the DumpProcessingController class in the wdtk-dumpfiles module. I'm just not sure what is meant by processing.
Can anyone explain to me what processing means in this context?
Can you do something similar to the wdtk-wikibaseapi example above, but using a local dump file and wdtk-dumpfiles, i.e. get an entity and its respective properties? I don't want to get the info from an online source, only from the dump (offline).
If this is not possible using Wikidata Toolkit, could you point me to something that can get me started on getting entities and their properties from a Wikidata dump file? I am using Java.
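For context, here is roughly how far I got with wdtk-dumpfiles, adapted from the examples shipped with the toolkit; I am not certain these are the right method names in the current version, so treat it as a sketch:

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;
import org.wikidata.wdtk.dumpfiles.MwRevision;

public class DumpExample {
    public static void main(String[] args) {
        DumpProcessingController controller = new DumpProcessingController("wikidatawiki");
        controller.setOfflineMode(true); // only use dumps already on disk

        // Every item in the dump is streamed through this callback.
        controller.registerEntityDocumentProcessor(new EntityDocumentProcessor() {
            @Override
            public void processItemDocument(ItemDocument item) {
                if ("Q42".equals(item.getEntityId().getId())) {
                    System.out.println(item.getStatementGroups());
                }
            }

            @Override
            public void processPropertyDocument(PropertyDocument property) {
                // not needed for this experiment
            }
        }, MwRevision.MODEL_WIKIBASE_ITEM, true);

        controller.processMostRecentJsonDump();
    }
}

Is this callback-per-entity streaming what "processing" refers to, or is there more to it?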
I am trying to create some statements and their inverse in Java using OpenRDF's Sesame. I am following the tutorial in the Sesame 2.7 documentation as well. Let's say I have created the following URIs and statements and added them to the Sesame repository:
RepositoryConnection connection; // assume this is already initialized
ValueFactory factory = ValueFactoryImpl.getInstance();
URI author_1 = factory.createURI("http://www.author.org/author_1");
URI article_1 = factory.createURI("http://www.title.org/article_1");
URI wrote = factory.createURI("http://www.paper.org/wrote");
URI writtenby = factory.createURI("http://www.paper.org/writtenby");
Statement statement_1 = factory.createStatement(author_1, wrote, article_1);
connection.add(statement_1);
The code above creates a statement describing that an author wrote an article. In the OpenRDF Workbench, I can see this statement. What I am trying to do is add the inverse using OWL.INVERSEOF, to get that article_1 is written by author_1, as follows:
connection.add(writtenby, OWL.INVERSEOF, wrote);
When I run the project and get back to the OpenRDF Workbench, I see the following statements:
<http://www.author.org/author_1>, <http://www.paper.org/wrote>, <http://www.title.org/article_1>
<http://www.paper.org/writtenby>, <http://www.w3.org/2002/07/owl#inverseOf>, <http://www.paper.org/wrote>
When I click on <http://www.paper.org/writtenby>, I can't find the inverse statement that article_1 is written by author_1, but I can find that author_1 wrote article_1. Is my way of doing this inverse wrong, or am I misunderstanding the concept? Thank you very much for your help in advance.
It is as Joshua says: OpenRDF/Sesame doesn't have support for this kind of reasoning. I think it supports only some basic RDFS reasoning during load, and as far as I know it (still) doesn't support custom rules.
You can achieve what you are asking by using Sesame with OWLIM. OWLIM-Lite and OWLIM-SE do have support for OWL reasoning (rule-based, forward-chaining, materialization). There are a number of predefined rule sets supported by OWLIM. You would probably want owl-max.
Depending on what you are trying to achieve, you might want to use a real OWL reasoner such as Pellet. However, Sesame doesn't support Pellet...
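If a full reasoner is more than you need, one pragmatic workaround is to materialize the inverse triples yourself at write time. This is not reasoning, just bookkeeping; the hypothetical helper below reuses the question's URIs and the stock Sesame API, and only covers data you remember to mirror:

import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryException;

public class InverseHelper {
    // Adds subject-predicate-object plus the hand-materialized inverse triple.
    public static void addWithInverse(RepositoryConnection connection, ValueFactory factory,
            URI subject, URI predicate, URI inversePredicate, URI object)
            throws RepositoryException {
        Statement forward = factory.createStatement(subject, predicate, object);
        Statement backward = factory.createStatement(object, inversePredicate, subject);
        connection.add(forward);
        connection.add(backward);
    }
}

With that, addWithInverse(connection, factory, author_1, wrote, writtenby, article_1) would make both statements visible in the Workbench.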
I am using google-api-translate-java-0.92.jar.
Translate.setHttpReferrer("http://translate.google.com/");
try {
    String translation = Translate.execute("arrangement", Language.ENGLISH, Language.UKRANIAN);
    System.out.println(translation);
} catch (Exception e) {
    System.out.println(e.getMessage());
}
But I get only one translation.
When I go to http://translate.google.com/, it gives me multiple translations.
How can I get multiple translations from my code?
I don't believe you can...
Looking at the source, it builds up the following URL:
http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=en|uk&q=arrangement
which when you look at the JSON response, returns:
{"responseData": {"translatedText":"Композиція"}, "responseDetails": null, "responseStatus": 200}
As you can see, this returns only a single word. The dictionary lookup on the Google Translate page must be an additional call to a different service (not part of the translate service).
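To see this for yourself, here is a stripped-down sketch of what the JAR is doing under the hood, with naive string extraction in place of a real JSON parser (class name mine; the endpoint is the one above and has long since been deprecated):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class SingleTranslation {
    public static void main(String[] args) throws Exception {
        // Same call the library makes: one langpair in, one translation out.
        String q = URLEncoder.encode("arrangement", "UTF-8");
        URL url = new URL("http://ajax.googleapis.com/ajax/services/language/translate"
                + "?v=1.0&langpair=en%7Cuk&q=" + q);
        StringBuilder json = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                json.append(line);
            }
        }
        // Naively pull out the single translatedText field.
        String marker = "\"translatedText\":\"";
        int start = json.indexOf(marker) + marker.length();
        System.out.println(json.substring(start, json.indexOf("\"", start)));
    }
}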
EDIT
Using Firebug, you can see the request that is being made by the translate page, and you get this URL:
http://translate.google.com/translate_a/t?client=t&text=arrangement&hl=en&sl=en&tl=uk&multires=1&otf=2&pc=0&sc=1
Which returns this:
[[["Композиція","arrangement","Kompozytsiya"]],[["noun",["розташування","розміщення","домовленість","аранжування","упорядкування","механізм","оформлення","пристрій","систематизація","монтаж","пристосування","урегулювання","плани","згода","залагода","розв'язання","порозуміння"]]],"en"]
However, this extended URL format is not supported by the translate JAR you are using (at least I can't find it in the source on Google Code), is not part of the googleapis subdomain, and I'm not even sure it's for public consumption or that calling it directly doesn't violate Google's T&Cs.
But that's how they generate the dictionary list anyway...
From the Google Translate API FAQ (checked: 16 November 2013):
Is it possible to get multiple translations of a word?
The answer:
No. This feature is only available via the web interface at
translate.google.com
source: https://developers.google.com/translate/v2/faq#technical
There is an open enhancement request for this functionality. So it doesn't look like you're doing anything wrong; the API just doesn't expose that functionality.
I would like to create a simple XMPP client in Java that shares its location (XEP-0080) with other clients.
I already know I can use the Smack library for XMPP, and that it supports PEP, which is needed for XEP-0080.
Does anyone have an example of how to implement this, or any pointers? I can't find anything using Google.
Thanks in advance.
Kristof's right, the docs are sparse, but they are getting better. There is a good, albeit hard-to-find, set of docs on extensions though. The PubSub one is at http://www.igniterealtime.org/fisheye/browse/~raw,r=11613/svn-org/smack/trunk/documentation/extensions/pubsub.html.
After going the from-scratch custom IQ provider route with an extension, I found it was easier to use the managers as much as possible. The developers who wrote the managers have abstracted away a lot of the pain points.
Example (modified-for-geoloc version of one rcollier wrote on the Smack forum):
ConfigureForm form = new ConfigureForm(FormType.submit);
form.setPersistentItems(false);
form.setDeliverPayloads(true);
form.setAccessModel(AccessModel.open);

PubSubManager manager = new PubSubManager(connection, "pubsub.communitivity.com");
Node myNode = manager.createNode("http://jabber.org/protocol/geoloc", form);

StringBuilder body = new StringBuilder(); // whitespace for readability
body.append("<geoloc xmlns='http://jabber.org/protocol/geoloc' xml:lang='en'>");
body.append("  <country>Italy</country>");
body.append("  <lat>45.44</lat>");
body.append("  <locality>Venice</locality>");
body.append("  <lon>12.33</lon>");
body.append("  <accuracy>20</accuracy>");
body.append("</geoloc>");

SimplePayload payload = new SimplePayload(
        "geoloc", "http://jabber.org/protocol/geoloc", body.toString());

String itemId = "zz234";
Item<SimplePayload> item = new Item<SimplePayload>(itemId, payload);

// Required to receive the events being published
myNode.addItemEventListener(myEventHandler);

// Publish item
myNode.publish(item);
Or at least that's the hard way :). Just remembered there's a PEPManager now...
PEPProvider pepProvider = new PEPProvider();
pepProvider.registerPEPParserExtension(
        "http://jabber.org/protocol/tune", new TuneProvider());
ProviderManager.getInstance().addExtensionProvider(
        "event", "http://jabber.org/protocol/pubsub#event", pepProvider);

Tune tune = new Tune("jeff", "1", "CD", "My Title", "My Track");
pepManager.publish(tune);
You'd need to write the GeoLocProvider and GeoLoc classes.
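As a starting point, here is a bare-bones sketch of what those two classes might look like against the Smack 3-era extension API (PacketExtension and PacketExtensionProvider are the real interfaces; the field choice and parsing are mine and cover only a fragment of XEP-0080):

import org.jivesoftware.smack.packet.PacketExtension;
import org.jivesoftware.smack.provider.PacketExtensionProvider;
import org.xmlpull.v1.XmlPullParser;

// Minimal geoloc payload: only country and locality, add fields as needed.
public class GeoLoc implements PacketExtension {
    private final String country;
    private final String locality;

    public GeoLoc(String country, String locality) {
        this.country = country;
        this.locality = locality;
    }

    public String getElementName() { return "geoloc"; }

    public String getNamespace() { return "http://jabber.org/protocol/geoloc"; }

    public String toXML() {
        return "<geoloc xmlns='" + getNamespace() + "'>"
                + "<country>" + country + "</country>"
                + "<locality>" + locality + "</locality>"
                + "</geoloc>";
    }
}

// Parses an incoming <geoloc/> element back into a GeoLoc instance.
class GeoLocProvider implements PacketExtensionProvider {
    public PacketExtension parseExtension(XmlPullParser parser) throws Exception {
        String country = null;
        String locality = null;
        int event = parser.next();
        while (!(event == XmlPullParser.END_TAG && "geoloc".equals(parser.getName()))) {
            if (event == XmlPullParser.START_TAG) {
                if ("country".equals(parser.getName())) {
                    country = parser.nextText();
                } else if ("locality".equals(parser.getName())) {
                    locality = parser.nextText();
                }
            }
            event = parser.next();
        }
        return new GeoLoc(country, locality);
    }
}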
I covered a pure PEP based approach as an alternative method in detail for Android here: https://stackoverflow.com/a/26719158/406920.
This will be very close to what you'd need to do with regular Smack.
Take a look at the existing code for implementations of other extensions. This will be your best example of how to develop with the current library. Unfortunately, there is no developers' guide that I know of, so I just poked around to understand some of the basics myself until I felt comfortable with the environment. Hint: Use the provider extension facility to add custom providers for the extension-specific stanzas.
You can ask questions on the developer forum for Smack, and contribute your code back to the project from here as well. If you produce an implementation of this extension, then you could potentially get commit privileges yourself if you want it.
I want to know: is there any API or a query interface through which I can access Wikipedia data?
MediaWiki, the wiki platform that Wikipedia uses, does have an HTTP-based API. See MediaWiki API.
For example, to get pages with the title Stackoverflow, you call:
http://en.wikipedia.org/w/api.php?action=query&titles=Stackoverflow
There are some (incomplete) Java wrappers around the API - see the Client Code - Java section of the API page for more detail.
For use with Java, try http://code.google.com/p/wiki-java. It is only one class, but a great one!
You can use Jwiki to get Wikipedia data.
Example:
Jwiki jwiki = new Jwiki("elon musk");
System.out.println("Title: " + jwiki.getDisplayTitle()); // get title
System.out.println("Text: " + jwiki.getExtractText());   // get summary text
System.out.println("Image: " + jwiki.getImageURL());     // get image URL
I had the same question and the closest I came to an out-of-the-box solution is bliki, hosted at http://code.google.com/p/gwtwiki/.
I also wrote an article at Integrating Stuff to help you get started with it: http://www.integratingstuff.com/2012/04/06/hook-into-wikipedia-using-java-and-the-mediawiki-api/
MediaWiki is free and open-source wiki software. Originally developed by Magnus Manske and improved by Lee Daniel Crocker, it runs on many websites, including Wikipedia, Wiktionary and Wikimedia Commons.
There is a list of Java libraries that can help you connect to a wiki from Java code:
https://www.mediawiki.org/wiki/API:Client_code#Java
but after using some of them, because of their limitations, we ended up calling the MediaWiki web services directly, as sketched below.
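A direct call needs nothing but the JDK. Here is a minimal sketch against the action=query endpoint shown above (the class name and User-Agent string are mine, for illustration):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class WikipediaQuery {
    public static void main(String[] args) throws Exception {
        // Ask the MediaWiki API for page info on "Stackoverflow" as JSON.
        String title = URLEncoder.encode("Stackoverflow", "UTF-8");
        URL url = new URL("https://en.wikipedia.org/w/api.php"
                + "?action=query&format=json&titles=" + title);
        URLConnection conn = url.openConnection();
        // Wikimedia asks clients to identify themselves.
        conn.setRequestProperty("User-Agent", "example-client/0.1 (contact@example.org)");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // raw JSON response
            }
        }
    }
}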