DBPedia extraction framework failure during extraction of DBPedia Dump

DBPedia extraction framework failure during extraction of DBPedia Dump - java

While working on DBpedia extraction framework, I am facing issues with the csv files from the Core Dataset.
I'm interested in extracting data (in my case, abstract of all company's wikipedia page) from dbpedia dumps (RDF format). I'm following the instructions from DBpedia Abstract Extractioin Step-by-step Guide
Commands used:
$ git clone git://github.com/dbpedia/extraction-framework.git
$ cd extraction-framework
$ mvn clean install
$ cd dump
$ ../run download config=download.minimal.properties
$ ../run extraction extraction.default.properties
I get the below error when executing the last command "./run extraction extraction.properties.file". Can anyone point out wh at mistake am I making. Is there any specific csv file i need to process or some configur ation issue. I have the full "mediawiki-1.24.1".
Also please note th at pages-articles.xml.bz2, I download it partially upto 256MB only. Please help
parsing /opt/extraction-framework-master/DumpsD ata/wikid atawiki/20150113/wikipedias.csv
java.lang.reflect.Invoc ationTargetException
at sun.reflect.N ativeMethodAccessorImpl.invoke0(N ative Method)
at sun.reflect.N ativeMethodAccessorImpl.invoke(N ativeMethodAccessorImpl.java:62)
at sun.reflect.Deleg atingMethodAccessorImpl.invoke(Deleg atingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.Exception: expected [15] fields, found [1] in line [%21%21%21 http://www.w3.org/2000/01/rdf-schema#label !!! l]
at org.dbpedia.extraction.util.WikiInfo$.fromLine(WikiInfo.scala:60)
at org.dbpedia.extraction.util.WikiInfo$$anonfun$fromLines$1.apply(WikiInfo.scala:49)
at org.dbpedia.extraction.util.WikiInfo$$anonfun$fromLines$1.apply(WikiInfo.scala:49)
at scala.collection.Iter ator$class.foreach(Iter ator.scala:743)
at scala.collection.AbstractIter ator.foreach(Iter ator.scala:1195)
at org.dbpedia.extraction.util.WikiInfo$.fromLines(WikiInfo.scala:49)
at org.dbpedia.extraction.util.WikiInfo$.fromSource(WikiInfo.scala:36)
at org.dbpedia.extraction.util.WikiInfo$.fromFile(WikiInfo.scala:27)
at org.dbpedia.extraction.util.ConfigUtils$.parseLanguages(ConfigUtils.scala:83)
at org.dbpedia.extraction.dump.sql.Import$.main(Import.scala:29)
at org.dbpedia.extraction.dump.sql.Import.main(Import.scala)

i was facing above issue because of incomplete download of enwiki-20150205-pages-articles.xml.bz2 file using
$ ../run download config=download.minimal.properties
but yet failing to resolve abstract extraction issue as i am expecting long abstract from bdpedia dump.
$ ../run extraction extraction extraction.abstracts.properties
it builds completely and perform extraction over 1 cr+ pages but not reflecting any data in long_abstracts_en.nt
i followed instruction to put mediawiki php and mysql etc.

Related

Unable to load tensorflow model in Java that was originally created in python (file not found exception)

I train a neural network in python 3.7 using tensorflow in version 1.14 and sonnet in version 1.34. After training, I want to export my neural network in order to use it in Java for predictions. Unfortunately, it always displays an error while trying to load the model in Java, saying the file cannot be found.
I double checked the model path a hundred times and it is correct (including transformation of slashes to backslashes because it runs on windows)
I tried several other export methods in python (including SavedModelBuilder and ModelSaver)
Below this you can see the Python code to export the model.
tf.saved_model.simple_save(sess,
path_graph_model_export_dir,
inputs=model_inputs, outputs={'sim': similarity})
This is the structure of the exported directory.
saved_model
---- saved_model.pb
---- variables
-------- variables.data-00000-of-00001
-------- variables.index
And the code in Java to load the model..
SavedModelBundle model = SavedModelBundle.load(modelPath, "serve")
results in this exception.
2019-09-17 10:59:03.905538: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: saved_model
2019-09-17 10:59:03.905997: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: fail. Took 457 microseconds.
org.tensorflow.TensorFlowException: Could not find SavedModel .pb or .pbtxt at supplied export directory path: saved_model
I cannot figure out why this exception appears when the file is clearly located at this directory. I hope someone can help me with this. Thanks in advance!

Java error while running maxent in biomod2

I am running maxent from R, in the package biomod2 and the following error appeared. I do not come from a technical background and wasn't sure why is this error happening. Is it a memory problem or someone said the java path is not set. But I followed the instructions to set maxent to run in R and also downloaded Java Platform, Standard Edition Development Kit and set a path for it as explained in this pdf: http://modata.ceoe.udel.edu/dev/dhaulsee/class_rcode/r_pkgmanuals/MAXENT4R_directions.pdf
I would be really grateful if you could help me understand this problem and any solution to it.
Thanks a lot
Error in file(file, "rt") : cannot open the connection
In addition: Warning messages:
1: running command 'java' had status 1
2: running command 'java -mx512m -jar E:\bioclim_2.5min\model/maxent.jar environmentallayers
="rainfed/models/1432733200/m_47203134/Back_swd.csv"
samplesfile="rainfed/models/1432733200/m_47203134/Sp_swd.csv"
projectionlayers="rainfed/models/1432733200/m_47203134/Predictions/Pred_swd.csv"
outputdirectory="rainfed/models/1432733200/rainfed_PA1_Full_MAXENT_outputs"
outputformat=logistic redoifexists visible=FALSE linear=TRUE quadratic=TRUE
product=TRUE threshold=TRUE hinge=TRUE lq2lqptthreshold=80 l2lqthreshold=10
hingethreshold=15 beta_threshold=-1 beta_categorical=-1 beta_lqp=-1
beta_hinge=-1 defaultprevalence=0.5 autorun nowarnings notooltips
noaddsamplestobackground' had status 1
3: In file(file, "rt") :
cannot open file 'rainfed/models/1432733200/rainfed_PA1_Full_MAXENT_outputs/rainfed_PA1_
Full_Pred_swd.csv': No such file or directory

I've just manage to solve this problem - it is a problem with the file path specified. For me, I had a space in one of the folder names which was not accepted in the path to the maxent.jar file. From looking at your error, it looks like it might be the two backslashes.
E:\bioclim_2.5min\model/maxent.jar
should probably read
E:/bioclim_2.5min/model/maxent.jar

lingpipe sentiment analysis tutorial demo error?

I was doing the sentiment analysis a from lingpipe website tutorial, and I keep getting this error, is there anyone wo can help?
java -cp "sentimentDemo.jar:../../../lingpipe
e-4.1.0.jar" PolarityBasic file:///Users/dylan/Desktop/POLARITY_DIR/
BASIC POLARITY DEMO
Data Directory=file:/Users/dylan/Desktop/POLARITY_DIR/txt_sentoken
Thrown: java.lang.NullPointerException
java.lang.NullPointerException
at com.aliasi.classify.DynamicLMClassifier.createNGramProcess(DynamicLMClassifier.java:313)
at PolarityBasic.<init>(PolarityBasic.java:26)
at PolarityBasic.main(PolarityBasic.java:92)

The specified path file:///Users/dylan/Desktop/POLARITY_DIR/ should contain the unpacked data (see tutorial) in the directory txt_sentoken
You can see this in the output:
Data Directory=file:/Users/dylan/Desktop/POLARITY_DIR/txt_sentoken
Also the the tutorial is not setup to use an URL, so the command should be
java -cp sentimentDemo.jar:../../../lingpipe-4.0.jar PolarityBasic /Users/dylan/Desktop/POLARITY_DIR

Error in installing DBpedia Spotlight while running the server class in jar

I get the following error:
org.dbpedia.spotlight.exceptions.ConfigurationException: Cannot find spotter file ../dist/src/deb/control/data/usr/share/dbpedia-spotlight/spotter.dict
at org.dbpedia.spotlight.model.SpotterConfiguration.<init>(SpotterConfiguration.java:54)
at org.dbpedia.spotlight.model.SpotlightConfiguration.<init>(SpotlightConfiguration.java:143)
at org.dbpedia.spotlight.web.rest.Server.main(Server.java:70)
Usage:
java -jar dbpedia-spotlight.jar org.dbpedia.spotlight.web.rest.Server [config file]
or:
mvn scala:run "-DaddArgs=[config file]"

Quick solution:
wget http://spotlight.dbpedia.org/download/release-0.5/dbpedia-spotlight-quickstart.zip
unzip dbpedia-spotlight-quickstart.zip
cd dbpedia-spotlight-quickstart/
./run.sh
Explanation:
DBpedia Spotlight looks for ~3.5M things of ~320 types in text and tries to disambiguate them to their global unique identifiers in DBpedia. Therefore it needs data files to accompany its jar. A minuscule example is distributed along with the source, but for real use cases you may need the larger files. After you've downloaded the files, you need to modify the configuration in server.properties with the correct path to the files. The error message you got tells you that one of the necessary files (spotter.dict) could not be found in the path you indicated in your server.properties.
More information available here:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Run-from-a-JAR

tdbloader on Cygwin: Gettging FileNotFoundException: d:\cygdrive\d\....\node2id.idn

I am completely new to Jena/TDB. All I want to do is to load data from some sample rdf, N3 etc file using tdb scripts or through java api.
I am tried to use tbdloader on Cygwin to load data (tdb-0.9.0, on Windows XP with IBM Java 1.6). Following are the command that I ran:
$ export TDBROOT=/cygdrive/d/Project/Store_DB/jena-tdb-0.9.0-incubating
$ export PATH=$TDBROOT/bin:$PATH
I also changed classpath for java in the tdbloader script as mentioned at tdbloader on Cygwin: java.lang.NoClassDefFoundError :
exec java $JVM_ARGS $SOCKS -cp "PATH_OF_JAR_FILES" "tdb.$TDB_CMD" $TDB_SPEC "$#"
So when I run $ tdbloader --help it shows the help correctly.
But when I run
$ tdbloader --loc /cygdrive/d/Project/Store_DB/data1
OR
$ tdbloader --loc /cygdrive/d/Project/Store_DB/data1 test.rdf
I am getting following exception:
com.hp.hpl.jena.tdb.base.file.FileException: Failed to open: d:\cygdrive\d\Project\Store_DB\data1\node2id.idn (mode=rw)
at com.hp.hpl.jena.tdb.base.file.ChannelManager.open$(ChannelManager.java:83)
at com.hp.hpl.jena.tdb.base.file.ChannelManager.openref$(ChannelManager.java:58)
at com.hp.hpl.jena.tdb.base.file.ChannelManager.acquire(ChannelManager.java:47)
at com.hp.hpl.jena.tdb.base.file.FileBase.<init>(FileBase.java:57)
at com.hp.hpl.jena.tdb.base.file.FileBase.<init>(FileBase.java:46)
at com.hp.hpl.jena.tdb.base.file.FileBase.create(FileBase.java:41)
at com.hp.hpl.jena.tdb.base.file.BlockAccessBase.<init>(BlockAccessBase.java:46)
at com.hp.hpl.jena.tdb.base.block.BlockMgrFactory.createStdFile(BlockMgrFactory.java:98)
at com.hp.hpl.jena.tdb.base.block.BlockMgrFactory.createFile(BlockMgrFactory.java:82)
at com.hp.hpl.jena.tdb.base.block.BlockMgrFactory.create(BlockMgrFactory.java:58)
at com.hp.hpl.jena.tdb.setup.Builder$BlockMgrBuilderStd.buildBlockMgr(Builder.java:196)
at com.hp.hpl.jena.tdb.setup.Builder$RangeIndexBuilderStd.createBPTree(Builder.java:165)
at com.hp.hpl.jena.tdb.setup.Builder$RangeIndexBuilderStd.buildRangeIndex(Builder.java:134)
at com.hp.hpl.jena.tdb.setup.Builder$IndexBuilderStd.buildIndex(Builder.java:112)
at com.hp.hpl.jena.tdb.setup.Builder$NodeTableBuilderStd.buildNodeTable(Builder.java:85)
at com.hp.hpl.jena.tdb.setup.DatasetBuilderStd$NodeTableBuilderRecorder.buildNodeTable(DatasetBuilderStd.java:389)
at com.hp.hpl.jena.tdb.setup.DatasetBuilderStd.makeNodeTable(DatasetBuilderStd.java:300)
at com.hp.hpl.jena.tdb.setup.DatasetBuilderStd._build(DatasetBuilderStd.java:167)
at com.hp.hpl.jena.tdb.setup.DatasetBuilderStd.build(DatasetBuilderStd.java:157)
at com.hp.hpl.jena.tdb.setup.DatasetBuilderStd.build(DatasetBuilderStd.java:70)
at com.hp.hpl.jena.tdb.StoreConnection.make(StoreConnection.java:132)
at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction.<init>(DatasetGraphTransaction.java:46)
at com.hp.hpl.jena.tdb.sys.TDBMakerTxn._create(TDBMakerTxn.java:50)
at com.hp.hpl.jena.tdb.sys.TDBMakerTxn.createDatasetGraph(TDBMakerTxn.java:38)
at com.hp.hpl.jena.tdb.TDBFactory._createDatasetGraph(TDBFactory.java:166)
at com.hp.hpl.jena.tdb.TDBFactory.createDatasetGraph(TDBFactory.java:74)
at com.hp.hpl.jena.tdb.TDBFactory.createDataset(TDBFactory.java:53)
at tdb.cmdline.ModTDBDataset.createDataset(ModTDBDataset.java:95)
at arq.cmdline.ModDataset.getDataset(ModDataset.java:34)
at tdb.cmdline.CmdTDB.getDataset(CmdTDB.java:137)
at tdb.cmdline.CmdTDB.getDatasetGraph(CmdTDB.java:126)
at tdb.cmdline.CmdTDB.getDatasetGraphTDB(CmdTDB.java:131)
at tdb.tdbloader.loadQuads(tdbloader.java:163)
at tdb.tdbloader.exec(tdbloader.java:122)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:97)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:59)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:46)
at tdb.tdbloader.main(tdbloader.java:53)
Caused by: java.io.FileNotFoundException: d:\cygdrive\d\Project\Store_DB\data1\node2id.idn (The system cannot find the path specified.)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:222)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:107)
at com.hp.hpl.jena.tdb.base.file.ChannelManager.open$(ChannelManager.java:80)
... 37 more
I am not sure what node2id.idn file is and why is it expecting it?

The file node2id.idn is one of TDB's internal index files. It's not something that you have to create or manage for yourself. I've just tried tdbloader on cygwin myself, it it worked OK for me. I can think of two basic possibilities:
your disk is full
the TDB index is corrupted
If this is the first file you are loading into an otherwise emtpy TDB, the second possibility is unlikely. If you are loading into a non-empty TDB, try deleting the TDB image and starting again. Note that TDB by itself does not manage concurrent writes: if you have more than one process writing to a single TDB image, you must handle locking at the application level, or use TDB's transactions.
The final possibility, of course, is that your disk is flaky. You might want to try your code on another machine.
If none of these suggestions help, please send a complete minimal test case to the Jena users list.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.