Weka Experimenter mode: Problem creating experiment copy to run - java

In the Experimenter mode in Weka I have this configuration:
Results destination : Ex1_1.arff
Experiment type : cross-validation | number of folds : 10 | Classification
Dataset : soybean.arff.gz
Iteration Control : Number of repetitions : 10 | Data sets first
Algorithms : J48 -C 0.25 -M 2
This experiment is saved as Ex1_1.xml (saving it with .exp gives the following error: Couldn't save experiment file: Ex1_1.exp Reason: java.awt.Component$AccessibleAWTComponent$AccessibleAWTComponentHandler).
And when I try to run this experiment I get the following error: Problem creating experiment copy to run: java.awt.Component$AccessibleAWTComponent$AccessibleAWTComponentHandler
So it seems I have a problem with something like AWT in Java... Does somebody have an idea?
Thank you!

Related

stratified sampling R: java.lang.OutOfMemoryError: Java heap space

I want to use this function (here is the code on GitHub) to split my dataset into two parts: 90% training data (for example) and 10% (the rest) as the test set. I tried this code:
library(XLConnect)
library(readxl)
library(xlsx)

ybi <- read_excel("D:/ii.xls")
# View(ybi)

# stratified() comes from the GitHub code linked above
test <- stratified(ybi, 8, .1)  # sample 10% of the rows, stratified on the 8th attribute (the class)
no <- test$ID_unit              # indices of the test-set samples
train <- ybi[-no, ]             # the remaining rows form the training data
write.xlsx(train, "D:/mm.xlsx", sheetName = "Newdata")
In fact my data has 8 attributes and 65,534 rows.
With the code above I selected just 10% based on the eighth attribute, which is the class. It gives me the test set without any problem, but not the training data; the error is shown in the attached figure.
How can I fix it?
It looks like your JVM does not have enough memory allocated for the heap.
As a quick fix, export the system variable _JAVA_OPTIONS:
export _JAVA_OPTIONS="-Xmx8G -Xms1G -Xcheck:jni"
You can also use:
options(java.parameters = "-Xmx8G")
and set -Xmx to a value that will make R happy. Note that java.parameters generally has to be set before any rJava-based package (such as xlsx or XLConnect) is loaded; once the JVM has been started it keeps its original heap size.

Python Stanford POS tagger, java command failed after running for some time

I am using the Stanford POS tagger toolkit to tag lists of words from academic papers. Here is my code for this part:
st = StanfordPOSTagger(stanford_tagger_path, stanford_jar_path, encoding = 'utf8', java_options = '-mx2048m')
word_tuples = st.tag(document)
document is a list of words derived from nltk.word_tokenize. They come from normal academic papers, so usually there are several thousand words (mostly 3000-4000). I need to process over 10000 files, so I keep calling these functions. My program works fine on a small test set with 270 files, but when the number of files gets bigger, the program gives this error (Java heap space 2G):
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed
Note that this error does not occur immediately after execution; it happens after some time of running. I really don't know the reason. Is this because my 3000-4000 words are too many? Thank you very much for your help! (Sorry for the bad formatting, the error information is too long.)
Here is my solution, after I too faced the error. Basically, increasing the Java heap size solved it.
import os

# Point NLTK at the local Java installation
java_path = "C:\\Program Files\\Java\\jdk1.8.0_102\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path

from nltk.tag.stanford import StanfordPOSTagger

path_to_model = "stanford-postagger-2015-12-09/models/english-bidirectional-distsim.tagger"
path_to_jar = "stanford-postagger-2015-12-09/stanford-postagger.jar"

tagger = StanfordPOSTagger(path_to_model, path_to_jar)
tagger.java_options = '-mx4096m'  # setting a higher memory limit for long sentences

sentence = 'This is testing'
print(tagger.tag(sentence.split()))
I assume you have tried increasing the Java heap via the tagger settings, like so:
stanford.POSTagger([...], java_options="-mxSIZEm")
Cf. the docs, the default is 1000:
def __init__(self, [...], java_options='-mx1000m')
In order to test if it is a problem with the size of the dataset, you can tokenize your text into sentences, e.g. using the Punkt Tokenizer and output them right after tagging.
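For example, a rough sketch of that approach might look like the following (the model and jar paths are the same placeholders as in the answer above, and raw_text stands for the full text of one paper):
import nltk
from nltk.tag.stanford import StanfordPOSTagger

# Same model/jar paths as above (adjust to your installation)
path_to_model = "stanford-postagger-2015-12-09/models/english-bidirectional-distsim.tagger"
path_to_jar = "stanford-postagger-2015-12-09/stanford-postagger.jar"
tagger = StanfordPOSTagger(path_to_model, path_to_jar, java_options='-mx2048m')

# raw_text is a placeholder for the full text of one paper
raw_text = "This is the first sentence. This is the second one."

# Split into sentences with the Punkt tokenizer and tag sentence by sentence,
# instead of passing the whole document to the tagger at once
for sent in nltk.sent_tokenize(raw_text):
    print(tagger.tag(nltk.word_tokenize(sent)))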

What is the format name of this data?

Short: I'd like to know the name of this format!
I would like to know if this is a special common format or just a simple self-generated config file:
scenes : {
Scene : {
class : Scene
sources : {
Game Capture : {
render : 1
class : GraphicsCapture
data : {
window : "[duke3d]: Duke Nukem 3D Atomic Edition 1.4.3 STABLE"
windowClass : SDL_app
executable : duke3d.exe
stretchImage : 0
alphaBlend : 0
ignoreAspect : 0
captureMouse : 1
invertMouse : 0
safeHook : 0
useHotkey : 0
hotkey : 123
gamma : 100
}
cx : 1920
cy : 1080
}
}
}
}
My background is that I would like to read multiple files like the one above, and I don't want to implement a whole new parser for this. That's why I want to fall back on Java libraries which have already implemented this feature. But without knowing the name of such a format, it's quite difficult to search for these libraries.
// additional info
This is a config file or a "scene file" for Open Broadcaster Software.
Filename extension is .xconfig
This appears to be a config file or a "scene file" for Open Broadcaster Software.
When used with OBS it has an extension of .xconfig.
Hope this helps.
-Yang
I got some feedback from the main developer of these files.
As I thought, this is not a known format - just a simple config file.
Solved!
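Since it is just a custom format, a small hand-rolled reader is probably enough. Below is a rough Python sketch based only on the structure visible in the sample above (nested "name : { ... }" blocks and "key : value" lines); the file name is a placeholder and this is not an official OBS parser:
def parse_xconfig(text):
    # Parse nested "key : value" / "name : { ... }" blocks into nested dicts
    root = {}
    stack = [root]
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == '}':
            stack.pop()               # end of a block
            continue
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if value == '{':
            child = {}
            stack[-1][key] = child    # start of a nested block
            stack.append(child)
        else:
            stack[-1][key] = value.strip('"')
    return root

with open('scenes.xconfig', encoding='utf-8') as f:  # placeholder file name
    config = parse_xconfig(f.read())
print(config['scenes']['Scene']['sources']['Game Capture']['data']['executable'])  # duke3d.exe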

ENCOG values in output file incorrectly denormalized?

The following was produced using the most recent version of encog-workbench (3.2.0).
I was wondering if this is a bug or if I do not grasp the purpose of the output file.
When I run the [sunspot example][1] in the Encog Workbench, without segregation, I expect the output file to have the fitted values from the model. When I create the validation chart it presents me with the figure found in the tutorial, so this seems correct.
But when I go to the sunspots_output.csv output file I get the following output:
ssn(t-29) ssn(t+1) Output:ssn(t+1)
... first thirty values have output Null ...
-0.600472813 -0.947202522 null
-0.477541371 -1 8.349050184
-0.528762805 -0.976359338 8.334476431
-0.814814815 -0.986603625 8.314903157
-0.817178881 -0.892040977 8.292847897
...
All the output values are around 8 for the rest of the file.
Now when I go back to the validation chart, there is a Data tab, which contains the following columns:
Ideal Result
-0.477541371 -0.52449577
-0.528762805 -0.526507195
-0.814814815 -0.535029097
-0.817178881 -0.653884012
If I denormalize the values in these columns, I get the following.
66.3 60.3414868
59.8 60.08623701
23.5 59.00480764
23.2 43.92211894
These seem to be the correct actual values (if I compare them with the original data), and thus these should be the predicted values in the output column.
Is this a bug, or do the values in the output (t+1) column mean something else?
I copied these values to Excel and denormalized them by typing in the formula for (-1, 1).
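For reference, that (-1, 1) denormalization is essentially the following (a sketch, not Encog's exact code; data_low and data_high stand for the minimum and maximum of the original series):
# Invert a linear normalization from [norm_low, norm_high] back to the original range
def denormalize(x, data_low, data_high, norm_low=-1.0, norm_high=1.0):
    return (x - norm_low) * (data_high - data_low) / (norm_high - norm_low) + data_low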
I was hoping not to have to do this every time I run an experiment.
I am going to move to code eventually. Just wanted to get some preliminary results with the workbench. Using segregation results in the same problem, btw.
If its a bug I'll report it on the encog website.
Thanks for your answers,
Florian
UPDATE
Hey Jef, I downloaded your zip and reproduced the problem using my workbench.
The problem only arises when I do not segregate, which I do not want to do.
There are some clear differences in the .ega file created by workbench-executable 3.2.0.
When I use your .ega file and remove the segregate section, it works.
When I use mine it doesn't. That's why I uploaded my project [here][2]:
Maybe you can discover if something new interferes with outputting the correct values.
Hope it helps!
Update 3:
My actual goal is to build a forecaster; the project can be found here:
http://wikisend.com/download/477372/Myproject.rar
I was wondering if you could tell me if I am doing something definitely wrong, because currently my output is total rubbish.
Thanks again.
I tried to reproduce the error, but when I ran my own sunspots prediction I did get predicted values closer to the expected range. You might try running the zipped version of the example, found here.
http://www.heatonresearch.com/dload/encog/example/workbench/SunspotExample.zip
You should be able to run the EGA file and it will produce an output file. Some of my data are as follows:
"year" "mon" "ssn" "dev" "Output:ssn(t+1)"
1948 5 174.0 69.3 156.3030108771
1948 6 167.8 26.6 168.4791037592
1948 7 142.2 28.3 208.1090604116
1948 8 157.9 35.3 186.0234029962
1948 9 143.3 55.9 131.5008296846
1948 10 136.3 44.9 93.0720770479
1948 11 95.8 21.8 89.8269594386
Perhaps compare the EGA file from the above zip to your EGA file.

Hadoop Configuration - Cluster

What are the correct settings for the files core-site.xml and mapred-site.xml in Hadoop?
I'm trying to run Hadoop but get the following error:
starting secondarynamenode, logging to /opt/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-lbad012.out
lbad012: Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
lbad012:     at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:212)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:244)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:236)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:194)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:150)
lbad012:     at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:676)
You didn't specify which version of Hadoop you're using, or whether or not you're using CDH (Cloudera's Hadoop distro).
You also didn't specify whether you're looking to run a pseudo-distributed, single-node, or fully distributed cluster setup. These options specifically are set up in the files you're mentioning (core-site and mapred-site).
Hadoop is very finicky, so these details are important when asking questions related to Hadoop.
Since you didn't specify any of the above, though, I'm guessing you're a beginner, in which case this guide should help you (and show you what core-site and mapred-site should look like in a pseudo-distributed configuration).
Anyway, Hadoop has a 'Quick Start' guide for almost every version of Hadoop they upload, so find the one that matches the version and setup you're looking for, and it should be fairly easy to walk through.
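The error itself ("Does not contain a valid host:port authority: file:///") usually means fs.default.name has not been set, so the secondary namenode falls back to the default file:/// filesystem. For a Hadoop 1.x pseudo-distributed setup, for example, the two files typically look something like this (hostnames and ports are placeholders; adjust them to your machine):
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>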
