Sigar API for JAVA (need a guide) - java

I've downloaded Sigar API ( http://support.hyperic.com/display/SIGAR/Home ) and would like to use it in a project to get information about different processes which are running.
My problem is that I can't really find some useful code snippets to learn from and the javadoc from their website isn't of much help, because I don't know what I should be looking for.
Do you have any ideea where I could find more information?

To find the pid (which is needed to find out information about a certain process), you can use a ProcessFinder.
The method to find a single process pid is findSingleProcess(String expression). Example:
Sigar sigar=new Sigar();
ProcessFinder find=new ProcessFinder(sigar);
long pid=find.findSingleProcess("Exe.Name.ct=explorer");
ProcMem memory=new ProcMem();
memory.gather(sigar, pid);
System.out.println(Long.toString(memory.getSize()));
The expression syntax is this:
Class.Attribute.operator=value
Where:
Class is the name of the Sigar class minus the Proc prefix.
Attribute is an attribute of the given Class, index into an array or key in a Map class.
operator is one of the following for String values:
eq - Equal to value
ne - Not Equal to value
ew - Ends with value
sw - Starts with value
ct - Contains value (substring)
re - Regular expression value matches
operator is one of the following for numeric values:
eq - Equal to value
ne - Not Equal to value
gt - Greater than value
ge - Greater than or equal value
lt - Less than value
le - Less than or equal value
More info here: http://support.hyperic.com/display/SIGAR/PTQL

If you are using Windows 7 try doing something
likefindSingleProcess("State.Name.ct=explorer");

In their latest package, they give a lot of usage examples under bindings\java\examples. Check them out.

The hyperic site seems to be gone, but this https://layer4.fr/blog/2016/10/10/os-monitoring-with-java/ tells you how to hook Sigar to Java. You do need to put the sigar-amd64-winnt.dll file somewhere on the DLL path (e.g. C:\Windows)

Related

Memory requirements for Stanford NER retraining

I am retraining the Stanford NER model on my own training data for extracting organizations. But, whether I use a 4GB RAM machine or an 8GB RAM machine, I get the same Java heap space error.
Could anyone tell what is the general configuration of machines on which we can retrain the models without getting these memory issues?
I used the following command :
java -mx4g -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop newdata_retrain.prop
I am working with training data (multiple files - each file has about 15000 lines in the following format) - one word and its category on each line
She O
is O
working O
at O
Microsoft ORGANIZATION
Is there anything else we could do to make these models run reliably ? I did try with reducing the number of classes in my training data. But that is impacting the accuracy of extraction. For example, some locations or other entities are getting classified as organization names. Can we reduce specific number of classes without impact on accuracy ?
One data I am using is the Alan Ritter twitter nlp data : https://github.com/aritter/twitter_nlp/tree/master/data/annotated/ner.txt
The properties file looks like this:
#location of the training file
trainFile = ner.txt
#location where you would like to save (serialize to) your
#classifier; adding .gz at the end automatically gzips the file,
#making it faster and smaller
serializeTo = ner-model-twitter.ser.gz
#structure of your training file; this tells the classifier
#that the word is in column 0 and the correct answer is in
#column 1
map = word=0,answer=1
#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
saveFeatureIndexToDisk = true
The error I am getting : the stacktrace is this :
CRFClassifier invoked on Mon Dec 01 02:55:22 UTC 2014 with arguments:
-prop twitter_retrain.prop
usePrevSequences=true
useClassFeature=true
useTypeSeqs2=true
useSequences=true
wordShape=chris2useLC
saveFeatureIndexToDisk=true
useTypeySequences=true
useDisjunctive=true
noMidNGrams=true
serializeTo=ner-model-twitter.ser.gz
maxNGramLeng=6
useNGrams=true
usePrev=true
useNext=true
maxLeft=1
trainFile=ner.txt
map=word=0,answer=1
useWord=true
useTypeSeqs=true
[1000][2000]numFeatures = 215032
setting nodeFeatureIndicesMap, size=149877
setting edgeFeatureIndicesMap, size=65155
Time to convert docs to feature indices: 4.4 seconds
numClasses: 21 [0=O,1=B-facility,2=I-facility,3=B-other,4=I-other,5=B-company,6=B-person,7=B-tvshow,8=B-product,9=B-sportsteam,10=I-person,11=B-geo-loc,12=B-movie,13=I-movie,14=I-tvshow,15=I-company,16=B-musicartist,17=I-musicartist,18=I-geo-loc,19=I-product,20=I-sportsteam]
numDocuments: 2394
numDatums: 46469
numFeatures: 215032
Time to convert docs to data/labels: 2.5 seconds
Writing feature index to temporary file.
numWeights: 31880772
QNMinimizer called on double function of 31880772 variables, using M = 25.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:923)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:885)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:879)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:91)
at edu.stanford.nlp.ie.crf.CRFClassifier.trainWeights(CRFClassifier.java:1911)
at edu.stanford.nlp.ie.crf.CRFClassifier.train(CRFClassifier.java:1718)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:759)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:747)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2937)
One way you can try reducing number of classes is to not use B-I notation. For example, club B-facility and I-facility into facility. Of course, another way it to use a bigger memory machine.
Shouldn't that be -Xmx4g not -mx4g?
Sorry for getting to this a bit late! I suspect the problem is the input format of the file; in particular, my first guess is that the file is being treated as a single long sentence.
The expected format of the training file is in the CoNLL format, which means each line of the file is a new token, and the end of a sentence is denoted by a double newline. So, for example, a file could look like:
Cats O
have O
tails O
. O
Felix ANIMAL
is O
a O
cat O
. O
Could you let me know if it's indeed in this format? If so, could you include a stack trace of the error, and the properties file you are using? Does it work if you run on just the first few sentences of the file?
--Gabor
If you are going to do analysis on non-transactional data sets you may want to use another tool like Elasticsearch (simpler) or Hadoop (exponentially more complicated). MongoDB is a good middleground as well.
First uninstall the existing java jdk and reinstall again.
Then you can use the heap size as much as you can based on your hard disk size.
In the term "-mx4g" 4g is not the RAM it is the heap size.
Even I Faced the same error initially. after doing this it is gone.
Even I misunderstood 4g as RAM initially.
Now I am able to start my server even with 100g of heap size.
Next,
Instead of using Customised NER model, I suggest you to use Custom RegexNER Model with which you can add millions of words of same entity name within in a single document too.
These 2 errors I faced initially.
For any queries comment below.

Apache Cayenne - I cannot find code defining the constants for Token.kind field

I'm using Cayenne to parse SQL conditions, through org.apache.cayenne.exp.parser.ExpressionParser, which produces a series of org.apache.cayenne.exp.parser.Tokens, and I want to determine the type of each Token (like identifier, equal sign, number, string etc.).
The token type is definitely identified by the ExpressionParser, and it seems to me that it is stored in the int field Token.kind. The values that this field shows in my parsing tests are definitely consistent (for ex. = is always 5, literal strings are always 42, and operators are always 2 etc.).
My problem is just that I cannot find the Java class containing the constants to compare Token.kind values with.
The Javadoc for field Token.kind says:
An integer that describes the kind of this token. This numbering
system is determined by JavaCCParser, and a table of these numbers is
stored in the file ...Constants.java.
It does not specify the full name of the file, so I downloaded JavaCCParser and I checked several *Constants.* files found in javacc-5.0src.zip, javacc-6.0.zip, the two javacc.jar contained in those two zip, and cayenne-3.0.2-src.tar.gz.
None of the classes I found there seems to me to have constants that consistently match the values I see in my tests.
The closest I was able to get to that was with class org.apache.cayenne.exp.parser.ExpressionParserConstants which for ex. contains int PROPERTY_PATH = 34 and int SINGLE_QUOTED_STRING = 42 which definitely match the actual tokens of my test expressions, but other tokens have no corresponding constant in that class, for ex. the = sign (kind = 5) and the and operator (kind = 2).
So my question is if anyone knows in which Java class are those constants defined.
First I should mention that ExpressionParser is designed to parse very specific format of Cayenne expressions. It certainly can not be used to parse SQL. So you might be looking in the wrong direction.
Parser itself is generated by JavaCC based on this grammar file. Tokens for the parser are formally defined in the bottom of this file, and are very specific to the task at hand.

Compare Code Submissions with Previous Submissions?

Users submit code (mainly java) on my site to solve simple programming challenges, but sending the code to a server to compile and execute it can sometimes take more than 10 seconds.
To speed up this process, I plan to first check the submissions database to see if equivalent code has been submitted before. I realize this will cause Random methods to always return the same result, but that doesn't matter much. Is there any other potential problem that could be caused by not running the code?
To find matches, I remove comments and whitespace when comparing code. However, the same code can still be written in different ways, such as with different variable names. Is there a way to compare code that will find more equivalent code?
You could store a SHA1 hash of the code to compare with a previous submission. You are right that different variable names would give different hashes. Try running the code through a minifier or obfuscator. That way, variable cat and dog will both end up like a1, then you could see if they are unique. The only other way would be to actually compile it into bytecode, but then it's too late.
Instead of analyzing the source code, why not speed up the compilation? Try having a servlet container always running with a custom ClassLoader, and use the JDK tools.jar to compile on the fly. You could even submit the code via AJAX REST and get the results back the same way.
Consider how Eclipse compiles your files in the background.
Also, consider how http://ideone.com implements their online compiler.
FYI It is a big security risk to allow random code execution. You have to be very careful about hackers.
Variable names:
You can write code to match variable names in one file with the variable names in the other, then you can replace both sets with a consistent variable name.
File 1:
var1 += this(var1 - 1);
File 2:
sum += this(sum - 1);
After you read File 1, you look for what variable name File 2 is using in the place of sum, then make the variable names the same across both files.
*Note, if variables are used in similar ways you may get incorrect substitutions. This is most likely when variables are being declared. To help mitigate this, you can start searching for variable names at the bottom of the file and work up.
Short hands:
Force {} and () braces into each if/else/for/while/etc...
rewrite operations like "i+=..." as "i=i+..."
Functions:
In cases where function order doesn't matter, you can make sure functions are equivalent and then ignore them.
Operator precedence:
"3 + (2 * 4)" is usually equivalent to "2 * 4 + 3"
A way around this could be by determining the precedence of each operation and then matching it to an operation of the same precedence in the other set of code. Once a set of operations have been matched, you can replace them with a variable to represent them.
Ex.
(2+4) * 3 + (2+6) * 5 == someotherequation
//substitute most precedent: (2+4) and (2+6) for a and b
... a * 3 + b * 5
//substitute most precedent: (a*3) and (b*5) for c and d
... c + d
//substitute most precedent....
These are just a couple ways I could think of. If you do it this way, it'll end up being quite a big project... especially if you're working with multiple languages.

How to subtract a substring from a string in web harvest

I am new to webharvest and am using it to get the article data from a website, using the following statement:
let $text := data($doc//div[#id="articleBody"])
and this is the data that I get from the above statement :
The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army
Notable people
Notable current and former residents of Pittstown include:
My question is that, is it possible to subtract a string from another
in the above example : "Notable people" from the content.
Is it possible to do this way? If its possible please let me know how. Thanks.
Is there something that I can do like this:
if (*contains*($text, 'Notable people')) then $text := *minus*($text, 'Notable people')
contains is a example function name to determine is a string is a substring of another,
and minus is a example function name to remove a substring from another
The desired output:
The Refine Spa (Furman's Mill) was built as a stone grist mill along the on a tributary of Capoolong Creek by Moore Furman, quartermaster general of George Washington's army
Notable current and former residents of Pittstown include:
From http://web-harvest.sourceforge.net/manual.php :
regexp
Searches the body for the given regular expression and optionally replaces found occurrences with specified pattern.
If body is a list of values then the regexp processor is applied to every item and final execution result is the list.
You just have to use correct regular expression a correct regexp-pattern and correct regexp-result

OrientDB having trouble with Unicode, Turkish, and enums

I am using a lib which has an enum type with consts like these;
Type.SHORT
Type.LONG
Type.FLOAT
Type.STRING
While I am debugging in Eclipse, I got an error:
No enum const class Type.STRİNG
As I am using a Turkish system, there is a problem on working i>İ but as this is an enum const, even though I put every attributes as UTF-8, nothing could get that STRING is what Eclipse should look for. But it still looks for STRİNG and it can't find and I can't use that. What must I do for that?
Project > Properties > Resouce > Text file encoding is UTF-8 now. Problem keeps.
EDIT: More information may give some clues which I can't get;
I am working on OrientDB. This is my first attempt, so I don't know if the problem could be on OrientDB packages. But I am using many other libs, I have never seen such a problem. There is a OType enum in this package, and I am only trying to connect to the database.
String url = "local:database";
ODatabaseObjectTx db = new ODatabaseObjectTx(url).
Person person = new Person("John");
db.save(person);
db.close();
There is no more code I use yet. Database created but then I get the java.lang.IllegalArgumentException:
Caused by: java.lang.IllegalArgumentException: No enum const class com.orientechnologies.orient.core.metadata.schema.OType.STRİNG
at java.lang.Enum.valueOf(Unknown Source)
at com.orientechnologies.orient.core.metadata.schema.OType.valueOf(OType.java:41)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLCreateProperty.parse(OCommandExecutorSQLCreateProperty.java:81)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLCreateProperty.parse(OCommandExecutorSQLCreateProperty.java:35)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.parse(OCommandExecutorSQLDelegate.java:43)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.parse(OCommandExecutorSQLDelegate.java:28)
at com.orientechnologies.orient.core.storage.OStorageEmbedded.command(OStorageEmbedded.java:63)
at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
at com.orientechnologies.orient.core.metadata.schema.OClassImpl.addProperty(OClassImpl.java:342)
at com.orientechnologies.orient.core.metadata.schema.OClassImpl.createProperty(OClassImpl.java:258)
at com.orientechnologies.orient.core.metadata.security.OSecurityShared.create(OSecurityShared.java:177)
at com.orientechnologies.orient.core.metadata.security.OSecurityProxy.create(OSecurityProxy.java:37)
at com.orientechnologies.orient.core.metadata.OMetadata.create(OMetadata.java:70)
at com.orientechnologies.orient.core.db.record.ODatabaseRecordAbstract.create(ODatabaseRecordAbstract.java:142)
... 4 more
Here is OType class: http://code.google.com/p/orient/source/browse/trunk/core/src/main/java/com/orientechnologies/orient/core/metadata/schema/OType.java
And other class; OCommandExecutorSQLCreateProperty:
http://code.google.com/p/orient/source/browse/trunk/core/src/main/java/com/orientechnologies/orient/core/sql/OCommandExecutorSQLCreateProperty.java
Line 81 says: type = OType.valueOf(word.toString());
Am I correct to assume you are running this program using a turkish locale? Then it seems the bug is in line 118 of OCommandExecutorSQLCreateProperty:
linkedType = OType.valueOf(linked.toUpperCase());
You would have to specify the Locale whose upper casing rules should be used, probably Locale.ENGLISH as the parameter to toUpperCase.
This problem is related to your database connection. Presumably, there's a string in OrientDB somewhere, and you are reading it, and then trying to use it to select a member of the enum.
I'm assuming in the code that you posted that the variable word comes from data in the database. If it comes from somewhere else, then the problem is the 'somewhere else'. If OrientDB, for some strange reason, returns 'STRİNG' as metadata to tell you the type of something, then that is indeed a defect in OrientDB.
If that string actually contains a İ, then no Eclipse setting will have any effect on the results. You will have to write code to normalize İ to I.
If you dump out the contents of 'word' as a sequence of hex values for the chars of the string, I think you'll see your İ staring right at you. You have to change what's in the DB to have a plain old I.
Unfortunately, it is related with regional setting, locale of your OS which is Turkish.
Two work around options :
1. Change your regional settings to English-US
2. Give encoding to the jvm as command line param for setting locale to English
-Duser.language=en -Duser.region=EN
I have created bug reports for xmlbeans, exist and apache cxf for the same issue. Enumeration toUpper is the point of the exception.
Some related links:
https://issues.apache.org/jira/browse/XMLSCHEMA-22
http://mail-archives.apache.org/mod_mbox/xmlbeans-user/201001.mbox/%3CSNT123-DS11993DD331D6CA7799C46CF6650#phx.gbl%3E
http://mail-archives.apache.org/mod_mbox/cxf-users/201203.mbox/%3CBLU0-SMTP115A668459D9A0DA11EA5FAF6460#phx.gbl%3E
https://vaadin.com/forum/-/message_boards/view_message/793105
http://comments.gmane.org/gmane.comp.apache.cxf.user/18316
One work-around is to type Type.ST and then press Ctrl-space. Eclipse should auto-complete the variable name without you having to figure out how to enter a dotless capital I on a Turkish keyboard. :)

Categories

Resources