I'm currently trying to convert ISO-code units of measure into their full-text labels.
For example, I receive a string such as "LTR" and want to convert it to "Liter". The labels are in German, so I'm also looking for a way to do this localized.
Is there a library that already does this? Is there an enum somewhere containing all this information?
Otherwise, I guess I'll just have to create one myself.
Thanks a lot.
JSR 363 deals with units of measurement and has been implemented in UOM. You can browse the Javadoc to get an idea of what's in there.
There was also a project called JScience, but it doesn't seem to have been updated for some time.
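If neither library covers the codes you receive, the "create one myself" fallback from the question could look roughly like the sketch below. The class name `UnitLabels` and the method `label` are made up for illustration. Only "LTR" and "Liter" come from the question; the other codes and labels are sample entries I added, so check them against whatever code list your data actually uses.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps unit codes to labels per language.
class UnitLabels {
    // code -> (language tag -> label). Only "LTR" -> "Liter" is from the question;
    // the remaining entries are illustrative assumptions.
    private static final Map<String, Map<String, String>> LABELS = Map.of(
            "LTR", Map.of("de", "Liter", "en", "Litre"),
            "KGM", Map.of("de", "Kilogramm", "en", "Kilogram"),
            "MTR", Map.of("de", "Meter", "en", "Metre"));

    // Returns the label for the locale's language, falling back to English,
    // or an empty Optional when the code is unknown.
    static Optional<String> label(String code, Locale locale) {
        Map<String, String> byLanguage = LABELS.get(code.trim().toUpperCase(Locale.ROOT));
        if (byLanguage == null) {
            return Optional.empty();
        }
        String label = byLanguage.get(locale.getLanguage());
        return Optional.ofNullable(label != null ? label : byLanguage.get("en"));
    }
}
```

Calling `UnitLabels.label("LTR", Locale.GERMAN)` gives an Optional holding "Liter". For a larger code list, moving the labels into per-locale properties files loaded through `ResourceBundle` would probably scale better than a hard-coded map.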
Related
I have multiple resumes in the kind of format somebody sends to a company when applying for a job. I need to parse these resumes in Java.
Do I need to convert these resumes to XML first before parsing? Would the example below be a reasonable way to represent a resume in XML?
<Name>Varjhjh</Name>
<Experience>5</Experience>
<Age>7</Age>
.
.
.
Resume parsing isn't a trivial task. I implemented one strategy a couple of years ago, and the main problem is that everybody structures their CV in their own way.
E.g. one person writes "Date of Birth", another "DOB", the next "Birth Date", so you have to use some kind of dictionary for these cases.
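A minimal sketch of such a dictionary, assuming the labels have already been split from their values. The class name `ResumeLabels`, the canonical field name `dateOfBirth` and the phone entries are illustrative choices, not taken from any parsing product.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps the many spellings of a CV label to one canonical field name.
class ResumeLabels {
    private static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("date of birth", "dateOfBirth");
        SYNONYMS.put("dob", "dateOfBirth");
        SYNONYMS.put("birth date", "dateOfBirth");
        SYNONYMS.put("mobile", "mobilePhone");     // illustrative extra entries
        SYNONYMS.put("landline", "landlinePhone");
    }

    // Lower-cases the label, strips punctuation such as ':' or '.',
    // collapses whitespace, then looks the result up in the dictionary.
    static Optional<String> canonical(String rawLabel) {
        String key = rawLabel.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z ]", "")
                .replaceAll("\\s+", " ")
                .trim();
        return Optional.ofNullable(SYNONYMS.get(key));
    }
}
```

With this, "DOB", "D.O.B." and "Birth Date:" all resolve to the same field, while an unknown label such as "Hobbies" comes back empty so it can be logged and added to the dictionary later.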
Another interesting problem is parsing names, especially if the candidate has a very long name, e.g. Frederick Gerald Hubert Irvim John Kenneth.
Or, for example, a user may list several phone numbers: a landline, a mobile, the phone of reference 1, of reference 2, etc.
I remember these guys parsed CVs fairly well:
www.rchilli.com/
Other Parsing vendors include: Sovren, Daxtra, Burning Glass and Hireability
But I'm not sure if they have Java integration, and not sure about their cost.
Anyway, good luck in parsing.
Full disclosure: I work for Sovren, which is a parsing vendor. Resume parsing is not a trivial task. Many companies, including Sovren, HireAbility, Daxtra and Burning Glass, offer installed and SaaS solutions for parsing. The typical workflow is to convert the non-image resume/CV to text and parse it, returning HR-XML, the industry standard.
I want to build a Java tool that extracts LLDP information from some devices (switches, routers, etc.) to make a 'topology map'.
Trying snmpwalk, I found only information that is useless for this case.
I think the LLDP MIB is 1.0.8802.1.1.2, but I'm not sure.
Does anyone know how to extract this, using snmpwalk or another method?
Thanks in advance.
Going by the IEEE 802 MIB document, what happens when you try to walk this OID:
1.0.8802.1.1.2.1
It stands for:
iso(1) std(0) iso8802(8802) ieee802dot1(1) ieee802dot1mibs(1) lldpMIB(2) lldpObjects(1)
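Once the walk returns data, the Java side mostly comes down to picking the neighbour rows out of the output. The sketch below assumes numeric output in the usual net-snmp `OID = STRING: "value"` form (e.g. from `snmpwalk -On`) and reads only the lldpRemSysName column, 1.0.8802.1.1.2.1.4.1.1.9, whose instance suffix is timeMark.localPortNum.remIndex. The class names and the sample lines in the usage note are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for numeric snmpwalk output of the LLDP remote-systems table.
class LldpWalkParser {
    // lldpRemSysName column; the rest of the OID is timeMark.localPortNum.remIndex
    static final String REM_SYS_NAME = ".1.0.8802.1.1.2.1.4.1.1.9.";
    // Assumed line shape: <oid> = STRING: "value" (quotes optional)
    private static final Pattern LINE = Pattern.compile("^(\\S+) = STRING: \"?(.*?)\"?$");

    static final class Neighbor {
        final int localPort;      // local port number the neighbour was seen on
        final String remoteName;  // system name announced by the neighbour

        Neighbor(int localPort, String remoteName) {
            this.localPort = localPort;
            this.remoteName = remoteName;
        }
    }

    // Keeps only lldpRemSysName rows and turns each into a (local port, remote name) pair.
    static List<Neighbor> parse(List<String> lines) {
        List<Neighbor> neighbors = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches() || !m.group(1).startsWith(REM_SYS_NAME)) {
                continue;
            }
            String[] index = m.group(1).substring(REM_SYS_NAME.length()).split("\\.");
            if (index.length != 3) {
                continue; // unexpected index shape, skip rather than guess
            }
            neighbors.add(new Neighbor(Integer.parseInt(index[1]), m.group(2)));
        }
        return neighbors;
    }
}
```

Feeding it a line such as `.1.0.8802.1.1.2.1.4.1.1.9.0.5.1 = STRING: "switch-b"` yields one neighbour named "switch-b" on local port 5. Repeating this per device gives the edges of the topology map; a real tool would more likely query the agent directly through an SNMP library than shell out to snmpwalk.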
This can be done via snmpwalk. IF-MIB and IP-MIB are the generic MIBs that will provide you with good enough data to build something like that. If it's just for Cisco, then CDP-MIB will give you everything you need. Let me know if you are trying to make something vendor-neutral.
I'm back with a question. I'm playing with RapidMiner for automatic text classification and can't get it to work. I'm getting an error that says, "no example set in the example, offending operator Performance". Any idea what that is referring to?
In RapidMiner you have to use the converter components before using the data as example sets. So, if you have an output of type 'doc', for example, you have to use the 'Documents to Data' component in order to link it to the next 'exa' input. That's all!
Could you provide more details about your RapidMiner text mining process?
Without more context, your question is difficult to answer.
For more help with RapidMiner, you may want to check out the RapidMiner user forum: http://forum.rapid-i.com/
At RapidMiner Resources, you can find RapidMiner tutorial videos on how to do text mining with RapidMiner:
http://rapidminerresources.com/index.php?page=text-mining-3
Rapid-I also offers a 90-minute text mining webinar. You can find it on the Rapid-I web page under "services" and "training" or in the web shop.
I hope these links help you to get started with automatic text classification with RapidMiner. If you provide more details about your RapidMiner text mining process, I may also be able to directly answer your question.
If it says that there is no Example Set, then the issue is probably with your original data. Can you post an image of your process?
For instance, make sure that you have connected the initial input to your operator. Between which two operators does the error occur?
One thought: the example set in text mining is usually your document collection, but if you are really using documents (PDF, Word) then your format will be Documents (Doc), and you may need to transform your documents to data (Documents to Data operator). Then you should have an Example Set that you can feed into your Performance operator.
Hope this helps - as the earlier comment said, without knowing the process, it is hard to tell exactly where the error is.
I want to write a Java function grabTopResults(String f) such that grabTopResults("automata theory") returns a list of the top 100 cited papers on scholar.google.com for "automata theory".
Does anyone have suggestions for libraries that will make my life easy?
Thanks!
As I'm sure Google can afford the bandwidth, I'll ignore the question of whether this is immoral/illegal/prohibited by Google's T&C.
The first thing you need to do is figure out what HTTP request (or requests) you need to issue in order to obtain the page with the data you need. Once you've figured this out, use HttpClient to issue the same request from Java code. The previous link shows example code that explains how to do this.
Once you've downloaded the content of the relevant page, you'll need to use an HTML parser to extract the data you're interested in. The Jericho parser suggested by peperg is a good choice.
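The first two steps can be sketched roughly as follows. This uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of the HttpClient library linked above, purely so the example needs no extra dependency. The class and method names are made up, and the `/scholar?q=` URL is my assumption about what the search form submits; confirm it in your browser first.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the search request and downloads the result page.
class ScholarFetch {
    // Step 1: the request the browser would issue for this query (assumed URL shape).
    static URI buildSearchUri(String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create("https://scholar.google.com/scholar?q=" + encoded);
    }

    // Step 2: issue the same request from Java and return the raw HTML.
    static String fetchPage(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The string returned by `fetchPage(ScholarFetch.buildSearchUri("automata theory"))` is what you would then hand to the HTML parser. Getting 100 results means repeating the request for further result pages, which needs whatever paging parameter you observe in step 1.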
If the Google police come knocking, you've never heard of me, OK?
I use http://jericho.htmlparser.net/docs/index.html . Google Scholar doesn't have an API ( http://code.google.com/p/google-ajax-apis/issues/detail?id=109 ). Of course, this is not allowed by Google (read the terms of use; automated requests are forbidden).
Below is a bit of example code which gets the titles on the first page using the open source product TestPlan. It is a standalone product, but if you really need it I could help you integrate it into your Java code (it is written in Java itself).
GotoURL http://scholar.google.com/
SubmitForm with
%Params:q% automate theory
end
set %Items% as response //div[@class='gs_r']
foreach %Item% in %Items%
set %Title% as selectIn %Item% h3
Notice %Title%
end
This produces output like the below (my IP is in Germany, hence the German response). Obviously you could format it however you like, or write it to a file; this is just a rough test.
00000000-00 GOTOURL http://scholar.google.com/
00000001-00 SUBMITFORM default
00000002-00 NOTICE [ZITATION] Stochastic complexity in statistical inquiry theory
00000003-00 NOTICE AUTOMATED THEORY FORMATION IN MATHEMATICS1
00000004-00 NOTICE Constraint generation via automated theory formation
00000005-00 NOTICE [BUCH] Automated theorem proving: after 25 years
00000006-00 NOTICE [BUCH] Introduction to the Theory of Computation
00000007-00 NOTICE [ZITATION] Computer-controlled systems: theory and design
00000008-00 NOTICE [BUCH] … , randomness & incompleteness: papers on algorithmic information theory
00000009-00 NOTICE [BUCH] Automatic control systems
00000010-00 NOTICE [BUCH] VLSI physical design automation: theory and practice
00000011-00 NOTICE Singular Control Systems.
Friends,
I need to know how to convert text to the picture-message (.ota) format in Java for sending to mobile phones. I am developing software that sends the picture message to another mobile via the serial port.
Could anyone help me create a routine for the conversion process? I need that routine to convert the given text/picture to the .ota format.
Having read the article about the file format, I would say it doesn't sound all that complicated. The basic steps are pretty much outlined and could be implemented within the hour. I'm guessing you've done that already by now?
(And if so, mind sharing the code that solves the question? ;) It shouldn't need to be anything over a few hundred lines, right?)
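For what it's worth, here is a rough sketch of how such a routine might look. It relies on my understanding of the OTA bitmap layout, so verify it against the format article before using it: a four-byte header (0x00, width, height, 0x01 for one bit per pixel) followed by the pixels row by row, most significant bit first, with 1 meaning black. The 72x28 size is the usual picture-message size as far as I know. All class and method names are made up.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;

// Hypothetical text-to-OTA converter; the byte layout is an assumption, see above.
class OtaBitmap {
    static final int WIDTH = 72;   // assumed picture-message width
    static final int HEIGHT = 28;  // assumed picture-message height

    // Draws the text in black on white and returns true for every black pixel.
    static boolean[][] render(String text) {
        BufferedImage img = new BufferedImage(WIDTH, HEIGHT, BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, WIDTH, HEIGHT);
        g.setColor(Color.BLACK);
        g.drawString(text, 1, HEIGHT / 2);
        g.dispose();
        boolean[][] pixels = new boolean[HEIGHT][WIDTH];
        for (int y = 0; y < HEIGHT; y++) {
            for (int x = 0; x < WIDTH; x++) {
                pixels[y][x] = (img.getRGB(x, y) & 0xFFFFFF) == 0;
            }
        }
        return pixels;
    }

    // Packs the pixel grid into header + 1-bit-per-pixel data.
    // Restricted to widths that are a multiple of 8 to sidestep row-padding questions.
    static byte[] encode(boolean[][] pixels) {
        int height = pixels.length;
        int width = pixels[0].length;
        if (width % 8 != 0 || width > 255 || height > 255) {
            throw new IllegalArgumentException("unsupported size " + width + "x" + height);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x00);   // info field
        out.write(width);
        out.write(height);
        out.write(0x01);   // colour depth: 1 bit per pixel
        for (boolean[] row : pixels) {
            for (int x = 0; x < width; x += 8) {
                int b = 0;
                for (int bit = 0; bit < 8; bit++) {
                    if (row[x + bit]) {
                        b |= 0x80 >> bit; // leftmost pixel goes into the highest bit
                    }
                }
                out.write(b);
            }
        }
        return out.toByteArray();
    }
}
```

`OtaBitmap.encode(OtaBitmap.render("Hello"))` then gives the bytes to write to a .ota file or to send over the serial port. Long text will simply be clipped at 72 pixels, so wrapping or font scaling is left to you.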