ROME API to parse RSS/Atom - java

I'm trying to parse RSS/Atom feeds with the ROME library. I am new to Java, so I am not in tune with many of its intricacies.
Does ROME automatically use its modules to handle different feeds as it comes across them, or do I have to ask it to use them? If I have to ask, any direction on how would help.
How do I get to the correct 'source'? I was trying to use item.getSource(), but it is giving me fits. I guess I am using the wrong interface. Some direction would be much appreciated.
Here is the meat of what I have for collecting my data.
I noted two areas where I am having problems, both revolving around getting the source information of the feed. By source, I mean CNN, or FoxNews, or whoever publishes the feed, not the author.
Judging from my reading, .getSource() is the correct method.
List<String> feedList = theFeeds.getFeeds();
List<FeedData> feedOutput = new ArrayList<FeedData>();
for (String sites : feedList) {
    URL feedUrl = new URL(sites);
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(feedUrl));
    List<SyndEntry> entries = feed.getEntries();
    for (SyndEntry item : entries) {
        String title = item.getTitle();
        String link = item.getUri();
        Date date = item.getPublishedDate();
        // Problem here -->
        SyndEntry source = item.getSource();
        String description;
        if (item.getDescription() == null) {
            description = "";
        } else {
            description = item.getDescription().getValue();
        }
        String cleanDescription = description.replaceAll("\\<.*?>", "").replaceAll("\\s+", " ");
        FeedData feedData = new FeedData();
        feedData.setTitle(title);
        feedData.setLink(link);
        // And here -->
        feedData.setSource(link);
        feedData.setDate(date);
        feedData.setDescription(cleanDescription);
        String preview = createPreview(cleanDescription);
        feedData.setPreview(preview);
        feedOutput.add(feedData);
        // let's print out my pieces.
        System.out.println("Title: " + title);
        System.out.println("Date: " + date);
        System.out.println("Text: " + cleanDescription);
        System.out.println("Preview: " + preview);
        System.out.println("*****");
    }
}

getSource() is definitely wrong: it returns the SyndFeed to which the entry in question belongs. Perhaps what you want is getContributors()?
As far as modules go, they should be selected automatically. You can even write your own and plug it in, as described here.

What about extracting the source from the URL with a regex, without using the API?
That was my first thought. Still, I checked the standardized RSS format itself to see whether this option is actually available at that level, so that I could then trace its implementation upwards...
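The URL-based idea can be sketched without ROME at all. The snippet below is only a rough illustration: FeedHostSketch and hostLabel are hypothetical names, and the list of prefixes to strip ("www.", "rss.", "feeds.") is an assumption rather than anything defined by RSS. It uses java.net.URI instead of a regex to pull out the host.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: derives a rough "source" label from a feed URL's host name.
class FeedHostSketch {

    // Returns the lower-cased host without a leading "www.", "rss." or "feeds." label.
    static String hostLabel(String feedUrl) {
        String host = URI.create(feedUrl).getHost();
        if (host == null) {
            return ""; // e.g. a relative or malformed URL without a host part
        }
        host = host.toLowerCase(Locale.ROOT);
        // Assumed list of common feed-related subdomain prefixes.
        for (String prefix : new String[] {"www.", "rss.", "feeds."}) {
            if (host.startsWith(prefix)) {
                return host.substring(prefix.length());
            }
        }
        return host;
    }
}
```

This only yields a domain such as cnn.com, not a display name, so it is best treated as a fallback when the feed itself carries no usable source information.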
In RSS 2.0 I found the source element; however, it does not appear to exist in previous versions of the spec, which is not good news for us!
<source> is an optional sub-element of <item>.
Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.
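To make the spec excerpt concrete, here is a minimal sketch that reads the source element with the JDK's built-in DOM parser rather than ROME. RssSourceSketch and sourceName are hypothetical names, and falling back to the channel title when an item has no source element is an assumption about what the asker wants, not something the spec prescribes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of ROME): reads source names directly from RSS 2.0 XML.
class RssSourceSketch {

    // Returns the <source> text of the item at itemIndex,
    // or the channel-level <title> if that item has no <source>.
    static String sourceName(String rssXml, int itemIndex) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
        Node item = doc.getElementsByTagName("item").item(itemIndex);
        if (item == null) {
            throw new IllegalArgumentException("No item at index " + itemIndex);
        }
        NodeList sources = ((Element) item).getElementsByTagName("source");
        if (sources.getLength() > 0) {
            return sources.item(0).getTextContent().trim();
        }
        // Assumed fallback: the <title> that is a direct child of <channel>.
        Node channel = doc.getElementsByTagName("channel").item(0);
        for (Node n = channel == null ? null : channel.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "title".equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

Since source is optional and specific to RSS 2.0, many feeds will end up on the fallback path.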

testing OpenNLP classifier model

I'm currently training a model for a classifier. Yesterday I found out that it will be more accurate if you also test the created classifier model. I tried searching the internet for how to test a model (testing openNLP model), but I can't get it to work. I think the reason is that I'm using OpenNLP version 1.8.3 instead of 1.5. Could anyone explain how to properly test my model in this version of OpenNLP?
Thanks in advance.
Below is the way I'm training my model:
public static DoccatModel trainClassifier() throws IOException
{
    // read the training data
    final int iterations = 100;
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/trainingssetTest.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
    // define the training parameters
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations+"");
    params.put(TrainingParameters.CUTOFF_PARAM, 0+"");
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
    // create a model from training data
    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
    return model;
}
I can think of two ways to test your model. Either way, you will need annotated documents (and by annotated I really mean expert-classified).
The first way involves using the opennlp DoccatEvaluator tool. The syntax would be something akin to
opennlp DoccatEvaluator -model model -data sampleData
The format of your sampleData should be
OUTCOME <document text....>
Documents are separated by the newline character.
The second way involves creating a DocumentCategorizer. Something like:
(the model is the DoccatModel from your question)
DocumentCategorizer categorizer = new DocumentCategorizerME(model);
// could also use: Tokenizer tokenizer = new TokenizerME(tokenizerModel)
Tokenizer tokenizer = WhitespaceTokenizer.INSTANCE;
// lineStream is like in your question...
for (String sample = lineStream.read(); sample != null; sample = lineStream.read()) {
    String[] tokens = tokenizer.tokenize(sample);
    double[] outcomeProb = categorizer.categorize(tokens);
    String sampleOutcome = categorizer.getBestCategory(outcomeProb);
    // check if the outcome is right...
    // keep track of # right and wrong...
}
// calculate agreement metric of your choice
Since I typed the code here there may be a syntax error or two (either I or the SO community can fix), but the idea for running through your data, tokenizing, running it through the document categorizer and keeping track of the results is how you want to evaluate your model.
Hope it helps...
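The "keep track of right and wrong" step from the second approach could look roughly like the sketch below. It is plain Java with no OpenNLP dependency: DoccatAccuracySketch and the Function-based classifier parameter are assumptions made for illustration, and the input lines are assumed to follow the "OUTCOME <document text>" format described above, with whitespace tokenization.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: computes plain accuracy over labeled lines.
class DoccatAccuracySketch {

    // Each line is "OUTCOME token token ...". The classifier maps tokens to a category name.
    // Returns correct / total, or 0.0 when there are no non-blank lines.
    static double accuracy(List<String> labeledLines, Function<String[], String> classifier) {
        int total = 0;
        int correct = 0;
        for (String line : labeledLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Split the expected outcome from the document text.
            String[] parts = trimmed.split("\\s+", 2);
            String expected = parts[0];
            String[] tokens = parts.length > 1 ? parts[1].split("\\s+") : new String[0];
            String predicted = classifier.apply(tokens);
            total++;
            if (expected.equals(predicted)) {
                correct++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

With the categorizer from the snippet above, the classifier argument could presumably be written as tokens -> categorizer.getBestCategory(categorizer.categorize(tokens)). The test lines should be documents that were not used for training.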

Watson Natural Language Understanding Java Example

Does anyone have an example of making a call to Watson Natural Language Understanding using Java? The API docs only show Node. However, there is a class in the SDK to support it, but no documentation on how to construct the required 'Features', 'AnalyzeOptions' or 'Builder' input.
Here's a snippet that throws a 'Features cannot be Null' error. I'm just fumbling in the dark at this point.
String response = docConversionService.convertDocumentToHTML(doc).execute();
Builder b = new AnalyzeOptions.Builder();
b.html(response);
AnalyzeOptions ao = b.build();
nlu.analyze(ao);
Until the API reference is published, have you tried looking at the tests on github? See here for NaturalLanguageUnderstandingIT
I've gotten it working with a text string, and looking at the above test, it won't be too much to get it working with a URL or HTML (changing the AnalyzeOptions builder call from text() to html() for example).
Code example:
final NaturalLanguageUnderstanding understanding =
    new NaturalLanguageUnderstanding(
        NaturalLanguageUnderstanding.VERSION_DATE_2017_02_27);
understanding.setUsernameAndPassword(serviceUsername, servicePassword);
understanding.setEndPoint(url);
understanding.setDefaultHeaders(getDefaultHeaders());
final String testString =
    "In remote corners of the world, citizens are demanding respect"
    + " for the dignity of all people no matter their gender, or race, or religion, or disability,"
    + " or sexual orientation, and those who deny others dignity are subject to public reproach."
    + " An explosion of social media has given ordinary people more ways to express themselves,"
    + " and has raised people's expectations for those of us in power. Indeed, our international"
    + " order has been so successful that we take it as a given that great powers no longer"
    + " fight world wars; that the end of the Cold War lifted the shadow of nuclear Armageddon;"
    + " that the battlefields of Europe have been replaced by peaceful union; that China and India"
    + " remain on a path of remarkable growth.";
final ConceptsOptions concepts =
    new ConceptsOptions.Builder().limit(5).build();
final Features features =
    new Features.Builder().concepts(concepts).build();
final AnalyzeOptions parameters = new AnalyzeOptions.Builder()
    .text(testString).features(features).returnAnalyzedText(true).build();
final AnalysisResults results =
    understanding.analyze(parameters).execute();
System.out.println(results);
Make sure you populate your NLU service with default headers (setDefaultHeaders()). I pulled these from WatsonServiceTest (I'd post the link but my rep is too low. Just use the FindFile option on WDC github)
final Map<String, String> headers = new HashMap<String, String>();
headers.put(HttpHeaders.X_WATSON_LEARNING_OPT_OUT, String.valueOf(true));
headers.put(HttpHeaders.X_WATSON_TEST, String.valueOf(true));
return headers;

Retrieving modified/checked-in files from a ClearCase activity

This query is related to Rational ClearCase CM API programming using Java. We have a requirement to get the list of modified files of a particular stream. I am able to get the list of activities, which are of type CcActivity, from a given stream, and using that activity list I am also able to fetch the version information.
I am unable to get the changeset information, i.e. the names of the files which were modified, as there is no such method defined.
Could you please help me out as to which property or method I should use to fetch the list of modified files or changeset information using the activity ID or version information? Below is the code I have written for getting the activity list and version information:
PropertyRequest propertyrequest = new PropertyRequest(
        CcStream.ACTIVITY_LIST, CcStream.TASK_LIST);
stream = (CcStream) stream.doReadProperties(propertyrequest);
List<CcActivity> listOfAct = stream.getActivityList();
for (int i = 0; i < listOfAct.size(); i++) {
    CcActivity ccActivity = listOfAct.get(i);
    // Properties to fetch for each activity
    PropertyRequest activityPropertyRequest = new PropertyRequest(
            CcActivity.COMMENT, CcActivity.ID, CcActivity.DISPLAY_NAME, CcActivity.LATEST_VERSION_LIST,
            CcActivity.CREATOR_DISPLAY_NAME, CcActivity.NAME_RESOLVER_VIEW, CcActivity.TASK_LIST,
            CcActivity.CREATOR_LOGIN_NAME, CcActivity.HEADLINE);
    ccActivity = (CcActivity) ccActivity.doReadProperties(activityPropertyRequest);
    trace(ccActivity.getDisplayName());
    trace(ccActivity.getCreatorDisplayName());
    trace("CREATOR_LOGIN_NAME :" + ccActivity.getCreatorLoginName());
    trace("Headline:" + ccActivity.getHeadline());
    ResourceList<javax.wvcm.Version> versionList = ccActivity.getLatestVersionList();
    for (int j = 0; j < versionList.size(); j++) {
        Version version = versionList.get(j);
        // Properties to fetch for each version
        PropertyRequest versionPropertyRequest = new PropertyRequest(
                Version.PREDECESSOR_LIST, Version.VERSION_NAME, Version.VERSION_HISTORY.nest(VersionHistory.CHILD_MAP),
                Version.DISPLAY_NAME, Version.COMMENT, Version.PATHNAME_LOCATION,
                Version.ACTIVITY.nest(Resource.CONTENT_TYPE));
        version = (Version) version.doReadProperties(versionPropertyRequest);
        trace("Version Info");
        trace("Version Name : " + version.getVersionName());
        trace("Version Comment :" + version.getComment());
    }
}
Old thread, but I have recently been trying to learn how to list modified files through the API. I believe the issue is that the version object returned by the ResourceList get method should be cast to the CcVersion type.
This exposes the StpResource.doReadProperties(Resource context, Feedback feedback) method which requires a handle to the view we are interested in. We can also request the CcVersion specific properties that we are interested in.
The example would then become something like:
ResourceList<javax.wvcm.Version> versionList = ccActivity.getLatestVersionList();
for (int j = 0; j < versionList.size(); j++) {
    // Cast to CcVersion so the ClearCase-specific properties are accessible
    CcVersion version = (CcVersion) versionList.get(j);
    PropertyRequest versionPropertyRequest = new PropertyRequest(
            CcVersion.DISPLAY_NAME, CcVersion.VIEW_RELATIVE_PATH, CcVersion.CREATION_DATE);
    // view is a handle to the view we are interested in
    version = (CcVersion) version.doReadProperties(view, versionPropertyRequest);
    trace("Version Info");
    trace("Version DISPLAY_NAME : " + version.getDisplayName());
    trace("Version VIEW_RELATIVE_PATH : " + version.getViewRelativePath());
    trace("Version CREATION_DATE : " + version.getCreationDate());
}

Creating a poll system using PircBot

I'm new to Java and I'm trying to create a poll system using PircBot.
So far my code is this:
if (message.startsWith("!poll")) {
    String polly = message.substring(6);
    String[] vote = polly.split(" ");
    String vote1 = vote[0];
    String vote2 = vote[1];
}
Which splits the strings so that someone can type !poll "option1 option2" for example and it will be split into vote1 = option1 and vote2 = option2.
I'm kind of lost from here. Am I even heading in the right direction for creating a voting system?
I figure that I'd have a separate statement as follows.
if (message.equalsIgnoreCase("!vote " + option1))
But I'm not sure where to go with that either.
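One possible direction, as a minimal sketch rather than a definitive design: keep the poll options in a map from option name to vote count, and let a single handler react to both !poll and !vote. PollSketch, onMessage and votesFor are hypothetical names, and the class is deliberately independent of PircBot; a bot would presumably call it from its own message callback and send the returned reply to the channel.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical poll state, independent of any IRC library.
class PollSketch {
    // Option name (lower case) -> number of votes, in the order the options were given.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Handles one chat message. Returns a reply, or null if the message is not a poll command.
    String onMessage(String message) {
        String trimmed = message.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("!poll ")) {
            // Start a new poll: every whitespace-separated word becomes an option.
            counts.clear();
            for (String option : lower.substring(6).trim().split("\\s+")) {
                counts.put(option, 0);
            }
            return "Poll started: " + String.join(", ", counts.keySet());
        }
        if (lower.startsWith("!vote ")) {
            String choice = lower.substring(6).trim();
            if (!counts.containsKey(choice)) {
                return "Unknown option: " + choice;
            }
            counts.merge(choice, 1, Integer::sum);
            return choice + " now has " + counts.get(choice) + " vote(s)";
        }
        return null;
    }

    // Current number of votes for an option, or 0 if the option does not exist.
    int votesFor(String option) {
        return counts.getOrDefault(option.toLowerCase(Locale.ROOT), 0);
    }
}
```

This sketch does not stop one person from voting several times. Recording the sender's nick in a set alongside the counts would be one way to add that.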

Finding product in category

I need to find products in different categories on eBay. But when I use the tutorial code
ebay.apis.eblbasecomponents.FindProductsRequestType request = new ebay.apis.eblbasecomponents.FindProductsRequestType();
request.setCategoryID("Art");
request.setQueryKeywords("furniture");
I get the following error: QueryKeywords, CategoryID and ProductID cannot be used together.
So how is this done?
EDIT: the tutorial code is here.
EDIT2: the link to the tutorial code died, apparently. I've continued to search and the category cannot be used with the keyword search, but there's a Domain that you could presumably add to the request, but sadly it's not in the API - so I'm not sure if indeed it can be done.
The less-than-great eBay API doc is here.
This is my full request:
Shopping service = new ebay.apis.eblbasecomponents.Shopping();
ShoppingInterface port = service.getShopping();
bp = (BindingProvider) port;
bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
// Add the logging handler
List<Handler> handlerList = bp.getBinding().getHandlerChain();
if (handlerList == null) {
    handlerList = new ArrayList<Handler>();
}
LoggingHandler loggingHandler = new LoggingHandler();
handlerList.add(loggingHandler);
bp.getBinding().setHandlerChain(handlerList);
Map<String,Object> requestProperties = bp.getRequestContext();
Map<String, List<String>> httpHeaders = new HashMap<String, List<String>>();
requestProperties.put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
httpHeaders.put("X-EBAY-API-CALL-NAME", Collections.singletonList(CALLNAME));
httpHeaders.put("X-EBAY-API-APP-ID", Collections.singletonList(APPID));
httpHeaders.put("X-EBAY-API-VERSION", Collections.singletonList(VERSION));
requestProperties.put(MessageContext.HTTP_REQUEST_HEADERS, httpHeaders);
// initialize WS operation arguments here
FindProductsRequestType request = new FindProductsRequestType();
request.setAvailableItemsOnly(true);
request.setHideDuplicateItems(true);
request.setMaxEntries(2);
request.setPageNumber(1);
request.setQueryKeywords("Postcard");
request.setDomain("");
The last line, which should set the domain like I need to, does not compile. Any idea how to solve this?
EDIT 3: I gave up on the Java API and I'm doing direct REST. The categories on eBay are actually domains now, and the URL looks like this:
String findProducts = "http://open.api.ebay.com/shopping?callname=FindProducts&responseencoding=XML&appid=" + APPID
+ "&siteid=0&version=525&"
+ "&AvailableItemsOnly=true"
+ "&QueryKeywords=" + keywords
+ "&MaxEntries=10"
+ "&DomainName=" + domainName;
This works, but you want to hear a joke? It seems like not all the domains are listed here and so it doesn't really solve this problem. Pretty disappointing work by eBay.
The solution for finding items based on keywords, in a category, is to use findItemsAdvanced. Could have saved me a lot of time if the docs for FindProducts stated this, instead of just saying that you can use either keyword search OR category search.
This is the API URL:
String findItems = "http://open.api.ebay.com/shopping?callname=findItemsAdvanced&responseencoding=XML&appid=" + APPID
+ "&siteid=0&version=525&"
+ "&AvailableItemsOnly=true"
+ "&QueryKeywords=" + keywords
+ "&categoryId=" + categoryId
+ "&MaxEntries=50";
For completeness, if you want to get a list of all the top categories you can use this:
"http://open.api.ebay.com/Shopping?callname=GetCategoryInfo&appid=" + APPID + "&siteid=0&CategoryID=-1&version=729&IncludeSelector=ChildCategories"
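The query strings above are concatenated by hand, which breaks as soon as a keyword contains a space or an ampersand. A small sketch of building such a URL with encoded parameters follows. QueryUrlSketch and build are hypothetical names, and nothing here is specific to eBay; the parameter names in the usage example are simply the ones from the posts above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper: appends URL-encoded query parameters to a base URL.
class QueryUrlSketch {

    // Parameters are appended in the map's iteration order, so pass a LinkedHashMap
    // if the order matters.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(encode(p.getKey()))
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            // Form-style encoding: spaces become '+', reserved characters become %XX.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, a LinkedHashMap holding callname=FindProducts, QueryKeywords=vintage post card and MaxEntries=10 would produce http://open.api.ebay.com/shopping?callname=FindProducts&QueryKeywords=vintage+post+card&MaxEntries=10.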
