Solr highlight function for Java web app - java

I have searched the net, but cannot seem to find a solution for how to use Solr's highlight function. I am using Java (Eclipse) with JSP & servlets.
F.Y.I. I am using Solr 5.
My objective is that when a user searches for the word "Hello", the word hello is highlighted.
For example: Hello I suck at programming!
Can any Solr expert give me a solution? I am sure that people will downvote my question, but I have got to give it a try.
Here is the code that I have tried. I have marked the Solr highlight part with a comment in the code below. Can any Solr expert take a look and tell me what I should do next? I would also appreciate it if someone who has done something like this before could share their source code.
protected void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    System.out.println("request parameter: " + request.getParameter("search"));
    PrintWriter out = response.getWriter();
    try {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/name/");
        SolrQuery query = new SolrQuery();
        // SOLR HIGHLIGHT
        query.set("hl", "true");
        query.set("hl.snippets", "1");
        query.set("hl.simple.pre", "<em>");
        query.set("hl.requireFieldMatch", "true");
        query.set("q", "value:supreme");
        query.set("hl.fl", "id,content");
        // note: setQuery overwrites the "q" parameter set above
        query.setQuery(request.getParameter("search"));
        query.setFields("id");
        query.setStart(0);
        QueryResponse response1 = solr.query(query);
        SolrDocumentList results = response1.getResults();
        for (int i = 0; i < results.size(); ++i) {
            out.println(results.get(i).toString());
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
}

As per the snippet you have provided, you are searching on the field value (for the term supreme) but wish to highlight the fields id and content. As far as I know, highlighting only applies to a field on which a match was found.
Try highlighting the value field and check the results.
Highlighting results can also be obtained by calling getHighlighting():
QueryResponse response1 = solr.query(query);
response1.getHighlighting()
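For reference, a minimal sketch of reading those results (assuming the unique key field is named id; getHighlighting() returns a map keyed by document id, whose values map field names to highlighted snippets):
Map<String, Map<String, List<String>>> highlighting = response1.getHighlighting();
for (SolrDocument doc : response1.getResults()) {
    String id = (String) doc.getFieldValue("id");
    Map<String, List<String>> docHighlights = highlighting.get(id);
    if (docHighlights == null) {
        continue; // no highlighted field for this document
    }
    for (Map.Entry<String, List<String>> field : docHighlights.entrySet()) {
        // each snippet already carries the hl.simple.pre/post markup, e.g. <em>hello</em>
        System.out.println(field.getKey() + ": " + field.getValue());
    }
}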

You're searching in the field value, but you're only asking for highlighting to be performed on the fields id and content.
query.set("hl.requireFieldMatch","true");
query.set("q", "value:supreme");
query.set("hl.fl", "id,content");
Since you're also telling Solr to only highlight fields where there's a match, no highlighting will take place. You'll have to either include the field you're expecting the match in (value) in your hl.fl list, or you might have to drop the hl.requireFieldMatch setting (possibly depending on the analysis chain and how the fields are indexed).
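For example, a sketch that keeps hl.requireFieldMatch but lists the matched field in hl.fl (field names taken from the question, not verified against your schema):
query.set("q", "value:supreme");
query.set("hl", "true");
query.set("hl.requireFieldMatch", "true");
// the field the match occurs in must be listed for highlighting to happen
query.set("hl.fl", "value");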

Use this for highlighting on the client side (plain java.util.regex, no Solr involved):
String searchTerm = "harry";
String result = results.toString();
Pattern pattern = Pattern.compile("(" + Pattern.quote(searchTerm.toLowerCase()) + ")",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
result = pattern.matcher(result).replaceAll("<strong>$1</strong>");
System.out.println(result);

Related

Dropbox SDK Java - write search query to get all files

I'm working on a simple project to download all files with certain extensions, and I'm doing a search like this:
public void findFile(String query) {
    try {
        SearchV2Builder searchBuilder = client.files().searchV2Builder(query);
        List<String> fileExtensions = Arrays.asList(extensions);
        SearchOptions searchOptions = SearchOptions.newBuilder().withFileExtensions(fileExtensions).build();
        SearchV2Result searchResult = searchBuilder.withOptions(searchOptions).start();
        List<SearchMatchV2> searchMatches = searchResult.getMatches();
        System.out.println(searchMatches.size());
        for (SearchMatchV2 s : searchMatches) {
            System.out.println(s.getMetadata());
        }
    } catch (DbxException e) {
        e.printStackTrace();
    }
}
And I don't know how to write a query that returns all the files. I tried "*" and "", and neither worked. How do I write the correct query for that?
In my experience, you have to include in your initial search string something you are specifically looking for - there does not seem to be the notion of a "wildcard" search in this string.
Also, there is no way to limit the query by date.
So for my usage, I always have directories that start with a known string ("p_"), and also include a YYYYMMDD string in them.
To return all directories, I query "p_" only.
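For example, with the findFile method from the question (a sketch; "p_" is the known prefix, not a wildcard):
// returns matches whose names contain the known prefix "p_"
findFile("p_");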

testing OpenNLP classifier model

I'm currently training a model for a classifier. Yesterday I found out that it will be more accurate if you also test the created classification model. I tried searching the internet for how to test a model: testing an OpenNLP model. But I can't get it to work. I think the reason is that I'm using OpenNLP version 1.8.3 instead of 1.5. Could anyone explain how to properly test my model in this version of OpenNLP?
Thanks in advance.
Below is the way im training my model:
public static DoccatModel trainClassifier() throws IOException {
    // read the training data
    final int iterations = 100;
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(
            new File("src/main/resources/trainingSets/trainingssetTest.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    // define the training parameters
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations + "");
    params.put(TrainingParameters.CUTOFF_PARAM, 0 + "");
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    // create a model from the training data
    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
    return model;
}
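As an aside, the trained model can be persisted and reloaded later; a sketch (the file name doccat-model.bin is arbitrary):
// save the model
try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("doccat-model.bin"))) {
    model.serialize(modelOut);
}
// ...and load it again
DoccatModel loadedModel;
try (InputStream modelIn = new FileInputStream("doccat-model.bin")) {
    loadedModel = new DoccatModel(modelIn);
}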
I can think of two ways to test your model. Either way, you will need annotated documents (and by annotated I really mean expert-classified).
The first way involves using the opennlp DoccatEvaluator command-line tool. The syntax would be something akin to
opennlp DoccatEvaluator -model model -data sampleData
The format of your sampleData should be
OUTCOME <document text....>
with one document per line (documents are separated by the newline character).
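For example, two lines of such a sample file might look like this (hypothetical outcomes and text):
POSITIVE this phone works great and the battery lasts for days
NEGATIVE the screen cracked after two days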
The second way involves creating a DocumentCategorizer. Something like:
// the model is the DoccatModel from your question
DocumentCategorizer categorizer = new DocumentCategorizerME(model);
// could also use: Tokenizer tokenizer = new TokenizerME(tokenizerModel)
Tokenizer tokenizer = WhitespaceTokenizer.INSTANCE;
// linesample is an ObjectStream<String> over your annotated data, as in your question
for (String sample = linesample.read(); sample != null; sample = linesample.read()) {
    String[] tokens = tokenizer.tokenize(sample);
    double[] outcomeProb = categorizer.categorize(tokens);
    String sampleOutcome = categorizer.getBestCategory(outcomeProb);
    // check if the outcome is right...
    // keep track of # right and wrong...
}
// calculate an agreement metric of your choice
Since I typed the code here, there may be a syntax error or two (which either I or the SO community can fix), but the idea of running through your data, tokenizing it, passing it through the document categorizer, and keeping track of the results is how you want to evaluate your model.
Hope it helps...
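To make the bookkeeping concrete, a minimal accuracy computation might look like this (a sketch; expectedOutcome is a hypothetical name for the label from your annotated data):
int correct = 0;
int total = 0;
// inside the loop above:
//     total++;
//     if (sampleOutcome.equals(expectedOutcome)) { correct++; }
// after the loop:
if (total > 0) {
    System.out.println("Accuracy: " + ((double) correct / total));
}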

How to get the browser name alone from client in java?

I tried using
String userAgent=req.getHeader("user-agent");
and also the following
@GET
@Path("/get")
public Response addUser(@HeaderParam("user-agent") String userAgent) {
    return Response.status(200)
            .entity("addUser is called, userAgent : " + userAgent)
            .build();
}
But I need only the browser name, such as Chrome, Firefox, or IE. Please help if anyone knows.
UPDATE: Got the answer
public String browser(@HeaderParam("user-agent") String userAgent) {
    // UserAgent here presumably comes from the user-agent-utils library
    UserAgent browserName = UserAgent.parseUserAgentString(userAgent);
    String browser = browserName.toString();
    System.out.println(browser);
    return browser;
}
Getting information out of user agent strings is somewhat of a black art. Easiest is probably to use a library to parse the user agent string and extract the needed information.
I've used UADetector in the past with good results, but there are undoubtedly other libraries out there.
The following sample is from the UADetector documentation:
UserAgentStringParser parser = UADetectorServiceFactory.getResourceModuleParser();
ReadableUserAgent agent = parser.parse(request.getHeader("User-Agent"));
out.append("You're a <em>");
out.append(agent.getName());
out.append("</em> on <em>");
out.append(agent.getOperatingSystem().getName());
out.append("</em>!");

Finding product in category

I need to find products in different categories on eBay. But when I use the tutorial code
ebay.apis.eblbasecomponents.FindProductsRequestType request = new ebay.apis.eblbasecomponents.FindProductsRequestType();
request.setCategoryID("Art");
request.setQueryKeywords("furniture");
I get the following error: QueryKeywords, CategoryID and ProductID cannot be used together.
So how is this done?
EDIT: the tutorial code is here.
EDIT2: the link to the tutorial code died, apparently. I've continued to search, and the category cannot be used with the keyword search, but there is a Domain that you could presumably add to the request; sadly it's not in the API, so I'm not sure whether it can be done at all.
The less-than-great eBay API doc is here.
This is my full request:
Shopping service = new ebay.apis.eblbasecomponents.Shopping();
ShoppingInterface port = service.getShopping();
bp = (BindingProvider) port;
bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
// Add the logging handler
List<Handler> handlerList = bp.getBinding().getHandlerChain();
if (handlerList == null) {
    handlerList = new ArrayList<Handler>();
}
LoggingHandler loggingHandler = new LoggingHandler();
handlerList.add(loggingHandler);
bp.getBinding().setHandlerChain(handlerList);
Map<String, Object> requestProperties = bp.getRequestContext();
Map<String, List<String>> httpHeaders = new HashMap<String, List<String>>();
requestProperties.put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, endpointURL);
httpHeaders.put("X-EBAY-API-CALL-NAME", Collections.singletonList(CALLNAME));
httpHeaders.put("X-EBAY-API-APP-ID", Collections.singletonList(APPID));
httpHeaders.put("X-EBAY-API-VERSION", Collections.singletonList(VERSION));
requestProperties.put(MessageContext.HTTP_REQUEST_HEADERS, httpHeaders);
// initialize WS operation arguments here
FindProductsRequestType request = new FindProductsRequestType();
request.setAvailableItemsOnly(true);
request.setHideDuplicateItems(true);
request.setMaxEntries(2);
request.setPageNumber(1);
request.setQueryKeywords("Postcard");
request.setDomain("");
The last line, which should set the domain like I need to, does not compile. Any idea how to solve this?
EDIT 3: I gave up on the Java API and I'm doing direct REST. The categories on eBay are actually domains now, and the URL looks like this:
String findProducts = "http://open.api.ebay.com/shopping?callname=FindProducts&responseencoding=XML&appid=" + APPID
        + "&siteid=0&version=525"
        + "&AvailableItemsOnly=true"
        + "&QueryKeywords=" + keywords
        + "&MaxEntries=10"
        + "&DomainName=" + domainName;
This works, but you want to hear a joke? It seems like not all the domains are listed here and so it doesn't really solve this problem. Pretty disappointing work by eBay.
The solution for finding items based on keywords, in a category, is to use findItemsAdvanced. Could have saved me a lot of time if the docs for FindProducts stated this, instead of just saying that you can use either keyword search OR category search.
This is the API URL:
String findItemsAdvanced = "http://open.api.ebay.com/shopping?callname=findItemsAdvanced&responseencoding=XML&appid=" + APPID
        + "&siteid=0&version=525"
        + "&AvailableItemsOnly=true"
        + "&QueryKeywords=" + keywords
        + "&categoryId=" + categoryId
        + "&MaxEntries=50";
For completion, if you want to get a list of all the top categories you can use this:
String getCategories = "http://open.api.ebay.com/Shopping?callname=GetCategoryInfo&appid=" + APPID
        + "&siteid=0&CategoryID=-1&version=729&IncludeSelector=ChildCategories";

ROME API to parse RSS/Atom

I'm trying to parse RSS/Atom feeds with the ROME library. I am new to Java, so I am not in tune with many of its intricacies.
Does ROME automatically use its modules to handle different feeds as it comes across them, or do I have to ask it to use them? If so, any direction on this.
How do I get to the correct 'source'? I was trying to use item.getSource(), but it is giving me fits. I guess I am using the wrong interface. Some direction would be much appreciated.
Here is the meat of what I have for collecting my data.
I noted two areas where I am having problems, both revolving around getting the source information of the feed. By source, I mean CNN, or Fox News, or whoever, not the author.
Judging from my reading, .getSource() is the correct method.
List<String> feedList = theFeeds.getFeeds();
List<FeedData> feedOutput = new ArrayList<FeedData>();
for (String sites : feedList) {
    URL feedUrl = new URL(sites);
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(feedUrl));
    List<SyndEntry> entries = feed.getEntries();
    for (SyndEntry item : entries) {
        String title = item.getTitle();
        String link = item.getUri();
        Date date = item.getPublishedDate();
        SyndEntry source = item.getSource(); // <-- problem here
        String description;
        if (item.getDescription() == null) {
            description = "";
        } else {
            description = item.getDescription().getValue();
        }
        String cleanDescription = description.replaceAll("\\<.*?>", "").replaceAll("\\s+", " ");
        FeedData feedData = new FeedData();
        feedData.setTitle(title);
        feedData.setLink(link);
        feedData.setSource(link); // <-- and here
        feedData.setDate(date);
        feedData.setDescription(cleanDescription);
        String preview = createPreview(cleanDescription);
        feedData.setPreview(preview);
        feedOutput.add(feedData);
        // let's print out my pieces.
        System.out.println("Title: " + title);
        System.out.println("Date: " + date);
        System.out.println("Text: " + cleanDescription);
        System.out.println("Preview: " + preview);
        System.out.println("*****");
    }
}
getSource() is definitely wrong - it returns the SyndFeed to which the entry in question belongs. Perhaps what you want is getContributors()?
As far as modules go, they should be selected automatically. You can even write your own and plug it in, as described here.
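If by "source" you mean the name of the feed itself (CNN, Fox News, and so on), one option is to take it from the SyndFeed you already built rather than from each entry; a sketch against the code in the question (setSource is the FeedData setter from the question):
// the feed-level title is usually the publisher's name
String sourceName = feed.getTitle();
feedData.setSource(sourceName);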
What about trying to regex the source from the URL, without using the API?
That was my first thought; anyway, I checked against the standardized RSS format itself to get an idea of whether this option is actually available at that level, and then tried to trace its implementation upwards.
In RSS 2.0, I have found the source element; however, it appears that it doesn't exist in previous versions of the spec - not good news for us!
<source> is an optional sub-element of <item>.
Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.
