Filter query exclusion with SolrJ

I use a SolrJ-based client to query Solr, and I have been trying to construct HTTP requests in which certain facet name/value pairs are excluded. The web interface I am working with has a "refine further" feature that allows excluding one or more facet values. I have 3 facet fields (domain, content type and author) and I would like to handle faceting by exclusion on each of them. For example, q=Dickens combined with fq=-author:Dickens, Janet should construct the following HTTP request:
/solr/solrbase/select?q=Dickens&fq=-author:Dickens%2c+Janet&wt=json&indent=true
Whereas the XML dump will look like:
<facets>
<facet name="author">
<facetEntry count="20">Dickens, Charles</facetEntry>
<facetEntry count="10">Dickens, Sarah</facetEntry>
</facet>
</facets>
So far, the Java implementation I am working with does not seem to handle filter query exclusion:
private HttpSolrServer solrServer;
solrServer = new HttpSolrServer("http://localhost:8983/solr/");
private static final String CONFIG_SOLR_FACET_FIELD = "facet_field";
private String[] _facetFields = new String[] {"author"};
private static final String CONFIG_SOLR_FACETS = "facets";
Element el = myParams.getChild(CONFIG_SOLR_FACETS);
_facetUse = el.getAttributeValue("useFacets", "true");
_facetMinCount = el.getAttributeValue("minCount", String.valueOf(1));
_facetLimit = el.getAttributeValue("limit", String.valueOf(20));
List vals = el.getChildren(CONFIG_SOLR_FACET_FIELD);
if (vals.size() > 0) {
_facetFields = new String[vals.size()];
for (int i=0; i < vals.size(); i++) {
_facetFields[i] = ((Element)vals.get(i)).getTextTrim();
}
}
SolrQuery query = new SolrQuery();
query.setQuery(qs);
List facetList = doc.getRootElement().getChildren("facet");
Iterator<?> it = facetList.iterator();
while (it.hasNext()) {
Element el = (Element) it.next();
String name = el.getAttributeValue("name");
String value = el.getTextTrim();
if (name != null && value != null) {
facets.add(name+":"+value);
}
}
query.setQuery(qs)
.setFacet(Boolean.parseBoolean(_facetUse))
.setFacetMinCount(Integer.parseInt(_facetMinCount))
.setFacetLimit(Integer.parseInt(_facetLimit));
// register every configured facet field
for (int i = 0; i < _facetFields.length; i++) {
query.addFacetField(_facetFields[i]);
}
// add each selected facet as a filter query
for (int i = 0; i < facets.size(); i++) {
query.addFilterQuery(facets.get(i));
}
return query;
}
I was recommended to use something along these lines:
SolrQuery solrQuery = new SolrQuery();
solrQuery.set(CommonParams.FQ, "-author:Dickens,Janet");
However, this seems to be a hardcoded approach and it cannot be easily applied across all 3 facets and all facet values. I have looked at this, but still it is not clear to me how I should include the exclusion variant in my current code. Can you help with this?
Thanks indeed,
I.
EDIT 1
I have attached the code to construct/prepare the Solr Query, but I should have also included how the Solr instance is actually queried:
private QueryResponse execQuery(SolrQuery query) throws SolrServerException {
QueryResponse rsp = solrServer.query( query );
return rsp;
}
Moreover, it would be helpful to post the code that converts the Solr query response for the facets into something that can be understood by the web application:
Element elfacets = new Element("facets");
List<FacetField> facets = rsp.getFacetFields();
if (facets != null) {
int i = 0;
for (FacetField facet : facets) {
Element sfacet = new Element("facet");
sfacet.setAttribute("name", facet.getName());
List<Count> facetEntries = facet.getValues();
for(FacetField.Count fcount : facetEntries) {
Element facetEntry = new Element("facetEntry");
facetEntry.setText(fcount.getName());
facetEntry.setAttribute("count", String.valueOf(fcount.getCount()));
sfacet.addContent(facetEntry);
}
elfacets.addContent(sfacet);
}
root.addContent(elfacets);
}
doc.addContent(root);
return doc;
}
"facets" is nothing more than the XSLT, which includes rules on how to map Solr facets with the facets as handled by the web application.
EDIT 2
I attach the "facets" template, which is called by the code as presented in EDIT 1:
<xsl:template name="facets">
<xsl:param name="q" />
<xsl:analyze-string select="$q" regex='AND facet_(.*?):\(("?.*?"?)\)'>
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1) = 'author'">
<facet name="author"><xsl:value-of select="regex-group(2)" /></facet>
</xsl:when>
</xsl:choose>
</xsl:matching-substring>
<xsl:non-matching-substring>
<!--<xsl:analyze-string select="$q" regex='AND NOT facet_(.*?):\(("?.*?"?)\)'>
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="regex-group(1) = 'author'">
<facet name="author"><xsl:value-of select="regex-group(2)" /></facet>
</xsl:when>
</xsl:choose>
</xsl:matching-substring>
</xsl:analyze-string>-->
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
The template only features the author facet, but I have 3 facets in total. It should be noted that my web application has the following syntax for excluding facets:
AND NOT facet_author:("Dickens, Janet")

I assume you have the following lines inside some method. Instead of hard-coding the fq part, pass it in as a variable.
SolrQuery solrQuery = new SolrQuery();
solrQuery.set(CommonParams.FQ, "-author:Dickens,Janet");
If you need the fq, pass a proper parameter (e.g. "-author:Dickens,Janet"). Otherwise pass an empty string, so your query will look like
/solr/solrbase/select?q=Dickens&fq=&wt=json&indent=true
Then add the faceting part of the query. Even though the query contains an empty fq=, it won't throw an error. The filter simply has no effect, and the rest of the query works fine.
Hope this will help.
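To avoid hard-coding the fq value, the filter strings can be derived from the web application's own syntax. The following is only a sketch in plain Java (no SolrJ calls), assuming the selections arrive in the "AND facet_x:(...)" / "AND NOT facet_x:(...)" form shown in the question. The class name FacetFilters and both method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SolrJ: turns the web application's facet
// syntax into Solr filter-query strings.
class FacetFilters {
    // Matches both "AND facet_author:(...)" and "AND NOT facet_author:(...)".
    private static final Pattern FACET =
            Pattern.compile("AND (NOT )?facet_(\\w+):\\((.*?)\\)");

    /** Builds one fq string; a leading "-" marks an excluded value. */
    static String toFilterQuery(String field, String value, boolean exclude) {
        // Quote the value so that commas and spaces stay part of a single term.
        String quoted = "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return (exclude ? "-" : "") + field + ":" + quoted;
    }

    /** Extracts every facet clause from the web query, for any facet field. */
    static List<String> parse(String webQuery) {
        List<String> filterQueries = new ArrayList<>();
        Matcher m = FACET.matcher(webQuery);
        while (m.find()) {
            boolean exclude = m.group(1) != null; // "NOT " was present
            String value = m.group(3);
            // Drop the surrounding quotes used by the web application, if any.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            filterQueries.add(toFilterQuery(m.group(2), value, exclude));
        }
        return filterQueries;
    }
}
```

Each returned string could then be passed to query.addFilterQuery(...), in the same way the loop in the question already does for included facets. Because the field name is taken from the matched text, the same code would cover all three facets.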

Related

org.dbunit.database.AmbiguousTableNameException

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<employee name = "abc" empId = "100" location = "goa" dept = "IT" />
<facilities transport="Y" mealcard = "" driver = "Y"/>
<reportees manager = "XXX" lead = "XXX" VP = "XXX"/>
<employee name = "XYZ" empId = "101" location = "mombai" dept = "IT" />
<facilities transport="Y" mealcard = "" driver = "Y"/>
<reportees manager = "XXX" lead = "XXX" VP = "XXX"/>
<employee name = "PQA" empId = "102" location = "delhi" dept = "IT" />
<facilities transport="Y" mealcard = "" driver = "Y"/>
<reportees manager = "XXX" lead = "XXX" VP = "XXX"/>
Above is my XML, which I have to validate. I'm using DBUnit to compare it with another XML file (which has the same content but is generated from a different source). While creating the dataset I get "org.dbunit.database.AmbiguousTableNameException".
I have an SQL query for every XML tag. For example, for the employee tag I query on empId, for facilities I query on mealcard, and so on. I loop through the XML tags and use QueryDataSet.addTable(table_name, query) to add the tables, i.e. the XML tags. The first iteration runs without error, but when the next iteration tries to add the employee tag again I get the error stated above.

Configure the schema name so dbUnit knows which to use. See Why am I getting an "AmbiguousTableNameException"?

I resolved this issue. For repeating tags, I appended random numbers to the table name in the dataset. In the XML above, employee is a repeating tag, so I added employee_123 for the first instance and employee_098 for the next one. This is how I got rid of the AmbiguousTableNameException. Later I used a regex to remove the "_[randomNumber]" suffix from the employee tag. The code now runs successfully.

@DBUnit(qualifiedTableNames = true)
This setting helped me.

Converting tab-delimited text file with multiple columns to XML

I'm trying to programmatically convert a text file with multiple columns of info into an XML file with this format:
<ExampleDataSet>
<Example ExID="AA" exampleCode="AA" exampleDescription="THIS IS AN EXAMPLE DESCRIPTION"/>
<Example ExID="BB" exampleCode="BB" exampleDescription="THIS IS AN EXAMPLE DESCRIPTION"/>
<Example ExID="CC" exampleCode="CCC" exampleDescription="THIS IS AN EXAMPLE DESCRIPTION"/>
<Example ExID="DDD" exampleCode="DD" exampleDescription="THIS IS AN EXAMPLE DESCRIPTION"/>
<Example ExID="EEEE" exampleCode="EE" exampleDescription="THIS IS AN EXAMPLE DESCRIPTION"/>
</ExampleDataSet>
I've found other examples that do similar conversions, but on a simpler level. Could anyone point me in the right direction?

You can manually create an XML document using the below. This example creates an XML document with 1 element and the attributes required.
First, create the xml document itself and append the top level element collection header.
XmlDocument doc = new XmlDocument();
XmlNode node = doc.CreateElement("ExampleDataSet");
doc.AppendChild(node);
Now create a new element for a row. (You would need a loop here, one iteration per row of the input file.)
XmlNode eg1 = doc.CreateElement("Example");
Then create each of the attributes of the element and append.
XmlAttribute att1 = doc.CreateAttribute("ExID");
att1.Value = "AA";
XmlAttribute att2 = doc.CreateAttribute("exampleCode");
att2.Value = "AA";
XmlAttribute att3 = doc.CreateAttribute("exampleDescription");
att3.Value = "THIS IS AN EXAMPLE DESCRIPTION";
eg1.Attributes.Append(att1);
eg1.Attributes.Append(att2);
eg1.Attributes.Append(att3);
Finally, append to the parent node.
node.AppendChild(eg1);
You can get the XML string like this if you need it.
string xml = doc.OuterXml;
Or you can save it directly to a file.
doc.Save("C:\\test.xml");
Hope that helps you on your way.
Thanks

In XSLT 3.0 you can write this as, for example:
<xsl:variable name="columns" select="'exId', 'exCode', 'exDesc'"/>
<xsl:template name="xsl:initial-template">
<DataSet>
<xsl:for-each select="unparsed-text-lines('input.csv')">
<xsl:variable name="tokens" select="tokenize(., '\t')"/>
<Example>
<xsl:for-each select="1 to count($tokens)">
<xsl:variable name="i" select="."/>
<xsl:attribute name="{$columns[$i]}" select="$tokens[$i]"/>
</xsl:for-each>
</Example>
</xsl:for-each>
</DataSet>
</xsl:template>
I'm not sure why you tagged the question "Java" and "C#" but you can run this using Saxon-HE called from Java or C# or from the command line.

Using xml linq and assuming first row of the file are the column headers
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.txt";
static void Main(string[] args)
{
XDocument doc = new XDocument();
doc.Add(new XElement("ExampleDataSet"));
XElement root = doc.Root;
StreamReader reader = new StreamReader(FILENAME);
int rowCount = 1;
string line = "";
string[] headers = null;
while((line = reader.ReadLine()) != null)
{
if (rowCount++ == 1)
{
headers = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
}
else
{
string[] arrayStr = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
XElement newRow = new XElement("Example");
root.Add(newRow);
for (int i = 0; i < arrayStr.Count(); i++)
{
newRow.Add(new XAttribute(headers[i], arrayStr[i]));
}
}
}
}
}
}
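Since the question was also tagged Java, the same idea can be sketched with the JDK's built-in DOM API. This is a minimal sketch under the same assumption as above: the first line holds the column headers, and those headers are valid XML attribute names. The class name TabToXml and the method toDocument are made up for illustration.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <ExampleDataSet> with one <Example> per data row.
class TabToXml {
    static Document toDocument(List<String> lines) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
        Element root = doc.createElement("ExampleDataSet");
        doc.appendChild(root);
        if (lines.isEmpty()) {
            return doc;
        }
        // Assumption: the first line contains the attribute names.
        String[] headers = lines.get(0).split("\t");
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // Limit -1 keeps trailing empty cells so columns stay aligned.
            String[] cells = line.split("\t", -1);
            Element row = doc.createElement("Example");
            for (int i = 0; i < Math.min(headers.length, cells.length); i++) {
                row.setAttribute(headers[i], cells[i]);
            }
            root.appendChild(row);
        }
        return doc;
    }
}
```

The input lines could come from java.nio.file.Files.readAllLines, and the resulting Document can be written to a file with a javax.xml.transform.Transformer.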

Hibernate Search Faceting not working

I'm programming a web application with Java EE, Hibernate Search, JPA and JSF.
I have been reading the Hibernate Search documentation over and over again, but I just can't get faceting to work properly.
I have a database which includes several categories. I made an example with football clubs.
I have the category Germany, which has the subcategories Bundesliga, 2. Bundesliga and so on.
I also have categories called ChampionsLeague and EuroLeague, and some other leagues representing different countries.
If I search for "Deutschland", Hibernate Search gives me the correct list of all football clubs playing in Germany. Some of these clubs participate in the ChampionsLeague and EuroLeague. The faceting in my left navigation bar shows the categories in which the German clubs take part, and it also displays the correct facet count.
The problem is that if I click on one of the categories, Hibernate Search shows me all of the clubs in this category, not only the German clubs I searched for in my initial search.
Can anybody tell me how to fix this problem?
Here is my code:
SearchBean:
public void startKeywordSearch(){
fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Company.class).get();
query = qb
.keyword()
.fuzzy().withEditDistanceUpTo(1).withPrefixLength(0)
.onFields("companyName", "companyShortDescription", "companyLongDescription", "categoryList.categoryName", "and so on")
.matching(keyword)
.createQuery();
categoryNameFacetingRequest = qb.facet()
.name("categoryNameFacet")
.onField("categoryList.categoryName_forFaceting")
.discrete()
.orderedBy(FacetSortOrder.COUNT_DESC)
.includeZeroCounts(false)
.maxFacetCount(100)
.createFacetingRequest();
persistenceQuery = fullTextEntityManager.createFullTextQuery(query, Company.class);
facetManager = fullTextEntityManager.createFullTextQuery(query, Company.class).getFacetManager();
facetManager.enableFaceting(categoryNameFacetingRequest);
result = persistenceQuery.getResultList();
facetResults = facetManager.getFacets("categoryNameFacet");
searchCount = result.size();
}
Here is my code for the addFacet method:
public void addFacet(Facet facet) {
fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
Query luceneQuery = facet.getFacetQuery();
persistenceQuery = fullTextEntityManager.createFullTextQuery(luceneQuery, Company.class);
facetManager.enableFaceting(categoryNameFacetingRequest);
result = persistenceQuery.getResultList();
facetResults = facetManager.getFacets("categoryNameFacet");
FacetSelection facetSelection = facetManager.getFacetGroup("categoryNameFacet");
facetSelection.selectFacets(facet);
result = persistenceQuery.getResultList();
}
And that's the code for generating my link:
<div>
<h:form id="facetForm">
<ul>
<ui:repeat value="#{searchBean.facetResults}" var="facet">
<li><h:commandLink value="#{facet.value}" action="#{searchBean.addFacet(facet)}">
<f:ajax render="@all" />
</h:commandLink> (#{facet.count})</li>
</ui:repeat>
</ul>
</h:form>
</div>

I'm not sure where exactly you call addFacet, but it looks like you just run the query returned by facet#getFacetQuery on its own. This won't work. The facet query is supposed to be applied on top of the existing query, either via a boolean query or via a FacetSelection (which acts on top of the original query). The documentation has an example for that:
// create a fulltext query
Query luceneQuery = builder.all().createQuery(); // match all query
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, clazz );
// retrieve facet manager and apply faceting request
FacetManager facetManager = fullTextQuery.getFacetManager();
facetManager.enableFaceting( priceFacetingRequest );
// get the list of Cd
List<Cd> cds = fullTextQuery.list();
assertTrue(cds.size() == 10);
// retrieve the faceting results
List<Facet> facets = facetManager.getFacets( "priceFaceting" );
assertTrue(facets.get(0).getCount() == 2);
// apply first facet as additional search criteria
FacetSelection facetSelection = facetManager.getFacetGroup( "priceFaceting" );
facetSelection.selectFacets( facets.get( 0 ) );
// re-execute the query
cds = fullTextQuery.list();
assertTrue(cds.size() == 2);
See http://docs.jboss.org/hibernate/search/5.3/reference/en-US/html_single/#_restricting_query_results

Remove empty namespace in java using DOM

<Request>
<EMPId>?</EMPId>
</Request>
I know this is a repeated question, but I would like to post it again as I didn't get a convincing answer from any of the threads I went through.
My ultimate aim is to add the XML given above as the body content of a SOAP message.
You can have a look at the following link to see how I am doing it.
Namespace related error on creating SOAP Request
It worked fine when I was using the WebSphere Application Server 7.0 library. (The JRE is also present; I forgot to include it in the screenshot.)
Since I have to export it as a jar and run it as a standalone application, I have to remove the dependency on the 'Websphere Application Server 7.0 library', because keeping this library pushes my jar size above 100MB. So I thought of taking only the library which I needed:
'com.ibm.ws.prereq.soap.jar'
Now the issue is, the Request tag of the generated SOAP request is coming in following format.
<Request xmlns="">
<EMPId>?</EMPId>
</Request>
I am able to create an 'org.w3c.dom.Document' representation of the generated SOAP message.
Can anyone tell me how I can delete the xmlns="" from the Request tag?

The simplest way I found (this example uses JDOM) is:
first, set the same namespace on the child as on the root;
second, remove the namespace declaration from the child.
Document doc = new Document();
Namespace xmlns = Namespace.getNamespace("http://www.microsoft.com/networking/WLAN/profile/v1");
Element rootXML = new Element("WLANProfile", xmlns);
Element nameXML = new Element("name");
nameXML.addContent(name);
rootXML.addContent(nameXML);
//below solution
nameXML.setNamespace(xmlns);
nameXML.removeNamespaceDeclaration(xmlns);

I finally found several solutions to the described problem.
First, you can remove all namespaces from the whole XML using this answer.
Second, if you do not need to remove all namespaces but only the empty ones: they arise because the root element declares a namespace that its children are not in. For example:
<ЭДПФР xmlns="http://пф.рф/КСАФ/2018-04-03"
xmlns:АФ4="xx"...>
<КСАФ xmlns="">
...
</КСАФ>
So you need to set the same namespace on all children of the root element. This can be done with the following code (call setTheSameNamespaceForChildren(rootElement) on the root element before saving):
private static final String namespaceKey = "xmlns";
private static String namespaceValue;
public static void setTheSameNamespaceForChildren(Element rootEl) {
namespaceValue = rootEl.getAttribute(namespaceKey);
NodeList list = rootEl.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
Node child = list.item(i);
setTheSameNamespaceRecursively(child);
}
}
private static void setTheSameNamespaceRecursively(Node node) {
if (node.getNodeType() == Node.ELEMENT_NODE) {
boolean isChanged = setTheSameNamespace((Element) node);
if (isChanged) {
NodeList list = node.getChildNodes();
for (int i = 0; i < list.getLength(); i++) {
Node child = list.item(i);
setTheSameNamespaceRecursively(child);
}
}
}
}
private static boolean setTheSameNamespace(Element node) {
String curValue = node.getAttribute(namespaceKey);
if (curValue.length() == 0) {
node.setAttribute(namespaceKey, namespaceValue);
return true;
}
return false;
}
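When the Document is namespace-aware, an alternative to writing xmlns attributes by hand is to move the no-namespace elements into the parent's namespace with the DOM Level 3 method Document.renameNode. The following is only a sketch, assuming the child elements really are meant to live in that namespace (it changes their qualified names, so the receiving side must expect that). The class name NamespaceFixer and its methods are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper built only on the JDK's DOM API.
class NamespaceFixer {
    /** Parses XML with namespace awareness switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Moves every descendant element that is in no namespace into targetNs. */
    static void adoptNamespace(Element parent, String targetNs) {
        Document doc = parent.getOwnerDocument();
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before any replacement
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                if (el.getNamespaceURI() == null) {
                    // Drop an explicit xmlns="" declaration, if one was parsed.
                    el.removeAttribute("xmlns");
                    // renameNode may return a new node, so keep its result.
                    el = (Element) doc.renameNode(el, targetNs, el.getNodeName());
                }
                adoptNamespace(el, targetNs);
            }
            child = next;
        }
    }
}
```

A typical call would be NamespaceFixer.adoptNamespace(root, root.getNamespaceURI()) just before serializing the document.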

Extracting Values From an XML File Either using XPath, SAX or DOM for this Specific Scenario

I am currently working on an academic project, developing in Java and XML. The actual task is to parse XML and pass the required values, preferably in a HashMap, on for further processing. Here is a short snippet of the actual XML.
<root>
<BugReport ID = "1">
<Title>"(495584) Firefox - search suggestions passes wrong previous result to form history"</Title>
<Turn>
<Date>'2009-06-14 18:55:25'</Date>
<From>'Justin Dolske'</From>
<Text>
<Sentence ID = "3.1"> Created an attachment (id=383211) [details] Patch v.2</Sentence>
<Sentence ID = "3.2"> Ah. So, there's a ._formHistoryResult in the....</Sentence>
<Sentence ID = "3.3"> The simple fix it to just discard the service's form history result.</Sentence>
<Sentence ID = "3.4"> Otherwise it's trying to use a old form history result that no longer applies for the search string.</Sentence>
</Text>
</Turn>
<Turn>
<Date>'2009-06-19 12:07:34'</Date>
<From>'Gavin Sharp'</From>
<Text>
<Sentence ID = "4.1"> (From update of attachment 383211 [details])</Sentence>
<Sentence ID = "4.2"> Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
</Text>
</Turn>
<Turn>
<Date>'2009-06-19 13:17:56'</Date>
<From>'Justin Dolske'</From>
<Text>
<Sentence ID = "5.1"> (In reply to comment #3)</Sentence>
<Sentence ID = "5.2"> &gt; (From update of attachment 383211 [details] [details])</Sentence>
<Sentence ID = "5.3"> &gt; Perhaps we should rename one of them to _fhResult just to reduce confusion?</Sentence>
<Sentence ID = "5.4"> Good point.</Sentence>
<Sentence ID = "5.5"> I renamed the one in the wrapper to _formHistResult. </Sentence>
<Sentence ID = "5.6"> fhResult seemed maybe a bit too short.</Sentence>
</Text>
</Turn>
.....
and so on
</BugReport>
Many commenters, such as 'Justin Dolske', have commented on this report. What I am looking for is the list of commenters and all the sentences each of them has written across the whole XML file, something like if(from == justin dolske) getHisAllSentences(), and likewise for every other commenter. I have tried many different ways to get the sentences for 'Justin Dolske' or other commenters, and also generically for all of them, using XPath, SAX and DOM, but failed. I am quite new to these technologies, including Java, and don't know how to achieve it.
Can anyone guide me on how to do this with any of the above technologies, or suggest a better strategy?
(Note: Later I want to put the result in a HashMap (key, value), where key = name of the commenter (justin dolske) and value = all of their sentences.)
Urgent help will be highly appreciated.

There are several ways to achieve your requirement.
One way would be to use JAXB. There are several tutorials on it available on the web, so feel free to refer to them.
You can also build a DOM, extract the data from it and put it into your HashMap.
One reference implementation would be something like this:
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
public class XMLReader {
private HashMap<String,ArrayList<String>> namesSentencesMap;
public XMLReader() {
namesSentencesMap = new HashMap<String, ArrayList<String>>();
}
private Document getDocument(String fileName){
Document document = null;
try{
document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(fileName));
}catch(Exception exe){
//handle exception
}
return document;
}
private void buildNamesSentencesMap(Document document){
if(document == null){
return;
}
//Get each Turn block
NodeList turnList = document.getElementsByTagName("Turn");
String fromName = null;
NodeList sentenceNodeList = null;
for(int turnIndex = 0; turnIndex < turnList.getLength(); turnIndex++){
Element turnElement = (Element)turnList.item(turnIndex);
//Assumption: <From> element
Element fromElement = (Element) turnElement.getElementsByTagName("From").item(0);
fromName = fromElement.getTextContent();
//Extracting sentences - First check whether the map contains
//an ArrayList corresponding to the name. If yes, then use that,
//else create a new one
ArrayList<String> sentenceList = namesSentencesMap.get(fromName);
if(sentenceList == null){
sentenceList = new ArrayList<String>();
}
//Extract sentences from the Turn node
try{
sentenceNodeList = turnElement.getElementsByTagName("Sentence");
for(int sentenceIndex = 0; sentenceIndex < sentenceNodeList.getLength(); sentenceIndex++){
sentenceList.add(((Element)sentenceNodeList.item(sentenceIndex)).getTextContent());
}
}finally{
sentenceNodeList = null;
}
//Put the list back in the map
namesSentencesMap.put(fromName, sentenceList);
}
}
public static void main(String[] args) {
XMLReader reader = new XMLReader();
reader.buildNamesSentencesMap(reader.getDocument("<your_xml_file>"));
for(String names: reader.namesSentencesMap.keySet()){
System.out.println("Name: "+names+"\tTotal Sentences: "+reader.namesSentencesMap.get(names).size());
}
}
}
Note: This is just a demonstration and you would need to modify it to suit your need. I've created it based on your XML to show one way of doing it.

I suggest using JAXB to create a data model reflecting your XML structure.
Once done, you can load the XML into Java instances.
Put each 'Turn' into a Map< String, List< Turn >>, using Turn.From as the key.
Once done, you can write:
List< Turn > justinsTurn = allTurns.get( "'Justin Dolske'" );
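The same grouping can also be sketched without JAXB, using the XPath API that ships with the JDK. This is only an illustration under the assumption that the input follows the snippet in the question (Turn elements containing From and Text/Sentence). Unlike the key used above, this version strips the single quotes around the names. The class name SentencesByCommenter is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps each commenter name to all of their sentences.
class SentencesByCommenter {
    static Map<String, List<String>> group(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList turns = (NodeList) xpath.evaluate(
                    "//Turn", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            for (int i = 0; i < turns.getLength(); i++) {
                Node turn = turns.item(i);
                // The sample data wraps names in single quotes; remove them.
                String from = xpath.evaluate("From", turn).trim().replaceAll("^'|'$", "");
                NodeList sentences = (NodeList) xpath.evaluate(
                        "Text/Sentence", turn, XPathConstants.NODESET);
                List<String> collected = result.computeIfAbsent(from, k -> new ArrayList<>());
                for (int j = 0; j < sentences.getLength(); j++) {
                    collected.add(sentences.item(j).getTextContent().trim());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
        return result;
    }
}
```

With this, group(xml).get("Justin Dolske") would return every sentence from all of that commenter's turns, in document order.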
