I need to add a property that returns a timestamp. To test this, I use the example below to print a sample date in the "dataimport.properties" file so I can get the last modified time, but it's not working.
dataconfig.xml:
<dataConfig>
<dataSource type="JdbcDataSource"
driver="org.apache.cassandra.cql.jdbc.CassandraDriver"
url="jdbc:cassandra://localhost:9160/sample"
user="cassandra"
password="cassandra"
autoCommit="true"/>
<document name="content">
<entity name="defaults" query="SELECT id from sample.contacts"
deltaImportQuery="select id from sample.contacts where modifiedtime >'${dataimporter.defaults.last_index_time}' allow filtering"
deltaQuery="select id from sample.contacts where modifiedtime > '${dataimporter.last_index_time}' limit 1 allow filtering "
autoCommit="true">
<field column="id" name="id" />
</entity>
</document>
<propertyWriter dateFormat="yyyy-MM-dd" type="SimplePropertiesWriter" directory="conf" filename="dataimport.properties" locale="en-US"/>
</dataConfig>
Try this
<propertyWriter dateFormat="yyyy-MM-dd" type="SimplePropertiesWriter" />
Your last_index_time will then be written in the desired format (yyyy-MM-dd) to conf/dataimport.properties.
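To check what was actually written, the properties file can be read back. Below is a minimal Python sketch, assuming the file contains a `last_index_time` key and that colons are stored escaped as `\:` (the usual behaviour of Java properties files). The function name `read_last_index_time` and the sample content are made up for illustration.

```python
from datetime import datetime


def read_last_index_time(properties_text, fmt="%Y-%m-%d"):
    """Return last_index_time from dataimport.properties content as a datetime.

    fmt is the Python counterpart of the propertyWriter dateFormat
    (yyyy-MM-dd -> %Y-%m-%d); this mapping is an assumption of the sketch.
    """
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "last_index_time":
            # Java properties files escape ':' as '\:' when storing values
            return datetime.strptime(value.strip().replace("\\:", ":"), fmt)
    return None


# Hypothetical file content, as it might look with dateFormat="yyyy-MM-dd"
sample = "#Sat Mar 05 15:01:04 UTC 2011\nlast_index_time=2011-03-05\n"
print(read_last_index_time(sample))
```

If the file still shows a value with a time part, pass the matching format (for example `"%Y-%m-%d %H:%M:%S"`) to see whether the propertyWriter setting was picked up at all.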
I have several different XML files that I want to display as HTML using a JSP file.
The structure of these files can differ: they may have different depths and tag names.
I've read about XSLT, but I found only examples with a manually created XSL stylesheet. (I can't do that because I will have many different XML files.)
How can I produce my HTML?
EDIT:
Example of form:
<form name="Company">
<field name="name" required="true" type="inputText"/>
<field name="registrationDate" required="true" type="date"/>
<field name="isActive" required="false" type="boolean"/>
<field name="unregistrationDate" type="date"/>
<field name="type" type="comboBox">
<values>
<value>type1</value>
<value>type2</value>
<value>type3</value>
</values>
<defaultSelected>type2</defaultSelected>
</field>
</form>
I am trying to perform search result aggregation (count and sum) grouping by several fields in a nested fashion.
For example, with the schema shown at the end of this post, I'd like to be able to get the sum of "size" grouped by "category" and sub-grouped further by "subcategory" and get something like this:
<category name="X">
<subcategory name="X_A">
<size sum="..." />
</subcategory>
<subcategory name="X_B">
<size sum="..." />
</subcategory>
</category>
....
I've been looking primarily at Solr's Stats component which, as far as I can see, doesn't allow nested aggregation.
I'd appreciate it if anyone knows of some way to implement this, with or without the Stats component.
Here is a cut-down version of the target schema:
<types>
<fieldType name="string" class="solr.StrField" />
<fieldType name="text" class="solr.TextField">
<analyzer><tokenizer class="solr.StandardTokenizerFactory" /></analyzer>
</fieldType>
<fieldType name="date" class="solr.DateField" />
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true" />
<field name="category" type="text" indexed="true" stored="true" />
<field name="subcategory" type="text" indexed="true" stored="true" />
<field name="pdate" type="date" indexed="true" stored="true" />
<field name="size" type="int" indexed="true" stored="true" />
</fields>
The new faceting module in Solr 5.1 can do this; it was added in https://issues.apache.org/jira/browse/SOLR-7214
Here is how you would add sum(size) to every facet bucket, and sort descending by that statistic.
json.facet={
  categories:{terms:{
    field:category,
    sort:"total_size desc", // this will sort the facet buckets by your stat
    facet:{
      total_size:"sum(size)" // this calculates the stat per bucket
    }
  }}
}
And this is how you would add in the subfacet on subcategory:
json.facet={
  categories:{terms:{
    field:category,
    sort:"total_size desc",
    facet:{
      total_size:"sum(size)",
      subcat:{terms:{ // this will facet on the subcategory field for each bucket
        field:subcategory,
        facet:{
          sz:"sum(size)" // this calculates the sum per sub-cat bucket
        }
      }}
    }
  }}
}
So the above will give you the sum(size) at both the category and subcategory levels. Documentation for the new facet module is currently at http://yonik.com/json-facet-api/
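As a rough illustration, the nested facet request can be built and its result flattened from a client. This Python sketch assumes the response nests buckets under `facets` with `val` keys, as described in the JSON Facet documentation. The function names and the sample response values are invented for the example.

```python
import json
from urllib.parse import urlencode


def build_facet_params():
    """Build query parameters carrying the nested json.facet request as strict JSON."""
    facet = {
        "categories": {"terms": {
            "field": "category",
            "sort": "total_size desc",
            "facet": {
                "total_size": "sum(size)",
                "subcat": {"terms": {
                    "field": "subcategory",
                    "facet": {"sz": "sum(size)"},
                }},
            },
        }}
    }
    return urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})


def flatten_sums(response):
    """Turn nested facet buckets into (category, subcategory, sum) rows."""
    rows = []
    for cat in response["facets"]["categories"]["buckets"]:
        for sub in cat.get("subcat", {}).get("buckets", []):
            rows.append((cat["val"], sub["val"], sub["sz"]))
    return rows


# Made-up response fragment, only to show the assumed shape
sample_response = {"facets": {"count": 3, "categories": {"buckets": [
    {"val": "X", "count": 3, "total_size": 42, "subcat": {"buckets": [
        {"val": "X_A", "count": 2, "sz": 30},
        {"val": "X_B", "count": 1, "sz": 12},
    ]}},
]}}}

print(build_facet_params())
print(flatten_sums(sample_response))
```

The parameter string can be appended to the select handler URL of your core; the flattened rows correspond to the category/subcategory/size structure asked for in the question.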
There is a patch, SOLR-3583, that adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals. It should be possible to add sums to pivot facets by extending this patch.
For example, averages can be calculated for categories using this URL:
http://localhost:8983/solr/select?q=*%3A*
&facet=true
&facet.pivot=category,subcategory
&facet.stats.percentiles=true
&facet.stats.percentiles.averages=true
&facet.stats.percentiles.field=size
&f.size.stats.percentiles.requested=25,50,75
&f.size.stats.percentiles.lower.fence=0
&f.size.stats.percentiles.upper.fence=1000
&f.size.stats.percentiles.gap=10
See also this video and slides for more details.
1. Counts
To get the counts, you can use Pivot Faceting. It will generate a list very similar to what you asked but with counts only.
You'll need to append this to your query:
&facet=true&facet.pivot=category,subcategory
Note that this works on Solr 4.0 and after.
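For reference, here is a small Python sketch of reading the pivot counts back on the client side. It assumes the response lists pivots under `facet_counts` / `facet_pivot`, keyed by the `facet.pivot` value, with nested `pivot` arrays. The function name and sample numbers are hypothetical.

```python
def pivot_counts(response, pivot_key="category,subcategory"):
    """Collect per-category and per-subcategory counts from a pivot facet response."""
    result = {}
    for cat in response["facet_counts"]["facet_pivot"][pivot_key]:
        # Each category entry may carry a nested "pivot" list for the next field
        subs = {sub["value"]: sub["count"] for sub in cat.get("pivot", [])}
        result[cat["value"]] = {"count": cat["count"], "subcategories": subs}
    return result


# Made-up response fragment illustrating the assumed structure
sample_response = {"facet_counts": {"facet_pivot": {"category,subcategory": [
    {"field": "category", "value": "X", "count": 3, "pivot": [
        {"field": "subcategory", "value": "X_A", "count": 2},
        {"field": "subcategory", "value": "X_B", "count": 1},
    ]},
]}}}

print(pivot_counts(sample_response))
```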
2. Sums
As for the sums, I think you can achieve them with ordinary facets by using a facet query instead of a facet field. I'm not entirely sure about this one; I'll try it out and post again if I find anything useful.
I want to join two separate queries and display information from both. The first query is firms from Canada; the second is firms with the name Incremento. So I need to run the queries separately and join the result information.
My schema is:
<entity name="firm" dataSource="jdbc" pk="id"
query="select * from firm"
deltaImportQuery="select * from firm where id='${dih.delta.id}'"
deltaQuery="select id from firm where upd_date > '${dih.last_index_time}'">
<field column="id" name="id"/>
<field column="ADDRESS" name="address"/>
<field column="EMPLOYEE" name="employee"/>
<field column="NAME" name="name"/>
<field column="VILLAGE" name="village"/>
<field column="ZIPCODE" name="zipcode"/>
<field column="PLACE" name="place"/>
<entity name="country" pk="id"
query="select country from country where id='${firm.country}'"
deltaQuery="select id from country where upd_date > '${dih.last_index_time}'"
parentDeltaQuery="select id from firm where country=${country.id}">
<field column="country" name="countryName"/>
</entity>
</entity>
How can I do that?
Searching :-
You can use, for example:
q=name:incremento&fq=location:Canada - searches for firms named Incremento only among firms located in Canada
fq=location:Canada&fq=name:Incremento - filters for firms with location Canada and name Incremento
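A short Python sketch of assembling such a request with one main query and any number of filter queries. The base URL and the helper name `select_url` are assumptions for illustration; the `name` and `location` fields are taken from the queries above.

```python
from urllib.parse import urlencode


def select_url(base, q, filters=()):
    """Build a select URL with a main query and repeated fq parameters."""
    # A list of pairs lets the same parameter name (fq) appear several times
    params = [("q", q)] + [("fq", f) for f in filters]
    return base + "/select?" + urlencode(params)


# Firms named Incremento, restricted to firms located in Canada
print(select_url("http://localhost:8983/solr", "name:incremento", ["location:Canada"]))

# Both conditions expressed as filters over all documents
print(select_url("http://localhost:8983/solr", "*:*",
                 ["location:Canada", "name:Incremento"]))
```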
Indexing :-
You can either handle it in the SQL query by using OR to select companies from Canada and those with the name Incremento,
e.g. SELECT * FROM FIRMS WHERE COUNTRY='CANADA' OR NAME='INCREMENTO'
Or, because DIH allows only one <document> tag, you may define multiple root <entity> tags, e.g.
<dataConfig>
<dataSource driver="..." url="..." user=".." />
<document name="companies">
<entity name="firm_canada" dataSource="jdbc" pk="id"
query="select * from firm"
deltaImportQuery="select * from firm where id='${dih.delta.id}'"
deltaQuery="select id from firm where upd_date > '${dih.last_index_time}'">
<field column="id" name="id"/>
....
</entity>
<entity name="firm_incremento" dataSource="jdbc" pk="id"
query="select * from firm where ..."
deltaImportQuery="select * from firm where id='${dih.delta.id}'"
deltaQuery="select id from firm where upd_date > '${dih.last_index_time}'">
<field column="id" name="id"/>
....
</entity>
</document>
</dataConfig>
My db-data-config.xml looks like this:
<dataSource name="192.168.5.206" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://192.168.5.206:3306/editor_app" user="root" password="tvmining" />
<dataSource name="localhost" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://192.168.4.49/titans_myself" user="editor" password="tvm_editor" />
<document>
<entity dataSource ="192.168.5.206" name="product_info" query="SELECT t.id, t.title, t.keyword, t.update_time FROM product_info t" deltaQuery="SELECT t.id FROM product_info t where t.update_time > '${dataimporter.last_index_time}'" deltaImportQuery="SELECT t.id, t.title, t.keyword, t.update_time FROM product_info t where t.id='${dataimporter.delta.id}'">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="keyword" name="keyword" />
<field column="update_time" name="update_time" />
</entity>
<entity dataSource ="localhost" name="log_info" query="SELECT t.id, t.operation_content FROM log_info t " deltaQuery="SELECT t.id, t.operation_content FROM log_info t where t.update_time > '${dataimporter.last_index_time}'" deltaImportQuery="SELECT t.id, t.operation_content FROM log_info t where t.id='${dataimporter.delta.id}'">
<field column="id" name="id" />
<field column="operation_content" name="operation_content" />
</entity>
</document>
But when I request 'http://192.168.4.40:8080/solr/update/database?command=full-import', only the first entity's data is imported. How can I import the data of both entities?
This should work as is and import both entities. However, maybe you expect a single document in Solr with fields from both entities when the id is the same? If that is what you are looking for, you need to join the tables somehow and use a single entity.
Try with this url:
http://192.168.4.40:8080/solr/update/database?command=full-import&entity=log_info
I just added the entity parameter with the name of the entity as value.
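The same URL can be generated for one or several entities. The Python sketch below relies on the handler accepting a repeated `entity` parameter, and the helper name `full_import_url` is made up. Entity names are the ones from the config in the question.

```python
from urllib.parse import urlencode


def full_import_url(handler_url, entities=()):
    """Build a full-import URL, optionally restricted to the named entities."""
    # Repeating the entity parameter selects several root entities in one run
    params = [("command", "full-import")] + [("entity", e) for e in entities]
    return handler_url + "?" + urlencode(params)


handler = "http://192.168.4.40:8080/solr/update/database"
print(full_import_url(handler, ["log_info"]))
print(full_import_url(handler, ["product_info", "log_info"]))
```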
I am trying out the Data Import Handler with a MySQL database.
I added the DataImportHandler to solrconfig.xml, created a data-config.xml according to my database schema, and also added a field to schema.xml that differed. I am connecting to a MySQL database.
After I connect and run dataimport?command=full-import, I get this response:
"00
C:\solr\conf\data-config.xml
full-import
idle
110
2011-03-05 15:01:04
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
2011-03-05 15:01:04
2011-03-05 15:01:04
0:0:0.400
This response format is experimental. It is likely to change in the future."
The XML files are here: http://pastebin.com/iKebKGSZ
<field column="manu" name="manu" />
<field column="id" name="id" />
<field column="weight" name="weight" />
<field column="price" name="price" />
<field column="popularity" name="popularity" />
<field column="instock" name="inStock" />
<field column="includes" name="includes" />
Are these fields also in your schema.xml?
I couldn't see them in the pastebin link. Make sure all of these fields are defined in your schema as well.
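One way to check this mechanically is to compare the target field names in data-config.xml with the fields declared in schema.xml. This Python sketch only looks at explicit <field> declarations (dynamicField patterns are ignored). The entity name, query, and trimmed-down XML samples are hypothetical.

```python
import xml.etree.ElementTree as ET


def missing_schema_fields(data_config_xml, schema_xml):
    """List DIH target fields that have no explicit <field> in the schema."""
    config = ET.fromstring(data_config_xml)
    schema = ET.fromstring(schema_xml)
    schema_names = {f.get("name") for f in schema.iter("field")}
    missing = []
    for field in config.iter("field"):
        # The Solr field is "name" if given, otherwise the column name itself
        target = field.get("name") or field.get("column")
        if target not in schema_names and target not in missing:
            missing.append(target)
    return missing


# Hypothetical, shortened versions of the two files
data_config = """<dataConfig><document>
  <entity name="item" query="select * from item">
    <field column="id" name="id"/>
    <field column="manu" name="manu"/>
    <field column="instock" name="inStock"/>
  </entity>
</document></dataConfig>"""

schema = """<schema name="example"><fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="manu" type="string" indexed="true" stored="true"/>
</fields></schema>"""

print(missing_schema_fields(data_config, schema))
```

Any name it reports would need a matching <field> (or a dynamicField pattern) in schema.xml.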