Index XML files in Apache Solr as plain text - java

Is there any way to dump the entire contents of an XML file into a single content field?
schema.xml
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="content" type="text_general" indexed="true" stored="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>
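Code used for indexing:
import java.net.HttpURLConnection;
import java.net.URL;
// openConnection() returns a URLConnection, so cast it to the HTTP subtype
HttpURLConnection solrHttpURLConnection = (HttpURLConnection) new URL("http://localhost:7892/solr/myCore/update/extract?literal.id=1234&commit=true").openConnection();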
solrHttpURLConnection.setDoOutput(true);
solrHttpURLConnection.setDoInput(true);
solrHttpURLConnection.setUseCaches(false);
solrHttpURLConnection.setAllowUserInteraction(false);
solrHttpURLConnection.setRequestProperty("Content-type", type);
solrHttpURLConnection.connect();
I obtain the output stream from this connection and write to it the contents read from an input stream provided by dataServer.
NOTE:
The above code works for all file formats except XML, CSV and JSON.
Solr returns no error message.
Sample XML File
<?xml version="1.0" encoding="UTF-8"?>
<content>just a test
</content>

Set the content type to "text/xml"
Add the following lines to your code:
OutputStreamWriter writer = new OutputStreamWriter(solrHttpURLConnection.getOutputStream());
writer.write(your_xml_file);
writer.flush();
Execute the request against this URL: http://localhost:7892/solr/myCore/update?literal.id=1234&commit=true
For JSON files, use /update/json/docs.
Please also see the documentation on uploading data with index handlers: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLUpdateCommands
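A caveat worth sketching: as far as I know, the plain /update handler expects Solr's own XML update syntax (<add><doc><field ...>) rather than an arbitrary XML file. One way to get the whole file into the content field is therefore to escape it and wrap it in such a document before posting. The class and method names below are hypothetical; the field names id and content come from the schema in the question. This is a minimal sketch under those assumptions, not a tested Solr client.

```java
class SolrXmlDoc {
    // Escape the five XML special characters so the raw file is treated as
    // plain text inside a <field> element instead of being parsed as markup.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap the escaped file contents in a Solr XML update message.
    // "id" and "content" are the field names from the question's schema.xml.
    public static String buildAddDocument(String id, String rawXml) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"content\">" + escapeXml(rawXml) + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) {
        String raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<content>just a test\n</content>";
        // The resulting string is what would be written to the connection's output stream.
        System.out.println(buildAddDocument("1234", raw));
    }
}
```

With this approach the id travels inside the document, so the literal.id parameter should no longer be needed on the URL.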

Related

generate dynamic page in HTML from xml

I have several different XML files. I want to display them in HTML format using a JSP file.
The structure of these files can differ: they may have different depths and tag names.
I've read about XSLT, but I found only examples with a manually created XSL stylesheet. (I can't do that because I will have many different XML files.)
How can I produce my HTML?
EDIT:
Example of form:
<form name="Company">
<field name="name" required="true" type="inputText"/>
<field name="registrationDate" required="true" type="date"/>
<field name="isActive" required="false" type="boolean"/>
<field name="unregistrationDate" type="date"/>
<field name="type" type="comboBox">
<values>
<value>type1</value>
<value>type2</value>
<value>type3</value>
</values>
<defaultSelected>type2</defaultSelected>
</field>
</form>

File Upload in Java OFBiz?

I have a form with a file upload option:
<form type="upload" name="myForm" target="rgUsrStory">
<field name="st_title" title="${uiLabelMap.uStoryTitle}"><text/></field>
<field name="upload_file" title="${uiLabelMap.UploadFile}"><file/></field>
<field name="submitButton" title="${uiLabelMap.submit}"><submit/></field>
</form>
request map:
<request-map uri="rgUsrStory">
<security https="true" auth="true"/>
<event type="java" path="org.ofbiz.webapp.control.usrStory" invoke="rgUsrStory" />
<response name="success" type="view" value="main"/>
<response name="error" type="view" value="login"/>
</request-map>
The event function works properly, but I also need to upload the file to the server and store its details in a table named 'documents', and I don't know how to do that. I searched the web but only found examples using FTL. I also want to restrict the file types that can be uploaded and display the allowed types in the form when a user story is added.
Thanks for any guidance and help.
Please have a look at the image upload functionality in the OFBiz content manager.
There's a form
<form name="ImageUpload" target="uploadImage" title="" type="upload" default-map-name="currentValue"
header-row-style="header-row" default-table-style="basic-table">
<field name="dataResourceId" title="${uiLabelMap.ContentDataResourceId}"><display/></field>
<field name="dataResourceTypeId" ><hidden/></field>
<field name="objectInfo" title="${uiLabelMap.ContentUploadedFile}"><display /></field>
<field name="imageData" entity-name="ImageDataResource" title="${uiLabelMap.ContentFile}"><file/></field>
<field name="submitButton" title="${uiLabelMap.CommonUpload}" widget-style="smallSubmit"><submit button-type="button"/></field>
</form>
The corresponding request in the controller.xml
<request-map uri="uploadImage">
<security auth="true" https="true"/>
<event invoke="persistContentAndAssoc" path="" type="service"/>
<response name="success" type="request" value="UploadImage"/>
<response name="error" type="view" value="UploadImage"/>
</request-map>
The service name in services.xml leads you to the service method
<service name="persistContentAndAssoc" engine="java" transaction-timeout="7200"
location="org.ofbiz.content.ContentManagementServices" invoke="persistContentAndAssoc" auth="true">
<description>Create a Content, DataResource and/or ContentAssoc</description>
<permission-service service-name="genericContentPermission" main-action="CREATE"/>
...
</service>
In org.ofbiz.content.ContentManagementServices#persistContentAndAssoc the uploaded file is read by
ByteBuffer imageDataBytes = (ByteBuffer) context.get("imageData");
(the corresponding form field).
You will find some other functionality like dealing with the mime type there.
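Once the ByteBuffer is in hand, writing it to disk and checking the file type needs only the JDK. The sketch below is an illustration, not OFBiz code: the class name, the allowed-extension list, and the way the file name reaches the helper are all assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UploadHelper {
    // Hypothetical whitelist of extensions; adjust to the types the form should accept.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("pdf", "png", "jpg"));

    // Returns true when the file name ends in one of the allowed extensions.
    public static boolean isAllowed(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        return ALLOWED.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Writes the remaining bytes of the uploaded buffer to the target file
    // and returns the number of bytes written.
    public static long save(ByteBuffer data, Path target) {
        ByteBuffer view = data.duplicate(); // leave the caller's position untouched
        long written = 0;
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            while (view.hasRemaining()) {
                written += ch.write(view);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

An extension check is only a first filter; the MIME type handling mentioned above is the more reliable place to enforce allowed file types.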

How to perform nested aggregation on multiple fields in Solr?

I am trying to perform search result aggregation (count and sum) grouping by several fields in a nested fashion.
For example, with the schema shown at the end of this post, I'd like to be able to get the sum of "size" grouped by "category" and sub-grouped further by "subcategory" and get something like this:
<category name="X">
<subcategory name="X_A">
<size sum="..." />
</subcategory>
<subcategory name="X_B">
<size sum="..." />
</subcategory>
</category>
....
I've been looking primarily at Solr's Stats component which, as far as I can see, doesn't allow nested aggregation.
I'd appreciate it if anyone knows of some way to implement this, with or without the Stats component.
Here is a cut-down version of the target schema:
<types>
<fieldType name="string" class="solr.StrField" />
<fieldType name="text" class="solr.TextField">
<analyzer><tokenizer class="solr.StandardTokenizerFactory" /></analyzer>
</fieldType>
<fieldType name="date" class="solr.DateField" />
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
</types>
<fields>
<field name="id" type="string" indexed="true" stored="true" />
<field name="category" type="text" indexed="true" stored="true" />
<field name="subcategory" type="text" indexed="true" stored="true" />
<field name="pdate" type="date" indexed="true" stored="true" />
<field name="size" type="int" indexed="true" stored="true" />
</fields>
The new faceting module in Solr 5.1 can do this; it was added in https://issues.apache.org/jira/browse/SOLR-7214
Here is how you would add sum(size) to every facet bucket, and sort descending by that statistic.
json.facet={
categories:{terms:{
field:category,
sort:"total_size desc", // this will sort the facet buckets by your stat
facet:{
total_size:"sum(size)" // this calculates the stat per bucket
}
}}
}
And this is how you would add in the subfacet on subcategory:
json.facet={
categories:{terms:{
field:category,
sort:"total_size desc",
facet:{
total_size:"sum(size)",
subcat:{terms:{ // this will facet on the subcategory field for each bucket
field:subcategory,
facet:{
sz:"sum(size)" // this calculates the sum per sub-cat bucket
}}
}
}}
}
So the above will give you the sum(size) at both the category and subcategory levels. Documentation for the new facet module is currently at http://yonik.com/json-facet-api/
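Since the question is tagged with Java elsewhere on this page, here is a rough sketch of how the nested facet request could be assembled and URL-encoded from Java. The class and method names are hypothetical, and the base URL is a placeholder; only the json.facet structure comes from the answer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FacetQuery {
    // Builds the nested terms facet from the answer: sum(stat) per outer bucket
    // and again per inner bucket, with outer buckets sorted by that sum.
    public static String nestedSumFacet(String outer, String inner, String stat) {
        return "{categories:{terms:{field:" + outer
                + ",sort:\"total desc\""
                + ",facet:{total:\"sum(" + stat + ")\""
                + ",subcat:{terms:{field:" + inner
                + ",facet:{total:\"sum(" + stat + ")\"}}}}}}}";
    }

    // Appends the facet as an encoded json.facet parameter to a select URL.
    // rows=0 is an assumption: it skips the documents and returns only the facets.
    public static String buildUrl(String coreUrl, String facetJson) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode("*:*", "UTF-8")
                    + "&rows=0&json.facet=" + URLEncoder.encode(facetJson, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String facet = nestedSumFacet("category", "subcategory", "size");
        System.out.println(buildUrl("http://localhost:8983/solr/myCore", facet));
    }
}
```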
There is a patch SOLR-3583, which adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals. It is possible to add sums to pivot facets by improving this patch.
For example, averages can be calculated for categories using this url:
http://localhost:8983/solr/select?q=*%3A*
&facet=true
&facet.pivot=category,subcategory
&facet.stats.percentiles=true
&facet.stats.percentiles.averages=true
&facet.stats.percentiles.field=size
&f.size.stats.percentiles.requested=25,50,75
&f.size.stats.percentiles.lower.fence=0
&f.size.stats.percentiles.upper.fence=1000
&f.size.stats.percentiles.gap=10
See also this video and slides for more details.
1. Counts
To get the counts, you can use Pivot Faceting. It will generate a list very similar to what you asked but with counts only.
You'll need to append this to your query:
&facet=true&facet.pivot=category,subcategory
Note that this works on Solr 4.0 and after.
2. Sums
As for the sums, I think you can get them with ordinary facets by using a facet query instead of a facet field. I'm not entirely sure about this one; I'll try it out and re-post if I find anything useful.

how to store in SOLR (mini) relational data

My data set is title, description and tags.
I would like to store and index in Solr each tag_name and its corresponding tag_id.
As you can see, each record has one title and one description but many tag names and tag IDs.
I guess I could store the tags as "some-tag-[id]", but this seems wrong.
You can index tags and tags_id as multivalued fields and add the values in the same order.
The order is maintained, so you can map a tag to its ID by position within the two fields.
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="tags_id" type="string" indexed="false" stored="true" multiValued="true"/>
Response -
<arr name="tags">
<str>tag1</str>
<str>tag2</str>
<str>tag3</str>
</arr>
<arr name="tags_id">
<str>id1</str>
<str>id2</str>
<str>id3</str>
</arr>
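On the client side, the two parallel lists can be paired back up by index. A minimal sketch, assuming the values of tags_id and tags have already been read from the response into two lists (the class and method names are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagPairs {
    // Pairs the i-th tag ID with the i-th tag name, preserving the indexed order.
    public static Map<String, String> pair(List<String> tagIds, List<String> tags) {
        if (tagIds.size() != tags.size()) {
            throw new IllegalArgumentException(
                    "tags_id and tags must have the same number of values");
        }
        Map<String, String> byId = new LinkedHashMap<>();
        for (int i = 0; i < tagIds.size(); i++) {
            byId.put(tagIds.get(i), tags.get(i));
        }
        return byId;
    }
}
```

The size check matters because the mapping silently goes wrong if a document was indexed with a tag that has no ID, or the other way round.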

Data Import Handler in Solr

I am trying out the Data Import Handler with a MySQL database.
I added the DataImportHandler to solrconfig.xml, created a data-config.xml according to my database schema, and also added a field to schema.xml that was different. I am connecting to a MySQL database.
After I connect and run dataimport?command=full-import, I get this response:
"00C:\solr\conf\data-config.xmlfull-importidle1102011-03-05 15:01:04Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.2011-03-05 15:01:042011-03-05 15:01:040:0:0.400This response format is experimental. It is likely to change in the future."
The XML files are at http://pastebin.com/iKebKGSZ
<field column="manu" name="manu" />
<field column="id" name="id" />
<field column="weight" name="weight" />
<field column="price" name="price" />
<field column="popularity" name="popularity" />
<field column="instock" name="inStock" />
<field column="includes" name="includes" />
Are these fields also in your schema.xml?
I couldn't see them in the pastebin link. Make sure all of these fields are defined in your schema as well.
