Solr 4 - Indexing a posted text file - Java

I'm trying to create a field called "sku", which is indexed with the following analyzer:
<fieldType name="sku" class="solr.TextField">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="(SKU|Part(\sNumber)?):?\s(\[0-9-\]+)" group="3"/>
</analyzer>
</fieldType>
This is based on the documentation here: http://lucidworks.lucidimagination.com/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer
I already have a Java program that posts to the Solr server successfully, but it is not grabbing the SKU out of any files and indexing it. Here is my Java code:
// Send the file through Solr's extracting request handler (Solr Cell / Tika)
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(arg0, arg0.getName());
up.setParam("literal.id", arg0.getName());
// Prefix unmapped fields with attr_ and map the extracted body to attr_content
up.setParam("uprefix", "attr_");
up.setParam("fmap.content", "attr_content");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);
Any help appreciated.
I understand I could parse the text files myself, extract the SKU, and post it as a parameter to the server, but I thought Solr could do this for me?

It is hard to tell what is going on, because there are several steps in the middle.
For example, what is your schema.xml field definition? Is it definitely using sku as its type (and not, say, string)? Then, what is the field name (attr_sku?), and does the extract handler mapping actually map to it properly? The extract handler usually sends metadata as individual fields and then all file content as one big field. Is the SKU somewhere in the metadata?
I would do a copyField into something non-processing (e.g. a plain string field) and see whether the content actually makes it into the Solr field. Then I would start troubleshooting the regex itself.
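To check that quickly from your existing program, a SolrJ query sketch along these lines should show whether anything landed in the field (hedged: this assumes the same server instance and that sku is stored; if it is only indexed, the hit count alone still tells you whether tokens were produced):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

// Match any document where the sku field got at least one token
SolrQuery q = new SolrQuery("sku:*");
q.setFields("id", "sku");
QueryResponse rsp = server.query(q);
System.out.println("Matches: " + rsp.getResults().getNumFound());
for (SolrDocument d : rsp.getResults()) {
    System.out.println(d.getFieldValue("id") + " -> " + d.getFieldValue("sku"));
}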


XML node replace failure

I am trying to use Corb to search for and update a node in a large number of documents:
Sample input:
<hcmt xmlns="http://horn.thoery">
  <susceptible>X</susceptible>
  <reponsible>foresee–intervention</reponsible>
  <intend>Benefit Protagonist</intend>
  <justified>Goal Outwiegen</justified>
</hcmt>
XQuery:
(: let $resp := "foresee–intervention" :)
let $docs :=
  cts:search(doc(),
    cts:and-query((
      cts:collection-query("hcmt"),
      cts:path-range-query("/horn:hcmt/horn:responsible", "=", $resp)
    ))
  )
return
  for $doc in $docs
  return
    xdmp:node-replace($doc/horn:hcmt/horn:responsible, "Foresee Intervention")
Expected output:
<hcmt xmlns="http://horn.thoery">
  <susceptible>X</susceptible>
  <reponsible>Foresee Intervention</reponsible>
  <intend>Benefit Protagonist</intend>
  <justified>Goal Outwiegen</justified>
</hcmt>
But the node-replace does not happen in Corb, and no error is returned. Other queries work fine in Corb. How can I make the node-replace work correctly in Corb?
Thanks in advance for any help.
I created functions to reconcile the encoding issues. This not only mitigates potential API transaction failures but is also a prerequisite for validating and encoding parameter, element, property, and URI names.
That said, a sample MarkLogic Java API implementation is:
Create a dynamic query construct on the filesystem, in my case product-query-option.xml (use the query value directly: Chooser–Option):
<search xmlns="http://marklogic.com/appservices/search">
  <query>
    <and-query>
      <collection-constraint-query>
        <constraint-name>Collection</constraint-name>
        <uri>proto</uri>
      </collection-constraint-query>
      <range-constraint-query>
        <constraint-name>ProductType</constraint-name>
        <value>Chooser–Option</value>
      </range-constraint-query>
    </and-query>
  </query>
</search>
Deploy the persistent query options to the modules database, in my case search-lexis.xml. The options file looks like:
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="Collection">
    <collection prefix=""/>
  </constraint>
  <constraint name="ProductType">
    <range type="xs:string" collation="http://marklogic.com/collation/en/S1">
      <path-index xmlns:prod="schema://fc.fasset/product">/prod:requestProduct/prod:_metaData/prod:productType</path-index>
    </range>
  </constraint>
</options>
Then follow on with a dynamic Java search:
File file = new File("src/main/resources/queryoption/product-query-option.xml");
FileHandle fileHandle = new FileHandle(file);
RawCombinedQueryDefinition rcqDef = queryMgr.newRawCombinedQueryDefinition(fileHandle, queryOption);
You can, of course, combine the query and the options into one handle in the QueryDefinition.
Your original node-replace translates to a Java partial update; make sure the DocumentPatchBuilder's setNamespaces is given the correct NamespaceContext.
For batch data operations, the performant approach is MarkLogic Data Movement: instantiate a QueryBatcher with the searched URIs, supply the replacement value or data fragment via PatchBuilder.replaceValue, and complete each update with
dbClient.newXMLDocumentManager().patch(uri, patchHandle);
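A minimal sketch of that partial update (the horn prefix, namespace URI, element name, and value are taken from the question; this is a plain single-document patch, not the full Data Movement pipeline):
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.document.DocumentPatchBuilder;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.marker.DocumentPatchHandle;
import com.marklogic.client.util.EditableNamespaceContext;

public class PatchSketch {
    // Replace the value of /horn:hcmt/horn:reponsible in one document
    public static void patchResponsible(DatabaseClient dbClient, String uri) {
        XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();
        DocumentPatchBuilder pb = docMgr.newPatchBuilder();
        // Register the document's namespace so the patch path resolves
        EditableNamespaceContext namespaces = new EditableNamespaceContext();
        namespaces.put("horn", "http://horn.thoery");
        pb.setNamespaces(namespaces);
        // Mirrors the original xdmp:node-replace
        pb.replaceValue("/horn:hcmt/horn:reponsible", "Foresee Intervention");
        DocumentPatchHandle patchHandle = pb.build();
        docMgr.patch(uri, patchHandle);
    }
}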
MarkLogic Data Services: if you succeed with the above and want a more robust and scalable enterprise SOA approach, please review Data Services.
The implementation is driven by Gradle. (Note: all of the transformation specifics should be parameters, including path/element/property name, namespace, value, etc.; nothing is hardcoded.) One proxy service declared in service.json can serve multiple endpoints (under /root/df-ds/fxd) with different types of modules, which gives you free rein to develop in pure Java or to extend the development platform to handle complex data operations.
If these operations are persistent node updates, you should consider an in-memory node transform before ingestion. Besides the MarkLogic data transformation tools, you can harness the power of XSLT 2.0+.
Saxon's XPathFactory could be a serviceable vehicle for querying and transforming nodes. I am not sure how far the reciprocity goes: the MarkLogic Java API implements XPath compilation to split large paths and stream transactions. XSLT/Saxon is not my forte, so I cannot comment on how it compares on this encode/decode particularity or on how it handles transaction (insert, update, etc.) streaming.
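For illustration, a minimal JAXP/Saxon sketch (assumes Saxon-HE on the classpath; the element name comes from the sample input, and local-name() sidesteps namespace binding for brevity):
import java.io.StringReader;
import javax.xml.xpath.XPath;
import org.xml.sax.InputSource;

public class SaxonXPathSketch {
    public static void main(String[] args) throws Exception {
        // Instantiate Saxon's JAXP XPathFactory directly rather than via ServiceLoader
        XPath xp = new net.sf.saxon.xpath.XPathFactoryImpl().newXPath();
        String xml = "<hcmt xmlns=\"http://horn.thoery\">"
                   + "<reponsible>foresee–intervention</reponsible></hcmt>";
        // evaluate(String, InputSource) returns the string value of the selected node
        String value = xp.evaluate("//*[local-name()='reponsible']",
                new InputSource(new StringReader(xml)));
        System.out.println(value); // foresee–intervention
    }
}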

In focframework, where can I get a list of all properties supported in the config.properties file, and how can I add my own properties to use in my code?

I am developing a web application using the full-stack framework focframework, and I want to know which properties I can control in my config.properties file. Is there a doc for this?
I tried searching the docs but didn't find anything.
Obviously we can figure out some of them from the sample on GitHub by looking at its config.properties file:
jdbc.drivers=org.h2.Driver
jdbc.url=jdbc:h2:./myfocapplication_data_h2
jdbc.username=sa
jdbc.password=
gui.rtl=0
allowAddInsideComboBox=0
focWebServerClassName=com.focframework.sample.myfocapplication.MyFocAppWebServer
dataSourceClass=b01.focDataSourceDB.FocDataSource_DB
cloudStorageClass=com.focCloudStorage.FocCloudStorageS3
cloudStorageClass=com.foc.cloudStorage.FocCloudStorage_LocalDisc
devMode=1
unitDevMode=0
unitAllowed=1
log.dir=c:/01barmaja/log
log.ConsoleActive=1
log.fileActive=1
log.popupExceptionDialog=1
log.dbRequest=1
log.dbSelect=1
debug.showStatusColumn=0
log.debug=1
perf.active=0
Is there any hint on how to get all of them? And what if I want to add my own to be used in my code?
The ConfigInfo.java file is the one responsible for reading all the properties and storing them in variables. It is straightforward to read, and you can check the variable names and their usage there. Yet I agree that someone should work on the documentation and describe these parameters.
To add your own property without modifying ConfigInfo.java, you can simply call this method anywhere in your code:
String myProperty = ConfigInfo.getProperty("my.property.with.a.meaningful.name");
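For example, a small sketch (the property name here is made up, and the null fallback assumes getProperty returns null for missing keys; check ConfigInfo.java for the exact behavior):
// config.properties (hypothetical entry): myapp.upload.dir=c:/01barmaja/uploads
String uploadDir = ConfigInfo.getProperty("myapp.upload.dir");
if (uploadDir == null) {
    uploadDir = "./uploads"; // fall back to a default when the key is absent
}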

Check fails in Scala even though the element exists

I have recorded a session in Scala with Gatling. One request is failing even though I can see the tag in View Source and Inspect Element. I tried all the other hidden fields, but it seems this one is not found. Here is the script:
val scn = scenario("Scenario Name")
  .feed(csv("user_credentials.csv"))
  .exec(http("request_1")
    .get("/userLogin")
    .headers(headers_1)
    .check(regex("""<input id="javax.faces.ViewState" """).saveAs("ViewState_id"))
  ).pause(1)
  .exec(http("request_999")
    .post("/userLogin")
    .headers(headers_1)
    .param("""loginForm""", """loginForm""")
    .param("""errorMsg""", """""")
    .param("""c_username""", "${username}")
    .param("""javax.faces.ViewState""", "${ViewState_id}")
    .param("""goButton""", """goButton""")
  )
The error which I'm getting is:
c.e.e.g.h.a.GatlingAsyncHandlerActor - Request 'request_1' failed: Check 'exists' failed, found None
I found the tag <input id="javax.faces.ViewState" ..../> in the source, but this script is not able to find it. I tried testing with other fields, including some hidden ones; all other components are found except this one. How do I solve this issue?
I suspect the problem is that the id of your ViewState object is not actually javax.faces.ViewState - the name will be javax.faces.ViewState, but in the JSF implementations that I've seen, the id is something like j_id1:javax.faces.ViewState:0.
The simplest solution would be to follow the instructions at https://github.com/excilys/gatling/wiki/Handling-JSF for handling JSF in Gatling. Or you could search for the ViewState element by name rather than id - something like:
.check(css("""input[name="javax.faces.ViewState"]""", "value").saveAs("ViewState_id"))
should do the trick.

Jasper Reports Exclude Column Headers in a Table?

I am using JasperReports for a project that needs both PDF and CSV output, and the majority of the data is in the Detail section, within a table. I know you can remove the pageHeader and columnHeader at the document level, but is it possible to remove, or only print once, the column headers within a table? If not, the CSV output looks like:
User Type,Time,Username,Event,IP Address,Student Name,Student Number
Admin,6/6/11 8:09 PM,admin,Uploaded a report file.,0:0:0:0:0:0:0:1,,
....[about 20 more lines of CSV then]....
User Type,Time,Username,Event,IP Address,Student Name,Student Number
This just looks very unprofessional and isn't very functional. Like I said, I know the page-level headers can be removed with:
jasperPrint.getPropertiesMap().setProperty("net.sf.jasperreports.export.exclude.origin.band.1", "pageHeader");
jasperPrint.getPropertiesMap().setProperty("net.sf.jasperreports.export.exclude.origin.band.2", "pageFooter");
jasperPrint.getPropertiesMap().setProperty("net.sf.jasperreports.export.csv.exclude.origin.band.1", "columnHeader");
jasperPrint.getPropertiesMap().setProperty("net.sf.jasperreports.export.csv.exclude.origin.band.2", "pageFooter");
jasperPrint.getPropertiesMap().setProperty("net.sf.jasperreports.export.csv.exclude.origin.keep.first.band.1", "columnHeader");
but I am looking for a solution that removes them on the table for CSV output only, not PDF. Is this possible?
Any help would be greatly appreciated!
Thanks,
Chuck
Some useful properties to control report export for different formats:
net.sf.jasperreports.export.xls.exclude.origin.band.1=title
net.sf.jasperreports.export.xls.exclude.origin.band.2=summary
net.sf.jasperreports.export.xls.exclude.origin.band.3=pageHeader
net.sf.jasperreports.export.xls.exclude.origin.band.4=pageFooter
net.sf.jasperreports.export.xls.exclude.origin.keep.first.band.1=columnHeader
net.sf.jasperreports.export.xls.collapse.row.span=false
net.sf.jasperreports.export.xls.remove.empty.space.between.columns=true
net.sf.jasperreports.export.csv.exclude.origin.band.csvSummary=summary
net.sf.jasperreports.export.csv.exclude.origin.band.1=title
net.sf.jasperreports.export.csv.exclude.origin.band.2=pageFooter
net.sf.jasperreports.export.csv.exclude.origin.keep.first.band.1=columnHeader
net.sf.jasperreports.export.html.using.images.to.align=false
net.sf.jasperreports.export.html.remove.emtpy.space.between.rows=true
net.sf.jasperreports.export.ignore.page.margins=true
Full reference.
Column headers in the table component are meant to be repeated when the table overflows, and they cannot be hidden. To achieve what you want, you could either:
move the contents of your columnHeader into the tableHeader, so that the header prints only once as part of the table header,
or filter out the elements when performing a specific export by adding sets of properties like these:
<property name="net.sf.jasperreports.export.pdf.exclude.origin.keep.first.band.1" value="columnHeader"/>
<property name="net.sf.jasperreports.export.pdf.exclude.origin.keep.first.report.1" value="*"/>
More info on filtering elements at export time here and here.
Maybe you should use different report definitions for each output. If not, then you could just recognise when you're printing to CSV and only set those properties then.
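For example, a sketch that mirrors the PDF pair above but scopes the filter to the CSV exporter (the property names follow the same pattern as in the question; verify them against your JasperReports version):
// Drop the table's repeated column headers in CSV output only
jasperPrint.getPropertiesMap().setProperty(
    "net.sf.jasperreports.export.csv.exclude.origin.keep.first.band.1", "columnHeader");
jasperPrint.getPropertiesMap().setProperty(
    "net.sf.jasperreports.export.csv.exclude.origin.keep.first.report.1", "*");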

Java GATE API: created a pipeline with success; how can I get the annotation sets from the processed docs?

Sorry in advance for my poor grammar.
I have created a pipeline with the GATE API, and I run it successfully.
I created a SerialAnalyserController like this:
pipeline = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
then I load a corpus of files (previously populated):
pipeline.setCorpus(foo);
and last, pipeline.execute().
It all works great and I see the results. My problem is that I cannot find a way to get the AnnotationSet for each document that was processed in the corpus. For example, I want to find the "sentences" AnnotationSet to see at which offsets the sentences start and stop in the original text file. The API does not say how to get the annotations from the SerialAnalyserController, i.e. how to get each gate.Document after the pipeline has finished.
Thanks in advance
OK, found it!
I get the corpus back; since a Corpus is a List, I use get(x) to fetch the document I want, and then I read its annotation sets.
Thanks
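A minimal sketch of that (the annotation type name depends on your pipeline - ANNIE's sentence splitter produces "Sentence" annotations - and corpus is the same object passed to setCorpus):
import gate.Annotation;
import gate.AnnotationSet;
import gate.Document;

// After pipeline.execute(): a gate.Corpus is also a java.util.List of Documents
for (int i = 0; i < corpus.size(); i++) {
    Document doc = corpus.get(i);
    // Default (unnamed) annotation set; use doc.getAnnotations("name") for named sets
    AnnotationSet sentences = doc.getAnnotations().get("Sentence");
    for (Annotation a : sentences) {
        long start = a.getStartNode().getOffset();
        long end = a.getEndNode().getOffset();
        System.out.println("Sentence at offsets [" + start + ", " + end + ")");
    }
}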
