Multiple length="unbounded" in fixed length file - java

I'm experiencing a little issue and I'm asking for your help!
Using BeanIO 2.1 on a fixed-length file, I'm trying to read a record structured like this: :28C:5n/5n
':28C:' : fixed literal
up to 5 digits
'/' : fixed literal
up to 5 digits
Examples:
:28C:61/00005
:28C:100/00001
:28C:12345/12345
Here is a snippet of the code:
<record name="statementNumber" class="com.batch.records.StatementNumber" occurs="1">
<field name="tag" type="string" length="5" rid="true" literal=":28C:" ignore="true" />
<field name="statementNr" type="int" length="unbounded" maxLength="5" />
<field name="separator" type="string" length="1" rid="true" literal="/" ignore="true" />
<field name="sequenceNr" type="int" length="unbounded" maxLength="5" />
</record>
When running my parser, I get this exception:
Cannot determine field position, field is preceded by another component with indeterminate occurrences or unbounded length
My question is: how can I tell BeanIO that the '/' field is actually the delimiter between the two variable-length fields?
Thanks in advance.

You can only have one field of unbounded length in a record, and it has to be the last field. The BeanIO documentation says:
The length of the last field in a fixed length record may be set to unbounded to disable padding and allow a single variable length field at the end of the otherwise fixed length record.
Honestly, I'm not sure this can be done with BeanIO alone. Would it be an option to read the whole 5n/5n part into Java as a single field and split it on / in your own code instead of in BeanIO?
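A minimal sketch of that splitting step, assuming the record is mapped as the :28C: tag followed by a single String field that holds the whole 5n/5n text. The class and method names below are hypothetical and not part of BeanIO:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits the raw "5n/5n" text after BeanIO has read it as one field.
class StatementNumberSplitter {
    // 1 to 5 digits, a literal slash, then 1 to 5 digits
    private static final Pattern FORMAT = Pattern.compile("(\\d{1,5})/(\\d{1,5})");

    // Returns {statementNr, sequenceNr}, or throws if the text does not match 5n/5n.
    static int[] split(String raw) {
        Matcher m = FORMAT.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected 5n/5n but got: " + raw);
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] parts = split("61/00005");
        System.out.println(parts[0] + " " + parts[1]); // prints "61 5"
    }
}
```

The split could be called from the setter of the bean property that receives the combined text, so the rest of the code still sees two int values.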


How to skip field when empty in beanio?

The requirement is to skip a field when it is empty.
For example:
<segment name="seg1" class="com.company.bean.segmentBean" xmlType="none">
<field name="field1" xmlName= "fieldXml1" xmlType="attribute" maxLength="7" />
<field name="field2" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
</segment>
Let's assume that field2="". Since the value of field2 is empty, I would like the field to be skipped in the segment. In other words, the resulting XML shouldn't contain field2 at all when it is empty ("").
You need the lazy attribute on field2. As stated in the Reference Guide:
lazy - Set to true to convert empty field text to null before type conversion. For repeating fields bound to a collection, the collection will not be created if all field values are null or the empty String. Defaults to false.
This will make field2 = null and by default, most XML libraries will not output any null elements, BeanIO included.
Try this for field2:
<field name="field2" lazy="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
Most of the time, I also combine lazy with trim. From the docs:
trim - Set to true to trim the field text before validation and type conversion. Defaults to false.
<field name="field2" lazy="true" trim="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
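To make the combined effect of the two attributes concrete, here is a small conceptual model in plain Java. It only mirrors the two documented behaviours quoted above (trim the text, then turn empty text into null). It is not BeanIO's actual implementation, and the class and method names are made up for illustration:

```java
// Conceptual model only: mimics what the docs describe for trim and lazy on field text.
class LazyTrimSketch {
    static String normalize(String text, boolean trim, boolean lazy) {
        if (text == null) {
            return null;
        }
        // trim="true": strip surrounding whitespace before anything else
        String result = trim ? text.trim() : text;
        // lazy="true": empty text becomes null, so there is nothing to output
        return (lazy && result.isEmpty()) ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(normalize(" ", false, true)); // a single space survives lazy alone
        System.out.println(normalize(" ", true, true));  // prints "null"
    }
}
```

This is why lazy alone is not enough when the value can be a blank string rather than a truly empty one.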

BeanIO writes 0 instead of intended value

I have a fixed-length stream containing record counters:
The record starts with Z
Characters 16+9 (human form) contain the B record counter
Characters 25+9 (human form) contain the C record counter
All numbers are zero-padded and right-aligned
The record ends with A + CRLF at position 1898 (the record is 2000 chars long)
I use the following BeanIO mapping:
<record name="RECORD_Z" class="com.acme.ftt2017.RecordZ" order="4" minOccurs="1" maxOccurs="1" maxLength="1900">
<field name="tipoRecord" rid="true" at="0" ignore="true" required="true" length="1" lazy="true" literal="Z" />
<field name="numeroRecordB" at="15" length="9" padding="0" align="right" trim="true" />
<field name="numeroRecordC" at="24" length="9" padding="0" align="right" trim="true" />
<field name="terminatorA" at="1897" length="1" rid="true" literal="A" ignore="true" />
</record>
Bean
public class RecordZ implements Serializable
{
    private final char tipoRecord = 'Z';
    private Integer numeroRecordB, numeroRecordC;
    // getters and setters omitted
}
I have triple-checked in debug the following code:
RecordZ trailer = new RecordZ();
trailer.setNumeroRecordB(1);
trailer.setNumeroRecordC(countRecordC); // equals 1 in debug
log.debug("Exporting record Z");
log.trace("Record Z: " + trailer.toString());
exporter.write(FttRecordTypes.RECORDTYPE_FTT_Z, trailer);
However the produced data file contains the following
Z 000000000000000000 A
Expected
Z 000000001000000001 A
What is wrong with my export code? Why do I always get zeroes?
From the last paragraph of Section 4.3.1:
Optionally, a format attribute can be used to pass a decimal format for java.lang.Number types, and for passing a date format for java.util.Date types. In the example below, the hireDate field uses the SimpleDateFormat pattern "yyyy-MM-dd", and the salary field uses the DecimalFormat pattern "#,##0". For more information about supported patterns, please reference the API documentation for Java's java.text.DecimalFormat and java.text.SimpleDateFormat classes.
And from Section 4.3.2:
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
So you can try one of two things:
Stop using your custom *IntegerTypeHandlers and specify a format attribute on the fields that currently use them. This might be a lot of work, depending on the number of fields and type handlers you have. For example:
<field name="numeroRecordB" format="000000000" at="15" length="9" padding="0" align="right" trim="true" />
OR
Make the getType() method of your custom type handlers return null instead of Integer.class. Hopefully the handlers will then no longer be used as global type handlers. I haven't done this before, so it might not work.
public Class<?> getType() {
return null;
}
Hope this helps.
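For reference, the format attribute in the first option takes a java.text.DecimalFormat pattern, and the pattern "000000000" is what produces the 9-digit zero-padded counter. A minimal standalone check in plain Java (the class name is hypothetical and no BeanIO is involved):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Standalone illustration of the DecimalFormat pattern used in the format attribute.
class ZeroPaddedCounter {
    static String format(int value) {
        // Nine '0' placeholders force at least nine digits; Locale.ROOT keeps plain ASCII digits.
        DecimalFormat nineDigits =
                new DecimalFormat("000000000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return nineDigits.format(value);
    }

    public static void main(String[] args) {
        // The two counters from the question, both equal to 1
        System.out.println(format(1) + format(1)); // prints "000000001000000001"
    }
}
```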
I found the culprit: the custom type handlers!
According to the BeanIO documentation:
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
What I did not show is that, at the top of my mapping file, I had registered plenty of custom type handlers, unfortunately all of them with a type attribute, which registers each one for every field of that type:
<typeHandler name="int_2" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="2" />
</typeHandler>
<typeHandler name="int_4" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="2" />
</typeHandler>
<typeHandler name="int_10" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="10" />
</typeHandler>
<typeHandler name="bigint_16" type="java.math.BigInteger" class="org.beanio.types.BigIntFixedLengthTypeHandler">
<property name="numberOfDigits" value="16" />
</typeHandler>
Removing the type attribute from these type handlers fixes the problem.

Lucene/Solr - Indexing publications/texts

I want to be able to search publications with facets. These documents will be annotated, so I will upload the annotations to the Solr instance. Each annotation will have fields that hold the terms found in the document. Here is an example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<add>
<doc>
<field name="Title">High Glucose Increases the Expression of Inflammatory Cytokine Genes in
Macrophages Through H3K9 Methyltransferase Mechanism.</field>
<field name="Cytokine">INTERFERON </field>
<field name="Cytokine">CYTOKINE </field>
<field name="Cytokine">CYTOKINE</field>
<field name="Cytokine">MEC</field>
<field name="Cytokine">EPA</field>
<field name="Cytokine">DIA</field>
<field name="Cytokine">FIC</field>
<field name="Cytokine">CYTOKINES</field>
<field name="Cytokine">INTERLEUKIN-6 </field>
<field name="Cytokine">INTERLEUKIN</field>
<field name="Cytokine">IL-12P40</field>
<field name="Cytokine">IL-12</field>
<field name="Cytokine">IL-1</field>
<field name="Cytokine">P40</field>
<field name="Cytokine">MACROPHAGE INFLAMMATORY PROTEIN-1</field>
<field name="Cytokine">MACROPHAGE INFLAMMATORY PROTEIN</field>
</doc>
</add>
These terms are all from a Cytokine ontology.
I want to be able to set the facet to Cytokine, then select a term and find all of the documents that contain the selected term.
Here is the catch:
I want to be able to store the location of each term found in the document (it can show up in multiple locations) so I can highlight it later. All of these locations are stored in the annotation.
I want to be able to select one of the terms from the facet and also bring up documents that contain that term's synonyms, without the synonym being uploaded as a term in the facet (or with it being marked as a synonym somehow, like a subcategory), e.g. automobile and car.
I want to be able to do a cross search, e.g. find documents that contain MEC and EPA.
I have a list of terms I want to index and search the documents by. These terms have synonyms, which I have entered into the synonyms.txt file.
Also, when a term shows up multiple times in a document, the annotation has multiple instances of this term with different locations. How should I handle this? Will Solr automatically deal with the duplication and not give me the document twice?
One more thing: what about uploading the entire publication to Solr and indexing it on the predefined list of terms?
As I understand it, you have synonyms, and a search term should be matched both directly and through its synonyms when returning results. Let me know if I got that right.
If you have all the synonyms at indexing time, you can index them in a multivalued field and search on that field.
Faceting is a way of searching where the results are grouped by field value.
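For the cross search part of the question, one common way to express "contains MEC and EPA" is one filter query (fq) per selected term, with faceting enabled on the same field. The sketch below only builds the query string with the standard Solr request parameters q, fq, facet and facet.field. It assumes the field is named Cytokine as in the example document, and the class and method names are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Builds the parameter string for a faceted query restricted to all of the given terms.
class CytokineFacetQuery {
    static String build(String... terms) {
        List<String> params = new ArrayList<>();
        params.add("q=" + encode("*:*"));
        for (String term : terms) {
            // One fq per term: a document must match every filter, so the terms are ANDed
            params.add("fq=" + encode("Cytokine:\"" + term + "\""));
        }
        params.add("facet=true");
        params.add("facet.field=Cytokine");
        return String.join("&", params);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("MEC", "EPA"));
    }
}
```

The resulting string would be appended to the select handler URL of the core. A document either matches a filter or it does not, so a document whose Cytokine field contains the same term several times is still returned only once.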

Solr - Field with default value resets itself if it is stored=false

I am having a weird problem with Solr (4.x) when I set a field to stored=false and give it a default value. To make everything clear, my schema is something like:
<field name="field1" type="tint" indexed="true" stored="true" />
<field name="field2" type="tint" indexed="true" stored="true" />
<field name="field3" type="tint" indexed="true" stored="true" />
<field name="field4" type="tint" indexed="true" stored="true" />
<field name="field5" type="tint" indexed="true" stored="false" default="0" />
By default, each document starts with field5=0. Then I update documents and set field5=1 for some of them. When I later update a document that has field5=1, it goes back to field5=0. But when field5 is stored=true there is no problem: the value never goes back to the default, even if the update does not touch that field...
Is there any solution to overcome this? I can keep the field stored=true of course, but then the index will get bigger and slow down the search because of the overhead...
Behind the scenes, the update operation retrieves the stored values of the fields and reindexes the whole document from them. So if a field is not marked as stored, you cannot use it with atomic updates. Usually, unstored fields would just disappear; the interplay with the default value is what makes this case unusual.
I would not worry about performance at this stage of index design, especially for numeric fields. There are all sorts of optimizations under the covers that you can use later, and the bottleneck may not be where you expect.
Just encountered the same issue.
The update feature in Solr requires all fields to be stored="true", because Solr reads the stored fields to reconstruct the document, which is then removed and indexed again.
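The behaviour described in these answers can be reproduced with a toy model in plain Java. This is not Solr code. It only imitates the described mechanism (rebuild the document from stored fields, apply the changes, then fill in schema defaults), using a hypothetical schema that mirrors the one in the question:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of an atomic update: the new document is rebuilt from stored fields only.
class AtomicUpdateModel {
    // Assumed schema: field1..field4 are stored; field5 is unstored with default 0
    static final Set<String> STORED = Set.of("field1", "field2", "field3", "field4");
    static final Map<String, Integer> DEFAULTS = Map.of("field5", 0);

    static Map<String, Integer> atomicUpdate(Map<String, Integer> current,
                                             Map<String, Integer> changes) {
        Map<String, Integer> rebuilt = new HashMap<>();
        for (Map.Entry<String, Integer> e : current.entrySet()) {
            if (STORED.contains(e.getKey())) {
                rebuilt.put(e.getKey(), e.getValue()); // only stored values can be read back
            }
        }
        rebuilt.putAll(changes);
        // Fields missing from the rebuilt document receive their schema default again
        DEFAULTS.forEach(rebuilt::putIfAbsent);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>(Map.of("field1", 10, "field5", 1));
        Map<String, Integer> updated = atomicUpdate(doc, Map.of("field1", 11));
        System.out.println(updated.get("field5")); // prints "0": the value 1 was lost
    }
}
```

In this model, adding "field5" to the stored set makes the value 1 survive the update, which matches what the question observes with stored=true.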

JAXB multiple mappings for attribute

I'm fixing design errors made in the past, but I want to keep my software backwards compatible. For this I need some way to map two flavors of an XML file onto one Java bean. Can this be done using two JAXB annotations on one attribute/element? I understand that marshalling would be ambiguous, but unmarshalling could work. Is there a nice way of doing this?
P.S.: I don't care about marshalling.
You can map twice:
the first time using annotations
the second time using XML resources.
Or just two XML mappings instead of annotations.
For XML resource mappings, there are a number of options:
Annox: http://confluence.highsource.org/display/ANX/JAXB+User+Guide (I wrote it so it comes first :))
EclipseLink Moxy: http://eclipse.org/eclipselink/moxy.php
JAXB Introductions: http://community.jboss.org/wiki/JAXBIntroductions
With Annox you can easily map twice using XML mapping resources with different extensions, like MyClass.ann1.xml or MyClass.ann2.xml. (It's MyClass.ann.xml by default, but the adjustment is trivial.)
Here's a sample of what mappings look like:
<class xmlns="http://annox.dev.java.net" xmlns:annox="http://annox.dev.java.net" xmlns:jaxb="http://annox.dev.java.net/javax.xml.bind.annotation">
<jaxb:XmlAccessorType value="FIELD"/>
<jaxb:XmlType name="" propOrder="productName quantity usPrice comment shipDate"/>
<field name="productName">
<jaxb:XmlElement required="true"/>
</field>
<field name="usPrice">
<jaxb:XmlElement name="USPrice" required="true"/>
</field>
<field name="shipDate">
<jaxb:XmlSchemaType name="date"/>
</field>
<field name="partNum">
<jaxb:XmlAttribute required="true"/>
</field>
</class>
