How to skip field when empty in beanio? - java

The requirement is to skip a field when it is empty.
For example:
<segment name="seg1" class="com.company.bean.segmentBean" xmlType="none">
<field name="field1" xmlName= "fieldXml1" xmlType="attribute" maxLength="7" />
<field name="field2" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
</segment>
Let's assume that field2="". Since the value of field2 is empty, I would like the field to be skipped in the segment. In other words, the resulting XML should not contain field2 when it is empty ("").

You need the lazy attribute on field2. As stated in the Reference Guide:
lazy - Set to true to convert empty field text to null before type conversion. For repeating fields bound to a collection, the collection will not be created if all field values are null or the empty String. Defaults to false.
This makes field2 null, and by default most XML libraries, BeanIO included, will not output null elements.
Try this for field2:
<field name="field2" lazy="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
Most of the time, I also combine lazy with trim. From the docs:
trim - Set to true to trim the field text before validation and type conversion. Defaults to false.
<field name="field2" lazy="true" trim="true" xmlName= "fieldX2l1" xmlType="attribute" maxLength="1" typeHandler="Handler" />
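To make the combined effect of the two quoted attribute descriptions concrete, here is a minimal sketch in plain Java. It only models the documented behaviour (trim first, then turn empty text into null); it is not BeanIO's actual implementation, and the class and method names are hypothetical.

```java
// Hypothetical model of the documented "trim" and "lazy" attributes.
// This is NOT BeanIO code; it only mirrors the quoted descriptions.
final class FieldTextModel {

    // Returns the text that would be handed to type conversion.
    static String normalize(String text, boolean trim, boolean lazy) {
        if (text == null) {
            return null;
        }
        // trim: strip surrounding whitespace before validation/conversion
        String result = trim ? text.trim() : text;
        // lazy: empty field text becomes null before type conversion
        return (lazy && result.isEmpty()) ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("", false, true));   // null
        System.out.println(normalize(" ", true, true));   // null
        System.out.println(normalize("", false, false).isEmpty()); // true
    }
}
```

Under this model, a field holding a single space only becomes null when both flags are set, which is why the two attributes are often combined.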

Related

BeanIO writes 0 instead of intended value

I have a fixed-length stream containing record counters:
The record starts with Z
Characters 16+9 (human form) contain the B record counter
Characters 25+9 (human form) contain the C record counter
All numbers are padded with 0 and aligned to the right
The record ends with A + CRLF at position 1898 (the record is 2000 chars long)
I use the following BeanIO mapping code:
<record name="RECORD_Z" class="com.acme.ftt2017.RecordZ" order="4" minOccurs="1" maxOccurs="1" maxLength="1900">
<field name="tipoRecord" rid="true" at="0" ignore="true" required="true" length="1" lazy="true" literal="Z" />
<field name="numeroRecordB" at="15" length="9" padding="0" align="right" trim="true" />
<field name="numeroRecordC" at="24" length="9" padding="0" align="right" trim="true" />
<field name="terminatorA" at="1897" length="1" rid="true" literal="A" ignore="true" />
</record>
Bean
public class RecordZ implements Serializable
{
private final char tipoRecord = 'Z';
private Integer numeroRecordB, numeroRecordC;
// getters and setters omitted
}
I have triple-checked in debug the following code:
RecordZ trailer = new RecordZ();
trailer.setNumeroRecordB(1);
trailer.setNumeroRecordC(countRecordC); // equals 1 in debug
log.debug("Exporting record Z");
log.trace("Record Z: " + trailer.toString());
exporter.write(FttRecordTypes.RECORDTYPE_FTT_Z, trailer);
However, the produced data file contains the following:
Z 000000000000000000 A
Expected
Z 000000001000000001 A
What is wrong with my export code? Why am I always getting zeroes?
From the last paragraph in Section 4.3.1
Optionally, a format attribute can be used to pass a decimal format for java.lang.Number types, and for passing a date format for java.util.Date types. In the example below, the hireDate field uses the SimpleDateFormat pattern "yyyy-MM-dd", and the salary field uses the DecimalFormat pattern "#,##0". For more information about supported patterns, please reference the API documentation for Java's java.text.DecimalFormat and java.text.SimpleDateFormat classes.
And in Section 4.3.2
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
So you can try one of two things:
Stop using your custom *IntegerTypeHandlers and instead specify a format attribute on the fields that use them. This might be a lot of work, depending on how many fields and type handlers you have. For example:
<field name="numeroRecordB" format="000000000" at="15" length="9" padding="0" align="right" trim="true" />
OR
Make the getType() method of your custom type handlers return null instead of Integer.class, so that they are hopefully no longer picked up as global type handlers. I haven't done this before, so it might not work.
public Class<?> getType() {
return null;
}
Hope this helps.
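As a quick sanity check of the format pattern suggested in the first option, here is a small sketch using only java.text.DecimalFormat, the class the quoted documentation refers to. It runs outside BeanIO, so it only shows what the pattern "000000000" produces for the counters in the question; the class and method names are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo class: shows the output of the "000000000" pattern.
final class ZeroPadDemo {

    // Formats a counter as nine zero-padded digits.
    static String pad9(int value) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        DecimalFormat format =
                new DecimalFormat("000000000", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }

    public static void main(String[] args) {
        // The two counters from the question, written back to back.
        System.out.println(pad9(1) + pad9(1)); // 000000001000000001
    }
}
```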
Found the culprit: the custom type handlers!
According to the BeanIO documentation:
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
What I did not show is that, at the top of my mapping file, I had registered plenty of custom type handlers, unfortunately all with a type attribute.
<typeHandler name="int_2" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="2" />
</typeHandler>
<typeHandler name="int_4" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="2" />
</typeHandler>
<typeHandler name="int_10" type="java.lang.Integer" class="org.beanio.types.IntFixedLengthTypeHandler">
<property name="numberOfDigits" value="10" />
</typeHandler>
<typeHandler name="bigint_16" type="java.math.BigInteger" class="org.beanio.types.BigIntFixedLengthTypeHandler">
<property name="numberOfDigits" value="16" />
</typeHandler>
Removing the type attribute works.

Solr error when attempting to use a field set to a fieldtype of solr.StrField

I'm trying to set up faceting through Solr. I have a specific field to use called "Category". The faceting works, but the values appear in lowercase and contain only a single word. I'm sure this is due to the field analyzer, so I've changed the field's type to "string". This produces an error (see below).
FYI - I'm using Solr 4.3.
You can see the fieldtype and the field below. The field is set to the fieldtype "string", which uses "solr.StrField". Why would using this fieldtype cause the error below?
Schema.xml
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
</types>
<fields>
<field name="category" type="string" indexed="true" stored="true"
required="false" multiValued="false"/>
</fields>
I get the following error message in the log and when attempting to run a query:
This AttributeSource does not have the attribute
'org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute'
null:java.lang.IllegalArgumentException: This AttributeSource does not
have the attribute
'org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute'.
Update:
Since the error seems to indicate that my fieldtype should have the attribute PositionIncrementAttribute, I added positionIncrementGap to my string fieldtype as a test. I then cleared the index and re-indexed, but still no luck.
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
positionIncrementGap="100"/>

Multiple length="unbounded" in fixed length file

I'm experiencing a little issue and I'm asking for your help!
Using BeanIO 2.1 and working on a fixed-length file, I'm currently trying to retrieve a record that is structured like this: :28C:5n/5n
':28C:' : fixed
5 digits (maximum)
'/' : fixed
5 digits (maximum)
Examples:
:28C:61/00005
:28C:100/00001
:28C:12345/12345
Here is a snippet of the code:
<record name="statementNumber" class="com.batch.records.StatementNumber" occurs="1">
<field name="tag" type="string" length="5" rid="true" literal=":28C:" ignore="true" />
<field name="statementNr" type="int" length="unbounded" maxLength="5" />
<field name="separator" type="string" length="1" rid="true" literal="/" ignore="true" />
<field name="sequenceNr" type="int" length="unbounded" maxLength="5" />
</record>
When running my parser, I get this exception:
Cannot determine field position, field is preceded by another component with indeterminate occurrences or unbounded length
My question is: how can I tell BeanIO that the field '/' is actually the delimiter between the two variable fields?
Thanks in advance
You can only have 1 field of an unbounded length on a line. The BeanIO documentation says:
The length of the last field in a fixed length record may be set to unbounded to disable padding and allow a single variable length field at the end of the otherwise fixed length record.
Honestly, I'm not sure if this can be done using BeanIO. Is it an option to read the 5n/5n part into Java as a single field and split it on / in your own code, instead of in BeanIO?
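A minimal sketch of that splitting step in plain Java follows. It assumes the whole line (tag included) arrives as one string; the class and method names are hypothetical and nothing here uses the BeanIO API.

```java
// Hypothetical helper: parses ":28C:<statementNr>/<sequenceNr>" by hand.
final class StatementNumberParser {

    private static final String TAG = ":28C:";

    // Returns {statementNr, sequenceNr} for a line such as ":28C:61/00005".
    static int[] parse(String line) {
        if (!line.startsWith(TAG)) {
            throw new IllegalArgumentException("Missing " + TAG + " tag: " + line);
        }
        // Everything after the tag is "<up to 5 digits>/<up to 5 digits>".
        String[] parts = line.substring(TAG.length()).split("/", -1);
        if (parts.length != 2 || parts[0].length() > 5 || parts[1].length() > 5) {
            throw new IllegalArgumentException("Expected 5n/5n after tag: " + line);
        }
        // Integer.parseInt accepts leading zeros, so "00005" becomes 5.
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    public static void main(String[] args) {
        int[] numbers = parse(":28C:61/00005");
        System.out.println(numbers[0] + " / " + numbers[1]); // 61 / 5
    }
}
```

In a BeanIO mapping, this could be fed from a single unbounded field placed after the tag, which is the one case the quoted documentation allows.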

How to Check for Multiple Conditions in Xpath

I want to retrieve the pName of the row whose id = 2 and pAddress = INDIA.
<?xml version="1.0"?>
<mysqldump >
<database name="MyDb">
<table name="DescriptionTable">
<row>
<field name="id">1</field>
<field name="pName">XYZ</field>
<field name="pAddress">INDIA</field>
<field name="pMobile">1234567897</field>
</row>
<row>
<field name="id">2</field>
<field name="pName">PQR</field>
<field name="pAddress">UK</field>
<field name="pMobile">755377</field>
</row>
<row>
<field name="id">3</field>
<field name="pName">ABC</field>
<field name="pAddress">USA</field>
<field name="pMobile">67856697</field>
</row>
</table>
</database>
</mysqldump>
String expression="/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id' and ./text()]";
Edit:
I would like to get the pName whose id is 2 and pAddress=INDIA
String expression="/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id' and .='2']and[@name='pAddress' and .='INDIA']/../field[@name='pName']/text()";
Both of the above answers could be improved by moving aspects of the path expression into the predicates and using nested predicates. IMHO this makes the XPath selection much more human-readable.
First we find the row with the field whose @name eq "id" and text() = "2"; from there we can simply select the field from that row whose @name eq "pName".
/mysqldump/database[@name = "MyDb"]/table[@name = "DescriptionTable"]/row[field[@name eq "id"][text() = "2"]]/field[@name = "pName"]
Also note the explicit use of eq and =: eq is used for comparing atomic values, in this instance the selection of our attributes, and = is used for comparing sequences (as it is conceivable that text() may return more than one item, although it won't for your example XML). Keep in mind that eq is an XPath 2.0 operator; an XPath 1.0 engine, such as the one behind the JDK's javax.xml.xpath, only understands =.
Try:
/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']/row/field[@name='id'][.='2']/following-sibling::field[@name='pName']/text()
/mysqldump/database/table/row/field[@name='id' and .=2]/../field[@name='pName']
Explanation:
/mysqldump/database/table/row/field[@name='id' and .=2]
gets the field whose name attribute is id and whose value equals 2.
../
goes to the parent node.
field[@name='pName']
selects the field whose name attribute is pName.
You can try it this way:
/mysqldump
/database[@name='MyDb']
/table[@name='DescriptionTable']
/row[
field[@name[.='id'] and .=2]
and
field[@name[.='pAddress'] and .='INDIA']
]
/field[@name='pName']
The above XPath selects the <row> element whose id is 2 and whose pAddress is INDIA, then gets that row's pName. But looking at the sample XML in this question, there is no <row> that fulfills both criteria. If you meant to select rows that either have id equal to 2 or have pAddress equal to INDIA, you can use or instead of and in the row filtering expression:
/mysqldump
/database[@name='MyDb']
/table[@name='DescriptionTable']
/row[
field[@name[.='id'] and .=2]
or
field[@name[.='pAddress'] and .='INDIA']
]
/field[@name='pName']
The simplest way to achieve the above is:
String expression = "/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']"
+ "/row[./field[@name='id']/text()='2' and ./field[@name='pAddress']/text()='INDIA']"
+ "/field[@name='pName']/text()";
You can add multiple conditions in the second line, separated by and/or, based on your needs.
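As a rough check of the row-predicate approach, the sketch below evaluates such an expression with the JDK's built-in javax.xml.xpath (XPath 1.0) against a shortened copy of the sample XML. The class and method names are hypothetical, and building the expression by string concatenation is a simplification that is only safe for trusted values.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: looks up pName for a row matching both id and pAddress.
final class RowLookup {

    // Shortened copy of the sample XML (first two rows only).
    static final String XML =
        "<mysqldump><database name=\"MyDb\"><table name=\"DescriptionTable\">"
      + "<row><field name=\"id\">1</field><field name=\"pName\">XYZ</field>"
      + "<field name=\"pAddress\">INDIA</field></row>"
      + "<row><field name=\"id\">2</field><field name=\"pName\">PQR</field>"
      + "<field name=\"pAddress\">UK</field></row>"
      + "</table></database></mysqldump>";

    // Returns the pName text, or an empty string when no row matches.
    static String pName(String id, String address) {
        String expression =
            "/mysqldump/database[@name='MyDb']/table[@name='DescriptionTable']"
          + "/row[field[@name='id']='" + id + "' and field[@name='pAddress']='" + address + "']"
          + "/field[@name='pName']";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pName("1", "INDIA"));           // XYZ
        System.out.println(pName("2", "UK"));              // PQR
        System.out.println(pName("2", "INDIA").isEmpty()); // true
    }
}
```

The last call returns an empty string because no row in the sample has both id 2 and pAddress INDIA.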
As I understand it, you want to get the text from two pNames.
I think this should work for you within the scope of the current XML (note that the or operator must be lowercase in XPath):
//row/field[text()='2' or text()='INDIA']/../field[@name='pName']/text()
If you want to take just the nodes:
//row/field[text()='2' or text()='INDIA']/../field[@name='pName']

Solr - Field with default value resets itself if it is stored=false

I am having a weird problem with Solr (4.x) when I set a field to stored=false and give it a default value. To make everything clear, my schema is something like:
<field name="field1" type="tint" indexed="true" stored="true" />
<field name="field2" type="tint" indexed="true" stored="true" />
<field name="field3" type="tint" indexed="true" stored="true" />
<field name="field4" type="tint" indexed="true" stored="true" />
<field name="field5" type="tint" indexed="true" stored="false" default="0" />
By default, each document starts with field5=0. I then update documents and set field5=1 for some of them. If I later update a document that has field5=1, it goes back to field5=0. But when field5 is stored=true there is no problem: the value never goes back to the default, even though the updates do not touch that field...
Is there any solution to overcome this? I can keep the field stored=true of course, but then the index will get bigger, slowing down the search because of the overhead...
Behind the scenes, the update operation retrieves the stored values of the fields and reindexes the whole new entity. So, if a field is not marked as stored, you cannot use it with atomic updates. Usually, unstored fields would just disappear; the interplay with the default value is unusual.
I would not worry about performance at this stage of index design, especially for numbers. There are all sorts of optimizations under the covers that you can use later, and the bottleneck may not be where you expect.
Just encountered the same issue.
The update feature in Solr requires all fields to be stored (stored="true"), because Solr reads the stored fields and uses that data to reconstruct the document, which is then removed and indexed again.
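The behaviour described in these answers can be modelled in a few lines of plain Java. This is only a toy model of "rebuild the document from stored fields, apply the changes, then fill in schema defaults"; it does not use any Solr API, and every name in it is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of an atomic update; NOT Solr code.
final class AtomicUpdateModel {

    static Map<String, Integer> atomicUpdate(Map<String, Integer> indexedDoc,
                                             Set<String> storedFields,
                                             Map<String, Integer> changes,
                                             Map<String, Integer> schemaDefaults) {
        Map<String, Integer> rebuilt = new LinkedHashMap<>();
        // Only stored values can be read back from the index.
        for (Map.Entry<String, Integer> field : indexedDoc.entrySet()) {
            if (storedFields.contains(field.getKey())) {
                rebuilt.put(field.getKey(), field.getValue());
            }
        }
        // Apply the fields named in the update request.
        rebuilt.putAll(changes);
        // Fields that are still missing fall back to their schema default.
        for (Map.Entry<String, Integer> def : schemaDefaults.entrySet()) {
            rebuilt.putIfAbsent(def.getKey(), def.getValue());
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = Map.of("field1", 10, "field5", 1);
        Map<String, Integer> changes = Map.of("field1", 11);
        Map<String, Integer> defaults = Map.of("field5", 0);
        // field5 not stored: its value is lost and the default comes back.
        System.out.println(atomicUpdate(doc, Set.of("field1"), changes, defaults).get("field5")); // 0
        // field5 stored: its value survives the update.
        System.out.println(atomicUpdate(doc, Set.of("field1", "field5"), changes, defaults).get("field5")); // 1
    }
}
```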
