A Field Guide to Parsing and Creating Flat Files
Using the Humble Flatworm

For Version 2.0
Last Revised December, 2009

Flat files. Much as we live in an XML/SOAP/Web Services world, there's still a ton of data being moved around between proprietary and legacy applications that consists of fixed-length fields delimited by EOLs. Around about the time I wrote my 20th Java class whose only purpose in life was to suck up a flat file, use String.substring to break it up into pieces, and then populate a bean with it, I decided there had to be a better way. This package represents the fruit of that frustration.

What is Flatworm?

Flatworm is a Java library intended to allow a developer to describe the format of a flat file using an XML definition file, and then to be able to automatically read lines from that file, and have one or more beans be instantiated for each logical record.

There are a few powerful features in Flatworm worth mentioning. For one thing, a record may consist of one or more physical lines in the file. A record may contain more than one bean once decoded. A flat file may contain more than one type of record, and Flatworm can use line length and substring matching to determine which type of record a line begins.

Besides fielded buffer flat files, Flatworm also supports text files where the different fields are separated by a separator character, e.g. CSV (comma separated values) files.

Flatworm, as of version 2.0, also supports delimited files that contain segments that may repeat. These are different from standard flat files, which have a well-defined number of fields for each record. With repeating segments it is possible to have a varying number of segments in each record, so that different records in the file can have a different number of fields. Repeating segments are supported only for reading delimited files.

Last but not least, Flatworm is able to produce flat files from beans and the same definition file.

Requirements

In addition to the flatworm jar file, you will also need to have the following jars in your classpath in order for Flatworm to thrive:

commons-beanutils (from Apache Commons)
commons-collections (from Apache Commons)
commons-logging (from Apache Commons)
commons-lang (from Apache Commons)
log4j (www.log4j.org)(optional)

Recent versions of all of these packages are available in the source jar file.

Downloading

The latest version of Flatworm is Release 2.0. You can download it from SourceForge.

A Simple Example

Before diving into the complexities of Flatworm, let's look at a simple example that illustrates the basic operation. Imagine the following input file which contains new hire data for a company:

NHJAMES          TURNER         M123-45-67890004224345
NHJOHN           JONES          M987-65-43210104356745

The layout of the file is as follows:

RECORD NAME	TYPE	LENGTH
recordtype	char	2
firstname	char	15
lastname	char	15
gender	char	1
ssn	char	11
salary	double	10 (2 decimal places)

We want to suck this file into a Java bean called Employee that has properties firstName, lastName, ssn, gender and salary. These are available via the standard JavaBean mechanisms.
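
The Employee class itself is not listed in this guide. Here is a minimal sketch of what it could look like, assuming String properties for the text fields and a Double for the salary. The toString format is made up for illustration and is not meant to reproduce any output shown in this guide.

```java
// Hypothetical Employee bean; the guide only names its properties.
// Declared without "public" so the sketch can live in any file; in a real
// project it would normally be a public class in Employee.java.
class Employee {
    private String firstName;
    private String lastName;
    private String ssn;
    private String gender;
    private Double salary;

    // A zero-argument constructor lets the class be created reflectively.
    public Employee() {
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getSsn() { return ssn; }
    public void setSsn(String ssn) { this.ssn = ssn; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    public Double getSalary() { return salary; }
    public void setSalary(Double salary) { this.salary = salary; }

    // Illustrative format only.
    @Override
    public String toString() {
        return "Employee[" + lastName + ", " + firstName + ", " + ssn
                + ", " + gender + ", " + salary + "]";
    }
}
```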

To do this, we start by writing the Flatworm XML descriptor for the file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
    <converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
    <converter name="decimal" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDecimal" return-type="java.lang.Double"/>
    <record name="newhire">
        <record-ident>
            <field-ident field-start="0" field-length="2">
                <match-string>NH</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <bean name="employee" class="Employee"/>
            <line>
                <record-element length="2"/>
                <record-element length="15" beanref="employee.firstName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="15" beanref="employee.lastName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="1" beanref="employee.gender" type="char"/>
                <record-element length="11" beanref="employee.ssn" type="char">
                    <conversion-option name="strip-chars" value="non-numeric"/>
                </record-element>
                <record-element length="10" beanref="employee.salary" type="decimal">
                    <conversion-option name="decimal-places" value="2"/>
                    <conversion-option name="decimal-implied" value="true"/>
                    <conversion-option name="pad-character" value="0"/>
                    <conversion-option name="justify" value="right"/>
                </record-element>
            </line>
        </record-definition>
    </record>
</file-format>

The file-format tag is required, and specifies the beginning of the actual description. The first thing that we must do is to register converters for the datatypes used in the file. There are a number of predefined converter methods in the provided class com.blackbear.flatworm.converters.CoreConverters:

convertChar - Simply returns the field specified, optionally stripping leading or trailing (or both) padding characters, and removing unwanted characters.
convertDecimal - As above but converts the value to a Double. The decimal place may be implied by position, or explicit.
convertDate - Parses the date using the default (MM-dd-yyyy) or a user defined format.
convertInteger - Parses to an Integer.
convertLong - Parses to a Long.
convertBigDecimal - Parses to a BigDecimal.

In order to be used in record definitions, a converter must always be registered first. Next in the file, a record is defined. A file may contain several different types of records; the record-ident tag is used to specify which record definition is appropriate for a given line. There are two ways to identify a record: by a substring match on a specific section of the line, or by the overall length of the line. Later, you will see how multiple record types can be read from the same file; for the moment only one is defined, which matches on the characters NH (new hire) in the first two positions of the line (field-start 0, field-length 2). If no record-ident is defined, all records will match.

Once we're sure that we are dealing with the correct record type, we can define the record. We start by defining the beans that will be returned. Each bean has a name, which is used to reference it inside the definition, and a class (fully qualified) with which to create objects. The class specified must have a valid zero-argument constructor.

Finally the record is broken down line by line (since a record is allowed to span multiple lines). Record-elements (fields) may be defined in terms of:

a length alone, in which case they are considered to span from the end of the last field to that position plus the specified length
a start position and a length, in which case they span from the start position to that position plus the length
a start and end position, in which case they span from the start to end position (not inclusive of the end)
an end position alone, in which case they span from the last end position to the specified end position (not inclusive of the end)
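
The four cases above all boil down to a half-open range [start, end) on the line. Here is a small sketch of that arithmetic. FieldSpan and resolve are hypothetical names invented for this illustration; this is not how Flatworm implements it internally.

```java
// Hypothetical helper showing how the four ways of describing a
// record-element resolve to a [start, end) range. Not Flatworm code.
class FieldSpan {
    final int start; // inclusive
    final int end;   // exclusive

    FieldSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // lastEnd is where the previous field ended (0 for the first field).
    // Pass null for any attribute the record-element does not set.
    static FieldSpan resolve(int lastEnd, Integer start, Integer length, Integer end) {
        // With no explicit start, the field begins where the last one ended.
        int from = (start != null) ? start : lastEnd;
        if (end != null) {
            return new FieldSpan(from, end);
        }
        if (length != null) {
            return new FieldSpan(from, from + length);
        }
        throw new IllegalArgumentException("need a length or an end position");
    }

    // Cuts this field out of a line, exactly like String.substring.
    String cut(String line) {
        return line.substring(start, end);
    }
}
```

For example, resolve(2, null, 15, null) describes a 15 character field that follows a 2 character field and yields [2, 17), while resolve(0, 10, null, 16) and resolve(0, 10, 6, null) both yield [10, 16).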

Each record element also defines the beanref (according to the standard used in the Apache Commons BeanUtils package), and the type (which should match one of the types defined at the top of the file). Record elements may also have conversion-options, which are specific to the converter specified. For example, in the above example, the lastName field should have any trailing spaces removed, the social security number should be stripped of all non-numeric characters, and the salary has two implied decimal places and may be left-padded with zeros, which should be removed.
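
To make the effect of those conversion options concrete, here is the same work done by hand on the first line of the sample file. This is only a sketch of the behaviour described above, written with plain String operations; the class and method names are invented, and the offsets are derived from the layout table.

```java
// Hand-rolled illustration of the conversion options used in the
// new hire descriptor. Not Flatworm code; all names are hypothetical.
class NewHireByHand {
    // justify=left: the value sits on the left, so padding is trimmed from the right.
    static String stripRight(String s, char pad) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == pad) {
            end--;
        }
        return s.substring(0, end);
    }

    // justify=right: the value sits on the right, so padding is trimmed from the left.
    static String stripLeft(String s, char pad) {
        int start = 0;
        while (start < s.length() && s.charAt(start) == pad) {
            start++;
        }
        return s.substring(start);
    }

    // strip-chars=non-numeric: keep digits only.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // decimal-implied=true with decimal-places=places.
    static double impliedDecimal(String digits, int places) {
        if (digits.isEmpty()) {
            return 0.0;
        }
        return Double.parseDouble(digits) / Math.pow(10, places);
    }

    public static void main(String[] args) {
        // Rebuild the first sample line from its fields (2 + 15 + 15 + 1 + 11 + 10).
        String line = "NH" + String.format("%-15s%-15s", "JAMES", "TURNER")
                + "M" + "123-45-6789" + "0004224345";
        String lastName = stripRight(line.substring(17, 32), ' ');
        String ssn = digitsOnly(line.substring(33, 44));
        double salary = impliedDecimal(stripLeft(line.substring(44, 54), '0'), 2);
        System.out.println(lastName + " " + ssn + " " + salary);
        // prints: TURNER 123456789 42243.45
    }
}
```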

Now we're ready to fire it all up. Here's a simple Java class that parses the input file and prints out the beans produced:

import java.io.*;
import java.util.HashMap;

import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import com.blackbear.flatworm.errors.*;

public class SimpleFlatwormExample {
    public static void main(String[] args) {
         ConfigurationReader parser = new ConfigurationReader();
         try {
             FileFormat ff = parser.loadConfigurationFile(args[0]);
             InputStream in = new FileInputStream(args[1]);
             BufferedReader bufIn = new BufferedReader(new InputStreamReader(in));
             MatchedRecord results;
             while ((results = ff.getNextRecord(bufIn)) != null) {
                 if (results.getRecordName().equals("newhire")) {
                    System.out.println(results.getBean("employee"));
                 }
             }

         } catch (FlatwormUnsetFieldValueException flatwormUnsetFieldValueError) {
             flatwormUnsetFieldValueError.printStackTrace();  
         } catch (FlatwormConfigurationValueException flatwormConfigurationValueError) {
             flatwormConfigurationValueError.printStackTrace();
         } catch (FileNotFoundException e) {
             e.printStackTrace();
         } catch (FlatwormInvalidRecordException e) {
             e.printStackTrace(); 
         } catch (FlatwormInputLineLengthException e) {
             e.printStackTrace(); 
         } catch (FlatwormConversionException e) {
             e.printStackTrace(); 
         }
     }

}

The location of the configuration file is passed in as the first argument to the program, and the file to be parsed as the second. A ConfigurationReader object is created, and the loadConfigurationFile method is called with the path to the file as the argument. A FileFormat is returned. After opening the input file and morphing it into a BufferedReader, the BufferedReader is passed in to the getNextRecord method of the FileFormat. getNextRecord returns either null, if the input file has been exhausted, or a MatchedRecord object. The getRecordName method lets us know which type of record is being returned (remembering again that a file can have several types of records), and we can access specific beans with the getBean method.

When we run this test program, the results are as expected:

C:/j2sdk1.4.2_04\bin\java SimpleFlatwormExample simple-example.xml import1.txt
Employee@120a47e[TURNER, JAMES, 123456789, M, 42243.45]
Employee@f73c1[JONES, JOHN, 987654321, M, 1043567.45]
Process terminated with exit code 0

Defining Your Own Converters

If you want to define a new converter to use in your application, it's quite simple. For each type to be converted, a converter has to offer two methods:

A method to convert a string read from the file to the target type (parsing). The signature of such a method looks like this (where T is the type to be parsed):
```
public T convertT(String str, HashMap options) throws FlatwormConversionException;
```
A method to convert a value of the target type into a string representation (generation). The signature of such a method looks like this (where T is the type to be written):
```
public String convertT(Object obj, HashMap options)
```

To be a bit more specific, let's look at the definition of convertDecimal from the CoreConverters class. First, the parsing method:

    public Double convertDecimal(String str, HashMap options) throws FlatwormConversionException
    {
        try
        {
            int decimalPlaces = 0;
            ConversionOption conv = (ConversionOption) options.get("decimal-places");

            String decimalPlacesOption = null;
            if (null != conv)
                decimalPlacesOption = conv.getValue();

            boolean decimalImplied = "true".equals(Util.getValue(options, "decimal-implied"));

            if (decimalPlacesOption != null)
                decimalPlaces = Integer.parseInt(decimalPlacesOption);

            if (str.length() == 0)
                return new Double(0.0D);

            if (decimalImplied)
                return new Double(Double.parseDouble(str) / Math.pow(10D, decimalPlaces));
            else
                return Double.valueOf(str);

        } catch (NumberFormatException ex)
        {
            cat.error(ex);
            throw new FlatwormConversionException(str);
        }
    }

All parsing converter methods must accept exactly two arguments, a String and a HashMap. The String contains the substring text from the input line. The HashMap contains the key/value pairs from the conversion-option tags. It's a good policy to call removePadding first, since it automatically handles removing any left or right padding as specified by the options, strips out unwanted characters, and returns a default value if the string is empty. Converters should return an object as opposed to a primitive, since the value must eventually end up in a HashMap. Finally, if any errors are encountered during processing, you should throw a FlatwormConversionException with some useful diagnostic information.

Now let's take a look at the definition of the CoreConverters method for writing a decimal:

    public String convertDecimal(Object obj, HashMap options)
    {
        Double d = (Double) obj;
        if (d == null)
        {
            return null;
        }

        int decimalPlaces = 0;
        ConversionOption conv = (ConversionOption) options.get("decimal-places");

        String decimalPlacesOption = null;
        if (null != conv)
            decimalPlacesOption = conv.getValue();

        boolean decimalImplied = "true".equals(Util.getValue(options, "decimal-implied"));

        if (decimalPlacesOption != null)
            decimalPlaces = Integer.parseInt(decimalPlacesOption);

        DecimalFormat format = new DecimalFormat();
        format.setDecimalSeparatorAlwaysShown(!decimalImplied);
        format.setGroupingUsed(false);
        if (decimalImplied)
        {
            format.setMaximumFractionDigits(0);
            d = new Double(d.doubleValue() * Math.pow(10D, decimalPlaces));
        } else
        {
            format.setMinimumFractionDigits(decimalPlaces);
            format.setMaximumFractionDigits(decimalPlaces);
        }
        return format.format(d);
    }

The generating converter methods are subject to a similar restriction as the parsing methods, except that the first parameter must be of type Object. The parameter is not declared with the actual attribute type so that Flatworm remains compatible with Java versions below 5.0.
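
As an illustration of the two-method shape, here is a sketch of a converter for a Y/N flag. To keep it compilable on its own it uses stand-ins, all of them assumptions: ConversionFailure takes the place of FlatwormConversionException, the options are a plain Map of strings rather than the HashMap of ConversionOption objects seen in the CoreConverters listings, and YesNoConverter and convertYesNo are invented names.

```java
import java.util.Map;

// Stand-in for FlatwormConversionException so the sketch compiles by itself.
class ConversionFailure extends Exception {
    ConversionFailure(String message) {
        super(message);
    }
}

// Hypothetical converter for a single-character Y/N field.
class YesNoConverter {
    // Parsing direction: text from the file -> Boolean.
    public Boolean convertYesNo(String str, Map<String, String> options)
            throws ConversionFailure {
        String text = str.trim();
        if (text.isEmpty()) {
            // Fall back to a default-value option if one was supplied.
            String fallback = options.get("default-value");
            if (fallback == null) {
                throw new ConversionFailure("empty Y/N field and no default-value");
            }
            text = fallback;
        }
        if (text.equalsIgnoreCase("Y")) {
            return Boolean.TRUE;
        }
        if (text.equalsIgnoreCase("N")) {
            return Boolean.FALSE;
        }
        // Include the offending text so the problem can be diagnosed.
        throw new ConversionFailure("expected Y or N but got '" + str + "'");
    }

    // Generating direction: bean value -> text for the file.
    // The first parameter is Object, as required for generating methods.
    public String convertYesNo(Object obj, Map<String, String> options) {
        if (obj == null) {
            return null;
        }
        return ((Boolean) obj).booleanValue() ? "Y" : "N";
    }
}
```

Both methods share one name and differ only in the type of the first parameter, which mirrors the pair of convertDecimal methods shown above.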

Record Matching

As promised, let's look at a more complex example now. This example combines multiple beans in a single record, and multiple record types in a single file. Let's assume we're in the IT department at MegaMart, and we're importing a mixed flat file containing books, videotapes and DVDs. Unfortunately, the three product types have three different formats.

DVD

RECORD NAME	TYPE	LENGTH
title	char	30
studio name	char	30
release date	date	8 (YYYYMMDD)
sku	char	9
price	double	7
dual layer	char	1

The DVD record is a single-line 85 character record, 30 characters each for the title and studio name, an 8 character date field, 9 for the product SKU, 7 for the price with explicit decimal point, and a single character Y/N field that says if the DVD is dual layer.

By contrast, the videotape record is a two-line record:

Videotape

RECORD NAME	TYPE	LENGTH
recordtype	char	1 ('V')
sku	char	9
price	double	6 (implied decimal, 2 places, zero pad)

RECORD NAME	TYPE	LENGTH
title	char	30
studio	char	30
release date	date	10 (YYYY-MM-DD)

This record starts with a line with a leading 'V' character followed by the SKU and price without a decimal point, then a second line with title, studio and release date.

Finally, the book record is a single-line record, similar to the DVD record:

Book

RECORD NAME	TYPE	LENGTH
sku	char	9
title	char	30
author	char	30
price	double	7 (explicit decimal)
release date	date	10 (YYYY-MM-DD)

Further complicating things, we want to use a common "Film" bean to store the film-related info from both the DVD and Videotape records, but store the rest in separate DVD or Videotape beans. Finally, some of the date fields are missing, and should be given a default value on import. As it turns out, this is a piece of cake for Flatworm:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
    <converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
    <converter name="decimal" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDecimal" return-type="java.lang.Double"/>
    <converter name="date" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDate" return-type="java.util.Date"/>
    <record name="dvd">
        <record-ident>
            <length-ident minlength="85" maxlength="85"/>
        </record-ident>
        <record-definition>
            <bean name="dvd" class="Dvd"/>
            <bean name="film" class="Film"/>
            <line>
                <record-element length="30" beanref="film.title" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="30" beanref="film.studio" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="8" beanref="film.releaseDate" type="date">
                    <conversion-option name="format" value="yyyyMMdd"/>
                    <conversion-option name="default-value" value="19990101"/>
                </record-element>
                <record-element length="9" beanref="dvd.sku" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="7" beanref="dvd.price" type="decimal">
                    <conversion-option name="justify" value="right"/>
                </record-element>
                <record-element length="1" beanref="dvd.dualLayer" type="char"/>
            </line>
        </record-definition>
    </record>
    <record name="videotape">
        <record-ident>
            <field-ident field-start="0" field-length="1">
                <match-string>V</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <bean name="video" class="Videotape"/>
            <bean name="film" class="Film"/>
            <line>
                <record-element start="1" end="10"  beanref="video.sku" type="char">
                    <conversion-option name="justify" value="right"/>
                    <conversion-option name="pad-character" value="0"/>
                </record-element>
                <record-element start="10" end="16" beanref="video.price" type="decimal">
                    <conversion-option name="decimal-implied" value="true"/>
                    <conversion-option name="decimal-places" value="2"/>
                    <conversion-option name="justify" value="right"/>
                    <conversion-option name="pad-character" value="0"/>
                </record-element>
            </line>
            <line>
                <record-element start="0" end="30" beanref="film.title" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element start="30" end="60" beanref="film.studio" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element start="60" end="70" beanref="film.releaseDate" type="date">
                    <conversion-option name="default-value" value="1980-01-01"/>
                </record-element>
            </line>
        </record-definition>
    </record>
    <record name="book">
        <record-definition>
            <bean name="book" class="Book"/>
            <line>
                <record-element length="9"  beanref="book.sku" type="char"/>
                <record-element length="30" beanref="book.title" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="30" beanref="book.author" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="7" beanref="book.price" type="decimal">
                    <conversion-option name="justify" value="right"/>
                </record-element>
                <record-element length="10" beanref="book.releaseDate" type="date">
                    <conversion-option name="default-value" value="1970-01-01"/>
                </record-element>
            </line>
        </record-definition>
    </record>
</file-format>

Without going back over old ground, you can see that in this example we have three different scenarios for matching records. DVDs are matched based on record length. Videotapes are matched based on a leading V character. And books, with no record matching tags, match anything that remains. Flatworm processes record definitions in the order they are defined in the file, and applies the first one that successfully matches.
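
The "first definition that matches wins" rule can be sketched in a few lines of plain Java. This only models the behaviour described above; FirstMatchSketch and its methods are hypothetical names and have nothing to do with Flatworm's internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of record identification: rules are tried in the
// order they were added and the first match decides the record type.
class FirstMatchSketch {
    // LinkedHashMap keeps insertion order, like the order in the descriptor.
    private final Map<String, Predicate<String>> rules = new LinkedHashMap<>();

    FirstMatchSketch rule(String recordName, Predicate<String> test) {
        rules.put(recordName, test);
        return this;
    }

    // Returns the name of the first rule that accepts the line, or null.
    String identify(String line) {
        for (Map.Entry<String, Predicate<String>> entry : rules.entrySet()) {
            if (entry.getValue().test(line)) {
                return entry.getKey();
            }
        }
        return null;
    }

    // The three rules from the MegaMart descriptor.
    static FirstMatchSketch megaMart() {
        return new FirstMatchSketch()
                // length-ident minlength="85" maxlength="85"
                .rule("dvd", line -> line.length() >= 85 && line.length() <= 85)
                // field-ident field-start="0" field-length="1", match-string V
                .rule("videotape", line -> line.startsWith("V", 0))
                // no record-ident: matches whatever is left
                .rule("book", line -> true);
    }
}
```

One consequence of the ordering is worth noticing: under these rules an 85 character line that happens to start with V is classified as a dvd, because the length rule is tried first.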

You can also see multiple beans being defined in a single record, and the use of the format conversion option with a date. Given this input file:

DIAL J FOR JAVA               RUN ANYWHERE STUDIO           2004011555512121   49.95Y
546234476HE KNOWS WHEN YOU"RE CODING   JAVALANG OBJECT                 13.952003-11-10
V002346542002355
WHEN A STRANGER IMPLEMENTS    NULL POINTER PRODUCTIONS      2003-03-12
546543476THE GC ALWAYS RINGS TWICE     JAVAUTIL HASHMAP                23.432004-12-19
V002435542001955
DATA AND DATATYPES            PRETENTIOUS FILMS LTD

And the following test program:

import java.io.*;
import java.util.HashMap;

import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import com.blackbear.flatworm.errors.*;

public class ComplexFlatwormExample {
    public static void main(String[] args) {
         ConfigurationReader parser = new ConfigurationReader();
         try {
             FileFormat ff = parser.loadConfigurationFile(args[0]);
             InputStream in = new FileInputStream(args[1]);
             BufferedReader bufIn = new BufferedReader(new InputStreamReader(in));

             MatchedRecord results;
             while ((results = ff.getNextRecord(bufIn)) != null) {
                 if (results.getRecordName().equals("dvd")) {
                     System.out.println(results.getBean("dvd"));
                     System.out.println(results.getBean("film"));
                 }
                 if (results.getRecordName().equals("videotape")) {
                     System.out.println(results.getBean("video"));
                     System.out.println(results.getBean("film"));
                 }
                 if (results.getRecordName().equals("book")) {
                     System.out.println(results.getBean("book"));
                 }
                 System.out.println("");
             }

         } catch (FlatwormUnsetFieldValueException flatwormUnsetFieldValueError) {
             flatwormUnsetFieldValueError.printStackTrace();
         } catch (FlatwormConfigurationValueException flatwormConfigurationValueError) {
             flatwormConfigurationValueError.printStackTrace();
         } catch (FileNotFoundException e) {
             e.printStackTrace();
         } catch (FlatwormInvalidRecordException e) {
             e.printStackTrace();
         } catch (FlatwormInputLineLengthException e) {
             e.printStackTrace();
         } catch (FlatwormConversionException e) {
             e.printStackTrace();
         }
     }

}

The following output is produced:

Dvd@3901c6[55512121, 49.95, Y]
Film@a37368[Thu Jan 15 00:00:00 EST 2004, DIAL J FOR JAVA, RUN ANYWHERE STUDIO]

Book@ae506e[Mon Nov 10 00:00:00 EST 2003, HE KNOWS WHEN YOU"RE CODING, JAVALANG OBJECT,546234476,13.95]

Videotape@ba6c83[2346542, 23.55]
Film@12a1e44[Wed Mar 12 00:00:00 EST 2003, WHEN A STRANGER IMPLEMENTS, NULL POINTER PRODUCTIONS]

Book@29428e[Sun Dec 19 00:00:00 EST 2004, THE GC ALWAYS RINGS TWICE, JAVAUTIL HASHMAP,546543476,23.43]

Videotape@161f10f[2435542, 19.55]
Film@1193779[Tue Jan 01 00:00:00 EST 1980, DATA AND DATATYPES, PRETENTIOUS FILMS LTD]

CSV files

Flatworm also supports reading and writing of CSV (comma separated values) files. The CSV mode is activated by the optional delimit attribute of the <line> tag, where the delimiter character (e.g. a comma, a semicolon, etc.) is specified. The following example shows the respective part of the XML descriptor:

<?xml version="1.0" encoding="UTF-8"?>
<file-format>

    ...
    
    <record name="header">
        <record-ident>
            <field-ident field-start="0" field-length="14">
                <match-string>foobar</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <line delimit=";">
                <record-element length="0" type="char">
                    <conversion-option name="default-value" value="field1" />
                </record-element>
                <record-element length="0" type="char">
                    <conversion-option name="default-value" value="field2" />
                </record-element>
            </line>
        </record-definition>
    </record>
</file-format>

The example also shows that the length attribute of record elements must be set to 0: in CSV files the length of each field is variable, so a fixed length is meaningless.
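
To see why a fixed length has no meaning here, consider how a delimited line falls apart into fields. The sketch below uses plain String.split and is only an illustration, not Flatworm's parsing code; DelimitedLineSketch is an invented name, and quoted fields are deliberately left out of scope.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of splitting a delimited line into fields.
class DelimitedLineSketch {
    static String[] fields(String line, String delimiter) {
        // Pattern.quote stops characters such as '|' from being treated as
        // regular expression syntax; the -1 limit keeps trailing empty fields.
        return line.split(Pattern.quote(delimiter), -1);
    }
}
```

With this helper, fields("foobar;x;;", ";") gives four fields (the last two empty), and the fields are as long or as short as the data between the delimiters.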

Encodings

By default, Flatworm uses the platform's default charset to read/write the files. You can explicitly specify the file's encoding in the respective XML descriptor with the optional encoding attribute of the file-format element. Example:

<?xml version="1.0" encoding="UTF-8"?>
<file-format encoding="iso-8859-1">

    ...
    
</file-format>

Don't confuse the encoding attributes of the XML preamble and that of the file-format element: While the encoding attribute of the preamble defines the encoding of the XML descriptor file, the encoding attribute of the file-format element defines the encoding to be used on reading/writing the flat files.
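
Why the data file's encoding matters is easy to demonstrate with nothing but java.io. The sketch below decodes the same bytes with two different charsets; it says nothing about how Flatworm applies the encoding attribute internally, and EncodingSketch and firstLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demonstration of decoding the same bytes with different charsets.
class EncodingSketch {
    // Reads the first line of the given bytes using the named charset.
    static String firstLine(byte[] data, String charsetName) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(charsetName)))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bytes 0x4A 0x4F 0x53 0xC9 read as ISO-8859-1 give the name JOSÉ; read as UTF-8 they do not, because 0xC9 on its own is not a valid UTF-8 sequence.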

Repeating Segments

A very simple example of using repeating segments could be describing the classes at a school, with the students taking each class. The class attributes could be subject, period, teacher, grade level, and students. Student attributes might be first name, last name, and grade. With a standard flat file it would be necessary to choose a maximum number of students to be supported, and to allow for this number of students in each class. If we were to allow for 30 students, then every record would require 94 fields (4 class fields plus 30 x 3 student fields), whether fixed-length or delimited. If most classes had 20-24 students, and some as few as 10 or 12, there could potentially be a lot of empty space in a file. Now consider an enterprise application where there could be tens of millions of records with a dozen repeating segments. The amount of empty space could be huge. The simple class example could be described with the following configuration:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
    <converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
    <converter name="int" class="com.blackbear.flatworm.converters.CoreConverters" method="convertInteger" return-type="java.lang.Integer"/>
    <record name="class">
        <record-ident>
            <field-ident field-start="0" field-length="2">
                <match-string>CL</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <bean name="class" class="ClassPeriod"/>
            <bean name="student" class="Student"/>
            <line delimit="|">
                <record-element length="0" type="char"/>
                <record-element length="0" beanref="class.subject" type="char"/>
                <record-element length="0" beanref="class.period" type="int"/>
                <record-element length="0" beanref="class.teacher" type="char"/>
                <record-element length="0" beanref="class.gradeLevel" type="int"/>
                <segment-element parent-beanref="class" addMethod="addStudent" beanref="student" minCount="1" maxCount="30">
                    <field-ident field-start="0" field-length="1">
                        <match-string>S</match-string>
                    </field-ident>
                    <record-element length="0"/>
                    <record-element length="0" beanref="student.firstName" type="char"/>
                    <record-element length="0" beanref="student.lastName" type="char"/>
                    <record-element length="0" beanref="student.grade" type="int"/>
                </segment-element>
            </line>
        </record-definition>
    </record>
</file-format>

There are a few things to be noted about the <segment-element> tag. Repeating segments are supported only for delimited files, and each segment must have a <field-ident> element. Since the <field-ident> element is required, it isn't necessary to enclose it in a <record-ident> element. The identifier must be the first field in the segment; if it is not a property of the Java class it can just be consumed. There must be a beanref attribute because Flatworm will construct an instance of the specified bean for each segment that it encounters. It then invokes the addMethod on the parent-beanref, which must have a Collection property for that type.

There is an additional attribute that is not shown here: cardinality-mode. This attribute defines how to handle segments that don't adhere to the specified cardinality, that is, when there are fewer than minCount or more than maxCount segments. The default mode is LOOSE, which means that an error is logged for any violation, but no other action is taken. In RESTRICTED mode, in addition to logging an error, no more than maxCount instances will be added to the parent collection. In STRICT mode, an exception is thrown for any violation, whether too few or too many. It is also possible to nest segment-elements within segment-elements.
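
The three cardinality modes can be modelled in a few lines. This is a sketch of the rules as described above and not Flatworm's implementation: CardinalitySketch and apply are invented names, System.err stands in for the logging, and IllegalStateException is a placeholder for whatever exception Flatworm actually throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the LOOSE, RESTRICTED and STRICT cardinality modes.
class CardinalitySketch {
    enum Mode { LOOSE, RESTRICTED, STRICT }

    // Returns the segments that would end up in the parent collection.
    static <T> List<T> apply(Mode mode, int minCount, int maxCount, List<T> segments) {
        int found = segments.size();
        boolean violation = found < minCount || found > maxCount;
        if (!violation) {
            return new ArrayList<T>(segments);
        }
        String message = "expected " + minCount + ".." + maxCount
                + " segments but found " + found;
        if (mode == Mode.STRICT) {
            // Placeholder exception type; any violation aborts.
            throw new IllegalStateException(message);
        }
        // LOOSE and RESTRICTED both report the violation.
        System.err.println(message);
        if (mode == Mode.RESTRICTED && found > maxCount) {
            // RESTRICTED additionally caps the collection at maxCount.
            return new ArrayList<T>(segments.subList(0, maxCount));
        }
        return new ArrayList<T>(segments);
    }
}
```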

This configuration would allow us to read a file such as this (the class size here is much smaller!):

CL|English|2|Ms Buffington|4|S|Bill|Smith|78|S|Peter|Jackson|91|S|Mary|Hardmann|87|S|Susan|Benet|88
CL|Arithmetic|2|Mr Hermann|3|S|Harry|Mirtle|93|S|Helen|Peters|87|S|Johnny|Jones|76
CL|Science|4|Mrs Darbie|4|S|Peter|Jackson|86|S|Susan|Benet|85|S|Kelley|Laver|87|S|Bobbie|Jones|73|S|Pauline|Sturgis|84

For this to work, the ClassPeriod class would look similar to this:

public class ClassPeriod
{
   private String subject;
   private int period;
   private String teacher;
   private int gradeLevel;
   private List<Student> students = new ArrayList<Student>();

   // The usual getters/setters required for JavaBeans
   ....

   public void addStudent(Student _student)
   {
      students.add(_student);
   }
}
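To make the mapping from a data line to beans concrete, here is a hand-written sketch of roughly what happens for one line of the sample file. It does not use Flatworm at all. StudentSketch and its field names (firstName, lastName, grade), ClassPeriodSketch, and the assumed field order (subject, period, teacher, gradeLevel) are illustrative assumptions, not taken from the configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean for one "S" segment; the field names are assumptions.
class StudentSketch
{
   final String firstName;
   final String lastName;
   final int grade;

   StudentSketch(String firstName, String lastName, int grade)
   {
      this.firstName = firstName;
      this.lastName = lastName;
      this.grade = grade;
   }
}

// Cut-down stand-in for the ClassPeriod bean shown above.
class ClassPeriodSketch
{
   String subject;
   int period;
   String teacher;
   int gradeLevel;
   final List<StudentSketch> students = new ArrayList<StudentSketch>();

   void addStudent(StudentSketch student)
   {
      students.add(student);
   }
}

class SegmentWalkthrough
{
   static ClassPeriodSketch parse(String line)
   {
      String[] f = line.split("\\|");
      ClassPeriodSketch period = new ClassPeriodSketch();
      // f[0] is the "CL" record identifier and is simply consumed.
      period.subject = f[1];
      period.period = Integer.parseInt(f[2]);
      period.teacher = f[3];
      period.gradeLevel = Integer.parseInt(f[4]);

      // Each repeating segment starts with its "S" identifier, followed by
      // three fields. One bean is built per segment and handed to the
      // parent through its add method.
      int i = 5;
      while (i + 3 < f.length && "S".equals(f[i]))
      {
         period.addStudent(
            new StudentSketch(f[i + 1], f[i + 2], Integer.parseInt(f[i + 3])));
         i += 4;
      }
      return period;
   }
}
```

Passing the first line of the sample file to SegmentWalkthrough.parse yields one ClassPeriodSketch whose students list holds four entries, one per S segment.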

Attributes for segment-element tag

ATTRIBUTE NAME	VALUE	REQUIRED	DEFAULT
name	A name for the segment	no	none
minCount	The least number of this segment allowed per record	yes	none
maxCount	The greatest number of this segment allowed per record	yes	none
parent-beanref	The name of the beanref for the parent that contains the segment	yes	none
addMethod	The name of the method in the parent-beanref for adding an instance of this beanref	no	"add" + the name of the segment if specified or the beanref if the name is not specified
beanref	The name of the bean to hold the fields for this segment type	no if the name is specified; yes if no name is specified	The name of the segment if specified (there must be a bean with the same name declared)
cardinality-mode	Specifies how to handle violations of the specified cardinality	no	loose - violations are logged but otherwise ignored

Simplifying Usage With FileParser

The examples presented earlier (SimpleFlatwormExample and ComplexFlatwormExample) required you to load the configuration file yourself, and then create an InputStream which is used, along with the FileFormat returned from parsing the configuration file, to retrieve the records from the data file. This works very well, and gives you a good bit of control over the process. But if you would prefer something a little simpler, you can use the FileParser (available since version 1.2).

To use the FileParser, you create an instance with the names of the configuration file and the data file. For each record type that you have defined, call the setBeanHandler(String, Object, String) method, passing the name of the record type, the object to be invoked on the callback, and the name of the callback method. Also set an exception handler by calling the setExceptionHandler(Object, String) method, passing the object to be invoked and the name of the method to call in case of an exception. Then call the open() method (which can throw FileNotFoundException or UnsupportedEncodingException), read the file by calling the read() method, and call the close() method when it is finished. As each record is read, the callback for that record type is invoked with the MatchedRecord that was generated. From the matched record you can extract the beans that were created for that record.

Let's see what the SimpleFlatwormExample might look like when rewritten to use the FileParser.

public class SimpleFlatwormExample
{
   public static void main(String[] args)
   {
      new SimpleFlatwormExample().run(args[0], args[1]);
   }

   public void run(String configurationFile, String dataFile)
   {
      FileParser parser = new FileParser(configurationFile, dataFile);
      parser.setBeanHandler("newhire", this, "processNewHire");
      parser.setExceptionHandler(this, "processException");
      try
      {
         parser.open();
      }
      catch(Exception e)
      {
         // handle the exception
      }
      parser.read();
      try
      {
         parser.close();
      }
      catch(IOException ignore)
      {}
   }

   public void processNewHire(MatchedRecord record)
   {
      Employee emp = (Employee)record.getBean("employee");
      System.out.println(emp);
   }

   public void processException(String exception, String inputLine)
   {
      // Do whatever processing you wish for the exception
   }
}

With version 2.0 a new object-based callback mechanism has been introduced. Passing an instance of a class that implements RecordCallback to the FileParser's addRecordCallback(String, RecordCallback) method, for each record type to be processed, avoids the reflective method invocation of the previous callback mechanism. The previous mechanism is still supported, but will likely be removed in a future version. To set an exception callback with the new mechanism, pass an instance of a class that implements ExceptionCallback to the FileParser's setExceptionCallback(ExceptionCallback) method. With either callback mechanism only one exception callback is supported, and if none is set and an exception occurs, a RuntimeException will now be thrown. If you are not interested in processing a particular record type that is included in the configuration, you don't need to add a callback for that type.
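The difference between the two callback styles can be sketched without the library. Everything below is a hypothetical stand-in written for illustration: RecordCallbackSketch, ExceptionCallbackSketch, their method names and parameter types, and CallbackDispatchSketch are assumptions and should not be read as Flatworm's actual interfaces. The sketch shows a registry keyed by record name, the single exception callback with its RuntimeException fallback, and, for contrast, a handler looked up by name through reflection.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the callback interfaces. The method names
// and parameter types are assumptions made for illustration only.
interface RecordCallbackSketch
{
   void processRecord(Object record);
}

interface ExceptionCallbackSketch
{
   void processException(Exception ex, String inputLine);
}

class CallbackDispatchSketch
{
   private final Map<String, RecordCallbackSketch> callbacks =
      new HashMap<String, RecordCallbackSketch>();
   private ExceptionCallbackSketch exceptionCallback;

   void addRecordCallback(String recordName, RecordCallbackSketch callback)
   {
      callbacks.put(recordName, callback);
   }

   // Only one exception callback is kept; a later call replaces the earlier.
   void setExceptionCallback(ExceptionCallbackSketch callback)
   {
      exceptionCallback = callback;
   }

   // Object-based dispatch: a plain interface call, checked by the compiler.
   // Returns false when no callback is registered for the record type,
   // in which case the record is simply skipped.
   boolean dispatch(String recordName, Object record)
   {
      RecordCallbackSketch callback = callbacks.get(recordName);
      if (callback == null)
         return false;
      callback.processRecord(record);
      return true;
   }

   // With no exception callback set, the error surfaces as a RuntimeException.
   void reportError(Exception ex, String inputLine)
   {
      if (exceptionCallback == null)
         throw new RuntimeException(ex);
      exceptionCallback.processException(ex, inputLine);
   }

   // Reflective dispatch, as needed for string-named handlers: the method
   // is looked up at run time, so a misspelled name fails only when invoked.
   static void invokeByName(Object handler, String methodName, Object record)
      throws Exception
   {
      Method method = handler.getClass().getMethod(methodName, Object.class);
      method.invoke(handler, record);
   }
}
```

One practical consequence of the object-based style is that a mistyped handler name becomes a compile error, whereas invokeByName with a name that does not exist on the handler fails at run time with NoSuchMethodException.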

Code Examples

Examples of Flatworm usage (including the examples presented in this guide) can be found in the Examples package and in the Test package. Many of the unit tests are more properly integration tests, since they exercise the library at a high level rather than class by class. They therefore provide good examples of how the library can be used.
