A Field Guide to Parsing and Creating Flat Files
Using the Humble Flatworm
For Version 2.0
Last Revised December, 2009
Flat files. Much as we live in an XML/SOAP/Web Services world,
there's still a ton of data being moved around between proprietary and
legacy applications that consists of fixed length fields delimited by
EOLs. Around the time I wrote my 20th Java class whose only
purpose in life was to suck up a flat file, use String.substring
to break it up into pieces, and then populate a bean with it, I decided
there had to be a better way. This package represents the fruit of
that frustration.
What is Flatworm?
Flatworm is a Java library intended to allow a developer to describe the
format of a flat file using an XML definition file, and then to be able
to automatically read lines from that file, and have one or more beans
be instantiated for each logical record.
There are a few powerful features in Flatworm worth mentioning.
For one thing, a record may consist of one or more physical lines in the
file. A record may contain more than one bean once decoded.
A flat file may contain more than one type of record, and Flatworm can
use line length and substring matching to determine which type of record
a line begins.
Besides fielded buffer flat files, Flatworm also supports text files where
the different fields are separated by a separator character, e.g. CSV (comma separated values) files.
Flatworm, as of version 2.0, also supports delimited files that contain segments that may repeat. These
are different than standard flat files that have a well defined number of fields for each record. With
repeating segments it is possible to have a varying number of the segment in each record, so that different
records in the file could have a different number of fields. Repeating segments are supported only for reading
delimited files.
Last but not least, Flatworm is able to produce flat files from beans
and the same definition file.
Requirements
In addition to the flatworm jar file, you will also need to have the
following jars in your classpath in order for Flatworm to thrive:
- commons-beanutils (from Apache Commons)
- commons-collections (from Apache Commons)
- commons-logging (from Apache Commons)
- commons-lang (from Apache Commons)
- log4j (www.log4j.org)(optional)
Recent versions of all of these packages are available in the source jar
file.
Downloading
The latest version of Flatworm is Release 2.0. You can download it
from SourceForge.
A Simple Example
Before diving into the complexities of Flatworm, let's look at a simple
example that illustrates the basic operation. Imagine the
following input file which contains new hire data for a company:
NHJAMES          TURNER         M123-45-67890004224345
NHJOHN           JONES          M987-65-43210104356745
The layout of the file is as follows:
RECORD NAME | TYPE | LENGTH
recordtype | char | 2
firstname | char | 15
lastname | char | 15
gender | char | 1
ssn | char | 11
salary | double | 10 (2 decimal places)
We want to suck this file into a Java bean called Employee that has
properties firstName, lastName, ssn, gender and salary. These are
available via the standard JavaBean mechanisms.
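For reference, a bean of this shape might look like the sketch below. The property names come from the description above; the field types (String for the character fields, Double for the salary) and the toString() format are assumptions made for illustration.

```java
// Sketch of the Employee bean. Property names come from the text above;
// the field types and the toString() format are assumptions.
// Shown without the public modifier so the snippet compiles in any file;
// for real use, declare it public in its own Employee.java.
class Employee {
    private String firstName;
    private String lastName;
    private String ssn;
    private String gender;
    private Double salary;

    // A zero-argument constructor lets the bean be instantiated reflectively.
    public Employee() {
    }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getSsn() { return ssn; }
    public void setSsn(String ssn) { this.ssn = ssn; }

    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    public Double getSalary() { return salary; }
    public void setSalary(Double salary) { this.salary = salary; }

    @Override
    public String toString() {
        return "Employee[" + lastName + ", " + firstName + ", " + ssn + ", "
                + gender + ", " + salary + "]";
    }
}
```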
To do this, we start by writing the Flatworm XML descriptor for the
file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
<converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
<converter name="decimal" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDecimal" return-type="java.lang.Double"/>
<record name="newhire">
<record-ident>
<field-ident field-start="0" field-length="2">
<match-string>NH</match-string>
</field-ident>
</record-ident>
<record-definition>
<bean name="employee" class="Employee"/>
<line>
<record-element length="2"/>
<record-element length="15" beanref="employee.firstName" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="15" beanref="employee.lastName" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="1" beanref="employee.gender" type="char"/>
<record-element length="11" beanref="employee.ssn" type="char">
<conversion-option name="strip-chars" value="non-numeric"/>
</record-element>
<record-element length="10" beanref="employee.salary" type="decimal">
<conversion-option name="decimal-places" value="2"/>
<conversion-option name="decimal-implied" value="true"/>
<conversion-option name="pad-character" value="0"/>
<conversion-option name="justify" value="right"/>
</record-element>
</line>
</record-definition>
</record>
</file-format>
The file-format tag is required, and specifies the beginning of the
actual description. The first thing that we must do is to register
converters for the datatypes used in the file. There are a number
of predefined converter methods in the provided class
com.blackbear.flatworm.converters.CoreConverters:
- convertChar - Simply returns the field specified, optionally
stripping leading or trailing (or both) padding characters, and
removing unwanted characters.
- convertDecimal - As above, but converts the value to a Double.
The decimal place may be implied by position, or explicit.
- convertDate - Parses the date using the default (MM-dd-yyyy)
or a user defined format.
- convertInteger - Parses to an Integer.
- convertLong - Parses to a Long.
- convertBigDecimal - Parses to a BigDecimal.
In order to be used in record definitions, a converter must always be
registered first. Next in the file, a record is defined. A
file may contain several different types of records; the record-ident
tag is used to specify which record definition is appropriate for a given
line. There are two different ways to identify a record: by a
substring match on a specific section of the line, or by the overall
length of the line. Later, you will see how multiple record types
can be read from the same file; for the moment only one is defined,
which matches on the characters NH (new hire) in the first two positions
of the line (field-start 0, field-length 2). If no record-ident is
defined, all records will match.
Once we're sure that we are dealing with the correct record type, we can
define the record. We start by defining the beans that will be
returned. Each bean has a name which is used to reference it
inside the definition, and a class (fully qualified) with which to
create objects. The class specified must have a valid
zero-argument constructor.
Finally the record is broken down line by line (since a record is
allowed to span multiple lines). Record-elements (fields) may be defined
in terms of:
- a length alone, in which case they are considered to
span from the end of the previous field to that position plus the
specified length
- a start position and a length, in which case they span from
the start position to that position plus the length
- a start and end position, in which case they span from the
start to end position (not inclusive of the end)
- an end position alone, in which case they span from the last
end position to the specified end position (not inclusive of the end)
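The four cases above all reduce to a half-open [start, end) range on the line. The sketch below shows that arithmetic in plain Java; it is an illustration of the rules as described here, not Flatworm's internal code, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: resolves the four ways of describing a field into
// [start, end) offsets. The names here are hypothetical, not Flatworm API.
final class FieldSpans {
    // Pass null for an attribute that is not present on the record-element.
    static int[] resolve(Integer start, Integer end, Integer length, int previousEnd) {
        // Without an explicit start, the field begins where the previous one ended.
        int s = (start != null) ? start : previousEnd;
        int e;
        if (end != null) {
            e = end;            // end position is exclusive
        } else if (length != null) {
            e = s + length;
        } else {
            throw new IllegalArgumentException("a length or an end position is required");
        }
        return new int[] { s, e };
    }

    // Cuts a line into fields that are described by lengths alone.
    static List<String> cutByLengths(String line, int... lengths) {
        List<String> fields = new ArrayList<String>();
        int previousEnd = 0;
        for (int length : lengths) {
            int[] span = resolve(null, null, length, previousEnd);
            fields.add(line.substring(span[0], span[1]));
            previousEnd = span[1];
        }
        return fields;
    }
}
```

For the new hire layout, `FieldSpans.cutByLengths(line, 2, 15, 15, 1, 11, 10)` yields the six raw fields, still padded, in file order.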
Each record element also defines the beanref (according to the standard
used in the Apache Commons BeanUtils package), and the type (which should
match one of the converter names registered at the top of the file). Record
elements may also have conversion-options, which are specific to the
converter specified. For example, in the above example, the
lastName field should have any trailing spaces removed, the social
security number should be stripped of all non-numeric characters, and the
salary has two implied decimal places and may be left-padded with zeros,
which should be removed.
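To make the effect of these options concrete, here is a plain Java sketch of the same transformations. It only mirrors the behaviour described in this guide; the helper names are invented, and the real work is done inside Flatworm's converters.

```java
// Illustration only: plain Java equivalents of the conversion options used
// in the new hire descriptor. The method names are hypothetical.
final class ConversionOptionsDemo {
    // justify=left: the value sits on the left, so padding is removed from the right.
    static String stripTrailing(String s, char pad) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == pad) {
            end--;
        }
        return s.substring(0, end);
    }

    // justify=right with a pad-character: padding is removed from the left.
    static String stripLeading(String s, char pad) {
        int start = 0;
        while (start < s.length() && s.charAt(start) == pad) {
            start++;
        }
        return s.substring(start);
    }

    // strip-chars=non-numeric: keep digits only.
    static String stripNonNumeric(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // decimal-implied=true with decimal-places=N: shift the decimal point N places.
    static double impliedDecimal(String digits, int places) {
        if (digits.length() == 0) {
            return 0.0; // an all-padding field is treated as zero
        }
        return Double.parseDouble(digits) / Math.pow(10, places);
    }
}
```

Applied to the first sample line, the ssn field "123-45-6789" becomes "123456789" and the salary field "0004224345" becomes 42243.45.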
Now we're ready to fire it all up. Here's a simple Java class that
parses the input file and prints out the beans produced:
import java.io.*;
import java.util.HashMap;
import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import com.blackbear.flatworm.errors.*;
public class SimpleFlatwormExample {
public static void main(String[] args) {
ConfigurationReader parser = new ConfigurationReader();
try {
FileFormat ff = parser.loadConfigurationFile(args[0]);
InputStream in = new FileInputStream(args[1]);
BufferedReader bufIn = new BufferedReader(new InputStreamReader(in));
MatchedRecord results;
while ((results = ff.getNextRecord(bufIn)) != null) {
if (results.getRecordName().equals("newhire")) {
System.out.println(results.getBean("employee"));
}
}
} catch (FlatwormUnsetFieldValueException flatwormUnsetFieldValueError) {
flatwormUnsetFieldValueError.printStackTrace();
} catch (FlatwormConfigurationValueException flatwormConfigurationValueError) {
flatwormConfigurationValueError.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (FlatwormInvalidRecordException e) {
e.printStackTrace();
} catch (FlatwormInputLineLengthException e) {
e.printStackTrace();
} catch (FlatwormConversionException e) {
e.printStackTrace();
}
}
}
The location of the configuration file is passed in as the first
argument to the method, and the file to be parsed as the second. A
ConfigurationReader object is created, and the loadConfigurationFile
method is called with the path to the file as the argument. A
FileFormat is returned. After opening the input file and morphing
it into a BufferedReader, the BufferedReader is passed in to the
getNextRecord method of the FileFormat. getNextRecord either
returns null if the input file has been exhausted, or a MatchedRecord
object. The getRecordName method lets us know which type of record
is being returned (remembering again that a file can have several types
of records), and we can access specific beans with the getBean method.
When we run this test program, the results are as expected:
C:/j2sdk1.4.2_04\bin\java SimpleFlatwormExample simple-example.xml import1.txt
Employee@120a47e[TURNER, JAMES, 123456789, M, 42243.45]
Employee@f73c1[JONES, JOHN, 987654321, M, 1043567.45]
Process terminated with exit code 0
Defining Your Own Converters
If you want to define a new converter to use in your application,
it's quite simple. For each type to be converted, a converter has to
offer two methods:
- A method to convert a string read from the file to the target type (parsing). The signature
of such a method looks like this (where T is the type to be parsed):
public T convertT(String str, HashMap options) throws FlatwormConversionException;
- A method to convert a value of the target type into a string representation (generation). The
signature of such a method looks like this (where T is the type to be written):
public String convertT(Object obj, HashMap options)
To be a bit more specific, let's look at the definition
of convertDecimal from the CoreConverters file - first the parsing method:
public Double convertDecimal(String str, HashMap options) throws FlatwormConversionException
{
try
{
int decimalPlaces = 0;
ConversionOption conv = (ConversionOption) options.get("decimal-places");
String decimalPlacesOption = null;
if (null != conv)
decimalPlacesOption = conv.getValue();
boolean decimalImplied = "true".equals(Util.getValue(options, "decimal-implied"));
if (decimalPlacesOption != null)
decimalPlaces = Integer.parseInt(decimalPlacesOption);
if (str.length() == 0)
return new Double(0.0D);
if (decimalImplied)
return new Double(Double.parseDouble(str) / Math.pow(10D, decimalPlaces));
else
return Double.valueOf(str);
} catch (NumberFormatException ex)
{
cat.error(ex);
throw new FlatwormConversionException(str);
}
}
All parsing converter methods must accept exactly two arguments, a String and a
HashMap. The String contains the substring text from the input
line. The HashMap contains the key/value pairs from the
conversion-options tags. It's a good policy to call removePadding
first, since it automatically handles removing any left or right padding
as specified by the options, strips out unwanted characters, and
returns a default value if the string is empty. Converters should
return an object as opposed to a primitive, since the value must
eventually end up in a HashMap. Finally, if any errors are
encountered during processing, you should throw a
FlatwormConversionException with some useful diagnostic information.
Now let's take a look at the definition
of the CoreConverters method for writing a decimal:
public String convertDecimal(Object obj, HashMap options)
{
Double d = (Double) obj;
if (d == null)
{
return null;
}
int decimalPlaces = 0;
ConversionOption conv = (ConversionOption) options.get("decimal-places");
String decimalPlacesOption = null;
if (null != conv)
decimalPlacesOption = conv.getValue();
boolean decimalImplied = "true".equals(Util.getValue(options, "decimal-implied"));
if (decimalPlacesOption != null)
decimalPlaces = Integer.parseInt(decimalPlacesOption);
DecimalFormat format = new DecimalFormat();
format.setDecimalSeparatorAlwaysShown(!decimalImplied);
format.setGroupingUsed(false);
if (decimalImplied)
{
format.setMaximumFractionDigits(0);
d = new Double(d.doubleValue() * Math.pow(10D, decimalPlaces));
} else
{
format.setMinimumFractionDigits(decimalPlaces);
format.setMaximumFractionDigits(decimalPlaces);
}
return format.format(d);
}
The generating converter methods are subject to a similar restriction as the parsing methods:
they also take exactly two arguments, but the first parameter must be of type Object rather than
the actual attribute type. This keeps Flatworm compatible with Java versions below 5.0.
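Putting both directions together, a custom converter for a Y/N flag field could be sketched as follows. Everything named here is hypothetical: the class and method names are made up, the options map is ignored, and IllegalArgumentException stands in for FlatwormConversionException so that the sketch compiles without the library on the classpath. A real converter would throw FlatwormConversionException and would still need to be registered with a converter tag in the descriptor.

```java
import java.util.HashMap;

// Hypothetical converter pair for a Y/N flag. Assumptions: the names are
// invented, the options are ignored, and IllegalArgumentException stands in
// for Flatworm's FlatwormConversionException.
// Declare the class public in its own file when using it for real.
class YesNoConverter {
    // Parsing direction: text cut from the input line -> Boolean.
    public Boolean convertYesNo(String str, HashMap options) {
        String s = str.trim();
        if ("Y".equalsIgnoreCase(s)) {
            return Boolean.TRUE;
        }
        if ("N".equalsIgnoreCase(s)) {
            return Boolean.FALSE;
        }
        // Include the offending text as diagnostic information.
        throw new IllegalArgumentException("Expected Y or N but found '" + str + "'");
    }

    // Generating direction: the first parameter is Object, not Boolean.
    public String convertYesNo(Object obj, HashMap options) {
        if (obj == null) {
            return null;
        }
        return ((Boolean) obj).booleanValue() ? "Y" : "N";
    }
}
```

The two methods share a name and differ only in the type of the first parameter, the same pattern as the two convertDecimal methods shown above.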
Record Matching
As promised, let's look at a more complex example now. This
example combines multiple beans in a single record, and multiple record
types in a single file: Let's assume we're in the IT department at
MegaMart, and we're importing a mixed flat file containing books,
videotapes and DVDs. Unfortunately, the three product types have
three different formats.
DVD
RECORD NAME | TYPE | LENGTH
title | char | 30
studio name | char | 30
release date | date | 8 (YYYYMMDD)
sku | char | 9
price | double | 7
dual layer | char | 1
The DVD record is a single-line 85 character record, 30 characters each
for the title and studio name, an 8 character date field, 9 for the
product SKU, 7 for the price with explicit decimal point, and a single
character Y/N field that says if the DVD is dual layer.
By contrast, the videotape record is a two-line record:
Videotape
RECORD NAME | TYPE | LENGTH
recordtype | char | 1 ('V')
sku | char | 9
price | double | 6 (implied decimal, 2 places, zero pad)

RECORD NAME | TYPE | LENGTH
title | char | 30
studio | char | 30
release date | char | 10 (YYYY-MM-DD)
This record starts with a line with a leading 'V' character followed by
the SKU and price without a decimal point, then a second line with
title, studio and release date.
Finally, the book record is a single-line record, similar to the DVD
record:
Book
RECORD NAME | TYPE | LENGTH
sku | char | 9
title | char | 30
author | char | 30
price | double | 7 (explicit decimal)
release date | date | 10 (YYYY-MM-DD)
Further complicating things, we want to use a common "Film" bean to store
the film-related info from both the DVD and Videotape records, but store
the rest in separate DVD or Videotape beans. Finally, some of the
date fields are missing, and should be given a default value on import.
As it turns out, this is a piece of cake for Flatworm:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
<converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
<converter name="decimal" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDecimal" return-type="java.lang.Double"/>
<converter name="date" class="com.blackbear.flatworm.converters.CoreConverters" method="convertDate" return-type="java.util.Date"/>
<record name="dvd">
<record-ident>
<length-ident minlength="85" maxlength="85"/>
</record-ident>
<record-definition>
<bean name="dvd" class="Dvd"/>
<bean name="film" class="Film"/>
<line>
<record-element length="30" beanref="film.title" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="30" beanref="film.studio" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="8" beanref="film.releaseDate" type="date">
<conversion-option name="format" value="yyyyMMdd"/>
<conversion-option name="default-value" value="19990101"/>
</record-element>
<record-element length="9" beanref="dvd.sku" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="7" beanref="dvd.price" type="decimal">
<conversion-option name="justify" value="right"/>
</record-element>
<record-element length="1" beanref="dvd.dualLayer" type="char"/>
</line>
</record-definition>
</record>
<record name="videotape">
<record-ident>
<field-ident field-start="0" field-length="1">
<match-string>V</match-string>
</field-ident>
</record-ident>
<record-definition>
<bean name="video" class="Videotape"/>
<bean name="film" class="Film"/>
<line>
<record-element start="1" end="10" beanref="video.sku" type="char">
<conversion-option name="justify" value="right"/>
<conversion-option name="pad-character" value="0"/>
</record-element>
<record-element start="10" end="16" beanref="video.price" type="decimal">
<conversion-option name="decimal-implied" value="true"/>
<conversion-option name="decimal-places" value="2"/>
<conversion-option name="justify" value="right"/>
<conversion-option name="pad-character" value="0"/>
</record-element>
</line>
<line>
<record-element start="0" end="30" beanref="film.title" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element start="30" end="60" beanref="film.studio" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element start="60" end="70" beanref="film.releaseDate" type="date">
<conversion-option name="default-value" value="1980-01-01"/>
</record-element>
</line>
</record-definition>
</record>
<record name="book">
<record-definition>
<bean name="book" class="Book"/>
<line>
<record-element length="9" beanref="book.sku" type="char"/>
<record-element length="30" beanref="book.title" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="30" beanref="book.author" type="char">
<conversion-option name="justify" value="left"/>
</record-element>
<record-element length="7" beanref="book.price" type="decimal">
<conversion-option name="justify" value="right"/>
</record-element>
<record-element length="10" beanref="book.releaseDate" type="date">
<conversion-option name="default-value" value="1970-01-01"/>
</record-element>
</line>
</record-definition>
</record>
</file-format>
Without going over old ground, you can see that in this example we have
three different scenarios for matching records. DVDs are matched
based on record length. Videotapes are matched based on a leading V
character. And books, with no record matching tags, match anything
that remains. Flatworm processes record definitions in the order
they are defined in the file, and applies the first one that successfully
matches.
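The first-match-wins rule can be pictured as a simple chain of checks. The sketch below hard-codes the three rules from the descriptor above in plain Java; it illustrates the ordering only and is not how Flatworm is implemented.

```java
// Illustration only: first-match-wins dispatch mirroring the three
// record-ident styles in the descriptor above. Names are hypothetical.
final class RecordMatcherDemo {
    static String match(String line) {
        // <length-ident minlength="85" maxlength="85"/>
        if (line.length() == 85) {
            return "dvd";
        }
        // <field-ident field-start="0" field-length="1"> with match-string V
        if (line.startsWith("V")) {
            return "videotape";
        }
        // No record-ident at all: matches whatever is left.
        return "book";
    }
}
```

The order matters: if the book record were defined first, it would match every line, and the DVD and videotape definitions would never be reached.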
You can also see multiple beans being defined in a single record, and
the use of the format conversion option with a date. Given this input
file:
DIAL J FOR JAVA               RUN ANYWHERE STUDIO           2004011555512121   49.95Y
546234476HE KNOWS WHEN YOU"RE CODING   JAVALANG OBJECT                 13.952003-11-10
V002346542002355
WHEN A STRANGER IMPLEMENTS    NULL POINTER PRODUCTIONS      2003-03-12
546543476THE GC ALWAYS RINGS TWICE     JAVAUTIL HASHMAP                23.432004-12-19
V002435542001955
DATA AND DATATYPES            PRETENTIOUS FILMS LTD
And the following test program:
import java.io.*;
import java.util.HashMap;
import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import com.blackbear.flatworm.errors.*;
public class ComplexFlatwormExample {
public static void main(String[] args) {
ConfigurationReader parser = new ConfigurationReader();
try {
FileFormat ff = parser.loadConfigurationFile(args[0]);
InputStream in = new FileInputStream(args[1]);
BufferedReader bufIn = new BufferedReader(new InputStreamReader(in));
MatchedRecord results;
while ((results = ff.getNextRecord(bufIn)) != null) {
if (results.getRecordName().equals("dvd")) {
System.out.println(results.getBean("dvd"));
System.out.println(results.getBean("film"));
}
if (results.getRecordName().equals("videotape")) {
System.out.println(results.getBean("video"));
System.out.println(results.getBean("film"));
}
if (results.getRecordName().equals("book")) {
System.out.println(results.getBean("book"));
}
System.out.println("");
}
} catch (FlatwormUnsetFieldValueException flatwormUnsetFieldValueError) {
flatwormUnsetFieldValueError.printStackTrace();
} catch (FlatwormConfigurationValueException flatwormConfigurationValueError) {
flatwormConfigurationValueError.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (FlatwormInvalidRecordException e) {
e.printStackTrace();
} catch (FlatwormInputLineLengthException e) {
e.printStackTrace();
} catch (FlatwormConversionException e) {
e.printStackTrace();
}
}
}
The following output is produced:
Dvd@3901c6[55512121, 49.95, Y]
Film@a37368[Thu Jan 15 00:00:00 EST 2004, DIAL J FOR JAVA, RUN ANYWHERE STUDIO]
Book@ae506e[Mon Nov 10 00:00:00 EST 2003, HE KNOWS WHEN YOU"RE CODING, JAVALANG OBJECT,546234476,13.95]
Videotape@ba6c83[2346542, 23.55]
Film@12a1e44[Wed Mar 12 00:00:00 EST 2003, WHEN A STRANGER IMPLEMENTS, NULL POINTER PRODUCTIONS]
Book@29428e[Sun Dec 19 00:00:00 EST 2004, THE GC ALWAYS RINGS TWICE, JAVAUTIL HASHMAP,546543476,23.43]
Videotape@161f10f[2435542, 19.55]
Film@1193779[Tue Jan 01 00:00:00 EST 1980, DATA AND DATATYPES, PRETENTIOUS FILMS LTD]
CSV files
Flatworm also supports reading and writing of CSV (comma separated values) files.
The CSV mode is activated by the optional delimit attribute of the <line> tag,
where the delimiter character (e.g. a comma, a semicolon, etc.) is specified. The following
example shows the respective part of the XML descriptor:
<?xml version="1.0" encoding="UTF-8"?>
<file-format>
...
<record name="header">
<record-ident>
<field-ident field-start="0" field-length="14">
<match-string>foobar</match-string>
</field-ident>
</record-ident>
<record-definition>
<line delimit=";">
<record-element length="0" type="char">
<conversion-option name="default-value" value="field1" />
</record-element>
<record-element length="0" type="char">
<conversion-option name="default-value" value="field2" />
</record-element>
</line>
</record-definition>
</record>
</file-format>
The example also shows that the length attribute of record elements must be set to 0, since
in CSV files the length of each field is variable and therefore meaningless.
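Conceptually, a delimited line is cut at each occurrence of the delimiter instead of at fixed offsets. The plain Java sketch below shows that idea only; the names are made up, and it does not handle quoted fields or escaped delimiters, which this guide does not describe.

```java
import java.util.regex.Pattern;

// Illustration only: cutting a line at a delimiter character.
// Quoting and escaping are deliberately not handled.
final class DelimitedLineDemo {
    static String[] split(String line, char delimiter) {
        // Pattern.quote keeps characters such as '|' from being read as regex
        // syntax; the -1 limit keeps trailing empty fields.
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }
}
```

For example, `DelimitedLineDemo.split("foobar;field1;;", ';')` returns four fields, the last two of which are empty.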
Encodings
By default, Flatworm uses the platform's default charset to read/write the files.
You can explicitly specify the file's encoding in the respective XML descriptor
by the optional encoding attribute of the file-format element. Example:
<?xml version="1.0" encoding="UTF-8"?>
<file-format encoding="iso-8859-1">
...
</file-format>
Don't confuse the encoding attributes of the XML preamble and that of the file-format element:
While the encoding attribute of the preamble defines the encoding of the XML descriptor file, the encoding
attribute of the file-format element defines the encoding to be used on reading/writing the flat files.
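Why the flat file's encoding matters can be seen with a few bytes of plain Java. The sketch below is independent of Flatworm (the class name is made up); it decodes the same bytes with two different charsets.

```java
import java.nio.charset.Charset;

// Illustration only: the same bytes read with different charsets.
final class EncodingDemo {
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // The name JOSE with an accented final E, as stored in an ISO-8859-1 file.
    static byte[] sampleBytes() {
        return new byte[] { 0x4A, 0x4F, 0x53, (byte) 0xC9 };
    }
}
```

Decoding `sampleBytes()` as ISO-8859-1 gives the four characters that were written, while decoding them as UTF-8 does not, because the byte 0xC9 on its own is not a complete UTF-8 sequence.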
Repeating Segments
A very simple example of using repeating segments could be describing the classes at a school, with the students taking
each class. The class attributes could be subject, period, teacher, grade level, and students. Student attributes
might be first name, last name, grade. With a standard flat file it would be necessary to choose a number of students to be
supported, and to allow for this number of students in each class. If we were to allow for 30 students, then every record would
require 94 fields (whether fixed-length or delimited). If most classes had 20-24 students, and some as few as 10 or 12, there
could potentially be a lot of empty space in a file. Now consider an enterprise application where there could be tens of millions
of records with a dozen repeating segments. The amount of empty space could be huge.
The simple class example could be described with the following configuration:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
<converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
<converter name="int" class="com.blackbear.flatworm.converters.CoreConverters" method="convertInteger" return-type="java.lang.Integer"/>
<record name="class">
<record-ident>
<field-ident field-start="0" field-length="2">
<match-string>CL</match-string>
</field-ident>
</record-ident>
<record-definition>
<bean name="class" class="ClassPeriod"/>
<bean name="student" class="Student"/>
<line delimit="|">
<record-element length="0" type="char"/>
<record-element length="0" beanref="class.subject" type="char"/>
<record-element length="0" beanref="class.period" type="int"/>
<record-element length="0" beanref="class.teacher" type="char"/>
<record-element length="0" beanref="class.gradeLevel" type="int"/>
<segment-element parent-beanref="class" addMethod="addStudent" beanref="student" minCount="1" maxCount="30">
<field-ident field-start="0" field-length="1">
<match-string>S</match-string>
</field-ident>
<record-element length="0"/>
<record-element length="0" beanref="student.firstName" type="char"/>
<record-element length="0" beanref="student.lastName" type="char"/>
<record-element length="0" beanref="student.grade" type="int"/>
</segment-element>
</line>
</record-definition>
</record>
</file-format>
There are a few things to be noted about the <segment-element> tag. Repeating segments are supported only for delimited files,
so they must have a <field-ident> element. Since the <field-ident> element is required, it isn't necessary to enclose it in
a <record-ident> element. The identifier must be the first field in the segment; if it is not a property of
the Java class it can just be consumed. There must be a beanref attribute because Flatworm will construct an instance of the specified bean for
each segment that it encounters. It then invokes the addMethod on the parent-beanref, which must have a Collection property for that
type. There is an additional attribute that is not shown here: cardinality-mode. This attribute defines how to handle segments that don't
adhere to the specified cardinality, that is, when there are fewer than the minCount or more than the maxCount. The default mode is
LOOSE, which means that an error is logged for any violations, but no other action is taken. For cardinality mode RESTRICTED,
in addition to logging an error, no more than the maxCount instances will be added to the parent collection. If the cardinality mode
is set to STRICT, then an exception will be thrown for any violation, whether too few or too many. It is also possible to nest segment-elements
within segment-elements.
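The three cardinality modes can be summarised in a few lines of plain Java. This is a sketch of the semantics as described above, not Flatworm's code: the class, enum, and method names are invented, and a message on System.err stands in for the logged error.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: the LOOSE / RESTRICTED / STRICT semantics described above.
final class CardinalityDemo {
    enum Mode { LOOSE, RESTRICTED, STRICT }

    // Returns the segments that would end up in the parent collection.
    static <T> List<T> accept(List<T> segments, int minCount, int maxCount, Mode mode) {
        int count = segments.size();
        boolean violation = count < minCount || count > maxCount;
        if (violation && mode == Mode.STRICT) {
            throw new IllegalStateException("cardinality violated: found " + count);
        }
        if (violation) {
            // Stand-in for the logged error.
            System.err.println("cardinality violated: found " + count);
        }
        if (mode == Mode.RESTRICTED && count > maxCount) {
            // Keep only the first maxCount segments.
            return new ArrayList<T>(segments.subList(0, maxCount));
        }
        return new ArrayList<T>(segments);
    }
}
```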
This configuration would allow us to read a file such as this (the class size here is much smaller!):
CL|English|2|Ms Buffington|4|S|Bill|Smith|78|S|Peter|Jackson|91|S|Mary|Hardmann|87|S|Susan|Benet|88
CL|Arithmetic|2|Mr Hermann|3|S|Harry|Mirtle|93|S|Helen|Peters|87|S|Johnny|Jones|76
CL|Science|4|Mrs Darbie|4|S|Peter|Jackson|86|S|Susan|Benet|85|S|Kelley|Laver|87|S|Bobbie|Jones|73|S|Pauline|Sturgis|84
For this to work, the ClassPeriod class would look similar to this:
import java.util.ArrayList;
import java.util.List;

public class ClassPeriod
{
    private String subject;
    private int period;
    private String teacher;
    private int gradeLevel;
    private List<Student> students = new ArrayList<Student>();

    // The usual getters/setters required for JavaBeans
    // ....

    // Invoked once for each student segment found in the record
    public void addStudent(Student _student)
    {
        students.add(_student);
    }
}
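To see what the descriptor asks Flatworm to do with each line, here is the equivalent parsing written by hand in plain Java. It is only an illustration of the record layout: the class names are made up, the nested Student class is a simplified stand-in for the Student bean, and none of this is Flatworm API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hand-written parsing of one class record with its
// repeating student segments.
final class SegmentParseDemo {
    // Simplified stand-in for the Student bean.
    static final class Student {
        final String firstName;
        final String lastName;
        final int grade;

        Student(String firstName, String lastName, int grade) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.grade = grade;
        }
    }

    static List<Student> parseStudents(String line) {
        String[] f = line.split("\\|", -1);
        List<Student> students = new ArrayList<Student>();
        // f[0] is the CL identifier, f[1] to f[4] are the class fields,
        // so the first segment starts at index 5.
        int i = 5;
        // Each segment is four fields: the S identifier plus three values.
        while (i + 3 < f.length && "S".equals(f[i])) {
            students.add(new Student(f[i + 1], f[i + 2], Integer.parseInt(f[i + 3])));
            i += 4;
        }
        return students;
    }
}
```

The number of students is not fixed anywhere in the layout; the loop simply continues for as long as another segment identifier follows.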
Attributes for segment-element tag
ATTRIBUTE NAME | VALUE | REQUIRED | DEFAULT
name | A name for the segment | no | none
minCount | The least number of this segment allowed per record | yes | none
maxCount | The greatest number of this segment allowed per record | yes | none
parent-beanref | The name of the beanref for the parent that contains the segment | yes | none
addMethod | The name of the method in the parent-beanref for adding an instance of this beanref | no | "add" + the name of the segment if specified or the beanref if the name is not specified
beanref | The name of the bean to hold the fields for this segment type | no if the name is specified; yes if no name is specified | The name of the segment if specified (there must be a bean with the same name declared)
cardinality-mode | Specifies how to handle violations of the specified cardinality | no | loose - violations are ignored
Simplifying Usage With FileParser
The examples presented earlier (SimpleFlatwormExample and ComplexFlatwormExample) required that you load the configuration file yourself
and then create an InputStream which is used, along with the FileFormat that you got back from parsing the configuration file, to retrieve
the records from the data file. This works very well, and provides you with a good bit of control over the process. But if you would prefer
something a little bit simpler you can use the FileParser (available since version 1.2). To use the FileParser, you create an instance with the
name of the configuration file and the name of the data file. For each record type that you have defined you call the
setBeanHandler(String, Object, String) method, passing the name of the record type, the object to be invoked on the callback, and the name
of the callback method to be invoked. You also set an exception handler by calling the setExceptionHandler(Object, String) method, passing
the object to be invoked on the callback and the name of the callback method to be invoked in case of an exception. You then call
the open() method (which can throw FileNotFoundException or UnsupportedEncodingException). The file is read by calling the read() method, and
when it is finished you call the close() method. As each record is read, the callback for that record type will be invoked, passing the MatchedRecord
that was generated. From the matched record you can extract the beans that were created for that record. Let's see what the SimpleFlatwormExample
might look like when rewritten to use the FileParser:
public class SimpleFlatwormExample
{
    public static void main(String[] args)
    {
        new SimpleFlatwormExample().run(args[0], args[1]);
    }

    public void run(String configurationFile, String dataFile)
    {
        FileParser parser = new FileParser(configurationFile, dataFile);
        // The handler names must match the method names below exactly
        parser.setBeanHandler("newhire", this, "processNewHire");
        parser.setExceptionHandler(this, "processException");
        try
        {
            parser.open();
        }
        catch(Exception e)
        {
            // handle the exception
        }
        parser.read();
        try
        {
            parser.close();
        }
        catch(IOException ignore)
        {}
    }

    // Callback invoked for each "newhire" record
    public void processNewHire(MatchedRecord record)
    {
        Employee emp = (Employee)record.getBean("employee");
        System.out.println(emp);
    }

    // Callback invoked when an exception occurs while reading
    public void processException(String exception, String inputLine)
    {
        // Do whatever processing you wish for the exception
    }
}
With version 2.0 a new object-based callback mechanism has been introduced. Passing a class that implements RecordCallback to the FileParser's
addRecordCallback(String, RecordCallback) method for each record type to be processed avoids the reflective method invocation of the previous
callback mechanism. The previous mechanism is still supported, but will likely be removed in a future version. To set an exception callback with
the new mechanism, pass a class that implements ExceptionCallback to the FileParser's setExceptionCallback(ExceptionCallback) method. With either
callback mechanism only one exception callback is supported, and if none is set and an exception occurs, a RuntimeException will now be thrown.
If you are not interested in processing some of the record types that are included in the configuration, you don't need to add a callback for those types.
Code Examples
Examples of Flatworm usage (including the examples presented in this guide) can be found in the Examples package and in the Test package.
Many of the unit tests are more properly integration tests since they test at a high level of granularity. They therefore do provide good
examples of how the library can be used.
Further Reading
The JavaDoc for Flatworm provides details on the core converters provided with the
package.