Inheritance

First, we define a class for reading and writing files, FileIO.pm. An object of this class is essentially nothing more than the name of a file (filename) and an array (filedata) holding the contents of that file. (In addition, there are date and writemode fields.) In other words, a FileIO object is a lot like a cached file.

The important methods in this class are read and write, which move data between filedata and the associated file. The read method appears overly general here: it apparently allows the FileIO's attributes to be set via the argument list, yet %_attribute_properties marks all attributes as "noinit", meaning they can't be set in this manner. The reason for this approach is to enable subclasses to use read, as we'll see below.

The statement that does most of the work is:

    $self->{'_filedata'} = [ <FileIOFH> ];
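
In list context the angle-bracket (readline) operator returns every remaining line of the file, and the square brackets wrap that list in an anonymous array reference. The following is a minimal standalone sketch of the same idiom, not the code from FileIO.pm: slurp_lines is a hypothetical helper, and it uses a lexical filehandle instead of the FileIOFH bareword.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FileIO.pm): return a reference to an
# array that holds every line of the named file, newlines included.
sub slurp_lines {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    my $lines = [ <$fh> ];    # list context: read all remaining lines
    close($fh);
    return $lines;
}
```

A caller could then write my $data = slurp_lines('record.gb'); and reach individual lines as $data->[0], $data->[1], and so on.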

The write method is similar. First it allows the caller to set the attributes of the FileIO object via the argument list; then it writes the FileIO object to its file using the write mode (such as > to overwrite the current file contents, or >> to append to the file). The statement that does most of the work is:

    unless( open( FileIOFH, $self->get_writemode . $self->get_filename ) )

We can test this class with testFileIO. In this file, note the use of the write method: we change the value of the filename field here via the parameter list to write so that the FileIO object will be written to a different file than it was initialized from.
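
To make the "set attributes, then write" pattern concrete, here is a small sketch under stated assumptions. Toy::CachedFile is a hypothetical stand-in for FileIO, its _set helper is invented for illustration, and it uses the three-argument form of open rather than concatenating the mode and the filename.

```perl
use strict;
use warnings;

# Toy::CachedFile is a hypothetical stand-in for FileIO, not the real class.
package Toy::CachedFile;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        _filename  => '',
        _writemode => '>',    # '>' overwrites, '>>' appends
        _filedata  => [],
    }, $class;
    $self->_set(%args);
    return $self;
}

# Copy name => value pairs from an argument list into the object's fields.
sub _set {
    my ($self, %args) = @_;
    $self->{"_$_"} = $args{$_} for keys %args;
    return $self;
}

sub write {
    my ($self, %args) = @_;
    $self->_set(%args);       # e.g. filename => 'copy.txt'
    my ($mode, $name) = @{$self}{qw(_writemode _filename)};
    die "Unsupported write mode '$mode'\n" unless $mode =~ /\A>>?\z/;
    open(my $fh, $mode, $name) or die "Cannot open $name: $!\n";
    print {$fh} @{ $self->{_filedata} };
    close($fh) or die "Cannot close $name: $!\n";
    return $self;
}

package main;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = Toy::CachedFile->new(filedata => ["line 1\n", "line 2\n"]);
$file->write(filename => "$dir/copy.txt");                     # create the file
$file->write(filename => "$dir/copy.txt", writemode => '>>');  # append a second copy
```

Passing filename to write redirects the output without touching the cached data, which is the behavior the test program exercises.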

Inheritance: SeqFileIO derived from FileIO

In this section we create a "subclass" of FileIO that is specialized to handle files that contain biological sequences. That class, SeqFileIO, is defined in the file SeqFileIO.pm. The most important thing to note in this class is the use of the base pragma (use base), which declares FileIO as the superclass of SeqFileIO. If Perl cannot find a method of a given name in a class, it looks for a definition of that method in the class's superclasses. For example, suppose we have a SeqFileIO object, $obj. Then $obj->get_count will invoke the get_count method defined in FileIO.
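
The lookup rule can be seen in a few lines. This is a sketch with hypothetical, stripped-down classes (Toy::FileIO and Toy::SeqFileIO, with an invented describe method), not the real modules.

```perl
use strict;
use warnings;

# Toy::FileIO and Toy::SeqFileIO are hypothetical, stripped-down stand-ins.
package Toy::FileIO;

sub new {
    my ($class, %args) = @_;
    return bless { _filename => $args{filename} }, $class;
}

sub get_filename {
    my ($self) = @_;
    return $self->{_filename};
}

package Toy::SeqFileIO;
use base 'Toy::FileIO';    # makes Toy::FileIO the superclass (sets @ISA)

# A method that exists only in the subclass.
sub describe {
    my ($self) = @_;
    return 'sequence file ' . $self->get_filename;
}

package main;

my $obj = Toy::SeqFileIO->new(filename => 'record.gb');
print ref($obj), "\n";           # the object is blessed into the subclass
print $obj->get_filename, "\n";  # found by searching the superclass
print $obj->describe, "\n";
```

Neither new nor get_filename is defined in Toy::SeqFileIO, so both calls are resolved in Toy::FileIO.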

Inherited Closures

There are some seeming oddities. Note that SeqFileIO redefines _all_attributes, _permissions and _attribute_default, even though these definitions are identical to the ones in FileIO. This is because those methods are defined within a closure. We don't want SeqFileIO to use the inherited versions, because those would refer to the value of %_attribute_properties in FileIO rather than the one in SeqFileIO. Now when the (inherited) AUTOLOAD method is invoked, it calls these closure-related methods on the object, so the appropriate versions run (depending on whether the invocant is a FileIO or a SeqFileIO).
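
A short sketch shows why the textually identical redefinition matters. The classes Toy::Parent and Toy::Child and the list_attributes method are hypothetical; only the shape of the closure mirrors the description above.

```perl
use strict;
use warnings;

package Toy::Parent;
{
    # Lexical hash visible only inside this block: the sub below is a closure.
    my %_attribute_properties = ( _filename => '' );
    sub _all_attributes { return sort keys %_attribute_properties }
}

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Inherited by the subclass; _all_attributes is resolved on the invocant,
# so each class answers with its own attribute table.
sub list_attributes {
    my ($self) = @_;
    return join ',', $self->_all_attributes;
}

package Toy::Child;
use base 'Toy::Parent';
{
    # Same code as in the parent, but closed over the child's own hash.
    my %_attribute_properties = ( _filename => '', _sequence => '' );
    sub _all_attributes { return sort keys %_attribute_properties }
}

package main;

print Toy::Parent->new->list_attributes, "\n";
print Toy::Child->new->list_attributes, "\n";
```

If the block in Toy::Child were deleted, the inherited _all_attributes would still read the parent's hash, and the child would never report _sequence.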

Specialization Fields and Methods

Several fields are added to specialize a FileIO into a SeqFileIO. First, there is an array, _seqfileformats, that explicitly lists the file formats the SeqFileIO class can recognize; in other words, it specifies which types of biological sequence data the class can handle. For each file format, x, SeqFileIO provides three methods: is_x, parse_x and put_x.

We can initialize a SeqFileIO object from a Genbank file, such as record.gb, via the code:

    my $genbank = SeqFileIO->new();
    $genbank->read(
        filename => 'record.gb'
    );

Then we can fiddle with these methods by debugging testSeqFileIO. Try it: set a breakpoint just beyond where the code above appears in testSeqFileIO. Go ahead and experiment with $genbank. Here's a sample session:

main::(testSeqFileIO:72):       print "\n####################\n#################
###\n####################\n";
  DB<3> print $genbank->get_header
AB031069     2487 bp    mRNA            PRI       27-MAY-2000 Sequence severely
truncated for demonstration. AB031069
  DB<4> print $genbank->get_sequence
AGATGGCGGCGCTGAGGGGTCTTGGGGGCTCTAGGCCGGCCACCTACTGGTTTGCAGCGGAGACGACGCATGGGGCCTGC
GCAATAGGAGTACGCTGCCTGGGAGGCGTGACTAGAAGCGGAAGTAGTTGTGGGCGCCTTTGCAACCGCCTGGGACGCCG
CCGAGTGGTCTGTGCAGGTTCGCGGGTCGCTGGCGGGGGTCGTGAGGGAGTGCGCCGGGAGCGGAGATATGGAGGGAGAT
AAAAAAAAAAAAAAAAAAAAAAAAAAA
  DB<5> @matt = $genbank->put_genbank

  DB<6> p "@matt"
LOCUS       AB031069       267 bp
 DEFINITION  AB031069     2487 bp    mRNA            PRI       27-MAY-2000 Seque
nce severely truncated for demonstration. AB031069 , 267 bases, 829 sum.
 ACCESSION  AB031069
 ORIGIN
        1  agatggcggc gctgaggggt cttgggggct ctaggccggc cacctactgg tttgcagcgg
       61  agacgacgca tggggcctgc gcaataggag tacgctgcct gggaggcgtg actagaagcg
      121  gaagtagttg tgggcgcctt tgcaaccgcc tgggacgccg ccgagtggtc tgtgcaggtt
      181  cgcgggtcgc tggcgggggt cgtgagggag tgcgccggga gcggagatat ggagggagat
      241  aaaaaaaaaa aaaaaaaaaa aaaaaaa
 //

  DB<7>

Take Care Using SeqFileIO as a Translator

In the above session, note that the header field holds a lot less information than is provided in the annotation section of record.gb! This indicates that information can be lost if these methods are used to translate a file from one format to another. Indeed, in the example above, I am translating from one format to the same format. While the output generated does conform to the syntax of a Genbank file, it is missing many of the details present in the annotation (i.e., the "header") of the original file.

Parse_ Methods

It is harder to play with the various parse_ methods because they are invoked only indirectly by read. The idea is to first load the contents of a sequence file into the SeqFileIO object, then use the is_ methods to determine the file format of the contents of the file. The corresponding parse_ method is then used to initialize the attribute fields (such as sequence, header, id and accession) of the SeqFileIO object. The sequence data could then be manipulated, and a put_ method could be used to store it in a file in whatever format we wanted.

Parse_ and read()

The read() method is particularly clever. Notice how it goes about invoking the parse_ method that is appropriate to the file format with the code:

    $self->{'_format'} = $self->isformat;
    my $parsemethod = 'parse' . $self->{'_format'};
    $self->$parsemethod;
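
The last line works because Perl accepts a scalar holding a method name on the right-hand side of the arrow. Below is a self-contained miniature of the whole read-detect-parse cycle. Everything in it is an assumption made for illustration: the class Toy::SeqReader, the method name load, the two toy formats, and in particular the guess that format names carry a leading underscore so that 'parse' . $format yields names such as parse_fasta.

```perl
use strict;
use warnings;

# Toy::SeqReader is a hypothetical miniature, not the real SeqFileIO.
package Toy::SeqReader;

# Assumption: format names start with an underscore, so 'is' . $format and
# 'parse' . $format produce is_fasta, parse_fasta, and so on.
my @formats = qw(_fasta _raw);

sub new {
    my ($class, @lines) = @_;
    return bless { _filedata => [@lines] }, $class;
}

sub is_fasta {
    my ($self) = @_;
    my $first = $self->{_filedata}[0];
    return defined $first && $first =~ /^>/;
}

sub is_raw {
    my ($self) = @_;
    return join('', @{ $self->{_filedata} }) =~ /\A[ACGTN\s]+\z/i;
}

# Return the first format whose is_ method accepts the data.
sub isformat {
    my ($self) = @_;
    for my $format (@formats) {
        my $is_method = 'is' . $format;
        return $format if $self->$is_method;
    }
    return '_unknown';
}

sub parse_fasta {
    my ($self) = @_;
    my ($header, @rest) = @{ $self->{_filedata} };
    chomp $header;
    $header =~ s/^>//;
    (my $sequence = join '', @rest) =~ s/\s//g;
    @{$self}{qw(_header _sequence)} = ($header, $sequence);
    return;
}

sub parse_raw {
    my ($self) = @_;
    (my $sequence = join '', @{ $self->{_filedata} }) =~ s/\s//g;
    @{$self}{qw(_header _sequence)} = ('', $sequence);
    return;
}

sub parse_unknown {
    my ($self) = @_;
    @{$self}{qw(_header _sequence)} = ('', '');
    return;
}

sub load {
    my ($self) = @_;
    $self->{_format} = $self->isformat;
    my $parsemethod = 'parse' . $self->{_format};
    $self->$parsemethod;    # method name resolved at run time from the string
    return $self;
}

package main;

my $seq = Toy::SeqReader->new(">demo sequence\n", "ACGT\n", "GGCC\n")->load;
print "$seq->{_format}: $seq->{_sequence}\n";
```

Supporting another format in this sketch means adding its name to @formats and writing the matching is_ and parse_ methods; load itself does not change.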

Now go and play!

I urge you to use the debugger to examine the execution of testSeqFileIO, the test program for SeqFileIO.