Perl Modules to implement EOP parsing
Range.pm
========
There are two kinds of ranges in our implementation, "ranges" and
"atomic ranges", and both come in two flavors, floating point and
integer. There is also a symbolic flavor, but at this level it is
identical to integer.
An atomic range is of the form [x] to specify a single number x, or
[lo:hi] to specify the interval from lo to hi, and for integers a third
option is [lo:incr:hi] where incr specifies an increment. For example:
[2] is just the number 2
[3:5] is the interval between 3 and 5, inclusive
[5:3] should be illegal but is treated as [3:5]
[13:2:25] is the odd numbers between 13 and 25, inclusive
[25:-2:13] is treated as [13:2:25]
[13:-2:25] should be illegal but is treated as [13:2:25]
(in other words, the sign of the increment is ignored)
[7:2:10] is really just [7:2:9] in disguise
[3:0:5] should be illegal, but is treated as [3:1:5], aka [3:5]
[3:4:5:6] should be illegal, but is treated as [3:4:5], aka [3]
[3:2:*] means any odd number larger than one
[*:0] means any nonpositive number
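The normalization rules above can be sketched as a small routine. This is only an illustration under stated assumptions: expand_atomic is a hypothetical helper, not part of the Range.pm interface, it handles bounded integer ranges only, and it assumes that counting starts from the smaller endpoint after the endpoints are put in order.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Range.pm API): expand a bounded integer
# atomic range such as "[13:2:25]" into the list of integers it denotes,
# applying the normalization rules listed above.
sub expand_atomic {
    my ($spec) = @_;
    $spec =~ s/^\[|\]$//g;                  # strip the surrounding brackets
    my @f = split /:/, $spec;
    @f = @f[0 .. 2] if @f > 3;              # [3:4:5:6] -> [3:4:5]
    my ($lo, $incr, $hi) =
          @f == 1 ? ($f[0], 1, $f[0])       # [x]
        : @f == 2 ? ($f[0], 1, $f[1])       # [lo:hi]
        :           @f;                     # [lo:incr:hi]
    die "one-sided ranges are not handled in this sketch\n"
        if grep { $_ eq '*' } $lo, $hi;
    ($lo, $hi) = ($hi, $lo) if $lo > $hi;   # [5:3] -> [3:5]
    $incr = abs($incr) || 1;                # sign ignored; 0 -> 1
    my @values;
    for (my $v = $lo; $v <= $hi; $v += $incr) {
        push @values, $v;
    }
    return @values;
}

print join(',', expand_atomic('[25:-2:13]')), "\n";   # 13,15,17,19,21,23,25
print join(',', expand_atomic('[7:2:10]')),   "\n";   # 7,9
```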
A range is a (comma-separated) list of atomic ranges. The list
needn't have more than one element, so an atomic range is also a
range. eg,
[2] the number 2
[3:5] the interval between 3 and 5, inclusive
[1,2:2:10] one and the even numbers up to ten
[2,3,5,8] the Fibonacci numbers greater than one and less than ten
[2,3:2:7] the primes less than ten
You can see the wonderfully documented Range.pm file for details, but
one of the methods of the range objects is `chooseRandom'. Currently,
the integer and floating point methods use different strategies. The
integer method makes a list of all legal integers in the range and
chooses from among them. (Note this will require some rethinking if
the [0:*] notation is implemented!) The floating point method chooses
with equal probability one of its atomic ranges, and then chooses the
parameter uniformly from that atomic range.
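The two strategies can be sketched roughly as follows. The data layout (a hash per atomic range with lo, hi, and incr keys) and the function names are assumptions made for this illustration, not the internals of Range.pm.

```perl
use strict;
use warnings;

# Integer strategy: enumerate every legal value, then pick one uniformly.
# (This is why an unbounded range would need a different approach.)
sub choose_random_int {
    my (@atoms) = @_;
    my @legal;
    for my $atom (@atoms) {
        for (my $v = $atom->{lo}; $v <= $atom->{hi}; $v += $atom->{incr}) {
            push @legal, $v;
        }
    }
    return $legal[ int(rand(scalar @legal)) ];
}

# Floating point strategy: pick an atomic range with equal probability,
# then pick a value uniformly inside it.
sub choose_random_float {
    my (@atoms) = @_;
    my $atom = $atoms[ int(rand(scalar @atoms)) ];
    return $atom->{lo} + rand() * ($atom->{hi} - $atom->{lo});
}

# [1,2:2:10] as integers; [0:1,5:6] as floats
my @int_atoms   = ({ lo => 1, incr => 1, hi => 1 }, { lo => 2, incr => 2, hi => 10 });
my @float_atoms = ({ lo => 0, hi => 1 }, { lo => 5, hi => 6 });
print choose_random_int(@int_atoms), " ", choose_random_float(@float_atoms), "\n";
```

One consequence of the floating point strategy as described is that a narrow atomic range is selected just as often as a wide one, so values are not uniform over the range as a whole.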
There are other range methods to test whether a specified value is in
the object's range, to find the value in the range that is nearest to
the specified value, and even to reflect an out-of-range value into
the object's range.
The file tRange.pl uses Range.pm and tests some of the methods.
Parm.pm
=======
A Parm object specifies how to choose a parameter value, and how to
mutate it, but is not a parameter per se, and does not actually have a
value. The Parm object has methods for parsing the '{del=...,}'
strings that are in the eop file, and maintains the initval and bound
ranges.
Attributes of the Parm object include
o del is the size of jumps made during a mutation
o initval is a value or a range; when a parameter is produced
from this object its initial value is given initval.
o bound specifies the valid range of values for this parameter
Mutations are performed differently for the three different parameter
types.
o Floating point mutations add a delta which is the value of del
times a unit-variance gaussian
o Integer mutations add a delta which is uniformly distributed over
the integers -del,...,+del. Note that a delta of zero is still
considered a mutation (for now).
o Symbolic mutations simply choose randomly from all the legal
choices in the bound range.
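As a rough sketch of the three mutation rules, before any bounds check is applied: the function names are hypothetical, and the Box-Muller transform is an assumption, since these notes do not say how Parm.pm draws its gaussian.

```perl
use strict;
use warnings;

# Unit-variance gaussian via the Box-Muller transform (an assumption;
# the generator actually used by Parm.pm is not specified here).
sub gaussian {
    my $u1 = 1 - rand();                    # in (0,1], avoids log(0)
    my $u2 = rand();
    my $pi = 4 * atan2(1, 1);
    return sqrt(-2 * log($u1)) * cos(2 * $pi * $u2);
}

# Floating point: add del times a unit-variance gaussian.
sub mutate_float {
    my ($value, $del) = @_;
    return $value + $del * gaussian();
}

# Integer: add a delta uniform over -del..+del (2*del+1 outcomes,
# zero included).
sub mutate_int {
    my ($value, $del) = @_;
    return $value + int(rand(2 * $del + 1)) - $del;
}

# Symbolic: ignore the current value and pick any legal choice.
sub mutate_symbolic {
    my (@legal) = @_;
    return $legal[ int(rand(scalar @legal)) ];
}

print mutate_float(1.5, 0.1), " ", mutate_int(10, 3), " ",
      mutate_symbolic(qw(ADDS ADDP NOP)), "\n";
```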
For floating point and integer mutation a further check is performed to
ensure that the parameter is in the bound range. If it is not, then
it is first reflected back inside and then assigned to the nearest valid value. For
instance, if bound=[3:3:18] and a mutation produces a value of 20, then
the reflection will change the 20 to 16, and the nearest valid value is
15, so that will be the mutated value.
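The worked example above can be reproduced with a short sketch. The helper names reflect and nearest are hypothetical, the reflection assumes the overshoot is smaller than the width of the range, and how the real code breaks ties between two equally near values is not specified here.

```perl
use strict;
use warnings;

# Reflect an out-of-range value back across the boundary it crossed.
sub reflect {
    my ($v, $lo, $hi) = @_;
    return 2 * $hi - $v if $v > $hi;
    return 2 * $lo - $v if $v < $lo;
    return $v;
}

# Nearest legal value in the integer range [lo:incr:hi].
sub nearest {
    my ($v, $lo, $incr, $hi) = @_;
    my $steps = int(($v - $lo) / $incr + 0.5);   # round to the nearest step
    my $max   = int(($hi - $lo) / $incr);        # index of the last legal value
    $steps = 0    if $steps < 0;
    $steps = $max if $steps > $max;
    return $lo + $steps * $incr;
}

my $reflected = reflect(20, 3, 18);              # 2*18 - 20 = 16
my $mutated   = nearest($reflected, 3, 3, 18);   # 16 is closer to 15 than to 18
print "$reflected -> $mutated\n";                # 16 -> 15
```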
ElemOp.pm
=========
This is essentially the same as the old ElemFunc.pm, but with the
new attributes. This is the module that parses the function definitions
in the eop files.
ElemOpTable.pm
==============
This is the replacement for the old FuncTable.pm; it reads the
eop files and makes a hash table (keyed on the ElemOp name, eg
ADDS, ADDP, NOP, etc) of ElemOp objects.
The script tElemOpTable.pl tests this class (and the classes above,
on which it depends), and even does a rudimentary test of what a gene
based on ElemOps would look like.
=========================================================
Here is some description of other Genie Perl Modules.
Choose.pm: a Choose object consists of an array of items (can be
strings, or pointers, or objects (even other Choose objects))
and an associated array of weights. You can then choose one
of the items with a probability that is proportional to the
weights.
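Weighted selection of this kind can be sketched in a few lines. The function choose_weighted is a hypothetical stand-in, not the Choose.pm interface, and the weights shown are made up for the example.

```perl
use strict;
use warnings;

# Pick one item with probability proportional to its weight.
sub choose_weighted {
    my ($items, $weights) = @_;
    my $total = 0;
    $total += $_ for @$weights;
    my $r = rand($total);                   # uniform in [0, total)
    for my $i (0 .. $#$items) {
        $r -= $weights->[$i];
        return $items->[$i] if $r < 0;
    }
    return $items->[-1];                    # guard against rounding error
}

my @kinds   = qw(wholeGene parameter plane);
my @weights = (1, 3, 1);                    # illustrative weights only
print choose_weighted(\@kinds, \@weights), "\n";
```

Because the items are ordinary Perl scalars, an item can itself be a reference to another item/weight pair, which is how nested choices could be built.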
SearchPath.pm: I thought Simon's idea of a search path was such
a good idea, that I made it into an object. The new version
of genie now supports search paths for image files, for .eop
files, and for .pro files.
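The core of a search path lookup is small. The sketch below is an assumption about the general idea, not the SearchPath.pm interface; find_in_path, the file name, and the directories are all hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing file named $file among the directories
# given, searched in order; return nothing if it is not found.
sub find_in_path {
    my ($file, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -f $candidate;
    }
    return;
}

# Hypothetical file and directory names, for illustration only.
my $found = find_in_path('default.eop', '.', 'eop');
print defined $found ? "using $found\n" : "default.eop not found\n";
```

Searching in order means an earlier directory shadows a later one, so a user's working directory can override installed defaults.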
SimpleDataBase.pm: essentially the same as the Database.pm
module, but I thought a slightly less generic name was more
appropriate. The only other change is that an object of this
class remembers its filename (if you've specified it with
either the new() or the load() command), so you only need to
call $data->save instead of
$data->save($GlobalValues->databaseFilename). You can always
do $data->save("toSomeOtherFile.sdb") if you want to.
ImagePlanes.pm: the old version had an embarrassingly bad
design, with separate classes for what should have been
separate objects of the same class. This replaces that with a
Planes object, of which data planes, scratch planes, read
planes, answer planes, etc are all objects, and an ImagePlanes
object which aggregates the planes objects as members of a
single object. This makes it easier to enforce relations
among the different planes (eg, read=data+scratch), and more
convenient to just have a single object with all the info.
Global.pm: no more Global.pm.in. This is mostly an issue of taste; I'd
rather put the @...@ variables all in one place, namely the
genie.in file. In this case, I could actually get rid of the
two hardcoded paths which were specified by @...@ variables,
since they are now part of the search path options (see
SearchPath above). Default search paths based on @...@
variables are not a bad idea, but if we put them in, I'd rather
set them from the genie.in file and keep the .pm files as-is.
I also reworked the way that options are specified. Now they
are specified in only one place, the optionsHash table,
instead of the two places used previously (the defaults list and
the options list). Since there are so many options, they are now
sorted alphabetically.
StdObject.pm: recall that almost all of the objects use
StdObject.pm as a base class. This can be done with 'use base
qw(StdObject);' right after the package declaration; that's
essentially shorthand for 'use StdObject; @ISA=qw(StdObject)'. There
are a few minor enhancements (default "show" and "print" methods are
now provided) and optimizations (a default DESTROY method is
provided, which saves some calls to the relatively less
efficient AUTOLOAD function), and the default error messages
are a little easier to understand.
Range.pm
Parm.pm
ElemOp.pm
ElemOpTable.pm:
these all implement the basic EOP functionality, and though
they have been more carefully documented, debugged, and
generally tweaked, are basically the same as what I had
implemented a month or two ago. One twist, with the Range.pm
class, is that you can now specify one-sided ranges with a '*'
in place of a bound; eg, [3:*] indicates a range from 3 to infinity,
and [0:2:*,*:2:0] indicates the even numbers, unbounded from above
and below.
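A membership test shows how a '*' bound might be handled. This is a sketch under assumptions: in_atomic and is_even are hypothetical names, the atomic range is passed as separate (lo, incr, hi) values rather than parsed from a string, and the increment is taken to be positive.

```perl
use strict;
use warnings;

# Is $v in the integer atomic range [lo:incr:hi], where '*' stands
# for an unbounded end?
sub in_atomic {
    my ($v, $lo, $incr, $hi) = @_;
    return 0 if $lo ne '*' && $v < $lo;
    return 0 if $hi ne '*' && $v > $hi;
    my $anchor = $lo ne '*' ? $lo : $hi;    # count steps from a finite end
    return 1 if $anchor eq '*';             # both ends open: anything goes
    return ($v - $anchor) % $incr == 0 ? 1 : 0;
}

# [0:2:*,*:2:0] : the even numbers, unbounded in both directions
sub is_even {
    my ($v) = @_;
    return in_atomic($v, 0, 2, '*') || in_atomic($v, '*', 2, 0);
}

print join(' ', map { is_even($_) ? "$_:in" : "$_:out" } (-4, -1, 0, 7, 10)), "\n";
# -4:in -1:out 0:in 7:out 10:in
```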
Gene.pm: changing Gene.pm to use EOP's instead of the old fun's
was a little more extensive than I thought it would be. While
I was in there I also altered some of the mutation code. The
various kinds of mutations (wholeGene, parameter, and plane)
are now chosen via the Choose.pm object, a lot cleaner than
what we had before, and enables the user to specify relative
weights in the .opt file. Also, a plane mutation now mutates
only a single plane, unlike our old system, which just
scrambled them all.
Chromosome.pm, Population.pm, etc: various changes, but
essentially the same functionality as before.
=========================================================
I also imposed some discipline upon the code, and would
recommend that further perl coding follow these standards.
o For (almost) every Blah.pm file, there is a tBlah.pl file
which both tests the module and illustrates its usage. I
found this an effective way to flush out bugs in the code,
because it only tests a single module at a time, rather than a
complicated system of interconnected components.
o For most Blah.pm files, there is now documentation in the form
of "pod" ("plain old documentation") commands built into the
source code. You can type "perldoc Blah" and get
man-page-like documentation for the Blah.pm module. (This is
similar to the 'srcman' system on Alexis.) There also exist
"pod2html" and "pod2latex" conversions; see 'man perlpod' for
more info.
o All of the modules (ie, the Blah.pm files) employ the 'use
strict' directive. This causes runs to fail if strict coding
practices are not followed. Mostly, this involves variables
that are not localized (or my'd). The classic problem looks
like this:
    for ($i=0; $i<10; ++$i) {
        print "$i: ",&sum_up_to($i),"\n";
    }
    sub sum_up_to {
        $n = shift;
        $sum = 0;
        for ($i=1; $i<=$n; ++$i) {
            $sum += $i;
        }
        return $sum;
    }
The problem is that the $i is a global variable and the $i in
the loop in the subroutine interferes with the $i in the loop
in the calling routine. For very short, very casual, perl
programs, this kind of thing is sometimes okay, but the easy
corrective is to use the 'my' operator, either as a separate
statement
my $i;
for ($i=0...)
or in a more shorthand way, as
for (my $i=0...)
As it turns out, these two approaches are not completely
equivalent; the latter localizes the $i within the loop, and I
generally prefer that. Anyway, the 'use strict' will point
out any variables that you forgot to localize, and so prevent
this kind of interference. See 'perldoc strict' for more info
on other things the directive enforces.
o Many of the modules load the Carp module with 'use Carp'. This
provides more sophisticated error messaging. Again, do a
'perldoc Carp' for more info. Note that Carp.pm is part of
the standard Perl distribution.
o Most modules have their own $verbose variable; useful but
inessential messages should be written only if this is set:
warn "Creating n=$n genes" if $verbose;
The variable is global to the module, but not to the rest of
the program. It can, however, be set from elsewhere (eg, from
the main genie) if the fully-qualified name is used, eg
$Blah::verbose=1;
Since it is a global variable, 'use strict' will complain about
it, unless we tell it explicitly that verbose is a global
variable. The following three lines appear in a lot of modules:
$verbose=0; ## global to the current package
use vars qw($verbose); ## allow use of global $verbose
use strict; ## but complain about any others
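Pulling these conventions together, a module might look something like the sketch below. SumSketch is a hypothetical package name used only for illustration, and the argument check is an addition of this example; it shows 'use strict' with a loop-local $i, a croak from Carp for bad input, and a package-level $verbose.

```perl
package SumSketch;    # hypothetical module name, for illustration only

use strict;
use warnings;
use Carp;
use vars qw($verbose);

$verbose = 0;         # package global; set $SumSketch::verbose=1 from elsewhere

sub sum_up_to {
    my $n = shift;
    # croak reports the error from the caller's point of view
    croak "sum_up_to: expected a non-negative integer"
        unless defined $n && $n =~ /^\d+$/;
    warn "Summing 1..$n\n" if $verbose;
    my $sum = 0;
    for (my $i = 1; $i <= $n; ++$i) {   # this $i is private to the loop
        $sum += $i;
    }
    return $sum;
}

package main;

for (my $i = 0; $i < 10; ++$i) {        # unaffected by the $i in sum_up_to
    print "$i: ", SumSketch::sum_up_to($i), "\n";
}
```

With both loop variables declared with 'my', the calling loop prints all ten lines, from "0: 0" through "9: 45", instead of skipping values as the earlier global-$i version does.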