NAME
    File::DirSync - Syncronize two directories rapidly

    $Id: DirSync.pm,v 1.46 2007/08/08 21:06:42 rob Exp $

SYNOPSIS
      use File::DirSync;

      my $dirsync = new File::DirSync {
        verbose => 1,
        nocache => 1,
        localmode => 1,
      };

      $dirsync->src("/remote/home/www");
      $dirsync->dst("/home/www");
      $dirsync->ignore("CVS");

      $dirsync->rebuild();

      #  and / or

      $dirsync->dirsync();

DESCRIPTION
    File::DirSync will make two directories exactly the same. The goal is to
    perform this syncronization process as quickly as possible with as few
    stats and reads and writes as possible. It usually can perform the
    syncronization process within a few milliseconds - even for gigabytes or
    more of information.

    Much like File::Copy::copy, one is designated as the source and the
    other as the destination, but this works for directories too. It will
    ensure the entire file structure within the descent of the destination
    matches that of the source. It will copy files, update time stamps,
    adjust symlinks, and remove files and directories as required to force
    consistency.

    The algorithm used to keep the directory structures consistent is a
    dirsync cache stored within the source structure. This cache is stored
    within the timestamp information of the directory nodes. No additional
    checksum files or separate status configurations are required nor
    created. So it will not affect any files or symlinks within the
    source_directory nor its descent.

METHODS
  new( [ { properties... } ] )

    Instantiate a new object to prepare for the rebuild and/or dirsync
    mirroring process.

      $dirsync = new File::DirSync;

    Key/value pairs in a property hash may optionally be specified as well
    if desired as demonstrated in the SYNOPSIS above. The default property
    hash is as follows:

      $dirsync = new File::DirSync {
        verbose => 0,
        nocache => 0,
        localmode => 0,
        src => undef,
        dst => undef,
      };

  src( <source_directory> )

    Specify the source_directory to be used as the default for the rebuild()
    method if none is specified. This also sets the default source_directory
    for the dirsync() method if none is specified.

  dst( <destination_directory> )

    Specify the destination_directory to be used as the default for the
    dirsync() method of none is specified.

  rebuild( [ <source_directory> ] )

    In order to run most efficiently, a source cache should be built prior
    to the dirsync process. That is what this method does. If no
    <source_directory> is specified, you must have already set the value
    through the src() method or by passing it as a value to the "src"
    property to the new() method. Unfortunately, write access to
    <source_directory> is required for this method.

      $dirsync->rebuild( $from );

    This may take from a few seconds to a few minutes depending on the
    number of nodes within its directory descent. For best performance, it
    is recommended to execute this rebuild on the computer actually storing
    the files on its local drive. If it must be across NFS or other remote
    protocol, try to avoid rebuilding on a machine with much latency from
    the machine with the actual files, or it may take an unusually long
    time.

  dirsync( [ <source_directory> [ , <destination_directory> ] ] )

    Copy everything from <source_directory> to <destination_directory>. If
    no <source_directory> or <destination_directory> are specified, you must
    have already set the values through the src() or dst() methods or by
    passing it to the "src" or "dst" properties to new(). Files and
    directories within <destination_directory> that do not exist in
    <source_directory> will be removed. New nodes put within
    <source_directory> since the last dirsync() will be mirrored to
    <destination_directory> retaining permission modes and timestamps. Write
    access to <destination_directory> is required. Read-only access to
    <source_directory> is sufficient since it will not be modifed by this
    method.

      $dirsync->dirsync( $from, $to );

    The rebuild() method should have been run on <source_directory> prior to
    using dirsync() for maximum efficiency. If not, then use the nocache()
    setting to force dirsync() to mirror the entire <source_directory>
    regardless of the dirsync source cache.

  only( <source> [, <source> ...] )

    If you are sure nothing has changed within source_directory except for
    <source>, you can specify a file or directory using this method.

      $dirsync->only( "$from/htdocs" );

    However, the cache will still be built all the way up to the
    source_directory. This only() node must always be a subdirectory or a
    file within source_directory. This option only applies to the rebuild()
    method and is ignored for the dirsync() method. This method may be used
    multiple times to rebuild several nodes. It may also be passed a list of
    nodes. If this method is not called before rebuild() is, then the entire
    directory structure of source_directory and its descent will be rebuilt.

  maxskew( [ future_seconds ] )

    In order to avoid corrupting directory time stamps into the future, you
    can specify a maximum future_seconds that you will permit a node in the
    <source> directory to be modified.

      $dirsync->maxskew( 7200 );

    If the maxskew method is not called, then no corrections to the files or
    directories will be made. If no argument is passed, then future_seconds
    is assumed to be 0, meaning "now" is considered to be the farthest into
    the future that a file should be allowed to be modified.

  ignore( <node> )

    Avoid recursing into directories named <node> within source_directory.
    It may be called multiple times to ignore several directory names.

      $dirsync->ignore("CVS");

    This method applies to both the rebuild() process and the dirsync()
    process.

  lockfile( <lockfile> )

    If this option is used, <lockfile> will be used to ensure that only one
    dirsync process is running at a time. If another process is concurrently
    running, this process will immediately abort without doing anything. If
    <lockfile> does not exist, it will be created. This might be useful say
    for a cron that runs dirsync every minute, but just in case it takes
    longer than a minute to finish the dirsync process. It would be a waste
    of resources to have multiple simultaneous dirsync processes all
    attempting to dirsync the same files. The default is to always dirsync.

  verbose( [ <0_or_1> ] )

      $dirsync->verbose( 1 );

    Read verbose setting or turn verbose off or on. Default is off.

  localmode( [ <0_or_1> ] )

    Read or set local directory only mode to avoid recursing into the
    directory descent.

      $dirsync->localmode( 1 );

    Default is to perform the action recursively by descending into all
    subdirectories of source_directory.

  nocache( [ <0_or_1> ] )

    When mirroring from source_directory to destination_directory, do not
    assume the rebuild() method has been run on the source already to
    rebuild the dirsync cache. All files will be mirrored.

      $dirsync->nocache( 1 );

    If enabled, it will significantly degrade the performance of the
    mirroring process. The default is 0 - assume that rebuild() has already
    rebuilt the source cache.

  gentle( [ <percent> [, <ops> ] ] )

    Specify gentleness for all disk operations. This is useful for those
    servers with very busy disk drives and you need to slow down the sync
    process in order to allow other processes the io slices they demand. The
    <percent> is the realtime percentage of time you wish to be sleeping
    instead of doing anything on the hard drive, i.e., a low value (1) will
    spend most of the time working and a high value (99) will spend most of
    the time sleeping. The <ops> is the number of disk operations you wish
    to perform in between each sleep interval.

      $dirsync->gentle( 25, 1_000 );

    If gentle is called without arguments, then some default "nice" values
    are set. If gentle is not called at all, then it will process all disk
    operations at full blast without sleeping at all.

  proctitle( [ procname ] )

    Enable proctitle mode which shows the current operation on the process
    title. If procname is specified, then it shows that string in the "ps"
    listing. Otherwise, the current $0 is used. This is mostly for progress
    tracking for convenience purposes.

      $dirsync->proctitle( "SYNCING" );

    Default is not to alter the process title at all.

  tracking( [ <0_or_1> ] )

    Enable or disable tracking mode. Operation tracking is disabled by
    default in order to reduce CPU and memory consumption. See entries_*
    methods below for more details.

  entries_updated()

    Returns an array of all directories and files updated in the last
    "dirsync", an empty list if it hasn't been run yet.

  entries_removed()

    Returns an array of all directories and files removed in the last
    "dirsync", an empty list if it hasn't been run yet.

  entries_skipped()

    Returns an array of all directories and files that were skipped in the
    last "dirsync", an empty list if it hasn't been run yet.

  entries_failed()

    Returns an array of all directories and files that failed in the last
    "dirsync", an empty list if it hasn't been run yet.

TODO
    Support for efficient incremental changes to large log files using md5
    checksum comparison on portions of or all of corresponding parts of both
    the larger source and smaller destination files. If no differences are
    found anywhere, including the very end of the destination file, then
    simply append the end of the source to the end of the destination until
    both files are indentical again. Avoid making a full copy of the source
    and especially avoid writing the entire file since writes are so slow
    and plainful.

    Support for hard linking the source files into the destination when they
    both reside on the same device instead of making a full copy.

    Generalized file manipulation routines to allow for easier integration
    with third-party file management systems.

    Support for FTP dirsync (both source and destination).

    Support for Samba style sharing dirsync.

    Support for VFS, HTTP/DAV, and other more standard remote third-party
    file management.

    Support for dereferencing symlinks instead of creating matching symlinks
    in the destination.

BUGS
    If the source or destination directory permission settings do not
    provide write access, there may be problems trying to update nodes
    within that directory.

    If a source file is modified after, but within the same second, that it
    is dirsynced to the destination and is exactly the same size, the new
    version may not be updated to the destination. The source will need to
    be modified again or at least the timestamp changed after the entire
    second has passed by. A quick touch should do the trick.

    It does not update timestamps on symlinks, because I couldn't figure out
    how to do it without dinking with the system clock. :-/ If anyone knows
    a better way, just let the author know.

    Only plain files, directories, and symlinks are supported at this time.
    Special files, (including mknod), pipe files, and socket files will be
    ignored.

    If a destination node is modified, added, or removed, it is not
    guaranteed to revert to the source unless its corresponding node within
    the source tree is also modified. To ensure syncronization to a
    destination that may have been modifed, a rebuild() will also need to be
    performed on the destination tree as well as the source. This bug does
    not apply when using { nocache => 1} however.

    Win32 PLATFORM: Removing or renaming a node from the source tree does
    NOT modify the timestamp of the directory containing that node for some
    reason (see test case t/110_behave.t). Thus, this change cannot be
    detected and stored in the source rebuild() cache. The workaround for
    renaming a file is to modify the contents of the new file in some way or
    make sure at least the modified timestamp gets updated. The workaround
    for removing a file, (which also works for renaming a file), is to
    manually update the timestamp of the directory where the node used to
    reside:

      perl -e "utime time,time,q{.}"

    Then the rebuild() cache can detect and propagate the changes to the
    destination. The other workaround is to disable the rebuild() cache
    (nocache => 1) although the dirsync() process will generally take
    longer.

AUTHOR
    Rob Brown, bbb@cpan.org

COPYRIGHT
    Copyright (C) 2002-2007, Rob Brown, bbb@cpan.org

    All rights reserved.

    This may be copied, modified, and distributed under the same terms as
    Perl itself.

SEE ALSO
    the dirsync(1) manpage, the File::Copy(3) manpage, the perl(1) manpage