
Module CedarBackup2.util

Provides general-purpose utilities.

Author: Kenneth J. Pronovici <pronovic@ieee.org>

Classes
AbsolutePathList Class representing a list of absolute paths.
ObjectTypeList Class representing a list containing only objects with a certain type.
PathResolverSingleton Singleton used for resolving executable paths.
Pipe Specialized pipe class for use by executeCommand.
RestrictedContentList Class representing a list containing only objects with certain values.
UnorderedList Class representing an "unordered list".

Function Summary
  convertSize(size, fromUnit, toUnit)
Converts a size in one unit to a size in another unit.
  getUidGid(user, group)
Gets the uid/gid associated with a user/group pair.
  changeOwnership(path, user, group)
Changes ownership of path to match the user and group.
  splitCommandLine(commandLine)
Splits a command line string into a list of arguments.
  resolveCommand(command)
Resolves the real path to a command through the path resolver mechanism.
  executeCommand(command, args, returnOutput, ignoreStderr, doNotLog, outputFile)
Executes a shell command, hopefully in a safe way.
  calculateFileAge(file)
Calculates the age (in days) of a file.
  encodePath(path)
Safely encodes a filesystem path.
  nullDevice()
Attempts to portably return the null device on this system.
  deviceMounted(devicePath)
Indicates whether a specific filesystem device is currently mounted.
  displayBytes(bytes, digits)
Format a byte quantity so it can be sensibly displayed.
  getFunctionReference(module, function)
Gets a reference to a named function.
  mount(devicePath, mountPoint, fsType)
Mounts the indicated device at the indicated mount point.
  unmount(mountPoint, removeAfter, attempts, waitSeconds)
Unmounts whatever device is mounted at the indicated mount point.

Variable Summary
float ISO_SECTOR_SIZE: Size of an ISO image sector, in bytes.
float BYTES_PER_SECTOR: Number of bytes (B) per ISO sector.
float BYTES_PER_KBYTE: Number of bytes (B) per kilobyte (kB).
float BYTES_PER_MBYTE: Number of bytes (B) per megabyte (MB).
float BYTES_PER_GBYTE: Number of bytes (B) per gigabyte (GB).
float KBYTES_PER_MBYTE: Number of kilobytes (kB) per megabyte (MB).
float MBYTES_PER_GBYTE: Number of megabytes (MB) per gigabyte (GB).
int SECONDS_PER_MINUTE: Number of seconds per minute.
int MINUTES_PER_HOUR: Number of minutes per hour.
int HOURS_PER_DAY: Number of hours per day.
int SECONDS_PER_DAY: Number of seconds per day.
int UNIT_BYTES: Constant representing the byte (B) unit for conversion.
int UNIT_KBYTES: Constant representing the kilobyte (kB) unit for conversion.
int UNIT_MBYTES: Constant representing the megabyte (MB) unit for conversion.
int UNIT_SECTORS: Constant representing the ISO sector unit for conversion.
Logger logger = <logging.Logger instance at 0x3ad5130c>
list MOUNT_COMMAND = ['mount']
str MTAB_FILE = '/etc/mtab'
Logger outputLogger = <logging.Logger instance at 0x3ad512ec>
list UMOUNT_COMMAND = ['umount']

Function Details

convertSize(size, fromUnit, toUnit)

Converts a size in one unit to a size in another unit.

This is just a convenience function so that the functionality can be implemented in just one place. Internally, we convert values to bytes and then to the final unit.

The available units are:
  • UNIT_BYTES - Bytes
  • UNIT_KBYTES - Kilobytes, where 1kB = 1024B
  • UNIT_MBYTES - Megabytes, where 1MB = 1024kB
  • UNIT_SECTORS - Sectors, where 1 sector = 2048B
Parameters:
size - Size to convert
           (type=Integer or float value in units of fromUnit)
fromUnit - Unit to convert from
           (type=One of the units listed above)
toUnit - Unit to convert to
           (type=One of the units listed above)
Returns:
Number converted to new unit, as a float.
Raises:
ValueError - If one of the units is invalid.
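A minimal Python 3 sketch of the byte-pivot approach described above. It assumes the unit constants carry the values listed under Variable Details; the lookup table is an illustrative choice, not necessarily how the module implements it.

```python
# Byte counts per unit, mirroring the constants documented in this module.
BYTES_PER_SECTOR = 2048.0
BYTES_PER_KBYTE = 1024.0
BYTES_PER_MBYTE = 1048576.0

UNIT_BYTES = 0
UNIT_KBYTES = 1
UNIT_MBYTES = 2
UNIT_SECTORS = 3

# Number of bytes in one of each unit (lookup table is an assumption).
_BYTES_PER_UNIT = {
    UNIT_BYTES: 1.0,
    UNIT_KBYTES: BYTES_PER_KBYTE,
    UNIT_MBYTES: BYTES_PER_MBYTE,
    UNIT_SECTORS: BYTES_PER_SECTOR,
}

def convertSize(size, fromUnit, toUnit):
    """Convert size from fromUnit to toUnit, going through bytes."""
    if fromUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown 'from' unit %s." % fromUnit)
    if toUnit not in _BYTES_PER_UNIT:
        raise ValueError("Unknown 'to' unit %s." % toUnit)
    byteValue = float(size) * _BYTES_PER_UNIT[fromUnit]   # first to bytes
    return byteValue / _BYTES_PER_UNIT[toUnit]            # then to target
```

For example, convertSize(1, UNIT_MBYTES, UNIT_SECTORS) gives 512.0, since 1048576 / 2048 = 512.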

getUidGid(user, group)

Gets the uid/gid associated with a user/group pair.

This is a no-op if user/group functionality is not available on the platform.
Parameters:
user - User name
           (type=User name as a string)
group - Group name
           (type=Group name as a string)
Returns:
Tuple (uid, gid) matching passed-in user and group.
Raises:
ValueError - If the ownership user/group values are invalid
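A minimal POSIX-only sketch using the standard pwd and grp modules. It assumes those modules are importable and omits the no-op fallback for platforms without user/group support; the error message is illustrative.

```python
import grp
import pwd

def getUidGid(user, group):
    """Return (uid, gid) for the named user and group."""
    try:
        uid = pwd.getpwnam(user).pw_uid
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        # pwd/grp raise KeyError for unknown names; report it as ValueError.
        raise ValueError("Unable to look up uid/gid for %s/%s." % (user, group))
    return (uid, gid)
```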

changeOwnership(path, user, group)

Changes ownership of path to match the user and group. This is a no-op if user/group functionality is not available on the platform.
Parameters:
path - Path whose ownership to change.
user - User that should own the file.
group - Group that should own the file.

splitCommandLine(commandLine)

Splits a command line string into a list of arguments.

Unfortunately, there is no "standard" way to parse a command line string, and it's actually not an easy problem to solve portably (essentially, we have to emulate the shell argument-processing logic). This code only respects double quotes (") for grouping arguments, not single quotes ('). Make sure you take this into account when building your command line.

Incidentally, I found this particular parsing method while digging around in Google Groups, and I tweaked it for my own use.
Parameters:
commandLine - Command line string
           (type=String, e.g. "cback --verbose stage store")
Returns:
List of arguments, suitable for passing to popen2.
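The Google Groups parsing method mentioned above isn't reproduced here. As a stand-in, this sketch uses a regular expression that treats a double-quoted run as one argument (dropping the quotes) and splits everything else on whitespace, so single quotes get no special treatment. Edge cases such as escaped quotes are not handled.

```python
import re

# A token is either a double-quoted run (group 1, quotes dropped)
# or a run of non-whitespace characters (group 2).
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def splitCommandLine(commandLine):
    """Split on whitespace, grouping arguments only with double quotes."""
    args = []
    for match in _TOKEN.finditer(commandLine):
        if match.group(1) is not None:
            args.append(match.group(1))   # double-quoted argument
        else:
            args.append(match.group(2))   # bare word, single quotes kept
    return args
```

For comparison, the standard library's shlex.split() honors both quote styles, so it would not match the behavior documented here.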

resolveCommand(command)

Resolves the real path to a command through the path resolver mechanism.

Both extensions and standard Cedar Backup functionality need a way to resolve the "real" location of various executables. Normally, they assume that these executables are on the system path, but some callers need to specify an alternate location.

Ideally, we want to handle this configuration in a central location. The Cedar Backup path resolver mechanism (a singleton called PathResolverSingleton) provides the central location to store the mappings. This function wraps access to the singleton, and is what all functions (extensions or standard functionality) should call if they need to find a command.

The passed-in command must actually be a list, in the standard form used by all existing Cedar Backup code (something like ["svnlook", ]). The lookup will actually be done on the first element in the list, and the returned command will always be in list form as well.

If the passed-in command can't be resolved or no mapping exists, then the command itself will be returned unchanged. This way, we neatly fall back on default behavior if we have no sensible alternative.
Parameters:
command - Command to resolve.
           (type=List form of command, e.g. ["svnlook", ].)
Returns:
Resolved command in list form, or the command itself, unchanged, if no mapping exists.
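A minimal sketch of the fallback behavior described above. The PathResolverSingleton stand-in and its getInstance(), fill() and lookup() methods are hypothetical here, chosen only to illustrate the idea; consult the actual class for its interface.

```python
class PathResolverSingleton(object):
    """Hypothetical stand-in holding a name -> absolute path mapping."""
    _instance = None

    def __init__(self):
        self._mapping = {}

    @classmethod
    def getInstance(cls):
        # Create the single shared instance on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def fill(self, mapping):
        self._mapping = dict(mapping)

    def lookup(self, name, default=None):
        return self._mapping.get(name, default)


def resolveCommand(command):
    """Resolve command[0] through the singleton; the result is a list."""
    resolved = PathResolverSingleton.getInstance().lookup(command[0])
    if resolved is None:
        return command               # no mapping: fall back on the command
    return [resolved] + list(command[1:])
```

With a hypothetical mapping of "svnlook" to "/opt/svn/bin/svnlook", resolveCommand(["svnlook", ]) returns ["/opt/svn/bin/svnlook"], while an unmapped command such as ["ls", "-l"] comes back untouched.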

executeCommand(command, args, returnOutput=False, ignoreStderr=False, doNotLog=False, outputFile=None)

Executes a shell command, hopefully in a safe way.

This function exists to replace direct calls to os.popen() in the Cedar Backup code. It's not safe to call a function such as os.popen() with untrusted arguments, since that can cause problems if the string contains non-safe variables or other constructs (imagine that the argument is $WHATEVER, but $WHATEVER contains something like "; rm -fR ~/; echo" in the current environment).

Instead, it's safer to pass a list of arguments in the style supported by popen2 or popen4. This function actually uses a specialized Pipe class implemented using either subprocess.Popen or popen2.Popen4.

In the normal case, this function returns a tuple of (status, None), where status is the wait-encoded return status of the call per the popen2.Popen4 documentation. If returnOutput is passed in as True, the function returns a tuple of (status, output), where output is a list of strings, one entry per line of output from the command. Unless doNotLog is True, output is always logged to the outputLogger.info() target, regardless of whether it's returned.

By default, stdout and stderr will be intermingled in the output. However, if you pass in ignoreStderr=True, then only stdout will be included in the output.

The doNotLog parameter exists so that callers can prevent the function from logging command output to the debug log. Normally, you would want to log. However, if you're using this function to write huge output files (e.g. database backups written to stdout), then you might want to avoid putting all that information into the debug log.

The outputFile parameter exists to make it easier for a caller to push output into a file, as a substitute for redirection to a file. If this value is passed in, each line of output is written to the file using outputFile.write() as it is generated. At the end, the file object is flushed using outputFile.flush(). The caller remains responsible for closing the file object appropriately.
Parameters:
command - Shell command to execute
           (type=List of individual arguments that make up the command)
args - List of arguments to the command
           (type=List of additional arguments to the command)
returnOutput - Indicates whether to return the output of the command
           (type=Boolean True or False)
ignoreStderr - Indicates whether to exclude stderr from the output
           (type=Boolean True or False)
doNotLog - Indicates that output should not be logged.
           (type=Boolean True or False)
outputFile - File object that all output should be written to.
           (type=File object as returned from open() or file().)
Returns:
Tuple of (status, output) as described above.

Notes:

  • I know that it's a bit confusing that the command and the arguments are both lists. I could have just required the caller to pass in one big list. However, I think it makes some sense to keep the command (the constant part of what we're executing, e.g. "scp -B") separate from its arguments, even if they both end up looking kind of similar.
  • You cannot redirect output via shell constructs (e.g. >file or 2>/dev/null) using this function. The redirection string would be passed to the command just like any other argument. However, you can implement the equivalent of redirection using ignoreStderr and outputFile, as discussed above.
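A rough Python 3 sketch of the same idea using subprocess.Popen with an argument list, so no shell ever parses the arguments. Two assumptions differ from the description above: the returned status is subprocess's plain exit code rather than a wait-encoded status, and the logger name "output" is a hypothetical stand-in for outputLogger.

```python
import logging
import subprocess

# Hypothetical stand-in for the module's outputLogger.
outputLogger = logging.getLogger("output")

def executeCommand(command, args, returnOutput=False, ignoreStderr=False,
                   doNotLog=False, outputFile=None):
    """Run command + args without a shell; return (status, output or None)."""
    fields = list(command) + list(args)   # list form: nothing is shell-parsed
    # Either discard stderr or merge it into stdout.
    stderr = subprocess.DEVNULL if ignoreStderr else subprocess.STDOUT
    proc = subprocess.Popen(fields, stdout=subprocess.PIPE, stderr=stderr,
                            universal_newlines=True)
    output = []
    for line in proc.stdout:
        if returnOutput:
            output.append(line)
        if outputFile is not None:
            outputFile.write(line)        # substitute for ">file" redirection
        if not doNotLog:
            outputLogger.info(line.rstrip("\n"))
    proc.stdout.close()
    status = proc.wait()
    if outputFile is not None:
        outputFile.flush()                # the caller still closes the file
    if returnOutput:
        return (status, output)
    return (status, None)
```

Passing a list to Popen, and leaving shell=False (the default), is what keeps a value like "; rm -fR ~/; echo" from ever being interpreted by a shell.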

calculateFileAge(file)

Calculates the age (in days) of a file.

The "age" of a file is the amount of time since the file was last used, per the most recent of the file's st_atime and st_mtime values.

Technically, we only intend this function to work with files, but it will probably work with anything on the filesystem.
Parameters:
file - Path to a file on disk.
Returns:
Age of the file in days.
Raises:
OSError - If the file doesn't exist.
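A minimal sketch, assuming the age is measured against the current wall-clock time and returned as a fractional number of days.

```python
import os
import time

SECONDS_PER_DAY = 86400

def calculateFileAge(file):
    """Return days since the later of the file's st_atime and st_mtime."""
    info = os.stat(file)                         # raises OSError if missing
    lastUse = max(info.st_atime, info.st_mtime)  # most recent use wins
    return (time.time() - lastUse) / SECONDS_PER_DAY
```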

encodePath(path)

Safely encodes a filesystem path.

Many Python filesystem functions, such as os.listdir, behave differently if they are passed unicode arguments versus simple string arguments. For instance, os.listdir generally returns unicode path names if it is passed a unicode argument, and string pathnames if it is passed a string argument.

However, this behavior often isn't as consistent as we might like. As an example, os.listdir "gives up" if it finds a filename that it can't properly encode given the current locale settings. This means that the returned list is a mixed set of unicode and simple string paths. This has consequences later, because other filesystem functions like os.path.join will blow up if they are given one string path and one unicode path.

On comp.lang.python, Martin v. Löwis explained the os.listdir behavior like this:
  The operating system (POSIX) does not have the inherent notion that file
  names are character strings. Instead, in POSIX, file names are primarily
  byte strings. There are some bytes which are interpreted as characters
  (e.g. '\x2e', which is '.', or '\x2f', which is '/'), but apart from
  that, most OS layers think these are just bytes.

  Now, most *people* think that file names are character strings.  To
  interpret a file name as a character string, you need to know what the
  encoding is to interpret the file names (which are byte strings) as
  character strings.

  There is, unfortunately, no operating system API to carry the notion of a
  file system encoding. By convention, the locale settings should be used
  to establish this encoding, in particular the LC_CTYPE facet of the
  locale. This is defined in the environment variables LC_CTYPE, LC_ALL,
  and LANG (searched in this order).

  If LANG is not set, the "C" locale is assumed, which uses ASCII as its
  file system encoding. In this locale, '♪♬' is not a
  valid file name (at least it cannot be interpreted as characters, and
  hence not be converted to Unicode).

  Now, your Python script has requested that all file names *should* be
  returned as character (ie. Unicode) strings, but Python cannot comply,
  since there is no way to find out what this byte string means, in terms
  of characters.

  So we have three options:

  1. Skip this string, only return the ones that can be converted to Unicode. 
     Give the user the impression the file does not exist.
  2. Return the string as a byte string
  3. Refuse to listdir altogether, raising an exception (i.e. return nothing)

  Python has chosen alternative 2, allowing the application to implement 1
  or 3 on top of that if it wants to (or come up with other strategies,
  such as user feedback).

As a solution, he suggests that rather than passing unicode paths into the filesystem functions, I should sensibly encode the path first. That is what this function accomplishes. Any function which takes a filesystem path as an argument should encode it first, before using it for any other purpose.

I confess I still don't completely understand how this works. On a system with filesystem encoding "ISO-8859-1", a path u"♪♬" is converted into the string "♪♬". However, on a system with a "utf-8" encoding, the result is a completely different string: "♪♬". A quick test where I write to the first filename and open the second proves that the two strings represent the same file on disk, which is all I really care about.
Parameters:
path - Path to encode
Returns:
Path, as a string, encoded appropriately
Raises:
ValueError - If the path cannot be encoded properly.

Notes:

  • As a special case, if path is None, then this function will return None.
  • To provide several examples of encoding values, my Debian sarge box with an ext3 filesystem has Python filesystem encoding ISO-8859-1. User Anarcat's Debian box with an xfs filesystem has filesystem encoding ANSI_X3.4-1968. Both my iBook G4 running Mac OS X 10.4 and user Dag Rende's SuSE 9.3 box have filesystem encoding UTF-8.
  • Just because a filesystem has UTF-8 encoding doesn't mean that it will be able to handle all extended-character filenames. For instance, certain extended-character (but not UTF-8) filenames -- like the ones in the regression test tar file test/data/tree13.tar.gz -- are not valid under Mac OS X, and it's not even possible to extract them from the tarfile on that platform.
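In Python 3 the standard library covers this with os.fsencode(). The sketch below instead mirrors the documented contract (None passes through, encoding failures become ValueError) and adds a hypothetical encoding parameter, which is not part of the documented signature, so that the ISO-8859-1 and UTF-8 cases discussed above can be compared on one machine.

```python
import sys

def encodePath(path, encoding=None):
    """Return path as a byte string; None is passed through unchanged."""
    if path is None:
        return None                          # documented special case
    if isinstance(path, bytes):
        return path                          # already encoded
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        return path.encode(encoding)
    except UnicodeError as e:
        raise ValueError("Path could not be encoded as %s: %s" % (encoding, e))
```

With path u"\xe2\x99\xaa\xe2\x99\xac", ISO-8859-1 yields the six bytes e2 99 aa e2 99 ac, while UTF-8 yields twelve bytes beginning c3 a2, which is why the same path produces different strings under the two encodings.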

nullDevice()

Attempts to portably return the null device on this system.

The null device is something like /dev/null on a UNIX system. The name varies on other platforms.

In Python 2.4 and better, we can use os.devnull. Since we want to be portable to Python 2.3, getting the value in earlier versions of Python takes some screwing around. Basically, this function will only work on either UNIX-like systems (the default) or Windows.
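A minimal sketch of that fallback logic, assuming "NUL" is the Windows null device and "/dev/null" is correct for every other platform.

```python
import os

def nullDevice():
    """Return the name of this platform's null device."""
    devnull = getattr(os, "devnull", None)   # os.devnull exists from Python 2.4
    if devnull is not None:
        return devnull
    # Older interpreters: assume either Windows or a UNIX-like system.
    if os.name == "nt":
        return "NUL"
    return "/dev/null"
```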

deviceMounted(devicePath)

Indicates whether a specific filesystem device is currently mounted.

We determine whether the device is mounted by looking through the system's mtab file. This file lists every currently-mounted filesystem, one per line, with the device in the first field. We only do the check if the mtab file exists and is readable. Otherwise, we assume that the device is not mounted.
Parameters:
devicePath - Path of device to be checked
Returns:
True if device is mounted, False otherwise.

Note: This only works on platforms that have a concept of an mtab file to show mounted volumes, like UNIXes. It won't work on Windows.
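A minimal sketch that scans the mtab file for a line whose first (device) field matches devicePath exactly. The mtabFile parameter is a hypothetical addition for testing. An exact string comparison won't match a device reached through a symlink (for example /dev/cdrom pointing at /dev/sr0); a fuller implementation might compare os.path.realpath() results instead.

```python
import os

MTAB_FILE = "/etc/mtab"

def deviceMounted(devicePath, mtabFile=MTAB_FILE):
    """Return True if devicePath is the device field of any mtab line."""
    if not (os.path.exists(mtabFile) and os.access(mtabFile, os.R_OK)):
        return False                    # no readable mtab: assume not mounted
    with open(mtabFile) as mtab:
        for line in mtab:
            fields = line.split()       # device, mount point, type, options...
            if fields and fields[0] == devicePath:
                return True
    return False
```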

displayBytes(bytes, digits=2)

Format a byte quantity so it can be sensibly displayed.

It's rather difficult to look at a number like "72372224 bytes" and get any meaningful information out of it. It would be more useful to see something like "69.02 MB" (using 1MB = 1048576B, as elsewhere in this module). That's what this function does. Any time you want to display a byte value, for example:
  print "Size: %s bytes" % bytes
Call this function instead:
  print "Size: %s" % displayBytes(bytes)
What comes out will be sensibly formatted. The indicated number of digits will be listed after the decimal point, rounded based on whatever rules are used by Python's standard %f string format specifier.
Parameters:
bytes - Byte quantity.
           (type=Integer number of bytes.)
digits - Number of digits to display after the decimal point.
           (type=Integer value, typically 2-5.)
Returns:
String, formatted for sensible display.
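A minimal sketch, assuming binary multiples (1 kB = 1024 B) and the unit labels shown; the thresholds and the plain "bytes" format for values under 1 kB are assumptions of this sketch.

```python
# Binary multiples, matching the BYTES_PER_* constants in this module.
BYTES_PER_KBYTE = 1024.0
BYTES_PER_MBYTE = 1048576.0
BYTES_PER_GBYTE = 1073741824.0

def displayBytes(bytes, digits=2):   # parameter name follows the documented signature
    """Format a byte count as bytes, kB, MB or GB."""
    value = float(bytes)
    size = abs(value)
    if size < BYTES_PER_KBYTE:
        return "%.0f bytes" % value          # whole bytes need no decimals
    if size < BYTES_PER_MBYTE:
        unit, divisor = "kB", BYTES_PER_KBYTE
    elif size < BYTES_PER_GBYTE:
        unit, divisor = "MB", BYTES_PER_MBYTE
    else:
        unit, divisor = "GB", BYTES_PER_GBYTE
    # "%.*f" takes the precision from the argument tuple.
    return "%.*f %s" % (digits, value / divisor, unit)
```

Under these assumptions, displayBytes(72372224) returns "69.02 MB".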

getFunctionReference(module, function)

Gets a reference to a named function.

This does some hokey-pokey to get back a reference to a dynamically named function. For instance, say you wanted to get a reference to the os.path.isdir function. You could use:
  myfunc = getFunctionReference("os.path", "isdir")

Although we won't bomb out directly, behavior is pretty much undefined if you pass in None or "" for either module or function.

The only validation we enforce is that whatever we get back must be callable.

I derived this code based on the internals of the Python unittest implementation. I don't claim to completely understand how it works.
Parameters:
module - Name of module associated with function.
           (type=Something like "os.path" or "CedarBackup2.util")
function - Name of function
           (type=Something like "isdir" or "getUidGid")
Returns:
Reference to function associated with name.
Raises:
ImportError - If the function cannot be found.
ValueError - If the resulting reference is not callable.

Copyright: Some of this code, prior to customization, was originally part of the Python 2.3 codebase. Python code is copyright (c) 2001, 2002 Python Software Foundation; All Rights Reserved.
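The unittest-derived approach isn't reproduced here. This sketch swaps in importlib.import_module() (available since Python 2.7) to load the module, then uses getattr(), raising ImportError and ValueError in the cases documented above.

```python
import importlib

def getFunctionReference(module, function):
    """Return the callable named `function` from the dotted `module` name."""
    mod = importlib.import_module(module)    # ImportError if module is missing
    try:
        ref = getattr(mod, function)
    except AttributeError:
        raise ImportError("Unable to find function %s in module %s."
                          % (function, module))
    if not callable(ref):
        raise ValueError("Reference %s.%s is not callable." % (module, function))
    return ref
```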

mount(devicePath, mountPoint, fsType)

Mounts the indicated device at the indicated mount point.

For instance, to mount a CD, you might use device path /dev/cdrw, mount point /media/cdrw and filesystem type iso9660. You can safely use any filesystem type that is supported by mount on your platform. If the type is None, we'll attempt to let mount auto-detect it. This may not work on all systems.
Parameters:
devicePath - Path of device to be mounted.
mountPoint - Path that device should be mounted at.
fsType - Type of the filesystem assumed to be available via the device.
Raises:
IOError - If the device cannot be mounted.

Note: This only works on platforms that have a concept of "mounting" a filesystem through a command-line "mount" command, like UNIXes. It won't work on Windows.

unmount(mountPoint, removeAfter=False, attempts=1, waitSeconds=0)

Unmounts whatever device is mounted at the indicated mount point.

Sometimes, it might not be possible to unmount the mount point immediately, if there are still files open there. Use the attempts and waitSeconds arguments to indicate how many unmount attempts to make and how many seconds to wait between attempts. If you pass in zero attempts, no attempts will be made (duh).

If the indicated mount point is not really a mount point per os.path.ismount(), then it will be ignored. This seems to be a safer check than looking through /etc/mtab, since ismount() is already in the Python standard library and is documented as working on all POSIX systems.

If removeAfter is True, then the mount point will be removed using os.rmdir() after the unmount action succeeds. If for some reason the mount point is not a directory, then it will not be removed.
Parameters:
mountPoint - Mount point to be unmounted.
removeAfter - Remove the mount point after unmounting it.
attempts - Number of times to attempt the unmount.
waitSeconds - Number of seconds to wait between repeated attempts.
Raises:
IOError - If the mount point is still mounted after attempts are exhausted.

Note: This only works on platforms that have a concept of "mounting" a filesystem through a command-line "mount" command, like UNIXes. It won't work on Windows.
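A sketch of the retry loop, assuming success is judged by re-checking os.path.ismount() after each umount call rather than by umount's exit status. The runUmount and isMounted parameters are hypothetical hooks added so the loop can be exercised without root privileges; with attempts=0 this sketch raises IOError if the mount point is still mounted.

```python
import os
import subprocess
import time

UMOUNT_COMMAND = ["umount"]

def _runUmount(mountPoint):
    """Hypothetical helper: run "umount <mountPoint>", return exit status."""
    return subprocess.call(UMOUNT_COMMAND + [mountPoint])

def unmount(mountPoint, removeAfter=False, attempts=1, waitSeconds=0,
            runUmount=_runUmount, isMounted=os.path.ismount):
    """Unmount mountPoint, retrying up to `attempts` times."""
    if not isMounted(mountPoint):
        return                             # not a mount point: ignore it
    for attempt in range(attempts):
        if attempt > 0:
            time.sleep(waitSeconds)        # give open files a chance to close
        runUmount(mountPoint)
        if not isMounted(mountPoint):
            break
    if isMounted(mountPoint):
        raise IOError("Unable to unmount %s after %d attempt(s)."
                      % (mountPoint, attempts))
    if removeAfter and os.path.isdir(mountPoint):
        os.rmdir(mountPoint)               # only removed once unmounted
```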


Variable Details

ISO_SECTOR_SIZE

Size of an ISO image sector, in bytes.
Type:
float
Value:
2048.0                                                                

BYTES_PER_SECTOR

Number of bytes (B) per ISO sector.
Type:
float
Value:
2048.0                                                                

BYTES_PER_KBYTE

Number of bytes (B) per kilobyte (kB).
Type:
float
Value:
1024.0                                                                

BYTES_PER_MBYTE

Number of bytes (B) per megabyte (MB).
Type:
float
Value:
1048576.0                                                             

BYTES_PER_GBYTE

Number of bytes (B) per gigabyte (GB).
Type:
float
Value:
1073741824.0                                                          

KBYTES_PER_MBYTE

Number of kilobytes (kB) per megabyte (MB).
Type:
float
Value:
1024.0                                                                

MBYTES_PER_GBYTE

Number of megabytes (MB) per gigabyte (GB).
Type:
float
Value:
1024.0                                                                

SECONDS_PER_MINUTE

Number of seconds per minute.
Type:
int
Value:
60                                                                    

MINUTES_PER_HOUR

Number of minutes per hour.
Type:
int
Value:
60                                                                    

HOURS_PER_DAY

Number of hours per day.
Type:
int
Value:
24                                                                    

SECONDS_PER_DAY

Number of seconds per day.
Type:
int
Value:
86400                                                                 

UNIT_BYTES

Constant representing the byte (B) unit for conversion.
Type:
int
Value:
0                                                                     

UNIT_KBYTES

Constant representing the kilobyte (kB) unit for conversion.
Type:
int
Value:
1                                                                     

UNIT_MBYTES

Constant representing the megabyte (MB) unit for conversion.
Type:
int
Value:
2                                                                     

UNIT_SECTORS

Constant representing the ISO sector unit for conversion.
Type:
int
Value:
3                                                                     

logger

Type:
Logger
Value:
<logging.Logger instance at 0x3ad5130c>                                

MOUNT_COMMAND

Type:
list
Value:
['mount']                                                              

MTAB_FILE

Type:
str
Value:
'/etc/mtab'                                                            

outputLogger

Type:
Logger
Value:
<logging.Logger instance at 0x3ad512ec>                                

UMOUNT_COMMAND

Type:
list
Value:
['umount']                                                             

Generated by Epydoc 2.1 on Mon Sep 4 13:49:33 2006 http://epydoc.sf.net