Module logilab.common.textutils

Some text manipulation utility functions.






:group text formatting: normalize_text, normalize_paragraph, pretty_match, unquote, colorize_ansi
:group text manipulation: searchall, splitstrip
:sort: text formatting, text manipulation

:type ANSI_STYLES: dict(str)
:var ANSI_STYLES: dictionary mapping style identifier to ANSI terminal code

:type ANSI_COLORS: dict(str)
:var ANSI_COLORS: dictionary mapping color identifier to ANSI terminal code

:type ANSI_PREFIX: str
:var ANSI_PREFIX:
  ANSI terminal code notifying the start of an ANSI escape sequence

:type ANSI_END: str
:var ANSI_END:
  ANSI terminal code notifying the end of an ANSI escape sequence

:type ANSI_RESET: str
:var ANSI_RESET:
  ANSI terminal code resetting format defined by a previous ANSI escape sequence

Functions
 
unormalize(ustring, ignorenonascii=False)
replace diacritical characters with their corresponding ascii characters

unquote(string)
remove optional quotes (single or double) from the string

normalize_text(text, line_len=80, indent='', rest=False)
normalize a text to display it with a maximum line size and optionally arbitrary indentation.

normalize_paragraph(text, line_len=80, indent='')
normalize a text to display it with a maximum line size and optionally arbitrary indentation.

normalize_rest_paragraph(text, line_len=80, indent='')
normalize a ReST text to display it with a maximum line size and optionally arbitrary indentation.

splittext(text, line_len)
split the given text on space according to the given max line size

splitstrip(string, sep=',')
return a list of stripped strings by splitting the string given as argument on `sep` (',' by default).

apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)
Parse the string applying the units defined in units (e.g.: "1.5m", {'m': 60} -> 90).

pretty_match(match, string, underline_char='^')
return a string with the match location underlined

colorize_ansi(msg, color=None, style=None)
colorize message by wrapping it with ansi escape codes

diff_colorize_ansi(lines, out=sys.stdout, style=DIFF_STYLE)

Variables
  linesep = '\n'
  MANUAL_UNICODE_MAP = {u'\xa1': u'!', u'\u0142': u'l', u'\u2044...
  get_csv = deprecated()(splitstrip)
  BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024** 2, "gb": 1024**...
  TIME_UNITS = {"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d...
  ANSI_PREFIX = '\033['
  ANSI_END = 'm'
  ANSI_RESET = '\033[0m'
  ANSI_STYLES = {'reset': "0", 'bold': "1", 'italic': "3", 'unde...
  ANSI_COLORS = {'reset': "0", 'black': "30", 'red': "31", 'gree...
  DIFF_STYLE = {'separator': 'cyan', 'remove': 'red', 'add': 'gr...
Function Details

unormalize(ustring, ignorenonascii=False)

replace diacritical characters with their corresponding ascii characters
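
As an illustration only, the sketch below approximates this behaviour with the standard `unicodedata` module. It is not the module's implementation: `strip_diacritics` and `MANUAL_MAP` are hypothetical names, and `MANUAL_MAP` holds just a few entries copied from `MANUAL_UNICODE_MAP`.

```python
import unicodedata

# A few entries copied from MANUAL_UNICODE_MAP. These characters have no
# Unicode decomposition, so they need an explicit replacement.
MANUAL_MAP = {u'\xe6': u'ae', u'\xdf': u'ss', u'\xf8': u'o', u'\u0153': u'oe'}


def strip_diacritics(ustring):
    """Return ustring with accented letters replaced by their base letter."""
    result = []
    for char in ustring:
        if char in MANUAL_MAP:
            result.append(MANUAL_MAP[char])
            continue
        # NFKD splits an accented letter into its base letter followed by
        # combining marks; dropping the combining marks removes the accent.
        decomposed = unicodedata.normalize('NFKD', char)
        result.append(u''.join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return u''.join(result)


print(strip_diacritics(u'\xe9t\xe9 \xe0 la cr\xe8me, c\u0153ur'))
# ete a la creme, coeur
```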
    

unquote(string)

remove optional quotes (single or double) from the string

:type string: str or unicode
:param string: an optionally quoted string

:rtype: str or unicode
:return: the unquoted string (or the input string if it wasn't quoted)
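
A minimal sketch of the documented contract, assuming that one leading and one trailing quote character are each stripped when present. `unquote_sketch` is a hypothetical name, and how the real function treats mismatched quotes is not stated here.

```python
def unquote_sketch(string):
    """Strip one leading and one trailing quote character, if present."""
    quotes = ('"', "'")
    if string and string[0] in quotes:
        string = string[1:]
    # Re-check for emptiness: the input may have been a lone quote.
    if string and string[-1] in quotes:
        string = string[:-1]
    return string


print(unquote_sketch('"hello"'))  # hello
print(unquote_sketch("'hello'"))  # hello
print(unquote_sketch('hello'))    # hello
```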

normalize_text(text, line_len=80, indent='', rest=False)

normalize a text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized but blank
lines are kept. The indentation string may be used to insert a
comment (#) or a quoting (>) mark, for instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, each optionally prefixed by the indentation string
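
The description above can be approximated with the standard `textwrap` module. The following is a sketch under stated assumptions, not the module's code: `normalize_text_sketch` is a hypothetical name, paragraphs are assumed to be separated by blank lines, and the indentation is counted as part of `line_len`.

```python
import re
import textwrap


def normalize_text_sketch(text, line_len=80, indent=''):
    """Re-wrap each paragraph of text to line_len, prefixing lines with indent."""
    paragraphs = re.split(r'\n[ \t]*\n', text.strip())
    wrapped = []
    for paragraph in paragraphs:
        # Collapse line breaks and runs of whitespace inside the paragraph.
        flat = ' '.join(paragraph.split())
        # textwrap counts the indent as part of the line width.
        wrapped.append(textwrap.fill(flat, width=line_len,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
    # A line holding only the (right-stripped) indent separates paragraphs.
    separator = '\n' + indent.rstrip() + '\n'
    return separator.join(wrapped)


sample = "first paragraph, broken\nover two lines\n\nsecond paragraph"
print(normalize_text_sketch(sample, line_len=30, indent='# '))
```

With `indent='# '` the result reads as a block comment, which is the use case the description mentions.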

normalize_paragraph(text, line_len=80, indent='')

normalize a text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, each optionally prefixed by the indentation string

normalize_rest_paragraph(text, line_len=80, indent='')

normalize a ReST text to display it with a maximum line size and
optionally arbitrary indentation. Line breaks are normalized. The
indentation string may be used to insert a comment mark, for
instance.

:type text: str or unicode
:param text: the input text to normalize

:type line_len: int
:param line_len: expected maximum line length, defaults to 80

:type indent: str or unicode
:param indent: optional string to use as indentation

:rtype: str or unicode
:return:
  the input text normalized to fit on lines no longer than
  `line_len`, each optionally prefixed by the indentation string

splittext(text, line_len)

split the given text on space according to the given max line size

return a 2-tuple:
* a line <= line_len if possible
* the rest of the text, which has to be carried over to the next line
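
One way to realise the rule described above is sketched below. `splittext_sketch` is a hypothetical name, and the handling of a first word longer than `line_len` (keep the word whole and exceed the limit) is an assumption drawn from the "if possible" wording.

```python
def splittext_sketch(text, line_len):
    """Split text at a space so the first part fits in line_len if possible."""
    if len(text) <= line_len:
        return text, ''
    # Last space at or before index line_len, so the head is <= line_len long.
    pos = text.rfind(' ', 0, line_len + 1)
    if pos <= 0:
        # No usable space: the first word is too long, so cut after it.
        pos = text.find(' ', line_len)
        if pos == -1:
            return text, ''
    return text[:pos], text[pos + 1:].strip()


print(splittext_sketch('il mange du bacon', 10))
# ('il mange', 'du bacon')
```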

splitstrip(string, sep=',')

return a list of stripped strings by splitting the string given as
argument on `sep` (',' by default). Empty strings are discarded.

>>> splitstrip('a, b, c   ,  4,,')
['a', 'b', 'c', '4']
>>> splitstrip('a')
['a']

:type string: str or unicode
:param string: a csv line

:type sep: str or unicode
:param sep: field separator, defaults to the comma (',')

:rtype: list of str or unicode
:return: the stripped, non-empty fields of the input string
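
The behaviour shown in the doctest fits in a single list comprehension. This is a sketch for illustration (`splitstrip_sketch` is a hypothetical name), not necessarily how the module writes it.

```python
def splitstrip_sketch(string, sep=','):
    """Split string on sep, strip each field, and drop empty fields."""
    return [field.strip() for field in string.split(sep) if field.strip()]


print(splitstrip_sketch('a, b, c   ,  4,,'))  # ['a', 'b', 'c', '4']
print(splitstrip_sketch('x ; y', sep=';'))    # ['x', 'y']
```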

apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)

Parse the string applying the units defined in units
(e.g.: "1.5m", {'m': 60} -> 90).

:type string: str or unicode
:param string: the string to parse

:type units: dict (or any object with __getitem__ using basestring key)
:param units: a dict mapping a unit string repr to its value

:type inter: type
:param inter: used to parse every intermediate value (must support addition)

:type blank_reg: regexp
:param blank_reg: should match every blank char to ignore.

:type value_reg: regexp with "value" and optional "unit" group
:param value_reg: matches a value and its unit into the "value" and "unit" groups
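
To make the parameters concrete, here is a simplified sketch of the parsing idea. Everything in it is an assumption for illustration: `apply_units_sketch` and `VALUE_RE` are hypothetical, the pattern is not the module's `_VALUE_RE`, the `inter` parameter is omitted, and malformed input is not validated.

```python
import re

# Hypothetical pattern: a number followed by an optional alphabetic unit,
# captured in the "value" and "unit" groups.
VALUE_RE = re.compile(r'(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[a-zA-Z]+)?')


def apply_units_sketch(string, units, final=float):
    """Sum every value in string, each multiplied by its unit's factor."""
    compact = re.sub(r'\s+', '', string)  # ignore blanks between tokens
    total = 0.0
    for match in VALUE_RE.finditer(compact):
        value = float(match.group('value'))
        unit = match.group('unit')
        if unit is not None:
            value *= units[unit.lower()]  # KeyError on an unknown unit
        total += value
    return final(total)


UNITS = {'m': 60, 'h': 60 * 60}
print(apply_units_sketch('1.5m', UNITS))          # 90.0
print(apply_units_sketch('1h 30m', UNITS, int))   # 5400
```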

pretty_match(match, string, underline_char='^')

return a string with the match location underlined:

>>> import re
>>> print(pretty_match(re.search('mange', 'il mange du bacon'), 'il mange du bacon'))
il mange du bacon
   ^^^^^

:type match: _sre.SRE_match
:param match: object returned by re.match, re.search or re.finditer

:type string: str or unicode
:param string:
  the string to which the regular expression was applied to
  obtain the `match` object

:type underline_char: str or unicode
:param underline_char:
  character to use to underline the matched section, defaults to the
  caret '^'

:rtype: str or unicode
:return:
  the original string with an inserted line to underline the match
  location
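
The underline can be derived directly from the match span. The sketch below assumes a single-line `string`; `pretty_match_sketch` is a hypothetical name and multi-line input is not handled.

```python
import re


def pretty_match_sketch(match, string, underline_char='^'):
    """Return string followed by a line underlining the matched span."""
    start, end = match.span()
    # Pad with spaces up to the match, then underline its full width.
    underline = ' ' * start + underline_char * (end - start)
    return string + '\n' + underline


text = 'il mange du bacon'
print(pretty_match_sketch(re.search('mange', text), text))
```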

colorize_ansi(msg, color=None, style=None)

colorize message by wrapping it with ansi escape codes

:type msg: str or unicode
:param msg: the message string to colorize

:type color: str or None
:param color:
  the color identifier (see `ANSI_COLORS` for available values)

:type style: str or None
:param style:
  style string (see `ANSI_STYLES` for available values). To get
  several style effects at the same time, use a comma as separator.

:raise KeyError: if a nonexistent color or style identifier is given

:rtype: str or unicode
:return: the ansi escaped string
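
The sketch below shows how the `ANSI_PREFIX`, `ANSI_END` and `ANSI_RESET` constants could combine into an escaped string. It is an assumption-laden illustration: `colorize_sketch`, `STYLES` and `COLORS` are hypothetical names, the two dicts are subsets of `ANSI_STYLES` and `ANSI_COLORS`, and joining codes with ';' follows the usual ANSI SGR convention rather than anything stated on this page.

```python
ANSI_PREFIX = '\033['
ANSI_END = 'm'
ANSI_RESET = '\033[0m'
STYLES = {'bold': '1', 'underline': '4'}  # subset of ANSI_STYLES
COLORS = {'red': '31', 'green': '32'}     # subset of ANSI_COLORS


def colorize_sketch(msg, color=None, style=None):
    """Wrap msg in an ANSI escape sequence built from style and color."""
    codes = []
    if style:
        # Several styles may be given, separated by commas.
        codes.extend(STYLES[name.strip()] for name in style.split(','))
    if color:
        codes.append(COLORS[color])
    if not codes:
        return msg
    return ANSI_PREFIX + ';'.join(codes) + ANSI_END + msg + ANSI_RESET


print(repr(colorize_sketch('error', color='red', style='bold')))
# '\x1b[1;31merror\x1b[0m'
```

An unknown identifier raises `KeyError` from the dictionary lookup, matching the documented exception.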


Variables Details

MANUAL_UNICODE_MAP

Value:
{u'\xa1': u'!', u'\u0142': u'l', u'\u2044': u'/', u'\xc6': u'AE',
 u'\xa9': u'(c)', u'\xab': u'"', u'\xe6': u'ae', u'\xae': u'(r)',
 u'\u0153': u'oe', u'\u0152': u'OE', u'\xd8': u'O', u'\xf8': u'o',
 u'\xbb': u'"', u'\xdf': u'ss',}

BYTE_UNITS

Value:
{"b": 1, "kb": 1024, "mb": 1024** 2, "gb": 1024** 3, "tb": 1024** 4,}

TIME_UNITS

Value:
{"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d": 60* 60* 24,}

ANSI_STYLES

Value:
{'reset': "0", 'bold': "1", 'italic': "3", 'underline': "4",
 'blink': "5", 'inverse': "7", 'strike': "9",}

ANSI_COLORS

Value:
{'reset': "0", 'black': "30", 'red': "31", 'green': "32", 'yellow': "33",
 'blue': "34", 'magenta': "35", 'cyan': "36", 'white': "37",}

DIFF_STYLE

Value:
{'separator': 'cyan', 'remove': 'red', 'add': 'green'}