A general purpose, extensible search utility, written in Python.
search
is capable of searching for regular expressions in: text files; path
names; or symbol names in object files. It has a flexible module system to allow
the authoring of additional types of search.
Copyright © 2017-2018 Jonathan Simmonds
search
requires:
- Python 2.6+
Additionally the provided modules require:
files
:grep
(currently GNU and BSD variants are supported)symbols
:objdump
(currently GNU and LLVM variants are supported)
All files are licensed under the MIT license.
usage: search [-h] [--version] [dirs | files | symbols [-u]] [-i] [-v]
[path [path ...]] regex
A module-based, recursive file searching utility.
positional arguments:
path Optional path(s) to perform the search in or on. If
omitted the current working directory is used.
regex Perl-style regular expression to search for. It is
recommended to pass this in single quotes to prevent
shell expansion or interpretation of the regex
characters.
global arguments:
-h, --help Show this help message and exit.
--version Show the version number of the program and its
installed modules and exit.
search modules:
{dirs,files,symbols} Select which search module to use. Defaults to files.
dirs Search recursively on the file names of any files in
the given paths.
files Search recursively on the contents of any files in the
given paths.
symbols Search recursively in any object files or archives for
symbols with names matching regex.
common arguments:
-i, --ignore-case Enable case-insensitive searching.
-v, --verbose Enable verbose, full replication of the result column,
even if it means taking multiple lines per match (by
default the result will be condensed to keep one line
per match if possible).
dirs module:
no additional arguments
files module:
no additional arguments
symbols module:
optional arguments:
-u, --undefined Also print undefined symbols (i.e. in objects which
reference but don't define the symbol).
$ search search_modules 'def search\('
search_modules/files.py:65 def search(regex, paths, args, ignore_case=False, verbose=False):
search_modules/dirs.py:42 def search(regex, paths, args, ignore_case=False, verbose=False):
search_modules/symbols.py:485 def search(regex, paths, args, ignore_case=False, verbose=False):
$ search -i symbols -u test/symbols/*.o DIV
test/symbols/math_mul.o
Function symbol:
Name: _div
Section: __TEXT,__text
Value: 0x60
Size: 0x0
test/symbols/util_number.o
Undefined symbol:
Name: _div
$ search dirs '.\.md'
./README.md
search
has been designed from the ground up to be extensible and has a module
system allowing the contribution of custom search modules to enable new ways to
search.
By itself the search
utility does nothing - it is a CLI driver: loading and
initializing all available modules, parsing the command line and directing the
search request to the appropriate module.
search
comes with three provided modules:
dirs
: Search recursively on the file names of any files in the given paths, similar to the Unixfind
command.files
: Search recursively on the contents of any files in the given paths, similar to the Unixgrep
command. This is the default module which will be used if no module is specified.symbols
: Search recursively in any object files or archives for symbols with names matching a regex - similar to anobjdump | grep
pipeline.
If you have been given an additional module, you may install it simply by
placing it in the search_modules
directory alongside the search
executable.
The driver will load all modules in the search_modules
directory alongside the
search
executable. With each of these it will bind the following methods:
-
create_subparser(subparsers)
This method will be called by the driver during module initialization to allow the module to add a subparser to the main parser. This will then automatically contribute help text to the driver and allow selecting of the module in a query. Additional, module-specific arguments can be added to the subparser if necessary. NB: Any added subparser must use the
add_help=False
keyword argument to prevent automatically adding help options. Failure to do so will result in an exception when loading the module - help options are added and handled by the driver.The arguments are as follows:
subparsers
: Special handle object (argparse._SubParsersAction
) which can be used to add subparsers to a parser.
The return is as follows:
- Object representing the created subparser.
-
search(regex, paths, args, ignore_case, verbose)
This method will be called to process a search query.
The arguments are as follows:
regex
: String regular expression to search with.paths
: List of strings representing the paths to search in/on.args
: Namespace containing all parsed arguments. If the subparser added additional arguments these will be present.ignore_case
: Boolean, True if the search should be case-insensitive, False if it should be case-sensitive.verbose
: Boolean, True for verbose output, False otherwise.
The return is as follows:
- Not expected to return anything. Any output must be printed by the method itself.
The module loading will fail if these methods cannot be bound.
Putting all this together, if we wanted to add a new dummy
module, all we
would have to do is place a new file dummy.py
in the search_modules
directory. The most basic, functional contents would look like the following:
def search(regex, paths, args, ignore_case=False, verbose=False):
"""Perform the requested search.
Args:
regex: String regular expression to search with.
paths: List of strings representing the paths to search in/on.
args: Namespace containing all parsed arguments. If the
subparser added additional arguments these will be present.
ignore_case: Boolean, True if the search should be case-insensitive,
False if it should be case-sensitive.
verbose: Boolean, True for verbose output, False otherwise.
"""
for path in paths:
pass # Do some kind of searching here...
def create_subparser(subparsers):
"""Creates this module's subparser.
Args:
subparsers: Special handle object (argparse._SubParsersAction) which can
be used to add subparsers to a parser.
Returns:
Object representing the created subparser.
"""
parser = subparsers.add_parser(
'dummy',
add_help=False,
help='Do nothing at all.')
return parser
Any __version__
member in the module will be picked up as the module's
version information. All modules should have a 1.x version number: major
version numbers greater than this are reserved for future use.
There are a number of helper objects provided for describing search results, formatting output and printing it to the console. These are briefly outlined below and described in much more detail in the Python docstrings:
result
- Provides types necessary to build
SearchResult
objects. SearchResult
: container for describing a single result to a search query.Match
: abstract part of aSearchResult
describing the component which matched the query. TheStringMatch
implementation is provided for a match found in a string (this will probably cover 90% of use cases). Modules may subclass if necessary to provide bespoke match types.Location
: abstract, optional part of aSearchResult
describing where the match has been found. TheTextFileLocation
implementation is provided for a match which has been located in a text file. Modules may subclass if necessary to provide bespoke location types.
- Provides types necessary to build
printer
- Provides printers for printing streamed
SearchResult
objects. AbstractPrinter
: abstract printer to print streamedSearchResult
s. Once created theprint_results
method may be called on it with aSearchResult
iterable to print the output. It is assumed the search query may be long running and it is desireable to print output as found (i.e. before termination), so it makes most sense to call this method with a generator function. A number of implementations of printers are provided. Modules may subclass if necessary to provide bespoke printers.
- Provides printers for printing streamed
console
- Provides very basic console utility functions. Mostly used for writing custom printers.
ansi
- Provides utilities for adding ANSI formatting to a string (i.e. coloring it) for console output. Mostly used for writing custom match or location types.
process
- Provides wrappers to support streaming output from subprocesses. Mostly used for writing search modules which call out to separate tools or command line utilities.
Bringing everything together: a skeleton, functional module might look like the following:
from search_utils.printer import MultiLinePrinter, SingleLinePrinter
from search_utils.process import StreamingProcess
from search_utils.result import SearchResult, StringMatch, TextFileLocation
# Module version.
__version__ = '1.0'
def parse_result(line, regex=None):
"""Creates a SearchResult object from the output of a grep command.
Args:
line: String single line of grep output to process.
regex: String regex this result is derived from, or None if unknown.
Defaults to None.
Returns:
The initialized SearchResult.
"""
path_split = line.split(' ', 1)
line_split = path_split[1].split(':', 1)
return SearchResult(StringMatch(line_split[1].strip(), regex),
TextFileLocation(path_split[0], int(line_split[0])))
def search(regex, paths, args, ignore_case=False, verbose=False):
"""Perform the requested search.
Args:
regex: String regular expression to search with.
paths: List of strings representing the paths to search in/on.
args: Namespace containing all parsed arguments. If the
subparser added additional arguments these will be present.
ignore_case: Boolean, True if the search should be case-insensitive,
False if it should be case-sensitive.
verbose: Boolean, True for verbose output, False otherwise.
"""
with StreamingProcess(['grep', '-rnP', regex] + paths) as proc:
printer = MultiLinePrinter() if verbose else SingleLinePrinter()
printer.print_results(parse_result(line, regex) for line in proc)
def create_subparser(subparsers):
"""Creates this module's subparser.
Args:
subparsers: Special handle object (argparse._SubParsersAction) which can
be used to add subparsers to a parser.
Returns:
Object representing the created subparser.
"""
parser = subparsers.add_parser(
'dummy',
add_help=False,
help='Do nothing very much.')
return parser
This does roughly what the files
module does, although simplified and
considerably less robust. The brevity of this module (most of it is docstrings)
illustrates the power of the provided utility functions.
Module authors are encouraged to review the provided modules and the docstrings
for further inspiration. The dirs
module is by far the simplest (and a
pure-Python implementation), whereas the symbols
module is by far the most
complex (with custom match and location types).