Home
Welcome to ctypes.sh, a foreign function interface for bash.
ctypes.sh is a bash plugin that provides a foreign function interface directly in your shell. In other words, it allows you to call routines in shared libraries from within bash.
To help illustrate what ctypes.sh does, here is a trivial example.
$ dlcall puts "Hello, World"
Hello, World
# A more complex example, use libm to calculate sin(PI/2)
$ dlopen libm.so.6
0x172ebf0
$ dlcall -r double sin double:1.57079632679489661923
double:1.000000
All of the ctypes.sh builtins are documented. Access the documentation via the bash help command, for example, help dlopen.
ctypes.sh is a bash plugin. Do not confuse plugins with scripts; they are unrelated concepts. Plugins are rarely used, but they allow you to extend bash at runtime with additional builtins.
A script that automates the loading process and provides some convenience functions is available to source from your scripts.
$ source ctypes.sh
If you're using ctypes.sh in a script and want to verify that it loaded correctly, you can import it like this.
#!/bin/bash
if ! source ctypes.sh; then
    echo "please install ctypes.sh to continue"
    exit 1
fi
It is a common pattern in bash to obtain output like so
$ output=$(command --flags input)
or equivalently
$ output=`command --flags input`
This creates a subshell, and modifications to the subshell are discarded once the command completes. A bash programmer might naturally expect to write this to get a handle to a shared library
$ handle=$(dlopen libc.so.6) # DO NOT DO THIS
However, the handle retrieved will be invalid in the parent shell, which is probably not what was intended. The correct way to do this is described in the section below. If you're doing something that doesn't have any side effects (such as printing a string), command substitution will work. For example:
$ string=$(dlcall printf "%lf" double:1.2345)
This will make the value of $string the string 1.234500 (%lf prints six decimal places by default).
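The subshell behaviour described above can be reproduced in plain bash, without ctypes.sh. The following is a minimal sketch in which fake_open and LAST_HANDLE are hypothetical stand-ins for a builtin that records state in the current shell; they are not part of ctypes.sh.

```shell
# fake_open is a hypothetical stand-in for a builtin with a side effect:
# it records a handle in a shell variable and also prints it.
fake_open() {
    LAST_HANDLE=0x1234
    echo "$LAST_HANDLE"
}

LAST_HANDLE=unset

# Command substitution runs fake_open in a subshell, so the assignment
# to LAST_HANDLE is lost when the subshell exits.
captured=$(fake_open)
after_subshell=$LAST_HANDLE
echo "captured=$captured parent=$after_subshell"

# Calling it directly runs in the current shell, so the side effect stays.
fake_open > /dev/null
echo "parent=$LAST_HANDLE"
```

The printed text is captured either way, but only the direct call leaves state behind in the parent shell, which is why a handle obtained through command substitution is not usable afterwards.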
ctypes.sh is a powerful interface, and naturally allows you to shoot yourself in the foot. Using invalid handles or pointers is a good way to crash your shell, or cause unexpected behaviour.
$ dlcall -h 0xdeadbeef crashcrashcrash
Segmentation fault
Never report a bug to the bash maintainers unless you can reproduce it without the ctypes plugin loaded. The source of the bug is most likely either ctypes.sh, or your script.
Bash plugins are not commonly used, and distributions often package the feature incompletely or incorrectly due to limited testing. Header files are very rarely provided and dynamic symbols are sometimes exported incorrectly.
- A list of known-working distributions and platforms is here
- A list of symptoms and suggested fixes is here. TODO
ctypes.sh provides comprehensive access to the dlopen interface, but for typical usage the defaults will work fine.
$ dlopen libz.so
0x2232450
By default, libraries are added to the global scope, so you probably won't need to use the handle returned. If you do need it, simply look up the soname in the DLHANDLES array. You would usually only need to do this if you want to close a handle, if the same symbol name is provided by two libraries, or if you didn't want to pollute the global scope.
$ echo ${DLHANDLES[libz.so]}
0x2232450
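Since DLHANDLES is indexed by soname, you can also loop over it to see everything that has been loaded. This sketch fills in a stand-in array by hand (using the handle value from the example above) so that it runs without the plugin. In a real session ctypes.sh populates the array for you, and the assumption here is that it behaves like an ordinary bash associative array.

```shell
# Stand-in for the array that ctypes.sh maintains; populated by hand here.
declare -A DLHANDLES=([libz.so]=0x2232450)

# List every loaded library and its handle.
for soname in "${!DLHANDLES[@]}"; do
    echo "$soname -> ${DLHANDLES[$soname]}"
done

# Guard against a library that has not been opened.
if [ -z "${DLHANDLES[libm.so.6]+set}" ]; then
    echo "libm.so.6 is not loaded"
fi
```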
Two pseudo-handles are provided by ctypes.sh, $RTLD_DEFAULT and $RTLD_NEXT. These special handles are described in the dlopen(3) manual.
If you want to reference an internal bash symbol (for example, you want to look up the address of a bash variable) you don't need to use a handle. $RTLD_DEFAULT is assumed by default, so it is sufficient to do this:
$ foobar="hello bash internals"
$ dlcall -r pointer get_string_value foobar
pointer:0x222b3f0
$ dlcall puts pointer:0x222b3f0
hello bash internals
Note that get_string_value is a symbol provided by bash, not ctypes.sh. ctypes.sh simply allows you to access these internal symbols.
The default mode of dlopen is suitable for most operations, but for more control over the load you may specify flags on the command line. dlopen supports bash-style switches for the most common flags, or you may give the flag names as arguments if no switch exists.
$ dlopen libz.so RTLD_GLOBAL RTLD_LAZY
or
$ dlopen -l -g libz.so
If you need a very rarely used flag that ctypes.sh does not know about, you can specify it numerically.
$ dlopen libc.so.6 0x101
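A numeric flag value is the bitwise OR of the individual flag constants. As an illustration only, the sketch below assumes the glibc values on Linux (RTLD_LAZY is 0x1 and RTLD_GLOBAL is 0x100). Other platforms define different values, so check your system's dlfcn.h before relying on them. The variable names are hypothetical and are not defined by ctypes.sh.

```shell
# Assumed glibc/Linux values; these are platform specific.
RTLD_LAZY_VALUE=$((0x1))
RTLD_GLOBAL_VALUE=$((0x100))

# Combine the flags with bitwise OR and print the result in hex.
flags=$(printf '%#x' $((RTLD_GLOBAL_VALUE | RTLD_LAZY_VALUE)))
echo "$flags"    # prints 0x101
```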
For a full list of options supported, use the builtin help
$ help dlopen
By default, libraries will be opened at global scope using RTLD_GLOBAL, but you can disable this with the -g flag.
To obtain a list of exported symbols from a loaded library, use the standard UNIX command nm.
$ nm -D /lib64/libz.so
00000000000023f0 T adler32
000000000000c680 T compress
0000000000002a60 T crc32
00000000000046a0 T deflate
...
To call a function, you must know its return type and its parameters. You then call the function with dlcall. By default, the return value is stored in the DLRETVAL variable, but you can change that if you wish.
Let's look at an example before the details are explained.
# What are the parameters to crc32?
$ grep crc32 /usr/include/zlib.h
unsigned long crc32(unsigned long crc, const char *buf, unsigned len);
$ dlopen libz.so
0x2232450
$ dlcall -r long crc32 long:0 "hello" 5
long:907060870
# What is that in hex?
$ printf "%#x\n" ${DLRETVAL##*:}
0x3610a686
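The ${DLRETVAL##*:} expansion in the example strips the type prefix using ordinary bash parameter expansion. The sketch below uses the value printed above as a literal, so it runs without the plugin. The string: value is a made-up illustration of why #*: is the safer pattern when the value itself may contain colons.

```shell
# Value copied from the crc32 example above.
DLRETVAL=long:907060870

prefix=${DLRETVAL%%:*}   # everything before the first colon
value=${DLRETVAL#*:}     # everything after the first colon
echo "$prefix $value"
printf '%#x\n' "$value"

# With a value that contains colons the two patterns differ.
path=string:/bin:/usr/bin
echo "${path##*:}"       # longest match removed: /usr/bin
echo "${path#*:}"        # shortest match removed: /bin:/usr/bin
```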
Because bash only supports two primitive data types (strings and integers), it is necessary to introduce a syntax to encode additional types that might be encountered. ctypes.sh uses prefixed type strings, like so:
<primitive type>:<formatted value>
For example:
float:3.141459
long:25979456
int8:-2
string:hello
pointer:0xdeadbeef
However, for convenience, if a type is not prefixed, then the following rules apply:
- If it can be parsed perfectly as an integer, it is assumed to be an integer.
  - "Perfectly" means *endptr == '\0' and endptr != nptr; see the strtoul manual for details.
- Otherwise, it is assumed to be a nul-terminated C string.
You should always use a prefix for non-hardcoded values, or unexpected colons or integers might disrupt parsing.
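One way to follow this advice is to attach the prefix at the point where the value enters your script. In the sketch below, as_string is a hypothetical helper (not part of ctypes.sh) that builds a string: argument from arbitrary input.

```shell
# as_string is a hypothetical helper: it always emits a string: prefix,
# so the value is not left to the automatic integer/string detection.
as_string() {
    printf 'string:%s' "$1"
}

user_input="12345"        # would be treated as an integer if passed bare
other_input="int:oops"    # might be read as a prefixed type if passed bare

arg1=$(as_string "$user_input")
arg2=$(as_string "$other_input")
echo "$arg1"
echo "$arg2"
```

The resulting values can then be passed wherever a string argument is expected, for example dlcall puts "$arg1".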
dlcall recognises some common primitive type names:
Prefix | Example | Range | Notes |
---|---|---|---|
uint8 | uint8:128 | 0 to 255 | |
int8 | int8:-12 | -128 to 127 | |
uint16 | uint16:387 | | |
int16 | int16:-922 | | |
uint32 | uint32:299769 | | |
int32 | int32:-1 | | |
uint64 | uint64:11 | | |
int64 | int64:-123 | | |
float | float:3.1412 | | |
double | double:12e10 | | |
char | char:10 | | |
uchar | uchar:102 | | |
ushort | ushort:123 | | |
short | short:-123 | | |
unsigned | unsigned:0 | | |
int | int:-23 | | |
ulong | ulong:1231 | | |
long | long:123 | | |
longdouble | longdouble:1.23 | | |
pointer | pointer:0xdeadbeef | | |
string | string:hello | | |
void | void: | | |
rawdouble | rawdouble:0x1.8p+0 | | |
rawfloat | rawfloat:0x1.8p+0 | | |
A good way to experiment with prefixed types is by calling printf.
$ dlcall printf "%s %u %p %c" string:Hello unsigned:123 pointer:0xdeadbeef int:10
ctypes.sh can automatically import most structure definitions from libraries via the struct command.
More information on using struct is available here.
TODO
ctypes.sh can generate callable function pointers to bash functions, for use as callbacks or function pointers. Examples of where this is necessary are the standard library functions qsort and bsearch.
To write a native callable function, first define the function. The first parameter should be a pointer to store the return code, followed by the formal parameters you want.
It is usually not possible to return the value using the return command in bash, because functions in bash can only return small integers <= 255. For this reason, a pointer is provided to the required return type.
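The limit is easy to see in plain bash. This sketch is independent of ctypes.sh; too_big and store_result are hypothetical names, and the second function only imitates the idea of writing a result through a caller-supplied location by using a variable name rather than a real pointer.

```shell
# A function's exit status is truncated to 8 bits, so 300 comes back as 44.
too_big() {
    return 300
}
status=0
too_big || status=$?
echo "$status"

# Writing the result to a location named by the caller avoids the limit.
store_result() {
    printf -v "$1" '%d' "$2"
}
store_result answer 300
echo "$answer"
```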
Let's see how this works by calling qsort from bash.
#!/bin/bash
source ctypes.sh
declare -i sortsize=128 # size of array
declare -a values       # array of values
set -e

# int compare(const void *, const void *)
function compare {
    local -a x=(int)
    local -a y=(int)
    local -a result

    # extract the parameters
    unpack $2 x
    unpack $3 y

    # remove the prefix
    x=${x##*:}
    y=${y##*:}

    # calculate result: negative when x < y, so qsort sorts ascending
    result=(int:$((x - y)))

    # return result to caller
    pack $1 result
    return
}

# Generate a function pointer to compare that can be called from native code.
callback -n compare compare int pointer pointer

# Generate an array of random values
for ((i = 0; i < sortsize; i++)); do
    values+=(int:$RANDOM)
done

# Verify that array is not sorted
if sort --check=silent --numeric <(IFS=$'\n'; echo "${values[*]##*:}"); then
    echo FAIL
    exit 1
fi

# Allocate space for integers
dlcall -n buffer -r pointer malloc $((sortsize * 4))

# Pack our random array into that native array
pack $buffer values

# Now qsort can sort them
dlcall qsort $buffer long:$sortsize long:4 $compare

# Unpack the sorted array back into a bash array
unpack $buffer values

# Verify they're sorted
if ! sort --check --numeric <(IFS=$'\n'; echo "${values[*]##*:}"); then
    echo FAIL
    exit 1
fi

echo PASS
Here is the output
$ bash qsort.sh
PASS
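The checks in the script rely on a compact idiom: ${values[*]##*:} strips the type prefix from every element, and setting IFS to a newline inside the subshell joins the results one per line for sort. A small sketch with hand-written values, which runs without the plugin:

```shell
values=(int:3 int:1 int:2)

# Strip the prefix from each element and join with newlines.
stripped=$(IFS=$'\n'; echo "${values[*]##*:}")
echo "$stripped"

# sort -c -n succeeds only if its input is already in ascending numeric order.
if echo "$stripped" | sort -c -n 2>/dev/null; then
    order=sorted
else
    order=unsorted
fi
echo "$order"
```

Because IFS is changed inside the command substitution, the parent shell's IFS is left untouched.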
Sometimes you may want to access an exported symbol that is not a function, or you want to know the address of an exported function. For example, you might want to know the address of environ or errno.
To do this, use the dlsym builtin. For example, here is how to access a bash internal symbol.
# I don't want to use $!, let's grab it from inside bash.
$ dlsym last_asynchronous_pid
pointer:0x6ecf14
$ pid=(int)
$ sleep 100 &
[2] 57271
$ unpack pointer:0x6ecf14 pid
$ echo ${pid##*:}
57271
$ dlopen libm.so.6
0x1dcead0
$ dlcall -n result -r double sin double:123.123
double:-0.565374