ffi-help: updated documentation
Matt Wette
2018-01-20 01:17:20 UTC
Hi All,

I am working on an ffi-helper (FH): a program that will read in a C dot-h file
and generate a Guile dot-scm file which defines a module to provide hooks into
the associated C library.

This is a rework of the first part of the documentation. It provides an example
and a section explaining part of the design.

(I have recently dumped my MacBook w/ flaky keyboard for an Ubuntu laptop. I am
still adjusting. I am missing MacPorts a little.)

Matt

FFI Helper for Guile
********************

Matt Wette
January 2018
With NYACC Version 0.83.0

1 Introduction
**************

The acronym FFI stands for "Foreign Function Interface". It refers to
the Guile facility for binding functions and variables from C source
libraries into Guile programs.  This distribution provides utilities for
generating a loadable Guile module from a set of C declarations and
associated libraries.  The C declarations can, and conventionally do,
come from naming a set of C include files.  The nominal method for use
is to write a _ffi-module_ specification in a file which includes a
'define-ffi-module' declaration, and then use the command 'guild
compile-ffi' to produce an associated file of Guile Scheme code.
     $ guild compile-ffi ffi/cairo.ffi
     wrote `ffi/cairo.scm'
The FH does not generate C code.  The hooks to access functions in the
Cairo library are provided in 100% Guile Scheme via '(system foreign)'.

   The compiler for the FFI Helper (FH) is based on the C parser and
utilities which are included in the NYACC (https://www.nongnu.org/nyacc)
package.  Development for the FH is currently being performed in the
'c99dev' branch of the associated git repository.  Within the NYACC
distribution, the relevant modules can be found under the directory
'examples/'.

   Use of the FFI-helper module depends on the _scheme-bytestructures_
package available from
<https://github.com/TaylanUB/scheme-bytestructures>. Releases are
available at
<https://github.com/TaylanUB/scheme-bytestructures/releases>.

   At runtime, after the FFI Helper has been used to create Scheme code,
the modules '(system ffi-help-rt)' and '(bytestructures guile)' are
required.  No other code from the NYACC distribution is needed.
However, note that the process of creating the Scheme output depends on
reading system headers, so the generated code may well contain operating
system and machine dependencies.  If you copy code to a new machine, you
should re-run 'guild compile-ffi'.

   You are probably hoping to see an example, so let's try one.

   This is a small FH example to illustrate its use.  We will start with
the Cairo (cairographics.org) package because that is the first one I
started with in developing the FFI Helper.  Say you are an avid Guile
user and want to be able to use Cairo in Guile.  On most systems Cairo
comes with the associated _pkg-config_ support files; this demo depends
on that support.

   Warning: The FFI Helper package is under active development and there
is some chance the following example will cease to work in the future.

   If you want to follow along and are working in the distribution tree,
you should source the file 'env.sh' in the 'examples' directory.

   By practice, I like to put all FH generated modules under a directory
called 'ffi/', so we will do that.  We start by generating, in the 'ffi'
directory, a file named 'cairo.ffi' with the following contents:

     (define-ffi-module (ffi cairo)
       #:pkg-config "cairo"
       #:include '("cairo.h" "cairo-pdf.h" "cairo-svg.h"))

To generate a Guile module you execute 'guild' as follows:

     $ guild compile-ffi ffi/cairo.ffi
     wrote `ffi/cairo.scm'

Though the file 'ffi/cairo.ffi' is only three lines long, the file
'ffi/cairo.scm' will be over five thousand lines long. It looks like
the following:

     (define-module (ffi cairo)
       #:use-module (system ffi-help-rt)
       #:use-module ((system foreign) #:prefix ffi:)
       #:use-module (bytestructures guile))
     (define link-libs
       (list (dynamic-link "libcairo")))

     ;; int cairo_version(void);
     (define ~cairo_version
       (delay (fh-link-proc ffi:int "cairo_version" (list) link-libs)))
     (define (cairo_version)
       (let () ((force ~cairo_version))))
     (export cairo_version)

     ...

     ;; typedef struct _cairo_matrix {
     ;;   double xx;
     ;;   double yx;
     ;;   double xy;
     ;;   double yy;
     ;;   double x0;
     ;;   double y0;
     ;; } cairo_matrix_t;
     (define-public cairo_matrix_t-desc
       (bs:struct
         (list `(xx ,double) `(yx ,double) `(xy ,double)
               `(yy ,double) `(x0 ,double) `(y0 ,double))))
     (define-fh-compound-type cairo_matrix_t cairo_matrix_t-desc
      cairo_matrix_t? make-cairo_matrix_t)
     (export cairo_matrix_t cairo_matrix_t? make-cairo_matrix_t)

     ... many, many more declarations ...

     ;; access to enum symbols and #define'd constants:
     (define ffi-cairo-symbol-val
       (let ((sym-tab
               '((CAIRO_SVG_VERSION_1_1 . 0)
                 (CAIRO_SVG_VERSION_1_2 . 1)
                 (CAIRO_PDF_VERSION_1_4 . 0)
                 (CAIRO_PDF_VERSION_1_5 . 1)
                 (CAIRO_REGION_OVERLAP_IN . 0)
                 (CAIRO_REGION_OVERLAP_OUT . 1)
                 ... more constants ...
                 (CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
                   . "application/x-cairo.jbig2-global-id"))))
         (lambda (k) (or (assq-ref sym-tab k)))))
     (export ffi-cairo-symbol-val)
     (export cairo-lookup)

     ... more ...


Note that from the _pkg-config_ spec the FH compiler picks up the
required libraries to bind in.  Also, '#define' based constants, as well
as those defined by enums, are provided in a lookup function
'ffi-cairo-symbol-val'.  So, for example

     guile> (use-modules (ffi cairo))
     ;;; ffi/cairo.scm:6112:11: warning:
         possibly unbound variable `cairo_raster_source_acquire_func_t*'
     ;;; ffi/cairo.scm:6115:11: warning:
         possibly unbound variable `cairo_raster_source_release_func_t*'
     guile> (ffi-cairo-symbol-val 'CAIRO_FORMAT_ARGB32)
     $1 = 0

We will discuss the warnings later.  They are signals that extra code
needs to be added to the ffi module.  But you see how the constants (but
not CPP function macros) can be accessed.

   Let's try something more useful: a real program. Create the
following code in a file, say 'cairo-demo.scm', then fire up a Guile
session and 'load' the file.

     (use-modules (ffi cairo))
     (define srf (cairo_image_surface_create 'CAIRO_FORMAT_ARGB32 200 200))
     (define cr (cairo_create srf))
     (cairo_move_to cr 10.0 10.0)
     (cairo_line_to cr 190.0 10.0)
     (cairo_line_to cr 190.0 190.0)
     (cairo_line_to cr 10.0 190.0)
     (cairo_line_to cr 10.0 10.0)
     (cairo_stroke cr)
     (cairo_surface_write_to_png srf "cairo-demo.png")
     (cairo_destroy cr)
     (cairo_surface_destroy srf)

     guile> (load "cairo-demo.scm")
     ...
     ;;; compiled /.../cairo.scm.go
     ;;; compiled /.../cairo-demo.scm.go
     guile>

If we set up everything correctly we should have generated the target
file 'cairo-demo.png' which contains the image of a square.  A few items
in the above code are notable.  First, the call to
'cairo_image_surface_create' accepted a symbolic form
''CAIRO_FORMAT_ARGB32' for the format argument.  It would have also
accepted the associated constant '0'.  In addition, procedures declared
in '(ffi cairo)' will accept Scheme strings where the C function wants
"pointer to string."

   Now try this in your Guile session:

     guile> srf
     $4 = #<cairo_surface_t* 0x7fda53e01880>
     guile> cr
     $5 = #<cairo_t* 0x7fda54828800>

Note that the FH keeps track of the C types you use. This can be useful
for debugging but may bloat the namespace.  The constants you see are
the pointer values.  But it goes further.  Let's generate a matrix type:

     guile> (define m (make-cairo_matrix_t))
     guile> m
     $6 = #<cairo_matrix_t 0x10cc26c00>
     guile> (use-modules (system ffi-help-rt))
     guile> (pointer-to m)
     $7 = #<cairo_matrix_t* 0x10cc26c00>

When it comes to C APIs that expect the user to allocate memory for a
structure and pass the pointer address to the C function, FH provides a
solution:

     guile> (cairo_get_matrix cr (pointer-to m))
     guile> (fh-object-ref m 'xx)
     $9 = 1.0
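
   For comparison, here is a minimal sketch of the same
allocate-and-pass-a-pointer pattern written against '(system foreign)'
alone, without the FH.  It is not FH code: the names 'c-strtol' and
'parse-long' are made up for illustration, and the sketch assumes the
C library's 'strtol' is reachable through '(dynamic-link)'.  The out
parameter ('char **endptr') is backed by a bytevector that we allocate,
hand to C as a pointer, and read back afterwards.

```scheme
(use-modules (system foreign) (rnrs bytevectors))

;; long strtol(const char *nptr, char **endptr, int base);
(define c-strtol
  (pointer->procedure long
                      (dynamic-func "strtol" (dynamic-link))
                      (list '* '* int)))

;; Hypothetical helper: returns the parsed value and the number of
;; bytes of STR that strtol consumed.
(define (parse-long str)
  (let* ((end-bv (make-bytevector (sizeof '*) 0))   ; storage for endptr
         (str-ptr (string->pointer str))
         (val (c-strtol str-ptr (bytevector->pointer end-bv) 10))
         (end-addr (bytevector-uint-ref end-bv 0 (native-endianness)
                                        (sizeof '*))))
    (values val (- end-addr (pointer-address str-ptr)))))
```

Calling '(parse-long "42abc")' should return the two values 42 and 2.
With the FH, 'make-cairo_matrix_t' and 'pointer-to' take over the roles
played here by 'make-bytevector' and 'bytevector->pointer'.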

1.1 The Guile Foreign Function Interface
========================================

Guile has an API, called the Foreign Function Interface, which allows
one to avoid writing and compiling C wrapper code in order to access C
coded libraries.  The API is based on 'libffi' and is covered in the
Guile Reference Manual.  We review some important bits here.  For more
insight you should read the relevant sections in the Guile Reference
Manual.  For more info on libffi internals visit libffi
(https://github.com/libffi/libffi).

   The relevant procedures used by the FH are
'dynamic-link'
     links libraries into Guile session
'dynamic-func'
     generates a Scheme-level pointer to a C function
'pointer->procedure'
     generates a Scheme lambda given a C function signature
'dynamic-pointer'
     provides access to global C variables
Several of the above require import of the module '(system foreign)'.

   In order to generate a Guile procedure wrapper for a function, say
'int foo(char *str)', in some foreign library, say 'libbar.so', you can
use something like the following:
     (use-modules (system foreign))
     (define foo (pointer->procedure
                  int
                  (dynamic-func "foo" (dynamic-link "libbar"))
                  (list '*)))
The argument 'int' is a variable name for the return type, the next
argument is an expression for the function pointer and the third
argument is an expression for the function argument list.  To execute
the function, which expects a C string, you use something like
     (define result-code (foo (string->pointer "hello")))
If you want to try a real example, this should work:
     guile> (use-modules (system foreign))
     guile> (define strlen
               (pointer->procedure
                size_t (dynamic-func "strlen" (dynamic-link)) (list '*)))
     guile> (strlen (string->pointer "hello, world"))
     $1 = 12
It is important to realize that internally Guile takes care of
converting Scheme arguments to and from C types.  Scheme does not have
the same type system as C and the Guile FFI is somewhat forgiving here.
When we declare a C function interface with, say, a uint32 argument
type, in Scheme you can pass an exact numeric integer. The FH attempts
to be even more forgiving, allowing one to pass symbols where C enums
(i.e., integers) are expected.
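
   Strings are a case where the raw FFI is less forgiving: a '*'
argument must be a pointer object, so the caller converts in both
directions by hand.  The following is a minimal sketch of that
conversion, assuming the C library's 'getenv' is reachable through
'(dynamic-link)'; the names 'c-getenv' and 'getenv-string' are
hypothetical and not part of the FH.

```scheme
(use-modules (system foreign))

;; char *getenv(const char *name);
(define c-getenv
  (pointer->procedure '*
                      (dynamic-func "getenv" (dynamic-link))
                      (list '*)))

;; Hypothetical wrapper: Scheme string in, Scheme string (or #f) out.
(define (getenv-string name)
  (let ((ptr (c-getenv (string->pointer name))))
    (if (null-pointer? ptr)
        #f                        ; C returned NULL: variable is unset
        (pointer->string ptr))))  ; copy the C string into a Scheme string
```

This is the kind of wrapping that FH-generated procedures do for the
user when they accept Scheme strings directly.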

   As mentioned, access to libraries not compiled into Guile is
accomplished via 'dynamic-link'.  To link the shared library 'libfoo.so'
into Guile one would write something like the following:
     (define foo-lib (dynamic-link "libfoo"))
Note that Guile takes care of dealing with the file extension (e.g.,
'.so').  Where Guile looks for libraries is system dependent, but
usually it will find shared objects in the following locations:
   * '(assq-ref %guile-build-info 'libdir)'
   * '(assq-ref %guile-build-info 'extensiondir)'
   * '/usr/lib' on GNU/Linux and macOS
   * $LD_LIBRARY_PATH on GNU/Linux, $DYLD_LIBRARY_PATH on macOS
   * directories listed in /etc/ld.so.conf on GNU/Linux
When used with no argument 'dynamic-link' returns a handle for objects
already linked with Guile.  The procedure 'dynamic-link' returns a
library handle for acquiring function and variable handles, or pointers,
for objects (e.g., a pointer for a function) in the library.
Theoretically, once a library has been dynamically linked into Guile,
the expression '(dynamic-link)' (with no argument) should suffice to
provide a handle to acquire object handles, but I have found this is not
always the case.  The FH will try all library handles defined by a ffi
module to acquire object pointers.
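
   The idea of trying each library handle in turn can be sketched as
follows.  This is an illustration only, not the FH implementation: the
helper 'find-func-pointer' and the list 'demo-libs' are hypothetical
names, and the sketch relies on 'dynamic-func' signalling an error when
a symbol is missing from a library.

```scheme
(use-modules (system foreign))

;; Hypothetical helper: return a pointer to the C function NAME from
;; the first handle in LIBS that provides it, or #f if none does.
(define (find-func-pointer name libs)
  (let loop ((libs libs))
    (cond ((null? libs) #f)
          ;; A test-only cond clause returns the test value when true.
          ((false-if-exception (dynamic-func name (car libs))))
          (else (loop (cdr libs))))))

;; Handles to search; here just the objects already linked with Guile.
(define demo-libs (list (dynamic-link)))
```

For example, '(find-func-pointer "strlen" demo-libs)' should return a
pointer object, while a name found in none of the handles yields '#f'.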

1.2 The FFI Helper Design
=========================

In this section we hope to provide some insight into how the FH works.  The
FH specification, via the dot-ffi file, determines the set of
declarations which will be included in the target Guile module.  If
there is no declaration filter, then all the declarations from the
specified set of include files are targeted.  With the use of a
declaration filter, this set can be reduced.  By declaration we mean
typedefs, aggregate definitions (i.e., structs and unions), function
declarations, and external variables.

   In the C language typedefs define type aliases, so there is no harm
in expanding typedefs which appear outside the specification.  For
example, say the file 'foo.h' includes a declaration for the typedef
'foo_t' and the file 'bar.h' includes a declaration for the typedef
'bar_t'.  Furthermore, suppose 'foo_t' is a struct that references
'bar_t'.  Then the FH will preserve the typedef 'foo_t' but expand
'bar_t'.  That is, if the declarations are

     typedef int bar_t;   /* from bar.h */
     typedef struct { bar_t x; double y; } foo_t; /* from foo.h */
then the FH will treat 'foo_t' as if it had been declared as
     typedef struct { int x; double y; } foo_t; /* from foo.h */

   When it comes to handling C types in Scheme the FH tries to leave
base types (i.e., numeric types) alone and uses its own type system
based on Guile's _structs_ and associated _vtables_ for structs, unions,
function types and pointer types.  Enum types are handled specially as
described below.  The FH type system associates with each type a number
of procedures.  One of these is the printer procedure which provides the
association of type with output seen in the demo above.
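
   To give a feel for how a per-type printer produces output like
'#<cairo_t* 0x7fda54828800>', here is a small sketch.  Note the
substitution: it uses SRFI-9 records and 'set-record-type-printer!' as a
stand-in, whereas the FH itself is built on Guile structs and vtables.
All names in it ('<demo-pointer>', 'make-demo-pointer', and so on) are
invented for the example.

```scheme
(use-modules (srfi srfi-9) (srfi srfi-9 gnu))

;; A stand-in for a typed pointer object: a C type name plus an address.
(define-record-type <demo-pointer>
  (make-demo-pointer type-name address)
  demo-pointer?
  (type-name demo-pointer-type-name)
  (address demo-pointer-address))

;; Attach a printer so the REPL shows the C type name and the address.
(set-record-type-printer!
 <demo-pointer>
 (lambda (obj port)
   (simple-format port "#<~A 0x~A>"
                  (demo-pointer-type-name obj)
                  (number->string (demo-pointer-address obj) 16))))

(define p (make-demo-pointer "foo_t*" #x10cc26c00))
```

Displaying 'p' prints '#<foo_t* 0x10cc26c00>'.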

   One of the challenges in automating C-Scheme type conversion is that
C code uses a lot of pointers.  So as the FH generates types for
aggregates, it will automatically generate types for associated
pointers.  For example, in the case above with 'foo_t' the FH will
generate an aggregate type named 'foo_t' and a pointer type named
'foo_t*'.  In addition the FH generates code to link these two together
so that, given an object 'f1' of type 'foo_t', the expression
'(pointer-to f1)' will generate an object of type 'foo_t*'.  This makes
the task of generating an object value in Scheme, and then passing the
pointer to that value as an argument to a FFI-generated procedure, easy.
The inverse operation 'value-at' is also provided.  Note that sometimes
the C code needs to work with pointer-to-pointer types.  The FH does not
produce double-pointers and in that case, the user must add code to the
FH module definition to support the required additional type (e.g.,
'foo_t**').

   In addition, the FH type system provides unwrap and wrap procedures
used internally by ffi-generated modules for function calls.  These
convert FH types to and from objects of the types expected by Guile's FFI
interface.  For example, the unwrap procedure associated with the FH
pointer type 'foo_t*' will convert a 'foo_t*' object to a Guile
'pointer'.  Similarly, on return the wrap procedures are applied to
convert to FH types.  When the FH generates a type, for example 'foo_t',
it also generates an exported procedure 'make-foo_t' that users can use
to build an object of that type.  The FH also generates a predicate
'foo_t?' to determine if an object is of that type.  The '(system
ffi-help-rt)' module provides a procedure 'fh-object-ref' to convert an
object of type 'foo_t' to the underlying bytestructures representation.
For numeric and pointer types, this will generate a number and for
aggregate types, a bytestructure.  Additional arguments to
'fh-object-ref' for aggregates work as with the bytestructures package
and enable selection of components of the aggregate. Note that the
underlying type for a bytestructure pointer is an integer.

   Enums are handled specially.  In C, enums are represented by
integers.  The FH does not generate types for C enums or C enum
typedefs.  Instead, the FH defines unwrap and wrap procedures to convert
Scheme values to and from integers, where the Scheme values can be
integers or symbols.  For example, if, in C, the enum typedef 'baz_t'
has element 'OPTION_A' with value 1, a procedure expecting an argument
of type 'baz_t' will accept the symbol ''OPTION_A' or the integer '1'.
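
   A rough sketch of what such a pair of procedures might look like is
given below.  It is not the code the FH generates: the names
'baz_t-vals', 'unwrap-baz_t' and 'wrap-baz_t', the extra element
'OPTION_B', and the choice to map integers back to symbols on return
are all assumptions made for illustration.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical table for a C typedef such as
;;   typedef enum { OPTION_A = 1, OPTION_B = 2 } baz_t;
(define baz_t-vals '((OPTION_A . 1) (OPTION_B . 2)))

;; Scheme -> C: accept a known symbol or an exact integer.
(define (unwrap-baz_t arg)
  (cond ((symbol? arg)
         (or (assq-ref baz_t-vals arg)
             (error "unknown baz_t symbol:" arg)))
        ((and (integer? arg) (exact? arg)) arg)
        (else (error "bad baz_t argument:" arg))))

;; C -> Scheme: map an integer back to its symbol when one exists,
;; otherwise pass the integer through unchanged.
(define (wrap-baz_t val)
  (let ((entry (find (lambda (pair) (= (cdr pair) val)) baz_t-vals)))
    (if entry (car entry) val)))
```

With these definitions both '(unwrap-baz_t 'OPTION_A)' and
'(unwrap-baz_t 1)' evaluate to 1.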

   Where the FH generates types, the underlying representation is a
_bytestructure descriptor_.  That is, the FH types are essentially a
layer on top of a bytestructure.  The layer provides identification seen
at the Guile REPL, unwrap and wrap procedures which are used in function
handling (not normally visible to the user) and procedures to convert
types to and from pointer types.

   For base types (e.g., 'int', 'double') the FH uses the associated
Scheme values or the associated bytestructures values. (I think this is
all bytestructure values now.)

   The underlying representation of bytestructure values is
_bytevectors_.  See the Guile Reference Manual for more information on
this datatype.
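
   As a rough picture of that bottom layer, the sketch below lays out
the six doubles of 'cairo_matrix_t' in a plain bytevector, with no FH
or bytestructures involved.  The variable names are made up, and the
offsets assume that the fields are packed one after another with no
padding, which holds for a struct of doubles on common platforms.

```scheme
(use-modules (rnrs bytevectors) (system foreign))

;; cairo_matrix_t is six doubles, in order: xx yx xy yy x0 y0.
(define double-size (sizeof double))
(define m (make-bytevector (* 6 double-size) 0))

;; Fill in the identity matrix: xx (field 0) and yy (field 3) are 1.0.
(bytevector-ieee-double-native-set! m (* 0 double-size) 1.0)
(bytevector-ieee-double-native-set! m (* 3 double-size) 1.0)

;; Read fields back by byte offset.
(define xx (bytevector-ieee-double-native-ref m (* 0 double-size)))
(define yx (bytevector-ieee-double-native-ref m (* 1 double-size)))

;; A pointer to the same memory, suitable for passing to a C function.
(define m-ptr (bytevector->pointer m))
```

The bytestructure descriptor is what replaces the hand-computed offsets
with field names such as 'xx'.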

   The following routines are user-level procedures provided by the
runtime module '(system ffi-help-rt)':
'fh-type?'
     a predicate to indicate whether an object is a FH type
'fh-object?'
     a predicate to indicate whether an object is a FH object
'fh-object-val'
     the underlying bytestructure value
'fh-object-ref'
     a procedure that works like 'bytestructure-ref' on the underlying
     object
'fh-object-set!'
     a procedure that works like 'bytestructure-set!' on the underlying
     object
'pointer-to'
     a procedure, given a FH object, or a bytestructure, that returns an
     associated pointer object (i.e., a pointer type whose object value
     is the address of the underlying argument); this may be a FH type
     or a bytestructure
'value-at'
     a procedure to dereference an object
'fh-cast'
     a procedure to cast arguments for variadic C functions
'make-type'
     make base type, as listed below; also used to make bytestructure
     objects for base types (e.g., '(make-double)' for 'double')

   Supported base types are
short              unsigned-short     int                unsigned
long               unsigned-long      float              double
size_t             ssize_t            intptr_t           uintptr_t
ptrdiff_t
int8               uint8              int16              uint16
int32              uint32             int64              uint64
These types are useful when a C function returns a value through a
pointer argument of the corresponding type.  For example
     (let ((name (make-char*)))
       (some_function (pointer-to name))
       (display "name: ") (display (char*->string name)) (newline))
     (let ((return-val (make-double)))
       (another_function (pointer-to return-val))
       (simple-format #t "val is ~S\n" (fh-object-ref return-val)))