Discussion:
[ANN] guile-msgpack: MessagePack for GNU Guile (+ help needed)
HiPhish
2018-09-19 12:33:22 UTC
Hello Schemers,

I am pleased to announce to the public a project I have been working on for a
while: an implementation of the MessagePack[1] data serialization format for
GNU Guile.

https://gitlab.com/HiPhish/guile-msgpack


## What is MessagePack ##

MessagePack is a specification for how to turn data into bytes and vice versa.
Think of it like JSON, except instead of human readability it stresses speed
and size efficiency. The format is binary and well suited for data exchange
between processes. It also forms the basis of the MessagePack RPC protocol[2].


## About the Guile implementation ##

My implementation is written in pure Guile Scheme, so it should work out of
the box. You can try it out without installing anything. Just clone the repo
and give it a try:

$ guile -L .

scheme@(guile-user)> ,use (msgpack)
scheme@(guile-user)> (pack "hello" 1337 #t)
$1 = #vu8(165 104 101 108 108 111 205 5 57 195)

As you can see, the three objects passed to the `pack` procedure have been
turned into one big bytevector. We could now unpack this bytevector again,
write it to a file, or send it off to a port. Since using MessagePack with
ports is a frequent task there is also a `pack!` procedure which takes in a
port to pack to as well.

(define hodgepodge (vector 1 2 '#(3 #t) "foo"))
(pack! hodgepodge (current-output-port))

To get our object back we `unpack` it:

(unpack #vu8(#xA5 #x68 #x65 #x6C #x6C #x6F)) ; Returns "hello"
(unpack! (current-input-port))
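
For readers wondering where the bytes in the REPL output come from, here is a minimal sketch that rebuilds them by hand with Guile's `(rnrs bytevectors)` module, without using guile-msgpack at all. The tag values are taken from the MessagePack spec (a fixstr header is `#xA0` plus the length, `#xCD` introduces a big-endian uint 16, `#xC3` is true). The names `hello-bytes`, `fixstr-header`, `uint16-bytes` and `by-hand` are made up for illustration and are not part of the library.

```scheme
(use-modules (rnrs bytevectors))

;; "hello" as UTF-8 is five bytes, short enough for the fixstr format,
;; whose header byte is #xA0 plus the length.
(define hello-bytes (bytevector->u8-list (string->utf8 "hello")))
(define fixstr-header (logior #xA0 (length hello-bytes)))

;; 1337 does not fit into 8 bits, so it is stored as a uint 16:
;; the tag #xCD followed by the value in big-endian byte order.
(define uint16-bytes
  (let ((bv (make-bytevector 2)))
    (bytevector-u16-set! bv 0 1337 (endianness big))
    (bytevector->u8-list bv)))

;; #t is the single byte #xC3.
(define by-hand
  (u8-list->bytevector
   (append (list fixstr-header) hello-bytes
           (list #xCD) uint16-bytes
           (list #xC3))))

;; Compare against the bytevector printed at the REPL above.
(display (equal? by-hand #vu8(165 104 101 108 108 111 205 5 57 195)))
(newline)
```

The first six bytes are also exactly the bytevector passed to `unpack` in the example above, written there in hexadecimal.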

The readme goes into more detail and the (Texinfo) manual has the complete
documentation. Building it is simple enough with the included makefile.


## What's next? ##

Next is your feedback! Once the library has settled down I would like to make
it into a Guix package. From there I can then start building a MessagePack RPC
implementation and finally a Neovim client which will allow people to write
plugins for Neovim in GNU Guile. (yes, Vim plugins written in Lisp, who would
have thought?)


## Help needed ##

This is my first time making a full project in Guile, so I would appreciate it
if someone with more experience could look over my code. I have written down
some things I could think of in the todo file. In particular I would like to
know:

How much is portability a concern? I know Guile implements *some* of r6rs, but
I wasn't paying much attention to that. Is it something worth considering or
should I just treat Guile as its own language that just so happens to be based
on Scheme?

The extension type `ext` (msgpack/ext.scm) is a pair of a signed 8-bit integer
and a bytevector. The constructor does not enforce this condition; the two
slots can really be anything. What is the proper way of enforcing this in
Guile? I know Common Lisp has type declarations and Racket has contracts, but
what does Guile have?

The `pack` procedure takes any number of objects and packs them into a large
bytevector. However, the `unpack` procedure only returns the first object it
unpacks. Is there a way of making it unpack as many as it can? I thought of
`values`, but you would need to know the number of values in advance. Also the
caller would have to know in advance how many objects he is going to get
unpacked.

As I said, there is more in the todo file, but the other questions are under
the hood, not user-visible.


## The inevitable panhandling ##

I don't want to go into this too much, because no one likes to read it, but if
you like my MessagePack implementation I would really appreciate it if you could
spare some cash. Links are in the readme; at the moment I only have Liberapay
set up. If anyone can recommend a service for one-time donations, that would
be cool. All the services I could find were about fundraising for charity and
stuff, not what I was looking for.


_______________________________________________________________________________
[1] https://msgpack.org/
[2] https://github.com/msgpack-rpc/msgpack-rpc
Thompson, David
2018-09-19 13:29:44 UTC
Post by HiPhish
## What is MessagePack ##
MessagePack is a specification for how to turn data into bytes and vice versa.
Think of it like JSON, except instead of human readability it stresses speed
and size efficiency. The format is binary and well suited for data exchange
between processes. It also forms the basis of the MessagePack RPC protocol[2].
This is cool! I've seen people use MessagePack for multiplayer video
games. Maybe now that this library exists I will give that a try some
day.
Post by HiPhish
## About the Guile implementation ##
My implementation is written in pure Guile Scheme, so it should work out of
the box. You can try it out without installing anything. Just clone the repo
$ guile -L .
$1 = #vu8(165 104 101 108 108 111 205 5 57 195)
As you can see, the three objects passed to the `pack` procedure have been
turned into one big bytevector. We could now unpack this bytevector again,
write it to a file, or send it off to a port. Since using MessagePack with
ports is a frequent task there is also a `pack!` procedure which takes in a
port to pack to as well.
(define hodgepodge (vector 1 2 '#(3 #t) "foo"))
(pack! hodgepodge (current-output-port))
(unpack #vu8(#xA5 #x68 #x65 #x6C #x6C #x6F)) ; Returns "hello"
(unpack! (current-input-port))
So does this allow for sending over arbitrarily nested s-expressions
(i.e. anything supported by Guile's 'read' and 'write' procedures)?
For example:

(pack '((foo . 1) (bar . (2 3 4)) (baz . "hello")))
Post by HiPhish
How much is portability a concern? I know Guile implements *some* of r6rs, but
I wasn't paying much attention to that. Is it something worth considering or
should I just treat Guile as its own language that just so happens to be based
on Scheme?
FWIW I do the latter. I write my projects specifically for Guile, but
often my projects have to be that way because there's no Scheme
standard that provides the interfaces I use.
Post by HiPhish
The extension type `ext` (msgpack/ext.scm) is a pair of a signed 8-bit integer
and a bytevector. The constructor does not enforce this condition; the two
slots can really be anything. What is the proper way of enforcing this in
Guile? I know Common Lisp has type declarations and Racket has contracts, but
what does Guile have?
Guile doesn't provide any static type system. I haven't looked at your
source, but when you say "a pair", do you mean a literal cons cell?
If so, I recommend switching to record types via the (srfi srfi-9)
module. You can write a custom constructor that does the type checks
if you'd like.
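
A minimal sketch of what such a checking constructor could look like with SRFI 9. This is not the code from msgpack/ext.scm: `make-ext` and `ext-data` are names that appear in the library's examples, while `<ext>`, `%make-ext`, `ext?` and `ext-type` are assumed here for illustration.

```scheme
(use-modules (srfi srfi-9)
             (rnrs bytevectors))

;; The raw constructor gets a private-looking name; callers are
;; expected to go through the checking wrapper below instead.
(define-record-type <ext>
  (%make-ext type data)
  ext?
  (type ext-type)
  (data ext-data))

;; Checking wrapper: reject anything that is not a signed 8-bit
;; integer paired with a bytevector.
(define (make-ext type data)
  (unless (and (exact-integer? type) (<= -128 type 127))
    (error "ext type must be a signed 8-bit integer:" type))
  (unless (bytevector? data)
    (error "ext data must be a bytevector:" data))
  (%make-ext type data))

;; A valid ext is accepted.
(display (ext-type (make-ext 13 #vu8(1 2 3))))
(newline)
```

In a module one would export `make-ext` but not `%make-ext`, so the unchecked constructor stays out of reach of users.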
Post by HiPhish
The `pack` procedure takes any number of objects and packs them into a large
bytevector. However, the `unpack` procedure only returns the first object it
unpacks. Is there a way of making it unpack as many as it can? I thought of
`values`, but you would need to know the number of values in advance. Also the
caller would have to know in advance how many objects he is going to get
unpacked.
You could resolve this by changing 'pack' to only accept a single
value. If the user wants to pack multiple items, then they would pass
a list or vector.

Hope this helps,

- Dave
HiPhish
2018-09-19 14:06:48 UTC
Post by Thompson, David
So does this allow for sending over arbitrarily nested s-expressions
(i.e. anything supported by Guile's 'read' and 'write' procedures)?
(pack '((foo . 1) (bar . (2 3 4)) (baz . "hello")))
Not quite, you are limited to the types MessagePack defines, because the
recipient might have no idea what types Guile has. Think of it like binary
JSON. However, the standard leaves room for your own extension via the `ext`
type. There is an example in the documentation:

(define (rational->ext x)
  (make-ext 13
            (pack (vector (numerator x)
                          (denominator x)))))

Let's say you want to pack an exact rational number while preserving its
exactness. There is nothing in the spec that supports such a type, so we will
choose to represent rational numbers as an extension with the number 13 (no
particular reason). An extension is a pair of a number and the data as a
bytevector. The simplest way to encode the numerator and denominator is to
pack a vector of the two numbers.

You can now pack this ext object and send it off through a port where on the
other end the recipient hopefully knows what to do with an extension object of
type 13:

(pack! (rational->ext 2/3) (current-output-port))

If you receive such an object it's easy to make a rational number from it
again:

(define (ext13->rational e)
  ;; The data gets unpacked as a vector of two numbers
  (define numbers (unpack (ext-data e)))
  ;; The result is the quotient of those two numbers
  (/ (vector-ref numbers 0)
     (vector-ref numbers 1)))

If you don't want to go through this conversion ritual every time you can also
register rational numbers with the `packing-table` parameter. The manual goes
into more detail. But yes, with extensions you could in principle pack an
expression like ` '((foo . 1) (bar . (2 3 4)) (baz . "hello"))`, you would
just have to first define *how* to pack pairs. Symbols are currently packed as
strings, but that can be overridden as well.
Post by Thompson, David
Guile doesn't provide any static type system. I haven't looked at your
source, but when you say "a pair", do you mean a literal cons cell?
If so, I recommend switching to record types via the (srfi srfi-9)
module. You can write a custom constructor that does the type checks
if you'd like.
I already use SRFI 9 (define-record-type) to generate everything, an `ext` is
a record with two slots. I'll look into it again then, seems like I missed
that part.
Post by Thompson, David
You could resolve this by changing 'pack' to only accept a single
value. If the user wants to pack multiple items, then they would pass
a list or vector.
MessagePack already defines arrays (vectors), so packing a vector of three
objects and packing three objects is not the same.
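
On the wire the difference comes down to one header byte. The following sketch builds both byte sequences by hand instead of calling guile-msgpack, assuming two formats from the MessagePack spec: small non-negative integers (positive fixint) are encoded as the byte itself, and a short array (fixarray) starts with `#x90` plus its element count. The names `three-objects` and `one-array` are only for illustration.

```scheme
(use-modules (rnrs bytevectors))

;; Three separate objects 1, 2 and 3: each positive fixint is just
;; its own byte, one after the other.
(define three-objects (u8-list->bytevector (list 1 2 3)))

;; One array holding 1, 2 and 3: a fixarray header (#x90 plus the
;; element count) comes first, followed by the same three bytes.
(define one-array
  (u8-list->bytevector (cons (logior #x90 3) (list 1 2 3))))

(display three-objects)
(newline)
(display one-array)
(newline)
```

A receiver reading the second sequence gets a single array object, while the first sequence yields three independent integers.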
John Cowan
2018-09-19 14:12:04 UTC
On Wed, Sep 19, 2018 at 8:33 AM HiPhish <***@posteo.de> wrote:
Post by HiPhish
Since using MessagePack with ports is a frequent task there is also a `pack!`
procedure which takes in a port to pack to as well.
By convention, Scheme procedures whose only side effect is on a port
don't use the ! in their names: we write `read`, `write`, `display`, not
`read!`, `write!`, `display!`.

I would suggest calling them pack-and-write and read-and-unpack;
you can leave out the "and-" if you think those are too long.

Post by HiPhish
How much is portability a concern? I know Guile implements *some* of r6rs, but
I wasn't paying much attention to that. Is it something worth considering or
should I just treat Guile as its own language that just so happens to be based
on Scheme?
The Scheme community is small enough that doing a little bit to make libraries
more portable is worthwhile. I'd use R6RS `library` instead of Guile-specific
`define-module`, and maybe put the code proper into a separate file from the
module furniture. (To use `include` or any other Guile-specific procedure
in an R6RS library, add `(only (guile) include)` to the imports list.) Using
Guile-specific procedures is not usually a problem, as there are probably
equivalents in other Schemes.
Post by HiPhish
The extension type `ext` (msgpack/ext.scm) is a pair of a signed 8-bit integer
and a bytevector. The constructor does not enforce this condition; the two
slots can really be anything. What is the proper way of enforcing this in
Guile? I know Common Lisp has type declarations and Racket has contracts, but
what does Guile have?
Usually you just use a constructor wrapper that checks the types before
calling the real constructor (`cons` in this case).
Post by HiPhish
Is there a way of making it unpack as many as it can?
Returning a list of values is idiomatic. It is actually possible for a caller
to receive multiple values without knowing how many it's going to get, but it's
probably more trouble than it's worth in this case.
Post by HiPhish
if anyone can recommend me a service for one-time donations that would
be cool. All the services I could find were about fundraising for charity and
stuff, not what I was looking for.
GoFundMe seems like the right thing. They are large and reputable, they only
take enough of your money to cover credit-card processing costs (for personal
campaigns like this one), they are okay with small donations. Although they are
best known for crowdfunding personal emergencies, they do handle works of art
as well (software is an art, we have Knuth's word for it).
--
John Cowan http://vrici.lojban.org/~cowan ***@ccil.org
First known example of political correctness: After Nurhachi had united
all the other Jurchen tribes under the leadership of the Manchus, his
successor Abahai (1592-1643) issued an order that the name Jurchen should
be banned, and from then on, they were all to be called Manchus.
--S. Robert Ramsey, The Languages of China
HiPhish
2018-09-19 23:32:12 UTC
Post by John Cowan
By convention, Scheme procedures whose only side effect is on a port
don't use the ! in their names: we write `read`, `write`, `display`, not
`read!`, `write!`, `display!`.
I would suggest calling them pack-and-write and read-and-unpack;
you can leave out the "and-" if you think those are too long.
That's a really good point that I did not think of. In my Racket implementation I
called the functions without exclamation mark, but that implementation lacks
pure procedures.
Post by John Cowan
The Scheme community is small enough that doing a little bit to make libraries
more portable is worthwhile. I'd use R6RS `library` instead of Guile-specific
`define-module`, and maybe put the code proper into a separate file from the
module furniture. (To use `include` or any other Guile-specific procedure
in an R6RS library, add `(only (guile) include)` to the imports list.) Using
Guile-specific procedures is not usually a problem, as there are probably
equivalents in other Schemes.
How popular is r6rs anyway? From what I gathered it was pretty badly received
and r7rs small was intentionally designed to be less ambitious, while the
upcoming r7rs will be larger than even Common Lisp.
Post by John Cowan
Usually you just use a constructor wrapper that checks the types before
calling the real constructor (`cons` in this case).
OK, that was my first idea, but I thought that there might be perhaps
something more idiomatic.
Post by John Cowan
Returning a list of values is idiomatic. It is actually possible for a caller
to receive multiple values without knowing how many it's going to get, but it's
probably more trouble than it's worth in this case.
Yes, after writing my original email I found out how to bind an unknown number
of values, but there doesn't seem to be a way of returning an unknown number
of values.
Post by John Cowan
GoFundMe seems like the right thing.
GoFundMe is about fundraising campaigns, I was thinking something along the
lines of a tip-jar where you can chip in a little if you want.
John Cowan
2018-09-20 01:44:26 UTC
Post by HiPhish
How popular is r6rs anyway? From what I gathered it was pretty badly received
and r7rs small was intentionally designed to be less ambitious, while the
upcoming r7rs will be larger than even Common Lisp.
That's basically correct, but there are still a fair number of R6RS-only
implementations, and if the number isn't growing, it's not shrinking either.
Racket has an unofficial R7RS implementation, but I don't expect that
to happen with Chez, Guile, Ikarus, IronScheme, Mosh, or Ypsilon.
(Larceny and Sagittarius are hybrid R6RS/R7RS systems.)
Some of these are more in development than others, but you
never can tell with a Scheme implementation when a new release
will appear without warning. In addition, it's not hard to transform
R6RS libraries to R7RS ones or vice versa.
Post by HiPhish
Yes, after writing my original email I found out how to bind an unknown number
of values, but there doesn't seem to be a way of returning an unknown number
of values.
(apply values list-of-vals) is your friend.
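
A small sketch of both halves, returning and receiving a number of values that is only known at run time. `unpack-all-stub` is a hypothetical stand-in that returns a fixed list; it is not part of guile-msgpack, and a real version would decode a bytevector instead.

```scheme
;; Hypothetical stand-in for an unpacker that yields every object it
;; finds; the list here is hard-coded for the sake of the example.
(define (unpack-all-stub)
  (let ((objects (list "hello" 1337 #t)))
    ;; Return however many objects there are as multiple values.
    (apply values objects)))

;; The caller collects an unknown number of values with a rest
;; argument, which turns them back into a list.
(define received
  (call-with-values unpack-all-stub
    (lambda objects objects)))

(write received)
(newline)
```

Since the caller ends up with a list again, this also shows why simply returning the list in the first place is usually less trouble.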
Post by HiPhish
Post by John Cowan
GoFundMe seems like the right thing.
GoFundMe is about fundraising campaigns, I was thinking something along the
lines of a tip-jar where you can chip in a little if you want.
You might look into Open Collective.
--
John Cowan http://vrici.lojban.org/~cowan ***@ccil.org
They tried to pierce your heart with a Morgul-knife that remains in
the wound. If they had succeeded, you would become a wraith under the
domination of the Dark Lord. --Gandalf
Aleksandar Sandic
2018-09-20 11:15:24 UTC
I have renamed `pack!` and `unpack!` to `pack-to` and `unpack-from`,
following the advice about not using the exclamation mark suffix with side
effects regarding ports. I have also changed the order of arguments so the
first argument to `pack-to` is the port and the rest is an arbitrary number of
objects to pack.

(pack-to (current-output-port) 2 3 "hello")

I think this also rolls better off the tongue: pack to the current output port
2, 3 and "hello".
Aleksandar Sandic
2018-09-20 11:24:06 UTC
Oops, wrong email address, please don't reply to this one, I'll ask the admins
to delete it...
