Discussion:
Functional database
amirouche
2018-02-10 07:34:13 UTC
Héllo all,

# Introduction

I figured out a use case for an immutable / functional database that works
like git. I like the name "streamable immutable database", but I am not
sure it applies.

This probably seems ambitious and pretentious. That said, I am certain I
can get it done. The only uncertainty is performance, but I also have
ideas for that.

The idea of building a git-like database is not new, but now I have a
better picture of it.

The question you may want to ask is: why not re-implement git in guile,
maybe with wiredtiger as the backing store? That is a legitimate question.
What I am trying to achieve is something more general than git.

Feel free to point me to relevant documentation or to argue that git in
guile is the way forward.

The main use case I want to handle is the ability to experiment with
different versions of a given machine learning model / data / dataset
that might be bigger than RAM. That is, to switch easily and efficiently
from one version of the model to another without resorting to copying all
the files or the database.

That is, a versioned, branchable, forkable database.

Feel free to argue that data and code are different and that data MUST BE
distributed out-of-band, I will be reading with great interest.

# Description

It MUST have the following features:

- It supports ACID transactions

- It's multi-threaded

- It's an association list database (like guile-wiredtiger's
  feature-space) where keys are symbols and values are any scheme value.
  In other words, it's a document database.

- It supports git-like features, i.e. tags, branches, push, pull, revert,
  merge, log, diff and of course commits and revisions. In particular,
  it's possible to access the history of a given association.

- It's immutable in the sense that CRUD operations, instead of changing
  values in place, create new entries in the database to reflect the
  change. In terms of the wiredtiger API, there is no call to
  cursor-update. It only uses cursor-insert calls.

- 'neon checkout REV' will bring into the working space a more efficient
  representation of the data. That representation MUST BE configurable.
  In other words, if the user wants to version csv, geo-temporal data,
  timeseries or whatever, it must be possible.

- It SHOULD allow mixing data with source files.

- It SHOULD also allow storing binaries efficiently.
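
To make the "insert only" and "history of a given association" requirements above more concrete, here is a minimal sketch. It is written in Python rather than guile so that it runs standalone, and it uses a plain in-memory list as a stand-in for a wiredtiger table. The class and method names (`AppendOnlyStore`, `set`, `delete`, `get`, `history`) are hypothetical, they are not part of guile-wiredtiger or of any existing neon code.

```python
class AppendOnlyStore:
    """Toy stand-in for an insert-only table: entries are never updated in place."""

    def __init__(self):
        # Each entry is (revision, uid, key, value, alive).
        self._log = []
        self._revision = 0

    def _append(self, uid, key, value, alive):
        # The only write primitive: append a new entry (think cursor-insert).
        self._revision += 1
        self._log.append((self._revision, uid, key, value, alive))
        return self._revision

    def set(self, uid, key, value):
        """Create or "update" an association by appending a new entry."""
        return self._append(uid, key, value, True)

    def delete(self, uid, key):
        """"Delete" an association by appending a tombstone entry."""
        return self._append(uid, key, None, False)

    def get(self, uid, key, revision=None):
        """Value of (uid, key) as of `revision` (latest revision by default)."""
        for rev, u, k, value, alive in reversed(self._log):
            if u == uid and k == key and (revision is None or rev <= revision):
                if not alive:
                    raise KeyError((uid, key))
                return value
        raise KeyError((uid, key))

    def history(self, uid, key):
        """All entries ever written for (uid, key), oldest first."""
        return [(rev, value, alive)
                for rev, u, k, value, alive in self._log
                if u == uid and k == key]


store = AppendOnlyStore()
r1 = store.set(1, "title", "draft")
r2 = store.set(1, "title", "final")
r3 = store.delete(1, "title")
```

Reading at `r1` or `r2` still works after the delete, because the older entries are still in the log. A real implementation would need an index instead of the linear scan used here.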

# TODO

- code the "bare database", i.e. the gist of the story, that is the
  immutable association list that takes inspiration from git.

- create benchmarks

- Index conceptnet and wikidata and demo the git-like features over
  dictionary-based named entity recognition.
amirouche
2018-02-10 23:44:39 UTC
Post by amirouche
Héllo all,
# Introduction
I figured a usecase for an immutable / functional database that works
like git.
There is some prior work [0] on this subject applied to triple stores,
aka. subject-predicate-object data stores [1]. So now I understand that a
triple store manipulates triples. It does not manipulate unique
identifiers referencing association lists. That said, if you group a set
of triples by 'subject' you get association lists, which means the two
are equivalent. The striking thing is that a triple store doesn't seem to
require extra work to pull/push triples, since FK / relations / links are
handled by the user.

[0] http://events.linkeddata.org/ldow2013/papers/ldow2013-paper-01.pdf
[1] https://en.wikipedia.org/wiki/Triplestore
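
The equivalence between triples and association lists can be illustrated with a small sketch. It is in Python rather than guile, and the function names and sample data are made up for the illustration.

```python
from collections import defaultdict


def triples_to_alists(triples):
    """Group (subject, predicate, object) triples by subject."""
    alists = defaultdict(list)
    for subject, predicate, obj in triples:
        alists[subject].append((predicate, obj))
    return dict(alists)


def alists_to_triples(alists):
    """Flatten {subject: [(predicate, object), ...]} back into triples."""
    return [(subject, predicate, obj)
            for subject, pairs in alists.items()
            for predicate, obj in pairs]


# Hypothetical sample data.
triples = [
    ("amirouche", "likes", "scheme"),
    ("amirouche", "uses", "wiredtiger"),
    ("neon", "kind", "database"),
]
alists = triples_to_alists(triples)
```

Going from triples to association lists and back loses nothing except the relative order of triples that belong to different subjects.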

The interface of my database (based on Entity Attribute Value model) is:

(fs:add alist) -> unique identifier

which will call a `get-unique-identifier` procedure and then call:

(fs:add-pair uid key value)

for every pair of the alist.

The interface of a triple store is:

(add subject predicate object)

That is, it's exactly the same as 'fs:add-pair', or almost, because in my
target implementation SUBJECT is a positive integer (max 2^64), PREDICATE
is a symbol and OBJECT is a scheme value. That allows using more
integers, which leads to faster comparisons. I could map SUBJECT and
PREDICATE to an integer via another table... Not sure what the best
solution is.
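
The relation between the two interfaces can be sketched as follows: adding an alist amounts to drawing a fresh integer identifier and adding one triple per pair. This is Python rather than guile, the class `TripleStore` is hypothetical, and a simple counter stands in for the `get-unique-identifier` procedure.

```python
import itertools


class TripleStore:
    """Toy in-memory store; `add` is built on top of `add_pair`."""

    def __init__(self):
        self.triples = []
        # Assumption: identifiers are consecutive positive integers.
        self._next_uid = itertools.count(1)

    def add_pair(self, uid, key, value):
        """Counterpart of (fs:add-pair uid key value), i.e. add one triple."""
        self.triples.append((uid, key, value))

    def add(self, alist):
        """Counterpart of (fs:add alist): returns the new unique identifier."""
        uid = next(self._next_uid)
        for key, value in alist:
            self.add_pair(uid, key, value)
        return uid


db = TripleStore()
uid = db.add([("name", "neon"), ("kind", "database")])
```

After this call `db.triples` holds one `(uid, key, value)` triple per pair of the alist, all sharing the same integer subject.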

Related to git, it's possible to change git's backend using libgit2's ODB
API [2].

[2] https://www.perforce.com/blog/your-git-repository-database-pluggable-backends-libgit2
amirouche
2018-02-11 23:30:17 UTC
Post by amirouche
Post by amirouche
Héllo all,
# Introduction
I figured a usecase for an immutable / functional database that works
like git.
There is some data [0] about the subject applied to triple stores aka.
subject-predicate-object data store [1]. So now, I understand that
triple
store manipulates - triples.
To be precise, the public-facing API doesn't expose the unique
identifiers generated by the database system.
Post by amirouche
The striking thing, is that it seems like the triple store
doesn't require extra work to pull/push triples since FK / relations
/ links
are handle by the user.
[0] http://events.linkeddata.org/ldow2013/papers/ldow2013-paper-01.pdf
[1] https://en.wikipedia.org/wiki/Triplestore
There is another interesting article, called R43ples [2], which describes
an idea very similar to what I had in mind 3 years ago when trying to
work with the opencog people [3]. That is, use a triple store to store
the history and references of the versioned triple store! I did not dig
much deeper into the paper, even though it's very well written, with lots
of diagrams.


[2] http://www.hyperdev.fr/projects/neon/10.1.1.662.1619.pdf
[3] https://wiki.opencog.org/w/User_talk:Amz3

I started the code but did not have time to compile it yet...

My plan is to demo adding / deleting triples and refs at different points
in the database history in the coming days.
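
As a rough idea of what such a demo could look like, here is a minimal sketch in which every add / del is recorded in a log and a ref is a name pointing at a revision. It is in Python rather than guile, it is not the actual neon code, and all names (`VersionedTriples`, `tag`, `state_at`) are hypothetical.

```python
class VersionedTriples:
    """Toy versioned triple set: a log of add/del events plus named refs."""

    def __init__(self):
        self.log = []    # entries are (revision, op, triple), op is "add" or "del"
        self.refs = {}   # ref name -> revision number

    def _record(self, op, triple):
        revision = len(self.log) + 1
        self.log.append((revision, op, triple))
        return revision

    def add(self, subject, predicate, obj):
        return self._record("add", (subject, predicate, obj))

    def delete(self, subject, predicate, obj):
        return self._record("del", (subject, predicate, obj))

    def tag(self, name, revision):
        """Create or move a ref so that it points at `revision`."""
        self.refs[name] = revision

    def state_at(self, ref):
        """Replay the log up to a revision number or a ref name."""
        revision = self.refs[ref] if isinstance(ref, str) else ref
        state = set()
        for rev, op, triple in self.log:
            if rev > revision:
                break
            if op == "add":
                state.add(triple)
            else:
                state.discard(triple)
        return state


db = VersionedTriples()
r1 = db.add("neon", "backend", "wiredtiger")
r2 = db.add("neon", "language", "guile")
db.tag("v1", r2)
r3 = db.delete("neon", "language", "guile")
r4 = db.add("neon", "language", "chez")
db.tag("v2", r4)
```

Here `db.state_at("v1")` and `db.state_at("v2")` give the set of triples at each ref without copying the data. Replaying the whole log is only good enough for a demo, and branches and merges are left out.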


If you find the idea interesting and you'd like to help, here are a few
tasks that can be achieved independently:

- Create a parser for the sparql query language and / or a guile DSL to
  achieve the same

- Create a parser for turtle syntax
  https://en.wikipedia.org/wiki/Turtle_(syntax)

- Package microkanren for guix
  https://framagit.org/a-guile-mind/microkanren

- Update the guile-gnome package to use guile 2.2.3 and create a
  graphical user interface that allows one to: navigate the history (like
  a git graph), merge conflicting branches, make banana flavored ice
  cream.


Happy hacking!
Amirouche Boubekki
2018-02-13 16:07:49 UTC
Post by amirouche
Post by amirouche
Héllo all,
# Introduction
I figured a usecase for an immutable / functional database that works
like git.
I will continue my work using 'chez scheme' for performance reasons.

I will log my progress at http://hyperdev.fr/projects/neon/
--
Amirouche ~ amz3 ~ http://www.hyperdev.fr