neon: git for structured data [Was: Functional database]
Amirouche Boubekki
2018-02-21 14:49:00 UTC
I tried Chez Scheme and I think GNU Guile
is a better platform for what I am trying
to achieve, so I am back.

I also know better what I want to achieve.
I will create a triple store that complies
with the semantic web standards, that is,
an RDF triple store. You will find a primer
on RDF at [0] and the underlying concepts
at [1].

[0] https://www.w3.org/TR/rdf11-primer/
[1] https://www.w3.org/TR/rdf11-concepts/

It will also be branchable, and so on, like git.

I also plan to implement SPARQL [3].
If you find SPARQL difficult, I recommend
the tutorial at data.world [2] in the meantime.
It's not very difficult and looks like SQL.

[2] https://docs.data.world/tutorials/sparql/
[3] https://www.w3.org/TR/sparql11-overview/

What I want to build is something similar to data.world,
that is, a GitLab-like platform for data that could replace
the use of git in projects like datahub.io [4].

[4] http://datahub.io/core/registry

Enough talking, what is the status? Well, I finished
porting what I had in Chez and can now run the following
scenario:

- In the master branch, I commit two triples.

- In another branch, which is an orphan branch, I commit
two triples, one of which overlaps with master.

- I can query both branches.

- In a merge commit, I fix the conflict between both
branches.

- I can query the resulting branch and get the expected
result.
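
As a rough illustration of the scenario above, here is a small Python
sketch. It is not neon's code: Commit, triples and ref are names made up
for the example, the sample data is invented, and I assume that
"overlaps" means the same subject and predicate with a different object.
It also rebuilds the visible set on every read, which a real store would
avoid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit: parent commits plus added and removed triples."""
    parents: tuple = ()
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def triples(commit):
    """Return the set of triples visible at `commit`."""
    visible = set()
    for parent in commit.parents:
        visible |= triples(parent)
    return (visible - commit.removed) | commit.added


def ref(commit, subject, predicate):
    """Return the sorted objects stored for (subject, predicate)."""
    return sorted(o for s, p, o in triples(commit)
                  if (s, p) == (subject, predicate))


# Master branch: one commit with two triples.
master = Commit(added=frozenset({
    ("post-1", "title", "Hello"),
    ("post-1", "author", "amz3"),
}))

# Orphan branch: no parent, one triple overlaps with master.
other = Commit(added=frozenset({
    ("post-1", "title", "Hello, world"),
    ("post-1", "lang", "en"),
}))

# Merge commit: two parents, the conflict is fixed by dropping one title.
merge = Commit(
    parents=(master, other),
    removed=frozenset({("post-1", "title", "Hello")}),
)

print(ref(master, "post-1", "title"))
print(ref(other, "post-1", "title"))
print(ref(merge, "post-1", "title"))
```

Without the removal in the merge commit, the merged branch would carry
both titles for the same subject.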

The code might be easier to read than this description [5]

[5] https://github.com/amirouche/neon/blob/master/guile/neon.scm

What is missing, in order of difficulty:

- microkanren package
https://framagit.org/a-guile-mind/microkanren

- wiredtiger 3 package

- Turtle aka. .ttl format parser https://www.w3.org/TR/turtle/

- sparql queries parser https://www.w3.org/TR/rdf-sparql-query/

- I am not sure of the status of guile-squee yet
https://notabug.org/cwebber/guile-squee/

- pluggable backends

If you want to work on one of these items, send me an email.

What I plan to work on next:

There is a semantic difference between neon
and RDF triple stores. In a triple store, the
same attribute can appear as many times as you
want for a given subject. That is, (ref subject)
doesn't return a proper alist.
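
To make the point concrete, here is a small Python sketch. It is not
neon code: the ref function and the sample triples are made up for
illustration. It shows that a list of pairs can repeat a key, while a
plain key-to-value mapping silently keeps only one value.

```python
# Hypothetical triples: the "tag" attribute appears twice for one subject.
triples = [
    ("post-1", "title", "Hello"),
    ("post-1", "tag", "guile"),
    ("post-1", "tag", "rdf"),
]


def ref(subject):
    """Return all (predicate, object) pairs stored for `subject`."""
    return [(p, o) for s, p, o in triples if s == subject]


pairs = ref("post-1")
print(pairs)        # three pairs, "tag" appears twice
print(dict(pairs))  # a mapping keeps a single value per key
```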

There are two other links that remain to be cited:

- https://www.w3.org/TR/rdf11-mt/

- https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/

Happy hacking,
--
Amirouche ~ amz3 ~ http://www.hyperdev.fr
Roel Janssen
2018-02-21 16:02:07 UTC
Dear Amirouche,

I'm not exactly sure if this fits in with your plans, but nevertheless
I'd like to share this code with you.

I recently looked into using triple stores (actually quad stores)
and wrote an interface to Redland librdf for Guile.

I attached the source code of the interface.
With this interface, you can write something like this:

--8<---------------cut here---------------start------------->8---
(use-modules (redland rdf) ; The attached module.
             (system foreign))

(define world (rdf-world-new))
(rdf-world-open world)

(define store (rdf-storage-new
               world
               "hashes"
               "redland"
               "new=true,hash-type='bdb',dir='path/to/triplestore'"))

(define model (rdf-model-new world store %null-pointer))

(define local-uri (rdf-uri-new world "http://localhost:5000/Redland/"))
(define s (rdf-node-new-from-uri-local-name world local-uri "Test"))
(define p (rdf-node-new-from-uri-local-name world local-uri "TestPredicate"))
(define o (rdf-node-new-from-uri-local-name world local-uri "TestObject"))

(define statement (rdf-statement-new-from-nodes world s p o))
(rdf-model-add-statement model statement)
(rdf-statement-free statement)

(rdf-model-size model)
(rdf-storage-size store)

;; Example mime-type: application/rdf+xml
(define serializer (rdf-serializer-new world %null-pointer "text/turtle" %null-pointer))
(define serialized (rdf-serializer-serialize-model-to-string serializer local-uri model))
(format #t "Serialized: ~s~%" (pointer->string serialized))

(rdf-uri-free local-uri)
(rdf-model-free model)
(rdf-storage-free store)
(rdf-world-free world)
--8<---------------cut here---------------end--------------->8---

Kind regards,
Roel Janssen
amirouche
2018-02-21 18:41:20 UTC
Héllo Roel,
Post by Roel Janssen
Dear Amirouche,
I'm not exactly sure if this fits in with your plans, but nevertheless
I'd like to share this code with you.
Thanks for the input.
Post by Roel Janssen
I recently looked into using triple stores (actually quad stores)
and wrote an interface to Redland librdf for Guile.
Indeed, quad stores. Triple stores only have:

subject predicate object

whereas quad stores are:

graph subject predicate object

I did not grasp the difference between triple stores and quad stores
until recently. See the W3C definition [0].

[0] https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph

I somewhat looked at librdf before. In particular this is interesting:

Storage for graphs in memory and persistently with Oracle Berkeley DB,
MySQL 3-5, PostgreSQL, OpenLink Virtuoso, SQLite, files or URIs.

http://librdf.org/

This is definitely a feature that should be baked into neon.
By the way, wiredtiger is the successor of Oracle Berkeley DB.
It was created by the same developers.

The differences between neon and librdf are the following:

- Quads can be versioned in branches without copying (implemented, but
only on triples so far), making it effectively a quintuple store.

- You can pull and push graphs (called 'world' in librdf, I think),
i.e. you can clone part of a remote data repository with neon, the
equivalent of git-cloning a particular directory (not implemented yet).

- The use of IRIs (or URIs) as 'graph name', 'subject' or 'predicate'
is not enforced; this doesn't break compatibility with existing
systems. That said, right now, I will implement 'object' as literals
as the specification describes them [1], to allow compatibility with
existing systems.

[1] https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
Post by Roel Janssen
I attached the source code of the interface.
--8<---------------cut here---------------start------------->8---
(use-modules (redland rdf) ; The attached module.
             (system foreign))
(define world (rdf-world-new))
(rdf-world-open world)
(define store (rdf-storage-new
               world
               "hashes"
               "redland"
               "new=true,hash-type='bdb',dir='path/to/triplestore'"))
(define model (rdf-model-new world store %null-pointer))
(define local-uri (rdf-uri-new world "http://localhost:5000/Redland/"))
(define s (rdf-node-new-from-uri-local-name world local-uri "Test"))
(define p (rdf-node-new-from-uri-local-name world local-uri "TestPredicate"))
(define o (rdf-node-new-from-uri-local-name world local-uri "TestObject"))
(define statement (rdf-statement-new-from-nodes world s p o))
(rdf-model-add-statement model statement)
The equivalent of this in neon is basically:

(add context "Test" "TestPredicate" "TestObject")

Where 'context' is the database context, somewhat equivalent to a
'cursor' in PostgreSQL parlance.

The strings are mapped to 64-bit unsigned integers in the underlying
storage to save space and ease comparisons. Subjects and predicates are
each stored in a dedicated table whose hot parts stay in RAM, which
makes the string-to-integer resolution fast. Basically, for the time
being, I rely on the database layer to cache the integer value
associated with subjects and predicates.
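
The string-to-integer mapping can be sketched as follows in Python.
This is only an illustration under my own assumptions, not neon's
implementation: Interner, pack_key, table and the in-memory dictionaries
are hypothetical stand-ins for the database tables, and add and ref take
no 'context' argument here.

```python
import struct


class Interner:
    """Hypothetical helper: map strings to integer ids and back."""

    MAX_ID = 2**64 - 1  # ids must fit in an unsigned 64-bit integer

    def __init__(self):
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def intern(self, string):
        """Return the id of `string`, allocating a new id on first use."""
        uid = self._ids.get(string)
        if uid is None:
            uid = len(self._strings)
            if uid > self.MAX_ID:
                raise OverflowError("out of 64-bit ids")
            self._ids[string] = uid
            self._strings.append(string)
        return uid

    def find(self, string):
        """Return the id of `string`, or None if it was never interned."""
        return self._ids.get(string)


def pack_key(subject_id, predicate_id):
    """Encode two ids as a 16-byte key.

    Big-endian encoding keeps byte-wise order equal to numeric order,
    which is convenient in an ordered key-value store.
    """
    return struct.pack(">QQ", subject_id, predicate_id)


subjects, predicates = Interner(), Interner()
table = {}  # packed (subject, predicate) key -> list of objects


def add(subject, predicate, obj):
    key = pack_key(subjects.intern(subject), predicates.intern(predicate))
    table.setdefault(key, []).append(obj)


def ref(subject, predicate):
    s, p = subjects.find(subject), predicates.find(predicate)
    if s is None or p is None:
        return []  # unknown strings cannot match anything
    return table.get(pack_key(s, p), [])


add("Test", "TestPredicate", "TestObject")
print(ref("Test", "TestPredicate"))
```

Lookups use find rather than intern so that querying an unknown string
does not allocate a new id.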

Similarly, a triple can currently be retrieved as follows:

(ref context "Test" "TestPredicate")

It's a minor difference, and the librdf API has the advantage of letting
users do the caching themselves.
Post by Roel Janssen
(rdf-statement-free statement)
(rdf-model-size model)
(rdf-storage-size store)
;; Example mime-type: application/rdf+xml
(define serializer (rdf-serializer-new world %null-pointer "text/turtle" %null-pointer))
(define serialized (rdf-serializer-serialize-model-to-string serializer local-uri model))
(format #t "Serialized: ~s~%" (pointer->string serialized))
There is no Turtle support yet.
Post by Roel Janssen
(rdf-uri-free local-uri)
(rdf-model-free model)
(rdf-storage-free store)
(rdf-world-free world)
--8<---------------cut here---------------end--------------->8---
Kind regards,
Roel Janssen
Thanks Roel!
amirouche
2018-03-05 22:32:27 UTC
Here is a small update on the neon project.

I implemented what could probably be called a naive
query engine that somewhat follows the SPARQL specification.

Remember that neon is a quad store: it stores a *set* of
4-tuples that look like the following:

(graph, subject, predicate, object)

Or in more casual terms:

(namespace, uid, key, value)

It's a *set*, which means you can have this tuple:

("hyperdev.fr" "metadata" "description" "a blog about programming")

And another tuple with another description:

("hyperdev.fr" "metadata" "description" "space muse")

That is, (namespace, uid, key) is not unique.

Given that schema, SPARQL is basically a pattern matching
scheme over the tuples. The only thing that I plan to add
to this part of the program is filtering, because it can
speed up the pattern matching in some cases.
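
To illustrate the idea of pattern matching over a set of quads, here is
a naive Python sketch. It is not the neon query engine: Var, match and
query are names invented for the example, the "author" quad is made up,
and every pattern is checked against every quad with no index and no
filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A query variable, like ?name in SPARQL."""
    name: str


def match(pattern, quad, bindings):
    """Match one pattern against one quad.

    Return the bindings extended with any new variables, or None when
    the quad does not fit the pattern.
    """
    extended = dict(bindings)
    for term, value in zip(pattern, quad):
        if isinstance(term, Var):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(term.name, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(quads, patterns):
    """Return every set of bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for quad in quads:
                extended = match(pattern, quad, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


quads = {
    ("hyperdev.fr", "metadata", "description", "a blog about programming"),
    ("hyperdev.fr", "metadata", "description", "space muse"),
    ("hyperdev.fr", "metadata", "author", "amz3"),  # made up for the example
}

# Find the descriptions of whatever has "amz3" as author.
solutions = query(quads, [
    (Var("graph"), Var("uid"), "author", "amz3"),
    (Var("graph"), Var("uid"), "description", Var("text")),
])
for text in sorted(s["text"] for s in solutions):
    print(text)
```

Because variables shared between patterns must keep the same value, the
two patterns behave like a join on graph and uid.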

I am not sure how to implement OPTIONAL [0] so I leave it
for later.

[0] https://www.w3.org/TR/rdf-sparql-query/#optionals

There is a lot more to do and I am a bit lost about
the goals of the project. neon seems pretty overkill
for a blog engine, and I don't actually have a particular
need for it. There is the idea of building a community
that builds knowledge bases, but I am not sure how to proceed.

BTW, forget about the tasks that I said would be useful
in the previous mail.

I made a small video:

wget http://hyperdev.fr/static/gnu-guile-hacking-15.mp4

The project is still hosted at the following address:

https://github.com/amirouche/neon

Happy hacking!
