[[PageOutline]]
= RDFAlchemy =
The goal of RDFAlchemy is to give anyone who uses [http://www.python.org/ Python] object-style API access to an RDF triplestore.
The same way that:
* [http://www.sqlalchemy.org SQLAlchemy] is an '''ORM''' (Object Relational Mapper) for relational database users
* RDFAlchemy is an '''ORM''' (Object RDF Mapper) for semantic web users.
'''News''' Trunk now includes:
* Read/Write access for collections and containers
* Read access to SPARQL endpoints
* Read/Write access to Sesame2
* Cascading delete
* chained descriptors and predicate range->class mapping
== Related resources ==
=== Installation ===
RDFAlchemy is now available at the [http://pypi.python.org/pypi Cheeseshop]: Just type
{{{
easy_install rdfalchemy
}}}
If you don't have setuptools installed...well you should so [http://peak.telecommunity.com/DevCenter/EasyInstall go get it]. Trust me.
=== Code ===
Browse dev code at http://www.openvest.com/trac/browser/rdfalchemy/trunk and see the current trunk and all history.
=== SVN ===
This is an actively developing project so bugs come and go. Get svn access to the trunk with:
{{{
svn checkout http://www.openvest.com/svn/public/rdfalchemy/trunk
}}}
=== User Group ===
You can now [http://groups.google.com/group/rdfalchemy-dev visit rdfalchemy-dev] at Google Groups.
Bugs can be reported here directly if you have an openid to login.
=== API Docs ===
There are epydoc API Docs at http://www.openvest.com/public/docs/rdfalchemy/api/. You can also use links there to browse source, but it might not be current with the trunk.
The use of persistent objects in RDFAlchemy will be as close as possible to what it would be in SQLAlchemy. Code like:
{{{
#!python
>>> c = Company.get_by(symbol = 'IBM')
>>> print c.companyName
International Business Machines Corp.
}}}
This code does not change as the user migrates from SQLAlchemy to RDFAlchemy and back, lowering the bar for adoption of RDF based datastores.
== Capabilities ==
* SQLAlchemy interface
* Caching of data reads
* Access from multiple datastores:
  * [http://rdflib.net rdflib] (beta)
  * [http://www.w3.org/TR/rdf-sparql-query/ SPARQL] endpoints ('''alpha''')
  * [http://www.joseki.org/ Joseki] based Jena access ('''alpha''')
  * [http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/ D2R-server] ('''alpha''')
* Access to RDF triples from SQL databases through D2Rq
== SQL Alchemy ==
SQLAlchemy was chosen over the other popular python ORM ([http://www.sqlobject.org/ SQLObject]) because:
1. There appears to be a migration of some from SQLObject to SQLAlchemy. This appears to be in part due to some of the more sophisticated SQL capability of SQLAlchemy.
2. SQLAlchemy is being used in future releases of [http://trac.openvest.org/about trac] and [http://pylonshq.org Pylons], two systems in active use at Openvest.
3. SQLAlchemy has a [http://www.sqlalchemy.org/docs/03/tutorial.html#tutorial_twoinone line of demarcation] between the SQL and ORM portions of the library. RDFAlchemy similarly provides the rdflib api to SPARQL and Sesame graphs.
'''Note:''' to avoid namespace clashes, SQLAlchemy 0.4 will expose methods such as `get_by` through `query`, so that:
{{{
#!python
c = Company.get_by(symbol='IBM')
# will become
c = Company.query.get_by(symbol='IBM')
}}}
I don't much like it but RDFAlchemy will move to play copy-cat...I mean to provide a standardized API.
== Descriptors ==
Understanding Descriptors is key to using RDFAlchemy. A descriptor binds an instance variable to the calls to the RDF backend storage.
Class definitions are simple with the rdflib Descriptors. The descriptors are implemented with caching along the lines of [http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/276643 this recipe]. The predicate must be passed in.
{{{
#!python
from rdflib import Namespace
from rdfalchemy import rdfSubject, rdfSingle, rdfMultiple

ov = Namespace('http://owl.openvest.org/2005/10/Portfolio#')
vcard = Namespace("http://www.w3.org/2006/vcard/ns#")

class Company(rdfSubject):
    rdf_type = ov.Company
    symbol = rdfSingle(ov.symbol, 'symbol')  # second param is optional
    cik = rdfSingle(ov.secCik)
    companyName = rdfSingle(ov.companyName)
    address = rdfSingle(vcard.adr)
    stock = rdfMultiple(ov.hasIssue)

c = Company.get_by(symbol = 'IBM')
print "%s has an SEC symbol of %s" % (c.companyName, c.cik)
}}}
* rdfSingle returns a single literal
* rdfMultiple returns a list (may be a list of one)
* rdfMultiple will return a python list if the predicate:
  * appears in multiple triples, so that `(s p o1)(s p o2)` etc. yields `[o1, o2]`
  * points to an RDF Collection (rdf:List)
  * points to an RDF Container (rdf:Seq, rdf:Bag or rdf:Alt)
* rdfList returns a list (may be a list of one) and on save will save as an RDF Collection (rdf:List)
* rdfContainer returns a list and on save will save as an rdf:Seq.
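A rough sketch of the idea behind such descriptors, in plain Python 3 with no RDFAlchemy or rdflib dependency. The names `Single`, `Multiple`, `fetch`, `TRIPLES` and `LOOKUPS` are made up for illustration, and the dict only stands in for a real triplestore. This is not the library's implementation, just the general pattern of a descriptor that reads from a backend once and then serves the value from a per-instance cache.

```python
# Hypothetical in-memory "store": (subject, predicate) -> list of objects.
TRIPLES = {
    ("ex:ibm", "ov:symbol"): ["IBM"],
    ("ex:ibm", "ov:hasIssue"): ["ex:ibm-common", "ex:ibm-pref"],
}
LOOKUPS = []  # records each backend read so the caching is observable


def fetch(subject, predicate):
    """Stand-in for a call to the RDF backend."""
    LOOKUPS.append((subject, predicate))
    return list(TRIPLES.get((subject, predicate), []))


class Single:
    """Sketch of an rdfSingle-like descriptor: one value, cached per instance."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        cache = obj.__dict__.setdefault("_cache", {})
        if self.predicate not in cache:
            values = fetch(obj.resUri, self.predicate)
            cache[self.predicate] = values[0] if values else None
        return cache[self.predicate]


class Multiple(Single):
    """Sketch of an rdfMultiple-like descriptor: always returns a list."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cache = obj.__dict__.setdefault("_cache", {})
        if self.predicate not in cache:
            cache[self.predicate] = fetch(obj.resUri, self.predicate)
        return cache[self.predicate]


class Company:
    symbol = Single("ov:symbol")
    stock = Multiple("ov:hasIssue")

    def __init__(self, resUri):
        self.resUri = resUri


c = Company("ex:ibm")
print(c.symbol, c.stock)
print(c.symbol)      # second read is served from the cache
print(len(LOOKUPS))  # number of backend reads so far
```

The predicate is the only thing the descriptor needs to be told, which is why it must be passed in when the class is defined.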
=== Chained predicates ===
Predicates can now be chained, as in:
{{{
#!python
c = Company.get_by(symbol='IBM')
print c[vcard.adr][vcard.region]
## or
print c.address[vcard.region]
}}}
This works because the generic `rdfSubject[predicate.uri]` notation maps to `rdfSubject.__getitem__`, which endeavors to return an instance of rdfSubject.
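The chaining can be pictured with a small stand-alone sketch (Python 3, no RDFAlchemy needed). `Subject` and `GRAPH` are hypothetical stand-ins, not the real classes: the point is only that `__getitem__` wraps a resource in another subject-like object, so the result can be indexed again, while literals come back as plain values.

```python
# Hypothetical store: subject -> {predicate: object}. Objects that are
# themselves keys of GRAPH are resources; anything else is a literal.
GRAPH = {
    "ex:ibm": {"vcard:adr": "_:addr1", "ov:symbol": "IBM"},
    "_:addr1": {"vcard:region": "NY", "vcard:locality": "Armonk"},
}


class Subject:
    def __init__(self, resUri):
        self.resUri = resUri

    def __getitem__(self, predicate):
        value = GRAPH[self.resUri][predicate]
        # Wrap resources so the result can be indexed again;
        # return literals unchanged.
        return Subject(value) if value in GRAPH else value


c = Subject("ex:ibm")
region = c["vcard:adr"]["vcard:region"]
print(region)
```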
=== Chained descriptors ===
The `__init__` functions for the Descriptors now take an optional argument of `range_type`. If you know the rdf:type of the values (meaning the URIRef of the type), you may pass it to the descriptor's `__init__`.
Within the samples module, a DOAP.Project maintainer is a FOAF.Person
{{{
#!python
from rdflib import Namespace
from rdfalchemy import rdfSubject, rdfSingle

DOAP = Namespace("http://usefulinc.com/ns/doap#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")

class Project(rdfSubject):
    rdf_type = DOAP.Project
    name = rdfSingle(DOAP.name)
    # ... some other descriptors here
    maintainer = rdfSingle(DOAP.maintainer, range_type=FOAF.Person)

from rdfalchemy.samples.foaf import Person
from rdfalchemy.orm import mapper
mapper()

# some method to find an instance
p = Project.ClassInstances().next()
p.maintainer.mbox
}}}
To get such mapping requires 3 steps:
1. Classes must be declared with the proper `rdf_type` Class variable set
2. Descriptors that return an instance of a python class should be created with the optional parameter of range_type with the same type as in step 1.
3. Call the `mapper()` function from `rdfalchemy.orm`. This can be called later to 'remap' classes at any time.
The bindings are not created until the third step so classes and descriptors can be created in any order.
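Why the order does not matter can be seen in a toy version of the three steps (Python 3). Everything here is hypothetical: `Subject`, `Single` and this `mapper()` are simplified stand-ins, and the real `rdfalchemy.orm.mapper` may work differently. The sketch only shows late binding: descriptors remember a `range_type`, and a later pass matches it against the `rdf_type` of whatever classes exist by then.

```python
class Subject:
    rdf_type = None

    def __init__(self, resUri):
        self.resUri = resUri


class Single:
    def __init__(self, predicate, range_type=None):
        self.predicate = predicate
        self.range_type = range_type
        self.range_class = Subject  # generic fallback until mapper() runs


def all_subclasses(cls):
    """Yield every direct and indirect subclass of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def mapper():
    """Bind each descriptor's range_type to the class declaring that rdf_type."""
    classes = list(all_subclasses(Subject))
    by_type = {c.rdf_type: c for c in classes if c.rdf_type}
    for cls in classes:
        for attr in vars(cls).values():
            if isinstance(attr, Single) and attr.range_type in by_type:
                attr.range_class = by_type[attr.range_type]
    return by_type


# Project refers to foaf:Person before any Person class exists.
class Project(Subject):
    rdf_type = "doap:Project"
    maintainer = Single("doap:maintainer", range_type="foaf:Person")


class Person(Subject):
    rdf_type = "foaf:Person"


before = Project.maintainer.range_class
mapper()
after = Project.maintainer.range_class
print(before.__name__, "->", after.__name__)
```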
== Hybrid SQL/RDF Alchemy Objects ==
If we look at what any Python object needs in order to respond to RDFAlchemy requests, there are only two requirements:
1. That some instance object `inst` be able to respond to an `inst.resUri` call (it needs to know its URI)
2. That there be some descriptor (like `rdfSingle()`) defined for the instance `inst` or its class `type(inst)`
The first requirement could be satisfied by creating some type of mixin class and inheriting from multiple base objects. Maybe I'll go there some day but the behavior of get_by would be uncertain (unless I reread the precedence rules :-). In the meantime we can assign or look up the relevant URI for the object (assignment could be defined via the [http://sites.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/ D2Rq] vocabulary).
From there you can assign descriptors on the fly and access your triplestore. RDF descriptors pull from the RDF triplestore (rdflib, for example) via RDFAlchemy and the rest pull from the relational database via SQLAlchemy. A developer need not put all of their data in one repository.
You can mix and match SQL, rdflib and SPARQL data with little effort.
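The two requirements can be illustrated without either library (Python 3). In this sketch `CompanyRow` merely plays the part of a class mapped by a relational ORM, and `RdfValue` and `RDF_STORE` are invented stand-ins for an RDFAlchemy descriptor and a triplestore; the URI scheme is an assumption too.

```python
# Hypothetical triplestore: (subject URI, predicate) -> value.
RDF_STORE = {
    ("http://example.org/company/ibm", "ov:companyName"):
        "International Business Machines Corp.",
}


class RdfValue:
    """Stand-in for a descriptor that reads one value from the RDF side."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return RDF_STORE.get((obj.resUri, self.predicate))


class CompanyRow:
    """Stand-in for a class whose plain attributes come from a SQL table."""

    def __init__(self, symbol, sector):
        self.symbol = symbol
        self.sector = sector

    @property
    def resUri(self):
        # Requirement 1: the object knows its URI (derived from a column here).
        return "http://example.org/company/" + self.symbol.lower()


# Requirement 2: a descriptor on the class, assigned on the fly.
CompanyRow.companyName = RdfValue("ov:companyName")

row = CompanyRow("IBM", "Technology")
print(row.sector)       # from the "relational" side
print(row.companyName)  # from the "RDF" side
```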
== CRUD ==
=== Create ===
{{{
#!python
class Person(rdfSubject):
    rdf_type = FOAF.Person
    first = rdfSingle(FOAF.givenname)
    last = rdfSingle(FOAF.surname)
p1 = Person() # creates a bnode with an [rdf:type foaf:Person] triple
p2 = Person('
}}}
{{{
#!python
>>> c = Company.get_by(symbol = 'IBM')
>>> print c.companyName
International Business Machines Corp.
>>>
>>> from rdflib import Namespace
>>> ov = Namespace('http://owl.openvest.org/2005/10/Portfolio#')
>>> print ov.companyName
http://owl.openvest.org/2005/10/Portfolio#companyName
>>> print c[ov.companyName]
International Business Machines Corp.
}}}
This provides the user with complete flexibility. Any predicate can be given using the `dict` style notation. The predicate values can even be determined dynamically at run time.
In [http://www.mnot.net/sw/sparta/ Sparta], however, the Namespace prefix is brought forward into the attribute name, giving something like `c.ov_companyName`. I don't like this and will not carry it forward. If you know the prefix mapping and predicate name, use the TRAMP style dict access as above. If you want pythonic dot notation access, you should use descriptors. You can even declare them after the definition of the class, as in:
{{{
#!python
Company.stockDescription = rdfSingle(ov.stockDescription,'stockDescription')
print c.stockDescription
}}}
= SPARQL Endpoints =
'''WARNING''': ''early alpha code at work here.'' It provides read-only access.
== Standalone use ==
This module can stand alone. '''It is not dependent on the rest of RDFAlchemy'''. You can use it as a drop-in replacement for many rdflib !ConjunctiveGraph applications.
Ported methods include:
* `triples` including derivative methods like:
  * `subjects`, `predicates`, `objects`
  * `predicate_objects`, `subject_predicates` etc
* `value`
The following update methods will '''not''' work for SPARQL Endpoints as they are read only (see [#Sesame Sesame] below):
* `add` and `remove` including derivatives like:
  * `set`
* `parse` and `load` including the ability to load from a url
`SELECT`::
Returns a generator of tuples for each return result
`CONSTRUCT`::
Returns an rdflib `ConjunctiveGraph('IOMemory')` instance which can be:
* queried through the rdflib api
* assigned as the `db` element to an rdfSubject instance
* serialized to 'n3' or 'rdf/xml'
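To make the "generator of tuples" idea concrete, here is a stand-alone sketch (Python 3, standard library only) that turns a `SELECT` response into tuples. It assumes the endpoint answers in the W3C SPARQL Query Results XML Format; `select_rows` and the sample document are made up for illustration and are not RDFAlchemy's own parser.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SR = "{http://www.w3.org/2005/sparql-results#}"

SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/><variable name="name"/></head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/ibm</uri></binding>
      <binding name="name"><literal>IBM</literal></binding>
    </result>
    <result>
      <binding name="s"><uri>http://example.org/acme</uri></binding>
    </result>
  </results>
</sparql>"""


def select_rows(xml_text):
    """Yield one tuple per <result>, ordered as the variables in <head>."""
    root = ET.fromstring(xml_text)
    names = [v.get("name") for v in root.iter(SR + "variable")]
    for result in root.iter(SR + "result"):
        row = {}
        for binding in result.findall(SR + "binding"):
            term = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = term.text
        # Unbound variables become None.
        yield tuple(row.get(n) for n in names)


rows = list(select_rows(SAMPLE))
for row in rows:
    print(row)
```

Because `select_rows` is a generator, callers can stop early without building the whole result list; this sketch still parses the full document up front.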
== Sesame endpoints ==
Can provide read access to Sesame through endpoints. The `SELECT` and `CONSTRUCT` methods are supported.
If you know you have a Sesame2 endpoint use the `SesameGraph()` rather than `SPARQLGraph` as it has different capabilities.
== Joseki endpoints ==
Can provide read access to Joseki through endpoints. The `SELECT`, `CONSTRUCT`, and `DESCRIBE` methods are supported.
`triples`::
works but does not currently operate as a true stream. Therefore:
{{{
db.triples((None,None,None))
}}}
will attempt to load the entire endpoint into a memory resident graph ''and then'' iterate over the results.
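One way to see why an all-`None` pattern is expensive is to translate a triple pattern into a query by hand. The sketch below (Python 3, standard library only) is an assumption about how such a call ''could'' be served over the SPARQL protocol, not a description of RDFAlchemy's internals. `pattern_to_construct` and `request_url` are hypothetical helpers, the endpoint URL is a placeholder, and bound terms are assumed to be URIs.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def pattern_to_construct(pattern):
    """Turn an (s, p, o) pattern into a CONSTRUCT query; None is a wildcard."""
    terms = [
        "?" + name if value is None else "<%s>" % value
        for name, value in zip("spo", pattern)
    ]
    triple = " ".join(terms)
    return "CONSTRUCT { %s } WHERE { %s }" % (triple, triple)


def request_url(endpoint, pattern):
    """Build a SPARQL protocol GET URL carrying the query parameter."""
    return endpoint + "?" + urlencode({"query": pattern_to_construct(pattern)})


everything = pattern_to_construct((None, None, None))
print(everything)

# Placeholder endpoint; nothing is actually requested here.
url = request_url("http://localhost:2020/sparql", (None, None, None))
sent_back = parse_qs(urlsplit(url).query)["query"][0]
print(sent_back == everything)
```

With no term bound, the `WHERE` clause matches every triple, so the response is the whole dataset.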
== Relational Data thru SPARQL ==
In general if your data is in a relational database, you will probably want to use SQLAlchemy as your ORM. If, however, that data is in a relational table (yours or someone else's) across the web, and has a SPARQL Mapper on top of it, RDFAlchemy becomes your tool.
=== D2R Server ===
[http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/ D2R Server] includes a Joseki servlet. If you deploy a D2R Server you can access your relational database table through the web as an rdf datastore. RDFAlchemy usage looks like SQLAlchemy but now it can reach across the web into your rdbms (postgres, mysql, oracle, db2 etc).
D2R Server is used internally at Openvest but there are other engines which should all be accessible through the RDFAlchemy SPARQL client.
=== Other SPARQL / SQL maps ===
Other active projects providing SPARQL access to relational databases are:
* [http://jena.sourceforge.net/SquirrelRDF SquirrelRDF]. In addition to relational databases, SquirrelRDF also supports access to LDAP directories.
* [http://virtuoso.openlinksw.com/ Virtuoso] which seems to use a pretty smart rewriting algorithm and also supports Named Graphs.
* [http://ccnt.zju.edu.cn/projects/dartgrid/intro.html DartQuery]. !DartQuery is a component of the !DartGrid application framework which rewrites SPARQL queries as SQL against legacy relational databases.
* [http://www.w3.org/2005/05/22-SPARQL-MySQL/ SPASQL] is an open-source module compiled into the MySQL server to give MySQL native support for RDF.
= Sesame =
The RDFAlchemy trunk now includes access to [http://www.openrdf.org openrdf Sesame2] datastores. !SesameGraph is a subclass of SPARQLGraph and builds on SPARQL endpoint capabilities as it provides write access via a [http://www.openrdf.org/doc/sesame2/system/ch08.html Sesame2 HTTP Protocol]. Just pass the url of the Sesame2 repository endpoint and from there you can use an rdflib type api or use the returned graph in rdfSubject as you would any rdflib database.
== Standalone use ==
This module can stand alone. '''It is not dependent on the rest of RDFAlchemy'''. You can use it as a drop-in replacement for many rdflib !ConjunctiveGraph applications.
Ported methods include:
* `triples` including derivative methods like:
  * `subjects`, `predicates`, `objects`
  * `predicate_objects`, `subject_predicates` etc
* `value`
* `add` and `remove` including derivatives like:
  * `set`
* `parse` and `load` including the ability to load from a url
{{{
#!python
from rdfalchemy.sesame2 import SesameGraph
from rdflib import Namespace, Literal

doap = Namespace('http://usefulinc.com/ns/doap#')
rdf = Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#')

db = SesameGraph('http://localhost:8080/sesame/repositories/testdoap')

db.load('data/rdfalchemy_doap.rdf')
db.load('http://doapspace.org/doap/some_important.rdf')

project = db.value(None, doap.name, Literal('rdflib'))
for p, o in db.predicate_objects(project):
    print '%-30s = %s' % (db.qname(p), o)
}}}
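For readers curious what write access over the Sesame2 HTTP Protocol involves, the sketch below (Python 3, standard library only) builds the kind of request that protocol defines for a repository's `statements` resource, where `subj`, `pred` and `obj` are passed as N-Triples encoded query parameters. `nt` and `statements_request` are hypothetical helpers, not part of !SesameGraph, and the URI detection in `nt` is deliberately crude. No request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def nt(term):
    """Encode a term N-Triples style: <uri> for URIs, "text" for plain literals.

    Assumption for this sketch: anything starting with http(s):// is a URI.
    """
    if term.startswith(("http://", "https://")):
        return "<%s>" % term
    return '"%s"' % term.replace("\\", "\\\\").replace('"', '\\"')


def statements_request(repo_url, method, subj=None, pred=None, obj=None):
    """Return (HTTP method, URL) for a request against <repo>/statements."""
    pairs = (("subj", subj), ("pred", pred), ("obj", obj))
    params = [(key, nt(value)) for key, value in pairs if value is not None]
    url = repo_url.rstrip("/") + "/statements"
    if params:
        url += "?" + urlencode(params)
    return method, url


# A removal of every (?, doap:name, "rdflib") statement maps to a DELETE.
method, url = statements_request(
    "http://localhost:8080/sesame/repositories/testdoap",
    "DELETE",
    pred="http://usefulinc.com/ns/doap#name",
    obj="rdflib",
)
query = parse_qs(urlsplit(url).query)
print(method, url)
```

A `GET` on the same resource reads matching statements, and a `POST` with an RDF document as the body adds statements.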
== RDFAlchemy use of Sesame ==
You can use it as you would any rdflib database.
Near the head of your code, place a call like
{{{
#!python
from rdfalchemy import rdfSubject
from rdfalchemy.sesame2 import SesameGraph

rdfSubject.db = SesameGraph('http://some-place.com/repository')
}}}
== Other Python SPARQL endpoints ==
Some of these have nice code which I hope to migrate into RDFAlchemy. For the impatient, you can check out:
* http://ivanherman.wordpress.com/2007/07/06/sparql-endpoint-interface-to-python/
* http://code.google.com/p/pysparql/source Nice use of pulldom to use a generator for large responses.
* http://www.openrdf.org/forum/mvnforum/viewthread?thread=1393 Attempts to provide DB API 2.0 access. The code looks incomplete but has some very nice use of reading the Sesame [http://www.openrdf.org/doc/sesame/api/org/openrdf/sesame/query/BinaryTableResultConstants.html binary results format] (`application/x-binary-rdf-results-table`)
* http://www.w3.org/2001/sw/DataAccess/proto-tests/tools/ used for the w3c [http://www.w3.org/2001/sw/DataAccess/impl-report-protocol SPARQL Implementation Report]
= Jython =
Not sure if the project is ready to branch. If the Sesame2 HTTP access provided above is not enough and you need to access Sesame and/or Jena with python, you can check out the [wiki:RDFAlchemyJython RDFAlchemyJython] page for some samples.