9/13/2017 - 8:35 AM

Database introspection instead of defining schema with language structures

Database introspection instead of defining schema with language structures

There has been some time since web frameworks like django or rails become popular. Usually they had an ORM integrated in or, more generally, a relational database toolkit. What I want to discuss is relevant to most of database toolkits: django ORM, sqlalchemy, rails active record, sequelizejs and many others.

One common thing about their design is that the database schema is defined with language structures, be it classes or structs or dictionaries, and not by making requests to the database for existing tables.

In other words, like this

# models.py

class MyTable:
  id = AutoField()

and not like this

models = inspect(db_settings)

Probably the exception is sqlalchemy, where db introspection support is pretty good. It is not the default or recommended way though. Django lets you generate python modules by inspecting db, which is not what I'm talking about.

With the db introspection approach you will use CREATE TABLE statements for the initial application setup, and not the class definition. Probably that was one of the reasons this approach was rejected in django: it promised to use python for all things and to abstract the SQL away. sqlalchemy has no purposes of abstracting sql, still the default way is class definition. Probably there are reasons I don't see, but before I list the plusses and minuses I do see, let's try to imagine what it would look like.

We have fields in our models, those will be database columns. Name of the field is the name of the column. That's it. Actually, we can express relations between models too. Though there is no relations concept in SQL, there are constraints (foreign keys) that can be seen as a kind of relation to another table. Let foreign key represent a relation. A foreign key has a name. Hypothetically this name can reflect the name of the related attribute in the model.

Now, plusses and minuses.

++ Schema definition / changes in plain SQL is more flexible

That is hardly arguable.

++ Database migrations

Schema migrations are plain SQL, data migrations are just executable files. Compare this with the "models file" aproach where a schema change must correlate the model change with the SQL. And with data migrations things are even worse since it needs a specific revision of the models file to execute. Probably the single ORM where data migrations are properly implemented is django. Speaking of the rest, probably, the only way is to copy the models file and to store it alongside the migration script (am I wrong?)

++ Using multiple toolkits at once

For example, you can use both django and sqlalchemy in the same project.

++ Allows language independent tools and tools that use plain SQL

An example of the latter: https://github.com/kennethreitz/records. A library allows you to pass parameters to sql and get results as a named tuple. To use such tools you have at least to know your SQL schema.

Also, let's take django for example. It has a lot of language-independent concepts. Take those Q, F objects for example or it's attribute lookup: books__author__name__startswith: it is not python specific and can be seen as a higher-level query language.

-- More rigid. Cannot define a custom field (like this).

Possible implementation

I think existing libraries can be adapted to support the introspection mode. Sqlalchemy has good support for it out of the box.