SQLAlchemy

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

alchemy 是啥意思

['ælkɪmɪ] n. 点金术；魔力

最佳教程

官方文档 (推荐下载 PDF 到 kindle 里慢慢看)
Essential SQLAlchemy (太旧了，还是看官方文档吧)

使用 SQLAlchemy 之类的 ORM 相对 raw sql 的好处

ORM 可以将数据库存储的数据封装成对象，同时，如果封装的好的话，应该可以将所有的数据库操作都封装到对象中。这样代码在组织结构上会非常的清晰。

models 文件中的类应该包含哪些字段

这首先是一个设计的问题，其次才是 SQLAlchemy 如何使用的问题。

例如，我如果要设计一个展示 quote 的服务（包含了 tag 的功能），首先应该定义 quote 这个类对外的属性/接口，而不是一上来就想该在数据库表里存储哪些字段。

然后，将不得不存储的对外不需要访问的字段添加上。名字以下划线开头。

当其他模块不再需要使用 session 时，说明 models 已经完美了。

为什么有的时候 String 字段没有指定长度

name = Column(String()) # SQLite, Postgresql 可以不指定长度
name = Column(String(50)) # MySQL 等数据库必须指定长度

SQLite 的 VARCHAR 字段类型实际等同于 TEXT, 并不需要指定长度。但是为了不同数据库的兼容性，还是加上长度为好。

这说明，即使使用 SQLAlchemy 做了数据库操作的封装，不同数据库之间还是有差异的。

Session 与 Engine 的关系

Engine 是用于维护数据库的连接池
Session 从 Engine 的连接池中获取一个连接

所以比较谨慎的做法是，在每次获取完数据后，都关闭本次使用的 session 。

Closing the session is the most important, and it’s a bit trickier. Depending on your web framework you may be able to define a function that will run at the end of every request which will tear down all active sessions with session.remove(). Alternatively you may prefer to explicitly close the session as soon as your data has been retrieved – my choice, since my data layer had to be able to work without the web layer.

try:
    ...
    session.commit()
except Exception:
    session.rollback()
    raise
finally:
    session.close()

配置连接池的示例：

def init(cfg):
    """This function should be called before Session use.

    Input is a dict like object with databases settings from the config.
    """
    database_uri = 'mysql://%(username)s:%(password)s@%(host)s/%(dbname)s' % cfg

    opts = dict(
        pool_recycle = cfg.get('pool_recycle', 3600),
        pool_size = cfg.get('pool_size', 5),
        max_overflow = cfg.get('max_overflow', 0),
    )

    if cfg.get('debug', False):
        opts['echo'] = True
        opts['listeners'] = [DebugListener()]

    engine = create_engine(database_uri, **opts)
    Session.configure(bind=engine)

参考：

https://github.com/cleartext/Enterprise-Microblogging-Server/blob/master/microblog/db.py

model 最好定义在独立的文件中

例如， models.py

id 字段使用 unsigned integer 足够

43 亿足够了，单表很难达到这个数量级。Unsigned BigInteger 有点过了。

http://dev.mysql.com/doc/refman/5.0/en/integer-types.html

各 unsigned Integer 类型的范围：

tiny int -> 255
small int -> 65535
medium int -> 1600 万
int -> 43 亿
big int -> 不可想象

http://dev.mysql.com/doc/refman/5.0/en/integer-types.html

如何理解 relationship 和 backref

例如:

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))

这个 relationship 相当于给 Parent 的实例增加了一个 children 属性， backref 相当于给 Child 的实例增加了一个 parent 属性。

从日志的 raw SQL 来看，如果使用 parent.children 则相当于又向 child 表发送了一次 SQL 查询，只是指定了 parent_id 而已。

遍历一个表的所有行

当一个表非常大时，需要每次批量取出一部分数据。

使用 raw sql 时，我们会使用 offset & limit.

为什么 for p in session.query(Picture).yield_per(5): print(p) 行不通?

因为，yield_per 只是限制了 python 程序中同时生成的 ORM 对象的数量，而没有限制 MySQL 服务中的内存消耗。从 yield_per 生成的 raw sql 代码就能看出，并没有对 SQL 语句做任何处理，就是直接 SELECT *

比较好的办法就是延续 offset & limit

def page_query(q):
    offset = 0
    while True:
        r = False
        for elem in q.limit(1000).offset(offset):
           r = True
           yield elem
        offset += 1000
        if not r:
            break

for item in page_query(Session.query(Picture)):
    print item

参考：

http://stackoverflow.com/questions/1145905/scanning-huge-tables-with-sqlalchemy-using-the-orm
http://stackoverflow.com/questions/7389759/memory-efficient-built-in-sqlalchemy-iterator-generator
http://markmail.org/message/nsccwf2uckvfvvl7

如何在 commit 前获取 id

首先要搞清楚 commit, flush, refresh 的关系

commit 是提交到数据库，并写到了磁盘上
flush 也提交到了数据库，并且数据对其他 transaction 可见，但是没有写到磁盘上. 可回滚，并且在当前 session 中能够查询到。(session.add 建立起一个事务)

参考：

http://stackoverflow.com/questions/1316952/sqlalchemy-flush-and-get-inserted-id
http://stackoverflow.com/questions/4201455/sqlalchemy-whats-the-difference-between-flush-and-commit

filter 与 filter_by 的区别

db.users.filter_by(name='Joe')
db.users.filter(db.users.name=='Joe')

参考：

http://stackoverflow.com/questions/2128505/whats-the-difference-between-filter-and-filter-by-in-sqlalchemy

建立索引

http://stackoverflow.com/questions/6626810/multiple-columns-index-when-using-the-declarative-orm-extension-of-sqlalchemy

class Person(Model):                                                           
     __tablename__ = "person"                                                   
                                                                                
     id = Column(Integer(unsigned=True), primary_key=True,                      
             autoincrement=True)                                                
     name = Column(String(255), nullable=False, index=True)                     
     is_man = Column(Boolean, nullable=False)                                   
     created_at = Column(DateTime)                                              
     updated_at = Column(DateTime)

生成的 SQL 语句

CREATE TABLE person (
        id INTEGER NOT NULL, 
        name VARCHAR(255) NOT NULL, 
        is_man BOOLEAN NOT NULL, 
        created_at DATETIME, 
        updated_at DATETIME, 
        PRIMARY KEY (id), 
        CHECK (is_man IN (0, 1))
)

CREATE INDEX ix_person_name ON person (name)

排序后取最新的一条记录

person = session.query(Person).filter_by(name="zhongwei").order_by(        
             Person.created_at).first()

对应的 SQL 语句

SELECT person.id AS person_id, person.name AS person_name, 
    person.is_man AS person_is_man, person .created_at AS person_created_at, 
    person.updated_at AS person_updated_at FROM person 
WHERE person.name = ? ORDER BY person.created_at
LIMIT ? OFFSET ?

2013-03-27 10:35:55,087 INFO sqlalchemy.engine.base.Engine ('zhongwei', 1, 0)

Note: 如果一条记录也没有，会返回 None。

限制查询返回特定的字段

为了减少 IO，最好限制返回的字段。

person = session.query(Person.name, Person.created_at,                     
             Person.updated_at).filter_by(name="zhongwei").order_by(            
             Person.created_at).first()

微信关注我哦 👍

大象工具微信公众号

我是来自山东烟台的一名开发者，有感兴趣的话题，或者软件开发需求，欢迎加微信 zhongwei 聊聊，查看更多联系方式

生活	跑步清单足球鲁班探索孤独的美食家驾驶电视剧收纳奶爸健康 game 电影周末 joke
Geek	健身 Laravel Git Vim MySQL Linux UI Windows SVN 纪录片管理 Shell 游记工具手机 BackboneJS 自建博客 Mac DNS Tornado CDN Django Python AngularJS 理财前端 Nginx 爬虫 Redis Javascript Browser 浏览器推广 OAuth CSS PHP Social Networks 安全运维创业杂记 VueJS Android Image IDE Java ReactJS 数据分析 SQLite RESTful 读书笔记家电 ecshop Vagrant wordpress docker SEO GTD magento mongodb nodejs weex 冷知识 ruby iOS 微信小程序 AI CMS 快应用 backpack 广告联盟 OA 短信 UWP Win CSharp Tampermonkey graphviz 钉钉 WPS 数据字典微信公众号 Fuchsia Adobe XD SQL Server thinkphp 代码规范商业模式 Flutter 头痛的问题 serverless 视频制作国际化 golang 服务器 Kotlin 网站建设 5G 笔记本图片 spark spring 物联网 InfluxDB 图像识别 postgre rust 提示词
成长的烦恼	闲言碎语待产不睡觉写作程序员孙心然语录原则大鸿语录
地球	植物时间中文赚钱国家地理烟台一生伏首拜阳明 emoji 弟子规英文国际贸易