2012-09-22

Since when were you under the illusion that I attended PyCon JP 2012?

python

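So, there seem to be rumors going around that I did or did not attend PyCon JP 2012, and did or did not give a talk there. Wise readers, please take care not to be misled by groundless rumors.

Honestly, it does feel as though I attended: there was a blonde lady, there was a lady banging a gong, there was a lady called Madobe or something, the ladies I usually see around the office seemed different from usual, lunch was stylish sandwiches from Fungo, and I have a feeling it was a lot of fun.

I also have a feeling there were lots of moments like "these high school students are amazing", but I'm no longer even sure I attended, and even if I did, there was far too much to write it all down, so I'll leave it out.

Why have I been writing so fuzzily all this time? I came down with a terrible cold on the second day, I've felt awful all week, and today I'm on a forced march back to my parents' home, so I no longer remember anything about last Saturday. Let's just go to sleep already... me.

To all the staff and speakers of PyCon JP 2012: thank you very much, and thank you for all your hard work.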

2012-08-24

The Tale of Django's MasterSlaveRouter (Junji Inagawa style)

python django

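This is a story about someone who had just started using Django, you see. Django can handle multiple databases. Yes, it can. And so he was thinking he'd like to handle master/slave replication.

But then, you see, when he used the MasterSlaveRouter from the documentation, he ran into a strange phenomenon.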

https://docs.djangoproject.com/en/dev/topics/db/multi-db/

He implemented "MasterSlaveRouter" exactly as the documentation said, ran syncdb ever so nervously, and when he took a quiet peek... there it was. It appeared! An error.

$ python manage.py syncdb
 .
 .
 django.db.utils.IntegrityError: (1062, "Duplicate entry 'contenttypes-contenttype' for key 'app_label'")

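Huh, that's strange. That's strange... No matter how many times he re-ran syncdb, it kept appearing. For a while he was too scared to move.

"This won't do. I have to do something about this," he willed with all his might, and somehow he could move again. He turned to you-know-which sensei, and out came the following URL.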

http://stackoverflow.com/questions/10497418/django-syncdb-trying-to-insert-duplicate-rows-when-multiple-databases-are-specif

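Apparently, the cause of this phenomenon is that some of the models Django creates by default use Manager objects with their own caching mechanism. Yes, they had a dark history. Scary, scary.

The answer also described how to exorcise it: in MasterSlaveRouter's db_for_read, make the models that have their own caching always read from the master, and the phenomenon apparently goes away.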

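class MasterSlaveRouter(object):
    def db_for_read(self, model, **hints):
        # contenttypes and sites cache rows in their managers, so read them from master
        if model._meta.app_label in ('contenttypes', 'sites'):
            return 'master'
        else:
            return 'slave'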
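For reference, here is a minimal sketch of a complete router with that workaround folded in, written for Python 3. It assumes DATABASES defines the aliases 'master' and 'slave', as in the snippet above; the class name CachedAppsAwareRouter and the CACHED_APPS tuple are my own hypothetical names. Since Django routers are plain classes, it can be exercised without Django by faking a model that only has _meta.app_label.

```python
class CachedAppsAwareRouter(object):
    """Master/slave router that pins apps with manager-level caches to master.

    Hypothetical sketch: assumes DATABASES has 'master' and 'slave' aliases.
    """

    # apps whose default managers keep their own in-process cache
    CACHED_APPS = ('contenttypes', 'sites')

    def db_for_read(self, model, **hints):
        # read cached apps from master so lookups see rows syncdb just wrote there
        if model._meta.app_label in self.CACHED_APPS:
            return 'master'
        return 'slave'

    def db_for_write(self, model, **hints):
        # every write goes to master
        return 'master'

    def allow_relation(self, obj1, obj2, **hints):
        # master and slave hold the same data, so relations are always allowed
        return True
```

It would be listed in DATABASE_ROUTERS like any other router. It is still the same "exorcism", just with the list of cursed apps in one place.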

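Phew. He breathed a sigh of relief, thinking he could finally sleep in peace, but on reflection this is a rather dubious way to exorcise it, and a chill ran down his spine.

They say that even now, night after night, he is still searching for a better way...

...What on earth is this post.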

2012-02-26

Assorted Python Coding Styles

python

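Out of the blue, here's my attempt at rounding up references that may help with Python coding style. Besides PEP 8, which serves as the foundation, I list the coding styles of the web frameworks I've seen or heard of.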

PEP 8 Style Guide for Python Code

PEP 8 is the coding style that almost everyone refers to when writing Python. Most of the coding styles listed below follow PEP 8 and define their own additional rules on top of it.

If you've never heard of PEPs, the following entry should help.

http://www.oreilly.co.jp/community/blog/2010/12/string-reprensation-in-python3.html
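To make "PEP 8" a little more concrete, here is a small Python 3 sketch of a few of its best-known conventions: 4-space indentation, CapWords for class names, lower_case_with_underscores for functions, methods and attributes, and UPPER_CASE for module-level constants. The names themselves are made up for illustration.

```python
MAX_RETRIES = 3  # module-level constant: UPPER_CASE_WITH_UNDERSCORES


class RetryPolicy(object):  # class name: CapWords
    """Decide whether another attempt is allowed."""

    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries  # attribute: lower_case

    def should_retry(self, attempt_count):  # method: lower_case_with_underscores
        # 4 spaces per indentation level, spaces around comparison operators
        return attempt_count < self.max_retries
```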

PEP 257 Docstring Conventions

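This PEP covers how to write docstrings. In Python, docstrings are the recommended way to write a module's help text and the comments for functions and classes, and this PEP describes how those docstrings should be written.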
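As a rough sketch of what PEP 257 asks for (Python 3; the function is made up): triple double quotes, a one-line summary ending in a period, and, for a multi-line docstring, a blank line followed by the details. The docstring is what help() and the __doc__ attribute expose.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit.

    The argument may be an int or a float; the result is always a float.
    """
    return celsius * 9.0 / 5.0 + 32.0


# the summary line is the first line of __doc__
print(celsius_to_fahrenheit.__doc__.splitlines()[0])
```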

Google Python Style Guide

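This is Google's guide. It covers a fairly wide range of topics, such as pychecker and default argument values. Because it spells out the pros and cons of each rule, it's easy to see why a rule is needed. I only just noticed that it even comes with a vimrc setting as a bonus.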

http://google-styleguide.googlecode.com/svn/trunk/google_python_style.vim
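One example from the default-argument topic: the guide advises against mutable default values, because a default is evaluated once, when the def statement runs, and that one object is then shared by every call. A minimal Python 3 sketch of the pitfall and the usual None-sentinel fix (the function names are illustrative):

```python
def append_bad(item, bucket=[]):
    # the same list object is reused by every call that omits `bucket`
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # create a fresh list per call instead
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


append_bad(1)
print(append_bad(2))   # [1, 2]: state leaked from the first call
append_good(1)
print(append_good(2))  # [2]: calls are independent
```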

The Pylons Project | Coding Style and Standards

http://docs.pylonsproject.org/en/latest/community/codestyle.html#coding-style

This is the coding style published by the Pylons Project (http://www.pylonsproject.org/), best known for the Pyramid framework (http://www.pylonsproject.org/projects/pyramid/about). The style itself conforms to PEP 8, and the document also covers the etiquette for contributing to the Pylons Project.

Not strictly about style, but I also found its unit testing guidelines worth reading.

http://docs.pylonsproject.org/en/latest/community/testing.html#testing-guidelines
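As I recall, one pattern those guidelines recommend is not importing the code under test at module scope, but inside helper methods such as _getTargetClass and _makeOne, so that a broken import fails individual tests instead of the whole test module. Treat the following as a Python 3 sketch of that idiom rather than an excerpt from the guidelines; fractions.Fraction merely stands in for "the code under test".

```python
import unittest


class FractionTests(unittest.TestCase):

    def _getTargetClass(self):
        # import the code under test here, not at module scope
        from fractions import Fraction
        return Fraction

    def _makeOne(self, *args, **kw):
        # every test builds its instance through this one helper
        return self._getTargetClass()(*args, **kw)

    def test_reduces_to_lowest_terms(self):
        fraction = self._makeOne(2, 4)
        self.assertEqual((fraction.numerator, fraction.denominator), (1, 2))

    def test_addition(self):
        total = self._makeOne(1, 3) + self._makeOne(1, 6)
        self.assertEqual(total, self._makeOne(1, 2))
```

Saved as a test module, this runs with python -m unittest.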

Django | Coding style

https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/coding-style/

The coding style of Django, the web framework that needs no introduction. It is also based on PEP 8, and additionally covers style for models, views, templates, and so on. I couldn't find a Japanese translation.

The Pocoo Style Guide

http://www.pocoo.org/internal/styleguide/
http://flask.pocoo.org/docs/styleguide/ ※ Same content as the Pocoo guideline above
(Japanese translation) http://tokuda109.github.com/flask-docs-ja/styleguide.html

The coding style proposed by Pocoo, the group (if that's the right word) behind frameworks and tools such as Flask (http://flask.pocoo.org/), Werkzeug (http://werkzeug.pocoo.org/), and Sphinx (http://sphinx.pocoo.org/). It is also based on PEP 8. Personally, I found its way of writing module header comments helpful.
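For reference, the module header layout I mean looks roughly like this, as I remember it from the Pocoo guide: the dotted module name underlined with tildes, a short description, then :copyright: and :license: fields. The module name mypkg.textutils, its function, and the copyright/license values are placeholders of my own; the code is Python 3.

```python
# -*- coding: utf-8 -*-
"""
    mypkg.textutils
    ~~~~~~~~~~~~~~~

    Small helpers for cleaning up user-supplied text.

    :copyright: (c) 2012 by YOUR NAME.
    :license: BSD, see LICENSE for more details.
"""


def normalize_spaces(text):
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return " ".join(text.split())
```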

Zope | Coding Style

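I don't know much about it, but Zope is a long-established framework, apparently known for being used by the CMS Plone (http://plone.org/) and by Launchpad (https://launchpad.net/), the bug tracking system used for Ubuntu development and more. It also conforms to PEP 8. Personally, this was the first I'd heard of comment conventions such as "# XXX" and "# BBB".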

http://docs.zope.org/zopetoolkit/codingstyle/todocomments.html
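Here is my understanding of that page, sketched in Python 3 (the function names are hypothetical): "# XXX" flags something known to be wrong or in need of attention, while "# BBB" marks code kept only for backward compatibility, to be deleted once old callers are gone.

```python
import warnings


def render_title(text):
    # XXX: only whitespace is stripped; surrounding punctuation is kept as-is
    return text.strip().title()


# BBB: old name kept for callers written against the previous API;
# remove once they have all migrated to render_title()
def renderTitle(text):
    warnings.warn("renderTitle is deprecated; use render_title",
                  DeprecationWarning, stacklevel=2)
    return render_title(text)
```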

TurboGears | Coding Style

http://www.turbogears.org/2.1/docs/main/Contributing.html#coding-style

Another framework that conforms to PEP 8. I've heard the name for quite a long time, but I couldn't figure out what it has actually been used for.

Repoze

http://what.repoze.org/docs/1.0/Participate.html#coding-conventions

Repoze is apparently a project to make Zope's individual components usable from WSGI applications. It follows PEP 8. Even someone as out of the loop as me has heard of repoze.who (http://docs.repoze.org/who/1.0/).

Couldn't find anything like a coding style

CherryPy (http://docs.cherrypy.org/)
web2py (http://web2py.com/book)
Tornado (http://www.tornadoweb.org/documentation/index.html)

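Summary

Before looking into it, I assumed web application frameworks in particular would each push their own style, and that things might be chaotic in places, but almost all of them follow PEP 8. PEP really has a lot of influence.

Seriously, it's impressive that none of them says "here is the golden rule: Imitate the existing [framework name] code."

If you know of anything else good, please let me know.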


2012-02-08

bpmappers is handy!

python

Today I'm writing about how impressed I was by how handy a library called bpmappers is.

bpmappers documentation - bpmappers v0.5 documentation

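When building an API that returns JSON, you often convert just the parts of an object you need into a dict, then serialize that dict to JSON. That's fine while the object's properties are literals such as numbers and text, but when a property is another object, converting the object to a dict in the first place gets tedious.

bpmappers takes care of this for you. I'll leave the details to the documentation and just introduce a simple example.

Setup

First, suppose we have the following models in SQLAlchemy. Entry and Comment have a 1:N relationship.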

# declare models 
class Entry(Base): 
    __tablename__ = 'entries'  
    id = Column(Integer, primary_key=True)   
    text = Column(String)  
    created_at = Column(DateTime, default=datetime.now, nullable=False) 

    comments = relationship('Comment') 

class Comment(Base):
    __tablename__ = 'comments'
    id = Column(Integer, primary_key=True)
    entry_id = Column(Integer, ForeignKey('entries.id'), nullable=False)
    text = Column(String)
    created_at = Column(DateTime, default=datetime.now, nullable=False)

With this definition, entry.comments gives you a list of Comment objects, as below.

# select data
entries = Entry.query.all()
for entry in entries:
    for comment in entry.comments:
        print ' comment.id: ' + str(comment.id)
        print ' comment.text: ' + comment.text
        print ' comment.created_at: ' + str(comment.created_at)

Let's use bpmappers to turn this nested object structure into JSON with minimal effort.

Defining the Mappers

# declare mappers
class CommentMapper(Mapper):
    id = RawField()
    entry_id = RawField()
    text = RawField()
    created_at = RawField(callback=lambda x:x.strftime('%Y-%m-%d %H:%M:%S'))
                                                                            
class EntryMapper(Mapper):
    id = RawField() 
    text = RawField() 
    created_at = RawField(callback=lambda x:x.strftime('%Y-%m-%d %H:%M:%S'))
 
    comments = ListDelegateField(CommentMapper) 

# convert to json (a Mapper maps a single object, so map the first Entry)
print json.dumps(EntryMapper(Entry.query.first()).as_dict(), indent=2)

That's all there is to it: just define the Mappers, and as_dict() turns everything into a dict for you. The output looks like this:

{
  "created_at": "2012-02-08 00:06:12", 
  "text": "entry text", 
  "id": 1, 
  "comments": [
    {
      "created_at": "2012-02-08 00:06:12", 
      "text": "comment1 text", 
      "entry_id": 1, 
      "id": 1
    }, 
    {
      "created_at": "2012-02-08 00:06:12", 
      "text": "comment2 text", 
      "entry_id": 1, 
      "id": 2
    }
  ]
}

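Just by subclassing Mapper and declaring the fields I needed, I got what I wanted. RawField copies the value of the object's property with the same name as-is. It also accepts a callback, so you can transform the data before it is set.

The key is ListDelegateField. It delegates the mapping of each list element to another Mapper class. In this example, mapping each Comment object in comments is left to CommentMapper.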
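To get a feel for what RawField and ListDelegateField do conceptually, here is a toy re-implementation in plain Python 3. To be clear, this is not bpmappers' code or API: Raw, ListDelegate, and MiniMapper are hypothetical stand-ins, and the data uses SimpleNamespace objects instead of SQLAlchemy rows.

```python
import json
from datetime import datetime
from types import SimpleNamespace


class Raw(object):
    """Copy the same-named attribute, optionally passing it through a callback."""

    def __init__(self, callback=None):
        self.callback = callback

    def map(self, obj, name):
        value = getattr(obj, name)
        return self.callback(value) if self.callback else value


class ListDelegate(object):
    """Hand each element of a list attribute to another mapper class."""

    def __init__(self, mapper_cls):
        self.mapper_cls = mapper_cls

    def map(self, obj, name):
        return [self.mapper_cls(item).as_dict() for item in getattr(obj, name)]


class MiniMapper(object):
    """Apply the field objects declared on a subclass, in definition order."""

    def __init__(self, obj):
        self.obj = obj

    def as_dict(self):
        fields = [(name, field) for name, field in vars(type(self)).items()
                  if isinstance(field, (Raw, ListDelegate))]
        return {name: field.map(self.obj, name) for name, field in fields}


def format_dt(value):
    return value.strftime('%Y-%m-%d %H:%M:%S')


class CommentMini(MiniMapper):
    id = Raw()
    text = Raw()


class EntryMini(MiniMapper):
    id = Raw()
    created_at = Raw(callback=format_dt)
    comments = ListDelegate(CommentMini)  # delegate each comment to CommentMini


entry = SimpleNamespace(
    id=1,
    created_at=datetime(2012, 2, 8, 0, 6, 12),
    comments=[SimpleNamespace(id=1, text='comment1 text'),
              SimpleNamespace(id=2, text='comment2 text')],
)
print(json.dumps(EntryMini(entry).as_dict(), indent=2))
```

The point of the delegation design is that the nested mapper knows nothing about its parent, so CommentMini could be reused on its own.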

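Summary

That was a whirlwind tour, but in summary:

Just defining Mappers means I don't have to massage the objects' own data, which is a big win.
With DelegateField, nested data structures are no problem.
Mappers can be written 1:1 with model classes, so things stay tidy.

That's all.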

Bonus

Code fragments are hard to follow, so here's the complete sample code.

#!/usr/bin/env python
#-*- coding:utf8 -*-

import json 
from datetime import datetime 

from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import scoped_session, sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base

from bpmappers import Mapper, RawField, ListDelegateField 

# create base 
engine = create_engine('sqlite://', convert_unicode=True, echo=False) 
db_session = scoped_session(sessionmaker(autocommit=False,  
                            autoflush=False,
                            bind=engine)) 
Base = declarative_base() 
Base.query = db_session.query_property()

# declare models
class Entry(Base):
    __tablename__ = 'entries' 
    id = Column(Integer, primary_key=True) 
    text = Column(String) 
    created_at = Column(DateTime, default=datetime.now, nullable=False)

    comments = relationship('Comment')

class Comment(Base): 
    __tablename__ = 'comments' 
    id = Column(Integer, primary_key=True)  
    entry_id = Column(Integer, ForeignKey('entries.id'), nullable=False)  
    text = Column(String) 
    created_at = Column(DateTime, default=datetime.now, nullable=False) 

# declare mappers
class CommentMapper(Mapper): 
    id = RawField() 
    entry_id = RawField()
    text = RawField() 
    created_at = RawField(callback=lambda x:x.strftime('%Y-%m-%d %H:%M:%S'))

class EntryMapper(Mapper): 
    id = RawField() 
    text = RawField() 
    created_at = RawField(callback=lambda x:x.strftime('%Y-%m-%d %H:%M:%S'))
    comments = ListDelegateField(CommentMapper)

# insert data 
Base.metadata.drop_all(bind=engine)
Base.metadata.create_all(bind=engine)

db_session.add(Entry(text='entry text'))
db_session.commit()
db_session.add(Comment(entry_id=1, text='comment1 text')) 
db_session.add(Comment(entry_id=1, text='comment2 text')) 
db_session.commit()

# select data
print '-' * 10 
entries = Entry.query.all() 
for entry in entries: 
    print 'entry.id: ' + str(entry.id)  
    print 'entry.text: ' + entry.text 
    print 'entry.created_at: ' + str(entry.created_at)
    for comment in entry.comments: 
        print ' ' + ('-' * 9)
        print ' comment.id: ' + str(comment.id)  
        print ' comment.text: ' + comment.text
        print ' comment.created_at: ' + str(comment.created_at)

# convert to json
print '-' * 10
print json.dumps(EntryMapper(entries[0]).as_dict(), indent=2)  # a Mapper maps one object

2012-02-05

Renaming the files in a directory with sequential numbers, sorted by creation time

python

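I came across a blog post along these lines and wanted to see how I'd do it myself.

[Python] A script that renames the files in a directory by appending indices in ascending order of creation time - Kshi_Kshi's blog

Assumptions

Following the blog above, I keep the requirements as narrow as possible and put it together the easy way.

No recursive directory traversal.
Only files are renamed.
File extensions stay the same after renaming.
The rename format is hard-coded.
Ascending sort only.
No backups or similar safeguards.

Here's what I came up with: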

#!/usr/bin/env python
# coding: utf-8

import os
import sys

DIR_PATH = "/Users/hogehoge/hogehoge"
RENAME_PATTERN = "IMG_%05d%s"

if __name__ == "__main__":
    if not os.path.isdir(DIR_PATH):
        sys.exit("ERROR: '%s' is not a directory." % DIR_PATH)

    # next() takes only the first result of os.walk: the top directory, no recursion
    (root, dirs, files) = next(os.walk(DIR_PATH.rstrip(os.sep)))
    # sort (ctime, filename) pairs; unlike a dict keyed by ctime,
    # a list keeps every file even when two timestamps are equal
    targets = sorted((os.stat(os.path.join(root, f)).st_ctime, f) for f in files)
    for index, (st_ctime, filename) in enumerate(targets, 1):
        (name, ext) = os.path.splitext(filename)
        os.rename(os.path.join(root, filename),
                  os.path.join(root, RENAME_PATTERN % (index, ext)))
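One caveat: on Linux and other Unixes, st_ctime is the time the file's metadata last changed, not its creation time (on Windows it has traditionally been the creation time). Below is a Python 3 sketch of a best-effort variant that prefers st_birthtime where the platform provides it (macOS and the BSDs, for example) and only computes the rename plan, so the result can be checked before any file is touched. The function names creation_time and plan_renames are my own.

```python
import os


def creation_time(path):
    """Best-effort creation time: st_birthtime where the platform has it,
    otherwise fall back to st_ctime."""
    st = os.stat(path)
    return getattr(st, "st_birthtime", st.st_ctime)


def plan_renames(dir_path, pattern="IMG_%05d%s"):
    """Return [(old_name, new_name), ...] without touching the filesystem."""
    names = [f for f in os.listdir(dir_path)
             if os.path.isfile(os.path.join(dir_path, f))]  # files only
    # sort by creation time, breaking ties by file name for a stable order
    ordered = sorted(names, key=lambda f: (creation_time(os.path.join(dir_path, f)), f))
    return [(f, pattern % (i, os.path.splitext(f)[1]))
            for i, f in enumerate(ordered, 1)]
```

Once the plan looks right, the renames can be applied with os.rename over the returned pairs.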

Side notes

I learned that the directory separator is available as os.sep.
os.path.splitext was new to me too.
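A quick Python 3 illustration of the two: os.sep is the platform's path separator, and os.path.splitext splits off only the last extension, treating a leading dot as part of the name rather than as an extension.

```python
import os

print(repr(os.sep))                        # '/' on POSIX, '\\' on Windows
print(os.path.splitext("IMG_0001.JPG"))    # ('IMG_0001', '.JPG')
print(os.path.splitext("backup.tar.gz"))   # ('backup.tar', '.gz')
print(os.path.splitext(".bashrc"))         # ('.bashrc', '')
```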

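Addendum (2012/02/05)

Oyakata-sama (id:imagawa_yakata), who is getting harder and harder to tell apart from Kaibara Yuzan or a Sengoku-era warlord, gave me the following tip on Twitter. Thanks as always. m( _ _ )m

"When you work with an iterator, pass the iterator object to the built-in next(). That way the code runs on both Python 2 and 3."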
- refs: https://twitter.com/imagawa_yakata/status/166042980627656704

Looking it up, I found it was proposed as PEP 3114. I didn't know that.
- PEP 3114 -- Renaming iterator.next() to iterator.__next__()

So I changed the code above as follows.

-    (root, dirs, files) = os.walk(DIR_PATH.rstrip(os.sep)).next()  
+    (root, dirs, files) = next(os.walk(DIR_PATH.rstrip(os.sep)))
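A minimal Python 3 sketch of what the tip means: the built-in next() calls the iterator's __next__() method (the method that was called next() in Python 2, hence PEP 3114), and it can also take a default to return instead of raising StopIteration. Walking "." is just an example.

```python
import os

walker = os.walk(".")
# first result only: the top directory itself, no recursion
root, dirs, files = next(walker)

letters = iter("ab")
print(next(letters))        # a
print(letters.__next__())   # b (what next() calls under the hood)
print(next(letters, None))  # None: the default instead of StopIteration
```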

2012-02-04

Getting a list of available modules

python

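I suddenly wanted a list of the modules available in Python, so here's a memo.

Requirements

By "available modules and packages" I mean the following:

Modules compiled into the interpreter (built-ins)
Modules installed in site-packages
Modules installed in virtual environments such as virtualenv
In short, everything on the path that is recognized as a module or package
I want the result as data, such as a list, not just something to look at

Not what I was after

The remains of dreams that weren't quite what I wanted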

sys.builtin_module_names

Literally the list of modules built into the interpreter. Not enough.

http://www.python.jp/doc/release/library/sys.html#sys.builtin_module_names

import sys                                                                      
print sys.builtin_module_names

#=> ('__builtin__', '__main__', '_ast', '_codecs', '_sre', '_symtable', '_warnings', '_weakref', 'errno', 'exceptions', 'gc', 'imp', 'marshal', 'posix', 'pwd', 'signal', 'sys', 'thread', 'xxsubtype', 'zipimport')

sys.modules

The list of modules already loaded at runtime. Still not enough; in fact, built-in modules that haven't been loaded yet don't show up at all.

http://www.python.jp/doc/release/library/sys.html#sys.modules

import sys                                                                      
print sys.modules.keys()

#=>['cStringIO', 'copy_reg', 'encodings', 'site', '__builtin__', '__main__', 'encodings.encodings', 'abc', 'posixpath', 'flaskext', '_weakrefset', 'errno', 'pprint', 'encodings.codecs', '_abcoll', 'types', '_codecs', 'new', '_warnings', 'genericpath', 'stat', 'zipimport', 'encodings.__builtin__', 'warnings', 'UserDict', 'encodings.utf_8', 'sys', 'codecs', 'os.path', 'signal', 'linecache', 'posix', 'encodings.aliases', 'exceptions', 'os', '_weakref']

help('modules')

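Loads of output. Much closer. It even picks up things installed in virtual environments... but the result is printed to standard output, which isn't quite what I want.

Introduction to Python Introspection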

$ python -c "help('modules')"

Please wait a moment while I gather a list of all available modules...

ArgImagePlugin      _Mlte               difflib             pimp
Audio_mac           _MozillaCookieJar   dir_help            pip
BaseHTTPServer      _OSA                dircache            pipes
Bastion             _Qd                 dis                 pkg_resources
BdfFontFile         _Qdoffs             distutils           pkgutil
BeautifulSoup       _Qt                 django_assets       platform
BeautifulSoupTests  _Res                doctest             plistlib
BmpImagePlugin      _Scrap              dprint              popen2
...
...

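This is it, this is what I was looking for

Running python -c "help('modules')" gives the same result as running pydoc modules; in fact, help probably relies on pydoc. So I went digging through pydoc.py.

And there it was: something called pydoc.ModuleScanner. pydoc.py uses it like this: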

modules = {}
def callback(path, modname, desc, modules=modules):
    if modname and modname[-9:] == '.__init__':
        modname = modname[:-9] + ' (package)'
    if find(modname, '.') < 0:
        modules[modname] = 1
def onerror(modname):
    callback(None, modname, None)
ModuleScanner().run(callback, onerror=onerror)
self.list(modules.keys())

That's how "pydoc modules" is implemented. Since I only wanted the list itself, I copied it and changed it like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from pydoc import ModuleScanner
from string import find

modules = []

def callback(path, modname, desc, modules=modules):  
    if modname and modname[-9:] == '.__init__': 
        modname = modname[:-9]  
    if find(modname, '.') < 0 and modname not in modules:
        modules.append(modname) 

def onerror(modname): 
    callback(None, modname, None)

ModuleScanner().run(callback, onerror=onerror)
print modules

Running this...

$ python all_modules.py
['__builtin__', '_ast', '_codecs', '_sre', '_symtable', '_warnings', '_weakref', 'errno', 'exceptions', 'gc', 'imp', 'marshal', 'posix', 'pwd', 'signal', 'sys', 'thread', 'xxsubtype', 'zipimport', 'all_avaliable_modules', 'all_avaliable_modules2', 'ascii_to_ebcidic', 'blinker_sample', 'bool_judge', 'cast', 'class_name', 'converting_byte', 'datetime_test', 'dict_key_map', 'dict_merge', 'diff_time', 'dir_help', 'dprint', 'fib', 'file_abs_path', 'fold_test', 'func_args_disruptiv
...
...

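Whoa, lots of them. Built-ins, virtual environments, even the modules in the current directory all showed up, lol.

Including submodules and subpackages

The version above only lists top-level (can I call it that?) packages and modules. If you want submodules and subpackages in the list as well, change the callback function like this: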

def callback(path, modname, desc, modules=modules):
    if modname and modname[-9:] == '.__init__':
        modname = modname[:-9]
-    if find(modname, '.') < 0 and modname not in modules:
-        modules.append(modname)
+    if modname not in modules:
+        modules.append(modname)
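In Python 3, a similar list of submodules can be had without borrowing pydoc internals by calling pkgutil.walk_packages on a package's __path__. A minimal sketch; the helper name submodules is mine. Keep in mind that walk_packages imports each subpackage to look inside it, so package __init__ code does run.

```python
import importlib
import pkgutil


def submodules(package_name):
    """Return dotted names of all modules and packages under package_name."""
    package = importlib.import_module(package_name)
    # prefix makes the yielded names fully qualified, e.g. 'json.decoder'
    return sorted(name for _finder, name, _ispkg in
                  pkgutil.walk_packages(package.__path__, prefix=package_name + "."))


print(submodules("json"))
```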

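A foolish detour

Before I found ModuleScanner, I had more or less given up, and there I was, foolishly trying to force a list out of the standard output of pydoc modules. But because that runs in a separate process, it can't pick up modules from paths added with sys.path.insert in my own process, so I abandoned it.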

#!/usr/bin/env python                                                           
# -*- coding: utf-8 -*-                                                         

import pprint
import commands
import re
ret = commands.getoutput('python -c "help(\'modules\')"')
module_list = []
start = False
for line in ret.split('\n'): 
    if re.match(r'[^ ]+[ ]+[^ ]+[ ]+[^ ]+[ ]+[^ ]+$', line): 
        start = True
    if start and line == '':
        break
    if not start:
        continue

    line = re.sub(r'[ ]+',r' ', line)  
    for module in line.split(' '):
        if module.strip(' ') != '': 
            module_list.append(module.strip(' '))

module_list.sort() 
pprint.pprint(module_list)

Addendum (2015/3/4)

Using pkgutil.iter_modules, which someone told me about in the comments, you can apparently list installed modules and more. Nice. Thank you, id:hirokiky.

>>> import pkgutil
>>> for m in pkgutil.iter_modules():
...     print m
...
(<pkgutil.ImpImporter instance at 0x103103e18>, 'UserDict', False)
(<pkgutil.ImpImporter instance at 0x103103e18>, '_abcoll', False)
(<pkgutil.ImpImporter instance at 0x103103e18>, '_weakrefset', False)
(<pkgutil.ImpImporter instance at 0x103103e18>, 'abc', False)
(<pkgutil.ImpImporter instance at 0x103103e18>, 'codecs', False)
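For the record, a Python 3 sketch that combines the two sources: pkgutil.iter_modules() with no argument scans every entry on sys.path (top level only), and sys.builtin_module_names covers modules compiled into the interpreter, which have no file on the path to find. The function name is my own.

```python
import pkgutil
import sys


def available_top_level_modules():
    """Names of top-level modules/packages importable from sys.path,
    plus modules compiled into the interpreter."""
    names = {name for _finder, name, _ispkg in pkgutil.iter_modules()}
    names.update(sys.builtin_module_names)
    return sorted(names)


mods = available_top_level_modules()
print(len(mods), mods[:10])
```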

Side note

ModuleScanner's onerror function is called for packages (not modules) that can't be imported because of some error.
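A small Python 3 experiment that seems to confirm this, using pkgutil.walk_packages, which ModuleScanner relies on as far as I can tell: only packages are imported during the scan (to look inside them), so a package whose __init__ raises triggers onerror, while an equally broken plain module is merely listed. The directory and package names are made up.

```python
import os
import pkgutil
import sys
import tempfile


def scan_with_errors(root):
    """Walk modules under `root`; return (found_names, failed_names)."""
    found, failed = [], []
    sys.path.insert(0, root)  # walk_packages imports packages via sys.path
    try:
        for _finder, name, _ispkg in pkgutil.walk_packages([root], onerror=failed.append):
            found.append(name)
    finally:
        sys.path.remove(root)
    return found, failed


with tempfile.TemporaryDirectory() as tmp:
    # a broken package: its __init__ raises when imported
    os.mkdir(os.path.join(tmp, "brokenpkg"))
    with open(os.path.join(tmp, "brokenpkg", "__init__.py"), "w") as f:
        f.write("raise ImportError('boom')\n")
    # a broken plain module: never imported by the scan
    with open(os.path.join(tmp, "brokenmod.py"), "w") as f:
        f.write("raise ImportError('boom')\n")
    found, failed = scan_with_errors(tmp)

print(found, failed)
```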


2012-02-01

datastore_admin no longer works on Google App Engine

python

Version 1.6.2 of the Google App Engine Python SDK is out. I upgraded right away and hit the error below, so here's a memo.

SdkReleaseNotes - googleappengine - Google App Engine Python SDK Release Notes - Google App Engine - Google Project Hosting

Error: datastore_admin does not exist

Running dev_appserver.py produced the following error:

google.appengine.ext.builtins.InvalidBuiltinName: datastore_admin is not the name of a valid builtin.
Available handlers are: admin_redirect, appstats, default, deferred, django_wsgi, remote_api

It looks like datastore_admin has been removed from the builtins. More precisely, it seems datastore_admin is unavailable only when developing for Python 2.7 (I haven't tried 2.5). The release notes say as much.

By "developing for 2.7" I mean setting runtime to "python27" in app.yaml.

runtime: python27

So I removed it from app.yaml.

builtins:
- datastore_admin: on # <- removed