1/16/2015 - 8:42 PM

pymc.md

Rendered
Source

PyMC

Markov chain Monte Carlo in Python

Chris Fonnesbeck

Vanderbilt Department of Biostatistics

Vanderbilt Center for Quantitative Sciences

Vanderbilt Evidence-based Practice Center

Bayesian Statistical Modeling

^ I will be discussing general implementations of Bayesian methods

Why Bother?

... the Bayesian approach is attractive because it is useful. Its usefulness derives in large measure from its simplicity. Its simplicity allows the investigation of far more complex models than can be handled by the tools in the classical toolbox.

Link and Barker 2010

^ worth the trouble!

Markov chain Monte Carlo

^ Sampling from un-normalized distributions

MCMC

Markov chain Monte Carlo simulates a

Markov chain

for which some function of interest is the

unique, invariant, limiting

distribution.

^ e.g. the joint distribution of the parameters of some model

MCMC

This is guaranteed when the Markov chain is constructed that satisfies the detailed balance equation:

Gibbs Sampler

Metropolis-Hastings

Adaptive Metropolis

Hit-and-Run Algorithm

Slice Sampler

^ First-generation MCMC

[fit] WinBUGS

model {
     for (j in 1:J){
       y[j] ~ dnorm (theta[j], tau.y[j])
       theta[j] ~ dnorm (mu.theta, tau.theta)
       tau.y[j] <- pow(sigma.y[j], -2)
     }
     mu.theta ~ dnorm (0.0, 1.0E-6)
     tau.theta <- pow(sigma.theta, -2)
     sigma.theta ~ dunif (0, 1000)
   }

Win

BUGS

No other platforms, without virtual machine
No cluster
Closed source: cannot modify or redistribute

[fit] Trap!

^ Debugging is impossible

[fit] OpenBUGS

Number of developers: 0.3

^ Andrew Thomas

OpenBUGS is open source only in the read-only sense. -- Andrew Thomas

^ Source code is not posted on public website

Component Pascal

Atrocities include:

requires BlackBox Component Builder (Windows only)
no debugger
source code is not plain text
abandoned by Oberon Microsystems

[fit] JAGS

(aka BUGS++)

Developed by Martyn Plummer (International Agency for Research on Cancer)
first released in 2002, last release 2013
written in C++

A platform for experimenting with ideas in Bayesian modeling

Its just WinBUGS!

uses variant of BUGS language
interpreter creates virtual graphical model

Its not WinBUGS!

no graphical front-end
no output processing
cross-platform
extensible: C/C++/Fortran

SAS

(yes, SAS)

general-purpose implementation

proc mcmc data=stagnant outpost=postout seed=24860 ntu=1000
             nmc=20000;
             
      ods select PostSummaries;
      ods output PostSummaries=ds;

      array beta[2];
      parms alpha cp beta1 beta2;
      parms s2;

      prior cp ~ unif(-1.3, 1.1);
      prior s2  ~ uniform(0, 5);
      prior alpha beta:  ~ normal(0, v = 1e6);

      j = 1 + (x >= cp);
      mu = alpha + beta[j] * (x - cp);
      model y ~ normal(mu, var=s2);
   run;

standard distributions available
parms: declares unknown stochastics

Stan

Gelman Lab, Columbia University

^ Stanislaw Ulam (& Nicholas Metropolis) -- Manhattan project

Stan is ...

compiled
statically typed
imperative
BSD licenced

Four Flavors of Stan

CmdStan
RStan
PyStan
MStan

Hierarchical model example

data {
    int<lower=0> J; // number of schools
    real y[J]; // estimated treatment effects
    real<lower=0> sigma[J]; // s.e. of effect estimates
}

parameters {
    real mu;
    real<lower=0> tau;
    real eta[J];
}

BUGS-inspired syntax


transformed parameters {
    real theta[J];
    for (j in 1:J)
    theta[j] <- mu + tau * eta[j];
}

model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}

PyMC

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>>

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>> theta.value
    array(0.3848267102336743)
    >>>

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>> theta.value
    array(0.3848267102336743)
    >>> theta.logp
    -0.07650623990740335
    >>>

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>> theta.value
    array(0.3848267102336743)
    >>> theta.logp
    -0.07650623990740335
    >>> theta.random()
    array(0.4290156681214033)
    >>>

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>> theta.value
    array(0.3848267102336743)
    >>> theta.logp
    -0.07650623990740335
    >>> theta.random()
    array(0.4290156681214033)
    >>> theta.value
    array(0.4290156681214033)
    >>>

    >>> theta = Exponential('theta', 2.)
    >>> theta
    <pymc.distributions.Exponential 'theta' at 0x10f9e2790>
    >>> theta.value
    array(0.3848267102336743)
    >>> theta.logp
    -0.07650623990740335
    >>> theta.random()
    array(0.4290156681214033)
    >>> theta.value
    array(0.4290156681214033)
    >>> theta.logp
    -0.1648841556828613

Arbitrary stochastic variables


@observed
def survival(value=t, lam=theta, f=failure):
    """Exponential survival likelihood,   
    accounting for censoring"""
    return sum(f * log(lam) - lam * value)

Estimating the probability of IQ impairment from blood phenylalanine for phenylketonuria patients

Christopher J. Fonnesbeck Melissa L. McPheeters Shanthi Krishnaswami Mary Louise Lindegren Tyler Reimschisel

J Inherit Metab Dis DOI 10.1007/s10545-012-9564-0

^part of a larger systematic review on the comparative effectiveness of treatment for PKU by Vanderbilt EPC

prevalence: 1 in 13,500 to 19,000 infants: RARE

Phenylalanine

^ metabolic disorder in which a buildup of phenylalanine (Phe) in the blood results from an inability to properly metabolize it

results in intellectual impairment (low IQ)

PyMC

[fit] 3

^ Complete re-write of PyMC2

John Salvatier

Amazon

Thomas Wiecki

Brown University
Quantopian

Hamiltonian Monte Carlo

(Neal 1993)

Uses a physical analogy of a frictionless particle moving on a hyper-surface

Requires an auxiliary variable to be specified

position (unknown variable value)
momentum (auxiliary)

position is -log(p(θ|x))
momentum is normal distribution corresponding to quadratic kinetic energy function

The Hamiltonian

The Hamiltonian is defined as the sum of potential energy E(s) and kinetic energy K(φ)

Hamiltonian Dynamics

This transformation preserves volume and is reversible.
Because the physical system conserves energy, the log probability that results will be similar to the log probability at the last iteration => high acceptance

Leapfrogging

In practice, we cannot simulate Hamiltonian dynamics exactly because of the problem of time discretization.
Discretization results in some error, so a Metropolis accept/reject step is required.

Hamiltonian MC

Sample a new velocity from univariate Gaussian
Perform n leapfrog steps to obtain new state 𝛘'
Perform accept/reject move of 𝛘'

Hamiltonian MC

Metropolis-Hastings

with pm.Model() as model:
    pm.glm.glm('y ~ x', data)
    step = pm.Metropolis()
    iter_sample = pm.iter_sample(samples+1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)

Video courtesy Thomas Wiecki (@twiecki)

HMC

with pm.Model() as model:
    pm.glm.glm('y ~ x', data)
    step = pm.NUTS()
    iter_sample = pm.iter_sample(samples+1000, step)

animation.FuncAnimation(fig, animate, init_func=init,
                        frames=samples, interval=5, blit=True)

Video courtesy Thomas Wiecki (@twiecki)

Hamiltonian MC requires tuning of two parameters:

Trajectory length

too small ⇒ slow mixing (random walk)
too large ⇒ u-turns

Leapfrog step size

too small ⇒ inefficient simulation
too large ⇒ inaccurate simulation

Go NUTS

With the No U-Turn Sampler

Hoffmann and Gelman (2011)

Extension of HMC that adaptively selects path lengths
also sets leapfrog step size (epsilon)

NUTS

NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution.

It stops automatically when it starts to double back and retrace its steps.

Binary Doubling

Doubling process builds a balanced binary tree whose leaf nodes correspond to position-momentum states.

Doubling is halted when the subtrajectory from the leftmost to the rightmost nodes of any balanced subtree of the overall binary tree starts to double back on itself

stops when an additional doubling would not increase the squared distance from starting point (based on BALANCED binary subtrees)

NUTS

(Hoffmann and Gelman 2011)

^ 200-dimensional mvn with high correlation
GOING TO HAVE A BAD TIME

Calculating Gradients in Theano

>>> from theano import function,tensor as T
>>> x = T.dmatrix('x')
>>> s = T.sum(1 / (1 + T.exp(-x)))
>>> gs = T.grad(s, x)
>>> dlogistic = function([x], gs)
>>> dlogistic([[3, -1],[0, 2]])
array([[ 0.04517666,  0.19661193],
       [ 0.25      ,  0.10499359]])

What's new?

HMC/NUTS
Slice sampler
Parallel chains
Simplified code base
R-style formulae for GLM

TODO

automated missing data imputation
automatic step method selection
non-parametric Bayesian methods
non-MCMC methods
containers
documentation(!)

The Future

^ Very little competition (or diversity) in space of Bayesian statistical software
Has potential for being "platform for experimenting with Bayesian ideas"
Much unfinished work

Cacher is the code snippet organizer for pro developers

We empower you and your team to get more done, faster

pymc.md

PyMC

Markov chain Monte Carlo in Python

Chris Fonnesbeck

Vanderbilt Department of Biostatistics

Vanderbilt Center for Quantitative Sciences

Vanderbilt Evidence-based Practice Center

Bayesian Statistical Modeling

Why Bother?

Markov chain Monte Carlo

MCMC

Markov chain

unique, invariant, limiting

MCMC

Gibbs Sampler

Metropolis-Hastings

Adaptive Metropolis

Hit-and-Run Algorithm

Slice Sampler

[fit] WinBUGS

Win

BUGS

[fit] Trap!

[fit] OpenBUGS

Number of developers: 0.3

Component Pascal

[fit] JAGS

(aka BUGS++)

Its just WinBUGS!

Its not WinBUGS!

SAS

(yes, SAS)

Stan

Stan is ...

Four Flavors of Stan

Hierarchical model example

PyMC

Arbitrary stochastic variables

Estimating the probability of IQ impairment from blood phenylalanine for phenylketonuria patients

Phenylalanine

PyMC

[fit] 3

John Salvatier

Thomas Wiecki

Hamiltonian Monte Carlo

(Neal 1993)

The Hamiltonian

Hamiltonian Dynamics

Leapfrogging

Hamiltonian MC

Hamiltonian MC

Metropolis-Hastings

Video courtesy Thomas Wiecki (@twiecki)

HMC

Video courtesy Thomas Wiecki (@twiecki)

Trajectory length

Leapfrog step size

Go NUTS

With the No U-Turn Sampler

Hoffmann and Gelman (2011)

NUTS

Binary Doubling

NUTS

Calculating Gradients in Theano

What's new?

TODO

The Future

Questions?