{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\nApproximate Density\n===================\n\nWhen using the method of successive approximation we construct\na regression model \n\n\\begin{align}p(\\mathbf{x} \\mid \\mathbf{g}, \\boldsymbol{\\beta})\n   = \\mathcal{N}(\n   \\mathbf{x}\n   \\mid \\mathbf{P}^{M} \\boldsymbol{\\mu}_0,\n   \\alpha \\mathbf{I})\\end{align}\nThe idea is to construct an approximation to this density by\nintroduction each of the successive approximations\n\n\\begin{align}\\mathbf{z}_{i} = \\mathbf{P}\\mathbf{z}_{i-1},\\end{align}\n\nthe idea being that knowing the complete set of approximations\n$\\{ z_0,\\ldots,z_M\\}$ we can solve for the latent variables\nby rearranging the linear equations, instead of manipulating the\npolynomial mean function.\n\nFor this conversion to work we need to introduce a regularisation\nparameter $\\lambda > 0$ and then define $\\mathcal{N}(\n\\mathbf{z}_{i} \\mid \\mathbf{P}\\mathbf{z}_{i-1}, \\lambda\n\\mathbf{I})$, once we do this we can write log-likelihood of the\nstate variables\n\n\\begin{align}\\log = -\\frac{\\lambda}{2} \\sum_{i=1}^{M}\n   \\left(\n   \\mathbf{z}_{i-1}^{\\top}\\mathbf{P}^{\\top}\\mathbf{P}\\mathbf{z}_{i-1}\n   - 2\\mathbf{z}_{i}^{\\top}\\mathbf{P}\\mathbf{z}_{i-1}\n   \\right)\\end{align}\n\nNow the matrices $\\mathbf{P}$ are linear in the parameters\nwhich means that after vectorisation they can be represented as\n\n\\begin{align}\\operatorname{vec}(\\mathbf{P}) = \\mathbf{V}\\mathbf{g} + \\mathbf{v}_0\\end{align}\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>The matrices $\\mathbf{P}$ and their affine representations are\n   most easily written compactly using kronecker products, unfortunately\n   these are not necessarily the best computational representations and\n   there is a lot here that needs refining.</p></div>\n\nLinear Gaussian Model\n---------------------\nWe take a break in the model to now discuss how to start putting some\nof the ideas discussed above into code. For the Kalman Filter we are\ngoing to use the code in the\n`PyKalman package <https://pykalman.github.io/>`_, but hacked a little\nbit to allow for filtering and smoothing of independent sequences\nwith a common transition matrix.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nfrom pydygp.liealgebras import so\nfrom sklearn.gaussian_process.kernels import RBF\nfrom pydygp.linlatentforcemodels import MLFMSA\nnp.random.seed(123)\ndef _get_kf_statistics(X, kf):\n    \"\"\" Gets\n    \"\"\"\n    # the mean, cov and kalman gain matrix\n    means, cov, kalman_gains = kf.smooth(X)\n    # pairwise cov between Cov{ z[i], z[i-1]\n    # note pairwise_covs[0] = 0  - it gets ignored\n    pairwise_covs = kf._smooth_pair(covs, gains)\n\n    S0 = 0.\n    for m, c in zip(means[:-1], covs[:-1]):\n        S0 += c + \\\n              (m[:, None, :] * m[None, ...]).transpose((2, 0, 1))\n    S1 = 0.\n    for i, pw in enumerate(pairwise_covs[1:]):\n        S1 += pw + \\\n              (means[i+1][:, None, :] * \\\n               means[i][None, ...]).transpose((2, 0, 1))\n\n    return S0.sum(0), S1.sum(0)\n                           \n\nmlfm = MLFMSA(so(3), R=1, lf_kernels=[RBF(), ], order=5)\nbeta = np.array([[0.1, 0., 0.],\n                 [-0.5, 0.31, 0.11]])\ntt = np.linspace(0., 6., 15)\nx0 = np.eye(3)\nData, g = mlfm.sim(x0, tt, beta=beta, size=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Expectation Maximisation\n------------------------\n\nSo we have introduced a large collection of unintersting latent variables,\nthe set of successive approximations $\\{ z_0, \\ldots, z_M \\}$, and\nso we need to integrate them out. If we define the statistics\n\n\\begin{align}\\boldsymbol{\\Psi}_0 = \\sum_{i=1}^{M} \\langle \\mathbf{z}_{i-1}\n   \\mathbf{z}_{i-1}^{\\top} \\rangle_{q(Z)}, \\quad\n   \\boldsymbol{\\Psi}_1 = \\sum_{i=1}^{M} \\langle \\mathbf{z}_{i}\n   \\mathbf{z}_{i-1}^{\\top} \\rangle_{q(Z)}\\end{align}\n\nThen the objective function of the `M-step` becomes\n\n\\begin{align}Q(\\mathbf{g}, \\mathbf{g}^{old}) =\n   -\\frac{1}{2} \\mathbf{g}^{\\top}\n   \\left( \\mathbf{V}^{\\top}\n   (\\boldsymbol{\\Psi}_0 \\otimes \\mathbf{I}_{NK})\\mathbf{V} +\n   \\lambda^{-1} \\mathbf{C}^{-1} \\right)\\mathbf{g} - 2\\end{align}\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "More sensible place to start -- the Kalman Filter performs the numerical integration\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from pydygp.linlatentforcemodels import KalmanFilter\n\nifx = tt.size // 2  # index left fixed by the Picard iteration\n\nmlfm._setup_times(tt, h=None)\n\nA = mlfm._K(g[0](tt), beta, ifx)\n\ninit_conds = np.array([y[ifx, :] for y in Data])\n\n# array [m0, m1, m2] with m0 = np.kron(Data[0][ifx, :], ones)\ninit_vals = np.kron(init_conds, np.ones(mlfm.dim.N)).T\nfinal_vals = np.column_stack([y.T.ravel() for y in Data])\n\nX = np.ma.zeros((mlfm.order, ) + init_vals.shape)  # data we are going to give to the KalmanFilter\nX[0, ...] = init_vals\nX[1, mlfm.order-1, ...] = np.ma.masked  # mask these values -- we have no data\nX[mlfm.order-1, ...] = final_vals\n\nNK = mlfm.dim.N*mlfm.dim.K\nobservation_matrices = np.array([np.eye(NK)]*3)\n\nkf = KalmanFilter(initial_state_mean=init_vals,\n                  initial_state_covariance=np.eye(NK)*1e-5,\n                  observation_offsets=np.zeros((mlfm.order, NK, mlfm.dim.K)),\n                  observation_matrices=observation_matrices,\n                  transition_matrices=A,\n                  transition_covariance=np.eye(NK)*1e-5,\n                  transition_offsets=np.zeros(init_vals.shape),\n                  n_dim_state=NK,\n                  n_dim_obs=NK)\n\nmeans, covs, k_gains = kf.smooth(X)\n\nimport matplotlib.pyplot as plt\nfig, ax = plt.subplots()\nfor i, mean in enumerate(means):\n    # unvectorise the column\n    m = mean[:, 0].reshape((mlfm.dim.K, mlfm.dim.N)).T\n    ax.plot(tt, m, 'k-', alpha=(i+1)/mlfm.order)\nax.plot(tt, Data[0], 'ks')\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}