This is the VEM step of the M&M ASH model.
$\newcommand{\bs}[1]{\boldsymbol{#1}}$ $\DeclareMathOperator*{\diag}{diag}$ $\DeclareMathOperator*{\cov}{cov}$ $\DeclareMathOperator*{\rank}{rank}$ $\DeclareMathOperator*{\var}{var}$ $\DeclareMathOperator*{\tr}{tr}$ $\DeclareMathOperator*{\veco}{vec}$ $\DeclareMathOperator*{\uniform}{\mathcal{U}niform}$ $\DeclareMathOperator*{\argmin}{arg\ min}$ $\DeclareMathOperator*{\argmax}{arg\ max}$ $\DeclareMathOperator*{\N}{N}$ $\DeclareMathOperator*{\gm}{Gamma}$ $\DeclareMathOperator*{\dif}{d}$
We assume the following multivariate, multiple regression model with $N$ samples, $J$ effects and $R$ conditions (and without covariates, for the time being) \begin{align} \bs{Y}_{N\times R} = \bs{X}_{N \times J}\bs{B}_{J \times R} + \bs{E}_{N \times R}, \end{align} where \begin{align} \bs{E} &\sim \N_{N \times R}(\bs{0}, \bs{I}_N, \bs{\Lambda}^{-1}),\\ \bs{\Lambda} &= \diag(\lambda_1,\ldots,\lambda_R). \end{align}
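For concreteness, here is a minimal simulation sketch of this model; the dimensions and residual precisions below are arbitrary illustrative choices (names are suffixed `_sim` so they do not clash with the real data loaded later).
import numpy as np

rng = np.random.default_rng(0)
N, J, R = 100, 10, 2                           # samples, effects, conditions
X_sim = rng.normal(size=(N, J))
B_sim = rng.normal(size=(J, R))                # placeholder effects; their prior is specified next
lam_sim = np.array([2.0, 0.5])                 # diagonal of Lambda: per-condition residual precisions
# E ~ MN(0, I_N, Lambda^{-1}): rows are iid N_R(0, diag(1/lambda))
E_sim = rng.normal(size=(N, R)) / np.sqrt(lam_sim)
Y_sim = X_sim @ B_sim + E_sim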
We assume the true effects $\bs{b}_j$ (rows of $\bs{B}$) are iid with a mixture-of-multivariate-normals prior
$$p(\bs{b}_j) = \sum_{t = 0}^T\pi_t\N_R(\bs{b}_j | \bs{0}, \bs{V}_t),$$ where the $\bs{V}_t$'s are $R \times R$ covariance matrices and the $\pi_t$'s are their weights.
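A draw of $\bs{b}_j$ from this prior can be sketched as follows; the two components used here (a null matrix plus an identity) anticipate the test setup adopted later in this notebook.
import numpy as np

rng = np.random.default_rng(1)
R = 2
V = np.array([np.zeros((R, R)), np.eye(R)])    # V_0 (null component), V_1
pi = np.array([0.9, 0.1])                      # mixture weights

def draw_b(rng, pi, V):
    t = rng.choice(len(pi), p=pi)              # pick a component with probability pi_t
    return rng.multivariate_normal(np.zeros(V[t].shape[0]), V[t])

b_draws = np.array([draw_b(rng, pi, V) for _ in range(10)])   # 10 iid effects (rows of B)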
We place a Gamma prior on each diagonal element of $\bs{\Lambda}$:
$$\lambda_r \overset{iid}{\sim} \gm(\alpha, \beta),$$and set $\alpha = \beta = 0$ so that it is equivalent to estimating $\bs{\Lambda}$ via maximum likelihood.
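For reference, here is a sketch of the variational update for $\lambda_r$ that this choice implies, writing $\bs{y}_r$ and $\bs{b}_r$ for the $r$-th columns of $\bs{Y}$ and $\bs{B}$, and $E_q$ for expectation under the variational posterior (the full derivation is in the write-ups linked above):
\begin{align} q(\lambda_r) &= \gm\left(\alpha + \tfrac{N}{2},\ \beta + \tfrac{1}{2}E_q\left[\|\bs{y}_r - \bs{X}\bs{b}_r\|^2\right]\right), \\ E_q[\lambda_r] &= \frac{\alpha + N/2}{\beta + \tfrac{1}{2}E_q\left[\|\bs{y}_r - \bs{X}\bs{b}_r\|^2\right]} \overset{\alpha = \beta = 0}{=} \frac{N}{E_q\left[\|\bs{y}_r - \bs{X}\bs{b}_r\|^2\right]}, \end{align}which with $\alpha = \beta = 0$ is the form used in the implementation below.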
We can augment the prior of $\bs{b}_j$ with an indicator vector $\bs{\gamma}_j \in \{0, 1\}^{T+1}$ for the membership of $\bs{b}_j$ in one of the $T + 1$ mixture components. The densities involved are
\begin{align} p(\bs{Y},\bs{B},\bs{\Gamma},\bs{\Lambda}) &= p(\bs{Y}|\bs{B}, \bs{\Lambda})p(\bs{B}|\bs{\Gamma})p(\bs{\Gamma})p(\bs{\Lambda}), \\ p(\bs{Y}|\bs{B}, \bs{\Lambda}) &= \N_{N \times R}(\bs{X}\bs{B}, \bs{I}_N, \bs{\Lambda}^{-1}), \\ p(\lambda_r|\alpha,\beta) &= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\lambda_r^{\alpha - 1}\exp\{-\beta\lambda_r\}, \\ p(\bs{b}_j|\bs{\gamma}_j) &= \prod_{t = 0}^T\left[\N(\bs{b}_j|\bs{0},\bs{V}_t)\right]^{\gamma_{jt}},\\ p(\bs{\gamma}_j) &= \prod_{t = 0}^{T} \pi_t^{\gamma_{jt}}. \end{align}We assume the $\bs{V}_t$'s and their corresponding $\pi_t$'s are known. In practice we use mashr to estimate these quantities and provide them to M&M.
For the posterior of $\bs{B}$ we seek an independent variational approximation based on
\begin{align} q(\bs{B}, \bs{\Gamma}, \bs{\Lambda}) = q(\bs{\Lambda})\prod_{j = 1}^{J}q(\bs{b}_j,\bs{\gamma}_j), \end{align}so that we can maximize over $q$ the following lower bound of the marginal log-likelihood
\begin{align} \log p(\bs{Y}) \geq \mathcal{L}(q) = \int q(\bs{B}, \bs{\Gamma}, \bs{\Lambda}) \log\left\{\frac{p(\bs{Y},\bs{B},\bs{\Gamma},\bs{\Lambda})}{q(\bs{B}, \bs{\Gamma}, \bs{\Lambda})}\right\}\dif\bs{B}\dif\bs{\Gamma}\dif\bs{\Lambda}. \end{align}Gao & Wei have previously developed a version that assumes $\bs{\Lambda} = \bs{I}_R$; this version generalizes it to a diagonal matrix with Gamma priors. David has developed a version that assumes a diagonal plus low-rank structure; the model of that version differs somewhat from the one shown here and will be prototyped after this version works.
The complete derivations of the updates are documented elsewhere (in the two PDF write-ups linked above). Here I document the core updates to guide implementation of the algorithm.
Let $E[\bs{R}_{-j}] := \bs{Y} - \bs{X}\bs{\mu}_{\bs{B}} + \bs{x}_j\bs{\mu}_{\bs{B}[j, ]}^{\intercal}$, then
\begin{align} \bs{\xi}_j &= E\left[\bs{R}_{-j}\right]^{\intercal}\bs{x}_j\|\bs{x}_j\|^{-2}, \\ \bs{\Sigma}_{jt} &= \left(\bs{V}_t^{-1} + \|\bs{x}_j\|^2\bs{\Lambda}\right)^{-1}, \\ \bs{\mu}_{jt} &= \bs{\Sigma}_{jt}\bs{\Lambda}E\left[\bs{R}_{-j}\right]^{\intercal}\bs{x}_j, \\ w_{jt} &= \frac{\pi_t\N(\bs{\xi}_j|\bs{0}, \bs{V}_t + \bs{\Lambda}^{-1}\|\bs{x}_j\|^{-2})}{\sum_{t' = 0}^T\pi_{t'}\N(\bs{\xi}_j|\bs{0}, \bs{V}_{t'} + \bs{\Lambda}^{-1}\|\bs{x}_j\|^{-2})},\\ \bs{\mu}_{\bs{B}[j, ]} &= \sum_{t = 0}^T w_{jt}\bs{\mu}_{jt} \end{align}We update until the lower bound $\mathcal{L}(q)$ converges.
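To make the update concrete, here is a minimal numpy sketch of a single-effect computation; the function and argument names are illustrative and not part of the implementation below. Given `xj` ($\bs{x}_j$), the expected residual `R_minus_j` ($E[\bs{R}_{-j}]$), the prior components `V` and `pi`, and the precision matrix `Lambda`, it returns the component weights, component means, and the posterior mean of $\bs{b}_j$.
import numpy as np
from scipy.stats import multivariate_normal

def single_effect_update(xj, R_minus_j, V, pi, Lambda):
    '''One update for effect j: returns w_jt, mu_jt and the posterior mean of b_j.'''
    R_cond = Lambda.shape[0]                            # number of conditions
    xnorm2 = xj @ xj                                    # ||x_j||^2
    xi = R_minus_j.T @ xj / xnorm2                      # xi_j
    iLambda = np.linalg.inv(Lambda)
    # posterior mixture weights w_jt
    w = np.array([pi[t] * multivariate_normal.pdf(xi, np.zeros(R_cond),
                                                  V[t] + iLambda / xnorm2)
                  for t in range(len(pi))])
    w /= w.sum()
    mu = np.empty((len(pi), R_cond))
    for t in range(len(pi)):
        # Sigma_jt = (V_t^{-1} + ||x_j||^2 Lambda)^{-1}, written so a singular V_t is never inverted
        Sigma_t = np.linalg.inv(np.eye(R_cond) + xnorm2 * V[t] @ Lambda) @ V[t]
        mu[t] = Sigma_t @ Lambda @ R_minus_j.T @ xj     # mu_jt
    return w, mu, w @ mu                                # E[b_j] = sum_t w_jt mu_jt
Writing $\bs{\Sigma}_{jt}$ as $(\bs{I}_R + \|\bs{x}_j\|^2\bs{V}_t\bs{\Lambda})^{-1}\bs{V}_t$ avoids inverting the singular null component $\bs{V}_0$; the class below uses the same rewriting.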
We use:

- mashr with effects learned from univariate analysis, to obtain $\pi_t$ and $\bs{V}_t$
- varbvs
- softImpute for a completed $\bs{Y}$ matrix

In this notebook we use a test data set of 2 tissues, Thyroid and Lung. As a first pass we also fix $\bs{V}$ to a null matrix plus an identity matrix, with weights $\pi_0 = 0.9$ and $\pi_1 = 0.1$.
# load the test data (FMO2 region, Thyroid and Lung)
dat = readRDS('/home/gaow/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds')
str(dat)
attach(dat)  # makes X and Y available
%get X Y --from R
import numpy as np
from scipy.stats import multivariate_normal
X
Y = Y.to_numpy()  # convert the data frame transferred from R into a numpy array
Y
from sklearn import linear_model
import numpy as np
# multi-task LASSO: fit both conditions jointly
# model = linear_model.MultiTaskLassoCV(alphas = [(x+1)/100 for x in range(10)], fit_intercept = False)
model = linear_model.MultiTaskLasso(alpha = 0.04, fit_intercept = False)
model.fit(X, Y)
Estimated effects for the first condition:
import seaborn as sns
import matplotlib.pyplot as plt
plt.scatter([x+1 for x in range(len(model.coef_[0]))], model.coef_[0])
plt.show()
Estimated effects for the second condition:
plt.scatter([x+1 for x in range(len(model.coef_[1]))], model.coef_[1])
plt.show()
Estimated effects, first vs. second condition:
plt.scatter(model.coef_[0], model.coef_[1])
plt.show()
Compare with univariate LASSO, fitted separately to the first and second conditions:
# model = linear_model.LassoCV(alphas = [(x+1)/100 for x in range(10)], fit_intercept = False)
model = linear_model.Lasso(alpha = 0.05, fit_intercept = False)
model.fit(X, Y[:,0])  # univariate LASSO on the first condition
plt.scatter([x+1 for x in range(len(model.coef_))], model.coef_)
plt.show()
model.fit(X, Y[:,1])  # univariate LASSO on the second condition
plt.scatter([x+1 for x in range(len(model.coef_))], model.coef_)
plt.show()
As initialization we take the univariate analysis summary statistics $\hat{\bs{B}}$ and $\bs{S}$ for fitting with mashr.
The implementation below is not working yet; I am still trying to figure out what's going on.
class MNMASH:
    def __init__(self, X, Y):
        self.X = X
        self.Y = Y
        # posterior mean effects, K effects by M conditions
        self.B = np.zeros((X.shape[1], Y.shape[1]))
        # prior covariances: a null matrix plus an identity, fixed for this first pass
        # self.V = np.ones((2, Y.shape[1], Y.shape[1]))
        self.V = np.array((np.zeros((2,2)), np.identity(2)))
        # self.pi = np.random.uniform(0,1,self.V.shape[0])
        # self.pi = self.pi / sum(self.pi)
        self.pi = np.array([0.9, 0.1])
        self.Lambda = np.identity(Y.shape[1])
        self.tol = 1E-4  # not used yet: vem() simply runs maxiter updates
        self.debug = 1
        self.maxiter = 1
        # initialize intermediate variables
        self.R = np.ones((self.X.shape[1], self.Y.shape[0], self.Y.shape[1])) * np.nan
        self.E = np.ones((self.Y.shape[1], self.X.shape[1])) * np.nan
        self.Rx = np.ones((self.Y.shape[1], self.X.shape[1])) * np.nan
        self.wdE = np.ones((self.X.shape[1], self.V.shape[0])) * np.nan
        self.Sigma = {tt: np.ones((self.X.shape[1], self.Y.shape[1], self.Y.shape[1])) * np.nan for tt in range(self.V.shape[0])}
        self.Mu = np.ones((self.X.shape[1], self.Y.shape[1], self.V.shape[0])) * np.nan
        self.gamma = np.ones((self.X.shape[1], self.V.shape[0])) * np.nan
        # X is an N by K matrix; X_norm is a length-K vector of L2 norms of its columns
        self.X_norm = np.linalg.norm(self.X, ord = 2, axis = 0)
        if self.debug:
            self.X_std = self.X / self.X_norm
            np.testing.assert_array_almost_equal(np.sum(np.square(self.X_std), axis = 0), np.ones(self.X.shape[1]))
        # from here on X_norm holds squared column norms ||x_k||^2,
        # and the columns of X_std are x_k / ||x_k||^2 so that R^T X_std gives xi
        self.X_norm = np.square(self.X_norm)
        self.X_std = self.X / self.X_norm
        self.R_all = None
        self.H = None
        self.Delta = None
        self.update_lambda = True

    def update(self):
        '''
        Core update
        '''
        iLambda = np.linalg.inv(self.Lambda)
        # B is K by M (effects by conditions)
        self.R_all = self.Y - self.X @ self.B
        # K is number of effects
        for kk in range(self.X.shape[1]):
            # R[kk] is N x M, where M is number of conditions: residual with effect kk added back
            self.R[kk,:,:] = self.R_all + np.outer(self.X[:,kk], self.B[kk,:])
            # E[:,kk] is M x 1: xi_k = R_{-k}^T x_k / ||x_k||^2
            self.E[:,kk] = (self.R[kk,:,:].T @ self.X_std[:,kk]).ravel()
            # Rx[:,kk] is M x 1: R_{-k}^T x_k
            self.Rx[:,kk] = (self.R[kk,:,:].T @ self.X[:,kk]).ravel()
            for tt in range(self.V.shape[0]):
                self.wdE[kk,tt] = multivariate_normal.pdf(self.E[:,kk], np.zeros(self.Y.shape[1]),
                                                          self.V[tt] + iLambda / self.X_norm[kk]) * self.pi[tt]
            wdE_sum = sum(self.wdE[kk,:])
            for tt in range(self.V.shape[0]):
                # Sigma_kt = (V_t^{-1} + ||x_k||^2 Lambda)^{-1}, written so a singular V_t is never inverted
                # Can be made faster via low-rank approximation
                self.Sigma[tt][kk,:,:] = np.linalg.inv(np.identity(self.Y.shape[1]) + (self.V[tt] * self.X_norm[kk]) @ self.Lambda) @ self.V[tt]
                self.Mu[kk,:,tt] = self.Sigma[tt][kk,:,:] @ self.Lambda @ self.Rx[:,kk]
                self.gamma[kk,tt] = self.wdE[kk,tt] / wdE_sum
                #if self.gamma[kk,tt] != self.gamma[kk,tt]:
                #    self.gamma[kk,tt] = 0
            # dot product for weighted sums: posterior mean of b_k
            self.B[kk,:] = self.Mu[kk,:,:] @ self.gamma[kk,:]
            # keep the full residual in sync so that later effects in this sweep
            # use the current posterior means in E[R_{-j}], as defined above
            self.R_all = self.R[kk,:,:] - np.outer(self.X[:,kk], self.B[kk,:])
        if self.update_lambda:
            # Recalculate H, an M by M diagonal matrix: sum_k ||x_k||^2 * posterior variance of b_k
            self.H = np.diag(np.sum([self.X_norm[kk] * np.sum([self.gamma[kk,tt] * (np.square(self.Mu[kk,:,tt] - self.B[kk,:]) + np.diag(self.Sigma[tt][kk,:,:])) for tt in range(self.V.shape[0])], axis = 0) for kk in range(self.X.shape[1])], axis = 0))
            # Update Lambda: lambda_r = N / Delta_rr (the alpha = beta = 0 case)
            self.Delta = np.diag(np.diag(self.R_all.T @ self.R_all)) + self.H
            self.Lambda = np.diag(self.X.shape[0] / np.diag(self.Delta))

    def vem(self):
        '''
        Variational EM
        '''
        cnt = 0
        while cnt < self.maxiter:
            self.update()
            cnt += 1

    def summary(self):
        self.prt('R')
        self.prt('E')
        self.prt('Rx')
        self.prt('wdE')
        self.prt('Sigma')
        self.prt('Mu')
        self.prt('gamma')
        self.prt('B')
        self.prt('H')
        self.prt('Delta')
        self.prt('Lambda')
        print('X @ B = \n{}\n'.format(self.X @ self.B))
        print('Y - X @ B = \n{}\n'.format(self.Y - self.X @ self.B))

    def prt(self, variable):
        print(variable, '=', repr(getattr(self, variable)))
        print('\n')
res = MNMASH(X,Y)
First iteration.
res.update()
res.summary()
Second iteration.
res.update()
res.summary()
np.min(res.wdE)
np.min(res.gamma)
Third iteration.
res.update()
res.summary()