ssspy.bss.pdsbss#

In this module, we separate multichannel signals using blind source separation via primal dual splitting algorithm. We denote the number of sources and microphones as \(N\) and \(M\), respectively. We also denote short-time Fourier transforms of source, observed, and separated signals as \(\boldsymbol{s}_{ij}\), \(\boldsymbol{x}_{ij}\), and \(\boldsymbol{y}_{ij}\), respectively.

\[\begin{split}\boldsymbol{s}_{ij} &= (s_{ij1},\ldots,s_{ijn},\ldots,s_{ijN})^{\mathsf{T}}\in\mathbb{C}^{N}, \\ \boldsymbol{x}_{ij} &= (x_{ij1},\ldots,x_{ijm},\ldots,x_{ijM})^{\mathsf{T}}\in\mathbb{C}^{M}, \\ \boldsymbol{y}_{ij} &= (y_{ij1},\ldots,y_{ijn},\ldots,y_{ijN})^{\mathsf{T}}\in\mathbb{C}^{N},\end{split}\]

where \(i=1,\ldots,I\) and \(j=1,\ldots,J\) are indices of frequency bins and time frames, respectively. When a mixing system is time-invariant, \(\boldsymbol{x}_{ij}\) is represented as follows:

\[\boldsymbol{x}_{ij} = \boldsymbol{A}_{i}\boldsymbol{s}_{ij},\]

where \(\boldsymbol{A}_{i}=(\boldsymbol{a}_{i1},\ldots,\boldsymbol{a}_{in},\ldots,\boldsymbol{a}_{iN})\in\mathbb{C}^{M\times N}\) is a mixing matrix. If \(M=N\) and \(\boldsymbol{A}_{i}\) is non-singular, a demixing system is represented as

\[\boldsymbol{y}_{ij} = \boldsymbol{W}_{i}\boldsymbol{x}_{ij},\]

where \(\boldsymbol{W}_{i}=(\boldsymbol{w}_{i1},\ldots,\boldsymbol{w}_{in},\ldots,\boldsymbol{w}_{iN})^{\mathsf{H}}\in\mathbb{C}^{N\times M}\) is a demixing matrix. The negative log-likelihood of observed signals (divided by \(2J\)) is computed as follows:

\[\begin{split}\mathcal{L} &= \mathcal{P}(\mathcal{V}(\mathcal{Y})) + \sum_{i}\mathcal{I}(\boldsymbol{W}_{i}), \\ \mathcal{V}(\mathcal{Y}) &:= (y_{111},\ldots,y_{11N},\ldots,y_{1JN},\ldots,y_{IJN})^{\mathsf{T}} \in\mathbb{C}^{IJN} \\ \mathcal{I}(\boldsymbol{W}_{i}) &= - \log|\det\boldsymbol{W}_{i}|,\end{split}\]

where \(\mathcal{P}\) is a penalty funcion that is determined by the source model.

Let us consider independent vector analysis. In this case, \(\mathcal{P}\) can be written by

\[\mathcal{P}(\mathcal{V}(\mathcal{Y})) = C\sum_{j,n}\left( \sum_{i}\left|\boldsymbol{w}_{in}^{\mathsf{H}}\boldsymbol{x}_{ij}\right|^{2} \right)^{\frac{1}{2}},\]

where \(C\) is a positive constant.

To the above formulation, we can apply the primal-dual splitting algorithm. On the basis of this algorithm, the demixing filter is updated as follows:

\[\begin{split}\tilde{\boldsymbol{W}}_{i} &\leftarrow\mathrm{prox}_{\mu_{1}\mathcal{I}} \left[\boldsymbol{W}_{i} - \mu_{1}\mu_{2}\sum_{j}\boldsymbol{u}_{ij}\boldsymbol{x}_{ij}^{\mathsf{H}}\right] \\ \boldsymbol{z}_{ij} &\leftarrow\boldsymbol{u}_{ij} + \left(2 * \tilde{\boldsymbol{W}}_{i} - \boldsymbol{W}_{i}\right)\boldsymbol{x}_{ij} \\ \mathcal{V}(\tilde{\mathcal{U}}) &\leftarrow\mathcal{V}(\mathcal{Z}) - \mathrm{prox}_{\mathcal{P}/\mu_{2}}\left[\mathcal{V}(\mathcal{Z})\right] \\ \boldsymbol{u}_{ij} &\leftarrow\alpha\tilde{\boldsymbol{u}}_{ij} + (1 - \alpha)\boldsymbol{u}_{ij}, \\ \boldsymbol{W}_{i} &\leftarrow\alpha\tilde{\boldsymbol{W}}_{i} + (1 - \alpha)\boldsymbol{W}_{i}.\end{split}\]

\(\boldsymbol{u}_{ij}\) is a dual variable, which should be initialized by a certain value. \(\mathrm{prox}_{g}\) is a proximal operator defined as

\[\mathrm{prox}_{g}[\boldsymbol{z}] = \mathrm{argmin}_{\boldsymbol{y}} ~~g(\boldsymbol{y}) + \frac{1}{2}\|\boldsymbol{z} - \boldsymbol{y}\|_{2}^{2}.\]

For \(\mathcal{I}\), we can obatain the following proximal operator:

\[\begin{split}\mathrm{prox}_{\mu\mathcal{I}}[\boldsymbol{W}_{i}] &= \boldsymbol{U}_{i}\tilde{\boldsymbol{\Sigma}}_{i}\boldsymbol{V}_{i}^{\mathsf{H}}, \\ \tilde{\boldsymbol{\Sigma}}_{i} &= \mathrm{diag}(\tilde{\sigma}_{i1},\ldots,\tilde{\sigma}_{iN}), \\ \tilde{\sigma}_{in} &= \frac{\sigma_{in} + \sqrt{\sigma_{in}^{2} + 4\mu}}{2},\end{split}\]

where \(\boldsymbol{U}_{i}\), \(\boldsymbol{V}_{i}\), and \(\boldsymbol{\Sigma}_{i}=\mathrm{diag}(\sigma_{i1},\ldots,\sigma_{iN})\) are singular value decomposition.

\[\boldsymbol{W}_{i} = \boldsymbol{U}_{i}\boldsymbol{\Sigma}_{i}\boldsymbol{V}_{i}^{\mathsf{H}}.\]

When \(\mathcal{P}\) is defined as

\[\mathcal{P}(\mathcal{V}(\mathcal{Y})) = C\sum_{j,n}\left( \sum_{i}\left|\boldsymbol{w}_{in}^{\mathsf{H}}\boldsymbol{x}_{ij}\right|^{2} \right)^{\frac{1}{2}},\]

the updates by the proximal operator can be written as

\[y_{ijn} \leftarrow\left(1 - \frac{\mu}{\sqrt{\sum_{i}|y_{ijn}|^{2}}}\right)_{+}y_{ijn}.\]

Algorithms#

class ssspy.bss.pdsbss.PDSBSSBase(penalty_fn=None, prox_penalty=None, callbacks=None, scale_restoration=True, record_loss=True, reference_id=0)#

Base class of blind source separation via proximal splitting algorithm [1].

Parameters:

penalty_fn (callable) – Penalty function that determines source model.
prox_penalty (callable) – Proximal operator of penalty function. Default: None.
callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.
scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back explicitly. Default: True.
record_loss (bool) – Record the loss at each iteration of the update algorithm if record_loss=True. Default: True.
reference_id (int) – Reference channel for projection back. Default: 0.

class ssspy.bss.pdsbss.PDSBSS(mu1=1, mu2=1, alpha=None, relaxation=1, penalty_fn=None, prox_penalty=None, callbacks=None, scale_restoration=True, record_loss=True, reference_id=0)#

Blind source separation via proximal splitting algorithm [1].

Parameters:

mu1 (float) – Step size. Default: 1.
mu2 (float) – Step size. Default: 1.
alpha (float) – Relaxation parameter (deprecated). Set relaxation instead.
relaxation (float) – Relaxation parameter. Default: 1.
penalty_fn (callable) – Penalty function that determines source model.
prox_penalty (callable) – Proximal operator of penalty function. Default: None.
callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.
scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back explicitly. Default: True.
record_loss (bool) – Record the loss at each iteration of the update algorithm if record_loss=True. Default: True.
reference_id (int) – Reference channel for projection back. Default: 0.

__call__(input, n_iter=100, initial_call=True, **kwargs)#

Separate a frequency-domain multichannel signal.

Parameters:

input (numpy.ndarray) – Mixture signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).
n_iter (int) – Number of iterations of demixing filter updates. Default: 100.
initial_call (bool) – If True, perform callbacks (and computation of loss if necessary) before iterations.

Return type:

ndarray

Returns:

numpy.ndarray of the separated signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

update_once()#

Update demixing filters and dual parameters once.

Return type:: None