ssspy.bss.fdica#

In this module, we separate multichannel signals using frequency-domain independent component analysis (FDICA). We denote the number of sources and microphones as \(N\) and \(M\), respectively. We also denote short-time Fourier transforms of source, observed, and separated signals as \(\boldsymbol{s}_{ij}\), \(\boldsymbol{x}_{ij}\), and \(\boldsymbol{y}_{ij}\), respectively.

\[\begin{split}\boldsymbol{s}_{ij} &= (s_{ij1},\ldots,s_{ijn},\ldots,s_{ijN})^{\mathsf{T}}\in\mathbb{C}^{N}, \\ \boldsymbol{x}_{ij} &= (x_{ij1},\ldots,x_{ijm},\ldots,x_{ijM})^{\mathsf{T}}\in\mathbb{C}^{M}, \\ \boldsymbol{y}_{ij} &= (y_{ij1},\ldots,y_{ijn},\ldots,y_{ijN})^{\mathsf{T}}\in\mathbb{C}^{N},\end{split}\]

where \(i=1,\ldots,I\) and \(j=1,\ldots,J\) are indices of frequency bins and time frames, respectively. When a mixing system is time-invariant, \(\boldsymbol{x}_{ij}\) is represented as follows:

\[\boldsymbol{x}_{ij} = \boldsymbol{A}_{i}\boldsymbol{s}_{ij},\]

where \(\boldsymbol{A}_{i}=(\boldsymbol{a}_{i1},\ldots,\boldsymbol{a}_{in},\ldots,\boldsymbol{a}_{iN})\in\mathbb{C}^{M\times N}\) is a mixing matrix. If \(M=N\) and \(\boldsymbol{A}_{i}\) is non-singular, a demixing system is represented as

\[\boldsymbol{y}_{ij} = \boldsymbol{W}_{i}\boldsymbol{x}_{ij},\]

where \(\boldsymbol{W}_{i}=(\boldsymbol{w}_{i1},\ldots,\boldsymbol{w}_{in},\ldots,\boldsymbol{w}_{iN})^{\mathsf{H}}\in\mathbb{C}^{N\times M}\) is a demixing matrix. The negative log-likelihood of observed signals (divided by \(J\)) is computed as follows:

\[\begin{split}\mathcal{L} &= -\frac{1}{J}\log p(\mathcal{X}) \\ &= -\frac{1}{J}\left(\log p(\mathcal{Y}) \ + \sum_{i}\log|\det\boldsymbol{W}_{i}|^{2J} \right) \\ &= -\frac{1}{J}\sum_{i,j,n}\log p(y_{ijn}) - 2\sum_{i}\log|\det\boldsymbol{W}_{i}| \\ &= \sum_{i}\mathcal{L}^{[i]}, \\ \mathcal{L}^{[i]} \ &= \frac{1}{J}\sum_{j,n}G(y_{ijn}) - 2\log|\det\boldsymbol{W}_{i}|, \\ G(y_{ijn}) &= -\log p(y_{ijn}),\end{split}\]

where \(G(y_{ijn})\) is a contrast function. The derivative of \(G(y_{ijn})\) is called a score function.

\[\phi(y_{ijn}) = \frac{\partial G(y_{ijn})}{\partial y_{ijn}^{*}}.\]

Algorithms#

class ssspy.bss.fdica.FDICABase(contrast_fn=None, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Base class of frequency-domain independent component analysis (FDICA).

Parameters:
  • contrast_fn (callable) – A contrast function which corresponds to \(-\log p(y_{ijn})\). This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the update algorithm if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

__call__(input, n_iter=100, initial_call=True, **kwargs)#

Separate a frequency-domain multichannel signal.

Parameters:
  • input (numpy.ndarray) – Mixture signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

  • n_iter (int) – Number of iterations of demixing filter updates. Default: 100.

  • initial_call (bool) – If True, perform callbacks (and computation of loss if necessary) before iterations.

Return type:

ndarray

Returns:

numpy.ndarray of the separated signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

apply_projection_back()#

Apply projection back technique to estimated spectrograms.

Return type:

None

compute_logdet(demix_filter)#

Compute log-determinant of demixing filter.

Parameters:

demix_filter (numpy.ndarray) – Demixing filters with shape of (n_bins, n_sources, n_channels).

Return type:

ndarray

Returns:

numpy.ndarray of computed log-determinant values.

compute_loss()#

Compute loss \(\mathcal{L}\).

\(\mathcal{L}\) is given as follows:

\[\begin{split}\mathcal{L} &= \sum_{i}\mathcal{L}^{[i]}, \\ \mathcal{L}^{[i]} &= \frac{1}{J}\sum_{j,n}G(y_{ijn}) - 2\log|\det\boldsymbol{W}_{i}|, \\ G(y_{ijn}) \ &= - \log p(y_{ijn})\end{split}\]
Return type:

float

Returns:

Computed loss.

restore_scale()#

Restore scale ambiguity.

If self.scale_restoration=projection_back, we use projection back technique. If self.scale_restoration=minimal_distortion_principle, we use minimal distortion principle.

Return type:

None

separate(input, demix_filter)#

Separate input using demixing_filter.

\[\boldsymbol{y}_{ij} = \boldsymbol{W}_{i}\boldsymbol{x}_{ij}\]
Parameters:
  • input (numpy.ndarray) – The mixture signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

  • demix_filter (numpy.ndarray) – The demixing filters to separate input. The shape is (n_bins, n_sources, n_channels).

Return type:

ndarray

Returns:

numpy.ndarray of the separated signal in frequency-domain. The shape is (n_sources, n_bins, n_frames).

solve_permutation()#

Align demixing filters and separated spectrograms

Return type:

None

class ssspy.bss.fdica.GradFDICABase(step_size=0.1, contrast_fn=None, score_fn=None, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Base class of frequency-domain independent component analysis (FDICA) using the gradient descent.

Parameters:
  • step_size (float) – A step size of the gradient descent. Default: 1e-1.

  • contrast_fn (callable) – A contrast function which corresponds to \(-\log p(y_{ijn})\). This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • score_fn (callable) – A score function which corresponds to the partial derivative of the contrast function. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the gradient descent if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

__call__(input, n_iter=100, initial_call=True, **kwargs)#

Separate a frequency-domain multichannel signal.

Parameters:
  • input (numpy.ndarray) – The mixture signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

  • n_iter (int) – The number of iterations of demixing filter updates. Default: 100.

  • initial_call (bool) – If True, perform callbacks (and computation of loss if necessary) before iterations.

Return type:

ndarray

Returns:

numpy.ndarray of the separated signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

class ssspy.bss.fdica.GradFDICA(step_size=0.1, contrast_fn=None, score_fn=None, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, is_holonomic=False, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Frequency-domain independent component analysis (FDICA) using the gradient descent.

Parameters:
  • step_size (float) – A step size of the gradient descent. Default: 1e-1.

  • contrast_fn (callable) – A contrast function corresponds to \(-\log p(y_{ijn})\). This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • score_fn (callable) – A score function corresponds to the partial derivative of the contrast function. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • is_holonomic (bool) – If is_holonomic=True, Holonomic-type update is used. Otherwise, Nonholonomic-type update is used. Default: False.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the gradient descent if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters using Holonomic-type update:

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def score_fn(y):
...     denom = np.maximum(np.abs(y), 1e-10)
...     return y / denom

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = GradFDICA(
...     contrast_fn=contrast_fn,
...     score_fn=score_fn,
...     is_holonomic=True,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters using Nonholonomic-type update:

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def score_fn(y):
...     denom = np.maximum(np.abs(y), 1e-10)
...     return y / denom

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = GradFDICA(
...     contrast_fn=contrast_fn,
...     score_fn=score_fn,
...     is_holonomic=False,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)
update_once()#

Update demixing filters once using the gradient descent.

If is_holonomic=True, demixing filters are updated as follows: :rtype: None

\[\boldsymbol{W}_{i} \leftarrow\boldsymbol{W}_{i} - \eta\left(\frac{1}{J}\sum_{j} \boldsymbol{\phi}(\boldsymbol{y}_{ij})\boldsymbol{y}_{ij}^{\mathsf{H}} -\boldsymbol{I}\right)\boldsymbol{W}_{i}^{-\mathsf{H}},\]

where

\[\begin{split}\boldsymbol{\phi}(\boldsymbol{y}_{ij}) &= \left(\phi(y_{ij1}),\ldots,\phi(y_{ijn}),\ldots,\phi(y_{ijN}) \right)^{\mathsf{T}}\in\mathbb{C}^{N}, \\ \phi(y_{ijn}) &= \frac{\partial G(y_{ijn})}{\partial y_{ijn}^{*}}, \\ G(y_{ijn}) &= -\log p(y_{ijn}).\end{split}\]

Otherwise (is_holonomic=False),

\[\boldsymbol{W}_{i} \leftarrow\boldsymbol{W}_{i} - \eta\cdot\mathrm{offdiag}\left(\frac{1}{J}\sum_{j} \boldsymbol{\phi}(\boldsymbol{y}_{ij})\boldsymbol{y}_{ij}^{\mathsf{H}}\right) \boldsymbol{W}_{i}^{-\mathsf{H}}.\]
class ssspy.bss.fdica.NaturalGradFDICA(step_size=0.1, contrast_fn=None, score_fn=None, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, is_holonomic=False, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Frequency-domain independent component analysis (FDICA) using the natural gradient descent.

Parameters:
  • step_size (float) – A step size of the gradient descent. Default: 1e-1.

  • contrast_fn (callable) – A contrast function corresponds to \(-\log p(y_{ijn})\). This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • score_fn (callable) – A score function corresponds to the partial derivative of the contrast function. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • is_holonomic (bool) – If is_holonomic=True, Holonomic-type update is used. Otherwise, Nonholonomic-type update is used. Default: False.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the gradient descent if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters using Holonomic-type update:

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def score_fn(y):
...     denom = np.maximum(np.abs(y), 1e-10)
...     return y / denom

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = NaturalGradFDICA(
...     contrast_fn=contrast_fn,
...     score_fn=score_fn,
...     is_holonomic=True,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters using Nonholonomic-type update:

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def score_fn(y):
...     denom = np.maximum(np.abs(y), 1e-10)
...     return y / denom

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = NaturalGradFDICA(
...     contrast_fn=contrast_fn,
...     score_fn=score_fn,
...     is_holonomic=False,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)
update_once()#

Update demixing filters once using the gradient descent.

If is_holonomic=True, demixing filters are updated as follows: :rtype: None

\[\boldsymbol{W}_{i} \leftarrow\boldsymbol{W}_{i} - \eta\left(\frac{1}{J}\sum_{j} \boldsymbol{\phi}(\boldsymbol{y}_{ij})\boldsymbol{y}_{ij}^{\mathsf{H}} -\boldsymbol{I}\right)\boldsymbol{W}_{i},\]

where

\[\begin{split}\boldsymbol{\phi}(\boldsymbol{y}_{ij}) &= \left(\phi(y_{ij1}),\ldots,\phi(y_{ijn}),\ldots,\phi(y_{ijN}) \right)^{\mathsf{T}}\in\mathbb{C}^{N}, \\ \phi(y_{ijn}) &= \frac{\partial G(y_{ijn})}{\partial y_{ijn}^{*}}, \\ G(y_{ijn}) &= -\log p(y_{ijn}).\end{split}\]

Otherwise (is_holonomic=False),

\[\boldsymbol{W}_{i} \leftarrow\boldsymbol{W}_{i} - \eta\cdot\mathrm{offdiag}\left(\frac{1}{J}\sum_{j} \boldsymbol{\phi}(\boldsymbol{y}_{ij})\boldsymbol{y}_{ij}^{\mathsf{H}}\right) \boldsymbol{W}_{i}.\]
class ssspy.bss.fdica.AuxFDICA(spatial_algorithm='IP', contrast_fn=None, d_contrast_fn=None, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), pair_selector=None, callbacks=None, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Auxiliary-function-based frequency-domain independent component analysis (AuxFDICA) [1].

Parameters:
  • spatial_algorithm (str) – Algorithm to update demixing filters. Choose IP, IP1, or IP2. Default: IP.

  • contrast_fn (callable) – A contrast function corresponds to \(-\log p(y_{ijn})\). This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • d_contrast_fn (callable) – A partial derivative of the real contrast function. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames).

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • pair_selector (callable, optional) – Selector to choose updaing pair in IP2 and ISS2. If None is given, partial(sequential_pair_selector, sort=True) is used. Default: None.

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the demixing filter update if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters by IP:

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def d_contrast_fn(y):
...     return 2 * np.ones_like(y)

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = AuxFDICA(
...     spatial_algorithm="IP",
...     contrast_fn=contrast_fn,
...     d_contrast_fn=d_contrast_fn,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=100)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters by IP2:

>>> from ssspy.utils.select_pair import sequential_pair_selector

>>> def contrast_fn(y):
...     return 2 * np.abs(y)

>>> def d_contrast_fn(y):
...     return 2 * np.ones_like(y)

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = AuxFDICA(
...     spatial_algorithm="IP2",
...     contrast_fn=contrast_fn,
...     d_contrast_fn=d_contrast_fn,
...     pair_selector=sequential_pair_selector,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=100)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)
__call__(input, n_iter=100, initial_call=True, **kwargs)#

Separate a frequency-domain multichannel signal.

Parameters:
  • input (numpy.ndarray) – The mixture signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

  • n_iter (int) – The number of iterations of demixing filter updates. Default: 100.

  • initial_call (bool) – If True, perform callbacks (and computation of loss if necessary) before iterations.

Return type:

ndarray

Returns:

numpy.ndarray of the separated signal in frequency-domain. The shape is (n_channels, n_bins, n_frames).

update_once(flooring_fn='self')#

Update demixing filters once.

  • If self.spatial_algorithm is IP or IP1, update_once_ip1 is called.

  • If self.spatial_algorithm is IP2, update_once_ip2 is called.

Parameters:

flooring_fn (callable or str, optional) – A flooring function for numerical stability. This function is expected to return the same shape tensor as the input. If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. If self is given as str, self.flooring_fn is used. Default: self.

Return type:

None

update_once_ip1(flooring_fn='self')#

Update demixing filters once using iterative projection.

Parameters:

flooring_fn (callable or str, optional) – A flooring function for numerical stability. This function is expected to return the same shape tensor as the input. If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. If self is given as str, self.flooring_fn is used. Default: self.

Return type:

None

Demixing filters are updated sequentially for \(n=1,\ldots,N\) as follows:

\[\begin{split}\boldsymbol{w}_{in} &\leftarrow\left(\boldsymbol{W}_{in}^{\mathsf{H}}\boldsymbol{U}_{in}\right)^{-1} \boldsymbol{e}_{n}, \\ \boldsymbol{w}_{in} &\leftarrow\frac{\boldsymbol{w}_{in}} {\sqrt{\boldsymbol{w}_{in}^{\mathsf{H}}\boldsymbol{U}_{in}\boldsymbol{w}_{in}}}, \\\end{split}\]

where

\[\begin{split}\boldsymbol{U}_{in} &= \frac{1}{J}\sum_{j} \frac{G'_{\mathbb{R}}(|y_{ijn}|)}{2|y_{ijn}|} \boldsymbol{x}_{ij}\boldsymbol{x}_{ij}^{\mathsf{H}}, \\ G(y_{ijn}) &= -\log p(y_{ijn}), \\ G_{\mathbb{R}}(|y_{ijn}|) &= G(y_{ijn}).\end{split}\]
update_once_ip2(flooring_fn='self')#

Update demixing filters once using pairwise iterative projection.

Parameters:

flooring_fn (callable or str, optional) – A flooring function for numerical stability. This function is expected to return the same shape tensor as the input. If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. If self is given as str, self.flooring_fn is used. Default: self.

Return type:

None

For \(n_{1}\) and \(n_{2}\) (\(n_{1}\neq n_{2}\)), compute auxiliary variables:

\[\begin{split}\bar{r}_{ijn_{1}} &\leftarrow|y_{ijn_{1}}| \\ \bar{r}_{ijn_{2}} &\leftarrow|y_{ijn_{2}}|\end{split}\]

Then, for \(n=n_{1},n_{2}\), compute weighted covariance matrix as follows:

\[\begin{split}\boldsymbol{U}_{in_{1}} &= \frac{1}{J}\sum_{j} \frac{G'_{\mathbb{R}}(\bar{r}_{ijn_{1}})}{2\bar{r}_{ijn_{1}}} \boldsymbol{x}_{ij}\boldsymbol{x}_{ij}^{\mathsf{H}}, \\ \boldsymbol{U}_{in_{2}} &= \frac{1}{J}\sum_{j} \frac{G'_{\mathbb{R}}(\bar{r}_{ijn_{2}})}{2\bar{r}_{ijn_{2}}} \boldsymbol{x}_{ij}\boldsymbol{x}_{ij}^{\mathsf{H}},\end{split}\]

where

\[\begin{split}G(y_{ijn}) &= -\log p(y_{ijn}), \\ G_{\mathbb{R}}(|y_{ijn}|) &= G(y_{ijn}).\end{split}\]

Using \(\boldsymbol{U}_{in_{1}}\) and \(\boldsymbol{U}_{in_{2}}\), we compute generalized eigenvectors.

\[\left({\boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})}}^{\mathsf{H}}\boldsymbol{U}_{in_{1}} \boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})}\right)\boldsymbol{h}_{i} = \lambda_{i} \left({\boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})}}^{\mathsf{H}}\boldsymbol{U}_{in_{2}} \boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})}\right)\boldsymbol{h}_{i},\]

where

\[\begin{split}\boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})} &= (\boldsymbol{W}_{i}\boldsymbol{U}_{in_{1}})^{-1} ( \begin{array}{cc} \boldsymbol{e}_{n_{1}} & \boldsymbol{e}_{n_{2}} \end{array} ), \\ \boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})} &= (\boldsymbol{W}_{i}\boldsymbol{U}_{in_{2}})^{-1} ( \begin{array}{cc} \boldsymbol{e}_{n_{1}} & \boldsymbol{e}_{n_{2}} \end{array} ).\end{split}\]

After that, we standardize two eigenvectors \(\boldsymbol{h}_{in_{1}}\) and \(\boldsymbol{h}_{in_{2}}\).

\[\begin{split}\boldsymbol{h}_{in_{1}} &\leftarrow\frac{\boldsymbol{h}_{in_{1}}} {\sqrt{\boldsymbol{h}_{in_{1}}^{\mathsf{H}} \left({\boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})}}^{\mathsf{H}}\boldsymbol{U}_{in_{1}} \boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})}\right) \boldsymbol{h}_{in_{1}}}}, \\ \boldsymbol{h}_{in_{2}} &\leftarrow\frac{\boldsymbol{h}_{in_{2}}} {\sqrt{\boldsymbol{h}_{in_{2}}^{\mathsf{H}} \left({\boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})}}^{\mathsf{H}}\boldsymbol{U}_{in_{2}} \boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})}\right) \boldsymbol{h}_{in_{2}}}}.\end{split}\]

Then, update \(\boldsymbol{w}_{in_{1}}\) and \(\boldsymbol{w}_{in_{2}}\) simultaneously.

\[\begin{split}\boldsymbol{w}_{in_{1}} &\leftarrow \boldsymbol{P}_{in_{1}}^{(n_{1},n_{2})}\boldsymbol{h}_{in_{1}} \\ \boldsymbol{w}_{in_{2}} &\leftarrow \boldsymbol{P}_{in_{2}}^{(n_{1},n_{2})}\boldsymbol{h}_{in_{2}}\end{split}\]

At each iteration, we update pairs of \(n_{1}\) and \(n_{1}\) for \(n_{1}\neq n_{2}\).

class ssspy.bss.fdica.GradLaplaceFDICA(step_size=0.1, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, is_holonomic=False, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Frequency-domain independent component analysis (FDICA) using the gradient descent on a Laplace distribution.

We assume \(y_{ijn}\) follows a Laplace distribution.

\[p(y_{ijn})\propto\exp(|y_{ijn}|)\]
Parameters:
  • step_size (float) – A step size of the gradient descent. Default: 1e-1.

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • is_holonomic (bool) – If is_holonomic=True, Holonomic-type update is used. Otherwise, Nonholonomic-type update is used. Default: False.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the gradient descent if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters using Holonomic-type update:

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = GradLaplaceFDICA(is_holonomic=True)
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters using Nonholonomic-type update:

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = GradLaplaceFDICA(is_holonomic=False)
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)
class ssspy.bss.fdica.NaturalGradLaplaceFDICA(step_size=0.1, flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), callbacks=None, is_holonomic=False, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Frequency-domain independent component analysis (FDICA) using the natural gradient descent on a Laplace distribution.

We assume \(y_{ijn}\) follows a Laplace distribution.

\[p(y_{ijn})\propto\exp(|y_{ijn}|)\]
Parameters:
  • step_size (float) – A step size of the gradient descent. Default: 1e-1.

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • is_holonomic (bool) – If is_holonomic=True, Holonomic-type update is used. Otherwise, Nonholonomic-type update is used. Default: False.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the gradient descent if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters using Holonomic-type update:

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = NaturalGradLaplaceFDICA(is_holonomic=True)
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters using Nonholonomic-type update:

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = NaturalGradLaplaceFDICA(is_holonomic=False)
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)
class ssspy.bss.fdica.AuxLaplaceFDICA(spatial_algorithm='IP', flooring_fn=functools.partial(<function max_flooring>, eps=1e-10), pair_selector=None, callbacks=None, permutation_alignment=True, scale_restoration=True, record_loss=True, reference_id=0)#

Auxiliary-function-based frequency-domain independent component analysis on a Laplace distribution.

We assume \(y_{ijn}\) follows a Laplace distribution.

\[p(y_{ijn})\propto\exp(|y_{ijn}|)\]
Parameters:
  • spatial_algorithm (str) – Algorithm to update demixing filters. Choose IP, IP1, or IP2. Default: IP.

  • flooring_fn (callable, optional) – A flooring function for numerical stability. This function is expected to receive (n_channels, n_bins, n_frames) and return (n_channels, n_bins, n_frames). If you explicitly set flooring_fn=None, the identity function (lambda x: x) is used. Default: partial(max_flooring, eps=1e-10).

  • pair_selector (callable, optional) – Selector to choose updaing pair in IP2 and ISS2. If None is given, partial(sequential_pair_selector, sort=True) is used. Default: None.

  • callbacks (callable or list[callable], optional) – Callback functions. Each function is called before separation and at each iteration. Default: None.

  • permutation_alignment (bool) – If permutation_alignment=True, a permutation solver is used to align estimated spectrograms. Default: True.

  • scale_restoration (bool or str) – Technique to restore scale ambiguity. If scale_restoration=True, the projection back technique is applied to estimated spectrograms. You can also specify projection_back or minimal_distortion_principle. Default: True.

  • record_loss (bool) – Record the loss at each iteration of the demixing filter update if record_loss=True. Default: True.

  • reference_id (int) – Reference channel for projection back and minimal distortion principle. Default: 0.

Examples

Update demixing filters by IP:

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = AuxLaplaceFDICA(spatial_algorithm="IP")
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)

Update demixing filters by IP2:

>>> from ssspy.utils.select_pair import sequential_pair_selector

>>> n_channels, n_bins, n_frames = 2, 2049, 128
>>> spectrogram_mix = \
...     np.random.randn(n_channels, n_bins, n_frames) \
...     + 1j * np.random.randn(n_channels, n_bins, n_frames)

>>> fdica = AuxLaplaceFDICA(
...     spatial_algorithm="IP2",
...     pair_selector=sequential_pair_selector,
... )
>>> spectrogram_est = fdica(spectrogram_mix, n_iter=1000)
>>> print(spectrogram_mix.shape, spectrogram_est.shape)
(2, 2049, 128), (2, 2049, 128)