audtorch.transforms.functional¶
The goal of the transform functionals is to provide functions that work independently of the dimensions of the input signal and can easily be used to create the actual transforms.
Note
All of the transforms currently work only with numpy.ndarray as inputs, not torch.Tensor.
crop¶

audtorch.transforms.functional.crop(signal, idx, *, axis=-1)¶
Crop signal along an axis.
Parameters:	signal (numpy.ndarray) – audio signal
idx (int or tuple) – first (and last) index to return
axis (int, optional) – axis along which to crop. Default: -1
Note
Indexing from the end with -1, -2, … is allowed. But you cannot use -1 in the second part of the tuple to specify the last entry. Instead you have to write (-2, signal.shape[axis]) to get the last two entries of axis, or simply -1 if you only want to get the last entry.
Returns:	cropped signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> crop(a, 1)
array([[2],
       [4]])
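The crop semantics correspond to plain numpy slicing along the chosen axis. A minimal sketch under that assumption (crop_sketch is an illustrative name, not the audtorch implementation):

```python
import numpy as np

def crop_sketch(signal, idx, *, axis=-1):
    """Illustrative crop: keep idx (an int or a (start, stop) tuple) along axis."""
    if isinstance(idx, int):
        # a single index keeps exactly that entry; -1 keeps the last one
        idx = (idx, idx + 1 if idx != -1 else None)
    # full slices everywhere, restricted only on the chosen axis
    slices = [slice(None)] * signal.ndim
    slices[axis] = slice(*idx)
    return signal[tuple(slices)]

a = np.array([[1, 2], [3, 4]])
print(crop_sketch(a, 1))       # second column: [[2], [4]]
print(crop_sketch(a, (0, 1)))  # first column: [[1], [3]]
```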
pad¶

audtorch.transforms.functional.pad(signal, padding, *, value=0, axis=-1)¶
Pad signal along an axis.
If padding is an integer it pads equally on the left and right of the signal. If padding is a tuple with two entries it uses the first for the left side and the second for the right side.
Parameters:	signal (numpy.ndarray) – audio signal
padding (int or tuple) – padding to apply on the left and right
value (float, optional) – value to pad with. Default: 0
axis (int, optional) – axis along which to pad. Default: -1
Returns:	padded signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> pad(a, (0, 1))
array([[1, 2, 0],
       [3, 4, 0]])
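The same behaviour can be expressed with numpy.pad; a hedged sketch (pad_sketch is a hypothetical name, not the audtorch code):

```python
import numpy as np

def pad_sketch(signal, padding, *, value=0, axis=-1):
    """Illustrative pad: an int pads both sides equally, a tuple is (left, right)."""
    if isinstance(padding, int):
        padding = (padding, padding)
    # no padding on any axis except the chosen one
    pad_width = [(0, 0)] * signal.ndim
    pad_width[axis] = padding
    return np.pad(signal, pad_width, mode='constant', constant_values=value)

a = np.array([[1, 2], [3, 4]])
print(pad_sketch(a, (0, 1)))  # [[1, 2, 0], [3, 4, 0]]
```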
replicate¶

audtorch.transforms.functional.replicate(signal, repetitions, *, axis=-1)¶
Replicate signal along an axis.
Parameters:	signal (numpy.ndarray) – audio signal
repetitions (int) – number of times to replicate signal
axis (int, optional) – axis along which to replicate. Default: -1
Returns:	replicated signal
Return type:	numpy.ndarray
Example
>>> a = np.array([1, 2, 3])
>>> replicate(a, 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
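Replication along an axis amounts to concatenating copies of the signal; a minimal sketch (replicate_sketch is illustrative, not the audtorch code):

```python
import numpy as np

def replicate_sketch(signal, repetitions, *, axis=-1):
    """Illustrative replicate: concatenate `repetitions` copies along `axis`."""
    return np.concatenate([signal] * repetitions, axis=axis)

a = np.array([1, 2, 3])
print(replicate_sketch(a, 3))  # [1 2 3 1 2 3 1 2 3]
```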
downmix¶

audtorch.transforms.functional.downmix(signal, channels, *, method='mean', axis=-2)¶
Downmix signal to the provided number of channels.
The downmix is done by one of these methods:
'mean'
    replace last desired channel by mean across itself and all remaining channels
'crop'
    drop all remaining channels
Parameters:	signal (numpy.ndarray) – audio signal
channels (int) – number of desired channels
method (str, optional) – downmix method. Default: 'mean'
axis (int, optional) – axis to downmix. Default: -2
Returns:	reshaped signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> downmix(a, 1)
array([[2, 3]])
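The two methods can be sketched in numpy: 'mean' keeps the first channels but replaces the last kept one with the mean over itself and all dropped channels, while 'crop' simply slices. An illustrative sketch (downmix_sketch is a hypothetical name):

```python
import numpy as np

def downmix_sketch(signal, channels, *, method='mean', axis=-2):
    """Illustrative downmix to `channels` channels along `axis`."""
    axis = axis % signal.ndim
    slices = [slice(None)] * signal.ndim
    if method == 'mean':
        # mean across the last desired channel and all remaining ones
        slices[axis] = slice(channels - 1, None)
        mixed = signal[tuple(slices)].mean(axis=axis, keepdims=True)
        slices[axis] = slice(None, channels - 1)
        return np.concatenate([signal[tuple(slices)], mixed], axis=axis)
    elif method == 'crop':
        # drop all remaining channels
        slices[axis] = slice(None, channels)
        return signal[tuple(slices)]

a = np.array([[1, 2], [3, 4]])
print(downmix_sketch(a, 1))  # [[2. 3.]]
```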
upmix¶

audtorch.transforms.functional.upmix(signal, channels, *, method='mean', axis=-2)¶
Upmix signal to the provided number of channels.
The upmix is achieved by adding the same signal in the additional channels. The fixed signal is calculated by one of the following methods:
'mean'
    mean across all input channels
'zero'
    zeros
'repeat'
    last input channel
Parameters:	signal (numpy.ndarray) – audio signal
channels (int) – number of desired channels
method (str, optional) – upmix method. Default: 'mean'
axis (int, optional) – axis to upmix. Default: -2
Returns:	reshaped signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> upmix(a, 3)
array([[1., 2.],
       [3., 4.],
       [2., 3.]])
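The three fill methods can be sketched as follows (upmix_sketch is an illustrative re-implementation, not the audtorch code):

```python
import numpy as np

def upmix_sketch(signal, channels, *, method='mean', axis=-2):
    """Illustrative upmix: append the fill signal until `channels` is reached."""
    axis = axis % signal.ndim
    missing = channels - signal.shape[axis]
    if method == 'mean':
        # mean across all input channels
        fill = signal.mean(axis=axis, keepdims=True)
    elif method == 'zero':
        fill = np.zeros_like(signal.mean(axis=axis, keepdims=True))
    elif method == 'repeat':
        # repeat the last input channel
        slices = [slice(None)] * signal.ndim
        slices[axis] = slice(-1, None)
        fill = signal[tuple(slices)]
    return np.concatenate([signal] + [fill] * missing, axis=axis)

a = np.array([[1, 2], [3, 4]])
print(upmix_sketch(a, 3))  # [[1. 2.] [3. 4.] [2. 3.]]
```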
additive_mix¶

audtorch.transforms.functional.additive_mix(signal1, signal2, ratio)¶
Mix two signals additively by given ratio.
If the power of one of the signals is below 1e-7, the signals are added without adjusting the signal-to-noise ratio.
Parameters:	signal1 (numpy.ndarray) – audio signal
signal2 (numpy.ndarray) – audio signal
ratio (int) – ratio in dB of the second signal compared to the first one
Returns:	mixture
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> additive_mix(a, a, 10 * np.log10(0.5 ** 2))
array([[1.5, 3. ],
       [4.5, 6. ]])
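The described behaviour amounts to scaling the second signal so that its power relative to the first matches the requested dB ratio. A sketch under that reading (additive_mix_sketch is illustrative; the real handling of nearly silent signals may differ in detail):

```python
import numpy as np

def additive_mix_sketch(signal1, signal2, ratio):
    """Illustrative additive mix: signal2 at `ratio` dB relative to signal1."""
    p1 = np.mean(signal1 ** 2)
    p2 = np.mean(signal2 ** 2)
    if p1 < 1e-7 or p2 < 1e-7:
        # nearly silent signal: add without adjusting the ratio
        return signal1 + signal2
    # gain so that the power of gain * signal2 is `ratio` dB relative to signal1
    gain = np.sqrt(p1 / p2 * 10 ** (ratio / 10))
    return signal1 + gain * signal2

a = np.array([[1., 2.], [3., 4.]])
print(additive_mix_sketch(a, a, 10 * np.log10(0.5 ** 2)))  # [[1.5 3. ] [4.5 6. ]]
```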
normalize¶

audtorch.transforms.functional.normalize(signal, *, axis=None)¶
Normalize signal.
Ensure the maximum of the absolute value of the signal is 1.
Note
The signal will never be divided by a number smaller than 1e-7, which means that nearly silent signals are only slightly amplified.
Parameters:	signal (numpy.ndarray) – audio signal
axis (int, optional) – normalize only along the given axis. Default: None
Returns:	normalized signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> normalize(a)
array([[0.25, 0.5 ],
       [0.75, 1.  ]])
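Peak normalization with the 1e-7 divisor floor can be sketched in a few lines (normalize_sketch is an illustrative name, not the audtorch code):

```python
import numpy as np

def normalize_sketch(signal, *, axis=None):
    """Illustrative normalize: scale so the maximum absolute value is 1."""
    # keep dimensions when normalizing per axis, so the division broadcasts
    peak = np.max(np.abs(signal), axis=axis, keepdims=axis is not None)
    # never divide by a value smaller than 1e-7
    return signal / np.maximum(peak, 1e-7)

a = np.array([[1, 2], [3, 4]])
print(normalize_sketch(a))  # [[0.25 0.5 ] [0.75 1.  ]]
```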
standardize¶

audtorch.transforms.functional.standardize(signal, *, mean=True, std=True, axis=None)¶
Standardize signal.
Ensure the signal has a mean value of 0 and a variance of 1.
Note
The signal will never be divided by a variance smaller than 1e-7.
Parameters:	signal (numpy.ndarray) – audio signal
mean (bool, optional) – apply mean centering. Default: True
std (bool, optional) – normalize by standard deviation. Default: True
axis (int, optional) – standardize only along the given axis. Default: None
Returns:	standardized signal
Return type:	numpy.ndarray
Example
>>> a = np.array([[1, 2], [3, 4]])
>>> standardize(a)
array([[-1.34164079, -0.4472136 ],
       [ 0.4472136 ,  1.34164079]])
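Standardization with the 1e-7 variance floor can be sketched as follows (standardize_sketch is illustrative; how audtorch combines mean=False with std=True is an assumption here):

```python
import numpy as np

def standardize_sketch(signal, *, mean=True, std=True, axis=None):
    """Illustrative standardize: zero mean, unit variance, 1e-7 variance floor."""
    signal = np.asarray(signal, dtype=float)
    if mean:
        signal = signal - signal.mean(axis=axis, keepdims=axis is not None)
    if std:
        # never divide by a variance smaller than 1e-7
        var = signal.var(axis=axis, keepdims=axis is not None)
        signal = signal / np.sqrt(np.maximum(var, 1e-7))
    return signal

a = np.array([[1, 2], [3, 4]])
print(standardize_sketch(a))
```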
stft¶

audtorch.transforms.functional.stft(signal, window_size, hop_size, *, fft_size=None, window='hann', axis=-1)¶
Short-time Fourier transform.
The short-time Fourier transform (STFT) is calculated by using librosa. It returns an array with the same shape as the input array, except that the axis chosen for the STFT calculation is replaced by the two new axes of the spectrogram.
If fft_size is not given, it is set identical to window_size.
Parameters:	signal (numpy.ndarray) – audio signal
window_size (int) – size of STFT window in samples
hop_size (int) – size of STFT window hop in samples
fft_size (int, optional) – size of FFT. If None, it is set to window_size. Default: None
window (str, tuple, number, function, or numpy.ndarray, optional) – type of STFT window. Default: hann
axis (int, optional) – axis of STFT calculation. Default: -1
Returns:	complex spectrogram with the shape of its last two dimensions as (window_size/2 + 1, np.ceil((len(signal) + window_size/2) / hop_size))
Return type:	numpy.ndarray
Example
>>> a = np.array([1., 2., 3., 4.])
>>> stft(a, 2, 1)
array([[ 1.+0.j,  2.+0.j,  3.+0.j,  4.+0.j,  3.+0.j],
       [-1.+0.j, -2.+0.j, -3.+0.j, -4.+0.j, -3.+0.j]], dtype=complex64)
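For intuition, the centered framing that librosa performs can be sketched in pure numpy: reflect-pad by window_size // 2, slide a periodic Hann window, and take one rfft per frame. This sketch reproduces the example values above, but it is an assumption-laden re-implementation, not librosa itself:

```python
import numpy as np

def stft_sketch(signal, window_size, hop_size):
    """Illustrative centered STFT of a 1-d signal: pad, frame, window, rfft."""
    # periodic Hann window, the librosa default
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(window_size) / window_size)
    # center the frames by reflect-padding half a window on each side
    padded = np.pad(signal, window_size // 2, mode='reflect')
    n_frames = 1 + (len(padded) - window_size) // hop_size
    frames = np.stack([padded[i * hop_size:i * hop_size + window_size]
                       for i in range(n_frames)])
    # one rfft per frame; transpose to (frequency_bins, time_bins)
    return np.fft.rfft(frames * window, axis=-1).T

a = np.array([1., 2., 3., 4.])
print(stft_sketch(a, 2, 1))  # rows [1, 2, 3, 4, 3] and [-1, -2, -3, -4, -3]
```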
istft¶

audtorch.transforms.functional.istft(spectrogram, window_size, hop_size, *, window='hann', axis=-2)¶
Inverse short-time Fourier transform.
The inverse short-time Fourier transform (iSTFT) is calculated by using librosa. It handles multi-dimensional inputs, but assumes that the two spectrogram axes are beside each other, starting with the axis corresponding to frequency bins. The returned audio signal has one dimension less than the spectrogram.
Parameters:	spectrogram (numpy.ndarray) – complex spectrogram
window_size (int) – size of STFT window in samples
hop_size (int) – size of STFT window hop in samples
window (str, tuple, number, function, or numpy.ndarray, optional) – type of STFT window. Default: hann
axis (int, optional) – axis of frequency bins of the spectrogram. Time bins are expected at axis + 1. Default: -2
Returns:	signal with shape (number_of_time_bins * hop_size - window_size/2)
Return type:	numpy.ndarray
Example
>>> a = np.array([1., 2., 3., 4.])
>>> D = stft(a, 4, 1)
>>> istft(D, 4, 1)
array([1., 2., 3., 4.], dtype=float32)
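The inverse can likewise be sketched as one irfft per frame, synthesis windowing, overlap-add, normalization by the summed squared window, and finally trimming the window_size // 2 centering pad. An illustrative sketch for a single spectrogram, not the librosa implementation:

```python
import numpy as np

def istft_sketch(spectrogram, window_size, hop_size):
    """Illustrative inverse of a centered STFT of a 1-d signal."""
    # periodic Hann window, matching the analysis window
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(window_size) / window_size)
    # back to time frames, then apply the synthesis window for the overlap-add
    frames = np.fft.irfft(spectrogram, n=window_size, axis=0).T * window
    n_frames = frames.shape[0]
    length = window_size + hop_size * (n_frames - 1)
    signal = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        signal[i * hop_size:i * hop_size + window_size] += frames[i]
        norm[i * hop_size:i * hop_size + window_size] += window ** 2
    signal /= np.maximum(norm, 1e-10)  # normalize by the summed squared window
    pad = window_size // 2
    return signal[pad:-pad]            # drop the centering pad

# build a centered STFT of [1, 2, 3, 4] (window_size=4, hop_size=1) to invert
a = np.array([1., 2., 3., 4.])
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(4) / 4)
padded = np.pad(a, 2, mode='reflect')
D = np.fft.rfft(np.stack([padded[i:i + 4] for i in range(5)]) * w, axis=-1).T
print(istft_sketch(D, 4, 1))  # ≈ [1. 2. 3. 4.]
```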