Digital Equalization in Coherent Optical Transmission Systems

This is a continuation of the previous tutorial, Fundamentals of Laser Oscillation.

1. Introduction

Digital equalization within a coherent optical receiver has been critical to the wide-scale adoption of the digital coherent transceiver in core networks.

Digital equalization has not only allowed optical chromatic dispersion compensation to be removed from the line but, more critically, it has removed the limits that polarization-mode dispersion (PMD) imposed on the upgrading of legacy systems.

Given that this tutorial is concerned with digital equalization, before going into detail we briefly discuss what is meant by equalization, in contrast to mitigation. Mitigation is defined by the Oxford English Dictionary as “the action of reducing the severity, seriousness, or painfulness of something”; in contrast, equalization is concerned with “the action or process of equalizing”.

In this context, both chromatic dispersion and PMD, being fundamentally lossless processes, can be equalized. In contrast, the impact of filtering and polarization-dependent loss (PDL) in the line, as well as nonlinear impairments, can only be mitigated, owing to the associated loss of information.

As such, our focus is on the equalization of dispersion, both chromatic and polarization mode; however, the algorithms described herein can also be applied to the design of matched filters or the mitigation of PDL.

Throughout this tutorial, we assume the digital filtering is partitioned into two blocks as illustrated in Figure 8.1.

**Figure 8.1**. Functional partitioning of the equalization in a digital coherent receiver. It should be noted that the matched filter (\(h_\text{MF}\)) and the chromatic dispersion compensating filter (\(h_\text{CD}\)) can be combined; as such, in what follows we only discuss the design of the chromatic dispersion compensating filter.

The first of these blocks implements a set of filters that equalize the static channel properties such as chromatic dispersion, with the second block implementing a set of adaptive filters that compensate for the dynamic channel properties such as polarization rotations or PMD.

This partitioning allows the two blocks to be implemented and updated in very different ways; for example, a long chromatic dispersion filter might be implemented in the frequency domain using the overlap and save method, while a shorter adaptive equalizer could be implemented in the time domain, so as to reduce the overall complexity of the digital signal processing (DSP) and the associated power consumption.

In this tutorial, we begin by detailing the necessary mathematics related to digital equalization before discussing the compensation of chromatic dispersion and then PMD. We conclude the tutorial by discussing ongoing research challenges related to digital equalization.

2. Primer on the Mathematics of Least Squares FIR filters

In this section, we discuss the underlying mathematics of finite impulse response (FIR) filters and their optimization (including differentiation with respect to a complex vector).

2.1. Finite Impulse Response Filters

An FIR filter is a nonrecursive filter having tap weights \(h[n]\) where \(n\in[0,1,…,N-1]\) and the time step between taps is given by \(T_s\) such that we sample at a rate of \(f_s=1/T_s\) with corresponding angular frequency \(\omega_s=2\pi/T_s\). In the time domain, the impulse response may be written as

\[\tag{8.1}h(t)=\boldsymbol{\sum}_{n=0}^{N-1}h[n]\delta(t-nT_s)\]

Alternatively, if we define the vectors \(\pmb{h}^{\pmb{T}}=[h[0],h[1],...,h[N-1]]\) and \(\boldsymbol{\delta}_s^{\pmb{T}}=[\delta(t),\delta(t-T_s),...,\delta(t-[N-1]T_s)]\), where superscript \(T\) denotes the transpose operation, then

\[\tag{8.2}h(t)=\pmb{h}^{\pmb{T}}\boldsymbol{\delta}_s\]

Hence, the Fourier transform is given by

\[\tag{8.3}H(\omega)=\int_{-\infty}^{\infty}h(t)e^{-j\omega{t}}dt=\boldsymbol{\sum}_{n=0}^{N-1}h[n]e^{-jn\omega{T_s}}=\pmb{h}^T\pmb{e}(\omega)\]

where we assume throughout this tutorial the Fourier transform pair

\[\tag{8.4}X(\omega)=\int_{-\infty}^{\infty}x(t)e^{-j\omega{t}}dt\qquad\text{and}\qquad{x(t)}=\frac{1}{2\pi}\int_{-\infty}^{\infty}X(\omega)e^{j\omega{t}}d\omega\]

and we have defined \(\pmb{e}(\omega)=[1,e^{-j\omega{T_s}},e^{-2j\omega{T_s}},...,e^{-j\omega[N-1]T_s}]^T\).

Given the above definitions, it readily follows that for \(m\in\mathbb{Z}\)

\[\tag{8.5}H(\omega+m\omega_s)=\boldsymbol{\sum}_{n=0}^{N-1}h[n]e^{-jn(\omega+m\omega_s)T_s}=\boldsymbol{\sum}_{n=0}^{N-1}h[n]e^{-jnm2\pi}e^{-jn\omega{T_s}}=H(\omega)\]

since \(e^{-jnm2\pi}=1\), indicating that the frequency response repeats at multiples of the sampling frequency, as expected from sampling theory.
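As a quick numerical check of Equations 8.3 and 8.5, the NumPy sketch below evaluates \(H(\omega)=\pmb{h}^T\pmb{e}(\omega)\) directly for an arbitrary set of complex taps, confirms the periodicity in \(\omega_s\), and shows that on the grid \(\omega_k=2\pi k/(MT_s)\) the result coincides with a zero-padded FFT. The helper name `freq_response`, the random taps, and the 64 GSa/s rate are illustrative assumptions.

```python
import numpy as np

def freq_response(h, w, Ts):
    """H(w) = h^T e(w) with e(w) = [1, e^{-jwTs}, ..., e^{-jw(N-1)Ts}]^T (Eq. 8.3)."""
    n = np.arange(h.size)
    return np.exp(-1j * np.outer(np.atleast_1d(w), n) * Ts) @ h

rng = np.random.default_rng(1)
h = rng.standard_normal(8) + 1j * rng.standard_normal(8)   # arbitrary complex taps
Ts = 1 / 64e9                                               # illustrative 64 GSa/s
ws = 2 * np.pi / Ts

w = np.linspace(-ws / 2, ws / 2, 7)
H = freq_response(h, w, Ts)
H_shifted = freq_response(h, w + 3 * ws, Ts)                # Eq. 8.5 with m = 3

# On the DFT grid w_k = 2*pi*k/(M*Ts), H(w) coincides with numpy's FFT of the taps.
M = 32
H_fft = np.fft.fft(h, M)
H_grid = freq_response(h, 2 * np.pi * np.arange(M) / (M * Ts), Ts)
```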

The process of generating an output signal \(y[k]\) by filtering an input vector \(\pmb{x}^T=(x[k],x[k-1],...,x[k-(N-1)])\) of \(N\) samples by the filter \(\pmb{h}\) may be written as

\[\tag{8.6}y[k]=\pmb{h}^T\pmb{x}=\boldsymbol{\sum}_{i=0}^{N-1}h[i]x[k-i]\]

indicating that the output is obtained via the discrete time convolution of the input vector with the tap weights, which may be realized in the frequency domain as multiplication, underpinning fast techniques such as the overlap and save method.
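The frequency-domain realization mentioned above can be sketched with the overlap and save method. The hypothetical helper `overlap_save` below assumes a block FFT length `nfft` larger than \(N-1\); each block discards the first \(N-1\) circularly aliased outputs, and the result is compared against the direct convolution of Equation 8.6.

```python
import numpy as np

def overlap_save(x, h, nfft=64):
    """Filter x with FIR taps h (Eq. 8.6) using FFT-based overlap and save."""
    N = h.size
    if nfft <= N - 1:
        raise ValueError("nfft must exceed N - 1")
    step = nfft - (N - 1)                                      # new output samples per block
    H = np.fft.fft(h, nfft)                                    # zero-padded filter spectrum
    xp = np.concatenate([np.zeros(N - 1, dtype=complex), x])   # zero initial history
    y = np.empty(x.size, dtype=complex)
    for start in range(0, x.size, step):
        block = xp[start:start + nfft]
        block = np.pad(block, (0, nfft - block.size))          # last block may be short
        yb = np.fft.ifft(np.fft.fft(block) * H)
        n = min(step, x.size - start)
        y[start:start + n] = yb[N - 1:N - 1 + n]               # drop circularly aliased samples
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
h = rng.standard_normal(9) + 1j * rng.standard_normal(9)
y_os = overlap_save(x, h, nfft=64)
y_direct = np.convolve(x, h)[:x.size]          # y[k] = sum_i h[i] x[k-i]
```

In practice the FFT length is a trade-off: larger blocks amortize the \(N-1\) discarded samples over more outputs, at the cost of latency and memory.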

2.2. Differentiation with Respect to a Complex Vector

If \(z\) is a complex number such that \(z=x+jy\), then we can define differentiation with respect to a complex number as

\[\tag{8.7}\frac{\partial}{\partial{z}}=\frac{1}{2}\frac{\partial}{\partial{x}}-\frac{j}{2}\frac{\partial}{\partial{y}}\]

with

\[\tag{8.8}\frac{\partial}{\partial{z^*}}=\left(\frac{\partial}{\partial{z}}\right)^*=\frac{1}{2}\frac{\partial}{\partial{x}}+\frac{j}{2}\frac{\partial}{\partial{y}}\]

which in turn gives

\[\tag{8.9}\frac{\partial{z}}{\partial{z}}=1,\qquad\frac{\partial{z^*}}{\partial{z^*}}=1,\qquad\frac{\partial{z^n}}{\partial{z}}=nz^{n-1}\]

which is in line with expectations from usual calculus; however, it also follows that

\[\tag{8.10}\frac{\partial{z^*}}{\partial{z}}=\frac{\partial{z}}{\partial{z^*}}=0\]

indicating that a complex variable and its conjugate may be treated as independent variables insofar as differentiation is concerned. If we extend the concept to a vector \(\pmb{z}=\pmb{x}+j\pmb{y}\) such that

\[\tag{8.11}\frac{\partial}{\partial{\pmb{z}}}=\frac{1}{2}\frac{\partial}{\partial\pmb{x}}-\frac{j}{2}\frac{\partial}{\partial{\pmb{y}}}\]

where \(\frac{\partial}{\partial{\pmb{x}}}=\left(\frac{\partial}{\partial{x_0}},\frac{\partial}{\partial{x_1}},...,\frac{\partial}{\partial{x_{N-1}}}\right)^T\) and so on, then it follows that

\[\tag{8.12}\frac{\partial\pmb{z}^T}{\partial\pmb{z}}=\frac{\partial}{\partial\pmb{z}}\pmb{z}^T=\mathbf{I}=\frac{\partial{\pmb{z}^*}^T}{\partial\pmb{z}^*}\qquad\text{and}\qquad\frac{\partial{\pmb{z}^*}^T}{\partial\pmb{z}}=0=\frac{\partial\pmb{z}^T}{\partial\pmb{z}^*}\]

where \(\mathbf{I}\) and \(0\) are \(N\times{N}\) identity and null matrices, respectively.
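The identities in Equations 8.9 and 8.10 can be checked numerically by evaluating Equations 8.7 and 8.8 with central differences. The function `wirtinger` is an illustrative helper, not a library routine, and the step size `h` is an arbitrary choice.

```python
def wirtinger(f, z, h=1e-6):
    """Numerical d/dz and d/dz* of f at z via Eqs. 8.7 and 8.8 (central differences)."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)

z = 0.7 - 0.2j
d_dz, d_dzc = wirtinger(lambda v: v**3, z)              # expect 3 z^2 and 0
c_dz, c_dzc = wirtinger(lambda v: v.conjugate(), z)     # expect 0 and 1
p_dz, p_dzc = wirtinger(lambda v: abs(v) ** 2, z)       # |z|^2 = z z*: expect z* and z
```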

2.3. Least Squares Tap Weights

If the desired frequency response of a filter is \(H_d(\omega)\) over the frequency range \(\omega\in(-\omega_s/2,\omega_s/2)\), then, following Kidambi and Ramachandran, the squared error \(\epsilon^2\) between the desired response and the actual response \(H(\omega)\) is given by

\[\tag{8.13}\epsilon^2=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}|H(\omega)-H_d(\omega)|^2d\omega\]

Substituting our definition for \(H(\omega)=\pmb{h}^T\pmb{e}(\omega)\), we obtain

\[\tag{8.14}\begin{align}\epsilon^2&=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}|\pmb{h}^T\pmb{e}(\omega)-H_d(\omega)|^2d\omega=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}(\pmb{h}^T\pmb{e}(\omega)-H_d(\omega))^*(\pmb{h}^T\pmb{e}(\omega)-H_d(\omega))d\omega\\&=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}(\pmb{h}^T\pmb{e}(\omega)-H_d(\omega))^*(\pmb{e}^T(\omega)\pmb{h}-H_d(\omega))d\omega\end{align}\]

since \(\pmb{e}^T(\omega)\pmb{h}=\pmb{h}^T\pmb{e}(\omega)\) and hence

\[\tag{8.15}\frac{d\epsilon^2}{d\pmb{h}}=0=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}(\pmb{h}^T\pmb{e}(\omega)-H_d(\omega))^*\pmb{e}(\omega)d\omega\]

Taking the complex conjugate, we obtain

\[\tag{8.16}\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)(\pmb{e}(\omega)^T\pmb{h}-H_d(\omega))d\omega=0\]

giving

\[\tag{8.17}\left(\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)\pmb{e}(\omega)^Td\omega\right)\pmb{h}=\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)H_d(\omega)d\omega\]

but since the elements of \(\pmb{e}(\omega)=[1,e^{-j\omega{T_s}},e^{-2j\omega{T_s}},...,e^{-j\omega[N-1]T_s}]^T\) are orthogonal over one period, \(\int_{-\omega_s/2}^{\omega_s/2}e^{j\omega(m-n)T_s}d\omega=\omega_s\delta_{mn}\), it follows that \(\left(\displaystyle\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)\pmb{e}(\omega)^Td\omega\right)=\omega_s\pmb{I}\); hence, we have

\[\tag{8.18}\pmb{h}_\text{opt}=\frac{1}{\omega_s}\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)H_d(\omega)d\omega\]

where the tap weights \(\pmb{h}_\text{opt}\) are optimal in a least squares sense.

To demonstrate the theory, let us consider a rectangular Nyquist filter with support \(\omega\in(-\omega_s/4,\omega_s/4)\). While the amplitude response is straightforward, since the filter is complex we must also specify the phase response. To simplify the phase response, we modify the basis functions to be symmetric about the origin such that \(\pmb{e}(\omega)=e^{j\omega[N-1]T_s/2}\times[1,e^{-j\omega{T_s}},e^{-2j\omega{T_s}},...,e^{-j\omega[N-1]T_s}]^T\), ensuring that a symmetric FIR filter has zero group delay.

In this case, for a rectangular Nyquist filter with unit passband gain, the tap weights are given by

\[\tag{8.19}\pmb{h}_\text{opt}=\frac{1}{\omega_s}\int_{-\frac{\omega_s}{4}}^{\frac{\omega_s}{4}}e^{-\frac{j\omega[N-1]T_s}{2}}\times[1,e^{j\omega{T_s}},e^{2j\omega{T_s}},...,e^{j\omega[N-1]T_s}]^Td\omega\]

\[\tag{8.20}\begin{align}h_\text{opt}[n]&=\frac{1}{\omega_s}\int_{-\frac{\omega_s}{4}}^{\frac{\omega_s}{4}}e^{j\omega{n}T_s-\frac{j\omega[N-1]T_s}{2}}d\omega=\frac{2\sin\left(\frac{\omega_snT_s}{4}-\frac{\omega_s[N-1]T_s}{8}\right)}{\omega_s\left(nT_s-\frac{[N-1]T_s}{2}\right)}\\&=\frac{\sin\left(\frac{\pi{n}}{2}-\frac{\pi[N-1]}{4}\right)}{\pi\left(n-\frac{N-1}{2}\right)}=\frac{1}{2}\text{sinc}\left(\frac{n}{2}-\frac{[N-1]}{4}\right)\end{align}\]

where we have defined \(\text{sinc}(x)=\sin(\pi{x})/(\pi{x})\) in accordance with the usual convention for electronic engineers.
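The following sketch evaluates the least squares taps of the rectangular Nyquist example numerically on a midpoint frequency grid, with the illustrative choices \(T_s=0.5\) (arbitrary units) and \(N=21\). It also evaluates the Gram matrix \(\int\pmb{e}^*(\omega)\pmb{e}(\omega)^Td\omega\), which comes out as \(\omega_s\pmb{I}\), so the taps are normalized by \(1/\omega_s\); with that normalization they agree with \(\tfrac{1}{2}\text{sinc}(n/2-[N-1]/4)\). NumPy's `np.sinc` follows the same \(\sin(\pi x)/(\pi x)\) convention.

```python
import numpy as np

Ts = 0.5                       # illustrative sampling period (arbitrary units)
ws = 2 * np.pi / Ts            # angular sampling frequency
N = 21                         # number of taps
M = 20000                      # integration grid points over one period

w = -ws / 2 + (np.arange(M) + 0.5) * ws / M   # midpoint grid over (-ws/2, ws/2)
dw = ws / M
u = np.arange(N) - (N - 1) / 2                # symmetric tap positions n - (N-1)/2
E = np.exp(-1j * np.outer(u, w) * Ts)         # column k is the symmetric e(w_k)

gram = (E.conj() @ E.T) * dw                  # integral of e*(w) e(w)^T dw
Hd = (np.abs(w) < ws / 4).astype(float)       # rectangular Nyquist response
h_opt = (E.conj() @ Hd) * dw / ws             # (1/ws) * integral of e*(w) Hd(w) dw

h_closed = 0.5 * np.sinc(u / 2)               # closed form, sinc(x) = sin(pi x)/(pi x)
```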

2.4. Application to Stochastic Gradient Algorithms

In an adaptive equalizer, we frequently have a cost function whose gradient is stochastically estimated and used to update the tap weights. For a complex valued set of taps \(\pmb{h}\), the stochastic gradient algorithm is applied to the real \((\boldsymbol{\mathscr{R}}\{\pmb{h}\})\) and imaginary \((\boldsymbol{\mathscr{I}}\{\pmb{h}\})\) components of the taps independently and hence may be written as

\[\tag{8.21}\boldsymbol{\mathscr{R}}\{\pmb{h}\}:=\boldsymbol{\mathscr{R}}\{\pmb{h}\}-\frac{\mu}{2}\left(\frac{\partial\epsilon^2}{\partial\boldsymbol{\mathscr{R}}\{\pmb{h}\}}\right)\qquad\text{and}\qquad\boldsymbol{\mathscr{I}}\{\pmb{h}\}:=\boldsymbol{\mathscr{I}}\{\pmb{h}\}-\frac{\mu}{2}\left(\frac{\partial\epsilon^2}{\partial\boldsymbol{\mathscr{I}}\{\pmb{h}\}}\right)\]

where \(:=\) denotes the assignment operation, such that \(x:=y\) indicates that \(x\) is assigned the value \(y\). These two equations may, however, be written more compactly in terms of the derivative with respect to the conjugate \(\pmb{h}^*\) as

\[\tag{8.22}\pmb{h}:=\pmb{h}-\mu\frac{\partial\epsilon^2}{\partial\pmb{h}^*}\]

To illustrate the approach, we consider the least mean squares (LMS) equalizer, whose error term is given by \(\epsilon=d-\pmb{h}^T\pmb{x}\), where \(d\) is the desired output. Since \(\epsilon\) is complex, \(\epsilon^2\) here denotes its squared magnitude, \(\epsilon^2=|d-\pmb{h}^T\pmb{x}|^2=(d-\pmb{h}^T\pmb{x})^*(d-\pmb{h}^T\pmb{x})\), and hence

\[\tag{8.23}\begin{align}\frac{\partial\epsilon^2}{\partial\pmb{h}^*}&=\frac{\partial}{\partial\pmb{h}^*}\left\{(d-\pmb{h}^T\pmb{x})^*(d-\pmb{h}^T\pmb{x})\right\}=\frac{\partial}{\partial\pmb{h}^*}\left\{(d^*-{\pmb{h}^{*}}^T\pmb{x}^*)(d-\pmb{h}^T\pmb{x})\right\}\\&=(d^*-{\pmb{h}^*}^T\pmb{x}^*)\frac{\partial}{\partial\pmb{h}^*}\left\{(d-\pmb{h}^T\pmb{x})\right\}+(d-\pmb{h}^T\pmb{x})\frac{\partial}{\partial\pmb{h}^*}\left\{(d^*-{\pmb{h}^*}^T\pmb{x}^*)\right\}\\&=-\epsilon^*0\pmb{x}-\epsilon{\pmb{I}}\pmb{x}^*\\&=-\epsilon\pmb{x}^*\end{align}\]

where we have used the product rule for differentiation and the relationships related to differentiation with respect to a complex vector and its conjugate.
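To make the result of Equation 8.23 concrete, the sketch below compares \(-\epsilon\pmb{x}^*\) with a finite-difference estimate of \(\partial\epsilon^2/\partial\pmb{h}^*=\tfrac{1}{2}(\partial\epsilon^2/\partial\boldsymbol{\mathscr{R}}\{\pmb{h}\}+j\,\partial\epsilon^2/\partial\boldsymbol{\mathscr{I}}\{\pmb{h}\})\) for random data. The helper `conj_gradient` and all data values are illustrative assumptions.

```python
import numpy as np

def conj_gradient(f, h, step=1e-6):
    """Numerical d f / d h^* = (1/2)(df/dRe{h} + j df/dIm{h}) for a real-valued f."""
    g = np.zeros(h.size, dtype=complex)
    for i in range(h.size):
        e = np.zeros(h.size, dtype=complex)
        e[i] = step
        d_re = (f(h + e) - f(h - e)) / (2 * step)
        d_im = (f(h + 1j * e) - f(h - 1j * e)) / (2 * step)
        g[i] = 0.5 * (d_re + 1j * d_im)
    return g

rng = np.random.default_rng(2)
N = 6
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
d = 0.3 - 0.8j                        # arbitrary desired output

def sq_err(taps):
    return abs(d - taps @ x) ** 2     # epsilon^2 = |d - h^T x|^2

eps = d - h @ x
numeric = conj_gradient(sq_err, h)
analytic = -eps * x.conj()            # Eq. 8.23
```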

Hence, the tap weight adaptation algorithm is given by

\[\tag{8.24}\pmb{h}:=\pmb{h}+\mu\epsilon\pmb{x}^*\]
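A minimal sketch of the LMS update of Equation 8.24, assuming a noise-free system-identification setting in which the desired output is generated by a hypothetical five-tap response `h_target`; the step size \(\mu=0.02\) and the unit-power white Gaussian input are illustrative choices. With no noise, the error vanishes at the optimum, so the taps converge to `h_target`.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, mu = 5, 4000, 0.02
h_target = np.array([0.05j, -0.2, 1.0, 0.3 - 0.1j, 0.02])              # hypothetical response
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)  # unit-power input

h = np.zeros(N, dtype=complex)
for k in range(N - 1, K):
    x = s[k - N + 1:k + 1][::-1]      # x = (s[k], s[k-1], ..., s[k-(N-1)])
    d = h_target @ x                  # desired output (noise-free for clarity)
    eps = d - h @ x                   # error term epsilon = d - h^T x
    h = h + mu * eps * x.conj()       # Eq. 8.24
```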

Similarly, for the constant modulus algorithm (CMA), the error term is

\[\tag{8.25}\epsilon=1-|\pmb{h}^{\pmb{T}}\pmb{x}|^2\]

Hence

\[\tag{8.26}\frac{\partial\epsilon^2}{\partial\pmb{h}^*}=\frac{\partial}{\partial\pmb{h}^*}(1-|\pmb{h}^{\pmb{T}}\pmb{x}|^2)^2=-2\epsilon\frac{\partial}{\partial\pmb{h}^*}(\pmb{h}^{\pmb{T}}\pmb{x})^*(\pmb{h}^{\pmb{T}}\pmb{x})=-2\epsilon\pmb{x}^*(\pmb{h}^{\pmb{T}}\pmb{x})\]

giving the update algorithm for the CMA as

\[\tag{8.27}\pmb{h}:=\pmb{h}-\mu\frac{\partial\epsilon^2}{\partial\pmb{h}^*}=\pmb{h}+2\mu\epsilon\pmb{x}^*(\pmb{h}^{\pmb{T}}\pmb{x})\]
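The CMA update of Equation 8.27 can be illustrated on a deliberately simple case: unit-modulus QPSK received through a hypothetical frequency-flat channel `g` with gain 0.6 and an arbitrary phase, and a three-tap equalizer initialized at its centre tap. Under these assumptions the CMA should drive \(|h[1]g|\) to unity and the outer taps to zero, while leaving the carrier phase unresolved, since the cost depends only on the modulus.

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, mu = 5000, 3, 0.01
g = 0.6 * np.exp(0.4j)                            # hypothetical flat channel (gain and phase)
qpsk = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)
r = g * qpsk                                      # received samples, modulus 0.6

h = np.zeros(N, dtype=complex)
h[N // 2] = 1.0                                   # centre-tap initialization
for k in range(N - 1, K):
    x = r[k - N + 1:k + 1][::-1]                  # (r[k], r[k-1], r[k-2])
    y = h @ x                                     # y = h^T x
    eps = 1 - abs(y) ** 2                         # Eq. 8.25
    h = h + 2 * mu * eps * x.conj() * y           # Eq. 8.27
```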

2.5. Application to Wiener Filter

We have already shown that if \(y=\pmb{h}^T\pmb{x}\) and the error term is given by \(\epsilon=d-y\), then

\[\tag{8.28}\frac{\partial\epsilon^2}{\partial\pmb{h}^*}=-\epsilon\pmb{x}^*=-(d-\pmb{h}^T\pmb{x})\pmb{x}^*\]

While we have shown how it is possible to solve this iteratively using a stochastic gradient technique, an alternative is to solve it analytically by setting the expected value of the derivative to zero, so as to give

\[\tag{8.29}E\left\{\frac{\partial\epsilon^2}{\partial\pmb{h}^*}\right\}=E\{-(d-\pmb{h}^T\pmb{x})\pmb{x}^*\}=E\{-\pmb{x}^*(d-\pmb{x}^T\pmb{h})\}=0\]

where \(E\{\cdot\}\) is the expectation operator. By exploiting the linearity of the expectation operator, Equation 8.29 can be simplified to give

\[\tag{8.30}E\{\pmb{x}^*d\}=E\{\pmb{x}^*\pmb{x}^T\pmb{h}\}=E\{\pmb{x}^*\pmb{x}^T\}\pmb{h}\]

giving the Wiener filter solution of the tap weights as

\[\tag{8.31}\pmb{h}=R_{xx}^{-1}P\]

where we have defined \(P=E\{\pmb{x}^*d\}\) as the cross-correlation vector between the desired signal and the distorted signal, and \(R_{xx}= E\{\pmb{x}^*\pmb{x}^T\}\) to be the autocorrelation matrix of the distorted signal.
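In simulation, the expectations in Equation 8.31 can be replaced by sample averages. The sketch below assumes a known, hypothetical target response `h_true` and noise-free desired outputs, estimates \(R_{xx}\) and \(P\) from a data matrix whose rows are \(\pmb{x}^T\), and solves the linear system rather than forming \(R_{xx}^{-1}\) explicitly, which is the numerically preferable choice.

```python
import numpy as np

rng = np.random.default_rng(5)
K, N = 2000, 4
h_true = np.array([0.1 + 0.2j, 1.0, -0.3j, 0.05])   # hypothetical target taps
s = (rng.standard_normal(K + N - 1) + 1j * rng.standard_normal(K + N - 1)) / np.sqrt(2)

# Row k of X holds x^T = (s[k+N-1], s[k+N-2], ..., s[k])
X = np.array([s[k:k + N][::-1] for k in range(K)])
d = X @ h_true                                      # desired outputs d = h^T x

R_xx = X.conj().T @ X / K        # sample estimate of E{x* x^T}
P = X.conj().T @ d / K           # sample estimate of E{x* d}
h = np.linalg.solve(R_xx, P)     # Eq. 8.31 without an explicit inverse
```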

Often, in a coherent optical communication system, the situation is further complicated by the presence of phase noise or the frequency offset between the signal and the local oscillator. Nevertheless, the Wiener solution may be readily applied in a simulation environment where frequency offset correction and carrier recovery are not required.

2.6. Other Filtering Techniques and Design Methodologies

There is a wealth of literature on filter design, albeit much of it is focused on the design of linear-phase filters, for example, using the Parks–McClellan variant of the Remez exchange algorithm to minimize the maximum error.

In contrast, in optical communication systems nonlinear phase responses are often desired, in particular as we discuss in the subsequent section, filters with quadratic phase are required for the compensation of chromatic dispersion.

As such, in this tutorial, we focus on those techniques that have proved to be useful for optical communication systems, in particular the least squares criterion since it allows for closed form solutions without recourse to iterative techniques.

3. Equalization of Chromatic Dispersion

3.1. Nature of Chromatic Dispersion

Chromatic dispersion is a consequence of the frequency-dependent group delay in the optical fiber. If two wavelengths are separated by \(\Delta\lambda\) nm, then, following Agrawal, the temporal spread \(\Delta{t}\) (in ps) between them is given by

\[\tag{8.32}\Delta{t}=Dz\Delta\lambda\]

where \(D\) is the dispersion coefficient of the fiber in ps/nm/km and \(z\) is the length of the link in kilometers.

Given \(c=f\lambda\), it follows that \(\Delta\lambda=-\Delta{f}\times\lambda/f\); hence, for \(f\approx\) 193 THz (\(\lambda\approx\) 1553 nm), \(|\Delta\lambda/\Delta{f}|\approx\) 8 pm/GHz. Therefore, a signal occupying 35 GHz has a spectral width of 0.28 nm. If the dispersion coefficient is 16.7 ps/nm/km, then for every 1000 km of fiber the signal disperses by at least 165 symbol periods, with the minimum value obtained by assuming a rectangular spectrum (in general, for a symbol rate of \(B_s\) Gbaud, the minimum number of \(T/2\)-spaced taps is 0.27\(B_s^2\) per 1000 km of SMF with \(D\) = 16.7 ps/nm/km). While these are crude “back of the envelope” calculations, they give insight into the expected number of taps.
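The back-of-the-envelope arithmetic above can be reproduced with a few lines of Python. The function name `cd_back_of_envelope` is illustrative; it assumes, as in the text, a rectangular spectrum whose width in GHz equals the symbol rate in Gbaud.

```python
C = 299_792_458.0          # speed of light in m/s

def cd_back_of_envelope(baud_g, f_thz=193.0, D=16.7, z_km=1000.0):
    """Crude chromatic dispersion estimates for a rectangular spectrum of width baud_g GHz."""
    dl_df = C / (f_thz * 1e12) ** 2 * 1e21     # |d(lambda)/df| = lambda/f = c/f^2, in pm/GHz
    width_nm = baud_g * dl_df * 1e-3           # spectral width in nm
    spread_ps = D * z_km * width_nm            # Eq. 8.32: delta t = D z delta lambda
    symbol_ps = 1e3 / baud_g                   # symbol period in ps
    n_symbols = spread_ps / symbol_ps          # spread in symbol periods
    return dl_df, width_nm, n_symbols, 2 * n_symbols   # last value: T/2-spaced taps

dl_df, width_nm, n_symbols, n_taps = cd_back_of_envelope(35)
print(f"{dl_df:.2f} pm/GHz, {width_nm:.3f} nm, {n_symbols:.0f} symbols, "
      f"{n_taps:.0f} taps, {n_taps / 35**2:.3f} * Bs^2")
```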

3.2. Modeling of Chromatic Dispersion in an Optical Fiber

In the absence of fiber nonlinearity, the effect of chromatic dispersion on the envelope \(A(z, t)\) of a pulse may be modeled by the following partial differential equation, which uses the electronic engineer’s sign convention for phase (consistent with Equation 8.4) rather than the physicist’s convention

\[\tag{8.33}\frac{\partial{A}(z,t)}{\partial{z}}=\frac{j\beta_2}{2}\frac{\partial^2A(z,t)}{\partial{t^2}}\]

where \(z\) is the distance of propagation, \(t\) is the time in a frame moving with the pulse \(A(z, t)\), and \(\beta_2\) is the group velocity dispersion parameter of the fiber. Taking the Fourier transform of Equation 8.33 and solving gives the frequency domain transfer function \(G(z,\omega)\) given by

\[\tag{8.34}G(z,\omega)=\exp\left(-\frac{j\beta_2}{2}\omega^2z\right)\]

The dispersion compensating filter is, therefore, given by the all-pass filter \(1/G(z,\omega)=G(-z,\omega)\), which can be approximated using an FIR filter.
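As a sketch of Equation 8.34 and the all-pass compensator \(1/G(z,\omega)\), the code below disperses a Gaussian pulse in the frequency domain and then undoes the dispersion. It assumes the standard conversion \(\beta_2=-D\lambda^2/(2\pi c)\) with \(D\) = 16.7 ps/nm/km and \(\lambda\) = 1553 nm, a hypothetical 30 ps pulse, 100 km of fiber, and \(T/2\) sampling of a 35 Gbaud signal. NumPy's forward FFT uses the \(e^{-j\omega t}\) kernel, matching the transform pair of Equation 8.4.

```python
import numpy as np

def beta2_from_D(D=16.7, wavelength_nm=1553.0):
    """beta2 in ps^2/km from D in ps/nm/km (assumed standard conversion)."""
    c_nm_per_ps = 2.99792458e5
    return -D * wavelength_nm**2 / (2 * np.pi * c_nm_per_ps)

Ts = 1e3 / 70.0                      # ps, T/2 sampling of a 35 Gbaud signal
M, z, T0 = 4096, 100.0, 30.0         # samples, km, ps (hypothetical pulse width)
t = (np.arange(M) - M / 2) * Ts
w = 2 * np.pi * np.fft.fftfreq(M, d=Ts)   # rad/ps, in numpy's FFT ordering
b2 = beta2_from_D()

A0 = np.exp(-t**2 / (2 * T0**2))          # input Gaussian envelope
G = np.exp(-0.5j * b2 * w**2 * z)         # Eq. 8.34
Az = np.fft.ifft(np.fft.fft(A0) * G)      # dispersed pulse
A_rec = np.fft.ifft(np.fft.fft(Az) / G)   # all-pass compensation 1/G = G(-z, w)
```

Because \(|G|=1\), dividing by \(G\) is numerically benign; the practical difficulty addressed in the following subsections is realizing this response with a finite number of time-domain taps.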

3.3. Truncated Impulse Response

Herein, we discuss a simple but intuitive means of designing the chromatic dispersion compensating FIR filter, providing a basis for the discussion of more complex techniques.

In contrast to a frequency-domain approach, this not only gives a simple closed-form solution for the tap weights but also provides bounds on the number of taps required for a given value of dispersion.

We begin by obtaining the impulse response \(g(z, t)\) of the dispersive fiber by applying the inverse Fourier transform to the frequency domain transfer function \(G(z, \omega)\) to give

\[\tag{8.35}g(z,t)=\frac{1}{\sqrt{2\pi{j}\beta_2z}}\exp\left(\frac{j}{2\beta_2z}t^2\right)\]

For an arbitrary input, the output can be obtained by convolving this impulse response with the input, and, as expected, the impulse response itself satisfies Equation 8.33. By inverting the sign of the chromatic dispersion, we obtain the impulse response of the chromatic dispersion compensating filter \(g_c(z, t)\), given by

\[\tag{8.36}g_c(z,t)=\frac{1}{\sqrt{-2\pi{j}\beta_2z}}\exp(-j\phi(t)),\qquad\text{where }\phi(t)=\frac{t^2}{2\beta_2z}\]

The impulse response given by Equation 8.36 presents a number of issues for digital implementation: not only is it infinite in duration, but since it passes all frequencies, aliasing will occur at any finite sampling frequency. Both problems can be addressed by truncating the impulse response to a finite duration.

To determine the length of the truncation window, we note that if we sample every \(T_s\) seconds, then aliasing will occur for frequencies that exceed the Nyquist frequency \(\omega_n=\pi/T_s\), and that the impulse response may be considered as a rotating vector whose angular frequency is given by

\[\tag{8.37}\omega=\frac{\partial\phi(t)}{\partial{t}}=\frac{t}{\beta_2z}\]

when the magnitude of this frequency exceeds the Nyquist frequency, aliasing will occur, giving the criterion that \(|\omega|\lt\omega_n\) and hence

\[\tag{8.38}-|\beta_2|z\frac{\pi}{T_s}\le{t}\le|\beta_2|z\frac{\pi}{T_s}\]

Since the truncated impulse response is of finite duration, it can be implemented digitally using an FIR filter. If we assume the number of taps is large, then the sampled impulse response will approximate the continuous time impulse response. Hence, if we consider a filter with \(N_\text{TI}\) taps, then the tap weights will be given by

\[\tag{8.39}h_\text{TI}[n]=\frac{1}{\sqrt{\rho}}\exp\left(-\frac{j\pi}{\rho}\left(n-\frac{N_\text{TI}-1}{2}\right)^2\right)\]

where

\[\rho=\frac{2\pi\beta_2z}{T_s^2},\qquad N_\text{TI}=\lfloor|\rho|\rfloor\qquad\text{and}\qquad{n}\in[0,1,2,...,N_\text{TI}-1]\]

and \(\lfloor{x}\rfloor\) is the integer part of \(x\) rounded toward minus infinity. These tap weights form the basis for the compensation of chromatic dispersion using an FIR filter.
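The truncated impulse response taps of Equation 8.39 are straightforward to generate. The sketch below assumes \(\lambda\) = 1553 nm and the standard conversion \(\beta_2=-D\lambda^2/(2\pi c)\); for the 31.25 km, \(D\) = 17 ps/nm/km, 35 Gbaud, \(T/2\)-sampled case considered in Section 3.6 it gives \(\rho\approx-20.9\) and therefore 20 taps. The helper names `rho_param` and `ti_taps` are illustrative, and `np.sqrt(rho + 0j)` evaluates \(1/\sqrt{\rho}\) literally, so for \(\rho<0\) the taps carry a constant phase, which a subsequent carrier-recovery stage would absorb.

```python
import numpy as np

C_NM_PER_PS = 2.99792458e5            # speed of light in nm/ps

def rho_param(D=17.0, wavelength_nm=1553.0, z_km=31.25, Ts_ps=1e3 / 70.0):
    """rho = 2*pi*beta2*z/Ts^2, with beta2 = -D*lambda^2/(2*pi*c) in ps^2/km (assumed)."""
    beta2 = -D * wavelength_nm**2 / (2 * np.pi * C_NM_PER_PS)
    return 2 * np.pi * beta2 * z_km / Ts_ps**2

def ti_taps(rho):
    """Truncated impulse response taps of Eq. 8.39 with N_TI = floor(|rho|)."""
    N = int(np.floor(abs(rho)))
    u = np.arange(N) - (N - 1) / 2                 # symmetric tap positions
    return np.exp(-1j * np.pi * u**2 / rho) / np.sqrt(rho + 0j)

rho = rho_param()          # about -20.9 for this example
h_ti = ti_taps(rho)        # 20 taps of constant magnitude 1/sqrt(|rho|)
```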

3.4. Band-Limited Impulse Response

In the previous example, one of the problems that arose was due to aliasing, resulting in an upper bound on the number of taps that can be employed. One obvious solution to overcome this restriction is to band-limit the compensating filter to the Nyquist bandwidth \(\omega_n=\pi/T_s\) such that

\[\tag{8.40}g_{c_\text{bl}}(z,t)=\frac{1}{2\pi}\int_{-\omega_n}^{\omega_n}\exp\left(\frac{j\beta_2}{2}\omega^2z\right)\exp(j\omega{t})d\omega=w(z,t)\times{g_c}(z,t)\]

where

\[\tag{8.41}w(z,t)=\frac{1}{2j}\left(\text{erfi}\left(\frac{t+z\beta_2\omega_n}{\sqrt{-2jz\beta_2}}\right)-\text{erfi}\left(\frac{t-z\beta_2\omega_n}{\sqrt{-2jz\beta_2}}\right)\right)\]

where \(\text{erfi}(x)\) is the imaginary error function given by \(\text{erfi}(x)=-j\,\text{erf}(jx)\), where

\[\tag{8.42}\text{erf}(z)=\frac{2}{\sqrt{\pi}}\int_0^ze^{-t^2}dt\]

The fact that we can write the response as \(g_{c_\text{bl}}(z,t)=g_c(z, t)w(z, t)\) indicates that the band-limited response may be obtained by multiplying the impulse response by a window function \(w(z, t)\).

As mentioned earlier, we may sample this impulse response in order to estimate the FIR filter coefficients. However, rather than detailing this, we turn our attention to the least squares formulation of the FIR filter since, as we see subsequently, the two are equivalent.

3.5. Least Squares FIR Filter Design

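As previously discussed, the least squares criterion may be applied to the design of a complex FIR filter, giving tap weights that are optimal in a least squares sense (the factor \(1/\omega_s\) arises because \(\int_{-\omega_s/2}^{\omega_s/2}\pmb{e}^*(\omega)\pmb{e}(\omega)^Td\omega=\omega_s\pmb{I}\)):

\[\tag{8.43}\pmb{h}_\text{LS}=\frac{1}{\omega_s}\int_{-\frac{\omega_s}{2}}^{\frac{\omega_s}{2}}\pmb{e}^*(\omega)H_d(z,\omega)d\omega\]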

If, as in Section 2.3, we define the basis functions symmetrically such that \(\pmb{e}(\omega)=e^{j\omega[N-1]T_s/2}\times[1,e^{-j\omega{T_s}},e^{-2j\omega{T_s}},...,e^{-j\omega[N-1]T_s}]^T\), and, neglecting any subsequent filtering, we choose \(H_d(z,\omega)=1/G(z,\omega)=G(-z,\omega)\), then we obtain

\[\tag{8.44}\begin{align}h_\text{LS}[n]&=\frac{1}{2j}\left(\text{erfi}\left(\sqrt{\frac{j\pi}{\rho}}\left(n-\frac{N-1}{2}+\frac{\rho}{2}\right)\right)-\text{erfi}\left(\sqrt{\frac{j\pi}{\rho}}\left(n-\frac{N-1}{2}-\frac{\rho}{2}\right)\right)\right)\\&\qquad\times\frac{1}{\sqrt{\rho}}\exp\left(-\frac{j\pi}{\rho}\left(n-\frac{N-1}{2}\right)^2\right)\end{align}\]

where \(\rho=2\pi\beta_2z/T_s^2\), which we note is identical to the sampled version of the band-limited impulse response. As mentioned earlier, we may factorize this to give

\[\tag{8.45}h_\text{LS}[n]=w[n]\times{h}_\text{TI}[n]\]

where

\[\tag{8.46}w[n]=\frac{1}{2j}\left(\text{erfi}\left(\sqrt{\frac{j\pi}{\rho}}\left(n-\frac{N-1}{2}+\frac{\rho}{2}\right)\right)-\text{erfi}\left(\sqrt{\frac{j\pi}{\rho}}\left(n-\frac{N-1}{2}-\frac{\rho}{2}\right)\right)\right)\]
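The least squares design of Equation 8.43 can also be evaluated numerically, sidestepping the branch-cut bookkeeping of the erfi form when \(\rho<0\). Working in normalized frequency \(\theta=\omega T_s\), the target \(G(-z,\omega)\) becomes \(\exp(j\rho\theta^2/(4\pi))\). The sketch below assumes \(\rho\approx-20.94\) (the 31.25 km example of Section 3.6 with \(\lambda\) = 1553 nm) and compares, on a dense grid, the mean squared error of the least squares taps with that of the truncated impulse response, the latter being given the benefit of a free constant phase. Because the least squares taps are optimal for a given tap set, their error cannot exceed that of the truncated design, and enlarging the symmetric tap set from 21 to 35 (the 66.7% increase considered in Section 3.6) can only reduce it further. All helper names are illustrative.

```python
import numpy as np

rho = -20.94        # approx. value for the 31.25 km example (lambda = 1553 nm assumed)
M = 8192            # frequency grid points over one period
theta = -np.pi + (np.arange(M) + 0.5) * 2 * np.pi / M     # normalized frequency w*Ts
Hd = np.exp(1j * rho * theta**2 / (4 * np.pi))           # G(-z, w) in terms of theta

def positions(N):
    """Symmetric tap positions n - (N-1)/2 of the basis used in Eq. 8.44."""
    return np.arange(N) - (N - 1) / 2

def ls_taps(N):
    """Eq. 8.43 on the grid: (1/2pi) * sum of exp(j*theta*u) * Hd * dtheta."""
    return np.exp(1j * np.outer(positions(N), theta)) @ Hd / M

def ti_taps(N):
    """Truncated impulse response taps, Eq. 8.39."""
    u = positions(N)
    return np.exp(-1j * np.pi * u**2 / rho) / np.sqrt(rho + 0j)

def mse(h, free_phase=False):
    """Mean squared error between H(theta) = h^T e(theta) and Hd over the grid."""
    H = np.exp(-1j * np.outer(theta, positions(h.size))) @ h
    if free_phase:   # best constant phase rotation, e.g. as removed by carrier recovery
        H = H * np.exp(1j * np.angle(np.vdot(H, Hd)))
    return np.mean(np.abs(H - Hd) ** 2)

N_odd = 21          # floor(|rho|) = 20, rounded up to an odd count
N_more = 35         # 21 taps increased by 66.7%
err_ti = mse(ti_taps(N_odd), free_phase=True)
err_ls = mse(ls_taps(N_odd))
err_ls_more = mse(ls_taps(N_more))
print(f"TI: {err_ti:.3e}  LS: {err_ls:.3e}  LS ({N_more} taps): {err_ls_more:.3e}")
```

The midpoint grid is chosen so that the sampled basis functions are exactly orthogonal, which makes the grid sum reproduce the least squares projection rather than merely approximate it.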

3.6. Example Performance of the Chromatic Dispersion Compensating Filter

We now consider the performance of the chromatic dispersion compensating filter. To assess this, we consider a 35 Gbaud signal with a near-rectangular Nyquist-shaped spectrum (root raised cosine shape with roll-off \(\beta\) = 0.01). While the results are given for a specific distance, they can readily be scaled to other distances via the length of the truncated impulse response, \(N=2\pi|\beta_2|z/T_s^2\).

As can be seen from Figure 8.2, while for PDM-QPSK operating over long-haul distances the signal-to-noise penalty incurred as a result of the design is negligible, for shorter distances or as the cardinality of the modulation format is increased the truncated impulse response has significant limitations. To overcome these limitations the least squares formulation is employed.

**Figure 8.2**. Performance of the truncated impulse response FIR filter design.

From Figure 8.3, we note that the least squares formulation results in a significantly reduced penalty, even though in both cases the same number of taps, \(N=2\pi|\beta_2|z/T_s^2\), is used. Nevertheless, for highly spectrally efficient formats such as 64-QAM, the penalty can be significant at short distances. The penalty may, however, be mitigated by allowing the number of taps in the least squares design to increase beyond \(N=2\pi|\beta_2|z/T_s^2\).

**Figure 8.3**. Performance of the least squares FIR filter design.

To illustrate this, we again consider a 35 Gbaud signal with a near-rectangular Nyquist-shaped spectrum (root raised cosine shape with \(\beta\) = 0.01), transmitting PDM-64QAM over 31.25 km of single-mode fiber with \(D\) = 17 ps/nm/km. This gives a minimum of 20 taps (or 21 if an odd number of taps is employed); as can be seen in Figure 8.4, increasing the number of taps by 66.7% reduces the penalty from 2.9 dB with the minimum number of taps to a negligible value (<0.1 dB).

**Figure 8.4**. Reduction in penalty achieved by allowing the number of taps to increase beyond the minimum for the least squares formulation.

Before closing this section, it is illustrative to consider the shape of the window function \(w[n]\). In Figure 8.5, we plot the window function \(w[n]\) for a range of distances normalized to the length of the truncated impulse response (such that the truncated impulse response only extends from −0.5 to 0.5).

**Figure 8.5**. Shape of the window function for a range of distances with 35 Gbaud signals, with \(|n/N_\text{TI}|\le0.5\) corresponding to the support of the truncated impulse response.

As can be seen in Figure 8.5, as the distance increases, the shape of the window function changes such that for long distances it converges to the simple rectangular window used in the truncated impulse response. This explains why the performance of the truncated impulse response improves for longer distances: it converges to the least squares solution.

4. Equalization of Polarization-Mode Dispersion

PMD arises from deviations from circular symmetry in the optical fiber, resulting in localized birefringence. While PMD is a unitary operation, its compensation is in general included within a subsystem that also mitigates PDL, relaxing the unitary requirement. By removing the unitary condition, polarization-independent effects such as nonideal matched filtering can also be mitigated.

4.1. Modeling of PMD

PMD results in information being coupled from one polarization to another such that the information in the x and y polarization at the output \(\pmb{E}_\text{out}(\omega)=[X_\text{out}(\omega),Y_\text{out}(\omega)]^T\) is related to the input states \(\pmb{E}_\text{in}(\omega)=[X_\text{in}(\omega),Y_\text{in}(\omega)]^T\) via

\[\tag{8.47}\begin{bmatrix}X_\text{out}(\omega)\\Y_\text{out}(\omega)\end{bmatrix}=U\begin{bmatrix}X_\text{in}(\omega)\\Y_\text{in}(\omega)\end{bmatrix}=e^{j\phi(\omega)}\begin{bmatrix}u_1(\omega)&u_2(\omega)\\-u_2^*(\omega)&u_1^*(\omega)\end{bmatrix}\begin{bmatrix}X_\text{in}(\omega)\\Y_\text{in}(\omega)\end{bmatrix}\]

with \(|u_1(\omega)|^2+|u_2(\omega)|^2=1\). The simplest manifestation of PMD is as a differential group delay (DGD) of \(\tau\) such that

\[\tag{8.48}\begin{bmatrix}u_1(\omega)&u_2(\omega)\\-u_2^*(\omega)&u_1^*(\omega)\end{bmatrix}=\begin{bmatrix}\cos(\theta)&\sin(\theta)\\-\sin(\theta)&\cos(\theta)\end{bmatrix}\begin{bmatrix}e^{\frac{j\omega\tau}{2}}&0\\0&e^{-\frac{j\omega\tau}{2}}\end{bmatrix}\begin{bmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{bmatrix}\]

with the DGD being obtainable from the Jones matrix via \(\tau=2\sqrt{|u_1'|^2+|u_2'|^2}\) where \('\) denotes differentiation with respect to angular frequency \(\omega\). From this, we note that the inverse Jones matrix in the time domain is given by

\[\tag{8.49}\begin{bmatrix}\cos(\theta)&\sin(\theta)\\-\sin(\theta)&\cos(\theta)\end{bmatrix}\begin{bmatrix}\delta\left(t-\frac{\tau}{2}\right)&0\\0&\delta\left(t+\frac{\tau}{2}\right)\end{bmatrix}\begin{bmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{bmatrix}\]

This reveals that if the equalizer is to correct for a DGD of \(\tau\) then the temporal span \((N−1)T_s\) of an \(N\) tap FIR filter must exceed \(\tau\) requiring \(N\ge\tau/T_s+1\). As we discuss in the subsequent sections, this is very much a lower bound on the number of taps and in practice when the convergence time of the equalizer is of concern more taps may be required.
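The DGD model of Equation 8.48 and the tap bound \(N\ge\tau/T_s+1\) can be checked numerically. The sketch below assumes illustrative values (\(\tau\) = 50 ps, \(\theta\) = 0.4 rad, \(T_s\) = 16 ps), verifies that the Jones matrix is unitary, recovers \(\tau\) from \(2\sqrt{|u_1'|^2+|u_2'|^2}\) by central differences, and evaluates the minimum number of taps. The helper `dgd_jones` is hypothetical.

```python
import numpy as np

def dgd_jones(w, tau, theta):
    """First-order PMD Jones matrix of Eq. 8.48 at angular frequency w (rad/ps), tau in ps."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    D = np.diag([np.exp(0.5j * w * tau), np.exp(-0.5j * w * tau)])
    return R @ D @ R.T

tau, theta, w0, dw = 50.0, 0.4, 0.1, 1e-6      # illustrative values
U = dgd_jones(w0, tau, theta)
dU = (dgd_jones(w0 + dw, tau, theta) - dgd_jones(w0 - dw, tau, theta)) / (2 * dw)
u1p, u2p = dU[0, 0], dU[0, 1]                  # u1' and u2'
tau_est = 2 * np.sqrt(abs(u1p) ** 2 + abs(u2p) ** 2)

Ts = 16.0                                      # ps (T/2 sampling at 31.25 Gbaud, assumed)
N_min = int(np.ceil(tau / Ts)) + 1             # smallest N with (N - 1) Ts >= tau
print(f"estimated DGD {tau_est:.6f} ps, minimum taps {N_min}")
```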

Since PMD is due to random coupling between the polarization modes, DGD has a statistical variation with a Maxwellian distribution with a probability density function given by

\[\tag{8.50}f(\tau)=\frac{32}{\pi^2\langle\tau\rangle}\left(\frac{\tau}{\langle\tau\rangle}\right)^2\exp\left(-\frac{4}{\pi}\left(\frac{\tau}{\langle\tau\rangle}\right)^2\right)\]

where \(\langle\tau\rangle\) is the mean DGD, and the resulting outage probability, given by the corresponding tail distribution \(F_c(\tau)\), is

\[\tag{8.51}F_c(\tau)=\int_{\tau}^{\infty}f(x)dx=1+\frac{4}{\pi}\left(\frac{\tau}{\langle\tau\rangle}\right)\exp\left(-\frac{4}{\pi}\left(\frac{\tau}{\langle\tau\rangle}\right)^2\right)-\text{erf}\left(\frac{2}{\sqrt{\pi}}\left(\frac{\tau}{\langle\tau\rangle}\right)\right)\]

By considering a minimax approximation of \(\log_{10}F_c(\tau)\) in the region \(1.6\langle\tau\rangle\le\tau\le6\langle\tau\rangle\), we obtain the following simpler expression for the outage probability \(F_c(\tau)\) as a function of DGD

\[\tag{8.52}\log_{10}F_c(\tau)=0.2591\left(\frac{\tau}{\langle\tau\rangle}\right)-0.5722\left(\frac{\tau}{\langle\tau\rangle}\right)^2\]

with the maximum relative error of \(\log_{10}F_c(\tau/\langle\tau\rangle)\) being less than 0.15%. By way of example, we consider data from a field trial of 100 Gbit/s, 31.25 Gbaud PDM-QPSK with \(\langle\tau\rangle\) = 36.6 ps and an adaptive equalizer with 13 taps. Assuming \(T/2\)-spaced taps (\(T_s\) = 16 ps), the bound \(N\ge\tau/T_s+1\) gives a maximum correctable DGD of \(12T_s\) = 192 ps, so that \(\tau/\langle\tau\rangle\) = 5.2 and hence the outage probability is a negligible 4 × 10\(^{-15}\). It was also noted that no penalty was observed for \(\langle\tau\rangle\) up to 55 ps; the corresponding outage probability is 8 × 10\(^{-7}\), so a penalty would not be expected.
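The outage figures quoted above can be reproduced from Equations 8.51 and 8.52. The sketch below writes \(1-\text{erf}(\cdot)\) as `math.erfc` to avoid cancellation at large arguments, and assumes \(T/2\)-spaced taps at 31.25 Gbaud (\(T_s\) = 16 ps), so that 13 taps cover a DGD of 192 ps. The function names are illustrative.

```python
import math

def outage_exact(r):
    """Maxwellian tail F_c for r = tau/<tau>, Eq. 8.51 written with erfc for accuracy."""
    return ((4 / math.pi) * r * math.exp(-(4 / math.pi) * r**2)
            + math.erfc(2 * r / math.sqrt(math.pi)))

def outage_fit(r):
    """Minimax fit of Eq. 8.52, valid for 1.6 <= r <= 6."""
    return 10 ** (0.2591 * r - 0.5722 * r**2)

Ts, N = 16.0, 13                 # ps (T/2 sampling, assumed) and number of taps
tau_max = (N - 1) * Ts           # 192 ps from N >= tau/Ts + 1
for mean_dgd in (36.6, 55.0):
    r = tau_max / mean_dgd
    print(f"<tau> = {mean_dgd} ps: r = {r:.2f}, "
          f"exact {outage_exact(r):.1e}, fit {outage_fit(r):.1e}")
```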

4.2. Obtaining the Inverse Jones Matrix of the Channel

The impact of polarization-dependent effects on the propagation may be modeled by a Jones matrix. In general, this matrix is not unitary due to PDL and, furthermore, it will be frequency dependent due to PMD. The task is, therefore, to estimate the Jones matrix and obtain the inverse to compensate for the impairments incurred.

In contrast to the chromatic dispersion that may be considered relatively constant, the Jones matrix may evolve in time due to effects such as rapid variations in the polarization state, and therefore the compensation scheme must be adaptive.

The problem of compensating polarization rotations digitally was first considered by Betti et al., and later demonstrated utilizing the formalism of multiple-input-multiple-output (MIMO) systems.

For inputs \(\pmb{x}=(x[k], x[k − 1], … , x[k − (N − 1)])^T\) and \(\pmb{y}=(y[k], y[k − 1],… , y[k − (N − 1)])^T\), the outputs \(x_o[k]\) and \(y_o[k]\) are given by

\[\tag{8.53}x_o[k]=\pmb{h}_{xx}^T\pmb{x}+\pmb{h}_{xy}^T\pmb{y}\qquad\text{and}\qquad{y_o}[k]=\pmb{h}_{yx}^T\pmb{x}+\pmb{h}_{yy}^T\pmb{y}\]

where \(\pmb{h}_{xx}\), \(\pmb{h}_{xy}\), \(\pmb{h}_{yx}\), and \(\pmb{h}_{yy}\) are adaptive filters, each of length \(N\) taps.

While there are a number of methods for adapting the equalizer in MIMO systems, we restrict ourselves to a specific example that exploits properties of the data, namely that for polarization-division multiplexed QPSK (PDM-QPSK) the signal for each polarization should have a constant modulus.

The CMA has also been shown to be effective even when the modulus is not constant, such as for higher-order quadrature-amplitude modulation.

4.3. Constant Modulus Update Algorithm

For signals of unit amplitude, the equalizer attempts to minimize, in a mean-square sense, the magnitudes of \(\epsilon_x=1-|x_o|^2\) and \(\epsilon_y=1-|y_o|^2\). Hence, to obtain the optimal tap weights, a set of stochastic-gradient algorithms with convergence parameter \(\mu\) is used

\[\tag{8.54}\begin{align}\pmb{h}_{xx}:=\pmb{h}_{xx}-\mu\frac{\partial\epsilon_x^2}{\partial\pmb{h}_{xx}^*}&=\pmb{h}_{xx}-2\mu\epsilon_x\frac{\partial\epsilon_x}{\partial\pmb{h}_{xx}^*}=\pmb{h}_{xx}+2\mu\epsilon_x\frac{\partial|x_o|^2}{\partial\pmb{h}_{xx}^*}\\&=\pmb{h}_{xx}+2\mu\epsilon_xx_o\frac{\partial}{\partial\pmb{h}_{xx}^*}x_o^*=\pmb{h}_{xx}+2\mu\epsilon_xx_o\pmb{x}^*\end{align}\]

Similarly,

\[\tag{8.55}\pmb{h}_{xy}:=\pmb{h}_{xy}-\mu\frac{\partial\epsilon_x^2}{\partial\pmb{h}_{xy}^*}=\pmb{h}_{xy}+2\mu\epsilon_xx_o\frac{\partial}{\partial\pmb{h}_{xy}^*}x_o^*=\pmb{h}_{xy}+2\mu\epsilon_xx_o\pmb{y}^*\]

Interchanging the variables \(x\) and \(y\), we then obtain

\[\tag{8.56}\pmb{h}_{yx}:=\pmb{h}_{yx}-\mu\frac{\partial\epsilon_y^2}{\partial\pmb{h}_{yx}^*}=\pmb{h}_{yx}+2\mu\epsilon_yy_o\frac{\partial}{\partial\pmb{h}_{yx}^*}y_o^*=\pmb{h}_{yx}+2\mu\epsilon_yy_o\pmb{x}^*\]

and finally we have

\[\tag{8.57}\pmb{h}_{yy}:=\pmb{h}_{yy}-\mu\frac{\partial\epsilon_y^2}{\partial\pmb{h}_{yy}^*}=\pmb{h}_{yy}+2\mu\epsilon_yy_o\frac{\partial}{\partial\pmb{h}_{yy}^*}y_o^*=\pmb{h}_{yy}+2\mu\epsilon_yy_o\pmb{y}^*\]

where \(\pmb{x}^*\) and \(\pmb{y}^*\) denote the complex conjugates of \(\pmb{x}\) and \(\pmb{y}\), respectively. To initialize the algorithm, all tap weights are set to zero with the exception of the central taps of \(\pmb{h}_{xx}\) and \(\pmb{h}_{yy}\), which are set to unity. Since the equalizer is unconstrained with respect to its outputs, it is possible for both outputs to converge to the same signal, corresponding to the equalizer's Jones matrix becoming singular; however, there are many well-established means of overcoming this limitation.
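The CMA updates of Eqs. 8.54 to 8.57 can be exercised on a toy channel. This sketch assumes one sample per symbol, noiseless unit-modulus QPSK and a memoryless polarization rotation of 0.5 rad; these assumptions, the step size \(\mu\) = 5 × 10\(^{-3}\) and the function names are illustrative choices, not values from the text. It applies the centre-spike initialization described above and records \(|\epsilon_x|+|\epsilon_y|\) per symbol.

```python
import cmath
import math
import random


def qpsk_symbols(n, rng):
    """Unit-modulus QPSK symbols exp(j(pi/4 + m*pi/2)), m uniform in {0, 1, 2, 3}."""
    return [cmath.exp(1j * (math.pi / 4 + math.pi / 2 * rng.randrange(4))) for _ in range(n)]


def rotate(x, y, theta):
    """Hypothetical memoryless channel: real polarization rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * a - s * b for a, b in zip(x, y)], [s * a + c * b for a, b in zip(x, y)]


def dot_t(h, v):
    """h^T v without conjugation, as in Eq. 8.53."""
    return sum(a * b for a, b in zip(h, v))


def cma_train(x, y, n_taps=5, mu=5e-3):
    """Symbol-by-symbol CMA (Eqs. 8.54-8.57) with centre-spike initialization.

    Returns the final taps (hxx, hxy, hyx, hyy) and |eps_x| + |eps_y| per symbol.
    """
    centre = n_taps // 2
    hxx, hxy, hyx, hyy = ([0j] * n_taps for _ in range(4))
    hxx[centre] = hyy[centre] = 1 + 0j
    errors = []
    for k in range(n_taps - 1, len(x)):
        xv = x[k - n_taps + 1:k + 1][::-1]  # (x[k], x[k-1], ..., x[k-(N-1)])
        yv = y[k - n_taps + 1:k + 1][::-1]
        xo = dot_t(hxx, xv) + dot_t(hxy, yv)
        yo = dot_t(hyx, xv) + dot_t(hyy, yv)
        eps_x, eps_y = 1 - abs(xo) ** 2, 1 - abs(yo) ** 2
        gx, gy = 2 * mu * eps_x * xo, 2 * mu * eps_y * yo
        for i in range(n_taps):
            hxx[i] += gx * xv[i].conjugate()
            hxy[i] += gx * yv[i].conjugate()
            hyx[i] += gy * xv[i].conjugate()
            hyy[i] += gy * yv[i].conjugate()
        errors.append(abs(eps_x) + abs(eps_y))
    return (hxx, hxy, hyx, hyy), errors


if __name__ == "__main__":
    rng = random.Random(7)
    tx_x, tx_y = qpsk_symbols(10000, rng), qpsk_symbols(10000, rng)
    rx_x, rx_y = rotate(tx_x, tx_y, theta=0.5)
    _, err = cma_train(rx_x, rx_y)
    print("mean error, first 200 symbols:", sum(err[:200]) / 200)
    print("mean error, last 1000 symbols:", sum(err[-1000:]) / 1000)
```

In this noiseless toy case the error should fall to essentially zero and the centre taps should approach a unitary inverse of the rotation, up to an arbitrary phase on each output, which the CMA cannot resolve.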

4.4. Decision-Directed Equalizer Update Algorithm

Once the equalizer has converged, it may move into a decision-directed mode: if \(D(x)\) is the symbol closest to \(x\), then the decision-directed least mean squares (DD-LMS) algorithm minimizes the mean-square magnitudes of \(\epsilon_x=D(x_o)-x_o\) and \(\epsilon_y=D(y_o)-y_o\), giving the update algorithm as

\[\tag{8.58}\pmb{h}_{xx}:=\pmb{h}_{xx}-\mu\frac{\partial|\epsilon_x|^2}{\partial\pmb{h}_{xx}^*}=\pmb{h}_{xx}-\mu\epsilon_x\frac{\partial\epsilon_x^*}{\partial\pmb{h}_{xx}^*}=\pmb{h}_{xx}+\mu\epsilon_x\frac{\partial{x_o^*}}{\partial\pmb{h}_{xx}^*}=\pmb{h}_{xx}+\mu\epsilon_x\pmb{x}^*\]

\[\tag{8.59}\pmb{h}_{xy}:=\pmb{h}_{xy}-\mu\frac{\partial|\epsilon_x|^2}{\partial\pmb{h}_{xy}^*}=\pmb{h}_{xy}+\mu\epsilon_x\frac{\partial{x}_o^*}{\partial\pmb{h}_{xy}^*}=\pmb{h}_{xy}+\mu\epsilon_x\pmb{y}^*\]

\[\tag{8.60}\pmb{h}_{yx}:=\pmb{h}_{yx}-\mu\frac{\partial|\epsilon_y|^2}{\partial\pmb{h}_{yx}^*}=\pmb{h}_{yx}+\mu\epsilon_y\frac{\partial{y}_o^*}{\partial\pmb{h}_{yx}^*}=\pmb{h}_{yx}+\mu\epsilon_y\pmb{x}^*\]

\[\tag{8.61}\pmb{h}_{yy}:=\pmb{h}_{yy}-\mu\frac{\partial|\epsilon_y|^2}{\partial\pmb{h}_{yy}^*}=\pmb{h}_{yy}+\mu\epsilon_y\frac{\partial{y}_o^*}{\partial\pmb{h}_{yy}^*}=\pmb{h}_{yy}+\mu\epsilon_y\pmb{y}^*\]
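A single DD-LMS step for the x output (Eqs. 8.58 and 8.59) is sketched below for unit-energy QPSK. It assumes the carrier phase has already been removed, which is precisely the coupling to carrier recovery noted in the text; `qpsk_decision` and `dd_lms_step_x` are hypothetical helper names. If the decision does not change, the new error is the old one scaled by \(1-\mu(\|\pmb{x}\|^2+\|\pmb{y}\|^2)\), which the check below relies on.

```python
import math


def qpsk_decision(z: complex) -> complex:
    """Nearest unit-energy QPSK point (+-1 +- 1j) / sqrt(2)."""
    a = 1 / math.sqrt(2)
    return complex(math.copysign(a, z.real), math.copysign(a, z.imag))


def dd_lms_step_x(hxx, hxy, xv, yv, mu):
    """One DD-LMS update of h_xx and h_xy (Eqs. 8.58 and 8.59).

    xv, yv are the tap-input vectors (x[k], ..., x[k-(N-1)]) and likewise for y.
    Returns the updated taps and the error eps_x = D(x_o) - x_o that was used.
    """
    xo = sum(h * v for h, v in zip(hxx, xv)) + sum(h * v for h, v in zip(hxy, yv))
    eps_x = qpsk_decision(xo) - xo
    new_hxx = [h + mu * eps_x * v.conjugate() for h, v in zip(hxx, xv)]
    new_hxy = [h + mu * eps_x * v.conjugate() for h, v in zip(hxy, yv)]
    return new_hxx, new_hxy, eps_x
```

The y-output filters follow from Eqs. 8.60 and 8.61 in the same way, using \(\epsilon_y\).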

One of the challenges of the decision-directed equalizer is the need to combine the carrier recovery with the equalization since the decisions are made on the phase-corrected signal.

The resulting feedback path can, therefore, be more challenging for CMOS ASIC implementation than the blind equalizer that partitions the equalization from the carrier recovery.

4.5. Radially Directed Equalizer Update Algorithm

While the CMA is well suited to constant-modulus formats such as QPSK, many of the formats considered for future optical networks, such as PDM-16QAM, do not have a constant modulus, so the CMA error will never converge to zero.

Nevertheless, the CMA can be adapted to form a radially directed equalizer. In this case, \(\epsilon_x=Q_r(|x_o|^2)-|x_o|^2\) and \(\epsilon_y=Q_r(|y_o|^2)-|y_o|^2\), where \(Q_r(r^2)\) is a function that quantizes the radius to one of the possible radii of the constellation (for example, the three rings of 16QAM).

Once the notional radius is determined, the CMA update is used to adjust the tap weights accordingly. A key benefit of the radially directed equalizer is that it is invariant to the phase of the incoming signal and hence allows the equalization and the carrier recovery to be partitioned; however, for denser modulation formats it is often preferable to use the CMA initially and then switch to a decision-directed equalizer.
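The radially directed update can be sketched for square 16QAM normalized to unit average energy, whose squared radii are 0.2, 1.0 and 1.8. Here \(Q_r\) is assumed to pick the nearest ring in squared radius; that choice, and the helper names, are illustrative assumptions. Because only \(|x_o|^2\) enters the error, applying a common phase rotation to the inputs leaves the tap update unchanged, which is the phase invariance mentioned above.

```python
import cmath
import itertools


def ring_levels_16qam():
    """Squared radii of unit-average-energy square 16QAM: {0.2, 1.0, 1.8}."""
    levels = (-3, -1, 1, 3)
    energies = [i * i + q * q for i, q in itertools.product(levels, levels)]
    mean_energy = sum(energies) / len(energies)  # 10 for the +-1, +-3 grid
    return sorted({e / mean_energy for e in energies})


def quantize_radius(r2, rings):
    """Hypothetical Q_r: map a squared radius to the nearest ring."""
    return min(rings, key=lambda ring: abs(ring - r2))


def rde_step_x(hxx, hxy, xv, yv, mu, rings):
    """CMA-form update of h_xx, h_xy with eps_x = Q_r(|x_o|^2) - |x_o|^2."""
    xo = sum(h * v for h, v in zip(hxx, xv)) + sum(h * v for h, v in zip(hxy, yv))
    r2 = abs(xo) ** 2
    eps_x = quantize_radius(r2, rings) - r2
    g = 2 * mu * eps_x * xo
    new_hxx = [h + g * v.conjugate() for h, v in zip(hxx, xv)]
    new_hxy = [h + g * v.conjugate() for h, v in zip(hxy, yv)]
    return new_hxx, new_hxy, eps_x


if __name__ == "__main__":
    rings = ring_levels_16qam()
    xv, yv = [0.1j, 0.9 + 0.4j, -0.2 + 0j], [0.05 + 0j, 0.1 - 0.2j, 0.03j]
    taps = ([0j, 1 + 0j, 0j], [0j, 0j, 0j])
    rot = cmath.exp(0.7j)  # arbitrary carrier phase offset
    print(rde_step_x(*taps, xv, yv, 0.01, rings)[0])
    print(rde_step_x(*taps, [v * rot for v in xv], [v * rot for v in yv], 0.01, rings)[0])
```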

4.6. Parallel Realization of the FIR Filter

One of the key benefits of the FIR filter is that it can readily be implemented in CMOS. Thus far, all of the implementations discussed have operated on a symbol-by-symbol basis rather than on a parallel, lower-speed CMOS bus. For a parallel implementation with \(N_b\) outputs per clock cycle, the DSP is similar to the serial DSP given by Equation 8.53, but now we write

\[\tag{8.62}\pmb{x}_o[k]=\pmb{h}_{xx}^T\pmb{X}+\pmb{h}_{xy}^T\pmb{Y}\qquad\text{and}\qquad{\pmb{y}_o}[k]=\pmb{h}_{yx}^T\pmb{X}+\pmb{h}_{yy}^T\pmb{Y}\]

where \(\pmb{x}_o[k]=[x_o[kN_b+1],x_o[kN_b+2],...,x_o[kN_b+N_b]]\) and \(\pmb{X}=[\pmb{x}_1,\pmb{x}_2,...,\pmb{x}_{N_b}]\), each column \(\pmb{x}_j\) being the tap-input vector for the output \(x_o[kN_b+j]\), and so on. In this case, the CMA error term becomes the vector \(\epsilon_x=1-\pmb{x}_o\circ\pmb{x}_o^*\), and similarly \(\epsilon_y=1-\pmb{y}_o\circ\pmb{y}_o^*\), where \(\circ\) denotes the Hadamard (element-by-element) product. The resulting CMA update becomes

\[\tag{8.63}\pmb{h}_{xx}:=\pmb{h}_{xx}+2\mu\pmb{X}^*(\epsilon_x\circ\pmb{x}_o)^T\]

\[\tag{8.64}\pmb{h}_{xy}:=\pmb{h}_{xy}+2\mu\pmb{Y}^*(\epsilon_x\circ\pmb{x}_o)^T\]

\[\tag{8.65}\pmb{h}_{yx}:=\pmb{h}_{yx}+2\mu\pmb{X}^*(\epsilon_y\circ\pmb{y}_o)^T\]

\[\tag{8.66}\pmb{h}_{yy}:=\pmb{h}_{yy}+2\mu\pmb{Y}^*(\epsilon_y\circ\pmb{y}_o)^T\]
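The block update can be checked against the serial one. The sketch below (hypothetical names; the columns of \(\pmb{X}\) and \(\pmb{Y}\) are stored as lists of tap-input vectors) computes all \(N_b\) outputs with frozen taps, forms the Hadamard products \(\epsilon\circ\pmb{x}_o\), and accumulates \(\sum_j\epsilon_j x_{o,j}\pmb{x}_j^*\) for each filter, which is the per-column sum that Eqs. 8.63 to 8.66 express in matrix form. With the taps frozen over the block, the result equals the sum of \(N_b\) individual CMA gradient steps.

```python
import random


def dot_t(h, v):
    """h^T v without conjugation."""
    return sum(a * b for a, b in zip(h, v))


def block_inputs(stream, start, n_taps, n_b):
    """Columns x_j = (s[start+j], ..., s[start+j-(N-1)]) for j = 0..N_b-1 (start >= N-1)."""
    return [[stream[start + j - i] for i in range(n_taps)] for j in range(n_b)]


def block_cma_update(taps, X, Y, mu):
    """One block CMA update: every output in the block uses the same frozen taps."""
    hxx, hxy, hyx, hyy = taps
    xo = [dot_t(hxx, xj) + dot_t(hxy, yj) for xj, yj in zip(X, Y)]
    yo = [dot_t(hyx, xj) + dot_t(hyy, yj) for xj, yj in zip(X, Y)]
    # Hadamard products eps o x_o and eps o y_o, element by element
    gx = [(1 - abs(v) ** 2) * v for v in xo]
    gy = [(1 - abs(v) ** 2) * v for v in yo]

    def accumulate(h, g, cols):
        # h + 2*mu * sum_j g_j * conj(col_j), i.e. the product conj(M) g
        return [h[i] + 2 * mu * sum(gj * col[i].conjugate() for gj, col in zip(g, cols))
                for i in range(len(h))]

    return (accumulate(hxx, gx, X), accumulate(hxy, gx, Y),
            accumulate(hyx, gy, X), accumulate(hyy, gy, Y))


if __name__ == "__main__":
    rng = random.Random(3)
    sx = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    sy = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    taps = ([0j, 1 + 0j, 0j], [0j] * 3, [0j] * 3, [0j, 1 + 0j, 0j])
    X, Y = block_inputs(sx, 2, 3, 256), block_inputs(sy, 2, 3, 256)
    print(block_cma_update(taps, X, Y, mu=2 ** -8)[0])
```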

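Once the CMA has converged, a decision-directed equalizer is employed, with \(\epsilon_x=D(\pmb{x}_o)-\pmb{x}_o\) and \(\epsilon_y=D(\pmb{y}_o)-\pmb{y}_o\), the decisions being taken element by element, giving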

\[\tag{8.67}\pmb{h}_{xx}:=\pmb{h}_{xx}+\mu\pmb{X}^*\epsilon_x^T\]

\[\tag{8.68}\pmb{h}_{xy}:=\pmb{h}_{xy}+\mu\pmb{Y}^*\epsilon_x^T\]

\[\tag{8.69}\pmb{h}_{yx}:=\pmb{h}_{yx}+\mu\pmb{X}^*\epsilon_y^T\]

\[\tag{8.70}\pmb{h}_{yy}:=\pmb{h}_{yy}+\mu\pmb{Y}^*\epsilon_y^T\]

4.7. Generalized 4×4 Equalizer for Mitigation of Frequency or Polarization-Dependent Loss and Receiver Skew

Provided the equalizer is not constrained to be unitary, all of the equalizers discussed in this section can also be used to mitigate the impact of frequency-dependent loss or PDL.

The equalizer can, however, be generalized further to the 4 × 4 equalizer by relaxing the assumption in the 2 × 2 complex equalizer that the real and imaginary signals are sampled synchronously and are orthonormal. In this case, the equalizer structure becomes

\[\tag{8.71}x_{o_r}[k]=\pmb{h}_{x_rx_r}^T\pmb{x}_{\pmb{r}}+\pmb{h}_{x_rx_i}^T\pmb{x}_{\pmb{i}}+\pmb{h}_{x_ry_r}^T\pmb{y}_{\pmb{r}}+\pmb{h}_{x_ry_i}^T\pmb{y}_{\pmb{i}}\]

\[\tag{8.72}x_{o_i}[k]=\pmb{h}_{x_ix_r}^T\pmb{x}_{\pmb{r}}+\pmb{h}_{x_ix_i}^T\pmb{x}_{\pmb{i}}+\pmb{h}_{x_iy_r}^T\pmb{y}_{\pmb{r}}+\pmb{h}_{x_iy_i}^T\pmb{y}_{\pmb{i}}\]

\[\tag{8.73}y_{o_r}[k]=\pmb{h}_{y_rx_r}^T\pmb{x}_{\pmb{r}}+\pmb{h}_{y_rx_i}^T\pmb{x}_{\pmb{i}}+\pmb{h}_{y_ry_r}^T\pmb{y}_{\pmb{r}}+\pmb{h}_{y_ry_i}^T\pmb{y}_{\pmb{i}}\]

\[\tag{8.74}y_{o_i}[k]=\pmb{h}_{y_ix_r}^T\pmb{x}_{\pmb{r}}+\pmb{h}_{y_ix_i}^T\pmb{x}_{\pmb{i}}+\pmb{h}_{y_iy_r}^T\pmb{y}_{\pmb{r}}+\pmb{h}_{y_iy_i}^T\pmb{y}_{\pmb{i}}\]

where \(x_{o_r}=\boldsymbol{\mathscr{R}}\{x_o\}\), \(x_{o_i}=\boldsymbol{\mathscr{I}}\{x_o\}\), \(\pmb{x}_{\pmb{r}}=\boldsymbol{\mathscr{R}}\{\pmb{x}\}\), \(\pmb{x}_{\pmb{i}}=\boldsymbol{\mathscr{I}}\{\pmb{x}\}\) and so on.

For the CMA, the update algorithm is

\[\tag{8.75}\begin{align}\pmb{h}_{x_rx_r}&:=\pmb{h}_{x_rx_r}-\mu\frac{\partial\epsilon_x^2}{\partial\pmb{h}_{x_rx_r}}=\pmb{h}_{x_rx_r}-2\mu\epsilon_x\frac{\partial\epsilon_x}{\partial\pmb{h}_{x_rx_r}}=\pmb{h}_{x_rx_r}+2\mu\epsilon_x\frac{\partial|x_o|^2}{\partial\pmb{h}_{x_rx_r}}\\&=\pmb{h}_{x_rx_r}+2\mu\epsilon_x\frac{\partial(x_{o_r}^2+x_{o_i}^2)}{\partial\pmb{h}_{x_rx_r}}=\pmb{h}_{x_rx_r}+4\mu\epsilon_xx_{o_r}\frac{\partial{x_{o_r}}}{\partial\pmb{h}_{x_rx_r}}\\&=\pmb{h}_{x_rx_r}+4\mu\epsilon_xx_{o_r}\pmb{x}_{\pmb{r}}\end{align}\]

with the updates for the other filters being obtained in a similar manner, such that if \(\pmb{k}\) denotes any one of the input vectors \(\pmb{x}_{\pmb{r}}\), \(\pmb{x}_{\pmb{i}}\), \(\pmb{y}_{\pmb{r}}\), or \(\pmb{y}_{\pmb{i}}\), the 16 updates may be written compactly as

\[\tag{8.76}\pmb{h}_{\pmb{x}_{\pmb{r}}\pmb{k}}:=\pmb{h}_{\pmb{x}_{\pmb{r}}\pmb{k}}+4\mu\epsilon_xx_{o_r}\pmb{k}\]

\[\tag{8.77}\pmb{h}_{\pmb{x}_{\pmb{i}}\pmb{k}}:=\pmb{h}_{\pmb{x}_{\pmb{i}}\pmb{k}}+4\mu\epsilon_xx_{o_i}\pmb{k}\]

\[\tag{8.78}\pmb{h}_{\pmb{y}_{\pmb{r}}\pmb{k}}:=\pmb{h}_{\pmb{y}_{\pmb{r}}\pmb{k}}+4\mu\epsilon_yy_{o_r}\pmb{k}\]

\[\tag{8.79}\pmb{h}_{\pmb{y}_{\pmb{i}}\pmb{k}}:=\pmb{h}_{\pmb{y}_{\pmb{i}}\pmb{k}}+4\mu\epsilon_yy_{o_i}\pmb{k}\]
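A minimal sketch of the real-valued 4 × 4 equalizer and its CMA update (Eqs. 8.71 to 8.79) follows; the dictionary layout keyed by (output, input) component and the function names are assumptions for illustration. A useful sanity check, used in the test below, is that when the 16 real filters are set to mirror a complex 2 × 2 equalizer, the outputs coincide with Eq. 8.53 and the average of the \(x_rx_r\) and \(x_ix_i\) updates equals the real part of the complex CMA update of Eq. 8.54. The 4 × 4 structure is then free to depart from that symmetry to absorb effects such as receiver skew.

```python
import random

COMPONENTS = ("xr", "xi", "yr", "yi")


def dot_t(h, v):
    return sum(a * b for a, b in zip(h, v))


def init_taps(n_taps):
    """Centre-spike initialization of the 16 real filters h[(out, in)]."""
    centre = n_taps // 2
    taps = {(o, i): [0.0] * n_taps for o in COMPONENTS for i in COMPONENTS}
    for c in COMPONENTS:
        taps[(c, c)][centre] = 1.0
    return taps


def outputs_4x4(taps, inputs):
    """Eqs. 8.71-8.74: each real output sums four real FIR filters."""
    return {o: sum(dot_t(taps[(o, i)], inputs[i]) for i in COMPONENTS) for o in COMPONENTS}


def cma_update_4x4(taps, inputs, mu):
    """Eqs. 8.76-8.79: h[(o, k)] += 4 mu eps_pol * o_out * k for every input k."""
    out = outputs_4x4(taps, inputs)
    eps = {"x": 1.0 - (out["xr"] ** 2 + out["xi"] ** 2),
           "y": 1.0 - (out["yr"] ** 2 + out["yi"] ** 2)}
    new_taps = {}
    for o in COMPONENTS:
        g = 4.0 * mu * eps[o[0]] * out[o]
        for i in COMPONENTS:
            new_taps[(o, i)] = [h + g * v for h, v in zip(taps[(o, i)], inputs[i])]
    return new_taps, out, eps


if __name__ == "__main__":
    rng = random.Random(1)
    inputs = {c: [rng.gauss(0, 0.7) for _ in range(5)] for c in COMPONENTS}
    _, out, eps = cma_update_4x4(init_taps(5), inputs, mu=0.01)
    print(out, eps)
```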

4.8. Example Application to Fast Blind Equalization of PMD

Consider again a Nyquist-shaped 35 Gbaud PDM-QPSK signal using an FEC with a BER limit of 2 × 10\(^{-2}\) (dBQ = 6.25 dB). To allow for blind acquisition, the signal is recovered using a CMA targeting a mean DGD of 10 ps (corresponding to the typical link budget for a legacy 10 Gbit/s system), which requires just five taps for the outage probability to be negligible.

We assume a highly parallel implementation with 256 degrees of parallelism, such that the CMOS DSP operates at 273 MHz while the ADC samples at 70 GSa/s for 35 Gbaud PDM-QPSK.

We include 1 dB of margin (i.e., the signal has a signal-to-noise ratio (SNR) of approximately 7.2 dB, 1 dB higher than that required for a BER of 2 × 10\(^{-2}\)) so as to reduce the convergence time, and we consider the performance for DGD values of 5, 10, 20, and 40 ps.

One particular challenge of improving the FEC to increase the tolerance to noise is that the equalizer must then operate at a significantly reduced SNR. For the results in Figure 8.6, \(\mu\) = 2\(^{-8}\) ≈ 0.004, and to ensure that the equalizer converges with an SNR of 7.2 dB, the stochastic gradient is averaged over two successive CMOS clock cycles, corresponding to a total of 512 samples.
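The timing figures in this example follow from simple arithmetic. The sketch below assumes two samples per symbol (70 GSa/s at 35 Gbaud) and that each of the 256 parallel lanes carries one sample per clock cycle; the constant names are hypothetical. It shows that averaging over two clock cycles corresponds to an update roughly every 7.3 ns (the 7 ns block period of Figure 8.6), that each block carries 1024 bits (so 1000 realizations give about one million bits), and that 400 ns corresponds to roughly 55 tap updates.

```python
ADC_RATE = 70e9        # samples/s
SYMBOL_RATE = 35e9     # baud
PARALLELISM = 256      # samples per CMOS clock cycle (assumed one per lane)
CYCLES_AVERAGED = 2    # stochastic gradient averaged over two clock cycles
BITS_PER_SYMBOL = 4    # PDM-QPSK: 2 bits x 2 polarizations

samples_per_symbol = ADC_RATE / SYMBOL_RATE                                   # 2.0
clock_hz = ADC_RATE / PARALLELISM                                             # 273.4375 MHz
samples_per_update = CYCLES_AVERAGED * PARALLELISM                            # 512
update_period_s = samples_per_update / ADC_RATE                               # about 7.3 ns
bits_per_update = samples_per_update / samples_per_symbol * BITS_PER_SYMBOL   # 1024
updates_in_400ns = 400e-9 / update_period_s                                   # about 55

print(f"CMOS clock {clock_hz / 1e6:.1f} MHz, update every {update_period_s * 1e9:.2f} ns")
print(f"{bits_per_update:.0f} bits per update block, {updates_in_400ns:.0f} updates in 400 ns")
```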

Nevertheless, even at this low SNR, a convergence time of less than 400 ns is achievable for DGD < 40 ps, 40 ps being 70% of the maximum DGD that the five-tap equalizer could compensate.
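The 70% figure, and the choice of five taps for a 10 ps mean DGD, can be reproduced under two assumptions: T/2 tap spacing (two samples per symbol), so that \(N\) taps span \((N-1)T/2\), and a Maxwellian DGD distribution. Both describe how the span is counted rather than anything stated explicitly in the text, and the helper names are hypothetical.

```python
import math

SYMBOL_RATE = 35e9
SAMPLES_PER_SYMBOL = 2   # assumed: T/2-spaced taps (70 GSa/s at 35 Gbaud)
N_TAPS = 5

tap_spacing_ps = 1e12 / SYMBOL_RATE / SAMPLES_PER_SYMBOL   # about 14.29 ps
max_dgd_ps = (N_TAPS - 1) * tap_spacing_ps                  # about 57.1 ps
fraction_at_40ps = 40.0 / max_dgd_ps                        # 0.70


def maxwellian_outage(tau_max_ps, mean_dgd_ps):
    """P(DGD > tau_max) for a Maxwellian-distributed DGD with the given mean."""
    u = (tau_max_ps / mean_dgd_ps) * math.sqrt(8.0 / math.pi)
    return math.erfc(u / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * u * math.exp(-0.5 * u * u)


print(f"tap span {max_dgd_ps:.1f} ps, 40 ps is {fraction_at_40ps:.0%} of it")
print(f"outage at <tau> = 10 ps: {maxwellian_outage(max_dgd_ps, 10.0):.1e}")
```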

As the DGD increases toward the maximum theoretical value, the convergence time increases significantly, highlighting the need for more research into the convergence of equalization in the presence of distortions at low SNR.

**Figure 8.6**. Convergence of a five-tap equalizer with differing levels of DGD. The BER is time-resolved to the block period (7 ns), with the block BER averaged over 1000 different realizations, corresponding to one million bits, at an SNR of 7.2 dB (1 dB greater than that required for BER = 2 × 10\(^{-2}\)).

5. Concluding Remarks and Future Research Directions

In this tutorial, we have outlined the key algorithms for equalization, both for the design of fixed filters and for the adaptive updating of filters. The key subsystems for digital equalization of chromatic dispersion and PMD have been extensively studied for PDM-QPSK with an SNR in the region of 10 dB; however, as we have discussed in this tutorial, transceiver technology is rapidly moving beyond this toward denser constellations with reduced SNR.

While the truncated impulse response FIR filter design for chromatic dispersion equalization was adequate for PDM-QPSK, as systems moved toward PDM-64QAM this had to be revisited, requiring alternative techniques such as the least squares design to be employed.

Likewise, for adaptive equalization, while the CMA has been extensively used in PDM-QPSK systems, allowing the equalization to be partitioned from the carrier recovery, an equivalent to the CMA that is robust for high-order QAM has yet to be determined.

The situation is further complicated by the move to soft-decision FEC and the associated reduced SNR. Therefore, the first area of future research to highlight is that of equalization for higher-level modulation formats with low SNR, with a key consideration being that the algorithms should be optimized for highly parallel DSP to permit realization in CMOS.

A second area of future research arises with the development of dynamic elastic optical networking, in which transceivers can not only vary their rate but also be dynamically dropped and added, for example, to cope with the so-called “elephant flows” between data centers.

This second area calls for both rate-adaptive DSP and fast acquisition algorithms, so as to minimize the time taken to establish a wavelength-on-demand service.

A third area of research is reduced-complexity equalization. Not only does this reduce the power consumption of the DSP, but it also permits the use of DSP in more cost-sensitive applications such as access networks.

The chromatic dispersion compensating filter is typically implemented in the frequency domain using techniques such as the overlap-save method, and similar frequency-domain approaches can be used for the adaptive equalizers.

Ultimately, as optical fiber communication systems continue to evolve, equalization will remain a fertile area of research, particularly as the capabilities of transceivers increase both in maximum data rate and in flexibility to adapt to network demands.

The next tutorial introduces X-Ray optics.

