# Advances in Detection and Error Correction for Coherent Optical Communications - Regular, Irregular, and Spatially Coupled LDPC code Designs

This is a continuation of the previous tutorial, **the Rabi frequency**.

## 1. Introduction

Forward error correction (FEC) in optical communications was first demonstrated in 1988. Since then, coding technology has evolved significantly. This pertains not only to the codes themselves but also to encoder and decoder architectures.

Modern high-speed optical communication systems require high-performance FEC engines that support throughputs of 100 Gbit/s or multiples thereof, that have low power consumption, that realize net coding gains (NCGs) close to the theoretical limits at a target bit error rate (BER) of less than \(10^{-15}\), and that are preferably adapted to the peculiarities of the optical channel.

FEC coding is based on deterministically adding redundant bits to a source information bit sequence. After transmission over a noisy channel, a decoding system tries to exploit the redundant information for fully recovering the source information.

Several methods for generating the redundant bit sequence from the source information bits are known. Transmission systems with 100 Gbit/s and 400 Gbit/s today typically use one of two coding schemes to generate the redundant information: block-turbo codes (BTCs) or low-density parity-check (LDPC) codes.

In coherent systems, so-called soft information is usually readily available and can be used in high-performance systems within a soft-decision decoder architecture. Soft-decision information means that no binary 0/1 decision is made before entering the FEC decoder. Instead, the (quantized) samples are used together with their statistics to get improved estimates of the original bit sequence.

This tutorial focuses on soft-decision decoding of LDPC codes and the evolving spatially coupled (SC) LDPC codes.

In coherent optical communications, the signal received after carrier recovery may be affected by distortions that are different from those that commonly occur in wireless communications.

For instance, the signal at the input of the signal space demapper may be affected by phase slips (also called cycle slips), with a probability depending on the nonlinear phase noise introduced by the optical transmission link.

The phase slips are not an effect of the physical waveform channel but, rather, an artifact of coarse blind phase recovery algorithms with massive parallelization at the initial digital signal processing (DSP) receiver steps.

If such a phase slip is ignored, error propagation will occur at the receiver and all data following the phase slip cannot be properly recovered. Several approaches to mitigate phase slips have been proposed.

Of these, the most common is differential coding, rendering a phase slip into a single error event. In order to alleviate the penalty caused by differential coding, iterative decoding between an FEC decoder and a differential decoder can be beneficial.

This solution leads, however, to an increased receiver complexity, as several executions of a soft-input soft-output differential decoder (usually based on the BCJR algorithm) have to be carried out.

In this tutorial, we first show how the use of differential coding and the presence of phase slips in the transmission channel affect the total achievable information rates and capacity of a system.

By means of the commonly used quadrature phase-shift keying (QPSK) modulation, we show that the use of differential coding does not decrease the capacity, that is, the total amount of reliably conveyable information over the channel remains the same. It is a common misconception that the use of differential coding introduces an unavoidable “differential loss.” This perceived differential loss is rather a consequence of simplified differential detection and decoding at the receiver.

Afterwards, we show how capacity-approaching coding schemes based on LDPC and spatially coupled LDPC codes can be constructed by combining iterative demodulation and decoding.

For this, we first show how to modify the differential decoder to account for phase slips and then how to use this modified differential decoder to construct good LDPC codes. This construction method can serve as a blueprint to construct good and practical LDPC codes for other applications with iterative detection, such as higher-order modulation formats with nonsquare constellations, multidimensional optimized modulation formats, turbo equalization to mitigate intersymbol interference (ISI) (e.g., due to nonlinearities), and many more.

Finally, we introduce the class of spatially coupled (SC)-LDPC codes, which are a specialization of LDPC codes with some outstanding properties and which can be decoded with a very simple windowed decoder. We show that the universal behavior of spatially coupled codes makes them an ideal candidate for iterative differential demodulation/detection and decoding.

This tutorial is structured as follows:

In Section 2, we formally introduce the notation, the system model, and differential coding. We highlight some pitfalls that one may encounter when phase slips occur on the equivalent channel. We then propose a modified differential decoder that is necessary to construct a capacity-approaching system with differential coding.

In Section 3, we introduce LDPC codes and iterative detection. We highlight several possibilities of realizing the interface between the LDPC decoder and the detector, and give design guidelines for finding good degree distributions of the LDPC code. We show that with iterative detection and LDPC codes, the differential loss can be recovered to a great extent.

Finally, in Section 4, we introduce SC-LDPC codes and show how a very simple construction can be used to realize codes that outperform LDPC codes while having similar decoding complexity.

## 2. Differential Coding for Optical Communications

In this section, we describe and study the effect of differential coding on coherent optical communication systems and especially on the maximum conveyable information rate (the so-called capacity).

We assume a simple, yet accurate channel model based on additive white Gaussian noise (AWGN) and random phase slips. We start by giving a rigorous description of higher-order modulation schemes frequently used in coherent communications and then introduce in Section 2.2 the channel model taking into account phase slips that are due to imperfect phase estimation in the coherent receiver.

We then introduce differential coding and show how the differential decoder has to be modified in order to properly take into account phase slips. We show that differential coding as such does not limit the capacity of a communication system, provided that an adequate receiver is used.

### 2.1. Higher-Order Modulation Formats

In this section, the interplay of coding and modulation is discussed in detail. We take only an \(\text{IQ}\)-perspective of digital modulation, representing digital modulation symbols as complex numbers.

The sequence of complex numbers (where \(\text{I}\) denotes the real part and \(\text{Q}\) the imaginary part) is then used to generate the actual waveform (taking into account pulse shaping and possibly electronic predistortion), that is, to drive the optical modulators generating the \(\text{I}\) and \(\text{Q}\) components.

When talking about digital modulation, especially in the context of coded modulation, we are mostly interested in the **mapping function**, which is that part of the modulator that assigns (complex) modulation symbols to bit patterns.

We introduce in what follows the notation necessary for describing the mapping function. Let \(q\) denote the number of bits that are assigned to one complex modulation symbol \(y\in\mathbb{C}\), and let \(\pmb{b}=(b_1, b_2,…, b_q)\in\mathbb{F}_2^q\) be a binary \(q\)-tuple with \(\mathbb{F}_2=\{0,1\}\) denoting the field of binary numbers.

The one-to-one modulation mapping function \(y=\phi(\pmb{b})\) maps the \(q\)-tuple \(\pmb{b}\) to the (complex) modulation symbol \(y\in\mathbb{C}\), where \(y\) is chosen from the set of \(Q=2^q\) modulation symbols \(\boldsymbol{\mathcal{M}}=\{M_0,M_1,…,M_{Q−1}\}\). The set \(\boldsymbol{\mathcal{M}}\) is also commonly referred to as constellation. The mapping function is illustrated in Figure 3.1.

In this tutorial, we only consider one-to-one mappings. One such mapping is \(\phi_\text{Nat}(\pmb{b})=\phi_\text{Nat}(b_1, b_2,…, b_q)=M_{(b_1b_2…b_q)_{10}}\), where \((b_1b_2…b_q)_{10}\) denotes the decimal expansion of the binary \(q\)-digit number \(b_1b_2…b_q\).
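As a small illustration, the natural mapping \(\phi_\text{Nat}\) can be sketched in code. The unit-energy QPSK constellation used here is an assumed example standing in for Figure 3.1, not the tutorial's exact constellation:

```python
import cmath

def phi_nat(bits, constellation):
    """Natural mapping: the bit tuple, read as a binary number, indexes
    the constellation point M_i (decimal expansion (b1 b2 ... bq)_10)."""
    index = int("".join(str(b) for b in bits), 2)
    return constellation[index]

# Assumed example: QPSK (q = 2, Q = 4) with unit-energy points on the
# diagonals of the complex plane.
QPSK = [cmath.exp(1j * (cmath.pi / 4 + k * cmath.pi / 2)) for k in range(4)]

symbol = phi_nat((1, 0), QPSK)  # index (10)_2 = 2, i.e., M_2
```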

In the context of differential coding of higher-order modulation formats, it is advantageous if the constellation \(\boldsymbol{\mathcal{M}}\) fulfills certain properties. One such property is the rotational invariance of the constellation.

**Definition 1 (Rotational Invariance of Constellation)**

We say that a constellation \(\boldsymbol{\mathcal{M}}=\{M_0,M_1,…,M_{Q−1}\}\) exhibits a V-fold rotational invariance if we recover the original constellation \(\boldsymbol{\mathcal{M}}\) after rotating each modulation symbol \(M_i\) by an amount \(\frac{2\pi}{V}k\), \(\forall{k}\in\{1,…, V\}\) in the complex plane. Formally, we say that a constellation exhibits a V-fold rotational invariance if (with \(\imath=\sqrt{-1}\))

\[\{M_i\cdot{e}^{\imath\frac{2\pi}{V}k}:M_i\in\boldsymbol{\mathcal{M}}\}=\boldsymbol{\mathcal{M}}\qquad\text{for all}\quad{k}\in\{1,...,V\}\]
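Definition 1 can be checked numerically. The sketch below searches for the largest \(V\) for which every rotation by \(\frac{2\pi}{V}k\) reproduces the constellation; the 16QAM grid is an assumed test case:

```python
import cmath

def rotational_invariance_order(constellation, tol=1e-9):
    """Return the largest V such that rotating every point by 2*pi/V*k,
    k = 1..V, reproduces the constellation (Definition 1)."""
    def matches(rotated, original):
        return all(any(abs(r - o) < tol for o in original) for r in rotated)
    best = 1
    for V in range(1, len(constellation) + 1):
        rot = cmath.exp(2j * cmath.pi / V)
        if all(matches([m * rot**k for m in constellation], constellation)
               for k in range(1, V + 1)):
            best = max(best, V)
    return best

# 16QAM: V = 4, since a rotation by pi/2 maps the grid onto itself.
qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
assert rotational_invariance_order(qam16) == 4
```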

**Example 3.1**

Consider the two constellations with 8 and 16 points shown in Figure 3.2. The rectangular 8QAM (**quadrature amplitude modulation**) constellation of Figure 3.2(a) has a V = 2-fold rotational invariance as any rotation of the constellation by \(\pi\) leads again to the same constellation. The 16QAM constellation shown in Figure 3.2(b) exhibits a V = 4-fold rotational invariance as any rotation of the constellation by \(\frac{\pi}{2}\) leads again to the same constellation.

Before introducing differential coding and modulation, we first describe the channel model including phase slips.

### 2.2 The Phase-Slip Channel Model

In coherent receivers for high-speed optical communications, it is usually not feasible to employ decision-directed phase recovery, so that feed-forward phase recovery algorithms have to be employed.

Feed-forward carrier recovery algorithms exploit the rotational invariance of the constellation to remove the modulation before estimating the phase. However, due to the necessary phase unwrapping algorithm in the feed-forward phase estimator, a phenomenon called **phase slip** occurs.

These are mostly due to coarse blind phase recovery algorithms with massive parallelization, including preliminary hard decisions and phase unwrapping at the initial DSP receiver steps.

Figure 3.3 displays the phase-slip channel model we employ in the following. The channel input is a complex modulation symbol \(y[t]\in\boldsymbol{\mathcal{M}}\subset\mathbb{C}\). The first noise contribution is complex-valued AWGN.

In the field of coding and in the broad body of literature on FEC, the terms \(E_s/N_0\) and \(E_b/N_0\) are frequently used to characterize AWGN channels. Therein, \(E_s=\text{E}\{|y|^2\}\) denotes the energy per modulation symbol.

The noise \(n[t]=n_I[t]+\imath{n}_Q[t]\) (where \(\imath=\sqrt{-1}\)) is characterized by the two-sided noise power spectral density \(N_0=2\sigma_n^2\), where \(\sigma_n^2\) is the variance of both noise components \(n_I[t]\) and \(n_Q[t]\), that is, \(\sigma_n^2=\text{var}(n_I[t])=\text{var}(n_Q[t])\).

The received symbol \(z[t]\) in our model is obtained by \(z[t]=(y[t]+n[t])\cdot{p[t]}\), where \(p[t]\) describes the phase slips. Phase slips and \(p[t]\) are discussed in detail later.

Frequently, especially for comparing different coding schemes, \(E_b/N_0\) is used instead of \(E_s/N_0\). Herein, \(E_b\) denotes the energy per **information bit**, whereas \(E_s\) denotes the energy per **transmit symbol**.

For example, if a code of rate \(r=4/5\), corresponding to an overhead of \(\Omega=\frac{1}{r}-1=25\%\), is used, the ratio of \(n\) code bits versus \(k\) information bits amounts to \(n/k=5/4=1.25\), that is, \(1.25=1/r\) code bits are transmitted for each information bit.

Thereof, \(q\) code bits are assigned to one modulation symbol \(y[t]\). This means that if the modulation symbols are transmitted each with energy \(E_s\), the amount of energy conveyed by each information bit amounts to

\[E_s=E_b\cdot{q}\cdot{r}\quad\Longleftrightarrow\quad{E_b}=\frac{E_s}{q\cdot{r}}\]
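This relation can be expressed as a one-line helper; the rate and modulation parameters in the example are the values from the text (r = 4/5) combined with an assumed QPSK mapping (q = 2):

```python
def eb_from_es(es, q, r):
    """E_b = E_s / (q * r): each symbol carries q code bits, of which
    only a fraction r are information bits."""
    return es / (q * r)

# Example: rate r = 4/5 (25% overhead) with QPSK (q = 2, assumed).
# Each symbol then carries 2 * 4/5 = 1.6 information bits.
eb = eb_from_es(1.0, q=2, r=4 / 5)  # E_b = E_s / 1.6 = 0.625 * E_s
```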

As \(E_b\) is normalized with respect to the information bits of the transmission system, it allows us to immediately evaluate the NCG. The NCG is frequently used to assess the performance of a coding scheme and is defined as the difference (in decibels) of required \(E_b/N_0\) values between coded and uncoded transmission for a given output BER. Note that the NCG takes into account the coding rate \(r\) and the number of bits assigned to each modulation symbol, which are included in \(E_b\).

In optical communications, the optical signal-to-noise ratio (OSNR) is also frequently employed. The OSNR is the signal-to-noise ratio (SNR) measured in a reference optical bandwidth, where frequently a bandwidth \(B_\text{ref}\) of 12.5 GHz is used, corresponding to approximately 0.1 nm of wavelength at a carrier wavelength of 1550 nm.

The OSNR relates to the \(E_s/N_0\) and \(E_b/N_0\) as

\[\left.\text{OSNR}\right|_\text{dB}=\left.\frac{E_s}{N_0}\right|_\text{dB}+10\log_{10}\frac{R_S}{B_\text{ref}}=\left.\frac{E_b}{N_0}\right|_\text{dB}+10\log_{10}\frac{q\cdot{r}\cdot{R_S}}{B_\text{ref}}\]

where \(B_\text{ref}\) is the previously introduced reference bandwidth, \(R_S\) corresponds to the symbol rate of the transmission, \(r\) is the aforementioned rate of the code with \(r=k/n\), and \(q\) corresponds to the number of bits mapped to each modulation symbol.
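The \(E_s/N_0\) part of this conversion can be sketched as follows; the 32 GBd symbol rate is an assumed example value:

```python
import math

def osnr_db(es_n0_db, symbol_rate, b_ref=12.5e9):
    """OSNR|dB = Es/N0|dB + 10*log10(Rs / Bref), with the reference
    bandwidth Bref = 12.5 GHz (about 0.1 nm at 1550 nm)."""
    return es_n0_db + 10 * math.log10(symbol_rate / b_ref)

# Assumed example: a 32 GBd signal at Es/N0 = 10 dB.
print(round(osnr_db(10.0, 32e9), 2))  # prints 14.08
```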

Returning to the description of the channel model of Figure 3.3, we see that the noisy signal \(\tilde{y}[t]\) additionally undergoes a potential phase rotation yielding \(z[t]\). If the constellation shows a \(V\)-fold rotational invariance with \(V\) even (which is the case for most of the practically relevant constellations), we introduce the following probabilistic phase-slip model

\[\begin{align}P(s[t]=\pm1)&=\xi\\P(s[t]=\pm2)&=\xi^2\\\vdots\\P\left(s[t]=\pm\frac{V}{2}\right)&=\xi^{V/2}\\\end{align}\]

The probability that a phase slip occurs (that is, \(s[t]\neq0\)) is thus

\[\tag{3.1}P_\text{slip}=2\sum_{i=1}^{V/2}\xi^i=2\left(\frac{1-\xi^{\frac{V}{2}+1}}{1-\xi}-1\right)=\frac{2\xi(1-\xi^{V/2})}{1-\xi}\]

For a given phase-slip probability, which may be obtained from measurements, and which may also depend on the nonlinear phase noise introduced by the optical transmission link and the variance of the additive Gaussian noise due to amplification, we obtain the value \(\xi\) by solving (3.1) for \(\xi\).

For the practically most important cases, \(V=2\) and \(V=4\), we get

\[\tag{3.2}\xi=\begin{cases}\frac{P_\text{slip}}{2},\qquad\qquad\qquad\text{if }V=2\\\frac{\sqrt{2P_\text{slip}+1}}{2}-\frac{1}{2},\qquad\text{if }V=4\end{cases}\]
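Equations (3.1) and (3.2) can be verified numerically with a short round-trip check:

```python
import math

def xi_from_pslip(p_slip, V):
    """Invert Eq. (3.1) using the closed forms of Eq. (3.2)."""
    if V == 2:
        return p_slip / 2
    if V == 4:
        return math.sqrt(2 * p_slip + 1) / 2 - 0.5
    raise NotImplementedError("solve Eq. (3.1) numerically for other V")

def pslip_from_xi(xi, V):
    """Forward direction, Eq. (3.1): P_slip = 2 * sum_{i=1}^{V/2} xi^i."""
    return 2 * sum(xi**i for i in range(1, V // 2 + 1))

# Round trip: the inverse formulas reproduce the target slip probability.
xi = xi_from_pslip(1e-3, V=4)
assert abs(pslip_from_xi(xi, V=4) - 1e-3) < 1e-12
```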

Experimental measurements suggest that the phase-slip probability depends on the equivalent BER before the FEC decoder. We may thus model \(P_\text{slip}\) empirically as

\[\tag{3.3}P_\text{slip}=\begin{cases}\min\left(1,\frac{\gamma}{2}\text{erfc}\left(\sqrt{\frac{E_s}{N_0}}\right)\right),\qquad\text{for BPSK}\\\min\left(1,\frac{\gamma}{2}\text{erfc}\left(\sqrt{\frac{E_s}{2N_0}}\right)\right),\qquad\text{for QPSK}\end{cases}\]

where \(\gamma\) is the factor between slip rate and pre-FEC BER for the equivalent BPSK channel.

Given \(E_s/N_0\) and \(\gamma\), we can compute \(P_\text{slip}\) from (3.3) and subsequently \(\xi\) from (3.2) or (3.1). Using \(\xi\), we can use a pseudo-random number generator to generate a sequence of \(s[t]\) with the probability mass function defined earlier.
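The sampling procedure just described can be sketched as follows. The \(E_s/N_0\) and \(\gamma\) values are assumed examples, and the helper name is ours:

```python
import math
import random

def simulate_slip_sequence(es_n0_db, gamma, V=4, n=10000, seed=1):
    """Draw s[t] for QPSK: P_slip from Eq. (3.3), xi from Eq. (3.2),
    then P(s = +/-i) = xi**i for i = 1..V/2 and P(s = 0) = 1 - P_slip."""
    es_n0 = 10 ** (es_n0_db / 10)
    p_slip = min(1.0, gamma / 2 * math.erfc(math.sqrt(es_n0 / 2)))  # QPSK
    xi = math.sqrt(2 * p_slip + 1) / 2 - 0.5                        # V = 4
    rng = random.Random(seed)
    seq = []
    for _ in range(n):
        u, s = rng.random(), 0
        # Walk the pmf: +/-1 with prob. xi each, +/-2 with prob. xi**2 each.
        for i in range(1, V // 2 + 1):
            if u < 2 * xi**i:
                s = i if u < xi**i else -i
                break
            u -= 2 * xi**i
        seq.append(s)
    return seq, p_slip

seq, p_slip = simulate_slip_sequence(es_n0_db=6.0, gamma=10.0)
# The empirical slip rate approaches p_slip for long sequences.
```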

### 2.3 Differential Coding and Decoding

Several approaches to mitigate phase slips have been proposed in the literature. Probably the most common is differential coding, rendering a phase slip into a single error event. In this section, we restrict ourselves for simplicity to constellations with a \(V\)-fold rotational invariance where \(V=2^v\), \(v\in\mathbb{N}\), that is, \(V\in\{2, 4, 8, 16, …\}\).

We consider two different cases:

1. In the first case, we have \(V=Q\). To each constellation point, we assign a state \(\text{S}_i\), \(i\in\{1,…, V\}\). An example of such a constellation is the widely used QPSK constellation with \(V=4\), which is shown in Figure 3.4 together with its state assignment.

2. In the second case, we have \(Q\gt{V}\). We restrict ourselves to the practical case with \(Q=J\cdot{V}\), where \(J\) is an integer.

In this case, we employ differential coding: The constellation is divided into \(V\) disjoint regions such that these regions are preserved when rotating the constellation by \(\pm\frac{2\pi}{V}\). We assign a state label \(\text{S}_i\) to each disjoint region.

The regions are selected such that each region contains exactly \(J=\frac{Q}{V}=2^j\) constellation points and such that a rotation of the constellation by an angle \(\kappa\cdot\frac{2\pi}{V}\), \(\kappa\in\{0,\pm1,\pm2,…\}\) does neither change the regions nor the assignment of points to a region.

For the constellation points within each region, we employ a **rotationally invariant** bit mapping, which means that the bit mapping of points inside a region is not changed by a rotation of the constellation by an angle \(\kappa\cdot\frac{2\pi}{V}\).

The popular 16QAM constellation is an example of such a constellation with \(Q=16\), \(V=4\), and \(J=4\). The state assignment and rotationally invariant mapping are discussed in Example 3.2 and shown in Figure 3.5.

**Example 3.2**

We consider the transmission of the popular 16QAM constellation. It can be easily verified that the 16QAM constellation shows a \(V\) = 4-fold rotational invariance.

As shown in Figure 3.5, we label the four quadrants of the complex plane by states \(\text{S}_1\), \(\text{S}_2\), \(\text{S}_3\), and \(\text{S}_4\). Inside the first quadrant \(\text{S}_1\), we employ a Gray labeling (also denoted as **Gray mapping**) to assign the bits \(b_3\) and \(b_4\) to the four points.

The mapping of the bits \(b_3\) and \(b_4\) in the three remaining quadrants is obtained by applying a rotationally invariant mapping, that is, by rotating the Gray mapping of \(\text{S}_1\) by multiples of \(\frac{\pi}{2}\). In this case, even after rotating the constellation by multiples of \(\frac{\pi}{2}\), the bits \(b_3\) and \(b_4\) can always be recovered unambiguously.

We employ differential coding with \(v=\log_2(V)\) bits to encode and reliably transmit the region, that is, the state. Within each of these regions, exactly \(Q/V\) constellation points are placed, to which a rotationally invariant bit mapping is assigned. This means that whenever the constellation is rotated by an angle that is a multiple of \(\frac{2\pi}{V}\), the bit patterns assigned to constellation points within the region can still be uniquely identified.
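The invariance of the inner bits under rotation can be illustrated in code. The concrete Gray label below is an assumption standing in for Figure 3.5, but the invariance argument is independent of that choice:

```python
import cmath

def quadrant_state(z):
    """State index 1..4 of the quadrant containing z (rotational order)."""
    angle = cmath.phase(z) % (2 * cmath.pi)
    return int(angle // (cmath.pi / 2)) + 1

def inner_bits(z):
    """(b3, b4) of a 16QAM point: rotate back into quadrant S_1, then
    read a Gray label from the position (assumed label, not Figure 3.5)."""
    s = quadrant_state(z)
    w = z * cmath.exp(-1j * (s - 1) * cmath.pi / 2)  # undo the rotation
    return (0 if w.real < 2 else 1, 0 if w.imag < 2 else 1)

# Rotating any point by pi/2 changes its state but never its inner bits.
p = complex(3, 1)
for k in range(4):
    assert inner_bits(p * 1j**k) == inner_bits(p)
```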

Note that we restrict ourselves to state-region assignments such that the rotation of a complete region gives another valid region, that is, \(\forall{i}\in\{1,…, V\}\), there exists a \(j\in\{1,…, V\}\), such that

\[\left\{z\cdot{e}^{\imath\kappa\frac{2\pi}{V}}:z\in{\text{S}}_i\right\}=\text{S}_j,\qquad\forall\kappa\in\{0,\pm1,\pm2,...\}\]

Note that this restriction does not impose any problems for practical systems as most of the practically relevant constellations can be described in this form.

In what follows, we impose another, slightly more stringent condition on the states. We assume that the states \(\text{S}_i\) are assigned in what we denote as rotational order. Formally,

**Definition 2**

We define a sequence of states \(\text{S}_i\), \(i\in\{1,…, V\}\) that are assigned to a region of the complex plane, to be in rotational order, if and only if the following condition

\[\left\{z\cdot{e}^{\imath\kappa\frac{2\pi}{V}}:z\in{\text{S}}_i\right\}=\text{S}_{((i+\kappa-1)\:\text{mod}\:V)+1},\qquad\forall\kappa\in\mathbb{N}\]

is fulfilled.

We can easily verify that the state assignments of the constellations given in Figures 3.4 and 3.5 are in rotational order. Again, note that the restriction of the states to be in rotational order does not yet impose any major constraint, as we have not yet defined an encoding map. We group the \(V\) states into the set \(\mathcal{S}:=\{\text{S}_1, \text{S}_2,…, \text{S}_V\}\).

The main step in differential coding is to impose memory on the modulation. We assume that the transmission starts at time instant \(t=1\). We introduce the differential memory \(\text{d}_t^{[\text{mem}]}\in\mathcal{S}\) and set \(\text{d}_0^{[\text{mem}]}=\text{S}_1\).

The differential encoder can be considered to be the function

\[\begin{align}f_\text{diff}&:\mathcal{S}\times\mathbb{F}_2^v\rightarrow\mathcal{S}\\&\left(\text{d}_{t-1}^{[\text{mem}]},b_{t,1},...,b_{t,v}\right)\mapsto{f}_\text{diff}\left(\text{d}_{t-1}^{[\text{mem}]},b_{t,1},...,b_{t,v}\right)=\text{d}_t^{[\text{mem}]}\end{align}\]

which takes as input the bits \(b_{t,1},…,b_{t,v}\) and the differential memory \(\text{d}_{t-1}^{[\text{mem}]}\) and generates a new state that is saved in the differential memory \(\text{d}_t^{[\text{mem}]}\).

This new state \(\text{d}_t^{[\text{mem}]}\) selects the symbol to be transmitted (if \(V=Q\)) or the region from which the symbol is selected using the bits \(b_{t,v+1},…, b_{t,q}\).

Note that the differential function is not unique but depends on the assignment of bit patterns to state transitions. Consider the example of the QPSK constellation shown in Figure 3.4. We can give two distinct differential encoding maps.

The first differential encoding function is the **natural differential code**. The state transition diagram of the natural differential code is visualized in Figure 3.6 and is also given in Table 3.1.

The second encoding function, termed the **Gray differential code**, is given in Table 3.2.

Note that all other differential coding maps for the QPSK constellation can be transformed into one of these two forms by elementary transformations of the constellation and the state assignment.
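A possible realization of the two encoding maps can be sketched as follows. The increment tables are plausible reconstructions of the natural and Gray maps of Tables 3.1 and 3.2 (which are not reproduced here), not verbatim copies:

```python
def diff_encode(bit_pairs, increment, start_state=0):
    """Differential encoder f_diff for QPSK (V = 4): the new state is
    the old state advanced by an increment derived from the bit pair."""
    states, s = [], start_state
    for b1, b2 in bit_pairs:
        s = (s + increment[(b1, b2)]) % 4
        states.append(s)
    return states

NATURAL = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}   # (b1 b2)_10
GRAY    = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}   # Gray order

def diff_decode(states, increment, start_state=0):
    """Invert the encoder: recover bit pairs from state differences."""
    inv = {v: k for k, v in increment.items()}
    bits, prev = [], start_state
    for s in states:
        bits.append(inv[(s - prev) % 4])
        prev = s
    return bits

data = [(0, 1), (1, 1), (1, 0), (0, 0)]
assert diff_decode(diff_encode(data, GRAY), GRAY) == data
```

A constant phase offset applied to all states from some position onward (the effect of a single phase slip) corrupts exactly one decoded bit pair, illustrating why differential coding renders a slip into a single error event.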

As the differential code can be understood as a Markov process, we can employ the BCJR algorithm to carry out **bit-wise maximum a posteriori (MAP)** decoding of the differential code.

For this, we may represent the differential code using a so-called trellis diagram. The trellis diagram is an “unrolled” version of the state diagram of Figure 3.6. Figure 3.7 shows four segments of a trellis diagram for the natural differential encoding map.

Four segments of the trellis diagram of the Gray differential encoding map are given in Figure 3.8. The different input bit patterns (\(b_1,b_2\)) can be distinguished by different line styles (dashed, dotted, solid, and “waved”).

If phase slips occur on the channel, memory is imposed on the channel as well. If this additional memory is not properly accounted for in the BCJR decoder of the differential code, the performance of the decoder will rapidly decrease, due to the decoder not being properly adapted to the channel model.

We therefore need to extend the trellis to properly **take into account the phase slips**. One such extension introduces additional states that correspond to the memory of the phase-slip channel.

We introduce states \(\text{S}_{i,\tilde{s}}\) where the second index \(\tilde{s}\) tracks the current phase-slip state \(\tilde{s}[t]\) mod 4 (see Figure 3.3), while the first index \(i\) is still responsible for describing the differential code.

The occurrence of a phase slip (\(s[t]\ne0\)) leads to a different \(\tilde{s}[t]\). For the running example of a differential code for \(V=4\), we no longer have a trellis diagram (or a state transition diagram) with \(4\) states and \(4\cdot4=16\) state transitions, but instead a trellis diagram with \(4\cdot4=16\) states and \(16\cdot16=256\) state transitions.

One segment of this extended trellis diagram is shown in Figure 3.9 for the Gray differential encoding map.

In order to distinguish the additional state transitions corresponding to phase slips, we use Gray scales corresponding to the probability of occurrence of the phase slips.

The original trellis is obtained by utilizing only those state transitions that correspond to \(s[t]=0\), which correspond to the black lines. The state transitions corresponding to \(s[t]=1\) and \(s[t]=3\) are given by gray lines while the state transitions corresponding to \(s[t]=2\) are given by light gray lines, as these have the lowest probability of occurrence.

As the trellis diagram of Figure 3.9 may be challenging to implement, we seek a way to reduce its complexity.

By observing that the memory of the phase-slip channel collapses with the memory of the differential encoder, we may get a more compact representation of the trellis and only need \(V\) states.

This is possible as a phase slip does not introduce a new state, but only leads to a different state transition to one of the \(V\) existing states. In fact we have

\[\text{d}_t^{[\text{mem}]'}=\text{S}_{((i+s[t]-1)\:\text{mod}\:V)+1}\qquad\text{with }\text{S}_i=\text{d}_t^{[\text{mem}]}\]

The state transitions for the example of the Gray differential encoder are given in Table 3.3.

This means that we can still use a trellis diagram with \(V\) states but have to insert additional state transitions taking into account all possible values of \(s[t]\). Figure 3.10 shows the simplified extended trellis diagram taking into account the possible slips, indicated by the slip value \(s[t]\in\{0, 1, 2, 3\}\).

Again, we use different gray scales to represent the state transitions corresponding to different values of \(s[t]\). The trellis diagram of Figure 3.10 is a simplification of the extended trellis diagram with only \(V=4\) states (instead of 16) and \(4\cdot16=64\) state transitions (instead of 256).
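The collapsed state-transition rule above can be expressed compactly (states numbered 1 to \(V\) as in the text):

```python
def next_state_with_slip(state, s_t, V=4):
    """Collapsed phase-slip transition: a slip s[t] moves the
    differential memory from S_i to S_{((i + s[t] - 1) mod V) + 1},
    so the trellis keeps only V states."""
    return ((state + s_t - 1) % V) + 1

# A slip of s[t] = 2 from state S_3 wraps around to S_1;
# s[t] = 0 is the identity transition.
assert next_state_with_slip(3, 2) == 1
assert next_state_with_slip(4, 0) == 4
```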

### 2.4. Maximum a Posteriori Differential Decoding

In what follows, we use the BCJR decoder to carry out bit-wise **maximum a posteriori** differential decoding. The BCJR decoder makes a decision on the transmitted symbol (equivalent to a state) based on the maximization

\[\widehat{\text{S}}[t]=\underset{\text{s}\in\{\text{S}_1,...,\text{S}_V\}}{\arg\max}\;P\left(\text{S}[t]=\text{s}\:\middle|\:z_1^{\tilde{n}}\right)\]

At each time instant \(t\), the most probable state \(\text{S}[t]\) is computed given the complete received sequence \(z_1^{\tilde{n}}=(z[1],z[2],…,z[\tilde{n}])\).

We use the technique of **EXtrinsic Information Transfer** (EXIT) charts to characterize the behavior of the differential decoder based on the BCJR algorithm.

EXIT charts plot the extrinsic output mutual information as a function of the input mutual information and are a tool to characterize single components in iterative decoders.

Bit interleavers statistically decouple the respective encoding/decoding components such that a single parameter is sufficient to track their input/output relations. This parameter may be the SNR at the output of a processing block, or, as is the case for EXIT charts, the mutual information between transmitted bits and the received and processed soft bit **log-likelihood ratio** (LLR) values.

For some channels and some codes, the individual transfer characteristics (or EXIT curves) can be obtained analytically, while for most cases, one has to resort to Monte Carlo simulation for computing the mutual information.

EXIT curves can be defined not only for channel encoders/decoders such as convolutional codes or parity-check codes, but also for components of many serially or parallel concatenated detection and decoding schemes: For example, EXIT curves have been used for describing channel interfaces such as mappers/demappers (detectors) for spectrally efficient modulation, or equalizers of multipath channels; even the decoder of an LDPC code can be viewed as a serial concatenation of a variable node decoder and a check node decoder, both of which can be described by EXIT curves.

The main advantage of the EXIT chart technique is that the individual component processing blocks can be studied and characterized separately using EXIT curves, and that the interaction of two (or more) such processing blocks can be graphically predicted in the EXIT chart without performing a complex simulation of the actual, fully fledged concatenated coding scheme itself.

As it turns out, the EXIT curves must not intersect to allow convergence to low BERs, and thus, code design reduces to finding good pairs of EXIT curves that match well, or, more constructively as in the case of LDPC codes, to apply curve-fitting algorithms to determine variable and check node degree profiles that match well. A decoding trajectory visualizes the iterative exchange of information between the processing blocks, and shows the progress of the decoding.

While the EXIT chart is exact on the **binary erasure channel** (BEC) for sufficiently long/infinite sequence lengths, the reduction to single parameter tracking of the involved distributions is just an approximation for other channels. It has been observed, however, that the predicted and actually simulated decoding trajectories match quite well, proving the usefulness of the method, with many successful code designs performed in practice up to date.
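When computing EXIT curves by Monte Carlo simulation, the mutual information between transmitted bits and LLRs can be estimated with a standard time average. The consistent Gaussian a priori model used in the example is a common modeling assumption, not prescribed by the text:

```python
import math
import random

def mutual_information_from_llrs(bits, llrs):
    """Estimate I(X; L) for equiprobable bits from LLR samples
    L = log p(l | x=0) / p(l | x=1), via the time average
    I ~= 1 - E[ log2(1 + exp(-(1 - 2x) * L)) ]."""
    ln2, total = math.log(2), 0.0
    for x, l in zip(bits, llrs):
        signed = (1 - 2 * x) * l
        # Numerically stable evaluation of log2(1 + exp(-signed)).
        total += math.log1p(math.exp(-abs(signed))) / ln2 \
                 + max(-signed, 0.0) / ln2
    return 1 - total / len(bits)

# Consistent Gaussian a priori model (mean sigma^2/2, variance sigma^2),
# as commonly assumed when generating a priori input for EXIT curves.
rng = random.Random(0)
sigma = 2.0
bits = [rng.randrange(2) for _ in range(50000)]
llrs = [(1 - 2 * x) * sigma**2 / 2 + rng.gauss(0, sigma) for x in bits]
i_mi = mutual_information_from_llrs(bits, llrs)  # grows with sigma
```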

Figure 3.11 shows the EXIT characteristics of the differential decoder for a QPSK constellation and both differential encoding maps.

We can clearly see that the characteristic of the detector employing the nonmatched trellis diagram has a nonincreasing shape, which is an indicator of a mismatched model used within the decoder: the decoder trellis does not leave the possibility open for phase slips to occur, but forces the result to a simply differentially encoded target sequence, which is, however, not the case after the phase-slip channel.

This nonincreasing shape is the reason for the error floor that has been observed. The decreasing EXIT characteristic means that during iterative decoding, the overall system performance actually decreases, which can lead to a severe error floor.

One proposed countermeasure is hybrid turbo differential decoding (HTDD): by carefully executing the differential decoder only in those iterations where the extrinsic information is low enough, the operating point in the EXIT chart is kept in the range of an increasing characteristic.

This approach mitigates the detrimental effect of phase slips on iterative differential decoding and yields codes with relatively low error floors, which can be further lowered using a high-rate outer code.

If we employ the trellis diagram of Figure 3.10 incorporating the phase-slip model instead of the nonmatched trellis diagram, we can see that the EXIT characteristics are monotonically increasing, which is a prerequisite for successful decoding with low error floors.

In the next section, we use the EXIT characteristics to compute the information theoretic achievable rates of the differentially encoded system. Further note that for \(\gamma\gt0\), the value of \(I_E^{[D]}\lt1\), even for \(I_A^{[D]}=1\), which may entail an error floor unless the channel code is properly designed.

### 2.5. Achievable Rates of the Differentially Coded Phase-Slip Channel

According to Shannon’s information theory, the capacity of a communication channel is the maximum amount of information (usually expressed in terms of **bits per channel use**) that can be reliably conveyed over the channel. In information theory, the capacity is usually maximized over the input distribution of the channel.

In this tutorial, we are only interested in the maximum achievable information rate for uniform channel inputs \(y\), as we do not wish to impose any constraints on the data sequence. One possibility to achieve a nonuniform channel input is the use of **constellation shaping**, which is, however, beyond the scope of this tutorial.

The comparison between the achievable rate of the channel affected by phase slips and the achievable rate of the original AWGN channel shows how much performance may be lost in the presence of phase slips. In order to compute the achievable rates of the differentially encoded channel affected by phase slips, we employ the EXIT chart technique.

By utilizing a slightly modified way of computing EXIT curves of the BCJR decoder, we can also compute the achievable rates of the coded modulation schemes. For this, we make use of the chain-rule of mutual information and compute the mutual information of the equivalent bit channel experienced by the channel decoder after differential detection.

This can be done by (numerically, simulation-based) computing the EXIT curve \(\tilde{I}_E^{[D]}\) of the differential detector using **a priori** knowledge that is modeled as coming from a BEC, and integrating over such curves.

Specifically, EXIT curves such as those depicted in Figure 3.11 are determined for many different \(E_s/N_0\) values (and several different phase-slip probability factors \(\gamma\)), but now with a priori knowledge based on a BEC model. By integration, we determine the area \(q\int_0^1\tilde{I}_E^{[D]}(I_A)dI_A\) under these curves and obtain the respective mutual information limits that are plotted in Figures 3.12 and 3.13 at the corresponding \(E_s/N_0\) values and phase-slip probability factors \(\gamma\), respectively.

Note that this mutual information is available to the channel decoder provided that perfect iterative decoding over inner differential detector and outer LDPC decoder is performed. Thus, we still need to design an appropriate LDPC code and iterative decoding scheme to actually approach these promised rates as closely as possible.

Indeed, the subsequent sections explain how to construct such codes and coding schemes in more detail. The achievable rate of the noniterative system with separate differential decoding and channel decoding is obtained from \(q\tilde{I}_E^{[D]}(0)=qI_E^{[D]}(0)\).

Figures 3.12 and 3.13 show the numerically computed achievable rates for the QPSK constellation without differential coding on an AWGN channel that is not affected by phase slips (dotted lines) as a reference and, additionally, the achievable rates for differentially encoded QPSK for a channel affected by phase slips (solid lines) with \(P_\text{slip}=\min\left(1,\frac{\gamma}{2}\text{erfc}\left(\sqrt{\frac{E_s}{2N_0}}\right)\right)\).
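The slip-probability model above is easy to evaluate numerically. The following sketch (the function name and the \(E_s/N_0\) values are ours, chosen purely for illustration) computes \(P_\text{slip}\) from \(E_s/N_0\) in dB and the factor \(\gamma\):

```python
import math

def p_slip(es_n0_db: float, gamma: float) -> float:
    """Slip-probability model: P_slip = min(1, (gamma/2) * erfc(sqrt(Es/(2 N0))))."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)  # convert Es/N0 from dB to linear scale
    return min(1.0, 0.5 * gamma * math.erfc(math.sqrt(es_n0 / 2.0)))
```

For \(\gamma=0\) the channel reduces to the plain AWGN channel (\(P_\text{slip}=0\)), and \(P_\text{slip}\) decreases monotonically with increasing \(E_s/N_0\).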

In Figure 3.12, we set \(\gamma=0\), and we observe that the achievable rate of the differential QPSK transmission equals the achievable rate of a conventional coherent QPSK transmission, independent of the differential encoding map.

In addition, we plot the achievable rates for a simplified system that carries out differential decoding (leading to the well-known effect of error doubling) followed by error correction decoding (dashed lines). We see that at a spectral efficiency of 1.6 (corresponding to a system with Ω = 25% coding overhead), the simplified noniterative system leads to an unavoidable loss in \(E_s/N_0\) of 1.5 dB (Gray differential encoding map) or 2.5 dB (natural differential encoding map), respectively. This performance difference becomes even more severe if low spectral efficiencies (i.e., high coding overheads) are targeted.

If phase slips occur on the channel (\(\gamma\gt0\)), we can observe in Figure 3.13 that for high spectral efficiencies (above 1.5 bits/channel use), the loss in information rate due to the phase slips is not severe, unless \(\gamma\) becomes large. For example, for \(\gamma=0.2\), the capacity loss at a spectral efficiency of 1.5 bit/channel use is only approximately 0.7 dB. Transmission at very low spectral efficiencies, requiring codes with very large overheads, is, however, seriously affected by the phase-slip channel.

## 3. LDPC-Coded Differential Modulation

In the previous section, we have compared the achievable rates of various systems for an AWGN channel (\(\gamma=0\)) and we have found that differential coding can be used without entailing a decrease of the communication system’s achievable rate.

This means that at least from an information theoretic perspective, we can employ differential coding to combat phase slips without introducing any decoding penalty. Information theory, however, does not tell us what constructive method we may use to achieve this capacity.

One particularly promising way to approach the capacity with differential coding is the use of coded differential modulation with iterative decoding, both with convolutional codes and with LDPC codes.

This scheme extends the bit-interleaved coded modulation (BICM) method to account for differential encoding and employs iterative decoding and detection to improve the overall system performance.

The adaptation of this scheme to optical communications has been considered for the channel not affected by phase slips and for the channel affected by phase slips. Note that other schemes have been proposed that do not rely on iterative differential decoding, including the slip resilient code presented and block differential modulation.

Figure 3.14 shows the general transmitter (a) and iterative receiver (b) of the coded differential modulation system with iterative decoding and detection.

In this general block diagram, a block FEC encoder takes as input a binary length-\(k\) vector of input bits \(\pmb{u}=(u_1,u_2,…, u_k)\), where \(u_i\in\mathbb{F}_2=\{0,1\}\) and generates a binary length-\(n\) vector of code bits \(\pmb{x}=(x_1,x_2,…, x_n)\).

Almost all of the popular channel codes that are used in optical communications are such block codes. The amount \(n-k\) of redundant bits that are added by the FEC encoder is commonly expressed in terms of the code rate \(r\), which is defined as the ratio of the information block length \(k\) to the code word length \(n\), that is,

\[r:=\frac{k}{n}\]

In optical communications, often the overhead is used to quantify the amount of redundant information. The overhead \(\Omega\) of the code and its rate are interrelated by

\[\Omega:=\frac{n}{k}-1=\frac{n-k}{k}=\frac{1}{r}-1=\frac{1-r}{r}\]
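For instance, a rate \(r=0.8\) code carries \(\Omega=25\%\) overhead, and conversely a 25% overhead implies rate 0.8. A minimal sketch of this relation (the function names are ours):

```python
def overhead(r: float) -> float:
    """Overhead of a rate-r code: Omega = 1/r - 1 = (1 - r)/r."""
    return 1.0 / r - 1.0

def rate_from_overhead(omega: float) -> float:
    """Inverse relation: r = 1 / (1 + Omega)."""
    return 1.0 / (1.0 + omega)
```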

The block \(\pmb{x}\) of code bits is interleaved by a permutation \(\Pi\) to yield a permuted version \(\tilde{\pmb{x}}\). Ideally, a random permutation is employed, but sometimes, a structure in the permutation is necessary to facilitate implementation (parallelization) or to improve the error correction capabilities of the code.

Note that the permutation \(\Pi\) is sometimes implicitly included in the FEC encoder and does not need to be explicitly implemented. The interleaved block \(\tilde{\pmb{x}}\) is differentially encoded, yielding a block of \(\tilde{n}=\left\lceil\frac{n}{q}\right\rceil\) modulation symbols (where \(\lceil\tau\rceil\) denotes the smallest integer greater than or equal to \(\tau\)).

At the receiver, the differential decoder and the FEC decoder iteratively decode the signal, where the output of the FEC decoder is used to yield an improved differential decoding result in a subsequent iteration by sharing so-called extrinsic information between the decoder components.

In the remainder of this section, we assume that the employed FEC scheme is an LDPC code. We first give an introduction to LDPC codes and then show how irregular LDPC codes can be designed to be well-adapted to differential coding. We do not show explicitly how decoding is performed, as we intend to take on a more code design-oriented perspective.

We restrict ourselves in the remainder of this tutorial to the case where \(V=Q=2^q\), that is, every state \(\text{S}_i\) is assigned to the modulation symbol \(M_i\). We, however, give hints on how to deal with the case \(V\lt{Q}\) in a later section.

### 3.1. Low-Density Parity-Check (LDPC) Codes

LDPC codes were developed in the 1960s by Gallager in his landmark Ph.D. thesis. These codes were not further investigated for a long time due to the perceived complexity of long codes. With the discovery of turbo codes in 1993 and the sudden interest in iteratively decodable codes, LDPC codes were rediscovered soon afterwards.

In the years that followed, numerous publications from various researchers paved the way for a thorough understanding of this class of codes leading to numerous applications in various communication standards, such as WLAN (IEEE 802.11), DVB-S2, and 10G Ethernet (IEEE 802.3).

LDPC codes for soft-decision decoding in optical communications were studied. Modern high-performance FEC systems are sometimes constructed using a soft-decision LDPC inner code that reduces the BER to a level of \(10^{-3}\) to \(10^{-5}\) and a hard-decision outer code that pushes the system BER to levels below \(10^{-12}\).

An outer cleanup code is used, as most LDPC codes exhibit a phenomenon called error floor: above a certain SNR, the BER does not drop rapidly anymore but follows a curve with a small slope. This effect is mainly due to the presence of trapping sets or absorbing sets.

The implementation of a coding system with an outer cleanup code requires a thorough understanding of the LDPC code and a properly designed interleaver between the LDPC and outer code for avoiding that the errors at the output of the LDPC decoder—which typically occur in clusters—cause uncorrectable blocks after outer decoding.

With increasing computing resources, it is now also feasible to evaluate very low target BERs of LDPC codes and optimize the codes to have very low error floors below the system’s target BER.

A plethora of LDPC code design methodologies exist, each with its own advantages and disadvantages. The goal of an LDPC code designer is to find a code that yields high coding gains and possesses some structure facilitating the implementation of the encoder and decoder.

An LDPC code is defined by a sparse binary parity check matrix \(\pmb{H}\) of dimension \(m\times{n}\), where \(n\) is the code word length (in bits) of the code and \(m\) denotes the number of parity check equations defining the code.

Usually, the number of information bits equals \(n-m\). The overhead of the code is defined as \(\Omega=\frac{m}{n-m}\). A related measure is the rate of the code, which is defined as \(r=\frac{n-m}{n}\).

Sparse means that the number of “1”s in \(\pmb{H}\) is small compared with the number of zero entries. Practical codes usually have a fraction of “1”s that is several orders of magnitude below 1%.

We start by introducing some notation and terminology related to LDPC codes. Each column of the parity check matrix \(\pmb{H}\) corresponds to 1 bit of the FEC frame. The \(n\) single bits of the code are also often denoted as variables. Similarly, each row of \(\pmb{H}\) corresponds to a parity check equation and ideally defines a single parity bit (if \(\pmb{H}\) has full rank).
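To make these notions concrete: a binary vector \(\pmb{x}\) is a codeword exactly when it satisfies all parity-check equations, that is, \(\pmb{H}\pmb{x}^T=\pmb{0}\) over \(\mathbb{F}_2\). A minimal sketch with a toy matrix (purely illustrative — this tiny \(\pmb{H}\) is of course not sparse and not an LDPC code):

```python
def is_codeword(H, x):
    """H . x^T = 0 (mod 2): every row of H must overlap x in an even number of "1"s."""
    return all(sum(h & b for h, b in zip(row, x)) % 2 == 0 for row in H)

# toy parity-check matrix: m = 2 checks on n = 4 bits
H = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
```

With \(n-m=2\) information bits this toy code has rate \(r=1/2\); the all-zero word always satisfies every check.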

#### 3.1.1. Regular and Irregular LDPC Codes

LDPC codes are often classified into two categories: regular and irregular LDPC codes. In this tutorial, we consider the latter, which constitutes the more general class of codes.

The parity check matrix of regular codes has the property that the number of “1”s in each column is constant and amounts to \(\text{v}_\text{reg.}\) (called **variable degree**) and that the number of “1”s in each row is constant and amounts to \(\text{c}_\text{reg.}\) (called **check degree**). Clearly, \(n\cdot\text{v}_\text{reg.}= m\cdot\text{c}_\text{reg.}\) has to hold and we furthermore have \(r=1-\frac{\text{v}_\text{reg.}}{\text{c}_\text{reg.}}\).

Irregular LDPC codes have the property that the number of “1”s in the different columns of \(\pmb{H}\) is not constant. In this tutorial, we mainly consider column-irregular codes, which means that only the number of “1”s in the columns is not constant but the number of “1”s in each row remains constant. The irregularity of the parity-check matrix is often characterized by the degree profile of the parity check matrix \(\pmb{H}\).

We denote the number of columns of the parity-check matrix \(\pmb{H}\) with \(i\) ones by \(\Lambda_i\). We say that these columns have **degree \(i\)**. Normalizing this value to the total number of bits \(n\) per codeword yields

\[L_i=\frac{\Lambda_i}{n}\]

which is the **fraction** of columns with degree \(i\), that is, with \(i\) ones (e.g., if \(L_3=\frac{1}{2}\), half the columns of \(\pmb{H}\) have three “1”s).

Similarly, we can define the **check degree profile** by defining that \(P_j\) denotes the number of rows of \(\pmb{H}\) with exactly \(j\) “1”s. The normalized check profile is given by \(R_j=\frac{P_j}{m}\), the **fraction** of rows with \(j\) “1”s.

In most of the codes we consider, however, all rows of \(\pmb{H}\) have the same number \(\text{c}_\text{reg.}\) of “1”s. In that case, we have \(R_{\text{c}_\text{reg.}}=1\) and \(R_1=R_2=\dots=R_{\text{c}_\text{reg.}-1}=R_{\text{c}_\text{reg.}+1}=\dots=R_\infty=0\). Example 3.3 illustrates the degree distribution of such an irregular LDPC code.

**Example 3.3**

Consider the following LDPC code of size \(n=32\) with parity-check matrix of size dim \(\pmb{H}=m\times{n}=8\times32\), that is, of rate \(r=\frac{32-8}{32}=0.75\), corresponding to an overhead of 33.3%. Note that the zeros in \(\pmb{H}\) are not shown for clarity.

The first eight columns of \(\pmb{H}\) have two “1”s per column, that is, \(\Lambda_2=8\). Furthermore, the middle 16 columns each contain three “1”s, that is, \(\Lambda_3=16\). Finally, the last eight columns contain five “1”s, that is, \(\Lambda_5=8\). Normalizing leads to

\[L_2=\frac{\Lambda_2}{n}=\frac{1}{4},\quad{}L_3=\frac{\Lambda_3}{n}=\frac{1}{2},\quad{}L_5=\frac{\Lambda_5}{n}=\frac{1}{4}\]

Note that \(L_1=L_4=L_6=L_7=· · ·=0\). The number of “1”s in each row of \(\pmb{H}\) is constant and amounts to \(\text{c}_\text{reg.}=13\).
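The bookkeeping of Example 3.3 can be reproduced in a few lines (a sketch; the variable names are ours):

```python
n, m = 32, 8
Lambda = {2: 8, 3: 16, 5: 8}    # Lambda_i: number of columns of H with i ones
assert sum(Lambda.values()) == n

L = {i: cnt / n for i, cnt in Lambda.items()}       # L_i = Lambda_i / n
ones = sum(i * cnt for i, cnt in Lambda.items())    # total number of "1"s in H
c_reg = ones // m                                   # constant row weight (check degree)
r = (n - m) / n                                     # code rate
```

This recovers \(L_2=\frac{1}{4}\), \(L_3=\frac{1}{2}\), \(L_5=\frac{1}{4}\), \(\text{c}_\text{reg.}=13\), and \(r=0.75\) as in the example.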

#### 3.1.2. Graph Representation of LDPC Codes

LDPC codes are often represented by a so-called **Tanner graph**. This graph is an undirected bipartite graph in which the nodes can be partitioned into two disjoint sets and each edge connects a node from the first set to a node from the second set. The Tanner graph allows for an easy description of the decoding algorithm of LDPC codes, which we do not detail here.

Figure 3.15 shows the graph representation of the toy code given in Example 3.3.

The circular nodes on the bottom of the graph represent the **variable nodes**, which correspond to the bits in the codeword. As each codeword contains \(n\) bits, there are \(n\) variable nodes \(x_1,x_2,…,x_n\). The variable node \(x_i\) has one connection to the transmission channel (arrow from the bottom) and \(j\) additional connections toward the top, where \(j\) equals the number of “1”s in the \(i\)th column of \(\pmb{H}\).

For instance, the first \(\Lambda_2\) variables \(x_1,…, x_{\Lambda_2}\) (where \(\Lambda_2=8\)) of the code have two connections toward the graph part of the code and an additional connection from the transmission channel. As shown in Example 3.3, the variable nodes can be divided into three groups, corresponding to the degree of these variables.

The rectangular nodes on the top of the graph are the so-called **check nodes**. Each check node \(c_i\) corresponds to one of the \(m\) rows of the parity-check matrix \(\pmb{H}\) of the code and defines a code constraint. The number of connections of the check nodes with the graph corresponds to the number of “1”s in the respective row of \(\pmb{H}\).

In the above-mentioned example, every row has \(\text{c}_\text{reg.}=13\) “1”s, so that each of the check nodes has exactly \(\text{c}_\text{reg.}=13\) connected edges. If \(\pmb{H}\) has a nonzero entry at row \(i\) and column \(j\), that is, \(H_{i,j}=1\), then an edge connects variable node \(x_j\) to check node \(c_i\).

As drawing the graph of the code in this way quickly becomes cumbersome and confusing due to the large number of edges, we resort to a simplified (and rotated) representation shown in Figure 3.16.

In this figure, we do not draw all the edges, but only the beginning and end of each edge and assume that the permutation of the edges is managed by an interleaver \(\Pi^{[\text{LDPC}]}\). The interleaver \(\Pi^{[\text{LDPC}]}\) thus ensures that the connections between the different nodes corresponds to the one given by the parity-check matrix \(\pmb{H}\).
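The rule that \(H_{i,j}=1\) creates an edge between variable node \(x_j\) and check node \(c_i\) translates directly into adjacency lists (a sketch; the function and variable names are ours):

```python
def tanner_graph(H):
    """Build the bipartite Tanner graph of a parity-check matrix H:
    one adjacency list per variable node and one per check node."""
    m, n = len(H), len(H[0])
    var_adj = [[] for _ in range(n)]   # check nodes attached to each variable node
    chk_adj = [[] for _ in range(m)]   # variable nodes attached to each check node
    for i in range(m):
        for j in range(n):
            if H[i][j]:                # H_{i,j} = 1  =>  edge (c_i, x_j)
                var_adj[j].append(i)
                chk_adj[i].append(j)
    return var_adj, chk_adj

# toy example: 3 checks on 5 variables
H = [[1, 1, 0, 1, 0],
     [0, 1, 1, 0, 1],
     [1, 0, 1, 1, 1]]
var_adj, chk_adj = tanner_graph(H)
```

The degree of variable node \(x_j\) is then simply `len(var_adj[j])`, and both sides of the graph necessarily see the same total number of edges.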

#### 3.1.3. Design of Irregular LDPC Codes

The design of irregular LDPC codes consists of finding good degree distributions, that is, good values \(\Lambda_i\) and \(P_i\) (or \(\text{c}_\text{reg.}\)) such that the rate of the code has the desired value (given by the system designer) and such that the NCG achievable by this code is maximized, that is, the code is able to successfully recover the bit stream at the lowest possible \(E_s/N_0\) value.

A comprehensive body of literature on the design of irregular codes exists and we only introduce the basics to describe the optimization of codes tailored to slip-tolerant differential decoding.

The optimization of irregular LDPC codes requires the use of edge-perspective degree distributions.

**Definition 3. (Edge-Perspective Degree Distribution)**

In the Tanner graph representation of the code, we denote by \(\lambda_i\) the fraction of edges that are connected to variable nodes of degree \(i\). We have

\[\tag{3.4}\lambda_i=\frac{i\cdot{L_i}}{\sum_{j=1}^\infty{j\cdot}L_j}\]

Similarly, \(\rho_i\) denotes the fraction of edges that are connected to check nodes of degree \(i\). Again, we have

\[\rho_i=\frac{i\cdot{R_i}}{\sum_{j=1}^\infty{j\cdot}R_j}\]

Using the technique of EXIT charts, good values of \(\lambda_i\) and potentially \(\rho_i\) may be found that can then be used to design a parity-check matrix \(\pmb{H}\) fulfilling these degree distributions. We constrain the maximum possible variable node degree to be \(\text{v}_\text{max}\) and the maximum possible check node degree to be \(\text{c}_\text{max}\).

The inverse relationship between \(\lambda_i\) and \(L_i\), or between \(\rho_i\) and \(R_i\), respectively, reads

\[\tag{3.5}L_i=\frac{\frac{\lambda_i}{i}}{\sum_{j=1}^{\text{v}_\text{max}}\frac{\lambda_j}{j}}\qquad\text{and}\qquad{R_i}=\frac{\frac{\rho_i}{i}}{\sum_{j=1}^{\text{c}_\text{max}}\frac{\rho_j}{j}}\]
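Equations (3.4) and (3.5) translate directly into code. The sketch below (function names are ours; the dictionaries map a degree \(i\) to \(L_i\) or \(\lambda_i\)) converts the node-perspective profile of Example 3.3 to edge perspective and back:

```python
def node_to_edge(L):
    """Eq. (3.4): lambda_i = i * L_i / sum_j (j * L_j)."""
    total = sum(i * Li for i, Li in L.items())
    return {i: i * Li / total for i, Li in L.items()}

def edge_to_node(lam):
    """Eq. (3.5): L_i = (lambda_i / i) / sum_j (lambda_j / j)."""
    total = sum(li / i for i, li in lam.items())
    return {i: (li / i) / total for i, li in lam.items()}

L = {2: 0.25, 3: 0.5, 5: 0.25}   # node-perspective profile of Example 3.3
lam = node_to_edge(L)
back = edge_to_node(lam)         # round trip recovers L
```

Both profiles sum to one, and the two conversions are exact inverses of each other.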

The (iterative) LDPC decoding process may be understood as a process where two decoders pass information between each other. The first decoder is the **variable node decoder** (VND), which processes each of the \(n\) variable nodes of the code. The second decoder is the **check node decoder** (CND), which processes each of the \(m\) check nodes. Each of these decoders has a certain **extrinsic information transfer** (EXIT) characteristic.

Before describing the transfer characteristics, we introduce the \(J\)-function that interrelates the mean \(\mu\) (and the variance, which amounts to \(2\mu\) in the case of symmetric messages) and the mutual information for the Gaussian random variable describing the messages that are exchanged in the iterative decoder, with

\[J(\mu)=1-\int_{-\infty}^{\infty}\frac{e^{-(\tau-\mu)^2/(4\mu)}}{\sqrt{4\pi\mu}}\log_2(1+e^{-\tau})d\tau\]

which can be conveniently approximated by

\[\begin{align}I&=J(\mu)\approx\left(1-2^{-H_1(2\mu)^{H_2}}\right)^{H_3}\\\mu&=J^{-1}(I)\approx\frac{1}{2}\left(-\frac{1}{H_1}\log_2\left(1-I^{\frac{1}{H_3}}\right)\right)^{\frac{1}{H_2}}\end{align}\]

with \(H_1=0.3073\), \(H_2=0.8935\), and \(H_3=1.1064\).
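Since the approximation pair above is an exact algebraic inverse, implementing it directly is a good consistency check (a sketch; the function names are ours, the constants are those given in the text):

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064

def J(mu):
    """Approximate mutual information of a symmetric Gaussian message with mean mu."""
    if mu <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * (2.0 * mu) ** H2)) ** H3

def J_inv(I):
    """Approximate inverse of J."""
    if I <= 0.0:
        return 0.0
    return 0.5 * (-(1.0 / H1) * math.log2(1.0 - I ** (1.0 / H3))) ** (1.0 / H2)
```

`J` is strictly increasing in \(\mu\), and `J(J_inv(I))` returns \(I\) up to floating-point precision.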

In the case of LDPC codes and transmission over an AWGN channel, the information transfer characteristics are obtained as

\[\tag{3.6}I_E^{[V]}=f_V\left(I_A^{[V]},\frac{E_s}{N_0}\right):=\sum_{i=1}^{\text{v}_\text{max}}\lambda_iJ\left(4\frac{E_s}{N_0}+(i-1)J^{-1}(I_A^{[V]})\right)\]

\[\tag{3.7}I_E^{[C]}=f_C\left(I_A^{[C]}\right):=\sum_{i=1}^{\text{c}_\text{max}}\frac{\rho_i}{\log(2)}\sum_{j=1}^{\infty}\frac{\left(\Phi_j\left(J^{-1}(I_A^{[C]})\right)\right)^{i-1}}{2j(2j-1)}\]

where

\[\Phi_i(\mu)=\int_{-1}^1\frac{2\tau^{2i}}{(1-\tau^2)\sqrt{4\pi\mu}}\exp\left\{-\frac{\left(\mu-\log\frac{1+\tau}{1-\tau}\right)^2 }{4\mu}\right\}d\tau\]

Equation (3.6) describes the characteristic of the VND while (3.7) describes the characteristic of the CND. For codes with regular check node degree, (3.7) can be simplified to

\[I_E^{[C]}=f_C(I_A^{[C]}):=\frac{1}{\log(2)}\sum_{j=1}^{\infty}\frac{\left(\Phi_j\left(J^{-1}(I_A^{[C]})\right)\right)^{\text{c}_\text{reg.}-1}}{2j(2j-1)}\]

As \(I_A^{[V]}=I_E^{[C]}\) holds in the context of iterative decoding, a condition for successful decoding is that

\[\tag{3.8}f_V\left(I,\frac{E_s}{N_0}\right)\gt{f}_C^{-1}(I),\forall{I}\in[0;1)\]

where the inverse function \(f_C^{-1}(I)\) of the strictly monotonically increasing function \(f_C\) given in (3.7) can be found using numerical methods. The task of the code designer is to find a degree distribution minimizing \(E_s/N_0\) such that (3.8) is fulfilled. Usually, the condition (3.8) is evaluated at discrete values of \(I\) only, simplifying the optimization.
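A compact sketch of this convergence check for a check-regular code follows. Two shortcuts, both ours: the CND curve is modeled by the widely used duality approximation \(f_C(I)\approx 1-J\left((\text{c}_\text{reg.}-1)\,J^{-1}(1-I)\right)\) rather than by evaluating (3.7), and since \(f_C\) is increasing, the condition \(f_V(I)\gt f_C^{-1}(I)\) is tested in the equivalent form \(f_C(f_V(I))\gt I\), avoiding the numerical inversion:

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # J-function fit constants from the text

def J(mu):
    return 0.0 if mu <= 0.0 else (1.0 - 2.0 ** (-H1 * (2.0 * mu) ** H2)) ** H3

def J_inv(I):
    if I <= 0.0:
        return 0.0
    return 0.5 * (-(1.0 / H1) * math.log2(1.0 - I ** (1.0 / H3))) ** (1.0 / H2)

def f_V(I, es_n0, lam):
    """VND characteristic, Eq. (3.6): channel mean 4 Es/N0 plus (i-1) incoming edges."""
    return sum(l * J(4.0 * es_n0 + (i - 1) * J_inv(I)) for i, l in lam.items())

def f_C(I, c_reg):
    """CND characteristic via the duality approximation (NOT Eq. (3.7))."""
    return 1.0 - J((c_reg - 1) * J_inv(1.0 - I))

def tunnel_open(es_n0, lam, c_reg, grid=200):
    """Eq. (3.8) evaluated on a discrete grid: f_C(f_V(I)) > I for all I in [0, 1)."""
    return all(f_C(f_V(k / grid, es_n0, lam), c_reg) > k / grid for k in range(grid))
```

For a regular code with \(\lambda_3=1\) and \(\text{c}_\text{reg.}=6\) (rate 1/2), the decoding tunnel is open at a linear \(E_s/N_0\) of 2.0 but closed at 0.3.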

Some more conditions usually apply to the degree distributions. One of these is the so-called stability condition, which, in the case of an AWGN channel ensures that

\[\lambda_2\le\frac{\exp\left(\frac{E_s}{N_0}\right)}{\sum_{i=1}^{\text{c}_\text{max}}\rho_i(i-1)}\]

### 3.2. Code Design for Iterative Differential Decoding

As described in Section 2.5, the differential decoder based on the BCJR algorithm can be characterized by an EXIT characteristic \(I_E^{[D]}=f_D(I_A^{[D]},E_s/N_0)\). Before optimizing the LDPC code toward the interworking with the differential decoding, we first have to define the decoder scheduling as we are concerned with a threefold iterative decoder loop: decoding iterations are carried out within the LDPC decoder and between LDPC decoder and differential decoder.

In this tutorial, we restrict ourselves to the following scheduling:

- (a) In a first initial step, the differential decoder is executed and generates initial channel-related information.
- (b) Using this initial channel-related information, a single LDPC iteration is carried out, that is, a single execution of the check node and variable node computing processors.
- (c) Using the accumulated variable node information from the LDPC graph, excluding the intrinsic channel-related information from the initial differential decoding execution (step (a)), the differential decoder is executed again, yielding improved channel-related information.
- (d) With the improved information from step (c), another single LDPC iteration is carried out. If the maximum number of allowed iterations is not yet reached, we continue with step (c).
- (e) If the maximum number of iterations is reached, the accumulated variable node information is used to get an a posteriori estimate of each bit.

In what follows, we now describe in detail how to find good degree distributions for iterative differential decoding.

Conditions for degree distributions were derived, and it was analyzed whether it is possible to construct codes that work equally well for differential coding and conventional nondifferential transmission. In this work, we solely consider the case of differential coding, and we aim at showing different possibilities of degree distribution optimization with the goal of finding the best option for LDPC-coded differential modulation with the above-mentioned decoder scheduling.

We only consider **column-irregular** codes in the remainder of this tutorial, that is, the number of “1”s in each row of the parity-check matrix \(\pmb{H}\) is constant and amounts to \(\text{c}_\text{reg.}\). Such a constraint is often imposed as it simplifies the hardware that is needed to implement the check node decoding operation, which is the most difficult operation in the LDPC decoder.

The complexity of this operation scales roughly linearly with the check node degree (i.e., the number of “1”s per row) and having a constant degree allows the hardware designer to implement a fixed and optimized check node computation engine.

The second constraint that we impose is that we only have three different variable node degrees, namely \(\Lambda_2\) variable nodes of degree 2, \(\Lambda_3\) variable nodes of degree 3, and \(\Lambda_{\text{v}_\text{max}}\) variable nodes of degree \(\text{v}_\text{max}\).

This is in line with the findings that show that the degree distributions are often sparse and that only a few different values are often sufficient. Having only three different variable node degrees simplifies the hardware implementation, especially the design of the required bit widths in a fixed point implementation.

Contrary to many degree distribution approaches proposed, we first fix the rate \(r\) of the final code as the rate is usually constrained by the system design parameters (e.g., speed of analog-to-digital and digital-to-analog converters, pulse shape, channel bandwidth, framing overhead).

With fixed rate \(r\), we can remove some of the degrees of freedom of the degree distribution. We further assume that no nodes of degree 1 are present in the code, that is, \(\Lambda_1=0\) and thus \(\lambda_1=0\). As \(\sum_i\lambda_i=1\), we can uniquely determine \(\lambda_2\) as

\[\tag{3.9}\lambda_2=1-\sum_{i=3}^{\text{v}_\text{max}}\lambda_i\]

As the rate of the code is given by

\[\tag{3.10}r=1-\frac{\sum_{i=1}^{\text{c}_\text{max}}\frac{\rho_i}{i}}{\sum_{i=1}^{\text{v}_\text{max}}\frac{\lambda_i}{i}}\]

we can eliminate another dependency and by combining (3.10) with (3.9), we get

\[\tag{3.11}\lambda_3=3+6\sum_{i=4}^{\text{v}_\text{max}}\lambda_i\left(\frac{1}{i}-\frac{1}{2}\right)-\frac{6}{1-r}\sum_{i=1}^{\text{c}_\text{max}}\frac{\rho_i}{i}\]

For check-regular codes with check node degree \(\text{c}_\text{reg.}\) (i.e., \(\rho_{\text{c}_\text{reg.}}=1\)), (3.11) can be simplified to

\[\tag{3.12}\lambda_3=3-6\left(\frac{1}{\text{c}_\text{reg.}(1-r)}-\sum_{i=4}^{\text{v}_\text{max}}\lambda_i\left(\frac{1}{i}-\frac{1}{2}\right)\right)\]

This means that \(\lambda_2\) and \(\lambda_3\) are uniquely determined by \(\lambda_4,\lambda_5,...,\lambda_{\text{v}_\text{max}}\). If we only allow \(\lambda_2\), \(\lambda_3\) and \(\lambda_{\text{v}_\text{max}}\) to be nonzero, then \(\lambda_2\) and \(\lambda_3\) are uniquely determined by \(\lambda_{\text{v}_\text{max}}\) and we have

\[\tag{3.13}\lambda_3=3-6\left(\frac{1}{\text{c}_\text{reg.}(1-r)}-\lambda_{\text{v}_\text{max}}\left(\frac{1}{\text{v}_\text{max}}-\frac{1}{2}\right)\right)\]

\[\tag{3.14}\lambda_2=-2-\lambda_{\text{v}_\text{max}}+6\left(\frac{1}{\text{c}_\text{reg.}(1-r)}-\lambda_{\text{v}_\text{max}}\left(\frac{1}{\text{v}_\text{max}}-\frac{1}{2}\right)\right)\]
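In code, (3.13) and (3.14) read as follows. The parameter values below are ours, chosen purely for illustration; an admissible distribution additionally requires \(\lambda_2\ge0\) and \(\lambda_3\ge0\):

```python
def degrees_from_lmax(lam_max, v_max, c_reg, r):
    """Eqs. (3.13)/(3.14): with only degrees 2, 3 and v_max allowed,
    lambda_2 and lambda_3 are uniquely determined by lambda_{v_max}."""
    t = 1.0 / (c_reg * (1.0 - r)) - lam_max * (1.0 / v_max - 0.5)
    lam3 = 3.0 - 6.0 * t
    lam2 = -2.0 - lam_max + 6.0 * t
    return lam2, lam3

# illustrative example: rate 0.8, c_reg = 20, v_max = 10, lambda_{v_max} = 0.4
lam2, lam3 = degrees_from_lmax(0.4, 10, 20, 0.8)
```

By construction \(\lambda_2+\lambda_3+\lambda_{\text{v}_\text{max}}=1\), and plugging the result into (3.10) with \(\rho_{\text{c}_\text{reg.}}=1\) recovers the target rate.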

For determining the degree distribution, the choice of the interleaving scheme between LDPC code and differential encoder/decoder is crucial. In fact, this choice determines how to select the degree distribution and finally has an influence on the overall system performance.

#### 3.2.1. Design of LDPC Codes—Full Interleaving

The first way of interleaving consists of placing a full interleaver \(\Pi^{[\text{diff}]}\) of size \(n\) between differential code and LDPC code, as depicted in Figure 3.17. The interleaver \(\Pi^{[\text{diff}]}\) is placed between the differential decoder and the variable nodes of the LDPC code, such that the interleaved output of the differential decoder mimics the transmission channel output.

As the transmission channel is in this case the combination of differential decoder and interleaver, we need to modify the convergence condition (3.8). Instead of having a function describing the information transfer of the VND, we introduce a function \(f_{V,D}\) that describes the information transfer of the combined differential decoder and VND. This combined information transfer function is given by

\[I_E^{[V,D]}=\sum_{i=1}^{\text{v}_\text{max}}\lambda_iJ\left(\mu_c+(i-1)J^{-1}(I_A^{[V,D]})\right)\]

The value \(\mu_c\) is the mean of the message that is sent from the differential decoder towards the LDPC code. Using the EXIT characteristic of the differential decoder \(f_D(I, E_s/N_0)\), which can be prerecorded and potentially represented by a polynomial, we can express \(\mu_c\) as

\[\mu_c=J^{-1}\left(f_D\left(\sum_{i=1}^{\text{v}_\text{max}}L_iJ\left(i\cdot{J}^{-1}(I_A^{[V,D]})\right),\frac{E_s}{N_0}\right)\right)\]

which leads to the overall EXIT characteristic of the combined VND and differential decoder

\[\tag{3.15}\begin{align}f_{V,D}\left(I,\frac{E_s}{N_0}\right)&=\sum_{i=2}^{\text{v}_\text{max}}\lambda_iJ\left(J^{-1}\left(f_D\left(\sum_{j=2}^{\text{v}_\text{max}}L_jJ\left(j\cdot{J}^{-1}\left(I_A^{[V,D]}\right)\right),\frac{E_s}{N_0}\right)\right)\right.\\&\left.+(i-1)J^{-1}(I_A^{[V,D]})\right)\\&=\sum_{i=2}^{\text{v}_\text{max}}\lambda_iJ\left(J^{-1}\left(f_D\left(\sum_{j=2}^{\text{v}_\text{max}}\frac{\lambda_jJ\left(j\cdot{J}^{-1}(I_A^{[V,D]})\right)}{j\sum_{\kappa=2}^{\text{v}_\text{max}}\frac{\lambda_\kappa}{\kappa}},\frac{E_s}{N_0}\right)\right)\right.\\&\left.+(i-1)J^{-1}(I_A^{[V,D]})\right)\end{align}\]

where we have used (3.5) in the second line of the equation. This leads to the condition for successful decoding

\[\tag{3.16}f_{V,D}\left(I,\frac{E_s}{N_0}\right)\gt{f}_C^{-1}(I),\quad\forall{I}\in[0;1)\]

In this case, the stability condition reads

\[\tag{3.17}\lambda_2\lt\frac{1}{\text{c}_\text{reg.}-1}\exp\left(\frac{J^{-1}\left(f_D\left(1,\frac{E_s}{N_0}\right)\right)}{4}\right)\]

As the function \(f_{V,D}(⋅,⋅)\) is not linear in \(\lambda_i\) (and not even necessarily convex), the elegant linear programming-based optimization cannot be applied. We have to resort to heuristic optimization methods such as differential evolution or simulated annealing.

If we assume, however, that the degree distribution only consists of three degrees 2, 3, and \(\text{v}_\text{max}\), then we have seen before that by fixing \(\lambda_{\text{v}_\text{max}}\), the values of \(\lambda_2\) and \(\lambda_3\) are immediately given. Thus, the problem of finding the optimal degree distribution reduces to a one-dimensional problem. By sweeping \(\lambda_{\text{v}_\text{max}}\) between the extremes of the admissible interval [0, 1], we can find the best possible degree distribution.

We use the following binary search to find the best possible degree distribution. We first assume that the differential decoder EXIT characteristic is available for any \(E_s/N_0\) value between \(\frac{E_s}{N_0}|_\text{min}\) and \(\frac{E_s}{N_0}|_\text{max}\). We fix a minimum step size \(\Delta_\text{min}\) and use Algorithm 3.1, which outputs the optimum \(\text{c}_\text{reg.}\), \(\lambda_2\), \(\lambda_3\), and \(\lambda_{\text{v}_\text{max}}\).

We have found that using only three different variable node degrees and only a fixed check node degree does not impose a noteworthy limitation and that the performance of the obtained codes is very close to the performance of codes designed with fewer constraints, provided that \(\text{v}_\text{max}\) is chosen large enough.

The full interleaving scheme has several limitations. For instance, the EXIT chart-based optimization assumes that the messages exchanged between the different decoder components are Gaussian distributed.

This is, however, not the case when interleaving the outputs of all different variables; in this case, the messages show rather a distribution that can be described by a Gaussian mixture, leading to inaccuracies of the model.

Even though the messages may be conveniently approximated by Gaussian distributions (if the variances of the different parts of the mixture do not vary much), the codes designed according to this model may not yield the best possible performance.

#### 3.2.2. Design of LDPC Codes—Partial Interleaving

In order to mitigate the limitations of the full interleaving approach of the previous section, we replace the single interleaver \(\Pi^{[\text{diff}]}\) by multiple partial interleavers.

In the partial interleaving case, we group all \(\Lambda_i\) variable nodes of degree \(i\), assign an interleaver of size \(\Lambda_i\) to these nodes, and employ a separate differential decoder/encoder for this group of variable nodes. The graph-based model with partial interleaving is shown in Figure 3.18 (where we assume that each \(\Lambda_i\) is a multiple of \(V\)).

If partial interleaving is used, the equations for analyzing the convergence and finding good degree distributions have to be modified as well. In this case, every variable node group (of degree \(i\)) has to be treated separately and is assigned its own differential decoder output \(\mu_{c,i}\) and we can write

\[\tag{3.18}I_E^{[V,D]}=\sum_{i=1}^{\text{v}_\text{max}}\lambda_iJ\left(\mu_{c,i}+(i-1)J^{-1}(I_A^{[V,D]})\right)\]

where \(\mu_{c,i}\) can be computed as

\[\mu_{c,i}=J^{-1}\left(f_D\left(J\left(i\cdot{J}^{-1}(I_A^{[V]})\right),\frac{E_s}{N_0}\right)\right)\]

This leads to the overall EXIT characteristic of the combined variable node differential decoder

\[f_{V,D}\left(I,\frac{E_s}{N_0}\right):=\sum_{i=1}^{\text{v}_\text{max}}\lambda_iJ\left(J^{-1}\left(f_D\left(J\left(i\cdot{J}^{-1}(I)\right),\frac{E_s}{N_0}\right)\right)+(i-1)J^{-1}(I)\right)\]

which is a linear function in \(\lambda\).

Due to the linearity of \(f_{V,D}(⋅,⋅)\), we can employ a simple linear programming optimization to find good values of \(\lambda_i\) for a given check node degree distribution.
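The linearity in \(\lambda\) can be checked numerically. The sketch below computes the consistent-Gaussian \(J\)-function by numerical integration, inverts it by bisection, and evaluates the combined characteristic; `f_D_toy` is a hypothetical placeholder for a detector transfer curve (not an actual differential-decoder characteristic), and the \(E_s/N_0\) argument of \(f_D\) is suppressed.

```python
import numpy as np

def J(mu, n=20001):
    """Mutual information of a consistent Gaussian LLR L ~ N(mu, 2*mu)."""
    if mu <= 0.0:
        return 0.0
    sigma = np.sqrt(2.0 * mu)
    x = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, n)
    pdf = np.exp(-((x - mu) ** 2) / (4.0 * mu)) / np.sqrt(4.0 * np.pi * mu)
    return 1.0 - np.sum(np.log2(1.0 + np.exp(-x)) * pdf) * (x[1] - x[0])

def J_inv(I, hi=200.0):
    """Invert the monotone function J by bisection."""
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if J(mid) < I:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_VD(I, lam, f_D):
    """Combined variable-node / differential-decoder EXIT characteristic with
    partial interleaving; a weighted sum, hence linear in the lam[i]."""
    mu_a = J_inv(I)
    out = 0.0
    for i, li in lam.items():
        mu_ci = J_inv(f_D(J(i * mu_a)))          # per-degree detector output
        out += li * J(mu_ci + (i - 1) * mu_a)    # variable-node combining
    return out

# Hypothetical smooth detector transfer curve (placeholder only):
f_D_toy = lambda I: 0.3 + 0.65 * I

lam_a = {2: 0.5, 3: 0.3, 12: 0.2}
lam_b = {2: 0.2, 3: 0.5, 12: 0.3}
lam_mix = {i: 0.5 * (lam_a[i] + lam_b[i]) for i in lam_a}
```

Because `f_VD` is a weighted sum whose terms do not depend on the weights, any convex combination of degree distributions produces the same convex combination of characteristics, which is exactly what the linear program exploits.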

However, if we only allow three different variable node degrees (and thus three partial interleavers), the problem reduces to a one-dimensional problem again, which we can solve in a similar way (using Algorithm 3.1) as for the case with full interleaving.

We would like to point out, though, that due to the linearity of \(f_{V,D}(⋅,⋅)\), it is much easier to find optimal degree distributions: the linear program is guaranteed to find the global optimum, whereas numerical methods such as differential evolution cannot guarantee this.

#### 3.2.3. Comparison of Interleaving Schemes—Results

Using both interleaving schemes and differential QPSK transmission, we design degree distributions. We impose the constraint that the maximum variable node degree shall be \(\text{v}_\text{max}=12\).

For each interleaving scheme, we either use Gray differential encoding or natural differential encoding. For both options, we have optimized codes using Algorithm 3.1 with variable degrees \(\in\{2,3,12\}\) and regular check node degree.

We additionally design codes under the constraint \(L_2\le1-r\). This constraint limits the number of degree-2 nodes: it can be shown that an error floor can occur if the fraction of degree-2 nodes is larger than \(1-r\).

If we impose the constraint \(L_2\le1-r\), then we can design a code that avoids—in the graph description—cycles containing only degree-2 nodes and with no information bits assigned to degree-2 nodes.

The condition \(L_2\le1-r\) translates in the general case into

\[\lambda_2\le2\left(\frac{1}{r}-1\right)\sum_{j=3}^{\text{v}_\text{max}}\frac{\lambda_j}{j}\]

which can be added to Algorithm 3.1.
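The equivalence between the node-perspective condition \(L_2\le1-r\) and the edge-perspective inequality above can be verified numerically; the helper names below are illustrative.

```python
def node_fractions(lam):
    """Convert edge-perspective lambda_i into node-perspective fractions L_i."""
    tot = sum(li / i for i, li in lam.items())
    return {i: (li / i) / tot for i, li in lam.items()}

def constraint_forms_agree(lam, r):
    """Check that L_2 <= 1-r holds exactly when
    lambda_2 <= 2*(1/r - 1) * sum_{j>=3} lambda_j / j."""
    node_form = node_fractions(lam).get(2, 0.0) <= 1.0 - r
    edge_form = lam.get(2, 0.0) <= 2.0 * (1.0 / r - 1.0) * sum(
        lj / j for j, lj in lam.items() if j >= 3)
    return node_form == edge_form
```

The equivalence is purely algebraic: substituting \(L_2=(\lambda_2/2)/\sum_j\lambda_j/j\) into \(L_2\le1-r\) and solving for \(\lambda_2\) gives the edge-perspective form.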

We generate codes of target rate \(r=\frac{4}{5}=0.8\), a rate typically used in optical communications. A rate of \(r=0.8\) is a viable selection for current and future 100 Gbit/s (with QPSK) or 200 Gbit/s systems operating in a dense wavelength division multiplexing (DWDM) setting with 50 GHz channel spacing and an exploitable bandwidth of roughly 37.5 GHz due to repeated filtering with non-flat frequency characteristics.

The best possible achievable values of \(E_s/N_0\) (corresponding to \(E_\text{best}\) in Algorithm 3.1) for the different code designs are shown in Table 3.4 and the degree distributions of the resulting codes are summarized in Table 3.5. We can see that Gray differential coding with partial interleaving leads to the best coded transmission schemes operating at the lowest possible \(E_s/N_0\) values.

Figure 3.19 shows first simulation results for the case of QPSK modulation. We have constructed codes of size \(n=32,000\) having the degree distributions from Table 3.5. The utilized channel model is the phase-slip channel model from Figure 3.3 with \(\gamma=0\). In this example, we wish to confirm the results of Table 3.4.

As a reference scheme, we optimized an LDPC code for a simple AWGN channel with \(\text{v}_\text{max}=12\) and regular check node degree \(\text{c}_\text{reg.}\), which is used with noniterative differential decoding.

We can see that the results of Table 3.4 are indeed confirmed: the code with Gray differential coding and partial interleaving yields the best performance. As the decoder, we use a conventional decoder with 18 decoding iterations, where the differential decoder is invoked in each iteration.

Note that with the use of a layered decoder, the convergence speed can be increased and the same performance can be obtained by using only approximately 12 layered iterations. For this reason, the choice of 18 iterations is practical, as 12 layered LDPC iterations are deemed to be implementable.

In the second example, we increase the phase-slip probability on the channel by choosing \(\gamma=0.2\). The simulation results for this case are shown in Figure 3.20.

We can observe that the formerly best case with Gray differential coding now shows a significant error floor. With natural differential coding, an error floor is observed as well, however at a level several orders of magnitude lower.

This floor is mainly due to the large number of degree-2 nodes and the fact that \(I_E^{[D]}(I_A^{[D]}=1)\lt1\) if \(\gamma\gt0\). Indeed, for this optimization, most of the variable nodes are of degree 2; that is, \(\lambda_2\) and consequently \(L_2\) become very large.

It has also been observed that LDPC codes designed for differentially coded modulation require many degree-2 nodes. Degree-2 variable nodes are, however, a non-negligible contributor to the error floor, especially if there are cycles in the graph that connect only degree-2 variable nodes.

It has been shown that cycles containing only degree-2 variable nodes can be avoided if \(\Lambda_2\le{m}=n(1-r)\), that is, if \(L_2\le1-r\), which is why we have included that constraint into the optimization.

In this case, we may design a systematic code and assign only parity bits to the degree-2 variable nodes. This further reduces the error floor as the BER is calculated purely based on the systematic bits and the higher the variable node degree, the more reliable a bit is after decoding.

We have added the constraint \(L_2\le1-r\) to the optimization routine and have found the corresponding degree distribution (summarized in Table 3.5). In the simulation results shown in Figure 3.19, the performance of the code obtained with the \(L_2\le1-r\) condition is shown by the curve with square markers for Gray differential coding.

We see that in this case, we get a slightly better performance than natural differential coding as well as advantages with respect to the residual BER, at the expense of an increased \(E_s/N_0\) required for successful decoding in the case where \(\gamma=0\) (see Figure 3.19).

Thus, depending on the required target BER, we may either use the condition \(L_2\le1-r\) or not. If an outer code is used that can correct up to an input BER of \(4.2\times10^{-3}\) (e.g., the staircase code) we may use the code designed without the constraint on \(L_2\), but if we use a higher rate outer code that requires a very low input BER, we may select the code designed with \(L_2\le1-r\).

We can thus summarize that there are somewhat conflicting code design strategies. If \(\gamma=0\), we can use the code optimized for Gray differential coding with partial interleaving. This code will, however, lead to an elevated error floor if phase slips occur on the channel. This error floor has to be combated either with a properly designed outer code or by using only the code with the constraint \(L_2\le1-r\), which however leads to a suboptimal performance in the phase-slip-free case (\(\gamma=0\)).

Another solution is the implementation of two codes, one for each case (\(\gamma=0\) and large \(\gamma\)). This latter method can guarantee the best performance depending on the channel, but requires a feedback loop from the receiver to the transmitter, which may not be available in the network. It also requires the implementation of two different codes, which may be too complex on an application-specific integrated circuit (ASIC). In Section 4, we show a solution that requires only a single code and shows a more universal, channel-agnostic behavior.

### 3.3. Higher-Order Modulation Formats with V < Q

In practical systems, we often have to deal with the case where \(V<Q\), for example, if 16QAM is used, where we have \(V=4\) and \(Q=16\). In this case, we may use different techniques to optimize the code.

We propose to refine the method of partial interleaving and to use only differential coding on a fraction of \(\frac{\log_2V}{\log_2Q}=\frac{v}{q}\) of the bits. This is shown in Figure 3.21.

In this case, the value \(\mu_{c,i}\) required in (3.18) is computed as

\[\mu_{c,i}=\frac{v}{q}J^{-1}\left(f_D\left(J\left(i\cdot{J}^{-1}(I_A^{[V]})\right),\frac{E_s}{N_0}\right)\right)+\left(1-\frac{v}{q}\right)\bar{\mu}_c\]

where \(\bar{\mu}_c\) denotes the mean of the LLRs of the bits that are not differentially encoded, obtained using a conventional bit-wise decoder (see (3.25)) and averaged over all these bits. These bits correspond to the part of the constellation encoded with the rotationally invariant mapping (symbols within a region associated with a state \(\text{S}_i\)).

Besides this simple and direct approach, we can also use more involved methods, for example, using the technique of multi-edge-type (MET) codes.

## 4. Coded Differential Modulation with Spatially Coupled LDPC Codes

In the 1960s and the following decades, most coding research focused on block coding techniques, but many practical coding schemes were based upon convolutional codes.

With the advent of turbo codes, the rediscovery of LDPC codes, and advances in semiconductor technology, this suddenly changed so that today most new coding schemes are, again, block codes. The trend is, however, to return to convolutional-like structures that can be efficiently encoded and decoded using sliding-window techniques.

In the last few years, the class of spatially coupled (SC) code ensembles has emerged. Spatially coupled codes were originally introduced more than a decade ago and were then called LDPC convolutional codes.

The appealing properties of SC codes were only recently noticed, when it was found that the performance of terminated SC-LDPC codes with simple belief propagation decoding approaches the maximum a posteriori (MAP) thresholds of the underlying ensemble.

Thus, contrary to a common belief, introducing structure into LDPC codes leads to a class of (degenerate) realizations of LDPC codes that demonstrate superior performance under belief propagation decoding.

This effect of threshold saturation has been analyzed for the BEC and it has been shown that spatially coupled LDPC codes can asymptotically achieve the MAP threshold of the underlying ensemble under belief propagation decoding.

Recently, this result has been extended to more general channels and it has been shown that spatially coupled regular LDPC ensembles universally achieve capacity over binary-input memoryless output-symmetric channels: most codes in this ensemble are good for each channel realization in this class of channels.

Spatially coupled codes are now emerging in various applications. Two examples in the context of optical communications are the staircase code and the braided BCH codes, which are both rate \(R=239/255\) codes targeted for 100 Gbit/s applications with hard-decision decoding.

Both codes are spatially coupled BCH product codes that allow for a natural windowed decoder implementation. These codes can be interpreted as being generalized spatially coupled LDPC codes with variable node degree \(d_v=2\) and every bit participating in two BCH component codes, where the component BCH codes are able to correct up to 4 errors each.

Another example is the IEEE 1901 power line communications standard, where an LDPC convolutional code is specified for the wavelet physical layer.

### 4.1. Protograph-Based Spatially Coupled LDPC Codes

In this tutorial, we restrict ourselves to the class of protograph-based construction of SC-LDPC codes.

A protograph is a convenient way of describing LDPC codes. Protograph codes are constructed from the \(P\)-cover of a relatively small graph that conveys the main properties of the code.

In contrast to the graph representation of the LDPC code, the protograph may contain multiple edges. The code itself is constructed by placing \(P\) copies of the protograph next to each other (note that these have no interconnecting edges) and permuting the edges between the different copies of the protograph, such that the relations between the edge groups are respected. The construction of a small toy code is illustrated in Example 3.4.

**Example 3.4**

We illustrate the construction of larger codes based on protographs using a simple toy example. Starting with a prototype matrix, also called *protomatrix*

\[\pmb{B}=\begin{pmatrix}1&2&3&0\\1&2&0&2\end{pmatrix}\]

we show how a \(P\)-cover is constructed. First, we construct an equivalent graph of the base matrix in the same way as we constructed the graph of the LDPC code in Section 3.1.2. The difference is that the nonzero entries in \(\pmb{B}\) indicate the number of **parallel edges** connecting the variables with the checks.

The graph representation of the protomatrix \(\pmb{B}\) is given by

In the next step, we construct the \(P\)-cover of this graph, which means that we simply place \(P\) copies of this graph next to each other:

This graph is still not a valid LDPC code as it contains parallel edges and the \(P\) subgraphs are not connected. In order to remove the parallel edges and to construct a more randomized code, we permute in a next step all edges that are within one **edge group**, that is, that correspond to a single entry \(B_{i,j}\) of \(\pmb{B}\). This permutation is performed in such a way that no parallel edges persist. The final code graph is then obtained by

Note that it is **not** possible to draw this code with a single interleaver \(\Pi^{[\text{LDPC}]}\) as the code in Figure 3.16, because it is actually a multi-edge-type ensemble and for every single entry \(B_{i,j}\) of \(\pmb{B}\), an individual interleaver is required.

The parity-check matrix \(\pmb{H}\) can be constructed from \(\pmb{B}\) by replacing each entry \(B_{i,j}\) (row \(i\), column \(j\)) of \(\pmb{B}\) by the superposition of \(B_{i,j}\) permutation matrices, chosen such that no two “1”s are in the same position (avoiding parallel edges in the final graph).

For example, a parity-check matrix corresponding to the above-mentioned \(\pmb{B}\) with \(P=5\) is given by
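The example matrix itself is not reproduced here, but a valid lifting with these parameters can be generated programmatically. The sketch below uses distinct random cyclic shifts per entry, which automatically prevents overlapping ones; this is a minimal illustration (practical constructions additionally avoid short cycles).

```python
import numpy as np

def lift_protomatrix(B, P, rng):
    """Replace each entry B[i][j] by a superposition of B[i][j] cyclic
    permutation matrices of size P x P with distinct shifts, so that no
    two ones coincide (no parallel edges in the lifted graph).
    Requires B[i][j] <= P for every entry."""
    B = np.asarray(B)
    m, n = B.shape
    H = np.zeros((m * P, n * P), dtype=int)
    I = np.eye(P, dtype=int)
    for i in range(m):
        for j in range(n):
            # B[i,j] distinct circulant shifts, chosen at random
            for s in rng.choice(P, size=B[i, j], replace=False):
                H[i*P:(i+1)*P, j*P:(j+1)*P] += np.roll(I, s, axis=1)
    return H

rng = np.random.default_rng(0)
B = [[1, 2, 3, 0],
     [1, 2, 0, 2]]
H = lift_protomatrix(B, 5, rng)
```

Within each \(P\times P\) block, every position corresponds to exactly one shift value, so distinct shifts guarantee a 0/1 matrix with block row and column degrees equal to the protomatrix entries.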

We follow the standard approach for describing protograph-based SC-LDPC codes. The protograph of a time-invariant, terminated spatially coupled LDPC code with syndrome former memory \(m_s\) and replication factor \(L\) is obtained from a collection of \((m_s+1)\) distinct protomatrices \(\pmb{B}_i\), \(i\in\{0, 1,…,m_s\}\), each of size dim \(\pmb{B}_i=m'\times{n}'\). The protomatrix of the spatially coupled code is then given by

\[\pmb{B}^{[\text{conv}]}(L)=\begin{bmatrix}\pmb{B}_0\\\pmb{B}_1&\pmb{B}_0&&\\\vdots&\pmb{B}_1&\ddots&\\\pmb{B}_{m_s}&\vdots&\ddots&\pmb{B}_0\\&\pmb{B}_{m_s}&\ddots&\pmb{B}_1\\&&\ddots&\vdots\\&&&\pmb{B}_{m_s}\end{bmatrix}_{(L+m_s)m'\times{L}n'}\]

\(\pmb{B}^{[\text{conv}]}(L)\) can also be viewed as being composed of a stack of \(L+m_s\) shifted (by \(n'\)) and overlapping versions of

\[\pmb{B}_r=(\pmb{B}_{m_s}\quad\ldots\quad\pmb{B}_0)\]
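The assembly of \(\pmb{B}^{[\text{conv}]}(L)\) from \(\pmb{B}_0,\ldots,\pmb{B}_{m_s}\) can be sketched as follows; the example parameters reuse the \(\pmb{B}_0=\pmb{B}_1=\pmb{B}_2=(1\;1\;1\;1\;1)\) protographs from Section 4.2 with a small \(L\).

```python
import numpy as np

def coupled_protomatrix(Bs, L):
    """Stack L shifted copies of (B_0, ..., B_ms) into the terminated
    convolutional protomatrix B^[conv](L)."""
    ms = len(Bs) - 1
    mp, np_ = np.asarray(Bs[0]).shape
    Bc = np.zeros(((L + ms) * mp, L * np_), dtype=int)
    for t in range(L):                 # one column block per spatial position
        for i, Bi in enumerate(Bs):    # B_i lands i block-rows below B_0
            Bc[(t + i)*mp:(t + i + 1)*mp, t*np_:(t + 1)*np_] = Bi
    return Bc

B0 = B1 = B2 = np.ones((1, 5), dtype=int)
Bc = coupled_protomatrix([B0, B1, B2], L=4)
```

Note how every variable keeps the full degree \(m_s+1=3\), while the check degrees are reduced at the two boundaries; this is exactly the termination effect that seeds the decoding wave.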

Note that the termination, which cuts the boundaries of the stacked matrix, leads to an inevitable rate loss, which becomes, however, negligible for increasing \(L\). The rate of the code amounts to

\[r_L=1-\left(\frac{L+m_s}{L}\right)\frac{m'}{n'}\]

If we are allowed to choose \(L\) large enough, we immediately see that \(\lim_{L\rightarrow\infty}r_L=1-\frac{m'}{n'}\), which corresponds to the rate of the original protomatrices \(\pmb{B}_i\).
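As a quick numeric check of the rate formula (parameter names are illustrative): for \(m'=1\), \(n'=5\), \(m_s=2\), and \(L=100\), the parameters used later in Section 4.4, the terminated rate is \(r_{100}=0.796\), already close to the asymptotic \(0.8\).

```python
def coupled_rate(L, ms, m_prime, n_prime):
    """Rate of the terminated SC-LDPC code: r_L = 1 - ((L+ms)/L) * m'/n'."""
    return 1.0 - ((L + ms) / L) * (m_prime / n_prime)
```

The rate loss \((m_s/L)\,m'/n'\) vanishes as \(1/L\), confirming the limit stated above.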

The convergence of spatially coupled codes is by now well understood: a thorough analysis has been given for the BEC and extended to general binary-input memoryless channels. The convergence behavior of the iterative decoder can be subdivided into two convergence regions:

- The region of macro-convergence, where convergence is dominated by the code’s degree distribution and convergence prediction by conventional EXIT charts is possible.
- The region of micro-convergence, where the convergence is dominated by spatial coupling and termination effects.

The region of **macro-convergence** is observed in the first decoding iterations and at high channel SNRs, whereas the region of **micro-convergence** is observed at low SNRs close to the thresholds of the code.

In the region of micro-convergence, the decoding process can be visualized as a decoding wave that slowly progresses through the graph from the boundaries onward.

This wave-like behavior allows the efficient design of windowed decoders where the decoding window follows the decoding wave. The understanding of the dynamics of the decoding wave, especially its speed, is essential for designing high-performance codes and effective windowed decoders.

### 4.2. Spatially Coupled LDPC Codes with Iterative Demodulation

In this section, we combine spatially coupled codes with a differential decoder and use common analysis techniques to show how the detector front-end influences the performance of the codes.

As we have seen, in conventional LDPC code design, usually the code needs to be “matched” to the transfer curve of the detection front-end. If the code is not well matched to the front-end, a performance loss occurs.

If the detector front-end has highly varying characteristics, due to, for example, varying channels or varying phase-slip probabilities, several codes need to be implemented and always the right code needs to be chosen for maximum performance, which appears impractical in optical networks where a feedback channel from the receiver to the transmitter can potentially be difficult to realize.

In contrast to a random ensemble of the same degree profile, spatially coupled LDPC codes can converge below the pinch-off in the EXIT chart, so even if the code is not well matched to the differential decoder we can hope to successfully decode due to the micro-convergence effect.

So, even with a varying channel and detector characteristics, we can use a single code that is universally good in all scenarios. This means that the code design can stay agnostic to the channel/detector behavior.

We determine the thresholds of the protograph-based spatially coupled codes combined with demodulation and detection by an extension of the PEXIT technique with the Gaussian approximation of the check node operation.

A refined version of the PEXIT technique taking into account the windowed decoder of spatially coupled codes has been presented. The mutual information analysis used for the design of degree distributions in LDPC codes has to be modified slightly to account for the protograph structure.

Instead of analyzing and tracking a single mutual information value \(I\), we now have to track an individual mutual information value for each nonzero entry at row \(i\) and column \(j\) of the protomatrix \(\pmb{B}^{[\text{conv}]}\).

We denote the respective outgoing (incoming) edge mutual information by \(I_{E,i,j}^{[V,D]}(I_{A,i,j}^{[V,D]})\) or \(I_{E,i,j}^{[C]}(I_{A,i,j}^{[C]})\), depending on whether the message is computed by the combined variable node detector engine (“V, D”) or by the check node engine (“C”). Note that we assume that the messages are Gaussian distributed and can be described by a single parameter: their mean \(\mu\) (with the variance \(2\mu\)).

Similar to the previous section, we assume that the demodulator/detector properties can be described by means of an EXIT characteristic, which we denote by \(f_D(⋅, ⋅)\). If the message at the input of the detector is Gaussian distributed with mean \(\mu\), then the detector output mean for (protograph) variable \(j\) is obtained by

\[\tag{3.20}\mu_{c,j}=J^{-1}\left(f_D\left(J\left(\sum_{i=1}^{(L+m_s)m'}B_{i,j}^{[\text{conv}]}J^{-1}(I_{A,i,j}^{[V,D]})\right),\frac{E_s}{N_0}\right)\right)\]

leading to the combined variable node and detector update characteristic

\[\tag{3.21}I_{E,i,j}^{[V,D]}=J\left(\mu_{c,j}+\left(B_{i,j}^{[\text{conv}]}-1\right)J^{-1}\left(I_{A,i,j}^{[V,D]}\right)+\sum_{\substack{k=1\\k\ne{i}}}^{(L+m_s)m'}B_{k,j}^{[\text{conv}]}J^{-1}\left(I_{A,k,j}^{[V,D]}\right)\right)\]

which has to be evaluated for all \((i, j)\in[1,…, (L + m_s)m']\times[1,…, Ln']\) where \(B_{i,j}^{[\text{conv}]}\ne0\).

The check node information is computed as

\[\tag{3.22}I_{E,i,j}^{[C]}=J\left(\phi^{-1}\left(1-\left[1-\phi\left(J^{-1}(I_{A,i,j}^{[C]})\right)\right]^{B_{i,j}^{[\text{conv}]}-1}\prod_{\substack{k=1\\k\ne{j}}}^{Ln'}\left[1-\phi\left(J^{-1}(I_{A,i,k}^{[C]})\right)\right]^{B_{i,k}^{[\text{conv}]}}\right)\right)\]

which again has to be evaluated for all combinations of \((i, j)\) such that \(B_{i,j}^{[\text{conv}]}\ne0\). The function \(\phi(x)\), which is used to compute the evolution of the mean of the Gaussian messages in the check node update, is given by

\[\tag{3.23}\phi(x)=\begin{cases}1-\frac{1}{\sqrt{4\pi{x}}}\int_{-\infty}^{\infty}\tanh\left(\frac{u}{2}\right)\exp\left(-\frac{(u-x)^2}{4x}\right)\mathrm{d}u&\text{if }x\gt0\\1&\text{if }x=0\end{cases}\]
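The function \(\phi(x)\) can be evaluated numerically by integrating its defining expression directly; the variance \(2x\) of \(u\) is consistent with the mean-\(\mu\) parameterization of the messages. The finite integration range of \(\pm10\) standard deviations is a numerical approximation.

```python
import numpy as np

def phi(x, n=20001):
    """phi(x) = 1 - E[tanh(u/2)] with u ~ N(x, 2x), per the definition above."""
    if x <= 0.0:
        return 1.0
    sigma = np.sqrt(2.0 * x)
    u = np.linspace(x - 10.0 * sigma, x + 10.0 * sigma, n)
    pdf = np.exp(-((u - x) ** 2) / (4.0 * x)) / np.sqrt(4.0 * np.pi * x)
    return 1.0 - np.sum(np.tanh(u / 2.0) * pdf) * (u[1] - u[0])
```

As expected for a check node update, \(\phi\) decreases monotonically from \(\phi(0)=1\) toward 0 as the message mean grows.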

The evaluation of the information is carried out in an iterative way:

First, we initialize the process by setting \(I_{A,i,j}^{[V,D]}(1)=0\) for all possible \((i, j)\) where the “(1)” denotes the first iteration.

Using (3.20), we first compute \(\mu_{c,j}(1),\forall{j}\in[1,Ln']\) and use \(\mu_{c,j}(1)\) to compute \(I_{E,i,j}^{[V,D]}(1)\) by evaluating (3.21) for all \((i, j)\in[1,…, (L+m_s)m']\times[1,…, Ln']\).

We then set \(I_{A,i,j}^{[C]}(1)=I_{E,i,j}^{[V,D]}(1)\) and evaluate (3.22) yielding \(I_{E,i,j}^{[C]}(1)\). By setting \(I_{A,i,j}^{[V,D]}(2)=I_{E,i,j}^{[C]}(1)\) we may proceed to the second iteration and compute \(\mu_{c,j}(2)\), \(I_{E,i,j}^{[V,D]}(2)\) and \(I_{E,i,j}^{[C]}(2)\) in this sequence.

Finally, after \(\mathcal{I}\) iterations, we may—for each variable node in the protograph—determine the a posteriori reliability by

\[I_{\text{ap},j}^{[V]}=J\left(\mu_{c,j}+\sum_{k=1}^{(L+m_s)m'}B_{k,j}^{[\text{conv}]}J^{-1}\left(I_{A,k,j}^{[V,D]}(\mathcal{I})\right)\right)\]

We illustrate the behavior of \(I_{\text{ap},j}^{[V]}\), which gives an indication of the reliability of the \(P\) bits that will be assigned to position \(j\) in the protograph by means of an example.

We consider a spatially coupled code of rate \(r=0.8\) with \(m_s=2\) and with \(\pmb{B}_0=\pmb{B}_1=\pmb{B}_2= (1\;1\;1\;1\;1)\). We use QPSK with Gray differential coding and set \(\gamma=0.2\) at \(E_s/N_0=4.8\) dB; that is, according to (3.3), phase slips occur on the channel with a probability \(P_\text{slip}\approx0.0082\).

Figure 3.22 shows the behavior of the **a posteriori** mutual information \(I_{\text{ap},j}^{[V]}(\mathcal{I})\) as a function of the decoding iterations \(\mathcal{I}\).

We can see that the mutual information \(I_{\text{ap},j}^{[V]}\) increases in a wave-like manner. Starting from the boundaries of the codeword, the mutual information converges toward 1 with an increasing number of iterations from the outside toward the inside until both waves meet and the whole codeword has been successfully decoded.
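The wave effect can also be reproduced with a much simpler model: density evolution for a regular spatially coupled ensemble on the BEC. This is an illustrative simplification of the Gaussian-approximation analysis above, not the codes of this section; the \((d_v,d_c)=(3,6)\) ensemble and erasure probability \(\epsilon=0.45\) (above the uncoupled BP threshold of about 0.429 but below the coupled threshold) are assumptions for the demonstration.

```python
import numpy as np

def de_coupled(eps, L=50, w=3, dv=3, dc=6, iters=100000):
    """BEC density evolution for a (dv, dc)-regular spatially coupled
    ensemble with uniform coupling window w; w=1 recovers the uncoupled
    ensemble. Positions outside the chain are known (erasure prob. 0),
    which seeds the decoding waves at both boundaries."""
    x = np.full(L, eps)
    kernel = np.ones(w) / w
    pad = np.zeros(w - 1)
    for _ in range(iters):
        xp = np.concatenate([pad, x, pad])
        y = np.convolve(xp, kernel, mode='valid')          # check-side averages
        z = 1.0 - (1.0 - y) ** (dc - 1)                    # check-node update
        x_new = eps * np.convolve(z, kernel, mode='valid') ** (dv - 1)
        if np.max(np.abs(x_new - x)) < 1e-14:              # fixed point reached
            return x_new
        x = x_new
    return x
```

Plotting the intermediate erasure profiles shows two fronts moving inward from the boundaries, exactly the qualitative behavior of \(I_{\text{ap},j}^{[V]}\) in Figure 3.22; the uncoupled ensemble at the same \(\epsilon\) remains stuck at a nonzero fixed point.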

### 4.3. Windowed Differential Decoding of SC-LDPC Codes

By observing the wave-like decoding behavior in Figure 3.22, we note that only parts of the protograph get updated when carrying out iterations.

For instance, the reliabilities of the protograph variable node indices 110–140 stay at an almost constant value during the first 200 iterations. Likewise, the protograph variable node indices 1–30 have already converged to 1 after 110 iterations and do not benefit from additional decoding iterations anymore.

This wave-like behavior thus leads to an efficient windowed decoder, which follows the decoding wave and only carries out operations on the part of the protograph that benefits from further decoding iterations.

The windowed decoder works in principle as an LDPC decoder, just with the difference that it operates on a fraction of the parity-check matrix. The windowed decoder is characterized by a window length \(w\) and operates on a portion of the protomatrix containing \(w\) vertically stacked copies of \(\pmb{B}_r\).

The decoder takes a block of \((w+m_s)n'\) protograph variables (i.e., \((w+m_s)Pn'\) code bits), and carries out \(\mathcal{I}_w\) decoding iterations on this block (using the conventional decoding scheme).

After having carried out \(\mathcal{I}_w\) iterations, the window is shifted by \(n'\) protograph variables (\(Pn'\) code bits) and the left-most portion of \(Pn'\) code bits are considered to be decoded. Then, the process starts again.
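The sliding-window schedule described above can be sketched as follows; the exact indexing convention (window starting at spatial position \(t\), spanning up to \(w+m_s\) column blocks, clipped at the end of the terminated chain) is an assumption for illustration.

```python
def window_schedule(L, ms, w):
    """Column blocks (spatial positions) covered by the decoding window at
    each step. After I_w iterations the window shifts by one position and
    the left-most position t is released as decoded."""
    return [list(range(t, min(t + w + ms, L))) for t in range(L)]

sched = window_schedule(L=100, ms=2, w=7)
```

A full window covers \((w+m_s)n'\) protograph variables, i.e., \((w+m_s)Pn'\) code bits, matching the description above; at the end of the chain the window simply shrinks.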

At the beginning of the process, the decoding window is initialized with perfect knowledge of the boundary values.

### 4.4. Design of Protograph-Based SC-LDPC Codes for Differential-Coded Modulation

We show by means of an example how to construct good protographs for SC-LDPC codes in the context of differential-coded modulation. For simplicity, we restrict ourselves to SC-LDPC codes with \(m_s=2\) leading to a protomatrix given by

\[\pmb{B}_c=\begin{pmatrix}\pmb{B}_0\\\pmb{B}_1&\pmb{B}_0\\\pmb{B}_2&\pmb{B}_1&\ddots\\&\pmb{B}_2&\ddots&\pmb{B}_0\\&&\ddots&\pmb{B}_1\\&&&\pmb{B}_2\end{pmatrix}\]

We select \(m_s=2\), as this choice leads to windowed decoders with a relatively compact decoding window. Note that the minimum required decoding window size grows almost linearly with \(m_s\). Another viable choice would be \(m_s=1\); however, simulation results that are not shown here have indicated that a better performance can be expected with \(m_s=2\).

We consider iterative differential decoding as described in the previous section using the modified slip-resilient BCJR algorithm based on the trellis of Figure 3.10 and wish to design coding schemes of rate \(r=0.8\) (25% OH).

We use protographs leading to regular SC-LDPC codes with variable degree \(\text{v}_\text{reg.}=3\) and check degree \(\text{c}_\text{reg.}=15\).

The reason for using regular codes is that it has been shown that regular SC-LDPC are sufficient to achieve capacity and this particular code has a MAP threshold very close to capacity.

Furthermore, due to the regularity, the implementation of the decoder can be simplified and the error floor is expected to be very low (increasing \(\text{v}_\text{reg.}\) further may even lead to lower error floors).

Although \(\text{v}_\text{reg.}\), \(\text{c}_\text{reg.}\), and \(m_s\) are fixed, we still need to find good protomatrices \(\pmb{B}_0\), \(\pmb{B}_1\), and \(\pmb{B}_2\). In order to have a very simple structure, we fix \(m'=1\) and \(n'=5\), leading to the smallest possible protomatrices.

We have constructed all 1837 possible such combinations of protographs (unique up to column permutation) and computed decoding thresholds for all these protographs using the above-described method. We selected the protographs with the 50 best thresholds and carried out Monte Carlo simulations with three different windowed decoders:

- The first setup uses a window size of \(w=4\) and carries out \(\mathcal{I}_w=3\) iterations per decoding step.
- The second setup uses a window size of \(w=7\) and carries out \(\mathcal{I}_w=2\) iterations per decoding step.
- The third setup uses a window size of \(w=16\) and carries out a single iteration per decoding step, that is, \(\mathcal{I}_w=1\).

Note that with all three setups, every coded bit undergoes an equivalent number of 18 iterations, leading to the same complexity for all decoders and the same complexity as the LDPC-coded schemes presented in Section 3.

Furthermore, note that the number of differential decoder executions per coded bit amounts to \(w+m_s\) and thus depends on the setup.
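The equal iteration budget follows from the accounting stated above: a bit remains inside the sliding window for \(w+m_s\) window positions and receives \(\mathcal{I}_w\) iterations at each, so all three setups spend \((w+m_s)\mathcal{I}_w=18\) iterations per bit.

```python
# Iteration budget per coded bit for the three windowed decoder setups.
ms = 2
setups = [(4, 3), (7, 2), (16, 1)]             # (window size w, iterations I_w)
budgets = [(w + ms) * Iw for w, Iw in setups]  # (w + ms) * I_w per coded bit
```

The number of differential decoder executions per bit, \(w+m_s\), differs between the setups, which is what later distinguishes their phase-slip resilience.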

In the case of LDPC-coded differential demodulation, we have executed the differential decoder for each iteration. The required \(E_s/N_0\) (in dB) to achieve a target BER of \(10^{-6}\) for the 50 selected protographs is shown in Figure 3.23 for \(L=100\) and \(P_\text{slip}\in\{0, 0.01\}\).

Based on the results of Figure 3.23, we select the protograph with index 3, which has the best performance compromise at \(P_\text{slip}=0\) and \(P_\text{slip}=0.01\), and which is given by \(\pmb{B}_0=(1\;1\;1\;1\;1)\), \(\pmb{B}_1=(1\;0\;0\;0\;0)\), and \(\pmb{B}_2=(1\;2\;2\;2\;2)\).

Finally, we use this protograph to construct a code by first lifting the protographs \(\pmb{B}_i\) with \(P=40\) and using the intermediate result in a second step to generate a quasi-cyclic (QC) code with circulant permutation matrices of size 50 × 50.

The resulting parity-check submatrices \(\pmb{H}_i\) associated with \(\pmb{B}_i\) have size dim \(\pmb{H}_i=2000\times10,000\). As reference, we use QPSK with Gray differential coding and pick the two best codes found in Section 3.2: partial interleaving with no restriction on \(L_2\) and partial interleaving with \(L_2\le1-r\). As in Figures 3.19 and 3.20, we use \(I=18\) decoding iterations in all cases. The results are shown in Figure 3.24 for \(\gamma=0\) (i.e., no phase slips) and in Figure 3.25 for \(\gamma=0.2\).

We can see from the results that for \(\gamma=0\), the LDPC code with partial interleaving (no restriction on \(L_2\)) already yields a very good performance within 1 dB of the theoretically minimum \(E_s/N_0\), while the code with the constraint on \(L_2\) entails a performance loss of about 0.4 dB (see also Figures 3.19 and 3.20).

The SC-LDPC code with the second decoder setup (\(w=7\)) outperforms the LDPC code for low BERs due to the steepness of the waterfall curve. If the phase-slip probability is nonzero with \(\gamma=0.2\), which may occur in a highly nonlinear DWDM transmission with OOK neighbors, we observe from Figures 3.20 and 3.25 that the LDPC code optimized without any constraints no longer performs well and has a severe error floor requiring strong outer coding.

The error floor can be reduced by designing a code with \(L_2\le1-r\). If \(\gamma=0.2\), the SC-LDPC code with decoder setup 3 almost shows the same performance as the LDPC code with the constraint \(L_2\le1-r\) and outperforms it at low error rates (we did not observe any error floor in our simulations).

The proposed SC-LDPC code has the important advantage of universality: A single SC code is powerful in all transmission scenarios requiring only a single channel agnostic transmitter and a receiver selecting the best decoding setup depending on \(\gamma\) and \(P_\text{slip}\). The transmission scheme can thus be kept simple and only a single code needs to be implemented.

In the setup with conventional LDPC codes, two different codes have to be implemented, depending on the setup: one code for small \(\gamma\) and another code, possibly with \(L_2\le1-r\), for larger values of \(\gamma\).

The transmitter has to know the expected \(\gamma\) on the channel and adapt the coding scheme accordingly, which requires an undesired feedback channel and changes in the network control plane.

Another possibility is to deliberately design an LDPC code that performs well in both cases; this, however, leads to a compromise in the construction and to codes that cannot compete with codes optimized for a specific scenario.

With the SC-LDPC code, we can use a single code and rely on receiver processing to estimate \(\gamma\) and set up the windowed decoder (\(w\) and \(\mathcal{I}_w\)) accordingly.
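As an illustration, this receiver-side adaptation could be organized as in the following sketch. The \(\gamma\) threshold and the specific \((w,\mathcal{I}_w)\) pairs are hypothetical placeholders; only the principle (a longer window for larger \(\gamma\), a single code throughout) follows the text.

```python
# Hypothetical sketch of receiver-side decoder adaptation for a single
# SC-LDPC code. The gamma threshold and the (w, iterations) values are
# illustrative assumptions, not parameters from this tutorial.

def select_decoder_setup(gamma_estimate):
    """Return a windowed-decoder configuration for the estimated gamma."""
    if gamma_estimate < 0.1:
        # Few or no phase slips: a moderate window (cf. setup 2, w = 7)
        # already gives a steep waterfall and outperforms the LDPC code.
        return {"w": 7, "iterations": 18}
    # Frequent phase slips (e.g. gamma = 0.2): use a longer window so the
    # channel-adapted differential decoder is executed more often per bit.
    return {"w": 9, "iterations": 18}
```

Note that the transmitter configuration never changes; only the receiver switches between window setups, which is what makes the scheme channel-agnostic.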

Note that for large \(\gamma\), it is advantageous to use a long window \(w\), which means that the number of differential decoding executions per bit should be maximized. This contrasts with the HTDD approach, which, however, uses a differential decoder that is not adapted to the channel. We conclude that executing the differential detector does not degrade performance as long as it is well adapted to the channel model.

## 5. Conclusions

In this tutorial, we have described some important aspects of soft-decision FEC with iterative decoding. We have shown that the phenomenon of phase slips, which occurs frequently in the case of coherent long-haul optical communications, can be combated effectively using iterative differential decoding.

We have shown that the achievable information rate is not affected by iterative differential decoding and that differential decoding leads to a performance loss only if it is not carried out properly.

We have further shown how phase slips affect the achievable rate and how to design a trellis diagram that describes the differential code affected by phase slips. To achieve the best possible performance, this differential decoder needs to be paired with a well-adapted code design.

We have proposed different design guidelines for LDPC codes and have shown that, depending on the channel quality, different code designs may be needed to reach the desired performance, especially if very low residual BERs are targeted.

Finally, we have shown that spatially coupled codes offer a more universal code design: they are more agnostic to the channel and thus enable the implementation of a single code that performs equally well under all channel conditions and even outperforms conventional LDPC codes.

The next tutorial introduces ** half-shade devices and miniature polarization devices**.