# Multidimensional Optimized Optical Modulation Formats

This is a continuation from the previous tutorial - **atomic rate equations**.

## 1. Introduction

The development of advanced digital signal processing (DSP) to enable intradyne coherent optical receivers caused a paradigm shift within optical communications, and there is little doubt that the future of optical transport will be coherent.

Coherent receivers ideally map the optical signal to the electrical domain, which enables a lot of novel advanced communication algorithms to be implemented in optical links, for example, digital equalization and advanced modulation.

One of the most profound developments was that intradyne receivers enabled all four quadratures of the optical signal (or in optical terms amplitude, phase, and polarization states) to be modulated and detected.

This was realized as early as the early 1990s, when Betti et al. investigated the modulation of all four quadratures in optical links. Even though coherent detection had been demonstrated by Derr in 1990, it was too complicated to be commercially interesting and the research faded.

As optical transmission systems had traditionally used rudimentary modulation (typically on-off keying (OOK) or differential phase-shift keying (DPSK)), the coherent receivers offered a great opportunity to study novel modulation formats tailored for the emerging coherent optical links.

The first such format used was polarization-multiplexed quadrature phase-shift keying (PM-QPSK), which in its simplest form is binary phase-shift keying (BPSK) in all four quadratures in parallel.

As coding and modulation are key building blocks in the design of any communication link, it is a natural first approach to separate them and study the performance of each block separately.

Most of the research reviewed and presented in this tutorial deals only with the modulation format, but we emphasize that it is only part of the problem in designing a good optical transmission link.

The second part is to add forward error correcting (FEC) codes, preferably tailored and co-optimized with the modulation formats, an area often referred to as coded modulation.

The choice of modulation format in a link is crucial in that it sets an upper limit on the achievable spectral efficiency, which loosely speaking measures how well the channel real estate (bandwidth and signaling dimensions) is utilized.

The addition of FEC will always reduce the spectral efficiency (but with the crucial benefit of increasing the noise tolerance). Nevertheless, there is a deep relation between coding and modulation.

Specifically, all FEC codes can be interpreted as a multidimensional modulation format by considering a sequence of time slots as dimensions. The converse does not necessarily hold.

Although many multidimensional modulation formats, in particular those with a regular structure, can be interpreted as a low-dimensional modulation format in combination with an FEC code, this is not always the case. The relation between modulation and coding is discussed further in later tutorials.

In the choice of modulation format, there is an inherent threefold trade-off between the spectral efficiency, the noise tolerance, and complexity of the format. In this tutorial, we aim to shed some light on these trade-offs, by investigating relatively simple, low-dimensional formats in four dimensions.

Such research is not new; 4D formats were investigated already in the 1970s by Welti and Lee and by Zetterberg and Brändström. Also, the work by Biglieri contains some of the 4D formats that we discuss in this tutorial, as well as discussions on lattices and lattice cuts, which we also cover.

The novelty is the application to the optical channel with its specifics and trade-offs when it comes to signal generation, transmission, and detection. Therefore, we devote quite some effort to review and describe implementations and experiments.

This tutorial is organized as follows.

- In the next section, we give basic definitions and performance metrics for modulation formats that are common in the literature.
- In Section 2.3, the most interesting formats and their performances are theoretically described and characterized.
- Next, in Section 2.4, we study how low-dimensional codes can be used to extend the known formats to higher dimensions and spectral efficiencies.
- In Section 2.5, the relatively large body of experimental work on multidimensional modulation in coherent links from the last few years is reviewed, and finally Section 2.6 concludes.

## 2. Fundamentals of Digital Modulation

An optical communication channel, like any other physical propagation or storage medium, is what in communication theory is called a waveform channel, which communicates a time-varying voltage (or electric field) from one point to another.

If the channel is used to transmit digital data, then there are only a finite number of possible waveforms of a given length, and every such waveform corresponds to a certain sequence of bits.

The process of mapping bits into waveforms and vice versa is called digital modulation. This can be done in a multitude of ways, depending on the type of channel.

### 2.1 System Models

A multidimensional channel is one that offers the possibility of transmitting multiple waveforms simultaneously. These waveforms could consist of the two quadratures of an amplitude- and phase-modulated light wave, the two polarizations, multiple wavelengths in a wavelength-division multiplexed (WDM) system, multiple modes, or multiple cores.

Each of the parallel waveforms can be thought of as residing in one dimension. The traditional paradigm, and the least complex solution, is to transmit independent data on all of these dimensions.

However, improved performance can be obtained by encoding data jointly on several dimensions, that is, by multidimensional modulation. This improvement is most prominent if the waveforms interfere with each other during transmission, but significant gains can be achieved even if the waveforms are transmitted independently.

The mapping of bits into waveforms can be thought of as a three-step process.

First, redundant bits are added to the payload. This overhead serves several purposes: to indicate a frame structure, which allows the interpretation of the received bit stream as a sequence of data packets; to provide address information for proper routing; and to provide error resilience via FEC. These functions, albeit crucial for the operation of an optical communication network, are all outside the scope of the present tutorial.

Second, \(m\) bits at a time are mapped into a symbol, which is a vector in an \(N\)-dimensional space. The set \(\mathcal{X}\) of all \(M=2^m\) symbols is called a constellation. This is the single most important entity in the definition of a modulation format; indeed, it is so important that the term “modulation format” is sometimes used as a synonym for constellation.

Third, the sequence of symbols is mapped into a set of waveforms. The standard way to do this is via a linear modulator. Denoting the sequence of \(N\)-dimensional symbols with \(\pmb{x}[k]\), for \(k=\ldots,0,1,2,\ldots\), the vector of \(N\) waveforms is computed as

\[\tag{2.1}\pmb{x}(t)=\boldsymbol{\sum}_k\pmb{x}[k]\phi(t-kT)\]

where \(T\) is the symbol time and \(\phi(t)\) is a given pulse shape.

At this point, it should be emphasized that the discrete-time sequence \(\pmb{x}[k]\) is fundamentally different from its continuous-time counterpart \(\pmb{x}(t)\) and they should not be confused with each other.

The waveforms \(\pmb{x}(t)\) need to be considered in order to analyze signal spectra as well as propagation effects such as distortions, filtering, added noise, and other hardware limitations.

In contrast, the sequence \(\pmb{x}[k]\) is the quantity of interest when analyzing bit error rate (BER) and symbol error rate (SER), mutual information, channel capacity, etc.

The vector \(\pmb{x}(t)\) represents \(N\) baseband waveforms. Each of these waveforms is now multiplied with a carrier, for transmission over an \(N\)-dimensional channel, which, as explained in the beginning of this subsection, consists of multiple quadratures, polarizations, wavelengths, modes, and/or cores.

At the receiver side, the reverse operations are performed using a coherent receiver.

First, the symbol clock, carrier phase, and polarization are recovered using either blind or pilot-aided estimation algorithms. A balanced detector now outputs the \(N\) received baseband waveforms, represented by the vector \(\pmb{y}(t)\), which should hopefully resemble \(\pmb{x}(t)\).

Second, the waveforms are filtered and sampled. The obtained sequence of \(N\)-dimensional vectors is

\[\tag{2.2}\pmb{y}[k]=\int_{-\infty}^{\infty}\pmb{y}(t)h(kT-t)dt\]

for \(k=\ldots,0,1,2,\ldots\), where \(h(t)\) is the impulse response of the receiver filter.

The received symbol sequence \(\hat{\pmb{x}}[k]\) is now determined by identifying, independently for each \(k\), the point in \(\mathcal{X}\) closest to \(\pmb{y}[k]\), in some well-defined sense that depends on the channel model.

Ideally, the receiver filter is chosen as a matched filter \(h(t)\sim\phi(T_d-t)\), where \(T_d\) is the processing delay.

Furthermore, the pulse \(\phi(t)\) is chosen to satisfy the \(T\)-orthogonality criterion

\[\tag{2.3}\int_{-\infty}^{\infty}\phi(t)\phi(t-kT)dt=0,\quad\text{for all integers }k\ne0\]

which avoids intersymbol interference for linear channels, that is, \(\pmb{y}[k]\) depends on \(\pmb{x}[k]\) but not on \(\pmb{x}[k\pm1]\), \(\pmb{x}[k\pm2]\), …
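The chain (2.1) to (2.3) can be illustrated with a minimal discrete-time sketch. It assumes a unit-energy rectangular pulse of duration \(T\) (which satisfies (2.3) because shifted copies do not overlap), a noiseless channel, and \(N=2\); the names `SPS`, `modulate`, and `matched_filter_sample` are illustrative and not taken from the text.

```python
import math

SPS = 8                                  # samples per symbol time T (assumed)
PULSE = [1 / math.sqrt(SPS)] * SPS       # phi(t): rectangular, unit energy

def modulate(symbols):
    """Linear modulator (2.1): each symbol scales its own shifted pulse."""
    n_dim = len(symbols[0])
    wave = [[0.0] * (len(symbols) * SPS) for _ in range(n_dim)]
    for k, sym in enumerate(symbols):
        for i, p in enumerate(PULSE):
            for dim in range(n_dim):
                wave[dim][k * SPS + i] += sym[dim] * p
    return wave

def matched_filter_sample(wave, n_symbols):
    """Receiver (2.2) with h matched to phi, sampled once per symbol."""
    return [
        tuple(sum(w[k * SPS + i] * PULSE[i] for i in range(SPS)) for w in wave)
        for k in range(n_symbols)
    ]

tx = [(1, 1), (-1, 1), (1, -1), (-1, -1)]     # four 2D symbols x[k]
rx = matched_filter_sample(modulate(tx), len(tx))
```

With this pulse, each sample `rx[k]` depends only on `tx[k]`, which is the absence of intersymbol interference that (2.3) is meant to guarantee on a linear channel.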

Third and last, the received bit sequence is obtained by concatenating the bits corresponding to each symbol. Then, the digital overhead is removed, which includes the operations of FEC decoding and frame synchronization.

It is also possible to consider blocks of \(K\) symbols \(\pmb{x}[k]\), \(\pmb{x}[k+1]\), … , \(\pmb{x}[k+K-1]\) as a supersymbol, taken from a constellation of \(NK\) dimensions.

In general, this technique improves the performance at the cost of a higher transmitter and receiver complexity. A similar effect can be achieved at a more manageable complexity by applying an FEC code before modulation.

Specifically, if a block code with codeword length \(n=mK\) is applied to the bit stream before modulation, the resulting symbol sequence can be regarded either as a sequence of dependent \(N\)-dimensional symbols or as a sequence of independent \(NK\)-dimensional supersymbols. We see examples of such \(NK\)-dimensional constellations designed from standard FEC codes later.

### 2.2 Channel Models

A complication for optical links is that the fiber propagation of the signal waveform is conventionally modeled with a nonlinear partial differential equation, the nonlinear Schrödinger equation (NLSE), where fiber dispersion, nonlinearities, and amplifier noise distort the signal.

This is not the desired discrete-time model that a communication engineer would like to have when designing the coding and modulation algorithms.

There are generally three problems associated with taking the fiber propagation to a usable discrete-time model.

- To correctly model the transitions between symbols and waveforms (discrete and continuous time). Usually, the transmitter is modeled as a continuous pulse source multiplied with discrete data in each symbol time, ignoring the sum in (2.1). This works fairly well, but one may have unwanted intersymbol interference at the symbol borders that is often neglected.

  The receiver side, going from the continuous waveform to a discrete data sample, is often modeled as an integrate-and-dump filter, that is, restricting the integral in (2.2) to an interval of length \(T\). This is not penalty-free, and it is theoretically complicated when the signal spectrum is distorted or broadened, so that one cannot guarantee matched filtering or sampling without aliasing.

- Fiber transmission, as described by the NLSE, is nonlinear in the general case, and links are often operated in a regime where the nonlinearity cannot be neglected. In this case, the received signal is generally affected by intersymbol interference even if (2.3) is satisfied and linear ISI in the channel is removed.

- The coherent receiver should have negligible distortions, that is, operate in a regime (strong local oscillator with low phase noise) where it linearly maps the optical field to the electrical domain for sampling and detection.

In addition, perfect timing synchronization and compensation for channel impairments are assumed. Often these problems are neglected, which leads to the standard additive white Gaussian noise (AWGN) model for coherent links, where the signal is only distorted by additive amplifier spontaneous emission (ASE) noise. Good agreement between simulations and experiments is evidence that this approach works reasonably well for many systems.

Of the above-mentioned problems, the nonlinearity is the most serious one, but thanks to the recent developments of the Gaussian noise (GN) model, it can be dealt with by a simple extension of the AWGN model.

The GN model applies to links with strong dispersive broadening during propagation and electronic dispersion compensation in the receiver. Then, the impact of the nonlinearity can be accurately modeled as AWGN with a variance proportional to the average signal power cubed, which was first observed by Splett et al. in 1993.

In such links, the presented format optimizations (which rely on the noise being uniform in all dimensions) will still work well. The GN model is known to agree well with experiments and to be a useful system design tool, but the usefulness for, for example, capacity estimates in nonlinear links can be questioned.

A second model accounting for fiber nonlinearities is the nonlinear phase-noise model. This applies to links where the dispersion is negligible, for example, with optical in-line compensation and/or low baudrates. Then, the nonlinear self-phase modulation will, together with the ASE noise, lead to constellations with a spiraling shape. The model has also been extended to dual polarizations by Beygi et al.

### 2.3. Constellations and Their Performance Metrics

The starting point for digital modulation theory is, since long before the invention of fiber-optic communications, the scenario consisting of an AWGN channel, no coding, optimal detection (maximum likelihood, ML), and asymptotically low error probability.

In this scenario, the BER and SER are both proportional to \(Q(d/\sqrt{2N_0})\), where \(Q(x)=(2\pi)^{-1/2}\int_x^{\infty}\exp(-z^2/2)dz\) is the Gaussian \(Q\) function, \(d\) is the minimum Euclidean distance between points in the constellation, and \(N_0\) is the noise power spectral density.

Modulation formats are, therefore, traditionally designed in order to maximize (a normalized version of) the minimum distance \(d\). Nevertheless, such modulation formats are often applied even in scenarios where the minimum distance does not govern the performance, such as for non-Gaussian or nonlinear channels, in coded systems, with suboptimal receivers, or at nonasymptotic error probabilities.
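As a small numerical aid, the Gaussian \(Q\) function can be evaluated through the complementary error function, \(Q(x)=\tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})\). The sketch below only computes the scaling factor \(Q(d/\sqrt{2N_0})\); the proportionality constants of the BER and SER are not specified here, and the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def error_rate_scaling(d, n0):
    """Factor Q(d / sqrt(2 * N0)) governing BER and SER at high SNR."""
    return q_function(d / math.sqrt(2 * n0))

# Doubling the minimum distance at fixed noise density lowers the factor.
small_d = error_rate_scaling(1.0, 0.1)
large_d = error_rate_scaling(2.0, 0.1)
```

Because \(Q\) decays rapidly, even a modest increase in \(d\) changes the factor by orders of magnitude, which is why the minimum distance dominates at high SNR.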

The following performance metrics are often used to quantify the performance of modulation formats.

**Spectral Efficiency**

The spectral efficiency or normalized bit rate is defined as

\[\beta=\frac{\log_2M}{N/2}\]

where \(N\) and \(M\) give the number of dimensions and constellation points, respectively.

The spectral efficiency gives the number of bits per channel use, where every (complex) channel use involves two dimensions. It also gives the bit rate per bandwidth, in bit/s/Hz, if Nyquist signaling is applied (sinc pulse shaping).

A related quantity is \(\beta/2\), which gives the number of bits per dimension, and can be interpreted as the data rate per bandwidth in bit/s/Hz, if rectangular pulse shaping is applied and bandwidth is defined as the width of the spectral main lobe.

**Average and Peak Symbol Energy**

The average symbol energy, also called the second moment or the mean squared Euclidean norm, is

\[E=\frac{1}{M}\boldsymbol{\sum}_{x\in\mathcal{X}}|\pmb{x}|^2\]

and the peak symbol energy is

\[E_\text{max}=\underset{x\in\mathcal{X}}{\text{max}}|\pmb{x}|^2\]

If the pulse \(\phi(t)\) in (2.1) satisfies (2.3), then

\[\lim_{n\rightarrow\infty}\frac{1}{2nT}\int_{-nT}^{nT}|\pmb{x}(t)|^2dt\sim\frac{E}{T}\int_{-\infty}^{\infty}\phi^2(t)dt\]

that is, the continuous-time average energy is proportional to the discrete-time average energy \(E\). Unfortunately, there exists no analogous relation between the continuous-time and discrete-time peak energies. Constellation designs based on \(E_\text{max}\) tend nevertheless to be relatively good also in terms of the continuous-time peak energy, but not necessarily optimal.

**Average Bit Energy**

\(E_b=E/\log_2M\) gives the average energy needed to transmit one bit of information.

**Constellation Figure of Merit**

The constellation figure of merit (CFM) is defined as

\[\text{CFM}=\frac{d^2N}{2E}\]

This is, assuming AWGN, no coding, optimal detection (maximum likelihood), and asymptotically high signal-to-noise ratio (SNR; low error probability), the relevant power metric if modulation formats are compared at the same bandwidth.

**Power Efficiency**

The (asymptotic) power efficiency is

\[\gamma=\frac{d^2}{4E_b}=\frac{\beta\text{CFM}}{4}\]

This is, under the same conditions as for the CFM, the relevant power metric if modulation formats are compared at the same bit rate.
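The metrics defined so far can be computed directly from a list of constellation points. The following sketch does this by brute force over all point pairs; the function name `metrics` and the QPSK coordinates \((\pm1,\pm1)\) are illustrative choices.

```python
import math
from itertools import combinations

def metrics(points):
    """Return (beta, E, CFM, gamma) for a constellation given as tuples."""
    n_dim, m = len(points[0]), len(points)
    beta = math.log2(m) / (n_dim / 2)                  # spectral efficiency
    energy = sum(sum(c * c for c in p) for p in points) / m
    d2 = min(sum((a - b) ** 2 for a, b in zip(p, q))   # squared min distance
             for p, q in combinations(points, 2))
    cfm = d2 * n_dim / (2 * energy)
    gamma = d2 / (4 * energy / math.log2(m))           # d^2 / (4 E_b)
    return beta, energy, cfm, gamma

qpsk = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
beta, energy, cfm, gamma = metrics(qpsk)
```

For QPSK this gives \(\beta=2\), \(\text{CFM}=2\), and \(\gamma=1\), consistent with \(\gamma=\beta\,\text{CFM}/4\).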

**Gain**

The gain is quantified with respect to a baseline modulation format at the same spectral efficiency \(\beta\), commonly chosen as pulse-amplitude modulation (PAM). A PAM constellation has

\[\text{CFM}_{\text{PAM}}=\frac{6}{2^\beta-1}\]

and

\[\gamma_\text{PAM}=\frac{3\beta}{2(2^\beta-1)}\]

Multidimensional extensions of PAM such as quadrature-amplitude modulation (QAM) and polarization-multiplexed (PM) QAM have the same CFM and \(\gamma\).

Geometrically, the baseline constellations represent cubic subsets of the cubic lattice. The gain is defined as

\[G=\frac{\text{CFM}}{\text{CFM}_\text{PAM}}=\frac{\gamma}{\gamma_\text{PAM}}\]

also for spectral efficiencies \(\beta\) for which no PAM constellation exists.
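As a sanity check on the definition, the gain of the baseline formats themselves should be unity (0 dB). The sketch below verifies this for 4-PAM and 16QAM with coordinates in \(\{\pm1,\pm3\}\); the function name `gain` is an illustrative choice.

```python
import math
from itertools import combinations, product

def gain(points):
    """Gain G = CFM / CFM_PAM evaluated at the constellation's own beta."""
    n_dim, m = len(points[0]), len(points)
    beta = math.log2(m) / (n_dim / 2)
    energy = sum(sum(c * c for c in p) for p in points) / m
    d2 = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(points, 2))
    cfm = d2 * n_dim / (2 * energy)
    cfm_pam = 6 / (2 ** beta - 1)
    return cfm / cfm_pam

pam4 = [(-3,), (-1,), (1,), (3,)]
qam16 = list(product((-3, -1, 1, 3), repeat=2))
g_pam4, g_qam16 = gain(pam4), gain(qam16)
```

Both values equal 1, illustrating that QAM inherits the CFM of PAM at the same \(\beta\).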

**Mutual Information, MI**

The mutual information is defined as

\[I(\pmb{X};\pmb{Y})=\iint{f}(\pmb{x},\pmb{y})\log_2\frac{f(\pmb{x},\pmb{y})}{f(\pmb{x})f(\pmb{y})}d\pmb{x}d\pmb{y}\]

where \(\pmb{X}\) and \(\pmb{Y}\) are the channel inputs and outputs, respectively, and \(f\) denotes the distribution of the stochastic variables indicated by its arguments.

**Complexity**

Finally, some words should be said about complexity. It is one of the most important figures of merit, and it should be considered in any implementation, in order to keep the latency, energy consumption, and cost within reasonable levels.

Nevertheless, it is one of the hardest parameters to quantify numerically, depending not only on the modulation format but also on the transmitter and receiver algorithms as well as the hardware platform. As a crude rule of thumb, the complexity increases with the dimension, number of points, and irregularity of the constellation.

## 3. Modulation Formats and Their Ideal Performance

In this section, we briefly review the various modulation formats and format optimizations that have been presented in the literature.

Without doubt, the most commonly used formats are the PAM formats, based on the cubic lattice, possibly in \(N\) dimensions. Their performance is well known. Their popularity is mostly due to their simplicity of generation and detection, but if some of that simplicity is sacrificed, much better performance (in terms of noise tolerance or spectral efficiency) can be achieved. The formats presented in this section are examples of that.

We extensively discuss format optimization later in the tutorial. It is important to emphasize that the outcome of such an optimization is heavily dependent on what is optimized and which constraints are assumed under the optimization.

The simplest and most common scenario is to assume AWGN, no coding, optimal detection (ML), and asymptotically high SNR (low error probability). This ideal scenario is studied in this section. Modulation optimization for some specific nonlinear and non-Gaussian channel models is summarized.

In the limit of high SNR, the formats with the lowest SER can be found from optimized packings of solid spheres. For a constant dimensionality and number of spheres, such packing optimization can be done by either minimizing the average distance of the spheres from the origin (the average second moment \(E\)) or by minimizing the maximum distance (the maximum symbol energy \(E_\text{max}\)).

To emphasize this difference, the constellation of \(M\) spheres with minimum \(E\) in dimension \(N\) is called the cluster \(\mathcal{C}_{N,M}\), and the constellation with lowest \(E_\text{max}\) is called the ball \(\mathcal{B}_{N,M}\). Sometimes the clusters and balls coincide, but in general they do not.

A simple example of the latter arises for 8 points in 2D, as shown in Figure 2.1. This example also shows that the balls may be nonunique, as the center point is loose, and can be freely moved without affecting \(E_\text{max}\).

In addition to the balls and clusters, one can also compare different formats at the same bit rate (where \(\gamma\) is the relevant metric), or at the same bandwidth (where CFM is used).

### 3.1. Format Optimizations and Comparisons

This and the next few sections focus mainly on the clusters, that is, the \(N\)-dimensional, \(M\)-point constellations that minimize the average symbol energy (second moment) \(E\).

Tables with coordinates of those constellations have been published for 2D clusters and for 3D and 4D clusters. These and other constellations are available online.

All these are numerically optimized results, presented as tables of coordinates. Quite often, the clusters possess some symmetry that facilitates a nice coordinate description.

In the limit of many points, the clusters will be spherical cuts from the regular lattices that are known to be the best packings in the given dimension. The best packing lattices are only known exactly in dimensions 2, 3, 4, 8, and 24, and they are listed in Table 2.1, together with their densities, Δ, which denotes the fraction of \(N\)-dimensional space that is filled by packing nonoverlapping spheres at the lattice points.

The power efficiency for a spherical cut of \(M\) lattice points in \(N\)-dimensional space can, if \(M\) is sufficiently large, be well approximated as

\[\tag{2.7}\gamma_\text{lat}=\log_2(M)\left(1+\frac{2}{N}\right)\left(\frac{\Delta}{M}\right)^{\frac{2}{N}}\]

This expression is derived by assuming a uniform point density in the spherical cut. The approximation can be expected to improve with increasing \(M\), in particular when \(M\) significantly exceeds the nearest-neighbor number, so that many lattice cells are enclosed in the cut.

If an \(N\)-dimensional hypercubic cut is made rather than a spherical cut, a penalty approaching \(\pi{e}/6\) (1.53 dB), the so-called shaping gain, is incurred for large \(N\).

In a similar manner, we have the CFM and gain for the lattices as

\[\tag{2.8}\text{CFM}_\text{lat}=2(N+2)\left(\frac{\Delta}{M}\right)^{\frac{2}{N}}\]

\[\tag{2.9}G_\text{lat}=\frac{N+2}{3}\left(M^{\frac{2}{N}}-1\right)\left(\frac{\Delta}{M}\right)^{\frac{2}{N}}\]
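Expressions (2.7) to (2.9) are easy to evaluate numerically. The sketch below assumes the standard packing densities \(\Delta=\pi/(2\sqrt{3})\) for \(A_2\), \(\pi^2/16\) for \(D_4\), and \(\pi^4/384\) for \(E_8\) (Table 2.1 is not reproduced here, so these values are an assumption of the example), and checks that the three expressions are mutually consistent with the definitions of \(\gamma\) and \(G\).

```python
import math

# Packing densities of the densest known lattices (assumed standard values).
DENSITY = {
    2: math.pi / (2 * math.sqrt(3)),   # A2 (hexagonal)
    4: math.pi ** 2 / 16,              # D4
    8: math.pi ** 4 / 384,             # E8
}

def lattice_metrics(n_dim, m):
    """Large-M approximations (2.7)-(2.9): returns (gamma, CFM, G)."""
    fill = (DENSITY[n_dim] / m) ** (2 / n_dim)
    gamma = math.log2(m) * (1 + 2 / n_dim) * fill
    cfm = 2 * (n_dim + 2) * fill
    gain = (n_dim + 2) / 3 * (m ** (2 / n_dim) - 1) * fill
    return gamma, cfm, gain

gamma_lat, cfm_lat, gain_lat = lattice_metrics(4, 256)
beta = math.log2(256) / (4 / 2)        # spectral efficiency for N = 4, M = 256
```

Since these are approximations valid for large \(M\), the numbers for small constellations should be read as trends rather than exact values.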

**3.1.1 General Properties of the Metrics**

Properties of the best-known clusters, for \(N\) = 2, 4, and 8 and selected values of \(M\), are shown in Figure 2.2. We conjecture that these clusters are all optimal for their values of \(N\) and \(M\).

The spectral efficiency \(\beta\) is shown versus the three power measures: CFM, power efficiency \(\gamma\), and gain \(G\). This also shows the qualitatively different behavior of the three metrics (CFM, \(\gamma\), and \(G\)). We now discuss the general behavior of these metrics with spectral efficiency \(\beta\) (or \(M\), since \(\beta\sim\log_2(M)\)).

The \(\text{CFM}\sim1/E(M)\) decreases monotonically with spectral efficiency \(\beta\), as it compares formats at the same bandwidth (same baudrate), thus showing essentially how the second moment \(E(M)\) increases with \(M\).

For large \(M\), one can expect the clusters to behave as lattice packings, and the CFM to decrease as \(\sim{M}^{-2/N}\) according to (2.8).

The power efficiency \(\gamma\), in contrast, weighs in the data rate by multiplying CFM with \(\log_2(M)\), giving it a dependence \(\gamma\sim\log_2(M)/E(M)\). It can be shown that \(\gamma\) always increases up to at least the simplex (\(M=N+1\)).

However, for large \(M\), the dependence is that of the lattice, \(\sim\log_2(M)M^{-2/N}\), which will eventually decrease with \(M\), and we conclude that for every dimension \(N\gt1\), \(\gamma\) has a maximum \(\gamma_\text{max}\) at some value \(M_\text{opt}\).

The values of \(\gamma_\text{max}\) and \(M_\text{opt}\) are only known, or conjectured, for \(N\) = 2, 3, 4, 8 and listed in Table 2.2. Not much is known about the general dependence of \(\gamma_\text{max}\) and \(M_\text{opt}\) on the dimension \(N\). However, a crude approximation can be obtained from the lattice expression by maximizing \(\log_2(M)M^{-2/N}\) for real \(M\).

This optimum is

\[\tag{2.10}M_{\text{opt,lat}}=\exp(N/2)\]

\[\tag{2.11}\gamma_\text{max,lat}=\frac{N+2}{2e\ln(2)}\Delta^{\frac{2}{N}}\]

These values are compared with the exact known values in Table 2.2, and the agreement is surprisingly good, given the rough approximation involved in replacing the discrete points with a homogeneous lattice distribution. It is also interesting to note that \(M_\text{opt,lat}\) corresponds to \(\beta_\text{opt,lat}=1/\ln(2)=1.44\) bits per symbol per dimension pair, independently of \(N\).
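For completeness, the optimum follows from a short calculation, treating \(M\) as a real variable and using \(\log_2(M)=\ln(M)/\ln(2)\):

\[\frac{d}{dM}\left[\ln(M)\,M^{-\frac{2}{N}}\right]=M^{-\frac{2}{N}-1}\left(1-\frac{2}{N}\ln M\right)=0\quad\Rightarrow\quad\ln M_\text{opt,lat}=\frac{N}{2}\]

Inserting \(\log_2(M_\text{opt,lat})=N/(2\ln2)\) and \(M_\text{opt,lat}^{-2/N}=e^{-1}\) into (2.7) gives

\[\gamma_\text{max,lat}=\frac{N}{2\ln(2)}\cdot\frac{N+2}{N}\cdot\frac{\Delta^{\frac{2}{N}}}{e}=\frac{N+2}{2e\ln(2)}\Delta^{\frac{2}{N}},\qquad\beta_\text{opt,lat}=\frac{\log_2(M_\text{opt,lat})}{N/2}=\frac{1}{\ln(2)}\approx1.44\]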

The gain \(G\) is defined as the performance relative to the cubic-lattice PAM constellations (QPSK, 16QAM, PM-QPSK, PM-16QAM, etc.), which all have \(G=0\) dB. The clusters show a rapid improvement over the cubic lattice as \(\beta\) increases, as is clear from Figure 2.2(c).

At high spectral efficiencies, the gain will approach the asymptote given by

\[\tag{2.12}G_\text{max}=\frac{N+2}{3}\Delta^{\frac{2}{N}}\]

which is 0.84, 1.97, and 3.72 dB in the respective 2D, 4D, and 8D cases.

**3.1.2. Two-Dimensional Formats**

The 2D clusters are in almost all cases part of the hexagonal lattice \(A_2\), which is the densest packing of many spheres in 2D space. The only exception is \(M=4\), for which every rhombic constellation with vertex angle between \(60^\circ\) and \(120^\circ\), including the square constellation (QPSK), has the same average symbol energy \(E\) as a four-point subset of \(A_2\).
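The \(M=4\) degeneracy can be checked numerically. The sketch below builds a rhombus with unit side length, centered at the origin, and shows that its average energy is \(1/2\) for any vertex angle, while the minimum distance stays equal to the side length only for angles between \(60^\circ\) and \(120^\circ\). The parametrization by half-diagonals and the function names are illustrative choices.

```python
import math
from itertools import combinations

def rhombus(theta_deg, side=1.0):
    """Four vertices of a centered rhombus with vertex angle theta_deg."""
    half = math.radians(theta_deg) / 2
    p, q = side * math.cos(half), side * math.sin(half)   # half-diagonals
    return [(p, 0.0), (-p, 0.0), (0.0, q), (0.0, -q)]

def avg_energy(points):
    return sum(x * x + y * y for x, y in points) / len(points)

def min_distance(points):
    return min(math.dist(a, b) for a, b in combinations(points, 2))

angles = (60, 75, 90, 120)
energies = [avg_energy(rhombus(t)) for t in angles]
distances = [min_distance(rhombus(t)) for t in angles]
too_narrow = min_distance(rhombus(40))   # short diagonal becomes the minimum
```

Outside the \(60^\circ\) to \(120^\circ\) range, the shorter diagonal drops below the side length, so the constellation would have to be scaled up (raising \(E\)) to restore the minimum distance.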

Foschini et al. found the optimum 2D clusters in the cases of practical interest (\(M\) = 8, \(M\) = 16) by numerical optimization already in 1974, but clearly these results have not caught on in the community, and there are at least three reasons for this:

- The noninteger coordinates make a practical implementation more difficult.
- The gains \(G\) over square QAM constellations are never more than 0.84 dB according to (2.12).
- (Less important) The hexagonal constellations do not lend themselves to a straightforward bit-to-symbol mapping.

QAM constellations are, therefore, dominating in practical 2D systems.

The full set of 2D clusters up to \(M\) = 32 is shown as the \(N\) = 2 line for the three metrics (\(\gamma\), CFM and \(G\)) in Figure 2.2. The most common formats QPSK and 16QAM are shown as stars in Figure 2.2, and in Figure 2.2(c) they are references at \(G=0\) dB.

In the limit of many points, the 2D clusters have performance close to the \(A_2\) lattice (shown with a dashed line), which is not surprising since they are cuts from this lattice, as shown by Graham and Sloane.

The highest \(\gamma\) is seen to arise for \(M=3\) (3-PSK), at \(\beta=\log_2(3)\approx1.58\). However, as for all the other 2D clusters (except for QPSK), it has seen limited use, although being discussed in the literature.

**3.1.3. Four-Dimensional Formats**

The 4D clusters \(\mathcal{C}_{4,M}\) are shown in Figure 2.2 as the \(N=4\) line. For communication purposes, the powers of two, \(M\) = 8, 16, 32, …, are of particular interest, and they are discussed separately later.

In general, the optimum, or nearly optimum, 4D constellations that are subsets of the \(D_4\) lattice are easier to implement than the corresponding 2D clusters, since the \(D_4\) lattice is a subset of the regular cubic (integer) lattice \(\mathbb{Z}^4\). They will thus have a better opportunity to find wide use than the 2D clusters. Also, higher gains \(G\) are attainable in 4D than 2D.

A few specific cases have attracted interest in the research community, and are discussed separately later, namely \(M\) = 4, 8, 16, and 24, as well as the higher powers of 2.

**3D Simplex**

The best packing of 4 points in 4D, the cluster \(\mathcal{C}_{4,4}\), is to put them in a regular tetrahedron, also known as the 3D simplex. Obviously, this is not a 4D object at all, since at least 5 points are required to span a 4D object, but it is the best packing of 4 points in all dimensions \(N\ge3\). Moreover, numerical evidence indicates that all clusters \(\mathcal{C}_{N,M}\), where \(M\le{N+1}\), are the \(M\)-ary simplices. In optical communications, this format was proposed and evaluated by Dochhan et al. as an alternative to PM-BPSK, over which it has a 1.25 dB asymptotic sensitivity gain.
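The 1.25 dB figure can be reproduced from the definition \(\gamma=d^2/(4E_b)\). The sketch below uses one common coordinate choice for the regular tetrahedron and models PM-BPSK as \((\pm1,\pm1)\), that is, BPSK in one quadrature of each polarization; both coordinate sets are illustrative assumptions.

```python
import math
from itertools import combinations

def power_efficiency(points):
    """Asymptotic power efficiency gamma = d^2 / (4 E_b)."""
    m = len(points)
    energy = sum(sum(c * c for c in p) for p in points) / m
    d2 = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(points, 2))
    return d2 * math.log2(m) / (4 * energy)

tetrahedron = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
pm_bpsk = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

gamma_simplex = power_efficiency(tetrahedron)   # 4/3
gamma_pm_bpsk = power_efficiency(pm_bpsk)       # 1
gain_db = 10 * math.log10(gamma_simplex / gamma_pm_bpsk)
```

Both formats carry 2 bits per symbol, so the ratio \(4/3\) (about 1.25 dB) is a direct sensitivity comparison at equal bit rate.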

**PS-QPSK**

The maximum \(\gamma\) in 4D occurs for \(M\) = 8, a format known in optical communications as polarization-switched QPSK (PS-QPSK). Geometrically, the format is the 4D cross-polytope, also known in the communications community as 8-ary biorthogonal modulation. The biorthogonal (or cross-polytope) formats consist of all permutations and signs of signal vectors with zeroes at all coordinates except one. Gray mapping is not possible for biorthogonal formats, but assuming the “obvious” bit-to-symbol mapping that flips all bits between opposing symbol pairs ±1, 0, 0, 0…, an exact expression for the BER can be derived.

The 8-ary biorthogonal format was originally proposed for optical coherent systems by Betti et al., although it had been considered for communications much earlier. It can even be considered as a special case of permutation modulation, introduced already by Slepian.

In 4D, the cross-polytope can take on many representations; in addition to the permutations of \(\pm(1,0,0,0)\), it can be regarded as the odd (or even) parity subset of the 4D cube (PM-QPSK). It can thus also be seen as resulting from a parity-check code applied to the standard PM-QPSK.

The strength lies in that it loses less in spectral efficiency than it gains in sensitivity over PM-QPSK, so compared at the same bit rate, it gains \(\gamma=3/2\) or 1.76 dB in power efficiency. At a finite BER of \(10^{-3}\), its gain is around 1 dB.
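The two representations of the cross-polytope, and the resulting \(\gamma=3/2\), can be verified with a few lines of code. This is a sketch with illustrative names; the even-parity subset is taken as the cube points with an even number of negative coordinates.

```python
import math
from itertools import combinations, product

def power_efficiency(points):
    """Asymptotic power efficiency gamma = d^2 / (4 E_b)."""
    m = len(points)
    energy = sum(sum(c * c for c in p) for p in points) / m
    d2 = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(points, 2))
    return d2 * math.log2(m) / (4 * energy)

# Representation 1: all permutations and signs of (1, 0, 0, 0).
cross_polytope = [tuple(s if i == j else 0 for i in range(4))
                  for j in range(4) for s in (1, -1)]

# Representation 2: even-parity subset of the 4D cube (PM-QPSK).
pm_qpsk = list(product((-1, 1), repeat=4))
even_parity = [p for p in pm_qpsk if p.count(-1) % 2 == 0]

gamma_ps = power_efficiency(cross_polytope)
gamma_parity = power_efficiency(even_parity)
gamma_pm_qpsk = power_efficiency(pm_qpsk)
gain_db = 10 * math.log10(gamma_ps / gamma_pm_qpsk)
```

Dropping one of the four bits doubles the squared minimum distance at unchanged symbol energy, which is where the factor \(3/2\) at equal bit rate comes from.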

Transmission simulations of PS-QPSK in nonlinearly limited fiber links were presented. The general result is that the power efficiency improvement over conventional PM-QPSK can be translated into a reach extension or increased amplifier span losses, which has also been seen in experiments as discussed.

**6PolSK-QPSK/24-cell**

The 24-cell is a four-dimensional polytope that is, according to Coxeter, “…a peculiarity of four-dimensional space… having no analogue [in dimensions] above or below.”

The constellation consists of 24 vertices equally spaced from the origin and each other, and plays an important geometric role of being the Voronoi cell of the \(D_4\)-lattice, as well as the 4D kissing constellation, the latter being proved relatively recently.

The kissing constellation, consisting of the 24-cell and a point at the origin, is also the cluster \(\mathcal{C}_{4,25}\), notably a local maximum in the \(\gamma\) versus \(\beta\) plot, Figure 2.2(b). The cluster \(\mathcal{C}_{4,24}\) is not the 24-cell, but \(\mathcal{C}_{4,25}\) with an outer point removed and centered at the center of gravity.

Nevertheless, the 24-cell \(\mathcal{C}_\text{24-cell}\) performs quite well as a format in its own right. The points can be given as the hypercube in union with the cross-polytope, that is,

\[\mathcal{C}_\text{24-cell}=\{(\pm1,\pm1,\pm1,\pm1),(\pm2,0,0,0)\}\]

taken with all permutations and sign selections.

An alternative, rotated and rescaled, representation is all permutations and sign selections of \((\pm1,\pm1,0,0)\). The 24-cell has a spectral efficiency \(\beta=\log_2(24)/2=2.29\) and power efficiency \(\gamma=\log_2(24)/4=1.15\) (or 0.59 dB).
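As an illustration, both representations can be enumerated and compared. This is a sketch under the assumption that \(\beta=\log_2M/(N/2)\) and \(\gamma=d^2\log_2M/(4E)\); the function names are hypothetical.

```python
from itertools import combinations, permutations, product
from math import log2, log10

def signed_permutations(base):
    """All distinct permutations and sign selections of a vector."""
    out = set()
    for perm in set(permutations(base)):
        for signs in product((-1, 1), repeat=len(perm)):
            out.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(out)

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def summarize(points):
    """Return (M, gamma, set of nearest-neighbour counts per point)."""
    d2_min = min(dist2(p, q) for p, q in combinations(points, 2))
    e_avg = sum(sum(c * c for c in p) for p in points) / len(points)
    gamma = d2_min * log2(len(points)) / (4 * e_avg)
    nearest = {sum(1 for q in points if dist2(p, q) == d2_min) for p in points}
    return len(points), gamma, nearest

# Hypercube in union with the (scaled) cross-polytope.
cell_a = sorted(set(signed_permutations((1, 1, 1, 1))
                    + signed_permutations((2, 0, 0, 0))))
# Rotated and rescaled representation.
cell_b = signed_permutations((1, 1, 0, 0))

for name, pts in (("cube + cross-polytope", cell_a), ("(±1,±1,0,0)", cell_b)):
    M, gamma, nearest = summarize(pts)
    print(f"{name}: M={M}, beta={log2(M) / 2:.2f}, "
          f"gamma={gamma:.3f} ({10 * log10(gamma):.2f} dB), "
          f"nearest neighbours per point={nearest}")
```

In both representations every point has the same number of nearest neighbours, which reflects the high symmetry of the 24-cell.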

In coherent optical communications, it was first proposed by Bülow et al., and later identified as the 24-cell. In optical links, it is realized by transmitting QPSK in one of six different polarization states \((x,y,\pm45^\circ,\text{right/left-hand circular})\), and hence referred to as 6-polarization shift-keying (6PolSK)-QPSK.

The 24-ary nature of 6PolSK-QPSK makes the bit-to-symbol mapping nontrivial, although a scheme was proposed based on mapping 9 bits to two subsequent 6PolSK-QPSK symbols. The resulting format has a slightly reduced \(\gamma\) of 0.51 dB and spectral efficiency of 2.25 bit/symbol/polarization.

**M-SP-QAM and the \(D_4\)-Lattice**

As was mentioned earlier, the \(D_4\) lattice is the densest packing of many points in 4D space. It is, therefore, of importance when finding useful modulation formats for 4D transmission lines.

There are systematic and low-complexity ways of doing this, rather than resorting to sphere packing optimizations and the above-mentioned clusters. The idea is to cut finite portions from the \(D_4\) lattice by using a cubic or spherical cut. The former is easier to implement, but the latter is better from a theoretical perspective; for many points, the spherical cut is ultimately 1.5 dB better, as was discussed earlier.

In general, the \(D_4\) lattice can be defined, for example, as all points with integer coordinates that sum to an even number. It can be obtained from the cubic (integer) lattice \(\mathbb{Z}^4\) in two ways, either by reduction, or extension.

The reduction scheme is similar to Ungerboeck’s set partitioning (SP), introduced for trellis-coded modulation. The idea is to remove half of the points in \(\mathbb{Z}^4\), for example, those with odd parity. Thus one has

\[\tag{2.13}D_4=\left\{(k_1,k_2,k_3,k_4)\in\mathbb{Z}^4\;\middle|\;\sum_{i=1}^{4}k_i\ \text{even}\right\}\]

The extension scheme is instead to start from \(\mathbb{Z}^4\) and add a copy of \(\mathbb{Z}^4\) shifted by half an integer in every dimension, that is,

\[\mathbb{Z}^4\cup\mathbb{Z}^4+\left(\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}\right)\]

However, if the center-of-mass should remain at the origin (which is most efficient from a power efficiency point of view), it is better to shift the two cubic lattices an equal amount in opposite directions, that is

\[\tag{2.14}D_4^*=\mathbb{Z}^4-\left(\frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{4}\right)\cup\mathbb{Z}^4+\left(\frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{4}\right)\]

Both methods give the \(D_4\) lattice (apart from a rescaling), and both are useful when obtaining power efficient modulation formats, especially with high spectral efficiency.

The reduction scheme was originally suggested for optical communications by Coelho and Hanik who called the resulting formats \(M\)-ary set-partitioned QAM or \(M\)-SP-QAM. Later, Karlsson and Agrell extended the concept to the whole hierarchy of formats obtainable from extension or reduction of the standard rectangular QAM formats.

In the reduction process, the minimum distance squared is increased by a factor of two, at the expense of losing 1 bit per symbol. Applying this to PM-QPSK leads to the PS-QPSK format with a gain of \(2\times3/4=3/2\) (1.76 dB) over PM-QPSK.

Applying the same technique to PM-16QAM is more attractive, leading to a gain of \(2\times7/8=7/4\) (2.43 dB) over PM-16QAM. This format is called 128-SP-QAM, and after being introduced by Coelho and Hanik it was studied in simulations of nonlinear transmission by Renaudier et al. and Sjödin et al. The latter paper also discussed the problem of bit-to-symbol mapping and maximum-likelihood decoding for the format.

By using the extension scheme on PM-QPSK, one obtains the 32-SP-QAM format, which has \(\gamma=0\) dB, that is, the same power efficiency as PM-QPSK, but transmitting 5 bits per symbol rather than 4.

By using extension and reduction for known QAM formats, an \(M\)-SP-QAM hierarchy with \(M\) = 8, 32, 128, 512, 2048… can be realized, and in the recent review article by Fischer et al., more properties of these formats are given, including, for example, mutual information.

The following relations between power efficiency and spectral efficiency, corresponding to (2.5), can be derived for the SP-QAM hierarchy

\[\tag{2.15}\gamma_\text{SP-QAM}=\frac{3\beta}{2^{\beta+\frac{1}{2}}-1}\]

\[\tag{2.16}\gamma_\text{SP-QAM}=\frac{3\beta}{2^{\beta+\frac{1}{2}}-\frac{1}{2}}\]

where (2.15) holds for SP-QAM formats obtained by reduction of a rectangular PAM format and (2.16) for extension.
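The two expressions can be evaluated for the formats named above. This is a small sketch, assuming that \(\beta\) counts bits per dimension pair, so that an \(M\)-ary 4D format has \(\beta=\log_2(M)/2\); the function names are illustrative.

```python
from math import log2, log10

def gamma_sp_reduction(beta):
    """Eq. (2.15): SP-QAM obtained by reduction."""
    return 3 * beta / (2 ** (beta + 0.5) - 1)

def gamma_sp_extension(beta):
    """Eq. (2.16): SP-QAM obtained by extension."""
    return 3 * beta / (2 ** (beta + 0.5) - 0.5)

# (M, construction) pairs stated in the text.
formats = [(8, gamma_sp_reduction),     # PS-QPSK, reduction of PM-QPSK
           (32, gamma_sp_extension),    # extension of PM-QPSK
           (128, gamma_sp_reduction)]   # reduction of PM-16QAM

for M, formula in formats:
    beta = log2(M) / 2
    g = formula(beta)
    print(f"{M}-SP-QAM: beta={beta:.1f}, gamma={g:.3f} ({10 * log10(g):+.2f} dB)")
```

The results reproduce the values quoted in the text: 1.76 dB for PS-QPSK, 0 dB for 32-SP-QAM, and \(\gamma=7/10\) for 128-SP-QAM.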

**Other 4D Formats of Interest**

The 16-ary cluster \(\mathcal{C}_{4,16}\) is 1.11 dB better (in the \(\gamma\) sense) than the hypercube (PM-QPSK), but the coordinates are not very nice. Layered along one coordinate axis, it consists of a 3D octahedron and a 3D cube, sandwiched between two single points.

The mutual information reveals only a marginal improvement over PM-QPSK, although it has received some experimental interest.

Another improvement over the PM-QPSK format was proposed by Sjödin et al. Referred to as subset-optimized PM-QPSK (SO-PM-QPSK), the idea was to improve PM-QPSK by rescaling one (e.g., the even-parity) subset and leaving the other unchanged. By optimizing the rescaling to 1.618 (the golden ratio), a 0.44 dB \(\gamma\) improvement over PM-QPSK can be obtained.
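The optimum rescaling can be reproduced with a brute-force scan. The sketch below assumes that “parity” refers to the number of negative coordinates of the cube vertex and that \(\gamma=d^2\log_2M/(4E)\); the grid range and step are arbitrary choices.

```python
from itertools import combinations, product
from math import log2, log10, sqrt

def so_pm_qpsk(scale):
    """PM-QPSK with the even-parity subset rescaled by `scale`."""
    points = []
    for p in product((-1, 1), repeat=4):
        k = scale if p.count(-1) % 2 == 0 else 1.0
        points.append(tuple(k * c for c in p))
    return points

def power_efficiency(points):
    d2_min = min(sum((a - b) ** 2 for a, b in zip(p, q))
                 for p, q in combinations(points, 2))
    e_avg = sum(sum(c * c for c in p) for p in points) / len(points)
    return d2_min * log2(len(points)) / (4 * e_avg)

# Scan the scale factor from 1.000 to 2.000 in steps of 0.001.
best_gamma, best_scale = max(
    (power_efficiency(so_pm_qpsk(1 + i / 1000)), 1 + i / 1000)
    for i in range(1001))

golden_ratio = (1 + sqrt(5)) / 2
print(f"best scale = {best_scale:.3f} (golden ratio = {golden_ratio:.3f})")
print(f"gain over PM-QPSK = {10 * log10(best_gamma):.2f} dB")
```

The optimum sits where the distance between the two subsets equals the distance within the unscaled subset, which gives \(a^2-a-1=0\) and hence the golden ratio.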

It is possible to obtain a nice symmetric 256-point format by cutting the \(D_4\) lattice with a spherical cut around a deep hole. The levels comprise all 4D vectors that lie within a radius of 6, whose coordinates are odd integers and where the coordinate sum is a multiple of 4. Remarkably, this is exactly 256 vectors, and the \(\gamma=16/27=-2.27\) dB, and it is quite likely the most power-efficient 256-ary constellation in 4D.
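The point count and power efficiency stated above can be confirmed by direct enumeration. This is a minimal sketch, assuming the radius bound is inclusive and \(\gamma=d^2\log_2M/(4E)\).

```python
from itertools import combinations, product
from math import log2, log10

odd_levels = range(-5, 6, 2)  # -5, -3, -1, 1, 3, 5 (larger values exceed radius 6)

# Odd-integer coordinates, squared radius <= 36, coordinate sum divisible by 4.
points = [p for p in product(odd_levels, repeat=4)
          if sum(c * c for c in p) <= 36 and sum(p) % 4 == 0]

d2_min = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(points, 2))
e_avg = sum(sum(c * c for c in p) for p in points) / len(points)
gamma = d2_min * log2(len(points)) / (4 * e_avg)

print(f"M = {len(points)}, d_min^2 = {d2_min}, E = {e_avg}")
print(f"gamma = {gamma:.4f} ({10 * log10(gamma):.2f} dB)")
```

The enumeration yields 256 points with average energy 27 and squared minimum distance 8, so that \(\gamma=8\cdot8/(4\cdot27)=16/27\).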

**3.1.4. Eight- and Higher-Dimensional Formats**

The 8D clusters \(\mathcal{C}_{8,M}\) are shown in Figure 2.2 as the \(N\) = 8 line. The maximum power efficiency, \(\gamma_\text{max}\), is 3.01 dB and occurs for \(\mathcal{C}_{8,16}\), which is the 8D cross-polytope, or biorthogonal 16-ary modulation. Interestingly, almost the same \(\gamma\) is obtained for \(M\) = 58 and 241.

For higher spectral-efficiency constellations, the best 8D lattice packing is given by the \(E_8\) lattice, which can be obtained from the \(D_8\)-lattice in union with a shifted \(D_8\), that is,

\[E_8=D_8\cup{D_8}+\left(\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}\right)\]

where \(D_8\) is defined in analogy with (2.13).

Koike-Akino et al. and Millar et al. discussed two ways of obtaining 8D constellations, namely by cutting (spherical) parts from the \(E_8\) lattice and using known block codes.

They classified a few promising 8D modulation formats (\(M\) = 128, 256) in terms of \(\gamma\) as well as in terms of nonlinear transmission reach. They also generalized the study to other dimensions, for example, 6D, 16D, and 24D.

In 16D, the Barnes–Wall lattice is known to be the densest, and a promising 16D constellation with \(M=2^{11}\) points was found from cuts of this lattice, which might be the \(\gamma_\text{max}\) of 16D, even if further studies are required before this can be settled.

It would, interestingly, be in agreement with the approximate expression (2.9). In 24D, the Leech lattice (and the associated Golay code) was used.

**3.1.5. PPM-Based Formats**

Pulse-position modulation (PPM) is a well-known technique to increase power efficiency at the expense of spectral efficiency. The idea is to frame \(K\) symbol slots in time into a \(K\)-ary “supersymbol,” of which one slot is selected for the transmission of a single pulse.

One can in this way transmit \(\log_2(K)\) bits per supersymbol. It has been suggested to combine PPM with higher-order modulation formats into a hybrid \(K\)-PPM-\(M\)-QAM format by transmitting modulated data in the selected PPM slot.

It was pointed out that PS-QPSK is equivalent to 2-PPM-QPSK—it is just another set of four dimensions. To use PPM (i.e., subsequent symbol slots) instead of polarization is often an easier way of realizing more dimensions, and in fact such formats come closer to being an FEC code.

Recently, the PPM idea was further generalized to allow supersymbols with an arbitrary number of nonzero slots (instead of just one). For example, with inverse PPM, iPPM, the idea is to transmit in all symbols but one in the PPM frame. In this way, formats could be realized that have both higher spectral efficiency and higher sensitivity than PM-QPSK. An example is 8iPPM-QPSK, which has \(\beta=2.13\) and \(\gamma=0.84\) dB.
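The quoted efficiencies follow from simple counting. The sketch below makes several assumptions that are not spelled out in the text: a single polarization, unit-energy QPSK in each occupied slot, \(\beta\) in bits per dimension pair, and \(\gamma=d^2\log_2M/(4E)\). The function name is hypothetical.

```python
from math import log2, log10

def hybrid_ppm_qpsk(K, inverse=False):
    """Return (beta, gamma) for K-PPM-QPSK or, if inverse, K-iPPM-QPSK."""
    occupied = K - 1 if inverse else 1
    bits = log2(K) + 2 * occupied      # slot choice + 2 bits per QPSK symbol
    energy = float(occupied)           # unit energy per occupied slot
    # Nearest neighbours either differ by one QPSK step (d^2 = 2) or have one
    # slot switched off and another switched on (d^2 = 1 + 1 = 2).
    d2_min = 2.0
    beta = bits / K                    # 2K real dimensions = K dimension pairs
    gamma = d2_min * bits / (4 * energy)
    return beta, gamma

for label, args in (("2-PPM-QPSK", (2, False)), ("8iPPM-QPSK", (8, True))):
    beta, gamma = hybrid_ppm_qpsk(*args)
    print(f"{label}: beta={beta:.3f}, gamma={10 * log10(gamma):.2f} dB")
```

Under these assumptions, 2-PPM-QPSK reproduces the PS-QPSK values (\(\beta=1.5\), \(\gamma=1.76\) dB), and 8iPPM-QPSK gives \(\beta=17/8\) and \(\gamma=17/14\), in line with the figures above.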

### 3.2. Optimized Formats in Nonlinear Channels

Most formats discussed earlier have been optimized for the linear AWGN channel. However, as pointed out, the fiber is nonlinear, and often systems are operated in a weakly nonlinear regime, where the signal power is optimized as a trade-off between SNR and nonlinear distortions. What can then be done for the nonlinear channel in terms of format optimization?

Within the GN model, the format optimization can be essentially the same as for the linear AWGN model assumed earlier, since the noise will be uniform and approximately Gaussian in all dimensions.

Obviously, formats based on optimizing the minimum distance are reasonable for very high SNRs, whereas in a model with limited SNR (such as the GN model) one would need to optimize at a constant SNR. This can be done, but is more computationally demanding.

One could also argue that the balls would be better than the clusters, since they suffer smaller penalties under average-power limitations than clusters do in maximum-power-limited channels. Then again, in the GN model the average signal power is relevant, which speaks in favor of the clusters. Moreover, there is no simple mapping between peak-power limits in discrete and continuous time. The former is easier to analyze, the latter makes more physical sense. Therefore, a more rigorous comparison between balls and clusters in nonlinear links remains to be done.

Format optimizations have also been done for non-Gaussian channels, such as the phase-noise channel model. Lau and Kahn compared various 4-point constellations, and managed to improve the nonlinear tolerance significantly by going from QPSK to a constellation with 3-PAM plus a fourth point further out.

A comparison was made between constellations with points on 2 and up to 5 different radii. By optimizing 16-point constellations, a few decibels of increased nonlinear tolerance was seen.

In this context, the recent work by Kayhan and Montorsi on constellation optimization should be mentioned, although they considered a linear phase-noise channel model. Like Foschini et al., they considered finite SNR, but used a different target function for the optimization process (approximations and variants of mutual information).

Satellite constellations were introduced in order to show that the capacity of any channel (linear or nonlinear) cannot decrease with signal power. They are formed by taking a standard format, for example, 8-PSK, and moving one point far out from the rest.

This yields a constellation whose minimum SER, as well as maximum MI, occurs at a high average power, which can be made arbitrarily high by moving the lone point (satellite) further from the rest. A similar trick was used by Steiner when optimizing formats in the low-SNR regime.

## 4. Combination of Coding and Modulation

So far, the comparisons of modulation formats in this tutorial have concerned uncoded transmission. Modern optical communication systems, however, often include some kind of FEC coding.

For best system performance, the code should influence the choice of modulation format. For example, a modulation format with high spectral efficiency may require a lower-rate code (better error protection capability) than a modulation format with lower spectral efficiency.

This section discusses optimization of modulation formats in coded systems. We distinguish between three cases, depending on the type of decoder employed, which pose quite different requirements on the choice of modulation format.

The three cases are soft-decision decoding, hard-decision decoding, and iterative decoding, which loosely correspond to weak, medium, and strong coding, respectively. Most of this section is devoted to the first case, which is more intimately connected to the problem of constellation design.

### 4.1. Soft-Decision Decoding

We here consider the application of a relatively short-length, well-structured block code, such as a single-parity check code, Hamming code, or Reed–Muller code.

The employment of such simple codes implies low latency, simple encoding and decoding hardware, and hence low energy consumption. Nevertheless, significant improvements over uncoded transmission can be obtained.

If the decoder is an optimal soft-decision decoder, in the sense that it finds the codeword that is closest to the received word in Euclidean distance, then the combination of code and modulation can be regarded as a single higher-dimensional modulation operation.

This approach has been developed extensively in the communications literature and more recently in an optical context.

To be precise, let \(\mathcal{C}\) be a binary linear block code with parameters \((n, k,d_\text{H})\), where \(n\) is the total number of bits per codeword, \(k\) is the number of information bits per codeword, and \(d_\text{H}\) is the minimum Hamming distance. The code rate is \(k/n\).

If this code is used in combination with a (low-dimensional) constellation \(\tilde{\mathcal{X}}\) with dimension \(\tilde{N}\) and size \(\tilde{M}\), then \(\log_2{\tilde{M}}\) bits are needed to index each point in the constellation, and the \(k\) information bits in a codeword suffice to index a block of \(k/\log_2{\tilde{M}}\) constellation points (assuming that this is an integer).

This block can be regarded as a point in a larger constellation \(\mathcal{X}\) with parameters

\[\begin{align}N&=\frac{n\tilde{N}}{\log_2\tilde{M}}\\M&=\tilde{M}^{k/\log_2\tilde{M}}=2^k\end{align}\]

There exists no general expression for calculating the minimum Euclidean distance \(d\) from the parameters of \(\mathcal{C}\) and \(\tilde{\mathcal{X}}\). It depends heavily on the mapping from bits to symbols, which needs to be done with some care.

The simplest, and most common, special case is to let \(\tilde{\mathcal{X}}\) be a BPSK constellation \(\tilde{\mathcal{X}}=\{\pm\sqrt{\tilde{E}}\}\), where \(\tilde{E}\) is the symbol energy, with parameters \(\tilde{N}=1\) and \(\tilde{M}=2\). This yields, for every code \(\mathcal{C}\), a constellation \(\mathcal{X}\) with parameters

\[\tag{2.17}\begin{align}N&=n\\M&=2^k\\d^2&=4\tilde{E}d_\text{H}\\E&=n\tilde{E}\end{align}\]

Geometrically, the constellation \(\mathcal{X}\) resides on the vertices of an \(n\)-dimensional hypercube. The binary code is used to select a subset of these vertices.

Consider, for example, the 4-bit single-parity check code with parameters \((n,k,d_\text{H})=(4,3,2)\). This yields, by (2.17) and (2.4), a four-dimensional constellation with 8 points and power efficiency \(\gamma=d^2\log_2M/(4E)=d_\text{H}k/n=1.76\) dB. This constellation is identical to the PS-QPSK constellation.
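The mapping in (2.17) is easy to turn into a small calculator. The following is a sketch only; the function name and the returned dictionary layout are illustrative, and \(\beta\) is assumed to be measured in bits per dimension pair.

```python
from math import log10

def bpsk_coded_constellation(n, k, d_h, e_bpsk=1.0):
    """Constellation parameters from (2.17) for an (n, k, d_H) code on BPSK."""
    N = n                              # one real dimension per coded bit
    d2 = 4 * e_bpsk * d_h              # squared minimum Euclidean distance
    E = n * e_bpsk                     # energy per N-dimensional symbol
    beta = k / (N / 2)                 # bits per dimension pair
    gamma = d2 * k / (4 * E)           # simplifies to d_H * k / n
    return {"N": N, "log2_M": k, "beta": beta,
            "gamma": gamma, "gamma_dB": 10 * log10(gamma)}

# Single-parity check code from the example above.
spc = bpsk_coded_constellation(4, 3, 2)
print("SPC (4,3,2):", spc)

# Bit-level parameters of the (255,239) Reed–Solomon code quoted later
# in this section.
rs = bpsk_coded_constellation(2040, 1912, 17)
print(f"RS: beta={rs['beta']:.2f}, gamma={rs['gamma_dB']:.2f} dB")
```

Note that the symbol energy \(\tilde{E}\) cancels in \(\gamma\), so the power efficiency depends only on the code parameters.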

Using this approach, a large number of high-dimensional constellations can be designed from standard binary block codes. An attractive family of codes for this purpose is the Reed–Muller (RM) codes.

These codes have rather good performance (albeit not optimal) at short block lengths \(n\), and furthermore, there exist fast encoding and decoding algorithms, alleviating the need for table look-up.

An RM code is specified by two integer parameters, \(u\) and \(r\), chosen such that \(u\ge1\) and \(0\le{r}\le{u}\). The parameters of the \(\text{RM}(r,u)\) code are \((n,k,d_\text{H})=\left(2^u,\sum_{i=0}^{r}\binom{u}{i},2^{u-r}\right)\).

Special cases are repetition codes (\(r=0\)), single-parity check codes (\(r=u-1\)), and the universe code (i.e., uncoded transmission, \(r=u\)).
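The parameter formula and its special cases can be tabulated as follows. This is a minimal sketch; it assumes BPSK mapping as in (2.17), so that \(\beta=2k/n\) and \(\gamma=d_\text{H}k/n\), and the function name is illustrative.

```python
from math import comb, log10

def rm_parameters(r, u):
    """Return (n, k, d_H) of the Reed–Muller code RM(r, u)."""
    if u < 1 or not 0 <= r <= u:
        raise ValueError("require u >= 1 and 0 <= r <= u")
    n = 2 ** u
    k = sum(comb(u, i) for i in range(r + 1))
    d_h = 2 ** (u - r)
    return n, k, d_h

u = 3
for r in range(u + 1):
    n, k, d_h = rm_parameters(r, u)
    beta = 2 * k / n            # bits per dimension pair with BPSK
    gamma = d_h * k / n         # power efficiency with BPSK
    print(f"RM({r},{u}): (n,k,d_H)=({n},{k},{d_h}), "
          f"beta={beta:.2f}, gamma={10 * log10(gamma):.2f} dB")
```

For \(u=3\), the rows \(r=0\), \(r=2\), and \(r=3\) are the repetition, single-parity check, and universe codes, respectively.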

The parameters of some constellations obtained from Reed–Muller (RM) codes are illustrated in Figure 2.3.

The obtained constellations are apparently quite competitive, compared with the best-known constellations at the same dimensions and spectral efficiencies, and in several instances the RM codes actually yield the best-known constellations.

For any \(\beta\lt2\), arbitrarily high CFM and \(\gamma\) can be obtained by choosing suitable RM code parameters. This makes RM codes attractive instruments for constellation design, especially since low-complexity coding and decoding algorithms are known.

Analogous curves are presented in Figure 2.4 for Hamming codes, the Golay code, and their extended versions.

It turns out that the Hamming codes yield a constant CFM of 7.78 dB, regardless of their size, and similarly, the extended Hamming codes yield CFM = 9.03 dB. The power efficiency increases with the codeword length \(n\) and reaches asymptotically 4.77 dB for Hamming codes and 6.02 dB for extended Hamming codes. The extended Golay code also reaches \(\gamma\) = 6.02 dB, at a lower spectral efficiency.

Theoretically, nothing prevents us from designing extremely high-dimensional constellations by applying the methods of the previous section to codes with long codewords.

Consider, for example, the ubiquitous (255,239) Reed–Solomon (RS) code, which was standardized by the ITU-T in 2000. This code encodes 239 information bytes into blocks of 255 transmitted bytes, and it has an error-correction capability of 8 bytes. This error-correcting capability corresponds to 8 bit errors in the worst case, but if the bit errors come in bursts, many more than 8 bit errors can be corrected, as long as the errors do not affect more than 8 bytes in total.

The parameters of the (255,239) RS code, converted from bytes to bits, are \((n,k,d_\text{H})=(2040,1912,17)\). This code can, if combined with BPSK modulation as in (2.17), be regarded as a 2040-dimensional constellation with parameters \(\beta\) = 1.87, CFM = 15.31 dB, \(\gamma\) = 12.02 dB, and \(G\) = 11.79 dB.

Compared with the curves in Figures 2.2–2.4, this constellation has an impressive performance, falling far to the right of any of the curves. This exemplifies the essence of coding, which is to improve the power efficiency by increasing the dimensionality.

One could ask whether it makes sense to consider a 2040-dimensional constellation with \(M=3.7\cdot10^{575}\) points. Clearly, it is not possible to enumerate or store all the points. However, if soft-decision decoding is used in the receiver, the performance predicted by the constellation analysis mentioned earlier is indeed achievable, for well-structured codes such as RM and RS.

Generalizing, we conclude that any low-dimensional modulation scheme with soft-decision FEC is equivalent to a high-dimensional modulation scheme without FEC.

In all these cases, the spectral efficiency \(\beta\) never goes above 2, which is the spectral efficiency for uncoded BPSK. This is a serious limitation in practical optical system implementations, where ever higher spectral efficiencies are being targeted.

To circumvent this limitation, one must employ a multilevel constellation instead of BPSK. This leads to a simple type of coded modulation. In the following, we give some simple examples, based on single-parity check codes and RM codes.

A natural extension to BPSK is to let \(\tilde{\mathcal{X}}\) be a regular PAM constellation with \(\tilde{M}=2^{\tilde{m}}\) points. If the distance between two neighboring PAM points is \(\tilde{d}\), then the average symbol energy of \(\tilde{\mathcal{X}}\) is \(\tilde{E}=(\tilde{M}^2-1)\tilde{d}^2/12\). As explained earlier, the \(k\) information bits in a codeword are divided into groups of \(\tilde{m}\) bits, thus indexing a block of \(k/\tilde{m}\) PAM symbols.

This block of PAM symbols constitutes an \(N\)-dimensional constellation \(\mathcal{X}\) with parameters

\[\tag{2.18}\begin{align}N&=\frac{n}{\tilde{m}}\\M&=2^k\\d^2&\ge\tilde{d}^2d_\text{H}\\E&\le\frac{n\tilde{E}}{\tilde{m}}\end{align}\]

for a suitably chosen (Gray-coded) mapping from bits to PAM symbols.

An important special case arises by applying a single-parity check code with parameters \((n,k,d_\text{H})=(\tilde{m}N,\tilde{m}N-1,2)\). The codeword length \(n\) is chosen so that the obtained constellation \(\mathcal{X}\) is \(N\)-dimensional.

The obtained constellations are plotted in Figure 2.5 for various values of \(\tilde{m}\) and \(N\).

The 4D case has been studied in optical communications under the name SP-QAM. If, for example, the code with parameters (8,7,2) is mapped to a Gray-coded 4-PAM constellation, then the parameters of the resulting 128-point constellation are, by (2.18), \(d^2\ge2\tilde{d}^2\) and \(E\le4\tilde{E}=5\tilde{d}^2\), which yields \(\gamma\ge7/10=-1.55\) dB, at a spectral efficiency of \(7/2\). This modulation format is represented by one of the dots in Figure 2.5(b).
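The bounds from (2.18) can be compared with a brute-force construction of the 128-point format. The sketch below assumes one particular Gray labeling of 4-PAM with spacing \(\tilde{d}=2\) and selects the even-parity codewords; other Gray labelings and the odd-parity choice are expected to behave the same way.

```python
from itertools import combinations, product
from math import log2, log10

# Assumed Gray labeling: PAM level -> 2-bit label (neighbours differ in 1 bit).
GRAY_4PAM = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}

def parity(point):
    """Parity of the 8 label bits carried by a 4D point."""
    return sum(bit for level in point for bit in GRAY_4PAM[level]) % 2

# (8,7,2) single-parity check code: keep the points with even bit parity.
points = [p for p in product(GRAY_4PAM, repeat=4) if parity(p) == 0]

d2_min = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(points, 2))
e_avg = sum(sum(c * c for c in p) for p in points) / len(points)
gamma = d2_min * log2(len(points)) / (4 * e_avg)

print(f"M = {len(points)}, d^2 = {d2_min}, E = {e_avg}")
print(f"gamma = {gamma:.2f} ({10 * log10(gamma):.2f} dB)")
```

With \(\tilde{d}^2=4\), the enumeration gives \(d^2=8=2\tilde{d}^2\) and \(E=20=5\tilde{d}^2\), so the bounds in (2.18) are met with equality for this format.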

As \(n\) increases, the gain \(G\) of SP-QAM converges to 1.51 and 2.26 dB for \(N\) = 4 and 8, respectively, which should be compared with the maximum possible gains of 1.97 and 3.72 dB, respectively, in (2.10). The asymptotic gain as \(N\) and \(\tilde{M}\) both approach infinity is 3 dB. There is no gain to be harvested in 2D by this method.

The same types of codes as in Figures 2.3 and 2.4 were applied to 4-PAM, which yielded the results in Figures 2.6 and 2.7. The obtained constellations are relatively weak, compared with the best-known constellations at the same dimensions and spectral efficiencies. Nevertheless, the results show that it is in principle possible to achieve arbitrarily high CFM and \(\gamma\) at any \(\beta\lt4\), if the codeword length is increased sufficiently.

### 4.2. Hard-Decision Decoding

Most commercially deployed long-haul fiber-optical communication systems use hard-decision decoding, which can be realized at a significantly lower hardware complexity than soft-decision decoding.

The decoder can be implemented using binary logic, with no need for analog-to-digital conversion. This in turn admits the use of stronger codes (longer codeword lengths). Reed–Solomon codes are the most popular codes in this context, but BCH (Bose–Chaudhuri–Hocquenghem) codes, Hamming codes, and convolutional codes have also been considered.

In a system with hard-decision decoding, the geometric framework in the previous section makes less sense. In this case, performance metrics based on the minimum Euclidean distance are misleading, and modulation and coding should be kept separate in the analysis.

The standard system design method is to choose a modulation format that guarantees a certain BER, the so-called FEC limit, which is typically in the range of \(10^{-3}\) to \(10^{-4}\), and trust the FEC to bring down the BER to a negligible level.

This design principle is very popular in practice, since it decouples the FEC from the rest of the system, which facilitates experimental work. Two main weaknesses of this standard approach are that it offers no simple mechanism to optimize the code rate (varying the FEC limit) and that it does not account for the bursty nature of errors.

### 4.3. Iterative Decoding

Modern codes such as low-density-parity-check (LDPC) codes and turbo codes have revolutionized wireless communications, and an equally promising potential is envisioned in optical communications.

These codes are typically very long (on the order of 10,000 bits) and have a pseudorandom structure. Algebraic decoding would be far too complex for such codes, but efficient iterative decoding algorithms have been devised, which gradually improve an estimate of the transmitted codeword, using either soft or hard decisions.

Such decoders are not guaranteed to find the optimal codeword, but nevertheless these codes have excellent performance. In some cases, they even approach the maximum spectral efficiencies predicted by Shannon in 1948.

Shannon proved that the achievable data rate of a given modulation format is upper-bounded by the MI, defined in (2.6). Furthermore, recent research has shown that a performance very close to the MI is achievable using long LDPC codes and soft-decision iterative decoding.

The spectral efficiency in bits per dimension pair is plotted in Figure 2.8 for some common 4D constellations, as a function of the SNR per bit. The input \(\pmb{X}\) is drawn uniformly from a constellation \(\mathcal{X}\), given by the modulation format, and a memoryless AWGN channel is assumed. The spectral efficiency is here calculated as \(\beta=I(\pmb{X};\pmb{Y})/(N/2)\) and the SNR per bit as \(E_b/N_0\), where \(N_0\) is the power spectral density of the Gaussian noise.
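A curve of this kind can be approximated by Monte Carlo simulation. The following is a rough sketch, not the method used for Figure 2.8: it assumes uniform inputs, real AWGN with variance \(N_0/2\) per dimension, and \(E_b=E_s/I(\pmb{X};\pmb{Y})\); the function name, trial count, and seed are arbitrary.

```python
import random
from itertools import product
from math import exp, log2, log10, sqrt

def mutual_information(points, es_n0_db, trials=2000, seed=1):
    """Monte Carlo estimate of I(X;Y) in bits per symbol for uniform inputs."""
    rng = random.Random(seed)
    M = len(points)
    es = sum(sum(c * c for c in p) for p in points) / M
    n0 = es / 10 ** (es_n0_db / 10)
    sigma = sqrt(n0 / 2)                       # noise std per real dimension
    acc = 0.0
    for _ in range(trials):
        x = rng.choice(points)
        y = [c + rng.gauss(0.0, sigma) for c in x]
        ref = sum((a - b) ** 2 for a, b in zip(y, x))
        # Sum of likelihood ratios p(y|x')/p(y|x) over all candidates x'.
        s = sum(exp(-(sum((a - b) ** 2 for a, b in zip(y, p)) - ref) / n0)
                for p in points)
        acc += log2(s)
    return log2(M) - acc / trials

pm_qpsk = list(product((-1, 1), repeat=4))
N = len(pm_qpsk[0])
for es_n0_db in (0, 5, 10):
    mi = mutual_information(pm_qpsk, es_n0_db)
    beta = mi / (N / 2)
    eb_n0_db = es_n0_db - 10 * log10(mi)       # Eb = Es / I(X;Y)
    print(f"Es/N0={es_n0_db} dB: beta={beta:.2f}, Eb/N0={eb_n0_db:.2f} dB")
```

Because the estimate is statistical, the printed values fluctuate with the seed and the number of trials; more trials are needed at low SNR for a smooth curve.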

Although the plot only includes a small number of constellations, several interesting conclusions can be drawn.

At a given target \(\beta\), it is practically always beneficial in terms of power efficiency to increase the number of points in the constellation. The fact that the minimum distance decreases is fully compensated for by using a lower code rate (higher overhead).

The gain obtained by increasing the number of points is, however, negligible at low \(\beta\), where practically any constellation performs close to capacity.

If the number of points is kept constant, then the constellations designed for optimum uncoded performance tend to be good also in terms of MI.

The gains in decibels are, however, less than the corresponding gains in terms of \(\gamma\) or CFM. Interestingly, PM-QPSK is better than SO-PM-QPSK for all \(\beta\) values, in contrast to the uncoded performance where SO-PM-QPSK is 0.44 dB better.

This effect is even more pronounced in systems with bit-wise receivers, where PM-QPSK is the best-known constellation, despite its simple structure, gaining approximately 1 dB over the cluster \(\mathcal{C}_{4,16}\).

## 5. Experimental Work

To experimentally demonstrate four- and higher-dimensional modulation formats, one needs to be able to simultaneously access all dimensions in the transmitter and receiver.

Depending on how the dimensions are physically realized in the channel (e.g., time, frequency, or spatial dimensions), this can be more or less complicated, as the used dimensions must be synchronized and not drift between symbols. This often requires tailored DSP algorithms for the considered modulation formats.

In this section, we review the experimental work done on mainly 4D formats, where the four dimensions are the conventional four quadratures (I/Q in each of the \(x\) and \(y\) polarizations).

We divide the discussion into

- Realizations of the transmitter and transmission link properties.
- The receiver algorithms, including DSP and decoding, with a summary table.
- Format detection, that is, how to simply determine the transmitted symbol from the received 4D vector, without resorting to a full search of the Euclidean distances to all points in the whole constellation.
- Alternative ways of extending dimensions in signal space from a complexity and implementation perspective.

### 5.1. Transmitter Realizations and Transmission Experiments

Below, we describe the experimental work in the same order as the theory above.

**3D Simplex**

Dochhan et al. proposed and demonstrated the 3D simplex (tetrahedron) transmission, by using a four-channel digital-to-analog converter (DAC) driving a conventional PM-QPSK modulator. The format transmits QPSK in one polarization and BPSK in the other, thus leaving one quadrature unmodulated.

Of the resulting eight levels (forming a cube), the four with odd parity were selected, giving the desired tetrahedron. The symbol rate was 16 Gbaud, corresponding to a data rate of 32 Gbit/s, which was transmitted over 300 km of single-mode fiber.

The back-to-back sensitivity was approximately 1 dB better than PM-BPSK, in agreement with theory, and the nonlinear robustness was similar to PM-BPSK. Yamazaki et al. developed a simple integrated modulator structure for this format.
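The selection of the tetrahedron from the cube can be illustrated in a few lines. This is a sketch that assumes “odd parity” means an odd number of negative drive levels and uses \(\gamma=d^2\log_2M/(4E)\); PM-BPSK is modeled as one BPSK quadrature per polarization.

```python
from itertools import combinations, product
from math import log2, log10

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def power_efficiency(points):
    d2_min = min(dist2(p, q) for p, q in combinations(points, 2))
    e_avg = sum(sum(c * c for c in p) for p in points) / len(points)
    return d2_min * log2(len(points)) / (4 * e_avg)

# Eight levels of QPSK (two quadratures) plus BPSK (one quadrature).
cube = list(product((-1, 1), repeat=3))

# Keep the four vertices with an odd number of -1 entries.
tetrahedron = [p for p in cube if p.count(-1) % 2 == 1]
pairwise = {dist2(p, q) for p, q in combinations(tetrahedron, 2)}

pm_bpsk = list(product((-1, 1), repeat=2))
gain_db = 10 * log10(power_efficiency(tetrahedron) / power_efficiency(pm_bpsk))

print("tetrahedron points:", tetrahedron)
print("distinct squared distances:", pairwise)
print(f"asymptotic gain over PM-BPSK: {gain_db:.2f} dB")
```

All six pairwise distances are equal, which is the defining property of the simplex, and the asymptotic gain is consistent with the roughly 1 dB measured improvement.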

**PS-QPSK**

The first experimental realization of a 4D format in fiber-optic transmission was probably the demonstration by Sjödin et al. of PS-QPSK at 30 Gbit/s in 2011.

In this experiment, a conventional I/Q modulator for QPSK was used, and then the signal was split into two arms, driven by a pair of Mach–Zehnder amplitude modulators in a push–pull configuration, meaning that either one or the other arm was blocked.

Then, the two arms were multiplexed together by a polarization combiner. In this way, two bits were encoded in the QPSK symbol and the third in the choice of polarization. Similar transmitter structures were used also by other groups around this time, for example, Millar et al. and Nelson et al.

An alternative transmitter setup was used by Fischer et al. who used a PM-QPSK transmitter with a programmable bit-pattern generator, driving the 4 bits with a preprogrammed pattern.

Three of the bit streams were driven by uncorrelated (delayed) pseudo-random sequences, and the fourth was formed as a parity (exclusive OR, XOR) bit from these three sequences.
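The parity-bit approach can be sketched as follows. The assignment of the four bits to the quadratures \((I_x,Q_x,I_y,Q_y)\), the bit-to-level mapping, and the function name are assumptions made for illustration; the experiments may have used a different ordering.

```python
from itertools import combinations, product

def ps_qpsk_drive_levels(b1, b2, b3):
    """Map three bit streams to 4D symbols, with an XOR parity bit as the fourth."""
    symbols = []
    for x, y, z in zip(b1, b2, b3):
        parity = x ^ y ^ z
        # Bit 0 -> +1, bit 1 -> -1 on each of the four quadratures.
        symbols.append(tuple(1 - 2 * b for b in (x, y, z, parity)))
    return symbols

# Drive the transmitter with every possible 3-bit input once.
b1, b2, b3 = zip(*product((0, 1), repeat=3))
symbols = ps_qpsk_drive_levels(b1, b2, b3)

distinct = set(symbols)
d2_min = min(sum((a - b) ** 2 for a, b in zip(p, q))
             for p, q in combinations(sorted(distinct), 2))

print("number of distinct symbols:", len(distinct))
print("minimum squared distance:", d2_min)
```

Only the eight even-parity vertices of the PM-QPSK hypercube are produced, with twice the squared minimum distance of PM-QPSK at the same drive levels.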

This transmitter was also used by Renaudier et al., who also introduced a timing offset between the two polarizations to facilitate the receiver DSP.

Yamazaki et al. presented an integrated modulator optimized for PS-QPSK, which could directly generate PS-QPSK in a single device driven by three binary drive signals. It has the additional benefit of avoiding the inherent 3 dB coupling loss in the I/Q modulators when the I and Q quadratures are mixed.

Following the initial demonstration of single-channel transmission came a stream of experimental demonstrations of PS-QPSK; first single-channel demonstrations at higher data rates, for example, 42 Gbit/s and 112 Gbit/s, and then WDM experiments over ultralong distances (thousands of kilometers).

The general conclusion was that the improvement in transmission distance predicted in simulations was experimentally verified. Typically, PS-QPSK achieved 10–25% longer transmission reach (at a BER=\(10^{-3}\)) than PM-QPSK at the same data rate, both in single-channel systems and with WDM.

Masalkina et al. used PS-QPSK in a 20 Gbit/s orthogonal frequency-division multiplexed (OFDM) transmission experiment. Lavery et al. demonstrated that digital back-propagation could extend the reach for 112 Gbit/s PS-QPSK from 4600 km to 5600 km.

**6PolSK-QPSK**

Experimental generation of 6PolSK-QPSK was first demonstrated in 2012 independently by Buchali and Bülow and by Fischer et al.

The transmitter structure used in these experiments was based on a 4-channel 28-Gbaud DAC driving a dual-polarization I/Q modulator at the three levels {−1, 0, 1}. By forming all 24 permutations and sign selections of the vector (0, 0, ±1, ±1), the 24-cell is obtained.

The symbol-to-bit mapping followed the earlier suggestion of mapping 9 bits to 2 subsequent symbols. In back-to-back measurement, Fischer et al. demonstrated 2.1 and 3.4 dB implementation penalties for PM-QPSK and 6PolSK-QPSK, respectively, for 28 Gbaud and a BER of around \(10^{-3}\). This corresponds to an extra implementation penalty for 6PolSK-QPSK of 1.3 dB.

Ding et al. generated 6PolSK-QPSK by using a single dual-drive (i.e., not I/Q) Mach–Zehnder modulator in each polarization.

In transmission, Fischer et al. demonstrated transmission of 19 WDM channels over 3400 km, which is less than the 4800 km of PM-QPSK. However, by applying a rate 455/511 Reed–Solomon code to 6PolSK-QPSK, making the bit rates the same for both formats, the transmission distance gap was partly bridged.
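The rate bookkeeping behind the 9-bits-per-2-symbols mapping and the 455/511 code rate can be reproduced with a few lines of arithmetic. This is only a sanity check of the numbers quoted above; the variable names are illustrative.

```python
import math

M = 24                                            # points in the 24-cell
bits_per_pair = math.floor(math.log2(M ** 2))     # whole bits carried by a pair of symbols
bits_per_symbol = bits_per_pair / 2               # uncoded bits per 4D symbol
rs_rate = 455 / 511                               # Reed-Solomon code rate quoted in the text
net_bits_per_symbol = bits_per_symbol * rs_rate   # information bits per 4D symbol after coding

print(bits_per_pair, bits_per_symbol)
print(round(net_bits_per_symbol, 3))              # close to the 4 bits per symbol of PM-QPSK
```

A pair of symbols offers 24² = 576 combinations, of which 2⁹ = 512 are used, so the uncoded format carries 4.5 bits per 4D symbol; the code rate brings this down to almost exactly the 4 bits per symbol of PM-QPSK.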

A later extension of this work by Tanimura et al. used a more advanced setup, including pre-emphasis to compensate for DAC imperfections, and investigated the nonlinear tolerance in some detail.

It was found that an inner FEC could be beneficial in removing bursty errors due to nonlinearities. One conclusion from these works was that the lack of Gray mapping for the 6PolSK-QPSK induces an extra penalty relative to PM-QPSK, which is troublesome in the nonasymptotic regime.

**M-SP-QAM**

According to predictions and simulations the SP-QAM formats emerged as an interesting 4D format generalization with increased spectral efficiency over PM-QPSK.

In 2013, 128-SP-QAM was demonstrated by three independent groups. The transmitter structures in these experiments were similar, based on pre-programmable DACs using 8-bit streams of which one is a parity check bit.
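The parity-bit construction can be illustrated as follows. This is a sketch under stated assumptions, not the mapping used in any of the cited experiments: it assumes Gray-labeled 4-PAM levels {−3, −1, 1, 3} in each of the four quadratures and an even-parity rule, and the function names are hypothetical.

```python
from itertools import product

LEVELS = (-3, -1, 1, 3)   # assumed 4-PAM levels per quadrature


def gray_pair_to_level(b1, b0):
    """Map a Gray-coded bit pair (00, 01, 11, 10) to the levels -3, -1, 1, 3."""
    index = 2 * b1 + (b1 ^ b0)
    return LEVELS[index]


def sp128_symbol(info_bits):
    """Map 7 information bits plus one even-parity bit to a 4D symbol."""
    if len(info_bits) != 7:
        raise ValueError("expected 7 information bits")
    parity = sum(info_bits) % 2
    bits = list(info_bits) + [parity]
    return tuple(gray_pair_to_level(bits[2 * k], bits[2 * k + 1]) for k in range(4))


def min_sq_distance(points):
    """Smallest squared Euclidean distance between two distinct points."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


sp128 = [sp128_symbol(bits) for bits in product((0, 1), repeat=7)]
pm16 = list(product(LEVELS, repeat=4))   # full PM-16QAM grid for comparison

print(len(set(sp128)), min_sq_distance(sp128), min_sq_distance(pm16))
```

With this labeling the parity constraint keeps every second point of the PM-16QAM grid, which doubles the minimum squared distance at the cost of one bit per symbol.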

Eriksson et al. used 12 Gbaud for 128-SP-QAM and compared with 10.5 Gbaud PM-16QAM to obtain the same data rate of 84 Gbit/s. Both single channel and 9 WDM channel (25 GHz separation) transmission were compared.

PM-16QAM had a slightly higher implementation penalty (2.1 dB) relative to 128-SP-QAM (1.5 dB), attributed to the improved Euclidean distance of 128-SP-QAM. The back-to-back sensitivity improvement of 128-SP-QAM was 1.9 dB at the same bit rate, and 2.9 dB at the same symbol rate, in close agreement with theoretical expectations.

In transmission, the reach was 1300 km (for PM-16QAM in a WDM system) and 2000 km (for 128-SP-QAM, also in WDM), that is, a 54 % improvement. It was also concluded that PM-16QAM has a larger penalty when going from single channel to WDM than 128-SP-QAM.

Zhang et al. used 128-SP-QAM (denoted “half-4D-16QAM”) over 294 channels (16.64 Gbaud at 17 GHz separation), covering the full C-band, at 104 Gbit/s each to achieve 30.58 Tb/s.

The transmitter used bit-interleaved coded modulation (BICM) together with a 20% overhead LDPC code. They also used DACs for Nyquist channel shaping to reduce interchannel crosstalk.

These data were then transmitted over 7230 km, making it (to that date) the experiment with the highest bit rate-times-distance product of 221 Pb/s × km. In a later experiment, digital back-propagation was used to increase the transmission distance to 10,300 km, but at a reduced data rate of 21 Tb/s.
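The headline numbers of this experiment follow directly from the channel count, per-channel rate, channel spacing, and distance quoted above, as this short check shows. The variable names are illustrative.

```python
channels = 294
rate_per_channel_gbps = 104      # Gbit/s per channel
spacing_ghz = 17                 # channel separation
distance_km = 7230

total_tbps = channels * rate_per_channel_gbps / 1e3     # aggregate bit rate in Tb/s
occupied_thz = channels * spacing_ghz / 1e3             # occupied optical bandwidth in THz
product_pbps_km = total_tbps * distance_km / 1e3        # bit rate x distance in Pb/s x km

print(round(total_tbps, 2), round(occupied_thz, 2), round(product_pbps_km))
```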

Renaudier et al. demonstrated both 32-SP-QAM and 128-SP-QAM at 28 Gbaud. In back-to-back experiments, they achieved 0.8 and 1.5 dB implementation penalties, respectively, relative to the AWGN theory.

Then, they propagated 16 channels in a circulating loop configuration, at optimized power (which was the same for all formats), and measured transmission distance. At a BER of \(4\cdot10^{-3}\), they could propagate PM-QPSK, 32-SP-QAM, 128-SP-QAM, and PM-16QAM, respectively, over 18,000, 14,000, 7000, and 4000 km.

At ECOC 2013, two independent groups compared 32-SP-QAM with another scheme of the same spectral efficiency, \(\beta\) = 2.5 bits/symbol/pol, namely hybrid PM-QPSK/PM-8QAM.

The hybrid scheme means that half of the transmitted symbols are PM-QPSK (\(\beta\) = 2) and half are PM-8QAM (\(\beta\) = 3). In one of the experiments, 65 symbols of PM-QPSK were followed by 65 symbols of PM-8QAM. In the other, the interleaving was every second symbol, but done in both polarizations in a staggered manner, so that each symbol slot contained one polarization of QPSK and one polarization of 8QAM, and then the next symbol swapped formats in the polarizations.

The 8QAM format used in both papers was the star-shaped constellation consisting of two QPSK constellations at different radii, rotated \(45^\circ\) relative to each other. Both studies concluded (in line with theoretical predictions) that 32-SP-QAM had better performance in terms of transmission distance.

The 32- and 128-SP-QAM were also demonstrated in few-mode fiber transmission over 42 km by van Uden et al., but then using pairwise time slots, rather than polarizations, to set up the 4D space, and subsequently propagating 4D symbols independently in two polarizations and three spatial modes.

Also, the PS-QPSK counterpart, denoted time-switched (TS)-QPSK, was implemented. The transmitter setup consisted of a four-channel programmable DAC at 28 Gbaud, followed by a polarization- and mode-multiplexing stage.

One conclusion from this experiment was that the 4D formats have less implementation penalty than their cubic-lattice counterparts (e.g., when comparing 128-SP-QAM with PM-16QAM).

The record SP-QAM experiments in terms of constellation size are the impressive 512- and 2048-SP-QAM, which are obtained from the related constellations PM-32QAM (cross constellation) and PM-64QAM, respectively, as recently demonstrated by Fischer et al.

The formats were generated by carefully co-optimizing multilevel digital-to-analog converters with I/Q modulators, but the 2048 case suffered, not surprisingly, from 3 to 6 dB implementation penalty. The 512-SP-QAM had a more moderate 2 dB of implementation penalty.

**More Complex 4D Formats**

The \(\mathcal{C}_{4,16}\) cluster was experimentally demonstrated independently at OFC 2013 by Karout et al. and by Bülow et al.

The experiment of Karout et al. was based on an optical OFDM link, with 81 subcarriers generated by 2 synchronized DACs, comparing \(\mathcal{C}_{4,16}\) with PM-QPSK.

The signal bandwidth was 6.5 GHz, and a data rate of 25.6 Gbit/s was obtained. A small overhead was allocated for training sequences and guard-band.

A small performance gain for \(\mathcal{C}_{4,16}\) could be seen, in good agreement with theoretical BER curves. Only back-to-back measurements, that is, no transmission, were carried out in this demonstration.

The experiment of Bülow et al. used a baseband signal realized by a 28-Gbaud DAC in the transmitter. The data were transmitted 480 km, but the theoretical performance gain of \(\mathcal{C}_{4,16}\) was shadowed by a larger implementation penalty than PM-QPSK. The experiments indicated an increased nonlinear tolerance of \(\mathcal{C}_{4,16}\), but this was not conclusive.

An extension of 6PolSK-QPSK to 8PolSK-QPSK was proposed and demonstrated by Chagnon et al. This format uses eight different polarizations, put on the cube corners in Stokes space, and each with QPSK modulation.

The format was generated with a four-channel DAC driving a dual-polarization I/Q modulator, yielding a data rate of 129 Gbit/s. The reach was 3800 km (including WDM channel loading), well above the 2800 km of the PM-8QAM it was benchmarked against.

Bülow et al. also demonstrated experiments of formats obtained from spherical cuts from the \(D_4\)-lattice, with \(M\) = 64 and \(M\) = 256. Power efficiencies over PM-8QAM and PM-16QAM of 1.5–1.7 dB were reported. The experiments also included BICM using a 17 % overhead LDPC code with these formats, and the \(M\) = 64 format showed a 35 % reach improvement over PM-8QAM.

Eriksson et al. demonstrated the 256-ary \(D_4\) format and compared it with PM-16QAM at 56 Gbaud. As could be expected, the \(D_4\) format was better back-to-back, for low (\(\lt10^{-3}\)) bit error rates, but for higher BERs (corresponding to transmission distances above 1500 km) PM-16QAM had the edge.

**Higher-Dimensional Formats**

As the only way of improving performance (increasing both power efficiency and spectral efficiency) beyond the limits of 4D formats is to increase the dimensionality of the constellations, there has been work carried out in that direction as well.

A natural extension is to move to eight dimensions (8D), and biorthogonal 8D modulation was implemented and evaluated by Eriksson et al. The eight dimensions were formed by using two phase-locked neighboring frequency (or wavelength) channels, and then performing coherent detection of both 4D channels in parallel.

The transmitter was a generalization of the corresponding transmitter for PS-QPSK, that is, a PS-QPSK transmitter in cascade with a pair of push–pull modulators to select frequency.

The format was referred to as 4-ary frequency- and polarization-switched QPSK, 4FPS-QPSK. The transmission properties (at 10 Gbaud) showed a reach of 14,000 km for 4FPS-QPSK, compared with 7500 km for PM-QPSK.

In a related experiment, PPM was used instead of frequency to realize the eight dimensions. Two PS-QPSK symbol slots formed a PPM frame, giving a 2PPM-PS-QPSK format, equivalent to the 8D biorthogonal format.

The transmitter becomes notably simpler, requiring a single modulator to select time slot, followed by a PS-QPSK transmitter. The implementation penalty at a BER of \(10^{-3}\) and a symbol rate of 21.4 Gbaud was 0.6 dB, slightly more than the 0.3 dB for PM-QPSK.

The data rate was 85.6 Gbit/s for both formats, meaning that the 2PPM-PS-QPSK format used 42 GHz of bandwidth, twice that of the PM-QPSK transmission. The transmission reach was almost doubled; 2PPM-PS-QPSK reached 12,300 km and PM-QPSK 6700 km, which is in reasonable agreement with a simple GN-model-based theory, predicting a doubling of the reach with a 3 dB improved sensitivity.
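The scaling argument can be sketched with a very simplified GN-type model. The assumptions here are that the signal-to-noise ratio after \(N\) identical spans is \(P/(N(P_\text{ASE}+\eta P^3))\), with nonlinear interference adding incoherently from span to span, and that the launch power \(P\) is optimized; the parameter values below are hypothetical placeholders and are not taken from the experiment.

```python
def reach_in_spans(snr_req_db, p_ase=1e-6, eta=1e3):
    """Continuous-valued number of spans reachable at the optimum launch power.

    Assumed model: SNR(N, P) = P / (N * (p_ase + eta * P**3)).
    p_ase and eta are hypothetical per-span noise and nonlinearity coefficients.
    """
    p_opt = (p_ase / (2 * eta)) ** (1 / 3)               # launch power maximizing the SNR
    snr_one_span = p_opt / (p_ase + eta * p_opt ** 3)    # best achievable SNR after one span
    snr_req = 10 ** (snr_req_db / 10)
    return snr_one_span / snr_req                        # SNR falls as 1/N in this model


baseline = reach_in_spans(9.8)          # hypothetical required SNR in dB
improved = reach_in_spans(9.8 - 3.0)    # same system with 3 dB better sensitivity
print(improved / baseline)              # ratio of reaches, independent of p_ase and eta
```

In this model the optimum SNR is inversely proportional to the number of spans, so a 3 dB lower SNR requirement gives a reach ratio of 10^0.3, that is, very nearly a factor of two. The measured ratio of 12,300 km to 6700 km is about 1.8.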

Shiner et al. also demonstrated the 8D biorthogonal format by using two subsequent temporal symbols to form the eight dimensions. Their experiment demonstrated transmission over 5000 km at 35 Gbaud, including WDM channels. They also rotated the constellation aiming to reduce the nonlinear effects, noticing a 1 dB improvement in the nonlinear system margin.

In a spatial-division multiplexed context, and in particular multi-core fibers, Puttnam et al. investigated modulation over several cores in parallel. Especially, formats based on a single-parity-check scheme (as outlined earlier) showed good results.

The idea was to coherently transmit 4 bits per symbol in \(K\) parallel cores, and then use a single bit as a parity check bit, giving a total rate corresponding to \(4K-1\) parallel BPSK streams.

It may be shown that this scheme has both the spectral efficiency and the power efficiency equal to \(2-1/(2K)\). The special case \(K\) = 1 is equivalent to PS-QPSK. An experiment at 10 Gbaud over 28 km of multi-core fiber, demonstrating the concept for \(K\) = 7 cores, was performed, and the results were in good agreement with theoretical expectations.
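The expression \(2-1/(2K)\) can be verified numerically for small \(K\) by building the single-parity-check constellation explicitly. The sketch below assumes that the spectral efficiency is counted in bits per symbol per polarization (with \(2K\) polarization channels in total) and that the power efficiency is the asymptotic figure \(d_\text{min}^2/(4E_b)\); both definitions, and the function names, are assumptions made here for illustration.

```python
from itertools import product


def spc_constellation(k_cores):
    """All BPSK vectors of length 4*K whose underlying bits have even parity."""
    n = 4 * k_cores
    points = []
    for bits in product((0, 1), repeat=n - 1):
        word = bits + (sum(bits) % 2,)                  # append the parity check bit
        points.append(tuple(1 - 2 * b for b in word))   # bit 0 -> +1, bit 1 -> -1
    return points


def efficiencies(k_cores):
    """Return (spectral efficiency, asymptotic power efficiency) for K cores."""
    points = spc_constellation(k_cores)
    n = 4 * k_cores
    info_bits = n - 1
    spectral_eff = info_bits / (2 * k_cores)            # bits per symbol per polarization
    energy_per_bit = n / info_bits                      # every point has energy n
    ref = points[0]                                     # the code is linear, so one
    d2_min = min(                                       # reference point is sufficient
        sum((a - b) ** 2 for a, b in zip(ref, p)) for p in points[1:]
    )
    return spectral_eff, d2_min / (4 * energy_per_bit)


for k in (1, 2):
    print(k, efficiencies(k), 2 - 1 / (2 * k))
```

For \(K\) = 1 both figures equal 1.5, matching the known values for PS-QPSK. The enumeration grows as \(2^{4K-1}\) points, so this brute-force check is only practical for small \(K\).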

**PPM Implementations**

Liu et al. suggested combining pulse position modulation (PPM) with PM-QPSK to show a record sensitivity for data transmission of 16-PPM at 2.5 Gbit/s and 4-PPM at 6.23 Gbit/s.

The main benefit of these formats is the increased power efficiency, requiring only 2 photons per bit at 2.5 Gbit/s, making them suitable in, for example, single-span long-distance links that demand high sensitivity.

The transmitter was a standard PM-QPSK transmitter, but driven with a more complex data signal, including PPM framing and also some synchronization overhead signals. The transmission was done over a single span of 370 km ultra-large-effective area fiber, with a total loss of 69 dB.

Slightly outside the main topic of this tutorial, we could mention the free-space link experiment by Ludwig et al., which used 64-PPM overlaid with PS-QPSK to require only 2.2 photons per bit for data transmission at 0.56 Gbit/s.

In this context, we shall also reiterate the experiments by Sjödin, Eriksson et al., and van Uden et al., who implemented PS-QPSK as 2-PPM-QPSK, that is, by using two adjacent symbol time frames instead of two polarizations.

### 5.2 Receiver Realizations and Digital Signal Processing

The conventional coherent receiver DSP for PM-QPSK operates according to the following flow.

- Compensation for static errors in the receiver front end (timing skew, power imbalance, I/Q phase error, usually using Gram–Schmidt orthogonalization).
- Static channel equalization (mainly chromatic dispersion compensation, usually with a finite impulse response (FIR) filter).
- Dynamic channel equalization (mainly polarization tracking and polarization mode dispersion compensation, usually using the constant modulus algorithm (CMA)).
- Interpolation and timing (clock) recovery.
- Frequency estimation.
- Carrier phase estimation (usually using the Viterbi and Viterbi (VV) algorithm).
- Symbol estimation and decoding.
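As an illustration of the carrier phase estimation step in the flow above, the following is a minimal block-wise sketch of the fourth-power (Viterbi and Viterbi) idea for a single polarization. It assumes QPSK points at odd multiples of \(\pi/4\), a phase that is constant over the block, and no phase unwrapping between blocks; the function name is hypothetical.

```python
import cmath
import math


def vv_phase_estimate(samples):
    """Fourth-power phase estimate for a block of QPSK samples.

    Raising a QPSK symbol at an odd multiple of pi/4 to the fourth power
    removes the data and leaves -exp(4j * phase), so the sign is flipped
    before taking the argument. The result is only defined modulo pi/2.
    """
    s4 = sum(z ** 4 for z in samples)
    return cmath.phase(-s4) / 4


# Noise-free example: the four QPSK points rotated by a common phase offset.
qpsk = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]
offset = 0.1
received = [s * cmath.exp(1j * offset) for s in qpsk]
print(vv_phase_estimate(received))   # approximately equal to the offset
```

The fourfold ambiguity of the estimate is the reason practical receivers combine this step with differential coding or pilot symbols, and it is also why the step is sensitive to the modulation format.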

Traditionally, blind algorithms have been mostly used to estimate and compensate for channel effects, meaning that the modulation format is known, but the exact symbol transmitted at each time is unknown. Much is based on the knowledge of the used format, so when changing modulation format, a number of these steps have to be altered or modified.

Most “sensitive” in this respect are the final stages: symbol estimation and decoding, and the carrier phase estimation. For example, when moving from PM-QPSK to PM-16QAM, these stages obviously need to be modified.

However, also the dynamic channel estimation stage (the CMA) may need to be modified when changing format. For PS-QPSK, this is the case; the CMA needs to be modified, whereas the other stages can be kept the same as for PM-QPSK.

More recently, however, there is a research trend toward the use of nonblind schemes, where the channel estimation is based on known signals, training sequences, which are transmitted regularly.

This is especially popular in optical OFDM systems, but it is increasingly used in conventional baseband transmission as well. In the following, we describe how these stages need to be modified for some of the 4D formats.

**3D Simplex**

The experiment of Dochhan et al. used more or less standard coherent receiver algorithms, although slightly modified CMA and phase-tracking algorithms were used. A blind algorithm similar to the one presented by Yan et al. was used together with the standard CMA for polarization tracking.

**PS-QPSK**

Naively, one might think that since PS-QPSK is a subset of PM-QPSK, the conventional DSP should work also for PS-QPSK. This is true for all DSP stages except the polarization equalization, the CMA, which needs to be modified.

This dynamic equalizer has a cost function \(J\) that is minimized in an iterative process, aiming to optimize FIR-filter coefficients in a Jones-matrix-like filter.

Figuratively speaking, the cost function is minimized when the detected samples lie on a circle of unit radius in both polarizations. This scheme works surprisingly well also to compensate for transmission impairments such as polarization mode dispersion (PMD) and polarization dependent losses (PDL), and in fact it also works (albeit with reduced performance) “out-of-the-box” for formats without a constant, but with a nonzero, modulus, such as PM-16QAM.

However, for PS-QPSK it fails, due to an ambiguity making the cost function minimum nonunique. In addition, it requires the polarizations to be independently phase-tracked by two separate VV algorithms.

A number of ways to resolve this issue have been reported. Johannisson et al. suggested a modified cost function \(J\) according to

\[\tag{2.19}J=E[(|E_x|^2+|E_y|^2-P)^2+2q|E_x|^2|E_y|^2]\]

where \(E[]\) denotes the expectation operator (i.e., averaging over a number of symbols), \(E_{x,y}\) are the complex amplitudes of the symbols in the \(x\) and \(y\) polarizations, and \(P\) is the total signal power.

The parameter \(q\) is set to −1 for PM-QPSK and +1 for PS-QPSK. This enables the same CMA to be used with both formats, by just changing the parameter \(q\) in the cost function, which should facilitate implementations.
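To make the role of the cross term in (2.19) concrete, the sketch below evaluates the cost function on noise-free PS-QPSK symbols, once with the polarizations correctly aligned and once after a 45° polarization rotation. It only evaluates \(J\); it does not implement the stochastic-gradient update of the equalizer taps. The symbol scaling (QPSK points \(\pm1\pm i\), so \(P\) = 2) and the helper names are assumptions made for this example.

```python
import math


def modified_cma_cost(ex, ey, p, q):
    """Sample average of (|Ex|^2 + |Ey|^2 - P)^2 + 2 q |Ex|^2 |Ey|^2."""
    total = 0.0
    for x, y in zip(ex, ey):
        px, py = abs(x) ** 2, abs(y) ** 2
        total += (px + py - p) ** 2 + 2 * q * px * py
    return total / len(ex)


def rotate(ex, ey, theta):
    """Apply a real polarization rotation by angle theta to each symbol."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * x - s * y for x, y in zip(ex, ey)],
            [s * x + c * y for x, y in zip(ex, ey)])


# The eight PS-QPSK symbols: QPSK in one polarization, nothing in the other.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
ex = qpsk + [0j] * 4
ey = [0j] * 4 + qpsk

aligned = modified_cma_cost(ex, ey, p=2.0, q=+1)
rx, ry = rotate(ex, ey, math.pi / 4)
mixed = modified_cma_cost(rx, ry, p=2.0, q=+1)
print(aligned, mixed)
```

With \(q\) = +1 the cross term penalizes symbols that carry power in both polarizations at the same time, so the cost vanishes only for the correctly demultiplexed signal and is strictly positive for the rotated one.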

Alternative CMA approaches were independently suggested. Millar and Savory proposed a modified CMA where the magnitude of the polarization components is compared, and the weaker allowed to reach zero.

An experiment using OFDM with PS-QPSK and an outer LDPC code showed that a 4D demapper in the iterative decoder loop gave increased decoder performance.

Renaudier et al. proposed instead to use a combined transmitter/receiver-based solution for the CMA ambiguity.

By introducing a time offset equal to an integer number of symbol times between the two polarization components in an XOR-based PM-QPSK transmitter, the polarization ambiguity outlined earlier is suppressed and the received signal “looks” like a PM-QPSK signal to the receiver CMA.

Note that this offset must be larger than the maximum differential delay of the multi-tap CMA filter. Then, the standard PM-QPSK CMA can be used in the receiver, but a modified VV algorithm must be used instead. Also, Alreesh et al. presented an alternative VV algorithm for PS-QPSK, enabling joint phase tracking in both polarizations.

**6PolSK-QPSK**

The coherent receivers for 6PolSK-QPSK followed the standard coherent receiver, with modifications for the polarization and phase-tracking algorithms.

The implementation of Buchali and Bülow to detect 6PolSK-QPSK used a modified CMA with a new cost function for the polarization tracking and a more complicated VV scheme, raising the signals to the eighth power for the phase estimation.

Fischer et al. used a pilot sequence with 1.1 % overhead, which also helped with the local oscillator frequency offset estimation, to do the polarization tracking, thus avoiding a CMA. For phase estimation, they used a standard VV algorithm, but on a subset of the detected symbols.

Tanimura et al. also considered nonlinear compensation via digital back-propagation as well as with an inner Reed–Solomon FEC.

In addition to the experiments discussed earlier, a couple of later extensions of the 6PolSK-QPSK work should be mentioned. Bülow calculated mutual information and estimated its performance with an outer FEC. Bülow and Masalkina investigated coded modulation based on, among other formats, 6PolSK-QPSK.

Chen et al. extended 6PolSK-QPSK to a 32-point constellation, enabling a 5-bits-to-symbol mapping by extending the constellation with 8 additional points outside the 24-cell. Even if this format was shown to have a sensitivity improvement over, for example, star-shaped 8QAM, it gives a penalty relative to better 4D constellations with 32 points such as 32-SP-QAM.

**M-SP-QAM**

The DSP required for 32- and 128-SP-QAM is very similar to the requirements for PM-16QAM, which is not obvious, but it works according to simulations and experiments.

In implementations with the same receiver DSP (intended for PM-16QAM), one generally finds that 32- and 128-SP-QAM have less implementation penalty, which is likely due to the better separation of levels.

Sun et al. compared the linewidth tolerance of 32-SP-QAM and the hybrid PM-QPSK/8QAM format and found it to be comparable, provided that the VV algorithm was slightly modified for 32-SP-QAM.

The work by Zhang et al. also added coded modulation (of the BICM flavor) to the use of 128-SP-QAM with an LDPC(18360,15300) code and an interleaver over 30 independent bit streams demultiplexed from 2 separate wavelengths.

The decoding process then used 10 inner and 5 outer iterations to achieve the required performance. The performance was further improved by adding nonlinear back-propagation to the DSP, thus enabling longer transmission distances.

In van Uden’s work on few-mode transmission, the six spatial channels (two polarization modes in three fiber modes) were first optically polarization- and mode-demultiplexed, and then processed in a 6 × 6 MIMO (multiple input multiple output) structure optimized by an iterative least mean squares scheme, before entering the coherent phase estimator, demapper, and BER counter.

This MIMO structure can be seen as a generalization of the CMA scheme in a conventional coherent receiver. The receiver was also simplified by the fact that the 4D modulation was performed in the time domain, rather than the polarization domain (which will not affect the theoretical spectral efficiency or sensitivity for linear transmission).

**More Complex Formats**

The \(\mathcal{C}_{4,16}\) cluster was transmitted using OFDM, and the synchronization issues, such as polarization and phase tracking, are then addressed in the OFDM receiver, using the channel estimators in the OFDM DSP.

For example, one of the OFDM subcarriers was left unmodulated to act as a pilot channel, aiding the synchronization DSP.

In addition, three training symbols were used in every 512 OFDM-symbol block. The overall spectral efficiency was thus 3.82 bits/symbol/pol, rather than 4 for the raw format.

On the other hand, PM-QPSK was transmitted over the same OFDM channel, so the two formats were compared in a fair fashion.

The implementation of Bülow et al. to detect \(\mathcal{C}_{4,16}\) (referred to as “OPT16”) was a more conventional coherent baseband receiver. The CMA was similar to the one used in their 6PolSK-QPSK experiments. The phase tracking was carried out by using a decision-directed least mean squares scheme, aided by a training sequence. After the training sequence set up the initial starting point, the decision-directed scheme could take over.

**Higher-Dimensional Formats**

In the 8D implementation of frequency and polarization-switched QPSK, the receiver used only one local oscillator (centered between the two channels) and one optical front-end to detect both channels.

The two channels were filtered in DSP, down-converted to baseband and then processed in parallel using the standard DSP flow. The CMA had to be modified with a power threshold to estimate in which frequency a given symbol was sent.

The PPM implementation of the 8D format was simpler in that it used a standard coherent DSP throughout, with the exception of the CMA that used the power threshold to judge which PPM frame was used.

**PPM Implementations**

The experiments by Liu et al. on PPM made extensive use of pilot sequences both for frame synchronization and channel estimation. The frame structure consisted of 3 frames (48 symbol slots) of training sequences followed by 16,100 PPM symbol slots. The pilot sequences helped with both polarization tracking and phase estimation. The rate overhead due to this was small, less than 1 %.

### 5.3 Formats Overview

In Table 2.3, we summarize the first proposals and implementations of some relevant optical 4D modulation formats.

### 5.4 Symbol Detection

For nonregular and high-order constellations, the detection process, that is, determining which of the constellation points was transmitted, can be quite cumbersome and computationally intensive.

The best detector, in the sense of minimizing the SER, is the ML detector. In the special case of an AWGN channel, the ML detector computes the Euclidean distance between a received vector \(\pmb{r}\) and all constellation points in \(\mathcal{X}\) and picks the symbol with the smallest distance, that is,

\[\tag{2.20}\hat{\pmb{x}}=\underset{\pmb{x}\in\mathcal{X}}{\arg\min}\,|\pmb{r}-\pmb{x}|\]

This will always work, but requires \(M\) distance calculations and a number of comparisons, which is computationally costly, and one, therefore, seeks easier detectors in practice.
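A direct implementation of (2.20) is only a few lines long, which makes the cost of the exhaustive search easy to see. The sketch below is a generic brute-force detector; the function name and the PM-QPSK example constellation are chosen here for illustration.

```python
from itertools import product


def ml_detect(r, constellation):
    """Return the constellation point closest to r in Euclidean distance.

    Minimizing the squared distance gives the same decision as minimizing
    the distance itself and avoids a square root per candidate.
    """
    return min(
        constellation,
        key=lambda x: sum((ri - xi) ** 2 for ri, xi in zip(r, x)),
    )


# Example: PM-QPSK written as the 16 sign patterns of a 4D vector.
pm_qpsk = list(product((-1, 1), repeat=4))
print(ml_detect((0.9, -0.2, 0.6, 0.5), pm_qpsk))
```

The loop visits all \(M\) points for every received vector, which is the cost the format-specific detectors described next are designed to avoid.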

In this section, we describe how this detection is done efficiently for some of the common 4D constellations. We start by describing lattices, and then give examples for the specific formats.

For the \(n\)-dimensional integer (or cubic) lattice \(\mathbb{Z}^n\), the problem is particularly simple; the ML detection is equivalent to just rounding each component of the received vector to the nearest integer. As shown by Conway and Sloane, this procedure can be modified to work for other lattices of interest, such as \(D_4\) and \(E_8\).

For the \(D_4\) lattice (2.14), one can use the following algorithm:

- Decode \(\pmb{r}\) to the nearest point in \(\mathbb{Z}^4\), and check the parity of the lattice point (i.e., the modulo-2 sum of its coordinates).
- If the parity is odd, round the component of \(\pmb{r}\) that is farthest away from its closest integer to its second closest integer.

An equally simple algorithm applies to SP-QAM formats obtained by expansion. These algorithms can be generalized to the \(E_8\) lattice as well.
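The two-step \(D_4\) procedure above can be sketched as follows. The code assumes that \(D_4\) is taken as the set of integer vectors with an even coordinate sum; scaling, offsets, and the restriction to a finite constellation are left out, and the function name is hypothetical.

```python
import math


def decode_d4(r):
    """Nearest point of D4 (integer vectors with even coordinate sum) to r."""
    f = [math.floor(x + 0.5) for x in r]          # step 1: round to the cubic lattice
    if sum(f) % 2 == 0:                           # even parity: already a D4 point
        return tuple(f)
    # Step 2: odd parity. Re-round the coordinate with the largest rounding
    # error to its second-closest integer, which flips the parity.
    err = [x - fi for x, fi in zip(r, f)]
    k = max(range(len(r)), key=lambda i: abs(err[i]))
    f[k] += 1 if err[k] >= 0 else -1
    return tuple(f)


print(decode_d4((0.6, 0.1, 0.2, 0.1)))
print(decode_d4((1.2, -0.9, 0.1, 0.3)))
```

In the first example, rounding gives (1, 0, 0, 0), which has odd parity; the first coordinate has the largest rounding error and is therefore moved to 0. The work per received vector is a handful of roundings and comparisons, independent of the constellation size.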

For the simpler formats such as PS-QPSK, the symbol detection is straightforward, but depends on the form after synchronization and phase tracking.

For the set-partitioned form, that is, \(\pm\{(1,1,1,1),(1,1,-1,-1),(1,-1,1,-1),(1,-1,-1,1)\}\), one can use the above-mentioned \(D_4\) scheme.

For the polarization-switched form, where the constellation points are \(\{(\pm1,\pm1, 0, 0),(0,0,\pm1,\pm1)\}\) one can use the following steps:

- Determine the polarization by comparing \(|\mathcal{R}(E_x)|+|\mathcal{I}(E_x)|\) with \(|\mathcal{R}(E_y)|+|\mathcal{I}(E_y)|\).
- Detect QPSK as usual in the chosen polarization.

It is noteworthy and somewhat surprising that determining the polarization by comparing the magnitudes \(|E_x|\) and \(|E_y|\) of the complex amplitudes is suboptimal compared with this scheme.

As a simple example of this, consider the received vector \(\pmb{r}=(0.9,0.1,0.6,0.5)\), or \(E_x=0.9+i0.1\), \(E_y=0.6+i0.5\). The closest PS-QPSK point in the Euclidean distance metric is \((0,0,1,1)\), that is, an ML symbol detector selects the \(y\) polarization, despite the received power in \(x\) being \(|E_x|^2=0.82\), which is higher than the power in \(y\), being \(|E_y|^2=0.61\).
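The example can be reproduced with a short sketch that places the two decision rules side by side. The function names are hypothetical, and the received vector \(\pmb{r}=(0.9,0.1,0.6,0.5)\) is split into complex amplitudes as \(E_x=r_1+ir_2\) and \(E_y=r_3+ir_4\).

```python
def _sgn(v):
    return 1 if v >= 0 else -1


def detect_ps_qpsk(ex, ey):
    """Choose the polarization by comparing |Re| + |Im|, then detect QPSK in it."""
    if abs(ex.real) + abs(ex.imag) >= abs(ey.real) + abs(ey.imag):
        return (_sgn(ex.real), _sgn(ex.imag), 0, 0)
    return (0, 0, _sgn(ey.real), _sgn(ey.imag))


def detect_by_power(ex, ey):
    """Suboptimal variant: choose the polarization with the larger power."""
    if abs(ex) >= abs(ey):
        return (_sgn(ex.real), _sgn(ex.imag), 0, 0)
    return (0, 0, _sgn(ey.real), _sgn(ey.imag))


r = (0.9, 0.1, 0.6, 0.5)
ex, ey = complex(r[0], r[1]), complex(r[2], r[3])
print(detect_ps_qpsk(ex, ey))    # selects the y polarization
print(detect_by_power(ex, ey))   # selects the x polarization
```

The reason is that all eight points have the same energy, so the nearest point is the one with the largest inner product with \(\pmb{r}\), and for a point in the \(x\) polarization that inner product is at most \(|r_1|+|r_2|\), not \(|E_x|\).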

For the 24-cell (6PolSK-QPSK) in the form \(\{(\pm1,\pm1,\pm1,\pm1),(\pm2,0,0,0)\}\), where the second vector stands for all of its permutations, the following detector is optimal:

- Find the maximum of \(\{(|r_1|+|r_2|+|r_3|+|r_4|)/2,|r_1|,|r_2|,|r_3|,|r_4|\}\), where \((r_1,r_2,r_3,r_4)=\pmb{r}\).
- If the first is the maximum, proceed with standard PM-QPSK detection to return \((\text{sgn}(r_1),\text{sgn}(r_2),\text{sgn}(r_3),\text{sgn}(r_4))\). If one of the last four is the largest, return the point whose only nonzero component is in that position, with the sign of that component of \(\pmb{r}\) and magnitude 2.
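The two steps above translate into the following sketch, assuming the 24 points are the 16 sign patterns of \((\pm1,\pm1,\pm1,\pm1)\) together with the 8 vectors that have a single nonzero entry \(\pm2\). The function name is hypothetical.

```python
def detect_24cell(r):
    """Detector for the 24-cell in the form (+-1,+-1,+-1,+-1) and (+-2,0,0,0)."""
    mags = [abs(x) for x in r]
    if sum(mags) / 2 >= max(mags):
        # The half-sum wins: decide each quadrature by its sign, as for PM-QPSK.
        return tuple(1 if x >= 0 else -1 for x in r)
    # A single component dominates: return the axis point in that direction.
    k = mags.index(max(mags))
    out = [0, 0, 0, 0]
    out[k] = 2 if r[k] >= 0 else -2
    return tuple(out)


print(detect_24cell((1.8, 0.1, -0.2, 0.3)))
print(detect_24cell((0.9, -1.1, 0.8, 1.2)))
```

Because all 24 points have the same energy, the nearest point is the one with the largest inner product with \(\pmb{r}\). That inner product is at most \(|r_1|+|r_2|+|r_3|+|r_4|\) for a sign pattern and \(2|r_k|\) for an axis point, which is exactly the comparison made in the first step after dividing by 2.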

### 5.5 Realizing Dimensions

The dimensionality can be taken as one metric of complexity, and often one chooses to compare modulation formats of the same dimensionality, that is, keeping the complexity similar.

Shannon proved that an arbitrarily small SER is achievable for any channel, assuming that the spectral efficiency is below a certain threshold, the channel capacity, and that the dimensionality goes to infinity. The work was later extended to quantify how the SER decreases with dimension.

The introduction of a channel (FEC) code is the obvious, and prevailing, way of increasing the dimensionality in communications.

If used with BPSK mapping, the dimensionality is simply the number \(n\) of bits in the FEC frame. There may, however, be practical reasons for keeping the value \(n\) down; the latency, complexity, and power consumption incurred by the FEC will increase with \(n\).

Moreover, even if a good FEC code is used that can tolerate a low SNR, the pre-decoder signal may be so distorted by noise that DSP and synchronization algorithms will limit the performance rather than the ideal FEC ability.

Thus, in practical systems, it will make sense also to use the degrees of freedom (DOF) of the transmission channel to increase the dimensionality and the constellation distances.

However, this often leads to issues with crosstalk and synchronization that need to be resolved in the receiver. In the following, we briefly discuss the practical implementation challenges of increasing the dimensionality via the physical DOFs.

We should emphasize that the DOFs used as signaling dimensions usually have independent noise sources, which simplifies their usage for the AWGN channel model, as the noise can be modeled as a hyperspherical cloud around the transmitted symbol.

**Quadratures**

Every carrier wave has two DOFs that can be modulated independently, that is, the two quadratures usually described by the real and imaginary parts of a complex phasor, or as the “sine” and “cosine”-components of the wave. An alternative decomposition is the amplitude and phase of the wave.

Up to around 2000, the amplitude was the only DOF used in commercial optical links, and it still is in short-haul links, due to the cost and complexity associated with modulating the optical phase.

To reliably detect the optical phase, a coherent or differential-phase receiver is required. The differential receiver is simpler in optical hardware, but has an extra penalty relative to the coherent counterpart and is limited to mainly PSK modulation.

Since the intradyne coherent receiver was demonstrated, the differential optical receivers have faded away. Coherent receivers are challenging due to the required rapid phase tracking on a microsecond time scale, but with the development of fast electronics and DSPs, they are becoming increasingly common and cost effective, and will likely prevail in future optical links.

Ways of reducing the coherent receiver complexity by, for example, co-propagating a local oscillator carrier (so-called self-homodyning) have been proposed.

The quadrature dimensions are, just as polarizations, and contrary to the time- and space-related dimensions discussed later, not scalable to more than 2.

**Polarization**

Electromagnetic waves have a vector property not seen in longitudinal waves (e.g., acoustic waves) or transverse matter waves (e.g., water waves, oscillating strings).

The easiest description is that of two independent carrier waves, with orthogonally directed vector field components. They are often referred to as the \(x\) and \(y\) polarization states, but there are other orthogonal decompositions possible as well.

Usually, a “polarization state” refers to the relative amplitude and phase between these two waves.

Thus, one has two ways of describing the 4 DOFs of the classical electromagnetic wave, either with “amplitude,” “phase,” and “polarization state” or with the I and Q quadratures in the \(x\) and \(y\) polarizations.

The former is the traditional description used in optics, and the latter is the most attractive one used in communications, as those DOFs form a Cartesian system.

The polarization state in a fiber link is slowly (second to millisecond time scale) drifting due to imperfections, micro- and macrobendings, thermal changes, fiber movements etc., and the use of polarization dimensions will thus require polarization tracking in the receiver.

A commercially attractive and low-cost polarization tracker was not available before the intradyne coherent receiver, and as a result, polarization was not actively used for modulation in commercial systems (although studied).

Also, the intersymbol interference problems related to polarization mode dispersion were long regarded as a significant obstacle (also in conventional links that were not polarization modulated), but elegantly resolved by the adaptive CMA filter in the coherent receiver.

The combined use of quadratures and polarization leads to a real 4D constellation space that is the basis for coherent signaling. It is noteworthy that some key problems and issues with 4D modulation are unresolved or only recently being explored, for example, channel modeling via 4D rotations and modulation format optimization.

**Time**

With an AWGN channel, a \(T\)-orthogonal pulse (2.3), and a matched-filter receiver, adjacent symbols in time will have independent noise, and can thus be framed to a “supersymbol” with dimensionality equal to the number of symbols used (possibly times the dimensionality of each symbol).

Such a supersymbol can now be modulated with a higher-dimensional format. When using a simple format such as 1D OOK or BPSK, this is equivalent to applying an FEC frame with a binary code.

This clearly highlights the close relationship between modulation and coding; indeed, there is no clear-cut distinction between coding and modulation in communication theory.

Nonetheless, from a practical and implementation perspective, it makes sense to distinguish between modulation and coding, as they are implemented with very different hardware, meeting different challenges.

The time-multiplexing described earlier (of which PPM is one special case) is particularly simple and attractive, since the phase and symbol time synchronization essentially comes for free. The frame synchronization needs to be resolved, however, and usually requires a test sequence and/or use of a specific transmission protocol.

The temporal dimension and its simplicity thus form a simple playground for testing new formats without challenging synchronization issues. For example, this was used in the experiment by van Uden et al., where two time slots were used to form a 4D symbol rather than the two polarization states, simplifying receiver DSP significantly.

**Frequency/Wavelength**

Different channels transmitting at adjacent wavelengths (frequencies) can be used to form multidimensional supersymbols. The concept of “superchannels” was introduced to denote such multi-wavelength channels, which are routed and detected as one entity. Using correlated modulation across such superchannels to increase the signal space dimensionality has not yet been realized, but should be possible.

In a similar vein, one could perform joint detection of multi-wavelength channels in a WDM link, thus enabling multidimensional modulation and coding. This meets practical problems with temporal (walk-off related) synchronization and phase synchronization, as well as signal ambiguities of the independent wavelengths, so it has not yet been widely used.

A few limited cases have been reported though, for example, the 8D format by Eriksson et al. detecting two wavelengths with the same local oscillator.

**Space**

With the recent interest in spatial-division multiplexing, that is, the use of waveguide modes and/or parallel waveguides to increase the data rates of optical links, it seems natural to try to further increase the capacity by moving to joint transmission over parallel modes and/or fibers, that is, increasing signaling dimensionality by also making use of the spatial DOFs. These can be (i) the different modes in a multimode fiber, (ii) different cores in a multi-core fiber, (iii) entirely separate fibers, or a combination of these.

However, spatial multiplexing is associated with severe practical challenges. For example, the different modes in a multimode fiber have different group velocities and hence a significant differential mode delay will arise that needs to be compensated for.

Even worse, bends and fiber imperfections give rise to modal crosstalk that mixes the delayed modes, further complicating the reception of individual modes. As a result, MIMO signal processing is needed to compensate for the modal crosstalk, which is very DSP-heavy.
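The core idea of MIMO compensation can be sketched for the simplest possible case: two modes, a memoryless coupling matrix, no noise, and a channel that is assumed to be known at the receiver. All names and parameter values below are hypothetical. A real link would also have differential mode delay, so the equalizer would need taps in time and would have to be estimated adaptively, which is where the heavy DSP load comes from.

```python
import cmath
import math

def crosstalk_matrix(theta, phi):
    """Hypothetical lossless 2x2 mode-coupling matrix (unitary by construction)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s * cmath.exp(1j * phi)),
            (s * cmath.exp(-1j * phi), c))

def apply_2x2(h, x):
    """Multiply the 2x2 matrix h with the pair of complex amplitudes x."""
    return (h[0][0] * x[0] + h[0][1] * x[1],
            h[1][0] * x[0] + h[1][1] * x[1])

def invert_2x2(h):
    """Closed-form inverse of a 2x2 matrix, used here as a zero-forcing equalizer."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return ((h[1][1] / det, -h[0][1] / det),
            (-h[1][0] / det, h[0][0] / det))

h = crosstalk_matrix(theta=0.6, phi=1.1)  # assumed coupling strength and phase
sent = (1 + 1j, -1 + 1j)                  # one QPSK symbol on each of two modes
received = apply_2x2(h, sent)             # the fiber mixes the two modes
recovered = apply_2x2(invert_2x2(h), received)

print("received :", received)
print("recovered:", recovered)
```

The received amplitudes are mixtures of both transmitted symbols, and applying the inverse matrix separates them again. With \(N\) coupled modes and two polarizations each, the matrix grows to \(2N \times 2N\), which illustrates why the complexity increases quickly with the number of modes.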

The single-mode (linearly polarized, \(LP_{01}\)-mode) fiber has four dimensions due to the polarization and quadratures. The next mode in the weakly guiding, step-index, circular fiber mode hierarchy, the (linearly polarized) \(LP_{11}\)-mode, is doubly degenerate. This means that it can be of two orientations that are mathematically orthogonal, usually referred to as \(LP_{11a}\) and \(LP_{11b}\).

A linear combination of these modes is popularly referred to as the lowest orbital angular momentum (OAM) mode, with azimuthal index ±1. Thus, OAM modes fall well within the conventional modal description, and offer limited novelty and no principal extension of the existing DOFs, as recently shown for radio frequency (RF) transmission.

This means that including the next higher-order mode in a circular fiber gives a total of three orthogonal modes, of 4D each, leading to a 12D space. Format optimization in this 12D space has been discussed by Bülow et al.

A more straightforward approach seems to be to use different (single-mode) cores in a multicore fiber to increase the dimensionality. This relaxes the crosstalk and walk-off penalties. Novel formats exploiting these DOFs were recently discussed by Eriksson et al., and some initial experiments were reported by Puttnam et al.

**Spatial Frequencies**

Due to the space–time duality, which states that free-space propagation of optical beams is similar to dispersive propagation of optical pulses, one could just as well use spatial frequencies as temporal frequencies.

Spatial frequencies translate into physically different propagation directions, so their use would be of most interest in free-space rather than in guided-wave propagation.

A deeper discussion of this topic is, therefore, not included in this tutorial. It suffices to say that the challenges and solutions connected with spatial frequency (physical beam direction in the paraxial propagation limit) are similar to the MIMO technologies used in wireless communications.

## 6. Summary and Conclusions

In this tutorial, we have reviewed the relatively large body of work (experimental and theoretical) on modulation formats for optical coherent links that has emerged over the last 5 years.

We have also shown the performance limits of formats in 2D, 4D, and 8D by reviewing sphere packing simulations, lattice-cuts, and code-based format design.

To reach higher dimensions, formats based on codes are probably the most straightforward approach rather than numerically optimized sphere packings, and we showed a few examples of this as well.

The results summarized in this tutorial are somewhat idealized, in that they to a large extent (i) neglect the impact of FEC, (ii) focus on the asymptotic high-SNR behavior (low SER and BER), and (iii) emphasize the AWGN channel.

This is the classical starting point for modulation theory research and should be viewed as a first step toward a fuller understanding of optical link design. Broadening the scope in each of these three directions is presently an active area of research.

First, to separate modulation format and coding is (while unnecessary from the information theorist’s point of view) necessary from the system engineer’s perspective.

The modulation format dictates the transmitter and receiver hardware optoelectronics, the complexity, and thus to some extent the cost. It dictates the DSP and the complexity of the receiver electronics, especially if blind equalization is used. It also dictates the attainable spectral efficiency.

Thus, when selecting formats, one makes critical system choices, and to weigh the trade-offs in system design it is important to know how well the formats behave, even if only in an ideal or asymptotic sense.

Moreover, there are latency-critical communication applications where the use of FEC is prohibited or at least limited, and in those cases good formats are very important. Examples may be control systems, video conferencing or telephony, and transfer of stock market trading data.

Another issue, seldom studied or emphasized, is the performance of synchronization algorithms, channel estimation algorithms, and adaptive equalizers for various formats. Those tend to operate worse at low SNR (which might arise if strong FEC codes are used), but better if constellations with well-separated points are used. In such situations, the clusters or lattice-based constellations described in this chapter might be a better choice than standard QAM.

An important application for the use of many different formats is the emergence of elastic networking. In future optical networks, where an increased flexibility is desired (often called elastic optical networks), one strives to adapt the data rate provided to customers to the available bandwidth, SNR, and demand. In such systems, it is of great value to be able to switch between different modulation formats to provide the sought flexibility, and a good overview of the performance and trade-offs of formats is needed. An overview of 4D modulation formats from this perspective was recently provided by Fischer et al.

Second, the study of asymptotic performance metrics should be complemented by studies of the performance of modulation formats at more practically relevant SNR values. Unfortunately, no analytic instruments are available for this purpose.

To numerically optimize multidimensional modulation formats for specific SNRs is computationally complex, and it may be difficult to get the full picture. Asymptotic metrics such as \(\gamma\), CFM, and \(G\) are attractive in that they give a single number per format, which can quickly be used to compare and select formats. They also provide an intuitive interpretation in terms of sphere packing, and they give an upper limit of the performance gains, which is valuable.

However, once a set of formats is selected, they should be compared via SER or BER simulations as well as complexity estimates. Even if the asymptotic gains are never achieved, the current design paradigm in optics is to compare formats with the same FEC, which has a waterfall region around \(10^{-3}\) for the uncoded SER, and then, quite often, the optimized multidimensional formats will outperform (e.g., in terms of system reach) the standard QAM formats.
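Such an SER comparison can be sketched with a small Monte Carlo simulation. The version below is a minimal illustration, assuming an AWGN channel with noise variance \(N_0/2\) per real dimension, equiprobable symbols, and minimum-distance detection. The function name, the seed, and the chosen \(E_b/N_0\) values are hypothetical, and the constellation is passed as a plain list of real-valued points so that 2D, 4D, or higher-dimensional formats can be tested with the same code.

```python
import random
from math import log2, sqrt

def simulate_ser(points, ebn0_db, n_symbols=20000, seed=1):
    """Estimate the symbol error rate of a constellation on an AWGN channel."""
    rng = random.Random(seed)
    # Average symbol energy and energy per bit of the constellation.
    e_s = sum(sum(x * x for x in p) for p in points) / len(points)
    e_b = e_s / log2(len(points))
    n0 = e_b / 10 ** (ebn0_db / 10)
    sigma = sqrt(n0 / 2)  # noise standard deviation per real dimension
    errors = 0
    for _ in range(n_symbols):
        idx = rng.randrange(len(points))
        rx = [x + rng.gauss(0.0, sigma) for x in points[idx]]
        # Minimum-distance (maximum-likelihood on AWGN) decision.
        decided = min(
            range(len(points)),
            key=lambda k: sum((r - c) ** 2 for r, c in zip(rx, points[k])),
        )
        errors += decided != idx
    return errors / n_symbols

qpsk = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
ser_low_snr = simulate_ser(qpsk, ebn0_db=4.0)
ser_high_snr = simulate_ser(qpsk, ebn0_db=7.0)
print("SER at 4 dB:", ser_low_snr)
print("SER at 7 dB:", ser_high_snr)
```

For QPSK the higher of the two \(E_b/N_0\) values places the uncoded SER near the \(10^{-3}\) region mentioned above. Comparing formats of different dimensionality fairly requires some care, since a 4D symbol error is not directly comparable to a 2D symbol error; BER at a fixed bit rate is often the more meaningful quantity.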

Third, more advanced channel models that go beyond the AWGN model are currently being studied, and depending on their nature, the results from AWGN modeling may or may not be useful. For example, the GN model, in links offering high SNR, can benefit from AWGN-optimized constellations, whereas nonlinear phase-noise channels benefit from radically different constellations.

Finally, we note that the price we pay when going to higher-dimensional formats (and codes) is complexity, which is a notoriously difficult quantity to quantify. Even if it can be quantified in terms of component cost, number of floating point operations, chip area for DSP implementation, or dimensionality of codes and formats, it is hard to provide generic results, since the most reliable complexity metrics are implementation-specific.

The next tutorial introduces **dichroic and diffraction-type polarizers**.