# Ray Optics and Ray Matrices

This is a continuation of the previous tutorial, "Optical beams and resonators: an introduction."

## 1. PARAXIAL OPTICAL RAYS AND RAY MATRICES

Ray matrices or "ABCD matrices" are widely used to describe the propagation of geometrical optical rays through paraxial optical elements, such as lenses, curved mirrors, and "ducts." These ray matrices also turn out to be very useful for describing a large number of other optical beam and resonator problems, including even problems that involve the diffractive nature of light.

Therefore, we begin the discussion of optical beams and resonators with a detailed review of paraxial ray theory and ray matrices.

### Optical Rays and Ray Transformations

Consider a ray of light—or equally well a particle, such as an electron—that is traveling approximately in the z direction, but with a transverse displacement \(r(z)\) from the axis and also a small slope \(dr/dz\), as in Figure 15.1. If such a ray propagates in free space from a plane at \(z_1\) to a later plane at \(z_2=z_1+L\), as in Figure 15.2, its input and output ray coordinates will be related by the transformation

\[\tag{1}\begin{align}r_2&=r_1+Ldr_1/dz\\dr_2/dz&=dr_1/dz.\end{align}\]

Suppose the same ray passes through a thin lens of focal length \(f\), as in the lower part of Figure 15.2. The input and output ray coordinates just before and after the lens will then be related by

\[\tag{2}\begin{align}r_2&=r_1\\dr_2/dz&=-(1/f){r_1}+dr_1/dz.\end{align}\]

(Note that we use a sign convention in which a positive value for \(f\) means a positive or converging lens.)

Equations 15.1 and 15.2 both give linear transformations between the input and output displacements and slopes of the rays. In rectangular coordinates, of course, these displacements \(r\) and slopes \(dr/dz\) can represent equally well either the \(x\)-axis quantities \(x\) and \(dx/dz\), or the \(y\)-axis quantities \(y\) and \(dy/dz\).
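As a minimal sketch, Equations 15.1 and 15.2 can be written as two small Python functions acting on a (displacement, slope) pair. The function names `free_space` and `thin_lens` are illustrative only. The sketch assumes unit refractive index, so the real and reduced slopes coincide.

```python
def free_space(r, slope, L):
    """Equation 15.1: straight-line propagation over a distance L."""
    return r + L * slope, slope

def thin_lens(r, slope, f):
    """Equation 15.2: a thin lens changes the slope but not the displacement.
    f > 0 is a converging lens in the sign convention of the text."""
    return r, slope - r / f

# A ray parallel to the axis at height 1, focused by an f = 100 lens,
# then propagated one focal length further.
r, s = thin_lens(1.0, 0.0, 100.0)
r, s = free_space(r, s, 100.0)
print(r, s)  # r is (numerically) zero: the ray crosses the axis at the focal point
```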

### Optical Ray Matrices, or ABCD Matrices

In fact, the change in displacement and slope of an optical ray upon passing through a wide variety of simple optical elements can be written in the same general form as Equations 15.1 and 15.2.

One slight additional complexity should be added, however. In writing ray transformations like these, we will simplify many later results if we define the ray slope variable to be, not the actual slope \(dr/dz\) of the ray, but rather this actual slope multiplied by the local index of refraction at the ray position. Hence, we will define in the most general situation

\[\tag{3}r'(z)\equiv n(z)\frac{dr(z)}{dz}\]

and similarly for \(x'(z)\equiv n(z)\;dx(z)/dz\) and \(y'(z)\equiv n(z)\;dy(z)/dz\).

With these definitions we can connect input and output displacements and slopes in a wide variety of paraxial optical elements by the general form

\[\tag{4}\begin{align}r_2&=Ar_1+Br'_1\\r'_2&=Cr_1+Dr'_1.\end{align}\]

where we use \(r'_1\) and \(r'_2\) to denote the modified ray slopes at the input and output planes, and where the coefficients A, B, C, and D characterize the paraxial focusing properties of this element.

Where precision is needed, we will refer to the derivatives \(dr(z)/dz\) and so forth as the real slopes, and to the quantities \(r'(z)\) and so forth as the reduced slopes.

It is then natural to write Equation 15.4 in matrix form as

\[\tag{5}\textbf r_2\equiv\begin{bmatrix}r_2\\r'_2\end{bmatrix}=\begin{bmatrix}A&B\\C&D\end{bmatrix}\begin{bmatrix}r_1\\r'_1\end{bmatrix}\equiv\textbf M\,\textbf r_1,\]

where \(M\) is the ray matrix for the optical element. Table 15.1 lists the ray matrices for a large number of basic paraxial optical elements, using the actual displacement and reduced slope variables. Note in particular that if we use the generalized definition for the reduced ray slopes, then the bending of a ray trajectory that occurs at a dielectric interface because of Snell's law is automatically taken into account, and the \(\text{ABCD}\) matrix for a planar dielectric interface is simply the identity matrix.

With the generalized slope definition of Equation 15.3, it is a general property of all the basic elements in Table 15.1 that the ray matrix determinant is given by

\[\tag{6}AD-BC=1.\]

(If we do not use the reduced slopes, then we have the more cumbersome relation \(AD-BC=n_1/n_2\), where \(n_1\) and \(n_2\) are the refractive indices at the input and output planes.) Since the determinant of a matrix product is the product of the determinants, Equation 15.6 holds equally well for an arbitrary cascade of optical elements.
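The determinant property can be checked numerically. The sketch below cascades a few elements and confirms that \(AD-BC=1\) for the product. It stores matrices as nested tuples `((A, B), (C, D))`; that layout and the helper names are illustrative assumptions. A uniform medium of index \(n\) and length \(L\) is given \(B=L/n\), which is Equation 15.21 with a constant index.

```python
def matmul(M, N):
    """Product M N of two 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def det(M):
    (A, B), (C, D) = M
    return A * D - B * C

def free_space(L, n=1.0):
    # Uniform medium of index n: with reduced slopes, B = L / n.
    return ((1.0, L / n), (0.0, 1.0))

def thin_lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

planar_interface = ((1.0, 0.0), (0.0, 1.0))  # identity, with reduced slopes

# Illustrative elements, listed in the order the ray meets them.
elements = [free_space(50.0), thin_lens(80.0), planar_interface,
            free_space(30.0, n=1.5), planar_interface, thin_lens(-40.0)]
M_tot = ((1.0, 0.0), (0.0, 1.0))
for M in elements:
    M_tot = matmul(M, M_tot)  # each later element multiplies from the left
print(M_tot, det(M_tot))      # determinant is 1 up to rounding error
```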

### Interfaces and Ducts

The fundamental building blocks for all the paraxial systems of Table 15.1 are curved dielectric interfaces and quadratically varying dielectric media or "ducts". The general \(\text{ABCD}\) matrix for a curved interface between two dielectric media can be derived from Snell's law and elementary geometry, and is given in Table 15.1. The corresponding \(\text{ABCD}\) matrix for a quadratically varying medium can be developed as follows.

First of all, by a "duct" we mean any dielectric medium which has a quadratic transverse variation in its index of refraction, with either a maximum or minimum on axis, as shown in Figure 15.4. We will also extend this concept in later sections to include "complex ducts" in which there may be a quadratic transverse variation of the loss or gain coefficient as well as the real index of refraction.

To analyze ray propagation in a duct, we can consider a ray, or better a light beam of small but finite width, traveling as in Figure 15.5. The inner edge of this beam is at radius \(r\), and the outer edge at radius \(r+\Delta r\).

Suppose the index of refraction \(n(r)\) decreases going radially outward from the system axis, so that the inner edge of this light beam is in a region of slightly higher index. The inner edge of the beam then travels more slowly, whereas the outer edge sees a lower index value and travels faster. As a result the beam tends to be continually turned or bent inward toward the axis.

Suppose that the index of refraction in this medium can be written, or at least approximated, in the quadratic form

\[\tag{7}n(r,z)=n_0(z)-\frac{1}{2}n_2(z)r^2,\]

where \(n_0(z)\) is the variation along the axis, and the parameter

\[\tag{8}n_2(z)\equiv\left.-\frac{\partial^2n(r,z)}{\partial r^2}\right |_{r=0}\]

is the downward curvature of the index at the axis. Then, within the paraxial approximation, a ray traveling through this medium will follow a trajectory given by the ray propagation equation

\[\tag{9}\frac{d}{dz}\left[n_0(z)\frac{dr(z)}{dz}\right]+n_2(z)r(z)=0.\]

Suppose we define the reduced slope for this ray at any plane, as discussed above, by

\[\tag{10}r'(z)\equiv n_0(z)\frac{dr(z)}{dz}\]

Then we can separate the ray propagation equation (15.9) into the pair of equations

\[\tag{11}\frac{dr(z)}{dz}\equiv\frac{r'(z)}{n_0(z)}\qquad\qquad\text{and}\qquad\qquad\frac{dr'(z)}{dz}=-n_2(z)r(z),\]

where the first equation is true by definition, and the second accounts for refractive bending in the radially inhomogeneous duct.
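The pair of first-order equations in Equation 15.11 is already in a form suited to a standard numerical integrator. The sketch below uses classical fourth-order Runge-Kutta, a generic technique chosen here for illustration and not taken from the text. It accepts arbitrary callables \(n_0(z)\) and \(n_2(z)\). For a uniform duct (illustrative values) it is compared with the closed-form solution given later as Equation 15.14.

```python
import math

def trace_ray(r0, rp0, n0_of_z, n2_of_z, z_end, steps=2000):
    """Integrate Eq. 15.11 with classical 4th-order Runge-Kutta.

    n0_of_z, n2_of_z: callables giving n0(z) and n2(z).
    rp0: initial *reduced* slope r' = n0 dr/dz.
    """
    def deriv(z, r, rp):
        # dr/dz = r'/n0 (definition); dr'/dz = -n2 r (refractive bending)
        return rp / n0_of_z(z), -n2_of_z(z) * r

    h = z_end / steps
    z, r, rp = 0.0, r0, rp0
    for _ in range(steps):
        k1 = deriv(z, r, rp)
        k2 = deriv(z + h / 2, r + h / 2 * k1[0], rp + h / 2 * k1[1])
        k3 = deriv(z + h / 2, r + h / 2 * k2[0], rp + h / 2 * k2[1])
        k4 = deriv(z + h, r + h * k3[0], rp + h * k3[1])
        r += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        rp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        z += h
    return r, rp

# Illustrative uniform duct, compared with the closed form of Eq. 15.14.
n0, n2 = 1.5, 0.06
gamma = math.sqrt(n2 / n0)
r0, rp0, z_end = 1e-3, 2e-4, 25.0
r_num, rp_num = trace_ray(r0, rp0, lambda _: n0, lambda _: n2, z_end)
r_exact = r0 * math.cos(gamma * z_end) + rp0 / (n0 * gamma) * math.sin(gamma * z_end)
print(r_num, r_exact)
```

Because `trace_ray` takes functions of \(z\), the same routine also handles ducts whose parameters vary along the axis. For such ducts no simple closed form is given in the text.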

### Stable Quadratic Ducts

Ray propagation in real quadratic ducts separates naturally into geometrically stable and unstable ducts. To show this, let us suppose that the on-axis index value \(n_0\) and the transverse derivative \(n_2\) in Equations 15.11 are both constant with distance. The two ray equations can then be combined to give the single trajectory equation

\[\tag{12}\frac{d^2r(z)}{dz^2}+\frac{n_2}{n_0}r(z)=\frac{d^2r(z)}{dz^2}+\gamma^2r(z)=0,\]

where \(\gamma\) is given (for positive values of \(n_2\)) by

\[\tag{13}\gamma^2=\frac{n_2}{n_0}\qquad\text{or}\qquad\gamma=\sqrt\frac{n_2}{n_0}.\]

The general solution for ray propagation in this kind of paraxial (quadratic) duct becomes

\[\tag{14}\begin{align}r(z)&=r_0\cos\gamma z+\frac{1}{\gamma}\frac{dr_0}{dz}\sin\gamma z\\&=r_0\cos\gamma z+(n_0\gamma)^{-1}r'_0\sin\gamma z,\end{align}\]

where \(r_0\) and \(r'_0\) are the initial displacement and (reduced) slope of the ray at \(z=0\).

From Equation 15.14 and its derivative, we can see that the general ray matrix for a duct of length \(z\) is

\[\tag{15}M=\begin{bmatrix}\cos\gamma z&(n_0\gamma)^{-1}\sin\gamma z\\-n_0\gamma\sin\gamma z&\cos\gamma z\end{bmatrix}\]

A duct with an index maximum on axis and a quadratic variation near the axis will trap optical rays so that they will oscillate periodically back and forth across the centerline of the duct, as shown in the top part of Figure 15.6. We will refer to this as a stable quadratic duct.
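The periodic trapping can be seen directly from Equation 15.15. The sketch below (function name and parameter values are illustrative) builds the duct matrix and checks that its determinant is 1. It also shows that after half an oscillation period, \(\gamma z=\pi\), the matrix is \(-\textbf I\): every ray arrives inverted about the axis.

```python
import math

def duct_matrix(n0, n2, z):
    """Eq. 15.15: ABCD matrix of a stable quadratic duct (n2 > 0) of length z."""
    g = math.sqrt(n2 / n0)
    c, s = math.cos(g * z), math.sin(g * z)
    return ((c, s / (n0 * g)), (-n0 * g * s, c))

n0, n2 = 1.5, 0.06                         # illustrative values, gamma = 0.2
period = 2 * math.pi / math.sqrt(n2 / n0)  # spatial period of the ray oscillation
(A, B), (C, D) = duct_matrix(n0, n2, 0.37 * period)
print(A * D - B * C)                       # 1 up to rounding
print(duct_matrix(n0, n2, period / 2))     # approximately -I
```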

### Unstable Quadratic Ducts

The same analysis in Equations 15.12 to 15.15 will apply equally well to a medium in which the index of refraction increases quadratically going outward from the axis, so that \(n_2<0\quad\text{or}\quad d^2n/dr^2>0\). In this situation, however, the value of \(\gamma^2\) becomes a negative quantity, and \(\gamma\) must be replaced by

\[\tag{16}\gamma^2=-\vert\frac{n_2}{n_0}\vert\quad\text{or}\quad\gamma=j\sqrt{\frac{1}{n_0}\frac{d^2n}{dr^2}}=j\vert\gamma\vert.\]

The general solution analogous to Equation 15.14 then becomes

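\[\tag{17}r(z)=r_0\cosh|\gamma|z+(n_0|\gamma|)^{-1}r'_0\sinh|\gamma|z,\]

and the \(\text{ABCD}\) matrix becomes

\[\tag{18}M=\begin{bmatrix}\cosh|\gamma|z&(n_0|\gamma|)^{-1}\sinh|\gamma|z\\n_0|\gamma|\sinh|\gamma|z&\cosh|\gamma|z\end{bmatrix}.\]

(Substituting \(\gamma=j|\gamma|\) into Equation 15.15 gives the positive lower-left element \(C=+n_0|\gamma|\sinh|\gamma|z\), so that \(AD-BC=\cosh^2|\gamma|z-\sinh^2|\gamma|z=1\).)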

Such an "anti-duct," with an index minimum on axis, will diverge (as well as defocus) optical rays. It acts in general in the same way as a thick diverging lens, as shown in the lower part of Figure 15.6.

Ducts thus provide our first illustration of the distinction between stable ray-propagating systems, in which rays oscillate periodically back and forth about the ray axis but with bounded excursions; and unstable ray-propagating systems, in which rays diverge exponentially outward with distance. We will see many examples of this for more complex types of paraxial focusing systems in later sections.
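The contrast between the two cases can be sketched numerically. Substituting \(\gamma=j|\gamma|\) into Equation 15.15 gives the hyperbolic matrix with a positive lower-left element \(+n_0|\gamma|\sinh|\gamma|z\). The sketch below builds that matrix and cross-checks it against Equation 15.15 evaluated with a complex \(\gamma\) via Python's `cmath`. It then shows the exponential growth of a ray launched parallel to the axis. Both function names and all numerical values are illustrative.

```python
import cmath
import math

def antiduct_matrix(n0, n2, z):
    """ABCD matrix of an unstable duct (n2 < 0), with g = |gamma| = sqrt(|n2|/n0)."""
    g = math.sqrt(abs(n2) / n0)
    ch, sh = math.cosh(g * z), math.sinh(g * z)
    return ((ch, sh / (n0 * g)), (n0 * g * sh, ch))

def duct_matrix_complex(n0, n2, z):
    """Eq. 15.15 evaluated with a possibly imaginary gamma = sqrt(n2/n0)."""
    g = cmath.sqrt(n2 / n0)
    c, s = cmath.cos(g * z), cmath.sin(g * z)
    return ((c, s / (n0 * g)), (-n0 * g * s, c))

n0, n2 = 1.5, -0.06                       # illustrative values, |gamma| = 0.2
for z in (5.0, 10.0, 20.0):
    (A, B), (C, D) = antiduct_matrix(n0, n2, z)
    # A is the output displacement of a ray that enters parallel to the axis at r0 = 1
    print(z, A, A * D - B * C)
```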

### Examples of Ducts: Optical Fibers and GRIN Rods

The focusing and ray-trapping properties of stable quadratic ducts are of great practical importance. They provide first of all an idealized model for light propagation in the graded-index optical fibers that are now becoming widely used for long distance optical communications.

The simplest type of optical fibers are made up of a uniform core surrounded by a lower-index cladding, as in Figure 15.7, so that the radial index variation is a step-function rather than a smooth quadratic variation.

A more detailed waveguide type of analysis is then required to give an accurate description of the modes in fibers having this type of discontinuous index variation.

Many fibers are now being made, however, with a smoothly varying radial profile which more or less approximates a quadratic index variation (Figure 15.7, lower part).

The simple results given in the preceding equations will then provide a good first-order approximation to the ray behavior in this kind of fiber, regardless of the actual index variation \(n(r)\), provided that the index variation has a quadratic leading term near the axis and provided that the ray trajectories are confined close enough to the axis so that higher-order terms in the radial index variation do not become important.

More accurate solutions for other index variations—notably the square-topped or stepped index variations in cladded fibers—are also available but rapidly become more complex.

Optical elements that are of poor optical quality, such as imperfect laser rods and nonlinear optical crystals, may also have unintentional ducts, either stable or unstable, built into them due to local variations in optical index.

### Axial Index Variations

We can also consider the situation where there is no transverse variation, or \(n_2=0\) but there is an axial variation of the index in the medium given by \(n_0=n_0(z)\). The relevant ray equation in this situation is

\[\tag{19}\frac{dr'(z)}{dz}=\frac{d}{dz}\left[n_0(z)\frac{dr(z)}{dz}\right]=0\]

with the solution

\[\tag{20}r(z)=r_0+r'_0\int_{z_0}^{z}\frac{dz'}{n_0(z')}.\]

This gives for the \(\text{ABCD}\) matrix through a section of length \(L\) starting at \(z=0\)

\[\tag{21}\textbf M=\begin{bmatrix}1&B(L)\\0&1\end{bmatrix}\quad\text{where}\quad B(L)\equiv\int_0^L\frac{dz}{n_0(z)}.\]

The ray-bending properties of a segment with an axial index variation are contained in the definition of the reduced slope \(r'(z)\).
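For a general axial profile, the \(B(L)\) element of Equation 15.21 is a one-dimensional integral that can be evaluated numerically. The sketch below uses composite Simpson's rule, a generic quadrature chosen for illustration. It checks the result against an illustrative linear profile \(n_0(z)=1+az\), for which \(B(L)=\ln(1+aL)/a\) exactly. The function name `axial_B` is hypothetical.

```python
import math

def axial_B(n0, L, steps=1000):
    """B(L) = integral_0^L dz / n0(z) (Eq. 15.21) by composite Simpson's rule."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = L / steps
    total = 1 / n0(0.0) + 1 / n0(L)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) / n0(k * h)
    return total * h / 3

# Illustrative linear profile n0(z) = 1 + a z, where B = ln(1 + a L) / a exactly.
a, L = 0.05, 10.0
B = axial_B(lambda z: 1 + a * z, L)
print(B, math.log(1 + a * L) / a)
```

For a constant index the same routine reduces to \(B=L/n_0\), the effective length of a uniform dielectric slab.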

### Ray Inversion

One additional elementary ray operation that we have not considered yet is ray inversion of an optical ray with respect to one or the other of its transverse coordinate axes.

Ray inversion necessarily occurs, for example, in one transverse coordinate or the other whenever an optical ray is specularly reflected from a mirror, as shown in Figure 15.8.

If we are to retain a right-handed coordinate system looking in the direction of ray propagation both before and after reflection, the ray displacements and slopes in the planes perpendicular to and lying in the plane of incidence must be related before and after reflection by

\[\tag{22}x=x_0,\;x'=x'_0\quad\text{and}\quad y=-y_0,\;y'=-y'_0.\]

The ray matrices along the principal axes can thus be written in the form

\[\tag{23}\textbf x_2=\textbf I\,\textbf x_1\quad\text{and}\quad\textbf y_2=-\textbf I\,\textbf y_1,\]

where \(\textbf I\) is the identity matrix. Ray inversion thus represents one particularly primitive kind of astigmatism in an optical system. Ray inversion also means, among other things, that a ring laser having an odd number of mirrors will have a net overall inversion with respect to one or the other of its axes in one round trip.

## 2. RAY PROPAGATION THROUGH CASCADED ELEMENTS

Let us next look at how rays propagate through optical systems consisting of several different paraxial elements connected in cascade. One of the most important properties of ray matrices is that such cascaded systems can be handled simply by multiplying the \(\text{ABCD}\) matrices of the individual optical elements, arranged in reverse order.

### Cascaded Ray Matrices

Suppose several optical elements with ray matrices \(\textbf M_1\cdots\textbf M_n\) (for example, a free-space section, a thin lens, another free-space section, a dielectric interface, and so on) are arranged in cascade as shown in Figure 15.9. The total ray transformation through this cascaded series of elements can then be calculated from the chain multiplication process

\[\tag{24}\begin{align} &r_1=\textbf M_1r_0\\&r_2=\textbf M_2r_1=\textbf M_2\textbf M_1 r_0\\&r_3=\textbf M_3r_2=\textbf M_3\textbf M_2\textbf M_1r_0,\end{align}\]

and so on up to the general result

\[\tag{25}r_n=[\textbf M_n\textbf M_{n-1}\cdots\textbf M_2\textbf M_1]r_0=\textbf M_{tot}r_0.\]

The overall or total ray matrix \(\textbf M_{tot}\) for this system is thus given by

\[\tag{26}\textbf M_{tot}\equiv\textbf M_n\textbf M_{n-1}\cdots\textbf M_2\textbf M_1\]

A single 4-element ray matrix equal to the ordinary matrix product of the individual ray matrices can thus describe the total or overall ray propagation through a complicated sequence of cascaded optical elements.

Note, however, that the matrices must be arranged in inverse order from the order in which the ray physically encounters the corresponding elements.
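The reverse-order rule can be checked by comparing element-by-element tracing with a single precomputed product. In this sketch (illustrative helper names and values), the ray meets a free-space section \(\textbf M_1\) first and then a thin lens \(\textbf M_2\). The product \(\textbf M_2\textbf M_1\) reproduces the traced ray, while \(\textbf M_1\textbf M_2\) describes a different system.

```python
def matmul(M, N):
    """Product M N of 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def apply(M, ray):
    """Apply a ray matrix to a ray (r, r')."""
    (A, B), (C, D) = M
    r, rp = ray
    return (A * r + B * rp, C * r + D * rp)

free = ((1.0, 20.0), (0.0, 1.0))          # M1: 20 units of free space (n = 1)
lens = ((1.0, 0.0), (-1.0 / 50.0, 1.0))   # M2: thin lens, f = 50

r0 = (1.0, 0.1)
traced = apply(lens, apply(free, r0))     # element by element, as in Eq. 15.24
M_tot = matmul(lens, free)                # M2 M1, as in Eq. 15.26
print(traced, apply(M_tot, r0))           # the same output ray
print(M_tot, matmul(free, lens))          # the wrong order gives a different matrix
```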

### Ray Matrices and Spherical Wave Propagation

Ray matrices and paraxial ray optics provide a general way of expressing the elementary lens laws of geometrical optics, or of spherical-wave optics, leaving out higher-order aberrations, in a form that many people find clearer and more convenient.

Ray optics and geometrical optics in fact contain exactly the same physical content, expressed in different fashion.

To demonstrate this we can first note that an ideal spherical wave with radius of curvature \(R\) can also be viewed as a collection of rays all diverging from a common point, the wavefront's center of curvature \(C\) (Figure 15.10). The slope and displacement of each of these rays at the plane \(z\) where the radius of curvature is \(R(z)\) (that is, at a distance \(R\) from the source point) are then related by

\[\tag{27}r'(z)=n(z)\frac{dr(z)}{dz}=\frac{n(z)r(z)}{R(z)}\quad\text{or}\quad R(z)\equiv\frac{n(z)r(z)}{r'(z)}.\]

Equation 15.27 implies a sign convention in which positive R indicates a diverging spherical wave, as drawn, whereas a negative value of R implies a converging spherical wave.

Suppose such a spherical wavefront with radius \(R_1\) passes through a paraxial system with ray matrix \(\text{ABCD}\) as in Figure 15.11.

Then the emerging wavefront at the other end of the \(\text{ABCD}\) system will also be a spherical wavefront with radius \(R_2\), which can be calculated from any one of the output rays by writing

\[\tag{28}\frac{R_2}{n_2}\equiv\frac{r_2}{r'_2}=\frac{Ar_1+Br'_1}{Cr_1+Dr'_1}=\frac{A(R_1/n_1)+B}{C(R_1/n_1)+D}.\]

(Note that Figure 15.11 shows a converging output wave, which means its radius of curvature \(R_2\) would be a negative number according to our sign conventions.)

More generally, if we define a "reduced radius of curvature" by \(\hat{R}(z)\equiv R(z)/n(z)\), then Equation 15.28 in terms of the reduced radii becomes simply

\[\tag{29}\hat R_2=\frac{A\hat R_1+B}{C\hat R_1+D}.\]

This simple but very general connection between \(R_1\) and \(R_2\), using only the \(\text{ABCD}\) matrix, will be very important and useful in later sections. It summarizes all of elementary geometrical optics expressed in ray matrix form.
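As a small illustration of Equation 15.29, the sketch below transforms the radius of a spherical wave through a thin lens and through free space. It assumes \(n=1\), so \(\hat R=R\). It then cross-checks the result against a single ray of the wave, using \(r'=r/R\) from Equation 15.27. The function names and numbers are illustrative.

```python
def transform_radius(M, R1):
    """Eq. 15.29: reduced output radius from reduced input radius (R > 0 diverging)."""
    (A, B), (C, D) = M
    return (A * R1 + B) / (C * R1 + D)

def apply(M, ray):
    (A, B), (C, D) = M
    r, rp = ray
    return (A * r + B * rp, C * r + D * rp)

f = 10.0
lens = ((1.0, 0.0), (-1.0 / f, 1.0))
free = ((1.0, 5.0), (0.0, 1.0))

print(transform_radius(lens, 2 * f))  # about -20: converges to a focus 2f past the lens
print(transform_radius(free, 7.0))    # about 12: free space adds its length to R

# Cross-check with one ray of the spherical wave (n = 1, so r' = r / R).
R1 = 2 * f
r2, rp2 = apply(lens, (1.0, 1.0 / R1))
print(r2 / rp2)                       # agrees with transform_radius(lens, R1)
```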

### Thick Lenses and \(\text{ABCD}\) Matrices

To expand on this last point a bit more, we can note that Equation 15.29 can be manipulated into the alternative form

\[\tag{30}\frac{1}{\hat R_2-L_2}=\frac{1}{\hat R_1-L_1}+\frac{1}{1/C},\]

with \(L_2\equiv(A-1)/C\;\text{and}\;L_1\equiv(1-D)/C\). But this expression is obviously just a slightly generalized form of the usual geometrical optics lens formula.

It says that the reduced input and output wave curvatures or image and object distances \(\hat R_1\) and \(\hat R_2\) obey the simple lens law for a thin lens of focal length \(f\equiv -1/C\), if these quantities are measured from reference planes located at distances \((1-D)/C\) and \((A-1)/C\) behind the input and output planes of the \(\text{ABCD}\) system.
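Equation 15.30 can be spot-checked numerically for any system with \(AD-BC=1\). Its last term \(1/(1/C)\) is simply \(C=-1/f\). The sketch below uses an illustrative system (free space 3, thin lens \(f=20\), free space 5, all with \(n=1\)) and a few arbitrary input radii. The helper names are hypothetical.

```python
def matmul(M, N):
    """Product M N of 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def eq30_sides(M, R1):
    """Return both sides of Eq. 15.30 for an ABCD system and reduced input radius R1."""
    (A, B), (C, D) = M
    R2 = (A * R1 + B) / (C * R1 + D)   # Eq. 15.29
    L1, L2 = (1 - D) / C, (A - 1) / C
    lhs = 1 / (R2 - L2)
    rhs = 1 / (R1 - L1) + 1 / (1 / C)  # the last term is just C = -1/f
    return lhs, rhs

M = matmul(((1.0, 5.0), (0.0, 1.0)),
           matmul(((1.0, 0.0), (-1.0 / 20.0, 1.0)), ((1.0, 3.0), (0.0, 1.0))))
for R1 in (40.0, -15.0, 100.0):
    print(eq30_sides(M, R1))           # the two sides agree
```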

For simplicity let us consider only the situation where the index of refraction is unity on both sides of the \(\text{ABCD}\) system, so that \(\hat R\equiv R\), and the radius \(R\) gives the distance to or from the source point for the spherical wave.

Then, the two reference planes or principal planes for the \(\text{ABCD}\) system just referred to are located at distances \((1-D)/C\) and \((1-A)/C\) behind and in front of the input and output planes \(z_1\) and \(z_2\) of the \(\text{ABCD}\) system itself, as indicated by the points \(PP_1\) and \(PP_2\) in Figure 15.12. (Note that with our sign convention for radii of curvature, the output principal plane is located a distance \(L_2\) behind the output plane \(z_2\), or a distance \(-L_2\equiv(1-A)/C\) in front of it.)

If the input and output rays, or spherical waves, are referenced to these principal planes rather than the original reference planes \(z_1\) and \(z_2\), the overall \(\text{ABCD}\) system from input to output principal planes then acts exactly like a thin lens with a focal length \(f\equiv-1/C\), as given in Equation 15.30.

This lens then also has front and back focal points \(FP_1\) and \(FP_2\) located a distance \(f\) outside the principal planes, as indicated in Figure 15.12.

Any arbitrary \(\text{ABCD}\) system with \(n_1=n_2\) is thus equivalent to a thick lens, which can be fully characterized by its two principal planes and its focal length \(f\).

If \(n_1\neq n_2\) this conclusion still remains true, but the thick lens must be characterized in a slightly more complex fashion by its principal, focal, and nodal planes; see the Problems at the end of this section for details.
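For \(n_1=n_2=1\), the equivalent thick lens follows directly from the matrix elements. The sketch below returns \(f=-1/C\) and the principal-plane offsets \((1-D)/C\) and \((1-A)/C\) defined above. It applies them to an illustrative pair of thin lenses \(f_1=f_2=10\) separated by \(d=5\). The result reproduces the familiar two-lens formula \(1/f=1/f_1+1/f_2-d/(f_1f_2)\). The function name `thick_lens` is hypothetical.

```python
def matmul(M, N):
    """Product M N of 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def thick_lens(M):
    """Equivalent thick lens of an ABCD system with n1 = n2 = 1.

    Returns (f, d1, d2): f = -1/C; PP1 lies d1 = (1 - D)/C upstream of the
    input plane and PP2 lies d2 = (1 - A)/C downstream of the output plane.
    Negative offsets put the planes on the other side (inside the system).
    """
    (A, B), (C, D) = M
    return -1 / C, (1 - D) / C, (1 - A) / C

f1, f2, d = 10.0, 10.0, 5.0
M = matmul(((1.0, 0.0), (-1 / f2, 1.0)),
           matmul(((1.0, d), (0.0, 1.0)), ((1.0, 0.0), (-1 / f1, 1.0))))
f, d1, d2 = thick_lens(M)
print(f, d1, d2)  # f = 20/3; both principal planes lie 10/3 inside the pair
```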

### Imaging Properties of ABCD Systems

For the simpler situation of \(n_1=n_2=1\), the overall \(\text{ABCD}\) matrix in Figure 15.12 going from the input to output principal planes is then given by

\[\tag{31}\textbf M=\begin{bmatrix}1&0\\C&1\end{bmatrix}\quad(\text{principal plane to principal plane}).\]

This is the ray matrix for a thin lens with \(f=-1/C\).

In other words, as noted above, the overall \(\text{ABCD}\) matrix between these planes has an effective length \(B=0\) and a focal power \(C\) equal to the \(C\) element of the original \(\text{ABCD}\) matrix.

By contrast, the overall \(\text{ABCD}\) matrix from the input to output focal planes is given by

\[\tag{32}\textbf M=\begin{bmatrix}0&-C^{-1}\\C&0\end{bmatrix}\quad(\text{focal plane to focal plane}).\]

This is the general form of the ray matrix going from focal point to focal point; its determinant is \(0\cdot 0-(-C^{-1})\,C=1\), as required. Note that the apparent length associated with this propagation is \(B=-C^{-1}=f\), even though the actual physical length (for a positive thin lens) is \(2f\).
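Equations 15.31 and 15.32 can be verified by sandwiching an \(\text{ABCD}\) matrix between free-space sections. The first step propagates from the input principal plane, located \((1-D)/C\) upstream of the input, to the input plane. After the system, the second step propagates from the output plane to the output principal plane, located \((1-A)/C\) downstream. For the focal planes, each offset grows by a further \(f=-1/C\), giving \(-D/C\) and \(-A/C\). The two-lens system below is illustrative.

```python
def matmul(M, N):
    """Product M N of 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def free(L):
    return ((1.0, L), (0.0, 1.0))

def lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

M = matmul(lens(10.0), matmul(free(5.0), lens(10.0)))  # illustrative system, n = 1
(A, B), (C, D) = M

# Principal plane to principal plane.
M_pp = matmul(free((1 - A) / C), matmul(M, free((1 - D) / C)))
# Focal plane to focal plane: each offset is a further f = -1/C outward.
M_ff = matmul(free(-A / C), matmul(M, free(-D / C)))
print(M_pp)  # close to ((1, 0), (C, 1)), Eq. 15.31
print(M_ff)  # close to ((0, -1/C), (C, 0)), i.e. B = +f
```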

More generally, for arbitrary indices, consider an input spherical wave which diverges from an arbitrary object plane located at a point \(OP\) on the \(z\) axis, and is then focused by an arbitrary \(\text{ABCD}\) system back down to a (real or virtual) image plane located at a point \(IP\) on the \(z\) axis. We can then show that the overall \(\text{ABCD}\) matrix going from the object plane at \(OP\) to the image plane at \(IP\) has the general form

\[\tag{33}\textbf M=\begin{bmatrix}M&0\\C&1/M\end{bmatrix}\quad(\text{object plane to image plane}).\]

Once again the effective length from object plane to image plane is zero, but in the most general situation there will be an image magnification \(\textbf M\) (given in general by \((C\hat R_1+D)^{-1}\), where \(\hat R_1\) is the reduced radius of the input wave) from any point \(r_1\) in the object plane to the corresponding point \(r_2\) in the image plane. (Note that because the effective length \(B\equiv 0\), all the rays leaving from any input point \(r_1\) will pass through the same output point \(r_2\).)

A ray-angle demagnification given by the \(D\) element value of \(1/\textbf M\) is then necessarily associated with this image magnification \(\textbf M\). This conclusion represents, in fact, a paraxial approximation to the more general sine condition of optics, which says that if a ray leaves a point \(r_1\) in an object plane with angle \(\theta_1\) and arrives at a point \(r_2\) in an image plane with an angle \(\theta_2\), these quantities must be related by \(n_1r_1\sin\theta_1=n_2r_2\sin\theta_2\).

This condition in turn can be given a thermodynamic interpretation: If we collect the blackbody radiation leaving a small area of diameter \(r_1\) and temperature \(T\) within a cone angle \(\theta_1\) and image it, with lateral magnification \(\textbf M\), so that it is incident within cone angle \(\theta_2\) onto another small area of diameter \(r_2=\textbf Mr_1\), then this incident radiation must just match the blackbody radiation which the second surface area at the same temperature \(T\) would emit back into the same cone angle \(\theta_2\).

If we take properly into account the difference in blackbody energy densities and velocities in two media with different refractive indices, we can then use the necessity for thermodynamic balance to derive either the more general sine condition, or the ray matrix condition that \(AD-BC=1\).
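The object-to-image form of Equation 15.33 can be reproduced numerically. In this sketch (illustrative thin lens \(f=10\), object distance \(R_1=30\), \(n=1\)), the output distance is chosen so that the overall \(B\) element vanishes. The resulting matrix then has \(A=\textbf M=(CR_1+D)^{-1}\) and \(D=1/\textbf M\), the paired image magnification and ray-angle demagnification described above.

```python
def matmul(M, N):
    """Product M N of 2x2 ray matrices stored as ((A, B), (C, D))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def free(L):
    return ((1.0, L), (0.0, 1.0))

def lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

f, R1 = 10.0, 30.0
M = lens(f)
(A, B), (C, D) = M
# Choose the output distance d2 that makes the overall B vanish (the image plane):
# B_tot = A*R1 + B + d2*(C*R1 + D) = 0
d2 = -(A * R1 + B) / (C * R1 + D)
M_oi = matmul(free(d2), matmul(M, free(R1)))
mag = 1 / (C * R1 + D)
print(d2, mag)  # image about 15 past the lens, magnification about -0.5
print(M_oi)     # about ((mag, 0), (C, 1/mag)), Eq. 15.33
```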

### Ray Matrices in Astigmatic Systems

When Cartesian coordinates are used in an optical system, with propagation primarily in the \(z\) direction, then a general ray must be described by its transverse displacements in both the \(x\) and \(y\) directions (Figure 15.13).

For simple optical elements the ray matrix formalism just described then applies separately and independently to both the \(x, x'\) and \(y, y'\) coordinates.

If an overall optical system is rotationally symmetric the same \(\text{ABCD}\) matrices apply equally to both \(x,x'\) and to \(y,y'\).

If the system contains astigmatic elements, then different \(\text{ABCD}\) matrices must be used for these elements in the \(x\) and \(y\) directions, as we will discuss in more detail in a later section.

### Other Ray Matrix Properties

Ray matrices have many other interesting and useful properties and applications which we will introduce in this and later chapters. The properties of real ray matrices in periodic systems, in misaligned ray matrix systems, and in nonorthogonal ray matrix systems (systems with "twist") are discussed in later sections of this chapter.

In later chapters we will also show how Huygens' diffraction integral can be written entirely in terms of ray matrix elements; how ray matrix concepts can be extended and all of paraxial optics explained by generalized or complex ray matrices; and how arbitrary ray matrices can be symmetrized, decomposed and/or synthesized by appropriate transformations.

### Problems for 15.2

1. Evaluating the focal length of a thin lens. A thin lens may be regarded as two curved dielectric interfaces with vanishingly small distance between them. Using this viewpoint and the ray matrices for a dielectric interface, find the focal length \(f\) of a thin lens in terms of the radii of curvature of the two lens surfaces and the index of refraction \(n\) of the lens material.

2. Replacing an arbitrary "black box" ray matrix with a single lens. An optical black box has various optical elements inside it, producing a given real \(\text{ABCD}\) matrix from its input plane to its output plane. We want to replace this black box with a box of physical length \(L\) containing only a single lens of focal length \(f\). Can this be done? What length \(L\), focal length \(f\), and lens location within \(L\) will be required?

3. Ray matrix of cascaded elements going in the reverse direction. A collection of optical elements in series has an overall \(\text{ABCD}\) matrix going in one direction.

Find the \(\text{ABCD}\) matrix going through the same elements in the reverse direction, i.e., assume the direction of the \(z\) axis going through these elements is reversed (or equivalently, assume that the whole system is picked up, turned around, and set back down on the same \(z\) axis with all the elements now in reverse order).

4. Evaluating the total ray matrix for a reflection problem. A ray passes through a collection of optical elements in series having an overall \(\text{ABCD}\) matrix; bounces off a mirror of radius \(R\); and passes back out through the same collection of elements in the reverse direction.

What is the total \(\text{ABCD}\) matrix for the entire round trip?

5. Replacing an arbitrary ray matrix system with a single mirror. A certain optical black box has a front entrance plane and various lenses and mirrors inside it, such that a ray entering the entrance plane eventually comes back out through the same plane with a total ray transformation given by a known \(\text{ABCD}\) ray matrix.

Suppose this black box is to be replaced by a single curved mirror of radius \(R\) located an appropriate distance \(L\) behind (or, if necessary, in front of) the entrance plane of the box. Find the required radius \(R\) and position \(L\) of the single curved mirror.

6. Focusing properties of thick-lens \(\text{ABCD}\) matrices. Verify the \(\text{ABCD}\) matrix equations (15.31-15.33) given in the text for transfer between input and output principal planes and between input and output focal planes (for \(n_r=1\)), and more generally between object and image planes (for arbitrary \(n_r\)).

7. General formulas for an arbitrary thick lens or \(\text{ABCD}\) system. The focal, principal and nodal planes for an arbitrary thick lens or \(\text{ABCD}\) system having input and output reference planes \(z_1\) and \(z_2\) and input and output indices of refraction \(n_1\) and \(n_2\) are defined by the conditions that:

(1) An input spherical wave emanating from the input focal point \(FP_1\) and passing through the \(\text{ABCD}\) system will emerge as an output plane wave; whereas an input plane wave will emerge as a spherical wave which converges to (or appears to diverge from) the output focal point \(FP_2\).

(2) If an input ray \(r_1\) which comes from the input focal point \(FP_1\), and the output ray \(r_2\) parallel to the output axis which it produces, are extended forward or backward until they intersect, their intersection point defines the input principal plane \(PP_1\). Similarly, the intersection of a parallel input ray \(r_1\) and the output ray \(r_2\) which it produces defines the output principal plane \(PP_2\).

(3) If an input ray with coordinates \(r_1\), \(r'_1\) produces a parallel output ray, i.e., \(r'_2=r'_1\), then the line connecting the input and output points \(r_1,\;z_1\) and \(r_2,\;z_2\), crosses the optical axis at the optical center \(OC\) of the lens.

If extensions of the same entering and exit rays are constructed, these rays then intersect the optical axis at the front and back nodal planes \(NP_1\) and \(NP_2\) of the lens.

To put this in another way, Ditchburn speaks of any pair of planes which are imaged onto each other as conjugate planes, and then says, "The plane conjugate to a plane at an infinitely great distance from the system in a positive direction is called the first focal plane ... In a similar way the second focal plane is conjugate to a plane infinitely distant in the negative direction."

Also, "Any ray which before entering the system is directed toward the first nodal point will emerge with its final direction parallel to the original direction and passing through (or coming from) the second nodal point," and finally, the principal planes of a thick lens "...are conjugate planes of unit positive magnification," i.e., any input ray which, when projected, intersects the first principal plane at \(r_1=a\) and any slope \(r'_1\) will produce an output ray which intersects the second plane at the same distance \(r_2=a\).

From these definitions, find general formulas for the locations of all these points or planes for an arbitrary \(\text{ABCD}\) system with \(n_1\neq n_2\), and then illustrate by calculating the actual locations for some representative systems, such as a thick lens with curved front and back faces, or a section of a quadratic duct.

## 3. RAYS IN PERIODIC FOCUSING SYSTEMS

Perhaps the most interesting and important application of ray matrices comes in the analysis of periodic focusing systems, i.e., systems in which the same sequence of elements is repeated many times down a cascaded chain or optical lensguide. An optical resonator can be modeled, as we have already shown in Figure 14.3, by such an iterated periodic focusing system.

The eigenvalues and "eigenrays" for such periodic focusing systems play an important role in optical resonator theory, particularly in explaining the stable and unstable properties of optical resonators and lensguides.

The stability analysis for periodic optical focusing that we will present here will also apply equally well to periodic particle focusing systems, such as electron beams in periodically focused traveling-wave tubes or in linear accelerators.

### Eigenvalues and Eigenrays

Let the ray matrix for propagation through one period in such a system, from an arbitrary reference plane in one period to the corresponding plane one period later (see Figure 15.14), be denoted by \(\textbf M\). The ray vectors \(r_n\) and \(r_{n+1}\) at the \(n\)th and \((n+1)\)th reference planes are then related by

\[\tag{34}r_{n+1}=\textbf Mr_n=\textbf M^{n+1}\;\textbf r_0,\]

where \(r_0\) is the initial ray at the input plane \(n=0\), and \(\textbf M^{n+1}\) is the matrix for one period raised to the \(n+1\)-\(th\) power.

Any cascaded matrix problem such as this can best be analyzed by finding the eigenvalues and eigensolutions of the matrix \(\textbf M\). That is, we look for a set of "eigenrays" \(r\) and corresponding eigenvalues \(\lambda\) (no connection with optical wavelength \(\lambda\)) which each individually satisfy the eigenequation

\[\tag{35}\textbf Mr=\lambda r.\]

For a \(2\times 2\) ray matrix \(\textbf M\) this is equivalent to the equation

\[\tag{36}[\textbf M-\lambda\textbf I]\,\textbf r=0\quad\text{or}\quad\begin{bmatrix}A-\lambda&B\\C&D-\lambda\end{bmatrix}\begin{bmatrix}r\\r'\end{bmatrix}=0,\]

where \(\textbf I\) is the identity matrix.

Nonzero solutions to Equation 15.36 are possible if and only if the determinant of the matrix in this equation satisfies the relation

\[\tag{37}\begin{vmatrix}A-\lambda&B\\C&D-\lambda\end{vmatrix}\equiv\lambda^2-(A+D)\lambda+1=0,\]

where we have used the fact that \(AD-BC=1\). It is convenient to define an "\(m\) parameter" for the system, equal to half the trace of the \(\text{ABCD}\) matrix, or

\[\tag{38}m\equiv\frac{A+D}{2}.\]

The ray matrix eigenvalues are then given by the two values

\[\tag{39}\lambda_a,\lambda_b=m\pm\sqrt{m^2-1}\]

which obey the general relationship that

\[\tag{40}\lambda_a\lambda_b\equiv1.\]

There are also two matching eigenrays \(\textbf r_a\) and \(\textbf r_b\), which the reader can calculate for herself, such that

\[\tag{41}\textbf M\;\textbf r_a=\lambda_a\textbf r_a\quad\text{and}\quad\textbf M\;\textbf r_b=\lambda_b\;\textbf r_b.\]

The properties of these eigenvalues and eigenrays are fundamental to the theory of stable and unstable optical resonators, as we shall now see.
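As a numerical check on Equations 15.36 through 15.41, the following minimal Python sketch computes \(m\), the two eigenvalues, and the matching eigenrays for one period. It assumes \(AD-BC=1\) and \(B\neq0\), so that each eigenray can be taken as \([B,\;\lambda-A]\) from the first row of Equation 15.36; the example period (free space \(L\) followed by a thin lens \(f\)) and the helper names are hypothetical.

```python
import cmath

def eigen_analysis(A, B, C, D):
    """Return m, both eigenvalues, and matching eigenrays of a 2x2 ray matrix.

    Assumes AD - BC = 1 and B != 0. Each eigenray is taken as [B, lam - A],
    which satisfies the first row of Eq. 36: (A - lam) r + B r' = 0.
    """
    m = (A + D) / 2                      # Eq. 38: half the trace
    root = cmath.sqrt(m * m - 1)         # complex sqrt also covers |m| < 1
    lam_a, lam_b = m + root, m - root    # Eq. 39
    r_a = (B, lam_a - A)
    r_b = (B, lam_b - A)
    return m, lam_a, lam_b, r_a, r_b

def apply(M, r):
    """Multiply a 2x2 matrix (nested tuples) by a 2-element ray vector."""
    (A, B), (C, D) = M
    return (A * r[0] + B * r[1], C * r[0] + D * r[1])

# Hypothetical period: free space L, then a thin lens f  (M = lens @ free)
f, L = 1.0, 1.5
A, B, C, D = 1.0, L, -1.0 / f, 1.0 - L / f
M = ((A, B), (C, D))

m, lam_a, lam_b, r_a, r_b = eigen_analysis(A, B, C, D)
print("m =", m)
print("eigenvalues:", lam_a, lam_b, "product:", lam_a * lam_b)   # Eq. 40
print("M r_a - lam_a r_a:", [x - lam_a * y for x, y in zip(apply(M, r_a), r_a)])
```

Because \(|m|<1\) for this example, both eigenvalues come out complex with unit magnitude, anticipating the stable case discussed below.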

### Eigenray Expansions

It is a fundamental property of these matrix eigensolutions that any arbitrary ray \(\textbf r_0\) at the input to the periodic system (or for that matter at any other plane) can be expanded as a sum of the two eigenrays of the system, provided the two eigenrays are linearly independent (as they always are when \(m^2\neq1\)), in the form

\[\tag{42}\textbf r_0=c_a\textbf r_a+c_b\textbf r_b,\]

where \(c_a\) and \(c_b\) are suitable expansion coefficients. The ray vector after any number of sections \(n\) will then be given by

\[\tag{43}\begin{aligned}\textbf r_n=\textbf M^n\textbf r_0&=\textbf M^n\times(c_a\textbf r_a+c_b\textbf r_b)\\&=c_a\lambda^n_a\textbf r_a+c_b\lambda^n_b\textbf r_b.\end{aligned}\]

The propagation of each eigenray is thus specified simply by multiplying it by the corresponding eigenvalue raised to the appropriate power.

The eigenrays and their matching eigenvalues therefore contain all the information that is needed to fully describe the propagation of any arbitrary ray in the periodic system.
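The expansion of Equations 15.42 and 15.43 can be checked numerically: expand an arbitrary real input ray on the two (complex) eigenrays, propagate each by its eigenvalue raised to the \(n\)th power, and compare with \(n\) direct multiplications by \(\textbf M\). This is a minimal Python sketch; the period matrix, input ray, and helper names are hypothetical, and it assumes \(B\neq0\) and \(m^2\neq1\).

```python
import cmath

def eigen(A, B, C, D):
    """Eigenvalues (Eq. 39) and eigenrays [B, lam - A]; assumes AD - BC = 1, B != 0."""
    m = (A + D) / 2
    root = cmath.sqrt(m * m - 1)
    lam_a, lam_b = m + root, m - root
    return lam_a, lam_b, (B, lam_a - A), (B, lam_b - A)

def expand(r0, r_a, r_b):
    """Solve c_a r_a + c_b r_b = r0 for the coefficients (Eq. 42) by Cramer's rule."""
    det = r_a[0] * r_b[1] - r_b[0] * r_a[1]   # nonzero when lam_a != lam_b
    c_a = (r0[0] * r_b[1] - r_b[0] * r0[1]) / det
    c_b = (r_a[0] * r0[1] - r0[0] * r_a[1]) / det
    return c_a, c_b

def propagate_direct(M, r0, n):
    """r_n = M^n r_0, by n successive matrix multiplications."""
    (A, B), (C, D) = M
    r = r0
    for _ in range(n):
        r = (A * r[0] + B * r[1], C * r[0] + D * r[1])
    return r

def propagate_eigen(lam_a, lam_b, r_a, r_b, c_a, c_b, n):
    """r_n = c_a lam_a^n r_a + c_b lam_b^n r_b (Eq. 43)."""
    return tuple(c_a * lam_a**n * r_a[i] + c_b * lam_b**n * r_b[i] for i in range(2))

A, B, C, D = 1.0, 1.5, -1.0, -0.5          # hypothetical period matrix, AD - BC = 1
M = ((A, B), (C, D))
r0 = (1.0, 0.2)                             # arbitrary real input ray (r, r')
lam_a, lam_b, r_a, r_b = eigen(A, B, C, D)
c_a, c_b = expand(r0, r_a, r_b)

n = 7
direct = propagate_direct(M, r0, n)
via_eigen = propagate_eigen(lam_a, lam_b, r_a, r_b, c_a, c_b, n)
print(direct)
print(via_eigen)   # same values; imaginary parts vanish to rounding error
```

The complex coefficients and eigenrays combine so that every \(\textbf r_n\) comes out purely real, as any physical ray must.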

### Stable Periodic Focusing Systems

All such periodic focusing systems (with purely real ray matrices) can in fact be neatly divided into either stable or unstable periodic systems, depending only on the properties of the matrix eigenvalues.

Suppose first that the ray matrix for one period has \(A\) and \(D\) coefficients such that

\[\tag{44}-1\leq m\leq 1,\quad\text{or}\quad m^2\equiv\left(\frac{A+D}{2}\right)^2\leq1.\]

In this situation we may write the \(m\) parameter as

\[\tag{45}m\equiv\frac{A+D}{2}\equiv\cos\theta,\]

where \(\theta\) is the angle defined by this expression. The eigenvalues of the system can then be written as

\[\tag{46}\lambda_a,\lambda_b=m\pm j\sqrt{1-m^2}=\cos\theta\pm j\sin\theta=e^{\pm j\theta}.\]

The matrix eigenvalues are thus complex and have magnitude unity. The propagation of any ray in the periodic system then takes the form

\[\tag{47}\textbf r_n=c_a\textbf r_a\,e^{jn\theta}+c_b\textbf r_b\,e^{-jn\theta}=\textbf r_0\cos n\theta+\textbf s_0\sin n\theta,\]

where \(\textbf r_0\equiv c_a\textbf r_a+c_b\textbf r_b\) is the input ray vector, and \(\textbf s_0\equiv j(c_a\textbf r_a-c_b\textbf r_b)\) is a kind of "input slope vector."

Any periodic focusing system with \(|m|\leq1\) thus represents a stable periodic focusing system, analogous to a stable duct. Rays in the system will oscillate back and forth about the axis, as in Figure 15.15, with a maximum excursion determined entirely by the initial ray parameters \(r_0\) and \(s_0\).

The displacement \(r_n\) of any ray at successive reference planes down the system will oscillate periodically about the axis in the form

\[\tag{48}r_n=r_0\cos n\theta+s_0\sin n\theta\]

where \(r_0\) and \(s_0\) are the ray initial conditions. Note that it is the index \(n\), and not the angle \(\theta\), that is the variable which increases with distance down the chain.

Note also that Equations 15.47 and 15.48 only give the displacement \(r_n\) as measured at the successive reference planes—they do not say anything about what happens to the ray inside the periodic section between those reference planes.

Viewed only at the successive reference planes, however, the ray appears to oscillate about the axis of the periodic focusing system as in Figure 15.15, with an oscillation period equal to \(2\pi/\theta\) periods of the periodic focusing system itself.
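The real form of Equations 15.47 and 15.48 can be checked against direct propagation. In this minimal sketch (hypothetical helper names and period matrix), \(\textbf s_0\) is obtained by requiring the real form to reproduce \(\textbf r_1=\textbf M\textbf r_0\) at \(n=1\); because \(\textbf M^2=2m\textbf M-\textbf I\) for any matrix with unit determinant, both sequences then obey the same recursion and agree for every \(n\).

```python
import math

def stable_closed_form(M, r0, n_max):
    """Return theta, s0, and r_n = r0 cos(n theta) + s0 sin(n theta) for n = 0..n_max.

    Assumes a strictly stable period matrix (|m| < 1) with AD - BC = 1.
    """
    (A, B), (C, D) = M
    m = (A + D) / 2
    if abs(m) >= 1:
        raise ValueError("need |m| < 1 for the stable (oscillatory) case")
    theta = math.acos(m)                                   # Eq. 45
    r1 = (A * r0[0] + B * r0[1], C * r0[0] + D * r0[1])    # one period of propagation
    # Choose s0 so that the closed form matches r1 at n = 1
    s0 = tuple((r1[i] - r0[i] * math.cos(theta)) / math.sin(theta) for i in range(2))
    rays = [tuple(r0[i] * math.cos(n * theta) + s0[i] * math.sin(n * theta) for i in range(2))
            for n in range(n_max + 1)]
    return theta, s0, rays

def direct(M, r0, n_max):
    """Reference result: repeated multiplication by M."""
    (A, B), (C, D) = M
    rays, r = [r0], r0
    for _ in range(n_max):
        r = (A * r[0] + B * r[1], C * r[0] + D * r[1])
        rays.append(r)
    return rays

M = ((1.0, 1.5), (-1.0, -0.5))     # hypothetical period matrix, m = 0.25
r0 = (1.0, 0.2)
theta, s0, closed = stable_closed_form(M, r0, 20)
reference = direct(M, r0, 20)
print("theta =", theta, "rad; oscillation period =", 2 * math.pi / theta, "periods")
print("max |r_n| over 20 periods:", max(abs(r[0]) for r in reference))
```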

### Periodic Focusing Demonstration

Any reader who has the opportunity should set up a simple demonstration of such a stable periodic focusing system, using a pair of silvered mirrors perhaps 10 to 15 cm in diameter with a 50 cm to 1 m focal length, as illustrated in Figure 15.16.

(Suitable inexpensive mirrors and simple mirror mounts are available from hobby stores or amateur astronomy supply houses.) The beam from a \(\text{He}\)-\(\text{Ne}\) laser can be injected at one edge of the resonator, using a small adjustable injection mirror just inside the edge of one of the larger mirrors.

Thoughtful adjustment of the beam injection direction and the mirror spacing and alignment will then lead to various kinds of periodically repeating spot patterns on the end mirrors.

A little chalk dust or smoke can make the interlaced beam patterns inside the resonator dramatically visible in a darkened room, although a more effective way to make the beams visible without fouling the mirrors is to attach a few strands of white cord or thin wire to the shaft of a small electric motor so that they sweep transversely across the resonator like a soft buzzsaw.

Note that the periodic solutions derived in Equation 15.47 and 15.48 will apply equally well to both of the transverse displacements \(x_n\) and \(y_n\), with appropriate (and in general different) initial conditions in each transverse coordinate. The beam in a stable periodic focusing system should thus oscillate sinusoidally about the axis with the same period in both \(x\) and \(y\) (assuming no astigmatism in the optical system). The oscillations will however in general have different amplitudes and phases in the two directions, depending upon the initial conditions.

But this is just the necessary condition for producing Lissajous figures, except that the Lissajous figures in this situation will be discrete spot patterns at integer values of \(n\) rather than continuous line patterns.

In the demonstration apparatus, therefore, the successive spots at which the ray strikes either of the end mirrors will trace out a Lissajous pattern with the same frequency or period in the \(x\) and \(y\) directions. By adjusting the initial beam conditions to vary the phase and amplitude between the \(x\) and \(y\) oscillations, we can obtain arbitrary circular, elliptical or linear spot patterns (for examples, see Figure 15.17).
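The Lissajous behavior can be sketched numerically for the two-mirror demonstration. This minimal sketch assumes identical, non-astigmatic mirrors of radius \(R\) treated as thin lenses of focal length \(R/2\), a reference plane just after one mirror, and launch conditions (derived here for illustration, not taken from the text) chosen so that \(x_n=a\cos n\theta\) and \(y_n=a\sin n\theta\); the successive spots on that mirror then fall on a circle of radius \(a\). The values of \(R\), \(L\), and \(a\) are hypothetical.

```python
import math

def matmul(P, Q):
    """2x2 matrix product for nested tuples."""
    return tuple(tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def apply(M, r):
    return (M[0][0] * r[0] + M[0][1] * r[1], M[1][0] * r[0] + M[1][1] * r[1])

def round_trip(R, L):
    """Round trip starting just after mirror 1: free L, mirror 2, free L, mirror 1."""
    free = ((1.0, L), (0.0, 1.0))
    mirror = ((1.0, 0.0), (-2.0 / R, 1.0))   # focal length R/2
    one_way = matmul(mirror, free)
    return matmul(one_way, one_way)

def circular_spots(R, L, a, n_spots):
    """Spot positions (x_n, y_n) on mirror 1 for a circular launch (|m| < 1 assumed)."""
    M = round_trip(R, L)
    (A, B), (C, D) = M
    theta = math.acos((A + D) / 2)
    x = (a, a * (math.cos(theta) - A) / B)   # gives x_1 = a cos(theta)
    y = (0.0, a * math.sin(theta) / B)       # gives y_1 = a sin(theta)
    spots = []
    for _ in range(n_spots):
        spots.append((x[0], y[0]))
        x, y = apply(M, x), apply(M, y)      # same matrix in x and y (no astigmatism)
    return theta, spots

theta, spots = circular_spots(R=1.0, L=0.6, a=0.02, n_spots=12)
for x, y in spots[:4]:
    print(f"x = {x:+.5f} m, y = {y:+.5f} m, radius = {math.hypot(x, y):.5f} m")
```

Unequal amplitudes or phases in \(x\) and \(y\) turn the circle into an ellipse or, in the limit, a straight line.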

Inspection will also show (and we will later verify analytically) that the Gaussian laser beam in such a periodic focusing system does not spread due to diffraction as we might expect, even after a large number of round-trip bounces.

The same stability conditions that make the ray trajectory stable but oscillatory inside the resonator also cause the laser beam to be periodically refocused at each mirror. The beam spot size at different points may then oscillate periodically, but it remains bounded and stable over an indefinite number of round trips.

With proper adjustment we can also catch the laser beam and extract it from the cavity with an extraction mirror (or even with the injection mirror) after any integral number of one-way bounces. The optical delay time in such a cavity is about 6.7 ns per round trip for a 1 m long cavity, and with more expensive high-quality mirrors the power loss per bounce can be quite small. Reentrant optical cavities of this type can thus function as optical delay lines.

Such delay lines were once seriously considered as potential high-capacity optical memories (with the cavity filled with coded information in the form of very short optical pulses); and they have also been used a number of times as optical delay lines in various scientific experiments.
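The delay-line arithmetic is simple enough to sketch directly. This minimal sketch assumes propagation in air with the refractive index taken as 1, and uses a hypothetical fractional power loss per bounce; neither the loss value nor the helper names come from the text.

```python
C_LIGHT = 299_792_458.0   # speed of light in vacuum, m/s

def delay_ns(L, round_trips):
    """Total delay in ns for a two-mirror line of spacing L (m), index taken as 1."""
    return 2.0 * L * round_trips / C_LIGHT * 1e9

def remaining_power(loss_per_bounce, bounces):
    """Fraction of power left after a number of mirror bounces (hypothetical loss)."""
    return (1.0 - loss_per_bounce) ** bounces

print(f"{delay_ns(1.0, 1):.2f} ns per round trip")       # about 6.67 ns for L = 1 m
print(f"{delay_ns(1.0, 50):.1f} ns after 50 round trips")
print(f"{remaining_power(0.005, 100):.3f} of the power left after 100 bounces")
```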

### Unstable Periodic Focusing Systems

Let us now turn to the opposite example, that of an unstable periodic focusing system, in which the ray matrix for one period has instead the property that

\[\tag{49}m^2\equiv\left(\frac{A+D}{2}\right)^2>1\quad\text{or}\quad|m|>1.\]

The eigenvalues of the system will then have the values

\[\tag{50}\lambda_a,\lambda_b=m\pm\sqrt{m^2-1}=\textbf M, 1/\textbf M,\]

where \(\textbf M\) is a "transverse magnification per period," with the property that \(|\textbf M|>1\). The ray displacement in this situation will obey the formula

\[\tag{51}\textbf r_n=\textbf M^n\times c_a\textbf r_a+\textbf M^{-n}\times c_b\textbf r_b=\textbf r_0\cosh n\theta+\textbf s_0\sinh n\theta,\]

where \(\theta\equiv\ln\textbf M\), and \(\textbf r_0\equiv c_a\textbf r_a+c_b\textbf r_b\) and \(\textbf s_0\equiv c_a\textbf r_a-c_b\textbf r_b\) again represent initial conditions at the start of the periodic system.

The ray displacement \(r_n\) in this situation will diverge exponentially with distance down the chain, as shown in Figure 15.18, with the displacements and slopes magnifying by a magnification \(\textbf M\) in each period.

There will also be at first a demagnifying component to the trajectories, decreasing as \(1/\textbf M\) per section, but this will die out after a few sections.

Note that the ray position may also flip back and forth across the axis in alternate periods: this happens when the magnification is negative, \(\textbf M<-1\), whereas for \(\textbf M>+1\) the ray eventually diverges steadily on one side of the axis. Such unstable periodic focusing systems have an important practical application in the unstable laser resonators we will describe later.
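For the unstable case, a minimal sketch (hypothetical lensguide values and helper names) shows the ratio of successive displacements settling to the larger-magnitude eigenvalue \(\textbf M\) of Equation 15.50 once the demagnifying component has died out. With thin lenses of focal length \(f\) spaced \(L=5f\), \(m=1-L/2f=-1.5\), so \(\textbf M<-1\) and the ray flips sides every period.

```python
import math

def unstable_rays(M, r0, n):
    """Propagate r0 through n periods; return the dominant eigenvalue and the rays.

    Assumes AD - BC = 1 and |m| > 1 (Eq. 49).
    """
    (A, B), (C, D) = M
    m = (A + D) / 2
    if abs(m) <= 1:
        raise ValueError("need |m| > 1 for the unstable case")
    root = math.sqrt(m * m - 1)
    mag = m + root if m > 0 else m - root       # eigenvalue with magnitude > 1 (Eq. 50)
    rays = [r0]
    for _ in range(n):
        r = rays[-1]
        rays.append((A * r[0] + B * r[1], C * r[0] + D * r[1]))
    return mag, rays

f, L = 1.0, 5.0                                 # hypothetical: spacing L > 4f
M = ((1.0 - L / f, L), (-1.0 / f, 1.0))         # thin lens f, then free space L
mag, rays = unstable_rays(M, (1.0, 0.0), 20)
print("magnification per period:", mag)
for n in range(1, 6):
    print(n, rays[n][0], rays[n][0] / rays[n - 1][0])
```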

### Problems for 15.3

1. Properties of the eigenrays in a periodic system. Calculate the two eigenrays \(r_a\) and \(r_b\) for a periodic focusing system in terms of the \(\text{ABCD}\) matrix elements. Note that any physically meaningful ray in the periodic system must be purely real, i.e., must have purely real displacement and slope, yet the eigenvectors \(r_a\) and \(r_b\) for a stable periodic system are in general complex quantities.

How can this be true? Under what conditions (if any) can the eigenrays be excited individually or separately in the periodic system?

2. Ray properties of an elementary periodic lensguide. Calculate the ray eigenvalues and eigenrays for the simplest type of lens waveguide, namely repeated identical convergent lenses of focal length \(f\) spaced a distance \(L\) apart, using the midpoint between lenses as the reference plane.

Use the notation \(L=4f(1-\Delta)\), and discuss the mathematical behavior and the physical significance of the eigenvalues and eigenrays as the lens spacing is increased toward the value \(L\rightarrow\;4f\;\text{or}\;\Delta\rightarrow 0\).

What would be the optical resonator analog to this limit?

Try repeating this problem working from reference planes located at the midplanes of the lenses (i.e., half the lens focusing power is placed on each side of the reference plane); and compare the eigenvalues at this reference plane to the eigenvalues at the previous reference plane.

3. Computer plotting of periodic ray positions. Write a simple computer program to compute and plot (on some suitable plotter or printer) the \(x,y\) positions on one end mirror on successive bounces for a ray bouncing through repeated round trips inside a resonator of length \(L\) with two identical end mirrors having radii of curvature \(R\).

Allow for arbitrary initial ray injection conditions and also for astigmatic mirrors, i.e., mirrors having different curvatures \(R_x\) and \(R_y\) in the \(x\) and \(y\) transverse directions.

Experiment with different spacings, curvatures, and injection conditions to find the kinds of trajectories the spot will follow around the transverse plane on one end mirror, noting particularly how the spot moves around the mirror from bounce to bounce. (You might also plot side or top views of how the rays bounce in the resonator, or examine the spot patterns at planes inside the resonator other than the end mirror.)

4. Periodic systems with integer numbers of spots. Suppose you have set up either the computer simulation outlined in the previous problem, or a working optical delay line model using a \(\text{He}\)-\(\text{Ne}\) laser and two identical mirrors with variable spacing.

You will then discover that as you change the spacing between mirrors (with fixed mirror radii \(R\)), there are certain spacings \(L\) for which the beam produces an exactly integral number of spots on each mirror before returning to the same point where it was injected. (a) Find an expression for the mirror spacings \(L_n\) at which there are exactly \(n\) spots produced on each mirror, in terms of the radius of curvature \(R\) of the two identical mirrors. (b) If the input beam is injected properly the spots on the end mirrors walk around a circular orbit.

At any transverse plane in between the mirrors the spots then lie on a circle also, but of smaller diameter than on the end mirrors (the rays lie on a hyperboloid of revolution). Find the ratio between the diameters of the spot circles at the center of the resonator and on the end mirrors.

5. Alignment procedure for the periodic delay line demonstration. There is a simple sequence of steps one can follow, using the injected laser beam, to get the optical delay line demonstration initially aligned, with the two mirrors properly aligned to each other, and with the injected beam properly aligned to the resonator.

Can you describe how this should be done?

6. Eigenray solutions for a near-spherical optical resonator. Find the ray matrix eigenvalues and eigenvectors for a near-spherical resonator (i.e., \(R_1=R_2\approx L/2\)), using the midplane of the resonator as the reference plane. Discuss the physical significance of the results in the limiting situation of an exactly spherical resonator.

7. Perturbation stability of periodic focusing eigenrays. Suppose that a ray starts out in a periodic focusing system as primarily one of the eigenrays, say, the \(\textbf r_a\) eigenray, but with a small perturbation or a small amount of the other eigenray \(\textbf r_b\) mixed in, so that \(\textbf r_1=\alpha_1\textbf r_a+\beta_1\textbf r_b\), with \(\beta_1\ll\alpha_1\).

Show that on each successive round trip the relative amount of the \(\textbf r_b\) component in the ray mixture will grow as \(\lambda^2_b\).

In other words, show that any small perturbation about either one of the eigensolutions will grow (or decay) with a "perturbation eigenvalue" that is equal to the ray eigenvalue of the other eigensolution squared.

8. Ray intersections inside an optical resonator. In a multiple-pass optical delay line as described in this section, optical rays on different bounces will intersect each other (at least in one transverse dimension) at certain locations inside the cell.

Analyze the locations of these intersections, and find the total number of such intersections within a cell as a function of the multiple-pass cell design. Note that beam intersections within such a cell can be significant where nonlinear optical interactions are important, for example, in the multiple-pass Raman gain cells described by B. Perry et al., "Controllable pulse compression in a multiple-pass-cell Raman laser," Optics Lett. 5, 288-290 (July 1980).

## 4. RAY OPTICS WITH MISALIGNED ELEMENTS

The ray matrix formalism we have used thus far assumes that all the paraxial elements are properly aligned and centered with respect to the optical reference axis. What effects will misalignment or transverse misplacement of individual optical elements have on the overall ray matrix performance?

### Analysis of Misaligned Elements

To answer this question let us first consider the effects of misalignment on a single optical element, or perhaps a collection of elements forming a single internally aligned \(\text{ABCD}\) system.

In order to analyze this situation, we must from here on distinguish between the real physical axis (the "true optical axis") of any individual paraxial element or \(\text{ABCD}\) system, which we will call its element axis, and the reference optical axis we use for analyzing the rays in this optical system, which may be arbitrarily chosen, and which we will call the reference optical axis or just the optical axis, as in Figure 15.19.

Suppose then that the element axis of some arbitrary \(\text{ABCD}\) system, with overall length \(L\), is displaced from the reference optical axis by displacements \(\Delta_1\) and \(\Delta_2\) at the input and output ends, as in Figure 15.19.

The element axis is thus also misaligned in slope with respect to the reference axis by the (small) angle

\[\tag{52}\Delta'\equiv\frac{\Delta_2-\Delta_1}{L}.\]

The misalignment of an individual element or collection of \(\text{ABCD}\) elements with respect to the reference axis can thus be characterized by any two of the three parameters \(\Delta_1,\Delta_2,\Delta'\). (Note that \(\Delta'\) is a real slope, not a reduced slope.)

We can also express this misalignment of the paraxial system by two "misalignment vectors" at its input and output ends, as given by

\[\tag{53}\boldsymbol\Delta_1\equiv\begin{bmatrix}\Delta_1\\\Delta'_1\end{bmatrix}\quad\text{and}\quad\boldsymbol\Delta_2\equiv\begin{bmatrix}\Delta_2\\\Delta'_2\end{bmatrix},\]

where \(\Delta'_1\equiv n_1\Delta'\;\text{and}\;\Delta_2'\equiv n_2\Delta'\) are the reduced values of the element axis slope at each end. The two misalignment vectors will then be connected by

\[\tag{54}\boldsymbol\Delta_2\equiv\begin{bmatrix}\Delta_2\\\Delta'_2\end{bmatrix}=\begin{bmatrix}1&L/n_1\\0&n_2/n_1\end{bmatrix}\begin{bmatrix}\Delta_1\\\Delta'_1\end{bmatrix}\equiv\textbf M_\Delta\times\boldsymbol\Delta_1,\]

where \(\textbf M_\Delta\) is shorthand for the \(2\times2\) matrix in this equation.

The coordinates of any general ray vector as measured with respect to the arbitrary reference optical axis we will then continue to denote by \(r\), \(r'\) as before, whereas the same ray vector measured with respect to the element axis we will denote by \(s,s'\). These quantities are then related at the input plane by

\[\tag{55}r_1=s_1+\Delta_1\quad\text{and}\quad r'_1=s'_1+\Delta'_1,\]

and similarly for \(r_2\) and \(r_2'\). Hence, in vector notation,

\[\tag{56}\textbf r_1=\textbf s_1+\boldsymbol\Delta_1\quad\text{and}\quad\textbf r_2=\textbf s_2+\boldsymbol\Delta_2.\]

(We assume small angles, so that we can simply add the slopes.)

Now, the ray vectors measured with respect to the element axis will transform through the \(\text{ABCD}\) element in the usual fashion, namely

\[\tag{57}\textbf s_2\equiv\begin{bmatrix}s_2\\s'_2\end{bmatrix}=\begin{bmatrix}A&B\\C&D\end{bmatrix}\begin{bmatrix}s_1\\s'_1\end{bmatrix}\equiv\textbf M\times\textbf s_1,\]

where \(\textbf M\) is the \(\text{ABCD}\) matrix for the aligned element(s).

However, the input and output displacements and slopes measured with respect to the reference optical axis will now be given, in matrix terms, by

\[\tag{58}\textbf r_2=\textbf s_2+\boldsymbol\Delta_2=\textbf M\,\textbf s_1+\textbf M_\Delta\boldsymbol\Delta_1=\textbf M\,\textbf r_1+[\textbf M_\Delta-\textbf M]\,\boldsymbol\Delta_1\]

which we will rewrite in general terms as

\[\tag{59}r_2=\textbf{M}r_1+\bf{E}.\]

The primary effect of misalignment on a paraxial system is to add to the usual ray matrix transformation what we might call an "error vector" \(\textbf E\) which is given by

\[\tag{60}\textbf E\equiv\begin{bmatrix}E\\F\end{bmatrix}=[\textbf M_\Delta-\textbf M]\,\boldsymbol\Delta_1=\begin{bmatrix}1-A&L-n_1B\\-C&n_2-n_1D\end{bmatrix}\begin{bmatrix}\Delta_1\\\Delta'\end{bmatrix}\]

in terms of the usual \(\text{ABCD}\) matrix elements and the misalignment quantities \(\Delta_1\) and \(\Delta'\).
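Equation 15.60 can be checked against the longer route of Equations 15.54 through 15.57: convert the input ray to element-axis coordinates, apply the aligned \(\text{ABCD}\) matrix, and convert back. This is a minimal Python sketch with hypothetical element and misalignment values and helper names, using reduced slopes so that \(\Delta'_1=n_1\Delta'\) and \(\Delta'_2=n_2\Delta'\).

```python
def error_vector(A, B, C, D, L, delta1, dslope, n1=1.0, n2=1.0):
    """Eq. 60: E = (1-A) D1 + (L - n1 B) D',  F = -C D1 + (n2 - n1 D) D'."""
    E = (1 - A) * delta1 + (L - n1 * B) * dslope
    F = -C * delta1 + (n2 - n1 * D) * dslope
    return E, F

def via_element_axis(A, B, C, D, L, delta1, dslope, r1, n1=1.0, n2=1.0):
    """Propagate r1 = (r, r') by way of element-axis coordinates (Eqs. 54-57)."""
    d_in = (delta1, n1 * dslope)                          # misalignment vector at input
    d_out = (delta1 + L * dslope, n2 * dslope)            # Eq. 54
    s1 = (r1[0] - d_in[0], r1[1] - d_in[1])               # Eq. 55
    s2 = (A * s1[0] + B * s1[1], C * s1[0] + D * s1[1])   # Eq. 57
    return (s2[0] + d_out[0], s2[1] + d_out[1])           # back to the reference axis

# Hypothetical element: thin lens f = 0.5 followed by free space 0.3 (length L = 0.3)
A, B, C, D, L = 0.4, 0.3, -2.0, 1.0, 0.3
delta1, dslope = 1e-3, 2e-3                               # input offset (m), axis tilt (rad)
r1 = (0.01, -0.02)

E, F = error_vector(A, B, C, D, L, delta1, dslope)
with_error = (A * r1[0] + B * r1[1] + E, C * r1[0] + D * r1[1] + F)   # Eq. 59
print(with_error)
print(via_element_axis(A, B, C, D, L, delta1, dslope, r1))            # same result
```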

### Three-by-Three Matrix Formalism for Misaligned Systems

These results for a general misaligned paraxial system can be put into a convenient \(3\times3\) matrix form by adding a third dummy element of value unity to each of the ray vectors, and then writing a \(3\times3\) "\(\text{ABCDEF}\)" matrix relation in the form

\[\tag{61}\begin{bmatrix}r_2\\r'_2\\1\end{bmatrix}=\begin{bmatrix}A&B&E\\C&D&F\\0&0&1\end{bmatrix}\times\begin{bmatrix}r_1\\r'_1\\1\end{bmatrix},\]

where the two additional ray matrix quantities \(E\) and \(F\) are given by the results derived in Equation 15.60, namely,

\[\tag{62}E=(1-A)\Delta_1+(L-n_1B)\Delta'\quad\text{and}\quad F=-C\Delta_1+(n_2-n_1D)\Delta'.\]

These \(3\times3\) matrices can then be cascaded, perhaps with the aid of a simple computer program, to handle several such misaligned paraxial elements connected in series.
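A minimal Python sketch of Equations 15.61 and 15.62: it builds the \(3\times3\) \(\text{ABCDEF}\) matrix from the aligned \(\text{ABCD}\) elements and the misalignment parameters, then applies it to an augmented ray. As an illustrative (hypothetical) case, a thin lens of focal length \(f\) displaced by \(\Delta_1\) gives \(E=0\) and \(F=\Delta_1/f\), so a ray travelling along the reference axis is bent toward the displaced lens center. The helper names are hypothetical.

```python
def abcdef_matrix(A, B, C, D, L, delta1, dslope, n1=1.0, n2=1.0):
    """3x3 matrix of Eq. 61 with E and F from Eq. 62."""
    E = (1 - A) * delta1 + (L - n1 * B) * dslope
    F = -C * delta1 + (n2 - n1 * D) * dslope
    return ((A, B, E), (C, D, F), (0.0, 0.0, 1.0))

def apply3(M3, ray):
    """Apply a 3x3 matrix to (r, r') augmented with the dummy element 1."""
    v = (ray[0], ray[1], 1.0)
    out = [sum(M3[i][k] * v[k] for k in range(3)) for i in range(3)]
    return out[0], out[1]          # out[2] is the dummy element, always 1

f, shift = 0.5, 1e-3                                   # hypothetical lens and offset
lens = abcdef_matrix(1.0, 0.0, -1.0 / f, 1.0, 0.0, shift, 0.0)
print(apply3(lens, (0.0, 0.0)))    # on-axis ray leaves with slope shift / f
print(apply3(lens, (0.01, 0.0)))   # ordinary focusing plus the same extra deflection
```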

### Cascaded Misaligned Elements

Suppose several successive optical elements or groups of elements are arranged in cascade, with each element or group of elements having a different degree of (small) misalignment, and hence different \(E_i\) and \(F_i\) elements, as well as the usual \(A_i,\;B_i,\;C_i,\;D_i\) elements. (These individual misalignments are all measured relative to a common reference optical axis passing, in a straight line, through the whole collection.) We can then cascade these \(3\times1\) ray vectors and \(3\times3\) ray matrices (in reverse order, as usual) to propagate rays through any sequence of cascaded, and individually misaligned, paraxial systems, each with its own \(\text{ABCD}\) elements and its own distinct \(EF\) misalignment elements.

Rather than multiplying and manipulating \(3\times3\) matrices, however, we can analyze the same situation in a more convenient fashion by rewriting Equation 15.61 in the partitioned matrix form

\[\tag{63}\left[\begin{array}{c}\textbf r_2\\\hline1\end{array}\right]=\left[\begin{array}{c|c}\textbf M&\textbf E\\\hline O&1\end{array}\right]\left[\begin{array}{c}\textbf r_1\\\hline1\end{array}\right],\]

where \(\bf M\) is the usual \(2\times2\) \(\text{ABCD}\) matrix; \(\textbf r_1\), \(\textbf r_2\), and \(\textbf E\) are \(2\times1\) column matrices; \(O\) is a \(1\times2\) row matrix with both elements \(0\); and \(1\) is a single "\(1\times1\)" element.

Partitioned matrices of this sort can then be multiplied out analytically by applying the usual rules of matrix multiplication, treating each submatrix within the partitioned matrix as a single element (while keeping the submatrix products in their proper order).

Suppose we wish to cascade just two individually misaligned \(\text{ABCD}\) systems in sequence. The overall \(3\times3\) matrix for the cascaded system can then be calculated from

\[\tag{64}\left[\begin{array}{c|c}\textbf M_{tot}&\textbf E_{tot}\\\hline O&1\end{array}\right]=\left[\begin{array}{c|c}\textbf M_2&\textbf E_2\\\hline O&1\end{array}\right]\times\left[\begin{array}{c|c}\textbf M_1&\textbf E_1\\\hline O&1\end{array}\right]=\left[\begin{array}{c|c}\textbf M_2\textbf M_1&\textbf M_2\textbf E_1+\textbf E_2\\\hline O&1\end{array}\right].\]

As a check the reader may want to multiply out the full \(3\times3\) matrices in non-partitioned form to verify that the final result is indeed

\[\tag{65}\left[\begin{array}{c|c}\textbf M_{tot}&\textbf E_{tot}\\\hline O&1\end{array}\right]=\begin{bmatrix}A_2A_1+B_2C_1&A_2B_1+B_2D_1&A_2E_1+B_2F_1+E_2\\C_2A_1+D_2C_1&C_2B_1+D_2D_1&C_2E_1+D_2F_1+F_2\\0&0&1\end{bmatrix},\]

or the same as given by the partitioned form.

We see first of all that the \(2\times2\) or \(\text{ABCD}\) part of the overall cascaded, misaligned system has exactly the same form as the product of the two matrices would have without misalignment, since this part of the product does not depend at all on the misalignment values \(E_1,\;F_1\) or \(E_2,\;F_2\) of the individual elements.

To phrase this more generally, the basic ray matrix properties and paraxial focusing properties of a cascaded system are entirely unchanged by small misalignments of individual elements within the system.
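The equivalence of Equations 15.64 and 15.65 is easy to confirm numerically. This minimal sketch (hypothetical element values and helper names, plain nested lists) forms the cascade both ways: once from the partitioned rule \(\textbf M_{tot}=\textbf M_2\textbf M_1\), \(\textbf E_{tot}=\textbf M_2\textbf E_1+\textbf E_2\), and once as a full \(3\times3\) product.

```python
def matmul(P, Q):
    """General matrix product for nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def full3(M, E):
    """Assemble the 3x3 ABCDEF matrix from a 2x2 M and a 2-element E."""
    return [[M[0][0], M[0][1], E[0]], [M[1][0], M[1][1], E[1]], [0.0, 0.0, 1.0]]

def partitioned_cascade(M1, E1, M2, E2):
    """Eq. 64: element 1 first, then element 2."""
    M_tot = matmul(M2, M1)
    M2E1 = matmul(M2, [[E1[0]], [E1[1]]])
    E_tot = [M2E1[0][0] + E2[0], M2E1[1][0] + E2[1]]
    return M_tot, E_tot

# Hypothetical elements: a thick element with some error vector, then a thin lens
# (f = 0.5) displaced by 1.5 mm, whose error vector is [0, shift / f]
M1, E1 = [[0.4, 0.3], [-2.0, 1.0]], [2e-4, -1e-4]
M2, E2 = [[1.0, 0.0], [-2.0, 1.0]], [0.0, 3e-3]

M_tot, E_tot = partitioned_cascade(M1, E1, M2, E2)
full = matmul(full3(M2, E2), full3(M1, E1))
print(full3(M_tot, E_tot))
print(full)
```

Setting \(\textbf E_1\) and \(\textbf E_2\) to zero leaves \(\textbf M_{tot}\) unchanged, which is the point made in the paragraph above.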

### Overall Misaligned Systems

These same conclusions obviously remain true even if we cascade an arbitrary number of arbitrarily misaligned paraxial elements. Suppose we propagate an initial ray \(\textbf r_0\) through \(N\) such elements or subsystems, each with an individual misalignment described by an error vector \(\textbf E_k\equiv[E_k,\;F_k]^T\) as referenced to a single straight-line optical axis through the overall system.

The overall transformation through the cascaded system can then be written as

\[\tag{66}\bf r_N=\bf M_{tot}\;\bf r_0+\bf{E}_{tot}\]

where the overall \(\text{ABCD}\) matrix is given as usual by the matrix product \(\bf M_{tot}=\bf M_N\cdots\bf M_2\bf M_1\), and where the cumulative "error vector" through the entire system is given in terms of the error vectors of the individual elements by

\[\tag{67}\bf{E}_{tot}=[\bf M_N\cdots\bf M_2]\bf E_1+[\bf M_N\cdots\bf M_3]\bf E_2+\cdots+\bf M_N\bf E_{N-1}+\bf E_N.\]

The overall misalignment elements \(\bf E_{tot}\) and \(\bf F_{tot}\) for the cascaded system obviously involve the misalignments \(E_k,\;F_k\) of each individual element in the system, as "propagated" through the \(\text{ABCD}\) matrices of all the subsequent elements in the system.

In a cascaded \(\text{ABCD}\) system with misaligned individual elements, the overall system will thus appear to have a total misalignment \(\bf E_{tot}\), \(\bf F_{tot}\) that depends in a complicated way both on the misalignment of individual elements and on the transmission of each of these individual misalignments through the individual \(\text{ABCD}\) matrices of all later elements.
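Equations 15.66 and 15.67 suggest a simple running update: after each element, \(\textbf E_{tot}\leftarrow\textbf M_k\textbf E_{tot}+\textbf E_k\) and \(\textbf M_{tot}\leftarrow\textbf M_k\textbf M_{tot}\), which reproduces Equation 15.67 term by term. The sketch below (hypothetical element values and helper names) checks this against tracing a ray element by element with \(\textbf r\leftarrow\textbf M_k\textbf r+\textbf E_k\). Free-space sections are given zero error vectors, since displacing or tilting empty space has no effect.

```python
def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def mat_mat(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def cascade(elements):
    """elements: list of (M_k, E_k) in the order the ray meets them (Eqs. 66-67)."""
    M_tot = [[1.0, 0.0], [0.0, 1.0]]
    E_tot = [0.0, 0.0]
    for M, E in elements:
        ME = mat_vec(M, E_tot)
        E_tot = [ME[0] + E[0], ME[1] + E[1]]   # earlier errors propagate through M_k
        M_tot = mat_mat(M, M_tot)               # ordinary reverse-order product
    return M_tot, E_tot

def trace(elements, r):
    """Reference: push a ray through each misaligned element in turn (Eq. 59)."""
    for M, E in elements:
        Mr = mat_vec(M, r)
        r = [Mr[0] + E[0], Mr[1] + E[1]]
    return r

free = [[1.0, 0.25], [0.0, 1.0]]
lens = [[1.0, 0.0], [-2.0, 1.0]]                 # f = 0.5; displaced lens: E = [0, shift / f]
elements = [(lens, [0.0, 2e-3]), (free, [0.0, 0.0]), (lens, [0.0, -1e-3]), (free, [0.0, 0.0])]

M_tot, E_tot = cascade(elements)
r0 = [0.01, 0.0]
direct = trace(elements, r0)
via_total = [a + b for a, b in zip(mat_vec(M_tot, r0), E_tot)]
print(direct, via_total)
aligned_M, _ = cascade([(M, [0.0, 0.0]) for M, _ in elements])
print(aligned_M == M_tot)   # misalignment leaves the ABCD part unchanged
```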

### System Alignment, and the Overall Element Axis

Suppose we do the kind of calculation just outlined, and find the overall misalignment parameters \(\bf E_{tot}\) and \(\bf F_{tot}\) for some particular cascaded system, using some particular arbitrarily chosen reference optical axis that passes in a straight line through the entire system.

The preceding results then imply that the overall system acts as if it is a single properly aligned overall system, but one whose overall element axis has end-plane displacements \(\Delta_0\) and \(\Delta_N\) at its input and output ends like those in Figure 15.20, measured with respect to the reference optical axis that we used in doing all the calculations.

Any system with misaligned individual elements can thus obviously be converted into an effectively aligned overall system, having \(\bf E_{tot}=\bf F_{tot}=0\), either by a physical translation and rotation of the overall system to bring its overall element axis into coincidence with the reference optical axis, or equivalently by a redefinition of the reference optical axis to bring it into coincidence with the system's element axis.

That is, any overall values of \(\bf E=\bf E_{tot}\) and \(\bf F=\bf F_{tot}\) for the overall system can be canceled out by physically translating the entire system as a unit downward an amount \(\Delta_0\) given by

\[\tag{68}\Delta_0=\frac{(1-D)E-(L-B)F}{(1-A)(1-D)+(L-B)C},\]

and then physically rotating it toward the system axis, with center of rotation at the input plane, by the angle

\[\tag{69}\Delta'=\frac{EC+(1-A)F}{(1-A)(1-D)+(L-B)C},\]

where all the quantities \(A,\;B,\;C,\;D,\;E,\;F\) and \(L\) in these expressions are the overall values for the cascaded system, and we have taken \(n_1=n_2=1\). Once this is done the overall system will look perfectly well aligned, despite the individual misalignments of its various internal elements.
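Equations 15.68 and 15.69 are Equation 15.62 (with \(n_1=n_2=1\)) solved for the offset and tilt of the overall element axis, and the sketch below checks that substituting them back recovers \(E\) and \(F\). The example system (two hypothetical thin lenses, \(f_1=0.5\) and \(f_2=0.4\), separated by \(L=0.3\)) and the helper names are assumptions. A two-lens system is used because the common denominator vanishes for some simple systems: for a single thin lens or a pure free-space section it is zero, since their offset and tilt cannot be separately determined from \(E\) and \(F\).

```python
def overall_misalignment(A, B, C, D, L, E, F):
    """Eqs. 68-69: offset Delta_0 and tilt Delta' of the overall element axis (n1 = n2 = 1)."""
    den = (1 - A) * (1 - D) + (L - B) * C
    if abs(den) < 1e-12:
        raise ValueError("offset and tilt cannot be separated for this system")
    delta0 = ((1 - D) * E - (L - B) * F) / den
    tilt = (E * C + (1 - A) * F) / den
    return delta0, tilt

def error_from_misalignment(A, B, C, D, L, delta0, tilt):
    """Eq. 62 with n1 = n2 = 1, used here to check the inversion."""
    return (1 - A) * delta0 + (L - B) * tilt, -C * delta0 + (1 - D) * tilt

# Hypothetical system: thin lenses f1 = 0.5 and f2 = 0.4 separated by L = 0.3
A, B, C, D, L = 0.4, 0.3, -3.0, 0.25, 0.3
E, F = 1e-3, -2e-3
delta0, tilt = overall_misalignment(A, B, C, D, L, E, F)
print(delta0, tilt)
print(error_from_misalignment(A, B, C, D, L, delta0, tilt))   # recovers (E, F)
```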

### Misaligned Resonators or Periodic Systems

A slightly different viewpoint and approach can also be useful in discussing the ray matrix properties of an optical resonator, or its equivalent iterated periodic focusing system, in the situation where individual optical elements inside the resonator may be misaligned.

Suppose we unfold an optical resonator having one or more misaligned internal elements into an equivalent periodic system. Each individual period of the resulting lensguide, corresponding to one round trip in the resonator, will then have an overall element axis, with respect to which that individual period or round trip will look like an ideal aligned system.

This element axis, however, in general will not come back on itself after one round trip—that is, the element axis in each individual period may be tilted with respect to the reference optical axis running through the repeated sections of the lensguide, so that the element axes in successive periods do not connect to each other.

Is there then some better or alternative way to define an effective axis in a misaligned resonator or periodic system? To answer this question we might recall that the distinguishing characteristic of the axis in an aligned paraxial system is that a ray vector which starts out exactly aligned along the axis always remains exactly on the axis.

We might therefore ask whether, starting from any given reference plane within a misaligned resonator or periodic system, there is some unique "axis ray," let us label it \(\textbf r_0\), whose displacement and slope (measured with respect to the reference optical axis) exactly repeat themselves after one period or one round trip through this \(\text{ABCDEF}\) system.

Such a ray, which self-reproduces after one round trip, is given by the conditions that

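\[\tag{70}\textbf M\,\textbf r_0+\textbf E=\textbf r_0\quad\text{or}\quad\textbf r_0=(\textbf I-\textbf M)^{-1}\textbf E,\]

where \(\textbf I\) is the identity matrix, and the \(-1\) superscript means the inverse of the matrix within the parentheses (which exists provided \(A+D\neq2\), since \(\det(\textbf I-\textbf M)=2-A-D\) when \(AD-BC=1\)). If we carry out the algebra, we can find that the displacement and slope of this "axis ray" are given (at this one particular reference plane) by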

\[\tag{71}r_0\equiv\frac{(1-D)E+BF}{2-A-D}\quad\text{and}\quad r'_0\equiv\frac{CE+(1-A)F}{2-A-D}.\]

It is then easy to show that the transformation of any other input ray \(r_1\) through the misaligned system is given by

\[\tag{72}(r_2-r_0)=\bf{M}\times(r_1-r_0),\]

where the \(\text{ABCD}\) elements are the round-trip elements starting from and coming back to some particular reference plane inside the resonator.

This particular ray \(r_0\) then represents a kind of misaligned "natural optical axis" for the misaligned periodic system, as observed at this particular reference plane.

The resonator or periodic system becomes in effect a well-aligned \(\text{ABCD}\) system if the input and output ray coordinates are measured relative to the axis ray \(r_0\equiv[r_0,\;r'_0]\) at the particular reference plane \(z_0\) used to define the \(\text{ABCDEF}\) matrix elements. If a ray starts around the resonator with input displacement and slope given by \(r_0\), it will return to this same position on every successive round trip.

Any other ray, however, starting off with different initial values, will oscillate about this ray (or possibly diverge from it) in exactly the stable or unstable periodic fashion described earlier for aligned periodic systems.
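A minimal sketch of Equations 15.70 through 15.72, assuming a round-trip matrix with \(AD-BC=1\) and \(A+D\neq2\); the matrix, error-vector values, and helper names are hypothetical. It checks that the axis ray reproduces itself after one round trip, and that any other ray transforms about it through the aligned \(\textbf M\).

```python
def axis_ray(A, B, C, D, E, F):
    """Eq. 71: displacement and slope of the self-reproducing axis ray."""
    den = 2 - A - D                      # equals det(I - M) when AD - BC = 1
    if abs(den) < 1e-12:
        raise ValueError("A + D = 2: (I - M) is singular")
    return ((1 - D) * E + B * F) / den, (C * E + (1 - A) * F) / den

def round_trip(A, B, C, D, E, F, r):
    """Eq. 59: one round trip through the misaligned system."""
    return (A * r[0] + B * r[1] + E, C * r[0] + D * r[1] + F)

A, B, C, D = 1.0, 1.5, -1.0, -0.5        # hypothetical stable round trip, m = 0.25
E, F = 1e-3, -5e-4                        # hypothetical misalignment error vector

r0 = axis_ray(A, B, C, D, E, F)
print("axis ray:", r0, "after one round trip:", round_trip(A, B, C, D, E, F, r0))

r1 = (0.01, 0.002)                        # any other input ray
r2 = round_trip(A, B, C, D, E, F, r1)
d1 = (r1[0] - r0[0], r1[1] - r0[1])
# Eq. 72: the deviation from the axis ray transforms by the aligned matrix M
print((r2[0] - r0[0], r2[1] - r0[1]), (A * d1[0] + B * d1[1], C * d1[0] + D * d1[1]))
```

Because \(|m|<1\) in this example, rays launched elsewhere oscillate about this axis ray rather than about the reference optical axis.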

### Differences Between the Axis Ray and the Overall Element Axis

We note again that the axis ray for a misaligned resonator or periodic system is not the same in general as the "overall element axis" we discussed a few paragraphs back.

The overall element axis through a given collection of misaligned elements is a straight line through these elements, as in Figure 15.20, such that if the ray displacements and slopes are measured relative to this axis, the overall system will act like an aligned \(2\times2\) matrix from input to output.

The axis ray through the same collection of elements, by contrast, will consist in general of a series of bent or even curving segments, with respect to which the system again acts like an aligned \(2\times2\) matrix.

The axis ray has the property that it comes out parallel to itself after one pass through the system. However, although the axis rays at the input and output planes have the same displacement and slope, and thus are parallel to each other, they do not in general define a single straight line through the system, whereas the overall element axis does.

In fact, in an optical resonator or periodic system with several individually misaligned elements the axis ray, which acts as the effective optical axis for the periodic system, will trace out a zig-zag course within the \(\text{ABCD}\) system, shifting or bending from plane to plane within the period or round trip.

Moreover, the axis rays going in the forward and reverse directions through a standing-wave resonator may not lie on top of each other (though they must intersect in position, but not necessarily in slope, at the end mirrors); and also the axis ray in a misaligned system may or may not coincide with the element axis of any individual element at the point where it intersects that element.

Such an axis ray nonetheless always exists.

### Summary

The overall conclusion of this section is clearly that (small) displacements or misalignments of individual paraxial elements are usually not a serious problem. They can be handled with the extended matrix technique of this section if desired, but in general they do not change the basic focusing or stability problems of a paraxial \(\text{ABCD}\) system.

If we are designing an extended beam transmission system and perhaps wish to know the sensitivity of the overall system alignment to misalignments of individual elements, then the techniques of this section can be very useful. If the problem is merely to design and evaluate the stability and spot size properties of a closed resonator, then misalignment effects can be ignored.

### Problems for 15.4

1. Error vector for a tilted flat mirror. What is the error vector \(E\) for a flat mirror that is misaligned (i.e., tilted) by a small angle \(\theta\) relative to its aligned position (assuming its aligned position is perpendicular to the reference optical axis of the system)?

2. Misaligned optical resonator. Consider an optical resonator consisting of an aligned flat mirror at the left-hand end; a collection of aligned optical elements having an overall \(\text{ABCD}\) matrix going in the \(+z\) direction from the left-hand mirror to the right-hand mirror; and another planar mirror at the right-hand end which is misaligned by a small tilt angle \(\theta\). Find formulas for the overall element axis and axis ray for one round trip in this resonator, or in its equivalent periodic lensguide.

Calculate and sketch the locations of these rays for the specific situation of a resonator of length \(L\) with a thin lens of focal length \(f\) located at the center of the cavity, for both a stable resonator \((L/4<f<\infty)\) and a positive-branch unstable resonator \((f<0)\).

3. More misaligned resonators. Repeat the previous problem assuming the thin lens of focal length \(f\) is located just in front of the left-hand mirror, and then just in front of the right-hand mirror. (The stability conditions are different in each of these situations.)

4. Finding the axis ray in another optical resonator with misaligned elements. A laser resonator of total length \(L\) consists of two intracavity lenses of focal length \(f=2L\) equally spaced between two flat end mirrors.

One lens is displaced above the optic axis of the resonator by a small distance \(\Delta=\epsilon\); the other is displaced downward by \(\Delta=-2\epsilon\). Trace the "axis ray" through this resonator.

## 5. RAY MATRICES IN CURVED DUCTS

As still another example of an interesting ray matrix system, consider a quadratic duct as defined previously, in which the transverse index variation \(n=n(r)\) is constant with distance, but assume now that this duct is twisted or bent, so that the axis of the duct at any plane \(z\) is displaced from a straight reference axis by a small amount \(\Delta(z)\) as in Figure 15.21. (This could represent a curved or twisted optical fiber.)

What is the \(\text{ABCD}\) matrix for this curved duct?

### Differential Matrix Analysis

Following the combined approach of the preceding two sections, we can suppose that \(\bf M(z)\) represents the \(3\times3\) \(\text{ABCDEF}\) matrix for such a system from an input plane \(z_0\) up to plane \(z\), with elements \(A(z)\) through \(F(z)\). Then from the cascading properties of ray matrices we can write that

\[\tag{73}\textbf{M}(z+dz)=\textbf{M}(dz)\times\textbf M(z),\]

where \(\textbf{M}(dz)\) is the ray matrix for the short distance \(dz\) from \(z\) to \(z+dz\).

Now, for a thin segment of transversely displaced duct, as in Figure 15.21, this matrix has the form, in the limit as \(dz\rightarrow 0\), of

\[\tag{74}\textbf{M}(dz)=\left[\begin{array}{ccc}1&n_0^{-1}dz&0\\-n_0\gamma^2dz&1&n_0\gamma^2\Delta(z)dz\\0&0&1\end{array}\right]\]

Multiplying the matrices \(\textbf{M}(dz)\) and \(\textbf{M}(z)\) together and comparing the result term-by-term with the matrix \(\textbf{M}(z+dz)\) then gives the differential relations

\[\tag{75}\begin{array}{ll}\dfrac{dA(z)}{dz}=n_0^{-1}C(z),&\dfrac{dB(z)}{dz}=n_0^{-1}D(z)\\\dfrac{dC(z)}{dz}=-n_0\gamma^2A(z),&\dfrac{dD(z)}{dz}=-n_0\gamma^2B(z),\end{array}\]

plus the two additional equations

\[\tag{76}\frac{dE(z)}{dz}=n_0^{-1}F(z)\quad\text{and}\quad\frac{dF(z)}{dz}=-n_0\gamma^2[E(z)-\Delta(z)].\]

Solving the first four equations, starting from \(z_0\), gives the overall \(\text{ABCD}\) matrix as a function of distance in the form

\[\tag{77}A(z)=D(z)=\cos\gamma(z-z_0),\quad n_0\gamma B(z)=-(n_0\gamma)^{-1}C(z)=\sin\gamma(z-z_0)\]

which agrees with what we already know from Equation 15.15. The overall \(\text{ABCD}\) matrix is again unchanged by curvature or misalignment of the duct.
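One way to convince oneself that Equation 15.77 really solves Equation 15.75 is to integrate the four coupled equations numerically from the identity matrix at \(z_0=0\) and compare. The sketch below does this with a hand-written classical fourth-order Runge-Kutta step, assuming NumPy; the function name `duct_abcd` and the values of \(n_0\), \(\gamma\) and \(z\) are illustrative assumptions, not taken from the text.

```python
import numpy as np

def duct_abcd(z, n0, gamma, steps=4000):
    """Integrate Eq. 75 from z0 = 0 to z with classical RK4; returns A, B, C, D."""
    def rhs(s):
        A, B, C, D = s
        return np.array([C / n0, D / n0,
                         -n0 * gamma**2 * A, -n0 * gamma**2 * B])

    s = np.array([1.0, 0.0, 0.0, 1.0])  # M(z0) is the identity matrix
    h = z / steps
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

n0, gamma, z = 1.5, 3.0, 2.0  # illustrative values only
A, B, C, D = duct_abcd(z, n0, gamma)
```

The numerical \(A\) and \(D\) should track \(\cos\gamma z\), while \(n_0\gamma B\) and \(-C/(n_0\gamma)\) should track \(\sin\gamma z\), with \(AD-BC\) staying at unity.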

### Effects of Duct Misalignment

The final two equations, which are independent of \(\text{ABCD}\), however, yield the formal solutions

\[\tag{78}\begin{aligned}E(z)&=\gamma\int^z_{z_0}\Delta(z')\sin\gamma(z-z')\,dz'\\F(z)&=n_0\gamma^2\int^z_{z_0}\Delta(z')\cos\gamma(z-z')\,dz'.\end{aligned}\]

There is one particular situation where these solutions can be quite important. Suppose the axis displacement \(\Delta(z)\) in the duct has a periodic component with a spatial variation \(\Delta(z)\propto\cos\gamma_1z\) or \(\sin\gamma_1z\), and suppose that \(\gamma_1\) equals or closely matches the wave number \(\gamma\) of the natural ray oscillations \(\cos\gamma z\) or \(\sin\gamma z\).

The integrands in Equation 15.78 will then contain \(\cos^2\gamma z'\) or \(\sin^2\gamma z'\) terms, which do not average to zero but instead integrate cumulatively with distance \(z\). This then implies that the displacement parameters \(E(z)\) and \(F(z)\), or in essence the cumulative amount of misalignment in the duct, will grow more or less linearly with distance.

Problems will thus result if the physical curvature or waviness of a duct has a periodic variation that resonates with the natural oscillation period for optical rays about the axis of the duct.

The system axis of the duct then seems to diverge by an increasing amount from the physical axis (or element axis) of the duct as we go further down the duct. In more physical terms this means that the periodic oscillations of rays in the duct will appear to grow linearly in amplitude with distance, until these rays encounter the edges of the duct, or some other nonlinearity occurs to limit their growth.
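The contrast between resonant and detuned waviness can be seen by evaluating \(E(z)\) from Equation 15.78 directly. The sketch below assumes NumPy, \(z_0=0\), a sinusoidal axis displacement of illustrative amplitude `d0`, and a plain trapezoidal rule; the function names are hypothetical. For \(\Delta(z)=\Delta_0\sin\gamma z\) one can check by direct substitution into \(E''+\gamma^2E=\gamma^2\Delta\) that \(E(z)=\tfrac{1}{2}\Delta_0(\sin\gamma z-\gamma z\cos\gamma z)\), so after \(N\) ray periods \(E=-\pi N\Delta_0\), a linear growth. For \(\gamma_1=2\gamma\) the same integral stays bounded by \(\Delta_0\).

```python
import numpy as np

def duct_E(z, delta, gamma, n=4001):
    """E(z) of Eq. 78 with z0 = 0, evaluated by the trapezoidal rule."""
    zp = np.linspace(0.0, z, n)
    y = delta(zp) * np.sin(gamma * (z - zp))
    return gamma * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(zp))

gamma, d0 = 2.0, 1e-3  # illustrative duct wave number and waviness amplitude

def resonant(z):
    return d0 * np.sin(gamma * z)        # gamma_1 = gamma

def detuned(z):
    return d0 * np.sin(2.0 * gamma * z)  # gamma_1 = 2 * gamma

# Resonant waviness: E changes by about -pi * d0 per ray period 2 * pi / gamma
E_res = [duct_E(2 * np.pi * N / gamma, resonant, gamma) for N in (1, 5, 10)]

# Detuned waviness: E stays bounded however far down the duct we go
zs = np.linspace(0.0, 20 * np.pi / gamma, 200)
E_det = np.array([duct_E(z, detuned, gamma) for z in zs])
print(E_res, np.abs(E_det).max())
```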

If the duct has instead a randomly wavy axis, i.e., with random variations in \(\Delta(z)\) along the length of the guide, then the oscillations in off-axis rays will grow as the square root of distance along the guide rather than linearly with the distance \(z\).

The growth rate for this process will be proportional to the amplitude of the spatial frequency components of \(\Delta(z)\) in the immediate vicinity of the natural wave number \(\gamma\).

## 6. NONORTHOGONAL RAY MATRICES

We have noted earlier that in optical systems with rotational symmetry the same ray matrices apply equally but separately to the \(x,x'\) and the \(y,y'\) ray coordinates.

In the slightly more complicated situation of optical elements having simple astigmatism, the ray matrices will be different along the \(x\) and \(y\) coordinates. A thin cylindrical lens having its cylinder axis aligned along the x axis (Figure 15.22), for example, will act as a thin lens with the appropriate \(\text{ABCD}\) matrix so far as the \(y\) transverse coordinate is concerned, but will have no focusing or bending effect on the \(x\) displacement of the ray.

Suppose that an overall optical system contains several such astigmatic elements, but these elements all have their principal axes aligned along the same \(x\) and \(y\) axes.

We can then still analyze the ray behavior in each transverse coordinate separately and independently, using separate \(\text{ABCD}\) matrices for the \(x\) and the \(y\) directions. Such an astigmatic system, for example, might even be stable in one coordinate and unstable in the other.

Systems having only simple astigmatism, and thus describable by separate and independent ray matrices in two principal planes that are \(90^\circ\) apart, are commonly referred to as orthogonal systems.

Systems not having this property are said to be nonorthogonal. Nonorthogonal systems in general exhibit one or another kind of "twist" or image rotation, which is more complicated than simple astigmatism, and which in general does not permit the ray matrices to be separated into two separate ray matrices along two orthogonal axes.

The ray analysis of nonorthogonal paraxial optical systems has not yet been extensively developed, and we can therefore summarize in this section only a few results concerning such systems.

### General Analysis of Nonorthogonal Ray Optical Systems

It would be useful, for example, to establish the most general forms that the ray matrices of both orthogonal and nonorthogonal optical systems can assume if we include such operations as arbitrary astigmatism, image rotation, and image inversion.

These questions will not be fully answered in this section, although we will derive some of the general properties of nonorthogonal systems by building up from combinations of elementary ray operations and matrices.

We are particularly interested in establishing the conditions under which an optical system will remain orthogonal, so that the system can be described by separate and independent ray matrices along two orthogonal transverse directions.

There are first of all two basically different ways in which we might write the \(4\times4\) matrices needed to describe the ray coordinates in both the \(x\) and \(y\) transverse coordinates.

One way is to organize the ray coordinates in the form of displacements and then slopes, e.g.,

\[\tag{79}\begin{bmatrix}x_2\\y_2\\\hline x'_2\\y'_2\end{bmatrix}=\begin{bmatrix}A_{xx}\quad A_{xy}&|&B_{xx}\quad B_{xy}\\A_{yx}\quad A_{yy}&|&B_{yx}\quad B_{yy}\\\hline C_{xx}\quad C_{xy}&|&D_{xx}\quad D_{xy}\\C_{yx}\quad C_{yy}&|& D_{yx}\quad D_{yy}\end{bmatrix}\begin{bmatrix}x_1\\y_1\\\hline x'_1\\y'_1\end{bmatrix},\]

or in shorthand notation

\[\tag{80}\begin{bmatrix}r_2\\\hline r'_2\end{bmatrix}=\begin{bmatrix}A|B\\\hline C|D\end{bmatrix}\begin{bmatrix}r_1\\\hline r'_1\end{bmatrix},\]

where we use the notation in these paragraphs that \(r\) and \(r'\) are column vectors with elements \(r\equiv[x,y]\) and \(r'\equiv[n_x\;dx/dz,\;n_y\;dy/dz]\), and \(\bf A,\;B,\;C\) and \(\bf D\) are all \(2\times2\) matrices.

For an astigmatic but orthogonal system with its principal axes oriented along the \(x\) and \(y\) directions, all four of these matrices will then be diagonal, i.e., the \(xy\) and \(yx\) elements that couple between the \(x\) and \(y\) axes will all be zero.

Any rotation of the coordinate system will make these off-diagonal elements nonzero, although for orthogonal systems there will be constraints among the diagonal and off-diagonal elements.

A general nonorthogonal system will have off-diagonal elements between the \(x\) and \(y\) directions that cannot be removed by any coordinate rotation.

Expressing the \(4\times4\) problem in the form of Equation 15.79 has a number of advantages, as discussed for example by Nazarathy (see References).

If a superscript \(\bf T\) indicates the matrix transpose, then it can be shown (see References) that even in the most general nonorthogonal system these \(2\times2\) matrices must satisfy the constraints

\[\tag{81}\begin{array}{ll}\mathbf{AB}^T=\mathbf{BA}^T,&\mathbf B^T\mathbf D=\mathbf D^T\mathbf B\\\mathbf{DC}^T=\mathbf{CD}^T,&\mathbf C^T\mathbf A=\mathbf A^T\mathbf C\end{array}\]

as well as

\[\tag{82}\mathbf{AD}^T-\mathbf{BC}^T=\mathbf A^T\mathbf D-\mathbf C^T\mathbf B=\mathbf I\]

where \(\bf I\) is the identity matrix. The last two relations are obviously the nonorthogonal generalizations of the \(AD-BC=1\) relation for orthogonal \(2\times2\) ray matrices.

There are potentially sixteen elements in the general \(4\times4\) ray matrix, but as a result of these six relations there are only ten independent elements (as also pointed out by Arnaud).
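These constraints are the block form of the statement that the \(4\times4\) matrix is symplectic. The sketch below checks them numerically for a cascade of two astigmatic thin lenses rotated to different angles, written in the displacements-then-slopes layout of Equation 15.79. It assumes NumPy, unit refractive index, and the symplectic form \(\mathbf A^T\mathbf D-\mathbf C^T\mathbf B=\mathbf I\) (which is what \(\mathbf M^T\mathbf J\mathbf M=\mathbf J\) gives). The helpers `rot`, `astig_lens` and `free_space` and all numerical values are hypothetical illustrations.

```python
import numpy as np

def rot(theta):
    # Coordinate rotation by theta in the Eq. 79 layout [x, y, x', y']
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    Z = np.zeros((2, 2))
    return np.block([[R, Z], [Z, R]])

def astig_lens(fx, fy, theta):
    # Hypothetical thin astigmatic lens with principal axes rotated by theta
    I, Z = np.eye(2), np.zeros((2, 2))
    M0 = np.block([[I, Z], [np.diag([-1.0 / fx, -1.0 / fy]), I]])
    return rot(-theta) @ M0 @ rot(theta)

def free_space(d):
    # Propagation over distance d (refractive index taken as 1)
    I, Z = np.eye(2), np.zeros((2, 2))
    return np.block([[I, d * I], [Z, I]])

# Two astigmatic lenses at different angles: a nonorthogonal cascade
M = (free_space(0.3) @ astig_lens(0.5, 2.0, 0.7)
     @ free_space(0.2) @ astig_lens(1.0, -3.0, 0.1))
A, B, C, D = M[:2, :2], M[:2, 2:], M[2:, :2], M[2:, 2:]
I2 = np.eye(2)

checks = {
    "A B^T = B A^T": np.allclose(A @ B.T, B @ A.T),
    "B^T D = D^T B": np.allclose(B.T @ D, D.T @ B),
    "D C^T = C D^T": np.allclose(D @ C.T, C @ D.T),
    "C^T A = A^T C": np.allclose(C.T @ A, A.T @ C),
    "A D^T - B C^T = I": np.allclose(A @ D.T - B @ C.T, I2),
    "A^T D - C^T B = I": np.allclose(A.T @ D - C.T @ B, I2),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```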

We can also show, following Nazarathy, that with this form for the \(4\times4\) matrices the general form of the Huygens-Fresnel integral that we will introduce in a later chapter can be put into the very beautiful form

\[\tag{83}\tilde u_2(\mathbf r_2)=\frac{j}{|\mathbf B|^{1/2}\lambda}\int^\infty_{-\infty}\tilde{\mathbf K}(\mathbf r_2,\mathbf r_1)\,\tilde u_1(\mathbf r_1)\,d\mathbf r_1,\]

where \(|\bf B|^{1/2}\) is the square root of the determinant of the \(\bf B\) matrix, and \(\bf{\tilde K}\) is the exponential part of the Huygens' kernel given by

\[\tag{84}\tilde{\mathbf K}(\mathbf r_2,\mathbf r_1)\equiv\exp\left[-j\frac{\pi}{\lambda}\left(\mathbf r_1\cdot\mathbf B^{-1}\mathbf A\cdot\mathbf r_1-2\,\mathbf r_1\cdot\mathbf B^{-1}\cdot\mathbf r_2+\mathbf r_2\cdot\mathbf D\mathbf B^{-1}\cdot\mathbf r_2\right)\right],\]

with \(\textbf B^{-1}\) being the inverse of the \(\bf B\) matrix. This form of Huygens' integral is then equally valid for orthogonal or nonorthogonal systems.

### Alternative Matrix Notation

An alternative notation to Equation 15.79 for ray systems in two transverse dimensions is to organize the coordinates and matrix elements in the form

\[\tag{85}\begin{bmatrix}x_2\\x_2'\\\hline y_2\\y_2'\end{bmatrix}=\begin{bmatrix}A_{xx}&B_{xx}&|&A_{xy}&B_{xy}\\C_{xx}&D_{xx}&|&C_{xy}&D_{xy}\\\hline A_{yx}&B_{yx}&|&A_{yy}&B_{yy}\\C_{yx}&D_{yx}&|&C_{yy}&D_{yy}\end{bmatrix}\begin{bmatrix}x_1\\x_1'\\\hline y_1\\y_1'\end{bmatrix}\]

As a shorthand notation we will write this equation in the partitioned matrix form

\[\tag{86}\begin{bmatrix}x_2\\y_2\end{bmatrix}=\begin{bmatrix}\textbf{M}_{xx}&|&\textbf{M}_{xy}\\\hline\textbf{M}_{yx}&|&\textbf{M}_{yy}\end{bmatrix}\begin{bmatrix}x_1\\y_1\end{bmatrix},\]

where \(x\) and \(y\) are the ray vectors in the \(x\) and \(y\) coordinates, respectively; \(\textbf M_{xx}\) and \(\textbf M_{yy}\) are the ordinary \(2\times2\) \(\text{ABCD}\) matrices applying to the \(x\) and \(y\) directions; and \(\textbf M_{xy}\) and \(\textbf M_{yx}\) are the cross-matrices between the \(x\) and \(y\) directions.
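The two layouts of Equations 15.79 and 15.85 describe the same \(4\times4\) transformation with the ray coordinates listed in a different order, so one converts between them with a fixed permutation matrix \(\mathbf P\) via \(\mathbf M_{85}=\mathbf P\,\mathbf M_{79}\,\mathbf P^T\). A minimal sketch, assuming NumPy; the names `P`, `to_eq85` and `to_eq79` and the sample matrix elements are hypothetical.

```python
import numpy as np

# Eq. 79 orders a ray as [x, y, x', y']; Eq. 85 orders it as [x, x', y, y'].
# P maps Eq. 79 coordinates to Eq. 85 coordinates (it is a reordered identity).
P = np.eye(4)[[0, 2, 1, 3]]

def to_eq85(M79):
    return P @ M79 @ P.T

def to_eq79(M85):
    return P.T @ M85 @ P

# An orthogonal system in the Eq. 79 layout (arbitrary illustrative elements)
Ax, Bx, Cx, Dx = 1.0, 0.5, -0.4, 0.8
Ay, By, Cy, Dy = 0.9, 0.2, -1.0, 0.7
M79 = np.array([[Ax, 0, Bx, 0],
                [0, Ay, 0, By],
                [Cx, 0, Dx, 0],
                [0, Cy, 0, Dy]])
M85 = to_eq85(M79)
```

In the Eq. 15.85 layout this orthogonal system becomes block diagonal, with the ordinary \(x\) and \(y\) \(\text{ABCD}\) matrices sitting in \(\textbf M_{xx}\) and \(\textbf M_{yy}\).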

We will pursue some applications and consequences of this alternative matrix arrangement in the remainder of this section.

### Rotated Astigmatic Optical Systems

Most of the difficulties in nonorthogonal systems arise from questions of rotation, where the term rotation can mean either coordinate system rotation or actual image rotation of a ray bundle by arbitrary angles about the direction of propagation.

Let us therefore examine in some detail the analytical effects that arise from such rotations.

For example, we might begin by organizing the \(4\times4\) ray matrix for an astigmatic but still orthogonal system, aligned along its principal axes, in the form

\[\tag{87}\begin{bmatrix}x_2\\x'_2\\\hline y_2\\y'_2\end{bmatrix}=\left[\begin{array}{cc|cc}A_x&B_x&&\\C_x&D_x&&\\\hline&&A_y&B_y\\&&C_y&D_y\end{array}\right]\begin{bmatrix}x_1\\x'_1\\\hline y_1\\y'_1\end{bmatrix}\]

where we will follow the convention that any elements not written are zero. The \(x\) and \(y\) quantities in this situation are entirely uncoupled.

At any position \(z\) we can always make a coordinate rotation from our original \(x_1,\;y_1\) coordinates to a set of axes \(x_2,\;y_2\) which are rotated about the \(z\) axis by an angle \(\theta\) (Figure 15.23).

This is done analytically by applying the general rotation matrix

\[\tag{88}\begin{bmatrix}x_2\\x'_2\\\hline y_2\\y'_2\end{bmatrix}=\left[\begin{array}{cc|cc}\cos\theta&&\sin\theta&\\&\cos\theta&&\sin\theta\\\hline-\sin\theta&&\cos\theta&\\&-\sin\theta&&\cos\theta\end{array}\right]\begin{bmatrix}x_1\\x'_1\\\hline y_1\\y'_1\end{bmatrix}\]

where subscript 1 refers to the ray coordinates measured in the old coordinate system and subscript 2 refers to the same ray measured in the new (rotated) coordinate system. We can then write this in shorthand notation as

\[\tag{89}\begin{bmatrix}x_2\\\hline y_2\end{bmatrix}=\begin{bmatrix}C_\theta&|&S_\theta\\\hline -S_\theta&|&C_\theta\end{bmatrix}\begin{bmatrix}x_1\\\hline y_1\end{bmatrix},\]

where \(C_\theta\) and \(S_\theta\) (with suitable subscripts) represent the cos and sin of the rotation angle, with each of these understood to be multiplied by the identity matrix which is not written out. Rotation in the opposite direction simply reverses the sign of \(S_\theta\).

Suppose an orthogonal astigmatic element is physically rotated about the \(z\) axis by an arbitrary angle \(\theta\), as in Figure 15.23, and that we wish to describe the ray propagation through this element written in the original or unrotated coordinate system.

To pass a ray through this rotated element analytically using the original \(x_1,\;y_1\) axes, we must transform from our original axes into the rotated principal axes of the element; propagate through the element using the \(\text{ABCD}\) matrices along its principal axes; and then rotate back to our original axes by a rotation of amount \(-\theta\).

If we carry out this procedure, the ray matrix of the rotated astigmatic element written in the original \(x,\;y\) coordinate axes is the cascade product

\[\tag{90}\begin{bmatrix}C_\theta&|&-S_\theta\\\hline S_\theta&|&C_\theta\end{bmatrix}\times\begin{bmatrix}\textbf{M}_{xx}&|&\quad\\\hline\quad&|&\textbf{M}_{yy}\end{bmatrix}\times\begin{bmatrix}C_\theta&|&S_\theta\\\hline-S_\theta&|&C_\theta\end{bmatrix}\]

which can be manipulated into the form

\[\tag{91}\begin{bmatrix}C^2_\theta\textbf{M}_{xx}+S^2_\theta\textbf{M}_{yy}&|&S_\theta C_\theta(\textbf{M}_{xx}-\textbf{M}_{yy})\\\hline S_\theta C_\theta(\textbf{M}_{xx}-\textbf{M}_{yy})&|&S_\theta^2\textbf{M}_{xx}+C^2_\theta\textbf{M}_{yy}\end{bmatrix}\]

An orthogonal system rotated to an arbitrary angle \(\theta\) will thus have a \(4\times4\) matrix of this general form.

In particular we can deduce that in an orthogonal but arbitrarily rotated system, the upper right and lower left \(2\times2\) blocks need not be zero, but they will always be identical, as illustrated in Equation 15.91.
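The cascade product of Equation 15.90 is easy to check numerically in the layout of Equation 15.85. The sketch below, assuming NumPy, rotates a hypothetical astigmatic thin lens (different focal lengths along its two principal axes) and compares the result block by block with Equation 15.91, including the equality of the two off-diagonal blocks. The helper names and values are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    """Eq. 88 coordinate rotation in the Eq. 85 layout [x, x', y, y']."""
    c, s = np.cos(theta), np.sin(theta)
    I = np.eye(2)
    return np.block([[c * I, s * I], [-s * I, c * I]])

def rotated_element(Mxx, Myy, theta):
    """Eq. 90: rotate into the element's principal axes, propagate, rotate back."""
    Z = np.zeros((2, 2))
    return rotation(-theta) @ np.block([[Mxx, Z], [Z, Myy]]) @ rotation(theta)

# Illustrative per-axis ABCD matrices: thin lenses with f = 0.5 and f = 2.0
Mxx = np.array([[1.0, 0.0], [-1.0 / 0.5, 1.0]])
Myy = np.array([[1.0, 0.0], [-1.0 / 2.0, 1.0]])
theta = 0.4
M = rotated_element(Mxx, Myy, theta)
c, s = np.cos(theta), np.sin(theta)
```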

### Two Rotated Elements in Cascade

Suppose next that two individually orthogonal but astigmatic elements or systems are arranged in cascade, and are rotated to arbitrary angles \(\theta_1\) and \(\theta_2\) about the \(z\) axis (see Figure 15.24), with element #1 passed through first.

The overall ray matrix of these cascaded elements is then the matrix product of two rotated matrices of the type given in Equation 15.91, with appropriate subscripts to identify the first and second systems (e.g., \(S_{\theta_1}\equiv\sin\theta_1\) for the first element; \(\textbf M_{xx,1}\) is the \(x\)-axis ray matrix of the first element in its own principal axes; \(\textbf M_{yy,2}\) is the \(y\)-axis ray matrix of the second element in its own principal axes; and so forth).

The overall matrix product that results from carrying out this multiplication is lengthy and not particularly transparent.

But suppose this overall product is written in the shorthand form

\[\tag{92}\begin{bmatrix}\textbf M_{xx}|\textbf M_{xy}\\\hline\textbf {M}_{yx}|\textbf {M}_{yy}\end{bmatrix}=\begin{bmatrix}\text{overall}\quad 4\times4\\\text{matrix}\;\text{product}\end{bmatrix}.\]

In this situation the \(2\times2\) \(\textbf M_{xx}\) and \(\textbf M_{yy}\) matrices are no longer necessarily correct \(\text{ABCD}\) matrices by themselves, but are merely the upper left and lower right blocks of the overall \(4\times4\) matrix, whereas the \(\textbf M_{xy}\) and \(\textbf M_{yx}\) are cross matrices between the \(x\) and \(y\) coordinates.

Now, if this overall cascaded system is to be an orthogonal system, then the upper right and lower left blocks must be identical, i.e., \(\textbf M_{xy}=\textbf M_{yx}\), in the same way as in the rotated orthogonal system of Equation 15.91.

All of these blocks are complicated functions of the rotations \(\theta_1,\;\theta_2\) and the individual system matrices. It can be shown, however, after some algebra, that the upper right and lower left blocks of the cascade product of two matrices of the form of Equation 15.91 (with element #1, which is passed through first, on the right in the product) will differ by the amount

\[\tag{93}\textbf M_{xy}-\textbf M_{yx}=-\sin(\theta_2-\theta_1)\cos(\theta_2-\theta_1)(\textbf M_{xx,2}-\textbf M_{yy,2})(\textbf M_{xx,1}-\textbf M_{yy,1})\]

We can deduce from this that a cascaded system of two rotated astigmatic elements will in general be orthogonal only if \(\text{(i)}\) \(\theta_2-\theta_1=0^\circ\;\text{or}\;90^\circ\), which means the two elements have relative rotations such that their principal planes coincide; or else if \(\text{(ii)}\) \(\textbf M_{xx,1}=\textbf M_{yy,1},\;\text{or}\;\textbf M_{xx,2}=\textbf M_{yy,2}\), which means that one or the other of the cascaded systems is not astigmatic (e.g., is rotationally symmetric).
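The orthogonality test for two cascaded rotated elements can be explored numerically. The sketch below, assuming NumPy, builds two hypothetical astigmatic elements, cascades them with element #1 on the right of the product (it is passed through first), and compares the off-diagonal mismatch with \(-\sin(\theta_2-\theta_1)\cos(\theta_2-\theta_1)(\textbf M_{xx,2}-\textbf M_{yy,2})(\textbf M_{xx,1}-\textbf M_{yy,1})\), which is what the block algebra gives under the rotation convention of Equations 15.88 and 15.90. All helper names and element values are illustrative assumptions.

```python
import numpy as np

def rotation(theta):
    # Eq. 88 in the Eq. 85 layout [x, x', y, y']
    c, s = np.cos(theta), np.sin(theta)
    I = np.eye(2)
    return np.block([[c * I, s * I], [-s * I, c * I]])

def rotated_element(Mxx, Myy, theta):
    # Eq. 90: rotate in, propagate along principal axes, rotate back
    Z = np.zeros((2, 2))
    return rotation(-theta) @ np.block([[Mxx, Z], [Z, Myy]]) @ rotation(theta)

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def space(d):
    return np.array([[1.0, d], [0.0, 1.0]])

# Two hypothetical astigmatic elements (values chosen only for illustration)
Mxx1, Myy1 = space(0.3) @ lens(0.5), space(0.3) @ lens(1.5)
Mxx2, Myy2 = space(0.2) @ lens(-1.0), space(0.2) @ lens(0.8)

def off_diagonal_mismatch(theta1, theta2):
    # Element 1 is passed first, so it sits on the right of the product
    M = rotated_element(Mxx2, Myy2, theta2) @ rotated_element(Mxx1, Myy1, theta1)
    return M[:2, 2:] - M[2:, :2]

def predicted(theta1, theta2):
    d = theta2 - theta1
    return -np.sin(d) * np.cos(d) * (Mxx2 - Myy2) @ (Mxx1 - Myy1)
```

The mismatch vanishes when the principal planes coincide (\(\theta_2-\theta_1=0^\circ\) or \(90^\circ\)) and is generally nonzero otherwise, in line with conditions (i) and (ii) above.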

To phrase this in the opposite sense, we can conclude that, except for these very special situations, an optical system having cascaded astigmatic elements rotated at arbitrary angles will in general not be orthogonal.

Such a system will not have any pair of transverse coordinates separated by \(90^\circ\) with respect to which a ray can be analyzed by separate and independent \(\text{ABCD}\) matrices.

### Image Rotation

Paraxial optical systems of the most general form can also exhibit image rotation in addition to inversion and astigmatism. Image rotation means that the displacement and slope of a ray on passing through an element are actually rotated in the \(x,y\) plane in the manner given analytically by the general \(4\times4\) rotation matrix given in Equation 15.88.

We introduced the coordinate rotation notation given above at first to represent simply a purely mathematical transformation of coordinates. In simple situations we may rotate the \(x,\;y\) coordinate system by an angle \(\theta\), perhaps in order to line up the coordinate system with the principal axes of an astigmatic element.

We may then rotate the coordinate system back by \(-\theta\) to the original axes further along the \(z\) axis, after passing through the astigmatic element.

However, there are also optical systems which accomplish genuine physical rotation of the ray position even with respect to fixed coordinate axes. This image rotation is also given analytically by the same rotation matrix using \(C_\theta\) and \(S_\theta\) as given in Equation 15.88, but with the rotation operation now viewed as operating on the rays with respect to fixed coordinate axes.

Such image rotation systems often also contain one or more image inversions. A beam passing through a partially rotated Dove prism is one simple example of this type. In such a system the rotation matrix operates only once; there is no "reverse rotation" later on.

### Nonplanar Ring Resonators

The concepts of coordinate rotation and image rotation become essentially indistinguishable for a twisted or nonplanar ring resonator (see Figure 15.26).

When rays bounce off a mirror at other than normal incidence, as in any ring resonator, it is most natural to use transverse coordinate axes that lie in, and perpendicular to, the plane of incidence defined by the ray axes just before and after reflection.

This is particularly desirable when reflecting off spherical mirrors at other than normal incidence, since the effective radius of curvature of the mirror becomes \(R\cos\theta_0\) for rays in the plane of incidence and \(R/\cos\theta_0\) for rays perpendicular to the plane of incidence, where \(\theta_0\) is the angle between the incident direction and the normal to the mirror.
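The astigmatism of an off-axis spherical mirror follows directly from the standard effective radii \(R\cos\theta_0\) (in the plane of incidence) and \(R/\cos\theta_0\) (perpendicular to it). The sketch below, assuming NumPy, writes the two resulting thin-lens \(\text{ABCD}\) matrices in an unfolded picture with \(f=R_{\text{eff}}/2\) and \(C=-1/f\), taking a concave mirror to have \(R>0\) so that it converges, consistent with the sign convention used for lenses in this text. The function name `tilted_mirror_abcd` is a hypothetical helper.

```python
import numpy as np

def tilted_mirror_abcd(R, theta0):
    """Tangential and sagittal ABCD matrices of a spherical mirror (radius R,
    concave for R > 0) hit at angle theta0 from its normal, unfolded as lenses."""
    R_tan = R * np.cos(theta0)   # rays in the plane of incidence
    R_sag = R / np.cos(theta0)   # rays perpendicular to the plane of incidence
    M_tan = np.array([[1.0, 0.0], [-2.0 / R_tan, 1.0]])
    M_sag = np.array([[1.0, 0.0], [-2.0 / R_sag, 1.0]])
    return M_tan, M_sag

M_tan, M_sag = tilted_mirror_abcd(R=1.0, theta0=np.radians(30.0))
```

The ratio of the two focal lengths is \(\cos^2\theta_0\), so the tangential focus is always the stronger one and the two coincide only at normal incidence.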

Analyzing the ray propagation in going around a twisted or nonplanar ring then requires repeated coordinate rotations just before each mirror, in order to bring the transverse \(x,\;y\) axes into agreement with the plane of incidence and reflection of the optical rays on that particular mirror.

For a twisted ring, these rotations at each mirror may or may not (and in general will not) sum to zero net rotation after a complete round trip.

We can then view this situation either as a set of sequential coordinate transformations which do not bring the final coordinate axes back in alignment with the initial axes after one round trip; or alternatively we may view this as a physical rotation of the image or of the ray vectors as seen in the original transverse coordinates after one round trip. The result either way is a net nonzero rotation of the ray coordinates in one round trip.

An image rotation plus an orthogonal system in cascade will have a net \(4\times4\) matrix in one of the two forms

\[\tag{94}\begin{bmatrix}C_\theta\textbf M_{xx}&|&S_\theta \textbf M_{xx}\\\hline -S_\theta\textbf M_{yy}&|&C_\theta\textbf M_{yy}\end{bmatrix}\quad\text{or}\quad\begin{bmatrix}C_\theta\textbf M_{xx}&|&S_\theta\textbf M_{yy}\\\hline -S_\theta\textbf M_{xx}&|&C_\theta\textbf M_{yy}\end{bmatrix},\]

depending on whether the rotation or the astigmatic element comes first. Systems with image rotation are clearly not orthogonal.
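Both forms of Equation 15.94 can be reproduced by multiplying a rotation matrix of the Equation 15.88 type with a block-diagonal orthogonal system in either order. The sketch below assumes NumPy and arbitrary illustrative \(\textbf M_{xx}\), \(\textbf M_{yy}\) and \(\theta\); it also confirms that the off-diagonal blocks differ, so the result cannot be a merely rotated orthogonal system of the Equation 15.91 form.

```python
import numpy as np

def rotation(theta):
    # Eq. 88 rotation in the Eq. 85 layout [x, x', y, y']
    c, s = np.cos(theta), np.sin(theta)
    I = np.eye(2)
    return np.block([[c * I, s * I], [-s * I, c * I]])

Mxx = np.array([[1.0, 0.0], [-2.0, 1.0]])  # illustrative x-axis ABCD (a lens)
Myy = np.array([[1.0, 0.4], [0.0, 1.0]])   # illustrative y-axis ABCD (a gap)
Z = np.zeros((2, 2))
ortho = np.block([[Mxx, Z], [Z, Myy]])
theta = 0.6
c, s = np.cos(theta), np.sin(theta)

rot_first = ortho @ rotation(theta)    # image rotation first, then the element
astig_first = rotation(theta) @ ortho  # astigmatic element first, then rotation
```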

### Summary

The analysis of general nonorthogonal systems, i.e., those having image rotation, inversion, and/or cascaded and rotated astigmatic elements, thus becomes significantly more complicated than for the simple \(2\times2\) ray matrix.

Arnaud and others have shown, for example, that the most general \(4\times4\) ray matrix has just ten independent elements out of the sixteen total elements. A general nonorthogonal system can also be separated into independent \(x\) and \(y\) coordinates in a particular nonorthogonal set of \(x\) and \(y\) axes, i.e., a set of transverse coordinates that are not at \(90^\circ\) to each other.

Systems with image rotation generally also rotate the electric field polarization of a real optical wave, leading to added complexities for the polarization eigenmodes of such a resonator.

We will not explore any of these properties of nonorthogonal optical systems further in this text, and the remainder of our discussions in the following chapters will apply only to orthogonal astigmatic systems, with separable and orthogonal \(x\) and \(y\) axes.