Category Archives: Geometry

Geodesics

Straight Lines

When we think of a straight line, we usually think of a line in the Euclidean sense; that is, c(t)=p+tX, where p is a point contained in the line, t is a real number, and X is a vector parallel to the line. If we consider Euclidean space as a manifold, we would say that X is in the tangent space T_{c(t)}(\mathbb E^n), because c'(t)=X. One important observation is that all along c(t), X never changes; i.e., we never accelerate. If we move along the curve, we never speed up or slow down, and we never turn.

In the language of my post on covariant derivatives, this is easy to express:

\nabla_{c'} c' \equiv 0

The geometric interpretation is simple here: in the direction of the velocity vector, the velocity vector doesn’t change. You can probably see the punchline coming by now. If we generalize to a curve c(t) on a manifold M, c(t) is a geodesic if \nabla_{c'} c' \equiv 0.

Now, you may notice that we can trace out the same curve with a different parametrization, one under which we accelerate along the curve (we wouldn’t turn, but we could speed up or slow down). But in order to have a geodesic, we need \nabla_{c'} c' \equiv 0, so \nabla_{c'}\left<c',c' \right> = 2\left<\nabla_{c'} c',c' \right> = 0, and therefore \|c'\| is constant along the curve. This pins down the parametrization of the curve up to a constant scaling factor on the parameter. In fact, if we rescale the parameter by setting \tilde c(t) = c(st) for a constant s, then \tilde c'(t) = s\,c'(st), so \nabla_{\tilde c'} \tilde c' = s^2\,(\nabla_{c'} c')(st) = 0, and a geodesic with a rescaled parameter is still a geodesic (and obviously has the same image). This motivates the following definition: if \|c'\| = 1, then the geodesic is called a normal geodesic.
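
To make this concrete, here is a minimal numerical sketch (Python with scipy, my own addition rather than anything from the original post) that integrates the geodesic equation on the unit 2-sphere in spherical coordinates and checks the constant-speed property we just derived. The coordinate form of \nabla_{c'} c' = 0 used here comes from the Christoffel symbols of the sphere’s metric (see the connections post below).

import numpy as np
from scipy.integrate import solve_ivp

# Geodesic equation on the unit 2-sphere in coordinates (theta, phi),
# metric ds^2 = d theta^2 + sin^2(theta) d phi^2; the Christoffel terms give
#   theta'' = sin(theta) cos(theta) (phi')^2
#   phi''   = -2 cot(theta) theta' phi'
def geodesic_rhs(t, y):
    theta, phi, dtheta, dphi = y
    return [dtheta, dphi,
            np.sin(theta) * np.cos(theta) * dphi**2,
            -2.0 * dtheta * dphi / np.tan(theta)]

y0 = [np.pi / 3, 0.0, 0.3, 1.0]   # initial point and initial velocity
sol = solve_ivp(geodesic_rhs, (0.0, 10.0), y0, rtol=1e-10, atol=1e-10,
                dense_output=True)

# The speed sqrt(theta'^2 + sin^2(theta) phi'^2) should be constant.
t = np.linspace(0.0, 10.0, 200)
theta, phi, dtheta, dphi = sol.sol(t)
speed = np.sqrt(dtheta**2 + np.sin(theta)**2 * dphi**2)
print(speed.max() - speed.min())   # effectively zero: constant speed

The integrated curves are great circles, which matches the intuition that geodesics are the “straight lines” of the sphere.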

The Exponential Map

Say that some curve \gamma(t) is a geodesic. Then \nabla_{\gamma'}\gamma' = 0 is a second-order differential equation in t. If we assume that \gamma(0) = p, and \gamma'(0) = v, then we have the required conditions for existence and uniqueness of a solution to the differential equation. That is, given a point p\in M and tangent vector v\in T_p(M), there is a unique geodesic \gamma_v that passes through p with velocity v.

The exponential map \text{exp}_p:T_p(M)\to M is defined as \text{exp}_p(v) = \gamma_v(1), assuming that 1 is in the domain of \gamma_v. The exponential map is fairly important when talking about Riemannian manifolds, and it turns out that it is smooth and a local diffeomorphism: it maps a neighborhood of 0 in T_p(M) diffeomorphically onto a neighborhood of p, so on that neighborhood it has a well-defined inverse. This inverse is the logarithmic map, \text{log}_p:M\to T_p(M).
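
On the unit sphere both maps have simple closed forms, which makes a nice concrete example. A sketch (the formulas are the standard ones for the sphere; the function names are mine):

import numpy as np

def exp_map(p, v):
    # p on the unit sphere, v in T_p (so <p, v> = 0)
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * (v / theta)

def log_map(p, q):
    # inverse of exp_map; well-defined as long as q is not antipodal to p
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    w = q - np.cos(theta) * p
    norm = np.linalg.norm(w)
    return theta * (w / norm) if norm > 1e-12 else np.zeros_like(p)

p = np.array([0.0, 0.0, 1.0])      # base point: the north pole
v = np.array([0.5, 0.2, 0.0])      # a tangent vector at p
q = exp_map(p, v)
print(np.allclose(log_map(p, q), v))   # True: log_p inverts exp_p locally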

The exponential map is so important, in fact, that it appears in many of the important theorems in Riemannian geometry, like the Hopf-Rinow Theorem and the Cartan-Hadamard Theorem. It’s also essential to understanding the effects of curvature on a Riemannian manifold.

Arc Length

At this point we can ask about the relationship between arc length and geodesics. Assume that we have some smooth function \alpha : [a,b]\times(-\epsilon,\epsilon)\to M. We can compute the change in arc length L[c_s] over the family of curves c_s = \alpha | [a,b]\times\{s\}:

\frac d{ds}L[c_s] = \frac d{ds}\int_a^b\left<c_s'(t),c_s'(t)\right>^{1/2}dt = \int_a^b\nabla_S\left<T,T\right>^{1/2}dt
= \frac 1 2\int_a^b\left<T,T\right>^{-1/2}\nabla_S\left<T,T\right>dt = \int_a^b\left<T,T\right>^{-1/2}\left<\nabla_S T,T\right>dt

The variables S,T that we substitute here are the fields of tangent vectors corresponding to the differential of \alpha with respect to the variables s,t. The rest is just calculus. Since s,t are independent coordinates, their coordinate fields commute, so [S,T] = 0, and torsion-freeness of the connection lets us make the switch \nabla_S T = \nabla_T S:

\frac d{ds}L[c_s] = \int_a^b\left<T,T\right>^{-1/2}\left<\nabla_T S,T\right>dt
= \int_a^b\left<T,T\right>^{-1/2}\left(T\left<S,T\right>-\left<S,\nabla_T T\right>\right)dt

Restricting to the curve c_0, and noting that we can always reparametrize a curve without loss of generality so that l = \left<T,T\right>^{1/2} is constant along it,

\frac d{ds}L[c_s]\mid_{s = 0} = l^{-1} \left(\left<S,T\right>\mid_a^b-\int_a^b\left<S,\nabla_T T\right>dt\right)

This is called the first variation formula. The function \alpha is called a variation. If we assume that all the c_s are curves that join two points in M, then we know that S vanishes at the endpoints. If we further assume that c_0 is a geodesic, then the integral vanishes (because \nabla_T T = 0). What this means is that geodesics are critical points of the arc length function L for curves that join two points.
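
Here is a quick numerical sanity check of that statement, a sketch with a setup of my own choosing: on the unit sphere, perturb a great-circle arc (a geodesic) and a latitude arc (not a geodesic) with a variation field that vanishes at the endpoints, and compare dL/ds at s = 0.

import numpy as np

def arc_length(c):                 # polygonal approximation of L[c]
    return np.sum(np.linalg.norm(np.diff(c, axis=0), axis=1))

t = np.linspace(0.0, np.pi / 2, 4001)
ez = np.array([0.0, 0.0, 1.0])

def L(s, lat):
    # Base curve: an arc of the circle at latitude lat (lat = 0 is a great
    # circle, hence a geodesic; lat != 0 is not). The variation pushes the
    # curve toward the pole with a field that vanishes at the endpoints.
    c = np.stack([np.cos(lat) * np.cos(t),
                  np.cos(lat) * np.sin(t),
                  np.full_like(t, np.sin(lat))], axis=1)
    c = c + s * np.outer(np.sin(2 * t), ez)
    c /= np.linalg.norm(c, axis=1, keepdims=True)   # project back onto S^2
    return arc_length(c)

h = 1e-5
for lat in (0.0, 0.5):
    print(lat, (L(h, lat) - L(-h, lat)) / (2 * h))
# dL/ds is ~0 for the great circle but nonzero for the latitude arc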

We can’t claim that every geodesic segment minimizes the distance between its endpoints; deciding when a critical point is actually a minimum requires the second variation formula, which I won’t get into in this post. To see the problem, consider the case when M is a sphere, with the usual angular metric. Given any two distinct (non-antipodal) points, there is a great circle path joining them whose length is the angular distance between them, \delta. However, there is also a path of length 2\pi - \delta that goes around “the long way” and joins the points as well, and it is a geodesic segment too. It satisfies the first variation formula just as well as the short path, but it is clearly not a minimum of arc length.

It’s easy to see that the first variation formula gives us a lot of power in talking about the geometry of a Riemannian manifold. The source that I use actually motivates the definition of a geodesic as a critical point of arc length, via the first variation formula. I prefer to motivate it from the “straight line” perspective.

Sources

Much of this material comes from Comparison Theorems in Riemannian Geometry by Jeff Cheeger and David G. Ebin.

Riemannian Connections

For the project that I’m working on, I needed to know the basics of Riemannian connections. Connections confused the hell out of me until I took a few days to really absorb them. I’m writing down my interpretation here so that I can burn it into the neurons, and hopefully help someone else trying to understand the same topic.

Covariant Derivatives of Scalar Functions

A connection is also called a covariant derivative. One of the principles of differential geometry is that everything should behave the same regardless of which coordinate system you work in, so we’d like a coordinate-independent way to take the derivative of a quantity along an arbitrary direction. When we consider a scalar function f, the covariant derivative is just the directional derivative. If X = \sum_{i=1}^n a_i E_i :

\nabla_X f = Xf = \sum_{i=1}^n a_i \frac{\partial f}{\partial x_i}

I found it extremely useful to think of the covariant derivative as a linear operator:

\nabla_X f = \left(\sum_{i=1}^n a_i \frac\partial{\partial x_i}\right)f

Covariant Derivatives of Vector Fields

If we want to apply \nabla_X to a vector field Y, then we can apply the operator:

\nabla_X Y = \left(\sum_{i=1}^n a_i \frac\partial{\partial x_i}\right)Y = \sum_{i=1}^n a_i \frac{\partial Y}{\partial x_i}

Immediately we can see an interpretation for \nabla_X Y: see how Y changes with respect to each coordinate direction, and then sum the resulting vectors together, weighted by each component of X. It’s easy to see how this gives us a coordinate-free derivative of a vector field. What we have right now is called an affine connection.
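
In Cartesian coordinates this is literally the Jacobian of Y applied to X. A tiny symbolic sketch (Python with sympy; the example fields are mine):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
Y = sp.Matrix([x1 * x2, x1**2 - x2])   # a vector field on R^2
X = sp.Matrix([1, 2])                  # the direction X = E_1 + 2 E_2

# nabla_X Y applies the operator to each component of Y, which is the
# same as multiplying the Jacobian matrix of Y by X:
print((Y.jacobian([x1, x2]) * X).T)    # [[2*x1 + x2, 2*x1 - 2]]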

Affine Connections

Affine connections have two defining properties: linearity in X and the product rule on fY. Both are immediate from the operator representation:

\nabla_{fU+gV} Y = f\nabla_U Y + g\nabla_V Y
\nabla_X (fY) = (\nabla_X f)Y + f\nabla_X Y

This means that we can expand the representation in X:

\nabla_X Y = \sum_{i=1}^n a_i\nabla_{E_i}Y

It should be pretty obvious that \nabla_{E_i}Y is the same as \partial Y/\partial x_i, in that they both represent how Y changes in the unit direction of x_i. If you’ve been paying attention, you’ve probably been wondering about how we compute these constructs. It’s fairly straightforward to assume that in Cartesian coordinates, we just differentiate each component of Y. What about in other bases? Well, assuming that Y = \sum_{j=1}^n b_j E_j, we can just apply the product rule on the terms:

\nabla_X Y = \sum_{i=1}^n a_i\nabla_{E_i} \sum_{j=1}^n b_j E_j
= \sum_{i=1}^n a_i \left(\sum_{j=1}^n \left(\nabla_{E_i} b_j\right) E_j + \sum_{j=1}^n b_j \nabla_{E_i} E_j\right)
= \sum_{i,j} a_i \frac{\partial b_j}{\partial x_i} E_j + \sum_{i,j} a_i b_j \nabla_{E_i} E_j

In Cartesian coordinates, the second term vanishes, because the coordinate directions don’t change with respect to any direction. So our assumption about Cartesian coordinates is correct. In other bases, we can think of the second term as a corrective factor for the curvature of the coordinate frames. Most texts expand \nabla_{E_i} E_j = \sum_{k=1}^n \Gamma_{ij}^k E_k, where the \Gamma_{ij}^k are called Christoffel symbols. I won’t get into them here, except to say that they have some important symmetries.
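
Polar coordinates on the plane make a concrete example: the frames E_r, E_\theta turn as you move, so the second term does not vanish. Here is a sympy sketch that computes the symbols from the metric using the standard coordinate formula \Gamma_{ij}^k = \frac 1 2 g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right), a fact I’m importing from outside this post:

import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
coords = [r, theta]
g = sp.Matrix([[1, 0], [0, r**2]])     # the polar metric on the plane
g_inv = g.inv()

def christoffel(k, i, j):
    return sp.simplify(sum(
        g_inv[k, l] * (sp.diff(g[j, l], coords[i])
                       + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l])) / 2
        for l in range(2)))

print(christoffel(0, 1, 1))   # Gamma^r_{theta theta} = -r
print(christoffel(1, 0, 1))   # Gamma^theta_{r theta} = 1/r

The two nonzero symbols are exactly the corrective factor for the turning polar frame.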

Riemannian Connections

If you’re familiar with this material, you may have noticed that I’ve hand-waved a lot. There’s a lot of machinery that needs to be set up to prove existence and uniqueness of all these constructs. It’s also machinery that works fairly well in Euclidean space, but we can’t make the same assumptions on general smooth manifolds. We’d like a connection that works on general manifolds, but we need to make some extra assumptions. A Riemannian connection is an affine connection with some extra properties:

\nabla_X Y - \nabla_Y X = \left[X,Y\right]
\nabla_X\left<U,V\right> = \left<\nabla_X U,V\right> + \left<U,\nabla_X V\right>

Where \left<\cdot,\cdot\right> is an inner product on the tangent space, and \left[\cdot,\cdot\right] is the Lie bracket. The first condition says that the connection itself is torsion-free: roughly, frames carried along by the connection may not twist as we move in any particular direction. The second just imposes the product rule on the inner product. Euclidean space already has these properties, so the covariant derivative as I described it above is a Riemannian connection.
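
Continuing the polar-coordinate sketch from the previous section, we can verify both properties symbolically. For coordinate frames the Lie bracket vanishes, so torsion-freeness reduces to the symmetry \Gamma_{ij}^k = \Gamma_{ji}^k, and metric compatibility becomes \partial_i g_{jk} = \sum_l \left(\Gamma_{ij}^l g_{lk} + \Gamma_{ik}^l g_{jl}\right). A sketch reusing the christoffel function defined above:

# Reusing sp, g, coords, and christoffel from the sketch above:
for i in range(2):
    for j in range(2):
        for k in range(2):
            # torsion-free: symmetry in the lower indices
            assert sp.simplify(christoffel(k, i, j)
                               - christoffel(k, j, i)) == 0
            # metric compatibility: the product rule on <E_j, E_k>
            lhs = sp.diff(g[j, k], coords[i])
            rhs = sum(christoffel(l, i, j) * g[l, k]
                      + christoffel(l, i, k) * g[j, l] for l in range(2))
            assert sp.simplify(lhs - rhs) == 0
print("torsion-free and metric-compatible")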

These extra rules guarantee that on any smooth manifold with an inner product defined on its tangent spaces there is exactly one such connection \nabla, and that we can write it out explicitly in coordinates using the expansion above. There’s a lot more to it, of course, but we have enough to work with. I’ll be writing more posts that cover this topic, but I encourage you to read up on it yourself and derive your own intuition of what’s going on.

Convexity Using Metric Balls

I figure that I owe my readers a technical post, so while I’m riding home on the bus, I’ll write it up. This occurred to me when I was trying to figure out what to do on the ride. I have a nice gadget with a WordPress app, so why not?

The project that I’m working on now involves defining a notion of convexity for a non-Euclidean space. There are any number of difficulties that you can run into when you attempt to define convexity on an arbitrary space, but I do have a few guarantees:

  • I’m on a manifold, so shapes make “sense,” albeit in a squishy way
  • There’s no upper bound on the size of a convex set; that is, I can make a convex set as large as I like
  • Metric balls are convex

So now I want to define a convex hull of a set of points in this space. I can do this in one of two ways. I can say that the convex hull is the convex set of minimal volume containing the set, or equivalently, that it is the intersection of all convex sets containing the set.

I’d like to say that the intersection of all metric balls containing the set is the same as the convex hull (not just any convex superset of the set, mind you; specifically metric balls that are supersets of the set in question). I don’t necessarily need this lemma to be true, but it would be nice. The usual way to show that two sets are equal is to show that each contains the other.

It’s quite trivial to show that (in this space) the intersection of all metric balls that contain the set also contains the convex hull. Metric balls are, after all, convex. It’s trickier (to me) to show that the converse would also be true; that is, that the convex hull of a set also contains the intersection of all metric balls that contain the set. Any ideas?

[Update: apparently there’s a construction called a ball hull that is exactly the intersection of all metric balls containing a set. Perhaps it is essentially different from a convex hull.]
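
For what it’s worth, in the Euclidean plane the intersection of all balls containing a set should agree with the convex hull, since a half-space is a limit of ever-larger balls. Here is a crude Monte Carlo sanity check of that (a Python sketch; the sampling scheme and all names are mine, and it only approximates the ball hull with finitely many balls):

import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
S = rng.standard_normal((20, 2))       # the point set
tri = Delaunay(S)                      # for convex-hull membership tests

# For a center c, the smallest metric ball around c containing S has
# radius max_i |c - p_i|. The ball hull is the intersection of all balls
# containing S; we approximate it with many random centers.
centers = rng.uniform(-50.0, 50.0, size=(20000, 2))
dists = np.linalg.norm(S[None, :, :] - centers[:, None, :], axis=2)
radii = dists.max(axis=1)

def in_ball_hull(x):
    return bool(np.all(np.linalg.norm(x - centers, axis=1) <= radii))

queries = rng.uniform(-3.0, 3.0, size=(200, 2))
agree = sum(in_ball_hull(q) == (tri.find_simplex(q) >= 0) for q in queries)
print(agree, "of 200 queries agree")   # nearly all; the stragglers hug the
                                       # hull boundary and vanish as the
                                       # center spread and count grow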

Spring Break

Spring break is here, so it will be a good time to take a look at a couple things in addition to getting some work done.

As for work, among many other things, I’m trying to find a good “how-to” on deriving a curvature tensor. It seems that differential geometers like to leave these things as an “exercise.”

Visualizing Hyperspheres

Since all of you in the blogosphere seem to love hyperspheres so much, here’s a link to someone who put together some visualizations of hyperspheres and polytopes in 4 dimensions:

http://groups.csail.mit.edu/mac/users/rfrankel/fourd/FourDArt.html

The approach is pretty cool, and some of the images are quite stunning.

Reasoning in Higher Dimensions: Measure

In a previous post on this topic, I said that hyperspheres get a bad rap. They’re doing their best to be perfectly round, and someone comes along and accuses them of being inadequate, or weird. It turns out that hyperspheres aren’t really weird at all. It’s measure that’s weird. And where measure is concerned, there are objects out there that truly display that weirdness.

To recap, I was talking about how the volume of a unit hypersphere measured the normal way (with its radius = 1) approaches zero with increasing dimension. I also mentioned that even though a “unit” hypercube that circumscribes the unit sphere (i.e., a hypercube with inradius = 1) has volume that increases exponentially with the dimension (2^d), a hypercube with circumradius = 1 has volume that decreases even faster than the volume of the hypersphere. Why is one configuration different than the other?

The answer is that they’re not different. A cube is a cube, no matter how you orient it. If its side is of length s, then its volume is s^d. What’s different here is our notion of unit measure. We commonly define a unit of volume as the volume of a hypercube with sides of unit length. In that light, it’s not terribly surprising what we know about the volume of hypercubes. So why can’t we just define the unit hypersphere to have unit volume?

This seems objectionable until you realize that we do this all the time in the real world. What’s a gallon? It has nothing to do with an inch or foot. So why do we worry ourselves over defining volume in terms of one-dimensional units? The metric system doesn’t even adhere to this standard. A liter is a cubic decimeter. Why? It just worked out that way. Since these units are all just arbitrary, we could just declare that unit volume is the volume of a unit hypersphere. Or not. So a hypersphere’s volume really isn’t that weird. What seems weird is the discrepancy between the geometries of the hypercube and hypersphere.

Are there objects that do act strangely in higher dimensions? Definitely. Consider a multivariate normal distribution (a Gaussian distribution in multiple dimensions). For the sake of simplicity, I’ll consider one with zero mean and covariance \sigma^2 I:

p(x) = \frac 1 {(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{\|x\|^2}{2\sigma^2}\right)

Multivariate Gaussians are all nice and round. What can we say about the squared distance from the mean (0)? Its expectation is just the sum of the coordinate variances:

\mathbf{E}\|X\|^2 = \mathbf{E}(X_1^2 + \dots + X_n^2) = n\sigma^2

How much does it deviate from this value? We can apply a Chernoff bound (don’t ask me how; deriving Chernoff bounds is not my strong suit):

\mathbf{P}(\left|\|X\|^2 - n\sigma^2\right| > \epsilon n\sigma^2) \leq \exp\left(-\frac{n\epsilon^2}{24}\right)

Let’s take another look at this bound, though. It’s saying that the probability of the squared distance from the mean deviating from n\sigma^2 by more than a small percentage decreases exponentially with n. So the points that follow the distribution mostly sit in a thin shell around the mean. But the density function still says that the density is highest at the mean. Now that’s weird.

Why does this happen, though? It’s difficult to get a handle on, but the word “density” is what you have to pay attention to. That shell has an incredibly high volume in higher dimensions (it grows like d r^{d-1}), high enough that the shell carries most of the probability mass even though the density there is lower than at the mean. Why the mass concentrates in the shell is even harder to pin down. I don’t have a good answer, but I suspect that it has something to do with the fact that the distribution must integrate to one, and it has to “fill” all the nooks; it can’t do that at the mean, so it has to do it out in this thin shell.

[Update: I guessed last night that if I multiply the p.d.f. by the boundary volume (i.e. “surface area”) of a hypersphere of radius \|x\|, then I should see spikes out at \sigma\sqrt n. I was correct. Michael Lugo confirmed that intuition slightly more rigorously in the comments below. He’s a probabilist, so I think I’m safe. :-) ]
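
That guess is easy to replicate numerically. Setting the derivative of r^{n-1}e^{-r^2/(2\sigma^2)} to zero puts the peak at exactly \sigma\sqrt{n-1}, which is essentially \sigma\sqrt n, and a quick simulation (a Python sketch of my own) shows the concentration:

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
for n in (2, 20, 200):
    X = rng.normal(0.0, sigma, size=(50000, n))
    r2 = (X**2).sum(axis=1)
    # relative spread of ||X||^2 around n sigma^2 shrinks like sqrt(2/n)
    print(n, r2.mean() / n, r2.std() / (n * sigma**2))

# The radial density is proportional to r^{n-1} exp(-r^2 / (2 sigma^2));
# its derivative vanishes at r = sigma * sqrt(n - 1) ~ sigma * sqrt(n).
n = 200
r = np.linspace(0.01, 25.0, 100000)
log_f = (n - 1) * np.log(r) - r**2 / (2 * sigma**2)
print(r[np.argmax(log_f)], sigma * np.sqrt(n - 1))   # both ~14.1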

There are certainly weird things that happen in higher dimensions. In my opinion, all these things have more to do with measure than geometry.

Reasoning in Higher Dimensions: Hyperspheres

In my last post on higher dimensions, I alluded to the fact that I don’t agree completely with certain notions about higher dimensions. Specifically, I disagree with the idea that the intuition that you take for granted in low dimensions is necessarily ill-equipped to serve you in higher dimensions. Low-dimensional intuition is ill-equipped for many problems, and like most other topics in math, it’s usually most sensible to do the calculations anyway.

Hyperspheres often get brought up with the subject of weirdness in higher dimensions, mostly because they’re easy to understand, and it’s easy to demonstrate the weirdness very quickly. But are they completely weird? Are the examples really fair, or are hyperspheres getting a bad rap?

First, let’s get some notation out of the way. We often like to call a hypersphere an n-sphere, because it’s an n-dimensional manifold. Technically, one of these can sit inside any space of more than n dimensions (because I’m talking about intuition, I’m assuming Euclidean space throughout). For simplicity, though, we’ll say that it lives in (n+1)-dimensional space, so that we can define it easily:

S^n = \left\{ x \in \mathbb{R}^{n+1} : \|x\| = r\right\}

That’s not all that I want to talk about, though. I also want to talk about the volume of the n-sphere, and in that case, we often talk about a ball, which is just the interior of a sphere. The interior of an n-sphere is an (n+1)-ball, because if the sphere is an n-dimensional manifold, its interior is an (n+1)-dimensional manifold:

B^{n+1} = \left\{ x \in \mathbb{R}^{n+1} : \|x\| < r\right\}

Or more simply:

B^{n} = \left\{ x \in \mathbb{R}^{n} : \|x\| < r\right\}

The volume of this object has a somewhat simple formula:

V_n={\pi^\frac{n}{2}r^n\over\Gamma(\frac{n}{2} + 1)}

Where Γ(x) represents the Gamma function (which is a tad more complicated).

So where’s the counter-intuition? Say that we took the unit ball for all n > 0 and graphed its volume:

[Figure: volume vs. dimension of the unit n-ball]


This does seem a little odd. The volume goes up, hits a peak at n = 5, and then drops, eventually tailing off toward zero. In fact, in high enough dimension an n-ball has hardly any volume at all: for any fixed radius, the volume of the n-ball goes to 0 as n goes to infinity. That is weird. That’s not necessarily something that you’d expect. It also seems weird that the volume starts dropping after a while.

But is all this really that strange? What if we fixed the radius at, say, 1/\sqrt\pi? The volume vs. dimension is then just a decreasing function, even in low dimensions. Not surprising when you consider that a radius less than 1 should make r^n, and hence the volume, diminish rapidly. So what about radii greater than 1? If we fix r at, say, 3, the volume peaks at n = 56, where it is about 143 billion … somethings. After that, the volume diminishes back toward zero again. All that we’re really saying here is that the geometry of the sphere eventually dominates r^n, but r^n has enough power to dominate in low dimensions until the geometry takes over.

What’s so special about r^n, though? Why is this the gold standard by which we judge the hypersphere? It’s just the volume of the hypercube with sides of length r. In fact, the sphere of radius r is inscribed in a cube with sides of length 2r. What if we considered a hypercube of circumradius r instead of inradius r? That means that a sphere of radius r contains it, so it has volume strictly less than the sphere’s volume. In fact, its volume is:

V_n=\left(2r\over\sqrt n\right)^n = {(2r)^n\over n^{n\over2}}

which diminishes even faster than the sphere’s volume. So it can’t be the geometry of the cube itself that lets it keep its volumetric power; what matters is how we size the cube against the sphere.
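
All of these numbers are easy to check directly from the volume formula above (a Python sketch using scipy; working with logarithms avoids overflow in high dimension):

import numpy as np
from scipy.special import gammaln

def log_ball_volume(n, r):
    # log of V_n = pi^(n/2) r^n / Gamma(n/2 + 1)
    return (n / 2) * np.log(np.pi) + n * np.log(r) - gammaln(n / 2 + 1)

n = np.arange(1, 100)
print(n[np.argmax(log_ball_volume(n, 1.0))])    # 5: the unit-ball peak
print(n[np.argmax(log_ball_volume(n, 3.0))])    # 56: the peak for r = 3
print(np.exp(log_ball_volume(56, 3.0)))         # ~1.43e11 "somethings"

# A hypercube with circumradius r has volume (2r / sqrt(n))^n; its ratio
# to the ball's volume collapses even faster as n grows:
log_cube = n * np.log(2 * 3.0 / np.sqrt(n))
print(np.exp(log_cube - log_ball_volume(n, 3.0))[-1])   # vanishingly small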

So what’s my point? This is all sounding very counterintuitive. My point is that when you talk about counter-intuition in higher dimensions, it’s helpful to talk about what’s actually going on, instead of maligning poor innocent constructs like the hypersphere. What’s actually going on? More about that later.

But for now, consider this: no matter how many dimensions a sphere has, it’s always perfectly round, and perfectly isotropic. That’s intuition that isn’t lost in higher dimensions.

[Someone posted this to Reddit! Thanks!]