Reasoning in Higher Dimensions: Measure

In a previous post on this topic, I said that hyperspheres get a bad rap. They’re doing their best to be perfectly round, and someone comes along and accuses them of being inadequate, or weird. It turns out that hyperspheres aren’t really weird at all. It’s measure that’s weird. And where measure is concerned, there are objects out there that truly display that weirdness.

To recap, I was talking about how the volume of a unit hypersphere measured the normal way (with its radius = 1) approaches zero with increasing dimension. I also mentioned that even though a “unit” hypercube that circumscribes the unit sphere (i.e., a hypercube with inradius = 1) has volume that increases exponentially with the dimension ($2^d$), the volume of a hypercube with circumradius = 1 decreases even faster than the volume of the hypersphere. Why is one configuration different than the other?

The answer is that they’re not different. A cube is a cube, no matter how you orient it. If its side is of length $s$, then its volume is $s^d$. What’s different here is our notion of unit measure. We commonly define a unit of volume as the volume of a hypercube with sides of unit length. In that light, it’s not terribly surprising what we know about the volume of hypercubes. So why can’t we just define the unit hypersphere to have unit volume?

This seems objectionable until you realize that we do this all the time in the real world. What’s a gallon? It has nothing to do with an inch or foot. So why do we worry ourselves over defining volume in terms of one-dimensional units? The metric system doesn’t even adhere to this standard. A liter is a cubic decimeter. Why? It just worked out that way. Since these units are all just arbitrary, we could just declare that unit volume is the volume of a unit hypersphere. Or not. So a hypersphere’s volume really isn’t that weird. What seems weird is the discrepancy between the geometries of the hypercube and hypersphere.
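To put numbers on the recap, here is a minimal sketch using the standard closed form for the volume of a $d$-dimensional ball, $\pi^{d/2} r^d / \Gamma(d/2 + 1)$. The function names are mine, chosen for illustration. The last column re-expresses the inscribed cube's volume in "unit ball" units, which is all that redefining unit volume amounts to.

```python
import math

def ball_volume(d, r=1.0):
    """Volume of a d-dimensional ball of radius r: pi^(d/2) r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

def cube_volume_inradius(d, inradius=1.0):
    """Hypercube that circumscribes the ball: its side is 2 * inradius."""
    return (2 * inradius) ** d

def cube_volume_circumradius(d, circumradius=1.0):
    """Hypercube inscribed in the ball: half its diagonal is the
    circumradius, so its side is 2 * circumradius / sqrt(d)."""
    return (2 * circumradius / math.sqrt(d)) ** d

for d in (1, 2, 3, 5, 10, 20):
    ball = ball_volume(d)
    outer = cube_volume_inradius(d)
    inner = cube_volume_circumradius(d)
    # inner / ball is the inscribed cube's volume measured in unit-ball units.
    print(f"d={d:2d}  ball={ball:.3e}  cube(inradius 1)={outer:.3e}  "
          f"cube(circumradius 1)={inner:.3e}  inner/ball={inner / ball:.3e}")
```

Changing the unit rescales every column by the same factor in a given dimension, so the ratios between the shapes stay exactly as they were.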

Are there objects that do act strangely in higher dimensions? Definitely. Consider a multivariate normal distribution (a Gaussian distribution in multiple dimensions). For the sake of simplicity, I’ll consider one with zero mean and variance $\sigma^2$ in each of its $n$ coordinates:

$p(x) = \frac{1}{(2\pi)^{n/2}\sigma^n} \exp\left(-\frac{\|x\|^2}{2\sigma^2}\right)$

Multivariate Gaussians are all nice and round. What can we say about the distance from the mean (0)? Well, the expected squared distance is just the sum of the per-coordinate variances:

$\mathbf{E}\|X\|^2 = \mathbf{E}(X_1^2 + \dots + X_n^2) = n\sigma^2$

How much does it deviate from this value? We can apply a Chernoff bound (don’t ask me how; deriving Chernoff bounds is not my strong suit):

$\mathbf{P}(\left|\|X\|^2 - n\sigma^2\right| > \epsilon n\sigma^2) \leq \exp\left(-\frac{n\epsilon^2}{24}\right)$

Let’s take another look at this bound, though. It’s saying that the probability of the squared distance from the mean deviating from $n\sigma^2$ by more than a small percentage decreases exponentially with $n$. So the points that follow the distribution mostly sit in a thin shell around the mean. But the density function still says that the density is highest at the mean. Now that’s weird.
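The concentration is easy to see empirically. The following is a rough Monte Carlo sketch, not a check of the constant in the bound: it draws points from the Gaussian and counts how often the squared distance misses $n\sigma^2$ by more than a relative tolerance. The tolerance of 0.2, the trial count, and the seed are arbitrary choices of mine.

```python
import random

def squared_norm_sample(n, sigma, rng):
    """Squared distance from the origin of one draw from N(0, sigma^2 I_n)."""
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))

def fraction_outside(n, sigma=1.0, eps=0.2, trials=500, seed=0):
    """Fraction of draws whose squared norm deviates from n * sigma^2
    by more than eps * n * sigma^2."""
    rng = random.Random(seed)
    target = n * sigma ** 2
    outside = sum(
        abs(squared_norm_sample(n, sigma, rng) - target) > eps * target
        for _ in range(trials)
    )
    return outside / trials

for n in (1, 10, 100, 1000):
    print(f"n={n:4d}  fraction outside the shell: {fraction_outside(n):.3f}")
```

In one dimension most draws fall outside the tolerance band, while for large $n$ almost none should.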

Why does this happen, though? It’s difficult to get a handle on, but the word “density” is what you have to pay attention to. That shell has an incredibly high volume at higher dimensions (the volume of a shell of radius $r$ and thickness $dr$ grows like $r^{n-1}\,dr$). High enough that the shell can hold most of the probability even though the density is still lower at the shell than at the mean. Why the probability is highest in the shell is even more difficult to figure out. I don’t have a good answer, but I suspect that it has something to do with the fact that the distribution must add up to one, and it has to “fill” all the nooks, and it can’t do that at the mean. It has to do this out in this thin shell.

[Update: I guessed last night that if I multiply the p.d.f. by the boundary volume (i.e. “surface area”) of a hypersphere of radius ||x||, then I should see spikes out at σ√n. I was correct. Below, Michael Lugo confirmed that intuition slightly more rigorously in the comments. He’s a probabilist, so I think I’m safe. :-) ]
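The calculation in the update can be sketched as follows. This is not the original Matlab code, only a small Python stand-in. The surface area of a hypersphere of radius $r$ is proportional to $r^{n-1}$, so the density of the distance from the mean is proportional to $r^{n-1}\exp(-r^2/(2\sigma^2))$. Setting the derivative of its logarithm, $(n-1)/r - r/\sigma^2$, to zero gives a peak at $r = \sigma\sqrt{n-1}$, which is close to $\sigma\sqrt{n}$ for large $n$. The grid search below is a crude numerical check of that; the grid range and resolution are assumptions of mine.

```python
import math

def log_radial_density(r, n, sigma=1.0):
    """Log of the unnormalized radial density r^(n-1) * exp(-r^2 / (2 sigma^2))."""
    return (n - 1) * math.log(r) - r * r / (2 * sigma ** 2)

def radial_mode(n, sigma=1.0, steps=200_000):
    """Locate the peak of the radial density by brute force on a uniform grid
    over (0, 3 * sigma * sqrt(n)]."""
    r_max = 3 * sigma * math.sqrt(n)
    grid = (r_max * k / steps for k in range(1, steps + 1))
    return max(grid, key=lambda r: log_radial_density(r, n, sigma))

for n in (2, 10, 100, 1000):
    print(f"n={n:4d}  numeric peak={radial_mode(n):.4f}  "
          f"sqrt(n-1)={math.sqrt(n - 1):.4f}  sqrt(n)={math.sqrt(n):.4f}")
```

Working with the logarithm avoids overflow in $r^{n-1}$ for large $n$ and does not move the location of the maximum.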

There are certainly weird things that happen in higher dimensions. In my opinion, all these things have more to do with measure than geometry.

22 responses to “Reasoning in Higher Dimensions: Measure”

1. Suresh

excellent post!

2. Pingback: New math blog — The Endeavour

3. What follows is a version of my previous comment, edited for readability (I didn’t realize the comment software would kill subscripts and superscripts). Feel free to delete the previous comment.

Here’s what’s going on with that shell. The volume of a shell with radius r and thickness dr is C r^(n-1) dr, for some constant C which depends on n. (C is the surface area of the unit ball in n-space, but that doesn’t matter.) The density is proportional to exp(-|x|^2/(2σ^2)); in a shell of radius r, then, it’s exp(-r^2/(2σ^2)). So the probability density function of the distance of a random point from the origin, where the point is selected from the Gaussian distribution you specified, is obtained from

q(r) = r^(n-1) exp(-r^2/(2σ^2))

by normalizing so that the integral is 1. (Working out the normalization factor isn’t that hard, but it’s tedious.)

This has maximum at sqrt(n-1)σ, so we see that the mode of the distribution is there; it’s not quite the same as the expectation but it’s a lot easier to derive.

4. So the points that follow the distribution mostly sit in a thin shell around the mean. But the density function still says that the density is highest at the mean. Now that’s weird.

Why is it weird? The density is highest at the mean and falls exponentially as we move away from the mean. Or am I missing something very obvious?

5. Michael: As it turns out, I did almost that same calculation in Matlab last night, and you’re exactly right. There are two spikes out at $\sigma\sqrt{n}$. The trick was multiplying the value of the p.d.f. for a particular vector by the infinitesimal change in volume (i.e., the surface area of a hypersphere of radius r).

Panos: The p.d.f. isn’t weird. It’s exactly as you describe. But what’s counterintuitive is that a value is not likely to exist at the mean.

Aside: let’s see if latex works in comments:
$n\sigma^2$

6. Ha! It does!

7. William V

“But what’s counterintuitive is that a value is not likely to exist at the mean”

Not exactly. The distribution is skewed, so what’s probably confusing you is the fact that you’re used to using the mean as a measure of central tendency when you really want to talk about max likelihood. The mode is the most likely. But for any distribution with finite mean, the repeated averages will converge to the mean, which incidentally will asymptotically also be the mode.

• William V

As an addition the name of the distribution that Michael talked about is the Chi Distribution.

http://en.wikipedia.org/wiki/Chi_distribution

Testing latex as well

$z = \sum_{i=1}^{n} X_i^2 \sim \chi^2_n \\ y = \sqrt{\sum_{i=1}^{n} X_i^2} \sim \chi_n$

If

$X_i \sim N(0,1)$, i.i.d

• Yes, my probability terminology is probably lacking. But you’re missing the point. I’m not talking about repeated averages.

The point is that one usually expects a value (a single value) drawn from a Gaussian to be close to the center (if one isn’t a probabilist). That’s what happens in low dimensions, but in much higher dimensions, that single value will most likely be far from the mean.

Obviously, when you draw many points, their mean will coincide with the distribution’s mean.

8. William V

In higher dimensions the mode of the normal is still 0 and the mean is still 0. So your statement below is factually wrong. ‘Most likely’, means highest likelihood which occurs at the mode for a sample size of 1.

“That’s what happens in low dimensions, but in much higher dimensions, that single value will most likely be far from the mean.”

When you’re calculating the average distance from the mean, that no longer has a Gaussian distribution; it has a chi distribution, which is skewed. Of course the expected distance is positive and increasing with n. The distribution is becoming more ‘flat’ due to normalization of the RV.

By the way, a sample from an uncorrelated MV normal is exactly the same as a repeated sample (not averaging, of course) of a univariate normal.

• William V

0 if we omit (mu, sigma), which we can of course for discussion purposes.

• “In higher dimensions the mode of the normal is still 0 and the mean is still 0. So your statement below is factually wrong. ‘Most likely’, means highest likelihood which occurs at the mode for a sample size of 1.”

Yes, the mean is 0. The MV normal distribution is not unimodal, however, so it’s not correct to say that it has a mode of 0. The mode is the most frequent value of a distribution. That value is not unique in the case of an MV normal.

Its set of modes is a hypersphere of radius almost $\sigma\sqrt{n}$, for high enough values of n. My point is that this fact is what makes it unintuitive. You may find it more intuitive than I do.

“When you’re calculating the average distance from the mean, that no longer has a Gaussian distribution; it has a chi distribution, which is skewed.”

I’m aware of this. This is basically the point that I’m trying to make in the post.

“Of course the expected distance is positive and increasing with n. The distribution is becoming more ‘flat’ due to normalization of the RV.”

I don’t know what you mean by “flat,” but no, it’s not obvious on first examination. Again, perhaps it is to you, but I found it weird and fascinating when I learned of it.

“By the way, a sample from an uncorrelated MV normal is exactly the same as a repeated sample (not averaging, of course) of a univariate normal.”

Yes, I am aware of this. Stated this way, it grounds the situation a little more, and perhaps this is the best way to demonstrate what’s really going on.

Again, my argument is that when the MV normal is viewed geometrically, it’s a very interesting and weird object.

9. “The MV normal distribution is not unimodal”

Actually it is. (Just take the partial derivatives and you will see that the mode is at [0,…,0])

You get confused because you compare areas of different volume. Yes, the volume around the mean is small, so that area will never have enough points compared to the “shell” around the mean, which has significantly higher volume in higher dimensions.

However, if you compute density, you will see that the density is still higher at the mean. (Remember: density = “number of points in the area”/”volume of the area”)

• Ah, yes, you’re right. I keep getting into this radial line of thinking. Thank you, Panos.

10. I noticed that this is not the first time you write about the topic. Why have you decided to write about it again?

• I also noticed that this is not the first time that you’ve left your spam in a comment on my post.

11. I must say this is a great article i enjoyed reading it keep the good work

12. Andrej

I read about the problem you discuss in Scott (1992), Multivariate Density Estimation. He said:

… [in higher] dimensions, the probability mass of a multivariate Normal begins a rapid migration into the extreme tails. In fact, more than half of the probability mass is in a very low density region for 10-dimensional data.

First of all, I have some problems with the term “probability mass”. I’m quite sure that this is not the density, so what is it?

• “Mass” of a region here refers to the actual probability of that region, that is, the integral of the density over it. “Density” then is familiar: it’s mass per unit volume.

So he’s saying that if you look at this very low-density region, half the probability is spread over this whole region. Then the region must be very large, since it’s a low-density region.
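Here is a rough way to see that numerically. The sketch assumes, purely for illustration, that “very low density” means a density below 1% of its peak value at the mean; that threshold is my choice and may not be the one Scott uses. For a standard normal, $p(x)/p(0) = \exp(-\|x\|^2/2)$, so the low-density region is everything with $\|x\|^2 > -2\ln(0.01)$.

```python
import math
import random

def low_density_mass(n, ratio=0.01, trials=20_000, seed=0):
    """Monte Carlo estimate of P(p(X) < ratio * p(0)) for X ~ N(0, I_n).

    Since p(x) / p(0) = exp(-|x|^2 / 2), the event is |x|^2 > -2 * ln(ratio).
    """
    rng = random.Random(seed)
    threshold = -2.0 * math.log(ratio)
    hits = sum(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) > threshold
        for _ in range(trials)
    )
    return hits / trials

for n in (1, 2, 5, 10, 20):
    print(f"n={n:2d}  estimated mass in the low-density region: "
          f"{low_density_mass(n):.3f}")
```

The region is defined the same way in every dimension; only the share of the probability that lands in it changes as $n$ grows.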

13. Andrej