Problem-setup: point-distribution grading

Having finished the hull-finding examples, I started wondering whether we really needed them. I care about them mostly in the context of knowing when data is interpolable, but that may not be the best measure of whether something can be well approximated. With scattered data, a point can lie outside the hull and still be close to a lot of dense information, or lie inside the hull and be far from anything. Settling for density metrics isn't enough either, because we benefit from a wide distribution of samples: many taken at the same nearby point don't give us as much information as a smaller number dispersed through the local region.

I looked far and wide and honestly couldn't come up with a good candidate measure in n dimensions. What I can do instead is draft a clear example of what I mean, so that I don't suffer a bunch of smart people telling me to use methods that don't quite work for hard-to-explain reasons.

To give ourselves a nice easy case, we're going to assume that we're working in the space of analytic functions $\mathbb{R}^n \to \mathbb{R}$. Since we care quite a bit about the dispersion of data, we're going to do something a little rogue and work with polar coordinates. Note that $\arccos\left(\frac{x}{\|x\|}\right) = \theta$, taken componentwise, where $\theta$ is the vector of angles of the point $x$ relative to the standard basis vectors. Translating in and out of polar coordinates in an n-dimensional setting isn't as awful as it sounds.
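A minimal sketch of that round trip in plain numpy, using nothing beyond the identities above (note $x_i = r\cos(\theta_i)$ inverts the map):

```python
import numpy as np

def to_polar(x):
    """r = ||x||, theta_i = arccos(x_i / r), taken componentwise.

    theta carries n angles for a point in R^n -- one more than a direction
    on the sphere strictly needs (the redundancy noted below). Undefined
    at the origin, where x / r blows up.
    """
    r = np.linalg.norm(x)
    return r, np.arccos(x / r)

def from_polar(r, theta):
    """Invert via x_i = r * cos(theta_i)."""
    return r * np.cos(theta)

x = np.array([1.0, -2.0, 0.5])
r, theta = to_polar(x)
assert np.allclose(from_polar(r, theta), x)
```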

Since everything is analytic, we can express it in terms of $f(r,\theta) = \sum_k r^k g_k(\theta)$. No need to worry that we have one too many thetas: the redundancy doesn't increase our degrees of freedom anywhere, so it keeps us locked in. If we had some sample values $y$ at a set of points given by a vector of $r$'s and a matrix of $\theta$'s, we can put together $y = V(\vec{r}) \cdot G_{\gamma}\left(\theta\right)$, where $V$ is an (infinitely fat) Vandermonde matrix of those $r$'s and $G$ is a matching (infinitely tall) matrix of the various $g_k$ values at the involved $\theta$'s. Note that $g_k: S^{n-1} \to \mathbb{R}$ takes a vector of inputs and maps it to one number, so no tensors crop up.
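To make that concrete, here's a sketch with everything truncated and hypothetical: cut the sum at $K$ powers of $r$, and parameterize each $g_k$ linearly in $\gamma$ using a stand-in basis (a constant plus the direction cosines). The whole product then collapses into one finite design matrix.

```python
import numpy as np

def g_features(theta):
    # Hypothetical stand-in basis for each g_k: a constant plus the
    # direction cosines. Any parameterization linear in gamma works
    # the same way.
    return np.concatenate(([1.0], np.cos(theta)))

def design_matrix(rs, thetas, K):
    """One row per sample; columns are r^k * (basis feature j) for k < K.

    This is the truncated version of V(r) . G_gamma(theta): the
    Vandermonde powers of each sample's r multiplied against the g_k
    basis evaluated at that sample's theta.
    """
    return np.array([
        np.concatenate([r**k * g_features(theta) for k in range(K)])
        for r, theta in zip(rs, thetas)
    ])
```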

Anyway, if you want to fit $\gamma$ to it you do the standard trick where you take the derivative of the squared norm of the difference and set it to 0:

$$\displaystyle E(\gamma) = \|V(\vec{r}) \cdot G_{\gamma}\left(\theta\right) - y\|^2$$

$$\displaystyle \frac{\partial E}{\partial \gamma}= 0 = 2\left(V(\vec{r}) \cdot G_{\gamma}\left(\theta\right) - y\right)^T\left(V(\vec{r}) \cdot \frac{\partial}{\partial \gamma}G_{\gamma}\left(\theta\right)\right)$$
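With the truncated, linear-in-$\gamma$ sketch from above, $E$ is a plain quadratic and this condition is just the normal equations. A quick numerical check that the fitted gradient really lands at zero:

```python
import numpy as np

# Continuing the truncated sketch: with each g_k linear in gamma, setting
# dE/dgamma = 0 reduces to the normal equations A^T (A gamma - y) = 0.
rng = np.random.default_rng(0)
n, m, K = 3, 40, 3                       # dimension, samples, truncation
X = rng.normal(size=(m, n))
rs = np.linalg.norm(X, axis=1)
thetas = np.arccos(X / rs[:, None])
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + 0.3 * rs**2  # toy analytic target

A = design_matrix(rs, thetas, K)         # from the sketch above
gamma, *_ = np.linalg.lstsq(A, y, rcond=None)

grad = A.T @ (A @ gamma - y)             # the transposed form of dE/dgamma
assert np.allclose(grad, 0.0, atol=1e-8)
```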
Now at least we have a framework for what our approximation error can start to look like. If we're feeling frisky, we can also note that $g_0\left(\theta\right) = c$ and $g_1\left(\theta\right) = \vec{a} \cdot \cos\left(\theta\right)$ if we want to preserve our nice manifold behavior, which would otherwise be disrupted at the origin. Tomorrow, if I get a moment, we can try the simple case to establish worst-case behavior for a linear approximation and certain bounds on the variation of the $g_k$ terms. (See how the periodicity will come in handy? Bounded variation means bounded norms, which means a whole lot of delta-epsilon n-ball stuff.)
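To spell out why those two forms keep the origin tame: since $x_i = r\cos(\theta_i)$, the first two terms collapse back into an ordinary affine function of $x$,

$$g_0\left(\theta\right) + r\, g_1\left(\theta\right) = c + \vec{a} \cdot \left(r\cos\left(\theta\right)\right) = c + \vec{a} \cdot x,$$

which stays analytic through $r = 0$ even though $\theta$ itself is undefined there.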

Wait, could this be the wrong approach? We shouldn't be trying to find the linear function that minimizes error everywhere; we only want to minimize error at our point of interest. Does the model that best fits the data give the best estimate of a particular parameter? Could the fit pick up systematic bias in order to offset other terms, like the $r^2$ one? What if all the $g_k$'s after that were zero for odd $k$ and positive for even $k$? We could theoretically fit a higher-order function and then pass along only its first terms, which would improve things in every such case.
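A quick one-dimensional check of that last intuition (all the numbers here are made up for illustration): sample $f(x) = 1 + x^2$ away from the origin, then compare a direct linear fit against a quadratic fit with only its first terms kept.

```python
import numpy as np

# Data from f(x) = 1 + x^2: zero odd terms, positive even terms, sampled
# away from the point of interest at x = 0. The true linearization there
# is [1, 0]. A direct degree-1 fit smears the x^2 term into its slope and
# intercept; fitting degree 2 and keeping only the first terms does not.
rng = np.random.default_rng(1)
xs = rng.uniform(0.5, 1.5, size=50)
ys = 1.0 + xs**2

lin = np.polynomial.polynomial.polyfit(xs, ys, 1)        # [c0, c1], biased
quad = np.polynomial.polynomial.polyfit(xs, ys, 2)[:2]   # first terms only

print("direct linear fit:  ", lin)   # slope ~ 2 E[x], nowhere near 0
print("truncated quad fit: ", quad)  # recovers ~ [1, 0]
```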
