Faster ReLUs

I'll expand upon this a little more later, but I've used the following trick in the past to speed up computations.

When evaluating a ReLU layer, each output essentially takes the form:

$$\displaystyle \text{ReLU}\left(\vec{x}\right)_j = h\left(\langle \vec{a}_j | \vec{x} \rangle + b_j\right)$$
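To make the notation concrete, here is a minimal NumPy sketch of that formula. The names `A`, `b`, and `x` are just placeholders for the weight matrix (rows $\vec{a}_j$), bias vector, and input:

```python
import numpy as np

def relu_layer(A, b, x):
    """ReLU(x)_j = h(<a_j | x> + b_j) for every row a_j of A."""
    z = A @ x + b              # pre-activations <a_j | x> + b_j
    return np.maximum(z, 0.0)  # h clamps negative values to zero
```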

Where $h(z) = z$ if $z>0$ and returns $0$ otherwise. A ReLU being non-zero implies the input point $\vec{x}$ lies in a certain half-space (a half-plane in two dimensions). If we know that several ReLUs are a mix of zeros and non-zeros, it implies that the point lies in a specific region given by the intersection of those half-spaces. If we know that some ReLU we haven't yet evaluated has a half-space that doesn't intersect that region, then we know it will evaluate to 0 before the evaluation has taken place, meaning we don't need to perform it.
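One way to test this offline is to maximise a not-yet-evaluated unit's pre-activation over the region implied by the known sign pattern; if the maximum is non-positive, that unit is forced to zero everywhere on the region. The sketch below does this with a linear program. It is only an illustration of the idea, not the author's actual construction, and the function and argument names are hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def unit_forced_to_zero(a_k, b_k, A_active, b_active, A_inactive, b_inactive):
    """Is ReLU unit k guaranteed to output 0 on the region defined by the
    sign pattern of already-evaluated units?

    Active units j:   a_j . x + b_j >= 0   ->  -a_j . x <=  b_j
    Inactive units j: a_j . x + b_j <= 0   ->   a_j . x <= -b_j
    Unit k is forced to zero iff max over the region of (a_k . x + b_k) <= 0.
    """
    A_ub = np.vstack([-A_active, A_inactive])
    b_ub = np.concatenate([b_active, -b_inactive])
    # linprog minimises, so minimise -a_k . x to maximise a_k . x
    res = linprog(-a_k, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    if res.status != 0:            # unbounded or infeasible: cannot conclude
        return False
    return -res.fun + b_k <= 0.0   # max of a_k . x + b_k over the region
```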

In effect, we evaluate any given ReLU block in stages, where the result of each stage tells us which of the remaining operations do and don't need to be performed. Building this map of dependencies is computationally expensive, but only expensive once: after it has been constructed it speeds up all future evaluations of the block, making it useful for production.
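As a rough sketch of the runtime side, assuming a two-stage split and a precomputed `skip_map` from stage-1 sign patterns to the stage-2 units they force to zero (all names here are hypothetical, and the map itself would be built offline with a feasibility test like the one above):

```python
import numpy as np

def staged_relu(A, b, x, stage1_idx, stage2_idx, skip_map):
    """Evaluate a ReLU block in two stages, skipping units that the
    stage-1 sign pattern already forces to zero."""
    out = np.zeros(A.shape[0])

    # Stage 1: evaluate a subset of units and record their sign pattern.
    z1 = A[stage1_idx] @ x + b[stage1_idx]
    out[stage1_idx] = np.maximum(z1, 0.0)
    pattern = tuple(z1 > 0)

    # Stage 2: only evaluate units not ruled out by the pattern.
    zero_units = skip_map.get(pattern, set())
    todo = [j for j in stage2_idx if j not in zero_units]
    if todo:
        out[todo] = np.maximum(A[todo] @ x + b[todo], 0.0)
    return out
```

The saving comes from the second stage touching fewer rows of `A` than a dense evaluation would; how much is saved depends on how often the observed sign patterns appear in the precomputed map.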


