SiLU, GELU and ELU activation functions

If you’re a fan of neural networks, you’ve probably heard of the ELU, GELU and SiLU activation functions. They are still not as common as ReLU, though, so in this post we are going to get to know them a little better.

ELU:

The Exponential Linear Unit is a smooth approximation to the rectifier (ReLU) function. Its main advantage is that it produces negative outputs, which help guide the network’s weights and biases in the desired directions; its main drawback is that the exponential increases computation time.
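For positive inputs it is simply the identity, and for negative inputs it returns alpha * (exp(x) - 1), so the output saturates at -alpha for very negative inputs.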


import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def elu(x, alpha):
  # identity for positive inputs, alpha * (exp(x) - 1) otherwise
  return [z if z > 0 else alpha * (np.exp(z) - 1) for z in x]

x = np.random.randint(-10, 10, 100)  # random integer inputs; a sorted linspace would give a smoother curve
alpha = .5
y = elu(x, alpha)

Then plot it to check graphically:


plt.style.use('ggplot')
g = sns.lineplot(x=x, y=y)

g.axhline(0, ls='--', color="gray")
g.axvline(0, ls='--', color="gray")
g.set(ylim=(-1.5, 2))

plt.legend(labels=["elu"])
plt.title("Exponential Linear Unit (ELU)")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

The plot:
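As a side note, the list comprehension can also be written in vectorized NumPy, which is usually faster on large arrays. A minimal sketch, equivalent to the elu above (the name elu_vec is just for illustration):


# vectorized ELU: np.where keeps x where it is positive and applies alpha * (exp(x) - 1) elsewhere
def elu_vec(x, alpha=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

y_vec = elu_vec(x, alpha)  # same values as elu(x, alpha) above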

GELU:

The Gaussian Error Linear Unit is used in language models and transformer models like GPT-2 and BERT. It helps mitigate the vanishing-gradient problem and, unlike ReLU (and unlike ELU when alpha is not 1), it has a continuous derivative at 0, which can sometimes make training faster.
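Its exact definition is GELU(x) = x * Phi(x), where Phi is the cumulative distribution function of the standard normal distribution; the code below uses the widely used tanh approximation of that expression.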


import math

def gelu(x):
    # tanh approximation of GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return [0.5 * z * (1 + math.tanh(math.sqrt(2 / np.pi) * (z + 0.044715 * math.pow(z, 3)))) for z in x]

y = gelu(x)

plt.style.use('ggplot')
g = sns.lineplot(x=x, y=y)

g.axhline(0, ls='--', color="gray")
g.axvline(0, ls='--', color="gray")
g.set(xlim=(-4, 4))
g.set(ylim=(-0.5, 2))
plt.xlabel("x")
plt.ylabel("y")


plt.legend(labels=["gelu"])
plt.title("Gaussian Error Linear Unit (GELU)")
plt.show()

The output:
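If you want the exact form instead of the approximation, Phi can be written with the error function, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))). A minimal sketch using math.erf (the name gelu_exact is just for illustration):


def gelu_exact(x):
    # exact GELU: x * Phi(x), with Phi expressed through the error function
    return [0.5 * z * (1 + math.erf(z / math.sqrt(2))) for z in x]

Over the range plotted here the two versions are visually indistinguishable.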

 

SiLU:

The Sigmoid Linear Unit (also known as Swish) multiplies its input by the sigmoid of that input, silu(x) = x * sigmoid(x), and serves as a smooth approximation to ReLU.


def sigmoid(x_elem):
    # standard logistic sigmoid
    return 1 / (1 + np.exp(-x_elem))

def silu(x, theta=1.0):
    # SiLU / Swish: x * sigmoid(theta * x); theta = 1 gives the standard SiLU
    return [x_elem * sigmoid(theta * x_elem) for x_elem in x]

y = silu(x)

plt.style.use('ggplot')
g = sns.lineplot(x=x, y=y)

g.axhline(0, ls='--', color="gray")
g.axvline(0, ls='--', color="gray")
g.set(ylim=(-0.5, 2))
plt.xlabel("x")
plt.ylabel("y")


plt.legend(labels=["silu"])
plt.title("Sigmoid Linear Units (SiLU)")
plt.show()

The output:
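As a quick sanity check of the “smooth approximation to ReLU” claim, you can measure how far SiLU ever strays from ReLU. A small sketch, reusing the silu defined above (grid and gap are just illustrative names):


grid = np.linspace(-10, 10, 2001)
relu_ref = np.maximum(grid, 0)                  # plain ReLU for reference
gap = np.abs(np.array(silu(grid)) - relu_ref)   # pointwise difference
print("largest gap between SiLU and ReLU:", gap.max())

The gap stays small and shrinks to zero as |x| grows, which is exactly the sense in which SiLU approximates ReLU.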

So which one should you use?

It depends on your application and what works best for your network. In general, ELU or GELU may be better choices than ReLU if you’re worried about dead neurons, while SiLU may be a good choice if you’re using batch normalization.

Also, GELU seems to be the state of the art for transformer models, while SiLU is used mostly in computer vision models.
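To see the three of them side by side, you can reuse the functions defined above and plot them on a common grid (a small sketch; the dense linspace grid just gives smoother curves than the random integers used earlier):


xs = np.linspace(-4, 4, 200)  # dense grid for smooth curves

plt.style.use('ggplot')
g = sns.lineplot(x=xs, y=elu(xs, 1.0), label="elu")
sns.lineplot(x=xs, y=gelu(xs), label="gelu")
sns.lineplot(x=xs, y=silu(xs), label="silu")

g.axhline(0, ls='--', color="gray")
g.axvline(0, ls='--', color="gray")
plt.title("ELU vs GELU vs SiLU")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()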
