The Dual Problem to SVM

We derive the dual problem of the soft-margin SVM.

14 March 2025

Suppose we have $m$ data points $(x_1,y_1),\dots,(x_m,y_m)$, where $y_i\in \{\pm 1\}$ and $x_i\in\mathbb{R}^n$. Recall the primal problem of the soft-margin SVM in its constrained form

\begin{equation}
\begin{split}
&\min_{w,b,\epsilon}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^m\epsilon_i,\\
&\text{s.t.}\quad y_i(w^Tx_i+b) \geq 1 - \epsilon_i,\\
&\text{and}\quad \epsilon_i \geq 0.
\end{split}
\end{equation}
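To make the setup concrete, here is a minimal numpy sketch, with hypothetical toy data and a hypothetical candidate $(w,b,\epsilon)$, that evaluates the objective and checks both constraints.

```python
import numpy as np

def primal_objective(w, eps, C):
    """(1/2)||w||^2 + C * sum_i eps_i."""
    return 0.5 * np.dot(w, w) + C * np.sum(eps)

def primal_feasible(w, b, eps, X, y, tol=1e-9):
    """y_i (w^T x_i + b) >= 1 - eps_i  and  eps_i >= 0, for every i."""
    margins = y * (X @ w + b)
    return bool(np.all(margins >= 1.0 - eps - tol) and np.all(eps >= -tol))

# Hypothetical toy data: two points per class in R^2.
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, eps, C = np.array([0.5, 0.5]), 0.0, np.zeros(4), 1.0
print(primal_objective(w, eps, C), primal_feasible(w, b, eps, X, y))
```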

We set up the Lagrangian from the constraints as follows. Introduce $2m$ nonnegative multipliers $\alpha_i,\beta_i\geq 0$ and define

\begin{equation}
\begin{split}
\mathcal{L}(w,b,\epsilon,\alpha,\beta)
&=\frac{1}{2}\|w\|^2 + C\sum_{i=1}^m\epsilon_i\\
&\quad + \sum_{i=1}^m \alpha_i\bigl(1-\epsilon_i-y_i(w^Tx_i+b)\bigr) - \sum_{i=1}^m \beta_i\epsilon_i.
\end{split}
\end{equation}
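For reference in the checks below, the Lagrangian is easy to evaluate directly; this is a small sketch assuming numpy arrays `X` (rows $x_i^T$), `y`, `eps`, `alpha`, and `beta` as hypothetical inputs.

```python
import numpy as np

def lagrangian(w, b, eps, alpha, beta, X, y, C):
    """(1/2)||w||^2 + C*sum(eps) + sum_i alpha_i*(1 - eps_i - y_i(w^T x_i + b)) - sum_i beta_i*eps_i."""
    margins = y * (X @ w + b)  # y_i (w^T x_i + b)
    return (0.5 * np.dot(w, w)
            + C * np.sum(eps)
            + np.sum(alpha * (1.0 - eps - margins))
            - np.sum(beta * eps))
```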

Our aim is to use the first-order optimality conditions to obtain equations for the old variables, substitute them into $\mathcal{L}$, and then rewrite everything in terms of the new variables $\alpha$ and $\beta$. Maximizing the Lagrangian after this substitution converts the primal problem into the dual. Taking the derivative in the $w^j$ direction gives

\begin{equation}
\begin{split}
0=\frac{\partial \mathcal{L}}{\partial w^j}
&=\frac{1}{2}\frac{\partial}{\partial w^j}\sum_{i=1}^n (w^i)^2-\frac{\partial}{\partial w^j}\sum_{i=1}^m \alpha_i y_i \Bigl(\sum_{k=1}^n w^kx^k_i\Bigr)\\
&=\sum_{i=1}^n w^i\delta^i_j - \sum_{i=1}^m\sum_{k=1}^n \alpha_iy_ix_i^k\delta_{jk}\\
&=w^j-\sum_{i=1}^m \alpha_iy_ix^j_i.
\end{split}
\end{equation}

Since this holds for each coordinate $w^j$, this implies

\begin{equation}
w=\sum_{i=1}^m \alpha_i y_ix_i.
\end{equation}

Taking a derivative in the $b$ direction gives

\begin{equation}
0=\frac{\partial \mathcal{L}}{\partial b}=-\sum_{i=1}^m \alpha_iy_i.
\end{equation}

Taking a derivative in the $\epsilon_j$ direction gives

\begin{equation}
\begin{split}
0=\frac{\partial \mathcal{L}}{\partial \epsilon_j}
&=C\frac{\partial}{\partial \epsilon_j}\sum_{i=1}^m \epsilon_i- \frac{\partial}{\partial \epsilon_j}\sum_{i=1}^m \alpha_i\epsilon_i- \frac{\partial}{\partial \epsilon_j} \sum_{i=1}^m \beta_i\epsilon_i\\
&=C\sum_{i=1}^m\delta_{ij}-\sum_{i=1}^m\alpha_i\delta_{ij}-\sum_{i=1}^m\beta_i\delta_{ij}\\
&=C-\alpha_j-\beta_j.
\end{split}
\end{equation}

This, of course, implies

\begin{equation}
\beta_j=C-\alpha_j.
\end{equation}
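These three derivative formulas can be sanity-checked numerically. The sketch below compares central finite differences of $\mathcal{L}$, taken at an arbitrary (hypothetical) point, against the closed forms $w^j-\sum_i\alpha_iy_ix_i^j$, $-\sum_i\alpha_iy_i$, and $C-\alpha_j-\beta_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, C = 5, 3, 1.0
X = rng.normal(size=(m, n))              # rows are the x_i
y = rng.choice([-1.0, 1.0], size=m)
w, b, eps = rng.normal(size=n), 0.3, rng.random(m)
alpha, beta = rng.random(m), rng.random(m)

def L(w, b, eps):
    # Same Lagrangian as in the sketch above, with alpha, beta, X, y, C fixed.
    margins = y * (X @ w + b)
    return (0.5 * w @ w + C * eps.sum()
            + np.sum(alpha * (1.0 - eps - margins)) - np.sum(beta * eps))

h = 1e-6
# dL/dw^j  versus  w^j - sum_i alpha_i y_i x_i^j
num_dw = np.array([(L(w + h * e, b, eps) - L(w - h * e, b, eps)) / (2 * h) for e in np.eye(n)])
print(np.allclose(num_dw, w - X.T @ (alpha * y), atol=1e-5))
# dL/db  versus  -sum_i alpha_i y_i
num_db = (L(w, b + h, eps) - L(w, b - h, eps)) / (2 * h)
print(np.isclose(num_db, -np.sum(alpha * y), atol=1e-5))
# dL/deps_j  versus  C - alpha_j - beta_j
num_de = np.array([(L(w, b, eps + h * e) - L(w, b, eps - h * e)) / (2 * h) for e in np.eye(m)])
print(np.allclose(num_de, C - alpha - beta, atol=1e-5))
```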

Substituting these conditions into $\mathcal{L}$ gives the dual objective $\mathcal{L}^*$, which we maximize below. The starred and double-starred terms cancel, and the $b$ term drops out since $\sum_{i=1}^m\alpha_iy_i=0$.

\begin{equation}
\begin{split}
\max_{\alpha,\beta} \mathcal{L}^* &= \max_{\alpha}\frac{1}{2} \left\|\sum_{i=1}^m \alpha_i y_i x_i \right\|^2+ \underbrace{C\sum_{i=1}^m\epsilon_i}_{(*)}- \sum_{i=1}^m (\underbrace{C}_{(*)}-\underbrace{\alpha_i}_{(**)}) \epsilon_i\\
&\quad\quad +\sum_{i=1}^m \alpha_i \underbrace{- \sum_{i=1}^m \alpha_i\epsilon_i}_{(**)} - \sum_{i=1}^m \alpha_iy_i(w^Tx_i+\cancel{b})\\
&=\max_\alpha \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j + \sum_{i=1}^m \alpha_i - \sum_{i=1}^m \alpha_i y_i \left(\sum_{j=1}^m \alpha_j y_j x_j^T \right)x_i\\
&=\max_\alpha -\frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j + \sum_{i=1}^m \alpha_i.
\end{split}
\end{equation}

Maximizing this final expression over $\alpha$ is equivalent to minimizing its negative,

\begin{equation}
\min_\alpha \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j - \sum_{i=1}^m \alpha_i.
\end{equation}
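In matrix form this objective is $\frac{1}{2}\alpha^T Q\alpha - \mathbf{1}^T\alpha$ with $Q_{ij}=y_iy_jx_i^Tx_j$; here is a minimal numpy sketch of evaluating it (the array names `X`, `y`, `alpha` are placeholders, with `X` holding the $x_i$ as rows).

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Evaluate (1/2) sum_{i,j} a_i a_j y_i y_j x_i^T x_j - sum_i a_i."""
    Q = np.outer(y, y) * (X @ X.T)  # Q_ij = y_i y_j x_i^T x_j
    return 0.5 * alpha @ Q @ alpha - alpha.sum()
```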

For our constraints, one comes from $\sum_{i=1}^m\alpha_iy_i=0$. A second comes from $\alpha_i\geq 0$. The last comes from

\begin{equation}
C= \alpha_i+\beta_i\geq \alpha_i,
\end{equation}

since $\beta_i\geq 0$. This gives the full dual problem of the soft-margin SVM as

\begin{equation}
\begin{split}
&\min_\alpha \frac{1}{2} \sum_{i,j=1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j - \sum_{i=1}^m \alpha_i,\\
&\text{s.t.} \quad \sum_{i=1}^m\alpha_iy_i=0,\\
&\text{and}\quad 0\leq \alpha_i\leq C.
\end{split}
\end{equation}
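As a rough illustration (not part of the derivation), the dual can be handed to a generic constrained optimizer. The sketch below uses `scipy.optimize.minimize` with SLSQP on hypothetical toy data, then recovers $w=\sum_i\alpha_iy_ix_i$ and obtains $b$ from a support vector with $0<\alpha_i<C$, where complementary slackness gives $y_i(w^Tx_i+b)=1$. In practice a dedicated QP or SMO solver would be the usual choice.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two separable clusters in R^2.
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, C = len(y), 1.0

Q = np.outer(y, y) * (X @ X.T)            # Q_ij = y_i y_j x_i^T x_j

def objective(alpha):
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    objective,
    x0=np.zeros(m),
    jac=lambda a: Q @ a - np.ones(m),
    bounds=[(0.0, C)] * m,                                   # 0 <= alpha_i <= C
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],    # sum_i alpha_i y_i = 0
    method="SLSQP",
)
alpha = res.x

w = X.T @ (alpha * y)                                        # w = sum_i alpha_i y_i x_i
sv = np.flatnonzero((alpha > 1e-6) & (alpha < C - 1e-6))     # margin support vectors
b = float(np.mean(y[sv] - X[sv] @ w))                        # y_i (w^T x_i + b) = 1 on the margin
print("alpha =", np.round(alpha, 4), "w =", np.round(w, 4), "b =", round(b, 4))
```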