Logistic Regression in OCaml
This page shows how to implement logistic regression in OCaml. The inspiration comes from Yin's blog, to which we refer for details. The bottom line is that, given a data set \( (x_i, y_i) \in \mathbb{R}^d \times \{-1, 1\}\), \(i = 1,\dots, N\), estimating the parameter vector \(w\) amounts to maximizing the penalized log-likelihood \(L\) defined by \[ L(w) = - \sum_{i=1}^N \log\bigl( 1 + \exp(- y_i w^T x_i) \bigr) - \frac{\lambda}{2} w^T w. \] To use L-BFGS, one also needs to provide the gradient of \(L\): \[ \nabla L(w) = \sum_{i=1}^N \frac{y_i x_i}{1 + \exp(y_i w^T x_i)} - \lambda w. \] Given an optimal \(w\), the probability of labelling an instance \(x\) as \(y\) is given by \[ P(y = \pm 1 \mid x) = \frac{1}{1 + \exp(-y \, w^T x)} . \]
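For completeness, the gradient formula follows from differentiating each term of \(L\) with the chain rule:

```latex
\nabla_w \Bigl[ -\log\bigl(1 + e^{-y_i w^T x_i}\bigr) \Bigr]
  = \frac{y_i x_i \, e^{-y_i w^T x_i}}{1 + e^{-y_i w^T x_i}}
  = \frac{y_i x_i}{1 + e^{y_i w^T x_i}},
\qquad
\nabla_w \Bigl[ -\tfrac{\lambda}{2} w^T w \Bigr] = -\lambda w,
```

and summing over \(i\) gives the expression for \(\nabla L\).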
The implementation uses the OCaml binding to the L-BFGS-B routines of J. Nocedal et al. Thanks to OCaml's optional arguments, the API is very simple. To fetch the data from the web, ocamlnet is used, specifically its Http_client.Convenience module. Linear algebra operations are performed with Lacaml, a binding to the celebrated LAPACK library.
Let us start by opening some modules:
open Http_client.Convenience
open Scanf
open Format
open Bigarray
open Lacaml.D
Accessing the data
Since the data is comma-separated, we use the Scanf module for easy splitting and conversion of the flower characteristics. For more complex comma-separated values, a CSV module exists.
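To illustrate, here is a minimal sketch of how bscanf splits one comma-separated record; the sample line and the helper name parse_record are hypothetical, not taken from the code below:

```ocaml
(* Minimal sketch: split one comma-separated record with [Scanf.bscanf].
   [parse_record] is a made-up helper; the sample line is hypothetical. *)
let parse_record line =
  let fh = Scanf.Scanning.from_string line in
  Scanf.bscanf fh "%f,%f,%f,%f,%s " (fun f1 f2 f3 f4 name ->
    ([| f1; f2; f3; f4 |], name))

let () =
  let features, name = parse_record "5.1,3.5,1.4,0.2,Iris-setosa" in
  Printf.printf "%s: %d features\n" name (Array.length features)
```

Each %f stops at the comma, and %s reads the remaining non-blank characters, so the whole record is converted in one call.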
let iris_features, iris_labels =
  let page = http_get "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data" in
  let fh = Scanning.from_string page in
  let get_iris_id s =
    if s = "Iris-setosa" then 2
    else if s = "Iris-versicolor" then -1
    else 1 in
  let f = ref [] and l = ref [] in
  try
    while true do
      bscanf fh "%f,%f,%f,%f,%s " (fun f1 f2 f3 f4 name ->
        (* Filter so only 2 categories are left. *)
        if get_iris_id name <> 2 then (
          (* Augment the [x] vector so that [w.{1}] is the offset. *)
          f := Vec.of_array [| 1.; f1; f2; f3; f4 |] :: !f;
          l := get_iris_id name :: !l;
        ))
    done;
    assert false
  with End_of_file -> Array.of_list !f, Array.of_list !l
The logistic regression implementation
The determination of \(w\) is as simple as it ought to be: compute \(w \mapsto L(w)\) and its gradient (in a single function, so that factors common to \(L\) and \(\nabla L\) are computed only once) and pass it to Lbfgs.F.max.
let log_reg ?(lambda=0.1) x y =
  (* [f_df] returns the value of the function to maximize and
     stores its gradient in [g]. *)
  let f_df w g =
    let s = ref 0. in
    ignore(copy ~y:g w);   (* g ← w *)
    scal (-. lambda) g;    (* g ← -λ w *)
    for i = 0 to Array.length x - 1 do
      let yi = float y.(i) in
      let e = exp(-. yi *. dot w x.(i)) in
      s := !s +. log1p e;
      axpy g ~alpha:(yi *. e /. (1. +. e)) ~x:x.(i);
    done;
    -. !s -. 0.5 *. lambda *. dot w w in
  let w = Vec.make0 (Vec.dim x.(0)) in
  ignore(Lbfgs.F.max f_df w);
  w
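When writing such a combined value-and-gradient function, it is prudent to check the analytic gradient against finite differences. The sketch below does this for the same penalized log-likelihood, but on plain float arrays so that it stands alone without Lacaml; all the helper names (dot_arr, l, grad) are made up for illustration:

```ocaml
(* Hypothetical sketch: finite-difference check of the gradient of
   L(w) = -Σ log(1 + exp(-y_i w·x_i)) - λ/2 w·w, on plain float arrays. *)
let dot_arr a b =
  let s = ref 0. in
  Array.iteri (fun i ai -> s := !s +. ai *. b.(i)) a;
  !s

let l ~lambda x y w =
  let s = ref 0. in
  Array.iteri (fun i xi ->
    s := !s +. log1p (exp (-. float y.(i) *. dot_arr w xi))) x;
  -. !s -. 0.5 *. lambda *. dot_arr w w

let grad ~lambda x y w =
  let g = Array.map (fun wi -> -. lambda *. wi) w in
  Array.iteri (fun i xi ->
    let yi = float y.(i) in
    let e = exp (-. yi *. dot_arr w xi) in
    Array.iteri (fun j xij ->
      g.(j) <- g.(j) +. yi *. xij *. e /. (1. +. e)) xi) x;
  g

(* Compare each analytic partial derivative with a forward difference. *)
let () =
  let x = [| [| 1.; 2. |]; [| 1.; -1. |] |] and y = [| 1; -1 |] in
  let w = [| 0.3; -0.2 |] and lambda = 0.1 and h = 1e-6 in
  let g = grad ~lambda x y w in
  Array.iteri (fun j _ ->
    let w' = Array.copy w in
    w'.(j) <- w.(j) +. h;
    let num = (l ~lambda x y w' -. l ~lambda x y w) /. h in
    assert (abs_float (num -. g.(j)) < 1e-4)) w
```

If the assertions fail for some component, the analytic gradient (or the likelihood) has a bug; with the formulas above they agree to well below the 1e-4 tolerance.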
Try the logistic regression model on the Iris dataset
let proba w x y = 1. /. (1. +. exp(-. float y *. dot w x))
let () =
  let sol = log_reg iris_features iris_labels in
  printf "w = %a\n" Lacaml.Io.pp_fvec sol;
  let nwrongs = ref 0 in
  for i = 0 to Array.length iris_features - 1 do
    let p = proba sol iris_features.(i) iris_labels.(i) in
    printf "Label = %i  prob = %g => %s\n" iris_labels.(i) p
      (if p > 0.5 then "correct" else (incr nwrongs; "wrong"))
  done;
  printf "Number of wrong labels: %i\n" !nwrongs