Logistic Regression in OCaml
This page shows how to implement logistic regression in OCaml. The inspiration comes from Yin's blog, to which we refer for details. The bottom line is that, given a data set \( (x_i, y_i) \in \mathbb{R}^d \times \{-1, 1\}\), \(i = 1,\dots, N\), estimating the parameter vector \(w\) amounts to maximizing the penalized log-likelihood \(L\) defined by \[ L(w) = - \sum_{i=1}^N \log\bigl( 1 + \exp(- y_i w^T x_i) \bigr) - \frac{\lambda}{2} w^T w. \] To use L-BFGS, one also needs to provide the gradient of \(L\): \[ \nabla L(w) = \sum_{i=1}^N \frac{y_i x_i}{1 + \exp(y_i w^T x_i)} - \lambda w. \] Given an optimal \(w\), the probability of labelling an instance \(x\) as \(y\) is given by \[ P(y = \pm 1 \mid x) = \frac{1}{1 + \exp(-y \, w^T x)} . \]
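For completeness, the gradient formula follows from differentiating each term of \(L\) with the chain rule:

```latex
\nabla_w \Bigl[ -\log\bigl(1 + e^{-y_i w^T x_i}\bigr) \Bigr]
  = \frac{y_i x_i \, e^{-y_i w^T x_i}}{1 + e^{-y_i w^T x_i}}
  = \frac{y_i x_i}{1 + e^{y_i w^T x_i}},
\qquad
\nabla_w \Bigl[ -\tfrac{\lambda}{2} w^T w \Bigr] = -\lambda w,
```

and summing over \(i\) gives the expression for \(\nabla L\).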
The implementation uses the OCaml binding to the L-BFGS-B routines of J. Nocedal et al. Thanks to OCaml's optional arguments, the API is very simple. To fetch the data from the web, ocamlnet is used, specifically its Http_client.Convenience module. Linear algebra operations are performed with Lacaml, a binding to the celebrated LAPACK library.
Let us start by opening some modules:
open Http_client.Convenience
open Scanf
open Format
open Bigarray
open Lacaml.D
Accessing the data
Since the data is comma-separated, we use the Scanf module for easy splitting and conversion of the flower characteristics. For more complex comma-separated values, a CSV module exists.
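To illustrate, here is a minimal sketch of how bscanf splits one comma-separated record; the sample line and the helper name parse_record are hypothetical, not taken from the code below:

```ocaml
(* Minimal sketch: split one comma-separated record with [Scanf.bscanf].
   [parse_record] is a made-up helper; the sample line is hypothetical. *)
let parse_record line =
  let fh = Scanf.Scanning.from_string line in
  Scanf.bscanf fh "%f,%f,%f,%f,%s " (fun f1 f2 f3 f4 name ->
    ([| f1; f2; f3; f4 |], name))

let () =
  let features, name = parse_record "5.1,3.5,1.4,0.2,Iris-setosa" in
  Printf.printf "%s: %d features\n" name (Array.length features)
```

Each %f stops at the comma, and %s reads the remaining non-blank characters, so the whole record is converted in one call.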
let iris_features, iris_labels =
  let page = http_get "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data" in
  let fh = Scanning.from_string page in
  let get_iris_id s =
    if s = "Iris-setosa" then 2
    else if s = "Iris-versicolor" then -1
    else 1 in
  let f = ref [] and l = ref [] in
  try
    while true do
      bscanf fh "%f,%f,%f,%f,%s " (fun f1 f2 f3 f4 name ->
        (* Filter so only 2 categories are left. *)
        if get_iris_id name <> 2 then (
          (* Augment the [x] vector so that [w.{1}] is the offset. *)
          f := Vec.of_array [| 1.; f1; f2; f3; f4 |] :: !f;
          l := get_iris_id name :: !l;
        ))
    done;
    assert false
  with End_of_file -> Array.of_list !f, Array.of_list !l
The logistic regression implementation
The determination of \(w\) is as simple as it ought to be: compute \(w \mapsto L(w)\) and its gradient (in a single function, so that factors common to \(L\) and \(\nabla L\) are computed only once) and pass it to Lbfgs.F.max.
let log_reg ?(lambda=0.1) x y =
  (* [f_df] returns the value of the function to maximize and
     stores its gradient in [g]. *)
  let f_df w g =
    let s = ref 0. in
    ignore(copy ~y:g w);   (* g ← w *)
    scal (-. lambda) g;    (* g ← -λ w *)
    for i = 0 to Array.length x - 1 do
      let yi = float y.(i) in
      let e = exp(-. yi *. dot w x.(i)) in
      s := !s +. log1p e;
      axpy g ~alpha:(yi *. e /. (1. +. e)) ~x:x.(i);
    done;
    -. !s -. 0.5 *. lambda *. dot w w in
  let w = Vec.make0 (Vec.dim x.(0)) in
  ignore(Lbfgs.F.max f_df w);
  w
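When writing such a combined value-and-gradient function, it is prudent to check the analytic gradient against finite differences. The sketch below does this for the same penalized log-likelihood, but on plain float arrays so that it stands alone without Lacaml; all the helper names (dot_arr, l, grad) are made up for illustration:

```ocaml
(* Hypothetical sketch: finite-difference check of the gradient of
   L(w) = -Σ log(1 + exp(-y_i w·x_i)) - λ/2 w·w, on plain float arrays. *)
let dot_arr a b =
  let s = ref 0. in
  Array.iteri (fun i ai -> s := !s +. ai *. b.(i)) a;
  !s

let l ~lambda x y w =
  let s = ref 0. in
  Array.iteri (fun i xi ->
    s := !s +. log1p (exp (-. float y.(i) *. dot_arr w xi))) x;
  -. !s -. 0.5 *. lambda *. dot_arr w w

let grad ~lambda x y w =
  let g = Array.map (fun wi -> -. lambda *. wi) w in
  Array.iteri (fun i xi ->
    let yi = float y.(i) in
    let e = exp (-. yi *. dot_arr w xi) in
    Array.iteri (fun j xij ->
      g.(j) <- g.(j) +. yi *. xij *. e /. (1. +. e)) xi) x;
  g

(* Compare each analytic partial derivative with a forward difference. *)
let () =
  let x = [| [| 1.; 2. |]; [| 1.; -1. |] |] and y = [| 1; -1 |] in
  let w = [| 0.3; -0.2 |] and lambda = 0.1 and h = 1e-6 in
  let g = grad ~lambda x y w in
  Array.iteri (fun j _ ->
    let w' = Array.copy w in
    w'.(j) <- w.(j) +. h;
    let num = (l ~lambda x y w' -. l ~lambda x y w) /. h in
    assert (abs_float (num -. g.(j)) < 1e-4)) w
```

If the assertions fail for some component, the analytic gradient (or the likelihood) has a bug; with the formulas above they agree to well below the 1e-4 tolerance.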
Try the logistic regression model on the Iris dataset
let proba w x y = 1. /. (1. +. exp(-. float y *. dot w x))
let () =
  let sol = log_reg iris_features iris_labels in
  printf "w = %a\n" Lacaml.Io.pp_fvec sol;
  let nwrongs = ref 0 in
  for i = 0 to Array.length iris_features - 1 do
    let p = proba sol iris_features.(i) iris_labels.(i) in
    printf "Label = %i  prob = %g => %s\n" iris_labels.(i) p
      (if p > 0.5 then "correct" else (incr nwrongs; "wrong"))
  done;
  printf "Number of wrong labels: %i\n" !nwrongs