Last week there was an information in OCaml-discuss about the continuity of the Owl project, which is the most important numerical (scientific numerical computing) library in OCaml.
I was then thinking more about Owl and I realised that I would like to perform quick evaluations of OCaml/Owl with respect to Python-Numpy and Julia regarding performance. I am not an expert on this, but the idea is to have my two-cents about matrix operations speed in OCaml. Note that in Owl there are automatic tests of performance of the library against python/julia equivalent libraries. However, I am more interested in a bit broader view when a potential programer may need to mix the optimized method with other non-optimized operations. :
Multidimensional arrays benchmarks
One important point of scientific numerical libraries is how the handle the multidimensional arrays and how fast can be typical operations with them. Inspired by the blog entry: ocaml-bigarray-vs-array I have decided to compare these matrix operations with Python and Julia, the code in both languages is quite straight-forward. The core of the comparison is:
-
Create an square matrix with random float numbers from 0 to 1, and measure the typical time to sum all the elements. First we do with specific methods, then we compare with the same calculation just using typical loops. For each method with loops I tried both possible loops orders (sometimes named c-layout vs fortran-layout).
-
For Python, I am using Python 3.11.2, for Julia is the version 1.9.0, for OCaml compilar is used the 4.12.1 version.
Python and Numpy
Code
import time
import numpy as np
def np_sum(arr, size):
a = time.time_ns()
fsum = np.sum(arr)
b = time.time_ns()
print(" Method np-sum : %f (time %fs)" %(fsum,(b-a)/1000000000.0))
return
def loop_sum(arr,size):
lsum=0.0
a = time.time_ns()
for i in range(0,size):
for j in range(0,size):
lsum = lsum + arr[i,j]
b = time.time_ns()
print(" Method loop : %f (time %fs)" %(lsum,(b-a)/1000000000.0))
return
def loopt_sum(arr,size):
tsum=0.0
a = time.time_ns()
for i in range(0,size):
for j in range(0,size):
tsum = tsum + arr[j,i]
b = time.time_ns()
print(" Method loop-t : %f (time %fs)" %(tsum,(b-a)/1000000000.0))
return
for asize in [1234,12345,23456]:
print("\n ==== Size A[n,n] ====== %d \n" %(asize))
arr2 = np.random.rand(asize, asize)
np_sum(arr2,asize)
loop_sum(arr2,asize)
loopt_sum(arr2,asize)
Result (in my desktop computer)
> ==== Size A[n,n] ====== n=1234 ====
> Method np-sum : 761566.315611 (time 0.000391s)
> Method loop : 761566.315611 (time 0.154743s)
> Method loop-t : 761566.315611 (time 0.151696s)
>
> ==== Size A[n,n] ====== n=12345 ====
> Method np-sum : 76199182.226933 (time 0.038816s)
> Method loop : 76199182.226942 (time 14.834441s)
> Method loop-t : 76199182.226923 (time 14.894826s)
>
> ==== Size A[n,n] ====== n=23456 ====
> Method np-sum : 275090911.896293 (time 0.133710s)
> Method loop : 275090911.896455 (time 52.886030s)
> Method loop-t : 275090911.896464 (time 53.198568s)
Julia
Code
using Printf
function jl_sum(arr)
a = time()
fsum = sum(arr)
b = time()
@printf " Method jl-sum : %f (time %.6fs)\n" fsum (b-a)
end
function loop(arr,alen)
global lsum=0
a = time()
for i = 1:alen
for j = 1:alen
global lsum += arr[i,j]
end
end
b = time()
@printf " Method loop : %f (time %.6fs)\n" lsum (b-a)
end
function loopt(arr,alen)
global tsum=0
a = time()
for i = 1:alen
for j = 1:alen
global tsum += arr[j,i]
end
end
b = time()
@printf " Method loop-t : %f (time %.6fs)\n" tsum (b-a)
end
for asize in [1234,12345,23456,34567]
@printf "\n ==== Size A[n,n] ====== %d \n" asize
arr = rand(asize, asize)
jl_sum(arr)
loop(arr,asize)
loopt(arr,asize)
end
Result (in my desktop computer)
> ==== Size A[n,n] ====== n=1234 ====
> Method jl-sum : 760961.820085 (time 0.000152s)
> Method loop : 760961.820085 (time 0.034048s)
> Method loop-t : 760961.820085 (time 0.031716s)
>
> ==== Size A[n,n] ====== n=12345 ====
> Method jl-sum : 76203592.302483 (time 0.034457s)
> Method loop : 76203592.302516 (time 2.140357s)
> Method loop-t : 76203592.302498 (time 2.344866s)
>
> ==== Size A[n,n] ====== n=23456====
> Method jl-sum : 275096914.914670 (time 0.129691s)
> Method loop : 275096914.914723 (time 7.598340s)
> Method loop-t : 275096914.914827 (time 8.538193s)
Comments Python and Julia
For the results with Julia and Python, we observe that numpy (np-sum) and the equivalent method in Julia (jl-sum) are giving similar results. However when we try to repeat that calculation using loops we observe that Julia is quite faster than Python. In general in the Julia community it is common to say: don't be afraid about using loops and nested loops in Julia as they are fast. It is true for those comming from Python, but comparing with specific methods (jl-sum and np-sum) we see in both languages remarkable difference.
OCaml
In OCaml following the reference ocaml-bigarray-vs-array I have tried more methods for different libraries of arrays.
Code
open Owl
let array_at m i j = Array.unsafe_get (Array.unsafe_get m i ) j
let sum_array_no_inline n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. array_at m i j
done;
done;
!a
let sum_array n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. (Array.unsafe_get (Array.unsafe_get m i ) j )
done;
done;
!a
let sum_arrayt n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. (Array.unsafe_get (Array.unsafe_get m j ) i )
done;
done;
!a
let sum_big_array n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. (Bigarray.Array2.unsafe_get m i j);
done;
done;
!a
let sum_big_arrayt n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. (Bigarray.Array2.unsafe_get m j i);
done;
done;
!a
let sum_owl_array n m = let a = ref 0. in
for i = 0 to (n-1) do
for j = 0 to (n-1) do
a := !a +. Mat.get m i j;
done;
done;
!a
let sum_owl_array_lib n m = Mat.sum' m
let time f x =
let start = Unix.gettimeofday () in
let res = f x in
let stop = Unix.gettimeofday () in
Printf.printf ": %f (time %fs)\n%!" res (stop -. start);
res
Results with ocamlc (bytecode)
> ==== Size A[n,n] ======= n=1234 ====
> [ArrayNoInline] : 761505.037418 (time 0.038757s)
> [Array ] : 761505.037418 (time 0.029140s)
> [Arrayt ] : 761505.037418 (time 0.029490s)
> [BigArray ] : 761505.037421 (time 0.035729s)
> [BigArrayt ] : 761505.037421 (time 0.036163s)
> [OwlArray ] : 761505.037418 (time 0.062719s)
> [OwlArrayLib ] : 761505.037418 (time 0.000493s)
>
> ==== Size A[n,n] ======= n=12345 ====
> [ArrayNoInline] : 76201442.335659 (time 3.410532s)
> [Array ] : 76201442.335659 (time 2.576479s)
> [Arrayt ] : 76201442.335612 (time 2.721879s)
> [BigArray ] : 76201442.335401 (time 3.146339s)
> [BigArrayt ] : 76201442.335364 (time 3.787680s)
> [OwlArray ] : 76201442.335659 (time 5.330613s)
> [OwlArrayLib ] : 76201442.335678 (time 0.031934s)
>
> ==== Size A[n,n] ======= n=23456 ====
> [ArrayNoInline] : 275097306.304532 (time 12.392087s)
> [Array ] : 275097306.304532 (time 9.196658s)
> [Arrayt ] : 275097306.304435 (time 10.553999s)
> [BigArray ] : 275097306.304219 (time 11.405834s)
> [BigArrayt ] : 275097306.304346 (time 10.811333s)
> [OwlArray ] : 275097306.304532 (time 16.714575s)
> [OwlArrayLib ] : 275097306.304379 (time 0.113167s)
Results with ocamlopt
> ==== Size A[n,n] ======= n=1234 ====
> [ArrayNoInline] : 761056.305873 (time 0.005656s)
> [Array ] : 761056.305873 (time 0.001035s)
> [Arrayt ] : 761056.305873 (time 0.001341s)
> [BigArray ] : 761056.305869 (time 0.016464s)
> [BigArrayt ] : 761056.305869 (time 0.015194s)
> [OwlArray ] : 761056.305873 (time 0.009826s)
> [OwlArrayLib ] : 761056.305873 (time 0.000259s)
>
> ==== Size A[n,n] ======= n=12345 ====
> [ArrayNoInline] : 76202418.923891 (time 0.407937s)
> [Array ] : 76202418.923891 (time 0.101955s)
> [Arrayt ] : 76202418.923849 (time 0.218512s)
> [BigArray ] : 76202418.923890 (time 1.473963s)
> [BigArrayt ] : 76202418.923885 (time 1.932328s)
> [OwlArray ] : 76202418.923891 (time 0.980752s)
> [OwlArrayLib ] : 76202418.923860 (time 0.032272s)
>
> ==== Size A[n,n] ======= n=23456 ====
> [ArrayNoInline] : 275097850.636376 (time 1.404471s)
> [Array ] : 275097850.636376 (time 0.351583s)
> [Arrayt ] : 275097850.636556 (time 1.019869s)
> [BigArray ] : 275097850.637126 (time 5.146998s)
> [BigArrayt ] : 275097850.637142 (time 7.019139s)
> [OwlArray ] : 275097850.636376 (time 3.568812s)
> [OwlArrayLib ] : 275097850.636496 (time 0.112816s)
Final comparison
When comparing OCaml/Owl with Python and Julia we observe that
- Owl specific methods are as fast as Julia or Numpy specific methods, probably all are using similar underlaying openblas libraries (kind of).
- OCaml with loops is always faster than Python loops.
- OCaml with loops is competitive with Julia loops, a bit faster than Julia with ocamlopt a bit slower (with ocamlc).
- OCaml with ocamlopt seems to be the fastest option
IMPORTANT: In the case of Julia it has been tested with and without @inbounds macro without detecting an clear impact in performance only very slight improvement for the largest matrix. In the case of OCaml, I also tested the same code than above where
Array.unsafe_get
has been replaced byArray.get
like in Julia it is very slight impact even for large matrix, both with ocamlc and ocamlopt. Regarding optimizations, I did not set up any (for any language, in OCaml no flambda has been tried).
Discussion
We could conclude that there is no matter to use any of these languages as the speed with the proper libraries is similar, however in real-world most of the programmers are mixing specific libraries methods with functions/procedures typical of that language but not from these "binding libraries". In Python many daily users (maybe not profesional developers but typical users) still are using loops, which has been one reason to support the transition to Julia as their loops will offer a improve in performance.
From this perspective use OCaml, were almost all the methods provide (compared to Python) good performance, seems to be a prefered option. Julia is also reasonable as it gives a large standard library for numerical methods. On the other side, at least in the case of Julia 1.9.0, still there is a small challenge named: time-to-first-plot, as the use of its Just-In-Time compiler can not offer the speed to Python for simple scripts. Also the JIT compiler can not offer much more in terms of speed than the quite fast compiler that OCaml is giving to users. But also in Julia/Python is not possible (to my knowledge) an easy and straight-forward method to compile to binary files.
Personally, I am happy about my choice of learning OCaml and Owl for scientific numerical computing.