In my real work, I (or better, my students and colleagues) use R to program our data analyses (if you are interested, you can check out the TRONCO package.)

Of course, being a Common Lisp quite-high priest at this point and a

*recreational*programmer in it, I wondered how to render several R constructs in the language with parentheses. It turns out that it was quite fun to re-engineer some of the key features of the R language.

This is not the first such project trying to mimic R. I know at least two of them; the most public one being Common Lisp Stat, but, being known for suffering from NIH and RTW syndromes, I had to jump in.

Here is a short summary of what I did and what was achieved.

## Generic Math, Arrays and Vectors

R provides generic math on vectors. Pretty much everything in R is a vector (stored in

Indexing can also become interesting because of other idiosyncrasies in R. Consider the following:

*column major order*; more on this later) including arrays. The generic math helps you writing things like>42 * 1:5# The ':' operator produces a vector. [1] 42 84 126 168 210

### Indexing

The generic math and the construction of some operators in CL is pretty much straightforward. Where things get hairy is in the handling of the several ways that a R programmer may use to index, slice and dice vectors and arrays. As an example, you could write>In brief, you have a matrix with "named" rows and columns. There are other idiosyncrasies but this is already something interesting to deal with, as you can now write things like:a = array(11:14, dim = c(2, 2), dimnames = list(list("a", "b"), list("x", "y")))>ax y a 11 13 b 12 14

>The second "indexing" selects, ora[1, "y"][1] 3 >a["b", ]x y 12 14

*slices*, the entire second row (counting from 1) and, as a result, returns a vector*with the column names*retained.Indexing can also become interesting because of other idiosyncrasies in R. Consider the following:

>Indexing by only one index accesses the underlying vector in column major order. If the index is beyond the current limit, the valuea[2][1] 12 >a[42][1] NA # 'NA' is the 'not available' indicator. >a[c(2, 3, 4, 10)][1] 12 13 14 NA

`NA`

('not available') is returned. If the sole index is a vector, then the indexing returns a vector of the values in the array accessed column major order.### Indexing and Assignments

Of course, once you handle slicing and dicing of arrays, you have to deal with

In R you can have a lot of what you can dream of (and something more, some of which, you wished was not there).

Basic assignment on vectors and arrays works as expected. Consider:

So far, nothing special is happening. Things actually make sense when trying to set a sub-slice of the array.

However, when setting a sub-slice of an array to a vector, or setting the elements of an array using a single vector index (i.e., when accessing the array in column major order), things may become a bit surprising.

Not really a problem, once you get the hang of it, but, source of a bit of confusion nevertheless, at least to yours truly.

In any case, all of this must be dealt with once you want to get into building some facilities like these on top of Common Lisp.

Of course, these choices will make porting code from R to the CL library/tool quite cumbersome; let's say that this goal was dropped early on. The CL library (which, by the way, is named ρ, that is:

Here is a run down of the basic features. This was the hard part, in the opinion of the author. Getting more of R implemented will be somewhat easier.

###
The

The AT generic function works as expected on the standard CL objects (as it should).

*setting*these slices and dice.In R you can have a lot of what you can dream of (and something more, some of which, you wished was not there).

Basic assignment on vectors and arrays works as expected. Consider:

>a = array(11:19, dim = c(3, 3), dimnames = list(list("a", "b", "c"), list("x", "y", "z")))>ax y z a 11 14 17 b 12 15 18 c 13 16 19 >a[2, "y"] = 42>ax y z a 11 14 17 b 12 42 18 c 13 16 19

So far, nothing special is happening. Things actually make sense when trying to set a sub-slice of the array.

>b = array(42, dim = c(2, 2))>b[,1] [,2] [1,] 42 42 [2,] 42 42 >a[2:3, 2:3] = b>ax y z a 11 14 17 b 12 42 42 c 13 42 42

However, when setting a sub-slice of an array to a vector, or setting the elements of an array using a single vector index (i.e., when accessing the array in column major order), things may become a bit surprising.

>Ok, the last example shows how R manages assigning past the limit of thea[2:3, 2:3] = c(1, 1, 1, 1)>a# This is not a surprise. x y z a 11 14 17 b 12 1 1 c 13 1 1 >a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6)# This is also expected (modulo the 'multiple'.)Error in a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6) : number of items to replace is not a multiple of replacement length>a[c(1, 2, 6, 7)] = c(111, 111, 111, 111)# Ok, understandable. > a x y z a 111 14 111 b 111 1 1 c 13 111 1 >a[c(1, 2, 12, 7)] = c(1, 1, 1, 1)>a# What?!? [1] 1 1 13 14 1 111 1 1 1 NA NA 1

*underlying*vector.Not really a problem, once you get the hang of it, but, source of a bit of confusion nevertheless, at least to yours truly.

In any case, all of this must be dealt with once you want to get into building some facilities like these on top of Common Lisp.

## Let's "Reinvent the Wheel" (RTW) in Common Lisp

Getting the basics down was not so difficult. Common Lisp has a somewhat richer set of array and vector data types than R, however, some design decisions resulted in some fun (albeit probably sub-optimal) setup.

Here is a list of decisions taken, which may be subject to debate.

- Objects are represented as structures.
- Constructors follow the R conventions (no
`make-array`

!) - Vectors and arrays are mapped to CL vectors and arrays.

This means that

- Arrays are kept as multi-dimensional entities which are accessed in
*row major order* - Vector and array "objects" are separate and different.

- Arrays are kept as multi-dimensional entities which are accessed in
- A generic reference function
`AT`

is introduced (and its counterpart,`(SETF AT)`

.)

Of course, these choices will make porting code from R to the CL library/tool quite cumbersome; let's say that this goal was dropped early on. The CL library (which, by the way, is named ρ, that is:

`RHO`

) will look familiar to a R programmer, but it will not be the same. After all, is it Common Lisp, isn't it?Here is a run down of the basic features. This was the hard part, in the opinion of the author. Getting more of R implemented will be somewhat easier.

###
The `AT`

Generic Function

RHO 10 >The generic function(defvar qwerty "qwerty")QWERTY RHO 11 >(setf (at qwerty 3) #\R)#\R RHO 12 >qwerty"qweRty" RHO 13 >(at qwerty 4)#\t RHO 14 >(defvar a #2A((1 2 3) (4 5 6) (7 8 9)))A ;;; ... RHO 16 >(at a 1 1)5 RHO 17 >(setf (at a 0 0) 42)42 RHO 18 >a#2A((42 2 3) (4 5 6) (7 8 9))

`AT`

also works on structures and class instances.RHO 22 >So far, so good. Now let's have some fun.(defstruct foo a s d)FOO RHO 23 >(defvar foo (make-foo :s 42))FOO RHO 24 >(at foo 'foo-s); In this case the accessor function FOO-S is invoked. 42 RHO 25 >(setf (at foo 'foo-d) "Food")"Food" RHO 26 >foo#S(FOO :A NIL :S 42 :D "Food")

### Creating Arrays (and Vectors and Matrices)

Let's create some arrays, vectors and matrices.

Creating matrices is straightforward and, again, a R-like function is used.

###
Slicing and Dicing in

Most important one is "negative" indexing, which does not work completely yet.

Of course, I am limiting this comment to arrays and vectors: other R data structures, such as

RHO 28 >The objects returned by(setf rr (range 1 9)); This would be 'rr = 1:8' in R. 1 2 3 4 5 6 7 8 RHO 29 >(setf rv (vector 1 22/23 -3.2 42 5.8))1 22/23 -3.2 42 5.8 RHO 30 >(setf rv2 (vector* (list 1 22/23 -3.2 42 5.8) :names '(a b c d e)))A B C D E 1 22/23 -3.2 42 5.8

`VECTOR`

and `VECTOR*`

(the full constructor) are wrappers around the underlying CL vectors. Their class is `RHO:VECTOR`

. Note also that `*PRINT-READABLY*`

is set to `NIL`

, therefore, a "pretty printing" of the vectors is produced.RHO 37 >The function(setf ra (array (range 1 9) :dim '(3 3)))[, 0] [, 1] [, 2] [0, ] 1 2 3 [1, ] 4 5 6 [2, ] 7 8 1 RHO 38 >(at ra 1 1)5

`RANGE`

excludes the upper limit (following CL conventions), while the `ARRAY`

constructors "recycles" the values passed to it (following R conventions). The value produces is pretty-printed, while its class is `RHO:ARRAY`

. Indexing is 0-based, as in CL.Creating matrices is straightforward and, again, a R-like function is used.

RHO 3 >(setf rm (matrix '(1 1 2 0) :nrow 2 :ncol 2))[, 0] [, 1] [0, ] 1 1 [1, ] 2 0 RHO 4 >(at rm 0 1)1 RHO 5 >(at rm 1 0)2

###
Slicing and Dicing in `RHO`

Now that we have all the bases covered, we can proceed to some "slicing and dicing". All the R indexing and assignment methods are provided in

First of all, let's set some

Next we create a new array (a matrix, but in R, matrices are separate subclasses of arrays)

Remember that we have built a matrix

Note however, that, although "correct", no attempt to signal an error is made for incongruous dimensions.

Getting the indexing of different combinations of scalars, vectors, and arrays was an interesting exercise. The implementation is probably bloated and not easily understandable; most probably because of the design choice of keeping the underlying CL vectors and arrays "as-they-are", especially multidimensional arrays.

`RHO`

, modulo the Common Lisp background choices, bugs and unimplemented features; not many of the last kind.First of all, let's set some

*dimension names*...RHO 7 >(setf (dimension-names ra) '(("a" "b") ("x" "y")))(("a" "b") ("x" "y")) RHO 8 >ra[, x] [, y] [, 2] [a, ] 1 2 3 [b, ] 4 5 6 [2, ] 7 8 1 RHO 9 >(at ra 1 "y")5

Next we create a new array (a matrix, but in R, matrices are separate subclasses of arrays)

**and use it as a value to set a sub-matrix of**`rb`

**.**`ra`

RHO 13 >As you can see, everything works as expected.(setf rb (array 42 :dim '(2 2)))[, 0] [, 1] [0, ] 42 42 [1, ] 42 42 RHO 29 >(setf (at ra (vector 1 2) (vector 1 2)) rb)#2A((42 42) (42 42)) RHO 15 >ra[, x] [, y] [, 2] [a, ] 1 2 3 [b, ] 4 42 42 [2, ] 7 42 42

Remember that we have built a matrix

**; let's use it to set some elements of the array**`rm`

**, according to the R "access-by-matrix" semantics.**`ra`

RHO 16 >rm[, 0] [, 1] [0, ] 1 1 [1, ] 2 0 RHO 24 >(type-of rm)MATRIX RHO 27 >(setf (at ra rm) 1024)1024 RHO 28 >ra[, x] [, y] [, 2] [a, ] 1 2 3 [b, ] 4 1024 42 [2, ] 1024 42 42

RHO 29 >(setf (at ra #(1 2) #(1 2)) #(1 1 1 1))#(1 1 1 1)

RHO 30 >ra[, x] [, y] [, 2] [a, ] 1 2 3 [b, ] 4 1 1 [2, ] 1024 1 1

Note however, that, although "correct", no attempt to signal an error is made for incongruous dimensions.

RHO 42 >(setf (at ra #(1 2) #(1 2)) #(11 12 13 14 15))#(11 12 13 14 15) RHO 43 >ra[, x] [, y] [, 2] [a, ] 1 2 3 [b, ] 4 11 12 [2, ] 1024 13 14

Getting the indexing of different combinations of scalars, vectors, and arrays was an interesting exercise. The implementation is probably bloated and not easily understandable; most probably because of the design choice of keeping the underlying CL vectors and arrays "as-they-are", especially multidimensional arrays.

### Missing Things...

Many...Most important one is "negative" indexing, which does not work completely yet.

Of course, I am limiting this comment to arrays and vectors: other R data structures, such as

*factor*s and*data.frame*s are in the making, but all the heavy lifting is now done.## Want to Play?

Well, for the time being, drop me a line (you know where to find me). We can talk about setting up a proper project and discuss some design choices.

Cheers

Cheers

## No comments:

## Post a Comment