constructs in the language with parentheses. It turns out that it was quite fun to re-engineer some of the key features of the
language.
, but, being known for suffering from NIH and RTW syndromes, I had to jump in.
R provides generic math on vectors. Pretty much everything in
R is a vector (stored in
column major order; more on this later) including arrays. The generic math helps you writing things like
> 42 * 1:5 # The ':' operator produces a vector.
[1] 42 84 126 168 210
Indexing
The generic math and the construction of some operators in CL is pretty much straightforward. Where things get hairy is in the handling of the several ways that a
R programmer may use to index, slice and dice vectors and arrays. As an example, you could write
> a = array(11:14, dim = c(2, 2), dimnames = list(list("a", "b"), list("x", "y")))
> a
x y
a 11 13
b 12 14
In brief, you have a matrix with "named" rows and columns. There are other idiosyncrasies but this is already something interesting to deal with, as you can now write things like:
> a[1, "y"]
[1] 3
> a["b", ]
x y
12 14
The second "indexing" selects, or
slices, the entire second row (counting from 1) and, as a result, returns a vector
with the column names retained.
Indexing can also become interesting because of other idiosyncrasies in
R. Consider the following:
> a[2]
[1] 12
> a[42]
[1] NA # 'NA' is the 'not available' indicator.
> a[c(2, 3, 4, 10)]
[1] 12 13 14 NA
Indexing by only one index accesses the underlying vector in column major order. If the index is beyond the current limit, the value
NA
('not available') is returned. If the sole index is a vector, then the indexing returns a vector of the values in the array accessed column major order.
Indexing and Assignments
Of course, once you handle slicing and dicing of arrays, you have to deal with
setting these slices and dice.
In R you can have a lot of what you can dream of (and something more, some of which, you wished was not there).
Basic assignment on vectors and arrays works as expected. Consider:
> a = array(11:19, dim = c(3, 3), dimnames = list(list("a", "b", "c"), list("x", "y", "z")))
> a
x y z
a 11 14 17
b 12 15 18
c 13 16 19
> a[2, "y"] = 42
> a
x y z
a 11 14 17
b 12 42 18
c 13 16 19
So far, nothing special is happening. Things actually make sense when trying to set a sub-slice of the array.
> b = array(42, dim = c(2, 2))
> b
[,1] [,2]
[1,] 42 42
[2,] 42 42
> a[2:3, 2:3] = b
> a
x y z
a 11 14 17
b 12 42 42
c 13 42 42
However, when setting a sub-slice of an array to a vector, or setting the elements of an array using a single vector index (i.e., when accessing the array in column major order), things may become a bit surprising.
> a[2:3, 2:3] = c(1, 1, 1, 1)
> a # This is not a surprise.
x y z
a 11 14 17
b 12 1 1
c 13 1 1
> a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6) # This is also expected (modulo the 'multiple'.)
Error in a[2:3, 2:3] = c(1, 2, 3, 4, 5, 6) :
number of items to replace is not a multiple of replacement length
> a[c(1, 2, 6, 7)] = c(111, 111, 111, 111) # Ok, understandable.
> a
x y z
a 111 14 111
b 111 1 1
c 13 111 1
> a[c(1, 2, 12, 7)] = c(1, 1, 1, 1)
> a # What?!?
[1] 1 1 13 14 1 111 1 1 1 NA NA 1
Ok, the last example shows how
R manages assigning past the limit of the
underlying vector.
Not really a problem, once you get the hang of it, but, source of a bit of confusion nevertheless, at least to yours truly.
In any case, all of this must be dealt with once you want to get into building some facilities like these on top of Common Lisp.
Let's "Reinvent the Wheel" (RTW) in Common Lisp
Getting the basics down was not so difficult. Common Lisp has a somewhat richer set of array and vector data types than
R, however, some design decisions resulted in some fun (albeit probably sub-optimal) setup.
Here is a list of decisions taken, which may be subject to debate.
- Objects are represented as structures.
- Constructors follow the R conventions (no
make-array
!)
- Vectors and arrays are mapped to CL vectors and arrays.
This means that
- Arrays are kept as multi-dimensional entities which are accessed in row major order
- Vector and array "objects" are separate and different.
- A generic reference function
AT
is introduced (and its counterpart, (SETF AT)
.)
Of course, these choices will make porting code from R to the CL library/tool quite cumbersome; let's say that this goal was dropped early on. The CL library (which, by the way, is named ρ, that is:
RHO
) will look familiar to a
R programmer, but it will not be the same. After all, is it Common Lisp, isn't it?
Here is a run down of the basic features. This was the hard part, in the opinion of the author. Getting more of
R implemented will be somewhat easier.
The AT
Generic Function
The AT generic function works as expected on the standard CL objects (as it should).
RHO 10 > (defvar qwerty "qwerty")
QWERTY
RHO 11 > (setf (at qwerty 3) #\R)
#\R
RHO 12 > qwerty
"qweRty"
RHO 13 > (at qwerty 4)
#\t
RHO 14 > (defvar a #2A((1 2 3) (4 5 6) (7 8 9)))
A
;;; ...
RHO 16 > (at a 1 1)
5
RHO 17 > (setf (at a 0 0) 42)
42
RHO 18 > a
#2A((42 2 3) (4 5 6) (7 8 9))
The generic function
AT
also works on structures and class instances.
RHO 22 > (defstruct foo a s d)
FOO
RHO 23 > (defvar foo (make-foo :s 42))
FOO
RHO 24 > (at foo 'foo-s) ; In this case the accessor function FOO-S is invoked.
42
RHO 25 > (setf (at foo 'foo-d) "Food")
"Food"
RHO 26 > foo
#S(FOO :A NIL :S 42 :D "Food")
So far, so good. Now let's have some fun.
Creating Arrays (and Vectors and Matrices)
Let's create some arrays, vectors and matrices.
RHO 28 > (setf rr (range 1 9)) ; This would be 'rr = 1:8' in R.
1 2 3 4 5 6 7 8
RHO 29 > (setf rv (vector 1 22/23 -3.2 42 5.8))
1 22/23 -3.2 42 5.8
RHO 30 > (setf rv2 (vector* (list 1 22/23 -3.2 42 5.8)
:names '(a b c d e)))
A B C D E
1 22/23 -3.2 42 5.8
The objects returned by
VECTOR
and
VECTOR*
(the full constructor) are wrappers around the underlying CL vectors. Their class is
RHO:VECTOR
. Note also that
*PRINT-READABLY*
is set to
NIL
, therefore, a "pretty printing" of the vectors is produced.
RHO 37 > (setf ra (array (range 1 9) :dim '(3 3)))
[, 0] [, 1] [, 2]
[0, ] 1 2 3
[1, ] 4 5 6
[2, ] 7 8 1
RHO 38 > (at ra 1 1)
5
The function
RANGE
excludes the upper limit (following CL conventions), while the
ARRAY
constructors "recycles" the values passed to it (following
R conventions). The value produces is pretty-printed, while its class is
RHO:ARRAY
. Indexing is 0-based, as in CL.
Creating matrices is straightforward and, again, a
R-like function is used.
RHO 3 > (setf rm (matrix '(1 1 2 0) :nrow 2 :ncol 2))
[, 0] [, 1]
[0, ] 1 1
[1, ] 2 0
RHO 4 > (at rm 0 1)
1
RHO 5 > (at rm 1 0)
2
Slicing and Dicing in RHO
Now that we have all the bases covered, we can proceed to some "slicing and dicing". All the
R indexing and assignment methods are provided in
RHO
, modulo the Common Lisp background choices, bugs and unimplemented features; not many of the last kind.
First of all, let's set some
dimension names...
RHO 7 > (setf (dimension-names ra) '(("a" "b") ("x" "y")))
(("a" "b") ("x" "y"))
RHO 8 > ra
[, x] [, y] [, 2]
[a, ] 1 2 3
[b, ] 4 5 6
[2, ] 7 8 1
RHO 9 > (at ra 1 "y")
5
Next we create a new array (a matrix, but in
R, matrices are separate subclasses of arrays)
rb
and use it as a value to set a sub-matrix of
ra
.
RHO 13 > (setf rb (array 42 :dim '(2 2)))
[, 0] [, 1]
[0, ] 42 42
[1, ] 42 42
RHO 29 > (setf (at ra (vector 1 2) (vector 1 2)) rb)
#2A((42 42) (42 42))
RHO 15 > ra
[, x] [, y] [, 2]
[a, ] 1 2 3
[b, ] 4 42 42
[2, ] 7 42 42
As you can see, everything works as expected.
Remember that we have built a matrix
rm
; let's use it to set some elements of the array
ra
, according to the R "access-by-matrix" semantics.
RHO 16 > rm
[, 0] [, 1]
[0, ] 1 1
[1, ] 2 0
RHO 24 > (type-of rm)
MATRIX
RHO 27 > (setf (at ra rm) 1024)
1024
RHO 28 > ra
[, x] [, y] [, 2]
[a, ] 1 2 3
[b, ] 4 1024 42
[2, ] 1024 42 42
RHO 29 > (setf (at ra #(1 2) #(1 2)) #(1 1 1 1))
#(1 1 1 1)
RHO 30 > ra
[, x] [, y] [, 2]
[a, ] 1 2 3
[b, ] 4 1 1
[2, ] 1024 1 1
Note however, that, although "correct", no attempt to signal an error is made for incongruous dimensions.
RHO 42 > (setf (at ra #(1 2) #(1 2)) #(11 12 13 14 15))
#(11 12 13 14 15)
RHO 43 > ra
[, x] [, y] [, 2]
[a, ] 1 2 3
[b, ] 4 11 12
[2, ] 1024 13 14
Getting the indexing of different combinations of scalars, vectors, and arrays was an interesting exercise. The implementation is probably bloated and not easily understandable; most probably because of the design choice of keeping the underlying CL vectors and arrays "as-they-are", especially multidimensional arrays.
Missing Things...
Many...
Most important one is "negative" indexing, which does not work completely yet.
Of course, I am limiting this comment to arrays and vectors: other
R data structures, such as
factors and
data.frames are in the making, but all the heavy lifting is now done.
Want to Play?
Well, for the time being, drop me a line (you know where to find me). We can talk about setting up a proper project and discuss some design choices.
Cheers