20240303

A Rant about R and Python (and anything with a-lists, tuples and "dictionaries").

R and Python come from Lisp (R for sure, Python will deny it). Early Lisp. Even before the first edition of "AI Programming" by Charniak, Riesbeck and McDermott.

At that time, there were a-lists and p-lists. Then Charniak, Riesbeck and McDermott taught us (not only them of course) to create records in Lisp. You know... those things like:

    DECLARE
      1 STUDENT,
        2 NAME     CHAR (30),
        2 SURNAME  CHAR (50),
        2 ID FIXED DECIMAL (8),
        2 ADDRESS,
          3 STREET  CHAR (80),
          3 NUMBER  FIXED DECIMAL (5),
          3 CITY    CHAR (80),
          3 PRST    CHAR (20),
          3 ZIP     CHAR (10),
          3 COUNTRY CHAR (80);

Using a-lists the above may become:

    (defvar student '((name . "John")
                      (surname . "Blutarsky")
                      (id . 42)
                      (address . ((street . "United States Senate")
                                  (number . 0)
                                  (city . "Washington")
                                  (prst . "DC")
                                  (zip . "20510")
                                  (country . "U.S.A.")))
                      ))

In Python you can do the following.

    student = {}    # a 'Dict'; i.e., a hash table.
    student['name'] = "John"
    student['surname'] = "Blutarsky"
    student['id'] = 42
    student['address'] = {}
    student['address']['street'] = "United States Senate"
    student['address']['number'] = 0
    student['address']['city'] = "Washington"
    student['address']['prst'] = "DC"
    student['address']['zip'] = "20510"
    student['address']['country'] = "U.S.A."

Not that you must, but surely you can; and this, as in the case of R below, is the root of my rant.

In R you use lists; a misnomer for something that is essentially a dictionary like in Python, patterned, ça va sans dire, after a-lists.

    student = list()
    student$name = "John"
    student$surname = "Blutarsky"
    student$id = 42
    student$address = list()
    student$address$street = "United States Senate"
    student$address$number = 0
    student$address$city = "Washington"
    student$address$prst = "DC"
    student$address$zip = "20510"
    student$address$country = "U.S.A."

This of course gives you a lot of flexibility; e.g., if - in the middle of your code - you need to deal with the student's nickname, you just write the following.

In (Common) Lisp:

    (defun tracking-student ()
        ...
        ...
        (setf student (acons 'nickname "Bluto" student))
        ...
        )

In Python:

    def tracking_student():
        ...
        ...
        student['nickname'] = "Bluto"
        ...

In R:

    tracking_student <- function() {
        ...
        ...
        student$nickname = "Bluto"
        student <<- student     # Yes, R scoping rules are ... interesting.
        ...
    }

The example is relatively simple and innocuous, but it has some consequences when you have to actually read some code.

The problem I have (it may be just me) is that this programming style does not give me an overview of what a data structure (a record) actually is. The flexibility that this style allows for is usually not accompanied by the necessary documentation effort telling the reader what is going into an "object". The reader is therefore left wondering, while taking copious notes about what is what and where.

Bottom line: don't do that. Use defstruct and defclass in CL, classes in Python, struct in Julia, etc. etc. etc. In R, please document your stuff.
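
As a minimal sketch of the Python option, here is the same student record declared up front. It assumes the standard-library dataclasses module is acceptable; the class names Student and Address, and the decision to make nickname an optional field, are illustrative choices and not the only way to do it.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Address:
    street: str
    number: int
    city: str
    prst: str       # presumably province/state, as in the PL/I record
    zip: str
    country: str


@dataclass
class Student:
    name: str
    surname: str
    id: int
    address: Address
    # Declared here, where a reader will look for it, instead of being
    # bolted on somewhere in the middle of the code.
    nickname: Optional[str] = None


student = Student(
    name="John",
    surname="Blutarsky",
    id=42,
    address=Address(
        street="United States Senate",
        number=0,
        city="Washington",
        prst="DC",
        zip="20510",
        country="U.S.A.",
    ),
)


def tracking_student(s: Student) -> None:
    # The field already exists in the declaration; we only fill it in.
    s.nickname = "Bluto"


tracking_student(student)

# The whole shape of the record is available in one place.
student_field_names = [f.name for f in fields(Student)]
```

The point is not the dataclass machinery itself: the class body doubles as the overview of what goes into the "object", so the reader no longer has to reconstruct it from scattered assignments.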

You may feel that your code is less malleable, but in the end it becomes easier to read. At least for me. Sorry.


(cheers)
