20110614

Printing XHTMΛ

I finally bit the bullet! I swear I tried! I pestered people! I failed. I.e., I failed to like any of the "major" HTML-producing libraries "out there" and so I rolled my own. In the process, I (re)learned a few things and I believe I made good use of parts of the language that are usually overlooked.

My problems with the other libraries

This section must start with an apology.

All the libraries I tried are very fine and sophisticated pieces of software that do solve problems. Alas, being a rotten Lisper myself, I found that I "needed something different" (read: "something I wrote"). Therefore, the comments you'll read below are not intended as general statements about such libraries, but only as testimony of my whims.

The libraries I looked at are CL-HTTP, CL-WHO and variations of TFEB's htout and Franz htmlgen, especially in the XHTML-GENERATOR version that comes with CXML.

As I said, my idiosyncrasies with the whole business of CL programming found problems with each of these otherwise fine libraries. More specifically, I found CL-HTTP too heavy to use just to generate HTML. One gripe I had with CL-WHO is that it did not handle pretty printing of HTML well (indentation is off in "recursive" use); more or less the same can be said of htout and htmlgen. CXML XHTML-GENERATOR is essentially a "round-trip" utility and it makes your life quite unhappy if you are trying to use simple HTML entities like - surprise - λ and Λ.

CL-WHO, htout, htmlgen and XHTML-GENERATOR all take an approach that can be summarized as: "I will compile a SExp representing HTML and will generate - in line - a set of specialized writing calls" (yes: mostly WRITE and WRITE-STRING). (Cfr. the examples in the CL-WHO documentation.)
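To make the "compile to writing calls" idea concrete, here is a toy sketch of it. It is not the code of CL-WHO or of any of the other libraries: the macro name emit-html and its keyword-tag syntax are made up for illustration, attributes are ignored altogether, and every list is assumed to be a tag form.

```lisp
;; Toy illustration only: EMIT-HTML is a hypothetical name, not part of
;; any of the libraries mentioned above.
(defmacro emit-html (stream form)
  "Expand FORM, a nested (tag . content) list, into WRITE-STRING calls."
  (let ((s (gensym "STREAM")))
    (labels ((walk (f)
               (if (atom f)
                   ;; Atoms (strings, or variables holding strings) are
                   ;; written as they are.
                   (list `(write-string ,f ,s))
                   ;; A list is taken to be (tag . content).
                   (let ((name (string-downcase (first f))))
                     (append
                      (list `(write-string ,(format nil "<~A>" name) ,s))
                      (loop for child in (rest f) append (walk child))
                      (list `(write-string ,(format nil "</~A>" name) ,s)))))))
      `(let ((,s ,stream))
         ,@(walk form)
         nil))))

(with-output-to-string (out)
  (emit-html out (:ul (:li "a") (:li "b"))))
;; => "<ul><li>a</li><li>b</li></ul>"
```

All the markup is turned into string constants at macroexpansion time, so what is left at run time is just a sequence of WRITE-STRING calls on the stream.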

There is nothing wrong with this approach, but it makes the resulting library and overall implementation more monolithic and it does not leverage some of the bells and whistles that you have available in CL. Thus I rolled my own (and I called it XHTMΛ).

Yet another HTML generation library

My approach to HTML (or XML) generation is the following:

  1. HTML (or XML!) elements need not be "lists" or "conses"; they can be bona-fide objects, i.e., structures.
  2. print-object, and, above all, the pretty printer are my friends.
  3. *print-pretty*, *print-readably* etc., are more than useful.

There are a few consequences of these choices and they should be exposed. Before doing that, let's see what happens in the basic case.

The basic definition in the implementation of XHTMΛ is the representation of an HTML (or XML!) "element". It is very simple and it does accommodate the HTML5 bits and pieces.


(defstruct (element (:constructor %element))
   (tag        nil :type symbol)
   (attributes ()  :type list)
   (content    ()  :type list))

tag is ... the tag, attributes is a p-list and content is a possibly empty list of other elements (or of plain strings, as in the examples further down).
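As a small sketch of how such structures fit together, the following builds a two-level tree by calling the raw %element constructor directly and then reads the slots back. The variable names and the :class attribute are made up for the example.

```lisp
;; The structure definition is repeated so that the example is self-contained.
(defstruct (element (:constructor %element))
  (tag        nil :type symbol)
  (attributes ()  :type list)
  (content    ()  :type list))

;; Hypothetical sample data: a list item holding a string...
(defparameter *item*
  (%element :tag 'li :content (list "Line 1")))

;; ...and a list element holding the item, with one attribute.
(defparameter *menu*
  (%element :tag 'ul
            :attributes (list :class "menu")
            :content (list *item*)))

;; The attributes slot is a p-list, so GETF looks up a single attribute.
(getf (element-attributes *menu*) :class)     ; => "menu"

;; The content slot is an ordinary list of elements and strings.
(element-tag (first (element-content *menu*))) ; => LI
```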

"Printing" an element

Let's forget a minute about the constructor and let's instead concentrate on an element "printing" process. The main entry point is a print-object method.


(defmethod print-object ((e element) (s stream))
  (let ((tag (element-tag e))
        (attributes (element-attributes e))
        (content (element-content e))
        )
    (cond (*print-pretty*
           (pprint-xhtml s e))

          (*print-readably*
           (format s "#S(~S :TAG ~S :ATTRIBUTES ~S :CONTENT ~S)"
                   (type-of e)
                   tag
                   attributes
                   content))

          (t
           ;; Format string showing-off!!!!
           (format s "<~A~{ ~A=\"~A\"~}~:[ />~;>~:*~{~S~^ ~}</~3:*~A>~]"
                   (string-downcase tag)
                   attributes
                   content
                   )
           ))
    ))

The method is rather straightforward, apart from the last format string, which does many things at once: (1) it writes the attributes, (2) it checks whether there is content and, if not, closes the tag, otherwise it backs up to print the content, and (3) it finally backs up again to the tag to print the proper closing tag. Note that, in order to print the element properly and nicely, the function pprint-xhtml is called whenever *print-pretty* is non-NIL.
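The format string is easier to follow when it is applied to plain arguments, outside of any method. The sketch below does just that; the names *element-control* and render-flat are hypothetical, and the attribute names are given as strings so that the output does not depend on *print-case*.

```lisp
;; The control string from the method above, unchanged.
(defparameter *element-control*
  "<~A~{ ~A=\"~A\"~}~:[ />~;>~:*~{~S~^ ~}</~3:*~A>~]")

(defun render-flat (tag attributes content)
  "Apply the control string to a tag name, a p-list and a content list."
  (format nil *element-control* tag attributes content))

;; No content: ~:[ selects its first clause and the tag is closed at once.
(render-flat "br" '() '())
;; => "<br />"

;; With content: ~:* backs up one argument so that ~{~S~^ ~} can print the
;; content list, then ~3:* backs up three arguments, back to the tag name.
(render-flat "p" '("class" "note") '("hi" 42))
;; => "<p class=\"note\">\"hi\" 42</p>"
```

Note that the content is printed with ~S, which is why strings come out quoted in this branch.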

Using the pretty printer

It may be just me, but I believe that the pretty printer is an under-used part of the CL standard. Therefore, I set out to use it heavily in order to get "properly indented" (meaning, the way I like it) (X)HTML. The function pprint-xhtml does that.


(defun pprint-xhtml (s xhtml-element)
  (declare (type stream s)
           (type element xhtml-element))
  (let ((tag (string-downcase (element-tag xhtml-element)))
        (attrs (element-attributes xhtml-element))
        (content (element-content xhtml-element))
        )
    (pprint-logical-block (s content)  ; (1)
      (pprint-logical-block (s content)  ; (2)
        (format s "<~A~@<~{~^ ~A=\"~S\"~^~_~}~:>" tag attrs)  ; (3)
      
        (when content
          (write-char #\> s)
          (pprint-newline :mandatory s)
          (format s "~{~4,0:T ~:W~_~}" content)
          ))

      (if content
          (format s "~0I</~A>" tag)
          (format s " />"))
      )))

The function requires a few explanations (of course, if you are a "pretty printer black-belt" this may be a bit boring). First of all, a display of what I want to obtain.

<body style="color: red">
    <p>
        Some text here
        <ul>
            <li>
                Line 1
            </li>
        </ul>
    </p>
</body>

This indentation may not be the best possible and there are some pitfalls, but it is better than what you get with the other libraries. But how does the function pprint-xhtml achieve this result while interacting with the pretty printing machinery?

The function pprint-xhtml uses three logical blocks: two for the element and a third for the attributes. The logical block for the attributes is introduced in the format string using the ~@< ... ~:> directive. Note also the conditional newline ~_ in the list iteration construct ~{ ... ~}. The other two pprint-logical-block forms establish the fence for the whole element and for the "inside" of the same. The outer pprint-logical-block serves essentially to print the closing tag (if needed) correctly indented. The "inner" pprint-logical-block just serves to provide the correct indentation for the tag and the actual element content. The pprint-newline and the indentation directive in the format string do the rest.

Once you wrap your head around it (it did take me some time!) it is very straightforward, and very powerful.
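The interplay of the two nested blocks can be tried in a stripped-down sketch that knows nothing about elements. It is a simplification and not the pprint-xhtml code: the function name print-nested is hypothetical, attributes are left out, the children are plain strings, and pprint-indent is used in place of the ~I and ~T directives.

```lisp
(defun print-nested (stream name children)
  "Print NAME as a tag around CHILDREN (strings), one child per line."
  ;; PPRINT-NEWLINE and PPRINT-INDENT do nothing unless *PRINT-PRETTY* is true.
  (let ((*print-pretty* t))
    (pprint-logical-block (stream nil)    ; outer block: the whole element
      (pprint-logical-block (stream nil)  ; inner block: opening tag and content
        (format stream "<~A>" name)
        ;; From the next line break on, lines of this block are indented
        ;; 4 columns relative to the start of the block.
        (pprint-indent :block 4 stream)
        (dolist (child children)
          (pprint-newline :mandatory stream)
          (write-string child stream)))
      ;; Back in the outer block the indentation is that of the block
      ;; start, so the closing tag lines up with the opening one.
      (pprint-newline :mandatory stream)
      (format stream "</~A>" name))))

(with-output-to-string (out)
  (print-nested out "ul" '("Line 0" "Line 1")))
```

The call should produce the opening tag on its own line, each child indented by four columns, and the closing tag back at the column of the opening one.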

Bells and Whistles

The pretty printing machinery offers you more control over what you can do with it. For the time being my code just uses one simple hook into the pretty printer dispatch table in order to write strings "unquoted", but, potentially, this is the machine to provide fancier element layout.

The actual "printing" of an element is controlled by a specialized macro (provisionally) called with-html-syntax, which calls write with an appropriately set up :pprint-dispatch argument.

The variable *xhtml-pd* holds the modified pretty print dispatch table, which is initialized as follows (at a minimum):


(set-pprint-dispatch 'element
                     'pprint-xhtml
                     0
                     *xhtml-pd*)

(set-pprint-dispatch 'string
                     (lambda (s xhtml-string)
                        (write-string xhtml-string s))
                     0
                     *xhtml-pd*)

This is the result:


XHTMLAMBDA 29 > (with-html-syntax (*standard-output* :print-pretty t)
                  (body (:style "color: red")
                        (p ()
                           "Some text here"
                           (ul ()
                               (li () "Line 1")))))
<body style="color: red">
    <p>
        Some text here
        <ul>
            <li>
                Line 1
            </li>
        </ul>
    </p>
</body>
<body style="color: red"><p>"Some text here" <ul><li>"Line 1"</li></ul></p></body>  ; This is the value returned!

XHTMLAMBDA 30 >
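The string hook in the dispatch table can be tried in isolation. The following is a minimal sketch, assuming that the table is created as a copy of the current one with copy-pprint-dispatch (how *xhtml-pd* is actually created is not shown here); the names *unquoted-pd* and write-with-table are hypothetical.

```lisp
;; Assumption: start from a copy of the current dispatch table, so that
;; the global table is left untouched.
(defparameter *unquoted-pd* (copy-pprint-dispatch))

;; Strings are written without the surrounding double quotes.
(set-pprint-dispatch 'string
                     (lambda (s text) (write-string text s))
                     0
                     *unquoted-pd*)

(defun write-with-table (object)
  "Return the pretty-printed representation of OBJECT using *UNQUOTED-PD*."
  (with-output-to-string (out)
    (write object
           :stream out
           :pretty t                       ; the table is consulted only when pretty printing
           :pprint-dispatch *unquoted-pd*)))

(write-with-table "Line 1")   ; => the six characters Line 1, without quotes
(prin1-to-string "Line 1")    ; => the same string, but wrapped in double quotes
```

Since the table is passed to write explicitly, ordinary printing elsewhere in the image keeps quoting strings as usual.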

XHTMΛ Syntax

As you have noted in the previous example, the syntax of an XHTMΛ element is


   (tag attributes . content)

where each tag is implemented as a macro, which is essentially in charge of delaying the evaluation of the content plus some other massaging, mostly flattening of the content lists. This is achieved by having each macro call a first parsing step, which generates an "intermediate" form that eventually calls the element function (see below). The following example shows a pretty standard trick:


XHTMLAMBDA 33 > (with-html-syntax (*standard-output* :print-pretty t)
                  (body (:style "color: red")
                        (p ()
                           "Some text here"
                           (ul () (loop for i below 5
                                        collect (li () (format nil "Line ~D" i)))))))
<body style="color: red">
    <p>
        Some text here
        <ul>
             <li>
                 Line 0
             </li>
             <li>
                 Line 1
             </li>
             <li>
                 Line 2
             </li>
             <li>
                 Line 3
             </li>
             <li>
                 Line 4
             </li>
        </ul>
    </p>
</body>
<body style="color: red"><p>"Some text here" <ul><li>"Line 0"</li> <li>"Line 1"</li> <li>"Line 2"</li> <li>"Line 3"</li> <li>"Line 4"</li></ul></p></body>

XHTMLAMBDA 34 >

Thus XHTMΛ is unlike most other libraries which just discriminate on the first element of a SExp, usually a keyword. XHTMΛ wants more structure and it strives to be more easily extensible through "standard" and low-level machinery, cfr., the pretty printing machinery and CLOS. As an aside, the %element constructor is there just to be called by a "factory" generic function called - you guessed it - element.
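A rough sketch of the tag-as-macro idea follows. It is not the XHTMΛ implementation: define-tag, build-element and flatten-content are hypothetical names, build-element is an ordinary function standing in for the generic function element, and there is no separate parsing step; the content forms are simply evaluated as arguments and flattened one level.

```lisp
(defstruct (element (:constructor %element))
  (tag        nil :type symbol)
  (attributes ()  :type list)
  (content    ()  :type list))

(defun flatten-content (items)
  "Splice every list found in ITEMS into the result (one level only)."
  (loop for item in items
        if (listp item) append item
        else collect item))

;; Hypothetical stand-in for the ELEMENT factory mentioned above.
(defun build-element (tag attributes &rest content)
  (%element :tag tag
            :attributes attributes
            :content (flatten-content content)))

;; Each tag becomes a macro with the (tag attributes . content) syntax.
(defmacro define-tag (name)
  `(defmacro ,name (attributes &body content)
     `(build-element ',',name (list ,@attributes) ,@content)))

(define-tag ul)
(define-tag li)

;; The LOOP returns a list of LI elements, which is spliced into the UL.
(defparameter *lines*
  (ul (:class "lines")
      (loop for i below 3
            collect (li () (format nil "Line ~D" i)))))

(length (element-content *lines*))   ; => 3
```

The flattening step is what makes the loop trick work: without it the ul element would contain a single list instead of three li elements.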

Other Syntaxes and the HTMLIZE Macro

Yet there is value in the widely used alternative SExp syntax for HTML (and XML):


  (tag . content)

or


  ((tag . attributes) . content)

In order to accommodate such syntax (and also a "keyword-based" one), XHTMΛ provides a htmlize macro which does some more rewriting from the syntax just above (termed :compact) to the "operator-and-attributes" syntax (termed :standard).


XHTMLAMBDA 39 > (htmlize
                 ((body :style "color: red")
                  (p "Some text here"
                     (ul (loop for i below 5
                                  collect (li () (format nil "Line ~D" i))))))
                 )
<body style="color: red"><p>"Some text here" <ul><li>"Line 0"</li> <li>"Line 1"</li> <li>"Line 2"</li> <li>"Line 3"</li> <li>"Line 4"</li></ul></p></body>
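The rewriting from the :compact to the :standard syntax can be pictured as a simple tree walk. The sketch below is only a guess at the general shape and not the htmlize code: the function compact->standard and the variable *known-tags* are hypothetical, and it assumes that the set of tag names is known in advance, so that forms such as the loop above are left alone.

```lisp
;; Assumption: the tags are known in advance (a tiny set for the example).
(defparameter *known-tags* '(body p ul li))

(defun tag-p (x)
  (and (symbolp x) (member x *known-tags*) t))

(defun compact->standard (form)
  "Rewrite (tag . content) and ((tag . attributes) . content) forms
into (tag attributes . content); anything else is returned untouched."
  (cond ((atom form) form)
        ;; ((tag . attributes) . content)
        ((and (consp (first form)) (tag-p (first (first form))))
         (destructuring-bind ((tag . attributes) . content) form
           (list* tag attributes (mapcar #'compact->standard content))))
        ;; (tag . content): no attributes, so insert an empty list.
        ((tag-p (first form))
         (list* (first form) '() (mapcar #'compact->standard (rest form))))
        ;; Not a tag form (e.g. a LOOP): leave it as it is.
        (t form)))

(compact->standard
 '((body :style "color: red")
   (p "Some text here"
      (ul (li "Line 1")))))
;; => (BODY (:STYLE "color: red") (P NIL "Some text here" (UL NIL (LI NIL "Line 1"))))
```

A limitation of this sketch is that it expects purely compact input: a form already written as (li () ...) at a position it walks would get a second, empty attribute list.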

Availability

The XHTMΛ library will be available "very soon"™ in common-lisp.net. Stay tuned!

9 comments:

  2. Great article! Its a very useful and informative blog. I'm sending it to all my friends on Facebook.But it is very difficulty..

  3. Hi Marco,

    Did you try Yaclml? Maciek Pasternacki has a great introduction to it at http://www.3ofcoins.net/2009/02/07/yaclml-in-pictures-part-i-html-generation/, and it seems to be similar in spirit to what you did.

  4. Nope. I did not try that. It went under my radar. I will give it a look.

    MA

  5. Very nice, and I'm glad to see good use of the pretty printer; this is a good application for it. A few pedantic notes on using this for XML generation, though. The

    (string-downcase …)

    of the element name could cause some problems in XML formatting (but is fine for (X)HTML, as far as I know). Similarly, the

    (pprint-newline :mandatory s)

    after the #\> of the opening tag of an element could be inserting significant whitespace. (This isn't just for XML; in (X)HTML, whitespaces sneaking in can cause some issues in layout and formatting.) Some special vars for customizing some of that behavior could be handy.

  6. Thanks YeashuaAaron. I know my XML and XHTML fu is not very strong. I'll keep this in mind. For the time being I am targeting (X)HTML, so I think I can sweep some issues under the carpet. If you have any suggestions on the knobs you think interesting do let me know.

    MA

  7. try first class dom objects.

    http://labs.core.gen.tr/#page:domprogramming
