Wednesday, September 12, 2007


'Programming Collective Intelligence' in Common Lisp, Chapter 2

Like many others, I've been reading Toby Segaran's Programming Collective Intelligence. Toby's examples are in Python. Inspired by loucal's posting of code examples in Ruby, I've decided to put up my own Common Lisp examples. These are from Chapter 2, going up to page 15, "Ranking the Critics".

In order to just have the recommendations in the file, I used assoc lists instead of hashes. One place where Python (and Ruby) has it over Lisp is in hash syntax, just critics[person] instead of (gethash person critics) or (cdr (assoc person critics :test #'equalp)). I made a 'critics' function to keep down the verbosity. Is there a good way to change the syntax for hash lookup?

The other big difference is my use of mapcar and reduce everywhere instead of Python's list comprehensions, with the occaisional intersection thrown in.

If the Python source code gets posted, I may try some benchmarks with later examples.

(defparameter *RECOMMENDATIONS*
("Lisa Rose" . (("Lady in the Water" . 2.5) ("Snakes on a Plane" . 3.5) ("Just My Luck" . 3.0)
("Superman Returns" . 3.5) ("You, Me and Dupree" . 2.5) ("The Night Listener" . 3.0)))
("Gene Seymour" . (("Lady in the Water" . 3.0) ("Snakes on a Plane" . 3.5) ("Just My Luck" . 1.5)
("Superman Returns" . 5.0) ("The Night Listener" . 3.0) ("You, Me and Dupree" . 3.5)))
("Michael Phillips" . (("Lady in the Water" . 2.5) ("Snakes on a Plane" . 3.0)
("Superman Returns" . 3.5) ("The Night Listener" . 4.0)))
("Claudia Puig" . (("Snakes on a Plane" . 3.5) ("Just My Luck" . 3.0) ("The Night Listener" . 4.5)
("Superman Returns" . 4.0) ("You, Me and Dupree" . 2.5)))
("Mick LaSalle" . (("Lady in the Water" . 3.0) ("Snakes on a Plane" . 4.0) ("Just My Luck" . 2.0)
("Superman Returns" . 3.0) ("The Night Listener" . 3.0) ("You, Me and Dupree" . 2.0)))
("Jack Matthews" . (("Lady in the Water" . 3.0) ("Snakes on a Plane" . 4.0) ("The Night Listener" . 3.0)
("Superman Returns" . 5.0) ("You, Me and Dupree" . 3.5)))
("Toby" . (("Snakes on a Plane" . 4.5) ("You, Me and Dupree" . 1.0)
("Superman Returns" . 4.0)))))

(defun critics (reviewer &optional movie)
(labels ((get-movie (ms m)
(cdr (assoc m ms :test #'equalp))))
(let ((movies (cdr (assoc reviewer *RECOMMENDATIONS* :test #'equalp))))
(if movie (get-movie movies movie) movies))))

(defun similar (person1 person2 distance)
(let* ((movies1 (critics person1))
(movies2 (critics person2))
(common-movies (mapcar #'car (intersection movies1 movies2
:test #'(lambda (x y) (equalp (car x) (car y)))))))
(if (null common-movies)
(funcall distance person1 person2 common-movies))))

(defun euclidean-distance (person1 person2 common-movies)
(let* ((sum-of-squares (reduce #'+ (mapcar
#'(lambda (cm)
(expt (- (critics person1 cm) (critics person2 cm)) 2))
(distance (/ 1 (1+ sum-of-squares))))

(defun sim-distance (person1 person2)
(similar person1 person2 #'euclidean-distance))

(defun pearson-distance (person1 person2 common-movies)
(let* ((n (length common-movies))
(scores1 (mapcar #'(lambda (x) (critics person1 x)) common-movies))
(scores2 (mapcar #'(lambda (x) (critics person2 x)) common-movies))
(sum1 (reduce #'+ scores1))
(sum2 (reduce #'+ scores2))
(sum1-sq (reduce #'+ (mapcar #'(lambda (x) (* x x)) scores1)))
(sum2-sq (reduce #'+ (mapcar #'(lambda (x) (* x x)) scores2)))
(psum (reduce #'+ (mapcar #'* scores1 scores2)))
(num (- psum (/ (* sum1 sum2) n)))
(den (sqrt (* (- sum1-sq (/ (expt sum1 2) n)) (- sum2-sq (/ (expt sum2 2) n))))))
(if (zerop den) 0 (/ num den))))

(defun sim-pearson (person1 person2)
(similar person1 person2 #'pearson-distance))

(defun top-matches (person &optional (n 5) (similarity #'sim-pearson))
(let* ((scores (mapcar #'(lambda (x) (cons (funcall similarity person x) x))
(remove-if #'(lambda (x) (equalp x person)) (mapcar #'car *RECOMMENDATIONS*))))
(sorted-scores (sort scores #'> :key #'car))
(len (length sorted-scores)))
(if (<= len n)
(butlast sorted-scores (- len n)))))

If you don't like the syntax for hash table look-ups, use a macro to "fix" it. In this case, a simple read macro might be the way to go.

(See chapter 17 of On Lisp. You can program the reader to convert something reasonably close to the python syntax into the item in the hash table at read-time. This is subtly different from the usual compile-time macro expansion.)
You might be interested in this reader macro. It defines an alternative syntax for hash table creation and retrieval.
i think using macros and similar is cheating.

i can use aliases in ruby just as easily (and do it) but at the same time, if other users "experience" this, its not official but user-defined stuff... it feels weird to stray away from the core language (i do this in ruby too *grin* but to me the ruby syntax looks a lot cleaner)

And yeah I say this knowing that Lisp people _love_ lisp ;)
Anonymous - Lisp macros are not analogous to Ruby aliases in any way. They are one of THE reasons to use Lisp: the fuzzing of the code/data distinction, which is stark in most languages (Ruby included). Where Ruby shines is in fuzzing the code/runtime distinction, but that's an entirely different beast all together.

Macros allow you to alter the programming language to better fit your task. You should use them as often as they improve code clarity without introducing unnecessary complexity. That, of course, is up for a certain amount of debate.
Macros are the standard solution.

(defmacro critic (critic)
(gethash critic critics))

(defsetf critic (critic value)
(setf (gethash critic critics) value))

Then after (setf (critic foo) bar),
(critic foo) returns bar.
> (defmacro critic (critic)
> (gethash critic critics))

Except critic can be a function.

(defun critic (critic)
(gethash critic critics))
Very cool! Always wanted to check out LISP in a little more detail, newlisp seems cool, I tried it out a little a few months ago.

Thanks for crediting me for the inspiration! I finally forced myself to put up a blog before my ruby on rails blog engine is production ready because I was letting too many good ideas go unwritten, so your credit goes a long way to getting some recognition for my new blog, and it did not go without notice. I have added you to my linkblog, thanks again, and any other links will also be much appreciated.
academician, thanks for the reader macro link. I definitely plan on using that in future chapters. I've just had a quick look at it and compiled it, do you know why they wanted to restrict keys to symbols? I commented out the checks and can use strings as keys in simple tests, but don't know if it will bite me later.
Presumably due to the equality testing?

Here's a simple hash populating (non-reader) macro:

(defmacro populate-hash (hash-to-populate &rest key-values)
(let ((hash (gensym)))
`(let ((,hash ,hash-to-populate))
,@(loop for item in key-values
collect `(setf (gethash ,(car item) ,hash) ,(cadr item)))

some simple tests:

(gethash 'a (populate-hash (make-hash-table) ('a 1) ('b 2)))
(defvar cake (make-hash-table))
(gethash 'a (populate-hash cake ('a 1) ('b 2)))
But when the code makes the hash tables, they set up #'equalp as the test, which I believe is the most forgiving test you can use.

CL-USER> (equalp "hello" "HELLO")
I'd guess that the test for symbols is to help catch malformed key/value lists. But this does presume to know more about the user's intentions than a general purpose utility probably should.

Extending the reader to use a separator like <= for pairs, or using a macro like mine that wraps each pair in parentheses, would make things more syntactically robust without restricting the user.

I still don't know why equalp is used though. :-)
There's very little difference between critics[person] and (gethash person critics). It's just a matter of the ubiquitous parentheses and then the use of the seven-character word 'gethash' rather than the two brackets.

If it's the latter that bothers you, just throw down a method named, say, [], that has similar functionality/syntax to the Ruby arrays. So it would then be ([] critics person), for example. Just the handful of saved characters can make a big difference sometimes (I experience similar joy from using ^ instead of expt).

You can then, of course, extend it to other things, using it as a wrapper for nth and such. Generics are fun! And then there's no need to invoke the reader and break standard syntax.
The chapter 2 examples had tables of tables, so in terms of looks I prefer [[critics "Toby"] "Superman Returns"] to ([] ([] critics "Toby") "Superman Returns").

In terms of functionality, the reader macro gives the advantage of "inline setup" in defparameter, let, etc. intead of a make-hash-table, then setf's on gethash later.

CL-USER> (defparameter critic #["hello" "world" "ht" #["one" 1 "two" 2]])
CL-USER> [[critic "ht"] "one"]
Eh, I'm not really seeing the benefit in the tables-of-tables part, but YMMV.

That inline ability is a bit sexy, though. I see the benefit there. I'm always a bit nervous about adding reader macros, but hell, if you're using hashes enough and are annoyed enough by the syntax, by all means go for it.
A very very helpful post. Keep up the good work!
tanks for the code., it works and i change a bit of codes.
Thanks for this great post, indeed informative.:)
Thanks for sharing it.
Thanks for sharing this one hope to read more.
Thanks for sharing this post, hope to read more of this.:)
This post is great! thanks for posting this and sharing some important info.. Keep on posting more info here, i really enjoy reading that codes and i also learned.
bad credit payday loans
Thanks for your great post!!!
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?