Thursday, November 17, 2005

 

Looking for Lisp/NLP starting points

I guess I'm having trouble finding the right incantations for Google. I'm trying to find information on natural language processing, preferably in Lisp but I'll look at anything, specifically on taking in text and summarizing it.

I'm thinking of doing an RSS reader where long-winded feeds are automatically summarized down to "abstract" form, each with a link back to the original article, so the user can decide whether to read it or just move on to the next article.

Any suggestions for starting points would be appreciated.

Comments:
The magical phrase is "text summarization". I don't know of any publicly available Lisp code off-hand. For C, there's libots on SourceForge. It looks like a lot of the research efforts finished off the low-hanging fruit a few years ago. Many were commercialized. www.summarization.com has some references.

The hardest part is defining the problem. Programming is "relatively" easy.
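
To give a feel for the low-hanging fruit, here is a rough Python sketch of frequency-based extractive summarization: score each sentence by the average frequency of its content words and keep the top few in their original order. This is only an illustration, not how libots or any commercial tool works; the stop-word list, the naive sentence splitter, and the names `split_sentences`, `content_words`, and `summarize` are all assumptions made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use larger ones).
STOP_WORDS = frozenset(
    "a an and are as at be but by for from in is it of on or that the this to was with".split()
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, joined in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole text drive the sentence scores.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average rather than sum, so long sentences do not win automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    article = (
        "Lisp is a programming language. "
        "Lisp macros make the language extensible. "
        "The weather was nice yesterday. "
        "Many people enjoy hiking."
    )
    print(summarize(article, max_sentences=2))
```

Even this crude scoring shows why defining the problem is the hard part: it picks sentences that repeat common words, which is not always the same thing as the sentences a reader would call the summary.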
 
Hmm, what exactly do you plan to do with the feed reader? Open Source it? Because I've been hacking a bit with RSS/Atom recently.
Thought this might be helpful.
 
For basic info have a look at:
the AAAI NLP page

For a Lisp-specific library, have a look at:
langutils
 
See the list of online NLP resources and books at Peter Norvig's AIMA site.
 
anonymous,
Thanks for the links - if nothing else, I can link into libots, although I prefer to have as much Lisp as possible.

Vishnu,
Open source is a possibility, if the project is "worthy" of release. Right now I'm considering it a combination of "scratching an itch" and "fun learning project".

The itch I want to scratch is having an RSS reader that summarizes long articles in a meaningful way so I can scan and decide if I want to read the rest of it. I think this kind of application will become more important in the future as we get inundated with more information.

I'm also interested in Lisp/AJAX combination applications, where the "interesting" parts of the application are done in Lisp behind an HTTP server such as portableaserve, and the user interface is handled for me by AJAX libraries, with HTML sent out to a browser.

With a more intelligent RSS reader I can work on one end of the problem (AJAX, for example) and when I get stuck/bored I can switch to the other part for a while. Fun for me, but not necessarily something I want to inflict on the public at first.
 
Drew McDermott has a paper "Lexiparse - A Lexicon-based Parser for Lisp Applications" at (http://www.cs.yale.edu/homes/dvm/papers/parser-manual.pdf). For summarization, I'd look for references on Latent Semantic Analysis. If you go that route, I have some Lisp code that might help.
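
For anyone curious what the Latent Semantic Analysis route looks like, here is a heavily simplified Python sketch (not the Lisp code mentioned above). It builds a term-by-sentence count matrix and ranks sentences by their weight in the dominant right singular vector, found by power iteration so that only the standard library is needed. Real LSA normally uses a truncated SVD with several dimensions and some term weighting; the single-topic shortcut and the names `term_sentence_matrix`, `dominant_right_singular_vector`, and `rank_sentences` are assumptions made for the example.

```python
import math
import re


def tokenize(sentence):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def term_sentence_matrix(sentences):
    """Rows are terms, columns are sentences; entries are raw counts."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for w in tokenize(s):
            matrix[index[w]][j] += 1.0
    return matrix


def dominant_right_singular_vector(matrix, iterations=100):
    """Power iteration on A^T A; returns a unit vector with one weight per sentence."""
    n = len(matrix[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # A v: one value per term.
        av = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        # A^T (A v): one value per sentence.
        atav = [sum(matrix[i][j] * av[i] for i in range(len(matrix))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in atav))
        if norm == 0.0:
            break
        v = [x / norm for x in atav]
    return v


def rank_sentences(sentences):
    """Sentence indices, most central to the dominant 'topic' first."""
    matrix = term_sentence_matrix(sentences)
    if not matrix:  # no sentences, or no words at all
        return []
    weights = dominant_right_singular_vector(matrix)
    return sorted(range(len(sentences)), key=lambda j: abs(weights[j]), reverse=True)


if __name__ == "__main__":
    sents = [
        "Lisp macros transform Lisp code",
        "Lisp code is data",
        "The weather is nice",
    ]
    for j in rank_sentences(sents):
        print(sents[j])
```

A summary would then keep the first few ranked sentences. The point of the matrix view is that sentences sharing vocabulary reinforce one another, so an off-topic sentence ends up with a small weight.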
 