* Final Project Requirements:
  - Due 5:30pm, May 25th.
  - No extensions. If you turn it in late, you get a 0.
  - Option #1: A paper.
    - 5 page analysis of an aspect of a program or a system, using the
      techniques we discussed in class.
    - 5 pages (of text) of a design for a user interface.
    - Some other 5 page paper that is of appropriate quality and
      scholarship for this class.
  - Option #2: A program.
    - A prototype that clearly illustrates something that we have
      discussed in class.
    - A one-page document describing the program, why you did it, and
      what it shows.
    - Include both a runnable executable and all the source.
  - Information not turned in will not count towards your grade.

****************************************************************
TIME

* What time is it?
* Why is it important to have a single time?
* http://www.time.gov/
  - http://tf.nist.gov/service/its.htm
  - http://tycho.usno.navy.mil/
  - PowerPoint: http://www.navcen.uscg.gov/cgsic/meetings/summaryrpts/38thmeeting/Miranian.ppt
* Internet protocols that get the time:
  - NTP protocol
* NetGear flaw triggers DoS attack
  - http://news.com.com/2100-1002_3-5068035.html
  - http://www.cs.wisc.edu/~plonka/netgear-sntp/
* Some HTTP DoS did this recently (used GET instead of HEAD)
* mrtg.org
* How do you handle changes in the time?
* How do you handle timezones?
* iCalendar

    BEGIN:VCALENDAR
    CALSCALE:GREGORIAN
    PRODID:-//Apple Computer\, Inc//iCal 2.0//EN
    VERSION:2.0
    BEGIN:VEVENT
    LOCATION:53 Church Street room 203.
    EXDATE;TZID=US/Eastern:20060330T173000
    UID:0A1B543D-0E33-4AF9-A22A-D29B5338F61B
    SEQUENCE:13
    DTSTAMP:20060510T153902Z
    DTSTART;TZID=US/Eastern:20060202T173000
    SUMMARY:CSCI E-180+
    DTEND;TZID=US/Eastern:20060202T193000
    RRULE:FREQ=WEEKLY;INTERVAL=1;UNTIL=20060526T035959Z;BYDAY=TH;WKST=SU
    END:VEVENT
    END:VCALENDAR

* How do you handle clock changes?
  - scoreboards?
* Crazy Clocks
  - What's the lesson of this article?
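
One way to think about the timezone and clock-change questions above is to
convert every local time to UTC before comparing or storing it. The sketch
below uses the timestamps from the iCalendar event. It is only an
illustration: the fixed EST/EDT offsets and the helper names parse_ical and
to_utc are assumptions made for this example, and the April 6 class date is
derived from the weekly RRULE rather than taken from the event itself. A
real program would look up the US/Eastern rules in a time zone database
instead of hard-coding offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for the US/Eastern rules (an assumption for this
# sketch; a real program would consult a time zone database instead).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_ical(stamp, tz=timezone.utc):
    """Parse an iCalendar DATE-TIME such as 20060202T173000 or 20060526T035959Z."""
    if stamp.endswith("Z"):  # a trailing Z means the value is already UTC
        parsed = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    # No Z: the value is wall-clock time in the zone supplied by the caller.
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=tz)


def to_utc(dt):
    """Return the same instant expressed in UTC."""
    return dt.astimezone(timezone.utc)


first_class = parse_ical("20060202T173000", EST)  # DTSTART, standard time
april_class = parse_ical("20060406T173000", EDT)  # a later Thursday, daylight time
until = parse_ical("20060526T035959Z")            # RRULE UNTIL, given in UTC

print(to_utc(first_class).isoformat())    # 2006-02-02T22:30:00+00:00
print(to_utc(april_class).isoformat())    # 2006-04-06T21:30:00+00:00
print(until.astimezone(EDT).isoformat())  # 2006-05-25T23:59:59-04:00
```

The same 5:30pm wall-clock time maps to two different UTC instants once the
clocks change, and the UNTIL value that looks like May 26 in UTC is really
the last second of May 25 in Eastern time.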
****************************************************************
Natural Language Processing

References:
* http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-863JSpring2003/CourseHome/index.htm

* What is natural language processing and what can we do?
  - Spelling correction
  - Written commands
  - Making sense of text that's online
* Limitations:
  - Language specific (usually)
  - You can't get perfect (even people aren't perfect)
* Two techniques:
  - Try to "understand" the text
  - Statistics
* Examples:
  - SBook
  - Scheduling in Google Calendar
  - Seminar announcement reading system
  - Anti-spam
* How do you do this?
  - Build a model of what you are trying to do
  - Test the model
  - Have lots of provisions for "special cases"
  - Preprocess the input
  - Prepare for failure.
* Example: Language Identification
  - saint_french, saint_english
  - single character frequencies, bigrams and trigrams
  - word frequencies
  - Vocabulary
* People are very good at this:
  - words1.txt
  - words2.txt
  - words3.txt
  - CRM114
* Tools
  - regular expressions
  - flex
  - Bayesian models
* Data sources:

    [simsong@Phoenix dfrws] % ls -l /usr/share/dict/
    total 3448
    -r--r--r--  1 root  wheel      516 Aug 22  2005 README
    -r--r--r--  1 root  wheel      706 Aug 22  2005 connectives
    -r--r--r--  1 root  wheel     8640 Aug 22  2005 propernames
    -r--r--r--  1 root  wheel  2486825 Aug 22  2005 web2
    -r--r--r--  1 root  wheel  1012730 Aug 22  2005 web2a
    lrwxr-xr-x  1 root  wheel        4 Mar 16 15:56 words@ -> web2
    [simsong@Phoenix dfrws] %

  - SSA
* Very large (or very many) regular expressions
  - SpamAssassin
      /usr/local/share/spamassassin/20_head_tests.cf
      /usr/local/share/spamassassin/20_phrases.cf
      /usr/local/share/spamassassin/30_text_fr.cf
  - What SBook does.
      ~/slg/sbook/libsbook/firstname.fp
      ~/slg/sbook/libsbook/parse_address.fp
* Semantic analysis
  - WordNet http://wordnet.princeton.edu/
* Getting a corpus
  - It's hard!
  - Side by side: http://www.lofficier.com/saint7.html
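
The Language Identification example above (character trigram frequencies)
can be sketched in a few lines. This is a minimal illustration, not the
method used in class: the two training sentences are made-up stand-ins for
the saint_french and saint_english files, the names trigrams, cosine and
identify are hypothetical, and cosine similarity is just one plausible way
to compare frequency profiles. With a corpus this small the result is only
reliable for text that resembles the samples.

```python
from collections import Counter
from math import sqrt

# Tiny made-up training samples (assumption: stand-ins for a real corpus).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "sat on the mat while the children were playing in the "
               "garden with their friends",
    "french": "le renard brun rapide saute par dessus le chien paresseux "
              "et le chat est assis sur le tapis pendant que les enfants "
              "jouent dans le jardin avec leurs amis",
}


def trigrams(text):
    """Count overlapping 3-character sequences, after normalizing whitespace."""
    text = " " + " ".join(text.lower().split()) + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Build the model once: one trigram profile per language.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language whose profile is most similar to the text."""
    probe = trigrams(text)
    scores = {lang: cosine(probe, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)


print(identify("the children are in the garden"))   # english
print(identify("les enfants sont dans le jardin"))  # french
```

This follows the "build a model, test the model" steps from the outline.
Preprocessing here is limited to lowercasing and collapsing whitespace, and
there is no provision for failure: text in a third language still gets
labeled as one of the two known ones.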