ttrbeta

#!/usr/bin/env python
# Calculates the type-to-token ratio (TTR) of text read from standard input.
import re
import sys

def cmpItems(x, y):
    # Comparator for (key, value) pairs: orders by value, largest first.
    # Equivalent to -cmp(x[1], y[1]), written without the Python 2-only cmp().
    return (y[1] > x[1]) - (y[1] < x[1])

lines = 0
tokens = 0
types = {}
for line in sys.stdin.readlines():  # read all of standard input
    # lowercase the line and replace every character outside a-z with a space
    line = re.sub(r'[^a-z]', ' ', line.lower())
    lines = lines + 1
    tokens = tokens + len(line.split())  # count word tokens
    for token in line.split():  # visit each word token on the line
        types[token] = types.get(token, 0) + 1  # count occurrences of each type

m = len(types)  # number of distinct types
n = tokens      # number of tokens
ttr = float(m) / float(n) if n else 0.0  # avoid dividing by zero on empty input
print(ttr)
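The same calculation can also be sketched as reusable functions. This is a minimal illustration, not part of the original note: it assumes Python 3, and the names `count_types`, `type_token_ratio`, and `sample` are hypothetical. It keeps the script's normalization (lowercase, with every character outside a-z treated as a separator) and uses `collections.Counter` in place of the hand-built dictionary.

```python
import re
from collections import Counter


def count_types(text):
    """Return a Counter mapping each word type to its number of tokens."""
    # Same normalization as the script: lowercase the text, then treat
    # anything other than a-z as a separator between words.
    words = re.sub(r"[^a-z]", " ", text.lower()).split()
    return Counter(words)


def type_token_ratio(text):
    """Return (number of types) / (number of tokens), or 0.0 if there are no tokens."""
    counts = count_types(text)
    n_tokens = sum(counts.values())
    return len(counts) / n_tokens if n_tokens else 0.0


# Hypothetical sample: 8 tokens, 5 distinct types.
sample = "The cat saw the dog. The dog ran!"
print(type_token_ratio(sample))            # 0.625
print(count_types(sample).most_common(2))  # [('the', 3), ('dog', 2)]
```

`Counter.most_common` returns the types ordered by descending count, which is the ordering the `cmpItems` comparator describes.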

This note was uploaded on 09/06/2009 for the course LING 571 taught by Professor Staff during the Fall '08 term at San Diego State.
