thisoneworks - for word in word_list5: word =...

#!/usr/bin/env python
# -*- coding: utf-8 -*-
## differan.py
# Creates distinct frequency dictionaries from two text files, which are used
# to form a third dictionary containing all words which occur in one or both
# dictionaries. The third dictionary also contains the coefficients
# of difference, which are calculated from the frequencies of the first two
# dictionaries: cof = (freq1-freq2)/(freq1+freq2).
## Also outputs a list of words and their frequencies in each corpus; these
# words are also output in concordance with their coefficients of difference.
import re

c1 = '2001.txt'
c5 = '2005.txt'

## Creates frequency dictionaries from the two text files.
# open() runs under both Python 2 and 3 (file() exists only in Python 2).
with open(c1) as f:
    word_list1 = re.split(r'\s+', f.read().lower())
with open(c5) as f:
    word_list5 = re.split(r'\s+', f.read().lower())

freq_d1 = {}
freq_d5 = {}
for word in word_list1:
    # Keep only the letters a-z; tokens of digits or punctuation become ''.
    word = re.sub(r'[^a-z]', '', word)
    try:
        freq_d1[word] += 1
    except KeyError:  # first occurrence of this word
        freq_d1[word] = 1
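# Same normalization and counting for the 2005 corpus.
for word in word_list5:
    word = re.sub(r'[^a-z]', '', word)
    try:
        freq_d5[word] += 1
    except KeyError:
        freq_d5[word] = 1

# Give every word a count (possibly 0) in both dictionaries.
for word in freq_d1.keys():
    if word not in freq_d5:
        freq_d5[word] = 0
for word in freq_d5.keys():
    if word not in freq_d1:
        freq_d1[word] = 0

# Coefficient of difference: +1.0 = only in 2001, -1.0 = only in 2005.
# Every word has a nonzero count in at least one corpus, so the denominator is never 0.
diff = {}
for word in freq_d1.keys():
    diff[word] = float(freq_d1[word] - freq_d5.get(word, 0)) / float(freq_d1[word] + freq_d5.get(word, 0))
for word in freq_d5.keys():
    if word not in diff:
        diff[word] = -1.0

# Columns: word, coefficient, count in 2005, count in 2001.
# The ten highest coefficients (words most characteristic of 2001):
value_key = sorted([(v, k) for k, v in diff.items()], reverse=True)
for vk in value_key[0:10]:
    if vk[1] not in " \n":  # '' is a substring of any string, so the empty token is skipped
        print("%15s %4.2f %4d %4d" % (vk[1], vk[0], freq_d5[vk[1]], freq_d1[vk[1]]))
# The ten lowest coefficients (words most characteristic of 2005):
value_key = sorted([(v, k) for k, v in diff.items()], reverse=False)
for vk in value_key[0:10]:
    if vk[1] not in " \n":
        print("%15s %4.2f %4d %4d" % (vk[1], vk[0], freq_d5[vk[1]], freq_d1[vk[1]]))
# ...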
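The tokenize-strip-count loops in differan.py can also be written with `collections.Counter` from the Python standard library. This is a sketch of that alternative, not the author's code: the function name `word_frequencies` is hypothetical, and it takes the text as a string so it can be tried without the 2001.txt and 2005.txt corpora. It applies the same normalization as the script (lowercase, split on whitespace, delete every character outside a-z). Unlike the script, it discards tokens that end up empty; the script counts them under the key '' and filters them out only when printing.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count words as differan.py does: lowercase the text, split it on
    whitespace, and delete every character outside a-z from each token."""
    counts = Counter()
    for token in re.split(r'\s+', text.lower()):
        word = re.sub(r'[^a-z]', '', token)
        if word:  # drop tokens that were only digits or punctuation
            counts[word] += 1
    return counts


if __name__ == '__main__':
    # Hypothetical sample text; the script reads 2001.txt and 2005.txt instead.
    sample = "The cat saw the dog. The DOG ran! 2005"
    print(word_frequencies(sample).most_common(3))
```

A `Counter` returns 0 for missing keys, which removes the need for the try/except around each increment.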
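The coefficient from the script's header comment, cof = (freq1 - freq2)/(freq1 + freq2), runs from +1.0 (the word occurs only in the first corpus) to -1.0 (only in the second); a word seen 3 times in 2001 and once in 2005 scores (3 - 1)/(3 + 1) = 0.5. The sketch below computes it over the union of both vocabularies, which is what the script's zero-padding loops achieve, and then ranks words the way its two `sorted()` passes do. The names `coefficients_of_difference` and `rank` and the parameter `n` are hypothetical; the inputs can be any word-to-count mappings, such as the script's `freq_d1` and `freq_d5`. Filtering out the empty token is left to the caller.

```python
def coefficients_of_difference(freq1, freq2):
    """Map every word in either corpus to (f1 - f2) / (f1 + f2).

    A word missing from one mapping counts as 0 there, which replaces the
    script's zero-padding loops. Words with a total count of 0 are skipped
    to avoid dividing by zero.
    """
    diff = {}
    for word in set(freq1) | set(freq2):
        f1 = freq1.get(word, 0)
        f2 = freq2.get(word, 0)
        total = f1 + f2
        if total:
            diff[word] = float(f1 - f2) / total
    return diff


def rank(diff, n=10):
    """Return (highest, lowest): the n largest and n smallest
    (coefficient, word) pairs, with ties broken by word as in the script."""
    pairs = sorted((v, k) for k, v in diff.items())
    return pairs[::-1][:n], pairs[:n]


if __name__ == '__main__':
    freq1 = {'apple': 3, 'pear': 1}  # toy counts, not real corpus data
    freq2 = {'apple': 1, 'plum': 2}
    diff = coefficients_of_difference(freq1, freq2)
    highest, lowest = rank(diff, n=2)
    for cof, word in highest + lowest:
        # Same columns as the script: word, coefficient, count 2, count 1.
        print("%15s %4.2f %4d %4d" % (word, cof, freq2.get(word, 0), freq1.get(word, 0)))
```

With the toy counts, apple scores (3 - 1)/(3 + 1) = 0.5, pear scores 1.0, and plum scores -1.0.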

This note was uploaded on 09/06/2009 for the course LING 571 taught by Professor Staff during the Fall '08 term at San Diego State.
