Course: CIT 595, Fall 2009
School: UPenn

Computer Performance
CIT 595, Spring 2008

Motivation for Examining Performance
- Hardware performance is often key to the effectiveness of an entire system of hardware and software.
- Why does a certain piece of software perform the way it does?
- Why can one instruction set be implemented to perform better than another?

What Defines Performance?
- If you are running a program on 2 different workstations, the faster one is the one that gets the job done first.
  As a user, you are interested in reducing the response time, i.e. the time from start to completion of a task (a.k.a. execution time).
- If you are running a computer center with 2 timeshared computers running jobs submitted by many users, the faster computer is the one that completes the most jobs during the day.
  As a system administrator, you are interested in increasing throughput, i.e. the total amount of work done in a given time.

Measuring Response Time
- Response time is one of the measures of computer performance. So how do we measure the time?
- We could just time from when we started our program until it was done executing (i.e. elapsed or response time).
- However, modern computers are often timeshared and may work on several programs simultaneously.
- So we want to distinguish between the time the processor is working on our behalf and the time spent waiting.

Execution Time or Latency
Response Time = CPU Execution Time + Wait Time
- CPU (processor) execution time is the time spent computing for a particular program, not running other programs or waiting on I/O.
- It is also called latency.

CPU Execution Time or Latency
The CPU execution time of a program depends on:
- The total number of instructions executed.
- The performance of each instruction, which depends on the number of clock cycles it takes to complete the instruction cycle (Fetch through Store), called cycles per instruction (CPI).
- The length of one clock cycle, which is the time interval between clock ticks, i.e. the clock cycle time.

$$\frac{\text{time}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$

CPU Execution Time (contd.)
- Clock cycle time = 1/Clock Rate, or 1/Frequency.
- Clock rate is given in Hertz (Hz) = cycles/sec.

$$\text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

Cycles Per Instruction (CPI)
- CPI is an average over all the instructions executed in the program.
- It depends on:
  - Instruction Set Architecture (ISA)
  - Design details (micro-architecture)
- CPI varies among different implementations of the same ISA.
- CPI is obtained by performing a detailed simulation of an implementation.

Example: Calculating CPI
Given:

Instruction    # cycles   % Instr. count
Load           5          25
Store          5          15
ALU            5          40
Branch/Jump    3          15
Others         6          5

The average CPI is
0.25 x 5 + 0.15 x 5 + 0.40 x 5 + 0.15 x 3 + 0.05 x 6 = 4.75 cycles

# Instructions & Clock Cycle Time
- The number of instructions depends on:
  - Algorithm
  - Compiler
  - ISA
- The clock cycle time depends on:
  - Hardware technology (transistor technology)
  - Design (micro-architecture)
  - It is found by simulating the longest propagation path of your circuit.

Maximizing Performance in Relation to Execution Time
- One way to maximize performance is to minimize execution time.
- Thus we can relate performance and execution time for a machine X as:
  $\text{Performance}_X = 1 / \text{Execution Time}_X$
- For low latency we want to minimize all 3 factors: CPI, number of instructions, and clock cycle time.
- This is difficult, as decreasing one often increases another.
- E.g. two ISA philosophies:
  - RISC: low CPI/clock period, high instruction count
  - CISC: low instruction count, high CPI/clock period

Performance Ratio (Comparing Performance)
To claim that Machine X performs better than Machine Y:

$$\text{Performance}_X > \text{Performance}_Y \quad \text{i.e.} \quad \frac{1}{\text{Execution Time}_X} > \frac{1}{\text{Execution Time}_Y}$$

$$\text{Performance Ratio} = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X}$$

Relative Performance Example

Machine   Clock Cycle Time (ns)   CPI
A         1                       2.0
B         2                       1.2

- A and B are 2 implementations of the same ISA.
- Which machine (A or B) is faster, and by how much?
- Let I be the number of instructions per program.
  Ex Time A = 1 x 2.0 x I = 2 x I ns
  Ex Time B = 2 x 1.2 x I = 2.4 x I ns
- Performance Ratio = Time B / Time A = (2.4 x I ns) / (2.0 x I ns) = 1.2
- Machine A is 1.2 times faster than B.

Misleading Metrics
- Clock: which is faster? Clock A = 5 GHz, Clock B = 3 GHz.
  - Probably A, but make this assumption only if the ISA and compiler are the same.
  - If it is stated that CPI A = 2 and CPI B = 1, then which is faster? (For the same instruction count, B: A completes 5/2 = 2.5 billion instructions per second, B completes 3/1 = 3 billion.)
- MIPS: millions of instructions per second.
  - instr/sec = 1/[(cycles/instr) * (seconds/cycle)]
  - The higher the MIPS, the faster the machine.
  - MIPS figures from different ISAs or compilers are not comparable.

Maximizing Performance in Relation to Throughput
- Throughput is the amount of work that can be done in a given time.
- It is measured in tasks per time unit, e.g. x tasks per hour.
- So the higher the throughput, the better the performance.
- Throughput and response time are inversely related: if a system carries out a task in k seconds, then its throughput is 1/k tasks per second.
- Throughput can be increased by:
  - Pipelining at the instruction level (chp 5)
  - Timesharing a computer system (chp 8 - OS)
  - Achieving parallelism (chp 9)
    - Superscalar processors
    - Multiprocessors (shared memory, distributed, chip multiprocessors)

Evaluating Performance
- Step I: Choose a workload.
  - Workload: the set of programs whose performance you care about.
  - Use a standardized workload (known as benchmarks).
- Step II: Decide the metric.
  - Execution time or throughput.
- Step III: Evaluate the data gathered using statistical analysis.
  - The statistical method used depends on:
    - The metric
    - The distribution of the data

Standard Performance Evaluation Corporation (SPEC)
- SPEC is a consortium that collects, distributes, and standardizes benchmarks.
- It produces benchmark suites for various classes: CPU, Java, I/O, Web, multithreaded, etc.
- E.g. CPU 2006 consists of 29 CPU-intensive C/C++ and Fortran programs:
  - integer: perl, gcc, bzip2 (compression), sjeng (AI: chess)
  - floating point: povray (ray tracing), wrf (weather prediction), sphinx3 (speech recognition)
- Like SPEC, there is the Transaction Processing Performance Council (TPC).
  - Used for web/database server workloads.
  - Its programs are I/O or network intensive rather than CPU intensive.

Statistical Analysis: Arithmetic Mean

$$\text{Arithmetic Mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

- Used for units that are proportional to time.
- The arithmetic mean can be misleading if the data are skewed or scattered.

11.3 Mathematical Preliminaries

Geometric Mean
Represented as:

$$\text{Geometric Mean} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

- Unlike the arithmetic mean, it tends to dampen the effect of skew.
- Used for unitless quantities, e.g. performance ratio/speedup.
- Performance results are stated in relation to the performance of a common machine used as a reference:
  Ratio = Ex(Reference Machine) / Ex(Your Machine)

Geometric Mean Example
Geometric mean for System A normalized with respect to System B:

$$\left( \frac{100}{50} \times \frac{400}{200} \times \frac{500}{250} \times \frac{800}{400} \times \frac{4100}{5000} \right)^{1/5} = 1.6733$$

Geometric Mean Consistency
- The results that we got when using System B and System C as reference machines are given below.
  [Table: geometric means with System B and System C as reference machines]
- We find the ratio of the geometric mean of A to the geometric mean of B to be consistent regardless of which system is taken as the reference: 1.6733/1...

Harmonic Mean
- Used for units that are inversely proportional to time, i.e. throughput.
- Unlike latencies, throughputs cannot be added.
- E.g. the 1st 10 miles @ 30 mph, the 2nd 10 miles @ 40 mph, and the last 10 miles @ 60 mph: the average is not 43.3 mph.
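
The formulas in these slides are easy to check numerically. The Python sketch below is not part of the original slides: the function names (`cpu_time`, `average_cpi`, `geometric_mean`, `harmonic_mean`) and the instruction count of one million are assumptions chosen purely for illustration. It recomputes the average CPI from the instruction mix (with 6 cycles for the "Others" class, as in the table), the A versus B performance ratio, the geometric mean example, and the 30/40/60 mph case.

```python
import math


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time


def average_cpi(mix):
    """Weighted average CPI; mix is a list of (fraction of instructions, cycles)."""
    return sum(fraction * cycles for fraction, cycles in mix)


def geometric_mean(values):
    """n-th root of the product of n values (for unitless ratios)."""
    return math.prod(values) ** (1.0 / len(values))


def harmonic_mean(values):
    """n divided by the sum of reciprocals (for rates such as throughput)."""
    return len(values) / sum(1.0 / v for v in values)


# Instruction mix from the CPI example: Load, Store, ALU, Branch/Jump, Others.
mix = [(0.25, 5), (0.15, 5), (0.40, 5), (0.15, 3), (0.05, 6)]
print(f"average CPI       = {average_cpi(mix):.2f}")        # 4.75

# Relative performance example. The instruction count is hypothetical;
# it cancels out of the ratio because both machines share the same ISA.
I = 1_000_000
time_a = cpu_time(I, cpi=2.0, clock_cycle_time=1)  # ns
time_b = cpu_time(I, cpi=1.2, clock_cycle_time=2)  # ns
print(f"performance ratio = {time_b / time_a:.2f}")         # 1.20

# Geometric mean example: execution time ratios from the slide.
ratios = [100 / 50, 400 / 200, 500 / 250, 800 / 400, 4100 / 5000]
print(f"geometric mean    = {geometric_mean(ratios):.4f}")  # 1.6733

# Three 10-mile legs at different speeds.
speeds = [30, 40, 60]  # mph
print(f"arithmetic mean   = {sum(speeds) / len(speeds):.2f} mph")  # 43.33 mph
print(f"harmonic mean     = {harmonic_mean(speeds):.2f} mph")      # 40.00 mph
```

The harmonic mean of 40 mph agrees with total distance over total time (30 miles in 0.75 hours), which is why the arithmetic mean of 43.3 mph overstates the average speed.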
