ANNOUNCEMENTS
Mini-Project #3 is not out yet! Will be out after the midterm.
• It will be linked to from ELMS; will also be available at:
• Deliverable is a .ipynb file submitted to ELMS
• Due before Thanksgiving (TBD)
Please label your ipynb file something like <lastname>_<firstname>_project3.ipynb
• E.g., dickerson_john_project3.ipynb

PROJECT 1 GRADES ARE UP!
General comments:
• People did really well!
• We used a fairly strict rubric, but if you have a real bone to pick with your grade, please triage through TAs/office hours!
Comments for our sanity, moving forward:
• df.head(n): defaults to n = 5; use ~10, 20, 50 as needed
• Please label your ipynb file something like <lastname>_<firstname>_project3.ipynb
• E.g., dickerson_john_project3.ipynb

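A quick sketch of the df.head(n) comment above. The DataFrame here is made up purely for illustration (a single hypothetical column "x"); the point is only that head() with no argument returns 5 rows and that you can pass a larger n.

```python
import pandas as pd

# Hypothetical 100-row DataFrame, used only to illustrate head().
df = pd.DataFrame({"x": range(100)})

default_rows = len(df.head())    # no argument: n defaults to 5
more_rows = len(df.head(20))     # ask for more rows explicitly

print(default_rows, more_rows)
```
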
COMMON ISSUE
pandas warning: "A value is trying to be set on a copy of a slice from a DataFrame"
Often not a problem! But, sometimes a problem …
Example:
• df[df['intensity'] > 0.1]['color'] = 'red'
• This will not set a value in df: the assignment is chained, so it lands on a temporary copy
• Instead, use df.loc[df['intensity'] > 0.1, 'color'] = 'red'

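The difference is easy to check on a toy DataFrame. This is a minimal sketch: the three rows below are invented for illustration, and the warning is silenced only so the output stays readable (the exact warning class differs across pandas versions).

```python
import warnings
import pandas as pd

# Hypothetical data, just to exercise the two assignment styles.
df = pd.DataFrame({
    "intensity": [0.05, 0.2, 0.5],
    "color": ["blue", "blue", "blue"],
})

# Chained assignment: df[mask] returns a copy, so the write is lost.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["intensity"] > 0.1]["color"] = "red"
chained_result = df["color"].tolist()

# Single .loc call: row selection and column assignment happen on df itself.
df.loc[df["intensity"] > 0.1, "color"] = "red"
loc_result = df["color"].tolist()

print(chained_result)
print(loc_result)
```

After the chained version df is unchanged; after the .loc version the rows with intensity above 0.1 are recolored.
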
MIDTERM: STRUCTURE
50 points = 25% of the total grade
10 points:
• 10 True/False questions, 1 point each
10 points:
• 5 multiple choice questions, 2 points each
30 points:
• 10 short answer questions, 3 points each
Compared to the CMSC320 midterm I posted from last semester, this midterm is shorter.

MIDTERM: CHEAT SHEET
You can use a cheat sheet on the exam:
• Create it on your own
• Handwritten notes only
• One side of one 8.5x11 inch ("normal-sized") sheet of paper
You'll turn in your cheat sheet with your midterm

QUICK MIDTERM REVIEW
As discussed in previous lectures and on Piazza, the midterm can cover:
• Up to and including last Wednesday's lecture (10/17)
• Quizzes that were due on or before last Wednesday
• Stuff that you should know from doing P1 and P2
Everything is online: I know this is a lot of material.
• Rule of thumb: open up a slide deck
• Do you feel "comfortable" with the material?
• Test will be more qualitative than prior 1xx, 2xx, 3xx tests

DATA COLLECTION (DC) & DATA PROCESSING (DP)
We talked about:
• Scraping data
• RESTful APIs
• Structured data formats (JSON, XML, etc.)
• Regexes
Data manipulation via the NumPy stack (NumPy, Pandas, etc.)
• Indexing, slicing, groups, joins, aggregate queries, etc.
Tidy data + melting
Version control (just know how this works qualitatively)
RDBMS, a little bit of SQL
Entity resolution & other data integration issues
Storing stuff as a graph, and manipulating it

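As a refresher on "tidy data + melting" and group aggregates from the list above, here is a small sketch. The table, its column names (student, quiz1, quiz2) and the scores are all invented for illustration; only melt and groupby are real pandas calls.

```python
import pandas as pd

# Hypothetical wide table: one column per quiz.
wide = pd.DataFrame({
    "student": ["a", "b"],
    "quiz1": [8, 9],
    "quiz2": [7, 10],
})

# Melt to tidy form: one row per (student, quiz) observation.
tidy = wide.melt(id_vars="student", var_name="quiz", value_name="score")

# Aggregate query on the tidy table: mean score per quiz.
mean_by_quiz = tidy.groupby("quiz")["score"].mean()

print(tidy)
print(mean_by_quiz)
```
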
DC: HTTP REQUESTS
?q=cmsc320&tbs=qdr:m
HTTP GET Request:
GET /?q=cmsc320&tbs=qdr:m HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1

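To see how the query string maps onto the request text above, here is a rough sketch that assembles the same kind of GET request by hand, without touching the network. The helper build_get_request is hypothetical, and the host "example.com" is a placeholder assumption, since the slide does not show the real host.

```python
from urllib.parse import parse_qs, urlencode


def build_get_request(host, path, params, user_agent):
    """Assemble the raw text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    # safe=":" keeps the colon in "qdr:m" unescaped, matching the slide.
    query = urlencode(params, safe=":")
    lines = [
        f"GET {path}?{query} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)


request = build_get_request(
    host="example.com",  # placeholder host, not from the slide
    path="/",
    params={"q": "cmsc320", "tbs": "qdr:m"},
    user_agent="Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1",
)
request_line = request.split("\r\n")[0]

# Going the other way: recover the parameters from the query string.
parsed = parse_qs("q=cmsc320&tbs=qdr:m")

print(request_line)
print(parsed)
```

In practice a library such as requests builds and sends this for you; writing it out by hand is only meant to show what goes over the wire.
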