

LAB Phase 6 CV_Reg_DR
October 12, 2020

In [1]: %matplotlib inline
        from pathlib import Path
        import pandas as pd
        from sklearn.model_selection import train_test_split, GridSearchCV
        from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, BayesianRidge, Ela
        import matplotlib.pylab as plt
        import sklearn.metrics
        from dmba import regressionSummary
        from dmba import adjusted_r2_score, AIC_score, BIC_score

no display found. Using non-interactive Agg backend
/usr/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.m
  warnings.warn(message, FutureWarning)
/usr/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.m
  warnings.warn(message, FutureWarning)

In [2]: # Here is the used Toyota Corolla sales dataset from the Netherlands
        car_df = pd.read_csv('../resource/lib/public/ToyotaCorolla.csv')
        car_df = car_df.sample(frac=1)  # randomize the data
        car_df = car_df.iloc[0:1000]  # use only the first 1,000 rows (samples)

In [3]: car_df.head()

Out[3]:
      Id                                              Model  Price  Age_08_04  \
111  113           TOYOTA Corolla VERSO 2.0 D4D SOL (7) MPV  31275          4
707  711  TOYOTA Corolla 2.0 DSL SEDAN LINEA TERRA 4/5-D...   8500         67
830  834  TOYOTA Corolla 1.6 16V LIFTB LINEA TERRA 4/5-D...   8950         63
911  915  TOYOTA Corolla 1.6 16V LIFTB LINEA LUNA 4/5-Doors   9950         64
582  586              TOYOTA Corolla 1.4 HB Terra 2/3-Doors   8950         56

     Mfg_Month  Mfg_Year     KM  Fuel_Type   HP  Met_Color  ...  \
111          5      2004   1500     Diesel  116          1  ...
707          2      1999  92922     Diesel   72          1  ...
830          6      1999  68453     Petrol  110          1  ...
911          5      1999  58136     Petrol  110          1  ...
582          1      2000  31000     Petrol   97          0  ...

     Powered_Windows  Power_Steering  Radio  Mistlamps  Sport_Model  \
111                1               1      0          1            1
707                0               1      0          0            0
830                0               1      0          0            1
911                1               1      0          1            0
582                0               1      0          0            0

     Backseat_Divider  Metallic_Rim  Radio_cassette  Parking_Assistant  \
111                 1             0               0                  0
707                 0             0               0                  0
830                 1             0               0                  0
911                 1             1               0                  0
582                 0             0               0                  0

     Tow_Bar
111        0
707        0
830        1
911        1
582        0

[5 rows x 39 columns]

In [4]: # get rid of nonessential variables - but why are we removing these variables?
        car_df = car_df.drop('Id', axis=1)  #??? # remove the ID variable
        car_df = car_df.drop('Model', axis=1)  #??? # remove "Model" variable
        car_df = car_df.drop('Mfg_Year', axis=1)  #??? # remove "Mfg Year" variable
        car_df = car_df.drop('Mfg_Month', axis=1)  #??? # remove "Mfg month" variable

In [5]: # isolate predictors separate from the response variable (price), and we're using all p
        X = car_df.drop(columns=['Price'])
        y = car_df['Price']
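The comment in In [4] asks why these columns are dropped. For Mfg_Year and Mfg_Month, one plausible reason (inferred only from the five rows shown in Out[3], not stated by the notebook) is that Age_08_04 already carries the same information, so keeping all three would add redundant predictors. The sketch below checks that reading on those five rows. The helper name age_in_months and the August 2004 reference date are assumptions made for illustration.

```python
# Rows copied from Out[3]: (Mfg_Year, Mfg_Month, Age_08_04)
rows = [
    (2004, 5, 4),
    (1999, 2, 67),
    (1999, 6, 63),
    (1999, 5, 64),
    (2000, 1, 56),
]


def age_in_months(mfg_year, mfg_month, ref_year=2004, ref_month=8):
    """Hypothetical helper: months from manufacture to the reference month.

    Assumes the reference month is August 2004 and that both the
    manufacturing month and the reference month are counted.
    """
    return (ref_year - mfg_year) * 12 + (ref_month - mfg_month) + 1


# Recompute the age for every row and compare it with the recorded value.
recomputed = [age_in_months(year, month) for year, month, _ in rows]
all_match = all(age_in_months(year, month) == age for year, month, age in rows)

print(recomputed)
print(all_match)
```

If the same relationship holds across the full dataset, Age_08_04 is an exact function of the two manufacturing columns, which would make them collinear with it in a linear model. Five rows are not enough to establish that, so treat this as a check to repeat on the whole frame.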
In [6]: # dummy code the predictors for linear modeling (make sure to remove the first category
        X = pd.get_dummies(X, prefix_sep='_', drop_first=True)

In [7]: X.shape  # check the number of rows and columns

Out[7]: (1000, 43)

In [8]: # let's take a look at all of the predictors after dummy coding; what measures do these
        pd.DataFrame(X.columns)

Out[8]:
                    0
0           Age_08_04
1                  KM
2                  HP
3           Met_Color
4           Automatic
5                  CC
6               Doors
7           Cylinders
8               Gears
9       Quarterly_Tax
10             Weight
11      Mfr_Guarantee
12    BOVAG_Guarantee
13   Guarantee_Period
14                ABS
15           Airbag_1
16           Airbag_2
17              Airco
18    Automatic_airco
19      Boardcomputer
20          CD_Player
21
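The effect of drop_first=True in In [6] can be seen on a small example. The sketch below uses a toy frame with made-up rows (not the ToyotaCorolla data; the category values are assumed for illustration) and compares the columns produced with and without drop_first.

```python
import pandas as pd

# Toy frame with made-up rows (an assumption, not the lab data):
# one numeric predictor and one categorical predictor with three levels.
toy = pd.DataFrame({
    "KM": [1500, 92922, 68453, 20000],
    "Fuel_Type": ["Diesel", "Diesel", "Petrol", "CNG"],
})

# One indicator column per category level.
full = pd.get_dummies(toy, prefix_sep="_")

# Same call as In [6]: the first level (alphabetically) is dropped.
reduced = pd.get_dummies(toy, prefix_sep="_", drop_first=True)

full_columns = list(full.columns)
reduced_columns = list(reduced.columns)

print(full_columns)
print(reduced_columns)
```

With all three indicator columns present, they sum to 1 in every row and so duplicate the intercept of a linear regression. Dropping one level removes that exact collinearity, and the dropped level becomes the baseline against which the remaining coefficients are read.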


Term: Spring
Professor: Satish Boregowda
Tags: Mean squared error
