Advanced Programming in Quantitative Economics
Introduction, structure, and advanced programming techniques

17 – 21 August 2009, Aarhus, Denmark

Charles Bos
VU University Amsterdam
Tinbergen Institute

Day 5 - Morning

9.00L   Data handling
          Difference in formats
          Reading large datasets
          Selecting and transforming
10.30P  Selected exercises. Choice of
          Reading HF data
          Implement HF Autoregressive Duration Model
          Using graphing package
          AR(p) estimation with/without ARFIMA package
12.00   Lunch
Input and Output

file type                                    extension          Names  Remark
ASCII matrix file                            .mat               -      Convenient input
ASCII data file with load information        .dat               +
PcGive/OxMetrics data file                   .in7 (with .bn7)   +      Retains variable names
Excel spreadsheet file                       .xls               +      Common format, take care
Lotus spreadsheet file                       .wks/.wk1          +
Gauss data file                              .dht (with .dat)   +
Gauss matrix file                            .fmt               -      Small, save results
Stata data file                              .dta               -      Input only
text file using fscan / fprint functions                        ?      Lots of control
binary file using fread / fwrite functions                      ?

loadmat.ox

    #include <oxstd.h>

    main()
    {
        decl mX, asVar;

        mX = loadmat("data/data.xls", &asVar);

        println("Saving data in 5 types of files: ...");
        savemat("excl/lm_data.mat", mX);
        savemat("excl/lm_data1.dat", mX, asVar);
        savemat("excl/lm_data.dht", mX, asVar);
        savemat("excl/lm_data.in7", mX, asVar);
        savemat("excl/lm_data.fmt", mX);
        println("done.");
    }

Input: mat format

  Text-based format, very convenient for inputting your data
  Starts with two numbers: the rows and columns of the matrix
  Followed by the numbers; the output matrix is filled row-by-row
  Numbers are separated by space, comma or new-line
  '.', 'm', 'M' and '.NaN' are considered missing
  '.Inf' is infinity
  Other text leads to skipping the remainder of the line

Paths

  You may specify a full path (but relative paths are easier).
  Use either '/' (preferably) or '\\' in your path.
  The file is searched for first starting from the present directory,
  then along the value of the OX5PATH environment variable (i.e., usually
  in the main Ox directory and its include subdirectory).
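As a minimal sketch of how these rules play out, the program below reads a small .mat file and counts its missing values. The file name data/small.mat and its contents (shown in the comment) are hypothetical and only serve as illustration.

```ox
#include <oxstd.h>

main()
{
    decl mX, mClean;

    // Hypothetical contents of data/small.mat, following the rules above:
    //   3 2
    //   // any other text: the remainder of this line is skipped
    //   1.5 2.5
    //   3.5 .
    //   .NaN 6.5
    mX = loadmat("data/small.mat");   // relative path, '/' as separator

    println("Dimensions: ", rows(mX), " x ", columns(mX));

    // isdotnan() gives a 0/1 matrix flagging the missing elements
    println("Number of missing values: ",
            double(sumc(sumr(isdotnan(mX)))));

    // deleter() with one argument drops every row that contains a missing value
    mClean = deleter(mX);
    println("Rows without missing values: ", rows(mClean));
}
```

Dropping incomplete rows with deleter() is only one possible way to deal with missing values; whether it is appropriate depends on the analysis.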
stackloss.mat example

stack/data/stackloss.mat

    4 21
    // Stackloss.mat
    //
    // Hoeting, J. A., Madigan, D., and Raftery, A.E. (1996) "A Method for
    // Simultaneous Variable Selection and Outlier Identification in Linear
    // Regression," Journal of Computational Statistics and Data
    // Analysis, 22, 251-270.
    //
    // Data source:
    // Brownlee, K. A. (1965), "Statistical Theory and Methodology in
    // Science and Engineering", 2nd edition, New York:Wiley.
    //
    // Rows contain:
    // Air Flow, Water Temperature, Acid Concentration, Stack Loss
    80 80 75 62 62 62 62 62 58 58 58 58 58 58 50 50 50 50 50 56 70
    27 27 25 24 22 23 24 24 23 18 18 17 18 19 18 18 19 19 20 20 20
    89 88 90 87 87 87 93 93 87 80 89 88 82 93 89 86 72 79 80 82 91
    42 37 37 28 18 18 19 20 15 14 14 13 11 12  8  7  8  8  9 15 15

Great format for saving your data: the comments keep the reference and the variable descriptions together with the numbers.
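Because this file stores one variable per row and a .mat file keeps no variable names, a possible next step is to transpose the matrix and save it in a format that does retain names. The sketch below assumes the file is found at the path shown on the slide; the output file name and the spelling of the variable names are illustrative choices, not part of the original material.

```ox
#include <oxstd.h>

main()
{
    decl mX, asVar;

    mX = loadmat("stack/data/stackloss.mat");   // 4 x 21: one row per variable
    println("Loaded a ", rows(mX), " x ", columns(mX), " matrix");

    // One mean per row, i.e. per variable, as a quick check of the input
    println("Variable means:", meanr(mX));

    // Names in the order given in the file comments (spelling is an assumption)
    asVar = {"AirFlow", "WaterTemp", "AcidConc", "StackLoss"};

    // Data files expect variables in columns, hence the transpose;
    // the output file name is hypothetical
    savemat("stack/data/stackloss.in7", mX', asVar);
}
```

After this, loadmat("stack/data/stackloss.in7", &asVar) would return the data as a 21 x 4 matrix together with the names.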

