August 26, 1999
Dr. William Connett

QUICK REFERENCE FOR PC SAS

1. To begin.

You can log on from any PC that has SAS installed. On this campus, SAS 6.12 is available on the PCs in the computing labs that run Windows NT (SSB 103 or 452, and Benton 232). Bring a formatted 3.5" diskette, sit down at one of these machines, and click on the Windows flag in the lower left-hand corner. Go to the Programs line and pick SAS. The first time you begin SAS, or whenever you want to format a new diskette for SAS, choose the option to "Download Connett's profile to your a disk." This task need only be done once per diskette. Wait a minute until it is completed. Then choose the option to start SAS from your a-disk using Connett's profile. Be sure you choose the correct one (there are two buttons which start SAS from the a-disk).

SAS will start, and your work session will be configured so that the three most important windows (the program, log, and output windows) each have a command line, the program window has numbered lines, and each window has a distinctive color, so it is easy to tell where you are. The program window is cyan, the log window is yellow, and the output window is white. Your printer will be configured for the SAS monospace font, 9 pitch, 135 characters/line and 44 lines/page, printing in landscape mode.

The following commands, which may be entered on any command line, produce this configuration (it is the default when you start from your a-disk). You may also use them to customize your work area.
(I will use the @ key to indicate enter.)

    pmenu@              (puts a command line in each window)
    numbers on@         (puts numbered input lines in the program window)
    pgm@                (moves you to the program window)
    color back cyan@    (sets the background color of the current window to cyan)
    log@                (moves you to the log window)
    color back yellow@
    wsave@              (saves the new configuration to the a-disk)

Another extremely useful window is the keys window, which is opened by typing keys@ on any command line (close it by typing end@ on the command line in that window). This window stores the custom definition of each key, which you can eventually modify as you need.

SAS provides a complete working environment, almost a mini operating system, which is fairly uniform across all platforms. Once you learn it here, you will be able to work on many different platforms. You will do your work either by using the SAS product INSIGHT to analyze various SAS data sets, or by creating program files (e.g. prog1.sas) and data files (e.g. prog1.dat) and then printing out the contents of your output window to give to me.

Your first task is to use the product INSIGHT to reproduce the various graphs found in Chapter one of our textbook, Petruccelli, Nandram, and Chen, Applied Statistics. To do this, go to any command line available to you and type the word INSIGHT@. A dialog box will appear asking which library of SAS data sets you want to visit. The library M132 contains all of the data sets on the diskette that came with the book. Highlight this choice, and press @. The second column will then display the contents of this library. Highlight the entry ELECE and press @. This will bring the data set into a spreadsheet-like display. You can then construct the various graphs that are needed using the Analyze menu on the menu bar, and following the instructions in the "Doing it in SAS" manual.
You do not have to print them all out, but you should print out one or two, because printing is a little tricky. You should also add titles and footnotes by typing TITLE@ on any available command line and entering the appropriate titles in the dialog box that appears. You exit the dialog box by typing end on the command line. A similar procedure will allow you to put your name and the date in a footnote.

Your next task is to create a SAS data set containing the 15 measurements of my desk that were taken in class on Thursday. First create a file containing the data that you wish to analyze. Go to the program window (type pgm@). This window is your basic text editor, and the interface is fairly intuitive. Next to the numbered lines you can enter text, type over text, or delete text. In the numbered lines you may type editing commands such as

    i     (insert one line here; ib to insert before, ia to insert after)
    i2    (insert 2 lines here)
    d     (delete this line)

There are also block commands for deleting, moving, or copying blocks of text. For example, to mark a block of text for deleting, type the double letters dd on the first and last lines of the block. Enter will remove the block. To move a block, use mm ... mm to mark the block, and then enter b or a on the line before or after which you wish the block to move.

Now go to the first line and begin to type in the raw data for this problem. Each line holds the group number (1, 2 or 3), then a space, then the measurement taken by that group. You should have 15 lines of input. When you are done, press the home key to go to the command line, and type: file 'desk.dat'@.

Now create the SAS program to analyze this data. Type clear on the command line of the program window to get a clean sheet. Then create a file called 'desk.sas' to contain the SAS program. (All the SAS commands in this handout are in capital letters. SAS no longer requires this.) Notice that all programs end with a "RUN;" command.

    OPTIONS LINESIZE = 135 PAGESIZE = 44 PAGENO = 1;
    FILENAME PROBLEM 'DESK.DAT';
    LIBNAME A 'A:\';
    DATA A.DESK;
      INFILE PROBLEM;
      INPUT GROUP M;
    PROC PRINT;
    PROC SORT;
      BY GROUP;
    PROC MEANS MEAN STD VAR;
      VAR M;
      BY GROUP;
    RUN;
    QUIT;

The data step creates a SAS data set which contains the values of the two variables GROUP and M for each of the 15 measurements taken, and stores this data set on your floppy under the name 'desk.sd2'. The first PROC step then prints out the new SAS data set, the second sorts it by group so that the BY statement can be used in the next step, and the third calculates the mean, standard deviation, and variance of the five measurements taken by each group.

First save this file by pressing the home key, and then typing the command: file 'desk.sas'@. To run this program, type sub@ on the command line, and watch the log window to see what is happening. If all goes well, there will be no red lines indicating errors in the log, and you will be taken to the output window to view your results. If all does not go well, go to the log window and see what your errors are, then return to the program window, recall the program you just ran by typing recall@, and fix the errors.

Eventually you will get the correct answer, and you can go to the command line of the output window and type dlgprt@, which will open a print dialog box. Check that you have selected landscape output and a constant-width font, such as SAS monospace, 9 pitch. This should allow you to obtain 135 characters/line and 44 lines/page. After you print the output window, be sure to clear it (type clear@ on its command line) or you will end up printing the same output again.

To bring back a SAS program that you have stored on your a-disk, say desk.sas, go to the command line in the program window, and type include 'desk.sas'@. To bring back a data file, say desk.dat, type include 'desk.dat'@.

MORE ON SAS

SAS programs always contain two steps: first a SAS data set is created or retrieved with a "DATA" step, and then this data set is analyzed with a "PROC" step.
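As a minimal sketch of this two-step pattern, the program below reads a few measurements typed directly into the program with DATALINES and then summarizes them. The data set name SAMPLE and the four data values are hypothetical, chosen only for illustration; the variable names GROUP and M follow the desk example above.

```sas
/* DATA step: create a temporary SAS data set.             */
/* SAMPLE and the values below are hypothetical examples.  */
DATA SAMPLE;
   INPUT GROUP M;
   DATALINES;
1 60.1
1 59.8
2 60.4
2 60.0
;

/* PROC step: analyze the data set just created. */
PROC MEANS DATA=SAMPLE MEAN STD;
   VAR M;
RUN;
```

Because the data are included in the program itself, this sketch needs no FILENAME or INFILE statement. For larger data sets, the external-file approach used for desk.dat is more practical.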

The "DATA" step

Either you will create your own raw data file, or you will access a data set that has already been created.

(1) "Raw data". An example of this was given above.

(2) A preexisting SAS data set, such as the UMSL questionnaire, can be accessed in a simpler fashion. It is often convenient to convert a large data set into the SAS format once and then use parts of it again and again in your analysis; all you then need to do is retrieve it. For example, if you have a SAS data set on your diskette in the a-drive called umsl.sd2, and it contains responses to Q3 and Q4, then the following brings in the big data set and keeps only the responses to Q3 and Q4. The PROC step will then print out the contents.

    LIBNAME A 'A:\';
    DATA;
      SET A.UMSL;
      KEEP Q3 Q4;
    PROC PRINT;
    RUN;

You may also include in any data step commands to edit the data. For example:

    subsetting if:          IF Q3 = 'A';
    if..then:               IF Q3 = 'A' THEN Z = 1; ELSE Z = 0;
    create new variables:   Z = 2*Y;

The "PROC" step

(1) Wilcoxon (where METHOD is the condition or class variable, and SCORE the ordered or rating variable).

    PROC NPAR1WAY WILCOXON;
      CLASSES METHOD;
      VARIABLES SCORE;

(It does not report U or V but an equivalent statistic, S = U + m(m+1)/2, where U is the number of times a B is less than an A, counted over all pairs of one A and one B, and m is the number of A's. This only holds exactly if there are no ties. It also reports the z statistic.)

(2) Pearson (where X and Y are both class or condition variables).

    PROC FREQ;
      TABLES X*Y/CHISQ;

(This produces a chi-square statistic which is the square of the statistic in Spitznagel.)

(3) Kendall (where X and Y are both ordered or rating variables).

    PROC CORR KENDALL;
      VARIABLES X Y;

(This produces Kendall's tau, which is defined as tau = S/(n(n-1)/2). Test for large S by testing for large tau.)

(4) To print the data set:

    PROC PRINT;

(5) To plot the data set:

    PROC PLOT;
      PLOT Y*X;

To plot two graphs on the same axes:

    PROC PLOT;
      PLOT Y*X='*' Z*X='+'/OVERLAY;

(6) Titles are an important part of the documentation of your program. They contain information that will appear on every printout you produce. Put them at the beginning of the program; they can then be changed as you move through the program.

    TITLE1 'PROBLEM #1';
    TITLE2 'FOR MATH 132';
    TITLE3 'BY JOE JONES';

(7) Normal statistics for the continuous random variable X.

    PROC MEANS;
      VARIABLES X;

(8) t-test for the continuous random variable Y classified by the class variable X.

    PROC TTEST;
      CLASSES X;
      VARIABLES Y;

(9) Regression fits the model E[Y] = a + bX.

    PROC GLM;
      MODEL Y = X;

To capture more information from GLM for later analysis, add the line

      OUTPUT OUT = NEW P = PRED R = RESID;

This creates a new SAS data set "NEW" which contains the new variables PRED and RESID, which can later be plotted or analyzed.

(10) Contingency tables.

    PROC FREQ;
      TABLES Q1*Q7/CHISQ CELLCHI2 EXPECTED;

(11) To classify the data by one variable and calculate the means for each subclass, use the BY option. The data must first be sorted by that variable.

    LIBNAME A 'A:\';
    DATA;
      SET A.UMSL;
      KEEP Q1 Q4;
    PROC SORT;
      BY Q1;
    PROC MEANS;
      VARIABLES Q4;
      BY Q1;

*******************************************************************
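Item (9) notes that the variables PRED and RESID can be plotted later. The following is a minimal sketch of one way to do that, combining items (5) and (9) to plot the residuals against the predicted values. The data set name XY and its five data values are hypothetical, made up only so that the program is self-contained.

```sas
/* Hypothetical data, made up for illustration only. */
DATA XY;
   INPUT X Y;
   DATALINES;
1 2.1
2 3.9
3 6.2
4 7.8
5 10.1
;

/* Fit E[Y] = a + bX and save predicted values and residuals in NEW. */
PROC GLM DATA=XY;
   MODEL Y = X;
   OUTPUT OUT=NEW P=PRED R=RESID;
RUN;

/* Plot the residuals against the predicted values. */
PROC PLOT DATA=NEW;
   PLOT RESID*PRED;
RUN;
QUIT;
```

A residual plot with no visible pattern is consistent with the straight-line model. A curved or funnel-shaped pattern suggests that the model should be reconsidered.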