| Title |
Description (Page under Construction) |
|
Some Introductory Comments about SAS... |
SAS Introduction:
SAS is statistical analysis software developed by SAS Institute in Cary, North Carolina. It has
been around for decades, and SAS Institute is probably the largest supplier of statistical software
in the nation. Some people use it but are not interested in its statistical capabilities.
These individuals like the way SAS handles very large datasets and crunches a lot of
numbers for reporting. Other people use it for modeling purposes. Some use it to
develop and maintain a data warehouse.
SAS is primarily a command-driven statistical programming language. It has
procedures, called PROCs, that are typically short commands designed to do a certain task very
quickly. There is a PROC for just about everything, from printing to estimating a
host of regression models. A good book for beginners is The Little SAS Book, written by
Lora Delwiche and Susan Slaughter. In addition to a lot of handy PROCs, SAS can be used as
a flexible programming language. However, as a programming language, SAS will appear
quirky if you are used to working in other languages.
The purpose of the following downloads is to give you some help in getting up to speed
on programming in SAS as it relates to modeling. The code that follows was not
written for efficiency's sake, but to illustrate concepts. As always, I have to say that
these programs are without warranty and I accept no liability for their use or installation.
|
| Calling SAS from VB6 |
Before we get started with the SAS stuff, there are times you may want to call
SAS from a Visual Basic program. This is easy to do and can be done by leaving the
user in SAS to examine the output, or you can do everything in the background and
hide SAS completely. It's easy if you know how!
|
| Summing Variables in SAS |
One of the things SAS is quirky about is that you are forced to take a different approach
to summing a column of numbers than in other programming languages. This is
because SAS automatically reads the rows (observations) for you, rather than you providing
the code to do so as you would in Visual Basic and other languages. To keep a running
sum of a column in a DATA step, you use the RETAIN statement. This is highlighted in a simple
example.
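As a rough illustration of the idea (not the downloadable program itself), a running total with RETAIN might look like the sketch below. The dataset `sales` and the variable `amount` are made up for the example.

```sas
/* Hypothetical input: one numeric column named amount */
data sales;
   input amount;
   datalines;
10
20
30
;
run;

data running;
   set sales end=last;           /* last = 1 on the final observation */
   retain total 0;               /* keep total from one observation to the next */
   total = sum(total, amount);   /* SUM() ignores missing values of amount */
   if last then put "Column total: " total;   /* write the grand total to the log */
run;
```

Without RETAIN, `total` would be reset to missing at the top of every DATA step iteration, which is why the statement is needed here.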
|
| Reordering Variables |
It's easy to reorder the variables in a SAS dataset if you know how. Simply use a
RETAIN statement before the SET statement.
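A minimal sketch of the trick, using the `sashelp.class` sample dataset that ships with SAS (the output dataset name is arbitrary):

```sas
data class_reordered;
   retain Sex Age Name;   /* named first, so these variables lead the column order */
   set sashelp.class;     /* remaining variables follow in their original order */
run;

proc contents data=class_reordered varnum;   /* list variables in position order */
run;
```

The RETAIN statement works here only because it is the first place the compiler sees those names; it does not change any values, since variables read by SET are retained anyway.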
|
| Handling Missing Data |
You can use an ARRAY statement to quickly run through a dataset and substitute
a selected value for missing information.
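One way this could look, assuming a small made-up dataset `scores` and 0 as the substitute value:

```sas
/* Hypothetical data with missing values (shown as periods) */
data scores;
   input id x1 x2 x3;
   datalines;
1 5 . 7
2 . 3 .
3 4 6 8
;
run;

data scores_filled;
   set scores;
   array nums {*} x1-x3;                      /* the variables to check */
   do i = 1 to dim(nums);
      if missing(nums{i}) then nums{i} = 0;   /* 0 is the chosen substitute */
   end;
   drop i;                                    /* loop counter is not needed in the output */
run;
```

Listing `_numeric_` in the ARRAY statement instead of `x1-x3` would cover every numeric variable, including `id`, so name the variables explicitly when that is not what you want.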
|
| SAS Date Manipulation |
Date manipulation can be tricky. Here is a useful programming example in SAS.
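For orientation, here is a short sketch of three commonly used date functions. It is not the downloadable example, and the dates are arbitrary.

```sas
data _null_;
   d = mdy(3, 15, 2024);                               /* build a SAS date: 15 March 2024 */
   next_month_end = intnx('month', d, 1, 'end');       /* last day of the following month */
   months_between = intck('month', d, '01JAN2025'd);   /* month boundaries crossed */
   put d= date9. next_month_end= date9. months_between=;   /* write results to the log */
run;
```

SAS stores dates as the number of days since 1 January 1960, so a format such as DATE9. is needed to display them readably.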
|
| Numeric & Character Variables |
Converting Numeric to Character values and vice-versa is easy if you know how.
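A minimal sketch of both directions, with made-up variable names: PUT converts numeric to character and INPUT converts character to numeric.

```sas
data convert;
   num = 1234;
   char_from_num = put(num, 8.);       /* numeric -> character, right-aligned in 8 columns */
   char_trim = strip(put(num, 8.));    /* same, with leading blanks removed */
   txt = '56.7';
   num_from_char = input(txt, 8.);     /* character -> numeric */
run;

proc print data=convert;
run;
```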
|
| Merging Datasets |
Merging (joining) datasets is easy using the MERGE statement along with
the IN= dataset option. Be sure both datasets are sorted by the BY variables first.
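The pattern can be sketched as follows. The datasets `customers` and `orders` and the flag names `in_c` and `in_o` are invented for the example.

```sas
data customers;
   input id name $;
   datalines;
1 Ann
2 Bob
3 Cy
;
run;

data orders;
   input id amount;
   datalines;
2 50
3 75
4 20
;
run;

/* MERGE with BY requires both inputs sorted by the BY variable */
proc sort data=customers; by id; run;
proc sort data=orders;    by id; run;

data matched cust_only;
   merge customers(in=in_c) orders(in=in_o);   /* IN= flags show which input contributed */
   by id;
   if in_c and in_o then output matched;       /* id found in both datasets */
   else if in_c then output cust_only;         /* customer with no order */
run;
```

The IN= variables are temporary and are not written to the output datasets, so copy them to regular variables if you need to keep them.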
|
| Making .PDF Files |
The ODS PDF statement in SAS can make you a nice-looking .pdf file for any output,
including graphs.
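A minimal sketch, assuming the `sashelp.class` sample data and a hypothetical output file name:

```sas
ods pdf file="class_report.pdf";   /* hypothetical path; everything until CLOSE goes to this file */

proc print data=sashelp.class;
run;

proc sgplot data=sashelp.class;
   scatter x=height y=weight;      /* graphs are routed to the PDF as well */
run;

ods pdf close;                     /* finish writing the file */
```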
|
| Simple Macro |
This program shows you how to write a simple macro, which is very similar
to a subroutine in other languages in that you can pass it parameters. The secret
to using a macro is the "&" symbol: SAS treats it as the trigger for text
substitution, resolving the macro variable dynamically within the program.
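As an illustration only, a small macro with two parameters might look like this. The macro name `print_top` and its parameters are made up for the example.

```sas
%macro print_top(ds=, n=5);
   title "First &n rows of &ds";    /* &n and &ds are replaced by the parameter values */
   proc print data=&ds(obs=&n);
   run;
   title;                           /* clear the title afterwards */
%mend print_top;

/* Call the macro; the text is substituted before the generated code runs */
%print_top(ds=sashelp.class, n=3)
```

Note the double quotes around the title: macro variables are not resolved inside single quotes.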
|
| Using SYMPUT in your Macro |
In order to perform the same level of coding in SAS as you do in other
programming languages, you need to understand the CALL SYMPUT routine. It
allows you to take information from the DATA step and store it in a macro variable for later use.
You really can't live without CALL SYMPUT if you're programming in SAS.
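A short sketch of the idea, using `sashelp.class`; the macro variable name `max_age` is an assumption for the example.

```sas
data _null_;
   set sashelp.class end=last;
   retain max_age 0;
   max_age = max(max_age, age);     /* track the largest age seen so far */
   /* on the last row, copy the value into a macro variable */
   if last then call symput('max_age', strip(put(max_age, 8.)));
run;

%put Oldest age in the data: &max_age;

/* The stored value can now drive later steps */
data oldest;
   set sashelp.class;
   where age = &max_age;
run;
```

The macro variable is only available after the DATA step that creates it has finished running, so reference it in a later step rather than in the same one.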
|
| Removing 1st Character of Variable Name |
When you do programming in SAS, sometimes you have to manipulate the names of the
variables. This program drops the 1st character of the variable name for ALL variables
in your dataset. This can be modified to other things very easily. SWEET!
|
| Cross-Tab Macro |
SAS can develop crosstabs easily through PROC FREQ and its TABLES statement. Using
these, you can print out a nice-looking n x n matrix of counts and percentages.
However, there is no super quick way to place this information in a SAS dataset unless
you do a little programming with PROC TRANSPOSE and use the SPARSE option
on the TABLES statement. Here is a macro that does all the hard stuff for you. Believe me,
this will save you a great deal of time over the years!
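The core of the approach, without the macro wrapper, might be sketched like this using the `sashelp.cars` sample data. The output names `counts` and `crosstab` and the `n_` prefix are arbitrary choices.

```sas
proc freq data=sashelp.cars noprint;
   tables origin*type / sparse out=counts;   /* SPARSE keeps cells with a zero count */
run;

/* Turn the long list of cells into one row per origin, one column per type */
proc transpose data=counts out=crosstab(drop=_name_) prefix=n_;
   by origin;     /* counts is already ordered by origin, then type */
   id type;       /* values of type become the new column names */
   var count;
run;

proc print data=crosstab;
run;
```

Without SPARSE, combinations that never occur are left out of `counts`, and the transposed table ends up with missing values instead of zeros.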
|
| Importance Ratio |
Sometimes the analyst has the problem of having TOO many variables to work with. There
are a number of alternatives in trying to condense the sheer volume of variables down to
some manageable number. You could perform some Factor Analysis on the data. A quick alternative,
however, might be to calculate an index that measures the importance of the variable in
relation to the dependent variable. This routine shows you how to calculate such a ratio
that can be used in a univariate sense to trim down the number of variables for your regressions.
The higher the 'importance ratio', the more predictive the variable from a relative
perspective.
|
| Identify / Remove Correlated Vars |
In regression analysis, one of the most frustrating challenges is that sometimes
variables are too correlated with one another to all be included as predictors. This program
provides at least one satisfying solution to that problem.
|
| Delete Missing Data Vars |
In regression analysis, you typically do not want to include variables with a large
percentage of missing information. This program deletes them for you automatically.
|
| Variable Clustering |
In regression analysis, you might have data that is correlated, so you could
identify clusters of information that are attempting to explain the same thing. This
program shows you how and extracts the recommended variable from each cluster that you
might want to try in your regression.
|
| Bootstrap Resampling |
Bootstrap resampling is a way to determine if your regression model is over-optimistic
with regard to accuracy. The idea is to automatically create resamples of your original
dataset of the same size a number of times and re-estimate your model each time. Next,
you would measure its accuracy each time (KS, R-sq, AIC, etc.) and average the results.
This procedure is especially useful when you have a limited number of bads and cannot
afford to have a separate hold-out sample. Note: this is a sampling procedure WITH
replacement, so you could have an observation from your original dataset appear
more than once in each bootstrap sample. Great idea!
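One possible way to sketch the resampling step is with PROC SURVEYSELECT, which may differ from how the downloadable program does it. The example assumes a simple linear regression on `sashelp.class` as a stand-in for your own model, with 200 replicates chosen arbitrarily.

```sas
/* Draw 200 resamples, each the size of the original data, WITH replacement */
proc surveyselect data=sashelp.class out=boot seed=12345 noprint
                  method=urs      /* unrestricted random sampling = with replacement */
                  samprate=1      /* each resample has as many rows as the original */
                  outhits         /* write one row per selection, so repeats appear */
                  reps=200;
run;

/* Re-estimate the model on every resample and keep R-square */
proc reg data=boot outest=boot_est rsquare noprint;
   by replicate;
   model weight = height;
run;
quit;

/* Average the accuracy measure across resamples */
proc means data=boot_est mean std;
   var _rsq_;
run;
```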
|
| Jackknife Estimation |
In this example, Jackknife sampling holds out a single observation from the
original dataset for model estimation. However, it does it repeatedly for as many times
as there are observations in your data. Jackknifing can be used for a variety of purposes,
but one application is to examine the influence of each observation on your regression
estimates. If you see the estimate for a particular coefficient change a significant
amount, then you might view that observation as a possible outlier. This example uses
logistic regression to estimate the model and to score the single holdout observation
each time.
|
| Split Sample Jackknife |
Simple Jackknife estimation typically uses a single holdout observation for analysis
purposes. A more generalized approach is Split Sampling where the size of the holdout sample
can vary. In this example, a holdout sample of 10% is selected, repeated 20 times, and
scored. This is another good method of testing the accuracy of your model - especially
when the number of bads is small.
|
| Scanning Macro Var Lists |
This is a neat code example where you make a list of characters that can be
used to scan a dataset to flag, say, invalid records. Cool use of the SCAN function
in SAS.
|
| VARS Upper/Lower Case |
Here is a neat bit of code to automatically change your variable names to upper or
lower case. It uses %SYSFUNC and the RENAME statement. The macro also allows you to do
a search and replace on a character value within the variable name in case there
are some troublemakers there. A good example is to replace '_' with 'x' in all the variable
names. This part uses the TRANSLATE function. Very quick routine!
|
| Alphabetically Reordering Variables |
Although I did not write this code, this program shows you how to automatically
reorder your variables alphabetically. Very handy.
|
| Dynamically Reading Variable Names |
This program shows you how to dynamically retrieve variable names using the %SYSFUNC macro function
inside a DO loop in SAS.
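A sketch of the general technique, which may differ from the downloadable program. The macro name `list_vars` is made up; OPEN, ATTRN, VARNAME and CLOSE are standard SAS functions called through %SYSFUNC.

```sas
%macro list_vars(ds);
   %local dsid nvars i rc;
   %let dsid = %sysfunc(open(&ds));                  /* open the dataset; 0 means failure */
   %if &dsid %then %do;
      %let nvars = %sysfunc(attrn(&dsid, nvars));    /* number of variables */
      %do i = 1 %to &nvars;
         %put Variable &i: %sysfunc(varname(&dsid, &i));   /* name of the i-th variable */
      %end;
      %let rc = %sysfunc(close(&dsid));              /* always close what you opened */
   %end;
   %else %put Could not open &ds;
%mend list_vars;

%list_vars(sashelp.class)
```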
|
| Handling Missing Values |
This program shows you how to take your data and substitute the mean, mode, or median
values for missing values. Useful in credit scoring applications where you typically have
25-30% of your data missing. If you do not do something about missing values, your
regression procedure will automatically skip the observation, substantially reducing the
size of your data.
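As one possible shortcut for the mean or median case (not necessarily what the downloadable program does), PROC STDIZE with the REPONLY option replaces missing values without standardizing anything. The dataset and variable names are made up.

```sas
/* Hypothetical data with missing values */
data scores;
   input id x1 x2 x3;
   datalines;
1 5 . 7
2 . 3 .
3 4 6 8
;
run;

/* REPONLY: only replace missing values, leave the other values untouched */
proc stdize data=scores out=scores_imputed reponly method=mean;
   var x1-x3;      /* use method=median for the median instead */
run;

proc print data=scores_imputed;
run;
```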
|
| Handling Missing Values |
Sometimes you may want to handle missing values a little differently than using
the mean, mode, or median values as proxies. Typically in scoring applications, if the
event probabilities are significantly different between accounts that have missing values
and accounts that have valid values, then assigning proxies from sample averages could
be less than optimal. This program uses a discretizing process to collapse a continuous
variable into intervals and determine the closest proxy for missing using the sample
event probabilities.
|
| Macro Variables in Regression |
This program illustrates how to get the variable names found significant in a logistic
regression and to use them later in a dynamic way in your program. Very useful.
|
| Creating Credit Scoring Dataset |
If your modeling data is in the form of monthly snapshots, you may have
no built-in easy way to determine if a set of accounts went "BAD" over the performance period.
This program illustrates how to create your dependent variable for a credit scoring
application and merge it with your attribute data obtained from the observation point.
|
| Correlations |
Look at correlations across cross sectional units over time.
|
| Graphing Logit Models |
This program illustrates how to automatically graph logit models for each of
the independent variables in the regression. Very useful in determining where
the model variables are most sensitive and if the relationship to the event
probability is S-shaped. Contains more advanced macro programming.
|
| Validating Logit Models |
This program illustrates how to validate a logit or probit model if you are given
the predicted probability and the event variable for a set of observations. This
program produces a ranking table as well as a lift curve (sometimes called a power
curve). Very useful.
|
| Implementation Code |
This program illustrates how to dynamically produce the implementation code
associated with a logit model so you can score a data set easily. Again, very useful.
|
| Single Sided TOBIT |
This program shows you how to estimate a single sided tobit model in SAS (PROC LIFEREG)
and how to score a dataset with implementation code. Output matches SHAZAM
econometrics program. Note: This may not work on SAS version 8.2+.
|
| Double Sided TOBIT |
This program shows you how to estimate a double sided tobit model in SAS (PROC QLIM)
and how to score a dataset with implementation code. Output matches SHAZAM
econometrics program which is provided on the MISCELLANEOUS download page.
This will not work on SAS version 8.2+.
|
| Pairwise & Bivariate Correlations |
Have you ever needed to get information from the correlation matrix in a useful and
meaningful way? Well, it's a little tricky, but this program shows you how by pulling
out the upper right hand portion of the matrix into useful tables for analysis purposes.
|
| Stacking and Unstacking Data |
Using Macros, this shows how you can stack and unstack your data if you are working
with cross sectional data such as state and county level information.
|
| Shading Recessions |
Useful for showing economic recessions - simple line graph.
|
| Shading Recessions |
Plotting dual axis graphs with recession shading.
|
| Creating a Slideshow |
Simple new proc in SAS creates a PDF slideshow and allows you to import pictures.
|
| State Thematic Maps |
State Thematic Maps are easy in SAS if you know how!
|
| County Thematic Maps |
County Thematic Maps are easy in SAS if you know how!
|
| Color Based Scatter Plots |
SAS can provide you with a Scatter Plot (XY graph) that is grouped by colors. This is an
excellent way to show differences in a dataset.
|
| Bivariate Granger Causality |
The Bivariate Granger Causality Test is a good way to determine if one variable is a leading indicator
of another variable. This knowledge is often desirable in forecasting applications. The program here is done using a pooled data format.
|
| Multiple Bivariate Granger Causality |
The Bivariate Granger Causality Test in this program is automated to include testing for numerous variables automatically from a list of variables supplied by the user. Again, it is done using a pooled framework.
|
| Mapping Zip code locations |
If you know the 5-digit ZIP code, you can easily map locations by state(s).
|
| Mapping locations with Radius |
Shows how to map customers around a 10 mile radius from a business. All
you need is lat/long.
|
| Mapping locations with Radius |
Instead of State, let's look at counties. All
you need is lat/long and the county and state FIPS code.
|
| Mapping locations with Custom Legend Texts |
Use proc format to change legend texts.
|
| Summarizing and Counting Data Automatically with One-to-Many Relationships |
Summarizing data with Character Data can be difficult. Here is a
macro that automates the process.
|
| Useful Frequency Rpt |
This is a useful program written by Chris Swenson for consolidating Frequencies for Character and Numeric Variables. Check out his website at http://sas.cswenson.com/. Nicely done, Chris.
|