# Stata commands

**NOTE:**
This is a very brief summary of the commands covered in class. You absolutely must
have a look at the online help for the command you need to figure out (`whelp` *command*),
and you should consult the manual for a more extensive understanding of how a given command works,
as only very basic usage is given here.
The notation loosely follows the Stata convention. The commands are set in `typewriter` font
and are to be typed exactly as spelled (or abbreviated as shown by the __under__lined text),
the entries to be filled in by the user are shown in *italics*, and optional elements are put into
`[`square brackets`]`.

## Stata guts and concepts

Color: context

Syntax: on-line, context

Data types and storage formats: on-line,
context.

Missing values: on-line,
context: 1, 2.

`_n` and `_N`: on-line,
context

Variable list: on-line, context: 1

Condition qualifiers: on-line in,
if, context: 1,
2, 3

`by : ` construct: online, context:
1

Return values: `return, ereturn`

# Stata commands by type

(scroll down for commands in alphabetical order)
## Interface and usability

Help and search: `help, whelp, search, findit`

Log files: `log, cmdlog`

Hand calculator: `display`

Exit: `exit`

## Data handling

File operations: `clear, use, ls, save, infile, outfile, sysuse, preserve, restore`

Operations on variables: `generate, label, replace, egen, mvencode, mvdecode, keep, drop, range, recode`

Memory: `memory, set memory, compress`

Looking at data: `describe, list, browse, compare, count`

Sorting: `sort`, `gsort`,
`aorder`, `order`, `move`

Labels and notes: `label, notes`

## Basic summaries

Means, variances, medians, percentiles: `summarize`.

Tabulations: `tabulate, table`.

Correlations, covariances: `correlate, pwcorr`.

Other: `inspect, lv`

## Graphics

Histograms: `histogram`

Box plots: `graph box`

Scatter plots: `scatter, twoway`

## Estimation routines

The summary of the estimation commands: online, context.

Post-estimation commands: `test, testnl, lincom, nlcom, predict, ereturn`

Basic methods: `regress, boxcox`

Regression diagnostics:
online, context;
commands:
`hettest` (heteroskedasticity),
`ovtest` (nonlinearity),
`imtest` (distribution of the regression errors),
`dwstat` (Durbin-Watson test),
`archlm` (LM test for ARCH),
`vif` (collinearity)

## Programming

Executing do-files: `do, run`

Output: `capture, display, quietly, noisily, more`

Macros: `local, global`

Cycles: `foreach, forvalues`

# Stata commands in alphabetical order

`aorder [`*varlist*`]`
Sorts the variables in *varlist* in alphabetic order and moves them to the front of the dataset.
On-line, context

`archlm [, `__l__ags(*#*)]
Lagrange multiplier test for autoregressive conditional heteroskedasticity
On-line, context

`avplot [`*varlist*]
Plots the added variable plot of dependent variable vs. variables in *varlist*, one by one,
conditional on other regressors.
Available only after `regress`
`avplots` gives added variable plots for all regressors in the model.
On-line, context

`browse [`*varlist*`] [if `*exp*`]`
Browse the data
On-line, context

__cap__ture *Stata command*
Executes a command, suppressing its output, and proceeds regardless of the error status.
If you need to see the output and the error message, enter __cap__ture __noi__sily.
On-line, context
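
A minimal sketch of the difference between `capture` and `capture noisily`, assuming the `auto` data set that ships with Stata is available through `sysuse`. The variable name `no_such_variable` is hypothetical and is chosen so that the command fails; `_rc` is the return code that Stata sets after a captured command (zero means no error).

```stata
sysuse auto, clear

* The variable does not exist, but execution continues silently
capture summarize no_such_variable
display _rc

* Same command, but the error message is shown as well
capture noisily summarize no_such_variable
```

Checking `_rc` right after `capture` is the usual way to branch on whether the command succeeded.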

`cd` *directory*
Change the current directory.
Note that you can use both slash (/) and backslash (\)
under Windows, but only slash (/) under Unix.
Put quotes around the full directory name
if it contains spaces.
On-line, context

`clear`
Clears Stata memory. USE WITH CAUTION!
On-line, context

`cmdlog using` *filename*
`cmdlog close`
Logs all the commands issued by the user
On-line, context

`compare `*varname1 varname2*` [if `*exp*`] [in `*range*`]`
Compares the values of two variables
On-line, context

`compress [`*varlist*`]`
Reduces the amount of memory needed for the data by bringing it to the smallest storage type
needed.
On-line, context

__cor__relate [*varlist*`] [if `*exp*`] [in `*range*`]`
Computes the correlation between two or more variables based on the subset of the
observations that are available for all variables.
On-line, context

`count [if `*exp*`] [in `*range*`]`
Shows the number of observations satisfying the `if`/`in` conditions
On-line, context

__d__escribe, [ __s__hort`]`
Shows size of the data set, the number of observations, the variables in the data set, their types and labels
On-line, context

__di__splay *exp*
Evaluates an expression and outputs the result
On-line, context: 1

`do `*filename*` [`*arguments*`] , [nostop]`
Executes the specified do-file.
`nostop` allows execution to continue even if an error occurs.
On-line, context: 1, 2

`drop `*varlist*
Deletes specified variables from the current data set in memory.
On-line, context

`drop [if `*exp*`] [in `*range*`]`
Deletes specified observations from the current data set in memory.
On-line, context

`dwstat`
Performs Durbin-Watson test of residual autocorrelation following `regress`
The data must be `tsset`
On-line, context

`egen [`*type*`] `
*newvar*` = `*fcn*`(`*arguments*`)
[if `*exp*`] [in `*range*`], [`*options*`]`
Extensions to generate.
On-line, context
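
As a hedged illustration of what `egen` adds over `generate`, the sketch below computes a group mean and the deviation from it. It assumes the `auto` data set shipped with Stata; the new variable names `mean_price` and `price_dev` are made up for the example.

```stata
sysuse auto, clear

* Mean of price within each group defined by foreign
egen mean_price = mean(price), by(foreign)

* Deviation of each car's price from its group mean
generate price_dev = price - mean_price
summarize price_dev
```

`mean()` is only one of many `egen` functions; the help file lists the rest.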

__eret__urn __li__st
Shows the stored results of the previous estimation command
On-line, context

`exit, [clear]`
Exit Stata
`clear` option shows your understanding that it is OK to lose unsaved data.
On-line, context

`findit `*text*
Searches for *text* in the help files available on the current machine, and over the Internet.
On-line, context

`foreach `*lclname*` of [ varlist `*varlist*` | numlist `*numlist*` | local `*macro*` | global `*macro*` ] {`
` ...`
`}`
`foreach `*lclname*` in `*arbitrary_list*` {`
` ...`
`}`
Performs the actions in the body of the cycle for each value of `` `lclname' `` taken from
the specified list or macro.
On-line, context

`forvalues `*lclname*` = `*range*` {`
` ...`
`}`
Performs the actions in the body of the cycle for each value of `` `lclname' `` in
the specified *range*.
If an arbitrary *numlist* is to be used, see `foreach`.
On-line, context
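
The two cycle commands can be sketched as follows. This is only an illustration, assuming the `auto` data set shipped with Stata; the macro names `v`, `i` and `total` are arbitrary.

```stata
sysuse auto, clear

* foreach over a variable list: summarize each variable in turn
foreach v of varlist price mpg weight {
    summarize `v'
}

* forvalues over a numeric range: accumulate the squares of 1 through 5
local total = 0
forvalues i = 1/5 {
    local total = `total' + `i'^2
}
display `total'
```

Note that the opening brace must stay on the same line as `foreach` or `forvalues`, and the closing brace goes on a line of its own.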

__g__enerate [*type*] *newvarname*` = `*exp*
`[if `*exp*`] [in `*range*`]`
Creates a new variable and sets it equal to *exp* for the observations that satisfy the `if`/`in` conditions, and to missing otherwise.
On-line, context

`global `*gblname*` `*string*
Creates a global macro *gblname* and copies *string* to it.
`global `*gblname*` = `*exp*
Creates a global macro *gblname*, evaluates *exp* and copies the result to *gblname*.
The global macros are referred to as `$gblname`.
Ambiguities in global macro names are resolved by putting `{ }` around the name, as in `${gblname}`.
On-line, context

__gr__aph box *variable*` [if `*exp*`] [in `*range*`],
[by(`*varlist*`) ]`
Draws a box-and-whisker plot of the data
On-line, context

`gsort [+|-`*variable*`] ...`
Generalized sorting in both ascending and descending order
On-line, context
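
A short sketch of descending sorting with `gsort`, assuming the `auto` data set shipped with Stata:

```stata
sysuse auto, clear

* Most expensive cars first
gsort -price
list make price in 1/5
```

Plain `sort` would need an auxiliary negated variable to achieve the same ordering.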

`help` *command*
Displays help on the specified *command*.
On-line, context

`hettest [`*varlist*`]`
Lagrangian multiplier test for heteroskedasticity; only available after `regress`
On-line, context

__hist__ogram *variable*` [if `*exp*`] [in `*range*`]
, [`__d__iscrete __w__idth(#) bin(#) start(#) __den__sity __frac__tion __freq__uency ...]
Draws a histogram for a single variable. Look through the help file for relevant options.
On-line, context

`imtest`
Information matrix test on the residual distribution
On-line, context

`infile `*variables*` using `*filename*`, [clear]`
Reads data from the raw text file.
You can specify a dictionary
for complex tasks.
On-line, context

`inspect `*variable*` [if `*exp*`] [in `*range*`]`
Gives a small histogram, the number of values that are:
unique; positive, zero, negative; integer and non-integer; missing.
On-line, context

`keep `*varlist*
Keeps specified variables and deletes others from the current data set in memory.
On-line, context

`keep [if `*exp*`] [in `*range*`]`
Keeps specified observations and deletes others from the current data set in memory.
On-line, context

__lab__el __var__iable *varname* `"`*text*`"`
Gives a variable a label that is shown in the Variables window,
in the output of describe, tabulate, and in graphs
On-line, context

__lab__el __de__fine *labelname*` # "`*text*`" ...`
__lab__el __val__ues *varname*` [`*labelname*`]`
Defines a set of labelled values, and applies this set to the specified variable
On-line, context
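
The three `label` subcommands are typically used together. The sketch below builds a tiny data set by hand so that it does not depend on any file; the variable `smoker` and the label name `yesno` are hypothetical.

```stata
clear
input id smoker
1 0
2 1
3 1
end

* Describe the variable itself
label variable smoker "Current smoker"

* Define a reusable set of value labels, then attach it to the variable
label define yesno 0 "No" 1 "Yes"
label values smoker yesno

tabulate smoker
```

Once defined, the same `yesno` set can be attached to any other 0/1 variable with another `label values` call.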

__li__st [*variables*`] [if `*exp*`] [in `*range*`]`
Shows the entries of the data set for the specified variables (for all variables by default)
and specified observations (all observations by default).
On-line, context

`local `*lclname*` `*string*
Creates a local macro *lclname* and copies *string* to it.
`local `*lclname*` = `*exp*
Creates a local macro *lclname*, evaluates *exp* and copies the result to *lclname*.
Local macros are referred to as `` `lclname' ``.
On-line, context
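
The following sketch contrasts the string form and the expression form of `local`, and shows the brace notation for a global macro. It assumes the `auto` data set shipped with Stata; the macro names `regressors`, `k` and `stub` are made up for the example.

```stata
sysuse auto, clear

* Copy a string into a local macro and use it as a variable list
local regressors mpg weight
summarize `regressors'

* Evaluate an expression and store the result
local k = 2 + 3
display "k is `k'"

* Global macro; the braces separate the macro name from the text that follows
global stub price
display "${stub}_per_mile"
```

Without the braces, `$stub_per_mile` would be read as a reference to a different global macro named `stub_per_mile`.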

`log using` *filename*
`log close`
Logs Stata output
On-line, context

`ls [`*filename(s)*`]`
Lists the files in the current directory
On-line, context

`lv `*variable*` [if `*exp*`] [in `*range*`]`
Shows the letter values of a variable (median, fourths, eighths, etc.) to summarize the shape
of the distribution and informally assess normality.
On-line, context

`memory`
`set memory ` *#*`m`
The former displays available memory, and the latter changes the amount of memory Stata can use
On-line, context: 1,
2

`more`
`set more [on|off]`
`more` makes Stata stop and wait for the user to press a key; `set more` turns this pausing of long output on or off.
On-line, context

`move `*varname1 varname2*
Moves *varname1* to the position of *varname2*, and shifts the remaining variables, including
*varname2*, to make room.
On-line, context

`mvdecode `*varlist*` [if `*exp*`] [in `*range*`]
, mv(`*numlist* ...`)`
Changes occurrences of *numlist* to a missing value code. `mv()` is required.
On-line, context

`mvencode `*varlist*` [if `*exp*`] [in `*range*`]
, mv(#...) [`__o__verride]
Changes missings to specified number(s). `mv()` is required.
Without `override`, `mvencode`
refuses to make any changes if the numeric values already exist in the *varlist*
On-line, context
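
A small round trip through `mvdecode` and `mvencode` may make the two commands easier to tell apart. The data are typed in by hand, and the use of -9 as a missing-value code is an assumption made only for this sketch.

```stata
clear
input id score
1 10
2 -9
3 25
end

* Treat -9 as a missing-value code: it becomes a Stata missing value
mvdecode score, mv(-9)
count if missing(score)

* Turn the missing value back into the numeric code -9
mvencode score, mv(-9)
list
```

Here `override` is not needed, because after `mvdecode` the value -9 no longer occurs in `score`.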

`noisily `*Stata command*
Turns the output of the command back on, cancelling the effect of `quietly`
or `capture`.
On-line, context

__note__s [*variable*]`: `*text*
Adds notes to the whole data set or to a particular variable
On-line, context

`order `*varlist*
Moves the specified variables to the front of the data set.
On-line, context

`outfile [`*variables*`] using `*filename*`, [replace]`
Writes the specified variables (or all of the data) to a text file.
On-line, context

`ovtest , [rhs]`
Test for omitted nonlinearity
On-line, context

`predict `*newvarname*` [if `*exp*`] [in `*range*`]
, `*options*
A universal post-estimation command to obtain observation level results of
an estimation procedure, such as fitted values and residuals for `regress`,
or predicted probabilities for `probit`. The supported options /
statistics are provided in help files for the original estimation commands.
On-line, context

`preserve `
Temporarily saves your current data set in memory, to be restored later.
On-line, context

`probit `*depvar*` [`*varlist*`]
[if `*exp*`] [in `*range*`],
[`__r__obust __cl__uster(*varname*`) ...]`
Estimates the probit regression of *depvar* on *varlist*.
`robust` and `cluster` options provide corrections to the
covariance matrix of the estimates.
`predict` options: `p` for
the probability of a positive outcome (default);
`xb` for the linear prediction; `stdp` for the
standard error of the linear prediction.
On-line, context

`pwcorr [`*varlist*`] [if `*exp*`] [in `*range*`]
, [sig obs ...] `
Computes pairwise correlations.
The `sig` option reports the significance level of the test that each correlation is zero.
The `obs` option reports the number of observations on which each correlation is based.
On-line, context
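
The difference between `correlate` and `pwcorr` only shows up when some values are missing. A sketch, assuming the `auto` data set shipped with Stata, in which `rep78` has some missing values:

```stata
sysuse auto, clear

* Uses only the observations with no missing values in any listed variable
correlate price mpg rep78

* Uses, for each pair, every observation available for that pair
pwcorr price mpg rep78, sig obs
```

With `obs`, the `pwcorr` table makes the differing sample sizes visible pair by pair.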

`pwd`
Displays the current directory.
On-line, context

__qui__etly *Stata command*
Suppresses Stata output from the command, except for error messages.
Unlike capture, stops for errors.
On-line, context

`range `*varname #first #last*`[`*#obs*`]`
Generates a numerical range / grid of points
On-line, context

`recode [`*varlist*`] (`*rule*`) ...,
`__g__enerate(*newvarlist*`)`
Changes the values of numeric variables according to the specified *rule*s.
*rule* is of the form
*numlist* `| `__nonm__issing | __mis__sing = #
On-line, context
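
A sketch of several rules in one `recode` call, assuming the `auto` data set shipped with Stata; the new variable name `rep3` and the grouping itself are arbitrary choices for the example.

```stata
sysuse auto, clear

* Collapse the five repair categories into three and give missing its own code
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3) (missing = 9), generate(rep3)
tabulate rep3
```

Using `generate()` leaves the original `rep78` untouched, which makes it easy to cross-check the result with `tabulate rep78 rep3, missing`.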

`regress `*depvar*` [`*varlist*`]
[if `*exp*`] [in `*range*`],
[`__r__obust __cl__uster(*varname*`) ...]`
Estimates the linear regression of *depvar* on *varlist*.
`robust` and `cluster` options provide corrections to the
covariance matrix of the estimates.
`predict` options: __res__iduals for
the residuals; `xb` for fitted values (default); `stdp` for the
standard error of the prediction; etc.
On-line, context
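
A typical estimation session might look like the sketch below. It assumes the `auto` data set shipped with Stata; the model and the new variable names `price_hat` and `price_res` are illustrative only.

```stata
sysuse auto, clear

* Linear regression with a robust covariance matrix
regress price mpg weight, robust

* Observation-level results: fitted values and residuals
predict price_hat, xb
predict price_res, residuals

* Joint test that both slope coefficients are zero
test mpg weight

* Everything the estimation command left behind
ereturn list
```

The post-estimation commands always refer to the most recent estimation, so they must follow `regress` before any other model is fitted.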

`replace `*varname*` =`*exp*
` [if `*exp*`] [in `*range*`]`
Changes the entries of an existing variable
On-line, context
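
The interplay between `generate` and `replace` under an `if` condition can be sketched as follows, assuming the `auto` data set shipped with Stata. The variable name `heavy_price` and the 3000 lb threshold are hypothetical.

```stata
sysuse auto, clear

* The new variable is set only where the condition holds; elsewhere it is missing
generate heavy_price = price if weight > 3000
count if missing(heavy_price)

* Fill in the remaining observations by changing the existing variable
replace heavy_price = 0 if weight <= 3000
count if missing(heavy_price)
```

`generate` refuses to overwrite an existing variable, which is why the second step has to use `replace`.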

`restore, [not `__pres__erve]
Restores the data that was previously preserved.
`not` cancels the previous `preserve` without restoring the data.
`preserve` restores the data but keeps the preserved copy for a later restoration.
On-line, context
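
A sketch of the usual `preserve` / `restore` pattern, assuming the `auto` data set shipped with Stata; the local macro `n0` is introduced only to check the result.

```stata
sysuse auto, clear
local n0 = _N

* Keep a copy, then work destructively on a subset
preserve
keep if foreign == 1
summarize price

* Bring the full data set back
restore
assert _N == `n0'
```

This avoids saving and reloading a temporary file when only a short detour on a subset is needed.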

__ret__urn __li__st
Shows the results of a non-estimation command, if applicable
On-line, context

`run `*filename*` [`*arguments*`] , [nostop]`
Executes the specified do-file, suppressing the output except for errors.
`nostop` allows execution to continue even if an error occurs.
On-line, context

`sample `*#*` [if `*exp*`] [in `*range*`], [`__c__ount]
Takes a subsample of the data, either # per cent of the original data, or # observations
if `count` option is specified.
On-line, context

`save `*filename*`, [replace]`
Saves the current data set in Stata format
On-line, context

`scatter `*y-variable(s) x-variable*` [if `*exp*`] [in `*range*`],
[`*formatting options*`]`
Scatter plot of one variable against another, or several y-variables against one x-variable.
The list of the formatting options is huge and will include options on
the marker shape, size and color,
axes, space of the graph, title, and many other things.
Study the online help for details.
On-line, context

`search` *phrase*
Searches Stata help and online resources for *phrase*.
On-line, context

`sort `*varlist*
Sorts the observations in ascending order of variables in *varlist*
On-line, context

__su__mmarize [*varlist*`]
[if `*exp*`] [in `*range*`], [`__d__etail]
Descriptive statistics: number of observations, means, standard deviations, minima and maxima.
`detail` option adds percentiles (including the median), variance, skewness and kurtosis.
On-line, context

`sysuse` *filename*
Loads a data set that comes with Stata
On-line, context

`table `*var1*` [`*var2*`] [if `*exp*`] [in `*range*`]
, [`__c__ontents(...)] ...
Makes a table of summary statistics defined by `contents`, including
frequencies, means, counts, etc. of other variables by *var1* or *var1* and *var2*.
On-line, context

__ta__bulate *var1*` [`*var2*`] [if `*exp*`] [in `*range*
`], [`__r__ow __co__lumn __nol__abel ... ]
One- and two-way frequency tables, with optional row and column summaries/frequencies,
and independence tests.
On-line, context

`tsset [`*panelvar*`] `*timevar*
Declares the data to be of the time series format with *timevar* representing time variable.
For panel data, *panelvar* identifies panels (individuals, enterprises, countries, ...),
and *timevar* indicates time within those panels.
The data set will be `sort`ed by *timevar* or by
*panelvar timevar*
On-line

`twoway `*plot*` [if `*exp*`] [in `*range*`], `
*twoway options*
Two way graphs, including scatter plots, line plots, bar plots, histograms,
and smoothing / trend line plots.
May be used as a wrapper to combine several scatter plots.
Options control axes, labels, title, legends, etc.
On-line, context

`use` *filename*
Loads the Stata data file into Stata memory
On-line, context

`vce`
Displays the variance-covariance matrix of the estimated coefficients
On-line, context

`vif`
Variance inflation factors
On-line, context

`whelp` *command*
Displays help on the specified *command*.
On-line, context