# Stata

Stata is a powerful statistical software package that is widely used in scientific computing. RCC users are licensed to use Stata on all RCC resources. Stata can be used interactively or as a submitted script. If you run it interactively, you must still run it on a compute node in order to keep the login nodes free for other users. Stata can be run in parallel on up to 16 cores.

Note

Stata examples in this document are adapted from a Princeton tutorial. You may find it useful if you are new to Stata or want a refresher.

## Getting Started

If you need to use the Stata GUI, connect to Midway with ThinLinc (see Connecting with ThinLinc).

Obtain an interactive session on a compute node so that your computation doesn’t interrupt other users on the login node. Then load Stata and start the graphical interface:

```
sinteractive
module load stata
xstata
```

This will open up a Stata window. The middle pane has a text box to enter commands at the bottom, and a box for command results on top. On the left there’s a box called “Review” that shows your command history. The right-hand box contains information about variables in the currently-loaded data set.

One way Stata can be used is as a fancy desktop calculator. Type the following code into the command box:

```
display 2+2
```

Stata can do much more once data is loaded into it. The following code loads census data that ships with Stata, prints a description of the data, then creates a scatter plot of life expectancy against GNP per capita:

```
sysuse lifeexp
describe
graph twoway scatter lexp gnppc
```

## Running Stata from the Command Line

This is very similar to running graphically; the command-line interface is equivalent to the “Results” pane in the graphical interface. Again, please use a compute node if you are running computationally-intensive calculations:

```
sinteractive
module load stata
stata
```

## Running Stata Jobs with SLURM

You can also submit Stata jobs to SLURM, the scheduler. A Stata script is called a “do-file,” which contains a list of Stata commands that the interpreter will execute. You can write a do-file in any text editor, or in the Stata GUI’s do-file editor: click “Do-File Editor” in the “Window” menu. If your do-file is named “example.do,” you can run it with either of the following commands:

```
stata < example.do
stata -b do example.do
```
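The two commands differ in where the results go: with input redirection, Stata prints results to the terminal, while batch mode (`-b`) writes them to a log file named after the do-file in the current directory. As a minimal sketch, the shell function below derives that log name from a do-file path; `batch_log_name` is a hypothetical helper for illustration, not part of Stata or the RCC tools.

```shell
# Print the log file name that `stata -b do <do-file>` is expected to create.
# batch_log_name is a hypothetical helper name used only in this sketch.
batch_log_name() {
    # Drop any directory part and the .do suffix, then append .log.
    printf '%s.log\n' "$(basename "$1" .do)"
}
```

For example, `batch_log_name example.do` prints `example.log`.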

Here is a very simple do-file, which computes a regression on the sample data set from above:

```
version 13 // Stata version this do-file was written for; optional but recommended
sysuse lifeexp
gen loggnppc = log(gnppc)
regress lexp loggnppc
```

Here is a submission script that submits the Stata program to the default queue on Midway:

```
#!/bin/bash
#SBATCH --job-name=stataEx
#SBATCH --output=stata_example.out
#SBATCH --error=stata_example.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
module load stata
stata -b do stata_example.do
```

`stata_example.do` is our example do-file, and `stata_example.sbatch` is the submission script.

To run this example, download both files to a directory on Midway. Enter the following command to submit the program to the scheduler:

```
sbatch stata_example.sbatch
```

Output from this example can be found in the file named `stata_example.log`, which will be created automatically in your current directory.
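Because a batch job runs unattended, it can be worth checking the log for errors once the job finishes. When a command fails, Stata prints a return code line such as `r(198);` in the log. The sketch below searches a log for such a line; `log_has_error` is a hypothetical helper name, and the pattern assumes the return code starts at the beginning of a line.

```shell
# Report whether a Stata batch log contains an error return code.
# Stata prints a line such as "r(198);" after a command fails.
# log_has_error is a hypothetical helper name used only in this sketch.
log_has_error() {
    # -E: extended regular expressions; -q: quiet, only set the exit status.
    grep -Eq '^r\([0-9]+\);' "$1"
}
```

A possible use after the job completes is `if log_has_error stata_example.log; then echo "Stata reported an error" >&2; fi`.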

### Running Parallel Stata Jobs

The parallel version of Stata, Stata/MP, can speed up computations and make effective use of RCC’s resources. When running Stata/MP, you are limited to 16 cores and 5000 variables. Run an interactive Stata/MP session:

```
sinteractive
module load stata
stata-mp
# or, for the graphical interface:
xstata-mp
```

Here is a sample do-file that would benefit from parallelization. It runs bootstrap estimation on another data set that ships with Stata.

```
version 13
sysuse auto
expand 10000
bootstrap: logistic foreign price-gear_ratio
```

Here is a submission script that will run the above do-file with Stata/MP:

```
#!/bin/bash
#SBATCH --job-name=stataMP
#SBATCH --output=stata_parallel.out
#SBATCH --error=stata_parallel.err
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16
module load stata
stata-mp -b do stata_parallel.do
```
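If you want the number of cores Stata/MP uses to match the cores requested from SLURM, one option is Stata's `set processors` command. The following is only a sketch under stated assumptions: `mp_cores` and `write_mp_wrapper` are hypothetical helper names, the wrapper file name is arbitrary, and the cap of 16 reflects the Stata/MP core limit mentioned earlier in this section.

```shell
# Cap a requested core count at the 16-core Stata/MP limit.
# mp_cores and write_mp_wrapper are hypothetical helper names.
mp_cores() {
    n=${1:-1}                        # default to 1 core if no count is given
    if [ "$n" -gt 16 ]; then n=16; fi
    echo "$n"
}

# Write a wrapper do-file that sets the processor count,
# then runs the real do-file.
write_mp_wrapper() {
    # $1 = core count, $2 = do-file to run, $3 = wrapper file to create
    printf 'set processors %s\ndo %s\n' "$(mp_cores "$1")" "$2" > "$3"
}
```

In a submission script this could be called as `write_mp_wrapper "${SLURM_NTASKS_PER_NODE:-1}" stata_parallel.do wrapper.do`, followed by `stata-mp -b do wrapper.do`. The batch log would then be named after the wrapper (`wrapper.log`) rather than the original do-file.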

Download `stata_parallel.do` and `stata_parallel.sbatch` to Midway, then run the program with:

```
sbatch stata_parallel.sbatch
```