# Python Data Science

My notes, resources and examples using Python, NumPy, SciPy and Matplotlib as alternatives to R and Matlab for data science and analysis.

## Load Data from Text File

```
import pylab
filename = "cool_data.dat"
# use skiprows if your data file has headers
data = pylab.loadtxt(filename, skiprows=1)
```

An example of loading comma-delimited data using NumPy:

```
import numpy as np
# pass the filename directly so NumPy opens and closes the file itself
data = np.loadtxt('comma_delim.csv', delimiter=",")
```
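The two options above, `skiprows` and `delimiter`, can be combined. Here is a minimal sketch, assuming a small CSV with one header row; the contents are made up and held in an in-memory `io.StringIO` object so the example runs without a file on disk.

```python
import io

import numpy as np

# Hypothetical file contents: one header row, then comma-separated numbers.
text = "x,y\n1,10\n2,20\n3,30\n"

# skiprows=1 drops the header line; delimiter="," splits on commas.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)
print(data.shape)

# unpack=True returns one array per column instead of one 2-D array.
x, y = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, unpack=True)
print(x, y)
```

`np.loadtxt` accepts either a filename or a file-like object, so the same call should work with a path such as `'comma_delim.csv'`.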

## Plotting and Graphing

### Log Scale

```
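import math
import matplotlib.pyplot as pyplot

X = list(range(1, 25))
Y = []
for i in X:
    # 10^i grows exponentially, so it plots as a straight line on a log axis
    Y.append(math.pow(10, i))
pyplot.xlim(0, 25)
# set both y limits; a single argument would only set the lower bound
pyplot.ylim(min(Y), max(Y))
pyplot.yscale('log')
pyplot.plot(X, Y)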
```
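The same plot can also be written with Matplotlib's object-oriented interface. This is only a sketch: the `Agg` backend is an assumption so the code runs without a display, and the `slopes` check is added purely to illustrate why powers of ten look like a straight line on a log axis.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 25)
y = 10.0 ** x

fig, ax = plt.subplots()
ax.semilogy(x, y)  # plots the data and switches the y axis to log scale
ax.set_xlim(0, 25)

# Each step in x multiplies y by 10, so log10(y) rises by 1 per step.
slopes = np.diff(np.log10(y))
print(ax.get_yscale(), slopes[:3])
```

`ax.semilogy` is a shortcut for plotting and then calling `ax.set_yscale('log')`.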

### Labels for Titles and Axes

```
import matplotlib.pyplot as pyplot
import numpy as np

# 500 samples each from a normal distribution with mean 0 and std dev 1
X = np.random.normal(0, 1, 500)
Y = np.random.normal(0, 1, 500)
pyplot.scatter(X,Y)
pyplot.title("Scatter Plot Example")
pyplot.xlabel("X-Axis")
pyplot.ylabel("Y-Axis")
```

### Saving a Graph

The following will create a PNG image of 648×432 pixels: the figure size in inches (9×6) multiplied by the dpi (72). You will most likely want to keep the dpi set to 72, since it has a direct effect on the font sizes in the rendered image.

```
import pylab
# ... setup of data (X and Y) ...
# figure size in inches
pylab.rcParams['figure.figsize'] = 9, 6
pylab.plot(X,Y)
pylab.savefig("graph.png", dpi=72) # dots per inch
```
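To check the pixel arithmetic (inches × dpi), here is a small sketch that saves a figure to an in-memory buffer and reads the size back from the PNG header. The placeholder data, the `Agg` backend and the `buffer` variable are assumptions made for illustration.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

width_in, height_in, dpi = 9, 6, 72

fig = plt.figure(figsize=(width_in, height_in))
plt.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Save to memory instead of "graph.png" so nothing is written to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=dpi)

# A PNG stores width and height as big-endian 32-bit integers in bytes 16-23.
width_px, height_px = struct.unpack(">II", buffer.getvalue()[16:24])
print(width_px, height_px)  # 9 * 72 = 648, 6 * 72 = 432
```

Passing `figsize` to `plt.figure` sets the size for that one figure, whereas `rcParams['figure.figsize']` changes the default for every figure created afterwards.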

## Installing NumPy, SciPy and Matplotlib on OS X

I had a little trouble with the initial setup of some of the key libraries used for machine learning, stats and data science.

Here's what worked for me on Mac OS X. The key was not to use the built-in Python, but to download the binary from python.org instead. Then make sure your environment variables are set to use this binary.

The symlinks to the Python binaries were put in `/usr/local/bin`.

The actual binaries are installed in `/Library/Frameworks/Python.framework/Versions/3.3/bin`.

You need to install `gfortran` as a prerequisite, which I did using Homebrew:

    brew install gfortran

If you are still using Python 2.7, `pip install` works fine. Make sure the pip you are using belongs to the Python binary you installed and not to the base system; it should most likely be `/usr/local/bin/pip` and not `/usr/bin/pip`.

```
$ pip install numpy
$ pip install scipy
$ pip install matplotlib
```
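One way to check which Python and pip you are actually using is to ask the interpreter itself. This is a rough sketch using only the standard library; the helper name `describe_python` is made up for this example, and `shutil.which` needs Python 3.3 or later.

```python
import shutil
import sys


def describe_python():
    """Return the running interpreter's path and the first pip found on PATH."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "pip_on_path": shutil.which("pip"),  # None if no pip is on PATH
    }


for key, value in describe_python().items():
    print(key, "=", value)
```

If `executable` points into `/usr/bin` rather than `/usr/local/bin`, the system Python is probably the one being picked up.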

If you are using Python 3.3, which I have switched to and have had no problems with once installed, the packages available through pip seem to be less up to date, or the latest code is required. So I checked each library out from source and built it:

```
$ git clone https://github.com/numpy/numpy.git
$ cd numpy
$ python setup.py install
$ cd ..
$ git clone https://github.com/scipy/scipy.git
$ cd scipy
$ python setup.py install
$ cd ..
$ git clone https://github.com/matplotlib/matplotlib.git
$ cd matplotlib
$ python setup.py install
```
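After installing, a quick sanity check is to import each library and print its version. Here is a minimal sketch; the helper name `installed_versions` is an assumption for this example, and a library that fails to import is reported as `None` rather than raising an error.

```python
import importlib


def installed_versions(names=("numpy", "scipy", "matplotlib")):
    """Map each package name to its version string, or None if it won't import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(name, version or "not installed")
```

Run it with the same `python` you used for the install so that it reports on the right interpreter.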