# Data Science

A few notes, resources, and examples on using Python for data science and analysis.

## Load Data from Text File

```python
import numpy as np
```
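
One common way to read numeric data from a text file is `np.loadtxt()`. The sketch below rests on assumptions that are not part of these notes: the file name `data.txt` is hypothetical, and the file is assumed to hold comma-separated numbers under a one-line header. The sample file is written to a temporary directory first so the example runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # "data.txt" is a hypothetical file name used only for illustration
    path = Path(tmp) / "data.txt"
    path.write_text("x,y\n1,10\n2,20\n3,30\n")

    # load comma-separated numbers, skipping the one-line header
    data = np.loadtxt(path, delimiter=",", skiprows=1)

print(data.shape)   # (rows, columns)
print(data[:, 1])   # second column
```

For files with missing values or mixed column types, `np.genfromtxt()` is likely a better fit than `np.loadtxt()`.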

## Data Analysis

You can use numpy to do some quick data analysis.

### Median and Average

Use `np.median()` and `np.average()` to calculate the median and average for a set of data.

```python
import numpy as np
import random

# sample data
nums = [random.randint(1, 1000) for _ in range(10)]

a = np.array(nums)
print(f"Median: {np.median(a)}")
print(f"Average: {np.average(a)}")
```
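
Unlike `np.mean()`, `np.average()` accepts an optional `weights` argument. A small sketch of the difference, using made-up scores and weights chosen only for illustration:

```python
import numpy as np

# illustrative values, not real data
scores = np.array([80, 90, 100])
weights = np.array([1, 1, 2])   # hypothetical weights: the last score counts double

plain = np.average(scores)                       # same result as np.mean(scores)
weighted = np.average(scores, weights=weights)   # (80 + 90 + 2 * 100) / 4

print(f"Average: {plain}")
print(f"Weighted average: {weighted}")
```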

### Percentile

Use NumPy's `np.percentile()` function to calculate percentiles for a set of data.

```python
import numpy as np

# sample data: the integers 10 through 90
data = np.array(range(10, 91))

# calculate 10th, 25th, 50th, 75th, and 90th
for per in [10, 25, 50, 75, 90]:
    perc = np.percentile(data, per)
    print(f"{per}th => {perc:.2f}")

# Output:
# 10th => 18.00
# 25th => 30.00
# 50th => 50.00
# 75th => 70.00
# 90th => 82.00
```
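
`np.percentile()` also accepts a list of percentiles, which avoids the loop and returns all results in one call. A minimal sketch on the same sample data, adding the interquartile range (an extra not covered in the notes above) as an example of combining the results:

```python
import numpy as np

# same sample data as above: the integers 10 through 90
data = np.array(range(10, 91))

# pass a list of percentiles to get all results in one call
q10, q25, q50, q75, q90 = np.percentile(data, [10, 25, 50, 75, 90])

# interquartile range: the spread of the middle 50% of the data
iqr = q75 - q25
print(f"IQR => {iqr:.2f}")
```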

## Plotting and Graphing

### Line Graph

```python
import matplotlib.pyplot as plt

# line with slope m=2 and intercept b=3
X = list(range(20))
Y = [2 * x + 3 for x in X]
plt.plot(X, Y)

# save graph to file
plt.savefig("line-graph.png", dpi=150)
```

### Graph Stylesheets

Style your matplotlib graphs by using a stylesheet. The matplotlib documentation has a few example stylesheets to preview and download.

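Built-in styles such as `bmh` and `ggplot` ship with matplotlib and can be used by name; a downloaded stylesheet should be placed in the same directory as your code and referenced by its file name. To use the `bmh` stylesheet in the line graph example, add the following after the imports: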

``plt.style.use('bmh')``
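
The names of the styles bundled with matplotlib are listed in `plt.style.available`, and `plt.style.context()` applies a style to a single block of plotting code instead of globally. A minimal sketch, assuming the bundled `bmh` style; the output file name and the choice of the `Agg` backend are illustrative assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# list the names of the styles bundled with matplotlib
print(plt.style.available)

# apply a style to one figure only, leaving the global style unchanged
with plt.style.context("bmh"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3, 5, 7])
    # hypothetical output file name
    fig.savefig("line-graph-bmh.png", dpi=150)
```

Using the context manager keeps one styled graph from changing the look of every other graph in the same script.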

### Scatter Plot

An example of a scatter plot with a title and axis labels, using the `ggplot` stylesheet.

```python
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")

# random data
X = np.random.normal(0, 1, 500)
Y = np.random.normal(0, 1, 500)

plt.scatter(X, Y)
plt.title("Scatter Plot Example")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")

# save graph to file
plt.savefig("scatter-ggplot.png", dpi=150)
```