Working with Python
Data Science
A few notes, resources, and examples on using Python for data science and analysis.
Load Data from Text File
An example of loading comma-delimited data into a NumPy array.
```python
import numpy as np

# np.loadtxt accepts a file name directly and closes the file when done
data = np.loadtxt('comma_delim.csv', delimiter=",")
```
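Real CSV files often start with a header row, which np.loadtxt cannot parse as numbers. A minimal sketch of one way to handle that, assuming a hypothetical file with a one-line header and two numeric columns (the column names and values are made up for illustration, and a StringIO stands in for a file on disk):

```python
import io
import numpy as np

# Hypothetical CSV content; io.StringIO stands in for a real file.
csv_text = "height,weight\n1.5,60\n1.8,82\n1.7,75\n"

# skiprows=1 skips the header line before parsing the numbers.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
print(data.shape)  # (3, 2)

# unpack=True transposes the result so each column comes back as its own array.
height, weight = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, unpack=True
)
print(height)
print(weight)
```

For files with missing values or mixed types, np.genfromtxt may be a better fit than np.loadtxt.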
Data Analysis
You can use NumPy to do some quick data analysis.
Median and Average
Use np.median() and np.average() to calculate the median and average for a set of data.
```python
import numpy as np
import random

# sample data
nums = [random.randint(1, 1000) for _ in range(10)]
a = np.array(nums)

print(f"Median: {np.median(a)}")
print(f"Average: {np.average(a)}")
```
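The two measures can disagree sharply when the data contains an outlier, and np.average also accepts a weights argument for weighted averages. A small sketch with made-up values (the numbers and weights are arbitrary and only for illustration):

```python
import numpy as np

# Made-up sample with one large outlier.
a = np.array([10, 12, 11, 13, 500])

median = np.median(a)    # middle value of the sorted data
average = np.average(a)  # arithmetic mean, pulled upward by the outlier

print(f"Median: {median}")
print(f"Average: {average}")

# Weighted average: the last score counts twice as much as the others.
scores = np.array([70, 80, 90])
weights = np.array([1, 1, 2])
weighted = np.average(scores, weights=weights)
print(f"Weighted average: {weighted}")
```

Here the median stays near the bulk of the data while the average does not, which is why the median is often preferred for skewed data.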
Percentile
Use NumPy's np.percentile() function to calculate percentiles for a set of data.
```python
import numpy as np

# sample data: the integers 10 through 90
data = np.array(range(10, 91))

# calculate 10th, 25th, 50th, 75th, and 90th percentiles
for per in [10, 25, 50, 75, 90]:
    perc = np.percentile(data, per)
    print(f"{per}th => {perc:.2f}")

# Output:
# 10th => 18.00
# 25th => 30.00
# 50th => 50.00
# 75th => 70.00
# 90th => 82.00
```
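np.percentile also accepts a list of percentiles and returns one value per entry, which is handy for quantities such as the interquartile range. A short sketch using the same sample data (the variable names q1, q3, and iqr are just illustrative):

```python
import numpy as np

# Same sample data as above: the integers 10 through 90.
data = np.array(range(10, 91))

# Request the 25th and 75th percentiles in a single call.
q1, q3 = np.percentile(data, [25, 75])

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1
print(f"Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
```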
Plotting and Graphing
Line Graph
```python
import matplotlib.pyplot as plt

# line with slope m=2 and intercept b=3
X = list(range(20))
Y = [2 * x + 3 for x in X]
plt.plot(X, Y)

# save graph to file
plt.savefig("line-graph.png", dpi=150)
```
Graph Stylesheets
Style your matplotlib graphs by using a stylesheet. The matplotlib documentation has a few example stylesheets to preview and download.
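Styles that ship with matplotlib, such as bmh, can be used by name. If you download a stylesheet, place it in the same directory as your code. To apply the bmh stylesheet to the line graph example, add the following after the import and before plotting: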
```python
plt.style.use('bmh')
```
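To see which styles are bundled with your installation, or to apply a style to a single figure only, something like the following may help. This is a sketch: the output file name is an arbitrary choice, and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Names of the styles bundled with the installed matplotlib.
print(plt.style.available)

# Apply the style only inside the with-block; global settings are untouched.
with plt.style.context("bmh"):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [3, 5, 7])
    fig.savefig("line-graph-bmh.png", dpi=150)

plt.close(fig)
```

plt.style.use changes the style for every later figure, while plt.style.context keeps the change local to one block.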
Scatter Plot
An example of a scatter plot with a title and axis labels, using the ggplot stylesheet.
```python
import numpy as np
import matplotlib.pyplot as plt

plt.style.use("ggplot")

# random data
X = np.random.normal(0, 1, 500)
Y = np.random.normal(0, 1, 500)

plt.scatter(X, Y)
plt.title("Scatter Plot Example")
plt.xlabel("X-Axis")
plt.ylabel("Y-Axis")

# save graph to file
plt.savefig("scatter-ggplot.png", dpi=150)
```
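The random data above differs on every run. If a reproducible plot is wanted, one option is a seeded generator from np.random.default_rng. In this sketch the seed value 42, the marker settings, and the output file name are arbitrary choices, and the Agg backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# A seeded generator produces the same "random" points on every run.
rng = np.random.default_rng(42)
X = rng.normal(0, 1, 500)
Y = rng.normal(0, 1, 500)

with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.scatter(X, Y, s=10, alpha=0.6)  # small, semi-transparent markers
    ax.set_title("Scatter Plot Example (seeded)")
    ax.set_xlabel("X-Axis")
    ax.set_ylabel("Y-Axis")
    fig.savefig("scatter-seeded.png", dpi=150)

plt.close(fig)
```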