What You Should Have to Start

Lesson Portion 1: Reintroduction to Data Analysis, NumPy, and Pandas: Why Are They Important?

Data Analysis

  • Data analysis is the process of examining data sets to find trends and draw conclusions about the information they contain. Data analysis is important because it helps businesses optimize their performance.

What Are NumPy and Pandas?

  • The Pandas library is widely used for data analysis in Python. The NumPy library is mostly used for working with numerical values, and it makes it easy to apply mathematical functions to them.
  • Imagine you have a lot of toys, but they are all mixed up in a big box. NumPy helps you to put all the same types of toys together, like all the cars in one pile and all the dolls in another. Pandas is like a helper that helps you to remember where each toy is located. So, if you want to find a specific toy, like a red car, you can ask Pandas to find it for you.
  • Just like how it's easier to find a toy when they are sorted and organized, it's easier for grown-ups to understand and analyze big sets of numbers when they use NumPy and Pandas.

Lesson Portion 2: More into NumPy

What We Are Covering

  • Explanation of NumPy and its uses in data analysis
  • Importing NumPy library
  • Examining NumPy arrays
  • Creating NumPy arrays and performing intermediate array operations
  • Popcorn Hacks: Make your own percentile NumPy array

What Is NumPy Used for in Data Analysis, and How Do You Import It?

NumPy is a Python library that helps with math and data analysis. It is built for working with large amounts of numerical data, like the numbers in a spreadsheet, and it performs calculations such as averages, statistics, and linear algebra quickly and accurately. Its arrays are also the standard input for plotting libraries such as Matplotlib. Scientists and other people who work with data use it widely because it makes their work easier and faster.

import numpy as np

List of NumPy Functions, What They Do, and Examples
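Here are a few commonly used NumPy functions, applied to a small made-up array. The values in `scores` are illustrative only. The comments show what each call returns.

```python
import numpy as np

scores = np.array([12, 7, 3, 15, 9])  # illustrative values only

print(np.sum(scores))        # total of all elements: 46
print(np.mean(scores))       # arithmetic mean: 9.2
print(np.median(scores))     # middle value once sorted: 9.0
print(np.std(scores))        # population standard deviation (ddof=0)
print(np.min(scores), np.max(scores))  # smallest and largest: 3 15
print(np.argmax(scores))     # index of the largest value: 3
print(np.sort(scores))       # sorted copy: [ 3  7  9 12 15]
print(np.arange(0, 10, 2))   # evenly spaced integers: [0 2 4 6 8]
print(np.zeros(3))           # array of zeros: [0. 0. 0.]
print(scores.reshape(1, 5).shape)  # reshaped into a 1x5 matrix: (1, 5)
print(scores * 2)            # element-wise arithmetic: [24 14  6 30 18]
```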

Example of Using NumPy in Our Project

This code calculates the total plate appearances for a baseball player using NumPy's sum() function. The data only records hits, walks, and strikeouts, so the code assumes every plate appearance ended in one of those three outcomes. It then uses NumPy to count the times the player reached base (hits plus walks). It divides that count by the total plate appearances to get an approximate on-base percentage. The results are then printed to the console.

import numpy as np

# Example data
player_hits = np.array([3, 1, 2, 0, 1, 2, 1, 2])  # Player's hits in each game
player_walks = np.array([1, 0, 0, 1, 2, 1, 1, 0])  # Player's walks in each game
player_strikeouts = np.array([2, 1, 0, 2, 1, 1, 0, 1])  # Player's strikeouts in each game

# Total plate appearances (PA), assuming every PA was a hit, walk, or strikeout
total_pa = np.sum(player_hits) + np.sum(player_walks) + np.sum(player_strikeouts)

# Approximate on-base percentage (OBP): times on base divided by plate appearances
times_on_base = np.sum(player_hits) + np.sum(player_walks)
obp = times_on_base / total_pa

# Print the total plate appearances and on-base percentage for the player
print(f"Total plate appearances: {total_pa}")
print(f"On-base percentage: {obp:.3f}")
Total plate appearances: 26
On-base percentage: 0.692
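A full on-base percentage calculation needs more than hits, walks, and strikeouts. The official formula is OBP = (H + BB + HBP) / (AB + BB + HBP + SF). Here H is hits, BB is walks, HBP is hit-by-pitch, AB is at-bats, and SF is sacrifice flies.

The sketch below applies that formula with NumPy. The hits and walks are the per-game values from the example above. The `at_bats`, `hit_by_pitch`, and `sac_flies` arrays are hypothetical values invented for illustration, and `on_base_percentage` is a hypothetical helper name.

```python
import numpy as np

# Hits and walks per game come from the example above;
# at_bats, hit_by_pitch, and sac_flies are HYPOTHETICAL values for illustration.
hits = np.array([3, 1, 2, 0, 1, 2, 1, 2])
walks = np.array([1, 0, 0, 1, 2, 1, 1, 0])
at_bats = np.array([5, 4, 3, 4, 3, 4, 3, 4])       # hypothetical
hit_by_pitch = np.array([0, 0, 1, 0, 0, 0, 0, 0])  # hypothetical
sac_flies = np.array([0, 0, 0, 0, 1, 0, 0, 0])     # hypothetical

def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF), computed from season totals."""
    times_on_base = np.sum(h) + np.sum(bb) + np.sum(hbp)
    chances = np.sum(ab) + np.sum(bb) + np.sum(hbp) + np.sum(sf)
    return times_on_base / chances

obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies)
print(f"On-base percentage: {obp:.3f}")  # 19 / 38 -> 0.500
```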

Activity 1: Popcorn Hacks: Creating a NumPy Array and Analyzing the Data Using Array Operations

import numpy as np

# Create a NumPy array of the heights (in cm) of players on a basketball team
heights = np.array([192, 195, 193, 200, 211, 199, 201, 198, 184, 190, 196, 203, 208, 182, 207])

# Calculate the 25th, 50th, and 75th percentiles of the heights
percentiles = np.percentile(heights, [25, 50, 75])

# Print the results
print("The 25th percentile height is", percentiles[0], "cm.")
print("The 50th percentile height is", percentiles[1], "cm.")
print("The 75th percentile height is", percentiles[2], "cm.")

# Determine the number of players who are in the top 10% tallest
top_10_percent = np.percentile(heights, 90)
tallest_players = heights[heights >= top_10_percent]

print("There are", len(tallest_players), "players in the top 10% tallest.")
The 25th percentile height is 192.5 cm.
The 50th percentile height is 198.0 cm.
The 75th percentile height is 202.0 cm.
There are 2 players in the top 10% tallest.
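A value such as 192.5 cm comes from NumPy's default "linear" method. np.percentile sorts the data and computes the fractional position q/100 × (n - 1). It then interpolates between the two neighbouring values.

For the 15 heights, the 25th percentile sits at position 0.25 × 14 = 3.5. That is halfway between the sorted values 192 and 193, which gives 192.5. The helper `linear_percentile` below is a hypothetical name used only to reproduce that rule by hand and compare it with NumPy's result.

```python
import numpy as np

heights = np.array([192, 195, 193, 200, 211, 199, 201, 198, 184, 190, 196, 203, 208, 182, 207])

def linear_percentile(values, q):
    """Reproduce np.percentile's default linear interpolation by hand."""
    s = np.sort(values)
    pos = (q / 100) * (len(s) - 1)   # fractional position in the sorted array
    lo = int(np.floor(pos))          # index of the value just below
    hi = int(np.ceil(pos))           # index of the value just above
    frac = pos - lo                  # how far we are between the two
    return s[lo] + (s[hi] - s[lo]) * frac

for q in (25, 50, 75, 90):
    print(q, linear_percentile(heights, q), np.percentile(heights, q))
```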
import numpy as np

# Create a NumPy array of your own data (replace the empty brackets with your values)
x = np.array([])

# np.percentile raises an IndexError on an empty array, so check for data first
if x.size == 0:
    print("Add some values to x before running the analysis.")
else:
    # Calculate the percentiles you are interested in (here: 25th, 50th, 75th)
    y = np.percentile(x, [25, 50, 75])

    # Print the results
    print("The 25th percentile is", y[0])
    print("The 50th percentile is", y[1])
    print("The 75th percentile is", y[2])

    # Determine how many values are in the top 10% of x
    t = np.percentile(x, 90)
    z = x[x >= t]
    print("There are", len(z), "values in the top 10% of x.")
Add some values to x before running the analysis.

Lesson Portion 3: More into Pandas

What We Are Covering

  • Explanation of Pandas and its uses in data analysis
  • Importing Pandas library
  • Loading data into Pandas DataFrames from CSV files
  • Manipulating and exploring data in Pandas DataFrames
  • Example of using Pandas for data analysis tasks such as filtering and sorting

What Is Pandas, and What Is Its Purpose?

  • Pandas is an open-source software library for Python.
  • Pandas is used for data analysis and data manipulation.
  • Pandas offers data structures and operations for manipulating numerical tables and time series.
  • Pandas is free to use.

Things You Can Do Using Pandas

  • Data cleansing: identifying and correcting errors, inconsistencies, and inaccuracies in datasets.
  • Data filling: filling in missing values in datasets.
  • Statistical analysis: analyzing datasets using statistical techniques to draw conclusions and make predictions.
  • Data visualization: representing datasets visually using graphs, charts, and other visual aids (Pandas' built-in plotting uses Matplotlib).
  • Data inspection: examining datasets to identify potential issues or patterns, such as missing data, outliers, or trends.
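Several of the tasks listed above map onto single Pandas methods. The sketch below runs them on a tiny made-up table. The names and point values are illustrative only, and filling with the mean is just one possible choice.

```python
import numpy as np
import pandas as pd

# A small made-up table with one missing value and one duplicated row
df = pd.DataFrame({
    "player": ["Ana", "Ben", "Cara", "Cara"],
    "points": [12.0, np.nan, 20.0, 20.0],
})

# Data inspection: count the missing values in each column
print(df.isna().sum())

# Data cleansing: drop exact duplicate rows (removes the second "Cara" row)
df = df.drop_duplicates()

# Data fill: replace the missing score with the mean of the known scores (16.0)
df = df.assign(points=df["points"].fillna(df["points"].mean()))

# Statistical analysis: count, mean, std, min, quartiles, and max
print(df["points"].describe())
print(df)
```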

Pandas and Data Analysis

The two most important data structures in Pandas are:

  • Series: A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.). It is similar to a single column in a spreadsheet or a SQL table. Each element in a Series has a label, and the labels together form the index. A Series can be created from a list, a NumPy array, a dictionary, or another Pandas Series.
  • DataFrame: A DataFrame is a two-dimensional labeled data structure whose columns can hold data of different types (integer, float, string, etc.). It is similar to a spreadsheet or a SQL table. Each column in a DataFrame is a Series, and each row is identified by a label in the index. A DataFrame can be created from a dictionary of Series or NumPy arrays, a list of dictionaries, or another Pandas DataFrame.
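The sketch below builds both structures in a few of the ways just described. The player labels and numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# A Series from a list, with custom labels as its index (illustrative values)
ppg = pd.Series([35.0, 34.5, 33.0], index=["Player A", "Player B", "Player C"])
print(ppg["Player B"])  # look up a value by its label: 34.5

# A Series from a dictionary: the keys become the index labels
rpg = pd.Series({"Player A": 5.0, "Player B": 6.5, "Player C": 9.0})

# A DataFrame from a dictionary of Series; rows are aligned on the shared index
stats = pd.DataFrame({"PPG": ppg, "RPG": rpg})
print(stats)

# Each column of the DataFrame is itself a Series
print(type(stats["PPG"]).__name__)  # Series

# A DataFrame can also be built from a NumPy array plus column names
print(pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"]))
```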

DataFrames

import pandas as pd
pd.__version__
'1.4.2'

Importing CSV Data

  • Imports the Pandas library and assigns it the alias 'pd'.
  • Loads a CSV file named 'nba_player_statistics.csv' into a Pandas DataFrame called 'df'.
  • Specifies the player name 'Jimmy Butler' and filters the DataFrame for that player's stats. This creates a new DataFrame called 'player_stats' that contains only the rows where the 'NAME' column matches 'Jimmy Butler'.
  • Displays the player's points per game (PPG), rebounds per game (RPG), and assists per game (APG) using the print() function and string formatting.
  • The double square brackets [['PPG', 'RPG', 'APG']] select only those three columns from the player_stats DataFrame. The outer brackets index the DataFrame, and the inner brackets are a list of column names.
  • In summary, the code loads NBA player statistics from a CSV file, filters it for a specific player, and displays that player's PPG, RPG, and APG using a Pandas DataFrame.
import pandas as pd
# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('/home/edwin/vscode/fastpage1/_notebooks/files/nba_player_statistics.csv')
# Filter the DataFrame to only include stats for a specific player (in this case, Jimmy Butler)
player_name = 'Jimmy Butler'
player_stats = df[df['NAME'] == player_name]
# Display the stats for the player
print(f"\nStats for {player_name}:")
print(player_stats[['PPG', 'RPG', 'APG']])
Stats for Jimmy Butler:
    PPG  RPG   APG
0  35.0  5.0  11.0

In the code segment below, we use Pandas to read a CSV file containing NBA player statistics and store it in a DataFrame.

Pandas is useful in this scenario because it provides functionality to filter, sort, and manipulate the NBA data efficiently. In this code, the DataFrame is filtered to include only the stats for the player you choose.

  • Imports the Pandas library and assigns it an alias 'pd'.
  • Loads a CSV file named 'nba_player_statistics.csv' into a Pandas DataFrame called 'df'.
  • Asks the user to input a player name using the input() function and assigns it to the variable player_name.
  • Filters the DataFrame for the specified player name using the df[df['NAME'] == player_name] syntax, and assigns the resulting DataFrame to the variable player_stats.
  • Checks whether the player_stats DataFrame is empty using the empty attribute. If it is empty, the code prints "No stats found for that player." Otherwise, it displays the player's stats as described in the next step.
  • Displays the player's stats for points per game (PPG), rebounds per game (RPG), assists per game (APG), and total points + rebounds + assists (P+R+A) using the print() function and string formatting.
  • In summary, this code loads NBA player statistics data from a CSV file, asks the user to input a player name, filters the DataFrame for that player's stats, and displays the player's stats for PPG, RPG, APG, and P+R+A. If the player is not found in the DataFrame, it prints a message indicating that no stats were found.
import pandas as pd

# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('/home/edwin/vscode/fastpage1/_notebooks/files/nba_player_statistics.csv')

# Get the player name from the user (strip() removes stray spaces)
player_name = input("Enter player name: ").strip()

# Filter the DataFrame to only include stats for the specified player
player_stats = df[df['NAME'] == player_name]

# Check whether the player exists in the DataFrame
if player_stats.empty:
    print("No stats found for that player.")
else:
    # Display the stats for the player entered
    print(f"\nStats for {player_name}:")
    print(player_stats[['PPG', 'RPG', 'APG', 'P+R+A']])
No stats found for that player.
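Pandas can also combine several conditions in one filter and sort the results. The sketch below uses a tiny hand-made table with the same column names as the NBA CSV (NAME, TEAM, PPG, APG). The rows and numbers are illustrative and are not read from the file.

```python
import pandas as pd

# Illustrative rows using the same column names as the NBA CSV
df = pd.DataFrame({
    "NAME": ["Player A", "Player B", "Player C", "Player D"],
    "TEAM": ["Mia", "Lac", "Mia", "Sac"],
    "PPG": [35.0, 34.5, 22.0, 31.0],
    "APG": [11.0, 6.0, 3.0, 7.0],
})

# Filter with two conditions: wrap each in parentheses, join with & (and) or | (or)
high_scorers = df[(df["PPG"] > 30) & (df["APG"] >= 7)]
print(high_scorers)

# Sort by points per game, highest first
by_points = df.sort_values("PPG", ascending=False)
print(by_points)

# Case-insensitive name lookup, which is more forgiving of user input
name = "player c"
matches = df[df["NAME"].str.lower() == name.lower()]
print(matches)
```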

Lesson Portion 4

What We Will Be Covering

  • Example of analyzing data using both NumPy and Pandas libraries
  • Importing data into NumPy and Pandas
  • Performing basic data analysis tasks such as mean, median, and standard deviation
  • Visualization of data using the Matplotlib library

Example of analyzing data using both NumPy and Pandas libraries

import numpy as np
import pandas as pd

# Load CSV file into a Pandas DataFrame

df = pd.read_csv('/home/edwin/vscode/fastpage1/_notebooks/files/nba_player_statistics.csv')

# Filter the DataFrame to only include stats for the specified player

player_name = input("Enter player name: ")
player_stats = df[df['NAME'] == player_name]
if player_stats.empty:
    print("No stats found for that player.")
else:

    # Convert the player stats to a NumPy array
    player_stats_np = np.array(player_stats[['PPG', 'RPG', 'APG', 'P+R+A']])

    # Calculate the average of each statistic for the player

    player_stats_avg = np.mean(player_stats_np, axis=0)

    # Print out the average statistics for the player

    print(f"\nAverage stats for {player_name}:")
    print(f"PPG: {player_stats_avg[0]:.2f}")
    print(f"RPG: {player_stats_avg[1]:.2f}")
    print(f"APG: {player_stats_avg[2]:.2f}")
    print(f"P+R+A: {player_stats_avg[3]:.2f}")
Average stats for LeBron James:
PPG: 21.00
RPG: 11.00
APG: 5.00
P+R+A: 37.00

NumPy's role in this code is to perform the math on arrays efficiently. The selected columns of the Pandas DataFrame are converted to a NumPy array, and np.mean(..., axis=0) then calculates the average of each statistic for the player you entered. On large data sets, this array-based approach is much faster than looping over the values in plain Python. NumPy does the math for us.
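The axis=0 argument is what makes np.mean average each statistic separately. The sketch below uses made-up numbers for a player who appears in two rows (for example, two seasons) to show how the choice of axis changes the result.

```python
import numpy as np

# Two made-up rows (e.g. two seasons) with columns PPG, RPG, APG
stats = np.array([
    [20.0, 10.0, 4.0],
    [22.0, 12.0, 6.0],
])

col_means = np.mean(stats, axis=0)  # average of each column: [21. 11.  5.]
row_means = np.mean(stats, axis=1)  # average across each row
overall = np.mean(stats)            # single average over every value

print(col_means)
print(row_means)
print(overall)
```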

Importing Data into NumPy and Pandas, Performing Basic Data Analysis (Mean, Median, and Standard Deviation), and Visualizing Data with Matplotlib

Matplotlib is a Python library for creating visualizations of data, such as charts, plots, and diagrams.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('/home/edwin/vscode/fastpage1/_notebooks/files/nba_player_statistics.csv')

# Print the first 5 rows of the DataFrame
print(df.head())

# Calculate the mean, median, and standard deviation of the 'MPG' (minutes per game) column
mean_minutes = df['MPG'].mean()
median_minutes = df['MPG'].median()
stddev_minutes = df['MPG'].std()

# Print the results
print('Mean Minutes: ', mean_minutes)
print('Median Minutes: ', median_minutes)
print('Standard Deviation Minutes: ', stddev_minutes)

# Create a histogram of the 'MPG' column using Matplotlib
plt.hist(df['MPG'], bins=20)
plt.title('MPG Histogram')
plt.xlabel('MPG')
plt.ylabel('Frequency')
plt.show()
   RANK             NAME TEAM POS   AGE  GP   MPG  USG%   TO%  FTA  ...   APG  \
0     1     Jimmy Butler  Mia   F  33.6   1  42.9  34.3   9.9    8  ...  11.0   
1     2    Kawhi Leonard  Lac   F  31.8   2  40.2  30.0  11.9   17  ...   6.0   
2     3  Khris Middleton  Mil   F  31.7   1  33.1  37.5  19.8   10  ...   4.0   
3     4     Devin Booker  Pho   G  26.5   2  44.1  28.8  16.2   14  ...   6.0   
4     5     De'Aaron Fox  Sac   G  25.3   2  38.2  31.6   9.0   14  ...   7.0   

   SPG  BPG  TPG   P+R   P+A  P+R+A    VI   ORtg   DRtg  
0  3.0  0.0  3.0  40.0  46.0   51.0  11.6  117.2  103.8  
1  2.0  0.5  3.0  41.0  40.5   47.0  11.0  129.5  110.4  
2  0.0  0.0  5.0  42.0  37.0   46.0  12.8  115.5  111.9  
3  2.5  1.5  4.0  33.0  38.0   39.0   5.2  121.9  111.0  
4  3.5  0.5  2.5  34.0  38.0   41.0   9.1  112.6  108.8  

[5 rows x 29 columns]
Mean Minutes:  20.985483870967748
Median Minutes:  23.0
Standard Deviation Minutes:  12.844102823170283

In this example code, we first import the necessary libraries: NumPy, Pandas, and Matplotlib. We then load the CSV file into a Pandas DataFrame with the pd.read_csv() function and print the first 5 rows with the df.head() method. Next, we calculate the mean, median, and standard deviation of the 'MPG' (minutes per game) column using the Pandas Series methods .mean(), .median(), and .std(), and print the results. Finally, we create a histogram of the 'MPG' column with Matplotlib's plt.hist() function, set a title and axis labels, and call plt.show() to display the plot. Even though NumPy is not called directly in this code, it is an important underlying component of Pandas and Matplotlib. Both libraries use NumPy arrays internally, which lets them load, manipulate, and visualize data efficiently.
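Watch the standard deviation when you combine the two libraries. Pandas' .std() returns the sample standard deviation (ddof=1) by default, while np.std returns the population standard deviation (ddof=0). The same column can therefore give two different numbers. Both functions accept a ddof argument, as this quick check on made-up values shows:

```python
import numpy as np
import pandas as pd

minutes = pd.Series([1.0, 2.0, 3.0, 4.0])  # made-up values

print(minutes.std())                        # sample std (ddof=1): about 1.291
print(np.std(minutes.to_numpy()))           # population std (ddof=0): about 1.118
print(np.std(minutes.to_numpy(), ddof=1))   # matches minutes.std()
print(minutes.std(ddof=0))                  # matches np.std(...) with its default
```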

Lesson Portion 5: Summary

Summary and Goals of the Lesson:

One of our goals was to help you understand data analysis and how it can help optimize business performance. We also wanted to make sure you understood how the Pandas and NumPy libraries are used in data analysis, with a focus on NumPy. As people who work with data, we find Pandas incredibly useful for manipulating, analyzing, and visualizing data in Python, and we use it to calculate individual player and team statistics. Because our group works with numerical data, NumPy is one of our favorite tools for working with arrays and applying mathematical functions to them. It is very fast at computing and manipulating arrays, which makes it valuable for tracking statistics, something that is important to our group. For example, if you have an array of the points scored by each player in a game, you can use NumPy to calculate the total points scored, the average points per player, or the highest- and lowest-scoring players.
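The points-per-player example from the summary can be sketched in a few lines. The player names and point totals are hypothetical. Note that np.argmax and np.argmin return the first match if there is a tie.

```python
import numpy as np

# Hypothetical points scored by each player in one game
players = np.array(["Ana", "Ben", "Cara", "Dev", "Eli"])
points = np.array([18, 7, 25, 12, 0])

total = np.sum(points)                   # total points scored: 62
average = np.mean(points)                # average points per player: 12.4
top_scorer = players[np.argmax(points)]  # highest scorer: Cara
low_scorer = players[np.argmin(points)]  # lowest scorer: Eli

print(f"Total: {total}, average: {average:.1f}")
print(f"Highest: {top_scorer} ({points.max()}), lowest: {low_scorer} ({points.min()})")
```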

Lesson Portion 6: Hacks

Printing a CSV File (0.5)

  • Use this link https://github.com/ali-ce/datasets to select a CSV file on a topic you are interested in, or find one online.
    • Products.csv from CES 2015
  • Once you select your topic, make sure it is a CSV file, then click the button labeled Raw.
  • Copy the raw text, create a new file whose name ends in .csv, and paste the contents into it.
  • Below is a starting point that you can use for your hacks.
  • Your goal is to print two specific parts of the data (for example, population and country).

Popcorn Hacks (0.2)

  • Lesson Portion 2 (Activity 1).

Answering Questions (0.2)

  • Found Below.

Submit By Thursday 8:35 A.M.

  • How to submit: send a blog post that includes all of your hacks to Joshua Williams on Slack.

Activity 1 Hacks

import numpy as np

# Create a NumPy array of the students' scores
grades = np.array([950, 950, 960, 970, 970, 990, 1010, 1020, 1020, 1040, 1050, 1070, 1080, 1090, 1100, 1120, 1130, 1150, 1150, 1170, 1200, 1210, 1230, 1270, 1280, 1290, 1300, 1320, 1340, 1340, 1350, 1370, 1370, 1380, 1390, 1400, 1430, 1450, 1460, 1470, 1490, 1500, 1520, 1550, 1580, 1590, 1600])

# Calculate the 25th, 50th, and 75th percentiles of the scores
percentiles = np.percentile(grades, [25, 50, 75])

# Print the results
print("The 25th percentile class grade is", percentiles[0])
print("The 50th percentile class grade is", percentiles[1])
print("The 75th percentile class grade is", percentiles[2])

# Determine the number of students who scored in the top 10%
top_10_percent = np.percentile(grades, 90)
highest_grades = grades[grades >= top_10_percent]

print("There are", len(highest_grades), "students with a score in the top 10%.")
The 25th percentile class grade is 1075.0
The 50th percentile class grade is 1270.0
The 75th percentile class grade is 1395.0
There are 5 students with a score in the top 10%.

CSV Hacks

import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv("files/Products.csv")

# Select four columns and keep the first 10 rows
products = df[['Company', 'Product', 'Product Category', 'Starting Price']].head(10)

# Display the data as a table
print(products)
          Company                 Product         Product Category  \
0     iHealth Lab           iHealth Align         Health & Fitness   
1     BlueMaestro                 Pacif-i   Baby, Health & Fitness   
2  Acoustic Sheep             SleepPhones  Music, Health & Fitness   
3         Evollve             Ozbot Bit's                 Robotics   
4      Fuz Design                    Noke                 Security   
5         Quitbit         Quitbit Lighter         Health & Fitness   
6   The Eye Tribe   The Eye Tribe Tracker              Eye Tracker   
7      Swiftpoint           Swiftpoint GT     Computer Accessories   
8            Edyn  Water Sensor and Valve                     Home   
9   Get Narrative        Narrative Clip 2                   Camera   

  Starting Price  
0         $16.95  
1         $39.00  
2         $39.95  
3         $49.99  
4         $59.99  
5         $99.00  
6         $99.00  
7        €109.00  
8        $159.98  
9        $199.00  

Extra

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV file into a Pandas DataFrame
df = pd.read_csv('/home/edwin/vscode/fastpage1/_notebooks/files/Products.csv')

# Print the first 5 rows of the DataFrame
print(df.head())

# 'Starting Price' is stored as text such as "$16.95" or "€109.00", so calling .mean() on it
# raises a TypeError. Keep only US-dollar prices (so currencies are not mixed) and convert them to numbers.
is_usd = df['Starting Price'].astype(str).str.startswith('$')
prices = pd.to_numeric(
    df.loc[is_usd, 'Starting Price'].str.replace(r'[$,]', '', regex=True),
    errors='coerce',
).dropna()

# Calculate the mean, median, and standard deviation of the prices
mean_price = prices.mean()
median_price = prices.median()
stddev_price = prices.std()

# Print the results
print('Mean Price (USD): ', mean_price)
print('Median Price (USD): ', median_price)
print('Standard Deviation of Price (USD): ', stddev_price)

# Create a histogram of the prices using Matplotlib
plt.hist(prices, bins=20)
plt.title('Starting Price of CES Products (USD)')
plt.xlabel('Price (USD)')
plt.ylabel('Frequency')
plt.show()
         Product                                              Image  \
0  iHealth Align  http://www.cesweb.org/CES/media/2014/Innovatio...   
1        Pacif-i  http://bluemaestro.com/wp-content/uploads/2014...   
2    SleepPhones  http://www.sleepphones.com/sites/default/files...   
3    Ozbot Bit's  http://www.ozobot.com/wp-content/uploads/revsl...   
4           Noke  http://cdn.shopify.com/s/files/1/0276/6013/t/2...   

          Company         Product Category  \
0     iHealth Lab         Health & Fitness   
1     BlueMaestro   Baby, Health & Fitness   
2  Acoustic Sheep  Music, Health & Fitness   
3         Evollve                 Robotics   
4      Fuz Design                 Security   

                                         Description Starting Price  \
0  This powerful glucose meter plugs directly int...         $16.95   
1  With a temperature sensor built into the pacif...         $39.00   
2  Relieve stress and listen to music with wonder...         $39.95   
3  As seen on Wall Street Journal, Forbes, CNET a...         $49.99   
4  The world's first bluetooth padlock. Noke is a...         $59.99   

                    Company HQ CES Innovation Award Honoree  \
0    California, United States          Best of Innovations   
1               United Kingdom                          NaN   
2  Pennsylvania, United States                          NaN   
3    California, United States                          NaN   
4          Utah, United States          Best of Innovations   

                                            Homepage  
0  http://www.ihealthlabs.com/glucometer/ihealth-...  
1      http://bluemaestro.com/pacifi-smart-pacifier/  
2  http://www.sleepphones.com/store/sleepphones-s...  
3                             http://www.ozobot.com/  
4                   http://fuzdesigns.com/pages/Noke  

Question Hacks

  1. What is NumPy and how is it used in data analysis?

NumPy is a Python library that provides support for multidimensional arrays and matrices, along with a large collection of mathematical operations that can be performed on them. It is commonly used in data analysis for numerical computation, data manipulation, and statistics.

  2. What is Pandas and how is it used in data analysis?

Pandas is a Python library that is used for data manipulation, analysis, and visualization. It provides the Series and DataFrame structures for working with labeled, tabular data.

  3. How is NumPy different from Pandas for data analysis?

NumPy is primarily focused on numerical computations with arrays and matrices, while Pandas is designed for data manipulation and analysis of structured data, offering more advanced functionality such as data alignment, merging, and grouping.
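The difference shows up when the same numbers are stored both ways. The values and team codes below are made up. A plain NumPy array is addressed by position. A Pandas DataFrame adds column labels, mixed column types, and operations such as grouping directly.

```python
import numpy as np
import pandas as pd

# NumPy: a plain 2-D array; rows and columns are addressed by position only
arr = np.array([[35.0, 5.0], [22.0, 9.0], [31.0, 3.0]])
print(arr[:, 0].mean())  # mean of the first column

# Pandas: the same numbers with column labels, plus a text column of a different type
df = pd.DataFrame(arr, columns=["PPG", "RPG"])
df["TEAM"] = ["Mia", "Mil", "Mia"]
print(df["PPG"].mean())  # same mean, selected by name

# Grouping by a label column, which Pandas provides directly
by_team = df.groupby("TEAM")["PPG"].mean()
print(by_team)
```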

  4. What is a DataFrame?

A DataFrame is a two-dimensional, size-mutable table with labeled rows and columns, where each column can hold a different data type. Pandas provides methods to analyze, filter, and transform the data it contains.

  5. What are some common operations you can perform with NumPy?

NumPy provides a variety of common numerical operations for arrays and matrices, including basic arithmetic, statistical analysis, linear algebra, Fourier transforms, and random number generation.
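The sketch below shows three of those operation families on small made-up inputs: solving a linear system, a Fourier transform, and seeded random number generation. The seed value 42 is arbitrary.

```python
import numpy as np

# Linear algebra: solve the system a @ x = b  (2x + y = 3, x + 3y = 5)
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(a, b)
print(x)  # [0.8 1.4]

# Fourier transform of a short signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)

# Random number generation with a seeded generator, so results are reproducible
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=3)
print(samples)
```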

  6. How can you incorporate either of these data analysis tools (NumPy, Pandas) into your project?

You can incorporate NumPy or Pandas into your project by importing the library in your Python code and using its functions and methods for data analysis and manipulation. For example, I can make a CSV file of smartphone data and use it to graph a line of best fit for my smartphones.
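The line-of-best-fit idea can be sketched with np.polyfit, which fits a polynomial by least squares. Degree 1 gives a straight line. The smartphone release years and prices below are hypothetical placeholders, not real data. In a real project they would come from your CSV file, for example via pd.read_csv.

```python
import numpy as np

# Hypothetical smartphone data: release year and launch price in US dollars
years = np.array([2018, 2019, 2020, 2021, 2022])
prices = np.array([700, 750, 790, 860, 900])

# Fit price = slope * year + intercept by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(years, prices, deg=1)
print(f"Price rises by about ${slope:.0f} per year")

# Use the fitted line to estimate a price for a later year
predicted_2023 = slope * 2023 + intercept
print(f"Estimated 2023 price: ${predicted_2023:.0f}")
```

The fitted `slope` and `intercept` can be passed straight to Matplotlib (for example, `plt.plot(years, slope * years + intercept)`) to draw the line over a scatter plot of the data.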