Python & Pandas Data Analysis Questions & Answers

Question 1

True or false: "=" and "==" perform the same function in Python.

Answer: False
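As a minimal sketch of the difference (the variable names are illustrative only): "=" binds a value to a name, while "==" compares two values and evaluates to a Boolean.

```python
x = 5                 # "=" assigns the value 5 to the name x
is_five = (x == 5)    # "==" compares x with 5 and yields a bool
is_six = (x == 6)

print(is_five)  # True
print(is_six)   # False
```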

Question 2

What is the output of the following lines of code:

[Image: code snippet not available]

Answer: 211

Question 3

Which of the following is not a valid variable type in Python?

Answer: DoubleString

Question 4

What will be the result of the following code?

[Image: code snippet not available]

Answer: 6.0

Question 5

What will be the output of the following Python expression?

[Image: Python expression not available]

Answer: “100100100”

Question 6

Which code will filter the dataframe df for rows where the column “col1” has values that are NOT equal to 5?

Answer: df[df["col1"] != 5]
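A small sketch of this filter, assuming a made-up dataframe (the values and the extra column col2 are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col1": [5, 3, 5, 8], "col2": ["a", "b", "c", "d"]})

# Keep only the rows whose col1 value differs from 5.
not_five = df[df["col1"] != 5]
print(not_five)
```

The original df is left unchanged; the filter returns a new dataframe.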

Question 7

Which of the following pandas methods will report summary statistics of numeric columns?

Answer: describe()
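For illustration, a sketch on a made-up dataframe (column names and values are assumptions). By default, describe() summarizes only the numeric columns when the dataframe contains any.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],        # non-numeric, skipped by default
    "price": [10.0, 20.0, 30.0],
    "qty": [1, 2, 3],
})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()
print(stats)
```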

Question 8

Suppose we have a dataframe df, which includes a column animal. I would like to count the number of times each animal appears in the column, e.g. Elephant: 8, Zebra: 7, etc. Which of the following commands can I use to do that?

Answer: df["animal"].value_counts()
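A quick sketch, assuming a small made-up animal column (the counts here are not the ones from the question):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"animal": ["Elephant", "Zebra", "Elephant", "Lion", "Zebra", "Elephant"]}
)

# value_counts() returns a Series of counts, most frequent value first.
counts = df["animal"].value_counts()
print(counts)
```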

Question 9

Assuming dataframe df is properly defined, what line of code would output the first 10 rows of data?

Answer: df.head(10)

Question 10

Which option will help us select elements "a" and "b" in the list L = ["a", "b", "c"]? (Select all that apply.)

Answers:

  • L[0:2]
  • L[-3:-1]
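Both answers can be checked directly. A slice includes the start position and excludes the stop position, and negative positions count from the end of the list.

```python
L = ["a", "b", "c"]

first_two = L[0:2]        # positions 0 and 1; position 2 is excluded
first_two_neg = L[-3:-1]  # -3 is "a", -1 is "c" and is excluded

print(first_two)      # ['a', 'b']
print(first_two_neg)  # ['a', 'b']
```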

Question 11

Which of these will help us select two columns, c1 and c2, from the dataframe df?

Answer: df[["c1", "c2"]]

Question 12a

What will be the output of List1[-2] if List1 is defined as below?

[Image: definition of List1 not available]

Answer: 'Professor'

Question 12b

A list can hold multiple types of data.

Answer: True

Question 13

What would the following code output?

[Image: code snippet not available]

Answer: 5

Question 14

How would you return the enrollment count in business from this dictionary?

[Image: definition of the Univ dictionary not available]

Answer: Univ["business"]["enrollment"]

Question 15

Which of the following is appropriate syntax for a key-value pair in a dictionary?

Answer: "Location": "Urbana"

Question 16

What will be the output of this code?

[Image: code snippet not available]

Answer: {'Dan': 3.8, 'Jason': 4, 'Jasmine': 4, 'Lori': 3.85}

Question 18

In the below scatterplot, what is the relationship between the variables on the x-axis and y-axis?

[Image: scatterplot not available]

Answer: A positive relationship

Question 19

The following boxplot shows the distribution of tip amounts by day of the week. Based on the figure, which of the following is true? (Select all that apply.)

[Image: boxplot of tip amounts by day of the week, not available]

Answers:

  • The median tip on Sat is less than that on Sun.
  • The first quartile for tip is similar across all four days.
  • Friday and Sunday have no outliers.

Question 20

The following bar chart shows average rainfall by month. Based on the figure, which of the following is true?

[Image: bar chart of average rainfall by month, not available]

Options:

  • There was no day in March when it rained for more than 50 mm.
  • For each day of April 2014, at least 50 mm of rain was recorded.
  • The day with the highest rain was in May 2014.

Answer: None of the other options.

Question 21

To filter a dataframe df such that a column, col1, has values only greater than 100, we run the following code:

df[df['col1'] > 100]

What will be the outcome if we only run the code within the outer set of square brackets, i.e., df['col1'] > 100?

Answer: A series of True and False will be printed.
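A minimal sketch of the two steps, assuming a made-up col1 (the values are hypothetical): the inner comparison yields a Boolean Series, and indexing df with that Series keeps the rows marked True.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col1": [50, 150, 200]})

mask = df["col1"] > 100   # Boolean Series, one True/False per row
filtered = df[mask]       # rows where the mask is True

print(mask)
print(filtered)
```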

Question 22

A scatterplot is a useful chart to visualize the distribution of a categorical variable.

Answer: False

Question 23

Suppose you have a dataset of box office revenues for the top 1000 movies. The dataset includes the genre of each movie. Which chart could you use to show average revenues by genre?

Answer: Bar chart

Question 24

What is the value of the interquartile range (IQR)?

Answer: Q3 – Q1
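As a worked sketch on made-up numbers, the quartiles can be computed with pandas quantile() (which uses linear interpolation by default) and subtracted:

```python
import pandas as pd

# Hypothetical data for illustration only.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1 = s.quantile(0.25)   # first quartile
q3 = s.quantile(0.75)   # third quartile
iqr = q3 - q1           # interquartile range

print(q1, q3, iqr)  # 3.0 7.0 4.0
```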

Question 25

What purpose does a cross-tabulation (pd.crosstab) serve in data analysis?

Answer: Examine the relationship between two categorical variables.

Question 26

What does the groupby function in pandas do?

Answer: Group rows of a dataframe based on one or more columns

Question 27

Which of the following measures cannot be inferred from a box plot?

Answers:

  • Count
  • Min
  • Mean

Question 28

Which of the following plots can be created based on a single numeric variable?

Answers:

  • Box plot
  • Histogram

Question 29

Which of the following statements is TRUE?

Answer: Each column in a pandas dataframe is a Series.

Question 30

Consider the dataframe df with categorical variable catvar which takes n possible values. How many dummy variables will the following code create?

[Image: code snippet not available]

Answer: n dummy variables

Question 31

Consider the dataframe df with m numerical variables, and one categorical variable catvar which takes n possible values. How many variables will be in the dataframe created by the following code?

[Image: code snippet not available]

Answer: m+n-1 variables
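The code for Questions 30 and 31 is not reproduced here, so the following is only a sketch under an assumption: that Question 30 calls pd.get_dummies on the catvar column alone, and Question 31 calls it on the whole dataframe with drop_first=True. The data, with m = 2 numeric columns (x1, x2) and n = 3 categories, is hypothetical.

```python
import pandas as pd

# Hypothetical data: m = 2 numeric columns, catvar with n = 3 values.
df = pd.DataFrame({
    "x1": [1, 2, 3, 4],
    "x2": [10, 20, 30, 40],
    "catvar": ["a", "b", "c", "a"],
})

# Assumed form of Question 30: one dummy column per category (n columns).
dummies = pd.get_dummies(df["catvar"])

# Assumed form of Question 31: the numeric columns are kept and the first
# category is dropped, giving m + (n - 1) columns.
wide = pd.get_dummies(df, columns=["catvar"], drop_first=True)

print(dummies.shape[1])  # 3
print(wide.shape[1])     # 4
```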

Question 32

Consider the dataframe df with categorical variable catvar and a numerical variable numvar. What is the correct code to find the mean of numvar for each value of catvar?

Answer: df.groupby("catvar")["numvar"].mean()

Question 33

Consider the dataframe df with categorical variable catvar and two numerical variables numvar1 and numvar2. What is the correct code to find the mean of numvar1 and numvar2 for each value of catvar?

Answer: df.groupby("catvar")[["numvar1", "numvar2"]].mean()
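A short sketch of both group-wise means from Questions 32 and 33, on made-up data. Selecting one column with single brackets returns a Series; selecting a list of columns with double brackets returns a dataframe.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "catvar": ["a", "a", "b"],
    "numvar1": [1.0, 3.0, 5.0],
    "numvar2": [10.0, 20.0, 30.0],
})

# Mean of one numeric column per category (a Series).
one_mean = df.groupby("catvar")["numvar1"].mean()

# Mean of two numeric columns per category (a DataFrame).
two_means = df.groupby("catvar")[["numvar1", "numvar2"]].mean()

print(one_mean)
print(two_means)
```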

Question 34

The dataframe df has 100 rows and 5 columns. After running df.duplicated().sum(), you find that the dataframe has 9 duplicate rows. How many rows will df have after running df.drop_duplicates()?

Answer: 100. The code only returns a copy of df with the duplicates removed; it does not change the underlying dataframe.

How many rows will df have after running df.drop_duplicates(inplace = True)?

Answer: 91 (100 – 9)
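The same behavior can be seen on a smaller made-up dataframe (5 rows, 2 of them duplicates, instead of 100 and 9):

```python
import pandas as pd

# Hypothetical data: rows 1 and 3 repeat the rows just before them.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "label": ["a", "a", "b", "b", "c"],
})

n_duplicates = int(df.duplicated().sum())   # 2

deduped = df.drop_duplicates()    # returns a new dataframe
rows_after_copy = len(df)         # df itself is unchanged: still 5

df.drop_duplicates(inplace=True)  # modifies df directly
rows_after_inplace = len(df)      # 5 - 2 = 3
```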

Question 35

The dataframe df has 100 rows and 5 columns. After running df.isnull().sum(), you find that the dataframe has 5 missing values in Column1 and 5 missing values in Column2. How many rows will df have after running df.dropna()?

Answer: 100. The code only returns a copy of df with the rows containing missing values removed; it does not change the underlying dataframe.

How many rows will df have after running df.dropna(inplace = True)?

Answers:

  • At least 90 observations (100 – 5 – 5, if the missing values are all in different rows)
  • At most 95 observations (100 – 5, if the missing values in Column1 are in the same rows as the missing values in Column2)
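The two bounds can be sketched on a smaller made-up example: 6 rows with 2 missing values in each column, instead of 100 rows with 5.

```python
import pandas as pd

nan = float("nan")

# Case 1: the missing values fall in different rows (2 + 2 rows dropped).
disjoint = pd.DataFrame({
    "Column1": [nan, nan, 3.0, 4.0, 5.0, 6.0],
    "Column2": [1.0, 2.0, nan, nan, 5.0, 6.0],
})

# Case 2: the missing values share the same rows (only 2 rows dropped).
overlap = pd.DataFrame({
    "Column1": [nan, nan, 3.0, 4.0, 5.0, 6.0],
    "Column2": [nan, nan, 3.0, 4.0, 5.0, 6.0],
})

rows_disjoint = len(disjoint.dropna())  # 6 - 2 - 2 = 2
rows_overlap = len(overlap.dropna())    # 6 - 2 = 4
```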

Question 36

For the next three questions, consider the following dataframe, called df.

[Image: the dataframe df, not available]

Which of the following lines of code will return the following subset? (Select all that apply.)

[Image: target subset, not available]

Answers:

  • df.loc[[1, 3, 5],:]
  • df.loc[[1, 3, 5]]
  • df.iloc[0:3,:]
  • df.loc[1:5,:]

Question 37

Which of the following lines of code will return the following subset? (Select all that apply.)

[Image: target subset, not available]

Answers:

  • df.loc[:,["Col1", "Col3", "Col5"]]
  • df.iloc[:,0:3]

Question 39

Which of the following lines of code will return the following subset? (Select all that apply.)

[Image: target subset, not available]

Answers:

  • df.loc[[1,3,5],["Col1", "Col3", "Col5"]]
  • df.iloc[0:3,0:3]
  • df.loc[1:5,"Col1":"Col5"]
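The dataframe for these questions is not reproduced here, so the sketch below uses a hypothetical stand-in whose first three index labels are 1, 3, 5 and whose first three columns are Col1, Col3, Col5, as the answers imply. It shows that loc selects by label (slices include both endpoints), while iloc selects by position (slices exclude the stop).

```python
import pandas as pd

# Hypothetical stand-in for df; values, Col7, and rows 7 and 9 are made up.
df = pd.DataFrame(
    [[10 * r + c for c in range(4)] for r in range(5)],
    index=[1, 3, 5, 7, 9],
    columns=["Col1", "Col3", "Col5", "Col7"],
)

by_label = df.loc[[1, 3, 5], ["Col1", "Col3", "Col5"]]  # explicit labels
by_position = df.iloc[0:3, 0:3]                         # positions 0, 1, 2
by_label_slice = df.loc[1:5, "Col1":"Col5"]             # label slice, inclusive

print(by_label)
```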

Question 40

You have an inventory dictionary that tracks items at two different outlets:

inventory = {
"OutletA": {"chairs": 15, "desks": 30},
"OutletB": {"chairs": 12, "desks": 21}
}

Which expression correctly retrieves the number of desks at OutletB?

Answer: inventory["OutletB"]["desks"]
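Using the dictionary from the question, the lookup works in two steps: the first key returns the inner dictionary for the outlet, and the second key returns the count.

```python
inventory = {
    "OutletA": {"chairs": 15, "desks": 30},
    "OutletB": {"chairs": 12, "desks": 21},
}

outlet_b = inventory["OutletB"]          # inner dictionary for OutletB
desks_b = inventory["OutletB"]["desks"]  # chained lookup

print(outlet_b)  # {'chairs': 12, 'desks': 21}
print(desks_b)   # 21
```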

Question 41

Consider a pandas DataFrame df that has a numeric column called price. You want to filter for rows where price is at least 50 and less than 100. Which line of code accomplishes this?

Answer: df[(df["price"] >= 50) & (df["price"] < 100)]
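A sketch on made-up prices. The parentheses around each comparison are required because & binds more tightly than >= and <.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"price": [20, 50, 75, 100, 120]})

# 50 is kept (at least 50); 100 is dropped (not less than 100).
in_range = df[(df["price"] >= 50) & (df["price"] < 100)]
print(in_range)
```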

Question 42

A study was conducted to analyze the relationship between study habits and whether students drink coffee. The data has been stored in a dataframe:

study_hours / drinks_coffee    FALSE    TRUE
Low                            0.5      0.5
Medium                         0.4      0.6
High                           0.25     0.75

study_hours (possible values: Low, Medium, High)

drinks_coffee (possible values: True, False)

In the crosstab, what is the correct interpretation of the value 0.75 in the bottom right?

Answer: 75% of students who study a lot also drink coffee.
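A table like this one is consistent with a row-normalized crosstab. The sketch below assumes pd.crosstab with normalize="index" and uses made-up student records chosen to reproduce the same proportions; each row then sums to 1, which is why 0.75 reads as a share of the High group.

```python
import pandas as pd

# Hypothetical records: 2 Low, 5 Medium, 4 High students.
df = pd.DataFrame({
    "study_hours": ["Low"] * 2 + ["Medium"] * 5 + ["High"] * 4,
    "drinks_coffee": [False, True]
                     + [False, False, True, True, True]
                     + [False, True, True, True],
})

# normalize="index" divides each count by its row total.
table = pd.crosstab(df["study_hours"], df["drinks_coffee"], normalize="index")
print(table)
```

Rows appear in alphabetical order (High, Low, Medium) rather than in the order shown in the question.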

Question 43

You want to calculate the sum and standard deviation of Exam_Score and Study_Hours, grouped by Major and Year. Which of the following is correct?

Answer: df.groupby(["Major", "Year"])[["Exam_Score", "Study_Hours"]].agg(["sum", "std"])
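A sketch of this call on made-up data (the majors, years, and scores are hypothetical). The result has one row per (Major, Year) pair and a two-level column index pairing each variable with each statistic.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "Major": ["Econ", "Econ", "Econ", "Econ", "Math", "Math"],
    "Year": [1, 1, 2, 2, 1, 1],
    "Exam_Score": [80, 90, 70, 75, 60, 100],
    "Study_Hours": [5, 7, 4, 6, 3, 9],
})

result = (
    df.groupby(["Major", "Year"])[["Exam_Score", "Study_Hours"]]
      .agg(["sum", "std"])
)
print(result)
```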