02-dataframe.ipynb notebook - Checkpoint solutions use df_dask, which currently does not exist? #1

@billglennon

Description

In the 02-dataframe.ipynb notebook, you have the following:
df = dd.read_csv("data/yellow_tripdata_2019-.csv")
df
and
df = dd.read_csv("data/yellow_tripdata_2019-.csv",
                 dtype={'RatecodeID': 'float64',
                        'VendorID': 'float64',
                        'passenger_count': 'float64',
                        'payment_type': 'float64'
                        })

However, both checkpoint solutions use df_dask, which is never defined in the notebook.

Solution 1

std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()

Solution 2

mean_total = df_dask.total_amount.mean()
std_total = df_dask.total_amount.mean()

dask.compute(mean_total, std_total)

I recommend naming the Dask dataframe df_dask throughout the Dask dataframe section and updating the existing code to match.
A few places still use df.xxx when going over the Dask dataframe, e.g.:
%%time

mean_tip_amount = df.groupby("passenger_count").tip_amount.mean()
mean_tip_amount

Also, if you use df_dask, I would make Solution 1 the following, with a second line added to show the output:
std_tip = df_dask.groupby("passenger_count").tip_amount.std().compute()
std_tip

Thanks for putting this course together and the notebooks! Very much enjoyed it!
