Commit d112635

Update chapter 3
1 parent 39d6a8e commit d112635

8 files changed

+510
-374
lines changed

docs/3. Productionizing/3.0. Package.md

Lines changed: 41 additions & 46 deletions
@@ -6,120 +6,115 @@ description: Learn how to structure and build your Python code into a package, w
66

77
## What is a Python package?
88

9-
A [Python package](https://packaging.python.org/en/latest/) is a structured collection of Python modules, which allows for a convenient way to organize and share code. Among the various formats a package can take, [the wheel format](https://peps.python.org/pep-0427/) (.whl) stands out. Wheels are a built package format that can significantly speed up the installation process for Python software, compared to distributing source code and requiring the user to build it themselves.
9+
A [Python package](https://packaging.python.org/en/latest/) is a structured directory of Python modules that can be easily distributed and installed. In MLOps, packaging is the foundation for creating reproducible, maintainable, and shareable machine learning systems.
1010

11-
## Why do you need to create a Python package?
11+
Packages are typically distributed as **wheels** (`.whl` files), a pre-built format that makes installation faster and more reliable than installing from source code.
1212

13-
Creating a Python package offers multiple benefits, particularly for developers looking to distribute their code effectively:
13+
## Why create a Python package for an ML project?
1414

15-
- **As a Library:** Packaging your code as a library enables you to share reusable components across different projects. This is common in the Python ecosystem, with examples like `numpy`, `pandas`, and `tensorflow` being shared as libraries.
16-
- **As an Application:** Packaging also plays a crucial role in deploying applications. It simplifies the distribution and installation process, ensuring your software can be easily executed on various systems, including web or mobile platforms.
15+
Packaging your ML project is a critical step in moving from research to production. It provides several key advantages:
1716

18-
Additionally, creating a package can enhance the maintainability of your code, enforce good coding practices by encouraging modular design, and facilitate version control and dependency management.
17+
- **Reproducibility:** It bundles your code and its specific dependencies, ensuring that it runs consistently across different environments.
18+
- **Modularity:** It encourages you to organize code into reusable components (e.g., for data processing, feature engineering, or model training), which can be shared across projects.
19+
- **Simplified Deployment:** It allows you to distribute your project as a versioned library for other services to use or as a standalone application with defined entrypoints.
20+
- **Clear Structure:** It enforces a standardized project structure, making it easier for new team members to understand and contribute to the codebase.
1921

20-
## Which tool should you use to create a Python package?
22+
## Which tool should you use to build a Python package?
2123

22-
The Python ecosystem provides several tools for packaging, each with its unique features and advantages. While the choice can seem overwhelming, as humorously depicted in the [xkcd comic on Python environments](https://xkcd.com/1987/), [uv](https://docs.astral.sh/uv/) emerges as a standout option. Uv simplifies dependency management and packaging, offering an intuitive interface for developers.
24+
The Python packaging ecosystem has many tools, as humorously noted in this [xkcd comic](https://xkcd.com/1987/). This course uses **[uv](https://docs.astral.sh/uv/)**, a fast, comprehensive tool that handles dependency management, virtual environments, and package building.
2325

24-
To get started with uv for packaging, you can use the following commands:
26+
Key `uv` commands for packaging include:
2527

26-
- **Initiate an uv package**:
28+
- **`uv sync`**: Creates or updates the project's virtual environment and installs the dependencies declared in `pyproject.toml`, along with any default dependency groups.
29+
- **`uv sync --all-groups`**: Installs all dependencies, including optional groups for development, testing, and documentation.
30+
- **`uv build --wheel`**: Builds your package into a `.whl` file, which appears in the `dist/` directory.
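
As a rough illustration of what to expect in `dist/` after a build, the sketch below derives a wheel filename from a project name and version. The helper name `expected_wheel_filename` is hypothetical, and the default `py3-none-any` tag assumes a pure-Python project; packages with compiled extensions get platform-specific tags instead.

```python
import re


def expected_wheel_filename(name: str, version: str, tag: str = "py3-none-any") -> str:
    """Return the wheel filename a pure-Python build would likely produce.

    Assumption: `tag` is the python-abi-platform triple; the default only
    holds for pure-Python packages. The version is used as given.
    """
    # Wheel filenames use a normalized project name: runs of "-", "_" and "."
    # are collapsed to a single "_" and the result is lowercased.
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    return f"{normalized}-{version}-{tag}.whl"


print(expected_wheel_filename("bikes", "3.0.0"))  # bikes-3.0.0-py3-none-any.whl
```

The name and version come from the `[project]` table in `pyproject.toml`, so bumping the version there changes the filename of the next build.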
2731

28-
```bash
29-
uv sync
30-
```
32+
For developers exploring other options, tools like [PDM](https://pdm-project.org/en/latest/), [Hatch](https://hatch.pypa.io/latest/), and [Pipenv](https://pipenv.pypa.io/en/latest/) also offer robust packaging and dependency management features.
3133

32-
- **Start developing the package**:
34+
## Should you use Conda for production ML projects?
3335

34-
```bash
35-
uv sync --all-groups
36-
```
36+
[Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) is popular among data scientists for its ability to manage both Python and non-Python dependencies. However, for production MLOps, it presents challenges like slow performance and a complex dependency resolver.
3737

38-
- **Build a package with uv**:
39-
40-
```bash
41-
uv build --wheel
42-
```
43-
44-
At the end of the build process, a `.whl` file is generated in the `dist` folder with the name and version of the project from `pyproject.toml`.
45-
46-
For those seeking alternatives, tools like [PDM](https://pdm-project.org/en/latest/), [Hatch](https://hatch.pypa.io/latest/), and [Pipenv](https://pipenv.pypa.io/en/latest/) offer different approaches to package management and development, each with its own set of features designed to cater to various needs within the Python community.
47-
48-
## Do you recommend Conda for your AI/ML project?
49-
50-
Although [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) is a popular choice among data scientists for its ability to manage complex dependencies, it's important to be aware of its limitations. Challenges such as slow performance, a complex dependency resolver, and confusing channel management can hinder productivity. Moreover, Conda's integration with the Python ecosystem, especially with new standards like `pyproject.toml`, is limited. For managing complex dependencies in AI/ML projects, Docker containers present a robust alternative, offering better isolation and compatibility across environments.
38+
For production environments, a common approach is to use **`uv`** to manage the Python dependencies defined in `pyproject.toml` and **Docker** to create isolated, reproducible environments that include system-level dependencies. This combination offers fast installs, compatibility with standard Python packaging, and control over the full runtime environment.
5139

5240
## How can you install new dependencies with uv?
5341

54-
Please refer to [this section of the course](../1. Initializing/1.3. uv (project).md).
42+
Please refer to [this section of the course](../1.%20Initializing/1.3.%20uv%20(project).md).
43+
44+
## What metadata is essential for a Python package?
5545

56-
## Which metadata should you provide to your Python package?
46+
The [`pyproject.toml`](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/) file is the heart of your package, defining its identity, dependencies, and build configuration.
5747

58-
Including detailed metadata in your [`pyproject.toml`](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/) file is crucial for defining your package's identity and dependencies. This file should contain essential information such as the package name, version, authors, and dependencies. Here's an example that outlines the basic structure and content for your package's metadata:
48+
Here is an example with explanations for each section:
5949

6050
```toml
6151
# https://docs.astral.sh/uv/reference/settings/
6252
# https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
6353

54+
# Core project metadata used by PyPI and installation tools.
6455
[project]
6556
name = "bikes"
6657
version = "3.0.0"
6758
description = "Predict the number of bikes available."
6859
authors = [{ name = "Médéric HURIER", email = "[email protected]" }]
6960
readme = "README.md"
7061
requires-python = ">=3.13"
71-
dependencies = []
62+
dependencies = [] # List your production dependencies here
7263
license = { file = "LICENSE.txt" }
7364
keywords = ["mlops", "python", "package"]
7465

66+
# URLs that appear on your package's PyPI page.
7567
[project.urls]
7668
Homepage = "https://github.com/fmind/bikes"
7769
Documentation = "https://fmind.github.io/bikes"
7870
Repository = "https://github.com/fmind/bikes"
7971
"Bug Tracker" = "https://github.com/fmind/bikes/issues"
8072
Changelog = "https://github.com/fmind/bikes/blob/main/CHANGELOG.md"
8173

74+
# Defines command-line scripts. 'bikes' will be a command that runs the 'main' function.
8275
[project.scripts]
8376
bikes = 'bikes.scripts:main'
8477

78+
# Configures uv to install optional dependency groups by default during development.
8579
[tool.uv]
8680
default-groups = ["checks", "commits", "dev", "docs", "notebooks"]
8781

82+
# Specifies the build tool (Hatchling, in this case) to create the package.
8883
[build-system]
8984
requires = ["hatchling"]
9085
build-backend = "hatchling.build"
9186
```
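
The `[project.scripts]` entry above points at a `main` function in a `bikes.scripts` module. A minimal sketch of what such a module could look like follows; the `--dry-run` flag and the printed messages are illustrative assumptions, not the project's actual interface.

```python
"""Hypothetical contents of src/bikes/scripts.py."""

import argparse
from typing import Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point referenced by `bikes = 'bikes.scripts:main'`."""
    parser = argparse.ArgumentParser(
        prog="bikes", description="Predict the number of bikes available."
    )
    # Illustrative flag only; the real command may define other options.
    parser.add_argument("--dry-run", action="store_true", help="Parse arguments and exit.")
    # When argv is None, argparse falls back to sys.argv[1:], which is what
    # happens when the installed `bikes` command is run from a shell.
    args = parser.parse_args(argv)
    if args.dry_run:
        print("dry run: nothing executed")
    else:
        print("running the bikes application")
    # The return value becomes the exit code of the installed command.
    return 0
```

Accepting an optional `argv` keeps the function easy to call from tests, while the installed command still calls `main()` with no arguments.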
9287

93-
This information not only aids users in understanding what your package does but also facilitates its discovery and integration into other projects.
88+
## Where should you place the source code for your package?
9489

95-
## Where should you add the source code of your Python package?
90+
Always place your package's source code inside a `src` directory. This is known as the [**`src` layout**](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/) and is a best practice for several reasons:
9691

97-
For a clean and efficient project structure, placing your package's source code in a `src` directory is recommended. This approach, known as [the `src` layout](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/), separates your package's code from other project files, such as tests and documentation, reducing the risk of import clashes and making it easier to package and distribute your code.
92+
- **Prevents Import Conflicts:** It ensures that your installed package is used during testing, not the local source files. This prevents bugs where the code works locally but fails after installation.
93+
- **Clean Separation:** It keeps your importable package code separate from project root files like `pyproject.toml`, tests, and documentation.
9894

99-
Here's how you can set up this structure:
95+
To create this structure, run:
10096

10197
```bash
10298
mkdir -p src/bikes
10399
touch src/bikes/__init__.py
104100
```
105101

106-
The presence of an `__init__.py` file within a directory indicates to Python that this directory should be treated as a package, making it possible for other parts of your project or external projects to import its modules.
102+
The `__init__.py` file tells Python to treat the `src/bikes` directory as a package.
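
To see why the `src` layout prevents accidental imports from the project root, the sketch below builds the layout in a temporary directory and checks where the package can be found. The package name `bikes_demo` and both helper functions are hypothetical and exist only for this illustration.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def create_src_layout(root: Path, package: str) -> Path:
    """Create root/src/<package>/__init__.py and return the package directory."""
    package_dir = root / "src" / package
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").write_text('__version__ = "0.0.0"\n')
    return package_dir


def is_importable(package: str, search_dir: Path) -> bool:
    """Check whether `package` can be found when `search_dir` is on sys.path."""
    sys.path.insert(0, str(search_dir))
    try:
        importlib.invalidate_caches()
        return importlib.util.find_spec(package) is not None
    finally:
        sys.path.remove(str(search_dir))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    create_src_layout(root, "bikes_demo")
    # From the project root, the package is not directly importable.
    visible_from_root = is_importable("bikes_demo", root)
    # It only becomes importable once `src` (or an installed copy) is on the path.
    visible_from_src = is_importable("bikes_demo", root / "src")

print(visible_from_root, visible_from_src)  # False True
```

With a flat layout, the first check would succeed, which is how tests can end up silently running against local files instead of the installed package.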
107103

108-
## Should you publish your Python package? On which platform should you publish it?
104+
## Should you publish your Python package, and where?
109105

110-
Deciding whether to publish your Python package depends on your goals. If you aim to share your work with the broader community or need a convenient way to distribute your code across projects or teams, publishing is a great option. [The Python Package Index (PyPI)](https://pypi.org/) is the primary repository for public Python packages, making it an ideal platform for reaching a wide audience.
106+
The decision to publish depends on your audience:
111107

112-
For private packages or when sharing within a limited group or organization, platforms like [AWS CodeArtifact](https://aws.amazon.com/codeartifact/) or [GCP Artifact Registry](https://cloud.google.com/artifact-registry) offer secure hosting and management of your packages.
108+
- **Public Packages:** If you want to share your work with the open-source community, publish it to the [**Python Package Index (PyPI)**](https://pypi.org/). This makes it installable by anyone using `pip` or `uv`.
109+
- **Private Packages:** For internal company projects or proprietary code, use a private artifact registry. Popular choices include [**AWS CodeArtifact**](https://aws.amazon.com/codeartifact/), [**GCP Artifact Registry**](https://cloud.google.com/artifact-registry), or **GitHub Packages**.
113110

114-
To publish a package using uv, you can use the command:
111+
To publish your package to a configured repository, use the command:
115112

116113
```bash
117114
uv publish
118115
```
119116

120-
This will upload your package to PyPI, making it available for installation via `pip` by the Python community.
121-
122-
## Package additional resources
117+
## Additional Resources
123118

124119
- **[`pyproject.toml` example from MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/pyproject.toml)**
125120
- [A great MLOps project should start with a good Python Package 🐍](https://fmind.medium.com/a-great-mlops-project-should-start-with-a-good-python-package-7662bdf79563)
