docs/3. Productionizing/3.0. Package.md
Lines changed: 41 additions & 46 deletions
@@ -6,120 +6,115 @@ description: Learn how to structure and build your Python code into a package, w

 ## What is a Python package?

-A [Python package](https://packaging.python.org/en/latest/) is a structured collection of Python modules, which allows for a convenient way to organize and share code. Among the various formats a package can take, [the wheel format](https://peps.python.org/pep-0427/) (.whl) stands out. Wheels are a built package format that can significantly speed up the installation process for Python software, compared to distributing source code and requiring the user to build it themselves.
+A [Python package](https://packaging.python.org/en/latest/) is a structured directory of Python modules that can be easily distributed and installed. In MLOps, packaging is the foundation for creating reproducible, maintainable, and shareable machine learning systems.

-## Why do you need to create a Python package?
+Packages are typically distributed as **wheels** (`.whl` files), a pre-built format that makes installation faster and more reliable than installing from source code.

-Creating a Python package offers multiple benefits, particularly for developers looking to distribute their code effectively:
+## Why create a Python package for an ML project?

-- **As a Library:** Packaging your code as a library enables you to share reusable components across different projects. This is common in the Python ecosystem, with examples like `numpy`, `pandas`, and `tensorflow` being shared as libraries.
-- **As an Application:** Packaging also plays a crucial role in deploying applications. It simplifies the distribution and installation process, ensuring your software can be easily executed on various systems, including web or mobile platforms.
+Packaging your ML project is a critical step in moving from research to production. It provides several key advantages:

-Additionally, creating a package can enhance the maintainability of your code, enforce good coding practices by encouraging modular design, and facilitate version control and dependency management.
+- **Reproducibility:** It bundles your code and its specific dependencies, ensuring that it runs consistently across different environments.
+- **Modularity:** It encourages you to organize code into reusable components (e.g., for data processing, feature engineering, or model training), which can be shared across projects.
+- **Simplified Deployment:** It allows you to distribute your project as a versioned library for other services to use or as a standalone application with defined entrypoints.
+- **Clear Structure:** It enforces a standardized project structure, making it easier for new team members to understand and contribute to the codebase.

-## Which tool should you use to create a Python package?
+## Which tool should you use to build a Python package?

-The Python ecosystem provides several tools for packaging, each with its unique features and advantages. While the choice can seem overwhelming, as humorously depicted in the [xkcd comic on Python environments](https://xkcd.com/1987/), [uv](https://docs.astral.sh/uv/) emerges as a standout option. Uv simplifies dependency management and packaging, offering an intuitive interface for developers.
+While the Python packaging ecosystem has many tools, as humorously noted in this [xkcd comic](https://xkcd.com/1987/), the modern standard is **[uv](https://docs.astral.sh/uv/)**. It is an extremely fast and comprehensive tool that handles dependency management, virtual environments, and package building.

-To get started with uv for packaging, you can use the following commands:
+Key `uv` commands for packaging include:

-- **Initiate an uv package**:
+- **`uv sync`**: Installs the base dependencies listed in `pyproject.toml`.
+- **`uv sync --all-groups`**: Installs all dependencies, including optional groups for development, testing, and documentation.
+- **`uv build --wheel`**: Builds your package into a `.whl` file, which appears in the `dist/` directory.
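
As a rough illustration of what the build step leaves in `dist/`, the sketch below splits a wheel filename into the fields defined by the wheel format (PEP 427). The names `WheelName` and `parse_wheel_filename` and the version `1.0.0` are assumptions made for this example; the exact tags a real build produces depend on the project and its build backend.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    """Fields of a wheel filename, in the order the wheel format defines them."""

    distribution: str
    version: str
    build_tag: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split `name-version[-build]-python-abi-platform.whl` into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    # Dashes inside a component are escaped as underscores, so "-" is a safe separator.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelName(distribution, version, build_tag, python_tag, abi_tag, platform_tag)


# Hypothetical file that a build might leave in dist/ for a pure-Python project.
wheel = parse_wheel_filename("bikes-1.0.0-py3-none-any.whl")
print(wheel.distribution, wheel.version, wheel.platform_tag)  # prints: bikes 1.0.0 any
```

The name and version fields are the ones taken from `pyproject.toml`; the remaining tags describe which interpreters and platforms the wheel can be installed on.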

-```bash
-uv sync
-```
+For developers exploring other options, tools like [PDM](https://pdm-project.org/en/latest/), [Hatch](https://hatch.pypa.io/latest/), and [Pipenv](https://pipenv.pypa.io/en/latest/) also offer robust packaging and dependency management features.

-- **Start developing the package**:
+## Should you use Conda for production ML projects?

-```bash
-uv sync --all-groups
-```
+[Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) is popular among data scientists for its ability to manage both Python and non-Python dependencies. However, for production MLOps, it presents challenges like slow performance and a complex dependency resolver.

-- **Build a package with uv**:
-
-```bash
-uv build --wheel
-```
-
-At the end of the build process, a `.whl` file is generated in the `dist` folder with the name and version of the project from `pyproject.toml`.
-
-For those seeking alternatives, tools like [PDM](https://pdm-project.org/en/latest/), [Hatch](https://hatch.pypa.io/latest/), and [Pipenv](https://pipenv.pypa.io/en/latest/) offer different approaches to package management and development, each with its own set of features designed to cater to various needs within the Python community.
-
-## Do you recommend Conda for your AI/ML project?
-
-Although [Conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) is a popular choice among data scientists for its ability to manage complex dependencies, it's important to be aware of its limitations. Challenges such as slow performance, a complex dependency resolver, and confusing channel management can hinder productivity. Moreover, Conda's integration with the Python ecosystem, especially with new standards like `pyproject.toml`, is limited. For managing complex dependencies in AI/ML projects, Docker containers present a robust alternative, offering better isolation and compatibility across environments.
+For production environments, the industry-standard approach is to use **`uv`** for managing Python dependencies defined in `pyproject.toml` and **Docker** for creating isolated, reproducible environments that include system-level dependencies. This combination provides superior performance, compatibility, and control.

 ## How can you install new dependencies with uv?

-Please refer to [this section of the course](../1. Initializing/1.3. uv (project).md).
+Please refer to [this section of the course](../1. Initializing/1.3.%20uv%20(project).md).
+
+## What metadata is essential for a Python package?

-## Which metadata should you provide to your Python package?
+The [`pyproject.toml`](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/) file is the heart of your package, defining its identity, dependencies, and build configuration.

-Including detailed metadata in your [`pyproject.toml`](https://packaging.python.org/en/latest/guides/writing-pyproject-toml/) file is crucial for defining your package's identity and dependencies. This file should contain essential information such as the package name, version, authors, and dependencies. Here's an example that outlines the basic structure and content for your package's metadata:
+Here is an example with explanations for each section:
 # Specifies the build tool (Hatchling, in this case) to create the package.
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 ```

-This information not only aids users in understanding what your package does but also facilitates its discovery and integration into other projects.
+## Where should you structure the source code for your package?

-## Where should you add the source code of your Python package?
+Always place your package's source code inside a `src` directory. This is known as the [**`src` layout**](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/) and is a best practice for several reasons:

-For a clean and efficient project structure, placing your package's source code in a `src` directory is recommended. This approach, known as [the `src` layout](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/), separates your package's code from other project files, such as tests and documentation, reducing the risk of import clashes and making it easier to package and distribute your code.
+- **Prevents Import Conflicts:** It ensures that your installed package is used during testing, not the local source files. This prevents bugs where the code works locally but fails after installation.
+- **Clean Separation:** It keeps your importable package code separate from project root files like `pyproject.toml`, tests, and documentation.

-Here's how you can set up this structure:
+To create this structure, run:

 ```bash
 mkdir -p src/bikes
 touch src/bikes/__init__.py
 ```

-The presence of an `__init__.py` file within a directory indicates to Python that this directory should be treated as a package, making it possible for other parts of your project or external projects to import its modules.
+The `__init__.py` file tells Python to treat the `src/bikes` directory as a package.
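
A minimal sketch of what this means in practice, assuming a throwaway temporary directory stands in for the project root. The helper name `import_from_src_layout` is hypothetical; the `src/bikes` path mirrors the commands above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_src_layout(project_root: Path, package: str = "bikes"):
    """Create src/<package>/__init__.py under project_root, then import the package."""
    src_dir = project_root / "src"
    package_dir = src_dir / package
    package_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    (package_dir / "__init__.py").touch()  # same effect as `touch`

    # Assumption for this demo only: expose src/ on sys.path by hand.
    sys.path.insert(0, str(src_dir))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(package)
    finally:
        sys.path.remove(str(src_dir))


with tempfile.TemporaryDirectory() as tmp:
    module = import_from_src_layout(Path(tmp))
    module_name = module.__name__
    init_file = Path(module.__file__).name

print(module_name, init_file)  # prints: bikes __init__.py
```

In a real project you would normally rely on the package being installed into the environment rather than editing `sys.path`; the manual path handling here only keeps the example self-contained.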

-## Should you publish your Python package? On which platform should you publish it?
+## Should you publish your Python package, and where?

-Deciding whether to publish your Python package depends on your goals. If you aim to share your work with the broader community or need a convenient way to distribute your code across projects or teams, publishing is a great option. [The Python Package Index (PyPI)](https://pypi.org/) is the primary repository for public Python packages, making it an ideal platform for reaching a wide audience.
+The decision to publish depends on your audience:

-For private packages or when sharing within a limited group or organization, platforms like [AWS CodeArtifact](https://aws.amazon.com/codeartifact/) or [GCP Artifact Registry](https://cloud.google.com/artifact-registry) offer secure hosting and management of your packages.
+- **Public Packages:** If you want to share your work with the open-source community, publish it to the [**Python Package Index (PyPI)**](https://pypi.org/). This makes it installable by anyone using `pip` or `uv`.
+- **Private Packages:** For internal company projects or proprietary code, use a private artifact registry. Popular choices include [**AWS CodeArtifact**](https://aws.amazon.com/codeartifact/), [**GCP Artifact Registry**](https://cloud.google.com/artifact-registry), or **GitHub Packages**.

-To publish a package using uv, you can use the command:
+To publish your package to a configured repository, use the command:

 ```bash
 uv publish
 ```

-This will upload your package to PyPI, making it available for installation via `pip` by the Python community.
-
-## Package additional resources
+## Additional Resources

 - **[`pyproject.toml` example from MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/pyproject.toml)**
 - [A great MLOps project should start with a good Python Package 🐍](https://fmind.medium.com/a-great-mlops-project-should-start-with-a-good-python-package-7662bdf79563)