
Commit 4ae73d1

Upgrade to Python 3.12 (#1516)
Bumping our image to Python 3.12 required the following changes:

- Remove numpy-mkl: we were unable to find/install a compatible version, so per an earlier conversation on this topic we dropped it and will use the NumPy installed in the Colab base image instead.
- Remove the cuml installation hack: the pre-installed base-image version now works without build errors.
- Unpin packages formerly pinned for Learn: Learn no longer depends on this build, so we can freely unpin many packages (seaborn, scikit-learn, matplotlib, geopandas, TPOT, shapely, tfdf, ydf, etc.).
- Remove incompatible packages: some are no longer supported and cause build issues (pydegensac, pymc3, eli5, etc.).
- Remove preinstalled packages: where applicable, we removed packages that are already installed in the Colab base image.

https://b.corp.google.com/issues/468103319
1 parent e756e92 commit 4ae73d1

File tree

15 files changed: +55 -138 lines


Dockerfile.tmpl

Lines changed: 7 additions & 21 deletions
@@ -12,9 +12,6 @@ RUN pip freeze | grep -E 'tensorflow|keras|torch|jax' > /colab_requirements.txt
 RUN cat /colab_requirements.txt >> /requirements.txt
 RUN cat /kaggle_requirements.txt >> /requirements.txt
 
-# TODO: GPU requirements.txt
-# TODO: merge them better (override matching ones).
-
 # Install Kaggle packages
 RUN uv pip install --system -r /requirements.txt
 
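The requirements merge above is a plain concatenation (`cat … >> /requirements.txt`); the dropped TODO noted that matching entries are never overridden. A name-aware merge, if one were ever wanted, might look like this sketch (the `merge_requirements` helper is hypothetical, not part of this repo):

```python
import re

def merge_requirements(*req_lists):
    """Name-aware merge: later lists override earlier pins for the
    same package (hypothetical helper, not part of this repo)."""
    merged = {}
    for reqs in req_lists:
        for line in reqs:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # The package name is everything before the first specifier char.
            name = re.split(r"[<>=!~\[ ]", line, maxsplit=1)[0].lower()
            merged[name] = line
    return list(merged.values())

colab = ["numpy==2.0.2", "pandas==2.2.2"]
kaggle = ["pandas", "seaborn"]
print(merge_requirements(colab, kaggle))
# → ['numpy==2.0.2', 'pandas', 'seaborn']  (later 'pandas' entry wins)
```

With plain concatenation, both `pandas` entries would survive and the resolver would have to reconcile them; the dict keyed by package name keeps only the last spec per package.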

@@ -29,36 +26,25 @@ RUN uv pip install --system --force-reinstall --prerelease=allow "kagglehub[pand
 # to avoid affecting the larger build, we'll post-install it.
 RUN uv pip install --no-build-isolation --system "git+https://github.com/Kaggle/learntools"
 
-# b/408281617: Torch is adamant that it can not install cudnn 9.3.x, only 9.1.x, but Tensorflow can only support 9.3.x.
-# This conflict causes a number of package downgrades, which are handled in this command
-RUN uv pip install \
-    --index-url https://pypi.nvidia.com --extra-index-url https://pypi.org/simple/ --index-strategy unsafe-first-match \
-    --system --force-reinstall "cuml-cu12==25.2.1" \
-    "nvidia-cudnn-cu12==9.3.0.75" "nvidia-cublas-cu12==12.5.3.2" "nvidia-cusolver-cu12==11.6.3.83" \
-    "nvidia-cuda-cupti-cu12==12.5.82" "nvidia-cuda-nvrtc-cu12==12.5.82" "nvidia-cuda-runtime-cu12==12.5.82" \
-    "nvidia-cufft-cu12==11.2.3.61" "nvidia-curand-cu12==10.3.6.82" "nvidia-cusparse-cu12==12.5.1.3" \
-    "nvidia-nvjitlink-cu12==12.5.82"
-RUN uv pip install --system --force-reinstall "pynvjitlink-cu12==0.5.2"
-
-# b/385145217 Latest Colab lacks mkl numpy, install it.
-RUN uv pip install --system --force-reinstall -i https://pypi.anaconda.org/intel/simple numpy
-
 # newer daal4py requires tbb>=2022, but libpysal is downgrading it for some reason
 RUN uv pip install --system "tbb>=2022" "libpysal==4.9.2"
 
 # b/404590350: Ray and torchtune have conflicting tune cli, we will prioritize torchtune.
 # b/415358158: Gensim removed from Colab image to upgrade scipy
 # b/456239669: remove huggingface-hub pin when pytorch-lighting and transformer are compatible
 # b/315753846: Unpin translate package, currently conflicts with adk 1.17.0
-RUN uv pip install --system --force-reinstall --no-deps torchtune gensim "scipy<=1.15.3" "huggingface-hub==0.36.0" "google-cloud-translate==3.12.1"
+# b/468379293: Unpin Pandas once cuml/cudf are compatible, version 3.0 causes issues
+# b/468383498: numpy will auto-upgrade to 2.4.x, which causes issues with numerous packages
+# b/468367647: Unpin protobuf, version greater than v5.29.5 causes issues with numerous packages
+RUN uv pip install --system --force-reinstall --no-deps torchtune gensim "scipy<=1.15.3" "huggingface-hub==0.36.0" "google-cloud-translate==3.12.1" "numpy==2.0.2" "pandas==2.2.2"
+RUN uv pip install --system --force-reinstall "protobuf==5.29.5"
 
 # Adding non-package dependencies:
 ADD clean-layer.sh /tmp/clean-layer.sh
 ADD patches/nbconvert-extensions.tpl /opt/kaggle/nbconvert-extensions.tpl
 ADD patches/template_conf.json /opt/kaggle/conf.json
 
-# /opt/conda/lib/python3.11/site-packages
-ARG PACKAGE_PATH=/usr/local/lib/python3.11/dist-packages
+ARG PACKAGE_PATH=/usr/local/lib/python3.12/dist-packages
 
 # Install GPU-specific non-pip packages.
 {{ if eq .Accelerator "gpu" }}
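Per the b/468383498 note, an unconstrained resolve would upgrade numpy to 2.4.x, which is why the install line pins `numpy==2.0.2`. A minimal sketch of the exact-pin check an image test could run (both helpers are hypothetical, not part of this repo):

```python
def version_tuple(v: str):
    """Parse a dotted version like '2.0.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def matches_exact_pin(pin: str, installed: str) -> bool:
    """True when the installed version equals an '==' pin."""
    _name, _, wanted = pin.partition("==")
    return version_tuple(installed) == version_tuple(wanted)

print(matches_exact_pin("numpy==2.0.2", "2.0.2"))  # True
print(matches_exact_pin("numpy==2.0.2", "2.4.1"))  # False: the auto-upgrade the pin blocks
```

Comparing tuples rather than strings avoids the classic trap where "2.10.0" sorts before "2.9.9" lexicographically.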
@@ -168,7 +154,7 @@ ADD patches/kaggle_gcp.py \
 
 # Figure out why this is in a different place?
 # Found by doing a export PYTHONVERBOSE=1 and then running python and checking for where it looked for it.
-ADD patches/sitecustomize.py /usr/lib/python3.11/sitecustomize.py
+ADD patches/sitecustomize.py /usr/lib/python3.12/sitecustomize.py
 
 ARG GIT_COMMIT=unknown \
     BUILD_DATE=unknown

config.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 BASE_IMAGE=us-docker.pkg.dev/colab-images/public/runtime
-BASE_IMAGE_TAG=release-colab_20250725-060057_RC00
+BASE_IMAGE_TAG=release-colab-external_20251024-060052_RC00
 CUDA_MAJOR_VERSION=12
 CUDA_MINOR_VERSION=5

kaggle_requirements.txt

Lines changed: 7 additions & 27 deletions
@@ -7,11 +7,9 @@ PyArabic
 PyUpSet
 Pympler
 Rtree
-shapely<2
+shapely
 SimpleITK
-# b/302136621: Fix eli5 import for learntools, newer version require scikit-learn > 1.3
-TPOT==0.12.1
-Theano
+TPOT
 Wand
 annoy
 arrow
@@ -29,21 +27,14 @@ deap
 dipy
 docker
 easyocr
-# b/302136621: Fix eli5 import for learntools
-eli5
 emoji
 fastcore
-# b/445960030: Requires a newer version of fastai than the currently used base image.
-# Remove when relying on a newer base image.
-fastai>=2.8.4
 fasttext
 featuretools
 fiona
 fury
 fuzzywuzzy
 geojson
-# geopandas > v0.14.4 breaks learn tools
-geopandas==v0.14.4
 gensim
 # b/443054743,b/455550872
 google-adk[a2a,eval]
@@ -81,7 +72,7 @@ libpysal<=4.9.2
 lime
 line_profiler
 mamba
-matplotlib<3.8
+matplotlib
 mlcrate
 mne
 mpld3
@@ -90,9 +81,7 @@ nbconvert==6.4.5
 nbdev
 nilearn
 olefile
-# b/445960030: Broken in 1.19.0. See https://github.com/onnx/onnx/issues/7249.
-# Fixed with https://github.com/onnx/onnx/pull/7254. Upgrade when version with fix is published.
-onnx==1.18.0
+onnx
 openslide-bin
 openslide-python
 optuna
@@ -107,11 +96,9 @@ preprocessing
 pudb
 pyLDAvis
 pycryptodome
-pydegensac
 pydicom
 pyemd
 pyexcel-ods
-pymc3
 pymongo
 pypdf
 pytesseract
@@ -123,32 +110,25 @@ qtconsole
 ray
 rgf-python
 s3fs
-# b/302136621: Fix eli5 import for learntools
-scikit-learn==1.2.2
+scikit-learn
 # Scikit-learn accelerated library for x86
 scikit-learn-intelex>=2023.0.1
 scikit-multilearn
 scikit-optimize
 scikit-plot
 scikit-surprise
-# Also pinning seaborn for learntools
-seaborn==0.12.2
+seaborn
 git+https://github.com/facebookresearch/segment-anything.git
-# b/329869023: shap 0.45.0 breaks learntools
-shap==0.44.1
+shap
 squarify
 tensorflow-cloud
 tensorflow-io
 tensorflow-text
-tensorflow_decision_forests
 torchinfo
 torchmetrics
 torchtune
 transformers>=4.51.0
 vtk
 wavio
-# b/350573866: xgboost v2.1.0 breaks learntools
-xgboost==2.0.3
 xvfbwrapper
 ydata-profiling
-ydf
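One theme of this change is dropping requirements that the Colab base image already provides. A sketch of how such redundant entries could be detected from the base image's `pip freeze` output (the `redundant_requirements` helper and the sample data are hypothetical):

```python
import re

def redundant_requirements(base_freeze, kaggle_reqs):
    """Return requirements already satisfied by the base image's
    `pip freeze` output (hypothetical helper, names illustrative)."""
    def pkg_name(spec):
        # Package name is everything before the first specifier character.
        return re.split(r"[<>=!~\[;@ ]", spec.strip(), maxsplit=1)[0].lower()
    base = {pkg_name(line) for line in base_freeze if line.strip()}
    return [req for req in kaggle_reqs
            if req.strip() and not req.startswith("#") and pkg_name(req) in base]

base = ["seaborn==0.13.2", "shapely==2.0.6", "scikit-learn==1.6.1"]
reqs = ["seaborn", "Rtree", "scikit-learn"]
print(redundant_requirements(base, reqs))  # ['seaborn', 'scikit-learn']
```

Note this only flags name matches; entries with version specifiers would still need a compatibility check before removal.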

tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/2/metadata.json

Lines changed: 0 additions & 6 deletions
This file was deleted.

tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/2/tokenizer.json

Lines changed: 0 additions & 21 deletions
This file was deleted.

tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/2/assets/tokenizer/vocabulary.txt renamed to tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/3/assets/tokenizer/vocabulary.txt

File renamed without changes.

tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/2/config.json renamed to tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/3/config.json

Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,5 @@
 {
-    "module": "keras_nlp.src.models.bert.bert_backbone",
+    "module": "keras_hub.src.models.bert.bert_backbone",
     "class_name": "BertBackbone",
     "config": {
         "name": "bert_backbone",
@@ -13,7 +13,5 @@
         "max_sequence_length": 512,
         "num_segments": 2
     },
-    "registered_name": "keras_nlp>BertBackbone",
-    "assets": [],
-    "weights": "model.weights.h5"
+    "registered_name": "keras_hub>BertBackbone"
 }
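The fixture updates above rename `keras_nlp` module paths and registered names to `keras_hub`. A small sketch of a mechanical migration over such a config dict (the `migrate` helper is hypothetical, not a Keras API):

```python
OLD_PREFIX, NEW_PREFIX = "keras_nlp", "keras_hub"

def migrate(value):
    """Recursively rewrite keras_nlp identifiers to keras_hub in a
    saved-model config (illustrative sketch only)."""
    if isinstance(value, str):
        return value.replace(OLD_PREFIX, NEW_PREFIX)
    if isinstance(value, dict):
        return {key: migrate(val) for key, val in value.items()}
    if isinstance(value, list):
        return [migrate(item) for item in value]
    return value

cfg = {"module": "keras_nlp.src.models.bert.bert_backbone",
       "registered_name": "keras_nlp>BertBackbone"}
print(migrate(cfg)["module"])  # keras_hub.src.models.bert.bert_backbone
```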
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+    "keras_version": "3.7.0",
+    "keras_hub_version": "0.19.0",
+    "parameter_count": 4385920,
+    "date_saved": "2024-12-20@19:42:50",
+    "tasks": [
+        "MaskedLM",
+        "TextClassifier"
+    ]
+}
Binary file not shown.
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+{
+    "module": "keras_hub.src.models.bert.bert_tokenizer",
+    "class_name": "BertTokenizer",
+    "config": {
+        "name": "bert_tokenizer",
+        "trainable": true,
+        "dtype": {
+            "module": "keras",
+            "class_name": "DTypePolicy",
+            "config": {
+                "name": "int32"
+            },
+            "registered_name": null
+        },
+        "config_file": "tokenizer.json",
+        "vocabulary": null,
+        "sequence_length": null,
+        "lowercase": true,
+        "strip_accents": false,
+        "split": true,
+        "suffix_indicator": "##",
+        "oov_token": "[UNK]",
+        "special_tokens": null,
+        "special_tokens_in_strings": false
+    },
+    "registered_name": "keras_hub>BertTokenizer"
+}
