Description
/kind feature
What happened:
My team is working with the Kubeflow platform and we're investigating using ormb to share and publish our ML models and other stateful artifacts like transformers (e.g. standard scaler, PCA, TF-IDF vectorizer) on Harbor.
As far as I understand, to publish a stateful artifact after it has processed data, the following steps need to be performed:
- save the "fitted" artifact within an `<artifact_name>/model/` directory
- write an `<artifact_name>/ormbfile.yaml` artifact config file containing the artifact's metadata
- run the `ormb` `save` and `push` commands to package and publish the stateful artifact
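To make the first and last steps concrete, here is a rough Python sketch of how I picture them. The helper names (`stage_artifact`, `ormb_commands`, `publish`), the `artifact.pkl` file name, the use of `pickle`, and the example registry reference are all assumptions of mine, not part of ormb. I'm also assuming the CLI is invoked as `ormb save <directory> <reference>` and `ormb push <reference>`, so please correct me if that is off. Writing `ormbfile.yaml` (the second step) is left out here.

```python
import pickle
import subprocess
from pathlib import Path


def stage_artifact(obj, root: Path, artifact_name: str) -> Path:
    """Pickle a fitted object into <root>/<artifact_name>/model/.

    Returns the <artifact_name> directory. The file name "artifact.pkl"
    is a hypothetical choice, not an ormb convention.
    """
    artifact_dir = root / artifact_name
    model_dir = artifact_dir / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / "artifact.pkl", "wb") as fh:
        pickle.dump(obj, fh)
    return artifact_dir


def ormb_commands(artifact_dir: Path, reference: str) -> list:
    """Build the CLI calls for packaging and publishing.

    Assumes `ormb save <dir> <ref>` and `ormb push <ref>`.
    """
    return [
        ["ormb", "save", str(artifact_dir), reference],
        ["ormb", "push", reference],
    ]


def publish(artifact_dir: Path, reference: str, dry_run: bool = True) -> None:
    """Print the commands, or run them when dry_run is False."""
    for cmd in ormb_commands(artifact_dir, reference):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires the ormb binary on PATH and a prior registry login.
            subprocess.run(cmd, check=True)
```

With something like this, a data scientist would call `stage_artifact(fitted_scaler, Path("."), "scaler")` and then `publish(...)` with a reference such as `harbor.example.com/library/scaler:v1` (a made-up host and project).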
Some of the metadata:
- can only be known at runtime (e.g. `created` datetime, `size` of the artifact, run-dependent `hyperparameters`, `metrics`)
- or is better populated automatically at runtime (e.g. `revision`, `framework` and the version used)
so the `<artifact_name>/ormbfile.yaml` artifact config file needs to be written or modified programmatically. Without any utilities, this step requires writing a lot of logic on the user side.
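As an illustration of the logic a user currently has to write, here is a minimal sketch that collects the runtime metadata and writes the config file using only the standard library. The function names, the `frameworkVersion` key, and the mapping shape used for `hyperparameters` and `metrics` are my assumptions; the real `ormbfile.yaml` schema should be checked against the ormb docs. Values are emitted with `json.dumps`, relying on JSON scalars and flow collections being valid YAML.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def build_metadata(artifact_dir, framework_module=None,
                   hyperparameters=None, metrics=None) -> dict:
    """Collect values that are only known at runtime.

    Key names follow the fields mentioned in this issue; they are
    assumptions, not a verified ormbfile.yaml schema.
    """
    meta = {
        "created": datetime.now(timezone.utc).isoformat(),
        "size": directory_size(Path(artifact_dir) / "model"),
    }
    if framework_module is not None:
        meta["framework"] = framework_module.__name__
        # "frameworkVersion" is a hypothetical key.
        meta["frameworkVersion"] = getattr(
            framework_module, "__version__", "unknown")
    if hyperparameters:
        meta["hyperparameters"] = dict(hyperparameters)
    if metrics:
        meta["metrics"] = dict(metrics)
    return meta


def to_yaml_lines(mapping: dict, indent: int = 0) -> list:
    """Render nested dicts as block-style YAML lines."""
    pad = "  " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_lines(value, indent + 1))
        else:
            # JSON scalars, lists and empty mappings are valid YAML.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


def write_ormbfile(artifact_dir, meta: dict) -> Path:
    """Write <artifact_dir>/ormbfile.yaml and return its path."""
    path = Path(artifact_dir) / "ormbfile.yaml"
    path.write_text("\n".join(to_yaml_lines(meta)) + "\n")
    return path
```

For a scikit-learn transformer, I would expect to call `build_metadata(artifact_dir, framework_module=sklearn, ...)` right after fitting and then `write_ormbfile(artifact_dir, meta)`. This is exactly the kind of boilerplate that a utility in the SDK could hide.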
What you expected to happen:
A process for publishing stateful ML artifacts that is as convenient and automated as possible for the end user, i.e. the data scientist.
Maybe we could implement some utilities within the ormb Python SDK to make the process more convenient in practice.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I'm not that familiar with image-based registries, the underlying concepts, and the tools of that ecosystem, so feel free to correct me or suggest any useful materials.