
Commit 4efa9c1

Merge pull request #883 from A7med7x7/main
Author Added
2 parents 332af1c + 19ab6a2 commit 4efa9c1


6 files changed, +103 -0 lines changed


content/authors/alghali/_index.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
---
# Display name
title: Ahmed Alghali

# Username (this should match the folder name)
authors:
- alghali

# Is this the primary user of the site?
superuser: false

# Role/position
role: "undergraduate Computer Science student at The University of Khartoum"

# Organizations/Affiliations
organizations:
- name: University of Khartoum
  url: "https://www.uofk.edu/"



# Short bio (displayed in user profile at end of posts)
bio: Ahmed Alghali is an undergraduate Computer Science student at the University of Khartoum with interest in applied machine learning and data platforms.


# Social/Academic Networking
# For available icons, see: https://sourcethemes.com/academic/docs/widgets/#icons
# For an email link, use "fas" icon pack, "envelope" icon, and a link in the
# form "mailto:[email protected]" or "#contact" for contact widget.
social:
- icon: envelope
  icon_pack: fas

- icon: github
  icon_pack: fab
  link: https://github.com/a7med7x7
- icon: linkedin
  icon_pack: fab
  link: https://www.linkedin.com/in/ahmed-alghali-4997a5229/


# Enter email to display Gravatar (if Gravatar enabled in Config)


# Organizational groups that you belong to (for People widget)
# Set this to `[]` or comment out if you are not using People widget.
user_groups:
- 2025 Contributors
---
Ahmed Alghali is an undergraduate Computer Science student at the University of Khartoum, where he has been involved in machine learning projects. He gained hands-on experience as an ML Engineer at [AirQo](https://www.airqo.net/home), where he contributed to research on modeling fused ground-based measurements and satellite remote-sensing air quality data, and to deploying the resulting machine learning models on the AirQo platform. His work reflects a strong interest in applied machine intelligence. He is currently part of the project [Applying MLOps to overcome reproducibility barriers](https://ucsc-ospo.github.io/project/osre25/nyu/mlops/) at OSPO, with a growing focus on data platforms, reproducible systems, and data science competitions.

content/authors/alghali/avatar.jpg

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
---
title: "Applying MLOps to overcome reproducibility barriers in machine learning research"
subtitle: "Streamlining Reproducible Machine Learning Research with Automated MLOps Workflows"
summary: " "
authors:
- alghali
tags: ["osre25","reproducibility","summer of reproducibility", "experiment tracking","machine learning research", "automation","chameleon testbed"]
categories: []
date: 2025-06-22
lastmod: 2025-06-22
featured: false
draft: false

# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
  caption: ""
  focal_point: ""
  preview_only: false
---
### About the Project
Hello! I'm Ahmed, an undergraduate Computer Science student at the University of Khartoum. I'm working on making machine learning research more reproducible on open-access research facilities such as the [Chameleon testbed](https://www.chameleoncloud.org), under the project [Applying MLOps to overcome reproducibility barriers in machine learning research](https://ucsc-ospo.github.io/project/osre25/nyu/mlops/), mentored by Prof. [Fraida Fund](https://ucsc-ospo.github.io/author/fraida-fund/) and [Mohamed Saeed](https://ucsc-ospo.github.io/author/mohamed-saeed/). As part of this project, my [proposal](https://docs.google.com/document/d/146PutdVy7cWSf_Gn8qcn0Ba2llMHjNtHIQzZ5a-xRvQ/edit?usp=sharing) aims to build a template generator that produces repositories for reproducible model training on the Chameleon testbed.
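
As a rough illustration only (not the project's actual generator), a template generator boils down to collecting a user's answers and rendering a set of file templates into a new repository. The sketch below uses Python's `string.Template`; the template contents, the `generate_project` helper, and the answer keys `project_name` and `framework` are all hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical file templates; a real generator would ship many more
# (provisioning config, Dockerfile, training code, tracking setup).
TEMPLATES = {
    "README.md": "# $project_name\n\nTraining framework: $framework\n",
    "requirements.txt": "$framework\nmlflow\n",
}


def generate_project(answers: dict, dest: Path) -> list:
    """Render every template with the user's answers into a new repository folder."""
    root = dest / answers["project_name"]
    root.mkdir(parents=True, exist_ok=False)  # never overwrite an existing project
    written = []
    for rel_path, text in TEMPLATES.items():
        target = root / rel_path
        # substitute() raises KeyError if an answer is missing, so gaps surface early
        target.write_text(Template(text).substitute(answers))
        written.append(target)
    return written


answers = {"project_name": "my-experiment", "framework": "torch"}
with tempfile.TemporaryDirectory() as tmp:
    files = generate_project(answers, Path(tmp))
    print(sorted(p.name for p in files))  # ['README.md', 'requirements.txt']
    readme = (Path(tmp) / "my-experiment" / "README.md").read_text()
```

Using strict `substitute()` rather than `safe_substitute()` is a deliberate choice here: an unanswered question fails loudly instead of leaving a literal `$placeholder` in the generated repository.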

### Reproducibility
> *We argue that unless reproducing research becomes as vital and mainstream part of scientific exploration as reading papers is today, reproducibility will be hard to sustain in the long term because the incentives to make research results reproducible won’t outweigh the still considerable costs*
>
> [Three Pillars of Practical Reproducibility Paper](https://www.chameleoncloud.org/media/filer_public/25/18/25189b96-c3a2-4a55-b99b-c25322fe6682/reproducibility_on_chameleon-3.pdf)

![Academic code quality](codquality.JPG)

By reproducibility in science, we mean the ability to obtain consistent results using the same methods and conditions as a previous study. In simple terms, if I use the same data and methodology that were used before, I should obtain the same results. This principle applies to almost every scientific field, including both machine learning applied to scientific research and core machine learning research.
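
In machine learning, "the same methods and conditions" includes any source of randomness. A minimal Python sketch, assuming a toy bootstrap experiment (the `run_experiment` helper and the data are hypothetical): when the data, the method, and the random seed are all fixed, rerunning the experiment yields exactly the same result.

```python
import random
import statistics


def run_experiment(data, seed):
    """Toy 'experiment': bootstrap-resample the data and report the mean.

    The seed is part of the method: fixing it fixes every random draw.
    """
    rng = random.Random(seed)  # isolated generator, unaffected by global random state
    sample = [rng.choice(data) for _ in range(len(data))]
    return statistics.mean(sample)


data = [2.0, 3.5, 4.1, 5.0, 6.2, 7.8]

first = run_experiment(data, seed=42)
second = run_experiment(data, seed=42)  # same data, same method, same seed

print(first == second)  # True: the result is reproduced exactly
```

Real training pipelines have many more sources of nondeterminism (GPU kernels, data loader ordering, library versions), which is why the seed alone is rarely sufficient.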

### Challenges in Reproducibility

Just as the well-known paper on the [reproducibility crisis in science](https://www.nature.com/articles/d41586-019-00067-3) was published in 2016, similar discussions have emerged in machine learning research. The paper [State of the Art: Reproducibility in Artificial Intelligence](https://ojs.aaai.org/index.php/AAAI/article/view/11503) analyzed 400 papers from top AI conferences and found that only around 6% shared code and approximately 33% shared test data, while 54% shared only pseudocode (a summary of the algorithm).

![Percentage of papers documenting each variable for the three factors](variables.png)

The lack of software dependency management, proper version control, log tracking, and effective artifact sharing makes machine learning research very difficult to reproduce.
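
To make two of these gaps concrete, here is a minimal standard-library sketch that records the exact package versions installed in the environment and a SHA-256 fingerprint of each shared artifact. The JSON manifest format and the `build_manifest` helper are hypothetical, not part of any existing tool.

```python
import hashlib
import json
import platform
import sys
import tempfile
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(artifacts):
    """Collect the facts a reader needs to rebuild the environment and verify outputs."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
        "artifacts": {p.name: sha256_of(p) for p in artifacts},
    }


# Usage: hash a stand-in "model file" and print its fingerprint as JSON.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.bin"
    model.write_bytes(b"weights")
    manifest = build_manifest([model])
    print(json.dumps(manifest["artifacts"], indent=2))
```

Anyone downloading the shared `model.bin` can recompute its hash and compare it with the manifest to confirm they have the same artifact.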

In industry, reproducibility in machine learning is largely supported by MLOps practices: most researchers are backed by software engineers who are responsible for setting up experimental environments and developing tools that streamline the workflow. In academic settings, however, reproducibility remains a major challenge. Researchers prefer to focus on their code and pay little attention to the complexity of configuring their experimental environments, so the adoption and standardization of MLOps practices in academia progresses slowly. The best way to ensure a seamless MLOps experience is to make these capabilities easily accessible within the researchers' own workflow, by developing a tool that streamlines resource provisioning, environment setup, model training, and artifact tracking, and thereby ensures reproducible results.


### Proposed Solution

![Solution Architecture](Design.png)

We want researchers to be able to spin up ML research instances or bare-metal nodes on the Chameleon testbed while the technical complexity of configuring and stitching everything together stays abstracted away. Users simply answer a few questions about their project, frameworks, tools, features, and integrations (if any), and get a fully generated, reproducible project. The project contains a provisioning/infrastructure configuration layer for provisioning resources on the cloud, and a Dockerfile to spin up services and persistent storage for data. At its core, the ML code runs in a containerized training environment backed by an ML tracking server that uses MLflow to log artifacts, metadata, environment configuration, system specifications (such as GPU type), and Git status, with PostgreSQL storing the metadata and an S3-compatible MinIO bucket storing the artifacts.

The design also provides persistent storage for the datasets and the artifacts generated by each experiment, and containerizes all of these components to ensure reproducibility. We aim to make the cloud experience easier by handling the configuration needed to set up the environment with third-party frameworks, so that benchmark datasets and other necessary components from services such as Hugging Face and GitHub are easily accessible from inside the container. For more technical details about the solution, you can read my proposal [here](https://docs.google.com/document/d/1ilm-yMEq-UTiJPGMl8tQc3Anl5cKM5RD2sUGInLjLbU).
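
Inside the container, the training code needs to locate the tracking server, the artifact bucket, and a persistent cache for third-party datasets. A minimal sketch, assuming configuration is passed in as environment variables: `MLFLOW_TRACKING_URI`, `MLFLOW_S3_ENDPOINT_URL`, and the `AWS_*` credentials are variables MLflow and its S3 client read, and `HF_HOME` sets the Hugging Face cache directory. The `container_config` helper, the host names, and the `/mnt/data/hf_cache` path are hypothetical.

```python
import os
from collections.abc import Mapping

# Variables MLflow reads to reach the tracking server and the S3-compatible
# (MinIO) artifact store. Their values are deployment-specific.
REQUIRED = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)

# Hypothetical mount point for the persistent volume; adjust to the real layout.
DEFAULTS = {"HF_HOME": "/mnt/data/hf_cache"}


def container_config(env: Mapping = os.environ) -> dict:
    """Validate and assemble the settings a training container needs at start-up."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)  # keep caches on persistent storage
    return config


example_env = {
    "MLFLOW_TRACKING_URI": "http://mlflow:5000",
    "MLFLOW_S3_ENDPOINT_URL": "http://minio:9000",
    "AWS_ACCESS_KEY_ID": "example-key",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}
print(container_config(example_env)["HF_HOME"])  # /mnt/data/hf_cache
```

Failing fast at start-up with a list of missing variables is cheaper than discovering a misconfigured artifact store after hours of training.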


By addressing these challenges, we can accelerate scientific discovery. This benefits not only those conducting the research but also those who will build on it in the future. I look forward to sharing more updates as the project progresses, and I welcome feedback from others interested in advancing reproducibility in ML research.