Skip to content

Conversation

@natkam
Copy link
Contributor

@natkam natkam commented Oct 13, 2025

No description provided.

natkam and others added 25 commits August 5, 2025 19:51
* fix: Improve logging when feedback generation fails

* chore: Remove an unused dataclass, fix some typos and type annotations

* fix: Fix a potential security issue when running Git commands in a subprocess

When running multiple commands with `sh -c ...`, it would be easier to construct
the variable "message" so that it contained some executable code.

* fix: Fix a typo in the log message after a successful autograde job

* fix: Do not try to generate feedback for notebooks created by students

If the instructor specifies '*.ipynb' in the whitelist of an assignment,
additional notebooks created by students are also copied over. The feedback
generator should not try to generate feedback for them - they are not in
the gradebook and an attempt to do so would fail.

Refs: #272

* fix: Improve error handling and logging in GitBaseHandler.git_reponse()
* refactor: changed load_roles dict entries to lists

* chore: added new load_roles style to dev env

* Update a docstring

---------

Co-authored-by: Natalia Maniakowska <[email protected]>
…290)

* chore: Update the openapi specification to make it less out-of-date

* chore: Regenerate the API models

* fix: Remove (unused) 'grading_failed' from ManualStatus

* chore: Use enums for submission statuses in the code

* fix: Use StrEnum for status enums (makes them json-serialisable)

* chore: DB migration to use submission status enums
* chore: Add migration: switch from using user.name as PK to user.id (WIP)

* refactor: Use sqlalchemy ORM (not raw SQL) to update values in migration

* chore: Update schema and recreate models: user id

* chore: Update the migration to also add user_id to `api_token`, `oauth_code` tables

* chore: Update alembic migration README

* chore: Update some docstrings to make them more explanatory

* chore: Add user_id to ORM classes for APIToken and OAuthCode and update their usage

* chore: Update ORM classes for User, Submission, Role: add user_id

* chore: [WIP] Replace usage of `username` with `user_id` in the code

+cleanup: remove a few unused test helper functions

* chore: [WIP] Replace some more usages of username with user_id

* chore: Fix some further user_id usages, split a few too long lines

* chore: Fix some tests that I broke, refactor a test helper function

* chore: Fix even more user.id and username usages

* style: Fix typos, string formatting, and a copy-paste mistake in an error msg

* chore: Fix failing tests, add a helper function for creating a student

* chore: Small improvements and cleanup in submissions handler tests

* chore: Add type annotations to authentication-related methods in base handler

* refactor: Extract a csv-writting method in submission handler

* chore: Remove unused role check in submission handler

The appropriate role is already enforced by the @Authorise decorator.

* chore: Fix the LectureStudentsHandler (get name from User), add a test

Also: rewrite an inacurrately named test for a submission handler

* chore: Fix user name usages in autograding, values.yaml, and the docs

* refactor: Slightly simplify SubmissionHandler.post

* fix: Set the submission user to student when instructor creates a submission for a student

* refactor: Use BaseHandler.construct_git_dir where possible, validate the constructed path

* fix: Use GitRepoType instead of str in Git base handler

* style: Improve parsing of the pathlets in GitBaseHandler.gitlookup

* fix: Do not autograde submissions created for student by instructor

This fixes an error in LocalAutogradeExecutor where the `git checkout 000...0"
failed for submissions newly created by an instructor.

* fix: Fix failing tests

* fix: Retrieve the submission more carefully in gitlookup()

* chore: Write tests for submission creation with POST request

* fix: Fix a bug when a repo is edited by an instructor

* fix: Update submission post body in OpenAPI schema

* fix: Replace the use of username in submission's delete(), fix a test
… on running tests (#300)

* chore: Write tests for the edit handler (create or reset edit repo)

* fix: Fix a failing test; configure git in GH actions

* chore: Produce coverage report when running the tests
* WIP: updated migrations; postgres does not work yet with data inside

* WIP: added db migration tests

* ci: added docker network

* fix: implemented PR feedback

* chore: changed gitignore to allow grader config files

* fix: added new config

* fix: changed db init name

* test: automatically clean postgresql db after test to avoid side effects

* test: added data integrity test

* Update grader_service/tests/migrate/test_migrate.py

Co-authored-by: Natalia Maniakowska <[email protected]>

---------

Co-authored-by: nadja-jovancevic <[email protected]>
Co-authored-by: Natalia Maniakowska <[email protected]>
* Fix display_name in token and base_handler

* Simplifying display_name

* Allow display_name in load_roles

* Update docstring for load_roles
* fix: make celery lti task synchonous

* fix: removed placeholder aud from bearer token func
@natkam natkam requested a review from florian-jaeger October 13, 2025 14:03
raise HTTPError(404, reason="Assignment not found")

# create directories once we know they exist in the database
lecture_path = os.path.abspath(os.path.join(self.gitbase, lect_code))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a user-provided value.
This path depends on a user-provided value.
This path depends on a user-provided value.
This path depends on a user-provided value.

Copilot Autofix

AI about 1 month ago

To fix this problem, we should restrict lect_code – used in the filesystem path – to contain only safe characters and never allow path traversal or special characters that could escape the intended directory structure.

For this, the best practice is to validate and sanitize lect_code immediately after retrieving it from the database, before using it to construct any filesystem path. Restricting lecture codes to alphanumerics, underscores, and dashes is a suitable, minimal set. If the application needs more, this set can be modified as required.

  • Introduce a sanitization/helper function, e.g., sanitize_fs_component(name: str) -> str, or simply perform an in-place validation with a regex such as ^[A-Za-z0-9_\-]+$.
  • If the value does not match this pattern, return a 404 or 400 error.
  • Apply this check before line 135, ideally right after extracting or looking up lect_code.
  • Import the re module to use regular expressions for the validation.

Suggested changeset 1
grader_service/handlers/git/server.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/grader_service/handlers/git/server.py b/grader_service/handlers/git/server.py
--- a/grader_service/handlers/git/server.py
+++ b/grader_service/handlers/git/server.py
@@ -9,6 +9,7 @@
 from pathlib import Path
 from string import Template
 from typing import List, Optional
+import re
 
 from sqlalchemy.orm.exc import MultipleResultsFound, NoResultFound
 from tornado.ioloop import IOLoop
@@ -126,6 +127,10 @@
         role = self.session.get(Role, (self.user.id, lecture.id))
         self._check_git_repo_permissions(rpc, role, pathlets)
 
+        # Ensure lect_code is safe to use as a filesystem component
+        if not re.match(r"^[A-Za-z0-9_\-]+$", lect_code):
+            raise HTTPError(400, reason="Invalid lecture code")
+
         try:
             assignment = self.get_assignment(lecture.id, int(assign_id))
         except ValueError:
EOF
@@ -9,6 +9,7 @@
from pathlib import Path
from string import Template
from typing import List, Optional
import re

from sqlalchemy.orm.exc import MultipleResultsFound, NoResultFound
from tornado.ioloop import IOLoop
@@ -126,6 +127,10 @@
role = self.session.get(Role, (self.user.id, lecture.id))
self._check_git_repo_permissions(rpc, role, pathlets)

# Ensure lect_code is safe to use as a filesystem component
if not re.match(r"^[A-Za-z0-9_\-]+$", lect_code):
raise HTTPError(400, reason="Invalid lecture code")

try:
assignment = self.get_assignment(lecture.id, int(assign_id))
except ValueError:
Copilot is powered by AI and may make mistakes. Always verify output.

# create directories once we know they exist in the database
lecture_path = os.path.abspath(os.path.join(self.gitbase, lect_code))
assignment_path = os.path.abspath(os.path.join(lecture_path, assign_id))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a user-provided value.
This path depends on a user-provided value.
This path depends on a user-provided value.
This path depends on a user-provided value.

Copilot Autofix

AI about 1 month ago

To prevent path traversal attacks, ensure that any constructed file system path that includes client-supplied data is not able to escape a defined root directory. After constructing the assignment_path by joining the root directory (lecture_path) with the assign_id, normalize the path (using os.path.abspath(), as already done) and then verify that the resulting path begins with the intended base path (in this case, lecture_path). If the constructed path does not start with the intended base path, raise an error or reject the operation.

Specifically, after assigning assignment_path, add a check:

if not assignment_path.startswith(lecture_path + os.sep):
    raise HTTPError(403, "Invalid assignment path")

This ensures that any manipulation of assign_id or other variables cannot result in a file being created outside of the designated location.

  • Edit grader_service/handlers/git/server.py, in the gitlookup method.
  • After constructing assignment_path, add the validation check shown above.
  • Ensure to import os (already present).

Suggested changeset 1
grader_service/handlers/git/server.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/grader_service/handlers/git/server.py b/grader_service/handlers/git/server.py
--- a/grader_service/handlers/git/server.py
+++ b/grader_service/handlers/git/server.py
@@ -134,6 +134,9 @@
         # create directories once we know they exist in the database
         lecture_path = os.path.abspath(os.path.join(self.gitbase, lect_code))
         assignment_path = os.path.abspath(os.path.join(lecture_path, assign_id))
+        # Ensure that the constructed assignment_path is contained within the intended lecture_path.
+        if not assignment_path.startswith(lecture_path + os.sep):
+            raise HTTPError(403, "Invalid assignment path")
         if not os.path.exists(lecture_path):
             os.mkdir(lecture_path)
         if not os.path.exists(assignment_path):
EOF
@@ -134,6 +134,9 @@
# create directories once we know they exist in the database
lecture_path = os.path.abspath(os.path.join(self.gitbase, lect_code))
assignment_path = os.path.abspath(os.path.join(lecture_path, assign_id))
# Ensure that the constructed assignment_path is contained within the intended lecture_path.
if not assignment_path.startswith(lecture_path + os.sep):
raise HTTPError(403, "Invalid assignment path")
if not os.path.exists(lecture_path):
os.mkdir(lecture_path)
if not os.path.exists(assignment_path):
Copilot is powered by AI and may make mistakes. Always verify output.
@natkam
Copy link
Contributor Author

natkam commented Oct 13, 2025

@florian-jaeger please merge - then we can create the 0.10.0 branch.

* refactor: Start refactoring the LocalAutogradeExecutor class

Split methods into smaller ones, improve logging and error handling

* refactor: [WIP] Prepare GenerateFeedbackExecutor for refactoring

Reflect some of the changes from the base class, LocalAutogradeExecutor,
and improve error handling etc.

* fix: Fix a circular import

* chore: Make all helper methods in kube grader 'private' (for consistency)

* cleanup: Remove redundant recreation of output dirs in graders

* refactor: [WIP] Extract git-related logic from LocalAutogradeExecutor and derrived classes

Also:
- move getting all whitelist patterns to a method in Assignment (seems to fit there better),
- remove some duplicated code from the child classes of the AutogradeExecutor,
  sometimes making their behaviour more consistent with the parent,
- add missing type annotations and improve a few clearly outdated docstrings, etc.

* tests: Add tests for the LocalAutogradeExecutor

* refactor: Remove explicit gradebook file deletion (is done in _cleanup)

* fix: Fix writing gradebook in GenerateFeedbackProcessExecutor

* fix: Remove unused attributes from GenerateFeedbackExecutor

* test: Write tests for local feedback generation executors

* fix: Use the full commit hash when creating a submission in tests setup

* fix: Fix reading stderr in the process executors

process.stderr is a str already, no need to read().decode() it - it fails

* fix: Remove unexpected (and unused) param from GenerateFeedbackApp.start()

* cleanup: Clean up fixtures for autograding tests

* fix: Re-add missing updating of submission logs; log process errors more consistently

* tests: Add tests for LocalProcessAutogradeExecutor, improve tests for local feedback

* refactor: Simplify autograding/feedback log collection

* refactor: Improve running subprocesses, adjust and add tests

* refactor: Pass git commands to subprocess.run as lists, not strings

We execute the git commands with variables potentially provided by users
(filenames, usernames, etc.). Passing them as strings and then splitting,
potentially could enable constructing these variables in such a way
so as to execute malicious commands.

An alternative would be to use `shlex.quote` with any variables,
but it seems more cumbersome.

* refactor: Small tweaks in the local grader and local feedback, update docstrings

* tests: Add tests for the collect_logs context manager

* fix: Feedback repo path, do not save feedback logs in submission.logs

Until now feedback logs have also not been appended to submission.logs.

* chore: Small improvements in tests

* fix: Only commit html files when generating feedback

Previously, all student-supplied files were included in the feedback,
even though feedback is actually only generated for the jupyter notebooks
that are included in the gradebook.
Now only the generated html files will be included in the feedback files.

* fix: Protect against fabricated usernames in git repo paths in autograders

* refactor: Move git manager to a separate module

* chore: Small cleanup in autograde executors

* chore: Make autograding/feedback executor class names more consistent

* fix: Do not write `self.grading_logs` in local feedback executors

We don't use these logs anywhere, so it's not necessary to collect them.
All the logs are still logged with the regular `self.log` logger.
nadjajovancevic and others added 2 commits November 7, 2025 14:05
* WIP: added groups attribute to AssignmentSettings

* fix: add deleted lines back

* fix: add missing attribute (group) in test_base_handler.py
* chore: Small improvements in the docker-compose setup

- remove the network used by all containers (dc creates it by default)
- set default DATABASE_TYPE to sqlite => simpler dc up commands
- change restart policy to unless-stopped => containers don't restart
  after system restart (which was annoying in local development)

* fix: Fix the initial SQL data import for local docker-compose setup

* fix: added psycog2 package to init container and set grader network

* fix: Remove postgres-specific syntax from the initial data file

`E'\n'` is invalid syntax in sqlite. It was used in
`replace(<very long logs as a string>, '\n', E'\n')` and
`replace(<very long logs as a string>, '\012', E'\n')`.
I just replaced these values (`\n` and `\012`) with newlines.

* cleaup: Remove psycopg from Dockerfile (not needed, after all)

* fix: Fix database initialisation in docker compose for sqlite and postgres

* fix: update docker compose docs

* fix: Fix a comment

* chore: Use uv to install python packages in containers in Docker Compose setup

* chore: Make the various init bash scripts for containers more consistent

* chore: Also use uv in jupyterhub Dockerfile

* cleanup: Remove a redundant `expose` from docker compose file

* fix: Remove submission data from the initial db import

We don't include the submission files and repos in the initial setup,
and creating the database objects without them doesn't make much sense,
because one cannot do anything with such submissions in grader service.

* fix: Simplify python deps installation in Dockerfile-Service

* fix: Only load one assignment as initial db data in docker compose setup

* fix: Set the initial data assignment deadline to a future date

* fix: Small adjustment in the init assignment data

* chore: Add initial 'source' repo for docker compose setup

* chore: Add initial 'release' repo for docker compose setup

* chore: Make initial repos available in the docker compose setup

* fix: Fix celery worker container command

* fix: Add `test` dependency group to pyproject.toml, upgrade dev and test deps.

* fix: Replace deprecated `set-output` with $GITHUB_OUTPUT

* chore: Delete unused .gitlab-ci.yml file

* fix: Tiny improvements in docker compose setup (better caching, get rid of a warning)

* fix: Fix github action check for Dockerfile changes

Previously, the check would only output directory names, so the check
for the "Dockerfile" name would never match. I added a separate step that
checks specifically if the Dockerfile has changed.

---------

Co-authored-by: florian.jaeger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants