Commit acfc3d1

Ericson2314 and roberth committed
Document "hash derivation quotiented", resolution, and build trace
Progress on #13405, which asks for an explicit characterisation of the equivalence relation like the one given here. Also progress on #11895, because we're using the term "build trace entry" instead of "realisation". Mention #9259, a future work item.

Co-authored-by: Robert Hensing <[email protected]>
1 parent 34ac179 commit acfc3d1

13 files changed, +528 -22 lines changed

doc/manual/book.toml.in

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ additional-css = ["custom.css"]
77
additional-js = ["redirects.js"]
88
edit-url-template = "https://github.com/NixOS/nix/tree/master/doc/manual/{path}"
99
git-repository-url = "https://github.com/NixOS/nix"
10+
mathjax-support = true
1011

1112
# Handles replacing @docroot@ with a path to ./source relative to that markdown file,
1213
# {{#include handlebars}}, and the @generated@ syntax used within these. it mostly

doc/manual/meson.build

Lines changed: 2 additions & 0 deletions
@@ -92,6 +92,8 @@ manual = custom_target(
9292
(cd @2@; RUST_LOG=warn @1@ build -d @2@ 3>&2 2>&1 1>&3) | { grep -Fv "because fragment resolution isn't implemented" || :; } 3>&2 2>&1 1>&3
9393
rm -rf @2@/manual
9494
mv @2@/html @2@/manual
95+
# Remove Mathjax 2.7, because we will actually use MathJax 3.x
96+
find @2@/manual | grep .html | xargs sed -i -e '/2.7.1.MathJax.js/d'
9597
find @2@/manual -iname meson.build -delete
9698
'''.format(
9799
python.full_path(),

doc/manual/source/SUMMARY.md.in

Lines changed: 3 additions & 0 deletions
@@ -26,9 +26,12 @@
2626
- [Derivation Outputs and Types of Derivations](store/derivation/outputs/index.md)
2727
- [Content-addressing derivation outputs](store/derivation/outputs/content-address.md)
2828
- [Input-addressing derivation outputs](store/derivation/outputs/input-address.md)
29+
- [Build Trace](store/build-trace.md)
30+
- [Derivation Resolution](store/resolution.md)
2931
- [Building](store/building.md)
3032
- [Store Types](store/types/index.md)
3133
{{#include ./store/types/SUMMARY.md}}
34+
- [Appendix: Math notation](store/math-notation.md)
3235
- [Nix Language](language/index.md)
3336
- [Data Types](language/types.md)
3437
- [String context](language/string-context.md)

doc/manual/source/protocols/derivation-aterm.md

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
11
# Derivation "ATerm" file format
22

3-
For historical reasons, [store derivations][store derivation] are stored on-disk in [ATerm](https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html) format.
3+
For historical reasons, [store derivations][store derivation] are stored on-disk in "Annotated Term" (ATerm) format
4+
([guide](https://homepages.cwi.nl/~daybuild/daily-books/technology/aterm-guide/aterm-guide.html),
5+
[paper](https://doi.org/10.1002/(SICI)1097-024X(200003)30:3%3C259::AID-SPE298%3E3.0.CO;2-Y)).
46

57
## The ATerm format used
68

doc/manual/source/protocols/json/schema/derivation-v3.yaml

Lines changed: 2 additions & 2 deletions
@@ -39,9 +39,9 @@ properties:
3939
This is a guard that allows us to continue evolving this format.
4040
The choice of `3` is fairly arbitrary, but corresponds to this informal version:
4141
42-
- Version 0: A-Term format
42+
- Version 0: ATerm format
4343
44-
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from A-Term format.
44+
- Version 1: Original JSON format, with ugly `"r:sha256"` inherited from ATerm format.
4545
4646
- Version 2: Separate `method` and `hashAlgo` fields in output specs
4747

doc/manual/source/store/build-trace.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# Build Trace
2+
3+
> **Warning**
4+
>
5+
> This entire concept is currently
6+
> [**experimental**](@docroot@/development/experimental-features.md#xp-feature-ca-derivations)
7+
> and subject to change.
8+
9+
The *build trace* is a [memoization table](https://en.wikipedia.org/wiki/Memoization) for builds.
10+
It maps the inputs of builds to the outputs of builds.
11+
Concretely, that means it maps [derivations][derivation] to maps of [output] names to [store objects][store object].
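
As a rough illustration only (this is not how Nix stores the data), the mapping described above can be sketched as a nested dictionary. The names `BuildTrace`, `record`, and `lookup`, and the placeholder strings standing in for derivations and store objects, are assumptions made for this example.

```python
# Hypothetical sketch: a build trace as a nested mapping.
# Keys stand in for derivations; values map output names to store objects.
from typing import Dict, Optional

Derivation = str   # placeholder for a derivation identifier
OutputName = str
StoreObject = str  # placeholder for a reference to a store object

BuildTrace = Dict[Derivation, Dict[OutputName, StoreObject]]


def record(trace: BuildTrace, drv: Derivation,
           outputs: Dict[OutputName, StoreObject]) -> None:
    """Remember which store objects a build of `drv` produced."""
    trace.setdefault(drv, {}).update(outputs)


def lookup(trace: BuildTrace, drv: Derivation,
           output: OutputName) -> Optional[StoreObject]:
    """Return the recorded store object, or None if no entry exists."""
    return trace.get(drv, {}).get(output)


trace: BuildTrace = {}
record(trace, "hello.drv", {"out": "obj-hello", "dev": "obj-hello-dev"})
```

A hit in `lookup` means the build can be skipped, which is the memoization role the text describes; a miss means the derivation would have to be built.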
12+
13+
In general, the derivations used as keys should be [*resolved*](./resolution.md).
14+
A build trace with all-resolved-derivation keys is also called a *base build trace* for extra clarity.
15+
If all the resolved inputs of a derivation are content-addressed, that means the inputs will be fully determined, leaving no ambiguity for what build was performed.
16+
(Input-addressed inputs, however, are still ambiguous. They too should be locked down, but this is left as future work.)
17+
18+
Accordingly, to look up an unresolved derivation, one must first resolve it to get a resolved derivation.
19+
Resolving itself involves looking up entries in the build trace, so this is a mutually recursive process that will end up inspecting possibly many entries.
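
The mutual recursion between resolving and looking up can be sketched as follows. This is a simplified model under stated assumptions: the `Derivation` and `OutputRef` classes and the `resolve` and `lookup` functions are hypothetical, inputs are either plain store objects (strings) or references to another derivation's output, and details of real resolution are omitted.

```python
# Hypothetical sketch of the mutually recursive resolve/lookup process.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class OutputRef:
    """An unresolved input: a named output of another derivation."""
    drv: "Derivation"
    output: str


Input = Union[str, OutputRef]  # a str stands in for a store object


@dataclass(frozen=True)
class Derivation:
    name: str
    inputs: Tuple[Input, ...]


# The base build trace is keyed by resolved derivations only.
BuildTrace = Dict[Derivation, Dict[str, str]]


def resolve(trace: BuildTrace, drv: Derivation) -> Optional[Derivation]:
    """Replace every OutputRef input by the store object the trace records for it."""
    resolved_inputs = []
    for inp in drv.inputs:
        if isinstance(inp, OutputRef):
            obj = lookup(trace, inp.drv, inp.output)  # recursion via lookup
            if obj is None:
                return None  # an input has no entry, so resolution fails
            resolved_inputs.append(obj)
        else:
            resolved_inputs.append(inp)
    return Derivation(drv.name, tuple(resolved_inputs))


def lookup(trace: BuildTrace, drv: Derivation, output: str) -> Optional[str]:
    """Resolve `drv` first, then consult the trace under the resolved key."""
    resolved = resolve(trace, drv)
    if resolved is None:
        return None
    return trace.get(resolved, {}).get(output)


a = Derivation("a", ())
b = Derivation("b", (OutputRef(a, "out"),))
trace: BuildTrace = {
    a: {"out": "obj-a"},
    Derivation("b", ("obj-a",)): {"out": "obj-b"},
}
```

Looking up `b` here inspects two entries: the one for `a`, to resolve `b`, and then the one for the resolved form of `b`.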
20+
21+
Except for the issue with input-addressed paths called out above, base build traces are trivially *coherent* -- incoherence is not possible.
22+
That means that the claims made by each key-value entry of a base build trace are independent, and no mapping invalidates another mapping.
23+
24+
Whether the mappings are *true*, i.e. the faithful recording of actual builds performed, is another matter.
25+
Coherence is about the multiple claims of the build trace being mutually consistent, not about whether the claims are individually true or false.
26+
27+
In general, there is no way to audit a build trace entry except by performing the build again from scratch.
28+
And even in that case, a different result doesn't mean the original entry was a "lie", because the derivation being built may be non-deterministic.
29+
As such, the decision of whether to trust a counterparty's build trace is a fundamentally subjective policy choice.
30+
Build trace entries are typically *signed* in order to enable arbitrary public-key-based trust policies.
31+
32+
## Derived build traces
33+
34+
Implementations that wish to memoize the lookup process described above may also keep additional *derived* build trace entries that map unresolved derivations.
35+
But if they do so, they *must* also keep the underlying base entries with resolved derivation keys around.
36+
Firstly, this ensures that the derived entries are merely a cache, which could be recomputed from scratch.
37+
Secondly, this ensures the coherence of the derived build trace.
38+
39+
Unlike with base build traces, incoherence with derived build traces is possible.
40+
The key ingredient is that derivation resolution is only deterministic with respect to a fixed base build trace.
41+
Without fixing the base build trace, it inherits the subjectivity of base build traces themselves.
42+
43+
Concretely, suppose there are three derivations \\(a\\), \\(b\\), and \\(c\\).
44+
Let \\(a\\) be a resolved derivation, but let \\(b\\) and \\(c\\) be unresolved and both take as an input an output of \\(a\\).
45+
Now suppose that derived entries are made for \\(b\\) and \\(c\\) based on two different entries of \\(a\\).
46+
(This could happen if \\(a\\) is non-deterministic, \\(a\\) and \\(b\\) are built in one store, \\(a\\) and \\(c\\) are built in another store, and then a third store substitutes from both of the first two stores.)
47+
48+
If trusting the derived build trace entries for \\(b\\) and \\(c\\) requires that each's underlying entry for \\(a\\) be also trusted, the two different mappings for \\(a\\) will be caught.
49+
However, if \\(b\\) and \\(c\\)'s entries can be combined in isolation, there will be nothing to catch the contradiction in their hidden assumptions about \\(a\\)'s output.
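
A minimal sketch of the scenario, assuming a hypothetical representation in which each derived entry carries the base entries it relied on, shows how keeping those base entries lets the contradiction be detected. `DerivedEntry` and `conflicting_assumptions` are names invented for this example.

```python
# Hypothetical sketch: detecting incoherent derived build trace entries.
from typing import Dict, NamedTuple, Set


class DerivedEntry(NamedTuple):
    output: str              # store object claimed for the unresolved derivation
    assumed: Dict[str, str]  # resolved derivation -> store object assumed for its output


def conflicting_assumptions(*entries: DerivedEntry) -> Set[str]:
    """Return the resolved derivations on which the given entries disagree."""
    seen: Dict[str, str] = {}
    conflicts: Set[str] = set()
    for entry in entries:
        for drv, obj in entry.assumed.items():
            # setdefault keeps the first store object seen for each derivation
            if seen.setdefault(drv, obj) != obj:
                conflicts.add(drv)
    return conflicts


# b and c were each derived from a different build of the non-deterministic a.
entry_b = DerivedEntry("obj-b", {"a": "obj-a-1"})
entry_c = DerivedEntry("obj-c", {"a": "obj-a-2"})
```

If the `assumed` field were dropped, the two entries would look compatible, which is the hidden contradiction the text warns about.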
50+
51+
[derivation]: ./derivation/index.md
52+
[output]: ./derivation/outputs/index.md
53+
[store object]: @docroot@/store/store-object.md

doc/manual/source/store/derivation/index.md

Lines changed: 1 addition & 1 deletion
@@ -245,7 +245,7 @@ If those other derivations *also* abide by this common case (and likewise for tr
245245
> note the ".drv"
246246
> ```
247247
248-
## Extending the model to be higher-order
248+
## Extending the model to be higher-order {#dynamic}
249249
250250
**Experimental feature**: [`dynamic-derivations`](@docroot@/development/experimental-features.md#xp-feature-dynamic-derivations)
251251

doc/manual/source/store/derivation/outputs/content-address.md

Lines changed: 2 additions & 2 deletions
@@ -167,10 +167,10 @@ It is only in the potential for that check to fail that they are different.
167167
>
168168
> In a future world where floating content-addressing is also stable, we in principle no longer need separate [fixed](#fixed) content-addressing.
169169
> Instead, we could always use floating content-addressing, and separately assert the precise value content address of a given store object to be used as an input (of another derivation).
170-
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [Issue #11955](https://github.com/NixOS/nix/issues/11955).
170+
> A stand-alone assertion object of this sort is not yet implemented, but its possible creation is tracked in [issue #11955](https://github.com/NixOS/nix/issues/11955).
171171
>
172172
> In the current version of Nix, fixed outputs which fail their hash check are still registered as valid store objects, just not registered as outputs of the derivation which produced them.
173-
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, derivation does not need to be rebuilt --- avoiding downloading potentially large amounts of data twice.
173+
> This is an optimization that means if the wrong output hash is specified in a derivation, and then the derivation is recreated with the right output hash, the derivation does not need to be rebuilt &mdash; avoiding downloading potentially large amounts of data twice.
174174
> This optimisation prefigures the design above:
175175
> If the output hash assertion was removed outside the derivation itself, Nix could additionally not only register that outputted store object like today, but could also make note that derivation did in fact successfully download some data.
176176
For example, for the "fetch URL" example above, making such a note is tantamount to recording what data is available at the time of download at the given URL.
