|
6 | 6 | That is to say, an input-addressed output's store path is a function not of the output itself, but of the derivation that produced it. |
7 | 7 | Even if two store objects have the same contents, if they are produced in different ways, and at least one is input-addressed, then they will have different store paths, and are thus guaranteed not to be the same store object.
8 | 8 |
|
9 | | -<!--- |
| 9 | +## Modulo fixed-output derivations {#hash-modulo} |
10 | 10 |
|
11 | | -### Modulo fixed-output derivations |
| 11 | +So how do we compute the hash part of the output paths of an input-addressed derivation? |
| 12 | +This is done by the function `hashDerivationModulo`, shown below. |
12 | 13 |
|
13 | | -**TODO hash derivation modulo.** |
| 14 | +First, a word on inputs. |
| 15 | +`hashDerivationModulo` is only defined on derivations whose [inputs](@docroot@/store/derivation/index.md#inputs) take the first-order form: |
| 16 | +```typescript |
| 17 | +type ConstantPath = { |
| 18 | + path: StorePath; |
| 19 | +}; |
14 | 20 |
|
15 | | -So how do we compute the hash part of the output path of a derivation? |
16 | | -This is done by the function `hashDrv`, shown in Figure 5.10. |
17 | | -It distinguishes between two cases. |
18 | | -If the derivation is a fixed-output derivation, then it computes a hash over just the `outputHash` attributes. |
| 21 | +type FirstOrderOutputPath = { |
| 22 | + drvPath: StorePath; |
| 23 | + output: OutputName; |
| 24 | +}; |
19 | 25 |
|
20 | | -If the derivation is not a fixed-output derivation, we replace each element in the derivation’s inputDrvs with the result of a call to `hashDrv` for that element. |
21 | | -(The derivation at each store path in `inputDrvs` is converted from its on-disk ATerm representation back to a `StoreDrv` by the function `parseDrv`.) In essence, `hashDrv` partitions store derivations into equivalence classes, and for hashing purpose it replaces each store path in a derivation graph with its equivalence class. |
| 26 | +type FirstOrderDerivingPath = ConstantPath | FirstOrderOutputPath; |
22 | 27 |
|
23 | | -The recursion in Figure 5.10 is inefficient: |
24 | | -it will call itself once for each path by which a subderivation can be reached, i.e., `O(V k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`. |
25 | | -In the actual implementation, memoisation is used to reduce this to `O(V + E)` complexity for a graph with E edges. |
| 28 | +type Inputs = Set<FirstOrderDerivingPath>; |
| 29 | +``` |
26 | 30 |
|
27 | | ---> |
| 31 | +For the algorithm below, we adopt a representation of derivations in which the two types of (first-order) deriving paths are partitioned into two sets, as follows:
| 32 | +```typescript |
| 33 | +type Derivation = { |
| 34 | + // inputs: Set<FirstOrderDerivingPath>; // replaced |
| 35 | + inputSrcs: Set<ConstantPath>; // new instead |
| 36 | + inputDrvOutputs: Set<FirstOrderOutputPath>; // new instead |
| 37 | + // ...other fields... |
| 38 | +}; |
| 39 | +``` |
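
As an illustration of this partition, here is a minimal TypeScript sketch that splits a first-order input set into the two fields; it reuses the type names from the sketches above and is not meant as the actual implementation:

```typescript
// Illustrative sketch only: split a first-order input set into the two
// fields of the representation above, reusing the type names from the
// sketches in this section.
function partitionInputs(inputs: Set<FirstOrderDerivingPath>): {
  inputSrcs: Set<ConstantPath>;
  inputDrvOutputs: Set<FirstOrderOutputPath>;
} {
  const inputSrcs = new Set<ConstantPath>();
  const inputDrvOutputs = new Set<FirstOrderOutputPath>();
  for (const input of inputs) {
    if ("path" in input) {
      // A constant store path (e.g. a source) belongs in `inputSrcs`.
      inputSrcs.add(input);
    } else {
      // An output of another derivation belongs in `inputDrvOutputs`.
      inputDrvOutputs.add(input);
    }
  }
  return { inputSrcs, inputDrvOutputs };
}
```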
28 | 40 |
|
| 41 | +Derivations that use the [currently experimental][xp-feature-dynamic-derivations] higher-order generalization, in which outputs of outputs are allowed as [deriving paths][deriving-path] and thus as derivation inputs, are not valid arguments to this function.
| 42 | +Such derivations must first be (partially) [resolved](@docroot@/store/resolution.md), to the point where no higher-order inputs remain.
| 43 | +Then, and only then, can input addresses be assigned. |
| 44 | + |
| 45 | +``` |
| 46 | +function hashDerivationModulo(drv) -> Hash: |
| 47 | + assert(drv.outputs are input-addressed) |
| 48 | + drv′ ← drv with { |
| 49 | + inputDrvOutputs = ⋃( |
| 50 | + assert(drvPath is store path) |
| 51 | + case hashOutputsOrDerivationModulo(readDrv(drvPath)) of |
| 52 | + drvHash : Hash → |
| 53 | + (drvHash.toBase16(), output) |
| 54 | + outputHashes : Map[String, Hash] → |
| 55 | + (outputHashes[output].toBase16(), "out") |
| 56 | + | (drvPath, output) ∈ drv.inputDrvOutputs |
| 57 | + ) |
| 58 | + } |
| 59 | + return hashSHA256(printDrv(drv′)) |
| 60 | +
|
| 61 | +function hashOutputsOrDerivationModulo(drv) -> Map[String, Hash] | Hash: |
| 62 | + if drv.outputs are content-addressed: |
| 63 | + return { |
| 64 | + outputName ↦ hashSHA256( |
| 65 | + "fixed:out:" + ca.printMethodAlgo() + |
| 66 | + ":" + ca.hash.toBase16() + |
| 67 | + ":" + ca.makeFixedOutputPath(drv.name, outputName)) |
| 68 | + | (outputName ↦ output) ∈ drv.outputs |
| 69 | + , ca = output.contentAddress // or get from build trace if floating |
| 70 | + } |
| 71 | + else: // drv.outputs are input-addressed |
| 72 | + return hashDerivationModulo(drv) |
| 73 | +``` |
| 74 | + |
| 75 | +### `hashDerivationModulo` |
| 76 | + |
| 77 | +We replace each element in the derivation's `inputDrvOutputs` using data from a call to `hashOutputsOrDerivationModulo` on the derivation found at the `drvPath` of that element.
| 78 | +When `hashOutputsOrDerivationModulo` returns a single derivation hash (because the input derivation in question is input-addressing), we simply swap out the `drvPath` for that hash, and keep the same output name.
| 79 | +When `hashOutputsOrDerivationModulo` returns a map of per-output hashes (because the input derivation is content-addressing), we look up the hash for the output in question, and pair it with the output name `out`.
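
The following TypeScript sketch shows the shape of that substitution; `readDrv`, `hashOutputsOrDerivationModulo`, and a `Hash` type with `toBase16()` are assumed to exist just as in the pseudocode above, and the returned pairs stand in for the rewritten `inputDrvOutputs` of the pseudo-derivation:

```typescript
// Sketch of the `inputDrvOutputs` rewrite described above; helper names and
// types are carried over from the pseudocode, not from any real API.
function rewriteInputDrvOutputs(drv: Derivation): Set<[string, OutputName]> {
  const rewritten = new Set<[string, OutputName]>();
  for (const { drvPath, output } of drv.inputDrvOutputs) {
    const result = hashOutputsOrDerivationModulo(readDrv(drvPath));
    if (result instanceof Map) {
      // Content-addressing input: take the per-output hash, pair it with "out".
      rewritten.add([result.get(output)!.toBase16(), "out"]);
    } else {
      // Input-addressing input: take the derivation hash, keep the output name.
      rewritten.add([result.toBase16(), output]);
    }
  }
  return rewritten;
}
```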
| 80 | + |
| 81 | +The resulting pseudo-derivation (with hashes instead of store paths in `inputDrvOutputs`) is then printed in the ["ATerm" format](@docroot@/protocols/derivation-aterm.md) and hashed, and this hash is the "hash modulo" of the derivation.
| 82 | + |
| 83 | +When calculating output paths, `hashDerivationModulo` is called on an almost-complete input-addressing derivation, one which is still missing its input-addressed output paths.
| 84 | +The derivation hash is then used to calculate output paths for each output. |
| 85 | +<!-- TODO describe how this is done. --> |
| 86 | +Those output paths can then be substituted into the almost-complete input-addressed derivation to complete it. |
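
As a rough sketch of how the derivation hash is used (the normative reference is the store path calculation in [the store path protocol](@docroot@/protocols/store-path.md)): each output path is an ordinary store path whose fingerprint has the type `output:<outputName>` and uses the derivation hash as its inner digest. The helper `makeStorePath` and the exact signatures below are illustrative:

```typescript
// Rough, non-normative sketch of deriving one output path from the
// derivation hash. `makeStorePath` (fingerprint + truncated SHA-256 +
// base-32 digest) is assumed; names and signatures are illustrative.
function makeOutputPath(
  drvHash: Hash,
  drvName: string,
  outputName: OutputName,
): StorePath {
  // Conventionally, the store object is named "<drvName>" for the output
  // named "out", and "<drvName>-<outputName>" for any other output.
  const name = outputName === "out" ? drvName : `${drvName}-${outputName}`;
  return makeStorePath(`output:${outputName}`, drvHash, name);
}
```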
| 87 | + |
| 88 | +> **Note** |
| 89 | +> |
| 90 | +> There may be an unintentional deviation from the specification in the current implementation of the `(outputHashes[output].toBase16(), "out")` case.
| 91 | +> This is not fatal, because the deviation would only apply to content-addressing derivations with more than one output, and that only occurs in the floating case, which is [experimental][xp-feature-ca-derivations].
| 92 | +> Once this bug is fixed, this note will be removed. |
| 93 | +
|
| 94 | +### `hashOutputsOrDerivationModulo` |
| 95 | + |
| 96 | +How does `hashOutputsOrDerivationModulo` in turn work? |
| 97 | +It consists of two main cases, based on whether the outputs of the derivation are to be input-addressed or content-addressed. |
| 98 | + |
| 99 | +#### Input-addressed outputs case |
| 100 | + |
| 101 | +In the input-addressed case, it just calls `hashDerivationModulo`, and returns that derivation hash. |
| 102 | +This makes `hashDerivationModulo` and `hashOutputsOrDerivationModulo` mutually recursive.
| 103 | + |
| 104 | +> **Note** |
| 105 | +> |
| 106 | +> In this case, `hashDerivationModulo` is being called on a *complete* input-addressing derivation that already has its output paths calculated. |
| 107 | +> The `inputDrvOutputs` substitution takes place anyway.
| 108 | +
|
| 109 | +#### Content-addressed outputs case |
| 110 | + |
| 111 | +If the outputs are [content-addressed](./content-address.md), then it computes, for each output, a hash derived from that output's content address.
| 112 | + |
| 113 | +> **Note** |
| 114 | +> |
| 115 | +> In the [fixed](./content-address.md#fixed) content-addressing case, the outputs' content addresses are statically specified in advance, so this always just works.
| 116 | +> (The fixed case is what the pseudo-code shows.) |
| 117 | +> |
| 118 | +> In the [floating](./content-address.md#floating) case, the content addresses are not specified in advance.
| 119 | +> This is what the "or get from build trace if floating" comment refers to.
| 120 | +> In this case, the algorithm is *stuck* until the input in question is built and we know what the actual contents of the output in question are.
| 121 | +> |
| 122 | +> That is OK, however, because there is no problem with delaying the assignment of input addresses (which, remember, is what `hashDerivationModulo` is ultimately for) until all inputs are known.
| 123 | +
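Putting the two sub-cases of this section together, a TypeScript sketch might look as follows; the field and helper names (`outputs`, `contentAddress`, `name`, `hashSHA256`, and the `ca.makeFixedOutputPath` method) are assumptions carried over from the pseudocode, and modelling the stuck floating case as an error is purely illustrative:

```typescript
// Sketch of the content-addressed case, mirroring the pseudocode above.
function contentAddressedOutputHashes(drv: Derivation): Map<OutputName, Hash> {
  const hashes = new Map<OutputName, Hash>();
  for (const [outputName, output] of drv.outputs) {
    // Fixed case: the content address is statically declared on the output.
    // Floating case: it would instead have to come from the build trace.
    const ca = output.contentAddress;
    if (ca === undefined) {
      // The algorithm is stuck until this input is actually built.
      throw new Error(`floating output '${outputName}' not yet built`);
    }
    hashes.set(outputName, hashSHA256(
      "fixed:out:" + ca.printMethodAlgo() +
      ":" + ca.hash.toBase16() +
      ":" + ca.makeFixedOutputPath(drv.name, outputName)));
  }
  return hashes;
}
```
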
|
| 124 | +### Performance |
| 125 | + |
| 126 | +The recursion in the algorithm is potentially inefficient: |
| 127 | +it could call itself once for each path by which a subderivation can be reached, i.e., `O(V^k)` times for a derivation graph with `V` derivations and with out-degree of at most `k`. |
| 128 | +In the actual implementation, memoisation is used to reduce this to `O(V + E)` complexity for a graph with `E` edges. |
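
A minimal sketch of that memoisation, keyed by the input derivation's store path; the cache and helper names are hypothetical, and `StorePath` is assumed to compare by value (e.g. as a string) so it can key a `Map`:

```typescript
// Cache the per-derivation result so each derivation in the graph is read
// and hashed at most once.
const drvHashCache: Map<StorePath, Hash | Map<OutputName, Hash>> = new Map();

function hashOutputsOrDerivationModuloMemo(
  drvPath: StorePath,
): Hash | Map<OutputName, Hash> {
  const cached = drvHashCache.get(drvPath);
  if (cached !== undefined) {
    return cached;
  }
  // Recursive calls for this derivation's own inputs go through the same cache.
  const result = hashOutputsOrDerivationModulo(readDrv(drvPath));
  drvHashCache.set(drvPath, result);
  return result;
}
```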
| 129 | + |
| 130 | +### Semantic properties |
| 131 | + |
| 132 | +In essence, `hashDerivationModulo` partitions input-addressing derivations into equivalence classes: every derivation in a given equivalence class is mapped to the same derivation hash.
| 133 | +We can characterize this equivalence relation directly, by working bottom up. |
| 134 | + |
| 135 | +We start by defining an equivalence relation on first-order output deriving paths that refer to content-addressed derivation outputs. Two such paths are equivalent if they refer to the same store object:
| 136 | + |
| 137 | +\\[ |
| 138 | +\\begin{prooftree} |
| 139 | +\\AxiomC{$d\_1$ is content-addressing} |
| 140 | +\\AxiomC{$d\_2$ is content-addressing} |
| 141 | +\\AxiomC{${}^\*(d\_1, o\_1) = {}^\*(d\_2, o\_2)$} |
| 142 | +\\TrinaryInfC{$(d\_1, o\_1) \\,\\sim_{\\mathrm{CA}}\\, (d\_2, o\_2)$} |
| 143 | +\\end{prooftree} |
| 144 | +\\] |
| 145 | + |
| 146 | +where \\({}^*(d, o)\\) denotes the store object that the output deriving path refers to. |
| 147 | + |
| 148 | +We will also need the following construction to lift any equivalence relation on \\(X\\) to an equivalence relation on (finite) sets of \\(X\\) (in short, \\(\\mathcal{P}(X)\\)): |
| 149 | + |
| 150 | +\\[ |
| 151 | +\\begin{prooftree} |
| 152 | +\\AxiomC{$\\forall a \\in A. \\exists b \\in B. a \\,\\sim\_X\\, b$} |
| 153 | +\\AxiomC{$\\forall b \\in B. \\exists a \\in A. b \\,\\sim\_X\\, a$} |
| 154 | +\\BinaryInfC{$A \\,\\sim_{\\mathcal{P}(X)}\\, B$} |
| 155 | +\\end{prooftree} |
| 156 | +\\] |
| 157 | + |
| 158 | +Now we can define the equivalence relation \\(\\sim_\\mathrm{IA}\\) on input-addressed derivation outputs. Two input-addressed outputs are equivalent if their derivations are equivalent (via the yet-to-be-defined \\(\\sim_{\\mathrm{IADrv}}\\) relation) and their output names are the same: |
| 159 | + |
| 160 | +\\[ |
| 161 | +\\begin{prooftree} |
| 162 | +\\AxiomC{$d\_1$ is input-addressing} |
| 163 | +\\AxiomC{$d\_2$ is input-addressing} |
| 164 | +\\AxiomC{$d\_1 \\,\\sim_{\\mathrm{IADrv}}\\, d\_2$} |
| 165 | +\\AxiomC{$o\_1 = o\_2$} |
| 166 | +\\QuaternaryInfC{$(d\_1, o\_1) \\,\\sim_{\\mathrm{IA}}\\, (d\_2, o\_2)$} |
| 167 | +\\end{prooftree} |
| 168 | +\\] |
| 169 | + |
| 170 | +And now we can define \\(\\sim_{\\mathrm{IADrv}}\\). |
| 171 | +Two input-addressed derivations are equivalent if their content-addressed inputs are equivalent, their input-addressed inputs are also equivalent, and they are otherwise equal: |
| 172 | + |
| 173 | +\\[ |
| 174 | +\\begin{prooftree} |
| 175 | +\\AxiomC{$ |
| 176 | + \\mathrm{caInputs}(d\_1) |
| 177 | + \\,\\sim_{\\mathcal{P}(\\mathrm{CA})}\\, |
| 178 | + \\mathrm{caInputs}(d\_2) |
| 179 | +$} |
| 180 | +\\AxiomC{$ |
| 181 | + \\mathrm{iaInputs}(d\_1) |
| 182 | + \\,\\sim_{\\mathcal{P}(\\mathrm{IA})}\\, |
| 183 | + \\mathrm{iaInputs}(d\_2) |
| 184 | +$} |
| 185 | +\\AxiomC{$ |
| 186 | +  d\_1\\left[\\mathrm{inputDrvOutputs} := \\{\\}\\right]
| 187 | +  =
| 188 | +  d\_2\\left[\\mathrm{inputDrvOutputs} := \\{\\}\\right]
| 189 | +$} |
| 190 | +\\TrinaryInfC{$d\_1 \\,\\sim_{\\mathrm{IADrv}}\\, d\_2$} |
| 191 | +\\end{prooftree} |
| 192 | +\\] |
| 193 | + |
| 194 | +where \\(\\mathrm{caInputs}(d)\\) returns the content-addressed inputs of \\(d\\) and \\(\\mathrm{iaInputs}(d)\\) returns the input-addressed inputs. |
| 195 | + |
| 196 | +> **Note** |
| 197 | +> |
| 198 | +> An astute reader might notice that nowhere does `inputSrcs` enter into these definitions.
| 199 | +> That means that replacing an input derivation's outputs in `inputDrvOutputs` with the corresponding constant paths added directly to `inputSrcs` always results in a derivation in a different equivalence class, even though the resulting input closure (as it would be mounted in the store at build time) is the same.
| 200 | +> [Issue #9259](https://github.com/NixOS/nix/issues/9259) is about creating a coarser equivalence relation to address this. |
| 201 | +> |
| 202 | +> \\(\\sim_{\\mathrm{Drv}}\\) from [derivation resolution](@docroot@/store/resolution.md) is such an equivalence relation.
| 203 | +> It is coarser than this one: any two derivations which are "'hash modulo'-equivalent" (\\(\\sim_{\\mathrm{IADrv}}\\)) are also "resolution-equivalent" (\\(\\sim_{\\mathrm{Drv}}\\)).
| 204 | +> It also relates derivations whose `inputDrvOutputs` have been rewritten into `inputSrcs`. |
| 205 | +
|
| 206 | +[deriving-path]: @docroot@/store/derivation/index.md#deriving-path |
| 207 | +[xp-feature-dynamic-derivations]: @docroot@/development/experimental-features.md#xp-feature-dynamic-derivations |
29 | 208 | [xp-feature-ca-derivations]: @docroot@/development/experimental-features.md#xp-feature-ca-derivations |
30 | | -[xp-feature-git-hashing]: @docroot@/development/experimental-features.md#xp-feature-git-hashing |
31 | | -[xp-feature-impure-derivations]: @docroot@/development/experimental-features.md#xp-feature-impure-derivations |
|