From f01144694f4adb447c45980eaeda39f33c737179 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 13:24:14 -0700 Subject: [PATCH 1/8] WIP --- text/0000-extern-impl.md | 218 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 218 insertions(+) create mode 100644 text/0000-extern-impl.md diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md new file mode 100644 index 00000000000..9c671c5108f --- /dev/null +++ b/text/0000-extern-impl.md @@ -0,0 +1,218 @@ +- Feature Name: (fill me in with a unique ident, `my_awesome_feature`) +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a +separate crate, while still meeting the soundness guarantees currently enforced +by the orphan rule. + +# Motivation +[motivation]: #motivation + +In order to guarantee coherence, Rust currently imposes the fairly strict orphan +rule. That is: +- only a crate defining a type can have inherent implementations for that type, + or for any traits, and +- only a crate defining a trait can implement that trait for any type + +In other words, a crate is not allowed to have "third-party implementations" - +it's not allowed to add new inherent implementations to a type, nor is it +allowed to implement a foreign trait for a foreign type. + +This raises a number of problems, but this RFC's principle concern is that of +build performance and dependency graph scaling. + +For example: + +- It common for a to only use a crate for its type definitions, but not need any + of it's implementations (for example, when mentioning a type in a function + signature or another type definition). However, with the current orphan rule, + this would require waiting for the full crate to compile, including all of its + dependencies. +- When introducing a new trait, it's useful to provide implementations for other + external types. But this means the trait-definition crate takes on + depdendencies on all those downstream dependencies whether they're needed or not. + +~~What this RFC proposes is a mechanism which allows implementations to be +delegated to "trusted" or "friendly" crates (typically defined in the same +package), allowing downstream consumers to take on only the specific +dependencies they need. Furthermore a "witness" mechanism makes sure that the +coherence invariants currently enforced by the orphan rule are maintained.~~ + +Dependency graph dominator based coherence checking. + +Status Quo: +```mermaid +graph LR +MysqlUser --> DBTrait +MongoUser --> DBTrait +DBTrait --> Mysql +DBTrait --> MongoDB +DBTrait --> RocksDB +``` + +Proposed: +```mermaid +graph LR +MysqlUser --> DBMysql +MongoUser --> DBMongo +DBMysql --> Mysql +DBMysql -->|impl| DBTrait +DBMongo --> MongoDB +DBMongo -->|impl| DBTrait +DBRocks --> RocksDB +DBRocks -->|impl| DBTrait +``` +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +- defining crate and impl crate + +`--extern impl:name=name.rlib` + +- impl crate must directly depend on defining crate - re-exports don't count +- defining crate & impl crate are considered the same (eg `pub(crate)` is visible to both) +- impls in defining & impl must be coherent + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +- "Local" for coherence checking extended from "definitions in one crate" to "definitions from set of crates" +- set of crates are impls for a definition which are downstream from dominator + +Dominating crates: +- crate dominates itself (so must be internally coherent) +- impl crate dominates self and defining crate (so must be coherent) +- least dominator for a set of impl crates must guarantee coherence + +Defining crate is its own implementing crates, and must be internally coherent +(ie, status quo) +```mermaid +graph TD +D["`D +also I0`"] +``` + +Each impl crate dominates its defining crate(s) and must be coherent with them: +```mermaid +graph TD +I1 -->|impl| D +I2 -->|impl| D +In["I..n"] -->|impl| D +``` + +Least dominator of all impl crates for a given defining crate must check +coherence of impls: +```mermaid +graph TD +D["`D +Defines types/traits`"] +I1 -->|impl| D +I2 -->|impl| D +I3 -->|impl| D +In["`I..n +impls for types/traits in D`"] -->|impl| D +C0 --> I1 +C1 --> I2 +C2["`C2 +must check for I1, I2 coherence`"] +C2 --> C0 +C2 --> C1 +C3["C3 need not check"] +C3 --> C2 +C4["C4 must check I1, I2, I3 coherence"] +C4 --> C2 +C4 --> I3 +``` + +## Extra cases + +Impl crates depend on each other +```mermaid +graph TD + I1 -->|impl| D + I2 -->|impl| D + I2 --> I1 +``` +Impl crates use other crates (common) +```mermaid +graph TD + I1 -->|impl| D + I1 --> T +``` +Impl crates implement multiple definition crates +```mermaid +graph TD + I1 -->|impl| D0 + I1 -->|impl| D1 +``` +Impl crates define types and have their own impl crates +```mermaid +graph TD + I1 -->|impl| D0 + I1 -->|impl| I2 + I2 -->|impl| D0 +``` + + +This is the technical portion of the RFC. Explain the design in sufficient detail that: + +- Its interaction with other features is clear. +- It is reasonably clear how the feature would be implemented. +- Corner cases are dissected by example. + +The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. + +# Drawbacks +[drawbacks]: #drawbacks + +- Complexity +- Ecosystem split + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +- Features? + - only work for small-scale projects + +# Prior art +[prior-art]: #prior-art + +Discuss prior art, both the good and the bad, in relation to this proposal. +A few examples of what this can include are: + +- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? +- For community proposals: Is this done by some other community and what were their experiences with it? +- For other teams: What lessons can we learn from what other communities have done here? +- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. + +This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. +If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. + +Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. +Please also take into consideration that rust sometimes intentionally diverges from common language features. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +Intended for close coupled crates from same origin, but *could* be used for +generic third-party impls. + +Corresponding Cargo part + - Multi-crate packages? + - Namespaces + - Some other abstraction? + +# Future possibilities +[future-possibilities]: #future-possibilities + +- Transitiive impls +- Total coherence check + - All impl crates are checked against each other for publication, even if not + used together + - Or deliberately allow for conflicting impls in different impl crates, eg + different algorithms for different use-cases for the same thing \ No newline at end of file From fbd0b59eb3fea3d1e85ea4293a7e8ad52752ff6c Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 19:59:19 -0700 Subject: [PATCH 2/8] Lots more details --- text/0000-extern-impl.md | 339 ++++++++++++++++++++++++++------------- 1 file changed, 224 insertions(+), 115 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 9c671c5108f..fe19dfdd43e 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -1,5 +1,5 @@ -- Feature Name: (fill me in with a unique ident, `my_awesome_feature`) -- Start Date: (fill me in with today's date, YYYY-MM-DD) +- Feature Name: extern_impl +- Start Date: 2023-07-17 - RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) - Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) @@ -8,7 +8,8 @@ This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. +by the orphan rule. This allows the code author to use the build system to more +flexibly partition functionality among crates. # Motivation [motivation]: #motivation @@ -20,181 +21,277 @@ rule. That is: - only a crate defining a trait can implement that trait for any type In other words, a crate is not allowed to have "third-party implementations" - -it's not allowed to add new inherent implementations to a type, nor is it -allowed to implement a foreign trait for a foreign type. +it's not allowed to add new inherent implementations to a non-local type, nor is +it allowed to implement a non-local trait for a non-local type. This raises a number of problems, but this RFC's principle concern is that of -build performance and dependency graph scaling. +dependency graph scaling and build performance. Specifically, the current orphan +rule sets a hard limit how finely implementations can be distributed amongst +crates. For example: +- When introducing a new trait, it's useful to provide implementations for other + external types. But this means the trait-definition crate takes on + depdendencies on all those downstream dependencies whether they're ultimately + needed or not. - It common for a to only use a crate for its type definitions, but not need any of it's implementations (for example, when mentioning a type in a function signature or another type definition). However, with the current orphan rule, this would require waiting for the full crate to compile, including all of its dependencies. -- When introducing a new trait, it's useful to provide implementations for other - external types. But this means the trait-definition crate takes on - depdendencies on all those downstream dependencies whether they're needed or not. -~~What this RFC proposes is a mechanism which allows implementations to be -delegated to "trusted" or "friendly" crates (typically defined in the same -package), allowing downstream consumers to take on only the specific -dependencies they need. Furthermore a "witness" mechanism makes sure that the -coherence invariants currently enforced by the orphan rule are maintained.~~ +What this RFC proposes is a new kind of dependency relationship between two +crates, in which a crate can have an "impl" dependency on a crate defining a +type or trait, and coherence is a computed over more than one crate. + +## More concrete example + +Say you've just written a new crate which defines a univeral interface to all +types of databases, called `DBTrait`. You want to make this trait useful from +the outset by implementing it for a number of existing DB crates. -Dependency graph dominator based coherence checking. +Today this means your `DBTrait` crate would need to take dependencies on every +one of those DB crates in order to do `impl DBTrait for X`. It also means that +every downstream user of `DBTrait` would also transitively gain those +dependencies. -Status Quo: +Here, `MysqlUser` ends up depending on all of `Mysql`, `MongoDB` and `RocksDB` +even though it only cares about `Mysql`. ```mermaid -graph LR -MysqlUser --> DBTrait -MongoUser --> DBTrait -DBTrait --> Mysql -DBTrait --> MongoDB -DBTrait --> RocksDB +graph TB +MysqlUser(MysqlUser) --> DBTrait --> Mysql & MongoDB & RocksDB ``` -Proposed: +With this RFC, the implementation of `DBTrait` for each of the database crates +would be distributed among multiple implementation crates. Downstream users +would only need to take a dependency on the specific implemenetations they need. + +In this example, `MysqlUser` depends only on `DBTrait` and `DBMysql`, and is +unaffected by the `DBMongo` and `DBRocks` implementations. ```mermaid -graph LR -MysqlUser --> DBMysql -MongoUser --> DBMongo +graph TB +Bin(MysqlUser) --> DBMysql +DBMysql & DBMongo & DBRocks -->|impl| DBTrait DBMysql --> Mysql -DBMysql -->|impl| DBTrait DBMongo --> MongoDB -DBMongo -->|impl| DBTrait DBRocks --> RocksDB -DBRocks -->|impl| DBTrait ``` # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -- defining crate and impl crate +This RFC defines the notion of an *defining* crate and an *implementing* crate. -`--extern impl:name=name.rlib` +We extend the `rustc` `--extern` flag with an additional `impl` flag: +`--extern impl:dbtrait=dbtrait.rlib`. This means that *this* crate (the +*implementing* crate), that rustc is currently compiling, is allowed to +implement traits and methods for types defined in the `dbtrait` crate (the +*defining* crate). -- impl crate must directly depend on defining crate - re-exports don't count -- defining crate & impl crate are considered the same (eg `pub(crate)` is visible to both) -- impls in defining & impl must be coherent +For example: +```rust +//! Defining crate +trait DBTrait {} +``` +```rust +//! Implementation crate, also depends on the `mysql` crate. +use mysql::MySql; -# Reference-level explanation -[reference-level-explanation]: #reference-level-explanation +// Allowed because we're depending on `dbtrait` with `impl` +impl dbtrait::DBTrait for MySql {} +``` -- "Local" for coherence checking extended from "definitions in one crate" to "definitions from set of crates" -- set of crates are impls for a definition which are downstream from dominator +In other words, the implementing crate is considered the same as the defining +crate as far as the orphan rule goes. Note that you may only implement traits +and types for crates directly defined in the defining crate — you cannot +implement re-exported types. -Dominating crates: -- crate dominates itself (so must be internally coherent) -- impl crate dominates self and defining crate (so must be coherent) -- least dominator for a set of impl crates must guarantee coherence +The implementing crate is also considered the same as the defining crate from a +visibility perspective - it will be able to access `pub(crate)` modules, types, +fields, methods, etc in the defining crate (though in this case including +re-exports). -Defining crate is its own implementing crates, and must be internally coherent -(ie, status quo) -```mermaid -graph TD -D["`D -also I0`"] -``` +The defining and implementing crates are not the same in other respects. They +still have distinct crate names, defining their own crate-level namespaces. For +example, if they both defined a type `Foo`, it would be two definitions +`defining::Foo` and `implementing::Foo`. + +There are no other constraints on the defining and implementing crates - they +can freely define new traits, types and other items, and implementations for +those types and traits. An implementing crate can be a defining crate for +another crate. -Each impl crate dominates its defining crate(s) and must be coherent with them: +Regardless, the definitions within the defining crate, the implementing crate, +and between the implementing crate and defining crate must meet the current +coherence requirements. Also if there are multiple implementation crates in +scope in the dependency graph, they must also be coherent. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +[Coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is +currently defined in terms of "local" definitions - that is, items defined with +in the current crate. + +This RFC does not change the definition of Coherence, just the definition of +"local". Instead of only considering items defined in one specific crate, it +defines a set of crates whose definitions are considered "local". + +This set is defined in terms of immediate dominators in the crate dependency +graph. + +Some definitions & axioms: +- a dependency graph always has a top-level crate (typically a binary) which: + - has no dependencies + - only crates which it transitively depends on are relevant +- a "local set" of crates TODO +- an implementing crate must have a dependency relationship to a defining crate + in order to actually have implementations +- a crate always dominates itself +- a crate is the dominator of other crates if it's on the only path to + those other crates + - it's the immediate dominator if it's the closest on that path + +So given this example compiling binary crate `Binary`: +```mermaid +graph TB + Bin(Binary) --> A & B + A & B -->|impl| C + D -->|impl| C + ``` +- When compiling `C`, because it dominates itself, `rustc` must check that + it is internally coherent (this is identical to today) +- When compiling `A`, which dominates itself and `C`, `rustc` must check `A` is + internally coherent and coherence between `A` and `C`. Likewise `B` with + respect to `C`.` +- `Binary` dominates both `A` and `B`, so when compiling it `rustc` must check + for coherence between them. It doesn't need to worry about coherence between + `A` and `C`, nor `B` and `C`, since that's already been checked. +- `D` is not relevant to any of this, as it is not in `Binary`'s transitive + dependencies, and its coherence is not checked. + +In a more complex example: +```mermaid +graph TB + Bin(Binary) --> A & C + A --> D & E + C & D & E -->|impl| F + ``` + - `F` must be internally coherent, likewise `D` & `E` with respect to `F`. + - Compiling `A` must check for coherence between `D` and `E` with respect to `F` + - Compiling `Binary` must check for coherence between `C`, `D` and `E` (which + may be able to take advantage of the knowledge that `D` and `E` are already + coherent because `A` already checked it.) + +## Changes to coherence checking + +In the simple case, when checking the coherence between an implementing crate +and the defining crate, definitions in both the defining and implementing crate +are considered "local". Aside from that, the algorithm then proceeds as before. +Here, implementing crates `A`, `B` and `C` are each considered local to defining crate `D`, but not to each other. ```mermaid graph TD -I1 -->|impl| D -I2 -->|impl| D -In["I..n"] -->|impl| D + A & B & C --> D ``` -Least dominator of all impl crates for a given defining crate must check -coherence of impls: +When checking the coherence of multiple implementing crates, we must be careful +with the definition of "local". Since the implementing crates don't depend on +each other, and can't have implementations for each other's types, we can safely +consider them local to each other with respect to the defining crate `D` for the +purposes of coherence checking. + ```mermaid -graph TD -D["`D -Defines types/traits`"] -I1 -->|impl| D -I2 -->|impl| D -I3 -->|impl| D -In["`I..n -impls for types/traits in D`"] -->|impl| D -C0 --> I1 -C1 --> I2 -C2["`C2 -must check for I1, I2 coherence`"] -C2 --> C0 -C2 --> C1 -C3["C3 need not check"] -C3 --> C2 -C4["C4 must check I1, I2, I3 coherence"] -C4 --> C2 -C4 --> I3 +graph TB + Bin(Binary) --> A & B & C --> D + subgraph Local + A & B & C + end ``` ## Extra cases Impl crates depend on each other ```mermaid -graph TD - I1 -->|impl| D - I2 -->|impl| D +graph TB + I1 & I2 -->|impl| D I2 --> I1 ``` Impl crates use other crates (common) ```mermaid -graph TD +graph TB I1 -->|impl| D I1 --> T ``` Impl crates implement multiple definition crates ```mermaid -graph TD - I1 -->|impl| D0 - I1 -->|impl| D1 +graph TB + I1 -->|impl| D0 & D1 ``` Impl crates define types and have their own impl crates ```mermaid -graph TD +graph TB I1 -->|impl| D0 I1 -->|impl| I2 I2 -->|impl| D0 ``` +Duplicated checking -- `A` and `B` would both check the coherence of `I1` and +`I2` with respect to each other, since neither knows about the other. +```mermaid +graph TB +I1 & I2 -->|impl| D +Bin([Binary]) --> A & B --> I1 & I2 - -This is the technical portion of the RFC. Explain the design in sufficient detail that: - -- Its interaction with other features is clear. -- It is reasonably clear how the feature would be implemented. -- Corner cases are dissected by example. - -The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work. +``` # Drawbacks [drawbacks]: #drawbacks -- Complexity -- Ecosystem split +This adds a fair amount of complexity to a delecate part of the language +semantics and to the compiler's implementation of it. I've tried to keep the +changes as light-touch as possible, but I haven't tried to prototype it yet. + +If implemented as described here, and enabled in Cargo in an unconstrainted way, +it would enabled full third-party implementations. That is, anyone could publish +a crate implementing any trait for any type. This would be fine in the small, +but at large scales it would mean that if there were two crates with conflicting +implementations, then could never appear in the same dependency graph. It means +that non-local changes to dependencies could cause downstream build breakage. +This would lead to very undesireable ecosystem-wide dynamics. As such how this +feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -- Features? - - only work for small-scale projects +In principle you could use features to achieve similar outcomes to the +motivating `DBTrait` example above - you'd define a feature for each DB +implementation, and the consumer would only enable the specific implementations +they want. + +For relatively small Cargo-defined projects, this would probably work fine. You +would lose some of the benefits (like type-only crates for use in declarations), +but it would solve the fundamental "don't want to depend on every DB" problem. + +However, because features are additive, as the dependency graphs get larger it +will tend towards having every feature enabled, so that every subgraph of the +overall dependency graph will end up depending on what everyone else depends on. +As a result, trait-defining crates like `DBTrait` become dependency bottlenecks +which end up single-threading the build. + +(This is particularly acute in cases where Cargo is primarily used as a +dependency management tool, and builds are done with another tool like Buck.) # Prior art [prior-art]: #prior-art -Discuss prior art, both the good and the bad, in relation to this proposal. -A few examples of what this can include are: +...? -- For language, library, cargo, tools, and compiler proposals: Does this feature exist in other programming languages and what experience have their community had? -- For community proposals: Is this done by some other community and what were their experiences with it? -- For other teams: What lessons can we learn from what other communities have done here? -- Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background. +Other languages have a very weak notion of coherence. C++ doesn't enforce it at +all, and simply defines conflicting implementations as UB. Haskell allows for +ecosystem wide conflicts, which may or may not be a problem in practice. -This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your RFC with a fuller picture. -If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages. - -Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC. -Please also take into consideration that rust sometimes intentionally diverges from common language features. +Rust is unique(?) in separately languages in taking such a strong stand on this, +and it's been mostly beneficial. But it would be nice to have a mechanism to +loosen the constraints on this without admitting unsoundness. # Unresolved questions [unresolved-questions]: #unresolved-questions @@ -202,17 +299,29 @@ Please also take into consideration that rust sometimes intentionally diverges f Intended for close coupled crates from same origin, but *could* be used for generic third-party impls. -Corresponding Cargo part - - Multi-crate packages? - - Namespaces - - Some other abstraction? +The big question is what this looks like from a Cargo perspective. I can think +of a few options: +- Extend the namespace proposal so that crates in the same namespace can have + `impl` relations +- Multi-crate Cargo packages, which allow internal `impl` dependencies + # Future possibilities [future-possibilities]: #future-possibilities -- Transitiive impls -- Total coherence check - - All impl crates are checked against each other for publication, even if not - used together - - Or deliberately allow for conflicting impls in different impl crates, eg - different algorithms for different use-cases for the same thing \ No newline at end of file +Allow transitive impl relations. For example: +```mermaid +graph LR + A -->|impl| B -->|impl| C +``` +Allowing `A` to implement items in `C` because of the transitive `impl` +relationship through B. I think this potentially complicates the correctness +analysis a lot though, and doesn't really have a lot of benefits over just +making `A` depend on `C` directly. + +There's a question on whether to globally enforce coherence even between crates +which are not in the same dependency graph. This could be particularly useful at +crates.io publication time to make sure the package doesn't have lurking +problems. On the other hand there might be cases where it's desireable to allow +incoherent implementations to exist; for example, two implementations of the +same API with different algorithms tuned for different use-cases. From 4d75622d9d7982bd0104a48f79f77d7d7c3da144 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 29 Jul 2023 20:09:14 -0700 Subject: [PATCH 3/8] More tweaks --- text/0000-extern-impl.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index fe19dfdd43e..11de02921f0 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -212,24 +212,24 @@ graph TB Impl crates depend on each other ```mermaid -graph TB +graph LR I1 & I2 -->|impl| D I2 --> I1 ``` Impl crates use other crates (common) ```mermaid -graph TB +graph LR I1 -->|impl| D I1 --> T ``` Impl crates implement multiple definition crates ```mermaid -graph TB +graph LR I1 -->|impl| D0 & D1 ``` Impl crates define types and have their own impl crates ```mermaid -graph TB +graph LR I1 -->|impl| D0 I1 -->|impl| I2 I2 -->|impl| D0 @@ -262,7 +262,7 @@ feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -In principle you could use features to achieve similar outcomes to the +In principle you could use Cargo features to achieve similar outcomes to the motivating `DBTrait` example above - you'd define a feature for each DB implementation, and the consumer would only enable the specific implementations they want. @@ -325,3 +325,7 @@ crates.io publication time to make sure the package doesn't have lurking problems. On the other hand there might be cases where it's desireable to allow incoherent implementations to exist; for example, two implementations of the same API with different algorithms tuned for different use-cases. + +There are cases where the same coherence properties may be checked multiple +times. Perhaps these checks could be memoized with the incremental machinery to +avoid the redundancy. From 3ebc4e84f9aeaa20189517abb03225511cfbabea Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Mon, 7 Aug 2023 02:56:42 -0700 Subject: [PATCH 4/8] Add WIP alloy spec --- text/0000-extern-impl.md | 168 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 168 insertions(+) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 11de02921f0..ae74c4d6266 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -329,3 +329,171 @@ same API with different algorithms tuned for different use-cases. There are cases where the same coherence properties may be checked multiple times. Perhaps these checks could be memoized with the incremental machinery to avoid the redundancy. + +# Appendix - Alloy spec (WIP) + +--- +title: Alloy spec for Rust implementation coherency +--- + +Note: this is a literate [Alloy](https://alloytools.org/) spec. Download the +most recent version of Alloy from +https://github.com/AlloyTools/org.alloytools.alloy/releases, and see +https://alloy.readthedocs.io for documentation. + +# Alloy spec for Rust implementation coherency + +This is a simplified spec of Rust coherency checking. Rust requires that there, +are no overlapping or conflicting implementations with a complete Rust program, +as that would allow for ambiguity about which one to use. + +(This applies to all implementations, but here we're only going to consider the +subset of simple trait implementations for types, with no generic type +parameters.) + +The current solution to this is to requires that when a crate implements a trait +for a type, at least one of the type or the trait have to be "local" - ie, +defined in the same crate. + +This is simple to describe and implement, but in practice it has several downsides: +- (dependencies, no third-party impls) + +## Alloy Spec + +First we define signatures for types and traits. Both are defined in a crate: + +```alloy +sig Trait { + trait_def: one Crate +} + +sig Type { + type_def: one Crate +} +``` + +And crates themselves. Crates can depend on a set of other crates, but the +overall crate dependency graph must be acyclic. Each crate also has a relation +of which trait implementations for which types it contains. + +```alloy +sig Crate { + deps: set Crate, + impls: Trait -> Type, +} { + no this & this.^@deps -- acyclic +} +``` + +A Binary is the unique "top-level" crate which depends on all the other crates +transitively. Or to put it another way, no other crate depends on Binary. + +```alloy +one sig Binary extends Crate {} { + no @deps.this -- nothing depends on Binary + all c: Crate - Binary | c in this.^@deps -- Binary depends on everything else +} +``` + +Let's define the safety invariant we need to enforce, that every implementation +is unique. Or more precisely, for every trait/type pair, there's at most one +crate implementing it. (It's fine if nothing implements it.) + +```alloy +pred coherent_impls[crates: Crate] { + all tr: Trait, ty: Type | lone crates & impls.ty.tr +} +`````` + +This is the basic orphan rule, with a tight definition of "local": either the +type or the trait must be defined in the crate: + +```alloy +pred local_orphan_rule[crates: Crate] { + all crate: crates | + crate.impls in + (crate[trait_def] -> (crate + crate.deps)[type_def]) + + ((crate + crate.deps)[trait_def] -> crate[type_def]) +} +``` + +We can check that if `local_orphan_rule` is true for all crates, then we have +coherence for all crates. This has no counter-examples. + +```alloy +check local_coherent { + -- ie, checking local_orphan_rule on each crate implies that all crates are globally coherent + local_orphan_rule[Crate] => coherent_impls[Crate] +} +``` + +## impl dependencies + +Let's extend the orphan constraint so that the definition of "local" is extended +to immediate dependencies as well. In a way this is simpler than +`local_orphan_rule` because we no longer have to constrain either the type or +trait to be in `crate`. + +```alloy +pred dep_orphan_rule[crates: Crate] { + all crate: crates | + crate.impls in + (crate + crate.deps)[trait_def] -> (crate + crate.deps)[type_def] +} +``` + +However, this is not sufficient to maintain the invariant. This will quickly +find a counter-example with two crates with a duplicate implementation. + +```alloy +check dep_coherent_bad { + dep_orphan_rule[Crate] => coherent_impls[Crate] +} +``` + +We need to add additional constraints to maintain the invariant. First, let's +define a function which, for a given crate which defines types and/or traits, +all the crates with implementations for those definitions: +```alloy +fun impl_crates[c: Crate]: set Crate { + c[trait_def][impls.univ] + c[type_def][impls].univ +} +``` + +We can then apply the constraint in the most general way: for all dependencies +of Binary, all the crates implementing anything must be coherent. We'll ignore +that this is a tautology for now, as we'll tighten this up later. + +```alloy +check dep_coherent_impl_crates { + { + dep_orphan_rule[Crate] -- redundant + all dep: Binary.*deps | + coherent_impls[impl_crates[dep]] -- tautology + } => coherent_impls[Crate] +} for 10 -- make sure there's enough objects to be interesting +``` + +Unfortunately, this doesn't correspond to how the build is actually performed in +practice. When we're compiling a crate which has definitions, we don't know +which crates will have implementations. And when we're compiling the crates with +implementations, we don't know which other crate to cross-check with for +coherence. + +The key insight is that there must be *some* crate which has both the +implementing crates in its transitive dependencies (even if it's the top-level +Binary), which means it can check for coherence when compiling that crate. + +For example, here `Use` is responsible for checking the coherence of `ImplA` and +`ImplB` in its dependencies, but `ImplC` someone else's problem, and `Other` is +not relevant. +```mermaid +graph TD + Use --> ImplA & ImplB + ImplA & ImplB & ImplC --> Defn + Use --> Other +``` + +```alloy +-- TODO check constraint for common deps +``` From 8dd90d73e5290fefb89a8d9fafb8a329e47be554 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Sat, 2 Sep 2023 18:03:10 -0700 Subject: [PATCH 5/8] Revise and clarify --- text/0000-extern-impl.md | 471 +++++++++++++++++++++++++-------------- 1 file changed, 298 insertions(+), 173 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index ae74c4d6266..385580c8f1f 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -8,42 +8,51 @@ This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. This allows the code author to use the build system to more -flexibly partition functionality among crates. +by the orphan rule. With this, the crate structure (and thus build system +dependency graph) can be decoupled from the actual implementations. # Motivation [motivation]: #motivation -In order to guarantee coherence, Rust currently imposes the fairly strict orphan -rule. That is: -- only a crate defining a type can have inherent implementations for that type, - or for any traits, and -- only a crate defining a trait can implement that trait for any type - -In other words, a crate is not allowed to have "third-party implementations" - -it's not allowed to add new inherent implementations to a non-local type, nor is -it allowed to implement a non-local trait for a non-local type. - -This raises a number of problems, but this RFC's principle concern is that of -dependency graph scaling and build performance. Specifically, the current orphan -rule sets a hard limit how finely implementations can be distributed amongst -crates. - -For example: - -- When introducing a new trait, it's useful to provide implementations for other - external types. But this means the trait-definition crate takes on - depdendencies on all those downstream dependencies whether they're ultimately - needed or not. -- It common for a to only use a crate for its type definitions, but not need any - of it's implementations (for example, when mentioning a type in a function - signature or another type definition). However, with the current orphan rule, - this would require waiting for the full crate to compile, including all of its - dependencies. - -What this RFC proposes is a new kind of dependency relationship between two -crates, in which a crate can have an "impl" dependency on a crate defining a -type or trait, and coherence is a computed over more than one crate. +Rust requires implementation coherence - that is, there's at most one +implementation of a trait for a type. This is required to: +- avoid any ambiguity about which implementation to use when compiling, and +- make sure that adding a new implementation does not change the meaning of +existing code. + +To do this, Rust implements the orphan rule, which requires that at least one of +the trait or the type are defined in the same crate as the implementation (ie is +a "local" definition). The "other" definition must be in a crate which is a +dependency, and as the dependency graph is acyclic, that "other" crate can't +possibly have a conflicting implementation because it can't have both +definitions in scope. + +The effect of this rule imposes some very hard constraints on how a developer +can factor their code into crates. For example it prevents: +- introducing a new trait and implementing it for existing types, without adding + dependencies for those types to all trait users + - more generally, refactoring to break dependency chains +- refactoring code to increase build parallelism +- adding trait implementations for types in crates whose source is generated (eg protobuf) +- splitting libstd/libcore +- multiple conflicting implementations to make different implementation + tradeoffs (eg size vs performance), or bindings to incompatible native + libraries (python 2 vs python 3) + +This RFC proposes an extension to the orphan rule which retains the coherence +invariant, while loosening the "locality" requirement for implementation. This +considerably increases implementation flexibility, at the cost of no longer +being able to enforce the invariant purely locally. + +At the rustc and language level, this mechanism is very general, and allows +almost arbitrary implementation relations between crates. However in practice +unconstrained use of this feature (ie implementing traits for types from +different crates from unrelated origins) would have large impacts on the general +Rust / crates.io ecosystem if not carefully considered. + +As a result, this RFC also proposes a fairly small Cargo extension which allows +the feature to be used within a workspace for experimentation, but leaves the +implications of publishing to crates.io to a future RFC. ## More concrete example @@ -77,16 +86,45 @@ DBMysql --> Mysql DBMongo --> MongoDB DBRocks --> RocksDB ``` -# Guide-level explanation -[guide-level-explanation]: #guide-level-explanation +# Guide-level explanation (cargo) +[guide-level-explanation]: #guide-level-explanation + +Within a workspace, you can designate an "impl" relationship between two crates. +For example, in your `my_impls/Cargo.toml` you can put: -This RFC defines the notion of an *defining* crate and an *implementing* crate. +``` +[dependency] +my_definitions = { version = "0.1", path = "../my_definitions", impl = True } +``` -We extend the `rustc` `--extern` flag with an additional `impl` flag: -`--extern impl:dbtrait=dbtrait.rlib`. This means that *this* crate (the -*implementing* crate), that rustc is currently compiling, is allowed to -implement traits and methods for types defined in the `dbtrait` crate (the -*defining* crate). +This means that your `my_impls` crate can add trait implementations using types +or traits defined in `my_definitions`, almost as if they were defined in +`my_definitions`. If you want to use those implementations in other crates, +however, you must also add a dependency to both `my_impls` and `my_definitions` +(or you could re-export the definitions in `my_impls`). + +You may only implement for types/traits actually defined in `my_definitions` +itself. Re-exports don't count. + +The dependency *must* be a path dependency to another crate defined in the same +workspace. Cargo will not allow you specify `impl = True` for other crates. + +Publishing packages with `impl = True` to crates.io is not allowed. + +# Guide-level explanation (rustc) + +In current Rust, when you specify in your build system that a crate depends on +another, this is a "use" dependency - the dependee can use definitions in the +dependant. + +This RFC defines a new dependency relationship: the "impl" dependency. The +dependee can provide implementations for types and traits defined in the +dependant. + +We extend the `rustc` `--extern` flag with an additional `impl` flag: `--extern +impl:dbtrait=dbtrait.rlib`. This means that the crate currently being compiled +with this option, is allowed to implement traits and methods for types defined +in the `dbtrait` crate (the *defining* crate). For example: ```rust @@ -106,11 +144,6 @@ crate as far as the orphan rule goes. Note that you may only implement traits and types for crates directly defined in the defining crate — you cannot implement re-exported types. -The implementing crate is also considered the same as the defining crate from a -visibility perspective - it will be able to access `pub(crate)` modules, types, -fields, methods, etc in the defining crate (though in this case including -re-exports). - The defining and implementing crates are not the same in other respects. They still have distinct crate names, defining their own crate-level namespaces. For example, if they both defined a type `Foo`, it would be two definitions @@ -133,136 +166,162 @@ scope in the dependency graph, they must also be coherent. currently defined in terms of "local" definitions - that is, items defined with in the current crate. -This RFC does not change the definition of Coherence, just the definition of -"local". Instead of only considering items defined in one specific crate, it -defines a set of crates whose definitions are considered "local". - -This set is defined in terms of immediate dominators in the crate dependency -graph. - -Some definitions & axioms: -- a dependency graph always has a top-level crate (typically a binary) which: - - has no dependencies - - only crates which it transitively depends on are relevant -- a "local set" of crates TODO -- an implementing crate must have a dependency relationship to a defining crate - in order to actually have implementations -- a crate always dominates itself -- a crate is the dominator of other crates if it's on the only path to - those other crates - - it's the immediate dominator if it's the closest on that path - -So given this example compiling binary crate `Binary`: -```mermaid -graph TB - Bin(Binary) --> A & B - A & B -->|impl| C - D -->|impl| C - ``` -- When compiling `C`, because it dominates itself, `rustc` must check that - it is internally coherent (this is identical to today) -- When compiling `A`, which dominates itself and `C`, `rustc` must check `A` is - internally coherent and coherence between `A` and `C`. Likewise `B` with - respect to `C`.` -- `Binary` dominates both `A` and `B`, so when compiling it `rustc` must check - for coherence between them. It doesn't need to worry about coherence between - `A` and `C`, nor `B` and `C`, since that's already been checked. -- `D` is not relevant to any of this, as it is not in `Binary`'s transitive - dependencies, and its coherence is not checked. - -In a more complex example: -```mermaid -graph TB - Bin(Binary) --> A & C - A --> D & E - C & D & E -->|impl| F - ``` - - `F` must be internally coherent, likewise `D` & `E` with respect to `F`. - - Compiling `A` must check for coherence between `D` and `E` with respect to `F` - - Compiling `Binary` must check for coherence between `C`, `D` and `E` (which - may be able to take advantage of the knowledge that `D` and `E` are already - coherent because `A` already checked it.) +The details of the coherence checking logic are unchanged; the only change is +the constraints on which crates a definition and an implementation can be in. + +The discussion below talks about coherence in terms of the crate dependency +relations, but the actual coherence check is at the level of individual type and +trait implementations. The crate dependency relations set an upper bound on +this, as any crate which does not appear within the relevent dependency graph +need not be considered for coherence checking. ## Changes to coherence checking -In the simple case, when checking the coherence between an implementing crate -and the defining crate, definitions in both the defining and implementing crate -are considered "local". Aside from that, the algorithm then proceeds as before. -Here, implementing crates `A`, `B` and `C` are each considered local to defining crate `D`, but not to each other. +The key problem we need to solve is how to reconcile the Rust language-level +details (types and traits) with the build system view (dependency relationship +between crates). Coherence is defined in terms of type/trait definitions and +implementations, which forms its own graph. This graph is embedded in the crate +dependency graph — that is there are no edges in the type/trait +implementation graph which are not also present in the crate dependency graph. + ```mermaid -graph TD - A & B & C --> D +graph TB + CrateC -->|dep| CrateA & CrateB + subgraph CrateA + A["Trait A"] + D["Type D"] + E["Impl A for D"] -..->|impl| A & D + end + subgraph CrateB + B["Type B"] + end + subgraph CrateC + C["Impl A for B"] -..->|impl| A & B + end ``` -When checking the coherence of multiple implementing crates, we must be careful -with the definition of "local". Since the implementing crates don't depend on -each other, and can't have implementations for each other's types, we can safely -consider them local to each other with respect to the defining crate `D` for the -purposes of coherence checking. +For the discussion below, crate B is visible to crate A if it falls within its +transitive dependencies; a crate is visible to itself. ```mermaid graph TB - Bin(Binary) --> A & B & C --> D - subgraph Local - A & B & C + subgraph A sees only A + c1["A"] --> c2["..."] + c3["B"] + end + subgraph A sees A&B + b1["A"] --> b2["..."] --> b3["B"] + end + subgraph A sees A&B + a1["A"] --> a2["B"] end ``` -## Extra cases +The general rule is when compiling a crate, `rustc` must check the coherence of +a set of implementations for a type if: +- all those implementations are visible +- no other visible crate has checked them -Impl crates depend on each other +This means the compilation of a crate must check for coherence when: +- all the definitions and implementations are within one crate +- if a crate A has an impl dependency on crate B, it must check coherency + between A and B +- if a crate has two or more crates with implementations for a given definition + within its view via distinct direct dependencies + +This last point is the most complex. For example: ```mermaid -graph LR - I1 & I2 -->|impl| D - I2 --> I1 +graph TB + T -->|view| A & B -->|impl| D + A & B --> E ``` -Impl crates use other crates (common) +Here, `T` must check `A` and `B`'s implementations are coherent with respect to +each other and definitions in `D` and `E`. It can rely that `A` and `B` are each +individually coherent with respect to `D` and `E` because that would have been +checked when `A` and `B` were compiled. + +Note that while `A` and `B` both depend +on `D` and `E`, they need only have a `impl` dependency on one of them; the `impl` +dependency simply means that `D`'s definitions are considered local to `A` and +`B` when seen in terms of the current orphan rule. + +This is OK even if `A` and `B` have different `impl` dependencies: ```mermaid -graph LR - I1 -->|impl| D - I1 --> T +graph TB + T -->|view| A & B + A -->|impl| D + A --> E + B --> D + B -->|impl| E ``` -Impl crates implement multiple definition crates +Since the groups (`A`, `D` `E`) and (`B`, `D` `E`) must be internally coherent, +and `T` still needs to check the coherence of (`A` and `B`). + +In this case: ```mermaid -graph LR - I1 -->|impl| D0 & D1 +graph TB + T --> A -->|view| B & C -->|impl| D ``` -Impl crates define types and have their own impl crates +`A` must check the coherence of `B` & `C`'s implementations with respect to `D`. +`T`, however, can rely on `A` having already checked `B` and `C`'s coherence +because both are visible through it's `A` dependency. + +A more complex example: ```mermaid -graph LR - I1 -->|impl| D0 - I1 -->|impl| I2 - I2 -->|impl| D0 +graph TB + T --> A -->|view| B & C -->|impl| D + B & C --> E + T --> C ``` -Duplicated checking -- `A` and `B` would both check the coherence of `I1` and -`I2` with respect to each other, since neither knows about the other. +`T` can rely on `A` to check the coherence of `B` and `C`, even though `T` has +its own direct dependency on `C`, because `C` is also visible through `A`. + +But in this case: ```mermaid graph TB -I1 & I2 -->|impl| D -Bin([Binary]) --> A & B --> I1 & I2 + T --> A -->|view| B & C -->|impl| E + B & C --> F + T -->|view| D -->|impl| E +``` +`T` can rely on `A` for the coherency of `B` and `C`, but it must check the +coherence of `B`, `C` and `D` with respect to `E`. +Note that this can mean that the same crates' coherency can be checked +redundantly. For example: +```mermaid +graph TB + Binary --> X & Y & Z -->|view| A & B -->|impl| C ``` +`X`, `Y` and `Z` are all the first to see the implementations of `A` and `B` +with respect to `C`, so must each do their own coherency checks, even though +`Binary` only needs for this to be performed once. Fortunately these redundant +checks can be done in parallel, but it's still a waste of CPU. See below for +some discussion about optimizing this. # Drawbacks [drawbacks]: #drawbacks This adds a fair amount of complexity to a delecate part of the language -semantics and to the compiler's implementation of it. I've tried to keep the -changes as light-touch as possible, but I haven't tried to prototype it yet. +semantics and to the compiler's implementation of it. This is opt-in, but one +would need to understand these semantics when using other people's packages +which use this feature. If implemented as described here, and enabled in Cargo in an unconstrainted way, -it would enabled full third-party implementations. That is, anyone could publish -a crate implementing any trait for any type. This would be fine in the small, -but at large scales it would mean that if there were two crates with conflicting -implementations, then could never appear in the same dependency graph. It means -that non-local changes to dependencies could cause downstream build breakage. -This would lead to very undesireable ecosystem-wide dynamics. As such how this -feature is used must be very carefully considered. +it would enable full third-party implementations. That is, anyone could publish +a package implementing any type or trait in any other package. This would be +fine at small scalle, but at large scales it would mean that if there were two +crates with conflicting implementations, then could never appear in the same +dependency graph. That is a change to private dependencies, which wouldn't +usually be considered compatibility breaking, would cause downstream build +breakage. This would lead to very undesireable ecosystem-wide dynamics. As such +how this feature is used must be very carefully considered. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -In principle you could use Cargo features to achieve similar outcomes to the +## Features + +In principle you could use Cargo "features" to achieve similar outcomes to the motivating `DBTrait` example above - you'd define a feature for each DB implementation, and the consumer would only enable the specific implementations they want. @@ -280,18 +339,62 @@ which end up single-threading the build. (This is particularly acute in cases where Cargo is primarily used as a dependency management tool, and builds are done with another tool like Buck.) +Sometimes features are used in a non-additive way, which is technically not +supported, but there's no mechanism to prevent it (or even warn about it). In +contrast, the mechanism described here does allow for separate implementation +crates with conflicting implementations, perhaps to allow for different +implementation tradeoffs, or bindings to different non-Rust implementations. + +Extensive use of features make code navigation difficult. For example, IDE +integration (VS Code + Rust Analyzer) often won't know which features are +enabled, and so will tend to dim feature-controlled code which should be +enabled, or vice versa. + +It's also hard to know which features should be enabled when generating +documentation with `rustdoc`. Not enabling features which are actually used +means that crate functionality goes undocumented, where as enabling too many +unused features can obscure the API the user actually wants to see. + # Prior art [prior-art]: #prior-art -...? +## Generally weak coherence Other languages have a very weak notion of coherence. C++ doesn't enforce it at all, and simply defines conflicting implementations as UB. Haskell allows for ecosystem wide conflicts, which may or may not be a problem in practice. -Rust is unique(?) in separately languages in taking such a strong stand on this, -and it's been mostly beneficial. But it would be nice to have a mechanism to -loosen the constraints on this without admitting unsoundness. +Rust is unique(?) in separately-compiled languages in taking such a strong stand +on this, and it's been mostly beneficial. But it would be nice to have a +mechanism to loosen the constraints on this without admitting unsoundness. + +## Interface/implementation split + +In C and C++, there's a clear separation between "header files" containing a +specification of an interface, and "source files" which contain actual +implementations. Typically a compilation unit which has a dependency on another +compilation unit will include the header files containing the interface +definitions, and can be compiled with that information alone; it does not requre +the implementation of the dependant to be compiled. + +Ideally (and often in practice) this means that the entire program can be +compiled in parallel, with only the final linking step dependent on all the +compilation units. + +The primary disadvantage of this scheme is that the header file definitions must +be manually kept in sync with the implementations, and failing to do so can +cause numerous poor outcomes from linker errors to undefined behaviour. + +Today, Rust approximates this interface/implementation split with pipelined +builds, where `rustc` will generate a metadata object containing a crates +definitions, followed by actual code generation. The dependent compilation can +start as soon as the metadata is available without having to wait for the +generated code. But this still requires metadata for the entire dependency chain +- and for many crates the metadata generation *is* the expensive part of +compilation. + +With this RFC the split can be made more explicit by having actual separate +crates for definitions and implementations. # Unresolved questions [unresolved-questions]: #unresolved-questions @@ -299,37 +402,64 @@ loosen the constraints on this without admitting unsoundness. Intended for close coupled crates from same origin, but *could* be used for generic third-party impls. -The big question is what this looks like from a Cargo perspective. I can think -of a few options: -- Extend the namespace proposal so that crates in the same namespace can have - `impl` relations -- Multi-crate Cargo packages, which allow internal `impl` dependencies +The big question is what this looks like from a crates.io perspective. If we +could guarantee that `cargo publish` published an entire workspace atomically, +then # Future possibilities [future-possibilities]: #future-possibilities -Allow transitive impl relations. For example: -```mermaid -graph LR - A -->|impl| B -->|impl| C -``` -Allowing `A` to implement items in `C` because of the transitive `impl` -relationship through B. I think this potentially complicates the correctness -analysis a lot though, and doesn't really have a lot of benefits over just -making `A` depend on `C` directly. - -There's a question on whether to globally enforce coherence even between crates -which are not in the same dependency graph. This could be particularly useful at -crates.io publication time to make sure the package doesn't have lurking -problems. On the other hand there might be cases where it's desireable to allow -incoherent implementations to exist; for example, two implementations of the -same API with different algorithms tuned for different use-cases. +## Support intrinsic implementations + +This RFC focuses on implementing traits for types, since this is the common pain +point. In principle everything described above will also work for intrinsic +implementations, but there's some awkward secondary questions. For example, +intrinsic implementations tend to access the private fields of structs; does +that mean we need add or redefine visibility rules? + +## "Global" third-party implementations + +If this mechanism were broadly enabled for the crates.io ecosystem, it could +cause undesireably ecosystem splits. For example if you have type-defining +package `some_type` and trait defining package `some_trait`, you could have two +separate crates `A` & `B` implementing `some_trait` for `some_type`. If your +package `my_package` contains just one of `A` or `B` in its dependencies, then +all is OK. But at any point, with only a minor patch update, any package could +add the other to its dependencies, preventing your package from compiling. This +kind of brittleness is highly undesireable. + +So is there some way to enable third-party these kinds of third-party +implementation crates in a more general way without introducing this kind of +ecosystem brittleness? + +The [namespacing RFC](https://github.com/Manishearth/namespacing-rfc) +([PR)(https://github.com/rust-lang/rfcs/pull/3243)]) is also looking at +questions relating the the overall crates.io ecosystem. One could imagine a +proposal, for example, that packages within the same namespace could have this +`impl` relationship. But whether that makes sense really depends on the intended +semantics of namespaces. + +## Cached coherence checks There are cases where the same coherence properties may be checked multiple -times. Perhaps these checks could be memoized with the incremental machinery to +times, if the same implementing crates are dependencies of multiple unrelated +crates. Perhaps these checks could be memoized with the incremental machinery to avoid the redundancy. + +## Auto-split crates + +Modules +within crates may have cyclic dependencies, but the crate dependency graph must +be acyclic. But only in extremely pathological cases will a crate's internal +dependency graph consist of a single strongly connected component. + +This opens the possibility of using the mechanism described in this RFC to +automatically partition a crate into sub-crates in order to expose more +built-time parallelism. This could either be performed inline during the build +or some tooling to perform pre-build processing on the crates. + # Appendix - Alloy spec (WIP) --- @@ -343,7 +473,9 @@ https://alloy.readthedocs.io for documentation. # Alloy spec for Rust implementation coherency -This is a simplified spec of Rust coherency checking. Rust requires that there, +(TODO: make this fully consistent with the description above.) + +This is a simplified spec of Rust coherency checking. Rust requires that there are no overlapping or conflicting implementations with a complete Rust program, as that would allow for ambiguity about which one to use. @@ -351,13 +483,6 @@ as that would allow for ambiguity about which one to use. subset of simple trait implementations for types, with no generic type parameters.) -The current solution to this is to requires that when a crate implements a trait -for a type, at least one of the type or the trait have to be "local" - ie, -defined in the same crate. - -This is simple to describe and implement, but in practice it has several downsides: -- (dependencies, no third-party impls) - ## Alloy Spec First we define signatures for types and traits. Both are defined in a crate: From e0b22b24ee172c1fb648fa679e4b8c124cb53d47 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Wed, 6 Sep 2023 08:32:09 -0700 Subject: [PATCH 6/8] WIP address comments --- text/0000-extern-impl.md | 69 +++++++++++++++++++++++++--------------- 1 file changed, 44 insertions(+), 25 deletions(-) diff --git a/text/0000-extern-impl.md b/text/0000-extern-impl.md index 385580c8f1f..305bd06ae02 100644 --- a/text/0000-extern-impl.md +++ b/text/0000-extern-impl.md @@ -6,10 +6,11 @@ # Summary [summary]: #summary -This RFC proposes a mechanism to allow `impl`s for a type or trait to be in a -separate crate, while still meeting the soundness guarantees currently enforced -by the orphan rule. With this, the crate structure (and thus build system -dependency graph) can be decoupled from the actual implementations. +This RFC proposes an opt-in mechanism to relax some of the restrictions of the +orphan rule, while maintaining the soundness of implementation coherence. It +also proposes a minimal Cargo extension to allow this feature to be exercised. +With this, the crate structure (and thus build system dependency graph) can be +decoupled from the actual implementations. # Motivation [motivation]: #motivation @@ -109,7 +110,8 @@ itself. Re-exports don't count. The dependency *must* be a path dependency to another crate defined in the same workspace. Cargo will not allow you specify `impl = True` for other crates. -Publishing packages with `impl = True` to crates.io is not allowed. +Publishing packages with any `impl = True` dependencies to crates.io is not +allowed. # Guide-level explanation (rustc) @@ -139,20 +141,22 @@ use mysql::MySql; impl dbtrait::DBTrait for MySql {} ``` -In other words, the implementing crate is considered the same as the defining -crate as far as the orphan rule goes. Note that you may only implement traits -and types for crates directly defined in the defining crate — you cannot -implement re-exported types. +In other words, when the overlap check is considering what definitions are +"local", both the implementing crate and the defining crate both count. Note +that you may only implement traits and types for crates directly defined in the +defining crate — you cannot implement re-exported types. The defining and implementing crates are not the same in other respects. They still have distinct crate names, defining their own crate-level namespaces. For example, if they both defined a type `Foo`, it would be two definitions `defining::Foo` and `implementing::Foo`. -There are no other constraints on the defining and implementing crates - they -can freely define new traits, types and other items, and implementations for -those types and traits. An implementing crate can be a defining crate for -another crate. +The terms "implementing" and "defining" describe a property of the dependency +relationship between two crates, not an intrinsic property of the crates +themselves. There are no other constraints on the defining and implementing +crates - they can freely define new traits, types and other items, and +implementations for those types and traits. An implementing crate can be a +defining crate for another crate. Regardless, the definitions within the defining crate, the implementing crate, and between the implementing crate and defining crate must meet the current @@ -162,18 +166,27 @@ scope in the dependency graph, they must also be coherent. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -[Coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is -currently defined in terms of "local" definitions - that is, items defined with -in the current crate. +> The idea of trait +> [coherence](https://rust-lang.github.io/chalk/book/clauses/coherence.html) is +> that, given a trait and some set of types for its type parameters, there +> should be exactly one impl that applies. -The details of the coherence checking logic are unchanged; the only change is -the constraints on which crates a definition and an implementation can be in. +A key part of this are *overlap checks*, which check that not only are there +currently no conflicting implementations, but adding new implementations won't +invalidate or change the meaning of existing code. -The discussion below talks about coherence in terms of the crate dependency -relations, but the actual coherence check is at the level of individual type and -trait implementations. The crate dependency relations set an upper bound on -this, as any crate which does not appear within the relevent dependency graph -need not be considered for coherence checking. +These overlap checks can only be performed if all the relevant implementation +sites are known so they can be checked. This is the role of the *orphan rule*, +which has the consequence that there's only one possible crate in which contain +a given implementation. This is the restriction this RFC proposes to loosen. + +We introduce what could be called *the adoption rule*. This is an extension of +the orphan rule, in which implementations can be in more than one crate, but +some other crate adopts those implementation crates in order to apply the +overlap check across them in order to guarantee coherence. + +The actual details of the overlap check logic are unchanged; the only change is +a wider definition of what's considered a "local definition". ## Changes to coherence checking @@ -184,6 +197,12 @@ implementations, which forms its own graph. This graph is embedded in the crate dependency graph — that is there are no edges in the type/trait implementation graph which are not also present in the crate dependency graph. +Note that a blanket implementations - `impl Foo for T` - are a universal +quantifier over all types `T`, whether they're in scope or not. However if we're +interested in applying it to specific concrete types, we need only consider +those types which are in scope, which are therefore constrained by the +dependency graph. + ```mermaid graph TB CrateC -->|dep| CrateA & CrateB @@ -224,8 +243,8 @@ a set of implementations for a type if: This means the compilation of a crate must check for coherence when: - all the definitions and implementations are within one crate -- if a crate A has an impl dependency on crate B, it must check coherency - between A and B +- if a crate A has an impl dependency (ie, a direct dependency with the impl + option) on crate B, it must check coherency between A and B - if a crate has two or more crates with implementations for a given definition within its view via distinct direct dependencies From 8793aa0291b721a185e2b0b4a88af0aba813cee1 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Thu, 2 Oct 2025 00:50:49 -0700 Subject: [PATCH 7/8] Add formal specification plan and development documentation - Add extern-impl-formal-spec-plan.md with comprehensive plan for improving the formal specification of the extern impl RFC - Include tool research comparing Alloy/Forge, TLA+, Lean, Dafny and others - Document 6 phases of work including formalization, generics analysis, performance considerations, and Cargo RFC requirements - Add CLAUDE.md to provide context for future Claude Code sessions - Identify key challenge: Cargo's package-based deps vs rustc's crate-level impl deps, requiring multiple libraries per package support --- CLAUDE.md | 81 +++++++++ extern-impl-formal-spec-plan.md | 290 ++++++++++++++++++++++++++++++++ 2 files changed, 371 insertions(+) create mode 100644 CLAUDE.md create mode 100644 extern-impl-formal-spec-plan.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000000..88f5bf67283 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,81 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Repository Overview + +This is the Rust RFC (Request for Comments) repository. RFCs are design proposals for substantial changes to Rust, Cargo, Crates.io, or the RFC process itself. + +## Current Work Context + +The current focus is on **RFC 0000-extern-impl** (`text/0000-extern-impl.md`), which proposes relaxing the orphan rule while maintaining coherence through "impl dependencies." + +### Key RFC Concept + +The extern impl RFC addresses a fundamental constraint in Rust: the orphan rule requires that trait implementations be in the same crate as either the trait or type. This RFC proposes allowing crates to have "impl dependencies" where they can implement traits/types from dependencies as if they were local, while maintaining coherence through distributed checking. + +### Active Work + +**Formal Specification Plan**: See `extern-impl-formal-spec-plan.md` for the comprehensive plan to improve the RFC's formal specification. This includes: + +1. **Tool selection**: Research concluded that **Forge** (modern Alloy variant) or **Alloy 6** are the best tools for modeling the structural graph properties +2. **Outstanding work phases**: + - Define precise terminology (especially graph-theoretic terms like "immediate dominator") + - Enumerate all coherence invariants (coherence, acyclicity, visibility, checker responsibility, etc.) + - Analyze how generics and associated types interact with the system + - Add performance/complexity analysis + - Review restrictions (e.g., the prohibition on re-exported definitions may be too conservative) + - Build a complete formal specification in Alloy/Forge + +3. **Key invariant**: Each binary is an independent coherence root. Multiple binaries need not be coherent with each other. + +### Current Alloy Spec Issues + +The existing Alloy specification in the RFC (Appendix) is incomplete and has several problems: +- Doesn't model impl-deps as distinct from regular deps +- Doesn't formalize "checker responsibility" +- Contains a tautology in the `dep_coherent_impl_crates` check +- Has a TODO for the most critical constraint + +## RFC File Structure + +- `text/` - Contains all accepted and proposed RFCs +- `text/0000-extern-impl.md` - The extern impl RFC being worked on +- `0000-template.md` - Template for new RFCs +- `README.md` - RFC process documentation + +## Working with RFCs + +RFCs follow a specific structure defined in `0000-template.md`: +- Summary +- Motivation +- Guide-level explanation +- Reference-level explanation +- Drawbacks +- Rationale and alternatives +- Prior art +- Unresolved questions +- Future possibilities + +## Git Workflow + +Current branch: `extern-impl` + +This is a working branch for the extern impl RFC. The RFC is still in draft/revision phase. + +## Next Steps + +Based on the formal specification plan, the immediate next steps are: + +1. Define precise terminology and graph-theoretic concepts +2. Enumerate and formalize all coherence invariants +3. Decide between Forge (modern, better UX) vs Alloy 6 (more established) for the formal spec +4. Build out the formal specification incrementally + +## Key Concepts to Understand + +- **Orphan rule**: Current requirement that trait impls must be in the same crate as the trait or type +- **Coherence**: The property that there's at most one impl of a trait for a given type +- **Impl dependency**: The proposed new dependency type where a crate can implement traits/types from dependencies +- **Checker responsibility**: Which crate is responsible for verifying coherence between multiple implementing crates +- **Binary root**: Each binary crate is the root of its own coherence domain diff --git a/extern-impl-formal-spec-plan.md b/extern-impl-formal-spec-plan.md new file mode 100644 index 00000000000..3dc3f602d9d --- /dev/null +++ b/extern-impl-formal-spec-plan.md @@ -0,0 +1,290 @@ +# Extern Impl Formal Specification Plan + +This document outlines the plan for improving the formal specification of the extern impl RFC (0000-extern-impl.md). + +## Todo List + +- [x] Research current state of formal verification tools and tooling +- [ ] Define terminology and graph-theoretic concepts +- [ ] Enumerate and formalize all coherence invariants +- [ ] Analyze generic type parameters and associated types +- [ ] Add performance and complexity analysis section +- [ ] Review and revise other spec restrictions +- [ ] Design and implement new formal specification +- [ ] Write separate Cargo RFC for impl dependency representation + +## Proposed Plan + +### Phase 1: Foundations (can be done in parallel) + +**1a. Terminology audit** +- Review current terms: "defining crate", "implementing crate", "impl deps", "adoption", "view" +- Consider alternatives: "providing crate" vs "defining", "extending crate" vs "implementing" +- Graph theory: "least common dominator" vs "lowest common ancestor" vs "convergence point" +- Consider borrowing from module systems literature (ML, Scala implicits) + +**1b. Tool research** ✓ COMPLETED +- Survey formal methods tools with specific criteria: + - Good at modeling graph structures with constraints + - Can express reachability/dominance properties + - Has counter-example generation + - Reasonably mature tooling +- Create comparison matrix: Alloy vs TLA+ vs Lean vs Dafny vs others + +### Phase 2: Formalization (sequential, builds on Phase 1) + +**2a. Complete invariant catalog** +Core invariants to formalize: +1. **Coherence**: `∀(trait, type). |implementations| ≤ 1` +2. **Acyclic deps**: `∀c. c ∉ transitive_deps(c)` +3. **Visibility**: `impl(A, trait, type) ⟹ trait ∈ visible(A) ∧ type ∈ visible(A)` +4. **Orphan extension**: Define precisely when an impl is "local enough" +5. **Check responsibility**: Every impl-pair has exactly one checker +6. **Check completeness**: Checker can see all relevant impls +7. **Binary root**: Each binary is the coherence root; multiple independent binaries need not be coherent with each other + +**2b. Graph-theoretic formalization** +- Define "checker responsibility" precisely +- Is it the immediate dominator? The least common ancestor in the dep tree? +- Formalize: "crate C must check impls I₁, I₂ iff C is the closest crate where both are visible through distinct direct deps" + +### Phase 3: Generics and Associated Types (separate deep dive) + +This deserves its own analysis because: +- Blanket impls create universal quantification +- Associated types add type-level functions +- Specialization (if/when stabilized) adds partial ordering + +**Key questions:** +1. Does "defining crate" for `impl Trait for Vec` mean the crate defining `Trait`, `Vec`, or the impl itself? +2. Do overlap rules change? (I suspect not, but needs proof) +3. How do associated type constraints interact with visibility? + +**Approach**: Start with simplified model (no generics), then add them incrementally + +### Phase 4: Implementation Concerns (can be parallel with Phase 3) + +**4a. Performance analysis** +- Best case: O(n) linear scan when no impl deps +- Worst case: Could be O(n²) if every crate must check every pair? +- Space: Need to track impl deps in metadata +- Incremental compilation: Which caches invalidate? + +**4b. Restrictions review** +Your specific question about re-exports: +- Current: "You may only implement for types/traits actually defined in my_definitions itself. Re-exports don't count." +- Why this restriction? Probably to avoid ambiguity about which crate is "defining" +- Could lift it if we track def-site precisely in metadata +- Risk: re-export chains could make reasoning harder + +### Phase 5: New Formal Spec (depends on 1b, 2a, 2b) + +**Recommended approach:** +1. **Start with Forge or Alloy 6** for structural properties and counter-example finding +2. **Use Lean 4 or Dafny** for proving the invariants hold (future work) + +**Structure:** +- Module 1: Basic definitions (crates, deps, traits, types, impls) +- Module 2: Visibility and reachability +- Module 3: Original orphan rule (baseline) +- Module 4: Extended orphan rule with impl deps +- Module 5: Checking responsibility assignment +- Module 6: Proof that assigned checks ensure coherence + +### Phase 6: Cargo RFC (separate RFC, depends on Phase 2 and Phase 5) + +This will require a separate RFC submission to define how Cargo represents impl dependencies. **This is likely quite complex and deserves significant design work.** + +**Fundamental architectural challenge: Package vs Crate dependencies** + +Currently, Cargo's dependency model is based on **packages**, while impl deps are fundamentally **crate-level** relationships. Key issues: + +- **One library per package**: Cargo currently allows only one library crate per package (plus build scripts, tests, binaries) +- **Multiple libraries per package**: We likely want to extend Cargo to allow multiple library crates per package with "friend" relationships (impl deps between them) +- **Package-level deps → crate-level deps**: Need to reconcile Cargo's package dependency resolution with rustc's crate-level impl dependency requirements + +**Key challenges:** + +1. **Multiple library crates per package** + - How to declare multiple library crates in `Cargo.toml`? + - Naming scheme for library crates within a package? + - Which library is the "main" one (for backwards compatibility)? + - How do external packages depend on specific library crates within a package? + +2. **Friend/impl relationships within a package** + - Syntax for declaring impl deps between sibling library crates in the same package + - Do these need special visibility rules? + - How does this interact with the module system? + +3. **Cross-package impl dependencies** + - How to specify which library crate within a package is the impl dependency? + - Current proposal: `impl = true` on package dependency, but which crate in the target package? + - Need syntax for package-level dep → specific crate-level impl dep + +4. **Package resolution implications** + - Do impl deps affect resolution differently than regular deps? + - Version compatibility: how do semver rules apply to impl deps? + - Diamond dependencies: what if two packages have impl deps on different versions? + +5. **Publishing and ecosystem concerns** + - Current RFC prohibits publishing crates with impl deps to crates.io + - Workspace restrictions: impl deps only allowed within workspaces (path dependencies) + - How to prevent accidental publishing violations? + - Future: if we allow third-party impls, what are the ecosystem implications? + +6. **Metadata and build system integration** + - What information needs to be in package metadata? + - How do other build tools (Buck/Bazel) represent this? + - Passing impl dep information to rustc via `--extern impl:` + +7. **Interaction with existing features** + - Optional dependencies + - Cargo features and feature flags + - Dev/build dependencies + - Target-specific dependencies + +8. **Error messages and diagnostics** + - When impl dependency constraints are violated + - When someone tries to publish with impl deps + - When cross-workspace impl deps are attempted + +**Research needed:** +- Review other Cargo RFCs that impact dependency resolution +- Study how other build systems handle sub-package dependencies +- Analyze ecosystem impact of multiple libraries per package +- Consider backwards compatibility path + +## Grouping Strategy + +**Can group together:** +- 1a + 1b (terminology and tools inform each other) +- 2a + 2b (invariants and graph theory are intertwined) +- 4a + 4b (both implementation concerns) + +**Must separate:** +- Phase 2 before Phase 5 (need clear invariants before formalizing) +- Phase 3 separate (generics are complex enough to deserve isolated treatment) +- Tool selection (1b) before new spec (Phase 5) + +**Critical path:** 1b → 2a/2b → Phase 5 + +## Specific Recommendations + +**Graph theory term:** The correct term is likely **"immediate dominator"** or **"lowest common dominator"** - the first node in the dependency tree that dominates (can reach) all the implementing crates. In tree terms, it's the "lowest common ancestor" but in a DAG it's the dominator. + +**Re-exports restriction:** Propose relaxing this to "must be defined-or-reexported in the impl-dependency crate" with clear def-site tracking. The restriction seems overly conservative. + +## Formal Verification Tools Research Summary (2024) + +### Current State of Tooling + +**Good news**: Most formal verification tools now have decent VSCode integration, addressing concerns about Alloy's integrated-only environment. + +### Tool Comparison Matrix + +| Tool | IDE Support | Strengths for This Use Case | Weaknesses | 2024 Status | +|------|-------------|------------------------------|------------|-------------| +| **Alloy 6** | VSCode extensions available (multiple options) | ✓ Perfect for relational/graph properties
✓ Excellent counter-example finding
✓ Concise transitive closure
✓ Uses Kodkod SAT solver | - Not for formal proofs
- Bounded model checking only | Active (6.2.0 released Jan 2025) | +| **Forge** | Web-based + local tooling, Sterling visualizer | ✓ Based on Alloy with improvements
✓ Better visualization
✓ More user-friendly | - Teaching-focused
- Smaller community | New (OOPSLA 2024) | +| **TLA+** | Official VSCode extension, very active | ✓ Great for protocols/invariants
✓ Strong community | - Less natural for pure graph properties
- Steeper learning curve | Very active, "Nightly" builds | +| **Dafny** | Excellent VSCode integration, auto-verification | ✓ Can prove implementations correct
✓ Great IDE experience
✓ Verification-aware language | - Requires writing actual code
- Overkill for structural properties | Very active, LSP-based | +| **Lean 4** | VSCode + LSP, best-in-class | ✓ Most powerful prover
✓ Excellent tooling
✓ Strong metaprogramming | - Steep learning curve
- Overkill for this need | Extremely active (DeepMind using it) | +| **Coq** | Two VSCode options (VsCoq official, coq-lsp alternative) | ✓ Mature ecosystem
✓ Good for proofs | - Complex for graph properties
- Not ideal for counter-examples | Active (renamed to Rocq) | +| **Isabelle** | VSCode extension, but jEdit primary | ✓ Very mature
✓ Strong automation | - Tooling not as polished as Lean
- Learning curve | Active but VSCode is secondary | + +### Recommendation for Specific Needs + +Based on requirements (structural graph properties, counter-examples, fixed dependency graphs): + +**Primary tool: Forge or Alloy 6** +- **Forge** if you want better visualization and modern UX (published 2024) +- **Alloy 6** if you want the established tool with more resources +- Both now have VSCode support +- Both use Kodkod's SAT solver underneath (same engine) +- Both excel at relational constraints on graphs + +**For formal proofs later: Lean 4 or Dafny** +- **Lean 4** has the best tooling ecosystem in 2024, but significant learning investment +- **Dafny** is more pragmatic, easier to learn, good IDE integration + +### Specific Findings + +**Alloy improvements:** +- Alloy 6.2.0 (Jan 2025) added mutable state, temporal logic, improved visualizer +- Multiple VSCode extensions now available +- Still uses the integrated Alloy Analyzer, but VSCode extensions bundle it + +**Forge (new discovery):** +- Brown University's modern take on Alloy (2024) +- Better visualization with Sterling +- Language levels for gradual complexity +- Open source: https://github.com/tnelson/Forge + +**P language:** +- Good for state machines and protocols +- Less suited for pure structural properties than Alloy +- Overkill for static graph problem + +**Amazon's formal methods stack (from https://queue.acm.org/detail.cfm?id=3712057):** +- TLA+ for early system design and protocol verification +- P for modeling distributed systems (more programmer-friendly) +- Cedar for authorization policy language +- Dafny for implementation verification +- Kani for Rust code verification + +### Final Recommendation + +1. **Try Forge first** - it's Alloy with 10+ years of UX improvements, published 2024 +2. **Fall back to Alloy 6** if Forge is too teaching-focused or immature +3. **Use VSCode extension for either** to avoid the integrated environment issue +4. **Keep Lean 4 or Dafny in mind** for future formal proofs, but don't start there + +The original instinct about Alloy being right for the problem was correct, and the tooling situation has improved significantly. + +## What the Alloy/Forge Spec Should Look Like + +The problem is purely structural: given a fixed DAG with properties on nodes (definitions, implementations) and edges (deps, impl-deps), can you prove local constraints imply global invariants? + +### Basic Structure + +```alloy +sig Crate { + deps: set Crate, // regular dependencies + impl_deps: set Crate, // impl dependencies (subset of deps) + impls: Trait -> Type, // implementations in this crate +} + +sig Trait { def_crate: one Crate } +sig Type { def_crate: one Crate } + +// impl_deps must be subset of deps +fact { all c: Crate | c.impl_deps in c.deps } + +// Acyclic (build system enforces) +fact { all c: Crate | c not in c.^deps } + +// One binary root +one sig Binary extends Crate {} +fact { all c: Crate - Binary | c in Binary.^deps } +``` + +Then define: +- **Local orphan constraint**: what each crate can implement given its deps/impl_deps +- **Checker responsibility**: which crate must check which impl pairs +- **Prove**: local checks at each crate ⟹ global coherence for Binary + +### Key Insight + +Since each binary has independent coherence: +- Alloy finds counter-examples for a single binary's dep graph +- The property is: "for any dep graph rooted at a Binary, local rules ensure coherence" +- Multiple binaries = multiple separate Alloy checks (they don't interact) + +## Current Alloy Spec Problems + +The existing spec in the RFC has conceptual issues, not tool-choice issues: +1. Doesn't model impl-deps as distinct from regular deps +2. Doesn't formalize "checker responsibility" +3. Has a tautology in `dep_coherent_impl_crates` check +4. Doesn't capture "which crate is responsible for checking coherence" +5. Incomplete - has TODO for the most important constraint From 63c5eb12742a685fea803a3cb515b1c72488e0da Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Thu, 2 Oct 2025 21:11:37 -0700 Subject: [PATCH 8/8] Add analysis of supertraits and intrinsic methods to formal spec plan - Add Phase 3b for supertrait analysis: how do supertrait impls spread across crates interact with coherence and impl deps? - Add Phase 3c for intrinsic methods: extending proposal to allow impl blocks without traits across impl-dep crates - Note intrinsic methods complicate visibility and Cargo integration - Hypothesis: supertraits likely already handled by existing overlap rules - Update approach: incrementally add generics, supertraits, intrinsics --- extern-impl-formal-spec-plan.md | 44 ++++++++++++++++++++++++++++++--- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/extern-impl-formal-spec-plan.md b/extern-impl-formal-spec-plan.md index 3dc3f602d9d..c77f221c19f 100644 --- a/extern-impl-formal-spec-plan.md +++ b/extern-impl-formal-spec-plan.md @@ -8,6 +8,8 @@ This document outlines the plan for improving the formal specification of the ex - [ ] Define terminology and graph-theoretic concepts - [ ] Enumerate and formalize all coherence invariants - [ ] Analyze generic type parameters and associated types +- [ ] Analyze supertraits and their interaction with impl deps +- [ ] Analyze intrinsic methods (impl blocks without traits) - [ ] Add performance and complexity analysis section - [ ] Review and revise other spec restrictions - [ ] Design and implement new formal specification @@ -48,9 +50,13 @@ Core invariants to formalize: - Is it the immediate dominator? The least common ancestor in the dep tree? - Formalize: "crate C must check impls I₁, I₂ iff C is the closest crate where both are visible through distinct direct deps" -### Phase 3: Generics and Associated Types (separate deep dive) +### Phase 3: Generics, Associated Types, and Edge Cases (separate deep dive) -This deserves its own analysis because: +This deserves its own analysis because of several complex interactions. + +**3a. Generics and Associated Types** + +These add complexity because: - Blanket impls create universal quantification - Associated types add type-level functions - Specialization (if/when stabilized) adds partial ordering @@ -60,7 +66,39 @@ This deserves its own analysis because: 2. Do overlap rules change? (I suspect not, but needs proof) 3. How do associated type constraints interact with visibility? -**Approach**: Start with simplified model (no generics), then add them incrementally +**3b. Supertraits** + +Supertraits create implicit dependencies between trait implementations. + +**Key questions:** +1. If `trait A: B` and a type T has `impl A` in crate X and `impl B` in crate Y, does this create coherence issues? +2. Does the existing orphan rule already handle this? (Likely yes, but needs verification) +3. With impl deps: can crate X implement A for T if it has an impl dep on the crate defining A, but T's impl of B is elsewhere? +4. Are there any new coherence concerns when supertrait impls are spread across crates? +5. Does the checker responsibility algorithm need to account for supertrait relationships? + +**Hypothesis**: Supertraits are probably already covered by the existing overlap checking rules, since the type system requires `impl A for T` to prove `T: B` exists. But this needs formal verification. + +**3c. Intrinsic Methods (impl blocks without traits)** + +The current proposal focuses on trait implementations. A natural extension is intrinsic methods. + +**Current Rust**: Multiple `impl Type` blocks are allowed in the same crate, spreading methods across different modules/files. + +**Proposed extension**: Allow `impl Type` blocks across multiple impl crates. + +**Key questions:** +1. How much additional complexity does this add to the proposal? +2. Coherence is simpler (no overlap checking needed between intrinsic methods) +3. But visibility becomes more complex: which methods are visible where? +4. Do we need a "defining crate" concept for intrinsic impls? +5. How does this interact with privacy/visibility rules? +6. Does allowing intrinsic impls in impl-dep crates create security/encapsulation concerns? +7. Should this be in the initial proposal or deferred to future work? + +**Consideration**: The RFC mentions this as a future possibility. It may significantly complicate the Cargo story (how to declare which crate provides which methods?) and the coherence checking story (though simpler than traits). + +**Approach**: Start with simplified model (no generics, no supertraits, no intrinsic methods), then add each dimension incrementally ### Phase 4: Implementation Concerns (can be parallel with Phase 3)