diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 9a98611860..9b59170de4 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -191,6 +191,7 @@ - [Sharing the trait solver with rust-analyzer](./solve/sharing-crates-with-rust-analyzer.md) - [`Unsize` and `CoerceUnsized` traits](./traits/unsize.md) - [Having separate `Trait` and `Projection` bounds](./traits/separate-projection-bounds.md) +- [Well-Formedness](./analysis/well-formed.md) - [Variance](./variance.md) - [Coherence checking](./coherence.md) - [HIR Type checking](./hir-typeck/summary.md) diff --git a/src/analysis/well-formed.md b/src/analysis/well-formed.md new file mode 100644 index 0000000000..49c9509fea --- /dev/null +++ b/src/analysis/well-formed.md @@ -0,0 +1,293 @@ +# Well-Formedness + +## What is Well-Formedness? + +"Well-formed" means "correctly built."[^wf-history] Something is _well-formed_ when its structure follows rules. When we use this term in the Rust compiler we are concerned with establishing some kind of _internal consistency_. + +## Well-Formedness in Rust + +To check that something is well-formed is to perform a "Well-formedness check." + +In the Rust compiler there are two different forms of well-formedness checking: + +- **Type-Level Term**[^terms][^terms-abbreviated] well-formedness check. + - Also called "Term well-formedness" or "Term well-formedness checking." + - Not a distinct analysis stage, this gets performed throughout analysis. +- **Item**[^items] well-formedness check (item-wfck.) + - "Item-wfck" will often wind up requiring Terms be well-formed, but skips some areas. + - Inner "Terms" can (incorrectly) get normalized first. + - More coherent as a stage in the compiler than "term well-formedness" (which is performed in many places.) + +See: [What Well-Formedness Isn't](#what-well-formedness-isnt). + +## Well-Formedness of Type-Level Terms + +Term well-formedness checking begins with building a list of things that need to be true for a term to be well-formed. We call these "Obligations"[^obligations]. + +Type-Level Terms are considered Well-Formed when their associated obligations are satisfied by the trait solver. + +### Obligations for Well-Formedness + +Specific obligations are things like `String: Clone`, `A: usize`, or `::Item: Debug`. + +On this page we show the split between obligations and terms/items as: + +```rust,ignore + +--- + +``` + +Here is an example of a well-formed type-level term: + +```rust,ignore +Vec +--- +// Obligations to fulfill +Vec where T: Sized +// Trait solver says `String: Sized` is true, so this is well-formed. +Vec where String: Sized +``` + +When we compute the obligations for `Vec`, we'll find that `Vec` generates the obligation `T: Sized`. We substitute `T` with `String` in `Vec`, so we find the obligation `String: Sized` which the trait solver will determine to be satisfied. + +The following **is not** well-formed: + +```rust,ignore +Vec +--- +// Obligations to fulfill +Vec where T: Sized +// Trait solver says `str: Sized` is not true, so this is not well-formed. +Vec where str: Sized +``` + +The above computes the obligation `T: Sized`, like before, but we substitute `T` for `str` in the instance of `Vec` finding the obligation `str: sized`. This obligation will be determined by the trait solver to be _unsatisfied_. + +#### Determining Obligations + +In the compiler, obligations of terms are found through the [`obligations`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/traits/wf/fn.obligations.html) function in the [term well-formedness module](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/traits/wf/index.html). + +#### Other Obligations + +Obligations are more than just trait and const generic bounds, but we've only mentioned these specific obligations so far as they are what we care about when we do "well-formedness checking" of terms. See: [`PredicateKind`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_type_ir/predicate_kind/enum.PredicateKind.html) and [`ClauseKind`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_type_ir/predicate_kind/enum.ClauseKind.html) for a full list of obligations. + +### We Don't Need Normalization (Yet) + +[Normalization](../normalization.md) is the process of resolving [type aliases](../normalization.md#aliases) into their underlying type. + +A type alias is considered well-formed if its where clauses are satisfied. The underlying type undergoes well-formedness checking at most definition and instantiation sites, but there are exceptions. + +### Const Generic Arguments + +Term well-formedness is responsible for getting "type checking" obligations of const generic terms[^tyck-const-generics]. Let's look at the following use of const generics: + +```rust,ignore +fn use_const_generics() { /* ... */ } +// call site +use_const_generics::<6>(); +--- +// call site wfck obligations +const 6: usize +``` + +The call site will provide us with the obligation `6: usize` during well-formedness checking. This obligation will be passed off to the trait solver just like any trait-style obligation, as the trait solver has more responsibilities than its name suggests. + +## Well-Formedness of Items + +Items are, generally speaking, "Things that get defined." Item-wfck happens at the signature level for types and functions, methods, and definitions/implementations of traits. + +```rust,ignore +// The `Vec` is checked during item wfck +fn foo(_: Vec) { + // The `Vec<[u8]>` is not handled by item wfck as it's not in the signature + let _: Vec<[u8]> +} +--- +Vec: Sized // Generated +Vec<[u8]>: Sized // Not done at item-wfck. Done elsewhere. +``` + +Item-wfck has more responsibilities than only collecting the obligations of its internal type-level terms and passing them to the trait solver. We do not talk about all of these here, but they can be found at the individual `check_*` functions in [**the item-wfck module**](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir_analysis/check/wfcheck/index.html). + + + +### Global and Trivial Bounds + + + +Trait bounds are a common Obligation. Global and Trivial trait bounds are kinds of trait bounds where we already have enough information to determine if they are true or false. Item-wfck is responsible for finding and checking these bounds. + +- **Global bounds** are, in the old solver, post-normalization bounds that don't contain any generic parameters (like `` or `'a`) or bound variables (like `for<'b>`). +- **Trivial bounds** are bounds that do not need further normalization to determine if they're well-formed or not. + +Consider the following function definition: + +```rust,ignore +fn apartment_complex(block: T, name: String) where String: Clone { /* ... */ } +--- +String: Clone // Trivial & Global bound! There's no aliases to resolve. +// There could be bligations on T but we don't care about them here. +``` + +This produces a trait bound obligation `String: Clone` that is _Global_ (no generic parameters) and _Trivial_ (didn't require normalization to be well-formedness checked). The trait solver doesn't need to be given any additional information for it to be able to make a judgment on the well-formedness of `String: Clone`. + +False trivial bounds are simply trivial bounds that do not hold. The following is a basic example: + +```rust,ignore +fn apartment_simple(block: T, name: String) where String: Copy { /* ... */ } +--- +String: Copy // Trivial bound again, but this one is false! +``` + +Here we have a trivial bound that does not hold, because `String` is not `Copy`. + +#### Trivial Bounds Are Not Always Global + +Trivial Bounds are not a subset of Global Bounds. A trivial bound that isn't Global is `for<'a> String: Clone` (trivially true, has a bound variable) or `&'a str: Copy` (trivially false, has a generic parameter). + +#### Item-Wfck and Trivial/Global Bounds + + + +When checking items are well-formed we will check that there are no trivially false global bounds. + +## When We Don't Fully Do Well-Formedness Checking + +Well-formedness checking is not a coherent "stage" of type checking. There are many areas where well-formedness checking is performed, and some areas where we skip over well-formedness checking due to limitations in what kinds of analysis we can currently perform. Ideally, we would never skip or defer well-formedness checking. + +### We (Sometimes) Need Normalization + +There are places where normalization of an Item happens before its Terms have gone through well-formedness checking. This is considered problematic as doing so allows some terms to [bypass term well-formedness checking entirely](https://github.com/rust-lang/rust/issues/100041). + +### Trait Objects + +We do not require the where clauses of trait objects to be well-formed when determining if that trait object is well-formed. These where clauses are proven when coercing into a trait object, but this remains a hole in well-formedness checking. + +As an example, the following will compile because we don't have a point where we're constructing the trait object from a concrete type: + +```rust,ignore +trait Trait +where + for<'a> [u8]: Sized {} + +fn foo(_: &dyn Trait) {} +--- +// This doesn't end up being generated, because it happens within a trait object. +[u8]: Sized +``` + +The above should not compile because `[u8]: Sized`, but this won't be checked until actual use: + +```rust +trait Trait +where + for<'a> [u8]: Sized {} + +fn foo(_: &dyn Trait) {} + +// We still need to specify the bound here, otherwise `[u8]: Sized` _is_ +// checked as an obligation. +impl Trait for u8 where for<'a> [u8]: Sized {} + +fn main() { + // No matter what we do, this boundary between concrete type and trait + // object will produce the obligation `[u8]: Sized`, which will fail when + // handed over to the trait solver. + let object: Box = Box::new(42u8); + foo(&object); +} +``` + +This exception does not apply to Const Generic Arguments in trait objects: + +```rust,ignore +trait Trait {} +fn foo(_: &dyn Trait) {} +--- +const N: usize +const B: bool +N = B // Substitution +const B: usize + bool +``` + +The above doesn't compile, unlike the previous example we gave. We're doing _some_ well-formedness checking here when it comes to the const generic arguments. + +### Binders / Higher-Ranked Types + +Binders / Higher-Ranked Types reduce the amount well-formedness checking we do on a term, leaving well-formedness checking to when the bound is instantiated: + +```rust,ignore +let _: for<'a> fn(Vec<[&'a ()]>); +--- +// This doesn't end up being generated, because it happens within a HRB +[&'a ()]: Sized // slices aren't sized, this would fail! +``` + +Specifically, obligations involving variables from binders (`for<'a>`) are only checked when the binder is instantiated. Some things are stilled checked under the `for<'a>`, but we still skip a lot of things. + +A lot of unsoundness surrounds this behavior. See: [#25860](https://github.com/rust-lang/rust/issues/25860), [#84591](https://github.com/rust-lang/rust/issues/84591). + +Let's consider the following: + +```rust,ignore +for<'a, 'b> fn(&'a &'b ()) +``` + +The above HRB implies `'b: 'a` (a lifetime bound), rather than two completely separate lifetimes. This is normal lifetime behavior, but during well-formedness checking we cannot prove that this bound is generally true[^horrible], so we skip it. + +### Free Type Aliases + +The right-hand side of Free Type Aliases[^fta] is not fully checked to be well-formed at the definition site, only the types of const generic arguments in the RHS are checked. + +The following free type alias passes type checking, at time of writing: + +```rust,ignore +type WorksButShouldNot = Vec; +--- +// This should fail! But we skip the RHS of free type aliases +str: Sized // Not generated +``` + +This shouldn't work, as both `T: Sized`, `str: Sized` are implied by `Vec`. This "passes" item-wfck because the RHS of a free type alias doesn't go through well-formedness checking _until it's used_. Item-wfck is **deferred until use** for this specific case. + +For Const Generics we still do a small amount of well-formedness checking at the definition site of a free type alias. This is consistent with our current special-casing of const generic well-formedness checking when we skip over things like where bounds. + +This means that the following, despite being of a similar form to the above example, fails as it should: + +```rust,ignore +pub struct Consty; +type Alias = Consty<42>; +--- +// This *is* generated as an obligation, so this (correctly) fails. +42: bool // This is generated! +``` + + + +## "Well-Formed" or "Wellformed"? + +Prefer "well-formed" over "wellformed," as this is consistent with logic literature. This also gets abbreviated to WF in other parts of the dev guide / docs. + +## Informal Usage + +In conversation, contributors may refer to something as "well-formed" and not necessarily mean what we cover here because "well-formedness" is a general phrase associated with the correctness of formal structures. This isn't necessarily in error, but it should be looked out for. + +## What Well-Formedness Isn't + +Well-formedness checking is not "number of parameters" or "parameter type" checking[^kind-checking]. Neither term well-formedness checking nor item-wfck is concerned with if a type with 2 parameters has 1 or 3 types applied to it (assuming no defaults), or if a const generic parameter has a type applied to it. These kinds of problems will get handled during HIR-ty Lowering[^hir-ty-lower], not wfck. + +Well-formedness doesn't check or validate lifetimes, this is handled in [MIR](../borrow-check.md). + +Well-formedness in the Rust compiler doesn't correspond to "correct syntax" as it does in logic. The term has a history of general use in a mathematical context of "follows a given set of rules." In Rust, our original usage was closer to "this thing is internally consistent" with respect to the bounds on a type in places such as the original [clarification on projections and well-formedness RFC](https://github.com/rust-lang/rfcs/blob/master/text/1214-projections-lifetimes-and-wf.md). + +[^obligations]: These get referred to as Obligations, Requirements, or Constraints in the documentation. Preferred term is "obligations", as this matches the suffix of the type and the names of relevant functions. In future, this may be superseded by the new solver's term "Goal." +[^wf-history]: In linguistics this is "grammatically correct," in logic it is "syntactically correct," and in casual mathematician use it can be read as a more general "follows the rules we set for this domain." +[^horrible]: Instead, this bound is checked during "MIR borrowck" when the lifetimes are instantiated. +[^fta]: Type aliases not associated with anything, i.e. a module-level `type Alias = Vec;`. +[^items]: "Definition" style things in rust, See the [glossary](../appendix/glossary.md). +[^terms]: AKA Type expressions and subexpressions in the general sense, not a specific struct or enum in the rust compiler. See the [glossary](../appendix/glossary.md). +[^terms-abbreviated]: Abbreviated as "Terms" on this page in some areas. +[^kind-checking]: AKA "kind checking", as we might see in languages like Haskell. +[^hir-ty-lower]: +[^tyck-const-generics]: \ No newline at end of file diff --git a/src/appendix/glossary.md b/src/appendix/glossary.md index 30959f4b39..527da87b7b 100644 --- a/src/appendix/glossary.md +++ b/src/appendix/glossary.md @@ -102,6 +102,7 @@ Term | Meaning trans 👎 | Short for _translation_, the code to translate MIR into LLVM IR. **Renamed to** [codegen](#codegen). `Ty` | The internal representation of a type. ([see more](../ty.md)) `TyCtxt` | The data structure often referred to as [`tcx`](#tcx) in code which provides access to session data and the query system. +Type-Level Term | An expression at the type level, such as a Type or Const Generic. UFCS 👎 | Short for _universal function call syntax_, this is an unambiguous syntax for calling a method. **Term no longer in use!** Prefer _fully-qualified path / syntax_. ([see more](../hir-typeck/summary.md), [see the reference](https://doc.rust-lang.org/reference/expressions/call-expr.html#disambiguating-function-calls)) uninhabited type | A type which has _no_ values. This is not the same as a ZST, which has exactly 1 value. An example of an uninhabited type is `enum Foo {}`, which has no variants, and so, can never be created. The compiler can treat code that deals with uninhabited types as dead code, since there is no such value to be manipulated. `!` (the never type) is an uninhabited type. Uninhabited types are also called _empty types_. upvar | A variable captured by a closure from outside the closure.