microsoft/openvmm
Publicmirrored fromhttps://github.com/microsoft/openvmmAvailable
Guide/src/dev_guide/contrib/code.md
480lines · modecode
| 1 | # Coding conventions |
| 2 | |
| 3 | One of our major goals with OpenVMM is to provide a high quality coding |
| 4 | experience for contributors, starting first-and-foremost by having a consistent |
| 5 | set of coding conventions in the project. |
| 6 | |
| 7 | [Do your part](https://youtu.be/-_7FaWnlhS4?t=10) and keep OpenVMM clean! |
| 8 | |
| 9 | ## `rustfmt` |
| 10 | |
| 11 | _Checked Automatically:_ **Yes** (via [`cargo xtask fmt rustfmt`](../dev_tools/xtask.md)) |
| 12 | |
| 13 | OpenVMM source must be formatted using |
| 14 | [rustfmt](https://github.com/rust-lang/rustfmt), which automatically and |
| 15 | mechanically applies standard formatting to all the code. This eliminates time |
| 16 | spent discussing or reviewing stylistic issues in pull requests. |
| 17 | |
| 18 | The CI will run `rustfmt --check` to enforce consistent formatting, and will |
| 19 | fail if it notices any discrepancies. |
| 20 | |
| 21 | Unfortunately, `rustfmt` isn't infinitely customizable, and there are several |
| 22 | rules that are must be manually enforced: |
| 23 | |
| 24 | - All lines must end with LF, not CRLF. |
| 25 | - Top-level `use` imports should be non-nested. |
| 26 | |
| 27 | ```admonish note |
| 28 | Some of these manually-enforced conventions were introduced late into |
| 29 | OpenVMM's development, and there may be chunks of the codebase that do not |
| 30 | adhere to these conventions. |
| 31 | |
| 32 | If you're working in a file and notice that it isn't following a certain |
| 33 | convention, please take a moment to fix it! |
| 34 | ``` |
| 35 | |
| 36 | Assuming you've followed the [suggested dev env setup](../getting_started/suggested_dev_env.md) |
| 37 | and set up `rust-analyzer` to format-on-save, you should rarely have to think |
| 38 | about formatting in `.rs` files. |
| 39 | |
| 40 | ## House Rules |
| 41 | |
| 42 | _Checked Automatically:_ **Yes** (via [`cargo xtask fmt house-rules`](../dev_tools/xtask.md)) |
| 43 | |
| 44 | "House-rules" are a set of misc code lints that are specific to OpenVMM, which |
| 45 | are enforced using a custom in-house tool: |
| 46 | |
| 47 | - enforce the presence of the standard Microsoft copyright header |
| 48 | - enforce in-repo crate names don't use '-' in their name (use '_' instead) |
| 49 | - enforce Cargo.toml files don't include autogenerated "see more keys" comments |
| 50 | - enforce Cargo.toml files don't contain author or version fields |
| 51 | - enforce files end with a single trailing newline |
| 52 | - deny usage of `#[repr(packed)]` (you want `#[repr(C, packed)]`) |
| 53 | - justify usage of `cfg(target_arch = ...)` (use `guest_arch` instead!) |
| 54 | - justify usage of `expect(unsafe_code)` with an UNSAFETY comment |
| 55 | |
| 56 | Some of these lints are self explanatory, whereas others are described in more |
| 57 | detail elsewhere on this page. |
| 58 | |
| 59 | ## Unused `Cargo.toml` Dependencies |
| 60 | |
| 61 | _Checked Automatically:_ **Yes** (via [`cargo xtask fmt unused-deps`](../dev_tools/xtask.md)) |
| 62 | |
| 63 | We have an in-repo fork of |
| 64 | [`cargo-machete`](https://github.com/bnjbvr/cargo-machete) that ensures |
| 65 | `Cargo.toml` files only include dependencies that are actually being used. |
| 66 | |
| 67 | Avoiding unused dependencies makes it easier to reason about what a crate is |
| 68 | doing just by looking at its dependencies, and also helps cut-down on |
| 69 | incremental compile times. |
| 70 | |
| 71 | ## Formatting (`Cargo.toml`) |
| 72 | |
| 73 | _Checked Automatically:_ **No** |
| 74 | |
| 75 | When defining dependencies in `Cargo.toml` files, please organize dependencies |
| 76 | into the following groups: |
| 77 | |
| 78 | - crate-specific "subcrates" |
| 79 | - crates under vm/ |
| 80 | - crates under vm/vmcore/ |
| 81 | - crates under support/ |
| 82 | - external dependencies |
| 83 | |
| 84 | The rationale here is that crates should be grouped according to how related to |
| 85 | how _widely applicable_ they are. i.e: crates from crates.io and support are |
| 86 | widely applicable outside of OpenVMM, whereas crates under vmcore/ only make |
| 87 | sense within the context of OpenVMM, and crate-specific subcrates are - by |
| 88 | definition - only applicable to the crate they are being imported from. |
| 89 | |
| 90 | Additionally, we make use of the |
| 91 | [workspace dependencies](https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table) |
| 92 | feature to ensure that all our dependencies stay in sync. This requires defining |
| 93 | dependencies in your crate's `Cargo.toml` file and in the project's root `Cargo.toml`. |
| 94 | |
| 95 | So, for example: |
| 96 | |
| 97 | ```toml |
| 98 | [package] |
| 99 | name = "openvmm" |
| 100 | |
| 101 | [dependencies] |
| 102 | # crate-specific subcrates |
| 103 | openvmm_core.workspace = true |
| 104 | ... |
| 105 | |
| 106 | # /vmcore |
| 107 | vmcore.workspace = true |
| 108 | ... |
| 109 | |
| 110 | # /vm/devices |
| 111 | firmware_uefi_custom_vars.workspace = true |
| 112 | storvsp.workspace = true |
| 113 | ... |
| 114 | |
| 115 | # /support |
| 116 | guid.workspace = true |
| 117 | inspect.workspace = true |
| 118 | inspect_proto.workspace = true |
| 119 | ... |
| 120 | |
| 121 | # external dependencies |
| 122 | anyhow.workspace = true |
| 123 | cfg-if.workspace = true |
| 124 | clap.workspace = true |
| 125 | ... |
| 126 | ``` |
| 127 | |
| 128 | ## Linting (via `clippy`) |
| 129 | |
| 130 | _Checked Automatically:_ **Yes** |
| 131 | |
| 132 | OpenVMM uses [`cargo clippy`](https://github.com/rust-lang/rust-clippy) to |
| 133 | supplement rustc's built-in lints. |
| 134 | |
| 135 | Assuming you've followed the guide and set up `rust-analyzer` to |
| 136 | [use clippy](../getting_started/suggested_dev_env.md#enabling-clippy), you should see clippy lints |
| 137 | appear inline when working on Rust code. |
| 138 | |
| 139 | The CI runs `cargo clippy` on every crate in the repo prior to building the |
| 140 | project, and will fast-fail if it catches any warnings / errors. |
| 141 | |
| 142 | ### Suppressing Lints |
| 143 | |
| 144 | In general, lints should be fixed by modifying the code to satisfy the lint. |
| 145 | However, there are cases where a lint may need to be `expect`'d inline. |
| 146 | |
| 147 | In these cases, you _must_ provide a inline comment providing reasonable |
| 148 | justification for the suppressed lint. |
| 149 | |
| 150 | e.g: |
| 151 | |
| 152 | ```rust,ignore |
| 153 | // x86_64-unknown-linux-musl targets have a different type defn for |
| 154 | // `libc::cmsghdr`, hence why these lints are being suppressed. |
| 155 | #[expect(clippy::needless_update, clippy::useless_conversion)] |
| 156 | libc::cmsghdr { |
| 157 | cmsg_level: libc::SOL_SOCKET, |
| 158 | cmsg_type: libc::SCM_RIGHTS, |
| 159 | cmsg_len: (size_of::<libc::cmsghdr>() + size_of_val(fds)) |
| 160 | .try_into() |
| 161 | .unwrap(), |
| 162 | ..std::mem::zeroed() |
| 163 | } |
| 164 | ``` |
| 165 | |
| 166 | ### OpenVMM's `clippy` Configuration |
| 167 | |
| 168 | We stick fairly close to the default set of rustc / clippy lints, though there |
| 169 | are some default lints that we've decided to disabled project wide, and other |
| 170 | non-default lints which we've explicitly opted into. |
| 171 | |
| 172 | See the `[workspace.lints]` sections of OpenVMM's root |
| 173 | [`Cargo.toml`](https://github.com/microsoft/openvmm?path=/Cargo.toml) |
| 174 | for a list of globally enabled/disabled lints, along with justification as to |
| 175 | why certain lints have been enabled/disabled. |
| 176 | |
| 177 | ## Unsafe Code Policy |
| 178 | |
| 179 | ```admonish danger |
| 180 | When possible, try to avoid introducing new `unsafe` code! |
| 181 | |
| 182 | Before rolling your own `unsafe` code, check to see if a safe abstraction |
| 183 | already exists, either in-tree, on crates.io\*, or in the standard library. |
| 184 | |
| 185 | \*subject to an unsafe-code audit |
| 186 | ``` |
| 187 | |
| 188 | Rather than synthesizing our own unsafe code conventions, we follow the |
| 189 | guidelines outlined in the following two resources: |
| 190 | |
| 191 | - [The Rust `unsafe` docs](https://doc.rust-lang.org/std/keyword.unsafe.html) |
| 192 | - [The Fuchsia "Unsafe code in Rust" Guidelines](https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/docs/development/languages/rust/unsafe.md) |
| 193 | |
| 194 | In a nutshell: |
| 195 | |
| 196 | - `unsafe` functions are required to include `/// # Safety` documentation |
| 197 | describing any preconditions the caller must uphold when calling the function. |
| 198 | - `unsafe {}` blocks are required to include a `// SAFETY:` comment describing |
| 199 | how the preconditions for calling the `unsafe` function(s) within the block |
| 200 | are being satisfied. |
| 201 | - `expect(unsafe_code)` annotations are required to include an `// UNSAFETY:` |
| 202 | comment justifying why the code in question needs to use `unsafe`. This |
| 203 | annotation must be placed at the module or crate level. |
| 204 | |
| 205 | These requirements are enforced by CI, and will cause the build to fail if |
| 206 | required documentation is missing. |
| 207 | |
| 208 | Editing a file containing unsafe code will trigger CI to label your PR and warn |
| 209 | reviewers that the PR touches unsafe code. This is to ensure that all unsafe code |
| 210 | is audited for correctness by area experts. |
| 211 | |
| 212 | ## Uses of `cfg(target_arch = ...)` must be justified |
| 213 | |
| 214 | _Checked Automatically:_ **Yes** (via [`cargo xtask fmt house-rules`](../dev_tools/xtask.md)) |
| 215 | |
| 216 | Unless you're working on something that's genuinely tied to the host's CPU |
| 217 | architecture, you should use `cfg(guest_arch = ...)` instead of `cfg(target_arch = ...)`. |
| 218 | |
| 219 | OpenVMM is a multi-architecture VMM framework, capable of running running x64 |
| 220 | guests on a x64 host, as well as Aarch64 guests on Aarch64 hosts (with the |
| 221 | potential of adding additional platforms in the future). |
| 222 | |
| 223 | At the moment, OpenVMM requires that the host architecture and guest |
| 224 | architecture match. That said, it's possible that at some point in the future, |
| 225 | OpenVMM may also support _mismatched_ guest/host architectures, via an emulated |
| 226 | CPU virtualization backend (akin to QEMU). |
| 227 | |
| 228 | Having an emulated CPU backend would enable OpenVMM to support such useful |
| 229 | scenarios as: |
| 230 | |
| 231 | - running Arch64 guests on x86 machines |
| 232 | - running x86 guests on ARM |
| 233 | - running something exotic (e.g: RISC-V) on x86/ARM machines |
| 234 | - ...assuming we had the bandwidth to implement + maintain something like that |
| 235 | - running OpenVMM on systems without hardware-accelerated CPU virtualization enabled |
| 236 | |
| 237 | With these scenarios in mind, it would be short-sighted to rely entirely on |
| 238 | `cfg(target_arch = ...)` to gate guest-facing arch-specific functionality, as it |
| 239 | would inexorably tie the guest's arch to the host's arch, making any future |
| 240 | initiatives to pry the two apart significantly more difficult! |
| 241 | |
| 242 | As such, the `OpenVMM` repo includes infrastructure to specify a custom, |
| 243 | OpenVMM-specific `cfg(guest_arch = ...)` cfg parameter. |
| 244 | |
| 245 | By default, `cfg(guest_arch = ...)` will act the same way as `cfg(target_arch = |
| 246 | ...)`, but it can be swapped to a different architecture by setting the |
| 247 | `OPENVMM_GUEST_TARGET` env var at compile-time. |
| 248 | |
| 249 | There are very few reasons to use `cfg(target_arch = ...)` within the OpenVMM |
| 250 | repo, and to enforce this rule, we have an in-house `xtask fmt house-rules` |
| 251 | check that lints each use of `cfg(target_arch = ...)` to include a |
| 252 | "justification" for why it's being used. |
| 253 | |
| 254 | e.g: `cfg(target_arch = ...)` would be applicable when feature-gating a CPU |
| 255 | intrinsic (such as CPUID, or a SIMD instruction), or when implementing a |
| 256 | `*-sys` crate where the underlying C API/ABI varies between architectures. |
| 257 | |
| 258 | ...otherwise, use `cfg(guest_arch = ...)`! |
| 259 | |
| 260 | ## Avoid `Default` when using `zerocopy::FromZeros` |
| 261 | |
| 262 | _Checked Automatically:_ **No** |
| 263 | |
| 264 | The rule: |
| 265 | |
| 266 | - A type can `derive(Default)` **XOR** `derive(FromZeros)`. |
| 267 | - A type that is `FromZeros` can also `impl Default`, but it must be a |
| 268 | conscious, explicit choice, with justification (read: inline comment) as to |
| 269 | why that particular default value was chosen. |
| 270 | - N.B. `derive(IntoBytes)` or `derive(FromBytes)` imply `derive(FromZeros)`. That is: |
| 271 | this convention applies to those types as well. |
| 272 | |
| 273 | The why: |
| 274 | |
| 275 | - The all-zero type is often not a semantically valid `Default` value for a type |
| 276 | - There are plenty of types that don't have a `Default` value, but _do_ have a |
| 277 | valid all-zero repr |
| 278 | |
| 279 | ### Additional context |
| 280 | |
| 281 | As per the Rust docs for [`Default::default`](https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default): |
| 282 | |
| 283 | ```admonish quote |
| 284 | `fn default() -> Self` |
| 285 | |
| 286 | Returns the “default value” for a type. |
| 287 | |
| 288 | Default values are often some kind of initial value, identity value, or anything |
| 289 | else that may make sense as a default. |
| 290 | ``` |
| 291 | |
| 292 | Notably, **default should _not_ be used as some shorthand to "zero initialize" |
| 293 | values!** For most non-trivial structs, the all-zero representation is _not_ a |
| 294 | semantically valid `Default`! |
| 295 | |
| 296 | This is true in many contexts... but one that's particularly relevant in OpenVMM |
| 297 | is that of FFI via C-style APIs and ABIs. |
| 298 | |
| 299 | In C, it's very common for types to undergo multi-stage instantiation, where |
| 300 | they are initially allocated as all-zero, and then get "filled in" by some |
| 301 | secondary init code. Notably: it's quite rare for that initial "all-zero" struct |
| 302 | to be a valid instance of the type! |
| 303 | |
| 304 | #### Example: C FFI |
| 305 | |
| 306 | For example, a common pattern in C libraries might look something like: |
| 307 | |
| 308 | ```c |
| 309 | struct Handle { |
| 310 | uint16_t opaque_handle; |
| 311 | } |
| 312 | |
| 313 | struct Handle handle = {0}; |
| 314 | init_handle(&handle); |
| 315 | |
| 316 | update_handle(handle, options); |
| 317 | do_thing(handle); |
| 318 | ``` |
| 319 | |
| 320 | In this example: it would be an error to invoke `update_handle` or `do_thing` |
| 321 | with an all-zero handle, as the type hadn't finished being fully initialized. |
| 322 | |
| 323 | If we wanted to use this library from Rust, a "naive" approach would be to do |
| 324 | something like: |
| 325 | |
| 326 | ```rust,ignore |
| 327 | #[repr(C)] |
| 328 | #[derive(Default)] |
| 329 | struct Handle { |
| 330 | opaque_handle: u16, |
| 331 | } |
| 332 | |
| 333 | let mut handle = Handle::default(); // BAD! |
| 334 | unsafe { init_handle(&handle) }; |
| 335 | |
| 336 | unsafe { update_handle(handle, options) }; |
| 337 | unsafe { do_thing(handle) }; |
| 338 | ``` |
| 339 | |
| 340 | While this _technically_ works... using `Default` here is kinda bogus! |
| 341 | |
| 342 | After all - `Handle::default()` doesn't actually call `init_handle`, which means |
| 343 | the value returned by `default()` doesn't match the "promise" of the trait! |
| 344 | Namely: the returned value is not semantically valid yet! |
| 345 | |
| 346 | In many other Rust codebase, this "overloading" of `Default` to represent both |
| 347 | semantically valid value _and_ all-zero values (in FFI) is par for the course, |
| 348 | as once structs get more complicated, having a `derive` that is able to fully |
| 349 | init a "uninitialized" struct in-memory is quite handy... |
| 350 | |
| 351 | In OpenVMM, we don't do this. Instead, we use a separate trait to init all-zero |
| 352 | structs. |
| 353 | |
| 354 | **In OpenVMM, we use `FromZeros` and `FromZeros::new_zeroed()` to work with types |
| 355 | that have valid all-zero representations, _without_ implying that those types |
| 356 | also have valid all-zero _default_ values!** |
| 357 | |
| 358 | So, for the example above: |
| 359 | |
| 360 | ```rust,ignore |
| 361 | #[repr(C)] |
| 362 | #[derive(zerocopy::FromZeros)] |
| 363 | struct Handle { |
| 364 | opaque_handle: u16 |
| 365 | } |
| 366 | |
| 367 | let mut handle = Handle::new_zeroed(); // GOOD! |
| 368 | unsafe { init_handle(&handle) }; |
| 369 | ``` |
| 370 | |
| 371 | Now, it's impossible for code elsewhere to obtain a `Handle` via |
| 372 | `Handle::default`, and mistakenly forget to invoke `init_handle` on it. |
| 373 | |
| 374 | ...but if it so happens that we _do_ want a `Default` impl for `Handle`, we can |
| 375 | do so by _manually_ implementing `derive(Default)` ourselves: |
| 376 | |
| 377 | ```rust,ignore |
| 378 | // Default + FromZeros: `default` returns fully initialized handle |
| 379 | impl Default for Handle { |
| 380 | fn default() -> Handle { |
| 381 | let mut handle = Handle::new_zeroed(); |
| 382 | unsafe { init_handle(&handle) }; |
| 383 | handle |
| 384 | } |
| 385 | } |
| 386 | ``` |
| 387 | |
| 388 | ## Avoid Requiring `Debug` on Traits |
| 389 | |
| 390 | _Checked Automatically:_ **No** |
| 391 | |
| 392 | **TL;DR:** Don't do this: |
| 393 | |
| 394 | ```rust,ignore |
| 395 | trait MyTrait: std::fmt::Debug |
| 396 | ``` |
| 397 | |
| 398 | Implementations of the standard library's `Debug` trait can be surprisingly large, |
| 399 | and the final binary size of OpenHCL and related binaries is a major concern |
| 400 | for us. Unused implementations of this trait are usually removed during the |
| 401 | optimization process (like all dead code), making this a non-issue. However when |
| 402 | traits, and more specifically trait objects, are involved, the compiler has a |
| 403 | much more difficult time proving that implementations are unused. This can |
| 404 | result in large amounts of functionally dead code ending up in the final |
| 405 | binaries. |
| 406 | |
| 407 | If you need to implement `Debug` for a struct containing such a trait object, you |
| 408 | will need to do so manually, so that you can skip over that field. |
| 409 | |
| 410 | Moreover, it's usually not good form to leave tracing statements that log a |
| 411 | struct's `Debug` representation in production. Prefer tracing just the fields |
| 412 | you're interested in, and/or connecting objects to the `Inspect` graph. |
| 413 | |
| 414 | ## Crate Naming |
| 415 | |
| 416 | _Checked Automatically:_ **Yes** (via [`cargo xtask fmt house-rules`](../dev_tools/xtask.md)) |
| 417 | |
| 418 | **Crates must be named with underscores, not dashes and underscores used in |
| 419 | folder names.** |
| 420 | |
| 421 | - Bad: `my-cool-crate` |
| 422 | - Good: `my_cool_crate` |
| 423 | |
| 424 | Rust does not allow dashes in imports, with any dashes getting replaced with |
| 425 | underscores when used in the code. Avoiding dashes altogether makes it easier to |
| 426 | grep for crate names, and makes things more consistent across the repo. |
| 427 | |
| 428 | This convention is enforced by CI |
| 429 | |
| 430 | **Do not name crates with the words "base, util, common" or other terms that are |
| 431 | overly general.** |
| 432 | |
| 433 | For example, consider a crate that provides a common data structure used by |
| 434 | multiple devices: |
| 435 | |
| 436 | - Bad: `devices_common` |
| 437 | - Good: `range_map` |
| 438 | |
| 439 | Libraries that contain the following eventually become a mishmash of unrelated |
| 440 | functionality that is located there for convenience. This |
| 441 | [blog post](https://dave.cheney.net/2019/01/08/avoid-package-names-like-base-util-or-common) |
| 442 | goes more in-depth as to why. |
| 443 | |
| 444 | Instead, name things based on what they logically provide, like functionality or |
| 445 | data types. |
| 446 | |
| 447 | ## Release Gates Workflow |
| 448 | |
| 449 | _Triggered Manually:_ **Yes** (via GitHub labels) |
| 450 | |
| 451 | In addition to the standard PR gates that run in debug mode, OpenVMM also provides |
| 452 | a "Release Gates" workflow that runs the same checks but compiled in release mode. |
| 453 | This workflow takes significantly longer to run but can catch issues that only |
| 454 | manifest in optimized builds. |
| 455 | |
| 456 | ### When to Use Release Gates |
| 457 | |
| 458 | The release gates workflow should be used when: |
| 459 | |
| 460 | - Making changes to performance-critical code paths |
| 461 | - Modifying compiler flags or build configuration |
| 462 | - Implementing low-level optimizations |
| 463 | - Debugging issues that only appear in release builds |
| 464 | - Before merging large refactoring changes |
| 465 | |
| 466 | ### How to Trigger Release Gates |
| 467 | |
| 468 | To run the release gates on your PR: |
| 469 | |
| 470 | 1. Ensure your PR is ready for review (not in draft mode) |
| 471 | 2. Add the `release-ci-required` label to your PR |
| 472 | 3. The workflow will automatically trigger and run all checks in release mode |
| 473 | |
| 474 | The workflow will only run when the specific label is present, so you have full |
| 475 | control over when to use this more resource-intensive testing. |
| 476 | |
| 477 | ### Label Management |
| 478 | |
| 479 | Only repository maintainers can add labels to PRs. If you need release gates |
| 480 | run on your PR, ask a maintainer to add the `release-ci-required` label for you. |
| 481 | |