rfc: managing versions
ppannuto committed Aug 18, 2023
1 parent fd06e4d commit 7eb32ae
Showing 3 changed files with 312 additions and 4 deletions.
58 changes: 58 additions & 0 deletions doc/Maintenance.md
@@ -9,6 +9,7 @@ group](wg/core/README.md) maintains the Tock project.

- [Roadmap and Feature Planning](#roadmap-and-feature-planning)
- [Outreach and Education](#outreach-and-education)
- [Between Releases](#between-releases)
- [Preparing a Release](#preparing-a-release)
* [Release Tasks](#release-tasks)
+ [Before the release](#before-the-release)
@@ -36,6 +37,63 @@ academic and professional conferences.
The project also maintains a [book](https://book.tockos.org) which includes
self-guided tutorials for various Tock features.

## Between Releases

Generally speaking, the primary development branch of Tock holds "next
minor version" semantics. That is, a default compile of a fresh checkout
of the Tock repository should maintain backwards-compatibility with the
currently released major version, but may also include bugfixes and new,
additional features.

### Patch Releases

> Note: Currently, even for patch releases, the Tock release process is
> fairly heavyweight due to the extensive hardware testing by core team
> members and affiliates. As a result, releases are relatively rare,
> which underpins the rationale for the current point release process.

A patch release generally carries the minimal changeset necessary for
the bugfix it includes relative to the most recent prior release. This
means that a 'new' patch release may be fairly divorced from the tip of
the development branch.

Normally, a patch release will follow the full release process described
in the next section. However, for very contained changesets with no risk
of hardware-sensitive behavior changes, the core team may elect for an
abridged release-testing process. Rationale and actual testing performed
will be included in the release notes in such cases.


### Minor Releases

(todo)


### Major Releases

When preparing a major release for Tock, a new branch will be christened
as the 'eventual release' branch, generally named `tock-v{N+1}` or
similar. ABI-breaking changes will all go to the major release branch.
Smaller changes and bugfixes will still go to the primary development
branch. The major release branch will periodically merge in new
changesets as is appropriate for its development process.

When the major release reaches a point of reasonable stability and
maturity, the core team will ask that new features be directed towards
the major release branch, and only bugfixes go to the current
development master. Shortly before the new major release is ready for
its first release candidate, a final minor release for the current major
version will be prepared.

After this release, all development will move to the major release
branch. Critical fixes that merit a new point release for the old major
version will be accepted, but all other changes should target the new
major release. The core team will then initiate the release process for
the new major release. When the final release is tagged, the major
version branch will be merged into the default branch for the Tock
repository, and development will continue as-normal.
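
The routing rules above can be sketched as a small decision table. This is only an illustration: the `Phase`, `Change`, and `Target` names are hypothetical, and sending critical fixes to the development branch after the final minor release assumes the old major version still lives there until the merge.

```rust
/// Where the major-release transition stands (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    /// The `tock-v{N+1}` branch exists; most work still lands on master.
    EarlyDevelopment,
    /// The major branch is reasonably stable; new features are directed to it.
    Stabilizing,
    /// The final minor release of the current major version has shipped.
    AfterFinalMinor,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    AbiBreaking,
    Feature,
    Bugfix,
    CriticalFix,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    DevelopmentBranch,
    MajorReleaseBranch,
}

/// Branch name following the `tock-v{N+1}` convention.
pub fn major_branch_name(current_major: u32) -> String {
    format!("tock-v{}", current_major + 1)
}

/// Pick the branch a change should target in a given phase.
pub fn target_for(phase: Phase, change: Change) -> Target {
    match (phase, change) {
        // ABI-breaking changes always go to the major release branch.
        (_, Change::AbiBreaking) => Target::MajorReleaseBranch,
        // Early on, smaller changes and bugfixes stay on the development branch.
        (Phase::EarlyDevelopment, _) => Target::DevelopmentBranch,
        // Once the major branch is mature, new features move over to it.
        (Phase::Stabilizing, Change::Feature) => Target::MajorReleaseBranch,
        (Phase::Stabilizing, _) => Target::DevelopmentBranch,
        // Assumption: critical fixes for the old major version land here.
        (Phase::AfterFinalMinor, Change::CriticalFix) => Target::DevelopmentBranch,
        (Phase::AfterFinalMinor, _) => Target::MajorReleaseBranch,
    }
}
```

For example, `target_for(Phase::Stabilizing, Change::Feature)` selects the major release branch, while an ordinary bugfix in the same phase still targets the development branch.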


## Preparing a Release

Tock releases are milestone-based, with a rough expectation that a new release
59 changes: 55 additions & 4 deletions doc/reference/trd104-syscalls.md
@@ -42,7 +42,7 @@ in other documents.
Three design considerations guide the design of Tock's system call API and
ABI.

1. Tock is currently supported on the ARM CortexM and RISCV architectures.
1. Tock is currently supported on the ARM CortexM and RISC-V architectures.
It may support others in the future. Its ABI must support both architectures
and be flexible enough to support future ones.
2. Tock userspace applications can be written in any language. The system
@@ -51,7 +51,58 @@ ABI.
3. Both the API and ABI must be efficient and support common call
patterns in an efficient way.

2.1 Architectural Support and ABIs
2.1 ABI Stability and Versioning
--------------------------------

This document describes the ABI for Tock 2.1 and beyond.

The Tock kernel version consists of `MAJOR.MINOR[.PATCH]`. For an
initial minor release, the `.0` patch is implicit, i.e., versions may
sequence as `v2.1`, `v2.1.1`, `v2.1.2`, and `v2.2`.
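
As a rough illustration of the `MAJOR.MINOR[.PATCH]` format, the sketch below parses a version string and fills in the implicit `.0` patch. `KernelVersion` and `parse_version` are hypothetical names, not part of the Tock kernel, and accepting an optional leading `v` is an assumption based on the examples above.

```rust
/// A kernel version of the form `MAJOR.MINOR[.PATCH]`.
/// Field order matters: the derived ordering compares major, then minor,
/// then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct KernelVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Parse strings such as "v2.1", "2.1.1", or "v2.2".
/// A missing patch component is treated as the implicit `.0`.
pub fn parse_version(s: &str) -> Option<KernelVersion> {
    // Assumption: a leading 'v' is optional.
    let s = s.strip_prefix('v').unwrap_or(s);
    let mut parts = s.split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    let minor = parts.next()?.parse::<u32>().ok()?;
    let patch = match parts.next() {
        Some(p) => p.parse::<u32>().ok()?,
        None => 0, // implicit `.0` for an initial minor release
    };
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(KernelVersion { major, minor, patch })
}
```

With the derived ordering, `v2.1`, `v2.1.1`, `v2.1.2`, and `v2.2` sort in the sequence given above, and `v2.1` compares equal to `2.1.0`.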

Together, a kernel's major and minor version guarantee the ABI for
exchanging data between kernel and userspace, as well as the system
call numbers.

- A patch release of Tock is for bugfixes. Patch releases will not add
to or remove from the kernel ABI surface. However, in select cases
behavior may change; see the next subsection for details.
- A minor version of Tock is for addition of new features. Minor
version releases guarantee backwards-compatibility with userspace
applications compiled against any prior release with the same major
version.
- A major version of Tock is for breaking ABI changes. Generally, Tock
attempts to minimize ABI changes. However, there are no stability
guarantees provided for applications across major versions.
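
One way to read these guarantees is as a compatibility check between the version an application was compiled against and the version of the running kernel. The sketch below is an interpretation, not kernel code: `KernelVersion` and `kernel_can_run` are hypothetical names, and ignoring the patch number assumes that patch releases leave the ABI surface unchanged as stated above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KernelVersion {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Returns true if `kernel` is expected to run an application compiled
/// against `app_built_for`, under the versioning rules in this section.
pub fn kernel_can_run(kernel: KernelVersion, app_built_for: KernelVersion) -> bool {
    // No stability guarantees across major versions.
    if kernel.major != app_built_for.major {
        return false;
    }
    // Minor releases are backwards-compatible within a major version, so a
    // newer (or equal) minor can run the app. Patch numbers are ignored
    // because patch releases do not add to or remove from the ABI surface.
    kernel.minor >= app_built_for.minor
}
```

Under this reading, an app built against 2.1 runs on a 2.2 kernel, an app built against 2.2 is not guaranteed to run on 2.1, and nothing is promised between 2.x and 3.x.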

2.2 ABI Discrepancies
---------------------

Documents such as this TRD describe the expected behavior of Tock.
However, it is possible that due to bugs or other errors, the kernel
implementation does not match expected behavior. When such cases are
discovered, the core team will make a judgement call that aspires to
least-surprising behavior. While not hard-and-fast rules, generally
the following principles will guide decision-making:

- If behavior is functionally incorrect and userspace could not
possibly do the correct thing because of the incorrect behavior
(e.g., a command does not actually execute the underlying action or
a syscall reports X bytes were written when in reality Y were
actually written), the implementation bug will be fixed and
included in the next patch (or greater) release.
- If behavior is partially incorrect, or incomplete, but a
reasonable userspace app could 'work around' or 'limp along'
despite the issue, we are unlikely to change the existing interface
or its behavior, but may consider a minor release with a
'transition' syscall that corrects the issue, marking the original
for removal with the next major release. Applications SHOULD
transition to such 'replacement' ABIs when available.
- If behavior is consistent, but against guidelines, e.g. several
capsules in v2.x do not follow the `Command Identifier 0 =>
Exists` convention, the ABI will not change until the next major
release.
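
The three principles can be summarized as a mapping from the kind of discrepancy to the earliest release that would likely carry a remedy. This is a sketch only: the enum and function names are hypothetical, and because these are guidelines rather than hard-and-fast rules, the mapping is a default and not a commitment.

```rust
/// How an implementation deviates from documented behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Discrepancy {
    /// Userspace could not possibly do the correct thing.
    FunctionallyIncorrect,
    /// Userspace can work around or limp along despite the issue.
    PartiallyIncorrect,
    /// Behavior is consistent but goes against guidelines.
    ConsistentButAgainstGuidelines,
}

/// The likely remedy under the principles listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Remedy {
    /// Fix the bug in the next patch (or greater) release.
    FixInNextPatchRelease,
    /// Possibly add a 'transition' syscall in a minor release and mark the
    /// original for removal with the next major release.
    TransitionSyscallInMinorRelease,
    /// Leave the ABI unchanged until the next major release.
    DeferToNextMajorRelease,
}

pub fn likely_remedy(d: Discrepancy) -> Remedy {
    match d {
        Discrepancy::FunctionallyIncorrect => Remedy::FixInNextPatchRelease,
        Discrepancy::PartiallyIncorrect => Remedy::TransitionSyscallInMinorRelease,
        Discrepancy::ConsistentButAgainstGuidelines => Remedy::DeferToNextMajorRelease,
    }
}
```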

2.3 Architectural Support and ABIs
--------------------------------

The primary question for the ABI is how many and which registers transfer
@@ -60,7 +111,7 @@ of the kernel and userspace being able to transfer more information
without relying on pointers to memory structures. It has the cost of requiring
every system call to transfer and manipulate more registers.

2.2 Programming Language APIs
2.4 Programming Language APIs
---------------------------------

Userspace support for Rust is an important requirement for Tock. A key
@@ -70,7 +121,7 @@ passes a writeable (mutable) buffer into the kernel, it must relinquish
any references to that buffer. As a result, the only way for userspace
to regain a reference to the buffer is for the kernel to pass it back.

2.3 Efficiency
2.5 Efficiency
---------------------------------

Programming language calling conventions are
199 changes: 199 additions & 0 deletions doc/rfcs/2023-08-17--ManagingVersions.md
@@ -0,0 +1,199 @@
Managing Versions
=================

- Initial Proposal: 2023-08-17
- Disposition: Under Review
- RFC PR: https://github.com/tock/tock/pull/3622

Summary
-------

This document describes Tock's versioning approach as well as the
development model for how we manage changes pending for next patch,
minor, and major releases.


Kernel Versioning
-------

We apparently never wrote down how our stability guarantees map to
kernel versions, so we should do that. I tried to capture the essence of
what we currently do in the new §2.1 and §2.2 for TRD104 attached to
this PR.

_Ideally, this is just writing down our existing policy, and is not too
controversial._


Master Branch Policy
--------------------

A fairly significant piece of the unwritten versioning rules is "what
changes are allowed in master?"

We have an answer from the 2.0 transition that we used: the default
development branch should maintain ABI backwards compatibility but not
necessarily forward compatibility. I.e., in our versioning language,
the master branch has 'next minor release' semantics.

I think it's fair to say that we've basically stuck to that policy.
I tried to capture this in the updates to the Maintenance.md document
attached to this PR.

_Ideally, this is just writing down our existing policy, and is not too
controversial._

I also tried to write down how we managed the major version transition.
I think it worked pretty well last time, but open to more thoughts here.


Understanding the delta between master and last release
------------------------------------

While we have a CHANGELOG file, to-date we are not great about keeping
it up to date during active development. Rather, as part of release
preparation the core team retroactively looks through all of the PRs and
commit history since the last release and synthesizes the key changes.

> Indeed, I got really confused writing this section when I looked at
> the [current CHANGELOG in `master`](https://github.com/tock/tock/blob/master/CHANGELOG.md),
> as I thought we were on 2.1.1, but in actuality only _some_ of the
> changeset included in 2.1.1 is in current master; one of the
> missing bits is the update to the CHANGELOG :/.

Given our fairly non-deterministic release process, we can end up with a
long gap and a lot of changes between releases, which makes it very
challenging for an external person to understand what has been fixed and
what has been added since Tock's last release in December 2022 (well,
that's the 2.1.1 point release, which _doesn't_ include a bunch of what
landed in master after the September 2022 2.1 release). There have been
1,350 commits since 2.1 as of this writing.

Trying to update the CHANGELOG with each PR, however, would likely
create a cluster of merge conflicts not worth maintaining.

One possible idea is to require CHANGELOG updates for any PR tagged
`P-Significant`. I suspect that might be a necessary, but not
sufficient, policy.

Another idea, of course, is more frequent releases. Maybe that is
something more feasible with robust hardware CI, but in the near-term, I
don't see our release process getting significantly less painful or more
frequent.


Carrying planned changes in code not in ToDo lists
--------------------------------------------------

One challenge with ABI stability, especially ABI stability in the master
branch, is that we can't fix 'small things' when they come up
(#3375, #3613, etc as motivation here). Currently, we simply close such
PRs and/or merge partial, non-ABI-breaking fixes, and put a TODO comment
in the code or maybe a tracking issue. I see several negatives to this
approach:
- It is easy to miss/overlook/forget something
  - When I linked the stabilized syscall document, I noticed the
    comment that 'GPIO is slated for renumbering with 2.0', but GPIO
    was [00004 at v1.6][driverNum1v6] and was still
    [00004 at v2.1.1][driverNum2v1v1] (and is 00004 currently).
- The cost of missing something is _very high_
  - It means that a change must now be deferred until the _next_ major
    release, which history tells us is rare.
- We create a giant todo-list that blocks/slows releases
  - When something 'major' does motivate a new major release, in
    addition to that big, complicated thing, we have a giant list of
    tiny busy-work style things we need to do.
- It's off-putting to new users
  - Some of the latent consistency issues are (rightly) tagged as
    `good-first-issue`, yet when folks make PRs to fix them
    (#3397, #3613, etc) we reject the changes.
- We throw away good code and developer context
  - This follows from the last two points; we should fix things when
    we are touching the code to make changes. Otherwise when we come
    back later during release crunch time, we're relearning context
    of what needs fixing and how to fix it that was already in
    someone's head a few months ago.


### What can we do instead?

1. Keep PRs with fixes tagged and open until a major release process starts.
   - Pro: Have the "real code" and "real fix" already there.
   - Con: Likely that the PR will not merge cleanly by the time the next
     major release occurs.
   - Con: Lots of noise in the PR queue

2. Always keep a 'next-major-release' branch active
   - Pro: Logical home for ABI-breaking changes
   - Pro: Developer-friendly, easy to ask people to change target of PR
     to accept useful changes
   - Con: Will diverge, likely significantly from master branch, making
     sync hard when release is ready
   - Con: Avoiding prior con would add non-trivial maintenance burden of
     periodically merging master with 'next-major', and likely
     carrying a large set of merge conflict resolutions along the way.
   - Con: Somewhat hidden, and unlikely to see any testing

3. The `cfg` option (I can't believe I'm suggesting it either)
   - Pro: Code is 'right there' such that wide-area interface,
     renaming, etc style changes will update next-release code as well
     [assumes CI builds both current and next-release, which is easy]
   - Pro: Release transition is very clear, simply remove all code where
     `cfg` is no longer relevant.
   - Con: It's `cfg`. It means we're carrying a bunch of code which in
     the common case is untested and unused.
   - Pro: Can result in space-savings for up-to-date downstream users.
     If we are carrying code for backwards compatibility that
     downstream folks know they won't use, they can remove that part
     of the kernel.
   - Con: The exponential number of kernel versions we just created.
   - Pro: Could act as indicator for when a release is merited, i.e.
     if we released a new version we'd shave off XX% of kernel
     size for large enough X.
   - Nuance: It may be worth distinguishing between "next minor" and
     "next major". In particular, "next minor" `cfg`'d stuff can be
     default-on, and largely no change from current practice, except
     for this sentinel sprinkled in the codebase identifying what's
     not in the released version [though, this could arguably also be
     accomplished with some type of comment keyword].

4. ...?
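
To make option 3 a bit more concrete, here is a rough sketch of what a `cfg`-gated ABI change could look like. Everything here is illustrative: `next_major_abi` is a hypothetical Cargo feature name, and the renumbered GPIO value is invented purely as a placeholder.

```rust
// `next_major_abi` is a hypothetical Cargo feature, not one Tock defines.
// It would need to be declared under `[features]` in the crate's Cargo.toml.

/// GPIO driver number in the currently released ABI.
#[cfg(not(feature = "next_major_abi"))]
pub const GPIO_DRIVER_NUM: usize = 0x00004;

/// Placeholder for a renumbered GPIO driver in the next major release.
/// The value is invented for illustration only.
#[cfg(feature = "next_major_abi")]
pub const GPIO_DRIVER_NUM: usize = 0x10004;

/// Reports which ABI generation this build carries.
pub fn abi_generation() -> &'static str {
    if cfg!(feature = "next_major_abi") {
        "next-major"
    } else {
        "current"
    }
}
```

CI could then build once with default features and once with `--features next_major_abi`, so both variants at least keep compiling. At release time, the transition amounts to deleting the `not(feature = ...)` items and dropping the gate from the rest.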

[driverNum1v6]: https://github.com/tock/tock/blob/e8d0a28d86897c91b6747be357abfcfa7e86688f/capsules/src/driver.rs
[driverNum2v1v1]: https://github.com/tock/tock/blob/44f39d7c8cf5db0038606f08640bfae670127eef/capsules/src/driver.rs


Being more explicit about ABI changes and surface area
------------------------------------------------------

It's great that we have things like the compatibility header in the TBF,
but if I wanted to understand which version of Tock my application
requires, how would I figure that out? We have the list of [syscalls
stabilized with 2.0](https://github.com/tock/tock/tree/master/doc/syscalls),
but does that mean an app can't rely on _anything_ else?


**Proposal:** We need something between "stable" and "unstable". Maybe
not the [7 tiers of Python stability](https://pypi.org/classifiers/),
but something that helps folks get a sense of 'this has been around for
a while' versus 'this is brand new and we are still figuring it out'.


**Proposal:** We should not stabilize whole driver interfaces, but
rather individual syscalls within them. Low-level-debug Command 2,
"print a number", seems pretty safe to call immutable; I don't think
we want to lock the LLD interface yet, however.


As we have learned time and again, it's also really hard to
keep such documentation up to date. We basically subsist now on @bradjc
going on documentation rampages.

It is also likely not obvious to new users when something is part of the
kernel ABI. In practice, the surface area of the kernel is strewn across
many files.

**Dream:** We should write a tool that enforces documentation, including
stability/maturity classification, for anything that creates a
userspace-facing interface (i.e. `SyscallDriver` impl's) and
auto-generate documentation, with info like "since Tock 2.x".
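
As a very rough sketch of what such a tool might check and emit, the following models per-command metadata and refuses to render documentation for any command that lacks a maturity classification or a "since" version. All of the types, the three maturity tiers, and the output format are hypothetical; nothing like this exists in the kernel today.

```rust
/// Hypothetical maturity tiers; the real set would need discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Maturity {
    Experimental,
    Provisional,
    Stable,
}

/// Documentation metadata for a single command within a driver.
pub struct CommandDoc {
    pub command_num: usize,
    pub summary: &'static str,
    /// `None` means nobody has classified this command yet.
    pub maturity: Option<Maturity>,
    /// First Tock version that shipped this command, e.g. "2.x".
    pub since: Option<&'static str>,
}

/// Render Markdown for one driver, or return the command numbers that
/// are missing a classification or a "since" version.
pub fn render_driver_docs(
    driver: &str,
    commands: &[CommandDoc],
) -> Result<String, Vec<usize>> {
    let mut missing = Vec::new();
    let mut out = format!("## {}\n", driver);
    for c in commands {
        match (c.maturity, c.since) {
            (Some(m), Some(since)) => out.push_str(&format!(
                "- Command {}: {} ({:?}, since Tock {})\n",
                c.command_num, c.summary, m, since
            )),
            // Enforcement: incomplete metadata fails the whole driver.
            _ => missing.push(c.command_num),
        }
    }
    if missing.is_empty() {
        Ok(out)
    } else {
        Err(missing)
    }
}
```

Because classification is per command rather than per driver, this also fits the proposal above to stabilize individual syscalls: Low-level-debug Command 2 could be marked `Stable` while the rest of the interface stays `Experimental`.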
