November 21, 2023

Topics: Coding

Link to tech demo video on YouTube

Note

This is an article about my first ever merged pull request on an open source project. See Coding for more background on my coding experiences and my journey toward considering myself a "programmer" instead of just a "coder".

I'm taking this journey because I like to make useful products, learn interesting things, solve challenging puzzles, and maybe someday make some money doing it.

TLDR

I programmed an automation tool that increases developer productivity and the reliability of the Taskwarrior to-do task management software by making sure that developers all collaborate using the same version of the programming language.

Background

Taskwarrior is a task list, or to-do, application for the command line. Taskwarrior is integrating a module called TaskChampion. Although Taskwarrior is primarily written in C++, TaskChampion is written in the Rust programming language. The source code is hosted on GitHub, a platform built on the Git version control system. As of 2023-11-21, Taskwarrior has 89 sponsors, ~3,600 stars, 140 contributors, ~2,900 OS X downloads per year, and over 14,000 downloads from GitHub itself.

Collaborating developers need to use the same versions of the development environment, supporting software libraries, and so on, so that the code they write performs as expected no matter who runs it or where. When developers use different versions of the same software to write code, it can introduce bugs or fail to build entirely. With so many developers collaborating on Taskwarrior, it is important to keep them coordinated and aligned on which development software versions to use.

Keeping everyone on the same page is therefore critical to writing software that performs as users and other developers expect, and it can be done either manually or automatically. This is especially true when collaborating on open source software such as Taskwarrior. One solution specific to collaborating on the TaskChampion repository, a child repository of Taskwarrior, is to manually update the required version of the development environment wherever it is specified.

In Rust, this version is expressed as the Minimum Supported Rust Version, or MSRV: the oldest Rust toolchain the project promises to build with. The MSRV is specified in several files, particularly configuration files used by the continuous integration ("CI") and compilation processes. Keeping those files in agreement ensures that every component compiled into the user-executable binary is built and tested against the same Rust version, and is therefore compatible with the others.

Problem

Manually updating and maintaining the MSRV in several places is cumbersome, time-consuming, and error-prone. A missed file means the software may be built or tested with the wrong version of Rust, or may fail to compile, and even when every file is updated correctly, each MSRV change takes significant time.

Solution

By using automation to update every file that specifies the MSRV whenever the MSRV changes, developers can stay aligned on which version of Rust they should be using. Using the existing xtask command-line automation tool, which lives inside the TaskChampion repository (itself a child of the Taskwarrior repository), I implemented new functionality that automatically updates the MSRV to a given version number in every file listed in the program's source code. For example, running xtask MSRV 1.65 looks in each listed file for a specific text pattern, and if it finds an existing version that differs from the one given in the command, it updates the MSRV to 1.65.

Impact

By automating the process of keeping the MSRV consistent across all configuration files:

Developers will no longer have to spend time manually updating the MSRV in each file.
The MSRV will stay consistent, reducing both debugging time and version compatibility bugs.
Code should build more reliably on CI services.

How

The xtask MSRV program contains a const slice of (&str, &str) tuples, each specifying:

the relative path to a file whose MSRV should be updated, and
a regex describing the text in that file that contains the version number to update.
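A minimal sketch of what such a slice might look like follows. The file paths and patterns are hypothetical, not the ones in TaskChampion, and the actual tool uses the `regex` crate; to keep this sketch dependency-free, each entry stores a literal prefix instead, and the version is taken to be the text between that prefix and the next double quote.

```rust
/// Hypothetical entries: (path relative to the repository root, literal text
/// that precedes the version number on the line to update).
const MSRV_FILES: &[(&str, &str)] = &[
    ("Cargo.toml", "rust-version = \""),
    (".github/workflows/checks.yml", "toolchain: \""),
];

/// Returns the version text following `prefix` on `line`, up to the next `"`,
/// or None if either the prefix or the closing quote is missing.
fn find_version<'a>(line: &'a str, prefix: &str) -> Option<&'a str> {
    // `?` returns None early if the prefix is not on this line.
    let start = line.find(prefix)? + prefix.len();
    let len = line[start..].find('"')?;
    Some(&line[start..start + len])
}
```

Storing `&str` pairs keeps the table usable in a `const`, since owned `String` values with contents cannot be built at compile time.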

The program then loops over this slice, opening each file and evaluating its contents line by line against that file's regex. If a line matches and its version differs from the MSRV given at the command line, that line is updated and the rest of the file is evaluated. The file is then saved and closed. This repeats for every file in the slice until all files have been evaluated. Each time a file is successfully updated, a note is printed to the command line.
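The loop described above could be sketched as two functions: one that rewrites a file's contents in memory and reports whether anything changed, and one that reads, updates, and saves a single file. This is an illustration, not the merged code: it assumes each pattern is a literal prefix ending in an opening quote (the actual tool uses a regex), and the function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rewrites every line containing `prefix` followed by a version and a closing
/// quote, returning Some(new_contents) only if at least one line changed.
fn update_contents(contents: &str, prefix: &str, new_version: &str) -> Option<String> {
    let mut changed = false;
    let mut out = String::with_capacity(contents.len());
    // split_inclusive keeps each line's '\n', so the file's line endings survive.
    for line in contents.split_inclusive('\n') {
        if let Some(start) = line.find(prefix) {
            let version_start = start + prefix.len();
            if let Some(len) = line[version_start..].find('"') {
                let old = &line[version_start..version_start + len];
                if old != new_version {
                    // Keep everything around the version and swap in the new one.
                    out.push_str(&line[..version_start]);
                    out.push_str(new_version);
                    out.push_str(&line[version_start + len..]);
                    changed = true;
                    continue;
                }
            }
        }
        out.push_str(line);
    }
    changed.then_some(out)
}

/// Updates one file in place; returns Ok(true) if the file was rewritten.
fn update_file(path: &Path, prefix: &str, new_version: &str) -> io::Result<bool> {
    let contents = fs::read_to_string(path)?;
    match update_contents(&contents, prefix, new_version) {
        Some(updated) => {
            // fs::write truncates the file and writes the new contents.
            fs::write(path, updated)?;
            println!("Updated MSRV in {}", path.display());
            Ok(true)
        }
        None => Ok(false),
    }
}
```

Building the new contents first and writing only when `Some` is returned means a file with no match, or with the MSRV already current, is never rewritten.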

Specifying no version number, or an invalid one (according to simple numeric checks), results in an error, and no updates are made. Likewise, a file is left unchanged if it does not exist, if the MSRV pattern is not found in it, or if the MSRV found is the same as the one given at the command line.
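The argument checks described above might look something like the following hypothetical sketch, which accepts only `MAJOR.MINOR` or `MAJOR.MINOR.PATCH` made of ASCII digits. It checks only the shape of the string, so a release that does not exist, such as `1.999`, would still pass.

```rust
/// Hypothetical check: accepts "MAJOR.MINOR" or "MAJOR.MINOR.PATCH", where each
/// component is a non-empty run of ASCII digits.
fn is_valid_msrv(version: &str) -> bool {
    let parts: Vec<&str> = version.split('.').collect();
    (2..=3).contains(&parts.len())
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit()))
}

/// Takes the version from the remaining command-line arguments. Returning Err
/// before any file is opened means a bad argument changes nothing.
fn parse_msrv_arg(mut args: impl Iterator<Item = String>) -> Result<String, String> {
    // A missing argument becomes an error message via `?`.
    let version = args.next().ok_or("no MSRV version given")?;
    if is_valid_msrv(&version) {
        Ok(version)
    } else {
        Err(format!("invalid MSRV version: {version}"))
    }
}
```

In a real binary the iterator could be `std::env::args().skip(2)`, skipping the program name and the subcommand; that argument layout is an assumption.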

Known Limitations

There are several known limitations to xtask MSRV that, given the scope, time, and anticipated usage of the tool, are considered acceptable. These include, but may not be limited to:

MSRV only checks that the version number is numerically well-formed, so it does not prevent the user from specifying a Rust version that does not exist, which may result in bad configurations.
MSRV cannot guarantee that every file that specifies an MSRV gets updated, because each file and its regex must be listed in the slice.
Because MSRV relies on regex pattern matching, it cannot detect false-positive matches ahead of time, so a developer may inadvertently change a version string in a file that they did not intend to change.
It depends on the user having an environment variable correctly set to the appropriate Cargo manifest location.
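The environment-variable dependency shows up when turning the listed relative paths into real ones. A minimal sketch, assuming the xtask crate sits one directory below the repository root: Cargo sets `CARGO_MANIFEST_DIR` while compiling a crate and when running it through `cargo run`, but not when a compiled binary is run on its own, so this hypothetical helper tries the runtime variable first and falls back to the value captured at build time with `option_env!`.

```rust
use std::path::{Path, PathBuf};

/// Returns the xtask crate's directory, or None if neither source is available.
fn manifest_dir() -> Option<PathBuf> {
    // Runtime value: present under `cargo run`, absent when the binary runs on its own.
    std::env::var_os("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        // Compile-time value: captured when the crate was built by Cargo
        // (None if it was compiled some other way).
        .or_else(|| option_env!("CARGO_MANIFEST_DIR").map(PathBuf::from))
}

/// Joins a repository-relative path onto the repository root, assumed
/// (hypothetically) to be the parent of the xtask crate directory.
fn resolve(manifest_dir: &Path, relative: &str) -> Option<PathBuf> {
    manifest_dir.parent().map(|root| root.join(relative))
}
```

The compile-time fallback bakes in the path from the machine that built the binary, which is acceptable for a developer tool but not for something distributed to users.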

Alternative Solutions

There may be other ways to solve the problem of coordinating version numbers. The default solution is to update them manually. A proposed, but potentially incomplete, solution is to specify the MSRV in Rust's Cargo.toml configuration file, implicitly indicating to developers which MSRV to use. This would still require manually updating every other relevant configuration file.

Further Development

To improve on this functionality, a developer could implement a function that searches the entire repository for files matching specific MSRV patterns, instead of listing specific files within the repository. Additionally, better version identification, perhaps by parsing configuration files into structured data with serde, might improve accuracy and reduce false positives and false negatives.
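A repository-wide search could start from something like the following sketch, which walks directories recursively with `std::fs::read_dir` and collects files whose contents contain a given pattern. The function name and the skipped directory names are assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects files under `dir` whose contents contain `pattern`.
fn find_files_with_pattern(dir: &Path, pattern: &str, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Skip version-control metadata and Cargo's build output.
            let skip = matches!(path.file_name().and_then(|n| n.to_str()), Some(".git" | "target"));
            if !skip {
                find_files_with_pattern(&path, pattern, found)?;
            }
        } else if let Ok(contents) = fs::read_to_string(&path) {
            // Files that aren't valid UTF-8 (e.g. binaries) fail to read and are skipped.
            if contents.contains(pattern) {
                found.push(path);
            }
        }
    }
    Ok(())
}
```

Each file found could then go through the same line-by-line update, though a plain substring search like this one shares the false-positive risk noted under Known Limitations.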

Conclusion

Automatically updating the MSRV should enhance developer productivity and reduce potential bugs, resulting in faster development of Taskwarrior. The functionality of xtask MSRV could be extended to track versions of other software and libraries, not just Rust, and could also be applied to software documentation.

What I've learned from this project

As I worked through the project, I kept a list of things to keep in mind for future projects:

I need to pay careful attention to crate versions that may not be compatible with the minimum supported Rust version. Unwinding manifest dependencies is not fun. Specifically, I found that Cargo.lock had been updated with some dependencies that I no longer needed, and couldn't have used anyway, because their versions did not support the MSRV.
It is helpful to read the documentation for crates, methods, etc. during development, and to revisit it when using unfamiliar methods, to understand what a function actually does, what it returns, and under what conditions it returns errors. The docs may also reveal useful methods I wasn't aware of, or methods that are only available under certain conditions (e.g. write_at is only available on Unix platforms, so I needed to rewrite some of the code to use write_all).
Follow the leader's suggestions. They may have contextual or implicit knowledge that they either didn't communicate explicitly or can't easily communicate. (E.g. the collaborator who assigned the project to me envisioned a slightly different developer workflow than I had in mind. Given their familiarity with how developers intend to use my code, it's fair to assume my project leader had a better idea of how the code should work than I did. Because I didn't follow the suggestion to use regex from the beginning, I ended up going back and implementing regex pattern matching anyway once I learned about the unstated expectation.)
It is preferable to write idiomatic Rust code, which may go beyond what cargo fmt enforces (e.g. calling .unwrap() is not idiomatic, even though it is not technically incorrect).
Readability counts. Un-nesting match or if statements can be worthwhile if it has little to no impact on the speed of the compiled code but makes the code easier to read.
Clarify the task. I had assumed that the comments in the file were to be changed, but it was the actual configuration values that needed to change.
Look to see how others have solved the problem before. It's not necessarily cheating.
- In my case, I felt that cutting my teeth by working from scratch was part of the learning process, but in a professional context, speed matters more than individual learning.
Further, the solution might be in the exact same file I'm already working in! I was a bit challenged by relative paths in different contexts (e.g. my Git environment vs. someone else's) and needed to learn about environment variables, but my project manager had to very nearly spell it out for me. This was particularly embarrassing, since a function in the file I was working on already contained almost the entire solution.
What a docstring (in Rust, a doc comment) is, as opposed to a normal comment, and how it is used.
Conventions around shadowing variables, and around similarly named variables in adjacent scopes (e.g. line from a for line in lines loop).
Before requesting a code review, I need to make sure to:
- look for anything that can be refactored, especially where previous code was modified in place and its functionality changed
  - e.g. variables defined in a loop that don't change with each iteration
  - In Rust, look for places where ? can be used instead of .unwrap(), letting the calling function handle the errors that ? propagates
  - Consider whether a slice is better than an array
- run all linters and fix issues
- run auto formatting
- review all comments for consistency with the code
- build the binary and test it on my local system
- compile the binary and run it in the ways and environments I imagine users might. While preparing the recorded demo, I found that my environment variables weren't set the same way as when I ran debug tests without compiling. Just testing whether the code compiles is insufficient.
  - While debugging, I learned that the CARGO_MANIFEST_DIR environment variable behaves differently under cargo run than when the compiled binary is run directly, so it didn't work the way I thought it did. While troubleshooting this, I also found that, at least on my computer, path environment variables set using the ~ shorthand for the user's home directory aren't treated as valid paths by Rust when read from the environment variable.
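Two of the lessons above, preferring ? to .unwrap() and the ~ shorthand not being understood as a path, can be illustrated together. Rust's std::path treats ~ as an ordinary character; expanding it to the home directory is normally the shell's job. This hypothetical helper expands a leading ~ using the HOME variable (a Unix convention) and hands errors back to the caller instead of panicking.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Expands a leading "~" or "~/" using `home`; other paths are returned as-is.
/// Returns None if the path needs expanding but no home directory is known.
fn expand_tilde(raw: &str, home: Option<&Path>) -> Option<PathBuf> {
    if raw == "~" {
        return home.map(Path::to_path_buf);
    }
    match raw.strip_prefix("~/") {
        // Only a leading "~/" is expanded; "~user" forms are left alone.
        Some(rest) => home.map(|h| h.join(rest)),
        None => Some(PathBuf::from(raw)),
    }
}

/// Reads a path from the environment variable `var` and expands a leading "~".
fn path_from_env(var: &str) -> Result<PathBuf, String> {
    // `?` hands a missing or non-UTF-8 variable back to the caller instead of panicking.
    let raw = env::var(var).map_err(|e| format!("{var}: {e}"))?;
    let home = env::var_os("HOME").map(PathBuf::from);
    expand_tilde(&raw, home.as_deref())
        .ok_or_else(|| format!("{var} starts with ~ but HOME is not set"))
}
```

Passing the home directory into `expand_tilde` as a parameter keeps it easy to test without modifying the process environment.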

Thanks

I was getting guidance (mostly via hints and directional nudges) from @djmitch, who coordinated with me and assigned the project to me. Thank you for letting me contribute!