[instrumentation-fetch] [Research/Discussion] Use `msw` for fetch (and XHR) instrumentation tests #5282

chancancode · 2024-12-18T09:29:51Z

Note

This PR currently contains an extra commit from #5268. I will rebase once that is merged.

Note

This PR is currently an incomplete proof-of-concept.

Which problem is this PR solving?

Currently the test suite heavily relies on mocking to work, specifically it uses sinon.stub(window, 'fetch') (and the XHR equivalent) to replace it with a custom function that is intended to work similarly to the real one. However, the actual behavior of fetch/XHR is very complicated and is very difficult to get right, with a lot of subtle interactions around CORS, redirects, resource timing, etc.

Issues like #5122 is kind of an extreme example of the problem (assuming the report is correct), where the current code and the "fake fetch" are assuming a browser behavior that didn't actually exist, and the current tests are explicitly confirming that non-existent behavior based on the bad stub.

Considering that the point of the this package (and consequently, what we are trying to test) is – 1. replace the globals with an instrumented version without introducing any breakages 2. extract the relevant information from the browser into the spans – this seems like a problem to me. Essentially, we are testing that "if our assumptions about how the browser works is correct, then our instrumentation code behaves correctly", which seems a bit circular to me. It also makes the test suite quite difficult to work with – part of it is just due to the factoring of the current code, but it would genuinely quite complex to add additional test coverage for things like redirects and opaque responses.

In the case of fetch/XHR, there seems to be a better way to go about this and use as much of the real browser functionalities as possible. Service Workers allows webpages to intercept network requests (like fetch() and XHR, but also other resources), and msw is a library to make that work nicely for testing/development workflows. Essentially, it allows for mocking/intercepting any fetch()/XHR requests we want to make in the tests while still invoking the actual native fetch() function in the browser.

To that end, I started this branch as a proof-of-concept to see how far I could get, and what sort of problems I may run into with this approach. So far, I have finished setting things up and porting over the first few handful of fetch tests. I was able to get that to work/the modified tests to pass, and I think it represents a significant improvement to the readability of the tests, in addition to the realism benefits.

I intend on keep pushing in this direction and see how far I can get and what problems I ended up running into. Wanted to open a PR to collect some feedback and see if there are any obvious reasons not to do this/any major red flags.

Assuming I should continue the exploration, the next step would be to see if I can test the integration with the resource timing API without mocking/stubbing those globals. It's not the end of the world if we have to keep stubbing those, but since the point of this is to reduce reliance on mocking browser behavior, I think it's worth trying and I think it should be doable.

chancancode · 2024-12-18T18:35:32Z

Discussed this at the Dec 18 SIG meeting, the group seems generally onboard with trying this out, so I'll keep pushing forward on this and report any issues with the approach

Refactor `fetch()` tests for improved clearity, type safety and realism (relative to the real-world `fetch` behavior). See the inline comments on PR open-telemetry#5268 for a detailed explaination of the changes.

TypeScript has to be temporarily bumped because msw listed it as a optional peerDependency with an incompatibile version range, which is a strange thing to do IMO, but nevertheless this shouldn't be an issue in the long run, since open-telemetry#5145 will bump us to 5.0 anyway.

These tests doesn't actually need the complicated setup to work, so seperated them out into a new suite. Spiritually should be the same coverage as before, with a new test for `{ enabled: false }`.