-
Extending Playwright: Mobile, Lighthouse, and Storybook
An experiment in extending the core Playwright runner to natively support Appium, inline Lighthouse audits, Storybook discovery, and duration-based test sharding.
-
Benchmarking LLMs on real test-automation work
Twenty models, scored by what their generated tests actually do against a sandbox. The models perform exceptionally well at unit-test generation but stall on the engineering judgment that maintaining a real suite requires.