Oura vs Whoop vs Apple Watch: The Quantified Scientist on What the Accuracy Data Shows
Rob ter Horst tracks his own body in extraordinary detail. Since 2018 he has had a brain MRI every single week, somewhere between 250 and 300 scans in total, which he suspects may make his the most-scanned brain in the world. He sleeps with an EEG on his head every night. He has analysed around 250 of his own gut microbiome samples. For several years the measuring took more than eleven hours a week. He built a following as The Quantified Scientist by testing wearables not on their features, but on whether their numbers hold up against laboratory-grade reference equipment.
The thing that changed his health the most was not a brain scan. It was a basic Fitbit, years ago, showing him he was sleeping under six hours a night without realising it. That tension runs through everything he says here: the difference between data that genuinely helps you and data that just makes you anxious, or poorer. We asked him the question our community asks more than any other, Oura versus Whoop versus Apple Watch, and then kept going: which sleep numbers to trust and which to ignore, when tracking starts doing more harm than good, whether a continuous glucose monitor is worth the money, and the two or three figures a busy person should actually watch.
What stands out is how often he tells you to track less, not more, and to trust the simple signals over the proprietary scores.
Here is our conversation.
1. Your own tracking setup is on another level: more than eleven hours of measurements a week, a weekly brain MRI, sleep EEG, even gut microbiome sampling. What started all this, and what is the most surprising thing you have learned about your own body?
It started during my PhD, which was in biological data analysis. At the time I was also doing a lot of science communication, and I found that the data analysis side was hard to explain, especially when the data was complex. So I wanted some data that was fun to work with. I bought a wearable, tracked my music listening, and correlated different artists to my heart rate while I listened to them, purely as a playful data science exercise. That gave me the idea to track a bit more and see what I could learn, partly as a teaching moment in data science and partly as a personal scientific project to run over several years. It then spiralled somewhat out of control. Since 2018 I have had a weekly brain MRI, and I suspect I have the most scanned brain in the world, with somewhere around 250 to 300 scans. I sleep with an EEG every night. I also did roughly 250 gut microbiome samples before I stopped, mostly because of the cost. The eleven hours ran for several years, and I have since reduced it by cutting the most time intensive parts, like detailed daily questionnaires and morning reaction speed tests.
The most surprising thing came from the simplest device. Early on I had a Fitbit Charge 2, just to track my sleep. I had not realised I was averaging under six hours a night, because I was doing so many things at once. Seeing that number in front of me, and seeing that it had been falling, is what changed my behaviour the most. Knowing something vaguely in the back of your mind is very different from seeing it laid out as data.
2. How did personal tracking turn into reviewing devices for an audience?
Once I was deep into tracking, I wanted to find the best wearables, but the reviews I could find were quite surface level on most metrics, with the possible exception of heart rate and, to a degree, GPS. So the idea sat in the back of my mind for a while. My barber was actually the one who suggested I review the devices myself. I did not have time at first, because I was starting a postdoc, but then the pandemic hit and I launched my YouTube channel using the data I already had from the Fitbit and the Oura Ring I was wearing. It grew organically from there.
3. The single thing our community asks about more than anything else is Oura versus Whoop versus Apple Watch. Going by your accuracy data rather than the marketing, how do they actually compare, and does the answer change depending on what someone wants to track?
Yes, the answer changes, because these devices have different target audiences in both their features and their accuracy. In simple terms, if you want raw data only, the Apple Watch gives you good quality numbers for sleep and heart rate, but it offers no interpretation and the visualisation is weak. To make sense of it you would need a translation layer, an additional app such as Bevel or Athlytic, though I have not tested those specifically because they do not generate new data of their own. Oura and Whoop both perform well and present the data far better. Oura is a little stronger during sleep, Whoop is a little stronger during exercise, and the Apple Watch is solid at both. Whoop leans more towards athletes, Oura leans more towards sleep metrics, and Oura's heart rate tracking during exercise is not that reliable. There are also newer competitors now, like the Google Fitbit, which performs quite well, at least on my body.
4. You have said there is no single best device, only the right combination for the person. For someone who just wants to buy one thing and get on with their life, how should they choose?
Start by deciding what you actually want to focus on: sleep, sport, general health, or all of it, and how much the app experience matters to you. From there the choice becomes clearer. If you want GPS, Garmin is one of the best, and Coros also does well. If you mainly care about sleep and want something small, the Oura Ring is very good. If you want something without a screen, more of a band, the Google Fitbit is a good starting device. If you are very active, the Whoop app is still a little better than the Fitbit's, but there is a big price gap, so you need to be sure you really need it. Interestingly, the data quality on my own body was better on the Google Fitbit than on the Whoop strap, though Whoop is still among the better performers. And if you also want a proper smartwatch experience, the Apple Watch or the Google Pixel Watch are good choices. It really comes down to what you need it to do, what your own constraints are, and battery life, which is quite poor on smartwatches.
5. You test these devices against proper reference equipment. Which popular metrics hold up well, which are far shakier than people assume, and is there a feature that markets itself brilliantly and then falls apart once you put it to the test?
My reference equipment ranges from great to good. My sleep tracking, for example, is good enough to make a judgement, even if it is not perfect. Heart rate holds up very well. It is well defined, some devices measure it really accurately, and it is easy to test across multiple people, so we can say with confidence that a given device does well for most users. Heart rate variability during the night is in a similar position.
Sleep stages are far shakier than people assume. They will always carry a relatively high error margin, because even with the best gold standard equipment in the world, two independent people analysing the same data will only agree around ninety percent of the time. Part of that is definitional. We treat deep sleep and REM sleep as discrete stages, but in reality there are transition phases, not all deep or REM sleep is the same, and there is even REM sleep without eye movement, despite REM standing for rapid eye movement. The definitions themselves are messy.
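To make the idea of scorer agreement concrete, here is a minimal Python sketch of how two hypnograms can be compared epoch by epoch. It is not Rob's analysis pipeline. The function names, the stage labels and the ten-epoch example data are all invented for illustration. It reports raw percent agreement alongside Cohen's kappa, which corrects for the agreement two scorers would reach by chance.

```python
from collections import Counter


def percent_agreement(scorer_a, scorer_b):
    """Fraction of epochs where both scorers assigned the same stage."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("hypnograms must be non-empty and equally long")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return matches / len(scorer_a)


def cohens_kappa(scorer_a, scorer_b):
    """Agreement corrected for chance, based on each scorer's stage frequencies."""
    n = len(scorer_a)
    observed = percent_agreement(scorer_a, scorer_b)
    counts_a, counts_b = Counter(scorer_a), Counter(scorer_b)
    labels = set(counts_a) | set(counts_b)
    # Probability that both scorers pick the same label purely by chance.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: one stage label per 30-second epoch, from two scorers.
night_a = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light", "light", "wake"]
night_b = ["wake", "light", "deep", "deep", "deep", "rem", "light", "light", "light", "wake"]

print(f"agreement: {percent_agreement(night_a, night_b):.2f}")
print(f"kappa:     {cohens_kappa(night_a, night_b):.2f}")
```

The same comparison works for a wearable against a reference recording: treat the device as one scorer and the reference as the other. Because the reference itself carries scoring disagreement, a device cannot be expected to match it perfectly.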
As for a feature that markets brilliantly and then collapses, I would not point to one in particular. What I see instead is a lot of general overselling. A company will claim great sleep stage tracking, then a year later release an improved version, which quietly implies the first claim was overstated. It is less one failed feature and more a pattern of overselling quality across the board.
6. What does it actually take to test one device properly, and what is the part of the process people would be surprised by?
I sleep with an EEG on my head every night. It is not full polysomnography, but it is several electrodes in a fairly sleek package. Analysing the data takes coding, or at least a data science setup that most people would not be able to build themselves. Every seven days I take out the SD card, analyse the data, and put it back in. So there is a lot of manual work involved. The part people are probably most surprised by is the human side of it. The device has blinking lights, which I mostly turn off at night, but anyone who has shared a bed with me might not have considered it the most attractive look.
7. Sleep tracking is everywhere now. How close are consumer devices to getting your sleep stages right, and how much should someone trust that nightly sleep score?
These are two separate things. Sleep stages are one thing, the sleep score is another. During the night it is worth tracking several signals together, not just stages but also heart rate, heart rate variability, and how those patterns move across the night. If your heart rate drops slowly compared with normal, for instance, that might tell you that you ate too large a meal or trained too late before bed. The sleep score then combines several of these signals, but those scores are proprietary, they are not comparable between devices, and sometimes we do not even know how they are calculated, which makes them hard to interpret.
On the stages themselves, consumer devices are getting a lot better, and with AI and foundation models I expect them to keep improving, potentially reaching the level of a very good EEG before long, close to polysomnography even if not quite matching it. Some devices are already quite good. My advice is to interpret patterns over time rather than fixating on single nights. Big outlier days are the interesting ones and the more reliable signal. Subtle changes I would treat with more caution, unless they persist over a long period.
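One way to act on "big outliers, not subtle changes" is to compare each night with your own recent baseline. The sketch below is an illustration only, not a method Rob describes. The function name, the 14-night window, the 3.5 threshold and the sample values are all assumptions. It uses the median and the median absolute deviation, so a single odd night does not distort the baseline it is judged against.

```python
from statistics import median


def flag_outlier_nights(values, window=14, threshold=3.5):
    """Return indices of nights that deviate sharply from the trailing baseline.

    values: one number per night (for example overnight HRV in ms).
    window and threshold are illustrative defaults, not validated cut-offs.
    """
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        centre = median(baseline)
        mad = median([abs(v - centre) for v in baseline])
        if mad == 0:
            continue  # baseline has no spread, so a robust score is undefined
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        robust_z = 0.6745 * (values[i] - centre) / mad
        if abs(robust_z) >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical overnight HRV values in milliseconds, one per night.
hrv = [52, 55, 53, 54, 56, 51, 53, 55, 54, 52, 53, 56, 54, 55, 54, 31, 53]
print(flag_outlier_nights(hrv))
```

A night that is only slightly below the baseline is deliberately ignored here, which matches the advice to treat subtle shifts with caution unless they persist.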
8. Some people check their recovery score before they even get out of bed. When does tracking genuinely improve someone's health, and when does it start doing more harm than good?
This is very personal and hard to answer completely. There has been research showing that for some people tracking probably does more harm than good, while for others it is neutral or positive. Some people are sensitive to it: when the tracker reports a bad night, they discount how they actually feel and become so influenced by the number that their sleep quality genuinely suffers. Hopefully coaching, which can already be done to some degree by AI and will get better, can help interpret the data more gently. It becomes harmful when a metric turns into too rigid a goal, like needing a sleep score above a certain number every day, or insisting on a fixed percentage of deep sleep. Used well, the data is a guide alongside your subjective feelings and what actually matters in your life. If a friend is getting married, it is probably worth losing some sleep to be there and form that memory. Scrolling on TikTok is probably not. These things are rarely black and white.
9. For a busy person who is curious but does not want to obsess, which two or three numbers are actually worth paying attention to?
Again it depends on your goal. If you want to get fitter, watch your resting heart rate during the night and treat a falling number as a sign your fitness is improving. Beyond that, pay attention to big outliers in heart rate variability and temperature. If those deviate sharply, it can signal overtraining or stress, with heart rate variability being especially useful for overtraining, or simply that you need more recovery. If a lot of signals are out of order at once, it can mean you are getting sick, which is more of an immediate effect and might change whether you train that day. Personally I also use heart rate to train in specific zones, both high and low intensity, but you do not strictly need a device for that. A conversational pace, where you can still talk fairly comfortably, already tells you a lot. The tracking just makes it a little easier. And it is worth remembering that most of these numbers are proxies for what you are really trying to achieve.
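For the resting heart rate point, the number that matters is the direction over weeks, not any single night. As a rough sketch under stated assumptions, the snippet below fits a straight line through nightly values and reports the change per week. The function name and the four weeks of sample data are made up, and a straight-line fit is only one simple choice of trend measure.

```python
def resting_hr_trend_per_week(nightly_rhr):
    """Least-squares slope of nightly resting heart rate, in beats per minute per week.

    nightly_rhr: one value per consecutive night, oldest first.
    A negative result means resting heart rate is falling over the period.
    """
    n = len(nightly_rhr)
    if n < 2:
        raise ValueError("need at least two nights to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(nightly_rhr) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(nightly_rhr))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    return 7 * covariance / variance


# Hypothetical data: four weeks of nightly resting heart rate drifting downwards.
four_weeks = [60 - 0.1 * night for night in range(28)]
print(f"{resting_hr_trend_per_week(four_weeks):+.2f} bpm per week")
```

Several weeks of data are needed before the slope means much, since illness, alcohol or a late meal can move individual nights far more than fitness does.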
11. Continuous glucose monitors are now marketed to healthy people who want to optimise their energy and metabolism. Based on the evidence, who actually benefits from wearing one, and who is mostly paying for expensive curiosity?
I should be upfront that this is not my area of expertise. I have used them, but I have not done a deep dive, so take this as a less informed opinion. My expectation is that the biggest benefit would come at population level. If a large fraction of people used a CGM occasionally, we would find many people with diabetes or pre-diabetes who do not yet know they have it, and they could seek treatment and ideally make lifestyle changes. For people simply wanting to optimise energy and metabolism, it can be interesting to test which foods cause spikes, and the low values that often follow, and then reduce those fluctuations if you find they affect your energy. That is worth doing for a while. I do not think it is worth wearing one continuously. They are expensive and not reusable, and for almost anyone other than a diabetic managing their intake and insulin, it does not make sense. So for the general population it is mostly an expensive curiosity. If they became much cheaper, though, the screening value alone, catching people who are pre-diabetic or diabetic without knowing it, could genuinely extend their healthspan.
12. All of these devices collect intimate health data around the clock. How much should people worry about where that data goes, and do you factor privacy into what you recommend?
I have not dug deeply into this, so I cannot say a great deal. The data can certainly be used for marketing, and in theory it could be used by insurance companies to deny claims or argue pre-existing conditions. I do not think that has happened, but it is hard to know what the future holds. I have not factored privacy into my recommendations so far. It would be good if companies at least promised not to sell the data, and I doubt all of them do. Apple Health is among the more local platforms, less accessible to outsiders, whereas most platforms store data in the cloud where the manufacturer can access it. Even a promise not to sell can be undone by a hack, so the data is still out there. How much it should worry you depends on how concerned you are that it could be used in a way that works against you.
13. Where is wearable tech genuinely heading next, things like blood pressure on the wrist or needle-free sensing, and what are you most excited or sceptical about?
I think we will be able to infer more and more from the sensors we already have. We are already trying to estimate the risk of high blood pressure from existing wearable sensors without adding any new hardware, and this will keep improving with foundation models and ways of integrating AI into the ecosystem. I am quite sceptical of the rush to put AI into everything. For some things in this space it genuinely makes sense, but bolting chatbots onto everything is a bit ridiculous in some cases. Personalised supplements, for instance, are overstated at the moment. There is a lot of generalisability, and while unique responses certainly exist, we are not good at predicting them from the data we currently have, short of people simply trying them and seeing what works. That said, I do not consider that my area either.
What I am genuinely interested in is the recent Google research introducing a sensor foundation model, which strikes me as proper, serious work. Integrated responsibly, it could help hint at the risk of certain diseases and nudge people in the right direction. And what I am most excited about is reliable blood pressure sensing on the wrist, to warn people of hypertension, with the caveat that we tend to overestimate how much the average person actually changes their lifestyle. Alongside that, needle-free continuous glucose sensing from the wrist, without buying disposable sensors, could diagnose a lot of people with prediabetes who could then seek medical attention. Those two are the developments I would be most excited about.
Author: Rob ter Horst
I run "The Quantified Scientist" on YouTube