Reducing annotation burden in physical activity research using vision language models
Wasfy, M. M. & Lee, I.-M. Examining the dose–response relationship between physical activity and health outcomes. NEJM Evid. 1(12), EVIDra2200190 (2022).
Servais, L. et al. First regulatory qualification of a digital primary endpoint to measure treatment efficacy in DMD. Nat. Med. 29(10), 2391–2392 (2023).
Troiano, R. P., Stamatakis, E. & Bull, F. C. How can global physical activity surveillance adapt to evolving physical activity guidelines? Needs, challenges and future directions. Br. J. Sports Med. 54(24), 1468–1473 (2020).
Logacjov, A., Herland, S., Ustad, A. & Bach, K. SelfPAB: Large-scale pre-training on accelerometer data for human activity recognition. Appl. Intell. 54(6), 4545–4563 (2024).
Yuan, H. et al. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ Digit. Med. 7(1), 91 (2024).
Walmsley, R. et al. Reallocation of time between device-measured movement behaviours and risk of incident cardiovascular disease. Br. J. Sports Med. 56(18), 1008–1017 (2022).
Willetts, M., Hollowell, S., Aslett, L., Holmes, C. & Doherty, A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci. Rep. 8(1), 7961 (2018).
Doherty, A. et al. Large scale population assessment of physical activity using wrist worn accelerometers: The UK Biobank study. PLoS ONE 12(2), e0169649 (2017).
Bao, L. & Intille, S. S. Activity recognition from user-annotated acceleration data. In International Conference on Pervasive Computing, 1–17 (Springer, 2004).
Keadle, S. K., Lyden, K. A., Strath, S. J., Staudenmayer, J. W. & Freedson, P. S. A framework to evaluate devices that assess physical behavior. Exerc. Sport Sci. Rev. 47(4), 206–214 (2019).
Thomaz, E. & Dimiccoli, M. Acquisition and analysis of camera sensor data (lifelogging). In Mobile Sensing in Psychology: Methods and Applications, 277 (2023).
Tufte, E. R. The Visual Display of Quantitative Information 2nd edn. (Graphics Press, 2002).
Tremblay, M. S. et al. Sedentary behavior research network (SBRN)-terminology consensus project process and outcome. Int. J. Behav. Nutr. Phys. Act. 14, 1–17 (2017).
Ainsworth, B. E. et al. 2011 compendium of physical activities: A second update of codes and met values. Med. Sci. Sports Exerc. 43(8), 1575–1581 (2011).
Keadle, S. K. et al. Using computer vision to annotate video-recoded direct observation of physical behavior. Sensors 24(7), 2359 (2024).
Schalkamp, A.-K., Peall, K. J., Harrison, N. A. & Sandor, C. Wearable movement-tracking data identify Parkinson’s disease years before clinical diagnosis. Nat. Med. 29(8), 2048–2056 (2023).
Shreves, A. H., Small, S. R., Travis, R. C., Matthews, C. E. & Doherty, A. Dose–response of accelerometer-measured physical activity, step count, and cancer risk in the UK Biobank: A prospective cohort analysis. Lancet 402, S83 (2023).
Bull, F. C. et al. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br. J. Sports Med. 54(24), 1451–1462 (2020).
Chan, S. et al. Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition. Sci. Data 11(1), 1135 (2024).
Kelly, P. et al. An ethical framework for automated, wearable cameras in health behavior research. Am. J. Prev. Med. 44(3), 314–319 (2013).
Ainsworth, B. E., Herrmann, S. D., Jacobs Jr, D. R., Whitt-Glover, M. C. & Tudor-Locke, C. A brief history of the compendium of physical activities. J. Sport Health Sci. 13(1), 3 (2024).
Bureau of Labor Statistics. American Time Use Survey, 2024. Accessed 13 May 2024.
Herath, S., Harandi, M. & Porikli, F. Going deeper into action recognition: A survey. Image Vis. Comput. 60, 4–21 (2017).
Chen, Y. et al. Device-measured movement behaviours in over 20,000 China Kadoorie Biobank participants. Int. J. Behav. Nutr. Phys. Act. 20(1), 138 (2023).
Byrne, N. M., Hills, A. P., Hunter, G. R., Weinsier, R. L. & Schutz, Y. Metabolic equivalent: One size does not fit all. J. Appl. Physiol. 99, 1112–1119 (2005).
Walmsley, R. Device-Measured 24-Hour Movement Behaviours and Risk of Incident Cardiovascular Disease. PhD thesis, University of Oxford (2022).
Kozey, S. L., Lyden, K., Howe, C. A., Staudenmayer, J. W. & Freedson, P. S. Accelerometer output and MET values of common physical activities. Med. Sci. Sports Exerc. 42(9), 1776 (2010).
Pober, D. M., Staudenmayer, J., Raphael, C. & Freedson, P. S. Development of novel techniques to classify physical activity mode using accelerometers. Med. Sci. Sports Exerc. 38(9), 1626 (2006).
Montoye, A. H. K., Begum, M., Henning, Z. & Pfeiffer, K. A. Comparison of linear and non-linear models for predicting energy expenditure from raw accelerometer data. Physiol. Meas. 38(2), 343–357 (2017).
Hills, A. P., Mokhtar, N. & Byrne, N. M. Assessment of physical activity and energy expenditure: An overview of objective measures. Front. Nutr. 1, 5 (2014).
Kim, Y., Barry, V. W. & Kang, M. Validation of the ActiGraph GT3X and activPAL accelerometers for the assessment of sedentary behavior. Meas. Phys. Educ. Exerc. Sci. 19(3), 125–137 (2015).
Kerr, J. et al. Using the SenseCam to improve classifications of sedentary behavior in free-living settings. Am. J. Prev. Med. 44(3), 290–296 (2013).
Chasan-Taber, L. et al. Update and novel validation of a pregnancy physical activity questionnaire. Am. J. Epidemiol. 192(10), 1743–1753 (2023).
Nawab, K. A. et al. Accelerometer-measured physical activity and functional behaviours among people on dialysis. Clin. Kidney J. 14(3), 950–958 (2021).
Martinez, J. Accuracy and Precision of Wearable Camera Media Annotations to Estimate Dimensions of Physical Activity and Sedentary Behavior. PhD thesis, University of Wisconsin-Milwaukee (2024).
Giurgiu, M. et al. Quality evaluation of free-living validation studies for the assessment of 24-hour physical behavior in adults via wearables: Systematic review. JMIR mHealth uHealth 10(6), e36377 (2022).
Femiano, R., Werner, C., Wilhelm, M. & Eser, P. Validation of open-source step-counting algorithms for wrist-worn tri-axial accelerometers in cardiovascular patients. Gait Posture 92, 206–211 (2022).
Alphen, H. J. M., Waninge, A., Minnaert, A. E. M. G., Post, W. J. & Putten, A. A. J. Construct validity of the Actiwatch-2 for assessing movement in people with profound intellectual and multiple disabilities. J. Appl. Res. Intell. Disabil. 34(1), 99–110 (2021).
Bach, K. et al. A machine learning classifier for detection of physical activity types and postures during free-living. J. Meas. Phys. Behav. 5(1), 24–31 (2021).
Marcotte, R. T. et al. Estimating sedentary time from a hip- and wrist-worn accelerometer. Med. Sci. Sports Exerc. 52(1), 225 (2020).
Koenders, N. et al. Validation of a wireless patch sensor to monitor mobility tested in both an experimental and a hospital setup: A cross-sectional study. PLoS ONE 13(10), e0206304 (2018).
Gershuny, J. et al. Testing self-report time-use diaries against objective instruments in real time. Sociol. Methodol. 50(1), 318–349 (2020).
Doherty, A. et al. GWAS identifies 14 loci for device-measured physical activity and sleep duration. Nat. Commun. 9(1), 1–8 (2018).
Mann, S. Wearable computing: A first step toward personal imaging. Computer 30(2), 25–32 (1997).
Aizawa, K., Ishijima, K. & Shiina, M. Summarizing wearable video. In Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), Vol. 3, 398–401 (IEEE, 2001).
Bush, V. As we may think. Atl. Mon. 176(1), 101–108 (1945).
Feichtenhofer, C., Fan, H., Malik, J. & He, K. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6202–6211 (2019).
Zhang, C.-L., Wu, J. & Li, Y. Actionformer: Localizing moments of actions with transformers. In European Conference on Computer Vision, 492–510 (Springer, 2022).
Momeni, L., Caron, M., Nagrani, A., Zisserman, A. & Schmid, C. Verbs in action: Improving verb understanding in video-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 15579–15591 (2023).
Grauman, K. et al. Ego4d: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18995–19012 (2022).
Lin, K. Q. et al. Egocentric video-language pretraining. Adv. Neural Inf. Process. Syst. 35, 7575–7586 (2022).
Pramanick, S. et al. Egovlpv2: Egocentric video-language pre-training with fusion in the backbone. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5285–5297 (2023).
Bock, M., Van Laerhoven, K. & Moeller, M. Weak-annotation of HAR datasets using vision foundation models. In Proceedings of the 2024 ACM International Symposium on Wearable Computers, ISWC ’24, 55–62 (Association for Computing Machinery, New York, NY, USA, 2024).
Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763 (PMLR, 2021).
Oquab, M. et al. DINOv2: Learning robust visual features without supervision. arXiv:2304.07193 [cs] (2024).
Carreira, J. & Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6299–6308 (2017).
Wang, P. & Smeaton, A. F. Using visual lifelogs to automatically characterize everyday activities. Inf. Sci. 230, 147–161 (2013).
Moghimi, M. et al. Analyzing sedentary behavior in life-logging images. In 2014 IEEE International Conference on Image Processing (ICIP), 1011–1015 (IEEE, 2014).
Castro, D. et al. Predicting daily activities from egocentric images using deep learning. In Proceedings of the 2015 ACM International Symposium on Wearable Computers, 75–82 (2015).
Cartas, A., Marín, J., Radeva, P. & Dimiccoli, M. Recognizing activities of daily living from egocentric images. In Pattern Recognition and Image Analysis: 8th Iberian Conference, IbPRIA 2017, Faro, Portugal, June 20–23, 2017, Proceedings 8, 87–95 (Springer, 2017).
Cartas, A., Radeva, P. & Dimiccoli, M. Activities of daily living monitoring via a wearable camera: Toward real-world applications. IEEE Access 8, 77344–77363 (2020).
Cartas, A., Talavera, E., Radeva, P. & Dimiccoli, M. Understanding event boundaries for egocentric activity recognition from photo-streams. In International Conference on Pattern Recognition, 334–347 (Springer, 2021).
Damen, D. et al. Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100. Int. J. Comput. Vis. 1–23 (2022).
Grauman, K. et al. Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19383–19400 (2024).
Li, C. et al. Multimodal foundation models: From specialists to general-purpose assistants. Found. Trends Comput. Graph. Vis. 16(1–2), 1–214 (2024).
Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 36, 34892–34916 (2024).
Schuhmann, C. et al. LAION-5b: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2022).
Deng, J. et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).
Udandarao, V. et al. No “zero-shot” without exponential data: Pretraining concept frequency determines multimodal model performance. arXiv preprint arXiv:2404.04125 (2024).
Reimers, N. & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics, 2019).
Hastie, T., Tibshirani, R., Friedman, J. H. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction Vol. 2 (Springer, 2009).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Li, J., Li, D., Savarese, S. & Hoi, S. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023).
Chung, H. W. et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416 (2022).
Wolf, T. et al. Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv:2010.11929 [cs] (2021).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).
Muller, S. G. & Hutter, F. TrivialAugment: Tuning-free yet state-of-the-art data augmentation. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 754–762 (IEEE, 2021).
Mirza, M. J. et al. Lafter: Label-free tuning of zero-shot classifier using language and unlabeled image collections. Adv. Neural Inf. Process. Syst. 36 (2023).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33(1), 159 (1977).
Keadle, S. K. et al. Evaluation of within-and between-site agreement for direct observation of physical behavior across four research groups. J. Meas. Phys. Behav. 1(aop), 1–9 (2023).
Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).
Fang, H.-S. et al. Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Trans. Pattern Anal. Mach. Intell. 45(6), 7157–7173 (2022).
Martinez, J. et al. Validation of wearable camera still images to assess posture in free-living conditions. J. Meas. Phys. Behav. 4, 47–52 (2021).
Wang, L. et al. Parameter-efficient fine-tuning in large language models: A survey of methodologies. Artif. Intell. Rev. 58(8), 227 (2025).
Gu, J. et al. A systematic survey of prompt engineering on vision-language foundation models. arXiv:2307.12980 [cs] (2023).
Tran, Q.-L., Nguyen, B., Jones, G. J. F. & Gurrin, C. Memorilens: A low-cost lifelog camera using Raspberry Pi Zero. In Proceedings of the 2024 International Conference on Multimedia Retrieval, 1255–1259 (2024).
Mamish, J. et al. Nir-sighted: A programmable streaming architecture for low-energy human-centric vision applications. ACM Trans. Embed. Comput. Syst. 23, 1–26 (2024).
Pei, B. et al. EgoVideo: Exploring egocentric foundation model and downstream adaptation. arXiv:2406.18070 [cs] (2024).
Doherty, A. R. et al. Use of wearable cameras to assess population physical activity behaviours: An observational study. Lancet 380, S35 (2012).
Gage, R. et al. Fun, food and friends: A wearable camera analysis of children’s school journeys. J. Transp. Health 30, 101604 (2023).
Mok, T. M., Cornish, F. & Tarr, J. Too much information: Visual research ethics in the age of wearable cameras. Integr. Psychol. Behav. Sci. 49, 309–322 (2015).
Meyer, L. E. et al. Using wearable cameras to investigate health-related daily life experiences: A literature review of precautions and risks in empirical studies. Res. Ethics 18(1), 64–83 (2022).
