Reducing annotation burden in physical activity research using vision language models

References

  • Wasfy, M. M. & Lee, I.-M. Examining the dose–response relationship between physical activity and health outcomes. NEJM Evid. 1(12), EVIDra2200190 (2022).

  • Servais, L. et al. First regulatory qualification of a digital primary endpoint to measure treatment efficacy in DMD. Nat. Med. 29(10), 2391–2392 (2023).

  • Troiano, R. P., Stamatakis, E. & Bull, F. C. How can global physical activity surveillance adapt to evolving physical activity guidelines? Needs, challenges and future directions. Br. J. Sports Med. 54(24), 1468–1473 (2020).

  • Logacjov, A., Herland, S., Ustad, A. & Bach, K. SelfPAB: Large-scale pre-training on accelerometer data for human activity recognition. Appl. Intell. 54(6), 4545–4563 (2024).

  • Yuan, H. et al. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ Digit. Med. 7(1), 91 (2024).

  • Walmsley, R. et al. Reallocation of time between device-measured movement behaviours and risk of incident cardiovascular disease. Br. J. Sports Med. 56(18), 1008–1017 (2022).

  • Willetts, M., Hollowell, S., Aslett, L., Holmes, C. & Doherty, A. Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Sci. Rep. 8(1), 7961 (2018).

  • Doherty, A. et al. Large scale population assessment of physical activity using wrist worn accelerometers: The UK Biobank study. PLoS ONE 12(2), e0169649 (2017).

  • Bao, L. & Intille, S. S. Activity recognition from user-annotated acceleration data. In International Conference on Pervasive Computing, 1–17 (Springer, 2004).

  • Keadle, S. K., Lyden, K. A., Strath, S. J., Staudenmayer, J. W. & Freedson, P. S. A framework to evaluate devices that assess physical behavior. Exerc. Sport Sci. Rev. 47(4), 206–214 (2019).

  • Thomaz, E. & Dimiccoli, M. Acquisition and analysis of camera sensor data (lifelogging). In Mobile Sensing in Psychology: Methods and Applications, 277 (2023).

  • Tufte, E. R. The Visual Display of Quantitative Information 2nd edn. (Graphics Press, 2002).

  • Tremblay, M. S. et al. Sedentary behavior research network (SBRN)-terminology consensus project process and outcome. Int. J. Behav. Nutr. Phys. Act. 14, 1–17 (2017).

  • Ainsworth, B. E. et al. 2011 compendium of physical activities: A second update of codes and MET values. Med. Sci. Sports Exerc. 43(8), 1575–1581 (2011).

  • Keadle, S. K. et al. Using computer vision to annotate video-recoded direct observation of physical behavior. Sensors 24(7), 2359 (2024).

  • Schalkamp, A.-K., Peall, K. J., Harrison, N. A. & Sandor, C. Wearable movement-tracking data identify Parkinson’s disease years before clinical diagnosis. Nat. Med. 29(8), 2048–2056 (2023).

  • Shreves, A. H., Small, S. R., Travis, R. C., Matthews, C. E. & Doherty, A. Dose–response of accelerometer-measured physical activity, step count, and cancer risk in the UK Biobank: A prospective cohort analysis. Lancet 402, S83 (2023).

  • Bull, F. C. et al. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br. J. Sports Med. 54(24), 1451–1462 (2020).

  • Chan, S. et al. Capture-24: A large dataset of wrist-worn activity tracker data collected in the wild for human activity recognition. Sci. Data 11(1), 1135 (2024).

  • Kelly, P. et al. An ethical framework for automated, wearable cameras in health behavior research. Am. J. Prev. Med. 44(3), 314–319 (2013).

  • Ainsworth, B. E., Herrmann, S. D., Jacobs Jr, D. R., Whitt-Glover, M. C. & Tudor-Locke, C. A brief history of the compendium of physical activities. J. Sport Health Sci. 13(1), 3 (2024).

  • Bureau of Labor Statistics. American Time Use Survey, 2024. Accessed 13 May 2024.

  • Herath, S., Harandi, M. & Porikli, F. Going deeper into action recognition: A survey. Image Vis. Comput. 60, 4–21 (2017).

  • Chen, Y. et al. Device-measured movement behaviours in over 20,000 China Kadoorie Biobank participants. Int. J. Behav. Nutr. Phys. Act. 20(1), 138 (2023).

  • Byrne, N. M., Hills, A. P., Hunter, G. R., Weinsier, R. L. & Schutz, Y. Metabolic equivalent: One size does not fit all. J. Appl. Physiol. 99, 1112–1119 (2005).

  • Walmsley, R. Device-Measured 24-Hour Movement Behaviours and Risk of Incident Cardiovascular Disease. PhD thesis, University of Oxford (2022).

  • Kozey, S. L., Lyden, K., Howe, C. A., Staudenmayer, J. W. & Freedson, P. S. Accelerometer output and MET values of common physical activities. Med. Sci. Sports Exerc. 42(9), 1776 (2010).

  • Pober, D. M., Staudenmayer, J., Raphael, C. & Freedson, P. S. Development of novel techniques to classify physical activity mode using accelerometers. Med. Sci. Sports Exerc. 38(9), 1626 (2006).

  • Montoye, A. H. K., Begum, M., Henning, Z. & Pfeiffer, K. A. Comparison of linear and non-linear models for predicting energy expenditure from raw accelerometer data. Physiol. Meas. 38(2), 343–357 (2017).

  • Hills, A. P., Mokhtar, N. & Byrne, N. M. Assessment of physical activity and energy expenditure: An overview of objective measures. Front. Nutr. 1, 5 (2014).

  • Kim, Y., Barry, V. W. & Kang, M. Validation of the ActiGraph GT3X and activPAL accelerometers for the assessment of sedentary behavior. Meas. Phys. Educ. Exerc. Sci. 19(3), 125–137 (2015).

  • Kerr, J. et al. Using the SenseCam to improve classifications of sedentary behavior in free-living settings. Am. J. Prev. Med. 44(3), 290–296 (2013).

  • Chasan-Taber, L. et al. Update and novel validation of a pregnancy physical activity questionnaire. Am. J. Epidemiol. 192(10), 1743–1753 (2023).

  • Nawab, K. A. et al. Accelerometer-measured physical activity and functional behaviours among people on dialysis. Clin. Kidney J. 14(3), 950–958 (2021).

  • Martinez, J. Accuracy and Precision of Wearable Camera Media Annotations to Estimate Dimensions of Physical Activity and Sedentary Behavior. PhD thesis, University of Wisconsin-Milwaukee (2024).

  • Giurgiu, M. et al. Quality evaluation of free-living validation studies for the assessment of 24-hour physical behavior in adults via wearables: Systematic review. JMIR mHealth uHealth 10(6), e36377 (2022).

  • Femiano, R., Werner, C., Wilhelm, M. & Eser, P. Validation of open-source step-counting algorithms for wrist-worn tri-axial accelerometers in cardiovascular patients. Gait Posture 92, 206–211 (2022).

  • Alphen, H. J. M., Waninge, A., Minnaert, A. E. M. G., Post, W. J. & Putten, A. A. J. Construct validity of the Actiwatch-2 for assessing movement in people with profound intellectual and multiple disabilities. J. Appl. Res. Intell. Disabil. 34(1), 99–110 (2021).

  • Bach, K. et al. A machine learning classifier for detection of physical activity types and postures during free-living. J. Meas. Phys. Behav. 5(1), 24–31 (2021).

  • Marcotte, R. T. et al. Estimating sedentary time from a hip- and wrist-worn accelerometer. Med. Sci. Sports Exerc. 52(1), 225 (2020).

  • Koenders, N. et al. Validation of a wireless patch sensor to monitor mobility tested in both an experimental and a hospital setup: A cross-sectional study. PLoS ONE 13(10), e0206304 (2018).

  • Gershuny, J. et al. Testing self-report time-use diaries against objective instruments in real time. Sociol. Methodol. 50(1), 318–349 (2020).

  • Doherty, A. et al. GWAS identifies 14 loci for device-measured physical activity and sleep duration. Nat. Commun. 9(1), 1–8 (2018).

  • Mann, S. Wearable computing: A first step toward personal imaging. Computer 30(2), 25–32 (1997).

  • Aizawa, K., Ishijima, K. & Shiina, M. Summarizing wearable video. In Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), Vol. 3, 398–401 (IEEE, 2001).

  • Bush, V. As we may think. Atl. Mon. 176(1), 101–108 (1945).

  • Feichtenhofer, C., Fan, H., Malik, J. & He, K. SlowFast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6202–6211 (2019).

  • Zhang, C.-L., Wu, J. & Li, Y. ActionFormer: Localizing moments of actions with transformers. In European Conference on Computer Vision, 492–510 (Springer, 2022).

  • Momeni, L., Caron, M., Nagrani, A., Zisserman, A. & Schmid, C. Verbs in action: Improving verb understanding in video-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 15579–15591 (2023).

  • Grauman, K. et al. Ego4D: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18995–19012 (2022).

  • Lin, K. Q. et al. Egocentric video-language pretraining. Adv. Neural Inf. Process. Syst. 35, 7575–7586 (2022).

  • Pramanick, S. et al. EgoVLPv2: Egocentric video-language pre-training with fusion in the backbone. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 5285–5297 (2023).

  • Bock, M., Van Laerhoven, K. & Moeller, M. Weak-annotation of HAR datasets using vision foundation models. In Proceedings of the 2024 ACM International Symposium on Wearable Computers, ISWC ’24, 55–62 (Association for Computing Machinery, New York, NY, USA, 2024).

  • Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763 (PMLR, 2021).

  • Oquab, M. et al. DINOv2: Learning robust visual features without supervision. arXiv:2304.07193 [cs] (2024).

  • Carreira, J. & Zisserman, A. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6299–6308 (2017).

  • Wang, P. & Smeaton, A. F. Using visual lifelogs to automatically characterize everyday activities. Inf. Sci. 230, 147–161 (2013).

  • Moghimi, M., Wu, W., Chen, J., Godbole, S., Marshall, S., Kerr, J. & Belongie, S. Analyzing sedentary behavior in life-logging images. In 2014 IEEE International Conference on Image Processing (ICIP), 1011–1015 (IEEE, 2014).

  • Castro, D., Hickson, S., Bettadapura, V., Thomaz, E., Abowd, G., Christensen, H. & Essa, I. Predicting daily activities from egocentric images using deep learning. In Proceedings of the 2015 ACM International Symposium on Wearable Computers, 75–82 (2015).

  • Cartas, A., Marín, J., Radeva, P. & Dimiccoli, M. Recognizing activities of daily living from egocentric images. In Pattern Recognition and Image Analysis: 8th Iberian Conference, IbPRIA 2017, Faro, Portugal, June 20–23, 2017, Proceedings 8, 87–95 (Springer, 2017).

  • Cartas, A., Radeva, P. & Dimiccoli, M. Activities of daily living monitoring via a wearable camera: Toward real-world applications. IEEE Access 8, 77344–77363 (2020).

  • Cartas, A., Talavera, E., Radeva, P. & Dimiccoli, M. Understanding event boundaries for egocentric activity recognition from photo-streams. In International Conference on Pattern Recognition, 334–347 (Springer, 2021).

  • Damen, D. et al. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. Int. J. Comput. Vis. 1–23 (2022).

  • Grauman, K. et al. Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19383–19400 (2024).

  • Li, C. et al. Multimodal foundation models: From specialists to general-purpose assistants. Found. Trends Comput. Graph. Vis. 16(1–2), 1–214 (2024).

  • Liu, H., Li, C., Wu, Q. & Lee, Y. J. Visual instruction tuning. Adv. Neural Inf. Process. Syst. 36, 34892–34916 (2024).

  • Schuhmann, C. et al. LAION-5B: An open large-scale dataset for training next generation image-text models. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2022).

  • Deng, J. et al. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).

  • Udandarao, V. et al. No “zero-shot” without exponential data: Pretraining concept frequency determines multimodal model performance. arXiv preprint arXiv:2404.04125 (2024).

  • Reimers, N. & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics, 2019).

  • Hastie, T., Tibshirani, R., Friedman, J. H. & Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction Vol. 2 (Springer, 2009).

  • Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  • Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  • Li, J., Li, D., Savarese, S. & Hoi, S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023).

  • Chung, H. W. et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416 (2022).

  • Wolf, T. et al. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019).

  • He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).

  • Dosovitskiy, A. et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv:2010.11929 [cs] (2021).

  • Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).

  • Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997).

  • Müller, S. G. & Hutter, F. TrivialAugment: Tuning-free yet state-of-the-art data augmentation. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 754–762 (IEEE, Montreal, 2021).

  • Mirza, M. J. et al. LaFTer: Label-free tuning of zero-shot classifier using language and unlabeled image collections. Adv. Neural Inf. Process. Syst. 36 (2023).

  • Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33(1), 159 (1977).

  • Keadle, S. K. et al. Evaluation of within- and between-site agreement for direct observation of physical behavior across four research groups. J. Meas. Phys. Behav. 1(aop), 1–9 (2023).

  • Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794 (2016).

  • Fang, H.-S. et al. AlphaPose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE Trans. Pattern Anal. Mach. Intell. 45(6), 7157–7173 (2022).

  • Martinez, J. et al. Validation of wearable camera still images to assess posture in free-living conditions. J. Meas. Phys. Behav. 4, 47–52 (2021).

  • Wang, L. et al. Parameter-efficient fine-tuning in large language models: A survey of methodologies. Artif. Intell. Rev. 58(8), 227 (2025).

  • Gu, J. et al. A systematic survey of prompt engineering on vision-language foundation models. arXiv:2307.12980 [cs] (2023).

  • Tran, Q.-L., Nguyen, B., Jones, G. J. F. & Gurrin, C. MemoriLens: A low-cost lifelog camera using Raspberry Pi Zero. In Proceedings of the 2024 International Conference on Multimedia Retrieval, 1255–1259 (2024).

  • Mamish, J. et al. NIR-sighted: A programmable streaming architecture for low-energy human-centric vision applications. ACM Trans. Embed. Comput. Syst. 23, 1–26 (2024).

  • Pei, B. et al. EgoVideo: Exploring egocentric foundation model and downstream adaptation. arXiv:2406.18070 [cs] (2024).

  • Doherty, A. R. et al. Use of wearable cameras to assess population physical activity behaviours: An observational study. Lancet 380, S35 (2012).

  • Gage, R. et al. Fun, food and friends: A wearable camera analysis of children’s school journeys. J. Transp. Health 30, 101604 (2023).

  • Mok, T. M., Cornish, F. & Tarr, J. Too much information: Visual research ethics in the age of wearable cameras. Integr. Psychol. Behav. Sci. 49, 309–322 (2015).

  • Meyer, L. E. et al. Using wearable cameras to investigate health-related daily life experiences: A literature review of precautions and risks in empirical studies. Res. Ethics 18(1), 64–83 (2022).
