Computer vision is hard

The challenges of computer vision.

Written by

René Vergara-Fuentes

Published on

January 22, 2025

AI in general is hard, but computer vision is particularly challenging. Large Language Models (LLMs) often feel intuitive to work with; computer vision, especially on video, demands far more attention to detail. At Factorial Biomechanics, we navigate this contrast every day.

Feeding a pre-trained LLM is often as simple as providing clean text, perhaps tokenized or structured for a specific task. Computer vision demands far more careful preparation. Video data is sensitive to resolution mismatches: frames must be resized to match the input the model expects. Frame rate is usually a balancing act, since skipping too many frames risks missing critical moments while analyzing every frame can overwhelm computational resources. In human biomechanics, however, undersampling is not an option. Frame rate is king: frame-by-frame analysis is what captures the intricate dynamics of human movement, and the signal processing applied afterwards depends on a precise, known frame rate to produce meaningful results.
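To make the resizing point concrete, here is a minimal sketch of an aspect-preserving ("letterbox") resize calculation. It is an illustration, not our production code: the function names and the 640x640 target are assumptions chosen for the example, and the actual pixel resampling is left to whatever imaging library you use. What matters is that the same scale and padding are reused to map model outputs back onto the original frame.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving resize.

    The source frame is scaled uniformly so that it fits inside the target,
    then centered; the remaining area is padding.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def to_source_coords(x, y, src_w, src_h, dst_w, dst_h):
    """Map a point given in model-input pixels back to source-frame pixels."""
    new_w, new_h, pad_left, pad_top = letterbox_params(src_w, src_h, dst_w, dst_h)
    return (x - pad_left) * src_w / new_w, (y - pad_top) * src_h / new_h


# Hypothetical example: a 1920x1080 frame fed to a model expecting 640x640.
params = letterbox_params(1920, 1080, 640, 640)
center = to_source_coords(320, 320, 1920, 1080, 640, 640)
```

Stretching the frame to 640x640 without preserving the aspect ratio would distort limb proportions, which is exactly the kind of silent mismatch that hurts keypoint accuracy.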

The sensitivity of computer vision models is illustrated by the "one-pixel attack," in which changing a single, carefully chosen pixel can be enough to change a model's prediction. That fragility is why precise preprocessing and format awareness matter in video analysis. Biomechanical analysis, for example, relies on accurately tracking joint positions frame by frame. If a video is heavily compressed, say with an H.265 encoder whose settings were not chosen with analysis in mind, critical details such as limb edges or subtle movements can become blurred or distorted. What could have been precise motion data turns into noise, and the analysis becomes unreliable.

Media formats add another layer of complexity. A video encoded in H.264 might yield acceptable results, yet re-encoding it in H.265, despite its more efficient compression, can introduce subtle artifacts that degrade model performance. Variable frame rates further destabilize temporal analysis, because motion tracking and joint analysis typically assume a constant time step between frames. LLMs, in contrast, are remarkably resilient to format variations, handling plain text, JSON, or Markdown with little loss in output quality.
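One way to catch variable frame rates before they reach the model is to inspect the frame timestamps. The sketch below assumes you have already extracted per-frame presentation timestamps in seconds with your tool of choice; the function name, the report keys, and the 5% tolerance are assumptions made for this example, not a standard.

```python
from statistics import median


def frame_interval_report(timestamps_s, tolerance=0.05):
    """Summarize frame timing and flag variable frame rate.

    timestamps_s: per-frame presentation timestamps in seconds.
    tolerance: largest allowed relative deviation of any frame interval
               from the median interval (0.05 means 5%).
    """
    if len(timestamps_s) < 3:
        raise ValueError("need at least three timestamps")
    deltas = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if any(d <= 0 for d in deltas):
        raise ValueError("timestamps must be strictly increasing")
    nominal = median(deltas)
    worst = max(abs(d - nominal) / nominal for d in deltas)
    return {
        "nominal_fps": 1.0 / nominal,
        "max_deviation": worst,
        "is_constant": worst <= tolerance,
    }


# A steady 30 fps clip versus one with a dropped frame.
steady = frame_interval_report([i / 30 for i in range(10)])
dropped = frame_interval_report([0, 1 / 30, 2 / 30, 4 / 30, 5 / 30])
```

A clip that fails this check can then be re-timed or resampled to a constant rate before any velocity or frequency analysis is attempted.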

Even after inference, post-processing is labor-intensive in computer vision. Human movement data is inherently noisy, so post-processing is a must in any professional biomechanics setup, even one that does not use computer vision. Frame-by-frame synchronization is critical for temporal consistency, and derived metrics such as joint angles or velocities rely on clean, stable data. For LLMs, post-processing often feels trivial by comparison: structured outputs like JSON usually need minimal refinement.
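As a rough illustration of what this post-processing involves, the sketch below computes a joint angle from three 2D keypoints, smooths the resulting series, and differentiates it. All names are hypothetical, and the centered moving average is only a simple stand-in for whatever low-pass filter a real pipeline would use. Note that the velocity step divides by the frame interval, which is why a known, constant frame rate matters.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at point b, in degrees, between segments b->a and b->c (2D points)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def angular_velocity(angles_deg, fps):
    """Angular velocity in deg/s: central differences, one-sided at the ends."""
    n = len(angles_deg)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fps
    vel = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        vel.append((angles_deg[hi] - angles_deg[lo]) / ((hi - lo) * dt))
    return vel


# Hypothetical hip, knee, and ankle keypoints for a single frame.
knee_angle = joint_angle_deg((0.0, 1.0), (0.0, 0.0), (1.0, 0.0))
```

Differentiation amplifies noise, so smoothing the angle series before computing velocities is usually the safer order of operations.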

At Factorial Biomechanics, precision is non-negotiable. We have already automated the most complex parts of the pipeline so that users get reliable results, and we deliberately leave room for user interaction, allowing flexibility without compromising accuracy. It’s a constant reminder that, while AI might appear plug-and-play on the surface, computer vision demands much deeper engagement with the data. Simply wrapping a model interface, even around a high-performing model, is not enough. Pre- and post-processing are the missing components of the business layer, bridging raw model outputs and actionable biomechanical insights, and that payoff makes the effort worthwhile.

If you’ve worked with AI, what challenges have you faced when moving between different domains? We’d love to hear your perspective.

René Vergara-Fuentes

René is a co-founder, CEO and Head of Technology at Factorial Biomechanics.
