Meta's New AI Model Enhances Video Editing and Object Tracking

Meta has introduced a new AI model, the Segment Anything Model 2 (SAM 2), capable of labeling and tracking any object in a video. It builds on its predecessor, SAM, which worked only on still images, and opens up new possibilities for video editing and analysis.

Segmentation is the process by which software determines which pixels in an image belong to which objects. SAM 2 performs segmentation on video in real time, a significant technical advance: it shows how AI can process moving images and keep distinguishing the elements on screen, even as they move in and out of the frame. An AI assistant capable of this task makes complex images and footage much easier to process and edit.
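
To make the idea of segmentation and tracking concrete, here is a toy sketch in Python. It assumes a segmentation result is stored as a per-pixel label map (0 for background, positive integers for objects) and links objects between two frames by how much their masks overlap (intersection-over-union). This simple overlap matching is a stand-in chosen for illustration; it is not how SAM 2 tracks objects, and the function names are hypothetical rather than taken from Meta's code.

```python
# Toy illustration of segmentation output and frame-to-frame matching.
# NOT SAM 2's API or method; all names here are hypothetical.
# Each frame is a 2-D grid of labels: 0 = background, k > 0 = object k.


def object_pixels(label_map):
    """Group (row, col) pixel coordinates by object label, skipping background."""
    pixels = {}
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label != 0:
                pixels.setdefault(label, set()).add((y, x))
    return pixels


def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_objects(prev_frame, next_frame, threshold=0.3):
    """Link each object in next_frame to its best-overlapping object in prev_frame.

    Objects whose best IoU is below `threshold` map to None (treated as newly
    entered); previous objects that nothing matched are returned as `left`.
    """
    prev = object_pixels(prev_frame)
    nxt = object_pixels(next_frame)
    links = {}
    for new_id, new_px in nxt.items():
        best_id, best_score = None, threshold
        for old_id, old_px in prev.items():
            score = iou(old_px, new_px)
            if score >= best_score:
                best_id, best_score = old_id, score
        links[new_id] = best_id
    matched = {old for old in links.values() if old is not None}
    left = set(prev) - matched
    return links, left


# Object 1 shifts right, object 2 leaves, and a new object 3 appears.
frame1 = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
frame2 = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [3, 0, 0, 0],
]
links, left = match_objects(frame1, frame2)
print(links, left)  # → {1: 1, 3: None} {2}
```

Real systems must cope with occlusion, fast motion and objects that re-enter the frame, which simple overlap matching handles poorly; that gap is part of why a model that remembers objects across frames is notable.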

The original SAM model has been used for a range of applications, including segmenting sonar images of coral reefs, analyzing satellite images for disaster relief, and detecting skin cancer in cellular images. SAM 2 extends these capabilities to video, which the original model could not handle. As part of the launch, Meta released a database of 50,000 videos created to train the model, in addition to 100,000 other videos Meta has cited. Because real-time video segmentation requires significant computing power, SAM 2 is open and free for now but may not remain so indefinitely.

Advancements and Potential Applications

With SAM 2, video editors can isolate and manipulate objects within a scene more efficiently than with current editing software, without having to adjust each frame by hand. Meta envisions SAM 2 transforming interactive video, allowing users to select and manipulate objects within live video or virtual spaces.
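
As a rough illustration of why per-frame masks save editors work, the sketch below applies one mask per frame to keep a single object and blank out everything else, so the edit follows the object as it moves. It assumes frames are small 2-D grids of RGB tuples and that the masks already exist (in practice they would come from a segmentation model such as SAM 2); the function names are hypothetical and not part of any Meta tool.

```python
# Hypothetical helpers, not a SAM 2 API.
# Frames are 2-D lists of (R, G, B) tuples; masks are 2-D lists of booleans
# assumed to come from a segmentation model.


def isolate(frame, mask, fill=(0, 0, 0)):
    """Return a copy of `frame` with every pixel outside `mask` set to `fill`."""
    if len(frame) != len(mask) or any(len(r) != len(m) for r, m in zip(frame, mask)):
        raise ValueError("frame and mask must have the same shape")
    return [
        [px if keep else fill for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


def isolate_clip(frames, masks, fill=(0, 0, 0)):
    """Apply one mask per frame, so the cut-out tracks the object over time."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    return [isolate(f, m, fill) for f, m in zip(frames, masks)]


red, blue = (255, 0, 0), (0, 0, 255)
# A red object moves from the top-left to the top-right pixel.
frames = [
    [[red, blue], [blue, blue]],
    [[blue, red], [blue, blue]],
]
masks = [
    [[True, False], [False, False]],
    [[False, True], [False, False]],
]
out = isolate_clip(frames, masks)
print(out[1][0])  # → [(0, 0, 0), (255, 0, 0)]
```

The editing step itself is trivial once masks exist; producing an accurate mask for every frame is the expensive part that a model like SAM 2 is meant to automate.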

The model could also play a pivotal role in developing computer vision systems, especially for autonomous vehicles, where accurate object tracking is essential for safe navigation. SAM 2’s capabilities could speed up the annotation of visual data, producing high-quality training data for AI systems.
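
One common annotation step is turning an object's mask into a bounding-box label for each frame in which the object is visible. The sketch below shows that conversion under stated assumptions: masks are 2-D boolean grids, boxes use inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`, and the record format is a hypothetical one chosen for illustration, not a format used by Meta or any particular dataset.

```python
# Hypothetical annotation helpers; the record format is an assumption.
# Masks are 2-D lists of booleans; boxes are inclusive pixel coordinates.


def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) covering all True pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def annotate_clip(masks, label):
    """Emit one record per frame where the object is visible; skip empty masks."""
    records = []
    for i, mask in enumerate(masks):
        box = mask_to_bbox(mask)
        if box is not None:
            records.append({"frame": i, "label": label, "bbox": box})
    return records


# A "car" drifts right and has left the frame by the third frame.
masks = [
    [[False, True, True], [False, True, False]],
    [[False, False, True], [False, False, True]],
    [[False, False, False], [False, False, False]],
]
records = annotate_clip(masks, "car")
print(records)
```

If a model supplies the masks, a human annotator only needs to review the resulting boxes rather than draw each one, which is where the time savings would come from.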

While much attention goes to AI models that generate video from text prompts, such as OpenAI’s Sora, Runway’s models, and Google’s Veo, SAM 2’s editing abilities may have a bigger impact on bringing AI into video creation. Other AI video developers are working on similar technologies. For instance, Google has recently been testing video summarization and object recognition features on YouTube, while Adobe’s Firefly AI tools focus on photo and video editing, including content-aware fill and auto-reframe features.