gen-ai.news

New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds

Counting objects in images sounds straightforward, but it has long been one of the more stubborn problems in computer vision. Earlier systems were typically trained for narrow domains, such as counting people in a crowd or cells on a microscopy slide, and performed poorly when asked to generalize beyond those specific contexts. "Count Anything" is designed to break that constraint by taking an open-ended text prompt as its only guidance, so a single model can handle a broad range of counting tasks without being retrained or fine-tuned for each category.

According to the researchers, the model roughly halves the error rate of previous general-purpose counting systems on standard benchmarks. That is a meaningful improvement, because counting accuracy tends to degrade quickly as scenes grow more complex or as the target objects vary in size, occlusion, and appearance. With the text-prompt approach, a user simply describes what they want counted ("white blood cells," "people waiting in line," "cars in a parking lot"), and the model attempts to locate and tally each matching instance.

The underlying approach likely draws on the growing body of work that pairs vision encoders with language models, so that visual understanding is steered by natural-language descriptions rather than fixed category labels. This kind of open-vocabulary design is increasingly common in object detection and segmentation, and applying it to counting is a logical extension. The challenge is that counting demands more than recognizing that something is present: the model must precisely localize and separate every individual instance, a harder requirement than simple classification or detection.
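The article does not describe how Count Anything is actually built, so what follows is only a toy illustration of the general idea of text-steered counting, not the model's method. It assumes that some vision encoder has already produced a feature vector for each cell of a coarse image grid, and that a text encoder has embedded the prompt into the same space. The function `count_prompted`, the 0.8 similarity threshold, and the 2-D toy vectors are all hypothetical. Cells whose features are close to the prompt embedding are marked, and each connected group of marked cells is counted as one instance.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def count_prompted(feature_grid, text_emb, threshold=0.8):
    """Hypothetical toy counter: threshold prompt similarity per grid cell, then
    count 4-connected regions of matching cells (one region = one instance)."""
    rows, cols = len(feature_grid), len(feature_grid[0])
    mask = [[cosine(feature_grid[r][c], text_emb) >= threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1  # found a new, unvisited matching region
            stack = [(r, c)]
            seen[r][c] = True
            while stack:  # flood-fill the region so it is counted only once
                y, x = stack.pop()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count

# Toy 2-D "embeddings" (hypothetical): T resembles the prompt, B is background.
T, B = [1.0, 0.0], [0.0, 1.0]
prompt = [1.0, 0.0]  # stand-in for an embedded prompt such as "cells"

separated = [[T, B, T, B],
             [T, B, B, B],
             [B, B, T, T]]
print(count_prompted(separated, prompt))  # 3 separate regions

touching = [[T, T, B],
            [B, B, B]]
print(count_prompted(touching, prompt))   # 1, even if these cells belong to two touching objects
```

The second call hints at why dense scenes are hard for any scheme that has to separate instances: when two objects' regions touch, this toy counter merges them into one connected blob and counts them once.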

Despite the progress, the model has clear limits. Very dense configurations, such as tightly packed crowds or overlapping cells, still produce higher error rates, which is consistent with the difficulty of separating individual instances that heavily occlude one another. Ambiguous or abstract text prompts also cause problems, since the model must interpret what the user means before it can begin counting. These limitations suggest that "Count Anything" is a solid step forward for general-purpose visual counting rather than a finished solution, and the field will likely see continued iteration as training data and architecture choices improve.
