Published in Voxel51

- Rethinking How We Evaluate Multimodal AI (3d ago)
  CVPR 2025 reveals why spatial reasoning and subjective "vibes" are redefining how we benchmark AI systems

- This Visual Illusions Benchmark Makes Me Question the Power of VLMs (Mar 3)
  Exploring how well modern AI systems can spot visual deceptions

- Memes Are the VLM Benchmark We Deserve (Feb 20)
  Seriously, Memes Are All We Need

- Can VLMs Hear What They See? (Feb 20)
  Exploring the Intersection of Vision Language Models and Audio Data

- WebUOT-1M: A Dataset for Underwater Object Tracking (Feb 17)
  How 1.1 Million Frames Are Transforming Object Tracking

- Beyond the Microscope: Diving into BIOSCAN-5M, a New Dataset for Insect Biodiversity Research (Feb 13)
  Exploring the World's Largest Insect Dataset with a Modern Toolkit for Visual AI

- AIMv2 Outperforms CLIP on Synthetic Dataset ImageNet-D (Feb 12)
  Testing Vision Model Robustness: A hands-on tutorial for evaluating vision model performance on synthetic data using embedding analysis and…

- Visual Understanding with AIMv2 (Feb 11)
  Move over, CLIP: you've been dethroned!