FrameMap combines Brilliant Labs Frame smart glasses, the LLaVA vision-language model, and OpenStreetMap into an interactive tool for capturing and visualizing spatially contextualized photos. The system captures images, generates AI-written descriptions of them, and tags each one with a geolocation, so users can browse their captured moments on an interactive map.
The project used Python to connect the Frame glasses, the LLaVA model, and the OpenStreetMap frontend. Captured images were captioned by a locally hosted LLaVA model served through Ollama, and geolocation data was retrieved from IP-based services. The metadata for each capture (photo path, caption, and geolocation) was stored in a JSON file and visualized on a Leaflet-based map interface.
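The metadata step can be illustrated with a minimal sketch, assuming the JSON file holds a list of records. The field names (`photo_path`, `caption`, `latitude`, `longitude`) and the helpers `append_photo_record` and `to_geojson` are hypothetical and not taken from the FrameMap code. Converting the records to GeoJSON is one common way to hand points to Leaflet (through `L.geoJSON`), not necessarily what the project does.

```python
import json
import os
import tempfile
from pathlib import Path


def append_photo_record(metadata_path, photo_path, caption, latitude, longitude):
    """Append one capture record to a JSON list file and return all records.

    Hypothetical schema: the project's actual field names may differ.
    """
    path = Path(metadata_path)
    # Load the existing records, or start an empty list if the file is missing.
    records = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    records.append({
        "photo_path": str(photo_path),
        "caption": caption,
        "latitude": latitude,
        "longitude": longitude,
    })
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted write never leaves a truncated JSON file behind.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(records, tmp, indent=2)
    os.replace(tmp_name, path)
    return records


def to_geojson(records):
    """Convert records to a GeoJSON FeatureCollection for a web map."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {
                    "type": "Point",
                    "coordinates": [r["longitude"], r["latitude"]],
                },
                "properties": {"photo_path": r["photo_path"], "caption": r["caption"]},
            }
            for r in records
        ],
    }
```

Writing to a temporary file and then calling `os.replace` keeps the metadata file intact if the process stops mid-write. Note that GeoJSON's `[longitude, latitude]` order is the reverse of the `[lat, lng]` order that Leaflet's `L.marker` expects, a frequent source of misplaced pins.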
The project demonstrated that AI-generated visual descriptions can be integrated with spatial data visualization, and the interactive map provided a user-friendly way to explore captured moments. However, relying on IP-based geolocation limited the accuracy of the spatial metadata, since an IP lookup typically resolves only to the approximate area of the user's network connection rather than their actual position. This points to GPS-based location as a key enhancement for future iterations.
calluxpore/FrameMap-Photo-Capture-Description-and-Location-Visualization