GeoViT
Context-aware AI geolocation system using Vision Transformers and OCR fusion for district-level precision in Istanbul.
GeoViT is an R&D project that tackles urban geolocation with a hybrid AI architecture. Standard geolocation models treat a city as a single monolithic region. That assumption breaks down in Istanbul, where the 39 districts share similar Ottoman-era architecture, so visual cues alone often cannot tell them apart. GeoViT adds a Context-Aware layer that encodes each image with a Vision Transformer, extracts text signals via OCR (street signs, municipality markers), detects conflicts between the visual and textual predictions, and resolves the ambiguity by querying a vector database. The system reaches 94.2% district-level accuracy. Random guessing among 39 districts is correct about 2.6% of the time (1/39), so this is more than 36x the random baseline. The showcase features an interactive terminal demo, an Istanbul coverage map with 61,000+ data points, and training data visualizations.
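The description above names the fusion steps (visual prediction, OCR signal, conflict detection, vector-database lookup) but not how a conflict is scored. The Python sketch below shows one plausible shape of that logic under stated assumptions: the ViT head outputs a probability per district, OCR yields raw text tokens, and the vector database is replaced by an in-memory brute-force cosine search. `Reference`, `ocr_district`, `resolve`, `k=3` and the similarity-weighted vote are all hypothetical illustrations, not GeoViT's actual implementation.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical sketch: names, k and the voting rule are illustrative
# assumptions, not GeoViT's real code or API.


@dataclass
class Reference:
    """One labelled image embedding, standing in for a vector-database record."""
    district: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ocr_district(tokens: list[str], districts: list[str]) -> str | None:
    """Return the first OCR token that names a known district, else None.

    casefold() is a simplification: Turkish dotted/dotless I (İ/ı) needs
    locale-aware normalisation that this sketch does not attempt.
    """
    lookup = {d.casefold(): d for d in districts}
    for tok in tokens:
        hit = lookup.get(tok.strip().casefold())
        if hit is not None:
            return hit
    return None


def resolve(visual_probs: dict[str, float], tokens: list[str],
            query: list[float], index: list[Reference], k: int = 3) -> str:
    """Pick a district, consulting the reference index only on conflict."""
    visual = max(visual_probs, key=visual_probs.get)
    textual = ocr_district(tokens, list(visual_probs))
    if textual is None or textual == visual:
        return visual  # no conflict: keep the visual prediction

    # Conflict: brute-force k-NN restricted to the two disagreeing districts,
    # then a similarity-weighted vote among the k nearest references.
    candidates = [r for r in index if r.district in (visual, textual)]
    nearest = sorted(candidates, key=lambda r: cosine(query, r.embedding),
                     reverse=True)[:k]
    scores: dict[str, float] = defaultdict(float)
    for r in nearest:
        scores[r.district] += cosine(query, r.embedding)
    return max(scores, key=scores.get) if scores else visual
```

Restricting the search to the two disagreeing districts keeps the lookup focused on the actual ambiguity. A production system would use an approximate nearest-neighbour index rather than a linear scan.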