
GeoViT

Context-aware AI geolocation system using Vision Transformers and OCR fusion for district-level precision in Istanbul.

🚀 Active
2026
About This Project

GeoViT is an R&D project that tackles urban geolocation with a hybrid AI architecture. Standard geolocation models treat a city as a single monolithic class, but Istanbul's 39 districts share similar Ottoman-era architecture, which defeats approaches that rely on visual features alone. GeoViT adds a Context-Aware layer that works in four steps: it encodes the image with a Vision Transformer, extracts text signals (street signs, municipality markers) via OCR, detects conflicts between the visual and textual predictions, and resolves the ambiguity through vector database queries. The system achieves 94.2% district-level accuracy, roughly 36 times the 2.6% expected from uniform random guessing across 39 districts. The showcase features an interactive terminal demo, an Istanbul coverage map with 61,000+ data points, and training data visualizations.
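
The conflict-detection and resolution steps can be illustrated with a minimal Python sketch. It is not GeoViT's actual code: the function names, the four-district subset, the ASCII spellings of district names, and the brute-force cosine search standing in for a vector database are all assumptions made for illustration. The class probabilities and the embedding are assumed to come from a ViT model upstream.

```python
import math
from dataclasses import dataclass

# Hypothetical subset of districts, spelled in ASCII for simplicity.
DISTRICTS = ["Fatih", "Beyoglu", "Uskudar", "Kadikoy"]


@dataclass
class Prediction:
    district: str
    source: str  # "visual", "visual+ocr", or "vector_db"


def visual_top1(probs):
    """Return the district with the highest classifier probability."""
    return max(probs, key=probs.get)


def ocr_district(ocr_text):
    """Return the first known district named in the OCR text, else None."""
    lowered = ocr_text.lower()
    for name in DISTRICTS:
        if name.lower() in lowered:
            return name
    return None


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_district(embedding, index):
    """Brute-force stand-in for a vector database nearest-neighbor query."""
    _, district = max(index, key=lambda item: cosine(embedding, item[0]))
    return district


def locate(probs, ocr_text, embedding, index):
    """Fuse visual and textual predictions; query the index on conflict."""
    visual = visual_top1(probs)
    textual = ocr_district(ocr_text)
    if textual is None:
        return Prediction(visual, "visual")
    if textual == visual:
        return Prediction(visual, "visual+ocr")
    # The two signals disagree, so fall back to the nearest stored embedding.
    return Prediction(nearest_district(embedding, index), "vector_db")


# Toy reference index of (embedding, district) pairs.
index = [([1.0, 0.0], "Fatih"), ([0.0, 1.0], "Kadikoy")]
probs = {"Fatih": 0.6, "Kadikoy": 0.4}

conflict = locate(probs, "Kadikoy Belediyesi", [0.1, 0.9], index)
no_text = locate(probs, "", [0.1, 0.9], index)
print(conflict, no_text)
```

In this toy run the classifier favors Fatih while the sign text names Kadikoy, so the lookup decides the outcome. A production system would likely also weigh OCR confidence and return several candidate neighbors rather than a single one.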

Technologies Used
The tech stack that powers this project
Python
PyTorch
Vision Transformers (ViT)
OpenCV
OCR
React
TypeScript
Leaflet.js
