GeoViT

Context-aware AI geolocation system using Vision Transformers and OCR fusion for district-level precision in Istanbul.

🚀 Active

2026

About This Project

GeoViT is an R&D project that tackles urban geolocation with a hybrid AI architecture. Standard geolocation models treat a city as a single monolithic entity, but Istanbul's 39 districts share similar Ottoman-era architecture, so visual features alone often cannot tell them apart. GeoViT adds a Context-Aware layer that works in four steps: it encodes the image with a Vision Transformer, extracts text signals via OCR (street signs, municipality markers), detects conflicts between the visual and textual predictions, and resolves the ambiguity through vector database queries. The system achieves 94.2% district-level accuracy, more than 36 times the roughly 2.6% expected from random guessing across 39 districts. The showcase features an interactive terminal demo, an Istanbul coverage map with 61,000+ data points, and training data visualizations.
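The conflict-detection and resolution steps described above can be sketched roughly as follows. This is a minimal illustration, not GeoViT's actual code: the names `Prediction`, `DISTRICT_KEYWORDS`, `text_prediction`, `retrieve`, and `locate` are hypothetical, the ViT output is assumed to arrive as a district-to-probability mapping, and a small in-memory cosine-similarity search stands in for the vector database.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional

# Hypothetical lookup table: lowercase OCR tokens that hint at a district.
# A real system would need proper Turkish case folding and fuzzy matching.
DISTRICT_KEYWORDS = {"fatih": "Fatih", "kartal": "Kartal", "pendik": "Pendik"}


@dataclass
class Prediction:
    district: str
    confidence: float
    source: str  # "visual", "text", or "retrieval"


def visual_prediction(probs: dict) -> Prediction:
    """Pick the most probable district from the (assumed) ViT output."""
    district = max(probs, key=probs.get)
    return Prediction(district, probs[district], "visual")


def text_prediction(ocr_tokens: list) -> Optional[Prediction]:
    """Return the first district named in the OCR tokens, if any."""
    for token in ocr_tokens:
        district = DISTRICT_KEYWORDS.get(token.lower())
        if district is not None:
            return Prediction(district, 1.0, "text")
    return None


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(embedding: list, index: list) -> Prediction:
    """Nearest-neighbor lookup; a stand-in for the vector database query."""
    vector, district = max(index, key=lambda item: cosine(embedding, item[0]))
    return Prediction(district, cosine(embedding, vector), "retrieval")


def locate(probs: dict, ocr_tokens: list, embedding: list, index: list) -> Prediction:
    """Keep the visual prediction unless OCR text disagrees with it."""
    visual = visual_prediction(probs)
    text = text_prediction(ocr_tokens)
    if text is None or text.district == visual.district:
        return visual
    # Conflict: the image and the text point to different districts,
    # so defer to the closest reference embedding.
    return retrieve(embedding, index)


# Toy reference index of (embedding, district) pairs.
INDEX = [([1.0, 0.0], "Fatih"), ([0.0, 1.0], "Kartal")]
PROBS = {"Fatih": 0.55, "Kartal": 0.40, "Pendik": 0.05}

agreed = locate(PROBS, [], [0.1, 0.9], INDEX)
resolved = locate(PROBS, ["Kartal", "Belediyesi"], [0.1, 0.9], INDEX)
```

In this toy run, `agreed` keeps the visual guess because no text signal is present, while `resolved` falls back to retrieval because the sign text names a district other than the visual top choice. How the real system weighs the three signals is not stated in the project description, so the "retrieval wins on conflict" rule here is only one plausible choice.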

Technologies Used

The tech stack that powers this project

Python

PyTorch

Vision Transformers (ViT)

OpenCV

OCR

React

TypeScript

Leaflet.js
