A production-grade OCR application designed for the Fintech sector. This project demonstrates advanced on-device text recognition for credit/debit cards and bank passbooks, utilizing a heuristic-based parsing engine for high accuracy in real-world conditions.
- Details Extracted: Card Number, Expiry Date, Card Holder Name.
- Privacy: Automated masking of the card number (e.g., `XXXX XXXX XXXX 1234`) with a toggle to view.
- Algorithm: `CardParser` manually processes noisy OCR text using a scoring system.
- Luhn Validation: Manual implementation of the Luhn Algorithm (`LuhnValidator`) to verify the card number's checksum.
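The Luhn check and the masking step described above can be sketched as follows. This is a minimal illustration, not the project's actual `LuhnValidator` code: the function names `passesLuhn` and `maskCardNumber` are hypothetical, and the sketch assumes separators are limited to spaces and dashes.

```dart
/// Returns true if [input] passes the Luhn checksum.
/// Whitespace and dashes are ignored; any other non-digit fails the check.
bool passesLuhn(String input) {
  final digits = input.replaceAll(RegExp(r'[\s-]'), '');
  if (!RegExp(r'^\d+$').hasMatch(digits)) return false;

  var sum = 0;
  var doubleIt = false; // the rightmost digit is never doubled
  for (var i = digits.length - 1; i >= 0; i--) {
    var d = digits.codeUnitAt(i) - 0x30; // '0' is code unit 0x30
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 == 0;
}

/// Replaces all but the last four digits with 'X', grouped in blocks of four.
String maskCardNumber(String input) {
  final digits = input.replaceAll(RegExp(r'\D'), '');
  if (digits.length < 4) return digits;
  final masked =
      'X' * (digits.length - 4) + digits.substring(digits.length - 4);
  final groups = <String>[];
  for (var i = 0; i < masked.length; i += 4) {
    final end = i + 4 > masked.length ? masked.length : i + 4;
    groups.add(masked.substring(i, end));
  }
  return groups.join(' ');
}
```

A passing Luhn check only shows that the digits are internally consistent; it does not confirm that the card exists or is active.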
- Details Extracted: Account Holder Name, Account Number, IFSC Code.
- Algorithm: `PassbookParser` uses keyword proximity and heuristic scoring to identify account details amidst transaction history and noisy text.
- Normalization: Standardizes Indian IFSC codes (ensuring the 5th character is always `0`).
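As a rough sketch of the IFSC normalization mentioned above: the helper name `normalizeIfsc` is hypothetical, and the sketch assumes the main OCR error to correct is the letter `O` read in place of the digit `0` in the 5th position. The project's own rules may differ.

```dart
/// Returns the normalized IFSC, or null if [raw] does not fit the
/// "4 letters + 0 + 6 alphanumerics" shape after cleanup.
String? normalizeIfsc(String raw) {
  // Uppercase and drop spaces or punctuation that OCR may insert.
  final cleaned = raw.toUpperCase().replaceAll(RegExp(r'[^A-Z0-9]'), '');
  if (cleaned.length != 11) return null;

  // Assumption: a letter O in the 5th position is a misread digit 0.
  final fixed = cleaned[4] == 'O'
      ? cleaned.substring(0, 4) + '0' + cleaned.substring(5)
      : cleaned;

  final shape = RegExp(r'^[A-Z]{4}0[A-Z0-9]{6}$');
  return shape.hasMatch(fixed) ? fixed : null;
}
```

Returning `null` for anything that still fails the shape check keeps a badly misread code from being reported as a valid IFSC.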
As per assignment constraints, no external libraries were used for parsing the extracted OCR text.
- Candidate Identification: Scans text for 13-19 digit sequences.
- Prioritization: Ranks candidates based on Luhn validity and standard card lengths.
- Expiry Detection: Searches for `MM/YY` or `MM/YYYY` patterns. Scores them higher if near keywords like "Valid Thru" or "Expiry" and lower if near "DOB".
- Name Extraction: Identifies 2-3 word uppercase strings excluding card brand names and banking keywords.
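The candidate ranking and expiry scoring steps listed above might look roughly like this. It is a sketch under stated assumptions rather than the actual `CardParser`: the function names, the score weights, and the 25-character context window are all hypothetical.

```dart
/// Compact Luhn check, used here only to rank candidates.
bool luhnOk(String digits) {
  var sum = 0;
  for (var i = 0; i < digits.length; i++) {
    var d = digits.codeUnitAt(digits.length - 1 - i) - 0x30;
    if (i.isOdd) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 == 0;
}

/// Finds 13-19 digit runs (single spaces or dashes allowed between digits)
/// and sorts them so Luhn-valid, standard-length numbers come first.
List<String> rankCardCandidates(String ocrText) {
  final pattern = RegExp(r'(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)');
  final candidates = pattern
      .allMatches(ocrText)
      .map((m) => m.group(0)!.replaceAll(RegExp(r'[ -]'), ''))
      .toList();
  // Hypothetical weights: Luhn validity dominates, 16 digits is a tiebreaker.
  int score(String c) => (luhnOk(c) ? 10 : 0) + (c.length == 16 ? 2 : 0);
  candidates.sort((a, b) => score(b).compareTo(score(a)));
  return candidates;
}

/// Returns the best-scoring MM/YY or MM/YYYY match, or null if none is found.
String? bestExpiry(String ocrText) {
  final pattern = RegExp(r'(?<!\d)(0[1-9]|1[0-2])/(\d{4}|\d{2})(?!\d)');
  String? best;
  int? bestScore;
  for (final m in pattern.allMatches(ocrText)) {
    // Inspect up to 25 characters before the match for keywords.
    final start = m.start < 25 ? 0 : m.start - 25;
    final context = ocrText.substring(start, m.start).toLowerCase();
    var score = 0;
    if (context.contains('valid thru') || context.contains('expiry')) {
      score += 5;
    }
    if (context.contains('dob')) score -= 5;
    if (bestScore == null || score > bestScore) {
      bestScore = score;
      best = m.group(0);
    }
  }
  return best;
}
```

Note that a date of birth such as `01/01/1990` partially matches the expiry pattern as `01/01`, which is why the "DOB" penalty matters in practice.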
- IFSC Detection: Regex-based pattern matching for standard 11-character alphanumeric IFSC codes.
- Account Number Extraction: Identifies numeric sequences of 9-18 digits. Uses a scoring system where proximity to "A/C" or "Account" increases confidence, while proximity to "Balance" or "Amount" decreases it.
- Name Extraction: Looks for name-related keywords and extracts the following alphabetic string.
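The IFSC detection and account number scoring described above can be illustrated with the sketch below. The names `findIfsc` and `bestAccountNumber` are hypothetical, as are the score weights; "proximity" is simplified here to mean "earlier on the same line", which is an assumption and not necessarily how `PassbookParser` measures it.

```dart
/// Returns the first token shaped like an IFSC (4 letters, 0, 6 alphanumerics).
String? findIfsc(String ocrText) {
  final pattern = RegExp(r'\b[A-Z]{4}0[A-Z0-9]{6}\b');
  return pattern.firstMatch(ocrText)?.group(0);
}

/// Picks the most likely account number from [ocrText], or null if none.
String? bestAccountNumber(String ocrText) {
  final pattern = RegExp(r'(?<!\d)\d{9,18}(?!\d)');
  String? best;
  int? bestScore;
  for (final line in ocrText.split('\n')) {
    for (final m in pattern.allMatches(line)) {
      // Only text before the number on the same line counts as context.
      final context = line.substring(0, m.start).toLowerCase();
      var score = 0;
      if (context.contains('a/c') || context.contains('account')) score += 5;
      if (context.contains('balance') || context.contains('amount')) score -= 5;
      if (bestScore == null || score > bestScore) {
        bestScore = score;
        best = m.group(0);
      }
    }
  }
  return best;
}
```

Scoring every candidate instead of taking the first 9-18 digit run keeps long balances or transaction references from being picked up as the account number.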
- Language: Optimized for English-language documents.
- IFSC Format: Assumes the standard 11-character Indian IFSC format.
- Card Holder Name: Extracted if printed in standard uppercase format on the card front.
- Image Quality: Assumes reasonable lighting and focus for on-device OCR accuracy.
- Bank Logo Recognition: While branding helps scoring, a full logo-to-bank mapping was skipped to focus on text-based parsing.
- Non-Standard Expiry: Very rare formats (e.g., text-based "Dec 2025") were deprioritized over numeric formats.
- Offline ML Models: Google ML Kit runs on-device but requires an initial library download; fully air-gapped custom Tesseract models were skipped in favor of a better performance-to-size ratio.
- State Management: Cubit (Flutter Bloc) for predictable, testable business logic.
- Dependency Injection: GetIt for service decoupling.
- OCR Engine: Google ML Kit Text Recognition.
- Navigation: GoRouter for declarative routing.
- Architecture: Feature-first Clean Architecture.
lib/
├── core/ # Shared utilities (Regex, Luhn, Text Normalization)
├── features/ # Domain-specific logic (Card, Passbook)
├── shared/ # Reusable UI components
└── app/ # DI, Theme, and Routing configuration
- Clone the repository: `git clone https://github.com/codebyforam/flutter-scanner.git`
- Install dependencies: `flutter pub get`
- Run tests: `flutter test`
- Run the application: `flutter run`
codeByForam
- GitHub: @codeByForam
Distributed under the MIT License. See LICENSE for more information.