This project is a Natural Language Processing (NLP) pipeline designed to classify legal case reports into their respective areas of law (e.g., Civil Procedure, Criminal Law, Company Law).
It supports inference via a FastAPI endpoint where users can submit the full body of a legal judgment (full_report) and receive the predicted legal category.
- Clean and preprocess the dataset.
- Build a pipeline that maps legal case reports to legal categories.
- Compare traditional ML methods (TF-IDF + Logistic Regression) with transformer-based deep learning models (DistilBERT).
- Serve the final transformer model as an API.
- full_report: The body of the legal judgment.
- introduction: Used to extract the area of law (labels).
- Other fields: case_title, suitno, facts, issues, decision.
Labels include:
- Civil Procedure
- Criminal Law and Procedure
- Enforcement of Fundamental Rights
- Company Law
- Election Petition, etc.
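Labels like these are pulled from the introduction field by regex pattern matching. The README does not show the actual pattern, so the following is only a minimal sketch of one way it could work: it assumes the introduction mentions the area of law by name, and it searches for the category names listed above. `KNOWN_LABELS` and `extract_area_of_law` are hypothetical names introduced for illustration.

```python
import re

# Assumed label set: only the categories named in this README.
# The real dataset contains more categories than these.
KNOWN_LABELS = [
    "Civil Procedure",
    "Criminal Law and Procedure",
    "Enforcement of Fundamental Rights",
    "Company Law",
    "Election Petition",
]

# Longest names first, so a longer label wins over a shorter one that
# shares a prefix. re.escape guards against regex metacharacters.
_LABEL_PATTERN = re.compile(
    "|".join(
        re.escape(name)
        for name in sorted(KNOWN_LABELS, key=len, reverse=True)
    ),
    flags=re.IGNORECASE,
)

# Map the lowercased match back to the canonical spelling of the label.
_CANONICAL = {name.lower(): name for name in KNOWN_LABELS}


def extract_area_of_law(introduction):
    """Return the first known label mentioned in `introduction`, or None."""
    match = _LABEL_PATTERN.search(introduction or "")
    if match is None:
        return None
    return _CANONICAL[match.group(0).lower()]
```

Rows for which the function returns None have no recognizable label and would need to be dropped or labeled by hand before training.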
- Text Vectorization: TF-IDF (TfidfVectorizer)
- Model: Logistic Regression
- Label Extraction: Regex pattern matching from introduction
- Evaluation:
  - Accuracy: ~35%
  - F1 Score: Low for imbalanced classes
- Limitation: The model could not capture contextual meaning and struggled with nuanced legal phrasing.
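The baseline described above can be sketched with scikit-learn roughly as follows. This is an illustration, not the notebook's actual code: the toy texts stand in for full_report, and the hyperparameters (`stop_words`, `max_iter`, `class_weight="balanced"`) are assumptions chosen for the example rather than settings taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for full_report texts and their extracted labels
# (illustrative only; the real data comes from the case report dataset).
reports = [
    "appellant convicted of armed robbery and sentenced",
    "accused charged with murder and convicted at trial",
    "shareholders dispute over directors and company winding up",
    "petition to wind up the company by its creditors",
]
labels = [
    "Criminal Law and Procedure",
    "Criminal Law and Procedure",
    "Company Law",
    "Company Law",
]

# TF-IDF turns each report into a sparse bag-of-words vector;
# Logistic Regression then learns one linear decision rule over it.
baseline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    # class_weight="balanced" is an assumed mitigation for label imbalance.
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
baseline.fit(reports, labels)

prediction = baseline.predict(["the accused was convicted of robbery"])[0]
print(prediction)
```

Because TF-IDF treats each report as an unordered bag of words, word order and context are lost, which is consistent with the limitation noted above.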
- Model Used: distilbert-base-uncased fine-tuned on full_report
- Tokenizer: Hugging Face tokenizer
- Label Encoder: LabelEncoder() for mapping categories
- Training: Done on CPU (adapted for low-resource environment)
- Evaluation:
  - Higher F1 and accuracy than the TF-IDF baseline
  - Better understanding of complex language
  - Handled rare labels more gracefully
LawPavilion/
├── app/
│ ├── api.py # FastAPI routes and logic
│ └── model_utilis.py # Load model/tokenizer/label_encoder
├── models/
│ └── config.json, tokenizer, model weights, saved_model.pkl
├── main.py # Entrypoint for FastAPI app
├── requirements.txt
└── Legal_Classifier.ipynb # Jupyter notebook with traditional + BERT training
git clone https://github.com/shashacode/NLP_Code_Challenge.git
cd LawPavillion
pip install -r requirements.txt
uvicorn main:app --reload
Visit: http://127.0.0.1:8000/docs to access the Swagger UI.
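Besides the Swagger UI, the POST /predict endpoint documented in this README (a JSON body with full_report in, predicted_area_of_law out) can be called from a script. The following is a minimal standard-library sketch, assuming the server is running locally on the default port; `build_request`, `parse_response`, and `predict` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

# Assumption: the server runs locally on uvicorn's default host and port.
API_URL = "http://127.0.0.1:8000/predict"


def build_request(full_report, url=API_URL):
    """Build a POST request carrying the judgment text as JSON."""
    payload = json.dumps({"full_report": full_report}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body):
    """Extract the predicted category from the JSON response body."""
    return json.loads(body)["predicted_area_of_law"]


def predict(full_report, url=API_URL, timeout=30):
    """Send the report to the running API and return the predicted label."""
    request = build_request(full_report, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

With the server started, `predict("The appellant was tried for armed robbery ...")` would return the category string from the response. Keeping request building and response parsing in separate functions lets both be checked without a live server.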
Endpoint: POST /predict
Request Body:
{
  "full_report": "The appellant was tried for armed robbery and sentenced to life imprisonment. The appeal concerns improper identification and denial of fair hearing."
}
Response:
{
  "predicted_area_of_law": "Criminal Law and Procedure"
}

This solution was developed as part of the LawPavilion Legal NLP Challenge, aimed at advancing AI solutions in the legal domain in Nigeria.