# Caption trainer
Tools for training image captioning and vision-language models. Started as a BLIP fine-tuning setup and has grown into a broader toolkit covering inference, agentic caption pipelines, and iterative caption improvement.
- Fine-tune captioning models and VLMs with LoRA and other parameter-efficient methods
- Run inference to generate and evaluate captions
- Agentic flows for iterative caption refinement
- Validation with WER (word error rate) and loss tracking
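The core idea behind LoRA, as used for the fine-tuning above, is to freeze the pretrained weight matrix and train only a low-rank update. A minimal NumPy sketch of the mechanism (the dimensions, names, and scaling are illustrative, not this repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes for one projection inside a transformer block.
d_in, d_out, rank = 768, 768, 8

# Frozen pretrained weight: never updated during fine-tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors: the learned update is B @ A, with rank << d.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero init, so training starts at the base model

def lora_forward(x, scale=1.0):
    """Base projection plus the scaled low-rank correction."""
    return x @ W.T + scale * (x @ A.T @ B.T)

# Parameter efficiency: trainable params vs. the full weight matrix.
full = W.size
lora = A.size + B.size
print(f"trainable fraction: {lora / full:.3%}")
```

At rank 8 the trainable fraction here is about 2% of the full matrix, which is why LoRA fine-tuning fits on much smaller GPUs than full fine-tuning.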
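The agentic refinement flow amounts to a generate-critique loop: a captioner proposes, a critic model flags problems, and the caption is regenerated with that feedback until the critic is satisfied or a round limit is hit. A hedged sketch of such a loop, where `generate` and `critique` are hypothetical stand-ins for model calls, not functions from this repo:

```python
def refine_caption(image_id, generate, critique, max_rounds=3):
    """Iteratively improve a caption via critic feedback.

    generate(image_id, feedback) -> caption  (feedback is None on the first pass)
    critique(caption) -> feedback string, or None when the caption is acceptable
    """
    caption = generate(image_id, None)
    for _ in range(max_rounds):
        feedback = critique(caption)
        if feedback is None:  # critic is satisfied; stop early
            return caption
        caption = generate(image_id, feedback)
    return caption  # best effort after max_rounds
```

Capping the rounds matters in practice: critics can oscillate, and each round costs an extra model call per image.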
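WER, the validation metric listed above, is the word-level edit distance between a generated caption and a reference, divided by the reference length. A self-contained implementation for reference (libraries such as jiwer compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is best read alongside the loss curve rather than as a bounded score.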