# Caption trainer
Tools for training image captioning and vision-language models. Started as a BLIP fine-tuning setup and has grown into a broader toolkit covering inference, agentic caption pipelines, and iterative caption improvement.
- Fine-tune captioning models and VLMs with LoRA and other parameter-efficient methods
- Run inference to generate and evaluate captions
- Agentic flows for iterative caption refinement
- Validation with WER (word error rate) and loss tracking
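The core idea behind LoRA, as used for the fine-tuning above, is to freeze the pretrained weight matrix and train only a low-rank update. A minimal NumPy sketch of the mechanism (the dimensions, names, and scaling are illustrative, not this repo's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes for one projection inside a transformer block.
d_in, d_out, rank = 768, 768, 8

# Frozen pretrained weight: never updated during fine-tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors: the learned update is B @ A, with rank << d.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))  # zero init, so training starts at the base model

def lora_forward(x, scale=1.0):
    """Base projection plus the scaled low-rank correction."""
    return x @ W.T + scale * (x @ A.T @ B.T)

# Parameter efficiency: trainable params vs. the full weight matrix.
full = W.size
lora = A.size + B.size
print(f"trainable fraction: {lora / full:.3%}")
```

At rank 8 the trainable fraction here is about 2% of the full matrix, which is why LoRA fine-tuning fits on much smaller GPUs than full fine-tuning.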
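The agentic refinement flow amounts to a generate-critique loop: a captioner proposes, a critic model flags problems, and the caption is regenerated with that feedback until the critic is satisfied or a round limit is hit. A hedged sketch of such a loop, where `generate` and `critique` are hypothetical stand-ins for model calls, not functions from this repo:

```python
def refine_caption(image_id, generate, critique, max_rounds=3):
    """Iteratively improve a caption via critic feedback.

    generate(image_id, feedback) -> caption  (feedback is None on the first pass)
    critique(caption) -> feedback string, or None when the caption is acceptable
    """
    caption = generate(image_id, None)
    for _ in range(max_rounds):
        feedback = critique(caption)
        if feedback is None:  # critic is satisfied; stop early
            return caption
        caption = generate(image_id, feedback)
    return caption  # best effort after max_rounds
```

Capping the rounds matters in practice: critics can oscillate, and each round costs an extra model call per image.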
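WER, the validation metric listed above, is the word-level edit distance between a generated caption and a reference, divided by the reference length. A self-contained implementation for reference (libraries such as jiwer compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Note that WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is best read alongside the loss curve rather than as a bounded score.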