AI Product / Multimodal / VLM

KEHAO CHEN

AI Application & Multimodal Engineer, delivering production-ready AI products.

AI 应用与多模态工程师,交付真实上线的 AI 产品。

Focused on LLMs/VLMs, RAG, intelligent document parsing, model fine-tuning, and end-to-end product delivery. Experienced in building complex document parsers, structuring financial charts, generating synthetic multi-turn medical datasets, and shipping complete products, most recently the independently developed PawMood iOS app.

关注 LLM/VLM、RAG、文档解析、模型微调和端到端产品交付。做过复杂文档解析、金融图表结构化、智能问诊数据生成,也能把想法落成完整产品:近期独立开发 PawMood iOS App。

94.6% Doc Parsing Accuracy 文档文本识别准确率
20k+ Multi-Turn Medical SFT Data 多轮问诊 SFT 数据
95%+ Financial Chart Parsing Precision 金融图表解析 Precision

01

Experience & Education

经历与教育背景

Work & Research

工作与研究经历

2024.09 — 2026.01 Matrix Origin 矩阵起源

Algorithm Engineer

算法工程师

Led document parsing and enterprise RAG QA projects for the AI Platform. Drove the migration of the document parsing pipeline from a multi-model LayoutLM+OCR setup to an end-to-end QwenVL approach, building the fine-tuning datasets and evaluation system that brought text parsing accuracy to 94.6% and raised table recognition TEDS from 74% to 87%. Built a multi-agent framework to generate 20k+ multi-turn medical SFT dialogues.

负责 AI Platform 方向的文档解析、企业知识库问答项目。推动文档解析从 LayoutLM+OCR 多模型管线迁移至 QwenVL 端到端方案,构建微调数据与评测体系,使文本识别准确率达 94.6%,表格识别 TEDS 从 74% 提升至 87%。搭建多智能体生成框架,合成 20k+ 多轮问诊 SFT 数据。

2024.01 — 2024.08 Wind Information 万得信息

Algorithm Engineer

算法工程师

Focused on financial report parsing and chart QA. Led domain-specific fine-tuning for OneChart to extract structured JSON data from complex financial charts, maintaining a precision rate above 95%. Developed PPT transition detection and title positioning in meeting videos using keyframe extraction and LayoutLMv3 fine-tuning.

面向金融研报、图表问答等场景。主导 OneChart 领域微调,提取金融图表 JSON 结构化信息,Precision 稳定在 95% 以上。基于关键帧提取与 LayoutLMv3 微调实现会议视频 PPT 切换识别与标题定位。

2023.07 — 2023.12 The University of Sydney 悉尼大学

Multimodal Sentiment Analysis Research

多模态情感分析研究

Proposed a BERT + DINOv2 dual-tower fusion architecture that uses Cross-Attention to improve image-text alignment. The resulting paper, "Enhancing Sentiment Analysis Through Multimodal Fusion", was accepted by ICCS 2025.

提出 BERT + DINOv2 双塔融合架构,并用 Cross-Attention 改善图文特征对齐。论文 Enhancing Sentiment Analysis Through Multimodal Fusion 被 ICCS 2025 接收。

Education

教育背景

2022.07 — 2023.12 The University of Sydney 悉尼大学

Master of Data Science

数据科学 硕士

Core Courses: Computational Statistical Methods, Database Management Systems, Principles of Data Science, Data Mining and Machine Learning, Deep Learning, Natural Language Processing.

主修课程:计算统计方法,数据库管理系统,数据科学原理,数据挖掘与机器学习,深度学习,自然语言处理。

2017.09 — 2021.06 Beijing University of Posts and Telecommunications 北京邮电大学

Bachelor of Telecom Engineering with Management

电信工程及管理 本科

Core Courses: Signals and Systems, Digital Signal Processing, Principles of Communications, Image and Video Processing, Computer Vision.

主修课程:信号与系统,数字信号处理,通信原理,图像与视频处理,计算机视觉。

02

Selected AI Cases

精选 AI 项目案例

Matrix Origin · Medical Data 矩阵起源 · 医疗数据

Synthetic Medical Dialogue Generator 智能问诊数据生成

Simulated patient-doctor interactions using a persona-based multi-agent framework to generate high-quality multi-turn dialogues from real clinical histories. Developed automated cleaning pipelines using LLM-as-a-Judge.

基于 Persona 的多智能体框架模拟医患交互,结合真实病例逻辑生成高质量多轮对话,并用 LLM-as-a-Judge 建立自动化清洗机制。
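
The LLM-as-a-Judge cleaning step can be pictured as a score-and-threshold filter. The sketch below is illustrative only: the `Dialogue` type, the `Judge` signature, the 0.8 threshold, and `toy_judge` (a rule-based stand-in for a real model call) are all assumptions made for the example, not the prompts or scoring scheme used in the project.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dialogue:
    dialogue_id: str
    turns: List[str]  # alternating patient / doctor utterances


# A judge maps a dialogue to a quality score in [0, 1]. In practice this
# would wrap an LLM call; the signature here is hypothetical.
Judge = Callable[[Dialogue], float]


def filter_dialogues(dialogues: List[Dialogue], judge: Judge,
                     min_score: float = 0.8) -> List[Dialogue]:
    """Keep only dialogues whose judge score reaches min_score."""
    return [d for d in dialogues if judge(d) >= min_score]


def toy_judge(dialogue: Dialogue) -> float:
    """Rule-based stand-in for an LLM judge (assumption for this sketch):
    accepts dialogues with at least four turns and no empty utterances."""
    if len(dialogue.turns) < 4:
        return 0.0
    if any(not turn.strip() for turn in dialogue.turns):
        return 0.0
    return 1.0


if __name__ == "__main__":
    samples = [
        Dialogue("a", ["I have a cough.", "How long?", "Three days.", "Any fever?"]),
        Dialogue("b", ["I have a cough.", "Rest well."]),
    ]
    kept = filter_dialogues(samples, toy_judge)
    print([d.dialogue_id for d in kept])
```

Passing the judge in as a callable keeps the filter independent of any particular model, so the stand-in can be replaced by a real LLM scorer without touching the filtering logic.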

20k+ SFT Dialogue Records 20k+ SFT 对话数据
65% → 88% Dialogue Consistency (BERTScore) 65% → 88% BERTScore
LoRA Fine-Tuning Qwen2.5-7B LoRA 微调 Qwen2.5-7B

Wind Info · Financial Reports 万得信息 · 金融研报

Financial Chart CQA System 金融图表问答 CQA

Led domain fine-tuning of OneChart to extract structured JSON data from complex financial charts. Designed custom annotation tools and a semi-automated data pipeline.

主导 OneChart 领域微调,从复杂金融图表抽取 JSON 结构化信息,设计图表专用标注工具和半自动数据生产管线。
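
One way to read the precision figure reported for this case is as a point-level score over the extracted JSON. The following is a minimal sketch under stated assumptions: the `{series: {x_label: value}}` layout, the 1% relative tolerance, and the function names are hypothetical choices for illustration, not the project's actual schema or evaluation protocol.

```python
import math
from typing import Dict, Tuple

# Assumed layout (hypothetical): {series_name: {x_label: numeric_value}}
ChartData = Dict[str, Dict[str, float]]


def flatten(chart: ChartData) -> Dict[Tuple[str, str], float]:
    """Flatten nested chart data into {(series, x_label): value}."""
    return {(s, x): v for s, points in chart.items() for x, v in points.items()}


def point_precision(pred: ChartData, gold: ChartData,
                    rel_tol: float = 0.01) -> float:
    """Fraction of predicted points that exist in the reference chart
    and match its value within a relative tolerance."""
    pred_points = flatten(pred)
    gold_points = flatten(gold)
    if not pred_points:
        return 0.0
    correct = sum(
        1
        for key, value in pred_points.items()
        if key in gold_points
        and math.isclose(value, gold_points[key], rel_tol=rel_tol)
    )
    return correct / len(pred_points)


if __name__ == "__main__":
    pred = {"Revenue": {"2022": 100.0, "2023": 120.5}, "Cost": {"2022": 80.0}}
    gold = {"Revenue": {"2022": 100.0, "2023": 121.0}, "Cost": {"2022": 90.0}}
    print(round(point_precision(pred, gold), 3))  # 2 of 3 points match
```

A tolerance is used because values read off a chart are rarely exact; a stricter or looser threshold changes the score, so it should be reported alongside the metric.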

95%+ Chart Parsing Precision 95%+ 图表解析 Precision
Reduced manual annotation costs significantly 降低人工标注成本
Supported numerical reasoning for financial charts 支撑金融图表数值推理

03

Capabilities

专业核心技能

AI Applications & Multimodal

AI 应用与多模态

LLM, VLM, RAG, Prompt Engineering, SFT, LoRA, QwenVL, LayoutLM, CLIP, BERT, DINOv2.

ML & Data Engineering

机器学习与数据

Python, SQL, R, PyTorch, Scikit-learn, Pandas, classical machine learning, deep learning, time-series modeling, and data visualization.

Python, SQL, R, PyTorch, Scikit-learn, Pandas, 传统机器学习、深度学习、时序建模与数据可视化。

Product Engineering

产品工程

SwiftUI, FastAPI, Docker, Linux, Git, API Design, model evaluation, data cleaning, and annotation pipelines.

SwiftUI, FastAPI, Docker, Linux, Git, API 设计, 模型评测、数据清洗与标注管线。

Languages & Certifications

语言与证书

IELTS 6.5, CET-6. Proficient in reading and writing English technical documentation and in professional communication.

大学英语六级 (CET-6),雅思 6.5。具备良好的英文技术文档阅读、撰写与日常交流能力。

04

PawMood: Pet Meme Maker

PawMood:宠物表情包生成器

Indie Project · AI Pet Meme Maker App

独立开发 · AI 宠物表情包生成 App

A complete AI product loop: users upload a photo, a VLM interprets the pet's pose and the image context, and the app generates shareable English meme captions and cards. The project includes a SwiftUI iOS client, a FastAPI backend, Sign in with Apple, subscription entitlements, Supabase data sync, and cloud deployment.

从拍照/相册上传开始,用 VLM 理解宠物姿态和画面情绪,再生成适合社交分享的英文 meme 文案与卡片。项目覆盖 SwiftUI 客户端、FastAPI 后端、账号登录、订阅权益、存储和线上部署,是一个完整的 AI 产品闭环。

UX 体验

Seamless photo upload, mood selection, meme generation, card layout toggle, re-generation, and local saving.

照片上传、情绪选择、文案生成、卡片布局切换、重生成与下载保存。

Engineering 工程

Sign in with Apple, RevenueCat subscription tiers, Supabase sync, and FastAPI cloud deployment.

Sign in with Apple、RevenueCat 多档会员权益、Supabase 数据同步、FastAPI 服务部署。

AI

VLM captioning for raw pet scene descriptions combined with styled prompt constraints for shareable copy.

VLM Caption 生成宠物描述,结合风格和语气约束生成可分享的 meme 文案。
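
The second stage described above, turning a raw VLM caption into styled copy, can be sketched as a small prompt builder. This is an assumption-laden illustration: `build_meme_prompt`, the `mood` and `max_words` parameters, and the specific constraint wording are hypothetical and not taken from the app's code.

```python
def build_meme_prompt(caption: str, mood: str, max_words: int = 12) -> str:
    """Combine a raw VLM scene caption with style constraints into one prompt.

    All constraint wording here is illustrative, not the app's real prompt.
    """
    caption = caption.strip()
    if not caption:
        raise ValueError("caption must not be empty")
    constraints = [
        f"Tone: {mood}.",
        f"Length: at most {max_words} words.",
        "Language: English.",
        "Output: one meme caption, no hashtags, no quotation marks.",
    ]
    return (
        "You write captions for pet memes.\n"
        f"Scene description: {caption}\n"
        "Constraints:\n"
        + "\n".join(f"- {c}" for c in constraints)
    )


if __name__ == "__main__":
    print(build_meme_prompt("a cat loafing on a laptop keyboard", "sassy"))
```

Keeping the factual scene description separate from the style constraints means the same caption can be re-used when the user picks a different mood or asks for a re-generation.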

SwiftUI · FastAPI · VLM · RevenueCat · Supabase

05

Connect

联系与合作

Interested in AI products, indie development, or multimodal applications? Let's connect.

想聊 AI 产品、独立开发或多模态应用?随时联系我。