# auto-slicing
[English](README_en.md) | [简体中文](README.md)
### One-sentence Summary
Automated slicing: a project that implements automated video slicing through ASR + LLM Agent + MCP.
### 👐 Self-Recommendation
- One day, while watching a live stream, you suddenly have an idea and fantasize about becoming one of those `super slicers`, making clips for your beloved streamer or your favorite V🤳. What should you do?
- You have the desire but no time to edit, and you have never even learned how. What should you do?
- You work up the courage to prepare the recorded video for editing, only to realize that this is just the starting point 🙂↕️.
- Sifting through footage, managing the timeline ⌚️, coming up with a suitable title 🙋: it is all too complicated 😩. And if you miss the live stream 🏃♂️ one day, you have to watch the entire recording again before slicing. What should you do? 😭
- You say you love it, and it truly is wonderful, but it still takes too much effort. Is there a simpler, more powerful way to slice videos?
- Yes, there is, my friend.
- Check out `auto-slicing`: you handle the recording, and it handles the slicing.
The project is still at an early stage; please look forward to further rounds of optimization.
## 👀 Quick Start
### 1. Install Dependencies
#### 1.1 Clone this project or download the zip package directly
Prerequisites:
- GPU memory >= 8GB; I haven't tried smaller ones.
- RAM >= 32GB.
#### 1.2 Configure Python Environment
Python version: In theory, Python `3.11~3.12` works fine. There were some issues with `3.13` earlier, and it is unclear whether they have been fixed.
1. It is recommended to use uv for installation.
```bash
uv pip install -r requirements.txt
```
2. Or install directly using pip.
```bash
pip install -r requirements.txt
```
Note: If you encounter an error related to Pillow during installation, you can run:
```bash
sudo apt-get install -y libjpeg-dev zlib1g-dev
```
#### 1.3 Configure ffmpeg
The editing part of this project is implemented by [`zakahan/vedit-mcp`](https://github.com/zakahan/vedit-mcp), which relies on `ffmpeg`. Therefore, please configure ffmpeg.
```bash
# ubuntu
sudo apt update
sudo apt install ffmpeg
```
#### 1.4 Download ASR Model Weights
- Audio analysis uses speech recognition plus voice activity detection (VAD). Please refer to:
- https://github.com/FunAudioLLM/SenseVoice
- [iic/SenseVoiceSmall](https://www.modelscope.cn/models/iic/SenseVoiceSmall) - corresponds to `SENSE_VOICE_LOCAL_MODEL_PATH`
- [iic/speech_fsmn_vad_zh-cn-16k-common-pytorch](https://www.modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) - corresponds to `SENSE_VOICE_LOCAL_VAD_MODEL_PATH`
Note: This part currently only supports local inference; API support may be added later.
- Alternatively, you can choose to download from a cloud storage link:
> File shared via cloud storage: iic.zip
> Link: https://pan.baidu.com/s/128FIp4k8qGez5pjBAP9Pbg?pwd=6rfi Extraction code: 6rfi
#### 1.5 Configure Environment Variables
```bash
cd auto-slicing/src
cp .env.example .env
```
Edit the `.env` file and adjust the configuration to match your setup.
Note: This project currently uses the API of the [`Volcano Ark Platform`](https://www.volcengine.com/product/ark), so both `OPENAI_API_BASE` and `OPENAI_API_KEY` come from that platform.
1. `OPENAI_API_BASE`: Currently set to the API base of the Volcano Ark platform.
2. `OPENAI_API_KEY`: It is recommended to export this as an environment variable to avoid leaking it, but you can also set it directly here.
3. `OPENAI_MODEL` and `OPENAI_MODEL_THINKING`: Model names; adjust them to your setup.
4. `SENSE_VOICE_LOCAL_MODEL_PATH`: Set this to the path of the downloaded SenseVoice model weights.
5. `SENSE_VOICE_LOCAL_VAD_MODEL_PATH`: Set this to the path of the downloaded VAD model weights.
6. `KB_BASE_PATH`: The base path for slicing; all file paths are resolved relative to it.
Note: Absolute paths are recommended for the paths above.
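For illustration only, a filled-in `.env` might look like the sketch below. All values are hypothetical placeholders; copy the real key names from `.env.example`.
```bash
# Hypothetical example values; the authoritative key list is .env.example
OPENAI_API_BASE=https://ark.cn-beijing.volces.com/api/v3   # Volcano Ark endpoint
OPENAI_API_KEY=                    # better: export OPENAI_API_KEY in your shell
OPENAI_MODEL=your-chat-model-name
OPENAI_MODEL_THINKING=your-thinking-model-name
SENSE_VOICE_LOCAL_MODEL_PATH=/abs/path/to/SenseVoiceSmall
SENSE_VOICE_LOCAL_VAD_MODEL_PATH=/abs/path/to/speech_fsmn_vad_zh-cn-16k-common-pytorch
KB_BASE_PATH=/abs/path/to/workdir
```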
### 2. Start the Project
(Choose either 2.1 or 2.2)
#### 2.1 Script Start
Please edit the query in `src/main.py` to describe what you need.
Note: `raw_video` must be a path relative to `KB_BASE_PATH`. This design reduces the chance of path errors when the large model invokes the editing tools.
```bash
cd src
python main.py
```
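To illustrate the `KB_BASE_PATH`-relative convention, here is a small sketch of how such a path could be resolved and validated. This is not code from `main.py`; the function name and paths are hypothetical.

```python
from pathlib import Path

def resolve_raw_video(kb_base_path: str, raw_video: str) -> Path:
    """Resolve a KB_BASE_PATH-relative video path, rejecting escapes."""
    base = Path(kb_base_path).resolve()
    full = (base / raw_video).resolve()
    # Refuse absolute inputs or paths that climb out of the base directory.
    if Path(raw_video).is_absolute() or not full.is_relative_to(base):
        raise ValueError(f"raw_video must stay inside {base}")
    return full
```

Keeping every video path anchored under one base directory means the agent only ever passes short relative paths around, which is much harder to get wrong.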
#### 2.2 Web UI
```bash
bash start_up.sh
```
Then open the web UI in your browser.
## 🫡 Implementation Introduction
The overall architecture diagram is as follows:

For specifics, look at the code under `src/processor`, where the entry point of each module lives; the overall idea is already clear from the diagram.
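As a rough illustration of the ASR → LLM Agent → MCP flow, the sketch below mimics the pipeline shape with stand-in functions. All names here are hypothetical, and the keyword filter is only a placeholder for the LLM agent's actual selection step.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds into the recording
    end: float
    text: str     # ASR transcript for this span

def pick_highlights(segments: list[Segment], keywords: set[str]) -> list[Segment]:
    """Placeholder for the LLM agent: keep segments mentioning a keyword."""
    return [s for s in segments if any(k in s.text for k in keywords)]

def to_edit_plan(clips: list[Segment]) -> list[dict]:
    """Placeholder for the vedit-mcp call: emit cut instructions."""
    return [{"op": "cut", "start": c.start, "end": c.end} for c in clips]

segments = [
    Segment(0.0, 5.0, "hello everyone, welcome"),
    Segment(5.0, 12.0, "insane clutch play right there"),
]
plan = to_edit_plan(pick_highlights(segments, {"clutch"}))
```

In the real project, ASR produces the segments, the LLM agent decides which spans are worth slicing, and the MCP server turns the plan into actual ffmpeg edits.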
## ✅ Todo List
- [ ] Add prompt switching to support opening and closing titles.
- [ ] Implement support for ASR API to break free from local inference limitations.
- [ ] Extend `vedit-mcp`; it currently supports only basic editing functions and needs more capabilities.
- [ ] Add subtitle functionality.
- [ ] Add API call methods for speech recognition.
- [ ] Add cover generation functionality; start with a simple version.
- [ ] Consider support for song responses.
- [ ] Consider using speaker separation to support scenarios where audio signals are not unique, such as game responses and video playback.
## 🔥 Latest News
- 2025-05-18: Could not resolve the streamlit `file_uploader` bug; switched to gradio for the implementation.
- 2025-05-08: Implemented a simple web UI using streamlit.