# auto-slicing
[English](README_en.md) | [简体中文](README.md)
### One-sentence Summary
Automated slicing: a project that implements automated video slicing through ASR + LLM Agent + MCP.
### 👐 Self-Recommendation
- One day, while watching a live stream, you suddenly have an idea and fantasize about becoming one of those `super slicers`, making clips for your beloved streamer or your favorite V🤳. What should you do?
- You have the desire but no time to edit, and you have never even learned how. What should you do?
- You work up the courage to prepare the recorded video for editing, only to realize that this is just the starting point 🙂↕️.
- Sifting through footage, managing the timeline ⌚️, coming up with a suitable title 🙋: it is all too complicated 😩. And if you miss the live stream 🏃♂️ one day, you have to watch the entire recording again before slicing. What should you do? 😭
- You say you love it, and it truly is wonderful, but it still takes too much effort. Is there a simpler, more powerful way to slice videos?
- Yes, there is, my friend.
- Check out `auto-slicing`: you handle the recording, and it handles the slicing.
The project is still at an early stage; please look forward to further rounds of optimization.
## 👀 Quick Start
### 1. Install Dependencies
#### 1.1 Clone this project or download the zip package directly
Prerequisites:
- GPU memory >= 8GB; I haven't tried smaller ones.
- RAM >= 32GB.
#### 1.2 Configure Python Environment
Python version: In theory, Python `3.11~3.12` works fine. There were some issues with `3.13` earlier, and it is unclear whether they have been fixed.
1. It is recommended to use uv for installation.
```bash
uv pip install -r requirements.txt
```
2. Or install directly using pip.
```bash
pip install -r requirements.txt
```
Note: If you encounter an error related to Pillow during installation, you can run:
```bash
sudo apt-get install -y libjpeg-dev zlib1g-dev
```
#### 1.3 Configure ffmpeg
The editing part of this project is implemented by [`zakahan/vedit-mcp`](https://github.com/zakahan/vedit-mcp), which relies on `ffmpeg`. Therefore, please configure ffmpeg.
```bash
# ubuntu
sudo apt update
sudo apt install ffmpeg
```
#### 1.4 Download ASR Model Weights
- Audio analysis uses speech recognition plus voice activity detection (VAD). Please refer to:
- https://github.com/FunAudioLLM/SenseVoice
- [iic/SenseVoiceSmall](https://www.modelscope.cn/models/iic/SenseVoiceSmall) - corresponds to `SENSE_VOICE_LOCAL_MODEL_PATH`
- [iic/speech_fsmn_vad_zh-cn-16k-common-pytorch](https://www.modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) - corresponds to `SENSE_VOICE_LOCAL_VAD_MODEL_PATH`
Note: This part currently only supports local inference; API support may be added later.
- Alternatively, you can choose to download from a cloud storage link:
> File shared via cloud storage: iic.zip
> Link: https://pan.baidu.com/s/128FIp4k8qGez5pjBAP9Pbg?pwd=6rfi Extraction code: 6rfi
#### 1.5 Configure Environment Variables
```bash
cd auto-slicing/src
cp .env.example .env
```
Edit the `.env` file and adjust the configuration to match your setup.
Note: This project currently uses the API of the [`Volcano Ark Platform`](https://www.volcengine.com/product/ark), so both `OPENAI_API_BASE` and `OPENAI_API_KEY` come from that platform.
1. `OPENAI_API_BASE`: Currently set to the API base of the Volcano Ark platform.
2. `OPENAI_API_KEY`: It is recommended to export this as an environment variable to avoid leaking it, but you can also set it directly here.
3. `OPENAI_MODEL` and `OPENAI_MODEL_THINKING`: Model names; adjust them to your setup.
4. `SENSE_VOICE_LOCAL_MODEL_PATH`: Set this to the path of the downloaded SenseVoice model weights.
5. `SENSE_VOICE_LOCAL_VAD_MODEL_PATH`: Set this to the path of the downloaded VAD model weights.
6. `KB_BASE_PATH`: The base path for slicing; all file paths are resolved relative to it.
Note: Absolute paths are recommended for the paths above.
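For illustration only, a filled-in `.env` might look like the sketch below. All values are hypothetical placeholders; copy the real key names from `.env.example`.
```bash
# Hypothetical example values; the authoritative key list is .env.example
OPENAI_API_BASE=https://ark.cn-beijing.volces.com/api/v3   # Volcano Ark endpoint
OPENAI_API_KEY=                    # better: export OPENAI_API_KEY in your shell
OPENAI_MODEL=your-chat-model-name
OPENAI_MODEL_THINKING=your-thinking-model-name
SENSE_VOICE_LOCAL_MODEL_PATH=/abs/path/to/SenseVoiceSmall
SENSE_VOICE_LOCAL_VAD_MODEL_PATH=/abs/path/to/speech_fsmn_vad_zh-cn-16k-common-pytorch
KB_BASE_PATH=/abs/path/to/workdir
```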
### 2. Start the Project
(Choose either 2.1 or 2.2)
#### 2.1 Script Start
Please edit the query in `src/main.py` to describe what you need.
Note: `raw_video` must be a path relative to `KB_BASE_PATH`. This design reduces the chance of path errors when the large model invokes the editing tools.
```bash
cd src
python main.py
```
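To illustrate the `KB_BASE_PATH`-relative convention, here is a small sketch of how such a path could be resolved and validated. This is not code from `main.py`; the function name and paths are hypothetical.

```python
from pathlib import Path

def resolve_raw_video(kb_base_path: str, raw_video: str) -> Path:
    """Resolve a KB_BASE_PATH-relative video path, rejecting escapes."""
    base = Path(kb_base_path).resolve()
    full = (base / raw_video).resolve()
    # Refuse absolute inputs or paths that climb out of the base directory.
    if Path(raw_video).is_absolute() or not full.is_relative_to(base):
        raise ValueError(f"raw_video must stay inside {base}")
    return full
```

Keeping every video path anchored under one base directory means the agent only ever passes short relative paths around, which is much harder to get wrong.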
#### 2.2 Web UI
```bash
bash start_up.sh
```
Then open the web UI in your browser.
## 🫡 Implementation Introduction
The overall architecture diagram is as follows:

For specifics, look at the code under `src/processor`, where the entry point of each module lives; the overall idea is already clear from the diagram.
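As a rough illustration of the ASR → LLM Agent → MCP flow, the sketch below mimics the pipeline shape with stand-in functions. All names here are hypothetical, and the keyword filter is only a placeholder for the LLM agent's actual selection step.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds into the recording
    end: float
    text: str     # ASR transcript for this span

def pick_highlights(segments: list[Segment], keywords: set[str]) -> list[Segment]:
    """Placeholder for the LLM agent: keep segments mentioning a keyword."""
    return [s for s in segments if any(k in s.text for k in keywords)]

def to_edit_plan(clips: list[Segment]) -> list[dict]:
    """Placeholder for the vedit-mcp call: emit cut instructions."""
    return [{"op": "cut", "start": c.start, "end": c.end} for c in clips]

segments = [
    Segment(0.0, 5.0, "hello everyone, welcome"),
    Segment(5.0, 12.0, "insane clutch play right there"),
]
plan = to_edit_plan(pick_highlights(segments, {"clutch"}))
```

In the real project, ASR produces the segments, the LLM agent decides which spans are worth slicing, and the MCP server turns the plan into actual ffmpeg edits.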
## ✅ Todo List
- [ ] Add prompt switching to support opening and closing titles.
- [ ] Implement support for ASR API to break free from local inference limitations.
- [ ] Extend `vedit-mcp`; it currently supports only basic editing functions and needs more capabilities.
- [ ] Add subtitle functionality.
- [ ] Add API call methods for speech recognition.
- [ ] Add cover generation functionality; start with a simple version.
- [ ] Consider support for song responses.
- [ ] Consider using speaker separation to support scenarios where audio signals are not unique, such as game responses and video playback.
## 🔥 Latest News
- 2025-05-18: Could not resolve the streamlit `file_uploader` bug; switched to gradio for the implementation.
- 2025-05-08: Implemented a simple web UI using streamlit.