Python in Practice: Downloading M3U8 Video with List Comprehensions and Requests, Automatically Filtering Ad .ts Files

Published: 2026/6/6 8:31:51

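Python in practice: building an M3U8 video downloader with ad filtering

Ads that cut into an episode are always annoying when you are following a series on a video site. As developers, we can build our own download tool in Python that skips ad segments automatically and saves the video locally for offline viewing. This article starts from scratch and builds the core of that tool in under 100 lines of code.

1. Environment setup and basic principles

Before writing any code, a few key concepts. M3U8 is the playlist format used by the HTTP Live Streaming (HLS) protocol. It splits a video into many small .ts files, which makes network transfer and adaptive bitrate switching easier. Our task is to parse the playlist and download every real video segment while filtering out the ads.

First make sure the required libraries are installed:

    pip install requests pycryptodome

Note: pycryptodome is the maintained replacement for the old Crypto (PyCrypto) library. It offers the same functionality and is still imported as Crypto.

An M3U8 file typically looks like this:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:10
    #EXTINF:9.009,
    http://example.com/segment1.ts
    #EXTINF:9.009,
    http://ad.server.com/ad1.ts
    #EXTINF:9.009,
    http://example.com/segment2.ts

2. Intelligent ad filtering

Identifying ads is the core challenge. Common ad signatures include:

- specific domains, such as ad., ads. or doubleclick.net
- specific paths in the URL, such as /ad/ or /commercial/
- file naming patterns, such as names starting with ad_

2.1 Basic filtering

The simplest approach is to drop every URL that contains an ad keyword:

    def filter_ads(lines):
        ad_keywords = ["ad.", "ads.", "doubleclick.net"]
        return [
            line.strip()
            for line in lines
            if line.strip().endswith(".ts")
            and not any(keyword in line for keyword in ad_keywords)
        ]

2.2 Advanced dynamic filtering

A more robust approach keeps the ad rules in an external file so they can be updated without touching the code:

    def load_filter_rules(rule_file="ad_rules.txt"):
        # One rule per line; blank lines are ignored
        with open(rule_file, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]

    def advanced_filter(lines, rules):
        return [
            line.strip()
            for line in lines
            if line.strip().endswith(".ts")
            and not any(rule in line for rule in rules)
        ]

Example rule file:

    ad.
    ads.
    /ad/
    /commercial/
    tracking.

3. A robust download engine

Downloading a large number of small files means dealing with network errors and performance. Here is an improved downloader:

    import time
    from pathlib import Path

    import requests

    def download_segments(urls, output_dir="output", max_retries=3):
        Path(output_dir).mkdir(exist_ok=True)
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        }
        for idx, url in enumerate(urls, 1):
            filename = f"{idx:010d}.ts"  # zero-padded to 10 digits
            filepath = Path(output_dir) / filename
            for attempt in range(max_retries):
                try:
                    resp = requests.get(url, headers=headers, timeout=20)
                    resp.raise_for_status()
                    with open(filepath, "wb") as f:
                        f.write(resp.content)
                    print(f"Downloaded: {filename}")
                    break
                except Exception as e:
                    print(f"Attempt {attempt + 1}/{max_retries} failed: {e}")
                    if attempt == max_retries - 1:
                        print(f"Could not download {url}, skipping...")
                    else:
                        time.sleep(2 ** attempt)  # exponential backoff

Key points:

- the output directory is created automatically
- file names are zero-padded so they sort in the right order
- failed requests are retried with exponential backoff
- progress is reported for every segment

4. Merging and decrypting the video

4.1 Basic merge

Once the download finishes, the files can be merged with a system command. This is plain byte concatenation, so the result is still an MPEG-TS stream even when the output file is named .mp4.

    import os
    import subprocess
    from pathlib import Path

    def merge_ts_files(input_dir="output", output_file="output.mp4"):
        if not Path(input_dir).exists():
            raise FileNotFoundError(f"Directory {input_dir} does not exist")
        if os.name == "nt":
            # Windows: binary concatenation with copy
            cmd = f"copy /b {input_dir}\\*.ts {output_file}"
        else:
            # Linux/macOS: the shell expands the glob in sorted order
            cmd = f"cat {input_dir}/*.ts > {output_file}"
        subprocess.run(cmd, shell=True, check=True)

4.2 Handling encrypted video

Encrypted segments have to be decrypted before merging. When the playlist gives no IV, HLS uses the segment's media sequence number as the IV, so it is passed in here as seq_no rather than letting the library pick a random one.

    import requests
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad

    def decrypt_ts_file(encrypted_data, key, iv=None, seq_no=0):
        if iv is None:
            # HLS default: media sequence number as a 16-byte big-endian IV
            iv = seq_no.to_bytes(16, "big")
        cipher = AES.new(key, AES.MODE_CBC, iv=iv)
        # HLS AES-128 segments are PKCS7-padded
        return unpad(cipher.decrypt(encrypted_data), AES.block_size)

    def download_and_decrypt(url, key, output_path, iv=None, seq_no=0):
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            decrypted_data = decrypt_ts_file(response.content, key, iv, seq_no)
            with open(output_path, "wb") as f:
                f.write(decrypted_data)
            return True
        return False

5. The complete workflow

All the components combine into one complete solution:

    def process_m3u8(m3u8_url, output_video="output.mp4", ad_rules_file=None):
        print("Downloading the M3U8 file...")
        resp = requests.get(m3u8_url, timeout=20)
        resp.raise_for_status()
        m3u8_content = resp.text.splitlines()

        print("Filtering ad segments...")
        if ad_rules_file:
            rules = load_filter_rules(ad_rules_file)
            segments = advanced_filter(m3u8_content, rules)
        else:
            segments = filter_ads(m3u8_content)

        print(f"Found {len(segments)} valid video segments")
        download_segments(segments)

        print("Merging video files...")
        merge_ts_files(output_file=output_video)
        print(f"Video saved as {output_video}")

Usage example:

    if __name__ == "__main__":
        process_m3u8(
            "http://example.com/playlist.m3u8",
            "my_video.mp4",
            "ad_rules.txt",
        )

6. Advanced extensions

6.1 Concurrent downloads

A thread pool speeds up the download. The functions in this section call a helper, download_single_segment, which the original listing does not define; a minimal version is assumed here.

    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    import requests

    def download_single_segment(url, output_dir, filename, timeout=20):
        # Assumed helper: fetch one segment and write it to output_dir/filename
        Path(output_dir).mkdir(exist_ok=True)
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        (Path(output_dir) / filename).write_bytes(resp.content)

    def concurrent_download(urls, output_dir="output", max_workers=5):
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = []
            for idx, url in enumerate(urls, 1):
                filename = f"{idx:010d}.ts"
                futures.append(executor.submit(
                    download_single_segment, url, output_dir, filename
                ))
            for future in futures:
                future.result()  # wait for every task; re-raises any download error

6.2 Detecting encrypted video automatically

Encryption is handled by parsing the EXT-X-KEY tag in the M3U8 file. The IV attribute is a hexadecimal string with a 0x prefix, so convert it with bytes.fromhex(iv[2:]) before passing it to AES.

    def parse_encryption_info(m3u8_content):
        key_uri = None
        iv = None
        for line in m3u8_content:
            if line.startswith("#EXT-X-KEY"):
                # Drop the tag name, then read the comma-separated attributes
                attributes = line.split(":", 1)[1].split(",")
                for part in attributes:
                    name, _, value = part.partition("=")
                    if name.strip() == "URI":
                        key_uri = value.strip().strip('"')
                    elif name.strip() == "IV":
                        iv = value.strip()
        return key_uri, iv

6.3 Progress display and resuming

A progress bar plus a check for already-downloaded files (requires pip install tqdm):

    from pathlib import Path

    from tqdm import tqdm

    def download_with_progress(urls, output_dir="output"):
        existing_files = set(f.name for f in Path(output_dir).glob("*.ts"))
        with tqdm(total=len(urls), desc="Download progress") as pbar:
            for idx, url in enumerate(urls, 1):
                filename = f"{idx:010d}.ts"
                if filename in existing_files:
                    # Already on disk from an earlier run: skip it
                    pbar.update(1)
                    continue
                download_single_segment(url, output_dir, filename)
                pbar.update(1)

7. Error handling and logging

Robust, production-grade code needs thorough error handling:

    import logging
    import time

    import requests

    logging.basicConfig(
        filename="video_downloader.log",
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
    )

    def safe_download(url, output_path, max_retries=3):
        for attempt in range(max_retries):
            try:
                response = requests.get(url, timeout=15)
                response.raise_for_status()
                with open(output_path, "wb") as f:
                    f.write(response.content)
                logging.info(f"Downloaded: {url}")
                return True
            except Exception as e:
                logging.warning(f"Attempt {attempt + 1} failed: {url} - {e}")
                if attempt == max_retries - 1:
                    logging.error(f"Download failed for good: {url}")
                    return False
                time.sleep(2 ** attempt)  # exponential backoff before the next try

In real projects this system has saved me a lot of time when following shows. The most satisfying part is the ad filter: because the rules live in a separate file, it can be updated continuously to handle new ad URL patterns.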
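One caveat about the substring rules used by the filters in section 2: a rule such as ad. also matches download.example.com, and ads. also matches uploads.example.com, so real segments can be dropped by accident. The following is a minimal sketch of a stricter matcher that compares whole host labels and whole path components instead. The names is_ad_url, filter_segments, AD_HOST_LABELS, AD_HOST_SUFFIXES and AD_PATH_PARTS are hypothetical and not part of the original code, and the rule sets are only illustrative.

```python
from urllib.parse import urlsplit

# Hypothetical rule sets for illustration; adjust them to the sites you see.
AD_HOST_LABELS = {"ad", "ads", "tracking"}   # compared with whole host labels
AD_HOST_SUFFIXES = ("doubleclick.net",)      # compared with the end of the host
AD_PATH_PARTS = {"ad", "commercial"}         # compared with whole path components


def is_ad_url(url):
    """Return True if the URL looks like an ad under the rules above."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # "ad.server.com" matches, "download.example.com" does not
    if any(label in AD_HOST_LABELS for label in host.split(".")):
        return True
    if any(host == s or host.endswith("." + s) for s in AD_HOST_SUFFIXES):
        return True
    segments = [p for p in parts.path.lower().split("/") if p]
    # Directory components such as /ad/ or /commercial/
    if any(seg in AD_PATH_PARTS for seg in segments[:-1]):
        return True
    # File names that start with "ad_"
    return bool(segments) and segments[-1].startswith("ad_")


def filter_segments(lines):
    """Keep .ts segment URLs that are not classified as ads."""
    urls = (line.strip() for line in lines)
    return [
        u for u in urls
        if urlsplit(u).path.endswith(".ts") and not is_ad_url(u)
    ]
```

Checking the URL path rather than the raw line also keeps segments whose URL carries a query string, such as segment.ts?token=..., which a plain endswith(".ts") test on the line would discard.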
