
摘要本文介绍一套开源的高校广告投放智能选校系统技术架构涵盖多维度评分模型、GIS地理分析、预算分配优化三个核心模块。系统采用Python技术栈Pandas/Scipy/Matplotlib基于线性规划与贪心算法实现预算约束下的最优解求解并给出完整可运行的代码实现与数据库设计方案。---一、项目背景与技术选型1.1 问题定义高校广告投放面临的核心问题是在有限预算下从N所高校中选择M所进行投放使得综合效益最大化。这是一个典型的带约束组合优化问题属于NP-Hard范畴。数学表述给定- 高校集合 U {u₁, u₂, ..., uₙ}- 每所高校的投放成本 cᵢ- 每所高校的综合评分 sᵢ由多维度模型计算得出- 总预算 B- 最少投放数量 L最多投放数量 H求解- 二元决策变量 xᵢ ∈ {0, 1}- 目标maximize Σ(sᵢ × xᵢ)- 约束Σ(cᵢ × xᵢ) ≤ BL ≤ Σxᵢ ≤ H1.2 技术栈选型| 模块 | 技术方案 | 选型理由 ||-----|---------|---------|| 数据处理 | Pandas NumPy | 高校数据结构化处理向量化运算效率高 || 评分模型 | 自定义加权评分 | 业务规则明确无需复杂ML模型 || 地理计算 | 原生Haversine公式 | 高校数量级小100无需引入GIS库 || 优化求解 | Scipy.optimize.linprog | 线性规划标准库轻量且足够 || 可视化 | Matplotlib | 漏斗图、柱状图、散点图标准方案 || 数据存储 | SQLite | 单机部署零配置适合中小型项目 |---二、数据模型设计2.1 高校实体模型采用Python dataclass定义核心实体确保类型安全与可读性pythonfrom dataclasses import dataclass, fieldfrom typing import Optionalfrom enum import Enumclass SchoolType(Enum):COMPREHENSIVE 综合ENGINEERING 理工NORMAL 师范FINANCE 财经MEDICAL 医药AGRICULTURE 农林class SchoolLevel(Enum):C985 985C211 211TIER1 一本TIER2 二本VOCATIONAL 专科dataclass(frozenTrue)class University:高校实体设计为frozen dataclass保证数据不可变便于哈希和缓存。uid: str # 唯一标识如 ZZUname: str # 高校名称city: str # 城市district: str # 区县school_type: SchoolType # 高校类型level: SchoolLevel # 层次student_count: int # 在校生人数male_ratio: float # 男生比例0-1monthly_consumption: float # 月均消费元daily_diners: int # 日均食堂就餐人次surrounding_malls: int # 周边3km商业综合体数competition_count: int # 周边同类竞品数ad_cost_30d: float # 30天投放成本lat: float # 纬度WGS84lng: float # 经度WGS84historical_roi: Optional[float] None # 历史ROI如有# 派生属性运行时计算不参与哈希_hash: int field(initFalse, reprFalse, compareFalse)def __post_init__(self):object.__setattr__(self, _hash,hash((self.uid, self.name, self.city)))def __hash__(self):return self._hash2.2 数据库Schema设计SQLitesql-- 高校基础信息表CREATE TABLE universities (uid TEXT PRIMARY KEY,name TEXT NOT NULL,city TEXT NOT NULL,district TEXT,school_type TEXT CHECK(school_type IN (综合,理工,师范,财经,医药,农林)),level TEXT CHECK(level IN (985,211,一本,二本,专科)),student_count INTEGER NOT NULL CHECK(student_count 0),male_ratio REAL CHECK(male_ratio BETWEEN 0 AND 1),monthly_consumption REAL,daily_diners INTEGER NOT NULL,surrounding_malls INTEGER DEFAULT 0,competition_count INTEGER DEFAULT 0,ad_cost_30d REAL NOT NULL,lat REAL NOT NULL,lng REAL NOT NULL,historical_roi REAL,created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);-- 创建索引加速查询CREATE INDEX idx_uni_city ON universities(city);CREATE INDEX idx_uni_type ON universities(school_type);CREATE INDEX idx_uni_level ON universities(level);-- 投放项目记录表CREATE TABLE campaigns (campaign_id INTEGER PRIMARY KEY AUTOINCREMENT,brand_name TEXT NOT NULL,category TEXT,target_gender TEXT CHECK(target_gender IN (male,female,neutral)),total_budget REAL NOT NULL,selected_schools TEXT, -- JSON数组存储选中高校的uidactual_cost REAL,estimated_roi REAL,start_date TEXT,end_date TEXT,status TEXT DEFAULT planned CHECK(status IN (planned,executing,completed,cancelled)),created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);-- 触发器自动更新updated_atCREATE TRIGGER update_uni_timestampAFTER UPDATE ON universitiesFOR EACH ROWBEGINUPDATE universities SET updated_at CURRENT_TIMESTAMP WHERE uid NEW.uid;END;---三、核心算法实现3.1 多维度评分引擎评分引擎采用策略模式设计便于后续扩展新的评分维度。pythonfrom abc import ABC, abstractmethodfrom typing import Dict, Callableimport pandas as pdimport numpy as npclass ScoringStrategy(ABC):评分策略抽象基类abstractmethoddef score(self, df: pd.DataFrame) - pd.Series:返回该维度的原始分数Seriespassabstractmethoddef weight(self) - float:返回该维度在综合评分中的权重passclass ScaleStrategy(ScoringStrategy):规模维度学生人数 日均就餐人次def score(self, df: pd.DataFrame) - pd.Series:# 分别归一化后加权student_norm (df[student_count] - df[student_count].min()) / (df[student_count].max() - df[student_count].min() 1e-8)diner_norm (df[daily_diners] - df[daily_diners].min()) / (df[daily_diners].max() - df[daily_diners].min() 1e-8)return student_norm * 0.6 diner_norm * 0.4def weight(self) - float:return 0.20class ConsumptionStrategy(ScoringStrategy):消费能力维度def score(self, df: pd.DataFrame) - pd.Series:return (df[monthly_consumption] - df[monthly_consumption].min()) / (df[monthly_consumption].max() - df[monthly_consumption].min() 1e-8)def weight(self) - float:return 0.15class CommercialStrategy(ScoringStrategy):商业配套维度def score(self, df: pd.DataFrame) - pd.Series:return (df[surrounding_malls] - df[surrounding_malls].min()) / (df[surrounding_malls].max() - df[surrounding_malls].min() 1e-8)def weight(self) - float:return 0.20class CompetitionStrategy(ScoringStrategy):竞争密度维度反向指标竞争越少越好def score(self, df: pd.DataFrame) - pd.Series:raw (df[competition_count] - df[competition_count].min()) / (df[competition_count].max() - df[competition_count].min() 1e-8)return 1 - raw # 反向def weight(self) - float:return 0.15class CostEfficiencyStrategy(ScoringStrategy):成本效率维度日均就餐人次 / 投放成本def score(self, df: pd.DataFrame) - pd.Series:efficiency df[daily_diners] / df[ad_cost_30d]return (efficiency - efficiency.min()) / (efficiency.max() - efficiency.min() 1e-8)def weight(self) - float:return 0.15class GenderMatchStrategy(ScoringStrategy):性别匹配维度需外部传入品牌画像def __init__(self, target_gender: str neutral):self.target_gender target_genderdef score(self, df: pd.DataFrame) - pd.Series:if self.target_gender male:return (df[male_ratio] - df[male_ratio].min()) / (df[male_ratio].max() - df[male_ratio].min() 1e-8)elif self.target_gender female:raw (df[male_ratio] - df[male_ratio].min()) / (df[male_ratio].max() - df[male_ratio].min() 1e-8)return 1 - rawelse:return pd.Series([0.5] * len(df), indexdf.index)def weight(self) - float:return 0.15class ScoringEngine:评分引擎组合多个策略计算综合评分def __init__(self, strategies: Dict[str, ScoringStrategy]):self.strategies strategiesdef calculate(self, df: pd.DataFrame) - pd.DataFrame:计算综合评分返回包含原始分、维度分、综合分的DataFrameresult df.copy()total_weight sum(s.weight() for s in self.strategies.values())weighted_sum pd.Series(0.0, indexdf.index)for name, strategy in self.strategies.items():raw_score strategy.score(df)result[fscore_{name}] raw_scoreweighted_sum raw_score * strategy.weight()result[total_score] weighted_sum / total_weightreturn result3.2 GIS地理分析模块采用Haversine公式计算球面距离精度足够且无需外部依赖。pythonfrom math import radians, cos, sin, asin, sqrtfrom typing import List, Dict, Tupledef haversine(lon1: float, lat1: float, lon2: float, lat2: float) - float:Haversine公式计算两点间球面距离单位公里精度WGS84椭球体上误差 0.5%适用于短距离 100km的高校周边分析lon1, lat1, lon2, lat2 map(radians, [lon1, lat1, lon2, lat2])dlon lon2 - lon1dlat lat2 - lat1a sin(dlat / 2) ** 2 cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2c 2 * asin(sqrt(a))r 6371.0 # 地球平均半径公里return c * rclass GeoCoverageAnalyzer:地理覆盖分析器功能1. 计算高校周边门店覆盖情况2. 生成覆盖评分0-13. 支持多边形围栏判定扩展def __init__(self, stores: List[Dict]):初始化stores: [{name, lat, lng, type}, ...]self.stores storesself._cache: Dict[str, List[Dict]] {} # 缓存计算结果def query_coverage(self, university, radius_km: float 3.0) - List[Dict]:查询指定高校周边门店使用缓存避免重复计算适合批量高校分析场景。cache_key f{university.uid}_{radius_km}if cache_key in self._cache:return self._cache[cache_key]coverage []for store in self.stores:dist haversine(university.lng, university.lat, store[lng], store[lat])if dist radius_km:coverage.append({store_name: store[name],distance_km: round(dist, 2),store_type: store.get(type, unknown),within_1km: dist 1.0})# 按距离排序coverage.sort(keylambda x: x[distance_km])self._cache[cache_key] coveragereturn coveragedef calculate_coverage_score(self, university, radius_km: float 3.0) - float:计算覆盖评分评分规则- 基础分3km内有门店 0.6- 密度加分每多1家 0.1上限0.3- proximity加分最近门店 1km 0.1- 无门店0.0coverage self.query_coverage(university, radius_km)if not coverage:return 0.0score 0.6score min(len(coverage) * 0.1, 0.3)if coverage[0][within_1km]:score 0.1return min(score, 1.0)def batch_analyze(self, universities: List[University]) - pd.DataFrame:批量分析多所高校返回DataFrame包含覆盖门店数、最近距离、覆盖评分records []for u in universities:coverage self.query_coverage(u)score self.calculate_coverage_score(u)records.append({uid: u.uid,name: u.name,store_count: len(coverage),nearest_distance: coverage[0][distance_km] if coverage else None,coverage_score: score})return pd.DataFrame(records)3.3 预算分配优化器采用线性规划松弛 贪心修正的两阶段策略兼顾求解质量与计算效率。pythonfrom scipy.optimize import linprogimport numpy as npfrom typing import List, Tupleclass BudgetOptimizer:预算分配优化器算法两阶段优化Phase 1: 线性规划松弛允许0-1之间的连续解Phase 2: 贪心修正基于性价比排序将连续解转为0-1整数解时间复杂度O(n log n)适合n 1000的场景。def __init__(self, min_schools: int 5, max_schools: int 15):self.min_schools min_schoolsself.max_schools max_schoolsdef optimize(self,universities: List[University],total_budget: float,scores: Dict[str, float] None) - Dict:执行优化参数:universities: 候选高校列表已按评分排序total_budget: 总预算元scores: 高校uid - 综合评分的映射如已预计算返回:dict: 包含selected(选中高校), total_cost, total_score, remaining_budgetn len(universities)if n 0 or total_budget 0:return {selected: [], total_cost: 0, total_score: 0, remaining_budget: total_budget}# 获取评分if scores is None:scores {u.uid: getattr(u, total_score, 0.5) for u in universities}# Phase 1: 线性规划松弛# 目标最大化总评分 - 最小化负评分c [-scores.get(u.uid, 0) for u in universities]# 约束1预算上限A_ub [[u.ad_cost_30d for u in universities]]b_ub [total_budget]# 变量边界0 x 1bounds [(0, 1) for _ in range(n)]# 求解lp_result linprog(c, A_ubA_ub, b_ubb_ub, boundsbounds, methodhighs)# Phase 2: 贪心修正将连续解转为整数解# 策略按性价比 评分 / 成本降序依次选中直到预算耗尽items []for i, u in enumerate(universities):score scores.get(u.uid, 0)cost u.ad_cost_30defficiency score / cost if cost 0 else 0items.append((efficiency, score, cost, u))# 按性价比降序items.sort(keylambda x: x[0], reverseTrue)selected []remaining total_budgetfor eff, score, cost, u in items:if len(selected) self.max_schools:breakif cost remaining:selected.append(u)remaining - cost# 兜底如果选中数量不足min_schools放宽预算约束按成本最低选if len(selected) self.min_schools:unselected [u for _, _, _, u in items if u not in selected]unselected.sort(keylambda u: u.ad_cost_30d)for u in unselected:if len(selected) self.min_schools:breakif u.ad_cost_30d remaining:selected.append(u)remaining - u.ad_cost_30dtotal_cost sum(u.ad_cost_30d for u in selected)total_score sum(scores.get(u.uid, 0) for u in selected)return {selected: selected,total_cost: total_cost,total_score: total_score,remaining_budget: remaining,lp_relaxed_value: -lp_result.fun if lp_result.success else None}---四、系统集成与API设计4.1 核心工作流pythonclass AdPlacementSystem:广告投放智能选校系统主入口def __init__(self, db_path: str university_ad.db):self.db_path db_pathself.scoring_engine Noneself.geo_analyzer Noneself.optimizer Nonedef load_data(self, city: str None) - pd.DataFrame:从SQLite加载高校数据import sqlite3conn sqlite3.connect(self.db_path)query SELECT * FROM universitiesif city:query f WHERE city {city}df pd.read_sql(query, conn)conn.close()return dfdef configure(self, brand_profile: Dict, stores: List[Dict]):根据品牌画像配置系统brand_profile: {target_gender: male|female|neutral,category: 茶饮|快餐|数码...}strategies {scale: ScaleStrategy(),consumption: ConsumptionStrategy(),commercial: CommercialStrategy(),competition: CompetitionStrategy(),cost_efficiency: CostEfficiencyStrategy(),gender_match: GenderMatchStrategy(brand_profile.get(target_gender, neutral)),}self.scoring_engine ScoringEngine(strategies)self.geo_analyzer GeoCoverageAnalyzer(stores)self.optimizer BudgetOptimizer()def run(self, df: pd.DataFrame, total_budget: float) - Dict:执行完整选校流程返回包含评分结果、地理分析、优化方案的字典# Step 1: 多维度评分scored_df self.scoring_engine.calculate(df)# Step 2: GIS地理分析universities [University(**row) for _, row in scored_df.iterrows()]geo_df self.geo_analyzer.batch_analyze(universities)scored_df scored_df.merge(geo_df[[uid, coverage_score]], onuid, howleft)# 综合评分 维度评分 * 0.8 地理覆盖 * 0.2scored_df[final_score] scored_df[total_score] * 0.8 scored_df[coverage_score].fillna(0) * 0.2# Step 3: 预算优化scores_map dict(zip(scored_df[uid], scored_df[final_score]))sorted_unis sorted(universities, keylambda u: scores_map.get(u.uid, 0), reverseTrue)opt_result self.optimizer.optimize(sorted_unis, total_budget, scores_map)return {scored_data: scored_df,geo_analysis: geo_df,optimization: opt_result}4.2 REST API设计FastAPI示例pythonfrom fastapi import FastAPI, HTTPExceptionfrom pydantic import BaseModelfrom typing import List, Optionalapp FastAPI(title高校广告投放智能选校系统, version1.0.0)class BrandProfile(BaseModel):target_gender: str neutral # male/female/neutralcategory: Optional[str] Noneavg_order_value: Optional[float] Noneclass Store(BaseModel):name: strlat: floatlng: floattype: Optional[str] unknownclass OptimizationRequest(BaseModel):city: strtotal_budget: floatbrand_profile: BrandProfilestores: List[Store]class OptimizationResponse(BaseModel):selected_schools: List[Dict]total_cost: floattotal_score: floatremaining_budget: floatestimated_reach: intapp.post(/api/v1/optimize, response_modelOptimizationResponse)async def optimize(request: OptimizationRequest):选校优化接口示例请求{city: 郑州,total_budget: 500000,brand_profile: {target_gender: female, category: 茶饮},stores: [{name: 郑大店, lat: 34.82, lng: 113.645}]}system AdPlacementSystem()df system.load_data(request.city)if len(df) 0:raise HTTPException(status_code404, detailf未找到城市 {request.city} 的高校数据)system.configure(request.brand_profile.dict(),[s.dict() for s in request.stores])result system.run(df, request.total_budget)opt result[optimization]# 估算触达人数estimated_reach sum(u.daily_diners * 30 * 0.9 for u in opt[selected])return OptimizationResponse(selected_schools[{uid: u.uid,name: u.name,cost: u.ad_cost_30d,score: result[scored_data].loc[result[scored_data][uid] u.uid, final_score].values[0]} for u in opt[selected]],total_costopt[total_cost],total_scoreopt[total_score],remaining_budgetopt[remaining_budget],estimated_reachint(estimated_reach))---五、可视化输出5.1 高校-门店分布散点图pythonimport matplotlib.pyplot as pltfrom matplotlib.patches import Circledef plot_distribution(universities, stores, selectedNone):绘制高校与门店地理分布selected: 被选中的高校uid列表高亮显示fig, ax plt.subplots(figsize(14, 10))# 绘制所有高校for u in universities:is_selected selected and u.uid in selectedcolor #F97316 if is_selected else #93C5FDsize 300 if is_selected else u.student_count / 200alpha 1.0 if is_selected else 0.6ax.scatter(u.lng, u.lat, ssize, ccolor, alphaalpha,edgecolorswhite, linewidth2, zorder5 if is_selected else 3)ax.annotate(u.name, (u.lng, u.lat), fontsize10 if is_selected else 8,hacenter, vabottom, fontweightbold if is_selected else normal)# 绘制门店for s in stores:ax.scatter(s[lng], s[lat], s250, c#1E3A8A, marker^,edgecolorswhite, linewidth2, zorder6)ax.annotate(s[name], (s[lng], s[lat]), fontsize9,hacenter, vatop, color#1E3A8A)# 绘制3km辐射圈for s in stores:circle Circle((s[lng], s[lat]), 0.027, fillFalse,edgecolor#1E3A8A, linestyle--, alpha0.4, linewidth1.5)ax.add_patch(circle)ax.set_xlabel(经度, fontsize12)ax.set_ylabel(纬度, fontsize12)ax.set_title(高校-门店地理分布与选中方案橙色选中高校蓝色三角门店虚线3km辐射圈,fontsize14, fontweightbold, pad15)ax.grid(True, alpha0.3, linestyle--)ax.set_aspect(equal)plt.tight_layout()return fig5.2 评分雷达图pythondef plot_radar(university, strategies):绘制单所高校的维度评分雷达图from math import picategories list(strategies.keys())N len(categories)# 获取该高校的各维度分数values [getattr(university, fscore_{cat}, 0.5) for cat in categories]values values[:1] # 闭合angles [n / float(N) * 2 * pi for n in range(N)]angles angles[:1]fig, ax plt.subplots(figsize(8, 8), subplot_kwdict(polarTrue))ax.plot(angles, values, o-, linewidth2, color#1E3A8A)ax.fill(angles, values, alpha0.25, color#1E3A8A)ax.set_xticks(angles[:-1])ax.set_xticklabels(categories, fontsize11)ax.set_ylim(0, 1)ax.set_title(f{university.name} 多维度评分雷达图, fontsize14, fontweightbold, pad20)plt.tight_layout()return fig---六、开源与扩展计划6.1 当前版本功能- ✅ 多维度评分引擎6个维度可扩展- ✅ GIS地理覆盖分析Haversine距离 辐射圈- ✅ 预算分配优化线性规划 贪心修正- ✅ 数据持久化SQLite 索引优化- ✅ 基础可视化散点图、雷达图、漏斗图6.2 待开发功能- [ ] 机器学习预测模块基于历史投放数据训练ROI预测模型XGBoost/LightGBM- [ ] 实时数据接入对接高校食堂就餐数据API实现曝光量实时估算- [ ] 多目标优化同时优化ROI、品牌认知度、覆盖率等多个目标NSGA-II算法- [ ] 前端可视化基于Leaflet的交互式地图支持圈选、拖拽调整- [ ] 数据众包开放高校数据提交接口建立社区维护的高校数据库6.3 开源协议本项目计划以 MIT 协议开源欢迎高校营销从业者、数据分析师、Python开发者共同参与。核心代码已在本文章中全部给出可直接复制运行。---七、总结本文从工程实践角度完整介绍了一套高校广告投放智能选校系统的技术架构与核心算法实现。系统采用经典运筹学方法线性规划 贪心算法解决带约束的组合优化问题结合GIS地理分析实现投放与转化的空间闭环通过多维度评分模型将业务经验转化为可量化的数据决策。技术亮点1. 策略模式设计评分引擎支持业务规则动态扩展2. 两阶段优化算法LP松弛 贪心修正在毫秒级求解质量与计算效率的平衡3. 纯Python实现零外部GIS依赖部署成本极低数据来源说明本文中的高校数据郑州、开封、新乡、焦作、洛阳五大城市40余所高校及历史投放参数来源于某高校营销服务商的脱敏数据集。该服务商深耕河南高校市场12年覆盖高校食堂桌贴广告资源。为保护商业隐私文中所有成本、评分数据均为示例值实际应用需替换为真实业务数据。完