Implementing TOPSIS in Python, Step by Step: From Excel Data to a Decision Ranking (with Complete Code)

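When several candidate options have to be evaluated across multiple criteria, TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) offers an intuitive and effective way to decide. Picture this: you have an Excel sheet with evaluation data for 20 suppliers, one supplier per row, with columns for the evaluation criteria (price, lead time, quality score, and so on). How do you rank those suppliers quickly and objectively? That is the problem TOPSIS solves.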

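Unlike a plain scoring method, TOPSIS ranks alternatives by their relative closeness to an ideal solution. It constructs a hypothetical "perfect alternative" (the best observed value on every criterion) and a "worst alternative" (the worst observed value on every criterion); the goal is to find the real alternative that is closest to the former and farthest from the latter. Because both reference points come from the data itself, no hand-built scoring scale is needed, although criterion weights still have to be chosen (see section 3). This makes the method well suited to multi-criteria decision problems.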

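1. Environment Setup and Data Loading

Before starting, make sure the following Python libraries are installed (openpyxl is the engine pandas uses to read .xlsx files):

pip install pandas numpy scipy matplotlib openpyxl

Suppose we have an Excel file named supplier_evaluation.xlsx with the structure shown below. The columns are supplier ID, price (10,000 CNY), lead time (days), quality score (1-10), and after-sales service score (1-5); the Chinese column names are kept because the code refers to them: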

供应商ID	价格(万元)	交货周期(天)	质量评分(1-10)	售后服务评分(1-5)
S001	120	15	8	4
S002	95	25	7	3
S003	110	20	9	5

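Load the data with pandas:

import pandas as pd

# Read the Excel file, using the supplier ID column as the index
df = pd.read_excel('supplier_evaluation.xlsx', index_col='供应商ID')
print("Raw data:")
print(df.head())

# Criterion types (1 = benefit criterion, -1 = cost criterion)
criteria_type = {
    '价格(万元)': -1,          # price: cost, smaller is better
    '交货周期(天)': -1,        # lead time: cost, smaller is better
    '质量评分(1-10)': 1,       # quality score: benefit, larger is better
    '售后服务评分(1-5)': 1,    # after-sales score: benefit, larger is better
}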
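For readers without the Excel file at hand, here is a minimal sketch that rebuilds the three sample rows from the table in memory and checks that every column has a declared criterion type. The helper name `check_criteria` is hypothetical (it is not part of pandas), and the validation it performs is an added assumption about what is worth checking before running TOPSIS.

```python
import pandas as pd

# The three sample rows from the table, built in memory instead of read from Excel
df = pd.DataFrame(
    {
        '价格(万元)': [120, 95, 110],
        '交货周期(天)': [15, 25, 20],
        '质量评分(1-10)': [8, 7, 9],
        '售后服务评分(1-5)': [4, 3, 5],
    },
    index=pd.Index(['S001', 'S002', 'S003'], name='供应商ID'),
)

criteria_type = {
    '价格(万元)': -1,
    '交货周期(天)': -1,
    '质量评分(1-10)': 1,
    '售后服务评分(1-5)': 1,
}

def check_criteria(df, criteria_type):
    """Hypothetical helper: raise if a column lacks a type or a type is not 1 / -1."""
    missing = [c for c in df.columns if c not in criteria_type]
    if missing:
        raise ValueError(f"No criterion type for columns: {missing}")
    bad = {c: t for c, t in criteria_type.items() if t not in (1, -1)}
    if bad:
        raise ValueError(f"Criterion types must be 1 or -1, got: {bad}")
    return True

check_criteria(df, criteria_type)
print(df)
```

Looking types up by column name, rather than relying on the order of the dictionary, keeps the mapping correct even if the spreadsheet columns are rearranged.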

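2. Data Normalization

The criteria differ in units and in direction, so they have to be normalized first. TOPSIS usually uses vector normalization, which divides each column by its Euclidean norm:

import numpy as np

def normalize_matrix(df, criteria_type):
    # Work on a float copy of the decision matrix
    X = df.to_numpy(dtype=float)
    # Vector normalization: divide each column by its Euclidean norm
    norm_X = X / np.linalg.norm(X, axis=0)
    # Look up each column's type by name instead of relying on dict order
    signs = np.array([criteria_type[col] for col in df.columns])
    # Negate cost criteria so that "larger is better" holds for every column
    return norm_X * signs

norm_X = normalize_matrix(df, criteria_type)
print("\nNormalized matrix:")
print(pd.DataFrame(norm_X, index=df.index, columns=df.columns))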
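As a quick sanity check on vector normalization, the sketch below applies it to the three sample suppliers and confirms two properties: every column ends up with Euclidean length 1, and after negating the cost columns the cheapest supplier has the largest entry in the price column. The matrix values are the sample rows from the table; the variable names are illustrative.

```python
import numpy as np

# Decision matrix for the three sample suppliers
# (columns: price, lead time, quality score, after-sales score)
X = np.array([[120, 15, 8, 4],
              [95, 25, 7, 3],
              [110, 20, 9, 5]], dtype=float)
signs = np.array([-1, -1, 1, 1])  # -1 = cost criterion, 1 = benefit criterion

norm_X = X / np.linalg.norm(X, axis=0) * signs

# Each column has Euclidean length 1, regardless of the original unit
col_norms = np.linalg.norm(norm_X, axis=0)
print(col_norms)

# After negation, the cheapest supplier (S002, row index 1) has the largest price entry
best_price_row = int(np.argmax(norm_X[:, 0]))
print(best_price_row)
```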

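3. Determining Weights and Ideal Solutions

Weights express how important each criterion is. They can be derived with AHP, the entropy weight method, and similar techniques; here we assume they are already given:

weights = np.array([0.3, 0.2, 0.3, 0.2])  # price, lead time, quality, after-sales

# Weighted normalized matrix
weighted_norm_X = norm_X * weights

# Positive and negative ideal solutions
# (cost columns were negated earlier, so the column maximum is always the best value)
positive_ideal = np.max(weighted_norm_X, axis=0)
negative_ideal = np.min(weighted_norm_X, axis=0)
print("\nPositive ideal solution:", positive_ideal)
print("Negative ideal solution:", negative_ideal)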
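Weights typed in by hand are easy to get wrong (wrong length, negative entries, not summing to 1). The following is a small sketch of a validation step; `prepare_weights` is a hypothetical helper name, and rescaling to a sum of 1 is a convention assumed here rather than something the code above requires.

```python
import numpy as np

def prepare_weights(weights, n_criteria):
    """Hypothetical helper: validate a weight vector and rescale it to sum to 1."""
    w = np.asarray(weights, dtype=float)
    if w.shape != (n_criteria,):
        raise ValueError(f"Expected {n_criteria} weights, got shape {w.shape}")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("Weights must be non-negative and not all zero")
    return w / w.sum()

# Raw importance scores 3 : 2 : 3 : 2 become the weights used in this section
weights = prepare_weights([3, 2, 3, 2], n_criteria=4)
print(weights)
```

Multiplying all weights by the same positive constant scales both distances by that constant and leaves the closeness score unchanged, so the rescaling mainly keeps the numbers readable.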

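4. Computing Distances and Closeness

Compute the Euclidean distance from each alternative to the positive and negative ideal solutions:

from scipy.spatial import distance

# Distances to the two ideal solutions
dist_pos = np.array([distance.euclidean(x, positive_ideal) for x in weighted_norm_X])
dist_neg = np.array([distance.euclidean(x, negative_ideal) for x in weighted_norm_X])

# Relative closeness: 1 means the alternative coincides with the positive ideal,
# 0 means it coincides with the negative ideal
closeness = dist_neg / (dist_pos + dist_neg)

# Attach the results to a copy of the data. Column names:
# 正理想距离 = distance to positive ideal, 负理想距离 = distance to negative ideal,
# 贴近度 = closeness, 排名 = rank
result_df = df.copy()
result_df['正理想距离'] = dist_pos
result_df['负理想距离'] = dist_neg
result_df['贴近度'] = closeness
result_df['排名'] = result_df['贴近度'].rank(ascending=False)
print("\nResults:")
print(result_df.sort_values('排名'))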
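In formulas, the computation of sections 2 to 4 can be summarized as follows. The symbols are introduced here for illustration and do not appear in the code: $x_{ij}$ is the raw value of alternative $i$ on criterion $j$ (with $m$ alternatives and $n$ criteria), $s_j \in \{1, -1\}$ is the criterion type from criteria_type, and $w_j$ is the weight. The sign factor reflects this article's implementation, which negates cost criteria; textbook presentations usually keep the values positive and take the column minimum as the ideal for cost criteria, which yields the same distances.

```latex
r_{ij} = s_j \, \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^{2}}}, \qquad
v_{ij} = w_j \, r_{ij}

v_j^{+} = \max_{i} v_{ij}, \qquad v_j^{-} = \min_{i} v_{ij}

D_i^{\pm} = \sqrt{\sum_{j=1}^{n} \left( v_{ij} - v_j^{\pm} \right)^{2}}, \qquad
C_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}}, \quad 0 \le C_i \le 1
```

Alternatives are ranked by decreasing $C_i$.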

5. Visualizing and Analyzing the Results

To make the results easier to read, we can draw a radar chart and a ranking bar chart:

import matplotlib.pyplot as plt

# Note: the column names are Chinese, so the tick labels need a CJK-capable font
# configured in matplotlib (which font to use depends on your system).

# Radar chart of the top-ranked suppliers
def plot_radar_chart(df, criteria_type, top_n=3):
    categories = list(df.columns[:-4])  # drop the four computed columns
    N = len(categories)
    angles = np.linspace(0, 2 * np.pi, N, endpoint=False).tolist()
    angles += angles[:1]  # close the polygon

    # Scale each criterion by its column maximum so the axes are comparable
    scaled = df[categories] / df[categories].max()

    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'polar': True})

    # One polygon per supplier
    top = df.sort_values('排名').head(top_n)
    for supplier, row in top.iterrows():
        values = scaled.loc[supplier].tolist()
        values += values[:1]
        ax.plot(angles, values, linewidth=2,
                label=f'{supplier} (rank: {int(row["排名"])})')
        ax.fill(angles, values, alpha=0.25)

    # Axis setup: start at the top, go clockwise, mark benefit/cost in the labels
    ax.set_theta_offset(np.pi / 2)
    ax.set_theta_direction(-1)
    labels = [f"{c} ({'benefit' if criteria_type[c] == 1 else 'cost'})"
              for c in categories]
    ax.set_thetagrids(np.degrees(angles[:-1]), labels)
    ax.set_rlabel_position(0)

    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    plt.title('Top suppliers: criteria radar chart')
    plt.show()

# Horizontal bar chart of the ranking
def plot_ranking(result_df):
    plt.figure(figsize=(10, 6))
    ranked = result_df.sort_values('贴近度', ascending=True)
    plt.barh(ranked.index, ranked['贴近度'], color='skyblue')
    for i, v in enumerate(ranked['贴近度']):
        plt.text(v, i, f" {v:.3f}", va='center')
    plt.xlabel('Closeness')
    plt.title('Supplier ranking by TOPSIS')
    plt.grid(axis='x', linestyle='--', alpha=0.7)
    plt.show()

# Draw both charts
plot_radar_chart(result_df, criteria_type)
plot_ranking(result_df)

6. Wrapping Everything in a Reusable Class

The steps above can be packaged as a reusable Python class:

import numpy as np
import pandas as pd

class TOPSIS:
    def __init__(self, data, criteria_type, weights=None):
        self.data = data
        self.criteria_type = criteria_type
        # `weights if weights else ...` raises for NumPy arrays, so compare with None
        if weights is None:
            weights = np.ones(data.shape[1]) / data.shape[1]  # equal weights
        self.weights = np.asarray(weights, dtype=float)
        self.normalized_data = None
        self.weighted_data = None
        self.positive_ideal = None
        self.negative_ideal = None
        self.result_df = None

    def normalize(self):
        X = self.data.to_numpy(dtype=float)
        norm_X = X / np.linalg.norm(X, axis=0)
        # Negate cost criteria, looked up by column name
        signs = np.array([self.criteria_type[c] for c in self.data.columns])
        self.normalized_data = pd.DataFrame(
            norm_X * signs, index=self.data.index, columns=self.data.columns)
        return self

    def calculate_ideal_solutions(self):
        self.weighted_data = self.normalized_data * self.weights
        self.positive_ideal = np.max(self.weighted_data.values, axis=0)
        self.negative_ideal = np.min(self.weighted_data.values, axis=0)
        return self

    def evaluate(self):
        V = self.weighted_data.values
        # Row-wise Euclidean distances to the two ideal solutions
        dist_pos = np.linalg.norm(V - self.positive_ideal, axis=1)
        dist_neg = np.linalg.norm(V - self.negative_ideal, axis=1)
        closeness = dist_neg / (dist_pos + dist_neg)
        self.result_df = self.data.copy()
        self.result_df['正理想距离'] = dist_pos   # distance to positive ideal
        self.result_df['负理想距离'] = dist_neg   # distance to negative ideal
        self.result_df['贴近度'] = closeness      # closeness
        self.result_df['排名'] = self.result_df['贴近度'].rank(ascending=False)  # rank
        return self.result_df.sort_values('排名')

    def run(self):
        return (self.normalize()
                    .calculate_ideal_solutions()
                    .evaluate())

# Usage
topsis = TOPSIS(df, criteria_type, weights)
ranking_result = topsis.run()
print(ranking_result)

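7. Practical Considerations

Confirming criterion types:
- Benefit criteria (larger is better): e.g. quality score, customer satisfaction
- Cost criteria (smaller is better): e.g. price, lead time
- Interval criteria (best within a certain range): e.g. temperature, pH (these need extra handling)

Ways to determine weights:
- Subjective weighting: expert scoring, AHP (analytic hierarchy process)
- Objective weighting: entropy weight method, CRITIC
- Combined weighting: a mix of subjective and objective weights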
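One simple way to combine subjective and objective weights is a convex combination of the two vectors. The sketch below assumes that scheme; the function name `combine_weights`, the mixing parameter `alpha`, and both example vectors are hypothetical, and other combination rules (for example multiplicative ones) exist.

```python
import numpy as np

def combine_weights(subjective, objective, alpha=0.5):
    """Hypothetical helper: alpha * subjective + (1 - alpha) * objective, renormalized."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ws = np.asarray(subjective, dtype=float)
    wo = np.asarray(objective, dtype=float)
    # Normalize both inputs so that alpha has a consistent meaning
    ws, wo = ws / ws.sum(), wo / wo.sum()
    combined = alpha * ws + (1 - alpha) * wo
    return combined / combined.sum()

expert_weights = [0.3, 0.2, 0.3, 0.2]   # e.g. from expert scoring (illustrative values)
data_weights = [0.1, 0.4, 0.2, 0.3]     # e.g. from an objective method (illustrative values)
combined = combine_weights(expert_weights, data_weights, alpha=0.5)
print(combined)
```

Setting `alpha` to 1 reproduces the subjective weights and 0 reproduces the objective ones, so the parameter makes the trade-off between the two sources explicit.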

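Data preprocessing tips:

# Fill missing values with the column mean
df = df.fillna(df.mean())

# Optional min-max scaling (needs scikit-learn: pip install scikit-learn)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)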

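Validating the results:
- Sensitivity analysis: adjust the weights slightly and watch how the ranking changes
- Comparison: compare against other decision methods (such as weighted sum or VIKOR)
- Real-world validation: compare the ranking with actual business performance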
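The sensitivity analysis mentioned above can be sketched as follows: raise one weight at a time by a small amount, renormalize, and check whether the top-ranked alternative changes. This is a minimal illustration under stated assumptions: `topsis_closeness` is a compact re-implementation of the steps in sections 2 to 4, `weight_sensitivity` and the step size `delta` are hypothetical, and the data are the three sample suppliers.

```python
import numpy as np

def topsis_closeness(X, weights, signs):
    """Closeness scores for decision matrix X (rows = alternatives)."""
    X = np.asarray(X, dtype=float)
    V = X / np.linalg.norm(X, axis=0) * signs * weights
    d_pos = np.linalg.norm(V - V.max(axis=0), axis=1)
    d_neg = np.linalg.norm(V - V.min(axis=0), axis=1)
    return d_neg / (d_pos + d_neg)

def weight_sensitivity(X, weights, signs, delta=0.05):
    """Hypothetical helper: bump each weight by delta, renormalize, report the winner."""
    weights = np.asarray(weights, dtype=float)
    base_top = int(np.argmax(topsis_closeness(X, weights, signs)))
    report = []
    for j in range(len(weights)):
        w = weights.copy()
        w[j] += delta
        w /= w.sum()
        top = int(np.argmax(topsis_closeness(X, w, signs)))
        report.append((j, top, top == base_top))
    return base_top, report

X = np.array([[120, 15, 8, 4],
              [95, 25, 7, 3],
              [110, 20, 9, 5]])
signs = np.array([-1, -1, 1, 1])          # -1 = cost criterion, 1 = benefit criterion
weights = np.array([0.3, 0.2, 0.3, 0.2])

base_top, report = weight_sensitivity(X, weights, signs)
print("baseline winner: row", base_top)
for j, top, unchanged in report:
    print(f"criterion {j} +0.05 -> winner row {top}, unchanged: {unchanged}")
```

If the winner flips under small perturbations, the ranking depends heavily on the exact weights and deserves a closer look before it is acted on.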

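Performance tips:

import numpy as np
from joblib import Parallel, delayed
from scipy.spatial import distance

# Vectorized NumPy distance: one expression for all rows
def fast_distance(X, ideal):
    return np.sqrt(np.sum((X - ideal) ** 2, axis=1))

dist_pos = fast_distance(weighted_norm_X, positive_ideal)

# Parallel per-row computation with joblib. For a distance this cheap, the
# vectorized version above is usually faster; keep this for costly per-row work.
distances = Parallel(n_jobs=-1)(
    delayed(distance.euclidean)(x, positive_ideal) for x in weighted_norm_X
)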
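To see what vectorization buys, the sketch below computes the same distances with a Python-level loop and with a single NumPy expression, checks that the results agree, and prints both timings. The matrix is random synthetic data used only for timing, and the actual speed-up depends on the machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((20000, 4))   # synthetic stand-in for a weighted normalized matrix
ideal = V.max(axis=0)

def loop_distance(V, ideal):
    # One Python-level call per row
    return np.array([np.linalg.norm(row - ideal) for row in V])

def fast_distance(V, ideal):
    # Single vectorized expression over all rows
    return np.sqrt(np.sum((V - ideal) ** 2, axis=1))

t0 = time.perf_counter()
d_loop = loop_distance(V, ideal)
t1 = time.perf_counter()
d_fast = fast_distance(V, ideal)
t2 = time.perf_counter()

same = np.allclose(d_loop, d_fast)
print(f"loop: {t1 - t0:.4f}s, vectorized: {t2 - t1:.4f}s, identical: {same}")
```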

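8. Other Application Scenarios

TOPSIS applies to many kinds of decisions; only the criteria and the weights need to be adapted:

Investment portfolio selection:
- Criteria: expected return, risk level, liquidity, investment horizon
- Weights: [0.4, 0.3, 0.2, 0.1]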

Talent evaluation:

employee_data = pd.DataFrame({
    'technical_skill': [8, 7, 9, 6],
    'teamwork': [7, 8, 6, 9],
    'innovation': [6, 7, 8, 7],
    'project_experience': [5, 6, 7, 8],
}, index=['Candidate A', 'Candidate B', 'Candidate C', 'Candidate D'])

# All four criteria are benefit criteria
emp_criteria = {'technical_skill': 1, 'teamwork': 1, 'innovation': 1, 'project_experience': 1}
emp_weights = [0.3, 0.2, 0.2, 0.3]

emp_topsis = TOPSIS(employee_data, emp_criteria, emp_weights)
print(emp_topsis.run())

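Product selection:
- Build an evaluation scheme covering price, features, compatibility, maintenance cost, and similar dimensions
- Adjust the weights as the company's requirements change

Site selection analysis:

location_data = pd.DataFrame({
    'rent': [12000, 15000, 10000],
    'foot_traffic': [500, 700, 400],
    'competition': [3, 5, 2],      # 1-5 score, smaller is better
    'accessibility': [4, 3, 5],    # 1-5 score, larger is better
}, index=['Location A', 'Location B', 'Location C'])

loc_criteria = {'rent': -1, 'foot_traffic': 1, 'competition': -1, 'accessibility': 1}
loc_weights = [0.3, 0.4, 0.2, 0.1]

loc_topsis = TOPSIS(location_data, loc_criteria, loc_weights)
print(loc_topsis.run())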

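Treatment plan evaluation:
- Consider several dimensions such as efficacy, side effects, cost, and treatment duration
- Doctors and patients can each set their own weights to obtain a personalized recommendation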

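9. Common Problems and Solutions

Problem 1: How should interval criteria be handled?

A criterion such as "temperature is best kept between 18 and 22 °C" needs special treatment:

def interval_normalize(series, best_min, best_max, tolerable_min, tolerable_max):
    # Float output array; np.zeros_like on an integer Series would give an
    # integer array and truncate the fractional scores
    normalized = np.zeros(len(series), dtype=float)
    for i, val in enumerate(series):
        if val < best_min:
            # Below the ideal range: score falls linearly to 0 at tolerable_min
            if val >= tolerable_min:
                normalized[i] = 1 - (best_min - val) / (best_min - tolerable_min)
            else:
                normalized[i] = 0
        elif val > best_max:
            # Above the ideal range: score falls linearly to 0 at tolerable_max
            if val <= tolerable_max:
                normalized[i] = 1 - (val - best_max) / (tolerable_max - best_max)
            else:
                normalized[i] = 0
        else:
            # Inside the ideal range
            normalized[i] = 1
    return normalized

# Example: temperature criterion
temperature = pd.Series([17, 20, 25, 22, 15])
normalized_temp = interval_normalize(temperature, 18, 22, 15, 25)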
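The same piecewise-linear score can also be written without an explicit loop. The following is a vectorized sketch using the temperature example; the function name `interval_normalize_vec` is hypothetical, and it assumes tolerable_min < best_min and best_max < tolerable_max so that neither slope divides by zero.

```python
import numpy as np

def interval_normalize_vec(values, best_min, best_max, tolerable_min, tolerable_max):
    """Vectorized sketch of the piecewise-linear interval score (hypothetical name)."""
    x = np.asarray(values, dtype=float)
    below = 1 - (best_min - x) / (best_min - tolerable_min)   # used where x < best_min
    above = 1 - (x - best_max) / (tolerable_max - best_max)   # used where x > best_max
    score = np.where(x < best_min, below, np.where(x > best_max, above, 1.0))
    # Values outside the tolerable range would go negative; clip them to 0
    return np.clip(score, 0.0, 1.0)

scores = interval_normalize_vec([17, 20, 25, 22, 15], 18, 22, 15, 25)
print(np.round(scores, 3))
```

The resulting scores behave like a benefit criterion (larger is better), so the column can enter TOPSIS with type 1.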

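Problem 2: How can weights be determined systematically?

The entropy weight method computes weights from the data itself:

def entropy_weight(X):
    # Column-wise proportions (assumes non-negative data and non-zero column sums)
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0)
    # Entropy of each criterion, treating 0 * log(0) as 0 to avoid NaN
    k = 1 / np.log(X.shape[0])
    with np.errstate(divide='ignore', invalid='ignore'):
        plogp = np.where(P > 0, P * np.log(P), 0.0)
    entropy = -k * plogp.sum(axis=0)
    # Degree of diversification and the resulting weights
    diversity = 1 - entropy
    weights = diversity / diversity.sum()
    return weights

# Usage
weights = entropy_weight(df.values)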
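A useful property to check is that a criterion on which all alternatives score the same carries no information and should receive a weight of (numerically) zero. The sketch below demonstrates this on a made-up matrix that also contains a zero entry; it uses a self-contained variant of the entropy computation that treats 0 · log 0 as 0, and the data are purely illustrative.

```python
import numpy as np

def entropy_weight(X):
    """Entropy weights; assumes non-negative data with non-zero column sums."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=0)
    k = 1.0 / np.log(X.shape[0])
    # Replace zeros by 1 inside the log: log(1) = 0, so those entries contribute nothing
    safe_P = np.where(P > 0, P, 1.0)
    entropy = -k * (P * np.log(safe_P)).sum(axis=0)
    diversity = 1.0 - entropy
    return diversity / diversity.sum()

# Column 0 is constant, column 2 contains a zero (illustrative values)
X = np.array([[5.0, 1.0, 10.0],
              [5.0, 2.0, 0.0],
              [5.0, 9.0, 20.0]])
w = entropy_weight(X)
print(np.round(w, 3))
```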

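Problem 3: How can performance problems with large amounts of data be handled?

For large datasets:

# Out-of-core / distributed computation with Dask
import numpy as np
import dask.array as da
import dask.dataframe as dd  # used to build dask_df, e.g. dd.from_pandas(df, npartitions=4)

def large_scale_topsis(dask_df, criteria_type, weights):
    # Convert to a dask array with known chunk sizes
    X = dask_df.to_dask_array(lengths=True)
    # Vector normalization
    norms = da.sqrt((X ** 2).sum(axis=0))
    # Direction adjustment: negate cost criteria through a sign vector
    signs = np.array([criteria_type[c] for c in dask_df.columns])
    norm_X = X / norms * signs
    # Weighting
    weighted_norm_X = norm_X * np.asarray(weights, dtype=float)
    # Ideal solutions
    positive_ideal = weighted_norm_X.max(axis=0)
    negative_ideal = weighted_norm_X.min(axis=0)
    # Distances
    dist_pos = da.sqrt(((weighted_norm_X - positive_ideal) ** 2).sum(axis=1))
    dist_neg = da.sqrt(((weighted_norm_X - negative_ideal) ** 2).sum(axis=1))
    # Closeness (evaluated only when compute() is called)
    closeness = dist_neg / (dist_pos + dist_neg)
    return closeness.compute()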

Problem 4: How can TOPSIS results be explained to a non-technical audience?

A visual report usually works best:

import matplotlib.pyplot as plt

def generate_report(result_df, top_n=5):
    # Two stacked panels
    fig, axes = plt.subplots(2, 1, figsize=(12, 10))

    # Ranking bar chart (ascending sort, so the best alternative is drawn at the top)
    top_results = result_df.sort_values('贴近度').tail(top_n)
    axes[0].barh(top_results.index, top_results['贴近度'], color='#4c72b0')
    axes[0].set_title(f'Top {top_n} alternatives')
    axes[0].set_xlabel('Closeness')

    # Raw criterion values of the same alternatives
    criteria_values = top_results.iloc[:, :-4]  # drop the four computed columns
    criteria_values.plot(kind='bar', ax=axes[1], colormap='viridis',
                         title='Criteria comparison')
    axes[1].set_xticklabels(top_results.index, rotation=0)

    plt.tight_layout()
    return fig

report = generate_report(ranking_result)
report.savefig('TOPSIS_Report.png', dpi=300)

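10. Comparison with Other Decision Methods

In real projects, TOPSIS is often used together with other multi-criteria decision methods:

Method	Strengths	Weaknesses	Typical use
TOPSIS	Intuitive, simple to compute	Sensitive to weights	Ranking alternatives when a clear ideal solution exists
AHP	Systematic, includes a consistency check	Pairwise comparisons are laborious	Determining criterion weights
Grey relational analysis	Works with small samples, low data requirements	Discrimination is sometimes poor	Incomplete data or few samples
VIKOR	Accounts for group utility and individual regret	Results depend strongly on parameter settings	Decisions that need a balanced compromise solution
PROMETHEE	Handles fuzzy preferences flexibly	Computationally complex	Complex decisions with explicit preference functions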

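# Example of combining TOPSIS with other methods.
# Caution: the function names, the criteria_type argument, and the assumption that
# the last column of each result holds the rank are unverified here; check them
# against the documentation of the pyDecision version you have installed.
from pyDecision.algorithm import topsis, vikor

types = [1 if x == 1 else -1 for x in criteria_type.values()]

# TOPSIS result
topsis_result = topsis(df.values, weights, criteria_type=types)

# VIKOR result
vikor_result = vikor(df.values, weights, criteria_type=types)

# Side-by-side comparison
comparison = pd.DataFrame({
    'TOPSIS rank': topsis_result[:, -1],
    'VIKOR rank': vikor_result[:, -1]
}, index=df.index)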
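When two methods produce rankings for the same alternatives, their agreement can be summarized with Spearman's rank correlation. The sketch below assumes rankings without ties and uses made-up rank vectors, not real results; `rank_agreement` is a hypothetical helper name.

```python
import numpy as np

def rank_agreement(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    a = np.asarray(rank_a, dtype=float)
    b = np.asarray(rank_b, dtype=float)
    n = len(a)
    d2 = np.sum((a - b) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

topsis_rank = [1, 2, 3, 4, 5]   # illustrative ranks, not real results
vikor_rank = [2, 1, 3, 4, 5]    # same alternatives, top two swapped
rho = rank_agreement(topsis_rank, vikor_rank)
print(rho)
```

A value near 1 means the two methods largely agree; a low or negative value signals that the choice of method matters for this dataset.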
