
Generalized Linear Models (Logistic Regression, Poisson Regression)

Date: 2021-01-10 07:49:41


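Linear regression does not suit every situation. Some outcomes are binary (for example, heads or tails) and others are counts. Generalized linear models can describe such data while still using a linear combination of the predictor variables.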
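As a rough illustration of the idea above (not code from the original post), the sketch below computes one linear combination of predictors and passes it through two inverse link functions: the logistic function for a binary outcome and the exponential for a count. The coefficient and predictor values are made up for illustration.

```python
import math

def linear_predictor(intercept, coefs, xs):
    """Return intercept + sum(coef * x): the linear combination shared by all GLMs."""
    return intercept + sum(c * x for c, x in zip(coefs, xs))

def inverse_logit(eta):
    """Map a linear predictor to a probability in (0, 1), as in logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta):
    """Map a linear predictor to a positive expected count, as in Poisson regression."""
    return math.exp(eta)

# Hypothetical coefficients and predictor values, chosen only for illustration
eta = linear_predictor(-1.0, [0.5, 0.25], [2.0, 0.0])  # -1.0 + 1.0 + 0.0

print(inverse_logit(eta))  # probability for a binary outcome
print(inverse_log(eta))    # expected value for a count outcome
```

The same linear predictor feeds both models; only the link between it and the mean of the response changes.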

Contents

Logistic regression

Using statsmodels

Using sklearn

Poisson regression

Using statsmodels

Negative binomial regression

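Logistic regression

When the response variable is binary, logistic regression is the usual way to model the data.

The data below come from the dataset that accompanies Pandas for Everyone (pandas活用). If needed, it can be downloaded from /download/qq_57099024/79301082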

    import pandas as pd

    d = pd.read_csv('D:/pandas活用/pandas_for_everyone-master/data/acs_ny.csv')
    print(d.columns)
    print('@' * 66)  # separator line to tell the two outputs apart
    print(d.head())

    '''
    Output:

    Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
           'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
           'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
           'HeatingFuel', 'Insurance', 'Language'],
          dtype='object')
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
    0  1-10           150      Married            4            1          3
    1  1-10           180  Female Head            3            2          4
    2  1-10           280  Female Head            4            0          2
    3  1-10           330  Female Head            2            1          2
    4  1-10           330    Male Head            3            1          2

       NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
    0         9  Single detached            1           0  Mortgage    1950-1959
    1         6  Single detached            2           0    Rented  Before 1939
    2         8  Single detached            3           1  Mortgage        2000-
    3         4  Single detached            1           0    Rented    1950-1959
    4         5  Single attached            1           0  Mortgage  Before 1939

       HouseCosts  ElectricBill FoodStamp HeatingFuel  Insurance        Language
    0        1800            90        No         Gas       2500         English
    1         850            90        No         Oil          0         English
    2        2600           260        No         Oil       6600  Other European
    3        1800           140        No         Oil          0         English
    4         860           150        No         Gas        660         Spanish
    '''

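Next, FamilyIncome is binned into two groups to create a binary response:

    # 0 for incomes up to 150000, 1 for incomes above 150000
    d['income_15w'] = pd.cut(
        d['FamilyIncome'],
        [0, 150000, d['FamilyIncome'].max()],
        labels=[0, 1],
    )
    # Convert the categorical labels to plain integers
    d['income_15w'] = d['income_15w'].astype(int)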
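To make the binning rule concrete, here is a small sketch on a hypothetical Series (the four income values are invented, not taken from acs_ny.csv). It assumes only the default behaviour of `pd.cut`, where each bin includes its right edge.

```python
import pandas as pd

# Hypothetical incomes, chosen to sit on either side of the 150000 cut point
income = pd.Series([50, 150000, 150001, 400000])

# Bins are right-inclusive by default: (0, 150000] -> 0, (150000, max] -> 1
binned = pd.cut(income, [0, 150000, income.max()], labels=[0, 1]).astype(int)

print(binned.tolist())
```

An income of exactly 0 would fall outside the first interval and become NaN, which makes the integer conversion fail. Passing `include_lowest=True` to `pd.cut` is one way to guard against that if such rows exist.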

See also: "Using cut to bin data and create a binary response variable", 我就是一个小怪兽's blog on CSDN.

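Using statsmodels

    import statsmodels.formula.api as smf

    # Categorical columns (OwnRent, FamilyType) are dummy-coded by the formula interface
    model = smf.logit(
        'income_15w~HouseCosts+NumWorkers+OwnRent+NumBedrooms+FamilyType',
        data=d,
    )
    results = model.fit()
    print(results.summary())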

    Optimization terminated successfully.
             Current function value: 0.391651
             Iterations 7
                               Logit Regression Results
    ==============================================================================
    Dep. Variable:             income_15w   No. Observations:                22745
    Model:                          Logit   Df Residuals:                    22737
    Method:                           MLE   Df Model:                            7
    Date:                     Sat, 05 Feb   Pseudo R-squ.:                  0.2078
    Time:                        08:46:18   Log-Likelihood:                -8908.1
    converged:                       True   LL-Null:                       -11244.
    Covariance Type:            nonrobust   LLR p-value:                     0.000
    ===========================================================================================
                                  coef    std err          z      P>|z|      [0.025      0.975]
    -------------------------------------------------------------------------------------------
    Intercept                  -5.8081      0.120    -48.456      0.000      -6.043      -5.573
    OwnRent[T.Outright]         1.8276      0.208      8.782      0.000       1.420       2.236
    OwnRent[T.Rented]          -0.8763      0.101     -8.647      0.000      -1.075      -0.678
    FamilyType[T.Male Head]     0.2874      0.150      1.913      0.056      -0.007       0.582
    FamilyType[T.Married]       1.3877      0.088     15.781      0.000       1.215       1.560
    HouseCosts                  0.0007   1.72e-05     42.453      0.000       0.001       0.001
    NumWorkers                  0.5873      0.026     22.393      0.000       0.536       0.639
    NumBedrooms                 0.2365      0.017     13.985      0.000       0.203       0.270
    ===========================================================================================
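The coefficients in the summary above are on the log-odds scale, which is hard to read directly. A common way to interpret them is to exponentiate each one into an odds ratio. The sketch below does this by hand with the values copied from the table; it is an illustration of the arithmetic, not part of the original post.

```python
import math

# Coefficients copied from the Logit summary above (log-odds scale)
coefs = {
    'OwnRent[T.Outright]': 1.8276,
    'OwnRent[T.Rented]': -0.8763,
    'FamilyType[T.Male Head]': 0.2874,
    'FamilyType[T.Married]': 1.3877,
    'NumWorkers': 0.5873,
    'NumBedrooms': 0.2365,
}

# exp(coefficient) is the multiplicative change in the odds of income_15w == 1
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f'{name:<26}{ratio:8.3f}')
```

For example, the Married coefficient of 1.3877 corresponds to odds roughly four times those of the baseline family type, holding the other predictors fixed. A ratio below 1, as for Rented, means lower odds than the baseline.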

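Using sklearn

    from sklearn import linear_model

    # One-hot encode the categorical predictors, dropping the first level of each
    predictors = pd.get_dummies(
        d[['HouseCosts', 'NumWorkers', 'OwnRent', 'NumBedrooms', 'FamilyType']],
        drop_first=True,
    )

    # Note: LogisticRegression applies L2 regularization by default, so the
    # coefficients need not match the statsmodels fit above.
    lr = linear_model.LogisticRegression()
    results = lr.fit(X=predictors, y=d['income_15w'])
    print(results.coef_)
    print('-*-' * 10)
    print(results.intercept_)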

    [[ 5.86894916e-04  7.32489391e-01  2.86764784e-01  7.17542587e-02
      -2.13282748e+00 -1.03910262e+00  2.63647146e-01]]
    -*--*--*--*--*--*--*--*--*--*-
    [-4.86108187]

Poisson regression

Poisson regression is commonly used to analyze count data.

Using statsmodels

    results = smf.poisson('NumChildren~FamilyIncome+FamilyType+OwnRent', data=d).fit()
    print(results.summary())

    Optimization terminated successfully.
             Current function value: nan
             Iterations 1
                              Poisson Regression Results
    ==============================================================================
    Dep. Variable:            NumChildren   No. Observations:                22745
    Model:                        Poisson   Df Residuals:                    22739
    Method:                           MLE   Df Model:                            5
    Date:                     Sat, 05 Feb   Pseudo R-squ.:                     nan
    Time:                        09:05:28   Log-Likelihood:                    nan
    converged:                       True   LL-Null:                       -30977.
    Covariance Type:            nonrobust   LLR p-value:                       nan
    ===========================================================================================
                                  coef    std err          z      P>|z|      [0.025      0.975]
    -------------------------------------------------------------------------------------------
    Intercept                      nan        nan        nan        nan         nan         nan
    FamilyType[T.Male Head]        nan        nan        nan        nan         nan         nan
    FamilyType[T.Married]          nan        nan        nan        nan         nan         nan
    OwnRent[T.Outright]            nan        nan        nan        nan         nan         nan
    OwnRent[T.Rented]              nan        nan        nan        nan         nan         nan
    FamilyIncome                   nan        nan        nan        nan         nan         nan
    ===========================================================================================

Although the optimizer reports success, every estimate is nan, so this fit failed numerically and the table carries no usable results.
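A likely cause of the nan values (an assumption, not something the original post states) is the scale of FamilyIncome. Poisson regression exponentiates the linear predictor, so even a small trial coefficient multiplied by a very large income exceeds what a float can hold. The sketch below uses hypothetical numbers to show the overflow and how rescaling the predictor avoids it.

```python
import math

def expected_count(coef, x):
    """Poisson mean exp(coef * x); returns inf when the exponential overflows."""
    try:
        return math.exp(coef * x)
    except OverflowError:
        return math.inf

trial_coef = 0.001   # hypothetical small coefficient tried by an optimizer
income = 1_000_000   # hypothetical family income in dollars

raw = expected_count(trial_coef, income)           # exp(1000) overflows
scaled = expected_count(trial_coef, income / 1e5)  # exp(0.01) is well behaved

print(raw, scaled)
```

If this is indeed the problem, one option would be to fit the model on a rescaled income column, for instance income in units of 100,000, and read the coefficient per that unit.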

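Negative binomial regression

If the assumptions of Poisson regression do not hold well (for example, when the data are overdispersed), negative binomial regression can be used instead.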
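A Poisson distribution has variance equal to its mean, so a quick first look at overdispersion is to compare the two. The sketch below does this on two invented lists of counts. It is only a raw check that ignores predictors, not a formal test.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by mean; roughly 1 for Poisson-like data."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical count data, for illustration only
even_counts = [1, 2, 3, 2, 1, 3, 2, 2]
skewed_counts = [0, 0, 0, 0, 1, 0, 9, 6]

print(dispersion_ratio(even_counts))    # below 1: less spread than Poisson
print(dispersion_ratio(skewed_counts))  # well above 1: overdispersed
```

Both lists have the same mean of 2, but the second has far more spread, which is the situation where a negative binomial model is worth considering.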

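The statsmodels GLM documentation lists the distribution families that can be passed to GLM through its family argument. The link functions available for each family can be found under sm.families.<FAMILY>.links:

Binomial

Gamma

InverseGaussian

NegativeBinomial

Poisson

Tweedie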

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    model = smf.glm(
        'NumChildren~FamilyIncome+FamilyType+OwnRent',
        data=d,
        # Pass a link instance, not the class; older statsmodels releases
        # spell this link sm.families.links.log()
        family=sm.families.NegativeBinomial(link=sm.families.links.Log()),
    )
    results = model.fit()
    print(results.summary())

                     Generalized Linear Model Regression Results
    ==============================================================================
    Dep. Variable:            NumChildren   No. Observations:                22745
    Model:                            GLM   Df Residuals:                    22739
    Model Family:        NegativeBinomial   Df Model:                            5
    Link Function:                    log   Scale:                          1.0000
    Method:                          IRLS   Log-Likelihood:                -29749.
    Date:                     Sat, 05 Feb   Deviance:                       20731.
    Time:                        10:06:21   Pearson chi2:                 1.77e+04
    No. Iterations:                     6   Covariance Type:             nonrobust
    ===========================================================================================
                                  coef    std err          z      P>|z|      [0.025      0.975]
    -------------------------------------------------------------------------------------------
    Intercept                  -0.3345      0.029    -11.672      0.000      -0.391      -0.278
    FamilyType[T.Male Head]    -0.0468      0.052     -0.905      0.365      -0.148       0.055
    FamilyType[T.Married]       0.1529      0.029      5.200      0.000       0.095       0.211
    OwnRent[T.Outright]        -1.9737      0.243     -8.113      0.000      -2.450      -1.497
    OwnRent[T.Rented]           0.4164      0.030     13.754      0.000       0.357       0.476
    FamilyIncome             5.398e-07   9.55e-08      5.652      0.000    3.53e-07    7.27e-07
    ===========================================================================================
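One rough way to judge how well the assumed variance fits is to divide the Pearson chi-square statistic by the residual degrees of freedom; values near 1 suggest the spread in the data matches the chosen family. The sketch below applies that arithmetic to the two numbers reported in the summary above. It is a back-of-the-envelope check added for illustration, not a step from the original post.

```python
# Values copied from the GLM summary above
pearson_chi2 = 1.77e4
df_residuals = 22739

# Ratio near 1: variance roughly as assumed; above 1: more spread; below 1: less
dispersion = pearson_chi2 / df_residuals

print(round(dispersion, 3))
```

Note that sm.families.NegativeBinomial uses a fixed dispersion parameter alpha (1.0 unless another value is passed) rather than estimating it, so a ratio noticeably different from 1 may simply mean that a different alpha would suit the data better.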
