Implementing the Logistic Regression Algorithm in MATLAB (Part 1)
Published: 2019-06-30


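Data download:

Data description:

The data are the results of a census of a region of the United States, 32,561 records in total. The fields are listed in the table below: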


| Field | Meaning | Type |
| --- | --- | --- |
| age | Age | Continuous |
| workclass | Work class | Categorical, coded 0-7: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked |
| fnlwgt | Final weight (sampling weight of the record) | Continuous |
| education | Education level | Categorical, coded 0-15: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool |
| education_num | Years of education | Continuous |
| marital_status | Marital status | Categorical, coded 0-6: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse |
| occupation | Occupation | Categorical, coded 0-13: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces |
| relationship | Relationship | Categorical, coded 0-5: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried |
| race | Race | Categorical, coded 0-4: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black |
| sex | Sex | Categorical, coded 0-1: Female, Male |
| capital_gain | Capital gain | Continuous |
| capital_loss | Capital loss | Continuous |
| hours_per_week | Hours worked per week | Continuous |
| native_country | Country of origin | Categorical, coded 0-40: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands |
| income | Income | Categorical, coded 0-1: <=50K, >50K |

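First we preprocess the data according to these categories, replacing each category string with its numeric code so that logistic regression can be used to compute the model parameters. Any record with a ? (missing value) in one of its fields is simply dropped.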
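The preprocessing step is not shown in the post, so the following is only a minimal sketch of how it could look in MATLAB. The three-column `rows` sample and the variable names (`levels`, `hasMissing`, `codes`) are assumptions made for illustration; the real records have 15 fields, and the category lists would be taken from the table above.

```matlab
% Hypothetical sample rows: {workclass, sex, income}; real rows have 15 fields.
rows = {
    'Private',   'Male',   '<=50K';
    '?',         'Female', '>50K';
    'State-gov', 'Female', '>50K'
};

% Category lists in the order given in the field table; code = position - 1.
workclass = {'Private', 'Self-emp-not-inc', 'Self-emp-inc', 'Federal-gov', ...
             'Local-gov', 'State-gov', 'Without-pay', 'Never-worked'};
sex       = {'Female', 'Male'};
income    = {'<=50K', '>50K'};
levels    = {workclass, sex, income};

% Drop every record that contains a '?' in any field.
hasMissing = any(strcmp(rows, '?'), 2);
rows = rows(~hasMissing, :);

% Replace each string by its zero-based index in the matching category list.
codes = zeros(size(rows));
for j = 1:size(rows, 2)
    [found, idx] = ismember(rows(:, j), levels{j});
    assert(all(found), 'Unknown category in column %d', j);
    codes(:, j) = idx - 1;
end

disp(codes);
```

The second sample row is removed because of its ?, and the remaining rows become purely numeric, which is the form that `load` expects when it reads adult_train.txt.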

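Preprocessing yields adult_train.txt and verify.txt. The parameters are trained with the logistic regression algorithm and then used to predict the records in verify.txt. Comparing the predictions with the true labels gives an accuracy of only 89%; the comparison is saved in result.xlsx.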
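For reference, the training step uses Newton's method. The quantities computed in each iteration of the script can be written as follows, where the notation is an assumption chosen to mirror the code: $X$ is the $m \times 15$ design matrix with the intercept column, $y$ the label vector, and $\odot$ the element-wise product.

```latex
\begin{aligned}
g(z) &= \frac{1}{1 + e^{-z}}, \qquad h = g(X\theta) \\
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m} \Bigl[ -y_i \log h_i - (1 - y_i) \log (1 - h_i) \Bigr] \\
\nabla J(\theta) &= \frac{1}{m} X^{\top} (h - y) \\
H &= \frac{1}{m} X^{\top} \operatorname{diag}\bigl(h \odot (1 - h)\bigr) X \\
\theta &\leftarrow \theta - H^{-1} \nabla J(\theta)
\end{aligned}
```

In the code the last line is evaluated as `theta - H\grad`, which solves the linear system rather than forming the inverse explicitly.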

    clear all; close all; clc

    data = load('adult_train.txt');
    x = data(:, 1:14);
    y = data(:, 15);
    m = length(y);              % number of samples
    x = [ones(m, 1), x];        % prepend the intercept column, x0 = 1

    meanx  = mean(x);           % column means
    sigmax = std(x);            % column standard deviations
    for k = 2:15                % standardize every feature column (not the intercept)
        x(:, k) = (x(:, k) - meanx(k)) ./ sigmax(k);
    end

    theta = zeros(size(x, 2), 1);        % initialize theta
    g = @(z) 1.0 ./ (1.0 + exp(-z));     % logistic (sigmoid) function

    % Newton's method
    MAX_ITR = 7;
    J = zeros(MAX_ITR, 1);
    for i = 1:MAX_ITR
        % Calculate the hypothesis function
        z = x * theta;
        h = g(z);                        % apply the logistic function
        % Calculate gradient and hessian.
        % The formulas below are equivalent to the summation formulas
        % given in the lecture videos.
        grad = (1/m) .* x' * (h - y);    % gradient in vector form
        % Hessian in vector form; equal to (1/m)*x'*diag(h)*diag(1-h)*x,
        % but without building two m-by-m diagonal matrices
        H = (1/m) .* x' * bsxfun(@times, h .* (1 - h), x);
        % Calculate J (for testing convergence)
        J(i) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));  % loss in vector form
        theta = theta - H \ grad;        % H\grad solves H*d = grad (no explicit inverse)
    end

    % Display theta
    theta

    data1 = load('verify.txt');
    x1 = data1(:, 1:14);
    y1 = data1(:, 15);
    m1 = length(y1);
    x1 = [ones(m1, 1), x1];

    % The verification set is standardized with its own mean and standard
    % deviation here; reusing meanx and sigmax from training is the more
    % usual choice.
    meanx1  = mean(x1);         % column means
    sigmax1 = std(x1);          % column standard deviations
    for k = 2:15
        x1(:, k) = (x1(:, k) - meanx1(k)) ./ sigmax1(k);
    end

    y2 = g(x1 * theta);         % predicted probability of income >50K
    y2
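The script only prints the predicted probabilities `y2`; the post does not show how they were compared with `y1` to arrive at the 89% figure. One plausible way, sketched here under the assumption of a 0.5 cut-off (not stated in the post), is to threshold the probabilities and count the matches. The five values below are hypothetical stand-ins for the real `y2` and `y1`.

```matlab
% Hypothetical values standing in for y2 = g(x1*theta) and the labels y1.
y2 = [0.91; 0.12; 0.47; 0.66; 0.05];
y1 = [1;    0;    1;    1;    0];

threshold = 0.5;                     % assumed cut-off, not stated in the post
predicted = double(y2 >= threshold); % 1 means income >50K, 0 means <=50K
accuracy  = mean(predicted == y1);   % fraction of records classified correctly

fprintf('Accuracy: %.1f%%\n', 100 * accuracy);
```

With the real data, `y2` and `y1` would come straight from the script, and the `predicted` column could be written next to `y1` for a side-by-side comparison like the one kept in result.xlsx.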

Reposted from: https://www.cnblogs.com/mikewolf2002/p/8146129.html
