Association Knowledge Discovery Based on Rough Set Theory

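Author: Wang Tianzhi (王天志)
Advisor: Xia Youming (夏幼明)
University: Yunnan Normal University (云南师范大学)
Major: Computer Software and Theory
Keywords: discretization; equivalence class; attribute reduction; joint entropy; binary representation; interestingness; accuracy
Classification number: TP182
Type: Master's thesis
Year: 2005
Downloads: 378
Citations: 1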

Abstract

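Rough set theory is a relatively new mathematical tool for handling vague and uncertain knowledge. Its central idea is to derive decision or classification rules for a problem through knowledge reduction while keeping the classification ability unchanged. Its most notable difference from other theories for handling uncertainty is that it needs no prior information beyond the data to be processed. Rough set theory regards knowledge as the classification ability inherent in humans and other species, and a major strength of rough sets is their strong classification ability [Shi Zhongzhi (史忠植), 2002].

Traditional association rule mining algorithms do not reduce the attributes of the data set, so the mined rules may contain a great deal of redundancy, which hinders the decision maker's analysis of results and decision making. They are also suited only to mining Boolean (qualitative) association rules and cannot mine quantitative rules directly. Because rough sets address these weaknesses, rough set theory has been applied to association rule mining. Applying it involves roughly three stages: preprocessing (discretizing continuous attributes, handling contradictory information, and so on); attribute reduction, which comprises reduction of the attribute set and reduction of attribute values; and rule extraction, that is, association mining.

Main work of the thesis:

(1) The application of knowledge expression theory to rough set theory is studied, drawing on the concepts of knowledge quantity, average knowledge quantity, entropy, and joint entropy. The joint entropy, that is, the average knowledge quantity jointly expressed by the condition attribute set and the decision attribute set, is used as the decision criterion in continuous attribute discretization and attribute reduction.

(2) An existing method for discretizing continuous attributes, the "class-increase/class-decrease algorithm", is improved, and a joint-entropy discretization algorithm for continuous attributes is proposed. The class-increase/class-decrease algorithm has two phases. In the increase phase, each attribute is first split into two classes and the support of the new attribute set is checked against the support of the original attribute set; if the two are equal, the increase phase stops, and if not, the increase continues with the next attribute until the condition is met. In the decrease phase, the number of classes of each attribute is reduced by one in turn and the new support is checked against the same condition; if the condition holds, the decrease continues with the next attribute, and if not, the decrease phase stops and the attribute keeps the number of classes it had before this decrease. The joint-entropy discretization algorithm, by contrast, relies on the properties of support and of attribute discretization and performs only a decrease phase. It takes the initial equivalence classes as the initial partition, then uses hierarchical clustering to reduce the number of classes of each attribute by one and checks whether the joint entropy of the condition attributes with respect to the decision attribute stays equal. If it does, the next attribute is reduced in the same way, and the process continues until the support drops.

(3) A binary representation of equivalence classes is introduced for computing the equivalence classes of an attribute set: these can be obtained by a bitwise AND of the binary representations of the equivalence classes of the individual attributes. The binary representations of attributes and attribute sets can also be used to compute the support, interestingness, and accuracy of association rules. In rule discovery, support, interestingness, and accuracy are combined as the thresholds for filtering association rules.

(4) A decision attribute equivalence class algorithm is given for computing the attribute equivalence classes of a decision table; a binary support algorithm is given for computing the support of association rules, from which interestingness and accuracy can both be computed; and an effective association rule algorithm is given for finding effective association rules.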
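The binary representation described in item (3) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not spell out. It assumes that each equivalence class is stored as an integer bitmask with bit i set when row i belongs to the class, that support is the fraction of rows covered, and that accuracy means the usual rule confidence. The toy decision table and the function names (`equivalence_masks`, `joint_classes`, `support`, `accuracy`) are hypothetical. Interestingness is omitted because the abstract does not state its formula.

```python
from collections import defaultdict

# Hypothetical toy decision table: two condition attributes and one decision attribute.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]


def equivalence_masks(table, attr):
    """Map each value of `attr` to a bitmask; bit i is set when row i has that value."""
    masks = defaultdict(int)
    for i, row in enumerate(table):
        masks[row[attr]] |= 1 << i
    return dict(masks)


def joint_classes(table, attrs):
    """Equivalence classes of an attribute set, built by ANDing single-attribute masks."""
    classes = {(): (1 << len(table)) - 1}  # start with one class holding every row
    for attr in attrs:
        attr_masks = equivalence_masks(table, attr)
        refined = {}
        for key, mask in classes.items():
            for value, value_mask in attr_masks.items():
                both = mask & value_mask  # rows that belong to both classes
                if both:  # drop empty intersections
                    refined[key + (value,)] = both
        classes = refined
    return classes


def support(mask, n_rows):
    """Fraction of rows whose bit is set in `mask`."""
    return bin(mask).count("1") / n_rows


def accuracy(antecedent_mask, consequent_mask):
    """Share of antecedent rows that also satisfy the consequent (assumed: confidence)."""
    covered = bin(antecedent_mask).count("1")
    if covered == 0:
        return 0.0
    return bin(antecedent_mask & consequent_mask).count("1") / covered


# Rule: (outlook = sunny, windy = no) -> play = yes
condition = joint_classes(rows, ["outlook", "windy"])[("sunny", "no")]
decision = equivalence_masks(rows, "play")["yes"]
rule_support = support(condition & decision, len(rows))
rule_accuracy = accuracy(condition, decision)
```

With this encoding the intersection of two equivalence classes is a single integer AND, and the rule measures reduce to counting set bits, which is presumably why the binary form suits threshold-based rule filtering.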

Table of Contents

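1. Association Knowledge Discovery Based on Rough Set Theory (Chinese text)
  Contents
  Abstract
  Chapter 1 Basic Concepts and Principles of Rough Sets and Knowledge Expression Measurement Theory
    1.1 Introduction
      1.1.1 Application Fields of Rough Sets
        1.1.1.1 Classification Rule Extraction
        1.1.1.2 Data Reduction
      1.1.2 Integrating Rough Set Theory with Other Methods
    1.2 Knowledge and Knowledge Expression
    1.3 Basic Definitions and Principles
    1.4 Support
    1.5 Basic Concepts of Knowledge Expression Measurement Theory
  Chapter 2 Data Preprocessing: Discretization of Continuous Attributes
    2.1 Change in Joint Entropy after Discretizing Continuous Attributes
    2.2 Basic Algorithm
    2.3 Improvement of the Class-Increase/Class-Decrease Discretization Algorithm
  Chapter 3 Knowledge Reduction
    3.1 Basic Concepts of Knowledge Reduction
    3.2 Attribute Significance
    3.3 Principles of Knowledge Reduction
  Chapter 4 Association Knowledge Discovery
    4.1 Effective Association Rule Theory
    4.2 Statistical Rule Filtering
    4.3 Association Rule Mining Algorithm
  Chapter 5 Experimental Analysis
    5.1 Experimental Procedure
    5.2 Experimental Data
    5.3 Experimental Results
    5.4 Analysis of Experimental Results
  Chapter 6 Summary and Outlook
    6.1 Thesis Summary
    6.2 Outlook for Applications of Rough Set Theory in Data Mining
  References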
2. Association Knowledge Mining Based on Rough Sets
  Abstract
  Chapter 1 Basic Concepts and Principles of Rough Sets
    1.1 Introduction
      1.1.1 Application Fields of Rough Sets
        1.1.1.1 Classification Rule Extraction
        1.1.1.2 Data Reduction
      1.1.2 Integrating Rough Set Theory with Other Methods
      1.1.3 Classification of Applications of Rough Set Theory
    1.2 Knowledge and Knowledge Expression
    1.3 Basic Definitions and Principles
    1.4 Information Entropy and Support
    1.5 Basic Concepts of Knowledge Expression Measurement Theory
  Chapter 2 Data Preprocessing: Discretization of Continuous Attributes
    2.1 Change in Joint Entropy after Discretizing Continuous Attributes
    2.2 Basic Algorithm
    2.3 Improvement of the Class-Increase/Class-Decrease Discretization Algorithm
  Chapter 3 Knowledge Reduction
    3.1 Basic Concepts of Knowledge Reduction
    3.2 Attribute Significance
    3.3 Principles of Knowledge Reduction
  Chapter 4 Association Knowledge Discovery
    4.1 Effective Association Rule Theory
    4.2 Statistical Rule Filtering: Extracting Association Rules Based on Binary Representation
    4.3 Association Rule Algorithm
  Chapter 5 Experimental Analysis
    5.1 Experimental Procedure
    5.2 Experimental Data
    5.3 Experimental Results
    5.4 Analysis of Experimental Results
  Chapter 6 Summary and Outlook
    6.1 Thesis Summary
    6.2 Outlook for Applications of Rough Set Theory in Data Mining
3. Research on Association Rule Mining for Information Systems (Chinese text)
  Contents
  Preface
  Part 1 Overview of Knowledge Discovery in Databases and Data Mining
    Chapter 1 Knowledge Discovery in Databases (KDD)
      1.1 Basic Concepts of KDD
      1.2 Origins of KDD
      1.3 Current State of KDD Research
      1.4 General Mechanism of KDD
      1.5 Main Research Methods
      1.6 Types and Representation of Extracted Knowledge
      1.7 Basic Framework of a KDD System
      1.8 Mining Models in KDD
        1.8.1 Association Model
        1.8.2 Classification Model
        1.8.3 Clustering Model
        1.8.4 Regression Model
        1.8.5 Sequence Model
      1.9 Typical Methods and Tools
    Chapter 2 Overview of Data Mining
      2.1 The Concept of DM
      2.2 Main Research Methods
        2.2.1 Classification Model
        2.2.2 Clustering Analysis Method
        2.2.3 Regression
        2.2.4 Association Model
        2.2.5 Sequential Model
        2.2.6 Deviation Model
      2.3 Common Data Mining Methods
        2.3.1 Fuzzy Method
        2.3.2 Rough Set Theory
        2.3.3 Cloud Theory
        2.3.4 Evidence Theory
        2.3.5 Artificial Neural Network (ANN)
        2.3.6 Genetic Algorithm (GA)
        2.3.7 Inductive Learning
  Part 2 Rough Set Theory
    Chapter 3 Basic Rough Set Theory
      3.1 Basic Concepts
      3.2 Discernibility Matrix and Discernibility Function
      3.3 Discretization of Continuous Attributes
        3.3.1 Classification of Existing Discretization Methods
        3.3.2 Typical Attribute Discretization Algorithms
      3.4 Information Entropy
      3.5 Knowledge Dependency
      3.6 Attribute Reduction
    Chapter 4 Knowledge Expression Theory
      4.1 Agents and Related Concepts of Knowledge
      4.2 Agent-Based Knowledge Expression Measurement Theory
        4.2.1 Knowledge Quantity (Knowledge Quantum)
        4.2.2 Entropy
        4.2.3 Number of Equivalent Knowledge Primitives
  Part 3 Association Rule Mining
    Chapter 5 Principles and Steps of Association Rule (AR) Mining
      5.1 Basic Concepts and Problem Statement
      5.2 Technical Criteria for Selecting ARs
      5.3 Steps of AR Mining
    Chapter 6 Classification and Algorithms of AR Mining
      6.1 Classification of AR Mining
      6.2 Main Research Directions and Analysis of Typical Algorithms
        6.2.1 Multi-Pass Mining Algorithms
        6.2.2 Incremental Updating Algorithms
        6.2.3 Core Algorithm
        6.2.4 Several Optimizations of the Frequent Itemset Algorithm
    Chapter 7 Effective Association Rule Mining
      7.1 Semantic Association Rules
      7.2 Effective Association Rules
    Chapter 8 Association Rule Mining Based on Rough Sets
      8.1 Shortcomings of Traditional Association Rule Mining
      8.2 Advantages of Applying Rough Set Theory to Association Rule Mining
      8.3 General Steps of Rough-Set-Based Association Rule Mining
      8.4 Typical Algorithms
4. Research on Association Rule Mining for Information Systems
  Preface
  Part 1 Overview of Knowledge Discovery in Databases and Data Mining
    Chapter 1 Knowledge Discovery in Databases (KDD)
      1.1 Basic Concepts of KDD
      1.2 Origins of KDD
      1.3 Current State of KDD Research
      1.4 General Mechanism of KDD
      1.5 Main Research Methods
      1.6 Types and Representation of Extracted Knowledge
      1.7 Basic Framework of a KDD System
      1.8 Mining Models in KDD
        1.8.1 Association Model
        1.8.2 Classification Model
        1.8.3 Clustering Model
        1.8.4 Regression Model
        1.8.5 Sequence Model
      1.9 Typical Methods and Tools
    Chapter 2 Overview of Data Mining
      2.1 The Concept of DM
      2.2 Main Research Methods
        2.2.1 Classification Model
        2.2.2 Clustering Analysis Method
          (1) Partitioning Method
          (2) Hierarchical Method
          (3) Density-based Method
          (4) Grid-based Method
          (5) Model-based Method
          (6) Outlier Mining
        2.2.3 Regression
        2.2.4 Association Model
        2.2.5 Sequential Model
        2.2.6 Deviation Model
      2.3 Common Data Mining Methods
        2.3.1 Fuzzy Method
        2.3.2 Rough Set Theory
        2.3.3 Cloud Theory
        2.3.4 Evidence Theory
        2.3.5 Artificial Neural Network (ANN)
        2.3.6 Genetic Algorithm (GA)
        2.3.7 Inductive Learning
  Part 2 Rough Set Theory
    Chapter 3 Basic Theory of RS
      3.1 Basic Concepts
      3.2 Discernibility Matrix and Discernibility Function
      3.3 Discretization of Continuous Attributes
        3.3.1 Classification of Existing Discretization Methods
        3.3.2 Typical Discretization Algorithms
      3.4 Information Entropy
      3.5 Knowledge Dependency
      3.6 Attribute Reduction
    Chapter 4 Knowledge Expression Theory
      4.1 Agents and Related Concepts of Knowledge
      4.2 Agent-Based Knowledge Expression Measurement Theory
        4.2.1 Knowledge Quantum
        4.2.2 Entropy
        4.2.3 Number of Equivalent Knowledge Primitives
  Part 3 Association Rule Mining
    Chapter 5 Principles and Steps of Association Rule Mining
      5.1 Basic Concepts and Problem Statement
      5.2 Technical Criteria for Selecting ARs
      5.3 Steps of AR Mining
    Chapter 6 Classification and Algorithms of AR Mining
      6.1 Classification of AR Mining
      6.2 Main Research Directions and Analysis of Typical Algorithms
        6.2.1 Multi-Pass Mining Algorithms
        6.2.2 Incremental Updating Algorithms
        6.2.3 Core Algorithm
        6.2.4 Several Optimizations of the Frequent Itemset Algorithm
    Chapter 7 Effective Association Rule Mining
      7.1 Semantic Association Rules
      7.2 Effective Association Rules
    Chapter 8 Association Rule Mining Based on RS
      8.1 Shortcomings of Traditional Association Rule Mining
      8.2 Advantages of Applying RS to Association Rule Mining
      8.3 General Steps of Association Rule Mining Based on RS
      8.4 Typical Algorithms
