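Journal name (Chinese): 中文信息学报
Journal name (English): Journal of Chinese Information Processing
Supervising body: China Association for Science and Technology
Sponsors: Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences
Address: No. 4 Zhongguancun South Fourth Street, Haidian District, Beijing
Postal code: 100080
Telephone: 010-62562916
Email: cips@iscas.ac.cn
ISSN: 1003-0077
Editor-in-chief: Sun Maosong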

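An Experimental Study of Dictionary Mechanisms for Automatic Chinese Word Segmentation
Cite as: Sun Maosong, Zuo Zhengping, Huang Changning. 汉语自动分词词典机制的实验研究[J]. 中文信息学报, 2000, 14(1): 1-6.
Authors: Sun Maosong, Zuo Zhengping, Huang Changning
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: This research was supported by the National Natural Science Foundation of China (contract no. 69433010).
Abstract: The segmentation dictionary is a basic component of an automatic Chinese word segmentation system, and its lookup speed directly affects the processing speed of the whole system. This paper designs and experimentally examines three typical segmentation dictionary mechanisms: whole-word binary search, TRIE index tree, and character-by-character binary search, with emphasis on comparing their time and space efficiency. The experiments show that the dictionary mechanism based on character-by-character binary search is simple and efficient, and meets the needs of practical automatic Chinese word segmentation systems well.

Keywords: Chinese information processing; automatic Chinese word segmentation; segmentation dictionary mechanism
Revised manuscript received: April 6, 1999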

An Experimental Study on Dictionary Mechanism for Chinese Word Segmentation
Cite as: Sun Maosong, Zuo Zhengping, Huang Changning. An Experimental Study on Dictionary Mechanism for Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2000, 14(1): 1-6.
Authors: Sun Maosong, Zuo Zhengping, Huang Changning
Affiliation: The State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Abstract: The dictionary mechanism is one of the basic components of a Chinese word segmentation system, and its performance significantly influences segmentation speed. In this paper, we design and implement three typical dictionary mechanisms from the word segmentation point of view, namely binary seek by word, TRIE indexing tree, and binary seek by characters, and compare their space and time efficiency experimentally. The experiments show that the binary-seek-by-characters mechanism is the most appropriate of the three: it is simple and efficient, and it best meets the speed requirements of practical Chinese word segmenters.
Keywords: Chinese information processing; Chinese word segmentation; dictionary mechanism for Chinese word segmentation
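
The abstract names the three dictionary mechanisms without describing their internals. As a rough illustration only, the Python sketch below shows one plausible reading of each: a binary search that compares whole words, a nested-dictionary trie, and a binary search that narrows the candidate range one character at a time. All function names are hypothetical, and the data layout is an assumption (a single sorted word list, or nested dicts for the trie); the structures used in the paper itself may differ.

```python
from bisect import bisect_left, bisect_right

def build_dictionary(words):
    """Assumed dictionary body: a sorted, de-duplicated list of words."""
    return sorted(set(words))

def whole_word_lookup(dictionary, word):
    """Binary seek by word: one binary search comparing entire strings."""
    i = bisect_left(dictionary, word)
    return i < len(dictionary) and dictionary[i] == word

def prefix_words_by_character(dictionary, text, start=0):
    """Binary seek by characters (illustrative reading).

    Narrows the range [lo, hi) of candidate entries one character at a
    time and collects every dictionary word that is a prefix of
    text[start:], so a single pass serves maximum-matching segmentation.
    """
    lo, hi = 0, len(dictionary)
    found, prefix = [], ""
    for ch in text[start:]:
        prefix += ch
        # Entries sharing `prefix` are contiguous in sorted order. The upper
        # bound assumes no entry contains the code point U+10FFFF.
        lo = bisect_left(dictionary, prefix, lo, hi)
        hi = bisect_right(dictionary, prefix + "\U0010FFFF", lo, hi)
        if lo == hi:
            break  # no entry starts with this prefix
        if dictionary[lo] == prefix:
            found.append(prefix)
    return found

def build_trie(words):
    """Simplified trie: nested dicts, with the key "" marking end of word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def trie_prefix_words(trie, text, start=0):
    """Walk the trie along text[start:], collecting every word passed."""
    node, found = trie, []
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if "" in node:
            found.append(text[start:i + 1])
    return found

words = ["中", "中国", "中国人", "中文", "人民", "信息"]
dictionary = build_dictionary(words)
trie = build_trie(words)
print(whole_word_lookup(dictionary, "中文"))
print(prefix_words_by_character(dictionary, "中国人民"))
print(trie_prefix_words(trie, "中国人民"))
```

In this sketch the whole-word search answers only "is this exact string a word?", so a segmenter would have to call it once per candidate length, whereas the character-wise search and the trie return all matching prefixes in one pass. The character-wise variant keeps the flat sorted list, which is one possible reason for the compactness the abstract reports, though that link is an inference rather than a claim from the source.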
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.