---
layout: post
title: "Installing Sphinx-for-chinese (Chinese full-text search) on Debian/Linux"
---
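Sphinx is a SQL-based full-text search engine, but for Chinese users it has a fatal flaw: it does not support Chinese. I later found <a href="http://code.google.com/p/sphinx-for-chinese/" target="_blank">sphinx-for-chinese</a> online, a full-text search engine based on Sphinx that supports Chinese word segmentation. After downloading and installing it, I found that it works well. The installation steps are described below.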
<span id="more-720"></span>
<ol>
<li><strong>Download the required packages</strong>
sphinx-for-chinese-0.9.9-r2117.tar.gz
xdict_1.1.tar.gz
Download page: <a href="http://code.google.com/p/sphinx-for-chinese/downloads/list" target="_blank">http://code.google.com/p/sphinx-for-chinese/downloads/list</a></li>
<li>Install sphinx-for-chinese
<pre>$ tar zxvf sphinx-for-chinese-0.9.9-r2117.tar.gz
$ cd sphinx-for-chinese-0.9.9-r2117
$ ./configure --prefix=/usr/local/sphinx
$ make
$ sudo make install</pre>
</li>
<li>Create the test database and a sphinx user
<pre>mysql> create database test;
mysql> create user 'sphinx'@'localhost' identified by 'sphinx';
mysql> grant all privileges on test.* to 'sphinx'@'localhost';</pre>
</li>
<li>Create the sphinx configuration file from the bundled sample
<pre>$ cd /usr/local/sphinx/etc
$ sudo cp sphinx.conf.dist sphinx.conf</pre>
</li>
<li>Edit the configuration file
<pre>sql_host = localhost
<strong>sql_user = sphinx
sql_pass = sphinx</strong>
sql_db = test
sql_port = 3306 # optional, default is 3306</pre>
Note: the bold lines are the ones that were changed.</li>
</ol>
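The credential edit in the last step can also be scripted instead of done by hand. The following is a minimal sketch, not part of the original procedure: the helper name set_sql_credentials is made up for illustration, and it assumes the file contains uncommented sql_user and sql_pass lines, as the configuration shown above does.

```shell
#!/bin/sh
# Hypothetical helper: set sql_user and sql_pass in a sphinx.conf-style file.
#   $1  config file, $2  user name, $3  password
# Values containing "/", "&" or "\" would need escaping for sed; not handled here.
set_sql_credentials() {
    conf=$1
    user=$2
    pass=$3
    tmp="$conf.tmp.$$"

    # Rewrite only uncommented sql_user / sql_pass lines, keeping their indentation.
    sed -E \
        -e "s/^([[:space:]]*sql_user[[:space:]]*=).*/\1 $user/" \
        -e "s/^([[:space:]]*sql_pass[[:space:]]*=).*/\1 $pass/" \
        "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Example call (needs write access to the file and its directory):
# set_sql_credentials /usr/local/sphinx/etc/sphinx.conf sphinx sphinx
```

The helper writes to a temporary file and then moves it into place, so a failed sed run does not leave a half-written configuration behind.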
At this point sphinx is usable, but it does not yet support Chinese word segmentation. The following steps add it.
<ol>
<li>Extract the dictionary file xdict_1.1.tar.gz
<pre>$ tar zxvf xdict_1.1.tar.gz</pre>
</li>
<li>Generate the dictionary with the mkdict tool installed earlier
<pre>$ /usr/local/sphinx/bin/mkdict xdict.txt xdict</pre>
</li>
<li>Copy the generated dictionary xdict into the /usr/local/sphinx/etc directory</li>
<li>Configure Chinese word segmentation
Open sphinx.conf, find the line 'charset_type = sbcs', and change it to
<pre>charset_type = utf-8
chinese_dictionary = /usr/local/sphinx/etc/xdict</pre>
</li>
</ol>
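The dictionary build and copy steps above can be combined into one short script. This is a minimal sketch under stated assumptions: the function name install_xdict and its arguments are hypothetical, and it assumes xdict.txt sits in the current directory after unpacking, as the mkdict command above implies.

```shell
#!/bin/sh
# Hypothetical helper: build a dictionary with mkdict and copy it to the
# directory that holds sphinx.conf.
#   $1  path to the mkdict binary
#   $2  dictionary source text (xdict.txt from xdict_1.1.tar.gz)
#   $3  destination directory (this post uses /usr/local/sphinx/etc)
install_xdict() {
    mkdict_bin=$1
    dict_src=$2
    dest_dir=$3

    [ -x "$mkdict_bin" ] || { echo "mkdict not found: $mkdict_bin" >&2; return 1; }
    [ -f "$dict_src" ] || { echo "dictionary source not found: $dict_src" >&2; return 1; }

    # Same invocation as in the steps above: mkdict <source> <output>
    "$mkdict_bin" "$dict_src" xdict || return 1

    # Copy the generated file next to sphinx.conf
    cp xdict "$dest_dir/"
}

# Example call (run with enough privileges to write to the destination):
# install_xdict /usr/local/sphinx/bin/mkdict xdict.txt /usr/local/sphinx/etc
```

Checking for the binary and the source file first makes a wrong working directory show up as a clear message rather than as a missing dictionary at indexing time.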
Chinese word segmentation is now configured. Here is a simple test.
<ol>
<li>Edit the SQL script that ships with sphinx-for-chinese and add some Chinese data
$ vi /usr/local/sphinx/etc/example.sql
<pre>REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
( 1, 1, 5, NOW(), 'test one', 'this is my test document number one. also checking search within phrases.' ),
( 2, 1, 6, NOW(), 'test two', 'this is my test document number two' ),
( 3, 2, 7, NOW(), 'another doc', 'this is another group' ),
( 4, 2, 8, NOW(), 'doc number four', 'this is to test groups' ),
<strong>( 5, 2, 8, NOW(), 'doc number five', '一个' ),
( 6, 2, 8, NOW(), 'doc number six', '我' ),
( 7, 2, 8, NOW(), 'doc number seven', '中国人' )</strong>;
</pre>
Note: the bold rows are the added Chinese test data.
</li>
<li>Import the data
<pre>$ mysql -usphinx -psphinx < /usr/local/sphinx/etc/example.sql</pre>
</li>
<li>Build the index
<pre>$ sudo /usr/local/sphinx/bin/indexer --all</pre>
</li>
<li>Search
<pre>$ /usr/local/sphinx/bin/search 我是一个中国人
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file '/usr/local/sphinx/etc/sphinx.conf'...
index 'test1': query '我是一个中国人 ': returned 0 matches of 0 total in 0.000 sec
words:
1. '我': 1 documents, 1 hits
2. '是': 0 documents, 0 hits
3. '一个': 1 documents, 1 hits
4. '中国人': 1 documents, 1 hits
index 'test1stemmed': query '我是一个中国人 ': returned 0 matches of 0 total in 0.000 sec
words:
1. '我': 1 documents, 1 hits
2. '是': 0 documents, 0 hits
3. '一个': 1 documents, 1 hits
4. '中国人': 1 documents, 1 hits</pre>
</li>
</ol>
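The run above reports 0 matches even though three of the four words were found. By default the search tool only returns documents that contain every query word, and each test row holds just one of them. A minimal sketch of the same query in "match any word" mode follows; it uses the tool's --any option, and the variable names SEARCH_BIN and QUERY are only for illustration.

```shell
#!/bin/sh
# Sketch: rerun the query so that one matching word is enough.
# SEARCH_BIN is the path used throughout this post; adjust it if your prefix differs.
SEARCH_BIN=/usr/local/sphinx/bin/search
QUERY='我是一个中国人'

if [ -x "$SEARCH_BIN" ]; then
    # --any: a document matches if it contains at least one query word
    "$SEARCH_BIN" --any "$QUERY"
else
    echo "search binary not found at $SEARCH_BIN" >&2
fi
```

With the test data above, this should list the rows whose content is one of the segmented words, rather than an empty result.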
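At this point sphinx-for-chinese is installed and has passed the test: the query was split into the words 我, 是, 一个, and 中国人, as the per-word statistics show.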