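A Practical Free Chinese Word Segmentation Service
Chinese word segmentation is both a difficulty and a priority for search and SEO on large websites. Many people use Lucene Chinese word segmentation, but maintaining such a large dictionary is no easy task, and its feasibility deserves scrutiny.
In that case, why not use a service that someone else provides?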
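Advantages:
1. Stable, fast, and accurate segmentation
2. No maintenance required
Disadvantages:
1. A free third-party service can never be fully trusted: what if it shuts down?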
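One way to soften the "what if it shuts down" risk is to keep the last good answer and fall back to it when the remote call fails. The following is only a minimal sketch under stated assumptions: `keywordsWithFallback`, its parameters, and the cache file layout are hypothetical names introduced here for illustration, and the fetchers are stubs rather than real service calls.

```php
<?php
// Hypothetical helper (not from any library): try a remote keyword service,
// and fall back to the last cached result, then to a default, if it fails.
// $fetch     - callable that returns an array of keywords (empty on failure)
// $cacheFile - path where the last non-empty result is stored as JSON
// $default   - value returned when neither the service nor the cache helps
function keywordsWithFallback($fetch, $cacheFile, $default = array()) {
    try {
        $kws = call_user_func($fetch);
    } catch (Exception $e) {
        $kws = array();
    }
    if (is_array($kws) && count($kws) > 0) {
        // Remember the last good answer for later outages
        file_put_contents($cacheFile, json_encode($kws));
        return $kws;
    }
    if (is_readable($cacheFile)) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached) && count($cached) > 0) {
            return $cached;
        }
    }
    return $default;
}

// Demo with stub fetchers standing in for a real service call
$cacheFile = sys_get_temp_dir() . '/kw_demo_' . uniqid() . '.json';
$working = function () { return array('Lucene', 'SEO'); };
$broken  = function () { return array(); };

$first  = keywordsWithFallback($working, $cacheFile); // service answers, result is cached
$second = keywordsWithFallback($broken, $cacheFile);  // service down, cache is used
echo implode(',', $first), "\n";
echo implode(',', $second), "\n";
unlink($cacheFile);
```

In real use, the fetcher would wrap one of the service functions, with one cache file per article (for example, named after a hash of the title). Lowering `default_socket_timeout` with `ini_set` also keeps a dead service from stalling the page.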
Below are PHP functions for two free ROA services: Baidu's popular related keywords and the Discuz tag feature.
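// Get Baidu's popular related keywords for a title; returns a comma-separated string
// (an empty string if the request fails or the page layout no longer matches)
function baiduKeyword($title, $num = 5, $charset = "UTF-8") {
    // The page is queried in GB2312; //IGNORE drops characters that cannot be converted
    $title = iconv($charset, "GB2312//IGNORE", $title);
    $w = @file_get_contents('http://d.baidu.com/rs.php?q=' . urlencode($title) . '&tn=baidu');
    if ($w === false) {
        return '';
    }
    // Extract the list section
    if (!preg_match("|<div id=con>(.*)</div>|isU", $w, $con)) {
        return '';
    }
    $list = $con[1];
    // Extract the entries: $content[3] holds the keywords, $content[4] the bar widths (search counts)
    preg_match_all("|<ul><li class=ls>(.*)</li><li class=kwc><a target=_blank href=(.*)>(.*)</a></li><li class=bar><img src=http://img.baidu.com/img/bar_1.gif height=6 width=(\d*) align=absmiddle vspace=5></li></ul>|isU", $list, $content);
    if (empty($content[3])) {
        return '';
    }
    // Sort the keywords by search count, descending. Using the counts as array keys
    // (array_combine + krsort) would silently drop keywords that share a count.
    $counts = array_map('intval', $content[4]);
    $keywords = $content[3];
    array_multisort($counts, SORT_DESC, SORT_NUMERIC, $keywords);
    // Take the top N entries
    $r = array_slice($keywords, 0, $num);
    // Join into a string and convert back to the caller's charset
    $result = implode(",", $r);
    $result = iconv("GB2312", $charset, $result);
    return $result;
}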
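The ranking step is worth a closer look. If search counts are used as array keys (as with `array_combine` followed by `krsort`), two keywords with the same count collide and one is lost. The sketch below uses made-up sample data in place of the regex captures to show the collision, and sorts two parallel arrays with `array_multisort` as one possible alternative.

```php
<?php
// Hypothetical sample data standing in for the regex captures
$keywords = array('php', 'lucene', 'seo', 'discuz');
$counts   = array('40', '75', '40', '12');

// Counts as keys: 'php' and 'seo' both map to key 40, so only one survives
$byKey = array_combine($counts, $keywords);
krsort($byKey);
echo count($byKey), "\n"; // prints 3, although there are 4 keywords

// Parallel arrays: sort the counts descending and reorder the keywords to match
$sortedCounts   = array_map('intval', $counts);
$sortedKeywords = $keywords;
array_multisort($sortedCounts, SORT_DESC, SORT_NUMERIC, $sortedKeywords);
$top = array_slice($sortedKeywords, 0, 3);
echo implode(',', $top), "\n";
```

With the parallel-array approach, all four keywords survive the sort, and both keywords with a count of 40 appear in the top three.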
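// Get article keywords (tags) from Discuz: pass in the title and content to get back an array of 5 keywords
function getTags($title, $content) {
    // Strip HTML tags and [bracketed] markup. The /U flag is dropped here: combined with
    // the lazy .+? it would make the match greedy and remove everything between the first [ and the last ]
    $subjectenc = rawurlencode(strip_tags($title));
    $messageenc = rawurlencode(strip_tags(preg_replace("/\[.+?\]/", '', $content)));
    // Limit the request length, then drop a %XX escape left incomplete by the cut
    // (the cut can still split a multi-byte character)
    $subjectenc = preg_replace('/%[0-9A-Fa-f]?$/', '', substr($subjectenc, 0, 60));
    $messageenc = preg_replace('/%[0-9A-Fa-f]?$/', '', substr($messageenc, 0, 1200));
    // file_get_contents returns false on failure, which the check below handles
    $data = @file_get_contents("http://keyword.discuz.com/related_kw.html?title=$subjectenc&content=$messageenc&ics=utf-8&ocs=utf-8");
    $kws = array();
    if ($data) {
        $parser = xml_parser_create();
        // Keep tag names case-sensitive and skip whitespace-only values
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
        xml_parse_into_struct($parser, $data, $values, $index);
        xml_parser_free($parser);
        foreach ($values as $valuearray) {
            // Collect the text of every kw and ekw element
            if (($valuearray['tag'] == 'kw' || $valuearray['tag'] == 'ekw') && isset($valuearray['value'])) {
                $kws[] = trim($valuearray['value']);
            }
        }
    }
    return $kws;
}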
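To see what the parsing loop in getTags extracts, here is a small sketch that runs the same `xml_parse_into_struct` logic against a made-up response. The XML layout is an assumption inferred only from the `kw` and `ekw` tag names the function checks; the real service's format may differ. The helper name `extractKeywords` is hypothetical.

```php
<?php
// Hypothetical helper: pull the text of every kw and ekw element out of an XML string
function extractKeywords($data) {
    $kws = array();
    $parser = xml_parser_create();
    // Keep tag names case-sensitive and skip whitespace-only values
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $index);
    foreach ($values as $v) {
        if (($v['tag'] == 'kw' || $v['tag'] == 'ekw') && isset($v['value'])) {
            $kws[] = trim($v['value']);
        }
    }
    return $kws;
}

// Made-up response; only the kw/ekw tag names come from the original function
$sample = '<response><kw>Lucene</kw><kw> SEO </kw><ekw>Discuz</ekw><other>ignored</other></response>';
$tags = extractKeywords($sample);
echo implode(',', $tags), "\n";
```

Elements with other names are skipped, and `trim` removes the padding around each value.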