<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pavel Baranov&#039;s Personal Page &#187; Sphinx</title>
	<atom:link href="http://www.pavelbaranov.com/category/it/sphinx/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pavelbaranov.com</link>
	<description>my personal interests and thoughts</description>
	<lastBuildDate>Fri, 06 Aug 2010 23:08:27 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Playing with Sphinx Search Engine / MySQL vs. Sphinx ?</title>
		<link>http://www.pavelbaranov.com/2009/11/09/playing-with-sphinx-search-engine-mysql-vs-sphinx/</link>
		<comments>http://www.pavelbaranov.com/2009/11/09/playing-with-sphinx-search-engine-mysql-vs-sphinx/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 19:14:57 +0000</pubDate>
		<dc:creator>Pavel Baranov</dc:creator>
				<category><![CDATA[IT]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Sphinx]]></category>

		<guid isPermaLink="false">http://www.pavelbaranov.com/?p=18</guid>
		<description><![CDATA[So what do you do when you have 240 million profiles and you want to search on first, middle, and last names, plus locations and tags?
And you are stuck with MySQL because Oracle is way out of your budget?
Well, give Sphinx a try!
I know that Sphinx is more of a text search / stats [...]]]></description>
			<content:encoded><![CDATA[<p>So what do you do when you have 240 million profiles and you want to search on first, middle, and last names, plus locations and tags?</p>
<p>And you are stuck with MySQL because Oracle is way out of your budget?</p>
<p>Well, give Sphinx a try!</p>
<p>I know that Sphinx is more of a text search / stats engine, but with a little trial and error and some sneakiness you can get some cool results. So here we go:</p>
<blockquote><p>Testing Sphinx 0.9.8-release (r1371)</p></blockquote>
<blockquote><p>Preliminary results (sphinx):</p>
<p><strong>STAGE 1:</strong></p>
<p><strong>[root@web01 /]# search --config /usr/local/etc/sphinx.conf -q -a real -l 5 -s "results desc" -i test2</strong></p>
<p><strong>Sphinx 0.9.8-release (r1371)</strong></p>
<p><strong>Copyright (c) 2001-2008, Andrew Aksyonoff</strong></p>
<p><strong>using config file '/usr/local/etc/sphinx.conf'...</strong></p>
<p><strong>index 'test2': query 'real ': returned 1000 matches of 34358 total in <span style="color: #ff0000;"><span style="text-decoration: underline;">0.039 sec</span></span></strong></p>
<p><strong>displaying matches:</strong></p>
<p><strong>1. document=32792, weight=1, results=827830, date_added=Wed Jun  6 11:23:14 2007</strong></p>
<p><strong>2. document=3376336, weight=1, results=4302, date_added=Tue Aug  5 10:59:12 2008</strong></p>
<p><strong>3. document=18765, weight=1, results=3586, date_added=Tue Jun  5 21:42:00 2007</strong></p>
<p><strong>4. document=772274, weight=1, results=3122, date_added=Wed May  7 21:13:59 2008</strong></p>
<p><strong>5. document=113495, weight=1, results=1215, date_added=Tue Sep 18 07:31:07 2007</strong></p>
<p><strong>words:</strong></p>
<p><strong>1. 'real': 34358 documents, 34402 hits</strong></p>
<p>(Mysql 5.0.37):</p>
<p><strong>mysql&gt; select id from employers where name like '%real %' order by results desc limit 5;</strong></p>
<p><strong>+---------+</strong></p>
<p><strong>| id      |</strong></p>
<p><strong>+---------+</strong></p>
<p><strong>|   32792 |</strong></p>
<p><strong>| 3376336 |</strong></p>
<p><strong>|   18765 |</strong></p>
<p><strong>|  772274 |</strong></p>
<p><strong>|  113495 |</strong></p>
<p><strong>+---------+</strong></p>
<p><strong>5 rows in set <span style="color: #ff0000;">(<span style="text-decoration: underline;">7.57 sec</span>)</span></strong></p></blockquote>
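<p>To put a number on that gap, here is a quick back-of-the-envelope sketch in Python. It only uses the two timings printed above; the <code>speedup</code> helper and the variable names are made up for illustration and are not part of Sphinx or MySQL.</p>

```python
# Timings copied from the Stage 1 output above (seconds).
sphinx_seconds = 0.039
mysql_seconds = 7.57


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow`."""
    return slow / fast


stage1_speedup = speedup(mysql_seconds, sphinx_seconds)
print(f"Sphinx was roughly {stage1_speedup:.0f}x faster on this query")
```

<p>This is a single run on one box, so treat the ratio as a rough indication rather than a proper benchmark.</p>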
<p>Hmm&#8230; let&#8217;s see&#8230; I&#8217;d go with Sphinx. Want more?</p>
<p>Here it is:</p>
<blockquote><p><strong>STAGE 2:</strong></p>
<p><strong>Goal:<br />
a. Count how many profiles in “Terre Haute” (city_id 1906883) have any of these work tags: Strategic Supply Chain Manager, Strategic Supply Chain Management 5 inc., Strategic Supply Manager, strategic supply<br />
b. Get the first 20 profile IDs, ordered by links descending</strong></p>
<p><strong>Let&#8217;s start with good old MySQL 5.0.37:</strong></p>
<p><strong>a.<br />
mysql&gt; SELECT COUNT(1) AS cnt FROM region_global_tags_222 AS pgt JOIN search_city_3837 AS loct USING(profile_id) WHERE ((pgt.global_tag_type="employer" &amp;&amp; pgt.global_tag_id IN (223209,141623,4561058,331778))) AND pgt.region_id=3837 AND loct.city_id=1906883;</strong></p>
<p><strong>+-----+</strong></p>
<p><strong>| cnt |</strong></p>
<p><strong>+-----+</strong></p>
<p><strong>|   1 |</strong></p>
<p><strong>+-----+</strong></p>
<p><strong>1 row in set (2 min 7.14 sec)</strong></p>
<p><strong> b.</strong></p>
<p><strong>mysql&gt; SELECT profile_id AS cnt FROM region_global_tags_222 AS pgt JOIN search_city_3837 AS loct USING(profile_id) WHERE ((pgt.global_tag_type="employer" &amp;&amp; pgt.global_tag_id IN (223209,141623,4561058,331778))) AND pgt.region_id=3837 AND loct.city_id=1906883 order by pgt.links desc limit 20;</strong></p>
<p><strong>+-----------+</strong></p>
<p><strong>| cnt       |</strong></p>
<p><strong>+-----------+</strong></p>
<p><strong>| 147563021 |</strong></p>
<p><strong>+-----------+</strong></p>
<p><strong>1 row in set (3 min 4.47 sec)</strong></p>
<p><span style="color: #ff0000;"><strong>TOTAL: 5 MINUTES and 11.61 SECONDS!!!</strong></span></p>
<p><strong>Let&#8217;s try with Sphinx:</strong></p>
<p><strong>[root@web01 pavel]# search --config /usr/local/etc/sphinx.conf -e '@global_tag_type employer @global_tag_id (223209|141623|4561058|331778) @city_id 1906883' -l 200 -s "links desc" -i tags_region_3837</strong></p>
<p><strong>Sphinx 0.9.8-release (r1371)</strong></p>
<p><strong>Copyright (c) 2001-2008, Andrew Aksyonoff</strong></p>
<p><strong>using config file '/usr/local/etc/sphinx.conf'...</strong></p>
<p><strong>index 'tags_region_3837': query '@global_tag_type employer @global_tag_id (223209|141623|4561058|331778) @city_id 1906883 ': returned 1 matches of 1 total in <span style="color: #ff0000;">0.014 sec <img src='http://www.pavelbaranov.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></span></strong></p></blockquote>
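<p>The extended-mode query above is just a string, so it is easy to generate from the same IDs you would otherwise put into the SQL <code>IN (...)</code> list. Below is a minimal Python sketch of that idea. The <code>build_extended_query</code> helper is hypothetical (it is not part of any Sphinx client library), and it assumes, as the <code>@field</code> syntax in the query suggests, that <code>global_tag_type</code>, <code>global_tag_id</code> and <code>city_id</code> are indexed as full-text fields.</p>

```python
def build_extended_query(tag_type, tag_ids, city_id):
    """Build a Sphinx extended-mode query string like the one used above.

    Hypothetical helper: assumes global_tag_type, global_tag_id and
    city_id are indexed as full-text fields in the Sphinx index.
    """
    # OR the tag IDs together inside parentheses: (a|b|c)
    tag_alternatives = "|".join(str(tag_id) for tag_id in tag_ids)
    return (
        f"@global_tag_type {tag_type} "
        f"@global_tag_id ({tag_alternatives}) "
        f"@city_id {city_id}"
    )


query = build_extended_query(
    "employer", [223209, 141623, 4561058, 331778], 1906883
)
print(query)
```

<p>The printed string can be passed to <code>search -e</code> exactly as in the command above. Storing numeric IDs as searchable text is the &#8220;sneakiness&#8221; part: it lets one full-text lookup stand in for the JOIN plus the <code>IN</code> filter.</p>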
<p><span style="color: #339966;"><strong><span style="text-decoration: underline;">0.014 SECONDS </span></strong><strong>vs. <span style="text-decoration: underline;">5 MINUTES and 11.61 SECONDS</span></strong></span></p>
<p>This is just one of the ways Sphinx can speed up your queries and take some load off your DB <img src='http://www.pavelbaranov.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
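<p>If you want to double-check the Stage 2 arithmetic, here is a small Python sketch that adds up the two MySQL timings and compares them with the single Sphinx query. The <code>to_seconds</code> helper is made up for this post and only understands the &#8220;N min S sec&#8221; and &#8220;S sec&#8221; shapes that appear in the output above.</p>

```python
import re


def to_seconds(duration):
    """Convert a duration such as '2 min 7.14 sec' or '0.014 sec' to seconds."""
    match = re.fullmatch(r"(?:(\d+) min )?(\d+(?:\.\d+)?) sec", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Timings copied from the Stage 2 output above.
mysql_total = to_seconds("2 min 7.14 sec") + to_seconds("3 min 4.47 sec")
sphinx_total = to_seconds("0.014 sec")
stage2_speedup = mysql_total / sphinx_total

print(f"MySQL total: {mysql_total:.2f} s")
print(f"Speedup: about {stage2_speedup:,.0f}x")
```

<p>Note that the comparison is a bit generous to MySQL: the one Sphinx call returned both the match count and the matching IDs, while MySQL needed two separate queries.</p>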
<p>Comments?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pavelbaranov.com/2009/11/09/playing-with-sphinx-search-engine-mysql-vs-sphinx/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
