So what do you do when you have 240 Million profiles and you want to search on first, last, middle names + locations + tags ?
And you are stuck with MySQL cause Oracle is way out of your budget?
Well – give sphinx a try!
I know that sphinx is more of a text search / stats engine but with a little trial and error and some sneakiness you can achieve some cool results, so here we go:
Testing Sphinx 0.9.8-release (r1371)
Preliminary results (sphinx):
STAGE 1:
[root@web01 /]# search –config /usr/local/etc/sphinx.conf -q -a real -l 5 -s “results desc” -i test2
Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file ‘/usr/local/etc/sphinx.conf’…
index ‘test2′: query ‘real ‘: returned 1000 matches of 34358 total in 0.039 sec
displaying matches:
1. document=32792, weight=1, results=827830, date_added=Wed Jun 6 11:23:14 2007
2. document=3376336, weight=1, results=4302, date_added=Tue Aug 5 10:59:12 2008
3. document=18765, weight=1, results=3586, date_added=Tue Jun 5 21:42:00 2007
4. document=772274, weight=1, results=3122, date_added=Wed May 7 21:13:59 2008
5. document=113495, weight=1, results=1215, date_added=Tue Sep 18 07:31:07 2007
words:
1. ‘real’: 34358 documents, 34402 hits
(Mysql 5.0.37):
mysql> select id from employers where name like ‘%real %’ order by results desc limit 5;
+———+
| id |
+———+
| 32792 |
| 3376336 |
| 18765 |
| 772274 |
| 113495 |
+———+
5 rows in set (7.57 sec)
Umn … lets see… I’d go with SPHINX, want more ?
Here it is:
STAGE 2:
Goal:
a. Count how many profiles are there in “Terre Haute” city_id 1906883 and has WORK TAGS (Strategic Supply Chain Manager, Strategic Supply Chain Management 5 inc., Strategic Supply Manager, strategic supply)
b. Get first 20 profile ids order by links descLets start with good old MySQL 5.0.37 :
a.
mysql> SELECT COUNT(1) AS cnt FROM region_global_tags_222 AS pgt JOIN search_city_3837 AS loct USING(profile_id) WHERE ((pgt.global_tag_type=”employer” && pgt.global_tag_id IN (223209,141623,4561058,331778))) AND pgt.region_id=3837 AND loct.city_id=1906883;+—–+
| cnt |
+—–+
| 1 |
+—–+
1 row in set (2 min 7.14 sec)
b.
mysql> SELECT profile_id AS cnt FROM region_global_tags_222 AS pgt JOIN search_city_3837 AS loct USING(profile_id) WHERE ((pgt.global_tag_type=”employer” && pgt.global_tag_id IN (223209,141623,4561058,331778))) AND pgt.region_id=3837 AND loct.city_id=1906883 order by pgt.links desc limit 20;
+———–+
| cnt |
+———–+
| 147563021 |
+———–+
1 row in set (3 min 4.47 sec)
TOTAL: 5 MINUTES and 11.51 SECONDS !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Lets try with SPHINX:
[root@web01 pavel]# search –config /usr/local/etc/sphinx.conf -e ‘@global_tag_type employer @global_tag_id (223209|141623|4561058|331778) @city_id 1906883′ -l 200 -s “links desc” -i tags_region_3837 Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff
using config file ‘/usr/local/etc/sphinx.conf’…
index ‘tags_region_3837′: query ‘@global_tag_type employer @global_tag_id (223209|141623|4561058|331778) @city_id 1906883 ‘: returned 1 matches of 1 total in 0.014 sec
0.014 SECONDS vs. 5 MINUTES and 11.51 SECONDS
This is one of the uses of Sphinx that can increase your performance and decrease some DB load
Comments ?