Click to See Complete Forum and Search --> : Faster Queries
The Little Guy
10-01-2007, 02:14 PM
OK, how could I improve performance?
My first query gets all the results, so I know how many pages to have
My second query is to get the results that will be shown on the page.
<?php
// Query to get ALL valid results
$query_count = "SELECT title,content,URL,
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score1,
MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score2,
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score3
FROM $searchType WHERE MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE)";
$result_count = mysqli_query($db,$query_count);
$row = mysqli_fetch_array($result_count);
$totalrows = mysqli_num_rows($result_count);
$limitvalue = $page * $limit - ($limit);
$time_start = microtime(true);
// Same query as above, only it returns a limited number of results ($limit = 5 to 100)
$query = "SELECT title,content,URL,id,
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score1,
MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score2,
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score3
FROM $searchType WHERE MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE)
ORDER BY score1 DESC, score2 DESC, score3 DESC LIMIT $limitvalue, $limit";
?>
I don't know if it is just me, but It seems like this would take a while to query, if it needed to query 100,000,000+ rows in a database.
What would the fastest way to do this be?
valenok
10-01-2007, 04:40 PM
$query_count = "SELECT title,content,URL,
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score1,
MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score2,
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score3
FROM $searchType WHERE MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) OR
MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE)";
OU ..
Wow ..
All you need is to select just the id of the rows you need
or just use COUNT()
bokeh
10-01-2007, 04:42 PM
Do you really have one hundred million plus rows in you database? Why are you making this so complicated? Do you really need a boolean search.
Also your first query should be a count query and doesn't need to return a score as it is not going to be used anywhere so that needs deleting. The second query also seems unnecessarilly over complicated. The fulltext index on the table should include all columns in one index rather than having individual indexes on each column. As for the code you have posted, I would simplify it (without losing any functionality) to something like the following: <?php
// Query to get ALL valid results
$query_count = "
SELECT
COUNT(*) as totalrows
FROM $searchType
WHERE
MATCH(title, URL, content)
AGAINST ('$searchQuery' IN BOOLEAN MODE)";
// Same query as above, only it returns a limited number of results ($limit = 5 to 100)
$query = "
SELECT
title,
content,
URL,
id,
MATCH(title, URL, content)
AGAINST ('$searchQuery' IN BOOLEAN MODE) AS score
FROM $searchType
WHERE
MATCH(title, URL, content)
AGAINST ('$searchQuery' IN BOOLEAN MODE)
ORDER
BY score DESC LIMIT $limitvalue, $limit";
?>
The Little Guy
10-01-2007, 05:13 PM
will that keep an order of importance??
title is most important, URL is second, and content is least
Well... A search Engine is in the process, so... 100 mil could be approached.
So I need a way to search the DB almost instantly.
bokeh
10-01-2007, 05:52 PM
will that keep an order of importance??If that is important you should make the baising tunable. Maybe something like this in the SELECT clause:SELECT
title,
content,
URL,
id,
(MATCH(title)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 1.5) +
(MATCH(URL)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.8) +
(MATCH(content)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.5) AS score
FROM $searchType
WHERE
MATCH(title, URL, content)
AGAINST ('$searchQuery' IN BOOLEAN MODE)
ORDER
BY score DESC LIMIT $limitvalue, $limit
valenok
10-01-2007, 07:46 PM
Actually you even dont need the count query
if you arent planning output how many pages you have
if start > 0 you have previous page
now select PER_PAGE+1 and if you have mysql_num_rows = PER_PAGE+1 you have "next page"
The Little Guy
10-01-2007, 10:07 PM
Actually you even dont need the count query
if you arent planning output how many pages you have
if start > 0 you have previous page
now select PER_PAGE+1 and if you have mysql_num_rows = PER_PAGE+1 you have "next page"
The user can jump from the first to the last page.
http://secret.publicsize.com/search?q=free+car&btn=Web+Search&lan=en&sType=web
NightShift58
10-01-2007, 11:55 PM
I wouldn't use COUNT() in my "pre-query" but the SQL_CALC_FOUND_ROWS. I would also try to use SQL caching options so that the second execution of the query would take advantage of the first run, to the extent that it makes sense in your application.
The Little Guy
10-02-2007, 01:02 AM
I wouldn't use COUNT() in my "pre-query" but the SQL_CALC_FOUND_ROWS. I would also try to use SQL caching options so that the second execution of the query would take advantage of the first run, to the extent that it makes sense in your application.
how would I use SQL_CALC_FOUND_ROWS? I have never heard of it before.
NightShift58
10-02-2007, 01:23 AM
SELECT * SQL_CALC_FOUND_ROWS
FROM myTABLE
WHERE 1=1 AND 1<>2
The Little Guy
10-02-2007, 01:40 AM
I don't know if it is just me, but that seems to take some time....
it is taking about 1+ sec to run...
NightShift58
10-02-2007, 01:51 AM
What is taking 1+ seconds to run?
The Little Guy
10-02-2007, 01:55 AM
this:
$query_count = "SELECT SQL_CALC_FOUND_ROWS
title,
content,
URL,
id,
(MATCH(URL)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 1.5) +
(MATCH(title)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.8) +
(MATCH(content)
AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.5) AS score
FROM $searchType
WHERE
MATCH(title, URL, content)
AGAINST ('$searchQuery' IN BOOLEAN MODE)
AND content != ''
ORDER
BY score DESC LIMIT $limitvalue, $limit";
Depending on the number of records, this could be pretty fast.
The true test would be to run the same query using count(*) instead.
The Little Guy
10-02-2007, 02:03 AM
They are about the same, but SQL_CALC_FOUND_ROWS is about 1 or 2 milliseconds faster.
I'll stay with that, how can I increase the speed more do you think? Any way or ideas of how to make it about .05 seconds?
NightShift58
10-02-2007, 04:30 AM
I don't know about going from 1 sec to 0.05 sec... That's a little steep...
I don't know how your table really looks like so it would be presumptious of me to tell you this-or-that, not that I haven't been known to be that way...
You could reduce some of the overhead - this is straight out of the docs - by using identical MATCH...AGAINST both in the SELECT and WHERE clauses. As it stands now, they are being being called during the query. By ensuring that they are identical, you would presumably save one pass, which could add up to a big saving.
Also, I try to put my "K.O." criteria first in a WHERE clause, perhaps more out of habit than true need these days - in this case, content != ''
You can also do away with the sort as the MATCH...AGAINST in the WHERE clause will deliver the data in order of highest relevance first and the SORT BY becomes irrelevant.
It could look like this:
$query_count = "SELECT SQL_CALC_FOUND_ROWS
title,
content,
URL,
id,
(
(MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 1.5) +
(MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.8) +
(MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.5)
) AS score
FROM $searchType
WHERE content != ''
AND
(
(MATCH(URL) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 1.5) +
(MATCH(title) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.8) +
(MATCH(content) AGAINST ('$searchQuery' IN BOOLEAN MODE) * 0.5)
)
LIMIT $limitvalue, $limit";
bokeh
10-02-2007, 08:45 AM
You could reduce some of the overhead - this is straight out of the docs - by using identical MATCH...AGAINST both in the SELECT and WHERE clauses. As it stands now, they are being being called during the query. By ensuring that they are identical, you would presumably save one pass, which could add up to a big saving.Testing on a 35,000 row table finds this is not so. The identical query (SELECT and WHERE) runs slower than grouped MATCH query. Maybe the MATCH ... AGAINST in the SELECT clause is only run on the rows selected in the WHERE clause.
Also, I try to put my "K.O." criteria first in a WHERE clause, perhaps more out of habit than true need these days - in this case, content != ''This is making a test query run slower (marginally).
You can also do away with the sort as the MATCH...AGAINST in the WHERE clause will deliver the data in order of highest relevance first and the SORT BY becomes irrelevant.Testing says this is bogus. Ordering is completely different, ... and it's not just a case of the ASC/DESC argument inverting the results.
Appologies for not adding anything positive with this post.
The Little Guy
10-02-2007, 02:23 PM
Would It be faster to search 3 tables?
- One with the URL's
- One with the titles
- One with the content
bokeh
10-02-2007, 02:44 PM
Would It be faster to search 3 tables?I doubt it. Have you got a proper full text index on those 3 columns? I mean like this:FULLTEXT KEY `keyname` (`title`,`URL`,`content`)Rather than individual ones. If they are individual ones the query will run slow.
By the way, using Nighshift58's SQL_CALC_FOUND_ROWS method on my 35,000 row table returns an average result in around 0.09 seconds. This is on the first attempt before the query is cached by MySQL.
The Little Guy
10-02-2007, 03:25 PM
That full text really speed it up, it is now doing it in about .4 - .5 seconds
I had it individually full text
and not as one.. Thanks.
Any other performance ideas would be great!
bokeh
10-02-2007, 04:15 PM
You can also do away with the sort as the MATCH...AGAINST in the WHERE clause will deliver the data in order of highest relevance first and the SORT BY becomes irrelevant.Ok, I just want to revise what I posted above. This is true but only for a standard MATCH/AGAINST query, not a BOOLEAN one.
Any here's what I recommend trying. First run this:ALTER TABLE $searchType ADD FULLTEXT(title,URL,content)
I played with the query a bit and found the following query returned the exact same result set as the big slow BOOLEAN query. SELECT SQL_CALC_FOUND_ROWS
title,
content,
URL,
id
FROM $searchType
WHERE
MATCH(title,URL,content) AGAINST ('$searchQuery')
LIMIT $limitvalue, $limitThat query runs in 0.0005 seconds on my box on that 35,000 row table. If you want to use the boolean operators make it an option of your search, not the default and write a different query.
The Little Guy
10-02-2007, 05:26 PM
Nice, That does work. it is now between ~ .02 and ~ .1
I emailed one of my instructors, and he emailed me this back:
Hi, Ryan!
I'm glad you are going places with PHP and MySQL!
You have practically designed this query to be slow! You are doing full text
search, which is inherently slow. I haven't done much with full text searches
(because they are new in MySQL 5.x), but my understanding is that you can
only perform a full text search on a field that you marked in the table
definition, so any indexing the database can do to make the search faster is
already done.
Back in the "olden days," they used to tell us to put the most exclusive
elements of a where clause first, but with today's query optimizers, I doubt
this will matter, but you can try putting your "content != ''" before
the "match" in your where clause (I'll also add that I didn't know SQL was
okay with '!=', I thought it only allowed "<>", but since the query is
working, I'm clearly wrong about that.
You might also consider NOT storing blank strings, but using "NULL" values.
comparing to NULL is considerably faster than comparison with even an empty
string, but this can be a hassle, I know, because you have to change blanks
to NULLs. NULLs can be a pain.
The only other things I can think of are things to make the whole system
faster. Faster processor, more memory, software RAID bit-striping on the
database volume, etc.
When you think about what you are doing, 1-2 seconds for the search you are
performing is pretty damned good for a single, unclustered database server.
Any thoughts on it?
The Little Guy
10-02-2007, 05:53 PM
I was able to speed it up a bit more, instead of indexing individualy, I combined URL, and title to get it about between .01 and .002 seconds for searching for the search that returned the most results.
bokeh
10-03-2007, 02:26 AM
I haven't done much with full text searches
(because they are new in MySQL 5.x)As of MySQL 3.23.23, MySQL has had support for full-text indexing and searching.
MrCoder
10-03-2007, 05:28 AM
As of MySQL 3.23.23, MySQL has had support for full-text indexing and searching.
Maybe you should look at becoming a teacher Bokeh :)
svidgen
09-19-2008, 12:02 AM
If you're serious about creating a public search engine, you probably shouldn't be using MySQL's built-in FULLTEXT functionality. It's pretty slow and limited compared to the alternatives. You're probably even better off just creating your own "FULLTEXT" index table.
High Performance MySQL recommends using something specialized like Lucene (http://lucene.apache.org/) or Sphinx (http://sphinxsearch.com/). I'm pretty sure Sphinx actually uses MySQL with a custom "FULLTEXT" indexing table as a backend.
In any case, I hope that helps.
webdeveloper.com
Copyright Internet.com Inc., All Rights Reserved.