www.webdeveloper.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 16

Thread: Complex MySQL searches

  1. #1
    Join Date
    Aug 2007
    Location
    London
    Posts
    410

    Question Complex MySQL searches

    Hi all,

    I have quite a complex series of n:n relationships in my database which I want to be able to use as criteria. I'm fairly sure I can work this out with CakePHP but I also want to be able to perform a text search on the "description" type field (probably using something like Sphinx).

    What's the best way to do this?

  2. #2
    Join Date
    Aug 2007
    Location
    London
    Posts
    410

  3. #3
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,361
    Quote Originally Posted by blue-eye-labs View Post
    Anyone?
    Afraid I have no idea what "Sphinx" is or how it would apply. I'm halfway decent with SQL and mutli-table JOINs, but there's not nearly enough info here so far for me to have any idea what to suggest on that front (other than to read up on joins if that's something new to you -- though if my recollection of your other posts is correct, I doubt that's the issue. ).
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  4. #4
    Join Date
    Aug 2007
    Location
    London
    Posts
    410
    Sphinx is an indexing/search daemon for databases (incl. MySQL) http://www.sphinxsearch.com/.

    In terms of the problem, I'm already using JOINs because of all the n:n relationships involved but in addition to searching based on links between different models, I want to be able to search the "description" field of a record which can be up to 5000 chars long. The problem here is obviously that simple LIKE %query% conditions aren't very good here so I need a more powerful sort of fulltext algorithm. I just don't think that InnoDB supports full-text searching.

    Does that make more sense? I want to be able to search using both relationships and by the text in the description field simultaneously as filters.

    A good analogy might be if one were to have a forum (like this one) and wanted to search it using a filter both for the users who have posted in it and the content of the threads.

  5. #5
    Join Date
    Jan 2009
    Posts
    3,346
    One solution is to move/copy the larger text searchable field to an additional table using the alternate MyISAM storage engine. You'll need to manage on cascade manually for that additional table but then you'll get the needed features from both.

  6. #6
    Join Date
    Aug 2007
    Location
    London
    Posts
    410
    Quote Originally Posted by criterion9 View Post
    One solution is to move/copy the larger text searchable field to an additional table using the alternate MyISAM storage engine. You'll need to manage on cascade manually for that additional table but then you'll get the needed features from both.
    I think this will be the most desperate solution. I'd like to find an alternative to it because I think it's the worst option (but not one I'd thought of, so thanks).

    By looking around the net I think that Sphinx is probably my best option.

  7. #7
    Join Date
    Nov 2008
    Posts
    2,477
    Criterion's solution is the one I use. I just replicate the search data to a MyISAM table on save. Why do you say it's a bad solution? It's fairly standard to store fulltext search data in a MyISAM table and use something (imo) better like InnoDB for the transaction support etc.

    I wrote a small FullTextSearchable ModelBehavior if it's of any use:

    PHP Code:
    <?php

    /**
     * FulltextSearchableBehavior
     *
     * Allows a fulltext search to be performed against the Model.
     *
     * This behaviour allows a model to be searched using fulltext boolean searches.
     * By default, this takes place by adding an element to the $options array passed
     * to find which contains the query string. By default, this option is named 'fulltext'.
     * The presence of this key indicated that a search should take place, and only
     * data which matches the query string will be returned.
     *
     * There are some restrictions to this behaviour, primarily that fulltext search
     * is only available for models which use a MySQL database and MyISAM table.
     * To avoid slow queries, it is also a very good idea to build FULLTEXT indeces on
     * the searchable fields. The following SQL can be used for this:
     *
     * ALTER TABLE table_name ADD FULLTEXT(field)
     *
     * The searchable fields should be passed in to the behaviour configuration as
     * an array, with a integer weighting value as key, and the field name as value.
     * The weighting value is used to give precedence to certain fields. For example,
     * finding a keyword in a title field is usually more significant that finding
     * in a content field. It is suggested to use factors of 10 for weighting keys,
     * for example 1, 10, 100, 1000 etc.
     *
     * To fine tune the search, you need to edit the mysql config file /etc/mysql/my.conf.
     * The following keys under the [mysqld] section are useful:
     *
     * ft_min_word_len
     * ft_stopword_file
     *
     * ft_min_word_len is the minimum length a word has to be before it will be indexed,
     * and therefor included in search results. By default this is 4 characters, but
     * this has been lowered to 3 for the purposes of the CMS.
     * ft_stopword_file contains the path to a file containing words which should be
     * excluded from the search process. For example 'and' should usually be ignored.
     * If this option is not set, mysql uses a built in stopwords list. For the purposes
     * of the CMS, the list has been disabled altogether, so nothing is excluded. After
     * changing either of these settings, mysql should be restared, and the following
     * query run to rebuild the search indeces:
     *
     * REPAIR TABLE table_name QUICK
     *
     * The following operators are accepted in the query string:
     *     +    The word is mandatory in all rows returned.
     *     -    The word cannot appear in any row returned.
     *     <    The word that follows has a lower relevance than other words, although
     *             rows containing it will still match
     *     >    The word that follows has a higher relevance than other words.
     *     ()    Used to group words into subexpressions.
     *     ~    The word following contributes negatively to the relevance of the row (which
     *             is different to the '-' operator, which specifically excludes the word, or
     *             the '<' operator, which still causes the word to contribute positively to
     *             the relevance of the row.
     *     *    The wildcard, indicating zero or more characters. It can only appear at the
     *             end of a word.
     *     "    Anything enclosed in the double quotes is taken as a whole (so you can
     *             match phrases, for example).
     * 
     *
     */


    class FulltextSearchableBehavior extends ModelBehavior {

        private 
    $_queryKey 'fulltext';

        public function 
    setup($model$config = array()) {
            if(!empty(
    $config)) { 
                if (
    array_key_exists('fields'$config)) {
                    
    $this->settings[$model->alias]['fulltext_fields'] = (array) $config['fields'];
                }
            }
        }
        
        
    /**
         * Inspects the search config looking for a key called $this->_queryKey. If set,
         * the value of the element with this key is taken to be the search query string,
         * and the search is applied as part of the query.
         *
         * @param object $model A reference to the model to which this behaviour is attached.
         * @param mixed $config The $options array as passed to the Model::find method.
         * @return mixed The modified $options array which now includes the relevant info
         *     for performing the search.
         */
        
    public function beforeFind($model$config) {
            
            if (empty(
    $this->settings[$model->alias]['fulltext_fields'])) {
                
    // no searchable fields defined
                
    return $config;
            }
            if (!
    array_key_exists($this->_queryKey$config)) {
                
    // no query string provided
                
    return $config;
            }
            
            foreach (
    $this->settings[$model->alias]['fulltext_fields'] as $weight => $field) {
                
    $rating[] = sprintf(
                    
    "(MATCH (`%s`.`%s`) AGAINST ('%s' IN BOOLEAN MODE) * %d)"
                    
    $model->alias$field$config[$this->_queryKey], $weight
                
    );
                
    $whereFields[] = sprintf("`%s`.`%s`"$model->alias$field);
            }
            
            
    $config['fields'] = (array) $config['fields'];
            if (empty(
    $config['fields'])) {
                
    array_unshift($config['fields'], '*');
            }
            
    array_push($config['fields'], '(' implode(' + '$rating) . ') AS `rating`');
            
            
    $config['conditions'] = (array) $config['conditions'];
            
    array_unshift(
                
    $config['conditions'],
                
    sprintf(
                    
    "MATCH (%s) AGAINST ('%s' IN BOOLEAN MODE)",
                    
    implode(','$whereFields), $config[$this->_queryKey]
                )
            );
            
            
    $config['order'] = (array) $config['order'];
            
    array_unshift($config['order'], '`rating` DESC');
            
            return 
    $config;
            
        }

    }
    ?>
    Usage is pretty simple:

    PHP Code:
        public $actsAs = array(
            
    'FulltextSearchable' => array(
                
    'fields' => array(
                    
    100 => 'title',
                    
    10 => 'description',
                    
    => 'notes'
                
    )
            )
        ); 
    I add the following to AppController to handle searches automatically (though this is not necessary to use the behavior:

    PHP Code:
        function beforeFilter() {
            if (
    $this->_userSearched()) {
                
    $this->__handleSearch();
            } else {
                
    $this->set('searched'false);
            }
        }
        
        
    /**
         * Provides a globally available search mechanism. Any view can build a form
         * which includes a hidden Search.action field and a Search.query textbox. When
         * submitted, this method picks up the data and redirects to the specified action
         * (or the referring action if none specified, or index of the referring action can't
         * be determined). The redirection causes the __handleSearch method to be called.
         */
        
    function search() {
            
            
    // if no redirect action was specified, use the referring action
            
    if (!isset($this->data['Search']['action'])) {
                
    $route Router::parse($this->referer());
                if (!empty(
    $route['action'])) {
                    
    $this->data['Search']['action'] = $route['action'];
                } else {
                    
    $this->data['Search']['action'] = 'index';
                }
            }
            
            
    $url['action'] = $this->data['Search']['action'];
            
            if (isset(
    $this->data['Search']['passedParams'])) {
                foreach (
    $this->data['Search']['passedParams'] as $param) {
                    
    $url[] = $param;
                }
            }
            
            
    $url['search'] = $this->data['Search']['keywords'];
            
            
    $this->redirect($urlnulltrue);
            
        }
        
        protected function 
    _preHandleSearch() {
            
    $this->data['Search']['keywords'] = $this->passedArgs['search'];
            
    $this->set('searched'$this->data['Search']['keywords']);
        }

        function 
    _userSearched() {
            return isset(
    $this->passedArgs['search']);
        }
        
        
    /**
         * Handles the search performed by the user.
         *
         * By default, this method handles full text search, and simple searches for
         * the model's id. If the search is to be performed
         * via FullTextSearchable behaviour, nothing more is to be done (aside from configure
         * the behaviour in the relevant model). However this method can also be overridden
         * in any controller in order to implement a custom search. The overriding
         * method should populate the paginate property for the particular searched model.
         * The overriding method should also be sure to call the _preHandleSearch()
         * method of app_controller. 
         */
        
    function __handleSearch() {
            
            
    $this->_preHandleSearch();
            
            if (
    is_numeric($this->passedArgs['search'])) {
                
    // assume the search was for an id
                
    $search = array(
                    
    'conditions' => array(
                        
    $this->{$this->modelClass}->alias '.id' => $this->passedArgs['search']
                    )
                );
            } else {
                
    // assume this is to be a fulltext search
                
    $search = array(
                    
    'fulltext' => $this->passedArgs['search']
                );
            }
            
            
    $this->paginate[$this->{$this->modelClass}->alias] =
                
    array_merge_recursive($this->paginate[$this->{$this->modelClass}->alias], $search);
              
        } 
    The first rule of Tautology Club is the first rule of Tautology Club.

  8. #8
    Join Date
    Aug 2007
    Location
    London
    Posts
    410

    Question

    @Mindzai:

    I called it a "bad" solution a) because of the need to synchronise data across two tables (which seems, to me, like an unnecessary overhead and potential pain if something goes awry) and b) because I read over at the mysql performance blog about memory allocation issues when using both myisam and innodb.

    I'm glad to see that you've already written the behaviour functionality so I don't have to (should I choose this method) but I'm going to look at Sphinx as a more viable option (unless MyISAM fulltext searching is actually really good as a search engine?) Sphinx is supposed to be pretty good.

  9. #9
    Join Date
    Jan 2009
    Posts
    3,346
    As far as I know Sphinx does essentially the same thing as my solution it just compartmentalizes the data duplication/separation so you don't have to do the dirty work yourself.

  10. #10
    Join Date
    Aug 2007
    Location
    London
    Posts
    410
    Quote Originally Posted by criterion9 View Post
    As far as I know Sphinx does essentially the same thing as my solution it just compartmentalizes the data duplication/separation so you don't have to do the dirty work yourself.
    And the thing about mem alloc issues with MyISAM + InnoDB? Or is this actually fallacious? I wasn't sure.

  11. #11
    Join Date
    Jan 2009
    Posts
    3,346
    Quote Originally Posted by blue-eye-labs View Post
    And the thing about mem alloc issues with MyISAM + InnoDB? Or is this actually fallacious? I wasn't sure.
    I'm pretty sure those issues depend on the type of queries. If you limit each call to a single engine you shouldn't have any memory allocation issues (though I'm not certain those old issues are still relevant in the latest stable releases). It may mean an additional DB call or two if you are trying to combine tables together using very complex queries (though those kinds of queries could cause issues regardless of the engine type).

  12. #12
    Join Date
    Aug 2007
    Location
    London
    Posts
    410
    Quote Originally Posted by criterion9 View Post
    I'm pretty sure those issues depend on the type of queries. If you limit each call to a single engine you shouldn't have any memory allocation issues (though I'm not certain those old issues are still relevant in the latest stable releases). It may mean an additional DB call or two if you are trying to combine tables together using very complex queries (though those kinds of queries could cause issues regardless of the engine type).
    Okay, thanks. I only really envision my queries using a fulltext search on the descriptions table along with a join to the main table and conditions involving related n:n tables. Shouldn't be too complex to be honest.

    Just a quick (if late) fact check. It is full text searching I want to get search engine like functionality isn't it?

  13. #13
    Join Date
    Jan 2009
    Posts
    3,346
    Quote Originally Posted by blue-eye-labs View Post
    Okay, thanks. I only really envision my queries using a fulltext search on the descriptions table along with a join to the main table and conditions involving related n:n tables. Shouldn't be too complex to be honest.

    Just a quick (if late) fact check. It is full text searching I want to get search engine like functionality isn't it?
    I use Lucene for searches myself, but if you are planning to use MySQL by itself then yes full text is probably what you are looking for (as it is very much superior to "like %keyword%" searches).

  14. #14
    Join Date
    Aug 2007
    Location
    London
    Posts
    410

    Question

    Quote Originally Posted by criterion9 View Post
    I use Lucene for searches myself, but if you are planning to use MySQL by itself then yes full text is probably what you are looking for (as it is very much superior to "like %keyword%" searches).
    Lucene and Sphinx are fairly equivalent, no? The difference being that Lucene is Java and Sphinx is in a C derivative which should make it faster shouldn't it? Or at least hog less memory?

  15. #15
    Join Date
    Jan 2009
    Posts
    3,346
    You can use Lucene with Zend Framework. It doesn't have to be Java.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center



Recent Articles