Hello World

Be Happy!

PgSearch full-text search with GIN


prerequisites

  1. install PgSearch gem
    1. gem 'pg_search'
  2. If there is exist data before install pg_search gem, Insert data to pg_search_documents Table
    1. $ rails console
      > PgSearch::Multisearch.rebuild(YourModel, clean_up: true)
  3. seed file for large data
    1. $ rails console
      200000.times do
        FactoryBot.create :your_model
      end

Before GIN(Generalized Inverted Index) 

"Execution Time: 7238.816 ms"
> ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{PgSearch.multisearch('Diablo').to_sql}").values
   (7342.5ms)  EXPLAIN ANALYZE SELECT "pg_search_documents".* FROM "pg_search_documents" INNER JOIN (SELECT "pg_search_documents"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))), (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')), 0)) AS rank FROM "pg_search_documents" WHERE ((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')))) AS pg_search_ce9b9dd18c5c0023f2116f ON "pg_search_documents"."id" = pg_search_ce9b9dd18c5c0023f2116f.pg_search_id ORDER BY pg_search_ce9b9dd18c5c0023f2116f.rank DESC, "pg_search_documents"."id" ASC
=> 
[["Gather Merge  (cost=117241.98..117645.20 rows=3456 width=126) (actual time=7011.033..7063.181 rows=3053 loops=1)"],
 ["  Workers Planned: 2"],
 ["  Workers Launched: 2"],
 ["  ->  Sort  (cost=116241.95..116246.27 rows=1728 width=126) (actual time=6951.903..6960.173 rows=1018 loops=3)"],
 ["        Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE(pg_search_documents_1.content, ''::text)), '''diablo'''::tsquery, 0)) DESC, pg_search_documents.id"],
 ["        Sort Method: quicksort  Memory: 537kB"],
 ["        Worker 0:  Sort Method: quicksort  Memory: 584kB"],
 ["        Worker 1:  Sort Method: quicksort  Memory: 551kB"],
 ["        ->  Nested Loop  (cost=0.42..116149.03 rows=1728 width=126) (actual time=57.831..6938.574 rows=1018 loops=3)"],
 ["              ->  Parallel Seq Scan on pg_search_documents pg_search_documents_1  (cost=0.00..106474.33 rows=1728 width=89) (actual time=55.811..6794.069 rows=1018 loops=3)"],
 ["                    Filter: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"],
 ["                    Rows Removed by Filter: 72317"],
 ["              ->  Index Scan using pg_search_documents_pkey on pg_search_documents  (cost=0.42..5.35 rows=1 width=122) (actual time=0.026..0.026 rows=1 loops=3053)"],
 ["                    Index Cond: (id = pg_search_documents_1.id)"],
 ["Planning Time: 102.483 ms"],
 ["JIT:"],
 ["  Functions: 24"],
 ["  Options: Inlining false, Optimization false, Expressions true, Deforming true"],
 ["  Timing: Generation 18.007 ms, Inlining 0.000 ms, Optimization 11.028 ms, Emission 79.955 ms, Total 108.990 ms"],
 ["Execution Time: 7238.816 ms"]]

After GIN(Generalized Inverted Index) 

"Execution Time: 294.301 ms"
> ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{PgSearch.multisearch('Diablo').to_sql}").values
   (296.3ms)  EXPLAIN ANALYZE SELECT "pg_search_documents".* FROM "pg_search_documents" INNER JOIN (SELECT "pg_search_documents"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))), (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')), 0)) AS rank FROM "pg_search_documents" WHERE ((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')))) AS pg_search_ce9b9dd18c5c0023f2116f ON "pg_search_documents"."id" = pg_search_ce9b9dd18c5c0023f2116f.pg_search_id ORDER BY pg_search_ce9b9dd18c5c0023f2116f.rank DESC, "pg_search_documents"."id" ASC
=> 
[["Gather Merge  (cost=9465.68..9540.09 rows=647 width=394) (actual time=214.256..268.902 rows=3053 loops=1)"],
 ["  Workers Planned: 1"],
 ["  Workers Launched: 1"],
 ["  ->  Sort  (cost=8465.67..8467.29 rows=647 width=394) (actual time=205.182..217.448 rows=1526 loops=2)"],
 ["        Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE(pg_search_documents_1.content, ''::text)), '''diablo'''::tsquery, 0)) DESC, pg_search_documents.id"],
 ["        Sort Method: quicksort  Memory: 871kB"],
 ["        Worker 0:  Sort Method: quicksort  Memory: 800kB"],
 ["        ->  Nested Loop  (cost=28.95..8435.47 rows=647 width=394) (actual time=1.086..190.694 rows=1526 loops=2)"],
 ["              ->  Parallel Bitmap Heap Scan on pg_search_documents pg_search_documents_1  (cost=28.53..3622.37 rows=647 width=357) (actual time=0.781..18.373 rows=1526 loops=2)"],
 ["                    Recheck Cond: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"],
 ["                    Heap Blocks: exact=1402"],
 ["                    ->  Bitmap Index Scan on index_pg_search_documents_on_content  (cost=0.00..28.25 rows=1100 width=0) (actual time=1.038..1.056 rows=3053 loops=1)"],
 ["                          Index Cond: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"],
 ["              ->  Index Scan using pg_search_documents_pkey on pg_search_documents  (cost=0.42..7.19 rows=1 width=390) (actual time=0.012..0.012 rows=1 loops=3053)"],
 ["                    Index Cond: (id = pg_search_documents_1.id)"],
 ["Planning Time: 0.618 ms"],
 ["Execution Time: 294.301 ms"]]

How to make pg_search_document index with GIN (Generalized Inverted Index)

% docker-compose run --no-deps web rails generate migration add_index_to_documents_content
class AddIndexToDocumentsContent < ActiveRecord::Migration[6.1]
  def change
    add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content"
  end
end
% docker-compose run --no-deps web rails db:migrate

Check pg_search_documents index

you can see "index_pg_search_documents_on_content" that is new generated index with GIN
% docker-compose run --no-deps web rails console
> ActiveRecord::Base.connection.indexes(:pg_search_documents)
>=> 
 [#<ActiveRecord::ConnectionAdapters::IndexDefinition:0x000056053f9e1c00
   @columns="to_tsvector('simple'::regconfig, COALESCE(content, ''::text))",
   @comment=nil,
   @lengths={},
   @name="index_pg_search_documents_on_content",
   @opclasses={},
   @orders={},
   @table=:pg_search_documents,
   @type=nil,
   @unique=false,
   @using=:gin,
   @where=nil>,
  #<ActiveRecord::ConnectionAdapters::IndexDefinition:0x000056053f9e12a0
   @columns=["searchable_type", "searchable_id"],
   @comment=nil,
   @lengths={},
   @name="index_pg_search_documents_on_searchable",
   @opclasses={},
   @orders={},
   @table=:pg_search_documents,
   @type=nil,
   @unique=false,
   @using=:btree,
   @where=nil>]  
   

https://github.com/Casecommons/pg_search/wiki/Building-indexes#how-to-create-a-model-scope-and-use-multisearch-index
#postgresql (5) #gin (1) #inverted_index (1) #search (1)
List