PgSearch full-text search with GIN
prerequisites
- install PgSearch gem
gem 'pg_search'
- If there is exist data before install pg_search gem, Insert data to pg_search_documents Table
$ rails console > PgSearch::Multisearch.rebuild(YourModel, clean_up: true)
- seed file for large data
$ rails console 200000.times do FactoryBot.create :your_model end
Before GIN(Generalized Inverted Index)
"Execution Time: 7238.816 ms"
> ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{PgSearch.multisearch('Diablo').to_sql}").values (7342.5ms) EXPLAIN ANALYZE SELECT "pg_search_documents".* FROM "pg_search_documents" INNER JOIN (SELECT "pg_search_documents"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))), (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')), 0)) AS rank FROM "pg_search_documents" WHERE ((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')))) AS pg_search_ce9b9dd18c5c0023f2116f ON "pg_search_documents"."id" = pg_search_ce9b9dd18c5c0023f2116f.pg_search_id ORDER BY pg_search_ce9b9dd18c5c0023f2116f.rank DESC, "pg_search_documents"."id" ASC => [["Gather Merge (cost=117241.98..117645.20 rows=3456 width=126) (actual time=7011.033..7063.181 rows=3053 loops=1)"], [" Workers Planned: 2"], [" Workers Launched: 2"], [" -> Sort (cost=116241.95..116246.27 rows=1728 width=126) (actual time=6951.903..6960.173 rows=1018 loops=3)"], [" Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE(pg_search_documents_1.content, ''::text)), '''diablo'''::tsquery, 0)) DESC, pg_search_documents.id"], [" Sort Method: quicksort Memory: 537kB"], [" Worker 0: Sort Method: quicksort Memory: 584kB"], [" Worker 1: Sort Method: quicksort Memory: 551kB"], [" -> Nested Loop (cost=0.42..116149.03 rows=1728 width=126) (actual time=57.831..6938.574 rows=1018 loops=3)"], [" -> Parallel Seq Scan on pg_search_documents pg_search_documents_1 (cost=0.00..106474.33 rows=1728 width=89) (actual time=55.811..6794.069 rows=1018 loops=3)"], [" Filter: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"], [" Rows Removed by Filter: 72317"], [" -> Index Scan using pg_search_documents_pkey on pg_search_documents (cost=0.42..5.35 rows=1 width=122) (actual time=0.026..0.026 rows=1 loops=3053)"], [" Index Cond: (id = pg_search_documents_1.id)"], ["Planning Time: 102.483 ms"], ["JIT:"], [" Functions: 24"], [" Options: Inlining false, Optimization false, Expressions true, Deforming true"], [" Timing: Generation 18.007 ms, Inlining 0.000 ms, Optimization 11.028 ms, Emission 79.955 ms, Total 108.990 ms"], ["Execution Time: 7238.816 ms"]]
After GIN(Generalized Inverted Index)
"Execution Time: 294.301 ms"
> ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{PgSearch.multisearch('Diablo').to_sql}").values (296.3ms) EXPLAIN ANALYZE SELECT "pg_search_documents".* FROM "pg_search_documents" INNER JOIN (SELECT "pg_search_documents"."id" AS pg_search_id, (ts_rank((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))), (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')), 0)) AS rank FROM "pg_search_documents" WHERE ((to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'Diablo' || ' ''')))) AS pg_search_ce9b9dd18c5c0023f2116f ON "pg_search_documents"."id" = pg_search_ce9b9dd18c5c0023f2116f.pg_search_id ORDER BY pg_search_ce9b9dd18c5c0023f2116f.rank DESC, "pg_search_documents"."id" ASC => [["Gather Merge (cost=9465.68..9540.09 rows=647 width=394) (actual time=214.256..268.902 rows=3053 loops=1)"], [" Workers Planned: 1"], [" Workers Launched: 1"], [" -> Sort (cost=8465.67..8467.29 rows=647 width=394) (actual time=205.182..217.448 rows=1526 loops=2)"], [" Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE(pg_search_documents_1.content, ''::text)), '''diablo'''::tsquery, 0)) DESC, pg_search_documents.id"], [" Sort Method: quicksort Memory: 871kB"], [" Worker 0: Sort Method: quicksort Memory: 800kB"], [" -> Nested Loop (cost=28.95..8435.47 rows=647 width=394) (actual time=1.086..190.694 rows=1526 loops=2)"], [" -> Parallel Bitmap Heap Scan on pg_search_documents pg_search_documents_1 (cost=28.53..3622.37 rows=647 width=357) (actual time=0.781..18.373 rows=1526 loops=2)"], [" Recheck Cond: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"], [" Heap Blocks: exact=1402"], [" -> Bitmap Index Scan on index_pg_search_documents_on_content (cost=0.00..28.25 rows=1100 width=0) (actual time=1.038..1.056 rows=3053 loops=1)"], [" Index Cond: (to_tsvector('simple'::regconfig, COALESCE(content, ''::text)) @@ '''diablo'''::tsquery)"], [" -> Index Scan using pg_search_documents_pkey on pg_search_documents (cost=0.42..7.19 rows=1 width=390) (actual time=0.012..0.012 rows=1 loops=3053)"], [" Index Cond: (id = pg_search_documents_1.id)"], ["Planning Time: 0.618 ms"], ["Execution Time: 294.301 ms"]]
How to make pg_search_document index with GIN (Generalized Inverted Index)
% docker-compose run --no-deps web rails generate migration add_index_to_documents_content class AddIndexToDocumentsContent < ActiveRecord::Migration[6.1] def change add_index :pg_search_documents, %[to_tsvector('simple', coalesce("pg_search_documents"."content"::text, ''))], using: :gin, name: "index_pg_search_documents_on_content" end end % docker-compose run --no-deps web rails db:migrate
Check pg_search_documents index
you can see "index_pg_search_documents_on_content" that is new generated index with GIN
% docker-compose run --no-deps web rails console > ActiveRecord::Base.connection.indexes(:pg_search_documents) >=> [#<ActiveRecord::ConnectionAdapters::IndexDefinition:0x000056053f9e1c00 @columns="to_tsvector('simple'::regconfig, COALESCE(content, ''::text))", @comment=nil, @lengths={}, @name="index_pg_search_documents_on_content", @opclasses={}, @orders={}, @table=:pg_search_documents, @type=nil, @unique=false, @using=:gin, @where=nil>, #<ActiveRecord::ConnectionAdapters::IndexDefinition:0x000056053f9e12a0 @columns=["searchable_type", "searchable_id"], @comment=nil, @lengths={}, @name="index_pg_search_documents_on_searchable", @opclasses={}, @orders={}, @table=:pg_search_documents, @type=nil, @unique=false, @using=:btree, @where=nil>]
https://github.com/Casecommons/pg_search/wiki/Building-indexes#how-to-create-a-model-scope-and-use-multisearch-index