Field Types in OpenSearch or Elasticsearch
> Tell me about field types in OpenSearch or Elasticsearch.
The default type is text. If I change it to keyword, is there a benefit?
⏺ In OpenSearch/Elasticsearch, field types have important differences:
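text (default for strings under dynamic mapping, usually with a .keyword sub-field):
- Analyzed and tokenized for full-text search
- Matches individual words; supports phrase and fuzzy queries
- Extra index overhead for term positions and norms
- Not usable for sorting or aggregations by default (fielddata is disabled)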
keyword:
- Stored as-is, not analyzed
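- Matched against the whole value (term, prefix, wildcard queries); no word-level matching
- Often less index overhead than text, since no positions or norms are kept
- Efficient filtering, sorting, and aggregations through doc values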
- Better for structured data (IDs, statuses, categories)
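To make the analyzed vs. not-analyzed difference concrete, here is a small Python sketch. It is only a rough stand-in: `analyze_text` is a hypothetical helper that lowercases and splits on non-alphanumeric characters, which approximates but does not reproduce the real standard analyzer.

```python
import re


def analyze_text(value: str) -> list[str]:
    """Rough approximation of a text field: lowercase, split into word tokens."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def analyze_keyword(value: str) -> list[str]:
    """A keyword field indexes the whole value as one term, unchanged."""
    return [value]


value = "GET /api/v1/users failed"
text_terms = analyze_text(value)
keyword_terms = analyze_keyword(value)

print(text_terms)
print(keyword_terms)

# A lookup for the single word "users" can match the text terms,
# but the keyword field only matches the complete original string.
print("users" in text_terms, "users" in keyword_terms)
print(value in keyword_terms)
```

This is why a word search works on text, while keyword needs the full value.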
Benefits of keyword type:
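- Performance: Exact-match term filters skip analysis and run cheaply in filter context
- Aggregations: Works with terms aggregations and cardinality counts out of the box
- Storage: Often uses less disk space than an analyzed text field, depending on the data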
- Sorting: Efficient sorting operations
Use keyword for:
- IDs, usernames, API keys
- Status codes, categories
- URLs, file paths
- Any field you filter/sort by exactly
Use text for:
- Log messages, descriptions
- Content that needs full-text search
- Fields where you search for individual words or phrases inside the value
For your API logging, consider keyword type for fields like user_id,
app_name, status_code, endpoint.
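As a sketch, the mapping for those fields could be built like this in Python. The `message` field and its `raw` sub-field are assumptions added for illustration; they show a text field that also keeps a keyword copy for exact filtering.

```python
import json

# Fields from the suggestion above, mapped as keyword for exact filtering.
keyword_fields = ["user_id", "app_name", "status_code", "endpoint"]

mapping = {
    "mappings": {
        "properties": {name: {"type": "keyword"} for name in keyword_fields}
    }
}

# Hypothetical free-text field: searchable as text, with a keyword
# sub-field (message.raw) for exact matches, sorting, and aggregations.
mapping["mappings"]["properties"]["message"] = {
    "type": "text",
    "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
}

print(json.dumps(mapping, indent=2))
```

The printed JSON is the body you would send when creating the index. Note that the type of an existing field cannot be changed in place, so switching text to keyword normally means creating a new index and reindexing.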
========================================================
Numeric Types:
- long, integer, short, byte - Integer numbers
- double, float, half_float - Floating point numbers
- scaled_float - Decimal numbers with scaling factor
Date Types:
- date - Timestamps and dates
- date_nanos - High precision timestamps
Boolean:
- boolean - true/false values
Binary:
- binary - Base64 encoded binary data
Range Types:
- integer_range, float_range, long_range, double_range, date_range,
ip_range
Complex Types:
- object - JSON objects with sub-fields (arrays of objects are flattened per field)
- nested - Array of objects with independent querying
- flattened - Entire JSON object indexed as a single field of keyword values (flat_object in OpenSearch)
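The object vs. nested difference is easier to see with a toy model. The Python below is not how the engine is implemented; it only imitates the behavior, and the `headers` data and helper names are made up for the example.

```python
def flatten_object_array(field: str, items: list[dict]) -> dict[str, list]:
    """Imitate the object type: each sub-field becomes one multi-valued field."""
    flat: dict[str, list] = {}
    for item in items:
        for key, val in item.items():
            flat.setdefault(f"{field}.{key}", []).append(val)
    return flat


def matches_flat(flat: dict[str, list], field: str, criteria: dict) -> bool:
    """Each condition is checked against the pooled values of its sub-field."""
    return all(v in flat.get(f"{field}.{k}", []) for k, v in criteria.items())


def matches_nested(items: list[dict], criteria: dict) -> bool:
    """Imitate nested: all conditions must hold inside the same object."""
    return any(all(item.get(k) == v for k, v in criteria.items()) for item in items)


headers = [
    {"name": "Accept", "value": "text/html"},
    {"name": "Host", "value": "example.com"},
]
# No single header has this name AND this value.
criteria = {"name": "Accept", "value": "example.com"}

flat = flatten_object_array("headers", headers)
print(matches_flat(flat, "headers", criteria))  # True: cross-object match
print(matches_nested(headers, criteria))        # False: objects stay separate
```

If you need conditions to apply to the same array element, nested is the type to look at; it costs more because each element is indexed as its own hidden document.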
Specialized:
- ip - IPv4/IPv6 addresses
- geo_point - Latitude/longitude coordinates
- geo_shape - Complex geographic shapes
- completion - Auto-complete suggestions
- search_as_you_type - Indexes n-gram sub-fields for matching as the user types
- alias - Field aliases
For your API logs, consider:
- date for timestamps
- ip for IP addresses
- integer (or short) for status codes if you need range queries like >= 500; keyword if you only filter on exact values
- long for response times (milliseconds)
- keyword for user IDs, endpoints
- text for error messages
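Put together, a mapping and a sample search could look like the sketch below. All field names (`timestamp`, `client_ip`, `response_time_ms`, and so on) are assumptions for illustration, and status_code is mapped as integer here so the range filter works.

```python
import json

# Hypothetical API log mapping following the list above.
api_log_mapping = {
    "mappings": {
        "properties": {
            "timestamp": {"type": "date"},
            "client_ip": {"type": "ip"},
            "status_code": {"type": "integer"},
            "response_time_ms": {"type": "long"},
            "user_id": {"type": "keyword"},
            "endpoint": {"type": "keyword"},
            "error_message": {"type": "text"},
        }
    }
}

# Server errors from the last hour, grouped by endpoint,
# with the average response time per endpoint.
search_body = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "now-1h"}}},
                {"range": {"status_code": {"gte": 500}}},
            ]
        }
    },
    "aggs": {
        "by_endpoint": {
            "terms": {"field": "endpoint", "size": 10},
            "aggs": {
                "avg_response_ms": {"avg": {"field": "response_time_ms"}}
            },
        }
    },
}

print(json.dumps(api_log_mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

The terms aggregation on endpoint relies on it being keyword, and the range filters rely on date and integer; with everything left as text, neither part of this search would work by default.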