The fulltext index is attached as a hidden/shadow column to the source table column. In SQL you specify:
CREATE FULLTEXT INDEX <index_name> ON <table_name> ( <column_name> ) [<fulltext_parameter>] The subsequent fulltext index created can only be accessed indirectly via search, i.e. in SQL using SELECT with the CONTAINS predicate. The fulltext index is not a table, which you could directly access.
In contrast, text analysis results are found in a table and is an option to the fulltext index. In SQL you add the following parameter:
CREATE FULLTEXT INDEX <index_name> ON <table_name> ( <column_name> ) TEXT ANALYSIS ON The subsequent $TA_<index_name> table created in the database allows you to enjoy the benefits of SQL operations such as calculate and aggregate functions. In addition, you could utilize other HANA engines like Spatial or Predictive on the text analysis annotations.
Requirements: Columnar table with character or binary content
FAST PREPROCESS OFF is the key to activate text analysis -> then uses text analysis for tokenization and for the stemming TEXT ANALYSIS ON does not have to be specified
An explicit full-text index is created after the source table (needs to already be there). You must drop and re-create an explicit index to modify text analysis parameters
By default, all inputs are assumed to be English! Specify the possible languages when creating the full text index. If the confidence is low for the first language, then it will bypass to the next one. There is a parameter where you can change this confidence level. You can also specify one language for each row with LANGUAGE COLUMN and bypass automatic language detection.
Binary Document Formats:
You can also specify the input format
You can change the tokenization behavior of the system using TOKEN SEPARATORS parameter. It is an additional level of token-breaking applied after text analysis processing. The default if not specified: /;,.:-_()[]<>!?*@+{}="& Can be disabled by specifying empty TOKEN SEPARATORS
Specify TEXT ANALYSIS ON to access text analysis results. Result:
Synchronous processing
-- Standard syntax explicit fulltext index
CREATE FULLTEXT INDEX <index_name> ON <schema_name> <table_name> ( <column_name> ) [<fulltext_parameter>]
-- Explicit fulltext index with text analysis activation in MLP engine for tokenization and normalization
CREATE FULLTEXT INDEX <index_name> ON <schema_name> <table_name> ( <column_name> ) FAST PREPROCESS OFF
-- Access text analysis results
CREATE FULLTEXT INDEX <index_name> ON <schema_name> <table_name> ( <column_name> ) TEXT ANALYSIS ON
-- Drop a full text index
DROP FULLTEXT INDEX [INDEX NAME];
-- Example of how to specify the possible languages when creating the full text index
CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON PRODUCT_REVIEWS(CONTENT) FAST PREPROCESS OFF
LANGUAGE DETECTION ('EN', 'DE', 'FR');
-- Example of how to specify one language for each row
CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON PRODUCT_REVIEWS(CONTENT) FAST PREPROCESS OFF
LANGUAGE COLUMN language;
-- Example of how to combine language for each row with automatic language detection
CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON PRODUCT_REVIEWS(CONTENT) FAST PREPROCESS OFF
LANGUAGE COLUMN language LANGUAGE DETECTION ('EN', 'DE', 'FR')
-- Example of how to specify the input format with Text Analysis
CREATE FULLTEXT INDEX PRODUCT_REVIEWS_IDX ON PRODUCT_REVIEWS(CONTENT) FAST PREPROCESS OFF
MIMETYPE COLUMN MIMETYPE
MIMETYPE 'text/plain';
-- How to customize token breaks
CREATE FULLTEXT INDEX <index_name> ON <schema_name> <table_name> ( <column_name> ) TEXT ANALYSIS ON
TOKEN SEPARATORS 'token_separator';
-- How to disable token breaks
CREATE FULLTEXT INDEX <index_name> ON <schema_name> <table_name> ( <column_name> ) TEXT ANALYSIS ON
TOKEN SEPARATORS '';
-- Define a configuration for the fulltext index
CREATE FULLTEXT INDEX <index_name> ON
<schema_name> <table_name> ( <column_name> ) FAST PREPROCESS OFF
LANGUAGE DETECTION ('EN', 'DE', 'FR') TEXT ANALYSIS ON
CONFIGURATION 'EXTRACTION_CORE/EXTRACTION_CORE_ENTERPRISE/EXTRACTION_CORE_PUBLIC_SECTOR/EXTRACTION_CORE_VOICEOFCUSTOMER/
GRAMMATICAL_ROLE_ANALYSIS/LINGANALYSIS_BASIC/LINGANALYSIS_FULL/LINGANALYSIS_STEMS';