Automation Action: Embedded Vector Database

Store and search vectors linked to text or external data using the embedded vector database.

ThinkAutomation includes a built-in vector database. A vector database is designed to store, index, and search data represented as vectors, typically high-dimensional numerical arrays. These vectors are often embeddings: mathematical representations of data such as text, images, audio, or other unstructured content, generated by machine learning models. Instead of looking for exact matches as a traditional database does, a vector database finds similar items using approximate nearest neighbor (ANN) algorithms.
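
To make the idea of 'similar items' concrete, the following minimal Python sketch ranks a few stored vectors against a query vector by cosine similarity. It is an illustration only: the titles, the three-dimensional vectors and the brute-force scan are assumptions chosen for readability, and the sketch does not describe how ThinkAutomation indexes or searches internally.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical title/vector pairs (real embeddings have far more dimensions).
records = {
    "invoice-1001": [0.9, 0.1, 0.0],
    "invoice-1002": [0.8, 0.2, 0.1],
    "holiday-policy": [0.0, 0.2, 0.9],
}

def rank(query, records):
    """Score every record against the query and sort most similar first."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank([1.0, 0.0, 0.0], records)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

A brute-force scan like this compares the query with every record. ANN algorithms avoid that cost on large collections by accepting a small loss of accuracy.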

The ThinkAutomation vector database allows you to store vectors for any external data (such as database records, images, or document and email content). With each vector record you also store an external 'title'. When a search is performed, the closest matching titles (and optionally the text itself) are returned in relevancy order. You can then use these title values to look up the actual data and add it to the 'context' for the Ask AI action, or to provide advanced search results.

The ThinkAutomation Embedded Knowledge Store allows you to add content and embeddings as 'articles' that can then be used as context for the Ask AI action. However, it is limited to about 25,000 articles, since the search is performed in memory. The Vector Database, on the other hand, has no such limit, since the database is maintained on disk.

Collection Name

Title/vector pairs are stored in a Collection. Multiple collections can be used. Collection names can contain letters and numbers only. Collections are global to the ThinkAutomation instance (i.e. the same collection can be used by all Solutions/Automations).
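
If collection names are generated dynamically (for example from a customer or project name), it can help to check them before use. The sketch below is a hypothetical helper, not part of ThinkAutomation, and it assumes 'letters or numbers' means ASCII letters and digits.

```python
import re

# Assumption: only ASCII letters and digits are allowed, and the name is not empty.
_VALID_NAME = re.compile(r"[A-Za-z0-9]+")

def is_valid_collection_name(name: str) -> bool:
    """Return True if the whole name consists of letters and digits only."""
    return _VALID_NAME.fullmatch(name) is not None

print(is_valid_collection_name("Invoices2024"))
print(is_valid_collection_name("my docs"))
```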

From the Vector Operation list, choose: Update, Search, Delete, Drop or Count:

Update

Add or update a record in the vector database collection. If a record with the specified title does not exist, a new record is added; otherwise the existing record is updated.

Specify the Title. The title can be any text. It should be some form of unique ID for the record (such as a document title, file path or database primary key).

Specify the Text. This is the text content you want to store vectors (embeddings) for. If you have set up an AI Provider in the ThinkAutomation Server Settings, you can enable the Get Embeddings option. When the record is saved, the AI Provider is called to obtain the embeddings, which are then used as the vectors.

Enable the Save Text option if you want the actual text stored with the vectors in the database. The text can then be returned when a search is performed. If this option is not enabled, only the title and the vectors are stored. You would then use the returned titles to look up the actual text when a search is performed.

If the update is successful, the title value is assigned to the variable specified in the Assign To list.

You can also specify the vectors in the text itself. This is for use cases where you obtain vectors by another method. In this case the Get Embeddings and Save Text options should be disabled.

Note: The number of vector dimensions must be the same for each record in a collection. For example, if the first record added has vectors with 1024 dimensions, then all subsequent records added to the same collection must also have 1024 dimensions. Different collections can have vectors with different dimensions.
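
The add-or-update behaviour and the dimension rule can be pictured with a small in-memory model. The `Collection` class below is a hypothetical stand-in written for illustration. It is not the ThinkAutomation API and it ignores on-disk storage and embeddings.

```python
class Collection:
    """Toy in-memory model of one collection (hypothetical, for illustration)."""

    def __init__(self):
        self.records = {}       # title -> (vector, text or None)
        self.dimensions = None  # fixed by the first record that is added

    def update(self, title, vector, text=None):
        """Add the record, or overwrite it if the title already exists."""
        if self.dimensions is None:
            self.dimensions = len(vector)
        elif len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )
        self.records[title] = (list(vector), text)
        return title  # mirrors the title being assigned to a variable

    def count(self):
        return len(self.records)


docs = Collection()
docs.update("doc-1", [0.1, 0.2, 0.3], text="First version")
docs.update("doc-1", [0.3, 0.2, 0.1], text="Second version")  # same title: updated
print(docs.count())
```

Because the title acts as the key, saving the same title twice leaves a single record holding the latest vector and text.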

Search

Search the vector database for relevant items based on the Search Text. You can return the Top x most relevant items, in relevance order. The Relevancy Threshold setting controls the minimum relevancy level: items below this percentage are not included. This value defaults to 20%.
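
The interaction between the Top x and Relevancy Threshold settings can be sketched as a filter followed by a cut-off. This is an assumption-laden illustration: it treats the relevancy percentage as the similarity score multiplied by 100, and the function and parameter names are hypothetical.

```python
def select_results(scored, top_x=5, threshold_percent=20.0):
    """Keep items at or above the threshold, most relevant first, up to top_x.

    scored: list of (title, similarity) pairs with similarity between 0 and 1.
    Assumption: relevancy % is simply similarity * 100.
    """
    cutoff = threshold_percent / 100.0
    kept = [(title, score) for title, score in scored if score >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_x]


scored = [("a", 0.78), ("b", 0.15), ("c", 0.42), ("d", 0.20)]
print(select_results(scored, top_x=2))
```

Raising the threshold returns fewer results, possibly fewer than Top x, while raising Top x never brings back items that fall below the threshold.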

If you have set up an AI Provider in the ThinkAutomation Server Settings, you can enable the Get Embeddings option. Before the search is performed, the AI Provider is called to obtain the embeddings for the search text, which are then used as the search vectors. The number of vector dimensions for the search text must be the same as the vector dimensions stored in the database.

You can specify the Max Tokens to return. When a record is added to the vector database, the number of tokens used in the text is also saved. Search results will be limited to the max tokens specified. This is useful when using the vector database search along with the Ask AI action.
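
One plausible way to apply a token budget is to walk the results in relevance order and stop before the budget would be exceeded. The sketch below shows that idea. Stopping at the first item that does not fit (rather than skipping it and trying smaller ones) is an assumption, as are the function name and sample data.

```python
def limit_to_max_tokens(results, max_tokens):
    """Return the leading results whose combined 'Tokens' fit within max_tokens.

    results: list of dicts with a 'Tokens' count, already in relevance order.
    """
    selected = []
    used = 0
    for item in results:
        if used + item["Tokens"] > max_tokens:
            break  # assumption: stop at the first item that does not fit
        selected.append(item)
        used += item["Tokens"]
    return selected


results = [
    {"Title": "a", "Tokens": 40},
    {"Title": "b", "Tokens": 50},
    {"Title": "c", "Tokens": 30},
]
print([r["Title"] for r in limit_to_max_tokens(results, 100)])
```

Keeping the results within a known token budget helps ensure that the context passed to an AI model, together with the prompt, stays inside the model's limit.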

In the Return As list select either:

  • Titles Only (One Per Line) : The search will return a list of titles, one per line. You can then use these to look up the source data separately.
  • Titles And Text : The search will include the title and text content (if the Save Text option was enabled when the record was added).
  • Json : A JSON array will be returned containing the search results in the following format:
[
    {
      "Title": "About Parker Software",
      "Text": "Parker Software is an independent software house.",
      "Similarity": 0.78213344,
      "Tokens": 4
    },
    {
      ...
    }
]

Specify Json if you are searching for items to add as context for the Ask AI action.
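
If the JSON results are processed outside ThinkAutomation (for example in a script), they can be turned into a context string along these lines. This is a minimal sketch: the `build_context` helper and the output layout are hypothetical, and skipping results with a missing or empty "Text" value is an assumption about records saved without the Save Text option.

```python
import json

def build_context(search_json: str) -> str:
    """Join the title and text of each search result into one context string."""
    results = json.loads(search_json)
    parts = []
    for item in results:
        text = item.get("Text")
        if text:  # assumption: records saved without text have no usable "Text"
            parts.append(f"{item['Title']}: {text}")
    return "\n\n".join(parts)


sample = """
[
    {
      "Title": "About Parker Software",
      "Text": "Parker Software is an independent software house.",
      "Similarity": 0.78213344,
      "Tokens": 4
    }
]
"""
print(build_context(sample))
```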

Select the variable to receive the results from the Assign To list.

Delete

Delete an existing item. Specify the Title to delete. If the delete was successful the title will be returned to the variable specified in the Assign To list.

Drop

Drops the entire collection. If the drop was successful the collection name will be returned to the variable specified in the Assign To list.

Count

Returns the total number of records stored in the specified collection. The count will be returned to the variable specified in the Assign To list.

Vector Database Use Cases

  • As a local document search engine : Use the Convert Document To Text action to obtain the plain text for local documents or incoming attachments. Add these to the Vector Database using the document file path as the title. A separate automation could then perform a search and return the top x matching document titles.
  • As context for the Ask AI action : For use cases where the number of items is too large for the Embedded Knowledge Store.