Skip to content

data.tst #

Ternary Search Tree (TST) Implementation Plan

STILL BROKEN, CAN'T FIND YET

Overview

A Ternary Search Tree (TST) is a specialized tree structure where each node has three children:- Left: For characters less than the current node's character

  • Middle: For characters equal to the current node's character (leads to the next character in the sequence)
  • Right: For characters greater than the current node's character

In this implementation, we'll focus on the core TST functionality without any balancing mechanisms. Nodes will be inserted in their natural position based on character comparisons.

Data Structures

classDiagram
    class Node {
        +char character
        +bool is_end_of_string
        +[]u8 value
        +u32 left_id
        +u32 middle_id
        +u32 right_id
    }

    class TST {
        +&ourdb.OurDB db
        +u32 root_id
        +set(key string, value []u8)
        +get(key string) []u8
        +delete(key string)
        +list(prefix string) []string
        +getall(prefix string) [][]u8
    }

    TST --> Node : contains

Core Components

  1. Node Structure:
  • character: The character stored at this node
  • is_end_of_string: Flag indicating if this node represents the end of a key
  • value: The value associated with the key (if this node is the end of a key)
  • left_id, middle_id, right_id: Database IDs for the three children
  1. TST Structure:
  • db: Reference to the ourdb database for persistence
  • root_id: Database ID of the root node
  1. Serialization/Deserialization:
  • Use the same encoder/decoder as the radix tree
  • Serialize node properties efficiently

Key Operations

1. Insertion

flowchart TD
    A[Start Insertion] --> B{Is Tree Empty?}
    B -->|Yes| C[Create Root Node]
    B -->|No| D[Start at Root]
    D --> E{Current Char vs Node Char}
    E -->|Less Than| F{Left Child Exists?}
    F -->|Yes| G[Go to Left Child]
    F -->|No| H[Create Left Child]
    E -->|Equal To| I{End of Key?}
    I -->|Yes| J[Set End of String Flag]
    I -->|No| K{Middle Child Exists?}
    K -->|Yes| L[Go to Middle Child]
    K -->|No| M[Create Middle Child]
    E -->|Greater Than| N{Right Child Exists?}
    N -->|Yes| O[Go to Right Child]
    N -->|No| P[Create Right Child]
    G --> E
    L --> Q[Move to Next Char]
    Q --> E
    O --> E
    H --> R[Insertion Complete]
    J --> R
    M --> Q
    P --> R
    C --> R

The insertion algorithm will:1. Navigate the tree based on character comparisons2. Create new nodes as needed3. Set the end-of-string flag when the entire key is inserted

flowchart TD
    A[Start Search] --> B{Is Tree Empty?}
    B -->|Yes| C[Return Not Found]
    B -->|No| D[Start at Root]
    D --> E{Current Char vs Node Char}
    E -->|Less Than| F{Left Child Exists?}
    F -->|Yes| G[Go to Left Child]
    F -->|No| C
    E -->|Equal To| H{End of Key?}
    H -->|Yes| I{Is End of String?}
    I -->|Yes| J[Return Value]
    I -->|No| C
    H -->|No| K{Middle Child Exists?}
    K -->|Yes| L[Go to Middle Child]
    K -->|No| C
    E -->|Greater Than| M{Right Child Exists?}
    M -->|Yes| N[Go to Right Child]
    M -->|No| C
    G --> E
    L --> O[Move to Next Char]
    O --> E
    N --> E

3. Deletion

flowchart TD
    A[Start Deletion] --> B{Is Tree Empty?}
    B -->|Yes| C[Return Not Found]
    B -->|No| D[Find Node]
    D --> E{Node Found?}
    E -->|No| C
    E -->|Yes| F{Is End of String?}
    F -->|No| C
    F -->|Yes| G{Has Children?}
    G -->|Yes| H[Mark Not End of String]
    G -->|No| I[Remove Node]
    H --> J[Deletion Complete]
    I --> J

For deletion, we'll use a simple approach:1. If the node has children, just mark it as not end-of-string2. If the node has no children, remove it from its parent3. No rebalancing is performed

flowchart TD
    A[Start Prefix Search] --> B{Is Tree Empty?}
    B -->|Yes| C[Return Empty List]
    B -->|No| D[Navigate to Prefix Node]
    D --> E{Prefix Found?}
    E -->|No| C
    E -->|Yes| F[Collect All Keys from Subtree]
    F --> G[Return Collected Keys]

The prefix search will:1. Navigate to the node representing the end of the prefix2. Collect all keys in the subtree rooted at that node3. Return the collected keys

5. Serialization/Deserialization

flowchart TD
    A[Serialize Node] --> B[Add Version Byte]
    B --> C[Add Character]
    C --> D[Add is_end_of_string Flag]
    D --> E[Add Value if End of String]
    E --> F[Add Child IDs]
    F --> G[Return Serialized Data]

    H[Deserialize Node] --> I[Read Version Byte]
    I --> J[Read Character]
    J --> K[Read is_end_of_string Flag]
    K --> L[Read Value if End of String]
    L --> M[Read Child IDs]
    M --> N[Return Node Object]

Implementation Steps

  1. Create Basic Structure (tst.v):
  • Define the Node and TST structures
  • Implement constructor function
  1. Implement Serialization (serialize.v):
  • Create serialize_node and deserialize_node functions
  • Ensure compatibility with the encoder
  1. Implement Core Operations:
  • set (insert)
  • get (search)
  • delete
  • list (prefix search)
  • getall (get all values with prefix)
  1. Create Test Files:
  • Basic functionality tests
  • Prefix search tests
  • Serialization tests
  • Performance tests for various dataset sizes
  1. Add Documentation:
  • Create README.md with usage examples
  • Add inline documentation

File Structure

tst/
├── tst.v                 # Main implementation
├── serialize.v           # Serialization functions
├── tst_test.v            # Basic tests
├── prefix_test.v         # Prefix search tests
├── serialize_test.v      # Serialization tests
└── README.md             # Documentation

Performance Considerations

  1. Unbalanced Tree Characteristics:
  • Performance may degrade over time as the tree becomes unbalanced
  • Worst-case time complexity for operations could approach O(n) instead of O(log n)
  • Best for datasets where keys are inserted in a somewhat random order
  1. Optimizations:
  • Efficient node serialization to minimize storage requirements
  • Careful memory management during traversal operations
  • Optimized string handling for prefix operations
  1. Database Interaction:
  • Minimize database reads/writes
  • Only create new nodes when necessary
  1. Large Dataset Handling:
  • Efficient prefix search algorithm
  • Optimize node structure for millions of entries

fn namefix #

fn namefix(s string) string

namefix normalizes a string for consistent key handling- removes leading/trailing whitespace

  • converts to lowercase
  • replaces special characters with standard ones

fn new #

fn new(args NewArgs) !TST

Creates a new ternary search tree with the specified database path

struct NewArgs #

@[params]
struct NewArgs {
pub mut:
	path  string
	reset bool
}

struct TST #

@[heap]
struct TST {
mut:
	db      &ourdb.OurDB // Database for persistent storage
	root_id u32          // Database ID of the root node
}

TST represents a ternary search tree data structure

fn (TST) delete #

fn (mut self TST) delete(key string) !

Deletes a key from the tree

fn (TST) get #

fn (mut self TST) get(key string) ![]u8

Gets a value by key from the tree

fn (TST) getall #

fn (mut self TST) getall(prefix string) ![][]u8

Gets all values for keys with a given prefix

fn (TST) list #

fn (mut self TST) list(prefix string) ![]string

Lists all keys with a given prefix

fn (TST) set #

fn (mut self TST) set(key string, value []u8) !

Sets a key-value pair in the tree