
A new brain model of language

Hierarchical Temporal Memory including Cortical Learning Algorithms

Semantic Folding Theory
and its application in Semantic Fingerprinting

A Cortical.io White Paper Version 1.0
Author: Francisco E. De Sousa Webber

Natural Language Understanding inspired by neuroscience

Language Intelligence

With Semantic Folding:

• Words, sentences and whole texts can be compared to each other
• NLP tasks like classification and semantic search are highly efficient
• The system is trained in a fully unsupervised manner
• No need for large language models or expensive computing resources

Taking the Hierarchical Temporal Memory (HTM) theory, a computational theory of the human cortex developed by Numenta, as a starting point, Cortical.io has developed Semantic Folding, a corresponding theory of language representation.

Semantic Folding describes a method of converting text into a semantically grounded representation called a semantic fingerprint. Semantic fingerprints are Sparse Distributed Representations (SDR) of words: large binary vectors that are very sparsely filled, with every bit representing distinct semantic information.
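
To make the shape of such a vector concrete, the following minimal Python sketch models a fingerprint as the set of its active bit positions; the 16,384-bit size is the one described further below, while the specific active positions and the roughly 2% sparsity are illustrative assumptions, not values taken from this page.

    # Minimal sketch of a Sparse Distributed Representation (SDR) for a word.
    # The 16,384-bit size matches the fingerprint size described below;
    # the ~2% sparsity is an assumed, typical value, not taken from this page.

    FINGERPRINT_BITS = 128 * 128          # 16,384 positions in total

    # A fingerprint is conveniently stored as the set of *active* bit positions,
    # because only a small fraction of the bits are ever set.
    word_fingerprint = {12, 407, 1033, 2048, 9001, 15360}   # toy example

    sparsity = len(word_fingerprint) / FINGERPRINT_BITS
    print(f"{len(word_fingerprint)} active bits, sparsity = {sparsity:.4%}")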

Many practical problems of statistical Natural Language Processing (NLP) systems and, more recently, of Transformer models, like the necessity of creating large training data sets, the high cost of computation, the fundamental incongruity of precision and recall, the complex tuning procedures, etc., can be elegantly overcome by applying Semantic Folding to text processing.

 

Semantic Folding Simply Explained

Semantic Folding converts text into semantic fingerprints, encapsulating meaning in a topographical representation.

Semantic fingerprints allow direct comparison of the meanings of any two pieces of text, showing thousands of semantic relations.

If two semantic fingerprints look similar, it means that the texts are semantically similar too.

With Semantic Folding, semantic spaces are stable across languages, enabling direct comparison of text across languages without machine translation.
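
As a minimal illustration of such a comparison, the sketch below measures similarity as the number of active bits two fingerprints share, plus a normalized Jaccard variant; the toy fingerprints and the exact similarity measures are assumptions for illustration, not necessarily the metric Cortical.io uses.

    # Sketch: comparing two semantic fingerprints by bit overlap.
    # Fingerprints are assumed to be stored as sets of active bit positions
    # in a 16,384-bit space; the similarity measures are common choices
    # (absolute overlap, Jaccard), not necessarily Cortical.io's exact metric.

    def overlap(fp_a: set[int], fp_b: set[int]) -> int:
        """Number of semantic features (active bits) the two fingerprints share."""
        return len(fp_a & fp_b)

    def jaccard(fp_a: set[int], fp_b: set[int]) -> float:
        """Overlap normalized by the size of the combined feature set."""
        union = fp_a | fp_b
        return len(fp_a & fp_b) / len(union) if union else 0.0

    # Toy fingerprints (real ones would have a few hundred active bits each).
    fp_dog = {10, 87, 412, 900, 1501, 7777}
    fp_wolf = {10, 87, 412, 1501, 9000, 12000}
    fp_car = {3, 555, 2300, 8100, 14000, 16000}

    print(overlap(fp_dog, fp_wolf), overlap(fp_dog, fp_car))   # 4 vs. 0 shared bits
    print(round(jaccard(fp_dog, fp_wolf), 3))                  # higher = more similar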

How does Semantic Folding work?

To begin with, we select reference material that represents the domain the system will work in – Wikipedia for applications using general English, or domain-related collections of documents for industry-specific applications.

Then, the reference documents are cut into context-based snippets which are distributed over a 2D matrix, in such a way that snippets with similar topics (sharing many common words) are placed close to each other on the map. This process creates a 2D semantic map.

In the next step, a vector is created for each word contained in the reference documents, by activating the positions of all snippets containing this word. This produces a large, binary, very sparsely filled vector called a Semantic Fingerprint.

A Semantic Fingerprint is a vector of 16,384 bits (128×128), where every bit stands for a concrete context (topic) that can be represented as the bag of words of the training snippets at that position.

The whole Semantic Folding process is fully unsupervised.
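
The following toy Python sketch walks through these steps on a handful of snippets. The placement of snippets on the 2D map is deliberately stubbed out (snippets are simply laid out in order), since the topology-preserving mapping is the part of the process this page does not spell out; the fingerprint-building step, activating the positions of all snippets that contain a word, follows the description above.

    # Sketch of the Semantic Folding pipeline on toy data.
    # The real system places snippets on a 128x128 grid so that topically similar
    # snippets end up near each other; the placement below is a trivial stand-in
    # (snippets are laid out in order), so only the fingerprint-building step
    # is faithful to the description above.

    from collections import defaultdict

    GRID_SIDE = 4                      # real map: 128 x 128 = 16,384 positions
    snippets = [
        "the dog barked at the cat",
        "a wolf is a wild relative of the dog",
        "the car engine would not start",
        "she parked the car in the garage",
    ]

    # Step 1 (stubbed): assign each snippet a position on the 2D map.
    # A real implementation would use a topology-preserving mapping here.
    snippet_position = {i: (i // GRID_SIDE, i % GRID_SIDE) for i in range(len(snippets))}

    # Step 2: a word's fingerprint is the set of map positions whose snippet
    # contains that word -- a large, sparse, binary vector in the real system.
    word_fingerprint: dict[str, set[tuple[int, int]]] = defaultdict(set)
    for idx, text in enumerate(snippets):
        for word in set(text.split()):
            word_fingerprint[word].add(snippet_position[idx])

    print(word_fingerprint["dog"])   # positions of both dog-related snippets
    print(word_fingerprint["car"])   # positions of both car-related snippets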

    Applications of Semantic Folding

    Semantic Folding forms the basis for high-level natural language processing functionalities that can be integrated into many different applications.

    • Semantic fingerprints can be generated for language elements like words, sentences and entire documents (a simple aggregation scheme is sketched after this list).
    • Any two pieces of text can be compared, regardless of length or language.
    • Computational operations can be performed on the meaning of text data by measuring the overlap of semantic fingerprints.
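
    As referenced above, here is a minimal sketch of one way to aggregate word fingerprints into a sentence or document fingerprint: count how often each bit is activated across the words and keep only the most frequently activated bits so the result stays sparse. This aggregation rule is an assumption for illustration, not necessarily the exact scheme Cortical.io uses.

        # Sketch: building a fingerprint for a sentence or document from word
        # fingerprints. The aggregation rule used here -- count how often each bit
        # is activated across the words and keep only the most frequent bits so the
        # result stays sparse -- is an assumed, simplified scheme, not necessarily
        # the exact one used by Cortical.io.

        from collections import Counter

        def text_fingerprint(word_fps: list[set[int]], max_bits: int = 4) -> set[int]:
            """Aggregate word fingerprints and keep the most frequently active bits."""
            counts = Counter(bit for fp in word_fps for bit in fp)
            return {bit for bit, _ in counts.most_common(max_bits)}

        # Toy word fingerprints (sets of active bit positions).
        fp_dog = {10, 87, 412, 900}
        fp_barks = {10, 87, 3000, 5000}
        fp_loudly = {87, 5000, 8100, 12000}

        print(sorted(text_fingerprint([fp_dog, fp_barks, fp_loudly])))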

    Semantic fingerprints work particularly well for NLP tasks like:

    • Classification: instead of training the classifier with many labeled examples, a single reference fingerprint can be used to describe a class (see the sketch after this list).
    • Semantic search: comparing the semantic overlap between the semantic fingerprint of a natural-language query and the fingerprints of the indexed documents proves to be both highly accurate and efficient.
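
    A minimal sketch of both tasks, assuming fingerprints are already available as sets of active bit positions: classification compares a document against a single reference fingerprint describing the class, and search ranks indexed documents by their overlap with the query fingerprint. The helper names and the threshold value are illustrative.

        # Sketch: classification against a single reference fingerprint and
        # semantic search by fingerprint overlap. Fingerprints are assumed to be
        # sets of active bit positions; the threshold value is illustrative.

        def overlap(fp_a: set[int], fp_b: set[int]) -> int:
            return len(fp_a & fp_b)

        def classify(doc_fp: set[int], class_fp: set[int], threshold: int = 3) -> bool:
            """A document belongs to the class if it shares enough semantic
            features with the single reference fingerprint describing that class."""
            return overlap(doc_fp, class_fp) >= threshold

        def search(query_fp: set[int], index: dict[str, set[int]]) -> list[tuple[str, int]]:
            """Rank indexed documents by semantic overlap with the query fingerprint."""
            return sorted(((doc_id, overlap(query_fp, fp)) for doc_id, fp in index.items()),
                          key=lambda pair: pair[1], reverse=True)

        # Toy data.
        finance_class_fp = {5, 42, 777, 1024, 4096}
        doc_fp = {5, 42, 1024, 9000}
        index = {"doc_a": {5, 42, 777}, "doc_b": {9000, 12000}, "doc_c": {42, 1024, 4096}}

        print(classify(doc_fp, finance_class_fp))      # True: 3 shared features
        print(search(doc_fp, index))                   # doc_a and doc_c rank highest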

    Advantages of Semantic Folding

    • High Accuracy

    Semantic fingerprints leverage a rich set of 16,384 semantic features, enabling fine-grained disambiguation of words and concepts.

    • High Efficiency

    Semantic Folding requires an order of magnitude less training material (hundreds instead of thousands of examples) and fewer compute resources because it uses sparse distributed vectors.

    • High Transparency & Explainability

    Each semantic feature can be inspected at the document level, so that biases in the models can be eliminated and results explained.

    • High Flexibility & Scalability

    Semantic Folding can be applied to any language and use case, and business users can easily customize models.
