How to Use MAMMAL for Biomedical Discovery: Model, Setup, and Research Workflow | Cure Cancer With AI

MAMMAL, short for Molecular Aligned Multi-Modal Architecture and Language, is a biomedical foundation model from IBM Research designed for cross-modal drug discovery tasks. The model was described in the Nature portfolio article "MAMMAL - Molecular Aligned Multi-Modal Architecture and Language for biomedical discovery", published in npj Drug Discovery on May 4, 2026.

This guide explains how to use MAMMAL at a practical level: where the model lives, what it can accept as input, which tasks it supports, and how a research team might fit it into a biomedical discovery workflow. It is written for researchers, developers, and technical readers who want to get MAMMAL running and understand where it fits in drug discovery work.

What MAMMAL Is Built To Do

MAMMAL is not a general chatbot. It is a biomedical AI model built around molecular and omics inputs. According to the Nature article, MAMMAL was pre-trained on roughly 2 billion samples spanning protein and antibody sequences, small molecules, and gene expression profiles. That multimodal design is the key difference: instead of forcing every drug discovery task into a single data type, MAMMAL uses a structured prompt syntax that can combine multiple biomedical entities in one model input.

The model supports classification, regression, and generation-style tasks. In practical terms, that means MAMMAL can be adapted for workflows such as drug-target interaction prediction, protein-protein interaction prediction, antibody design, blood-brain barrier permeability prediction, toxicity prediction, cell-type annotation, and cancer drug response modeling.

Where To Get MAMMAL

The public pretrained weights and tokenizer are available on Hugging Face at ibm-research/biomed.omics.bl.sm.ma-ted-458m. The official implementation, examples, fine-tuning code, and inference utilities are available in the BiomedSciAI/biomed-multi-alignment GitHub repository.

The project documentation states that MAMMAL is tested with Python 3.10 or newer and PyTorch 2.0 or newer. The repository can be installed from source, and the package can also be installed from PyPI. For researchers who want to reproduce examples or build custom workflows, the GitHub repository is the best starting point because it includes task-specific examples and configuration files.
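Before installing anything heavier, a small version check can confirm that an environment meets those minimums. This is a minimal sketch: it reads the installed PyTorch version from package metadata (so it never imports torch) and compares only the leading numeric components. It is not a full PEP 440 comparison (a release candidate such as 2.0.0rc1 is treated as 2.0.0), so use packaging.version if you need exact semantics. The function names are our own.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string.

    "2.3.1+cu121" -> (2, 3, 1); "2.1.0a0" -> (2, 1, 0). Suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment(min_python="3.10", min_torch="2.0"):
    """Report (version, ok) for Python and PyTorch against the stated minimums."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": (python_version, meets_minimum(python_version, min_python))}
    try:
        torch_version = metadata.version("torch")  # reads metadata, no import
        report["torch"] = (torch_version, meets_minimum(torch_version, min_torch))
    except metadata.PackageNotFoundError:
        report["torch"] = (None, False)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        status = "OK" if ok else "install or upgrade needed"
        print(f"{name}: {version or 'not installed'} -> {status}")
```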

Basic Setup For MAMMAL

A typical local setup starts with a clean Python environment, PyTorch, and the MAMMAL package. The official GitHub README shows both editable source installation and package installation. A research-oriented setup would usually look like this:

conda create -n mammal_env python=3.10 -y
conda activate mammal_env

# Install PyTorch using the build that matches your CPU/GPU environment.
# Then install MAMMAL from PyPI or from the source repository.
# Quote the extras so shells such as zsh do not treat the brackets as a glob.
pip install "biomed-multi-alignment[examples]"

If you need the latest examples and configuration files, use the GitHub repository instead:

git clone https://github.com/BiomedSciAI/biomed-multi-alignment.git
pip install -e "./biomed-multi-alignment[examples]"
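After either install route, a short check can confirm that the top-level modules used by the loading code in the next section resolve in the active environment: torch, mammal, and fuse (which provides ModularTokenizerOp and is assumed here to arrive as a dependency of the MAMMAL package). importlib.util.find_spec locates a module without importing it; finding a module named fuse only shows that something by that name is importable, not that it is the right package. REQUIRED_MODULES and missing_modules are illustrative names.

```python
import importlib.util

# Top-level modules imported by the MAMMAL loading code in this guide.
REQUIRED_MODULES = ("torch", "mammal", "fuse")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All MAMMAL imports resolve.")
```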

Loading The Model And Tokenizer

The Hugging Face model card shows the core usage pattern. You load the model with the MAMMAL package, set it to evaluation mode, and load the modular tokenizer from the same Hugging Face model name:

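import torch  # used later to convert token id lists into tensors
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp
from mammal.model import Mammal
from mammal.keys import *  # key constants such as ENCODER_INPUTS_STR, used when building inputs

# Same repository as linked above. The model card's snippet has used the older
# "ibm/" prefix; Hugging Face redirects renamed repositories.
MODEL_ID = "ibm-research/biomed.omics.bl.sm.ma-ted-458m"

model = Mammal.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout

tokenizer_op = ModularTokenizerOp.from_pretrained(MODEL_ID)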

The important concept is that MAMMAL expects task-specific biomedical prompts. Those prompts include tokenizer type, task tokens, molecular entity markers, sequence boundaries, and input entities such as amino acid sequences or SMILES strings. This is why the official examples matter: they show the exact prompt format expected by supported tasks.
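To make that anatomy concrete, here is a minimal prompt builder. It assumes the marker layout used in the model card's protein-protein interaction example as we understand it: `<@TOKENIZER-TYPE=AA>`, a task token, `<SENTINEL_ID_0>`, then `<MOLECULAR_ENTITY>` plus an entity-type token and `<SEQUENCE_NATURAL_START>`/`<SEQUENCE_NATURAL_END>` around each sequence, closed by `<EOS>`. Entity and build_prompt are our own helpers, and other tasks may use different tokenizer types or markers, so check the matching official example before reusing this for SMILES or gene-expression inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One biomedical entity in a prompt (hypothetical helper type)."""
    entity_token: str  # e.g. "<MOLECULAR_ENTITY_GENERAL_PROTEIN>"
    sequence: str      # raw payload, e.g. an amino acid string


def build_prompt(tokenizer_type, task_token, entities, sentinel="<SENTINEL_ID_0>"):
    """Assemble a MAMMAL-style encoder prompt string.

    Layout: tokenizer type, task token, sentinel (our reading: the slot the
    decoder is asked to fill), each entity wrapped in markers, then <EOS>.
    """
    if not entities:
        raise ValueError("a prompt needs at least one entity")
    parts = [f"<@TOKENIZER-TYPE={tokenizer_type}>", task_token, sentinel]
    for entity in entities:
        if not entity.sequence:
            raise ValueError("entity sequence must not be empty")
        parts += [
            "<MOLECULAR_ENTITY>",
            entity.entity_token,
            "<SEQUENCE_NATURAL_START>",
            entity.sequence,
            "<SEQUENCE_NATURAL_END>",
        ]
    parts.append("<EOS>")
    return "".join(parts)


if __name__ == "__main__":
    protein = "<MOLECULAR_ENTITY_GENERAL_PROTEIN>"
    # Short placeholder sequences; real inputs are full amino acid strings.
    print(build_prompt("AA", "<BINDING_AFFINITY_CLASS>",
                       [Entity(protein, "MKT"), Entity(protein, "GAV")]))
```

Keeping the layout in one function means a marker typo is fixed once rather than in every hand-written prompt.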

Example Workflow: Protein-Protein Interaction Prediction

For a protein-protein interaction task, MAMMAL can receive two amino acid sequences in a single structured prompt. The Hugging Face model card demonstrates a prompt for calmodulin and calcineurin, then generates a prediction from the model. The high-level workflow is:

First, choose the task token for the prediction problem. Second, format the protein sequences using the amino acid tokenizer markers and sequence start/end markers. Third, tokenize the prompt into model inputs. Fourth, run generation or prediction with the loaded model. Finally, decode the output with the tokenizer.
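The five steps above can be sketched as one function. It reproduces the call pattern of the model card's protein-protein interaction example as we understand it: the prompt markers, the ModularTokenizerOp keyword arguments (sample_dict, key_in, key_out_tokens_ids, key_out_attention_mask), Mammal.generate called on a list of sample dicts, and decoding the CLS_PRED entry through tokenizer_op._tokenizer (a private attribute the official example uses). The name predict_interaction and its keys/to_tensor parameters are hypothetical additions that let the function run against stand-ins without downloading the model; if your installed version differs, the official example is authoritative.

```python
def predict_interaction(model, tokenizer_op, seq_a, seq_b,
                        keys=None, to_tensor=None, max_new_tokens=5):
    """Run the five PPI steps for one pair of amino acid sequences.

    keys and to_tensor default to mammal.keys and torch.tensor; they are
    parameters only so the flow can be exercised with stand-in objects.
    """
    if keys is None:
        import mammal.keys as keys  # sample-dict key constants
    if to_tensor is None:
        import torch
        to_tensor = torch.tensor

    # Steps 1-2: task token, amino acid tokenizer type, entity and sequence markers.
    protein = "<MOLECULAR_ENTITY><MOLECULAR_ENTITY_GENERAL_PROTEIN>"
    prompt = (
        "<@TOKENIZER-TYPE=AA><BINDING_AFFINITY_CLASS><SENTINEL_ID_0>"
        f"{protein}<SEQUENCE_NATURAL_START>{seq_a}<SEQUENCE_NATURAL_END>"
        f"{protein}<SEQUENCE_NATURAL_START>{seq_b}<SEQUENCE_NATURAL_END><EOS>"
    )
    sample = {keys.ENCODER_INPUTS_STR: prompt}

    # Step 3: the tokenizer op writes token ids and an attention mask into
    # the sample dict; convert both lists to tensors for the model.
    tokenizer_op(
        sample_dict=sample,
        key_in=keys.ENCODER_INPUTS_STR,
        key_out_tokens_ids=keys.ENCODER_INPUTS_TOKENS,
        key_out_attention_mask=keys.ENCODER_INPUTS_ATTENTION_MASK,
    )
    sample[keys.ENCODER_INPUTS_TOKENS] = to_tensor(sample[keys.ENCODER_INPUTS_TOKENS])
    sample[keys.ENCODER_INPUTS_ATTENTION_MASK] = to_tensor(
        sample[keys.ENCODER_INPUTS_ATTENTION_MASK]
    )

    # Step 4: generate a short output for the sentinel slot.
    batch = model.generate(
        [sample],
        output_scores=True,
        return_dict_in_generate=True,
        max_new_tokens=max_new_tokens,
    )

    # Step 5: decode the predicted token ids for the first (only) sample.
    return tokenizer_op._tokenizer.decode(batch[keys.CLS_PRED][0])
```

With model and tokenizer_op loaded as in the previous section, call predict_interaction(model, tokenizer_op, calmodulin_sequence, calcineurin_sequence), where the two variables hold the full amino acid strings from the model card. How the decoded string maps to an interaction label should follow the official example.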

For new users, this is the most important lesson: MAMMAL usage is less like prompting a conversational LLM and more like preparing a structured biomedical input record for a model that understands multiple molecular modalities.

Fine-Tuning MAMMAL For A Specific Task

The GitHub repository includes examples for protein solubility prediction, drug carcinogenicity prediction, drug-target interaction prediction, and cell-line drug response prediction. Fine-tuning is typically driven by per-task configuration files. From the root of the cloned repository, for example:

python mammal/main_finetune.py --config-name config.yaml --config-path examples/dti_bindingdb_kd

For drug-target interaction workflows, the model expects a target amino acid sequence and a small-molecule SMILES representation. For cell-line drug response workflows, the model can combine the drug SMILES string with a gene-expression profile. That multimodal input pattern is why MAMMAL is especially relevant to precision oncology and cancer drug discovery research.
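For the cell-line drug response case, much of the preparation work is pairing each measured response with both input modalities. The sketch below is ordinary data plumbing, not MAMMAL's serialization: the DrugResponseRecord fields, the fixed gene_order, and the skip-and-report policy for incomplete rows are illustrative assumptions, and the repository's cell-line drug response example defines how SMILES and expression actually enter the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugResponseRecord:
    """One (drug, cell line) training example with both modalities (hypothetical)."""
    drug_id: str
    cell_line_id: str
    smiles: str                    # drug structure as a SMILES string
    expression: tuple[float, ...]  # one value per gene, in gene_order
    response: float                # measured response label


def build_records(responses, smiles_by_drug, expression_by_cell_line, gene_order):
    """Join (drug_id, cell_line_id, response) rows with SMILES and expression.

    Rows missing a SMILES string, a profile, or any gene in gene_order are
    skipped and returned separately so they can be audited, not silently lost.
    """
    records, skipped = [], []
    for drug_id, cell_line_id, response in responses:
        smiles = smiles_by_drug.get(drug_id)
        profile = expression_by_cell_line.get(cell_line_id)
        if not smiles or profile is None or any(g not in profile for g in gene_order):
            skipped.append((drug_id, cell_line_id))
            continue
        # A fixed gene order keeps every profile aligned feature by feature.
        expression = tuple(float(profile[gene]) for gene in gene_order)
        records.append(
            DrugResponseRecord(drug_id, cell_line_id, smiles, expression, float(response))
        )
    return records, skipped
```

Returning the skipped pairs makes it easy to report how much of a screen was lost to missing modalities before any model is trained.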

Best Practices Before Using MAMMAL In Research

Start with the official examples before creating a custom task. Confirm that the task you need matches one of the supported formats: protein sequences, antibody sequences, small molecules, or gene expression profiles. Keep track of train, validation, and test splits so benchmark comparisons remain meaningful. Treat MAMMAL predictions as research signals, not clinical recommendations.
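One way to keep splits fixed and comparable is to assign them deterministically by group. The sketch hashes a group identifier, such as a protein target, so every row for that target lands in the same split on every run; this is the idea behind a cold-target split for drug-target interaction data. The salt, the 10%/10% default fractions, and the helper names are arbitrary choices for illustration. Published benchmarks often ship their own splits, and those should be used as-is when comparing against reported numbers.

```python
import hashlib


def assign_split(group_id, val_fraction=0.1, test_fraction=0.1, salt="mammal-split-v1"):
    """Deterministically map a group (e.g. a protein target ID) to a split.

    The bucket comes from a SHA-256 hash rather than a random seed, so the
    same group_id gets the same split across runs and machines.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    digest = hashlib.sha256(f"{salt}:{group_id}".encode("utf-8")).digest()
    # Keep 53 bits so the float is exact and uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_by_group(rows, group_key):
    """Split dict rows so that all rows sharing a group value share a split."""
    splits = {"train": [], "val": [], "test": []}
    for row in rows:
        splits[assign_split(row[group_key])].append(row)
    return splits
```

Changing the salt reshuffles every group at once, so it works as a version tag for the split: record it alongside reported results.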

For cancer researchers, MAMMAL may be useful for hypothesis generation around drug response, molecular interaction, toxicity, and therapeutic design. For patients and families, the main takeaway is not that MAMMAL is a treatment, but that AI cancer research is becoming more capable of connecting different forms of biomedical evidence.

What To Read Next

For a plain-English overview, read "What Is MAMMAL AI?". For oncology-specific context, see "MAMMAL for Cancer Drug Discovery". You can also browse the Cure Cancer With AI blog for more updates on artificial intelligence in oncology and machine learning drug discovery.

At Cure Cancer With AI, we track developments like MAMMAL because they help explain where biomedical AI is heading: toward models that can reason across proteins, molecules, gene expression, and disease biology. This article is educational and should not be used as medical advice.