
Conversation

@georgeguimaraes
Contributor

I needed nomic-embed-text-v1.5 for a project, so I added support for it.

It's a BERT variant with a few differences:

  • Rotary position embeddings (base 1000)
  • SwiGLU activation in FFN
  • No biases in attention/FFN layers
  • Combined Wqkv projection

I tested it against the Python transformers implementation, and the outputs match within floating-point precision (~2e-6).

Hello world: ✓ PASS (max diff: 2.0e-6)
The quick brown fox: ✓ PASS (max diff: 3.0e-6)
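
For reference, a minimal Nx sketch of the SwiGLU step listed above (up * silu(gate)); the module and function names are illustrative, not the actual code in this PR:

defmodule SwiGLUSketch do
  import Nx.Defn

  # up and gate are the two halves of the FFN input projection
  defn swiglu(up, gate) do
    # silu(x) = x * sigmoid(x), so this computes up * gate * sigmoid(gate)
    Nx.multiply(up, Nx.multiply(gate, Nx.sigmoid(gate)))
  end
end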

Copilot AI review requested due to automatic review settings January 3, 2026 21:14
Add support for nomic-ai/nomic-embed-text-v1.5 embedding model.

Architecture:
- Postnorm transformer (standard BERT-style)
- SwiGLU activation (up * silu(gate))
- Rotary position embeddings (RoPE) with base 1000
- Combined Wqkv projection
- No biases in attention and FFN layers
- Mean pooling over non-masked tokens

Tested against Python transformers with ~2e-6 precision.
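
As a rough illustration of the mean pooling step listed above, a hedged Nx sketch; the names and shape convention are assumptions, not the PR's actual code:

defmodule MeanPoolSketch do
  import Nx.Defn

  # hidden_state: {batch, seq, hidden}; attention_mask: {batch, seq} of 0/1
  defn pool(hidden_state, attention_mask) do
    mask =
      attention_mask
      |> Nx.new_axis(-1)
      |> Nx.as_type(Nx.type(hidden_state))

    # sum the embeddings of non-masked tokens, then divide by their count
    Nx.sum(hidden_state * mask, axes: [1]) / Nx.sum(mask, axes: [1])
  end
end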

Copilot AI left a comment


Pull request overview

This PR adds support for the Nomic BERT model family, specifically the nomic-embed-text-v1.5 embedding model. It is a BERT variant that incorporates modern architectural improvements, including Rotary Position Embeddings (RoPE), SwiGLU activation functions, and bias-free attention/FFN layers.

Key Changes:

  • Implementation of Nomic BERT architecture with combined Wqkv projection and gated FFN
  • Mean pooling for generating sentence embeddings
  • HuggingFace model loading support with parameter conversion from combined Wqkv weights
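
The Wqkv conversion mentioned above amounts to slicing one combined kernel into three. A hedged sketch, assuming a {hidden_size, 3 * hidden_size} layout (the layout and names are assumptions):

defmodule WqkvSketch do
  def split_qkv(wqkv) do
    {_in, out} = Nx.shape(wqkv)
    size = div(out, 3)

    query = Nx.slice_along_axis(wqkv, 0, size, axis: 1)
    key = Nx.slice_along_axis(wqkv, size, size, axis: 1)
    value = Nx.slice_along_axis(wqkv, 2 * size, size, axis: 1)

    {query, key, value}
  end
end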

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File descriptions:

  • lib/bumblebee/text/nomic_bert.ex: complete model implementation, including the embedder, the encoder with postnorm transformer blocks, the gated FFN (SwiGLU), and the HuggingFace config/parameter mappings
  • test/bumblebee/text/nomic_bert_test.exs: integration test that loads nomic-embed-text-v1.5 from HuggingFace and verifies output shapes and values against Python transformers
  • lib/bumblebee.ex: registers the NomicBertModel architecture and maps the nomic_bert model type to the BERT tokenizer
  • mix.exs: adds Bumblebee.Text.NomicBert to the documentation models list


doc: "the activation function"
],
rotary_embedding_base: [
default: 10_000,

Copilot AI Jan 3, 2026


The description states that Nomic BERT uses "Rotary position embeddings (base 1000)", and the test verifies that rotary_embedding_base is 1000 for the loaded model. However, the default value here is set to 10_000. This mismatch could cause issues when users try to create models from scratch without loading from HuggingFace. The default should be 1000 to match the actual Nomic BERT specification.

Suggested change:
- default: 10_000,
+ default: 1000,
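
For context on why the default matters: the base sets the RoPE inverse frequencies, so 1000 and 10_000 rotate positions at different rates and yield different embeddings. A hedged sketch of the standard formulation (the module and function names are ours):

defmodule RopeSketch do
  # inv_freq[i] = base^(-2i / head_size), for i in 0..(head_size / 2 - 1)
  def inverse_frequencies(base, head_size) do
    exponent =
      Nx.iota({div(head_size, 2)})
      |> Nx.multiply(2.0)
      |> Nx.divide(head_size)

    Nx.pow(base, Nx.negate(exponent))
  end
end

RopeSketch.inverse_frequencies(1000, 64) and RopeSketch.inverse_frequencies(10_000, 64) agree only in the first component, which is why a wrong default silently changes every output.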

- Fix max_positions doc: remove incorrect "vocabulary size" reference
- Fix rotary_embedding_base default: change from 10_000 to 1000
- Fix normalization doc: correct "pre-normalization" to "post-normalization"
- Fix position_ids doc: clarify usage with RoPE instead of position embeddings
Member

@jonatanklosko jonatanklosko left a comment


NomicBERT is not implemented in Python transformers directly; instead, the source code lives on the HuggingFace Hub and that model implementation is loaded on the fly. Normally we don't add such models, because the source is more likely to change and sometimes there is more than one copy of the source code.

That said, it seems that all Nomic models point to the reference implementation in nomic-ai/nomic-bert-2048, and it's been a while since the last changes. So in this case I'd say it's ok to add it.

I added comments inline.

      assert_all_close(
        outputs.hidden_state[[.., 0, 0..4]],
        Nx.tensor([[1.3752, 0.7431, -4.6988, -0.6574, 2.1887]]),
        atol: 1.0e-3
      )
Member


We should not need to override :atol here; if it fails with the default 1.0e-4, it most likely means the model implementation does not match the reference one, and that needs to be addressed.

Member


It's worth having a look at the config attributes. For example, mlp_fc1_bias and mlp_fc2_bias seem relevant, but we don't import those.

Contributor Author


Yeah, I was disabling bias for every model. Now I'm importing the appropriate config attributes.
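
A hedged sketch of what importing those attributes could look like, following the convert!/2 pattern Bumblebee models use elsewhere; the spec option names, HF config keys, and converter names here are assumptions:

defimpl Bumblebee.HuggingFace.Transformers.Config do
  def load(spec, data) do
    import Bumblebee.Shared.Converters

    opts =
      convert!(data,
        # hypothetical spec options mapped from the HF config keys
        use_qkv_bias: {"qkv_proj_bias", boolean()},
        use_ffn_intermediate_bias: {"mlp_fc1_bias", boolean()},
        use_ffn_output_bias: {"mlp_fc2_bias", boolean()}
      )

    @for.config(spec, opts)
  end
end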

Comment on lines 230 to 235
    # Nomic BERT uses postnorm (like standard BERT). Each block:
    #
    #   attn_output = attention(hidden_states)
    #   hidden_states = norm1(attn_output + hidden_states)
    #   ffn_output = ffn(hidden_states)
    #   hidden_states = norm2(ffn_output + hidden_states)
Member


This seems standard; is there any reason we cannot use Layers.Transformer.blocks, as we do for most other models?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I initially thought the model implementation should be self-contained, but honestly I just didn't pay close attention to what Layers.Transformer.blocks already provides. Thanks

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> I initially thought the model implementation should be self-contained

That's what hf/transformers does, but it has tradeoffs, and in our case we ended up normalizing and sharing all of the core transformer logic. This makes it easier to maintain and add new models, because most LLMs introduce just a few differences.
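
For reference, a hedged sketch of what the shared-block version could look like, modeled on how other Bumblebee models call Layers.Transformer.blocks; the exact option names are assumptions about the internal API:

hidden_state
|> Layers.Transformer.blocks(
  attention_mask: attention_mask,
  num_blocks: spec.num_blocks,
  num_attention_heads: spec.num_attention_heads,
  hidden_size: spec.hidden_size,
  # postnorm, i.e. layer norm after each residual, as in standard BERT
  block_type: :standard,
  rotary_embedding: [
    position_ids: position_ids,
    max_positions: spec.max_positions,
    base: spec.rotary_embedding_base
  ],
  # plug the SwiGLU FFN in as a custom function
  ffn: &gated_ffn(&1, spec.intermediate_size, spec.hidden_size, name: &2),
  name: join(name, "blocks")
)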

Round intermediate_size to a multiple of 256 to match Python's GatedMLP behavior.
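
The rounding itself is plain integer arithmetic; a one-line sketch (rounding up, which is an assumption based on the usual multiple-of pattern):

round_to_multiple = fn size, multiple ->
  div(size + multiple - 1, multiple) * multiple
end

round_to_multiple.(3000, 256)
#=> 3072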
@georgeguimaraes
Contributor Author

Using a tiny model now.
