@jlamypoirier (Collaborator) commented Jan 10, 2024

The Mistral model adapted to run StarCoder 2:

  • Use layer norm (RMSNorm still available as an option)
  • Use a standard MLP (gated MLP still available as an option)
  • Add back biases (optional)
  • Change the (default?) tokenizer class
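The normalization swap in the first bullet can be sketched numerically. A minimal pure-Python illustration of the difference (not the actual implementation, which operates on tensors):

```python
import math

def layer_norm(x, w, b, eps=1e-5):
    # Standard LayerNorm: center on the mean, scale by the standard
    # deviation, then apply a learned scale w and bias b.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv * wi + bi for v, wi, bi in zip(x, w, b)]

def rms_norm(x, w, eps=1e-5):
    # RMSNorm (Mistral's default): no centering and no bias term;
    # divide by the root mean square and apply a learned scale only.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * wi for v, wi in zip(x, w)]
```

The bias bullet follows the same pattern: LayerNorm carries a bias term that RMSNorm drops, so switching the norm and re-adding biases go together.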

Missing for StarCoder 1 (do we want to support it?):

  • Absolute position embeddings
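For context, absolute position embeddings are a learned per-position table added to the token embeddings before the first layer, which Mistral's rotary-embedding code path has no slot for. A hypothetical sketch (the function and argument names are illustrative, not taken from the code):

```python
def add_absolute_positions(token_embeds, position_table):
    # StarCoder 1 style: each token embedding gets the learned
    # embedding for its position added to it. Mistral instead rotates
    # query/key vectors inside attention (RoPE), so there is no such
    # table to port over.
    return [
        [t + p for t, p in zip(tok, position_table[i])]
        for i, tok in enumerate(token_embeds)
    ]
```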

Other notes:

  • Has fewer entries in the modeling auto mappings than GPT-BigCode (3 instead of 6); probably doesn't matter.
  • Uses repeat for the KV cache in flash attention; this might not be necessary.
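The repeat in question expands grouped key/value heads up to the number of query heads so attention can be computed as plain multi-head attention. A hypothetical pure-Python sketch (`repeat_kv` here is an illustrative name, not necessarily the function used in this PR):

```python
def repeat_kv(kv_heads, n_rep):
    # Grouped-query attention shares each KV head across n_rep query
    # heads. Repeating each KV head n_rep times restores a one-to-one
    # head layout for a standard attention implementation.
    out = []
    for head in kv_heads:
        out.extend([head] * n_rep)
    return out
```

If the flash-attention kernel can consume fewer KV heads than query heads directly, the repeat (and its extra memory traffic) could be skipped, which is what the note is questioning.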

Still have a bunch of minor things to do (see the TODOs).
