Organizations

@diambra

alexpalms/README.md

Alessandro (Alex) Palmas

Senior AI / ML Research Engineer
Specializing in LLMs / VLMs / VLAs, Reinforcement Learning, and Embodied / Physical AI


🧠 About

I’m a senior AI/ML research engineer with 15+ years of experience building intelligent agents and AI systems that understand and reason about the world. My work spans foundation models, reinforcement learning, and physics-based simulation, continuously expanding into VLMs, VLAs, and embodied multimodal AI.

I'm currently part of the core research and engineering team at LawZero, a non-profit AI lab in Montreal led by Yoshua Bengio, where I focus on advancing truthful, transparent, and safe-by-design AI through next-generation foundation models and reasoning architectures.

My passion lies at the intersection of AI, simulation, and decision-making. I focus on:

  • 🧠 Foundation Models & Alignment: LLM / VLM / VLA fine-tuning and alignment (SFT, DPO, RLVR/RLHF/RLAIF) with a focus on interpretable, safe, and truthful reasoning.
  • 🧭 Reinforcement Learning: Online/offline, adversarial, multi-agent, imitation learning, and human-in-the-loop optimization.
  • 🛰️ Embodied AI & High-Fidelity Simulation: Isaac Lab, Genesis, and physics-based virtual environments for grounded RL and world modeling.

✨ What I’m Working On Now

  • Experimenting with large-scale, distributed RL fine-tuning of code-focused LLMs using verifiable rewards
  • Building knowledge and hands-on experience in RL applied to foundation models
  • Exploring the landscape around Vision-Language-Action models: SOTA architectures, training playbooks and recipes, benchmarks, data pipelines, and tooling
  • Standardizing approaches and pipelines for DeepRL-based optimal decision making and control in real-world applications

🔗 Let's Connect

I’m always happy to chat with people who share my interests and passions. Feel free to reach out!


🚀 Featured Projects

Here are a few of my public repositories that reflect my work and interests:

🐋 ContaineRL - Containerize your RL Environments and Agents

Toolkit to package and deploy reinforcement learning environments and agents inside reproducible containers.

🛡️ RL Drone Swarm Defense

Reinforcement learning framework for decision-level interception prioritization of drone swarms.

🕹️ DIAMBRA Arena

A platform to train reinforcement learning agents in classic retro fighting games.

🤖 DIAMBRA Agents

A library of reinforcement learning algorithms tailored for DIAMBRA Arena environments.

🔀 Extending the GPU-Native RSL-RL Library

Customized fork of the RSL-RL library that supports multi-discrete action spaces.


Pinned

  1. diambra/arena Public

    DIAMBRA Arena: a New Reinforcement Learning Platform for Research and Experimentation

    Python 354 26

  2. diambra/agents Public

    Example Agents for DIAMBRA Arena Environments

    Python 17 6

  3. deeprl-counter-uav-swarm Public

    Reinforcement learning framework for decision-level interception prioritization of drone swarms.

    Python 13 1

  4. containerl Public

    Toolkit to package and deploy reinforcement learning environments and agents inside reproducible containers

    Python

  5. DIAMBRA-Arena-MARL-TLeague Public

    A customization of Tencent’s TLeague for self-play in DIAMBRA Arena. Enables population-based adversarial training in multi-agent environments using league-style training loops and dynamic opponent…

    1

  6. diambra-game-painter Public

    This project is an experiment that applies the style of famous paintings, in real time, to popular retro fighting games, which are provided as Reinforcement Learning environments by DIAMBRA. It is ba…

    Python 1