@ldrozdz93 (Owner) commented Jan 20, 2026

Summary

This PR adds an Azure Blob Storage source. From the user's perspective, it is intended to work much like the AWS S3 source.

Vector configuration

sources:
  azure_blob:
    type: azure_blob
    connection_string: REDACTED
    container_name: logs
    queue:
      queue_name: eventgrid

sinks:
  console:
    type: console
    inputs:
      - azure_blob
    encoding:
      codec: json  
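
The `connection_string` option above takes an Azure Storage connection string, which is a list of `Key=Value` pairs separated by `;`. As a rough illustration of that format only, here is a minimal sketch of splitting such a string into fields. `parse_connection_string` is a hypothetical helper written for this example; the PR's actual parsing is not shown in this conversation and most likely delegates to the Azure SDK.

```rust
use std::collections::HashMap;

/// Splits an Azure Storage connection string into key/value pairs.
/// Hypothetical helper for illustration; not taken from this PR.
fn parse_connection_string(s: &str) -> Result<HashMap<String, String>, String> {
    let mut fields = HashMap::new();
    // Skip empty segments so a trailing ';' is tolerated.
    for segment in s.split(';').map(str::trim).filter(|seg| !seg.is_empty()) {
        // Split on the first '=' only: base64 account keys can end in '=' padding.
        let (key, value) = segment
            .split_once('=')
            .ok_or_else(|| format!("segment without '=': {}", segment))?;
        fields.insert(key.to_string(), value.to_string());
    }
    Ok(fields)
}
```

Calling `parse_connection_string("AccountName=myaccount;AccountKey=bXlrZXk=")` yields a map in which `AccountKey` keeps its trailing `=`, which is why the split is limited to the first `=` in each segment.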

How did you test this PR?

  • Unit tests.
  • Integration tests.
  • Manual tests, described in steps.md.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • `make build-licenses` was run to regenerate the license inventory.
  • Historical context: this Azure source was originally implemented for Vector 0.38 and ran in our product MVP for a few months. It supported only the small subset of features we needed. During that time, I kept it rebased on the latest Vector releases. We then dropped the Azure Blob approach in our product. I picked the work up again a few months ago, brought it to the intended feature parity with the AWS S3 source, updated the docs, and extended the tests; in short, I did everything required to contribute it to open source.
  • This is my first open-source contribution to Vector. I'd appreciate feedback on both the code and the process.

Example connection string:
```text
DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net
```

Check failure

Code scanning / check-spelling

Unrecognized Spelling Error

myaccount is not a recognized word. (unrecognized-spelling)
container_name = "logs"
[sources.azure_logs.queue]
queue_name = "eventgrid"

Check failure

Code scanning / check-spelling

Unrecognized Spelling Error

eventgrid is not a recognized word. (unrecognized-spelling)

Copilot AI left a comment

Pull request overview

This pull request introduces a new azure_blob source for Vector that enables reading logs from Azure Blob Storage via Event Grid notifications delivered through Azure Storage Queues.
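
To make the notification flow more concrete: a blob-created event from Event Grid identifies the blob in its `subject` field, which for storage events has the shape `/blobServices/default/containers/{container}/blobs/{blob}`. The sketch below extracts the container and blob name from such a subject. `BlobRef` and `parse_blob_subject` are hypothetical names for this example, and whether this PR reads the `subject` or another field of the event is an assumption, since that code is not shown here.

```rust
/// Container and blob name extracted from an Event Grid `subject` field.
/// Hypothetical type for illustration; not taken from this PR.
#[derive(Debug, PartialEq)]
struct BlobRef {
    container: String,
    blob: String,
}

/// Parses `/blobServices/default/containers/{container}/blobs/{blob}`.
/// Returns `None` for subjects that do not describe a blob.
fn parse_blob_subject(subject: &str) -> Option<BlobRef> {
    let rest = subject.strip_prefix("/blobServices/default/containers/")?;
    // Container names cannot contain '/', but blob names can, so split on
    // the first "/blobs/" only and keep the remainder as the blob name.
    let (container, blob) = rest.split_once("/blobs/")?;
    if container.is_empty() || blob.is_empty() {
        return None;
    }
    Some(BlobRef {
        container: container.to_string(),
        blob: blob.to_string(),
    })
}
```

A source configured with `container_name: logs` could then compare `BlobRef::container` against the configured name and skip messages that refer to other containers.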

Changes:

  • New Azure Blob Storage source with queue-based event processing
  • Support for compression (gzip, zstd), multiple codecs, and multiline aggregation
  • Comprehensive unit and integration tests
  • Documentation files and configuration examples
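
As a rough picture of what multiline aggregation means for stack traces, the sketch below merges continuation lines into the event that precedes them. It assumes the simplest possible rule (a line that starts with whitespace continues the previous event); Vector's real multiline support is driven by configurable patterns and timeouts, and `aggregate_multiline` is a hypothetical function that is not part of this PR.

```rust
/// Joins continuation lines (those starting with whitespace) onto the
/// preceding line. Hypothetical helper for illustration only.
fn aggregate_multiline<'a, I>(lines: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut events: Vec<String> = Vec::new();
    for line in lines {
        let is_continuation = line.starts_with(char::is_whitespace);
        match events.last_mut() {
            // Append to the event currently being built.
            Some(current) if is_continuation => {
                current.push('\n');
                current.push_str(line);
            }
            // Start a new event (also covers a leading continuation line).
            _ => events.push(line.to_string()),
        }
    }
    events
}
```

With this rule, an `ERROR` line followed by two indented `at ...` frames becomes a single event, and the next unindented line starts a new one.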

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/sources/azure_blob/mod.rs | Main source implementation with streaming and event processing |
| src/sources/azure_blob/queue.rs | Queue integration, blob retrieval, and Event Grid message processing |
| src/sources/azure_blob/test.rs | Unit tests for compression detection and blob processing |
| src/sources/azure_blob/integration_tests.rs | Integration tests covering various scenarios |
| src/internal_events/azure_queue.rs | Internal event definitions for metrics and logging |
| website/cue/reference/components/sources/azure_blob.cue | Component documentation and metadata |
| Cargo.toml | Dependency and feature flag additions |
| tests/integration/azure/config/*.yaml | Integration test configuration |
| testing/github-XXXXX/* | Test artifacts and documentation (should be removed) |

pub(crate) use self::aws_kinesis_firehose::*;
#[cfg(any(feature = "sources-aws_s3", feature = "sources-aws_sqs",))]
pub(crate) use self::aws_sqs::*;
// #[cfg(feature = "sources-azure_blob")]

Copilot AI Jan 20, 2026

The commented-out code and feature flag comments should be either removed or properly uncommented. Lines 157-158 and 190-191 contain inconsistent commenting that suggests uncertainty about whether these should be conditionally compiled.

Suggested change
// #[cfg(feature = "sources-azure_blob")]

Copilot uses AI. Check for mistakes.
).await {
Ok(Some(bp)) => yield bp,
Ok(None) => trace!("Message {msg_id} is ignored, \
no blob stream stream created from it. \

Copilot AI Jan 20, 2026

There's a typo in the log message: "no blob stream stream created" should be "no blob stream created" (remove duplicate "stream").

Suggested change
no blob stream stream created from it. \
no blob stream created from it. \

Comment on lines +27 to +34
pub struct ClientCredentials {
/// Tenant ID for Azure authentication.
pub tenant_id: String,
/// Client ID for Azure authentication.
pub client_id: String,
/// Client secret for Azure authentication.
pub client_secret: String,
}

Copilot AI Jan 20, 2026

The fields tenant_id, client_id, and client_secret in ClientCredentials struct should be public to allow configuration deserialization. Without public visibility, users won't be able to configure these authentication credentials.

PR_CHECKLIST.md Outdated
---

**Last Updated:** 2026-01-01
**Status:** Ready for pre-PR work after GitHub issue approval

Copilot AI Jan 20, 2026

The PR includes testing artifacts and internal documentation that should not be committed to the repository. Files like PR_CHECKLIST.md, testing/github-XXXXX/, and test-results.md appear to be development/testing artifacts rather than production code or official documentation.

Suggested change
**Status:** Ready for pre-PR work after GitHub issue approval

counter!(
"component_errors_total",
"error_code" => "failed_deleting_azure_queue_event",
"error_type" => error_type::WRITER_FAILED,

Copilot AI Jan 20, 2026

The error_type is inconsistent: "WRITER_FAILED" is used for a queue message deletion failure, but this should likely be "ACKNOWLEDGMENT_FAILED" to match the error message on line 99.

Suggested change
"error_type" => error_type::WRITER_FAILED,
"error_type" => error_type::ACKNOWLEDGMENT_FAILED,

src/lib.rs Outdated
#[cfg(feature = "aws-config")]
pub mod aws;
#[allow(unreachable_pub)]
// pub mod azure;

Copilot AI Jan 20, 2026

The commented-out code should be removed or uncommented with proper feature flags. Having commented-out module declarations in production code can lead to confusion and maintenance issues.

Suggested change
// pub mod azure;

@ldrozdz93 force-pushed the ldrozdz93/azure-blob-storage-source branch from 5d8d878 to e962d8b on January 21, 2026 14:10
A new `azure_blob` source for reading logs from Azure Blob Storage
containers via Azure Storage Queue notifications (Event Grid).

Designed for feature parity with the existing `aws_s3` source.

Key features:
- Event-driven architecture using Azure Event Grid via Storage Queue
- Connection string authentication
- Configurable compression (gzip, zstd) with auto-detection
- Configurable framing (newline-delimited, character-delimited, etc.)
- Multiline aggregation for stack traces and multi-line logs
- Event metadata enrichment (container, blob, timestamp)
- Acknowledgement support
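
One common way to auto-detect compression is to look at the first bytes of the payload: gzip streams begin with `1f 8b` and zstd frames with `28 b5 2f fd`. The sketch below shows that check in isolation. It is an assumption that magic-byte sniffing is relevant here; the source may instead (or additionally) rely on blob metadata such as content encoding or the file extension, and `Compression` and `detect_compression` are hypothetical names, not the PR's types.

```rust
/// Compression formats named in the feature list above.
/// Hypothetical enum for illustration; not the PR's actual type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Compression {
    None,
    Gzip,
    Zstd,
}

// Leading bytes of a gzip stream and of a zstd frame, respectively.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

/// Guesses the compression from the first bytes of a blob.
/// Anything that matches neither signature is treated as uncompressed.
fn detect_compression(head: &[u8]) -> Compression {
    if head.starts_with(&GZIP_MAGIC) {
        Compression::Gzip
    } else if head.starts_with(&ZSTD_MAGIC) {
        Compression::Zstd
    } else {
        Compression::None
    }
}
```

Only the first four bytes of the blob are needed, so a check like this can run on the first chunk of a streamed download before a decoder is chosen.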
@ldrozdz93 force-pushed the ldrozdz93/azure-blob-storage-source branch from e962d8b to 8a72d0d on January 21, 2026 16:11
Development

Successfully merging this pull request may close these issues.

Add Azure Blob Storage source