@psainics (Collaborator) commented Jun 24, 2025

Description

New google drive plugin will

UI Changes

Reordering

  • Moved Authentication above Basic

  • [OLD]
    image

  • [NEW]
    image

New Fields

1) Structured Schema Required

  • Default Value

    • True
  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • Always visible

2) Format

  • Doc
    image

  • UI [under basic]
    image
    image

  • Visibility

    • shown when Structured Schema Required is set to true
    • Filter Used
      image

3) get-schema

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true
    • Filter Used
      image

3) Path Field

  • Required by the interface but not useful in this plugin; power users may edit the value in the pipeline JSON
  • Visibility
    • Always hidden

4) Regex Path Filter

  • Required by the interface but not useful in this plugin; power users may edit the value in the pipeline JSON
  • Visibility
    • Always hidden

5) Read Files Recursively

  • Required by the interface but not useful in this plugin; power users may edit the value in the pipeline JSON
  • Default Value
    • false
  • Visibility
    • Always hidden

6) Allow Empty Input

  • Required by the interface but not useful in this plugin; power users may edit the value in the pipeline JSON
  • Default Value
    • false
  • Visibility
    • Always hidden

7) Sample Size

  • Doc
    image

  • UI [under basic]
    image

  • Default Value

    • 1000
  • Visibility

    • shown when Structured Schema Required is set to true
    • Filter Used
      image

8) Override

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true
    • Filter Used
      image

10) Delimiter

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true and the format is delimited
    • Filter Used
      image

11) Enable Quoted Values

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true, for formats such as csv
    • Filter Used
      image

12) Use First Row as Header

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true, for formats such as csv and xls
    • Filter Used
      image

13) File Encoding

  • Doc
    image

  • UI [under advanced]
    image

  • Visibility

    • shown when Structured Schema Required is set to true, for formats such as csv and xls
    • Filter Used
      image

14) Terminate Reading After Empty Row

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true and the format is xls
    • Filter Used
      image

15) Select Sheet Using

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true and the format is xls
    • Filter Used
      image

16) Sheet Value

  • Doc
    image

  • UI [under basic]
    image

  • Visibility

    • shown when Structured Schema Required is set to true and the format is xls
    • Filter Used
      image
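
The visibility rules in the field list can be restated as a small predicate. The sketch below is only an illustration: the class and method names are hypothetical, it is not the plugin's actual widget filter configuration, and it models only the fields whose rule is stated unambiguously (fields described as shown for "formats such as csv" are deliberately left out).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the visibility rules listed in the PR description.
// This is NOT the plugin's widget JSON; all names here are invented for illustration.
final class FieldVisibilitySketch {

  // Fields the description marks as "Always hidden".
  private static final Set<String> ALWAYS_HIDDEN = new HashSet<>(Arrays.asList(
      "Path Field", "Regex Path Filter", "Read Files Recursively", "Allow Empty Input"));

  // Fields gated only on "Structured Schema Required" being true.
  private static final Set<String> STRUCTURED_ONLY = new HashSet<>(Arrays.asList(
      "Format", "get-schema", "Sample Size", "Override"));

  // Fields that additionally require the xls format.
  private static final Set<String> XLS_ONLY = new HashSet<>(Arrays.asList(
      "Terminate Reading After Empty Row", "Select Sheet Using", "Sheet Value"));

  static boolean isVisible(String field, boolean structuredSchemaRequired, String format) {
    if ("Structured Schema Required".equals(field)) {
      return true; // always visible
    }
    if (ALWAYS_HIDDEN.contains(field)) {
      return false; // editable only through the pipeline JSON
    }
    if (STRUCTURED_ONLY.contains(field)) {
      return structuredSchemaRequired;
    }
    if ("Delimiter".equals(field)) {
      return structuredSchemaRequired && "delimited".equals(format);
    }
    if (XLS_ONLY.contains(field)) {
      return structuredSchemaRequired && "xls".equals(format);
    }
    throw new IllegalArgumentException("Field not modeled in this sketch: " + field);
  }
}
```

For example, `FieldVisibilitySketch.isVisible("Delimiter", true, "csv")` evaluates to `false`, while the same call with `"delimited"` evaluates to `true`.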

Code Changes

  • GoogleDriveFileSource

    • New class that extends AbstractFileSource and provides its functionality
  • GoogleDriveFileSystem

    • New class that provides the file system implementation
    • Helper classes
      • GoogleDriveInputStream
      • GoogleDriveInputStreamWrapper
      • GoogleDriveUtils
  • GoogleDriveRecordReader, GoogleDriveSource

    • Changed the output from FileFromFolder to StructuredRecord to make it compatible with AbstractFileSource
  • GoogleDriveSourceConfig

    • Implements the FileSourceProperties interface so it can be used as a config class compatible with AbstractFileSource

If no such file can be found, an error will be returned.

**Sample Size:** The maximum number of rows that will get investigated for automatic data type detection.
The default value is 1000. This is only used when the format is 'xls'.

Collaborator

this will be used in tsv and csv also

Collaborator Author

Updated to:

**Sample Size:** The maximum number of rows that will get investigated for automatic data type detection.
The default value is 1000. This is used when the format is `xls`, `csv`, `tsv`, `delimited`.

The default value is 1000. This is only used when the format is 'xls'.

**Override:** A list of columns with the corresponding data types for whom the automatic data type detection gets
skipped. This is only used when the format is 'xls'.

Collaborator

used in csv, tsv also

Collaborator Author

Updated to:

**Override:** A list of columns with the corresponding data types for whom the automatic data type detection gets
skipped. This is used when the format is `xls`, `csv`, `tsv`, `delimited`.
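
To make the interaction between Sample Size and Override concrete, here is a minimal sketch of sampling-based type detection. It rests on assumptions: the class name, the method signature, and the three-type lattice (`long`, `double`, `string`) are hypothetical and are not taken from the plugin or from AbstractFileSource. It shows only the two behaviours the docs describe: at most `sampleSize` rows are inspected, and columns named in the override map skip detection entirely.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "Sample Size" and "Override"; not the plugin's code.
final class SchemaSamplingSketch {

  // Returns column name -> detected (or overridden) type name, in header order.
  static Map<String, String> detect(List<String> header, List<List<String>> rows,
                                    int sampleSize, Map<String, String> override) {
    Map<String, String> schema = new LinkedHashMap<>();
    int limit = Math.min(sampleSize, rows.size()); // never inspect more than sampleSize rows
    for (int col = 0; col < header.size(); col++) {
      String name = header.get(col);
      if (override.containsKey(name)) {
        schema.put(name, override.get(name)); // detection is skipped for this column
        continue;
      }
      String type = null;
      for (int r = 0; r < limit; r++) {
        List<String> row = rows.get(r);
        if (col < row.size()) {
          type = widen(type, typeOf(row.get(col)));
        }
      }
      schema.put(name, type == null ? "string" : type); // no samples: fall back to string
    }
    return schema;
  }

  // Classifies a single cell; assumed rule: long, then double, else string.
  private static String typeOf(String value) {
    if (value == null) {
      return "string";
    }
    try {
      Long.parseLong(value);
      return "long";
    } catch (NumberFormatException ignored) {
      // not a long, try double next
    }
    try {
      Double.parseDouble(value);
      return "double";
    } catch (NumberFormatException ignored) {
      return "string";
    }
  }

  // Merges two observed types into the narrowest type that holds both.
  private static String widen(String a, String b) {
    if (a == null || a.equals(b)) {
      return b;
    }
    if (a.equals("string") || b.equals("string")) {
      return "string";
    }
    return "double"; // the only remaining mix is long with double
  }
}
```

In this sketch a row beyond the sample limit never influences the result, which is why a column such as a zero-padded postal code is a natural candidate for Override: detection on a sample could otherwise classify it as numeric.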


@Override
public FileStatus[] listStatus(Path path) throws FileNotFoundException, IOException {
  return GoogleDriveUtils.listStatus(driveService, path);

Collaborator

Are we handling filters for this?

Collaborator

added support for filters

// Query Google Drive for files in the directory
String query = "'" + dirId + "' in parents and trashed = false";
FileList result = driveService.files().list()
    .setQ(query)

Collaborator

existing filters should be passed here

Collaborator

done
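
One way the request in this thread could be addressed is to append the existing filter clauses to the parent-directory query before it is handed to `setQ`. The sketch below is an assumption about the shape of that change, not the code that was merged: `DriveQuerySketch`, `buildListQuery`, and the idea that filters arrive as ready-made Drive query clauses are all hypothetical. It relies only on plain string handling and on the Drive search syntax (`and`, parentheses, and `\'` for single quotes inside values).

```java
import java.util.List;

// Hypothetical helper; the merged change may pass filters differently.
final class DriveQuerySketch {

  // Escapes single quotes so a value can sit inside a quoted Drive query literal.
  static String escape(String value) {
    return value.replace("'", "\\'");
  }

  // Builds "'<dirId>' in parents and trashed = false" and ANDs each extra clause onto it.
  static String buildListQuery(String dirId, List<String> extraClauses) {
    StringBuilder query = new StringBuilder();
    query.append("'").append(escape(dirId)).append("' in parents and trashed = false");
    for (String clause : extraClauses) {
      if (clause != null && !clause.trim().isEmpty()) {
        // Parentheses keep an "or" inside a clause from changing the overall meaning.
        query.append(" and (").append(clause.trim()).append(")");
      }
    }
    return query.toString();
  }
}
```

With this helper, `buildListQuery("abc", List.of("mimeType = 'text/csv'"))` yields `'abc' in parents and trashed = false and (mimeType = 'text/csv')`, which could then be passed to `setQ` in place of the fixed query string.
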

@psainics psainics force-pushed the feat/AbstractFileBatchSource branch 3 times, most recently from a09cada to d832951 Compare July 17, 2025 07:13
@psainics psainics force-pushed the feat/AbstractFileBatchSource branch from 58ad16a to 54913a2 Compare August 19, 2025 06:11
@psainics psainics force-pushed the feat/AbstractFileBatchSource branch from 54913a2 to e8dd7e0 Compare September 23, 2025 06:58
…ileBatchSource-ui

[PLUGIN-1906] Add AbstractFileBatchSource GDrive [UI + Docs]
@psainics psainics force-pushed the feat/AbstractFileBatchSource branch from e8dd7e0 to 2034022 Compare September 24, 2025 11:47
@psainics psainics force-pushed the feat/AbstractFileBatchSource branch from b453f4a to 439d403 Compare October 14, 2025 08:46
@psainics psainics force-pushed the feat/AbstractFileBatchSource branch from 439d403 to 4dc5202 Compare October 14, 2025 08:55
@psainics psainics merged commit 1013bcd into develop Oct 14, 2025
1 check passed
@psainics psainics deleted the feat/AbstractFileBatchSource branch October 14, 2025 22:52
4 participants