@williballenthin
Collaborator

@williballenthin williballenthin commented Oct 29, 2025

closes #2740

TODO:

  • add GH workflow/CI/CD configuration
  • triage failures
  • bug: extract alternative names (function names and API names)
  • bug: fix references to pma16-01-function=0x404356
  • bug: fix references to kernel32-64-function=0x1800202B0
  • bug: fix references to al-khaser x64-function=0x14004B4F0
  • use inf_get_ostype() to help OS detection for ELF files

Checklist

  • No CHANGELOG update needed
  • No documentation update needed

@williballenthin williballenthin added the enhancement New feature or request label Oct 29, 2025

@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions bot dismissed their stale review October 29, 2025 19:23

CHANGELOG updated or no update needed, thanks! 😄

@williballenthin
Collaborator Author

Results (524.69s (0:08:44)):
     155 passed
      15 failed
         - tests/test_idalib_features.py:23 test_idalib_features[al-khaser x64-function=0x14004B4F0-api(__vcrt_GetModuleHandle)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True0]
         - tests/test_idalib_features.py:23 test_idalib_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True1]
         - tests/test_idalib_features.py:23 test_idalib_features[pma12-04-file-characteristic(embedded pe)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-os(windows)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-arch(i386)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-format(pe)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356,bb=0x4043B9-os(windows)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356,bb=0x4043B9-arch(i386)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(__GI_connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(__libc_connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(__GI_connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True]

@williballenthin
Collaborator Author

williballenthin commented Oct 29, 2025

         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(__GI_connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(connect)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True]

need to figure out how to extract "alternative names":
image

and propagate to callers:
image


also, the connect function isn't recognized as a lib function, so its name isn't being extracted (the symbol comes from symtab, not FLIRT). the ida extractor should extract all names that don't look like sub_*.
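a minimal sketch of the "doesn't look like sub_*" filter, kept in plain Python so it runs outside IDA. the regex and the helper name `is_meaningful_name` are assumptions for illustration, not the extractor's actual code, and only the `sub_` prefix is handled.

```python
import re

# Assumption: IDA's default placeholder for an unnamed function is "sub_"
# followed by the hex address. Other auto-generated prefixes are not handled.
AUTO_NAME = re.compile(r"^sub_[0-9A-Fa-f]+$")


def is_meaningful_name(name: str) -> bool:
    """Return True if the name does not look like an auto-generated sub_* name."""
    return bool(name) and AUTO_NAME.match(name) is None


names = ["sub_4088A4", "connect", "__GI_connect", "sub_routine"]
meaningful = [n for n in names if is_meaningful_name(n)]
```

matching on the full `sub_<hex>` pattern rather than just the prefix keeps a symtab name such as `sub_routine` from being discarded by accident.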

@williballenthin
Collaborator Author

williballenthin commented Oct 29, 2025

         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-os(windows)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-arch(i386)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356-format(pe)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356,bb=0x4043B9-os(windows)-True]
         - tests/test_idalib_features.py:23 test_idalib_features[pma16-01-function=0x404356,bb=0x4043B9-arch(i386)-True]

IDA recognizes this as a library function, so we'd better pick a different function, for example function 0x401100 with basic block 0x401130.

@williballenthin
Collaborator Author

williballenthin commented Oct 29, 2025

image
         - tests/test_idalib_features.py:23 test_idalib_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True0]
         - tests/test_idalib_features.py:23 test_idalib_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True1]

this is because IDA recognizes the containing function as _report_gsfailure (a library function) and therefore skips analysis of it, which is correct behavior.


note there's a dup here in the test cases, we should de-dup.
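the `True0`/`True1` suffixes above look like pytest disambiguating two identical parametrize ids. a small sketch of how the repeated entries could be spotted and dropped; the `CASES` list and its tuple layout are hypothetical stand-ins, not the real fixture data.

```python
from collections import Counter

# Hypothetical excerpt of a parametrized case list:
# (sample, scope, feature, expected).
CASES = [
    ("kernel32-64", "function=0x1800202B0", "api(RtlCaptureContext)", True),
    ("kernel32-64", "function=0x1800202B0", "api(RtlCaptureContext)", True),
    ("pma12-04", "file", "characteristic(embedded pe)", True),
]


def find_duplicates(cases):
    """Return the cases that appear more than once, in first-seen order."""
    counts = Counter(cases)
    return [case for case in dict.fromkeys(cases) if counts[case] > 1]


def dedup(cases):
    """Drop repeated cases while preserving the original order."""
    return list(dict.fromkeys(cases))


duplicates = find_duplicates(CASES)
unique_cases = dedup(CASES)
```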

@williballenthin
Collaborator Author

need to figure out how to extract "alternative names"

apparently "Alternative Names" are just stored in the function comment:

  import ida_bytes
  import ida_funcs

  ALT_NAME_PREFIX = "Alternative name is '"


  def parse_alternative_names(cmt):
      """Yield names from comment lines like: Alternative name is 'foo'."""
      for line in (cmt or "").split("\n"):
          if line.startswith(ALT_NAME_PREFIX) and line.endswith("'"):
              # strip the fixed prefix and the closing quote
              yield line[len(ALT_NAME_PREFIX):-1]


  def get_alternative_names(ea):
      """Get all alternative names for an address."""
      alt_names = []

      # Check the indented (regular) comment at the address
      # False = non-repeatable
      alt_names.extend(parse_alternative_names(ida_bytes.get_cmt(ea, False)))

      # Check the function comment, if the address belongs to a function
      pfn = ida_funcs.get_func(ea)
      if pfn:
          alt_names.extend(parse_alternative_names(ida_funcs.get_func_cmt(pfn, False)))

      return alt_names

@williballenthin
Collaborator Author

         - tests/test_idalib_features.py:23 test_idalib_features[al-khaser x64-function=0x14004B4F0-api(__vcrt_GetModuleHandle)-True]

this is due to IDA correctly identifying a library function and therefore not analyzing it:

image

@mr-tz
Collaborator

mr-tz commented Dec 11, 2025

current progress

     158 passed
      10 failed
         - tests/test_idalib_features.py:26 test_idalib_features[mimikatz-function=0x40B3C6-api(LocalFree)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[mimikatz-function=0x4702FD-characteristic(calls from)-False]
         - tests/test_idalib_features.py:26 test_idalib_features[pma12-04-file-characteristic(embedded pe)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(__GI_connect)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(connect)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4027b3,bb=0x402861,insn=0x40286d-api(__libc_connect)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4088a4-function-name(__GI_connect)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4088a4-function-name(connect)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True]
         - tests/test_idalib_features.py:43 test_idalib_feature_counts[mimikatz-function=0x4702FD-characteristic(calls from)-0]

@mr-tz
Collaborator

mr-tz commented Dec 15, 2025

currently I see:

FAILED tests/test_idalib_features.py::test_idalib_features[pma12-04-file-characteristic(embedded pe)-True] - AssertionError: characteristic(embedded pe) should be found in file
FAILED tests/test_idalib_features.py::test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True] - AssertionError: function-name(__libc_connect) should be found in function=0x4088a4

Results (166.29s (0:02:46)):
     166 passed
       2 failed
         - tests/test_idalib_features.py:26 test_idalib_features[pma12-04-file-characteristic(embedded pe)-True]
         - tests/test_idalib_features.py:26 test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True]

The first is due to not loading the resource; the second is still to be investigated :)

@mr-tz mr-tz force-pushed the idalib-tests branch 2 times, most recently from f9d5b1d to 34488b3 Compare December 15, 2025 15:29
@williballenthin
Collaborator Author

williballenthin commented Dec 16, 2025

FAILED tests/test_idalib_features.py::test_idalib_features[2bf18d-function=0x4088a4-function-name(__libc_connect)-True] - AssertionError: function-name(__libc_connect) should be found in function=0x4088a4

IDA recognizes:

  • connect
  • __GI_connect

but doesn't recognize __libc_connect:

image

from the symbol table:

$ objdump -x 2bf18d0403677378adad9001b1243211.elf_ | grep "8a4 "
00000000004088a4  w    F .text  000000000000002b connect
00000000004088a4  w    F .text  000000000000002b .hidden __GI_connect
00000000004088a4 g     F .text  000000000000002b __libc_connect

for which Claude explains:

Looking at these symbol table entries, the key difference is in the binding flags:

w = Weak symbol

g = Global symbol

The three entries:

Address    Flags        Name
0x4088a4   w (weak)     connect
0x4088a4   w (weak)     __GI_connect
0x4088a4   g (global)   __libc_connect

All three point to the same code at address 0x4088a4.

What this means:

Weak (w): Can be overridden by another symbol with the same name. If you define your
own connect function, the linker will use yours instead of this one.

Global (g): The "canonical" strong symbol. Cannot be overridden by weak symbols, only
by other strong symbols (which would cause a linker error if duplicated).

Why glibc does this:

This is a common glibc pattern:

  • __libc_connect - The internal "real" implementation (global/strong)
  • connect - The public API name (weak, so users can interpose/override it)
  • __GI_connect - Internal glibc alias (.hidden means not exported to dynamic linker)
    used for internal calls that bypass PLT overhead

The .hidden on __GI_connect means glibc's internal code can call connect() via
__GI_connect without going through the PLT, improving performance for internal calls
while still allowing user code to override the public connect symbol.


so let me see if this is a bug with IDA. edit: it's not immediately clear why this is happening. there's a good amount of logic to recover various names and store them all, and I don't immediately see why __libc_connect is ignored.
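to compare what the symbol table offers against what IDA reports, the objdump listing above can be grouped by address. this is a rough sketch that assumes the usual `objdump -x` symbol line layout (address, a 7-character flag column, section, size, optional `.hidden`, name); the regex and function name are illustrative only.

```python
import re
from collections import defaultdict

# Matches symbol lines as printed in the listing above: address, a 7-character
# flag column, section, size, optional ".hidden" marker, then the name.
SYMBOL_LINE = re.compile(
    r"^(?P<addr>[0-9a-fA-F]+) (?P<flags>.{7}) (?P<section>\S+)\s+"
    r"(?P<size>[0-9a-fA-F]+)\s+(?:\.hidden\s+)?(?P<name>\S+)$"
)

OBJDUMP_OUTPUT = """\
00000000004088a4  w    F .text  000000000000002b connect
00000000004088a4  w    F .text  000000000000002b .hidden __GI_connect
00000000004088a4 g     F .text  000000000000002b __libc_connect
"""


def names_by_address(text):
    """Group (name, binding) pairs by symbol address."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        m = SYMBOL_LINE.match(line)
        if not m:
            continue  # skip headers and anything that is not a symbol line
        flags = m.group("flags")
        if flags[1] == "w":
            binding = "weak"
        elif flags[0] == "g":
            binding = "global"
        else:
            binding = "other"
        grouped[int(m.group("addr"), 16)].append((m.group("name"), binding))
    return dict(grouped)


symbols = names_by_address(OBJDUMP_OUTPUT)
```

any name listed here for 0x4088a4 that the extractor does not emit is a candidate for the missing-name investigation.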

@williballenthin
Collaborator Author

williballenthin commented Dec 16, 2025

apparently __libc_connect is being overwritten by a name provided by Lumina! if i disable Lumina in IDA Pro, then __libc_connect is the primary name for the function. there's an existing bug report against IDA for good names being overridden by Lumina, so that'll be fixed... when it's fixed.

  1. i had no idea that idalib pulled from Lumina by default, so TIL
  2. we should probably find a way to disable that, at least within the capa session

edit: we can disable Lumina on a per-session basis either by:

  1. setting LUMINA_PRIMARY and LUMINA_SECONDARY to the empty strings, or
  2. providing the "-Olumina:host=" argument to init_library or open_database. however, the value can't be empty, so maybe point it at something non-existent (danger! better to use localhost than an unregistered domain).
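a sketch of option 2, building the switch with a non-empty host. the helper name and the 127.0.0.1 default are assumptions (following the "use localhost" idea above), and the idalib call in the trailing comment is hypothetical usage that is not executed here.

```python
def lumina_override_args(host: str = "127.0.0.1") -> str:
    """Build the -Olumina:host= switch that points Lumina at the given host.

    The value can't be empty (see the note above), so an empty host is rejected.
    """
    if not host:
        raise ValueError("the lumina host value cannot be empty")
    return f"-Olumina:host={host}"


args = lumina_override_args()

# Hypothetical usage inside an idalib session; this assumes an idalib version
# whose open_database accepts extra command-line arguments:
#   import idapro
#   idapro.open_database(path, run_auto_analysis=True, args=args)
```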

@mr-tz
Collaborator

mr-tz commented Dec 16, 2025

apparently __libc_connect is being overwritten by a name provided by Lumina!

🙈 oh oh...

good finds here, thanks!

williballenthin added a commit that referenced this pull request Dec 16, 2025
see #2742 in which Lumina names overwrote names provided by debug info
@williballenthin
Collaborator Author

i'll triage these final IDA 9.0 failures tomorrow. i wonder if argv isn't supported in idalib for 9.0.

@williballenthin
Collaborator Author

sure enough 9.1 added support for argv for idalib: https://docs.hex-rays.com/release-notes/9_1

image

so we should xfail those tests on 9.0 (and/or use 9.1)
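one way the version gate could look, as a plain-Python sketch. the helper names are made up for illustration, and how the version string is obtained (for example from the IDA kernel version) is left as an assumption.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as "9.1" into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_idalib_args(version: str) -> bool:
    """Return True for IDA versions that accept argv in idalib (9.1 and later)."""
    return parse_version(version) >= (9, 1)
```

with pytest this could feed a conditional marker, e.g. `pytest.mark.xfail(not supports_idalib_args(version), reason="idalib argv needs IDA 9.1+")`, so the affected tests are expected to fail only on 9.0.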

@mr-tz
Collaborator

mr-tz commented Dec 18, 2025

I'd say we use 9.1 instead of 9.0.

@williballenthin
Collaborator Author

i think this is ready to merge @mr-tz

Collaborator

@mr-tz mr-tz left a comment

almost good to go IMHO

@williballenthin
Collaborator Author

assuming tests pass (just rebased), this is ready to go.

@williballenthin
Collaborator Author

i'll have a follow-up PR to migrate the code to use the Domain API, which is cleaner and more idiomatic, and probably worth using instead of the low-level IDA Python SDK.

@mr-tz
Collaborator

mr-tz commented Jan 13, 2026

Great work. This is a huge improvement!

@mr-tz mr-tz merged commit 6ad4fbb into master Jan 13, 2026
36 of 38 checks passed
@mr-tz mr-tz deleted the idalib-tests branch January 13, 2026 20:48
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

add feature tests for idalib backend

2 participants