Race condition with SMP code #14

@highwaycoder

Description

Sometimes during testing, the OS freezes and nothing further happens. I believe this happens when the APIC timer set up by the scheduler_await function fires while a code path that has disabled interrupts is executing, so the APIC timer interrupt gets masked. There is no mechanism for detecting this, so there is no way to avoid it.

I believe this should be at least less probable in SMP mode, if not outright impossible: the APIC timer fires on all cores at the same time, so as long as only core 0 executes code paths that include the cli instruction, the other cores should still take the interrupt and the system as a whole should never lock up. Core 0 itself could still lock up in this scenario, though.

Any solution will be quite complex. For example, we could fire a second APIC timer interrupt every second or so and have its handler check that the scheduler is still running and, if not, bring it back online. However, that second interrupt can be masked in exactly the same way, so we could easily end up in a "turtles all the way down" situation.
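
A rough sketch of what the watchdog check could look like, assuming the scheduler bumps a heartbeat counter every time it runs. None of these names exist in the codebase: sched_heartbeat, scheduler_tick, watchdog_check and scheduler_rearm_timer are made up for illustration, and the re-arm function is only a stub here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: incremented by the scheduler's timer handler on every run. */
static _Atomic uint64_t sched_heartbeat;

/* Heartbeat value the watchdog observed on its previous check. */
static uint64_t watchdog_last_seen;

/* Counts how often the watchdog had to restart the scheduler. */
static uint64_t watchdog_restarts;

/* Stub standing in for whatever re-arms the scheduler timer in the kernel. */
static void scheduler_rearm_timer(void)
{
    watchdog_restarts++;
}

/* Would be called from the scheduler's timer interrupt handler. */
void scheduler_tick(void)
{
    atomic_fetch_add_explicit(&sched_heartbeat, 1, memory_order_relaxed);
}

/* Would be called from the slower, second timer interrupt.
 * Returns true if the scheduler looked stalled and was kicked. */
bool watchdog_check(void)
{
    uint64_t now = atomic_load_explicit(&sched_heartbeat, memory_order_relaxed);

    if (now != watchdog_last_seen) {
        /* The scheduler ran at least once since the last check. */
        watchdog_last_seen = now;
        return false;
    }

    /* No progress since the last check: assume the scheduler timer was lost. */
    scheduler_rearm_timer();
    return true;
}
```

This only detects a stall after a full watchdog period with no scheduler runs, and it does nothing about the watchdog interrupt itself being masked, which is the "turtles" problem.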

Another option would be to have the idle loop itself check when the scheduler last ran and kick the scheduler if some deadline is exceeded. This would have to be carefully tuned so that we do not end up with two scheduler timers running concurrently, which would "work" but would cause far more context switching than is necessary or tuned for, and that will hurt performance.
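
The idle-loop variant might be sketched like this, assuming a monotonic tick counter is available and that the scheduler records the time of each run. All names and the deadline value are hypothetical, and the re-arm function is a stub. The compare-exchange is there so that only one caller claims the kick for a given stale timestamp, which is one way to reduce the risk of starting two timers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning value: how stale the scheduler may get, in clock ticks. */
#define SCHED_DEADLINE_TICKS 50u

/* Time of the most recent scheduler run, in ticks of some monotonic clock. */
static _Atomic uint64_t sched_last_run;

/* Number of times the idle loop re-armed the timer (illustration only). */
static unsigned idle_kicks;

/* Stub standing in for re-arming the scheduler's APIC timer. */
static void idle_rearm_scheduler_timer(void)
{
    idle_kicks++;
}

/* The scheduler would call this each time it runs. */
void scheduler_note_run(uint64_t now)
{
    atomic_store_explicit(&sched_last_run, now, memory_order_release);
}

/* Called on each pass of the idle loop. Returns true if it kicked. */
bool idle_check_scheduler(uint64_t now)
{
    uint64_t last = atomic_load_explicit(&sched_last_run, memory_order_acquire);

    /* Scheduler ran recently (or after we sampled `now`): nothing to do. */
    if (now < last || now - last <= SCHED_DEADLINE_TICKS)
        return false;

    /* Claim the kick by advancing the timestamp. If this fails, the
     * scheduler ran in the meantime or another core already claimed it. */
    if (!atomic_compare_exchange_strong_explicit(&sched_last_run, &last, now,
                                                 memory_order_acq_rel,
                                                 memory_order_acquire))
        return false;

    idle_rearm_scheduler_timer();
    return true;
}
```

The guard only stops two idle loops from kicking at once. If the original timer interrupt is merely delayed rather than lost, the real re-arm routine would presumably need to overwrite or cancel the existing timer instead of adding a second one.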
