JIT: isolate JIT runtime symbols via explicit symbol mapping #812
LeeLee26 wants to merge 2 commits into
|
Thanks again for the PR! Just to clarify, is #802 obsolete now with the introduction of this PR? |
|
I really appreciate your follow-up PR, @LeeLee26. As I said in #802, I am probably not familiar enough with this part of the JIT/runtime implementation to give a deep code review, so I tried to validate the behavior from the user side.

Result of the MRE from #802:

== after import ==
dlopenflags: 2
RTLD_GLOBAL bit: False
GC_malloc: hidden
GC_init: hidden
GC_get_version: hidden
seq_alloc: hidden
== first jit call ==
jit result: 42
== after first jit ==
dlopenflags: 2
RTLD_GLOBAL bit: False
GC_malloc: hidden
GC_init: hidden
GC_get_version: hidden
seq_alloc: hidden

As you said, symbols in the runtime library are no longer visible through dlsym. I also ran an additional JIT probe that repeatedly allocates and resizes native Codon types (list, dict, set, ...) to check whether they still work when the runtime symbols are not visible.

MRE

from __future__ import annotations
import os
import sys
from typing import Tuple

import codon


@codon.jit
def mix(x: int) -> int:
    return ((x * 1103515245) + 12345) % 2147483647


def py_gc_wave(rounds: int, width: int) -> Tuple[int, int, int]:
    checksum = 0
    max_set_size = 0
    last_join_len = 0
    for step in range(rounds):
        # Many integer elements in a fresh list.
        values = [((step * width + i) * 17) % 1000 for i in range(width)]
        # Another list built from the first one.
        mixed = [mix(v) % 1000 for v in values]
        # Dict with string keys.
        table = {str(i): value for i, value in enumerate(mixed)}
        # Set derived from the list.
        bucket_ids = {value % 29 for value in mixed}
        # Tuples that include ints, strings and bools.
        triples = [(value, str(value), value % 7 == 0) for value in mixed]
        # String creation / joining.
        joined = "|".join(table.keys())
        checksum += sum(mixed) + len(table) + len(triples) + len(joined)
        if len(bucket_ids) > max_set_size:
            max_set_size = len(bucket_ids)
        last_join_len = len(joined)
    return (checksum, max_set_size, last_join_len)


@codon.jit
def gc_wave(rounds: int, width: int) -> Tuple[int, int, int]:
    checksum = 0
    max_set_size = 0
    last_join_len = 0
    for step in range(rounds):
        # Fresh list allocation.
        values = [((step * width + i) * 17) % 1000 for i in range(width)]
        # Another fresh list allocation.
        mixed = [mix(v) % 1000 for v in values]
        # Dict allocation with string keys.
        table = {str(i): value for i, value in enumerate(mixed)}
        # Set allocation.
        bucket_ids = {value % 29 for value in mixed}
        # List of tuples, with more string creation.
        triples = [(value, str(value), value % 7 == 0) for value in mixed]
        # Joined string from dict keys.
        joined = "|".join(table.keys())
        checksum += sum(mixed) + len(table) + len(triples) + len(joined)
        if len(bucket_ids) > max_set_size:
            max_set_size = len(bucket_ids)
        last_join_len = len(joined)
    return (checksum, max_set_size, last_join_len)


def main() -> int:
    print("== environment ==")
    print(f"python: {sys.executable}")
    print(f"python_version: {sys.version.split()[0]}")
    print(f"codon_module: {getattr(codon, '__file__', None)}")
    print(f"CODON_PATH: {os.environ.get('CODON_PATH')!r}")
    print()
    rounds = 40
    width = 64
    expected = py_gc_wave(rounds, width)
    got = gc_wave(rounds, width)
    print("== single run ==")
    print(f"python baseline: {expected!r}")
    print(f"codon jit : {got!r}")
    print(f"match : {got == expected}")
    print()
    print("== repeated calls ==")
    for i in range(3):
        result = gc_wave(rounds + i, width + i)
        print(f"run {i}: {result!r}")
    print()
    if got != expected:
        print("probe: FAIL")
        return 1
    print("probe: SUCCESS")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Result

It seems to work well too. So, from the limited runtime checks I performed, this appears to fix the symbol leakage I was observing while still allowing JIT-compiled Codon functions that allocate runtime-managed objects to run correctly. However, I have not deeply reviewed the implementation or validated all possible JIT/runtime cases, so please treat this as a limited behavioral confirmation rather than a full code review. @arshajii @LeeLee26 @inumanag |
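For readers who want to repeat the visibility check, here is a minimal sketch of a probe that prints output in the same shape as the result above. It is not the original script from #802: the helper name `symbol_visibility` is hypothetical, and the sketch assumes a POSIX platform where `ctypes.CDLL(None)` wraps `dlopen(NULL)` and `sys.getdlopenflags()` is available.

```python
import ctypes
import os
import sys


def symbol_visibility(names):
    """Report whether each name resolves through the process-wide symbol scope."""
    # CDLL(None) wraps dlopen(NULL), so attribute lookup on it behaves like
    # dlsym() against the global scope of the running process (POSIX only).
    process = ctypes.CDLL(None)
    report = {}
    for name in names:
        try:
            getattr(process, name)
            report[name] = "visible"
        except AttributeError:
            # ctypes raises AttributeError when dlsym() cannot find the name.
            report[name] = "hidden"
    return report


def main():
    flags = sys.getdlopenflags()
    print(f"dlopenflags: {flags}")
    print(f"RTLD_GLOBAL bit: {bool(flags & os.RTLD_GLOBAL)}")
    # Symbol names taken from the probe output quoted in this thread.
    probed = ["GC_malloc", "GC_init", "GC_get_version", "seq_alloc"]
    for name, state in symbol_visibility(probed).items():
        print(f"{name}: {state}")


if __name__ == "__main__":
    main()
```

To reproduce the "after import" and "after first jit" sections, one would presumably call `main()` once after `import codon` and again after the first `@codon.jit` call.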
|
Hi @BI71317, Thank you so much for this thorough user‑side validation and extensive testing! It’s really helpful to confirm that PR #812 properly resolves the symbol leakage issue while keeping runtime‑managed objects working correctly under repeated JIT execution. I really appreciate your careful probe covering list, dict, set allocations and GC‑related workloads. Your behavioral verification gives us great confidence in this change. And I totally understand that this is a high‑level functional check rather than a full code review. Thanks again for your time and effort! |
|
Hi @LeeLee26, thank you for your work here! Can you please explain why exactly RTLD_GLOBAL is an issue? Aside from having a few extra symbols, I see no other downsides. Before that, we used to have all sorts of JIT and symbol issues on different platforms; I am not sure if this PR fixes any of that. This PR also hardcodes lots of stuff (and looks like it is not cross-platform, as it relies on the ".so" extension, fixed versions of shared libs, and so on), and also introduces significant friction in JIT updates. Furthermore, future-proofing and being independent of LLVM ORC updates is also a major concern. Basically, I am afraid that merging this PR will introduce way more issues than it will solve. |
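To illustrate the ".so" concern, here is a small sketch of deriving the shared-library file name from the platform instead of hardcoding one extension. The helper `shared_library_filename` is hypothetical and written in Python only for illustration; it is not code from this PR. The stem `codonrt` is taken from the `libcodonrt` name in the PR description, and whether such a library exists under these names on macOS or Windows is an assumption.

```python
import sys


def shared_library_filename(stem, platform=sys.platform):
    """Build a platform-appropriate shared-library file name from a bare stem."""
    if platform == "win32":
        # Windows DLLs conventionally carry no "lib" prefix.
        return f"{stem}.dll"
    if platform == "darwin":
        return f"lib{stem}.dylib"
    # Linux and most other ELF-based Unix systems.
    return f"lib{stem}.so"


if __name__ == "__main__":
    for plat in ("linux", "darwin", "win32"):
        print(plat, shared_library_filename("codonrt", plat))
```

This covers only the extension; fixed library version suffixes, which the comment above also mentions, would need separate handling.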
|
Hi @inumanag, thank you so much for your detailed feedback and valid concerns! Let me start by clarifying why I initially addressed the This PR is an extension of the work and discussions from PR #802, where I implemented full symbol isolation as a potential solution. As you correctly pointed out, this implementation relies heavily on hardcoded configurations, lacks proper cross-platform support, and may bring extra maintenance overhead regarding future LLVM ORC updates. Given these trade-offs, I fully agree with your concerns. I strongly recommend we merge PR #802 first and keep this PR as a backup solution. If we encounter specific JIT symbol-related issues on Unix platforms later, we can revisit and evaluate this implementation accordingly. I would also be more than happy to help investigate and diagnose the root causes of any such issues. |
This PR refactors how the Codon JIT backend loads and resolves symbols from the runtime library (libcodonrt). It addresses symbol leakage by replacing global dynamic library loading with local isolation and explicit registration.

This PR is built on top of #802 and can further mitigate symbol pollution during JIT compilation.
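The general idea of "local loading plus explicit registration" can be sketched in Python with `ctypes`. This is only an analogy, not the PR's implementation: the helper `load_isolated` is hypothetical, `libm` stands in for the runtime library, and how the resulting addresses are registered with the JIT is not shown. Because the Python interpreter already links `libm`, the example shows the mechanism of the explicit table rather than the isolation itself.

```python
import ctypes
import ctypes.util
import os


def load_isolated(path, wanted):
    """Load a library with RTLD_LOCAL and return an explicit name -> address table."""
    # RTLD_LOCAL keeps the library's exports out of the process-wide scope,
    # so later dlsym() lookups on the global scope do not pick them up.
    lib = ctypes.CDLL(path, mode=os.RTLD_LOCAL)
    table = {}
    for name in wanted:
        func = getattr(lib, name)  # dlsym() against this one handle only
        table[name] = ctypes.cast(func, ctypes.c_void_p).value
    # Return the handle too, so the caller can keep the library loaded.
    return lib, table


if __name__ == "__main__":
    # libm is a stand-in for the runtime library (assumption for illustration).
    lib, table = load_isolated(ctypes.util.find_library("m"), ["cos", "sqrt"])
    # A consumer resolves names through the explicit table, not the global scope.
    cos = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)(table["cos"])
    print(sorted(table), cos(0.0))
```

The trade-off raised in review is visible even here: every symbol the consumer needs must be listed in `wanted`, so the list has to be maintained as the runtime library changes.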
With this PR applied, the results for the test case are as follows: