Add policy and schema caching via cache()/cacheKey() API #350
geekphilosophy wants to merge 3 commits into cedar-policy:main
Force-pushed from 902cceb to 5ae66b9.
Adds Rust-side caching of pre-parsed policies and schemas, integrated
directly into PolicySet and Schema. When cached, BasicAuthorizationEngine
transparently uses a fast stateful authorization path that skips
re-parsing — no separate engine class or new authorization flow required.
Usage:
```java
policySet.cache();
schema.cache();
var engine = new BasicAuthorizationEngine();
for (AuthorizationRequest req : requests) {
    engine.isAuthorized(req, policySet, entities); // fast cached path
}
// Cached data is freed automatically when the object is GC'd.
```
API on PolicySet and Schema:
- cache(): pre-parse and cache on the Rust side. One-way, idempotent.
To use different policies/schema, create a new instance.
- cacheKey(): returns Optional<String> cache ID, empty if not cached.
Both policy set and schema must be cached for the fast path to be used.
If only one is cached, authorization falls back to the uncached path
to avoid silently skipping schema validation.
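A minimal, self-contained sketch of the cache()/cacheKey() contract described above: caching is one-way and idempotent, cacheKey() exposes an Optional<String>, and the fast path is taken only when both the policy set and the schema carry a key. CacheableResource, FastPathCheck, and the UUID-based key are hypothetical stand-ins; the real PolicySet and Schema delegate the pre-parsing to native Rust code.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for PolicySet / Schema: models only the caching contract.
class CacheableResource {
    // Null until cache() succeeds; volatile so other threads see the published key.
    private volatile String cacheId;

    // One-way and idempotent: a second call keeps the first key.
    synchronized void cache() {
        if (cacheId == null) {
            // Assumption: the real code pre-parses on the Rust side here.
            cacheId = UUID.randomUUID().toString();
        }
    }

    Optional<String> cacheKey() {
        return Optional.ofNullable(cacheId);
    }
}

// Hypothetical helper mirroring the "both must be cached" rule.
final class FastPathCheck {
    private FastPathCheck() {}

    // Requiring BOTH keys means schema validation is never silently skipped.
    static boolean canUseFastPath(CacheableResource policySet, CacheableResource schema) {
        return policySet.cacheKey().isPresent() && schema.cacheKey().isPresent();
    }
}
```

Making cache() synchronized prevents two concurrent first calls from producing two different keys; a compare-and-set on an AtomicReference would serve equally well.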
Implementation:
Rust (CedarJavaFFI):
- Shared DashMap cache for parsed PolicySets and Schemas, accessible
from any thread. Direct JNI entry points for cache removal (no JSON
dispatch overhead).
- Stateful authorization operation that looks up cached policies/schema
by ID and authorizes against them.
Java:
- PolicySet and Schema gain cache()/cacheKey() methods.
- BasicAuthorizationEngine.isAuthorized() checks for cached inputs and
dispatches to the stateful path transparently.
- GC-based cleanup via java.lang.ref.Cleaner frees Rust cache entries.
- SharedCedarInternals centralizes the Cleaner instance and JNI access.
NativeHelpers remains package-private.
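A hedged sketch of GC-driven cleanup with java.lang.ref.Cleaner. A ConcurrentHashMap stands in for the Rust-side cache, and NativeCacheStandIn, SharedCleaner, CachedHandle, and release() are hypothetical names; release() exists only so the sketch can be exercised deterministically, since GC timing is not. The key detail is that the cleanup action must not capture the tracked object, or that object would never become phantom-reachable.

```java
import java.lang.ref.Cleaner;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the Rust DashMap, keyed by cache ID.
final class NativeCacheStandIn {
    static final Map<String, String> ENTRIES = new ConcurrentHashMap<>();
    private NativeCacheStandIn() {}
}

// One Cleaner (and one daemon thread) shared by all cached objects.
final class SharedCleaner {
    static final Cleaner CLEANER = Cleaner.create();
    private SharedCleaner() {}
}

final class CachedHandle {
    // Static nested class: holds only the ID, never a reference to CachedHandle.
    private static final class RemoveEntry implements Runnable {
        private final String id;

        RemoveEntry(String id) {
            this.id = id;
        }

        @Override
        public void run() {
            // Assumption: the real code makes a direct JNI call into Rust here.
            NativeCacheStandIn.ENTRIES.remove(id);
        }
    }

    private final Cleaner.Cleanable cleanable;

    CachedHandle(String id, String parsedValue) {
        NativeCacheStandIn.ENTRIES.put(id, parsedValue);
        this.cleanable = SharedCleaner.CLEANER.register(this, new RemoveEntry(id));
    }

    // Hypothetical eager release; Cleanable runs the action at most once either way.
    void release() {
        cleanable.clean();
    }
}
```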
Benchmarks:
- Full cross-product matrix: 4 policy sizes x 3 entity sizes, cached
and uncached. Run via: ./gradlew jmh
Signed-off-by: Chris Simmons <simmonsc@amazon.com>
Force-pushed from 5ae66b9 to 81654a8.
```java
String input = objectWriter().writeValueAsString(
        new PreparsePolicySetRequest(id, this));
String response = SharedCedarInternals.callCedarJNI("PreparsePolicySet", input);
JsonNode node = new com.fasterxml.jackson.databind.ObjectMapper().readTree(response);
```
Nit: Replace the fully qualified inline usage of ObjectMapper with an import:
```java
import com.fasterxml.jackson.databind.ObjectMapper;
```
```rust
    static AUTHORIZER: Authorizer = Authorizer::new();
}

static CACHED_POLICY_SETS: LazyLock<DashMap<String, PolicySet>> = LazyLock::new(DashMap::new);
```
We should have some cache eviction policy in place. Otherwise, if someone caches a lot of long-lived policy sets, we run the risk of running out of memory. Because these entries live in native (Rust) memory outside the Java heap, this won't surface as a "graceful" Java OutOfMemoryError; more likely the process gets killed if CACHED_POLICY_SETS or CACHED_SCHEMAS grow too large. This is unlikely for most use cases, but we should have it in place. Maybe give users some control over the eviction policy (e.g., TTL, maxEntries).
Looking at adding a bounded cache size (maxEntries), configurable through a system property. No TTL or LRU, as that seems like overkill here.
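A minimal sketch of a bounded, refuse-when-full store, with a ConcurrentHashMap standing in for the Rust DashMap. The property-based limit mirrors the -Dcedar.cache.maxPolicySets idea from the follow-up commit; the BoundedCache class and its method names are hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded store: refuses new entries at capacity instead of evicting.
final class BoundedCache<V> {
    private final ConcurrentHashMap<String, V> entries = new ConcurrentHashMap<>();
    private final AtomicInteger size = new AtomicInteger();
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Reads the limit from a system property, e.g. -Dcedar.cache.maxPolicySets=1024.
    static <V> BoundedCache<V> fromSystemProperty(String property, int defaultMax) {
        return new BoundedCache<>(Integer.getInteger(property, defaultMax));
    }

    // Returns false (and stores nothing) when the cache is full.
    boolean tryPut(String id, V value) {
        if (entries.containsKey(id)) {
            return true; // already cached: idempotent
        }
        // Reserve a slot first so concurrent callers cannot overshoot maxEntries.
        while (true) {
            int current = size.get();
            if (current >= maxEntries) {
                return false;
            }
            if (size.compareAndSet(current, current + 1)) {
                break;
            }
        }
        if (entries.putIfAbsent(id, value) != null) {
            size.decrementAndGet(); // lost a race on the same ID: give the slot back
        }
        return true;
    }

    Optional<V> get(String id) {
        return Optional.ofNullable(entries.get(id));
    }

    void remove(String id) {
        if (entries.remove(id) != null) {
            size.decrementAndGet();
        }
    }
}
```

Refusing rather than evicting keeps existing cache IDs valid, so a caller holding a key never loses its entry just because someone else cached more data.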
```rust
// Look up cached policy set
let policies = CACHED_POLICY_SETS
    .get(&call.preparsed_policy_set_id)
    .map(|r| r.clone());
```
Maybe this could be further optimized by storing Arc<PolicySet> instead of performing a clone, since isAuthorized just takes a reference to the policy set?
@mark-creamer-amazon What do you think?
```rust
    }
};

let principal = match principal.parse(Some("principal")) {
```
Wondering if we can just reuse cedar_policy::ffi::is_authorized_json_str, as we do for the non-stateful isAuthorized call, instead of doing all the request parsing ourselves.
cedar_policy::ffi already creates a thread-local authorizer. Actually, looking at the cedar_policy::ffi implementation, it seems to have its own version of methods for pre-parsed policy sets and schemas. Maybe we can reuse it? From a brief look, though, I don't see any eviction mechanism.
cedar_policy::ffi does have preparse_policy_set, preparse_schema, and stateful_is_authorized with essentially the same structure. However, their cache is thread-local (RefCell inside thread_local!), which doesn't work for JNI: the Java thread that calls cache() is typically not the same thread that later calls isAuthorized() from a server thread pool. That's why we use a global DashMap, which is shared across threads. This was earlier feedback from a customer review.
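The thread-affinity problem can be reproduced in plain Java. In this hedged sketch, a ThreadLocal map plays the role of a thread_local! cache and a static ConcurrentHashMap plays the role of the global DashMap. A pool thread cannot see an entry cached on the calling thread through the former, but can through the latter. CacheVisibilityDemo and lookupFromWorker are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of the PR.
final class CacheVisibilityDemo {
    // Per-thread cache, analogous to RefCell inside thread_local!.
    static final ThreadLocal<Map<String, String>> THREAD_LOCAL_CACHE =
            ThreadLocal.withInitial(HashMap::new);
    // Process-wide cache, analogous to the global DashMap.
    static final Map<String, String> GLOBAL_CACHE = new ConcurrentHashMap<>();

    private CacheVisibilityDemo() {}

    // Caches on the calling thread, then looks the key up from a separate worker thread.
    // Returns {foundInThreadLocal, foundInGlobal} as observed by the worker.
    static boolean[] lookupFromWorker(String key) {
        THREAD_LOCAL_CACHE.get().put(key, "parsed");
        GLOBAL_CACHE.put(key, "parsed");

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> new boolean[] {
                    THREAD_LOCAL_CACHE.get().containsKey(key),
                    GLOBAL_CACHE.containsKey(key)
            }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```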
Estimates are that a two-level cache would save ~0.3%, so it's probably not worth it.
Cache/uncache operations (preparsePolicySet, preparseSchema, removeCachedPolicySet, removeCachedSchema) no longer route through the generic callCedarJNI JSON dispatch. Each is now a direct JNI native method on PolicySet or Schema, avoiding unnecessary JSON serialization/deserialization overhead.
- Delete NativeHelpers (existed only for JSON dispatch delegation)
- Strip SharedCedarInternals to just the Cleaner
- Remove PreparsePolicySet/PreparseSchema/RemoveCachedPolicySet/RemoveCachedSchema from the Rust call_cedar dispatch router
- Add direct JNI entry points on PolicySet and Schema classes
- Fix fully-qualified annotation/import nits in PolicySet and interface.rs

Signed-off-by: Chris Simmons <simmonsc@amazon.com>
Cache refuses new entries when at capacity (default 1024) rather than growing unbounded. When cache() can't store due to capacity, cacheId stays null and authorization transparently uses the uncached path. If a cached entry is evicted and the stateful authorization path returns "not found", BasicAuthorizationEngine falls back to the uncached path instead of throwing.

Configurable via system properties:
-Dcedar.cache.maxPolicySets=1024
-Dcedar.cache.maxSchemas=1024

Signed-off-by: Chris Simmons <simmonsc@amazon.com>
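A hedged sketch of the dispatch and fallback behavior this commit describes: take the stateful path only when both cache keys are present, and retry on the uncached path if the native side reports a missing entry. The Authorizer interface, CacheEntryNotFoundException, FallbackDispatch, and the string decisions are all hypothetical; how the real BasicAuthorizationEngine detects "not found" in the native response is not shown here.

```java
import java.util.Optional;

// Hypothetical signal for "cache ID no longer present on the Rust side".
class CacheEntryNotFoundException extends Exception {
    CacheEntryNotFoundException(String id) {
        super("cache entry not found: " + id);
    }
}

// Hypothetical abstraction over the two native authorization paths.
interface Authorizer {
    String statefulAuthorize(String request, String policySetKey, String schemaKey)
            throws CacheEntryNotFoundException;

    String uncachedAuthorize(String request);
}

final class FallbackDispatch {
    private FallbackDispatch() {}

    static String isAuthorized(Authorizer authorizer, String request,
                               Optional<String> policySetKey, Optional<String> schemaKey) {
        // Fast path only when BOTH inputs are cached; cache() may have been refused at capacity.
        if (policySetKey.isPresent() && schemaKey.isPresent()) {
            try {
                return authorizer.statefulAuthorize(request, policySetKey.get(), schemaKey.get());
            } catch (CacheEntryNotFoundException e) {
                // Entry disappeared after the key was read: fall through instead of throwing.
            }
        }
        return authorizer.uncachedAuthorize(request);
    }
}
```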
```java
// --- Cross-product: policy size x entity size ---

// Small policies x {medium, large} entities
```
At first I was wondering why we're leaving out the small policies x small entities case, etc., then realized that, yes, they're above!
I wonder if there's an enumerating annotation we could use to have the same benchmark method called across all the parameters, similar to TestNG's DataProvider.
I see JMH has @Param. If we used an enum of sizes as the input parameter, each value could key into a map whose values are the instance variables for the differently sized policySet and entities.
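The enum-keyed idea could look roughly like this. To keep the sketch runnable without JMH on the classpath, it uses plain Java: size enums key into EnumMaps of fixtures, and a nested loop stands in for the cross-product JMH would generate. In a real benchmark, assuming JMH's support for enum-typed parameters, the two enum values would be @Param fields on a @State class and the fixtures would be built in a @Setup method. PolicySize, EntitySize (including the XLARGE name for the fourth policy size), the fixture strings, and BenchmarkMatrix are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical size buckets matching the 4 policy sizes x 3 entity sizes matrix.
enum PolicySize { SMALL, MEDIUM, LARGE, XLARGE }

enum EntitySize { SMALL, MEDIUM, LARGE }

final class BenchmarkMatrix {
    // Fixtures built once and keyed by size (stand-ins for PolicySet / Set<Entity>).
    final Map<PolicySize, String> policySets = new EnumMap<>(PolicySize.class);
    final Map<EntitySize, String> entitySets = new EnumMap<>(EntitySize.class);

    BenchmarkMatrix() {
        for (PolicySize p : PolicySize.values()) {
            policySets.put(p, "policies-" + p.name().toLowerCase(Locale.ROOT));
        }
        for (EntitySize e : EntitySize.values()) {
            entitySets.put(e, "entities-" + e.name().toLowerCase(Locale.ROOT));
        }
    }

    // The single benchmarked body: fixtures are looked up by parameter,
    // so no hand-written method per combination is needed.
    String authorize(PolicySize p, EntitySize e) {
        return policySets.get(p) + "/" + entitySets.get(e);
    }

    // Stand-in for JMH's parameter expansion: every (policy, entity) pair once.
    List<String> runAll() {
        List<String> results = new ArrayList<>();
        for (PolicySize p : PolicySize.values()) {
            for (EntitySize e : EntitySize.values()) {
                results.add(authorize(p, e));
            }
        }
        return results;
    }
}
```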