Add policy and schema caching via cache()/cacheKey() API #350

Open
geekphilosophy wants to merge 3 commits into cedar-policy:main from geekphilosophy:integrated-caching

Conversation

@geekphilosophy
Contributor

Adds Rust-side caching of pre-parsed policies and schemas, integrated directly into PolicySet and Schema. When cached, BasicAuthorizationEngine transparently uses a fast stateful authorization path that skips re-parsing — no separate engine class or new authorization flow required.

Usage:

policySet.cache();
schema.cache();

var engine = new BasicAuthorizationEngine();
for (AuthorizationRequest req : requests) {
    engine.isAuthorized(req, policySet, entities); // fast cached path
}
// Cached data is freed automatically when the object is GC'd.

API on PolicySet and Schema:

  • cache(): pre-parse and cache on the Rust side. One-way, idempotent. To use different policies/schema, create a new instance.
  • cacheKey(): returns an Optional<String> cache ID; empty if not cached.

Both policy set and schema must be cached for the fast path to be used. If only one is cached, authorization falls back to the uncached path to avoid silently skipping schema validation.
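
The both-or-nothing rule above can be modeled in a few lines. This is an illustrative sketch, not the PR's code: `CacheableInput` and `PathChooser` are hypothetical stand-ins for PolicySet/Schema and for the check inside BasicAuthorizationEngine, and the cache ID is assumed to be an opaque random string.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for PolicySet or Schema; only the caching surface is modeled.
final class CacheableInput {
    private String cacheId; // stays null until cache() succeeds

    // One-way and idempotent: repeated calls keep the first ID.
    void cache() {
        if (cacheId == null) {
            cacheId = UUID.randomUUID().toString(); // assumption: IDs are opaque strings
        }
    }

    Optional<String> cacheKey() {
        return Optional.ofNullable(cacheId);
    }
}

// Hypothetical helper mirroring the dispatch decision described above.
final class PathChooser {
    enum Path { STATEFUL, UNCACHED }

    // The stateful path is chosen only when both inputs carry a cache key,
    // so a cached policy set never runs without its schema.
    static Path choose(CacheableInput policySet, CacheableInput schema) {
        boolean bothCached =
                policySet.cacheKey().isPresent() && schema.cacheKey().isPresent();
        return bothCached ? Path.STATEFUL : Path.UNCACHED;
    }
}
```

Caching only the policy set leaves `choose` on `UNCACHED`; it switches to `STATEFUL` once the schema is cached as well.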

Implementation:

Rust (CedarJavaFFI):

  • Shared DashMap cache for parsed PolicySets and Schemas, accessible from any thread. Direct JNI entry points for cache removal (no JSON dispatch overhead).
  • Stateful authorization operation that looks up cached policies/schema by ID and authorizes against them.

Java:

  • PolicySet and Schema gain cache()/cacheKey() methods.
  • BasicAuthorizationEngine.isAuthorized() checks for cached inputs and dispatches to the stateful path transparently.
  • GC-based cleanup via java.lang.ref.Cleaner frees Rust cache entries.
  • SharedCedarInternals centralizes the Cleaner instance and JNI access. NativeHelpers remains package-private.
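
As a rough illustration of the Cleaner-based cleanup, the sketch below registers a removal action when a handle is created. The names `CleanerSketch`, `NATIVE_CACHE`, `RemoveEntry`, and `CachedHandle` are hypothetical, and a Java map stands in for the Rust-side cache that the PR reaches through JNI.

```java
import java.lang.ref.Cleaner;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CleanerSketch {
    // Stand-in for the Rust-side cache (assumption: the real one sits behind JNI).
    static final Map<String, String> NATIVE_CACHE = new ConcurrentHashMap<>();
    // A single shared Cleaner, as the description says SharedCedarInternals provides.
    static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action holds only the ID. Capturing the owning object here
    // would keep it reachable and the action would never run.
    static final class RemoveEntry implements Runnable {
        private final String id;

        RemoveEntry(String id) {
            this.id = id;
        }

        @Override
        public void run() {
            NATIVE_CACHE.remove(id);
        }
    }

    // Hypothetical owner object, playing the role of a cached PolicySet or Schema.
    static final class CachedHandle {
        final String id;
        final Cleaner.Cleanable cleanable;

        CachedHandle(String id) {
            this.id = id;
            NATIVE_CACHE.put(id, "parsed:" + id);
            // Runs RemoveEntry once this handle becomes phantom reachable,
            // or earlier if clean() is called explicitly.
            this.cleanable = CLEANER.register(this, new RemoveEntry(id));
        }
    }
}
```

Because garbage collection timing is not deterministic, calling `cleanable.clean()` directly is the practical way to exercise the removal path in a test.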

Benchmarks:

  • Full cross-product matrix: 4 policy sizes x 3 entity sizes, cached and uncached. Run via: ./gradlew jmh
  • Caching avoids re-parsing policies and schemas on every authorization call. The benefit scales with policy/schema complexity, from ~2x for trivial policies to over 20x for large policy sets with schemas.

Signed-off-by: Chris Simmons <simmonsc@amazon.com>
Comment thread CedarJava/build.gradle
String input = objectWriter().writeValueAsString(
        new PreparsePolicySetRequest(id, this));
String response = SharedCedarInternals.callCedarJNI("PreparsePolicySet", input);
JsonNode node = new com.fasterxml.jackson.databind.ObjectMapper().readTree(response);
Contributor

Nit: Replace fully-qualified inline usage of ObjectMapper with import.

  import com.fasterxml.jackson.databind.ObjectMapper;

Contributor Author

Addressing.

Comment thread CedarJava/src/main/java/com/cedarpolicy/model/policy/PolicySet.java Outdated
Comment thread CedarJavaFFI/src/interface.rs Outdated
static AUTHORIZER: Authorizer = Authorizer::new();
}

static CACHED_POLICY_SETS: LazyLock<DashMap<String, PolicySet>> = LazyLock::new(DashMap::new);
Contributor

We should have a cache eviction policy in place. Otherwise, if someone caches a lot of long-lived policy sets, we risk running out of memory. It won't be a "graceful" Java OOM error but more likely a killed process if CACHED_POLICY_SETS or CACHED_SCHEMAS grow too large. That's unlikely for most use cases, but we should have it in place. Maybe give the user some control over the eviction policy (e.g., TTL, maxEntries).

Contributor Author

Looking at adding a bounded cache size (maxEntries), configurable through a system property. No TTL or LRU, as that seems like overkill here.

// Look up cached policy set
let policies = CACHED_POLICY_SETS
    .get(&call.preparsed_policy_set_id)
    .map(|r| r.clone());
Contributor

Maybe this could be further optimized by using Arc<PolicySet> instead of performing a clone as isAuthorized just takes a reference to the policy set?

@mark-creamer-amazon What do you think?

Comment thread CedarJavaFFI/src/interface.rs Outdated
}
};

let principal = match principal.parse(Some("principal")) {
Contributor

Wondering if we can just reuse cedar_policy::ffi::is_authorized_json_str, like we do for the non-stateful isAuthorized call, instead of doing all the request parsing ourselves.

cedar_policy::ffi already creates a thread-local authorizer. Actually, looking at the cedar_policy::ffi implementation, it seems to have some version of pre-parsed policy set and schema methods. Maybe we can reuse it? From a brief look, though, I don't see any eviction mechanism.

Contributor Author

cedar_policy::ffi does have preparse_policy_set, preparse_schema, and stateful_is_authorized with essentially the same structure. However, their cache is thread-local (RefCell inside thread_local!), which doesn't work for JNI: the Java thread that calls cache() is typically not the same thread that later calls isAuthorized() in a server thread pool. That's why we use a global DashMap — it's cross-thread. This was earlier feedback from a customer review.
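
The thread-affinity problem has a direct Java analogue, sketched below under stated assumptions: a `ThreadLocal` map plays the role of the `thread_local!` cache and a `ConcurrentHashMap` plays the role of the global DashMap. `CacheVisibility` and its method name are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CacheVisibility {
    // Analogue of a thread-local cache: each thread sees its own map.
    static final ThreadLocal<Map<String, String>> PER_THREAD =
            ThreadLocal.withInitial(HashMap::new);
    // Analogue of a process-wide concurrent map: one map for all threads.
    static final Map<String, String> SHARED = new ConcurrentHashMap<>();

    // Caches an entry on the calling thread, then looks it up on a pool thread.
    // Returns { foundInPerThreadCache, foundInSharedCache }.
    static boolean[] cacheThenLookUpOnWorker(String id) {
        PER_THREAD.get().put(id, "parsed");
        SHARED.put(id, "parsed");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<boolean[]> lookup = () -> new boolean[] {
                PER_THREAD.get().containsKey(id), // worker's own, empty map
                SHARED.containsKey(id)
            };
            return pool.submit(lookup).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The worker thread finds the entry only in the shared map, which is the situation a server thread pool would hit if cache() and isAuthorized() ran on different threads.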

Contributor Author

Estimates are that a two-level cache would save ~0.3%, so it's probably not worth it.

Cache/uncache operations (preparsePolicySet, preparseSchema,
removeCachedPolicySet, removeCachedSchema) no longer route through
the generic callCedarJNI JSON dispatch. Each is now a direct JNI
native method on PolicySet or Schema, avoiding unnecessary JSON
serialization/deserialization overhead.

- Delete NativeHelpers (existed only for JSON dispatch delegation)
- Strip SharedCedarInternals to just the Cleaner
- Remove PreparsePolicySet/PreparseSchema/RemoveCachedPolicySet/
  RemoveCachedSchema from the Rust call_cedar dispatch router
- Add direct JNI entry points on PolicySet and Schema classes
- Fix fully-qualified annotation/import nits in PolicySet and
  interface.rs

Signed-off-by: Chris Simmons <simmonsc@amazon.com>
Comment thread CedarJava/src/main/java/com/cedarpolicy/model/policy/PolicySet.java
Cache refuses new entries when at capacity (default 1024) rather than
growing unbounded. When cache() can't store due to capacity, cacheId
stays null and authorization transparently uses the uncached path.

If a cached entry is evicted and the stateful authorization path
returns "not found", BasicAuthorizationEngine falls back to the
uncached path instead of throwing.

Configurable via system properties:
  -Dcedar.cache.maxPolicySets=1024
  -Dcedar.cache.maxSchemas=1024

Signed-off-by: Chris Simmons <simmonsc@amazon.com>
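
The refuse-at-capacity behavior described in this commit message could look roughly like the following. This is a sketch only: the PR keeps its cache on the Rust side, and `BoundedCache`, `tryPut`, and `fromSystemProperty` are hypothetical names. Only the property names and the default of 1024 come from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bounded cache that refuses new entries instead of evicting old ones.
final class BoundedCache<V> {
    private final int maxEntries;
    private final Map<String, V> entries = new HashMap<>();

    BoundedCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Reads the limit from a system property such as cedar.cache.maxPolicySets,
    // falling back to 1024 when the property is unset or not a number.
    static <V> BoundedCache<V> fromSystemProperty(String property) {
        return new BoundedCache<>(Integer.getInteger(property, 1024));
    }

    // Returns false when the cache is full; the caller then leaves its cache ID
    // unset and authorization proceeds on the uncached path.
    synchronized boolean tryPut(String id, V value) {
        if (!entries.containsKey(id) && entries.size() >= maxEntries) {
            return false;
        }
        entries.put(id, value);
        return true;
    }

    synchronized V get(String id) {
        return entries.get(id);
    }

    synchronized void remove(String id) {
        entries.remove(id);
    }

    int maxEntries() {
        return maxEntries;
    }
}
```

Refusing rather than evicting means an entry, once stored, stays valid until its owner releases it; the cost is that late arrivals get no speedup while the cache is full.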

// --- Cross-product: policy size x entity size ---

// Small policies x {medium, large} entities
Contributor

Was wondering at first why we're leaving out the small policies x small entities case etc., then realized that yeah, they're above!

I wonder if there's an enumerating annotation we could use to have the same benchmarked method called across the parameters, similar to TestNG's DataProvider.

I see JMH has Param; perhaps if we enumerated over an enum of sizes as input parameters, those could key into a map whose values are our various instance variables representing the differently sized policySet and entities.
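
The enum-keyed lookup suggested here might be sketched as follows, using plain Java so it runs without JMH on the classpath. The enum constants and the string placeholders are hypothetical; in a real benchmark the two size arguments would presumably be `@Param`-annotated fields of a JMH state class, and the map values would be the actual PolicySet and entity fixtures.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical size names; the PR only states 4 policy sizes and 3 entity sizes.
enum PolicySize { TRIVIAL, SMALL, MEDIUM, LARGE }
enum EntitySize { SMALL, MEDIUM, LARGE }

final class BenchmarkFixtures {
    // Strings stand in for the differently sized PolicySet and entity fixtures.
    private final Map<PolicySize, String> policySets = new EnumMap<>(PolicySize.class);
    private final Map<EntitySize, String> entities = new EnumMap<>(EntitySize.class);

    BenchmarkFixtures() {
        for (PolicySize p : PolicySize.values()) {
            policySets.put(p, "policySet-" + p);
        }
        for (EntitySize e : EntitySize.values()) {
            entities.put(e, "entities-" + e);
        }
    }

    // One benchmark body serves every combination: the sizes select the fixtures.
    String describe(PolicySize p, EntitySize e) {
        return policySets.get(p) + " x " + entities.get(e);
    }

    // Enumerates the full matrix a parameterized benchmark would cover.
    List<String> crossProduct() {
        List<String> out = new ArrayList<>();
        for (PolicySize p : PolicySize.values()) {
            for (EntitySize e : EntitySize.values()) {
                out.add(describe(p, e));
            }
        }
        return out;
    }
}
```

With the lookup in place, a single benchmark method replaces the hand-written method per combination, and adding a size means adding one enum constant and one fixture.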
