0.7 alpha 2 #258

Open
Chenglong-MS wants to merge 404 commits into main from dev

Chenglong-MS (Collaborator) commented Mar 18, 2026

PR Summary

Agents & AI Pipeline

  • Unified data agents: Consolidated agent_py_data_rec, agent_sql_data_rec, agent_py_data_transform, agent_sql_data_transform, agent_concept_derive, agent_py_concept_derive, agent_data_clean, and agent_exploration into three unified agents: data_agent.py, agent_data_rec.py, and agent_data_transform.py
  • Semantic type system: New semantic_types.py backend module and full frontend type registry (src/lib/agents-chart/core/type-registry.ts, field-semantics.ts, semantic-types.ts) with domain shape inference, tick constraints, zero-baseline classification, and snap-to-bound heuristics
  • Chart insight agent: New agent_chart_insight.py for AI-generated chart takeaways
  • Language agent: New agent_language.py for i18n-aware prompts
  • Diagnostics agent: New agent_diagnostics.py with unified diagnostic information builder for better error reporting
  • Improved agent robustness: Better handling of missing output blocks, output variable detection, multimodal fallback for text-only models

Visualization

  • Agents-chart library: Complete new chart rendering library (src/lib/agents-chart/, 120 files, ~44K lines) with multi-backend support for Vega-Lite, ECharts, Chart.js, and GoFish — includes template system, semantic-aware axis/domain/tick handling, color decisions, layout computation, faceting, and overflow filtering
  • Chart gallery: New ChartGallery.tsx with expanded chart type support including pie, US map, world map, bump, candlestick, density, lollipop, pyramid, radar, rose, streamgraph, strip plot, waterfall, and more
  • Chart render service: New ChartRenderService.tsx replacing static SVG rendering with vega-embed for interactive charts
  • Insight panel redesign: Insight takeaways now display as styled cards (matching concept explanation style) with 2-column grid layout instead of bullet lists
  • Chart recommendations: New SimpleChartRecBox.tsx and chartRecommendation.ts for improved chart suggestion workflow
  • Score tick fix: Score type with small domain spans (e.g., [0,1]) no longer forces integer-only ticks, preserving intermediate decimal ticks
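
The score tick fix can be pictured as a small decision function. The following is a minimal Python sketch of that rule, not the library's TypeScript code: the function name `force_integer_ticks`, the `"Score"` type label, and the `min_integer_span` threshold are all assumptions made for illustration.

```python
from typing import Tuple


def force_integer_ticks(
    semantic_type: str,
    domain: Tuple[float, float],
    min_integer_span: float = 2.0,  # hypothetical threshold, not taken from the PR
) -> bool:
    """Return True when an axis should be restricted to integer ticks."""
    if semantic_type != "Score":
        # Other semantic types are out of scope for this sketch.
        return False
    low, high = domain
    # A narrow domain such as [0, 1] would collapse to two ticks if only
    # integers were allowed, so decimal ticks are kept in that case.
    return (high - low) >= min_integer_span
```

Under these assumptions, a score on [0, 1] keeps its decimal ticks while a score on [0, 10] is still limited to integers.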

Data Thread & Workflow

  • Hybrid thread redesign: Unified data thread with reports integrated into threads (DataThread.tsx rewrite, new DataThreadCards.tsx, InteractionEntryCard.tsx)
  • Unified formulate data hook: New useFormulateData.ts consolidating data derivation logic
  • Report editor: New Tiptap-based report editor (TiptapReportEditor.tsx) with richer editing support

Data Loading & Management

  • Unified upload dialog: New UnifiedDataUploadDialog.tsx replacing the old table selection view — supports file upload, URL, paste, database, and sample datasets in a single dialog with loading state indicators
  • Multi-table preview: New MultiTablePreview.tsx for previewing multiple tables before loading
  • Unified table loading thunk: New tableThunks.ts handling all data source types with server-side workspace storage
  • Live data & refresh: New useDataRefresh.tsx with auto-refresh, stream data sources, and RefreshDataDialog.tsx
  • Virtual table sorting: Server-side sorting now returns original row IDs (#rowId) via ROW_NUMBER() in DuckDB and pandas paths, preserving original row positions after sort
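
The row ID technique in the virtual table sorting item can be sketched as follows. This example uses `sqlite3` from the Python standard library as a stand-in for DuckDB so that it runs without extra dependencies. The table layout, the zero-based offset, and the function name are assumptions, and the real DuckDB and pandas paths may differ.

```python
import sqlite3
from typing import List, Sequence, Tuple

SORTABLE_COLUMNS = ("name", "score")  # hypothetical table layout


def sorted_with_row_ids(
    rows: Sequence[Tuple[str, int]], order_by: str, descending: bool = False
) -> List[tuple]:
    """Sort rows server-side while keeping each row's original position."""
    if order_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unknown column: {order_by}")
    direction = "DESC" if descending else "ASC"
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (name TEXT, score INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        # Number the rows in insertion order first, then sort the numbered
        # result, so "#rowId" still points at the pre-sort position.
        query = f"""
            WITH numbered AS (
                SELECT ROW_NUMBER() OVER (ORDER BY rowid) - 1 AS "#rowId",
                       name, score
                FROM t
            )
            SELECT "#rowId", name, score FROM numbered
            ORDER BY {order_by} {direction}
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Sorting `[("a", 3), ("b", 1), ("c", 2)]` by `score` returns the rows in the order b, c, a, each still carrying its original index.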

Data Loaders (Database Plugins)

  • New data loaders: Added Athena, BigQuery, and MongoDB data loaders
  • Enhanced existing loaders: Improved MySQL, PostgreSQL, MSSQL, S3, Azure Blob, and Kusto loaders with better error handling, connection cleanup, and password sanitization

Datalake / Workspace Backend

  • New workspace system: Complete datalake/ package with workspace.py, azure_blob_workspace.py, cached_azure_blob_workspace.py, file_manager.py, metadata.py, cache_manager.py, parquet_utils.py, and table_names.py
  • Workspace factory: New workspace_factory.py for configuration-driven workspace initialization
  • Session management: New session_routes.py for session-level API endpoints
  • Unicode & encoding: Support for Unicode filenames, path traversal checks, safe filename processing, UTF-8/GBK encoding detection
  • Atomic metadata updates: Prevent lost updates in concurrent scenarios
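
The Unicode and encoding item names three checks: safe filename processing, path traversal checks, and UTF-8/GBK detection. A minimal sketch of what such checks can look like is shown below. All function names are hypothetical, and the actual datalake/ package may implement them differently.

```python
import unicodedata
from pathlib import Path

FORBIDDEN_CHARS = '<>:"|?*'


def safe_filename(name: str) -> str:
    """Keep Unicode characters but drop directories and unsafe characters."""
    name = unicodedata.normalize("NFC", name)
    # Keep only the last path component, whichever separator was used.
    name = name.replace("\\", "/").split("/")[-1]
    name = "".join(
        ch for ch in name if ch.isprintable() and ch not in FORBIDDEN_CHARS
    )
    name = name.strip(" .")
    if not name:
        raise ValueError("filename is empty after sanitization")
    return name


def resolve_inside(root: Path, relative: str) -> Path:
    """Resolve a path and reject it if it escapes the workspace root."""
    root = root.resolve()
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {relative}")
    return candidate


def decode_text(data: bytes) -> str:
    """Try UTF-8 first, then GBK, then fall back to lossy UTF-8."""
    for encoding in ("utf-8", "gbk"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    return data.decode("utf-8", errors="replace")
```

Trying UTF-8 before GBK matters because GBK accepts many byte sequences that are also valid UTF-8, so the stricter decoder goes first.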

Security

  • Code signing: New code_signing.py for generated code integrity verification
  • Auth module: New auth.py for authentication handling
  • URL allowlist: New url_allowlist.py for URL validation
  • Error sanitization: New sanitize.py to prevent leaking sensitive info in error messages
  • Sandbox system: New sandbox/ package with local_sandbox.py, docker_sandbox.py, not_a_sandbox.py, and Dockerfile.sandbox replacing the old py_sandbox.py
  • Identity management: New identity.ts with browser-based identity for multi-user support
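
The summary does not say how code_signing.py verifies generated code. One common approach to integrity checks of this kind is an HMAC over the code text, sketched below with the standard library. The function names and the key handling are assumptions for illustration, not the module's actual API.

```python
import hashlib
import hmac


def sign_code(code: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for a piece of generated code."""
    return hmac.new(key, code.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_code(code: str, signature: str, key: bytes) -> bool:
    """Check a signature in constant time before the code is executed."""
    expected = sign_code(code, key)
    return hmac.compare_digest(expected, signature)
```

With a scheme like this, the server signs code when the agent produces it and refuses to run any code whose signature no longer matches, for example because the client edited it in transit.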

Internationalization (i18n)

  • Full i18n framework: Added react-i18next with English and Chinese locale files across 7 namespaces (common, chart, encoding, messages, model, navigation, upload)
  • Translation guide: Comprehensive TRANSLATION_GUIDE.md for contributors

UI & Design System

  • Design tokens: New tokens.ts with centralized color, spacing, shadow, transition, and radius tokens
  • Canvas redesign: Refactored DataFormulator.tsx and App.tsx with TopNavButton, AppShell navigation, and model management UI
  • Encoding shelf updates: Reworked EncodingShelfCard.tsx and EncodingShelfThread.tsx
  • Removed legacy components: Deleted ConceptCard.tsx, ConceptShelf.tsx, DerivedDataDialog.tsx

Model Management

  • Server-side global models: New model_registry.py for managing model configurations server-side
  • Model selection dialog: Enhanced ModelSelectionDialog.tsx with multi-model support

Infrastructure & DevOps

  • Docker support: New Dockerfile, docker-compose.yml, docker-compose.test.yml with volume permissions and sandbox user handling
  • Updated dev container: Refreshed .devcontainer/devcontainer.json
  • Dependency management: Migrated from npm to yarn, added uv.lock, updated pyproject.toml and requirements.txt

Testing

  • Comprehensive test suite: 69 new test files (~8K lines) covering backend unit, integration, contract, security, plugin, and frontend unit tests
  • Test infrastructure: New vitest.config.ts, pytest.ini, conftest.py, frontend setup, and test_plan.md
  • Database plugin tests: Docker-based test harnesses for MySQL, PostgreSQL, MongoDB, and BigQuery

Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
@Chenglong-MS Chenglong-MS requested a review from Copilot March 24, 2026 20:34
Copilot AI (Contributor) left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@Chenglong-MS Chenglong-MS requested a review from Copilot March 24, 2026 20:36

Comment thread py-src/data_formulator/routes/agents.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
Comment thread py-src/data_formulator/tables_routes.py Fixed
@Chenglong-MS Chenglong-MS requested a review from zhb-ai April 8, 2026 05:46
zhb-y-agent and others added 3 commits April 9, 2026 02:14
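Add migration notes for the Superset integration code, renumber the steps, and add documentation deliverable requirements
Add the design decisions and implementation approach for data provenance descriptions, which are assembled from templates rather than generated by AI to guarantee accuracy and refreshability. A description covers the source, filter conditions, time range, and similar details, and is stored automatically in loader_metadata for use by the frontend and the AI.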
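
A template-based provenance description of the kind this commit describes might look like the sketch below. The wording of the template, the function names, and the `provenance_description` key inside loader_metadata are assumptions. Only the idea of deterministic assembly from source, filters, and time range comes from the commit message.

```python
from typing import Dict, Optional, Sequence, Tuple


def describe_provenance(
    source: str,
    filters: Sequence[str] = (),
    time_range: Optional[Tuple[str, str]] = None,
) -> str:
    """Build a provenance sentence from fixed templates, with no AI call."""
    parts = [f"Source: {source}"]
    if filters:
        parts.append("Filters: " + "; ".join(filters))
    if time_range is not None:
        start, end = time_range
        parts.append(f"Time range: {start} to {end}")
    return ". ".join(parts) + "."


def attach_provenance(
    loader_metadata: Dict[str, object], **provenance
) -> Dict[str, object]:
    """Return a copy of the metadata with the description stored in it."""
    updated = dict(loader_metadata)
    # "provenance_description" is a hypothetical key name.
    updated["provenance_description"] = describe_provenance(**provenance)
    return updated
```

Because the text is a pure function of the loader parameters, it can be regenerated on every refresh and always matches the data that was actually loaded.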
Comment thread py-src/data_formulator/session_routes.py Fixed
Comment thread py-src/data_formulator/session_routes.py Fixed
Chenglong-MS and others added 7 commits April 8, 2026 23:56
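Update the design documents to change the environment variable prefix of all data source plugins from a bare prefix (e.g. SUPERSET_) to a PLG_ prefix (e.g. PLG_SUPERSET_), avoiding conflicts with the environment variables used for LLM model configuration
Update the data source plugin architecture document, simplifying the external metadata design to an opaque blob format
Add a new design document analyzing the problem of injecting language instructions into multilingual prompts, along with solutions
Refine the language injection architecture notes and identify the workspace naming issue that needs fixing
Remove unrelated solutions and focus on proper use of the existing architecture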
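
The PLG_ prefix convention from the first commit in this group can be illustrated with a short sketch. The helper name `plugin_settings` and the lower-casing of keys are assumptions. Only the `PLG_<PLUGIN>_` naming scheme comes from the commit message.

```python
from typing import Dict, Mapping

PLUGIN_ENV_PREFIX = "PLG_"


def plugin_settings(plugin_name: str, environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect the variables that belong to one plugin, e.g. PLG_SUPERSET_*."""
    prefix = f"{PLUGIN_ENV_PREFIX}{plugin_name.upper()}_"
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }
```

Calling `plugin_settings("superset", os.environ)` would pick up `PLG_SUPERSET_URL` but ignore a bare `SUPERSET_URL` or any model configuration variable, which is the collision the prefix is meant to prevent.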
Comment thread py-src/data_formulator/routes/agents.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
Comment thread py-src/data_formulator/agent_routes.py Fixed
zhb-ai and others added 11 commits April 10, 2026 14:17
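- Update the original design's assumption of a personal local tool to a unified workspace storage model
- Describe in detail the deployment modes supported by the WORKSPACE_BACKEND setting
- Remove the obsolete storeOnServer and DISABLE_DATABASE concepts
- Clarify plugin system design decisions, including server-side configuration and authentication management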
…e EasyAuth support

Add pluggable authentication system framework with environment variable-based provider configuration
Add Azure App Service EasyAuth authentication provider implementation
Refactor existing authentication logic to support multiple authentication sources
Add relevant test cases and documentation markers
…extend data upload dialog

- Export buildDictTableFromWorkspace function in tableThunks.ts and add new loadPluginTable async thunk
- Extend UnifiedDataUploadDialog to support plugin data source tabs
- Add plugin host component for handling plugin data loading
- Modify UploadTabType type to support plugin-specific tabs
Add Superset data source plugin with support for browsing and loading datasets via password or SSO login. Integrate OIDC authentication system with frontend login button and callback handling. Extend authentication regex to support OIDC's sub claim format. Add GitHub OAuth as an alternative authentication option. Introduce plugin system framework supporting automatic discovery and registration of frontend and backend plugins. Optimize internationalization files with Superset-related translations. Add test cases covering authentication and plugin functionality.

New files include:

- Superset plugin frontend components and API modules
- OIDC authentication configuration and callback pages
- GitHub OAuth gateway and authentication provider
- Plugin system base classes and registration logic
- Related test cases and internationalization resources
Modified files involve:

- Enhanced authentication logic
- Extended application configuration interfaces
- Dependency updates (pyproject.toml, package.json)
- Security-related regex adjustments
…plugin interface

Refactor Superset plugin interface with dashboard search and filtering capabilities
Migrate translation files from public directory to plugin local directory
Add plugin translation registration mechanism with support for dynamically loading plugin localization resources
Optimize table name generation logic with automatic suffix functionality
Improve filter dialog interactions with support for more operators and type checking
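Add Section 12 to the data source plugin architecture document, detailing how plugins ship their own translation files and have them merged automatically by the framework. This approach keeps plugin translations from being mixed into the host project, keeps plugins self-contained, and supports automatic language switching.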
zhb-y-agent and others added 30 commits May 3, 2026 05:28
Add _build_sort_clause static method for building ORDER BY clauses
Modify fetch_data_as_arrow method to support sorting options
Add related unit tests to verify sorting functionality
… security design document

Add ISSUE-005 design document, detailing the current status of sorting capabilities and column name concatenation security issues in data loaders, root cause analysis, fix plan, and testing strategy
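
The two commits above concern building ORDER BY clauses and the risk of concatenating column names into SQL. A minimal sketch of a sort clause builder that addresses both is shown below. It is not the PR's `_build_sort_clause`: the signature, the allow-list parameter, and the double-quote identifier style are assumptions.

```python
from typing import Collection, Sequence, Tuple


def build_sort_clause(
    sort_columns: Sequence[Tuple[str, str]],
    allowed_columns: Collection[str],
) -> str:
    """Build an ORDER BY clause from (column, direction) pairs."""
    parts = []
    for name, direction in sort_columns:
        # Only columns known to exist in the table may be sorted on.
        if name not in allowed_columns:
            raise ValueError(f"unknown sort column: {name!r}")
        direction = direction.upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"invalid sort direction: {direction!r}")
        # Quote the identifier and escape embedded quotes instead of
        # concatenating the raw name into the SQL text.
        quoted = '"' + name.replace('"', '""') + '"'
        parts.append(f"{quoted} {direction}")
    return "ORDER BY " + ", ".join(parts) if parts else ""
```

The allow-list check and the quoting are deliberately redundant: either one alone blocks a name such as `x; DROP TABLE t`, and together they also cover legitimate column names that contain spaces or quotes.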
… library

Merge skills and experiences directories into a unified library directory, keeping rules independent. Main changes include:
- Modify constant definitions and initialization logic in knowledge/store.py
- Update category references in API routes and Agent tools
- Adjust frontend type definitions and state management
- Optimize ExperienceDistillAgent prompts to distill general methodologies
- Update related test cases
…periences

- Merge original three directories rules/skills/experiences into rules/experiences
- KnowledgeStore.search() automatically skips rules with alwaysApply=true
- Update ExperienceDistillAgent to distill general methodologies rather than specific cases
- Frontend simplified from three tabs to Rules/Experiences two sections
- Injection text uses semantic tags [knowledge]/[rule] instead of directory names
…ience distillation

Add timeout parameter configuration in API, backend routes, and agent, frontend sets request timeout based on configuration
…gorithm

Refactor knowledge base rule injection logic, unify duplicate code into KnowledgeStore.format_rules_block() method, support preloading data to avoid secondary disk reads. Improve search algorithm to tokenization + multi-field weighted matching, support Chinese-English mixed query splitting and table name tag bonus. Update related documentation and test cases.

- Add format_rules_block() and load_always_apply_rules() methods
- Implement _tokenize_query() supporting Chinese-English mixed tokenization
- Improve _match_score() weighted algorithm and table name tag bonus
- Update DataLoadingAgent to use last user message as search query
- Refactor rule injection code in 6 Agents
- Improve test coverage and development documentation
- Update Superset SSO configuration example, add DF Token Exchange endpoint
- Supplement SSO token exchange mode documentation, including flow, deployment steps, and security notes
- Mark user metadata feature as completed, abandon imported table editing approach
- Streamline knowledge system documentation, archive completed content
- Update user isolation design document, record implemented portions
- Adjust knowledge injection planning document, highlight core conclusions and implemented portions
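
The knowledge search changes listed above (mixed Chinese-English tokenization, weighted multi-field matching, the table name tag bonus, and skipping rules with alwaysApply=true) can be sketched roughly as follows. The field names, weights, bonus value, and function names are all assumptions. The real `_tokenize_query()` and `_match_score()` in KnowledgeStore may work differently.

```python
import re
from typing import Dict, List, Sequence

# Runs of ASCII word characters or runs of CJK characters become tokens.
_TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[\u4e00-\u9fff]+")
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}  # hypothetical weights
TABLE_TAG_BONUS = 2.0  # hypothetical bonus


def tokenize_query(query: str) -> List[str]:
    """Split a mixed Chinese-English query into lowercase tokens."""
    return [token.lower() for token in _TOKEN_RE.findall(query)]


def match_score(entry: Dict, tokens: Sequence[str], table_names: Sequence[str] = ()) -> float:
    """Sum weighted substring hits per field, plus a bonus for table tags."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = entry.get(field, "")
        text = " ".join(value) if isinstance(value, list) else str(value)
        text = text.lower()
        score += weight * sum(1 for token in tokens if token in text)
    tags = {tag.lower() for tag in entry.get("tags", [])}
    score += TABLE_TAG_BONUS * sum(1 for name in table_names if name.lower() in tags)
    return score


def search(entries: Sequence[Dict], query: str, table_names: Sequence[str] = (), limit: int = 5) -> List[Dict]:
    """Rank entries by score, skipping rules that are always injected."""
    tokens = tokenize_query(query)
    scored = [
        (match_score(entry, tokens, table_names), entry)
        for entry in entries
        if not entry.get("alwaysApply")
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]
```

Skipping alwaysApply rules in `search` avoids returning entries that are already injected into every prompt.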
Add Chinese and English translations for Agent logs, including status messages and expand/collapse functionality
Implement long message folding and expanding functionality, optimize message display experience
Enhance Agent step display, support icon differentiation for error, warning, and info states
Fix JSON parsing issues when weak models call tools, add validation logic
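
The commit does not spell out the validation that was added for tool calls from weaker models. A minimal sketch of one plausible form is given below: parse the argument string defensively and return an error message instead of raising. The function name, the return shape, and the `required` parameter are assumptions.

```python
import json
from typing import Dict, Optional, Sequence, Tuple


def parse_tool_arguments(
    raw: str, required: Sequence[str] = ()
) -> Tuple[Optional[Dict], Optional[str]]:
    """Return (arguments, None) on success or (None, error message)."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    # Tool arguments are expected to be a JSON object, not a list or scalar.
    if not isinstance(value, dict):
        return None, "arguments must be a JSON object"
    missing = [key for key in required if key not in value]
    if missing:
        return None, "missing required keys: " + ", ".join(missing)
    return value, None
```

Returning the error as data lets the agent loop feed the message back to the model and ask it to retry, rather than aborting the whole step.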
…le loading plan support

Implement AI functionality for data loading assistant, including:
1. Add LoadPlan type to support multi-table loading plans
2. Add tools for searching data candidates, reading metadata, and proposing loading plans
3. Implement loading plan confirmation card UI component
4. Update i18n translations to support new features
5. Add backend test cases to verify data discovery tools
6. Rename "Data Extraction" to "Data Assistant" to better reflect functionality
feat(data loading): add AI data assistant functionality and multi-tab…
- Add pagination and sorting functionality to table component
- Support backend pagination queries and local sorting
- Add loading state display and pagination information
- Update internationalization text to support pagination display
Set unified maximum row limit for data loading to 2 million rows and remove frontend row selector and related code. Modify all data loaders to use MAX_IMPORT_ROWS constant and update related tests.

- Add MAX_IMPORT_ROWS constant (2 million rows) and apply to all data loaders
- Remove frontend row selector component and related UI code
- Update test cases to verify row limit logic
- Adjust data loading chat agent to use system-configured row limit
Update related design documents and development guidelines to reflect unified limit policy
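
As a rough illustration of the unified limit, the sketch below caps any row iterator at a shared constant. The constant name and value come from the commit message. The helper `limit_rows` is hypothetical, and the actual loaders may instead apply the limit inside their queries.

```python
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")

MAX_IMPORT_ROWS = 2_000_000  # unified limit named in the commit


def limit_rows(rows: Iterable[T], max_rows: int = MAX_IMPORT_ROWS) -> List[T]:
    """Read at most max_rows items without consuming the rest of the source."""
    return list(islice(rows, max_rows))
```

Because every loader reads the same constant, no per-request row selector is needed on the frontend.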
Global row limit is now handled automatically by the system, so remove all rowLimit related logic from frontend, backend, and test code
refactor(data loading): remove rowLimit related code
…oader

Merge verbose_name, description, and expression into a single description field for improved catalog search, while preserving individual fields for consumers that need them separately.
- Enhanced the oauth_config.py example with detailed comments and a new staging handler for SSO integration.
- Updated the superset_config.py to include a TokenExchangeView for silent token exchange and improved security checks.
- Revised the SSO configuration guide to clarify the structure and requirements for SSO user information parsing and role mapping.
- Refactor data loading to utilize Superset's Chart Data API instead of SQL Lab API, simplifying permission requirements to only `datasource access`.
- Enhance documentation to clarify permission needs and security advantages of the new approach.
- Update related tests to validate the new data retrieval method and ensure compatibility with the Chart Data API.
- Remove obsolete SQL Lab related code and helper functions.
…ized errors

- Introduce `extractErrorMessage` function to handle various error types, including RTK serialized errors, ensuring proper message extraction.
- Update error handling in `DataSourceSidebar`, `DataThread`, and `DBTableManager` components to utilize the new extraction method, preventing `[object Object]` outputs.
- Add tests for `extractErrorMessage` to validate behavior with different error shapes, including ApiRequestError and plain Error instances.
- Refactor existing error handling logic to improve clarity and maintainability.
…ersetLoader

- Introduce functionality to detect and convert epoch-ms temporal columns to appropriate Arrow date/timestamp types.
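
The epoch-millisecond conversion can be sketched with the standard library alone. The rule used here (treat a column as dates when every value falls exactly on a UTC midnight, otherwise as timestamps) is an assumption about how "appropriate" types might be chosen. The real loader targets Arrow types, which this sketch replaces with Python `date` and `datetime` objects.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List, Optional, Tuple, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_DAY = 86_400_000


def convert_epoch_ms(
    values: List[Optional[int]],
) -> Tuple[str, List[Optional[Union[date, datetime]]]]:
    """Convert epoch-ms integers to dates or UTC timestamps, keeping nulls."""
    present = [v for v in values if v is not None]
    # Whole-day values suggest a date column rather than a timestamp column.
    as_dates = bool(present) and all(v % MS_PER_DAY == 0 for v in present)
    converted: List[Optional[Union[date, datetime]]] = []
    for value in values:
        if value is None:
            converted.append(None)
            continue
        moment = EPOCH + timedelta(milliseconds=value)
        converted.append(moment.date() if as_dates else moment)
    return ("date" if as_dates else "timestamp"), converted
```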

7 participants