Working With Large Data Models In SAP Datasphere U...

A few weeks ago, my colleague @Yonggang_Yu published a great blog post: Using Claude Code and Natural Language to Create SAP Datasphere Artifacts

I loved the idea: describe what you need in plain language, and Claude Code translates that into the right CLI commands to create tables, views, and analytic models in Datasphere. I forked his repo and started playing around with it.

It worked really well. But after a while, I ran into a limitation.

The Problem: You Rarely Start from Scratch

When I started testing, I was creating fresh tables and views. That's fun, but it's not what most real projects look like.

In practice, you're usually working inside an existing data model. Maybe it's been growing for months. There are 50 tables, 120 views, and 3 analytic models, and they all depend on each other in ways that aren't always obvious. A business requirement comes in: “we need to add a country code to the reporting layer.” Or a data architect says a column needs to be renamed to match a new naming standard.

That's when the original skills hit a wall. They're great at creating new objects, but they don't help you understand what you already have, or how a change to one object ripples through everything downstream.

I wanted to solve that problem. I used the SAP Group Reporting data model in Datasphere as my test case because it's a real-world, complex model with a deep chain of views layered on top of consolidated finance tables: exactly the kind of thing you'd find in a production environment. I had also worked on it recently and knew it fairly well, and the data model is available to anyone who wants to try this for themselves.

The Challenge: Dependency Chains and Circular Deadlocks

To understand why this is hard, you need to know a bit about how Datasphere enforces consistency.

When you have a chain like this:

SAP_FI_SOURCE_TABLE
    └── SAP_FI_INTERMEDIATE_VIEW
            └── SAP_FI_REPORTING_VIEW
                    └── AM_FINANCE

…and you want to add a new column, you can't just add it to the source table and call it done. Every view in the chain needs to be updated to pass that column through. With 4 objects that's annoying. With 20 objects it's a real problem and takes a lot of time in the Datasphere UI.
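
To make "every view in the chain" concrete, here is a minimal sketch of how the downstream objects of a source could be collected. The `DEPENDS_ON` map is a hand-written assumption that mirrors the chain above; the function name is hypothetical and this is not how the skills or Datasphere store dependencies.

```python
from collections import deque

# Hypothetical dependency map: each object -> the objects it reads from.
DEPENDS_ON = {
    "SAP_FI_SOURCE_TABLE": [],
    "SAP_FI_INTERMEDIATE_VIEW": ["SAP_FI_SOURCE_TABLE"],
    "SAP_FI_REPORTING_VIEW": ["SAP_FI_INTERMEDIATE_VIEW"],
    "AM_FINANCE": ["SAP_FI_REPORTING_VIEW"],
}


def downstream_of(root, depends_on):
    """Return every object that directly or indirectly reads from root, nearest first."""
    # Invert the map so we can walk from a source to its consumers.
    consumers = {name: [] for name in depends_on}
    for name, sources in depends_on.items():
        for source in sources:
            consumers.setdefault(source, []).append(name)

    # Breadth-first walk, so closer objects come before more distant ones.
    seen, order, queue = {root}, [], deque([root])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


print(downstream_of("SAP_FI_SOURCE_TABLE", DEPENDS_ON))
```

Every name in the returned list is an object that has to pass the new column through.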

Renaming or removing a column is even harder. Datasphere (DSP) enforces referential integrity on every save. If you try to rename a column on a view, DSP blocks it with an HTTP 422 error, because downstream objects still reference the old name. But if you try to update the downstream object first, DSP blocks that too – because the upstream view doesn't have the new name yet.

The solution is a flag called --allow-missing-dependencies. It tells Datasphere: “save this, even if some references don't resolve yet, they'll be fixed shortly.” With this flag, the deadlock breaks. Order stops mattering. You can update all objects freely and Datasphere sorts it out once everything is saved.
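
The deadlock is easier to see in a toy model. The sketch below is an assumption-heavy simulation in plain Python, not Datasphere's validation logic and not the CLI: `save`, `unresolved`, `RejectedSave`, and the two view names are all made up for illustration, and `allow_missing=True` only stands in for the effect of the flag as described above.

```python
import copy


class RejectedSave(Exception):
    """Stands in for the HTTP 422 rejection described above (toy model only)."""


def unresolved(space):
    """List (consumer, source, column) references that do not resolve."""
    missing = []
    for name, obj in space.items():
        for source, used in obj["uses"].items():
            available = space[source]["columns"] if source in space else set()
            for column in sorted(used - available):
                missing.append((name, source, column))
    return missing


def save(space, name, definition, allow_missing=False):
    """Save one object; in strict mode refuse if any reference would dangle."""
    candidate = copy.deepcopy(space)
    candidate[name] = definition
    problems = unresolved(candidate)
    if problems and not allow_missing:
        raise RejectedSave(problems)
    space[name] = definition


def fresh_space():
    # VIEW_B reads the column CNSLDTN_UNIT from VIEW_A.
    return {
        "VIEW_A": {"columns": {"CNSLDTN_UNIT"}, "uses": {}},
        "VIEW_B": {"columns": {"CNSLDTN_UNIT"}, "uses": {"VIEW_A": {"CNSLDTN_UNIT"}}},
    }


new_a = {"columns": {"CONSOLIDATION_UNIT"}, "uses": {}}
new_b = {"columns": {"CONSOLIDATION_UNIT"}, "uses": {"VIEW_A": {"CONSOLIDATION_UNIT"}}}

# Strict mode: whichever object is saved first, the save is rejected.
for first, definition in (("VIEW_A", new_a), ("VIEW_B", new_b)):
    try:
        save(fresh_space(), first, definition)
    except RejectedSave as err:
        print(f"{first} first: rejected, dangling references {err.args[0]}")

# Relaxed mode: order no longer matters, and nothing dangles once both are saved.
space = fresh_space()
save(space, "VIEW_B", new_b, allow_missing=True)
save(space, "VIEW_A", new_a, allow_missing=True)
print("unresolved after both saves:", unresolved(space))
```

The point of the model is only that strict validation after every single save makes a two-object rename impossible, while deferring the check makes any order valid.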

Figuring this out and baking it into every skill was one of the key things I had to get right.

The New Skills

I added five new skills on top of Yonggang's original six. Together they cover the “working inside an existing model” use case.

`impact-analysis` – understand before you change

Before touching anything, you (and Claude!) want to know what you're dealing with. This skill scans the entire space, builds a dependency graph of every view and analytic model, and shows you the full upstream/downstream chain for any object.

You tell Claude Code:

“What objects depend on SAP_FI_IL_A_ConsolidationUnit? Show me everything downstream.”

Claude runs the impact-analysis skill and you get a tree like this in the terminal:

SAP_FI_IL_A_ConsolidationUnit [view]
  └── SAP_FI_ConsolidationUnit [view]          (direct)
        └── SAP_FI_IL_T_ConsolidationUnit [view]  (direct)
              └── AM_ConsolidationUnit [AM]         (direct)

Every object is tagged by type ([table], [view], [AM]), ordered from closest to furthest. You can also ask “which of these views are missing column COUNTRY_CODE?” and the skill highlights exactly which ones need updating, giving you a ready-made action plan before a single change is made.
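
For readers curious how such a tree can be produced, here is a small sketch that renders a downstream map as indented text with type tags. The `CONSUMERS` and `KINDS` dictionaries are hand-written assumptions copied from the example above, and `render_tree` is a hypothetical helper, not the code of the impact-analysis skill. It assumes the dependency graph has no cycles.

```python
# Hypothetical downstream map: each object -> the objects that read from it.
CONSUMERS = {
    "SAP_FI_IL_A_ConsolidationUnit": ["SAP_FI_ConsolidationUnit"],
    "SAP_FI_ConsolidationUnit": ["SAP_FI_IL_T_ConsolidationUnit"],
    "SAP_FI_IL_T_ConsolidationUnit": ["AM_ConsolidationUnit"],
}

# Type tag for every object, as shown in the tree above.
KINDS = {
    "SAP_FI_IL_A_ConsolidationUnit": "view",
    "SAP_FI_ConsolidationUnit": "view",
    "SAP_FI_IL_T_ConsolidationUnit": "view",
    "AM_ConsolidationUnit": "AM",
}


def render_tree(root, consumers, kinds):
    """Return the downstream tree of root as a list of indented text lines."""
    lines = [f"{root} [{kinds[root]}]"]

    def walk(name, depth):
        for child in consumers.get(name, []):
            # Two spaces for the first level, six more for each deeper level.
            indent = "      " * (depth - 1) + "  "
            lines.append(f"{indent}└── {child} [{kinds[child]}]")
            walk(child, depth + 1)

    walk(root, 1)
    return lines


print("\n".join(render_tree("SAP_FI_IL_A_ConsolidationUnit", CONSUMERS, KINDS)))
```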

The first run reads the whole space (~2-3 minutes for 100+ objects). After that, results are cached and every subsequent query is instant.

`propagate-columns` – add a column end-to-end

Once you've added a column to a source table, this skill cascades it through every downstream view automatically. You tell Claude:

“Add COUNTRY_CODE (3-char string) to SAP_FI_IL_A_ConsolidationUnit and make sure it appears in all downstream views.”

Claude seeds the table, then runs the propagation. The terminal shows progress as each view is updated and verified:

[1/4] SAP_FI_ConsolidationUnit          → updated ✓
[2/4] SAP_FI_IL_T_ConsolidationUnit     → updated ✓
[3/4] SAP_FI_IL_A_ConsolidationUnitHierDir → updated ✓
[4/4] AM_ConsolidationUnit              → updated ✓

All objects verified. Deploying in topological order...
Deploy complete.

It walks the graph in topological order (sources before consumers), updates each view, verifies the column is present, then deploys everything in sequence. You can also ask Claude to do a dry run first and it will print the full plan without touching anything.
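
"Topological order" simply means that no object is processed before the objects it reads from. A minimal sketch using Python's standard `graphlib` module is shown here; the object names and the `propagation_plan` helper are hypothetical, and the real skill may compute its order differently.

```python
from graphlib import TopologicalSorter

# Hypothetical space: each object -> the objects it reads from.
DEPENDS_ON = {
    "VIEW_A": ["TABLE_T"],
    "VIEW_B": ["VIEW_A"],
    "VIEW_C": ["VIEW_A"],
    "AM_X": ["VIEW_B", "VIEW_C"],
    "VIEW_Z": ["OTHER_TABLE"],  # unrelated to TABLE_T, must not be touched
}


def propagation_plan(depends_on, changed):
    """Order the objects downstream of `changed` so sources precede consumers."""
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(depends_on).static_order())
    affected = {changed}
    plan = []
    for name in order:
        # An object is affected if it reads from anything already affected.
        if name != changed and affected.intersection(depends_on.get(name, ())):
            affected.add(name)
            plan.append(name)
    return plan


# Dry run: print the plan without changing anything.
plan = propagation_plan(DEPENDS_ON, "TABLE_T")
for step, name in enumerate(plan, start=1):
    print(f"[dry run {step}/{len(plan)}] would update {name}")
```

Because the plan is computed before anything is written, printing it is all a dry run needs to do.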

`rename-column-cascade` – rename across the whole chain

You tell Claude:

“Rename CNSLDTN_UNIT to CONSOLIDATION_UNIT in SAP_FI_REPORTING_VIEW and all objects that depend on it.”

Claude renames the column in the starting view, then propagates the change through every downstream view and analytic model. The --allow-missing-dependencies flag is used on every save automatically, so the circular deadlock described above never becomes your problem.

`remove-column-cascade` – remove across the whole chain

Same pattern as rename, but for removing a column. Useful when cleaning up deprecated fields that have spread across many views. Just tell Claude which column to remove and where to start, and it handles the rest.

`add-columns-to-table` and `add-columns-to-view`

The building blocks used by the cascade skills above. These update a single object precisely, touching all three locations that Datasphere requires for graphical views (element definitions, SELECT projection, and the visual UI model). If any of these gets out of sync, Datasphere silently converts your graphical view to SQL mode, which is hard to recover from.
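
A simple way to think about the "three locations" rule is as a consistency check: a column should be present in all three places or in none of them. The sketch below uses a deliberately flattened, hypothetical view structure with the keys `elements`, `projection`, and `ui_model`. Real Datasphere view definitions are more deeply nested, so treat this purely as an illustration of the check.

```python
def column_locations(view, column):
    """Report in which of the three places a column is present."""
    return {
        "elements": column in view["elements"],
        "projection": column in view["projection"],
        "ui_model": column in view["ui_model"],
    }


def is_in_sync(view, column):
    """A column is consistent if it is in all three places or in none."""
    present = list(column_locations(view, column).values())
    return all(present) or not any(present)


# Hypothetical, flattened view definition with one half-finished edit:
# COUNTRY_CODE was added to the elements and projection but not the UI model.
view = {
    "elements": ["CONSOLIDATION_UNIT", "COUNTRY_CODE"],
    "projection": ["CONSOLIDATION_UNIT", "COUNTRY_CODE"],
    "ui_model": ["CONSOLIDATION_UNIT"],
}

for column in ("CONSOLIDATION_UNIT", "COUNTRY_CODE"):
    print(column, column_locations(view, column), "in sync:", is_in_sync(view, column))
```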

A Concrete Example: Group Reporting

Let me walk through a realistic scenario using the Group Reporting Data Model, which is available as SAP Content.

The model has a deep chain: source integration tables feed intermediate views, which feed reporting views, which feed analytic models. A typical request: “The CDS views in the connected S/4HANA system have new columns; these need to be replicated to DSP and propagated all the way up to the analytic model.”

Step 1: understand what's downstream

“Show me everything downstream of SAP_FIN_CS_IL_I_CNSLDTNGROUPSTRUCTURE_2, and check which objects are missing the column SEGMENT_CODE.”

Claude runs impact-analysis and prints the dependency tree with a gap report at the bottom:

SAP_FIN_CS_IL_I_CNSLDTNGROUPSTRUCTURE_2 [table]
  └── SAP_FIN_CS_IL_A_ConsolidationUnit [view]      SEGMENT_CODE: missing ✗
        └── SAP_FIN_CS_ConsGroup [view]              SEGMENT_CODE: missing ✗
              └── SAP_FIN_CS_IL_T_ConsGroup [view]   SEGMENT_CODE: missing ✗
                    └── AM_GroupReporting [AM]        SEGMENT_CODE: missing ✗

Action plan: 4 objects need updating.
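
The gap report boils down to comparing the columns each downstream object exposes against the column you want. Here is a minimal sketch of that comparison. The object names come from the tree above, but the column sets in `COLUMNS` are invented for illustration and `missing_column` is a hypothetical helper, not the skill's implementation.

```python
# Downstream objects from the tree above, nearest first.
DOWNSTREAM = [
    "SAP_FIN_CS_IL_A_ConsolidationUnit",
    "SAP_FIN_CS_ConsGroup",
    "SAP_FIN_CS_IL_T_ConsGroup",
    "AM_GroupReporting",
]

# Hypothetical snapshot of the columns each object currently exposes.
COLUMNS = {
    "SAP_FIN_CS_IL_A_ConsolidationUnit": {"CONSOLIDATION_UNIT"},
    "SAP_FIN_CS_ConsGroup": {"CONSOLIDATION_UNIT", "CONSOLIDATION_GROUP"},
    "SAP_FIN_CS_IL_T_ConsGroup": {"CONSOLIDATION_GROUP"},
    "AM_GroupReporting": {"CONSOLIDATION_UNIT", "CONSOLIDATION_GROUP"},
}


def missing_column(downstream, columns, column):
    """Return the downstream objects that do not expose `column`, in order."""
    return [name for name in downstream if column not in columns[name]]


gaps = missing_column(DOWNSTREAM, COLUMNS, "SEGMENT_CODE")
for name in gaps:
    print(f"{name}: SEGMENT_CODE missing")
print(f"Action plan: {len(gaps)} objects need updating.")
```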

Step 2: make the change

“Add SEGMENT_CODE (string, 10 chars, label ‘Segment Code’) to that table and propagate it to all downstream views.”

Claude seeds the source table, then cascades the column through the chain, updating, verifying, and deploying each object in order:

Adding SEGMENT_CODE to SAP_FIN_CS_IL_I_CNSLDTNGROUPSTRUCTURE_2... ✓

Propagating downstream:
[1/4] SAP_FIN_CS_IL_A_ConsolidationUnit    → updated ✓
[2/4] SAP_FIN_CS_ConsGroup                 → updated ✓
[3/4] SAP_FIN_CS_IL_T_ConsGroup            → updated ✓
[4/4] AM_GroupReporting                    → updated ✓

All objects verified. Deploying in topological order...
Deploy complete.

What would have been an hour of manual UI work (opening each view, adding the column in three places, saving, deploying) runs unattended in a couple of minutes.

Limitations

This is still early stage work, and it's worth being honest about where it falls short.

The skills cover the most common operations well, but Claude doesn't always take the direct path. Sometimes it runs into loops, asks for permissions repeatedly, or tries to figure things out step by step when the skill should have handled it cleanly. This is partly a prompting problem, partly a maturity problem – the more you use it, the better you get at describing what you want clearly.

More importantly: always check what Claude did. The cascade skills make real changes to real objects and deploy them. The backups are there for a reason. Before calling something done, open a few of the affected views in the Datasphere UI and verify the column is actually there, the view still renders in graphical mode, and nothing looks off. Automated doesn't mean infallible.

SQL views are a known gap – they can't be edited programmatically, so if one sits in the middle of a dependency chain, the cascade stops there and you have to finish manually. This could probably be solved by additional skills, but I have not looked into it yet.
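
One practical consequence is that it helps to know in advance where a cascade will stop. The sketch below splits an ordered downstream chain at the first SQL view, so you can see which objects can be handled automatically and which are left for manual work. The object names, the `"sql_view"` tag, and `split_cascade` are assumptions made for this example.

```python
def split_cascade(chain, kinds):
    """Split an ordered downstream chain at the first SQL view.

    Returns (automatic, manual): objects before the first SQL view, and the
    SQL view plus everything after it.
    """
    for index, name in enumerate(chain):
        if kinds[name] == "sql_view":
            return chain[:index], chain[index:]
    return chain, []


# Hypothetical chain with an SQL view in the middle.
chain = ["VIEW_A", "VIEW_B", "SQL_VIEW_C", "AM_X"]
kinds = {
    "VIEW_A": "graphical_view",
    "VIEW_B": "graphical_view",
    "SQL_VIEW_C": "sql_view",
    "AM_X": "analytic_model",
}

automatic, manual = split_cascade(chain, kinds)
print("cascade can update:", automatic)
print("finish manually:", manual)
```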

What's Next

There's a lot of room to grow here: better SQL view support, smarter conflict resolution, maybe skills that read a requirements doc and figure out what needs changing on their own, as Yonggang mentioned at the end of his blog post. I'm planning to keep adding things as I run into new use cases, and I'd genuinely welcome contributions from anyone who hits a gap and wants to fill it.

The code for all new skills is on GitHub: https://github.com/FredericWall/dsp-cli
Feel free to check out the repo; any contribution would be appreciated 🙂

Thanks to Yonggang for the original project and the idea. This is built directly on top of his work.
