AI: Challenges of structured data

  • By sujay
  • 01/10/2025

Generative AI has proven enormously useful for working with text, from writing emails and answering questions to drafting wedding speeches. AI models trained on text, such as large language models (LLMs), have increased this utility, and they keep getting better at natural language tasks.

However, applying these models to structured, tabular data, which is essential for companies' operational tasks, raises a few challenges. This imbalance is partly due to the availability of training data. Text for training models is plentiful and is often pulled from the Internet. Tabular data, on the other hand, is rare, especially data with several linked tables.

To carry the progress of AI over into the corporate context, researchers who train these models and compare their performance in corporate settings need realistic tabular data. For this reason, SAP has developed “Sales Autocompletion Linked Business Tables” (Salt), a specially compiled dataset with anonymized data from a customer’s ERP system.

Salt was developed specifically to support researchers who work on AI models for practical business contexts. It is available on Hugging Face and GitHub.

Challenges: procurement and handling of company data

Until now, providing the research community with realistic company data such as Salt has not been easy. Data protection, confidentiality and economic interests make it difficult to obtain large, cleaned, high-quality company datasets for training and benchmarking models for specific applications. As a result, the gap between the data researchers work with and actual company data keeps growing.

Beyond the lack of availability, corporate data is also complex. First, business data is usually stored in several interconnected tables. An entry in a customer order can be connected to numerous tables, for example through customer numbers that are linked to a supplier table with address data. Second, the tables themselves are heterogeneous in the data types they can contain. For example, one field may hold text while another holds numerical or categorical values. Finally, business data often shows significant imbalances within columns. A certain product category may appear in up to 90 percent of all customer orders, for example, while others occur only rarely.
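The three properties described above (linked tables, mixed data types and imbalanced columns) can be made concrete with a minimal Python sketch. The tables, column names and values below are hypothetical and hand-made for illustration; they are not taken from Salt and do not reflect its actual schema.

```python
from collections import Counter

# Hypothetical customer table, keyed by customer number (not real Salt data).
customers = {
    "C001": {"name": "Acme GmbH", "country": "DE"},
    "C002": {"name": "Globex Inc", "country": "US"},
}

# Hypothetical order table: 9 of 10 orders fall into one product category.
orders = [
    {
        "order_id": i,                                  # numerical
        "customer_id": "C001" if i % 2 else "C002",     # key into `customers`
        "product_category": "hardware" if i < 10 else "services",  # categorical
        "net_value": 100.0 * i,                         # numerical
        "note": "standard",                             # free text
    }
    for i in range(1, 11)
]


def join_orders_with_customers(orders, customers):
    """Resolve each order's customer_id against the linked customer table."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


def column_types(rows):
    """Map each column name to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for column, value in row.items():
            types.setdefault(column, set()).add(type(value).__name__)
    return types


def category_shares(rows, column):
    """Return the relative frequency of each value in a column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


joined = join_orders_with_customers(orders, customers)
print(joined[0]["country"])
print(column_types(orders))
print(category_shares(orders, "product_category"))
```

In this toy data the "hardware" category accounts for 90 percent of the orders, which mirrors the kind of skew mentioned above. A model evaluated on such a column can look accurate simply by always predicting the dominant value, which is one reason realistic benchmarks matter.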

The best way to help researchers develop corporate models that meet these challenges is to provide accurate company data.

Salt – the new data set

Accurate corporate data is in short supply in AI research. The Salt dataset addresses this by giving the research community the first real ERP dataset. Salt uses actual industry data from an ERP system in which customer orders are recorded. To protect confidentiality, the data was minimally processed.

“There is a gap between science and industry in terms of data. For data protection reasons, this cannot be closed so easily,” says Tassilo Klein from the Research/Salt area at SAP. “But we want the research community to work on real and not just simulated problems.”

ERP systems help companies manage their core business processes, such as finance and expenditure management. With millions of entries and extensive linked relational tables, most of which come from the sales area, the Salt dataset replicates customer interactions in an ERP system. Because it is built on real-world company data, Salt is a solid basis for models to better capture the characteristics of corporate data and to validate their performance through benchmarking. In addition, Salt should help researchers develop better foundation models for linked business data.

If all of this succeeds, automation in companies will advance, since many business processes are based on data in structured table formats. Although this data plays a crucial role in companies' day-to-day business, generative AI has not yet managed to fully unlock its potential.

“Salt is a first step toward providing researchers with authentic, representative industry data that offers a glimpse into actual corporate data. For the time being, we are starting with only one customer and one application,” explains Johannes Hoffart, Chief Technology Officer of Business AI at SAP. “However, we plan to publish further datasets that cover a larger range of customers and applications. Together with Salt, these can then be used as a basis for pre-training, adapting and benchmarking models.”

Another motivation for the publication of this data is cooperation with universities.

“We hope to collaborate with partners from academia, who can usually only publish their results in open repositories,” said Klein. “Another hope is that this dataset will encourage more people to try out and validate new methods that help foundation models deal better with tabular corporate data.”

What SAP is doing

In addition to its commitment to the open research community with Salt, SAP is developing the SAP Foundation Model to process tabular corporate data. This AI model, built specifically for tabular data, is intended to shorten the time to value for predictive tasks based on tabular data. The underlying model should be able to work with tabular data right away, with little or no additional training data. The Portal paper, which was published in connection with Salt, offers a first look at what this model could look like.

Knowledge graphs play an important role here. They work on the basis of metadata (the who, what and when of data), which can be used to capture the connections between pieces of information. This enables a structured, networked representation of the data that AI models can easily understand and use. With the help of SAP Knowledge Graph, the SAP Foundation Model can be scaled to a variety of different applications and adapted with minor fine-tuning.
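As a rough illustration of the general idea, metadata about tables can be stored as subject-relation-object triples and traversed as a graph. The sketch below is a generic Python example with hypothetical entity and relation names; it does not use or represent the SAP Knowledge Graph or any SAP API.

```python
from collections import defaultdict

# Hypothetical metadata triples (subject, relation, object).
triples = [
    ("sales_order", "has_column", "customer_id"),
    ("customer_id", "references", "customer"),
    ("customer", "has_column", "address"),
    ("sales_order", "created_by", "sales_rep"),
]


def build_graph(triples):
    """Index triples by subject so outgoing connections can be looked up."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def reachable(graph, start):
    """Return every node reachable from `start` by following connections."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


graph = build_graph(triples)
print(sorted(reachable(graph, "sales_order")))
```

Following the connections from the order entity reaches the customer and its address even though the order itself stores only a customer number, which is the kind of networked context a model could draw on.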
