Access to artificial intelligence (AI) and the drive for adoption by organizations are more prevalent now than ever, yet many companies are struggling with how to manage data and the overall adoption process. As companies open this “Pandora’s box” of new capabilities, they must be prepared to manage data inputs and outputs in secure ways or risk allowing their private data to be consumed in public AI models.

Through this evolution, it is critical that companies remember that ChatGPT is a public model, built to grow and improve through continued use. Private instances will soon be available, in which the model answers prompted questions solely from selected internal data. As such, it is important that companies determine where public use cases are appropriate (e.g., non-sensitive information) versus what mandates a private instance (e.g., company financial information and other data sets that are internal and/or confidential).

All in . . . but what about the data?

The popularity of recently released AI platforms such as OpenAI’s ChatGPT and Google Bard has led to a mad rush for AI use cases. Organizations are envisioning a future in which AI platforms will consume company-specific data in a closed environment, rather than the global ecosystem that is common today. AI relies on large sets of data to create output, but it is limited by the quality of the data the model consumes. This was on display during the initial test releases of Google Bard, which provided a factually inaccurate answer about the James Webb Space Telescope based on reference data it had ingested. Often, individuals want to drive toward the end goal first (implementing automation of data practices) without going through the necessary steps to discover, ingest, transform, sanitize, label, annotate, and join key data sets together. Without these steps, AI may produce inconsistent or inaccurate output that could put an organization in the risky position of acting on unvetted insights.
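As a rough illustration of that preparation sequence, the sketch below (Python with pandas; the table names, columns, and quality checks are invented for illustration, not a prescribed toolchain) shows what ingest-to-join preparation might look like before any data reaches a model:

```python
import pandas as pd

def prepare_for_ai(raw_sources: dict) -> pd.DataFrame:
    """Illustrative prep sequence: ingest, transform, sanitize,
    label/annotate, and join key data sets before model training."""
    # Ingest: start from the discovered source tables.
    customers = raw_sources["customers"]
    orders = raw_sources["orders"]

    # Transform: normalize types so downstream joins and math behave.
    orders["order_date"] = pd.to_datetime(orders["order_date"])

    # Sanitize: drop records that fail basic quality checks.
    customers = customers.dropna(subset=["customer_id"])
    orders = orders[orders["amount"] > 0]

    # Label/annotate: tag records with domain and sensitivity.
    customers = customers.assign(data_domain="customer",
                                 sensitivity="confidential")

    # Join: combine the key data sets on an agreed, governed key.
    return orders.merge(customers, on="customer_id", how="inner")
```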

Through data governance practices, such as accurately labeled metadata and trusted parameters for ownership, definitions, calculations, and use, organizations can organize and maintain their data in a way that is usable for AI initiatives. Recognizing this challenge, many organizations are now focusing on how to appropriately curate their most useful data so that it can be readily retrieved, interpreted, and utilized to support business operations.

Storing and retrieving governed data

Influential technology like natural language processing (NLP) allows responses to be retrieved from questions asked conversationally or as a standard business request. This process parses a request into meaningful components and ensures that the right context is applied within a response. As the technology evolves, this function will allow a company’s specific lexicon to be accounted for and processed through an AI platform. One application may be defining company-specific attributes for particular phrases (e.g., how a ‘customer’ is defined for an organization vs. the broader definition of a ‘customer’) to ensure that organizationally agreed nomenclature and meaning are applied in AI responses. For instance, an individual might request, “create a report that highlights the latest revenue by division for the past two years,” and expect the platform to apply all the necessary business metadata that an analyst and management would expect.

Historically, such a request required individuals to convert the ask into a query that could be run against a standard database. AI and NLP technology is now capable of processing both the request and the underlying results, enabling data to be interpreted and applied to business needs. However, the main challenge is that many organizations do not have their data in a form that can be stored, retrieved, and utilized by AI, generally because individuals have taken non-standard approaches to obtaining data and made assumptions about how to use data sets.
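As a toy illustration of the idea, the sketch below shows how a governed business glossary might translate a parsed request into a query; the glossary entries, table names, and columns are hypothetical:

```python
# Hypothetical business glossary: organization-specific terms mapped
# to governed tables and columns (all names invented).
GLOSSARY = {
    "revenue": {"table": "finance.revenue", "column": "net_revenue_usd"},
    "division": {"table": "finance.revenue", "column": "division_name"},
}

def build_report_query(metric: str, dimension: str, years: int) -> str:
    """Turn a parsed business request into SQL using governed terms."""
    m, d = GLOSSARY[metric], GLOSSARY[dimension]
    return (
        f"SELECT {d['column']}, SUM({m['column']}) AS {metric} "
        f"FROM {m['table']} "
        f"WHERE fiscal_year >= EXTRACT(YEAR FROM CURRENT_DATE) - {years} "
        f"GROUP BY {d['column']};"
    )

# "Create a report that highlights the latest revenue by division
# for the past two years."
print(build_report_query("revenue", "division", 2))
```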

Setting and defining key terms

A critical step for quality outputs is having data organized in a way that can be properly interpreted by an AI model. The first step in this process is to ensure the right technical and business metadata is in place. The following aspects of data should be recorded and available:

Term definition

Calculation criteria (as applicable)

Lineage of the underlying data sources (upstream/downstream)

Quality parameters

Uses/affinity mentions within the business

Ownership

The above criteria should be used as a starting point for enhancing the fields and tables captured to enable proper business use and application. Accurate metadata is critical to ensure that private algorithms can be trained to emphasize the most important data sets with reliable and relevant information.

A metadata dictionary with appropriate processes for updating and verifying the data will support consistent data usage and maintain a clean, usable data set for transformation initiatives.
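A minimal sketch of what one entry in such a metadata dictionary could look like appears below; the field names mirror the criteria listed above, and the example values are invented:

```python
from dataclasses import dataclass

@dataclass
class MetadataEntry:
    """One record in a business metadata dictionary."""
    term: str
    definition: str
    calculation: str          # calculation criteria (as applicable)
    upstream_sources: list    # lineage: where the data comes from
    downstream_uses: list     # lineage: where the data flows to
    quality_parameters: dict  # e.g., completeness, freshness
    business_uses: list       # uses/affinity mentions in the business
    owner: str                # accountable party

net_revenue = MetadataEntry(
    term="Net Revenue",
    definition="Gross revenue minus returns, discounts, and allowances.",
    calculation="gross_revenue - returns - discounts - allowances",
    upstream_sources=["erp.sales_orders"],
    downstream_uses=["finance.quarterly_reporting"],
    quality_parameters={"completeness": ">= 99%", "freshness": "daily"},
    business_uses=["board reporting", "division P&L"],
    owner="finance-data-office@example.com",
)
```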

Understanding the use case and application

Once the right information is recorded about the foundation of the underlying data set, it is critical to understand how the data is ultimately used and applied to a business need. Key considerations include documenting the sensitivity of the information recorded (data classification); organizing data sets under a logical data domain structure (data labeling); applying boundaries on how data is shared and stored (data retention); and defining protocols for destroying data that is no longer essential or whose removal has been requested and is legally required (data deletion).
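The sketch below illustrates, using invented domain names and values, how those four considerations might be encoded as a simple policy map that downstream AI pipelines could consult:

```python
# Invented policy map tying the four considerations above to data domains.
DATA_POLICIES = {
    "customer_pii": {
        "classification": "restricted",                 # data classification
        "domain_label": "customer",                     # data labeling
        "sharing": "internal-only",                     # sharing/storage bounds
        "retention_days": 365 * 7,                      # data retention
        "deletion": "hard delete on verified request",  # data deletion
    },
    "public_marketing": {
        "classification": "public",
        "domain_label": "marketing",
        "sharing": "unrestricted",
        "retention_days": 365 * 2,
        "deletion": "standard purge cycle",
    },
}

def may_feed_public_model(domain: str) -> bool:
    """Gate: only non-sensitive domains should reach a public AI model."""
    return DATA_POLICIES[domain]["classification"] == "public"

print(may_feed_public_model("customer_pii"))  # False
```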

An understanding of the correct use and application of underlying data sets allows for proper decision-making about other ways the data can be used, as well as about areas an organization may want to avoid based on strategic direction and legal and/or regulatory guidance. Furthermore, the storage and maintenance of business and technical metadata will allow AI platforms to customize the content and responses they generate, ensuring organizations receive both tailored question handling and relevant response parsing; this will ultimately enable company-specific language processing capabilities.

Prepare now for what’s coming next

It is now more critical than ever that the right parameters are placed around how and where data is stored, ensuring the right data sets are retrieved by human users while allowing for the growth and enablement of AI use cases going forward. AI model training relies on clean data, which can be enforced through governance of the underlying data set. This further escalates the demand for appropriate data governance to ensure that valuable data sets can be leveraged.

This shift has greatly accelerated the need for data governance, turning what some may have seen as a ‘nice to have,’ or even an afterthought, into a ‘must have’ capability that allows organizations to remain competitive and be truly transformative in how they use data, their most valuable asset, both internally for operations and with their customers in an advanced data landscape. AI is putting the age-old adage of ‘garbage in, garbage out’ on steroids: any data defect flowing into the model can become a portion of the output, further highlighting the importance of tightening your data governance controls.

Read the results of Protiviti’s Global Technology Executive Survey: Innovation vs. Technical Debt Tug of War 

Connect with the Author

Will Shuman
Director, Technology Consulting

Data Management

Over the last few months, both business and technology worlds alike have been abuzz about ChatGPT, and more than a few leaders are wondering what this AI advancement means for their organizations. Let’s explore ChatGPT, generative AI in general, how leaders might expect the generative AI story to change over the coming months, and how businesses can stay prepared for what’s new now—and what may come next.

What is ChatGPT?

ChatGPT is a product of OpenAI. It’s only one example of generative AI.

GPT stands for generative pre-trained transformer. A transformer is a type of AI deep learning model that was first introduced by Google in a research paper in 2017. Five years later, transformer architecture has evolved to create powerful models such as ChatGPT.
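At the heart of the transformer is the attention mechanism introduced in that 2017 paper (“Attention Is All You Need”). The minimal numpy sketch below shows scaled dot-product attention only; real models add learned projections, multiple attention heads, and masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every other position, weighted by
    query-key similarity, and returns a weighted mix of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V

# Toy self-attention: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x))
```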

ChatGPT has significantly increased the number of tokens it can accept (4,096 tokens vs. 2,049 in GPT-3), which effectively allows the model to “remember” more of the current conversation and inform subsequent responses with context from previous question-answer pairs. Every time the maximum number of tokens is reached, the conversation resets without context, reminiscent of a conversation with Dory from Pixar’s Finding Nemo.
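That sliding-window effect can be sketched as below; the four-characters-per-token estimate is a crude illustrative stand-in for a real tokenizer, not how ChatGPT actually counts tokens:

```python
MAX_TOKENS = 4096  # ChatGPT's context limit, per the figures above

def fit_to_context(turns: list, count_tokens) -> list:
    """Keep the most recent turns that fit in the window; anything
    older falls out and is effectively 'forgotten' by the model."""
    kept, used = [], 0
    for turn in reversed(turns):            # walk newest to oldest
        cost = count_tokens(turn)
        if used + cost > MAX_TOKENS:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))

# Crude stand-in for a tokenizer: roughly four characters per token.
approx_tokens = lambda text: max(1, len(text) // 4)
window = fit_to_context(["Hi!", "Explain transformers.", "Shorter, please."],
                        approx_tokens)
```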

ChatGPT was trained on a much larger dataset than its predecessors, with far more parameters. ChatGPT was trained with 175 billion parameters; for comparison, GPT-2 had 1.5B (2019), Google’s LaMDA had 137B (2021), and Google’s BERT had 0.3B (2018). These attributes make it possible for users to inquire about a broad set of information.

ChatGPT’s conversational interface is a distinguishing feature of how it offers access to its knowledge. This interface, paired with the increased token limit and an expansive knowledge base built on many more parameters, helps ChatGPT seem quite human-like.

ChatGPT is certainly impressive, and its conversational interface has made it more accessible and understandable than its predecessors. Meanwhile, many other labs have been developing their own generative AI models, with examples originating from Microsoft, Amazon Web Services, Google, IBM, and more, plus partnerships among players. The frequency of new generative AI releases, the scope of their training data, the number of parameters they are trained on, and the tokens they can take in will continue to increase. There will be more developments in the generative AI space for the foreseeable future, and they will become available rapidly. It was just over a year from GPT-2 (February 2019) to GPT-3 (May 2020), two and a half years from there to ChatGPT (November 2022), and only four months more to GPT-4 (March 2023).

How ChatGPT and generative AI fit with conversational AI

[Protiviti graphic: how text-based generative AI fits within the broader conversational AI landscape]

Text-based generative AI can be considered a key component in the broader context of conversational AI. Business applications of conversational AI have, for several years already, included help desks and service desks. A natural language processing (NLP) interpretation layer underpins all conversational AI, as a request must first be understood before it can be answered. Enterprise applications of conversational AI today draw responses from either a set of curated answers or results generated by searching a named information resource. The AI might use a repository of frequently asked questions (producing a pre-defined response) or an enterprise system of record (producing a cited response) as its knowledge base.
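A toy sketch of that curated-answer pattern, with the citation attached, might look like the following; real systems use an NLP interpretation layer rather than the keyword overlap used here, and the FAQ entries are invented:

```python
# Invented curated knowledge base: each answer carries its source,
# so every response can be cited back to a system of record.
FAQ = [
    {"question": "how do i reset my password",
     "answer": "Use the self-service portal under Account > Security.",
     "source": "IT-KB-0042"},
    {"question": "what is the travel reimbursement deadline",
     "answer": "Submit expenses within 30 days of travel.",
     "source": "FIN-POLICY-7"},
]

def respond(user_request: str) -> str:
    """Match the request to a curated answer and cite its source."""
    words = set(user_request.lower().replace("?", "").split())
    best = max(FAQ, key=lambda e: len(words & set(e["question"].split())))
    return f"{best['answer']} (source: {best['source']})"

print(respond("How do I reset my password?"))
```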

When generative AI is introduced into conversational applications, it is impossible today to provide answers that include the source of the information. The nature of a large language model’s generative capability is to create a novel response by compiling and restructuring information from a body of training data. This becomes problematic for enterprise applications, as it is often imperative to cite the information source to validate a response and allow further clarification.

Another key challenge of generative AI today is its obliviousness to truth. It is not a “liar,” because that would imply an awareness of fact versus fiction. It is simply unaware of truthfulness, as it is optimized to predict the most likely response given the context of the current conversation, the prompt provided, and the data set it was trained on. In its current form, generative AI will obligingly produce information as prompted, which means your question may lead the model to produce false information. Any rules or restrictions on responses today are built in as an additive “safety” layer outside the model construct itself.

For now, ChatGPT is finding most of its applications in creative settings. But one day soon, generative AI like ChatGPT will draw responses from a curated knowledge base (like an enterprise system of record), and once some of these current challenges are addressed, more organizations will be able to apply generative AI to a variety of strategic and competitive initiatives.
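One plausible shape for that future, often called retrieval-augmented generation, is sketched below: retrieve a governed passage first, then constrain the model to answer from it so the source can be cited. The helper names and the keyword-overlap retrieval are illustrative assumptions:

```python
def grounded_answer(question: str, knowledge_base: list, generate) -> str:
    """Retrieve a governed passage, then constrain generation to it
    so the source of the response can be cited."""
    # Toy retrieval: pick the passage with the most overlapping words.
    words = set(question.lower().split())
    doc = max(knowledge_base,
              key=lambda d: len(words & set(d["text"].lower().split())))

    prompt = (
        "Answer using ONLY the passage below, and cite it as "
        f"({doc['source']}).\n\n"
        f"Passage: {doc['text']}\n\nQuestion: {question}"
    )
    return generate(prompt)  # any text-generation callable
```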

Leaders can start preparing today for this eventuality, which could come in a matter of months if recent developments indicate how fast this story will continue to move: in November 2022, ChatGPT was accessible only via a web-based interface. By March 2023, ChatGPT’s maker OpenAI had announced the availability of GPT-3.5 Turbo, an application programming interface (API) through which developers can integrate ChatGPT into their applications. The API’s availability doesn’t resolve ChatGPT’s inability to cite sources in its responses, but it indicates how rapidly generative AI capabilities are advancing. Enterprise leaders should be thinking about how advances in generative AI today could relate to their business models and processes tomorrow.
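For illustration, a minimal integration against that API looked roughly like the following at the time, using the openai Python SDK of that era (the interface has since evolved, and the prompts here are invented):

```python
# Requires: pip install openai (the 0.x SDK current in early 2023).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize our returns policy briefly."},
    ],
)
print(response.choices[0].message.content)
```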

What it takes to be ready

Organizations that have already gained some experience with generative AI are in a better position than their peers to apply it one day soon. The next impressive development in generative AI is likely less than six months away. How can organizations find or maintain an edge? The principles of preparing for the great “what’s next?” remain the same, whether the technology in question is generative AI or something else.

It’s hard to achieve a deep, experiential understanding of new technology without experimentation. Leaders should define a process for evaluating these AI technology developments early, as well as an infrastructure and environment to support experimentation.

They should respond to innovations in an agile way, starting small and learning by doing. They should keep track of innovation in the marketplace and look for opportunities to refresh their business and competitive strategies as AI advances become available to them.

They should seed a small cross-functional team to monitor these advancements and experiment accordingly, and educate that team about the algorithms, data sources, and training methods used for a given AI application, as these are critical considerations for enterprise adoption. If they haven’t already, leaders should develop a modular and adaptable AI governance framework to evaluate and sustain solutions, specifically including generative capabilities, such as the high-level outline below:

[Protiviti graphic: high-level outline of an AI governance framework]

Leaders need not wonder what ChatGPT, other generative AI, and other revolutionary technologies might mean for their business and competitive strategy. By remaining vigilant to new possibilities, leaders can create the environment and infrastructure that support identifying new technology opportunities, and prepare to embrace these technologies as they mature for enterprise adoption.

Learn more about Protiviti’s Artificial Intelligence Services.

Connect with the Author

Christine Livingston
Managing Director, Technology Consulting

Artificial Intelligence, Machine Learning

The European Data Protection Board (EDPB) wants to set up a task force to take a closer look at AI tools like ChatGPT, which is being interpreted as an indication that European data protection authorities could set stricter rules for the use of AI.

The Italian data protection authority in particular got a head start a few weeks ago. Because ChatGPT operator OpenAI could not demonstrate working age verification, and because the models behind the AI tool were trained on data from Italian citizens without their knowledge, Italy banned ChatGPT and set the operator a deadline of late April to present plans for improvements.

ChatGPT is threatened with bans across Europe

Other countries in Europe could follow suit with comparable measures. In Germany, for example, Federal Commissioner for Data Protection and Freedom of Information Ulrich Kelber announced that his agency is closely monitoring developments in Italy and that an AI task force of data protection officers has taken up the matter.

A ban could also loom in Spain if data protection colleagues there conclude that ChatGPT violates the EU General Data Protection Regulation (GDPR). Spanish data protection officials have likewise announced a preliminary investigation to shed more light on the practices of OpenAI.

Taking these various approaches into consideration, the EDPB task force aims to promote cooperation and the exchange of information among data protection authorities. Member states also hope to align their political positions, said an insider at a national supervisory authority quoted by Reuters, who asked not to be named. All of this will take time, and the point is not to punish OpenAI, ChatGPT’s owner, or to issue rules, but rather to create general and responsible guidelines that make the use of AI more transparent.

Meanwhile, the EU is currently working on a new legal framework, not only to meet the challenges and opportunities of AI effectively but also to strengthen trust in these rapidly evolving technologies. The framework is also meant to regulate potential effects on individuals, society, and the economy in the best possible way, and to create an economic environment in which research, innovation, and entrepreneurship can flourish. The European Commission aims to increase private and public investment in AI technologies to €20 billion annually.

Commitment by providers is not enough

Although ChatGPT and AI are developing rapidly and stealing headlines around the world, work on such a set of AI rules has been underway for years. In response to these developments, the rules planned so far could be tightened again before anything comes into force.

Despite the dynamics of an ever-changing landscape, the European Parliament intends to enact the world’s toughest regulations for AI use. “Companies’ duty of care alone is not enough,” says Dragoș Tudorache, member of the European Parliament and co-negotiator, in a recent Financial Times article.

To fulfill this objective, the European Parliament plans to oblige AI developers to disclose which data they use to train their algorithms and models. Facial recognition using AI in public spaces would be banned entirely, which is likely to lead to heated debate with police authorities. In addition, AI manufacturers, not users, would be held liable for the misuse of their solutions.

However, an agreement among the EU bodies in Strasbourg and Brussels won’t happen overnight. Once the EU Parliament has a draft, it will be coordinated further with the EU Commission, individual member states, and MEPs, and a final draft law should result from these negotiations. The aim is to pass this law in the current legislative period, which lasts until 2024.

Meanwhile, representatives of the IT industry are warning against strict rules and bans. “We have to drive forward the technological development of AI in Germany and develop a practical set of rules for its application in Europe and worldwide,” said Bitkom president Achim Berg. “The current ban discussion, as initiated by the Federal Data Protection Commissioner, is going in the completely wrong direction.”

Artificial Intelligence, Legal, Regulation

The past several years have thrown numerous challenges at consumer packaged goods (CPG) companies. The pandemic has led to shifting consumer channel preferences, a supply chain crunch, and cost pressure, to name just a few. CPG titan Unilever has been answering the challenge with analytics and artificial intelligence (AI).

The 93-year-old, London-based CPG company is the world’s largest soap producer. Its products include food and condiments, toothpaste, beauty products and much more, including brands like Dove, Hellmann’s, and Ben & Jerry’s ice cream.

Alessandro Ventura, CIO and vice president of analytics and business services for North America at Unilever, has been at the forefront of helping the company apply AI to its businesses for years. While originally in the role of IT director, he has since added analytics and people services to his portfolio.

“That’s everything from facility management, fleet management, employee and facilities services, and people data, and that kind of stuff,” Ventura explains.

Unilever believes AI is not a technology of tomorrow. It’s already being widely used, and Ventura feels all industries will need to adapt to it.

In recent months, Unilever has developed a number of new technology applications to help its lines of business in the markets of tomorrow. One of the most important is “Alex,” short for Alexander the Great. Alex, powered by ChatGPT, filters emails in Unilever’s Consumer Engagement Center, sorting spam from real consumer messages. For the legitimate messages, it then recommends responses to Unilever’s human agents.

“Although Alex is good at what it does, it may lack a bit of a personal touch that instead our consumer engagement center agents have in big quantities,” Ventura says. “So, we let them decide whether they want to respond to our consumer as Alex suggested, or they want to add some personal recommendation; if the answer suggested by Alex is wrong or doesn’t have an answer, they can flag it so Alex can learn it the following time.” 

Generative AI in action

Alex was created using a system of neural networks, with ChatGPT for content generation. Ventura says the tool can understand what a consumer is asking and even capture the tone. It can then store the answer and sentiment in Salesforce. Importantly, he says, the tool does the heavy lifting on those tasks, giving the human agents more time to dedicate to what they do best. To date, Ventura says Alex has helped Unilever reduce the amount of time agents spend drafting an answer by more than 90%.
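The article does not disclose Unilever’s implementation, but the flow Ventura describes could be sketched roughly as below; triage_email, classify, and draft_reply are hypothetical stand-ins for the underlying models and routing:

```python
def triage_email(email_text: str, classify, draft_reply) -> dict:
    """Hypothetical Alex-style flow: filter spam, draft a suggested
    reply, and capture sentiment for the CRM. `classify` and
    `draft_reply` stand in for the underlying models."""
    if classify(email_text) == "spam":
        return {"route": "discard"}

    suggestion = draft_reply(email_text)  # e.g., a ChatGPT-backed draft
    return {
        "route": "agent_review",               # a human approves or edits
        "suggested_reply": suggestion["text"],
        "sentiment": suggestion["sentiment"],  # stored alongside the answer
    }
```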

Another Unilever tool, called Homer, leverages ChatGPT to generate content. It’s a neural network that takes a few details about a product and generates an Amazon product listing, with a short description and a long description that match the brand tone.

“We want to ensure we captured the voice of the brand so, for example, that we differentiate between a TRESemmé and a Dove shampoo, and the system got it absolutely nailed,” Ventura says. 
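A Homer-style interaction could plausibly be reduced to prompt construction like the sketch below; the function name, product details, and prompt wording are invented for illustration:

```python
def build_listing_prompt(brand: str, product: str, details: list) -> str:
    """Assemble a prompt that asks for an Amazon listing in the
    brand's tone of voice, from a few product details."""
    bullets = "\n".join(f"- {d}" for d in details)
    return (
        f"Write an Amazon product listing for {product} by {brand}.\n"
        f"Match {brand}'s brand tone of voice.\n"
        "Include a short description and a long description.\n"
        f"Product details:\n{bullets}"
    )

print(build_listing_prompt(
    "TRESemmé", "Keratin Smooth Shampoo",
    ["reduces frizz", "salon quality", "690 ml bottle"],
))
```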

Another AI-based tool, which Unilever launched during the week of US Thanksgiving, supports the Hellmann’s mayonnaise brand. Its purpose is to reduce food waste.

“It links up with the recipe management system that we have at Hellmann’s, so somebody can go in and select two or three ingredients that they have in the fridge and get in exchange recipes for what they can do with those ingredients,” Ventura says.
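A toy version of that ingredient-matching logic might look like the following; the recipes and scoring are invented, and the real tool links to Hellmann’s recipe management system:

```python
RECIPES = [
    {"name": "Classic Egg Salad",
     "ingredients": {"eggs", "mayonnaise", "celery"}},
    {"name": "Chicken Mayo Sandwich",
     "ingredients": {"chicken", "mayonnaise", "bread"}},
]

def suggest_recipes(fridge: set) -> list:
    """Rank recipes by how many of the user's ingredients they use,
    so leftovers get cooked instead of wasted."""
    matches = [r for r in RECIPES if r["ingredients"] & fridge]
    matches.sort(key=lambda r: len(r["ingredients"] & fridge), reverse=True)
    return [r["name"] for r in matches]

print(suggest_recipes({"eggs", "mayonnaise", "bread"}))
```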

In the first week, the tool attracted 80,000 users, who reported loving it.

For Ventura, that’s the magic of analytics and AI in the CPG space: It enables personalization at scale.

“In CPG, we rely more and more on analytics and AI for different things,” he says. “Consumers are more and more specific about what they want. It’s a bit of a cliché, but they really do want personalized products and experiences. Analytics helps CPG to understand the context they’re navigating through and what the consumer wants, and then, with AI, we can scale that one-to-one relationship across all the multitude of consumers that we have.”

Co-creation key to AI success

Beyond the consumer relationship, analytics and AI are also key to making CPG companies more sustainable. Ventura points to examples like ingredient traceability and using machine learning (ML) to automate forecasting, which in turn helps the company minimize waste. Unilever is also applying analytics and AI to logistics, including tracking inventory and optimizing routes.

“The old interpretation of elasticity, we threw it out the window,” Ventura says of operations in the wake of the inflation crisis. “We had to come up with new calculations because the traditional ones were giving us very different scenarios from what we were seeing happening at the shelves. Going forward, we will continue to see that pressure from all the different challenges coming from the geopolitical situation around the world.” 

To support its innovation around analytics and AI, Unilever has adopted a hybrid model. It has a global center of excellence, but also keeps some data scientists embedded with business units.

“It’s basically a two-gear system,” Ventura says. “The local team can be activated very quickly, ingest the data very quickly, and then create a statistical model and analytics model together with the business, sitting next to each other. Then, if that model can be leveraged across and scaled, we pass it on to the global team so they can move data sets in the global data lake that we have and can start creating and maintaining that model at a global level.” 

Ventura believes co-creation and co-ownership of analytics and AI capabilities with the business function is essential to success.

“Whether it is machine learning for automating the forecast or Alex with the Consumer Engagement Center, if we show up with a black box and say, ‘Hey, follow whatever the machine tells you,’ it will take a long time and probably will never get to 100% trust in the machine,” Ventura says. “With co-creation and co-ownership, I feel like we get to start with the right foot, with the human and the machine working alongside each other in partnership, almost as colleagues. Also, you get a much less biased system in the end because you’re able to introduce a much more diverse angle in your algorithms, both from a business perspective and a technology perspective.” 

Artificial Intelligence, Digital Transformation