References: UX for Enterprise ChatGPT Solutions

Part 1: UX Foundation for Enterprise ChatGPT

Every good story has a beginning, a middle, and an end. In this part, we’ll start our journey by exploring how traditional user experience methods and best practices can be applied to creating world-class solutions powered by ChatGPT. We will then explore essential user research methods and share tips and tricks of the trade that work when creating conversational designs. This will lead us to explore how to define and pick the use cases that large language models (LLMs) are best suited to solve. You’ll learn a user experience approach to prioritizing use cases. Combined with an Agile method that factors in development costs, it lets you prioritize the most valuable solutions to build. Although we’ll consider how these use cases play out with ChatGPT, almost all the learnings apply to any LLM. We’ll look at LLM-powered applications, chat-only experiences, and robust chat-powered graphical user interfaces, and we’ll even explain how to work with ChatGPT when there is no UI. This part has the following chapters:

Chapter 1, Recognizing the Power of Design in ChatGPT
Chapter 2, Conducting Effective User Research
Chapter 3, Identifying Optimal Use Cases for ChatGPT
Chapter 4, Scoring Stories
Chapter 5, Defining the Desired Experience

Chapter 1: Recognizing the Power of Design in ChatGPT

GitHub: Repository for Book Material. Richard Miller
Website: OpenAI ChatGPT Experience. OpenAI
Website: Quickstart guide for developers. OpenAI.
Article: Computing Machinery and Intelligence (1950). Alan Turing, in Mind.
Wikipedia: Turing Test.
Wikipedia: ELIZA details.
Demo: ELIZA demo. Michael Wallace and enhanced by George Dunlop.
Wikipedia: History of Chatbots.
Video: Steve Martin teaching his kid for the first day of school. Steve Martin.
Wikipedia: The History of VoiceXML.
Article: The origins of OpenAI. Karl Montevirgen
Article: Chatbot Failures. Cem Dilmegani
Wikipedia: Survey of LLMs.
Article: Google's LaMDA. Google
Article: Meta's LlaMa. Meta
Article: Anthropic's Claude. Anthropic
Article: OpenAI's GPT Models. OpenAI
Article: Hick-Hyman Law.
Wikipedia: Explanation of working memory.
Books: Collection of useful design books.
Article: ChatGPT AI for Slack. Slack
Article: AI in Salesforce. Salesforce
Article: Feeding Sensitive Data to ChatGPT. Robert Lemos.
Figure 1.3: Getting an account with OpenAI
Figure 1.3: The OpenAI Playground.
GitHub: The Full_Thesis.pdf file. Richard Miller
Article: File Search from OpenAI. OpenAI.

Chapter 2: User Research

Figure 2.1: The broad landscape of user research methods https://www.nngroup.com/articles/which-ux-research-methods. Christian Rohrer
Website (bonus link): Nielsen Norman Website Survey of research methods. Christian Rohrer
Book: A deeper dive into user research methods The User Experience Team of One: A Research and Design Survival Guide (2nd Edition). Leah Buley
Website: Using comic strips to storyboard UX Comic Strip for Storyboarding. Chris Spalton
Book: Writing Effective Use Cases Writing Effective Use Cases. Alistair Cockburn
Book: Survey Methods Questionnaire Design: How to Plan, Structure and Write Survey Material for Effective Market Research. Brace and Bolton (5th Edition, 2022)
Article: Primer on Likert Scales What is a Likert scale? SurveyMonkey
Article: Creating an Interview Script Example User Interview Script. SurveyMonkey
Website: Discount Usability Methods Discount Usability Methods. Jakob Nielsen
Website: User Interviews User Interviews. Jakob Nielsen
GitHub: Spreadsheet for Conversational Analysis Examples to Review. Richard Miller and Surlina Yin
GitHub: Spreadsheet template for Conversational Analysis Template for Analysis. Richard Miller and Surlina Yin

Chapter 3: Identifying Optimal Use Cases for ChatGPT

Book: Recommendation on Use Cases Writing Effective Use Cases. Alistair Cockburn
Book: PDF of Alistair's Book Online PDF of Book. Alistair Cockburn
Article: Amazon Searches Top 100 Amazon Searches. Source
Article: The Trolley Problem The Trolley Problem. Source

Chapter 4: Scoring Stories

Article: Weighted Shortest Job First WSJF Discussion. Scaled Agile, Inc.
Book: Software development process The Principles of Product Development Flow. Donald Reinertsen
Article: Comparison of UI evaluation techniques User Interface Evaluation in the Real World: Comparison of Four Techniques. Robin Jeffries, James R. Miller, Cathleen Wharton, and Kathy M. Uyeda. ACM (1991)
Article: Agile Estimating Poker Estimating Poker. Scaled Agile, Inc.
Wikipedia: Accessibility Learning about Accessibility. Wikipedia
Wikipedia: Language Support NLS Support. Wikipedia
Article: Explaining story points Why do we use Story Points for Estimating? Derek Davidson, Scrum.org.
GitHub: Example File for Scoring Stories Scoring Stories Samples. Richard Miller
GitHub: Worksheet Scoring Stories Scoring Stories Worksheet. Richard Miller
Article: Explaining story points Explaining Story Points. Source
Article: More on Story Points Don’t Equate Story Points to Hours. Mike Cohn, Mountain Goat Software.
Website: SAFe Scaled Agile Framework. Source
Article: Using WSJF and the use of Fibonacci Prioritize your backlog: Use Weighted Shortest Job First (WSJF) for improved ROI. Matthew Heusser
GitHub: Deep dive on severity (PDF) More details on setting up a Severity Rubric. Richard Miller

Chapter 5: Defining the Desired Experience

Article: The Role of Micro-interactions in Modern UX Micro-interactions. Mads Soegaard.
Book: (Online Bonus) Recommendation Microinteractions: Full Color Edition: Designing with Details. Dan Saffer and Don Norman
Article: The Essence of Effective Rich Internet Applications RIA introduction. Kevin Mullet, Macromedia Experience Design Team.
Video: Restaurant ordering via voice Restaurant ordering via voice. SoundHound.
Video: A generative AI co-pilot feature in a GUI Demo of Oracle Virtual Assistant. Jurgen Kress, Oracle.
Article: Current Desktop addressable sizes Current browser usage and statistics. Statcounter.
Books: The work of Edward Tufte The collective works of Edward Tufte. Edward Tufte.

Book: Charts and Visualizations The Visual Display of Quantitative Information, 2nd Ed. Edward Tufte.
Book: Visualizations Envisioning Information. Edward Tufte.
Book: Design strategies Visual Explanations: Images and Quantities, Evidence and Narrative. Edward Tufte.
Book: How to visually show evidence Beautiful Evidence. Edward Tufte.
Book: Holistic view of the impact of visualization Seeing with Fresh Eyes: Meaning, Space, Data, Truth. Edward Tufte.

Article: Recognition over recall Recognition over recall. Raluca Budiu, Nielsen Norman Group
Article: Web guidelines for accessibility Web Content Accessibility Guidelines (WCAG) 2. World Wide Web Consortium.
Article: US Federal Standards for software Section 508 Standards. US Government.
Article: Accessible Rich Internet Applications suite of web standards WAI-ARIA Guidelines. World Wide Web Consortium.
Article: ADA website Americans with Disabilities Act. US Government.
Article: AI Guidance. Algorithms, Artificial Intelligence, and Disability Discrimination in Hiring.

Article: European Standards European Standards. Publisher.
Article: Human Factors standards for video (conferencing) Human Factors in Videotelephony. European Telecommunications Standards Institute (ETSI).
Article: ITU-T F.922 Guidelines ITU-T F.922 Guidelines. International Telecommunication Union (ITU).
Website: Reference for a counterpoint to the concept of cheap, fast, or good. Choose two. Cheap, Fast or Good, Choose Two. Benek Lisefski
Book: Recommendation on writing and punctuation Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation. Lynne Truss
Web link: Google Sheets for Testing Translation Length Testing Translation Length. Richard Miller (feel free to clone your own sheet!)
Website: Reference on spelling out numbers When to spell out numbers. MasterClass
Website: Reference on pluralization Introduction to pluralization. Lingo Hub
Website: Reference on plural rules Plural Rules for Countries. Unicode
Article: The Science of Word Recognition The Science of Word Recognition. Microsoft.
Website: Reference on word recognition The Science of Word Recognition (ALL CAPS discussion). Microsoft Team
Website: Reference on cultural design Primer on Cultural Design. Google
Website: Reference on bi-directionality Bi-directional design. Google

Part 2: Designing LLMs

This part focuses on the most technical side of LLM design: the work done in the LLM itself and with the other tools and techniques that make up an end-to-end LLM solution. We will start by introducing practices such as Retrieval Augmented Generation (RAG) to integrate enterprise datasets with the generative ability of an LLM. You’ll then learn about the fundamentals of prompt engineering for enterprise applications. The examples follow a non-technical route, so you can get the most out of learning about the steps involved without coding. We’ll then explore fine-tuning, which uses provided examples to teach a model the style and tone needed for any business or enterprise. We’ll also explore a few case studies and give some hands-on experiences to get a feel for the process. This part includes the following chapters:

Chapter 6, Gathering Data: Content is King
Chapter 7, Prompt Engineering
Chapter 8, Fine-Tuning

Chapter 6: Gathering Data: Content is King

Article: OpenAI Knowledge Retrieval Documentation. OpenAI
(Updated) Documentation: Supported File Formats. OpenAI
Article: Amazon’s RAG Explanation. Amazon
Article: Leveraging LLMs on your domain-specific knowledge base. Michiel De Koninck
Video: Accelerate your Generative AI journey. Databricks, Hanlin Tang, Ina Kolea, and Akhil Aggrawal
Article: A survey of RAG for LLMs. Yunfan Gao, Yun Xiong, et al.
GitHub: FAQ Collection for Testing. Richard Miller
Demo: OpenAI Playground.
Website: The freight forwarding use case Wove.com.
Article: Rate Sheet Terms and Introduction. Publisher.
Documentation: ChatGPT documentation on function calling. OpenAI
Website: Cohere, an AI platform. Cohere
GitHub: FAQ Sample Document. Richard Miller
GitHub: Zip of FAQs as unique PDFs. Richard Miller
GitHub: Zip of 18 FAQ Files (each with 25 or so FAQs). Richard Miller
GitHub: Single PDF with all 441 FAQs. Richard Miller
Video: A Survey of Techniques for Maximizing LLM Performance. OpenAI, Colin Jarvis and John Allard
GitHub: Transcripts of FAQ Test. Richard Miller.
Article: Prolego tips for RAG development. Prolego

Chapter 7: Prompt Engineering

Documentation: Prompt Examples Prompt Documentation. OpenAI.
Video: Techniques for improving LLM Quality (45 minutes) A Survey of Techniques for Maximizing LLM Performance. OpenAI.
Video: Prompt Eng, RAG, and Fine Tuning (15 minutes) Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use. Mark Hennings of Entry Point AI
Article: Getting started with LLM prompt engineering Getting started with LLM prompt engineering. Shane Peckham, Jeff Day, and Diana Hbr for Microsoft.
Article: Prompt Techniques Best Prompt Techniques for Best LLM Responses. Jules Damji.
GitHub: Basic prompting from the CO-STAR framework https://colab.research.google.com/github/dmatrix/genai-cookbook/blob/main/llm-prompts/1_how_to_use_basic_prompt.ipynb
Resource: Training from Miha Fine Tuning Training from Mihael Cacic
Video: Optimize Instruction tuned Conversational AI/LLM OpenAI Playground: Optimize Instruction tuned Conversational AI /LLM. code_your_own_AI
Article: Self-consistency improves chain of thought reasoning in language models Self-consistency improves chain of thought reasoning in language models. Wang et al.
Article: Writing clear instructions Strategy: Write clear instructions. OpenAI.
Article: Chaining complex prompts for stronger performance Chaining complex prompts for stronger performance. Publisher.
Video: Andrew Ng Agentic Presentation Andrew Ng Agentic Presentation. Publisher.
Article: SELF-REFINE: Iterative Refinement with Self-Feedback SELF-REFINE: Iterative Refinement with Self-Feedback. Madaan et al.
Article: Reflexion: Language Agents with Verbal Reinforcement Learning Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al.
Article: Gorilla: Large Language Model Connected with Massive APIs Gorilla: Large Language Model Connected with Massive APIs. Patil et al.
Article: MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action. Yang et al.
Article: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Wei et al.
Article: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. Shen et al.
Article: Communicative Agents for Software Development Communicative Agents for Software Development. Qian et al.
Article: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al.
Article: A guide to prompt techniques A guide to prompt techniques. Tensor Ops.
Article: Improving LLMs with emotional prompts Large Language Models Understand and Can Be Enhanced by Emotional Stimuli. Cheng Li et al.
Article: The Good, The Bad, and Why? Unveiling Emotions in Generative AI The Good, The Bad, and Why? Unveiling Emotions in Generative AI. Publisher.
Demo: Playground for learning about Temperature and Top P Playground for learning about Temperature and Top P. OpenAI.
Article: Prompt Engineering Guide Prompt Engineering Guide. Publisher.
Article: Google explains multi-modal prompting for Gemini Google explains multi-modal prompting for Gemini. Google.
Article: Language Is Not All You Need: Aligning Perception with Language Models Language Is Not All You Need: Aligning Perception with Language Models. Shaohan Huang et al. Microsoft.
Documentation: Ingredients of a Prompt Template Ingredients of a Prompt Template. Salesforce.
Documentation: Guide to the prompt builder Guide to the prompt builder. Salesforce.
Article: Lost in the Middle: How Language Models Use Long Contexts Lost in the Middle: How Language Models Use Long Contexts. Nelson F. Liu et al. Stanford

Chapter 8: Fine-Tuning

Demo: Tokenizer Tokenizer. OpenAI.
Article: When to use fine-tuning (same link as above) When to use Fine Tuning. OpenAI.
Demo: OpenAI Fine Tuning OpenAI Fine Tuning. OpenAI.
GitHub: Training Data with ten examples 10 Fine Tuning Examples. Richard Miller.
GitHub: 20 validation examples 20 Fine Tuning Validation Examples. Richard Miller.
GitHub: Training Data with 30 examples 30 Fine Tuning Examples. Richard Miller.
GitHub: 49 training examples 49 Fine Tuning Examples. Richard Miller.
GitHub: (Online bonus) Complete file with all training examples All 78 training examples. Richard Miller.
Training: Miha's training website where I got Fine Tuning training https://miha.academy/ LLM Training Academy. Mihael Cacic.
Article: Training vs. Validation Loss Training vs. Validation Loss. Publisher.
Documentation: Calling functions with chat models Calling functions with chat models. OpenAI.
Article: Learning Rate What is Learning Rate. ChatGPT Guide.
Documentation: ChatGPT Fine Tuning Documentation Fine Tuning Integrations. OpenAI.
Article: Getting Started with OpenAI Evals Evals. OpenAI.

Part 3: Care and Feeding

In the last phase of your journey, we’ll explore the tools and resources needed to understand how the LLM is performing and what to look for to fix it if it isn’t up to expectations. You’ll first learn how to apply existing guidelines and heuristics, sometimes in ways you didn’t expect, to craft LLM solutions. We’ll even give you the skills to adapt web and traditional design thinking to conversational AI. Then, you’ll dive into understanding how to monitor and evaluate solutions based on industry AI and user experience metrics, covering objective and subjective performance measures in the care and feeding process. The lifecycle of an LLM solution demands good business processes to be successful. We share best practices that work at even the most prominent enterprises to help you build effective processes for creating world-class LLM solutions. This part includes the following chapters:

Chapter 9, Guidelines and Heuristics
Chapter 10, Monitoring and Evaluation
Chapter 11, Process
Chapter 12, Summary

Chapter 9: Guidelines and Heuristics

Article: 944 Guidelines for Designing User Interface Software Smith and Mosier’s Guidelines for Designing User Interface Software. Publisher.
Article: First Principles of Interaction Design First Principles of Interaction Design. Bruce Tognazzini
Article: Intro to Heuristic Evaluation (HE) What is Heuristic Evaluation (HE)?. Interaction Design Foundation.
Article: How to perform a heuristic evaluation How to perform a heuristic evaluation. Nielsen Norman Group.
Article: Memory Recognition and Recall in User Interfaces https://www.nngroup.com/articles/recognition-and-recall/. Nielsen Norman Group.
Article: Proposal to include Accessibility as the 11th Heuristic Proposal to include Accessibility as the 11th Heuristic. Jim Ekanem.
GitHub: Guidelines typical of a GUI and supporting conversational experiences Heuristic Checklist example. Richard Miller.
Website: Writing standards in the Chicago Manual of Style Chicago Manual of Style. Publisher.
Article (Online Bonus): The 10 Usability Heuristics Reimagined The 10 Usability Heuristics Reimagined. Jakob Nielsen

Chapter 10: Monitoring and Evaluation

Documentation: Introduction to RAGAs Introduction to RAGAs. Ragas.
Tutorial: Installing RAGAs Installing RAGAs. Ragas.
Article: Evaluating the performance of RAG solutions Evaluating the performance of RAG solutions. Ragas.
Article: LLM Leaderboard Hughes Hallucination Evaluation Model (HHEM) Leaderboard. Hughes.
Discussion: Text Embedding Issues Discussion: Text Embedding Issues.
Article: Semantic Answer Similarity: The Smarter Metric to Score Question Answering Predictions Semantic Answer Similarity: The Smarter Metric to Score Question Answering Predictions. Isabelle Nguyen.
Article: Semantic Answer Similarity for Evaluating Question Answering Models Semantic Answer Similarity for Evaluating Question Answering Models. Risch et al.
Article: Metrics for evaluating LLMs Metrics for evaluating LLMs. Publisher.
Article: Evaluating RAG Applications with RAGAs Evaluating RAG Applications with RAGAs. Publisher.
Article: Automating Hallucination Detection Automating Hallucination Detection. Vectara.
Article: An evaluation on large language model outputs: Discourse and memorization An evaluation on large language model outputs: Discourse and memorization. Wynter et al., 2023.
Article: Taxonomy of Hallucinations in LLMs Taxonomy of Hallucinations in LLMs. Deval Shah, Lakera.
Article: A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. Huang et al., 2023.
Video: Techniques for maximizing LLM performance Techniques for maximizing LLM performance. OpenAI.
Documentation: OpenAI's view on testing strategy OpenAI's view on testing strategy. OpenAI.
Article: Out of Domain Detection Out of Domain Detection. OpenAI.
Video: Tutorial for improving retrieval Tutorial for improving retrieval. OpenAI.
Table 10.7 Article: How to Use Rouge 2.0 How to Use Rouge 2.0. OpenAI.
Table 10.7 Article: Perplexity and Burstiness Perplexity and Burstiness. OpenAI.
Table 10.7 Product: Originality AI Originality AI. OpenAI.
Table 10.7 Aisera: LLM Evaluation: Metrics and Benchmarking Performance Aisera: LLM Evaluation: Metrics and Benchmarking Performance.
Article: How to Evaluate LLMs: A Complete Metric Framework How to Evaluate LLMs: A Complete Metric Framework.
Article: Patterns of Trustworthy Experimentation: During-Experiment Stage Patterns of Trustworthy Experimentation: During-Experiment Stage.
Article: How to modify the NPS Survey question to meet your needs How to modify the NPS Survey question to meet your needs.
Table 10.9 Article: NPS scores for the table, Nice Source Nice NPS scores.
Table 10.9 Article: NPS scores for the table, CustomerGauge Source CustomerGauge NPS scores.
Table 10.9 Article: Typical NPS for a product Simplesat NPS scores.
Article: Net Promoter Score Net Promoter Score
Article: How to SUS out usability scores How to SUS out usability scores.
Article: Is the SUS Too Antiquated Is the SUS Too Antiquated.
Article: 5 Ways to Interpret a SUS Score 5 Ways to Interpret a SUS Score.
Article: System Usability Scale System Usability Scale.

Chapter 11: Working Design Processes into a Development Lifecycle

Website: AI marketplace for Jira and Confluence Atlassian Marketplace.
Video: Introduction to Agile Introduction to Agile. Agile Alliance.
Article: State of Agile Report (linked to latest report) State of Agile Report. Digital AI.
Article: The Agile Manifesto The Agile Manifesto.
Data: Ultrachat 200,000-row data set Ultrachat 200,000-row data set.

Chapter 12: Summary

Website: Boomi homepage for Six Tenets of AI Readiness Weblink, not deep linking to source. Michael Bachman, Boomi.
Book: The Goal The Goal: A Process of Ongoing Improvement. Eliyahu M Goldratt, Jeff Cox, and David Whitford (North River Press).
GitHub: Wisdom Pyramid slide in PowerPoint Wisdom Pyramid slide. Richard Miller.

Book Recommendations: A summary of all books referenced and recommended in this book.

A primer for learning interaction design (Chapter 1) The Design of Everyday Things. Don Norman (Revised & Expanded Edition, 2013)
A reference book for design principles (Chapter 1) Universal Principles of Design, Revised and Updated: 125 Ways to Enhance Usability, Influence Perception, Increase Appeal, Make Better Design Decisions, and Teach through Design. Lidwell, Holden, and Butler (2010)
The psychology of design (Chapter 1) Designing with the Mind in Mind: Simple Guide to Understanding User Interface Design Guidelines. Jeff Johnson (3rd Edition, 2020)
Introduction to design (Chapter 1) Don't Make Me Think, Revisited: A Common Sense Approach to Web Usability. Steve Krug (3rd Edition, 2013)
The human factors underpinning of UX design (Chapter 1) Engineering Psychology and Human Performance. Christopher D. Wickens, Helton, Hollands, and Banbury (5th Edition, 2021)
Research methods (Chapter 2) The User Experience Team of One: A Research and Design Survival Guide. Leah Buley
Writing use cases (Chapters 2 and 3) Writing Effective Use Cases. Alistair Cockburn (2000)

Survey Method (Chapter 2) Survey Methods. Brace and Bolton (5th Edition, 2022)

Product development principles (Chapter 4) The Principles of Product Development Flow. Donald Reinertsen (2009)

Microinteraction (Online Bonus) (Chapter 5) Microinteractions: Full Color Edition: Designing with Details. Dan Saffer and Don Norman

The books of Edward Tufte

Book: Charts and Visualizations (Chapter 5) The Visual Display of Quantitative Information, 2nd Ed. Edward Tufte.

Book: Visualizations (Chapter 5) Envisioning Information. Edward Tufte.

Book: Design strategies (Chapter 5) Visual Explanations: Images and Quantities, Evidence and Narrative. Edward Tufte.

Book: How to visually show evidence (Chapter 5) Beautiful Evidence. Edward Tufte.

Book: Holistic view of the impact of visualization (Chapter 5) Seeing with Fresh Eyes: Meaning, Space, Data, Truth. Edward Tufte.

On the use of punctuation and writing quality (Chapter 5) Eats, Shoots & Leaves: The Zero Tolerance Approach to Punctuation. Lynne Truss (2006)

Classic reference on interface design (Chapter 9) Tog on Interface. Bruce Tognazzini (1992)

How to understand the goal of a business (Chapter 12) The Goal: A Process of Ongoing Improvement. Eliyahu M Goldratt, Jeff Cox, and David Whitford (North River Press).

And yes, I do hope to earn a few bucks with any book purchases for my reference material. Writing a book is a labor of love, so please use my links. It is at no cost to you!
© Packt Publishing and Richard H. Miller, All Rights Reserved.