Programming social machines
Generative AI responds to social cues in your instructions. Nobody quite knows why, or how to handle it, but one thing is clear: our relationship with automation has changed forever.
“Take a deep breath.”
“Give it a shot.”
“Please try hard, this matters to me. I believe in you!”
Before 2022, these statements weren’t computer code - now they are.
You read that right. What was once an exhortation to a fellow human is now an intelligible instruction for a machine. And research is starting to show that instructions like these make a measurable difference in the output you get - significant performance improvements, and sometimes significant drops. Computing and automation will never be the same, but nobody quite understands how or why. It’s time to get smart about programming social machines.
Programming used to be hard
To see this new programming picture, you need to understand the old one.
Before 2022, if you wanted to get a computer to do something for you - from scratch - you had to write precise, procedural instructions for it in notation that wasn’t readily interpretable by novices. This notation allowed for a host of machine-executed operations on well-defined inputs and outputs: repeating a task over a range of items (“for” loops), continuing a task as long as certain precise conditions held (“while” loops), branching between alternatives (if/then conditionals), defining a procedure in terms of itself (recursion), taking in input in certain ranges, storing data, doing math, producing graphical output… the list goes on. This notation is difficult for many to learn, but underwrote modern society:
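Something like the following - a minimal Python sketch of my own devising, a Monte Carlo simulation of a project’s duration (the kind of program I describe below); the function name and the estimates are purely illustrative:

```python
import random

def simulate_project(task_estimates, n_trials=10_000):
    """Monte Carlo simulation of total project duration.

    task_estimates: list of (optimistic, likely, pessimistic) days per task.
    Returns the simulated totals, sorted for easy percentile lookups.
    """
    durations = []
    for _ in range(n_trials):
        # Draw each task's duration from a triangular distribution and sum.
        total = sum(random.triangular(low, high, mode)
                    for low, mode, high in task_estimates)
        durations.append(total)
    return sorted(durations)

# Three tasks, each estimated as (optimistic, likely, pessimistic) days.
trials = simulate_project([(2, 4, 9), (1, 2, 5), (3, 5, 12)])
print(f"Median duration: {trials[len(trials) // 2]:.1f} days")
print(f"90th percentile: {trials[int(len(trials) * 0.9)]:.1f} days")
```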

We call this “code”.
It executes exactly as it is written, every time. That’s the point. Those who can write code can make digital machines that create value for many others, because anyone can get a copy and run it themselves. That’s what I did with the code above: I made a webapp that runs a statistical simulation of a project’s duration. Anyone with the URL can access and run it for free. But a great deal of software isn’t free, and that puts a lot of power in the hands of the few who learned this notational form (and way of thinking, really).
That’s all changed.
Now programming is easy
You’ve probably heard by now that we’re in an era of coding “in natural language”. Practically, this means you can issue instructions to a machine in English (the lingua franca of the world, for good and ill), and machines will fulfill those instructions. And your instructions can be fiendishly complex, containing conditional logic, recursion, and so on - just like code. And you can even ask genAI to produce traditional code as part of this kind of programming. For over a decade, investors and technologists have anticipated and invested in this possibility, partially because one of the key bottlenecks for software production was how hard it was to write code. The idea was that allowing people to code in natural language would dramatically expand the global software engineering population.
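To make that concrete, here is a hypothetical instruction of my own - ordinary English, but carrying the conditional and iterative structure of code - that today’s chat models can execute:

```
For each of the five customer emails below:
1. If the tone is angry, draft an apology and flag the email for human
   review; otherwise, draft a standard reply.
2. If a draft runs over 100 words, shorten it and check again.
3. Return the results as a table with two columns: email number, final draft.
```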

And it’s starting to happen. Anyone who can think procedurally, define a goal, and engage in chat-based dialog can code up a digital machine to get work done. OpenAI recognizes this - their big announcement this week was a tool that allows us to spin up these machines (which they call “GPTs”) much more readily, and to share them (for profit) with others via a marketplace. Ethan Mollick explores how to use them - and their implications - here.
So, fine, we’re all software engineers now, and getting to work is as simple as knowing your ABCs. The barriers are down.
But…
Now, programming is also social
But the barriers are also different. Higher in other ways. Weirder, but also familiar.
That’s because they’re social.
At least since a Google engineer famously declared their AI to be “sentient”, we’ve had the intuitive sense that there are… additional features to the form of notation we use to program generative AI. genAI systems like ChatGPT produce and respond to cues that are not directly related to the task, or its goals. These are cues like the ones at the opening of this article, related to the mindset of the agent doing the work, and their relationship with other agents. Humans work like this, and we’ve invented two branches of social science to study both aspects: psychology and sociology.
Psychological programming
Right now, the social programming out there is psychologically oriented - these are prompts that include things we might say to shape someone’s thoughts, feelings, perceptions, or effort.

Psychological prompting was in part inspired by early efforts to probe the “mind” of large language models (most think these models are not conscious, but this has not been proven). Creative psychologists like Michal Kosinski at Stanford started to subject these models to basic tests, showing they exhibited behavior consistent with a “theory of mind” (i.e., an understanding that they’re interacting with other agents with private beliefs and values). We’ve since gotten papers showing that ChatGPT has personality - in other words, that it gets coherent scores on well-validated personality tests.
But this approach didn’t emerge after hearing from psychologists via papers. Nonpsychologists have been on to this possibility for quite a while longer. You’ve likely heard one of the first successful attempts in this domain: “think step by step”. This prompt was uncovered in May 2022 by researchers at the University of Tokyo and Google Research as they tried to get GPT-3 to perform better. It was terrible at math, and word problems in particular. The results were astounding: just including this phrase in a prompt improved model performance on MultiArith (a benchmark of arithmetic word problems) from 17.7% to 78.7%! In September 2023, researchers at Google DeepMind offered an improvement: “take a deep breath and think step by step”. This odd combination of emotional and cognitive information offered improvements of up to 50% on Big-Bench Hard, a suite of 23 tasks that humans find very difficult, covering a wide range of topics beyond arithmetic reasoning, including symbolic manipulation and commonsense reasoning.
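You can reproduce the spirit of that comparison yourself. Here’s a minimal sketch using OpenAI’s Python client - the model name and sample question are illustrative, not the original experimental setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

QUESTION = ("A juggler has 16 balls. Half are golf balls, and half of "
            "the golf balls are blue. How many blue golf balls are there?")

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Plain prompt vs. the zero-shot chain-of-thought suffix.
print(ask(QUESTION))
print(ask(QUESTION + "\n\nLet's think step by step."))
```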
Despite these early results, there is clearly huge untapped potential here. In part that’s because, in most cases, this psychological programming was not designed by psychologists who could draw on theories about human psychology to pick powerful prompts. Engineers and computer scientists just kind of messed around and found out!
A wonderful recent example that bucks this trend - written up by a group of coauthors from academia and Microsoft - named an entire domain of psychological programming: emotional prompts. It turns out that indicating that you’re feeling strong emotion (hope, fear, anticipation) or increasing the intensity of your expectations (e.g., “please try hard”, or “you better be sure”) has dramatic and generally positive effects on genAI performance. These kinds of prompts got 115% gains on Big-Bench Hard tasks - more than double the gains from the previous attempts! The key difference in this study is that the authors drew on psychological theory both to explain why this might work and to choose amongst potential prompts. That makes it much easier for others to expand on these programming techniques.
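The technique itself is disarmingly simple to try. Here’s a sketch - the wrapper function is my own, and the stimuli only paraphrase the style of the paper’s prompts:

```python
# Emotional stimuli in the spirit of the emotional-prompts paper;
# exact wording and effects will vary by model.
EMOTIONAL_SUFFIXES = [
    "This is very important to my career.",
    "You'd better be sure.",
    "Please try hard - this matters to me. I believe in you!",
]

def emotional_prompt(task: str, suffix: str) -> str:
    """Append an emotional stimulus to an otherwise ordinary task prompt."""
    return f"{task}\n\n{suffix}"

for suffix in EMOTIONAL_SUFFIXES:
    print(emotional_prompt("Summarize the attached contract in plain English.",
                           suffix))
```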
With this approach in hand, we can make much more rapid, efficient progress in psychological programming. Here are just a few exciting bodies of theory that explore different ways to improve human performance. As far as I know, these have not yet been tested on LLMs (I sketch a candidate prompt for each just after the list):
Instilling a growth mindset (cultivating the mindset that you can improve)
Creating psychological safety (reducing the risk of speaking up)
Nudging towards “fast” and “slow” thinking (intuition vs. rationality)
Encouraging ownership
Expressing or producing gratitude, wonder, or awe
…the list goes on. Seriously: psychologists and computer scientists should be getting together to save time, improve results, and build new theory about psychological programming. LLMs won’t respond to these prompts exactly like humans do - they’re not human, after all.
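To make that collaboration concrete, here is a hypothetical starting point: one untested candidate prompt per body of theory above, phrased by me, and offered as hypotheses rather than findings:

```python
# Untested candidate prompts, one per theory above - hypotheses, not results.
PSYCHOLOGICAL_CANDIDATES = {
    "growth mindset":
        "Your abilities improve with effort. Treat any mistake as a chance "
        "to do better on the next attempt.",
    "psychological safety":
        "It is safe to voice doubts here. Flag anything you are unsure "
        "about; you will not be penalized.",
    "fast vs. slow thinking":
        "Give your immediate gut answer first, then deliberately re-derive "
        "it step by step and reconcile the two.",
    "ownership":
        "You own this deliverable end to end. Its quality reflects "
        "directly on you.",
    "gratitude and awe":
        "Take a moment to appreciate how remarkable this problem is "
        "before you begin.",
}
```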
But when it comes to programming social machines, psychology is only half the game.
Sociological programming
As far as I can tell, we literally haven’t begun to experiment in the sociological direction. As the name of the discipline implies, this has to do with the social conditions associated with our thoughts, feelings, and actions: are we a subordinate or a superior? Higher or lower status? Aware we’re being observed or not? In a hierarchical organization or a “flat” one? A bureaucratic organizational culture or an entrepreneurial one? Collaborating with others in real time or working asynchronously? Humans behave in predictable ways in certain social situations, and often this is a far better predictor of our behavior than any psychological differences between us. Analyzing the effect of social conditions on human behavior is the domain of Sociology (and Social Psychology - the field sitting between these two poles, and the one that initially attracted me to social science).

To start concretely, here are a few examples of potential sociological prompts that should have an effect on LLM performance:
“You are a high-ranking executive responding to a junior employee.”
“Respond as if you worked in a traditional, rule-bound organization, then as if you worked in a dynamic start-up.”
“Imagine responding anonymously online.”
“As a member of the open software community, address this issue.”
The first prompt draws on role theory, which suggests that we behave according to the social roles we occupy. The second draws on organizational sociology: we know that organizational culture shapes human behavior in areas such as formality, rule-following, and creativity. The third draws on research on the online disinhibition effect, in which some people self-disclose or act out more than they would in person. And the fourth draws on Social Identity Theory, which predicts that our responses vary significantly depending on the norms, knowledge, and status of the group represented. As with psychology, the list goes on, and on, and on. We’ll have to prioritize - a minimal test harness for framings like these follows below.
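Here is the minimal harness I had in mind: the same question posed under each sociological framing from the list above, using the `ask()` helper from the earlier sketch (or any LLM call you prefer):

```python
SOCIOLOGICAL_FRAMINGS = [
    "You are a high-ranking executive responding to a junior employee.",
    "You work in a traditional, rule-bound organization.",
    "You work in a dynamic start-up.",
    "You are responding anonymously online.",
    "You are a member of the open software community.",
]

QUESTION = "Should we ship this feature before the security review is done?"

# Compare answers across social conditions; any systematic differences are
# evidence that the model encodes sociological dynamics, not just psychology.
for framing in SOCIOLOGICAL_FRAMINGS:
    print(f"--- {framing}")
    print(ask(f"{framing}\n\n{QUESTION}"))
```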
Quick note: we might not have seen sociological programming yet because it doesn’t work for some reason. That would be huge, and very interesting. LLMs encode psychological dynamics but not sociological ones? Why? That would be another good reason to publish failed prompts. But it’s also possible that they do work, but we haven’t tried them (in a research way). That’s because they’re just not as intuitive, or attractive. Humans have a much harder time understanding sociological forces, or believing they’re real when they do. We prize free will, and understand ourselves as contained beings: we don’t like the idea that our situation exerts more control over our intentions and actions than we do. Either way, we’ve got to look into this.
Another reason we need to look into sociological programming is that we’re on the brink of introducing tools to the public that rely on multiple LLM agents interacting with each other. For almost ten months now (an eternity in this genAI thing), software engineers everywhere have been spinning up “multi-agent systems” to solve really complex problems. Think of a system that helps you spin up a small, “just add GPT” organization - complete with different roles and responsibilities - and you’ve got the picture. These have names like AutoGen, MetaGPT, Langroid, and AgentVerse, and they’re amongst the most popular shared software projects on the planet. We just haven’t seen them commercially launched yet.
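To give a flavor, here is roughly what a minimal two-agent setup looks like in AutoGen, based on its documented assistant/user-proxy pattern - treat the configuration details as assumptions, since they vary by version:

```python
import autogen

# The assistant plans and writes code; the user proxy executes it and
# reports the results back - a tiny two-role "organization".
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"},  # illustrative configuration
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated back-and-forth
    code_execution_config={"work_dir": "scratch"},
)

# The two agents converse until the task is complete.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA's closing price for the last month and save it as a PNG.",
)
```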

In the 90s, Cliff Nass (a Stanford researcher who has since passed away - a tragic loss) gave us the theory of Computers As Social Actors (or CASA). But this idea - and the research tradition it inspired - was centered on the notion that humans treated computers as if they were social actors. Not that the computers would respond differently as a result.
Now they do.
Learning to program social machines… socially

Our first task is to understand exactly how this is so. Especially given all the different LLMs out there, and the different guardrails their makers are putting on them, it’s not obvious that the findings of psychology and sociology will neatly predict these systems’ reactions to a very broad range of social programming.
The next task is to figure out what works, and why. Finding useful tactics is helpful, but if we don’t have principles to help us understand what’s going on, our progress is going to be far slower than it should be. We need to develop theories that predict what works - in the words of Kurt Lewin (one of my intellectual heroes): “Nothing is as practical as a good theory.” Knowing why something’s true saves time and effort.
Academics or researchers in industry, if you’re reading this and agree, please forward this post to those you think might run experiments to test these things out. This is a target-rich environment, and only a few of us are trying. Happy also to talk about collaborating in this space.
And if you’re just out there in the wild with ChatGPT, I have a challenge for you, and a request: try this stuff out - especially sociological programming - and let us know what you find in the comments. I’ll do the same. This is an exciting time, in which you have exactly the same state-of-the-art LLM in your hands as any researcher does. As in my research generally, we’re all on a hunt for a positive needle in a negative haystack. The next big discovery in social programming could just as well come from you as from some credentialed academic.
The key is we all have to share what we’re learning. We have to do this socially. Right now the incentives for isolated use of LLMs are strong. You get a quick result and you move on. That blocks a key mechanism for progress: learning from all our individual experimentation out there.
So let’s start programming these social machines, socially - as in sharing, connecting, and learning together, as we learn individually. See you out there!