That Carlsmith blog series in 1 page

My takeaways on AI, control and.. green

Apr 24, 2024

Joe Carlsmith is an AI researcher at Open Philanthropy1. He recently wrote a set of essays on AI, controlling the future and harmony. I wanted to talk about them, but my friends hadn’t read them, so here is an outline, in my own words. Joe’s own summary is here. All links in this section are to the essays.

Carlsmith tackles two linked questions:
- How should we behave towards future beings (future humans, AIs etc)?
- What should our priors be about how AIs will behave towards us?
So the first question:
We worry about value in the future, how should we behave towards the future and future beings?
- (1) We could trust in some base goodness - the universe, god, AIs being good
- (2) We could accept that all future beings will be alien to us and stop worrying (hee and here)
- (3) We could have moral systems or concepts of goodness/niceness
- (4) We could seize power over the future (here and here2)
- (5) We could adopt a different poise which centred around notions like growth/harmony/ “attunement” (here and here)
(1)
- Yudkowskians don’t tend to trust in god or the universe. That’s part of their schtick. They don’t trust AIs will be good
(2)
- This is perhaps what Hanson argue for3
- Yudkowskians do not buy this
  - Are they avoiding how alien future people will be?
  - How can they justify a notion of good that’s robust over time? (eg often Yudkowsky pushes against this notion)
  - Are they avoiding discomfort?4
(3)
- Moral systems vary wildly at edge cases
- On the scale of the whole future, even good people controlling it might be a moral disaster. The notion of paperclipping elides this, because it is involuntary and aesthetically dull. But the arguments also extend to law abiding and even relatively joyful beings
(4)
- This powerseeking is hard to distinguish from the power seeking of AIs5 (also (here and here)
(5)
- There is some notion of attunement/ trust/ growth/ balance which rationalists and EAs are quite inimical to6
- There seems some way of being which navigates how to interact with complex systems, letting them grow, tending to them without being able to dominate the or being dominated by them
- What might this look like?
Second question. How might AIs treat us?
- If we assume that we cannot trust (not 1,3) think it’s a problem (not 2) and ignore (5) then
  - it’s very easy to see AI as a tool while it is weaker than us and a competitor if it becomes stronger
  - Even if AI may be others ways than this, the safe option is to assume the competitor frame
- Under (5) AI might be
  - Something good but alien (like an octopus)
  - Something dangerous but not competitive (like a dead-eyed bear)
  - Something else. Other. Not us
Takeaways
- We should consider other ways to be towards those we could control but who might control us
- We should consider other relationships towards AI7
- In AI discourse there is a lack of clarity in notions of attunement, respect, harmony in relation to the sub-optimal choices of other conscious beings (here and here)
- It is possible that our priors are driven by our lack of this notion89

Some quotes I liked/was moved by:

Where Joe is quoting someone else I also link to the original source

On being ‘just statistics’

“Just” is rarely a bare metaphysic. More often, it’s also an aesthetic. And in particular: the aesthetic of disinterest, boredom, deadness. Certain frames – for example, mechanistic ones – prompt this aesthetic more readily. But you can spread deadness over anything you want, consciousness included. Cf depression, sociopathy, etc.

Werner Herzog, on the deadness of nature: (source, link to essay section)

“And what haunts me is that in all the faces of all the bears that Treadwell ever filmed, I discover no kinship, no understanding, no mercy. I see only the overwhelming indifference of nature. To me, there is no secret world of the bears, and this blank stare speaks only of a half-bored interest in food.”

From Yudkowsky: (source, link to essay section)

No rescuer hath the rescuer.
No Lord hath the champion,
no mother and no father,
only nothingness above.

Yudkowsky, on the death of his brother10, (source, link to essay section)

... Yehuda did not "pass on". Yehuda is not "resting in peace". Yehuda is not coming back. Yehuda doesn't exist any more. Yehuda was absolutely annihilated at the age of nineteen. Yes, that makes me angry. I can't put into words how angry. It would be rage to rend the gates of Heaven and burn down God on Its throne, if any God existed. But there is no God, so my anger burns to tear apart the way-things-are, remake the pattern of a world that permits this....

Haters gonna hate; atheists gonna yang11; agents gonna power-seek

(link)

Utilitarianism does not love you, nor does it hate you, but you're made of atoms that it can use for something else

(link)

On whether it is wrong to cut down ancient trees?

And yet, for all this, something about just cutting down this ancient, living tree for lumber does, indeed, feel pretty off to me. It feels, indeed, like some dimension related to "respect" is in deficit.

Also this image from the On green (image source, link to post)

(Lesswrong version for those who celebrate12)

What did you think of this? I have a much longer point by point summary and if 10 people sign up to a paid subscription of my blog I’ll finish and post that to them13.

The big EA foundation

If you like C S Lewis, you may find this essay particularly provocative - On the abolition of man

It’s funny to me that Carlsmith’s hierarchy of atheism seems to imply Hanson is the deepest atheist, disbelieving not only in God and the goodness of the universe but also that there is a stable notion of good over time. I softly endorse this

Specific quote: "On the other hand, some sort of discomfort in trying to control the values of future humans persists (at least for me). I think Hanson is right to notice it – and to notice, too, its connection to trying to control the values of the AIs. I think the AI alignment discourse should, in fact, prompt this discomfort – and that we should be serious about understanding, and avoiding, the sort of yang-gone-wrong that it's trying to track."

Specific quote: "Utilitarianism does not love you, nor does it hate you, but you're made of atoms that it can use for something else."

Specific quote: "Indeed, for closely related reasons, when I think about the two ideological communities that have paid the most attention to AI risk thus far—namely, Effective Altruism and Rationalism—the non-green of both stands out."

Specific quote: "Fear? Oh yes, I expect fear. But not only that. And we should look ahead to the whole thing."

Specific quote: “I want to start this series by acknowledging how many dimensions of interspecies-relationship this narrative leaves out”

To me, there is a slight undercurrent of this being a self-fulfilling prophecy/ vicious cycle - that we make a world of conflict slightly more likely by considering that world more likely than it is

I find this quote tremendously moving. And some part of me sings in unison

Carlsmith links the notion of powerseeking, agency, activity and a lack of trust and labels it 'yang'. I have thought using it a lot since

It’s common on twitter to say eg “Happy Eid to those who celebrate”. This isn’t the joke, but it’s as near as I can point.

And write more of this kind of stuff in future. This post took 5 - 15 hours more than if I'd just listened to the pieces. Getting it this short took a long time, as the saying goes, "If I had more time, I would have written a shorter letter" (seems we don't know who originally said this)

Predictive Text

Discussion about this post