The Accidental Democracy of Flat Organizations

Without a leader or a process for decision making, flat organizations make decisions accidentally. The winners tend to be whoever is the most stubborn.

Sep 25, 2024

In 2023 I was brought in as a consultant to offer some advice about the overall software architecture at a startup in the travel and hotel market. The startup had an existing tool which found under-priced vacation spots for tourists. I was not involved with that tool. Instead, I was being asked for advice on their new project. You can think of this as a search tool like Kayak, but combining every type of place a tourist could stay.

For example: Let’s say someone owned an apartment in Naples, which they offered via AirBnB, and they changed the price for which they were renting the apartment. We wanted to capture that change and register it in our database within 1 second, or, worst case, within 5 seconds. And we wanted to do this for every single hotel room in the world — and for every single apartment or home that was being rented via AirBnB, VRBO, or Booking.com, plus a number of smaller, regional "short-term rental" marketplaces.

The major technical difficulty stemmed from the massive write-throughput combined with the requirement for near-real-time updates — difficult but possible. Yet it soon became clear that the major problems we faced were not on the technical side, but rather, the human side. The startup was poorly organized and poorly managed. They had a strong commitment to being a democratic and "flat" organization, which always sounds great in theory but in practice can lead to some rough methods of reaching a decision.

Here I will recount the first meeting I attended.

Suk-aku, Marty, and Vladimir were three engineers who met at the office each day. The startup favored an in-office culture. Olivia and Regina were two engineers who had been given permission to work from home. Both Olivia and Regina had young children they needed to take to and from school, so for them, being able to work from home was an important part of the flexibility the job offered.

As I was at the office, I followed Suk-aku, Marty and Vladimir into a conference room, and then Olivia and Regina joined us via a Zoom call. Olivia spoke first, and she summarized the way she recalled a conversation they had had the previous week: because of the high levels of data throughput, and the need to update the datastore in real-time, she thought this would be a good time to use the Go programming language to write an app that could handle the data importing, plus the minor data processing and cleanup that was needed before pushing the data into the datastore. She thought this approach would be more direct, and ultimately simpler, than trying to rig together multiple Python technologies to achieve the same end.

Now, this made sense to me. But I've learned that it’s a mistake for me, as an outside consultant, to open my mouth and commit myself before I have the whole context. I've learned how important it is to get an understanding of the entire history of a discussion before I offer my own opinion. So I was determined to remain absolutely silent. But as the conversation went on, I became increasingly convinced that Olivia was correct.

Suk-aku strongly disagreed with Olivia, and he had an entirely different memory of the meeting from the previous week. He felt that, after much discussion, they had agreed that a collection of Python technologies would offer them the best way forward.

Olivia: I thought we talked about this last week? I suggested we use Go for this. We are going to face bursts of massive throughput when we import all of the data from places like AirBnB.
Suk-aku: We did talk about this last week, and we agreed that Python gave us all the tools we needed to handle the throughput.
Olivia: What I said last week is that this is a case where we can benefit from working in a language which has strong concurrency primitives, such as Go. We are facing real-time constraints here that make most of Python's techniques inappropriate.
Regina: That is how I remember the conversation too.
Suk-aku: No, absolutely not.
Marty: Python always has the right answer. You just need to dig for it sometimes. Pythonistas find a way.
Vladimir: Python has many, many tools for handling concurrency. The biggest AI and machine-learning projects in the world use Python. Python can do anything.
Olivia: Right, Python can orchestrate some processes that oversee large workloads — but that's in situations where there is no real-time constraint. That's when you've got time to let a process run.
Suk-aku: Wrong.
Marty: So, so wrong.
Regina: To Olivia's point, Python is often just glue code. Especially in large-scale machine-learning tasks, it’s often simply used to invoke underlying technologies, many of which are written in C or Scala or Java.
Vladimir: And C is at least as fast as Go. Maybe faster. Python is fast because it can invoke those other technologies. We can rely on Apache's software for handling real-time streams of data, and we all know that the Apache technology for this has been highly optimized and is blindingly fast.
Regina: Right, but you're not talking about developing in C. You're talking about working in Python, to invoke code written in C, or perhaps, in this case, Scala. As such, there is a lack of directness. And we lose something when we use one language to invoke a technology written in another language. We lose time and we add complexity. And with Python you are not close to the metal, the way you are in C.
Vladimir: That doesn't matter, because we’re invoking C — so we get all the benefits of C, but we get the ease of coding that Python offers us. It is the best of all worlds.
Regina: Not exactly. There are some costs to using Python for this project. Python does not directly know when the underlying C code is done processing a specific message, so we need to build a whole orchestration layer to notify the Python code that the C code is done running. And that orchestration layer introduces a whole new layer of complexity.
Suk-aku: Wrong.
Marty: So, so wrong.
Vladimir: These orchestration layers, they've become standardized. Nowadays you can implement them with just 10 lines of code. I could set up the whole thing in less than 15 minutes.
Olivia: That's impossible.
Regina: Right, as Olivia was saying — I think it’s important that we come up with estimates that are realistic. There is a big difference between setting up an orchestration system versus customizing it for our specific use-case, and then working out the various failure modes that might come up, and then writing code to handle the failures, and then writing the tests to make sure the whole system is robust, and then writing the system checks and endpoint tests that would alert us when something bad happens. That is essentially weeks of work.
Suk-aku: Wrong.
Marty: It's like 15 minutes.
Vladimir: Yeah. Maybe 15 minutes. You don't realize how much functionality these Python libraries offer us. Everything just works now, out of the box. The default settings already cover 99% of any problem that might come up.
Olivia: Um, I'm worried this conversation is not grounded in reality. You cannot be serious about this 15 minute estimate. If you honestly thought you could do the whole thing in 15 minutes, then you could do it right now. We could all watch as you do it. Do you really think you could do the whole project while we watch you, before this call ends?
Suk-aku: That's not fair.
Marty: How would you feel if we tried to ambush you like that, Olivia? How would you like it if we watched you as you worked? Do you really want to be forced to do all of your work on a Zoom call, while we watch you and make fun of you every time you make a mistake?
Olivia: I never suggested that we make fun of anyone. We should help Vladimir in every way we can. But if you honestly think this project will only take 15 minutes, then let's do it now.
Suk-aku: That is so, so unfair.
Marty: I'm shocked you would do this, Olivia. You wouldn't want anyone to treat you like this, so why do you want to treat us this way?
Olivia: What way? If you need more time, that is fine. I'm not the one who is suggesting it can be done in 15 minutes.
Suk-aku: You're trying to ambush us.
Olivia: I am not trying to ambush you.
Regina: I don't think Olivia is trying to ambush anyone. And I agree with her: if you're serious that you can do this in 15 minutes, then why don't we do it right now?
Vladimir: I need some time to get set up. I need some time to spin up the servers it’ll run on. I need to think about what servers we will use to run the code, and how exactly our public VPN will talk to our private VPN, and how to whitelist the routes we’ll use, and how many servers to provision, and how the code should pool database connections, and so much more.
Regina: Yes, exactly, that is exactly what we're trying to say. I think the reason Olivia is suggesting the project will take a few weeks is that there are so many things to consider.
Suk-aku: You are being ridiculous.
Marty: Yeah, and your attack on Python doesn't even make rational sense. We would have to do all of the same things if we use Go.
Regina: Would we? If the Go app can continue to accept writes, while also doing the background processing, then we no longer have the division between Python code and C code plus an orchestration layer to unify their efforts, so a tremendous amount of complexity disappears. Instead, we just have a Go app, maybe running on some servers behind a load balancer, but that is it. We no longer need most of the complexity you were just complaining about.
Olivia: Exactly.
Suk-aku: No, no, no.
Marty: That is totally wrong.
Vladimir: We still have to think about what resources we will need. You are not thinking about the resources we will use.
Regina: What resources? If we have a Go app that is fast enough to both accept high levels of write-throughput while also doing the data processing, before pushing it into the database, then we have a very simple system — certainly a much simpler system than your suggestion, if we used Python with Storm. Or even Airflow.
Suk-aku: Storm?
Marty: Airflow? These are terrible choices. What the hell are you talking about?
Regina: Okay, well, we can use an alternative if —
Suk-aku: See, this is why you are confused.
Marty: You are completely misunderstanding the entire situation.
Vladimir: No one here suggested the use of Storm, why are you suggesting Storm? That is a terrible choice!
Regina: Well, just a minute ago you specifically said you wanted to use the Apache software for real-time streams of data, and Storm is the Apache software for real-time streams of data, so I assumed you were talking about Storm, but if you were thinking of something else, then please just tell us what —
Suk-aku: That is such a terrible idea.
Marty: You are just way off. Like, way off. Your ideas are crazy. This is why you think the project is going to take weeks.
Regina: Okay, so if you didn't mean Storm, then just tell us what —
Vladimir: We don't need to use Storm, I think this is a very bad idea. I think your suggestion is a very bad idea, I don't think you are putting much thought into this.
Regina: I am not suggesting we use Storm.
Vladimir: Well, you are the only person who has mentioned Storm, so this is your suggestion.
Marty: Yes, Regina, it is your suggestion.
Suk-aku: Be honest, Regina, you are the one who mentioned Storm, and it is a very bad idea.
Regina: I am not recommending Storm. I just thought you were talking about Storm, but again, if you're talking about something else, then —
Suk-aku: There are so many options that would be better than Storm.
Marty: So many. I can think of six options that would be better than Storm.
Vladimir: At least six.
Regina: Great, so what are these other —
Suk-aku: And none of them are going to take “weeks” to implement.
Marty: And all of them can be implemented in Python.
Suk-aku: By good Pythonistas who actually know what they are doing.
Vladimir: Once we do the setup work, this will only take 15 minutes.
Suk-aku: And we do not need to use Go.
Marty: That would be a terrible idea.
Vladimir: Go would introduce a lot of complexity. We just want to use Python for everything because that keeps our code base simple.
Marty: And good Pythonistas can always find a way to make it work.
Regina: Well, we can make anything work, but if it's slow then it won't meet the requirements we face for real-time updates.
Suk-aku: It will be fast.
Marty: Very fast.
Vladimir: With the right concurrency framework, we can build a Python system that will be faster than any code written in Go.
Regina: Listen. I started working in Python in 2006, when version 2.5 had just been released. I wrote my first websites using the Zope framework. I've grown up with Python. I understand Python. I've also done some major projects in Go. And I'm telling you, when you can employ an app that can run in constant daemon mode, listening on a port, accepting massive write-throughput while also doing background processing, then there is simply no rational reason to use Python. This is an obvious use-case for Go. To be honest I'm surprised we're even having this discussion.
Suk-aku: Well, this explains everything! You remember what Python was like back in 2006, so you don't realize what Python can do today!
Marty: Hey, Regina, it's not 2006 anymore. We're not stuck with the GIL.
Suk-aku: Oh my god! The GIL! I forgot about that!
Vladimir: The Global Interpreter Lock! Oh for hell’s sake. Is that what you’re worried about, Regina? Because no one has used that in years.
Regina: I know perfectly well that —
Suk-aku: Things have changed since 2006!
Olivia: Okay guys, these attacks are unfounded. Regina was not suggesting that her concerns were about —
Marty: Modern Python has very good concurrency frameworks!
Regina: I know that!
Olivia: For god's sake, half our code has “green threads”! We understand concurrency in modern Python!
Vladimir: Wait a moment, I am going to the documentation page for Tornado. I want to read this to you, Regina, so you understand this. Here is how Tornado explains itself: "Real-time web features require a long-lived mostly-idle connection per user. In a traditional synchronous web server, this implies devoting one thread to each user, which can be very expensive. But Tornado uses non-blocking Input/Output, so Tornado can scale to tens of thousands of open connections." Tornado is the correct Python way to handle concurrency. This gives us everything we need.
Regina: Yes, obviously I have worked with Tornado in the past! But Olivia raised the point earlier that it will require some orchestration if we have Python code in Tornado that is calling some other technology, perhaps written in C! That is why we discussed orchestration.
Olivia: I feel like you guys are not listening.
Suk-aku: I feel like you're not listening, Olivia.
Marty: Yeah, you are definitely not listening to us, Olivia.
Regina: Look, I have to go and pick up my kids from school. I'm already 5 minutes late leaving. So I have to go. But this is an obvious use-case for Go, and we would be fools to try to build this in Python.
Suk-aku: You're not listening to us. You are ignoring our arguments.
Marty: Yeah, Regina, you did not win this debate. You lost. You didn't make good points.
Regina: I have to leave now.
(Then she signed out.)
Olivia: Listen, everyone, I think you are not hearing what we are saying. Obviously Python in 2023 has good tools for concurrency. Obviously the situation is better now than in 2006. No one was suggesting otherwise. But in this case we are facing two very specific requirements, which are, accepting massive write-throughput while doing the data processing fast enough that the updates in the database appear to be real-time. That is a weak point for Python. It can be done, but it takes effort. Meanwhile, it's an area where Go is perfect. If we simply write a Go app to handle this then the final system will be much simpler, with fewer moving parts, and we can fulfill all of the requirements with less effort.
Suk-aku: Okay, so you admit that this can be done with Python? So why are we even talking about this? Let's stick with one language.
Marty: Yeah, Olivia, you just admitted that this can be done with Python. So you admit that we are right and you are wrong. Why are we even having this conversation?
Vladimir: Real Pythonistas always find a way. And we can make this work. There is no question about that.
Olivia: Okay, well, I also need to go get my kids from school, so I cannot discuss this any more. But honestly I feel we are making a huge mistake if we use Python for this project. We are inviting extra complexity and therefore the risk of delays.
Suk-aku: That's just not true.
Marty: And, Olivia, you already admitted you were wrong, so why are we still discussing this?
Olivia: I have to leave. Bye.

And then she signed out.

I spent another two weeks studying the requirements that had come from the Product Team, and I concluded that Olivia and Regina were correct, and that the final system would be simpler if they used Go. I wrote my report and sent it to the leadership, but the leadership, in their commitment to the ideal of a flat organization, left the implementation details to the teams, so my report was ignored. When a client does not listen to my advice then I feel they are wasting their money and I am wasting my time. So I soon moved on to other clients who were more interested in what I had to say.

In the end, Suk-aku and Marty and Vladimir began building the system using a variety of Python technologies. I believe it took them a long time to nail down the right set of feedback systems and orchestration systems which allowed them to get close to real-time updates.
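For readers who want a concrete picture of the design Olivia and Regina were arguing for, here is a minimal sketch in Go. It is my own illustration, not code from the startup. The names `PriceUpdate`, `Store`, `clean`, and `Ingest` are hypothetical, the datastore is replaced by an in-memory map, and the network listener is left out entirely. The only point is to show how a single process can keep accepting writes on a buffered channel while a pool of goroutines does the cleanup and the database write in the background.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// PriceUpdate is a hypothetical message shape; the real feed format
// is not described in this essay.
type PriceUpdate struct {
	ListingID string
	Price     float64
}

// Store is a stand-in for the real datastore: an in-memory map
// guarded by a mutex so several goroutines can write to it safely.
type Store struct {
	mu     sync.Mutex
	prices map[string]float64
}

func NewStore() *Store {
	return &Store{prices: make(map[string]float64)}
}

func (s *Store) Save(u PriceUpdate) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.prices[u.ListingID] = u.Price
}

func (s *Store) Get(id string) (float64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.prices[id]
	return p, ok
}

// clean stands in for the "minor data processing and cleanup" step.
// Here it only normalizes the listing ID (an assumption for the demo).
func clean(u PriceUpdate) PriceUpdate {
	u.ListingID = strings.ToLower(strings.TrimSpace(u.ListingID))
	return u
}

// Ingest starts the given number of worker goroutines. Each worker
// drains the channel, cleans every update, and saves it. The returned
// WaitGroup is done once the channel is closed and fully drained.
func Ingest(in <-chan PriceUpdate, store *Store, workers int) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range in {
				store.Save(clean(u))
			}
		}()
	}
	return wg
}

func main() {
	// The buffer absorbs short bursts of writes while workers catch up.
	in := make(chan PriceUpdate, 1024)
	store := NewStore()
	wg := Ingest(in, store, 4)

	// In a real service these would arrive over the network.
	in <- PriceUpdate{ListingID: "  Naples-Apt-1 ", Price: 120}
	in <- PriceUpdate{ListingID: "rome-hotel-7", Price: 95.5}
	close(in)
	wg.Wait()

	if p, ok := store.Get("naples-apt-1"); ok {
		fmt.Println("naples-apt-1 is now priced at", p)
	}
}
```

With more than one worker, two updates for the same listing can be saved out of order, so a real system would need to route each listing to a fixed worker or compare timestamps before writing. The sketch ignores that, along with retries, backpressure, and monitoring, all of which are part of why the honest estimate was weeks and not 15 minutes.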

But let us step away from the specifics of the technical decisions. Let us instead talk about the style of decision-making displayed here.

(It is a tradition, when writing an essay like this, that at some point the author should reference Jo Freeman's essay, “The Tyranny of Structurelessness.” Freeman’s essay is about political movements but has some insights that are also useful in a business setting.)

At this particular startup, the CTO was overseeing a team of 60 engineers, and in theory they were all direct reports to the CTO, as the organization was flat. Any engineer could, in theory, take any concern straight to the CTO. But as a practical matter, the CTO was spread thin. Despite the nominal commitment to flatness, the CTO had certain favorites, to whom he spoke more frequently than he spoke to others. This introduced an invisible hierarchy into the organization, one that was more insidious than a normal hierarchy precisely because it was invisible and undefined. You had to already be an insider to know who the insiders were. From the outside, you could not even be sure who the favorites were.

As an aside, I’d like to comment on the fact that Suk-aku and Marty and Vladimir all referred to themselves as "Pythonistas." They were clearly proud of their association with Python. If they had been pragmatic, Python would have been only a tool to them, not an identity. Ideally, they would have been neutral when considering the differences between Python and Go. But they clearly started with a bias towards Python, and their attitude was basically, "You must offer overwhelming evidence against Python before we will renounce our loyalty." This is not the correct way to make technical decisions. Technical decisions are never black-and-white; they are always subtle, full of nuance, and sensitive to context, so there is no way to prove that one approach is better than another. Creating beautiful software is an art, not a science. The best decisions will be made by master craftsmen whose judgement has been shaped by thousands of successes and thousands of failures. It’s impossible to explain these intuitions to someone who has a strong bias towards some pre-determined outcome.

But now let's refocus on the main issue, which is that this organization was flat. The CTO was spread thin — he could not meet with everyone, nor could he hope to understand every technical dispute that was happening in the organization. Occasionally teams would try to get his attention, only to be scheduled for a meeting two months in the future, which would then be canceled and rescheduled for yet another two months in the future.

As such, the teams had gotten used to making decisions on their own.

In organizations that have a strong commitment to being flat, you tend to end up with a lot of "accidental democracy." No one is in charge, so the teams will meet and try to hash out some kind of agreement. And in such cases, you don't necessarily get the best decision. Nor do you automatically get the decision that the majority of a team favors. You end up getting decisions favored by the most stubborn members of the team. They will simply wear everyone else down.

In this particular case, Suk-aku and Marty and Vladimir had a strong commitment to Python, and they were willing to sit in that conference room all day long, if necessary, to be sure that Python was always the winning answer. By contrast, Olivia and Regina both had to take care of young children, so they could not sit in an endless meeting, rehashing the same points over and over. Therefore they lost every argument, even when they were probably correct on the technical merits.

And that is one of the reasons why flat organizations tend to be so dysfunctional.

Thousands (probably tens of thousands?) of essays have been written about the kinds of dysfunction you run into in big bureaucracies. By contrast, not many essays have been written about the kind of dysfunction you run into in flat organizations. But, from what I've seen, the dysfunction caused by flatness can be as severe as the dysfunction caused by big bureaucracy.

If I had been put in charge of this startup, how would I have fixed the problems I witnessed? For me, the answer is simple: I would have ended the commitment to flatness. I felt that Olivia had the best intuitions about the right path forward, so I would have put her in charge of that team. She could then set the technical direction, in defiance of what Suk-aku and Marty and Vladimir felt. It would then be up to them (Suk-aku and Marty and Vladimir) to get on board with the new direction or go elsewhere. If they quit, Olivia would then be free to hire new people who aligned more strongly with her vision. And this small increase in hierarchy (making Olivia the team lead) would solve many of the problems the team suffered from.

We should therefore ask: which is better, big bureaucracy or flatness? I would say neither matters. All that matters is good leadership, and there is no formula for that. As with all great art, we know it when we see it, but it is exceedingly difficult to say why one bit of paint is a masterpiece while another bit of paint is an amateur effort. Likewise, it is difficult to say why one person's subtle system of decision-making leads to great outcomes, while someone else's subtle system of decision-making leads to terrible outcomes.

And yet, we can conclude with one observation that is absolutely true and indisputable. As an organization gets larger, it will need more systems for keeping track of its money; it will need more security to protect itself from internal embezzlement; it will need more lawyers to negotiate increasingly large deals; and it will need some way to measure what work is getting done, which will become more complex as the team grows and the number of projects multiplies. Additionally, it will need some system of accountability when work is not getting done and a system of punishment for those who don't get work done, and the company will need to document these punishments for legal reasons. In other words, bureaucracy is inevitable.

Therefore, you, as a leader, must aim to build a beautiful, pragmatic, and ambitious bureaucracy. Even if bureaucracy and flatness each cause their own kind of dysfunction, only bureaucracy can solve the problems you face as you grow.

I could also say this the other way around:

Bureaucracy is avoidable if you are a failure. You can remain flat forever if you remain small forever. Bureaucracy is only inevitable if you grow. And bureaucracy, in turn, allows for future growth. You cannot have an organization with 10,000 employees until you've built the kind of vast bureaucracy that can support having 10,000 employees. And yet, many companies have a strong commitment to flatness, and so they never build the bureaucracy that would allow a larger organization to exist, so they never become a larger organization, so they strangle their own growth, so they make their own failure inevitable.

Phrased concisely:

If you are truly committed to flatness, then you are truly committed to your own failure.

As a leader, think carefully. Is that what you really want?

Respectful Leadership
