<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Mind the Gap</title>
<link>https://blog.kasralekan.com/ideas/</link>
<atom:link href="https://blog.kasralekan.com/ideas/index.xml" rel="self" type="application/rss+xml"/>
<description>Technical, business, and philosophical musings</description>
<generator>quarto-1.5.55</generator>
<lastBuildDate>Tue, 01 Oct 2024 04:00:00 GMT</lastBuildDate>
<item>
  <title>Money and Politics in US House of Representatives Elections</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/money-and-politics/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="citizens-united-court-sketch.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="“In this Sept.&nbsp;9, 2009 artist rendering, U.S. Solicitor General Elena Kagan, right, argues before the Supreme Court, Citizens United v. Federal Election Commission in Washington. (Dana Verkouteran/AP)” [@kaminer_truth_2015]"><img src="https://blog.kasralekan.com/ideas/money-and-politics/citizens-united-court-sketch.jpg" class="img-fluid figure-img" alt="“In this Sept.&nbsp;9, 2009 artist rendering, U.S. Solicitor General Elena Kagan, right, argues before the Supreme Court, Citizens United v. Federal Election Commission in Washington. (Dana Verkouteran/AP)” (Kaminer 2015)"></a></p>
<figcaption>“In this Sept.&nbsp;9, 2009 artist rendering, U.S. Solicitor General Elena Kagan, right, argues before the Supreme Court, Citizens United v. Federal Election Commission in Washington. (Dana Verkouteran/AP)” <span class="citation" data-cites="kaminer_truth_2015">(Kaminer 2015)</span></figcaption>
</figure>
</div>
<section id="context" class="level1">
<h1>Context</h1>
<p>As we near the 2024 presidential vote, I have heard several discussions of the corrupting influence of money in American politics from journalists and pundits inside and outside the United States. They argue that the trouble stems from <a href="https://en.wikipedia.org/w/index.php?title=Citizens_United_v._FEC&amp;oldid=1247060928">Citizens United</a>, which, plainly put, determined that <em>Money is Speech</em>, enshrining the ability of firms and individuals to spend huge sums on political candidates.</p>
<div id="fig-pew" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pew-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="PP_2023.09.19_views-of-politics_05-02.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="PEW Research [@nadeem_5_2023]"><img src="https://blog.kasralekan.com/ideas/money-and-politics/PP_2023.09.19_views-of-politics_05-02.png" class="img-fluid figure-img" style="width:50.0%"></a></p>
<figcaption>PEW Research <span class="citation" data-cites="nadeem_5_2023">(Nadeem 2023)</span></figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-pew-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>Many argue that this allows politicians to be influenced by big donors, who will sometimes donate to both candidates in a race to ensure their influence regardless of the outcome. I do not consider implicit bias in favor of donors’ agendas here. I wanted to tackle a far simpler question: in our modern age of targeted advertising and dense information ecosystems, “<em>Does money facilitate election wins?</em>”</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="openSecretWinning.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Percent of Races Won by Top Spending Candidate [@opensecrets_did_2022]"><img src="https://blog.kasralekan.com/ideas/money-and-politics/openSecretWinning.png" class="img-fluid figure-img" alt="Percent of Races Won by Top Spending Candidate (OpenSecrets 2022)"></a></p>
<figcaption>Percent of Races Won by Top Spending Candidate <span class="citation" data-cites="opensecrets_did_2022">(OpenSecrets 2022)</span></figcaption>
</figure>
</div>
<p>It is no secret that the top-spending candidate wins almost all of the time. This post was inspired by someone stating, “In 2022, the candidate with more money won over 93% of the time.” But we would expect this regardless. Let’s suppose we have hypothetical candidates A and B. Candidate A is well-liked by her constituents and as a result receives a large amount of donations. Candidate B is not well-liked and receives fewer donations. Candidate A wins <em>AND</em> is better funded because she is more popular. Therefore, a more in-depth analysis is needed to evaluate whether having more money is unduly beneficial.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="confoundingCorrelation.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="A confounding variable (popularity) influences both donations and voting, creating a misleading association between them."><img src="https://blog.kasralekan.com/ideas/money-and-politics/confoundingCorrelation.png" class="img-fluid figure-img" style="width:70.0%" alt="A confounding variable (popularity) influences both donations and voting, creating a misleading association between them."></a></p>
<figcaption>A confounding variable (popularity) influences both donations and voting, creating a misleading association between them.</figcaption>
</figure>
</div>
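To make the confounding argument concrete, here is a toy simulation (all numbers are invented for illustration): popularity drives both donations and votes, spending has no causal effect on the outcome at all, and yet spending and margin end up strongly correlated.

```python
import random

random.seed(0)

# Hypothetical data-generating process: popularity causes BOTH spending and
# margin; spending itself never enters the margin equation.
n = 2000
popularity = [random.gauss(0, 1) for _ in range(n)]
spending = [p + random.gauss(0, 0.5) for p in popularity]  # donations follow popularity
margin = [p + random.gauss(0, 0.5) for p in popularity]    # votes also follow popularity

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Spending and margin correlate strongly despite zero causal link between them.
print(round(pearson(spending, margin), 2))
```

With these noise levels the spurious correlation is about 0.8, purely from the shared cause.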
</section>
<section id="methods" class="level1">
<h1>Methods</h1>
<p><strong>Datasets</strong>: For this analysis I combine FEC campaign finance data <span class="citation" data-cites="fec_campaign_2024">(FEC 2024)</span>, election results data <span class="citation" data-cites="lab_us_2024">(Lab 2024)</span>, and overall election data for Cook PVIs <span class="citation" data-cites="cook_pvi_2023_2023">(Cook PVI℠ 2023)</span> and national vote totals from Wikipedia <span class="citation" data-cites="wikipedia_2020_2021">(e.g. Wikipedia 2021)</span>. The code and data for the analysis can be found <a href="https://github.com/anrath/money-and-politics">here</a>.</p>
<p><strong>Restrictions of the Dataset</strong>: I consider districts during the period 2012-2020 to avoid redistricting between years in the dataset. I only consider races between a registered Democrat and Republican candidate.</p>
<p><strong>Variables</strong>: The independent variable is the spending ratio between the winning candidate and the runner-up. The dependent variable is the Margin of Victory (%) or Adjusted Margin (%). Adjusted Margin was calculated to control for the political bias of the district and the country. To control for the political bias of the district, I adjust the Margin of Victory by the Cook PVI for the district (it is best to think of this as calculating the under- or over-performance of the candidate based on the political lean of the district). I also adjust by what I am calling the “National House Environment” which is just the difference between the percentage of all Americans who voted for Republican candidates and those who voted for Democratic candidates. This adjustment was an attempt to control for the influence of the environment of the nation and media on the voting of “swing” voters in each district. I did not control for incumbency, but I separately analyzed cases where incumbents were not present.</p>
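The exact arithmetic lives in the linked repository; as a sketch, the adjustment described above might look like the following (the sign conventions and scaling are my assumption, not the repository's actual code):

```python
def adjusted_margin(margin_of_victory, cook_pvi, national_environment):
    """Hypothetical sketch of the Adjusted Margin described above.

    margin_of_victory    -- winner's vote share minus runner-up's, in points
    cook_pvi             -- district lean toward the winner's party, in points
                            (positive = district structurally favors the winner)
    national_environment -- national House popular-vote lead for the winner's
                            party, in points
    """
    # Subtract both structural advantages so the residual reflects the
    # candidate's over- or under-performance relative to expectations.
    return margin_of_victory - cook_pvi - national_environment

# A winner with a 12-point raw margin, in a district leaning 8 points her way,
# in a year her party led the national House vote by 2 points, over-performed
# by only 2 points:
print(adjusted_margin(12, 8, 2))
```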
</section>
<section id="findings" class="level1">
<h1>Findings</h1>
<div id="fig-findings" class="quarto-layout-panel">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-findings-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-layout-row">
<div class="quarto-layout-cell-subref quarto-layout-cell" data-ref-parent="fig-findings" style="flex-basis: 50.0%;justify-content: flex-start;">
<div id="fig-mv-all" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-subfloat-fig figure">
<div aria-describedby="fig-mv-all-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="output/spending_ratio_and_margin_of_victory.png" class="lightbox" data-gallery="fig-findings" title="Figure&nbsp;2&nbsp;(a): All races; Raw Margin of Victory vs.&nbsp;Spending Ratio"><img src="https://blog.kasralekan.com/ideas/money-and-politics/output/spending_ratio_and_margin_of_victory.png" class="img-fluid figure-img" data-ref-parent="fig-findings"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-subfloat-caption quarto-subfloat-fig" id="fig-mv-all-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
(a) All races; Raw Margin of Victory vs.&nbsp;Spending Ratio
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell-subref quarto-layout-cell" data-ref-parent="fig-findings" style="flex-basis: 50.0%;justify-content: flex-start;">
<div id="fig-am-all" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-subfloat-fig figure">
<div aria-describedby="fig-am-all-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="output/spending_ratio_and_adjusted_margin.png" class="lightbox" data-gallery="fig-findings" title="Figure&nbsp;2&nbsp;(b): All races; Adjusted Margin vs.&nbsp;Spending Ratio"><img src="https://blog.kasralekan.com/ideas/money-and-politics/output/spending_ratio_and_adjusted_margin.png" class="img-fluid figure-img" data-ref-parent="fig-findings"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-subfloat-caption quarto-subfloat-fig" id="fig-am-all-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
(b) All races; Adjusted Margin vs.&nbsp;Spending Ratio
</figcaption>
</figure>
</div>
</div>
</div>
<div class="quarto-layout-row">
<div class="quarto-layout-cell-subref quarto-layout-cell" data-ref-parent="fig-findings" style="flex-basis: 50.0%;justify-content: flex-start;">
<div id="fig-mv-cont" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-subfloat-fig figure">
<div aria-describedby="fig-mv-cont-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="output/spending_ratio_and_margin_of_victory_most_contested_races.png" class="lightbox" data-gallery="fig-findings" title="Figure&nbsp;2&nbsp;(c): Races with +/- 10% Adjusted Margin; Raw Margin of Victory vs.&nbsp;Spending Ratio"><img src="https://blog.kasralekan.com/ideas/money-and-politics/output/spending_ratio_and_margin_of_victory_most_contested_races.png" class="img-fluid figure-img" data-ref-parent="fig-findings"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-subfloat-caption quarto-subfloat-fig" id="fig-mv-cont-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
(c) Races with +/- 10% Adjusted Margin; Raw Margin of Victory vs.&nbsp;Spending Ratio
</figcaption>
</figure>
</div>
</div>
<div class="quarto-layout-cell-subref quarto-layout-cell" data-ref-parent="fig-findings" style="flex-basis: 50.0%;justify-content: flex-start;">
<div id="fig-am-cont" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-subfloat-fig figure">
<div aria-describedby="fig-am-cont-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="output/spending_ratio_and_adjusted_margin_most_contested_races.png" class="lightbox" data-gallery="fig-findings" title="Figure&nbsp;2&nbsp;(d): Races with +/- 10% Adjusted Margin; Adjusted Margin vs.&nbsp;Spending Ratio"><img src="https://blog.kasralekan.com/ideas/money-and-politics/output/spending_ratio_and_adjusted_margin_most_contested_races.png" class="img-fluid figure-img" data-ref-parent="fig-findings"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-subfloat-caption quarto-subfloat-fig" id="fig-am-cont-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
(d) Races with +/- 10% Adjusted Margin; Adjusted Margin vs.&nbsp;Spending Ratio
</figcaption>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-findings-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Comparing victory margins with relative spending. Note that “Challenger vs.&nbsp;Incumbent” indicates that the Challenger wins a race against an Incumbent.
</figcaption>
</figure>
</div>
<p>As I hypothesized, the correlation decreases when using the Adjusted Margin. With an overall correlation of 0.27, the relationship between the Spending Ratio and Adjusted Margin is weak but not negligible. Most of the data comes from races where an incumbent is present and wins, which is extremely common in US House races. Thus, I broke down the correlations by race type (data from “most contested races”):</p>
<ul>
<li>Overall correlation between Spending Ratio and Adjusted Vote Margin (n=437): 0.2685</li>
<li>Correlation for Incumbent vs Challenger (n=306): 0.2263</li>
<li>Correlation for Open Seat (n=68): 0.2243</li>
<li>Correlation for Challenger vs Incumbent (n=63): -0.0215</li>
</ul>
<p>While there are far fewer races where the Challenger beats the Incumbent (n=63 in “most contested races”, n=73 in “all races”), it is interesting that there is no correlation between spending ratio and adjusted margin in these cases.</p>
</section>
<section id="discussion" class="level1">
<h1>Discussion</h1>
<p>FiveThirtyEight’s piece <a href="https://fivethirtyeight.com/features/money-and-elections-a-complicated-love-story/">“How Money Affects Elections”</a> looks at this issue from a more qualitative perspective based on the work of some prominent researchers in the field. They raise a few interesting ideas that mesh with my findings:</p>
<blockquote class="blockquote">
<p>… early fundraising strongly predicted who would win primary races. …advertising is useful for making voters aware that a candidate or an issue exists at all. <span class="citation" data-cites="koerth_how_2018">(Koerth 2018)</span></p>
</blockquote>
<blockquote class="blockquote">
<p>… the strong raw association between raising the most cash and winning probably has more to do with big donors who can tell (based on polls or knowledge of the district or just gut-feeling woo-woo magic) that one candidate is more likely to win — and then they give that person all their money. <span class="citation" data-cites="koerth_how_2018">(Koerth 2018)</span></p>
</blockquote>
<p>Based on the relatively weak correlations I found and the views of various authors on this topic, money influences elections, but in a far more minor way than one would expect. Money matters most when a challenger attempts to defeat an incumbent or when a seat is open. Thus, money can, to an extent, be a barrier to entry into the political fray. Regarding the erosion of American democracy, I would contend that money is primarily an issue in elections because of its negative impact on voter perceptions (see Figure&nbsp;1). Once politicians are in office, however, the influence of such donations can take a far more corrosive toll.</p>



</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-cook_pvi_2023_2023" class="csl-entry">
Cook PVI℠. 2023. <span>“2023 <span>Cook</span> <span>PVI</span>℠: <span>District</span> <span>Map</span> and <span>List</span> (118th <span>Congress</span>).”</span> <a href="https://www.cookpolitical.com/cook-pvi/2023-partisan-voting-index/118-district-map-and-list">https://www.cookpolitical.com/cook-pvi/2023-partisan-voting-index/118-district-map-and-list</a>.
</div>
<div id="ref-fec_campaign_2024" class="csl-entry">
FEC. 2024. <span>“Campaign Finance Data.”</span> <a href="https://www.fec.gov/data/browse-data/?tab=bulk-data">https://www.fec.gov/data/browse-data/?tab=bulk-data</a>.
</div>
<div id="ref-kaminer_truth_2015" class="csl-entry">
Kaminer, Wendy. 2015. <span>“The <span>Truth</span> <span>About</span> <span>Citizens</span> <span>United</span>.”</span> <a href="https://www.wbur.org/cognoscenti/2015/01/21/campaign-finance-myths-wendy-kaminer">https://www.wbur.org/cognoscenti/2015/01/21/campaign-finance-myths-wendy-kaminer</a>.
</div>
<div id="ref-koerth_how_2018" class="csl-entry">
Koerth, Maggie. 2018. <span>“How <span>Money</span> <span>Affects</span> <span>Elections</span>.”</span> <em>FiveThirtyEight</em>. <a href="https://fivethirtyeight.com/features/money-and-elections-a-complicated-love-story/">https://fivethirtyeight.com/features/money-and-elections-a-complicated-love-story/</a>.
</div>
<div id="ref-lab_us_2024" class="csl-entry">
Lab, MIT Election Data and Science. 2024. <span>“U.<span>S</span>. <span>House</span> 1976–2022.”</span> Harvard Dataverse. <a href="https://doi.org/10.7910/DVN/IG0UN2">https://doi.org/10.7910/DVN/IG0UN2</a>.
</div>
<div id="ref-nadeem_5_2023" class="csl-entry">
Nadeem, Reem. 2023. <span>“5. <span>Money</span>, Power and the Influence of Ordinary People in <span>American</span> Politics.”</span> <em>Pew Research Center</em>. <a href="https://www.pewresearch.org/politics/2023/09/19/money-power-and-the-influence-of-ordinary-people-in-american-politics/">https://www.pewresearch.org/politics/2023/09/19/money-power-and-the-influence-of-ordinary-people-in-american-politics/</a>.
</div>
<div id="ref-opensecrets_did_2022" class="csl-entry">
OpenSecrets. 2022. <span>“Did <span>Money</span> <span>Win</span>?”</span> <em>OpenSecrets</em>. <a href="https://www.opensecrets.org/elections-overview/winning-vs-spending">https://www.opensecrets.org/elections-overview/winning-vs-spending</a>.
</div>
<div id="ref-wikipedia_2020_2021" class="csl-entry">
Wikipedia. 2021. <span>“2020 <span>United</span> <span>States</span> <span>House</span> of <span>Representatives</span> Elections - <span>Wikipedia</span>.”</span> <a href="https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections">https://en.wikipedia.org/wiki/2020_United_States_House_of_Representatives_elections</a>.
</div>
</div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“Money and Politics in US House of
Representatives Elections.”</span> October 1, 2024. <a href="https://blog.kasralekan.com/ideas/money-and-politics/">https://blog.kasralekan.com/ideas/money-and-politics/</a>.
</div></div></section></div> ]]></description>
  <category>Data Analysis</category>
  <category>Policy</category>
  <guid>https://blog.kasralekan.com/ideas/money-and-politics/</guid>
  <pubDate>Tue, 01 Oct 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/money-and-politics/citizens-united-court-sketch.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>The Large World of Small(er) Language Models</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/small-language-models/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="heroGraphSizeQuality.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://blog.kasralekan.com/ideas/small-language-models/heroGraphSizeQuality.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:70.0%"></a></p>
</figure>
</div>
<div class="callout callout-style-default callout-warning callout-titled" title="Technical Content Disclaimer">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Technical Content Disclaimer
</div>
</div>
<div class="callout-body-container callout-body">
<!-- https://quarto.org/docs/authoring/callouts.html -->
<p>This post is not a journal-level review. My research for it was intended to be informational and does not exhaust the search space. If you notice any key papers or references that I have missed, or if I have misinterpreted the findings of any reference, please let me know in the comments.</p>
</div>
</div>
<section id="an-aside-on-naming-and-definitions" class="level1">
<h1>An Aside on Naming and Definitions</h1>
<p>Naming in science can lead to confusion when the thing named evolves beyond its original name and purpose. The relevant example here is “Large Language Models” (LLMs). Nearly all modern models of this kind rest on three things:</p>
<ol type="1">
<li>A large amount of training data to service a stacked collection of underlying layers.</li>
<li>Attention layers, the contents of which have evolved since <span class="citation" data-cites="vaswani_attention_2017">Vaswani et al. (2017)</span>, in (mostly) transformer-like layer collections.</li>
<li>Training on sequences of tokens, which mostly correspond to natural language.</li>
</ol>
<p>A few caveats, however, are necessary even in this basic list. First, the term “large” is relative: it indicates that the original LLMs were much larger than any previous models. Second, nothing ensures the primacy of Transformer layers. Recent models that focus on reducing inference compute integrate components of state space models (SSMs), most notably proposed in <span class="citation" data-cites="gu_mamba_2023">Gu and Dao (2023)</span>. I highly recommend watching Daniel Fu’s talk from NeurIPS 2023 for a foundational overview of this area <span class="citation" data-cites="fu_39014562_2023">(Fu 2023)</span>. Third, tokens are not necessarily language. To give just one example from my own work, PLAN-BERT <span class="citation" data-cites="shao_degree_2021">(Shao, Guo, and Pardos 2021)</span> uses course codes at UC Berkeley as tokens to predict course schedules.</p>
<p>As usual, Andrej Karpathy has already expressed this idea with a strong public response:</p>
<blockquote class="blockquote">
<p>It’s a bit sad and confusing that LLMs (“Large Language Models”) have little to do with language; It’s just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something. <span class="citation" data-cites="karpathy_its_2024">(@karpathy 2024)</span></p>
</blockquote>
<section id="what-is-a-small-language-model" class="level2">
<h2 class="anchored" data-anchor-id="what-is-a-small-language-model">🛩️ What is a Small Language Model?</h2>
<p>As with “large” models, the definition of “small” is somewhat loose. Rather than thinking of any specific size cutoff, here I will adapt the name to Small(er) Language Models. Many of the models considered here attempt to increase either the inference compute efficiency through explicit model choices or by more efficiently condensing the capabilities of an effective language model into a smaller parameter count.</p>
</section>
</section>
<section id="smaller-lms-technique-overview" class="level1">
<h1>Small(er) LMs: Technique Overview</h1>
<p>Here I examine a collection of modern small(er) language models to determine which techniques are used to create more efficient models. I break these techniques down into three stages: pre-train, train, and post-train.</p>
<section id="pre-train-data-curation-and-synthetic-data" class="level2">
<h2 class="anchored" data-anchor-id="pre-train-data-curation-and-synthetic-data">📚 Pre-train: data curation and synthetic data</h2>
<p>Several modern approaches incorporate high-quality synthetic data into their training datasets. For example, the Phi-2 model <span class="citation" data-cites="javaheripi_phi-2_2023">(Javaheripi and Bubeck 2023)</span> uses synthetic datasets specifically created to teach common sense reasoning and general knowledge, covering areas such as science, daily activities, and theory of mind. The Phi-3 model <span class="citation" data-cites="beatty_tiny_2024">Abdin et al. (2024)</span> takes this concept further by employing a sophisticated prompting and seeding formula inspired by the TinyStories approach. This method involves <span class="citation" data-cites="li_textbooks_2023">(Li et al. 2023)</span>:</p>
<ol type="1">
<li>Collecting publicly available information into an initial dataset</li>
<li>Using a large language model (LLM) to synthesize new content based on this data</li>
<li>Filtering the generated content for quality</li>
<li>Feeding the filtered content back into the LLM for further synthesis</li>
</ol>
<p>This iterative process allows researchers to build up a high-quality corpus of data large enough to train a capable small language model over several weeks.</p>
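The four steps above can be sketched as a loop; <code>generate</code> and <code>quality_score</code> below are hypothetical stand-ins for an LLM call and a quality filter, not any lab's actual pipeline:

```python
def synthesize_corpus(seed_docs, generate, quality_score,
                      rounds=3, threshold=0.8):
    """Iterative synthesize-filter-feedback loop (hypothetical sketch)."""
    corpus = list(seed_docs)          # step 1: start from collected seed data
    for _ in range(rounds):
        # Step 2: synthesize new content conditioned on the current corpus.
        candidates = [generate(doc) for doc in corpus]
        # Step 3: keep only generations that pass the quality filter.
        kept = [c for c in candidates if quality_score(c) >= threshold]
        # Step 4: feed the filtered content back in for the next round.
        corpus.extend(kept)
    return corpus

# Toy stand-ins so the loop is runnable end to end:
corpus = synthesize_corpus(
    ["seed text"],
    generate=lambda doc: doc + " + expansion",
    quality_score=lambda doc: 1.0 if "expansion" in doc else 0.0,
)
print(len(corpus))
```

Because accepted generations are fed back in, the corpus compounds across rounds, which is how a small seed can grow into a training set over several weeks.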
</section>
<section id="train-distillation-and-architecture" class="level2">
<h2 class="anchored" data-anchor-id="train-distillation-and-architecture">🚝 Train: Distillation and Architecture</h2>
<section id="knowledge-distillation" class="level3">
<h3 class="anchored" data-anchor-id="knowledge-distillation">Knowledge distillation</h3>
<p>Knowledge distillation is a technique where a smaller model (the student) is trained to mimic the behavior of a larger, more complex model (the teacher). This approach allows the smaller model to benefit from the knowledge captured by the larger model while maintaining a more compact size, e.g., Gemma 2 <span class="citation" data-cites="schmid_welcome_2024">(Schmid et al. 2024)</span>. Knowledge distillation is not new, even in the language model space <span class="citation" data-cites="sanh_distilbert_2019">(Sanh et al. 2019)</span>.</p>
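A minimal sketch of the classic soft-label distillation objective (the Hinton-style formulation; not necessarily any particular model's exact recipe): the student is trained to match temperature-softened teacher probabilities rather than hard labels.

```python
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so gradient magnitudes are preserved."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# The loss vanishes when the student's logits match the teacher's exactly:
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
```

The temperature spreads probability mass over non-argmax tokens, which is where most of the teacher's "dark knowledge" lives.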
</section>
<section id="architectural-approach-examples" class="level3">
<h3 class="anchored" data-anchor-id="architectural-approach-examples">Architectural Approach Examples</h3>
<p>Several architectural innovations have been employed to create more efficient small language models. A few examples include:</p>
<ol type="1">
<li>Grouped-Query Attention (GQA): This attention mechanism helps improve efficiency in processing queries. See <span class="citation" data-cites="abdin_phi-3_2024">Abdin et al. (2024)</span>.</li>
<li>Embedding Tying: This technique reduces the model’s parameter count by sharing weights between the input embedding and output layer.</li>
<li>Hybrid Models: Combines Mamba layers (a type of state space model) with shared attention layers. This approach aims to balance the efficiency of state space models with the expressiveness of attention mechanisms, e.g., <span class="citation" data-cites="glorioso_zamba2-mini_2024">Goel (2024)</span></li>
<li>Mixture of Expert Models: While not strictly a small model technique, the Mixture of Experts approach can create more efficient large models. For example, Phi-3.5-MoE comprises 16 experts, each containing 3.8B parameters. During inference, it activates only a subset of these experts (typically two), resulting in 7.6B active parameters out of a total of 42B.</li>
</ol>
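Of these, embedding tying is the simplest to illustrate. A toy sketch (invented numbers, 4-token vocabulary, 3-dimensional hidden states) of one matrix serving as both the input embedding and the output projection:

```python
# One shared weight matrix E: row lookup on the way in, dot products on the
# way out, so vocabulary parameters are counted once instead of twice.
E = [
    [0.1, 0.2, 0.3],  # token 0
    [0.4, 0.5, 0.6],  # token 1
    [0.7, 0.8, 0.9],  # token 2
    [1.0, 1.1, 1.2],  # token 3
]

def embed(token_id):
    # Input side: the embedding of a token is simply its row of E.
    return E[token_id]

def logits(hidden):
    # Output side: the logit for each token is the dot product of the hidden
    # state with that token's row of the SAME matrix E.
    return [sum(h * w for h, w in zip(hidden, row)) for row in E]

print(logits(embed(2)))
```

For a real model with a 32K vocabulary and 2K-dimensional embeddings, tying removes roughly 65M parameters relative to keeping a separate output head.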
</section>
</section>
<section id="post-train" class="level2">
<h2 class="anchored" data-anchor-id="post-train">🔪 Post-train</h2>
<section id="pruning-and-quantization" class="level3">
<h3 class="anchored" data-anchor-id="pruning-and-quantization">Pruning and Quantization</h3>
<p>Pruning techniques remove less important weights or entire neurons from the model, reducing its size and potentially improving inference speed.</p>
<p>Quantization involves reducing the precision of the model’s weights (e.g., from 32-bit floating-point to 8-bit integers), which can significantly reduce the model’s memory footprint and inference time with minimal impact on performance.</p>
<p>While in theory both pruning and quantization can reduce model size, quantization is used far more often. A number of interesting papers have advanced the effectiveness of LLM quantization. Originally, LLMs quantized poorly because they exhibit outliers in specific activation channels across all layers and tokens<sup>1</sup>. <span class="citation" data-cites="malinovskii_evolution_2024">Malinovskii (2024)</span> has a good blog post summarizing the progression of this research.</p>
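A minimal sketch of symmetric per-tensor 8-bit weight quantization, the baseline that the outlier problem complicates (per-channel or group-wise scales are the usual remedy, and the weight values here are invented):

```python
def quantize(weights):
    """Map floats to integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; this is what happens at inference time.
    return [qi * scale for qi in q]

weights = [0.03, -0.51, 0.27, 1.27, -1.00]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# Rounding bounds the error at half a quantization step per weight --
# unless one outlier inflates the scale and crushes everything else.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, round(max_err, 4))
```

Note how a single large weight sets the scale for the whole tensor: this is exactly why outlier channels made early LLM quantization lossy.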
</section>
<section id="fine-tuning" class="level3">
<h3 class="anchored" data-anchor-id="fine-tuning">Fine-tuning</h3>
<p>After initial training, models can be fine-tuned on specific tasks or domains to improve their performance in targeted areas without increasing model size. This mainly helps when the goal is to adjust a model’s style or domain focus rather than to expand its underlying capabilities. Where applicable, however, QLoRA <span class="citation" data-cites="dettmers_qlora_2023">(Dettmers et al. 2023)</span> and its descendants enable low-compute model specialization.</p>
</section>
</section>
</section>
<section id="smaller-lms-core-techniques" class="level1">
<h1>Small(er) LMs: Core Techniques</h1>
<div class="callout callout-style-default callout-note callout-titled" title="Modification">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Modification
</div>
</div>
<div class="callout-body-container callout-body">
<!-- https://quarto.org/docs/authoring/callouts.html -->
<p>This section was added after the original post.</p>
</div>
</div>
<p>A few core techniques emerge consistently in the research and industry announcements: knowledge distillation (sometimes paired with sophisticated parameter-search techniques), memory-efficient attention mechanisms, and Mixture-of-Experts models. Quantization is being thoroughly explored for more computationally constrained scenarios, e.g. <span class="citation" data-cites="liu_vptq_2024">Liu et al. (2024)</span>.</p>
<section id="case-study-llama-3.1-nemotron-51b" class="level2">
<h2 class="anchored" data-anchor-id="case-study-llama-3.1-nemotron-51b">Case Study: Llama-3.1-Nemotron-51B</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="distill_nemotron.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="“Block-distillation – For blocks of the reference model (blue), we create multiple variants for the ‘student model’ (yellow) that mimic the block-wise teacher functionality.”"><img src="https://blog.kasralekan.com/ideas/small-language-models/distill_nemotron.png" class="img-fluid figure-img" alt="“Block-distillation – For blocks of the reference model (blue), we create multiple variants for the ‘student model’ (yellow) that mimic the block-wise teacher functionality.”"></a></p>
<figcaption>“Block-distillation – For blocks of the reference model (blue), we create multiple variants for the ‘student model’ (yellow) that mimic the block-wise teacher functionality.”</figcaption>
</figure>
</div>
<p>Last month, NVIDIA introduced Llama-3.1-Nemotron-51B, a language model derived from Meta’s Llama-3.1-70B. It achieves 2.2x faster inference and handles 4x larger workloads on a single GPU while maintaining accuracy comparable to its parent model <span class="citation" data-cites="bercovich_advancing_2024">(Bercovich and Karpas 2024)</span>. The core technique was multi-path knowledge distillation combined with a novel “Neural Architecture Search (NAS)” approach, which let them find a model matching their chosen point on the size-capability frontier. Together, these techniques represent an exciting step forward for small(er) language models.</p>
<blockquote class="blockquote">
<p>We then use our block-distillation framework to train all these block variants for all layers of a (large) parent LLM in parallel. In a basic version of block-distillation, training data is passed through the reference model &nbsp;(also known as a teacher). For each block, its input is taken from the teacher and injected into the matching block of the student. The outputs of the teacher and student for the block are compared and the student block is trained so that the student block mimics the functionality of the teacher block. A more advanced scenario where a single student block mimics multiple teacher blocks is depicted in the right-hand diagram of Figure 2. Next, we use our Puzzle algorithm to efficiently score each alternative replacement “puzzle piece” and search our enormous design space for the most accurate models, while adhering to a set of inference constraints, such as memory size and required throughput. Finally, by using knowledge distillation (KD) loss for both block scoring and training, we demonstrate the potential to narrow the accuracy gap between our model and the reference model using a much more efficient architecture with a tiny fraction of the reference model training costs. Using our methods on Llama-3.1-70B as the reference model, we built ​​Llama-3.1-Nemotron-51B-Instruct, a 51B model that breaks the efficient frontier of LLMs on a single NVIDIA H100 GPU.</p>
</blockquote>
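<p>In spirit, the per-block objective described above reduces to a classic distillation loss: soften the teacher’s and student’s outputs with a temperature, then penalize the divergence between them. The sketch below shows one common formulation (KL divergence on temperature-softened distributions); the logits, temperature, and use of final outputs rather than intermediate block activations are illustrative assumptions, not NVIDIA’s actual settings.</p>

```python
import math

# A minimal sketch of a distillation loss: make a student mimic a teacher
# by matching temperature-softened output distributions. All values here
# are illustrative, not NVIDIA's block-distillation configuration.

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's prediction
    # KL(p || q), scaled by T^2 so gradient magnitudes stay comparable
    # across temperatures (as in classic knowledge distillation).
    return temperature ** 2 * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [1.8, 1.1, 0.2]))  # small: student is close
print(distillation_loss(teacher, [0.0, 0.0, 3.0]))  # large: student diverges
```

<p>NVIDIA’s block-distillation compares the outputs of matching teacher and student <em>blocks</em> (typically with a regression-style loss on activations) rather than final logits, but the mimic-the-teacher principle is the same.</p>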
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="layers_nemotron.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="“Runtime of Puzzle chosen blocks (layers) for attention layers (blue) and FFN layers (red) across the 80 layers of the reference model. Green areas correspond to overall runtime savings.”"><img src="https://blog.kasralekan.com/ideas/small-language-models/layers_nemotron.png" class="img-fluid figure-img" alt="“Runtime of Puzzle chosen blocks (layers) for attention layers (blue) and FFN layers (red) across the 80 layers of the reference model. Green areas correspond to overall runtime savings.”"></a></p>
<figcaption>“Runtime of Puzzle chosen blocks (layers) for attention layers (blue) and FFN layers (red) across the 80 layers of the reference model. Green areas correspond to overall runtime savings.”</figcaption>
</figure>
</div>
<p>There are several instances where learned structures are used in novel ways to improve performance (one of my favorites is <a href="https://arxiv.org/abs/1712.01208">learned indices</a>). When training new models, I, like many other researchers, would default to some consistent architectural pattern. Seeing some of these layers pruned without impacting model performance suggests that one could use this method to inform future model design or interpretability research by examining patterns in which layers get pruned.</p>
<p>Separately, OpenAI started to offer knowledge distillation of their core models <span class="citation" data-cites="openai_distillation_2024">(OpenAI 2024)</span>. Although we cannot determine the techniques employed on their backend, it is easy to imagine a world where people are able to choose a model distilled from the most powerful models to best suit their computational requirements.</p>
</section>
<section id="case-study-deepseek-v2" class="level2">
<h2 class="anchored" data-anchor-id="case-study-deepseek-v2">Case Study: DeepSeek-V2</h2>
<p>In the Transformer decoder, the attention computation for the current token depends on all preceding tokens (which is why each subsequent token is more expensive to generate). Naturally, the keys and values from this computation are cached (the KV cache), but this cache introduces a costly memory overhead. To address this overhead, several attention mechanisms have been introduced:</p>
<ul>
<li>Multi-Head Attention (MHA)</li>
<li>Multi-Query Attention (MQA)</li>
<li>Grouped-Query Attention (GQA)</li>
<li>Multi-Head Latent Attention (MLA was introduced with DeepSeek-V2, <span class="citation" data-cites="deepseek-ai_deepseek-v2_2024">(DeepSeek-AI et al. 2024)</span>)</li>
</ul>
<p>These attention mechanisms represent tradeoffs between attention effectiveness, computational cost, and scalability. For more details, see the phenomenal post by <span class="citation" data-cites="abideen_mha_2024">Abideen (2024)</span>. His conclusion:</p>
<blockquote class="blockquote">
<p>MHA can be faster for inference but its KV-cache overheads make it impossible to scale to larger-sized models. MQA significantly reduces KV-cache but degrades in quality as the model size increases. GQA is a balance between both attention mechanisms in terms of KV-caching and memory bandwidths. MLA requires a significantly lower KV cache yet outperforms MHA in output quality.</p>
</blockquote>
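<p>To make the KV-cache tradeoff concrete, here is a rough per-token cache size calculation for each variant. The fp16 precision and Llama-70B-like dimensions (80 layers, 64 query heads, head dimension 128, 8 GQA groups) are illustrative assumptions; MLA is omitted because it caches a learned low-rank latent whose size depends on the chosen compression dimension.</p>

```python
# Rough per-token KV-cache size for different attention variants, assuming
# fp16 (2 bytes/value) and illustrative Llama-70B-like dimensions.
BYTES = 2        # fp16
LAYERS = 80
HEADS = 64       # query heads
HEAD_DIM = 128

def kv_bytes_per_token(kv_heads: int) -> int:
    # K and V each store kv_heads * HEAD_DIM values per layer.
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES

mha = kv_bytes_per_token(kv_heads=HEADS)  # one K/V head per query head
gqa = kv_bytes_per_token(kv_heads=8)      # one K/V head per group of 8
mqa = kv_bytes_per_token(kv_heads=1)      # a single shared K/V head

print(f"MHA: {mha / 1024:.0f} KiB/token")  # 2560 KiB
print(f"GQA: {gqa / 1024:.0f} KiB/token")  # 320 KiB
print(f"MQA: {mqa / 1024:.0f} KiB/token")  # 40 KiB
```

<p>Multiplied over a long context and a large batch, that 64x gap between MHA and MQA is exactly the memory overhead that motivates these designs.</p>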
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>Creating efficient small(er) language models involves a combination of techniques applied at various stages of the model development process. Outside of research applications, small(er) language models enable faster inference speeds, which benefit production applications, and on-device inference, which is crucial for privacy-conscious users.</p>
<p>Most posts about small(er) language models amount to “Model X beats much larger model Y on benchmark Z!!!” To those in the research community, it is not controversial to point out that benchmarks have serious limitations. As such, benchmarks may not accurately estimate the performance of some small(er) language models, or may fail to highlight the vulnerabilities in their performance. For instance, a few months back, I wanted to run a multilingual language model on-device and found that most models at the time were capable only in English and a few other high-resource languages. Since then, models like Phi-3.5-MoE-instruct have filled some of this gap, albeit at much higher parameter counts than I had hoped.</p>
<p>As the field continues to evolve, we can expect to see further innovations in this space, making advanced NLP capabilities more accessible and deployable in resource-constrained environments.</p>



</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-abdin_phi-3_2024" class="csl-entry">
Abdin, Marah, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, et al. 2024. <span>“Phi-3 <span>Technical</span> <span>Report</span>: <span>A</span> <span>Highly</span> <span>Capable</span> <span>Language</span> <span>Model</span> <span>Locally</span> on <span>Your</span> <span>Phone</span>.”</span> arXiv. <a href="http://arxiv.org/abs/2404.14219">http://arxiv.org/abs/2404.14219</a>.
</div>
<div id="ref-abideen_mha_2024" class="csl-entry">
Abideen, Zain ul. 2024. <span>“<span>MHA</span> Vs <span>MQA</span> Vs <span>GQA</span> Vs <span>MLA</span>.”</span> <em>Medium</em>. <a href="https://medium.com/@zaiinn440/mha-vs-mqa-vs-gqa-vs-mla-c6cf8285bbec">https://medium.com/@zaiinn440/mha-vs-mqa-vs-gqa-vs-mla-c6cf8285bbec</a>.
</div>
<div id="ref-beatty_tiny_2024" class="csl-entry">
Beatty, Sally. 2024. <span>“Tiny but Mighty: <span>The</span> <span>Phi</span>-3 Small Language Models with Big Potential.”</span> <em>Microsoft News</em>. <a href="https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/">https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/</a>.
</div>
<div id="ref-bercovich_advancing_2024" class="csl-entry">
Bercovich, Akhiad, and Udi Karpas. 2024. <span>“Advancing the <span>Accuracy</span>-<span>Efficiency</span> <span>Frontier</span> with <span>Llama</span>-3.1-<span>Nemotron</span>-<span>51B</span>.”</span> <em>NVIDIA Technical Blog</em>. <a href="https://developer.nvidia.com/blog/advancing-the-accuracy-efficiency-frontier-with-llama-3-1-nemotron-51b/">https://developer.nvidia.com/blog/advancing-the-accuracy-efficiency-frontier-with-llama-3-1-nemotron-51b/</a>.
</div>
<div id="ref-deepseek-ai_deepseek-v2_2024" class="csl-entry">
DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, et al. 2024. <span>“<span>DeepSeek</span>-<span>V2</span>: <span>A</span> <span>Strong</span>, <span>Economical</span>, and <span>Efficient</span> <span>Mixture</span>-of-<span>Experts</span> <span>Language</span> <span>Model</span>.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.2405.04434">https://doi.org/10.48550/arXiv.2405.04434</a>.
</div>
<div id="ref-dettmers_qlora_2023" class="csl-entry">
Dettmers, Tim, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. <span>“<span>QLoRA</span>: <span>Efficient</span> <span>Finetuning</span> of <span>Quantized</span> <span>LLMs</span>.”</span> <em>Advances in Neural Information Processing Systems</em> 36 (December): 10088–115. <a href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/1feb87871436031bdc0f2beaa62a049b-Abstract-Conference.html">https://proceedings.neurips.cc/paper_files/paper/2023/hash/1feb87871436031bdc0f2beaa62a049b-Abstract-Conference.html</a>.
</div>
<div id="ref-fu_39014562_2023" class="csl-entry">
Fu, Daniel Y. 2023. <span>“<span>Monarch</span> <span>Mixer</span>: <span>A</span> <span>Simple</span> <span>Sub</span>-<span>Quadratic</span> <span>GEMM</span>-<span>Based</span> <span>Architecture</span>.”</span> <em>SlidesLive</em>. <a href="https://neurips.cc/virtual/2023/oral/73841">https://neurips.cc/virtual/2023/oral/73841</a>.
</div>
<div id="ref-glorioso_zamba2-mini_2024" class="csl-entry">
Glorioso, Paolo, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, and Beren Millidge. 2024. <span>“Zamba2-Mini - <span>Zyphra</span>.”</span> <a href="https://www.zyphra.com/post/zamba2-mini">https://www.zyphra.com/post/zamba2-mini</a>.
</div>
<div id="ref-goel_device_2024" class="csl-entry">
Goel, Karan. 2024. <span>“The <span>On</span>‑<span>Device</span> <span>Intelligence</span> <span>Update</span> - <span>Cartesia</span>.”</span> <a href="https://cartesia.ai/blog/2024-08-27-on-device">https://cartesia.ai/blog/2024-08-27-on-device</a>.
</div>
<div id="ref-gu_mamba_2023" class="csl-entry">
Gu, Albert, and Tri Dao. 2023. <span>“Mamba: <span>Linear</span>-<span>Time</span> <span>Sequence</span> <span>Modeling</span> with <span>Selective</span> <span>State</span> <span>Spaces</span>.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.2312.00752">https://doi.org/10.48550/arXiv.2312.00752</a>.
</div>
<div id="ref-javaheripi_phi-2_2023" class="csl-entry">
Javaheripi, Mojan, and Sébastien Bubeck. 2023. <span>“Phi-2: <span>The</span> Surprising Power of Small Language Models.”</span> <em>Microsoft Research</em>. <a href="https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/">https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/</a>.
</div>
<div id="ref-karpathy_its_2024" class="csl-entry">
@karpathy. 2024. <span>“It’s a Bit Sad and Confusing...”</span> Tweet. <em>Twitter</em>. <a href="https://x.com/karpathy/status/1835024197506187617">https://x.com/karpathy/status/1835024197506187617</a>.
</div>
<div id="ref-li_textbooks_2023" class="csl-entry">
Li, Yuanzhi, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, and Yin Tat Lee. 2023. <span>“Textbooks <span>Are</span> <span>All</span> <span>You</span> <span>Need</span> <span>II</span>: Phi-1.5 Technical Report.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.2309.05463">https://doi.org/10.48550/arXiv.2309.05463</a>.
</div>
<div id="ref-liu_vptq_2024" class="csl-entry">
Liu, Yifei, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, and Mao Yang. 2024. <span>“<span>VPTQ</span>: <span>Extreme</span> <span>Low</span>-Bit <span>Vector</span> <span>Post</span>-<span>Training</span> <span>Quantization</span> for <span>Large</span> <span>Language</span> <span>Models</span>.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.2409.17066">https://doi.org/10.48550/arXiv.2409.17066</a>.
</div>
<div id="ref-malinovskii_evolution_2024" class="csl-entry">
Malinovskii, Vladimir. 2024. <span>“The <span>Evolution</span> of <span>Extreme</span> <span>LLM</span> <span>Compression</span>: <span>From</span> <span>QuIP</span> to <span>AQLM</span> with <span>PV</span>-<span>Tuning</span>.”</span> <em>Yandex</em>. <a href="https://medium.com/yandex/the-evolution-of-extreme-llm-compression-from-quip-to-aqlm-with-pv-tuning-19c44b91af96">https://medium.com/yandex/the-evolution-of-extreme-llm-compression-from-quip-to-aqlm-with-pv-tuning-19c44b91af96</a>.
</div>
<div id="ref-openai_distillation_2024" class="csl-entry">
OpenAI. 2024. <span>“Distillation <span>API</span> <span>Docs</span>.”</span> <a href="https://platform.openai.com">https://platform.openai.com</a>.
</div>
<div id="ref-sanh_distilbert_2019" class="csl-entry">
Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. <span>“<span>DistilBERT</span>, a Distilled Version of <span>BERT</span>: Smaller, Faster, Cheaper and Lighter.”</span> arXiv. <a href="https://doi.org/10.48550/arXiv.1910.01108">https://doi.org/10.48550/arXiv.1910.01108</a>.
</div>
<div id="ref-schmid_welcome_2024" class="csl-entry">
Schmid, Philipp, Omar Sanseviero, Pedro Cuenca, Lewis Tunstall, Tom Aarsen, and Vaibhav Srivastav. 2024. <span>“Welcome <span>Gemma</span> 2 - <span>Google</span>’s New Open <span>LLM</span>.”</span> <a href="https://huggingface.co/blog/gemma2">https://huggingface.co/blog/gemma2</a>.
</div>
<div id="ref-shao_degree_2021" class="csl-entry">
Shao, Erzhuo, Shiyuan Guo, and Zachary A. Pardos. 2021. <span>“Degree <span>Planning</span> with <span>PLAN</span>-<span>BERT</span>: <span>Multi</span>-<span>Semester</span> <span>Recommendation</span> <span>Using</span> <span>Future</span> <span>Courses</span> of <span>Interest</span>.”</span> <em>Proceedings of the AAAI Conference on Artificial Intelligence</em> 35 (17): 14920–29. <a href="https://doi.org/10.1609/aaai.v35i17.17751">https://doi.org/10.1609/aaai.v35i17.17751</a>.
</div>
<div id="ref-vaswani_attention_2017" class="csl-entry">
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. <span>“Attention Is <span>All</span> You <span>Need</span>.”</span> <em>Advances in Neural Information Processing Systems</em>.
</div>
</div></section><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Interestingly, this also presents a significant problem in interpretability research.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“The Large World of Small(er) Language
Models.”</span> September 18, 2024. <a href="https://blog.kasralekan.com/ideas/small-language-models/">https://blog.kasralekan.com/ideas/small-language-models/</a>.
</div></div></section></div> ]]></description>
  <category>[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fhits.dwyl.com%2Fanrath%2Fblog_small-language-models.json&amp;show=unique&amp;style=flat-square&amp;label=Views&amp;color=orange)]()</category>
  <category>NLP</category>
  <category>LLM</category>
  <category>Lit. Review</category>
  <guid>https://blog.kasralekan.com/ideas/small-language-models/</guid>
  <pubDate>Wed, 18 Sep 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/small-language-models/heroGraphSizeQuality.png" medium="image" type="image/png" height="114" width="144"/>
</item>
<item>
  <title>An Unexpected DNS Error</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/optimizing-serverless/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<p>A few days ago, I visited my freshly redesigned website, having neither viewed nor edited it since early the previous day. I was met with a DNS error, shown in Figure&nbsp;1. The error disappeared when I refreshed the page. However, most people who hit an error while visiting a website will assume the website is down, so the error needed to be addressed.</p>
<div id="fig-dns-bug" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-dns-bug-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="bug.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="The result of viewing my website for the first time in ~24 hours before attempting to address the issue. Refreshing the page caused it to resolve."><img src="https://blog.kasralekan.com/ideas/optimizing-serverless/bug.png" class="img-fluid figure-img"></a></p>
<figcaption>The result of viewing my website for the first time in ~24 hours before attempting to address the issue. Refreshing the page caused it to resolve.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-dns-bug-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<section id="diagnosing-the-problem" class="level1">
<h1>Diagnosing the Problem</h1>
<p>I knew that refreshing the page resolved the problem and that the issue would not arise if I revisited the website soon after a previous visit. This indicated that the problem likely had nothing to do with the standard speed benchmarks we concern ourselves with in web development, e.g.&nbsp;first contentful paint. Nevertheless, I wanted to rule out first contentful paint as an issue, so I did some speed testing on the deployed website. As expected, the times were great, with the first content painted within 0.54 seconds and the entire website loading in 1.5 seconds.</p>
<p>I was unable to find any documentation of others having this problem, despite having a fairly standard NextJS website without a sprawling dependency list or complicated logic. I had deployed many websites with Vercel before with no issues, so I knew something about this project differed from my previous ones. I was using two packages I had never used before: <code>framer-motion</code> and <code>react-rough-notations</code>. I considered the possibility that these dependencies were the culprits because they drastically slowed down compilation when I was developing locally, taking roughly 5 seconds to load content after a fresh restart with <code>next dev</code>.</p>
<p>Based on my observations, I decided to focus on optimizing the bundling of my website’s packages in any way I could. In the back of my mind, I suspected that these issues should not cause the DNS error I saw, but I had no other theories and had to start on a solution.</p>
</section>
<section id="a-digression-on-serverless-computing" class="level1">
<h1>A Digression on Serverless Computing</h1>
<p>Vercel<sup>1</sup> deployments are, to a first approximation, a layer on top of <a href="https://en.wikipedia.org/wiki/AWS_Lambda">AWS Lambda</a>, a serverless computing service. While I have not worked on any large-scale projects that provisioned large cloud systems, I studied them during my Master’s, and they are perhaps the most underrated technical achievement fueling the Internet today.</p>
<p>In general, cloud computing provides economies of scale, reducing the cost of server management and aggregating the best technical expertise. Serverless functions are another layer of innovation on top of cloud computing. They are heavily optimized, down to the kernel level, so that they can cold-start within hundreds of milliseconds, or even microseconds, of a request. Thus, serverless functions can start at request time rather than running permanently (Figure&nbsp;2). The reason I don’t have to pay to deploy my “hobby” projects (with low traffic) is that it costs virtually nothing<sup>2</sup> to run these serverless functions.</p>
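<p>The pricing in footnote 2 makes “virtually nothing” easy to sanity-check. A minimal back-of-the-envelope sketch, using the per-request and per-GB-second rates quoted there (current AWS prices may differ):</p>

```python
# Back-of-the-envelope AWS Lambda cost estimate. The rates below are the
# ones quoted in this post's footnote; check current AWS pricing.
PRICE_PER_MILLION_REQUESTS = 0.20   # USD
PRICE_PER_GB_SECOND = 0.0000166667  # USD

def lambda_cost(requests: int, duration_s: float, memory_gb: float) -> float:
    """Total cost of `requests` invocations, each running for `duration_s`
    seconds with `memory_gb` of memory allocated."""
    request_cost = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_cost = requests * duration_s * memory_gb * PRICE_PER_GB_SECOND
    return request_cost + compute_cost

# 6,000 one-second invocations at 1 GB: ~$0.10 of compute (the footnote's
# figure) plus a fraction of a cent in request charges.
print(f"${lambda_cost(6000, 1.0, 1.0):.4f}")
```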
<div id="fig-serverless-function" class="quarto-layout-panel">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-serverless-function-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="serverlessFunctions.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="Figure&nbsp;2: Serverless computing (FaaS) visualized. Image credit: Prof.&nbsp;Yue Cheng"><img src="https://blog.kasralekan.com/ideas/optimizing-serverless/serverlessFunctions.png" class="img-fluid figure-img"></a></p>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><a href="serverlessHood.png" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="Figure&nbsp;2: Serverless computing (FaaS) visualized. Image credit: Prof.&nbsp;Yue Cheng"><img src="https://blog.kasralekan.com/ideas/optimizing-serverless/serverlessHood.png" class="img-fluid figure-img"></a></p>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-serverless-function-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Serverless computing (FaaS) visualized. Image credit: <a href="https://datascience.virginia.edu/people/yue-cheng">Prof.&nbsp;Yue Cheng</a>
</figcaption>
</figure>
</div>
</section>
<section id="solutions" class="level1">
<h1>Solutions</h1>
<div id="fig-bundle-analyzer" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-bundle-analyzer-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="bundleAnalyzerInit.png" class="lightbox" data-gallery="quarto-lightbox-gallery-4" title="The initial bundle analyzer output showing the size of the various parts of the website after building. NodeJS server results are shown."><img src="https://blog.kasralekan.com/ideas/optimizing-serverless/bundleAnalyzerInit.png" class="img-fluid figure-img"></a></p>
<figcaption>The initial bundle analyzer output showing the size of the various parts of the website after building. NodeJS server results are shown.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-bundle-analyzer-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3
</figcaption>
</figure>
</div>
<p>Based on the documentation and the results of the bundle analyzer, I did the following:</p>
<ol start="0" type="1">
<li>I added a <a href="https://nextjs.org/docs/app/building-your-application/routing/loading-ui-and-streaming">Suspense wrapper</a> to my website’s content with a loading element.
<ul>
<li>I knew this would not solve my issue since my first-paint speed was good, but using Suspense to have a loading UI is best practice.</li>
</ul></li>
<li>I added the <code>optimizePackageImports</code> flag and applied it to <code>framer-motion</code> and <code>react-rough-notations</code>. From the <a href="https://nextjs.org/docs/app/building-your-application/optimizing/package-bundling#optimizing-package-imports">docs</a>, “This option will only load the modules you actually use, while still giving you the convenience of writing import statements with many named exports.”
<ul>
<li>I was especially interested in its performance with <code>framer-motion</code> since the dependency JavaScript is large and there is not much that you can do to reduce it<sup>3</sup> since you cannot import specific animations.</li>
</ul></li>
<li>I converted <code>framer-motion</code> calls to use vanilla CSS for simple animations.
<ul>
<li>I had been using <code>framer-motion</code> where vanilla CSS could accomplish the same effects, since I already had the dependency. Paired with the next optimization, switching to vanilla CSS let me remove <code>framer-motion</code> from the critical page load.</li>
</ul></li>
<li>I applied component lazy loading and skipped SSR<sup>4</sup> for my animated components based on the <a href="https://nextjs.org/docs/app/building-your-application/optimizing/lazy-loading#skipping-ssr">docs</a>.</li>
</ol>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">🛬 Results</h2>
<div id="fig-bundle-analyzer-2" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-bundle-analyzer-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="bundleAnalyzerPost.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="The final bundle analyzer output showing the size of the various parts of the website after building with optimizations. Optimizations reduced the parsed size by ~19%. NodeJS server results are shown."><img src="https://blog.kasralekan.com/ideas/optimizing-serverless/bundleAnalyzerPost.png" class="img-fluid figure-img"></a></p>
<figcaption>The final bundle analyzer output showing the size of the various parts of the website after building with optimizations. Optimizations reduced the parsed size by ~19%. NodeJS server results are shown.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-bundle-analyzer-2-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4
</figcaption>
</figure>
</div>
<p>After applying the optimizations, my server-side bundle size (parsed, not static) was reduced by 19.1%, as shown in Figure&nbsp;4. Since these updates, my website has worked properly even after prolonged periods without requests.</p>
</section>
<section id="why-not-astro" class="level2">
<h2 class="anchored" data-anchor-id="why-not-astro">🚀 Why not Astro</h2>
<p>When first building my new website, I attempted to use <a href="https://astro.build/">Astro</a> instead of NextJS. Since my website is entirely static, Astro would perhaps be a better technical fit, especially since its implementation of <a href="https://astro.build/blog/astro-4120/">Server Islands</a>, which is similar to NextJS 14’s <a href="https://nextjs.org/docs/app/building-your-application/rendering/partial-prerendering">Partial Prerendering</a>, would allow me to add dynamic components when necessary. There is a lot I love about Astro. However, I struggled to use stateful components in Astro and ultimately had to use Next.</p>
</section>
<section id="reflections" class="level2">
<h2 class="anchored" data-anchor-id="reflections">🪞 Reflections</h2>
<p>The automatic optimizations that enable the rapid deployment of efficient websites like my own are truly stunning. The combination of serverless optimization, package bundle optimizations, and component optimizations in React metaframeworks allows me to focus on the content first. When I made my first website some years ago, I manually compressed all my images and converted them to web-friendly formats. With modern frameworks like Next, built-in components such as <code>Image</code> perform optimizations like this for you.</p>
<blockquote class="blockquote">
<p>We stand on the shoulders of giants.</p>
</blockquote>


</section>
</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>My website is built with NextJS so <a href="https://www.vercel.com/">Vercel</a> was the natural choice for deployment.↩︎</p></li>
<li id="fn2"><p>Pricing for AWS Lambdas are billed at the 1-millisecond granularity. As of writing:</p>
<ul>
<li>$0.20 per million requests</li>
<li>$0.0000166667 per GB-second of compute</li>
</ul>
<p>This implies that running 6,000 1&nbsp;GB Lambda functions for one second each costs about $0.10.↩︎</p></li>
<li id="fn3"><p>This is not entirely true. There are some <a href="https://www.framer.com/motion/guide-reduce-bundle-size/">tools</a> Framer provides to reduce the bundle size but they have limitations. As Framer points out, this normally shouldn’t be necessary because most bundlers apply “tree shaking” to reduce the bundle to just what is used.↩︎</p></li>
<li id="fn4"><p>Server Side Rendering (SSR) refers to when the HTML that the client is served is generated on the server, often using a database or other external APIs.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“An Unexpected DNS Error.”</span> September 1,
2024. <a href="https://blog.kasralekan.com/ideas/optimizing-serverless/">https://blog.kasralekan.com/ideas/optimizing-serverless/</a>.
</div></div></section></div> ]]></description>
  <category>[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fhits.dwyl.com%2Fanrath%2Fblog_optimizing-payload.json&amp;show=unique&amp;style=flat-square&amp;label=Views&amp;color=orange)]()</category>
  <category>Cloud Computing</category>
  <category>WebDev</category>
  <guid>https://blog.kasralekan.com/ideas/optimizing-serverless/</guid>
  <pubDate>Sun, 01 Sep 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/optimizing-serverless/bug.png" medium="image" type="image/png" height="80" width="144"/>
</item>
<item>
  <title>Why Game Design is Hard</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/game-design/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<div id="fig-maplestory" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-maplestory-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="maplestory-basilmarket-screen.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Maplestory was a popular 2D MMO released in 2003."><img src="https://blog.kasralekan.com/ideas/game-design/maplestory-basilmarket-screen.jpg" class="img-fluid figure-img"></a></p>
<figcaption>Maplestory was a popular 2D MMO released in 2003.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-maplestory-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<section id="a-simple-observation" class="level1">
<h1>A Simple Observation</h1>
<p>Why are there so few MMO<sup>1</sup> games? Although I rarely played MMOs, the few times I have, they have been extremely fun because of the large-scale social experience: at times you’re interacting with tens or hundreds of real people in real time.</p>
<p>I posed this question to a gamer/network engineer I know. He asked me to consider the tech stack required to provide a seamless high-player-count experience: a server receiving data from ~10-100 clients, each with a different latency to the server (but all providing input at the same time), which must reconcile all of those actions and inform the clients of what happened. This is a difficult technical problem that most game engines<sup>2</sup> can ignore because of their low supported player counts. While this was a compelling argument, I also considered how making an MMO may be a poor business decision for most developers.</p>
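<p>To make the reconciliation problem concrete, here is a toy sketch in plain Python (my own illustration, not how any real engine or netcode library works): an authoritative server buffers timestamped inputs and applies them in the order they <em>happened</em>, not the order they arrived over the network.</p>

```python
import heapq

class AuthoritativeServer:
    """Toy authoritative game server (1D positions for simplicity)."""

    def __init__(self):
        self.pending = []  # min-heap of (client_time, player_id, action)
        self.state = {}    # player_id -> position

    def receive(self, client_time, player_id, action):
        # Inputs arrive out of order because clients have different latency.
        heapq.heappush(self.pending, (client_time, player_id, action))

    def tick(self):
        # Once per tick, apply all buffered inputs in timestamp order,
        # then broadcast a consistent snapshot to every client.
        while self.pending:
            _, pid, action = heapq.heappop(self.pending)
            pos = self.state.get(pid, 0)
            self.state[pid] = pos + (1 if action == "right" else -1)
        return dict(self.state)

server = AuthoritativeServer()
# Player B's input arrives later over the wire but happened earlier.
server.receive(105, "A", "right")
server.receive(100, "B", "right")
snapshot = server.tick()  # {"A": 1, "B": 1}
```

<p>Real MMO netcode adds lag compensation, client-side prediction, and interest management on top of this, which is exactly the engineering load most engines never have to carry.</p>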
<p>I decided to explore deeper. Jump to the context section for an overview of the industry.</p>
</section>
<section id="technical-and-business-factors" class="level1">
<h1>Technical and Business Factors</h1>
<section id="game-engines-and-devices" class="level2">
<h2 class="anchored" data-anchor-id="game-engines-and-devices">🚂 Game Engines and Devices</h2>
<p>Game distribution today is diverse. While physical copies maintain some presence in the market, most games are distributed through online marketplaces like <a href="https://en.wikipedia.org/wiki/Steam_(service)">Steam</a> or mobile app stores. Since distribution is relatively simple, the main barriers to entry for new titles are (1) design and development time and (2) marketing.</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div id="fig-engine-decisions" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-engine-decisions-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div>
<pre class="mermaid mermaid-js" data-label="fig-engine-decisions">flowchart TD

subgraph Z[" "]
direction LR
  A{No-Code or Coding?} --&gt; C{Platform/Device}

  C --&gt; D1[Console]
  C --&gt; D2[Desktop]
  C --&gt; D3[Mobile]
  C --&gt; D4[VR]
end

subgraph ZA[" "]
direction LR
    B{2D or 3D?} --&gt; E{Licensing Agreement}
    
    E--&gt;H1[Open Source/Free]
    E--&gt;H2[Commercial Licensing]
end

Z --&gt; ZA
</pre>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-engine-decisions-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: A sample decision hierarchy for game engine selection.
</figcaption>
</figure>
</div>
</div>
</div>
<p>Game design is part engineering, part art, and part business. Game engines are designed to carry some of the engineering load. While there are many engines, the choice is based on a few key criteria summarized in Figure&nbsp;2. A few game engines worth pointing out:</p>
<ul>
<li><a href="https://github.com/kaplayjs/kaplay">Kaplay (formerly Kaboom)</a> - The premier JavaScript-based game library, enabling web-first games but also desktop and mobile games through tools like <a href="https://github.com/electron/electron">Electron</a> and <a href="https://github.com/tauri-apps/tauri">Tauri</a>.</li>
<li><a href="https://github.com/o3de/o3de">Open 3D Engine (O3DE)</a> - an open-source, multi-platform 3D engine originally built for AAA titles<sup>3</sup>.</li>
<li><a href="https://github.com/godotengine/godot">Godot</a> - another open-source, cross-platform engine; designed for both 2D and 3D.</li>
<li><a href="https://unity.com/">Unity</a> and <a href="https://www.unrealengine.com/">Unreal</a> - closed-source but the most popular game engines. Unity has an extremely popular mobile game engine component as well.</li>
</ul>
</section>
<section id="customer-satisfaction" class="level2">
<h2 class="anchored" data-anchor-id="customer-satisfaction">🤨 Customer Satisfaction</h2>
<p>Simple games have an easier time satisfying the player. Classic Mario has mechanics, nuance, and plot, but ultimately the player is a lone gamer trying to complete the level: a clear objective with no external input.</p>
<p>Adding complexity, especially multiplayer support, makes game design harder. Just as businesses have different categories of customers, games have different kinds of players, and balancing the game to keep each of them happy is difficult. A common imbalance arises between “casual” and “hard-core” players: some players devote only an hour here and there, while hard-core players quickly reach the “end-game” content. If all of the development energy goes into the initial progression, the end-game suffers and hard-core players leave disappointed.</p>
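<p>A back-of-envelope calculation shows why this imbalance is so hard to avoid. The numbers below are purely hypothetical, chosen only to illustrate the gap:</p>

```python
# Hypothetical figures for illustration only.
content_hours = 120           # hours of designed progression content
casual_hours_per_week = 3     # a casual player's weekly playtime
hardcore_hours_per_week = 30  # a hard-core player's weekly playtime

weeks_casual = content_hours / casual_hours_per_week      # 40 weeks
weeks_hardcore = content_hours / hardcore_hours_per_week  # 4 weeks
```

<p>With a 10x difference in playtime, hard-core players exhaust ten months of a casual player’s content in one month, so developers must either drip-feed content or build end-game systems that stay engaging on repeat play.</p>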
</section>
<section id="case-study-new-world" class="level2">
<h2 class="anchored" data-anchor-id="case-study-new-world">🆕 Case Study: New World</h2>
<blockquote class="blockquote">
<p>New World is a massively popular multiplayer online (MMO) game created by Amazon Games and released in September 2021… For players, the game is an extremely immersive experience: they don’t have to wait for screens to load or other interruptions. This is known as “seamlessness,” and it’s a valuable trait for developers to be able to deliver. <span class="citation" data-cites="walsh_unique_2022">(Walsh 2022)</span></p>
</blockquote>
<p>New World is the most popular new MMO I have seen in North America. Owned by Amazon, the game naturally used much of Amazon’s infrastructure along with the now-deprecated Lumberyard engine. <span class="citation" data-cites="walsh_unique_2022">Walsh (2022)</span> summarizes the architecture that supported the game (Figure&nbsp;3).</p>
<div id="fig-aws" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-aws-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="newWorldArchitecture.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="New World‘s Architecture"><img src="https://blog.kasralekan.com/ideas/game-design/newWorldArchitecture.png" class="img-fluid figure-img"></a></p>
<figcaption>New World‘s Architecture</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-aws-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3
</figcaption>
</figure>
</div>
<blockquote class="blockquote">
<p>Collectively, the Amazon EC2 instances for a single world in Aeternum can simulate more than 7,000 artificial intelligence entities and hundreds of thousands of objects for 2,500 players. Each server set often processes millions of state changes per second, selecting the relevant data to create individual immersive experiences.</p>
</blockquote>
<p>While this architecture is complex, it is exciting that cloud computing has made provisioning infrastructure for such a massive game far easier.</p>
<p>Despite its commercial success at launch, New World lost most of its player base soon after. The game peaked on September 27th, 2021 at just over 900,000 players. By the end of the year, that number had dwindled to 117,000. One month later: 68,000. The month after that: 34,000. The game maintained roughly that player base through the end of 2023 <span class="citation" data-cites="steamcharts_new_2024">(steamcharts 2024)</span>.</p>
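<p>The decline is easier to see as period-over-period retention. Note that the first figure spans roughly three months (launch peak to year-end), while the rest are month-over-month:</p>

```python
# Approximate player peaks from the steamcharts figures above.
players = [900_000, 117_000, 68_000, 34_000]

# Fraction of the previous period's players retained.
retention = [round(later / earlier, 2)
             for earlier, later in zip(players, players[1:])]
# [0.13, 0.58, 0.5] -> 13% retained from the launch peak,
# then roughly half lost each subsequent month.
```

<p>Losing around half of the remaining players each month before stabilizing is a pattern many live-service launches share, which compounds the business risk of building an MMO in the first place.</p>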
</section>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>Given the technical complexity of supporting a large player base with “seamlessness” and the challenge of satisfying a wide variety of customers, it is unsurprising that few new MMOs are made. Successful games can bring in strong revenue, but the return on invested time may be higher for a simpler game. It’s worth noting that many indie developers make games that they themselves want to play and do not approach development with the calculating lens I took in writing this article.</p>
</section>
<section id="context" class="level1">
<h1>Context</h1>
<blockquote class="blockquote">
<p>The video game industry is a dynamic sector, characterized by a diverse customer base, a wide array of developers, dominant geographies, various distribution systems and devices, multiple business models, and a plethora of game genres.</p>
</blockquote>
<section id="market-overview" class="level2">
<h2 class="anchored" data-anchor-id="market-overview">🌏 Market Overview</h2>
<p>Here I’m focused on a small subset of the overall game market in terms of both revenue and player count. However, some data on the industry overall will be useful in contextualizing the information presented here.</p>
<div id="cell-fig-platform-revenue" class="cell" data-execution_count="1">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> plotly.subplots <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_subplots</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> plotly.graph_objects <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> go</span>
<span id="cb1-3"></span>
<span id="cb1-4">labels_sector <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Mobile"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Console"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PC"</span>]</span>
<span id="cb1-5">values_sector <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">90.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">53.2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">40.3</span>]</span>
<span id="cb1-6"></span>
<span id="cb1-7">fig_sector <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> go.Figure(data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[go.Pie(labels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>labels_sector, values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>values_sector, hole<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>)])</span>
<span id="cb1-8"></span>
<span id="cb1-9">fig_sector.update_traces(</span>
<span id="cb1-10">    textinfo<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"label+text+percent"</span>, </span>
<span id="cb1-11">    texttemplate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{label}</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">&lt;br&gt;$%</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{value}</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> billion&lt;br&gt;(%</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{percent}</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">)"</span></span>
<span id="cb1-12">)</span>
<span id="cb1-13"></span>
<span id="cb1-14">fig_sector.update_layout(</span>
<span id="cb1-15">    title_text<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2023 Global Games Market Revenue, by Sector"</span>,</span>
<span id="cb1-16">    dragmode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pan"</span>,</span>
<span id="cb1-17">    plot_bgcolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#2C2F33'</span>,</span>
<span id="cb1-18">    paper_bgcolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#23272A'</span>,</span>
<span id="cb1-19">    font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb1-20">    title_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb1-21">    legend_title_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb1-22">    legend_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb1-23">    xaxis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb1-24">        title_font<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>),</span>
<span id="cb1-25">        tickfont<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>)</span>
<span id="cb1-26">    ),</span>
<span id="cb1-27">    yaxis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb1-28">        title_font<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>),</span>
<span id="cb1-29">        tickfont<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>)</span>
<span id="cb1-30">    ),</span>
<span id="cb1-31">    margin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(l<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, r<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, t<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb1-32">)</span>
<span id="cb1-33"></span>
<span id="cb1-34">fig_sector.show(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"displaylogo"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>})</span></code></pre></div>
</details>
<div id="fig-platform-revenue" class="cell-output cell-output-display quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-platform-revenue-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div>                            <div id="d8b6414d-11fc-47e6-bd19-78ef269c701e" class="plotly-graph-div" style="height:525px; width:100%;"></div>            <script type="text/javascript">                require(["plotly"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("d8b6414d-11fc-47e6-bd19-78ef269c701e")) {                    Plotly.newPlot(                        "d8b6414d-11fc-47e6-bd19-78ef269c701e",                        [{"hole":0.3,"labels":["Mobile","Console","PC"],"values":[90.4,53.2,40.3],"type":"pie","textinfo":"label+text+percent","texttemplate":"%{label}\u003cbr\u003e$%{value} billion\u003cbr\u003e(%{percent})"}],                        {"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":
[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"
scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd
3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"},"margin":{"b":0,"l":0,"r":0,"t":30}}},"title":{"font":{"color":"#FFFFFF"},"text":"2023 Global Games Market Revenue, by 
Sector"},"font":{"color":"#FFFFFF"},"legend":{"title":{"font":{"color":"#FFFFFF"}},"font":{"color":"#FFFFFF"}},"xaxis":{"title":{"font":{"color":"#FFFFFF"}},"tickfont":{"color":"#FFFFFF"}},"yaxis":{"title":{"font":{"color":"#FFFFFF"}},"tickfont":{"color":"#FFFFFF"}},"margin":{"l":50,"r":50,"t":50,"b":50},"dragmode":"pan","plot_bgcolor":"#2C2F33","paper_bgcolor":"#23272A"},                        {"displaylogo": false, "responsive": true}                    ).then(function(){
                            
var gd = document.getElementById('d8b6414d-11fc-47e6-bd19-78ef269c701e');
var x = new MutationObserver(function (mutations, observer) {{
        var display = window.getComputedStyle(gd).display;
        if (!display || display === 'none') {{
            console.log([gd, 'removed!']);
            Plotly.purge(gd);
            observer.disconnect();
        }}
}});

// Listen for the removal of the full notebook cells
var notebookContainer = gd.closest('#notebook-container');
if (notebookContainer) {{
    x.observe(notebookContainer, {childList: true});
}}

// Listen for the clearing of the current output cell
var outputEl = gd.closest('.output');
if (outputEl) {{
    x.observe(outputEl, {childList: true});
}}

                        })                };                });            </script>        </div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-platform-revenue-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: A breakdown of 2023 global game revenue by platform.
</figcaption>
</figure>
</div>
</div>
<p>In 2023, the global video game market was valued at $184 billion, with ~22% from PC games (Figure&nbsp;4). Here I focus on that portion of the market since it is most germane to my questions of interest.</p>
<div id="cell-fig-region-breakdown" class="cell" data-execution_count="2">
<details class="code-fold">
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> plotly.subplots <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> make_subplots</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> plotly.graph_objects <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> go</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Data for the pie charts</span></span>
<span id="cb2-5">labels_market_share <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Asia-Pacific"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"North America"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Europe"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Latin America"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Middle East/Africa"</span>]</span>
<span id="cb2-6">values_market_share <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">84.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">50.6</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">33.6</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">8.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">7.1</span>]</span>
<span id="cb2-7"></span>
<span id="cb2-8">labels_player_share <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Asia-Pacific"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Middle East/Africa"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Europe"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Latin America"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"North America"</span>]</span>
<span id="cb2-9">values_player_share <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1800</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">574</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">447</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">335</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">237</span>]</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Creating a subplot with two rows and one column, vertically aligned</span></span>
<span id="cb2-12">fig_combined <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> make_subplots(</span>
<span id="cb2-13">    rows<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, cols<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb2-14">    row_heights<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>],</span>
<span id="cb2-15">    vertical_spacing<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,</span>
<span id="cb2-16">    subplot_titles<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2023 Global Games Market Share ($ billions), by Region"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2023 Global Games Player Count (millions), by Region"</span>),</span>
<span id="cb2-17">    specs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[[{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"type"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pie"</span>}], [{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"type"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pie"</span>}]]</span>
<span id="cb2-18">)</span>
<span id="cb2-19"></span>
<span id="cb2-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adding the Market Share pie chart to the first row</span></span>
<span id="cb2-21">fig_combined.add_trace(</span>
<span id="cb2-22">    go.Pie(labels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>labels_market_share, values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>values_market_share, hole<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>, hoverinfo<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'label+percent+value'</span>),</span>
<span id="cb2-23">    row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb2-24">)</span>
<span id="cb2-25"></span>
<span id="cb2-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Adding the Player Share pie chart to the second row</span></span>
<span id="cb2-27">fig_combined.add_trace(</span>
<span id="cb2-28">    go.Pie(labels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>labels_player_share, values<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>values_player_share, hole<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>, hoverinfo<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'label+percent+value'</span>),</span>
<span id="cb2-29">    row<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, col<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb2-30">)</span>
<span id="cb2-31"></span>
<span id="cb2-32">fig_combined.update_layout(</span>
<span id="cb2-33">    dragmode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pan"</span>,</span>
<span id="cb2-34">    plot_bgcolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#2C2F33'</span>,</span>
<span id="cb2-35">    paper_bgcolor<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#23272A'</span>,</span>
<span id="cb2-36">    font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb2-37">    title_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb2-38">    legend_title_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb2-39">    legend_font_color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>,</span>
<span id="cb2-40">    xaxis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb2-41">        title_font<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>),</span>
<span id="cb2-42">        tickfont<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>)</span>
<span id="cb2-43">    ),</span>
<span id="cb2-44">    yaxis<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(</span>
<span id="cb2-45">        title_font<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>),</span>
<span id="cb2-46">        tickfont<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(color<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'#FFFFFF'</span>)</span>
<span id="cb2-47">    ),</span>
<span id="cb2-48">    margin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">dict</span>(l<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, r<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, t<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>)</span>
<span id="cb2-49">)</span>
<span id="cb2-50"></span>
<span id="cb2-51">fig_combined.show(config<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>{<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"displaylogo"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>})</span></code></pre></div>
</details>
<div id="fig-region-breakdown" class="cell-output cell-output-display quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-region-breakdown-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div>                            <div id="e5c273f9-7251-4ea9-9f8b-c281d1d43d12" class="plotly-graph-div" style="height:525px; width:100%;"></div>            <script type="text/javascript">                require(["plotly"], function(Plotly) {                    window.PLOTLYENV=window.PLOTLYENV || {};                                    if (document.getElementById("e5c273f9-7251-4ea9-9f8b-c281d1d43d12")) {                    Plotly.newPlot(                        "e5c273f9-7251-4ea9-9f8b-c281d1d43d12",                        [{"hole":0.3,"hoverinfo":"label+percent+value","labels":["Asia-Pacific","North America","Europe","Latin America","Middle East\u002fAfrica"],"values":[84.1,50.6,33.6,8.7,7.1],"type":"pie","domain":{"x":[0.0,1.0],"y":[0.55,1.0]}},{"hole":0.3,"hoverinfo":"label+percent+value","labels":["Asia-Pacific","Middle East\u002fAfrica","Europe","Latin America","North America"],"values":[1800,574,447,335,237],"type":"pie","domain":{"x":[0.0,1.0],"y":[0.0,0.45]}}],                        
{"template":{"data":{"histogram2dcontour":[{"type":"histogram2dcontour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"choropleth":[{"type":"choropleth","colorbar":{"outlinewidth":0,"ticks":""}}],"histogram2d":[{"type":"histogram2d","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmap":[{"type":"heatmap","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"heatmapgl":[{"type":"heatmapgl","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"contourcarpet":[{"type":"contourcarpet","colorbar":{"outlinewidth":0,"ticks":""}}],"contour":[{"type":"contour","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"
],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"surface":[{"type":"surface","colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]}],"mesh3d":[{"type":"mesh3d","colorbar":{"outlinewidth":0,"ticks":""}}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"parcoords":[{"type":"parcoords","line":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolargl":[{"type":"scatterpolargl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"scattergeo":[{"type":"scattergeo","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterpolar":[{"type":"scatterpolar","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"scattergl":[{"type":"scattergl","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatter3d":[{"type":"scatter3d","line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattermapbox":[{"type":"scattermapbox","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scatterternary":[{"type":"scatterternary","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"scattercarpet":[{"type":"scattercarpet","marker":{"colorbar":{"outlinewidth":0,"ticks":""}}}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"white","linecolor":"white","minorgridcolor":"white","startlinec
olor":"#2a3f5f"},"type":"carpet"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}],"barpolar":[{"marker":{"line":{"color":"#E5ECF6","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"pie":[{"automargin":true,"type":"pie"}]},"layout":{"autotypenumbers":"strict","colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"hovermode":"closest","hoverlabel":{"align":"left"},"paper_bgcolor":"white","plot_bgcolor":"#E5ECF6","polar":{"bgcolor":"#E5ECF6","angularaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"radialaxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"ternary":{"bgcolor":"#E5ECF6","aaxis":{"gridcolor":"white","linecolor":"white","ticks":""},"baxis":{"gridcolor":"white","linecolor":"white","ticks":""},"caxis":{"gridcolor":"white","linecolor":"white","ticks":""}},"coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]]},"xaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":tr
ue,"zerolinewidth":2},"yaxis":{"gridcolor":"white","linecolor":"white","ticks":"","title":{"standoff":15},"zerolinecolor":"white","automargin":true,"zerolinewidth":2},"scene":{"xaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"yaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2},"zaxis":{"backgroundcolor":"#E5ECF6","gridcolor":"white","linecolor":"white","showbackground":true,"ticks":"","zerolinecolor":"white","gridwidth":2}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"geo":{"bgcolor":"white","landcolor":"#E5ECF6","subunitcolor":"white","showland":true,"showlakes":true,"lakecolor":"white"},"title":{"x":0.05},"mapbox":{"style":"light"},"margin":{"b":0,"l":0,"r":0,"t":30}}},"annotations":[{"font":{"size":16},"showarrow":false,"text":"2023 Global Games Market Share ($ billions), by Region","x":0.5,"xanchor":"center","xref":"paper","y":1.0,"yanchor":"bottom","yref":"paper"},{"font":{"size":16},"showarrow":false,"text":"2023 Global Games Player Count (millions), by Region","x":0.5,"xanchor":"center","xref":"paper","y":0.45,"yanchor":"bottom","yref":"paper"}],"font":{"color":"#FFFFFF"},"title":{"font":{"color":"#FFFFFF"}},"legend":{"title":{"font":{"color":"#FFFFFF"}},"font":{"color":"#FFFFFF"}},"xaxis":{"title":{"font":{"color":"#FFFFFF"}},"tickfont":{"color":"#FFFFFF"}},"yaxis":{"title":{"font":{"color":"#FFFFFF"}},"tickfont":{"color":"#FFFFFF"}},"margin":{"l":50,"r":50,"t":50,"b":50},"dragmode":"pan","plot_bgcolor":"#2C2F33","paper_bgcolor":"#23272A"},                        {"displaylogo": false, "responsive": true}                    ).then(function(){
                            
var gd = document.getElementById('e5c273f9-7251-4ea9-9f8b-c281d1d43d12');
var x = new MutationObserver(function (mutations, observer) {{
        var display = window.getComputedStyle(gd).display;
        if (!display || display === 'none') {{
            console.log([gd, 'removed!']);
            Plotly.purge(gd);
            observer.disconnect();
        }}
}});

// Listen for the removal of the full notebook cells
var notebookContainer = gd.closest('#notebook-container');
if (notebookContainer) {{
    x.observe(notebookContainer, {childList: true});
}}

// Listen for the clearing of the current output cell
var outputEl = gd.closest('.output');
if (outputEl) {{
    x.observe(outputEl, {childList: true});
}}

                        })                };                });            </script>        </div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-region-breakdown-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: A breakdown of 2023 global game revenue and player counts by region.
</figcaption>
</figure>
</div>
</div>
<p>North America represents only ~7% of the 3.3 billion people who play games globally; however, it is responsible for nearly 30% of the revenue (Figure&nbsp;5). I will focus mainly on the North American market due to my familiarity with it. The phenomena I discuss will generalize to some extent globally but may not map precisely to other geographies.</p>
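<p>As a back-of-the-envelope illustration of this gap, the revenue and player figures charted above can be combined into an approximate revenue per player. This is my own rough sketch (variable names are mine, and it ignores within-region differences):</p>

```python
# Rough 2023 revenue per player by region, derived from the figures above.
revenue_busd = {"Asia-Pacific": 84.1, "North America": 50.6, "Europe": 33.6,
                "Latin America": 8.7, "Middle East/Africa": 7.1}   # $ billions
players_m = {"Asia-Pacific": 1800, "North America": 237, "Europe": 447,
             "Latin America": 335, "Middle East/Africa": 574}      # millions

for region in revenue_busd:
    per_player = revenue_busd[region] * 1e9 / (players_m[region] * 1e6)
    print(f"{region}: ~${per_player:.0f} per player")
```

<p>North America works out to roughly $210+ per player versus under $50 in Asia-Pacific, which is the asymmetry described above.</p>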
<p>To learn more, I recommend reading my sources in full:</p>
<ul>
<li>For visualized US data based on a 2023 YouGov poll, see <span class="citation" data-cites="esa_2024_2024">(ESA 2024)</span>.</li>
<li>⭐ For a comprehensive, global perspective on video games in 2024, with case studies, see <span class="citation" data-cites="eriksen_state_2024">(Eriksen 2024)</span>.</li>
</ul>
</section>
<section id="game-genres" class="level2">
<h2 class="anchored" data-anchor-id="game-genres">🧟 Game Genres</h2>
<p>Games come in great variety. Below is a list of common genres and representative titles:</p>
<ul>
<li><strong>Action Adventure</strong> (e.g., Assassin’s Creed, God of War, Legend of Zelda)
<ul>
<li><strong>Hack and Slash RPG</strong> (e.g., Hades, Children of Morta, Moonlighter)</li>
<li><strong>Single-player RPG</strong> (e.g., Skyrim, Final Fantasy VII, Mass Effect)</li>
</ul></li>
<li><strong>MOBA</strong> (e.g., League of Legends, Dota 2, Arena of Valor, Wild Rift)</li>
<li><strong>MMORPG</strong> (e.g., World of Warcraft, Guild Wars, EVE Online, Black Desert Mobile)</li>
<li><strong>Shooter</strong> (e.g., Overwatch, Call of Duty, Counter-Strike, VALORANT)
<ul>
<li><strong>Battle Royale</strong> (e.g., Fortnite Battle Royale, PUBG, Call of Duty Warzone)</li>
</ul></li>
<li><strong>2D Platformer</strong> (e.g., Mario, Metroid, Castlevania, Hollow Knight, Dead Cells, Ori and the Will of the Wisps, Cuphead, Blasphemous, Celeste)</li>
<li><strong>Auto-Battler</strong> (e.g., Teamfight Tactics, Dota Auto Chess)</li>
<li><strong>Fighting Games</strong> (e.g., Street Fighter, Super Smash Bros, Tekken)</li>
<li><strong>Lifestyle Role-playing / Simulation Games</strong> (e.g., Animal Crossing, The Sims, Stardew Valley)</li>
<li><strong>Collector</strong> (e.g., Genshin Impact, AFK Arena, Marvel Contest of Champions)</li>
<li><strong>Collectible Card Games</strong> (e.g., Hearthstone, Magic: The Gathering Online, Legends of Runeterra)</li>
<li><strong>Endless Runner</strong> (e.g., Subway Surfers, Temple Run, Crash Bandicoot: On the Run, Jetpack Joyride)</li>
<li><strong>Tower Defense</strong> (e.g., Plants vs.&nbsp;Zombies, Bloons TD, Kingdom Rush)</li>
<li><strong>Adventure/Puzzle</strong> (e.g., The Outer Wilds, Firewatch, The Talos Principle, Journey)</li>
</ul>
<p>Naturally, some genres tend to be more popular than others: action RPGs and shooters dominate on PC/console, while puzzle games dominate on mobile.</p>
<div id="fig-top-us-2023" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-top-us-2023-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="table-responsive">
<table class="table-striped table-hover caption-top table">
<caption>2023 Top Grossing Games</caption>
<colgroup>
<col style="width: 4%">
<col style="width: 28%">
<col style="width: 25%">
<col style="width: 19%">
<col style="width: 21%">
</colgroup>
<thead>
<tr class="header">
<th>Rank</th>
<th>Top Grossing Console &amp; PC Full Game</th>
<th>Genre</th>
<th>Top Grossing Mobile Game</th>
<th>Genre</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>Hogwarts Legacy</td>
<td><strong>Action RPG</strong></td>
<td>MONOPOLY GO!</td>
<td>Board Game</td>
</tr>
<tr class="even">
<td>2</td>
<td>Call of Duty: Modern Warfare 3</td>
<td>First-Person Shooter</td>
<td>Candy Crush Saga</td>
<td><strong>Puzzle</strong></td>
</tr>
<tr class="odd">
<td>3</td>
<td>Madden NFL 24</td>
<td>Sports</td>
<td>Roblox</td>
<td>Sandbox/Adventure</td>
</tr>
<tr class="even">
<td>4</td>
<td>Marvel’s Spider-Man 2</td>
<td><strong>Action-Adventure</strong></td>
<td>Royal Match</td>
<td><strong>Puzzle</strong></td>
</tr>
<tr class="odd">
<td>5</td>
<td>The Legend of Zelda: Tears of the Kingdom</td>
<td><strong>Action-Adventure</strong></td>
<td>Coin Master</td>
<td>Casual/Card</td>
</tr>
<tr class="even">
<td>6</td>
<td>Diablo IV</td>
<td><strong>Action RPG</strong></td>
<td>Pokémon GO</td>
<td>Augmented Reality/Adventure</td>
</tr>
<tr class="odd">
<td>7</td>
<td>Call of Duty: Modern Warfare 2</td>
<td>First-Person Shooter</td>
<td>Gardenscapes</td>
<td><strong>Puzzle</strong></td>
</tr>
<tr class="even">
<td>8</td>
<td>Mortal Kombat 1</td>
<td>Fighting</td>
<td>Jackpot Party – Casino Slots</td>
<td>Casino</td>
</tr>
<tr class="odd">
<td>9</td>
<td>Star Wars: Jedi Survivor</td>
<td><strong>Action-Adventure</strong></td>
<td>Township</td>
<td>Simulation/City-Building</td>
</tr>
<tr class="even">
<td>10</td>
<td>EA Sports FC 24</td>
<td>Sports</td>
<td>Evony</td>
<td>Strategy</td>
</tr>
</tbody>
</table>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-top-us-2023-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6: <span class="citation" data-cites="esa_2024_2024">ESA (2024)</span> lists the highest-grossing games in the United States in 2023. The most frequent genres are bolded.
</figcaption>
</figure>
</div>
</section>
<section id="developers" class="level2">
<h2 class="anchored" data-anchor-id="developers">🛖 Developers</h2>
<p>The video game industry consists of a mix of large, established companies and smaller independent developers:</p>
<ul>
<li><strong>Large Developers</strong>: Companies like Riot Games, Electronic Arts, and Ubisoft dominate the market with high-budget, AAA titles.</li>
<li><strong>Independent Developers</strong>: Indie developers are a significant force, producing smaller-budget games that often gain substantial followings.</li>
</ul>
<p>Naturally, success looks different for a bootstrapped or low-budget indie developer than for a large, well-funded developer. Think startup versus Google: for a startup, releasing a product that earns several million dollars in revenue is a massive success, while Google has <a href="https://killedbygoogle.com/">killed many products</a> with millions of users.</p>
</section>
<section id="business-models" class="level2">
<h2 class="anchored" data-anchor-id="business-models">💵 Business Models</h2>
<p>The video game industry employs various business models to generate revenue:</p>
<ul>
<li><strong>Premium Sales</strong>: One-time purchase of games.</li>
<li><strong>Free-to-Play (F2P)</strong>: Games are free to download, but revenue is generated through in-game purchases and microtransactions for virtual goods, skins, etc. While many such transactions are cosmetic, others are a “time-for-money” tradeoff allowing players to achieve the same ends through lots of playtime or spending money.</li>
<li><strong>Subscription Models</strong>: Some games charge recurring subscriptions, and services like Xbox Game Pass and PlayStation Now offer access to a library of games for a monthly fee.</li>
<li><strong>Advertising</strong>: Especially prevalent in mobile games, where ads are shown to players.</li>
</ul>



</section>
</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-eriksen_state_2024" class="csl-entry">
Eriksen, Kaare. 2024. <span>“The <span>State</span> of the <span>Video</span> <span>Games</span> <span>Industry</span>: <span>A</span> <span>Special</span> <span>Report</span>.”</span> <em>Variety</em>. <a href="https://variety.com/vip-special-reports/state-of-video-games-a-special-report-1235884124/">https://variety.com/vip-special-reports/state-of-video-games-a-special-report-1235884124/</a>.
</div>
<div id="ref-esa_2024_2024" class="csl-entry">
ESA. 2024. <span>“2024 <span>Essential</span> <span>Facts</span> <span>About</span> the <span>U</span>.<span>S</span>. <span>Video</span> <span>Game</span> <span>Industry</span>.”</span> <a href="https://www.theesa.com/resources/essential-facts-about-the-us-video-game-industry/2024-data/">https://www.theesa.com/resources/essential-facts-about-the-us-video-game-industry/2024-data/</a>.
</div>
<div id="ref-steamcharts_new_2024" class="csl-entry">
steamcharts. 2024. <span>“New <span>World</span> - <span>Steam</span> <span>Charts</span>.”</span> <a href="https://steamcharts.com/app/1063730#All">https://steamcharts.com/app/1063730#All</a>.
</div>
<div id="ref-walsh_unique_2022" class="csl-entry">
Walsh, Nicholas. 2022. <span>“The <span>Unique</span> <span>Architecture</span> Behind <span>Amazon</span> <span>Games</span>’ <span>Seamless</span> <span>MMO</span> <span>New</span> <span>World</span> <span></span> <span>AWS</span> for <span>Games</span> <span>Blog</span>.”</span> <a href="https://aws.amazon.com/blogs/gametech/the-unique-architecture-behind-amazon-games-seamless-mmo-new-world/">https://aws.amazon.com/blogs/gametech/the-unique-architecture-behind-amazon-games-seamless-mmo-new-world/</a>.
</div>
</div></section><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Massively Multiplayer Online games, <a href="https://en.wikipedia.org/wiki/Massively_multiplayer_online_game">Wikipedia Entry</a>.↩︎</p></li>
<li id="fn2"><p><a href="https://unity.com/">Unity</a> and <a href="https://www.unrealengine.com/">Unreal</a> are the most popular engines. For more on game engines, see <a href="https://en.wikipedia.org/wiki/Game_engine">Wikipedia Entry</a>.↩︎</p></li>
<li id="fn3"><p>See <a href="https://aws.amazon.com/lumberyard">Amazon Lumberyard</a>. O3DE was announced as Lumberyard’s Apache-licensed successor in 2021.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“Why Game Design Is Hard.”</span> August 23,
2024. <a href="https://blog.kasralekan.com/ideas/game-design/">https://blog.kasralekan.com/ideas/game-design/</a>.
</div></div></section></div> ]]></description>
  <category>design</category>
  <category>market-overview</category>
  <guid>https://blog.kasralekan.com/ideas/game-design/</guid>
  <pubDate>Fri, 23 Aug 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/game-design/maplestory-basilmarket-screen.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Language Model Performance Plateaus. What’s next?</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/lm-performance-plateau/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<p>Special credit to <span class="citation" data-cites="t3dotgg_ai_2024">t3dotgg (2024)</span> for the video that inspired me to write this post ahead of a planned post on the technical innovations behind smaller, more efficient language models.</p>
<hr>
<div class="callout callout-style-default callout-warning callout-titled" title="Technical Content Disclaimer">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Technical Content Disclaimer
</div>
</div>
<div class="callout-body-container callout-body">
<!-- https://quarto.org/docs/authoring/callouts.html -->
<p>This post is not a journal-level review: my research for it was intended to be informational and does not exhaust the search space. If you notice any key papers or references that I have missed, or if I have misinterpreted the findings of any reference, please let me know in the comments.</p>
</div>
</div>
<section id="language-model-performance-is-plateauing" class="level1">
<h1>Language Model Performance is Plateauing</h1>
<div id="fig-model-graph-performance" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-model-graph-performance-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="model_graph_performance.jpg" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="@maximelabonne_due_2024"><img src="https://blog.kasralekan.com/ideas/lm-performance-plateau/model_graph_performance.jpg" class="img-fluid figure-img"></a></p>
<figcaption><span class="citation" data-cites="maximelabonne_due_2024">@maximelabonne (2024)</span></figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-model-graph-performance-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>Figure&nbsp;1 shows the consistent trend of improvement in open- and closed-source models. While I could spend this whole post writing about this graph alone, for now just notice the general trend of improvement in MMLU 5-shot benchmark performance over time.</p>
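<p>For readers unfamiliar with the benchmark, a 5-shot MMLU-style evaluation prepends five solved multiple-choice examples to each test question. A minimal sketch of the prompt assembly (exact formatting varies by evaluation harness; the questions below are placeholders, not real MMLU items):</p>

```python
# Sketch of 5-shot prompt assembly for an MMLU-style multiple-choice benchmark.
# The dev-set examples and question text are placeholders, not real MMLU items.

def format_example(question, choices, answer=None):
    """Render one question with lettered choices; append the answer for shots."""
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_5shot_prompt(dev_examples, test_question, test_choices):
    """Concatenate 5 solved dev examples, then the unsolved test question."""
    shots = [format_example(q, c, a) for q, c, a in dev_examples[:5]]
    shots.append(format_example(test_question, test_choices))
    return "\n\n".join(shots)

dev = [(f"Placeholder question {i}?", ["w", "x", "y", "z"], "A") for i in range(5)]
prompt = build_5shot_prompt(dev, "Test question?", ["p", "q", "r", "s"])
```

<p>The model is then scored on whether its next token (or highest-likelihood option letter) matches the held-out answer.</p>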
<section id="the-hype-is-alive" class="level2">
<h2 class="anchored" data-anchor-id="the-hype-is-alive">🎉 The Hype is Alive</h2>
<blockquote class="blockquote">
<p>20 months ago, “ChatGPT is a revolution, the most powerful model ever made,” and today, you can run a model more preferred than this literally on a toaster!🍞 <span class="citation" data-cites="schmid_hugging_2024">(Schmid 2024)</span></p>
</blockquote>
<p>The quote from HuggingFace Technical Lead Philipp Schmid references Gemma-2-9b-it which, as of August 2nd, ranked 47th on HuggingFace’s language model benchmark – higher than GPT-3.5-Turbo-0613. Gemma 2 includes four different models <span class="citation" data-cites="schmid_welcome_2024">(see Schmid et al. 2024 for full model details)</span>:</p>
<ol type="1">
<li>gemma-2-9b: Base 9B model.</li>
<li>gemma-2-9b-it: Instruction fine-tuned version of the base 9B model.</li>
<li>gemma-2-27b: Base 27B model.</li>
<li>gemma-2-27b-it: Instruction fine-tuned version of the base 27B model.</li>
</ol>
<p>Thus, a 9 billion parameter model bested a 175 billion parameter model. The astute reader will note that Schmid must have the most advanced toaster ever to run Gemma 2 🤣.</p>
</section>
<section id="moores-law-is-dead" class="level2">
<h2 class="anchored" data-anchor-id="moores-law-is-dead">📈 Moore’s Law is Dead?</h2>
<blockquote class="blockquote">
<p>The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. <span class="citation" data-cites="sutton_bitter_2019">(Sutton 2019)</span></p>
</blockquote>
<p>Sutton’s famous 2019 blog post<sup>1</sup> “The Bitter Lesson” <span class="citation" data-cites="sutton_bitter_2019">(Sutton 2019)</span> argued for the primacy of computational power<sup>2</sup> over hand-built expert knowledge in improving model performance over time. However, as we approach the physical limits of transistors on a chip, maintaining Moore’s law has become increasingly untenable.</p>
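<p>Moore’s law, and its generalization to falling cost per unit of computation, is just a compounding rule. A back-of-the-envelope sketch (the starting count of 1 and the two-year doubling period are the textbook idealization, not a measurement):</p>

```python
# Back-of-the-envelope Moore's-law projection: transistor counts roughly
# double every two years. Starting count and period are idealized figures.

def projected_growth(start_count, years, doubling_period_years=2.0):
    """Compound the starting count through `years` of doubling."""
    return start_count * 2 ** (years / doubling_period_years)

# Over 20 years, a 2-year doubling period implies 2**10 = 1024x growth.
growth = projected_growth(1.0, 20)
```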
<p>When I first read “The Bitter Lesson,” I thought Sutton argued that engineering work on models did not matter; I was wrong. The key lesson I now extract from my experience, the academic literature, and “The Bitter Lesson” is that model architectures must optimize their use of computation. For example, the introduction of self-attention mechanisms in Transformer models allowed each token to attend to every other token in the input sequence, leveraging parallel computation to process large amounts of data efficiently. Similarly, architectures like convolutional neural networks (CNNs) capitalize on the spatial structure of data, using shared weights and local connectivity to optimize computational efficiency and scalability. These architectures do not merely rely on hand-crafted features but instead exploit the raw power of computation to learn and generalize from massive datasets.</p>
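<p>The self-attention point can be made concrete with a minimal single-head sketch (learned query/key/value projections and masking omitted): every token’s output is a weighted mix over every token in the sequence, and the whole thing reduces to matrix products that parallelize well on modern hardware.</p>

```python
# Minimal single-head self-attention, omitting learned Q/K/V projections and
# masking: each of the n tokens attends to all n tokens at once.
import numpy as np

def self_attention(x):
    """x: (n_tokens, d_model). Each output row is a softmax-weighted
    combination of every input token's vector."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.default_rng(0).normal(size=(6, 8))  # 6 tokens, 8 dims
out = self_attention(x)
```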
<p>The lesson here <strong><em>seems</em></strong> clear: the architectures that thrive are those that best adapt to and utilize the growing computational resources available.</p>
</section>
<section id="limitations-on-available-data" class="level2">
<h2 class="anchored" data-anchor-id="limitations-on-available-data">💾 Limitations on Available Data</h2>
<blockquote class="blockquote">
<p>The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. <span class="citation" data-cites="muennighoff_scaling_2023">(Muennighoff et al. 2023)</span></p>
</blockquote>
<p>Architectures need to be efficient because of the massive amount of data used during training. Several papers, including <span class="citation" data-cites="muennighoff_scaling_2023">Muennighoff et al. (2023)</span>, note that we are reaching the limits of available human-generated data. Thus, other authors have investigated training on AI-generated data <span class="citation" data-cites="shumailov_curse_2023">(Shumailov et al. 2023)</span>, with mixed results.</p>
<p>If we cannot solve the data bottleneck, relying on more computation will not close the performance gap between state-of-the-art models and general intelligence.</p>
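<p>To see why data becomes the binding constraint, consider the compute-optimal rule of thumb of roughly 20 training tokens per parameter from the Chinchilla scaling work. A back-of-the-envelope sketch (the stock-of-text figure below is an illustrative assumption, not a measurement):</p>

```python
# Rough illustration of the data bottleneck: compute-optimal training wants
# ~20 tokens per parameter (Chinchilla rule of thumb). The web-text stock
# below is an illustrative assumption, not a measured figure.

TOKENS_PER_PARAM = 20

def tokens_needed(n_params):
    """Compute-optimal training tokens for a model of n_params parameters."""
    return TOKENS_PER_PARAM * n_params

assumed_text_stock = 500e12   # hypothetical: 500 trillion usable tokens

# A 10-trillion-parameter model would already want 200T tokens --
# a large fraction of even a generous estimate of available text.
demand = tokens_needed(10e12)
fraction = demand / assumed_text_stock
```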
</section>
</section>
<section id="the-future-of-language-models" class="level1">
<h1>The Future of Language Models</h1>
<section id="a-focus-on-efficient-performance" class="level2">
<h2 class="anchored" data-anchor-id="a-focus-on-efficient-performance">⚙️ A Focus on Efficient Performance</h2>
<p>As Gemma 2 (and OpenAI’s Turbo models before it) illustrates, there is an increasing focus on efficiency and inference speed as language model performance begins to plateau. Similarly, Mistral Large 2, the second generation of the startup’s flagship model, was announced with a post entitled “Large Enough.” The model is designed to “push the boundaries of cost efficiency, speed, and performance” <span class="citation" data-cites="mistralai_large_2024">(MistralAI 2024)</span>.</p>
<p>I am not saying this is a bad push. I intend to devote an entire post to the technical innovations behind the gains in efficiency and inference speed we have seen in these models. However, I do not believe these innovations will define the future of AI models.</p>
</section>
<section id="a-paradigm-shift-is-needed" class="level2">
<h2 class="anchored" data-anchor-id="a-paradigm-shift-is-needed">🛝 A Paradigm Shift is Needed</h2>
<p>As I see it, language models have two high-level flaws:</p>
<ol type="1">
<li>A coupling of knowledge and reasoning<sup>3</sup> capabilities.
<ul>
<li>This is most similar to the issues that the engineers behind models like Gemma 2 seek to address. It’s difficult to train a small model with high-level reasoning capabilities when its weights have to hold so much information in them.</li>
<li>There is a growing literature on grounding language models with “World Models.”</li>
</ul></li>
<li>There is no concept of thinking deeply, i.e.&nbsp;more inference compute doesn’t get you a better answer.
<ul>
<li>Even if we stop improving our chips, we are building far more of them than ever before. Thus, a model whose reasoning capabilities grew with inference compute could answer difficult questions given enough resources.</li>
</ul></li>
</ol>
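<p>One simple way to make answer quality grow with inference compute is repeated sampling with majority voting. A toy sketch with a simulated noisy “model” (a probabilistic stand-in, not any real system):</p>

```python
# Toy illustration of trading inference compute for accuracy: sample a
# noisy "model" N times and majority-vote. The simulated model is a
# probabilistic stand-in, not a real LLM call.
import random
from collections import Counter

def noisy_model(rng, correct="42", p_correct=0.6):
    """Stand-in for one sampled answer: right 60% of the time."""
    return correct if rng.random() < p_correct else rng.choice(["41", "43"])

def majority_vote_answer(rng, n_samples):
    """Spend more compute (more samples) and keep the most common answer."""
    votes = Counter(noisy_model(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

rng = random.Random(0)
single = noisy_model(rng)                  # one sample: unreliable
voted = majority_vote_answer(rng, 1001)    # 1001 samples: almost surely "42"
```

<p>With a 60%-accurate sampler, 1001 votes make the correct plurality answer essentially certain; accuracy improves as a direct function of how much inference compute you spend.</p>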
<p>I believe that to achieve the next level of AI-based intelligence, a new approach that addresses at least one of these flaws is needed.</p>
<div id="fig-arc" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-arc-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="ai-benchmarks-arc.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="@arc_prize_inc_arc_2024 is a public competition set up to encourage researchers to focus on achieving General Intelligence with a new Arc-AGI benchmark."><img src="https://blog.kasralekan.com/ideas/lm-performance-plateau/ai-benchmarks-arc.png" class="img-fluid figure-img"></a></p>
<figcaption><span class="citation" data-cites="arc_prize_inc_arc_2024">ARC Prize (2024)</span> is a public competition set up to encourage researchers to focus on achieving General Intelligence with a new Arc-AGI benchmark.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-arc-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2
</figcaption>
</figure>
</div>
<p>Much of the discussion of performance in this post has been based on benchmarks, a fascinating topic in its own right. Benchmarks have been instrumental in the advancement of NLP all the way back to the <a href="https://gluebenchmark.com/">GLUE benchmark</a> <span class="citation" data-cites="wang_glue_2018">(Wang et al. 2018)</span>; however, they can cause research to become myopically focused <span class="citation" data-cites="gehrmann_gem_2021">(Gehrmann et al. 2021)</span> or mischaracterize the rate of progress, as demonstrated by <span class="citation" data-cites="schaeffer_are_2023">Schaeffer, Miranda, and Koyejo (2023)</span>, which won an Outstanding Paper award at NeurIPS 2023. Figure&nbsp;2 demonstrates one potential flaw, courtesy of the team at the ARC Prize.</p>
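<p>The core argument from Schaeffer, Miranda, and Koyejo (2023) can be illustrated with a toy calculation: if per-token accuracy p improves smoothly with scale, an all-or-nothing exact-match metric over a k-token answer scores roughly p<sup>k</sup>, which looks like a sudden “emergent” jump. The numbers below are illustrative, not from the paper:</p>

```python
# Toy version of the "emergence is a mirage" argument: a smooth per-token
# accuracy p becomes the sharp-looking score p**k when the metric requires
# all k tokens of an answer to be right. Values are illustrative.

def exact_match_score(per_token_accuracy, answer_length):
    """All-or-nothing metric: every one of `answer_length` tokens must be right."""
    return per_token_accuracy ** answer_length

# Per-token accuracy improving smoothly from 50% to 95% across model scales...
smooth = [0.50, 0.65, 0.80, 0.95]
# ...turns into a hockey-stick on a 20-token exact-match metric.
sharp = [exact_match_score(p, 20) for p in smooth]
```

<p>The underlying capability improves gradually, but the chosen metric makes the largest model look like it crossed a discontinuous threshold.</p>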
</section>
<section id="a-final-word-from-yann-lecun" class="level2">
<h2 class="anchored" data-anchor-id="a-final-word-from-yann-lecun">🍒 A Final Word from Yann LeCun</h2>
<div id="fig-lecun" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-lecun-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="cherryRL.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-3" title="@lecun_predictive_2016 famously articulated on the NIPS stage in 2016 that Self-Supervised Learning with lots of data was the foundation of future models."><img src="https://blog.kasralekan.com/ideas/lm-performance-plateau/cherryRL.webp" class="img-fluid figure-img"></a></p>
<figcaption><span class="citation" data-cites="lecun_predictive_2016">Lecun (2016)</span> famously articulated on the NIPS stage in 2016 that Self-Supervised Learning with lots of data was the foundation of future models.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-lecun-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3
</figcaption>
</figure>
</div>
<blockquote class="blockquote">
<p>If you are a student interested in building the next generation of AI systems, don’t work on LLMs <span class="citation" data-cites="yann_lecun_ylecun_if_2024">(Yann LeCun [@ylecun] 2024)</span></p>
</blockquote>
<p>Yann LeCun, a foundational deep learning researcher, has made famous pronouncements in the past, as in Figure&nbsp;3. Recently, he has argued that LLMs are not the solution to the intelligence problem, despite their genuinely impressive performance.</p>



</section>
</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-arc_prize_inc_arc_2024" class="csl-entry">
ARC Prize, Inc. 2024. <span>“<span>ARC</span> <span>Prize</span>.”</span> <em>ARC Prize</em>. <a href="https://arcprize.org/">https://arcprize.org/</a>.
</div>
<div id="ref-gehrmann_gem_2021" class="csl-entry">
Gehrmann, Sebastian, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, et al. 2021. <span>“The <span>GEM</span> <span>Benchmark</span>: <span>Natural</span> <span>Language</span> <span>Generation</span>, Its <span>Evaluation</span> and <span>Metrics</span>.”</span> arXiv. <a href="http://arxiv.org/abs/2102.01672">http://arxiv.org/abs/2102.01672</a>.
</div>
<div id="ref-gudibande_false_2023" class="csl-entry">
Gudibande, Arnav, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, and Dawn Song. 2023. <span>“The <span>False</span> <span>Promise</span> of <span>Imitating</span> <span>Proprietary</span> <span>LLMs</span>.”</span> arXiv. <a href="http://arxiv.org/abs/2305.15717">http://arxiv.org/abs/2305.15717</a>.
</div>
<div id="ref-lecun_predictive_2016" class="csl-entry">
Lecun, Yann. 2016. <span>“Predictive <span>Learning</span>, <span>NIPS</span> 2016 <span></span> <span>Yann</span> <span>LeCun</span>, <span>Facebook</span> <span>Research</span>.”</span> <a href="https://www.youtube.com/watch?v=Ount2Y4qxQo">https://www.youtube.com/watch?v=Ount2Y4qxQo</a>.
</div>
<div id="ref-maximelabonne_due_2024" class="csl-entry">
@maximelabonne. 2024. <span>“Due to Popular Demand, <span>I</span>’ve Updated This Figure to Include <span>DeepSeek</span>-<span>V2</span> and <span>Mistral</span> <span>Large</span> 2. <span>It</span>’s Also More Zoomed for Readability. Https://t.co/<span class="nocase">jWEpxH9zgO</span>.”</span> Tweet. <em>Twitter</em>. <a href="https://x.com/maximelabonne/status/1816416043511808259/photo/1">https://x.com/maximelabonne/status/1816416043511808259/photo/1</a>.
</div>
<div id="ref-mistralai_large_2024" class="csl-entry">
MistralAI. 2024. <span>“Large <span>Enough</span>.”</span> <a href="https://mistral.ai/news/mistral-large-2407/">https://mistral.ai/news/mistral-large-2407/</a>.
</div>
<div id="ref-muennighoff_scaling_2023" class="csl-entry">
Muennighoff, Niklas, Alexander M. Rush, Boaz Barak, Teven Le Scao, Nouamane Tazi, Aleksandra Piktus, Sampo Pyysalo, Thomas Wolf, and Colin Raffel. 2023. <span>“Scaling <span>Data</span>-<span>Constrained</span> <span>Language</span> <span>Models</span>.”</span> In. <a href="https://openreview.net/forum?id=j5BuTrEj35">https://openreview.net/forum?id=j5BuTrEj35</a>.
</div>
<div id="ref-roser_what_2024" class="csl-entry">
Roser, Max, Hannah Ritchie, and Edouard Mathieu. 2024. <span>“What Is <span>Moore</span>’s <span>Law</span>?”</span> <em>Our World in Data</em>, February. <a href="https://ourworldindata.org/moores-law">https://ourworldindata.org/moores-law</a>.
</div>
<div id="ref-sauerwein_reflections_2024" class="csl-entry">
Sauerwein, David. 2024. <span>“Reflections on <span>The</span> <span>Bitter</span> <span>Lesson</span> <span></span> <span>LinkedIn</span>.”</span> <a href="https://www.linkedin.com/posts/davidsauerwein_ai-machinelearning-compute-activity-7215818757405888512-ogMw/">https://www.linkedin.com/posts/davidsauerwein_ai-machinelearning-compute-activity-7215818757405888512-ogMw/</a>.
</div>
<div id="ref-schaeffer_are_2023" class="csl-entry">
Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. 2023. <span>“Are <span>Emergent</span> <span>Abilities</span> of <span>Large</span> <span>Language</span> <span>Models</span> a <span>Mirage</span>?”</span> <em>Advances in Neural Information Processing Systems</em> 36 (December): 55565–81. <a href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/adc98a266f45005c403b8311ca7e8bd7-Abstract-Conference.html">https://proceedings.neurips.cc/paper_files/paper/2023/hash/adc98a266f45005c403b8311ca7e8bd7-Abstract-Conference.html</a>.
</div>
<div id="ref-schmid_hugging_2024" class="csl-entry">
Schmid, Philipp. 2024. <span>“Hugging <span>Face</span> <span>Post</span> - <span>LinkedIn</span>.”</span> <a href="https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_absolutely-wild-google-deepmindgemma-activity-7224466612043620352-dcgq/">https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_absolutely-wild-google-deepmindgemma-activity-7224466612043620352-dcgq/</a>.
</div>
<div id="ref-schmid_welcome_2024" class="csl-entry">
Schmid, Philipp, Omar Sanseviero, Pedro Cuenca, Lewis Tunstall, Tom Aarsen, and Vaibhav Srivastav. 2024. <span>“Welcome <span>Gemma</span> 2 - <span>Google</span>’s New Open <span>LLM</span>.”</span> <a href="https://huggingface.co/blog/gemma2">https://huggingface.co/blog/gemma2</a>.
</div>
<div id="ref-shumailov_curse_2023" class="csl-entry">
Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. 2023. <span>“The <span>Curse</span> of <span>Recursion</span>: <span>Training</span> on <span>Generated</span> <span>Data</span> <span>Makes</span> <span>Models</span> <span>Forget</span>.”</span> arXiv. <a href="http://arxiv.org/abs/2305.17493">http://arxiv.org/abs/2305.17493</a>.
</div>
<div id="ref-sutton_bitter_2019" class="csl-entry">
Sutton, Rich. 2019. <span>“The <span>Bitter</span> <span>Lesson</span>.”</span> <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">http://www.incompleteideas.net/IncIdeas/BitterLesson.html</a>.
</div>
<div id="ref-t3dotgg_ai_2024" class="csl-entry">
t3dotgg. 2024. <span>“<span>AI</span> Isn’t Gonna Keep Improving.”</span> <a href="https://www.youtube.com/watch?v=Y8Ym7hMR100">https://www.youtube.com/watch?v=Y8Ym7hMR100</a>.
</div>
<div id="ref-valmeekam_planning_2023" class="csl-entry">
Valmeekam, Karthik, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. 2023. <span>“On the <span>Planning</span> <span>Abilities</span> of <span>Large</span> <span>Language</span> <span>Models</span> : <span>A</span> <span>Critical</span> <span>Investigation</span>.”</span> arXiv. <a href="http://arxiv.org/abs/2305.15771">http://arxiv.org/abs/2305.15771</a>.
</div>
<div id="ref-wang_glue_2018" class="csl-entry">
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. <span>“<span>GLUE</span>: <span>A</span> <span>Multi</span>-<span>Task</span> <span>Benchmark</span> and <span>Analysis</span> <span>Platform</span> for <span>Natural</span> <span>Language</span> <span>Understanding</span>.”</span> In <em>Proceedings of the 2018 <span>EMNLP</span> <span>Workshop</span> <span>BlackboxNLP</span>: <span>Analyzing</span> and <span>Interpreting</span> <span>Neural</span> <span>Networks</span> for <span>NLP</span></em>, 353–55. Brussels, Belgium: Association for Computational Linguistics. <a href="https://doi.org/10.18653/v1/W18-5446">https://doi.org/10.18653/v1/W18-5446</a>.
</div>
<div id="ref-yang_neurips_2023" class="csl-entry">
Yang, Sherry, Ofir Nachum, Yilun Du, Stephen McAleer, Igor Mordatch, Linxi Fan, Jeannette Bohg, and Dale Schuurmans. 2023. <span>“<span>NeurIPS</span> <span>Workshop</span> · <span>Foundation</span> <span>Models</span> for <span>Decision</span> <span>Making</span>.”</span> <em>SlidesLive</em>. <a href="https://neurips.cc/virtual/2023/workshop/66525">https://neurips.cc/virtual/2023/workshop/66525</a>.
</div>
<div id="ref-yann_lecun_ylecun_if_2024" class="csl-entry">
Yann LeCun [@ylecun]. 2024. <span>“If You Are a Student Interested in Building the Next Generation of <span>AI</span> Systems, Don’t Work on <span>LLMs</span>.”</span> Tweet. <em>Twitter</em>. <a href="https://x.com/ylecun/status/1793326904692428907">https://x.com/ylecun/status/1793326904692428907</a>.
</div>
</div></section><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p><span class="citation" data-cites="sauerwein_reflections_2024">Sauerwein (2024)</span> has a good post covering the summary of “The Bitter Lesson” and related responses.↩︎</p></li>
<li id="fn2"><p>Moore’s law observes that the number of transistors in an integrated circuit roughly doubles every two years. <span class="citation" data-cites="roser_what_2024">Roser, Ritchie, and Mathieu (2024)</span> have a great post visualizing it.↩︎</p></li>
<li id="fn3"><p>“Reasoning” is a tricky term to nail down. Here I am referring to current benchmarks rather than more advanced tasks like planning. Planning is an extremely interesting capability that is essential for language-model-based agents. Check out the NeurIPS 2023 workshop on “Foundation Models for Decision Making” for a taste of this research <span class="citation" data-cites="yang_neurips_2023">(Yang et al. 2023)</span> as well as <span class="citation" data-cites="valmeekam_planning_2023">Valmeekam et al. (2023)</span>.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“Language Model Performance Plateaus. What’s
Next?”</span> August 11, 2024. <a href="https://blog.kasralekan.com/ideas/lm-performance-plateau/">https://blog.kasralekan.com/ideas/lm-performance-plateau/</a>.
</div></div></section></div> ]]></description>
  <category>[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fhits.dwyl.com%2Fanrath%2Fblog_llm_plateau_final.json&amp;show=unique&amp;style=flat-square&amp;label=Views&amp;color=orange)]()</category>
  <category>NLP</category>
  <category>LLM</category>
  <category>Lit. Review</category>
  <guid>https://blog.kasralekan.com/ideas/lm-performance-plateau/</guid>
  <pubDate>Sun, 11 Aug 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/lm-performance-plateau/model_graph_performance.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Website Redesign</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/website-revamp/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<p><a href="webDalle.webp" class="lightbox" data-gallery="quarto-lightbox-gallery-1"><img src="https://blog.kasralekan.com/ideas/website-revamp/webDalle.webp" class="img-fluid" style="width:90.0%" alt="Web design graphic"></a></p>
<section id="context" class="level1">
<h1>🕰️ Context</h1>
<p>Like other programmers with some experience in website development, I have toyed around with the design of my website over time. My first website was on <a href="https://wordpress.com/">WordPress</a> before I learned to code, the second was built using a React template, and the third was a Hugo-based website using <a href="https://hugoblox.com/">HugoBlox</a>. The transition from the second to the third website was motivated by the tech debt of the old React template as well as a desire for a simple website that supported me during my Master’s degree, with features for my research publications. The third website was always planned as a temporary solution until I could focus on a redesign from the ground up.</p>
</section>
<section id="my-design-process" class="level1">
<h1>⚒️ My Design Process</h1>
<p>My design process had six steps:</p>
<ol type="1">
<li>Design Search</li>
<li>Condense Learnings</li>
<li>Determine My Content (data)</li>
<li>Data-informed Design</li>
<li>Coding</li>
<li>User Testing (+ repeat steps as needed)</li>
</ol>
<p>In the following section, I detail steps 1 and 2. When reviewing my content, I realized (1) I have limited required copy and content that the user needs to see (mostly my work experience, published work, and projects) and (2) I wanted to support my writing with academic-style citations. After searching, I decided to separate my writing into its own website using the Quarto framework <span class="citation" data-cites="noauthor_quarto_2024">(<span>“Quarto”</span> 2024)</span>. This separation makes writing easier, but more importantly it allows me to refactor my base website without worrying about backwards compatibility for my writing. Quarto is quite expressive and customizable for a markdown-native framework, which has allowed me to add a number of extra features to this website, including <a href="https://github.com/quarto-dev/quarto-cli/discussions/5895#discussioncomment-10259127">reference backrefs</a>, a styled progress bar, view counters, and use of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/View_Transitions_API">view transitions</a> API<sup>1</sup>.</p>
</section>
<section id="web-design-learnings" class="level1">
<h1>🪞 Web Design Learnings</h1>
<section id="successful-examples-categorized" class="level2">
<h2 class="anchored" data-anchor-id="successful-examples-categorized">Successful Examples Categorized</h2>
<div class="callout callout-style-default callout-warning callout-titled" title="Websites May Change">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Websites May Change
</div>
</div>
<div class="callout-body-container callout-body">
<!-- https://quarto.org/docs/authoring/callouts.html -->
<p>These websites may be redesigned by the time you are reading this.</p>
</div>
</div>
<section id="committing-to-an-aesthetic" class="level3">
<h3 class="anchored" data-anchor-id="committing-to-an-aesthetic">Committing to an aesthetic</h3>
<p>Examples include:</p>
<ul>
<li><strong>Minimal (mostly dark)</strong>
<ul>
<li><a href="https://www.ethanchng.com/">ethanchng.com</a></li>
<li><a href="https://antfu.me/">antfu.me</a></li>
<li><a href="https://brianlovin.com/">brianlovin.com</a></li>
</ul></li>
<li><strong>3D with <a href="https://threejs.org/">three.js</a></strong>
<ul>
<li><a href="https://itssharl.ee/">itssharl.ee</a> – brilliant use of animations and using the same 3D shapes on hover animations for desktop</li>
<li><a href="https://bruno-simon.com/">bruno-simon.com</a> – cool factor achieved</li>
<li><a href="https://jesse-zhou.com/">jesse-zhou.com</a> – extremely detailed design</li>
<li><a href="https://www.edwardh.io/">edwardh.io</a> – fun integration of 3D in a normal portfolio website</li>
</ul></li>
<li><strong>Bright and Curvy</strong>
<ul>
<li><a href="https://www.amysboyd.com/">amysboyd.com</a></li>
<li><a href="https://www.seanhalpin.xyz/">seanhalpin.xyz</a></li>
</ul></li>
<li><strong>Gallery (lots of pictures)</strong>
<ul>
<li><a href="https://cydstumpel.nl/">cydstumpel.nl</a></li>
<li><a href="https://nicolasloureiro.com/">nicolasloureiro.com</a></li>
</ul></li>
<li><strong>Other Aesthetic Examples</strong>
<ul>
<li><em>Coder</em>
<ul>
<li><a href="https://tamalsen.dev/">tamalsen.dev</a></li>
<li><a href="https://vscode-portfolio.vercel.app/">vscode-portfolio</a></li>
</ul></li>
<li><em>8-bit</em>
<ul>
<li><a href="https://thegeekdesigner.com/">thegeekdesigner.com</a> – great copy as well as brilliantly designed</li>
<li><a href="https://expensive.toys/404">expensive.toys</a> – a fun 404 page</li>
</ul></li>
</ul></li>
</ul>
</section>
<section id="animations-interaction" class="level3">
<h3 class="anchored" data-anchor-id="animations-interaction">Animations &amp; Interaction</h3>
<p>Examples include:</p>
<ul>
<li><em>Loading animation</em>
<ul>
<li><a href="https://patrickheng.com/">patrickheng.com</a></li>
</ul></li>
<li><em>Scroll animation</em>
<ul>
<li><a href="https://aimpie.design/">aimpie.design</a></li>
<li><a href="https://cherupil.com/">cherupil.com</a></li>
</ul></li>
<li><em>Cursor following animations, clickable items, etc.</em>
<ul>
<li><a href="https://minhpham.design/">minhpham.design/</a> – hilarious hidden copy on hover</li>
</ul></li>
</ul>
</section>
</section>
<section id="my-key-design-considerations" class="level2">
<h2 class="anchored" data-anchor-id="my-key-design-considerations">My Key Design Considerations</h2>
<ul>
<li><strong>Wow Factor</strong>: This is one of those <em>you know it when you see it</em> traits. Having reflected on all the websites above and many more, I believe that the <em>Wow Factor</em> is based on (1) Animations / Interactivity and (2) Design Cohesion. Cohesion for some designs is relatively easy, e.g.&nbsp;a minimalistic design is naturally cohesive due to the relatively few design elements being implemented. Ultimately, cohesion also depends on decisions about Typography, Colors, and Animations.</li>
<li><strong>Reducing Cognitive Load</strong>: Some websites are awe-inspiring but make it hard to get to the raw data or have scrolling animations that make my fingers hurt due to the amount of time it takes to traverse the page. A brilliant example of a low-cognitive-load website that is also impressive is <a href="https://gkoberger.com/">gkoberger.com</a> which uses clickable objects in the hero div and related animations to present a simple, yet information-dense website. This website served as a key inspiration for my design.</li>
<li><strong>Who is the Audience</strong>: Many excellent websites struggle to deliver a comparable experience on mobile devices. Mobile has inherent limitations, but at a minimum the same information should be accessible to the user. This is just one example of developing for different audiences, who either view the website differently or seek different information from it.</li>
</ul>
<p>The tension between generating a Wow Factor, keeping the user’s cognitive load low, and serving the different audiences viewing my website makes the design process challenging.</p>
<p><strong><em>My objectives</em></strong> to balance these considerations were:</p>
<ol type="1">
<li><ins>
Commit
</ins>
to a unique aesthetic for my website, e.g.&nbsp;making it look like a Google/Apple Maps page.</li>
<li>Ensure all <ins>high-priority information is accessible within 5 seconds</ins> of loading the website (ideally all the information is within the first page on desktop).
<ul>
<li>Subgoal: Be <ins>information efficient</ins>: tighten up copy, provide smaller bites of information with clickable elements for users to learn more when interested, and use visuals to display information when possible.</li>
</ul></li>
<li><ins>
Implement animations
</ins>
for Loading, Hovers, Clicking into sections, and (if applicable) Scrolling. Consider opportunities for interactivity.</li>
<li>Ensure all information is <ins>accessible on mobile</ins> and consider how to make mobile viewing as enjoyable as desktop viewing.</li>
</ol>
</section>
</section>
<section id="results" class="level1">
<h1>✔️ Results</h1>
<div id="fig-pie" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pie-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="og.png" class="lightbox" data-gallery="quarto-lightbox-gallery-2" title="A taste of my new website."><img src="https://blog.kasralekan.com/ideas/website-revamp/og.png" class="img-fluid figure-img"></a></p>
<figcaption>A taste of my new website.</figcaption>
</figure>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-pie-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>View my website at <a href="https://kasralekan.com/">kasralekan.com</a>.</p>



</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-noauthor_quarto_2024" class="csl-entry">
<span>“Quarto.”</span> 2024. <em>Quarto</em>. <a href="https://quarto.org/">https://quarto.org/</a>.
</div>
</div></section><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>This implementation of view transitions has been quite hacky and required much more refinement than anticipated due to the way Quarto injects content or executes JS based on the header yml elements in the different page files.↩︎</p></li>
</ol>
</section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“Website Redesign.”</span> July 29, 2024. <a href="https://blog.kasralekan.com/ideas/website-revamp/">https://blog.kasralekan.com/ideas/website-revamp/</a>.
</div></div></section></div> ]]></description>
  <category>WebDev</category>
  <category>design</category>
  <category>creativity</category>
  <guid>https://blog.kasralekan.com/ideas/website-revamp/</guid>
  <pubDate>Mon, 29 Jul 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/website-revamp/webDalle.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>Extending “Towards Monosemanticity”</title>
  <dc:creator>Kasra Lekan</dc:creator>
  <link>https://blog.kasralekan.com/ideas/towards-monosemanticity/</link>
  <description><![CDATA[ 
<div class="progress" id="progress">
    <div class="train">
        <div class="train-tail"></div>
        <div class="train-body" id="train-body"></div>
        <div class="train-head"></div>
    </div>
</div>




<section id="background" class="level1">
<h1>Background</h1>
<p>Based on <a href="https://transformer-circuits.pub/2023/monosemantic-features/index.html">Towards Monosemanticity: Decomposing Language Models With Dictionary Learning</a> <span class="citation" data-cites="bricken_towards_2023">(Bricken et al. 2023)</span> by Anthropic and <a href="https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html">Language models can explain neurons in language models</a> <span class="citation" data-cites="bills_language_2023">(Bills et al. 2023)</span> by OpenAI, I attempted to generate natural language explanations for the neurons in Distill-GPT-2 by projecting the final MLP output layer to a higher dimension (similar to dictionary learning) and then using a language model (gpt-4-turbo-2024-04-09) to generate natural language descriptions of each higher dimension from its activation values. The underlying theoretical foundation is the Superposition hypothesis, which, simply put, states that each neuron in a language model learns a complicated mix of concepts; for instance, a neuron may activate strongly on both Korean text and DNA sequences. Thus, by projecting MLP outputs to a higher dimension, we can attempt to create “features” that each represent an explainable concept.</p>
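<p>The projection step can be sketched as a standard ReLU sparse autoencoder over MLP activations. This is a minimal sketch under assumed conventions (the function and parameter names are mine, not from the original code), showing how the higher-dimensional features, the reconstruction, and the training loss relate:</p>

```python
import numpy as np

def sae_forward(acts, W_enc, b_enc, W_dec, b_dec, l1_coeff=3e-4):
    """One forward pass of a sparse autoencoder over MLP activations.

    acts: (batch, d_mlp) activations; W_enc: (d_mlp, d_hidden) with
    d_hidden = expansion * d_mlp (e.g. 32x). The L1 penalty pushes the
    feature activations toward sparsity, i.e. hopefully monosemanticity.
    """
    feats = np.maximum(0.0, (acts - b_dec) @ W_enc + b_enc)  # ReLU features
    recon = feats @ W_dec + b_dec                            # reconstruction
    mse = np.mean((recon - acts) ** 2)                       # reconstruction loss
    l1 = l1_coeff * np.abs(feats).sum(axis=-1).mean()        # sparsity penalty
    return feats, recon, mse + l1
```

<p>Interpretability then operates on the per-token values in <code>feats</code>: for each feature, the top-activating text snippets are what the language model is asked to explain.</p>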
</section>
<section id="challenges" class="level1">
<h1>Challenges</h1>
<ul>
<li>Reproducing Anthropic’s representation from the paper’s appendix
<ul>
<li>Huge thanks to Neel Nanda for his blog post and repo <span class="citation" data-cites="nanda_neelnanda-io1l-sparse-autoencoder_2024 nanda_neelnanda_transformerlensorgtransformerlens_2024">(Nanda 2024a, 2024b)</span>.</li>
</ul></li>
<li>Tuning hyperparameters</li>
<li>Loss degradation over longer training runs</li>
<li>Automated interpretability using <a href="https://github.com/openai/automated-interpretability">OpenAI’s implementation package</a>
<ul>
<li>API changes requiring code refactoring due to data parsing changes, or rewriting due to missing information</li>
<li>Poor responses from GPT-4, ultimately making automated interpretability impossible without adjusting the prompts</li>
</ul></li>
</ul>
</section>
<section id="observations" class="level1">
<h1>Observations</h1>
<section id="training-autoencoders-for-reconstruction" class="level2">
<h2 class="anchored" data-anchor-id="training-autoencoders-for-reconstruction">Training Autoencoders for Reconstruction</h2>
<p>The primary metric I used for MLP reconstruction efficacy was “reconstruction score”:</p>
<p><img src="https://latex.codecogs.com/png.latex?score%20=%20%5Cfrac%7Bzero%5C_abl%5C_loss%20-%20recons%5C_loss%7D%7Bzero%5C_abl%5C_loss%20-%20loss%7D"></p>
<p>I was able to reproduce Anthropic’s autoencoder on a single-layer transformer with a GELU activation, achieving a reconstruction score of ~94% with 2 billion training tokens (fewer than Anthropic’s run). With Distill-GPT-2, I was only able to achieve a reconstruction score of ~77% (with a 32x dictionary size). I observed that (1) training with more tokens did not significantly improve reconstruction scores, and (2) performance would deteriorate over the course of the training run after reaching an optimum, suggesting that a more sophisticated training strategy would be necessary when scaling up this approach.</p>
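<p>In code, the score above is a one-liner (a sketch; the argument names mirror the formula, not any particular library):</p>

```python
def reconstruction_score(loss, recons_loss, zero_abl_loss):
    """Fraction of the loss gap between zero-ablating the MLP output and
    running the clean model that is recovered when the autoencoder's
    reconstruction is spliced in: 1.0 means perfect reconstruction,
    0.0 means no better than zero-ablation."""
    return (zero_abl_loss - recons_loss) / (zero_abl_loss - loss)
```

<p>For example, with a clean loss of 2.0, a zero-ablation loss of 7.0, and a reconstruction loss of 2.5, the score is 0.9 (90%).</p>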
</section>
<section id="dictionary-size" class="level2">
<h2 class="anchored" data-anchor-id="dictionary-size">Dictionary Size</h2>
<p>Anthropic tested many dictionary sizes from 1x to 256x but focused on 8x for their primary findings. I hypothesized that a larger size would be optimal for a larger model, since it is trained on more tokens and learns more complex representations. I first trained a 32x dictionary and later trained a 128x dictionary. Training a larger dictionary was naturally more computationally intensive, with cost scaling linearly in the dictionary size, <img src="https://latex.codecogs.com/png.latex?O(n)">.</p>
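<p>The linear scaling is visible directly in the parameter count. This is a sketch (sae_param_count is a hypothetical helper), assuming one encoder and one decoder weight matrix plus a bias vector for each:</p>

```python
def sae_param_count(d_mlp: int, expansion: int) -> int:
    """Total parameters in a sparse autoencoder whose dictionary has
    expansion * d_mlp features: encoder (d_mlp x d_hidden) and decoder
    (d_hidden x d_mlp) weights plus the two bias vectors.
    Linear in the expansion factor, O(n)."""
    d_hidden = d_mlp * expansion
    return 2 * d_mlp * d_hidden + d_hidden + d_mlp
```

<p>Doubling the expansion factor therefore roughly doubles both the parameter count and the per-token training cost.</p>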
</section>
<section id="interpretability" class="level2">
<h2 class="anchored" data-anchor-id="interpretability">Interpretability</h2>
<div id="fig-pie" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pie-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<a href="monoPieFigure.png" class="lightbox" data-gallery="quarto-lightbox-gallery-1" title="Figure&nbsp;1: "><img src="https://blog.kasralekan.com/ideas/towards-monosemanticity/monoPieFigure.png" class="img-fluid figure-img"></a>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-pie-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>The natural language explanations suggested that 32x is not expressive enough for Distill-GPT-2. Many features received the same explanation because they activated on a wide range of tokens. Since these ranges were not specific, the explanations fit the high-frequency tokens in the evaluation text rather than the model itself (Figure&nbsp;1). Thus, I posit that larger models need much larger dictionaries, as they encode more features for each neuron in the MLP layer. There are ~15x more parameters in Distill-GPT-2 than in the single-layer transformer that Anthropic analyzed, so I opted to test a 128x dictionary in addition to the 32x.</p>
<p>Training the 128x dictionary was far more computationally intensive and did not reach a high enough reconstruction score to facilitate interpretability. After training for 2 billion tokens, the score was only ~48%. Additional training led to degradation from this optimum, emphasizing the need for more sophisticated autoencoder training as the model being interpreted becomes larger.</p>
<div class="callout callout-style-default callout-tip callout-titled" title="Acknowledgements">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Acknowledgements
</div>
</div>
<div class="callout-body-container callout-body">
<!-- https://quarto.org/docs/authoring/callouts.html -->
<p>This work would not have been possible without guidance from <a href="https://yangfengji.net/">Professor Yangfeng Ji</a>.</p>
</div>
</div>
<p>Check out my presentation on this research <a href="./report.pdf">here</a>.</p>



</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0">
<div id="ref-bills_language_2023" class="csl-entry">
Bills, Steven, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, and William Saunders. 2023. <span>“Language Models Can Explain Neurons in Language Models.”</span> <em>Open AI Public</em> 2.
</div>
<div id="ref-bricken_towards_2023" class="csl-entry">
Bricken, Trenton, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, et al. 2023. <span>“Towards Monosemanticity: <span>Decomposing</span> Language Models with Dictionary Learning.”</span> <em>Transformer Circuits Thread</em> 2.
</div>
<div id="ref-nanda_neelnanda-io1l-sparse-autoencoder_2024" class="csl-entry">
Nanda, Neel. 2024a. <span>“Neelnanda-Io/<span>1L</span>-<span>Sparse</span>-<span>Autoencoder</span>.”</span> <a href="https://github.com/neelnanda-io/1L-Sparse-Autoencoder">https://github.com/neelnanda-io/1L-Sparse-Autoencoder</a>.
</div>
<div id="ref-nanda_neelnanda_transformerlensorgtransformerlens_2024" class="csl-entry">
———. 2024b. <span>“<span>TransformerLensOrg</span>/<span>TransformerLens</span>.”</span> TransformerLensOrg. <a href="https://github.com/TransformerLensOrg/TransformerLens">https://github.com/TransformerLensOrg/TransformerLens</a>.
</div>
</div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-lekan2024" class="csl-entry quarto-appendix-citeas">
Lekan, Kasra. 2024. <span>“Extending <span>‘Towards
Monosemanticity’</span>.”</span> April 3, 2024. <a href="https://blog.kasralekan.com/ideas/towards-monosemanticity/">https://blog.kasralekan.com/ideas/towards-monosemanticity/</a>.
</div></div></section></div> ]]></description>
  <category>research</category>
  <category>NLP</category>
  <category>LLM</category>
  <guid>https://blog.kasralekan.com/ideas/towards-monosemanticity/</guid>
  <pubDate>Wed, 03 Apr 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.kasralekan.com/ideas/towards-monosemanticity/featured.png" medium="image" type="image/png" height="82" width="144"/>
</item>
</channel>
</rss>
