\n\nThis analysis uses registered users only, and uses their traffic in the past year. We find that registered and unregistered users have similar traffic patterns, but we can more easily identify registered users and have higher-quality data for them. You can see \u003Ca href=\"https://stackoverflow.com/users/prediction-data\">exactly what kind of data we store for you as a user\u003C/a>, as well as opt out of predictions.\n\nWe see in this plot that all of the options Amazon has identified as finalist cities are very similar to each other. If we added a city in Russia or India to this plot, we would see a significantly lower cosine similarity compared to these North American tech centers. Northern Virginia and Washington, DC are the most similar to Seattle in terms of the kinds of technologies that developers visit. Developers in Northern VA and Washington, DC visit a mix of technologies at proportions that are the closest to developers in Seattle (at least, the parts of Seattle that are \u003Cem>not\u003C/em> Redmond). There is another tier that is very close in similarity, and it includes Atlanta, Newark, Philadelphia, and Montgomery County. This is super interesting, but that isn't all we can learn from this kind of data. We can use statistical analysis to explore more.\n\n\u003Ch2>Understanding developers using principal component analysis\u003C/h2>\n\nWe can use a statistical technique called \u003Ca href=\"https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues/140579#140579\">principal component analysis\u003C/a> to answer these kinds of questions. Developers who come to Stack Overflow don't visit tags in random combinations; the tags that any individual visits are related in ways that are connected to the kind of work that they do.\n\nLet's think of each Stack Overflow user as a point in a high-dimensional space with tags as the coordinates. 
Principal component analysis is a way to \u003Ca href=\"http://setosa.io/ev/principal-component-analysis/\">project these points\u003C/a> (or users, in this case) onto a new, special coordinate system. In the new coordinate system, each coordinate, or principal component, is a weighted sum of tags/technologies. The first principal component has the most variance in users in its direction, the second principal component has the second most variance in users in its direction, and so forth.\n\n\u003Cimg class=\"aligncenter size-large wp-image-9170\" src=\"https://stackoverflow.blog/wp-content/uploads/2018/02/first_components-1-964x675.png\" alt=\"\" width=\"964\" height=\"675\" />\n\nThis plot shows the first six components or dimensions from the principal component decomposition of registered users' traffic from the last year to Stack Overflow questions. Notice the combinations of tags that appear together in these different components.\n\n\u003Cul>\n\u003Cli>The first principal component, which explains the most variation in Stack Overflow users, contrasts users who visit a lot of front-end technologies (HTML, JavaScript, jQuery) with those who visit a lot of Python and/or low-level technologies like C++. When we look at all of our users, this spectrum from front-end to low-level and Python is what explains the most difference from one user to another.\u003C/li>\n\u003Cli>The second principal component, which explains the second largest amount of variation in Stack Overflow users, is not a contrast between two kinds of things, but instead is focused on one family of technologies: the Microsoft ecosystem of C#, .NET, Visual Studio, and related technologies. 
The characteristic of developers that explains the second most difference is whether or not they use these Microsoft technologies.\u003C/li>\n\u003Cli>The third principal component focuses on Android and iOS; this component measures to what extent a developer works building mobile apps.\u003C/li>\n\u003Cli>The fourth principal component is another single family, focused on Java, Spring, and Maven.\u003C/li>\n\u003Cli>The fifth principal component is back to a set of contrasts, and measures how much a developer works with C++ and C versus how much they work with SQL, databases, and perhaps some data handling with dataframes.\u003C/li>\n\u003Cli>The sixth principal component returns to iOS development for Apple devices, but instead of being partnered with Android like it was before, now it is contrasted with Java tags. This is a lower-rank principal component, so this difference explains less variation in users than the fourth principal component.\u003C/li>\n\u003C/ul>\n\nThere are many principal components, each one less important than the one before in explaining differences between various users. This projection of traffic data into a new coordinate system allows us to draw conclusions about Amazon's candidate city choices.\n\n \n\n\u003Cimg class=\"aligncenter size-large wp-image-9182\" src=\"https://stackoverflow.blog/wp-content/uploads/2018/02/scatter_plot-1-1-844x675.png\" alt=\"\" width=\"844\" height=\"675\" />\n\nThere is a lot of information in a plot like this, so let's talk through some details. The labels on the x-axis and y-axis include what percent variation in the data is explained by each component. Each orange or blue point labeled with a city or region represents the aggregate, average user in that metro area, while the gray points represent real, individual users. 
The principal component decomposition was calculated using all registered users who visited at least 200 questions in the last year, but these plots show one out of every 10 users, for visual clarity.\n\nThe analysis in this blog post uses our total, global traffic (not just North America), so the first conclusion we can draw here is that the similarities among Amazon's candidate cities are high compared to global variation in developer traffic. Compared to our traffic worldwide, these 20 locations are pretty similar to each other. All 20 North American cities are focused proportionally more on low-level languages and Python (more to the left), and compared to the worldwide distribution they use more Microsoft technologies (more up).\n\nWhen I ran this analysis but \u003Cem>included\u003C/em> Redmond and the locations around the Microsoft campus in my definition of what Seattle is, Seattle had a higher contribution from this Microsoft-dominated principal component. Dallas, Columbus, and Indianapolis are furthest in the direction (up) on this plot that indicates more Microsoft technologies; these are cities that have proportionally more developers working with technologies like C#, .NET, and Visual Studio. Depending on how invested Amazon wants to be in the Microsoft tech stack, this might be attractive or a limitation.\n\nWhat if Amazon wants to invest more in mobile development? (I know \u003Cem>I\u003C/em> have bought plenty of things on Amazon's app on my phone.)\n\n\u003Cimg class=\"aligncenter size-large wp-image-9172\" src=\"https://stackoverflow.blog/wp-content/uploads/2018/02/mobile-1-844x675.png\" alt=\"\" width=\"844\" height=\"675\" />\n\nThe candidates are even closer together in this plot, and far away from areas (up and left) that are associated with lots of mobile development. We find that \u003Ca href=\"https://stackoverflow.blog/2017/08/22/world-mobile-development/\">mobile development happens a lot in countries outside of North America\u003C/a>. 
If Amazon wants to choose a city with proportionally more mobile developers, good choices would be Los Angeles, New York, and Toronto.\n\nWhat if Amazon wants to invest more in data science and machine learning? All of Amazon's customers experience how they put data science to work, whether it is the \u003Ca href=\"https://twitter.com/justinshanes/status/803453049603690496\">recommendation engine\u003C/a> or the natural language processing of the \u003Ca href=\"https://stackoverflow.com/questions/tagged/amazon-echo\">Amazon Echo\u003C/a>.\n\n\u003Cimg class=\"aligncenter size-large wp-image-9181\" src=\"https://stackoverflow.blog/wp-content/uploads/2018/02/data_science-1-1-844x675.png\" alt=\"\" width=\"844\" height=\"675\" />\n\nThis next plot moves us pretty far down the rank of principal components; notice that these dimensions each account for about 1.5% of variation among our users. Compared to the global distribution, all of Amazon's candidate cities have unusually large values for these two components: strongly negative on PC17 and strongly positive on PC18. Let's check out the technologies that contribute to these dimensions in these directions.\n\n\u003Cimg class=\"aligncenter size-large wp-image-9171\" src=\"https://stackoverflow.blog/wp-content/uploads/2018/02/later_components-1-1200x480.png\" alt=\"\" width=\"1024\" height=\"410\" />\n\nThe negative side of principal component 17 involves Hadoop, Spark, Hive, and Scala while the positive side of principal component 18 focuses on R, ggplot2, and statistics. These two components measure how much users are involved in data engineering and data science, respectively, and all of Amazon's candidate cities have relatively large values for these. If Amazon wants to choose a city with proportionally more developers experienced in these technologies, Raleigh and Columbus would be great choices. 
It is important to note that often we see statistical analysis technologies like R used proportionally more in cities with high academic, research, and grad student populations. Columbus and Raleigh both have healthy academic centers that are likely contributing here, but Amazon specifically listed proximity to major universities as something they are looking for, so maybe this is good!\n\n\u003Ch2>Where should Amazon establish a second headquarters?\u003C/h2>\n\nSo after all this analysis, what does Stack Overflow traffic tell us about Amazon's options for a second headquarters? If I were asked to offer insights into this choice, what would I recommend?\n\n\u003Cul>\n\u003Cli>These large cities and metro areas in North America are quite similar to each other, especially compared to worldwide variation, and it's unlikely that any would be a truly bad choice.\u003C/li>\n\u003Cli>The choices that are most similar overall to Seattle in terms of technology ecosystems are Northern VA and Washington, DC. If Amazon wants to go with a city where the developer population feels as familiar as possible, these would be the way to go.\u003C/li>\n\u003Cli>If Amazon wants to choose a city with proportionally more mobile developers, Los Angeles, New York, and Toronto would be the best choices.\u003C/li>\n\u003Cli>If Amazon wants to choose a city with proportionally more developers working in data science and machine learning, Raleigh or Columbus would be excellent choices.\u003C/li>\n\u003C/ul>\n\nAt Stack Overflow, we're able to explore these kinds of questions because we understand developers, technologies, and how these technologies are related to each other in complex ecosystems. 
We use this expertise to help companies \u003Ca href=\"https://www.stackoverflowbusiness.com/talent\">understand, reach, engage with, and hire developers\u003C/a>.
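
For readers who want to try the two techniques this post leans on, here is a minimal sketch in plain Python: cosine similarity between tag-traffic vectors, and the first principal component of a small user-by-tag matrix. Everything in it is an assumption made for illustration: the tag list, the traffic fractions, the `city_a`/`city_b`/`city_c` vectors, and the function names are made up, not Stack Overflow data. Power iteration is used here only as a simple stand-in for a full principal component decomposition; it is not necessarily how the analysis above was computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length tag-traffic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def center(rows):
    """Subtract each column's mean so every tag has mean zero across users."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (tags x tags) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(cov, iterations=200):
    """Approximate the top eigenvector and eigenvalue of cov by power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue

# Hypothetical data: each row is one user, each column the fraction of that
# user's visits going to a tag. These numbers are invented for the example.
TAGS = ["javascript", "html", "python", "c++"]
USERS = [
    [0.5, 0.4, 0.1, 0.0],
    [0.6, 0.3, 0.0, 0.1],
    [0.1, 0.0, 0.5, 0.4],
    [0.0, 0.1, 0.6, 0.3],
    [0.3, 0.2, 0.3, 0.2],
]

centered = center(USERS)
cov = covariance(centered)
component, variance = first_principal_component(cov)

# Share of total variance captured by the first component.
explained = variance / sum(cov[i][i] for i in range(len(cov)))

# Each user's coordinate along the first component.
scores = [sum(x * w for x, w in zip(row, component)) for row in centered]

# Hypothetical aggregate vectors for three metro areas, in the same tag order.
city_a = [0.30, 0.20, 0.30, 0.20]
city_b = [0.28, 0.22, 0.31, 0.19]
city_c = [0.60, 0.30, 0.05, 0.05]

print(dict(zip(TAGS, (round(w, 3) for w in component))))
print("share of variance explained:", round(explained, 3))
print("a vs b:", cosine_similarity(city_a, city_b))
print("a vs c:", cosine_similarity(city_a, city_c))
```

In this toy data the loadings for the front-end tags and for Python and C++ come out with opposite signs, which is the same kind of contrast described for the first principal component above. The overall sign of a component is arbitrary, so only the relative signs carry meaning. For real data with thousands of tags, a library routine based on singular value decomposition would be the more practical choice.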