A new chapter for this site! This is something I’ve been working on for a long time and I’m excited to see it finally coming together.
For almost two decades, I’ve been working on a generalized data abstraction and analysis platform. It has expanded, contracted, and drifted in scope, but it has always been a central part of my own approach to data.
The idea is a simple, free, open-source piece of software similar to WordPress, but for analyzing, illustrating, and narrating complex relationships in data. Historically, it looked basically like WordPress, except that instead of blog posts you wrote SQL queries, and the public-facing pages showed a graph and a downloadable data table. The data was refreshed automatically on a schedule from its source, and the graphs and tables were updated to match. It is a really cool platform that I always enjoyed working on, but in the past few years something changed for me.
Towards the end of my Urban Planning degree, I took my fourth semester of statistics and came across a concept I’d not heard before. The final project was to construct a chart showing all the different types of data we see in social surveys. We then had to explain how to determine mathematically whether sets of data have significant relationships, and which kind of analysis and illustration is most effective given the types of data involved and the kind of relationship identified.
This really opened up a whole new way of thinking about the problem my old project was trying to solve, especially considering that the data we were looking at in the class came from sources like the General Social Survey.
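To make the idea of that chart concrete, here is a minimal sketch of it as a lookup table. The three measurement levels and the test and chart pairings are common textbook conventions that I am assuming for illustration; they are not the actual chart from the class, and `suggest_analysis` is a hypothetical helper name.

```python
# Hypothetical lookup from a pair of measurement levels to a commonly
# taught significance test and a chart type. The pairings are textbook
# conventions assumed for illustration, not the chart from the course.
NOMINAL, ORDINAL, CONTINUOUS = "nominal", "ordinal", "continuous"

# frozenset keys make the lookup order-independent; a pair of identical
# levels collapses to a one-element set.
_SUGGESTIONS = {
    frozenset([NOMINAL]): ("chi-square test of independence", "grouped bar chart"),
    frozenset([NOMINAL, ORDINAL]): ("chi-square test of independence", "stacked bar chart"),
    frozenset([NOMINAL, CONTINUOUS]): ("one-way ANOVA", "box plot"),
    frozenset([ORDINAL]): ("Spearman rank correlation", "heat map of counts"),
    frozenset([ORDINAL, CONTINUOUS]): ("Spearman rank correlation", "box plot"),
    frozenset([CONTINUOUS]): ("Pearson correlation", "scatter plot"),
}


def suggest_analysis(level_a, level_b):
    """Return a (test, chart) suggestion for two variables' measurement levels."""
    key = frozenset([level_a, level_b])
    if key not in _SUGGESTIONS:
        raise ValueError(f"unknown measurement levels: {level_a!r}, {level_b!r}")
    return _SUGGESTIONS[key]


test_name, chart = suggest_analysis(NOMINAL, CONTINUOUS)
print(f"Suggested test: {test_name}; suggested chart: {chart}")
```

A real version would need more nuance (sample sizes, distribution assumptions, repeated measures), but the shape of the decision is the same: classify both variables first, then pick the test and the picture.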
The GSS collects information biennially and keeps a historical record of the concerns, experiences, attitudes, and practices of residents of the United States. Since 1972, the GSS has been monitoring societal change and studying the growing complexity of American society.
This survey on its own is a super interesting source of a ton of fascinating information about changing popular sentiments from every corner of America, geolocated and in time-series across the last half-century. You could spend a lifetime finding fascinating trends and relationships in this dataset.
BUT, there are also other similarly massive datasets of related information about changes in society over time. One example is the Census Bureau’s Household Pulse Survey. This is a new but enormous dataset of trends in much more specific social data, especially related to micro and macro changes in people’s lives after the start of the pandemic era.
And of course there are also the enormous datasets published by BLS and FRED, which include the standard meaningless political metrics like unemployment and inflation but also things like:
- Labor force participation rate by age (BLS) (FRED)
- Consumer price index
- Mean Household Wages Adjusted by Cost of Living
- Percent of people with a disability 16 years and over
- Real gross domestic product per capita and of course its second and third derivatives.
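
For a series sampled at regular intervals, the "derivatives" mentioned in the last item can be approximated with successive differences. Below is a minimal sketch; the `differences` helper and the numbers are made up for illustration and are not real GDP figures.

```python
def differences(values, order=1):
    """Return the order-th successive difference of a list of numbers.

    order=1 approximates the rate of change, order=2 the acceleration,
    and order=3 the change in acceleration, all per sampling interval.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    result = list(values)
    for _ in range(order):
        # Each pass pairs every value with its successor and subtracts.
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Made-up illustrative values, one per period.
gdp_per_capita = [100.0, 102.0, 105.0, 109.0, 114.0]

growth = differences(gdp_per_capita, 1)
acceleration = differences(gdp_per_capita, 2)
third_order = differences(gdp_per_capita, 3)
print(growth, acceleration, third_order)
```

Each pass shortens the series by one point, so higher-order differences need longer histories, and they also amplify noise in the underlying data.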
Maybe you can guess where this is going. The amount of good non-politicized data on real measures of social change is a fascinating opportunity on its own, even without the advent of large language models and cheap compute.
I’ve also been working for years on projects around programmatic summarization of the main topics in current events across hundreds of newspapers on an hourly basis.
Imagine the potential for bringing all of these things together. How much work would it take a person to run and interpret chi-square tests, Pearson correlations, or multivariate regressions across every possible time series in all of these datasets on a daily basis, correlate the findings to current events, and then construct compelling narratives about the upstream causal factors of these relationships and how they play out in society?
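As a rough sketch of the brute-force part of that idea, the snippet below computes a Pearson correlation for every pair of series and ranks the pairs by strength. The series names and values are placeholders I made up, `rank_pairs` is a hypothetical helper, and it assumes the series are already aligned to the same dates and the same length.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_pairs(series_by_name):
    """Correlate every pair of series and sort by absolute strength."""
    results = []
    for (name_a, a), (name_b, b) in combinations(series_by_name.items(), 2):
        results.append((name_a, name_b, pearson_r(a, b)))
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Placeholder series, assumed to be aligned on the same time index.
series = {
    "series_a": [1, 2, 3, 4, 5],
    "series_b": [2, 4, 6, 8, 10],
    "series_c": [5, 3, 4, 1, 2],
}

ranked = rank_pairs(series)
for name_a, name_b, r in ranked:
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```

The number of pairs grows quadratically with the number of series, and testing every pair guarantees some strong correlations by chance alone, so anything surfaced this way is a lead to investigate, not a finding.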
Accomplishing this is not just a valuable and important thing on its own, but also a great way to bring my older ideas about building analytics tools into this new LLM world. It’s going to be a wild ride for sure.