PredictWise: 20-For-20

A little bit of personal background: Two very different feelings dominated on the morning after the 2016 election: anger/shame/frustration/sadness regarding the State of the Union (Not Strong!), and a sense of validation, namely that large-scale continuous data collection via disparate modes, paired with the right analytics, can yield actionable insights able to help us dissect the American mindscape. We were motivated to share these methods with the progressive ecosystem, but first we needed to understand it. What we learned about the progressive ecosystem in the wake of ’16 did not make us sleep better.

The Progressive Ecosystem post 2016 – non-continuous, monopolistic, non-collaborative

Progressive investment, both financial and soft, is inefficiently geared toward elections – period. Thus, in the days after any election almost all campaign-related data is lost: data repositories transfer insufficiently from campaign to campaign, and turnover rates at central Democratic organizations make it hard to develop sustainable infrastructure and institutional knowledge. Then, data collection, data analytics, content creation, and distribution come to a halt for extended periods of time, guaranteeing a massive hole in our understanding of voters, our development of infrastructure and content, and our communication and messaging for the vast periods of time between elections. Any inefficiencies in progressive methodology, conditional on a campaign being operational, are dwarfed by these dormant periods, but add to a baseline of negative spill-overs: In campaign times, data and analytics are created for specific clients without regard for the general progressive cause, oftentimes focus on rival primary candidates, and are hardly shared between camps. All of this is exacerbated further by an ecosystem divided into small monopolies or duopolies, stymieing both competition and cooperation.

Conversely, Republicans have pushed toward continuous, hierarchical/vertical and cohesive messaging campaigns for decades, run by Koch, Mercer, Murdoch, and the RNC. In light of the nonexistent Democratic defense – virtually no positive progressive content has been created and distributed on a continuous basis – this onslaught has led to predictable results. The outrage over the hideously framed estate tax is now an almost anachronistic example. In this day and age, Republican messaging campaigns, with their focus on misinformation, register everywhere, even among Independents and Democrats. Two prominent examples, from the PredictWise stock: On immigration, only 26% of Democrats know that fewer immigrants have come to the US since 2009. On healthcare, only 45% are aware that the percentage of Americans without health insurance has increased under Trump.

The problem does not stop there. On many issues, Republicans have convinced a sizable chunk of Americans that a Republican administration serves their interests better, while in fact their policy preferences are much more in line with Democrats. Example: 81% of Americans support Medicare Buy-in, and 57% of Americans disapprove of a healthcare market allowing insurance policies NOT covering pre-existing conditions. But only 41% prefer the Democratic healthcare plan, as opposed to 39% preferring the Republican plan.

Toward a Permanent Campaign: 20-For-20

To combat these dynamics, our PredictWise 20-For-20 vision centers around a permanent campaign, focusing on positive progressive messaging.

Data Collection/Analytics: Continuous surveys with ad-hoc additions combined with behavioral data should be used to develop key attitudinal insights and value frames, and should be paired with continuous monitoring of media agenda setting and exposure. Data should be homogenized and aggregated centrally and linked individually where appropriate.

CONTINUOUS collection of baseline data. This data could include aspects of public opinion, fact knowledge, concern intensity, and psychographics. As the repository grows, this data will get more precise over time, while being able to shed light on important dynamics. For now, this kind of data has to be projected onto voter files via Machine Learning models, but the goal is to reduce the modeling component more and more as new data flows in, with the ultimate goal of curating a ground-truth attitudinal layer on top of the voter file that is not (or only very lightly) modeled. Currently, PredictWise is stocking data on 300,000 unique respondents covering more than 200 economic/psychographic/political attitudes.
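To make the projection step concrete, here is a minimal sketch under loud assumptions: it swaps the Machine Learning models for plain demographic cell averages, and every field name (age, region, supports_buy_in, voter_id) and every record is hypothetical toy data, not PredictWise data. It only illustrates the idea of learning an attitude from survey respondents and attaching a modeled score to each voter-file record.

```python
from collections import defaultdict


def fit_cell_means(survey_rows, cell_keys, outcome):
    """Average a survey outcome within each demographic cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum, count]
    for row in survey_rows:
        cell = tuple(row[k] for k in cell_keys)
        totals[cell][0] += row[outcome]
        totals[cell][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


def project_onto_file(voter_rows, cell_means, cell_keys, fallback):
    """Attach a modeled score to every voter-file record.

    Records in cells with no survey respondents get the fallback value.
    """
    scored = []
    for row in voter_rows:
        cell = tuple(row[k] for k in cell_keys)
        scored.append({**row, "score": cell_means.get(cell, fallback)})
    return scored


# Hypothetical toy data (assumption, for illustration only).
survey = [
    {"age": "18-34", "region": "urban", "supports_buy_in": 1},
    {"age": "18-34", "region": "urban", "supports_buy_in": 1},
    {"age": "65+", "region": "rural", "supports_buy_in": 0},
    {"age": "65+", "region": "rural", "supports_buy_in": 1},
]
voter_file = [
    {"voter_id": "A1", "age": "18-34", "region": "urban"},
    {"voter_id": "B2", "age": "65+", "region": "rural"},
    {"voter_id": "C3", "age": "35-64", "region": "urban"},
]

keys = ("age", "region")
means = fit_cell_means(survey, keys, "supports_buy_in")
overall = sum(r["supports_buy_in"] for r in survey) / len(survey)
scored = project_onto_file(voter_file, means, keys, overall)
```

As more respondents are linked directly to voter-file records, fewer records would need a modeled score at all, which is the direction described above.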

AD-HOC data collection. This data should be used to rapidly inform political elites, media, and the mass public about public opinion on emerging issues. Example: A plurality of Americans believe that the Trump administration encourages non-democratic regimes to clamp down on dissidents and the free press, per PredictWise data.

COMBINATION of behavioral, survey and other data. We routinely build our models on top of a mix of behavioral and survey data. For example, ambient cell phone data can help us achieve scale; behavioral data can help account for the measurement error/social desirability bias inherent in attitudinal data, for example when it comes to media consumption.

MEDIA AGENDA Tracking. We need to track the agendas of mainstream media (which still produce the bulk of the news that people consume), covering (a) broadcast/cable media and (b) online media, in order to react to emerging discourses. This requires independent scraping efforts, or deals with the Internet Archive, NewsBank, or other aggregators of transcript-level data.

MEDIA EXPOSURE Tracking. We need to track individual-level exposure to (a) broadcast/cable media and (b) online media, with the ultimate goal of creating single-dimensional, dynamic exposure profiles combining exposure patterns and patterns of exposure content. This requires new partnerships with Nielsen, Comscore/Rentrak, etc.

Data collection HOMOGENIZED as much as possible across modes to create positive spill-over effects. For example, matching question formats of canvassing data with question formats in ongoing surveys allows various organizations to conduct more granular analyses, with more statistical power.
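The statistical-power point can be illustrated with a short sketch. It assumes simple random samples and identical question wording across modes, and the response counts are hypothetical. It compares the standard error of a proportion estimated from each source alone with the standard error after pooling.

```python
import math


def proportion_se(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def pooled_estimate(samples):
    """Pool (yes_count, n) pairs that share an identical question format."""
    total_yes = sum(yes for yes, _ in samples)
    total_n = sum(n for _, n in samples)
    p = total_yes / total_n
    return p, proportion_se(p, total_n)


# Hypothetical counts (assumption): canvassing and survey responses
# to the same question, asked in the same format.
canvass = (200, 400)
survey = (450, 900)

se_canvass = proportion_se(canvass[0] / canvass[1], canvass[1])
se_survey = proportion_se(survey[0] / survey[1], survey[1])
p_pooled, se_pooled = pooled_estimate([canvass, survey])
```

The pooled standard error is smaller than either source's own, which is what makes analyses of small subgroups feasible. If the question formats differ, the responses cannot simply be added together like this.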

ID LINKAGE. In a world in which our content dissemination strategies move to the digital realm, we need to invest in ID linkage technology allowing us to reach whom we want to reach with the highest possible accuracy. For example, match rates of voter-file-based PII into Facebook, other Demand-Side-Platforms or Addressable TV are dismal, and can be significantly increased if we move to mobile-first identifiers such as MAIDs.

CENTRALIZED ANALYTICS layer. As opposed to sharing top-line data across the ecosystem, modern machine learning tools can yield much more powerful results when raw data are combined first.

Content Creation: Content informed by data should be created continuously and cheaply with crowd-sourced labor and directed at both earned and paid media, focusing on low cost and high reach. Message testing needs to be externally valid and focused on where it has the highest marginal lift.

Content creation INFORMED BY BASELINE data. Data on targeted Americans – both on policy preferences and psychographics – should be used to inform (and cut down on) content dimensions considered for testing.

CROWD-SOURCED viral content. As opposed to relying on boutique ad shops that create curated ads for $$$$, we can leverage record engagement on the left for this task.

Content creation aiming for both EARNED and PAID media. Content should be designed with two distribution channels in mind: organic, through social networks, AND paid media.

LIMIT MESSAGE TESTING to as few dimensions as possible. We believe that baseline data provides much more stable information regarding what political/psychographic content to focus on. More limited content details (color schemes, placement, image sizes, etc.) are much better suited for testing.

Message testing limited to ORGANIC ENVIRONMENTS. This is key to getting real treatment effects (i.e., externally valid), especially if treatment groups can be targeted weeks or even months post treatment, given what we know about decay of communication effects.

Content Distribution: Time to focus on digital, targetable media, with a long-term strategy to win people over across years, not weeks or days, with continuous and repeated exposure tested externally (i.e., outside the distribution platforms).

Content distributed DIGITAL FIRST. As we have pointed out in our academic work, the marginal value of pouring $$$ into DMA-level cable ad buys diminishes (a) over the course of the campaign, and (b) as $$$ spent in the DMA increases. Americans natively spend time online and on their phones, and we need to reach them there. And while Addressable TV is an attractive new alternative, we need to be mindful of the lack of marketplaces governing Addressable TV, which means that we have to buy full DMAs, even if we are only interested in targeting a segment in that DMA. Being outspent by Trump’s campaign – especially when his content is geared toward the general election and ours is not – is a real disadvantage!

Different content distributed to different individuals. Targeting is good, but remember not to over-target and account for blow-backs, or boomerang effects, from mistargeted content. Boomerang effects stemming from mis-targeting certainly have the potential to depress overall treatment effects that are already quite low to begin with in the real world.

Content distributed with a LONG-TERM STRATEGY. As opposed to lining voters up on the horse-race dimension and going after the median voter, we need to use our data repositories to talk to Americans – no matter whether they voted or not, whether they fall in the middle of the horse-race distribution or not – about the issues they care about, and do so in a positive light. This creates long-term positive effects, and reflects our underlying belief that American voters hold different, but meaningful views on different issues, and that we can use this potpourri of policy preferences to our advantage, if we address the right ones with the right folks. For example, Republicans taking progressive stances on healthcare but conservative stances on LGBTQ should be targeted with persuasive appeals on healthcare, while Republicans taking conservative stances on healthcare but progressive stances on LGBTQ should be targeted with persuasive appeals on LGBTQ.
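The targeting rule in that example can be sketched in a few lines. This is an illustration only: the profile layout, the "stance" and "concern" fields, and the tie-breaking by concern intensity are assumptions made for the sketch, not a description of an existing pipeline.

```python
def pick_persuasion_issue(profile):
    """Choose the issue to lead with for one person.

    profile maps issue -> {"stance": "progressive" | "conservative",
                           "concern": float}  (hypothetical layout).
    Returns the issue on which the person already leans progressive and
    cares most, or None if no such issue exists.
    """
    candidates = [
        (issue, data["concern"])
        for issue, data in profile.items()
        if data["stance"] == "progressive"
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


# Hypothetical profiles mirroring the two cases described above.
voter_a = {
    "healthcare": {"stance": "progressive", "concern": 0.8},
    "lgbtq": {"stance": "conservative", "concern": 0.4},
}
voter_b = {
    "healthcare": {"stance": "conservative", "concern": 0.6},
    "lgbtq": {"stance": "progressive", "concern": 0.5},
}
voter_c = {
    "healthcare": {"stance": "conservative", "concern": 0.9},
}

issue_a = pick_persuasion_issue(voter_a)
issue_b = pick_persuasion_issue(voter_b)
issue_c = pick_persuasion_issue(voter_c)
```

Returning None for a person with no progressive-leaning issue is a deliberate choice in this sketch: sending content anyway risks the boomerang effects from mistargeting discussed earlier.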

Content distributed via NATIVE IDENTIFIERS. We cannot continue to use offline identifiers to target in the digital realm. Instead, we need to build repositories keyed by MAIDs, allowing a flawless integration into digital platforms and enabling cross-device targeting, while maintaining privacy.

Content distributed CONTINUOUSLY and REPEATEDLY. We need to hit Americans with the same content in various forms over time to combat the decay of effects stemming from one-time-exposure interventions.
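A toy model shows why repetition matters when effects decay. It assumes, purely for illustration, that each exposure adds a fixed lift that then decays exponentially with a given half-life and that effects from separate exposures simply add up; the lift, half-life, and schedule are hypothetical numbers, not estimates.

```python
def residual_effect(exposure_days, eval_day, lift, half_life_days):
    """Effect remaining on eval_day under additive exponential decay.

    Assumption: every exposure contributes `lift` on the day it happens,
    and that contribution halves every `half_life_days` afterwards.
    """
    total = 0.0
    for day in exposure_days:
        if day <= eval_day:
            elapsed = eval_day - day
            total += lift * 0.5 ** (elapsed / half_life_days)
    return total


# Hypothetical schedule: one exposure vs. repeated weekly exposures,
# both evaluated on day 14 with a 7-day half-life.
one_shot = residual_effect([0], eval_day=14, lift=2.0, half_life_days=7)
repeated = residual_effect([0, 7, 14], eval_day=14, lift=2.0, half_life_days=7)
```

Under these assumptions the single exposure has lost three quarters of its effect by the evaluation day, while the repeated schedule keeps topping it up. Real decay curves and saturation effects would have to be estimated from data.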

Data used to generate EARNED MEDIA. For example, baseline data can be geared towards keeping mainstream media accountable regarding the unpopularity of major Republican policy initiatives. We can achieve this by distributing content written around timely public opinion data both organically and through paid channels (shout-out to the folks at Data for Progress who have internalized that strategy).

Large-scale data repositories used to TEST INTERMEDIATE EFFECT and ADJUST. Attitudinal data on targeted segments can inform treatment effect estimates that (a) include effects among non-targeted segments who might register effects because content has been shared with them via their social networks, both online and offline, and (b) offer an intermediate estimate of movement among target demographics in between post-intervention tests and elections.

INVESTMENT: This only happens if progressives shift spending from campaigns to continuous operations. We started with financing, and we shall close here: We need to sensitize our donor base to a different spending culture. One reason progressive donors have been focusing on electioneering: elections provide measurable RoI. Continuously updating data repositories can help replace elections as the only RoI in our space, and can incentivize progressive donors to move away from an electioneering-centered spending model.
