A framework for understanding why high-achieving students tend to have suboptimal impact.

How I View Personal Growth

It seems facetious to compare the messy journey of life to the simple gradient descent algorithm, but at our core, humans are roughly performing this first-order iterative optimization all the time.

For the motivated (which I believe, optimistically, is most of us), we aim for a concrete objective; accumulate incremental experience maximally aligned with said goal; and repeat this until convergence. While this abstract characterization doesn’t perfectly capture the human essence (e.g. we oftentimes prioritize multiple objectives simultaneously, we may be thrown off by transitory distractions, we don’t initially follow the best actions/experiences, etc.), the underlying connection to gradient descent is obvious.

This is especially true for chronically high-achieving individuals, i.e. the students that permeate institutions like Harvard. These students have excelled at formulating, working towards, and achieving incredibly admirable goals for more or less their entire lives.

The Pitfalls of Vanilla Gradient Descent as a Life Optimizer

Their work ethic is no doubt commendable, but their drive for success is oftentimes marred by myopia. That is, many students enter elite schools lacking a concrete vision for how they want to contribute to the world; this, coupled with newfound insecurities and the pressure pervading these hyper-competitive arenas, forces students into optimizing chiefly for accolades, validation, and reward in the short term.

Indeed, once their goals are set, elite students will more likely than not achieve them. It is in their nature to excel. But critically, recall that vanilla gradient descent is an algorithm that merely guarantees local optima.

The only setting in which vanilla gradient descent provides a globally optimal solution is, of course, when the loss function is convex. This is certainly not a given in most optimization problems, and it is doubly so that the complexity of life cannot be neatly distilled into a convex function.

From a purely objective standpoint, it’s really quite silly to think that a job at FAANG (or equivalently some shiny quant firm/bank/consultancy) with dubious inherent meaning would be a reasonable singular proxy for career/life satisfication; and yet so many of us blindly abide by these approximations and optimize within this wishfully smooth value manifold.

In reality, our value functions are anything but simple and convex. When we assume that this is the case and approach our future with the same approach we’ve used our entire lives – i.e. vanilla gradient descent – we are destined for suboptimality.

Introduce Stochasticity into your Life

In much the same way that stochastic gradient descent beats out vanilla gradient descent for training large neural nets, introducing stochasticity into our lives may help increase our chances of discovering our true caling.

Reaching career (or any kind of) nirvana is not really supposed to be straightforward. The sooner we can accept this, the sooner we’ll realize that we don’t need to be constantly optimizing towards the next “best” stage of our life.

The folks that I admire most have spent time soul-searching for their true purpose, and as a reflection of that have a very non-linear, non-traditional path to success. Peter Thiel immediately comes to mind, having pivoted from mathematics to philosophy to big law before arriving and dominating the startup scene. Ryan Petersen is another prime example, having started with exporting Chinese goods to the States, then building freight-scanning data services, to now running Flexport.

It’s okay to pause the grind in order to reflect. It’s okay to try a new field orthogonal to everything you’ve done in your life. It’s okay if you’re not chasing things that everyone else is chasing. It’s okay to pursue things for yourself, even if nobody else praises, understands, or approves.

Break free and embrace stochasticity.

A Roll of the Dice

And yet, as we all know, stochastic gradient descent doesn’t guarantee global optima either. It’s entirely possible we end up sitting in some saddle point or local optima, despite decades of steadfast effort.

It’s possible that we end up someplace strictly worse than the destination of a traditional, safer path; but it’s also possible we ascend to a place of immense fulfillment, impact, and joy.

At the end of the day, there’s no guarantee that the (stochastic) road less travelled is necessarily better in the long run. But it sure seems a shame to resign to an orthodox life path of untapped potential.