To optimize our learning, we need to balance exploring new domains of knowledge against exploiting stable sources of golden nuggets.
There is a large amount of uncertainty regarding the value of domains you haven’t explored. Today you’re only aware of a tiny fraction of what is out there. For example, there could be a group of people and a culture that would give you the best time of your life, but you are simply not aware of them. If your focus is too narrow, you’ll miss out on what the world has to offer. If your focus is too broad, you’ll never reap the full rewards of what you have already discovered. You need to carefully invest time in new domains to “sample” their value, to see whether they contain useful concepts worth investing more time in.
In the Bernoulli bandit problem, the number of time periods is larger than the number of actions. The long time horizon allows for an empirical sampling approach: you can keep sampling from each slot machine in the casino, gathering information about each machine's payoff distribution.
When pulled, an arm produces a random payout drawn independently of the past.
--> In learning, are future actions independent of the past?
--> No. There is strong path dependence, because prior knowledge, i.e. what you already know, is the largest determinant of what you can learn.
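The Bernoulli bandit setting described above, with independent payouts and enough rounds to sample every machine, can be sketched in a few lines. This is a minimal illustration using an epsilon-greedy rule, which is only one of many possible strategies; the function name, the payout probabilities and the `epsilon` value are assumptions chosen for the example.

```python
import random

def run_bernoulli_bandit(payout_probs, n_rounds, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule.

    payout_probs: hypothetical true win probability of each machine
                  (unknown to the player, used only to simulate payouts).
    Returns (pulls per arm, estimated win rate per arm).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms

    for _ in range(n_rounds):
        # Explore: pick a random arm with probability epsilon,
        # or whenever some arm has never been tried.
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best empirical win rate so far.
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])

        # Each payout is drawn independently of the past.
        reward = 1 if rng.random() < payout_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward

    estimates = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(n_arms)]
    return pulls, estimates

pulls, estimates = run_bernoulli_bandit([0.2, 0.5, 0.8], n_rounds=5000)
```

Because there are far more rounds than machines, the empirical win rates settle down and most pulls end up on the best machine. This is exactly the luxury that learning over a lifetime does not offer.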
Learning is different: there are not enough time periods to sample from the entire domain of possibilities. The number of topics and subjects you could allocate your time to is too large to sample in a lifetime. Two options:
Momentum approach / Matthew approach / Preferential attachment: Invest more resources in the domains that, in your experience, have provided the most benefit.
Case study approach: What domains / concepts have proven worthwhile in generating creative ideas in the past?
The priority system in SuperMemo acts as the feedback mechanism that controls the allocation of your time and attention between elements in your collection. Similar to the way stronger pheromone scents indicate more promising food sources for foraging ants, higher priorities in SuperMemo indicate elements that you believe contain a lot of valuable information.
In the course of learning, as you come across new concepts and domains that seem like promising sources of golden nuggets, they should be prioritised according to their estimated value, which should be continually updated as you learn more about them. Updating priorities dynamically allows you to allocate your time most efficiently between exploring new, uncertain possibilities and exploiting stable sources of nuggets.
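The idea of continually updating estimated value and letting it drive time allocation can be sketched as follows. This is not how SuperMemo implements priorities; it is a hypothetical toy model in which each topic has a numeric value estimate, the estimate moves toward each new observation, and time is split in proportion to the estimates. The function names and the `learning_rate` parameter are assumptions for illustration.

```python
def update_estimate(old_estimate, observed_value, learning_rate=0.3):
    """Move the value estimate a fraction of the way toward the new observation."""
    return old_estimate + learning_rate * (observed_value - old_estimate)

def allocate_time(estimates, total_minutes):
    """Split the available time across topics in proportion to estimated value."""
    total = sum(estimates.values())
    if total == 0:
        # No information yet: spread time evenly.
        share = total_minutes / len(estimates)
        return {topic: share for topic in estimates}
    return {topic: total_minutes * value / total for topic, value in estimates.items()}

# Hypothetical topics and starting estimates.
estimates = {"immunology": 1.0, "bandit theory": 3.0}
plan = allocate_time(estimates, total_minutes=60)

# A session on immunology turns out to be more valuable than expected,
# so its estimate rises and the next plan shifts time toward it.
estimates["immunology"] = update_estimate(estimates["immunology"], observed_value=5.0)
next_plan = allocate_time(estimates, total_minutes=60)
```

A small `learning_rate` makes the estimates stable but slow to react; a large one makes them responsive but noisy.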
I’ll call pathogens "attackers" and lymphocytes "defenders" here.
The body deals with invading attackers by creating defenders that bind to their specific shape. When a defender binds to an attacker's shape, it kills the attacker.
The main problem facing the immune system is uncertainty about the shapes of future attackers. The body can’t possibly know ahead of time what attacker will invade the body, so it can’t create defenders in advance which will bind perfectly to future attackers.
The body deals with this uncertainty by using massive diversification - it spreads its resources out over an extremely diverse range of shapes in the hope that at least one will match the future attacker.
When a defender does successfully bind to an attacker, the binding acts as a sort of "activation event" for the body to change its investment strategy. The uncertainty decreases because the body knows it needs to create more defenders to kill that attacker.
The body will suddenly change its strategy - rather than spreading resources over many different shapes, it will concentrate its investment on the current threat, by creating more defenders that bind to the current attacker.
In learning, the optimal strategy for dealing with the uncertainty regarding the future value of ideas may be the same as the immune system’s strategy for dealing with attackers - massive diversification in time allocation between many different domains until an “activation event” occurs. The activation event could be some event that makes you realise that an idea is essential for some project or creative pursuit. Then you can suddenly change the allocation of your time to focus on that idea.
Over the course of your lifetime you may oscillate many times between periods of massive diversification between many different domains and periods of concentration in one or two domains when the need arises.
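The diversify-then-concentrate strategy above can be written as a simple allocation rule. This is a hypothetical sketch: the domain names, the `focus_share` of 0.8 and the idea of representing an activation event as a single named domain are all assumptions made for the example.

```python
def allocate(domains, activated=None, focus_share=0.8):
    """Return the fraction of time given to each domain.

    Without an activation event, time is spread evenly (massive diversification).
    After one, most of the time goes to the activated domain (concentration),
    and the remainder stays spread across the others.
    """
    if activated is None:
        return {d: 1 / len(domains) for d in domains}

    others = [d for d in domains if d != activated]
    if not others:
        return {activated: 1.0}

    rest = (1 - focus_share) / len(others)
    allocation = {d: rest for d in others}
    allocation[activated] = focus_share
    return allocation

domains = ["biology", "economics", "music", "probability"]
before = allocate(domains)                           # even spread
after = allocate(domains, activated="probability")   # concentrated
```

Keeping a small share on the other domains after activation means the next activation event can still be noticed, which is what allows the oscillation between the two modes.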
TODO: Connection to reinforcement / online learning
Optimal solutions require:
Dynamic reallocation of resources.
Preservation of optionality (Taleb).
Bayesian approach to allocating resources under uncertainty.
Sleep bandits example
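The Bayesian approach and the dynamic reallocation listed above can be illustrated with Thompson sampling, a standard Bayesian strategy for Bernoulli bandits. This is a minimal sketch, not a full treatment: it assumes a uniform Beta(1, 1) prior on each arm, and the payout probabilities are hypothetical values used only to simulate outcomes.

```python
import random

def thompson_sampling(payout_probs, n_rounds, seed=0):
    """Play a Bernoulli bandit with Thompson sampling; returns pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    # Beta(1, 1) prior for each arm: one pseudo-success and one pseudo-failure.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible win rate per arm from its current posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # Play the arm whose sampled win rate is highest.
        arm = samples.index(max(samples))

        # Observe the payout and update that arm's posterior.
        if rng.random() < payout_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
```

Uncertain arms have wide posteriors and so still get sampled occasionally, while arms that have proven themselves are played more and more often. The allocation shifts automatically as evidence accumulates.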