Not exactly. The alignment problem is an issue with optimization. The network is trained to do somehing, but it may not neccassarily be what you were wanting it to do. The method i referred to doesnt effect the priorities of the network, it simply overides certain outputs that the network may be sending. Imagine a network trained to identify weather conditions and range, and when it has adjusted, it fires a round. Now imagine a second network specifically trained to identify friendly targets. The two networks dont neccassarily need to communicate, and they are completely seperated software wise. It simply that when the second network identifies a target, it prohibits the first network from firing.
By having them not communicate, you solve that it miht fire at friendlies. The issue now becomes that the aiming network thinks its doing its job, and it doesnt understand(because it has no way of knowing) that it cant fire. So it just holds position pointing at the friendly and continuing to attempt to fire.
That does clarify what you mean. Still, that’s just composing AI systems that do not operate on discreet logic within systems that do. There is no hard rational guarantee that the aiming model will not misjudge where the bullet will land, or that the target identification model will not misjudge who qualifies as a friendly. It might have passed extensive testing, or been trained on very carefully curated data, but you still have only approximate knowledge of what it will do. That applies even moreso to more open ended or ambiguous tasks.
Not exactly. The alignment problem is an issue with optimization. The network is trained to do somehing, but it may not neccassarily be what you were wanting it to do. The method i referred to doesnt effect the priorities of the network, it simply overides certain outputs that the network may be sending. Imagine a network trained to identify weather conditions and range, and when it has adjusted, it fires a round. Now imagine a second network specifically trained to identify friendly targets. The two networks dont neccassarily need to communicate, and they are completely seperated software wise. It simply that when the second network identifies a target, it prohibits the first network from firing.
By having them not communicate, you solve that it miht fire at friendlies. The issue now becomes that the aiming network thinks its doing its job, and it doesnt understand(because it has no way of knowing) that it cant fire. So it just holds position pointing at the friendly and continuing to attempt to fire.
That does clarify what you mean. Still, that’s just composing AI systems that do not operate on discreet logic within systems that do. There is no hard rational guarantee that the aiming model will not misjudge where the bullet will land, or that the target identification model will not misjudge who qualifies as a friendly. It might have passed extensive testing, or been trained on very carefully curated data, but you still have only approximate knowledge of what it will do. That applies even moreso to more open ended or ambiguous tasks.