u/[email protected] provided an output of its reasoning when asked to explain this behavior, and I think it’s worth examining.
The short version is that, when asked why it can joke about some groups and not others, it speculates that this may be because its output is based on training data, and its safeguards recognize that the training data on some topics is more likely than on others to be lower in cultural literacy and higher in offensive stereotypes, which can lead it to decline a request. That sounds like a fairly credible explanation.