vyasnikhil96
vyasnikhil96 OP t1_j9mu7ak wrote
Reply to comment by sam__izdat in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
Thanks for the interesting link! For the kind of copyright discussed in your link, deduplicating the data might be hard, i.e., it is hard to assume that works which have access to this copyrighted material occur only once or a few times, since it is a character and we don't really know which characters are copyrighted and which are not. Our paper, however, assumes that deduplication has been done beforehand.
Coming back to our notion, I am not trying to say that there is an already established information-theoretic notion of copyright.
Copyright law (as we state in the paper) relies on two things: 1. access to the copyrighted work must be proved, and 2. substantial similarity to the copyrighted work must be established. Our notion cleanly separates these two, and we come up with an information-theoretic way to quantify the "substantial similarity" aspect (denoted by k-NAF). For example, the strongest setting, k = 0, will never violate copyright (though it might also degrade performance or be impossible to achieve), because it is equivalent to not having access. Larger values of k trade off better model performance against a possible increase in "substantial similarity". Which value of k suffices to prevent copyright violation in a given setting is not something we establish; that depends on the specific setting and must be determined by the law. The user can tune k (assuming that value is feasible) to whatever the law considers acceptable.
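As a rough illustration of the quantity involved, here is a minimal Python sketch that computes the smallest k for which a model is k-NAF on a single prompt, using toy finite output distributions. It assumes max-KL divergence, max_y log(p(y)/safe_C(y)), as the distance measure, and it assumes that for each copyrighted work C there is a "safe" model safe_C trained without access to C; the function names and numbers are hypothetical, not code from the paper. Under this assumption k = 0 forces p to match every safe model exactly, which is the "equivalent to not having access" case and also why k = 0 may be infeasible.

```python
import math
from typing import Dict

# Toy representation (hypothetical): an output distribution for one fixed
# prompt, mapping each possible output to its probability.
Dist = Dict[str, float]


def max_kl(p: Dist, q: Dist) -> float:
    """Max-KL divergence max_y log(p(y) / q(y)), in nats, over outputs with p(y) > 0."""
    worst = -math.inf
    for y, py in p.items():
        if py <= 0:
            continue
        qy = q.get(y, 0.0)
        if qy <= 0:
            # p gives mass to an output the safe model never produces.
            return math.inf
        worst = max(worst, math.log(py / qy))
    return worst


def smallest_naf_k(p: Dist, safe_models: Dict[str, Dist]) -> float:
    """Smallest k with max_kl(p, safe_C) <= k for every copyrighted work C."""
    return max(max_kl(p, safe_c) for safe_c in safe_models.values())


def is_k_naf(p: Dist, safe_models: Dict[str, Dist], k: float) -> bool:
    """True if p is within max-KL distance k of every safe model."""
    return smallest_naf_k(p, safe_models) <= k


# Hypothetical example: one "safe" model per copyrighted work,
# each trained without access to that work.
safe_models = {
    "work_A": {"a": 0.5, "b": 0.3, "c": 0.2},
    "work_B": {"a": 0.4, "b": 0.4, "c": 0.2},
}
p = {"a": 0.5, "b": 0.3, "c": 0.2}

k = smallest_naf_k(p, safe_models)
print(round(k, 4))                     # prints 0.2231 (= log 1.25)
print(is_k_naf(p, safe_models, 0.0))   # prints False
print(is_k_naf(p, safe_models, 0.25))  # prints True
```

Max-KL is chosen here only because it is simple to compute on a finite output space; a different divergence (for example ordinary KL) would give a different, average-case notion.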
vyasnikhil96 OP t1_j9ltbq3 wrote
Reply to comment by Battleagainstentropy in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
I agree. Note that overall there are two things we can hope for: 1. using this approach with an appropriate k removes most of the "obvious" copyright violations, and 2. for the remaining images, the value of k can be interpreted to determine whether there was a copyright violation or not, where the interpretation will necessarily be application- and context-dependent.
vyasnikhil96 OP t1_j9kzwpq wrote
Reply to comment by iidealized in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
Assuming you are asking from the perspective of copyright law, I am not sure. I think the notion of “remix”/sufficient transformation also depends on the context in which the new work is being used.
vyasnikhil96 OP t1_j9ksj4v wrote
Reply to comment by bluemason in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
We already handle that, since our notion is not based on reproduction but is rather an information-theoretic one. We also have a parameter that measures how much information has been "reproduced" versus adapted, which can be set depending on the underlying models and the use case.
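One way to read that parameter, sketched under an assumption rather than as the paper's exact setup: suppose k bounds the max-KL divergence max_y log(p(y)/safe_C(y)) between the model p and a model safe_C trained without access to a copyrighted work C. Then for any set E of outputs, such as all outputs substantially similar to C, p(E) ≤ e^k · safe_C(E); that is, k caps how much more likely the model is to produce anything C-like than a model that never saw C. The Python below checks this by brute force on toy numbers; every name and value in it is hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Iterable

Dist = Dict[str, float]  # hypothetical: output -> probability for one fixed prompt


def max_kl(p: Dist, q: Dist) -> float:
    """max_y log(p(y) / q(y)) in nats, over outputs with p(y) > 0."""
    worst = -math.inf
    for y, py in p.items():
        if py <= 0:
            continue
        qy = q.get(y, 0.0)
        if qy <= 0:
            return math.inf
        worst = max(worst, math.log(py / qy))
    return worst


def prob(dist: Dist, event: Iterable[str]) -> float:
    """Probability that a sample from dist lands in the given set of outputs."""
    return sum(dist.get(y, 0.0) for y in event)


def event_bound_holds(p: Dist, q: Dist, tol: float = 1e-12) -> bool:
    """Brute-force check that p(E) <= exp(k) * q(E) for every nonempty event E,
    where k = max_kl(p, q). Exponential in the number of outputs: toy use only."""
    factor = math.exp(max_kl(p, q))
    outputs = sorted(set(p) | set(q))
    for r in range(1, len(outputs) + 1):
        for event in combinations(outputs, r):
            if prob(p, event) > factor * prob(q, event) + tol:
                return False
    return True


# Hypothetical numbers: "near_copy" stands for outputs substantially similar to C.
safe_c = {"generic_1": 0.60, "generic_2": 0.38, "near_copy": 0.02}
model = {"generic_1": 0.55, "generic_2": 0.40, "near_copy": 0.05}

k = max_kl(model, safe_c)
print(round(math.exp(k), 4))             # prints 2.5: max amplification factor
print(event_bound_holds(model, safe_c))  # prints True
```

The brute-force loop is only a sanity test: summing the pointwise bound p(y) ≤ e^k · safe_C(y) over all y in E already gives the event bound directly.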
vyasnikhil96 OP t1_j9k57az wrote
Reply to comment by ichiichisan in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
I agree that the final say rests with the courts. But do you think there is something specific that we use or claim that differs from how copyright law is currently applied?
Submitted by vyasnikhil96 t3_1190lw8 in MachineLearning
vyasnikhil96 OP t1_j9oi593 wrote
Reply to comment by sam__izdat in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
Thanks! This was an interesting read.