A while back, Nymx brought this topic up to me and we discussed it for a while; I had put thought into it before to form an opinion on it. To my surprise, Nymx then quickly made a thread for it,
http://tasvideos.org/forum/viewtopic.php?t=21070 , (though he could have searched for already existing threads on the topic) in which he mentioned our discussion. There was quite a bit to say on the topic from my side, and I prefer making posts in a detailed and complete manner regarding the things I have or want to say (to reduce unnecessary double posts and avoid misunderstandings), but I wasn't prepared to make a write-up and didn't feel like making one spontaneously at the time. (Nymx didn't explicitly go into the points I had brought up, so there wasn't much of a reason to feel obliged to do so anyway.) So I guess I'll make use of this opportunity now.
There are some guidelines for rating on TASVideos (
http://tasvideos.org/VotingGuidelines.html ), but they seem to lack further explanation or specification of which value corresponds to which conditions being met or not met. Ideally, one would want large sets of TASes that are easily comparable through their technical and entertainment ratings, but there are some problems in the way of getting to such a situation. There is no uniform, consistent, generally used scale (i.e. no consensus) mapping conditions, whether abstract and general or concrete, such as the effort invested, the lack of tests, the improvability or known improvements, or the degree of uncertainty about a TAS author's knowledge of the game's aspects, to values from 0 to 10. As an example, there is PJBoy's personal scale (
http://tasvideos.org/PJBoy.html ), and I'd expect it not to be uniformly compatible with the other scales one can find.
If every user rated every movie, and if every user over time consistently kept the same (personal) scale for rating and their values, then (speaking from the perspective of an idealized mathematical formulation of it) there would be a unique resulting ''averaged out'' scale applied to all movies, such that there would be no problem comparing TASes with each other within a consistent context, because the discrepancies between different people's scales would cancel out.
[However, I guess this would only hold true as long as no scale limit like 10 or 0 restricted a rater in the choice of value they wanted to assign to a movie. Someone whose personal scale maps a larger (more nuanced/detailed) range of differentiated movie qualities onto their entire 0-to-10 spectrum can still assign values near their top or bottom limit, while someone else's scale already caps out at the maximum (10) or minimum (0) where the first person's, stretched out, would read e.g. an 8 or a 3 respectively. So what for one person would be a 10 would then have to be e.g. an 11 or 13 for someone else using their (assumed fixed) scale.]
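To make this concrete, here is a minimal simulation sketch (in Python, with made-up numbers), assuming each user applies a fixed personal affine scale; the last lines illustrate the capping caveat from the bracketed paragraph:
[code]
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies = 50, 200
quality = rng.uniform(0, 10, n_movies)  # hypothetical "true" quality per movie

# Each user applies a fixed personal affine scale: rating = a * quality + b.
a = rng.uniform(0.6, 1.4, n_users)
b = rng.uniform(-1.5, 1.5, n_users)
raw = a[:, None] * quality[None, :] + b[:, None]  # every user rates every movie

# The per-movie averages are then a fixed affine transform of the true
# qualities, so the discrepancies cancel out and the ranking is preserved:
print(np.array_equal(np.argsort(raw.mean(axis=0)), np.argsort(quality)))  # True

# Capping ratings at 0 and 10, however, squashes differences near the ends:
clipped = np.clip(raw, 0, 10)
top = np.argsort(quality)[-10:]
print(np.diff(raw.mean(axis=0)[top]).mean(),      # average gap, uncapped
      np.diff(clipped.mean(axis=0)[top]).mean())  # smaller gap once capped
[/code]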
A metaphor for what I'm trying to address would be a situation like this: ''people are feeling the same temperature but go by different scales, say Fahrenheit or Celsius, take the corresponding values from their respective scales and assign them as their ratings, while leaving out the (potentially crucial) indicator (or an explanation on some personal page) of which scale they used. So if one just has a bunch of temperature numbers and doesn't know whether Fahrenheit or Celsius was used, one will not be able to compare the actually felt temperatures.'' For instance, the same felt temperature reported as 20 on a Celsius scale shows up as 68 on a Fahrenheit scale (F = C * 9/5 + 32), so the raw numbers 20 and 68 look far apart while meaning the same thing.
A consequence of this might be that effectively it's not only the rating values themselves that matter for movies but (probably much more so) the set of people who did or didn't rate them: if e.g. a bunch of ''Fahrenheit scale users'' rate a given TAS, then, compared to a bunch of ''Celsius scale users'' rating the same TAS, even though both groups might express and mean roughly the same ''temperature'' = technical quality (or entertainment), the resulting rating value can be far different, and can make a TAS look much better or worse, depending on the situation.
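As a toy version of the metaphor (all numbers are made up): two disjoint rater groups with different fixed scales, each rating one of two movies of identical underlying quality:
[code]
import numpy as np

rng = np.random.default_rng(1)
quality = 6.0  # the same "felt temperature" behind both movies

# Two disjoint rater groups with different fixed personal scales
# ("Fahrenheit users" vs "Celsius users"), each rating a different movie.
movie_1 = 1.3 * quality - 1.0 + rng.normal(0, 0.2, 20)  # group A's ratings
movie_2 = 0.8 * quality + 0.5 + rng.normal(0, 0.2, 20)  # group B's ratings

print(round(movie_1.mean(), 2), round(movie_2.mean(), 2))
# ~6.8 vs ~5.3: the same underlying quality ends up far apart purely
# because different rater sets with different scales did the rating.
[/code]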
There are generalizations of the ''every user rates every movie'' situation that still work to some degree (with respect to keeping movie ratings easily enough comparable), in that they cancel out the rating imbalances caused by different groups of people rating some movies while not rating others (which otherwise results in a chaotic, complex distribution of rater sets per movie, over all movies). They could take the following forms:
1. It would be sufficient if not every user but just the same fixed (perhaps deliberately restricted) set of users did all of the rating. This would still produce a consistent resulting scale from which each movie's rating emerges, provided the people who rate movies keep doing so without changing the scale by which they assign values. But as a consequence one would lose the other users' opinions/ratings and the contribution of their scales, so a resulting movie rating would represent less the general audience's view of a TAS and more the view of a restricted set of raters.
2. One could have a situation where the set of all rated movies is split into (preferably rather few) subsets such that, within each subset, anyone who rated any movie in it also rated every other movie in it. In this case there would again be one resulting scale per subset, so one could at least compare movies within the same group, while the resulting ratings of 2 movies from different subsets may be incomparable (see the sketch after this list).
3. One could extend this somewhat further, to multiple cases of ''set X of users rates set Y of movies'' with intersection-free sets X across the different sets Y of movies. But the larger their count, the harder it becomes to keep track of the individual averaged-out scales used for the different sets of movies. So this is a weakened form, and with more and more such movie/rater-set correspondences it will expectedly turn chaotic, i.e. similar to the current situation.
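A minimal sketch of the grouping from point 2 (the ratings table is hypothetical; only the idea matters): movies are bucketed by their exact rater set, and averages are only comparable within a bucket:
[code]
from collections import defaultdict

# A hypothetical ratings table: movie id -> {user id: rating}
ratings = {
    "movie_a": {"u1": 7.5, "u2": 8.0},
    "movie_b": {"u1": 6.0, "u2": 9.0},
    "movie_c": {"u3": 5.5},
}

# Group movies whose rater sets are identical; within such a group the
# per-user scale discrepancies affect every movie equally, so the
# resulting averages are comparable with each other.
groups = defaultdict(list)
for movie, user_ratings in ratings.items():
    groups[frozenset(user_ratings)].append(movie)

for raters, movies in groups.items():
    print(sorted(raters), "->", movies)
# ['u1', 'u2'] -> ['movie_a', 'movie_b']   (comparable with each other)
# ['u3']       -> ['movie_c']              (not comparable with the others)
[/code]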
The issue of ''imbalances caused by different groups of people rating different groups of movies in chaotically mixed ways'' technically might no longer be a big issue if one could view, for every rating on every movie, which user gave it and what their (fixed) scale is (or, if their scale isn't fixed, when they rated it and what their scale was at each relevant point, I presume). But then, for a guest or user who wants to compare movie ratings or inform themselves in that regard, the task of comparing them expectedly becomes excessive.
I guess the goal of a scale explanation would be to provide, or lead to, the same rating values independently of the user applying the scale, whenever the same perception of the (technical) quality of a movie is present in the different persons, similar to a clearly defined function with inputs and outputs.
I guess another idea would be to set up one or multiple test movies (perhaps ones that in some reasonable sense ''cover the range of qualitatively different types of TASes sufficiently well''), together with a scale and an explanation of what (technical) rating a movie should get according to some guidelines, and then make it mandatory or optional for (new) raters to rate these test movies. How their ratings of the test movies differ from the average rating could then be used to calibrate what a rating they assign to a normal movie would translate into if someone else with the same perception of quality had rated the movie instead. (E.g. to know or estimate that a rating of 5.6 by person X would translate into 5.6 + 1.3 if it had been given by person Y, both having the same perception of the movie's quality.) But I guess this assumes there exist some uncontroversial test movies with a true (technical) rating on which a consensus could be formed, which could then serve as the reference for translating a rating by person X into the rating person Y would have given with the same opinion/evaluation (for movies of a similar ''type'').
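A rough sketch of that calibration idea, assuming consensus ratings for a few hypothetical test movies exist; the per-rater offset is simply the average deviation from the consensus:
[code]
# Hypothetical consensus ("true") technical ratings for agreed-upon test movies.
consensus = {"test_1": 8.0, "test_2": 5.0, "test_3": 3.0}

# What two raters gave those same test movies (made-up values).
rater_x = {"test_1": 7.0, "test_2": 4.0, "test_3": 2.5}
rater_y = {"test_1": 8.5, "test_2": 5.5, "test_3": 4.0}

def offset(rater):
    """Average amount by which this rater deviates from the consensus."""
    return sum(rater[m] - consensus[m] for m in consensus) / len(consensus)

def translate_x_to_y(rating_by_x):
    """Estimate what Y would have given for the same perceived quality:
    remove X's bias, then add Y's bias."""
    return rating_by_x - offset(rater_x) + offset(rater_y)

print(round(translate_x_to_y(5.6), 2))  # 7.1: a 5.6 from X maps to 5.6 + 1.5 from Y
[/code]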
Many movies lack ratings, and overall there aren't many users rating movies. So I think the larger the difference in the number of raters between any 2 compared movies, and the smaller the set of people who rated both movies (relative to the total number of raters of the 2 movies), i.e. the larger the relative number of people who rated only 1 of the 2 movies, the larger the resulting difference in assigned rating values can probably be that is caused solely by different personal rating scales being used (either in general, or in particular by those who, in a given comparison, rated only 1 of the 2 movies rather than both). I guess if the values assigned by groups of raters deviate too far from each other (as in a ''Fahrenheit versus Celsius'' comparison), then, provided one knew which candidate scales exist and are in major use, one might be able to read out of all of a given person's assigned rating values (to some degree of certainty) which scale they used, provided a few prominent scales are in use rather than highly individualized ones.
Generally speaking, I've been aware of these kinds of issues for a few years, since I joined TASVideos, and it's part of why I'm reluctant to rate movies (especially on the entertainment side). An alternative suggestion would be to have different ways of rating movies, e.g. with a list of adjectives (preferably with as little overlap in meaning between them as possible, to prevent dependencies and to keep a separate meaning for each), such as ''glitchy, funny, exotic, speedy, lucky'' (for which I took the movie award categories as reference, though other qualities could be chosen as well), and assigning to each aspect a value lying within a small number of degrees/steps between the opposite poles (from yes to no); a sketch of what this could look like follows below. I'm not sure what all the existing threads on the topic are that might be helpful to look into, but one in particular would be this one on technical rating and player points:
http://tasvideos.org/forum/viewtopic.php?t=20280 .
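For illustration, here is a minimal sketch of such an adjective-based rating (the adjectives and the step count are just the examples from above, not a proposal for fixed values):
[code]
from dataclasses import dataclass

STEPS = range(4)  # 0 = "no" pole ... 3 = "yes" pole, a small number of degrees

@dataclass
class AdjectiveRating:
    """One rating as a vector of mostly independent aspects
    instead of a single 0-to-10 number."""
    glitchy: int
    funny: int
    exotic: int
    speedy: int
    lucky: int

    def __post_init__(self):
        for name, value in vars(self).items():
            if value not in STEPS:
                raise ValueError(f"{name} must be one of {list(STEPS)}")

print(AdjectiveRating(glitchy=3, funny=1, exotic=2, speedy=3, lucky=0))
[/code]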
Generally though, I'd think it easier to give technical ratings than entertainment ratings, but this depends on what exactly one wants to express with them, i.e. what the technical rating should mean (the rerecord count? the amount of time worked on a movie? should ratings be assigned independently of improvements expected or known at the time or later?). So, for a technical rating to make sense, TASVideos might need to change or further specify what the technical rating is meant to express: which aspects of technical quality to include, possibly with example reference values assigned to some fixed, specified existing movie for calibration purposes, and maybe even which of the conceivable technical aspects are excluded, for clarification (since users, before educating themselves on which parameters are actually considered, might expect aspects to count that don't). Hopefully some of these critiques, suggestions, considerations, and ideas can help with revamping/improving the rating system.