trazz wrote:
Call me jaded if you will, but I managed to find over 20 seconds of wasted time in my first published Golden Axe movie even though it was described as having "no obvious [room for] improvements". Over 5% of that movie was wasted time, and yet nobody said that a specific segment could have been done more efficiently. That isn't the only case where a "near-perfect" movie has been found to have significant room for improvement, either.
I'd be really surprised if Rolling Thunder had 2% possible improvement.
What also bothers me is that people who don't know the game still vote on it. They can't be impressed by what's shown because they don't understand the game's rules. One could argue that the movies should be enjoyable for everyone, but with a little-known game like this, the rating can mislead people who know the game well but haven't watched the video: they will think twice before downloading the movie because of its bad rating.
I'd suggest a 3rd rating: "familiarity level", to see how well the people voting on technical quality and entertaining value know the game.
"Familiarity level: Rate from 0 to 10 how well you know this game"
I disagree. Several games are underrated because people don't know them, and several movies are overrated just because the game is popular (while the movie is not really that special).
Adding a third rating like this will only give the popular games a higher average grade, while they are possibly overrated already.
But he didn't say anything about the familiarity level counting toward the overall grade. If anything, it would make more sense for it to alter how much the technical rating counts for (which could in fact decrease the overall rating), instead of adding to the rating directly.
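To make the idea concrete, a familiarity level that scales the technical rating's weight could work roughly like this. This is a hypothetical sketch: the function name, the weights, and the scheme itself are my illustration, not anything the site actually implements.

```python
def overall_rating(entertainment, technical, familiarity):
    """Combine two ratings (0-10) into one overall score.

    The voter's self-reported familiarity (0-10) scales how much
    the technical rating counts: an unfamiliar voter's technical
    opinion contributes little, a familiar voter's contributes fully.
    The weights are arbitrary, chosen only for illustration.
    """
    tech_weight = familiarity / 10.0   # 0.0 .. 1.0
    ent_weight = 1.0                   # entertainment always counts fully
    total = ent_weight + tech_weight
    return (entertainment * ent_weight + technical * tech_weight) / total

# A voter who doesn't know the game at all: only entertainment counts.
print(overall_rating(7, 9, 0))    # 7.0
# A voter who knows the game inside out: technical counts equally.
print(overall_rating(7, 9, 10))   # 8.0
```

Note that under this scheme a low technical rating from a familiar voter really can drag the overall score down (`overall_rating(9, 5, 10)` gives 7.0), which matches the point above that familiarity would modulate the technical weight rather than add to the score directly.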
So far I hadn't rated any movies, but I decided to rate a few yesterday, and some things bothered me. I've thought about it, and here are some things I would suggest:
Suggestions
(1) Remove the technical rating.
(2) Allow ratings in decimals (like 7.8).
These two suggestions would imply two things:
(3) Call 'the entertainment value rating of the movie' simply 'the rating of the movie'.
(4) Remove the texts next to the scores (like '5 - Average').
Explanation
I think it's very hard to give a technical rating to a movie. You will likely only watch the movie at 100% speed, and even though you might have a rough idea, it will come down to guessing. The only way to really know the technical quality is to make a TAS of the game yourself. Sometimes you watch a TAS of a game you've never played and don't even know the controls. Besides, aren't these movies supposed to be perfect?
Also... somehow, the entertainment value will have an influence on the technical value (at least for some people... and in a way, they are correct: knowing how perfect a movie is can add to your entertainment). It will also be easier for people to understand what the rating represents when there is only one rating. You can still take the technical value of the movie into account in this single rating, and give it whatever weight you want.
It might be argued that there aren't enough voting options without the technical rating... but it's the same now. The movies with the least possible entertainment value (0) up to the movies with a 'perfect' entertainment value can be given 11 different grades. I think this is too few. If you want to rate a movie higher than a 9, you are forced to give it a perfect 10. A good solution would be the possibility to give movies decimal ratings.
After I rated some movies and looked at the list of movies I had rated, I noticed it didn't really represent what I thought. Some movies were placed higher than I wanted because they had a high technical rating. Other movies ended up equal to each other, since I had to rate in integers. I think a single rating with decimal accuracy would generate a list of movies that really represents what you think.
Obviously, this single rating doesn't need to be called a rating of 'entertainment value'. You could just give it a rating for whichever factors you think are important.
Also, the text at the side of the rating can be really confusing. '5 - Average': average compared to what? Should the average rating of all movies be a 5? I think these texts make things more complicated rather than easier, since everyone could assign different texts to certain ratings.
--------------------------------
On an unrelated note to these suggestions, it would be great if the 'ratings being public'-feature was implemented.
We already had this discussion yesterday on IRC so there's no need to repeat my points, I just want to point out one thing here.
"If you want to rate a movie higher than a 9, you will be forced to give it a perfect 10."
This (specifically, the "perfect"), I don't agree with. I don't believe the intention of the rating system is to have literally zero movies rated at 10 in any category. If you were to consider a rating of 10 perfect, there would be no movies rated that high. You even say yourself that you feel the text at the side of the rating should be removed and not considered (10 is called "Perfect").
Instead, I believe the intention was to have a bell-curve-like distribution, with the very worst at 5 (I agree that the rating system is kinda weird in that pretty much no movies get less than 5, but I don't think that's a big deal) and the very best at 10. Giving out no 10s at all because of some technicality with wording seems completely unreasonable to me.
Well... I was just saying that being able to rate a movie a 9.6, higher than a 9 but lower than a 10, would be a very nice option.
Your response is to one argument I used, and I used it in an extreme way just to make a point. I somewhat agree with you on it, but it is not what my post was about. I'm still not sure whether you agree with the suggestions I made or not.
I don't. I don't think they're at all unreasonable though and I hope your post creates some good discussion on the subject.
The reason I feel that removing the technical rating is not a good idea is that I believe it would result in more unfairly judged movies. The way I see it, the important question is what exactly we're trying to rate: how much did you like the TAS, or how good was the TAS? The former is primarily subjective; the latter is primarily objective.
I believe the two current rating categories (entertainment, technical quality) nicely answer these two questions, respectively. I also believe a mix of the two questions (how much did you like it, how good is it) is much preferable to using only one or the other when giving a movie an overall rating. Only answering "How good is this TAS?" doesn't result in a good enough rating list, in my opinion. On the other hand, only using everybody's personal top-n list, while far from pointless or uninteresting, doesn't give an accurate picture of what the "best TAS" (or highest-rated TAS) is, I think. I do believe how much people like a movie is more important than its technical quality, though.
All that being said, I think the current rating system is very good. It takes into account both people's subjective opinion of the movie and a more objective technical quality, and more importantly, the subjective opinion makes up more of the overall score. Just the way I believe it should be done.
Upon further thought, your second suggestion might actually be a good idea; I'm not sure. At the very least, I think having an x.5 option would probably be good.
While the technical rating *in theory* should be a neutral judgement on the technical achievements of the run (without things like "I don't like this game" or "I don't like how the run was made, it's boring" affecting it), we should still remember that nevertheless it still just reflects the *opinion* of people.
Regardless of that, I still think it's not worthless. It's interesting to see the average opinion about the technical quality of a run. Of course it mostly reflects "I *think* it's worth this much technically" rather than "I have made extensive calculations and noticed that 5 frames could be saved from the total, thus I vote just 9", but opinions are still valuable.
The "perfect" score was not, in fact, intended to mean "there's just no way it can be improved, it has reached absolute perfection, it completes the game in the minimum possible amount of frames, it's a run which I would watch 1000 times in a row". It was meant to be more like "this is about the best video I have ever seen here, it was absolutely fabulous, I can't even imagine how it could be improved". Perhaps the use of the word "perfect" was a bit misleading.
The reason for the descriptions next to the values is to try to make votes equal in meaning. For example, one person could think that 7.5 is the "average" vote, while another could think that 5 is. If these two people consistently vote with those meanings in mind, the results would be biased (especially if one of them votes considerably more than the other). The purpose of the descriptions is to give the same value meaning to all voters.
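A standard way to compensate for voters who use the scale differently, without relying on descriptions, is to normalize each voter's ratings against their own average before pooling them. This is only a sketch of that general technique; the site does not do this, and the function and numbers below are purely illustrative.

```python
def normalize(ratings):
    """Shift one voter's ratings so their personal average maps to 5.

    A voter who centers around 7.5 and one who centers around 5.0
    end up contributing on the same scale. Results are clamped to
    the 0-10 range. Purely illustrative, not the site's behavior.
    """
    mean = sum(ratings) / len(ratings)
    return [max(0.0, min(10.0, r - mean + 5.0)) for r in ratings]

generous = [7.5, 8.0, 7.0]   # this voter thinks 7.5 is "average"
strict = [5.0, 5.5, 4.5]     # this voter thinks 5.0 is "average"

print(normalize(generous))   # [5.0, 5.5, 4.5]
print(normalize(strict))     # [5.0, 5.5, 4.5]
```

After normalization the two voters' identical relative preferences produce identical numbers, which is the bias the value descriptions try to prevent by convention instead.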
As for using decimals, I don't think that's necessary. When developing the voting functionality some people in fact protested to the big amount of values. Some even suggested that something like 3 values would be enough ("I didn't like it", "average", "I liked it"). I think someone even suggested just 2 values ("I didn't like it", "I liked it"). Another common suggestion was 5 values.
Yet in a different forum (the IRTC to be exact) it was at some time in the past felt that 0-10 is not enough, so they used 0-20, which allows the voters more nuance. I didn't want to go to such extremes, so I just put the 0-10 range.
If you have 2 values, the question "Did you like this TAS?" will be literally the same as "Do you know/like the game?"
How could people be against more values? If you don't know how to use them, you can still vote in integers as before. It's true that there are 11 values now... but if you don't plan to rate/watch ALL movies on the site, just the ones you are interested in, you will probably end up using only the values 6 through 10. As I stated before, this leads to me giving the same rating to movies I don't find equally entertaining.
The "perfect" score was not, in fact, intended to mean "there's just no way it can be improved, it has reached absolute perfection, it completes the game in the minimum possible amount of frames, it's a run which I would watch 1000 times in a row". It was meant to be more like "this is about the best video I have ever seen here, it was absolutely fabulous, I can't even imagine how it could be improved".
Well, for instance, to me a "10" in technical means "the run has reached the limit with the known tricks and can't be improved further using them". That's why I'm always trying to ask the authors of certain "overoptimized" runs whether there are any places where time could be saved. If there weren't any, and the history of the run showed rather clearly that the game was already close to the limit, I gave the run a ten (currently there are 7 such runs, obviously none of them obsoleted).
Warp wrote:
Edit: I think I understand now: It's my avatar, isn't it? It makes me look angry.
Note that the technical rating is not only about how many frames long the run is; other things may affect it too. I would even go so far as to say that even if a run uses as few frames as is technically possible to complete the game, it still doesn't necessarily mean a 10 should automatically be given to it.
I would say that the amount of work put into the run may/should also affect the technical score, at least a bit. For example, the amount of work that went into delivering the newest Rockman video is just amazing; that alone should raise the technical score. Another good example of work put into a run that I admire is the new Gradius run. The work put into the Zelda TAS is also completely amazing. (There are several other examples too.)
There are other runs, however, which are very simple and straightforward, where the "work" consists basically of just using frame advance and savestates until the run is as fast as possible (or as fast as the runner thought possible). While such a run may be framewise optimal, that still doesn't mean it deserves a 10 for technical score.
Isn't that completely different from the intended use of the technical rating? I've wondered why some runs that most likely can't be significantly improved have a low technical rating, and why some runs with large known improvements have an average technical rating higher than 9. I assumed the cause was category cross-contamination, but maybe it's really because the technical rating is being interpreted differently from its stated definition.
In the same way that using more advanced tools such as bots doesn't make a run less interesting, entertaining, or seemingly less optimal to me with regard to the finished product, it doesn't make it more interesting, entertaining, or seemingly optimal either.
I think the methods used to create a TAS are irrelevant on both ends of the spectrum.
<Swordless> Go hug a tree, you vegetarian (I bet you really are one)
I would say that also the amount of work put into the run may/should also affect the technical score, at least a bit. For example the amount of work that has been done in order to deliver the newest rockman video is just amazing. Just that alone should raise the technical score. Another good example of work put into a run which I admire is the new Gradius run. The work put into the Zelda TAS is also completely amazing. (There are several other examples too).
There are three main reasons why I usually don't take that into account.
• One could spend 1000 rerecords optimizing a certain section, and then someone else obsoletes that effort in just a couple of takes by using a completely different approach (somewhat similar to Arne's discovery, which led to an instant 15+ seconds of improvement in Megaman, or kirbymuncher's improvement over JXQ's first Kirby's Adventure WIP). While the amount of work is greater in the first case, the result is faster/better/whatever in the second, so I would rate the second result technically higher (that is also why I never try to judge any activity by the amount of effort it takes to complete).
• The amount of work is too difficult to judge, especially if the author doesn't openly demonstrate it. For example, Nitsuja worked silently on some of his later runs, made them in a very short time span, and they were all of top quality. But we won't ever be able to evaluate the amount of effort spent on them. We won't know whether it was easy or hard for him, and we don't know how he defines easy and hard compared to our own definitions. Moreover, any author can say anything regarding the effort they've spent on their runs. There's no way to know for sure.
• Thus, without any facts, one should follow Occam's razor: the fewer assumptions, the better. That's a regular thing for any logic-driven person (which I am).
Of course, it's always hard to make a fair judgement on someone's run until you start improving it yourself, so I try to take into account every little detail when I'm watching a movie, including overall imprecision, lag and the actual data on possible improvements.
Note that with "amount of work" I was not referring to the amount of rerecords nor the time taken to make the run. While those can, of course, be considered, I didn't mean they should be the main factors.
By "amount of work" I was referring more to the variety, ingenuity, and development put into making the run possible. The amount of preparation for making the run (such as studying the game inside and out to discover glitches, key memory values and so on) is also an important factor.
Of course, if the submitter doesn't describe in detail everything he did to make the run, it's next to impossible to judge. However, that's the submitter's own fault. He should describe the entire process of making the run in detail. It not only helps in rating the run, but is also interesting info to read.
Of course nothing stops someone from lying in their submission description, but I think we can trust the majority of submitters to be honest.
Perhaps a page could be written outlining the basic ideas behind the two rating categories? If there's any interest, I could write some kind of draft and post it here.
I agree with Baxter about the decimals. I'm not sure what to vote when either category is almost perfect: better than a 9 but not a 10. I round it, but I would give ratings like 9.4 if I could. When something has a 9, and then someone improves it and makes it known that it could be improved further, it obviously shouldn't go up to a 10, but it deserves higher than it had before.
Warp wrote:
Perhaps a page could be written outlining the basic ideas behind the two rating categories? If there's any interest, I could write some kind of draft and post it here.
It couldn't hurt, but keep in mind that so far all (3) of the responses to your previous post have disagreed with [part of] what you said your basic idea of the technical rating is. If a run really is framewise perfect, then it deserves a 10 for technical rating by definition. If it required so little ingenuity to make it perfect, that would show in the movie getting a low entertainment rating. I know most people aren't qualified to judge with any accuracy how perfect a given movie is, but I think it's more important for them to pay attention to the movie than to what its author says about the movie's creation.
I also believe that some people don't like reading long and detailed description texts, and thus ignore some facts that could possibly affect their rating.
Having decimals might be useful, but I probably wouldn't use them, due to my already formed personal scale plus a considerable level of uncertainty.
I noticed that Circus Charlie has a 5.2. It could just be that everyone is giving it low entertainment ratings.
After a few watches at 100% speed, I'd have to say it appears very unlikely to be improvable by much. I don't think it took a huge amount of work, and certainly not advanced tools (it's still Famtasia). But it's a simple and short game, and if it's approaching a frame-perfect movie, I'd give it a 9 or 10 technical rating.
Should technical ratings not be proportional to the simplicity of the game?
After a few watches at 100%, I'd have to say it appears very unlikely to be improvable by much.
Apparently, it is hardly improvable at all, even though it was made with Famtasia. There were some attempts at improving it with more recent tools, all of them unsuccessful.
If a run is really framewise perfect, then it deserves a 10 for technical rating by definition.
When implementing the rating system, that was not really what I had in mind as a definition for the technical rating (even though it was Bisqwit who ultimately decided the rating categories). I can't speak for Bisqwit, but I'm fairly convinced he doesn't think so either. Defining the technical rating simply by the number of frames the video uses is way too limited and IMO not in the spirit of the ratings. (Of course the number of frames should have a significant effect on the rating, but IMO it should in no way be the only factor.)
I disagree. If one person were to use "standard" tools and achieve the same time as someone who used more advanced tools, I don't think one run is more technically perfect than the other. By that logic, early videos made without even frame advance should be rated lower simply because of that.
If the tools are really that impressive and useful, it will show because someone that doesn't have those tools won't have much of a chance of matching the time of the one with the more advanced tools. No need for a double penalty.