The benchmarked hardware:
Performance (time for the entire SCF procedure, in seconds) for single-point energy calculations on some representative molecules. Note that the C1060 and C2050 give slightly different results (total energy differences on the order of 0.001 kcal/mol) because the C2050 supports denormals and fully IEEE-compliant floating-point square root and division operations. Results obtained on GPUs of compute capability 2.0 and higher (e.g. Tesla C2050) are therefore more accurate than results obtained on GPUs of compute capability 1.x (e.g. Tesla C1060).
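
The effect of missing denormal support can be illustrated with a minimal sketch. This is not TeraChem code: it assumes single-precision arithmetic and emulates, on the CPU, hardware that flushes denormal values to zero. The helper names `to_float32` and `flush_to_zero` are hypothetical and exist only for this illustration.

```python
import struct

# Smallest positive *normal* single-precision value (IEEE 754 binary32).
FLT_MIN_NORMAL = 2.0 ** -126


def to_float32(x):
    """Round a Python float to the nearest float32 value, keeping denormals."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_to_zero(x):
    """Emulate hardware without denormal support: values below the
    normal range are replaced by zero (assumption for illustration)."""
    y = to_float32(x)
    return 0.0 if abs(y) < FLT_MIN_NORMAL else y


# A value below the normal float32 range that is still representable
# as a denormal number.
tiny = 2.0 ** -130

with_denormals = to_float32(tiny)        # kept as a small nonzero value
without_denormals = flush_to_zero(tiny)  # lost entirely

print("with denormals:   ", with_denormals)
print("without denormals:", without_denormals)
```

Small contributions of this kind are either kept or dropped depending on the hardware, which is one plausible way tiny differences in a computed total energy can arise.
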
| Method | Taxol, 8 Tesla C2050 | Valinomycin, 8 Tesla C2050 | Olestra, 8 Tesla C2050 | Taxol, 8 Tesla C1060 | Valinomycin, 8 Tesla C1060 | Olestra, 8 Tesla C1060 |
| --- | --- | --- | --- | --- | --- | --- |
| RHF/6-31G | 16.19 sec | 24.88 sec | 99.12 sec | 18.47 sec | 29.87 sec | 114.1 sec |
| BLYP/6-31G, Grid 1 | 22.86 sec | 27.89 sec | 159.8 sec | 31.80 sec | 35.79 sec | 181.4 sec |
| BLYP/6-31G, Grid 2 | 32.81 sec | 37.32 sec | 193.5 sec | 44.66 sec | 52.13 sec | 233.6 sec |
| B3LYP/6-31G, Grid 1 | 34.05 sec | 44.17 sec | 154.2 sec | 43.15 sec | 54.68 sec | 184.8 sec |
| B3LYP/6-31G, Grid 2 | 41.08 sec | 54.34 sec | 169.3 sec | 53.84 sec | 70.61 sec | 216.9 sec |
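
To compare the two GPU generations directly, the timings in the table can be turned into ratios. The sketch below is only an illustration: the numbers are copied from the table, while the names `timings`, `molecules`, and `speedups` are assumptions introduced here. A ratio above 1 means the C2050 run was faster.

```python
# SCF times in seconds, copied from the table above.
# Each entry: (8 Tesla C2050 times, 8 Tesla C1060 times),
# ordered as Taxol, Valinomycin, Olestra.
molecules = ("Taxol", "Valinomycin", "Olestra")
timings = {
    "RHF/6-31G":           ((16.19, 24.88, 99.12), (18.47, 29.87, 114.1)),
    "BLYP/6-31G, Grid 1":  ((22.86, 27.89, 159.8), (31.80, 35.79, 181.4)),
    "BLYP/6-31G, Grid 2":  ((32.81, 37.32, 193.5), (44.66, 52.13, 233.6)),
    "B3LYP/6-31G, Grid 1": ((34.05, 44.17, 154.2), (43.15, 54.68, 184.8)),
    "B3LYP/6-31G, Grid 2": ((41.08, 54.34, 169.3), (53.84, 70.61, 216.9)),
}


def speedups(c2050, c1060):
    """Return C1060 time divided by C2050 time for each molecule."""
    return [round(old / new, 2) for new, old in zip(c2050, c1060)]


for method, (c2050, c1060) in timings.items():
    ratios = dict(zip(molecules, speedups(c2050, c1060)))
    print(f"{method}: {ratios}")
```
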
Older results: comparison of times for a single SCF iteration (RHF/3-21G) using TeraChem on a single GeForce GTX 280 GPU versus GAMESS on a single core of a 3 GHz Pentium D CPU. Note the logarithmic scale needed to make a meaningful comparison.


