NSGminer v0.9.2: The Fastest Feathercoin / NeoScrypt GPU Miner
-
@Wolf0 I have optimised the most important XOR in FastKDF already. It was a bottleneck to do it bytewise on GCN. 120K kernel size isn’t very large because Salsa and ChaCha separately fit the code cache and FastKDF has more important issues like memory alignment. I’ll try to optimise it better.
-
I like the idea of optimising for “power efficiency”, not “speed”. ;)
-
@ghostlander said:
@Wolf0 I have optimised the most important XOR in FastKDF already. It was a bottleneck to do it bytewise on GCN. 120K kernel size isn’t very large because Salsa and ChaCha separately fit the code cache and FastKDF has more important issues like memory alignment. I’ll try to optimise it better.
Which XOR would that be? I feel like I’m derping and missing something obvious, but I see the ending XOR with the if/else branch outside the loop, and the XOR inside the loop, which is done with a call to neoscrypt_bxor()… I just looked at your current git again, double-checked this, then read the neoscrypt_bxor() function again - it’s still bytewise. Unless you mean something you’ve not pushed, in which case never mind. If you have, then nice - my trick with aligning the XOR worked out for you.
Anyways, you seem to be working from the outside in, rather than from the inside out, when it comes to the optimization of the code - the “outside” being the portions with less time spent, and the “inside” being the opposite. You really might want to look into SMix() - that’s where you really can gain hashrate.
@wrapper said:
I like the idea of optimising for “power efficiency”, not “speed”. ;)
They are almost always one and the same in the GPU arena. If I have shitty, slow code, it leaves portions of the GPU unused, or at least under-utilized, causing the lower power consumption people notice. However, if these resources are used well, then the hashrate goes up far more than power does. I actually have records from my really old X11 optimizations to show this, as well as exact percentages taken from runs of the (then) stock X11 shipping with SGMiner and mine on Freya.
-
@Wolf0 https://github.com/ghostlander/nsgminer/blob/692e2ef2946229cf057dd006c8e85c8674f0342f/neoscrypt.cl#L713
It’s executed 64 times per hash. The final XOR outside the loop is less important.
@Wolf0 said:
Unless you mean something you’ve not pushed, in which case never mind. If you have, then nice - my trick with aligning the XOR worked out for you.
Well, I added it to my beta 10 days ago. You mentioned doing the bytewise XOR in uints; I have vectorised it, which is also fine. It is not uploaded to GitHub yet, but quite a few people use it right now. It’s well improved over the previous release in performance and compatibility. I see only a 5% decrease when switching from 14.6 to 15.7 drivers. It was much worse before (https://bitcointalk.org/index.php?topic=712650.msg13585416#msg13585416).
-
@ghostlander said:
@Wolf0 https://github.com/ghostlander/nsgminer/blob/692e2ef2946229cf057dd006c8e85c8674f0342f/neoscrypt.cl#L713
It’s executed 64 times per hash. The final XOR outside the loop is less important.
@Wolf0 said:
Unless you mean something you’ve not pushed, in which case never mind. If you have, then nice - my trick with aligning the XOR worked out for you.
Well, I added it to my beta 10 days ago. You mentioned doing the bytewise XOR in uints; I have vectorised it, which is also fine. It is not uploaded to GitHub yet, but quite a few people use it right now. It’s well improved over the previous release in performance and compatibility. I see only a 5% decrease when switching from 14.6 to 15.7 drivers. It was much worse before (https://bitcointalk.org/index.php?topic=712650.msg13585416#msg13585416).
OH, lol, yes, that is good, but that was not what I meant! This line:
[code]
neoscrypt_bxor(&Bb[bufptr], &T[0], 32);
[/code]
I’m saying I did this operation using uints.
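To illustrate the difference being discussed, here is a minimal host-side C sketch of a 32-byte XOR done bytewise and then 32 bits at a time. It is not the code from either miner: the function names `xor_bytewise` and `xor_wordwise` are hypothetical, and the sketch assumes the length is a multiple of 4, as it is in the call quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: XOR src into dst one byte at a time. */
static void xor_bytewise(uint8_t *dst, const uint8_t *src, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[i] ^= src[i];
}

/* Same result, 32 bits per step. memcpy keeps the loads and stores
 * well defined even when dst or src is not 4-byte aligned.
 * Assumption: len is a multiple of 4. */
static void xor_wordwise(uint8_t *dst, const uint8_t *src, size_t len) {
    for (size_t i = 0; i < len; i += 4) {
        uint32_t a, b;
        memcpy(&a, dst + i, sizeof a);
        memcpy(&b, src + i, sizeof b);
        a ^= b;
        memcpy(dst + i, &a, sizeof a);
    }
}
```

In a kernel the buffer offset is generally not aligned, which is presumably why alignment handling matters for the wide version; the memcpy calls here only stand in for whatever the kernel does about that.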
-
@Wolf0 I get it. I’ve also rewritten it. The code quoted is plain bytewise, though old VLIW GPUs like it for some arcane reason.
-
@ghostlander said:
@Wolf0 I get it. I’ve also rewritten it. The code quoted is plain bytewise, though old VLIW GPUs like it for some arcane reason.
Odd. I got my 6970 today, so I should be able to work on Cayman in a while.
-
I have managed to squeeze 520 and 500KH/s out of my R9 290s, and up to 420KH/s on my 7950s, which is closer to what I was expecting. All my 7970/280X cards are between 450 and 500KH/s, with the majority being around 500.
-
I have optimised almost all bytewise parts of FastKDF. 800KH/s before with v7 beta, 820KH/s now (Catalyst 14.6) or 770KH/s (Catalyst 15.7).
-
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
-
@AmDD said:
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
What driver version are you using?
-
@RIPPEDDRAGON said:
@AmDD said:
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
What driver version are you using?
14.7
-
@AmDD said:
@RIPPEDDRAGON said:
@AmDD said:
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
What driver version are you using?
14.7
Weird… I think that is what I am running, plain and simple: -w 128 -I 16… I will check tonight.
What clocks are yours set to? I’m running 1110/1550 or higher…
-
@RIPPEDDRAGON said:
@AmDD said:
@RIPPEDDRAGON said:
@AmDD said:
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
What driver version are you using?
14.7
Weird… I think that is what I am running, plain and simple: -w 128 -I 16… I will check tonight.
What clocks are yours set to? I’m running 1110/1550 or higher…
I’ll have to double-check my settings, but I think clocks are 1100/1600 or so.
-
Trying to compile on a custom Puppy Linux install (MinerPup).
...
make[2]: Entering directory `/root/archive/nsgminer'
CC nsgminer-miner.o
In file included from miner.c:66:
neoscrypt.h:9: error: redefinition of typedef ‘ullong’
miner.h:34: error: previous declaration of ‘ullong’ was here
neoscrypt.h:12: error: redefinition of typedef ‘uchar’
miner.h:30: error: previous declaration of ‘uchar’ was here
make[2]: *** [nsgminer-miner.o] Error 1
make[2]: Leaving directory `/root/archive/nsgminer'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/archive/nsgminer'
make: *** [all] Error 2
Any suggestions?
-
@RIPPEDDRAGON said:
@AmDD said:
@RIPPEDDRAGON said:
@AmDD said:
Even with beta v7, I’m floating around 320KH/s on my 7950s. I only saw a ~20-30KH/s increase.
What driver version are you using?
14.7
Weird… I think that is what I am running, plain and simple: -w 128 -I 16… I will check tonight.
What clocks are yours set to? I’m running 1110/1550 or higher…
-w 256 -I 13 -g 2 and 1050/1600 clocks on 14.7 drivers. When I got home I saw that the rig had shut down and had issues booting back up. I reinstalled the drivers and tried -w 128. So far it’s slower, but I’ll let it hash a while to see what it does.
-
@UnklAdM said:
Trying to compile on a custom Puppy Linux install (MinerPup).
...
make[2]: Entering directory `/root/archive/nsgminer'
CC nsgminer-miner.o
In file included from miner.c:66:
neoscrypt.h:9: error: redefinition of typedef ‘ullong’
miner.h:34: error: previous declaration of ‘ullong’ was here
neoscrypt.h:12: error: redefinition of typedef ‘uchar’
miner.h:30: error: previous declaration of ‘uchar’ was here
make[2]: *** [nsgminer-miner.o] Error 1
make[2]: Leaving directory `/root/archive/nsgminer'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/archive/nsgminer'
make: *** [all] Error 2
Any suggestions?
Edit miner.c and driver-cpu.c to include neoscrypt.h before miner.h, and update typedefs in miner.h to the following:
#if !(uchar)
typedef unsigned char uchar;
#endif
#if !(uint)
typedef unsigned int uint;
#endif
#if !(ullong)
typedef unsigned long long ullong;
#endif
-
The code defines a type that was already defined in another part/module/file.
Suggestion:
Edit miner.h and comment out the lines defining the types ullong and uchar.
Then try again.
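For background: compilers that predate C11 typically reject a typedef that is seen twice in the same scope, even when both definitions are identical, which matches the errors above. The preprocessor cannot test typedef names, only macros, so one common pattern is to wrap the definitions in a shared macro guard. A minimal sketch follows; the macro name `NSG_BASIC_TYPES` is hypothetical and not from the nsgminer source.

```c
/* First header: defines the types and sets the (hypothetical) guard macro. */
#ifndef NSG_BASIC_TYPES
#define NSG_BASIC_TYPES
typedef unsigned char uchar;
typedef unsigned long long ullong;
#endif

/* Second header carrying the same block: the preprocessor skips it
 * because the guard macro is already defined, so the compiler only
 * ever sees each typedef once. */
#ifndef NSG_BASIC_TYPES
#define NSG_BASIC_TYPES
typedef unsigned char uchar;
typedef unsigned long long ullong;
#endif
```

With a guard like this the include order of the two headers no longer matters.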
[Edit]
Ghostlander’s way is far more elegant :D
-
@Wellenreiter said:
The code defines a type that was already defined in another part/module/file.
Suggestion:
Edit miner.h and comment out the lines defining the types ullong and uchar.
Then try again.
[Edit]
Ghostlander’s way is far more elegant :D
Tried that, that’s why I’m here. Thanks anyway! I’ll try the other fix when I get to the office.
- UnklAdM.
-
@AmDD So I checked the driver version… it’s reporting as 15.8, but I know I downgraded it to 14.6 RC2.
Also, I am running at -I 15 because I am under 400KH/s with -I 16.