mips: start.S: avoid overwriting outside gd when clearing global data in stack

When setting up initial stack, global data will also be put in the stack,
and being cleared.

The assembler instructions for clearing gd is as follows:

	move	t0, k0
1:
	PTR_S	zero, 0(t0)
	blt	t0, t1, 1b
	 PTR_ADDIU t0, PTRSIZE

t0 is the start address of gd, t1 is the end address of gd (t0 + GD_SIZE).

[PTR_ADDIU t0, PTRSIZE] is in the delay slot of [blt t0, t1, 1b], so it
will be executed before the branch operation.

However the comparison for the BLT instruction is done before executing the
delay slot. This means when the last word just before k1 is cleared, the
loop will continue to run once. This will clear an extra word at k1, which
is outside the global data.

Global data is placed at the top of the stack. If the initial stack is a
SRAM or locked cache, the area outside them may be inaccessible. A write
operation performed in this area may cause an exception.

To solve this, [PTR_ADDIU t0, PTRSIZE] should be placed before the BLT
instruction.

Reviewed-by: Daniel Schwierzeck <daniel.schwierzeck@gmail.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Signed-off-by: Weijie Gao <weijie.gao@mediatek.com>
diff --git a/arch/mips/cpu/start.S b/arch/mips/cpu/start.S
index 1d21b23..784b8be 100644
--- a/arch/mips/cpu/start.S
+++ b/arch/mips/cpu/start.S
@@ -71,8 +71,9 @@
 	move	t0, k0
 1:
 	PTR_S	zero, 0(t0)
+	PTR_ADDIU t0, PTRSIZE
 	blt	t0, t1, 1b
-	 PTR_ADDIU t0, PTRSIZE
+	 nop
 
 #if CONFIG_VAL(SYS_MALLOC_F_LEN)
 	PTR_S	sp, GD_MALLOC_BASE(k0)	# gd->malloc_base offset