PureBasic – strip_tags() just like in PHP!

Posted on February 11, 2014

Let the strippin’ begin!

This is a basic implementation of PHP’s strip_tags() for PureBasic. It’s not fully optimized, but it’s fast enough for most applications. Instead of calling the native string manipulation functions, it walks through the string one character at a time using a pointer. Along the way it also decodes numeric character entities such as &#65;.
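If walking a string through a pointer is new to you, here is a minimal sketch of the idea on its own. The procedure name count_char and its parameters are made up for this illustration; they are not part of strip_tags().

```purebasic
; Hypothetical helper: count how often one character occurs in a string
; by walking it through a *Character pointer, as strip_tags() does.
Procedure.i count_char(*p.Character, Wanted.c)
	Define.i n
	
	If *p
		While *p\c <> #Null
			If *p\c = Wanted
				n + 1
			EndIf
			*p + SizeOf(Character) ; step to the next character
		Wend
	EndIf
	
	ProcedureReturn n
EndProcedure

Define.s sample = "<p>Hi</p>"
Debug count_char(@sample, '<') ; the sample contains two '<' characters
```

Stepping by SizeOf(Character) rather than by 1 keeps the loop correct in both ASCII and Unicode builds.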

It should run faster than a regular-expression approach, though it’s not as flexible. There’s lots of room for improvement, but for now it is what it is!
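For comparison, this is roughly what the regular-expression route could look like with PureBasic’s RegularExpression library. Treat it as a sketch: the name strip_tags_regex and the pattern are my assumptions, and the pattern simply treats everything between a '<' and the next '>' as a tag. It does not decode entities.

```purebasic
; Regex-based alternative, for comparison only.
; Assumption: anything between '<' and the next '>' counts as a tag.
Procedure.s strip_tags_regex(szInput.s)
	Define.s szOutput = szInput
	Define.i hRegex = CreateRegularExpression(#PB_Any, "<[^>]*>")
	
	If hRegex
		szOutput = ReplaceRegularExpression(hRegex, szInput, "")
		FreeRegularExpression(hRegex)
	EndIf
	
	; If the expression could not be created, the input is returned unchanged
	ProcedureReturn szOutput
EndProcedure

Debug strip_tags_regex("<p>Test paragraph.</p><b>Other text</b>")
```

Compiling the expression on every call is wasteful. If you go this way, create it once and reuse it.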

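Procedure.s strip_tags(*szInput.Character)
	Define.s szOutput, szEntity
	Define.c ThisCharacter
	
	If *szInput
		
		;While (not Repeat) so an empty string is never read past its terminator
		While *szInput\c <> #Null
			
			If *szInput\c = '<'
				;Strip markup tags (completely)
				;-ToDo: add support for selective tag stripping!
				;Skip to the closing '>', or stop at the terminator if the tag is never closed
				While *szInput\c <> '>' And *szInput\c <> #Null
					*szInput + SizeOf(Character)
				Wend
				
			Else
				
				;Not a tag: decode a numeric entity or send the character to the output
				ThisCharacter = *szInput\c
				Select ThisCharacter
					Case '&' ;Entity
						*szInput + SizeOf(Character)
						If *szInput\c = '#'
							*szInput + SizeOf(Character)
							szEntity = "" ;reset, or digits from earlier entities pile up
							While *szInput\c <> ';' And *szInput\c <> #Null
								szEntity + Chr(*szInput\c)
								*szInput + SizeOf(Character)
							Wend
							szOutput + Chr(Val(szEntity))
						Else
							;Named entities (e.g. &amp;) are not decoded: keep the '&'
							;and step back so the following character is processed normally
							szOutput + "&"
							*szInput - SizeOf(Character)
						EndIf
					Default
						szOutput + Chr(ThisCharacter)
				EndSelect
				
			EndIf
			
			;Step past the character just handled, unless we already hit the terminator
			If *szInput\c <> #Null
				*szInput + SizeOf(Character)
			EndIf
		Wend
		
	EndIf
	
	ProcedureReturn szOutput
EndProcedure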
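The ToDo about selective tag stripping could start from a helper like the one below. It is only a sketch: the name is_allowed_tag is hypothetical, and I am assuming the allowed tags are passed the way PHP takes them, as a string such as "<b><i>". Wiring it into strip_tags() is left open.

```purebasic
; Hypothetical helper: returns #True if the tag starting at *szTag (which must
; point at '<') is named in szAllowed, given in PHP's "<b><i>" format.
Procedure.i is_allowed_tag(*szTag.Character, szAllowed.s)
	Define.s szName
	
	If *szTag = 0
		ProcedureReturn #False
	EndIf
	If *szTag\c <> '<'
		ProcedureReturn #False
	EndIf
	
	*szTag + SizeOf(Character)
	If *szTag\c = '/' ;a closing tag shares the name of its opening tag
		*szTag + SizeOf(Character)
	EndIf
	
	;Collect the tag name: letters and digits only
	While (*szTag\c >= 'a' And *szTag\c <= 'z') Or (*szTag\c >= 'A' And *szTag\c <= 'Z') Or (*szTag\c >= '0' And *szTag\c <= '9')
		szName + Chr(*szTag\c)
		*szTag + SizeOf(Character)
	Wend
	
	If szName = ""
		ProcedureReturn #False
	EndIf
	
	;Case-insensitive lookup of "<name>" in the allowed list
	If FindString(LCase(szAllowed), "<" + LCase(szName) + ">")
		ProcedureReturn #True
	EndIf
	
	ProcedureReturn #False
EndProcedure

Define.s html = "<b>bold</b>"
Debug is_allowed_tag(@html, "<b><i>")
```

Inside the tag branch of strip_tags(), a check like this would decide whether to copy the tag to the output or skip it.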

Simple use example:

;Any tagged input works here; the <p> and <b> tags are just an example
Define.s test = "<p>Test paragraph.</p><b>Other text</b>"
Debug test
Debug strip_tags(@test)

Got any ideas on how to improve this? Let me know!

Enjoy,
Gus