Let the strippin’ begin!
This is a basic implementation of PHP’s strip_tags() for PureBasic. It’s not fully optimized, but it’s fast enough for most applications. Instead of using the native string manipulation functions, it walks through the string one character at a time.
That should make it faster than a regular expression, though not as flexible. There’s lots of room for improvement, but for now it is what it is!
Procedure.s strip_tags(*szInput.Character)
  Protected.s szOutput, szEntity
  Protected *szNext.Character

  If *szInput
    While *szInput\c <> #Null
      If *szInput\c = '<'
        ; Strip markup tags (completely)
        ;-ToDo: add support for selective tag stripping!
        Repeat
          *szInput + SizeOf(Character)
        Until *szInput\c = '>' Or *szInput\c = #Null ; stop at the end of an unterminated tag
      Else
        ; Not a tag: decode numeric entities or send the character to the output directly
        *szNext = *szInput + SizeOf(Character)
        If *szInput\c = '&' And *szNext\c = '#'
          ; Numeric entity such as &#65;
          *szInput = *szNext + SizeOf(Character) ; skip past "&#"
          szEntity = "" ; reset, otherwise digits from a previous entity pile up
          While *szInput\c <> ';' And *szInput\c <> #Null
            szEntity + Chr(*szInput\c)
            *szInput + SizeOf(Character)
          Wend
          szOutput + Chr(Val(szEntity))
        Else
          ; Plain character (named entities such as &amp; are copied as-is)
          szOutput + Chr(*szInput\c)
        EndIf
      EndIf
      ; Step past the character just handled, unless the input ended early
      If *szInput\c <> #Null
        *szInput + SizeOf(Character)
      EndIf
    Wend
  EndIf
  ProcedureReturn szOutput
EndProcedure
Simple use example:
; Sample markup (illustrative)
Define.s test = "<p>Test paragraph.</p> <b>Other text</b>"
Debug test
Debug strip_tags(@test)
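For comparison, here is a rough sketch of the regular-expression route mentioned earlier, using PureBasic's built-in RegularExpression library. The procedure name strip_tags_regex is made up for this example, and the pattern is a simple assumption: it removes anything between '<' and '>' and does not decode entities.

```purebasic
; Hypothetical helper, shown for comparison only
Procedure.s strip_tags_regex(input.s)
  Protected output.s = input
  ; "<[^>]*>" matches '<', any run of characters other than '>', then '>'
  Protected regex = CreateRegularExpression(#PB_Any, "<[^>]*>")
  If regex
    output = ReplaceRegularExpression(regex, input, "")
    FreeRegularExpression(regex)
  EndIf
  ProcedureReturn output
EndProcedure

Debug strip_tags_regex("<p>Test paragraph.</p>")
```

The pattern is easy to tweak, which is the flexibility the character-by-character version gives up in exchange for skipping the regex engine.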
Got any ideas on how to improve this? Let me know!
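One possible direction for the selective tag stripping ToDo is sketched below. It assumes an allow-list written the way PHP's second argument is, for example "<b><i>". The names strip_tags_allow and tag_name are hypothetical, and unlike strip_tags() this sketch leans on the native string functions and leaves entities untouched.

```purebasic
; Hypothetical helper: lower-cased tag name taken from the text between '<' and '>'
Procedure.s tag_name(inner.s)
  Protected i, c, name.s
  inner = LCase(Trim(inner))
  If Left(inner, 1) = "/" ; closing tag: ignore the slash
    inner = Mid(inner, 2)
  EndIf
  For i = 1 To Len(inner)
    c = Asc(Mid(inner, i, 1))
    If (c >= 'a' And c <= 'z') Or (c >= '0' And c <= '9')
      name + Chr(c)
    Else
      Break ; attributes or other characters end the name
    EndIf
  Next
  ProcedureReturn name
EndProcedure

; Hypothetical: strips every tag whose name is not listed in allowed
Procedure.s strip_tags_allow(input.s, allowed.s)
  Protected output.s, tag.s, name.s
  Protected pos = 1, open, close
  allowed = LCase(allowed)
  Repeat
    open = FindString(input, "<", pos)
    If open = 0
      output + Mid(input, pos) ; no more tags: keep the remaining text
      Break
    EndIf
    If open > pos
      output + Mid(input, pos, open - pos) ; text before the tag
    EndIf
    close = FindString(input, ">", open)
    If close = 0
      Break ; unterminated tag: drop the rest
    EndIf
    tag = Mid(input, open, close - open + 1)
    name = tag_name(Mid(tag, 2))
    If name <> "" And FindString(allowed, "<" + name + ">")
      output + tag ; allowed tag: keep it verbatim
    EndIf
    pos = close + 1
  ForEver
  ProcedureReturn output
EndProcedure

Debug strip_tags_allow("<p>Hello <b>world</b></p>", "<b>") ; Hello <b>world</b>
```

The same allow-list check could be folded into the pointer-based loop instead, which would keep the single-pass design at the cost of a little more bookkeeping.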
Enjoy,
Gus
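Note: the comparison in the tag check should read '<', with no space after the bracket. If your copy of the listing shows '< ' instead, that’s a bug in the blog’s parser somewhere, so remove the space before compiling.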