source: downloads/OgreMain/include/OgreCompiler2Pass.h @ 1

Last change on this file since 1 was 1, checked in by landauf, 18 years ago

File size: 34.1 KB

Rev	Line
[1]	1	/*
	2	-----------------------------------------------------------------------------
	3	This source file is part of OGRE
	4	(Object-oriented Graphics Rendering Engine)
	5	For the latest info, see http://www.ogre3d.org
	6
	7	Copyright (c) 2000-2006 Torus Knot Software Ltd
	8	Also see acknowledgements in Readme.html
	9
	10	This program is free software; you can redistribute it and/or modify it under
	11	the terms of the GNU Lesser General Public License as published by the Free Software
	12	Foundation; either version 2 of the License, or (at your option) any later
	13	version.
	14
	15	This program is distributed in the hope that it will be useful, but WITHOUT
	16	ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
	17	FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
	18
	19	You should have received a copy of the GNU Lesser General Public License along with
	20	this program; if not, write to the Free Software Foundation, Inc., 59 Temple
	21	Place - Suite 330, Boston, MA 02111-1307, USA, or go to
	22	http://www.gnu.org/copyleft/lesser.txt.
	23
	24	You may alternatively use this source under the terms of a specific version of
	25	the OGRE Unrestricted License provided you have obtained such a license from
	26	Torus Knot Software Ltd.
	27	-----------------------------------------------------------------------------
	28	*/
	29
	30
	31	#ifndef __Compiler2Pass_H__
	32	#define __Compiler2Pass_H__
	33
	34	#include "OgrePrerequisites.h"
	35	#include "OgreStdHeaders.h"
	36
	37	namespace Ogre {
	38
	39	/** Compiler2Pass is a generic 2 pass compiler/assembler
	40	@remarks
	41	Provides a tokenizer in pass 1 and relies on the subclass to implement the virtual methods used in pass 2.
	42
	43	PASS 1 - tokenize source: this is a simple brute-force lexical scanner/analyzer that also parses
	44	each formed token for proper semantics and context in the same pass.
	45	It uses top-down (recursive descent) rules based on Backus-Naur Form (BNF) notation for semantic
	46	checking.
	47
	48	During Pass 1, if a terminal token is identified as having an action, that action is triggered
	49	when the next terminal token that has an action is encountered.
	50
	51	PASS 2 - generate application-specific instructions, i.e. native instructions, based on the tokens in the instruction container.
	52
	53	@par
	54	This class must be subclassed, with the subclass providing some implementation details for Pass 2. The subclass
	55	is responsible for setting up the token libraries along with defining the language syntax and handling
	56	token actions during the second pass.
	57
	58	@par
	59	The subclass normally supplies a simplified BNF text description in its constructor prior to doing any parsing/tokenizing of source.
	60	The simplified BNF text description defines the language syntax and rule structure.
	61	The meta-symbols used in the BNF text description are:
	62	@par
	63	::= meaning "is defined as". "::=" starts the definition of a rule. The left side of ::= must contain an <identifier>.
	64	@par
	65	<> angle brackets are used to surround syntax rule names. A syntax rule name is also called a non-terminal in that
	66	it does not generate a terminal token in the instruction container for pass 2 processing.
	67	@par
	68	\| meaning "or". If the item on the left of the \| fails, then the item on the right is tested.
	69	Example: <true_false> ::= 'true' \| 'false';
	70	Whitespace is used to imply an AND operation between the left and right items.
	71	Example: <terrain_shadows> ::= 'terrain_shadows' <true_false>
	72	The 'terrain_shadows' terminal token must be found and the <true_false> rule must pass in order for the <terrain_shadows> rule
	73	to pass.
	74	@par
	75	[] An optional rule identifier is enclosed in the meta-symbols [ and ].
	76	Note that only one identifier or terminal token can take the [] modifier.
	77	@par
	78	{} A repetitive identifier (matched zero or more times) is enclosed in the meta-symbols { and }.
	79	Note that only one identifier or terminal token can take the {} modifier.
	80	@par
	81	'' terminal tokens are surrounded by single quotes. A terminal token is always one or more characters.
	82	For example: 'Colour' defines a character sequence that must be matched in whole. Note that matching is case
	83	sensitive.
	84	@par
	85	@ turns on single-character scanning and stops white space from being skipped.
	86	Mainly used for label processing that allows white space.
	87	Example: '@ ' prevents the white space between the quotes from being skipped
	88	@par
	89	-'' no terminal token is generated when a - precedes the first single quote but the text in between the quotes is still
	90	tested against the characters in the source being parsed.
	91	@par
	92	(?! ) negative lookahead (not test), inspired by Perl 5. Scans ahead for a non-terminal or terminal expression
	93	that must fail in order for the rule production to pass.
	94	It does not generate a token or advance the cursor. If the lookahead fails, i.e. the token is found,
	95	then the current rule fails and rollback occurs. Mainly used to resolve multiple contexts of a token.
	96	An example of where a not test is used to resolve multiple contexts:
	97
	98	<rule> ::= <identifier> "::=" <expression>\n
	99	<expression> ::= <and_term> { <or_term> }\n
	100	<or_term> ::= "\|" <and_term>\n
	101	<and_term> ::= <term> { <term> }\n
	102	<term> ::= <identifier_right> \| <terminal_symbol> \| <repeat_expression> \| <optional_expression>\n
	103	<identifier_right> ::= <identifier> (?!"::=")
	104
	105	<identifier> appears on both sides of the ::=, so (?!"::=") tests that ::= is not on the
	106	right, which would indicate that a new rule was being formed.
	107
	108	Works on both terminals and non-terminals.
	109	Note: lookahead failure causes the whole rule to fail and rollback to occur
	110
	111	@par
	112	<#name> # indicates that a numerical value is to be parsed to form a terminal token. Name is optional and is just a descriptor
	113	to help with understanding what the value will be used for.
	114	Example: <Colour> ::= <#red> <#green> <#blue>
	115
	116	@par
	117	() Parentheses enclose a set of characters that can be used to generate a user identifier. For example:
	118	(0123456789) matches a single character found in that set.
	119	An example of a user identifier:
	120
	121	@par
	122	<Label> ::= <Character> {<Character>}\n
	123	<Character> ::= (abcdefghijklmnopqrstuvwxyz)
	124
	125	This will generate a rule that accepts one or more lowercase letters to make up the Label. The user identifier
	126	stops collecting characters into a string when a match cannot be found in the rule.
	127
	128	@par
	129	(! ) if the first character in the set is a ! then any input character not found in the set will be
	130	accepted.
	131	An example:
	132
	133	<Label> ::= <AnyCharacter_NoLineBreak> {<AnyCharacter_NoLineBreak>}\n
	134	<AnyCharacter_NoLineBreak> ::= (!\n\r)
	135
	136	Any character except \n or \r is accepted in the input.
	137
	138	@par
	139	: Inserts the terminal token on the left before the next terminal token on the right, if the next terminal token on the right parses.
	140	Useful when terminal tokens don't have a definite text state, only a context state based on another terminal or character token.
	141	An example:
	142
	143	<Last_Resort> ::= 'external_command' : <Special_Label>\n
	144	<Special_Label> ::= (!\n\r\t)
	145
	146	In the example, <Last_Resort> gets processed when all other rules fail to parse.
	147	If <Special_Label> parses (reads in any character except \n, \r or \t), then the terminal token 'external_command'
	148	is inserted prior to the Special_Label for pass 2 processing. 'external_command' does not have an explicit text
	149	representation, but based on the context of no other rules matching and <Special_Label> parsing, 'external_command' is
	150	considered parsed.
	151	*/
	152	class _OgreExport Compiler2Pass
	153	{
	154
	155	protected:
	156
	157	// BNF operation types
	158	enum OperationType {otUNKNOWN, otRULE, otAND, otOR, otOPTIONAL,
	159	otREPEAT, otDATA, otNOT_TEST, otINSERT_TOKEN, otEND};
	160
	161	/** structure used to build rule paths
	162
	163	*/
	164	struct TokenRule
	165	{
	166	OperationType operation;
	167	size_t tokenID;
	168
	169	TokenRule(void) : operation(otUNKNOWN), tokenID(0) {}
	170	TokenRule(const OperationType ot, const size_t token)
	171	: operation(ot), tokenID(token) {}
	172	};
	173
	174	typedef std::vector<TokenRule> TokenRuleContainer;
	175	typedef TokenRuleContainer::iterator TokenRuleIterator;
	176
	177	static const size_t SystemTokenBase = 1000;
	178	enum SystemRuleToken {
	179	_no_token_ = SystemTokenBase,
	180	_character_,
	181	_value_,
	182	_no_space_skip_
	183	};
	184
	185	enum BNF_ID {BNF_UNKOWN = 0,
	186	BNF_SYNTAX, BNF_RULE, BNF_IDENTIFIER, BNF_IDENTIFIER_RIGHT, BNF_IDENTIFIER_CHARACTERS, BNF_ID_BEGIN, BNF_ID_END,
	187	BNF_CONSTANT_BEGIN, BNF_SET_RULE, BNF_EXPRESSION,
	188	BNF_AND_TERM, BNF_OR_TERM, BNF_TERM, BNF_TERM_ID, BNF_CONSTANT, BNF_OR, BNF_TERMINAL_SYMBOL, BNF_TERMINAL_START,
	189	BNF_REPEAT_EXPRESSION, BNF_REPEAT_BEGIN, BNF_REPEAT_END, BNF_SET, BNF_SET_BEGIN, BNF_SET_END,
	190	BNF_NOT_TEST, BNF_NOT_TEST_BEGIN, BNF_CONDITIONAL_TOKEN_INSERT, BNF_OPTIONAL_EXPRESSION,
	191	BNF_NOT_EXPRESSION, BNF_NOT_CHK,
	192	BNF_OPTIONAL_BEGIN, BNF_OPTIONAL_END, BNF_NO_TOKEN_START, BNF_SINGLEQUOTE, BNF_SINGLE_QUOTE_EXC, BNF_SET_END_EXC,
	193	BNF_ANY_CHARACTER, BNF_SPECIAL_CHARACTERS1,
	194	BNF_SPECIAL_CHARACTERS2, BNF_WHITE_SPACE_CHK,
	195
	196	BNF_LETTER, BNF_LETTER_DIGIT, BNF_DIGIT, BNF_WHITE_SPACE,
	197	BNF_ALPHA_SET, BNF_NUMBER_SET, BNF_SPECIAL_CHARACTER_SET1,
	198	BNF_SPECIAL_CHARACTER_SET2, BNF_SPECIAL_CHARACTER_SET3, BNF_NOT_CHARS,
	199
	200	// do not remove - this indicates where manually defined tokens end and where auto-gen ones start
	201	BNF_AUTOTOKENSTART
	202	};
	203
	204
	205	/** structure used to build lexeme Type library */
	206	struct LexemeTokenDef
	207	{
	208	size_t ID; ///< Token ID which is the index into the Lexeme Token Definition Container
	209	bool hasAction; ///< has an action associated with it. Only applicable to terminal tokens
	210	bool isNonTerminal; ///< if true then token is non-terminal
	211	size_t ruleID; ///< index into Rule database for non-terminal token rule path and lexeme
	212	bool isCaseSensitive; ///< if true use case sensitivity when comparing lexeme to source
	213	String lexeme; ///< text representation of token or valid characters for label parsing
	214
	215	LexemeTokenDef(void) : ID(0), hasAction(false), isNonTerminal(false), ruleID(0), isCaseSensitive(false) {}
	216	LexemeTokenDef( const size_t ID, const String& lexeme, const bool hasAction = false, const bool caseSensitive = false )
	217	: ID(ID)
	218	, hasAction(hasAction)
	219	, isNonTerminal(false)
	220	, ruleID(0)
	221	, isCaseSensitive(caseSensitive)
	222	, lexeme(lexeme)
	223	{
	224	}
	225
	226	};
	227
	228	typedef std::vector<LexemeTokenDef> LexemeTokenDefContainer;
	229	typedef LexemeTokenDefContainer::iterator LexemeTokenDefIterator;
	230
	231	/// map used to look up client token based on previously defined lexeme
	232	typedef std::map<std::string, size_t> LexemeTokenMap;
	233	typedef LexemeTokenMap::iterator TokenKeyIterator;
	234
	235
	236	/** structure for Token instructions that are constructed during first pass*/
	237	struct TokenInst
	238	{
	239	size_t NTTRuleID; ///< Non-Terminal Token Rule ID that generated Token
	240	size_t tokenID; ///< expected Token ID. Could be UNKNOWN if valid token was not found.
	241	size_t line; ///< line number in source code where Token was found
	242	size_t pos; ///< Character position in source where Token was found
	243	bool found; ///< is true if expected token was found
	244	};
	245
	246	typedef std::vector<TokenInst> TokenInstContainer;
	247	typedef TokenInstContainer::iterator TokenInstIterator;
	248
	249	// token queue, definitions, rules
	250	struct TokenState
	251	{
	252	TokenInstContainer tokenQue;
	253	LexemeTokenDefContainer lexemeTokenDefinitions;
	254	TokenRuleContainer rootRulePath;
	255	LexemeTokenMap lexemeTokenMap;
	256	};
	257
	258	TokenState* mClientTokenState;
	259
	260	/// Active token queue, definitions, rules currently being used by parser
	261	TokenState* mActiveTokenState;
	262	/// the location within the token instruction container where pass 2 currently is
	263	mutable size_t mPass2TokenQuePosition;
	264	/** the queue position of the previous token that had an action.
	265	A token's action is fired when the next token having an action is found.
	266	*/
	267	size_t mPreviousActionQuePosition;
	268	/** the queue position for the next token that has an action.
	269	*/
	270	size_t mNextActionQuePosition;
	271
	272	/// pointer to the source to be compiled
	273	const String* mSource;
	274	/// name of the source to be compiled
	275	String mSourceName;
	276	size_t mEndOfSource;
	277
	278	size_t mCurrentLine; ///< current line number in source being tokenized
	279	size_t mCharPos; ///< character position in source being tokenized
	280	size_t mErrorCharPos; ///< character position in source where last error occurred
	281
	282	/// storage container for constants defined in source
	283	/// container uses Token index as a key associated with a float constant
	284	std::map<size_t, float> mConstants;
	285	/// storage container for string labels defined in source
	286	/// container uses Token index as a key associated with a label
	287	typedef std::map<size_t, String> LabelContainer;
	288	LabelContainer mLabels;
	289	/// flag indicates when a label is being parsed.
	290	/// It gets set false when a terminal token not of _character_ is encountered
	291	bool mLabelIsActive;
	292	/// the key of the active label being built during pass 1.
	293	/// a new key is calculated when mLabelIsActive switches from false to true
	294	size_t mActiveLabelKey;
	295	/// The active label that is receiving characters during pass 1.
	296	String* mActiveLabel;
	297	/// flag being true indicates that spaces are not to be skipped
	298	/// automatically gets set to false when mLabelIsActive goes to false
	299	bool mNoSpaceSkip;
	300	/// if flag is true then next terminal token is not added to token queue if found
	301	/// but does affect rule path flow
	302	bool mNoTerminalToken;
	303	/// TokenID to insert if next rule finds a terminal token
	304	/// if zero then no token inserted
	305	size_t mInsertTokenID;
	306
	307	/// Active Contexts pattern used in pass 1 to determine which tokens are valid for a certain context
	308	uint mActiveContexts;
	309
	310	/** perform pass 1 of compile process
	311	scans source for lexemes that can be tokenized and then
	312	performs general semantic and context verification on each lexeme before it is tokenized.
	313	A tokenized instruction list is built to be used by Pass 2.
	314	A rule path can trigger Pass 2 execution if enough tokens have been generated in Pass 1.
	315	Pass 1 will then pass control to pass 2 temporarily until the current tokens have been consumed.
	316
	317	*/
	318	bool doPass1();
	319
	320	/** performs Pass 2 of compile process which is execution of the tokens
	321	@remark
	322	Pass 2 takes the token instructions generated in Pass 1 and
	323	builds the application specific instructions along with verifying
	324	semantic and context rules that could not be checked in Pass 1.
	325	@par
	326	Pass 2 execution consumes tokens and moves the Pass 2 token instruction position towards the end
	327	of the token container. Token execution can insert new tokens into the token container.
	328	*/
	329	bool doPass2();
	330
	331	/** execute the action associated with the token pointed to by the Pass 2 token instruction position.
	332	@remarks
	333	It's up to the child class to implement how it will associate a token key with an action.
	334	Actions should be placed at positions within the BNF grammar (instruction queue) that indicate
	335	enough tokens exist for pass 2 processing to take place.
	336	*/
	337	virtual void executeTokenAction(const size_t tokenID) = 0;
	338	/** Get the start ID for auto generated token IDs. This is also one past the end of manually set token IDs.
	339	Manually set token IDs are usually set up in the client code through an enum type, so it's best to make the
	340	last entry the auto ID start position and return this enum value.
	341	This method gets called automatically just prior to setupTokenDefinitions() to ensure that any tokens that are auto generated are placed after
	342	the manually set ones.
	343	*/
	344	virtual size_t getAutoTokenIDStart() const = 0;
	345	/** set up client token definitions. Gets called when BNF grammar is being set up.
	346	*/
	347	virtual void setupTokenDefinitions(void) = 0;
	348	/** Gets the next token from the instruction queue.
	349	@remarks
	350	If an unknown token is found then an exception is raised but
	351	the instruction pointer is still moved past the unknown token. The subclass should catch the exception,
	352	provide an error message, and attempt recovery.
	353
	354	@param expectedTokenID if greater than 0 then an exception is raised if tokenID does not match.
	355	*/
	356	const TokenInst& getNextToken(const size_t expectedTokenID = 0) const
	357	{
	358	skipToken();
	359	return getCurrentToken(expectedTokenID);
	360	}
	361	/** Gets the current token from the instruction queue.
	362	@remarks
	363	If an unknown token is found then an exception is raised.
	364	The subclass should catch the exception, provide an error message, and attempt recovery.
	365
	366	@param expectedTokenID if greater than 0 then an exception is raised if tokenID does not match.
	367
	368	*/
	369	const TokenInst& getCurrentToken(const size_t expectedTokenID = 0) const;
	370	/** If a next token instruction exists then test if its token ID matches.
	371	@remarks
	372	This method is useful for peeking ahead during pass 2 to see if a certain
	373	token exists. If the tokens don't match or there is no next token (end of queue)
	374	then false is returned.
	375	@param expectedTokenID is the ID of the token to match.
	376	*/
	377	bool testNextTokenID(const size_t expectedTokenID) const;
	378
	379	/** If a current token instruction exists then test if its token ID matches.
	380	@param expectedTokenID is the ID of the token to match.
	381	*/
	382	bool testCurrentTokenID(const size_t expectedTokenID) const
	383	{
	384	return mActiveTokenState->tokenQue[mPass2TokenQuePosition].tokenID == expectedTokenID;
	385	}
	386	/** skip to the next token in the pass2 queue.
	387	*/
	388	void skipToken(void) const;
	389	/** go back to the previous token in the pass2 queue.
	390	*/
	391	void replaceToken(void);
	392	/** Gets the next token's associated floating point value in the instruction queue that was parsed from the
	393	text source. If an unknown token is found or no associated value was found then an exception is raised but
	394	the instruction pointer is still moved past the unknown token. The subclass should catch the exception,
	395	provide an error message, and attempt recovery.
	396	*/
	397	float getNextTokenValue(void) const
	398	{
	399	skipToken();
	400	return getCurrentTokenValue();
	401	}
	402	/** Gets the current token's associated floating point value in the instruction queue that was parsed from the
	403	text source.
	404	@remarks
	405	If an unknown token is found or no associated value was found then an exception is raised.
	406	The subclass should catch the exception, provide an error message, and attempt recovery.
	407	*/
	408	float getCurrentTokenValue(void) const;
	409	/** Gets the next token's associated text label in the instruction queue that was parsed from the
	410	text source.
	411	@remarks
	412	If an unknown token is found or no associated label was found then an exception is raised but
	413	the instruction pointer is still moved past the unknown token. The subclass should catch the exception,
	414	provide an error message, and attempt recovery.
	415	*/
	416	const String& getNextTokenLabel(void) const
	417	{
	418	skipToken();
	419	return getCurrentTokenLabel();
	420	}
	421	/** Gets the current token's associated text label in the instruction queue that was parsed from the
	422	text source. If an unknown token is found or no associated label was found then an exception is raised.
	423	The subclass should catch the exception, provide an error message, and attempt recovery.
	424	*/
	425	const String& getCurrentTokenLabel(void) const;
	426	/** Get the next token's ID value.
	427	*/
	428	size_t getNextTokenID(void) const { return getNextToken().tokenID; }
	429	/** Get the current token's ID value.
	430	*/
	431	size_t getCurrentTokenID(void) const { return getCurrentToken().tokenID; }
	432	/** Get the next token's lexeme string. Handy when you don't want the ID but want the string
	433	representation.
	434	*/
	435	const String& getNextTokenLexeme(void) const
	436	{
	437	skipToken();
	438	return getCurrentTokenLexeme();
	439	}
	440	/** Get the current token's lexeme string. Handy when you don't want the ID but want the string
	441	representation.
	442	*/
	443	const String& getCurrentTokenLexeme(void) const;
	444	/** Gets the number of tokens waiting in the instruction queue that need to be processed by a token action in pass 2.
	445	*/
	446	size_t getPass2TokenQueCount(void) const;
	447	/** Get the number of tokens not yet processed by the action token.
	448	Client actions should use this method to retrieve the number of parameters (tokens)
	449	remaining to be processed in the action.
	450	*/
	451	size_t getRemainingTokensForAction(void) const;
	452	/** Manually set the Pass2 token queue position.
	453	@remarks
	454	This method will also set the position of the next token in the pass2 queue that
	455	has an action, ensuring that getRemainingTokensForAction works correctly.
	456	This method is useful when the token queue must be reprocessed after
	457	pass1 and the position in the queue must be changed so that an action will be triggered.
	458	@param pos is the new position within the Pass2 queue
	459	@param activateAction if set true and the token at the new position has an action then the
	460	action is activated.
	461	*/
	462	void setPass2TokenQuePosition(size_t pos, const bool activateAction = false);
	463	/** Get the current position in the Pass2 Token Queue.
	464	*/
	465	size_t getPass2TokenQuePosition(void) const { return mPass2TokenQuePosition; }
	466	/** Set the position of the next token action in the Pass2 Token Queue.
	467	@remarks
	468	If the position is not within the queue or there is no action associated with
	469	the token at the position in the queue then NextActionQuePosition is not set.
	470	@param pos is the position in the Pass2 Token Queue where the next action is.
	471	@param search if true then the queue is searched from pos until an action is found.
	472	If the end of the queue is reached and no action has been found then NextActionQuePosition
	473	is set to the end of the queue and false is returned.
	474	*/
	475	bool setNextActionQuePosition(size_t pos, const bool search = false);
	476	/** Add a lexeme token association.
	477	@remarks
	478	The backend compiler uses the associations between lexeme and token when
	479	building the rule base from the BNF script so all associations must be done
	480	prior to compiling a source.
	481	@param lexeme is the name of the token and is used when parsing the source to determine a match for a token.
	482	@param token is the ID associated with the lexeme. If token is 0 then the token ID is auto generated and returned.
	483	@param hasAction must be set true if the client wants an action triggered when this token is generated
	484	@param caseSensitive should be set true if lexeme match should use case sensitivity
	485	@return the ID of the token. Useful when auto generating token IDs.
	486	*/
	487	size_t addLexemeToken(const String& lexeme, const size_t token, const bool hasAction = false, const bool caseSensitive = false);
	488
	489	/** Sets up the parser rules for the client based on the BNF grammar text supplied by getClientBNFGrammer().
	490	@remarks
	491	Raises an exception if the grammar did not compile successfully. This method gets called
	492	when a call to compile occurs and no compiled BNF grammar exists; otherwise the compiler would have no rules to work
	493	with. The grammar only needs to be set once during the lifetime of the compiler unless the
	494	grammar changes.
	495	@note
	496	BNF grammar rules are cached once the BNF grammar source is compiled.
	497	The client should never have to call this method directly.
	498	*/
	499	void setClientBNFGrammer(void);
	500
	501
	502
	503	/// find the eol character
	504	void findEOL();
	505
	506	/** check to see if the text at the present position in the source is a numerical constant
	507	@param fvalue is a reference that will receive the float value that is in the source
	508	@param charsize reference to receive number of characters that make up the value in the source
	509	@return
	510	true if characters form a valid float representation
	511	false if a number value could not be extracted
	512	*/
	513	bool isFloatValue(float& fvalue, size_t& charsize) const;
	514
	515	/** Check if source at current position is supposed to be a user defined character label.
	516	A new label is processed when previous operation was not _character_ otherwise the processed
	517	character (if match was found) is added to the current label. This allows _character_ operations
	518	to be chained together to form a crude regular expression to build a label.
	519	@param rulepathIDX index into rule path database of token to validate.
	520	@return
	521	true if token was found for character label.
	522	*/
	523	bool isCharacterLabel(const size_t rulepathIDX);
	524	/** check to see if the text is in the lexeme text library
	525	@param lexeme points to beginning of text where a lexeme token might exist
	526	@param caseSensitive set to true if match should be case sensitive
	527	@return
	528	true if a matching token could be found in the token type library
	529	false if could not be tokenized
	530	*/
	531	bool isLexemeMatch(const String& lexeme, const bool caseSensitive) const;
	532	/// Check if pass 1 has parsed to the end of the source
	533	bool isEndOfSource() const { return mCharPos >= mEndOfSource; }
	534	/// position to the next possible valid symbol
	535	bool positionToNextLexeme();
	536	/** process input source text using rulepath to determine allowed tokens
	537	@remarks
	538	the method is reentrant and recursive
	539	if a non-terminal token is encountered in the current rule path then the method is
	540	called using the new rule path referenced by the non-terminal token
	541	Tokens can have the following operation states, which affect the flow path of the rule:
	542	RULE: defines a rule path for the non-terminal token
	543	AND: the token is required for the rule to pass
	544	OR: if the previous tokens failed then try these ones
	545	OPTIONAL: the token is optional and does not cause the rule to fail if the token is not found
	546	REPEAT: the token is required but there can be more than one in a sequence
	547	DATA: used by a previous token, i.e. for character sets
	548	NOTTEST: performs negative lookahead, i.e. makes sure the next token is not of a certain type
	549	END: end of the rule path - the method returns the success of the rule
	550
	551	@param rulepathIDX index into an array of Token Rules that define a rule path to be processed
	552	@return
	553	true if rule passed - all required tokens found
	554	false if one or more tokens required to complete the rule were not found
	555	*/
	556	bool processRulePath( size_t rulepathIDX);
	557
	558
	559	/** set up ActiveContexts - should be called by the subclass to set up initial language contexts
	560	*/
	561	void setActiveContexts(const uint contexts){ mActiveContexts = contexts; }
	562
	563	/// comment specifiers are hard coded
	564	void skipComments();
	565
	566	/// find end of line marker and move past it
	567	void skipEOL();
	568
	569	/// skip all the white space which includes spaces and tabs
	570	void skipWhiteSpace();
	571
	572
	573	/** check if current position in source has the lexeme text equivalent to the TokenID
	574	@param rulepathIDX index into rule path database of token to validate
	575	@param activeRuleID index of non-terminal rule that generated the token
	576	@return
	577	true if token was found
	578	false if token lexeme text does not match the source text
	579	if token is non-terminal then processRulePath is called
	580	*/
	581	bool ValidateToken(const size_t rulepathIDX, const size_t activeRuleID);
	582
	583	/** scan through all the rules and initialize token definition with index to rules for non-terminal tokens.
	584	Gets called when internal grammar is being verified or after client grammar has been parsed.
	585	@param grammerName is the name of the grammar the token rules are for
	586	*/
	587	void verifyTokenRuleLinks(const String& grammerName);
	588	/** Checks the last token instruction and if it has an action then it triggers the action of the previously
	589	found token having an action.
	590	*/
	591	void checkTokenActionTrigger(void);
	592	/** Get the text representation of the rule path. This is a good way to visually verify
	593	that the BNF grammar compiled correctly.
	594	@param ruleID is the index into the rule path.
	595	@param level is the number of levels a non-terminal will expand to. Defaults to 0 if not set which
	596	will cause non-terminals to not expand.
	597	*/
	598	String getBNFGrammerTextFromRulePath(size_t ruleID, const size_t level = 0);
	599
	600
	601	private:
	602	// used for interpreting BNF script
	603	// keep it as static so that only one structure is created
	604	// no matter how many times this class is instantiated.
	605	static TokenState mBNFTokenState;
	606	// maintain a map of BNF grammars
	607	typedef std::map<String, TokenState> TokenStateContainer;
	608	static TokenStateContainer mClientTokenStates;
	609	/// if a previous token action was setup then activate it now
	610	void activatePreviousTokenAction(void);
	611	/// initialize token definitions and rule paths
	612	void initBNFCompiler(void);
	613	/// Convert BNF grammar token queue created in pass 1 into a BNF rule path
	614	void buildClientBNFRulePaths(void);
	615	/// modify the last rule in the container. An end operation is added to the rule path.
	616	void modifyLastRule(const OperationType pendingRuleOp, const size_t tokenID);
	617	/** get the token ID for a lexeme in the client state. If the lexeme is not found then
	618	it is added to the map and definition container and a new tokenID created.
	619	@return the ID of the token.
	620	*/
	621	size_t getClientLexemeTokenID(const String& lexeme, const bool isCaseSensitive = false);
	622	/// Extract a Non Terminal identifier from the token que
	623	void extractNonTerminal(const OperationType pendingRuleOp);
	624	/// Extract a Terminal lexeme from the token que and add to current rule expression
	625	void extractTerminal(const OperationType pendingRuleOp, const bool notoken = false);
	626	/// Extract a set from the token que and add to current rule expression
	627	void extractSet(const OperationType pendingRuleOp);
	628	/// Extract a numeric constant from the token que and add it to the current rule expression
	629	void extractNumericConstant(const OperationType pendingRuleOp);
	630	/// changes previous terminal token rule into a conditional terminal token insert rule
	631	void setConditionalTokenInsert(void);
	632	/// get the lexeme text of a rule.
	633	String getLexemeText(size_t& ruleID, const size_t level = 0);
	634
	635
	636	public:
	637
	638	/// constructor
	639	Compiler2Pass();
	640	virtual ~Compiler2Pass() {}
	641
	642	/** compile the source - performs 2 passes.
	643	First pass is to tokenize, check semantics and context.
	644	The second pass is performed by using tokens to look up function implementors and executing
	645	them, which converts tokens to application specific instructions.
	646	@remark
	647	Pass 2 only gets executed if Pass 1 has built enough tokens to complete a rule path and found no errors
	648	@param source reference to the source text to be compiled
	649	@return
	650	true if Pass 1 and Pass 2 are successful
	651	false if any errors occur in Pass 1 or Pass 2
	652	*/
	653	bool compile(const String& source, const String& sourceName);
	654	/** Gets the BNF grammar. Gets called when the BNF grammar has to be compiled for the first time.
	655	*/
	656	virtual const String& getClientBNFGrammer(void) const = 0;
	657
	658	/** get the name of the BNF grammar.
	659	*/
	660	virtual const String& getClientGrammerName(void) const = 0;
	661
	662	};
	663
	664	}
	665
	666	#endif
	667
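
The meta-symbols described in the class comment can be modeled as small parser combinators. The C++17 sketch below is not OGRE's implementation, which compiles the BNF text into TokenRule paths walked by processRulePath(). It only illustrates the matching semantics of sequence (whitespace), `|`, `[]`, `{}`, `(?! )`, `()` and `(! )`, using the grammar examples from the comment. The names `Rule`, `lit`, `seq`, `alt`, `opt`, `rep`, `notTest`, `charSet` and `matches` are hypothetical. As simplifying assumptions, terminals skip leading spaces and tabs, character sets never skip white space, and a failed rule always leaves the cursor where it started, which mirrors the rollback described above.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical combinator model of the BNF meta-symbols (not OGRE's API).
// A Rule tries to match at `pos`; on success it advances `pos`, on failure it
// leaves `pos` untouched so that callers can roll back.
using Rule = std::function<bool(const std::string&, std::size_t&)>;

void skipWhiteSpace(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
}

// 'text': case-sensitive terminal that must be matched in whole.
Rule lit(std::string text) {
    return [text](const std::string& s, std::size_t& pos) {
        std::size_t p = pos;
        skipWhiteSpace(s, p);  // assumption: terminals skip leading white space
        if (s.compare(p, text.size(), text) != 0) return false;
        pos = p + text.size();
        return true;
    };
}

// A B (whitespace in the grammar): both must match, in order.
Rule seq(Rule a, Rule b) {
    return [a, b](const std::string& s, std::size_t& pos) {
        std::size_t saved = pos;
        if (a(s, pos) && b(s, pos)) return true;
        pos = saved;  // roll back a partial match of `a`
        return false;
    };
}

// A | B: if A fails, B is tested.
Rule alt(Rule a, Rule b) {
    return [a, b](const std::string& s, std::size_t& pos) { return a(s, pos) || b(s, pos); };
}

// [A]: optional, never fails.
Rule opt(Rule a) {
    return [a](const std::string& s, std::size_t& pos) {
        a(s, pos);
        return true;
    };
}

// {A}: zero or more repetitions; stops if A matches without consuming input.
Rule rep(Rule a) {
    return [a](const std::string& s, std::size_t& pos) {
        for (;;) {
            std::size_t before = pos;
            if (!a(s, pos) || pos == before) break;
        }
        return true;
    };
}

// (?! A): passes only if A does NOT match here; never consumes input.
Rule notTest(Rule a) {
    return [a](const std::string& s, std::size_t& pos) {
        std::size_t probe = pos;
        return !a(s, probe);
    };
}

// (chars) matches one character in the set; (!chars) one character not in it.
Rule charSet(std::string chars, bool negated = false) {
    return [chars, negated](const std::string& s, std::size_t& pos) {
        if (pos >= s.size()) return false;
        bool inSet = chars.find(s[pos]) != std::string::npos;
        if (inSet == negated) return false;
        ++pos;
        return true;
    };
}

// True if `rule` consumes the whole input (ignoring trailing spaces/tabs).
bool matches(const Rule& rule, const std::string& input) {
    std::size_t pos = 0;
    if (!rule(input, pos)) return false;
    skipWhiteSpace(input, pos);
    return pos == input.size();
}
```

For example, `<terrain_shadows>` becomes `seq(lit("terrain_shadows"), alt(lit("true"), lit("false")))`, which accepts `terrain_shadows true` but rejects `terrain_shadows maybe`. `<identifier_right>` becomes `seq(label, notTest(lit("::=")))`, which fails on `name ::= x` and leaves the cursor at the start of the input.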

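The deferred action rule (a token's action fires only when the next token that has an action is queued) can be modeled with a small queue. The sketch below is a hypothetical simplification, not the class's code. `QueuedToken`, `DeferredActionQueue`, `push()`, `finish()` and `fired()` are illustrative names, and the "action" just records which parameter tokens it received. It shows why the rule exists: when the next action token arrives, all parameter tokens of the previous action are already in the queue. Their count is roughly what getRemainingTokensForAction() is documented to report. Flushing the last pending action at the end of the source (`finish()`) is an assumption about the role of activatePreviousTokenAction(). The token names `ambient` and `diffuse` are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of Compiler2Pass's deferred token actions.
struct QueuedToken {
    std::string lexeme;
    bool hasAction;
};

class DeferredActionQueue {
public:
    // Pass 1 appends a token; a token with an action triggers the *previous* action.
    void push(QueuedToken token) {
        queue_.push_back(std::move(token));
        if (queue_.back().hasAction) trigger(queue_.size() - 1);
    }

    // Assumed end-of-source step: the last pending action now has all its parameters.
    void finish() { trigger(queue_.size()); }

    // Record of fired actions, formatted as "action(param, ...)".
    const std::vector<std::string>& fired() const { return fired_; }

private:
    // Fire the pending action with the tokens queued between it and nextActionPos,
    // then make nextActionPos (if it is inside the queue) the new pending action.
    void trigger(std::size_t nextActionPos) {
        if (hasPending_) {
            std::string call = queue_[pendingPos_].lexeme + "(";
            for (std::size_t i = pendingPos_ + 1; i < nextActionPos; ++i) {
                if (i > pendingPos_ + 1) call += ", ";
                call += queue_[i].lexeme;
            }
            fired_.push_back(call + ")");
        }
        hasPending_ = nextActionPos < queue_.size();
        pendingPos_ = nextActionPos;
    }

    std::vector<QueuedToken> queue_;
    std::vector<std::string> fired_;
    std::size_t pendingPos_ = 0;
    bool hasPending_ = false;
};
```

With the token stream `ambient 0.5 diffuse 1 0`, nothing fires after `0.5` is queued. Queuing `diffuse` fires `ambient(0.5)`, and `finish()` then fires `diffuse(1, 0)`.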