[444] | 1 | .\" Copyright (c) 1992, 1993, 1994 Henry Spencer. |
---|
| 2 | .\" Copyright (c) 1992, 1993, 1994 |
---|
| 3 | .\" The Regents of the University of California. All rights reserved. |
---|
| 4 | .\" |
---|
| 5 | .\" This code is derived from software contributed to Berkeley by |
---|
| 6 | .\" Henry Spencer. |
---|
| 7 | .\" |
---|
| 8 | .\" Redistribution and use in source and binary forms, with or without |
---|
| 9 | .\" modification, are permitted provided that the following conditions |
---|
| 10 | .\" are met: |
---|
| 11 | .\" 1. Redistributions of source code must retain the above copyright |
---|
| 12 | .\" notice, this list of conditions and the following disclaimer. |
---|
| 13 | .\" 2. Redistributions in binary form must reproduce the above copyright |
---|
| 14 | .\" notice, this list of conditions and the following disclaimer in the |
---|
| 15 | .\" documentation and/or other materials provided with the distribution. |
---|
| 16 | .\" 3. All advertising materials mentioning features or use of this software |
---|
| 17 | .\" must display the following acknowledgement: |
---|
| 18 | .\" This product includes software developed by the University of |
---|
| 19 | .\" California, Berkeley and its contributors. |
---|
| 20 | .\" 4. Neither the name of the University nor the names of its contributors |
---|
| 21 | .\" may be used to endorse or promote products derived from this software |
---|
| 22 | .\" without specific prior written permission. |
---|
| 23 | .\" |
---|
| 24 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND |
---|
| 25 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
---|
| 26 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
---|
| 27 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE |
---|
| 28 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL |
---|
| 29 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS |
---|
| 30 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) |
---|
| 31 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT |
---|
| 32 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY |
---|
| 33 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF |
---|
| 34 | .\" SUCH DAMAGE. |
---|
| 35 | .\" |
---|
| 36 | .\" @(#)regex.3 8.4 (Berkeley) 3/20/94 |
---|
| 37 | .\" $FreeBSD: src/lib/libc/regex/regex.3,v 1.9 2001/10/01 16:08:58 ru Exp $ |
---|
| 38 | .\" |
---|
| 39 | .Dd March 20, 1994 |
---|
| 40 | .Dt REGEX 3 |
---|
| 41 | .Os |
---|
| 42 | .Sh NAME |
---|
| 43 | .Nm regcomp , |
---|
| 44 | .Nm regexec , |
---|
| 45 | .Nm regerror , |
---|
| 46 | .Nm regfree |
---|
| 47 | .Nd regular-expression library |
---|
| 48 | .Sh LIBRARY |
---|
| 49 | .Lb libc |
---|
| 50 | .Sh SYNOPSIS |
---|
| 51 | .In sys/types.h |
---|
| 52 | .In regex.h |
---|
| 53 | .Ft int |
---|
| 54 | .Fn regcomp "regex_t *restrict preg" "const char *_restrictpattern" "int cflags" |
---|
| 55 | .Ft int |
---|
| 56 | .Fo regexec |
---|
| 57 | .Fa "const regex_t *_restrict preg" "const char *_restrict string" |
---|
| 58 | .Fa "size_t nmatch" "regmatch_t pmatch[_restrict]" "int eflags" |
---|
| 59 | .Fc |
---|
| 60 | .Ft size_t |
---|
| 61 | .Fo regerror |
---|
| 62 | .Fa "int errcode" "const regex_t *_restrict preg" |
---|
| 63 | .Fa "char *_restrict errbuf" "size_t errbuf_size" |
---|
| 64 | .Fc |
---|
| 65 | .Ft void |
---|
| 66 | .Fn regfree "regex_t *preg" |
---|
| 67 | .Sh DESCRIPTION |
---|
| 68 | These routines implement |
---|
| 69 | .St -p1003.2 |
---|
| 70 | regular expressions |
---|
| 71 | .Pq Do RE Dc Ns s ; |
---|
| 72 | see |
---|
| 73 | .Xr re_format 7 . |
---|
| 74 | .Fn Regcomp |
---|
| 75 | compiles an RE written as a string into an internal form, |
---|
| 76 | .Fn regexec |
---|
| 77 | matches that internal form against a string and reports results, |
---|
| 78 | .Fn regerror |
---|
| 79 | transforms error codes from either into human-readable messages, |
---|
| 80 | and |
---|
| 81 | .Fn regfree |
---|
| 82 | frees any dynamically-allocated storage used by the internal form |
---|
| 83 | of an RE. |
---|
| 84 | .Pp |
---|
| 85 | The header |
---|
| 86 | .Aq Pa regex.h |
---|
| 87 | declares two structure types, |
---|
| 88 | .Ft regex_t |
---|
| 89 | and |
---|
| 90 | .Ft regmatch_t , |
---|
| 91 | the former for compiled internal forms and the latter for match reporting. |
---|
| 92 | It also declares the four functions, |
---|
| 93 | a type |
---|
| 94 | .Ft regoff_t , |
---|
| 95 | and a number of constants with names starting with |
---|
| 96 | .Dq Dv REG_ . |
---|
| 97 | .Pp |
---|
| 98 | .Fn Regcomp |
---|
| 99 | compiles the regular expression contained in the |
---|
| 100 | .Fa pattern |
---|
| 101 | string, |
---|
| 102 | subject to the flags in |
---|
| 103 | .Fa cflags , |
---|
| 104 | and places the results in the |
---|
| 105 | .Ft regex_t |
---|
| 106 | structure pointed to by |
---|
| 107 | .Fa preg . |
---|
| 108 | .Fa Cflags |
---|
| 109 | is the bitwise OR of zero or more of the following flags: |
---|
| 110 | .Bl -tag -width REG_EXTENDED |
---|
| 111 | .It Dv REG_EXTENDED |
---|
| 112 | Compile modern |
---|
| 113 | .Pq Dq extended |
---|
| 114 | REs, |
---|
| 115 | rather than the obsolete |
---|
| 116 | .Pq Dq basic |
---|
| 117 | REs that |
---|
| 118 | are the default. |
---|
| 119 | .It Dv REG_BASIC |
---|
| 120 | This is a synonym for 0, |
---|
| 121 | provided as a counterpart to |
---|
| 122 | .Dv REG_EXTENDED |
---|
| 123 | to improve readability. |
---|
| 124 | .It Dv REG_NOSPEC |
---|
| 125 | Compile with recognition of all special characters turned off. |
---|
| 126 | All characters are thus considered ordinary, |
---|
| 127 | so the |
---|
| 128 | .Dq RE |
---|
| 129 | is a literal string. |
---|
| 130 | This is an extension, |
---|
| 131 | compatible with but not specified by |
---|
| 132 | .St -p1003.2 , |
---|
| 133 | and should be used with |
---|
| 134 | caution in software intended to be portable to other systems. |
---|
| 135 | .Dv REG_EXTENDED |
---|
| 136 | and |
---|
| 137 | .Dv REG_NOSPEC |
---|
| 138 | may not be used |
---|
| 139 | in the same call to |
---|
| 140 | .Fn regcomp . |
---|
| 141 | .It Dv REG_ICASE |
---|
| 142 | Compile for matching that ignores upper/lower case distinctions. |
---|
| 143 | See |
---|
| 144 | .Xr re_format 7 . |
---|
| 145 | .It Dv REG_NOSUB |
---|
| 146 | Compile for matching that need only report success or failure, |
---|
| 147 | not what was matched. |
---|
| 148 | .It Dv REG_NEWLINE |
---|
| 149 | Compile for newline-sensitive matching. |
---|
| 150 | By default, newline is a completely ordinary character with no special |
---|
| 151 | meaning in either REs or strings. |
---|
| 152 | With this flag, |
---|
| 153 | .Ql [^ |
---|
| 154 | bracket expressions and |
---|
| 155 | .Ql .\& |
---|
| 156 | never match newline, |
---|
| 157 | a |
---|
| 158 | .Ql ^\& |
---|
| 159 | anchor matches the null string after any newline in the string |
---|
| 160 | in addition to its normal function, |
---|
| 161 | and the |
---|
| 162 | .Ql $\& |
---|
| 163 | anchor matches the null string before any newline in the |
---|
| 164 | string in addition to its normal function. |
---|
| 165 | .It Dv REG_PEND |
---|
| 166 | The regular expression ends, |
---|
| 167 | not at the first NUL, |
---|
| 168 | but just before the character pointed to by the |
---|
| 169 | .Va re_endp |
---|
| 170 | member of the structure pointed to by |
---|
| 171 | .Fa preg . |
---|
| 172 | The |
---|
| 173 | .Va re_endp |
---|
| 174 | member is of type |
---|
| 175 | .Ft "const char *" . |
---|
| 176 | This flag permits inclusion of NULs in the RE; |
---|
| 177 | they are considered ordinary characters. |
---|
| 178 | This is an extension, |
---|
| 179 | compatible with but not specified by |
---|
| 180 | .St -p1003.2 , |
---|
| 181 | and should be used with |
---|
| 182 | caution in software intended to be portable to other systems. |
---|
| 183 | .El |
---|
| 184 | .Pp |
---|
| 185 | When successful, |
---|
| 186 | .Fn regcomp |
---|
| 187 | returns 0 and fills in the structure pointed to by |
---|
| 188 | .Fa preg . |
---|
| 189 | One member of that structure |
---|
| 190 | (other than |
---|
| 191 | .Va re_endp ) |
---|
| 192 | is publicized: |
---|
| 193 | .Va re_nsub , |
---|
| 194 | of type |
---|
| 195 | .Ft size_t , |
---|
| 196 | contains the number of parenthesized subexpressions within the RE |
---|
| 197 | (except that the value of this member is undefined if the |
---|
| 198 | .Dv REG_NOSUB |
---|
| 199 | flag was used). |
---|
| 200 | If |
---|
| 201 | .Fn regcomp |
---|
| 202 | fails, it returns a non-zero error code; |
---|
| 203 | see |
---|
| 204 | .Sx DIAGNOSTICS . |
---|
| 205 | .Pp |
---|
| 206 | .Fn Regexec |
---|
| 207 | matches the compiled RE pointed to by |
---|
| 208 | .Fa preg |
---|
| 209 | against the |
---|
| 210 | .Fa string , |
---|
| 211 | subject to the flags in |
---|
| 212 | .Fa eflags , |
---|
| 213 | and reports results using |
---|
| 214 | .Fa nmatch , |
---|
| 215 | .Fa pmatch , |
---|
| 216 | and the returned value. |
---|
| 217 | The RE must have been compiled by a previous invocation of |
---|
| 218 | .Fn regcomp . |
---|
| 219 | The compiled form is not altered during execution of |
---|
| 220 | .Fn regexec , |
---|
| 221 | so a single compiled RE can be used simultaneously by multiple threads. |
---|
| 222 | .Pp |
---|
| 223 | By default, |
---|
| 224 | the NUL-terminated string pointed to by |
---|
| 225 | .Fa string |
---|
| 226 | is considered to be the text of an entire line, minus any terminating |
---|
| 227 | newline. |
---|
| 228 | The |
---|
| 229 | .Fa eflags |
---|
| 230 | argument is the bitwise OR of zero or more of the following flags: |
---|
| 231 | .Bl -tag -width REG_STARTEND |
---|
| 232 | .It Dv REG_NOTBOL |
---|
| 233 | The first character of |
---|
| 234 | the string |
---|
| 235 | is not the beginning of a line, so the |
---|
| 236 | .Ql ^\& |
---|
| 237 | anchor should not match before it. |
---|
| 238 | This does not affect the behavior of newlines under |
---|
| 239 | .Dv REG_NEWLINE . |
---|
| 240 | .It Dv REG_NOTEOL |
---|
| 241 | The NUL terminating |
---|
| 242 | the string |
---|
| 243 | does not end a line, so the |
---|
| 244 | .Ql $\& |
---|
| 245 | anchor should not match before it. |
---|
| 246 | This does not affect the behavior of newlines under |
---|
| 247 | .Dv REG_NEWLINE . |
---|
| 248 | .It Dv REG_STARTEND |
---|
| 249 | The string is considered to start at |
---|
| 250 | .Fa string |
---|
| 251 | + |
---|
| 252 | .Fa pmatch Ns [0]. Ns Va rm_so |
---|
| 253 | and to have a terminating NUL located at |
---|
| 254 | .Fa string |
---|
| 255 | + |
---|
| 256 | .Fa pmatch Ns [0]. Ns Va rm_eo |
---|
| 257 | (there need not actually be a NUL at that location), |
---|
| 258 | regardless of the value of |
---|
| 259 | .Fa nmatch . |
---|
| 260 | See below for the definition of |
---|
| 261 | .Fa pmatch |
---|
| 262 | and |
---|
| 263 | .Fa nmatch . |
---|
| 264 | This is an extension, |
---|
| 265 | compatible with but not specified by |
---|
| 266 | .St -p1003.2 , |
---|
| 267 | and should be used with |
---|
| 268 | caution in software intended to be portable to other systems. |
---|
| 269 | Note that a non-zero |
---|
| 270 | .Va rm_so |
---|
| 271 | does not imply |
---|
| 272 | .Dv REG_NOTBOL ; |
---|
| 273 | .Dv REG_STARTEND |
---|
| 274 | affects only the location of the string, |
---|
| 275 | not how it is matched. |
---|
| 276 | .El |
---|
| 277 | .Pp |
---|
| 278 | See |
---|
| 279 | .Xr re_format 7 |
---|
| 280 | for a discussion of what is matched in situations where an RE or a |
---|
| 281 | portion thereof could match any of several substrings of |
---|
| 282 | .Fa string . |
---|
| 283 | .Pp |
---|
| 284 | Normally, |
---|
| 285 | .Fn regexec |
---|
| 286 | returns 0 for success and the non-zero code |
---|
| 287 | .Dv REG_NOMATCH |
---|
| 288 | for failure. |
---|
| 289 | Other non-zero error codes may be returned in exceptional situations; |
---|
| 290 | see |
---|
| 291 | .Sx DIAGNOSTICS . |
---|
| 292 | .Pp |
---|
| 293 | If |
---|
| 294 | .Dv REG_NOSUB |
---|
| 295 | was specified in the compilation of the RE, |
---|
| 296 | or if |
---|
| 297 | .Fa nmatch |
---|
| 298 | is 0, |
---|
| 299 | .Fn regexec |
---|
| 300 | ignores the |
---|
| 301 | .Fa pmatch |
---|
| 302 | argument (but see below for the case where |
---|
| 303 | .Dv REG_STARTEND |
---|
| 304 | is specified). |
---|
| 305 | Otherwise, |
---|
| 306 | .Fa pmatch |
---|
| 307 | points to an array of |
---|
| 308 | .Fa nmatch |
---|
| 309 | structures of type |
---|
| 310 | .Ft regmatch_t . |
---|
| 311 | Such a structure has at least the members |
---|
| 312 | .Va rm_so |
---|
| 313 | and |
---|
| 314 | .Va rm_eo , |
---|
| 315 | both of type |
---|
| 316 | .Ft regoff_t |
---|
| 317 | (a signed arithmetic type at least as large as an |
---|
| 318 | .Ft off_t |
---|
| 319 | and a |
---|
| 320 | .Ft ssize_t ) , |
---|
| 321 | containing respectively the offset of the first character of a substring |
---|
| 322 | and the offset of the first character after the end of the substring. |
---|
| 323 | Offsets are measured from the beginning of the |
---|
| 324 | .Fa string |
---|
| 325 | argument given to |
---|
| 326 | .Fn regexec . |
---|
| 327 | An empty substring is denoted by equal offsets, |
---|
| 328 | both indicating the character following the empty substring. |
---|
| 329 | .Pp |
---|
| 330 | The 0th member of the |
---|
| 331 | .Fa pmatch |
---|
| 332 | array is filled in to indicate what substring of |
---|
| 333 | .Fa string |
---|
| 334 | was matched by the entire RE. |
---|
| 335 | Remaining members report what substring was matched by parenthesized |
---|
| 336 | subexpressions within the RE; |
---|
| 337 | member |
---|
| 338 | .Va i |
---|
| 339 | reports subexpression |
---|
| 340 | .Va i , |
---|
| 341 | with subexpressions counted (starting at 1) by the order of their opening |
---|
| 342 | parentheses in the RE, left to right. |
---|
| 343 | Unused entries in the array (corresponding either to subexpressions that |
---|
| 344 | did not participate in the match at all, or to subexpressions that do not |
---|
| 345 | exist in the RE (that is, |
---|
| 346 | .Va i |
---|
| 347 | > |
---|
| 348 | .Fa preg Ns -> Ns Va re_nsub ) ) |
---|
| 349 | have both |
---|
| 350 | .Va rm_so |
---|
| 351 | and |
---|
| 352 | .Va rm_eo |
---|
| 353 | set to -1. |
---|
| 354 | If a subexpression participated in the match several times, |
---|
| 355 | the reported substring is the last one it matched. |
---|
| 356 | (Note, as an example in particular, that when the RE |
---|
| 357 | .Ql "(b*)+" |
---|
| 358 | matches |
---|
| 359 | .Ql bbb , |
---|
| 360 | the parenthesized subexpression matches each of the three |
---|
| 361 | .So Li b Sc Ns s |
---|
| 362 | and then |
---|
| 363 | an infinite number of empty strings following the last |
---|
| 364 | .Ql b , |
---|
| 365 | so the reported substring is one of the empties.) |
---|
| 366 | .Pp |
---|
| 367 | If |
---|
| 368 | .Dv REG_STARTEND |
---|
| 369 | is specified, |
---|
| 370 | .Fa pmatch |
---|
| 371 | must point to at least one |
---|
| 372 | .Ft regmatch_t |
---|
| 373 | (even if |
---|
| 374 | .Fa nmatch |
---|
| 375 | is 0 or |
---|
| 376 | .Dv REG_NOSUB |
---|
| 377 | was specified), |
---|
| 378 | to hold the input offsets for |
---|
| 379 | .Dv REG_STARTEND . |
---|
| 380 | Use for output is still entirely controlled by |
---|
| 381 | .Fa nmatch ; |
---|
| 382 | if |
---|
| 383 | .Fa nmatch |
---|
| 384 | is 0 or |
---|
| 385 | .Dv REG_NOSUB |
---|
| 386 | was specified, |
---|
| 387 | the value of |
---|
| 388 | .Fa pmatch Ns [0] |
---|
| 389 | will not be changed by a successful |
---|
| 390 | .Fn regexec . |
---|
| 391 | .Pp |
---|
| 392 | .Fn Regerror |
---|
| 393 | maps a non-zero |
---|
| 394 | .Fa errcode |
---|
| 395 | from either |
---|
| 396 | .Fn regcomp |
---|
| 397 | or |
---|
| 398 | .Fn regexec |
---|
| 399 | to a human-readable, printable message. |
---|
| 400 | If |
---|
| 401 | .Fa preg |
---|
| 402 | is |
---|
| 403 | .No non\- Ns Dv NULL , |
---|
| 404 | the error code should have arisen from use of |
---|
| 405 | the |
---|
| 406 | .Ft regex_t |
---|
| 407 | pointed to by |
---|
| 408 | .Fa preg , |
---|
| 409 | and if the error code came from |
---|
| 410 | .Fn regcomp , |
---|
| 411 | it should have been the result from the most recent |
---|
| 412 | .Fn regcomp |
---|
| 413 | using that |
---|
| 414 | .Ft regex_t . |
---|
| 415 | .No ( Fn Regerror |
---|
| 416 | may be able to supply a more detailed message using information |
---|
| 417 | from the |
---|
| 418 | .Ft regex_t . ) |
---|
| 419 | .Fn Regerror |
---|
| 420 | places the NUL-terminated message into the buffer pointed to by |
---|
| 421 | .Fa errbuf , |
---|
| 422 | limiting the length (including the NUL) to at most |
---|
| 423 | .Fa errbuf_size |
---|
| 424 | bytes. |
---|
| 425 | If the whole message won't fit, |
---|
| 426 | as much of it as will fit before the terminating NUL is supplied. |
---|
| 427 | In any case, |
---|
| 428 | the returned value is the size of buffer needed to hold the whole |
---|
| 429 | message (including terminating NUL). |
---|
| 430 | If |
---|
| 431 | .Fa errbuf_size |
---|
| 432 | is 0, |
---|
| 433 | .Fa errbuf |
---|
| 434 | is ignored but the return value is still correct. |
---|
| 435 | .Pp |
---|
| 436 | If the |
---|
| 437 | .Fa errcode |
---|
| 438 | given to |
---|
| 439 | .Fn regerror |
---|
| 440 | is first ORed with |
---|
| 441 | .Dv REG_ITOA , |
---|
| 442 | the |
---|
| 443 | .Dq message |
---|
| 444 | that results is the printable name of the error code, |
---|
| 445 | e.g.\& |
---|
| 446 | .Dq Dv REG_NOMATCH , |
---|
| 447 | rather than an explanation thereof. |
---|
| 448 | If |
---|
| 449 | .Fa errcode |
---|
| 450 | is |
---|
| 451 | .Dv REG_ATOI , |
---|
| 452 | then |
---|
| 453 | .Fa preg |
---|
| 454 | shall be |
---|
| 455 | .No non\- Ns Dv NULL |
---|
| 456 | and the |
---|
| 457 | .Va re_endp |
---|
| 458 | member of the structure it points to |
---|
| 459 | must point to the printable name of an error code; |
---|
| 460 | in this case, the result in |
---|
| 461 | .Fa errbuf |
---|
| 462 | is the decimal digits of |
---|
| 463 | the numeric value of the error code |
---|
| 464 | (0 if the name is not recognized). |
---|
| 465 | .Dv REG_ITOA |
---|
| 466 | and |
---|
| 467 | .Dv REG_ATOI |
---|
| 468 | are intended primarily as debugging facilities; |
---|
| 469 | they are extensions, |
---|
| 470 | compatible with but not specified by |
---|
| 471 | .St -p1003.2 , |
---|
| 472 | and should be used with |
---|
| 473 | caution in software intended to be portable to other systems. |
---|
| 474 | Be warned also that they are considered experimental and changes are possible. |
---|
| 475 | .Pp |
---|
| 476 | .Fn Regfree |
---|
| 477 | frees any dynamically-allocated storage associated with the compiled RE |
---|
| 478 | pointed to by |
---|
| 479 | .Fa preg . |
---|
| 480 | The remaining |
---|
| 481 | .Ft regex_t |
---|
| 482 | is no longer a valid compiled RE |
---|
| 483 | and the effect of supplying it to |
---|
| 484 | .Fn regexec |
---|
| 485 | or |
---|
| 486 | .Fn regerror |
---|
| 487 | is undefined. |
---|
| 488 | .Pp |
---|
| 489 | None of these functions references global variables except for tables |
---|
| 490 | of constants; |
---|
| 491 | all are safe for use from multiple threads if the arguments are safe. |
---|
| 492 | .Sh IMPLEMENTATION CHOICES |
---|
| 493 | There are a number of decisions that |
---|
| 494 | .St -p1003.2 |
---|
| 495 | leaves up to the implementor, |
---|
| 496 | either by explicitly saying |
---|
| 497 | .Dq undefined |
---|
| 498 | or by virtue of them being |
---|
| 499 | forbidden by the RE grammar. |
---|
| 500 | This implementation treats them as follows. |
---|
| 501 | .Pp |
---|
| 502 | See |
---|
| 503 | .Xr re_format 7 |
---|
| 504 | for a discussion of the definition of case-independent matching. |
---|
| 505 | .Pp |
---|
| 506 | There is no particular limit on the length of REs, |
---|
| 507 | except insofar as memory is limited. |
---|
| 508 | Memory usage is approximately linear in RE size, and largely insensitive |
---|
| 509 | to RE complexity, except for bounded repetitions. |
---|
| 510 | See |
---|
| 511 | .Sx BUGS |
---|
| 512 | for one short RE using them |
---|
| 513 | that will run almost any system out of memory. |
---|
| 514 | .Pp |
---|
| 515 | A backslashed character other than one specifically given a magic meaning |
---|
| 516 | by |
---|
| 517 | .St -p1003.2 |
---|
| 518 | (such magic meanings occur only in obsolete |
---|
| 519 | .Bq Dq basic |
---|
| 520 | REs) |
---|
| 521 | is taken as an ordinary character. |
---|
| 522 | .Pp |
---|
| 523 | Any unmatched |
---|
| 524 | .Ql [\& |
---|
| 525 | is a |
---|
| 526 | .Dv REG_EBRACK |
---|
| 527 | error. |
---|
| 528 | .Pp |
---|
| 529 | Equivalence classes cannot begin or end bracket-expression ranges. |
---|
| 530 | The endpoint of one range cannot begin another. |
---|
| 531 | .Pp |
---|
| 532 | .Dv RE_DUP_MAX , |
---|
| 533 | the limit on repetition counts in bounded repetitions, is 255. |
---|
| 534 | .Pp |
---|
| 535 | A repetition operator |
---|
| 536 | .Ql ( ?\& , |
---|
| 537 | .Ql *\& , |
---|
| 538 | .Ql +\& , |
---|
| 539 | or bounds) |
---|
| 540 | cannot follow another |
---|
| 541 | repetition operator. |
---|
| 542 | A repetition operator cannot begin an expression or subexpression |
---|
| 543 | or follow |
---|
| 544 | .Ql ^\& |
---|
| 545 | or |
---|
| 546 | .Ql |\& . |
---|
| 547 | .Pp |
---|
| 548 | .Ql |\& |
---|
| 549 | cannot appear first or last in a (sub)expression or after another |
---|
| 550 | .Ql |\& , |
---|
| 551 | i.e. an operand of |
---|
| 552 | .Ql |\& |
---|
| 553 | cannot be an empty subexpression. |
---|
| 554 | An empty parenthesized subexpression, |
---|
| 555 | .Ql "()" , |
---|
| 556 | is legal and matches an |
---|
| 557 | empty (sub)string. |
---|
| 558 | An empty string is not a legal RE. |
---|
| 559 | .Pp |
---|
| 560 | A |
---|
| 561 | .Ql {\& |
---|
| 562 | followed by a digit is considered the beginning of bounds for a |
---|
| 563 | bounded repetition, which must then follow the syntax for bounds. |
---|
| 564 | A |
---|
| 565 | .Ql {\& |
---|
| 566 | .Em not |
---|
| 567 | followed by a digit is considered an ordinary character. |
---|
| 568 | .Pp |
---|
| 569 | .Ql ^\& |
---|
| 570 | and |
---|
| 571 | .Ql $\& |
---|
| 572 | beginning and ending subexpressions in obsolete |
---|
| 573 | .Pq Dq basic |
---|
| 574 | REs are anchors, not ordinary characters. |
---|
| 575 | .Sh SEE ALSO |
---|
| 576 | .Xr grep 1 , |
---|
| 577 | .Xr re_format 7 |
---|
| 578 | .Pp |
---|
| 579 | .St -p1003.2 , |
---|
| 580 | sections 2.8 (Regular Expression Notation) |
---|
| 581 | and |
---|
| 582 | B.5 (C Binding for Regular Expression Matching). |
---|
| 583 | .Sh DIAGNOSTICS |
---|
| 584 | Non-zero error codes from |
---|
| 585 | .Fn regcomp |
---|
| 586 | and |
---|
| 587 | .Fn regexec |
---|
| 588 | include the following: |
---|
| 589 | .Pp |
---|
| 590 | .Bl -tag -width REG_ECOLLATE -compact |
---|
| 591 | .It Dv REG_NOMATCH |
---|
| 592 | .Fn regexec |
---|
| 593 | failed to match |
---|
| 594 | .It Dv REG_BADPAT |
---|
| 595 | invalid regular expression |
---|
| 596 | .It Dv REG_ECOLLATE |
---|
| 597 | invalid collating element |
---|
| 598 | .It Dv REG_ECTYPE |
---|
| 599 | invalid character class |
---|
| 600 | .It Dv REG_EESCAPE |
---|
| 601 | .Ql \e |
---|
| 602 | applied to unescapable character |
---|
| 603 | .It Dv REG_ESUBREG |
---|
| 604 | invalid backreference number |
---|
| 605 | .It Dv REG_EBRACK |
---|
| 606 | brackets |
---|
| 607 | .Ql "[ ]" |
---|
| 608 | not balanced |
---|
| 609 | .It Dv REG_EPAREN |
---|
| 610 | parentheses |
---|
| 611 | .Ql "( )" |
---|
| 612 | not balanced |
---|
| 613 | .It Dv REG_EBRACE |
---|
| 614 | braces |
---|
| 615 | .Ql "{ }" |
---|
| 616 | not balanced |
---|
| 617 | .It Dv REG_BADBR |
---|
| 618 | invalid repetition count(s) in |
---|
| 619 | .Ql "{ }" |
---|
| 620 | .It Dv REG_ERANGE |
---|
| 621 | invalid character range in |
---|
| 622 | .Ql "[ ]" |
---|
| 623 | .It Dv REG_ESPACE |
---|
| 624 | ran out of memory |
---|
| 625 | .It Dv REG_BADRPT |
---|
| 626 | .Ql ?\& , |
---|
| 627 | .Ql *\& , |
---|
| 628 | or |
---|
| 629 | .Ql +\& |
---|
| 630 | operand invalid |
---|
| 631 | .It Dv REG_EMPTY |
---|
| 632 | empty (sub)expression |
---|
| 633 | .It Dv REG_ASSERT |
---|
| 634 | can't happen - you found a bug |
---|
| 635 | .It Dv REG_INVARG |
---|
| 636 | invalid argument, e.g. negative-length string |
---|
| 637 | .El |
---|
| 638 | .Sh HISTORY |
---|
| 639 | Originally written by |
---|
| 640 | .An Henry Spencer . |
---|
| 641 | Altered for inclusion in the |
---|
| 642 | .Bx 4.4 |
---|
| 643 | distribution. |
---|
| 644 | .Sh BUGS |
---|
| 645 | This is an alpha release with known defects. |
---|
| 646 | Please report problems. |
---|
| 647 | .Pp |
---|
| 648 | The back-reference code is subtle and doubts linger about its correctness |
---|
| 649 | in complex cases. |
---|
| 650 | .Pp |
---|
| 651 | .Fn Regexec |
---|
| 652 | performance is poor. |
---|
| 653 | This will improve with later releases. |
---|
| 654 | .Fa Nmatch |
---|
| 655 | exceeding 0 is expensive; |
---|
| 656 | .Fa nmatch |
---|
| 657 | exceeding 1 is worse. |
---|
| 658 | .Fn Regexec |
---|
| 659 | is largely insensitive to RE complexity |
---|
| 660 | .Em except |
---|
| 661 | that back |
---|
| 662 | references are massively expensive. |
---|
| 663 | RE length does matter; in particular, there is a strong speed bonus |
---|
| 664 | for keeping RE length under about 30 characters, |
---|
| 665 | with most special characters counting roughly double. |
---|
| 666 | .Pp |
---|
| 667 | .Fn Regcomp |
---|
| 668 | implements bounded repetitions by macro expansion, |
---|
| 669 | which is costly in time and space if counts are large |
---|
| 670 | or bounded repetitions are nested. |
---|
| 671 | An RE like, say, |
---|
| 672 | .Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}" |
---|
| 673 | will (eventually) run almost any existing machine out of swap space. |
---|
| 674 | .Pp |
---|
| 675 | There are suspected problems with response to obscure error conditions. |
---|
| 676 | Notably, |
---|
| 677 | certain kinds of internal overflow, |
---|
| 678 | produced only by truly enormous REs or by multiply nested bounded repetitions, |
---|
| 679 | are probably not handled well. |
---|
| 680 | .Pp |
---|
| 681 | Due to a mistake in |
---|
| 682 | .St -p1003.2 , |
---|
| 683 | things like |
---|
| 684 | .Ql "a)b" |
---|
| 685 | are legal REs because |
---|
| 686 | .Ql )\& |
---|
| 687 | is |
---|
| 688 | a special character only in the presence of a previous unmatched |
---|
| 689 | .Ql (\& . |
---|
| 690 | This can't be fixed until the spec is fixed. |
---|
| 691 | .Pp |
---|
| 692 | The standard's definition of back references is vague. |
---|
| 693 | For example, does |
---|
| 694 | .Ql "a\e(\e(b\e)*\e2\e)*d" |
---|
| 695 | match |
---|
| 696 | .Ql "abbbd" ? |
---|
| 697 | Until the standard is clarified, |
---|
| 698 | behavior in such cases should not be relied on. |
---|
| 699 | .Pp |
---|
| 700 | The implementation of word-boundary matching is a bit of a kludge, |
---|
| 701 | and bugs may lurk in combinations of word-boundary matching and anchoring. |
---|