querybuilder's Introduction
<meta charset="utf-8" lang="en">

                  **Project: QueryBuilder**

This report describes our progress so far in developing a domain-specific language known as QueryBuilder.

Authors
=============

<!-- Note: wrapping table in div.noheader will hide the table's header -->
<!-- Note: wrapping table in div.firstcol will style the first column different from other columns -->
<div class="noheader firstcol">
                  |
------------------|-------------
names             | Owen Elliott / Jeff Jewett
computer + OS     | cse21703 (Ubuntu)
time to complete  | 30 hours
</div>

Introduction
================

- QueryBuilder is a DSL for composing and executing regex statements.
- Regex is confusing; the DSL seeks to simplify it and to allow for testing.
- It is useful for writing complex regex queries quickly.
- It addresses the problem that capture groups are not very robust.

Features
================

- Labels for repeated inputs
- Simplified common expressions
- Distinguishes between capture groups (and formats output accordingly)
- Compiles to regex
- Template functions
- Tests regex against input data (STDIN | INPUT FILE)

Examples
================

File 1: example.qb

```
namespace example
label capture firstname = "[a-zA-Z]+"
capture lastname = "[a-zA-Z]+"
label fullname = firstname "\s" lastname
function fencepost(x,y) = x "(?>" y x ")*"
label namelist = fencepost(fullname, ",\s")
build output = namelist #build regex form
test output @STDIN
test output 'inputfile.txt'
```

File 2: test.qb

```
import "example.qb"
namespace test
label capture[] email = "[\w\.]*@\w*\.\w*"
label emaillist = example.fencepost(email, ",\s")
build output = emaillist
```

Process
================

The build step expands `namelist` from File 1 in stages:

1. `namelist`
2. `fencepost(fullname,",\s")`
3. `fullname(,",\s"fullname)*`
4. `firstname"\s"lastname"(?>,\s"firstname"\s"lastname)*`
5. `(firstname)"\s"(lastname)"(?>,\s"(firstname)"\s"(lastname))*`
6. `([a-zA-Z]+)\s([a-zA-Z]+)(?>,\s([a-zA-Z]+)\s([a-zA-Z]+))*`

Test
================

input 1
--------------------------------

- Joe Smith, Debra Jones
- Jeff Jewett, Owen Elliott

output 1
--------------------------------

- Match 1: Joe Smith, Debra Jones
    - Group 1: Joe
    - Group lastname1: Smith
    - Group 3: Debra
    - Group lastname2: Jones
- Match 2: Jeff Jewett, Owen Elliott
    - Group 1: Jeff
    - Group lastname1: Jewett
    - Group 3: Owen
    - Group lastname2: Elliott

input 2
--------------------------------

- [email protected], [email protected]

output 2
--------------------------------

- Match 1: [email protected], [email protected]
    - Group email1: [email protected]
    - Group email2: [email protected]

Grammar
================

- **import**: appends the named file to the top of the script and generates its symbol table (like a `#include`).
- **label**: a named pattern of regex literals or function outputs that can be referenced later; it is added to the symbol table.
- **capture**: a modifier to a label that marks it as a capture group (it can also name capture groups in a list).
- **function**: a template pattern with placeholder values.
- **build**: translates the pattern that follows into proper regex. It can be modified with flags.
- **test**: given a built query and a target, searches the target using that query.
- **namespace**: declares the title of this file's labels and functions. When the file is included in another file, its symbols are referenced as `namespace.label`.

Parser
================

QueryBuilder files are recognized as valid input using the ANTLR grammar system. We ran ten tests to check whether the parser could tell valid syntax from invalid syntax, and it passed all of them. All ten tests are in the /test/ subdirectory.

![Parse Tree](asset1.png)

Here are a few sample tests for valid input.

test6

```
import './test.qb' #import the file
label ex = 'ex'
label mod = '?'
function concat (a,b) = a b
#the parser accepts this input
```

test5

```
import 'test.qb'
namespace example
label new = test.foo test.bar('a')
build out = test.prebuilt @GM
test out @STDIN
#the parser accepts this input
```

Here are a few sample tests for invalid input.

test 3

```
namespace bad3
label nothingWrongSoFar = 'ok'
build forgotAnEquals nothingWrongSoFar
#the parser rejects this input
```

test 4

```
namespace bad4
label missingQuotes = $hello^
build output = missingQuotes
#the parser rejects this input
```

Translator
==================

We used the ANTLR listener classes to build a translator for QueryBuilder. We accomplished our goal of making a language that is easy to write in while being functional. The best way to explain it is to show a few examples.

The following QueryBuilder program shows the proper way to use the language.

```
# declare the namespace
# (optional unless you import this file into a different file)
namespace example
capture firstname = "[a-zA-Z]+"
capture[] lastname = "[a-zA-Z]+"
label fullname = firstname "\s" lastname
function fencepost(x,y) = x "(" y x ")*"
label namelist = fencepost(fullname, ",\s?")
# build regex form
build output = namelist
# test against STDIN
test output @STDIN
```

![Result (from STDIN)](result1.png)

As you can see, the compiled regex statement successfully matched the user input string. You can also test against a file by replacing `test output @STDIN` with something like `test output 'test/inputfile.txt'`.

![Result (from file)](result2.png)

Here are a few of the situations where our translator catches semantic errors in the code. We created a SymbolTable class that stores all the named labels, functions, builds, etc., and used it to make sure the code doesn't try to declare more than one of these with the same identifier.
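As a rough illustration of that duplicate check, here is a minimal Python sketch. It is not the project's actual SymbolTable class, which is not shown in this report; the class layout and the `declare` and `lookup` method names are assumptions made for the example. Only the error message text is taken from the translator's real output.

```python
class DuplicateSymbolError(Exception):
    """Raised when an identifier is declared a second time."""


class SymbolTable:
    """Hypothetical symbol table: maps an identifier to (kind, definition)."""

    def __init__(self):
        self._symbols = {}

    def declare(self, name, kind, definition):
        # Reject any redeclaration, whatever kind of symbol it is.
        if name in self._symbols:
            raise DuplicateSymbolError(
                f"Cannot overwrite an existing label or function name: {name}"
            )
        self._symbols[name] = (kind, definition)

    def lookup(self, name):
        # Returns the (kind, definition) pair stored for this identifier.
        return self._symbols[name]


table = SymbolTable()
table.declare("example", "label", "[^abc]")
try:
    # Second declaration with the same identifier is refused.
    table.declare("example", "label", "[^def]")
except DuplicateSymbolError as err:
    message = str(err)
    print(message)
```

A single dictionary shared by labels, functions, and builds is enough for this check, because the rule is that no two symbols of any kind may share an identifier.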
The following code tries to declare two labels with the same name:

```
label example = "[^abc]"
label example = "[^def]"
build output = example
test output @STDIN
```

The process ends with the message `Cannot overwrite an existing label or function name: example`.

The previous example also did not declare a namespace. That is legal, but it would cause a problem if a different program imported the file, because imported symbols are referenced through the namespace name. The error message `Imported files must have a namespace: test/testFile1.qb` appears if testFile1 is imported without a declared namespace. Similarly, if you try to import a file that doesn't exist, the program halts with the error message `Could not open file filename.txt`.

Another feature of QueryBuilder is the concept of namespaces. After a file is imported, its symbols are accessed with the syntax `namespace.symbol`. For example, take a look at the following program.

```
import "test/good4.qb"
label test1 = aNameSpace.ftest
build output = test1
test output @STDIN
```

And this is what `good4.qb` contains:

```
namespace aNameSpace
label ex = 'ex'
function concat (a,b) = a "(?>" b a ")*"
label ftest = concat('g', 'h')
build out = ftest
```

![Result (imports)](result3.png)

Non Trivial Examples
=====================

IPv6 Address Matcher
----------------------

This code uses a combination of labels and a function to generate a long regex query that matches all forms of IPv6 addresses.
```
namespace ipv6
label addrChunk1to4 = '(?>[0-9a-fA-F]{1,4}:)'
function setCountChanger(num1,num2) = addrChunk1to4 '{1,' num1 '}' '(?>:[0-9a-fA-F]{1,4}){1,' num2 '}'
# IPv6 Address Type 1 (1-4 chars per set) (8 sets)
# Example: 2001:db8:85a3:0000:0000:8a2e:370:7334
capture addrType1 = addrChunk1to4 '{7,7}' '[0-9a-fA-F]{1,4}'
# IPv6 Address Type 2 (1-4 chars per set) (1-7 sets :: 0 sets)
# Example: fe42:26a1:370:507::
capture addrType2 = addrChunk1to4 '{1,7}:'
# IPv6 Address Type 3 (1-4 chars per set) (1-6 sets :: 1 set)
# Example: ff06:1d09:1344:cfc2:31::c33d:1313
capture addrType3 = setCountChanger('6','1')
# IPv6 Address Type 4 (1-4 chars per set) (1-5 sets :: 1-2 sets)
# Example: ff06:1d09:1344:cfc2:31::c33d:1313
capture addrType4 = setCountChanger('5','2')
# IPv6 Address Type 5 (1-4 chars per set) (1-4 sets :: 1-3 sets)
# Example: ff06:1d09:1344:cfc2::31:c33d:1313
capture addrType5 = setCountChanger('4','3')
# IPv6 Address Type 6 (1-4 chars per set) (1-3 sets :: 1-4 sets)
# Example: ff06:1d09:1344::cfc2:31:c33d:1313
capture addrType6 = setCountChanger('3','4')
# IPv6 Address Type 7 (1-4 chars per set) (1-2 sets :: 1-5 sets)
# Example: ff06:1d09::1344:cfc2:31:c33d:1313
capture addrType7 = setCountChanger('2','4')
# IPv6 Address Type 8 (1-4 chars per set) (1 set :: 1-6 sets)
# Example: ff06::1d09:1344:cfc2:31:c33d:1313
capture addrType8 = setCountChanger('1','6')
# IPv6 Address Type 9 (1-4 chars per set) (0 sets :: 0-7 sets)
# Example: ::3e13:1335
capture addrType9 = ':((:[0-9a-fA-F]{1,4}){1,7}|:)'
# build and test
label ipv6addr = addrType1 '|' addrType2 '|' addrType3 '|' addrType4 '|' addrType5 '|' addrType6 '|' addrType7 '|' addrType8 '|' addrType9
build query = ipv6addr
test query "ipv6.txt"
test query "badIPV6data.txt"
```

![IPv6 Address Matcher](ipv6.png)

HTML Table Query
----------------------

This code takes an HTML page as input and matches the data in any tables on that page (including rows and columns).
```
namespace htmltable
label notbrackets = '[^<>]'
label spacebetween = notbrackets '*'
function dup6(a) = a a a a a a
function opentag(type) = '<' type spacebetween '>'
function closetag(type) = '</' type notbrackets '*>'
label openth = opentag('th')
label closeth = closetag('th')
capture[] thval = notbrackets '*'
label thpair = openth thval closeth
label opentd = opentag('t[dh]')
label closetd = closetag('t[dh]')
capture[] tdval = '.*?'
label tdpair = opentd tdval closetd
label opentr = opentag('tr')
label closetr = closetag('tr')
label thpairspace = thpair spacebetween
label trh6inner = spacebetween dup6(thpairspace) spacebetween
label trhpair = opentr trh6inner closetr
label tdpairspace = tdpair spacebetween
label trd6inner = spacebetween dup6(tdpairspace) spacebetween
label trdpair = opentr trd6inner closetr
build open = trdpair
test open 'htmltable.html'
```

![HTML Table](wikitableoutput.png)

Conclusions
=================

QueryBuilder is a useful tool for conceptualizing and building complex queries. Compared with writing regex by hand, our DSL lets each piece of a query be named, reused, and tested. For example, `ipv6.qb` generates a *very* complex query that is essentially a list of the possible forms of an IPv6 address. Using the function declared at the start of that file, however, we were able to create this massive query in fewer than 20 lines of code (comments not included). In addition, our `test` functionality can optionally output the results to a .json file for convenience. These features are what make QueryBuilder a more useful tool for this job than a general-purpose language.
![JSON example](json.png)

In our opinion, the code looks clean and readable, which can help someone who comes across it to understand it, even if they don't know what this means:

```
(?<ipv6addrType1>(?>[0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4})|(?<ipv6addrType2>(?>[0-9a-fA-F]{1,4}:){1,7}:)|(?<ipv6addrType3>(?>[0-9a-fA-F]{1,4}:){1,6}(?>:[0-9a-fA-F]{1,4}){1,1})|(?<ipv6addrType4>(?>[0-9a-fA-F]{1,4}:){1,5}(?>:[0-9a-fA-F]{1,4}){1,2})|(?<ipv6addrType5>(?>[0-9a-fA-F]{1,4}:){1,4}(?>:[0-9a-fA-F]{1,4}){1,3})|(?<ipv6addrType6>(?>[0-9a-fA-F]{1,4}:){1,3}(?>:[0-9a-fA-F]{1,4}){1,4})|(?<ipv6addrType7>(?>[0-9a-fA-F]{1,4}:){1,2}(?>:[0-9a-fA-F]{1,4}){1,4})|(?<ipv6addrType8>(?>[0-9a-fA-F]{1,4}:){1,1}(?>:[0-9a-fA-F]{1,4}){1,6})|(?<ipv6addrType9>:((:[0-9a-fA-F]{1,4}){1,7}|:))
```

We took about equal responsibility for this project, although Jeff did more coding while Owen did more documenting.

<!-- Feel free to modify the following to fit a theme of your choosing -->
<link href="https://fonts.googleapis.com/css?family=Open+Sans&display=swap" rel="stylesheet"> <!-- a sans-serif font -->
<style>
    /* A TAYLOR-INSPIRED THEME */
    body {font-family:'Open Sans',sans-serif;}
    .md a:link, .md a:visited {color:hsl(252,23.0%,44.3%); font-family:'Open Sans',sans-serif;}
    .md table.table th {background-color:hsl(252,23.0%,44.3%);}
    .md .noheader th {display:none;}
    .md .firstcol td:first-child {white-space:pre;color:white;vertical-align:top;font-weight:bold;border-color:black;background:hsl(252,23.0%,54.3%);}
    .md .firstcol tr:nth-child(even) td:first-child {background:hsl(252,23.0%,44.3%);}
</style>

<!-- ****************************** -->
<!--    Leave the content below     -->
<!-- ****************************** -->

<!-- The script and style below are added for clarity and to workaround a bug -->
<script>
    // this is a hack to workaround a bug in Markdeep+Mathjax, where
    // `$`` is automatically converted to `\(`` and `\)`` too soon.
    // the following code will replace the innerHTML of all elements
    // with class "dollar" with a dollar sign.
    setTimeout(function() {
        var dollars = document.getElementsByClassName('dollar');
        for(var i = 0; i < dollars.length; i++) {
            dollars[i].innerHTML = '&#' + '36;'; // split to prevent conversion to $
        }
    }, 1000);
</script>
<style>
    /* adding some styling to <code> tags (but not <pre><code> coding blocks!) */
    :not(pre) > code {
        background-color: rgba(0,0,0,0.05);
        outline: 1px solid rgba(0,0,0,0.15);
        margin-left: 0.25em;
        margin-right: 0.25em;
    }
    /* fixes table of contents of medium-length document from looking weird if admonitions are behind */
    .md div.mediumTOC { background: white; }
    .md div.admonition { position: initial !important; }
</style>

<!-- Leave the following Markdeep formatting code, as this will format your text above to look nice in a web browser -->
<style class="fallback">body{visibility:hidden;white-space:pre;font-family:monospace}</style>
<script src="https://casual-effects.com/markdeep/latest/markdeep.min.js"></script>
<script>window.alreadyProcessedMarkdeep||(document.body.style.visibility="visible");</script>