Bidirectional grammars for machine-code decoding and encoding. (English) Zbl 1426.68069

Summary: Binary analysis, which analyzes machine code, requires a decoder for converting bits into abstract syntax of machine instructions. Binary rewriting requires an encoder for converting instructions to bits. We propose a domain-specific language that enables the specification of both decoding and encoding in a single bidirectional grammar. With dependent types, a bigrammar enables the extraction of an executable decoder and encoder as well as a correctness proof showing their consistency. The bigrammar DSL is embedded in Coq with machine-checked proofs. We have used the bigrammar DSL to specify the decoding and encoding of subsets of the x86-32 and MIPS instruction sets. We have also extracted an executable decoder and encoder from the x86 bigrammar with competitive performance.


68N99 Theory of software
68Q42 Grammars and rewriting systems
68V15 Theorem proving (automated and interactive theorem provers, deduction, resolution, etc.)
Full Text: DOI


[1] Abadi, M., Budiu, M., Erlingsson, Ú., Ligatti, J.: Control-flow integrity. In: 12th ACM Conference on Computer and Communications Security (CCS), pp. 340-353 (2005)
[2] Alimarine, A., Smetsers, S., van Weelden, A., van Eekelen, M.C.J.D., Plasmeijer, R.: There and back again: arrows for invertible programming. In: Proceedings of the ACM SIGPLAN Workshop on Haskell, pp. 86-97 (2005)
[3] Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., Tawbi, N.: Static detection of malicious code in executable programs. Int. J. Requir. Eng. (2001) · Zbl 1063.68035
[4] Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang: resourceful lenses for string data. In: 35th ACM Symposium on Principles of Programming Languages (POPL), pp. 407-419 (2008) · Zbl 1295.68077
[5] Brabrand, C., Møller, A., Schwartzbach, M.I.: Dual syntax for XML languages. In: 10th International Symposium on Database Programming Languages (DBPL), pp. 27-41 (2005) · Zbl 1159.68407
[6] Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: a binary analysis platform. In: Computer Aided Verification (CAV), pp. 463-469 (2011)
[7] Brzozowski, JA, Derivatives of regular expressions, J. ACM, 11, 481-494, (1964) · Zbl 0225.94044
[8] Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: 12th Usenix Security Symposium, pp. 169-186 (2003)
[9] The Coq proof assistant. https://coq.inria.fr/ · Zbl 1039.68543
[10] Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: 31st ACM Symposium on Principles of Programming Languages (POPL), pp. 111-122 (2004) · Zbl 1325.68120
[11] Fox, A.C.J.: Improved tool support for machine-code decompilation in HOL4. In: 6th International Conference on Interactive Theorem Proving (ITP), pp. 187-202 (2015) · Zbl 1465.68048
[12] Godefroid, P., Levin, M.Y., Molnar, D.A.: Automated whitebox fuzz testing. In: Network and Distributed System Security Symposium (NDSS) (2008)
[13] Jansson, P., Jeuring, J.: Polytypic compact printing and parsing. In: 8th European Symposium on Programming (ESOP), pp. 273-287 (1999)
[14] Jourdan, J.H., Pottier, F., Leroy, X.: Validating LR(1) parsers. In: European Symposium on Programming (ESOP), pp. 397-416 (2012) · Zbl 1352.68131
[15] Kawanaka, S., Hosoya, H.: biXid: a bidirectional transformation language for XML. In: ACM International Conference on Functional programming (ICFP), pp. 201-214 (2006) · Zbl 1321.68127
[16] Kennedy, A, Pickler combinators, J. Funct. Program., 14, 727-739, (2004) · Zbl 1063.68035
[17] Kotha, A., Anand, K., Smithson, M., Yellareddy, G., Barua, R.: Automatic parallelization in a binary rewriter. In: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010, pp. 547-557 (2010)
[18] Kroll, J., Dean, D.: BakerSFIeld: Bringing software fault isolation to x64. www.jkroll.com/papers/bakersfield_sfi.pdf · Zbl 1163.68317
[19] McCamant, S., Morrisett, G.: Evaluating SFI for a CISC architecture. In: 15th Usenix Security Symposium (2006)
[20] Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. In: ACM International Conference on Functional programming (ICFP), pp. 189-195 (2011) · Zbl 1323.68138
[21] Morrisett, G., Tan, G., Tassarotti, J., Tristan, J.B., Gan, E.: Rocksalt: better, faster, stronger SFI for the x86. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 395-404 (2012)
[22] Owens, S; Reppy, J; Turon, A, Regular-expression derivatives re-examined, J. Funct. Program., 19, 173-190, (2009) · Zbl 1163.68317
[23] Ramsey, N; Fernández, MF, Specifying representations of machine instructions, ACM Trans. Program. Lang. Syst., 19, 492-524, (1997)
[24] Rendel, T., Ostermann, K.: Invertible syntax descriptions: unifying parsing and pretty printing. In: Third ACM Haskell Symposium on Haskell, pp. 1-12 (2010)
[25] Reps, T., Lim, J., Thakur, A., Balakrishnan, G., Lal, A.: There’s plenty of room at the bottom: analyzing and verifying machine code. In: Computer Aided Verification (CAV), pp. 41-56 (2010) · Zbl 0225.94044
[26] Sepp, A., Kranz, J., Simon, A.: GDSL: a generic decoder specification language for interpreting machine language. In: Tools for Automatic Program Analysis, pp. 53-64 (2012)
[27] Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P.: BitBlaze: A new approach to computer security via binary analysis. In: Proceedings of the 4th International Conference on Information Systems Security (2008)
[28] Tan, G., Morrisett, G.: Bidirectional grammars for machine-code decoding and encoding. In: 8th International Conference on Verified Software: Theories, Tools, and Experiments (VSTTE), pp. 73-89 (2016) · Zbl 1426.68069
[29] Wahbe, R., Lucco, S., Anderson, T., Graham, S.: Efficient software-based fault isolation. In: ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pp. 203-216. ACM Press, New York (1993)
[30] Wartell, R., Mohan, V., Hamlen, K.W., Lin, Z.: Securing untrusted code via compiler-agnostic binary rewriting. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp. 299-308 (2012)
[31] Xu, Z., Miller, B., Reps, T.: Safety checking of machine code. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 70-82 (2000)
[32] Yee, B., Sehr, D., Dardyk, G., Chen, B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., Fullagar, N.: Native client: A sandbox for portable, untrusted x86 native code. In: IEEE Symposium on Security and Privacy (S&P) (2009)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.